When a front office starts capping pitch counts, the first objection is easy to predict:
- the capped pitchers got hurt anyway
- therefore the cap did not work
That argument sounds practical. It is also almost guaranteed to confuse correlation with intervention, because teams do not place stricter limits on random arms. They place them on the pitchers they already fear are fragile.
A workload cap can look associated with injury simply because the cap was assigned to the riskiest arms.
This is one of the cleanest sports examples of why causal reasoning matters. The policy is targeted. The outcome is high stakes. And the wrong subgroup analysis is always waiting nearby.
Start with the baseball version, not the textbook version
The baseball conversation usually sounds like this:
- capped pitchers missed time anyway
- uncapped pitchers looked sturdier
But the training staff was not blind. If one arm already looked fragile, that pitcher was more likely to get the cap in the first place.
There is also a second trap. Analysts often look only at pitchers who stayed in the rotation long enough to keep accumulating starts. That sounds sensible until you notice that staying in the rotation is partly affected by the cap and partly affected by fragility. Now you are conditioning on a post-treatment selection node.
Those two distortions are easier to see in one flow than in a table:

First the cap gets assigned to the arms the staff already fears. Then a rotation-only slice can bias the picture again by filtering on a post-cap survivor node.
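Both distortions can be reproduced in a few lines of plain Python, with no inference library at all. This is a minimal sketch that reuses the CPT numbers from the toy model built below: the cap lowers injury risk for every arm, fragile or durable, yet both the raw split and the rotation-survivor split make the capped group look worse.

```python
# Simulation sketch: targeted caps plus survivor filtering, using the
# toy model's CPT numbers. The cap is protective for every arm, but
# observational slices point the other way because fragile arms get
# capped far more often.
import random

random.seed(7)

P_FRAGILE = 0.28
P_CAP = {"durable": 0.22, "fragile": 0.75}
P_INJURY = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
            ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}
P_ROTATION = {("durable", "no"): 0.90, ("durable", "yes"): 0.96,
              ("fragile", "no"): 0.65, ("fragile", "yes"): 0.82}

raw = {"yes": [0, 0], "no": [0, 0]}    # cap -> [injuries, pitchers]
surv = {"yes": [0, 0], "no": [0, 0]}   # same counts, rotation survivors only

for _ in range(200_000):
    frag = "fragile" if random.random() < P_FRAGILE else "durable"
    cap = "yes" if random.random() < P_CAP[frag] else "no"
    injured = random.random() < P_INJURY[(frag, cap)]
    in_rotation = random.random() < P_ROTATION[(frag, cap)]
    raw[cap][0] += injured
    raw[cap][1] += 1
    if in_rotation:
        surv[cap][0] += injured
        surv[cap][1] += 1

for cap in ("yes", "no"):
    print(f"cap={cap}: raw risk {raw[cap][0] / raw[cap][1]:.3f}, "
          f"rotation-only {surv[cap][0] / surv[cap][1]:.3f}")
```

The raw capped-group risk lands near 0.15 against roughly 0.10 for the uncapped group, even though every individual arm is safer with the cap.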
Make the graph explicit
Here is the compact story:
- Fragility: baseline arm risk
- Cap: whether the pitcher gets the pitch-count limit
- Rotation: whether the pitcher remains in the active rotation window you are analyzing
- Injury: downstream injury outcome
The edges are:
- Fragility -> Cap
- Fragility -> Rotation
- Fragility -> Injury
- Cap -> Rotation
- Cap -> Injury
Drawn as a graph, that story looks like this:

The first question is a graph question, not an estimation question.
import networkx as nx

from pybbn.graphical import get_graph_tuple, get_minimal_confounders

g = nx.DiGraph()
g.add_edges_from(
    [
        ("Fragility", "Cap"),
        ("Fragility", "Rotation"),
        ("Fragility", "Injury"),
        ("Cap", "Rotation"),
        ("Cap", "Injury"),
    ]
)

gt = get_graph_tuple(g)
get_minimal_confounders(gt, "Cap", "Injury")
# ['Fragility']
That is the essential result. If you do not account for baseline fragility, the raw cap-versus-no-cap comparison is already biased. If you then filter to rotation survivors, you make the story even less trustworthy.
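The same conclusion can be checked with plain networkx, no py-bbn required. This is the standard backdoor test: delete Cap's outgoing edges so only backdoor paths remain, then ask whether Fragility d-separates Cap from Injury in what is left. (networkx renamed its `d_separated` function to `is_d_separator` in newer releases, so the sketch below picks whichever name is available.)

```python
# Backdoor check with plain networkx: remove Cap's outgoing edges, then
# test whether {Fragility} blocks every remaining path from Cap to Injury.
import networkx as nx

g = nx.DiGraph([
    ("Fragility", "Cap"), ("Fragility", "Rotation"), ("Fragility", "Injury"),
    ("Cap", "Rotation"), ("Cap", "Injury"),
])

backdoor = g.copy()
backdoor.remove_edges_from(list(g.out_edges("Cap")))

# d_separated was renamed is_d_separator in recent networkx versions.
d_sep = getattr(nx, "is_d_separator", None) or getattr(nx, "d_separated")

print(d_sep(backdoor, {"Cap"}, {"Injury"}, {"Fragility"}))  # True: adjusting works
print(d_sep(backdoor, {"Cap"}, {"Injury"}, set()))          # False: raw split is biased
```

The only backdoor path here is Cap <- Fragility -> Injury, and conditioning on Fragility blocks it; conditioning on nothing leaves it open.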
Build the toy model in py-bbn
from pybbn.factory import create_reasoning_model
d = {
    "nodes": ["Fragility", "Cap", "Rotation", "Injury"],
    "edges": [
        ("Fragility", "Cap"),
        ("Fragility", "Rotation"),
        ("Fragility", "Injury"),
        ("Cap", "Rotation"),
        ("Cap", "Injury"),
    ],
}

p = {
    "Fragility": {
        "columns": ["Fragility", "__p__"],
        "data": [["durable", 0.72], ["fragile", 0.28]],
    },
    "Cap": {
        "columns": ["Fragility", "Cap", "__p__"],
        "data": [
            ["durable", "no", 0.78],
            ["durable", "yes", 0.22],
            ["fragile", "no", 0.25],
            ["fragile", "yes", 0.75],
        ],
    },
    "Rotation": {
        "columns": ["Fragility", "Cap", "Rotation", "__p__"],
        "data": [
            ["durable", "no", "no", 0.10],
            ["durable", "no", "yes", 0.90],
            ["durable", "yes", "no", 0.04],
            ["durable", "yes", "yes", 0.96],
            ["fragile", "no", "no", 0.35],
            ["fragile", "no", "yes", 0.65],
            ["fragile", "yes", "no", 0.18],
            ["fragile", "yes", "yes", 0.82],
        ],
    },
    "Injury": {
        "columns": ["Fragility", "Cap", "Injury", "__p__"],
        "data": [
            ["durable", "no", "no", 0.93],
            ["durable", "no", "yes", 0.07],
            ["durable", "yes", "no", 0.96],
            ["durable", "yes", "yes", 0.04],
            ["fragile", "no", "no", 0.66],
            ["fragile", "no", "yes", 0.34],
            ["fragile", "yes", "no", 0.76],
            ["fragile", "yes", "yes", 0.24],
        ],
    },
}
model = create_reasoning_model(d, p)
The raw comparison says the cap group looks worse
If you only condition on the cap assignment, the capped group looks riskier:
capped = model.pquery(nodes=["Injury"], evidences=model.e({"Cap": "yes"}))["Injury"]
uncapped = model.pquery(nodes=["Injury"], evidences=model.e({"Cap": "no"}))["Injury"]
In this toy setup:
P(injury = yes | cap = yes) = 0.1540
P(injury = yes | cap = no) = 0.0999
That looks like a failure if you forget why the cap was assigned.
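Those two numbers are not a black-box artifact: they can be checked by hand from the CPTs above, with no inference engine. The raw conditional is just P(Injury = yes | Cap) = Σ_f P(f) P(Cap | f) P(Injury = yes | f, Cap) / P(Cap), summed over fragility.

```python
# Hand-checking the raw split directly from the toy model's CPTs.
p_frag = {"durable": 0.72, "fragile": 0.28}
p_cap = {("durable", "yes"): 0.22, ("durable", "no"): 0.78,
         ("fragile", "yes"): 0.75, ("fragile", "no"): 0.25}
p_inj = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
         ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}

for cap in ("yes", "no"):
    num = sum(p_frag[f] * p_cap[(f, cap)] * p_inj[(f, cap)] for f in p_frag)
    den = sum(p_frag[f] * p_cap[(f, cap)] for f in p_frag)
    print(f"P(injury=yes | cap={cap}) = {num / den:.4f}")
# P(injury=yes | cap=yes) = 0.1540
# P(injury=yes | cap=no) = 0.0999
```

The capped group looks worse purely because 75% of fragile arms end up in it, against 22% of durable arms.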
The intervention flips the sign
Now ask the question the pitching coach actually cares about:
do_cap = model.iquery(["Injury"], ["yes"], ["Cap"], ["yes"]).iloc[0]
do_no_cap = model.iquery(["Injury"], ["yes"], ["Cap"], ["no"]).iloc[0]
Now the result is:
P(injury = yes | do(cap = yes)) = 0.0960
P(injury = yes | do(cap = no)) = 0.1456
So the same cap that looked associated with harm in the raw slice is protective under intervention.
That is not a paradox. It is what targeted policy assignment does to observational comparisons.
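The intervention numbers also fall out of the backdoor adjustment over Fragility, computed by hand: P(Injury = yes | do(Cap = c)) = Σ_f P(f) P(Injury = yes | f, Cap = c). Fragility keeps its marginal distribution because do(Cap) cuts the Fragility -> Cap edge.

```python
# Backdoor adjustment by hand, from the toy model's CPTs: under do(Cap),
# fragility keeps its population marginal instead of being skewed by
# targeted cap assignment.
p_frag = {"durable": 0.72, "fragile": 0.28}
p_inj = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
         ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}

for cap in ("yes", "no"):
    risk = sum(p_frag[f] * p_inj[(f, cap)] for f in p_frag)
    print(f"P(injury=yes | do(cap={cap})) = {risk:.4f}")
# P(injury=yes | do(cap=yes)) = 0.0960
# P(injury=yes | do(cap=no)) = 0.1456
```

The only difference from the raw split is which fragility mix each arm of the comparison gets: the population mix here, the assignment-skewed mix there.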
Counterfactuals make it useful for one pitcher
Suppose you are looking at one fragile arm who received the cap and stayed in the rotation. You can ask:
model.cquery(
"Injury",
{"Fragility": "fragile", "Cap": "yes", "Rotation": "yes"},
{"Cap": "no"},
)
In this toy model, that counterfactual injury risk under no cap is 0.34.
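In this particular graph the counterfactual collapses to a table lookup, which is worth seeing explicitly. Injury's only parents are Fragility and Cap, and Fragility is observed outright in the evidence, so the abduction step is trivial; Rotation carries no extra information about Injury once both parents are fixed.

```python
# Why 0.34 is the answer: with Fragility observed and Injury depending
# only on Fragility and Cap, overriding Cap to "no" reduces the
# counterfactual to reading Injury's CPT row.
p_inj_yes = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
             ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}

observed_fragility = "fragile"   # abduction: the confounder is in the evidence
cap_override = "no"              # action: undo the cap
print(p_inj_yes[(observed_fragility, cap_override)])  # 0.34
```

In richer graphs, where the exogenous state is not directly observed, the abduction step does real work and the answer stops being a single CPT entry.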
That is a better baseball conversation than “caps work” or “caps do not work.” It is: for this kind of pitcher, how much protection is the cap buying?
What this buys you
A lot of sports medicine arguments are really causal questions hiding inside practical language.
The graph makes that visible:
- fragility is a confounder
- rotation status is not a harmless filter
- the intervention is not the same thing as the raw split
That is exactly the kind of structure a causal API should clarify instead of obscuring.
Next in the series: a congestion toll can look weak on the exact days when the city needs it most.