When a front office starts capping pitch counts, the first objection is easy to predict:
- the capped pitchers got hurt anyway
- therefore the cap did not work
That argument sounds practical. It is also almost guaranteed to confuse correlation with intervention, because teams do not place stricter limits on random arms. They place them on the pitchers they already fear are fragile.
A workload cap can look associated with injury simply because the cap was assigned to the riskiest arms.
This is one of the cleanest sports examples of why causal reasoning matters. The policy is targeted. The outcome is high stakes. And the wrong subgroup analysis is always waiting nearby.
Start with the baseball version, not the textbook version
The baseball conversation usually sounds like this:
- capped pitchers missed time anyway
- uncapped pitchers looked sturdier
But the training staff was not blind. If one arm already looked fragile, that pitcher was more likely to get the cap in the first place.
There is also a second trap. Analysts often look only at pitchers who stayed in the rotation long enough to keep accumulating starts. That sounds sensible until you notice that staying in the rotation is partly affected by the cap and partly affected by fragility. Now you are conditioning on a post-treatment selection node.
Those two distortions are easier to see in one flow than in a table:

First the cap gets assigned to the arms the staff already fears. Then a rotation-only slice can bias the picture again by filtering on a post-cap survivor node.
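Both distortions can be reproduced in a few lines of plain Python, with no inference library at all. This is a minimal sketch that reuses the CPT numbers from the toy model built below: the cap lowers injury risk for every arm, fragile or durable, yet both the raw split and the rotation-survivor split make the capped group look worse.

```python
# Simulation sketch: targeted caps plus survivor filtering, using the
# toy model's CPT numbers. The cap is protective for every arm, but
# observational slices point the other way because fragile arms get
# capped far more often.
import random

random.seed(7)

P_FRAGILE = 0.28
P_CAP = {"durable": 0.22, "fragile": 0.75}
P_INJURY = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
            ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}
P_ROTATION = {("durable", "no"): 0.90, ("durable", "yes"): 0.96,
              ("fragile", "no"): 0.65, ("fragile", "yes"): 0.82}

raw = {"yes": [0, 0], "no": [0, 0]}    # cap -> [injuries, pitchers]
surv = {"yes": [0, 0], "no": [0, 0]}   # same counts, rotation survivors only

for _ in range(200_000):
    frag = "fragile" if random.random() < P_FRAGILE else "durable"
    cap = "yes" if random.random() < P_CAP[frag] else "no"
    injured = random.random() < P_INJURY[(frag, cap)]
    in_rotation = random.random() < P_ROTATION[(frag, cap)]
    raw[cap][0] += injured
    raw[cap][1] += 1
    if in_rotation:
        surv[cap][0] += injured
        surv[cap][1] += 1

for cap in ("yes", "no"):
    print(f"cap={cap}: raw risk {raw[cap][0] / raw[cap][1]:.3f}, "
          f"rotation-only {surv[cap][0] / surv[cap][1]:.3f}")
```

The raw capped-group risk lands near 0.15 against roughly 0.10 for the uncapped group, even though every individual arm is safer with the cap.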
Make the graph explicit
Here is the compact story:
- Fragility: baseline arm risk
- Cap: whether the pitcher gets the pitch-count limit
- Rotation: whether the pitcher remains in the active rotation window you are analyzing
- Injury: downstream injury outcome
The edges are:
- Fragility -> Cap
- Fragility -> Rotation
- Fragility -> Injury
- Cap -> Rotation
- Cap -> Injury
Drawn as a graph, that story looks like this:

The first question is a graph question, not an estimation question.
import networkx as nx

from pybbn.graphical import get_graph_tuple, get_minimal_confounders

g = nx.DiGraph()
g.add_edges_from(
    [
        ("Fragility", "Cap"),
        ("Fragility", "Rotation"),
        ("Fragility", "Injury"),
        ("Cap", "Rotation"),
        ("Cap", "Injury"),
    ]
)

gt = get_graph_tuple(g)
get_minimal_confounders(gt, "Cap", "Injury")
# ['Fragility']
That is the essential result. If you do not account for baseline fragility, the raw cap-versus-no-cap comparison is already biased. If you then filter to rotation survivors, you make the story even less trustworthy.
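The same conclusion can be checked with plain networkx, no py-bbn required. This is the standard backdoor test: delete Cap's outgoing edges so only backdoor paths remain, then ask whether Fragility d-separates Cap from Injury in what is left. (networkx renamed its `d_separated` function to `is_d_separator` in newer releases, so the sketch below picks whichever name is available.)

```python
# Backdoor check with plain networkx: remove Cap's outgoing edges, then
# test whether {Fragility} blocks every remaining path from Cap to Injury.
import networkx as nx

g = nx.DiGraph([
    ("Fragility", "Cap"), ("Fragility", "Rotation"), ("Fragility", "Injury"),
    ("Cap", "Rotation"), ("Cap", "Injury"),
])

backdoor = g.copy()
backdoor.remove_edges_from(list(g.out_edges("Cap")))

# d_separated was renamed is_d_separator in recent networkx versions.
d_sep = getattr(nx, "is_d_separator", None) or getattr(nx, "d_separated")

print(d_sep(backdoor, {"Cap"}, {"Injury"}, {"Fragility"}))  # True: adjusting works
print(d_sep(backdoor, {"Cap"}, {"Injury"}, set()))          # False: raw split is biased
```

The only backdoor path here is Cap <- Fragility -> Injury, and conditioning on Fragility blocks it; conditioning on nothing leaves it open.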
Build the toy model in py-bbn
from pybbn.factory import create_reasoning_model
d = {
    "nodes": ["Fragility", "Cap", "Rotation", "Injury"],
    "edges": [
        ("Fragility", "Cap"),
        ("Fragility", "Rotation"),
        ("Fragility", "Injury"),
        ("Cap", "Rotation"),
        ("Cap", "Injury"),
    ],
}

p = {
    "Fragility": {
        "columns": ["Fragility", "__p__"],
        "data": [["durable", 0.72], ["fragile", 0.28]],
    },
    "Cap": {
        "columns": ["Fragility", "Cap", "__p__"],
        "data": [
            ["durable", "no", 0.78],
            ["durable", "yes", 0.22],
            ["fragile", "no", 0.25],
            ["fragile", "yes", 0.75],
        ],
    },
    "Rotation": {
        "columns": ["Fragility", "Cap", "Rotation", "__p__"],
        "data": [
            ["durable", "no", "no", 0.10],
            ["durable", "no", "yes", 0.90],
            ["durable", "yes", "no", 0.04],
            ["durable", "yes", "yes", 0.96],
            ["fragile", "no", "no", 0.35],
            ["fragile", "no", "yes", 0.65],
            ["fragile", "yes", "no", 0.18],
            ["fragile", "yes", "yes", 0.82],
        ],
    },
    "Injury": {
        "columns": ["Fragility", "Cap", "Injury", "__p__"],
        "data": [
            ["durable", "no", "no", 0.93],
            ["durable", "no", "yes", 0.07],
            ["durable", "yes", "no", 0.96],
            ["durable", "yes", "yes", 0.04],
            ["fragile", "no", "no", 0.66],
            ["fragile", "no", "yes", 0.34],
            ["fragile", "yes", "no", 0.76],
            ["fragile", "yes", "yes", 0.24],
        ],
    },
}
model = create_reasoning_model(d, p)
The raw comparison says the cap group looks worse
If you only condition on the cap assignment, the capped group looks riskier:
capped = model.pquery(nodes=["Injury"], evidences=model.e({"Cap": "yes"}))["Injury"]
uncapped = model.pquery(nodes=["Injury"], evidences=model.e({"Cap": "no"}))["Injury"]
In this toy setup:
P(injury = yes | cap = yes) = 0.1540
P(injury = yes | cap = no) = 0.0999
That looks like a failure if you forget why the cap was assigned.
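Those two numbers are not a black-box artifact: they can be checked by hand from the CPTs above, with no inference engine. The raw conditional is just P(Injury = yes | Cap) = Σ_f P(f) P(Cap | f) P(Injury = yes | f, Cap) / P(Cap), summed over fragility.

```python
# Hand-checking the raw split directly from the toy model's CPTs.
p_frag = {"durable": 0.72, "fragile": 0.28}
p_cap = {("durable", "yes"): 0.22, ("durable", "no"): 0.78,
         ("fragile", "yes"): 0.75, ("fragile", "no"): 0.25}
p_inj = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
         ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}

for cap in ("yes", "no"):
    num = sum(p_frag[f] * p_cap[(f, cap)] * p_inj[(f, cap)] for f in p_frag)
    den = sum(p_frag[f] * p_cap[(f, cap)] for f in p_frag)
    print(f"P(injury=yes | cap={cap}) = {num / den:.4f}")
# P(injury=yes | cap=yes) = 0.1540
# P(injury=yes | cap=no) = 0.0999
```

The capped group looks worse purely because 75% of fragile arms end up in it, against 22% of durable arms.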
The intervention flips the sign
Now ask the question the pitching coach actually cares about:
do_cap = model.iquery(["Injury"], ["yes"], ["Cap"], ["yes"]).iloc[0]
do_no_cap = model.iquery(["Injury"], ["yes"], ["Cap"], ["no"]).iloc[0]
Now the result is:
P(injury = yes | do(cap = yes)) = 0.0960
P(injury = yes | do(cap = no)) = 0.1456
So the same cap that looked associated with harm in the raw slice is protective under intervention.
That is not a paradox. It is what targeted policy assignment does to observational comparisons.
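The intervention numbers also fall out of the backdoor adjustment over Fragility, computed by hand: P(Injury = yes | do(Cap = c)) = Σ_f P(f) P(Injury = yes | f, Cap = c). Fragility keeps its marginal distribution because do(Cap) cuts the Fragility -> Cap edge.

```python
# Backdoor adjustment by hand, from the toy model's CPTs: under do(Cap),
# fragility keeps its population marginal instead of being skewed by
# targeted cap assignment.
p_frag = {"durable": 0.72, "fragile": 0.28}
p_inj = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
         ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}

for cap in ("yes", "no"):
    risk = sum(p_frag[f] * p_inj[(f, cap)] for f in p_frag)
    print(f"P(injury=yes | do(cap={cap})) = {risk:.4f}")
# P(injury=yes | do(cap=yes)) = 0.0960
# P(injury=yes | do(cap=no)) = 0.1456
```

The only difference from the raw split is which fragility mix each arm of the comparison gets: the population mix here, the assignment-skewed mix there.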
Counterfactuals make it useful for one pitcher
Suppose you are looking at one fragile arm who received the cap and stayed in the rotation. You can ask:
model.cquery(
"Injury",
{"Fragility": "fragile", "Cap": "yes", "Rotation": "yes"},
{"Cap": "no"},
)
In this toy model, that counterfactual injury risk under no cap is 0.34.
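In this particular graph the counterfactual collapses to a table lookup, which is worth seeing explicitly. Injury's only parents are Fragility and Cap, and Fragility is observed outright in the evidence, so the abduction step is trivial; Rotation carries no extra information about Injury once both parents are fixed.

```python
# Why 0.34 is the answer: with Fragility observed and Injury depending
# only on Fragility and Cap, overriding Cap to "no" reduces the
# counterfactual to reading Injury's CPT row.
p_inj_yes = {("durable", "no"): 0.07, ("durable", "yes"): 0.04,
             ("fragile", "no"): 0.34, ("fragile", "yes"): 0.24}

observed_fragility = "fragile"   # abduction: the confounder is in the evidence
cap_override = "no"              # action: undo the cap
print(p_inj_yes[(observed_fragility, cap_override)])  # 0.34
```

In richer graphs, where the exogenous state is not directly observed, the abduction step does real work and the answer stops being a single CPT entry.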
That is a better baseball conversation than “caps work” or “caps do not work.” It is: for this kind of pitcher, how much protection is the cap buying?
What this buys you
A lot of sports medicine arguments are really causal questions hiding inside practical language.
The graph makes that visible:
- fragility is a confounder
- rotation status is not a harmless filter
- the intervention is not the same thing as the raw split
That is exactly the kind of structure a causal API should clarify instead of obscuring.
Next in the series: a congestion toll can look weak on the exact days when the city needs it most.