A Major League Baseball (MLB) bullpen gives up three runs in the eighth inning. The explanation is ready before the inning ends: they were overworked.
That explanation is plausible. It is also incomplete. Managers use their best relievers in high-leverage games, after short starts, and against particular matchups. Those same conditions can make late runs more likely even if fatigue is not the whole cause.
Recent bullpen workload can raise late-game risk, but the raw workload comparison also carries the game state that created the workload.
This article uses a code-first walkthrough: build the graph, set the parameters, then ask how much late-run pressure changes when recent workload is set by intervention rather than simply observed.
Build the graph
MLB plays a long daily schedule during the week of May 4, 2026 [1]. That makes bullpen management visible: a reliever who pitched yesterday may be available, unavailable, or less sharp today.
The directed acyclic graph (DAG), a graph whose arrows do not loop back on themselves, uses RecentWorkload as the treatment and LateRunsAllowed as the outcome:
- GameLeverage: pressure that makes managers use high-value relievers
- StarterShortStart: an early exit by the starter
- RecentWorkload: recent appearances and pitch stress
- RelieverFatigue: the physiological fatigue channel
- MatchupQuality: whether the available reliever fits the hitters
- LateRunsAllowed: late-game run damage

Here is the graph and its toy parameters as a py-scm setup. This uses the py-scm reference implementation for a continuous Gaussian structural causal model (SCM). An SCM defines each variable through direct causes. The coefficients are illustrative.
import numpy as np
from pyscm.reasoning import create_reasoning_model

nodes = [
    "GameLeverage",
    "StarterShortStart",
    "RecentWorkload",
    "RelieverFatigue",
    "MatchupQuality",
    "LateRunsAllowed",
]

# (parent, child, weight): illustrative linear coefficients, not estimates.
weighted_edges = [
    ("GameLeverage", "RecentWorkload", 0.60),
    ("GameLeverage", "LateRunsAllowed", 0.25),
    ("StarterShortStart", "RecentWorkload", 0.55),
    ("StarterShortStart", "LateRunsAllowed", 0.30),
    ("RecentWorkload", "RelieverFatigue", 0.85),
    ("RecentWorkload", "MatchupQuality", -0.25),
    ("RecentWorkload", "LateRunsAllowed", 0.10),
    ("RelieverFatigue", "LateRunsAllowed", 0.75),
    ("MatchupQuality", "LateRunsAllowed", -0.45),
]

# B[child, parent] holds each direct effect, so X = B @ X + noise.
idx = {node: i for i, node in enumerate(nodes)}
B = np.zeros((len(nodes), len(nodes)))
for parent, child, weight in weighted_edges:
    B[idx[child], idx[parent]] = weight

# Implied covariance of X; the identity matrix is the unit-variance noise covariance.
A = np.eye(len(nodes)) - B
A_inv = np.linalg.inv(A)
cov = A_inv @ np.eye(len(nodes)) @ A_inv.T

# Graph structure plus Gaussian parameters (variables, means, covariance).
model = create_reasoning_model(
    {"nodes": nodes, "edges": [(p, c) for p, c, _ in weighted_edges]},
    {"v": nodes, "m": [0.0] * len(nodes), "S": cov.tolist()},
)
The key backdoor path is:
RecentWorkload <- GameLeverage -> LateRunsAllowed
Workload is partly selected by hard games. MLB Statcast data makes it possible to describe pitches and batted balls in much richer detail, but a causal question still has to separate usage from the state that selected usage [2].
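To list every backdoor path rather than only the key one, a small search over the graph is enough. The sketch below is a plain-Python illustration and not part of py-scm: the helper name backdoor_paths is hypothetical, and it only enumerates simple paths that start with an arrow into the treatment. It does not check whether a path is blocked by a collider, which does not matter in this graph because both paths it finds are simple forks.

```python
# Same edges as the DAG above, without weights.
edges = [
    ("GameLeverage", "RecentWorkload"),
    ("GameLeverage", "LateRunsAllowed"),
    ("StarterShortStart", "RecentWorkload"),
    ("StarterShortStart", "LateRunsAllowed"),
    ("RecentWorkload", "RelieverFatigue"),
    ("RecentWorkload", "MatchupQuality"),
    ("RecentWorkload", "LateRunsAllowed"),
    ("RelieverFatigue", "LateRunsAllowed"),
    ("MatchupQuality", "LateRunsAllowed"),
]


def backdoor_paths(treatment, outcome):
    """Hypothetical helper: simple paths whose first arrow points INTO treatment."""
    neighbors = {}
    for parent, child in edges:
        neighbors.setdefault(parent, []).append((child, "->"))
        neighbors.setdefault(child, []).append((parent, "<-"))

    found = []

    def walk(node, visited, rendered):
        if node == outcome:
            found.append(rendered)
            return
        for nxt, arrow in neighbors[node]:
            if nxt in visited:
                continue
            # Leaving the treatment along an outgoing arrow is a causal path, not a backdoor.
            if node == treatment and arrow == "->":
                continue
            walk(nxt, visited | {nxt}, f"{rendered} {arrow} {nxt}")

    walk(treatment, {treatment}, treatment)
    return found


paths = backdoor_paths("RecentWorkload", "LateRunsAllowed")
for path in paths:
    print(path)
```

In this graph the search returns the GameLeverage fork and a second fork of the same shape through StarterShortStart, so both variables carry game state into the raw comparison.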
Ask the fatigue question
The inference code asks whether pushing workload changes late-run pressure:
raw_slice = model.pquery({"RecentWorkload": 1.0})[0]["LateRunsAllowed"]
do_loaded = model.iquery("LateRunsAllowed", {"RecentWorkload": 1.0})
do_fresh = model.iquery("LateRunsAllowed", {"RecentWorkload": 0.0})
workload_effect = model.equery(
    "LateRunsAllowed",
    {"RecentWorkload": 1.0},
    {"RecentWorkload": 0.0},
)
raw_slice observes a high-workload bullpen. workload_effect asks what changes when workload is set directly.
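The difference between observing and setting workload can also be seen by simulating the structural equations directly. This is a minimal sketch that does not use py-scm: it assumes the toy coefficients above with independent unit-variance Gaussian noise, and the function name draw, the sample size, and the seeds are arbitrary choices made for illustration. Setting workload replaces its equation with a constant, while observing leaves leverage and short starts free to drive it.

```python
import random


def draw(rng, workload=None):
    """One simulated game; passing workload overrides its equation (an intervention)."""
    leverage = rng.gauss(0, 1)
    short_start = rng.gauss(0, 1)
    # The natural value is always drawn so the noise stream stays aligned across worlds.
    natural = 0.60 * leverage + 0.55 * short_start + rng.gauss(0, 1)
    w = natural if workload is None else workload
    fatigue = 0.85 * w + rng.gauss(0, 1)
    matchup = -0.25 * w + rng.gauss(0, 1)
    late_runs = (
        0.25 * leverage + 0.30 * short_start + 0.10 * w
        + 0.75 * fatigue - 0.45 * matchup + rng.gauss(0, 1)
    )
    return w, late_runs


N = 20_000

# Observational world: least-squares slope of late runs on workload.
rng = random.Random(7)
observed = [draw(rng) for _ in range(N)]
mean_w = sum(w for w, _ in observed) / N
mean_l = sum(l for _, l in observed) / N
cov_wl = sum((w - mean_w) * (l - mean_l) for w, l in observed) / N
var_w = sum((w - mean_w) ** 2 for w, _ in observed) / N
observed_slope = cov_wl / var_w

# Interventional worlds: same seed, so both see identical games apart from workload.
rng_loaded, rng_fresh = random.Random(11), random.Random(11)
loaded = sum(draw(rng_loaded, workload=1.0)[1] for _ in range(N)) / N
fresh = sum(draw(rng_fresh, workload=0.0)[1] for _ in range(N)) / N
intervention_effect = loaded - fresh

print(f"observed slope: {observed_slope:+.2f}")
print(f"intervention effect: {intervention_effect:+.2f}")
```

Because both interventional runs reuse the same noise, their difference isolates what workload itself changes. The observed slope carries sampling error and also absorbs the games that produced heavy workloads in the first place.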
Observed workload gap: +1.04
Intervention workload effect: +0.85
Observed minus effect: +0.19
The intervention effect is still positive, so workload matters in this toy model. The raw gap is larger because high workload is selected by leverage and short starts.
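Both numbers can be checked by hand with path tracing. The sketch below is a plain-Python cross-check rather than the py-scm computation: it assumes the linear model with independent unit-variance noise (the identity matrix in the setup code), and the helper names total_effect and cov are introduced here for illustration. The intervention effect is the sum of edge-weight products over every directed path from RecentWorkload to LateRunsAllowed, and the observed gap is the regression slope Cov(workload, late runs) / Var(workload).

```python
nodes = [
    "GameLeverage", "StarterShortStart", "RecentWorkload",
    "RelieverFatigue", "MatchupQuality", "LateRunsAllowed",
]
weighted_edges = [
    ("GameLeverage", "RecentWorkload", 0.60),
    ("GameLeverage", "LateRunsAllowed", 0.25),
    ("StarterShortStart", "RecentWorkload", 0.55),
    ("StarterShortStart", "LateRunsAllowed", 0.30),
    ("RecentWorkload", "RelieverFatigue", 0.85),
    ("RecentWorkload", "MatchupQuality", -0.25),
    ("RecentWorkload", "LateRunsAllowed", 0.10),
    ("RelieverFatigue", "LateRunsAllowed", 0.75),
    ("MatchupQuality", "LateRunsAllowed", -0.45),
]

children = {}
for parent, child, weight in weighted_edges:
    children.setdefault(parent, []).append((child, weight))


def total_effect(source, target):
    """Sum of edge-weight products over all directed paths from source to target."""
    if source == target:
        return 1.0
    return sum(w * total_effect(child, target) for child, w in children.get(source, []))


def cov(a, b):
    """Covariance implied by unit-variance independent noise at every node."""
    return sum(total_effect(n, a) * total_effect(n, b) for n in nodes)


effect = total_effect("RecentWorkload", "LateRunsAllowed")
slope = cov("RecentWorkload", "LateRunsAllowed") / cov("RecentWorkload", "RecentWorkload")

print(f"Intervention workload effect: {effect:+.2f}")  # 0.10 + 0.85*0.75 + (-0.25)*(-0.45)
print(f"Observed workload gap: {slope:+.2f}")
print(f"Observed minus effect: {slope - effect:+.2f}")
```

The three directed paths contribute 0.10 (direct), 0.6375 (through fatigue), and 0.1125 (through matchup quality), which sum to 0.85. The extra 0.19 in the observed gap comes entirely from GameLeverage and StarterShortStart, which raise workload and late runs at the same time.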

What a manager can do with the distinction
The causal lesson is practical. If fatigue is the mechanism, rest and usage planning help. If matchup quality is the mechanism, the problem may be roster construction or tactical sequencing. If leverage selection is the mechanism, the bullpen may look tired because the team keeps playing tight games.
Those are different fixes. A box score does not separate them by itself.
Sources
- [1] MLB Schedule 2026, MLB, accessed April 28, 2026.
- [2] Statcast: Glossary of terms, MLB, accessed April 28, 2026.
- [3] Manifestations of muscle fatigue in baseball pitchers: a systematic review, PeerJ via PubMed Central, accessed April 28, 2026.
Download the runnable standalone Python example: Python example ZIP.

