Causality APIs 21: The Bullpen Was Overworked. Did That Cost The Game?


A Major League Baseball (MLB) bullpen gives up three runs in the eighth inning. The explanation is ready before the inning ends: the relievers were overworked.

That explanation is plausible. It is also incomplete. Managers use their best relievers in high-leverage games, after short starts, and against particular matchups. Those same conditions can make late runs more likely even if fatigue is not the whole cause.

Recent bullpen workload can raise late-game risk, but the raw workload comparison also carries the game state that created the workload.


This article uses a code-first walkthrough: build the graph, set the parameters, then ask how much late-run pressure changes when recent workload is pushed.

Build the graph

MLB plays a long daily schedule during the week of May 4, 2026 [1]. That makes bullpen management visible: a reliever who pitched yesterday may be available, unavailable, or less sharp today.

The directed acyclic graph (DAG), a graph whose arrows do not loop back on themselves, uses RecentWorkload as the treatment and LateRunsAllowed as the outcome:

  • GameLeverage: pressure that makes managers use high-value relievers
  • StarterShortStart: an early exit by the starter
  • RecentWorkload: recent appearances and pitch stress
  • RelieverFatigue: the physiological fatigue channel
  • MatchupQuality: whether the available reliever fits the hitters
  • LateRunsAllowed: late-game run damage

Directed acyclic graph linking game leverage, short starts, recent workload, reliever fatigue, matchup quality, and late runs allowed
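Before attaching any numbers, it can help to confirm that the arrows really do not loop back. The following is a minimal sketch using only the Python standard library, not part of py-scm; the helper name topological_order is an assumption made for illustration, and the edges are the same ones used in the py-scm setup.

```python
from graphlib import CycleError, TopologicalSorter

# (parent, child) pairs, copied from the article's graph.
edges = [
    ("GameLeverage", "RecentWorkload"),
    ("GameLeverage", "LateRunsAllowed"),
    ("StarterShortStart", "RecentWorkload"),
    ("StarterShortStart", "LateRunsAllowed"),
    ("RecentWorkload", "RelieverFatigue"),
    ("RecentWorkload", "MatchupQuality"),
    ("RecentWorkload", "LateRunsAllowed"),
    ("RelieverFatigue", "LateRunsAllowed"),
    ("MatchupQuality", "LateRunsAllowed"),
]


def topological_order(edge_list):
    """Return the nodes with every parent before its children.

    Raises graphlib.CycleError if the arrows loop back on themselves.
    (Hypothetical helper, named here only for illustration.)
    """
    predecessors = {}
    for parent, child in edge_list:
        predecessors.setdefault(parent, set())
        predecessors.setdefault(child, set()).add(parent)
    return list(TopologicalSorter(predecessors).static_order())


order = topological_order(edges)
print(order)
```

If an ordering comes back, the graph is a DAG. LateRunsAllowed has no outgoing arrows, so it always lands last.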

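Here is the graph and its toy parameters as a py-scm setup. This uses the py-scm reference implementation for a continuous Gaussian structural causal model (SCM). An SCM defines each variable as a function of its direct causes plus its own noise term; here every function is linear and every noise term has unit variance. The coefficients are illustrative, not estimated from game data.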

import numpy as np

from pyscm.reasoning import create_reasoning_model

nodes = [
    "GameLeverage",
    "StarterShortStart",
    "RecentWorkload",
    "RelieverFatigue",
    "MatchupQuality",
    "LateRunsAllowed",
]

weighted_edges = [
    ("GameLeverage", "RecentWorkload", 0.60),
    ("GameLeverage", "LateRunsAllowed", 0.25),
    ("StarterShortStart", "RecentWorkload", 0.55),
    ("StarterShortStart", "LateRunsAllowed", 0.30),
    ("RecentWorkload", "RelieverFatigue", 0.85),
    ("RecentWorkload", "MatchupQuality", -0.25),
    ("RecentWorkload", "LateRunsAllowed", 0.10),
    ("RelieverFatigue", "LateRunsAllowed", 0.75),
    ("MatchupQuality", "LateRunsAllowed", -0.45),
]

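# Coefficient matrix: B[child, parent] holds the weight of the edge parent -> child.
idx = {node: i for i, node in enumerate(nodes)}
B = np.zeros((len(nodes), len(nodes)))
for parent, child, weight in weighted_edges:
    B[idx[child], idx[parent]] = weight

# For x = Bx + e with independent unit-variance noise e,
# Cov(x) = (I - B)^-1 (I - B)^-T.
A_inv = np.linalg.inv(np.eye(len(nodes)) - B)
cov = A_inv @ A_inv.T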

model = create_reasoning_model(
    {"nodes": nodes, "edges": [(p, c) for p, c, _ in weighted_edges]},
    {"v": nodes, "m": [0.0] * len(nodes), "S": cov.tolist()},
)
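To make "each variable is defined through its direct causes" concrete, here is a minimal sketch that draws synthetic games from the same structural equations by hand. It does not use py-scm. It assumes zero means and independent unit-variance Gaussian noise, which is what the identity matrix in the covariance line implies; sample_game and the sample size are illustrative choices.

```python
import random

# (parent, child, coefficient), copied from the setup above.
weighted_edges = [
    ("GameLeverage", "RecentWorkload", 0.60),
    ("GameLeverage", "LateRunsAllowed", 0.25),
    ("StarterShortStart", "RecentWorkload", 0.55),
    ("StarterShortStart", "LateRunsAllowed", 0.30),
    ("RecentWorkload", "RelieverFatigue", 0.85),
    ("RecentWorkload", "MatchupQuality", -0.25),
    ("RecentWorkload", "LateRunsAllowed", 0.10),
    ("RelieverFatigue", "LateRunsAllowed", 0.75),
    ("MatchupQuality", "LateRunsAllowed", -0.45),
]

# Listed so that every parent appears before its children.
order = [
    "GameLeverage",
    "StarterShortStart",
    "RecentWorkload",
    "RelieverFatigue",
    "MatchupQuality",
    "LateRunsAllowed",
]


def sample_game(rng):
    """Draw one synthetic game: node = weighted parents + N(0, 1) noise."""
    values = {}
    for node in order:
        parent_part = sum(w * values[p] for p, c, w in weighted_edges if c == node)
        values[node] = parent_part + rng.gauss(0.0, 1.0)
    return values


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_game(rng) for _ in range(20000)]

workload = [d["RecentWorkload"] for d in draws]
mean_w = sum(workload) / len(workload)
var_w = sum((w - mean_w) ** 2 for w in workload) / (len(workload) - 1)
print(f"sample Var(RecentWorkload) = {var_w:.2f}")
```

The sample variance should land near 1 + 0.60^2 + 0.55^2 = 1.6625, the value the analytic covariance assigns to RecentWorkload: one unit of its own noise plus what it inherits from leverage and short starts.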

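The key backdoor path is:

RecentWorkload <- GameLeverage -> LateRunsAllowed

StarterShortStart opens a second backdoor path of the same shape:

RecentWorkload <- StarterShortStart -> LateRunsAllowed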

Workload is partly selected by hard games. MLB Statcast data makes it possible to describe pitches and batted balls in much richer detail, but a causal question still has to separate usage from the state that selected usage [2].
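The common causes can also be read off the edge list mechanically. The sketch below is a simplified check, not a full d-separation routine and not a py-scm call: it reports the parents of the treatment that also reach the outcome by a route that avoids the treatment. The helper names reaches and common_causes are assumptions for illustration.

```python
# (parent, child) pairs, copied from the article's graph.
edges = [
    ("GameLeverage", "RecentWorkload"),
    ("GameLeverage", "LateRunsAllowed"),
    ("StarterShortStart", "RecentWorkload"),
    ("StarterShortStart", "LateRunsAllowed"),
    ("RecentWorkload", "RelieverFatigue"),
    ("RecentWorkload", "MatchupQuality"),
    ("RecentWorkload", "LateRunsAllowed"),
    ("RelieverFatigue", "LateRunsAllowed"),
    ("MatchupQuality", "LateRunsAllowed"),
]


def reaches(start, target, edge_list, blocked):
    """True if a directed path runs from start to target without passing through blocked."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen or node == blocked:
            continue
        seen.add(node)
        stack.extend(c for p, c in edge_list if p == node)
    return False


def common_causes(treatment, outcome, edge_list):
    """Parents of the treatment that also drive the outcome outside the treatment."""
    parents = [p for p, c in edge_list if c == treatment]
    return sorted(p for p in parents if reaches(p, outcome, edge_list, blocked=treatment))


confounders = common_causes("RecentWorkload", "LateRunsAllowed", edges)
print(confounders)  # ['GameLeverage', 'StarterShortStart']
```

Both GameLeverage and StarterShortStart come back, so the raw workload comparison carries two sources of selection in this graph, not one.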

Ask the fatigue question

The inference code asks whether pushing workload changes late-run pressure:

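# Observational query: condition on seeing RecentWorkload = 1.0 and read
# the resulting mean of LateRunsAllowed.
raw_slice = model.pquery({"RecentWorkload": 1.0})[0]["LateRunsAllowed"]

# Interventional queries: set RecentWorkload directly instead of observing it.
do_loaded = model.iquery("LateRunsAllowed", {"RecentWorkload": 1.0})
do_fresh = model.iquery("LateRunsAllowed", {"RecentWorkload": 0.0})

# Effect of moving workload from 0.0 to 1.0 under intervention.
workload_effect = model.equery(
    "LateRunsAllowed",
    {"RecentWorkload": 1.0},
    {"RecentWorkload": 0.0},
)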

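raw_slice conditions on observing a high-workload bullpen, so it inherits whatever game state produced that workload. workload_effect asks what changes when workload is set directly to 1.0 rather than 0.0, which removes the influence of leverage and short starts on workload.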

Observed workload gap:        +1.04
Intervention workload effect: +0.85
Observed minus effect:        +0.19

The intervention effect is still positive, so workload matters in this toy model. The raw gap is larger because high workload is selected by leverage and short starts, and both of those also raise late runs on their own.

Observed bullpen-workload slice compared with the intervention effect on late runs allowed
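The three numbers above can be reproduced by hand from the coefficients, without py-scm. The sketch below assumes the same linear model with zero means and independent unit-variance noise. Under those assumptions the intervention effect is the sum over directed paths of the products of edge weights, and the observed gap at workload 1.0 is the regression slope Cov(RecentWorkload, LateRunsAllowed) / Var(RecentWorkload). The helper names total_effect and covariance are illustrative.

```python
from functools import lru_cache

# (parent, child, coefficient), copied from the setup above.
weighted_edges = [
    ("GameLeverage", "RecentWorkload", 0.60),
    ("GameLeverage", "LateRunsAllowed", 0.25),
    ("StarterShortStart", "RecentWorkload", 0.55),
    ("StarterShortStart", "LateRunsAllowed", 0.30),
    ("RecentWorkload", "RelieverFatigue", 0.85),
    ("RecentWorkload", "MatchupQuality", -0.25),
    ("RecentWorkload", "LateRunsAllowed", 0.10),
    ("RelieverFatigue", "LateRunsAllowed", 0.75),
    ("MatchupQuality", "LateRunsAllowed", -0.45),
]
nodes = sorted({n for p, c, _ in weighted_edges for n in (p, c)})


@lru_cache(maxsize=None)
def total_effect(source, target):
    """Sum over all directed paths source -> target of the product of edge weights."""
    if source == target:
        return 1.0
    return sum(w * total_effect(c, target) for p, c, w in weighted_edges if p == source)


def covariance(a, b):
    """Cov(a, b) when every node carries independent unit-variance noise."""
    return sum(total_effect(k, a) * total_effect(k, b) for k in nodes)


treatment, outcome = "RecentWorkload", "LateRunsAllowed"
intervention_effect = total_effect(treatment, outcome)
observed_gap = covariance(treatment, outcome) / covariance(treatment, treatment)

print(f"Observed workload gap:        {observed_gap:+.2f}")
print(f"Intervention workload effect: {intervention_effect:+.2f}")
print(f"Observed minus effect:        {observed_gap - intervention_effect:+.2f}")
```

The arithmetic behind the observed gap is 1.728125 / 1.6625, about 1.04. The extra 0.19 over the intervention effect comes entirely from the GameLeverage and StarterShortStart terms in the covariance.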

What a manager can do with the distinction

The causal lesson is practical. If fatigue is the mechanism, rest and usage planning help. If matchup quality is the mechanism, the problem may be roster construction or tactical sequencing. If leverage selection is the mechanism, the bullpen may look tired because the team keeps playing tight games.

Those are different fixes. A box score does not separate them by itself.
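In the toy model, the split between mechanisms can be made explicit by listing each directed route from workload to late runs with its share of the intervention effect. This is a minimal sketch that assumes the linear coefficients above, where a route's contribution is the product of its edge weights; directed_paths is an illustrative helper, not a py-scm function.

```python
# (parent, child, coefficient), copied from the setup above.
weighted_edges = [
    ("GameLeverage", "RecentWorkload", 0.60),
    ("GameLeverage", "LateRunsAllowed", 0.25),
    ("StarterShortStart", "RecentWorkload", 0.55),
    ("StarterShortStart", "LateRunsAllowed", 0.30),
    ("RecentWorkload", "RelieverFatigue", 0.85),
    ("RecentWorkload", "MatchupQuality", -0.25),
    ("RecentWorkload", "LateRunsAllowed", 0.10),
    ("RelieverFatigue", "LateRunsAllowed", 0.75),
    ("MatchupQuality", "LateRunsAllowed", -0.45),
]


def directed_paths(source, target, path=None, weight=1.0):
    """Yield (node list, product of edge weights) for each directed path source -> target."""
    path = (path or []) + [source]
    if source == target:
        yield path, weight
        return
    for parent, child, w in weighted_edges:
        if parent == source:
            yield from directed_paths(child, target, path, weight * w)


contributions = {
    " -> ".join(path): weight
    for path, weight in directed_paths("RecentWorkload", "LateRunsAllowed")
}

for route, weight in contributions.items():
    print(f"{weight:+.4f}  {route}")
print(f"{sum(contributions.values()):+.4f}  total")
```

With these illustrative coefficients the fatigue route contributes 0.85 x 0.75 = 0.6375, the matchup route (-0.25) x (-0.45) = 0.1125, and the direct arrow 0.10, which sum to the 0.85 intervention effect. Leverage selection does not appear in this list because it is not a consequence of workload; it shows up only in the gap between the observed comparison and the intervention effect.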

Sources

  1. MLB Schedule 2026, MLB, accessed April 28, 2026.
  2. Statcast: Glossary of terms, MLB, accessed April 28, 2026.
  3. Manifestations of muscle fatigue in baseball pitchers: a systematic review, PeerJ via PubMed Central, accessed April 28, 2026.

