Hospitals deploy warning systems, dashboards, and alerts all the time. Then the first slide shows up: mortality was lower after rollout, or higher among patients who got flagged, or both depending on who is telling the story.
That is a perfect causal problem because the alert is not assigned randomly. The sickest patients are the most likely to trigger it. The alert may still help, but the raw table is already contaminated by severity.
If you do not separate who gets the alert from what the alert changes, you can make a genuinely useful system look harmful.
This post is not trying to settle a real hospital deployment from public data. It is doing something narrower and more useful: take a familiar healthcare argument, turn it into a graph, and show how py-bbn can answer the right questions once the story is stated clearly.
Start with the argument people actually make
The naive argument sounds like this:
- flagged patients died more often
- therefore the alert did not help
But a clinician hears that and immediately asks a different question:
- who was more likely to get flagged in the first place?
That is the whole setup. High-severity patients are more likely to trigger the alert, more likely to receive treatment quickly, and more likely to die even after treatment. Once that is true, P(death | alert) is not the same quantity as P(death | do(alert)).
Seen as a flow, the mistake is not subtle: severity selects who gets flagged, severity also carries baseline risk, and the alert changes treatment on top of both.

The raw flagged-versus-unflagged table collapses those distinct forces into one misleading split.
Draw the graph before you argue about the number
For a compact toy model, four nodes are enough:
- Severity: how sick the patient already is
- Alert: whether the sepsis system fires
- Antibiotics: whether treatment gets to the patient quickly
- Death: the downstream outcome
The causal picture is:
Severity -> Alert
Severity -> Antibiotics
Severity -> Death
Alert -> Antibiotics
Antibiotics -> Death
Drawn explicitly, that structure looks like this:

That is already enough to see the problem. There is a real causal path from Alert to Death through Antibiotics, but there is also a backdoor path through Severity.
With py-bbn, the first job is not inference. The first job is graph inspection.
import networkx as nx
from pybbn.graphical import get_graph_tuple, get_minimal_confounders, get_paths

g = nx.DiGraph()
g.add_edges_from(
    [
        ("Severity", "Alert"),
        ("Severity", "Antibiotics"),
        ("Severity", "Death"),
        ("Alert", "Antibiotics"),
        ("Antibiotics", "Death"),
    ]
)

gt = get_graph_tuple(g)
get_minimal_confounders(gt, "Alert", "Death")  # ['Severity']
get_paths(gt, "Alert", "Death")
That is the right kind of result. Before you compute anything, the graph tells you that severity is the node that has to be dealt with if you want an interventional story instead of a descriptive one.
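The same structural check can be cross-verified without py-bbn at all. This is an illustrative sketch using plain networkx (not part of py-bbn's API): a backdoor path is any Alert-to-Death path whose first edge points *into* Alert, which in this graph means the first hop is Severity.

```python
import networkx as nx

# Same DAG as above, rebuilt here so the snippet is self-contained.
g = nx.DiGraph([
    ("Severity", "Alert"),
    ("Severity", "Antibiotics"),
    ("Severity", "Death"),
    ("Alert", "Antibiotics"),
    ("Antibiotics", "Death"),
])

# Enumerate undirected Alert-Death paths; a path whose first edge
# enters Alert (Severity -> Alert) is a backdoor path.
for path in nx.all_simple_paths(g.to_undirected(), "Alert", "Death"):
    starts_backdoor = g.has_edge(path[1], "Alert")
    label = "backdoor" if starts_backdoor else "starts causal"
    print(label, " -> ".join(path))
```

The printout makes the confounding concrete: Alert -> Antibiotics -> Death is the causal route, while Alert <- Severity -> Death is the backdoor that the raw table silently mixes in.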
Then build a toy world you can query
Now we give the graph small, inspectable probabilities. These are not “the hospital.” They are a compact causal hypothesis you can read line by line.
from pybbn.factory import create_reasoning_model
d = {
    "nodes": ["Severity", "Alert", "Antibiotics", "Death"],
    "edges": [
        ("Severity", "Alert"),
        ("Severity", "Antibiotics"),
        ("Severity", "Death"),
        ("Alert", "Antibiotics"),
        ("Antibiotics", "Death"),
    ],
}

p = {
    "Severity": {
        "columns": ["Severity", "__p__"],
        "data": [["low", 0.82], ["high", 0.18]],
    },
    "Alert": {
        "columns": ["Severity", "Alert", "__p__"],
        "data": [
            ["low", "off", 0.82],
            ["low", "on", 0.18],
            ["high", "off", 0.15],
            ["high", "on", 0.85],
        ],
    },
    "Antibiotics": {
        "columns": ["Severity", "Alert", "Antibiotics", "__p__"],
        "data": [
            ["low", "off", "slow", 0.70],
            ["low", "off", "fast", 0.30],
            ["low", "on", "slow", 0.25],
            ["low", "on", "fast", 0.75],
            ["high", "off", "slow", 0.60],
            ["high", "off", "fast", 0.40],
            ["high", "on", "slow", 0.20],
            ["high", "on", "fast", 0.80],
        ],
    },
    "Death": {
        "columns": ["Severity", "Antibiotics", "Death", "__p__"],
        "data": [
            ["low", "slow", "no", 0.975],
            ["low", "slow", "yes", 0.025],
            ["low", "fast", "no", 0.988],
            ["low", "fast", "yes", 0.012],
            ["high", "slow", "no", 0.68],
            ["high", "slow", "yes", 0.32],
            ["high", "fast", "no", 0.80],
            ["high", "fast", "yes", 0.20],
        ],
    },
}
model = create_reasoning_model(d, p)
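One cheap habit before querying: confirm that, within each CPT, the probabilities for every parent configuration sum to 1. The checker below is a hypothetical helper written for this post (not part of py-bbn), shown against the Alert table above.

```python
from collections import defaultdict

def check_cpt(name, table):
    """Assert that, for every parent configuration, the probabilities
    over the node's own states sum to 1."""
    sums = defaultdict(float)
    for *values, prob in table["data"]:
        sums[tuple(values[:-1])] += prob  # key: parent values only
    for parents, total in sums.items():
        assert abs(total - 1.0) < 1e-9, (name, parents, total)

# Example on the Alert table from the model above:
check_cpt("Alert", {
    "columns": ["Severity", "Alert", "__p__"],
    "data": [
        ["low", "off", 0.82], ["low", "on", 0.18],
        ["high", "off", 0.15], ["high", "on", 0.85],
    ],
})
```

A malformed row-group fails loudly here, long before it produces a quietly wrong posterior downstream.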
Ask the observational question first
If you just condition on whether the alert fired, the flagged group looks much worse:
obs_on = model.pquery(nodes=["Death"], evidences=model.e({"Alert": "on"}))["Death"]
obs_off = model.pquery(nodes=["Death"], evidences=model.e({"Alert": "off"}))["Death"]
In this toy setup:
P(death = yes | alert = on) = 0.1215
P(death = yes | alert = off) = 0.0308
If you stopped there, you would say the alerted patients did much worse.
That statement is descriptively true and causally useless.
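Those two numbers can be reproduced by summing the CPTs above directly, with no inference engine involved. This is a plain-Python cross-check sketch (not py-bbn code) that marginalizes Severity and Antibiotics by hand:

```python
# CPT values transcribed from the tables above.
p_sev = {"low": 0.82, "high": 0.18}
p_alert_on = {"low": 0.18, "high": 0.85}                # P(Alert=on | Severity)
p_fast = {("low", "off"): 0.30, ("low", "on"): 0.75,
          ("high", "off"): 0.40, ("high", "on"): 0.80}  # P(fast | S, Alert)
p_die = {("low", "slow"): 0.025, ("low", "fast"): 0.012,
         ("high", "slow"): 0.32, ("high", "fast"): 0.20}  # P(death | S, Abx)

def p_death_given_alert(a):
    """P(Death=yes | Alert=a) by summing out Severity and Antibiotics."""
    num = den = 0.0
    for s in ("low", "high"):
        p_a = p_alert_on[s] if a == "on" else 1 - p_alert_on[s]
        joint = p_sev[s] * p_a                      # P(S=s, Alert=a)
        pf = p_fast[(s, a)]
        death = pf * p_die[(s, "fast")] + (1 - pf) * p_die[(s, "slow")]
        num += joint * death
        den += joint
    return num / den

print(round(p_death_given_alert("on"), 4))   # 0.1215
print(round(p_death_given_alert("off"), 4))  # 0.0308
```

The sum makes the mechanism visible: conditioning on Alert = on drags P(Severity = high) from 0.18 up to about 0.51, and that severity shift, not the alert itself, drives the ugly raw number.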
Now ask the intervention question
The quantity people actually care about is the result of setting the alert state, not merely observing it.
do_on = model.iquery(["Death"], ["yes"], ["Alert"], ["on"]).iloc[0]
do_off = model.iquery(["Death"], ["yes"], ["Alert"], ["off"]).iloc[0]
Now the story changes:
P(death = yes | do(alert = on)) = 0.0528
P(death = yes | do(alert = off)) = 0.0663
So in this model, the alert helps. The raw flagged group still looks worse because the alert is concentrated in high-severity cases.
That is exactly the kind of contradiction an operational causal API should be able to explain in plain language.
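Because Severity is the only backdoor variable, the interventional numbers can also be audited by hand via the backdoor adjustment: P(Death | do(Alert = a)) = sum over s of P(S = s) * P(Death | S = s, Alert = a). Again, a plain-Python sketch against the CPTs above, not py-bbn itself:

```python
# CPT values transcribed from the tables above.
p_sev = {"low": 0.82, "high": 0.18}
p_fast = {("low", "off"): 0.30, ("low", "on"): 0.75,
          ("high", "off"): 0.40, ("high", "on"): 0.80}  # P(fast | S, Alert)
p_die = {("low", "slow"): 0.025, ("low", "fast"): 0.012,
         ("high", "slow"): 0.32, ("high", "fast"): 0.20}  # P(death | S, Abx)

def p_death_do_alert(a):
    """Backdoor adjustment: weight P(Death | S, Alert=a) by the
    population severity mix P(S), not the alerted mix P(S | Alert=a)."""
    total = 0.0
    for s in ("low", "high"):
        pf = p_fast[(s, a)]
        death = pf * p_die[(s, "fast")] + (1 - pf) * p_die[(s, "slow")]
        total += p_sev[s] * death
    return total

print(round(p_death_do_alert("on"), 4))   # 0.0528
print(round(p_death_do_alert("off"), 4))  # 0.0663
```

The only difference from the observational sum is the weighting: population P(S) instead of P(S | Alert), and the sign of the effect flips.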
Counterfactuals are where the argument becomes clinical
Once the team accepts the population-level effect, the next question is case-level:
What about this specific high-severity patient?
model.cquery(
    "Death",
    {"Severity": "high", "Alert": "on"},
    {"Alert": "off"},
)
For a high-severity alerted case in this toy world, the counterfactual death probability under alert = off is 0.272.
That is a better clinical conversation than “alerts work” or “alerts do not work.” It is: in this particular kind of case, how much did the pathway through faster antibiotics matter?
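The 0.272 is easy to audit by hand in this toy model: once Severity is observed as high, switching the alert off changes nothing except the antibiotics mix, so the counterfactual reduces to the high-severity rows of the Death CPT weighted by P(Antibiotics | high, off). A plain-Python sketch of that reduction (not py-bbn's cquery):

```python
# Rows transcribed from the Antibiotics and Death CPTs above.
p_fast_off = 0.40                          # P(Antibiotics=fast | high, Alert=off)
p_die_high = {"slow": 0.32, "fast": 0.20}  # P(Death=yes | high, Antibiotics)

# Counterfactual death probability for a high-severity case
# under Alert=off: mix the Death rows by the off-alert antibiotics split.
cf = p_fast_off * p_die_high["fast"] + (1 - p_fast_off) * p_die_high["slow"]
print(round(cf, 3))  # 0.272
```

This reduction works because, in this model, severity is the only latent link between alert and outcome; once it is held fixed, the alert's entire effect runs through antibiotics timing.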
What this buys you
The useful move is not “trust the model.” The useful move is “state the causal claim in a form that can be inspected.”
py-bbn helps in three different ways here:
- the graph utilities tell you which backdoor story is contaminating the comparison
- the interventional query gives the population quantity people actually mean
- the counterfactual query turns the same model into a case-level question
That is the difference between a dashboard claim and a causal workflow.
Next in the series: a campaign says canvassing won the district, but the graph says they may have knocked on the easiest doors first.

