LeadScope 05: The Average Lied

One of the fastest ways to lose trust is to recommend an average-positive edit to the wrong molecules.

That is why the fail-closed part of the LeadScope story matters so much.

The cleanest example comes from one of our lorazepam-centered internal workflows. In that run, one docking-friendly shape edit looks attractive at the population level. It carries an average gain for the preferred docking state of about +0.124. If you stop reading there, you can write a very clean post: make this change, get better docking.

But that is not yet a decision-ready answer.

The molecule-level audit shows a mixed population. Across 1,213 molecules in the relevant starting state, the move is positive for 877, negative for 309, and near-zero for 27. So the average is real, but it is hiding a meaningful anti-responder minority: roughly one molecule in four gets worse.
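A molecule-level audit of this kind can be sketched in a few lines. This is a minimal illustration, not LeadScope's implementation: the function name `sign_breakdown`, the near-zero tolerance `tol`, and the toy uplift values are all assumptions made up for the example, since the post does not say how "near-zero" was defined.

```python
from collections import Counter


def sign_breakdown(uplifts, tol=0.01):
    """Count positive, negative, and near-zero per-molecule uplifts.

    `tol` is a hypothetical near-zero band; the real cutoff is not stated.
    """
    counts = Counter()
    for u in uplifts:
        if u > tol:
            counts["positive"] += 1
        elif u < -tol:
            counts["negative"] += 1
        else:
            counts["near_zero"] += 1
    # Counter returns 0 for keys that were never incremented.
    return {k: counts[k] for k in ("positive", "negative", "near_zero")}


# Toy values for illustration only, not LeadScope data.
toy_uplifts = [0.30, 0.25, 0.20, -0.15, 0.004]

breakdown = sign_breakdown(toy_uplifts)
mean_uplift = sum(toy_uplifts) / len(toy_uplifts)

print(breakdown)
print(f"mean uplift: {mean_uplift:+.3f}")
```

Even in the toy list the mean is positive while one molecule gets clearly worse, which is the pattern the audit is meant to surface.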

Now ask the harder question the report is supposed to answer: what happens on molecules that are already strong?

Among molecules already sitting in the strongest docking group, the majority sign flips. The same move is positive for 125 and negative for 174. On the 51 molecules in the top tradeoff set for that state, it is positive for 19 and negative for 30, with mean uplift collapsing to about +0.015.
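The arithmetic behind that flip is easy to check from the counts quoted above. The sketch below uses only the positive and negative counts reported in this post; the helper name `positive_share` and the choice to compare positives against negatives only (ignoring near-zero molecules) are assumptions for the illustration.

```python
# (positive, negative) counts as reported in the post.
reported_counts = {
    "all molecules in starting state": (877, 309),
    "strongest docking group": (125, 174),
    "top tradeoff set": (19, 30),
}


def positive_share(pos, neg):
    """Fraction of molecules with a clear sign that moved in the positive direction."""
    return pos / (pos + neg)


for name, (pos, neg) in reported_counts.items():
    share = positive_share(pos, neg)
    majority = "positive" if share > 0.5 else "negative"
    print(f"{name}: {share:.1%} positive ({majority} majority)")
```

The share of positive responders drops from roughly three in four across the whole population to well under half in both strong subgroups.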

The subgroup split is the whole point:

[Figure: Sign flip by subgroup]

That is the lesson. The lever is not fake. The average is not fake. The mistake is pretending the average is universal.

A good decision system should not only identify promising edits. It should also state where those edits stop being broadly trustworthy. Sometimes the honest answer is: this looks like a rescue move for weaker molecules, not a general recommendation for the molecules you already like.

That kind of honesty is not a weakness. It is part of the product.

If the report cannot say “do not apply this broadly” or “reserve this for rescue only,” then the workflow is still biased toward always recommending action. That is exactly the behavior that makes chemistry teams distrust software.
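One way to make those answers available is to have the report choose a scope label and default to the most restrictive one whenever the evidence is missing or weak. The following is a hedged sketch only: the function `recommendation_scope`, the label strings, and the 0.6 threshold are hypothetical and are not taken from LeadScope.

```python
from typing import Optional


def recommendation_scope(
    overall_share: Optional[float],
    strong_share: Optional[float],
    min_share: float = 0.6,  # hypothetical bar for "broadly trustworthy"
) -> str:
    """Return a scope label, failing closed when evidence is missing or weak."""
    # No population-level support (or no audit at all): recommend nothing.
    if overall_share is None or overall_share < min_share:
        return "do not apply"
    # Population looks good, but the already-strong subgroup is unaudited
    # or mostly hurt: restrict the edit to weaker molecules.
    if strong_share is None or strong_share < min_share:
        return "rescue only"
    return "apply broadly"


# Positive shares computed from the counts quoted in this post.
overall = 877 / 1213
strongest = 125 / (125 + 174)

print(recommendation_scope(overall, strongest))
```

The design choice that matters here is the ordering: "apply broadly" is the last branch, reached only after every check has passed, so a missing subgroup audit can never produce a broad recommendation.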

LeadScope gets more credible when it says no, or at least not here, not in this subgroup, not with this confidence.

That is the fail-closed standard the whole series is trying to defend.

Next in the series: another uncomfortable boundary condition. The best region often sits farther from the seed than teams initially want to go.
