Causal AI for Clinical Trials Scales to 400 Million Patient Records
A model trained on 400 million patient entries asks "what if?" at scale — reproducing known clinical patterns is exactly how causal AI in medicine starts.
Researchers at the University of Tokyo have built an autoregressive generative model trained on data from more than 300,000 patients and 400 million patient timeline entries — and used it to simulate counterfactual clinical outcomes. Rather than predicting what will happen to a patient, the system models what would happen if something changed: what if this patient were older, or had elevated kidney markers, or received a different medication? The team validated the approach on patients hospitalized with COVID-19 in 2023, modifying age, serum C-reactive protein (CRP), and serum creatinine to simulate seven-day outcomes.
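The core idea — hold a patient's record fixed, intervene on one variable, and compare simulated outcomes — can be sketched in a few lines. The risk model below is a toy logistic function with made-up coefficients, not the Tokyo group's autoregressive model; it only illustrates the shape of a counterfactual query.

```python
import math

def mortality_risk(age, crp, creatinine):
    """Toy 7-day mortality risk model (illustrative coefficients,
    not from the paper): higher age, CRP, and creatinine raise risk."""
    logit = -8.0 + 0.05 * age + 0.02 * crp + 0.8 * creatinine
    return 1.0 / (1.0 + math.exp(-logit))

# Factual record for a hypothetical patient
factual = {"age": 55, "crp": 20.0, "creatinine": 0.9}

# Counterfactual: same patient, but older ("what if this patient were 80?")
counterfactual = {**factual, "age": 80}

baseline = mortality_risk(**factual)
intervened = mortality_risk(**counterfactual)
print(f"risk (factual): {baseline:.3f}  risk (age=80): {intervened:.3f}")
```

In the actual system the role of `mortality_risk` is played by a generative model that rolls a modified patient timeline forward, but the query structure — edit one input, re-simulate, compare — is the same.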
The model predicted increased in-hospital mortality when simulating older age, elevated CRP, and impaired kidney function. It also generated plausible prescription patterns: Remdesivir use increased in simulations with higher CRP values and decreased in those with impaired kidney function, consistent with the drug's known contraindications. These are not new clinical findings — they are established medical knowledge, reproduced by a model. That reproducibility is the point.
The distinction matters. Most clinical AI models pattern-match: they predict what will happen based on what has happened. Counterfactual simulation asks a different class of question: what would happen under a hypothetical intervention? This is causal reasoning, not prediction. Finance solved it decades ago with Monte Carlo methods. Physics has long had digital twins for particle collisions. Medicine is arriving late — and the data scale required to do it credibly is now within reach.
"These findings suggest that autoregressive generative models trained on real-world data in a self-supervised manner can establish a foundation for counterfactual clinical simulation," the researchers wrote in a preprint posted to arXiv in January 2026.
The validation results are consistent with established clinical knowledge — not surprising, but credible as a proof of concept. Akagi and colleagues' companion paper, also released in January 2026, introduced a longitudinal simulation model pretrained on more than 200 million clinical records, with observed-to-expected ratios consistently near 1.0 — meaning its predictions closely tracked actual patient outcomes.
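The observed-to-expected (O/E) ratio is a standard calibration check: divide the number of events that actually occurred by the sum of the model's predicted risks. A sketch with hypothetical values:

```python
def observed_to_expected(observed_events, predicted_probs):
    """O/E ratio: observed event count divided by the sum of
    predicted event probabilities. Values near 1.0 mean the model's
    aggregate risk estimates match what actually happened."""
    expected = sum(predicted_probs)
    return sum(observed_events) / expected

# Hypothetical cohort: 1 = event occurred, 0 = no event,
# paired with the model's predicted probability for each patient.
observed = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [0.9, 0.1, 0.2, 0.7, 0.05, 0.15, 0.1, 0.8]

print(f"O/E = {observed_to_expected(observed, predicted):.2f}")  # near 1.0
```

An O/E well above 1.0 means the model under-predicts events; well below 1.0, it over-predicts them.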
The commercial context is already taking shape. Model-informed drug development (MIDD) practices, which include digital twins and trial simulation, could eventually cut clinical trial development costs by as much as 60 percent and shorten cycle times by up to 40 percent, according to a ZS Associates analysis of modeled scenarios and early proof points. These are not industry norms today. AstraZeneca has used more than 300 million synthetic patient records in digital clinical trials, decreasing drug development costs by an estimated $100 million per drug, according to The Lancet Digital Health.
The gap between a single academic model and what a major pharmaceutical company can deploy is not trivial. The Tokyo model was trained on data from one institution; its generalizability to other populations is untested. What the work demonstrates is feasibility at scale — not that the approach has arrived.