Can frontier AI do real science, or only retrieve it? That's the question Dwarkesh Patel has been working through this week, drawing on a conversation with Michael Nielsen — and the history of science suggests the answer is not reassuring.
Patel's test case is the discovery of Neptune. In 1846, the astronomer Urbain Le Verrier applied Newtonian mechanics to explain why Uranus was straying from its predicted path. He calculated where an unknown planet must be tugging on it. Neptune was found within a degree of the position he predicted. Triumph for both Newton and Le Verrier.
But Mercury's orbit showed a similar anomaly: its perihelion was precessing slightly faster than Newtonian mechanics predicted. Le Verrier applied the same method and inferred a perturbing planet inside Mercury's orbit. He called it Vulcan. Vulcan was never found. Sixty-nine years after Neptune's discovery, Einstein's general relativity explained Mercury's wobble without any new planet. The same logic, the same method: triumph in one case, decades of fruitless searching in the other. The asymmetry is not obvious in advance.
The pattern matters because AI faces exactly this kind of problem. Reinforcement learning from verifiable rewards — RLVR — is what made AI crack coding and math: those domains have tight feedback loops where you run a program and know immediately whether it worked. Patel argues that the history of science shows the most consequential breakthroughs had verification loops spanning decades or centuries, and that experiments rarely settle questions as decisively as AI companies need them to.
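The tight loop Patel describes can be made concrete. Below is a minimal sketch of a verifiable reward of the kind RLVR trains against: execute the model's candidate program against known test cases and score it pass/fail. The function names and the `solve` entry-point convention are illustrative assumptions, not any lab's actual training code.

```python
def verifiable_reward(candidate_src: str, test_cases: list[tuple]) -> float:
    """Return 1.0 if the candidate program passes every test, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)       # run the model's answer
        solve = namespace["solve"]           # convention: entry point named `solve`
        for args, expected in test_cases:
            if solve(*args) != expected:
                return 0.0
    except Exception:
        return 0.0                           # crashes score zero, like a failed run
    return 1.0

# A correct candidate earns full reward in milliseconds -- the tight loop
# the essay contrasts with decades-long scientific verification.
reward = verifiable_reward(
    "def solve(a, b):\n    return a + b",
    [((1, 2), 3), ((0, 0), 0)],
)
```

The point of the sketch is the asymmetry: the verifier here is cheap, total, and instantaneous, which is exactly the property the scientific examples that follow lack.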
AlphaFold is the strongest case for optimism. It solved protein structure prediction — a problem biologists had labored over since the 1970s — by training on known structures and checking predictions against held-out data. Fast feedback. Clear ground truth. More than three million researchers across 190 countries now use the platform, and the work earned DeepMind's Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry.
But AlphaFold predicts structure. Drug discovery is not primarily a structure-prediction problem. The hard part is not knowing what a protein looks like; it is knowing whether a molecule that binds to it will treat a disease in a human being. That requires wet-lab experiments, clinical trials, regulatory review — a loop that runs years and costs hundreds of millions of dollars per successful drug.
Isomorphic Labs, the Alphabet biotech spinoff built on AlphaFold, said in 2025 that it would have AI-designed drugs in human clinical trials by the end of that year. At the World Economic Forum in early 2026, the timeline was revised to the end of 2026. As of late April 2026, Isomorphic's president says the company is "gearing up" — no date committed. No AI-designed drug has ever received FDA approval. The company has partnerships with Eli Lilly, Novartis, and Johnson & Johnson, and raised $600 million in March 2025. The first human trial data will be the closest thing the field has to a real answer about whether computational drug design can close the kind of verification loop that took general relativity six decades.
The counterintuitive history of heliocentrism makes the same point at longer timescales. Copernicus proposed in 1543 that Earth orbits the sun. His circular-orbit model was numerically no better than Ptolemy's geocentric framework, which had been refined for centuries — Copernicus still needed epicycles of his own to match the data. In raw accuracy, Ptolemy held his ground. What Copernicus had was explanatory elegance: retrograde motion fell out of his model without any ad hoc adjustments. That economy of explanation is why heliocentrism survived despite offering no gain in numerical fit. But you could not have verified heliocentrism by counting epicycles. Stellar parallax — the apparent shift of nearby stars against distant ones as Earth orbits the sun — was first measured in 1838, more than two thousand years after Aristarchus proposed a heliocentric model in ancient Greece.
Prout's hypothesis makes the same point at shorter timescales. Prout proposed in 1815 that all atomic weights are whole-number multiples of hydrogen's. Chlorine's measured atomic weight kept coming in at 35.46, not 35, and chemists spent the next century proposing impurities, measurement errors, and fractional weights to explain the discrepancy. The resolution came only when physicists characterized isotopes in the early twentieth century: chlorine has multiple isotopes with different masses, so its average atomic weight is not a whole number. The hypothesis was right in outline, wrong in detail, and it took a century of accumulated evidence and a new subfield of physics to reconcile.
What this means practically: science cannot be easily automated by a system that requires rapid feedback. Patel argues that the most productive research programs may require scientists who are willing to be unreasonably obstinate in the face of disconfirming evidence. Einstein insisted that the laws of physics should not single out any privileged reference frame, a prior most physicists considered aesthetic rather than empirical. He was right. But that hunch could not be verified with a quick experiment.
What AI can plausibly do in the near term is on-the-job learning: deploying AI instances to do actual scientific work over extended periods, improving from feedback embedded in whether that work was useful. The model runs an experiment, gets the result back, updates. That kind of learning does not require short-horizon RL loops, and it sidesteps the verification bottleneck by embedding the loop in real-world utility rather than formal falsification.
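The loop described above — act, observe whether the work was useful, update — can be sketched as a toy. Here the "science" is a hidden effective dose the agent narrows down by running slow real-world experiments; the scenario, the function names, and the hidden threshold are all hypothetical, chosen only to show feedback embedded in outcomes rather than in a formal verifier.

```python
def run_experiment(dose: float) -> bool:
    """Stand-in for a slow real-world experiment: did this dose work?"""
    HIDDEN_EFFECTIVE_DOSE = 0.37       # unknown to the agent
    return dose >= HIDDEN_EFFECTIVE_DOSE

def on_the_job_learning(trials: int = 20) -> float:
    """Narrow in on the lowest working dose purely from experiment outcomes."""
    lo, hi = 0.0, 1.0                  # prior: the answer is somewhere in [0, 1]
    for _ in range(trials):
        guess = (lo + hi) / 2
        if run_experiment(guess):      # feedback comes from the result itself
            hi = guess                 # it worked: try a lower dose next time
        else:
            lo = guess                 # it failed: raise the dose
    return hi                          # best known effective dose so far

estimate = on_the_job_learning()
```

Each iteration here is cheap, but nothing in the update rule depends on that: the same loop works if each experiment takes a year, which is the sense in which on-the-job learning sidesteps, rather than solves, the verification bottleneck.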
The bets that look most viable are the narrow domains with fast feedback: coding agents, math solvers, protein structure prediction. Drug discovery is not one of them. The biology has not been cooperating with compressed timelines, and the historical record suggests that the most consequential scientific questions are verification problems that current AI techniques cannot shortcut.
Isomorphic's coming clinical trial data will be informative. It will show whether the best AI-for-science system money and prestige can buy can close a loop that took Prout's hypothesis a century and Einstein's relativity another sixty years. The bet is not irrational. But the history on which it rests is not reassuring.