Bacteria have been fighting phages, the viruses that infect them, for roughly four billion years. In that time, they've built an immune system that makes the human immune response look like a rough draft. A team at MIT just got a much better look at it.
In a paper published in Science, researchers describe DefensePredictor, a machine learning model trained to recognize bacterial defense proteins. They fed it genomic data from nearly 17,000 microbial species and told it to find anything that looked like part of an antiphage immune system. It found 624 candidates across 69 strains of E. coli, more than 100 of which had never been described before. The team tested 106 of the predicted systems in the lab, and 45 of them worked.
The validation rate matters as much as the count. When the team synthesized 106 predicted systems and challenged them with phage in live E. coli, 45 produced measurable protection, a hit rate of about 42 percent. That's more than a bioinformatics curiosity. For computational predictions made with no mechanistic hypothesis behind them, it's a remarkably efficient hit rate.
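The arithmetic behind that figure is easy to check. The short Python sketch below recomputes the hit rate from the two counts reported above and, as an illustration only, attaches an approximate 95 percent Wilson score interval to show how much uncertainty a sample of 106 carries. The interval is not part of the study's reported analysis, and the function names `hit_rate` and `wilson_interval` are made up for this example.

```python
import math


def hit_rate(validated: int, tested: int) -> float:
    """Fraction of tested predictions that were validated in the lab."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return validated / tested


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95 percent confidence. This is an
    illustrative assumption, not a method taken from the paper.
    """
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Counts as reported in the article: 45 protective systems out of 106 tested.
rate = hit_rate(45, 106)
low, high = wilson_interval(45, 106)

print(f"hit rate: {rate:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

Even at the low end of the interval, roughly a third of the predictions would be expected to hold up, which is consistent with the article's reading of the result.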
The deeper finding is in the details. Forty-five percent of the newly predicted proteins shared no recognizable sequence or structural similarity with anything in the existing defense protein database, which suggests they work through mechanisms that have no name yet. The researchers' estimate of how much of the bacterial genome is devoted to immune function roughly tripled, from around 0.5 percent to 1.5 percent. "We have likely only scratched the surface of the diversity of bacterial immunity," they write.
The comparison to AlphaFold is hard to avoid. When that protein-folding model arrived, it didn't just solve one structure — it revealed that the universe of protein folds was far larger and stranger than anyone had catalogued. DefensePredictor is doing the same thing for bacterial immune systems. It found biology that existed before humans, that evolved without any human hypothesis about what it should look like, and that had no entry in any database until the model pointed at it.
Michael Laub, a microbiologist at MIT and a co-author of the study, noted that the systems are distributed unevenly across species and environments — a sign that bacteria are continuously co-evolving with phage rather than settling into a stable defense. Aude Bernheim, a microbiologist at the Pasteur Institute in Paris who co-authored a related study, has argued that the diversity of bacterial defense systems represents one of the largest uncharacterized portions of the genomic universe.
Some of the newly discovered systems have unexpected connections to human biology. One system contains a metallophosphatase structurally similar to SMPDL3A, a human protein that cleaves a signaling molecule involved in innate immunity. The link between bacterial defense and human immune signaling is a known thread in the research — but finding it by following a computational prediction, rather than designing an experiment to look for it, is new.
For gene editing, the implications are practical. CRISPR systems, the molecular scissors that have generated enormous commercial and clinical interest, are themselves bacterial defense systems. The version most widely used in research and medicine, Cas9, was just one of many that biologists eventually characterized. DefensePredictor's 624 candidates are a pool of prospective tools. Not all will be usable as gene-editing components. Some may do things we don't yet know how to use. But the pipeline for finding out has gotten dramatically faster.
The phage-bacterial arms race is a permanent feature of the biosphere. In some environments, phage can drive the turnover of 10 to 25 percent of all bacteria daily. That constant pressure is why bacteria keep evolving new countermeasures — and why the catalog of those countermeasures has always been incomplete. DefensePredictor doesn't replace the lab. But it tells the lab where to look next.