Here is a number worth remembering: 0.95 × 0.95 = 0.9025.
That is the arithmetic of reliability in enterprise AI deployment. Two models, each 95 percent accurate, combine to produce a system that fails nearly 10 percent of the time — roughly double the failure rate of either model alone. Stack a third, a fourth, a fifth, and you are not building a reliable system. You are compounding your exposure. Satya Nitta, co-founder and CEO of Emergence AI, uses this example not to argue against AI, but to explain why his startup exists: the probabilistic nature of large language models is not a bug that better prompting will fix. It is a structural constraint that requires a different kind of solution.
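The arithmetic generalizes: if stage failures are independent, a chained pipeline succeeds only when every stage does, so success rates multiply. A minimal sketch of that calculation:

```python
# Compounding failure: a chained pipeline succeeds only if every
# stage succeeds, so (assuming independent failures) the system's
# reliability is the product of the stages' individual accuracies.
def pipeline_reliability(stage_accuracies):
    """Probability that every stage in the chain succeeds."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result

# Two 95%-accurate models chained together:
print(round(pipeline_reliability([0.95, 0.95]), 4))  # 0.9025
# Five such stages, and the system fails more than 22% of the time:
print(round(pipeline_reliability([0.95] * 5), 4))  # 0.7738
```

The independence assumption is the pessimistic baseline; correlated failures can make the picture better or worse, but the multiplicative decay is the structural point.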
Emergence AI, a New York-based agentic AI company, builds what it calls a formally verified control architecture — a deterministic layer on top of LLMs that converts natural language instructions into mathematical lemmas, validates them using a theorem prover called Lean, and only then executes. The result is a system that can make guarantees about what it will and will not do, which turns out to be what large enterprises actually need before they will put AI in charge of anything that matters.
"Neural networks and large language models are powerful but probabilistic," Nitta said. "For mission-critical applications, you cannot rely only on probabilistic systems."
The company emerged from stealth this week with $97.2 million in funding and a new research hub in Bengaluru, India — not a satellite outpost, Nitta is careful to say, but the nucleus of the company's broader research ambitions. Emergence India Labs will house around 500 researchers over the next three to four years, operating out of the Indian Institute of Science in Bengaluru. Nitta spent two decades at IBM Research before founding Emergence. He has seen how hub-and-spoke research models work. "The main center is where the agenda is set and where most breakthroughs occur," he said. "So we chose to place the core lab in Bengaluru rather than New York."
The Bengaluru decision is worth noting on its own. India's deep technical talent pool — particularly in formal methods, programming language theory, and systems engineering — has historically had one directional outlet: emigration. The best researchers left for Silicon Valley. What Nitta is betting on is that the talent is there, the ambition is there, and the willingness to stay is there if the work is real. "What India historically has not had is places where an enormous pool of deep talent can go to and do world-class, world-leading research," he said.
Emergence's product is an autonomous AI agent deployed inside enterprise environments — not an external API that queries a model and returns a result, but a system that integrates into existing workflows and operates with a defined constraint set. The architecture follows a loop: planning, execution, verification, and memory. Natural language inputs are first converted into mathematical form using Lean, a programming language designed for formal verification. A planning layer generates an execution strategy. The verification layer then checks outputs against ground truth and stated constraints using theorem provers, sending the system back for reprocessing if inconsistencies are detected.
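The article does not show what a formalized enterprise constraint looks like. As a purely hypothetical illustration (the names and numbers are invented, not Emergence's), a rule like "planned wafer allocations must not exceed line capacity" could be stated and machine-checked as a Lean theorem along these lines:

```lean
-- Hypothetical sketch: an operational rule encoded as a Lean theorem.
-- If the proof checks, the concrete plan provably satisfies the rule.
def capacity : Nat := 100

def plannedAllocation (batches : List Nat) : Nat :=
  batches.foldl (· + ·) 0

-- Claim: this particular plan (batches of 30, 25, and 40 wafers)
-- stays within capacity. `decide` discharges it by computation.
theorem planWithinCapacity :
    plannedAllocation [30, 25, 40] ≤ capacity := by
  decide
```

The point of the exercise is the failure mode: if the plan violated the constraint, the proof would not go through, and the system would know before executing rather than after.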
The self-correction loop is the core innovation. "If the verifier is not confident, it asks the system to try again," Nitta said. "This continues until the correct reasoning is achieved."
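The loop Nitta describes can be sketched in a few lines. This is an assumption-laden illustration, not Emergence's actual API: the `plan`, `execute`, and `verify` callables are hypothetical stand-ins for the planning, execution, and verification layers described above.

```python
# Hypothetical sketch of a plan / execute / verify / retry loop.
# `plan`, `execute`, and `verify` are invented stand-ins, not a real API.
def run_with_verification(task, plan, execute, verify, max_attempts=5):
    """Retry until the verifier accepts the output, or give up."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        strategy = plan(task, feedback)      # planning layer
        output = execute(strategy)           # execution layer
        ok, feedback = verify(task, output)  # verification layer
        if ok:
            return output                    # verified result
    raise RuntimeError(f"no verified output after {max_attempts} attempts")

# Toy demonstration: the "verifier" only accepts even numbers,
# so the first two candidate outputs are rejected and retried.
candidates = iter([3, 7, 8])
result = run_with_verification(
    task="find an even number",
    plan=lambda task, fb: next(candidates),
    execute=lambda strategy: strategy,
    verify=lambda task, out: (out % 2 == 0, "must be even"),
)
print(result)  # 8
```

The cap on attempts matters: without it, a task the verifier can never accept would loop forever, which is presumably where the "partially verifiable" category discussed later becomes the hard case.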
This is not a new idea in the abstract — formal methods have been applied to software verification since the 1970s, and systems like the seL4 microkernel demonstrated that mathematical proof of correctness is achievable for real code. What is new is applying that framework to the output of large language models in real-time enterprise workflows. The challenge has always been scale: formal verification is expensive, slow, and requires experts. Emergence's bet is that the cost is coming down as tooling matures, and that the alternative — shipping probabilistic outputs into environments where 90 percent reliability means lawsuits, shutdowns, or worse — is no longer acceptable.
The company is targeting sectors where failure is expensive and regulated: semiconductor manufacturing, airlines, biotechnology, oil and gas. The target environment is one characterized by high data volume, velocity, variety, and veracity — four V's that describe the operating conditions of modern industrial production. The use case closest to production today is semiconductor manufacturing: yield analysis and new product ramp-up, where small errors in prediction or scheduling cost millions.
"We are currently in advanced pilot stages and expect to scale these deployments soon," Nitta said.
The business model is licensing: customers pay a fee for the agent system, with customization billed on top. Emergence is not building foundation models — it operates above them in the AI stack, consuming LLM outputs and producing verified, constrained action. "We are building on top of them," Nitta said. "We are not competing at the large language model layer. As these models improve, our systems also improve."
This positioning — a layer above the foundation models, not alongside them — is both the company's bet and its vulnerability. If frontier AI labs solve the reliability problem themselves, or if customers decide that 90 percent is good enough for their use cases, the wedge narrows.
The company is frank about what it cannot yet do. Not all tasks are formally verifiable. Emergence expects tasks to fall into three categories: those that can be formally verified, those that cannot, and those that can be partially verified. The third category — partial verification — is where most interesting enterprise problems probably live, and it is also the hardest to reason about. The absence of a comprehensive library of formal specifications for real-world enterprise rules is a real constraint: you cannot verify what you cannot formalize, and formalizing complex operational rules requires both domain expertise and formal methods expertise that remains in short supply.
The verification bottleneck — the problem that Emergence is trying to solve — is also the problem that will determine whether agentic AI actually delivers on its enterprise promise. The gap between what agents can execute and what humans can trust them to do is not a product problem. It is a structural problem. Solving it requires deciding what guarantees actually mean in a probabilistic system, and then building the tooling to prove those guarantees hold.
That is what Emergence is attempting. Whether the formal methods layer can scale fast enough to matter before enterprises give up and accept 90 percent is the question this company was built to answer.