A new kind of AI infrastructure startup is raising serious money to solve a problem that benchmarks alone can't: how you actually train an agent to do real work.
Deeptune has closed a $43 million Series A led by Andreessen Horowitz to build what it calls "training gyms" for AI agents — high-fidelity simulations of professional workflows where models learn by doing, not just by reading. The round, first reported by Fortune, included participation from 776, Abstract Ventures, and Inspired Capital, along with angel investors Noam Brown (OpenAI Research), Brendan Foody (Mercor CEO), and Yash Patil (Applied Compute CEO).
This is not the Deeptune that raised a $3M seed in 2023 to build an AI-powered audiovisual dubbing tool for creators and studios. That earlier iteration of the company — focused on matching dubbed audio to foreign-language video — has since pivoted. The new Deeptune, founded by the same team, is making a different bet entirely: that the bottleneck in AI development has shifted from model capability to training environment quality, and that the next critical layer of the AI stack is the gym, not the model.
"Reinforcement learning environments are already becoming the next critical layer of the AI stack: a shift from static, human-annotated datasets to dynamic, engineered systems that generate high-quality training signals at scale," a16z partner Marco Mascorro wrote in the firm's announcement. "This transforms data from a labor problem into an engineering and research problem, and increasingly, a compute problem."
The pitch maps to a real constraint. Today's frontier models can pass bar exams and write clean code on command. Ask one to build an LBO model in Excel or handle a multi-step customer support escalation across Salesforce and Slack, and performance collapses. The gap between knowing and doing — what Deeptune calls the "mastery gap" — isn't a model problem anymore. It's a data problem, and specifically a training signal problem.
"You wouldn't have a pilot who has only ever read books or watched tutorials fly a plane," Tim Lupo, Deeptune's co-founder and CEO, told Fortune. "What we build are essentially the flight simulators for AI doing work across the economy." Lupo previously worked as a founding engineer at Hebbia, where Lukas Schmit — Deeptune's co-founder and CTO — was the founding ML engineer. The team of roughly 20 works in person in New York and includes engineers from Anthropic, Scale AI, Palantir, Glean, and Retool.
The mechanics are straightforward: Deeptune builds realistic digital workspaces — accounting workflows, DevOps pipelines, support ticket queues — then runs millions of agentic rollouts in those environments, with reward signals tied to task completion. Labs and enterprises subscribe to the platform, which Deeptune says supports 10,000-plus parallel experiments, 50-plus environment replicas, sub-50-millisecond API latency, and a 99.9% uptime SLA. Deeptune says it has built hundreds of training gyms for leading AI labs — unnamed in its announcement — and that its environments have already contributed to measurable gains on the OSWorld benchmark, which measures how well agents navigate real computer interfaces.
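The loop described here is the standard reinforcement-learning environment pattern: reset a simulated workspace, let the agent act step by step, and pay out reward on task completion. A minimal sketch in Python — with the toy ticket-queue task and all names purely illustrative, not Deeptune's actual API — looks like this:

```python
# Toy RL "gym": an agent works a simulated support-ticket queue and
# earns reward only when the whole task is complete. Illustrative only;
# this is not Deeptune's interface.

class TicketQueueEnv:
    """Simulated workspace: close every open ticket to earn the reward."""

    def __init__(self, n_tickets=3):
        self.n_tickets = n_tickets

    def reset(self):
        # Start a fresh episode with all tickets open.
        self.open_tickets = set(range(self.n_tickets))
        return {"open": sorted(self.open_tickets)}  # observation

    def step(self, action):
        # action: the id of the ticket the agent tries to close.
        self.open_tickets.discard(action)
        done = not self.open_tickets
        reward = 1.0 if done else 0.0  # signal tied to task completion
        return {"open": sorted(self.open_tickets)}, reward, done


def rollout(env, policy):
    """Run one episode; the total reward is the training signal."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


env = TicketQueueEnv()
policy = lambda obs: obs["open"][0]  # trivial policy: close the first open ticket
print(rollout(env, policy))  # a perfect episode earns 1.0
```

At production scale the ingredients are the same — environments, rollouts, completion-based rewards — just run across thousands of parallel replicas, with the episode traces fed back into model training.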
The benchmark numbers are moving fast. Anthropic's Opus 4.6 scores 72.7% on OSWorld, surpassing the human baseline of 72.36%, and OpenAI's GPT-5.4 reaches 75%, per a16z's announcement. But benchmarks are a lagging indicator. The bet is that as the training environment ecosystem matures, the ceiling on agent capability rises with it.
"We were the first company to build an environment a bit over a year ago, and no one really knew if it was going to work," Lupo told Fortune. "We now know that they work insanely well." According to the company, anything that can be distilled into an environment — "from editing a video to building an LBO in Excel" — is something AI can learn through RL.
The market timing is backed by third-party projections. The global reinforcement learning market, including tools and environments, is projected to grow from roughly $11.6 billion in 2025 to more than $90 billion by 2034, according to ResearchAndMarkets, as cited by Fortune. Major labs are already spending accordingly: leaders at Anthropic have discussed committing more than $1 billion to RL environments over the next year, The Information reported. Scale AI and data-labeling incumbents are racing to build out their own offerings, per TechCrunch.
Mascorro frames the stakes as architectural. "If the last decade of AI progress was driven by better datasets, the next decade will be mostly driven by better environments," he wrote. The implication is that the moat isn't the model — it's the gym.
Whether that holds depends on whether the environments can keep pace with the agents they train. Simulation fidelity is hard; enterprise software stacks are messy, versioned, and full of edge cases that don't surface in clean benchmarks. The risk is that Deeptune's gyms become a shadow world — close enough to train on, different enough from production that agents degrade when deployed. The company is upfront that "there's still a lot of work to be done," per its own blog post, but hasn't published independent red-teaming or deployment data.
The bigger open question is who owns the environment layer long-term. If training gyms become the critical infrastructure layer, the strategic position belongs to whoever controls the simulation — a position that could sit above the model providers, or get subsumed by them as labs build their own proprietary environments. Noam Brown's presence as an angel investor is notable: he's the OpenAI researcher behind the poker AIs Libratus and Pluribus, which were early proof that RL in synthetic environments could beat humans at complex tasks. His involvement signals that the frontier labs are watching this space closely — and may be both customers and potential acquirers or competitors.
Deeptune was founded in 2022, and its current incarnation ran for roughly a year before the Series A — meaning it was building in near-silence while the market consensus on agent training shifted. The $43M gives it runway to scale the gym count, expand the environment library, and figure out whether subscription revenue from AI labs is a business or a feature. The a16z bet is that it's infrastructure.