While AI Agents Wait, Idle Compute Becomes the New Optimization Target
Speculative planning converts idle compute cycles into accuracy gains — but the code is private and no vendor has publicly committed to replicating the results.
A paper posted to arXiv on May 21 by researchers at KAIST, Amazon AGI, and Together AI makes a claim that will feel counterintuitive to anyone who has watched AI agents stumble through multi-step tasks: the time those agents spend waiting between steps is not dead weight. It is a performance budget that has been left on the table for years.
The system is called IdleSpec. Its core finding, laid out in the paper's second paragraph, is that the only serious prior attempt to solve this problem could make things worse. That prior approach, called Sleep-Time Compute, was published by researchers at Letta and UC Berkeley in April 2025. IdleSpec's authors show that Sleep-Time Compute used just 13.7 percent of available idle cycles on the GAIA benchmark and, because it assumed future tool calls were predictable, sometimes pushed accuracy below the baseline it was trying to improve. The paper describes the pattern as "often degrading performance," a restrained way to report something unusual: a follow-on paper finding that its predecessor could do harm.
On GAIA and FRAMES, two standard agentic reasoning benchmarks, IdleSpec pushed Gemini-2.5-Flash to 55.6 percent average accuracy, a 5.1 percentage point gain over the same model running without idle-time exploitation. On MLE-Bench, which involves substantial code execution delays, the improvement reached 9.1 percent on the Any Medal rate. No additional API calls. No new infrastructure. Just better use of time that was already being wasted.
The fix is a two-phase drafting strategy. During idle time, IdleSpec generates plan candidates using one of two complementary approaches. Progressive drafting assumes the next observation will be favorable and pushes forward. Recovery drafting assumes the observation will be unfavorable and explores alternative paths. A learned distribution selects between them at runtime, updated via posterior feedback once observations arrive.
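One way to picture the selection step is a small Beta-Bernoulli model with Thompson sampling. This is a minimal sketch under stated assumptions, not the paper's implementation: the article does not say what form the learned distribution takes, and the class name `DraftSelector`, its fields, and the 0.5 threshold are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DraftSelector:
    """Hypothetical selector: a Beta posterior over the chance that the
    next observation is favorable. Not the IdleSpec implementation."""
    favorable: float = 1.0    # Beta alpha, starts as a uniform prior
    unfavorable: float = 1.0  # Beta beta
    rng: random.Random = field(default_factory=random.Random)

    def choose(self) -> str:
        # Thompson sampling: draw one plausible success probability
        # from the posterior and draft for the outcome it favors.
        p = self.rng.betavariate(self.favorable, self.unfavorable)
        return "progressive" if p >= 0.5 else "recovery"

    def update(self, observation_favorable: bool) -> None:
        # Posterior feedback once the real observation arrives.
        if observation_favorable:
            self.favorable += 1
        else:
            self.unfavorable += 1

    def mean(self) -> float:
        # Posterior mean estimate of a favorable observation.
        return self.favorable / (self.favorable + self.unfavorable)


selector = DraftSelector(rng=random.Random(0))
for outcome in [True, True, True, False]:
    mode = selector.choose()   # which drafting mode to run while idle
    selector.update(outcome)   # learn from what actually came back
```

After three favorable observations and one unfavorable one, the posterior mean in this sketch is 4/6, so the selector leans toward progressive drafting while still sampling recovery drafts some of the time.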
"We find that planning yields the most consistent performance improvements when generated during idle time," the authors write, compared against summarization and reflection strategies. Planning is more resilient to observation uncertainty because it can conditionally formulate over plausible futures rather than committing to a single prediction.
Thibault Jaigu, CEO and co-founder of AI infrastructure company Requesty, analyzed IdleSpec's results alongside other May 2026 agent techniques. He notes that when idle-time speculation succeeds, which the paper's own analysis puts at roughly 60 to 80 percent of cases, the agent experiences effectively zero latency between steps. The same analysis notes that agent latency is the number one complaint from users in production deployments.
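A rough way to read the 60 to 80 percent figure is as an expected wait per step. The sketch below assumes a deliberately simple model that is not taken from the paper: a successful speculation costs the user no planning wait, a failed one costs the full planning latency. The function name `expected_wait` and the 2-second planning latency are hypothetical.

```python
def expected_wait(hit_rate: float, planning_latency_s: float) -> float:
    """Expected planning wait per step, assuming a speculation hit costs
    nothing and a miss costs the full planning latency (an assumption)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if planning_latency_s < 0:
        raise ValueError("planning_latency_s must be non-negative")
    return (1.0 - hit_rate) * planning_latency_s


PLANNING_LATENCY_S = 2.0  # hypothetical per-step planning time

# The two ends of the success range quoted in the article.
low_end = expected_wait(0.6, PLANNING_LATENCY_S)
high_end = expected_wait(0.8, PLANNING_LATENCY_S)
```

Under these assumptions a 2-second planning step shrinks to an expected wait of about 0.8 seconds at a 60 percent hit rate and about 0.4 seconds at 80 percent. Real pipelines would also pay some cost to verify or discard a draft, which this model ignores.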
The broader shift the paper points toward is a new optimization target for AI agents. The conventional goal was token throughput: how many tokens per second can the model generate. IdleSpec adds a second dimension. Not how fast the model talks, but how productively it uses the dead time between utterances.
This matters for the economics of production agent deployments. Agents solving complex tasks through multi-step reasoning and tool use spend a significant fraction of their wall-clock time waiting. If that waiting can be converted to accuracy gains at zero marginal cost, the cost-per-useful-output of agentic systems drops. Whether the gains hold outside benchmarks, and whether the technique generalizes to real-world agent pipelines with variable latency profiles, remain open questions that the paper cannot answer.
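To put a number on that, the reported GAIA and FRAMES average can be turned into a cost per correct answer. This is a back-of-the-envelope sketch: it takes the zero-marginal-cost premise at face value, treats benchmark accuracy as the rate of useful outputs, and uses a hypothetical cost of 1.0 unit per task. The helper `cost_per_correct` is illustrative only.

```python
def cost_per_correct(cost_per_task: float, accuracy: float) -> float:
    """Average spend needed to obtain one correct answer."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_task / accuracy


COST_PER_TASK = 1.0                       # hypothetical, arbitrary units
idle_accuracy = 0.556                     # reported average with IdleSpec
baseline_accuracy = idle_accuracy - 0.051 # 5.1 points lower without it

baseline_cost = cost_per_correct(COST_PER_TASK, baseline_accuracy)
idle_cost = cost_per_correct(COST_PER_TASK, idle_accuracy)

# Fractional drop in cost per correct answer at unchanged cost per task.
reduction = 1.0 - idle_cost / baseline_cost
```

With per-task cost held constant, moving from 50.5 to 55.6 percent accuracy lowers the cost per correct answer by roughly 9 percent in this model. Any extra compute spent on drafting would shrink that figure.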
No public code release has accompanied the paper, and no vendor has announced plans to adopt or replicate IdleSpec's approach. Independent reviewers have called it a signal worth watching rather than a production-ready tool. For teams running agents in production today, the core mechanism is compelling — the implementation gap is the thing to watch.
What changed is the paradigm. The idea that idle time could not be put to use died this week. What comes next is the engineering. Watch for whether any major agent framework, such as LangGraph, AutoGen, or CrewAI, publicly commits to replicating the results before the paper's code materializes. If one does, the production-readiness question closes faster than the peer review will.