On April 23, researchers at Chalmers University and the Volvo Group posted an updated preprint that drew a careful line between what AI can do in software engineering today and what's still an empty whiteboard. The line is coordination.
The paper, The Semi-Executable Stack, organizes AI's expanding role in software engineering into six concentric rings. Code sits at the center. Around it: prompts and specifications, orchestrated multi-agent workflows, safety guardrails, organizational decision routines, and, outermost, regulatory fit. Rings one through four have methods. Rings five and six, the organizational and governance layers, have none. That part is not new. What the April 23 update confirmed is something the field has been circling without naming: the unlock is not the agents. It is the coordination layer between them.
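Rendered as a data structure, the taxonomy is compact. The sketch below is illustrative only; the ring names are paraphrases of the preprint, not identifiers it defines:

```python
from enum import Enum

class Ring(Enum):
    """The paper's six concentric rings, innermost to outermost.
    Names are paraphrases of the preprint, not its identifiers."""
    CODE = 1                    # executable artifacts
    PROMPTS_AND_SPECS = 2       # prompts and specifications
    ORCHESTRATION = 3           # orchestrated multi-agent workflows
    GUARDRAILS = 4              # safety guardrails
    ORG_DECISION_ROUTINES = 5   # organizational decision routines
    REGULATORY_FIT = 6          # regulatory fit

# Per the paper: rings one through four have methods; five and six do not.
HAS_METHODS = {ring: ring.value <= 4 for ring in Ring}
```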
At Volvo, two systems sit on opposite sides of that gap. GoNoGo handles software release decisions by routing them across specialized AI agents — one handles the analysis, another applies safety checks, a third manages the approval workflow — and escalates anything complex to a human. For routine releases, it works without human involvement. In a live pilot, it saves roughly two hours per decision, according to Volvo-reported production data. The researchers cite it as the clearest example of ring three work — orchestrated multi-agent workflows — actually deployed.
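The preprint does not publish GoNoGo's internals, but the routing pattern the researchers describe (an analysis agent, a safety agent, an approval agent, and escalation for anything complex) has a simple shape. A minimal sketch, with every name and threshold hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Release:
    """A candidate software release. Fields are hypothetical."""
    id: str
    complexity: float  # 0.0 (routine) .. 1.0 (novel)

# Each "agent" is a plain callable here; in a real system these would be
# model-backed services. Nothing below is GoNoGo's actual API.
def analyze(release: Release) -> dict:
    return {"risk": release.complexity}

def safety_check(analysis: dict) -> bool:
    return analysis["risk"] < 0.8

def approve(release: Release) -> str:
    return f"release {release.id}: approved"

def escalate_to_human(release: Release) -> str:
    return f"release {release.id}: escalated for human review"

COMPLEXITY_CUTOFF = 0.5  # hypothetical routing threshold

def go_no_go(release: Release) -> str:
    """Route a release decision across specialized agents, escalating
    anything complex to a human, as the article describes."""
    if release.complexity >= COMPLEXITY_CUTOFF:
        return escalate_to_human(release)
    analysis = analyze(release)
    if not safety_check(analysis):
        return escalate_to_human(release)
    return approve(release)

print(go_no_go(Release(id="r-101", complexity=0.2)))  # routine: approved
print(go_no_go(Release(id="r-102", complexity=0.9)))  # complex: escalated
```

The load-bearing choice is the escalation path: routine cases flow straight through, and anything above the cutoff or failing a safety check lands with a human.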
The other case is more revealing precisely because it did not ship. SPAPI-Tester is a tool that automates testing for automotive APIs. At Volvo it ran across 193 newly developed APIs and found 23 failures: 22 confirmed implementation bugs and one documentation parsing issue. The bugs were real. The system replaced the two to three full-time engineers a team would otherwise need for that work. This is ring three and ring four territory: orchestrated execution and control systems. It works. But those APIs had not reached production, which puts the result in a different category from GoNoGo's measured deployment.
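Again, the paper does not publish SPAPI-Tester's implementation. But the pattern it implies (sweep every API against its specification, classify each failure as an implementation bug or a documentation mismatch) is easy to sketch. Everything below, including the spec format, is an assumption:

```python
import json
import urllib.request

def check_api(base_url: str, spec: dict) -> str | None:
    """Call one endpoint described by a hypothetical spec dict and compare
    the response against it. Returns a failure label, or None on success."""
    url = base_url + spec["path"]
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = json.load(resp)
    except Exception as exc:
        return f"request failed: {exc}"
    # A real harness would validate the full schema; this checks one field list.
    missing = [f for f in spec.get("expected_fields", []) if f not in body]
    if missing:
        # A miss here is either a code bug or a stale spec. SPAPI-Tester's
        # 23 failures split exactly that way: 22 implementation bugs and
        # one documentation parsing issue.
        return f"missing fields {missing} (bug or doc mismatch)"
    return None

def run_suite(base_url: str, specs: list[dict]) -> list[tuple[str, str]]:
    """Run every spec; return (path, failure) pairs, analogous to the
    193-API sweep the article reports."""
    failures = []
    for spec in specs:
        result = check_api(base_url, spec)
        if result is not None:
            failures.append((spec["path"], result))
    return failures
```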
The coordination infrastructure is the part that does not get demoed at conferences. It is unglamorous. It is the plumbing between the agents: how decisions get routed, how safety checks attach to a workflow, how a human stays in the loop for anything that matters. That plumbing is what Volvo built, and it is what the rest of the field has not.
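That plumbing has a recognizable shape in code. Here is a minimal sketch of one piece of it, attaching a safety check to an arbitrary agent step with a human fallback; nothing in it is Volvo's API:

```python
import functools
from typing import Any, Callable

def with_guardrail(check: Callable[[Any], bool], fallback: Callable[[Any], Any]):
    """Attach a safety check to any agent step: run the step, validate its
    output, and divert to a fallback (e.g. a human queue) when it fails.
    Illustrative only; not an interface from the paper."""
    def decorator(step: Callable[[Any], Any]) -> Callable[[Any], Any]:
        @functools.wraps(step)
        def wrapped(task: Any) -> Any:
            result = step(task)
            if not check(result):
                return fallback(task)
            return result
        return wrapped
    return decorator

def human_review(task: Any) -> Any:
    # In production this would enqueue the task for a person.
    return {"status": "needs_human", "task": task}

@with_guardrail(check=lambda r: r.get("confidence", 0.0) > 0.9,
                fallback=human_review)
def classify(task: dict) -> dict:
    # Stand-in for a model-backed agent step.
    return {"status": "done", "confidence": 0.5}

print(classify({"release": "r-103"}))
# -> {'status': 'needs_human', 'task': {'release': 'r-103'}}
```

The point of the decorator shape is that the safety check is attached to the workflow, not baked into the agent: any step can be wrapped, and the human stays in the loop by construction.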
The "empty whiteboard" the researchers describe for rings five and six — organizational decision routines and institutional fit — is not an abstract gap. It is the gap between what regulators can now demand under the EU AI Act and what engineering exists to answer. A regulator can ask whether your agentic system will stay coherent under real governance requirements. There is no method to demonstrate that. The EU AI Act arrived before the engineering to comply with it did.
The researchers describe their work as diagnostic and agenda-setting. That framing holds. The paper surfaces a real structural problem: the coordination infrastructure that makes multi-agent systems work in production is specific, painstaking, and almost entirely unstandardized. Volvo built GoNoGo for its own release workflow and SPAPI-Tester for its own testing pipeline. Nobody has published the equivalent for anyone else.
What happens when that problem gets solved — when a coordination layer exists that works across teams, not just inside one company's internal processes — is the open question the paper does not answer. That is where the field is headed next.