The Quantum Benchmark That Might Actually Catch a Spoof
A quantum computer told you it found the answer. How do you know it wasn't just a classical computer pretending?
That is the question a new paper from five physicists attempts to answer, and unlike most quantum benchmarking research, their method might actually be cheap enough to use. Gregory Bentsen at William & Mary, Bill Fefferman and Soumik Ghosh at the University of Chicago, Michael J. Gullans at the University of Maryland and NIST, and Yinchen Liu at the University of Waterloo published a technique last week that lets hardware operators check whether their quantum circuits are doing something a classical computer cannot easily fake. The tool needs only a logarithmic number of samples, roughly m ~ log(n) measurements with n the number of qubits, to catch a spoofing attack with high confidence, which the authors describe as practically free by the standards of quantum experimentation.
The need is real. Existing quantum benchmarks, particularly one called linear cross-entropy, have been gamed. A Harvard group demonstrated that a classical algorithm could match linear cross-entropy scores on real quantum hardware by exploiting noise rather than quantum physics — essentially convincing the benchmark that noise was quantum computation. "Existing benchmarks to characterize these experiments, like linear cross-entropy, have been classically spoofed due to noise," the authors write.
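For readers who want to see what the benchmark computes, here is a minimal Python sketch of the linear cross-entropy score as it is commonly defined in the literature: 2^n times the average ideal probability of the observed bitstrings, minus one. The formula is not spelled out in this article, so the definition, the toy output distribution, and the function names (`linear_xeb`, `toy_ideal_distribution`) are illustrative assumptions, not the authors' code.

```python
import random

def toy_ideal_distribution(n_qubits, rng):
    """Toy stand-in for a random circuit's ideal output distribution.

    Assumption: exponentially distributed weights, normalized, which mimics
    the Porter-Thomas statistics usually expected of deep random circuits.
    """
    weights = [rng.expovariate(1.0) for _ in range(2 ** n_qubits)]
    total = sum(weights)
    return [w / total for w in weights]

def linear_xeb(samples, ideal_probs, n_qubits):
    """Linear cross-entropy estimate: 2^n * mean(p_ideal(x)) - 1."""
    dim = 2 ** n_qubits
    mean_p = sum(ideal_probs[x] for x in samples) / len(samples)
    return dim * mean_p - 1.0

rng = random.Random(0)
n = 10
probs = toy_ideal_distribution(n, rng)

# A "perfect device" samples from the ideal distribution itself.
ideal_samples = rng.choices(range(2 ** n), weights=probs, k=20000)
# A trivial classical guesser outputs uniformly random bitstrings.
uniform_samples = [rng.randrange(2 ** n) for _ in range(20000)]

ideal_score = linear_xeb(ideal_samples, probs, n)
uniform_score = linear_xeb(uniform_samples, probs, n)
print(f"ideal sampler:   {ideal_score:.3f}")
print(f"uniform guesser: {uniform_score:.3f}")
```

On this toy distribution the ideal sampler scores roughly 1 and the uniform guesser roughly 0. The spoofing concern is that a noisy device lands somewhere in between, and a classical algorithm can reach the same intermediate score without doing the quantum computation.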
The new method, nonlinear cross-entropy or log XEB, is designed to close that gap. Where the linear version can be fooled by noise patterns that look like quantum behavior, the nonlinear version distinguishes actual quantum output distributions from a spoofing attack even when moderate noise is present. The paper demonstrates mathematically that the nonlinear score separates noisy quantum computers from state-of-the-art classical spoofers in the specific setting the authors studied: shallow-depth, all-to-all random circuits, which they model with Brownian motion and analyze with replica tricks to derive exact analytic expressions.
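A logarithmic score can be sketched in a few lines of Python. The version below averages log(2^n * p) over the observed bitstrings, which is one common way to write a log cross-entropy. Whether the paper's nonlinear cross-entropy uses exactly this form and normalization is an assumption, as are the toy distribution and the names `log_xeb` and `toy_ideal_distribution`.

```python
import math
import random

def toy_ideal_distribution(n_qubits, rng):
    """Toy Porter-Thomas-like distribution (an assumption, not the paper's model)."""
    weights = [rng.expovariate(1.0) for _ in range(2 ** n_qubits)]
    total = sum(weights)
    return [w / total for w in weights]

def log_xeb(samples, ideal_probs, n_qubits):
    """Assumed log-score: mean of log(2^n * p_ideal(x)) over the samples."""
    dim = 2 ** n_qubits
    total = 0.0
    for x in samples:
        p = ideal_probs[x]
        if p <= 0.0:
            # log is undefined here; a real benchmark would need a policy for this case.
            raise ValueError("sampled an outcome with zero ideal probability")
        total += math.log(dim * p)
    return total / len(samples)

rng = random.Random(1)
n = 10
probs = toy_ideal_distribution(n, rng)

ideal_samples = rng.choices(range(2 ** n), weights=probs, k=20000)
uniform_samples = [rng.randrange(2 ** n) for _ in range(20000)]

ideal_log = log_xeb(ideal_samples, probs, n)
uniform_log = log_xeb(uniform_samples, probs, n)
print(f"ideal sampler:   {ideal_log:.3f}")
print(f"uniform guesser: {uniform_log:.3f}")
```

Because the logarithm punishes low-probability outputs much harder than a plain average does, the two samplers end up on opposite sides of zero in this toy setting. That sensitivity to the tail of the distribution is the intuition behind using a nonlinear score, though the sketch says nothing about how the score behaves against the specific spoofer the paper analyzes.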
The technical core is a binary classifier the paper calls heavy output generation. Given a set of measurement outcomes from a quantum circuit, it decides whether the device is producing genuinely quantum output or a classical imitation. The authors prove that this classifier succeeds with high probability using only a logarithmic number of samples, a result that matters because most quantum verification methods require exponentially many measurements to rule out spoofing reliably.
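To make the logarithmic scaling concrete, the sketch below pairs a hypothetical sample budget with a simple threshold test. The constant `c`, the threshold value, and the function names `sample_budget` and `classify_device` are all illustrative assumptions; the article does not give the paper's constants or its exact decision rule.

```python
import math

def sample_budget(n_qubits, c=20.0):
    """Hypothetical budget m = ceil(c * ln(n)).

    The constant c is made up for illustration; only the log(n) scaling
    comes from the claim described in the article.
    """
    return max(1, math.ceil(c * math.log(n_qubits)))

def classify_device(per_sample_scores, threshold):
    """Label a device by comparing its mean per-sample score to a threshold.

    This is a generic threshold classifier, not the paper's construction.
    """
    mean_score = sum(per_sample_scores) / len(per_sample_scores)
    return "quantum" if mean_score >= threshold else "spoof"

# How the hypothetical budget grows with system size.
budgets = {n: sample_budget(n) for n in (50, 100, 1000)}
print(budgets)

# Toy scores: positive values stand in for a device tracking the ideal
# distribution, negative values for a sampler that ignores it.
print(classify_device([0.5, 0.4, 0.3], threshold=0.0))
print(classify_device([-0.6, -0.5], threshold=0.0))
```

With these made-up constants, a tenfold increase from 100 to 1,000 qubits raises the budget from 93 to 139 samples, which is the practical meaning of logarithmic cost.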
There is a meaningful limitation embedded in that proof. The mathematical guarantees hold for all-to-all circuit architectures, where every qubit can interact with every other qubit. Real quantum hardware does not look like that. IBM's quantum systems use grid connectivity. Google uses a similar layout. Trapped-ion machines have their own topology constraints. The paper's authors acknowledge this gap and note that extending the result to grid architectures "remains an open problem." What works on paper for all-to-all Brownian circuits may not transfer directly to the machines currently deployed in cloud quantum computing services.
No independent lab has reproduced the result, and no hardware vendor has publicly committed to implementing the benchmark. The authors tested their method theoretically and against one existing spoofing attack: the specific Harvard spoofer their framework is designed to catch. Other classical algorithms not tested in the paper may or may not be caught by the same mathematical separation.
For enterprises and national laboratories evaluating quantum hardware today, this is the gap between a promising paper and a practical tool. Linear cross-entropy has been the field's working yardstick for quantum advantage since Google's 2019 supremacy demonstration, and a Quantum Insider investigation earlier this year documented how extensively the industry has struggled to separate verified quantum performance from vendor noise. If that yardstick has been quietly broken by noise-exploiting classical algorithms, and the Bentsen-Fefferman paper says it has, then every vendor claiming quantum advantage on that benchmark owes its customers an explanation. Log XEB may offer a way to get one cheaply, and for the enterprise buyer or VC evaluating a quantum vendor's claims, that is the difference between a real purchase decision and a science project. If adopted, it would function as a de facto filter: vendors whose hardware fails the benchmark would face credibility pressure, while those that pass would gain a demonstrable edge in enterprise sales conversations. Whether it generalizes to the architectures actually used in production hardware (IBM's grid, Google's similar layout, trapped-ion topologies) is the question the paper cannot answer from a whiteboard.
The authors are Yinchen Liu from the University of Waterloo, Michael J. Gullans from the University of Maryland and NIST, Soumik Ghosh from the University of Chicago, Bill Fefferman, and Gregory Bentsen from William & Mary. The paper is posted on arXiv as 2605.22909.