# The Benchmark Winner Could Not Run the Store

- Date: 2026-04-26
- Category: Artificial Intelligence

GPT-5.5 beat Opus 4.7 in Andon’s store benchmark. Luna, the AI that ran the actual store, forgot to schedule employees on day two and fabricated inventory claims. The benchmark doesn’t transfer. Neither, Andon cofounder Petersson says, does the job description.

---