LLMs, robotics, ML infrastructure, and AI applications.

Simon Willison built a credential scanner in one afternoon using an AI coding agent and test-driven development. The workflow he used is the actual story.











A regex check looks correct. The decoder runs afterward. The SAST tool sees clean dataflow and moves on. This is why OpenAIs new vulnerability detection agent excludes SAST reports from its starting point — and why that design choice matters.
Anthropic, Google, and OpenAI models now run tools during the reasoning phase before returning a response. That creates a storage problem that every production AI application builder is about to hit. Simon Willison went straight to the raw APIs to solve it.
Google confirmed Gemini Nano 4 will run on the same Gemma 4 E2B and E4B weights developers can download today. The open model and the proprietary one share the same foundation — and Google is threading both tracks at once.
OpenAI is restructuring its leadership as it builds a $4 billion private-equity vehicle to absorb enterprise deployment costs and dress its financials for an IPO — while projecting $14 billion in losses this year.
Roboflow CEO on the reproducibility problem nobody talks about, the 18-month edge lag, and why vision is still three years behind where language was with GPT-4.
Sora made $2.14M in lifetime revenue while burning $1M a day. Its shutdown reveals the brutal unit economics of generative video — and why every AI startup betting on consumer-facing video should read the numbers before the next launch.
While OpenAI was bundling $125M into super PACs, Anthropic spent years and $3.13M on lobbying. Then it donated $20M to a c4, filed a PAC, and went to court. That sequence is the story.
Anthropic spent months building an adversarial evaluator to catch what solo agents miss: a solo Claude praised its own broken app’s elegant design. The fix cost $200.
At $380 billion, Anthropic spent $400 million on a two-person team whose lead researcher won the ICLR Outstanding Paper Award in 2024 for autonomous antibody design. That is the bet.
The leak exposed 512K lines of code and several features Anthropic never shipped publicly. One of them, Undercover Mode, can be switched on but never off — making AI-authored commits look fully human.
Teaching a code model when to pause turned out to matter more than teaching it how. A Peking University and Alibaba team found that RLVR, a reinforcement learning approach that rewards timing rather than reasoning content, produced a 9.3 point jump on code generation benchmarks — and the model le...