# Why a 24% Score on a Reasoning Benchmark Is an Argument About Compute

- Date: 2026-04-06
- Category: Artificial Intelligence

A CoreThink AI pipeline that separates perception from rule induction pushed a weak LLM from 16% to 24.4% on ARC-AGI-2 without fine-tuning, and the ablation numbers show why the result matters for the test-time scaling debate.

---