The most important scientific papers, decoded. 309 papers analyzed from arXiv and beyond.
A new study reveals that vision-language models can reliably audit computer-use agents on straightforward tasks, but start diverging significantly when the work gets messier. Researchers Marta Sumyk and Oleksandr Kosovan published "CUAAudit" on arXiv (March 11, 2026), evaluating five VLMs as au...
OpenAI has released a new training dataset that significantly improves how language models prioritize conflicting instructions, addressing a critical vulnerability that hackers have exploited to jailbreak AI systems. The dataset, called IH-Challenge, was published on arXiv (arXiv:2603.10521) on March 11, 20...