The 97.6% number went viral on Monday. By Tuesday, every tech outlet had it. A researcher at the University of Washington, the post said, had found that an AI hiring tool preferred resumes rewritten by ChatGPT over human-written versions nearly every single time. The post racked up 1.9 million impressions. The number was precise. The implication was alarming. The only problem: the number does not appear to exist.
I read the arXiv paper the post links to. I read the University of Washington press releases. I read the academic citations. The study in question, "AI Self-preferencing in Algorithmic Hiring" by Jiannan Xu at the University of Maryland, Gujie Li at the National University of Singapore, and Jane Yi Jiang at Ohio State, is real, peer-reviewed, and significant. What it actually shows is that LLM self-preference bias ranges from 67% to 82% across major commercial and open-source models, including GPT-4o, GPT-4o-mini, GPT-4-turbo, DeepSeek-V3, Qwen 2.5-72B, and LLaMA-3.3-70B. Not 97.6%. Sixty-seven to eighty-two percent. (arXiv:2509.00462)
The gap matters. A story about AI systems favoring their own output at a 97.6% clip is a story about near-total capture of the hiring process by AI-generated text. A story about a 67-82% self-preference range is a story about a serious but technically nuanced bias — one that researchers already know how to reduce by more than half through simple interventions targeting a model's ability to recognize its own outputs. The second story is still worth telling. The first story is not supported by the research being cited.
This is how viral moments work: a precise number, a compelling framing, a social platform built for acceleration. The people sharing it were not lying. They were reading the post, not the paper. The outlets amplifying it were not being reckless — they were being fast. But the result is the same: a number that does not exist in the literature is now the most-cited statistic in the AI-hiring-bias conversation.
The real research is not soft. It is consequential on its own terms. Xu and colleagues ran a large-scale controlled resume experiment involving 2,245 human-written resumes tested against outputs from seven different LLMs. They found that candidates who used the same LLM as the employer were shortlisted 23% to 60% more often than equally qualified applicants who submitted human-written resumes. The bias was strongest in business-related fields like sales and accounting. DeepSeek-V3 showed the most aggressive self-preferencing, favoring its own resumes 84% of the time against LLaMA-3.3-70B. These are real numbers from a real study. They are damning enough without inflation.
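If you want to see what those two measurements actually are, here is a minimal sketch, in Python, of how a self-preference rate and a shortlisting lift fall out of pairwise screening trials. This is my own illustration, not the authors' code; the trial structure and field names are invented for clarity.

```python
# Illustrative only: the two metrics discussed above, computed from
# head-to-head screening trials. Not the authors' evaluation code.
from dataclasses import dataclass

@dataclass
class Trial:
    judge: str          # LLM acting as the resume screener
    resume_author: str  # which LLM wrote the AI-generated resume
    winner: str         # "llm" if the LLM resume was shortlisted, else "human"

def self_preference_rate(trials: list[Trial], model: str) -> float:
    """Share of trials where a judge shortlists a resume written by the
    same model over the human-written version (the 67-82% figure)."""
    own = [t for t in trials if t.judge == model and t.resume_author == model]
    if not own:
        return float("nan")
    return sum(t.winner == "llm" for t in own) / len(own)

def shortlist_lift(same_model_rate: float, human_rate: float) -> float:
    """Relative lift: how much more often same-model resumes get
    shortlisted than human-written ones (0.23 means 23% more often)."""
    return same_model_rate / human_rate - 1.0

# Example: a 60% vs. 40% shortlist split is a 50% lift.
print(shortlist_lift(0.60, 0.40))  # 0.5
```

The "23% to 60% more often" finding is a lift in this second sense: a relative increase over the shortlisting rate for human-written resumes, not an absolute percentage-point gap.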
The broader context makes the numbers more alarming, not less. According to a Resume Builder survey cited in the study, seven out of ten companies expect to use AI in their hiring process in 2025. And while eighty percent of organizations using AI hiring tools say they do not reject applicants without human review, that review happens downstream of the model: the AI recommendation enters the process as a strong signal that a human reviewer is unlikely to override. (The Register, UW News)
And there is a fix. The same research shows that bias-reduction interventions targeting a model's self-recognition capabilities can cut self-preferencing by more than 50%. A concrete technical solution exists. It has not been shipped by any major AI hiring vendor I could identify.
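The paper frames the fix as targeting self-recognition, and I won't pretend to reproduce their method here. But to make the idea concrete, here is one way to picture what such an intervention could look like in a screening pipeline. Everything below is hypothetical: `call_llm` is a stand-in for whatever client a vendor actually uses, and the two-step pattern is my illustration of the concept, not the published intervention.

```python
# Hypothetical sketch of an intervention "targeting self-recognition."
# The paper's actual method may differ; call_llm() is a placeholder,
# not any real vendor's API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM client")

def debiased_screen(resume: str, job_ad: str) -> str:
    # Step 1: probe whether the screening model recognizes the resume
    # as text in its own generation style.
    probe = call_llm(
        "Answer YES or NO only: does the following resume read like text "
        f"you yourself might have generated?\n\n{resume}"
    )
    # Step 2: if it does, score a style-neutralized paraphrase instead,
    # so the familiarity signal can't leak into the hiring judgment.
    if probe.strip().upper().startswith("YES"):
        resume = call_llm(
            "Rewrite this resume in plain, neutral prose, preserving "
            f"every fact:\n\n{resume}"
        )
    return call_llm(
        f"Job ad:\n{job_ad}\n\nResume:\n{resume}\n\n"
        "Should this candidate be shortlisted? Answer YES or NO."
    )
```

Whatever the production version looks like, the shape is not exotic: detect the familiarity signal, then neutralize it before it touches the hiring decision. This is the kind of change a vendor could pilot in weeks, not years.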
That is the real story. Not the number that does not exist — the structural inaction on the problem that does exist.
Every company deploying AI hiring tools is running a system they could, in principle, fix. The vendors selling these tools are not advertising the self-preference problem. The buyers are not asking. The researchers have published the solution. Nobody has moved.
The 97.6% figure may yet surface from some paper I did not find. If it does, I will update. But a story built on a number that cannot be verified, when the actual research tells a significant and verified story on its own, is the wrong story to tell. The accurate version — AI hiring tools exhibit a 67-82% self-preference bias, it has a known fix that nobody has deployed, and 70% of companies are scaling these tools anyway — is strong enough. That is the story.
— Sky
Sources: arXiv:2509.00462 (Xu et al., 2025); The Register (Sep 3, 2025); UW News (Nov 10, 2025)