A study published Tuesday in the journal Radiology puts a number on something radiologists have quietly feared since generative AI became mainstream: the images they are reading might not be real. Researchers tested whether 17 practicing radiologists could tell the difference between real X-rays and synthetic ones generated by ChatGPT. Without a specific warning to look for deepfakes, the radiologists identified that something was wrong only 41 percent of the time. Even after being told to scrutinize for synthetic images, they correctly distinguished real from fake just 75 percent of the time.
The attack is not technically demanding. The researchers used simple text prompts specifying an anatomical location, a disorder, and a level of image noise. The prompts did not require any specialized knowledge of radiology or image generation. That low barrier to entry is the central concern: if a bad actor can produce convincing radiographs with off-the-shelf AI, the implications for medical fraud, insurance abuse, and deliberate misdiagnosis are significant.
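The study does not publish its exact prompts, but the recipe it describes is easy to illustrate. The sketch below assumes the OpenAI Python SDK's image-generation endpoint; the model name, the prompt wording, and the `make_radiograph_prompt` helper are illustrative assumptions, not the researchers' actual method, and the study itself worked through ChatGPT rather than a script.

```python
# Illustrative only: the study's actual prompts and model are not reproduced here.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in OPENAI_API_KEY.
from openai import OpenAI


def make_radiograph_prompt(location: str, disorder: str, noise: str) -> str:
    # The three ingredients the paper describes: anatomy, pathology, and image noise.
    return (
        f"A frontal {location} X-ray showing {disorder}, "
        f"rendered as a grayscale radiograph with {noise} image noise."
    )


client = OpenAI()
prompt = make_radiograph_prompt("chest", "a right lower lobe pneumonia", "mild")

# Hypothetical call: the model choice is an assumption, not what the researchers used.
response = client.images.generate(model="dall-e-3", prompt=prompt, n=1, size="1024x1024")
print(response.data[0].url)
```

The point of the sketch is how little it contains: no radiology vocabulary beyond what a layperson could copy from a textbook index, and no image-editing skill at all.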
The detection problem is not limited to humans. The researchers also tested four multimodal AI models on the same deepfake images. The best-performing model caught the synthetic X-rays 85 percent of the time; the worst managed only 57 percent. That means even the best of the tested models would let roughly one in seven synthetic images through undetected.
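The one-in-seven figure follows directly from the reported accuracies. A quick back-of-the-envelope check, with the volume of fakes chosen purely for illustration:

```python
# Implied miss rates for the best and worst detection models in the study.
best_accuracy = 0.85   # best multimodal model on the synthetic X-rays
worst_accuracy = 0.57  # worst multimodal model

for name, acc in [("best", best_accuracy), ("worst", worst_accuracy)]:
    miss_rate = 1 - acc
    print(f"{name} model misses {miss_rate:.0%} of fakes, about 1 in {1 / miss_rate:.1f}")

# Illustrative volume: 1,000 synthetic images slipped into a claims or imaging pipeline.
fakes = 1000
print(f"Best model would pass roughly {round(fakes * (1 - best_accuracy))} of {fakes} fakes.")
```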
The Radiology paper is peer-reviewed, which gives the findings more weight than a preprint or a vendor-sponsored report. The study was conducted by an international team and the methodology is described in enough detail to evaluate. That matters because the market for AI-generated medical imagery is moving faster than the regulatory or institutional infrastructure around it. Hospitals are deploying AI diagnostic tools. Insurers are processing claims that may include AI-generated imaging. Law firms are evaluating injury cases that depend on radiographic evidence. None of those systems currently have a reliable deepfake check in the loop.
Radiology has been one of AI's genuine success stories, with useful tools reaching clinical practice. The concern the study surfaces is not that AI is useless in imaging but that the barrier to generating convincing synthetic medical images has dropped low enough that relying on radiographic evidence without verifying its authenticity is no longer defensible. A radiologist who spots something suspicious in an X-ray and wants to confirm it is real has no obvious tool to do that quickly in a clinical workflow.
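What such a quick check might look like is easy to sketch, even though nothing like it is standardized or deployed today. The example below is hypothetical: it assumes a pydicom-readable file and a made-up private tag where a provenance manifest might live, and it illustrates the workflow step that is currently missing rather than any existing product.

```python
# Hypothetical provenance check, not a deployed standard.
# Assumes pydicom (pip install pydicom); the private tag below is invented for illustration.
import pydicom

PROVENANCE_TAG = (0x7777, 0x0010)  # made-up private tag for a provenance manifest


def has_provenance(path: str) -> bool:
    ds = pydicom.dcmread(path)
    return PROVENANCE_TAG in ds


def flag_for_review(path: str) -> None:
    if not has_provenance(path):
        # Absence of provenance does not prove the image is synthetic;
        # it only means authenticity cannot be verified automatically.
        print(f"{path}: no provenance manifest, route to manual verification")


flag_for_review("study_0001.dcm")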
What is missing from the study is a clear solution. The researchers demonstrate the problem more clearly than they prescribe a fix. Better detection models would help, but the 57-to-85 percent range tells you those models are not ready for autonomous clinical use. Watermarking standards for AI-generated medical images are being discussed in standards bodies but are not yet in clinical deployment. Hospital procurement processes for AI imaging tools generally do not include deepfake resistance as a checkpoint. The study is a warning more than a roadmap.