Some AI cancer tools analysing tumour images may rely on visual shortcuts rather than genuine biological signals, according to new research.
A large analysis of more than 8,000 patient samples across four cancer types found artificial intelligence models often achieved high accuracy by relying on statistical correlations rather than true biological signals.
The findings raise concerns that some AI pathology tools, designed to help identify cancer faster and potentially reduce testing costs, may not yet be reliable enough for routine clinical care.
The study compared the performance of machine learning systems across breast, colorectal, lung and endometrial cancers.
Instead of detecting specific genetic mutations directly, some models appeared to rely on related clinical features.
For example, rather than identifying mutations in the cancer-related BRAF gene, a model might detect a linked feature called microsatellite instability (MSI), a condition in which the cell’s DNA repair system does not function properly.
Because these features often occur together, the system may predict BRAF mutation status using that association. This means predictions may only remain accurate when both features appear together.
“It’s a bit like judging a restaurant’s quality by the queue of people waiting to get in: it’s a useful shortcut, but it’s not a direct measure of what’s happening in the kitchen,” said Dr Fayyaz Minhas, associate professor at the University of Warwick and lead author of the study.
“Many AI pathology models are doing the same thing, relying on correlations between biomarkers or on obvious tissue features, rather than isolating biomarker-specific signals. And when conditions change, these shortcuts often fall apart.”
When researchers tested the models within specific patient subgroups, such as only high-grade breast cancers or tumours with microsatellite instability, accuracy fell substantially.
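The effect of subgroup testing can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patient counts are invented, the "model" is a toy rule that predicts BRAF status purely from MSI status, and balanced accuracy (the mean of sensitivity and specificity) is used as a convenient metric, not necessarily the one used in the study.

```python
# Hypothetical cohort: (msi, braf) -> number of patients. Invented counts,
# chosen only so that MSI and BRAF mutation tend to occur together.
COHORT = {(1, 1): 40, (1, 0): 10, (0, 1): 10, (0, 0): 140}


def shortcut_model(msi):
    """Toy 'model' that predicts BRAF status purely from MSI status."""
    return msi


def balanced_accuracy(cohort, predict):
    """Mean of sensitivity and specificity over a weighted cohort."""
    tp = fn = tn = fp = 0
    for (msi, braf), n in cohort.items():
        pred = predict(msi)
        if braf == 1:
            if pred == 1:
                tp += n
            else:
                fn += n
        else:
            if pred == 0:
                tn += n
            else:
                fp += n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def subgroup(cohort, msi_value):
    """Keep only patients with the given MSI status."""
    return {key: n for key, n in cohort.items() if key[0] == msi_value}


overall = balanced_accuracy(COHORT, shortcut_model)
msi_only = balanced_accuracy(subgroup(COHORT, 1), shortcut_model)

print(f"Whole cohort:      {overall:.2f}")
print(f"MSI tumours only:  {msi_only:.2f}")
```

In this toy setting the shortcut scores well across the whole cohort, but inside the MSI-only subgroup it labels every tumour as BRAF-mutated, so it carries no information about which of those patients actually has the mutation.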
For some prediction tasks, the advantage of deep learning over existing clinical information was limited.
AI systems achieved accuracy scores of just over 80 per cent when predicting biomarkers, compared with around 75 per cent using tumour grade alone, a measure already assessed by pathologists.
Kim Branson, senior vice president and global head of artificial intelligence and machine learning at GSK and co-author of the study, said: “We’ve found that predicting a BRAF mutation by looking at correlated features like MSI is often like predicting rain by looking at umbrellas: it works, but it doesn’t mean you understand meteorology.
“Crucially, if a model cannot demonstrate information gain above a simple pathologist-assigned grade, we haven’t advanced the field; we’ve just automated a shortcut.”
The researchers said machine learning could still prove useful for research, drug development screening and clinical decision support.
However, they argued that future AI systems should move beyond correlation-based learning and instead model underlying biological relationships.
Dr Minhas added: “This research is not a condemnation of AI in pathology. It is a wake-up call.
“Current models may perform well in controlled settings but rely on statistical shortcuts rather than genuine biological understanding.
“Until more robust evaluation standards are in place, these tools should not be seen as replacements for molecular testing, and it is essential that clinicians and researchers understand their limitations and use them with appropriate caution.”
Professor Nasir Rajpoot, director of the Tissue Image Analytics Centre at the University of Warwick, said: “This study highlights a critical point about the rollout of AI in medicine: to deliver real and lasting impact, the value of AI-based clinically important predictions must be judged through rigorous, bias-aware evaluation, rather than relying solely on headline accuracies that fail to account for confounding effects.”

