The Cochrane Collaboration, the London-based publisher renowned for producing the gold standard in health-related systematic reviews, is pushing back hard against the hype surrounding AI-powered scientific literature analysis. In a candid assessment published this week, its newly appointed editor-in-chief, who took the role in March 2026, said that after extensive testing, current AI tools are "far from ready for mainstream adoption" in evidence synthesis work.
The Hallucination Problem Runs Deep
Systematic reviews form the backbone of clinical practice guidelines, public-health policy, and treatment recommendations affecting millions of patients. When these reviews contain errors—whether fabricated data points or misinterpreted findings—the consequences extend far beyond academic corrections. Patients could receive false hope from studies that don't actually support claimed outcomes, while health systems waste resources on interventions that are either ineffective or actively unsafe. AI models trained to replicate human review processes bring a fundamental weakness: their well-documented tendency to hallucinate, meaning they confidently generate plausible-sounding but entirely fabricated information that requires rigorous human verification before any clinical application.
Black Box Bias Concerns Plague Private Sector Tools
Perhaps most troubling is the source of available AI solutions. According to Cochrane's evaluation, "most of the tools available were developed by private companies," which creates a significant conflict of interest when reviewing drugs and medical devices, an area requiring strict independence from industry influence. Beyond ownership structure, these proprietary systems operate as opaque "black boxes" that offer no transparency into how decisions are made. There's currently no way to verify whether such tools disproportionately include trials with results favorable to specific pharmaceutical companies while systematically excluding less flattering research. For systematic reviews meant to provide objective summaries of all available evidence, this lack of verifiability undermines methodological integrity at its foundation.
Cochrane's Testing Reveals Surprising Performance Issues
Beyond concerns about bias and hallucination, practical testing at Cochrane uncovered another uncomfortable reality: AI-assisted review workflows actually take longer than traditional manual approaches. Both the AI systems and their human operators require extensive training before producing reliable outputs, and even after that investment, "for each review, the whole process takes longer than doing the work manually." This performance gap suggests that organizations rushing to deploy AI for systematic reviews may be taking on accuracy risks without gaining the efficiency they expected in return.
Key Takeaways
- Current AI tools are prone to hallucinating fabricated information, so their output requires rigorous human verification before any clinical use
- Proprietary "black box" systems prevent verification of potential industry bias in study selection
- Cochrane found AI-assisted workflows slower than manual review, even after extensive training of both the tools and their operators
- Tool developers should focus on human-AI collaboration rather than full automation of systematic reviews
The Bottom Line
The assumption that machines can replace humans on all methodological tasks isn't just premature; it's dangerous when applied to evidence synthesis with direct clinical implications. Cochrane's real-world testing provides the kind of candid assessment that AI vendors won't share: current-generation tools are slower, less trustworthy, and potentially compromised by commercial interests. Before anyone deploys these systems in healthcare decision-making, we need open-source alternatives with verifiable methodology, or patients will pay the price for our enthusiasm.