MechaScreener: Large Language Model-Based Automated Screening for Systematic Reviews and Research

Systematic reviews (SRs) are the gold standard for evidence synthesis, but manually screening thousands of titles and abstracts creates a severe bottleneck. Existing automated tools have historically struggled to achieve the near-perfect recall (sensitivity) required for reliable reviews. We developed MechaScreener, a zero-shot automated screening tool that utilises a Large Language Model (LLM) to rank article relevance. The tool requires no initial training data or manual pre-screening: MechaScreener directly applies user-provided question elements (PICO) or inclusion/exclusion criteria to assign each reference an inclusion probability score (1-5). We evaluated the tool in two phases: a development phase using five reference libraries to optimise prompts, and an independent evaluation phase using 10 diverse Cochrane review libraries (comprising both randomised controlled trials and non-RCTs) containing over 58,000 references. In the evaluation dataset, MechaScreener achieved a perfect mean recall of 1.00 (100%, pooled 95% CI: 0.98-1.00), missing no relevant articles. Concurrently, it achieved an overall mean specificity of 0.61 (61%, pooled 95% CI: 0.59-0.60). Specificity varied from 0.21 in broad public health topics to 0.91 in precise pharmacological interventions, reflecting the tool's built-in conservatism when evaluating ambiguous abstracts. By safely eliminating over 60% of irrelevant literature during initial screening without compromising recall, MechaScreener functions as a highly reliable, low-effort first-pass filter, allowing researchers to substantially reduce manual workloads and reallocate resources toward full-text review and data extraction.
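To illustrate the zero-shot approach described above, the sketch below shows how PICO criteria might be turned into a scoring prompt and how a 1-5 score could be parsed from a model reply. This is a minimal hypothetical sketch, not the authors' implementation: the prompt wording, the `build_prompt` and `parse_score` helpers, and the conservative default (keep the reference when the reply is unparseable) are all assumptions; the actual LLM call is omitted.

```python
def build_prompt(pico: dict, title: str, abstract: str) -> str:
    """Assemble a zero-shot screening prompt from user-provided
    PICO elements (hypothetical template, not the published one)."""
    criteria = "\n".join(f"- {k}: {v}" for k, v in pico.items())
    return (
        "You are screening references for a systematic review.\n"
        f"Inclusion criteria (PICO):\n{criteria}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "On a scale of 1 (clearly irrelevant) to 5 (clearly relevant), "
        "how likely is this reference to meet the criteria? "
        "Answer with a single digit."
    )


def parse_score(reply: str) -> int:
    """Extract the first 1-5 digit from the model reply.

    Defaults to 5 (i.e. keep the reference for manual review) when no
    score can be parsed -- an assumed conservative fallback consistent
    with the recall-first behaviour the abstract describes."""
    for ch in reply:
        if ch in "12345":
            return int(ch)
    return 5
```

In practice, the prompt would be sent to an LLM for each reference and the parsed scores used to rank or filter the library; references scoring at or above a chosen threshold would pass to manual screening.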