Abstract
Blood-based biomarkers have demonstrated strong performance for identifying cerebral amyloid pathology within individual cohorts. However, their clinical utility depends on portability across populations and assay platforms, and the impact of cross-cohort deployment on clinically actionable metrics, such as negative predictive value, remains insufficiently characterized. We analyzed data from two independent cohorts: the Alzheimer’s Disease Neuroimaging Initiative (n = 885) and the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease study (n = 822). Machine learning models were developed within each cohort to predict amyloid positron emission tomography status and continuous amyloid burden. Performance was evaluated using area under the receiver operating characteristic curve, accuracy, coefficient of determination, and root mean square error. Cross-cohort portability was assessed by pairwise external validation, transferring models bidirectionally between cohorts without retraining. Calibration, predictive values, and decision curve analysis were used to evaluate clinical utility. Within-cohort discrimination was high, with area under the curve reaching 0.917–0.918 in the Alzheimer’s Disease Neuroimaging Initiative and 0.870 in the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease cohort. Prediction of continuous amyloid burden was moderate (coefficient of determination up to 0.628 and 0.535, respectively). Cross-cohort deployment resulted in modest attenuation of discrimination but substantially greater degradation of clinically actionable performance. Negative predictive value declined from 0.831 to 0.644 when models trained in the Alzheimer’s Disease Neuroimaging Initiative were applied to the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease cohort, despite largely preserved discrimination. Calibration analyses demonstrated systematic misestimation of predicted probabilities, and decision curve analysis showed reduced net clinical benefit.
Biomarker distributions differed across cohorts, consistent with dataset shift. Blood-based biomarker models retain discrimination across cohorts but exhibit clinically meaningful degradation in predictive value under conditions approximating real-world deployment. Calibration instability and population differences substantially affect rule-out performance. These findings highlight the need for cross-cohort validation, calibration assessment, and assay-consistent biomarker generation before clinical implementation.
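Why negative predictive value can fall while discrimination is largely preserved can be illustrated with the standard Bayes-rule identity NPV = Sp(1 − p) / [Sp(1 − p) + (1 − Se)p], where Se is sensitivity, Sp is specificity, and p is prevalence, together with the usual decision-curve net benefit, TP/n − (FP/n) · p_t/(1 − p_t), at threshold probability p_t. The sketch below is a minimal illustration under these textbook definitions only; the function names and all numbers are hypothetical and are not the study's code, models, or data.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' rule: P(amyloid-negative | test-negative)."""
    true_neg = specificity * (1.0 - prevalence)      # negatives correctly ruled out
    false_neg = (1.0 - sensitivity) * prevalence     # positives wrongly ruled out
    return true_neg / (true_neg + false_neg)


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Decision-curve net benefit at threshold probability p_t: TP/n - FP/n * p_t / (1 - p_t)."""
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical operating point held fixed while only prevalence changes.
    se, sp = 0.85, 0.80
    for p in (0.30, 0.50):
        print(f"prevalence={p:.2f}  NPV={npv(se, sp, p):.3f}")
    # Hypothetical confusion counts at a 0.2 threshold probability.
    print(f"net benefit={net_benefit(tp=40, fp=20, n=100, threshold=0.2):.3f}")
```

With identical sensitivity and specificity, NPV is about 0.926 at a prevalence of 0.30 but about 0.842 at 0.50, so rule-out performance depends on the target population even when the classifier itself is unchanged. Miscalibration adds a second mechanism: a fixed probability cutoff applied to systematically misestimated probabilities moves the operating point (Se, Sp), which changes both NPV and net benefit.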
Keywords
Alzheimer’s disease, Plasma biomarkers, Amyloid PET, Machine learning, Pairwise external validation, Calibration, Negative predictive value