Understanding Spectrum Bias In Algorithms Derived By Artificial Intelligence: A Case Study In Detecting Aortic Stenosis Using Electrocardiograms

May 16, 2021

Background

The number of diagnostic tests derived from artificial intelligence (AI) and machine learning algorithms is increasing. Spectrum bias can arise when a diagnostic test is derived from study populations with different disease spectra than the target population. This bias is well described in studies of test performance but has not been previously evaluated in AI-derived algorithms. We used a real-world AI-derived electrocardiogram (AI-ECG) algorithm to detect severe aortic stenosis (AS) to demonstrate spectrum bias across the range of aortic valve disease severity.

Methods

We identified all adult patients at the Mayo Clinic Minnesota, Arizona and Florida campuses between January 1, 1989 and September 30, 2019 who had a transthoracic echocardiogram within 180 days after an ECG. Two patient cohorts were derived based on the composition of the comparator group: a general cohort comparing severe AS to any non-severe AS and a limited cohort comparing severe AS to no AS. We developed two AI-ECG models, one from each cohort. The performance of each model was assessed on its respective holdout test group.
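The cohort definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `severity` field, its four labels, and the `build_cohorts` helper are hypothetical, and the sketch assumes that "any non-severe AS" covers patients with no, mild, or moderate AS.

```python
from typing import Dict, List, Tuple

# Hypothetical severity labels; the abstract does not specify the grading scheme.
NON_SEVERE = {"none", "mild", "moderate"}

Labeled = List[Tuple[Dict, int]]


def build_cohorts(records: List[Dict]) -> Tuple[Labeled, Labeled]:
    """Return (general, limited) cohorts as lists of (record, label) pairs.

    Label 1 marks severe AS. The general cohort uses every non-severe
    patient as a comparator; the limited cohort keeps only patients
    with no AS and drops the intermediate grades.
    """
    general: Labeled = []
    limited: Labeled = []
    for rec in records:
        sev = rec["severity"]
        if sev == "severe":
            general.append((rec, 1))
            limited.append((rec, 1))
        elif sev in NON_SEVERE:
            general.append((rec, 0))
            if sev == "none":
                limited.append((rec, 0))
    return general, limited


# Toy records for illustration only; these are not study data.
toy = [
    {"id": i, "severity": s}
    for i, s in enumerate(["none", "mild", "moderate", "severe", "none"])
]
general, limited = build_cohorts(toy)
print(len(general), len(limited))
```

The only difference between the two cohorts is the comparator group: the positive (severe AS) cases are the same in both.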

Results

Overall, 258,607 patients had valid ECG and echocardiogram pairs. Using the optimal decision threshold, the area under the receiver operating characteristic curve (AUC) was 0.87 and 0.91 for the general and limited models, respectively. Sensitivity and specificity were 80% and 81% for the general model and 84% and 84% for the limited model. When the AI-ECG model derived from the limited cohort was applied to patients in the general cohort, the sensitivity, specificity and AUC were 83%, 73% and 0.86, respectively. In general, models should be applied to classification tasks that are similar to their initial training and exposure.
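To make the mechanism concrete, the following Python sketch computes sensitivity, specificity and AUC for one fixed scoring rule against two comparator groups. Everything in it is an assumption made for illustration: the scores are synthetic Gaussian draws, the group means, sample sizes and threshold are arbitrary, and none of the values come from the study.

```python
import random


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when a score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic model outputs (assumed): higher scores for more severe disease.
rng = random.Random(0)
none_scores = [rng.gauss(0.0, 1.0) for _ in range(500)]    # no AS
mid_scores = [rng.gauss(1.0, 1.0) for _ in range(500)]     # mild or moderate AS
severe_scores = [rng.gauss(2.5, 1.0) for _ in range(300)]  # severe AS

threshold = 1.25  # arbitrary cut-off chosen for illustration

# Limited comparator: severe AS versus no AS only.
limited_scores = severe_scores + none_scores
limited_labels = [1] * len(severe_scores) + [0] * len(none_scores)

# General comparator: severe AS versus every non-severe patient.
general_scores = severe_scores + none_scores + mid_scores
general_labels = [1] * len(severe_scores) + [0] * (len(none_scores) + len(mid_scores))

sens_lim, spec_lim = sensitivity_specificity(limited_scores, limited_labels, threshold)
sens_gen, spec_gen = sensitivity_specificity(general_scores, general_labels, threshold)
auc_lim = auc(limited_scores, limited_labels)
auc_gen = auc(general_scores, general_labels)

print(f"limited: sens={sens_lim:.2f} spec={spec_lim:.2f} auc={auc_lim:.2f}")
print(f"general: sens={sens_gen:.2f} spec={spec_gen:.2f} auc={auc_gen:.2f}")
```

In this toy setup the severe cases and the threshold are shared, so sensitivity is identical in both evaluations, while specificity and AUC fall once intermediate-severity patients join the comparator group. The direction of that change mirrors the drop in specificity reported above when the limited model was applied to the general cohort, though the magnitudes here carry no meaning.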

Conclusion

While AI-ECG in both general and limited models performed robustly in identifying severe AS, there is evidence that spectrum bias may exist based on disease-severity selection. Although the effect of the bias may be modest in this example, clinicians should be aware of the existence of such a bias in AI-derived algorithms and methods to ensure proper interpretation of test performance and generalizability in clinical practice.

Published In:
Journal of the American College of Cardiology
Authors:
Andrew S. Tseng, Michal Shelly-Cohen, Zachi Itzhak Attia, Peter Noseworthy, Paul Friedman, and Francisco Lopez-Jimenez