Deep Learning in Automating Cardiac Imaging: Challenges and Mitigation Strategies

Training data and generalization can limit the performance of deep learning (DL) algorithms for automated cardiac measurements, while the choice of evaluation metrics can limit the ability to detect underperforming algorithms, according to new research published in JACC: Cardiovascular Imaging.

In a study of a clinical use case of automated DL echocardiographic image analyses for LVEF estimation in heart failure management, David Pasdeloup, PhD, et al., included 3,538 patients from three different data sets: EchoNet (n=10,030) for the internal data set and HUNT4 (n=1,762) and CAMUS (n=500) for external testing. They examined the impact of three challenges for using DL for cardiac imaging measurements: relevance of evaluation metrics, effect of data imbalance in training, and generalization performance of DL models when confronted with new, unseen external data.

For the evaluation challenge, results showed a mismatch between available testing data and the clinical problem. The area under the receiver-operating characteristic curve in particular varied from 0.71 to 0.98 due to changes in population characteristics. DL also could not fully account for uncertainty in reference values.
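To illustrate why the area under the curve can shift with population characteristics alone, the following minimal sketch simulates one hypothetical model with a fixed measurement error and scores it on two made-up populations. The 40% LVEF threshold, the noise level, and both population distributions are assumptions chosen for illustration; none of them come from the study.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate_auc(sample_true_lvef, n=500, noise_sd=5.0, threshold=40.0, seed=0):
    """Score the same hypothetical model (fixed error) on a given population."""
    rng = random.Random(seed)
    true = [sample_true_lvef(rng) for _ in range(n)]
    pred = [t + rng.gauss(0.0, noise_sd) for t in true]  # identical error model
    labels = [t < threshold for t in true]               # "reduced LVEF" cases
    scores = [-p for p in pred]                          # lower LVEF = more suspicious
    return auc(scores, labels)

# Hypothetical population A: clearly reduced or clearly preserved LVEF.
def mixed(rng):
    return rng.gauss(30.0, 5.0) if rng.random() < 0.5 else rng.gauss(60.0, 5.0)

# Hypothetical population B: most patients close to the decision threshold.
def borderline(rng):
    return rng.gauss(42.0, 6.0)

auc_mixed = simulate_auc(mixed)
auc_borderline = simulate_auc(borderline)
print(f"AUC, mixed population:      {auc_mixed:.2f}")
print(f"AUC, borderline population: {auc_borderline:.2f}")
```

The model error is identical in both runs, yet the borderline population yields a lower value, which is one reason this metric can be hard to compare across data sets.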

Investigators suggest an extended version of Bland-Altman analysis as a more robust evaluation method and provide the code for this analysis as open source in a GitHub repository and as a Python package.
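For readers unfamiliar with the underlying method, a standard Bland-Altman analysis can be sketched as follows. This is the classic form (bias and 95% limits of agreement), not the investigators' extended version or their package; the function name and the LVEF values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(reference, predicted):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean difference (predicted - reference); the 95% limits of
    agreement are bias +/- 1.96 sample standard deviations of the differences.
    """
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired LVEF values (%), for illustration only.
reference_lvef = [50.0, 60.0, 40.0, 55.0]
predicted_lvef = [52.0, 58.0, 43.0, 55.0]

bias, lower, upper = bland_altman(reference_lvef, predicted_lvef)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

Unlike a classification metric, the bias and limits of agreement are expressed in the same units as the measurement itself, which makes them easier to interpret clinically.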

For the training data challenge, results showed that data imbalance and lack of diversity impacted model performance, but that enrichment and oversampling techniques could improve it when a small subset of unique cases was used for training – a reduction of 40%. However, these strategies offered no additional benefit when the entire training set was available.
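As a rough illustration of oversampling, the sketch below duplicates randomly chosen cases from underrepresented LVEF ranges until every range is equally represented. The bin boundaries, function names, and example values are assumptions for illustration; the study's actual enrichment and oversampling procedures may differ.

```python
import random
from collections import Counter, defaultdict

def lvef_bin(lvef):
    """Hypothetical LVEF bins, used here only to define the imbalance."""
    if lvef < 40:
        return "reduced"
    if lvef < 50:
        return "mildly reduced"
    return "preserved"

def oversample_by_bin(samples, bin_of, seed=0):
    """Randomly duplicate cases so every bin matches the largest bin."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sample in samples:
        bins[bin_of(sample)].append(sample)
    target = max(len(members) for members in bins.values())
    balanced = []
    for members in bins.values():
        balanced.extend(members)
        # Draw with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Hypothetical training labels (LVEF, %), dominated by preserved function.
train_lvef = [62, 58, 65, 55, 60, 57, 45, 47, 30]
balanced = oversample_by_bin(train_lvef, lvef_bin)
print(Counter(lvef_bin(v) for v in balanced))
```

Because oversampling only repeats existing cases, it adds no new information, which is consistent with the finding that it offered no additional benefit once the entire training set was available.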

For the generalization challenge, results showed a degradation in performance when moving from internal data to external data. Investigators suggest this gap could be reduced substantially when domain-specific augmentations were applied during training and robustly tested.
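The specific augmentations used in the study are not described here. As one hypothetical example of what a domain-specific augmentation might look like for ultrasound, the sketch below randomly darkens an image with increasing depth, loosely mimicking variation in signal attenuation between scanners and patients. The function name, parameters, and image representation are all assumptions.

```python
import random

def depth_gain_augment(frame, max_attenuation=0.5, rng=None):
    """Apply a random, linearly increasing attenuation with depth.

    frame: list of rows (row 0 is nearest the probe), pixel values in [0, 1].
    Returns a new frame; the input is left unchanged.
    """
    rng = rng or random.Random()
    attenuation = rng.uniform(0.0, max_attenuation)
    n_rows = len(frame)
    augmented = []
    for i, row in enumerate(frame):
        depth = i / (n_rows - 1) if n_rows > 1 else 0.0
        gain = 1.0 - attenuation * depth  # 1.0 at the top, lowest at the bottom
        augmented.append([min(1.0, max(0.0, px * gain)) for px in row])
    return augmented

# Hypothetical 3x2 "image" with uniform brightness.
frame = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
augmented = depth_gain_augment(frame, rng=random.Random(0))
for row in augmented:
    print([round(px, 2) for px in row])
```

Applying such transformations at random during training exposes the model to acquisition conditions that the internal data set may not cover.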

JACC Central Illustration

"By considering evaluation metrics and training data distribution, and incorporating imaging domain knowledge, the design and evaluation of DL models can be improved, leading to more robust models, improved interpretation, and easier comparison across data sets," write the authors.

"Although these challenges and their mitigation strategies warrant further scrutiny, they should definitely be discussed and considered in future updates of the PRIME checklist," write Márton Tokodi, MD, PhD, and Ádám Szijártó, MD, in an accompanying editorial comment. "We look forward to seeing how the proposed strategies will shape the interpretation of forthcoming DL studies, and we hope they will reduce research waste by facilitating the development of novel and more robust DL models that will truly mature into products ready for clinical adoption rather than remain research prototypes."

Clinical Topics: Heart Failure and Cardiomyopathies, Noninvasive Imaging, Acute Heart Failure, Echocardiography/Ultrasound

Keywords: Deep Learning, Heart Failure, Echocardiography

