Deep Learning of ECGs From US Veterans to Predict Atrial Fibrillation
Quick Takes
- A deep learning model trained on outpatient Veterans Affairs (VA) sinus ECGs shows promise in predicting incidence of AF in VA and non-VA populations.
- Deep learning may be used to identify patients at high risk of developing AF for consideration for additional monitoring to prevent long-term complications from AF.
Study Questions:
Does a deep learning model using routinely acquired outpatient 12-lead electrocardiograms (ECGs) in sinus rhythm predict the presence of atrial fibrillation (AF)?
Methods:
This was a prognostic study using 12-lead ECGs extracted from six Veterans Affairs (VA) sites and obtained between January 1, 1987, and December 31, 2022. ECGs were included if they were in sinus rhythm and collected in the outpatient setting; they were excluded if they had poor data quality, showed paced rhythms, or could not be paired with age and sex information. An external test data set consisted of ECGs from a non-VA medical center obtained between March 1, 2005, and December 31, 2018. Cases of concurrent AF were defined as sinus rhythm ECGs that could be paired with ≥1 ECG in AF or flutter within 31 days, whereas controls were ECGs without AF or flutter. Deep learning model performance for clinical prediction of AF was compared across VA sites and a non-VA academic medical center, and against conventional clinical prediction tools (CHA2DS2-VASc and a risk factor regression model). An exploratory analysis simulated the 1-year prospective prediction of a patient’s first case of AF.
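The case definition above can be illustrated with a short sketch. It assumes, hypothetically, that each ECG is reduced to a patient ID, an acquisition date, and a rhythm flag, and that "within 31 days" means 31 days in either direction; the summary does not state the direction of the window. The names `Ecg` and `label_concurrent_af` are illustrative and are not taken from the study, and treating every unpaired sinus ECG as a control is a simplification of the study's control definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Ecg:
    patient_id: str
    acquired: date
    rhythm: str  # "sinus" or "af_flutter" (hypothetical coding)


def label_concurrent_af(ecgs, window_days=31):
    """Return (sinus_ecg, is_case) pairs.

    Assumption: a sinus ECG is a case if the same patient has at least one
    AF/flutter ECG within `window_days` before or after it.
    """
    # Collect AF/flutter acquisition dates per patient.
    af_dates = {}
    for e in ecgs:
        if e.rhythm == "af_flutter":
            af_dates.setdefault(e.patient_id, []).append(e.acquired)

    labeled = []
    for e in ecgs:
        if e.rhythm != "sinus":
            continue  # only sinus rhythm ECGs are model inputs
        is_case = any(
            abs((d - e.acquired).days) <= window_days
            for d in af_dates.get(e.patient_id, [])
        )
        labeled.append((e, is_case))
    return labeled


# Toy data, not from the study.
example = [
    Ecg("p1", date(2020, 1, 1), "sinus"),
    Ecg("p1", date(2020, 1, 20), "af_flutter"),
    Ecg("p1", date(2020, 6, 1), "sinus"),
    Ecg("p2", date(2020, 1, 1), "sinus"),
]
labels = [is_case for _, is_case in label_concurrent_af(example)]
```

In this toy example only the first sinus ECG is labeled a case, because it falls 19 days before the patient's AF or flutter ECG.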
Results:
A total of 907,858 outpatient ECGs in sinus rhythm from 277,528 patients were evaluated, with 28,117 ECGs (3.1%) having a documented case of AF within 31 days. The VA cohort ECGs were from patients who were predominantly male (93.6%) and White (62.4%), with a mean age of 62.4 years and an average CHA2DS2-VASc score of 1.6; >10% had comorbidities such as heart failure, diabetes, or prior myocardial infarction. In the external cohort (n = 72,483 ECGs from 44,754 patients), 1,736 cases (2.4%) of AF within 31 days were identified. This cohort included more women (52.5%), had a mean age of 59.5 years, was still predominantly White (74.8%), and had a lower prevalence of comorbidities but the same mean CHA2DS2-VASc score as the VA cohort. Patients with concurrent AF were older, more often male, and more often White, with a higher incidence of comorbidities and higher CHA2DS2-VASc scores than controls.
In the VA population, the deep learning model had an overall area under the receiver operating characteristic curve (AUROC) of 0.86 (95% confidence interval [CI], 0.85-0.86), accuracy of 0.78 (95% CI, 0.77-0.78), and F1 score (the harmonic mean of positive predictive value and sensitivity) of 0.3 (95% CI, 0.3-0.31). In the external cohort, the model had an AUROC of 0.93 (95% CI, 0.93-0.94), accuracy of 0.87 (95% CI, 0.86-0.88), and F1 score of 0.46 (95% CI, 0.44-0.48). Brier scores of 0.02 across both VA and external sites and a nonsignificant Spiegelhalter z test (p = 0.06 across all sites) suggested that the model was well calibrated. The deep learning model had a higher AUROC (0.86; 95% CI, 0.86-0.87) than the CHA2DS2-VASc score (0.7; 95% CI, 0.7-0.7) and the risk factor regression model (0.73; 95% CI, 0.73-0.74). At the screening threshold corresponding to a fixed sensitivity of 25%, the number needed to screen was lower with the deep learning model (2.47 individuals) than with the regression model (11.48) or the CHA2DS2-VASc score (12.01). Exploratory analysis of 1-year prospective prediction of AF showed AUROCs ranging from 0.8 to 0.85 and accuracies from 0.73 to 0.79 at VA sites, and an AUROC of 0.79 with accuracy of 0.72 at a non-VA medical center.
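For readers less familiar with these metrics, the sketch below shows how F1 and Brier scores are conventionally computed from predicted probabilities and observed outcomes. The number needed to screen is computed here as the reciprocal of the positive predictive value, which is an assumption; the summary does not state the formula the authors used. All function names and the toy data are illustrative and do not come from the study.

```python
def ppv_and_sensitivity(probs, outcomes, threshold):
    """Positive predictive value and sensitivity at a probability threshold."""
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, outcomes) if p < threshold and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, sensitivity


def f1_score(ppv, sensitivity):
    """F1: harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def number_needed_to_screen(ppv):
    """Assumption: NNS taken as 1 / PPV (formula not given in the summary)."""
    return 1.0 / ppv


# Toy data, not from the study: predicted AF probabilities and true labels.
probs = [0.9, 0.8, 0.3, 0.1]
outcomes = [1, 0, 1, 0]

ppv, sens = ppv_and_sensitivity(probs, outcomes, threshold=0.5)
f1 = f1_score(ppv, sens)
brier = brier_score(probs, outcomes)
nns = number_needed_to_screen(ppv)
```

Under the same assumption, a number needed to screen of 2.47 would correspond to a positive predictive value of roughly 0.40. Because F1 depends on positive predictive value, it can remain modest when the outcome is uncommon (about 3% of ECGs here), even when the AUROC is high.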
Conclusions:
This prognostic study showed that a convolutional neural network trained using outpatient 12-lead ECGs in sinus rhythm from US veterans successfully predicted the presence of AF within 31 days in VA and non-VA populations with a diversity of demographic characteristics and comorbidities.
Perspective:
Deep learning offers a way to analyze large amounts of data that would be impractical for a clinician to review manually, such as data generated by ECGs, and to generate predictions that identify high-risk patients who warrant further evaluation. This prognostic study demonstrated a model that performed well in predicting 31-day risk of AF in both VA and non-VA populations and outperformed conventional clinical prediction tools. Utilizing machine learning may enable earlier identification of patients at risk for developing AF. Further studies will be needed to determine how deep learning-based early identification of patients at high risk of developing AF impacts clinical outcomes.
Clinical Topics: Arrhythmias and Clinical EP, Atrial Fibrillation/Supraventricular Arrhythmias
Keywords: Atrial Fibrillation, Deep Learning, Electrocardiography