A New Life for Old Algorithms?

Challenges of Individualized Risk Estimation

Primary cardiovascular prevention starts with accurate risk stratification to identify the highest-risk patients who will benefit from intensive management. The recently updated American College of Cardiology/American Heart Association (ACC/AHA) cardiovascular (CV) risk assessment guidelines recommended ongoing use of the Pooled Cohort Equations (PCE) despite evidence that the PCE overestimate risk in most modern multi-ethnic cohorts.1-3 However, deriving and rigorously validating new risk scores from scratch is time-consuming and costly, requiring recruitment of large cohorts with meticulous follow-up. By the time a score is derived a decade later, its baseline data are already outdated, given improving preventive care and evolving population risk factors.

A new study in the European Heart Journal demonstrated that, rather than starting from scratch, we may simply need a "periodic facelift" for the existing risk prediction algorithms.4 In order to apply a risk algorithm to a given population, the algorithm should: 1) accurately represent the modifying effect of each risk factor on baseline atherosclerotic cardiovascular disease (ASCVD) risk, and 2) take into account the risk factor profile and ASCVD incidence in the population of interest. The second requirement is the one that limits the clinical performance of recommended risk estimators in modern populations, particularly in ethnic and geographic groups with unique ASCVD risk profiles that were not included in the derivation cohorts.

What if a meticulously derived algorithm could be fitted to the unique risk profile of the population of interest?

This paper is likely the first large-scale demonstration that recalibration of previously validated risk scores based on population-specific health record data improves individual estimation of ASCVD risk when compared to the original risk score rooted in the disease incidence of outdated research populations.

Recalibration can be best defined as a method that refines standard risk prediction algorithms to account for differences in the risk factor distribution and disease incidence of the target population, effectively adjusting risk algorithms to local contemporary circumstances.

The Study

Four risk prediction algorithms featured in modern American or European guidelines were evaluated head-to-head using pooled participant-level data from more than 300,000 individuals enrolled in 86 prospective primary prevention cohorts assimilated by the Emerging Risk Factors Collaboration (ERFC): the PCE, Framingham Risk Score (FRS), Systematic COronary Risk Evaluation (SCORE) and Reynolds Risk Score (RRS). Participants in the ERFC had not previously contributed to the derivation of any of these scores.

Subsequently, each risk score was recalibrated to capture the best of old and new: the original validated algorithm was modified by a re-scaling factor determined by plotting the age-specific observed risk in the ERFC against the age-specific risk predicted by the original algorithm. The recalibrated model could then generate a risk prediction at the individual level based on current CV disease incidence in that population. For direct comparison, the investigators also recalibrated SCORE and RRS to the common ASCVD outcome used by the FRS and PCE.
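The re-scaling step can be illustrated with a minimal sketch. It assumes a linear re-scaling on the complementary log-log scale, which is one common way to recalibrate a risk model and may differ in detail from the published analysis. The age-group risks and the function names are hypothetical and serve only to show the mechanics.

```python
import math

def cloglog(p):
    """Complementary log-log transform, ln(-ln(1 - p)), for 0 < p < 1."""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(z):
    """Inverse transform: map a value on the cloglog scale back to a risk."""
    return 1.0 - math.exp(-math.exp(z))

def fit_rescaling(predicted, observed):
    """Least-squares line of cloglog(observed) on cloglog(predicted).

    Returns (intercept, slope). A well-calibrated score gives roughly (0, 1).
    """
    xs = [cloglog(p) for p in predicted]
    ys = [cloglog(o) for o in observed]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def recalibrate(risk, intercept, slope):
    """Apply the fitted re-scaling to one individual's original predicted risk."""
    return inv_cloglog(intercept + slope * cloglog(risk))

# Hypothetical 10-year risks by age group (NOT data from the study):
# the original score predicts more events than were observed locally.
predicted_by_age = [0.03, 0.06, 0.12, 0.20]
observed_by_age = [0.02, 0.04, 0.085, 0.145]

intercept, slope = fit_rescaling(predicted_by_age, observed_by_age)
original_risk = 0.10
recalibrated_risk = recalibrate(original_risk, intercept, slope)
print(f"original {original_risk:.3f} -> recalibrated {recalibrated_risk:.3f}")
```

Because the re-scaling is fitted at the group level and then applied to each person's original prediction, the ranking of individuals by risk is preserved while the absolute risks shift toward what is observed locally.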

Prior to recalibration, the four models were found to have widely different calibration and clinical performance in the ERFC cohort. Overall, the FRS, SCORE, and PCE overestimated risk by 10%, 52%, and 41%, respectively, and the RRS underpredicted risk by 10%. The FRS performed particularly poorly in women and in populations outside North America or Western Europe, while RRS was better calibrated for women than for men.

The extent of relative miscalibration of SCORE and PCE appeared similar across demographics, leading to greater discrepancy between absolute predicted and observed risks at older ages. When applied to a US standard population, only 58% of individuals recommended to consider statin initiation based on at least one risk score had agreement from all four algorithms, and 7% of the total population received such a recommendation from only one or two of four models.
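To make the agreement statistic concrete, the sketch below counts how many scores flag each person at a treatment threshold and then reports the share of flagged people on whom all scores agree. The 7.5% cut-off and the five individuals are illustrative assumptions, not values from the study.

```python
THRESHOLD = 0.075  # assumed 10-year risk cut-off, for illustration only

# Hypothetical 10-year risk estimates for five people from the four scores.
people = [
    {"FRS": 0.09, "SCORE": 0.11, "PCE": 0.10, "RRS": 0.08},
    {"FRS": 0.08, "SCORE": 0.09, "PCE": 0.10, "RRS": 0.06},
    {"FRS": 0.05, "SCORE": 0.08, "PCE": 0.07, "RRS": 0.04},
    {"FRS": 0.03, "SCORE": 0.04, "PCE": 0.05, "RRS": 0.02},
    {"FRS": 0.12, "SCORE": 0.15, "PCE": 0.14, "RRS": 0.10},
]

def count_flags(risks, threshold=THRESHOLD):
    """Number of scores that place this person at or above the threshold."""
    return sum(1 for r in risks.values() if r >= threshold)

def agreement_summary(cohort, threshold=THRESHOLD):
    """Among people flagged by at least one score, how many are flagged by all?"""
    n_scores = len(cohort[0])
    flags = [count_flags(person, threshold) for person in cohort]
    flagged_by_any = [f for f in flags if f >= 1]
    flagged_by_all = sum(1 for f in flagged_by_any if f == n_scores)
    share = flagged_by_all / len(flagged_by_any) if flagged_by_any else 0.0
    return {
        "flagged_by_any": len(flagged_by_any),
        "flagged_by_all": flagged_by_all,
        "share_all_agree": share,
    }

summary = agreement_summary(people)
print(summary)
```

In this toy cohort, four of five people are flagged by at least one score but only two are flagged by all four, which is the kind of discordance the original algorithms showed.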

The Effect of Recalibration

After recalibration, the performance of the four algorithms was harmonized, such that their recommendations were closely aligned at almost any given treatment threshold. Overall, the proportion of individuals classified as high risk fell from 40% to 23%. Of those receiving a statin recommendation, 94% had agreement from three or four of the scores, compared to 82% before recalibration. Though estimated sensitivity decreased, negative predictive value remained 98% both before and after recalibration.

The estimated number needed to treat with statin therapy improved from between 44 and 51 using the original algorithms to between 37 and 39 using recalibration to a common endpoint. It should be noted that estimated number needed to treat assumed 100% statin adherence; the surprisingly low and variable statin adherence in real-world populations would likely result in higher number needed to treat regardless of prediction model.5-7
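The arithmetic behind these estimates can be sketched as follows. Table 1 states an assumed 20% relative risk reduction with statins; the sketch further assumes that the event risk among people classified as high risk is roughly the positive predictive value (PPV), so that the number needed to treat is about 1 / (PPV × 0.20). The adherence parameter is a crude hypothetical scaling of the treatment effect, added only to show why imperfect adherence raises the number needed to treat; the published derivation may differ.

```python
def number_needed_to_treat(event_risk, relative_risk_reduction=0.20, adherence=1.0):
    """Approximate NNT = 1 / absolute risk reduction.

    event_risk: event risk among those treated (here approximated by the PPV).
    relative_risk_reduction: 20% is the assumption stated in the Table 1 footnote.
    adherence: hypothetical fraction of the treatment effect actually realized.
    """
    absolute_risk_reduction = event_risk * relative_risk_reduction * adherence
    return 1.0 / absolute_risk_reduction

# PPVs for the PCE as reported in Table 1 (rounded values).
nnt_original = number_needed_to_treat(0.098)      # about 51
nnt_recalibrated = number_needed_to_treat(0.13)   # about 38.5
nnt_half_adherence = number_needed_to_treat(0.13, adherence=0.5)

print(f"PCE original:      NNT ~ {nnt_original:.1f}")
print(f"PCE recalibrated:  NNT ~ {nnt_recalibrated:.1f}")
print(f"50% adherence:     NNT ~ {nnt_half_adherence:.1f}")
```

The results match the tabulated values only approximately because the PPVs in the table are rounded. The point is that a higher PPV after recalibration translates directly into a lower number needed to treat.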

Moving Forward: Targeted Risk Estimation

How can these findings be applied to modern risk prediction? These results suggest that recalibrating existing algorithms to population-specific risk factor distributions and ASCVD incidence rates is more important than comparing the relative merits of each model in a given population.

This method was effective when applied to the broad and diverse ERFC population. This suggests that recalibration would likely have an even more striking effect on the clinical performance of risk prediction models in more homogeneous populations that are too small to be represented in derivation cohorts.

The downstream effect is that future guidelines may recommend standardized routine updates of local data—distribution of risk factors and aggregate ASCVD incidence—particularly in regions and ethnic groups with risk profiles that are historically discordant from standard calculators. Regional recalibration coding by socioeconomic status (SES, potentially using postal codes as a surrogate) may improve discrimination as well, as the PCE seem to particularly overestimate risk in those of middle-to-high SES.8 Such data could be converted in each region to scaling factors embedded in existing user-friendly risk estimators based on validated algorithms.

At the most extreme, this may involve leveraging real-time electronic medical record data, perhaps from a network of academic medical centers, to ascertain current local risk profiles and generate updated recalibrated risk calculators every couple of years.9 The main limitation is that this approach relies on accurate classification within the health records, which has shown mixed validity in prior studies evaluating risk prediction models. With a commitment to high-quality data reporting, however, this method is more cost-effective, timely, and targeted to the population of interest than frequently attempting to derive new broad cohort-based risk prediction models.

Previous Recalibration Models

While this was the first large study to compare the clinical performance of recalibrated algorithms head to head, the value of recalibrating single algorithms has been previously demonstrated. The SCORE model improved calibration by using separate "low risk" and "high risk" charts for European countries with relatively low and high incidence of ASCVD, respectively. However, even this bimodal calibration fell short in certain countries at the extremes of inherent risk.10

This led the Fourth European Joint Task Force Guidelines to recommend recalibration of the SCORE chart to current national circumstances. Subsequent country-specific charts derived in Belgium, Germany and Spain confirmed the improved performance of such a model over the original, though this method was less effective in the Russian population, where the recalibrated model continued to underestimate risk in young men.11-14

The same method applied to a pooled Australian cohort (called the AusSCORE) markedly improved score performance in an independent single cohort after both low-risk and high-risk charts were found to have poor calibration in that country.15 In indigenous Australian populations, prone to even more extreme miscalculation of risk, recalibration of the Framingham score improved 5-year predictive value. Since limited data led to ongoing overestimation at 10 years, 5-year risk predictions were used to design SCORE-like charts for use in sparsely populated areas.16

Similarly, the Framingham Score has been recalibrated successfully to better approximate risk in US populations under-represented in derivation cohorts: those of Native American, Japanese-American, and Hispanic ethnicities.17

Of course, the recalibration method relies on accurate population health data, which is more limited in some localities than others. It is reasonable in these settings to conclude that simplified risk prediction is better than inaccurate risk prediction. One novel risk score was designed specifically for ease of population-specific recalibration in predicting only fatal CV events (which are often easier to report). This model performed well when recalibrated to risk factor and CVD incidence rates in 11 countries from different world regions with recent national health survey data.18

Conclusions

Recalibration of four well-validated ASCVD risk prediction algorithms to the distinct risk profile of the target population resulted in improved clinical performance of each model, harmonization between models, and a lower estimated number needed to treat. Re-scaling existing models to current population-specific health data may be a cost-effective and broadly applicable framework for optimizing risk assessment in modern clinical practice. This approach merits consideration in future ACC/AHA prevention guidelines.

Table 1: Derivation characteristics and estimated clinical performance of four validated risk prediction models before and after recalibration

 

| | FRS | SCORE | PCE | RRS |
|---|---|---|---|---|
| Derivation | | | | |
| Age (years) | 30-74 | 40-65 | 40-79 | Women ≥45 / Men ≥50 |
| Region | USA | Europe | USA | USA |
| Baseline data collection | Between 1968 and 1987* | 1967-1991 | Between 1971 and 1999* | Women: 1992; Men: 1995 |
| Demographic components | Age; Sex; Smoking status | Age; Sex; Smoking status | Age; Sex; Smoking status; Ethnicity | Age; Sex; Smoking status |
| Biomarker components | TChol; HDL-C; SBP; Antihypertensive use; Diabetes | TChol / HDL; SBP | TChol; HDL-C; SBP; Antihypertensive use; Diabetes | TChol; HDL-C; SBP; hsCRP; HbA1c (women, if diabetic); Family history of early MI |
| Score-specific endpoints | First onset of non-fatal MI, fatal CHD, or any stroke | Fatal CVD | First onset of non-fatal MI, fatal CHD, or any stroke | Non-fatal MI, fatal CHD or any stroke, coronary revascularization, or any CVD death |
| ORIGINAL | | | | |
| Sens / Spec** | 73% / 71% | 71% / 73% | 81% / 63% | 75% / 70% |
| PPV | 11% | 11% | 9.8% | 11% |
| NNS** | 145 | 150 | 131 | 142 |
| NNT** | 46 | 44 | 51 | 45 |
| RECALIBRATED TO SCORE-SPECIFIC ENDPOINTS | | | | |
| Sens / Spec** | 61% / 80% | 43% / 89% | 66% / 78% | 72% / 72% |
| PPV | 13% | 16% | 13% | 11% |
| NNS** | 174 | 247 | 160 | 147 |
| NNT** | 38 | 31 | 39 | 45 |
| RECALIBRATED TO COMMON ENDPOINT: non-fatal MI, fatal CVD, or any stroke | | | | |
| Sens / Spec** | 61% / 80% | 62% / 80% | 66% / 78% | 65% / 79% |
| PPV | 13% | 13% | 13% | 13% |
| NNS** | 174 | 171 | 160 | 165 |
| NNT** | 38 | 38 | 39 | 37 |

*Multiple cohorts
**When applied to eligible cohorts in the ERFC. Disease incidence and events avoided based on assumptions of the same age structure of a standard population of the United States, the same age- and sex-specific incidence rates for CVD events as in the current study, and CVD risk reductions of 20% with statin treatment in people without a history of CVD.
CHD, coronary heart disease; CVD, cardiovascular disease; FRS, Framingham Risk Score; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; hsCRP, high sensitivity C-reactive protein; MI, myocardial infarction; NNS, number needed to screen; NNT, number needed to treat; NPV, negative predictive value; PCE, Pooled Cohort Equations; PPV, positive predictive value; RRS, Reynolds Risk Score; SBP, systolic blood pressure; SCORE, Systematic COronary Risk Evaluation; Sens, sensitivity; Spec, specificity; TChol, total cholesterol.

References

  1. DeFilippis AP, Young R, McEvoy JW, et al. Risk score overestimation: the impact of individual cardiovascular risk factors and preventive therapies on the performance of the American Heart Association-American College of Cardiology-atherosclerotic cardiovascular disease risk score in a modern multi-ethnic cohort. Eur Heart J 2017;38:598-608.
  2. Pylypchuk R, Wells S, Kerr A, et al. Cardiovascular disease risk prediction equations in 400,000 primary care patients in New Zealand: a derivation and validation study. Lancet 2018;391:1897-1907.
  3. Rana JS, Tabada GH, Solomon MD, et al. Accuracy of the atherosclerotic cardiovascular risk equation in a large contemporary, multiethnic population. J Am Coll Cardiol 2016;67:2118-30.
  4. Pennells L, Kaptoge S, Wood A, et al. Equalization of four cardiovascular risk algorithms after systematic recalibration: individual-participant meta-analysis of 86 prospective studies. Eur Heart J 2018. [Epub ahead of print]
  5. Avorn J, Monette J, Lacour A, et al. Persistence of use of lipid-lowering medications: a cross-national study. JAMA 1998;279:1458-62.
  6. Blackburn DF, Dobson RT, Blackburn JL, Wilson TW, Stang MR, Semchuk WM. Adherence to statins, beta-blockers and angiotensin-converting enzyme inhibitors following a first cardiovascular event: a retrospective cohort study. Can J Cardiol 2005;21:485-8.
  7. Turner RM, Yin P, Hanson A, et al. Investigating the prevalence, predictors, and prognosis of suboptimal statin use early after a non-ST elevation acute coronary syndrome. J Clin Lipidol 2017;11:204-14.
  8. Colantonio LD, Richman JS, Carson AP, et al. Performance of the atherosclerotic cardiovascular disease pooled cohort risk equations by social deprivation status. J Am Heart Assoc 2017;6.
  9. Blaha MJ. The critical importance of risk score calibration: time for transformative approach to risk score validation? J Am Coll Cardiol 2016;67:2131-4.
  10. Ulmer H, Kollerits B, Kelleher C, Diem G, Concin H. Predictive accuracy of the SCORE risk function for cardiovascular disease in clinical practice: a prospective evaluation of 44,649 Austrian men and women. Eur J Cardiovasc Prev Rehabil 2005;12:433-41.
  11. Hense HW, Koesters E, Wellmann J, Meisinger C, Voltzke H, Keil U. Evaluation of recalibrated Systematic Coronary Risk Evaluation cardiovascular risk chart: results from the Systematic Coronary Risk Evaluation Germany. Eur J Cardiovasc Prev Rehabil 2008;15:409-15.
  12. De Bacquer D, De Backer G. Predictive ability of the SCORE Belgium risk chart for cardiovascular mortality. Int J Cardiol 2010;143:385-90.
  13. Sans S, Fitzgerald AP, Royo D, Conroy R, Graham I. Calibrating the SCORE cardiovascular risk chart for use in Spain. Rev Esp Cardiol 2007;60:476-85.
  14. Jdanov DA, Deev AD, Jasilionis D, Shalnova SA, Shkolnikova MA, Shkolnikov VM. Recalibration of the SCORE risk chart for the Russian population. Eur J Epidemiol 2014;29:621-8.
  15. Chen L, Tonkin AM, Moon L, et al. Recalibration and validation of the SCORE risk chart in the Australian population: the AusSCORE chart. Eur J Cardiovasc Prev Rehabil 2009;16:562-70.
  16. Hua X, McDermott R, Lung T, et al. Validation and recalibration of the Framingham cardiovascular disease risk models in an Australian Indigenous cohort. Eur J Prev Cardiol 2017;24:1660-9.
  17. D'Agostino RB Sr, Grundy S, Sullivan LM, Wilson P, CHD Risk Prediction Group. Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. JAMA 2001;286:180-7.
  18. Hajifathalian K, Ueda P, Lu Y, et al. A novel risk score to predict cardiovascular disease risk in national populations (Globorisk): a pooled analysis of prospective cohorts and health examination surveys. Lancet Diabetes Endocrinol 2015;3:339-55.

Keywords: Dyslipidemias, Hydroxymethylglutaryl-CoA Reductase Inhibitors, Risk Factors, American Heart Association, Research Design, Rhytidoplasty, Prospective Studies, Risk Assessment, Atherosclerosis, Primary Prevention, Electronic Health Records, Algorithms

