Editor's Corner | Clinical Trials: Do They Have Significance For Patient-Centered Care?
'Tis the season for the reporting of clinical trials! After ACC comes SCAI, EuroPCR and then ESC, TCT, AHA… and then we're back to ACC all over again. Arguably, the centerpiece of each of these large cardiology meetings is the presentation of "Late-Breaking Clinical Trials" (LBCTs).
The game is played by all sides. Meeting planners showcase LBCTs in grand ballrooms, alert the press, and embargo results, counting on the promise of clinical breakthroughs, and on our need to hear them "first," to lure us in. Trialists choose the venue likely to generate the most "buzz" and attract the largest desirable audience, even though most of the time the trials are not "late-breaking" at all (the data have been analyzed weeks before and kept under wraps).
For most of us, the cardiology consumers, the presentation brouhaha is irrelevant. We want to know what the short version of the results means, how the results will change what we do each day, and how reliable really are the outcomes. There's the rub, of course.
Trialists understandably want their trial outcomes to be "positive," and trials are designed accordingly. Few trials these days have straightforward endpoints. How many trials have you seen recently with a simple binary (yes/no) endpoint (dead or alive, bleeding or no bleeding)?
Instead, most trials now have a "primary composite endpoint" (death/myocardial infarction/ischemia-driven revascularization/bleeding/stent thrombosis, MACE, or even MAACE) measured within a short time frame. Bear in mind that any one of those multiple outcomes might "drive" a positive result for the composite.
A second type of trial analyzes time to event outcomes (time to death, time to repeat hospitalization), while a third type of trial reports what I call a "delta" effect – a change in NYHA class or a change in blood pressure for a new antihypertensive drug. These three trial types have varying ways for statisticians to evaluate them.
We are bombarded by relative risk, relative risk reduction, relative odds, number needed to treat, risk ratio, odds ratio, hazard ratio, Kaplan-Meier curves, ANOVA, and of course the "p value." What's a poor cardiologist to do?
For one, if you really want an in-depth description of what these various measures mean and how to interpret them, read the four-part series by Pocock et al. in the Journal of the American College of Cardiology.1 It's a great start.
That homework aside, I'll bet most folks reading this editorial are most comfortable with the p value, and indeed many (most?) trials include the p value in the outcomes. Should all trials have a p value for all outcomes? Perhaps not! Want to see how controversy-provoking all this is even among statisticians? Try the American Statistician editorial on its p value statement,2 and then read the New England Journal of Medicine's new guidelines for authors concerning statistics issued just this July.3
Nothing, it seems, is easy.
But here is what we have been taught and here is what we sometimes teach: if the p value is less than 0.05 (p<0.05) the trial is positive, and if p is greater than 0.05 (p>0.05) the trial is negative. Right? — WRONG!
Here's the reason. The p value is based on the null hypothesis, which states that the two treatments under study have exactly the same effect on the outcome. That is the assumption. But when the trial is finished, the outcomes observed are not the same!
When those outcome data are crunched, the p value simply answers this question: "If the null hypothesis were true, what would be the probability of finding a difference in outcomes between the two treatments at least as great as that actually observed?"
Or, more simply: "If chance alone were at work, how likely is a difference this large?" A p value less than 0.05 simply means there is a less than 5 percent chance that chance alone would produce a difference at least as large as the one reported.
The smaller the p value is, the stronger the evidence is against the null hypothesis. The smaller the p value is, the more convincing the evidence is that a real difference in treatment effect has been found. If the benefit of Treatment A over Treatment B has a p value of 0.00002, it is highly likely that the effect is real. If the benefit has a p value of 0.045 – not so much, though we still call it "significantly different."
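A quick simulation makes the chance logic concrete: when the null hypothesis is actually true, about 1 trial in 20 will still come up "significant" at p<0.05 purely by luck. The sketch below is illustrative only; the trial size, the 10 percent event rate, and the use of a two-proportion z-test (the large-sample equivalent of a chi-square test) are my assumptions, not from this editorial.

```python
import math
import random

random.seed(0)

def trial_p(n=1000, rate=0.10):
    """Simulate one trial in which both arms truly share the same event
    rate, then return a two-sided two-proportion z-test p value
    (the large-sample equivalent of a 1-df chi-square test)."""
    events_a = sum(random.random() < rate for _ in range(n))
    events_b = sum(random.random() < rate for _ in range(n))
    p1, p2 = events_a / n, events_b / n
    pooled = (events_a + events_b) / (2 * n)
    if pooled in (0.0, 1.0):   # no events at all: no evidence of any difference
        return 1.0
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))   # two-sided tail probability

# Run 2,000 simulated "trials" of two identical treatments
# and count how many cross the p<0.05 bar anyway.
false_positive_rate = sum(trial_p() < 0.05 for _ in range(2000)) / 2000
print(false_positive_rate)   # close to 0.05: about 1 null trial in 20 "wins"
```

The 5 percent false-positive rate is not a flaw in the test; it is exactly what "p<0.05" promises, which is why a single marginally significant trial is thin evidence.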
But of course, it is never that simple. In looking at a trial result, look at the data more closely. The report of trial outcomes should include the actual p value rather than the blunt p<0.05. If the actual p value is 0.053, does that really make the trial outcome "not significant"? A p of 0.053 means that, under the null hypothesis, a difference at least this large would occur by chance 5.3 percent of the time. Also, remember that under the null hypothesis, the p value is the probability of getting a difference in either direction.
Most of the time we are interested only in a positive direction of treatment effect. But using a "one-sided" p value by cutting the two-sided p value in half can be misleading. Caveat emptor: the reporting of trials can be tricky for the reader! A recent trial claimed a one-sided p<0.04 for lower mortality comparing two treatments. The two-sided p was actually p<0.08. Halving that two-sided p<0.08 to a one-sided p<0.04 made it far easier to slip under the p<0.05 bar.
Moreover, a small p value does not necessarily mean that a treatment effect is occurring, and a p value greater than 0.05 does not mean that the trial is necessarily "negative." Lots of things can affect the outcome data. This is where a little extra trial drilling can produce unexpected insights. Look at the trial design to see what kind it really is. Could there be biases in recruitment, were the investigators blinded, and what percentage of patients initially entered into the trial were followed up?
Is a randomized trial really randomized or is the trial a registry report of "sequential patients" with a registry's inherent biases? Are secondary endpoints being showcased instead of the primary endpoint for which the trial was designed? How many patients were in the trial? The latter question can be critical. For a small trial the difference in treatment effect has to be large enough to achieve a p value less than 0.05. A far larger trial can detect smaller differences.
Here is a theoretical trial example. A new Treatment C is being compared to standard, established Treatment D. The trial randomizes 200 patients. The outcome is repeat hospitalization.
The rehospitalization rate is 2.1 percent for Treatment C and 2.6 percent for Treatment D.
If the trial has 200 total patients (100 Treatment C, 100 Treatment D), the p value (two-tailed) is 0.77 using a chi-square test.
If the trial has 20,000 patients (10,000 Treatment C, 10,000 Treatment D), the p value (two-tailed) is 0.019 using a chi-square test.
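The sample-size effect in this theoretical trial can be reproduced with a pooled two-proportion z-test, the large-sample equivalent of the chi-square test. This is an illustrative sketch: because fractional event rates cannot correspond to whole patients in a 100-per-arm trial, the small-trial p value lands near 0.8 rather than exactly 0.77, but the pattern is the point.

```python
import math

def two_prop_p(p1, p2, n_per_arm):
    """Two-sided p value comparing two event rates with equal arm sizes,
    via the pooled two-proportion z-test (large-sample equivalent of a
    1-df chi-square test without continuity correction)."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))   # two-sided tail probability

# Identical event rates (2.1% vs 2.6%); only the trial size changes.
print(two_prop_p(0.021, 0.026, 100))     # ≈ 0.8  : "negative" trial
print(two_prop_p(0.021, 0.026, 10_000))  # ≈ 0.02 : "significant" trial
```

Nothing about the treatments changed between the two runs; only the number of patients did.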
Which of the two trials is "correct" or "significant?" Or are both? Or neither?
My theoretical trial and its results raise the last critical question: Do the results of a statistically significant trial always have clinical relevance? Is the benefit of Treatment C over Treatment D in my theoretical trial with 20,000 patients really great enough to warrant a change in clinical strategy? One of my teachers told me that if a trial needs more than 10,000 patients to show a treatment effect, the difference in the two treatments must be small. Not only that, but look at the actual outcome numbers in my theoretical trial.
The difference in rehospitalization rates is 2.6 percent (Treatment D) minus 2.1 percent (Treatment C) = 0.5 percent! Want to make that number sound impressive? Express it as a relative risk reduction: 0.5 divided by 2.6 gives a 19 percent reduction in risk for the endpoint! But in reality, is an absolute difference of just half of a percent in rehospitalization rates enough to warrant a clinical change, when the rehospitalization rates in both groups are under 3 percent?
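The arithmetic behind that contrast is worth spelling out. Applying the standard definitions to the theoretical trial's rates, the same half-percent difference can be reported three very different-sounding ways:

```python
rate_d = 0.026   # Treatment D (standard) rehospitalization rate
rate_c = 0.021   # Treatment C (new) rehospitalization rate

arr = rate_d - rate_c   # absolute risk reduction
rrr = arr / rate_d      # relative risk reduction
nnt = 1 / arr           # number needed to treat to prevent one event

print(f"ARR = {arr:.1%}")   # 0.5% -- sounds trivial
print(f"RRR = {rrr:.0%}")   # 19%  -- sounds impressive
print(f"NNT = {nnt:.0f}")   # 200 patients treated to prevent one rehospitalization
```

All three numbers describe exactly the same data; which one gets the headline is a choice.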
Remember also that anything new that we do to patients has some risk perhaps not identified in the trial. Want to know why the guidelines committees much prefer at least two trials confirming a positive effect of a treatment? All of the above explains why.
Please understand this editorial does not go anywhere near the complex analyses that sophisticated statisticians can perform to help us understand what a trial can tell us. But sooner or later a trial must tell us whether its results are compelling enough to help us in treating patients. That is where a trial's "significance" truly lies.
Regardless, remembering the issues raised here and looking critically at trial reports should make any trial results more meaningful. The annual LBCT blitz will make a bit more sense, and perhaps you will find that some of those "significant" trials are not as significant as touted! Keeping this in mind will help us all to better practice patient-centered care.
Peter C. Block, MD, FACC, is a professor of medicine and cardiology at Emory University Hospital and School of Medicine in Atlanta, GA.
- Pocock SJ, McMurray JJV, Collier TJ. Making sense of statistics in clinical trial reports: Part 1 of a 4-part series on statistics for clinical trials. J Am Coll Cardiol 2015;66:2535-49.
- Wasserstein RL, Lazar NA. The ASA statement on p-values: context, process and purpose. The American Statistician 2016;70:129-133.
- Harrington D, D'Agostino RB Sr, Gatsonis C, et al. New guidelines for statistical reporting in the Journal. N Engl J Med 2019;381:285-6.