Big Data: When and How Will it Impact Interventional Cardiology?

What is "Big Data"?

So just what is "Big Data"? And what is its relevance to health care? Foundationally, Big Data comprises lots and lots of disparate, "small" pieces of data (not to be confused with documents, but more on that later). Conceptually, Big Data is characterized by the five Vs: volume, velocity, variety, veracity, and value.1,2 The concept of volume extends the classical notion of data (e.g., data captured via case report forms in clinical trials) to the explosion of sources and types of data across the dimensions of health and life (e.g., clinical documentation, health care transactions, genomic sequencing, patient-provided information, biosensor data, research findings, environmental parameters, social media, and economic status) (Figure 1). Implicit in the concept of Big Data is the expectation that this disparate, multidimensional data will be aggregated via analytics that discover and drive insight about health care delivery and clinical outcomes. Velocity denotes the increasing speed with which data are available for consumption and use. As volume and velocity increase, variety tends to increase and overall veracity (quality) to decrease, with variability in data structure, interpretability, semantic meaning, and timing reflecting the considerable variety of the people, systems, and processes that source the data. The value statement is in the opportunity for Big Data to drive analytics that can be applied to assess health care processes and quality, improve clinical outcomes, align resources with the delivery of health care to individuals and populations, and glean insight into new and undiscovered dimensions.3 Also implicit in the concept of Big Data is that the volume of data is so large that it cannot be readily handled by conventional approaches.

What about Big Data in interventional cardiology? Consider this question as comprising several components:

  1. What are the sources, types, and characteristics of data (i.e., volume, variety, and veracity) specific and relevant to interventional cardiology?
  2. What pressing problems are worthwhile targets in interventional cardiology (i.e., value) for Big Data to address?
  3. What are the key challenges that will determine the velocity with which Big Data will transform knowledge generation and/or evidence application in interventional cardiology (e.g., transitioning from classical regression modeling to machine-learning algorithms and use of Big Data to inform clinical decision support)?

Figure 1: Sources of Data in Health Care


Data Sources in Interventional Cardiology

Interventional cardiology has a rich, data-centric tradition. From the beginnings of coronary angioplasty, Andreas Gruentzig collated volume information from around the world to better understand and improve the procedure (Figure 2). In 1994, version 1 of the American College of Cardiology cardiac catheterization and angioplasty registry began accepting data about interventional cardiology procedures. Some 25 years later, the CathPCI registry of the National Cardiovascular Data Registry (NCDR) remains pre-eminent, accruing data on over 90% of procedures in the United States.4 Critically, analytics (primarily descriptive statistics) applied to these data are the foundation of quality, process, and outcome improvement initiatives at many health care enterprises.

Figure 2: Photo of Blackboard at August 1980 Live Course Conducted by Andreas Gruentzig, Zurich, Switzerland5


Acknowledged shortcomings of the NCDR CathPCI registry include that data are limited primarily to the in-hospital phase of care, that chart abstraction is a time- and resource-consuming process, and that return of actionable information requires an extended amount of time. To ascertain long-term outcomes, an approach based on 100% manual follow-up of patients (i.e., adoption of the clinical trials process of dedicated follow-up for the duration of the trial) is simply untenable. Instead, the transactions of health care documented in the electronic health record (EHR), claims submissions, and mortality indices could serve as a surrogate. Given that the endpoint of major adverse cardiac events following percutaneous coronary intervention is generally composed of the incidence of death, myocardial infarction, and repeat revascularization, a "computable phenotype" can theoretically be derived from a combination of data sources that identifies patients sustaining one or more of these adverse events.6 This straightforward example of Big Data analysis derived from the aggregation of disparate sources of data is well within the realm of what can be technically accomplished today.
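The computable phenotype described above can be illustrated with a minimal sketch, assuming simplified record structures; the field names, code sets, and identifiers below are hypothetical stand-ins for real EHR, claims, and mortality-index feeds.

```python
# Minimal sketch of a "computable phenotype" for post-PCI major adverse
# cardiac events (MACE), assembled from three disparate data sources.
# All record structures and field names are hypothetical simplifications.

def mace_phenotype(patient_id, ehr_diagnoses, claims_procedures, death_index):
    """Flag a patient who sustained death, myocardial infarction, or
    repeat revascularization after an index PCI."""
    events = set()
    # Mortality index: a simple set of deceased patient identifiers
    if patient_id in death_index:
        events.add("death")
    # EHR diagnoses: ICD-10 codes beginning with I21 denote acute MI
    for code in ehr_diagnoses.get(patient_id, []):
        if code.startswith("I21"):
            events.add("myocardial_infarction")
    # Claims: repeat revascularization procedure codes (illustrative)
    for proc in claims_procedures.get(patient_id, []):
        if proc in {"92928", "92933", "33533"}:
            events.add("repeat_revascularization")
    return events

# Usage: patient 1001 appears in the death index and has an MI code
ehr = {1001: ["I21.4", "E11.9"]}
claims = {1001: []}
deaths = {1001}
print(sorted(mace_phenotype(1001, ehr, claims, deaths)))
# → ['death', 'myocardial_infarction']
```

A production phenotype would of course need careful temporal logic (events after the index procedure) and validated code sets, but the core pattern is exactly this: set union across heterogeneous sources keyed by patient identity.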

Expanding beyond transactional health care data, a more holistic view of the patient can be envisioned wherein clinical, procedural, imaging, socioeconomic, genomic, proteomic, device, patient-provided, environmental, and social media information (to name a few!) contribute to a more complete perspective of each patient. For example, the ability to predict both short-term abrupt closure and longer-term restenosis remains limited at best. Conventional models suggest that both clinical risk factors and compliance contribute to adverse events. But these models are only modestly predictive, suggesting a component attributable to socioeconomic, environmental, genomic, proteomic, and/or metabolomic factors. Predictive analytics enabled by the linkage of disparate datasets should improve sensitivity and discrimination for fundamental clinical issues such as the prediction of stent thrombosis or restenosis at the individual patient level, termed precision medicine.7
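As a toy illustration of how such linkage might work, the sketch below merges hypothetical clinical, socioeconomic, and genomic records keyed by patient identifier into a single feature vector; real-world linkage would additionally require robust patient matching and de-identification safeguards.

```python
# Sketch of linking disparate datasets by patient identifier to build a
# richer feature vector for predictive modeling. Source names and fields
# are hypothetical illustrations.

clinical = {"pt01": {"age": 64, "diabetes": 1, "stent_length_mm": 28}}
socioeconomic = {"pt01": {"median_income_quintile": 2}}
genomic = {"pt01": {"cyp2c19_loss_of_function": 1}}  # relevant to clopidogrel response

def linked_features(pid, *sources):
    """Merge a patient's records from each source into one flat feature dict."""
    features = {}
    for source in sources:
        features.update(source.get(pid, {}))
    return features

row = linked_features("pt01", clinical, socioeconomic, genomic)
# The merged row can then feed a regression or machine-learning model of
# stent thrombosis / restenosis risk.
```

The point of the sketch is the shape of the problem: the predictive signal for restenosis or stent thrombosis is spread across datasets that today are rarely joined at the patient level.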

Pressing Problems, High-Value Targets

Although there has been some success in oncology with the application of precision medicine, a representative study of high-throughput genomics identified only 7% of 1,035 patients with cancer who benefited from a Big Data, precision medicine approach.8 Can this startlingly poor statistic be improved in the field of interventional cardiology, or should the opportunity for Big Data be rethought? A second dimension that will be a key determinant of the impact of Big Data on interventional cardiology (and health care in general) is the identification of pressing issues that Big Data can potentially solve today with a reasonable expectation of return on investment (ROI). There already is a considerable body of literature about the enabling potential of Big Data, but a better lens for predicting the trajectory of Big Data investment (and the resulting implementation timeline) is to consider the practical problems Big Data can more quickly address. Three domains seem likely candidates for the near horizon, predicated on both the (relatively) ready availability of the requisite data streams and the opportunity for ROI (Table 1). The first is to improve the translation of clinical findings identified at the population level (e.g., results of clinical trials and recommendations of guidelines) to the level of groups or classes of patients (in contradistinction to precision medicine focused on identifying the approaches specific to the individual patient based on genetic, environmental, and lifestyle factors). The second is to reduce the time to actionable information related to medical therapeutics such as medical device usage optimization and innovation. Finally, the third promising opportunity is to reduce clinician burden (and health care system expense) related to use of EHR systems.

Table 1: Near Horizon Big Data Opportunities

Predicting Risk

  • Identifying patients at high risk for adverse events and/or who may incur high costs of care

Developing Real-World Evidence

  • Leveraging clinical documentation to generate evidence for regulatory decision-making
  • Incorporating real-world data into clinical research

Improving Clinical Workflow

  • Using the computer to compile views of data
  • Reducing clinician documentation burden imposed by EHR systems

In a seminal manuscript published in 2014 in Health Affairs, Bates and colleagues articulated six clinical targets to help strategically guide the development of Big Data approaches, specifically the predictive identification of groups of high-risk, high-cost patients.9 The characteristics of these groups fall into one or more of the following categories of patients: those with a high cost of care, those with a high probability for readmission, those for whom there is need for better triage, those at risk for clinical decompensation, those at risk for adverse events, and those for whom treatment optimization is needed because of the interplay of complex, multi-organ involvement. As noted by the authors, these are "some of the clearest opportunities... to reduce costs through the use of big data."9 The ROI calculus suggests that Big Data approaches to addressing these targets will likely be brought online in the near future.
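A rule-based caricature of flagging patients into a few of these categories might look like the following; the thresholds and field names are invented for illustration, and production systems would use learned predictive models rather than fixed cutoffs.

```python
# Caricature of identifying high-risk, high-cost patient groups.
# Thresholds and field names are invented for illustration; real systems
# would rely on learned predictive models, not fixed cutoffs.

def risk_flags(p):
    """Return the high-risk/high-cost categories a patient record triggers."""
    flags = []
    if p["annual_cost_usd"] > 50_000:
        flags.append("high_cost")
    if p["readmissions_last_year"] >= 2:
        flags.append("readmission_risk")
    if p["active_organ_systems"] >= 3:
        flags.append("complex_multiorgan_management")
    return flags

patient = {"annual_cost_usd": 61_000,
           "readmissions_last_year": 3,
           "active_organ_systems": 1}
print(risk_flags(patient))  # → ['high_cost', 'readmission_risk']
```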

One step removed from the immediacy of clinical decision-making are initiatives that leverage Big Data methodologies to more quickly generate evidence across the domains of quality assessment, performance improvement, device surveillance, and related activities. The National Evaluation System for health Technology was proposed in 2012 to "quickly identify problematic devices, accurately and transparently characterize and disseminate information about device performance in clinical practice, and efficiently generate data to support premarket clearance or approval of new devices and new uses of currently marketed devices."10 The approach being developed is to tap into real-world data sources to develop evidence that supports the medical device product lifecycle, from inception through incremental improvement to device sunsetting. Key data sources are anticipated to include clinical documentation in EHRs, adverse event reports, claims data, and patient-reported information, indexed via the Unique Device Identifier.11,12 The intention of tapping into this real-world data torrent is to transform device assessment and surveillance from the somewhat limited approaches of today into one with greater capabilities for comparative effectiveness determinations and early adverse signal detection.13
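As a hedged sketch of what UDI-indexed surveillance might look like, the snippet below counts synthetic adverse event reports by the device-identifier portion of a UDI; the identifiers and event labels are fabricated for illustration.

```python
# Sketch of indexing real-world reports by Unique Device Identifier (UDI)
# for device surveillance. Records are synthetic; a real UDI pairs a
# device identifier (the model) with production identifiers (lot, serial).

from collections import Counter

adverse_events = [
    {"udi_di": "00810001230011", "event": "stent thrombosis"},
    {"udi_di": "00810001230011", "event": "delivery failure"},
    {"udi_di": "00810001230028", "event": "stent thrombosis"},
]

# Count events per device model (device-identifier portion of the UDI)
events_per_device = Counter(r["udi_di"] for r in adverse_events)
print(events_per_device.most_common(1))  # → [('00810001230011', 2)]
```

Real signal detection would normalize these counts against denominators of devices implanted, which is precisely why linking UDI to EHR and claims data matters.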

A third logical near-term opportunity amenable to the methodologies inherent in Big Data is in reducing clinician burden. When the HITECH provisions of the American Recovery and Reinvestment Act of 2009 were passed, expectations were that the widespread adoption of EHRs would improve patient safety and clinical outcomes while reducing the costs of the American health care system.14 Instead, a decade of experience has demonstrated a decrement in efficiency (with EHR systems adding 1-2 hours of work per day); poor correlation of EHR use with modest improvements in care processes and metrics; no changes in hospitalization length of stay, mortality, readmission, or patient safety incidents; no improvement in longevity and other population health metrics; and no reduction in the rate of health care expenditure growth.15-18 Reducing clinician burden has become a rallying point for substantive efforts to understand the problem and describe potential solutions.19

With EHRs focused on handling the transactions of health care (and optimizing revenue), the "electronic filing cabinet for documents" model of the modern EHR has only added to the cognitive load of clinicians. How could Big Data change this calculus, improve the usability of EHR systems, and facilitate better integration of EHR systems with clinical workflows? Hollywood has already capitalized on the concept, both on the big screen (Minority Report, directed by Steven Spielberg) and on television (Pure Genius, the CBS medical drama). In these and other entertainment venues, computers automate the collation, aggregation, and analysis of disparate data from across conventional and novel data sources, presenting the data to the clinician in a manner consonant with the clinical mental model. This is a service desperately needed by the clinical community: it would reduce the work of accessing information while presenting that information in much more informative motifs. Done this way, clinical data could provide decision support for diagnosis and treatment and increase the individualization of predictive analytics. Tactically, such a view-based approach to the body of data about a specific patient could be leveraged early on to support pre-authorization requirements for procedures or to automate assessment of the application of appropriate use criteria. Given the prevalence of coronary disease, the proportion of health care spending devoted to cardiovascular disease, and the suitability of interventional cardiology as a model for Big Data approaches, it is only a short leap to anticipate growing investment using interventional cardiology as a demonstration platform.

Key Challenges

When will Big Data impact interventional cardiology? With billions being spent annually on Big Data in health care,20 the issue is not investment. Concerns about privacy and security of protected health information are also frequently raised, but the use cases above are amenable to safe harbor methods, whether direct (e.g., as a component of health care operations), as a matter of public health (e.g., device surveillance), or by de-identification (e.g., via distributed analytics wherein identifiable health care data are kept within the auspices of a health care organization, with only the aggregate results of analyses being brought together). Instead, two general categories of issues will most likely determine the timeline. First is the relative absence of high-quality "small data," the atomic particles that must be assembled to form Big Data. Aside from transactional events, laboratory data, and medication information, most clinical data (e.g., EHR documentation and billing and claims submissions) are poorly defined and not suitable for analysis. Not only is most clinical information in the form of documents (not discrete data), the verbiage used in clinical documents is frequently recorded in broad categories with low dimensionality (e.g., the presence of diabetes is simply listed as "history of diabetes" without further qualification).21 Informatics approaches focusing on data standards, data liquidity, and industrial engineering to integrate data capture with clinical workflows and the documentation thereof (i.e., structured reporting) are key to transforming this component.22 Second is the harnessing of resources within the context of a complex health care environment. Rather than building standalone products for sale, the optimal approach is to build Big Data solutions within the framework of existing EHR systems and the multiplicity of other envisioned data sources. Creating a one-off environment analogous to the chart abstraction model of registries does not appear to be a reasonable way to capture high-quality data at high volume.
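The distributed analytics pattern mentioned above (identifiable data stay within each organization; only aggregates travel) can be sketched as follows, using synthetic door-to-balloon times as the shared measure.

```python
# Sketch of distributed analytics: patient-level data never leave the
# institution; only de-identified aggregates are shared. The sites and
# door-to-balloon times here are synthetic.

def local_aggregate(door_to_balloon_minutes):
    """Runs inside the institution; exposes only a sum and a count."""
    return sum(door_to_balloon_minutes), len(door_to_balloon_minutes)

# Each site computes its aggregate locally on identifiable records...
site_a = local_aggregate([62, 75, 58, 90])
site_b = local_aggregate([70, 66, 81])

# ...and the coordinating center combines only the aggregates.
total = site_a[0] + site_b[0]
n = site_a[1] + site_b[1]
network_mean = total / n
print(round(network_mean, 1))  # → 71.7
```

The same pattern generalizes to counts, rates, and even model gradients; what matters is that the raw records remain under the auspices of the originating organization.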

Are high-quality, high-volume data that return actionable information in real time or near real time actually possible? We believe the answer is affirmative. Duke Heart has a rich history of data collection and use in the interventional arena beginning circa 1967 with the inception of the Duke DataBank.23 Duke has evolved the processes of data collection in the cardiac catheterization suite to integrate with clinical workflows in a distributed, team-based documentation model of structured reporting, thus capturing discrete small data elements at the point of care. This approach reduces the cognitive burden of the EHR on all clinicians interacting with the system such that data are collected only in the context of patient care (and not in the service of data collection itself), following the mantra "collect once, use many times." Data are collated and repurposed for a multitude of uses in real time: clinical documentation (e.g., procedure report and procedure log), submission to registries, and real-time analytics. The environment is depicted in Figure 3.
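The "collect once, use many times" pattern can be sketched as one structured record rendered into multiple downstream artifacts; the field names and report wording below are hypothetical illustrations, not the actual Duke implementation.

```python
# Sketch of "collect once, use many times": a single structured record
# captured at the point of care is repurposed for clinical documentation,
# registry submission, and analytics. Field names are hypothetical.

procedure = {
    "patient": "pt01",
    "vessel": "LAD",
    "stent_diameter_mm": 3.0,
    "stent_length_mm": 18,
    "result": "TIMI 3 flow",
}

def procedure_report(p):
    """Human-readable documentation rendered from the structured record."""
    return (f"A {p['stent_diameter_mm']} x {p['stent_length_mm']} mm stent "
            f"was deployed in the {p['vessel']} with {p['result']}.")

def registry_record(p):
    """Subset of the same record mapped to registry submission fields."""
    return {k: p[k] for k in ("patient", "vessel", "stent_length_mm")}

print(procedure_report(procedure))
# → A 3.0 x 18 mm stent was deployed in the LAD with TIMI 3 flow.
```

Because the narrative report and the registry payload derive from the same discrete elements, nothing is re-abstracted after the fact, which is the crux of the structured reporting argument.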

Figure 3: Duke Model of Structured Reporting in the Cardiac Catheterization Laboratory


In conclusion, the vision of Big Data is coming into focus. Note that the application of Big Data in interventional cardiology will likely have little impact on the technical components of cardiac catheterization per se. Instead, the potential impact (and utility) of Big Data is in optimizing the longitudinal care of the interventional patient, informing and improving interventional devices across their total product lifecycles, and reducing clinician burden through the intelligent collation and presentation of relevant information while reducing documentation overhead. Critically, Big Data is already positioned to impact the practice of interventional cardiology, and our discipline is in many ways an ideal demonstration environment. Big Data made actionable at the level of the individual patient is not far away, facilitating the right management of the right patient at the right time.


  1. De Mauro A, Greco M, Grimaldi M. A formal definition of Big Data based on its essential features. Libr Rev 2016;65:122-35.
  2. Hilbert M. Big Data for development: a review of promises and challenges. Dev Policy Rev 2016;34:135-74.
  3. Waljee AK, Joyce JC, Wang S, et al. Algorithms outperform metabolite tests in predicting response of patients with inflammatory bowel disease to thiopurines. Clin Gastroenterol Hepatol 2010;8:143-50.
  4. Moussa I, Hermann A, Messenger JC, et al. The NCDR CathPCI Registry: a US national perspective on care and outcomes for percutaneous coronary intervention. Heart 2013;99:297-303.
  5. Worldwide PTCA experience-Blackboard tally! Fourth Live Demonstration Course, Zurich, 1980 (PCR Online website). 2005-2019. Available at: Accessed March 1, 2019.
  6. Richesson RL, Hammond WE, Nahm M, et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc 2013;20(e2):e226-31.
  7. National Research Council (US) Committee on A Framework for Developing a New Taxonomy of Disease. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. Washington, DC: National Academies Press; 2011.
  8. Massard C, Michiels S, Ferté C, et al. High-Throughput Genomics and Clinical Outcome in Hard-to-Treat Advanced Cancers: Results of the MOSCATO 01 Trial. Cancer Discov 2017;7:586-95.
  9. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff (Millwood) 2014;33:1123-31.
  10. Shuren J, Califf RM. Need for a National Evaluation System for Health Technology. JAMA 2016;316:1153-4.
  11. Tcheng JE, Crowley J, Tomes M, et al. Unique device identifiers for coronary stent postmarket surveillance and research: a report from the Food and Drug Administration Medical Device Epidemiology Network Unique Device Identifier demonstration. Am Heart J 2014;168:405-13.e2.
  12. Drozda JP Jr, Roach J, Forsyth T, Helmering P, Dummitt B, Tcheng JE. Constructing the informatics and information technology foundations of a medical device evaluation system: a report from the FDA unique device identifier demonstration. J Am Med Inform Assoc 2018;25:111-20.
  13. Fleurence RL, Blake K, Shuren J. The Future of Registries in the Era of Real-world Evidence for Medical Devices. JAMA Cardiol 2019;Feb 20:[Epub ahead of print].
  14. Hillestad R, Bigelow J, Bower A, et al. Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Aff (Millwood) 2005;24:1103-17.
  15. Sinsky C, Colligan L, Li L, et al. Allocation of Physician Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties. Ann Intern Med 2016;165:753-60.
  16. Krenn L, Schlossman D. Have Electronic Health Records Improved the Quality of Patient Care? PM R 2017;9:S41-S50.
  17. Kellermann AL, Jones SS. What it will take to achieve the as-yet-unfulfilled promises of health information technology. Health Aff (Millwood) 2013;32:63-8.
  18. Arndt BG, Beasley JW, Watkinson MD, et al. Tethered to the EHR: Primary Care Physician Workload Assessment Using EHR Event Log Data and Time-Motion Observations. Ann Fam Med 2017;15:419-26.
  19. EHR Interoperability WG (HL7 International website). 2019. Available at: Accessed March 1, 2019.
  20. Global Big Data Spending Market in Healthcare Sector to Post a CAGR of Over 12% Through 2022 (BusinessWire website). 2019. Available at: Accessed March 1, 2019.
  21. Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol 2016;13:350-9.
  22. Sanborn TA, Tcheng JE, Anderson HV, et al. ACC/AHA/SCAI 2014 health policy statement on structured reporting for the cardiac catheterization laboratory: a report of the American College of Cardiology Clinical Quality Committee. J Am Coll Cardiol 2014;63:2591-623.
  23. The Duke Databank for Cardiovascular Disease (Duke University Medical Center Archives website). 2007. Available at: Accessed March 1, 2019.

Keywords: Access to Information, Algorithms, American Recovery and Reinvestment Act, Angioplasty, Balloon, Coronary, Biosensing Techniques, Biomedical Technology, Calculi, Cardiovascular Diseases, Cardiac Catheterization, Cognition, Computer Security, Coronary Disease, Decision Making, Decision Support Systems, Clinical, Delivery of Health Care, Diabetes Mellitus, Documentation, Electronic Health Records, Filing, Follow-Up Studies, Health Expenditures, Information Storage and Retrieval, Life Style, Longevity, Metagenomics, Myocardial Infarction, Patient Readmission, Patient Safety, Percutaneous Coronary Intervention, Point-of-Care Systems, Phenotype, Prevalence, Proteomics, Registries, Quality Improvement, Public Health, Risk Factors, Social Media, Thrombosis, Stents, Workflow, Angiography, Coronary Angiography
