How to Read Articles That Use Machine Learning

Authors:
Liu Y, Chen PH, Krause J, Peng L.
Citation:
How to Read Articles That Use Machine Learning: Users’ Guides to the Medical Literature. JAMA 2019;322:1806-1816.

The following are key points to remember from this Users’ Guide to the Medical Literature on how to read articles that use machine learning:

  1. In recent years, many new clinical diagnostic tools have been developed using complicated machine learning methods. Machine learning methods use mathematical operations to process input data, resulting in a prediction.
  2. Modern machine learning methods use greater numbers of mathematical operations than traditional regression techniques to better define complex relationships between risk factors and outcomes. Irrespective of how a diagnostic tool is derived, it has to be evaluated using a 3-step process: deriving the tool, validating it, and establishing its clinical effectiveness.
  3. The name machine learning is used because these methods learn from examples during a process called training. There are two commonly used machine learning schemes: supervised learning and unsupervised learning.
  4. Machine learning–based tools should also be assessed for the type of machine learning model used and its appropriateness for the input data type and data set size.
  5. Machine learning models generally have additional prespecified settings called hyperparameters (parameters that are established before a model is trained and remain fixed through the training process), which must be tuned on a data set independent of the validation set.
  6. On the validation set, the outcome against which the model is evaluated is termed the reference standard. The rigor of the reference standard must also be assessed, for example, whether it is based on a universally accepted gold standard or on expert grading.
  7. Similar to how a diagnostic test can be used (in principle) for triaging, screening, or diagnostic purposes, a machine learning model, developed to perform a specific task, can be used for several purposes.
  8. Even if a machine learning model has been thoroughly validated in different studies and the logistical, technical, and regulatory hurdles to integrating it into the clinical workflow have been overcome, further research is still required to measure its clinical effectiveness.
  9. Readers of studies reporting the results of machine learning systems should assess the most crucial elements of machine learning model validation, such as whether the study design overstates model performance through inappropriate hyperparameter tuning or a poor-quality reference standard.
  10. Finally, clinical gestalt plays an important role in evaluating whether the results are believable: because one of the biggest strengths of machine learning models is consistency and the lack of fatigue, a useful check for believable machine learning results is whether an experienced expert could reproduce the claimed accuracy given an abundance of time. Results that substantially differ from what such a hypothetical expert is capable of should be scrutinized and re-validated carefully.
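
Points 5, 6, and 9 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the article: the data are synthetic, the model is a toy one-feature nearest-neighbor classifier, and the names (make_data, knn_predict, best_k, and so on) are hypothetical. It shows a hyperparameter (the number of neighbors, k) being chosen on a tuning set that is kept separate from the validation set, so that the sensitivity and specificity reported on the validation set are not inflated by the tuning step.

```python
import random

def make_data(n, seed):
    """Synthetic (feature, label) pairs; label 1 tends to have larger features.
    The labels stand in for the reference standard (an assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(1.0 if label else -1.0, 1.0)
        data.append((feature, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training examples closest to x (k is odd)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, data, k):
    """Fraction of examples in data whose prediction matches the label."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

def sensitivity_specificity(train, data, k):
    """Sensitivity and specificity of the model against the labels in data."""
    tp = fn = tn = fp = 0
    for x, y in data:
        pred = knn_predict(train, x, k)
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Three non-overlapping sets: training, tuning, and validation.
data = make_data(600, seed=0)
train, tune, validation = data[:300], data[300:450], data[450:]

# The hyperparameter k is fixed before evaluation, using the tuning set only.
candidates = [1, 3, 5, 7, 9, 15]
best_k = max(candidates, key=lambda k: accuracy(train, tune, k))

# The validation set is touched once, after k has been chosen.
sens, spec = sensitivity_specificity(train, validation, best_k)
print(f"k={best_k} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Had k been selected by its accuracy on the validation set itself, the reported figures would tend to be optimistic, which is the kind of design flaw point 9 asks readers to look for.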

Keywords: Diagnostic Tests, Routine, Learning, Primary Prevention, Risk Factors, Treatment Outcome, Workflow

