Risk stratification is essential for targeting prevention, management and treatment strategies at the individuals who are most likely to benefit. This requires estimating the absolute risk of unfavorable outcomes in individual patients, and implementing risk prediction models that combine information from social determinants of health, signs and symptoms, comorbidity data, imaging results or laboratory markers. Prediction models can be particularly useful during emerging outbreaks, where large numbers of people need to be screened for disease and assessed for subsequent outcomes. For instance, more than 60 prediction models have been developed during the past few months for diagnosing coronavirus disease 2019 (covid-19), for assessing the risk of infection, and for estimating subsequent prognosis.
Unfortunately, the development and validation of prediction models is not straightforward during an emerging outbreak. Due to resource constraints, existing cohorts are often small, adopt selective, often pathogen-specific design choices, and focus on a limited set of exposures and data types. In addition, many studies are conducted hastily, analyzed using inappropriate statistical methods, and poorly reported. As a consequence, the vast majority of developed prediction models are unreliable and unsuitable for use in routine care.
Collaborative research is urgently needed to reduce the fragmentation of ongoing research activities, and to improve the overall quality and validity of clinical prediction models. For this reason, there is an increasing demand to share participant-level data and to perform cross-cohort analyses. This strategy is known as individual participant data meta-analysis (IPD-MA), and it directly facilitates testing of developed prediction models across different health care settings, time periods and clinical phenotypes.
In this talk, I will introduce key principles of risk prediction modelling and discuss the opportunities of performing an IPD-MA. I will provide several illustrations, and explain statistical methods to address harmonization issues, missing data, and between-study heterogeneity. Finally, I will highlight recent initiatives aiming to improve the prediction of Zika virus and covid-19.