

Welcome to our research page featuring recent publications in the fields of biostatistics and epidemiology! These fields play a crucial role in advancing our understanding of the causes, prevention, and treatment of various health conditions. Our team is dedicated to advancing these fields through innovative studies and cutting-edge statistical analyses. On this page, you will find our collection of research publications describing the development of new statistical methods and their application to real-world data. Please feel free to contact us with any questions or comments.


Evaluating individualized treatment effect predictions: A model-based perspective on discrimination and calibration assessment

In recent years, there has been growing interest in the prediction of individualized treatment effects. While there is a rapidly growing literature on the development of such models, there is little literature on the evaluation of their performance. In this paper, we aim to facilitate the validation of prediction models for individualized treatment effects. The estimands of interest are defined based on the potential outcomes framework, which facilitates a comparison of existing and novel measures. In particular, we examine existing measures of discrimination for benefit (variations of the c-for-benefit), and propose model-based extensions to the treatment effect setting for discrimination and calibration metrics that have a strong basis in outcome risk prediction. The main focus is on randomized trial data with binary endpoints and on models that provide individualized treatment effect predictions and potential outcome predictions. We use simulated data to provide insight into the characteristics of the discrimination and calibration statistics under consideration, and further illustrate all methods in a trial of acute ischemic stroke treatment. The results show that the proposed model-based statistics had the best characteristics in terms of bias and accuracy. While resampling methods adjusted for the optimism of performance estimates in the development data, they had a high variance across replications that limited their accuracy. Therefore, individualized treatment effect models are best validated in independent data. To aid implementation, the proposed methods have been made available as a software package in R.
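
To make the setting concrete, the hedged R sketch below simulates a randomized trial with a binary endpoint, fits an outcome model with treatment-covariate interactions, and derives individualized treatment effect predictions as the difference between the two predicted potential-outcome risks. All variable names, effect sizes, and the simulated data are illustrative assumptions, not the paper's implementation.

```r
# Illustrative sketch only: individualized treatment effect prediction
# in a simulated randomized trial with a binary endpoint.
set.seed(42)
n  <- 2000
x1 <- rnorm(n)                      # hypothetical continuous covariate
x2 <- rbinom(n, 1, 0.4)             # hypothetical binary covariate
tr <- rbinom(n, 1, 0.5)             # 1:1 randomized treatment assignment
lp <- -1 + 0.8 * x1 + 0.5 * x2 + tr * (-0.6 + 0.4 * x1)  # true linear predictor with effect modification
y  <- rbinom(n, 1, plogis(lp))      # observed binary outcome
dat <- data.frame(y, tr, x1, x2)

# Outcome model with treatment-covariate interactions
fit <- glm(y ~ tr * (x1 + x2), family = binomial, data = dat)

# Predicted potential-outcome risks under control and under treatment
p0 <- predict(fit, newdata = transform(dat, tr = 0), type = "response")
p1 <- predict(fit, newdata = transform(dat, tr = 1), type = "response")

# Predicted individualized treatment effect on the risk-difference scale;
# predictions like these are what discrimination and calibration measures
# for benefit are meant to evaluate.
ite_hat <- p1 - p0
summary(ite_hat)
```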

Journal: Stat Med | Year: 2024

Propensity-based standardization to enhance the validation and interpretation of prediction model discrimination for a target population

External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging, as the ability to discriminate depends on both the sample characteristics (i.e., case-mix) and the generalizability of predictor coefficients, but most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) the discriminative ability of a prediction model as observed across multiple studies, and may guide model updating strategies for a particular target population. Careful propensity score modeling with attention to non-linear relations is recommended.
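
As a rough illustration of the idea, the base-R sketch below computes a propensity-weighted c-statistic in a validation sample, using weights from a model for development-sample membership so that the weighted validation sample mimics the development case-mix. The data frames dev and val, their columns, and this particular weighting scheme are assumptions for illustration; this is not the paper's exact estimator.

```r
# Hedged illustration: a propensity-weighted c-statistic that standardizes for
# case-mix differences between a development sample `dev` and a validation
# sample `val`. Both are assumed to be data frames with covariates x1 and x2,
# the observed outcome y, and the model's predicted risk `pred` (hypothetical names).

weighted_cstat <- function(pred, y, w) {
  # Weighted probability that a randomly drawn event has a higher predicted
  # risk than a randomly drawn non-event (ties count as 1/2)
  i1 <- which(y == 1)
  i0 <- which(y == 0)
  num <- 0; den <- 0
  for (i in i1) {
    wij <- w[i] * w[i0]
    num <- num + sum(wij * ((pred[i] > pred[i0]) + 0.5 * (pred[i] == pred[i0])))
    den <- den + sum(wij)
  }
  num / den
}

# 1) Membership model fitted on the stacked development + validation data
stacked <- rbind(transform(dev, member = 1), transform(val, member = 0))
ps_fit  <- glm(member ~ x1 + x2, family = binomial, data = stacked)

# 2) Weights for validation subjects: odds of development-sample membership,
#    so the weighted validation sample resembles the development case-mix
ps <- predict(ps_fit, newdata = val, type = "response")
w  <- ps / (1 - ps)

# 3) Case-mix-standardized c-statistic in the validation sample
weighted_cstat(val$pred, val$y, w)
```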

Journal: Stat Med | Year: 2023

Current trends in the application of causal inference methods to pooled longitudinal non-randomised data: a protocol for a methodological systematic review

Introduction: Causal methods have been adopted and adapted across health disciplines, particularly for the analysis of single studies. However, the sample sizes necessary to best inform decision-making are often not attainable with single studies, making pooled individual-level data analysis invaluable for public health efforts. Researchers commonly implement causal methods prevailing in their home disciplines, and how these are selected, evaluated, implemented, and reported may vary widely. To our knowledge, no article has yet evaluated trends in the implementation and reporting of causal methods in studies leveraging individual-level data pooled from several studies. We undertake this review to uncover patterns in the implementation and reporting of causal methods used across disciplines in research focused on health outcomes. We will investigate variations in the methods used to infer causality across disciplines, time, and geography, and identify gaps in the reporting of methods to inform the development of reporting standards and the conversation required to effect change.

Methods and analysis: We will search four databases (EBSCO, Embase, PubMed, Web of Science) using a search strategy developed with librarians from three universities (Heidelberg University, Harvard University, and University of California, San Francisco). The search strategy includes terms such as "pool*", "harmoniz*", "cohort*", "observational", and variations on "individual-level data". Four reviewers will independently screen articles using Covidence and extract data from included articles. The extracted data will be analysed descriptively, in tables and graphically, to reveal patterns in methods implementation and reporting. This protocol has been registered with PROSPERO (CRD42020143148).

Ethics and dissemination: No ethical approval was required as only publicly available data were used. The results will be submitted as a manuscript to a peer-reviewed journal, disseminated at conferences if relevant, and published as part of doctoral dissertations in Global Health at the Heidelberg University Hospital.

Journal: BMJ Open | Year: 2021 | Citations: 3

Developing more generalizable prediction models from pooled studies and large clustered data sets

Prediction models often yield inaccurate predictions for new individuals. Large data sets from pooled studies or electronic healthcare records may alleviate this with an increased sample size and variability in sample characteristics. However, existing strategies for prediction model development generally do not account for heterogeneity in predictor-outcome associations between different settings and populations. This limits the generalizability of developed models (even from large, combined, clustered data sets) and necessitates local revisions. We aim to develop methodology for producing prediction models that require less tailoring to different settings and populations. We adopt internal-external cross-validation to assess and reduce heterogeneity in models' predictive performance during model development. We propose a predictor selection algorithm that optimizes the (weighted) average performance while minimizing its variability across the hold-out clusters (or studies). Predictors are added iteratively until the estimated generalizability is optimized. We illustrate this by developing a model for predicting the risk of atrial fibrillation and updating an existing one for diagnosing deep vein thrombosis, using individual participant data from 20 cohorts (N = 10 873) and 11 diagnostic studies (N = 10 014), respectively. Meta-analysis of calibration and discrimination performance in each hold-out cluster shows that trade-offs occurred between average performance and its heterogeneity across clusters. Our methodology enables the assessment of heterogeneity of prediction model performance during model development in multiple or clustered data sets, thereby informing researchers on predictor selection to improve generalizability to different settings and populations and reduce the need for model tailoring. Our methodology has been implemented in the R package metamisc.
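
A minimal sketch of the internal-external cross-validation idea is shown below: each cluster (or study) is held out in turn, a model is fitted on the remaining clusters, and its discrimination is evaluated in the held-out cluster, after which the average and spread of performance across clusters can be inspected. The data frame dat, its columns, and the simple logistic model are illustrative assumptions; the iterative predictor selection algorithm and the meta-analytic summaries implemented in metamisc are omitted here.

```r
# Hedged sketch of internal-external cross-validation (IECV). The data frame
# `dat` is assumed to contain a cluster identifier `study`, a binary outcome
# `y`, and candidate predictors x1, x2, x3 -- all hypothetical names.

cstat <- function(pred, y) {
  # c-statistic via the Mann-Whitney / Wilcoxon rank identity
  r  <- rank(pred)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

studies <- unique(dat$study)
perf <- sapply(studies, function(s) {
  train <- dat[dat$study != s, ]   # develop on all other clusters
  test  <- dat[dat$study == s, ]   # validate on the held-out cluster
  fit   <- glm(y ~ x1 + x2 + x3, family = binomial, data = train)
  cstat(predict(fit, newdata = test, type = "response"), test$y)
})

# Average discriminative performance and its variability across hold-out
# clusters; a predictor selection algorithm can trade these off iteratively.
c(mean = mean(perf), sd = sd(perf))
```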

Journal: Stat Med | Year: 2021 | Citations: 17

A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes

It is widely recommended that any developed diagnostic or prognostic prediction model is externally validated in terms of its predictive performance, measured by calibration and discrimination. When multiple validations have been performed, a systematic review followed by a formal meta-analysis helps to summarize overall performance across multiple settings, and reveals under which circumstances the model performs suboptimally and may need adjustment. We discuss how to undertake meta-analysis of the performance of prediction models with either a binary or a time-to-event outcome. We address how to deal with incomplete availability of study-specific results (performance estimates and their precision), and how to produce summary estimates of the c-statistic, the observed:expected ratio, and the calibration slope. Furthermore, we discuss the implementation of frequentist and Bayesian meta-analysis methods, and propose novel empirically based prior distributions to improve estimation of between-study heterogeneity in small samples. Finally, we illustrate all methods using two examples: meta-analysis of the predictive performance of EuroSCORE II and of the Framingham Risk Score. All examples and meta-analysis models have been implemented in our newly developed R package "metamisc".
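
The base-R sketch below illustrates one ingredient of such a meta-analysis: study-specific c-statistics are logit-transformed, pooled with a DerSimonian-Laird random-effects model, and back-transformed into a summary estimate with a confidence interval. The input values are invented for illustration; the metamisc package mentioned in the abstract implements the full framework, including the observed:expected ratio, the calibration slope, and Bayesian estimation.

```r
# Hedged illustration: random-effects meta-analysis of the c-statistic on the
# logit scale (DerSimonian-Laird). The validation results below are made up.

cstat    <- c(0.71, 0.68, 0.75, 0.66, 0.72)   # hypothetical study-specific c-statistics
cstat.se <- c(0.02, 0.03, 0.04, 0.02, 0.03)   # their standard errors

yi <- qlogis(cstat)                            # logit transform
vi <- (cstat.se / (cstat * (1 - cstat)))^2     # delta-method variance on the logit scale

w    <- 1 / vi                                 # fixed-effect (inverse-variance) weights
ybar <- sum(w * yi) / sum(w)
Q    <- sum(w * (yi - ybar)^2)                 # Cochran's Q
tau2 <- max(0, (Q - (length(yi) - 1)) / (sum(w) - sum(w^2) / sum(w)))  # DL between-study variance

wr    <- 1 / (vi + tau2)                       # random-effects weights
mu    <- sum(wr * yi) / sum(wr)                # pooled logit c-statistic
se.mu <- sqrt(1 / sum(wr))

# Summary c-statistic with a 95% confidence interval, back-transformed
plogis(mu + c(est = 0, lower = -1.96, upper = 1.96) * se.mu)
```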

Journal: Stat Methods Med Res | Year: 2018 | Citations: 109

Practical Implications of Using Real-World Evidence in Comparative Effectiveness Research: Learnings from IMI-GetReal

In light of the increasing attention towards the use of Real-World Evidence (RWE) in decision making in recent years, this commentary aims to reflect on the experiences gained in accessing and using RWE for Comparative Effectiveness Research (CER) as part of the Innovative Medicines Initiative GetReal Consortium (IMI-GetReal), and to discuss their implications for the use of RWE in decision-making. For the purposes of this commentary, we define RWE as evidence generated from health data collected outside the context of randomized controlled trials (RCTs). Meanwhile, we define CER as the conduct and/or synthesis of research comparing the benefits and harms of alternative interventions and strategies to prevent, diagnose, treat, and monitor health conditions in routine clinical practice (i.e., the real-world setting). The equivalent term for CER in the European context of Health Technology Assessment (HTA) and decision making is Relative Effectiveness Assessment (REA).

Journal: J Comp Eff Res | Year: 2017 | Citations: 13