loader
Measuring the performance of survival models to personalize treatment choices
Efthimiou O, Hoogland J, Debray TPA, Ribero VA, Knol W, Koek HL, Schwenkglenks M, Henrard S, Egger M, Rodondi N, White IR

Various statistical and machine learning algorithms can be used to predict treatment effects at the patient level using data from randomized clinical trials (RCTs). Such predictions can facilitate individualized treatment decisions. Recently, a range of methods and metrics were developed for assessing the accuracy of such predictions. Here, we extend these methods, focusing on the case of survival (time-to-event) outcomes. We start by providing alternative definitions of the participant-level treatment benefit; subsequently, we summarize existing and propose new measures for assessing the performance of models estimating participant-level treatment benefits. We explore metrics assessing discrimination and calibration for benefit and decision accuracy. These measures can be used to assess the performance of statistical as well as machine learning models and can be useful during model development (i.e., for model selection or for internal validation) or when testing a model in new settings (i.e., in an external validation). We illustrate methods using simulated data and real data from the OPERAM trial, an RCT in multimorbid older people, which randomized participants to either standard care or a pharmacotherapy optimization intervention. We provide R codes for implementing all models and measures.