Welcome to our research page featuring recent publications in biostatistics and epidemiology! These fields play a crucial role in understanding the causes, prevention, and treatment of various health conditions, and our team advances them through innovative studies and cutting-edge statistical analyses. On this page you will find our collection of research publications describing the development of new statistical methods and their application to real-world data. Please feel free to contact us with any questions or comments.
Introduction: A major challenge in the use of prediction models in clinical care is missing data. Real-time imputation may alleviate this. However, the extent to which clinicians accept this solution remains unknown. We aimed to assess acceptance of real-time imputation for missing patient data in a clinical decision support system (CDSS) that includes the individual patient's absolute 10-year cardiovascular risk.
Methods: We performed a vignette study extending an existing CDSS with the real-time imputation method joint modelling imputation (JMI). Seventeen clinicians used the CDSS with three different vignettes describing potential use cases (missing data, no risk estimate; imputed values, risk estimate based on imputed data; complete information). In each vignette, missing data were introduced to mimic a situation that could occur in clinical practice. End-user acceptance was assessed on three axes: clinical realism, comfort, and added clinical value.
Results: Overall, the imputed predictor values were found to be clinically reasonable and in line with expectations. However, for binary variables, the use of a probability scale to express uncertainty was deemed inconvenient. Perceived comfort with imputed risk predictions was low, and confidence intervals were deemed too wide for reliable decision making. The clinicians acknowledged added value of using JMI in clinical practice for educational, research, or informative purposes.
Conclusion: Handling missing data in a CDSS via JMI is useful, but more accurate imputations are needed to make clinicians comfortable with its use in routine care. Only then can a CDSS create clinical value by improving decision making.
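To make concrete why imputed risk estimates can carry wide intervals, the sketch below propagates the uncertainty of a single imputed predictor into a predicted risk. It is a minimal illustration only: the joint normal model, the risk coefficients, and the variable names are all hypothetical, not those of the CDSS studied in this paper.

```python
import numpy as np

# Hypothetical joint normal model of two predictors: systolic blood pressure
# (index 0) and total cholesterol (index 1). Means and covariance are invented.
mu = np.array([140.0, 5.5])
Sigma = np.array([[300.0, 8.0],
                  [8.0,   1.2]])

# Cholesterol is observed for this patient; blood pressure is missing.
obs_idx, mis_idx = [1], [0]
x_obs = np.array([6.1])

# Conditional distribution of the missing predictor given the observed one
# (standard multivariate normal conditioning, the core of JMI-style imputation).
S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
cond_mean = mu[mis_idx] + S_mo @ np.linalg.solve(S_oo, x_obs - mu[obs_idx])
cond_var = Sigma[np.ix_(mis_idx, mis_idx)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)

def risk(sbp, chol):
    # Hypothetical 10-year risk model (logistic form, invented coefficients).
    lp = -7.0 + 0.03 * sbp + 0.25 * chol
    return 1 / (1 + np.exp(-lp))

# Draw many plausible values for the missing predictor, compute the risk for
# each draw, and report a percentile interval reflecting imputation uncertainty.
rng = np.random.default_rng(0)
m, s = cond_mean[0], float(np.sqrt(cond_var[0, 0]))
draws = rng.normal(m, s, size=5000)
risks = risk(draws, x_obs[0])
print(f"point estimate: {risk(m, x_obs[0]):.3f}")
print(f"95% uncertainty interval: [{np.percentile(risks, 2.5):.3f}, "
      f"{np.percentile(risks, 97.5):.3f}]")
```

When the observed characteristics are only weakly informative about the missing one, the conditional distribution is wide, and so is the resulting risk interval, mirroring the clinicians' concern about intervals too wide for decision making.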
Background: Modelling non-linear associations between an outcome and continuous patient characteristics, whilst investigating heterogeneous treatment effects, is one of the opportunities offered by individual participant data meta-analysis (IPD-MA). Splines offer great flexibility, but guidance is lacking.
Objective: To introduce modelling of non-linear associations using restricted cubic splines (RCS), natural B-splines, P-splines, and smoothing splines in IPD-MA to estimate absolute treatment effects.
Methods: We describe the pooling of spline-based models using pointwise and multivariate meta-analysis (two-stage methods) and one-stage generalised additive mixed effects models (GAMMs). We illustrate their performance on three IPD-MA scenarios of five studies each: one where only the associations differ across studies, one where only the ranges of the effect modifier differ, and one where both differ. We also evaluate the approaches in an empirical example, modelling the risk of fever and/or ear pain in children with acute otitis media conditional on age.
Results: In the first scenario, all pooling methods showed similar results. In the second and third scenarios, pointwise meta-analysis was flexible but produced non-smooth curves with wide confidence intervals; multivariate meta-analysis failed to converge with RCS but was efficient with natural B-splines. GAMMs produced smooth pooled regression curves in all settings. In the empirical example, results were similar to the second and third scenarios, except that multivariate meta-analysis with RCS now converged.
Conclusion: We provide guidance on the use of splines in IPD-MA to capture heterogeneous treatment effects in the presence of non-linear associations, thereby facilitating the estimation of absolute treatment effects to enhance personalized healthcare.
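As a rough illustration of the two-stage approach, the sketch below fits a natural cubic spline per study on simulated IPD and pools the fitted curves pointwise with inverse-variance weights. It assumes statsmodels with patsy's cr() spline basis; the data, associations, and grid are invented, and the paper's multivariate meta-analysis and one-stage GAMM approaches are not shown.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
grid = pd.DataFrame({"age": np.linspace(2, 9, 40)})
curves, variances = [], []

# Stage 1: per study, fit a natural cubic regression spline (patsy's cr) and
# predict the age-outcome association on a common grid.
for study in range(5):
    age = rng.uniform(1, 10, 300)
    # Study-specific non-linear association, standing in for real IPD.
    y = np.sin(age / (1.5 + 0.2 * study)) + rng.normal(0, 0.3, 300)
    fit = smf.ols("y ~ cr(age, df=4)",
                  data=pd.DataFrame({"age": age, "y": y})).fit()
    pred = fit.get_prediction(grid)
    curves.append(pred.predicted_mean)
    variances.append(pred.se_mean ** 2)

# Stage 2: pointwise (fixed-effect) meta-analysis at every grid value,
# weighting each study's fitted curve by its inverse variance.
curves, variances = np.array(curves), np.array(variances)
w = 1.0 / variances
pooled = (w * curves).sum(axis=0) / w.sum(axis=0)
pooled_se = np.sqrt(1.0 / w.sum(axis=0))
print(pd.DataFrame({"age": grid["age"], "pooled": pooled,
                    "se": pooled_se}).head())
```

Pointwise pooling keeps each study's curve fully flexible but, as the abstract notes, can yield non-smooth pooled curves; one-stage GAMMs avoid this by fitting all studies within a single model.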
Individual participant data (IPD) from multiple sources allows external validation of a prognostic model across multiple populations. Often this reveals poor calibration, potentially causing poor predictive performance in some populations. However, rather than discarding the model outright, it may be possible to improve its performance using recalibration techniques. We use IPD meta-analysis to identify the simplest method that achieves good model performance. We examine four options for recalibrating an existing time-to-event model across multiple populations: (i) shifting the baseline hazard by a constant, (ii) re-estimating the shape of the baseline hazard, (iii) adjusting the prognostic index as a whole, and (iv) adjusting individual predictor effects. For each strategy, IPD meta-analysis examines (heterogeneity in) model performance across populations. Additionally, the probability of achieving good performance in a new population can be calculated, allowing the recalibration methods to be ranked. In an applied example, IPD meta-analysis reveals that the existing model had poor calibration in some populations and large heterogeneity across populations. However, re-estimation of the intercept substantially improved the expected calibration in new populations and reduced between-population heterogeneity. Comparing recalibration strategies showed that re-estimating both the magnitude and the shape of the baseline hazard gave the highest predicted probability of good performance in a new population. In conclusion, IPD meta-analysis allows a prognostic model to be externally validated in multiple settings, and enables recalibration strategies to be compared and ranked to choose the least aggressive strategy that achieves acceptable external model performance without discarding existing model information.
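A minimal sketch of two of the four recalibration strategies, assuming the lifelines package and simulated data: strategy (iii) refits a single coefficient for the existing prognostic index (the calibration slope), and strategy (iv) re-estimates the individual predictor effects. Strategies (i) and (ii), which adjust only the baseline hazard, are omitted, and all variable names and coefficients are hypothetical.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 1000

# Hypothetical existing model developed elsewhere: PI = 0.7*x1 + 0.3*x2.
x1, x2 = rng.normal(size=n), rng.normal(size=n)
pi = 0.7 * x1 + 0.3 * x2

# New population where the true effects differ from the existing model.
t = rng.exponential(1.0 / np.exp(0.4 * x1 + 0.3 * x2 - 1.5))
event = (t < 5).astype(int)
df = pd.DataFrame({"time": np.minimum(t, 5), "event": event,
                   "pi": pi, "x1": x1, "x2": x2})

# Strategy (iii): adjust the prognostic index as a whole. The fitted
# coefficient is the calibration slope; a value near 1 would mean the
# existing PI transfers well. The Breslow baseline hazard is re-estimated
# as a by-product of the fit.
cph_pi = CoxPHFitter().fit(df[["time", "event", "pi"]], "time", "event")
print("calibration slope for PI:", cph_pi.params_["pi"].round(2))

# Strategy (iv): re-estimate the individual predictor effects.
cph_full = CoxPHFitter().fit(df[["time", "event", "x1", "x2"]], "time", "event")
print(cph_full.params_.round(2))
```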
Aims: Use of prediction models is widely recommended by clinical guidelines, but usually requires complete information on all predictors, which is not always available in daily practice. We aim to describe two methods for real-time handling of missing predictor values when using prediction models in practice.
Methods and results: We compare the widely used method of mean imputation (M-imp) to a method that personalizes the imputations by taking advantage of the observed patient characteristics. These characteristics may include both prediction model variables and other characteristics (auxiliary variables). The method was implemented using imputation from a joint multivariate normal model of the patient characteristics (joint modelling imputation; JMI). Data from two different cardiovascular cohorts with cardiovascular predictors and outcomes were used to evaluate the real-time imputation methods. We quantified the prediction model's overall performance [mean squared error (MSE) of the linear predictor], discrimination (c-index), calibration (intercept and slope), and net benefit (decision curve analysis). Compared with mean imputation, JMI substantially improved the MSE (0.10 vs. 0.13), c-index (0.70 vs. 0.68), and calibration (calibration-in-the-large: 0.04 vs. 0.06; calibration slope: 1.01 vs. 0.92), especially when incorporating auxiliary variables. When the imputation model was derived from an external cohort, calibration deteriorated, but discrimination remained similar.
Conclusions: We recommend JMI with auxiliary variables for real-time imputation of missing values, and to update imputation models when implementing them in new settings or (sub)populations.
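The sketch below illustrates the gist of the comparison on simulated data: imputing a missing predictor by its marginal mean versus by its conditional mean given the observed predictor and an auxiliary variable, as JMI does. It is a stylized stand-in for the paper's analysis, not a reimplementation; the coefficients and correlations are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Three correlated patient characteristics: x1 and x2 are model predictors,
# z is an auxiliary variable that is not in the prediction model itself.
cov = np.array([[1.0, 0.5, 0.7],
                [0.5, 1.0, 0.3],
                [0.7, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=n)
x1, x2, z = X.T
beta = np.array([0.8, 0.4])        # invented prediction model coefficients
lp_true = X[:, :2] @ beta          # linear predictor with complete data

# Suppose x1 is missing at prediction time.
# M-imp: replace x1 by its marginal mean, ignoring what was observed.
lp_mimp = x1.mean() * beta[0] + x2 * beta[1]

# JMI-style: impute x1 by its conditional mean given x2 AND the auxiliary z,
# derived here via least squares on complete data (equivalent to conditioning
# in a joint normal model).
A = np.column_stack([np.ones(n), x2, z])
coef, *_ = np.linalg.lstsq(A, x1, rcond=None)
lp_jmi = (A @ coef) * beta[0] + x2 * beta[1]

print(f"MSE of linear predictor, mean imputation:   {np.mean((lp_mimp - lp_true) ** 2):.3f}")
print(f"MSE of linear predictor, conditional (JMI): {np.mean((lp_jmi - lp_true) ** 2):.3f}")
```

The conditional approach wins precisely because the auxiliary variable carries information about the missing predictor, which is why the conclusion recommends including auxiliary variables.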
Missing data present challenges for the development and real-world application of clinical prediction models. While these challenges have received considerable attention in the development setting, research on the handling of missing data in applied settings remains sparse. The key distinguishing feature of these settings is that missing data methods must be applied to a single new individual, precluding direct application of the mainstay methods used during model development. Correspondingly, we propose that model validation should use missing data methods that transfer to practice for single new patients. This article compares existing and new methods to account for missing data in a new individual in the context of prediction. These methods are based on (i) submodels fitted to the observed data only, (ii) marginalization over the missing variables, or (iii) imputation based on fully conditional specification (also known as chained equations). They were compared in an internal validation setting to highlight the use of missing data methods that transfer to practice while a model is being validated. As a reference, they were compared with multiple imputation by chained equations in a set of test patients, as this has been used in past validation studies. The methods were evaluated in a simulation study in which performance was measured by the optimism-corrected C-statistic and the mean squared prediction error, and they were applied to data from a large Dutch cohort of patients with a prophylactic implantable cardioverter defibrillator.
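As one concrete way to make chained-equations imputation transfer to a single new patient, the sketch below fits scikit-learn's IterativeImputer (a fully-conditional-specification implementation) on development data and then applies it to one new record. This is an assumed setup chosen for illustration, not necessarily the article's exact procedure.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)

# Development data: three correlated predictors with sporadic missingness.
cov = [[1.0, 0.6, 0.4], [0.6, 1.0, 0.5], [0.4, 0.5, 1.0]]
X_dev = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=2000)
X_dev[rng.random(X_dev.shape) < 0.2] = np.nan

# Fit the chained-equations imputation models once, on the development data,
# and ship the fitted imputer together with the prediction model.
imputer = IterativeImputer(random_state=0).fit(X_dev)

# At prediction time, a single new patient with one missing predictor is
# imputed conditionally on their observed values; no batch of new patients
# is required, which is exactly the constraint described above.
x_new = np.array([[np.nan, 1.2, -0.4]])
print(imputer.transform(x_new))
```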
We present and compare multiple imputation methods for multilevel continuous and binary data in which variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate for handling missing values in a multilevel setting, and why their relative performance varies with the missing data pattern, the multilevel structure, and the type of missing variable. The study shows that valid inferences can only be obtained if the dataset contains a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given for choosing the most suitable multiple imputation method according to the structure of the data.
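The sketch below is a stylized illustration, not the multilevel MI procedures compared in the paper: it shows why drawing imputations with a single pooled (homoscedastic) residual SD distorts cluster-specific variances when the true variances differ across clusters, whereas cluster-specific (heteroscedastic) draws preserve them.

```python
import numpy as np

rng = np.random.default_rng(5)
n_clusters, n_per = 30, 50

# Multilevel variable with cluster-specific means AND variances
# (a heteroscedastic data-generating mechanism).
mus = rng.normal(0.0, 1.0, n_clusters)
sds = rng.uniform(0.5, 2.0, n_clusters)
x = np.array([rng.normal(mus[c], sds[c], n_per) for c in range(n_clusters)])

# Sporadically missing values: some entries missing within every cluster.
miss = rng.random(x.shape) < 0.3
x_obs = np.where(miss, np.nan, x)

clus_mean = np.nanmean(x_obs, axis=1, keepdims=True)
clus_sd = np.nanstd(x_obs, axis=1, keepdims=True)
pooled_sd = np.nanstd(x_obs - clus_mean)  # one within-cluster SD for all

# Homoscedastic draws use the single pooled SD; heteroscedastic draws use
# cluster-specific SDs (feasible only with enough individuals per cluster).
imp_homo = clus_mean + rng.normal(0.0, pooled_sd, x.shape)
imp_het = clus_mean + rng.normal(0.0, 1.0, x.shape) * clus_sd

for name, imp in [("homoscedastic", imp_homo), ("heteroscedastic", imp_het)]:
    completed = np.where(miss, imp, x)
    err = np.abs(completed.std(axis=1) - sds).mean()
    print(f"{name}: mean abs. error in cluster SDs = {err:.2f}")
```

With many individuals per cluster, the heteroscedastic draws recover the cluster-specific spread much more closely, consistent with the paper's finding that homoscedastic methods should be reserved for data with few individuals per cluster.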