Welcome to our research page featuring recent publications in biostatistics and epidemiology! These fields play a crucial role in understanding the causes, prevention, and treatment of a wide range of health conditions, and our team advances them through innovative studies and cutting-edge statistical analyses. On this page you will find our research publications, which describe the development of new statistical methods and their application to real-world data. Please feel free to contact us with any questions or comments.
External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging: the ability to discriminate depends on both the sample characteristics (i.e., case-mix) and the generalizability of predictor coefficients, yet most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) discriminative ability of a prediction model as observed across multiple studies, and may guide model updating strategies for a particular target population. Careful propensity score modeling with attention to non-linear relations is recommended.
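As a rough illustration of the weighting idea, the sketch below estimates propensity scores for development-sample membership with a simple logistic model and uses the resulting weights to compute a case-mix-standardized concordance (C) statistic in a validation sample. The function names, the logistic propensity model, and the odds-based weighting scheme are illustrative assumptions, not the exact specification used in the paper.

```python
# A minimal sketch, assuming individual-level predictor data from the development
# sample (X_dev) and the validation sample (X_val), plus validation outcomes (y_val)
# and predicted risks, are available. All names here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def membership_weights(X_dev, X_val):
    """Weights that re-weight the validation sample toward the development case-mix."""
    X = np.vstack([X_dev, X_val])
    member = np.concatenate([np.ones(len(X_dev)), np.zeros(len(X_val))])  # 1 = development
    ps = LogisticRegression(max_iter=1000).fit(X, member).predict_proba(X_val)[:, 1]
    return ps / (1 - ps)  # odds of development-sample membership

def weighted_c_statistic(y, p, w):
    """Concordance between outcomes y and predicted risks p, with per-subject weights w."""
    cases, controls = np.where(y == 1)[0], np.where(y == 0)[0]
    pair_w = np.outer(w[cases], w[controls])        # weight of each case-control pair
    diff = p[cases][:, None] - p[controls][None, :]
    concordant = (diff > 0) + 0.5 * (diff == 0)     # ties count as half-concordant
    return np.sum(pair_w * concordant) / np.sum(pair_w)

# Usage (hypothetical arrays):
#   w = membership_weights(X_dev, X_val)
#   c_standardized = weighted_c_statistic(y_val, risk_predictions_val, w)
```

Setting all weights to 1 recovers the ordinary C-statistic, so the weighted version isolates the contribution of case-mix to observed differences in discrimination.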
Introduction: Causal methods have been adopted and adapted across health disciplines, particularly for the analysis of single studies. However, the sample sizes necessary to best inform decision-making are often not attainable with single studies, making pooled individual-level data analysis invaluable for public health efforts. Researchers commonly implement causal methods prevailing in their home disciplines, and how these are selected, evaluated, implemented and reported may vary widely. To our knowledge, no article has yet evaluated trends in the implementation and reporting of causal methods in studies leveraging individual-level data pooled from several studies. We undertake this review to uncover patterns in the implementation and reporting of causal methods used across disciplines in research focused on health outcomes. We will investigate how methods to infer causality vary across disciplines, time and geography, and identify gaps in the reporting of methods, to inform the development of reporting standards and the conversation required to effect change.
Methods and analysis: We will search four databases (EBSCO, Embase, PubMed, Web of Science) using a search strategy developed with librarians from three universities (Heidelberg University, Harvard University, and University of California, San Francisco). The search strategy includes terms such as "pool*", "harmoniz*", "cohort*", "observational", and variations on "individual-level data". Four reviewers will independently screen articles using Covidence and extract data from included articles. The extracted data will be analysed descriptively, in tables and graphs, to reveal patterns in methods implementation and reporting. This protocol has been registered with PROSPERO (CRD42020143148).
Ethics and dissemination: No ethical approval was required as only publicly available data were used. The results will be submitted as a manuscript to a peer-reviewed journal, disseminated at conferences if relevant, and published as part of doctoral dissertations in Global Health at the Heidelberg University Hospital.
Objectives: In clinical practice, many prediction models cannot be used when predictor values are missing. We therefore propose and evaluate methods for real-time imputation.
Study Design and Setting: We describe (i) mean imputation (where missing values are replaced by the sample mean), (ii) joint modeling imputation (JMI, where we use a multivariate normal approximation to generate patient-specific imputations) and (iii) conditional modeling imputation (CMI, where a multivariable imputation model is derived for each predictor from a population). We compared these methods in a case study evaluating the root mean squared error (RMSE) and coverage of the 95% confidence intervals (i.e. the proportion of confidence intervals that contain the true predictor value) of imputed predictor values.
Results: RMSE was lowest when adopting JMI or CMI, although imputation of individual predictors did not always lead to substantial improvements as compared to mean imputation. JMI and CMI appeared particularly useful when the values of multiple predictors of the model were missing. Coverage reached the nominal level (i.e. 95%) for both CMI and JMI.
Conclusion: Multiple imputation using either CMI or JMI is recommended when dealing with missing predictor values in real-time settings.
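To make the contrast between the strategies above concrete, here is a minimal sketch of mean imputation and of a joint-modelling (multivariate normal) imputation for a single new patient, assuming the population mean vector and covariance matrix were stored at model development time. The variable names and numeric values are purely illustrative and are not taken from the study.

```python
# A minimal sketch, assuming the population mean vector `mu` and covariance matrix
# `sigma` of the predictors are available from model development. Hypothetical values.
import numpy as np

def impute_mean(x, mu):
    """Replace each missing predictor (NaN) with its population mean."""
    x = x.copy()
    missing = np.isnan(x)
    x[missing] = mu[missing]
    return x

def impute_mvn(x, mu, sigma):
    """Conditional mean of the missing predictors given the observed ones under a
    multivariate normal approximation (the idea behind JMI)."""
    x = x.copy()
    m, o = np.isnan(x), ~np.isnan(x)
    if m.any():
        # E[X_m | X_o = x_o] = mu_m + Sigma_mo * Sigma_oo^{-1} * (x_o - mu_o)
        x[m] = mu[m] + sigma[np.ix_(m, o)] @ np.linalg.solve(sigma[np.ix_(o, o)], x[o] - mu[o])
    return x

# One new patient with the second predictor missing:
mu = np.array([60.0, 25.0, 120.0])                     # hypothetical population means
sigma = np.array([[100.0, 10.0, 30.0],
                  [10.0, 16.0, 8.0],
                  [30.0, 8.0, 225.0]])                 # hypothetical covariance matrix
x_new = np.array([72.0, np.nan, 135.0])
print(impute_mean(x_new, mu))                          # -> [72., 25., 135.]
print(impute_mvn(x_new, mu, sigma))                    # borrows information from observed values
```

When predictors are correlated, the conditional mean typically lies closer to the true value than the population mean, which mirrors the RMSE advantage of JMI and CMI reported in the Results above.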
Background: Heart failure (HF) is a chronic and common condition with a rising prevalence, especially in the elderly. Morbidity and mortality rates in people with HF are similar to those in people with common forms of cancer. Clinical guidelines highlight the need for more detailed prognostic information to optimise treatment and care planning for people with HF. Despite proven prognostic biomarkers and numerous newly developed prognostic models for HF clinical outcomes, no risk stratification models have been adequately established. Through a number of linked systematic reviews, we aim to assess the quality of the existing models with biomarkers in HF and summarise the evidence they present.
Methods: We will search MEDLINE, EMBASE, Web of Science Core Collection, and the prognostic studies database maintained by the Cochrane Prognosis Methods Group combining sensitive published search filters, with no language restriction, from 1990 onwards. Independent pairs of reviewers will screen and extract data. Eligible studies will be those developing, validating, or updating any prognostic model with biomarkers for clinical outcomes in adults with any type of HF. Data will be extracted using a piloted form that combines published good practice guidelines for critical appraisal, data extraction, and risk of bias assessment of prediction modelling studies. Missing information on predictive performance measures will be sought by contacting authors or estimated from available information when possible. If sufficient high quality and homogeneous data are available, we will meta-analyse the predictive performance of identified models. Sources of between-study heterogeneity will be explored through meta-regression using pre-defined study-level covariates. Results will be reported narratively if study quality is deemed to be low or if the between-study heterogeneity is high. Sensitivity analyses for risk of bias impact will be performed.
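For readers unfamiliar with pooling predictive performance, the sketch below shows one common form such a meta-analysis can take: a DerSimonian-Laird random-effects model for study-level C-statistics on the logit scale. The input values are hypothetical, and this sketch is not the analysis pre-specified in the protocol.

```python
# A minimal sketch of a DerSimonian-Laird random-effects meta-analysis of C-statistics
# on the logit scale, with hypothetical inputs.
import numpy as np

c = np.array([0.72, 0.68, 0.75, 0.70])         # hypothetical C-statistics from four studies
se_c = np.array([0.020, 0.030, 0.025, 0.020])  # hypothetical standard errors

# Work on the logit scale, with delta-method standard errors.
theta = np.log(c / (1 - c))
se = se_c / (c * (1 - c))

# DerSimonian-Laird estimate of the between-study variance tau^2.
w = 1 / se**2
theta_fixed = np.sum(w * theta) / np.sum(w)
q = np.sum(w * (theta - theta_fixed) ** 2)
tau2 = max(0.0, (q - (len(c) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooled estimate, back-transformed to the C-statistic scale.
w_re = 1 / (se**2 + tau2)
theta_pooled = np.sum(w_re * theta) / np.sum(w_re)
print(f"pooled C-statistic: {1 / (1 + np.exp(-theta_pooled)):.3f}, tau^2 = {tau2:.4f}")
```

A meta-regression would extend this by modelling the logit C-statistics as a function of the pre-defined study-level covariates mentioned above.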
Discussion: This project aims to appraise and summarise the methodological conduct and predictive performance of existing clinically homogeneous HF prognostic models in separate systematic reviews.
Registration: PROSPERO registration number CRD42019086990.
Missing data present challenges for development and real-world application of clinical prediction models. While these challenges have received considerable attention in the development setting, there is only sparse research on the handling of missing data in applied settings. The main unique feature of handling missing data in these settings is that missing data methods have to be performed for a single new individual, precluding direct application of mainstay methods used during model development. Correspondingly, we propose that it is desirable to perform model validation using missing data methods that transfer to practice in single new patients. This article compares existing and new methods to account for missing data for a new individual in the context of prediction. These methods are based on (i) submodels based on observed data only, (ii) marginalization over the missing variables, or (iii) imputation based on fully conditional specification (also known as chained equations). They were compared in an internal validation setting to highlight the use of missing data methods that transfer to practice while validating a model. As a reference, they were compared to the use of multiple imputation by chained equations in a set of test patients, because this has been used in validation studies in the past. The methods were evaluated in a simulation study where performance was measured by means of optimism corrected C-statistic and mean squared prediction error. Furthermore, they were applied in data from a large Dutch cohort of prophylactic implantable cardioverter defibrillator patients.
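As a toy illustration of how two of these strategies can differ for a single new patient, the sketch below compares (ii) marginalising the predicted risk over the missing variable with a single conditional-mean imputation (a simplified stand-in for the imputation-based approach). The logistic model, the conditional distribution of the missing predictor, and all numbers are invented for illustration and do not come from the article or the cohort data.

```python
# A minimal sketch for one new patient with a missing predictor, under a hypothetical
# two-predictor logistic prediction model. Every value here is an assumption.
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([-4.0, 0.03, 0.8])            # hypothetical model: intercept, age, biomarker

def risk(age, biomarker):
    """Predicted risk from the hypothetical logistic prediction model."""
    lp = beta[0] + beta[1] * age + beta[2] * biomarker
    return 1 / (1 + np.exp(-lp))

age = 70.0                                    # observed predictor
# Assumed conditional distribution of the missing biomarker given age:
cond_mean, cond_sd = 0.02 * age, 0.5

# (ii) Marginalisation: average the predicted risk over draws of the missing biomarker.
draws = rng.normal(cond_mean, cond_sd, size=10_000)
risk_marginal = risk(age, draws).mean()

# Conditional-mean imputation: plug in a single imputed value, then predict once.
risk_plug_in = risk(age, cond_mean)

print(f"marginalised risk: {risk_marginal:.3f}, plug-in risk: {risk_plug_in:.3f}")
```

Because the risk model is non-linear, the marginalised risk generally differs from the risk obtained by plugging in a single imputed value, which is why the choice of missing-data strategy can change an individual patient's prediction and why these strategies are compared under internal validation.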