A synthetic data integration framework to leverage external summary-level information from heterogeneous populations

Link to the publication

Gu T, Taylor JMG, Mukherjee B. A synthetic data integration framework to leverage external summary-level information from heterogeneous populations. Biometrics. 2023 Mar 6. doi: 10.1111/biom.13852. Epub ahead of print. PMID: 36876883.

Abstract

There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. External information relevant for a risk prediction model may come in multiple forms, such as regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors, and the algorithm they used to predict the outcome Y given these predictors may or may not be known. The underlying populations corresponding to each external model may be different from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem where novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology, where the goal is to fit a target regression model with all available predictors in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population and uses stacked multiple imputation to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression. This flexible and unified approach can improve statistical efficiency of the estimated coefficients in the internal study, improve predictions by utilizing even partial information available from models that use a subset of the full set of covariates used in the internal study, and provide statistical inference for the external population with potentially different covariate effects from the internal population.

 

Keywords: data integration; prediction models; stacked multiple imputation; synthetic data.
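The generate, impute, stack and regress pipeline described in the abstract can be illustrated with a minimal numerical sketch. The simplifying assumptions here are ours, not the paper's: a continuous outcome with linear models; one external model that reports only the intercept and slope of Y ~ X plus its residual SD; external covariates approximated by resampling the internal X; a single internal-only biomarker B imputed from a normal linear model B ~ X + Y fitted on the internal data (no parameter draws, so this is not fully proper multiple imputation); and a plain weight of 1/M on each stacked synthetic copy. The function `stacked_synthetic_fit` and its parameters are hypothetical. The sketch does not reproduce the authors' weighting scheme or their handling of heterogeneous covariate effects across populations.

```python
import numpy as np


def wls(design, y, w=None):
    """Weighted least squares via the normal equations (unit weights if w is None)."""
    if w is None:
        w = np.ones(len(y))
    dw = design * w[:, None]
    return np.linalg.solve(design.T @ dw, dw.T @ y)


def with_intercept(*cols):
    """Column-stack an intercept with the given 1-D covariate arrays."""
    return np.column_stack([np.ones(len(cols[0])), *cols])


def stacked_synthetic_fit(x, b, y, ext_coef, ext_sigma, n_syn=1000, m=5, seed=0):
    """Hypothetical sketch: fit Y ~ X + B using internal data plus synthetic
    external data generated from a reported external model Y ~ X."""
    rng = np.random.default_rng(seed)

    # 1. Synthetic external sample: resample internal X (an assumption) and draw
    #    Y* from the reported external model Y = g0 + g1 * X + N(0, ext_sigma^2).
    x_syn = rng.choice(x, size=n_syn, replace=True)
    y_syn = ext_coef[0] + ext_coef[1] * x_syn + rng.normal(0.0, ext_sigma, n_syn)

    # 2. Imputation model for the internal-only biomarker, fitted on internal data.
    d_int = with_intercept(x, y)
    alpha = wls(d_int, b)
    resid_sd = np.std(b - d_int @ alpha, ddof=d_int.shape[1])

    # 3. M stochastic imputations of B in the synthetic sample, stacked under the
    #    internal rows; each synthetic copy gets weight 1/M.
    d_syn = with_intercept(x_syn, y_syn)
    xs, bs, ys, ws = [x], [b], [y], [np.ones(len(y))]
    for _ in range(m):
        xs.append(x_syn)
        bs.append(d_syn @ alpha + rng.normal(0.0, resid_sd, n_syn))
        ys.append(y_syn)
        ws.append(np.full(n_syn, 1.0 / m))

    # 4. Weighted regression of the target model Y ~ X + B on the long dataset.
    design = with_intercept(np.concatenate(xs), np.concatenate(bs))
    return wls(design, np.concatenate(ys), np.concatenate(ws))


# Simulated internal study: B = 0.3 X + u, Y = 1 + 0.5 X + 0.8 B + e.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
b = 0.3 * x + rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.8 * b + rng.normal(size=n)

# Hypothetical external summary for Y ~ X only, matching the marginal model
# implied above: intercept 1.0, slope 0.5 + 0.8 * 0.3 = 0.74, SD sqrt(1.64).
coef = stacked_synthetic_fit(x, b, y, ext_coef=(1.0, 0.74),
                             ext_sigma=np.sqrt(1.64), n_syn=2000, m=5)
print(coef)  # intercept, X, B
```

Because the simulated external coefficients are exactly those implied by marginalizing the internal data-generating model over B, the fitted target coefficients should land near the true values (1.0, 0.5, 0.8). With real external models, the choice of covariate distribution for the synthetic sample and the relative weights of internal and synthetic rows matter far more than in this toy setting.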

Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons

Link to the publication

Du J, Zhou X, Clark-Boucher D, Hao W, Liu Y, Smith JA, Mukherjee B. Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons. Genet Epidemiol. 2023 Mar;47(2):167-184. doi: 10.1002/gepi.22510. Epub 2022 Dec 8. PMID: 36465006.

Abstract

Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, $H_0: \alpha\beta = 0$ ($\alpha$: effect of the exposure on the mediator after adjusting for confounders; $\beta$: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale one-at-a-time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators, assuming there is no exposure-mediator interaction so that the product $\alpha\beta$ has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) $\alpha = 0, \beta \neq 0$; (2) $\alpha \neq 0, \beta = 0$; and (3) $\alpha = \beta = 0$. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three p values obtained under each case of the null so that the reference distribution of the composite statistic is approximately $U(0, 1)$. In addition to these existing methods, we developed the Sobel-comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods, which uses a mixture reference distribution, best maintained the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and developed an R package, medScan, available on CRAN for implementing all six methods.

Keywords: agnostic mediation analysis; composite null hypothesis; indirect effect; mediation effect; multiple hypothesis testing.

Operationalizing the Exposome Using Passive Silicone Samplers

Link to the publication

Fuentes ZC, Schwartz YL, Robuck AR, Walker DI. Operationalizing the Exposome Using Passive Silicone Samplers. Curr Pollut Rep. 2022;8(1):1-29. doi: 10.1007/s40726-021-00211-6. Epub 2022 Jan 4. PMID: 35004129; PMCID: PMC8724229.

Abstract

The exposome, which is defined as the cumulative effect of environmental exposures and corresponding biological responses, aims to provide a comprehensive measure for evaluating non-genetic causes of disease. Operationalization of the exposome for environmental health and precision medicine has been limited by the lack of a universal approach for characterizing complex exposures, particularly as they vary temporally and geographically. To overcome these challenges, passive sampling devices (PSDs) provide a key measurement strategy for deep exposome phenotyping, which aims to provide comprehensive chemical assessment using untargeted high-resolution mass spectrometry for exposome-wide association studies. To highlight the advantages of silicone PSDs, we review their use in population studies and evaluate the broad range of applications and chemical classes characterized using these samplers. We assess key aspects of incorporating PSDs within observational studies, including the need to preclean samplers prior to use to remove impurities that interfere with compound detection, analytical considerations, and cost. We close with strategies on how to incorporate measures of the external exposome using PSDs, and their advantages for reducing variability in exposure measures and providing a more thorough accounting of the exposome. Continued development and application of silicone PSDs will facilitate greater understanding of how environmental exposures drive disease risk, while providing a feasible strategy for incorporating untargeted, high-resolution characterization of the external exposome in human studies.

 

Keywords: Exposome; Exposure assessment; High-resolution mass spectrometry; Precision medicine; Silicone wristband samplers.