Abstract
Using administrative patient-care data such as Electronic Health Records (EHR) and medical/pharmaceutical claims for population-based scientific research has become increasingly common. With vast sample sizes leading to very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not diminish with increasing sample size. Of these multiple sources of bias, we focus in this paper on understanding selection bias. We present an analytic framework using directed acyclic graphs for guiding applied researchers to dissect how different sources of selection bias may affect estimates of the association between a binary outcome and an exposure (continuous or categorical) of interest. We consider four easy-to-implement weighting approaches to reduce selection bias with accompanying variance formulae. We demonstrate through a simulation study when these approaches can rescue us in practice with analysis of real-world data. We compare these methods using a data example where our goal is to estimate the well-known association of cancer and biological sex, using EHR from a longitudinal biorepository at the University of Michigan Healthcare system. We provide annotated R code to implement these weighted methods with associated inference.
Keywords: Michigan Genomics Initiative; calibration; directed acyclic graphs; inverse probability weighting; nonprobability sample; poststratification.
© The Royal Statistical Society 2024. All rights reserved.
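To make the idea of weighting against selection bias concrete, the following is a minimal sketch of inverse probability weighting on a 2x2 table. It is not the paper's implementation (the paper provides R code and four weighting approaches); the counts, the selection probabilities, and the function names `odds_ratio` and `ipw_counts` are hypothetical, and the selection probabilities are assumed to be known exactly, which is rarely true in practice.

```python
# Hypothetical sample counts keyed by (exposed, case). These are illustrative
# numbers, not data from the paper.
sample_counts = {(1, 1): 100, (1, 0): 80, (0, 1): 20, (0, 0): 90}

# Assumed (known) probability that a person in each cell is selected into the
# sample. Selection depends on both exposure and outcome, which biases the
# naive odds ratio.
selection_prob = {(1, 1): 0.5, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.1}


def odds_ratio(counts):
    """Odds ratio of the outcome comparing exposed to unexposed."""
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


def ipw_counts(counts, probs):
    """Weight each cell by the inverse of its selection probability."""
    return {cell: n / probs[cell] for cell, n in counts.items()}


naive_or = odds_ratio(sample_counts)
weighted_or = odds_ratio(ipw_counts(sample_counts, selection_prob))

print(f"naive OR:    {naive_or:.3f}")
print(f"weighted OR: {weighted_or:.3f}")
```

In this toy setup the naive odds ratio is 5.625, while reweighting recovers 2.25, the odds ratio of the hypothetical source population implied by the counts and selection probabilities. With estimated rather than known selection probabilities, the variance of the weighted estimate needs separate treatment, which is what the accompanying variance formulae in the paper address.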
Association between Accelerated Biological Aging, Diet, and Gut Microbiome
Abstract
Factors driving accelerated biological age (BA), an important predictor of chronic diseases, remain poorly understood. This study focuses on the impact of diet and gut microbiome on accelerated BA. Accelerated Klemera-Doubal biological age (KDM-BA) was estimated as the difference between KDM-BA and chronological age. We assessed the cross-sectional association between accelerated KDM-BA and diet/gut microbiome in 117 adult participants from the 10,000 Families Study. 16S rRNA sequencing was used to estimate the abundances of gut bacterial genera. Multivariable linear mixed models evaluated the associations between accelerated KDM-BA and diet/gut microbiome after adjusting for family relatedness, diet, age, sex, smoking status, alcohol intake, and BMI. One standard deviation (SD) increase in processed meat was associated with a 1.91-year increase in accelerated KDM-BA (p = 0.04), while one SD increase in fiber intake was associated with a 0.70-year decrease in accelerated KDM-BA (p = 0.01). Accelerated KDM-BA was positively associated with Streptococcus and negatively associated with Subdoligranulum, unclassified Bacteroidetes, and Burkholderiales. Adjustment for gut microbiome did not change the association between dietary fiber and accelerated KDM-BA, but the association with processed meat intake became nonsignificant. These cross-sectional associations between higher meat intake, lower fiber intake, and accelerated BA need validation in longitudinal studies.
Keywords: accelerated aging; diet; gut microbiome.
To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice
Abstract
Objectives: To develop recommendations regarding the use of weights to reduce selection bias for commonly performed analyses using electronic health record (EHR)-linked biobank data. Materials and methods: We mapped diagnosis (ICD code) data to standardized phecodes from 3 EHR-linked biobanks with varying recruitment strategies: All of Us (AOU; n = 244 071), Michigan Genomics Initiative (MGI; n = 81 243), and UK Biobank (UKB; n = 401 167). Using 2019 National Health Interview Survey data, we constructed selection weights for AOU and MGI to make them more representative of the US adult population. We used weights previously developed for UKB to represent the UKB-eligible population. We conducted 4 common analyses comparing unweighted and weighted results. Results: For AOU and MGI, estimated phecode prevalences decreased after weighting (weighted-unweighted median phecode prevalence ratio [MPR]: 0.82 and 0.61), while UKB estimates increased (MPR: 1.06). Weighting minimally impacted latent phenome dimensionality estimation. Comparing weighted versus unweighted phenome-wide association study for colorectal cancer, the strongest associations remained unaltered, with considerable overlap in significant hits. Weighting affected the estimated log-odds ratio for sex and colorectal cancer to align more closely with national registry-based estimates. Discussion: Weighting had a limited impact on dimensionality estimation and large-scale hypothesis testing but impacted prevalence and association estimation. When interested in estimating effect size, specific signals from untargeted association analyses should be followed up by weighted analysis. Conclusion: EHR-linked biobanks should report recruitment and selection mechanisms and provide selection weights with defined target populations. Researchers should consider their intended estimands, specify source and target populations, and weight EHR-linked biobank analyses accordingly.
Keywords: ICD codes; biobank; electronic health records; phenome; selection bias.
© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
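The weighted-unweighted median phecode prevalence ratio (MPR) reported above can be illustrated with a short sketch. The reading used here (per-phecode ratio of weighted to unweighted prevalence, then the median across phecodes) is an assumption based on the abstract's wording; the phecode labels, indicators, weights, and function names are all hypothetical.

```python
from statistics import median


def prevalence(indicators, weights=None):
    """(Weighted) proportion of participants with the diagnosis."""
    if weights is None:
        weights = [1.0] * len(indicators)
    return sum(w * y for w, y in zip(weights, indicators)) / sum(weights)


def median_prevalence_ratio(phenotypes, weights):
    """Median over phecodes of weighted prevalence / unweighted prevalence."""
    ratios = []
    for indicators in phenotypes.values():
        unweighted = prevalence(indicators)
        if unweighted == 0:
            continue  # ratio undefined when nobody in the sample has the code
        ratios.append(prevalence(indicators, weights) / unweighted)
    return median(ratios)


# Hypothetical selection weights for four participants.
weights = [1.0, 1.0, 3.0, 3.0]

# Hypothetical 0/1 diagnosis indicators per phecode (labels are made up).
phenotypes = {
    "phecode_A": [1, 1, 0, 0],  # unweighted 0.50, weighted 0.25
    "phecode_B": [0, 0, 1, 1],  # unweighted 0.50, weighted 0.75
    "phecode_C": [1, 0, 1, 0],  # unweighted 0.50, weighted 0.50
}

mpr = median_prevalence_ratio(phenotypes, weights)
print(f"MPR: {mpr:.2f}")
```

An MPR below 1, as reported for AOU and MGI, would indicate that diagnoses are typically more common in the unweighted sample than in the weighted target population.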
Harmonized US National Health and Nutrition Examination Survey 1988-2018 for high throughput exposome-health discovery
Abstract
The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential to understand how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as prevalence of disease. However, these data need to first be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross examination and considerable effort but is required for accurately and reproducibly characterizing the associations between the exposome and diseases. Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, along with R markdown files showing example code for calculating summary statistics and running regression models, to help accelerate high-throughput analysis of the exposome and its secular trends.
A nested case-control study of untargeted plasma metabolomics and lung cancer among never-smoking women within the prospective Shanghai Women’s Health Study
Abstract
The etiology of lung cancer in never-smokers remains elusive, despite 15% of lung cancer cases in men and 53% in women worldwide being unrelated to smoking. Here, we aimed to enhance our understanding of lung cancer pathogenesis among never-smokers using untargeted metabolomics. This nested case-control study included 395 never-smoking women who developed lung cancer and 395 matched never-smoking cancer-free women from the prospective Shanghai Women’s Health Study with 15,353 metabolic features quantified in pre-diagnostic plasma using liquid chromatography high-resolution mass spectrometry. Recognizing that metabolites often correlate and seldom act independently in biological processes, we utilized a weighted correlation network analysis to agnostically construct 28 network modules of correlated metabolites. Using conditional logistic regression models, we assessed the associations for both metabolic network modules and individual metabolic features with lung cancer, accounting for multiple testing using a false discovery rate (FDR) < 0.20. We identified a network module of 121 features inversely associated with all lung cancer (p = .001, FDR = 0.028) and lung adenocarcinoma (p = .002, FDR = 0.056), where lyso-glycerophospholipids played a key role driving these associations. Another module of 440 features was inversely associated with lung adenocarcinoma (p = .014, FDR = 0.196). Individual metabolites within these network modules were enriched in biological pathways linked to oxidative stress and energy metabolism. These pathways have been implicated in previous metabolomics studies involving populations exposed to known lung cancer risk factors such as traffic-related air pollution and polycyclic aromatic hydrocarbons. Our results suggest that untargeted plasma metabolomics could provide novel insights into the etiology and risk factors of lung cancer among never-smokers.
Keywords: lung cancer; metabolomics; network analysis; never-smokers; oxidative stress.
© 2024 UICC. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
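The FDR values in the abstract above can be related to a standard multiple-testing adjustment. The abstract does not name the procedure; the sketch below assumes the Benjamini-Hochberg step-up adjustment, which happens to be consistent with the reported pair p = .001, FDR = 0.028 over 28 modules (0.001 × 28 / 1 = 0.028). The p-values in the example and the function name `benjamini_hochberg` are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four tests (not values from the study).
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)

# Keep tests passing the FDR < 0.20 threshold used in the abstract.
selected = [i for i, q in enumerate(q_values) if q < 0.20]
print(q_values, selected)
```

A threshold of 0.20 is more permissive than the conventional 0.05 and is a common choice in discovery-oriented untargeted analyses, where follow-up validation is expected.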
Comparative impact assessment of COVID-19 policy interventions in five South Asian countries using reported and estimated unreported death counts during 2020-2021
Abstract
There has been raging discussion and debate around the quality of COVID death data in South Asia. According to WHO, of the 5.5 million reported COVID-19 deaths from 2020-2021, 0.57 million (10%) were contributed by five low- and middle-income countries (LMICs) in the Global South: India, Pakistan, Bangladesh, Sri Lanka and Nepal. However, a number of excess death estimates show that the actual death toll from COVID-19 is significantly higher than the reported number of deaths. For example, the IHME and WHO both project around 14.9 million total deaths, of which 4.5-5.5 million were attributed to these five countries in 2020-2021. We examine the COVID-19 performance of these five countries, home to 23.5% of the world population, in 2020 and 2021 through a counterfactual lens and ask to what extent the mortality of one LMIC would have been affected had it adopted the pandemic policies of another, similar country. We use a Bayesian semi-mechanistic model developed by Mishra et al. (2021) to compare both the reported and estimated total death tolls by permuting the time-varying reproduction number (Rt) across these countries over a similar time period. Our analysis shows that, in the first half of 2021, mortality in India in terms of reported deaths could have been reduced to 96 and 102 deaths per million compared to the actual 170 reported deaths per million had it adopted the policies of Nepal and Pakistan respectively. In terms of total deaths, India could have averted 481 and 466 deaths per million had it adopted the policies of Bangladesh and Pakistan. On the other hand, India had a lower number of reported COVID-19 deaths per million (48 deaths per million) and a lower estimated total deaths per million (80 deaths per million) in the second half of 2021, and LMICs other than Pakistan would have lower reported mortality had they followed India’s strategy.
The gap between the reported and estimated total deaths highlights the varying level and extent of under-reporting of deaths across the subcontinent, and that model estimates are contingent on accuracy of the death data. Our analysis shows the importance of timely public health intervention and vaccines for lowering mortality and the need for better coverage infrastructure for the death registration system in LMICs.
Copyright: © 2023 Kundu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Methods for mediation analysis with high-dimensional DNA methylation data: Possible choices and comparisons
Abstract
Epigenetic researchers often evaluate DNA methylation as a potential mediator of the effect of social/environmental exposures on a health outcome. Modern statistical methods for jointly evaluating many mediators have not been widely adopted. We compare seven methods for high-dimensional mediation analysis with continuous outcomes through both diverse simulations and analysis of DNAm data from a large multi-ethnic cohort in the United States, while providing an R package for their seamless implementation and adoption. Among the considered choices, the best-performing methods for detecting active mediators in simulations are the Bayesian sparse linear mixed model (BSLMM) and high-dimensional mediation analysis (HDMA), while the preferred methods for estimating the global mediation effect are high-dimensional linear mediation analysis (HILMA) and principal component mediation analysis (PCMA). We provide guidelines for epigenetic researchers on choosing the best method in practice and offer suggestions for future methodological development.
Copyright: © 2023 Clark-Boucher et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Accuracy and Reliability of Chatbot Responses to Physician Questions
Abstract
Importance: Natural language processing tools, such as ChatGPT (generative pretrained transformer, hereafter referred to as chatbot), have the potential to radically enhance the accessibility of medical information for health professionals and patients. Assessing the safety and efficacy of these tools in answering physician-generated questions is critical to determining their suitability in clinical settings, facilitating complex decision-making, and optimizing health care efficiency. Objective: To assess the accuracy and comprehensiveness of chatbot-generated responses to physician-developed medical queries, highlighting the reliability and limitations of artificial intelligence-generated medical information. Design, setting, and participants: Thirty-three physicians across 17 specialties generated 284 medical questions that they subjectively classified as easy, medium, or hard with either binary (yes or no) or descriptive answers. The physicians then graded the chatbot-generated answers to these questions for accuracy (6-point Likert scale with 1 being completely incorrect and 6 being completely correct) and completeness (3-point Likert scale, with 1 being incomplete and 3 being complete plus additional context). Scores were summarized with descriptive statistics and compared using the Mann-Whitney U test or the Kruskal-Wallis test. The study (including data analysis) was conducted from January to May 2023. Main outcomes and measures: Accuracy, completeness, and consistency over time and between 2 different versions (GPT-3.5 and GPT-4) of chatbot-generated medical responses. Results: Across all questions (n = 284) generated by 33 physicians (31 faculty members and 2 recent graduates from residency or fellowship programs) across 17 specialties, the median accuracy score was 5.5 (IQR, 4.0-6.0) (between almost completely and completely correct) with a mean (SD) score of 4.8 (1.6) (between mostly and almost completely correct).
The median completeness score was 3.0 (IQR, 2.0-3.0) (complete and comprehensive) with a mean (SD) score of 2.5 (0.7). For questions rated easy, medium, and hard, the median accuracy scores were 6.0 (IQR, 5.0-6.0), 5.5 (IQR, 5.0-6.0), and 5.0 (IQR, 4.0-6.0), respectively (mean [SD] scores were 5.0 [1.5], 4.7 [1.7], and 4.6 [1.6], respectively; P = .05). Accuracy scores for binary and descriptive questions were similar (median score, 6.0 [IQR, 4.0-6.0] vs 5.0 [IQR, 3.4-6.0]; mean [SD] score, 4.9 [1.6] vs 4.7 [1.6]; P = .07). Of 36 questions with scores of 1.0 to 2.0, 34 were requeried or regraded 8 to 17 days later with substantial improvement (median score 2.0 [IQR, 1.0-3.0] vs 4.0 [IQR, 2.0-5.3]; P < .01). A subset of questions, regardless of initial scores (version 3.5), were regenerated and rescored using version 4 with improvement (mean accuracy [SD] score, 5.2 [1.5] vs 5.7 [0.8]; median score, 6.0 [IQR, 5.0-6.0] for original and 6.0 [IQR, 6.0-6.0] for rescored; P = .002). Conclusions and relevance: In this cross-sectional study, chatbot generated largely accurate information to diverse medical queries as judged by academic physician specialists with improvement over time, although it had important limitations. Further research and model development are needed to correct inaccuracies and for validation.
Exploratory profiles of phenols, parabens, and per- and polyfluoroalkyl substances among NHANES study participants in association with previous cancer diagnoses
Abstract
Background: Some hormonally active cancers have low survival rates, but a large proportion of their incidence remains unexplained. Endocrine disrupting chemicals may affect hormone pathways in the pathology of these cancers. Objective: To evaluate cross-sectional associations between per- and polyfluoroalkyl substances (PFAS), phenols, and parabens and self-reported previous cancer diagnoses in the National Health and Nutrition Examination Survey (NHANES). Methods: We extracted concentrations of 7 PFAS and 12 phenols/parabens and self-reported diagnoses of melanoma and cancers of the thyroid, breast, ovary, uterus, and prostate in men and women (≥20 years). Associations between previous cancer diagnoses and an interquartile range increase in exposure biomarkers were evaluated using logistic regression models adjusted for key covariates. We conceptualized race as a social construct proxy of structural social factors and examined associations in non-Hispanic Black, Mexican American, and other Hispanic participants separately compared to White participants. Results: Previous melanoma in women was associated with higher PFDE (OR: 2.07, 95% CI: 1.25, 3.43), PFNA (OR: 1.72, 95% CI: 1.09, 2.73), PFUA (OR: 1.76, 95% CI: 1.07, 2.89), BP3 (OR: 1.81, 95% CI: 1.10, 2.96), DCP25 (OR: 2.41, 95% CI: 1.22, 4.76), and DCP24 (OR: 1.85, 95% CI: 1.05, 3.26). Previous ovarian cancer was associated with higher DCP25 (OR: 2.80, 95% CI: 1.08, 7.27), BPA (OR: 1.93, 95% CI: 1.11, 3.35) and BP3 (OR: 1.76, 95% CI: 1.00, 3.09). Previous uterine cancer was associated with increased PFNA (OR: 1.55, 95% CI: 1.03, 2.34), while higher ethyl paraben was inversely associated (OR: 0.31, 95% CI: 0.12, 0.85). Various PFAS were associated with previous ovarian and uterine cancers in White women, while MPAH or BPF was associated with previous breast cancer among non-White women.
Impact statement: Biomarkers across all exposure categories (phenols, parabens, and per- and polyfluoroalkyl substances) were cross-sectionally associated with increased odds of previous melanoma diagnoses in women, and increased odds of previous ovarian cancer were associated with several phenols and parabens. Some associations differed by racial group, which is particularly impactful given the established racial disparities in distributions of exposure to these chemicals. This is the first epidemiological study to investigate exposure to phenols in relation to previous cancer diagnoses, and the first NHANES study to explore racial/ethnic disparities in associations between environmental phenol, paraben, and PFAS exposures and historical cancer diagnosis.
© 2023. The Author(s).
A synthetic data integration framework to leverage external summary-level information from heterogeneous populations
Abstract
There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. External information relevant for a risk prediction model may come in multiple forms, through regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors and the algorithm they used to predict the outcome Y given these predictors may or may not be known. The underlying populations corresponding to each external model may be different from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem where novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology, where the goal is to fit a target regression model with all available predictors in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population and uses stacked multiple imputation to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression. This flexible and unified approach can improve statistical efficiency of the estimated coefficients in the internal study, improve predictions by utilizing even partial information available from models that use a subset of the full set of covariates used in the internal study, and provide statistical inference for the external population with potentially different covariate effects from the internal population.
Keywords: data integration; prediction models; stacked multiple imputation; synthetic data.
© 2023 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society.
Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons
Du J, Zhou X, Clark-Boucher D, Hao W, Liu Y, Smith JA, Mukherjee B. Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons. Genet Epidemiol. 2023 Mar;47(2):167-184. doi: 10.1002/gepi.22510. Epub 2022 Dec 8. PMID: 36465006.
Abstract
Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, $H_0: \alpha\beta = 0$ ($\alpha$: effect of the exposure on the mediator after adjusting for confounders; $\beta$: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale one-at-a-time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure-mediator interaction so that the product $\alpha\beta$ has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) $\alpha = 0, \beta \neq 0$; (2) $\alpha \neq 0, \beta = 0$; and (3) $\alpha = \beta = 0$. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three p values obtained under each case of the null so that the reference distribution of the composite statistic is approximately $U(0, 1)$. In addition to these existing methods, we developed the Sobel-comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel’s test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package medScan, available on CRAN, for implementing all six methods.
Keywords: agnostic mediation analysis; composite null hypothesis; indirect effect; mediation effect; multiple hypothesis testing.
© 2022 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC.
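As a point of reference for the abstract above, the following is a minimal sketch of the conventional Sobel test for a single mediator, using the first-order standard error of the product of the two coefficients and a standard normal reference distribution. It is not the medScan implementation and not the Sobel-comp method, which replaces the normal reference with a corrected mixture; the function name `sobel_test` and the input values are hypothetical.

```python
import math


def sobel_test(alpha_hat, se_alpha, beta_hat, se_beta):
    """Conventional Sobel z statistic and two-sided normal p-value.

    alpha_hat: estimated exposure -> mediator effect
    beta_hat:  estimated mediator -> outcome effect (adjusted for exposure)
    """
    # First-order (delta method) standard error of the product alpha * beta.
    # This is zero, and the statistic undefined, if both estimates are zero.
    se_product = math.sqrt(
        alpha_hat**2 * se_beta**2 + beta_hat**2 * se_alpha**2
    )
    z = alpha_hat * beta_hat / se_product
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical coefficient estimates and standard errors.
z, p = sobel_test(alpha_hat=0.5, se_alpha=0.1, beta_hat=0.4, se_beta=0.1)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The normal reference is only appropriate when exactly one of the two effects is zero; when both are zero the statistic is more concentrated around zero, so the test is conservative, which is the kind of issue the composite null structure described in the abstract raises.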
Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis
Clark-Boucher D, Boss J, Salvatore M, Smith JA, Fritsche LG, Mukherjee B. Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis. PLoS One. 2022 Jul 25;17(7):e0269017. doi: 10.1371/journal.pone.0269017. PMID: 35877617; PMCID: PMC9312965.
Abstract
Since the beginning of the Coronavirus Disease 2019 (COVID-19) pandemic, a focus of research has been to identify risk factors associated with COVID-19-related outcomes, such as testing and diagnosis, and use them to build prediction models. Existing studies have used data from digital surveys or electronic health records (EHRs), but very few have linked the two sources to build joint predictive models. In this study, we used survey data on 7,054 patients from the Michigan Genomics Initiative biorepository to evaluate how well self-reported data could be integrated with electronic records for the purpose of modeling COVID-19-related outcomes. We observed that among survey respondents, self-reported COVID-19 diagnosis captured a larger number of cases than the corresponding EHRs, suggesting that self-reported outcomes may be better than EHRs for distinguishing COVID-19 cases from controls. In the modeling context, we compared the utility of survey- and EHR-derived predictor variables in models of survey-reported COVID-19 testing and diagnosis. We found that survey-derived predictors produced uniformly stronger models than EHR-derived predictors (likely due to their specificity, temporal proximity, and breadth) and that combining predictors from both sources offered no consistent improvement compared to using survey-based predictors alone. Our results suggest that, even though general EHRs are useful in predictive models of COVID-19 outcomes, they may not be essential in those models when rich survey data are already available. The two data sources together may offer better prediction for COVID severity, but we did not have enough severe cases among the survey respondents to assess that hypothesis in our study.
Identification of occupations susceptible to high exposure and risk associated with multiple toxicants in an observational study: National Health and Nutrition Examination Survey 1999-2014
Nguyen VK, Colacino J, Patel CJ, Sartor M, Jolliet O. Identification of occupations susceptible to high exposure and risk associated with multiple toxicants in an observational study: National Health and Nutrition Examination Survey 1999-2014. Exposome. 2022 Jun 25;2(1):osac004. doi: 10.1093/exposome/osac004. PMID: 35832257; PMCID: PMC9266352.
Abstract
Occupational exposures to toxicants are estimated to cause over 370 000 premature deaths annually. The risks due to multiple workplace chemical exposures and those occupations most susceptible to the resulting health effects remain poorly characterized. The aim of this study is to identify occupations with elevated toxicant biomarker concentrations and increased health risk associated with toxicant exposures in a diverse working US population. For this observational study of 51 008 participants, we used data from the 1999-2014 National Health and Nutrition Examination Survey. We characterized differences in chemical exposures by occupational group for 131 chemicals by applying a series of generalized linear models with the outcome as biomarker concentrations and the main predictor as the occupational groups, adjusting for age, sex, race/ethnicity, poverty income ratio, study period, and biomarker of tobacco use. For each occupational group, we calculated percentages of participants with chemical biomarker levels exceeding acceptable health-based guidelines. Blue-collar workers from “Construction,” “Professional, Scientific, Technical Services,” “Real Estate, Rental, Leasing,” “Manufacturing,” and “Wholesale Trade” have higher biomarker levels of toxicants such as several heavy metals, acrylamide, glycideamide, and several volatile organic compounds (VOCs) compared with their white-collar counterparts. Moreover, blue-collar workers from these industries have toxicant concentrations exceeding acceptable levels: arsenic (16%-58%), lead (1%-3%), cadmium (1%-11%), glycideamide (3%-6%), and VOCs (1%-33%). Blue-collar workers have higher toxicant levels relative to their white-collar counterparts, often exceeding acceptable levels associated with noncancer effects. Our findings identify multiple occupations to prioritize for targeted interventions and health policies to monitor and reduce toxicant exposures.
Keywords: biomonitoring equivalents; environmental chemicals; occupational epidemiology; occupational exposures; risk assessment; unsupervised learning.
© The Author(s) 2022. Published by Oxford University Press.
Operationalizing the Exposome Using Passive Silicone Samplers
Fuentes ZC, Schwartz YL, Robuck AR, Walker DI. Operationalizing the Exposome Using Passive Silicone Samplers. Curr Pollut Rep. 2022;8(1):1-29. doi: 10.1007/s40726-021-00211-6. Epub 2022 Jan 4. PMID: 35004129; PMCID: PMC8724229.
Abstract
The exposome, which is defined as the cumulative effect of environmental exposures and corresponding biological responses, aims to provide a comprehensive measure for evaluating non-genetic causes of disease. Operationalization of the exposome for environmental health and precision medicine has been limited by the lack of a universal approach for characterizing complex exposures, particularly as they vary temporally and geographically. To overcome these challenges, passive sampling devices (PSDs) provide a key measurement strategy for deep exposome phenotyping, which aims to provide comprehensive chemical assessment using untargeted high-resolution mass spectrometry for exposome-wide association studies. To highlight the advantages of silicone PSDs, we review their use in population studies and evaluate the broad range of applications and chemical classes characterized using these samplers. We assess key aspects of incorporating PSDs within observational studies, including the need to preclean samplers prior to use to remove impurities that interfere with compound detection, analytical considerations, and cost. We close with strategies on how to incorporate measures of the external exposome using PSDs, and their advantages for reducing variability in exposure measures and providing a more thorough accounting of the exposome. Continued development and application of silicone PSDs will facilitate greater understanding of how environmental exposures drive disease risk, while providing a feasible strategy for incorporating untargeted, high-resolution characterization of the external exposome in human studies.
Keywords: Exposome; Exposure assessment; High-resolution mass spectrometry; Precision medicine; Silicone wristband samplers.
© The Author(s) 2021.