Project 3: Exogenous and Genetic Determinants

In Progress



Cancer results from a complex series of mutational changes and cell proliferation that are driven by biological processes in the target organ. The rates of these processes are influenced by the internal environment (the metabolome and the microbiome), which is in turn influenced by the external environment and the subject’s inherited genetic make-up. The goal of this Project is to develop novel statistical methods to describe this process, from a subject’s lifetime external environmental exposure history and genome, through the internal environment, to disease risk. We apply these methods to studies that have exquisitely characterized biomarkers of the internal chemical and microbial environment, including one of identical twins discordant for colorectal polyps, one of colorectal cancer patients followed for clinical outcomes, and one multi-ethnic cohort followed for colorectal cancer incidence.

The remarkable yield of novel genetic associations over the last decade resulting from the agnostic approach of genome-wide association studies (GWAS) has not been matched by comparable advances on the environmental side. The “exposome” concept, introduced by Chris Wild in 2005 as a “comprehensive description of lifelong exposure history,” covers external exposures (e.g., chemical, physical, and biological agents), the general external environment (e.g., climate, urban-rural, socioeconomic position), and internal exposures (e.g., metabolites, gut microflora). It has been operationalized in terms of the measurement of internal chemicals at particular points in time, typically using mass spectrometry to characterize the “metabolome.” With this machinery, Environment-Wide Association Studies (EWAS) are now feasible, but numerous methodological challenges remain before EWAS can be considered a real companion to GWAS. These include the dynamic nature of the external and internal environment, the problem of reverse causation, control of non-genetic host and environmental confounders, measurement error (temporal variability, instrument error, identification of unknown chemicals, etc.), and ways of conducting Gene-Environment-Wide Interaction Studies (GEWIS).

An important component of the exposome is the microbiome. Evidence is mounting linking tumor promotion in a broad array of cancer types to the effects of bacterial microbiota. Local environmental conditions, affected by diet, antibiotics, pre- and probiotics, etc., could alter the structure of microbial communities and thereby affect the risk of disease and response to therapy. The advent of high-throughput sequencing has allowed the relatively inexpensive identification and quantification of thousands of operational taxonomic units (OTUs) in a single biospecimen, providing a wealth of information on the complex structure of resident microbial communities. The microbiome raises many of the same methodological challenges as the exposome, such as time dependency, reverse causation, and non-genetic confounding, but also some distinct ones, such as how to characterize community-level properties like diversity and resilience. Although gene-environment (GxE) and gene-gene (GxG) interactions are also relevant, equally interesting are host-microbial interactions and exposome-microbiome interactions.

We proposed an integrated approach to developing statistical methods for studying two questions: how the internal environment (the metabolome and the microbiome jointly) is determined by the external environment and the host genome, and how the internal environment relates to disease risk. As part of this work, we are developing Bayesian network methods to relate all these variables and investigate mediation. We apply our methods to data from, e.g., the Multi-Ethnic Cohort, the ColoCare Consortium, and a study of colorectal polyps in twins.

Recent progress on these aims includes the following. First, our Partition dimensionality reduction approach has now been published in Bioinformatics. It can be applied to high-dimensional mediators to reduce both the number of candidate mediators and the dependencies among them. To support its use, we have also published a software paper. The software is available from CRAN, where it has been downloaded around 8000 times, and from our GitHub page. Preliminary results from simulations suggest that, under some conditions, this data reduction process can reduce direct effects and thus improve the performance of mediation analysis approaches. Second, we have made progress extending the Causal Inference Test (CIT) to handle multiple mediators and multiple instrumental variables. In simulated data, we have preliminary indications that this approach performs well in comparison to other published methods. Third, we have found evidence that the multi-step approach of screening potential mediation trios by first identifying marginal associations is statistically valid in the context of CIT performance, in that it does not excessively inflate type I error. We are currently working to prove this result.
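The screening step described above, in which candidate mediation trios are kept only if they first show marginal associations, can be illustrated with a minimal sketch. This is not the CIT itself and not the project's software. It assumes Pearson correlation with a Fisher z-test as the marginal test, and the function names (`pearson`, `fisher_z_pvalue`, `screen_trios`), the threshold `alpha`, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def screen_trios(exposure, mediators, outcome, alpha=0.05):
    """Keep mediators marginally associated with both exposure and outcome.

    Only the survivors would be passed on to a formal mediation test.
    """
    n = len(exposure)
    kept = []
    for name, m in mediators.items():
        p_exposure = fisher_z_pvalue(pearson(exposure, m), n)
        p_outcome = fisher_z_pvalue(pearson(m, outcome), n)
        if p_exposure < alpha and p_outcome < alpha:
            kept.append(name)
    return kept


# Simulated data (illustrative only): one true mediator, one pure-noise variable.
rng = random.Random(0)
n = 500
exposure = [rng.gauss(0, 1) for _ in range(n)]
m_true = [0.8 * e + rng.gauss(0, 1) for e in exposure]
m_null = [rng.gauss(0, 1) for _ in range(n)]
outcome = [0.8 * m + rng.gauss(0, 1) for m in m_true]

kept = screen_trios(exposure, {"m_true": m_true, "m_null": m_null}, outcome)
print(kept)
```

Because screening and the downstream test reuse the same data, the question raised in the text is whether this two-stage procedure inflates the type I error of the final test. The sketch shows only the first stage.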

Currently, we are evaluating our Bayesian network approach against other optimal and approximate algorithms. We have developed the software, causnet, as an R package, which is available on our GitHub site. We have generalized the software so that, when several networks tie for the best score, it returns all of them.
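Returning every top-scoring network, rather than a single arbitrary winner, matters because different structures can tie. Markov-equivalent graphs, for example, receive the same score under score-equivalent scoring functions. The sketch below illustrates only the tie-handling idea. It is not the causnet interface: the function `all_best`, the tolerance `tol`, and the toy scores are hypothetical values invented for illustration.

```python
def all_best(scores, tol=1e-9):
    """Return every candidate whose score is within tol of the maximum.

    scores maps a candidate network (here just a label) to its score,
    where a higher score is better.
    """
    if not scores:
        return []
    top = max(scores.values())
    return sorted(name for name, s in scores.items() if top - s <= tol)


# Toy scores (made up): the two single-edge graphs tie, the empty graph loses.
toy_scores = {
    "A -> B": -10.0,
    "B -> A": -10.0,
    "A   B (no edge)": -14.2,
}

best = all_best(toy_scores)
print(best)  # ['A -> B', 'B -> A']
```

A small tolerance is used instead of exact equality because scores computed in floating point can differ in the last digits even when they are mathematically equal.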


Duncan Thomas, PhD

Emeritus Professor of Population and Public Health Sciences
