Project 4: Modeling Tumor Growth

In Progress



Despite decades of research, cancer remains a significant health problem. A fundamental knowledge gap, too often overlooked, is that we know few details of the actual patterns of tumor growth. We use “molecular phylogeny” to study a cancer’s natural history, looking back in time to track the genomes of single tumor cells as they divide and move through time and space. This lets us characterize early cell behavior that might predict cancer aggressiveness and assist clinical decision-making.

The overall goal of this project is to develop computational methods for studying tumor growth, and to relate growth parameters to patient characteristics and prognosis. We hypothesize that tumor growth parameters will allow us to define cancer phenotypes that help resolve cancer heterogeneity, and thereby improve power in analyses that try to link germline genetic variation and internal/external environment to phenotypic variation (Projects 1 & 3).

Our models leverage multiregional sampling of the tumor and data on whole exome sequencing, DNA methylation, and DNA copy number.

An example of our work is HiLDA, a hierarchical latent Dirichlet allocation model for characterizing somatic mutation data in cancer. The method allows us to infer mutational patterns and their relative frequencies in a set of tumor mutational catalogs, and to compare the estimated frequencies between tumor sets. The paper can be found here. In it we present the method and then apply it to two datasets: one containing somatic mutations in colon cancer grouped by time of occurrence (before or after tumor initiation), and one containing somatic mutations in esophageal cancer grouped by sex, age, smoking status, and tumor site. In colon cancer, the relative frequencies of mutational patterns were significantly associated with the time of occurrence of mutations. In esophageal cancer, the relative frequencies were significantly associated with the tumor site. Our novel method provides higher statistical power for detecting differences in mutational signatures. Software is available at Bioconductor or our GitHub repository.

In related work, we report a novel mutational signature analysis in colon cancer using publicly available data from The Cancer Genome Atlas (TCGA). We identified four mutational signatures in MSI cancers that were replicated in a publicly available Chinese colon cancer data set. We also used HiLDA to test for a difference in mutational signature catalogs based on the time of mutation occurrence, stratifying the tumors into groups based on whether they had high microsatellite instability (MSI-H) or not. We found that the mutational signature burdens varied between trunk and branch mutations in tumors that were not MSI-H, but not in MSI-H tumors.

Both of these projects were supported by our development of iMutSig, an R Shiny web application that lets investigators compare mutational signatures from two online databases and identify the known signature most similar to a novel user-entered signature. This paper can be found here.
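To make the idea of finding the "most similar" signature concrete, here is a minimal Python sketch. It assumes cosine similarity as the comparison measure, which is a common choice for mutational signatures but is an assumption here rather than a description of how iMutSig is implemented. The function names, the signature names `SigA` and `SigB`, and the six-channel toy vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def most_similar(query, reference):
    """Return (name, score) of the reference signature closest to the query.

    `reference` maps a signature name to its probability vector.
    """
    scores = {name: cosine_similarity(query, vec)
              for name, vec in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy, hypothetical six-channel signatures for illustration only.
reference = {
    "SigA": [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
    "SigB": [0.05, 0.05, 0.10, 0.10, 0.60, 0.10],
}
query = [0.65, 0.15, 0.05, 0.05, 0.05, 0.05]

best_name, best_score = most_similar(query, reference)
print(best_name, round(best_score, 3))
```

In this toy example the query is closest to `SigA`. A real comparison would use full-length signature vectors from the databases in question, with both vectors defined over the same mutation channels.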

We have a further focus on detecting regions of the genome that are conserved in terms of their epigenetic states. Different epigenetic configurations allow one genome to develop into multiple cell types. Although the rules governing what epigenetic features confer gene expression are increasingly being understood, much remains uncertain. We have developed methodology, which we also implemented in a novel software package, MethCon5, to explore whether the principle of biologic conservation can be used to identify expressed genes. The hypothesis is that epigenetic configurations of important expressed genes will be conserved within a tissue. We compared the DNA methylation of approximately 850,000 CpG sites between multiple clonal crypts or glands of human colon, small intestine, and endometrium. We were able to show that DNA methylation is preferentially conserved at gene-associated CpG sites, particularly in gene promoters (e.g., near the transcription start site) or the first exon. Furthermore, higher conservation correlated well with gene expression levels and performed better than promoter DNA methylation levels. Most conserved genes are in canonical housekeeping pathways. Full details can be found in our publication.
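The conservation idea can be illustrated with a small Python sketch. This is not MethCon5's algorithm. It is a simplified stand-in that assumes conservation can be summarized as low between-sample variability in methylation, measured per CpG site and then averaged per gene. The function names, the CpG and gene identifiers, and the beta values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def site_variability(betas):
    """Mean absolute pairwise difference in methylation across samples.

    `betas` holds one beta value (0 to 1) per crypt or gland; it needs
    at least two samples. Lower values mean the site is more conserved.
    """
    return mean(abs(a - b) for a, b in combinations(betas, 2))

def gene_conservation(site_betas, site_to_gene):
    """Average the per-site variability over the CpG sites of each gene."""
    per_gene = {}
    for site, betas in site_betas.items():
        gene = site_to_gene.get(site)
        if gene is None:
            continue  # skip CpG sites not associated with a gene
        per_gene.setdefault(gene, []).append(site_variability(betas))
    return {gene: mean(values) for gene, values in per_gene.items()}

# Hypothetical beta values for three glands from one tissue.
site_betas = {
    "cpg_1": [0.10, 0.12, 0.11],
    "cpg_2": [0.05, 0.05, 0.06],
    "cpg_3": [0.20, 0.80, 0.50],
    "cpg_4": [0.40, 0.45, 0.43],  # intergenic: no gene annotation
}
site_to_gene = {"cpg_1": "GENE_A", "cpg_2": "GENE_A", "cpg_3": "GENE_B"}

scores = gene_conservation(site_betas, site_to_gene)
for gene, score in sorted(scores.items(), key=lambda item: item[1]):
    print(gene, round(score, 3))
```

Under this toy measure `GENE_A` is more conserved than `GENE_B`. Whether a gene is more conserved than expected by chance is a separate statistical question that this sketch does not address.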
