Over the past few years, "genetic genomics" has provided a new and powerful paradigm for understanding the effect of sequence polymorphisms, by analyzing their genome-wide influence on the expression profiles of potential target genes. This approach holds promise both for providing basic biological insight into gene regulation and as a starting point for understanding human diseases.
Classical computational methods for analyzing these data have been direct extensions of genetic analysis, viewing each gene expression profile as an isolated quantitative trait. In collaboration with Daphne Koller at Stanford, we developed "Geronemo", a novel computational method designed specifically for gene expression quantitative traits. Our premise is that the influence of genotype on phenotype is induced by fine-grained perturbations to the complex regulatory network that governs a cell's activity. We provide a computational method that deciphers both the cell's regulatory network and the perturbations to it that result from sequence variability. Our method builds on our successful Module Networks procedure and offers several significant advantages over eQTL mapping:
  • Geronemo can distinguish between associations directly induced by sequence variation and those induced indirectly via the abundance of a regulator, leading to a better causal understanding of the observed variation.
  • Geronemo exploits the modularity of biological systems, allowing discovery of complex combinatorial regulation programs that are undetectable when considering each gene in isolation.
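The first point above, separating direct from indirect associations, can be illustrated with a small simulation. The sketch below is not Geronemo's algorithm. It substitutes a simple partial-correlation check on synthetic data, and every name, effect size, and noise level in it is an assumption chosen for illustration. Only the number of strains (116) echoes the dataset described on this page. A marker is simulated that alters a regulator's abundance. One target gene responds only to the regulator (an indirect association) and another responds to the marker itself (a direct association).

```python
import numpy as np

def residualize(y, x):
    """Return y with the least-squares linear effect of x (plus intercept) removed."""
    design = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def partial_corr(a, b, given):
    """Correlation between a and b after regressing `given` out of both."""
    ra = residualize(a, given)
    rb = residualize(b, given)
    return float(np.corrcoef(ra, rb)[0, 1])

def simulate(n_strains=116, seed=0):
    """Synthetic data only; effect sizes and noise levels are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # One biallelic marker per strain (0 = one parental allele, 1 = the other).
    genotype = rng.integers(0, 2, size=n_strains).astype(float)
    # The marker changes the abundance of a regulator.
    regulator = 2.0 * genotype + rng.normal(0, 0.5, n_strains)
    # This target sees the marker only through the regulator (indirect).
    indirect_target = 1.5 * regulator + rng.normal(0, 0.5, n_strains)
    # This target is affected by the marker itself (direct).
    direct_target = 2.0 * genotype + rng.normal(0, 0.5, n_strains)
    return genotype, regulator, indirect_target, direct_target

genotype, regulator, indirect_target, direct_target = simulate()
for name, target in [("indirect", indirect_target), ("direct", direct_target)]:
    marginal = float(np.corrcoef(genotype, target)[0, 1])
    conditional = partial_corr(genotype, target, regulator)
    print(f"{name:8s} marginal r = {marginal:+.2f}, given regulator r = {conditional:+.2f}")
```

Under these assumptions, both targets correlate strongly with the marker when each gene is tested in isolation. Once the regulator's expression is accounted for, the indirect target's association should largely disappear while the direct target's should persist. This is the kind of distinction that a one-gene-at-a-time mapping cannot make.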
We applied Geronemo to a dataset containing expression and genotype data for 116 S. cerevisiae strains, generated by crossing a lab strain (BY) with a wild vineyard strain (RM). Our method produced a range of interesting biological findings regarding both the yeast regulatory network and how perturbations to it result in the variation observed between the strains. Two of our most interesting findings are:
  • Variation in a small number of chromatin-modifying factors plays a key role in modulating a large fraction of the variance in gene expression between strains. Our global, module-based analysis suggests that evolutionary forces use changes in a small set of chromatin modification proteins to drive coordinated global changes in the regulatory network. (PNAS 2006)
  • Geronemo predicted a novel mechanism involving regulation of mRNA degradation, connecting Puf3 to P-bodies, which we subsequently verified experimentally. This connection was uncovered thanks to our ability to infer the regulatory network and, concurrently, analyze changes to it between strains.
Expression variation among individuals is a powerful resource that is well suited both for detecting regulatory interactions and for uncovering complex phenotypes. Unlike other types of data (e.g., gene deletions or environmental stimuli), functional assays from divergent strains represent small, natural perturbations to the system, allowing subtle changes to manifest. Moreover, each individual represents a large set of such perturbations, providing a rich source of statistical variation that helps clarify the signal. Interestingly, many perturbations are only revealed in the offspring, with the parents showing no variation. Such data are rapidly accumulating for a number of model systems, yet much of the variation remains unexplained, even by the best models. We are working on improving and extending Geronemo, and on developing entirely new methods. Our computational efforts are aimed at a number of biological questions and directions:
  • Understanding the flow of genetic information, from genotype to phenotype and fitness. We take a multi-layer approach to understand how genotype manifests in phenotypic diversity, using the regulatory network and changes in gene expression patterns as an intermediate to facilitate our understanding.
  • De-convolving genetic complexity. Despite the clear heritability of many phenotypes and diseases, association of multi-locus traits has remained an unresolved challenge. We are developing new approaches to detect causal relationships in multi-locus settings that standard techniques obscure.
  • Scaling to mammalian systems and clinically relevant problems. Such scaling entails considerable computational and statistical challenges: mammalian genomes are 100-fold larger, have a higher degree of combinatorial regulation, and have a complex landscape of variation. Nevertheless, our success in yeast holds much promise for extending the approach to mammalian systems.