Eric Gaucher's Publications
 Paleotemperature trend for Precambrian life inferred from resurrected proteins
Gaucher, EA
Ganesh, O
Govindarajan, S
Nature
(2007)
In press
 Molecular evolutionary models to guide experiments in protein engineering and directed evolution
Gaucher, EA
(2007)
In review

Analysis of transitions at two-fold redundant sites in mammalian genomes. Transition redundant approach-to-equilibrium (TREx) distance metrics
Li, T
Chamberlin, SG
Caraco, MD
Liberles, DA
Gaucher, EA
Benner, SA
BMC Evol. Biol. 6 25
(2006)
<Abstract>
Background: The exchange of nucleotides at synonymous sites in a gene
encoding a protein is believed to have little impact on the fitness of
a host organism. This should be especially true for synonymous
transitions, where a pyrimidine nucleotide is replaced by another
pyrimidine, or a purine is replaced by another purine. This suggests
that transition redundant exchange (TREx) processes at the third
position of conserved two-fold codon systems might offer the best
approximation for a neutral molecular clock, serving to examine, within
coding regions, theories that require neutrality, determine whether
transition rate constants differ within genes in a single lineage, and
correlate dates of events recorded in genomes with dates in the
geological and paleontological records. To date, TREx analysis of the
yeast genome has recognized correlated duplications that established
new metabolic strategies in fungi, and supported analyses of functional
change in aromatases in pigs. TREx dating has limitations, however.
Multiple transitions at synonymous sites may cause equilibration and
loss of information. Further, to be useful to correlate events in the
genomic record, different genes within a genome must suffer transitions
at similar rates.
Results: A formalism to analyze divergence at two-fold redundant codon
systems is presented. This formalism exploits two-state
approach-to-equilibrium kinetics from chemistry. This formalism
captures, in a single equation, the possibility of multiple
substitutions at individual sites, avoiding any need to "correct" for
these. The formalism also connects specific rate constants for
transitions to specific approximations in an underlying evolutionary
model, including assumptions that transition rate constants are
invariant at different sites, in different genes, in different
lineages, and at different times. Therefore, the formalism supports
analyses that evaluate these approximations.
Transitions at synonymous sites within two-fold redundant coding
systems were examined in the mouse, rat, and human genomes. The key
metric (f(2)), the fraction of those sites that holds the same
nucleotide, was measured for putative ortholog pairs. A transition
redundant exchange (TREx) distance was calculated from f(2) for these
pairs. Pyrimidine-pyrimidine transitions at these sites occur
approximately 14% faster than purine-purine transitions in various
lineages. Transition rate constants were similar in different genes
within the same lineages; within a set of orthologs, the f(2)
distribution is only modestly overdispersed. No correlation between
disparity and overdispersion is observed. In rodents, however, evidence
was found for greater conservation of TREx sites in genes on the X
chromosome, accounting for a small part of the overdispersion.
Conclusion: The TREx metric is useful to analyze the history of
transition rate constants within these mammals over the past 100
million years. The TREx metric estimates the extent to which silent
nucleotide substitutions accumulate in different genes, on different
chromosomes, with different compositions, in different lineages, and at
different times.

Application of DETECTER, an Evolutionary Genomic Tool to Analyze Genetic Variation, to the Cystic Fibrosis Gene Family
Gaucher, EA
DeKee, DW
Benner, SA
BMC Genomics 7 44
(2006)
<Abstract>
Background: The medical community requires computational tools that
distinguish genetic differences having phenotypic impact within the
vast number of mutations that do not. Tools that do this will become
increasingly important for those seeking to use human genome sequence
data to predict disease, make prognoses, and customize therapy to
individual patients.
Results: An approach, termed DETECTER, is proposed to identify sites
in a protein sequence where amino acid replacements are likely to have
a significant effect on phenotype, including causing genetic
disease. This approach uses a model-dependent tool to estimate the
normalized replacement rate at individual sites in a protein sequence,
based on a history of those sites extracted from an evolutionary
analysis of the corresponding protein family. This tool identifies
sites that have higher-than-average, average, or lower-than-average
rates of change in the lineage leading to the sequence in the
population of interest. The rates are then combined with sequence data
to determine the likelihoods that particular amino acids were present
at individual sites in the evolutionary history of the gene
family. These likelihoods are used to predict whether any specific
amino acid replacements, if introduced at the site in a modern human
population, would have a significant impact on fitness. The DETECTER
tool is used to analyze the cystic fibrosis transmembrane conductance
regulator (CFTR) gene family.
Conclusions: In this system, DETECTER retrodicts amino acid
replacements associated with the cystic fibrosis disease with greater
accuracy than alternative approaches. While this result validates this
approach for this particular family of proteins only, the approach may
be applicable to the analysis of polymorphisms generally, including
SNPs in a human population.

The diverse biological functions of phosphatidylinositol transfer proteins in eukaryotes
Phillips, SE
Vincent, P
Rizzieri, KE
Schaaf, G
Bankaitis, VA
Gaucher, EA
Crit. Rev. Biochem. Mol. Biol. 41
(1)
21-49
(2006)
<Abstract>
Phosphatidylinositol/phosphatidylcholine transfer proteins (PITPs)
remain largely functionally uncharacterized, despite the fact that
they are highly conserved and are found in all eukaryotic cells thus
far examined by biochemical or sequence analysis approaches. The
available data indicate a role for PITPs in regulating specific
interfaces between lipid-signaling and cellular function. In this
regard, a role for PITPs in controlling specific membrane trafficking
events is emerging as a common functional theme. However, the
mechanisms by which PITPs regulate lipid-signaling and
membrane-trafficking functions remain unresolved. Specific PITP
dysfunctions are now linked to neurodegenerative and intestinal
malabsorption diseases in mammals, to stress response and
developmental regulation in higher plants, and to previously
uncharacterized pathways for regulating membrane trafficking in yeast
and higher eukaryotes, making it clear that PITPs are integral parts
of a highly conserved signal transduction strategy in
eukaryotes. Herein, we review recent progress in deciphering the
biological functions of PITPs, and discuss some of the open questions
that remain.

A call for likelihood phylogenetics even when the process of sequence evolution is heterogeneous
Gaucher, EA
Miyamoto, MM
Mol. Phylogenet. Evol. 37
(3)
928-931
(2005)
<Abstract>
All methods of phylogenetic inference make assumptions about the
underlying evolutionary process of their characters and it is
these assumptions that determine their relative successes and
failures in the estimation of the true phylogeny for a group.
This dependency of phylogenetic accuracy and robustness on
evolutionary assumptions has been most extensively studied for
the classic case of Felsenstein (1978) and its four-taxon
phylogeny with two long, unrelated, terminal branches
interspersed with two short ones. Given this model phylogeny,
"long branch attraction" can occur and thereby lead to the
convergence of a phylogenetic method onto an incorrect tree with
the two long and two short terminal branches directly connected
rather than interspersed. The extent to which a particular
phylogenetic method is susceptible to this problem depends on
what assumptions it makes about the evolution of the characters
and data themselves.

Resurrecting ancestral alcohol dehydrogenases from yeast
Thomson, JM
Gaucher, EA
Burgan, MF
De Kee, DW
Li, T
Aris, JP
Benner, SA
Nature Genet. 37
(6)
630-635
(2005)
<Abstract>
Modern yeast living in fleshy fruits rapidly convert sugars into
bulk ethanol through pyruvate. Pyruvate loses carbon dioxide to
become acetaldehyde, which is reduced by alcohol dehydrogenase 1
(Adh1) to ethanol, which accumulates. Yeast later consumes the
accumulated ethanol, exploiting Adh2, an Adh1 homolog differing by
24 (of 348) amino acids. Because many microorganisms cannot grow
in ethanol, accumulated ethanol may help yeast defend resources in
the fruit. We report here the reconstruction of the last common
ancestor of Adh1 and Adh2, called AdhA. The kinetic behavior of
AdhA suggests that it was optimized to make (not consume) ethanol.
This is consistent with the hypothesis that before the Adh1-Adh2
duplication, yeast did not accumulate ethanol for later consumption
but rather used AdhA to recycle NADH generated in the glycolytic
pathway. Silent nucleotide dating suggests that the Adh1-Adh2
duplication occurred near the time of duplication of several other
proteins involved in the accumulation of ethanol, possibly in the
Cretaceous age when fleshy fruits arose. These results help to
connect the chemical behavior of these enzymes through systems
analysis to a time of global ecosystem change, a small but useful
step towards a planetary systems biology.

Inferred thermophily of the Last Universal Ancestor based on estimated amino acid composition
Brooks, DJ
Gaucher, EA
(2005)
Submitted
<Abstract>
The environmental temperature of the last universal ancestor (LUA) of all extant organisms is the subject of heated debate. Because the amino acid composition of proteins differs between mesophiles and thermophiles, the inferred amino acid composition of proteins in the LUA could be used to classify it as one or the other. We applied expectation maximization (EM) to estimate the amino acid composition of a set of thirty-one proteins in the LUA based on alignments of their modern day descendants, a phylogenetic tree relating those descendants and a model of evolution. Separate estimates of amino acid composition in LUA proteins were derived using modern day sequences of eight mesophilic species, eight thermophilic species or the sixteen species combined. We show that the relative mean Euclidean distance between the amino acid composition in one species and that of a set of mesophiles or thermophiles can be employed as a classifier with 100% accuracy. Applying this classifier to the estimated amino acid composition of the ancestral protein set in the LUA, we find it to be classified as a thermophile even when only the proteins of mesophilic species are used to derive the estimate. Based on the estimated amino acid composition of proteins in the LUA, we infer that it was a thermophile. We discuss our findings in the context of previous data pertaining to the optimal growth temperature (OGT) of the LUA, particularly the inferred G + C content of its rRNA. We conclude that the gathering evidence strongly supports a thermophilic LUA.

Cytoplasmic glycosylation of protein-hydroxyproline and its relationship to other glycosylation pathways
West, CM
van der Wel, H
Sassi, S
Gaucher, EA
Biochim. Biophys. Acta 1673
(1-2)
29-44
(2004)
<Abstract>
The Skp1 protein, best known as a subunit of E3(SCF)-ubiquitin ligases, is subject to complex glycosylation in the cytoplasm of the cellular slime mold Dictyostelium. Pro143 of this protein is sequentially modified by a prolyl hydroxylase and five soluble glycosyltransferases (GT), to yield the structure Galalpha1,Galalpha1,3Fucalpha1,2Galbeta1,3GlcNAcalpha1-HyPro143. These enzymes are unusual in that they are expressed in the cytoplasmic compartment of the cell, rather than the secretory pathway where complex glycosylation of proteins usually occurs. The first enzyme in the pathway appears to be related to the soluble animal prolyl 4-hydroxylases (P4H), which modify the transcriptional factor subunit HIF-1alpha in the cytoplasm, and more distantly to the P4Hs that modify collagen and other proteins in the rER, based on biochemical and informatics analyses. The soluble alphaGlcNAc-transferase acting on Skp1 has been cloned and is distantly related to the mucin-type polypeptide N-acetyl-alpha-galactosaminyltransferase in the Golgi of animals. Its characterization has led to the discovery of a family of related polypeptide N-acetyl-alpha-glucosaminyltransferases in the Golgi of selected lower eukaryotes. The Skp1 GlcNAc is extended by a bifunctional diglycosyltransferase that sequentially and apparently processively adds beta1,3Gal and alpha1,2Fuc. Though this structure is also formed in the animal secretory pathway, the GTs involved are dissimilar. Conceptual translation of available genomes suggests the existence of this kind of complex cytoplasmic glycosylation in other eukaryotic microorganisms, including diatoms, oomycetes, and possibly Chlamydomonas and Toxoplasma, and an evolutionary precursor of this pathway may also occur in prokaryotes. (C) 2004 Elsevier B.V. All rights reserved.

The planetary biology of cytochrome P450 aromatases
Gaucher, EA
Graddy, LG
Li, T
Simmen, RC
Simmen, FA
Schreiber, DR
Liberles, DA
Janis, CM
Benner, SA
BMC Biology 2
(1)
19
(2004)
<Abstract>
BACKGROUND: Joining a model for the molecular evolution of a
protein family to the paleontological and geological records
(geobiology), and then to the chemical structures of substrates,
products, and protein folds, is emerging as a broad strategy for
generating hypotheses concerning function in a post-genomic
world. This strategy expands systems biology to a planetary
context, necessary for a notion of fitness to underlie (as it
must) any discussion of function within a biomolecular
system.
RESULTS: Here, we report an example of such an expansion,
where tools from planetary biology were used to analyze three
genes from the pig Sus scrofa that encode cytochrome P450
aromatases, enzymes that convert androgens into estrogens. The
evolutionary history of the vertebrate aromatase gene family was
reconstructed. Transition redundant exchange silent substitution
metrics were used to interpolate dates for the divergence of
family members, the paleontological record was consulted to
identify changes in physiology that correlated in time with the
change in molecular behavior, and new aromatase sequences from
peccary were obtained. Metrics that detect changing function in
proteins were then applied, including KA/KS values and those
that exploit structural biology. These identified specific amino
acid replacements that were associated with changing substrate
and product specificity during the time of presumed adaptive
change. The combined analysis suggests that aromatase paralogs
arose in pigs as a result of selection for Suoidea with larger
litters than their ancestors, and permitted the Suoidea to
survive the global climatic trauma that began in the
Eocene.
CONCLUSIONS: This combination of bioinformatics analysis,
molecular evolution, paleontology, cladistics, global
climatology, structural biology, and organic chemistry serves as
a paradigm in planetary biology. As the geological,
paleontological, and genomic records improve, this approach
should become widely useful to make systems biology statements
about high-level function for biomolecular systems.
 Significance of cytoplasmic prolyl hydroxylation and complex glycosylation in the cellular slime mold Dictyostelium
West, CM
van der Wel, H
Sassi, S
Gaucher, E
Ercan, A
Glycobiology 14
(11)
1063-1063
(2004)
 Initiation of mucin-type O-glycosylation in lower eukaryotes (O-alpha-GlcNAc-type) and higher eukaryotes (O-alpha-GalNAc-type) is homologous
West, CM
Wang, F
van der Wel, H
Gaucher, E
Sassi, S
Metcalf, T
Heise, N
Mendonca-Previato, L
Previato, JO
Glycobiology 13
(11)
875-876
(2003)

Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins
Gaucher, EA
Thomson, JM
Burgan, MF
Benner, SA
Nature 425
(6955)
285-288
(2003)
<Abstract>
Features of the physical environment surrounding an ancestral
organism can be inferred by reconstructing sequences(1-9) of
ancient proteins made by those organisms, resurrecting these
proteins in the laboratory, and measuring their
properties. Here, we resurrect candidate sequences for
elongation factors of the Tu family (EF-Tu) found at ancient
nodes in the bacterial evolutionary tree, and measure their
activities as a function of temperature. The ancient EF-Tu
proteins have temperature optima of 55-65 °C. This value
seems to be robust with respect to uncertainties in the
ancestral reconstruction. This suggests that the ancient
bacteria that hosted these particular genes were thermophiles,
and neither hyperthermophiles nor mesophiles. This conclusion
can be compared and contrasted with inferences drawn from an
analysis of the lengths of branches in trees joining proteins
from contemporary bacteria(10), the distribution of thermophily
in derived bacterial lineages(11), the inferred G+C content of
ancient ribosomal RNA(12), and the geological record combined
with assumptions concerning molecular clocks(13). The study
illustrates the use of experimental palaeobiochemistry and
assumptions about deep phylogenetic relationships between
bacteria to explore the character of ancient life.
 Complex glycosylation of Skp1 in Dictyostelium: implications for the modification of other eukaryotic cytoplasmic and nuclear proteins
West, CM
van der Wel, H
Gaucher, EA
Glycobiology 12
(2)
17R-27R
(2002)
<Abstract>
Recently, complex O-glycosylation of the cytoplasmic/nuclear
protein Skp1 has been characterized in the eukaryotic
microorganism Dictyostelium. Skp1's glycosylation is mediated by
the sequential action of a prolyl hydroxylase and five
conventional sugar nucleotide-dependent glycosyltransferase
activities that reside in the cytoplasm rather than the
secretory compartment. The Skp1-HyPro GlcNAc-Transferase, which
adds the first sugar, appears to be related to a lineage of
enzymes that originated in the prokaryotic cytoplasm and
initiates mucin-type O-linked glycosylation in the lumen of the
eukaryotic Golgi apparatus. GlcNAc is extended by a bifunctional
glycosyltransferase that mediates the ordered addition of
beta1,3-linked Gal and alpha1,2-linked Fuc. The architecture of
this enzyme resembles that of certain two-domain prokaryotic
glycosyl-transferases. The catalytic domains are related to
those of a large family of prokaryotic and eukaryotic,
cytoplasmic, membrane-bound, inverting glycosyltransferases that
modify glycolipids and polysaccharides prior to their
translocation across membranes toward the secretory pathway or
the cell exterior. The existence of these enzymes in the
eukaryotic cytoplasm away from membranes and their ability to
modify protein acceptors expose a new set of cytoplasmic and
nuclear proteins to potential prolyl hydroxylation and complex
O-linked glycosylation.
 Identification of a Golgi-associated UDP-GlcNAc : polypeptide mucin-type alpha-N-acetylglucosaminyltransferase that modifies cell surface proteins in Dictyostelium
West, CM
van der Wel, H
Metcalf, T
Kaplan, L
Gaucher, EA
Glycobiology 12
(10)
697-697
(2002)

Evolution - Planetary biology - Paleontological, geological, and molecular histories of life
Benner, SA
Caraco, MD
Thomson, JM
Gaucher, EA
Science 296
(5569)
864-868
(2002)
<Abstract>
The history of life on Earth is chronicled in the geological
strata, the fossil record, and the genomes of contemporary
organisms. When examined together, these records help identify
metabolic and regulatory pathways, annotate protein sequences,
and identify animal models to develop new drugs, among other
features of scientific and biomedical interest. Together,
planetary analysis of genome and proteome databases is providing
an enhanced understanding of how life interacts with the
biosphere and adapts to global change.

Predicting functional divergence in protein evolution by site-specific rate shifts
Gaucher, EA
Gu, X
Miyamoto, MM
Benner, SA
Trends Biochem. Sci. 27
(6)
315-321
(2002)
<Abstract>
Most modern tools that analyze protein evolution allow
individual sites to mutate at constant rates over the history of
the protein family. However, Walter Fitch observed in the 1970s
that, if a protein changes its function, the mutability of
individual sites might also change. This observation is captured
in the 'non-homogeneous gamma model', which extracts functional
information from gene families by examining the different rates
at which individual sites evolve. This model has recently been
coupled with structural and molecular biology to identify sites
that are likely to be involved in changing function within the
gene family. Applying this to multiple gene families highlights
the widespread divergence of functional behavior among proteins
to generate paralogs and orthologs.
 A bifunctional diglycosyltransferase forms the Fucalpha1,2Galbeta1,3-disaccharide on Skp1 in the cytoplasm of Dictyostelium
van der Wel, H
Fisher, SZ
Gaucher, EA
West, CM
Glycobiology 11
(10)
884-884
(2001)

Function-structure analysis of proteins using covarion-based evolutionary approaches: Elongation factors
Gaucher, EA
Miyamoto, MM
Benner, SA
Proc. Natl. Acad. Sci. USA 98
(2)
548-552
(2001)
<Abstract>
The divergent evolution of protein sequences from genomic
databases can be analyzed by the use of different mathematical
models. The most common treat all sites in a protein sequence as
equally variable. More sophisticated models acknowledge the fact
that purifying selection generally tolerates variable amounts of
amino acid replacement at different positions in a protein
sequence. In their "stationary" versions, such models assume
that the replacement rate at individual positions remains
constant throughout evolutionary history. "Nonstationary"
covarion versions, however, allow the replacement rate at a
position to vary in different branches of the evolutionary
tree. Recently, statistical methods have been developed that
highlight this type of variation in replacement rates. Here, we
show how positions that have variable rates of divergence in
different regions of a tree ("covarion behavior"), coupled with
analyses of experimental three-dimensional structures, can
provide experimentally testable hypotheses that relate
individual amino acid residues to specific functional
differences in those branches. We illustrate this in the
elongation factor family of proteins as a paradigm for
applications of this type of analysis in functional genomics
generally.

Evolution, language and analogy in functional genomics
Benner, SA
Gaucher, EA
Trends in Genetics 17
(7)
414-418
(2001)
<Abstract>
Almost a century ago, Wittgenstein pointed out that theory in
science is intricately connected to language. This connection is
not a frequent topic in the genomics literature. But a case can
be made that functional genomics is today hindered by the
paradoxes that Wittgenstein identified. If this is true, until
these paradoxes are recognized and addressed, functional
genomics will continue to be limited in its ability to
extrapolate information from genomic sequences.