Modern biomedical research cannot endure without the most powerful tools from information science. Unfortunately, most bioinformatics tools are now developed by mathematicians and computer scientists lacking an understanding of the needs of biologists and biomedical researchers. The FfAME brings together both areas of expertise, often within a single scientist, to use informatics with evolutionary models to interpret experimental data, develop predictive models, and support planetary biology and medicine.
Naturally organized genome sequence databases were invented by scientists now working at FfAME, as were the first convincing tools to predict protein folds from primary sequence data, and some of the first tools to connect evolutionary sequence divergence to functional change in proteins. Ongoing work is developing better tools for evolutionary modeling, better software to visualize evolutionary events in the context of structural biology, and better strategies for predictive phylogenomics. Our goal is to enable scientific research and education by providing resources that effortlessly apply multiple lines of evidence from different scientific disciplines (for example molecular evolution, structural biology and paleontology) to illuminate problems in science and technology.
Molecular models of sequence evolution
The FfAME is developing advanced models for analyzing the divergent evolution of proteins, joining molecular evolution with structural biology to support functional analysis.
Predicting functional divergence in protein evolution by site-specific rate shifts
Gaucher, EA; Gu, X; Miyamoto, MM; Benner, SA
Trends Biochem. Sci.
27 (6) 315-321 (2002)
Most modern tools that analyze protein evolution allow individual sites to mutate at constant rates over the history of the protein family. However, Walter Fitch observed in the 1970s that, if a protein changes its function, the mutability of individual sites might also change. This observation is captured in the 'non-homogeneous gamma model', which extracts functional information from gene families by examining the different rates at which individual sites evolve. This model has recently been coupled with structural and molecular biology to identify sites that are likely to be involved in changing function within the gene family. Applying this to multiple gene families highlights the widespread divergence of functional behavior among proteins to generate paralogs and orthologs.
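The idea that a site's mutability can differ between subfamilies can be illustrated with a rough sketch. The code below does not implement the non-homogeneous gamma model; it substitutes a much cruder proxy, the per-column Shannon entropy of an alignment, and flags sites whose variability differs strongly between two subfamilies. The function names, the threshold, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rate_shift_candidates(family_a, family_b, threshold=1.0):
    """Return (site index, entropy difference) for sites whose variability
    differs between two aligned subfamilies by at least `threshold` bits.

    Assumption: both subfamilies come from the same gap-free alignment,
    so column i refers to the same site in each.
    """
    candidates = []
    columns = zip(zip(*family_a), zip(*family_b))
    for i, (col_a, col_b) in enumerate(columns):
        diff = column_entropy(col_a) - column_entropy(col_b)
        if abs(diff) >= threshold:
            candidates.append((i, diff))
    return candidates


# Toy data (hypothetical): site 1 is conserved in A but variable in B,
# and site 3 is variable in A but conserved in B.
subfamily_a = ["MKLA", "MKLG", "MKLS", "MKLT"]
subfamily_b = ["MRLA", "MHLA", "MDLA", "MELA"]
shifted_sites = rate_shift_candidates(subfamily_a, subfamily_b)
```

A real analysis would estimate site rates on a phylogeny rather than from raw column variability, which ignores how the sequences are related.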
Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments
Chang, MSS; Benner, SA
J. Mol. Biol.
341 (2) 617-631 (2004)
To understand how protein segments are inserted and deleted during divergent evolution, a set of pairwise alignments containing exactly one gap, and therefore arising from the first insertion-deletion (indel) event in the time separating the homologs, was examined. The alignments showed that "structure breaking" amino acids (PGDNS) were preferred within and flanking gapped regions, as were two residues with hydrophilic side-chains (QE) that frequently occur at the surface of protein folds. Conversely, hydrophobic residues (FMILYVW) occur infrequently within and flanking the gapped region. These preferences are modestly different in protein pairs separated by an episode of adaptive evolution than in pairs diverging under strong functional constraints. Surprisingly, regions near an indel have not evolved more rapidly than the sequence pair overall, showing no evidence that an indel event must be compensated by local amino acid replacement. The gap-lengths are best approximated by a Zipfian distribution, with the probability of a gap of length L decreasing as a function of L^-1.8. These features are largely independent of the length of the gap and the extent of divergence (measured by both silent and non-silent sequence changes) separating the two proteins. Surprisingly, amino acid repeats were discovered in more than a third of the polypeptide segments in and around the gap. These correspond to repeats in the DNA sequence. This suggests that a signature of the mechanism by which indels occur in the DNA sequence remains in the encoded protein sequences. These data suggest specific tools to score gap placement in an alignment. They also suggest tools that distinguish true indels from gaps created by mistaken gene finding, including under-predicted and overpredicted introns.
By providing mechanisms to identify errors, the tools will enhance the value of genome sequence databases in support of integrated paleogenomics strategies used to extract functional information in a post-genomic environment.
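The Zipfian gap-length result can be turned into a simple length-dependent gap score. The sketch below assumes a probability proportional to L^-1.8, truncated and normalized at a maximum gap length, and uses the negative log-probability as a penalty. The truncation, the function names, and the use of natural logarithms are assumptions made for illustration; they are not taken from the paper.

```python
import math

GAP_EXPONENT = 1.8  # exponent reported in the abstract above


def gap_length_probabilities(max_length, exponent=GAP_EXPONENT):
    """Zipfian probabilities P(L) proportional to L**-exponent for L = 1..max_length.

    Assumption: the distribution is truncated at max_length and renormalized.
    """
    weights = [length ** -exponent for length in range(1, max_length + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gap_length_penalties(max_length, exponent=GAP_EXPONENT):
    """Negative log-probability penalty for each gap length 1..max_length."""
    probs = gap_length_probabilities(max_length, exponent)
    return [-math.log(p) for p in probs]


probabilities = gap_length_probabilities(50)
penalties = gap_length_penalties(50)
```

Under this form the penalty grows with the logarithm of the gap length rather than linearly, so long gaps cost less than they would under an affine gap penalty with a comparable extension cost.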
A call for likelihood phylogenetics even when the process of sequence evolution is heterogeneous
Gaucher, EA; Miyamoto, MM
Mol. Phylogenet. Evol.
37 (3) 928-931 (2005)
All methods of phylogenetic inference make assumptions about the underlying evolutionary process of their characters and it is these assumptions that determine their relative successes and failures in the estimation of the true phylogeny for a group. This dependency of phylogenetic accuracy and robustness on evolutionary assumptions has been most extensively studied for the classic case of Felsenstein (1978) and its four-taxon phylogeny with two long, unrelated, terminal branches interspersed with two short ones. Given this model phylogeny, "long branch attraction" can occur and thereby lead to the convergence of a phylogenetic method onto an incorrect tree with the two long and two short terminal branches directly connected rather than interspersed. The extent to which a particular phylogenetic method is susceptible to this problem depends on what assumptions it makes about the evolution of the characters and data themselves.
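Long branch attraction in the four-taxon case can be illustrated numerically. The sketch below assumes a two-state symmetric model of character change and a true tree ((A,B),(C,D)) in which A and C sit on long terminal branches while B, D, and the internal branch are short. It computes the expected frequency of each parsimony-informative site pattern by summing over the states of the two internal nodes. The function name and the example branch values are hypothetical.

```python
from itertools import product


def informative_pattern_probs(p_long, q_short):
    """Expected frequencies of the three parsimony-informative site patterns
    on the true tree ((A,B),(C,D)) under a two-state symmetric model.

    p_long:  probability of a state change on the long branches (A and C)
    q_short: probability of a state change on the short branches (B, D, internal)
    """
    def edge(change_prob, start, end):
        # Probability of observing `end` at one end of a branch given `start`.
        return change_prob if start != end else 1.0 - change_prob

    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for a, b, c, d in product((0, 1), repeat=4):
        if a == b and c == d and a != c:
            key = "AB|CD"  # supports the true tree
        elif a == c and b == d and a != b:
            key = "AC|BD"  # groups the two long branches
        elif a == d and b == c and a != b:
            key = "AD|BC"
        else:
            continue  # not parsimony-informative
        # Sum over the states x, y of the two internal nodes.
        for x, y in product((0, 1), repeat=2):
            probs[key] += (
                0.5 * edge(q_short, x, y)
                * edge(p_long, x, a) * edge(q_short, x, b)
                * edge(p_long, y, c) * edge(q_short, y, d)
            )
    return probs


felsenstein_zone = informative_pattern_probs(p_long=0.4, q_short=0.05)
balanced_tree = informative_pattern_probs(p_long=0.05, q_short=0.05)
```

With these example values the pattern that groups the two long branches (AC|BD) is expected more often than the pattern supporting the true tree (AB|CD), so adding more sites drives parsimony toward the wrong tree. When all branches are short, the true pattern dominates.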
Tools for visualization and analysis
The FfAME is developing better software to analyze, display and combine the results of molecular evolutionary analysis on proteins having biomedical significance. (NIH)
Developing new molecular dating tools
The FfAME is constructing a comprehensive model for vertebrate evolution using advanced dating tools. The model groups genes that duplicated at about the same time and ties duplication events to the paleontological record.
Developing better databases for research
The FfAME is developing advanced databases for pharmacophylogenomics, allowing more rapid analysis of human disease (such as cancer, hypertension and inflammation) and a better understanding of vertebrate evolution.
Functional inferences from reconstructed evolutionary biology involving rectified databases. An evolutionarily-grounded approach to functional genomics.
Benner, SA; Chamberlin, SG; Liberles, DA; Govindarajan, S; Knecht, L
151 (2) 97-106 (2000)
If bioinformatics tools are constructed to reproduce the natural, evolutionary history of the biosphere, they offer powerful approaches to some of the most difficult tasks in genomics, including the organization and retrieval of sequence data, the updating of massive genomic databases, the detection of database error, the assignment of introns, the prediction of protein conformation from protein sequences, the detection of distant homologs, the assignment of function to open reading frames, the identification of biochemical pathways from genomic data, and the construction of a comprehensive model correlating the history of biomolecules with the history of planet Earth.