Steven Benner's Publications

Incorporation of Multiple Sequential Pseudothymidines by DNA Polymerases and Their Impact on DNA Duplex Structure
Havemann, SA
Hoshika, S
Hutter, D
Benner, SA
Nucleosides Nucleotides Nucleic Acids 27
(3)
261-278
(2008)

The evolution of seminal ribonuclease: Pseudogene reactivation or multiple gene inactivation events?
Sassi, SO
Braun, EL
Benner, SA
Mol. Biol. Evol. 24
(4)
1012-1024
(2007)
<Abstract>
Two approaches, one novel, are applied to analyze the divergent
evolution of ruminant seminal ribonucleases (RNases), paralogs of the
well-known pancreatic RNases of mammals. Here, the goal was to identify
periods of divergence of seminal RNase under functional constraints,
periods of divergence as a pseudogene, and periods of divergence driven
by positive selection pressures. The classical approach involves the
analysis of nonsynonymous to synonymous replacements ratios (omega) for
the branches of the seminal RNase evolutionary tree. The novel approach
coupled these analyses with the mapping of substitutions on the folded
structure of the protein. These analyses suggest that seminal RNase
diverged during much of its history after divergence from pancreatic
RNase as a functioning protein, followed by homoplastic inactivations
to create pseudogenes in multiple ruminant lineages. Further, they are
consistent with adaptive evolution only in the most recent episode
leading to the gene in modern oxen. These conclusions contrast sharply
with the view, cited widely in the literature, that seminal RNase
decayed after its formation by gene duplication into an inactive
pseudogene, whose lesions were repaired in a reactivation event.
Further, the 2 approaches, omega estimation and mapping of replacements
on the protein structure, were compared by examining their utility for
establishing the functional status of the seminal RNase genes in 2 deer
species. Hog and roe deer share common lesions, which strongly suggests
that the gene was inactive in their last common ancestor. In this
specific example, the crystallographic approach made the correct
implication more strongly than the omega approach. Studies of this type
should contribute to an integrated framework of tools to assign
functional and nonfunctional episodes to recently created gene
duplicates and to understand more broadly how gene duplication leads to
the emergence of proteins with novel functions.
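
The three regimes named in the abstract above map onto the usual reading of omega (the nonsynonymous to synonymous ratio): well below 1 for functional constraint, near 1 for neutral drift as expected of a pseudogene, and above 1 for positive selection. The following is a minimal sketch of that reading only. The function name, the tolerance band around 1, and the branch values are hypothetical and are not taken from the paper.

```python
def classify_omega(omega: float, tolerance: float = 0.2) -> str:
    """Return a coarse label for a branch's dN/dS ratio (omega).

    The tolerance band around 1.0 is an illustrative assumption,
    not a threshold used in the paper.
    """
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if omega > 1.0 + tolerance:
        return "positive selection"
    if omega < 1.0 - tolerance:
        return "functional constraint"
    return "neutral (pseudogene-like)"


# Hypothetical branch values, for illustration only.
for branch, omega in {"branch_a": 0.25, "branch_b": 1.05, "branch_c": 2.4}.items():
    print(branch, classify_omega(omega))
```

In practice such a label would be backed by a statistical test on each branch rather than a fixed cutoff.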

Nucleoside alpha-thiotriphosphates, polymerases and the exonuclease III analysis of oligonucleotides containing phosphorothioate linkages
Yang, ZY
Sismour, AM
Benner, SA
Nucl. Acids Res. 35
(9)
3118-3127
(2007)
<Abstract>
The use of DNA polymerases to incorporate phosphorothioate linkages
into DNA, and the use of exonuclease III to determine where those
linkages have been incorporated, are re-examined in this work. The
results presented here show that exonuclease III degrades
single-stranded DNA as a substrate and digests through phosphorothioate
linkages having one absolute stereochemistry, assigned (assuming
inversion in the polymerase reaction) as S, but not the other absolute
stereochemistry. This contrasts with a general view in the literature
that exonuclease III favors double-stranded nucleic acid as a substrate
and stops completely at phosphorothioate linkages. Furthermore, not all
DNA polymerases appear to accept exclusively the (R) stereoisomer of
nucleoside alpha-thiotriphosphates [and not the (S) diastereomer], a
conclusion inferred two decades ago by examination of five Family A
polymerases and a reverse transcriptase. This suggests that caution is
appropriate when extrapolating the detailed behavior of one polymerase
from the behaviors of other polymerases. Furthermore, these results
provide constraints on how exonuclease III-thiotriphosphate-polymerase
combinations can be used to analyze the behavior of the
components of a synthetic biology.

Enzymatic incorporation of a third nucleobase pair
Yang, ZY
Sismour, AM
Sheng, PP
Puskar, NL
Benner, SA
Nucl. Acids Res. 35
(13)
4238-4249
(2007)
<Abstract>
DNA polymerases are identified that copy a nonstandard nucleotide pair
joined by a hydrogen bonding pattern different from the patterns
joining the dA:T and dG:dC pairs.
6-Amino-5-nitro-3-(1'-beta-D-2'-deoxyribofuranosyl)-2(1H)-pyridone (dZ)
implements the non-standard 'small' donor-donor-acceptor (pyDDA)
hydrogen bonding pattern.
2-Amino-8-(1'-beta-D-2'-deoxyribofuranosyl)imidazo[1,2-a]-1,3,5-triazin-4(8H)-one
(dP) implements the 'large' acceptor-acceptor-donor (puAAD)
pattern. These nucleobases were designed to present electron density to
the minor groove, density hypothesized to help determine specificity
for polymerases. Consistent with this hypothesis, both dZTP and dPTP
are accepted by many polymerases from both Families A and B. Further,
the dZ:dP pair participates in PCR reactions catalyzed by Taq, Vent
(exo-) and Deep Vent (exo-) polymerases, with 94.4%, 97.5% and 97.5%,
respectively, retention per round. The dZ:dP pair appears to be lost
principally via transition to a dC:dG pair. This is consistent with a
mechanistic hypothesis that deprotonated dZ (presenting a pyDAA
pattern) complements dG (presenting a puADD pattern), while protonated
dC (presenting a pyDDA pattern) complements dP (presenting a puAAD
pattern). This hypothesis, grounded in the Watson-Crick model for
nucleobase pairing, was confirmed by studies of the pH-dependence of
mismatching. The dZ:dP pair and these polymerases should be useful in
dynamic architectures for sequencing, molecular-, systems- and
synthetic-biology.
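
To give a feel for what the per-round retention figures in the abstract above imply over a full PCR, the sketch below compounds them across rounds. It assumes that loss of the dZ:dP pair is independent and identical in every round, which is a simplification not stated in the abstract. The function name and the choice of 20 rounds are illustrative.

```python
def fraction_retained(per_round: float, rounds: int) -> float:
    """Fraction of product still carrying the pair after `rounds` rounds,
    assuming the same independent retention probability in each round."""
    if not 0.0 <= per_round <= 1.0:
        raise ValueError("per_round must lie in [0, 1]")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return per_round ** rounds


# Per-round retention values quoted in the abstract.
reported = {"Taq": 0.944, "Vent (exo-)": 0.975, "Deep Vent (exo-)": 0.975}

for polymerase, retention in reported.items():
    # 20 rounds is an arbitrary illustrative choice.
    print(f"{polymerase}: {fraction_retained(retention, 20):.3f}")
```

Under this assumption a few percentage points of difference per round compound into a large difference in the final product.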

The origin of proteins and nucleic acids
Ricardo, A
Benner, SA
Planets and Life: The Emerging Science of Astrobiology, ed. Woodruff T. Sullivan and John A. Baross, Cambridge University Press 154-173
(2007)

Alien biochemistries
Ward, PD
Benner, SA
Planets and Life: The Emerging Science of Astrobiology, ed. Woodruff T. Sullivan and John A. Baross, Cambridge University Press 537-544
(2007)

Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins
Bradley, ME
Benner, SA
BMC Bioinformatics 7 89
(2006)
<Abstract>
Background: When accurate models for the divergent evolution of protein
sequences are integrated with complementary biological information,
such as folded protein structures, analyses of the combined data often
lead to new hypotheses about molecular physiology. This represents an
excellent example of how bioinformatics can be used to guide
experimental research. However, progress in this direction has been
slowed by the lack of a publicly available resource suitable for
general use.
Results: The precomputed Magnum database offers a solution to this
problem for ca. 1,800 full-length protein families with at least one
crystal structure. The Magnum deliverables include 1) multiple sequence
alignments, 2) mapping of alignment sites to crystal structure sites,
3) phylogenetic trees, 4) inferred ancestral sequences at internal tree
nodes, and 5) amino acid replacements along tree branches.
Comprehensive evaluations revealed that the automated procedures used
to construct Magnum produced accurate models of how proteins
divergently evolve, or genealogies, and correctly integrated these with
the structural data. To demonstrate Magnum's capabilities, we asked for
amino acid replacements requiring three nucleotide substitutions,
located at internal protein structure sites, and occurring on short
phylogenetic tree branches. In the cellular retinoid binding protein
family a site that potentially modulates ligand binding affinity was
discovered. Recruitment of cellular retinol binding protein to function
as a lens crystallin in the diurnal gecko afforded another opportunity
to showcase the predictive value of a browsable database containing
branch replacement patterns integrated with protein structures.
Conclusion: We integrated two areas of protein science, evolution and
structure, on a large scale and created a precomputed database, known
as Magnum, which is the first freely available resource of its kind.
Magnum provides evolutionary and structural bioinformatics resources
that are useful for identifying experimentally testable hypotheses
about the molecular basis of protein behaviors and functions, as
illustrated with the examples from the cellular retinoid binding
proteins.

Analysis of transitions at two-fold redundant sites in mammalian genomes. Transition redundant approach-to-equilibrium (TREx) distance metrics
Li, T
Chamberlin, SG
Caraco, MD
Liberles, DA
Gaucher, EA
Benner, SA
BMC Evol. Biol. 6 25
(2006)
<Abstract>
Background: The exchange of nucleotides at synonymous sites in a gene
encoding a protein is believed to have little impact on the fitness of
a host organism. This should be especially true for synonymous
transitions, where a pyrimidine nucleotide is replaced by another
pyrimidine, or a purine is replaced by another purine. This suggests
that transition redundant exchange (TREx) processes at the third
position of conserved two-fold codon systems might offer the best
approximation for a neutral molecular clock, serving to examine, within
coding regions, theories that require neutrality, determine whether
transition rate constants differ within genes in a single lineage, and
correlate dates of events recorded in genomes with dates in the
geological and paleontological records. To date, TREx analysis of the
yeast genome has recognized correlated duplications that established
new metabolic strategies in fungi, and supported analyses of functional
change in aromatases in pigs. TREx dating has limitations, however.
Multiple transitions at synonymous sites may cause equilibration and
loss of information. Further, to be useful to correlate events in the
genomic record, different genes within a genome must suffer transitions
at similar rates.
Results: A formalism to analyze divergence at two-fold redundant codon
systems is presented. This formalism exploits two-state
approach-to-equilibrium kinetics from chemistry. This formalism
captures, in a single equation, the possibility of multiple
substitutions at individual sites, avoiding any need to "correct" for
these. The formalism also connects specific rate constants for
transitions to specific approximations in an underlying evolutionary
model, including assumptions that transition rate constants are
invariant at different sites, in different genes, in different
lineages, and at different times. Therefore, the formalism supports
analyses that evaluate these approximations.
Transitions at synonymous sites within two-fold redundant coding
systems were examined in the mouse, rat, and human genomes. The key
metric (f(2)), the fraction of those sites that holds the same
nucleotide, was measured for putative ortholog pairs. A transition
redundant exchange (TREx) distance was calculated from f(2) for these
pairs. Pyrimidine-pyrimidine transitions at these sites occur
approximately 14% faster than purine-purine transitions in various
lineages. Transition rate constants were similar in different genes
within the same lineages; within a set of orthologs, the f(2)
distribution is only modestly overdispersed. No correlation between
disparity and overdispersion is observed. In rodents, however, evidence was
found for greater conservation of TREx sites in genes on the X
chromosome, accounting for a small part of the overdispersion.
Conclusion: The TREx metric is useful to analyze the history of
transition rate constants within these mammals over the past 100
million years. The TREx metric estimates the extent to which silent
nucleotide substitutions accumulate in different genes, on different
chromosomes, with different compositions, in different lineages, and at
different times.
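
The abstract above says a distance is calculated from f(2) using two-state approach-to-equilibrium kinetics, but it does not give the equation. The sketch below uses the generic first-order form f2 = f_eq + (1 - f_eq) * exp(-d) and inverts it for d. The equilibrium value f_eq, its default of 0.5, and the function name are assumptions made for illustration. They are not parameters reported in the paper.

```python
import math


def trex_distance(f2: float, f_eq: float = 0.5) -> float:
    """Invert f2 = f_eq + (1 - f_eq) * exp(-d) for the dimensionless distance d.

    f2   : fraction of two-fold redundant sites holding the same nucleotide.
    f_eq : assumed equilibrium value of f2 (illustrative default).
    """
    if not 0.0 < f_eq < 1.0:
        raise ValueError("f_eq must lie strictly between 0 and 1")
    if not f_eq < f2 <= 1.0:
        # At or below equilibrium the sites are saturated and carry no dating signal.
        raise ValueError("f2 must be greater than f_eq and at most 1")
    return -math.log((f2 - f_eq) / (1.0 - f_eq))


# Identical sequences give zero distance; lower f2 gives a larger distance.
print(trex_distance(1.0))
print(trex_distance(0.75))
```

Because the exponential form already allows for repeated substitutions at a site, no separate multiple-hit correction is applied, which matches the point made in the abstract.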

Application of DETECTER, an Evolutionary Genomic Tool to Analyze Genetic Variation, to the Cystic Fibrosis Gene Family
Gaucher, EA
DeKee, DW
Benner, SA
BMC Genomics 7 44
(2006)
<Abstract>
Background: The medical community requires computational tools that
distinguish genetic differences having phenotypic impact within the
vast number of mutations that do not. Tools that do this will become
increasingly important for those seeking to use human genome sequence
data to predict disease, make prognoses, and customize therapy to
individual patients.
Results: An approach, termed DETECTER, is proposed to identify sites
in a protein sequence where amino acid replacements are likely to have
a significant effect on phenotype, including causing genetic
disease. This approach uses a model-dependent tool to estimate the
normalized replacement rate at individual sites in a protein sequence,
based on a history of those sites extracted from an evolutionary
analysis of the corresponding protein family. This tool identifies
sites that have higher-than-average, average, or lower-than-average
rates of change in the lineage leading to the sequence in the
population of interest. The rates are then combined with sequence data
to determine the likelihoods that particular amino acids were present
at individual sites in the evolutionary history of the gene
family. These likelihoods are used to predict whether any specific
amino acid replacements, if introduced at the site in a modern human
population, would have a significant impact on fitness. The DETECTER
tool is used to analyze the cystic fibrosis transmembrane conductance
regulator (CFTR) gene family.
Conclusions: In this system, DETECTER retrodicts amino acid
replacements associated with the cystic fibrosis disease with greater
accuracy than alternative approaches. While this result validates this
approach for this particular family of proteins only, the approach may
be applicable to the analysis of polymorphisms generally, including
SNPs in a human population.

2-Hydroxymethylboronate as a Reagent To Detect Carbohydrates: Application to the Analysis of the Formose Reaction
Ricardo, A
Frye, F
Carrigan, MA
Tipton, JD
Powell, DH
Benner, SA
J. Org. Chem. 71
(25)
9503-9505
(2006)
<Abstract>
2-Hydroxymethylphenylboronate is described as a reagent that converts
neutral 1,2-diols, as found in simple carbohydrates, into 1:1 anionic
complexes that are easily detected by Fourier transform ion cyclotron
resonance mass spectrometry. The value of this reagent was demonstrated
through its application to analyze complex mixtures of carbohydrates
formed in the formose process, often cited as a way that biologically
significant carbohydrates might have been generated from formaldehyde
under prebiotic conditions. Coupled with isotope studies, the reagent
shows that the simplest autocatalytic cycle for the consumption of
formaldehyde in this process cannot account for the bulk consumption of
formaldehyde.

Dynamic assembly of primers on nucleic acid templates
Leal, NA
Sukeda, M
Benner, SA
Nucl. Acids Res. 34 4702-4710
(2006)
<Abstract>
A strategy is presented that uses dynamic equilibria to assemble in situ
composite DNA polymerase primers, having lengths of 14 or 16 nt, from DNA
fragments that are 6 or 8 nt in length. In this implementation, the
fragments are transiently joined under conditions of dynamic equilibrium by
an imine linker, which has a dissociation constant of 1 µM. If a polymerase
is able to extend the composite, but not the fragments, it is possible to
prime the synthesis of a target DNA molecule under conditions where two
useful specificities are combined: (i) single nucleotide discrimination
that is characteristic of short oligonucleotide duplexes (four to six
nucleobase pairs in length), which effectively excludes single mismatches,
and (ii) an overall specificity of priming that is characteristic of long
(14 to 16mers) oligonucleotides, potentially unique within a genome. We
report here the screening of a series of polymerases that combine an
ability not to accept short primer fragments with an ability to accept the
long composite primer held together by an unnatural imine linkage. Several
polymerases were found that achieve this combination, permitting the
implementation of the dynamic combinatorial chemical strategy.

Artificially expanded genetic information system: a new base pair with an alternative hydrogen bonding pattern
Yang, ZY
Hutter, D
Sheng, PP
Sismour, AM
Benner, SA
Nucl. Acids Res. 34
(21)
6095-6101
(2006)
<Abstract>
To support efforts to develop a 'synthetic biology' based on an
artificially expanded genetic information system (AEGIS), we have
developed a route to two components of a non-standard nucleobase pair,
the pyrimidine analog
6-amino-5-nitro-3-(1'-beta-D-2'-deoxyribofuranosyl)-2(1H)-pyridone (dZ)
and its Watson-Crick complement, the purine analog
2-amino-8-(1'-beta-D-2'-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one
(dP). These implement the pyDDA:puAAD hydrogen bonding
pattern (where 'py' indicates a pyrimidine analog and 'pu' indicates a
purine analog, while A and D indicate the hydrogen bonding patterns of
acceptor and donor groups presented to the complementary nucleobases,
from the major to the minor groove). Also described is the synthesis of
the triphosphates and protected phosphoramidites of these two
nucleosides. We also describe the use of the protected phosphoramidites
to synthesize DNA oligonucleotides containing these AEGIS components,
verify the absence of epimerization of dZ in those oligonucleotides,
and report some hybridization properties of the dZ:dP nucleobase pair,
which is rather strong, and the ability of each to effectively
discriminate against mismatches in short duplex DNA.

A review: Synthesis of aryl C-glycosides via the Heck coupling reaction
Wellington, KW
Benner, SA
Nucleosides Nucleotides Nucleic Acids 25
(12)
1309-1333
(2006)
<Abstract>
In this article, we focus on the synthesis of aryl C-glycosides via
Heck coupling. It is organized based on the type of structures used in
the assembly of the C-glycosides (also called C-nucleosides) with the
following subsections: pyrimidine C-nucleosides, purine C-nucleosides,
and monocyclic, bicyclic, and tetracyclic C-nucleosides. The reagents
and conditions used for conducting the Heck coupling reactions are
discussed. The subsequent conversion of the Heck products to the
corresponding target molecules and the application of the target
molecules are also described.

Desorption/ionization on porous silicon mass spectrometry studies on pentose-borate complexes
Li, Q
Ricardo, A
Benner, SA
Winefordner, JD
Powell, DH
Anal. Chem. 77
(14)
4503-4508
(2005)
<Abstract>
Desorption/ionization on porous silicon mass spectrometry (DIOS-MS) was
used to investigate the binding affinities between aldopentose isomers
and boron. Boron has been recognized for its importance in pentose
synthesis and stabilization in prebiotic conditions. Boron may also
account for the fact that ribose, among other aldopentoses, is the
favored building block in RNA synthesis. This research started with the
detection of aldopentoses in the positive mode through cationization
and the aldopentose-borate complexes in the negative mode. Then two
competition schemes, one using a pentose structure analogue and the
other using C-13-labeled ribose, were designed to compare the relative
binding affinities of four aldopentoses (xylose, lyxose, arabinose, and
ribose) to boron. Both approaches determined the binding preference to
be ribose > lyxose > arabinose > xylose. This work illustrates the
potential of DIOS-MS in the analyses of nonvolatile, small molecules in
delicate chemical equilibria. Without externally introduced matrices,
background signals are not a limiting factor. Furthermore, the possible
dramatic change of pH associated with the matrix introduction, which
may disturb the equilibria of interest, is avoided.

Phylogenomic approaches to common problems encountered in the analysis of low copy repeats: The sulfotransferase IA gene family example
Bradley, ME
Benner, SA
BMC Evol. Biol. 5 22
(2005)
<Abstract>
Background: Blocks of duplicated genomic DNA sequence longer than 1000
base pairs are known as low copy repeats (LCRs). Identified by their
sequence similarity, LCRs are abundant in the human genome, and are
interesting because they may represent recent adaptive events, or
potential future adaptive opportunities within the human lineage.
Sequence analysis tools are needed, however, to decide whether these
interpretations are likely, whether a particular set of LCRs represents
nearly neutral drift creating junk DNA, or whether the appearance of
LCRs reflects assembly error. Here we investigate an LCR family
containing the sulfotransferase (SULT) IA genes involved in drug
metabolism, cancer, hormone regulation, and neurotransmitter biology as
a first step for defining the problems that those tools must manage.
Results: Sequence analysis here identified a fourth sulfotransferase
gene, which may be transcriptionally active, located on human
chromosome 16. Four regions of genomic sequence containing the four
human SULTIA paralogs defined a new LCR family. The stem hominoid
SULTIA progenitor locus was identified by comparative genomics
involving complete human and rodent genomes, and a draft chimpanzee
genome. SULTIA expansion in hominoid genomes was followed by positive
selection acting on specific protein sites. This episode of adaptive
evolution appears to be responsible for the dopamine sulfonation
function of some SULT enzymes. Each of the conclusions that this
bioinformatic analysis generated using data that has uncertain
reliability (such as that from the chimpanzee genome sequencing
project) has been confirmed experimentally or by a "finished"
chromosome 16 assembly, both of which were published after the
submission of this manuscript.
Conclusion: SULTIA genes expanded from one to four copies in hominoids
during intra-chromosomal LCR duplications, including (apparently) one
after the divergence of chimpanzees and humans. Thus, LCRs may provide
a means for amplifying genes (and other genetic elements) that are
adaptively useful. Being located on and among LCRs, however, could make
the human SULTIA genes susceptible to further duplications or deletions
resulting in 'genomic diseases' for some individuals. Pharmacogenomic
studies of SULTIA single nucleotide polymorphisms, therefore, should
also consider examining SULTIA copy number variability when searching
for genotype-phenotype associations. The latest duplication is,
however, only a substantiated hypothesis; an alternative explanation,
disfavored by the majority of evidence, is that the duplication is an
artifact of incorrect genome assembly.

Synthetic biology
Sismour, AM
Benner, SA
Expert Opin. Biol. Ther. 5
(11)
1409-1414
(2005)
<Abstract>
Chemistry is a broadly powerful discipline in contemporary science
because it has the ability to create new forms of the matter that it
studies. By doing so, chemistry can test models that connect molecular
structure to behaviour without having to rely on what nature has
provided. This creation, known as 'synthesis', began to be applied to
living systems in the 1980s as recombinant DNA technologies allowed
biologists to deliberately change the molecular structure of the
microbes that they studied, and automated chemical synthesis of DNA
became widely available to support these activities. The impact of the
information that has emerged has made biologists aware of a truism that
has long been known in chemistry: synthesis drives discovery and
understanding in ways that analysis cannot. Synthetic biology is now
setting an ambitious goal: to recreate in artificial systems the
emergent properties found in natural biology. By doing so, it is
advancing our understanding of the molecular basis of genetics in ways
that analysis alone cannot. More practically, it has yielded artificial
genetic systems that improve the healthcare of some 400,000 Americans
annually. Synthetic biology is now set to take the next step, to create
artificial Darwinian systems by direct construction. Supported by the
National Science Foundation as part of its Chemical Bonding program,
this work cannot help but generate clarity in our understanding of how
biological systems work.

Planetary systems biology
Benner, SA
Ricardo, A
Mol. Cell 17
(4)
471-472
(2005)
<Abstract>
Combining paleogenetics, protein engineering, synthetic biology, and
metabolic modeling, a planetary biology perspective is brought to bear
on adaptive evolutionary events in ancient bacteria.

The use of thymidine analogs to improve the replication of an extra DNA base pair: a synthetic biological system
Sismour, AM
Benner, SA
Nucl. Acids Res. 33 5640-5646
(2005)
<Abstract>
Synthetic biology based on a six-letter genetic alphabet that
includes the two non-standard nucleobases isoguanine (isoG) and
isocytosine (isoC), as well as the standard A, T, G and C, is
known to suffer as a consequence of a minor tautomeric form of
isoguanine that pairs with thymine, and therefore leads to
infidelity during repeated cycles of the PCR. Reported here is a
solution to this problem. The solution replaces thymidine
triphosphate by 2-thiothymidine triphosphate (2-thioTTP). Because
of the bulk and hydrogen bonding properties of the thione unit in
2-thioT, 2-thioT does not mispair effectively with the minor
tautomer of isoG. To test whether this might allow PCR
amplification of a six-letter artificially expanded genetic
information system, we examined the relative rates of
misincorporation of 2-thioTTP and TTP opposite isoG using affinity
electrophoresis. The concentrations of isoCTP and 2-thioTTP were
optimal to best support PCR amplification using thermostable
polymerases of a six-letter alphabet that includes the isoC-isoG
pair. The fidelity-per-round of amplification was found to be
approximately 98% in trial PCRs with this six-letter DNA
alphabet. The analogous PCR employing TTP had a fidelity-per-round
of only approximately 93%. Thus, the A, 2-thioT, G, C, isoC, isoG
alphabet is an artificial genetic system capable of Darwinian
evolution.
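
One way to read the two fidelity-per-round figures in the abstract above is to ask how many rounds pass before half of the product has lost the isoC-isoG pair. The sketch below solves fidelity ** n = 0.5 for n. It assumes a constant, independent per-round fidelity, which is a simplification, and the function name is illustrative.

```python
import math


def rounds_to_half(fidelity_per_round: float) -> float:
    """Number of rounds n at which fidelity_per_round ** n equals 0.5."""
    if not 0.0 < fidelity_per_round < 1.0:
        raise ValueError("fidelity_per_round must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(fidelity_per_round)


# Approximate per-round fidelities quoted in the abstract.
print(f"2-thioTTP: {rounds_to_half(0.98):.1f} rounds")
print(f"TTP:       {rounds_to_half(0.93):.1f} rounds")
```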

Resurrecting ancestral alcohol dehydrogenases from yeast
Thomson, JM
Gaucher, EA
Burgan, MF
De Kee, DW
Li, T
Aris, JP
Benner, SA
Nature Genet. 37
(6)
630-635
(2005)
<Abstract>
Modern yeast living in fleshy fruits rapidly convert sugars into
bulk ethanol through pyruvate. Pyruvate loses carbon dioxide to
become acetaldehyde, which is reduced by alcohol dehydrogenase 1
(Adh1) to ethanol, which accumulates. Yeast later consumes the
accumulated ethanol, exploiting Adh2, an Adh1 homolog differing by
24 (of 348) amino acids. Because many microorganisms cannot grow
in ethanol, accumulated ethanol may help yeast defend resources in
the fruit. We report here the reconstruction of the last common
ancestor of Adh1 and Adh2, called AdhA. The kinetic behavior of
AdhA suggests that it was optimized to make (not consume) ethanol.
This is consistent with the hypothesis that before the Adh1-Adh2
duplication, yeast did not accumulate ethanol for later consumption
but rather used AdhA to recycle NADH generated in the glycolytic
pathway. Silent nucleotide dating suggests that the Adh1-Adh2
duplication occurred near the time of duplication of several other
proteins involved in the accumulation of ethanol, possibly in the
Cretaceous age when fleshy fruits arose. These results help to
connect the chemical behavior of these enzymes through systems
analysis to a time of global ecosystem change, a small but useful
step towards a planetary systems biology.

Synthetic Biology
Sismour, AM
Benner, SA
Nat. Rev. Genet. 6 533-543
(2005)
<Abstract>
Synthetic biologists come in two broad classes. One uses unnatural
molecules to reproduce emergent behaviours from natural biology,
with the goal of creating artificial life. The other seeks
interchangeable parts from natural biology to assemble into
systems that function unnaturally. Either way, a synthetic goal
forces scientists to cross uncharted ground to encounter and solve
problems that are not easily encountered through analysis. This
drives the emergence of new paradigms in ways that analysis cannot
easily do. Synthetic biology has generated diagnostic tools that
improve the care of patients with infectious diseases, as well as
devices that oscillate, creep and play tic-tac-toe.

Understanding nucleic acids using synthetic chemistry
Benner, SA
Acc. Chem. Res. 37
(10)
784-797
(2004)
<Abstract>
This Account describes work done in these laboratories that has used
synthetic, physical organic, and biological chemistry to understand the
roles played by the nucleobases, sugars, and phosphates of DNA in the
molecular recognition processes central to genetics. The number of
nucleobases has been increased from 4 to 12, generating an artificially
expanded genetic information system. This system is used today in the
clinic to monitor the levels of HIV and hepatitis C viruses in
patients, helping to manage patient care. Work with uncharged phosphate
replacements suggests that a repeating charge is a universal feature of
genetic molecules operating in water and will be found in
extraterrestrial life (if it is ever encountered). The use of ribose
may reflect prebiotic processes in the presence of borate-containing
minerals, which stabilize ribose formed from simple organic precursors.
A new field, synthetic biology, is emerging on the basis of these
experiments, where chemistry mimics biological processes as complicated
as Darwinian evolution.

Quantitative analysis of a RNA-cleaving DNA catalyst obtained via in vitro selection
Carrigan, MA
Ricardo, A
Ang, DN
Benner, SA
Biochemistry 43
(36)
11446-11459
(2004)
<Abstract>
In vitro selections performed in the presence of Mg2+ generated DNA
sequences capable of cleaving an internal ribonucleoside linkage.
Several of these, surprisingly, displayed intermolecular catalysis and
catalysis independent of Mg2+, features that the selection protocol was
not explicitly designed to select. A detailed physical organic analysis
was applied to one of these DNAzymes, termed 614. First, the progress
curve for the reaction was dissected to identify factors that prevented
the molecule from displaying clean first-order transformation kinetics
and 100% conversion. Several factors were identified and quantitated,
including (a) competitive intra- and intermolecular rate processes, (b)
alternative reactive and unreactive conformations, and (c) mutations
within the catalyst. Other factors were excluded, including "approach
to equilibrium" kinetics and product inhibition. The possibility of
complementary strand inhibition was demonstrated but was shown to not
be a factor under the conditions of these experiments. The rates of the
intra- and intermolecular processes were compared, and saturation
models for the intermolecular process were built. The rate-limiting
step for the intermolecular reaction was found to be the association/folding
of the enzyme with the substrate and not the cleavage step. The
DNAzyme 614 is more active in trans than in cis and more active at
temperatures below the selection temperature than at the selection
temperature. Many of these properties have not been reported in similar
systems; these results therefore expand the phenomenology known for
this class of DNA-based catalysts. A brief survey of other catalysts
arising from this selection found other Mg2+-independent DNAzymes and
provided a preliminary view of the ruggedness of the landscape,
relating function to structure in sequence space. Hypotheses are
suggested to account for the fact that a selection in the presence of
Mg2+ did not exploit this Mg2+. This study of a specific catalytically
active DNAzyme is an example of studies that will be necessary
generally to permit in vitro selection to help us understand the
distribution of function in sequence space.
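
The progress-curve dissection described in this abstract can be illustrated with a minimal sketch. It assumes a deliberately simple model that is not taken from the paper: one population of molecules sits in a reactive conformation and cleaves with first-order kinetics, while the remainder is unreactive, so the curve plateaus below 100% conversion. The function name and all parameter values are hypothetical.

```python
import math


def fraction_cleaved(t, k_obs, reactive_fraction):
    """Fraction of substrate cleaved at time t.

    Assumed model (illustrative only): a reactive population is consumed
    with first-order rate constant k_obs; the rest never reacts.
    """
    return reactive_fraction * (1.0 - math.exp(-k_obs * t))


# Hypothetical values: k_obs in 1/h, 70% of molecules in a reactive fold.
for hours in (0, 1, 5, 20, 100):
    print(hours, round(fraction_cleaved(hours, 0.2, 0.7), 3))
```

In such a model the plateau reports the reactive fraction and the initial curvature reports the rate constant, which is one reason incomplete conversion and non-first-order behavior have to be separated before rates are compared.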

The planetary biology of cytochrome P450 aromatases
Gaucher, EA
Graddy, LG
Li, T
Simmen, RC
Simmen, FA
Schreiber, DR
Liberles, DA
Janis, CM
Benner, SA
BMC Biology 2
(1)
19
(2004)
<Abstract>
BACKGROUND: Joining a model for the molecular evolution of a
protein family to the paleontological and geological records
(geobiology), and then to the chemical structures of substrates,
products, and protein folds, is emerging as a broad strategy for
generating hypotheses concerning function in a post-genomic
world. This strategy expands systems biology to a planetary
context, necessary for a notion of fitness to underlie (as it
must) any discussion of function within a biomolecular
system.
RESULTS: Here, we report an example of such an expansion,
where tools from planetary biology were used to analyze three
genes from the pig Sus scrofa that encode cytochrome P450
aromatases, enzymes that convert androgens into estrogens. The
evolutionary history of the vertebrate aromatase gene family was
reconstructed. Transition redundant exchange silent substitution
metrics were used to interpolate dates for the divergence of
family members, the paleontological record was consulted to
identify changes in physiology that correlated in time with the
change in molecular behavior, and new aromatase sequences from
peccary were obtained. Metrics that detect changing function in
proteins were then applied, including KA/KS values and those
that exploit structural biology. These identified specific amino
acid replacements that were associated with changing substrate
and product specificity during the time of presumed adaptive
change. The combined analysis suggests that aromatase paralogs
arose in pigs as a result of selection for Suoidea with larger
litters than their ancestors, and permitted the Suoidea to
survive the global climatic trauma that began in the
Eocene.
CONCLUSIONS: This combination of bioinformatics analysis,
molecular evolution, paleontology, cladistics, global
climatology, structural biology, and organic chemistry serves as
a paradigm in planetary biology. As the geological,
paleontological, and genomic records improve, this approach
should become widely useful to make systems biology statements
about high-level function for biomolecular systems.

Multiplexed genetic analysis using an expanded genetic alphabet
Johnson, SC
Marshall, DJ
Harms, G
Miller, CM
Sherrill, CB
Beaty, EL
Lederer, SA
Roesch, EB
Madsen, G
Hoffman, GL
Laessig, RH
Kopish, GJ
Baker, MW
Benner, SA
Farrell, PM
Prudent, JR
Clin. Chem. 50
(11)
2019-2027
(2004)
<Abstract>
Background: All states require some kind of testing for newborns, but
the policies are far from standardized. In some states, newborn
screening may include genetic tests for a wide range of targets, but
the costs and complexities of the newer genetic tests inhibit expansion
of newborn screening. We describe the development and technical
evaluation of a multiplex platform that may foster increased newborn
genetic screening.
Methods: MultiCode(R) PLx involves three major steps: PCR,
target-specific extension, and liquid chip decoding. Each step is
performed in the same reaction vessel, and the test is completed in
approximately 3 h. For site-specific labeling and room-temperature
decoding, we use an additional base pair constructed from isoguanosine
and isocytidine. We used the method to test for mutations within the
cystic fibrosis transmembrane conductance regulator (CFTR) gene. The
developed test was performed manually and by automated liquid handling.
Initially, 225 samples with a range of genotypes were tested
retrospectively with the method. A prospective study used samples from
>400 newborns.
Results: In the retrospective study, 99.1% of samples were correctly
genotyped with no incorrect calls made. In the prospective study, 95%
of the samples were correctly genotyped for all targets, and there were
no incorrect calls.
Conclusions: The unique genetic multiplexing platform was successfully
able to test for 31 targets within the CFTR gene and provides accurate
genotype assignments in a clinical setting. (C) 2004 American
Association for Clinical Chemistry.

Is there a common chemical model for life in the universe?
Benner, SA
Ricardo, A
Carrigan, MA
Curr. Op. Chem Biol. 8
(6)
672-689
(2004)
<Abstract>
A review of organic chemistry suggests that life, a chemical system
capable of Darwinian evolution, may exist in a wide range of
environments. These include non-aqueous solvent systems at low
temperatures, or even supercritical dihydrogen-helium mixtures. The
only absolute requirements may be a thermodynamic disequilibrium and
temperatures consistent with chemical bonding. A solvent system,
availability of elements such as carbon, hydrogen, oxygen and nitrogen,
certain thermodynamic features of metabolic pathways, and the
opportunity for isolation, may also define habitable environments. If
we constrain life to water, more specific criteria can be proposed,
including soluble metabolites, genetic materials with repeating
charges, and a well defined temperature range.

Expanding the genetic alphabet: Pyrazine nucleosides that support a donor-donor-acceptor hydrogen-bonding pattern
von Krosigk, U
Benner, SA
Helv. Chim. Acta 87
(6)
1299-1324
(2004)
<Abstract>
The 6-aminopyrazin-2(1H)-one, when incorporated as a pyrimidine-base
analog into an oligonucleotide chain, presents a H-bond
donor-donor-acceptor pattern to a complementary DNA or RNA strand. When paired with
the corresponding acceptor-acceptor-donor purine in oligonucleotides,
the heterocycle selectively contributes to the stability of the duplex,
presumably by forming a base pair of Watson-Crick geometry joined by a
nonstandard H-bonding pattern, expanding the genetic alphabet. Reported
here is a short, high yielding, beta-D-selective synthesis of a
6-aminopyrazin-2(1H)-one nucleoside via the glycine riboside
derivative 28. The key steps include a Wittig-Horner reaction of an
appropriately protected ribose derivative (Scheme 10, 19 --> 21)
followed by a Michael-like ring closure (Scheme 12, 30 --> 1a and 32
--> 1b). Thus, a variety of pyrazine nucleosides (Scheme 73) including
the target 6-aminopyrazin-2(1H)-one riboside 1a, and its 5-methyl
derivative 1b, 6-amino-5-methylpyrazin-2(1H)-one riboside, are obtained.

Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments
Chang, MSS
Benner, SA
J. Mol. Biol. 341
(2)
617-631
(2004)
<Abstract>
To understand how protein segments are inserted and deleted during
divergent evolution, a set of pairwise alignments that contained exactly one
gap, and therefore arose from the first insertion-deletion (indel)
event in the time separating the homologs, was examined. The alignments
showed that "structure breaking" amino acids (PGDNS) were preferred
within and flanking gapped regions, as are two residues with
hydrophilic side-chains (QE) that frequently occur at the surface of
protein folds. Conversely, hydrophobic residues (FMILYVW) occur
infrequently within and flanking the gapped region. These preferences
are modestly different in protein pairs separated by an episode of
adaptive evolution, than in pairs diverging under strong functional
constraints. Surprisingly, regions near an indel have not evolved more
rapidly than the sequence pair overall, showing no evidence that an
indel event must be compensated by local amino acid replacement. The
gap-lengths are best approximated by a Zipfian distribution, with the
probability of a gap of length L decreasing as a function of L^(-1.8).
These features are largely independent of the length of the gap and the
extent of divergence (measured by both silent and non-silent sequence
changes) separating the two proteins. Surprisingly, amino acid repeats
were discovered in more than a third of the polypeptide segments in and
around the gap. These correspond to repeats in the DNA sequence. This
suggests that a signature of the mechanism by which indels occur in the
DNA sequence remains in the encoded protein sequences. These data
suggest specific tools to score gap placement in an alignment. They
also suggest tools that distinguish true indels from gaps created by
mistaken gene finding, including under-predicted and overpredicted
introns. By providing mechanisms to identify errors, the tools will
enhance the value of genome sequence databases in support of integrated
paleogenomics strategies used to extract functional information in a
post-genomic environment.
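
The Zipfian gap-length distribution mentioned in this abstract can be sketched numerically. The exponent 1.8 comes from the abstract; the truncation at a maximum gap length, the normalization over that range, and the function name are assumptions made here for illustration only.

```python
def zipf_gap_probabilities(max_length, exponent=1.8):
    """Probabilities P(L) proportional to L**(-exponent) for L = 1..max_length.

    Truncating at max_length and normalizing over that range is an
    illustrative assumption, not a detail taken from the paper.
    """
    weights = [length ** -exponent for length in range(1, max_length + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = zipf_gap_probabilities(50)
# Relative likelihood of a length-2 gap versus a length-1 gap.
print(round(probs[1] / probs[0], 3))
```

Under this form, the ratio between the probabilities of two gap lengths depends only on the ratio of the lengths, so a gap score derived from it would grow roughly with the logarithm of the gap length.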

Probing minor groove recognition contacts by DNA polymerases and reverse transcriptases using 3-deaza-2'-deoxyadenosine
Hendrickson, CL
Devine, KG
Benner, SA
Nucl. Acids Res. 32
(7)
2241-2250
(2004)
<Abstract>
Standard nucleobases all present electron density as an unshared pair
of electrons to the minor groove of the double helix. Many heterocycles
supporting artificial genetic systems lack this electron pair. To
determine how different DNA polymerases use the pair as a substrate
specificity determinant, three Family A polymerases, three Family B
polymerases and three reverse transcriptases were examined for their
ability to handle 3-deaza-2'-deoxyadenosine (c(3)dA), an analog of
2'-deoxyadenosine lacking the minor groove electron pair. Different
polymerases differed widely in their interaction with c(3)dA. Most
notably, Family A and Family B polymerases differed in their use of
this interaction to exploit their exonuclease activities. Significant
differences were also found within polymerase families. This plasticity
in polymerase behavior is encouraging to those wishing to develop a
synthetic biology based on artificial genetic systems. The differences
also suggest either that Family A and Family B polymerases do not share
a common ancestor, that minor groove contact was not used by that
ancestor functionally or that this contact was not sufficiently
critical to fitness to have been conserved as the polymerase families
diverged. Each interpretation is significant for understanding the
planetary biology of polymerases.

PCR amplification of DNA containing non-standard base pairs by variants of reverse transcriptase from Human Immunodeficiency Virus-1
Sismour, AM
Lutz, S
Park, JH
Lutz, MJ
Boyer, PL
Hughes, SH
Benner, SA
Nucl. Acids Res. 32 728-735
(2004)
<Abstract>
As the next step towards generating a synthetic biology from
artificial genetic information systems, we have examined variants
of HIV reverse transcriptase (RT) for their ability to synthesize
duplex DNA incorporating the non-standard base pair between
2,4-diaminopyrimidine (pyDAD), a pyrimidine presenting a hydrogen
bond 'donor-acceptor-donor' pattern to the complementary base,
and xanthine (puADA), a purine presenting a hydrogen bond
'acceptor-donor-acceptor' pattern. This base pair fits the
Watson-Crick geometry, but is joined by a pattern of hydrogen
bond donor and acceptor groups different from those joining the
GC and AT pairs. A variant of HIV-RT where Tyr 188 is replaced by
Leu, has emerged from experiments where HIV was challenged to
grow in the presence of drugs targeted against the RT, such as
L-697639, TIBO and nevirapine. These drugs bind at a site near,
but not in, the active site. This variant accepts the pyDAD-puADA
base pair significantly better than wild type HIV-RT, and we used
this as a starting point. A second mutation, E478Q, was
introduced into the Y188L variant, in the event that the residual
nuclease activity observed is due to the RT, and not a
contaminant. The doubly mutated RT incorporated the non-standard
pair with sufficient fidelity that the variant could be used to
amplify oligonucleotides containing pyDAD and puADA through
several rounds of a polymerase chain reaction (PCR) without
losing the non-standard base pair. This is the first time where
DNA containing non-standard base pairs with alternative hydrogen
bonding patterns has been amplified by a full PCR. This work also
illustrates a research strategy that combines in clinico
pre-evolution of proteins followed by rational design to obtain
an enzyme that meets a particular technological specification.

2'-deoxycytidines carrying amino and thiol functionality: Synthesis and incorporation by Vent (exo(-)) polymerase
Roychowdhury, A
Illangkoon, H
Hendrickson, CL
Benner, SA
Org. Lett. 6
(4)
489-492
(2004)
<Abstract>
The synthesis of 2'-deoxycytidine nucleosides bearing amino and thiol
groups appended to the 5-position of the nucleobase via a butynyl
linker is described. The corresponding triphosphates were then
synthesized from the nucleoside and incorporated into oligonucleotides
by Vent (exo(-)) DNA polymerase. The ability of Vent (exo(-))
polymerase to amplify oligonucleotides containing these functionalized
cytidine derivatives in a polymerase chain reaction (PCR) was
demonstrated for the amino-functionalized derivative.

The NASA astrobiology roadmap
Marais, DJD
Allamandola, LJ
Benner, SA
Boss, AP
Deamer, D
Falkowski, PG
Farmer, JD
Hedges, SB
Jakosky, BM
Knoll, AH
Liskowsky, DR
Meadows, VS
Meyer, MA
Pilcher, CB
Nealson, KH
Spormann, AM
Trent, JD
Turner, WW
Woolf, NJ
Yorke, HW
Astrobiology 3
(2)
219-235
(2003)
<Abstract>
The NASA Astrobiology Roadmap provides guidance for research and
technology development across the NASA enterprises that encompass the
space, Earth, and biological sciences. The ongoing development of
astrobiology roadmaps embodies the contributions of diverse scientists
and technologists from government, universities, and private
institutions. The Roadmap addresses three basic questions: How does
life begin and evolve, does life exist elsewhere in the universe, and
what is the future of life on Earth and beyond? Seven Science Goals
outline the following key domains of investigation: understanding the
nature and distribution of habitable environments in the universe,
exploring for habitable environments and life in our own solar system,
understanding the emergence of life, determining how early life on
Earth interacted and evolved with its changing environment,
understanding the evolutionary mechanisms and environmental limits of
life, determining the principles that will shape life in the future,
and recognizing signatures of life on other worlds and on early Earth.
For each of these goals, Science Objectives outline more specific
high-priority efforts for the next 3-5 years. These 18 objectives are
being integrated with NASA strategic planning.

First PCR amplification of DNA containing a nonstandard base pair A.
Lutz, S
Park, JH
Benner, SA
Biochemistry 42
(28)
8598-8598
(2003)

A direct synthesis of nucleoside analogs homologated at the 3'- and 5'-positions
Schmidt, J
Eschgfaller, B
Benner, SA
Helv. Chim. Acta 86
(9)
2937-2958
(2003)
<Abstract>
A new route is presented to prepare analogs of nucleosides homologated
at the 3'- and 5'-positions. This route, applicable to both the D- and
L-enantiomeric forms, is suitable for the preparation of monomeric
bis-homonucleosides needed for the synthesis of oligonucleotide
analogs. It begins with the known monobenzyl ether 3 of
pent-2-yne-1,5-diol, which is reduced to alkenol 4. Sharpless
asymmetric epoxidation of 4, followed by opening of the epoxide 5 with
allylmagnesium bromide, gives a mixture of diols 6 and 7. Protection of
the primary alcohol as a silyl ether followed by treatment with OsO4,
NaIO4, and mild acid in MeOH, followed by reduction, yields (2R,3R)
{{[(tert-butyl)diphenylsilyl]oxy}methyl}tetrahydro-2-(2-hydroxyethyl)-5-
methoxyfuran (=methyl 3-{{[(tert-butyl)diphenylsilyl]oxy}methyl}-2,3,5-
trideoxy-alpha/beta-D-erythro-hexafuranoside: 10) (Scheme 1).
Protected nucleobases are added to this skeleton with the aid of
trimethylsilyl triflate (Scheme 2). The o-toluoyl (2-MeC6H4CO) and
p-anisoyl (4-MeOC6H4CO) groups were used to protect the exocyclic amino
group of cytosine. The bis-homonucleoside analogs 11 and 14a are then
converted to monothiol derivatives suitable for coupling (Schemes 3 and
4) to oligonucleotide analogs with bridging S-atoms. This synthesis
replaces a much longer synthesis for analogous nucleoside analogs that
begins with diacetoneglucose (= 1,2:5,6-di-O-isopropylideneglucose),
with the stereogenic centers in the final products derived from the
Sharpless asymmetric epoxidation. The new route is useful for
large-scale synthesis of these building blocks for the synthesis of
oligonucleotide analogs.

Synthesis and properties of oligodeoxynucleotide analogs with bis(methylene) sulfone bridges
Eschgfaller, B
Schmidt, JG
Konig, M
Benner, SA
Helv. Chim. Acta 86
(9)
2959-2997
(2003)
<Abstract>
A convergent, solution-phase synthesis was developed for the
bis(methylene) sulfone-bridged oligodeoxynucleotide analogs (SNA)
5'-d(HOCH2-Tso(2)Tso(2)Tso(2)Cso(2)Tso(2)Tso(2)Tso(2)T-CH2SO3-)-3'
(35b) and
5'-d(HOCH2-Tso(2)Tso(2)Tso(2)Tso(2)Tso(2)Tso(2)Tso(2)T-CH2SO3-)-3'
(34c) (SO2 corresponds to CH2SO2CH2 instead of OP(=O)(O-)O). In these,
the phosphodiester linkages are replaced by non-ionic bis(methylene)
sulfone linkers. The general strategy involved convergent coupling of
3',5'-bishomo-beta-D-deoxyribonucleotide analogs functionalized at the
6'-end (=CH2-C(5')) as bromides or mesylates and at the CH2-C(3')
position as thiols, with the resulting thioether being oxidized to the
corresponding sulfone. A single charge was introduced at the terminal
CH2-C(3') position of the octamers to increase their solubility in
water. During the synthesis, it became apparent that the key
intermediates generated secondary structures through either folding or
aggregation in a variety of solvents. This generated unusual reactivity
and was unique for very similar structures. For example, although the
dimeric thiol d(BzOCH(2)-Tso(2)C-CH2SH) (14b) was a well-behaved
synthetic intermediate, the tetrameric thiol
d(TrOCH2-Tso(2)Tso(2)Tso(2)(10)C-CH2SH) derived from the corresponding
thioacetate was rapidly converted to a disulfide by very small amounts
of oxidant (28 --> 29, Scheme 6), while the analogous tetrameric thiol
d(BzOCH(2)-Tso(2)TsTso(2)T-CH2SH) (26), differing only by a single
heterocycle, was oxidized much more slowly (Bz = PhCO, Tr = Ph3C, to =
2-MeC6H4CO (at N-4 of dC)). The sequence-dependent reactivity, well
known in many classes of natural products (including polypeptides), is
not prominent in natural oligonucleotides. These results are discussed
in light of the proposal that the repeating negative charge in nucleic
acids is key to their ability to serve as genetic molecules, in
particular, their capability to support Darwinian evolution. The
ability of
5'-d(HOCH2-Tso(2)Tso(2)Tso(2)Cso(2)Tso(2)Tso(2)Tso(2)T-CH2SO3-)-3'
(35b) to bind as a third strand to duplex DNA was also examined. No
triple-helix-forming propensity was detected in this molecule.

Expanding the genetic alphabet: Non-epimerizing nucleoside with the pyDDA hydrogen-bonding pattern
Hutter, D
Benner, SA
J. Org. Chem. 68
(25)
9839-9842
(2003)
<Abstract>
6-Amino-3-(2'-deoxy-beta-D-ribofuranosyl)-5-nitro-1H-pyridin-2-one (4),
a C-glycoside exhibiting the nonstandard pyDDA hydrogen-bonding
pattern, was synthesized via Heck coupling. The nitro group greatly
enhances the stability of the nucleoside toward acid-catalyzed
epimerization without leading to significant deprotonation of the
heterocycle at physiological pH. These results make nucleoside 4 a
promising candidate for an expanded genetic alphabet.

Synthetic biology with artificially expanded genetic information systems. From personalized medicine to extraterrestrial life
Benner, SA
Hutter, D
Sismour, AM
Nucleic Acids Res. Suppl. 3 125-126
(2003)
<Abstract>
Over 15 years ago, the Benner group noticed that the DNA alphabet
need not be limited to the four standard nucleotides known in
natural DNA. Rather, twelve nucleobases forming six base pairs
joined by mutually exclusive hydrogen bonding patterns are
possible within the geometry of the Watson-Crick pair
(Fig. 1). Synthesis and studies on these compounds have brought us
to the threshold of a synthetic biology, an artificial chemical
system that does basic processes needed for life (in particular,
Darwinian evolution), but with unnatural chemical structures. At
the same time, the artificially expanded genetic information systems (AEGIS)
that we have developed have been used in FDA-approved commercial
tests for managing HIV and hepatitis C infections in individual
patients, and in a tool that seeks the virus for severe acute
respiratory syndrome (SARS). AEGIS also supports the next
generation of robotic probes to search for genetic molecules on
Mars, Europa, and elsewhere where NASA probes will travel.
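
The count of twelve nucleobases in six base pairs can be reproduced with a short enumeration. The sketch below assumes three hydrogen-bonding positions per base, each either a donor (D) or an acceptor (A), with the purine partner carrying the complementary pattern. Setting aside the all-donor and all-acceptor patterns is an assumption made here to match the stated count; the abstract itself does not explain it. The `py`/`pu` prefixes follow the naming used elsewhere in this list (for example pyDAD and puADA); the function name is hypothetical.

```python
from itertools import product

COMPLEMENT = {"D": "A", "A": "D"}


def pairing_patterns():
    """List (pyrimidine, purine) pattern names for complementary pairs."""
    pairs = []
    for pattern in product("DA", repeat=3):
        if len(set(pattern)) == 1:
            continue  # assumption: skip the DDD and AAA patterns
        pyrimidine = "py" + "".join(pattern)
        purine = "pu" + "".join(COMPLEMENT[site] for site in pattern)
        pairs.append((pyrimidine, purine))
    return pairs


for pyrimidine, purine in pairing_patterns():
    print(pyrimidine, "pairs with", purine)
```

Each pattern differs from every other at one or more positions, which is what lets the pairs discriminate against one another by hydrogen bonding alone.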

Act natural
Benner, SA
Nature 421
(6919)
118-118
(2003)

Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins
Gaucher, EA
Thomson, JM
Burgan, MF
Benner, SA
Nature 425
(6955)
285-288
(2003)
<Abstract>
Features of the physical environment surrounding an ancestral
organism can be inferred by reconstructing sequences(1-9) of
ancient proteins made by those organisms, resurrecting these
proteins in the laboratory, and measuring their
properties. Here, we resurrect candidate sequences for
elongation factors of the Tu family (EF-Tu) found at ancient
nodes in the bacterial evolutionary tree, and measure their
activities as a function of temperature. The ancient EF-Tu
proteins have temperature optima of 55-65 °C. This value
seems to be robust with respect to uncertainties in the
ancestral reconstruction. This suggests that the ancient
bacteria that hosted these particular genes were thermophiles,
and neither hyperthermophiles nor mesophiles. This conclusion
can be compared and contrasted with inferences drawn from an
analysis of the lengths of branches in trees joining proteins
from contemporary bacteria(10), the distribution of thermophily
in derived bacterial lineages(11), the inferred G+C content of
ancient ribosomal RNA(12), and the geological record combined
with assumptions concerning molecular clocks(13). The study
illustrates the use of experimental palaeobiochemistry and
assumptions about deep phylogenetic relationships between
bacteria to explore the character of ancient life.

C-5 modified nucleosides: Direct insertion of alkynyl-thio functionality in pyrimidines
Held, HA
Roychowdhury, A
Benner, SA
Nuc. Nuc. Nuc. acids 22
(4)
391-404
(2003)
<Abstract>
A route is presented to append, in a single step, alkynyl thioesters to
the 5-position of a pyrimidine ring of a nucleoside that is
unprotected. These products should be useful to support in vitro
selection experiments with functionalized DNA.

Nucleobase pairing in Watson-Crick-like genetic expanded information systems
Geyer, CR
Battersby, TR
Benner, SA
Structure 11
(12)
1485-1498
(2003)
<Abstract>
To guide the design of alternative genetic systems, we measured melting
temperatures of DNA duplexes containing matched and mismatched
nucleobase pairs from natural and unnatural structures. The pairs were
analyzed in terms of structural features, including nucleobase size,
number of hydrogen bonds formed, the presence of uncompensated hydrogen
bonding functional groups, the nature of the bond joining the
nucleobase to the sugar, and nucleobase charge. The results suggest
that stability of nucleobase pairs correlates with the number of
H-bonds, size complementarity, the presence of uncompensated functional
groups, and the presence of charge on a nucleobase. Each of these
properties appears to be more significant than the nature of the
glycosidic bond and sequence context. The results provide guidelines
for constructing stable Watson-Crick like nucleobase pairs with
unnatural nucleobases. The experiments also demonstrate that expanded
genetic systems can be constructed using size complementary nucleobase
pairs that contain three hydrogen bonds.

Phosphates, DNA, and the search for nonterrean life: A second generation model for genetic molecules
Benner, SA
Hutter, D
Bioorg. Chem. 30
(1)
62-80
(2002)
<Abstract>
Phosphate groups are found and used widely in biological chemistry. We
have asked whether phosphate groups are likely to be important to the
functioning of genetic molecules, including DNA and RNA. From
observations made on synthetic analogs of DNA and RNA where the
phosphates are replaced by nonanionic linking groups, we infer a set of
rules that highlight the importance of the phosphodiester backbone for
the proper functioning of DNA as a genetic molecule. The polyanionic
backbone appears to give DNA the capability of replication following
simple rules, and evolving. The polyanionic nature of the backbone
appears to be critical to prevent the single strands from folding,
permitting them to act as templates, guiding the interaction between
two strands to form a duplex in a way that permits simple rules to
guide the molecular recognition event, and buffering the sensitivity of
its physicochemical properties to changes in sequence. We argue that
the feature of a polyelectrolyte (polyanion or polycation) may be
required for a "self-sustaining chemical system capable of Darwinian
evolution." The polyelectrolyte structure therefore may be a universal
signature of life, regardless of its genesis, and unique to living
forms as well. (C) 2002 Elsevier Science (USA).

From phosphate to bis(methylene) sulfone: Non-ionic backbone linkers in DNA
Hutter, D
Blaettler, MO
Benner, SA
Helv. Chim. Acta 85
(9)
2777-2806
(2002)
<Abstract>
Chimeric DNA molecules containing four different linking groups, the
natural phosphate, 5'-methylenephosphonate, bis(methylene)phosphinate,
and bis(methylene) sulfone (see Fig. 1), were directly compared for
their ability to form duplexes with complementary DNA and DNA chimeras.
From melting temperatures for analogous complementary sequences,
general conclusions about the impact of geometric distortion of the
internucleotide linkage around the two P-O-C bridges were drawn, as
were conclusions about the impact on duplex stability that arises from
the removal of the negative charge in the linking group. Each
structural perturbation diminished the melting temperature, by ca.
-2.5° per modification for the 5'-methylenephosphonate,
-3.5° per modification for the bis(methylene)phosphinate, and
-4.5° per modification for the bis(methylene) sulfone linker.
These results have implications for DNA chemistry including the design
of 'antisense' candidates and the proposal of alternative genetic
materials in the search for non-terrean life.
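
The per-modification decrements reported in this abstract lend themselves to a small arithmetic sketch. The three decrement values are taken from the abstract; treating them as strictly additive, the reference melting temperature of 50.0, and the function name are assumptions introduced here for illustration.

```python
# Approximate change in melting temperature per modified linkage (degrees),
# as quoted in the abstract.
TM_CHANGE_PER_MODIFICATION = {
    "5'-methylenephosphonate": -2.5,
    "bis(methylene)phosphinate": -3.5,
    "bis(methylene) sulfone": -4.5,
}


def estimated_tm(reference_tm, modifications):
    """Estimate a duplex Tm, assuming the decrements simply add up.

    reference_tm: Tm of the all-phosphate duplex (hypothetical input).
    modifications: mapping of linker name to number of modified linkages.
    """
    change = sum(
        TM_CHANGE_PER_MODIFICATION[linker] * count
        for linker, count in modifications.items()
    )
    return reference_tm + change


# Hypothetical duplex with a reference Tm of 50.0 and two sulfone linkers.
print(estimated_tm(50.0, {"bis(methylene) sulfone": 2}))
```

Whether the decrements really add linearly over many modifications is not stated in the abstract, so the estimate should be read as a rough guide only.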

Fourier transform-ion cyclotron resonance mass spectrometric resolution, identification, and screening of non-covalent complexes of Hck Src homology 2 domain receptor and ligands from a 324-member peptide combinatorial library
Wigger, M
Eyler, JR
Benner, SA
Li, WQ
Marshall, AG
J. Am. Soc. Mass Spec. 13
(10)
1162-1169
(2002)
<Abstract>
The preferred ligands for the Hck Src homology 2 domain among a
combinatorial library containing 324 different peptides were determined
in a single experiment involving Fourier transform ion cyclotron
resonance (FT-ICR) mass spectrometry (MS), electrospray ionization
(ESI), stored-waveform inverse Fourier transformation (SWIFT), and
infrared multiphoton laser dissociation (IRMPD). These were compared
with the results obtained by conventional screening of the peptide
library in solution using affinity chromatography. The results reported
here show that by combining ESI, FT-ICR MS, SWIFT, and IRMPD, ligands
likely to bind under physiological conditions are rapidly and
efficiently identified, even from complex library mixtures. In the gas
phase some discrimination against hydrophobic ligands could be
observed. However, the illustrated feasibility of identifying
high-affinity ligands via gas-phase screening of complex library mixtures
should lead to broad applications in the development of ligands for
proteins with interesting biological activity, the first step that must
be taken to develop a therapeutic agent.

Detecting compensatory covariation signals in protein evolution using reconstructed ancestral sequences
Fukami-Kobayashi, K
Schreiber, DR
Benner, SA
J. Mol. Biol. 319
(3)
729-743
(2002)
<Abstract>
When protein sequences divergently evolve under functional constraints,
some individual amino acid replacements that reverse the charge (e.g.
Lys to Asp) may be compensated by a replacement at a second position
that reverses the charge in the opposite direction (e.g. Glu to Arg).
When these side-chains are near in space (proximal), such double
replacements might be driven by natural selection, if either is
selectively disadvantageous, but both together restore fully the
ability of the protein to contribute to fitness (are together
"neutral"). Accordingly, many have sought to identify pairs of
positions in a protein sequence that suffer compensatory replacements,
often as a way to identify positions near in space in the folded
structure. A "charge compensatory signal" might manifest itself in two
ways. First, proximal charge compensatory replacements may occur more
frequently than predicted from the product of the probabilities of
individual positions suffering charge reversing replacements
independently. Conversely, charge compensatory pairs of changes may be
observed to occur more frequently in proximal pairs of sites than in
the average pair. Normally, charge compensatory covariation is detected
by comparing the sequences of extant proteins at the "leaves" of
phylogenetic trees. We show here that the charge compensatory signal is
more evident when it is sought by examining individual branches in the
tree between reconstructed ancestral sequences at nodes in the tree.
Here, we find that the signal is especially strong when the position
pairs are in a single secondary structural unit (e.g. α helix or β
strand) that brings the side-chains suffering charge compensatory
covariation near in space, and may be useful in secondary structure
prediction. Also, "node-node" and "node-leaf" compensatory covariation
may be useful to identify the better of two equally parsimonious trees,
in a way that is independent of the mathematical formalism used to
construct the tree itself. Further, compensatory covariation may
provide a signal that indicates whether an episode of sequence
evolution contains more or less divergence in functional behavior.
Compensatory covariation analysis on reconstructed evolutionary trees
may become a valuable tool to analyze genome sequences, and use these
analyses to extract biomedically useful information from proteome
databases. (C) 2002 Elsevier Science Ltd. All rights reserved.
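
The idea of looking for charge-compensatory pairs along a single branch can be sketched as follows. This is a minimal illustration, not the authors' method: it compares two aligned, equal-length sequences (for example a reconstructed ancestor and its descendant), flags positions where the charge reverses, and reports pairs of positions that reverse in opposite directions. Treating only K/R as positive and D/E as negative, ignoring histidine, and all function names are assumptions made here.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")


def charge(residue):
    """Return +1, -1, or 0 for a one-letter amino acid code."""
    if residue in POSITIVE:
        return 1
    if residue in NEGATIVE:
        return -1
    return 0


def charge_reversals(ancestor, descendant):
    """Map position -> new charge for sites whose charge sign flips."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    reversals = {}
    for index, (old, new) in enumerate(zip(ancestor, descendant)):
        if charge(old) * charge(new) == -1:
            reversals[index] = charge(new)
    return reversals


def compensatory_pairs(ancestor, descendant):
    """Pairs of positions whose charge reversals run in opposite directions."""
    reversals = charge_reversals(ancestor, descendant)
    positions = sorted(reversals)
    return [
        (i, j)
        for n, i in enumerate(positions)
        for j in positions[n + 1:]
        if reversals[i] != reversals[j]
    ]


# Toy branch: Lys -> Asp at index 1 and Glu -> Arg at index 3.
print(compensatory_pairs("AKLEG", "ADLRG"))
```

A real analysis would also need the spatial proximity of the paired positions and a null expectation for how often such pairs arise by chance, both of which are outside this sketch.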

Oligodeoxyribonucleotide analogues with bridging dimethylene sulfide, sulfoxide, and sulfone groups. Toward a second-generation model of nucleic acid structure
Huang, Z
Benner, SA
J. Org. Chem. 67
(12)
3996-4013
(2002)
<Abstract>
Short DNA analogues with bridging dimethylene sulfide, sulfoxide, and
sulfone groups replacing the phosphate diesters (S-DNAs) were
synthesized from building blocks prepared via two routes, both starting
from D-glucose. Building blocks for RNA analogues were prepared by
stereoselective introduction of nucleobase into a 2'-acylated ribose
analogue. The ribose analogues were converted to deoxyribose analogues
by replacement of a 3"-OH group by a thioacetyl unit, followed by
photolytic deoxygenation or radical-based 2'-deoxygenation. DNA
analogues joined via CH2-S-CH2 units were prepared by S(N)2
displacement of a 6'-mesyl group on one building block using a thiolate
nucleophile of another. 4,4'-Dimethoxytrityl protection and
deprotection schemes were established for both the thiol and hydroxyl
groups. The corresponding sulfoxide DNA analogues were obtained by
oxidation with hydrogen peroxide. Sulfone DNA analogues were obtained
by oxidation of the sulfide DNA with persulfate or hydrogen peroxide in
the presence of a titanium silicate catalyst. The physical properties
of several representative oligonucleotide analogues were examined, and
interpreted in light of a "second-generation" model for DNA
strand-strand recognition, a model that emphasizes the role of the
polyanionic backbone in diminishing unwanted tendencies of highly
functionalized molecules to form "structure" in solution. Even short
sulfide-linked DNA analogues displayed association properties different
from those displayed by standard DNA molecules. Complex formation
observed with sulfide-linked tetramers by HPLC study in different
solvents suggested that the complex is formed using hydrogen bonding.
Sulfone-linked dinucleotides display Watson-Crick behavior; the
tetramer, however, displayed self-structure. Self-structure and
self-aggregation become more prominent as the length of the
oligonucleotide analogues increases. The tendency to self-aggregate can
be decreased by adding a charged sulfonate group to the 3'-end of the
DNA analogue. Features of the second-generation model are important for
many areas of nucleic acid chemistry, from the design of nucleic acid
therapeutic agents to the search for life on other planets.

Challenging artificial genetic systems: thymidine analogs with 5-position sulfur functionality
Held, HA
Benner, SA
Nucl. Acids Res. 30
(17)
3857-3869
(2002)
<Abstract>
Eight different polymerases, chosen from evolutionary families A (Taq,
Tfl, HotTub and Tth) and B (Pfu, Pwo, Vent and Deep Vent), were
examined for their ability to incorporate 5-position modified
2'-deoxyuridine derivatives that carry a protected thiol group appended
via different linkers containing either three or four carbon atoms.
This represents the first attempt to incorporate the thiol
functionality into DNA via enzymatic synthesis. Each
polymerase-substrate combination was evaluated using a hierarchy of
increasingly more difficult challenges, starting with incorporation of
a single derivative, proceeding to incorporation of two derivatives at
adjacent sites and non-adjacent sites, then examining the ability of
the polymerase to accept the derivative within the template, and
concluding with a challenge involving PCR. The evaluation of
thiol-bearing 2'-deoxyuridine derivatives was then extended to consider
their chemical stabilities. Stability was found to be less than
satisfactory when the thiol functionality has a 'propargylic'
relationship to the unsaturation in the linker. The best
polymerase-appendage combination used the polymerase from Pyrococcus
woesei (Pwo) and the 5'-tBu-SS-CH2-CH2-C≡C- linker. This
pair supported PCR amplification and therefore should have value in
artificial in vitro selection experiments. Indeed, we discovered that
Pwo and Pfu preferred the derivative triphosphate over TTP, the natural
substrate, in competition studies. These studies confirm an earlier
suggestion that membership of an evolutionary family of polymerases is
a partial predictor of the ability of the polymerase to accept
5-modified 2'-deoxyuridines. Considerable differences are displayed by
different members within a polymerase family, however. This remains
curious, as the ability of the polymerase to replicate natural DNA with
high fidelity and its propensity to exclude unnatural analogs are
presumed to be correlated.

Evolution - Planetary biology - Paleontological, geological, and molecular histories of life
Benner, SA
Caraco, MD
Thomson, JM
Gaucher, EA
Science 296
(5569)
864-868
(2002)
<Abstract>
The history of life on Earth is chronicled in the geological
strata, the fossil record, and the genomes of contemporary
organisms. When examined together, these records help identify
metabolic and regulatory pathways, annotate protein sequences,
and identify animal models to develop new drugs, among other
features of scientific and biomedical interest. Together,
planetary analysis of genome and proteome databases is providing
an enhanced understanding of how life interacts with the
biosphere and adapts to global change.

Predicting functional divergence in protein evolution by site-specific rate shifts
Gaucher, EA
Gu, X
Miyamoto, MM
Benner, SA
Trends Biochem. Sci. 27
(6)
315-321
(2002)
<Abstract>
Most modern tools that analyze protein evolution allow
individual sites to mutate at constant rates over the history of
the protein family. However, Walter Fitch observed in the 1970s
that, if a protein changes its function, the mutability of
individual sites might also change. This observation is captured
in the 'non-homogeneous gamma model', which extracts functional
information from gene families by examining the different rates
at which individual sites evolve. This model has recently been
coupled with structural and molecular biology to identify sites
that are likely to be involved in changing function within the
gene family. Applying this to multiple gene families highlights
the widespread divergence of functional behavior among proteins
to generate paralogs and orthologs.

Fluorescent charge-neutral analogue of xanthosine: Synthesis of a 2'-deoxyribonucleoside bearing a 5-aza-7-deazaxanthine base
Rao, P
Benner, SA
J. Org. Chem. 66
(15)
5012-5015
(2001)
<Abstract>
A concise route is described to prepare the 5-aza-7-deazapurine
2'-deoxyriboside (4), which presents the puADA hydrogen-bonding pattern,
analogous to the hydrogen-bonding pattern presented by
2'-deoxyxanthosine (2). The route begins with the commercially available
1-α-chloro-2-deoxy-3,5-bistoluoyloxyribofuranose (10), which
proves to be a versatile point of entry to β-2'-deoxyribofuranosides.
In the first step, 2-nitroimidazole (8) is
coupled with 10 to yield intermediate 11. Reduction of the nitro group
to an amino group yields 12, which is treated with phenyl
isocyanatoformate to complete the nucleobase to yield 13. Removal of
the toluoyloxy protecting groups of 13 yields the target nucleoside 4
in 40% overall yield in four steps. In an alternative strategy,
convergent coupling of 14 with 10 under basic conditions was attempted
but found to yield the heterocycle glycosylated at the undesired
position. Compound 13 displays potentially useful fluorescence
properties. After excitation at 250 nm, a solution of 13 in MeCN shows
a fluorescence emission with a maximum at 410 nm. Furthermore, 13 is
neutral at physiological pH, a property that it shares with natural
nucleobases but not xanthosine itself, which is an acid with a pK(a) of
ca. 5.6. Furthermore, as part of the design, 4 is made capable of
presenting an unshared pair of electrons to the DNA minor groove.

Function-structure analysis of proteins using covarion-based evolutionary approaches: Elongation factors
Gaucher, EA
Miyamoto, MM
Benner, SA
Proc. Natl. Acad. Sci. USA 98
(2)
548-552
(2001)
<Abstract>
The divergent evolution of protein sequences from genomic
databases can be analyzed by the use of different mathematical
models. The most common treat all sites in a protein sequence as
equally variable. More sophisticated models acknowledge the fact
that purifying selection generally tolerates variable amounts of
amino acid replacement at different positions in a protein
sequence. In their "stationary" versions, such models assume
that the replacement rate at individual positions remains
constant throughout evolutionary history. "Nonstationary"
covarion versions, however, allow the replacement rate at a
position to vary in different branches of the evolutionary
tree. Recently, statistical methods have been developed that
highlight this type of variation in replacement rates. Here, we
show how positions that have variable rates of divergence in
different regions of a tree ("covarion behavior"), coupled with
analyses of experimental three-dimensional structures, can
provide experimentally testable hypotheses that relate
individual amino acid residues to specific functional
differences in those branches. We illustrate this in the
elongation factor family of proteins as a paradigm for
applications of this type of analysis in functional genomics
generally.

Evolution, language and analogy in functional genomics
Benner, SA
Gaucher, EA
Trends in Genetics 17
(7)
414-418
(2001)
<Abstract>
Almost a century ago, Wittgenstein pointed out that theory in
science is intricately connected to language. This connection is
not a frequent topic in the genomics literature. But a case can
be made that functional genomics is today hindered by the
paradoxes that Wittgenstein identified. If this is true, until
these paradoxes are recognized and addressed, functional
genomics will continue to be limited in its ability to
extrapolate information from genomic sequences.

Beyond BLAST: Paleogenomics tools to infer function to genetic sequences.
Benner, S
Chamberlin, S
Am. J. Hum. Genet. 67
(4)
260-260
(2000)

Synthesis and characterization of oligonucleotides containing 2'-deoxyxanthosine using phosphoramidite chemistry
Jurczyk, SC
Horlacher, J
Devine, KG
Benner, SA
Battersby, TR
Helv. Chim. Acta 83
(7)
1517-1524
(2000)
<Abstract>
Oligodeoxynucleotides containing 2'-deoxyxanthosine (X-d) were
synthesized in good yield from an
O-2,O-6-bis[2-(4-nitrophenyl)ethyl](NPE)-protected phosphoramidite of
X-d. Attempts to synthesize an O-6-monoNPE-protected phosphoramidite
resulted in formation of a major by-product. The NPE protecting groups
were removed by treatment with oximate ion after other protecting
groups were removed with aqueous NH4OH solution. The composition of the
synthetic oligonucleotides was verified by enzymatic degradation and
MALDI-TOF mass spectrometry. The efficacy of this procedure allowed
isolation of oligodeoxynucleotides containing multiple X-d residues.

Evaluation measures of multiple sequence alignments
Gonnet, GH
Korostensky, C
Benner, S
J. Comp. Bio. 7
(1-2)
261-276
(2000)
<Abstract>
Multiple sequence alignments (MSAs) are frequently used in the study of
families of protein sequences or DNA/RNA sequences. They are a
fundamental tool for the understanding of the structure, functionality
and, ultimately, the evolution of proteins. A new algorithm, the
Circular Sum (CS) method, is presented for formally evaluating the
quality of an MSA. It is based on the use of a solution to the
Traveling Salesman Problem, which identifies a circular tour through an
evolutionary tree connecting the sequences in a protein family. With
this approach, the calculation of an evolutionary tree and the errors
that it would introduce can be avoided altogether. The algorithm gives
an upper bound, the best score that can possibly be achieved by any MSA
for a given set of protein sequences. Alternatively, if presented with
a specific MSA, the algorithm provides a formal score for the MSA,
which serves as an absolute measure of the quality of the MSA. The CS
measure yields a direct connection between an MSA and the associated
evolutionary tree. The measure can be used as a tool for evaluating
different methods for producing MSAs. A brief example of the last
application is provided. Because it weights all evolutionary events on
a tree identically, but does not require the reconstruction of a tree,
the CS algorithm has advantages over the frequently used sum-of-pairs
measures for scoring MSAs, which weight some evolutionary events more
strongly than others. Compared to other weighted sum-of-pairs measures,
it has the advantage that no evolutionary tree must be constructed,
because we can find a circular tour without knowing the tree.

Evolutionary history of the uterine serpins
Peltier, MR
Raley, LC
Liberles, DA
Benner, SA
Hansen, PJ
J. Exp. Zoo. 288
(2)
165-174
(2000)
<Abstract>
A bioinformatics analysis was conducted on the four members of the
uterine serpin (US) family of serpins. Evolutionary analysis of the
protein sequences and 86 homologous serpins by maximum parsimony and
distance methods indicated that the uterine serpin proteins form a
clade distinct from other serpins. Ancestral sequences were
reconstructed throughout the evolutionary tree by parsimony. These
suggested that some branches suffered a high ratio of nonsynonymous to
synonymous mutations, suggesting episodes of adaptive evolution within
the serpin family. Analysis of the sequences by neutral evolutionary
distance methods suggested that the uterine serpins diverged from other
serpins prior to the divergence of the mammals from other vertebrates.
The porcine uterine serpins are paralogs that diverged from a single
common ancestor within the Sus genus after pigs separated from other
artiodactyls. The uterine serpins contain several protein kinase C and
tyrosine kinase phosphorylation sites. These sites may be important for
the lymphocyte-inhibitory activity of OvUS if, like other basic
proteins, OvUS can cross the cell membrane of an activated lymphocyte.
Internalized OvUS could serve as an alternative target to protein
kinases important for the mitogenic response to antigens. (C) 2000
Wiley-Liss, Inc.

The missing organic molecules on Mars
Benner, SA
Devine, KG
Matveeva, LN
Powell, DH
Proc. Natl. Acad. Sci. USA 97
(6)
2425-2430
(2000)
<Abstract>
GC-MS on the Viking 1976 Mars missions did not detect organic molecules
on the Martian surface, even those expected from meteorite bombardment.
This result suggested that the Martian regolith might hold a potent
oxidant that converts all organic molecules to carbon dioxide rapidly
relative to the rate at which they arrive. This conclusion is
influencing the design of Mars missions. We reexamine this conclusion
in light of what is known about the oxidation of organic compounds
generally and the nature of organics likely to come to Mars via
meteorite. We conclude that nonvolatile salts of benzenecarboxylic
acids, and perhaps oxalic and acetic acid, should be metastable
intermediates of meteoritic organics under oxidizing conditions. Salts
of these organic acids would have been largely invisible to GC-MS.
Experiments show that one of these, benzenehexacarboxylic acid
(mellitic acid), is generated by oxidation of organic matter known to
come to Mars, is rather stable to further oxidation, and would not have
been easily detected by the Viking experiments. Approximately 2 kg of
meteorite-derived mellitic acid may have been generated per m(2) of
Martian surface over 3 billion years. How much remains depends on
decomposition rates under Martian conditions. As available data do not
require that the surface of Mars be very strongly oxidizing, some
organic molecules might be found near the surface of Mars, perhaps in
amounts sufficient to be a resource. Missions should seek these and
recognize that these complicate the search for organics from entirely
hypothetical Martian life.

Functional inferences from reconstructed evolutionary biology involving rectified databases. An evolutionarily-grounded approach to functional genomics.
Benner, SA
Chamberlin, SG
Liberles, DA
Govindarajan, S
Knecht, L
Res. MicroBiol. 151
(2)
97-106
(2000)
<Abstract>
If bioinformatics tools are constructed to reproduce the
natural, evolutionary history of the biosphere, they offer
powerful approaches to some of the most difficult tasks in
genomics, including the organization and retrieval of sequence
data, the updating of massive genomic databases, the detection
of database error, the assignment of introns, the prediction of
protein conformation from protein sequences, the detection of
distant homologs, the assignment of function to open reading
frames, the identification of biochemical pathways from genomic
data, and the construction of a comprehensive model correlating
the history of biomolecules with the history of planet
Earth.