Purpose: These lectures and discussions give an overview of cutting-edge developments in biochemistry, especially genomics, that are relevant to plant biochemistry research and applications.
Background Readings for the discussions on Frontiers in Plant Biochemistry:
1 - Edgar B. Cahoon, Ylva Lindqvist, Gunter Schneider, and John Shanklin. 1997. Redesign of soluble fatty acid desaturases from plants for altered substrate specificity and double bond position. Proc. Natl. Acad. Sci. USA 94: 4872-4877.
2 - Cahoon, E.B. and Shanklin, J. (2000) Substrate-dependent mutant complementation to select fatty acid desaturase variants for metabolic engineering of plant seed oils. Proc. Natl. Acad. Sci. USA 97: 12350-12355.
3 - Zivy, M. and D. de-Vienne. 2000. Proteomics: a link between genomics, genetics and physiology. Pl. Molec. Biol. 44: 575-580.
4 - Oliver Fiehn, Joachim Kopka, Peter Dörmann, Thomas Altmann, Richard N. Trethewey and Lothar Willmitzer. 2000. Metabolite profiling for plant functional genomics. Nature Biotech. 18: 1157-1161.
5 - Lassner, M. and Bedbrook, J. 2001. Directed molecular evolution in plant improvement. Curr. Op. Pl. Biol. 4: 152 - 156.
6 - Whittle, E. and Shanklin, J. 2001. Engineering Δ9-16:0-ACP specificity based on combinatorial saturation mutagenesis and logical redesign of the castor Δ9-18:0-ACP desaturase. JBC pre-published April 9, 2001: http://www.jbc.org/cgi/reprint/M102129200v1.pdf
1 - William F. DeGrado, Christopher M. Summa, Vincenzo Pavone, Flavia Nastri and Angela Lombardi. 1999. De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68: 779-819.
2 - Roessner, U., Luedemann, A., Brust, D., Fiehn, O., Linke, T., Willmitzer, L. and Fernie, A.R. 2001. Metabolic profiling allows comprehensive phenotyping of genetically or environmentally modified plant systems. Plant Cell 13: 11-29.
3 - Wang, X., Stumpf, D.K. and Larkins, B.A. 2001. Aspartate kinase 2. A candidate gene of a quantitative trait locus influencing free amino acid content in maize endosperm. Plant Physiol. 125: 1778-1787.
4 - Neville Kallenbach. 2001. Breaking open a protein barrel. PNAS USA 98: 2958-2960.
We are at an exciting time in Plant Biochemistry. As mentioned by President Clinton in his 1998 "State of the Union" address, the next century is anticipated to be "The Century of Biology", with advances in biotechnology expected to lead economic development. Not only will we be able to use advances in Plant Biochemistry to produce better, more nutritious foods, which will improve human health more than any other medical advance, but we are also on the verge of using plants as sophisticated chemical factories to produce all manner of materials useful to medicine and industry (as alluded to in previous lectures). We are beginning to understand the structure and function of an increasing number of enzymes and other proteins, and this is spawning the rational design of enzyme function and other methods of protein engineering.
The largest life science research endeavors ever undertaken are the genome sequencing projects, most notably the human genome sequencing project. Plant and other genomes have been or are being sequenced as well, with the Arabidopsis effort leading the way among plants. Concurrently, the accumulation of information on the sequence and structure of proteins and other macromolecules is accelerating. As noted by Charles Platt in a 1998 article in Wired Magazine:
"During the past five years a slow collision of epic proportions has united two disparate fields of science. The result promises to be an explosion of new knowledge and power that will sever us from our human heritage and transform us in ways that we cannot yet imagine. If that sounds like overstatement, so be it; this is one occasion where reality should have no trouble matching -- or exceeding -- journalistic hype."
This collision is between computer science and biology. A growing number of prokaryote and lower eukaryote genomes have already been completely sequenced, a rough draft of the entire human genome was reported last year, and a near-complete sequence of the Arabidopsis genome was published in December 2000. The entire human genome DNA sequence would occupy ~750 megabytes on your hard drive without data compression! Most crop genomes, e.g., corn, will have similar file sizes, and Arabidopsis less than 10% of that.
The explosion of biological information from advances in biochemistry and molecular biology, such as the sequencing projects, requires extensive and sophisticated use of computers. It has spawned a new area, known as Biocomputing or Bioinformatics, that combines computer science and biology.
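The ~750 megabyte figure quoted above for the human genome appears consistent with storing each base in 2 bits, since four bases need only 2 bits each. The following is a minimal sketch of that arithmetic, not the method used in the lecture. The genome sizes in it are rough assumptions supplied for illustration, and the function name is hypothetical.

```python
# Rough storage estimate for a DNA sequence packed at 2 bits per base.
# The genome sizes below are approximate assumptions, not lecture values.
BITS_PER_BASE = 2  # four bases (A, C, G, T) fit in 2 bits each


def genome_megabytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the storage size in megabytes (10**6 bytes) for one strand."""
    return base_pairs * bits_per_base / 8 / 1e6


APPROX_GENOME_SIZES_BP = {
    "human": 3_000_000_000,      # assumed ~3 billion base pairs
    "maize": 2_500_000_000,      # assumed ~2.5 billion base pairs
    "Arabidopsis": 125_000_000,  # assumed ~125 million base pairs
}

for name, base_pairs in APPROX_GENOME_SIZES_BP.items():
    print(f"{name}: ~{genome_megabytes(base_pairs):.0f} MB at 2 bits per base")
```

Under these assumptions the Arabidopsis sequence comes to well under 10% of the human figure. Storing one character (8 bits) per base instead would quadruple every estimate.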
As mentioned by Shankar Subramaniam of the Univ. Illinois in a recent Bioinformatics seminar (April 8, 1998) sponsored by the UK computer center, Bioinformatics can be considered to go all the way back to Aristotle. In 350 B.C. Aristotle wrote (translated by William Ogle):
"Ought we, for instance, to begin by discussing each separate species -- man, lion, ox, and the like -- taking each kind in hand independently of the rest, or ought we rather to deal first with the attributes which they have in common in virtue of some common element of their nature, and proceed from this as a basis for the consideration of them separately?"
[If you would like to see this quotation from "On the Parts of Animals" by Aristotle in context, refer to the full text.]
Many universities around the world now offer a course in bioinformatics or biocomputing. Also see a nice introduction to this area and internet courses on this subject.
The National Biotechnology Information Facility, NBIF, at New Mexico State University is another very useful source of biocomputing information. The Kyoto Encyclopedia of Genes and Genomes, KEGG, is a leading site for functional genomics or going from gene sequence to function. KEGG aims to make biological sense of the current flood of sequencing data by linking genes to biochemical pathways.
The European, US, and Japanese governments have started units to develop computational algorithms and provide databases to facilitate bioinformatics. The principal databases are GenBank, EMBL (European Molecular Biology Laboratory), DDBJ (DNA Data Bank of Japan), Swiss-Prot, PDB (Protein Data Bank) and PIR (Protein Information Resource). There are also EST (expressed sequence tag) databases. As of April 23, 1999, there were 2,340,548 public EST entries, most of them (though a decreasing share) from humans: 1,339,235. ESTs from only a handful of plants are in the public domain so far, with Arabidopsis leading at 37,671 [6th among all organisms (4th a year ago)] and rice following at 35,048. For maize, 8,914 ESTs were listed as of last week (up from 1,785 a year ago), but many more are in the private domain. How does this compare with the number of genes known in some of these organisms?
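To put the EST counts quoted above in proportion, here is a small sketch that converts them to shares of the public total and computes the one-year increase for maize. All numbers are taken from the text. The variable and function names are hypothetical.

```python
# Public EST entries quoted in the text (as of April 23, 1999).
TOTAL_ESTS = 2_340_548
EST_COUNTS = {
    "human": 1_339_235,
    "Arabidopsis": 37_671,
    "rice": 35_048,
    "maize": 8_914,
}
MAIZE_ESTS_A_YEAR_AGO = 1_785


def share_percent(count: int, total: int = TOTAL_ESTS) -> float:
    """Percentage of all public EST entries contributed by one organism."""
    return 100.0 * count / total


for organism, count in EST_COUNTS.items():
    print(f"{organism}: {count:,} ESTs ({share_percent(count):.1f}% of total)")

# Fold increase in public maize ESTs over the year quoted in the text.
maize_fold_change = EST_COUNTS["maize"] / MAIZE_ESTS_A_YEAR_AGO
print(f"maize: {maize_fold_change:.1f}-fold increase in one year")
```

On these figures, humans account for a little over half of the public entries, while each plant species contributes less than 2%.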
The National Center for Biotechnology Information (NCBI), sponsored by the US National Library of Medicine and the National Institutes of Health (NIH), maintains GenBank, provides links to the other major databases, and offers tools for searching ESTs (dbEST) and for comparing DNA and protein sequences (Basic Local Alignment Search Tool; BLAST). There is now a family of BLAST programs. For example, PSI-BLAST (Position-Specific Iterated BLAST) provides speed and ease of operation for motif searching. It can help delineate diverse protein families and predict function(s) for newly sequenced proteins.
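To illustrate what a "local alignment" score is, the sketch below uses the Smith-Waterman dynamic-programming method. This is not BLAST's own algorithm: BLAST is a faster heuristic that approximates this kind of search. The scoring values (match, mismatch, gap) and the example sequences are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(seq_a: str, seq_b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of two sequences (linear gap penalty)."""
    cols = len(seq_b) + 1
    prev = [0] * cols  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(seq_a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = prev[j - 1] + (match if same else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two made-up sequences that share the stretch "ACGT".
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only two rows of the table are kept because this sketch reports the score alone. Recovering the aligned residues would require keeping the full table and tracing back from the best cell.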
An important component of bioinformatics goes back to one of the earliest areas of biological science inquiry, systematics or phylogeny. For example, see a part of a phylogenetic tree based on the sequence of cytochrome c. This can also be found in most general biochemistry texts.
See also an example of the phylogenetic grouping of flowering plant (angiosperm) subclasses.
A phylogenetic classification of angiosperms down to the level of orders, from Cronquist (1981), is given as follows, with dicot subclass names in blue or black and monocot subclass names in green (see also Fig. 2.13 of the class text).
Every GenBank entry is classified phylogenetically. For example NCBI/GenBank classifies Arabidopsis thaliana as:
Charophyta/Embryophyta group; Embryophyta; Tracheophyta; seed plants; Magnoliophyta; eudicotyledons;
The Plant Genome Data and Information Center (PGDIC) from the National Agricultural Library of the US Department of Agriculture - ARS provides access to a variety of information products and services on all aspects of plant and animal genome mapping.
The GCG suite of programs is a useful bioinformatics tool. Univ. KY has a site license for the GCG programs, which are maintained by the Medical Microbiology Dept. on a system called the SeqWeb on the Seqanal but are available to the entire campus community. Other useful groups of genomics programs include the All-IN-ONE SEQ-ANALYZER and Doubletwist.
Genomics can be divided into structural genomics and functional genomics. Functional genomics can be further divided into 3 groupings.
1. Comparative analysis of transcript abundance, i.e. microarrays.
3. Metanomics or metabolomics.
One of the major surprises of structural genomics is how few genes it takes to encode the entire blueprint of higher plants and animals.
We are moving from structural to functional genomics, the latter field being particularly relevant to biochemistry. As stated in the BioMedNet News and Comments on the FASEB/Protein Society Symposium that concluded Sunday, April 22, 2001:
"Information about the function of a protein is contained not in its primary sequence but in its structure, so the classical 'protein folding problem' - how proteins reach their native, folded state from the unfolded, newly synthesized polypeptide - now takes on new significance.
"It's absolutely clear that the number-one bottleneck at the moment is producing protein and refolding it," Chris Dobson of Oxford University and the current president of the Protein Society told BioMedNet Conference Reporter. But refolding proteins isn't so easy, as veteran refolder Rainer Rudolph of the University of Halle, Germany, made clear in the closing lecture."
What is needed to understand the native 3D structure of a protein?
There are a number of new tools for viewing and studying 3D protein structures such as Molecular Visualization for the Masses - 3-D imaging resources for nonstructural biologists: http://news.bmn.com/hmsbeagle/90/reviews/insitu and Swiss-PdbViewer 3.5: http://news.bmn.com/hmsbeagle/90/reviews/sreview
A good example of rational design of enzyme activity in plants that can be used to alter the products of metabolic pathways is the work of Ed Cahoon et al. (1997) in John Shanklin's lab, the first of the required reading assignments.
As mentioned in Lecture 11, the first double bond is introduced into fatty acids in plants by a soluble stromal ACP desaturase. This desaturase normally has high specificity for 18:0-ACP and inserts a cis double bond at carbon 9, forming oleic acid, 18:1 Δ9. A small number of plant species in several families possess other acyl-ACP desaturases. Coriandrum sativum (coriander) seed has a Δ4-palmitoyl (16:0)-ACP desaturase that inserts a double bond at carbon 4 of 16:0. Thunbergia alata (black-eyed Susan vine) seed has a Δ6-16:0-ACP desaturase, and Pelargonium × hortorum (geranium) trichomes have a Δ9-myristoyl (14:0)-ACP desaturase.
By comparing the protein sequences of these 3 soluble desaturases with that of the Ricinus communis (castor) Δ9-18:0-ACP desaturase and analyzing 3-dimensional structural information from the crystal structure of the R. communis enzyme, Cahoon et al. were able to decipher the molecular basis for chain-length recognition and positional placement of double bonds in fatty acids. They showed that the regiospecificities could be modified by replacing specific amino acid residues and that acyl-ACP desaturase activities can be rationally redesigned.
Δ9-18:0-ACP desaturase contains a catalytic diiron cluster (and all the other above-mentioned soluble desaturases contain the conserved iron-binding domains) that represents a fixed point for double bond introduction. Adjacent to these iron atoms is a deep, narrow channel that Cahoon et al. predicted was the binding pocket for the 18:0 portion of the substrate. The channel forces the 18:0 chain to bend at carbons 9-10, which corresponds to the cis configuration of the 18:1 Δ9 product. Thus the architecture of the substrate-binding channel appeared to determine the substrate acyl-chain length and the position and stereochemistry of the introduced double bond. A schematic view of the Δ9-18:0-ACP desaturase substrate channel with the 18:0 substrate docked is shown in Fig. 3 of Cahoon et al. They predicted that replacing alanine-188 with the smaller glycine (A188G) and tyrosine-189 with phenylalanine (Y189F) in the Δ6-16:0-ACP desaturase would extend the cavity at the bottom of the desaturase active site enough to accommodate the 2 additional carbon atoms at the methyl end of 18:0-ACP. They in fact found that the A188G/Y189F mutant desaturates 18:0-ACP at a rate similar to that with 16:0-ACP.
The structural model of the enzyme active site indicates that the variant desaturases that accommodate fewer carbon atoms at the bottom of the active site have their binding clefts occluded by bulkier amino acids. As shown in their model (Fig. 4), the Δ9-18:0-ACP desaturase (red), which allows 9 carbon atoms beyond the point of double bond insertion, contains a methionine, proline and glycine at positions 114, 179 and 188, respectively. On the other hand, the Δ9-14:0-ACP desaturase (green), which permits 5 carbons beyond the double bond, has the larger hydrophobic residues leucine, isoleucine and leucine at positions 114, 179 and 188, respectively, at the bottom of the channel.
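The carbon counts quoted above (9 and 5 carbons beyond the double bond) match a simple subtraction of the double bond position from the acyl chain length. The sketch below works through that arithmetic, assuming this counting convention, which is inferred from the numbers in the text. The class and field names are hypothetical.

```python
from typing import NamedTuple


class DesaturaseSubstrate(NamedTuple):
    label: str
    chain_length: int     # carbons in the acyl chain, e.g. 18 for 18:0
    double_bond_pos: int  # position counted from the carboxyl end, e.g. 9 for Δ9

    def carbons_beyond_double_bond(self) -> int:
        """Carbons the channel must hold between the double bond and the methyl end."""
        return self.chain_length - self.double_bond_pos


CASES = [
    DesaturaseSubstrate("Δ9-18:0-ACP desaturase", 18, 9),
    DesaturaseSubstrate("Δ9-14:0-ACP desaturase", 14, 9),
    DesaturaseSubstrate("Δ6-16:0-ACP desaturase", 16, 6),
    DesaturaseSubstrate("Δ6 enzyme acting on 18:0-ACP (A188G/Y189F mutant)", 18, 6),
]

for case in CASES:
    print(f"{case.label}: {case.carbons_beyond_double_bond()} carbons beyond the double bond")
```

Under the same convention, the last two cases differ by 2 carbons, which is the extra room the A188G/Y189F replacements were predicted to create at the bottom of the channel.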
Directed evolution or molecular breeding...
Gene Silencing - Jie Zhu...
Review & Integration of Plant Metabolic Pathways
Good luck with your future plant biochemistry endeavors!
All materials © 1998, 1999, 2000, 2001, 2003 David Hildebrand, unless otherwise noted.