
Contig Mapping

Strategy. A physical map is the ordering of cosmid clones by their position along a chromosome. Construction of a physical map begins with the creation of an initial, partially ordered collection of clones, which is then edited to create a final map. Editing includes integrating the physical map (ordered clones) with other genetic information, such as ESTs, to verify and, if necessary, correct the physical map under construction. Clones will be ordered by using the chromosomal clones with unique assignments (S) as probes to link with other clones via computerized compression analysis (a chromosome walk), as described in Prade, R.A., Griffith, J., Kochut, K., Arnold, J., and Timberlake, W.E. (1997). PNAS USA, 94, 14564-14569. This ordering will be done under the supervision of the Co-investigator, Dr. Jonathan Arnold, UGA.

a. Chromosomal assignment. Chromosomal assignment can be accomplished in two ways. With a random genomic library, DNA from each of the 16 Pc chromosomes is extracted, made radioactive using random hexamer priming with 32P-CTP, and used to probe the membranes containing the cosmid libraries. Nylon membranes are prehybridized and hybridized overnight using standard procedures. Membranes are exposed to X-ray film for 24-72 hrs and visually examined for positive signals. Cosmids are then assigned to chromosomes and classified as repetitive, hybridizing to all 16 chromosomes (R); hybridizing to a limited set of chromosomes, >1 and <16 (L); or specific to a particular chromosome (S). With this strategy, chromosome-specific clones will be picked and distributed in 96-well microtiter plates to create chromosome libraries (non-ordered). These libraries will then become immediately available to the scientific community through the Fungal Genetics Stock Center to facilitate ongoing studies. Likewise, mini-libraries of repetitive clones (likely containing msg and related genes) and limited repetitive elements will be created. With the second strategy, construction of mini-libraries from eluted chromosomal DNA, most clones created from a chromosome are by definition already assigned to it. However, identification of repetitive and limited repetitive elements will require the same chromosomal labeling strategy used in the first approach.
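
The R/L/S classification above reduces to counting how many chromosome probes light up a given cosmid. The following is a minimal sketch of that rule, not the project's software; the function name `classify_clone`, the representation of hits as a set of chromosome numbers, and the "none" label for clones with no signal are assumptions made for illustration.

```python
from typing import Set

N_CHROMOSOMES = 16  # number of Pc chromosomes stated in the text


def classify_clone(hits: Set[int], n_chromosomes: int = N_CHROMOSOMES) -> str:
    """Classify a cosmid from the set of chromosome probes that gave a signal.

    Returns "R" (repetitive), "L" (limited), "S" (specific), or "none".
    The "none" label is an assumption; the text does not name this case.
    """
    if not hits:
        return "none"  # no positive signal on any chromosome probe
    if len(hits) == n_chromosomes:
        return "R"     # hybridizes to all chromosomes
    if len(hits) == 1:
        return "S"     # specific to a single chromosome
    return "L"         # more than 1 but fewer than all chromosomes


if __name__ == "__main__":
    print(classify_clone({7}))          # a chromosome-specific clone
    print(classify_clone({2, 3, 4}))    # a limited-repeat clone
```

Only clones classified as S would be picked into the chromosome libraries; R and L clones would go to the repetitive and limited-repetitive mini-libraries.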

b. Probe/Clone Hybridization. Two strategies are possible for labeling the specific probes used to construct the physical map: probing with the cosmid insert, or primer extension of the insert ends using the SP6 and T7 bacterial promoters. In our preliminary work, we explored both methods. Use of gel-isolated cosmid inserts was time-consuming and unnecessary in our evaluation. Priming off both ends of the cosmid permitted rapid identification of linking clones as well as clones containing repetitive elements. A commercially available standard mini-prep procedure (Qiagen, Santa Clarita, CA) will be used to isolate cosmid DNA. Each clone will be double stamped in a 5 x 5 array using robotics to reduce false positives. Each end of the isolated cosmid insert will be labeled using the SP6 and T7 promoters and radiolabeled dNTPs and hybridized to the stamped libraries after standard prehybridization conditions. High stringency conditions will be used for hybridization (60-65 °C) and for washes (2X SSC twice for 15 minutes, decreasing to 0.1X SSC at the temperature of hybridization) to increase specificity.

The robotics hardware and compression software available at UGA will hasten this process in two ways: by creating high density stamped membranes (2400 clones), which permit more rapid walking, and by eliminating redundant clones to create a compressed library. The software and robotics will analyze the membranes and select the next probe/clone to be used to link to the next DNA sequence/chromosome, increasing the efficiency of this process. Membranes will be scanned with the Packard Instant Imager and the digital data deposited in a text file for loading into the Fungal Genome Database (25).

It is estimated that with the use of the robotics systems currently available at UGA, about 48 probings will be performed per week for the Pc Genome project. Using the observed rate of progress of the A. nidulans mapping project, a contig map of P. carinii should be generated with robotic assistance in 8 weeks, at a probe density of 1 cosmid probe end per 29 kb. This mapping strategy will be repeated twice for both the rat and human Pc genomes to greatly reduce the frequency of false joins between contigs and to yield a physical map at 13 kb for both organisms. This resolution of the physical map will permit the recovery of almost any DNA fragment by long distance PCR.

c. Cataloguing of assigned clones. Clones will be assigned a call number based on the plate of origin, position in the 4 x 4 array on each membrane, and well number. The hybridization results are then entered according to call number and result (R, L, or S; L results are listed as to the chromosome bands of identity, e.g. 1A12, L-234). Once the project is underway, the results from hybridizations will be automatically fed into the Fungal Genome Data Base (FGDB) at the University of Georgia, which then selects the probes/clones for subsequent screening.

d. Constructing ordered clone banks. After fingerprinting each cosmid in terms of chromosome and hybridization with a panel of probes, the cosmids will then be ordered automatically by comparison of their "digital fingerprints" into a 2-way layout. From an unordered binary data matrix, distances can be computed between each pair of clones. Using this distance matrix, random cost algorithms in the Fungal Genome Data Base (FGDB) will then be used to quickly order clones (in rows) by their position along the chromosome. The transpose of the binary data matrix is then used to order the cosmid probes (in columns) into cells, using the whole cosmid library as probes. The final result is a redundant ordering of the library down the rows and a minimal tiling across the columns. We plan to shift over to a maximum likelihood procedure (Kececioglu, Shete, and Arnold, 2nd International Fungal Genome Workshop, Athens, GA, May 1998) for constructing these chromosome walks from the binary data matrix, once we have finished verifying the code and improved the performance of the ML procedure relative to the random cost procedure.
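
The idea of ordering clones from a binary fingerprint matrix can be illustrated with a toy example. The sketch below computes Hamming distances between clone fingerprints and then picks the clone order with the smallest total distance between neighbors. It is not the FGDB implementation: exhaustive search over permutations stands in for the random cost algorithm and is only feasible for a handful of clones, and the names `hamming`, `order_clones`, and the example fingerprints are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of probes at which two clone fingerprints disagree."""
    return sum(x != y for x, y in zip(a, b))


def order_clones(fingerprints: Dict[str, Sequence[int]]) -> List[str]:
    """Return the clone order minimizing the summed distance between neighbors.

    Exhaustive search is used here purely for illustration (an assumption);
    it replaces the random cost algorithm named in the text.
    """
    names = list(fingerprints)
    dist = {
        (p, q): hamming(fingerprints[p], fingerprints[q])
        for p in names
        for q in names
    }

    def cost(order: Sequence[str]) -> int:
        # Total distance along the walk: sum over adjacent clone pairs.
        return sum(dist[order[i], order[i + 1]] for i in range(len(order) - 1))

    return list(min(permutations(names), key=cost))


# Hypothetical data: rows are clones, columns are probes (1 = hybridizes).
# Clones A, B, C, D overlap in that order but are listed shuffled.
EXAMPLE = {
    "C": [0, 0, 1, 1, 0],
    "A": [1, 1, 0, 0, 0],
    "D": [0, 0, 0, 1, 1],
    "B": [0, 1, 1, 0, 0],
}

if __name__ == "__main__":
    print(order_clones(EXAMPLE))
```

On this toy input the search recovers the order A, B, C, D. The reversed order has the same cost, so the orientation of a contig cannot be determined from the fingerprints alone.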

e. Assessment of contig maps. The bootstrap method will be used to assess the reliability of each linkage in the physical map. With the bootstrap method, random subsets of probes are used to reconstruct the physical map and to ascertain whether or not a clone was linked to (adjacent to) a neighboring clone via the computational methods described above. Multiple rounds of this procedure are performed. The number of times that a linkage appears is scored. The entire process is repeated 1000 times. This produces a measure of the probability that a particular linkage actually exists. For example, if a linkage is only supported by one probe, this linkage will have decreased confidence because it will often not appear in resampling.
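
The effect of resampling on a weakly supported linkage can be seen in a small simulation. In this sketch, probes are resampled with replacement (the usual bootstrap), and two clones count as linked in a replicate if they share at least one positive probe in the resample. That linkage rule is a simplifying assumption standing in for full map reconstruction, and the function name, the seed, and the example fingerprints are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, Sequence, Tuple


def bootstrap_link_support(
    fingerprints: Dict[str, Sequence[int]],
    n_replicates: int = 1000,
    seed: int = 0,
) -> Dict[Tuple[str, str], float]:
    """Fraction of bootstrap replicates in which each clone pair is linked."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    names = sorted(fingerprints)
    n_probes = len(next(iter(fingerprints.values())))
    counts = {pair: 0 for pair in combinations(names, 2)}
    for _ in range(n_replicates):
        # Draw probe columns with replacement, as many as in the original set.
        sample = [rng.randrange(n_probes) for _ in range(n_probes)]
        for a, b in counts:
            # Assumed rule: linked if some resampled probe hits both clones.
            if any(fingerprints[a][j] and fingerprints[b][j] for j in sample):
                counts[(a, b)] += 1
    return {pair: c / n_replicates for pair, c in counts.items()}


# Hypothetical data: A-B share three probes, B-C share one, A-C share none.
EXAMPLE = {
    "A": [1, 1, 1, 1, 0],
    "B": [1, 1, 1, 0, 1],
    "C": [0, 0, 0, 0, 1],
}

if __name__ == "__main__":
    for pair, support in bootstrap_link_support(EXAMPLE).items():
        print(pair, round(support, 3))
```

With five probes, a linkage resting on a single probe is lost whenever that probe is missing from the resample, which happens with probability (4/5)^5, about 0.33, so its support settles near 0.67. The linkage resting on three probes is lost only with probability (2/5)^5, about 0.01.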

f. Confirmation of contig order. Weak links flagged by bootstrap resampling or the rules of thumb of Arratia et al. will initiate a visual re-examination of the images stored in the Packard Instant Imager originally read by the robotics system. Weak links that were not due to mistakes at this level will be tested by additional hybridization studies. Colonies for probes used to map a chromosome will be re-picked and their DNAs arrayed on membranes for hybridization. Cosmid mini-preps of pools of clones from the ends of the same inferred contig will be hybridized to the arrayed probe clones spotted onto membranes to confirm contig boundaries. If more than one positive shows up on the membrane, then the gap is closed. Repeats located within the middle of one probe and at the end of another would account for "nonreciprocal hybridization". That is, a probe used early in the mapping experiment does not hybridize to a later probe, but the later probe is found to hybridize with the earlier probe. This would arise if the repeat is located in the middle of the early probe and at the end of the late probe. These nonreciprocal hybridizations will be flagged and sidestepped where possible by choosing another S-clone that hybridizes to the late probe and does not hybridize to the early probe (i.e., misses the repeat).
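
Flagging nonreciprocal hybridizations amounts to looking for asymmetric entries in the probe-by-clone results. The following is a minimal sketch under assumed conventions: results are held in a nested dictionary `hyb[probe][target]` with True for a positive signal, a pair is flagged only when both directions were actually tested, and the function and clone names are hypothetical.

```python
from typing import Dict, List, Tuple


def nonreciprocal_pairs(hyb: Dict[str, Dict[str, bool]]) -> List[Tuple[str, str]]:
    """Return (x, y) pairs where probe x misses clone y but probe y hits clone x."""
    flagged = []
    for probe, targets in hyb.items():
        for target, hit in targets.items():
            if probe == target:
                continue  # a clone trivially hybridizes to itself
            # None means the reverse experiment was not recorded.
            reverse = hyb.get(target, {}).get(probe)
            if hit and reverse is False:
                flagged.append((target, probe))
    return sorted(flagged)


# Hypothetical results: "early" as probe misses "late", but "late" hits "early".
EXAMPLE = {
    "early": {"late": False, "other": True},
    "late": {"early": True, "other": False},
    "other": {"early": True, "late": False},
}

if __name__ == "__main__":
    print(nonreciprocal_pairs(EXAMPLE))
```

Each flagged pair marks a link that may rest on a repeat, so it is a candidate for replacement by another S-clone as described above.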

g. Data storage, retrieval and distribution. All data are being made available in 5 modes by: (i) anonymous ftp from a server FUNGUS.GENETICS.UGA.EDU; (ii) World Wide Web at; (iii) X-Windows connection directly to the FGDB; (iv) by remote connection to FGDB through client software on PCs and Macintoshes, such as ODS; and (v) Genetic Maps. The database system is being constantly refined and will be coupled to an Internet server to the WWW.

As the walk progresses, the software will provide a "real-time" diagram of the chromosomal physical map, updated via the FGDB.

From the original 8-10-fold redundant library 1, a 1.5 genome equivalent library of approximately 300 clones will be created as an in vitro construction of the entire Pc genome, contained in three 96-well microtiter plates, which will be made available to the scientific community at large.

Chuck Staben 07/13/99