
LAB6

The objectives of this lab are:

  1. Learn to use basic BLAST tools to identify similar polypeptides and DNA sequences.

  2. Align 2 sequences to one another and evaluate the statistical significance of such alignments.

  3. Align multiple sequences (polypeptides).

  4. Interpret multiple alignments in terms of protein functional regions.


You have completed the DNA sequence of a segment of genomic DNA from Drosophila melanogaster, a fruit fly.  This DNA sequence is presented to you in a standard DNA sequence format in a file titled lab6.tfa.  This genomic region encodes a polypeptide critical to basal transcription and therefore to cell function.  Perform the searches and analyses described below.  Do each of these in the EASIEST way possible, with no restrictions unless otherwise stated.

1.  Determine the regions encoding the polypeptide product from this gene.  Prepare a 1-line description that defines the coding sequence (e.g., CDS 5-77, 137-737).  This will be submitted.
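
A CDS line in the format above can be sanity-checked by splicing the listed segments together and confirming that the result reads as an open reading frame. The following is a minimal sketch, not a required method: the function names (`parse_cds`, `splice`, `looks_like_orf`) and the toy sequence are made up for illustration, and the coordinates are assumed to be 1-based and inclusive.

```python
import re

def parse_cds(description):
    """Parse a line such as 'CDS 5-77, 137-737' into (start, end) pairs.

    Coordinates are assumed to be 1-based and inclusive.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\s*-\s*(\d+)", description)]

def splice(genomic, segments):
    """Concatenate the listed segments of a genomic sequence."""
    return "".join(genomic[start - 1:end] for start, end in segments)

def looks_like_orf(cds):
    """Check for whole codons, an ATG start, and a single stop codon at the end."""
    cds = cds.upper()
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    stops = {"TAA", "TAG", "TGA"}
    return codons[-1] in stops and not any(c in stops for c in codons[:-1])

# Toy genomic sequence invented for this example (NOT lab6.tfa):
# two exons in upper case separated by a lower-case intron.
toy = "ccATGGCTgtaagtcagGGATAAtt"
segments = parse_cds("CDS 3-8, 18-23")
spliced = splice(toy, segments)
print(segments)                 # [(3, 8), (18, 23)]
print(spliced)                  # ATGGCTGGATAA
print(looks_like_orf(spliced))  # True
```

A description that fails this check is not necessarily wrong (the segment may be a partial gene, for instance), but a failure is a good reason to re-examine the exon boundaries.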

2.  Perform a search, using an appropriate BLAST tool, to find 100 polypeptides similar in sequence to the product of this gene that are contained in the nr polypeptide database.  I recommend that you save the result of this search to disk as you will need it for several questions.  Submit the gi number of the highest-scoring polypeptide that IS NOT the given polypeptide.

3.  One high-scoring polypeptide that you SHOULD find in this alignment is from Saccharomyces cerevisiae.  For this polypeptide, perform the following using GCG programs:
    a.  Align the Drosophila and Saccharomyces polypeptides along their lengths.  EVALUATE the statistical significance of this alignment.  Submit a 1-line "decision" on whether this alignment is significant or not and the basis.
    b.  Align the DNAs of the ORFs encoding these polypeptides along their lengths.  EVALUATE the statistical significance of this alignment.  Submit a 1-line "decision" on whether this alignment is significant or not and the basis.
    c.  Find the single segment of highest similarity between these two ORFs and EVALUATE the statistical significance of this alignment.  Compare the length of this segment to that of the alignment in 3b, above.  Submit a 1-line "decision" on whether this alignment is significant or not and the basis.  Include a very brief, numerical comment on the relative lengths of the alignments in parts 3b and 3c.
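
One common way to judge whether an alignment score is significant is to compare it with the scores obtained after shuffling one of the sequences, which keeps composition and length but destroys the order. The sketch below illustrates that idea in plain Python. It is not the GCG implementation: the scoring values (match +1, mismatch -1, gap -2), the linear gap penalty, and the function names are assumptions chosen to keep the example short.

```python
import random
import statistics

def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def shuffle_z_score(a, b, trials=100, seed=0):
    """Z-score of the real score against scores for shuffled copies of b."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    real = global_score(a, b)
    letters = list(b)
    background = []
    for _ in range(trials):
        rng.shuffle(letters)
        background.append(global_score(a, "".join(letters)))
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        return float("nan")  # no spread in the background; z-score undefined
    return (real - mean) / sd

# Toy sequences invented for this example.
seq1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
seq2 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(shuffle_z_score(seq1, seq2))
```

The real score should stand many standard deviations above the shuffled background when the similarity is genuine. For a local alignment, as in part c, the same shuffling test applies with a local (Smith-Waterman style) score in place of the global one.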

4.  Use the multiple alignment package of your choice (specify which you used) to align the similar polypeptides that you should find from:

  • Drosophila melanogaster
  • Homo sapiens
  • Xenopus laevis
  • Arabidopsis thaliana
  • Saccharomyces cerevisiae

    a.    Save this alignment in some text format.  You will submit this alignment.

Now add the archaebacterial sequence from Sulfolobus shibatae to your alignment.

    b.    Save this alignment in text form for submission.

5.  The 3D structure of the human protein has been determined.  Use RasMol to examine the structure of the human polypeptide.
        In RasMol, represent the protein in backbone and the DNA in spacefill modes.  Generate 2 different figures:
                a.   A .gif file, named yourname6a.gif,  that shows those regions of the polypeptide backbone conserved in the eukarya from part 4a. (Colored distinctively)
                b.   A .gif file, named yourname6b.gif, that shows those regions of the polypeptide conserved in ALL  of the sequences (even Archaea) from part 4b.

                c.   You should make a 1 or 2 sentence conclusion about the differences that you observe and their locations in the polypeptides.  Include a 1-sentence statement that includes an example of a NONCONSERVATIVE amino acid substitution in one of the conserved regions and the effect that you expect this would have on the mutant polypeptide.
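
Picking out the conserved regions for parts 5a and 5b amounts to finding the alignment columns in which every sequence carries the same residue. The sketch below shows one way to do that for an alignment that has already been read into a dictionary of equal-length rows. The function names and the toy rows are made up for illustration, and "conserved" is taken here to mean strictly identical, which is a stricter definition than many alignment viewers use.

```python
def conserved_columns(rows):
    """Return 1-based alignment columns where all rows share one residue.

    Columns containing a gap character ('-' or '.') are never reported.
    """
    seqs = [s.upper() for s in rows.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [i + 1 for i, col in enumerate(zip(*seqs))
            if len(set(col)) == 1 and col[0] not in "-."]

def to_ranges(columns):
    """Collapse sorted column numbers into (start, end) runs."""
    ranges = []
    for c in columns:
        if ranges and c == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], c)
        else:
            ranges.append((c, c))
    return ranges

# Toy aligned rows invented for this example.
eukarya = {
    "Dm": "TYGACFLLLVVCAG",
    "Hs": "TYGACFLAVCACAG",
    "Sc": "TYGACFLALSATAG",
}
with_archaea = dict(eukarya, Ss="TYPPCFLLLVVCAG")

print(to_ranges(conserved_columns(eukarya)))       # [(1, 7), (13, 14)]
print(to_ranges(conserved_columns(with_archaea)))  # [(1, 2), (5, 7), (13, 14)]
```

Alignment column numbers are not residue numbers: before coloring anything in RasMol, each column has to be mapped back to the residue numbering of the human structure, skipping any gaps in the human row.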


What does the ideal submission look like?

1.  CDS 45-135, 800-1100
2.  gi=123456
3. a. Alignment is NOT significant because (and this is NOT a good reason) the aligned regions are shorter than 10 amino acids
    b.  Alignment IS significant because (and this is NOT a good reason) the aligned regions are divisible by 3.
    c.  Alignment IS NOT significant because (and this is NOT a good reason) I did it on Tuesday, which was an odd-numbered date.
4. Each of these outputs will be about 2 pages of text.

I used INSPIREALIGN, a method in which the alignment magically appears projected on the inner surface of one's eyelids.

a.

Dm    TYGACFlllvvcAG
Hs    TYGACFlavcacAG
Xl    TYGACFvavcacAG
Hs    TYGACFlavcatAG
At    TYGACFlavcatAG
Sc    TYGACFlalsatAG

  b.

Dm    TYgaCFlllvvcAG
Hs    TYgaCFlavcacAG
Xl    TYgaCFvavcacAG
Hs    TYgaCFlavcatAG
At    TYgaCFlavcatAG
Sc    TYgaCFlalsatAG
Ss    TYppCFlllvvcAG

5.  Two beautiful pictures and one clear statement

A.  [figure: ex5a.GIF]

B.  [figure: ex5b.GIF]

C.  The more diverse set of polypeptides conserves only a subset of the amino acids conserved in eukarya because this region is most likely the part of the molecule that binds to frizzem-frazzem, the only substrate in common with all 7 polypeptides.  Changing ARG273 to proline would cause a kink in the polypeptide that would not allow it to bind frizzem-frazzem any more.
 

University of Kentucky || Morgan School of Biological Sciences || NSF-CCD Support || Chuck Staben, copyright reserved || 10/01/98