BCH 401G Lecture Notes: Transcription (RNA Synthesis) Andres

Central Dogma of Molecular Biology:

How does the sequence of a strand of DNA correspond to the amino acid sequence of a protein? This concept is explained by the central dogma of molecular biology, which states that:

Flow of genetic information in normal cells:

(--<-->---) DNA ---- Transcription ----> RNA ---- Translation ----> Protein


Why would a cell want to have an intermediate between DNA and the protein it encodes?

The DNA can stay pristine and protected, away from the caustic chemistry of the cytoplasm.

Gene information can be amplified by having many copies of an RNA made by one copy of DNA.

Regulation of gene expression can be effected by having specific controls at each element of the pathway between DNA and proteins. The more elements there are in the pathway, the more opportunities there are to control it in different circumstances.

We will now look at the transcription of DNA to RNA.

-We will mainly discuss the process in bacteria.

-Transcription in eukaryotic cells is similar - will discuss some differences.

Transcription: Information stored in the sequence of DNA is read. Mechanistically, transcription is similar to DNA replication: it uses nucleoside triphosphates and template-directed chain growth in the 5' to 3' direction.

2 major differences:

1) Only one DNA strand is used as the template (a single-stranded polyribonucleotide chain is synthesized).

2) Only a small fraction of the total genetic potential of an organism is used in any one cell.

The reaction is thermodynamically favorable: Hydrolysis of the terminal phosphoanhydride bond of a nucleoside triphosphate yields 13 kJ/mol more energy than is necessary for formation of a phosphodiester linkage within the RNA backbone (recall our discussion in earlier lectures).

Classes of RNA:

1) Messenger RNA (mRNA): carries information which will be translated into protein sequence by ribosomes. 3% of total RNA in bacteria is mRNA.

2) Ribosomal RNA (rRNA): part of the ribosome which is involved in the translation of mRNA to protein. 83% of bacterial total RNA.

3) Transfer RNA (tRNA): involved in the translation of mRNA into protein. 14% of total bacterial RNA.

4) Small RNAs: Involved in the modification of some RNAs after transcription. <1% of total RNA.

Define a new term: GENE

A gene is any portion of the DNA sequence that is transcribed into RNA. Genes can be transcribed into any of the classes of RNA that we just discussed.

Terminology and numbering of Gene Sequences:

1). DNA is indicated in a 5' to 3' direction along its top (or coding) strand and 3' to 5' along the bottom (TEMPLATE or noncoding) strand.



If this DNA sequence is capable of being transcribed to RNA, the sequence would be termed a "gene" and the RNA would be written as the 5' to 3' TOP or CODING strand sequence.

Coding Strand: Identical in sequence to the RNA transcript, except that the RNA has U where the DNA has T.

Template Strand: Serves as the template for making the RNA transcript and is complementary to the RNA transcript.
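The strand relationships can be checked with a short sketch. The sequence is made up for illustration and the function names are hypothetical; the only rules used are Watson-Crick pairing and the use of U in RNA where DNA has T.

```python
# Base-pairing rules. Reading a DNA template base gives the RNA base paired to it.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3):
    """Return the template strand written 3'->5', aligned under the coding strand."""
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3)


def transcribe(template_3to5):
    """Read the template 3'->5' and build the RNA transcript 5'->3'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


coding = "ATGGCATTC"                      # made-up coding strand, 5'->3'
template = template_from_coding(coding)   # written 3'->5'
rna = transcribe(template)                # written 5'->3'

print("coding   5'", coding, "3'")
print("template 3'", template, "5'")
print("RNA      5'", rna, "3'")
```

The printed RNA matches the coding strand base for base once T is read as U, which is why gene sequences are conventionally written as the coding strand.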

2). Numbering system

Transcription Start Site: The nucleotide in the DNA coding strand corresponding to the first nucleotide of the transcribed RNA is numbered +1.

Nucleotides to the right of the start site (+1), toward the 3' end on the coding strand, are indicated by increasing positive numbers (+2, +3, +4, +5, etc.).

Nucleotide directly to the left of the +1 nucleotide (start site) is defined as -1, and the next is -2, -3, etc. There is no zero between -1 and +1.

3). Promoter Sequences: Each gene has sequences which are important for controlling its expression. These are termed "promoter sequences."

Usually found at the 5' end of the gene, relative to the coding strand.

In the numbering system, these promoter sequences have negative numbers.
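The numbering convention, including the missing zero, can be sketched as a conversion between ordinary 0-based string indices and gene positions. The helper names and the start-site index of 40 are hypothetical, chosen only for illustration.

```python
def position_label(index, start_index):
    """Convert a 0-based index on the coding strand to gene numbering (no zero)."""
    offset = index - start_index
    # The start site itself is +1; everything to its left is negative.
    return offset + 1 if offset >= 0 else offset


def index_from_label(label, start_index):
    """Convert a gene position such as -35 or +1 back to a 0-based index."""
    if label == 0:
        raise ValueError("there is no position 0 in this numbering")
    return start_index + (label - 1 if label > 0 else label)


start_index = 40  # hypothetical index of the transcription start site
labels = [position_label(i, start_index) for i in range(37, 43)]
print(labels)  # positions around the start site, skipping zero

# Promoter elements carry negative numbers, so they lie to the left of +1.
print(index_from_label(-10, start_index), index_from_label(-35, start_index))
```

Positions -1 and +1 are adjacent nucleotides, so simple subtraction of position numbers overstates distances that span the start site by one.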

Enzymology of RNA Synthesis: RNA POLYMERASE

A single RNA polymerase functions in bacteria.

In eukaryotes, three distinct RNA polymerases divide the work, each responsible for synthesizing a different class of RNA: rRNA, mRNA, and small RNAs.

DNA polymerase and RNA polymerase Catalyze Similar Reactions:

Vmax DNA pol III 500-1000 nucleotides/sec

Vmax RNA pol. 50 nuc./sec

There are about 10 molecules of DNA pol. per cell, compared with about 3000 molecules of RNA polymerase (~50% involved in making RNA at any one time). DNA replication is fast but initiates at only a few sites; RNA transcription is slow but initiates at many sites, so RNA accumulates to high levels.
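A rough back-of-the-envelope comparison, using only the numbers quoted above, shows why total RNA synthesis can outpace DNA synthesis. The sketch assumes, generously, that every DNA polymerase molecule is working at Vmax at the same time, so the DNA figure is an upper bound; the variable names are illustrative.

```python
# Figures quoted in the notes.
dna_pol_per_cell = 10
dna_pol_rate = (500, 1000)        # nucleotides/sec per enzyme (low, high)
rna_pol_per_cell = 3000
rna_pol_active_fraction = 0.5     # ~50% transcribing at any one time
rna_pol_rate = 50                 # nucleotides/sec per enzyme

# Upper bound on total DNA synthesis: all enzymes active at Vmax (an assumption).
dna_capacity = tuple(dna_pol_per_cell * rate for rate in dna_pol_rate)

# Total RNA synthesis from the active half of the RNA polymerase pool.
rna_capacity = rna_pol_per_cell * rna_pol_active_fraction * rna_pol_rate

print("DNA synthesis, nt/sec (low, high):", dna_capacity)
print("RNA synthesis, nt/sec:", rna_capacity)
```

Although each RNA polymerase is 10 to 20 times slower than DNA pol III, the much larger number of active enzymes gives a higher total rate of polymerization under these assumptions.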

RNA polymerase is highly processive (like DNA pol.). So once initiated, it will not dissociate until a specific termination signal is received.

See Figure 27.4

Another difference is that RNA polymerase is much less accurate.

RNA Polymerase is an Oligomeric Protein:

Five different protein subunits make up RNA polymerase in bacteria: 2 copies of α, and one each of β, β', σ, and ω.

The sigma subunit can be removed from RNA polymerase and leave the rest of the complex intact.

Can then test the binding affinity of the entire complex and the Core complex (lacking sigma) for general DNA and "Promoter" DNA (which contains -10 and -35 consensus sequences).

Kassoc. values for:          Any DNA           Promoter DNA Sequence

RNA polymerase (- sigma)     1 x 10^10 M^-1    1 x 10^10 M^-1

RNA polymerase (+ sigma)     5 x 10^6 M^-1     2 x 10^11 M^-1

Sigma Factor does two things:

1). Decreases affinity of RNA polymerase for general DNA (about 2000-fold, i.e. between 3 and 4 orders of magnitude).

2). Increases affinity of RNA polymerase for promoter DNA (about 20-fold, i.e. roughly 1 order of magnitude).
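The two effects of sigma can be worked out directly from the association constants in the table. This is only arithmetic on the quoted values; the dictionary keys ("core" for polymerase without sigma, "holo" for polymerase with sigma) are labels chosen for this sketch.

```python
import math

# Association constants from the table, in M^-1.
K_ASSOC = {
    ("core", "any"): 1e10,
    ("core", "promoter"): 1e10,
    ("holo", "any"): 5e6,
    ("holo", "promoter"): 2e11,
}

# Effect 1: loss of affinity for general DNA when sigma is present.
nonspecific_drop = K_ASSOC[("core", "any")] / K_ASSOC[("holo", "any")]

# Effect 2: gain of affinity for promoter DNA when sigma is present.
promoter_gain = K_ASSOC[("holo", "promoter")] / K_ASSOC[("core", "promoter")]

# Preference for promoter over general DNA, with and without sigma.
holo_specificity = K_ASSOC[("holo", "promoter")] / K_ASSOC[("holo", "any")]
core_specificity = K_ASSOC[("core", "promoter")] / K_ASSOC[("core", "any")]

print(f"general DNA affinity drops {nonspecific_drop:.0f}-fold "
      f"({math.log10(nonspecific_drop):.1f} orders of magnitude)")
print(f"promoter affinity rises {promoter_gain:.0f}-fold")
print(f"promoter preference: core {core_specificity:.0f}x, holoenzyme {holo_specificity:.0f}x")
```

Without sigma the enzyme cannot tell promoter DNA from any other DNA; with sigma it prefers promoters by more than four orders of magnitude, most of which comes from the weakened binding to general DNA.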

Function of the sigma subunit is to interact with the -10 and -35 consensus sequences so that polymerase can bind to the promoter region of genes.


1). Binding of RNA polymerase to Promoter Sequences:

In E. coli there are two regions that are similar in all promoters. One sequence is centered at -10 and the other at -35 relative to the transcriptional start site at +1.

For -10 and -35 sequence, can identify nucleotides that are usually found at each position in these promoters.

Called "Consensus Sequence".

For -10 region the consensus sequence is 5' TATAAT 3', often called the "TATA" box (or Pribnow box) for this reason.

For -35 region the consensus is 5' TTGACA 3'.

The nucleotide at the transcriptional start site is almost ALWAYS A PURINE (A or G), most often an Adenine.
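One way to make the idea of a consensus sequence concrete is to count how many positions of a candidate promoter agree with it. The candidate sites below are invented for illustration, and the scoring (a plain count of matching positions) is a simplification assumed for this sketch rather than a measure of real promoter strength.

```python
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"


def count_matches(site, consensus):
    """Count positions where a candidate site agrees with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site, consensus))


def describe_promoter(minus35_site, minus10_site, start_base):
    """Summarize how closely a candidate promoter follows the consensus rules."""
    return {
        "minus35_matches": count_matches(minus35_site, MINUS_35_CONSENSUS),
        "minus10_matches": count_matches(minus10_site, MINUS_10_CONSENSUS),
        "purine_start": start_base in ("A", "G"),
    }


# Hypothetical promoter: one mismatch in each element, A at position +1.
example = describe_promoter("TTGACT", "TATGAT", "A")
print(example)
```

Few real promoters match the consensus at every position; the consensus is simply the most common nucleotide at each position across many promoters.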

Promoter recognition is a critical step in transcription, for regulation as well as mechanism, because it is a rate-limiting step. Because all genes are transcribed by the same protein complex, differences in promoter structure must be largely responsible for differences in the frequency of initiation (from as often as once every 10 sec to as rarely as once per generation, 30-60 min).

SEE Figure 27.7

How does RNA polymerase find promoter DNA sequences?

RNA polymerase binds to DNA at random sites and moves quickly along the DNA while the sigma factor scans for promoter sequences.

Once a promoter is reached, sigma subunit binds promoter sequences with high affinity and prevents the polymerase from scanning any further.

Why use a scanning mechanism? Because it is much faster than a random association/dissociation search, which is diffusion controlled and therefore a second-order reaction (maximum rate 10^8 M^-1 s^-1).

The scanning scheme is essentially first order and has a rate constant of 10^10 M^-1 s^-1. This is two orders of magnitude faster than a bind/release search.

2). Initiation of Transcription.

See Figure 27.6

A). RNA polymerase associates with promoter sequences near the +1 Transcription start site.

This is called a "CLOSED PROMOTER COMPLEX" because the DNA at the Transcription start site is still double stranded.

See Figure 27.12 and 27.6

B). Polymerase then unwinds the DNA at the Transcription start site to make it single-stranded.

This complex is termed the "Open Complex" because DNA is unwound.

About 17 base-pairs of DNA are unwound, forming a "Transcription Bubble".

RNA polymerase now starts to synthesize the RNA transcript.

RNA polymerase has two binding sites for ribonucleoside triphosphates. The FIRST, used during elongation, binds all 4 common ribonucleoside triphosphates with a half-saturating concentration of 10 µM.

The SECOND, used for initiation, binds ATP and GTP preferentially, with a half-saturating concentration of 100 µM. Thus, most RNAs have a purine at the 5' end.
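The meaning of a half-saturating concentration can be illustrated with a simple binding curve. The sketch assumes plain single-site hyperbolic binding, fraction occupied = c / (K + c), which is a simplification introduced here; concentrations are in the same units as the two half-saturating values quoted above (10 for the elongation site, 100 for the initiation site).

```python
ELONGATION_HALF_SAT = 10.0    # half-saturating concentration, elongation site
INITIATION_HALF_SAT = 100.0   # half-saturating concentration, initiation site


def fraction_occupied(concentration, half_saturation):
    """Fraction of sites filled, assuming simple hyperbolic binding."""
    return concentration / (half_saturation + concentration)


for c in (1.0, 10.0, 100.0, 1000.0):
    elong = fraction_occupied(c, ELONGATION_HALF_SAT)
    init = fraction_occupied(c, INITIATION_HALF_SAT)
    print(f"[NTP] = {c:6.0f}   elongation site {elong:.2f}   initiation site {init:.2f}")
```

At any given nucleotide concentration the initiation site is less fully occupied than the elongation site, so under this model starting a chain is more sensitive to nucleotide supply than extending one.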

Chain growth begins with binding of the template specified rNTP at the initiation site, followed by binding of the next nucleotide at the elongation site. Next, nucleophilic attack by the 3' hydroxyl of the first nucleotide on the α (inner) phosphorus of the second nucleotide generates the first phosphodiester bond and leaves an intact triphosphate at the 5' position of the first nucleotide.

RNA polymerase moves in 5' to 3' direction (relative to coding strand) and continues synthesizing RNA off the DNA template strand.

"Transcription Bubble" moves down the DNA helix in concert with the new synthesis.

Within the "Bubble" only 12 nucleotides of the DNA template strand are base-paired with the RNA strand at any time.

Called the "RNA:DNA hybrid".

See Figure 27.13

As each new nucleotide is incorporated, one base-pair of the RNA:DNA hybrid at the other end of the transcription bubble has to dissociate.
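The moving RNA:DNA hybrid can be pictured as a fixed-length window at the 3' end of the growing transcript. The sketch below uses the 12-nucleotide hybrid length from the notes; the transcript sequence and the function name are made up for illustration.

```python
HYBRID_LENGTH = 12  # nucleotides of RNA paired to the template at any time


def split_transcript(rna, hybrid_length=HYBRID_LENGTH):
    """Return (released 5' portion, 3' portion still paired to the template)."""
    paired = rna[-hybrid_length:]
    released = rna[:len(rna) - len(paired)]
    return released, paired


transcript = "AUGGCAUUCGGAUCCAUG"  # made-up transcript, 5'->3'

# Grow the chain one nucleotide at a time and watch the window move.
for length in range(10, len(transcript) + 1):
    released, paired = split_transcript(transcript[:length])
    print(f"{length:2d} nt made: {len(released)} released, {len(paired)} in hybrid")
```

Once the transcript is longer than the hybrid, every added nucleotide at the 3' end is matched by one nucleotide peeling away from the template at the 5' side of the hybrid.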

3). Termination of Transcription.

RNA transcripts are not infinitely long. There are two ways in which termination of transcription is known to occur.

First, let's talk about pausing:

RNA polymerase can pause during transcription. Pausing occurs at sequences rich in G/C base-pairs.

It is difficult to disrupt stable G/C base-pairs to allow formation of the transcription bubble and to release the RNA:DNA hybrid.

Pausing can last from 10 seconds to 30 minutes.

Two Major Mechanisms of Transcription Termination: Simple and Rho-dependent.

1). SIMPLE: Some termination sites share two structural features:

A). Two symmetrical G/C-rich sequences that in the transcript have the potential to form a stem-loop structure.

B). A downstream run of four to eight A residues.

see Figure 27.16 and 27.17

RNA polymerase pauses at the first G/C-rich region. This allows the second G/C-rich region of the RNA transcript to base-pair with the first region, forming an RNA:RNA stem-loop duplex and eliminating some of the base-pairing between template and transcript. Further weakening, leading to dissociation, occurs when the A-rich region is transcribed to give a series of very weak A-U base pairs.

Once the RNA has dissociated from the enzyme complex, the DNA bubble is lost, and without the sigma factor the RNA polymerase does not have sufficient affinity to remain bound to the DNA duplex. It releases the DNA, reassociates with the sigma protein, and searches for a new promoter sequence.
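The two features of a simple terminator can be searched for in a transcript with a small scanning routine. This is a minimal sketch under stated assumptions, not a real terminator-prediction method: the stem length, loop sizes, G/C threshold, and the requirement that the U run follow the stem immediately are all parameters chosen for illustration, and the test sequence is invented.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that would pair with rna in an antiparallel stem."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def find_simple_terminator(rna, stem_len=6, loop_sizes=range(3, 9),
                           min_u_run=4, min_gc=0.6):
    """Return (stem_start, loop_len) of the first G/C-rich hairpin that is
    immediately followed by a run of U residues, or None if there is none."""
    for start in range(len(rna) - 2 * stem_len):
        left = rna[start:start + stem_len]
        if gc_fraction(left) < min_gc:
            continue  # stem is not G/C-rich enough
        for loop_len in loop_sizes:
            right_start = start + stem_len + loop_len
            right = rna[right_start:right_start + stem_len]
            if len(right) < stem_len:
                break  # ran off the end of the transcript
            if right != reverse_complement(left):
                continue  # the two arms cannot form a stem
            tail = rna[right_start + stem_len:right_start + stem_len + min_u_run]
            if tail == "U" * min_u_run:
                return start, loop_len
    return None


# Made-up transcript: G/C-rich stem, 4-nt loop, matching arm, then a U run.
transcript = "CC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU" + "GA"
print(find_simple_terminator(transcript))
```

The run of A residues described in the notes lies in the DNA template, so in the transcript it appears as the run of U residues that the search looks for.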

2). Rho Mediated: Factor-dependent termination is rarer. The Rho protein is necessary for termination of transcription of these genes.

see Figure 27.18

3 Steps:

1). Polymerase pauses.

2). Rho protein recognizes and binds to a specific sequence in the RNA transcript and then binds to approximately 80 additional nucleotides of RNA at the pause site.

3). In an ATP-dependent process, Rho protein migrates toward the 3' end of the transcript, displacing the RNA polymerase and disrupting the RNA:DNA hybrid, which terminates transcription.

Differences in RNA transcription between eukaryotes and prokaryotes:

1). Only one RNA polymerase in E. coli. There are three RNA polymerases in eukaryotes.

2). In eukaryotes, most promoters direct transcription of only one gene. In bacteria, several genes are often transcribed from a single promoter. As we discussed, this type of transcriptional unit is called an "Operon".

Gene A Gene B Gene C



3). Eukaryotic Polymerases require additional protein factors (Transcription Factors) to bind to a promoter and initiate transcription.

4). Eukaryotic RNA polymerases must pass through the nucleosomes that are found on all chromatin.

5). Eukaryotic RNA polymerases do not have terminator signals; rather, they proceed well past the coding region and into the 3' noncoding region of genes. Enzymes later process this RNA molecule extensively in a series of reactions that we will discuss (capping, splicing, editing).