Transcription, Translation and Replication
DNA, RNA and protein synthesis
The genetic material is stored in the form of DNA in most organisms. In humans, the nucleus of each cell contains 3 × 10⁹ base pairs of DNA distributed over 23 pairs of chromosomes, and each cell has two copies of the genetic material. This is known collectively as the human genome. The human genome contains around 30 000 genes, each of which codes for one protein.
Large stretches of DNA in the human genome are transcribed but do not code for proteins. These regions are called introns and make up around 95% of the genome. The nucleotide sequence of the human genome is now known to a reasonable degree of accuracy but we do not yet understand why so much of it is non-coding. Some of this non-coding DNA controls gene expression but the purpose of much of it is not yet understood. This is a fascinating subject that is certain to advance rapidly over the next few years.
The Central Dogma of Molecular Biology states that DNA makes RNA makes proteins (Figure 1).
Figure 1 | The Central Dogma of Molecular Biology: DNA makes RNA makes proteins
The process by which DNA is copied to RNA is called transcription, and that by which RNA is used to produce proteins is called translation.
Each time a cell divides, each of its double strands of DNA splits into two single strands. Each of these single strands acts as a template for a new strand of complementary DNA. As a result, each new cell has its own complete genome. This process is known as DNA replication. Replication is controlled by the Watson-Crick pairing of the bases in the template strand with incoming deoxynucleotide triphosphates, and is directed by DNA polymerase enzymes. It is a complex process, particularly in eukaryotes, involving an array of enzymes. A simplified version of bacterial DNA replication is described in Figure 2.
Figure 2 | DNA replication in bacteria: a simplified representation.
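The template-directed copying described above can be sketched in a few lines of Python. This is a minimal illustration that models only Watson-Crick pairing and strand polarity; the function names and the example sequence are hypothetical, and none of the enzymology is represented.

```python
# Watson-Crick partners for the four DNA bases.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_complement(template_5to3):
    """Return the new strand, written 5'->3', that pairs with the template."""
    # The new strand runs antiparallel to its template, so pair each base
    # and read the result in the opposite direction.
    return "".join(DNA_PAIRS[base] for base in reversed(template_5to3))

def replicate(strand_a):
    """Split a duplex and copy each strand, giving two daughter duplexes."""
    strand_b = synthesize_complement(strand_a)  # the original partner strand
    daughter_1 = (strand_a, synthesize_complement(strand_a))
    daughter_2 = (strand_b, synthesize_complement(strand_b))
    return daughter_1, daughter_2

parent = "ATGGCCTTAGCA"  # hypothetical sequence, not taken from the text
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)
print(daughter_2)
```

Each daughter duplex contains one parental strand and one newly made strand, which is why every new cell ends up with a complete copy of the genome.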
DNA biosynthesis proceeds in the 5′- to 3′-direction. This makes it impossible for DNA polymerases to synthesize both strands simultaneously. A portion of the double helix must first unwind, and this is mediated by helicase enzymes.
The leading strand is synthesized continuously but the opposite strand is copied in short bursts of about 1000 bases, as the lagging strand template becomes available. The resulting short strands are called Okazaki fragments (after their discoverers, Reiji and Tsuneko Okazaki). Bacteria have at least three distinct DNA polymerases: Pol I, Pol II and Pol III; it is Pol III that is largely involved in chain elongation. Strangely, DNA polymerases cannot initiate DNA synthesis de novo, but require a short primer with a free 3′-hydroxyl group. This is produced in the lagging strand by an RNA polymerase (called DNA primase) that is able to use the DNA template and synthesize a short piece of RNA around 20 bases in length. Pol III can then take over, but it eventually encounters one of the previously synthesized short RNA fragments in its path. At this point Pol I takes over, using its 5′- to 3′-exonuclease activity to digest the RNA and fill the gap with DNA until it reaches a continuous stretch of DNA. This leaves a gap between the 3′-end of the newly synthesized DNA and the 5′-end of the DNA previously synthesized by Pol III. The gap is filled by DNA ligase, an enzyme that makes a covalent bond between a 5′-phosphate and a 3′-hydroxyl group (Figure 3). The initiation of DNA replication at the leading strand is more complex and is discussed in detail in more specialized texts.
Figure 3 | DNA polymerases in DNA replication: simplified representation of the action of DNA polymerases in DNA replication in bacteria.
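As a rough bookkeeping aid, the lagging-strand steps above can be modelled as a toy calculation. The sketch below assumes fixed fragment and primer lengths (about 1000 and 20 bases, the approximate figures quoted in the text) and simply counts the fragments, the primers that Pol I must replace, and the joins that DNA ligase must make. The function names are hypothetical and real fragment lengths vary.

```python
def lagging_strand_fragments(template_length, fragment_length=1000, primer_length=20):
    """Toy model: list (start, end, primer_end) for each Okazaki fragment.

    Coordinates are positions along the lagging-strand template; each
    fragment begins with an RNA primer covering [start, primer_end).
    """
    fragments = []
    start = 0
    while start < template_length:
        end = min(start + fragment_length, template_length)
        primer_end = min(start + primer_length, end)
        fragments.append((start, end, primer_end))
        start = end
    return fragments

def finishing_steps(fragments):
    """Count primers replaced by Pol I and joins sealed by DNA ligase."""
    primers_replaced = len(fragments)            # one RNA primer per fragment
    joins_sealed = max(len(fragments) - 1, 0)    # one join between neighbours
    return primers_replaced, joins_sealed

fragments = lagging_strand_fragments(3500)
print(fragments)
print(finishing_steps(fragments))
```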
Mistakes in DNA replication
DNA replication is not perfect. Errors occur when an incorrect base is incorporated into the growing DNA strand. This leads to mismatched base pairs, or mispairs. DNA polymerases have proofreading activity, and DNA repair enzymes have evolved to correct these mistakes. Occasionally, mispairs survive and are incorporated into the genome in the next round of replication. These mutations may have no consequence; they may result in the death of the organism; they may result in a genetic disease or cancer; or they may give the organism a competitive advantage over its neighbours, which leads to evolution by natural selection.
Transcription is the process by which DNA is copied (transcribed) to mRNA, which carries the information needed for protein synthesis. Transcription takes place in two broad steps. First, pre-messenger RNA is formed, with the involvement of RNA polymerase enzymes. The process relies on Watson-Crick base pairing, and the resultant single strand of RNA is the reverse-complement of the DNA strand that was transcribed. The pre-messenger RNA is then "edited" to produce the desired mRNA molecule in a process called RNA splicing.
Formation of pre-messenger RNA
The mechanism of transcription has parallels in that of DNA replication. As with DNA replication, partial unwinding of the double helix must occur before transcription can take place, and it is the RNA polymerase enzymes that catalyze this process.
Unlike DNA replication, in which both strands are copied, only one strand is transcribed. The strand that contains the gene is called the sense strand, while the complementary strand is the antisense strand. The mRNA produced in transcription is a copy of the sense strand, but it is the antisense strand that is transcribed.
Ribonucleotide triphosphates (NTPs) align along the antisense DNA strand, with Watson-Crick base pairing (A pairs with U). RNA polymerase joins the ribonucleotides together to form a pre-messenger RNA molecule that is complementary to a region of the antisense DNA strand. Transcription ends when the RNA polymerase enzyme reaches a sequence of bases that is read as a termination signal. The DNA molecule re-winds to re-form the double helix.
Figure 4 | Transcription: simplified representation of the formation of pre-messenger RNA (orange) from double-stranded DNA (blue) in transcription.
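The relationship between the sense strand, the antisense strand and the RNA product can be checked with a short Python sketch. The sequences and the function name are hypothetical; the only rules modelled are Watson-Crick pairing (with U in place of T) and the antiparallel orientation of the RNA relative to its template.

```python
# DNA template base -> RNA base that pairs with it.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(antisense_5to3):
    """Return the pre-mRNA (5'->3') built on the antisense (template) strand."""
    # The RNA grows antiparallel to its template: pair each base, then reverse.
    return "".join(RNA_PARTNER[base] for base in reversed(antisense_5to3))

sense = "ATGGCCTTAGCA"      # hypothetical sense strand, 5'->3'
antisense = "TGCTAAGGCCAT"  # its reverse complement, 5'->3'

pre_mrna = transcribe(antisense)
print(pre_mrna)                             # AUGGCCUUAGCA
print(pre_mrna == sense.replace("T", "U"))  # True
```

The RNA is built on the antisense strand, yet its sequence matches the sense strand with U substituted for T, which is the sense in which the mRNA is a "copy" of the sense strand.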
The pre-messenger RNA thus formed contains introns which are not required for protein synthesis. The pre-messenger RNA is chopped up to remove the introns and create messenger RNA (mRNA) in a process called RNA splicing (Figure 5).
Figure 5 | RNA splicing: introns are spliced from the pre-messenger RNA to give messenger RNA (mRNA).
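In sequence terms, splicing amounts to cutting out defined intervals and joining what remains. The sketch below assumes the intron boundaries are already known and passed in as index pairs; how the cell recognizes those boundaries is not modelled. The sequence and the `splice` helper are hypothetical.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open index pairs, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon lying before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)

# Hypothetical pre-mRNA: exon (6 bases) + intron (11 bases) + exon (6 bases).
pre_mrna = "AUGGCC" + "GUAAGUUUCAG" + "UUAGCA"
mrna = splice(pre_mrna, [(6, 17)])
print(mrna)  # AUGGCCUUAGCA
```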
In alternative splicing, individual exons are either spliced out or included, giving rise to several different possible mRNA products. Each mRNA product codes for a different protein isoform; these protein isoforms differ in their peptide sequence and therefore their biological activity. It is estimated that up to 60% of human gene products undergo alternative splicing. Several different mechanisms of alternative splicing are known, two of which are illustrated in Figure 6.
Figure 6 | Alternative splicing: several different mechanisms of alternative splicing exist − a cassette exon can be either included in or excluded from the final RNA (top), or two cassette exons may be mutually exclusive (bottom).
Alternative splicing contributes to protein diversity − a single gene transcript (RNA) can have thousands of different splicing patterns, and will therefore code for thousands of different proteins: a diverse proteome is generated from a relatively limited genome. Splicing is important in genetic regulation (alteration of the splicing pattern in response to cellular conditions changes protein expression). Perhaps not surprisingly, abnormal splicing patterns can lead to disease states including cancer.
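The combinatorial effect of cassette exons can be illustrated by enumerating every include-or-skip choice. This is a minimal sketch with hypothetical exon sequences; it covers only independent cassette exons, not mutually exclusive pairs, and it ignores any regulation of which patterns actually occur.

```python
from itertools import product

def cassette_isoforms(exons, is_cassette):
    """Enumerate mRNAs in which each cassette exon is either included or skipped.

    exons: exon sequences in transcript order.
    is_cassette: booleans, True where the exon is optional.
    """
    choices = [(True, False) if optional else (True,) for optional in is_cassette]
    isoforms = []
    for keep in product(*choices):
        isoforms.append("".join(exon for exon, kept in zip(exons, keep) if kept))
    return isoforms

exons = ["AUGGCC", "GAUGAA", "UUCUGG", "UAA"]  # hypothetical exon sequences
is_cassette = [False, True, True, False]       # two cassette exons
isoforms = cassette_isoforms(exons, is_cassette)
print(len(isoforms))  # 4
for isoform in isoforms:
    print(isoform)
```

With n independent cassette exons the number of possible products is 2ⁿ, so around ten such exons are already enough to give on the order of a thousand patterns.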
In reverse transcription, RNA is "reverse transcribed" into DNA. This process, catalyzed by reverse transcriptase enzymes, allows retroviruses, including the human immunodeficiency virus (HIV), to use RNA as their genetic material. Reverse transcriptase enzymes have also found applications in biotechnology, allowing scientists to convert RNA to DNA for techniques such as PCR.
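At the sequence level, reverse transcription uses the same pairing logic as transcription but with an RNA template and a DNA product. The following sketch produces the first-strand complementary DNA for a hypothetical RNA; the function name is illustrative and the enzyme's behaviour (priming, error rate) is not modelled.

```python
# RNA template base -> DNA base that pairs with it.
CDNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna_5to3):
    """Return the first-strand cDNA (5'->3') complementary to an RNA template."""
    return "".join(CDNA_PARTNER[base] for base in reversed(rna_5to3))

rna = "AUGGCCUUAGCA"  # hypothetical RNA sequence
cdna = reverse_transcribe(rna)
print(cdna)  # TGCTAAGGCCAT
```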
The mRNA formed in transcription is transported out of the nucleus, into the cytoplasm, to the ribosome (the cell's protein synthesis factory). Here, it directs protein synthesis. Messenger RNA is not directly involved in protein synthesis − transfer RNA (tRNA) is required for this. The process by which mRNA directs protein synthesis with the assistance of tRNA is called translation.
The ribosome is a very large complex of RNA and protein molecules. Each three-base stretch of mRNA (triplet) is known as a codon, and one codon contains the information for a specific amino acid. As the mRNA passes through the ribosome, each codon interacts with the anticodon of a specific transfer RNA (tRNA) molecule by Watson-Crick base pairing. This tRNA molecule carries an amino acid at its 3′-terminus, which is incorporated into the growing protein chain. The tRNA is then expelled from the ribosome. Figure 7 shows the steps involved in protein synthesis.
Figure 7 | Translation: (a) and (b) tRNA molecules bind to the two binding sites of the ribosome, and by hydrogen bonding to the mRNA; (c) a peptide bond forms between the two amino acids to make a dipeptide, while the tRNA molecule is left uncharged; (d) the uncharged tRNA molecule leaves the ribosome, while the ribosome moves one codon to the right (the dipeptide is translocated from one binding site to the other); (e) another tRNA molecule binds; (f) a peptide bond forms between the two amino acids to make a tripeptide; (g) the uncharged tRNA molecule leaves the ribosome.
Figure 8 | Two-dimensional structures of tRNA (transfer RNA): in some tRNAs the DHU arm has only three base pairs.
Each amino acid has its own special tRNA (or set of tRNAs). For example, the tRNA for phenylalanine (tRNA^Phe) is different from that for histidine (tRNA^His). Each amino acid is attached to its tRNA through the 3′-OH group to form an ester which reacts with the α-amino group of the terminal amino-acid of the growing protein chain to form a new amide bond (peptide bond) during protein synthesis (Figure 9). The reaction of esters with amines is generally favourable but the rate of reaction is increased greatly in the ribosome.
Figure 9 | Protein synthesis: reaction of the growing polypeptide chain with the 3′-end of the charged tRNA. The amino acid is transferred from the tRNA molecule to the protein.
Each transfer RNA molecule has a well defined tertiary structure that is recognized by the enzyme aminoacyl tRNA synthetase, which adds the correct amino acid to the 3′-end of the uncharged tRNA. The presence of modified nucleosides is important in stabilizing the tRNA structure. Some of these modifications are shown in Figure 10.
Figure 10 | Modified bases in tRNA: structures of some of the modified bases found in tRNA.
The Genetic code
The genetic code is almost universal. It is the basis of the transmission of hereditary information by nucleic acids in all organisms. There are four bases in RNA (A, G, C and U), so there are 64 possible triplet codes (4³ = 64). In theory only 22 codes are required: one for each of the 20 naturally occurring amino acids, with the addition of a start codon and a stop codon (to indicate the beginning and end of a protein sequence). Many amino acids have several codes (degeneracy), so that all 64 possible triplet codes are used. For example, Arg and Ser each have 6 codons whereas Trp and Met have only one. No two amino acids have the same code, but amino acids whose side-chains have similar physical or chemical properties tend to have similar codon sequences, e.g. the side-chains of Phe, Leu, Ile and Val are all hydrophobic, and Asp and Glu are both carboxylic acids (see Figure 11). This means that if the incorrect tRNA is selected during translation (owing to mispairing of a single base at the codon-anticodon interface), the misincorporated amino acid will probably have similar properties to the intended amino acid. Although the resultant protein will have one incorrect amino acid, it stands a high probability of being functional. Organisms show "codon bias" and use certain codons for a particular amino acid more than others. For example, the codon usage in humans is different from that in bacteria; it can sometimes be difficult to express a human protein in bacteria because the relevant tRNA might be present at too low a concentration.
Figure 11 | The Genetic code − triplet codon assignments for the 20 amino acids. As well as coding for methionine, AUG is used as a start codon, initiating protein biosynthesis
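The counting argument above (4³ = 64 triplets, with several codons per amino acid) can be reproduced by building the standard codon table programmatically. The sketch uses the widely used compact encoding of the standard code, in which the 64 one-letter amino acid symbols are listed in U, C, A, G order and `*` marks a stop codon; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

degeneracy = Counter(CODON_TABLE.values())
print(len(codons))                       # 64
print(degeneracy["R"], degeneracy["S"])  # 6 6  (Arg, Ser)
print(degeneracy["W"], degeneracy["M"])  # 1 1  (Trp, Met)
print(degeneracy["*"])                   # 3 stop codons
print(CODON_TABLE["AUG"])                # M (also the start codon)
```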
An exercise in the use of the genetic code
One strand of genomic DNA (strand A, coding strand) contains the following sequence reading from 5′- to 3′-:
This strand will form the following duplex:
The sequence of bases in the other strand of DNA (strand B) written 5′- to 3′- is therefore
The sequence of bases in the mRNA transcribed from strand A of DNA written 5′- to 3′- is
The amino acid sequence coded by the above mRNA is
However, if DNA strand B is the coding strand the mRNA sequence will be:
and the amino-acid sequence will be:
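The same steps can be worked through in Python. The sketch below uses a made-up 18-base sequence for strand A, not the sequence from the exercise, and assumes the standard genetic code with translation starting at the first base and stopping at the first stop codon. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(t) for t in product("UCAG", repeat=3)), AMINO_ACIDS))
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna_5to3):
    """Return the partner DNA strand, written 5'->3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(dna_5to3))

def mrna_from_coding_strand(coding_5to3):
    """The mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_5to3.replace("T", "U")

def translate(mrna):
    """Translate from the first base, one codon at a time, until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

strand_a = "ATGGCCTTCGATTGGTAA"          # hypothetical strand A, 5'->3'
strand_b = reverse_complement(strand_a)  # TTACCAATCGAAGGCCAT

print(translate(mrna_from_coding_strand(strand_a)))  # MAFDW
print(translate(mrna_from_coding_strand(strand_b)))  # LPIEGH
```

The two strands of the same duplex give unrelated peptides, which is why it matters which strand is taken to be the coding strand.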
The Wobble hypothesis
Close inspection of all of the available codons for a particular amino acid reveals that the variation is greatest in the third position (for example, the codons for alanine are GCU, GCC, GCA and GCG). Crick and Brenner proposed that a single tRNA molecule can recognize codons with different bases at the 3′-end owing to non-Watson-Crick base pair formation with the third base in the codon-anticodon interaction. These non-standard base pairs are different in shape from A·U and G·C and the term wobble hypothesis indicates that a certain degree of flexibility or "wobbling" is allowed at this position in the ribosome. Not all combinations are possible; examples of "allowed" pairings are shown in Figure 12.
Figure 12 | Structures of wobble base pairs found in RNA
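A small lookup table makes the consequence of wobble pairing concrete. The pairings below follow the commonly cited wobble rules for the first (5′) base of the anticodon, including inosine (I), a modified base found in some tRNAs; they are an assumption for illustration and may not match the pairs drawn in Figure 12 exactly. The function name is hypothetical.

```python
# Codon third-position bases readable by each anticodon first-position (5') base,
# following the commonly cited wobble rules (assumed here for illustration).
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"U", "C", "A"},  # inosine
}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_reads(anticodon_5to3, codon_5to3):
    """True if the anticodon can pair with the codon, allowing wobble at codon base 3."""
    # Codon and anticodon are antiparallel: anticodon base 3 pairs with codon base 1,
    # base 2 with base 2, and anticodon base 1 with codon base 3 (the wobble position).
    first_ok = WATSON_CRICK.get(anticodon_5to3[2]) == codon_5to3[0]
    second_ok = WATSON_CRICK.get(anticodon_5to3[1]) == codon_5to3[1]
    third_ok = codon_5to3[2] in WOBBLE.get(anticodon_5to3[0], set())
    return first_ok and second_ok and third_ok

alanine_codons = ["GCU", "GCC", "GCA", "GCG"]
readable = [codon for codon in alanine_codons if anticodon_reads("IGC", codon)]
print(readable)  # ['GCU', 'GCC', 'GCA']
```

Under these rules a single tRNA with the anticodon IGC covers three of the four alanine codons, so fewer tRNA species are needed than there are codons.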
The ability of DNA bases to form wobble base pairs as well as Watson-Crick base pairs can result in base-pair mismatches occurring during DNA replication. If not repaired by DNA repair enzymes, these mismatches can lead to genetic diseases and cancer.
This article on Transcription, Translation and Replication is part of the Nucleic Acids Book.
What is Transcription?
Transcription generally refers to the written form of something. In biology, transcription is the process whereby DNA is used as a template to form a complementary RNA strand – RNA is the “written” form of DNA. This is the first stage of protein production or the flow of information within a cell. DNA stores genetic information, which is then transferred to RNA in transcription, before directing the synthesis of proteins in translation. Three types of RNA can be formed: messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA).
Transcription occurs in four stages: pre-initiation, initiation, elongation, and termination. These differ in prokaryotes and eukaryotes in that DNA is stored in the nucleus in eukaryotes, whereas DNA is stored in the cytoplasm in prokaryotes. In eukaryotes, DNA is stored in tightly packed chromatin, which must be uncoiled before transcription can occur. The production of mRNA in eukaryotes is considerably more complicated than it is in prokaryotes, involving several additional processing steps.
Pre-initiation, or template binding, is initiated by the RNA polymerase σ subunit binding to a promoter region located towards the 5’ end of a DNA strand. Following this, the DNA strand is denatured, uncoupling the two complementary strands and allowing the template strand to be accessed by the enzyme. The opposing strand is known as the partner strand. Promoter sequences on the DNA strand are vital for the successful initiation of transcription. Promoter sequences are specific sequences of the nucleotide bases making up the DNA strand (adenine, thymine, guanine, and cytosine), and the identity of several of these motifs has been discovered, including TATAAT and TTGACA in prokaryotes and TATAAAA and GGCCAATCT in eukaryotes. These sequences are known as cis-acting elements. In eukaryotes, an additional transcription factor is necessary to facilitate the binding of RNA polymerase to the promoter region.
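Locating promoter motifs of this kind is, at its simplest, an exact substring search. The sketch below looks for the two prokaryotic motifs named above in a made-up DNA sequence. The labels "-35 box" and "-10 box" are the conventional names for TTGACA and TATAAT and are an addition here; real promoters tolerate mismatches, which this exact-match search does not handle.

```python
def find_motifs(dna, motifs):
    """Return the start positions (0-based) of every exact occurrence of each motif."""
    hits = {}
    for name, motif in motifs.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            start = dna.find(motif, start + 1)  # continue just past the last hit
        hits[name] = positions
    return hits

PROKARYOTIC_MOTIFS = {"-35 box": "TTGACA", "-10 box": "TATAAT"}

# Hypothetical sequence: 4 bases, TTGACA, a 17-base spacer, TATAAT, 6 bases.
dna = "GCGC" + "TTGACA" + "ATCGATCGATCGATCGA" + "TATAAT" + "GCAGGA"
hits = find_motifs(dna, PROKARYOTIC_MOTIFS)
print(hits)  # {'-35 box': [4], '-10 box': [27]}
```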
RNA polymerase catalyzes initiation, causing the introduction of the first complementary 5’-ribonucleoside triphosphate. Remember that each DNA nucleotide base has a complement: adenine pairs with thymine, and guanine pairs with cytosine. However, the ribonucleotide base complements differ slightly, as RNA contains uracil rather than thymine, and so adenine’s complement in RNA is uracil. After the introduction of the first complementary 5’-ribonucleotide, subsequent complementary ribonucleotides are inserted in a 5’ to 3’ direction. These ribonucleotides are joined by phosphodiester bonds, and at this stage, the DNA and RNA molecules are still connected (see Figure 1).
Image Source : Wikipedia
Figure 1: Initiation of transcription. RNAP refers to RNA polymerase.
Chain elongation occurs when the σ subunit dissociates from the DNA strand, allowing the growing RNA strand to separate from the DNA template strand. This is facilitated by the core enzyme (see Figure 2).
Image Source : Wikipedia
Figure 2: Elongation in transcription
Termination occurs when the core enzyme encounters a termination sequence, which is a specific sequence of nucleotides that acts as a signal to stop transcription. At this point, the RNA transcript forms a hairpin secondary structure by folding back on itself with the aid of hydrogen bonds. Termination in prokaryotes can be assisted by an additional termination factor known as rho (ρ). Termination is complete when the RNA molecule is released from the template DNA strand. In eukaryotes, termination requires an additional step known as polyadenylation, whereby a tail of multiple adenosine monophosphates is added to the RNA strand.
Figure 3: The main events in each stage of transcription
What is Reverse Transcription?
Reverse transcription is the process of transcribing a DNA molecule from an RNA molecule. This method of replication is utilized by retroviruses, such as HIV, and produces altered DNA, which can be incorporated directly into a host cell, allowing rapid reproduction. This is made possible by the reverse transcriptase enzyme. This can be seen in Figure 4.
Image Source : Wikipedia
Figure 4: The process of reverse transcription.
What is Translation?
Translation refers to the conversion of something from one language or form to another. In biology, translation is the process whereby messenger ribonucleic acid, or mRNA, directs the synthesis of proteins. This is accomplished by the production of a chain of amino acids (a polypeptide chain) determined by the chemical information stored by a specific strand of mRNA. These polypeptides fold to form proteins. Each strand of mRNA is coded by a different gene and codes for a different protein. This is important for gene expression.
Image Source : Wikimedia Commons
Figure 5: The triplet code is translated into amino acids; some codons signal the start and end of translation
Translation has three main stages: initiation, elongation, and termination. These differ slightly in prokaryotic and eukaryotic organisms: in prokaryotes, translation occurs in the cytoplasm, while in eukaryotes, translation takes place in the cytoplasm and on the endoplasmic reticulum. Essential to the process of translation is the ribosome; ribosomal structure also differs in prokaryotes and eukaryotes, mostly concerning the rate of the migration of their subunits when centrifuged, and the number of proteins their subunits contain.
Initiation begins with the small ribosomal subunit binding to the 5’ end of the mRNA, the messenger RNA created in transcription from DNA. This occurs in two stages: the small ribosomal subunit first binds to several proteinaceous initiation factors, before the combined structure binds to mRNA. This binding site is several ribonucleotides before the start codon of the mRNA. Following this, a charged molecule of tRNA binds to the small ribosomal subunit. The large ribosomal subunit then goes on to bind to the complex formed by the small ribosomal subunit, the mRNA, and the tRNA. This step is powered by the hydrolysis of GTP (guanosine-5′-triphosphate). After the large ribosomal subunit joins the complex, the initiation factors are released.
The charging of the molecule of tRNA utilized in the process of translation refers to the linking of the tRNA molecule with an amino acid. This is carried out by aminoacyl-tRNA synthetases, which react with the amino acid and ATP (adenosine triphosphate) to form a reactive form of the amino acid, known as an aminoacyladenylic acid. This remains bound to the enzyme as a complex which can react with a tRNA molecule, forming a covalent bond between the amino acid and the tRNA. The tRNA can now deliver the amino acid to the ribosome and its bound mRNA.
Elongation begins when both the small and large ribosomal subunits have been bound to the mRNA. A peptidyl site and an aminoacyl site are formed on the mRNA molecule for further binding with tRNA. The tRNA first binds to the P site (peptidyl site), and elongation begins with the binding of the second tRNA molecule to the A site (aminoacyl site). Both these tRNA molecules are transporting amino acids. The enzyme peptidyl transferase then forms a peptide bond between the amino acids transported by the two tRNA molecules. The covalent bond between the tRNA molecule at the P site and its amino acid is broken, releasing this tRNA to the E site (exit site) before it is released from the mRNA molecule entirely. The tRNA located at the A site then moves to the P site, utilizing the energy produced from the hydrolysis of GTP. This leaves the A site free for further binding, while the P site contains a tRNA molecule attached to an amino acid that is itself attached to another amino acid. This forms the basis of the polypeptide chain. Another tRNA molecule then binds to the A site, and peptidyl transferase catalyzes the creation of a peptide bond between this new amino acid and the amino acid attached to the tRNA located at the P site. The covalent bond between the amino acid and tRNA at the P site is broken and the tRNA is released. This process repeats, adding amino acids to the polypeptide chain one at a time.
Termination occurs when the ribosome complex encounters a stop codon (see Figure 5). At this stage, the polypeptide chain is attached to a tRNA at the P site, while the A site is unattached. GTP-dependent release factors break the bond between the final tRNA and the terminal amino acid. The tRNA is released from the ribosome complex, which then splits again into the small and large ribosomal subunits, which are released from the mRNA strand. The polypeptide chain then folds in on itself to form a protein. This process is depicted in Figure 6 and Figure 7.
Image Source : Wikipedia
Figure 6: The overview of the process of translation
Figure 7: The main events in each stage of translation.
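The cycle of initiation, elongation and termination can be traced codon by codon. The sketch below is a simplified model under stated assumptions: the initiator tRNA is placed on the first AUG found in a hypothetical mRNA, each cycle adds one amino acid and shifts the ribosome by one codon, and the process stops when a stop codon (UAA, UAG or UGA) enters the A site. The E site, the factors involved and GTP hydrolysis are not modelled, and `ribosome_trace` is an illustrative name.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def ribosome_trace(mrna):
    """Yield (p_site_codon, a_site_codon, chain_length) for each elongation cycle."""
    start = mrna.find("AUG")  # the initiator tRNA pairs with the start codon in the P site
    if start == -1:
        return
    codons = [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]
    chain_length = 1          # the amino acid carried by the initiator tRNA
    for p_codon, a_codon in zip(codons, codons[1:]):
        if a_codon in STOP_CODONS:
            # No tRNA enters the A site; release factors free the finished chain.
            yield p_codon, a_codon, chain_length
            return
        chain_length += 1     # peptide bond formed, then the ribosome translocates
        yield p_codon, a_codon, chain_length

# Hypothetical mRNA: 9 untranslated bases, a start codon, 4 codons, a stop codon, 3 bases.
mrna = "GGAGGUAAC" + "AUGGCCUUCGAUUGGUAA" + "GCU"
for p_codon, a_codon, length in ribosome_trace(mrna):
    print(p_codon, a_codon, length)
```

The final line of output shows the stop codon in the A site with the chain length unchanged, which corresponds to the termination step described above.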
How is Translation Different from Transcription?
Both transcription and translation are equally important in the process of genetic information flow within a cell, from genes in DNA to proteins. Neither process can occur without the other. However, there are several important differences in these processes.
To begin with, initial transcription components include DNA, RNA polymerase core enzyme, and the σ subunit. Translation components include mRNA, small and large ribosomal subunits, initiation factors, elongation factors and tRNA. In transcription, a DNA double helix is denatured to allow the enzyme to access the template strand. In translation, no such denaturing is necessary, as the template is a single mRNA strand. The product of transcription is RNA, which can be encountered in the form of mRNA, tRNA or rRNA, while the product of translation is a polypeptide chain of amino acids, which forms a protein.
Transcription occurs in the nucleus in eukaryotic organisms, while translation occurs in the cytoplasm and endoplasmic reticulum. Both processes occur in the cytoplasm in prokaryotes. The factor controlling these processes is RNA polymerase in transcription and ribosomes in translation. In transcription, this polymerase moves over the template strand of DNA, while in translation, the ribosome-tRNA complex moves over the mRNA strand.
These differences are summarized in Table 1 below.
Table 1: The differences between transcription and translation
|Transcription||Translation|
|DNA, RNA polymerase core enzyme, σ subunit||mRNA, small and large ribosomal subunits, initiation factors, elongation factors, tRNA|
|RNA polymerase reacts with DNA template strand||Ribosome complex interacts with mRNA strand|
Wrapping Up Translation vs. Transcription
As powerful as it is, DNA is only as good as its products. It is for this very reason that the processes of transcription and translation are so important. For cell processes to run smoothly, both the DNA sequences and their products must work according to plan. This is where transcription and translation come into play, fulfilling a vital role in DNA function.