Week 2 Lecture Prep

Homework Questions from Professor Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Solution: The raw error rate of DNA polymerase is ~1 error per 10^5 nucleotides during DNA synthesis. Across the human genome (3 × 10⁹ base pairs), this amounts to roughly 30,000 errors per replication. This discrepancy is handled by two error-correction mechanisms:

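Proofreading activity: The polymerase's 3′→5′ exonuclease removes misincorporated nucleotides during synthesis. This improves fidelity by ~100×.
Mismatch repair (post-replication): Detects mismatches that escape proofreading. It uses strand discrimination to correct the newly synthesized strand and improves fidelity by another ~100–1000×.

Combined, these steps lower the error rate to about 1 per 10⁹–10¹⁰ nucleotides, or roughly 0.3–3 errors per genome copy.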
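
The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in this answer; the variable and function names are my own, and the fold-improvements are the approximate values given above rather than measured constants.

```python
# Values taken from the answer above; 3e9 bp is the haploid human genome.
GENOME_SIZE_BP = 3e9
RAW_ERROR_RATE = 1e-5               # ~1 error per 10^5 nucleotides, polymerase alone
PROOFREADING_GAIN = 100             # ~100x from 3'->5' exonuclease proofreading
MISMATCH_REPAIR_GAIN = (100, 1000)  # ~100-1000x from mismatch repair


def expected_errors(error_rate, genome_size=GENOME_SIZE_BP):
    """Expected number of errors when the genome is copied once."""
    return error_rate * genome_size


raw = expected_errors(RAW_ERROR_RATE)
after_proofreading = expected_errors(RAW_ERROR_RATE / PROOFREADING_GAIN)
# One value for each end of the mismatch-repair range.
after_mmr = [
    expected_errors(RAW_ERROR_RATE / PROOFREADING_GAIN / gain)
    for gain in MISMATCH_REPAIR_GAIN
]

print(f"polymerase alone:     {raw:,.0f} errors per genome copy")
print(f"with proofreading:    {after_proofreading:,.0f}")
print(f"with mismatch repair: {after_mmr[1]:.1f} to {after_mmr[0]:.1f}")
```

Each layer divides the error rate by its fold-improvement, so the layers multiply rather than add.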

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

How many different DNA sequences can encode an average human protein?

An average human protein is ~400 amino acids long. Because the genetic code is degenerate, each amino acid is encoded by ~3 synonymous codons on average. Thus, the number of possible DNA sequences encoding the same protein is approximately:

3^400
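
To get a feel for the size of this number, here is a minimal sketch. The ~3 codons per amino acid figure is an average; the table below lists the actual codon counts in the standard genetic code, so an exact count can be made for any specific sequence. The peptide "MKWL" is a made-up example, not a real protein.

```python
import math

# Number of codons for each amino acid in the standard genetic code
# (61 sense codons shared among 20 amino acids).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein):
    """Exact number of DNA sequences encoding `protein` (one-letter codes)."""
    total = 1
    for aa in protein:
        total *= CODON_DEGENERACY[aa]
    return total


def order_of_magnitude(base, length):
    """log10 of base**length, used for the ~3 codons per residue estimate."""
    return length * math.log10(base)


# The back-of-envelope estimate from the notes: 3^400.
print(f"3^400 is about 10^{order_of_magnitude(3, 400):.0f}")

# Exact count for a short hypothetical peptide, chosen only for illustration.
print(count_encodings("MKWL"))  # 1 * 2 * 1 * 6 = 12
```

The exact count depends on composition: proteins rich in leucine, serine, or arginine (6 codons each) have far more encodings than proteins rich in methionine or tryptophan (1 codon each).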

Although many sequences are theoretically possible, biological constraints limit functionality:

Codon usage bias: preferred codons match abundant tRNAs; rare codons reduce translation efficiency
Translation speed & folding: codon choice affects co-translational protein folding
mRNA structure: some sequences form inhibitory secondary structures
Regulatory signals in coding DNA: codon changes can disrupt splicing or RNA-binding motifs
GC content & cryptic signals: extreme GC or hidden stop/splice signals impair expression
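
Two of these constraints can be illustrated with a rough sketch that compares two synonymous encodings of the same peptide. Everything here is an assumption made for illustration: the peptide Met-Gly-Lys-Leu, both DNA variants, and the use of GTAAGT as a splice-donor-like motif. It is not a real gene design check.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(seq, motif):
    """Return every 0-based position where `motif` occurs in `seq`."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Two synonymous encodings of the hypothetical peptide Met-Gly-Lys-Leu.
variant_a = "ATGGGCAAACTG"  # ATG GGC AAA CTG
variant_b = "ATGGGTAAGTTA"  # ATG GGT AAG TTA

# Illustrative assumption: GTAAGT resembles a 5' splice-donor signal.
SPLICE_DONOR_LIKE = "GTAAGT"

for name, seq in (("variant_a", variant_a), ("variant_b", variant_b)):
    hits = find_motif(seq, SPLICE_DONOR_LIKE)
    print(f"{name}: GC = {gc_content(seq):.2f}, donor-like motif at {hits}")
```

Both variants encode the same four amino acids, yet variant_b has lower GC content and contains the donor-like motif spanning a codon boundary, which is the kind of hidden signal that codon choice can create.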

Homework Questions from Dr. LeProust

1. What is the most commonly used method for oligonucleotide synthesis currently?

The most commonly used method for oligonucleotide synthesis is phosphoramidite-based solid-phase DNA synthesis. In this method, DNA is synthesized stepwise on a solid support by the sequential addition of protected nucleotide phosphoramidites. Each synthesis cycle consists of deprotection, coupling, capping of unreacted chains, and oxidation. This method is highly automated, reproducible, and provides high coupling efficiencies (~99–99.5% per nucleotide), making it suitable for routine synthesis of short DNA oligonucleotides.

2. Why is it difficult to make oligonucleotides longer than ~200 nucleotides by direct synthesis?

Direct synthesis of long oligonucleotides is limited due to cumulative error and yield loss during sequential nucleotide addition. Although each coupling step is highly efficient, the probability of obtaining a full-length correct product decreases exponentially with increasing length. For example, with a coupling efficiency of 99.5%, only about 37% of molecules remain full-length after 200 synthesis cycles. In addition, side reactions such as incomplete coupling, depurination, and truncation accumulate, making purification increasingly difficult and reducing the overall fidelity and yield of long oligos.
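
The exponential drop in yield can be reproduced with a short sketch. It assumes the simplest model, in which every cycle succeeds independently with the same coupling efficiency; the function name is my own.

```python
def full_length_fraction(coupling_efficiency, cycles):
    """Fraction of chains still full-length after `cycles` coupling steps,
    assuming each step succeeds independently with the same efficiency."""
    return coupling_efficiency ** cycles


for efficiency in (0.99, 0.995):
    for cycles in (20, 100, 200):
        fraction = full_length_fraction(efficiency, cycles)
        print(f"efficiency {efficiency:.3f}, {cycles:3d} cycles: "
              f"{fraction:.1%} full-length")
```

At 99.5% efficiency and 200 cycles this gives the ~37% figure quoted above. The model ignores depurination and other side reactions, so real yields of correct product would be lower.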

3. Why can’t a 2000 bp gene be synthesized directly using oligo synthesis?

Synthesizing a 2000 bp gene would require approximately 2000 sequential chemical coupling steps. Even at a coupling efficiency of 99.5%, the fraction of full-length product is about 0.995^2000, which is on the order of 10⁻⁵, and the error-free fraction is smaller still. Furthermore, the high frequency of insertions, deletions, and truncations makes purification impractical and the error rate biologically unacceptable. Therefore, long genes are not synthesized directly but are instead constructed by assembling shorter, chemically synthesized oligonucleotides using enzymatic methods such as PCR-based assembly or Gibson assembly.
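
The contrast between direct synthesis and assembly from short oligos can be sketched as follows. The 99.5% coupling efficiency is the figure used earlier in these notes; the 60-nucleotide oligo length and 20-nucleotide overlap are hypothetical values picked for illustration, and the tiling covers a single strand only, which simplifies real assembly designs.

```python
import math

COUPLING_EFFICIENCY = 0.995  # assumed, as in the previous answer
GENE_LENGTH = 2000           # nucleotides
OLIGO_LENGTH = 60            # hypothetical oligo length
OVERLAP = 20                 # hypothetical overlap between neighbouring oligos


def full_length_fraction(coupling_efficiency, cycles):
    """Fraction of chains still full-length after `cycles` coupling steps."""
    return coupling_efficiency ** cycles


def oligos_needed(gene_length, oligo_length, overlap):
    """Overlapping oligos needed to tile one strand of the gene end to end."""
    step = oligo_length - overlap  # new sequence contributed by each oligo
    return math.ceil((gene_length - overlap) / step)


direct = full_length_fraction(COUPLING_EFFICIENCY, GENE_LENGTH)
per_oligo = full_length_fraction(COUPLING_EFFICIENCY, OLIGO_LENGTH)
count = oligos_needed(GENE_LENGTH, OLIGO_LENGTH, OVERLAP)

print(f"direct 2000-step synthesis: {direct:.1e} full-length")
print(f"one {OLIGO_LENGTH}-nt oligo: {per_oligo:.0%} full-length")
print(f"oligos to tile the gene: {count}")
```

Each short oligo is mostly full-length on its own, which is why building the gene from many short pieces is practical where a single 2000-step synthesis is not.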

Homework Question from George Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Ten essential amino acids in animals

Animals cannot synthesize the following amino acids in sufficient amounts and must obtain them from the diet:

Histidine
Isoleucine
Leucine
Lysine
Methionine
Phenylalanine
Threonine
Tryptophan
Valine
Arginine

Implications for the “Lysine Contingency”

The fact that lysine is essential in all animals suggests that the animal lineage lost the biosynthetic pathway required to produce it. This supports the idea of the lysine contingency: once early animals evolved in lysine-rich environments (e.g., feeding on plants or microbes that could synthesize lysine), there was no selective pressure to retain the complex and energetically costly lysine biosynthesis pathway. Over evolutionary time, this loss became locked in.

As a result, lysine availability became a nutritional dependency rather than a metabolic choice, shaping animal diets and trophic relationships. The lysine contingency highlights how evolution can constrain future possibilities by eliminating biosynthetic options that are no longer immediately necessary.

What code would you suggest for AA:AA interactions?

I could not understand this question.