Week 2 Homework Section 1: DNA Read, Write, and Edit

Homework Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The intrinsic error rate of DNA polymerase is approximately 1 error per 10⁶ nucleotides incorporated. Given that the human genome is approximately 3.2 × 10⁹ base pairs, this implies roughly 3.2 × 10⁹ / 10⁶ ≈ 3,200 errors per genome replication if no corrective mechanisms were present.

Biological systems mitigate this discrepancy through multiple layers of error correction. First, many DNA polymerases possess 3′→5′ exonuclease proofreading activity, which reduces the error rate to approximately 1 in 10⁷–10⁸ nucleotides. Second, post-replicative mismatch repair (MMR) pathways further correct misincorporated bases, ultimately reducing the effective mutation rate to approximately 1 error per 10⁹–10¹⁰ nucleotides. Together, these mechanisms ensure genome stability despite the large size of the human genome.
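The arithmetic behind these figures can be checked with a short sketch. The error rates are the approximate values quoted above; where the text gives a range, the dictionary below picks the higher-error end, which is an assumption made only for illustration. The names `GENOME_BP`, `error_rates`, and `expected_errors` are likewise illustrative.

```python
GENOME_BP = 3.2e9  # haploid human genome size used in the answer above

# Approximate errors per nucleotide at each stage. Where the answer quotes a
# range, the higher-error end is used here (an illustrative assumption).
error_rates = {
    "polymerase alone": 1e-6,
    "with proofreading": 1e-7,
    "with proofreading + MMR": 1e-9,
}

def expected_errors(rate, genome_bp=GENOME_BP):
    """Expected number of errors in one pass over the genome."""
    return rate * genome_bp

for stage, rate in error_rates.items():
    print(f"{stage}: ~{expected_errors(rate):,.1f} errors per replication")
```

Under these assumptions the expected count drops from about 3,200 errors per replication to a few errors once proofreading and mismatch repair are both included.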

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, why do all of these different codes not work equally well?

An average human protein is approximately 450 amino acids long. Because the genetic code is degenerate (most amino acids are encoded by multiple codons), the number of possible DNA sequences that could encode such a protein is astronomically large. Assuming roughly 3 synonymous codons per residue, it is about 3⁴⁵⁰, or on the order of 10²¹⁴ distinct nucleotide sequences, far exceeding 10¹⁰⁰.
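As a rough check on that estimate, the number of synonymous encodings of a given protein is the product of the codon counts of its residues in the standard genetic code. The sketch below is illustrative only: the function names are made up for this example, and the mean degeneracy of 3 codons per residue is an assumption (61 sense codons spread over 20 amino acids), not a measured property of human proteins.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 3 stop codons are excluded).
CODONS_PER_AA = {
    **dict.fromkeys("LSR", 6),
    **dict.fromkeys("AGPTV", 4),
    "I": 3,
    **dict.fromkeys("FYHQNKDEC", 2),
    **dict.fromkeys("MW", 1),
}

def count_encodings(protein):
    """Exact number of DNA sequences that encode the given protein."""
    total = 1
    for aa in protein:
        total *= CODONS_PER_AA[aa]
    return total

def log10_uniform_estimate(length, mean_degeneracy=3.0):
    """log10 of the encoding count, assuming every residue has the same
    number of synonymous codons (an illustrative simplification)."""
    return length * math.log10(mean_degeneracy)

print(count_encodings("MKWL"))        # 1 * 2 * 1 * 6 = 12
print(log10_uniform_estimate(450))    # exponent for a 450-residue protein
```

The exact count for a real protein depends on its amino acid composition, so the uniform estimate is only an order-of-magnitude guide.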

In practice, however, only a small subset of these sequences function efficiently. This is due to several biological constraints:

  • Codon usage bias: Different organisms preferentially use certain synonymous codons, reflecting tRNA abundance and translational efficiency.
  • mRNA stability and structure: Certain nucleotide sequences form secondary structures that reduce transcript stability or impair ribosome binding and elongation.
  • Translational accuracy and speed: Rare codons can cause ribosome stalling, leading to misfolding or premature termination.
  • Protein folding requirements: Translation kinetics influence co-translational folding; inappropriate codon choices can result in misfolded or nonfunctional proteins.

Thus, although many nucleotide sequences are theoretically valid, only a limited fraction produce functional protein in vivo.

Homework Questions from Dr. LeProust

3. What is the most commonly used method for oligonucleotide synthesis currently?

The most commonly used method for oligonucleotide synthesis is the solid-phase phosphoramidite method, an automated chemical process that sequentially adds nucleotides in the 3′ to 5′ direction.

4. Why is it difficult to make oligonucleotides longer than ~200 nucleotides via direct synthesis?

Direct oligonucleotide synthesis is limited to approximately 200 nucleotides by cumulative coupling inefficiency. Even with stepwise coupling efficiencies of about 99%, the fraction of full-length product falls geometrically with oligo length (0.99²⁰⁰ ≈ 13% for a 200-mer), resulting in:

  • Decreased full-length product yield
  • Increased truncation products
  • Higher overall error rates

As a result, longer oligos become impractical to synthesize reliably by direct chemical synthesis.
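The effect of cumulative coupling losses can be sketched numerically. The model below is a simplification: it assumes a constant per-step coupling efficiency and ignores other side reactions, and the function name `full_length_yield` and the chosen lengths and efficiencies are illustrative.

```python
def full_length_yield(length_nt, coupling_efficiency=0.99):
    """Fraction of chains that reach full length.

    An n-mer needs n - 1 coupling steps after the first, support-bound
    nucleotide, so the yield is efficiency ** (n - 1).
    """
    return coupling_efficiency ** (length_nt - 1)

for length in (20, 100, 200):
    for eff in (0.99, 0.995):
        pct = 100 * full_length_yield(length, eff)
        print(f"{length:>3} nt at {eff:.1%} per step: {pct:5.1f}% full length")
```

In this model a small gain in per-step efficiency makes a large difference at 200 nucleotides, which is why coupling efficiency dominates the practical length limit.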

5. Why can’t you make a 2000 bp gene via direct oligonucleotide synthesis?

A 2000 bp gene cannot be synthesized directly because chemical synthesis error rates and truncation frequencies render long products unusable: at 99% coupling efficiency, the full-length fraction of a 2000-nucleotide strand would be about 0.99²⁰⁰⁰ ≈ 2 × 10⁻⁹. Instead, long genes are constructed by assembling multiple shorter oligonucleotides using enzymatic methods such as:

  • PCR-based assembly
  • Gibson Assembly
  • Golden Gate cloning

These methods rely on enzymes such as polymerases and ligases to join short, higher-quality oligos, rather than on chemical synthesis alone.
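A rough sketch of why assembly is preferred: it compares the full-length fraction expected for a directly synthesized 2000-nucleotide strand with the number of short overlapping oligos needed to tile the same length. The oligo length (60 nt) and overlap (20 nt) are hypothetical values chosen for illustration, and the tiling covers a single strand only; real assembly designs typically alternate strands and tune overlaps to melting temperature.

```python
import math

def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of chains reaching full length (n - 1 coupling steps)."""
    return coupling_efficiency ** (length_nt - 1)

def oligos_needed(gene_nt, oligo_nt=60, overlap_nt=20):
    """Number of equal-length oligos tiling gene_nt bases when each oligo
    overlaps the previous one by overlap_nt bases (single-strand tiling)."""
    if gene_nt <= oligo_nt:
        return 1
    step = oligo_nt - overlap_nt  # new bases contributed by each extra oligo
    return 1 + math.ceil((gene_nt - oligo_nt) / step)

print(f"direct 2000-mer: {full_length_fraction(2000):.1e} full length")
print(f"single 60-mer:   {full_length_fraction(60):.1%} full length")
print(f"60-mers needed to tile 2000 nt: {oligos_needed(2000)}")
```

Each short oligo is obtained at a usable full-length fraction, whereas the directly synthesized 2000-mer is essentially absent from the product.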

Homework Question from George Church

Chosen question: Using Google & Prof. Church’s slide #4 — What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?

The ten essential amino acids in animals are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, Arginine.

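The “Lysine Contingency” (the Jurassic Park containment idea of engineering animals that depend on externally supplied lysine) rests on lysine being an essential amino acid that animals cannot synthesize de novo and must obtain from their diet or environment. Because lysine is already one of the ten essential amino acids listed above, this dependency is shared by all animals rather than being special to an engineered one, so it would offer little containment as long as dietary lysine is available. The dependency does still represent a systemic vulnerability: any disruption to lysine availability, whether ecological, agricultural, or geopolitical, can have cascading effects on animal health and food security.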

From a synthetic biology perspective, the lysine contingency underscores how metabolic dependencies constrain evolutionary and technological design space. It also suggests that engineering alternative nutritional pathways or lysine-efficient systems could have outsized impacts on resilience, sustainability, and global food systems.

AI assistance was used to organize, clarify, and refine explanations; no external factual claims were introduced beyond standard molecular biology knowledge.