Week 2 Lecture Prep Questions

Homework Questions from Professor Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA polymerase with proofreading has an error rate of approximately 1 error per 10⁶ base pairs copied (10⁻⁶ per base). The human genome is ~3.2 billion base pairs (3.2 × 10⁹ bp). At 1 error in 10⁶ bases, you would expect roughly 3.2 × 10⁹ × 10⁻⁶ ≈ 3,200 errors per genome replication. Biology uses multiple layers of error correction:

  • Proofreading by DNA polymerase during replication
  • Post-replication mismatch repair systems (e.g., MutS/MutL pathways)
  • Redundancy in the genome (non-coding regions, diploidy)
  • Cellular quality control, including apoptosis for heavily damaged cells

Together, these reduce the effective mutation rate to ~10⁻⁸ per base per generation.
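As a quick sanity check on the arithmetic above, here is a minimal Python sketch using the approximate rates quoted in this answer (illustrative figures, not measured values):

```python
# Back-of-the-envelope check of expected replication errors per genome.
GENOME_SIZE = 3.2e9      # human genome, base pairs (approximate)
RATE_PROOFREAD = 1e-6    # error rate with polymerase proofreading only
RATE_FINAL = 1e-8        # effective rate after all repair layers

errors_proofread = GENOME_SIZE * RATE_PROOFREAD
errors_final = GENOME_SIZE * RATE_FINAL

print(f"Errors per replication (proofreading only): ~{errors_proofread:,.0f}")
print(f"Errors per replication (all repair layers): ~{errors_final:,.0f}")
# ~3,200 vs ~32: the post-replication repair layers buy roughly two
# additional orders of magnitude in fidelity.
```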

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

The genetic code has 64 codons (61 sense codons plus 3 stop codons) encoding 20 amino acids, so most amino acids are encoded by multiple codons. An average human protein is ~345 amino acids long (~1,035 coding base pairs). Because many amino acids have 2–6 synonymous codons, the number of possible DNA sequences that encode the same protein is astronomically large, on the order of 10¹⁰⁰ or more. In practice, most of these possible codes do not work effectively, due to factors such as codon bias (preferred codons differ between organisms), mRNA secondary structure that can inhibit transcription or translation, unintended regulatory signals embedded in the DNA, repetitive or GC-rich sequences that are difficult to synthesize, and effects on translation speed that influence protein folding. As a result, although many sequences are theoretically valid, only a small subset function well in real biological systems.
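The combinatorial count can be estimated directly from the codon degeneracies of the standard genetic code; here is a minimal sketch (the uniform amino acid usage is an illustrative simplification, not real composition statistics):

```python
from math import log10

# Synonymous-codon counts in the standard genetic code,
# keyed by one-letter amino acid code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def log10_encodings(protein: str) -> float:
    """Log10 of the number of DNA sequences encoding this protein."""
    return sum(log10(DEGENERACY[aa]) for aa in protein)

# Toy peptide: M (1 codon) x K (2) x T (4) = 8 possible coding sequences.
print(f"'MKT': 10^{log10_encodings('MKT'):.2f} (= 8 sequences)")

# Average-protein estimate, assuming (unrealistically) uniform usage
# of all 20 amino acids across a 345-residue protein:
avg = sum(log10(d) for d in DEGENERACY.values()) / len(DEGENERACY)
print(f"345-residue protein: ~10^{345 * avg:.0f} synonymous sequences")
```

This lands at roughly 10¹⁴⁷ sequences, consistent with the "10¹⁰⁰ or more" estimate above.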

Homework Questions from Dr. LeProust

What’s the most commonly used method for oligo synthesis currently?

The most commonly used method for oligonucleotide synthesis is chemical phosphoramidite synthesis on a solid support. This method adds nucleotides one at a time through repeated cycles of deprotection (detritylation), coupling, capping, and oxidation. It is highly automated, scalable, and reliable for short oligos, which is why it remains the industry standard.

Why is it difficult to make oligos longer than ~200 nt via direct synthesis?

Each phosphoramidite synthesis cycle has a small but nonzero failure rate. As the oligo gets longer, the fraction of full-length, correct product decays geometrically, leaving a growing population of truncated and error-containing strands. Longer sequences also suffer from increased depurination, incomplete coupling, secondary-structure formation, and purification challenges. As a result, yield and fidelity drop sharply beyond ~200 nucleotides.
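The length limit follows directly from compounding per-cycle losses. A minimal sketch of the stepwise-yield arithmetic, assuming an illustrative 99% coupling efficiency:

```python
# Full-length yield of an n-mer at a given per-cycle coupling efficiency.
def full_length_yield(n: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of strands that survive all n coupling steps."""
    return coupling_efficiency ** n

for length in (20, 100, 200, 500):
    print(f"{length:>4} nt: {full_length_yield(length):.1%} full-length")
# 0.99^200 ~= 13%: even before counting sequence errors, most material
# at 200 nt is truncated product, which is why ~200 nt is a practical limit.
```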

Why can’t you make a 2000 bp gene via direct oligo synthesis?

Direct chemical synthesis of a 2000 bp sequence would accumulate too many errors and truncations to be practical, resulting in extremely low yield of full-length, correct product. The chemistry lacks error correction, and the probability of obtaining a perfect 2000 bp strand is effectively near zero. Instead, long genes are made by assembling many shorter oligos (typically 60–200 nt) using enzymatic methods such as PCR-based assembly, Gibson assembly, or ligase-based assembly, followed by sequencing and error correction.
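The same arithmetic shows why assembly from short oligos is the practical route. This sketch again assumes an illustrative 99% coupling efficiency and ignores per-base sequence errors:

```python
# Compare direct synthesis of a 2000-mer with assembly from short oligos.
COUPLING = 0.99

direct = COUPLING ** 2000    # fraction of full-length 2000-mers
per_oligo = COUPLING ** 60   # fraction of full-length 60-mers

print(f"Direct 2000 nt synthesis: {direct:.2e} full-length yield")
print(f"Per 60 nt oligo:          {per_oligo:.1%} full-length yield")
# ~2e-9 vs ~55%: each short-oligo pool is mostly correct material, so
# overlapping oligos can be stitched together enzymatically and the
# assembled clones verified by sequencing, rather than hoping a single
# 2000-step chemical synthesis runs error-free.
```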

Homework Questions from George Church

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Essential amino acids are those an animal cannot synthesize de novo and must obtain from its diet. The 10 essential amino acids in animals are:

  • Histidine (H)
  • Isoleucine (I)
  • Leucine (L)
  • Lysine (K)
  • Methionine (M)
  • Phenylalanine (F)
  • Threonine (T)
  • Tryptophan (W)
  • Valine (V)
  • Arginine (R)

Because lysine is already on this list, every animal (engineered or not) must obtain it from its diet, and lysine is abundant in ordinary plant and animal food sources. The “Lysine Contingency” of Jurassic Park (engineering the dinosaurs to be lysine-dependent as a containment measure) therefore offers little real control: the animals could satisfy the requirement simply by eating lysine-rich foods.