Week 2 HW: DNA Read, Write, and Edit

Professor Jacobson

1. Polymerase and error rate

Nature’s machinery for copying DNA is DNA polymerase. The error rate of typical polymerases is approximately 1 in 10^4 to 10^6 nucleotides, depending on the enzyme.

  • The human genome is ~3 billion base pairs (3 × 10^9 bp).
  • Without correction, even an error rate of 1 in 10^6 would give ~3,000 mistakes per genome replication (3 × 10^9 / 10^6), and lower-fidelity polymerases would give far more.
  • Biology solves this via proofreading and mismatch repair mechanisms, which reduce the effective error rate to ~1 in 10^9 bp, ensuring accurate genome replication.
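The arithmetic behind these bullets can be checked with a short sketch. The function name `expected_errors` is made up for illustration, and the calculation assumes errors are independent and uniform across the genome, which is a simplification.

```python
# Expected number of errors per genome copy = error rate per bp x genome size.
# Assumes independent, uniformly distributed errors (a simplification).

GENOME_SIZE_BP = 3e9  # human genome size quoted above


def expected_errors(error_rate_per_bp, genome_size_bp=GENOME_SIZE_BP):
    """Return the expected number of errors in one genome replication."""
    return error_rate_per_bp * genome_size_bp


if __name__ == "__main__":
    scenarios = [
        ("polymerase alone, 1 in 10^4", 1e-4),
        ("polymerase alone, 1 in 10^6", 1e-6),
        ("with proofreading and mismatch repair, 1 in 10^9", 1e-9),
    ]
    for label, rate in scenarios:
        print(f"{label}: ~{expected_errors(rate):,.0f} errors per replication")
```

With the corrected rate of ~1 in 10^9, the expected count drops to a handful of errors per replication instead of thousands.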

2. Coding DNA for human proteins

  • On average, a human protein has ~300 amino acids.
  • Each amino acid can be encoded by 1–6 codons (degeneracy of the genetic code).
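  • Because most amino acids have several synonymous codons, the number of DNA sequences that could code for the same protein is astronomically large. With an average of ~3 codons per amino acid (61 sense codons / 20 amino acids), a rough estimate for a 300-residue protein is 3^300 ≈ 10^143 sequences; the exact count depends on the amino acid composition.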
  • In practice, not all codons work equally well due to:
    • Codon usage bias (some codons are translated more efficiently)
    • mRNA secondary structure affecting translation
    • Regulatory sequences overlapping coding regions
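To make the degeneracy count concrete, here is a minimal sketch that multiplies the number of synonymous codons at each position of a protein. It uses the standard genetic code, ignores the stop codon, and takes one-letter amino acid codes as input; the function name and the example peptides are made up for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (one-letter codes; the 3 stop codons are excluded, leaving 61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein):
    """Count the DNA coding sequences that translate to `protein`."""
    total = 1
    for residue in protein.upper():
        total *= CODONS_PER_AA[residue]  # KeyError on non-standard residues
    return total


def log10_synonymous_sequences(protein):
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(math.log10(CODONS_PER_AA[r]) for r in protein.upper())


if __name__ == "__main__":
    for peptide in ["MKW", "MLS"]:  # hypothetical short peptides
        print(peptide, count_synonymous_sequences(peptide))
```

This counts only sequences that are equivalent at the protein level; it says nothing about which of them express well, which is where codon usage bias and mRNA structure come in.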

Dr. LeProust

1. Most commonly used method for oligo synthesis

  • Solid-phase phosphoramidite synthesis is the standard method.
  • Nucleotides are added one at a time to a growing DNA chain attached to a solid support, using cycles of deprotection, coupling, capping, and oxidation.
  • This method is highly automated and used commercially.
  • Citation: PMC article on oligo synthesis

2. Why oligos >200 nt are difficult

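  • Error accumulation: Each coupling step has a small chance of failing, and these failures compound multiplicatively as the chain grows.
  • Practical limit: The yield and purity of full-length product fall off steeply with length, so high-purity oligos become impractical above ~200 nt.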
  • Chemical constraints: Steric hindrance and protecting group limitations reduce efficiency.
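The compounding effect can be illustrated with a simple model in which every coupling step succeeds independently with the same probability. The coupling efficiencies used below (99% and 99.5%) are illustrative assumptions, not figures from the lecture, and the function name is hypothetical.

```python
def full_length_yield(length_nt, coupling_efficiency):
    """Fraction of chains reaching full length, assuming each of the
    (length_nt - 1) coupling steps succeeds independently."""
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for efficiency in (0.99, 0.995):  # assumed per-step efficiencies
        for length in (20, 100, 200):
            fraction = full_length_yield(length, efficiency)
            print(f"efficiency {efficiency:.3f}, {length:>3} nt: "
                  f"{fraction:.1%} full-length")
```

Under this model, a per-step efficiency of 99% leaves only about 13% full-length product at 200 nt, before accounting for any other side reactions.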

3. Why a 2000 bp gene cannot be synthesized directly

  • 2000 bp is far beyond the ~200 nt limit of direct chemical synthesis.
  • Long genes are instead assembled from short oligos using methods like PCR assembly or Gibson assembly.
  • Direct synthesis of a 2 kb gene would yield almost entirely truncated or error-containing products, with essentially no correct full-length molecules.
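As a rough sketch of the first step in assembling a gene from short oligos, the code below tiles a sequence into overlapping fragments. The oligo length (60 nt) and overlap (20 nt) are illustrative assumptions, the function name is hypothetical, and the sketch tiles a single strand only; real PCR or Gibson assembly designs typically alternate strands and tune overlaps for melting temperature.

```python
def tile_oligos(sequence, oligo_len=60, overlap=20):
    """Split `sequence` into fragments of up to `oligo_len` nt, where each
    fragment shares `overlap` nt with the next one (single strand only)."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + oligo_len])
        if start + oligo_len >= len(sequence):
            break  # this fragment reached the end of the gene
        start += step
    return oligos


if __name__ == "__main__":
    gene = "ACGT" * 500  # placeholder 2000 nt sequence, not a real gene
    oligos = tile_oligos(gene)
    print(f"{len(oligos)} oligos, longest {max(len(o) for o in oligos)} nt")
```

Each fragment stays well under the ~200 nt synthesis limit, and the shared overlaps are what allow neighbouring fragments to anneal and be joined into the full-length gene.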

George Church

Option Chosen: 10 essential amino acids and Lysine Contingency

  • Essential amino acids in most animals (nine listed here): histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine.
  • Arginine is usually counted as the tenth: it is conditionally essential in children and essential in certain species.

Lysine Contingency:

  • Fictional mechanism in Jurassic Park where engineered organisms cannot make lysine and therefore depend on an external supply.
  • In reality, lysine is already essential: animals cannot synthesize it, so the contingency adds no new dependency. Organisms survive in nature because lysine is widely available in food.
  • This highlights that engineered dependencies could theoretically control survival, but natural environmental availability must be considered.

Citation: