Week 1: Professor Questions

Homework Questions from Professor Jacobson

Question 1: DNA Polymerase Error Rate

Question:
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Answer:

The error rate of DNA polymerase with proofreading is approximately 1 error per 10⁶ base pairs. The human genome is about 3.2 billion (3.2 × 10⁹) base pairs long. At this error rate, a single round of genome replication would introduce roughly 3.2 × 10⁹ / 10⁶ = 3,200 errors, far too many for the genome to be copied stably from one cell generation to the next.

How Biology Resolves This Discrepancy

Biology relies on layered error correction mechanisms rather than polymerase accuracy alone:

Polymerase Proofreading
Many DNA polymerases possess 3′→5′ exonuclease activity, which removes incorrectly incorporated nucleotides immediately during synthesis. This activity is already reflected in the figure of about 1 error per 10⁶ base pairs quoted above.
Mismatch Repair (MMR)
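Errors that escape proofreading are corrected post-replication by the mismatch repair system. In E. coli this involves the proteins MutS, MutL, and MutH; human cells use MutS and MutL homologs (the MSH and MLH/PMS proteins). The system detects mismatches, excises the incorrect segment of the newly made strand, and resynthesizes it correctly.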

Net Result:
These combined systems reduce the final mutation rate to approximately 1 error per 10⁹–10¹⁰ base pairs. For a genome of 3.2 billion base pairs, that works out to roughly 0.3 to 3 errors per genome replication.
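The arithmetic behind both figures can be checked with a short Python sketch. It only multiplies the genome length by the error rates quoted in this answer; the function name and the labels are illustrative, not taken from the lecture.

```python
# Genome length and error rates are the values quoted in the answer above.
GENOME_LENGTH_BP = 3.2e9

def expected_errors(genome_length_bp, errors_per_bp):
    """Expected number of errors in one full copy of the genome."""
    return genome_length_bp * errors_per_bp

# Error rates expressed as errors per base pair.
rates = {
    "polymerase with proofreading (1 per 10^6)": 1e-6,
    "after mismatch repair (1 per 10^9)": 1e-9,
    "after mismatch repair (1 per 10^10)": 1e-10,
}

for label, rate in rates.items():
    errors = expected_errors(GENOME_LENGTH_BP, rate)
    print(f"{label}: ~{errors:g} errors per replication")
```

The three lines printed correspond to about 3,200 errors without mismatch repair and between about 0.3 and 3 errors with it.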

Question 2: Coding Diversity for Human Proteins

Question:
How many different DNA sequences can encode an average human protein, and why do most of these sequences fail in practice?

Answer:

An average human protein is encoded by approximately 1,036 base pairs of DNA (about 345 amino acids). The genetic code is redundant: 61 codons encode 20 amino acids, so each amino acid is specified by between 1 and 6 codons. The number of DNA sequences that encode the same protein is the product of the codon counts at every position, which for a protein of this length is astronomically large.
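As a rough illustration of how this count grows, the sketch below multiplies the number of synonymous codons for each residue, using the codon degeneracy of the standard genetic code. The peptide used in the example is hypothetical, and stop codons are ignored.

```python
import math

# Number of codons per amino acid in the standard genetic code.
# The values sum to 61, the number of sense codons.
CODONS_PER_AMINO_ACID = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}

def count_synonymous_sequences(protein):
    """Exact number of DNA sequences encoding the protein (stop codon ignored)."""
    total = 1
    for amino_acid in protein:
        total *= CODONS_PER_AMINO_ACID[amino_acid]
    return total

def log10_synonymous_sequences(protein):
    """Same count as a power of ten, which is easier to read for long proteins."""
    return sum(math.log10(CODONS_PER_AMINO_ACID[aa]) for aa in protein)

# Hypothetical five-residue peptide, one-letter amino acid codes.
example_peptide = "MKLSR"
print(count_synonymous_sequences(example_peptide))  # 1 * 2 * 6 * 6 * 6 = 432
```

Even this five-residue peptide has 432 possible encodings; passing a real protein sequence of a few hundred residues to `log10_synonymous_sequences` shows how quickly the exponent grows.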

Why Most Synonymous Sequences Fail

Despite this theoretical diversity, most synonymous sequences do not function properly due to several biological constraints:

mRNA Secondary Structure
Different sequences fold into different mRNA structures. Stable hairpins or loops can block ribosome binding, slow translation, or destabilize the transcript.
GC Content and Stability
Extreme GC or AT content alters nucleic acid stability. Excessive GC content makes DNA and RNA difficult to unwind, while low GC content reduces structural stability.
RNA Cleavage Rules
Certain sequences form structures recognized by RNases (e.g., RNase III), leading to premature mRNA degradation.
Codon Usage Bias
Organisms prefer specific codons. Rare codons slow translation due to limited tRNA availability, reducing protein yield or causing misfolding.
Translation Kinetics and Folding
Translation speed affects co-translational protein folding. Incorrect synonymous choices can produce misfolded, non-functional proteins.

Together, these constraints explain why only a small fraction of synonymous DNA sequences successfully produce functional proteins.
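One of these constraints, GC content, is simple enough to screen for in code. The sketch below compares two synonymous encodings of the same short peptide. The window size and the 30–70% acceptance range are arbitrary assumptions chosen for illustration, not thresholds from the lecture, and both example sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)

def windows_outside_gc_range(seq, window=20, low=0.30, high=0.70):
    """Return (start, gc) for every sliding window whose GC fraction
    falls outside [low, high]. The defaults are illustrative assumptions."""
    flagged = []
    for start in range(len(seq) - window + 1):
        gc = gc_fraction(seq[start:start + window])
        if gc < low or gc > high:
            flagged.append((start, gc))
    return flagged

# Two synonymous encodings of the peptide Gly-Ala-Pro-Gly-Ala-Pro.
gc_rich = "GGCGCCCCGGGCGCCCCG"   # GGC GCC CCG GGC GCC CCG
balanced = "GGTGCACCTGGAGCTCCA"  # GGT GCA CCT GGA GCT CCA

for name, seq in (("gc_rich", gc_rich), ("balanced", balanced)):
    flagged = windows_outside_gc_range(seq, window=6)
    print(f"{name}: GC = {gc_fraction(seq):.2f}, flagged windows = {len(flagged)}")
```

Both sequences encode the same peptide, but only the first is entirely G and C. A realistic design tool would also check secondary structure, codon usage, and nuclease recognition sites, which this sketch does not attempt.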

Homework Questions from Dr. LeProust

What’s the most common method for oligonucleotide synthesis?

The most common method is solid-phase phosphoramidite synthesis, a chemical process in which nucleotides are added stepwise to a growing DNA strand. Modern platforms perform this synthesis on silicon chips, enabling the parallel production of millions of oligonucleotides.

Why is it difficult to synthesize oligonucleotides longer than ~200 nucleotides?

Each coupling step is slightly less than 100% efficient, so truncated products and other small errors accumulate with every added nucleotide, and the yield of full-length product falls roughly exponentially with length. This limits reliable direct synthesis to a few hundred nucleotides.
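The length limit follows from compounding the per-step coupling efficiency. The sketch below uses a simple model in which an n-nucleotide oligo needs n − 1 successful couplings; the efficiencies of 99% and 99.5% are assumed values for illustration, not figures from the lecture, and the model ignores other error sources.

```python
def full_length_yield(length_nt, coupling_efficiency):
    """Fraction of strands that reach full length, assuming every coupling
    succeeds independently with the same efficiency (n - 1 couplings)."""
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)

# Assumed per-step efficiencies, for illustration only.
for efficiency in (0.99, 0.995):
    for length in (20, 200, 2000):
        fraction = full_length_yield(length, efficiency)
        print(f"efficiency {efficiency:.3f}, {length:>4} nt: "
              f"{fraction:.3g} of strands are full length")
```

Under the assumed 99% efficiency, only about 13.5% of 200-nucleotide strands are full length, and for 2000 nucleotides the fraction is effectively zero, which is why long genes are assembled from shorter oligos.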

Why can’t a 2000 base-pair gene be made by direct oligonucleotide synthesis?

Chemical synthesis is limited to short DNA fragments. A 2000 base-pair gene must be constructed by synthesizing shorter oligonucleotides and assembling them into the full-length gene using enzymatic assembly and ligation methods.

Homework Questions from Professor Church

Natural vs. Synthetic Biocontainment Strategies

Amino Acid Essentiality & Biocontainment

The ten essential amino acids for animals—those that must be obtained through diet—are phenylalanine (F), valine (V), threonine (T), tryptophan (W), isoleucine (I), methionine (M), histidine (H), arginine (R), leucine (L), and lysine (K).

The Lysine Contingency and Biological Reality

The “lysine contingency,” popularized by Jurassic Park, proposes limiting survival by engineering organisms so that they cannot make their own lysine and must be supplied with it. In reality:

Natural dependency: All animals are already dependent on lysine and other essential amino acids obtained from the environment.
Poor containment: Lysine is abundant in nature, making this an ineffective biocontainment strategy.
Synthetic solutions: Research on genomically recoded organisms (GROs) replaces natural amino acid dependence with reliance on non-standard amino acids (NSAAs) that do not exist outside controlled environments.

This creates a synthetic contingency, intended to keep engineered organisms from surviving beyond the laboratory or production setting.

Citations and AI Prompt Disclosure

Key References:

Lajoie et al. (2013), Genomically Recoded Organisms Expand Biological Functions
Nyerges et al. (2022), Swapped genetic code blocks viral infections & gene transfer

AI Usage Disclosure:
Standard biological facts were retrieved using internal knowledge. Google NotebookLM was used as a study aid. Lecture slides were uploaded to ChatGPT, and the following prompt was used:

“Teach me this lecture as a coherent essay. Explain all concepts from first principles, and clearly explain any new or technical terms when they appear.”

The connection to Prof. Church’s work and the lysine contingency was synthesized directly from the provided source materials.