Week 2 HW: Lecture Prep

Homework Questions from Professor Jacobson

DNA Polymerase Error Rates and the Human Genome

Error Rate of Polymerase: In biological synthesis, error-correcting polymerase has an error rate of approximately $1:10^6$. This is significantly more accurate than raw chemical synthesis, which has an error rate of roughly $1:10^2$.
Comparison to the Human Genome: The human genome is approximately 3 billion base pairs ($3 \times 10^9$) in length. At an error rate of $1:10^6$, copying the entire human genome would result in roughly 3,000 errors per replication cycle.
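The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function name `expected_errors` is made up here, and it assumes errors occur independently at a fixed rate per base.

```python
def expected_errors(length_bp: int, bases_per_error: int) -> float:
    """Expected number of miscopied bases for a rate of 1 error per `bases_per_error` bases.

    Assumes errors are independent and equally likely at every position.
    """
    return length_bp / bases_per_error


GENOME_BP = 3_000_000_000   # ~3 x 10^9 bp, as stated in the text
POLYMERASE = 10**6          # error-correcting polymerase: 1 error per 10^6 bases
CHEMICAL = 10**2            # raw chemical synthesis: 1 error per 10^2 bases

print(expected_errors(GENOME_BP, POLYMERASE))  # polymerase copying the genome
print(expected_errors(GENOME_BP, CHEMICAL))    # hypothetical: same length at the chemical rate
```

The second call is purely a comparison of scale; nobody synthesizes a genome-length molecule chemically in one pass.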
How Biology Deals with the Discrepancy: Biology utilizes specific enzymatic functions to manage and correct these errors to ensure genomic integrity. This includes 3’-5’ proofreading exonuclease activity and 5’-3’ error-correcting exonuclease functions that work alongside template-dependent primer extension to identify and remove incorrect bases.

Coding for Human Proteins

Ways to Code for an Average Human Protein: The coding sequence of an average human protein is about 1,036 base pairs long, which corresponds to roughly 345 amino acids. Because the genetic code is redundant (multiple different codons can code for the same amino acid), the number of DNA sequences that encode the same protein is the product of the codon counts at each position, which is astronomically large. The sources highlight that biology must find a balance between this codon redundancy and diversity to maintain “fabricational complexity”.
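To make the size of that number concrete, here is a minimal Python sketch that counts the synonymous DNA codings of a peptide using the codon degeneracies of the standard genetic code. The names `CODONS_PER_AA` and `synonymous_codings` are illustrative, and the count ignores the stop codon and any non-standard genetic codes.

```python
from math import prod

# Number of codons for each amino acid (one-letter codes) in the standard genetic code.
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "H": 2, "K": 2, "F": 2, "Y": 2,
    "M": 1, "W": 1,
}


def synonymous_codings(protein: str) -> int:
    """Count the distinct DNA sequences that translate to `protein`.

    Each residue contributes a factor equal to its number of codons.
    """
    return prod(CODONS_PER_AA[aa] for aa in protein.upper())


# Toy example: Met (1 codon) x Lys (2 codons) x Leu (6 codons)
print(synonymous_codings("MKL"))
```

Even a three-residue peptide has a dozen codings here. For a protein of a few hundred residues the product grows to a number with on the order of a hundred or more decimal digits, since all but two amino acids have at least two codons.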
Reasons Some Codes Do Not Work: In practice, many DNA sequences that technically code for the correct protein are “impossible” or difficult to use for synthesis or expression due to several biological and mechanical factors:
- Secondary Structures: Sequences that form hairpins or inverted repeats can interfere with replication and transcription machinery.
- Extreme GC Content: Regions with very high (≥90%) or very low (≤10%) GC content are often unstable or difficult for polymerase to navigate.
- Repetitive Sequences: Long terminal repeats, tandem repeats, or clusters of repeats can lead to “slippage” and errors during synthesis.
- Homopolymers: Long runs of an identical base (e.g., more than 30bp of A) are particularly prone to errors.
- RNA Cleavage and Stability: Certain nucleotide combinations may inadvertently create RNA cleavage sites (such as targets for RNase III), leading to the degradation of the mRNA before it can be translated.
- Codon Usage: Not all redundant codons are treated equally by the cell’s translational machinery; choosing codons that the host rarely uses can lead to inefficient protein production, which is why sequences are often codon-optimized.
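Two of the criteria in the list above (extreme GC content and long homopolymers) are simple enough to check in code. The following is a rough sketch, not a real synthesis-screening tool: the function names are hypothetical, the thresholds are taken from the list, and GC content is computed over the whole sequence rather than over local regions, which is a simplification.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    runs = re.finditer(r"(.)\1*", seq.upper())
    return max((len(m.group(0)) for m in runs), default=0)


def screen(seq: str, max_run: int = 30, gc_low: float = 0.10, gc_high: float = 0.90) -> list:
    """Return a list of flagged issues; an empty list means neither check fired."""
    issues = []
    if not seq:
        return issues
    gc = gc_fraction(seq)
    if gc <= gc_low or gc >= gc_high:
        issues.append("extreme GC content")
    if longest_homopolymer(seq) > max_run:
        issues.append("long homopolymer")
    return issues


print(screen("A" * 40))    # fails both checks
print(screen("ACGT" * 10)) # passes both checks
```

Hairpins, repeats, and RNase target sites need more involved analysis (for example, searching for reverse-complement matches) and are left out of this sketch.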

Homework Questions from Dr. LeProust

The most commonly used method for oligonucleotide synthesis currently is phosphoramidite chemical synthesis on a solid support. This process follows a cycle involving deprotection, base coupling, capping, and oxidation to add nucleotides sequentially to a growing chain.

Direct synthesis of oligos longer than 200 nucleotides is difficult because the cumulative yield drops exponentially with each coupling step. Chemical synthesis has a high raw error rate of approximately $1:10^2$, meaning errors are introduced frequently as the chain grows. Because the efficiency of adding each base is not 100%, the probability of obtaining a perfect, full-length product decreases significantly as the length increases.
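The exponential drop can be illustrated with a short Python sketch. It assumes a fixed per-cycle success probability of 99%, chosen here only to match the 1:10² error rate quoted above; actual coupling efficiencies depend on the chemistry and platform, and the function name is illustrative.

```python
def full_length_fraction(n_cycles: int, per_cycle_success: float = 0.99) -> float:
    """Fraction of chains that pass every one of `n_cycles` coupling cycles.

    Assumes each cycle succeeds independently with the same probability.
    """
    return per_cycle_success ** n_cycles


for n in (50, 200, 700, 2000):
    print(n, full_length_fraction(n))
```

Under this assumption only about 13% of chains are still correct after 200 cycles, and the fraction falls below one in a thousand by 700 cycles and to around one in a billion by 2000 cycles.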

You cannot make a 2000bp gene via direct synthesis primarily because the error rate of chemical synthesis would result in approximately 20 errors in a sequence of that length. Furthermore, the physical yield of a 2000bp molecule would be vanishingly small due to the cumulative losses over 2000 coupling cycles. Instead, genes are constructed using enzymatic assembly to join many smaller, sequence-verified oligonucleotides into a single long fragment. While advanced chemistry has pushed direct synthesis limits to 700 nucleotides, creating a 2000bp gene still requires the assembly of multiple fragments to maintain accuracy and throughput.

Homework Question from George Church

Command IA: [Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Based on the list of essential amino acids and the provided lecture materials, here is an analysis of the “Lysine Contingency”:

The 10 Essential Amino Acids

According to general biological standards (verified via external search), the 10 essential amino acids that most animals (including humans) cannot synthesize and must obtain through their diet are:

- Phenylalanine
- Valine
- Threonine
- Tryptophan
- Isoleucine
- Methionine
- Histidine
- Arginine
- Leucine
- Lysine

In Professor Church’s slides, the standard genetic code chart identifies these amino acids by their single-letter codes (e.g., K for Lysine, L for Leucine, M for Methionine) as the building blocks for protein synthesis.

The “Lysine Contingency” and Biocontainment

The “Lysine Contingency” is a concept popularized by Jurassic Park, where engineered organisms are made unable to produce lysine, theoretically preventing them from surviving in the wild without human-provided supplements.

Knowing that lysine is already an essential amino acid for all animals significantly changes the view of this contingency:

Redundancy in Nature: Since wild animals (and humans) already cannot synthesize lysine, they must constantly find it in their environment (by eating plants or other animals). Therefore, a “lysine contingency” is not a robust biocontainment strategy because lysine is widely available in the natural world. An escaped organism would simply find lysine in the wild just as any other animal does.
Church’s Advanced Solution: Professor Church’s research proposes a much more effective version of this idea through Genomically Recoded Organisms (GROs). Instead of relying on a natural amino acid like lysine, Church’s team engineered organisms to be dependent on Non-Standard Amino Acids (NSAAs).
Synthetic Dependency: These NSAAs do not exist in the wild. This creates a true “metabolic isolation” or “biocontainment” because, unlike lysine, the organism cannot find these synthetic building blocks in the environment, effectively preventing survival outside a controlled laboratory setting.

In summary, while the “Lysine Contingency” is a natural reality for all animals, it is an ineffective tool for synthetic biology containment. Modern genetic engineering instead uses synthetic amino acid dependency to ensure organisms remain contained.