Week 2 - 1. Lecture Preparation

The next lecture covers how to read, write, and edit DNA.

Homework Questions from Professor Jacobson:

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Polymerases are enzymes that synthesize long chains of nucleic acids. The two main types are DNA polymerase and RNA polymerase, which assemble DNA and RNA molecules respectively by reading a DNA template strand and adding complementary nucleotides through base pairing. DNA polymerase is the enzyme that carries out semi-conservative replication, and its error rate is approximately one error per 10⁶ bases.

This error rate may seem low, but the human genome is about 3.2 × 10⁹ base pairs long, so copying it once at this rate would introduce roughly 3,200 errors per replication cycle, far too many for reliable inheritance.
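The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the two numbers are the ones quoted above (one error per 10⁶ bases, 3.2 × 10⁹ base pairs).

```python
# Figures quoted in the answer above; integers keep the division exact.
GENOME_LENGTH_BP = 3_200_000_000   # human genome, about 3.2e9 base pairs
BASES_PER_ERROR = 1_000_000        # polymerase: about one error per 1e6 bases


def expected_errors(length_bp, bases_per_error):
    """Expected number of copying errors for one pass over the sequence."""
    return length_bp / bases_per_error


if __name__ == "__main__":
    errors = expected_errors(GENOME_LENGTH_BP, BASES_PER_ERROR)
    print(f"Expected errors per genome copy: {errors:,.0f}")
```

The estimate covers the polymerase step alone, before any of the correction mechanisms described next.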

To address this issue, biological systems rely on two main error-correction mechanisms. First, many DNA polymerases possess proofreading activity through an associated exonuclease, which detects and removes incorrectly incorporated bases during replication. Second, post-replication mismatch repair proteins, such as MutS, detect and repair mismatches between the parent strand and the daughter strand. Together, these mechanisms significantly reduce the final number of errors, allowing for stable and reliable DNA replication.

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

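The coding sequence of an average human protein is about 1036 base pairs long, which corresponds to roughly 345 amino acids. Because the genetic code is redundant, most amino acids can be encoded by more than one codon (from 1 for methionine and tryptophan up to 6 for leucine, serine, and arginine), so in theory there is an immense number of different DNA sequences that could code for the same protein.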
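To get a feel for how large that number is, the sketch below counts synonymous DNA encodings by multiplying the number of codons available at each position. It uses the standard genetic code; the function names and the example peptides are illustrative assumptions, and the 345-residue estimate assumes all 20 amino acids are used equally often, which real proteins do not do.

```python
from collections import Counter
from itertools import product
from math import log10

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (1 to 6).
DEGENERACY = dict(Counter(CODON_TABLE.values()))


def count_encodings(protein):
    """Exact number of DNA sequences that translate to this protein."""
    total = 1
    for aa in protein:
        total *= DEGENERACY[aa]  # KeyError on an invalid one-letter code
    return total


def log10_encodings(protein):
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(log10(DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    print(count_encodings("MLS"))  # Met (1 codon) x Leu (6) x Ser (6)

    # Rough estimate for a 345-residue protein, ASSUMING uniform amino acid usage.
    twenty = [aa for aa in DEGENERACY if aa != "*"]
    per_residue = sum(log10(DEGENERACY[aa]) for aa in twenty) / len(twenty)
    print(f"about 10^{345 * per_residue:.0f} possible coding sequences")
```

Even a three-residue peptide has dozens of encodings, and the count multiplies at every further position.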

In practice, most of these sequences do not work. Cells show codon bias, preferring certain codons for efficient translation. Some sequences create strong DNA or RNA secondary structures that interfere with transcription or translation. Extreme GC or AT content can reduce stability or cause replication problems. Other sequences accidentally introduce regulatory signals or stop codons. Some are also difficult to synthesize accurately. As a result, only a small fraction of the theoretically possible DNA sequences can reliably produce the intended protein.

Homework Questions from Dr. LeProust:

What’s the most commonly used method for oligo synthesis currently?

The most commonly used method today is solid-phase phosphoramidite chemical synthesis. DNA bases are added one at a time to a growing strand that is attached to a solid support, repeating a cycle of coupling, capping, oxidation, and deprotection.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Each nucleotide addition is not 100% efficient. Even small failure rates compound over many cycles. As the oligo gets longer, the fraction of full-length correct molecules drops rapidly, and truncated or error-containing products dominate. By around 200 nt, yield and accuracy become too low for reliable direct synthesis.
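The compounding can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the per-step coupling efficiencies (99% and 99.5%) are illustrative values rather than figures from the lecture, the first nucleotide is assumed to be pre-attached to the support (so an n-mer needs n − 1 couplings), and side reactions other than failed couplings are ignored.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands that succeed at every coupling step.

    Assumes the first nucleotide is already on the solid support,
    so a strand of length_nt needs (length_nt - 1) couplings.
    """
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    # Illustrative (assumed) per-step efficiencies, not measured values.
    for efficiency in (0.99, 0.995):
        for length in (20, 100, 200, 2000):
            fraction = full_length_fraction(length, efficiency)
            print(f"eff={efficiency:.3f} length={length:>4} nt "
                  f"full-length fraction={fraction:.3g}")
```

Under these assumptions, only a small minority of strands are full length at 200 nt, and at 2000 nt the fraction is vanishingly small.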

Why can’t you make a 2000bp gene via direct oligo synthesis?

A 2000 bp gene would require about 2000 sequential chemical coupling steps. With cumulative errors and truncations, the probability of producing a full-length, correct molecule becomes essentially zero. Instead, long genes are made by assembling many shorter oligos (typically 50–200 nt) using enzymatic methods like PCR and ligation, followed by error correction and sequencing.

Homework Question from George Church:

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in all animals are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine (arginine is strictly essential during growth and development). Animals cannot synthesize these amino acids in sufficient amounts, so they must obtain them from their diet or from other organisms.

Lysine is especially interesting because it is essential, chemically distinct, and metabolically expensive to make. Animals have completely lost the ability to synthesize it, yet lysine is required for protein synthesis, regulation, and many post-translational modifications.

In Jurassic Park, InGen claims the dinosaurs were engineered to be lysine-dependent, so they would die if they escaped because lysine would be unavailable in the wild. However, lysine is a common essential amino acid found in many foods, and every animal, humans included, already depends on dietary lysine. As a medically trained author, Michael Crichton would have known this, and in the story the dinosaurs do survive outside containment. Therefore, we can imagine that the “lysine contingency” was an intentional deception by InGen, a narrative of safety designed to reassure investors and regulators rather than a genuine biological containment strategy.