Week 2 Pre-work
Homework Questions from Professor Jacobson:
1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Error-correcting (proofreading) polymerase has an error rate of about 1 in 10^6, i.e. one mistake for every 1,000,000 bases added, with a throughput of about 10 ms per base addition. The human genome is approximately 3.2 billion bp long. This means that we would expect about 3,200 mistakes every time the genome was copied in full if there were no further mechanism to detect and fix these errors. Biology deals with this discrepancy through the DNA mismatch repair (MMR) system, in which the MutS and MutL repair proteins work together to recognize and repair mismatches arising from incorrect base pairings or small insertions or deletions. MutS (a homodimer in bacteria) scans the newly synthesized DNA and binds to mismatch sites to flag them. MutL then binds to the MutS-DNA complex to form an assembly that coordinates the repair of the mismatch.
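The arithmetic behind the 3,200 figure can be checked with a minimal sketch. It uses only the two numbers quoted above; the function and constant names are illustrative, not from any library.

```python
# Expected replication errors per genome copy, before mismatch repair.
# Both figures are the approximate values quoted in the answer above.
GENOME_LENGTH_BP = 3_200_000_000  # ~3.2 billion bp
BASES_PER_ERROR = 1_000_000       # one error per 10^6 bases added


def expected_errors(genome_length_bp: int, bases_per_error: int) -> float:
    """Mean number of misincorporated bases in one full copy of the genome."""
    return genome_length_bp / bases_per_error


errors_per_copy = expected_errors(GENOME_LENGTH_BP, BASES_PER_ERROR)
print(f"Expected errors per genome copy: {errors_per_copy:,.0f}")
```

This is only the mean of a simple model in which every base has the same, independent chance of error; real error rates vary with sequence context.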
The median protein length is about 375 amino acids (aa) for Homo sapiens (1), and the average eukaryotic protein is about 472 aa, speaking more generally (2). Each aa is encoded by 1 to 6 codons, depending on the amino acid; for a known protein, the exact number of possible coding sequences is the product of the codon counts of each amino acid in its sequence. As an estimate, take a 375-aa protein in which each amino acid can be encoded by an average of 3 codons: 3^375 ≈ 8.3 × 10^178 possible coding sequences. In practice, not all of these will lead to a functioning protein. Changing codons may affect co-translational folding (which depends on the speed of translation, itself a function of tRNA availability), alter mRNA stability, or unintentionally create A/U-rich clusters that may result in premature termination of translation.
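As a rough illustration of the counting argument, the sketch below multiplies the per-residue codon counts of the standard genetic code and also reproduces the 3^375 estimate. The peptide "MKWL" is a hypothetical example, and the function names are illustrative.

```python
import math

# Number of sense codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the three stop codons are excluded).
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "C": 2, "H": 2, "Q": 2,
    "N": 2, "K": 2, "D": 2, "E": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein: str) -> int:
    """Exact number of DNA coding sequences for a given protein sequence."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein.upper())


def average_case_estimate(length_aa: int, codons_per_aa: int = 3) -> int:
    """Estimate assuming every residue has the same number of codons."""
    return codons_per_aa ** length_aa


example_peptide = "MKWL"  # hypothetical 4-residue peptide, for illustration
print(f"{example_peptide}: {count_encodings(example_peptide)} encodings")

estimate = average_case_estimate(375)
print(f"3^375 is a {len(str(estimate))}-digit number")
```

The average of 3 codons per amino acid follows from the 61 sense codons being shared among 20 amino acids. For a real protein the exact product depends on its composition, since leucine-, serine-, and arginine-rich sequences have far more encodings than methionine- or tryptophan-rich ones.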
(1) https://bionumbers.hms.harvard.edu/bionumber.aspx?s=n&v=4&id=106445
(2) Tiessen, A., Pérez-Rodríguez, P. & Delaye-Arredondo, L.J. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Res Notes 5, 85 (2012). https://doi.org/10.1186/1756-0500-5-85
Homework Questions from Dr. LeProust:
1. What’s the most commonly used method for oligo synthesis currently?
2. Why is it difficult to make oligos longer than 200nt via direct synthesis?
3. Why can’t you make a 2000bp gene via direct oligo synthesis?
Solid-phase phosphoramidite chemistry is the current standard method for synthesizing DNA and RNA oligonucleotides. It involves anchoring the first nucleotide to a solid support, followed by a repeating 4-step cycle (deprotection, coupling, capping, oxidation) to add nucleotides. Unlike biological synthesis, this method proceeds in the 3′ to 5′ direction.
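It is difficult to make oligos longer than 200nt via direct synthesis because each coupling step in phosphoramidite chemistry is slightly less than 100% efficient, and these losses compound with every nucleotide added. Beyond roughly this length, the yield of full-length product drops sharply and truncated or otherwise faulty sequences make up a growing share of the material.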
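The compounding effect can be sketched with a simple model in which every coupling succeeds independently with the same probability. The coupling efficiencies of 99% and 99.5% used here are assumed illustrative values, not figures from the lecture, and the function name is hypothetical.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that reach full length.

    The first nucleotide is already anchored to the solid support, so an
    oligo of length n needs n - 1 successful coupling steps.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be in (0, 1]")
    return coupling_efficiency ** (length_nt - 1)


# Assumed per-step efficiencies, chosen only to illustrate the trend.
for efficiency in (0.99, 0.995):
    for length in (20, 200, 2000):
        fraction = full_length_fraction(length, efficiency)
        print(f"efficiency={efficiency:.3f} length={length:>4} nt "
              f"full-length fraction={fraction:.2e}")
```

Under these assumptions a 200 nt oligo at 99% efficiency retains only a small fraction of full-length product, and at 2000 nt essentially none survives. The model ignores other error sources, such as depurination or incomplete deprotection, so real yields would be lower still.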
You cannot make a 2000bp gene via direct oligo synthesis because truncations and errors accumulate with every synthesis cycle, so at that length essentially no full-length, error-free product remains. Long single strands also tend to form secondary structures, such as hairpins, which makes synthesis harder. Instead, shorter high-fidelity oligos are typically synthesized and then assembled into the full-length gene.
Homework Question from George Church: Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.
1. [Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
2. [Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?
3. (Advanced students) Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own: https://arpa-h.gov/explore-funding/programs/boss https://www.darpa.mil/research/programs/smart-rbc https://www.darpa.mil/research/programs/go
Histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine are considered the nine essential amino acids because they cannot be synthesized by the human body or other mammalian cells (1). Arginine can be synthesized in small quantities in the human body, but this synthesis may not be enough during times of growth, healing, or other physiological stress, making it a conditionally essential amino acid in humans. Arginine is considered essential in birds because they lack a urea cycle (2), and it is essential in animals such as cats, dogs, and ferrets (3) because the large amounts of ammonia created during protein catabolism after meals must be processed via the urea cycle.
In the 1993 hit film Jurassic Park, engineer Ray Arnold describes one of the genetic modifications Dr. Wu has made to the dinosaurs’ genomes: “The lysine contingency is intended to prevent the spread of the animals in case they ever got off the island. Dr. Wu inserted a gene that creates a single faulty enzyme in protein metabolism. The animals can’t manufacture the amino acid lysine. Unless they’re continually supplied with lysine by us, they’ll slip into a coma and die.” (4) As the above list of essential amino acids shows, no animal can produce lysine, which is exactly why it is an essential amino acid. It is therefore extremely unlikely that these dinosaurs could have synthesized it in the first place, which makes the contingency useless as a safeguard. Any of those dinosaurs would get adequate lysine from their diets without specific supplementation, as it is prevalent in a large variety of plant and animal proteins.
(1) Lopez MJ, Mohiuddin SS. Biochemistry, Essential Amino Acids. [Updated 2024 Apr 30]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2025 Jan-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK557845/
(2) Freedland RA, Briggs S (2012-12-06). A Biochemical Approach to Nutrition. Springer Science & Business Media. p. 45. ISBN 9789400957329.
(3) Nutrient Requirements of Dogs. National Academies Press. 1985. p. 65. ISBN 978-0-309-03496-8.
(4) https://jurassicpark.fandom.com/wiki/Lysine_contingency