Week 1 HW: Principles and Practices

Class Assignment #1
Homework Questions from Professor Jacobson
1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
When a polymerase first inserts nucleotides, it makes roughly one error every 10,000 to 100,000 bases. The exonuclease proofreading domain improves accuracy by 100- to 1,000-fold, bringing the error rate to about one per 10^6 to 10^7 nucleotides. The haploid human genome is roughly 3 billion base pairs, so a diploid cell must copy about 6 x 10^9 base pairs before it divides. If the polymerase relied only on its intrinsic proofreading, at one error per 10^6 to 10^7 nucleotides a single round of replication would leave roughly 600 to 6,000 errors. That mutation rate would be unsustainable for a multicellular organism.
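The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration, not a model of replication: it multiplies the per-nucleotide error rates quoted above by the 6 x 10^9 base pairs of a diploid cell, and the name `expected_errors` is my own.

```python
# Expected replication errors per cell division, using the figures quoted above.
DIPLOID_GENOME_BP = 6e9  # diploid human genome, in base pairs


def expected_errors(error_rate_per_nt, genome_bp=DIPLOID_GENOME_BP):
    """Expected number of errors made while copying the genome once."""
    return error_rate_per_nt * genome_bp


# Error rates per nucleotide taken from the answer above.
scenarios = {
    "insertion only, 1 per 10^4": 1e-4,
    "insertion only, 1 per 10^5": 1e-5,
    "with proofreading, 1 per 10^6": 1e-6,
    "with proofreading, 1 per 10^7": 1e-7,
}

for label, rate in scenarios.items():
    print(f"{label}: about {expected_errors(rate):,.0f} errors per replication")
```

Even with proofreading, the estimate stays in the hundreds to thousands of errors per division, which is why additional repair layers are needed.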
Biology therefore relies on a multi-tiered, highly efficient repair system that brings the overall rate below one mutation per genome per cell division. The tiers are: proofreading (immediate correction by the polymerase); mismatch repair (post-replication correction); and redundancy and non-coding DNA, which let many of the remaining mutations pass without effect. There is also a low-fidelity backup: when DNA is severely damaged, the cell uses specialized “error-prone” polymerases (translesion synthesis) to replicate past the damage, accepting a temporary increase in mutations in exchange for survival.
2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Most amino acids are encoded by 2 to 6 synonymous codons (only methionine and tryptophan have a single codon), so for an average human protein of roughly 300 to 500 amino acids the number of possible DNA sequences is astronomical. In practice, most of these sequences will not yield a functional protein, for several reasons. Codon usage bias and translation speed: each organism prefers certain codons, which are translated faster. Co-translational folding errors: folding is coordinated with the speed of translation, so if rare codons slow translation down the protein may not fold properly. mRNA stability and structure: synonymous changes can alter how stable the transcript is and how it folds. Splicing errors: in eukaryotes the coding sequence (exons) is interrupted by non-coding sequences (introns), and the DNA carries “hidden” splicing signals that tell the cell where to cut and join the RNA; a different coding sequence might accidentally create or destroy these sites, producing an improperly spliced mRNA. Regulatory site disruption: DNA regions often carry dual information, coding for a protein while also containing regulatory signals (e.g., enhancers, transcription factor binding sites); a synonymous change might destroy a crucial regulatory element, so that the protein is never produced at all.
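To give a feel for how large this number is, the count of synonymous DNA sequences can be sketched as the product of the codon degeneracy of each residue. The sketch below assumes the standard genetic code and ignores the stop codon; the function names and the short peptide are hypothetical, chosen only for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    total = 1
    for residue in protein:
        total *= CODON_DEGENERACY[residue]
    return total


def log10_synonymous_sequences(protein):
    """Same count as an order of magnitude, easier to read for long proteins."""
    return math.log10(count_synonymous_sequences(protein))


# Hypothetical 6-residue peptide, for illustration only.
toy_peptide = "MKTLLV"
print(count_synonymous_sequences(toy_peptide))
print(f"about 10^{log10_synonymous_sequences(toy_peptide):.1f} sequences")
```

Because the count multiplies with every added residue, it grows exponentially with protein length, which is why a protein of 300 to 500 residues has far more encodings than could ever be tested.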
Homework Questions from Dr. LeProust
1. What’s the most commonly used method for oligo synthesis currently?
Solid-phase chemical synthesis using the phosphoramidite method.
2. Why is it difficult to make oligos longer than 200nt via direct synthesis?
Even with the most highly optimized protocols, each step of the chemical synthesis cycle is not 100% efficient (typically 99% to 99.5%). As the sequence length increases, these small losses compound: over a 200 nt oligo, only about 13% of the strands are full length at 99% per-step efficiency, and roughly 37% at 99.5%.
3. Why can’t you make a 2000bp gene via direct oligo synthesis?
Because of these accuracy limits, direct chemical synthesis is mostly limited to about 500 bp. A longer gene has to be assembled afterwards by joining several shorter fragments; if it were synthesized in one piece, the errors would accumulate into a non-working gene.
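The yield argument in the two answers above can be checked with a short calculation. This is a rough sketch that assumes one coupling step per nucleotide and the same efficiency at every step; the function name `full_length_yield` is my own.

```python
def full_length_yield(length_nt, coupling_efficiency):
    """Fraction of strands that are full length after length_nt coupling steps.

    Assumes every step succeeds independently with the same efficiency.
    """
    return coupling_efficiency ** length_nt


for length in (20, 200, 2000):
    for efficiency in (0.99, 0.995):
        fraction = full_length_yield(length, efficiency)
        print(f"{length:>5} nt at {efficiency:.1%} per step: {fraction:.2e} full length")
```

For a 2000 nt product the full-length fraction falls to a tiny fraction of a percent even at 99.5% per step, which is consistent with building genes from shorter fragments rather than synthesizing them directly.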
Homework Question from George Church
1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The 10 essential amino acids in animals (mnemonic: PVT TIM HALL) are phenylalanine, valine, tryptophan, threonine, isoleucine, methionine, histidine, arginine, leucine and lysine. Lysine is therefore already something that all animals, and presumably dinosaurs, must obtain from their diet. A biological kill switch only works if it depends on a substance the organism cannot get outside a controlled system, and the “Lysine Contingency” lacks exactly that: escaped dinosaurs could find lysine anywhere off the island simply by eating plants or other animals. For an amino acid contingency to work, the GM dinosaur would need to depend on a completely synthetic amino acid that no naturally occurring one could replace.
Personal Resources
https://pubs.rsc.org/en/content/articlehtml/2024/cs/d3cs00469d
