Homework
Weekly homework submissions:
Week 1 HW: Principles and Practices
The Biological Engineering Tool I Want to Develop: The Gut-Longevity Diagnostic Platform
A growing body of research points to the gut microbiome as a key modulator of systemic inflammation, metabolic health, and possibly the rate of biological aging. The metabolic outputs of our gut bacteria, particularly short-chain fatty acids (SCFAs) like butyrate, are linked to immune regulation, insulin sensitivity, and cellular repair pathways. I propose developing a diagnostic platform to functionally map this ecosystem and provide actionable insights for promoting healthspan.
The Gut-Longevity Diagnostic is an at-home testing system that moves beyond static genomic sequencing. A user sample is exposed to a standardized panel of prebiotic substrates within a disposable cartridge containing engineered biosensors. These sensors measure the real-time, functional metabolic output: the specific SCFAs and gases produced by the user’s unique microbial community. A validated algorithm interprets this dynamic functional profile against longitudinal health data and generates a personalized, food-based nutritional prescription designed to steer the microbiome towards an anti-inflammatory, metabolically healthy, pro-longevity phenotype.
Week 2: DNA Read, Write, and Edit
Homework Questions from Professor Jacobson
- Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
The core DNA replication machinery, DNA polymerase, has a raw error rate of about 1 mistake for every 100,000 nucleotides it copies. This might sound precise, but the human genome is roughly 3 billion base pairs long, so copying it once at that rate would introduce about 30,000 errors (3,000,000,000 / 100,000) if nothing corrected them. Accumulating that many mutations at every cell division would not be compatible with stable inheritance.
To deal with this, biology employs a powerful, multi-layered proofreading system:
Proofreading (3’→5’ Exonuclease Activity): Many DNA polymerases have a built-in “backspace” function. As they add nucleotides, they can immediately check and remove a mismatched one, improving accuracy by about 100-fold.
Mismatch Repair: After replication, a separate system acts like a final quality control team. It scans the new DNA strand, identifies and corrects mismatches that escaped the initial proofreading, boosting fidelity by another 100 to 1000 times.
Together, these systems reduce the final error rate to roughly 1 error per 1 billion to 10 billion nucleotides (10^-5 raw, times 10^-2 for proofreading, times 10^-2 to 10^-3 for mismatch repair), making high-fidelity inheritance possible.
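The arithmetic behind these layers can be checked with a short script. This is only a back-of-the-envelope sketch using the round numbers quoted in this answer; the constant and function names are my own, not from the lecture.

```python
# Back-of-the-envelope replication fidelity, using the round numbers from the text.
GENOME_BP = 3_000_000_000           # approximate human genome length (base pairs)
RAW_ONE_ERROR_PER = 100_000         # raw polymerase: ~1 error per 100,000 nt
PROOFREADING_GAIN = 100             # ~100-fold improvement
MISMATCH_REPAIR_GAIN = (100, 1000)  # ~100- to 1000-fold improvement


def errors_per_genome(one_error_per: int, genome_bp: int = GENOME_BP) -> float:
    """Expected number of errors when the genome is copied once."""
    return genome_bp / one_error_per


raw = errors_per_genome(RAW_ONE_ERROR_PER)
proofread = errors_per_genome(RAW_ONE_ERROR_PER * PROOFREADING_GAIN)
repaired_range = tuple(
    errors_per_genome(RAW_ONE_ERROR_PER * PROOFREADING_GAIN * gain)
    for gain in MISMATCH_REPAIR_GAIN
)

print(f"raw polymerase:       {raw:,.0f} errors per copy")
print(f"with proofreading:    {proofread:,.0f} errors per copy")
print(f"with mismatch repair: {repaired_range[1]:.1f} to {repaired_range[0]:.1f} errors per copy")
```

With these assumptions, the expected error count drops from tens of thousands per genome copy to a handful or fewer.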
- How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
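Because the genetic code is redundant (multiple DNA codons can specify the same amino acid), the number of possible DNA sequences for an average 400-amino-acid protein is astronomically high. With 61 sense codons shared among 20 amino acids, each residue has about 3 synonymous codons on average, which gives roughly 3^400 (about 7 × 10^190) different sequences.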
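As a rough check on that estimate, the count can be computed directly. The sketch below assumes the standard genetic code; `CODONS_PER_AMINO_ACID` and `count_encodings` are illustrative names, and the peptide `MKL` is a made-up example.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; stop codons are not included).
CODONS_PER_AMINO_ACID = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein: str) -> int:
    """Number of distinct coding sequences that translate to `protein`."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in protein.upper())


# Rough estimate for a 400-residue protein with ~3 codons per residue.
rough_estimate = 3 ** 400
print(f"3^400 has {len(str(rough_estimate))} digits")

# Exact count for a tiny hypothetical peptide: Met (1) x Lys (2) x Leu (6).
print(count_encodings("MKL"))  # 12
```

The exact count for a real protein depends on its composition, since a leucine contributes a factor of 6 while a methionine contributes only 1.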
However, not all these theoretical sequences will produce a functional protein efficiently in a living cell. Key biological constraints include:
Codon Usage Bias: Cells have preferred “words” (codons). Using rare codons that match scarce transfer RNA (tRNA) molecules can dramatically slow down protein production.
mRNA Structure: The sequence itself can fold into shapes that block the ribosome, preventing translation.
Protein Folding: The speed of translation, influenced by codon choice, can affect how the protein folds correctly as it’s being made.
Hidden Signals: The coding sequence might accidentally create signals that tell the cell to cut (splice) the RNA in the wrong place or stop translation early.
Homework Questions from Dr. LeProust
- What is the most commonly used method for oligonucleotide synthesis currently?
The industry standard is phosphoramidite-based solid-phase synthesis. In this automated process, DNA strands are built nucleotide-by-nucleotide on a solid support such as a bead or chip. Each cycle adds one base with very high coupling efficiency (about 99 to 99.5%), allowing for the reliable and scalable production of short DNA sequences.
- Why is it difficult to make oligonucleotides longer than ~200 nucleotides by direct synthesis?
The limitation is cumulative yield loss. Even with 99.5% efficiency per step, after 200 cycles the chance that any one strand has reached full length is only about 37% (0.995^200 ≈ 0.37), and at 99% per step it falls to about 13%. Beyond this length, most of the product is truncated fragments of various lengths, making it difficult and expensive to purify the small amount of full-length, error-free DNA.
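The yield figures follow from compounding the per-cycle efficiency. A minimal sketch, assuming every coupling step is independent and equally efficient (a simplification); `full_length_fraction` is an illustrative name.

```python
def full_length_fraction(coupling_efficiency: float, cycles: int) -> float:
    """Fraction of strands that complete every cycle without a failed coupling."""
    return coupling_efficiency ** cycles


for efficiency in (0.99, 0.995):
    for cycles in (60, 200, 2000):
        fraction = full_length_fraction(efficiency, cycles)
        print(f"efficiency {efficiency:.3f}, {cycles:>4} cycles: {fraction:.2e} full length")
```

Because the loss is exponential in length, a small gain in per-step efficiency matters much more for long oligos than for short ones.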
- Why can’t a 2000 bp gene be synthesized directly using oligo synthesis?
For a 2000-base-pair gene, the probability of a perfect, full-length strand from direct chemical synthesis is effectively zero (at 99.5% per step, 0.995^2000 is about 0.004%). Instead, scientists synthesize many shorter, manageable oligonucleotides (for example 40 to 60 bases long) that overlap in sequence. These fragments are then stitched together using enzymatic assembly methods such as Gibson Assembly or Polymerase Chain Reaction (PCR)-based assembly. Both are in vitro reactions that rely on purified enzymes (an exonuclease, a polymerase, and a ligase in Gibson Assembly; a thermostable polymerase in PCR-based assembly) rather than on machinery inside a living cell.
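The idea of breaking a gene into overlapping pieces can be sketched in a few lines. This is a simplified illustration, not a real assembly design tool: it tiles only one strand, uses fixed lengths, and ignores melting temperature and secondary structure. The function name, the default lengths, and `demo_gene` are all assumptions made for the example.

```python
def tile_oligos(sequence: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split `sequence` into oligos where each one shares `overlap` bases with the next."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and smaller than oligo_len")
    if not sequence:
        return []
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + oligo_len])
        if start + oligo_len >= len(sequence):
            break  # this oligo reaches the end of the sequence
        start += step
    return oligos


def reassemble(oligos: list[str], overlap: int = 20) -> str:
    """Rebuild the sequence by dropping the shared prefix of each later oligo."""
    return oligos[0] + "".join(oligo[overlap:] for oligo in oligos[1:])


demo_gene = "ATGC" * 50  # hypothetical 200-nt placeholder sequence
demo_oligos = tile_oligos(demo_gene)
print(len(demo_oligos), "oligos; reassembly matches:", reassemble(demo_oligos) == demo_gene)
```

In a real design the oligos would typically alternate between the two strands so that neighbours anneal to each other through their overlaps.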
Homework Question from George Church
- What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The ten amino acids that animals cannot synthesize and must obtain from food are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine.
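The “Lysine Contingency” comes from Jurassic Park: the park’s dinosaurs were engineered to be unable to make lysine, so that they would supposedly die without supplements supplied by their keepers. Because lysine is already essential in all animals, this looks like a weak containment strategy. No animal makes its own lysine, and all of them obtain it from food (plants, microbes, or other animals), so an escaped dinosaur could do the same. More broadly, the shared list of essential amino acids suggests that animal ancestors lost these biosynthetic pathways long ago, likely because their diet supplied the amino acids reliably. Multi-step pathways like these are very unlikely to re-evolve once lost. The result is a lasting nutritional dependency that shapes animal ecology, from what we eat to how food webs are structured, and it shows how evolution can constrain future possibilities by eliminating unused metabolic options.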
- What code would you suggest for AA:AA interactions?
For AA:AA interactions, I would suggest a code based on hydrophobicity scales. The rule is that amino acids with similar hydrophobicity values have a high propensity to interact, with hydrophobic residues packing together in the core and hydrophilic residues favoring the solvent-exposed surface. This is a deliberately simplified model: it captures the drive to sequester hydrophobic residues from water, which is a major organizing force in protein folding, but it ignores charge complementarity and hydrogen bonding.
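One way such a code could be written down is sketched below, using the Kyte-Doolittle hydropathy scale as the hydrophobicity values. The scoring rule (negative absolute difference) and the function name `interaction_score` are my own illustrative choices; other scales or scoring rules would work equally well.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic), one-letter codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def interaction_score(aa1: str, aa2: str) -> float:
    """Score a residue pair: 0 for identical hydropathy, more negative as they differ.

    This is an assumed toy rule, not an established interaction potential.
    """
    return -abs(KYTE_DOOLITTLE[aa1.upper()] - KYTE_DOOLITTLE[aa2.upper()])


for pair in ("IV", "IK", "DE"):
    print(pair, round(interaction_score(pair[0], pair[1]), 1))
```

Under this rule isoleucine pairs far more favorably with valine than with lysine. It also scores aspartate with glutamate as a perfect match even though two like charges repel, which shows the limits of a hydrophobicity-only code.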