Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices




Week 2: DNA Read, Write, and Edit


Homework Questions from Professor Jacobson

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The core DNA replication machinery, DNA polymerase, has a raw error rate of about 1 mistake per 100,000 nucleotides copied. That sounds precise, but across the roughly 3-billion-base-pair human genome it would mean about 30,000 uncorrected mutations per cell division: a catastrophic error load that would make life impossible.

To deal with this, biology employs a powerful, multi-layered proofreading system:

Proofreading (3’→5’ Exonuclease Activity): Many DNA polymerases have a built-in “backspace” function. As they add nucleotides, they can immediately check and remove a mismatched one, improving accuracy by about 100-fold.

Mismatch Repair: After replication, a separate system acts like a final quality control team. It scans the new DNA strand, identifies and corrects mismatches that escaped the initial proofreading, boosting fidelity by another 100 to 1000 times.

Together, these systems reduce the final error rate to an astonishingly low ~1 error per 10 billion nucleotides, making high-fidelity inheritance possible.
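The fidelity arithmetic above is easy to sanity-check; a minimal sketch in Python, using the order-of-magnitude rates quoted in the text:

```python
# Rough fidelity arithmetic for DNA replication (order-of-magnitude estimates).
GENOME_SIZE = 3e9          # human genome, base pairs
RAW_ERROR_RATE = 1e-5      # polymerase alone: ~1 error per 100,000 nt

# Errors per genome copy with no correction
raw_errors = GENOME_SIZE * RAW_ERROR_RATE
print(f"Uncorrected errors per replication: {raw_errors:.0f}")  # ~30000

# Each correction layer multiplies fidelity
PROOFREADING_GAIN = 100       # 3'->5' exonuclease "backspace"
MISMATCH_REPAIR_GAIN = 1000   # post-replication quality control (upper estimate)

final_rate = RAW_ERROR_RATE / (PROOFREADING_GAIN * MISMATCH_REPAIR_GAIN)
print(f"Final error rate: ~1 per {1 / final_rate:.0e} nucleotides")  # ~1e10
print(f"Errors per genome copy: {GENOME_SIZE * final_rate:.1f}")     # ~0.3
```

With both layers active, a full genome copy introduces well under one error on average, which is what makes high-fidelity inheritance possible.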

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Because the genetic code is redundant (on average about three DNA codons specify each amino acid), the number of possible DNA sequences encoding an average 400-amino-acid protein is astronomically high—roughly 3^400, or about 10^190, different sequences.

However, not all these theoretical sequences will produce a functional protein efficiently in a living cell. Key biological constraints include:

Codon Usage Bias: Cells have preferred “words” (codons). Using rare codons that match scarce transfer RNA (tRNA) molecules can dramatically slow down protein production.

mRNA Structure: The sequence itself can fold into shapes that block the ribosome, preventing translation.

Protein Folding: The speed of translation, influenced by codon choice, can affect how the protein folds correctly as it’s being made.

Hidden Signals: The coding sequence might accidentally create signals that tell the cell to cut (splice) the RNA in the wrong place or stop translation early.
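The redundancy arithmetic can be made concrete. A short sketch using the degeneracy of the standard genetic code (number of synonymous codons per amino acid); the example peptide `MKLV` is arbitrary:

```python
import math

# Synonymous codons per amino acid in the standard genetic code (61 sense codons)
CODON_DEGENERACY = {
    'A': 4, 'R': 6, 'N': 2, 'D': 2, 'C': 2, 'Q': 2, 'E': 2, 'G': 4,
    'H': 2, 'I': 3, 'L': 6, 'K': 2, 'M': 1, 'F': 2, 'P': 4, 'S': 6,
    'T': 4, 'W': 1, 'Y': 2, 'V': 4,
}

def synonymous_sequences(protein: str) -> int:
    """Count the distinct DNA coding sequences for a given protein."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)

# Even a short peptide has many encodings: 1 * 2 * 6 * 4
print(synonymous_sequences("MKLV"))  # 48

# Average degeneracy ~3 per residue, hence ~3^400 for a 400-aa protein
avg = sum(CODON_DEGENERACY.values()) / len(CODON_DEGENERACY)
print(f"average degeneracy: {avg:.2f}, log10(3^400) = {400 * math.log10(3):.0f}")
```

The 190-digit count is why "which synonymous sequence to pick" (codon optimization) is a real engineering problem rather than a trivial lookup.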

Homework Questions from Dr. LeProust

  1. What is the most commonly used method for oligonucleotide synthesis currently? The industry standard is phosphoramidite-based solid-phase synthesis. In this automated process, DNA strands are built nucleotide-by-nucleotide onto a solid bead or chip. Each cycle adds one base with very high efficiency (99-99.5%), allowing for the reliable and scalable production of short DNA sequences.

  2. Why is it difficult to make oligonucleotides longer than ~200 nucleotides by direct synthesis? The limitation is cumulative yield loss. Even with 99.5% efficiency per step, after 200 cycles, the chance of any one strand being fully correct is only about 37%. Beyond this length, the majority of the product is fragments of various lengths, making it extremely difficult and expensive to purify the tiny amount of full-length, error-free DNA.

  3. Why can’t a 2000 bp gene be synthesized directly using oligo synthesis? For a 2000-base-pair gene, the probability of obtaining a perfect, full-length strand from direct chemical synthesis is effectively zero. Instead, scientists synthesize many shorter, overlapping oligonucleotides (typically 40-60 bases long) and stitch them together with enzymatic assembly methods such as Gibson Assembly or PCR-based assembly, which use polymerases, exonucleases, and ligases borrowed from the cell’s own high-fidelity replication and repair machinery.
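The cumulative-yield argument behind both answers can be sketched in a few lines; the 99.5% per-cycle coupling efficiency is the figure quoted above:

```python
# Full-length yield of chemical oligo synthesis falls off geometrically:
# yield = (per-cycle coupling efficiency) ** (number of cycles)

def full_length_yield(length: int, efficiency: float = 0.995) -> float:
    """Fraction of strands that are full-length after `length` coupling cycles."""
    return efficiency ** length

for n in (60, 200, 2000):
    print(f"{n:>5} nt: {full_length_yield(n):.3%} full-length product")
# 60 nt   -> ~74%     (why assembly oligos are kept short)
# 200 nt  -> ~37%     (practical limit of direct synthesis)
# 2000 nt -> ~0.004%  (effectively zero; must assemble from fragments)
```

The geometric decay is also why small improvements in coupling efficiency matter so much: each extra "9" of per-step yield roughly multiplies the usable synthesis length.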

Homework Question from George Church

  1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”? The ten amino acids that animals cannot synthesize and must obtain from food are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine.

The “Lysine Contingency” (the fail-safe plot device in Jurassic Park, in which engineered dinosaurs were made dependent on dietary lysine) highlights an evolutionary trap. The fact that lysine is essential means animal ancestors permanently lost the complex biochemical pathways to produce it, likely because they lived in environments rich in lysine (e.g., eating plants). Once lost, these pathways are virtually impossible to re-evolve. This creates a fundamental nutritional dependency that shapes all animal ecology—from what we eat to how food webs are structured—and underscores how evolution can constrain future possibilities by eliminating unused metabolic options.

  2. What code would you suggest for AA:AA interactions?

For AA:AA interactions, I would suggest a code based on hydrophobicity scales. The rule is that amino acids with similar hydrophobicity values have a high propensity to interact, with hydrophobic ones driving core packing and hydrophilic ones favoring surface exposure. This provides a powerful, simplified model because the drive to sequester hydrophobic residues from water is the fundamental organizing principle of protein folding.
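A minimal sketch of this pairing rule, using the well-known Kyte-Doolittle hydropathy scale; the 3.0 similarity window is an illustrative cutoff I chose, not a fitted parameter:

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KD = {
    'I': 4.5, 'V': 4.2, 'L': 3.8, 'F': 2.8, 'C': 2.5, 'M': 1.9, 'A': 1.8,
    'G': -0.4, 'T': -0.7, 'S': -0.8, 'W': -0.9, 'Y': -1.3, 'P': -1.6,
    'H': -3.2, 'E': -3.5, 'Q': -3.5, 'D': -3.5, 'N': -3.5, 'K': -3.9, 'R': -4.5,
}

def interaction_propensity(aa1: str, aa2: str) -> str:
    """Classify a residue pair by the proposed hydrophobicity rule.

    Residues of similar hydropathy are predicted to associate:
    both hydrophobic -> core packing, both hydrophilic -> surface contact.
    The 3.0 similarity window is an illustrative cutoff, not a fitted value.
    """
    h1, h2 = KD[aa1], KD[aa2]
    if abs(h1 - h2) > 3.0:  # dissimilar hydropathy: weak pairing expected
        return "unfavorable"
    return "core packing" if (h1 + h2) / 2 > 0 else "surface contact"

print(interaction_propensity('L', 'I'))  # hydrophobic pair -> core packing
print(interaction_propensity('K', 'E'))  # hydrophilic pair -> surface contact
print(interaction_propensity('L', 'K'))  # mismatched pair -> unfavorable
```

A single scalar per residue obviously ignores charge complementarity, aromatic stacking, and disulfides, but it captures the dominant folding signal the answer describes.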

Subsections of Homework

Week 1 HW: Principles and Practices

  1. The Biological Engineering Tool I Want to Develop: The Gut-Longevity Diagnostic Platform

Emerging research firmly establishes the gut microbiome as a key modulator of systemic inflammation, metabolic health, and even the rate of biological aging. The metabolic outputs of our gut bacteria—particularly short-chain fatty acids (SCFAs) like butyrate—are directly linked to immune regulation, insulin sensitivity, and cellular repair pathways. I propose developing a diagnostic platform to functionally map this ecosystem and provide actionable insights for promoting healthspan.

The Gut-Longevity Diagnostic is an at-home testing system that moves beyond static genomic sequencing. A user sample is exposed to a standardized panel of prebiotic substrates within a disposable cartridge containing engineered biosensors. These sensors measure the real-time, functional metabolic output—the specific SCFAs and gases produced by the user’s unique microbial community. A validated algorithm interprets this dynamic functional profile against longitudinal health data, generating a personalized, food-based nutritional prescription designed to steer the microbiome toward an anti-inflammatory, metabolically healthy, pro-longevity phenotype.

The goal is to transform gut health from a vague concept into a measurable, modifiable pillar of preventative medicine, providing a science-backed tool for conscious dietary choices aimed at healthy aging.

  2. Governance & Policy Goals for an Ethical Future

The primary goal is Beneficence and Non-Maleficence in Preventative Health: ensuring the tool delivers real health benefits while rigorously preventing harm across diverse populations.

Sub-goal 1.1: Ensure Clinical Validity and Safety. The algorithm’s nutritional recommendations must be grounded in robust clinical evidence to avoid harm. Incorrect advice could exacerbate conditions like metabolic syndrome or IBD. Governance must mandate rigorous validation against health outcomes, not just correlation.

Sub-goal 1.2: Prevent Biological Data Exploitation. Gut microbiome data is highly personal predictive information. Governance must establish it as a protected health entity, preventing its use by insurers or employers for discrimination or by third parties for unauthorized manipulation (e.g., targeted advertising for unhealthy foods).

Sub-goal 1.3: Architect for Equitable Access from Inception. To avoid exacerbating health disparities, the technology’s design and business model must prioritize accessibility. Governance should incentivize affordable, scalable solutions suitable for integration into public health initiatives for aging populations.

  3. Three Potential Governance Actions

Action 1: A “Functional Diagnostic” Pre-Market Framework.

Purpose: Create a new regulatory pathway for tools that provide functional health analysis and dietary advice, distinct from medical devices or supplements.

Design: A consortium of regulatory agencies, microbiologists, and nutrition scientists defines evidence tiers for claims. Developers must achieve a given tier before marketing.

Assumptions: Regulators can adapt quickly, and predefined evidence standards will accelerate, not hinder, responsible innovation.

Risks: Over-standardization could stifle novel approaches. Success could create a two-tier system where only well-funded entities achieve the highest validation tiers.

Action 2: A Microbiome Data Commons with Granular User Control.

Purpose: Shift data ownership from corporations to individuals by creating a user-controlled, interoperable data repository.

Design: A non-profit or public entity develops the open-source platform. Users hold encryption keys, granting time-limited, specific access to researchers or apps via a “data wallet.”

Assumptions: Users will manage their keys responsibly. Researchers will participate despite more complex data access procedures.

Risks: Increased platform complexity and liability. If poorly adopted, it could fragment the data ecosystem further.

Action 3: Public-Private Development of a Core, Open-Source Algorithm.

Purpose: Ensure the core science remains transparent and auditable, preventing proprietary “black boxes” from dominating a public health field.

Design: A government-funded research center develops and validates a base algorithm using diverse, ethically-sourced data. Commercial entities build applications on this audited core.

Assumptions: An open-source model can achieve and maintain clinical-grade accuracy. Public funding will be sustained.

Risks: The core model could become outdated without continuous public investment. Commercial forks could deviate from safety guidelines.

  4. Scoring Governance Actions Against Policy Goals

(Scoring: 1 = Best, 3 = Worst, n/a = Not Applicable)

| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 3 | 2 |
| • By helping respond | 1 | 2 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | 1 | 3 |
| • By helping respond | 3 | 1 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 3 | 2 | 1 |
| • By helping respond | 2 | 1 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility? | 1 | 2 | 2 |
| • Not impede research | 2 | 1 | 1 |
| • Promote constructive applications | 1 | 1 | 1 |
  5. My Recommendation & Trade-Offs

To a National Institute on Aging or Public Health Agency:

I recommend prioritizing Action 3 (Open-Source Core) supported by Action 1 (Diagnostic Framework). This combination best serves long-term public health goals.

Why This Combination? Developing a publicly-audited core algorithm (Action 3) is a strategic investment that ensures foundational science remains a public good, fosters innovation on a level playing field, and directly enables equitable access by lowering the cost of entry. A clear, tiered Action 1 framework provides the necessary guardrails for safety and validity without prematurely stifling innovation around this open core.

The Trade-Off: This approach consciously accepts that early commercial market development may be less lucrative, potentially slowing initial private investment. However, it prioritizes long-term ecosystem health, scientific transparency, and equitable dissemination over short-term market capture by a few entities.

The Uncertainty: The major unknown is whether a publicly maintained model can match the rapid iteration pace of well-funded private labs. This requires a commitment to sustainable, competitive funding for the public core development team.

  6. Personal Ethical Reflection

This week’s work crystallized a critical ethical tension: the gap between personalized health technology and population-level health justice. A tool optimized for “longevity” risks being calibrated using data from affluent, already-healthy cohorts, potentially pathologizing normal variations in gut flora from underrepresented groups or labeling their traditional diets as “suboptimal.” This risks a new form of biomedical marginalization.

Therefore, a non-negotiable governance action must be mandatory diversity and inclusion in foundational research. Any public funding or regulatory approval for such platforms should require that the training datasets and validation cohorts are representative of global genetic, dietary, and socioeconomic diversity. This is not merely an ethical imperative but a scientific one, ensuring the resulting tools are robust, generalizable, and truly serve the goal of healthspan for all.