Week 1 HW: Principles and Practices
Class assignment
1. First, describe a biological engineering application or tool you want to develop and why.
My brother has autism, which is why this area is personal for me. As a CS/AI master’s student, I find it exciting that I can use AI protein design tools like AlphaFold to work on something that actually matters to me and my family.
A recent study by Trudler et al. (2025) at Scripps Research used patient-derived brain organoids (“mini-brains”) to show that mutations in the MEF2C gene — responsible for a severe form of ASD — disrupt the expression of specific microRNAs (miR-9, miR-124, miR-128). These miRNAs normally guide developing brain cells to become the right type of neuron; when they are dysregulated, the balance between excitatory and inhibitory neurons is lost, leading to the hyperexcitability associated with autism.
Reference: Trudler, D. et al. “Dysregulation of miRNA expression and excitation in MEF2C autism patient hiPSC-neurons and cerebral organoids.” Molecular Psychiatry 30, 1479–1496 (2025). DOI: 10.1038/s41380-024-02761-9
I want to use AI protein design tools (AlphaFold, ESMFold, Rosetta — covered in HTGAA Weeks 4-5) to design a small protein or peptide that can bind to and modulate these autism-associated miRNAs. The designed protein would be expressed using cell-free synthesis (Week 9) and characterized with mass spectrometry (Week 10). The gene would be ordered from Twist Bioscience. I want to develop this because it sits right at the intersection of my CS/AI background and the wet lab techniques taught in HTGAA, and because AI-designed proteins targeting neurodevelopment could open up new research directions for autism.
2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future.
My primary goal is to make sure AI-designed proteins targeting neurodevelopment are safe, responsibly used, and beneficial to the autism community. The sub-goals are:
- Prevent AI-designed molecules from causing unintended biological harm.
- Ensure traceability of AI-generated designs so that any problems can be traced back to their source.
- Keep AI protein design tools accessible and not locked behind excessive regulation.
- Ensure the autism community benefits and is not exploited by research done in their name.
AI-designed proteins have great potential to advance our understanding and, eventually, treatment of neurodevelopmental conditions, but they could also be misused or cause harm if not properly governed.
3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
Action 1: Mandatory biosafety screening for AI-designed proteins at DNA synthesis providers.
- Purpose: Currently, synthesis companies like Twist screen orders against databases of known pathogens. But AI-designed novel proteins are not in these databases. We would extend screening to include computational toxicity and function prediction for novel sequences before synthesis.
- Design: Synthesis providers (Twist, IDT, GenScript) would add a screening layer using protein function prediction tools to flag designs resembling known toxins or immune modulators. The IGSC would coordinate standards. Funding could come from industry fees and government biosecurity grants.
- Assumptions: This assumes computational tools can reliably predict function from sequence alone, which is improving but still imperfect. It also assumes providers would voluntarily adopt this or that regulation would compel them.
- Risks: If screening is too aggressive, it could block legitimate research orders (like my HTGAA project). If successful, it could create a false sense of security — “it passed screening” does not mean “it is safe.” Bad actors could also bypass providers using benchtop synthesis.
Action 2: Institutional ethics review for AI-designed therapeutics targeting neurodevelopment.
- Purpose: Currently, IBCs review recombinant DNA work and IRBs review human subjects research. But there is no specific review for AI-designed molecules targeting brain development. We would add a lightweight ethics checklist when projects involve AI-designed proteins and neurodevelopmental targets.
- Design: Universities would add a checklist to existing IBC review covering scientific justification, community benefit assessment, and transparent reporting. This could be piloted at MIT and Harvard first. The cost would be minimal since it piggybacks on existing infrastructure.
- Assumptions: This assumes institutions will adopt the extra step without external mandate, and that the review can be lightweight enough not to discourage student projects.
- Risks: If the review is too burdensome, students may avoid neurodevelopmental projects altogether. If too light, it becomes a rubber stamp. There is also a risk of paternalism — who decides what “benefits” the autism community?
Action 3: Open sharing of AI protein design methods and results through conferences and preprints.
- Purpose: To share beneficial use cases of AI-designed proteins, foster collaboration between CS/AI and biology researchers, and disseminate safety learnings so the community can self-correct.
- Design: This requires coordinating the AI protein design and synthetic biology communities. Organizations like iGEM or the Protein Society could host dedicated sessions. Researchers would be encouraged to publish designs, validation results, and failure cases openly.
- Assumptions: I assume that researchers would be interested in attending and discussing, and that open sharing does more good than harm.
- Risks: Open sharing of protein designs could be misused. Ethics could be overlooked in favor of scientific progress. However, keeping designs secret is arguably more dangerous because it prevents community oversight.
4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.
| Does the action: | Action 1: Synthesis Screening | Action 2: Ethics Review | Action 3: Open Sharing |
|---|---|---|---|
| **Enhance biosecurity** | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 1 | 3 | 1 |
| **Foster lab safety** | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| **Protect the environment** | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| **Other considerations** | | | |
| • Minimizing costs and burdens to stakeholders | 3 | 2 | 1 |
| • Feasibility | 2 | 1 | 1 |
| • Not impede research | 3 | 2 | 1 |
| • Promote constructive applications | 2 | 2 | 1 |
5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why.
I think a combination of Action 2 (ethics review) and Action 3 (open sharing) would work best as immediate steps, with Action 1 (synthesis screening) as a longer-term goal.
Action 1 scores best on biosecurity and safety, but it is the most burdensome and could slow down student research. I would not want my own Twist order for this course to get delayed or rejected by an overly aggressive screening algorithm. This is better pursued as a long-term industry-wide initiative rather than something individual institutions can do on their own.
Action 2 is the most feasible to implement right now. MIT already has IBC infrastructure, and adding a lightweight neurodevelopment checklist requires no new funding or legislation. Action 3 is also easy to start and promotes the kind of interdisciplinary collaboration that makes AI protein design safer through community oversight.
This analysis rests on several assumptions — chiefly that AI-designed proteins pose meaningfully different risks from traditionally designed ones, which may not be true yet but will become more relevant as the tools improve. There is also a tension between accessibility and oversight. As a CS student entering biology for the first time through HTGAA, I benefit a lot from open tools and low barriers. Over-regulation could discourage exactly the kind of interdisciplinary work this course promotes. But as someone with a family member with autism, I also understand why this community can be wary of researchers who study autism without engaging with autistic people. These uncertainties can be mitigated by keeping review processes lightweight and making sure they include input from the autism community itself.
Week 2 lecture prep
Homework Questions from Joe Jacobson
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
DNA polymerase has a raw error rate of about 10⁻⁴ to 10⁻⁶ per base pair during nucleotide insertion. With the built-in 3’→5’ proofreading exonuclease, accuracy improves to around 10⁻⁷ to 10⁻⁸. The human genome is roughly 3.2 × 10⁹ base pairs (about 6.4 billion for the diploid genome), so even at 10⁻⁸ a single copy would still contain on the order of 30 errors. Biology closes this gap with post-replication mismatch repair (initiated by proteins such as MutS), which catches errors that proofreading missed and brings the final error rate down to about 1 per 10⁹ to 10¹⁰ nucleotides — low enough to reliably copy a genome this large.
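A quick back-of-the-envelope check of these numbers (using illustrative midpoints of the rate ranges above, so the exact figures are assumptions, not measurements):

```python
# Expected errors per haploid genome copy at each stage of replication
# fidelity. Error rates are illustrative midpoints of the quoted ranges.

GENOME_BP = 3.2e9  # haploid human genome, base pairs

stages = {
    "raw polymerase (no proofreading)": 1e-5,
    "with 3'->5' proofreading": 1e-7,
    "after mismatch repair": 1e-9,
}

for name, error_rate in stages.items():
    expected_errors = GENOME_BP * error_rate
    # ~32,000 -> ~320 -> ~3 errors per copy as fidelity layers stack
    print(f"{name}: ~{expected_errors:,.0f} errors per haploid copy")
```

Each fidelity layer buys roughly two orders of magnitude, which is why all three are needed for a 3.2-gigabase genome.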
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
The genetic code is degenerate — 64 codons encode 20 amino acids plus 3 stop signals, leaving 61 sense codons, or about 3 synonymous codons per amino acid on average. The slides mention the average human protein coding sequence is about 1036 base pairs, i.e. roughly 345 codons (345 amino acids). That gives roughly 3³⁴⁵ ≈ 10¹⁶⁵ possible DNA sequences for one protein — an astronomically large number.
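A sanity check of this estimate, assuming a flat ~3 synonymous codons per residue (the exact count depends on amino-acid composition, since Leu/Ser/Arg have 6 codons each while Met/Trp have only 1):

```python
from math import log10

# Degeneracy of the standard genetic code
sense_codons = 64 - 3          # 3 stop codons
avg_codons = sense_codons / 20  # ~3.05 synonymous codons per amino acid

# Number of distinct DNA sequences encoding a ~345-residue protein,
# using ~3 codons per residue as a round-number estimate
n_aa = 345
log10_ways = n_aa * log10(3)
print(f"~10^{log10_ways:.0f} possible coding sequences")  # ~10^165
```

The result is insensitive to the exact average: even 2 codons per residue would give 2³⁴⁵ ≈ 10¹⁰⁴ alternatives.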
In practice most don’t work because organisms prefer certain codons that match their tRNA pools (codon usage bias), and using rare codons slows the ribosome and drops protein yield. Other issues include mRNA secondary structure and GC content blocking transcription or translation, accidental creation of regulatory signals like splice sites, and changes in translation speed that alter co-translational protein folding — affecting structure, solubility, or stability even when the amino acid sequence is identical. This is why codon optimization is standard practice in protein engineering.
Homework Questions from Emily Leproust
What’s the most commonly used method for oligo synthesis currently?
The phosphoramidite method — solid-phase chemical synthesis originally developed by Caruthers. Oligos are built stepwise on a solid support like controlled pore glass (CPG), adding one nucleotide at a time through cycles of detritylation, coupling, oxidation, and capping. It is highly automatable and forms the basis of all modern oligo synthesis platforms.
Why is it difficult to make oligos longer than 200 nt via direct synthesis?
Each coupling step has an efficiency of about 99%, but these small errors compound exponentially. A 200-mer at 99% efficiency gives only ~13% theoretical full-length yield, and in practice it’s much lower. Longer sequences accumulate more deletions, truncations, and depurination from the repeated harsh chemical cycles. Longer strands also form secondary structures that hinder reagent diffusion and coupling on porous supports like CPG. The result is that failure sequences (n-1 mers, mutations) dominate the output, and purifying the correct full-length product becomes impractical.
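The exponential falloff is easy to see numerically. This sketch assumes a fixed 99% per-cycle coupling efficiency and ignores depurination and other side reactions, so real yields are lower still:

```python
# Theoretical full-length yield of solid-phase oligo synthesis:
# each added nucleotide multiplies the yield by the coupling efficiency,
# so the full-length fraction decays exponentially with length.

def full_length_yield(length_nt: int, coupling_eff: float = 0.99) -> float:
    """Fraction of strands that are full length after length_nt couplings."""
    return coupling_eff ** length_nt

for n in (60, 100, 200, 2000):
    # ~55% at 60 nt, ~13% at 200 nt, effectively zero at 2000 nt
    print(f"{n:>5} nt: {full_length_yield(n):.2%} full-length")
```

This is also the quantitative core of the next answer: at 2000 nt the theoretical yield is below one part per billion, which is why genes are assembled from shorter oligos instead.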
Why can’t you make a 2000 bp gene via direct oligo synthesis?
The above answer explains why longer oligos become dramatically harder to make. At 2000 bp, cumulative coupling inefficiency and side reactions cause full-length product yield to approach zero. Unlike enzymatic replication, chemical synthesis has no proofreading, so deletions, insertions, and substitutions build up rapidly. Long sequences also form stable hairpins that block reagent access. Instead, modern gene synthesis assembles shorter oligos (~60–200 nt) into longer fragments using enzymatic methods like Gibson Assembly, followed by sequence verification.
Homework Question from George Church
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The 10 essential amino acids are: lysine, methionine, tryptophan, threonine, valine, isoleucine, leucine, arginine, histidine, and phenylalanine. Animals cannot synthesize these and must obtain them from dietary sources.
The “Lysine Contingency” is from Jurassic Park — a genetic modification to make the dinosaurs unable to produce lysine so they would die without human-provided supplements. However, lysine is already one of the 10 essential amino acids, so animals cannot produce it anyway. The dinosaurs could simply get lysine by eating plants, meat, or bacteria, making this a scientifically dubious plot point.
That said, the concept highlights something real about food security. Lysine is a limiting amino acid in cereal-based diets — staple crops like maize, rice, and wheat are all lysine-deficient relative to animal nutritional needs (Galili, 2002). Growth and health can be constrained by lysine availability even when total protein intake is sufficient. This is why biotechnological interventions like microbial lysine production or high-lysine crops can have outsized impacts on food security and animal productivity. The lysine contingency, while fictional, illustrates how molecular-level biochemical constraints shape global food systems and ecological dependencies.
AI use disclosure: applied AI support for early-stage project ideation through informal, conversational brainstorming (including prompts such as “My background is [X]. I’m interested in autism in the HTGAA course — what can I do?”), as well as subsequent formatting, structural organization, and language refinement of written outputs.