Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Class assignment 1. First, describe a biological engineering application or tool you want to develop and why. As a CS/AI master’s student, I find it exciting that I can use AI protein design tools like AlphaFold to work on something that actually matters.

  • Week 10 HW: Advanced Imaging & Measurement Technology

  • Week 11 HW: Bioproduction & Cloud Labs

    ((in progress…))

  • Week 12 HW: Building Genomes

    ((in progress…))

  • Week 2 HW: DNA Read, Write and Edit

    Part 1: Benchling & In-silico Gel Art For this exercise, the full genome of Bacteriophage Lambda (GenBank accession J02459.1, 48,502 bp) was imported into Benchling from NCBI. A virtual restriction enzyme digestion was performed using seven enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. Each enzyme was applied individually to identify its recognition sites across the Lambda genome. The digest results were visualized using Benchling’s simulated gel electrophoresis tool. To get a nice visual, I tried different ladders and different lane orderings. The final output consisting of for each enzyme’s fragment pattern is below.

  • Week 3 HW: Lab Automation

    Part 1: Python Script Opentron Artwork Opentrons Colab for Source Code My Jelly Smiley Design Part 2: Post-Lab Questions Question 1: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

  • Week 4 HW: Protein Design I

    ((in progress…))

  • Week 5 HW: Protein Design II

    Part 1: Generate Binders with PepMLM I retrieved the human SOD1 sequence from UniProt (P00441) in FASTA format. Original canonical sequence: >sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ I introduced the A4V mutation by changing Alanine (A) to Valine (V) at position 4 of the mature protein sequence (index 4, 0-based).

  • Week 6 HW: Genetic Circuits I

    1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion HF PCR Master Mix is a ready-to-use 2X solution that contains everything needed for PCR except the template, primers, and water. The most important component is Phusion DNA Polymerase, which is the enzyme that copies DNA during the reaction. The standard alternative is Taq polymerase, a widely used enzyme isolated from a heat-resistant bacterium called Thermus aquaticus. Taq survives the high temperatures of PCR but has no proofreading ability, so it cannot correct mistakes it makes while copying, resulting in a relatively high error rate. Phusion addresses this by including a proofreading domain that catches and fixes errors as they occur, making it about 50 times more accurate than Taq. This accuracy is important in a mutagenesis experiment where only the specific, intended mutations should be introduced and any unintended errors would compromise the result.
  • Week 7 HW: Neuromorphic Circuits

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) 1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Boolean genetic circuits treat signals like 0 or 1, meaning every input must pass a hard on/off threshold. However, this can lose useful information. This can be limiting because real biological signals are usually not binary. Inside cells, signals often exist as gradual changes such as in concentration level. IANNs are useful because they keep more of that analog information since they are not only asking “is this signal present or absent?”.

  • Week 9 HW: Cell-Free Systems

Subsections of Homework

Week 1 HW: Principles and Practices

Class assignment

1. First, describe a biological engineering application or tool you want to develop and why.

As a CS/AI master’s student, I find it exciting that I can use AI protein design tools like AlphaFold to work on something that actually matters.

A recent study by Trudler et al. (2024) at Scripps Research used patient-derived brain organoids (“mini-brains”) to show that mutations in the MEF2C gene — responsible for a severe form of ASD — disrupt the expression of specific microRNAs (miR-9, miR-124, miR-128). These miRNAs normally guide developing brain cells to become the right type of neuron; when they are dysregulated, the balance between excitatory and inhibitory neurons is lost, leading to hyperexcitability associated with autism.

Reference: Trudler, D. et al. “Dysregulation of miRNA expression and excitation in MEF2C autism patient hiPSC-neurons and cerebral organoids.” Molecular Psychiatry 30, 1479–1496 (2025). DOI: 10.1038/s41380-024-02761-9

I want to use AI protein design tools (AlphaFold, ESMFold, Rosetta — covered in HTGAA Weeks 4-5) to design a small protein or peptide that can bind to and modulate these autism-associated miRNAs. The designed protein would be expressed using cell-free synthesis (Week 9) and characterized with mass spectrometry (Week 10). The gene would be ordered from Twist Bioscience. I want to develop this because it sits right at the intersection of my CS/AI background and the wet lab techniques taught in HTGAA, and because AI-designed proteins targeting neurodevelopment could open up new research directions for autism.

My primary goal is to make sure AI-designed proteins targeting neurodevelopment are safe, responsibly used, and beneficial to the autism community. The sub-goals are:

  • Prevent AI-designed molecules from causing unintended biological harm.
  • Ensure traceability of AI-generated designs so problems can be traced back.
  • Keep AI protein design tools accessible and not locked behind excessive regulation.
  • Ensure the autism community benefits and is not exploited by research done in their name.

AI-designed proteins have the possibility to be used for great benefit in understanding and eventually treating neurodevelopmental conditions, but could also be misused or cause harm if not properly governed.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).

Action 1: Mandatory biosafety screening for AI-designed proteins at DNA synthesis providers.

  • Purpose: Currently, synthesis companies like Twist screen orders against databases of known pathogens. But AI-designed novel proteins are not in these databases. We would extend screening to include computational toxicity and function prediction for novel sequences before synthesis.
  • Design: Synthesis providers (Twist, IDT, GenScript) would add a screening layer using protein function prediction tools to flag designs resembling known toxins or immune modulators. The IGSC would coordinate standards. Funding could come from industry fees and government biosecurity grants.
  • Assumptions: This assumes computational tools can reliably predict function from sequence alone, which is improving but still imperfect. It also assumes providers would voluntarily adopt this or that regulation would compel them.
  • Risks: If screening is too aggressive, it could block legitimate research orders (like my HTGAA project). If successful, it could create a false sense of security — “it passed screening” does not mean “it is safe.” Bad actors could also bypass providers using benchtop synthesis.

Action 2: Institutional ethics review for AI-designed therapeutics targeting neurodevelopment.

  • Purpose: Currently, IBCs review recombinant DNA work and IRBs review human subjects research. But there is no specific review for AI-designed molecules targeting brain development. We would add a lightweight ethics checklist when projects involve AI-designed proteins and neurodevelopmental targets.
  • Design: Universities would add a checklist to existing IBC review covering scientific justification, community benefit assessment, and transparent reporting. This could be piloted at MIT and Harvard first. The cost would be minimal since it piggybacks on existing infrastructure.
  • Assumptions: This assumes institutions will adopt the extra step without external mandate, and that the review can be lightweight enough not to discourage student projects.
  • Risks: If the review is too burdensome, students may avoid neurodevelopmental projects altogether. If too light, it becomes a rubber stamp. There is also a risk of paternalism — who decides what “benefits” the autism community?

Action 3: Open sharing of AI protein design methods and results through conferences and preprints.

  • Purpose: To share beneficial use cases of AI-designed proteins, foster collaboration between CS/AI and biology researchers, and disseminate safety learnings so the community can self-correct.
  • Design: This requires coordinating the AI protein design and synthetic biology communities. Organizations like iGEM or the Protein Society could host dedicated sessions. Researchers would be encouraged to publish designs, validation results, and failure cases openly.
  • Assumptions: I assume that researchers would be interested in attending and discussing, and that open sharing does more good than harm.
  • Risks: Open sharing of protein designs could be misused. Ethics could be overlooked in favor of scientific progress. However, keeping designs secret is arguably more dangerous because it prevents community oversight.

4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

Does the action:Action 1: Synthesis ScreeningAction 2: Ethics ReviewAction 3: Open Sharing
Enhance Biosecurity
• By preventing incidents122
• By helping respond131
Foster Lab Safety
• By preventing incident122
• By helping respond221
Protect the environment
• By preventing incidents122
• By helping respond221
Other considerations
• Minimizing costs and burdens to stakeholders321
• Feasibility?211
• Not impede research321
• Promote constructive applications221

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why.

I think a combination of Action 2 (ethics review) and Action 3 (open sharing) would work best as immediate steps, with Action 1 (synthesis screening) as a longer-term goal.

Action 1 scores best on biosecurity and safety, but it is the most burdensome and could slow down student research. I would not want my own Twist order for this course to get delayed or rejected by an overly aggressive screening algorithm. This is better pursued as a long-term industry-wide initiative rather than something individual institutions can do on their own.

Action 2 is the most feasible to implement right now. MIT already has IBC infrastructure, and adding a lightweight neurodevelopment checklist requires no new funding or legislation. Action 3 is also easy to start and promotes the kind of interdisciplinary collaboration that makes AI protein design safer through community oversight.

Here many assumptions are made — mainly that AI-designed proteins pose meaningfully different risks from traditionally designed ones, which may not be true yet but will become more relevant as the tools improve. There is also a tension between accessibility and oversight. As a CS student entering biology for the first time through HTGAA, I benefit a lot from open tools and low barriers. Over-regulation could discourage exactly the kind of interdisciplinary work this course promotes. I also understand why the autism community can be wary of researchers who study autism without engaging with autistic people. These uncertainties can be mitigated by keeping review processes lightweight and making sure they include input from the autism community itself.


Week 2 lecture prep

Homework Questions from Joe Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA polymerase has a raw error rate of about 10⁻⁴ to 10⁻⁶ per base pair during nucleotide insertion. With the built-in 3’→5’ proofreading exonuclease, accuracy improves to around 10⁻⁷ to 10⁻⁸. The human genome is roughly 3.2 × 10⁹ base pairs (about 6.3 billion for the diploid genome), so at that rate there would still be many errors per copy. Biology solves this through mismatch repair mechanisms (such as MutS) that catch errors proofreading missed, bringing the final error rate down to about 1 per 10⁹ to 10¹⁰ nucleotides — low enough to reliably copy a genome this large.

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

The genetic code is degenerate — 64 codons encode 20 amino acids plus 3 stop signals. The slides mention the average human protein is about 1036 base pairs, which is roughly 345 codons / 345 amino acids. With about 3 possible codons per amino acid on average, that gives roughly 3³⁴⁵ ≈ 10¹⁷⁹ possible DNA sequences for one protein — a huge number.

In practice most don’t work because organisms prefer certain codons that match their tRNA pools (codon usage bias), and using rare codons slows the ribosome and drops protein yield. Other issues include mRNA secondary structure and GC content blocking transcription or translation, accidental creation of regulatory signals like splice sites, and changes in translation speed that alter co-translational protein folding — affecting structure, solubility, or stability even when the amino acid sequence is identical. This is why codon optimization is standard practice in protein engineering.

Homework Questions from Emily Leproust

What’s the most commonly used method for oligo synthesis currently?

The phosphoramidite method — solid-phase chemical synthesis originally developed by Caruthers. Oligos are built stepwise on a solid support like controlled pore glass (CPG), adding one nucleotide at a time through cycles of detritylation, coupling, oxidation, and capping. It is highly automatable and forms the basis of all modern oligo synthesis platforms.

Why is it difficult to make oligos longer than 200 nt via direct synthesis?

Each coupling step has an efficiency of about 99%, but these small errors compound exponentially. A 200-mer at 99% efficiency gives only ~13% theoretical full-length yield, and in practice it’s much lower. Longer sequences accumulate more deletions, truncations, and depurination from the repeated harsh chemical cycles. Longer strands also form secondary structures that hinder reagent diffusion and coupling on porous supports like CPG. The result is that failure sequences (n-1 mers, mutations) dominate the output, and purifying the correct full-length product becomes impractical.

Why can’t you make a 2000 bp gene via direct oligo synthesis?

The above answer explains why longer oligos become dramatically harder to make. At 2000 bp, cumulative coupling inefficiency and side reactions cause full-length product yield to approach zero. Unlike enzymatic replication, chemical synthesis has no proofreading, so deletions, insertions, and substitutions build up rapidly. Long sequences also form stable hairpins that block reagent access. Instead, modern gene synthesis assembles shorter oligos (~60–200 nt) into longer fragments using enzymatic methods like Gibson Assembly, followed by sequence verification.

Homework Question from George Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids are: lysine, methionine, tryptophan, threonine, valine, isoleucine, leucine, arginine, histidine, and phenylalanine. Animals cannot synthesize these and must obtain them from dietary sources.

The “Lysine Contingency” is from Jurassic Park — a genetic modification to make the dinosaurs unable to produce lysine so they would die without human-provided supplements. However, lysine is already one of the 10 essential amino acids, so animals cannot produce it anyway. The dinosaurs could simply get lysine by eating plants, meat, or bacteria, making this a scientifically dubious plot point.

That said, the concept highlights something real about food security. Lysine is a limiting amino acid in cereal-based diets — staple crops like maize, rice, and wheat are all lysine-deficient relative to animal nutritional needs (Galili G, 2002). Growth and health can be constrained by lysine availability even when total protein intake is sufficient. This is why biotechnological interventions like microbial lysine production or high-lysine crops can have outsized impacts on food security and animal productivity. The lysine contingency, while fictional, illustrates how molecular-level biochemical constraints shape global food systems and ecological dependencies.

Applied AI support for early-stage project ideation through informal, conversational brainstorming (including: “My background is [X]. I’m interested in autism in the HTGAA course—what can I do?”), as well as subsequent formatting, structural organization, and language refinement of written outputs.

Week 10 HW: Advanced Imaging & Measurement Technology

Week 11 HW: Bioproduction & Cloud Labs

((in progress…))

Week 12 HW: Building Genomes

((in progress…))

Week 2 HW: DNA Read, Write and Edit

Part 1: Benchling & In-silico Gel Art

For this exercise, the full genome of Bacteriophage Lambda (GenBank accession J02459.1, 48,502 bp) was imported into Benchling from NCBI. A virtual restriction enzyme digestion was performed using seven enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. Each enzyme was applied individually to identify its recognition sites across the Lambda genome. The digest results were visualized using Benchling’s simulated gel electrophoresis tool. To get a nice visual, I tried different ladders and different lane orderings. The final output consisting of for each enzyme’s fragment pattern is below.

digest-art digest-art

Experiments with DNA Gel Art Interface. My initials below figure B. K.

initials initials

Note: As a possible extension, it could be cool to build an interactive version of Ronan’s website where you can pick genome, enzymes, drag lanes around, swap ladders, add timing, etc. Could be a nice tool for future HTGAA classes.

Note: As a possible extension, it could be cool to build an interactive version of Ronan’s website where you can pick genome, enzymes, drag lanes around, swap ladders, add timing, etc. Could be a nice tool for future HTGAA classes.

Part 3: DNA Design Challenge

3.1. Choose your protein

I chose PIGH (Phosphatidylinositol N-acetylglucosaminyltransferase subunit H), a human protein encoded by the PIGH gene on chromosome 14q24.1. PIGH is a 188-amino-acid subunit of the GPI-GnT complex, which catalyzes the first step of GPI-anchor biosynthesis — transferring N-acetylglucosamine to phosphatidylinositol on the cytoplasmic side of the endoplasmic reticulum. GPI anchors tether many important proteins to the cell surface.

I chose this protein because mutations in PIGH cause GPIBD17 (Glycosylphosphatidylinositol Biosynthesis Defect 17), a rare autosomal recessive disorder characterized by developmental delay, seizures, and autistic features. The gene was only linked to disease in 2018, so it is likely underdiagnosed. Its small size (188 aa) also makes it practical for the synthesis and expression exercises in this homework.

I obtained the FASTA sequence from UniProt.

>sp|Q14442|PIGH_HUMAN Phosphatidylinositol N-acetylglucosaminyltransferase subunit H OS=Homo sapiens OX=9606 GN=PIGH PE=1 SV=1
MEDERSFSDICGGRLALQRRYYSPSCREFCLSCPRLSLRSLTAVTCTVWLAAYGLFTLCE
NSMILSAAIFITLLGLLGYLHFVKIDQETLLIIDSLGIQMTSSYASGKESTTFIEMGKVK
DIVINEAIYMQKVIYYLCILLKDPVEPHGISQVVPVFQSAKPRLDCLIEVYRSCQEILAH
QKATSTSP

3.2. Reverse Translate

Using the Sequence Manipulation Suite reverse translation tool from bioinformatics.org, I converted the 188-amino-acid protein sequence into a 564 bp DNA sequence using the most likely codons:

>reverse translation of sp|Q14442|PIGH_HUMAN to a 564 base sequence of most likely codons.
atggaagatgaacgcagctttagcgatatttgcggcggccgcctggcgctgcagcgccgc
tattatagcccgagctgccgcgaattttgcctgagctgcccgcgcctgagcctgcgcagc
ctgaccgcggtgacctgcaccgtgtggctggcggcgtatggcctgtttaccctgtgcgaa
aacagcatgattctgagcgcggcgatttttattaccctgctgggcctgctgggctatctg
cattttgtgaaaattgatcaggaaaccctgctgattattgatagcctgggcattcagatg
accagcagctatgcgagcggcaaagaaagcaccacctttattgaaatgggcaaagtgaaa
gatattgtgattaacgaagcgatttatatgcagaaagtgatttattatctgtgcattctg
ctgaaagatccggtggaaccgcatggcattagccaggtggtgccggtgtttcagagcgcg
aaaccgcgcctggattgcctgattgaagtgtatcgcagctgccaggaaattctggcgcat
cagaaagcgaccagcaccagcccg

3.3. Codon Optimization

optimization optimization

I used the GenSmart Codon Optimization Tool to optimize the PIGH coding sequence for expression in E. coli and Staphylococcus aureus. Codon optimization is necessary because different organisms have different preferences for which codons they use most efficiently related to tRNA availability. Each organism has a different pool of tRNA molecules, and some tRNAs are abundant while others are rare. When the ribosome hits a codon whose matching tRNA is scarce in that organism, translation slows down or stalls. A human gene expressed directly in E. coli may use rare codons that slow down or stall translation. Optimization replaces these with codons preferred by the host organism, improving expression levels without changing the protein sequence. Optimization report available here.

Optimized sequence:

ATGGAAGATGAACGTAGCTTTAGTGATATTTGTGGTGGTAGATTAGCACTGCAGCGTCGTTATTATAGCCCGAGCTGTCGTGAATTTTGTCTGAGTTGTCCACGTCTGAGCCTG
CGCAGCCTGACCGCCGTTACTTGTACAGTTTGGTTAGCTGCATATGGTTTATTTACATTATGTGAAAATTCAATGATCTTGTCAGCAGCAATTTTTATCACCCTGCTGGGATTGT
TAGGTTATTTACATTTTGTAAAAATTGATCAAGAAACGCTGCTGATTATTGACAGCCTGGGCATTCAAATGACATCAAGCTATGCTTCAGGTAAAGAAAGTACCACCTTTATTGAA
ATGGGTAAGGTTAAAGATATCGTTATTAATGAAGCGATTTATATGCAAAAAGTAATATACTATCTTTGTATTCTGTTAAAAGATCCTGTTGAACCACATGGTATTTCACAGGTTGTT
CCGGTGTTCCAGAGTGCAAAACCTCGTTTAGATTGTTTAATTGAAGTTTATCGTTCATGCCAGGAAATTTTAGCACATCAAAAAGCAACCAGCACTAGCCCG

3.4. You have a sequence! Now what?

To produce the PIGH protein, the codon-optimized DNA sequence is incorporated into an expression construct containing regulatory elements — a promoter (BBa_J23106), ribosome binding site (BBa_B0034), start codon, coding sequence, 7×His tag for purification, stop codon, and terminator (BBa_B0015). This can be ordered as a synthetic plasmid from Twist Bioscience in a backbone like pTwist Amp High Copy.

Protein production follows the central dogma in two stages. During transcription, RNA polymerase binds the promoter and synthesizes a complementary mRNA strand. During translation, ribosomes assemble at the RBS and read the mRNA in three-nucleotide codons, with tRNA molecules delivering amino acids from the start codon (AUG/methionine) until the UAA stop codon is reached, producing the folded 188-amino-acid PIGH polypeptide.

Cell-Dependent Production: The plasmid could be transformed into E. coli BL21 via heat shock or electroporation, with ampicillin selection to identify successful transformants. The host cell’s machinery and metabolic resources would then drive protein expression. Since PIGH is relatively small and does not appear to require complex eukaryotic modifications, E. coli may be a suitable host. The His-tagged protein could then be purified via Ni-NTA chromatography. This approach tends to be scalable but requires time for cell growth and purification.

Cell-Free Production: Alternatively, a TX-TL cell-free system (e.g., PURExpress) could be used, where the linear expression cassette is mixed directly with ribosomes, polymerase, tRNAs, and amino acids from lysed cells. This would likely produce protein within hours without living cultures, making it potentially ideal for rapid prototyping. PIGH’s small size (~564 bp) should be well within the efficient range for cell-free systems, though yields are generally lower and cost per reaction tends to be higher.

Part 4: Prepare a Twist DNA Synthesis Order

plasmid plasmid

Part 5: DNA Read/Write/Edit

5.1 DNA Read

I would want to sequence the PIGH gene in patients presenting with developmental delay, seizures, and autistic features who lack a molecular diagnosis. Sequencing PIGH in such individuals could potentially help identify pathogenic variants and improve diagnostic rates for GPI-anchor biosynthesis disorders.

Since the PIGH coding region is relatively short (~564 bp), Sanger sequencing would likely be a suitable approach. Sanger is generally considered a first-generation method, reading one sequence at a time rather than in parallel like next-gen methods. I would extract genomic DNA from a patient sample, PCR-amplify the PIGH region using forward and reverse primers, and use the purified product as input for the Sanger reaction. A forward and reverse read should be sufficient to cover the full sequence with overlap.

The Sanger reaction mix contains normal dNTPs along with fluorescently labeled ddNTPs. During extension, the polymerase occasionally incorporates a ddNTP, terminating the chain at that position. This produces fragments of various lengths, each capped with a color for A, T, G, or C. Capillary electrophoresis separates them by size, and a detector reads the color at each position. The output is a chromatogram with color-coded peaks and a derived nucleotide sequence that could be compared against the PIGH reference to look for mutations.

5.2 DNA Write

I would want to synthesize the codon-optimized PIGH expression cassette from Parts 3–4. It could also be useful to order variant versions carrying known disease-associated mutations alongside wild-type to compare their effects on protein function. I would likely use phosphoramidite oligonucleotide synthesis as offered by companies like Twist Bioscience. Short oligos are built base-by-base on a solid support through repeated cycles of de-protection, coupling, capping, and oxidation, then assembled into the full-length gene and sequence-verified. Error rate tends to increase with length, so longer constructs may need to be built from smaller fragments. For PIGH (~564 bp), this should be well within a comfortable range.

5.3 DNA Edit

I would want to correct the M1L mutation (c.1A>T) in the PIGH gene. This change appears to disrupt the start codon and may prevent normal protein production. An adenine base editor (ABE) could be a reasonable choice for this kind of single-base correction. ABE generally uses a nickase Cas9 fused to an adenosine deaminase. A guide RNA directs the complex to the target site, the deaminase converts the target base without creating a double-strand break, and the cell’s repair machinery completes the correction. Inputs would include the ABE protein or mRNA, a guide RNA targeting the mutation region, and cells carrying the variant. Some limitations include the narrow editing window of the deaminase, dependence on a nearby PAM sequence, possible bystander edits on other adenines in the window, and variable delivery efficiency across cells.

Week 3 HW: Lab Automation

Part 1: Python Script Opentron Artwork

Opentrons Colab for Source Code

My Jelly Smiley Design

jelly jelly

Part 2: Post-Lab Questions

Question 1: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Fushimi, K., Nakai, Y., Nishi, A., Suzuki, R., Ikegami, M., Nimura, R., Tomono, T., Hidese, R., Yasueda, H., Tagawa, Y., & Hasunuma, T. (2025). Development of the autonomous lab system to support biotechnology research. Scientific Reports, 15, 6648. https://doi.org/10.1038/s41598-025-89069-y

This paper by Fushimi, Nakai et al. (2025) tackles the problem of optimizing how engineered bacteria produce glutamic acid, a commercially valuable amino acid used in MSG, medicine, and cosmetics. The researchers engineered E. coli by knocking out four genes and overexpressing ten others to redirect its metabolism toward glutamic acid production. However, finding the right growth medium recipe is extremely difficult because the bacteria tightly regulate their own glutamic acid levels to protect themselves from stress like pH changes and osmotic pressure. With dozens of possible nutrient combinations to test, manual experimentation is too slow and the biology is too complex for human intuition alone.

To solve this, they built the Autonomous Lab (ANL) using an Opentrons OT-2 liquid handler, an incubator, centrifuge, plate reader, mass spectrometer, and a robotic arm connecting them all. A Bayesian optimization AI suggests which combinations of calcium, magnesium, cobalt, and zinc concentrations to test, the robots autonomously prepare the media, culture the bacteria, and measure both growth and glutamic acid output, then feed the results back to the AI for the next round.

The system found a medium that nearly doubled E. coli growth using only 24 AI-picked experiments, outperforming a brute-force search of 256 conditions. This is the first fully autonomous closed-loop system for culture medium optimization, demonstrating that affordable tools like the Opentrons can power self-driving biology labs capable of navigating complex biological regulation faster than any human could, opening the door to optimizing production of many other valuable biomolecules the same way.

Part 3: Final Project Ideas

idea1 idea1idea2 idea2idea3 idea3

Week 4 HW: Protein Design I

((in progress…))

Week 5 HW: Protein Design II

Part 1: Generate Binders with PepMLM

I retrieved the human SOD1 sequence from UniProt (P00441) in FASTA format.

Original canonical sequence:

>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

I introduced the A4V mutation by changing Alanine (A) to Valine (V) at position 4 of the mature protein sequence (index 4, 0-based).

Mutant SOD1-A4V sequence:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

The change is visible at positions 4-5: the original MATKA... becomes MATKV...

Using the PepMLM-650M Colab, I ran the model with the following parameters:

ParameterValue
Target sequenceSOD1-A4V (mutant)
Peptide length12
Top-K3
Number of binders4

I generated 4 peptides conditioned on the SOD1-A4V sequence, and added the known SOD1-binding peptide FLYRWLPSRRGG as a positive control for comparison.

Perplexity indicates PepMLM’s confidence that a peptide will bind the target — lower scores indicate higher confidence.

PepMLM perplexity scores output PepMLM perplexity scores output

Part 2: Evaluate Binders with AlphaFold3

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Part 4: Generate Optimized Peptides with moPPIt

Week 6 HW: Genetic Circuits I

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion HF PCR Master Mix is a ready-to-use 2X solution that contains everything needed for PCR except the template, primers, and water.

The most important component is Phusion DNA Polymerase, which is the enzyme that copies DNA during the reaction. The standard alternative is Taq polymerase, a widely used enzyme isolated from a heat-resistant bacterium called Thermus aquaticus. Taq survives the high temperatures of PCR but has no proofreading ability, so it cannot correct mistakes it makes while copying, resulting in a relatively high error rate. Phusion addresses this by including a proofreading domain that catches and fixes errors as they occur, making it about 50 times more accurate than Taq. This accuracy is important in a mutagenesis experiment where only the specific, intended mutations should be introduced and any unintended errors would compromise the result.

The mix also contains dNTPs, which are the four DNA building blocks (dATP, dTTP, dGTP, dCTP) that the polymerase uses to construct new strands. MgCl₂ is included as well, providing magnesium ions that the polymerase requires to function and that also affect how tightly primers bind to the template. Finally, the HF Buffer maintains the right pH and salt levels so the enzyme works reliably across different templates.

2. What are some factors that determine primer annealing temperature during PCR?

The annealing temperature is typically set 2 to 5°C below the melting temperature (Tm) of the primer pair. The Tm is the temperature at which half of the primer-template bonds break apart, so setting the annealing temperature just below it ensures primers bind firmly without sticking to the wrong locations.

Several factors affect Tm. Primer length matters because longer primers form more bonds with the template and therefore require more heat to detach, resulting in a higher Tm. GC content also plays a role since G-C base pairs form three hydrogen bonds while A-T pairs form only two, meaning primers with more G and C bases bind more tightly and have a higher Tm.

Salt concentration in the buffer raises the effective Tm because magnesium and other ions stabilize the DNA duplex. Mismatches lower the Tm because they weaken binding. When a primer contains intentional mismatches, it grips the template less tightly, so a lower annealing temperature is needed to allow binding to occur. Primer concentration and secondary structures such as hairpins can also shift the effective annealing behavior.

3. Compare and contrast PCR and restriction enzyme digests as methods for producing linear DNA fragments.

Both PCR and restriction enzyme digestion produce linear DNA fragments, but they work through different mechanisms and are suited to different situations.

In terms of protocol, restriction digestion is the simpler method. The DNA is mixed with the enzyme in an appropriate buffer and incubated at 37°C for about an hour. PCR is more involved and requires primer design, a reaction setup with polymerase and dNTPs, and a thermocycling program that moves through denaturation, annealing, and extension steps, taking around 90 minutes in total.

The two methods also differ in how flexible they are. Restriction enzymes can only cut at their specific recognition sequences, so the location of the cut is fixed by whatever sites exist in the template. PCR can amplify any region defined by the primer binding sites, giving full control over fragment boundaries. PCR also amplifies the DNA exponentially from a very small starting amount, while restriction digestion simply cuts the DNA that is already present without increasing the quantity.

A key advantage of PCR is the ability to introduce mutations through mismatches built into the primers. The amplified product will carry those changes. Restriction enzymes cannot introduce new sequence and can only cut existing DNA.

PCRRestriction Enzyme Digest
Requires primer design and a thermocycling program. More involved setup.Simpler protocol. Mix with enzyme and buffer, incubate at 37°C.
Fragment boundaries are fully flexible, defined by primer placement.Can only cut at specific recognition sequences already in the DNA.
Primers can introduce mutations into the product.Cannot change sequence, only cuts existing DNA.
Amplifies DNA exponentially from very small amounts.Does not amplify, only cuts what is already present.
Best when custom boundaries, mutations, or overhangs are needed.Best when convenient cut sites already exist and no changes are needed.

In this lab PCR is the right choice because the chromophore mutations need to be introduced and overlapping ends need to be added for Gibson assembly, neither of which restriction digestion can do.

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

The most important requirement is that the fragments have overlapping ends of around 20 to 40 base pairs. These overlaps are what allow the fragments to recognize and stick to each other during Gibson assembly. When designing PCR primers, the overlapping regions need to be built into the primer sequences so that the amplified fragment carries them at its ends. The overlaps must also match the correct neighboring fragment so everything assembles in the right order.

Unlike restriction cloning, Gibson assembly does not require specific cut sites in the DNA. However, the ends must be carefully designed to be complementary to adjacent fragments. After PCR or digestion, the DNA should also be clean and free of contaminants so the assembly enzymes can work efficiently. Running a gel to confirm the fragments are the expected size is a useful check before proceeding to the assembly step.

5. How does plasmid DNA enter E. coli cells during transformation?

Plasmid DNA enters E. coli by making the cell membrane temporarily permeable so the DNA can pass through. There are two common methods for doing this.

The first is heat-shock transformation. The cells are treated with calcium chloride, which neutralizes the negative charges on both the DNA and the membrane. This reduces the natural repulsion between them and allows the DNA to sit against the cell surface. A sudden heat shock at 42°C then causes the membrane to expand rapidly, opening temporary pores. The DNA diffuses through those pores into the cell. The cells are immediately cooled so the membrane reseals, and then given nutrients and time to recover before being plated on selective media.

The second method is electroporation. Instead of heat, a short electrical pulse is applied to the cells. This creates tiny pores in the membrane through which the plasmid DNA can pass. Electroporation is generally more efficient than heat shock and is commonly used when transformation efficiency needs to be very high.

6. Describe another assembly method in detail: Golden Gate Assembly

Golden Gate Assembly is a one-pot cloning method that uses Type IIS restriction enzymes, most commonly BsaI, to join multiple DNA fragments in a single reaction without leaving any scar sequence at the junctions.

The key property of Type IIS enzymes is that they cut outside their recognition sequence rather than within it. This means the recognition site can be placed at the end of a fragment so that when the enzyme cuts, the recognition site itself is removed. What remains is a short custom 4-base-pair overhang at each junction, which the designer specifies in the primer sequence. Because every junction in the assembly has a different 4-base overhang, each fragment can only join its correct neighbor in the correct orientation. Order and direction are determined by the overhang sequences, not by chance.

The reaction contains all fragments, BsaI, and T4 ligase together in one tube. It cycles between 37°C, where the enzyme cuts, and 16°C, where the ligase seals compatible overhangs. Once two fragments are correctly joined, the BsaI recognition site is gone from that junction and the ligated product cannot be re-cut. This drives the reaction toward the fully assembled product over multiple cycles.

Compared to Gibson assembly, Golden Gate offers more precise control over junction sequences and scales well to 10 or more fragments in a single reaction. Gibson is simpler for small assemblies of 2 to 4 fragments because it only requires sequence overlap in the primers and runs isothermally at 50°C. Golden Gate requires that the BsaI recognition sequence does not appear anywhere inside the fragments being assembled, otherwise the enzyme will cut in the wrong place.

Golden GateGibson
Overhang type4-bp sticky ends, custom designed~25 bp single-stranded overlap
Reaction temperatureCycles between 37°C and 16°CIsothermal at 50°C
Scarless junctionsYesYes
Best for10 or more fragments, ordered assembly2 to 4 fragments, simple workflow
Requires RE site in primersYesNo

For this chromophore lab, Gibson is the more practical choice because only two fragments need to be joined. Golden Gate becomes the better option when assembling many parts in a defined order, such as in metabolic pathway engineering or combinatorial library construction.

Benchling

In Benchling, I created a new DNA sequence (type: DNA, topology: circular) to represent the pGEX plasmid. I imported it from the NCBI database using the accession number CVU78874, and confirmed it to be 4983 bp. The plasmid map showed several key annotated regions: the GST CDS (glutathione S-transferase), which is the orange region on the map and acts as a fusion tag so the inserted protein can be purified later using glutathione beads; the tac promoter, which drives expression of the insert and is switched on by adding IPTG to the growth media; and the ampicillin resistance gene, which allows only bacteria that successfully took up the plasmid to survive on selection plates.

CVU78874 CVU78874

I then created a second DNA sequence (type: DNA, topology: linear) for the mCherry insert. From UniProt, I obtained the mCherry amino acid sequence of 236 amino acids and used the Reverse Translate tool at bioinformatics.org to convert it into a 708 bp DNA coding sequence optimized for E. coli. I added the BsaI-compatible overhangs to both ends of the sequence — CGGT at the 5’ end before the ATG start codon, and ACCG at the 3’ end after the stop codon — giving a final insert length of 716 bp.

mcherry mcherry

In the pGEX sequence, I ran a digest using the Type IIS enzyme BsaI and found that it cuts once at position 2113 with 100% efficiency at 37°C. I added annotations for the cut site: 2107–2112 for the BsaI recognition site (GGTCTC), and 2113–2116 for the 4 bp overhang (CGGT) that gets exposed after cutting. Because BsaI cuts 1 nt outside its own recognition sequence, the recognition site is removed after digestion, leaving only the clean 4-nt sticky end for ligation.

Finally, using the Assembly Wizard, I selected Golden Gate Assembly as the method, assigned pGEX as the backbone and mCherry as the insert, and set the enzyme to BsaI. I manually selected the region around positions 2090–2130 as the fragment to provide sufficient flanking sequence for primer binding. The assembled plasmid was generated successfully, with mCherry correctly inserted at the BsaI cut site in the pGEX backbone, producing a circular plasmid where the GST tag and mCherry are expressed together as a fusion protein.

assemble assemble

Week 7 HW: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Boolean genetic circuits treat signals like 0 or 1, meaning every input must pass a hard on/off threshold. However, this can lose useful information. This can be limiting because real biological signals are usually not binary. Inside cells, signals often exist as gradual changes such as in concentration level. IANNs are useful because they keep more of that analog information since they are not only asking “is this signal present or absent?”.

Main advantages of IANNs

  • They preserve more information. Boolean circuits lose information when they convert a signal into ON or OFF. IANNs keep the strength of the signal.

  • They are more robust to noise. A Boolean circuit can fail if a signal is close to its threshold. An IANN responds gradually, so small changes or noise are less likely to cause sudden failure.

  • They degrade smoothly. If an input changes slightly, the output changes slightly. In Boolean circuits, a small change can flip the whole output.

  • They can handle weighted inputs. Some signals can matter more than others. IANNs can assign different “weights” to different biological inputs.

  • They are more efficient for complex decisions. A single IANN neuron can combine several inputs in a way that might require many Boolean gates. Fewer gates can mean: less metabolic burden, less delay, or fewer parts that can fail.

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

IANNs (in theory) could work for any classification problem! For example, IANNs could help a cell decide whether a newly produced protein is likely to be properly folded, or misfolded or whether an environment that cell is placed in is a healthy tissue, inflamed tissue, or damaged tissue. The inputs could include oxygen level, pH, glucose level, inflammatory signals, cell-density signals, stress-response markers etc.

Each input alone is not enough for a reliable decision such as low oxygen might happen normally in some tissues, and inflammation alone does not always mean damage. But the combination of several signals can give a better estimation. A limitation is that these signals are still context-dependent. The same oxygen, pH, or inflammation levels might mean different things in different tissues. So the IANN would need careful calibration, and it might work well in one environment but less reliably in another.

3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.

perceptron perceptron

Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

2-layer-perceptron 2-layer-perceptron

Assignment Part 2: Fungal Materials

What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Most current fungal materials are based on mycelium composites. Mycelium is the filament-like growth network of a fungus. In these materials, the fungus is grown through a plant-based substrate such as straw, corn stalks, or other agricultural waste. As it grows, the mycelium binds the loose material together. The final object is usually dried or heat-treated so the organism stops growing and the remaining material becomes a lightweight composite.

Use Cases:

  • Protective packaging. Mycelium can be grown inside molds to make custom shapes that protect products during shipping. This makes it a possible alternative to expanded polystyrene foam. Its advantages are that it can use agricultural waste, can be compostable, and requires less conventional manufacturing. The main limitations are slower production and lower mechanical strength compared with many petroleum-based foams.

  • Thermal and acoustic insulation. Because these materials are lightweight and porous, they can trap air and absorb sound. This makes them useful for interior panels, sound-absorbing tiles.

mycelium-acoustic-panel mycelium-acoustic-panel
  • Architecture and construction. Mycelium has also been used mostly through experimental bricks, panels, and temporary structures. A well-known example is Hy-Fi, a temporary pavilion at MoMA PS1 built from mycelium bricks grown from agricultural waste. This project showed that mycelium materials can be used at architectural scale, but it also shows the current limits: these materials are better suited for lightweight, temporary, or insulating structures than for directly replacing concrete, steel, or structural timber.
hy-fi-mycelium-pavilion hy-fi-mycelium-pavilion
  • Mycelium-based leather / synthetic vegan leathers (MycoWorks, Bolt Threads) / Design Objects.
eric-klarenbeek-mycelium-chair eric-klarenbeek-mycelium-chair

A distinct product example is the Loop Living Cocoon, a mycelium-based coffin. This is different from packaging or construction because it uses mycelium for an end-of-life product where biodegradability is the main value. The coffin is grown from mycelium and plant fibers and is designed to decompose after burial.

loop-living-cocoon loop-living-cocoon

Overall strengths and weaknesses

The main strengths of fungal materials are:

  • they can be grown from low-value biomass or waste
  • they are lightweight
  • they can be molded into specific shapes
  • they can be compostable or biodegradable
  • they can have useful insulation and acoustic properties
  • their texture and density can be tuned through growth conditions

The main weaknesses are:

  • lower strength than conventional structural materials
  • slower production compared with industrial foam or plastic manufacturing
  • contamination risk during growth
  • limited outdoor durability without coating or treatment

What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Fungi are useful for material applications because they naturally form large, connected, three-dimensional networks. Bacteria are easier to engineer in many cases, especially common lab organisms like E. coli, but bacteria usually produce molecules, biofilms, or materials that need extra processing. Fungi directly grow into bulk structures through hyphal extension and entanglement.

Fungi also grow well on many cheap plant-based substrates. This is important because material production only becomes practical if the feedstock is inexpensive. Many mycelium-forming fungi can use sawdust, straw, coffee grounds, hemp, or agricultural residues.

Another advantage is that fungi are eukaryotes. This means they can sometimes process more complex proteins or biomolecules than bacteria. This matters if the material is expected to produce pigments, enzymes, structural proteins, or other functional additives.

The tradeoff is that fungi are harder to engineer than bacteria. Bacteria grow quickly and have standardized genetic tools. Many mushroom-forming fungi grow more slowly and have less developed engineering toolkits. Tools that work in yeast or common lab fungi do not always transfer easily to species such as oyster mushroom or reishi.

Week 9 HW: Cell-Free Systems