<Yutong Wu> — HTGAA Spring 2026

About me


🌿 Hi, I’m Yutong, Wellesley College ’26, with a background in Computational Cognitive Science & Biological Sciences

🔬 Scientist · 📸 Wildlife Photographer · 🎼 Guzheng Musician
I aspire to be a design engineer in wearable health technology and smart pet technology. I believe nature is the best designer and want to build tools that listen to bodies, human and furry.

🗣️ Languages: Suzhou dialect, Mandarin, English, German · dabbled in Spanish and Hindi


🐾 how i got here

I took my first bio class, Organismal Biology, in my freshman fall, thinking it would be my last.

Spoiler: it wasn’t.

I couldn’t stop signing up for more animal-related bio classes, and somehow that snowballed into fieldwork across three continents.


🌏 where i’ve worked

🏔️ June 2023 · Sanjiangyuan, China — Animal conservation with the Sanjiangyuan Qumalai Ecological Protection & Wildlife Monitoring Association (中国三江源曲麻莱生态保护野生生物监测协会).

🐢 May 2024 · Little Cayman Island — Field research with my Tropical Ecology class on microplastics on epiphytes on seagrass — tiny pollutants, on tiny plants, on bigger plants.

🐒 January 2026 · Isla de los Monos, Iquitos, Peru — Volunteered at a monkey shelter deep in the rainforest. Had the best time hanging out with capuchins (Cebus), howler monkeys (Alouatta), saki monkeys (Pithecia), woolly monkeys (Lagothrix), titi monkeys (Plecturocebus), and tamarins (Saguinus). Survived without putting on any insect repellent.


💡 two things four years taught me

1. I love animals: ALIVE. I can’t do any experiment that involves keeping vertebrates in captivity. Full stop. The paper that keeps me studying bio is this one on the elephant facial motor system. I’m sharing it here because it’s the kind of work that makes me believe in the field.

2. I need my hands in the work. 200% dry lab is not for me. I need to do things — dissect, build, pipette, troubleshoot, get my gloves dirty.


🧬 so… why HTGAA?

That’s exactly why I’m here — exploring more fun bio technologies, and looking for the sweet spot between rigorous science, hands-on tinkering, and the animals (and ideas) I love.

let’s build something cool

Contact info

Homework


Labs


Projects

May 11, 2026

Subsections of <Yutong Wu> — HTGAA Spring 2026

Homework

Weekly homework submissions:

  • Week 1 HW: Principles & Practices

    Project: Physarum-on-a-Chip Environmental Sensor The tool I want to develop is a Physarum-on-a-Chip environmental sensor – a microfluidic device that confines the plasmodium of Physarum polycephalum (slime mold!!) within a controlled chemotactic gradient array, and reads out the organism’s foraging behavior as a chemical-environment signal. Why Physarum Physarum is a single multinucleated cell that solves problems no single cell “should” be able to solve. With no neurons and no central controller, it:

  • Week 2 HW: DNA Read, Write, & Edit

    My protein this week is rhodopsin (RHO, UniProt P08100). It is a photon-sensing G-protein-coupled receptor in rod cells of the retina. As someone who works professionally in photography, this protein is basically my biological counterpart: a single 11-cis-retinal molecule sits in the middle of a 7-transmembrane GPCR and isomerizes to all-trans on absorbing one photon, triggering the entire phototransduction cascade. It is the sensor in the world’s oldest and most refined camera.

  • Week 3 HW: Lab Automation

    1. Published Paper Using Opentrons for a Novel Biological Application Paper: Bryant, J. A., Kellinger, M., Longmire, C., Miller, R., & Wright, R. C. (2023). AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots. Synthetic Biology, 8(1), ysac032. https://doi.org/10.1093/synbio/ysac032 What they built AssemblyTron is an open-source Python package that turns the ~$10k Opentrons OT-2 (with a thermocycler module) into a hands-free DNA-assembly workstation. It plugs into existing assembly-design tools (j5, Cello, Benchling) and executes the resulting build plans directly on the robot, covering three of the most common synbio assembly chemistries:
  • Week 4 HW: Protein Design Part I

    Part A — Conceptual Questions 1. How many molecules of amino acids are in 500 g of meat? Assume meat is roughly 20% protein by weight. The mass of protein is: 500 × 0.20 = 100 grams of protein. Let’s assume the average molecular weight of a protein is 100 g/mol. Therefore:

  • Week 5 HW: Protein Design Part II

    Part 1: Generate Binders with PepMLM The original sequence of SOD1 is: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Mutate the 4th amino acid A to V (A4V): MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence: index Binder Pseudo Perplexity 0 HLYYAVALELKX 13.299815648347872 1 WRSYAVVLELWK 17.97100111129112 2 WRYYPVAAAWKK 11.081842724779028 3 WHYGAVGLRHKX 13.983770011694478 Part 2: Evaluate Binders with AlphaFold3 We submitted each peptide paired with the mutant SOD1 (A4V) sequence to the AlphaFold Server as separate chains to model the protein–peptide complex. All runs used seed 2026616022 for reproducibility.

  • Week 6 HW: Genetic Circuits Part I

    Part 1. Questions 1. Phusion High-Fidelity PCR Master Mix Components Phusion DNA Polymerase — high-fidelity polymerase with 3′→5′ proofreading exonuclease activity; ~50× lower error rate than Taq dNTPs — nucleotide building blocks (dATP, dCTP, dGTP, dTTP) incorporated during strand synthesis HF Buffer + Mg²⁺ — provides optimal pH and ionic conditions; Mg²⁺ is an essential cofactor for polymerase activity Stabilizers — maintain enzyme activity during storage and reaction setup 2. Factors That Determine Primer Annealing Temperature GC content — G·C pairs have 3 H-bonds vs. 2 for A·T, raising T_m Primer length — longer primers = higher T_m Salt/Mg²⁺ concentration — stabilizes duplexes, increases T_m Primer secondary structure — hairpins or self-dimers reduce effective T_m Polymerase used — Phusion tolerates higher T_a than Taq; use NEB Tm Calculator for Phusion Rule of thumb: T_a ≈ T_m of the lower-melting primer (for Phusion)

  • Week 7 HW: Genetic Circuits Part II

    Part 1. Intracellular Artificial Neural Networks Q1. Advantages of IANNs over Traditional Boolean Genetic Circuits A traditional genetic circuit works like a panel of on‑off light switches. Each gene is either fully expressed or completely silent, and the circuit’s output is a strict Boolean function of those binary inputs. An IANN, by contrast, behaves more like a set of dimmer switches connected through a mixing board. Each input can take any value within a continuous range, the connections have adjustable weights, and the final output is a smooth, graded signal instead of a hard 0 or 1.

  • Week 9 HW: Cell Free Systems

    Part 1 1. Advantages of Cell-Free Over In Vivo Expression Cell-free protein synthesis (CFPS) removes the cell as a “black box” and lets you directly control and observe every variable in real time: pH, redox potential, ionic strength, and cofactor concentration. 2. Main Components and Their Roles Component Role Cell extract Provides ribosomes, tRNA, synthetases, chaperones, and machinery DNA/mRNA template Encodes the target protein (plasmid or linear) RNA polymerase Transcribes DNA → mRNA (T7 RNAP is most common) Amino acids Raw building blocks for translation Energy system Supplies and recycles ATP/GTP to power translation Salts and buffer Maintains pH (~7.5) and ionic strength (Mg²⁺, K⁺ critical) Additives Chaperones, detergents, etc., added based on target needs 3. Energy Provision and ATP Regeneration

  • Week 10 HW: Imaging and Measurement

    What to measure: Identity, mass, purity, and post-translational modifications of the target protein; concentration of a biomarker; oligomeric state. How: Intact mass by LC-MS (QTof) → confirms overall MW and detects unexpected modifications. Peptide mapping by tryptic digest + LC-MS/MS → confirms primary sequence and identifies PTM sites. Native MS / CDMS → reveals folded state and oligomeric assembly. SDS-PAGE / Western blot → quick purity and identity check before MS. UV-Vis (A280) → concentration. Part I — Molecular Weight of Intact eGFP Q1. Theoretical MW from sequence Sequence length: 247 residues (includes LE linker + HHHHHH His-tag).

  • Week 11 HW: Bioproduction and Cloud Labs

    Unfortunately I was away at CHI 2026 during the contribution window, so I didn’t get to commit a pixel in time. Part B – Cell-Free Protein Synthesis B1. Role of each component E. coli Lysate BL21 (DE3) Star Lysate (with T7 RNAP): The “factory floor” – a crude cytoplasmic extract carrying ribosomes, tRNAs, aminoacyl-tRNA synthetases, translation factors, and the T7 RNA polymerase needed to transcribe T7-promoter templates. The DE3 Star background also lacks RNase E activity, so mRNAs last longer. Salts / Buffer

Subsections of Homework

Week 1 HW: Principles & Practices

Project: Physarum-on-a-Chip Environmental Sensor

The tool I want to develop is a Physarum-on-a-Chip environmental sensor – a microfluidic device that confines the plasmodium of Physarum polycephalum (slime mold!!) within a controlled chemotactic gradient array, and reads out the organism’s foraging behavior as a chemical-environment signal.

Why Physarum

Physarum is a single multinucleated cell that solves problems no single cell “should” be able to solve. With no neurons and no central controller, it:

  • Finds shortest paths through mazes between food sources (Nakagaki et al., Nature, 2000).
  • Recapitulates the Tokyo rail network when offered food at the cities surrounding Tokyo (Tero et al., Science, 2010).
  • Remembers where it has been. Even without a nervous system, it lays down a trail of extracellular slime and avoids re-exploring already-visited areas. Reid et al. (PNAS, 2012) showed this functions as a kind of externalized spatial memory – the organism offloads its memory into the environment.
  • Anticipates periodic stimuli. Saigusa et al. (Phys. Rev. Lett., 2008) showed Physarum slows down in anticipation of regular cold pulses, then re-anticipates after the stimulus stops – a form of habituated learning without synapses.

This is bio-intelligence without a brain. The intracellular pathways are doing the work: oscillating cytoplasmic streaming, calcium waves, actomyosin contractions, and reaction-diffusion dynamics in the cell. The whole organism is a wet, living analog computer.

What I want to build

A device with these layers:

  1. A microfluidic chip with PDMS channels patterned as a 2D array of “chambers” connected by narrow passages. Each chamber can be loaded with a chemoattractant (oat flake extract, glucose) or a chemorepellent (salt, quinine, or a target environmental contaminant – heavy metals, pesticide residue, microplastic extract). Light can be applied to a chamber as a non-chemical repellent.
  2. A Physarum plasmodium introduced at a central inoculation chamber. It explores the array, makes routing decisions, and lays down its slime trail.
  3. A camera + time-lapse readout that records the network topology over hours. Image analysis converts the plasmodium’s tube network into a graph – nodes, edges, weights.
  4. A signal interpretation layer. The pattern of which chambers Physarum colonizes, which it avoids, and how fast it gets there encodes information about the chemical environment. A trained Physarum (one that has previously encountered a contaminant and “learned” to avoid it) gives a different network than a naive one.
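As a rough illustration of layers 3 and 4, here is a minimal sketch of how a tube network could be stored as a weighted graph and how a naive and a “trained” network could be compared. Everything in it is an assumption for illustration: the `TubeNetwork` class, the chamber names, the tube widths, and the choice of Jaccard overlap as the comparison metric are hypothetical, not part of any existing pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class TubeNetwork:
    """Plasmodium tube network: chambers are nodes, tubes are weighted edges."""
    # (chamber_a, chamber_b) -> tube width in arbitrary units (hypothetical weight)
    edges: dict = field(default_factory=dict)

    def add_tube(self, a: str, b: str, width: float) -> None:
        # Store each undirected edge once, under a sorted key.
        self.edges[tuple(sorted((a, b)))] = width

    def colonized(self) -> set:
        # Any chamber touched by at least one tube counts as colonized.
        return {chamber for edge in self.edges for chamber in edge}


def jaccard(a: set, b: set) -> float:
    """Overlap between two chamber sets: 1.0 = identical, 0.0 = disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up example: the naive plasmodium enters the quinine chamber,
# the trained one avoids it.
naive = TubeNetwork()
naive.add_tube("center", "oat", 1.0)
naive.add_tube("center", "quinine", 0.4)

trained = TubeNetwork()
trained.add_tube("center", "oat", 1.2)

similarity = jaccard(naive.colonized(), trained.colonized())
print(f"chamber overlap (naive vs trained): {similarity:.2f}")
```

A real readout would probably also use edge weights and arrival times, but even a set-overlap score like this makes “a different network” something measurable.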

Why I find this exciting

Three reasons:

  1. The memory question. How does an organism without neurons remember a route? The extracellular slime hypothesis is elegant but probably not the whole story; intracellular calcium oscillations and tube-diameter hysteresis also encode state. Building a controlled platform lets me actually test which mechanism dominates in different conditions.
  2. Bio-intelligence as an alternative paradigm. Most “intelligence” we build is digital and silicon. Physarum is a counter-example – distributed, analog, embodied, and runs on oatmeal. If the next wave of computing is going to be biological or neuromorphic, slime mold is a useful reference organism for what computation without a CPU even looks like.
  3. The sensor application is genuinely useful. A Physarum-on-a-Chip in a riverbank or wastewater stream could integrate over many chemical signals at once and give a single read-out – “this water is unusual” – in a way that a stack of individual electrochemical sensors cannot. It’s an integrator, not just a detector.

Class Assignment: Governance & Ethics

Step 2: Governance / policy goals

Because this tool integrates a living organism into computational and sensing infrastructure, ethical development requires attention to four areas: lab safety, ecological responsibility, transparency / scientific honesty, and equitable access.

Goal A: Foster lab safety

  • A1: Ensure safe handling of Physarum polycephalum, which is BSL-1 (non-pathogenic in healthy humans) but can still trigger allergic responses to its spores and is a mild contamination risk in shared lab spaces.
  • A2: Standardize protocols for the microfluidic device fabrication (PDMS curing, plasma bonding, solvent handling) so the chip-making process is no more hazardous than the organism it contains.

Goal B: Protect the environment

  • B1: Prevent ecological release of the cultured Physarum strain. P. polycephalum itself is cosmopolitan, but lab strains have been selected for fast growth on agar – a fitness profile that may differ from wild populations.
  • B2: Prevent contamination of test water/soil samples after they have been incubated with the device. If a sensor is used in the field, the post-assay sample must be inactivated before disposal.
  • B3: Ensure environmental sensor readouts are truthful and reproducible. A false-negative reading on a contaminated water source is a real harm; a false-positive triggers expensive intervention.

Goal C: Promote transparency and scientific integrity

  • C1: Avoid overclaiming “intelligence” or “cognition” in slime mold. The science is genuinely fascinating, but the popular framing tends to drift into anthropomorphism, which is bad both for public understanding and for how the organism’s welfare is discussed.
  • C2: Open data, open protocols. If a sensor’s output depends on a proprietary trained Physarum strain, the result isn’t reproducible.

Goal D: Promote equity and constructive use

  • D1: Keep the technology low-cost. The whole point of a slime-mold sensor is that it runs on oats and tap water – this should be accessible to community labs, smallholder farmers, and schools.
  • D2: Open educational use. Physarum is one of the best teaching organisms for distributed computation; the chip platform should be usable in undergraduate and high-school labs.

A note on a question that doesn’t fit cleanly in the four-bucket framework: does a slime mold have welfare interests? I think the honest answer is “probably not in any morally weighty sense, but the question deserves to be open.” For governance purposes I treat Physarum as a non-sentient living system that nonetheless deserves the same baseline respect as other model organisms.

Step 3: Three governance actions

Option 1: BSL-1+ handling protocol for engineered/selected microbial sensors (technical strategy + new rule)

| Aspect | |
| --- | --- |
| Purpose | Right now, BSL-1 organisms like Physarum have minimal handling requirements – benchtop work, standard PPE, autoclave waste. I propose a “BSL-1+” tier for any living organism deployed as a sensor outside the lab (in the field, in a public installation, in a school). BSL-1+ adds: documented inactivation protocol before disposal, no environmental release of the cultured strain, mandatory chain-of-custody logging for any field deployment, and training for any non-lab user (farmer, teacher, citizen scientist). |
| Design | The CDC/NIH Biosafety in Microbiological and Biomedical Laboratories (BMBL) guidelines are amended to add the BSL-1+ tier. EPA picks it up for field-deployment permits. iGEM and community lab consortia adopt it as a default. The tier is lightweight by design – it’s a checklist, not a new physical facility requirement – so the bar to comply is low. |
| Assumptions | (a) Physarum lab strains differ from wild strains enough that release is a real (if low) concern. (b) Users will actually follow a checklist; documented protocols outperform informal practice. (c) The marginal compliance cost is low enough not to discourage community use. |
| Risk of failure | If the checklist is too detailed it gets ignored; if too vague it does nothing. Risk of success: the tier becomes a template that gets applied to every BSL-1 organism in the field, raising the regulatory bar on benign citizen science. |

Option 2: Open data + reproducibility standard for bio-sensor readouts (incentive + technical strategy)

| Aspect | |
| --- | --- |
| Purpose | Right now, environmental sensor results – including bio-sensor results – are published case by case with no shared standard for raw data. I propose a “BioSensorML” reproducibility standard: any peer-reviewed paper or commercial product reporting a Physarum-on-a-Chip (or similar living-sensor) result must deposit raw time-lapse data, chip geometry, Physarum strain provenance, environmental sample chain-of-custody, and image-analysis pipeline in a public repository (modeled on the Image Data Resource for cell biology, or the MIAME standard for microarrays). |
| Design | NSF and EPA add this as a funding requirement, similar to the current data management plan rule. Journals (PNAS, eLife, Nature) sign on as adopters. The Open Source Hardware Association and FreeGenes provide the cultural infrastructure for the open-strain side. |
| Assumptions | (a) Sensor results are reproducible in principle if the inputs are shared – not always true for living systems but should be aspired to. (b) Researchers will comply rather than withhold data. (c) Repository infrastructure (long-term storage, image hosting) can be funded. |
| Risk of failure | Compliance is paperwork-only and data quality is poor. Risk of success: the standard gets so detailed it becomes a burden on small labs and community scientists, ironically defeating the equity goal. |

Option 3: Language and framing guidelines for “bio-intelligence” (governance + norms)

| Aspect | |
| --- | --- |
| Purpose | The popular framing of slime-mold work routinely overstates the cognitive case (“slime molds are intelligent,” “slime molds learn”). This is bad for science communication (sets up backlash when the public realizes Physarum isn’t actually “thinking”), bad for the field (attracts funding on overclaims that don’t deliver), and arguably bad for any future where genuine non-neural cognition is a topic. I propose voluntary framing guidelines for researchers, journalists, and grant agencies, distinguishing behavioral terms (responds to, chemotaxes toward, oscillates, anticipates) from cognitive terms (decides, learns, remembers, thinks). |
| Design | A consortium of researchers (the Physarum / unconventional computing community), science journalists (the Science Media Centre), and journal editors writes a short framing-guide document. Adoption is voluntary but signal-bearing – it becomes a soft norm that grant reviewers and editors can point to. |
| Assumptions | (a) Language shapes both science and public understanding. (b) Researchers will care about being seen to comply (reputational incentive). (c) A consensus framing is achievable across a small, identifiable community. |
| Risk of failure | Voluntary norms are ignored; the field continues to overclaim. Risk of success: the framing guide becomes a stylistic straitjacket that suppresses legitimate exploration of what “memory” and “decision” can mean outside neural systems. |

Step 4: Scoring against the rubric

(1 = strongly does it, 2 = somewhat, 3 = does not, n/a = not applicable)

| Criterion | Option 1: BSL-1+ tier | Option 2: Open-data standard | Option 3: Framing guidelines |
| --- | --- | --- | --- |
| Enhance biosecurity – prevent incidents | 1 | 3 | n/a |
| Enhance biosecurity – help respond | 2 | 2 | n/a |
| Foster lab safety – prevent incidents | 1 | 3 | 3 |
| Foster lab safety – help respond | 2 | 3 | 3 |
| Protect environment – prevent incidents | 1 | 2 | 3 |
| Protect environment – help respond | 2 | 1 | 3 |
| Minimize costs/burdens | 2 | 2 | 1 |
| Feasibility | 1 | 2 | 1 |
| Not impede research | 2 | 2 | 2 |
| Promote constructive applications | 2 | 1 | 2 |

Step 5: Recommendation

I would prioritize a combination of all three, weighted toward Options 1 and 2, with Option 3 as a low-cost cultural overlay.

  • Option 1 (BSL-1+ handling) is the highest-impact, lowest-cost safety measure for living-sensor deployments. It addresses the real but currently unregulated risk of releasing lab-selected microbial strains in the field. The compliance burden is a checklist, not new equipment.
  • Option 2 (open-data standard) addresses the reproducibility crisis specific to living-sensor results – a real concern because Physarum behavior is sensitive to strain history, temperature, and food state, and “it worked in my lab” is not enough. Open data is also the precondition for equity: smallholder users need replicable protocols, not magic strains.
  • Option 3 (framing guidelines) is the cheapest of the three and addresses a problem most safety/biosecurity frameworks miss entirely – that scientific overclaiming is itself a kind of harm, both to public understanding and to long-term research credibility.

Trade-offs:

  • Adding a BSL-1+ tier risks regulatory creep – the same logic could be used to over-regulate other benign citizen-science activities. Mitigation: the tier triggers only on out-of-lab deployment, not on lab work.
  • Open data standards favor well-funded labs that can produce clean, depositable datasets. Mitigation: provide deposit infrastructure (NSF-funded repository) and accept “rough” data formats for community-lab submissions.
  • Framing guidelines can become language policing. Mitigation: the document is short, voluntary, and explicitly preserves the right to discuss genuine open questions about non-neural cognition.

Audience for this recommendation:

  • For Option 1: the CDC/NIH BMBL committee (the formal home of BSL guidelines) and the EPA Office of Pesticide Programs (for the field-deployment permit hook).
  • For Option 2: NSF Division of Environmental Biology and the Open Source Hardware Association.
  • For Option 3: the iGEM Foundation, the Physarum unconventional-computing community (the small annual workshops), and journal editors at PNAS / eLife.

Reflection – ethical concerns this week

Three things stood out:

  1. The “is this organism deserving of moral consideration” question is not zero, even for slime mold. I’m comfortable saying Physarum has no welfare interests in the morally weighty sense, but I notice that I’m comfortable with that partly because of how I was trained to think about single-celled organisms. As bio-intelligence research advances, the categorization is going to shift.
  2. Overclaiming is a quiet ethical issue. Most biosafety frameworks ignore it because it’s not a physical risk. But scientific overclaim – “Physarum is intelligent!” – erodes public trust in the same way physical incidents do, just slower and harder to attribute.
  3. The dual-use question for sensors. A Physarum-on-a-Chip that detects pesticide residues can also detect pharmaceutical metabolites in wastewater, which is one step from population-level surveillance. The same chip, deployed by the wrong actor, becomes a surveillance tool. The platform is dual-use even when the organism is benign.

Week 2 Lecture Prep

Prof. Jacobson’s questions

Q1. Polymerase error rate vs. human genome length.

DNA polymerase on its own, relying only on intrinsic nucleotide selectivity, has a base-misincorporation rate of roughly 1 in 10^5 (1 error per 100,000 bases). With built-in 3’ -> 5’ exonuclease proofreading, the error rate drops to about 1 in 10^7. Then post-replication mismatch repair (MMR) – MutS/MutL in bacteria, MSH/MLH homologs in eukaryotes – catches most of the rest, bringing the final error rate to about 1 in 10^9 to 10^10 per base per replication.

The human genome is ~6 x 10^9 bp per diploid cell. If we used raw polymerase fidelity (10^-5), every cell division would introduce ~60,000 errors. With proofreading only (10^-7), still ~600 errors. With proofreading + MMR (10^-9 to 10^-10), it’s about 6 down to 0.6 errors per genome duplication – a handful at most, and at the better end many divisions are error-free.

Biology deals with the discrepancy by stacking three independent layers of error correction, each catching ~99-99.9% of errors the previous missed. Fidelity is multiplicative. On top of that, biology tolerates some residual error rate because (a) most of the genome is non-coding and tolerant to single-base changes, (b) diploidy means a hit on one copy is usually backed up by the other, and (c) the residual error rate is the substrate for evolution.
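The arithmetic behind this answer is easy to check in a few lines. This is only a sketch: the genome size and the order-of-magnitude error rates are the ones quoted in this answer, and the layer labels are my own shorthand.

```python
GENOME_BP = 6e9  # diploid human genome size used in this answer

# Per-base error rate after each fidelity layer (orders of magnitude only).
layers = {
    "polymerase selectivity only": 1e-5,
    "plus 3'->5' proofreading": 1e-7,
    "plus mismatch repair (worse end)": 1e-9,
    "plus mismatch repair (better end)": 1e-10,
}

# Expected errors per genome duplication = genome length x per-base error rate.
expected_errors = {name: GENOME_BP * rate for name, rate in layers.items()}

for name, errors in expected_errors.items():
    print(f"{name:34s} ~{errors:g} errors per genome duplication")
```

Each layer multiplies the error rate by roughly 10^-2 to 10^-3, which is why the expected count falls from tens of thousands to single digits or below.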

Analogy: it’s like a camera with three layers of stabilization – in-body sensor shift, in-lens optical, and software post-stabilization. Each fixes a different scale of shake. The combination yields a sharp image even from a moving handheld shot; none of the three alone would be enough.

Q2. How many ways to code an average human protein – and why most don’t work.

The genetic code is degenerate: 64 codons code for 20 amino acids + stop. Most amino acids have multiple codons (Leu, Arg, Ser have 6 each; Met and Trp have only 1).

For an average human protein (~375 amino acids), the number of synonymous DNA sequences is the product of codon counts over each residue. With an average of ~3 codons per residue, the number is approximately 3^375 ~ 10^179 synonymous coding sequences – vastly larger than the number of atoms in the observable universe (~10^80).
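To make the counting concrete, here is a small sketch that derives the per-amino-acid codon counts from the standard genetic code and reproduces the 3^375 estimate. The function name `log10_synonymous_sequences` is made up for this example; the codon table is the standard code written in the usual TCAG ordering.

```python
from math import log10

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = [a + b + c for a in BASES for b in BASES for c in BASES]

# Count how many codons encode each amino acid (its degeneracy).
degeneracy = {}
for codon, aa in zip(codons, AMINO_ACIDS):
    degeneracy[aa] = degeneracy.get(aa, 0) + 1


def log10_synonymous_sequences(protein: str) -> float:
    """log10 of the number of DNA sequences encoding `protein` (stop codon excluded)."""
    return sum(log10(degeneracy[aa]) for aa in protein)


# Back-of-envelope estimate from the text: ~3 codons per residue, 375 residues.
estimate = 375 * log10(3)
print(f"3^375 is about 10^{estimate:.0f}")
print(f"codons for Leu, Arg, Ser: {degeneracy['L']}, {degeneracy['R']}, {degeneracy['S']}")
print(f"codons for Met, Trp: {degeneracy['M']}, {degeneracy['W']}")
```

Working in log10 avoids handling a 180-digit integer, and passing a real protein sequence to `log10_synonymous_sequences` gives the exact count rather than the ~3-per-residue average.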

Why most of those don’t express well in practice:

  1. Codon usage bias. Each organism has preferred codons matched to its tRNA pool. Rare codons (e.g., AGG/AGA Arg in E. coli) cause ribosome stalling and truncated products.
  2. mRNA secondary structure. Some codon choices fold the mRNA into hairpins that block ribosome scanning, especially near the 5’ UTR / start codon.
  3. GC content. Extreme high or low GC affects mRNA stability and transcription.
  4. Hidden regulatory elements. Synonymous changes can create or destroy splice sites, miRNA targets, internal Shine-Dalgarno-like sequences, or polyadenylation signals.
  5. Restriction sites. Sequences containing BsaI/BsmBI/EcoRI break downstream cloning workflows.
  6. Repeats and homopolymers. Long stretches of one base, or large direct repeats, are hard to synthesize and prone to recombination.
  7. Translation kinetics matter for folding. Some proteins fold co-translationally; the speed of translation through certain regions matters. Optimizing every codon to “fastest” can paradoxically misfold the protein.

This is exactly why codon optimization tools (Twist, IDT, GenScript) exist – to navigate the 10^179 sequence space toward sequences that actually express in the chosen host.

Dr. Leproust’s questions

Q1. Most common oligo synthesis method: Phosphoramidite chemistry, developed by Caruthers and Beaucage (1981) and still the workhorse. A 4-step cycle adds one nucleotide at a time to a growing chain on a solid support (CPG bead or microarray chip): detritylation -> coupling -> capping -> oxidation. Repeat per base.

Q2. Why 200 nt is the practical limit for direct synthesis.

Two reasons, compounding:

(a) Coupling efficiency compounds geometrically. Even at 99.5% per cycle (very good), the yield of full-length product is 0.995^N. For N=200, that’s ~37%. At 99% efficiency, ~13%. At 98%, ~2%. Every length doubling cuts the full-length fraction sharply.

(b) Depurination is the hard wall. The mild acid used in detritylation (dichloroacetic or trichloroacetic acid) cleaves purine bases (A and G) from the sugar at a low but non-zero rate per cycle. Every cycle adds another exposure. By ~150-200 nt, depurination produces enough abasic sites that the full-length fraction collapses regardless of coupling chemistry. Agilent’s published work on 150mer libraries was a depurination-control breakthrough; getting much past 200 nt with conventional phosphoramidite remains hard.

A third practical reason: side products (n-1 deletions, n+1 additions, GG dimers from dG re-coupling) accumulate, making purification harder for long oligos.
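The yield numbers in (a) come from one formula, sketched below. It uses the same simplification as the text (full-length fraction = efficiency^N) and ignores depurination and side products entirely, so treat it as an upper bound; `full_length_fraction` is a name invented for this example.

```python
def full_length_fraction(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of chains that reach full length under the efficiency**N model."""
    return coupling_efficiency ** length_nt


# 200-nt oligo at three per-cycle coupling efficiencies.
for eff in (0.995, 0.99, 0.98):
    print(f"{eff:.1%} per cycle, 200 nt: {full_length_fraction(eff, 200):.1%} full length")

# The same model at gene length (1999 couplings for a 2000-nt product).
gene_fraction = full_length_fraction(0.995, 1999)
print(f"99.5% per cycle, 2000 nt: {gene_fraction:.4%} full length")
```

Because the fraction is an exponential in N, doubling the length squares it: 37% at 200 nt becomes roughly 13% at 400 nt even at 99.5% efficiency.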

Q3. Why you can’t make a 2000 bp gene by direct oligo synthesis.

Combining Q2: at 99.5% per step, a 2000-nt direct synthesis would yield 0.995^1999 ~ 0.005%, essentially zero. Depurination would have destroyed most molecules long before. No production chemistry can synthesize a 2 kb oligo as a single molecule.

In practice, 2 kb genes are built by assembly: synthesize ~200 nt oligos that overlap each other, then stitch them into the full-length gene via polymerase cycling assembly (PCA) or enzymatic assembly methods such as Gibson and Golden Gate. Twist, IDT, and GenScript all use this hierarchical approach. Newer enzymatic synthesis approaches (Ansa, DNA Script) aim to break through the length barrier by avoiding the acid detritylation step.
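The tiling step can be sketched as follows. This is a simplified illustration, not any vendor’s design algorithm: the 30-nt overlap is an assumed value, all oligos are written on one strand (real PCA designs alternate strands and balance overlap melting temperatures), and the gene is a placeholder repeat.

```python
def tile_oligos(gene: str, oligo_len: int = 200, overlap: int = 30) -> list:
    """Split `gene` into oligos of at most `oligo_len` nt; neighbours share `overlap` nt."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(gene[start:start + oligo_len])
        if start + oligo_len >= len(gene):
            break  # this oligo reaches the end of the gene
        start += step
    return oligos


gene = "ACGT" * 500  # placeholder 2000-nt sequence, not a real gene
oligos = tile_oligos(gene)
print(f"{len(oligos)} oligos, last one {len(oligos[-1])} nt")

# Sanity check: joining the oligos through their overlaps rebuilds the gene.
rebuilt = oligos[0] + "".join(o[30:] for o in oligos[1:])
print("reassembles correctly:", rebuilt == gene)
```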

Prof. Church’s question

Choice: Q1 – The 10 essential amino acids and the Lysine Contingency.

The 10 essential amino acids in animals (cannot be synthesized de novo, must come from diet):

  1. Histidine (H)
  2. Isoleucine (I)
  3. Leucine (L)
  4. Lysine (K)
  5. Methionine (M)
  6. Phenylalanine (F)
  7. Threonine (T)
  8. Tryptophan (W)
  9. Valine (V)
  10. Arginine (R) – essential in juveniles, conditionally essential in adults

(Mnemonic: PVT TIM HALL.)

What this implies about the Lysine Contingency (the Jurassic Park plot device where dinosaurs are engineered to require dietary lysine, so they die without humans feeding them):

The premise is scientifically incoherent on its own terms. Lysine is already essential in all animals – the engineered dinosaurs, like every other animal, would already be unable to synthesize lysine. They would already need to get it from their diet (meat, plants, anything containing protein). The “contingency” only works if you add a new dependency on something that doesn’t exist in their food chain: a non-natural amino acid, or a vitamin/metabolite the engineered organism can’t get from any natural source. What Crichton called a lysine contingency is actually a generic essential-amino-acid contingency, and lysine is the worst possible choice because it is abundantly available in any meat or legume the animals would naturally eat.

My view of this as bioconfinement: the principle is good – engineer a metabolic dependency that doesn’t exist in nature – but the dependency has to be chosen carefully. Real biotech implementations (e.g., E. coli strains dependent on non-canonical amino acids via expanded genetic code, or auxotrophic strains requiring synthetic ligands) work because the supplemented molecule is not found in nature, not just because it’s nominally “essential.” This actually connects to my Physarum project: any future engineered Physarum strain deployed in the field could be made dependent on a synthetic small molecule that doesn’t occur in soil or water, so that escape into the environment is self-limiting.

Citations: Standard biochemistry references (Lehninger, Berg’s Biochemistry) for the amino acid list. Crichton, Jurassic Park (1990), for the original framing. No AI prompts used.


Lab Preparation – Pipetting

  • Completed in-person. I LUV Pipetting as a Biologist <3
  • Tried to finish both certifications. Not sure if one went through.

Week 2 HW: DNA Read, Write, & Edit

My protein this week is rhodopsin (RHO, UniProt P08100). It is a photon-sensing G-protein-coupled receptor in rod cells of the retina. As someone who works professionally in photography, this protein is basically my biological counterpart: a single 11-cis-retinal molecule sits in the middle of a 7-transmembrane GPCR and isomerizes to all-trans on absorbing one photon, triggering the entire phototransduction cascade. It is the sensor in the world’s oldest and most refined camera.


Part 0: Basics of Gel Electrophoresis

Negatively charged DNA is pulled toward the positive electrode through an agarose mesh; shorter fragments thread through faster and travel farther. Run a ladder in parallel and you can read fragment sizes off the photo.
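Reading sizes off the photo is really an interpolation on a log scale, because migration distance is roughly linear in log10(fragment length) over the resolving range of the gel. Here is a minimal sketch of that idea. The ladder distances are made up for illustration (not measured from my gel), and `estimate_size` is a hypothetical helper name.

```python
import math

# Hypothetical ladder: (fragment size in bp, migration distance in mm).
LADDER = [(3000, 12.0), (2000, 16.0), (1000, 24.0), (500, 33.0), (250, 42.0)]

def estimate_size(distance_mm, ladder=LADDER):
    """Estimate fragment size by interpolating log10(size) against distance."""
    pts = sorted(ladder, key=lambda p: p[1])  # sort bands by distance travelled
    for (s1, d1), (s2, d2) in zip(pts, pts[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the ladder range")

size_at_1kb_band = estimate_size(24.0)   # sits exactly on the 1 kb ladder band
size_between = estimate_size(28.5)       # halfway between the 1 kb and 500 bp bands
```

A band halfway between two ladder bands comes out at their geometric mean, not their arithmetic mean, which is the whole point of working in log space.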


Part 3: DNA Design Challenge

3.1. Choose your protein

Rhodopsin (RHO, UniProt P08100, Homo sapiens, 348 aa). I chose it because:

  1. It is the cleanest example of biological signal transduction I know: one photon in, one G-protein activation out, with an amplification cascade behind it that lets a dark-adapted rod cell detect a single photon.
  2. The chromophore (11-cis-retinal) is bound covalently via a Schiff base to Lys296. The photon does not act on rhodopsin directly; it isomerizes the retinal, and the retinal then strains the protein into its active conformation (metarhodopsin II). The protein is a mechanical lever, not a photon absorber. That distinction was a real “oh” moment for me.
  3. Mutations in RHO cause autosomal-dominant retinitis pigmentosa (~25% of adRP cases), so this is also a tractable target for gene therapy, which connects nicely to Part 5.3 below.
>sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens OX=9606 GN=RHO PE=1 SV=1
MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG
GEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIP
EGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQES
ATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAI
YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA

3.2. Reverse Translate

Reverse-translated the protein sequence using the EMBOSS Backtranseq tool, then verified against the native human RHO mRNA (NCBI RefSeq NM_000539.3) for sanity. The native mRNA already has a real codon usage profile, but I used the back-translated sequence as a starting point so that codon optimization in 3.3 is a clean step rather than a re-use of the existing one.

A short prefix of the back-translated DNA (first 60 aa worth, 180 nt) looks like this before optimization:

ATGAATGGTACTGAAGGTCCTAATTTTTATGTTCCTTTTAGTAATGCTACTGGTGTTGTT
CGTAGTCCTTTTGAATATCCTCAATATTATTTAGCTGAACCTTGGCAATTTAGTATGTTA
GCTGCTTATATGTTTTTATTAATTGTTTTAGGTTTTCCTATTAATTTTTTAACTTTATAT

(Full 1,047 nt sequence in the Benchling file RHO_backtranslated.)
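For intuition, back-translation is just a lookup of one codon per amino acid. The sketch below uses a codon table I inferred from the 180-nt prefix above. It is an assumption, not the table EMBOSS actually ships, and it only covers the residues that occur in the first 60 aa of RHO.

```python
# One codon per amino acid, inferred from the back-translated prefix shown above.
# Assumption: illustrative only; not EMBOSS's codon table, and incomplete
# (H, D, C and K do not occur in the first 60 residues).
CODON_FOR = {
    "M": "ATG", "N": "AAT", "G": "GGT", "T": "ACT", "E": "GAA",
    "P": "CCT", "F": "TTT", "Y": "TAT", "V": "GTT", "S": "AGT",
    "A": "GCT", "R": "CGT", "Q": "CAA", "L": "TTA", "W": "TGG",
    "I": "ATT",
}

def back_translate(protein):
    """Map each residue to its single table codon and join the codons."""
    try:
        return "".join(CODON_FOR[aa] for aa in protein)
    except KeyError as missing:
        raise ValueError(f"no codon in the table for residue {missing}") from None

first_20_aa = "MNGTEGPNFYVPFSNATGVV"
prefix_dna = back_translate(first_20_aa)
```

Because every residue always gets the same codon, the output ignores codon usage entirely, which is why the optimization step in 3.3 is needed.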

3.3. Codon optimization

I codon-optimized for E. coli K-12 as the expression chassis, because:

  • It is the chassis we will actually be using in lab (cell-free lysate + plasmid transformation).
  • Human codon usage is significantly different from E. coli in several places – notably Arg (AGG/AGA are rare codons in E. coli and abundant in human), Leu (CTA is rare in E. coli), and Ile (ATA is rare). Without optimization, ribosomes stall at rare-codon tracts and you get truncated products or no expression at all.
  • E. coli also has GC-content preferences (~50%) that differ from human (~60% in coding regions). Skewed GC content can cause hairpins and slow translation.

I used the Twist Codon Optimization Tool, set host = E. coli K-12, and excluded recognition sites for BsaI, BsmBI, BbsI (Type IIS enzymes used in Golden Gate assembly) and also EcoRI, HindIII, BamHI so my downstream cloning options stay open. The optimizer also smoothed out long homopolymer runs (>6 nt) and removed internal Shine-Dalgarno-like motifs that could cause internal translation starts.
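The optimizer's output can be checked independently with a few string scans. This is a rough sketch, not what the Twist tool does internally: it looks for the six recognition sites on both strands, reports GC fraction, and lists homopolymer runs longer than 6 nt. The function names are my own.

```python
import re

# Recognition sequences for the enzymes excluded above.
SITES = {
    "BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC",
    "EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq):
    """Names of enzymes whose site occurs on either strand."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return sorted(name for name, site in SITES.items() if site in seq or site in rc)

def gc_fraction(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def long_homopolymers(seq, max_run=6):
    """Homopolymer runs strictly longer than max_run nucleotides."""
    runs = re.finditer(r"A+|C+|G+|T+", seq.upper())
    return [m.group(0) for m in runs if len(m.group(0)) > max_run]

# First 60 nt of the codon-optimized RHO (from the alignment in 3.5).
optimized_prefix = "ATGAACGGCACCGAAGGCCCGAATTTCTATGTGCCGTTCAGCAACGCGACCGGCGTGGTG"
sites_found = find_sites(optimized_prefix)
gc = gc_fraction(optimized_prefix)
runs = long_homopolymers(optimized_prefix)
```

Checking the reverse complement matters for the Type IIS enzymes, whose sites are not palindromic.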

Caveat about rhodopsin specifically: rhodopsin is a 7-transmembrane integral membrane GPCR. It does not natively express well in E. coli without engineering – the bacterial membrane lacks the right lipid composition, and there is no machinery for the disulfide bond (Cys110-Cys187) or palmitoylation (Cys322/323). In a real project I would either (a) express in HEK293 / Sf9 cells, or (b) express only a soluble cytoplasmic loop in E. coli for an antibody-generation experiment. For the purpose of this homework, codon-optimizing for E. coli is the assigned exercise; in practice I would optimize for Spodoptera frugiperda (Sf9) or human cells.

3.4. Two pathways to get protein from the optimized DNA:

Cell-dependent (in vivo): Clone the optimized RHO ORF into an expression vector with a T7 promoter, RBS, start codon, and terminator (the pTwist Amp High Copy vector from Part 4 works for cloning; for expression I’d move it into pET-28a in BL21(DE3)). Transform into competent E. coli, plate on kanamycin (pET-28a carries a kanamycin-resistance marker, unlike the ampicillin-selected pTwist cloning vector), pick colonies, grow to OD600 ~0.6, induce with IPTG, harvest, lyse, and purify via the His-tag on Ni-NTA. The cell does transcription via T7 RNAP and translation via its ribosomes; the protein folds (or, for rhodopsin, mostly misfolds into inclusion bodies, which then need refolding with retinal added in vitro to reconstitute the holoprotein).

Cell-free (in vitro): Use a TXTL lysate-based system like the one from Week 11. Add the linear or circular DNA template directly to the lysate + energy mix, and transcription/translation happen in the tube over 1-20 h. For a membrane protein like rhodopsin, the cell-free pathway has a real advantage: you can supplement the reaction with nanodiscs or detergent micelles so the nascent rhodopsin inserts into a membrane-mimetic environment rather than aggregating. Add 11-cis-retinal to the reaction and the holoprotein reconstitutes in situ. This is increasingly how membrane GPCRs are produced for structural biology.

3.5. How does it work in nature?

(a) transcriptional/post-transcriptional mechanisms:

  • Alternative splicing. One pre-mRNA can be spliced into many mature mRNAs by including or skipping exons. DSCAM in Drosophila notoriously produces >38,000 isoforms from one gene. Opsins are not the best example of this: in Drosophila, spectral diversity comes mainly from separate opsin genes (ninaE encodes Rh1, and Rh2-Rh7 are distinct genes) rather than from splice variants of one gene.
  • Alternative promoters. Different transcription start sites produce mRNAs with different 5’ UTRs and sometimes different N-terminal coding regions.
  • Alternative polyadenylation. Different 3’ ends change mRNA stability and localization.
  • RNA editing. ADAR enzymes convert A->I (read as G), which can change codons. This is notably common in cephalopods, where extensive recoding of nervous-system transcripts (e.g., potassium channels) tunes protein function to the local environment, such as water temperature.

(b) DNA -> RNA -> Protein alignment (first 60 nt / 20 aa of the codon-optimized RHO):

DNA:     ATG AAC GGC ACC GAA GGC CCG AAT TTC TAT GTG CCG TTC AGC AAC GCG ACC GGC GTG GTG
RNA:     AUG AAC GGC ACC GAA GGC CCG AAU UUC UAU GUG CCG UUC AGC AAC GCG ACC GGC GUG GUG
Protein:  M   N   G   T   E   G   P   N   F   Y   V   P   F   S   N   A   T   G   V   V
Pos:      1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
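The same alignment can be reproduced in a few lines, which is a handy check that the optimized DNA still encodes the right protein. This sketch builds the standard genetic code from the usual TCAG ordering; `transcribe` and `translate` are just my helper names.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def transcribe(dna):
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")

def translate(dna):
    """Translate coding-strand DNA codon by codon, stopping at a stop codon."""
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[pos:pos + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

dna = "ATGAACGGCACCGAAGGCCCGAATTTCTATGTGCCGTTCAGCAACGCGACCGGCGTGGTG"
rna = transcribe(dna)
protein = translate(dna)
```

Running `translate` on both the back-translated and the codon-optimized sequence and comparing the results is a cheap way to confirm the optimizer only made synonymous changes.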

Part 5: DNA Read / Write / Edit

5.1 DNA Read

(i) What DNA would I want to sequence and why?

I would sequence the RHO gene in patients with autosomal-dominant retinitis pigmentosa (adRP) to identify the specific causative mutation in each patient before deciding on a therapy. There are >150 known pathogenic RHO mutations, and they fall into mechanistically distinct classes (Class I = trafficking-defective, Class II = misfolding, etc.). The right therapy – allele-specific knockdown, base editing, prime editing, or gene replacement – depends on which class the mutation belongs to. Sequencing is the upstream diagnostic that determines everything downstream.

Beyond clinical use, I would also like to do metagenomic sequencing of cephalopod skin to find new opsins. Octopus skin appears to “see” through dermal opsins, and characterizing the opsin diversity across cephalopods could yield new optogenetic tools.

(ii) Which technology and why?

I would use Oxford Nanopore (third generation, long-read) for the clinical RHO case, complemented by Illumina short-read for accuracy.

  1. Generation: Nanopore is third-generation – single-molecule, real-time, no amplification, long reads (often 10-100 kb, can exceed 1 Mb). Sanger is first generation (single-read, dye-terminator, ~800 bp); Illumina is second generation (massively parallel short reads, ~150-300 bp, requires PCR amplification).

  2. Input prep:

    • Extract genomic DNA from a blood draw (column-based or magnetic-bead based extraction).
    • Skip fragmentation for long reads – the whole point is to preserve length.
    • Adapter ligation: ligate Nanopore’s sequencing adapters (which include a motor protein that controls translocation speed) to the ends of the genomic DNA.
    • Optional: enrich for the RHO locus by Cas9 cleavage + adapter ligation (targeted long-read sequencing) so you don’t waste reads on the rest of the genome.
    • Load onto the MinION/PromethION flow cell.
  3. Base calling: DNA is pulled through a protein nanopore in a membrane by an applied voltage. As each base passes through the pore constriction, it modulates the ionic current uniquely (A vs T vs G vs C give different current signatures, and modified bases like 5mC give yet different signatures). A neural network (Guppy / Dorado) reads the current trace and translates it to a base sequence. This is called “basecalling” and is essentially audio-to-text transcription, where the audio is the ionic current.

  4. Output: FASTQ files (sequence + per-base quality scores), plus optionally direct methylation calls (5mC, 6mA) without bisulfite conversion. For my use case, I align the reads to the human reference, call variants in the RHO locus, and report the patient’s genotype.
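To make the FASTQ output concrete: each read is four lines (header, sequence, a `+` separator, and a quality string), and each quality character encodes a Phred score as ASCII code minus 33. Below is a minimal stdlib parser, assuming plain 4-line records with no line wrapping. The two reads are toy data, not real Nanopore output.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual

def mean_phred(qual):
    """Arithmetic mean Phred score, assuming Phred+33 encoding."""
    return sum(ord(c) - 33 for c in qual) / len(qual)

# Toy records: 'I' encodes Q40 and '!' encodes Q0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!II\n")
summary = [(rid, len(seq), mean_phred(q)) for rid, seq, q in read_fastq(example)]
```

Averaging Phred scores directly is a simplification; averaging the underlying error probabilities is stricter, because a few very bad bases dominate.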

5.2 DNA Write

(i) What DNA would I want to synthesize and why?

I would synthesize a codon-engineered RHO variant carrying silent mutations across the entire coding region, designed to be invisible to a shRNA / siRNA that targets the wild-type sequence. This is the classic “knockdown and replace” strategy for adRP:

  • The dominant-negative mutant RHO allele in the patient is silenced by an shRNA that targets a region of the natural mRNA.
  • A “hardened” replacement RHO – silently re-coded so the shRNA can no longer bind – is co-delivered.
  • Net result: both alleles of native RHO are silenced, and the hardened replacement provides wild-type protein.

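The replacement allele needs synonymous changes at every wobble position the shRNA target site allows (a ~21-nt site spans only about 7 codons), with the rest of the ~30+ planned changes spread across the coding region. This is exactly the kind of thing Twist’s gene synthesis is good at. The replacement is ~1 kb, well within clonal gene size.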

Beyond therapeutics, I would also love to synthesize opsin variants with shifted spectral sensitivity – e.g., a red-shifted human rhodopsin built by transplanting microbial opsin tuning residues, for optogenetic use in deep-tissue stimulation.

(ii) Technology – which synthesis platform?

I would use silicon-based microarray DNA synthesis (Twist’s platform) – this is the standard for gene-length synthesis at high throughput and low error.

  1. Essential steps:

    • Phosphoramidite chemistry: starting from a solid support, add one nucleotide at a time using a 4-step cycle (deblock -> couple -> cap -> oxidize), repeated for each base.
    • On a silicon chip, thousands to millions of distinct oligos are synthesized in parallel, each in a tiny well, with the order of bases controlled by where reagents are spotted.
    • Oligos (typically ~200 nt each) are then cleaved off the chip, pooled by gene, and assembled into full genes by enzymatic assembly (polymerase cycling assembly, Gibson, or Golden Gate).
    • Error correction by enzymatic mismatch cleavage (e.g., T7 endonuclease I) or by sequencing-and-cherry-picking error-free clones.
  2. Limitations:

    • Speed: ~4-7 business days for express clonal genes at Twist; ~10 days standard. Faster than the old column-based oligo + PCR workflow, but still not “instant.”
    • Accuracy: ~1 error per 5,000-10,000 bp after error correction. For genes >5 kb, error rate compounds and yield drops; this is why genome-scale synthesis is hard.
    • Length: Clonal genes up to ~5 kb routinely; longer constructs require hierarchical assembly. Gene fragments are typically 300 bp - 5 kb.
    • Sequence constraints: very high or very low GC, long homopolymers, large repeats, and strong hairpins remain difficult or impossible to synthesize. This is why the codon optimizer flags and edits these.
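The accuracy numbers above translate directly into how many clones need to be screened. A back-of-the-envelope sketch, assuming errors are independent per base (a simplification, since real synthesis errors cluster at difficult sequence):

```python
def p_error_free(length_bp, errors_per_bp):
    """Probability that a clone of length_bp carries no synthesis error,
    assuming independent per-base errors."""
    return (1.0 - errors_per_bp) ** length_bp

rho_len = 1047                                  # full RHO ORF incl. stop codon
p_low = p_error_free(rho_len, 1 / 5000)         # pessimistic end of the range
p_high = p_error_free(rho_len, 1 / 10000)       # optimistic end of the range
p_5kb = p_error_free(5000, 1 / 5000)            # a 5 kb construct, pessimistic rate
```

Under these assumptions roughly 8 or 9 out of 10 RHO-sized clones are perfect, while for a 5 kb construct at the pessimistic rate it is closer to 1 in 3, which is the "error rate compounds" point in numbers.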

5.3 DNA Edit

(i) What DNA would I want to edit and why?

I want to edit the P23H mutation in RHO – the single most common cause of adRP in North America, where a CCC (Pro) codon at position 23 is mutated to CAC (His) (c.68C>A). It is a misfolding mutation: the mutant protein aggregates in the ER, kills the rod cell, and the rod death spreads to cones – leading to progressive blindness over decades.

Reverting it means changing CAC back to CCC, which is an A -> C transversion on the coding strand (T -> G on the antisense strand). That rules out base editing: adenine base editors only convert A:T -> G:C and cytosine base editors only C:G -> T:A, and both of those are transitions. Prime editing can write any of the 12 point substitutions without making a double-strand break, so it is the right fit for this target.

(ii) Which editing technology?

I would use a prime editor – an SpCas9 nickase (H840A) fused to an engineered M-MLV reverse transcriptase – delivered by subretinal injection in AAV5 (which is retinal-tropic). The editor is too large for a single AAV, so it would go in as a dual-AAV split-intein pair.

  1. How it works:

    • The Cas9 nickase is guided to the RHO locus by a prime editing guide RNA (pegRNA) whose 20-nt spacer matches a sequence next to the mutated codon, with a PAM (5’-NGG-3’) immediately 3’ of the protospacer.
    • Cas9n does not cut both strands; it nicks only the PAM-containing (non-target) strand, about 3 nt upstream of the PAM, which frees a 3’ end.
    • That 3’ end anneals to the primer binding site (PBS) on the 3’ extension of the pegRNA, and the reverse transcriptase extends it using the pegRNA’s RT template, which encodes the corrected CCC codon.
    • The newly written 3’ flap competes with the original 5’ flap. When the edited flap is incorporated and the cell repairs the opposite strand to match, the A:T base pair becomes C:G. A second nicking sgRNA on the unedited strand (the PE3 strategy) biases repair toward keeping the edit.
  2. Inputs and preparation:

    • pegRNA design: identify a PAM that places the nick close to the target base (edits near the nick are generally more efficient), then choose a 20-nt spacer, a PBS (typically ~8-15 nt), and an RT template (typically ~10-16 nt to start) that carries the A -> C correction. Design in silico (e.g., PrimeDesign for the pegRNA; CRISPOR or the Benchling CRISPR tool for the spacer). Check for off-target sites with mismatch tolerance.
    • Editor construct: the prime editor with a tissue-specific promoter (rhodopsin promoter itself, for rod-cell specificity).
    • Delivery vector: AAV5 packaged with the editor + pegRNA. The prime editor coding sequence is over 6 kb, well above the ~4.7 kb AAV cargo limit, so a two-vector dual-AAV split-intein system is required.
    • Cells/tissue: delivered in vivo to retina by subretinal injection.
  3. Limitations:

    • PAM dependence: SpCas9 requires NGG nearby. Not every mutation has one in range. Engineered PAM-flexible Cas9 variants (SpRY, etc.) help but reduce specificity.
    • Efficiency: prime editing is generally less efficient than base editing, and efficiency varies a lot with pegRNA design, so several pegRNAs need to be screened.
    • Byproducts: nicking can still produce low-level indels, especially with the second nick in PE3 – need to check.
    • Off-targets: even with high-fidelity Cas9 variants, low-level off-target editing happens at sites with similar sequence. Whole-genome sequencing post-treatment is the gold standard for measuring this.
    • Delivery efficiency: AAV reaches only ~5-30% of photoreceptors at typical doses. So even with 100% editing efficiency in transduced cells, you don’t fix every rod.
    • One mutation per edit: a pegRNA corrects one specific site, so different RHO mutations need different pegRNAs. Mutations that are transitions could instead use a base editor (ABE for A:T -> G:C, CBE for C:G -> T:A), which is usually more efficient but limited to a narrow editing window and prone to bystander edits.
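Whatever SpCas9-based editor ends up being used, guide design starts from the same search: find every NGG PAM near the target, on either strand, and take the 20 nt immediately 5’ of it as a candidate spacer. A minimal sketch of that search follows. The input sequence is a made-up toy, not the real RHO exon 1, and dedicated tools such as CRISPOR do far more (off-target scoring, efficiency prediction).

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_protospacers(seq, spacer_len=20):
    """List (strand, spacer, pam) for every NGG PAM that has a full-length
    spacer immediately 5' of it, scanning both strands."""
    hits = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Zero-width lookahead so overlapping PAMs are all reported.
        for m in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = m.start()
            if pam_start >= spacer_len:
                hits.append((strand, s[pam_start - spacer_len:pam_start], m.group(1)))
    return hits

# Toy sequence: 20 nt of filler, one AGG PAM, then a short tail.
toy = "ACGTACGTACGTACGTACGTAGGTTTT"
hits = find_protospacers(toy)
```

Each hit would then be filtered by where the target base falls relative to the PAM before any off-target analysis.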

Week 3 HW: Lab Automation

1. Published Paper Using Opentrons for a Novel Biological Application

Paper: Bryant, J. A., Kellinger, M., Longmire, C., Miller, R., & Wright, R. C. (2023). AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots. Synthetic Biology, 8(1), ysac032. https://doi.org/10.1093/synbio/ysac032

What they built

AssemblyTron is an open-source Python package that turns the ~$10k Opentrons OT-2 (with a thermocycler module) into a hands-free DNA-assembly workstation. It plugs into existing assembly-design tools (j5, Cello, Benchling) and executes the resulting build plans directly on the robot, covering three of the most common synbio assembly chemistries:

  1. PCR with optimal annealing-gradient calculation: the software computes the best annealing temperature for each fragment from primer Tm and uses the OT-2 + thermocycler to run gradient PCRs across a range of fragment lengths.
  2. Golden Gate assembly: Type IIS-enzyme one-pot assemblies of multiple fragments into a destination vector.
  3. Homology-dependent in vivo assembly (IVA): short-overlap fragments co-transformed into E. coli, with assembly happening inside the cell.

What they showed

The authors simultaneously built four different four-fragment chromoprotein reporter plasmids on the OT-2 and showed assembly fidelity comparable to a human doing the same work by hand (verified by sequencing). They also used the same platform for site-directed mutagenesis via homology-dependent IVA, again with manual-equivalent fidelity.

Why it counts as a “novel biological application”

This is a textbook example of automating the Build step of the Design-Build-Test-Learn (DBTL) loop, which has historically been the slowest and most error-prone manual step. Two things make it novel rather than incremental:

  • It’s the first open-source software package to drive Golden Gate and homology assembly on a low-cost robot, so the price floor for automated cloning drops from ~$100k (commercial systems like Tecan or Hamilton) to ~$10k. That changes who gets to do high-throughput synbio.
  • It directly accepts output from automated design tools (j5, Cello), so you can go from a Cello-designed genetic circuit to physical DNA without a human pipetting step in between. That closes a real gap in the DBTL loop.

Limitations the authors note

  • The OT-2 is slower than a commercial Hamilton STAR and has weaker error handling.
  • No integrated colony picking, transformation, or QC. The human has to come back in the loop after assembly.
  • Plate-format constraints (96-well bottleneck) limit how parallel things can really get.

2. My Final Project Automation Plan

Project context

My final project builds on Week 2: I want to express a panel of rhodopsin (RHO) variants in a cell-free system to characterize how single-residue substitutions in the chromophore-binding pocket shift the absorption spectrum. The screen compares each variant’s lambda_max under blue/green/red LED illumination. The end goal is a small library of spectrally-tuned opsins for optogenetics, but for HTGAA the deliverable is the screening pipeline itself.

Why this needs automation

A meaningful spectral-tuning library is 50-200 variants, each tested in triplicate, each under at least 3 illumination conditions. That’s 450-1,800 CFPS reactions. Manual pipetting is the wrong tool: error accumulates, reagents drift over an 8-hour day, and you can’t realistically do replicates. Automation is the only way the experiment is actually run, not just designed.
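The reaction count is simple multiplication, but it is worth writing down because it sets the plate count. This sketch assumes every variant x replicate x condition combination gets its own well; if one well is read under all three illumination conditions, pass `conditions=1`.

```python
import math

def screen_size(n_variants, replicates=3, conditions=3, wells_per_plate=384):
    """Return (reactions, plates), assuming one well per
    variant x replicate x condition combination."""
    reactions = n_variants * replicates * conditions
    return reactions, math.ceil(reactions / wells_per_plate)

small = screen_size(50)     # low end of the library size
large = screen_size(200)    # high end of the library size
```

Controls (no-DNA blanks, wild-type RHO) are not counted here and would add wells to every plate.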

What I would automate

The workflow maps neatly onto the Example 2 cloud-lab pipeline in the assignment, but I would run it on the Opentrons OT-2 + Ginkgo Nebula cloud lab combination:

  1. Design phase (no automation, human + Benchling). Pick residues around the retinal-binding pocket (Lys296, Glu113, and surrounding residues from PDB 1U19), generate variants in silico, codon-optimize with Twist tool, order as a clonal-gene plate from Twist.
  2. Echo acoustic transfer. Echo 525 dispenses the variant plasmid DNA from a source plate into the destination 384-well plate at 50 nL per well, three replicate wells per variant. Acoustic transfer is ideal here because the volumes are small and there’s no cross-contamination.
  3. OT-2 stamps the CFPS master mix. A multichannel pipette on the OT-2 dispenses 18 uL of NMP-Ribose master mix (from Week 11) + lysate into every occupied well of the 384-well plate. This is the step I’d write the Python protocol for.
  4. OT-2 supplements with 11-cis-retinal. Add 1 uL of 100 uM 11-cis-retinal to every well (final 5 uM) so the rhodopsin holoprotein can reconstitute as it’s translated. Light-protected throughout.
  5. PlateLoc seals. Heat-seal the plate to prevent evaporation over the 20 h reaction.
  6. Inheco incubates at 30 C (not 37 C – rhodopsin folds better cooler) for 20 h in the dark.
  7. XPeel removes seal.
  8. PHERAstar reads absorbance spectrum (350-650 nm) for every well under three illumination pulses: blue (470 nm), green (530 nm), red (625 nm). The active rhodopsin shows a characteristic ~498 nm peak that shifts with mutation; bleaching kinetics under each LED give an orthogonal readout.
  9. Data lands in a Jupyter notebook on Ginkgo Nebula, fits each spectrum, extracts lambda_max and bleaching half-life, and outputs a ranked variant table.
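For step 9, the simplest possible lambda_max extraction is to take the highest point of the spectrum and refine it with a parabola through its two neighbours. The sketch below does that on a synthetic Gaussian band centred at 498 nm. The band width is an arbitrary assumption, and a real pipeline would first subtract a no-DNA blank and probably fit a proper band-shape model instead.

```python
import math

def lambda_max(wavelengths, absorbances):
    """Peak wavelength by parabolic interpolation around the highest sample.
    Assumes evenly spaced wavelengths and a single dominant peak."""
    i = max(range(len(absorbances)), key=absorbances.__getitem__)
    if i == 0 or i == len(absorbances) - 1:
        return wavelengths[i]  # peak on the edge: nothing to interpolate with
    y0, y1, y2 = absorbances[i - 1], absorbances[i], absorbances[i + 1]
    step = wavelengths[i + 1] - wavelengths[i]
    offset = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)  # vertex of the parabola
    return wavelengths[i] + offset * step

# Synthetic spectrum sampled every 5 nm from 350 to 650 nm.
wl = list(range(350, 655, 5))
spectrum = [math.exp(-((w - 498) ** 2) / (2 * 40 ** 2)) for w in wl]
peak = lambda_max(wl, spectrum)
```

The interpolation recovers a peak that falls between sampling points, which matters when the expected shifts between variants are only a few nanometres.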

Example protocol skeleton for steps 3 and 4 (OT-2)

from opentrons import protocol_api

metadata = {
    "protocolName": "CFPS Master Mix Stamp - RHO variant screen",
    "author": "rcd, HTGAA 2026",
    "apiLevel": "2.15",
}

def run(protocol: protocol_api.ProtocolContext):
    # Labware
    plate = protocol.load_labware("corning_384_wellplate_112ul_flat", 1)
    mm_reservoir = protocol.load_labware("nest_12_reservoir_15ml", 2)
    retinal_tubes = protocol.load_labware("opentrons_24_tuberack_nest_2ml_snapcap", 3)
    # One rack for the 8-channel; four racks for the single channel,
    # because step 4 uses a fresh tip for each of the 384 wells.
    tips_multi = protocol.load_labware("opentrons_96_tiprack_20ul", 10)
    tips_single = [
        protocol.load_labware("opentrons_96_tiprack_20ul", slot) for slot in (6, 7, 8, 9)
    ]

    # Pipettes: 18 uL is below the 20 uL minimum of a P300, so both are P20s
    p20_multi = protocol.load_instrument("p20_multi_gen2", "left", tip_racks=[tips_multi])
    p20 = protocol.load_instrument("p20_single_gen2", "right", tip_racks=tips_single)

    # Step 3: stamp 18 uL CFPS master mix into every well.
    # On a 384-well plate the 8-channel reaches alternating rows, so targeting
    # rows A and B of each column covers all 16 rows.
    stamp_targets = [well.top(-2) for well in plate.rows()[0] + plate.rows()[1]]
    mm = mm_reservoir.wells_by_name()["A1"]
    p20_multi.transfer(
        18,
        mm,
        stamp_targets,  # dispense from above so the reused tips never touch the DNA
        new_tip="once",
    )

    # Step 4: add 1 uL of 11-cis-retinal to every well (light-protected)
    retinal = retinal_tubes.wells_by_name()["A1"]
    p20.transfer(
        1,
        retinal,
        plate.wells(),
        new_tip="always",
        mix_after=(3, 10),
    )

    # No temperature module is loaded, so keep the lysate cold off-deck until
    # the run starts; the plate then goes to PlateLoc + Inheco.
    protocol.comment("Ready for sealing and 20h incubation at 30 C, dark.")
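Before loading the deck it helps to budget reagents and tips for one full plate. A quick sketch using the volumes from steps 2 to 4 (dead volume in the reservoir and tube is not included, so real fill volumes need a margin on top):

```python
import math

WELLS = 384
DNA_UL = 0.05            # 50 nL of plasmid from the Echo
MASTER_MIX_UL = 18.0     # CFPS master mix + lysate per well
RETINAL_UL = 1.0         # 11-cis-retinal stock per well
RETINAL_STOCK_UM = 100.0

total_ul = DNA_UL + MASTER_MIX_UL + RETINAL_UL
final_retinal_um = RETINAL_STOCK_UM * RETINAL_UL / total_ul   # C1*V1 = C2*V2
master_mix_needed_ul = MASTER_MIX_UL * WELLS
retinal_needed_ul = RETINAL_UL * WELLS
tip_racks_single = math.ceil(WELLS / 96)   # one fresh single-channel tip per well
```

The final retinal concentration comes out slightly above the nominal 5 uM because the well volume is about 19 uL, not 20 uL.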

Custom hardware I’d 3D-print

Two pieces I think would be useful enough to design and print:

  1. Light-blocking enclosure for the OT-2 deck during retinal addition. 11-cis-retinal photoisomerizes under ambient light, so the addition step needs to happen under dim red light or in darkness. A black-PLA shell that drops over the deck (with a port for the pipette to enter from above) would solve this.
  2. A 384-well-to-96-well adapter plate for moving samples between Echo-output (384) and downstream PHERAstar reads where 96-well is more convenient. The Opentrons 3D Printing Directory probably already has something close.

Why Ginkgo Nebula vs. local Opentrons

I’d use Ginkgo Nebula for the high-throughput screen because:

  • 50-200 variants in triplicate exceeds what I can realistically QC on a single OT-2.
  • The cloud lab already has the Echo, PlateLoc, and PHERAstar integrated. On the local OT-2 those steps would need manual handoffs.
  • Reproducibility: the protocol file is the experiment. Someone in Berlin or Shanghai can re-run my best variant verbatim.

I’d use a local OT-2 for the design-iteration phase (10-20 variants, debugging the master mix recipe, getting the retinal-addition step working) because the round-trip time on a cloud lab is too slow for that loop.

Risk and what could go wrong

  • Cell-free yield drops at scale. What works in 20 uL in a tube may not in 18 uL in a 384-well plate with a higher surface-to-volume ratio (faster evaporation, more O2 depletion). Mitigation: pilot on 96-well first, optimize seal + headspace.
  • 11-cis-retinal is photosensitive and expensive. Aliquot under red light, work fast, and consider all-trans-retinal + retinal-isomerase regeneration as a backup.
  • Variant DNA from Twist arrives at different concentrations. Normalize on the Echo or with an OT-2 normalization step before the screen.
  • Spectral readout on PHERAstar. A microplate reader is not a true spectrophotometer; for the cleanest spectra I’d want a SpectraMax or similar. Mitigation: use the PHERAstar for screening, then confirm top hits on a benchtop spectrometer.

Week 4 HW: Protein Design Part I

Part A — Conceptual Questions

1. How many molecules of amino acids are in 500 g of meat?

Assume meat is roughly 20% protein by weight. The mass of protein is:

500 × 0.20 = 100 grams of protein.

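Let’s assume the average molecular weight of an amino acid residue in a protein is ~100 g/mol (the true average is closer to 110 g/mol, so this is a round-number estimate). Therefore:

100 g / 100 g/mol = 1 mole of amino acid molecules,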

which equals $6.022 \times 10^{23}$ amino acid molecules.
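The same estimate as a small function, so the effect of the assumptions is easy to see. The 20% protein fraction and the per-residue mass are the rough inputs from above, not measured values.

```python
AVOGADRO = 6.022e23  # molecules per mole

def residues_in_meat(meat_g, protein_fraction=0.20, residue_mass_g_per_mol=100.0):
    """Estimated number of amino acid residues in a piece of meat."""
    protein_g = meat_g * protein_fraction
    moles = protein_g / residue_mass_g_per_mol
    return moles * AVOGADRO

n_100 = residues_in_meat(500)                                  # round-number estimate
n_110 = residues_in_meat(500, residue_mass_g_per_mol=110.0)    # more realistic residue mass
```

Switching from 100 to 110 g/mol changes the answer by less than 10%, so the order of magnitude (about 10^23 to 10^24) is robust.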


2. Why do humans eat beef but do not become a cow?

I wish I could, but my mom and dad say no.

Our DNA is fixed at the moment the embryo is formed, and every cell division copies those same instructions for building our proteins and structures. When we eat protein, our digestive system breaks the long polymer chains down into individual amino acids, which our ribosomes then reuse as building blocks to make human proteins according to our own DNA. The cow’s DNA gets digested too, and unlike bacteria, our cells do not take up and integrate foreign DNA through horizontal gene transfer (HGT).


3. Why are there only 20 natural amino acids?

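Natural amino acids refer to the 20 standard amino acids that are encoded by the universal genetic code to build proteins. The triplet codon system provides 64 possible codons (4³), more than enough for 20 amino acids plus stop signals; the surplus shows up as redundancy, which buffers the effect of point mutations. This system, once established early in evolution, became “frozen” and nearly universal, because changing a codon assignment would alter every protein at once. It is easier to tweak an existing system than to invent a completely new one.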


4. Can you make non-natural amino acids? Design some.

Yes. Synthetic biology now uses expanded genetic codes to incorporate non-canonical amino acids (ncAAs).

One strategy is to modify a standard amino acid such as lysine by attaching:

  • A small, highly fluorescent organic molecule
  • Connected through a long, flexible linker (e.g., a hydrocarbon chain)
  • Attached at the ε-amino group at the end of the lysine side chain, leaving the backbone untouched

This allows proteins (such as GFP) to gain new chemical or optical properties.


5. Where did amino acids come from before life started?

They likely originated from abiotic synthesis. Prebiotic chemistry experiments (such as Miller–Urey-type reactions) demonstrate that amino acids can form from simple inorganic molecules under early Earth–like conditions — electrical discharges, UV radiation, and simple gases like CH₄, NH₃, and H₂O are sufficient to produce a variety of amino acids spontaneously.


6. If you make an α-helix using D-amino acids, what handedness would you expect?

Standard L-amino acids form right-handed α-helices. Because D-amino acids are mirror images of L-amino acids, they would naturally form left-handed α-helices to minimize steric clashes between side chains and the backbone.


7. Why are most molecular helices right-handed?

This is a consequence of biological homochirality. Life selected L-amino acids early in evolution. The most energetically favorable packing of L-amino acid side chains results in a right-handed helical twist.

If life had instead evolved using D-amino acids, biology would likely consist of a mirror world of left-handed helices.


8. Why do β-sheets tend to aggregate? What is the driving force?

β-sheets have exposed, “sticky” edges. Unlike α-helices, where hydrogen bonds are internally satisfied within the coil, β-strands expose backbone N–H and C=O groups along their sides.

The primary driving forces for aggregation are:

  • Hydrogen bonding between exposed backbone groups
  • The hydrophobic effect, as non-polar side chains cluster together to avoid water

9. Why do many amyloid diseases form β-sheets? Can you use them as materials?

Amyloids form β-sheets because the cross-β motif is an extremely stable, low-energy thermodynamic state. Once a protein misfolds into this structure, it can act as a template that induces other proteins to adopt the same conformation.

Materials Applications

Yes, amyloids can be useful materials. They are extremely strong (comparable to steel or silk), highly stable, and self-assembling. They are being researched for tissue engineering scaffolds and conductive biofilms.


Part B — Rhodopsin Protein Analysis


Protein Selection

I selected Rhodopsin, a light-sensitive G protein-coupled receptor (GPCR) found in the rod cells of the retina. Its role in visual phototransduction, converting light into a nerve signal via retinal isomerization, makes it both biologically fascinating and structurally iconic.


1. Amino Acid Sequence

Basic Properties

[Figure: Basic information of 4WW3]
  • Length: 350 amino acids (the sequence listed below)
  • Most Frequent Amino Acid: Isoleucine (Ile)
  • Sequence: ETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAF SDFTFSLVNGFPLMTISCFLKKWIFGFAACKVYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMAASKKMS HRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFDYISRDSTTRSNILCMFILGFFGPILIIFF CYFNIVMSVSNHEKEMAAMAKRLNAKELRKAQAGANAEMRLAKISIVIVSQFLLSWSPYAVVALLAQFGP LEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEDDKDAETEIPAGE
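Length and composition can be recomputed directly from the sequence string, which is a useful check against numbers copied from a database page. A minimal sketch using only the standard library (the variable names are mine):

```python
from collections import Counter

# The 4WW3 sequence exactly as listed above, with the spaces removed.
SEQ_4WW3 = (
    "ETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAF"
    "SDFTFSLVNGFPLMTISCFLKKWIFGFAACKVYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMAASKKMS"
    "HRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFDYISRDSTTRSNILCMFILGFFGPILIIFF"
    "CYFNIVMSVSNHEKEMAAMAKRLNAKELRKAQAGANAEMRLAKISIVIVSQFLLSWSPYAVVALLAQFGP"
    "LEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEDDKDAETEIPAGE"
)

length = len(SEQ_4WW3)
composition = Counter(SEQ_4WW3)
most_common_residue, most_common_count = composition.most_common(1)[0]
```

`composition.most_common(5)` gives the top five residues, which is a quick way to see how hydrophobic a membrane protein's sequence is.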

Sequence Homologs

  • Number of homologs identified: 218
[Figure: UniProt BLAST results]

Protein Family

  • Opsin family
  • GPCR (G Protein-Coupled Receptor) superfamily
  • Class A GPCR (Rhodopsin-like family)
[Figure: GPCR classification diagram]

2. Protein Structure (RCSB PDB)

  • Resolution: 2.70 Å — good quality for a membrane protein
[Figure: RCSB structure page]

Structural Classification

Rhodopsin features seven transmembrane alpha-helices, characteristic of Class A GPCRs.


3. 3D Visualization (PyMol)

[Figures: cartoon view, ribbon view, ball-and-stick view]

Secondary Structure

Predominantly alpha-helices, consistent with a 7-TM membrane protein.

Residue Distribution

  • Hydrophobic residues concentrated in membrane-spanning regions
  • Hydrophilic residues on extracellular and intracellular surfaces

Surface and Binding Pocket

A clear internal binding pocket accommodates retinal that makes it essential for light detection.


Part C — ML-Based Protein Design Tools


C1. Deep Mutational Scan (ESM2)

ESM2 generates an L × 20 mutational scan matrix (L = sequence length) giving the log-likelihood ratio of each mutation relative to wild-type.

Key patterns:

  • Lys296 (retinal-binding residue) shows near-zero tolerance for mutation — K296A receives a strongly negative ΔLL score
  • Transmembrane core residues are highly conserved
  • Extracellular loop residues are more permissive
[Figure: ESM2 deep mutational scan heatmap]

C1. Latent Space Analysis (UMAP)

Rhodopsin clusters alongside other Class A GPCRs (adrenergic, adenosine, muscarinic receptors), well-separated from non-GPCR 7-TM proteins.


C2. Protein Folding (ESMFold)

Wild-type: Seven-helix bundle correctly formed; high pLDDT (>80) on helices, lower (~55–65) on loops.

Mutation resilience:

  • Single point mutations → structure unchanged
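  • K296A → predicted fold maintained; ESMFold does not model the retinal, so this only says the apo-protein fold is predicted to be unchanged, not that the mutant is functional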
  • Full TM helix deletion → pLDDT drops significantly; bundle disrupted

C3. Protein Generation (ProteinMPNN)

  • Sequence recovery: ~38–45% of native sequence recovered
  • Lys296 has near-100% probability of being retained
  • Lipid-facing residues diverge but remain hydrophobic

ESMFold of the designed sequence matches original backbone at RMSD ~2.0–2.5 Å.

Tool Justifications

| Tool | Purpose | Rationale |
| --- | --- | --- |
| ESM2 | Deep mutational scan | Identifies conserved positions without experimental data |
| AlphaFold2 / ESMFold | Structure prediction | No crystal structure available |
| AlphaFold-Multimer | L–DnaJ interface | Disrupting DnaJ binding may derepress lysis |
| FoldX / Rosetta | ΔΔG prediction | Rapid screening of single mutants |
| ProteinMPNN | Sequence redesign | Stable sequences on a fixed backbone |

Week 5 HW: Protein Design Part II

Part 1: Generate Binders with PepMLM

The original sequence of SOD1 is:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

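Introduce the A4V mutation (Ala→Val at position 4 in the conventional SOD1 numbering, which omits the initiator Met, so it is the fifth residue of the sequence shown above):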

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
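The substitution can be sketched in a few lines of Python. The helper `apply_substitution` and its `offset` argument are hypothetical names introduced here; `offset=1` accounts for the initiator Met that the A4V numbering skips, and the check on the wild-type residue guards against off-by-one mistakes.

```python
# Wild-type SOD1 sequence as given above (includes the initiator Met).
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHE"
    "FGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSI"
    "EDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVI"
    "GIAQ"
)


def apply_substitution(seq, wt_aa, position, mut_aa, offset=0):
    """Replace the residue at a 1-based position, shifted by `offset`."""
    index = position - 1 + offset
    if seq[index] != wt_aa:
        raise ValueError(f"expected {wt_aa} at index {index}, found {seq[index]}")
    return seq[:index] + mut_aa + seq[index + 1:]


# A4V: Ala -> Val at position 4 of the Met-less numbering (offset=1 here).
SOD1_A4V = apply_substitution(SOD1_WT, "A", 4, "V", offset=1)
print(SOD1_A4V[:10])
```

Calling the helper with `offset=0` raises an error, because the fourth character of the Met-containing string is K rather than A.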

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:

| Index | Binder | Pseudo Perplexity |
| --- | --- | --- |
| 0 | HLYYAVALELKX | 13.299815648347872 |
| 1 | WRSYAVVLELWK | 17.97100111129112 |
| 2 | WRYYPVAAAWKK | 11.081842724779028 |
| 3 | WHYGAVGLRHKX | 13.983770011694478 |

Part 2: Evaluate Binders with AlphaFold3

We submitted each peptide paired with the mutant SOD1 (A4V) sequence to the AlphaFold Server as separate chains to model the protein–peptide complex. All runs used seed 2026616022 for reproducibility.

ipTM: predicted confidence in the interface between the two chains (binder ↔ SOD1). Higher is better.
pTM: predicted TM-score for the overall fold of the complex; with a 12-residue peptide it is dominated by SOD1. Higher is better.

AlphaFold3 Prediction Results:

| Peptide | Full Sequence | ipTM | pTM | Binding Observation |
| --- | --- | --- | --- | --- |
| HRY | HRYGAVVVELKK | 0.30 | 0.85 | Peptide appears loosely associated near the surface; low-confidence interaction region (orange/yellow in pLDDT) |
| WHY | WHYYVAAAEHKK | 0.32 | 0.75 | Peptide sits at the top exterior of SOD1, largely disordered (orange), suggesting weak or transient surface contact |
| WRV | WRVGAAAVRLKK | 0.40 | 0.81 | Highest ipTM of the group; peptide traces along the lower exterior of the β-barrel, with partial low-confidence contact near the C-terminus region |
| WRY | WRYPVTAAEWKE | 0.27 | 0.85 | Peptide adopts a compact fold but appears docked away from the core; largely orange indicating low structural confidence at the interface |

Structure previews:

[Structure images: HRY (ipTM=0.30, pTM=0.85), WHY (ipTM=0.32, pTM=0.75), WRV (ipTM=0.40, pTM=0.81), WRY (ipTM=0.27, pTM=0.85)]

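The PAE (Predicted Aligned Error) matrix reports inter-chain confidence in its off-diagonal blocks (SOD1 rows against peptide columns, and vice versa); the small bottom-right block is the peptide relative to itself. Darker green = lower positional error = more confident placement. Because the SOD1 sequence above has 154 residues, the peptide chain corresponds to residues 155 onward in each plot.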

Summary:

ipTM scores across the four PepMLM-generated peptides ranged from 0.27 (WRY) to 0.40 (WRV), all falling in the low-confidence range (ipTM < 0.5 is generally considered weak). WRVGAAAVRLKK achieved the highest ipTM of 0.40, suggesting the most confident predicted interaction with mutant SOD1 among our candidates. Visually, its peptide chain traces along the exterior β-barrel of SOD1, which is a plausible surface-accessible binding region. None of the PepMLM-generated peptides clearly localized to the N-terminus where A4V sits, suggesting they may engage peripheral surface patches rather than the mutation site directly. All four peptides showed high pTM scores (0.75–0.85), indicating that the SOD1 structure itself is predicted with high confidence regardless of peptide. Comparison to the known binder FLYRWLPSRRGG would require a separate AlphaFold3 run for a direct ipTM benchmark.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. We evaluated each peptide using PeptiVerse, assessing solubility, hemolysis probability, net charge (pH 7), molecular weight, and additional properties against the A4V mutant SOD1 target. The known binder FLYRWLPSRRGG was included as a reference.

PeptiVerse Results:

| Peptide | Solubility | Hemolysis (prob) | Permeability | Net Charge (pH 7) | MW (Da) | GRAVY |
| --- | --- | --- | --- | --- | --- | --- |
| WHYYVAAAEHKK | Soluble (1.000) | Non-hemolytic (0.023) | Non-permeable (0.412) | +0.93 | 1502.7 | -0.97 |
| HRYGAVVVELKK | Soluble (1.000) | Non-hemolytic (0.059) | Non-permeable (0.062) | +1.85 | 1398.7 | -0.21 |
| WRVGAAAVRLKK | Soluble (1.000) | Non-hemolytic (0.036) | Permeable (0.914) | +3.76 | 1354.6 | -0.04 |
| WRYPVTAAEWKE | Soluble (1.000) | Non-hemolytic (0.182) | Non-permeable (0.268) | -0.23 | 1535.7 | -1.08 |
| FLYRWLPSRRGG (known binder) | Soluble (1.000) | Non-hemolytic (0.047) |  | +2.76 | 1507.7 | -0.71 |
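The GRAVY column can be sanity-checked by hand: it is the mean Kyte-Doolittle hydropathy of the residues. The sketch below uses the standard Kyte-Doolittle scale; whether PeptiVerse computes it in exactly this way is an assumption, but the rounded values agree with the table for all five peptides.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle score per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


peptides = [
    "WHYYVAAAEHKK",
    "HRYGAVVVELKK",
    "WRVGAAAVRLKK",
    "WRYPVTAAEWKE",
    "FLYRWLPSRRGG",
]
for peptide in peptides:
    print(peptide, round(gravy(peptide), 2))
```

Values closer to zero or positive indicate a more hydrophobic peptide, which fits WRVGAAAVRLKK being the only candidate predicted to be permeable.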

PeptiVerse screenshots:

[Screenshots: WHYYVAAAEHKK, HRYGAVVVELKK, WRVGAAAVRLKK, WRYPVTAAEWKE, FLYRWLPSRRGG (known binder)]

Summary:

All four PepMLM-generated peptides and the known binder FLYRWLPSRRGG were predicted to be fully soluble (probability 1.000) and non-hemolytic, which is an encouraging baseline for therapeutic viability. Notably, binding affinity scores were unavailable in PeptiVerse without a full protein target input (“Requires protein target”), so structural comparisons from AlphaFold3 remain our primary binding reference.

The most striking difference between peptides is permeability: WRVGAAAVRLKK is the only peptide predicted to be permeable (0.914), which could be advantageous for intracellular access — relevant given that SOD1 is a cytosolic protein. Its hemolysis probability (0.036) and net charge (+3.76) are also comparable to the known binder FLYRWLPSRRGG (+2.76, hemolysis 0.047). WRYPVTAAEWKE, by contrast, carries a slight negative charge (−0.23) and the highest hemolysis probability among the four (0.182), making it less favorable.

Chosen peptide to advance: WRVGAAAVRLKK

WRVGAAAVRLKK best balances predicted therapeutic safety and functional potential. Its high predicted membrane permeability is a key differentiator: since SOD1 operates in the cytosol, a peptide that can cross the membrane has a meaningful pharmacokinetic advantage. It is predicted to be fully soluble and non-hemolytic, and has a charge profile comparable to the known binder. It also had the highest ipTM of the four in the AlphaFold3 runs (0.40), although that value is still in the low-confidence range, so it is the strongest candidate for further development rather than a confirmed binder.

Part 4: Generate Optimized Peptides with moPPIt

We used moPPIt (Multi-Objective Guided Discrete Flow Matching, MOG-DFM) to move from probabilistic sampling toward controlled, motif-directed peptide design. Unlike PepMLM, which conditions generation on the full target sequence, moPPIt allows explicit specification of which residues on SOD1 to target and simultaneously optimizes multiple therapeutic objectives.

Design choices:

  • Target sequence: A4V mutant SOD1
  • Target residues: Residues near position 4 (A4V mutation site) and the surrounding N-terminal region, which is destabilized by the mutation
  • Peptide length: 12 amino acids
  • Guidance enabled: Motif binding affinity, solubility, hemolysis

Part C: L-Protein Mutants

Shout out to Shitong for the reference work and pipeline that guided this section 🙏

The objective of this section is to improve the stability and folding of the lysis protein (L-protein) of MS2 phage, and to identify mutations that stabilize its interaction with the chaperone protein DnaJ. This is relevant to phage therapy: a more stable L-protein should improve lytic efficiency, which is critical for phages to overcome bacterial resistance.

Boltz-2: Wild-type Complex Structure Prediction

Week 6 HW: Genetic Circuits Part I

Part 1. Questions

1. Phusion High-Fidelity PCR Master Mix Components

  • Phusion DNA Polymerase — high-fidelity polymerase with 3′→5′ proofreading exonuclease activity; ~50× lower error rate than Taq
  • dNTPs — nucleotide building blocks (dATP, dCTP, dGTP, dTTP) incorporated during strand synthesis
  • HF Buffer + Mg²⁺ — provides optimal pH and ionic conditions; Mg²⁺ is an essential cofactor for polymerase activity
  • Stabilizers — maintain enzyme activity during storage and reaction setup

2. Factors That Determine Primer Annealing Temperature

  • GC content — G·C pairs have 3 H-bonds vs. 2 for A·T, raising T_m
  • Primer length — longer primers = higher T_m
  • Salt/Mg²⁺ concentration — stabilizes duplexes, increases T_m
  • Primer secondary structure — hairpins or self-dimers reduce effective T_m
  • Polymerase used — Phusion tolerates higher T_a than Taq; use NEB Tm Calculator for Phusion

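Rule of thumb (for Phusion): T_a ≈ T_m of the lower-melting primer for primers of 20 nt or shorter, and about 3°C above that T_m for longer primers, with T_m taken from the NEB Tm Calculator.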
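To make the GC-content and length effects concrete, here is a minimal sketch using the Wallace rule (T_m ≈ 2°C per A/T + 4°C per G/C). This is a rough textbook estimate for short oligos, not the nearest-neighbor method used by the NEB Tm Calculator, and the primer sequences are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough Tm in deg C: 2 per A/T plus 4 per G/C (short oligos only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def starting_annealing_temp(forward, reverse):
    """Start from the Tm of the lower-melting primer."""
    return min(wallace_tm(forward), wallace_tm(reverse))


# Hypothetical primers, for illustration only.
fwd = "ATGCGTACGTTAGC"    # 14 nt, 50% GC
rev = "GCGCGTACGGCAGC"    # 14 nt, higher GC

print(gc_fraction(fwd), wallace_tm(fwd))
print(gc_fraction(rev), wallace_tm(rev))
print(starting_annealing_temp(fwd, rev))
```

Both primers are the same length, so the higher estimate for the second one comes entirely from its GC content. For real Phusion reactions the calculator's value should replace this estimate.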


3. PCR vs. Restriction Enzyme Digest

|  | PCR | Restriction Enzyme Digest |
| --- | --- | --- |
| Mechanism | Exponential amplification using primers | Site-specific endonuclease cuts at recognition sequences |
| End type | Blunt (Phusion) or defined by primer design | Blunt or sticky ends depending on enzyme |
| Adds sequence? | Yes: overhangs encoded in primers | No: cuts only at existing sites |
| Template needed | Any DNA, even low quantity | Usually purified plasmid/DNA |
| Time | ~1–2 hr | ~1–4 hr |
| Error risk | Possible polymerase errors | No amplification errors |

Prefer PCR when you need to add custom overhangs/sequences, there are no convenient RE sites, or when starting from complex template (genomic DNA).

Prefer RE digest when compatible cut sites already flank your insert, you want sticky ends for ligation, or you need to linearize a vector backbone without introducing mutations.


4. Ensuring Fragments Are Appropriate for Gibson Cloning

Gibson Assembly diagram

Gibson Assembly requires 20–40 bp of overlapping sequence between adjacent fragments. To ensure compatibility:

  1. Design PCR primers with 20–40 bp 5′ tails homologous to the adjacent fragment
  2. Verify overlaps in silico using Benchling or Asimov Kernel — confirm correct orientation and reading frame
  3. Check overlap uniqueness — overlaps that appear elsewhere in the construct cause mis-assembly
  4. For RE-digested fragments — PCR-amplify and add overlaps via primers before Gibson assembly
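Steps 2 and 3 above can be sketched as a small in silico check. This is a simplified illustration, not what Benchling or Asimov Kernel do internally: the function names are hypothetical, the sequences are toy strings, and only the forward strand is searched (a fuller check would also scan the reverse complement).

```python
def shared_overlap(upstream, downstream, min_len=20, max_len=40):
    """Longest 20-40 bp suffix of `upstream` that is a prefix of `downstream`."""
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return upstream[-n:]
    return ""


def count_occurrences(seq, sub):
    """Count occurrences of `sub` in `seq`, allowing overlapping matches."""
    count, start = 0, seq.find(sub)
    while start != -1:
        count += 1
        start = seq.find(sub, start + 1)
    return count


def junction_ok(upstream, downstream):
    """True if the fragments share an overlap that is unique in the joined product."""
    overlap = shared_overlap(upstream, downstream)
    if not overlap:
        return False
    assembled = upstream + downstream[len(overlap):]
    return count_occurrences(assembled, overlap) == 1


# Toy fragments sharing a 20 bp overlap (made-up sequences).
OVERLAP = "ACGTTGCAAGGCTTACCGTA"
frag_a = "TTTTTTTTTT" + OVERLAP
frag_b = OVERLAP + "GGGGGGGGGG"

print(junction_ok(frag_a, frag_b))
```

A fragment that already contains the overlap sequence somewhere else fails the uniqueness test, which is the mis-assembly risk described in step 3.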

5. How Plasmid DNA Enters E. coli During Transformation

Chemical transformation (heat shock method):

  1. Cells are made competent by treatment with ice-cold CaCl₂, which destabilizes the outer membrane and allows DNA to associate with the cell surface
  2. Plasmid DNA is added and incubated on ice
  3. A brief heat shock at 42°C (~45 sec) creates a thermal imbalance that drives DNA through the membrane (likely via transient pores)
  4. Cells recover in SOC media, then are plated on selective antibiotic plates — only transformants survive

6. Alternative Assembly Method: Golden Gate Assembly

Golden Gate Assembly uses Type IIS restriction enzymes (e.g., BsaI), which cut outside their recognition sequence and so generate custom 4-bp overhangs. Each fragment is designed so that digestion produces unique 4-bp overhangs complementary only to its intended neighbor in the assembly. The recognition sites are positioned so that they are cut away from the fragments and are absent from the correctly ligated product, which therefore cannot be re-cut; uncut or re-ligated starting material still carries the site and is digested again, driving the reaction toward the correctly assembled, scarless product. Digestion and ligation happen in one pot by cycling between 37°C (cutting) and 16°C (ligation). The final product contains no scar, no extra bases, and no remaining restriction site at the junctions. This makes it ideal for assembling many fragments in parallel, such as in pathway engineering or combinatorial library construction.
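The requirement that each overhang pairs only with its intended neighbor can be checked with a short script. The sketch below flags overhangs that are palindromic (able to ligate to themselves) or that duplicate or complement another overhang in the set; the function names and the example overhangs are hypothetical, and real designs often apply further fidelity rules not shown here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang_problems(overhangs):
    """Return a list of human-readable problems; empty means the set looks usable."""
    problems = []
    seen = set()
    for raw in overhangs:
        oh = raw.upper()
        rc = reverse_complement(oh)
        if len(oh) != 4 or set(oh) - set("ACGT"):
            problems.append(f"{raw}: not a 4-bp A/C/G/T overhang")
        elif oh == rc:
            problems.append(f"{raw}: palindromic, can ligate to itself")
        elif oh in seen or rc in seen:
            problems.append(f"{raw}: duplicates or complements an earlier overhang")
        seen.add(oh)
    return problems


# Hypothetical junction overhangs for a four-junction assembly.
print(overhang_problems(["AATG", "GCTT", "TACA", "CGCT"]))
print(overhang_problems(["AATG", "CATT", "ACGT"]))
```

The first set passes, while the second is flagged twice: CATT is the reverse complement of AATG, and ACGT is its own reverse complement.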


Part 2. Asimov Kernel — Genetic Constructs

Construct 1: Rhodopsin Light-Sensitive Protein

new ideas from week 7 lec: modify it to make it like an activation function

How It Should Function

The promoter turns on, the rhodopsin protein gets made, and the terminator stops it. No feedback, no regulation — just expression. The pLacI promoter drives constitutive expression of CYPR_CALVI, a light-sensitive rhodopsin protein. When the promoter is active, the cell continuously produces the rhodopsin protein. Because there is no feedback or regulation, protein levels are expected to rise steadily in the simulator. Rhodopsins are membrane proteins that respond to light, making them useful for optogenetic applications — controlling cell behavior using light.

Construct Image

Construct 1 - Rhodopsin

Construct 2: Negative Feedback Loop

How It Should Function

This circuit makes a glowing protein (GFP) AND a repressor at the same time. The repressor builds up and eventually turns the whole circuit off. The pLacI promoter drives simultaneous expression of both GFPL_DISST (green fluorescent protein) and LacI repressor. As more LacI accumulates in the cell, it begins to bind to and repress the pLacI promoter — slowing down production of both GFP and itself. This negative feedback loop acts as an auto-regulator: GFP levels rise initially, then stabilize or decline as LacI repression kicks in. The expected simulator output is a rise-then-plateau curve for GFP concentration.
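The expected rise-then-plateau behavior can be sketched with a one-promoter model in which production is damped by a Hill repression term and both proteins are lost at a first-order rate. This is not the Asimov Kernel simulator; the model form and all parameter values (`alpha`, `K`, `n`, `delta`) are illustrative assumptions in arbitrary units.

```python
def simulate_gfp(feedback, alpha=10.0, K=5.0, n=2.0, delta=0.1,
                 dt=0.01, steps=10000):
    """Euler-integrate GFP and LacI made from the same pLacI promoter."""
    gfp = laci = 0.0
    trace = []
    for _ in range(steps):
        # LacI represses its own promoter only when feedback is enabled.
        repression = 1.0 / (1.0 + (laci / K) ** n) if feedback else 1.0
        production = alpha * repression
        gfp += dt * (production - delta * gfp)
        laci += dt * (production - delta * laci)
        trace.append(gfp)
    return trace


open_loop = simulate_gfp(feedback=False)
closed_loop = simulate_gfp(feedback=True)

print(f"final GFP without feedback: {open_loop[-1]:.1f}")
print(f"final GFP with feedback:    {closed_loop[-1]:.1f}")
```

With these toy parameters the autoregulated circuit settles at a much lower GFP level than the unregulated one, which is the qualitative signature to look for in the simulator output.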

Construct Image

Construct 2 - Negative Feedback Loop

Construct 3: Toggle Switch

How It Should Function

This construct makes a repressor (TetR) that silences the other half of the switch. The two halves silence each other, so the cell can only be in one state at a time. This construct is one half of a classic bistable toggle switch. When pLacI is active, TetR is produced, which represses the pTet promoter in a paired construct. That paired construct produces LacI, which would repress pLacI. Because each side silences the other, the system locks into one of two stable states.

  • State 1 (TetR wins): pLacI ON → TetR high → pTet OFF → LacI low → pLacI stays ON
  • State 2 (LacI wins): pTet ON → LacI high → pLacI OFF → TetR low → pTet stays ON
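The two states above can be sketched with the classic two-repressor model, in which each protein's production is a decreasing Hill function of the other. The equations are written in dimensionless units, and the parameter values (`alpha`, `n`) and starting concentrations are illustrative assumptions rather than values from the construct.

```python
def simulate_toggle(tetr0, laci0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Euler-integrate two mutually repressing genes (dimensionless units)."""
    tetr, laci = tetr0, laci0
    for _ in range(steps):
        d_tetr = alpha / (1.0 + laci ** n) - tetr  # pLacI is repressed by LacI
        d_laci = alpha / (1.0 + tetr ** n) - laci  # pTet is repressed by TetR
        tetr += dt * d_tetr
        laci += dt * d_laci
    return tetr, laci


# A small initial bias decides which repressor wins.
state_1 = simulate_toggle(tetr0=2.0, laci0=1.0)  # TetR starts ahead
state_2 = simulate_toggle(tetr0=1.0, laci0=2.0)  # LacI starts ahead

print("TetR-biased start -> TetR %.2f, LacI %.2f" % state_1)
print("LacI-biased start -> TetR %.2f, LacI %.2f" % state_2)
```

Identical parameters with a slightly different starting point end in opposite states, which is what "bistable" means for this circuit.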

Construct Image

Construct 3 - Toggle Switch

Week 7 HW: Genetic Circuits Part II

Part 1. Intracellular Artificial Neural Networks

Q1. Advantages of IANNs over Traditional Boolean Genetic Circuits

A traditional genetic circuit works like a panel of on‑off light switches. Each gene is either fully expressed or completely silent, and the circuit’s output is a strict Boolean function of those binary inputs. An IANN, by contrast, behaves more like a set of dimmer switches connected through a mixing board. Each input can take any value within a continuous range, the connections have adjustable weights, and the final output is a smooth, graded signal instead of a hard 0 or 1.

This difference brings several benefits. Because IANNs are built from sequestrons that process signals in the analog domain, they can represent and compute with concentrations across a wide dynamic range. Boolean circuits squeeze all that richness into just two bins, but IANNs preserve it. A large enough IANN can in principle approximate any input‑output function, which is the biological version of the universal approximation theorem from machine learning. IANNs are also compact and scalable. Instead of layering many different logic gates, each with its own set of genetic parts, they use a single repeatable building block called a sequestron. The weights are set simply by adjusting DNA concentrations, so adding complexity means adding more copies of the same module rather than inventing new gate designs. Tuning the weights is like turning knobs on a mixing board: you change the ratio of plasmids, and the circuit’s behavior changes without needing to redesign any genetic parts. Finally, IANNs degrade gracefully. A small disturbance in the input causes only a small change in the output. Boolean circuits, on the other hand, can flip from the correct answer to the wrong one because of a tiny fluctuation near the switching threshold.

Q2. Applications

An IANN could be designed to detect early-stage urinary tumor DNA (utDNA) in dogs by using CRISPR‑based DNA sensors that convert the presence of tumor‑specific mutations into transcriptional inputs for the IANN.

Pre-processing (in vitro): Three CRISPR-Cas13a sensors, each with mutation-specific crRNAs, detect BRAF V595E, TP53 mutations, and aberrant methylation in urine cfDNA. Because Cas13a recognizes RNA, the amplified cfDNA would first need to be transcribed, as in SHERLOCK-style assays. The collateral cleavage activity then de-represses synthetic promoters in proportion to how much mutant DNA is present, converting molecular detection into analog transcriptional signals.

Computation (in vivo, HEK293 cells): A two-layer IANN built from sequestrons receives those three graded promoter signals as endoribonuclease inputs. Layer 1 integrates the BRAF and TP53 channels; Layer 2 combines Layer 1’s output with the methylation signal to produce a final weighted decision.

Output: mCherry fluorescence intensity acts as a continuous “cancer probability score” — low for healthy, moderate for single-mutation/early-stage, high for multi-mutation/advanced disease.

The analogy is a team of sniffer dogs reporting to a handler: each dog gives a graded intensity signal for its specific scent (mutation), and the handler weighs them all to decide whether to raise the alarm.

Limitations: the practical hurdles include the sensitivity floor of CRISPR sensors for dilute utDNA, the sensor-to-cell interface, transfection variability, the 650 ng DNA budget, temporal lag, incomplete mutation panels, gaps in canine methylome data, and the lack of tissue-of-origin discrimination.

CRISPR-to-IANN canine utDNA detection system

Full system diagram showing CRISPR-based DNA sensors converting urinary tumor DNA mutations into transcriptional inputs for a two-layer intracellular artificial neural network

Figure 1: Early canine tumor detection via CRISPR-to-IANN biosensor. Urine cfDNA is amplified and split across three CRISPR-Cas13a sensors targeting BRAF V595E, TP53 mutations, and methylation signatures. Each sensor de-represses a promoter proportionally to mutant utDNA concentration. Inside HEK293 cells, these analog signals feed a two-layer IANN built from sequestrons. The mCherry fluorescence output serves as a continuous cancer probability score.

Q3. Single-layer intracellular perceptron

Single-layer intracellular perceptron where X1 encodes Csy4 endoribonuclease and X2 encodes a fluorescent protein whose mRNA is regulated by Csy4

Figure 2: Single-layer intracellular perceptron. X₁ is DNA encoding the Csy4 endoribonuclease; X₂ is DNA encoding a fluorescent protein whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. The dashed circle represents the sequestron, where Csy4 (−) cleaves the fluorescent protein mRNA (+), and surviving mRNA is translated into the output Y.
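One way to read Figure 2 as a perceptron is to treat the reporter DNA as a positive input, the Csy4 DNA as a negative input, and the sequestron as a floor at zero. The sketch below is an idealization under that assumption: the linear weights `w1` and `w2` and the hard floor are illustrative choices, not a measured transfer function, and real Csy4 cleavage kinetics would give a smoother curve.

```python
def sequestron_output(x1_csy4_dna, x2_reporter_dna, w1=1.0, w2=1.0):
    """Idealized perceptron: reporter mRNA left after Csy4 cleavage, floored at zero."""
    surviving_mrna = w2 * x2_reporter_dna - w1 * x1_csy4_dna
    return max(0.0, surviving_mrna)


# Arbitrary input units; sweep Csy4 DNA at a fixed reporter dose.
for x1 in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(x1, sequestron_output(x1, x2_reporter_dna=3.0))
```

In this reading the output falls linearly as Csy4 DNA increases and stays at zero once the negative input outweighs the positive one, which resembles a rectified-linear activation.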


Part 2. Fungal Materials

1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Mycelium packaging, such as Ecovative’s Mushroom Packaging, is made by growing mycelium on agricultural waste inside a mold. After a few days the material is heat‑killed and dried, producing a rigid, lightweight foam that can replace Styrofoam for protective packaging. It is completely biodegradable, grows on waste feedstocks, and uses little energy to manufacture. However, it has lower compressive strength than Styrofoam, is sensitive to moisture, and is slower to produce at scale.

Mycelium leather, like Bolt Threads’ Mylo and MycoWorks’ Reishi, is grown as a pure sheet in controlled fermentation, then tanned and finished much like animal leather. It is used in fashion and accessories. It requires no animal farming and has a much lower water and land footprint, and its thickness and texture can be tuned. On the downside, it is still expensive at small scale, its durability and aging properties are still being improved, and it needs chemical post‑processing to match the flexibility of animal leather.

Mycelium‑based building insulation is grown on straw or wood shavings and can be used as thermal and acoustic panels. Its thermal insulation is similar to that of synthetic foams, and it is non‑toxic, sequesters carbon, and resists fire better than petroleum‑based foams. But it is not strong enough for structural uses and can degrade if it gets wet.

Mycoprotein foods like Quorn and Meati come from fermenting filamentous fungi to make high‑protein, fibrous biomass that feels like meat. These products are high in protein, have a complete amino acid profile, and produce far fewer greenhouse gases than animal farming. Still, some people are allergic, the feedstocks are sugar‑based, and the taste and texture are not yet exactly the same as real meat.

Mycelium automotive parts were explored by Ford and Ecovative for interior pieces like dashboards, door panels, and seat cushions, taking advantage of the material’s sound absorption and impact resistance. They are lightweight, need no adhesive because the mycelium acts as the binder, and can be composted at the end of their life. However, they are sensitive to water, there is not yet much data on long‑term durability, and they are not yet cost‑competitive with synthetic foams at automotive scale.

2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Bacteria are like bicycles: fast, simple, and cheap, good for making small proteins and simple chemicals. Fungi are like trucks. They are eukaryotes, so they have the cellular machinery to fold complex proteins, add post‑translational modifications, and build intricate three‑dimensional structures that bacteria generally cannot make. With that machinery, you could engineer fungi to grow mycelium materials that are inherently stronger, more flexible, or more water‑resistant without needing chemical post‑processing. You could also make fungi produce high‑value proteins while they are growing: they could secrete antimicrobial peptides, fire‑retardant proteins, or pigments directly into the material, creating functional composites in a single step. You could create self‑healing materials by engineering dormant spores into dried mycelium composites that reactivate when water enters a crack, similar to how skin heals. You could also enhance bioremediation by engineering white‑rot fungi to overproduce the enzymes that break down plastics, pesticides, or industrial dyes.

Compared to bacteria, fungi have many advantages for synthetic biology. They fold and modify proteins correctly because they have an endoplasmic reticulum and Golgi apparatus, which bacteria lack. Their hyphal growth lets them naturally form sheets, foams, and composites, whereas most bacteria grow as single cells. Fungi can grow on cheap, unprocessed plant waste because they secrete powerful enzymes to break down cellulose and lignin, while most bacteria need processed sugars. Their eukaryotic compartments let them keep toxic intermediates separate and run incompatible pathways at the same time, which helps them make complex natural products. Genetic tools like CRISPR‑Cas9, promoter libraries, and selectable markers are now well developed in model fungi, making engineering much easier. And because fungi are multicellular, they can form different tissue types such as aerial hyphae or fruiting bodies, opening up possibilities for spatial organization and layered material architectures that are hard to achieve with single‑celled bacteria.

Week 9 HW: Cell Free Systems

Part 1

1. Advantages of Cell-Free Over In Vivo Expression

Cell-free protein synthesis (CFPS) removes the cell as a “black box” and lets you directly control and observe every variable in real time: pH, redox potential, ionic strength, and cofactor concentration.

2. Main Components and Their Roles

| Component | Role |
| --- | --- |
| Cell extract | Provides ribosomes, tRNA, synthetases, chaperones, and the rest of the translation machinery |
| DNA/mRNA template | Encodes the target protein (plasmid or linear) |
| RNA polymerase | Transcribes DNA → mRNA (T7 RNAP is most common) |
| Amino acids | Raw building blocks for translation |
| Energy system | Supplies and recycles ATP/GTP to power translation |
| Salts and buffer | Maintains pH (~7.5) and ionic strength (Mg²⁺, K⁺ critical) |
| Additives | Chaperones, detergents, etc., added based on target needs |

3. Energy Provision and ATP Regeneration

Translation is enormously ATP-hungry: every peptide bond costs ~4 high-energy phosphate bonds. In a tube, the initial ATP pool depletes within 30–60 minutes, stalling ribosomes and collapsing yield.

Phosphocreatine / Creatine Kinase System

ADP + phosphocreatine → ATP + creatine (catalyzed by creatine kinase)

  • Add phosphocreatine (~20 mM) and creatine kinase (~0.5 mg/mL).
  • Extends productive reaction time from ~1 hour to 3–6 hours.
  • Alternative: Maltose/maltodextrin system — a multi-enzyme cascade mimicking glycolysis, cheaper for large-scale reactions.
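A back-of-the-envelope calculation shows why regeneration matters. The sketch below takes the ~4 phosphate bonds per peptide bond and the ~20 mM phosphocreatine figure from the text; the 238-residue protein is a hypothetical example, and the result is only an upper bound because it assumes every regenerated ATP goes into translation.

```python
BONDS_PER_PEPTIDE_BOND = 4  # approximate figure quoted in the text


def phosphate_bonds_per_protein(n_residues):
    """High-energy phosphate bonds spent to link n residues into one chain."""
    return BONDS_PER_PEPTIDE_BOND * (n_residues - 1)


def max_protein_micromolar(phosphocreatine_mM, n_residues):
    """Upper bound on protein made if each phosphocreatine regenerates one ATP."""
    bonds_available_uM = phosphocreatine_mM * 1000.0
    return bonds_available_uM / phosphate_bonds_per_protein(n_residues)


n = 238  # hypothetical, roughly GFP-sized protein
print(phosphate_bonds_per_protein(n))
print(round(max_protein_micromolar(20.0, n), 1))
```

Even under these generous assumptions the ceiling is in the tens of micromolar, so the initial ATP pool alone cannot sustain a multi-hour reaction.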

4. Prokaryotic vs. Eukaryotic Cell-Free Systems

| Feature | Prokaryotic (E. coli) | Eukaryotic (wheat germ / CHO) |
| --- | --- | --- |
| Post-translational mods | None | Glycosylation, phosphorylation, etc. |
| Cost | Low | High |
| Yield | High | Moderate |
| Best for | Simple cytosolic proteins | Mammalian proteins, antibodies, GPCRs |
  • Prokaryotic Choice — T7 RNA Polymerase: Straightforward cytosolic protein, no PTMs needed, high yield required.
  • Eukaryotic Choice — Erythropoietin (EPO): Requires N-linked glycosylation for proper folding. A prokaryotic system would produce misfolded, inactive protein.

Designing Cell-Free Expression of a Membrane Protein

Membrane proteins are hydrophobic — without a lipid bilayer, they aggregate instantly.

Three Solubilization Strategies

  1. Detergent micelles (DDM, digitonin) — Simplest; add directly to reaction.
  2. Nanodiscs — Pre-assembled lipid bilayer discs; co-translate so protein inserts immediately.
  3. Liposomes — Lipid vesicles that capture the protein as it emerges from the ribosome.

Part 2. Synthetic Minimal Cell: Gut Microbiome Inflammation Sensor

1. Function

1a. What It Does — Input and Output

A liposome-based synthetic cell that detects elevated reactive oxygen species (ROS) in the gut lumen — a molecular signature of intestinal inflammation — and responds by producing and releasing butyrate, a short-chain fatty acid that suppresses NF-κB signaling and restores epithelial barrier integrity.

| Feature | Description |
| --- | --- |
| Input | Hydrogen peroxide ($H_2O_2$) and superoxide: ROS elevated during gut inflammation (IBD, Crohn’s, colitis) |
| Output | Butyrate (butanoic acid): anti-inflammatory metabolite that feeds colonocytes and suppresses immune activation |

Analogy: Think of it like a smoke detector hardwired to a fire sprinkler — the same signal that trips the alarm also triggers the response, with no human intervention needed. The synthetic cell is silent in a healthy gut and active only when and where inflammation occurs.

1b. Could Cell-Free Tx/Tl Alone Do This Without Encapsulation?

No. There are three primary reasons:

  1. Directionality: Without a membrane boundary, butyrate produced freely in solution would diffuse away immediately with no directional delivery to the epithelium.
  2. Protection: The ROS-sensing gene circuit would be exposed to gut proteases and nucleases, degrading within minutes.
  3. Threshold Control: There is no mechanism for threshold-gated release — the entire reaction would fire at once rather than responding proportionally to local ROS concentration.

1c. Could a Genetically Modified Natural Cell Do This?

Partially — but with serious limitations compared to a synthetic liposome:

| Feature | Engineered Bacterium | Synthetic Liposome Cell |
| --- | --- | --- |
| ROS sensing | Possible via OxyR regulon | Possible via OxyR-driven promoter |
| Butyrate synthesis | Yes, in multiple chassis | Yes, via an encapsulated enzyme pathway |
| Immune clearance | High: triggers innate immunity | Minimal: PEGylated lipids are inert |
| Replication control | Requires auxotrophy kill switch | Non-replicating by design |
| Gene Transfer | Risk of horizontal transfer | Zero risk |
| Regulatory path | Extremely difficult (GMO in gut) | More tractable as a drug delivery device |

1d. Desired Outcome

A synthetic cell administered orally (enteric-coated capsule) that survives transit to the colon, remains transcriptionally silent in healthy tissue, and activates butyrate synthesis specifically at inflamed foci where $H_2O_2$ exceeds threshold (~50 µM).

2. Component Design

2a. Membrane Composition

The membrane is designed to survive the harsh gut environment (low pH, bile salts) while remaining functional at 37°C.

| Lipid | Role | Mol% |
| --- | --- | --- |
| DPPC | High-Tm structural lipid; bile salt resistance | 40% |
| POPE | Supports protein insertion; reduces curvature stress | 25% |
| Cholesterol | Rigidifies bilayer; reduces permeability | 25% |
| DSPE-PEG2000 | PEG brush layer; prevents immune recognition | 10% |

2b. Encapsulated Contents

The “cytoplasm” of the synthetic cell contains the following:

  • Tx/Tl Machinery: E. coli cell-free extract (ribosomes, tRNA, chaperones), T7 RNA Polymerase, and the OxyR-responsive promoter plasmid.
  • Pre-loaded Enzymes: Acetyl-CoA acetyltransferase (ThlA) for fast initial response.
  • Small Molecules: Acetyl-CoA (2 mM), Phosphocreatine (20 mM) + creatine kinase (0.5 mg/mL), all 20 amino acids (5 mM each), and essential cofactors (NAD⁺/NADH, CoA).

2c. Tx/Tl System Origin: Bacterial (E. coli)

A prokaryotic system is preferred because the OxyR transcription factor and the butyrate synthesis enzymes (from Clostridium) are natively bacterial. No complex post-translational modifications (PTMs) are required, making the high-yield E. coli extract the most efficient choice.

2d. Communication with the Environment

  • Sensing (IN, passive): $H_2O_2$ crosses lipid bilayers freely via passive diffusion. Inside, it oxidizes OxyR, switching it from a repressor to an activator.
  • Secretion (OUT, channel-mediated): Because butyrate is anionic at physiological pH and crosses the bilayer poorly on its own, we express VDAC-1 (voltage-dependent anion channel 1), whose pores permit rapid efflux.

3. Experimental Details

3a. Complete Genes and Lipids

| Gene | Organism | Role |
| --- | --- | --- |
| oxyR | E. coli K-12 | ROS-activated transcription factor |
| thlA | C. acetobutylicum | Step 1: 2 acetyl-CoA → acetoacetyl-CoA |
| hbd | C. acetobutylicum | Step 2: → 3-hydroxybutyryl-CoA |
| crt | C. acetobutylicum | Step 3: → crotonyl-CoA |
| bcd/etfAB | C. acetobutylicum | Step 4: → butyryl-CoA |
| ptb/buk | C. acetobutylicum | Steps 5–6: → butyrate |
| VDAC1 | H. sapiens | Membrane pore for butyrate efflux |

3b. Measuring System Function

Validation is performed through a tiered strategy:

  1. Tier 1: ROS-responsive expression: Use a GFP reporter to confirm the OxyR circuit activates at the ~50 µM $H_2O_2$ threshold.
  2. Tier 2: Butyrate synthesis: Quantify butyrate production in bulk extract using GC-MS or enzymatic assays.
  3. Tier 3: Pore function: Use ANTS/DPX dye efflux assays to confirm VDAC-1 correctly inserts into the liposome membrane.
  4. Tier 4: Integrated function: Measure butyrate secretion from encapsulated cells in simulated healthy (5 µM $H_2O_2$) vs. inflamed (100 µM $H_2O_2$) conditions.
  5. Tier 5: Bioactivity: Apply the output to Caco-2 cells and measure the reduction in inflammatory markers (IL-8/NF-κB).
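For Tiers 1 and 4, it helps to know roughly what reporter response to expect at each test concentration. The sketch below models OxyR-driven activation as a Hill function with its half-maximal point at the ~50 µM threshold from the design; the Hill form and the coefficient `n = 2` are assumptions, since the real dose-response curve would have to be measured.

```python
def hill_activation(h2o2_uM, threshold_uM=50.0, n=2.0):
    """Fraction of maximal promoter activity at a given H2O2 concentration."""
    return h2o2_uM ** n / (threshold_uM ** n + h2o2_uM ** n)


# Test conditions named in the validation plan.
conditions = {"healthy": 5.0, "threshold": 50.0, "inflamed": 100.0}
for label, concentration in conditions.items():
    print(f"{label:>9}: {hill_activation(concentration):.3f}")
```

Under these assumptions the healthy condition sits near 1% of maximal activity and the inflamed condition near 80%, giving a concrete separation to compare against the GFP reporter data.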

Part 3. Aura-Weave (Smart Medical Wearables)

1. One-Sentence Summary Pitch

Aura-Weave is a smart, disposable textile liner for adult incontinence garments that uses freeze-dried cell-free extracts to seamlessly detect and visually report urinary tract infections (UTIs) by changing color when exposed to infected urine.

2. How the Idea Works in Detail

The Aura-Weave liner features a middle diagnostic layer composed of a highly absorbent cellulose matrix (similar to filter paper). This matrix is pre-loaded with a lyophilized (freeze-dried) cell-free extract (CFE), specific engineered DNA circuits, and pH buffers.

The Mechanism:

  1. Dormancy: The system remains completely inactive on the shelf in its dry state.
  2. Activation: When the wearer voids urine, the warm liquid acts as the natural rehydration trigger, “booting up” the biological transcription and translation machinery.
  3. Detection: If specific UTI biomarkers are present—such as nitrites, leukocyte esterase, or bacterial quorum-sensing molecules—the genetic circuit is triggered.
  4. Visual Output: The circuit drives the rapid expression of a vibrant chromoprotein (like AmilCP). Within 45 to 60 minutes, a clear blue warning symbol permeates to the outer visible edge of the textile.

Visual Indicator: A blue symbol or color change alerts the caregiver immediately without requiring a manual diagnostic test.

3. Societal Challenge and Market Need

This addresses the “silent crisis” of UTIs in elderly, bedridden, and cognitively impaired populations (such as those with Alzheimer’s or dementia).

  • Communication Barriers: These patients often cannot communicate early symptoms like pain or urgency, leading to delayed diagnosis.
  • Medical Risks: Late-stage UTIs frequently progress to severe kidney infections, sepsis, and costly hospitalizations.
  • Non-Invasive Monitoring: Aura-Weave eliminates the difficult, messy, and stress-inducing process of collecting a clean urine sample from an uncooperative patient, allowing for continuous, passive health monitoring in nursing homes and home-care settings.

4. Addressing Cell-Free Limitations

| Limitation | Aura-Weave Strategy |
| --- | --- |
| Water Activation | The Built-in Trigger: In this context, rehydration is a feature. The CFE stays inactive until the exact moment the patient urinates, ensuring the test only runs when a sample is provided. |
| Stability | Sugar Matrix Stabilization: To ensure a shelf life of over a year, the CFE and DNA are co-lyophilized with a stabilizing sugar (trehalose) and a strong buffer (HEPES), locking proteins in a stable, glass-like state at room temperature. |
| One-Time Use | Lifecycle Alignment: Incontinence liners are inherently single-use. The biological sensor’s lifecycle perfectly matches the textile’s lifecycle; once soiled and read, the garment is safely discarded. |

Technical Specifications

  • Target Biomarkers: Nitrites, Leukocyte Esterase, Quorum-Sensing molecules.
  • Reporter Protein: AmilCP (Chromoprotein).
  • Time to Result: 45–60 minutes post-activation.
  • Storage Requirements: Room temperature (stabilized via Trehalose).

## Part 4

  • The Challenge: Current missions detect “bricks” (organic molecules) but not “factories” (active life).
  • Strategy: Use FD-CF systems to detect ATP Synthesis and 16S rRNA genes.
  • Reasoning: Only active biology synthesizes ATP; abiotic chemistry (like meteorites) might have organics but no coordinated metabolism.
  • Experimental Plan: Compare Mars simulant spiked with extremophiles against sterile simulant and abiotic meteorite extracts.

Week10 HW: Imaging and Measurement

What to measure: Identity, mass, purity, and post-translational modifications of the target protein; concentration of a biomarker; oligomeric state.

How:

  • Intact mass by LC-MS (QTof) → confirms overall MW and detects unexpected modifications.
  • Peptide mapping by tryptic digest + LC-MS/MS → confirms primary sequence and identifies PTM sites.
  • Native MS / CDMS → reveals folded state and oligomeric assembly.
  • SDS-PAGE / Western blot → quick purity and identity check before MS.
  • UV-Vis (A280) → concentration.

Part I — Molecular Weight of Intact eGFP

Q1. Theoretical MW from sequence

Sequence length: 247 residues (includes LE linker + HHHHHH His-tag).

Calculated average MW ≈ 28,006.6 Da (~28 kDa).

Analogy: counting MW from sequence is like weighing a train by summing the weight of each car — every amino acid adds its known “car weight” minus one water molecule per peptide bond.

Q2. Deconvolution from Figure 1

Selected adjacent peaks (denatured envelope): m/z = 933.7148 and m/z = 903.7148.

Step 2.1 — Charge of the lower-charge peak (n):

$$z_n = \frac{m/z_{n+1}}{(m/z_n) - (m/z_{n+1})} = \frac{903.7148}{933.7148 - 903.7148} = \frac{903.7148}{30.0000} ≈ 30.12$$

→ rounded to z = 30 (and the adjacent peak is z = 31).

Step 2.2 — MW from m/z and z:

$$MW = z_n \times (m/z_n - 1.00728) = 30 \times (933.7148 - 1.00728) ≈ \mathbf{27{,}981\ Da}$$

(Using the non-rounded z ≈ 30.12 gives 28,097 Da. Either is acceptable.)

Step 2.3 — Accuracy:

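$$\text{Accuracy} = \frac{|28{,}097 - 28{,}007|}{28{,}007} \approx 0.32\%$$

That is ~3,200 ppm error, which is acceptable for a quick two-peak hand estimate but much coarser than a full deconvolution across the whole charge envelope would give.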
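The two-peak estimate can be sketched in a few lines of Python. This is a minimal sketch, not the instrument software's deconvolution: the function names are illustrative, and the charge formula here includes the proton-mass correction, so it returns z ≈ 30.09 rather than the 30.12 from the shortcut formula (both round to 30).

```python
PROTON = 1.00728  # mass of a proton in Da

def charge_of_higher_mz_peak(mz_high, mz_low):
    """Charge z of the higher-m/z peak, given the adjacent peak carrying z + 1 charges."""
    return (mz_low - PROTON) / (mz_high - mz_low)

def neutral_mass(mz, z):
    """Neutral mass of an [M + zH]^z+ ion."""
    return z * (mz - PROTON)

def ppm_error(observed, theoretical):
    """Absolute mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6

# Adjacent peaks picked from the denatured envelope (Figure 1)
z_estimate = charge_of_higher_mz_peak(933.7148, 903.7148)
z = round(z_estimate)                    # charge states are integers
mw = neutral_mass(933.7148, z)
error_ppm = ppm_error(mw, 28006.6)       # theoretical MW from Q1

print(f"z = {z}, MW = {mw:.1f} Da, error = {error_ppm:.0f} ppm")
```

With the integer charge z = 30 this gives about 27,981 Da, roughly 900 ppm from the theoretical value.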

Q3. Charge state of the zoomed-in peak (~1473 m/z region)

No, the charge state cannot be confidently assigned from the zoom. Reason: the zoomed peak shows partially resolved features, but at 30,000 resolution on a ~28 kDa protein the isotope envelope is not baseline-resolved — adjacent isotope peaks are not separated by a clean 1/z spacing you can read off. To assign charge from isotope spacing you need resolution high enough to see individual ¹²C/¹³C isotope peaks (≥60,000–100,000 for a protein this size, ideally on an Orbitrap or FT-ICR).

Analogy: trying to count the steps of a staircase from far away — if your camera (resolution) isn’t sharp enough, the steps blur into a ramp.


Part II — Native vs Denatured (Secondary/Tertiary)

Q1. What changes when a protein unfolds, and how MS sees it

  • Native (folded): the protein is compact, so only a few surface residues are accessible to protons during electrospray → it picks up few charges → peaks appear at high m/z, in a narrow charge envelope (e.g., Figure 2 bottom: ~2500–3000 m/z, only ~2–3 charge states).
  • Denatured (unfolded): the chain is extended, exposing every basic residue (K, R, H, N-terminus) to protonation → it picks up many charges → peaks span a broad envelope at low m/z (Figure 2 top: ~700–1500 m/z, many charge states).

The mass is the same — only the charge distribution shifts. MS reads the protein’s “shape” indirectly through how many protons it carries.

Analogy: a folded origami crane has only a few outer surfaces to stick stickers (charges) on; unfold it into a flat sheet and you can stick stickers everywhere. The paper hasn’t changed weight, but the sticker count tells you the shape.

Q2. Charge state at ~2800 m/z (native)

Inset isotope spacing in Figure 3: peaks at 2544.4885, 2544.5801, 2544.6719, 2544.7637, 2544.8552, 2545.0388 → spacing ≈ 0.092 m/z.

$$z = \frac{1}{\Delta(m/z)} = \frac{1}{0.092} ≈ \mathbf{11}$$

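So the isotope-resolved peak at ~2544 m/z is the +11 charge state, which makes the neighbouring peak at ~2800 m/z the +10 charge state (28,007 / 10 + 1.007 ≈ 2,802). Both are consistent with a compact, native eGFP carrying few protons.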
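As a quick check, the isotope-spacing rule can be applied to the inset peaks directly. This is a sketch under two assumptions of mine: the median gap is used because the last listed peak skips one isotope, and `expected_mz` is an illustrative helper that predicts where each charge state of the 28,006.6 Da protein should appear.

```python
from statistics import median

PROTON = 1.00728  # Da

# Isotope peaks read from the Figure 3 inset
isotope_peaks = [2544.4885, 2544.5801, 2544.6719, 2544.7637, 2544.8552, 2545.0388]

# The median gap is robust to the one skipped isotope between the last two peaks.
gaps = [b - a for a, b in zip(isotope_peaks, isotope_peaks[1:])]
spacing = median(gaps)
z = round(1 / spacing)

def expected_mz(mass, charge):
    """m/z of an [M + zH]^z+ ion for a given neutral mass."""
    return (mass + charge * PROTON) / charge

theoretical_mw = 28006.6  # from Part I, Q1
for charge in (10, 11, 12):
    print(charge, round(expected_mz(theoretical_mw, charge), 1))
print("charge from isotope spacing:", z)
```

For a 28,006.6 Da protein this places the +11 ion near m/z 2547 and the +10 ion near m/z 2802.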


Part III — Peptide Mapping (Primary Structure)

Q1. Lysines and Arginines in eGFP

  • K (Lysine): 20
  • R (Arginine): 6
  • Total K + R cleavage sites: 26

Highlighted in the sequence (K and R in bold):

MVS**K**GEELFTGVVPILVELDGDVNGH**K**FSVSGEGEGDATYG**K**LTL**K**FICTTG**K**LPVPWPTLVTTLTYGVQCFS**R**
YPDHM**K**QHDFF**K**SAMPEGYVQE**R**TIFF**K**DDGNY**K**T**R**AEV**K**FEGDTLVN**R**IEL**K**GIDF**K**EDGNILGH**K**
LEYNYNSHNVYIMAD**K**Q**K**NGI**K**VNF**K**I**R**HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALS**K**
DPNE**KR**DHMVLLEFVTAAGITLGMDELY**K**LEHHHHHH

Q2. Predicted tryptic peptides

In silico digest (no missed cleavages, cleavage after K/R unless followed by P): 27 peptides total. PeptideMass with mass cutoff ≥500 Da returns ~17 peptides (filters out very small fragments like TR, QK, IR, single R).
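The residue counts and the digest can be reproduced with a short script. This is a minimal sketch, not PeptideMass itself: `EGFP` is the sequence from Q1 with the bold markers removed, and `tryptic_digest` is an illustrative helper that applies the same rule (cleave after K/R unless followed by P, no missed cleavages).

```python
import re

# eGFP + LE linker + His6 tag, copied from Q1 (247 residues)
EGFP = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSR"
    "YPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK"
    "LEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK"
    "DPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)

def tryptic_digest(seq):
    """Cleave after K or R, except when the next residue is P (no missed cleavages)."""
    # Zero-width split points; requires Python 3.7+.
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

peptides = tryptic_digest(EGFP)
short = [p for p in peptides if len(p) <= 2]    # fragments a mass cutoff would drop
start = EGFP.find("FEGDTLVNR") + 1              # 1-based start position

print(len(EGFP), EGFP.count("K"), EGFP.count("R"), len(peptides))
print("very short peptides:", short)
print("FEGDTLVNR starts at residue", start)
```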

Q3. Chromatographic peaks in TIC (0.5–6 min, >10% abundance)

From Figure 5a, counting peaks above 10% relative intensity: ~15–17 peaks (the most prominent at 0.43, 0.61, 0.79, 1.20, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06, 5.43, 6.12, 6.50, 6.64, 6.73 — about 15 of these clearly exceed 10%).

Q4. Does peak count match predicted peptides?

The TIC shows fewer peaks than predicted (~15 visible vs 27 predicted). Reasons:

  • Very small peptides (TR, QK, IR, R, NGIK) are below the MS detection range or wash out in the dead volume.
  • Some peptides co-elute (overlap in retention time).
  • Some hydrophilic peptides aren’t retained on C18 reverse-phase column.

Q5. Charge of the peak at 525.76712

Isotope spacing in Figure 5b inset: 525.76712 → 526.25918 → 526.76845 → 527.26998. Spacing ≈ 0.492 m/z.

$$z = \frac{1}{0.492} ≈ \mathbf{2}$$

Singly charged [M+H]⁺:

$$[M+H]^+ = z \times (m/z) - (z-1) \times 1.00728 = 2 \times 525.76712 - 1.00728 = \mathbf{1050.527\ Da}$$

Q6. Peptide identification and mass accuracy

Matching 1050.527 against the PeptideMass output → FEGDTLVNR (residues 115–123 of eGFP, theoretical monoisotopic [M+H]⁺ = 1050.5214 Da).

Mass error in ppm:

$$\text{ppm} = \frac{|1050.527 - 1050.5214|}{1050.5214} \times 10^6 ≈ \mathbf{5.3\ ppm}$$

Excellent accuracy (sub-10 ppm is standard for QTof).
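The Q5 and Q6 arithmetic can be checked end to end. This is a sketch assuming standard monoisotopic residue masses (only the residues in FEGDTLVNR are listed); the helper names are illustrative.

```python
# Monoisotopic residue masses in Da (only the residues needed here)
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728

def theoretical_mh(peptide):
    """Monoisotopic [M+H]+ of a peptide from its residue masses."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

def observed_mh(mz, z):
    """Convert an observed m/z at charge z to the singly charged [M+H]+."""
    return z * mz - (z - 1) * PROTON

# Isotope peaks from the Figure 5b inset; mean spacing gives the charge
iso = [525.76712, 526.25918, 526.76845, 527.26998]
z = round(1 / ((iso[-1] - iso[0]) / (len(iso) - 1)))

obs = observed_mh(iso[0], z)
theo = theoretical_mh("FEGDTLVNR")
ppm = abs(obs - theo) / theo * 1e6

print(f"z = {z}, observed = {obs:.4f}, theoretical = {theo:.4f}, error = {ppm:.1f} ppm")
```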

Q7. Sequence coverage

From Figure 6: 88% of the eGFP sequence is confirmed by peptide mapping.

Bonus Q8. Sequence from fragmentation (Figure 5c)

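The fragment masses (122.07, 214.09, 388.22, 501.31, 602.35, 537.25, 774.41, 903.44, 1050.52) are consistent with FEGDTLVNR. The y-ion ladder is the clearest evidence: 388.22 (y3, VNR), 501.31 (y4, LVNR), 602.35 (y5, TLVNR), 774.41 (y7, GDTLVNR), and 903.44 (y8, EGDTLVNR), with 1050.52 as the intact [M+H]⁺. The steps between them (113.09, 101.04, 172.06, 129.03, 147.08) spell L, T, D+G, E, F reading toward the N-terminus. The low-mass ions are less certain: 122.07 sits near, but not exactly at, the phenylalanine immonium ion (120.08), so I treat that assignment as tentative. Overall the ladder confirms FEGDTLVNR.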
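To see which observed fragments belong to the y-ion series, the theoretical y-ions can be generated and matched. This is a sketch assuming standard monoisotopic residue masses and a ±0.02 Da matching window that I picked for illustration; `y_ions` is a hypothetical helper, not part of any MS package.

```python
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728
TOLERANCE = 0.02  # Da, assumed matching window

def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 first (built up from the C-terminus)."""
    ions, running = [], WATER + PROTON
    # Skip the N-terminal residue: the full-length "y-ion" is just the precursor.
    for residue in reversed(peptide[1:]):
        running += MONO[residue]
        ions.append(running)
    return ions

observed = [122.07, 214.09, 388.22, 501.31, 602.35, 537.25, 774.41, 903.44, 1050.52]

matches = {}
for i, mz in enumerate(y_ions("FEGDTLVNR"), start=1):
    hits = [o for o in observed if abs(o - mz) <= TOLERANCE]
    if hits:
        matches[f"y{i}"] = hits[0]

for name, mz in matches.items():
    print(name, mz)
```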

Bonus Q9. Does the data confirm eGFP?

Yes. 88% sequence coverage + a confirmed peptide (FEGDTLVNR) with <10 ppm mass error + matching fragmentation pattern is strong identification. The unobserved 12% is typical (very short peptides or unretained hydrophilic peptides) and doesn’t undermine the ID.

Analogy: it’s like recognizing a friend from 88% of their face uncovered — you don’t need every feature, you just need enough unique landmarks.


Part IV — KLH Oligomers (CDMS)

Using subunit masses: 7FU = 340 kDa, 8FU = 400 kDa, and “Decamer” = 10 subunits.

| Oligomer | Composition | Expected mass | Peak in Figure 7 |
| --- | --- | --- | --- |
| 7FU Decamer | 10 × 340 kDa | 3.4 MDa | peak at 3.4 MDa |
| 8FU Didecamer | 20 × 400 kDa | 8.0 MDa | peak at ~7.52 MDa (close to 8.0; some mass loss possible) |
| 8FU 3-Decamer | 30 × 400 kDa | 12.0 MDa | peak at 12.67 MDa |
| 8FU 4-Decamer | 40 × 400 kDa | 16.0 MDa | peak around ~16 MDa (small/absent) |

The prominent 4.013 MDa peak is likely an 8FU decamer (10 × 400 = 4.0 MDa). The 0.1982, 0.79, 1.52 peaks are sub-decameric assemblies/free subunits.
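The peak assignments follow from simple multiplication, which a few lines of Python make explicit. This is a sketch using only the subunit masses and peak positions quoted here; restricting the candidates to the 7FU decamer plus 8FU multi-decamers is my assumption, and `nearest_species` is an illustrative helper.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

def expected_mda(subunit, decamers):
    """Mass in MDa of an assembly built from `decamers` x 10 subunits."""
    return SUBUNIT_KDA[subunit] * 10 * decamers / 1000

# Candidate assemblies considered (an assumption of this sketch)
species = [("7FU", 1)] + [("8FU", n) for n in (1, 2, 3, 4)]
expected = {s: expected_mda(*s) for s in species}

def nearest_species(peak_mda):
    """Candidate whose expected mass is closest to the observed peak."""
    return min(expected, key=lambda s: abs(expected[s] - peak_mda))

for peak in (3.4, 4.013, 7.52, 12.67):
    subunit, n = nearest_species(peak)
    print(f"{peak} MDa -> {subunit} x {10 * n} subunits (expected {expected[(subunit, n)]} MDa)")
```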

Analogy: CDMS measures each particle individually — like weighing each LEGO build that walks past on a conveyor belt, instead of melting them all and weighing the slag. You see exactly which assemblies exist.


Part V — Validate GFP?

| | Theoretical | Observed (Intact LC-MS) | PPM Mass Error |
| --- | --- | --- | --- |
| Molecular weight (kDa) | 28.007 | ~28.097 (from deconvolution of Figure 1) | ~3,200 ppm (0.32%) |

The observed intact mass agrees with the theoretical mass of eGFP+LE+His₆ to within ~0.3%. Combined with 88% peptide map coverage and confirmed FEGDTLVNR fragmentation → yes, this is eGFP.

Week11 HW: BIOPRODUCTION AND CLOUD LABS

Final 1,536 pixel canvas

Unfortunately I was away at CHI 2026 during the contribution window, so I didn’t get to commit a pixel in time.


Part B – Cell-Free Protein Synthesis

B1. Role of each component

E. coli Lysate

  • BL21 (DE3) Star Lysate (with T7 RNAP): The “factory floor” – a crude cytoplasmic extract carrying ribosomes, tRNAs, aminoacyl-tRNA synthetases, translation factors, and the T7 RNA polymerase needed to transcribe T7-promoter templates. The Star background also carries a truncated RNase E (the rne131 mutation) with reduced mRNA-degrading activity, so mRNAs last longer.

Salts / Buffer

  • Potassium Glutamate: Main osmolyte and K+ source for ribosome function; glutamate (unlike chloride) doesn’t inhibit translation.
  • HEPES-KOH pH 7.5: pH buffer – holds the reaction near physiological pH so enzymes work and chromophores fold correctly.
  • Magnesium Glutamate: Mg2+ is an essential cofactor for RNAP, ribosome assembly, and most nucleotide-binding enzymes. Glutamate is again the counter-ion, for the same reason as with K+.
  • Potassium phosphate (mono/dibasic, 1.6:1): Phosphate buffer + Pi source. Pi is recycled into NTPs and into the energy regeneration loop.

Energy / Nucleotide System (NMP-Ribose system)

  • Ribose: Carbon backbone for nucleotide regeneration; cellular enzymes in the lysate convert ribose + NMPs into NTPs.
  • Glucose: Primary energy source – glycolysis in the lysate regenerates ATP from ADP, powering everything else.
  • AMP, CMP, UMP (GMP at 0 mM): Cheap nucleotide monophosphates supplied as precursors; lysate kinases phosphorylate them up to NTPs as needed (this is why NMP-Ribose is cheaper than buying NTPs directly). GMP is left out of this mix and replaced by guanine.
  • Guanine: Free base that lysate enzymes salvage into GMP/GDP/GTP – this is the workaround for the missing GMP in the mix (see B3).

Translation Mix (Amino Acids)

  • 17 Amino Acid Mix: The protein building blocks. Tyrosine and cysteine are supplied separately because they have solubility/oxidation quirks.
  • Tyrosine (pH 12): Tyrosine is poorly soluble at neutral pH, so it’s prepared in a strong alkaline stock and added separately.
  • Cysteine: Highly reactive thiol – prone to oxidation and disulfide formation, so kept separate and added fresh.

Additives

  • Nicotinamide: Precursor for NAD+/NADH – needed by glycolysis (GAPDH step) to keep the energy regeneration cycle running.

Backfill

  • Nuclease-Free Water: Brings the reaction to volume without introducing RNases that would chew up the mRNA.

Analogy: the cell-free reaction is a kitchen with the chefs (ribosomes, polymerase), the recipes (DNA template), and an inventory system (energy regeneration). The 20-h mix is set up so the kitchen doesn’t run out of energy after one hour – instead of pre-cooked NTPs, it ships in flour (ribose) and grain (NMPs) and bakes its own NTPs continuously.

B2. PEP-NTP vs NMP-Ribose master mix – main differences

The PEP-NTP mix is a sprint: it supplies finished NTPs directly plus PEP (phosphoenolpyruvate) as a high-energy phosphate donor, giving immediate energy and transcription for fast 1-hour reactions, but it’s expensive and burns out as PEP depletes. The NMP-Ribose mix is a marathon: it supplies cheap NMP precursors + ribose + glucose so the lysate’s own enzymes regenerate NTPs and ATP continuously, sustaining transcription/translation for 20 hours at much lower cost.

Analogy: PEP-NTP is buying a pre-charged battery; NMP-Ribose is installing a slow-trickle solar panel. The battery is faster at first but it runs out – the panel keeps producing as long as the sun (glucose) is shining.

B3. Bonus – Transcription without GMP, only Guanine

E. coli lysates retain the purine salvage pathway. Guanine phosphoribosyltransferases (Gpt and Hpt in E. coli, the counterparts of mammalian HGPRT, hypoxanthine-guanine phosphoribosyltransferase) convert free guanine + PRPP (which the lysate makes from ribose and ATP) into GMP, and then nucleotide kinases (GMK, NDK) phosphorylate GMP -> GDP -> GTP. So guanine + ribose effectively replaces GMP at a fraction of the cost.

Analogy: you don’t buy bread if you have flour and a baker. The lysate has the baker (HGPRT + kinases) and the flour (ribose), so guanine is enough.


Part C – Planning the Global Experiment

C1. One key biophysical property of each FP affecting cell-free readout

  1. sfGFP: Fast, robust folding even under stress. sfGFP was engineered specifically to fold and mature quickly in conditions where regular GFP fails (heterologous expression, fusion tags). In cell-free, this means it lights up earliest of the green channel and is a forgiving “always works” baseline. Like GFP family, it needs O2 for chromophore maturation (oxidation step), but its maturation is fast (~14 min in vivo).

  2. mRFP1: Slow maturation and stepwise blue->red intermediate. mRFP1 (the original DsRed-derived monomer) matures over hours via a blue/green intermediate, so 1-hour reactions barely capture any red signal – a 36-h incubation is essentially required to see full red. Also O2-dependent (extra oxidation step compared to GFP).

  3. mKO2: Orange FP with moderate maturation, pH-stable. mKO2 (from Fungia coral) has good photostability and a relatively low pKa (~5.5), so it tolerates the slight pH drift cell-free reactions experience as glucose ferments to lactate/acetate. Maturation slower than GFP-class but faster than DsRed-derived reds.

  4. mTurquoise2: High quantum yield, but pKa ~3.1 means very acid-resistant; sensitive to chloride. Probably the brightest cyan available, with QY ~0.93. It folds well in E. coli lysates. Its CFP-class chromophore is insensitive to acidification, so it’s a great choice if the long-incubation pH drops. Like GFP, needs O2.

  5. mScarlet-I: Fastest-maturing monomeric red (~31 min in yeast), but moderate acid sensitivity (pKa ~5.4). This makes it the best red for short or pH-stable reactions; if pH drifts below ~6, signal drops noticeably.

  6. Electra2: Blue FP derived from mRuby3, monomeric, bright, with a chromophore that doesn’t require O2 for maturation in the same way GFP does (BFPs of this class form their chromophore via a different cyclization route). Designed for live-cell intracellular brightness, which translates well to cell-free. Spectral non-overlap with the green/red FPs makes it ideal for the multi-color canvas.

C2. Hypothesis – one reagent adjustment to improve a specific property

Protein: mRFP1

Reagent(s) to adjust: increase dissolved O2 (looser cap / higher headspace / pre-oxygenated buffer) and add supplemental FAD (~1 uM).

Expected effect: mRFP1 maturation includes an O2-dependent oxidation step that’s rate-limiting in sealed cell-free wells, where O2 depletes fast. Increasing O2 availability should accelerate red chromophore formation and raise final fluorescence at 36 h without changing protein yield. FAD has been shown to assist DsRed-family oxidative maturation.

Predicted endpoint: >=1.5x fluorescence vs default well.

Alternative hypothesis (mScarlet-I): Increase HEPES buffer concentration from 45 mM -> 80 mM to prevent pH drift below mScarlet-I’s pKa (~5.4) over 36 h. Expected effect: maintain protonation state of the chromophore, preserve fluorescence quantum yield, and prevent late-incubation signal decay.

Analogy: in photography terms, this is choosing the right film for the lighting. mTurquoise2 is high-ISO film (works in any light/pH), mRFP1 is a film that needs long exposure (slow maturation), mScarlet-I is film that gets fogged by acid in the developer (acid-sensitive) – so you adjust the chemistry (buffer, O2, time) to suit the film.

C3. Master mix composition

| Component | Stock conc. | Final conc. | Volume in 20 uL rxn |
| --- | --- | --- | --- |
| Lysate | | | 6 uL |
| 2x Optimized Master Mix | 2x | 1x | 10 uL |
| FP DNA template | | | 2 uL |
| Custom reagent supplement(s) | (fill in) | (fill in) | 2 uL |
| Total | | | 20 uL |

Labs

Lab writeups:

  • Week 1 Lab: Pipetting

  • Week 11 Lab: Cloud Lab

  • Week 2 Lab: DNA Gel Art

  • Week 3 Lab: Opentrons Art

  • Week 6 Lab: Gibson Assembly

  • Week 7 Lab: Neuromorphic Circuits

    Overview In this lab, we designed two neuromorphic genetic circuits using the HTGAA 2026 Genetic Circuit Design Template and simulated their behavior using the Biocompiler-Predict tool. Both circuits are built from endoribonuclease-based sequestrons — the fundamental building blocks of intracellular artificial neural networks (IANNs) — and are intended for transfection into HEK293 cells via Lipofectamine 3000 and execution by an OT-2 liquid handling robot.

Subsections of Labs

Week 1 Lab: Pipetting

Pre-lab answers

Stock MS in g/mL

5 M × 532 g/mol = 2660 g/L = 2.66 g/mL

Serial dilution plan: 5 M → 100 µM

Total dilution = 50,000× → 2 steps.

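| Step | From → To | Factor | Take stock | + dH₂O | Final vol | Pipette |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 5 M → 10 mM | 500× | 2 µL | 998 µL | 1000 µL | P2/P20 + P1000 |
| 2 | 10 mM → 100 µM | 100× | 10 µL | 990 µL | 1000 µL | P20 + P1000 |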

Tubes: 1.5 mL Eppendorfs (volumes too large for PCR strips).

Final reaction (60 µL, MS at 40 µM, dye at 1×)

| Reagent | Stock | Final | Volume |
| --- | --- | --- | --- |
| Loading dye | | | 10 µL |
| MS | 100 µM | 40 µM | 24 µL |
| dH₂O | | | 26 µL |
| Total | | | 60 µL |

Why 100 µM intermediate instead of diluting straight to 40 µM? 100 µM is a clean serial-dilution endpoint (50,000× = 500× × 100×); 40 µM isn’t. The 100 µM tube also acts as a reusable stock for downstream reactions — error compounds with every extra dilution step, so fewer steps to a clean intermediate is better.
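The dilution plan and the final reaction volumes can be recomputed with a short script. This is a minimal sketch using the numbers from the tables above; `dilution_volumes` is an illustrative helper, and the 10 µL of loading dye is taken as given.

```python
def dilution_volumes(factor, final_ul):
    """Return (stock_ul, diluent_ul) for one dilution step."""
    stock_ul = final_ul / factor
    return stock_ul, final_ul - stock_ul

steps = [("5 M -> 10 mM", 500), ("10 mM -> 100 uM", 100)]

conc_molar = 5.0
plan = []
for label, factor in steps:
    stock_ul, water_ul = dilution_volumes(factor, final_ul=1000)
    conc_molar /= factor
    plan.append((label, stock_ul, water_ul, conc_molar))

# Final reaction: C1 * V1 = C2 * V2, with 10 uL of loading dye already allotted
total_ul, dye_ul = 60, 10
ms_ul = 40 * total_ul / 100            # 40 uM final from the 100 uM tube
final_water_ul = total_ul - ms_ul - dye_ul

for label, stock_ul, water_ul, conc in plan:
    print(f"{label}: {stock_ul:g} uL stock + {water_ul:g} uL water -> {conc:g} M")
print(f"Final reaction: {ms_ul:g} uL MS + {dye_ul} uL dye + {final_water_ul:g} uL water")
```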


Part 1 — Mixing color (practice)

Followed protocol: tubes 1–3 single colors (500 µL each), tubes 4–6 mixed pairs (R+Y, Y+B, R+B). Two-step pipetting (200 + 20 µL) on tube 4 to practice tip changes.

Plating designs: Used 1–10 µL drops on a glass petri to build volume intuition. Drop diameter scales noticeably with volume — 1 µL drops are barely visible without backlight; 10 µL drops bead high enough to catch reflection.

Result

I pipetted the 甲骨文 (oracle bone script) of 马 — the Chinese character for horse — onto the plate to celebrate the Year of the Horse (2026). Oracle bone script is the earliest known form of Chinese writing, carved into ox scapulae and turtle plastrons during the Shang dynasty (~14th–11th c. BCE) for divination. The pictographic form of 马 still shows the horse’s mane, four legs, and tail.

| Reference character (甲骨文 of 马) | My pipetted plate |
| --- | --- |
| Oracle bone script of horse | Pipetted plate |

Observations:

  • Drop size variability across the design = visible record of where my hand was steady vs. shaky.
  • Surface tension on bare glass keeps drops discrete — no spreading or merging unless they touched.

Part 2 — Serial dilution

Performed the two-step dilution per the table above. Mixed by pipetting up/down 3–4× after each addition. Marked tubes with target concentration.

Prepared the 60 µL final reaction. Loaded 20 µL into a pre-prepared gel well (bonus step) — went in cleanly without puncturing.

Week 11 Lab: Cloud Lab

Part B — Cell-Free Protein Synthesis

B1. Role of each reagent (20 h NMP-Ribose-Glucose mix)

| Component | Role in the reaction |
| --- | --- |
| BL21 (DE3) Star lysate | Source of ribosomes, tRNAs, aminoacyl-tRNA synthetases, and other translation machinery. (DE3) carries T7 RNAP for transcription; Star = reduced RNase E -> mRNA more stable. |
| Potassium glutamate | Dominant monovalent cation, mimics cytoplasmic ionic environment, stabilizes ribosome conformation. Glutamate (vs Cl-) doesn’t inhibit translation. |
| HEPES-KOH pH 7.5 | Zwitterionic buffer holds pH near physiological, which keeps T7 RNAP and ribosomes active. |
| Magnesium glutamate | Mg2+ cofactor for ribosome assembly, T7 RNAP, and aminoacyl-tRNA synthetases. Concentration is highly tunable: too low halts translation, too high promotes misincorporation. |
| K-phosphate mono/dibasic | Secondary pH buffer + phosphate pool for nucleotide kinase reactions (NMP -> NDP -> NTP). |
| Ribose | Feeds the salvage pathway: ribokinase -> ribose-5-P -> PRPP, which combines with free bases to form NMPs. Also a slow-burning energy substrate. |
| Glucose | Carbon source for glycolysis -> continuous ATP regeneration (sustained energy, unlike PEP which burns fast). |
| AMP / CMP / UMP | NMP precursors. Cellular kinases phosphorylate them to NTPs in situ -> slower ramp than feeding NTPs, but cheaper and less product-inhibition. |
| GMP (0 mM here) | Omitted in the 20 h mix; GTP is generated via the Guanine salvage path instead (see bonus). |
| Guanine | Substrate for HGPRT: Guanine + PRPP -> GMP + PPi. Cheaper than buying GMP directly. |
| 17 amino acid mix | Bulk substrate pool for translation (all proteinogenic AAs except Tyr and Cys, which need special handling). |
| Tyrosine (pH 12) | Added separately: Tyr has very low solubility at neutral pH and must be kept in alkaline solution until dilution. |
| Cysteine | Added separately: readily oxidizes to cystine (forms disulfides). Kept in its own tube to avoid inactivation before reaction start. |
| Nicotinamide | NAD+ precursor + inhibitor of NAD-degrading enzymes (NADases) in the lysate -> preserves the redox cofactor pool over the long incubation. |
| Nuclease-free water | Backfill to final volume; nuclease-free to protect the DNA template and mRNA. |

B2. PEP-NTP (1 h) vs NMP-Ribose-Glucose (20 h)

The PEP-NTP mix feeds the reaction with finished NTPs and uses phosphoenolpyruvate as a high-energy ATP regenerator — fast, intense protein synthesis, but PEP is depleted within ~1 hour and the system burns out. The NMP-Ribose-Glucose mix instead supplies precursors (NMPs + ribose for PRPP, glucose for glycolytic ATP) and lets the lysate’s own kinases and salvage enzymes assemble NTPs on demand, giving a slower but sustained ramp that lasts 20+ hours. Cost-per-reaction is also much lower because cheap precursors replace expensive NTPs and PEP.

B3. How can transcription occur without GMP if Guanine is present?

E. coli’s purine salvage pathway rebuilds GMP from free Guanine. Specifically:

Ribose -> (ribokinase) -> Ribose-5-P -> (PRPP synthetase) -> PRPP
Guanine + PRPP -> (HGPRT) -> GMP + PPi
GMP -> (GMP kinase) -> GDP -> (NDP kinase) -> GTP

So as long as ribose is supplied (to make PRPP) and HGPRT is active in the lysate, free Guanine gets converted to GTP at a rate that supports transcription. This is the same logic the cell uses to recycle purines released from RNA turnover — Ginkgo / OpenAI’s paper exploited it to cut reagent cost.


Part C — Planning the Global Experiment

C1. Biophysical / functional property of each FP that matters for cell-free expression

a. sfGFP (superfolder GFP) — Engineered for robust, fast folding (~13 min maturation) even when fused to aggregation-prone partners. Chromophore (Ser-Tyr-Gly cyclization -> dehydration -> oxidation) requires O2, but maturation is fast enough that O2 rarely limits it in 20 uL wells.

b. mRFP1 — Classic monomeric DsRed-derivative. Slow maturation (~1 h) through a GFP-like green intermediate before red, O2-dependent, and acid-sensitive (pKa ~4.5). Maturation kinetics, not synthesis rate, dominate readout over a 36 h incubation.

c. mKO2 — From Fungia concinna. Acid-stable (pKa ~5.5, lower than most FPs) -> fluorescence is preserved as glycolysis acidifies the well over long incubations. Maturation is fast (~7 min). Bright per molecule.

d. mTurquoise2 — Engineered CFP variant with exceptional quantum yield (~0.93) and long fluorescence lifetime -> very bright per folded molecule. Folding is reliable, but it shares the standard O2 dependence and emits in cyan, requiring the right filter set.

e. mScarlet-I — De novo designed monomeric RFP, “I” = improved maturation (~36 min, vs ~3 h for parent mScarlet). Among the brightest monomeric reds. Acid-sensitive (pKa ~5.4) and O2-dependent — maturation rate is the key bottleneck and matches the 36 h incubation window well.

f. Electra2 — Engineered FP designed for improved photostability / brightness. As with most FPs, chromophore maturation is O2-dependent and folding efficiency at 37 C will set how much of the synthesized protein actually fluoresces; if it’s a slower-folding variant, longer incubation favors it.

C2. Hypothesis

Protein: mScarlet-I (red). Reagent change: Increase Mg2+ glutamate from 7 mM to ~10 mM AND incubate with mild orbital shaking instead of static. Expected effect:

  1. Moderately higher Mg2+ improves T7 RNAP processivity and ribosome activity -> more mScarlet-I polypeptide synthesized within the first 6 h.
  2. Gentle shaking keeps dissolved O2 saturated in the 20 uL droplet, accelerating mScarlet-I’s O2-dependent chromophore oxidation -> more of the synthesized polypeptide reaches the fluorescent state by 36 h.
  3. Net: higher endpoint fluorescence at 36 h vs the control mix.

Risks: Too much Mg2+ causes misincorporation and aggregation; shaking can foam the lysate. Sweep Mg2+ in 1 mM increments around 10 mM to find the optimum.

C3. Phase 2 — assigned wells & master mix recipe

Wait for assignment email (by 4/24). Once received, fill in:

  • Assigned well(s): [e.g., row C col 7, mScarlet-I template]
  • Custom 2 uL supplement composition (final concentrations after 1:10 dilution into the 20 uL reaction):
    • [reagent 1: target conc]
    • [reagent 2: target conc]
    • [backfill: nuclease-free water]

Reaction recipe per well (20 uL total):

| Component | Volume |
| --- | --- |
| BL21 (DE3) Star lysate | 6 uL |
| 2x Optimized Master Mix | 10 uL |
| Assigned FP DNA template | 2 uL |
| My custom reagent supplement | 2 uL |
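Because the 2 uL supplement is diluted 1:10 into the 20 uL reaction, its stock has to be 10x the extra concentration wanted in the well. The sketch below works that out. It assumes the baseline values (7 mM Mg2+, 45 mM HEPES) are final concentrations already supplied by the master mix, which is my reading of the hypothesis rather than a confirmed recipe; `supplement_stock_mm` is an illustrative helper.

```python
RXN_UL = 20.0
VOLUMES_UL = {
    "lysate": 6.0,
    "2x master mix": 10.0,
    "DNA template": 2.0,
    "supplement": 2.0,
}

def supplement_stock_mm(target_mm, baseline_mm):
    """Concentration (mM) the supplement needs so the well ends at target_mm.

    Assumes baseline_mm is the final concentration the master mix already provides.
    """
    extra_final_mm = target_mm - baseline_mm
    return extra_final_mm * RXN_UL / VOLUMES_UL["supplement"]

# The four components must add up to the full reaction volume
assert sum(VOLUMES_UL.values()) == RXN_UL

mg_stock = supplement_stock_mm(target_mm=10, baseline_mm=7)      # Mg2+ hypothesis
hepes_stock = supplement_stock_mm(target_mm=80, baseline_mm=45)  # HEPES alternative

print(f"Mg-glutamate supplement stock: {mg_stock:g} mM")
print(f"HEPES supplement stock: {hepes_stock:g} mM")
```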

Week 2 Lab: DNA Gel Art

Design (Benchling)

Designed a face: eyes in lanes 1/3/7, eyebrows in lanes 2/6, nose in lane 5, lips in lane 4.

Benchling face design Digest table screenshot

Digest setup (per lane, 20 µL total)

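| Lane | Enzyme(s) | Water | CutSmart 10× | λ DNA (0.5 µg/µL) | Enzyme volume (1 µL each) |
| --- | --- | --- | --- | --- | --- |
| 1 | PvuII + SalI | 13 µL | 2 µL | 3 µL | 1 + 1 µL |
| 2 | BamHI + XhoI | 13 µL | 2 µL | 3 µL | 1 + 1 µL |
| 3 | PvuII + SalI | 13 µL | 2 µL | 3 µL | 1 + 1 µL |
| 4 | SalI only | 14 µL | 2 µL | 3 µL | 1 µL |
| 5 | NdeI + PvuII | 13 µL | 2 µL | 3 µL | 1 + 1 µL |
| 6 | BamHI + XhoI | 13 µL | 2 µL | 3 µL | 1 + 1 µL |
| 7 | PvuII + SalI | 13 µL | 2 µL | 3 µL | 1 + 1 µL |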

Incubated at 37 °C, 30 min.

Predicted gel (Benchling virtual digest, NEB 2-Log ladder)

Predicted gel pattern


Bench setup

Enzymes, λ DNA, 6× loading dye, and rCutSmart buffer kept on ice. Visible on ice: λ DNA, SacI, KpnI (×2), BamHI, FD PvuII (FastDigest), rCutSmart, Eco32I (= EcoRV), 6× LD.

Enzymes on ice


Gel prep & run

  • 1% agarose: 0.75 g agarose in 75 mL 1× TAE → microwave in 15 s pulses → cool to ~50 °C → 7.5 µL SYBR Safe (10,000×) → pour with 12-well comb → set 30 min.
  • Load: 20 µL digest + 3.33 µL 6× loading dye per well.
  • Run: 80–115 V, ~45 min in 1× TAE on EC-103.
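The gel-prep quantities above follow from three small formulas, sketched here so they can be rescaled for a different gel size. The helper names are illustrative; the last two calculations show what final dye concentration 3.33 µL of 6× dye gives in a 20 µL sample, and how much dye would be needed for exactly 1×.

```python
def agarose_grams(percent_w_v, volume_ml):
    """Grams of agarose for a given % (w/v) gel."""
    return percent_w_v / 100 * volume_ml

def stain_ul(volume_ml, stock_x=10_000):
    """Microlitres of concentrated stain for a 1x final concentration."""
    return volume_ml * 1000 / stock_x

def dye_final_x(sample_ul, dye_ul, dye_stock_x=6):
    """Final loading-dye concentration (in x) after mixing with the sample."""
    return dye_stock_x * dye_ul / (sample_ul + dye_ul)

def dye_ul_for_1x(sample_ul, dye_stock_x=6):
    """Dye volume that brings the mixture to exactly 1x."""
    return sample_ul / (dye_stock_x - 1)

print(f"agarose: {agarose_grams(1, 75):.2f} g")
print(f"SYBR Safe: {stain_ul(75):.1f} uL")
print(f"dye as loaded: {dye_final_x(20, 3.33):.2f}x")
print(f"dye for exactly 1x: {dye_ul_for_1x(20):.1f} uL")
```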

Loading the gel Partner loading Setup ready to run



Actual result

Gel did not match the predicted pattern. Bands were either absent, smeared, or in unexpected positions. The face design was not recoverable.

Actual gel


Failure analysis — most likely causes

  1. Buffer × enzyme mismatch (top suspect). Ice bucket shows FD PvuII (Thermo FastDigest) but digests used rCutSmart (NEB). FastDigest enzymes are optimized for Thermo’s FastDigest buffer; activity in CutSmart is partial. PvuII appears in 4 of 7 lanes (1, 3, 5, 7) — one mismatch collapses most of the design.
  2. NdeI / XhoI not confirmed on the bench. Neither tube visible on ice. If a substitute was grabbed, lanes 2, 5, 6 wouldn’t cut as designed.
  3. Run voltage / time. Smearing and blended rows = classic too-fast / too-long signature. 115 V is on the high end; try 70–90 V.
  4. DNA overload. Each digest contained 1.5 µg of λ DNA (3 µL × 0.5 µg/µL), and the full 20 µL was loaded, so each well received about 1,500 ng, well over the 100 ng/well guideline.
  5. Incomplete digestion. 30 min is the floor; 60 min helps with non-optimal buffers.
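
A quick sketch of the DNA load per well, computed from the digest table values (3 µL of 0.5 µg/µL λ DNA in a 20 µL digest, all 20 µL loaded). The function names are hypothetical, and the 100 ng target is the guideline quoted in the list above.

```python
def ng_loaded_per_well(dna_ul: float, stock_ug_per_ul: float,
                       digest_ul: float, loaded_ul: float) -> float:
    """Nanograms of DNA that end up in one well."""
    total_ng = dna_ul * stock_ug_per_ul * 1000.0  # µg -> ng
    return total_ng * loaded_ul / digest_ul


def stock_ul_for_target(target_ng: float, stock_ug_per_ul: float) -> float:
    """µL of DNA stock that contains target_ng of DNA."""
    return target_ng / (stock_ug_per_ul * 1000.0)


if __name__ == "__main__":
    print(ng_loaded_per_well(3.0, 0.5, 20.0, 20.0))  # ng per well as run
    print(stock_ul_for_target(100.0, 0.5))           # µL stock for 100 ng
```

At 0.5 µg/µL, a 100 ng load corresponds to only 0.2 µL of stock, which is why diluting the λ DNA first makes the pipetting practical.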

Next time — checklist

  • Cross-check enzyme list on the bench before finalizing the Benchling design.
  • If using FastDigest enzymes, use FastDigest buffer, not CutSmart. Don’t mix systems.
  • NEB Double Digest Finder for any 2-enzyme combo.
  • Nanodrop the λ DNA; dilute so each lane loads ≤ 100 ng.
  • Run at 80–90 V for 45–60 min; stop when dye front is ~⅔ down.
  • Photograph the gel with the ladder lane clearly visible.

Week 3 Lab: Opentrons Art

Week 6 Lab: Gibson Assembly

Week 6: Gibson Assembly

Group members: Louisa Zhu, Shitong, Jasmin


Part 1

1. PCR

PCR reaction mixture tables for backbone and color fragments

Figure 1. PCR reaction setup tables for the Backbone DNA Fragment (top) and Color DNA Fragment (bottom), including reagent volumes for a 25 µL total reaction.

We then ran the PCR reaction with the following thermocycler settings:

Thermocycler settings for BB_PCR and COLR_PCR protocols

Figure 2. Thermocycler programs for the backbone PCR (BB_PCR, left) and color fragment PCR (COLR_PCR, right). Both protocols use 26 cycles with a final extension at 72°C for 5 minutes.


2. Gel Electrophoresis

Protocol credit to Louisa:

  1. Take 2 µL of each mixture and transfer into new labeled PCR tubes
  2. Pipette 2 µL of mUAV into a new tube
  3. Add 20 µL of water to each PCR tube
  4. Unpack gel electrophoresis cassette and load into machine
  5. Pipette DNA Ladder into first well
  6. Pipette 20 µL of mixture from each new PCR tube into correct wells (6 full wells total)
  7. Use the automatic setting for 1%, wait 10 minutes

Agarose 1% gel showing DNA bands from PCR products

Figure 3. Agarose 1% gel electrophoresis result showing PCR products. The DNA ladder (M) is in the first lane. Bands are visible for the backbone and color fragments, confirming successful amplification.

After PCR amplification, a 1% agarose gel was run to verify the size and quality of the amplified fragments. As shown in Figure 3, distinct bands appeared in the expected lanes, with the backbone fragment at approximately 1.5 kb and the color fragments (Light Pink, Blue, and Purple) at approximately 500–800 bp. All bands were sharp and well-defined with no visible smearing or secondary bands, indicating specific amplification with minimal off-target products. The absence of bands in the negative control lane indicates that there was no detectable contamination during PCR setup. The observed fragment sizes are consistent with those expected from the primer design and the mUAV template plasmid, supporting that the correct regions were amplified. Both the backbone and color fragment PCR reactions therefore performed as expected, and the products were of sufficient quality to purify and carry forward to Gibson Assembly.


3. DNA Purification and Quantification

  1. Pipette 100 µL of DNA Binding Buffer into a centrifuge tube
  2. Add 20 µL of PCR product
  3. Mix briefly by vortexing
  4. Transfer 120 µL of the mixture into separate columns with a collection tube
  5. Centrifuge for 1 minute
  6. Discard the flowthrough
  7. Add 200 µL of DNA wash buffer to the column
  8. Centrifuge for 1 minute
  9. Repeat the last two steps
  10. Transfer the column to a new tube
  11. Discard flowthrough
  12. Add 6 µL of nuclease-free water to the column matrix
  13. Allow to sit for 2 minutes
  14. Centrifuge for 1 minute
  15. Store and save

Part Two

Materials (Credit to Louisa)

Items used:

  • P1000 pipette with 1000 µL tips
  • P20 pipette with 10 µL tips
  • PCR Tubes

Biological materials:

  • Purified Fragments
  • Gibson Assembly Master Mix
  • Nuclease-Free Water
  • LB-Agar plates with Chloramphenicol
  • SOC Growth Medium
  • DH5α competent cells

Machines used:

  • Thermal Cycler
  • Shaking Incubator
  • Waterbath set to 42°C

Part 1: Setting Up Gibson Assembly

We set up reactions in the proportions shown below for each color fragment, then incubated at 50°C for 30 minutes in a heat block, followed by adding 100 µL of nuclease-free water to dilute each sample.

Reagent | Stock Conc. | Desired Amount / Final Conc. | Volume (µL)
Backbone Fragment | 50 ng/µL | 25 ng | 0.5
Color Fragment (Single) | 50 ng/µL | 50 ng | 1.0
Gibson Assembly Mix | 2X | 1X | 5
Nuclease-free water | | | 3.5
Total Volume | | | 10
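
The volumes in the table follow from the DNA mass wanted divided by the stock concentration, with water as the backfill. A minimal sketch with hypothetical function names. The pmol helper uses the common approximation of 650 g/mol per base pair of double-stranded DNA, and the fragment lengths passed to it are rough sizes read off the gel (about 1.5 kb backbone, 500–800 bp inserts), so any molar ratio it gives is only an estimate.

```python
def gibson_volumes(backbone_ng: float, insert_ng: float,
                   backbone_stock: float, insert_stock: float,
                   total_ul: float = 10.0, mix_stock_x: float = 2.0) -> dict:
    """Volumes (µL) for one Gibson reaction; stocks are in ng/µL."""
    backbone_ul = backbone_ng / backbone_stock
    insert_ul = insert_ng / insert_stock
    mix_ul = total_ul / mix_stock_x              # 2x mix diluted to 1x
    water_ul = total_ul - backbone_ul - insert_ul - mix_ul
    if water_ul < 0:
        raise ValueError("DNA volumes do not fit in the reaction")
    return {"backbone": backbone_ul, "insert": insert_ul,
            "mix": mix_ul, "water": water_ul}


def dsdna_pmol(mass_ng: float, length_bp: float) -> float:
    """Approximate pmol of dsDNA, assuming 650 g/mol per base pair."""
    return mass_ng * 1000.0 / (length_bp * 650.0)


if __name__ == "__main__":
    print(gibson_volumes(25.0, 50.0, 50.0, 50.0))
    # Assumed sizes from the gel, not sequence-verified lengths.
    ratio = dsdna_pmol(50.0, 650.0) / dsdna_pmol(25.0, 1500.0)
    print(f"approximate insert:backbone molar ratio = {ratio:.1f}")
```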

Part 2: Transformation

  1. Transfer 20 µL of competent cells to each tube
  2. Transfer purified assembly products into each tube (8 total: 3 Light Pink, 3 Blue, 3 Purple)
  3. Incubate on ice for 30 minutes

Labeled PCR tubes incubating on ice during transformation

Figure 4. Tubes incubating on ice during the transformation step. Each tube is labeled by color and sample number.

  4. Heat shock the cells at 42°C for 45 seconds immediately after the ice bath
  5. Add 100 µL of SOC media to each tube
  6. Allow growth in a shaking incubator for 1 hour
  7. Transfer 100 µL from each tube to the appropriate plate and spread using plating beads or a plastic spreader
  8. Incubate plates at 37°C for 72 hours

Part 3. Results

Plate Colony Result

Figure 5 (A-H). LB-Agar plates with Chloramphenicol selection showing colony growth after transformation. Plates were labeled by color fragment condition: Blue (B), Light Pink (LP), and Purple (Pu), at varying dilutions (subject to correction with further observation). Interestingly, all colonies grew out purple-blue regardless of which color fragment was used. This may be because the insert DNA was not incorporated at the right ratio relative to the backbone, so cells expressed the backbone’s default color instead.

Week 7 Lab: Neuromorphic Circuits

Overview

In this lab, we designed two neuromorphic genetic circuits using the HTGAA 2026 Genetic Circuit Design Template and simulated their behavior using the Biocompiler-Predict tool. Both circuits are built from endoribonuclease-based sequestrons — the fundamental building blocks of intracellular artificial neural networks (IANNs) — and are intended for transfection into HEK293 cells via Lipofectamine 3000 and execution by an OT-2 liquid handling robot.

Key components used:

  • Csy4 — a CRISPR endoribonuclease that cleaves mRNA at its recognition sequence
  • CasE (EcoCas6e) — a second orthogonal endoribonuclease for independent mRNA cleavage
  • PgU — a constitutive expression construct
  • mNeonGreen, mKO2, eBFP2 — fluorescent protein reporters (green, orange, blue)
  • _rec_ notation indicates a recognition site (e.g., Csy4_rec_mNeonGreen = mNeonGreen mRNA with a Csy4 cleavage site)

Circuit 1: “MyCircuit” (L-shape response)

Design rationale

This circuit implements a single-layer perceptron where two inputs (X₁ and X₂) each produce an endoribonuclease that negatively regulates a shared fluorescent output. The goal was to achieve an L-shaped dose–response surface: the output (mNeonGreen) should be high only when both inputs are low.

Analogy: Think of it like a bathtub with two drains. If either drain is open (high X₁ or high X₂), water runs out and the tub level drops. The tub stays full only when both drains are closed.

Circuit design table

Circuit name | Transfection group | Contents | Concentration (ng/µL) | DNA wanted (ng)
MyCircuit | X1 | Csy4 | 40 | 150
MyCircuit | X1 | mKO2 | 50 | 100
MyCircuit | X2 | CasE | 50 | 150
MyCircuit | X2 | eBFP2 | 50 | 100
MyCircuit | bias_output_csy4 | Csy4_rec_mNeonGreen | 50 | 100
MyCircuit | bias_output_case | CasE_rec_mNeonGreen | 50 | 100

Total DNA: 700 ng
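
Each row of the design table implies a pipetting volume: DNA wanted (ng) divided by stock concentration (ng/µL). A short sketch that recomputes those volumes and the 700 ng total from the table values; the variable and function names are hypothetical and this is not the template's own calculation.

```python
# (transfection group, contents, concentration ng/µL, DNA wanted ng)
MYCIRCUIT = [
    ("X1", "Csy4", 40.0, 150.0),
    ("X1", "mKO2", 50.0, 100.0),
    ("X2", "CasE", 50.0, 150.0),
    ("X2", "eBFP2", 50.0, 100.0),
    ("bias_output_csy4", "Csy4_rec_mNeonGreen", 50.0, 100.0),
    ("bias_output_case", "CasE_rec_mNeonGreen", 50.0, 100.0),
]


def pipetting_volumes(parts):
    """Map each part name to the stock volume (µL) that delivers its ng."""
    return {name: ng / conc for _, name, conc, ng in parts}


def total_dna_ng(parts):
    """Sum of the DNA wanted across all parts."""
    return sum(ng for _, _, _, ng in parts)


if __name__ == "__main__":
    for name, ul in pipetting_volumes(MYCIRCUIT).items():
        print(f"{name}: {ul:.2f} µL")
    print("total DNA:", total_dna_ng(MYCIRCUIT), "ng")
```

The 40 ng/µL Csy4 stock is the only part that does not come out to a whole-number volume.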

How it works

  1. X₁ input delivers Csy4 endoribonuclease DNA (150 ng) along with mKO2 (orange fluorescent protein, 100 ng) as a transfection marker to verify X₁ delivery.
  2. X₂ input delivers CasE endoribonuclease DNA (150 ng) along with eBFP2 (blue fluorescent protein, 100 ng) as a transfection marker for X₂.
  3. Output layer consists of two mNeonGreen constructs, one carrying a Csy4 recognition site (Csy4_rec_mNeonGreen, 100 ng) and one carrying a CasE recognition site (CasE_rec_mNeonGreen, 100 ng). Each endoribonuclease cleaves the transcript that carries its own site, so the two inputs reduce the output independently.

When X₁ is high → more Csy4 is produced → more mNeonGreen mRNA is cleaved → output decreases. When X₂ is high → more CasE is produced → more mNeonGreen mRNA is cleaved → output decreases. When both are low → minimal cleavage → mNeonGreen output is maximal.
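
To make the logic above concrete, here is a toy model of the NOR-like response. It is my own illustration, not the Biocompiler-Predict model: it treats the two reporter plasmids as equal pools, each knocked down by one endoribonuclease with a simple saturating term. The constants `k1` and `k2` are hypothetical and chosen only so that X₁ suppresses more strongly than X₂.

```python
def mneongreen_output(x1: float, x2: float,
                      k1: float = 0.5, k2: float = 2.0) -> float:
    """Toy normalized output; x1 and x2 are relative input doses (>= 0)."""
    # Half of the reporter carries a Csy4 site and is cleaved when X1 is high.
    csy4_pool = 0.5 / (1.0 + x1 / k1)
    # The other half carries a CasE site and is cleaved when X2 is high.
    case_pool = 0.5 / (1.0 + x2 / k2)
    return csy4_pool + case_pool


if __name__ == "__main__":
    doses = [0.0, 0.5, 1.0, 2.0]
    for x2 in reversed(doses):
        row = " ".join(f"{mneongreen_output(x1, x2):.2f}" for x1 in doses)
        print(f"X2={x2:<4} {row}")
```

In this sketch the output is maximal only at the corner where both inputs are zero and falls off faster along X₁ than along X₂, which is the qualitative shape described above.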

Predicted behavior

MyCircuit heatmap showing L-shaped prediction with high output at low X1 values

Figure 1: Biocompiler-Predict simulation of MyCircuit. The heatmap shows the predicted mNeonGreen output (Prediction Value) as a function of X₁ and X₂ concentrations. High output (dark blue, ~0.65–0.70) is concentrated along the left edge where X₁ is low. The L-shaped pattern confirms that the circuit acts as an approximate NOR-like function: output is highest when inputs are minimal.

Interpretation

The simulation reveals that X₁ (Csy4) has a stronger suppressive effect on the output than X₂ (CasE), as evidenced by the sharp drop-off along the X₁ axis compared to a more gradual decline along X₂. This asymmetry likely reflects differences in the catalytic efficiency and binding affinity of Csy4 versus CasE for their respective recognition sequences on the mNeonGreen mRNA. The L-shaped pattern is consistent with a weighted NOR gate where the X₁ weight is larger than the X₂ weight.


Circuit 2: “RF” (Rectified function)

Design rationale

This circuit implements a more complex multilayer architecture with cross-regulation between endoribonucleases. The goal was to achieve a rectified function — an output that increases monotonically with X₁ while remaining relatively insensitive to X₂, similar to a ReLU (rectified linear unit) activation function in machine learning.

Analogy: Imagine a volume knob (X₁) that smoothly turns up the music, while a second knob (X₂) has little effect because its signal gets cancelled out by internal feedback. The circuit “learns” to listen to one input and ignore the other.

Circuit design table

Circuit name | Transfection group | Contents | Concentration (ng/µL) | DNA wanted (ng)
RF | X1 | CasE | 50 | 100
RF | X2 | Csy4 | 50 | 100
RF | Bias | PgU | 50 | 100
RF | Bias | CasE_rec_Csy4 | 50 | 75
RF | Bias | Csy4_rec_CasE | 50 | 75
RF | Bias | PgU_rec_CasE | 50 | 75
RF | Bias | PgU_rec_Csy4 | 50 | 75
RF | X1 | CasE_rec_Csy4_rec_mKO2 | 50 | 50
RF | X2 | Csy4_rec_mNeonGreen | 50 | 50

Total DNA: 700 ng

How it works

This is a multilayer circuit with cross-inhibition between the two endoribonucleases:

  1. X₁ input delivers CasE (100 ng) and a reporter CasE_rec_Csy4_rec_mKO2 (50 ng) — an mKO2 mRNA that can be cleaved by both CasE and Csy4, acting as a dual-regulated node.
  2. X₂ input delivers Csy4 (100 ng) and Csy4_rec_mNeonGreen (50 ng) — mNeonGreen output that is negatively regulated by Csy4.
  3. Bias layer creates a rich cross-regulatory network:
    • PgU (100 ng) — constitutive expression baseline
    • CasE_rec_Csy4 (75 ng) — Csy4 mRNA with a CasE recognition site (CasE cleaves Csy4 mRNA)
    • Csy4_rec_CasE (75 ng) — CasE mRNA with a Csy4 recognition site (Csy4 cleaves CasE mRNA)
    • PgU_rec_CasE (75 ng) — constitutive mRNA regulated by CasE
    • PgU_rec_Csy4 (75 ng) — constitutive mRNA regulated by Csy4

The cross-inhibition (CasE_rec_Csy4 and Csy4_rec_CasE) creates a mutual-inhibition loop between the two endoribonucleases, which acts as net positive feedback on whichever one is ahead. This effectively implements a winner-take-all competition: when X₁ drives CasE production, CasE degrades Csy4 mRNA, further reducing Csy4 levels and amplifying the X₁ signal. The result is a rectified response that primarily follows X₁.
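
The mutual-inhibition idea can be illustrated with a two-variable toy model. This is a sketch under strong assumptions and not the Biocompiler-Predict model: each endoribonuclease is produced at a rate set by its input plus a shared bias, production is divided by a saturating term in the other nuclease, and both decay at unit rate. All parameter values are hypothetical.

```python
def steady_state(x1: float, x2: float, bias: float = 0.5, k: float = 1.0,
                 dt: float = 0.05, steps: int = 4000) -> tuple:
    """Euler-integrate the toy model and return (CasE, Csy4) levels."""
    case = 0.0  # CasE, driven by X1 in the RF design
    csy4 = 0.0  # Csy4, driven by X2 in the RF design
    for _ in range(steps):
        d_case = (x1 + bias) / (1.0 + csy4 / k) - case  # Csy4 cuts CasE mRNA
        d_csy4 = (x2 + bias) / (1.0 + case / k) - csy4  # CasE cuts Csy4 mRNA
        case += dt * d_case
        csy4 += dt * d_csy4
    return case, csy4


if __name__ == "__main__":
    for x1 in (0.5, 1.0, 2.0):
        case, csy4 = steady_state(x1, x2=1.0)
        print(f"X1={x1}: CasE={case:.3f}, Csy4={csy4:.3f}")
```

Raising X₁ in this sketch increases CasE and simultaneously pushes Csy4 down, which is the amplification effect described above. Without cooperativity the toy model stays graded rather than switching sharply.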

Predicted behavior

RF heatmap showing gradient output increasing with X1

Figure 2: Biocompiler-Predict simulation of the RF circuit. The heatmap shows a smooth left-to-right gradient where output increases monotonically with X₁ (left = low, right = high). The output ranges from ~0.05 (white, low X₁) to ~0.55 (dark blue, high X₁). The response is largely independent of X₂, consistent with the intended rectified function behavior.

Interpretation

The RF circuit successfully achieves a unidirectional dose–response: output scales with X₁ concentration while remaining approximately flat across X₂ values. This behavior arises from the mutual antagonism between Csy4 and CasE in the bias layer. When X₁ increases CasE levels, CasE degrades the CasE_rec_Csy4 mRNA (reducing Csy4 production), which in turn reduces degradation of CasE mRNA, a positive feedback amplification of the X₁ signal. Meanwhile, X₂-driven Csy4 is counteracted by CasE from both the X₁ input and the bias layer, preventing X₂ from significantly influencing the output.

The smooth gradient (rather than a sharp threshold) reflects the analog nature of the IANN — the circuit computes a continuous function rather than a binary switch.


Comparison of the two circuits

Feature | MyCircuit (L-shape) | RF (Rectified function)
Architecture | Single-layer, two independent inhibitors | Multilayer with cross-inhibition
Number of parts | 6 | 9
Total DNA | 700 ng | 700 ng
Output reporter | mNeonGreen | mKO2 / mNeonGreen
Input-output behavior | NOR-like: high when both inputs low | ReLU-like: scales with X₁, ignores X₂
Key design feature | Independent cleavage of shared output | Mutual antagonism creates winner-take-all
Predicted dynamic range | ~0.30 – 0.70 | ~0.05 – 0.55

Methods

Circuit design (Day 1)

  1. Circuits were designed using the HTGAA 2026 Genetic Circuit Design Template (Google Sheet).
  2. Part names followed the conventions in the HTGAA 2026 Genetic Circuit Part Names list.
  3. All concentrations were set to 50 ng/µL (with one exception: Csy4 in MyCircuit at 40 ng/µL).
  4. Circuit behavior was simulated using the Biocompiler-Predict tool, which generates heatmaps of predicted output across the X₁–X₂ input space.
  5. Completed spreadsheets were uploaded via the Google Form submission.

Transfection and imaging (Day 2)

  1. HEK293 cells were transfected using Lipofectamine 3000 with the designed plasmid mixes.
  2. An OT-2 liquid handling robot in the Weiss Lab (NE-47, MIT campus) executed the transfection protocol based on our uploaded spreadsheet.
  3. Fluorescence readout of mNeonGreen, mKO2, and eBFP2 will be measured after 24–48 hours of incubation.

Key takeaways

  • Analog beats digital: Both circuits produce continuous, graded outputs rather than binary on/off responses, illustrating the analog behavior that sets IANNs apart from traditional Boolean genetic circuits.
  • Weight tuning via DNA dosage: The behavior of each circuit was tuned entirely by adjusting the nanogram amounts of each plasmid. No new genetic parts were needed — only different ratios of the same library components.
  • Cross-inhibition enables complex functions: The RF circuit shows that mutual antagonism between endoribonucleases can create winner-take-all dynamics, allowing one input to dominate. This mutual inhibition is analogous to lateral inhibition in neural circuits.
  • Simulation before wet lab: The Biocompiler-Predict tool allowed us to iterate on circuit designs computationally before committing to expensive and time-consuming wet lab experiments.

Projects

Final projects:

  • A cell-free synthetic biology pee pad that detects glucose in canine urine via an engineered P_oxyR–mCherry circuit for early diabetes screening.
May 11, 2026

Subsections of Projects

Group Final Project

cover image

LumiPaws

LumiPaws engineers an OxyR-responsive genetic circuit into a freeze-dried paper substrate that detects urinary glucose through visual fluorescence. Urine testing is the foundation of preventive care for pets as a way to shift from waiting until the animal is already sick to spotting trouble before it starts.

Yutong Wu · Wellesley College ‘26 · yw108@mit.edu


LumiPaws PeePad

01 · Overview

Canine diabetes affects approximately 1 in 300 dogs, and most diagnoses arrive late: after weight loss, excessive thirst, or ketoacidosis bring the family to a clinic. There is no accessible at-home screening tool. Veterinary blood glucose tests require a visit, and pet owners have no early warning system.

LumiPaws turns the everyday pee pad into a diagnostic. A freeze-dried cell-free reaction sits in the absorbent core of a six-layer pad; when a diabetic dog urinates on it, glucose is enzymatically converted to hydrogen peroxide, which activates an engineered OxyR transcription factor and drives mCherry expression, producing a red signal visible to the owner.

Why red? Urine is naturally yellow, which is a problem for any color-based test, since most signals blend right into the background. LumiPaws uses mCherry, a red fluorescent protein, precisely because red cuts cleanly through yellow rather than getting lost in it. The signal is readable under a focused light at 587 nm, mCherry’s excitation peak, and every kit ships with a small handheld torch tuned to that exact wavelength.

Value | Metric | Context
1 : 300 | Dogs with diabetes | Rising rapidly over the last decade
> 100 mg/dL | Clinical glucosuria threshold | Renal threshold ~180–220 mg/dL serum
~ $0 | Equipment to use | Visual fluorescence; no hardware

Cascade diagram. Insert: Glucose → GOx → H₂O₂ → OxyR → mCherry signal flow


02 · Project Aims

Aim I — Circuit Validation

Design and validate a P_oxyR-mCherry cell-free system responsive to glucose via the glucose oxidase pathway. Quantify dose-response across H₂O₂ and glucose concentrations in tube and on paper.

Aim II — Cell-Free Integration

Optimize circuit sensitivity through iterative promoter design and freeze-dry the validated reaction onto a paper-based three-layer detection platform suitable for urine-triggered activation.

Aim III — Multi-Biomarker Platform

Establish a versatile cell-free biosensor platform expandable to detect urinary biomarkers including urea, ketones, proteins, and infectious agents for comprehensive veterinary and human diagnostics.


03 · Circuit Design

Two rounds of iterative engineering.

Round 01 — Native P_oxyR (minimum viable circuit)

Concept. First iteration used the native E. coli ahpC promoter with OxyR’s endogenous binding site driving mCherry directly. A minimal architecture intended to test whether OxyR-dependent activation could be detected in a cell-free extract.

Architecture. EcoRI → P_ahpC → RBS B0034 → mCherry → rrnB T1 → HindIII. Total length ~1.2 kb. Linear gene fragment ordered from Twist.

Outcome. Worked in vivo but produced negligible signal in cell-free — the native σ⁷⁰-dependent promoter is poorly transcribed by the T7-rich CFPS systems. This motivated Round 2.

Round 1 construct map

Round 02 — T7–OxyR hybrid promoter (current)

Concept. Engineered a hybrid promoter fusing the T7 core sequence with the OxyR operator from ahpC. T7 RNAP provides high-yield transcription; OxyR gates that activity in an H₂O₂-dependent manner. Two cassettes on one template: constitutive OxyR expression upstream, T7–OxyR gated mCherry downstream.

Architecture. EcoRI → T7 → RBS → OxyR → rrnB T1 → insulator → T7-OxyR hybrid promoter → RBS → mCherry → Tphi → HindIII. Two terminator variants used to eliminate sequence repeats. Total length 1,903 bp.

Variants ordered. Two gene fragments: v1 (5 bp spacer) and v2 (3 bp spacer) between T7 core and OxyR operator — to identify the optimal spatial geometry for transcriptional regulation.

Round 2 construct map. Insert: T7–OxyR hybrid construct map


04 · Parts & Components

Final assembled gene fragment, 5’ to 3’. Synthesized as Gene Fragment (Adapters OFF) for direct use in cell-free reactions with GamS protein.

Element | Source | Function | Size
EcoRI flank | Synthetic | Cloning site / restriction handle | 6 bp
T7 promoter | T7 bacteriophage | Drives OxyR expression in CFPS | 17 bp
RBS B0034 | iGEM Registry | Strong ribosome binding site | 18 bp
OxyR CDS | E. coli K-12 MG1655 | H₂O₂-sensing transcription factor | 918 bp
rrnB T1 terminator | BBa_B0010 | Terminates cassette 1 transcription | 80 bp
Insulator spacer | Synthetic neutral | Prevents read-through | 22 bp
T7–OxyR hybrid | Engineered (this work) | Gated transcriptional element | ~59 bp
RBS B0034 | iGEM Registry | RBS for mCherry | 18 bp
mCherry CDS | Codon-optimized (IDT) | Red fluorescent reporter | 708 bp
Tphi terminator | T7 bacteriophage | Terminates cassette 2 transcription | 48 bp
HindIII flank | Synthetic | Cloning site / restriction handle | 6 bp
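
As a sanity check on the parts list, the sizes in the table can be summed and compared with the stated 1,903 bp construct. This is a sketch using only the table values. The hybrid promoter is listed as approximately 59 bp, so a difference of a few base pairs from the stated total is expected (the two variants also differ in spacer length).

```python
# (element, size in bp) in 5' to 3' order, taken from the parts table.
PARTS = [
    ("EcoRI flank", 6),
    ("T7 promoter", 17),
    ("RBS B0034 (OxyR)", 18),
    ("OxyR CDS", 918),
    ("rrnB T1 terminator", 80),
    ("Insulator spacer", 22),
    ("T7-OxyR hybrid", 59),      # listed as ~59 bp, so approximate
    ("RBS B0034 (mCherry)", 18),
    ("mCherry CDS", 708),
    ("Tphi terminator", 48),
    ("HindIII flank", 6),
]
STATED_TOTAL_BP = 1903


def total_length(parts) -> int:
    """Sum of part sizes in bp."""
    return sum(size for _, size in parts)


def cds_in_frame(parts) -> bool:
    """True if every part named '... CDS' has a length divisible by 3."""
    return all(size % 3 == 0 for name, size in parts if name.endswith("CDS"))


if __name__ == "__main__":
    total = total_length(PARTS)
    print("summed length:", total, "bp")
    print("difference from stated total:", STATED_TOTAL_BP - total, "bp")
    print("coding sequences in frame:", cds_in_frame(PARTS))
```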

05 · Methods

📄 → View the full experiment protocols (Google Doc)

01 — Construct Design. DNA design and annotation in Benchling. Parts sourced from NCBI (OxyR Gene ID 948462), iGEM Registry (RBS, terminators), and IDT CodonOpt for E. coli-optimized mCherry.

02 — Gene Synthesis. Twist Bioscience Gene Fragments (Adapters OFF), 1,903 bp linear DNA. Two spacer variants synthesized in parallel to triangulate optimal T7–OxyR geometry.

03 — Cell-Free Expression. Ginkgo CFPS Economy Kit (primary) and NEB PURExpress (backup). GamS protein added to protect linear DNA from exonuclease activity in crude extracts.

04 — Enzymatic Conversion. Glucose oxidase from Aspergillus niger (Sigma G7141) co-lyophilized with the cell-free reaction. Converts urinary glucose into H₂O₂ via molecular oxygen.

05 — Fluorescence Readout. mCherry signal quantified on a plate reader (ex 569 nm, em 610 nm) with kinetic measurements every 10 minutes over 4–6 hours at 30 °C. Dose-response curves fit to extract EC50 and dynamic range.

06 — Paper Integration. Optimized reactions freeze-dried onto Whatman #1 filter paper discs and integrated into a six-layer pad architecture: spunbond top → distribution → biology core → SAP → PE waterproofing → base.
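
For the dose-response fitting mentioned in step 05, here is a minimal sketch of extracting EC50 and fold induction with a Hill curve. It is an illustration under stated assumptions, not the analysis pipeline actually used: the Hill coefficient is fixed at 1, the baseline is taken from the lowest signal, and the fit is a plain grid search so that no packages beyond the standard library are needed. The example data are synthetic.

```python
def hill(dose: float, bottom: float, top: float, ec50: float,
         n: float = 1.0) -> float:
    """Hill activation curve; dose and ec50 share units (e.g. µM H2O2)."""
    if dose <= 0:
        return bottom
    return bottom + (top - bottom) * dose**n / (ec50**n + dose**n)


def fit_hill(doses, signals, ec50_grid, top_grid) -> dict:
    """Least-squares grid search over EC50 and top plateau."""
    bottom = min(signals)  # assumption: baseline equals the lowest reading
    best = None
    for top in top_grid:
        for ec50 in ec50_grid:
            sse = sum((hill(d, bottom, top, ec50) - s) ** 2
                      for d, s in zip(doses, signals))
            if best is None or sse < best[0]:
                best = (sse, ec50, top)
    _, ec50, top = best
    return {"ec50": ec50, "top": top, "bottom": bottom, "fold": top / bottom}


if __name__ == "__main__":
    # Synthetic, noiseless example data (not measurements).
    doses = [0, 25, 50, 100, 200, 400, 800, 1600]
    signals = [hill(d, 100.0, 1000.0, 200.0) for d in doses]
    print(fit_hill(doses, signals, range(50, 1001, 50), range(500, 1501, 100)))
```

With real plate-reader data a proper nonlinear fit with a free Hill coefficient would be more appropriate; the grid search only shows what the two extracted numbers mean.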


06 · Results & Expectations

What success looks like.

Dose-response curve. Insert: mCherry fluorescence vs. H₂O₂ concentration across both variants

Quantitative targets

Validation of the T7–OxyR hybrid promoter in cell-free reactions across an H₂O₂ titration. Success at this stage unlocks the full glucose cascade and paper integration.

Key | Description | Target
LOD | Limit of detection | < 50 µM H₂O₂
EC50 | Half-maximal activation | 100 – 500 µM
Range | Dynamic range, fold induction | 5 – 15 ×
Time | Time to detectable signal | 1 – 2 hr
CV | Replicate variability | < 25 %
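
Two of these targets can be computed directly from replicate wells. The sketch below assumes a common convention that is not stated in the project plan: the limit of detection is the lowest tested dose whose mean signal exceeds the blank mean plus three blank standard deviations. Function names and the example numbers are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(replicates) -> float:
    """Coefficient of variation of replicate readings, in percent."""
    return 100.0 * stdev(replicates) / mean(replicates)


def detection_threshold(blanks, k: float = 3.0) -> float:
    """Signal threshold: blank mean plus k blank standard deviations."""
    return mean(blanks) + k * stdev(blanks)


def limit_of_detection(dose_to_replicates: dict, blanks):
    """Lowest tested dose whose mean signal exceeds the threshold."""
    threshold = detection_threshold(blanks)
    for dose in sorted(dose_to_replicates):
        if mean(dose_to_replicates[dose]) > threshold:
            return dose
    return None  # nothing tested was distinguishable from blank


if __name__ == "__main__":
    blanks = [100, 110, 90]                      # hypothetical readings
    data = {25: [120, 125, 115], 50: [150, 160, 155]}
    print("blank CV %:", cv_percent(blanks))
    print("threshold:", detection_threshold(blanks))
    print("LOD (µM):", limit_of_detection(data, blanks))
```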

07 · Project Timeline

From design to deployment.

  • Phase 01 — Construct design in Benchling. Two T7–OxyR hybrid promoter variants assembled with non-repeating terminators. Sequence verified, EcoRI/HindIII flanks added.
  • Phase 02 — Gene fragment ordered from Twist. Both variants submitted as Gene Fragment (Adapters OFF), 1,903 bp linear DNA each. Co-shipped via MIT/Harvard institutional batch.
  • Phase 03 — Cell-free reagents acquired. Ginkgo CFPS Economy Kit, GamS protein, glucose oxidase, and H₂O₂ standards in hand. Plate reader access confirmed.
  • 🔵 Phase 04 — Pilot run + H₂O₂ dose-response (Active). Friday: pilot with mScarlet positive control and three H₂O₂ conditions. Saturday: 24-well replicate across full H₂O₂ titration with both variants.
  • Phase 05 — Full glucose cascade. Add glucose oxidase + glucose titration (0–1000 mg/dL). Confirm end-to-end glucose → H₂O₂ → mCherry signal in tube.
  • Phase 06 — Paper prototype. Freeze-dry optimized reaction onto Whatman discs. Test rehydration with synthetic urine + glucose. Iterate on stability and signal time.
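
For planning the glucose titration in Phase 05, it helps to convert clinical units (mg/dL) into molar units that can be compared with the H₂O₂ targets. This sketch uses the molar mass of glucose (180.16 g/mol) and the 1:1 stoichiometry of the glucose oxidase reaction. The H₂O₂ figure is therefore an upper bound that assumes complete conversion and no H₂O₂ loss, which a real reaction will not reach.

```python
GLUCOSE_G_PER_MOL = 180.16


def mg_dl_to_mM(mg_per_dl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    mg_per_l = mg_per_dl * 10.0          # 1 dL = 0.1 L
    return mg_per_l / GLUCOSE_G_PER_MOL  # mg/L divided by g/mol gives mmol/L


def max_h2o2_uM(glucose_mg_dl: float) -> float:
    """Upper bound on H2O2 (µM) if every glucose yields one H2O2."""
    return mg_dl_to_mM(glucose_mg_dl) * 1000.0


if __name__ == "__main__":
    for level in (100, 250, 500, 1000):
        print(f"{level} mg/dL = {mg_dl_to_mM(level):.2f} mM glucose, "
              f"at most {max_h2o2_uM(level):.0f} µM H2O2")
```

By this estimate the 100 mg/dL glucosuria threshold corresponds to roughly 5.6 mM glucose, well above the 100–500 µM EC50 target, so partial conversion may already be enough to trigger the circuit.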

08 · Future Vision

Your pet can’t talk, but their urine can.

LumiPaws is a platform, not a single product. The cell-free system at the heart of the pad can be re-engineered to detect a growing range of urinary biomarkers — and the format can expand from dog pee pads to cat litter, creating a unified at-home diagnostic ecosystem for companion animals.

🔵 Kidney Function — Urea + Creatinine

Cell-free expression of urease and creatininase, paired with engineered ammonia- or pH-responsive transcription factors, can flag early kidney disease — one of the most common causes of mortality in older dogs and cats.

🟡 Diabetes — Glucose + Ketones

Adding β-hydroxybutyrate dehydrogenase to the same cell-free chassis extends the diabetes panel from glucose alone to diabetic ketoacidosis, a life-threatening complication that requires immediate veterinary care.

🩷 Pathogen DNA Detection

Toehold switches and CRISPR-Cas12a-based reporters can be freeze-dried alongside the OxyR circuit to detect bacterial or viral DNA in urine — screening for urinary tract infections, leptospirosis, and other pathogens without ever culturing a sample.

🟢 From Pads to Litter

Cats use litter, not pads. The same cell-free chemistry can be embedded into color-changing litter substrates, giving cat owners the same early-warning system. One platform, two delivery formats, every household.

The platform thesis

Every cell-free reaction we engineer becomes a reusable module. Swap the enzyme, swap the promoter operator, swap the reporter color — and the same freeze-dried paper substrate becomes a different test. LumiPaws starts with glucose because the chemistry is well-characterized, but the roadmap is multiplexed urinalysis at home: kidney, liver, pancreas, infection, hydration — read from a single sample, every day, by the people who already pay closest attention to their pets.


Acknowledgments

HTGAA 2026 · Twist Bioscience · Ginkgo Bioworks · NEB · Sigma-Aldrich


© 2026 Yutong Wu · HTGAA Final Project · Wellesley College ‘26