1. Introduction
With a rather limited background in synthetic biology and bioengineering, my initial interest is in closed-loop controllers: autonomous systems that adjust their output to the surrounding environment.
I am also interested in bidirectional communication along the gut-brain axis (GBA). I want to explore engineering a gut bacterium with a synthetic genetic circuit that detects biomarkers in the gut and conditionally produces neuroactive compounds that modulate brain activity via the GBA.
The circuit should ideally consist of a sensor module, a processing module, and a response module. The logic is as follows:
Inflammation detected → threshold exceeded → produce calming molecules → inflammation decreases → production shuts off.
This idea differs from open-loop, stress-relieving gummies and pills in that it is a self-regulating therapeutic: it produces compounds at the site where the gut-brain signaling infrastructure exists, and only when the stress/inflammation biomarker exceeds a set threshold.
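The sensor → threshold → response loop described above can be sketched as a toy discrete-time simulation. This is only an illustration of the control logic, not a model of any real circuit: the function name and every numeric parameter (threshold, calming effect, baseline drift) are hypothetical placeholders.

```python
def simulate_closed_loop(initial_inflammation, threshold, steps=20,
                         calming_effect=0.3, baseline_drift=0.05):
    """Toy closed-loop controller: produce only while the biomarker exceeds the threshold.

    All units and rates are arbitrary, hypothetical values chosen for illustration.
    """
    inflammation = initial_inflammation
    history = []
    for _ in range(steps):
        # Sensor + processing module: compare the biomarker against the threshold.
        producing = inflammation > threshold
        if producing:
            # Response module: calming molecules lower the inflammation signal.
            inflammation -= calming_effect
        else:
            # With production off, the signal slowly drifts back up.
            inflammation += baseline_drift
        inflammation = max(inflammation, 0.0)
        history.append((inflammation, producing))
    return history


history = simulate_closed_loop(initial_inflammation=2.0, threshold=1.0)
for step, (level, producing) in enumerate(history):
    print(f"step {step:2d}  inflammation={level:.2f}  producing={producing}")
```

In this sketch production switches on at the start, switches off once the signal falls below the threshold, and resumes only if the signal climbs back above it, which is the self-regulating behavior an open-loop pill lacks.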
2. Governance Goals
The overarching goal is non-maleficence (preventing harm).
The technology involves releasing a genetically engineered organism into the human body, and potentially into the broader environment. Harm prevention and Dual Use Research of Concern (DURC) considerations are therefore indispensable and should be addressed at multiple scales.
SubGoal 1A: Preventing Uncontrolled Spread and Ecological Contamination
The engineered microbe must not persist beyond its therapeutic window: it must not spread to unintended hosts or transfer its synthetic genes to wild microbial populations via the following possible routes:
Horizontal gene transfer (HGT): Synthetic circuit components (especially antibiotic resistance markers used in cloning) could transfer to pathogenic gut bacteria.
Environmental shedding: Engineered bacteria will be excreted and enter wastewater and soil ecosystems.
Mutation: The organism could evolve and mutate over time to the point where the original means of control no longer work, or it could gain unintended functions.
The closed-loop circuit must not overproduce compounds that trigger immune reactions within the body or interfere with the existing microbiome in unintended ways, such as:
Overproduction toxicity: A sensor that is too sensitive or a failed threshold filter could flood the gut with GABA/serotonin precursors.
Immune overactivation: The engineered organism might trigger inflammatory responses, paradoxically worsening the target condition.
Microbiome disruption: The engineered organism at therapeutic densities could outcompete native beneficial bacteria.
SubGoal 1C: Informed Consent
Governance must address who gets access and whether patients can meaningfully consent to hosting a living engineered organism, since the commitment is larger than taking a single pill.
3. Potential Actions
Three potential governance actions are considered below, incorporating 1) Purpose, 2) Design, 3) Assumptions, and 4) Risk of Failure and “Success”.
Governance Action 1: Comprehensive policy framework and clear assignment of roles among different actors
Purpose: Work that turns living organisms into biotherapeutic products usually falls under the FDA's established CBER framework. However, there are no detailed requirements or regulations addressing the closed-loop nature of the synthetic circuit, i.e. how control over it should be demonstrated in ways that distinguish it from open-loop products.
Design: Given the participation of various actors, when the FDA issues guidance, academic labs should design and provide corresponding biocontainment tools, while biotech companies comply and absorb testing costs. Research agencies should then standardize biocontainment toolkits to lower barriers for smaller labs. Cross-agency coordination with environmental protection agencies (e.g. the EPA) may be needed.
Assumptions
Effective kill switches can be engineered over time to keep the engineered microbe in check
The FDA has sufficient synbio experts to evaluate the circuit design
In vitro stability testing predicts in vivo behavior
Risks
Failure: If the standards are set so high that the project becomes difficult to carry out, small labs and startups may opt out, leading to a decline in the industry.
Success: A standard designed too well could lead to underestimation of risks.
Governance Action 2: Long Term Monitoring and Clinical Trials
Purpose: Given the closed-loop nature of living therapeutics and the changes they could undergo, the clinical trial framework should establish tiers that run over designated timescales for continuous surveillance.
Design: Clinical trials should comprise at least three tiers:
Tier 1 (1-3 yr): Standard testing phase
Tier 2 (5 yr): Mandatory microbiome monitoring and tracking of genomic sequences
Tier 3: Continuous surveillance of wastewater in trial regions
Assumptions
Patients will remain in the 5-year follow-up
The engineered organism can be effectively tracked within the gut environment
Risks
Failure: Unforeseen evolution of the organism is observed only after widespread distribution.
Success: An over-institutionalized framework could slow the development of future iterations.
Governance Action 3: Transparency and International Oversight
Purpose: Considering the potential widespread use of such a therapeutic, the public should have transparent access to the fundamental logic/code of the circuit. At the same time, international bodies such as the WHO should develop and align a set of harmonized minimum standards for testing and monitoring.
Design: National governments coordinate and align regulations under international organizations, together with synbio industry leaders. Committed collaboration between the public and private sectors within a foreseeable timescale.
Assumptions
Committed support exists among decision makers despite current tensions in international relations.
A universal standard is applicable despite differing cultural practices.
Technology development keeps pace with international harmonization.
Risks
Failure: No actual enforcement efforts are made.
Success: Rigorous standards further entrench the advantage of developed countries and widen the gap in medical development and accessibility between countries.
4. Scoring Framework
The following rubric evaluates the governance options presented above on a 1–3 scale (1 = weak/limited, 2 = moderate, 3 = strong) across biosecurity, lab safety, environmental protection, and practical considerations.
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 3 | 2 | 2 |
| • By helping respond | 1 | 3 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 3 | 2 | 2 |
| • By helping respond | 2 | 2 | 1 |
| Protect the environment | | | |
| • By preventing incidents | 3 | 2 | 2 |
| • By helping respond | 1 | 3 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 2 | 1 |
| • Feasibility? | 2 | 3 | 1 |
| • Not impede research | 1 | 2 | 2 |
| • Promote constructive applications | 2 | 3 | 3 |
| Total | 20 | 24 | 20 |
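The totals in the rubric above can be re-checked with a short tally. This is just a sanity-check sketch; the variable names are illustrative, and the scores are copied from the rubric in row order.

```python
# Scores per option, in the same row order as the rubric
# (biosecurity x2, lab safety x2, environment x2, other considerations x4).
scores = {
    "Option 1": [3, 1, 3, 2, 3, 1, 2, 2, 1, 2],
    "Option 2": [2, 3, 2, 2, 2, 3, 2, 3, 2, 3],
    "Option 3": [2, 3, 2, 1, 2, 3, 1, 1, 2, 3],
}

totals = {option: sum(values) for option, values in scores.items()}
best = max(totals, key=totals.get)

for option, total in totals.items():
    print(f"{option}: {total}")
print("Highest total:", best)
```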
5. Prioritized Option
Based on the overall scoring, Governance Action 2 has the highest total of the three, because its staged trial design monitors progress closely over time and allows early detection of incidents. The gradual rollout also takes the market into consideration, making wide application feasible.
However, it also has weaknesses that need to be offset by complementary actions. On prevention specifically, Action 1 scores higher because it builds kill switches in during the initial engineering phase.
Action 3 touches a little on everything, but it should be considered later, once the technology and domestic standards are more mature, since implementing regulations at an international level is costly and often requires a long time for reconciliation and negotiation.
Assignment:
Questions from Professor Jacobson
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
The error rate, according to slide 8, is 1 in 10^6. The human genome is about 3.2 billion base pairs (3.2 Gbp), so copying it at that rate would introduce roughly 3,200 errors per cell division. Biology deals with this discrepancy through error correction such as the MutS repair system, which detects mismatched base pairs and resynthesizes them correctly, bringing down the error rate so that copying proceeds with very few or zero errors.
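The estimate above is a single multiplication, sketched here using only the two figures quoted in the answer (the variable names are illustrative):

```python
GENOME_SIZE_BP = 3.2e9   # human genome length quoted above
RAW_ERROR_RATE = 1e-6    # polymerase error rate quoted above (1 error per 10^6 bases)

# Expected number of uncorrected errors when the genome is copied once.
expected_errors = GENOME_SIZE_BP * RAW_ERROR_RATE
print(f"Expected errors per genome copy: {expected_errors:,.0f}")
```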
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
An average human protein is encoded by around 1036 base pairs of DNA (slide 6); dividing by three (one codon per amino acid) gives roughly 345 amino acids per protein. Because most amino acids are encoded by several synonymous codons, there are around 10^150 possible DNA sequences that result in the same primary chain of amino acids. In practice many of them do not work: for example, some nucleotide sequences create mRNA structures such as hairpins that block the ribosome from binding and prevent the right protein from forming.
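A rough way to reproduce the order of magnitude is sketched below. It assumes the standard genetic code and, as a simplification not stated in the answer, that all 20 amino acids occur equally often in the protein; a real protein's composition would shift the exponent somewhat.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_DEGENERACY = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "I": 3,
    "V": 4, "P": 4, "T": 4, "A": 4, "G": 4,
    "L": 6, "S": 6, "R": 6,
}

protein_length = 1036 // 3  # about 345 amino acids, as in the answer above

# Assumption: uniform amino-acid composition, so each position contributes
# the average log10 degeneracy.
mean_log10 = sum(math.log10(d) for d in CODON_DEGENERACY.values()) / len(CODON_DEGENERACY)
exponent = protein_length * mean_log10

print(f"Roughly 10^{exponent:.0f} synonymous DNA sequences")
```

Under these assumptions the exponent comes out in the high 140s, consistent with the "around 10^150" figure.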
Questions from Professor LeProust
What’s the most commonly used method for oligo synthesis currently?
The most used method is the phosphoramidite method, a four-step chemical cycle repeated once per added nucleotide: coupling (with a phosphoramidite), capping (of unreacted sites), oxidation, and deblocking.
Why is it difficult to make oligos longer than 200nt via direct synthesis?
It is difficult mainly because each coupling step is slightly inefficient and the errors accumulate. The yield of full-length product decays exponentially with length, so by around 200 nt the majority of the product would be failure sequences.
Why can’t you make a 2000bp gene via direct oligo synthesis?
Because direct oligo synthesis uses phosphoramidite chemistry, the per-step success rates multiply and the final yield follows an exponential decay curve: as the number of nucleotides increases, the fraction of correct product goes down. By 2000 nt it would be hardly possible to extract the correct sequence from all the failure products. Hence bioengineers synthesize shorter oligos and stitch them together to obtain the correct sequence.
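The exponential decay in the two answers above can be made concrete with a small sketch. The per-step coupling efficiencies used here (99% and 99.5%) are assumed illustrative values, not figures from the lecture, and the model ignores every error source other than failed couplings.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands that are full length after synthesis.

    Simplifying assumption: a strand of n nucleotides needs n - 1 successful
    couplings (the first base starts on the solid support), and each coupling
    succeeds independently with the same probability.
    """
    return coupling_efficiency ** (length_nt - 1)


for efficiency in (0.99, 0.995):      # hypothetical per-step efficiencies
    for length in (20, 200, 2000):
        fraction = full_length_fraction(length, efficiency)
        print(f"efficiency={efficiency:.3f}  length={length:5d} nt  "
              f"full-length fraction={fraction:.3e}")
```

Even at the more optimistic assumed efficiency, the full-length fraction at 2000 nt is vanishingly small, which is why long genes are assembled from shorter oligos.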
Question from Professor Church
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The 10 essential amino acids (from the slide and with the aid of Google) are listed below:
Arginine (Arg)
Histidine (His)
Isoleucine (Ile)
Leucine (Leu)
Lysine (Lys)
Methionine (Met)
Phenylalanine (Phe)
Threonine (Thr)
Tryptophan (Trp)
Valine (Val)
The Lysine Contingency (according to Google) refers to the genetic alteration in the movie Jurassic Park that made the dinosaurs unable to produce lysine, so that they relied on human-supplied supplements to survive. But this idea does not hold up: lysine is already an essential amino acid in all animals, meaning they cannot synthesize it anyway and obtain it from their diet, so the dinosaurs could get lysine simply by eating other organisms. This sheds light on biocontainment with NSAAs (non-standard amino acids), which organisms cannot obtain in a natural setting and which therefore make a more secure contingency.
Week 2 HW: DNA Read, Write, and Edit
3.1. Choose your protein.
In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
I have selected PIEZO1, a protein that sits in the cell membrane and opens when the membrane is physically stretched, compressed, or deformed; in essence, it detects membrane tension.
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?
E. coli Codon-Optimized DNA (7,566 bp)
Optimized for expression in E. coli C43(DE3). Rare codons (AGG/AGA for Arg, CUA for Leu, AUA for Ile) replaced with E. coli-preferred synonymous codons to prevent ribosomal stalling and improve yield.
Key differences from human-optimized version: Arginine codons AGG/AGA → CGT/CGC (abundant E. coli tRNAs) · Leucine CTA → CTG/CTT · Isoleucine ATA → ATT · Lower GC content (~52% vs ~69% in human-optimized)
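The rare-codon substitutions listed above can be illustrated with a minimal sketch. This is not the tool actually used for the optimization: real codon optimizers also weigh codon-usage tables, GC content, and mRNA structure. The function name and the one-to-one mapping below are simplifications (each rare codon is mapped to just one of the preferred codons named above).

```python
# Hypothetical one-to-one mapping built from the substitutions listed above.
RARE_TO_PREFERRED = {
    "AGG": "CGT",  # Arg
    "AGA": "CGC",  # Arg
    "CTA": "CTG",  # Leu
    "ATA": "ATT",  # Ile
}


def replace_rare_codons(dna):
    """Swap rare codons for synonymous preferred ones, reading in frame from position 0."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(RARE_TO_PREFERRED.get(codon, codon) for codon in codons)


toy_sequence = "ATGAGGCTAATATAA"  # Met-Arg-Leu-Ile-stop, a made-up example
print(replace_rare_codons(toy_sequence))
```

Because every replacement is synonymous, the encoded protein is unchanged; only the codon choice differs.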
Quick Comparison
| Property | Protein | Native DNA | E. coli-Optimized DNA |
|---|---|---|---|
| Length | 2,521 aa | 7,566 bp | 7,566 bp |
| GC content | - | ~58% | ~52% |
| Target host | - | H. sapiens | E. coli C43(DE3) |
| Rare codons | - | None (native) | Eliminated |
| Encoded protein | PIEZO1 | Identical | Identical |
Note: Both DNA sequences encode the exact same protein. Only the synonymous codon choices differ, optimized for the translational machinery of the target host organism.
Week 3 HW: Lab Automation
Post Lab Questions
Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode or Python scripts, procedures you may need to automate, 3D printed holders you may need, and more.
Example ideas that you can create a protocol for:
Use the cloud laboratory to screen an array of biosensors constructs that you design, synthesize, and express using cell-free protein synthesis
Use Opentrons to dispense microorganisms onto fabric to design “living textiles” as “bio artwork”
Find and briefly summarize a published paper that utilizes laboratory automation to achieve novel biological applications
Include in your summary:
General overview (2 paragraphs)
Findings (1 paragraph)
Relevant Figures (1 - 2 max)
Week 4 HW: Protein Design Part I
Part A Conceptual Questions
Q1. How many molecules of amino acids do you take in with a piece of 500 g of meat?
Meat is approximately 25% protein by weight, so 500 g of meat contains about 125 g of protein. Using the given average molecular weight of ~100 Da (= 100 g/mol) per amino acid:
$500\text{ g} \times 0.25 = 125\text{ g of protein}$
Moles of amino acids = 125 g ÷ 100 g/mol = 1.25 mol
Number of molecules = 1.25 mol × 6.022 × 10²³ mol⁻¹ ≈ 7.5 × 10²³ amino acid molecules
Q2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Proteases break dietary proteins down into individual amino acids during digestion, which are chemically identical regardless of source. Once absorbed, your cells reassemble these amino acids into human proteins according to the instructions in your own DNA. No genetic information transfers from food to your genome; dietary DNA is degraded by nucleases in the gut. Food provides raw building blocks, but your genome provides the blueprint, so the output is always human protein.
Q3. Why are there only 20 natural amino acids?
The 20 canonical amino acids provide a near-optimal coverage of side-chain chemical properties — spanning small to large, polar to nonpolar, charged, aromatic, and nucleophilic — with minimal redundancy. The triplet genetic code can encode 64 codons, and after reserving stop signals and building in redundancy to buffer against mutation errors, 20 amino acids strikes a good balance between functional diversity and error tolerance. These 20 are also the ones that were biosynthetically accessible through early metabolic pathways derived from central metabolites. Once the translation machinery co-evolved around this set, changing it became prohibitively costly since it would affect every protein in every organism, so the system became frozen early in evolution.
Q4. Where did amino acids come from before enzymes that made them, and before life started?
Amino acids predate life and arise from chemistry. The Miller–Urey experiment demonstrated that electric discharges through a reducing atmosphere produce glycine, alanine, aspartate, and other amino acids. Life inherited these building blocks from prebiotic geochemistry and later evolved enzymatic pathways to produce them more efficiently.
Q5. If you make an α-helix using D-amino acids, what handedness would you expect?
A left-handed α-helix. The natural right-handed α-helix arises because L-amino acids position their side chains to minimize steric clashes with backbone carbonyls specifically in the right-handed conformation. D-amino acids are the mirror image of L-amino acids, so the favorable backbone dihedral angles flip sign — from (−57°, −47°) to (+57°, +47°) — producing a left-handed helix. This is confirmed experimentally: synthetic D-peptides give circular dichroism spectra that are exact mirror images of natural L-peptide helices.
Q6. Can you discover additional helices in proteins?
Yes. (according to google) Beyond the common α-helix, proteins contain 3₁₀-helices (3.0 residues/turn, i→i+3, common at helix termini), π-helices (4.4 residues/turn, i→i+5, rare single-turn insertions), and the collagen triple helix. In principle, any repeating set of backbone (φ, ψ) angles that permits regular hydrogen bonding defines a helix, and the main candidates have been systematically mapped from the Ramachandran plot.
Q7. Why are most molecular helices right-handed?
The dominance of right-handed helices stems from the universal use of L-amino acids, for which the right-handed helix is the lowest-energy conformation due to favorable side-chain positioning.
Once L-amino acids became dominant, all downstream molecular machinery co-evolved around that chirality. If life had been founded on D-amino acids, left-handed helices would dominate and the biology would be equally functional.
Q8. Why do β-sheets tend to aggregate? What is the driving force?
β-sheets are inherently open-ended structures: unlike α-helices where all backbone hydrogen-bond donors and acceptors are satisfied internally, β-sheet edge strands have one face of exposed N–H and C=O groups available for hydrogen bonding with additional strands. This creates a thermodynamic driving force to recruit more strands and extend the sheet. The main forces driving aggregation are backbone hydrogen bonding between exposed edges, essentially intermolecular β-sheet extension, the hydrophobic effect from burying nonpolar side chains between stacked sheets, and van der Waals contacts in the cross-β arrangement.
Q9. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
Proteins involved in amyloid diseases have aggregation-prone hydrophobic stretches or destabilizing mutations that lower the kinetic barrier to reaching the cross-β amyloid state, and once a nucleus forms it templates further conversion in a self-propagating manner.
Amyloid fibrils can be used as materials. They have tensile strength comparable to steel and Young’s moduli of 2–14 GPa, and they resist proteases, detergents, and heat. In bionanotechnology, amyloid fibrils serve as scaffolds for conductive nanowires, hydrogel matrices for tissue engineering and drug delivery, and membranes for heavy-metal water purification.
Part B — Protein Analysis and Visualization
Q1. Briefly describe the protein you selected and why you selected it.
PIEZO1 is a homotrimeric mechanosensitive ion channel that converts physical forces — such as fluid shear stress, membrane stretch, and compressive pressure — into biochemical signals by allowing cation influx (primarily Ca²⁺) upon mechanical stimulation. Each subunit contains ~38 transmembrane helices that form a distinctive curved, propeller-like architecture with three peripheral “blades” and a central pore.
PIEZO1 is valuable because it serves as a fundamental mechanical switch for cellular programming: it governs processes including vascular development, red blood cell volume regulation, blood pressure sensing, and cell lineage determination in stem cells.
Q2. Identify the amino acid sequence of your protein.
Most common amino acid: Leucine (L), appearing 367 times (~14.6% of the sequence). This is expected: leucine is the most abundant residue in transmembrane α-helices due to its hydrophobic character and favorable helix-forming propensity, and PIEZO1 is overwhelmingly α-helical with ~38 transmembrane passes per subunit.
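A residue count like the one above can be obtained with a few lines of Python. The sketch below runs on a short made-up sequence rather than the real PIEZO1 sequence, and the function name is illustrative.

```python
from collections import Counter


def most_common_residue(sequence):
    """Return (residue, count, fraction) for the most frequent amino acid."""
    counts = Counter(sequence)
    residue, count = counts.most_common(1)[0]
    return residue, count, count / len(sequence)


toy_sequence = "MLLKALLGVL"  # made-up 10-residue example, not PIEZO1
residue, count, fraction = most_common_residue(toy_sequence)
print(f"{residue}: {count} times ({fraction:.1%})")
```

Running the same function on the full 2,521-residue sequence downloaded from UniProt would give the leucine count reported above.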
Homologs
Using UniProt BLAST on the human PIEZO1 sequence returns homologs across a broad range of eukaryotes — vertebrates, insects, plants, and even single-celled eukaryotes — reflecting the ancient evolutionary origin of mechanosensation. The closest homolog is PIEZO2 (human, ~42% sequence identity), which mediates light touch and proprioception. Beyond PIEZO2, orthologs of PIEZO1 are found in most metazoan genomes (mouse, zebrafish, Drosophila, C. elegans), with more distant homologs in plants (Arabidopsis) and protists. A typical BLAST search returns several hundred significant hits (E-value < 0.05), though the number depends on the database and threshold used.
Protein family
PIEZO1 belongs to the Piezo family, a eukaryote-specific family of mechanosensitive channels with no significant homology to any other known ion channel family (e.g., TRP channels, Degenerin/ENaC, or MscL/MscS bacterial mechanosensitive channels). This makes the Piezo family an evolutionarily independent solution to mechanotransduction.
Q3. Identify the structure page of your protein in RCSB.
Structure and resolution
The primary full-length structure is PDB: 5Z10. The human PIEZO1 also has related entries (e.g., PDB 7WLT).
Resolution: 3.97 Å. For a cryo-EM structure of a ~900 kDa trimeric membrane protein, this is a reasonable resolution — sufficient to trace the backbone, assign secondary structure, and identify transmembrane helix positions. However, it is not high resolution by crystallographic standards; individual side-chain conformations and water molecules are generally not resolvable at this resolution.
Other molecules in the structure
The solved structure contains:
Lipid molecules — Phospholipids are resolved in the transmembrane domain, consistent with PIEZO1’s curved membrane-embedded architecture and its sensitivity to membrane composition and tension.
Detergent molecules — from the purification process (typically digitonin or LMNG).
Ions — depending on the specific entry, Ca²⁺ or other cations may be modeled in or near the pore region.
Structure classification
In the RCSB classification, it falls under membrane proteins → ion channels → mechanosensitive channels. Its unique propeller-blade topology does not closely resemble any other structurally characterized ion channel family, making it a distinct structural class.
Q4. Visualize the structure of your protein.
Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
Cartoon
Ribbon
Ball and Stick
Secondary structure
PIEZO1 is overwhelmingly α-helical. Each subunit contains ~38 transmembrane helices organized into repeated structural units called “Piezo repeats” (or “transmembrane helical units”), which form the curved blades of the propeller. The central pore region includes an inner helix (TM37), outer helix (TM38), and the C-terminal extracellular domain (CED). There are virtually no β-sheets in the structure — only short loops and turns connect the helices. This extreme α-helical bias is consistent with its identity as a multi-pass transmembrane protein.
Residue type distribution (hydrophobic vs. hydrophilic)
When colored by residue type:
The transmembrane blade regions are dominated by hydrophobic residues (Leu, Ile, Val, Phe, Ala) — these face the lipid bilayer and form the core of helix-helix packing within the membrane. This explains why leucine is the most frequent amino acid.
Hydrophilic and charged residues (Arg, Lys, Glu, Asp) are concentrated at the intracellular and extracellular surfaces, at helix termini (anchoring the protein at the membrane-water interface), and lining the central ion conduction pore (where they contribute to ion selectivity and gating).
The CED (C-terminal extracellular domain), which protrudes above the membrane at the trimer center, has a higher proportion of polar and charged residues, consistent with its aqueous environment.
This distribution follows the classic “positive-inside rule” — positively charged residues (Arg, Lys) are enriched on the cytoplasmic side of the membrane.
Surface and binding pockets
The surface of PIEZO1 reveals several notable features:
Central pore. The most prominent “hole” is the ion conduction pathway at the trimer axis. This is the functional pore through which cations flow upon channel activation.
Lateral fenestrations. Between the blade domains near the membrane plane, there are openings (fenestrations) that may allow lateral lipid access to the pore — a feature shared with some other ion channels and potentially important for lipid-mediated gating.
Intracellular “cap” cavity. On the cytoplasmic face, the converging beam-like structures create an enclosed cavity that has been proposed as a binding site for intracellular modulators.
Yoda1 binding site. The small-molecule agonist Yoda1 binds in a pocket between the blade and pore module (identified in structures like PDB 7WLT), confirming a druggable pocket in the structure.
Overall, the surface is not smooth — the curved, dome-shaped architecture creates multiple grooves and pockets that are functionally relevant for lipid interaction, mechanical force transduction, and pharmacological targeting.
Part C - Using ML-Based Protein Design Tools
1. Deep Mutational Scans
1.1 Method
ESM2 was used to generate an unsupervised deep mutational scan of human PIEZO1 (UniProt Q9H5I5, 2,521 amino acids). For every position in the sequence, the model scores the log-likelihood of substituting the wild-type residue with each of the 20 amino acids. The resulting heatmap displays Model Scores across all positions (x-axis) and all possible amino acid substitutions (y-axis), where green/yellow indicates neutral or favorable substitutions and dark blue/purple indicates substitutions the model predicts to be strongly deleterious.
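The scoring step can be sketched independently of the model. The example below does not call ESM2; it uses random numbers as stand-in logits, and it assumes one common convention (not confirmed by the text above) in which each substitution is scored as its log-probability minus the wild-type residue's log-probability at that position. All names are illustrative.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw logits over the 20 amino acids into log-probabilities."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def mutation_scores(wild_type, logits_per_position):
    """Score every substitution as log p(mutant) - log p(wild type) at each position."""
    scores = []
    for wt_residue, logits in zip(wild_type, logits_per_position):
        log_probs = log_softmax(logits)
        wt_log_prob = log_probs[AMINO_ACIDS.index(wt_residue)]
        scores.append({aa: log_probs[i] - wt_log_prob
                       for i, aa in enumerate(AMINO_ACIDS)})
    return scores


random.seed(0)
toy_wild_type = "MEPL"  # made-up fragment, not taken from PIEZO1
# Stand-in for model output: one vector of 20 logits per position.
toy_logits = [[random.gauss(0, 1) for _ in AMINO_ACIDS] for _ in toy_wild_type]

heatmap = mutation_scores(toy_wild_type, toy_logits)
print({aa: round(score, 2) for aa, score in heatmap[0].items()})
```

Under this convention the wild-type residue always scores 0, negative values mark substitutions the model considers less likely than the wild type, and a column of strongly negative values corresponds to a conserved position.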
1.2 Observed Patterns
Conserved positions appear as dark vertical columns. Several positions show strongly negative scores across nearly all 20 substitutions, indicating that the model considers any change at those positions highly unlikely based on evolutionary sequence patterns. These columns correspond to residues that are critical for PIEZO1’s structure or function — they map primarily to the pore-lining region and the C-terminal anchor domain, where even conservative substitutions would disrupt ion conduction or mechanical gating.
The Leucine (L) row is notably bright across most positions. Mutations to leucine are generally well-tolerated, which is consistent with PIEZO1’s identity as a multi-pass transmembrane protein (~38 TM helices per subunit). Leucine is the most common residue in α-helical transmembrane domains due to its hydrophobic character and favorable helix-forming propensity, so substituting to leucine is a “safe” change at most positions.
The Glycine (G) row shows scattered deep blue spots. Positions where the wild-type is glycine tend to show dark columns across other substitutions. Glycines in transmembrane helices are critical for helix packing and flexibility — they allow tight inter-helix contacts that bulkier residues would sterically prevent. Mutating these glycines is therefore strongly disfavored.
A specific example: One of the most prominent dark vertical bands appears in the region corresponding to the inner pore helix of PIEZO1. Conserved charged residues in this region (e.g., glutamate or arginine residues lining the pore) score very negatively when mutated to hydrophobic residues like leucine, isoleucine, or valine. This is biologically expected — charged residues in the pore domain are essential for cation selectivity and gating, and replacing them with hydrophobic side chains would destroy channel function.
2. Latent Space Analysis
2.1 Method
15,177 structurally classified protein domains from the SCOPe/ASTRAL database were embedded using ESM2-8M (hidden dimension = 320) into 320-dimensional vectors. t-SNE then projected these into 3D for visualization. The color scale represents TSNE3 (yellow = high, purple = low), providing visual depth. Despite using the smallest ESM2 model, the projection recovers meaningful structural groupings, demonstrating that protein language models encode structural information implicitly from sequence alone.
2.2 Neighborhood Analysis
I selected three representative points from different regions of the plot for analysis:
Upper yellow region (high TSNE3) — β-sheet-rich proteins.
d2g5da1 (TSNE: −2.29, −1.13, 4.05) is Membrane-bound lytic murein transglycosylase A (MLTA) from Neisseria gonorrhoeae. Its neighbors in this yellow cluster are predominantly other β-barrel and β-sheet-rich domains, including outer membrane proteins from gram-negative bacteria that share the β-barrel architecture.
Dense central orange region (intermediate TSNE3) — common α/β folds.
d3cwna_ (TSNE: −0.82, 0.88, 0.34) is an E. coli protein matching SCOP class c.1.10.1 (α/β, TIM barrel fold). The TIM barrel is the most common enzyme fold in nature (found in glycolysis enzymes, aldolases, tryptophan synthase, etc.), and its position in the densest part of the plot reflects its abundance in protein databases.
Lower purple region (low TSNE3) — unusual/transmembrane proteins.
d1x2ma1 (TSNE: −0.79, 0.85, −6.20) is Lag1 longevity assurance homolog 6 (LASS6/CerS6) from mouse. LASS6 is a multi-pass transmembrane ceramide synthase with ~5–6 TM helices and a unique Lag1p motif. Its position far from the soluble enzyme core reflects ESM2’s recognition that its hydrophobic, membrane-spanning sequence features are fundamentally distinct from typical soluble proteins.
2.3 Placing PIEZO1
PIEZO1 would be expected to sit in the purple periphery or as an isolated outlier, for the following reasons:
It is an extremely large multi-pass transmembrane protein, so its sequence composition is heavily biased toward hydrophobic residues. This transmembrane character would push it away from the soluble-protein-dominated central core, similar to how LASS6 sits in the purple region.
PIEZO1 has no sequence homology to any other known ion channel family. Its “Piezo repeat” domains and propeller-blade architecture are structurally unique. ESM2 would therefore embed it far from other channel proteins.
The only protein expected to sit nearby is PIEZO2 (~42% sequence identity), the sole close homolog. If PIEZO2 is absent from the dataset, PIEZO1 would sit alone — reflecting the evolutionary isolation of the Piezo family as a structurally novel, independent solution to mechanosensation.
Four candidate binders were generated with PepMLM against the SOD1 A4V mutant sequence and are listed alongside the known reference peptide. Lower perplexity scores indicate sequences more confidently predicted by the model.
| # | Sequence | Perplexity | Note |
|---|---|---|---|
| PepMLM 1 | WHYPAAAAAWKK | 8.611 | - |
| PepMLM 2 | WRSPAVAAAHKE | 7.866 | Lowest perplexity |
| PepMLM 3 | WRYPAVALEWKK | 16.562 | Highest perplexity |
| PepMLM 4 | WHSYVVGARWWK | 13.338 | - |
| Known | FLYRWLPSRRGG | - | Reference binder |
Note on Perplexity: In PepMLM, perplexity reflects how confidently the masked language model predicts each residue in context. Lower perplexity suggests the sequence is more consistent with the model’s learned distribution of binders; however, higher perplexity sequences may still yield productive binding if their physicochemical and structural properties are favourable.
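To make the note above concrete, here is a minimal sketch of how a perplexity score follows from per-residue probabilities: it is the exponential of the mean negative log-probability. The function name `pseudo_perplexity` and the probability values are assumptions for illustration; they are not PepMLM output, and PepMLM's exact masking scheme is not reproduced here.

```python
import math

def pseudo_perplexity(residue_probs):
    """Return exp(mean negative log-probability) over all residues."""
    if not residue_probs:
        raise ValueError("need at least one residue probability")
    neg_log_probs = [-math.log(p) for p in residue_probs]
    return math.exp(sum(neg_log_probs) / len(neg_log_probs))

# Illustrative values only (assumed, not real model output):
# a 12-mer where the model gives each true residue probability 0.5 or 0.1.
confident = [0.5] * 12
uncertain = [0.1] * 12

print(pseudo_perplexity(confident))
print(pseudo_perplexity(uncertain))
```

A model that assigns every residue probability 1/k has perplexity k, which is why lower values read as "more confident".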
Part 2: Evaluate Binders with AlphaFold 3
For the sake of my OCD or else with only 5 pics will look ugly
Known Peptide ipTM = 0.36
Peptide 1 ipTM = 0.27
Peptide 2 ipTM = 0.40
Peptide 3 ipTM = 0.19
Peptide 4 ipTM = 0.39
ipTM (interface predicted TM-score) measures predicted interface accuracy. Values range from 0 to 1 — higher is better. Scores ≥ 0.5 generally indicate confident predictions.
Binding Analysis
| Structure | ipTM | Near A4V / N-term? | β-barrel engagement | Surface character |
|---|---|---|---|---|
| Known (Reference) | 0.36 | Yes | Lateral strand edge | Surface-bound, extended |
| PepMLM Peptide 1 | 0.27 | No | Minimal | Surface, poorly engaged |
| PepMLM Peptide 2 | 0.40 | Partial (dimer face) | Lateral interface cleft | Surface docked |
| PepMLM Peptide 3 | 0.19 | No | None | Peripheral, non-specific |
| PepMLM Peptide 4 | 0.39 | Distal (C-term base) | Bottom loop region | Surface-bound |
Notes
PepMLM Peptide 2 is the strongest candidate: highest ipTM, adopts α-helical secondary structure upon binding, and docks into the concave groove at the lateral β-barrel interface — the region destabilised by the A4V mutation. One face of the helix contacts SOD1 while the other remains solvent-exposed. This binding mode is consistent with therapeutic peptides that stabilise misfolding-prone interfaces.
PepMLM Peptide 4 has a comparable ipTM (0.39) but localises to the base of the barrel near C-terminal loops, distal from the A4V site, limiting its therapeutic relevance.
PepMLM Peptides 1 and 3 show poor interface engagement and are unlikely to be productive binders.
All ipTM values here are modest (below the 0.5 confidence threshold described above), consistent with flexible peptide–protein interfaces typical in AlphaFold-Multimer assessments.
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
| Property | PepMLM 1 | PepMLM 2 | PepMLM 3 | PepMLM 4 | Known (Reference) |
|---|---|---|---|---|---|
| Sequence | WHYPAAAAAWKK | WRSPAVAAAHKE | WRYPAVALEWKK | WHSYVVGARWWK | FLYRWLPSRRGG |
| Solubility | Soluble (1.000) | Soluble (1.000) | Soluble (1.000) | Soluble (1.000) | Soluble (1.000) |
| Hemolysis | Non-hemolytic (0.013) | Non-hemolytic (0.016) | Non-hemolytic (0.027) | Non-hemolytic (0.039) | Non-hemolytic (0.047) |
| Binding affinity (pKd/pKi) | Weak binding (4.902) | Weak binding (4.661) | Weak binding (5.784) | Weak binding (6.308) | Weak binding (5.968) |
| Length | 12 aa | 12 aa | 12 aa | 12 aa | 12 aa |
| Mol. weight | 1399.6 Da | 1322.5 Da | 1546.8 Da | 1574.8 Da | 1507.7 Da |
| Net charge | +1.84 | +0.85 | +1.76 | +1.85 | +2.76 |
| pI | 9.70 | 8.76 | 9.70 | 9.99 | 11.71 |
| GRAVY | −0.56 | −0.58 | −0.74 | −0.55 | −0.71 |
All peptides are predicted soluble and non-hemolytic. Binding affinity (pKd/pKi): higher = stronger predicted affinity. Negative GRAVY scores reflect hydrophilic character across all sequences.
Across the five peptides, there is no clear correlation between ipTM and predicted binding affinity.
The peptide I selected is PepMLM Peptide 2 (WRSPAVAAAHKE). Its predicted affinity is the lowest of the five (4.661 pKd/pKi), but it has the highest ipTM and the lowest perplexity, adopts stable α-helical secondary structure upon docking, a hallmark of productive peptide–protein interfaces, and engages the lateral cleft of the β-barrel at the region destabilised by A4V. It is the only candidate where the structural, physicochemical, and site-specificity evidence converge.
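The "no clear correlation" observation can be checked with a quick calculation. The sketch below computes a Pearson correlation coefficient between the ipTM values from Part 2 and the predicted affinities from Part 3; the helper name `pearson` is my own, and with only five data points the result is a rough indication at best.

```python
import math

# (ipTM, predicted pKd/pKi) copied from the Part 2 and Part 3 results above.
peptides = {
    "PepMLM 1": (0.27, 4.902),
    "PepMLM 2": (0.40, 4.661),
    "PepMLM 3": (0.19, 5.784),
    "PepMLM 4": (0.39, 6.308),
    "Known":    (0.36, 5.968),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

iptm = [v[0] for v in peptides.values()]
affinity = [v[1] for v in peptides.values()]
r = pearson(iptm, affinity)
print(f"Pearson r between ipTM and predicted affinity: {r:.3f}")
```

For these five peptides the coefficient comes out close to zero, which matches the statement that the two metrics do not track each other here.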
Part C: Final Project: L-Protein Mutants
The MS2 bacteriophage lysis protein (L-protein) is a 74 amino acid protein responsible for
killing E. coli host cells by perforating the bacterial membrane. A critical vulnerability of
this system is that a single point mutation in the host chaperone protein DnaJ can prevent the
lysis protein from functioning, allowing E. coli to acquire resistance to MS2.
The L-protein has two structurally and functionally distinct regions:
Soluble N-terminal domain (positions 1–38): responsible for interaction with DnaJ
Transmembrane domain (positions 39–73): responsible for membrane insertion and lysis
The selected mutations must include at least 2 in the transmembrane region and at least 2 in the soluble region.
Option 1: Mutagenesis
Running the ESM-2 protein language model
(facebook/esm2_t6_8M_UR50D) on the full wild-type L-protein sequence:
The model scores every possible single amino acid substitution at every position using a Log Likelihood Ratio (LLR):
Positive score → the substitution looks evolutionarily natural and compatible
Negative score → the substitution disrupts what the model expects at that position
Position 1 (M) showed almost entirely dark purple scores, confirming the start methionine is essential and should not be mutated
Rows M, W, Y were dark across most positions — large/bulky amino acids are generally disruptive substitutions
The transmembrane region (~positions 39–73) showed brighter yellow/green scores for hydrophobic substitutions (L, I, V, F) — consistent with the hydrophobic nature of membrane-spanning helices
Bright yellow hotspots at positions 29, 39, and 50 stood out as positions where specific mutations are strongly predicted
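As a rough sketch of the scoring described above, a log likelihood ratio for one position is the log-probability of the mutant residue minus the log-probability of the wild-type residue. The probabilities below are made up for illustration and the function name `llr_scores` is hypothetical; the notebook's exact procedure (for example, whether the position is masked before scoring) is not shown in this write-up.

```python
import math

def llr_scores(position_probs, wild_type):
    """Return {amino_acid: log p(aa) - log p(wild_type)} for one position."""
    wt_log_prob = math.log(position_probs[wild_type])
    return {
        aa: math.log(p) - wt_log_prob
        for aa, p in position_probs.items()
        if aa != wild_type
    }

# Hypothetical model probabilities at a single position (assumed values,
# restricted to four amino acids to keep the example short).
probs = {"K": 0.05, "L": 0.60, "I": 0.25, "E": 0.10}

scores = llr_scores(probs, wild_type="K")
best = max(scores, key=scores.get)
for aa, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"K->{aa}: {score:+.3f}")
```

A positive score means the model considers the substitution more likely than the wild-type residue at that position, which is all the heatmap colors encode.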
The notebook was first run with a focused query on the transmembrane region (positions 38–60), producing the following top-scored mutations:
Amino Acid Position Score
0 L 50 2.561468
1 L 39 2.241780
2 I 50 1.928801
3 L 53 1.864932
4 L 52 1.813968
5 F 50 1.802069
6 V 50 1.594576
7 S 50 1.574557
8 L 45 1.539248
9 S 39 1.517457
10 L 40 1.477630
11 A 39 1.364999
12 A 50 1.357795
13 I 39 1.320103
14 T 39 1.302804
15 F 39 1.245851
16 V 39 1.244390
17 T 50 1.222131
18 L 54 1.120860
19 R 39 1.064191
Two positions dominate the top scores: 50 and 39, which together account for 15 of the 20 entries. The model strongly favors leucine (L) at both, and leucine also scores well at positions 53, 52, 45, 40, and 54. This is the first signal pointing toward K50L and Y39L as strong TM candidates, with A45L as a weaker one. Notably, multiple substitutions at position 50 rank highly (L, I, F, V, S, A, T), suggesting this position is generally flexible, but leucine scores the highest of all.
The notebook was then run on the full protein sequence to get a global ranking across all 74 positions:
Position Wild_Type_AA Mutation_AA LLR_Score
989 50 K L 2.561468
574 29 C R 2.395427
769 39 Y L 2.241780
575 29 C S 2.043150
173 9 S Q 2.014325
573 29 C Q 1.997049
572 29 C P 1.971029
569 29 C L 1.960646
987 50 K I 1.928801
1049 53 N L 1.864932
The top 10 globally are dominated by three positions: 50 (K→L), 29 (C→R/S/Q/P/L), and 39 (Y→L). This globally confirms what the TM scan already suggested, and additionally highlights C29 in the soluble region as a computationally interesting mutation site.
The full ranking also produced a second merged output combining both score datasets:
Position Wild_Type_AA Mutation_AA LLR_Score
1332 50 K L 2.561468
770 29 C R 2.395427
1035 39 Y L 2.241780
229 9 S Q 2.014325
776 29 C Q 1.997049
...
The computational shortlist from the ESM model was:
K50L (score: +2.56) — highest in entire protein
C29R (score: +2.40) — highest in soluble region
Y39L (score: +2.24) — strong TM candidate
A45L (score: +1.54) — noted in TM scan
The L-Protein Mutants CSV was uploaded into the notebook, which displayed the first rows of the experimental dataset:
This dataset contains experimentally measured lysis outcomes (0 = no lysis, 1 = lysis) for mutations that have already been tested in the lab. Cross-referencing this with the ESM scores revealed which computational predictions align with real biology.
Merging both datasets exposed a critical finding: the ESM model only partially agrees with experimental lysis outcomes.
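| Mutation | ESM Score | Lysis (Lab) | Agreement? |
|---|---|---|---|
| P13L | +0.10 | Yes | ✅ |
| S15A | +0.04 | Yes | ✅ |
| K23E | +0.18 | Yes | ✅ |
| E25G | +0.45 | Yes | ✅ |
| A45P | +0.04 | Yes | ✅ |
| I46F | -0.10 | Yes | ❌ |
| R18G | -0.85 | Yes | ❌ |
| R31I | -0.93 | Yes | ❌ |
| L44P | -1.59 | Yes | ❌ |
| R20W | -2.18 | Yes | ❌ |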
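The cross-referencing step can be sketched in a few lines. The values are copied from the table above; the rule "a positive ESM score predicts lysis" is my simplification of how the agreement column appears to have been filled in, and the name `model_agrees` is hypothetical.

```python
# (mutation, ESM LLR score, lysis observed in the lab), from the table above.
records = [
    ("P13L", 0.10, True),
    ("S15A", 0.04, True),
    ("K23E", 0.18, True),
    ("E25G", 0.45, True),
    ("A45P", 0.04, True),
    ("I46F", -0.10, True),
    ("R18G", -0.85, True),
    ("R31I", -0.93, True),
    ("L44P", -1.59, True),
    ("R20W", -2.18, True),
]

def model_agrees(score, lysis):
    """Assumed rule: a positive score is read as a prediction of lysis."""
    return (score > 0) == lysis

agree = [name for name, score, lysis in records if model_agrees(score, lysis)]
disagree = [name for name, score, lysis in records if not model_agrees(score, lysis)]

print(f"Agreement: {len(agree)}/{len(records)}")
print("Disagreements:", ", ".join(disagree))
```

Every mutation in this subset lyses, so the table only shows where the model is too pessimistic; it says nothing yet about mutations the model likes that fail in the lab.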
The disagreements (especially R18G, I46F, and L44P) suggest that the ESM model scores general structural fitness, meaning the ability to fold into a stable, energetically favorable three-dimensional conformation, rather than functional lysis activity, meaning the ability to break open the cell membrane.
Mutations that disrupt DnaJ binding (like R18G) are penalised by the model because the arginine is evolutionarily conserved — but conserved because it binds DnaJ.
This insight shaped the final selection strategy:
Use ESM scores to identify novel untested candidates with high computational confidence,
and use experimental data to validate or override those scores based on known biology.
With all evidence assembled, five mutations were selected spanning both protein regions:
Soluble Region Mutations (Positions 1–38)
P13L — Position 13, Proline → Leucine
ESM Score: +0.10 | Lysis: Confirmed | Protein Level: Confirmed
Proline at position 13 creates a rigid backbone kink within the DnaJ-binding domain.
Replacing it with leucine (flexible, hydrophobic) removes this constraint, potentially
allowing the soluble domain to fold independently of DnaJ. Supported by both model and lab.
S15A — Position 15, Serine → Alanine
ESM Score: +0.04 | Lysis: Confirmed | Protein Level: Confirmed
Serine at position 15 sits within the NRRRP arginine-rich DnaJ-binding motif. Its
hydroxyl side chain is a candidate hydrogen-bonding contact point with DnaJ. Replacing
it with alanine (no side chain beyond a methyl group) directly removes a potential DnaJ
interaction site. Both ESM and lab confirm this is tolerated. Selected alongside P13L
because the two mechanisms are complementary — P13L addresses backbone rigidity,
S15A addresses the interaction surface.
Transmembrane Region Mutations (Positions 39–73)
Y39L — Position 39, Tyrosine → Leucine
ESM Score: +2.24 | Lysis: Not yet tested
Position 39 is the first residue of the transmembrane domain — the boundary point where
the protein transitions from soluble to membrane-spanning. Tyrosine is large and polar
(hydroxyl group), which is chemically unusual at the start of a hydrophobic TM helix.
Leucine is hydrophobic and small, making for a cleaner, sharper TM helix start.
The ESM model strongly favors this change, and it ranked 3rd globally across the entire
protein. The only tested mutation at this position (Y39H) failed — but histidine is
charged and polar, making it incomparable to leucine. Selected as the highest-confidence
novel TM candidate.
A45P — Position 45, Alanine → Proline
ESM Score: +0.04 | Lysis: Confirmed | Protein Level: Confirmed
Introducing proline into a transmembrane helix creates a structural kink — a feature
found in many natural pore-forming proteins and ion channels. This kink at position 45
(sitting centrally in the TM helix) may promote the conformational change needed to open
the transmembrane pore. Supported by both the ESM model and direct experimental
confirmation.
K50L — Position 50, Lysine → Leucine
ESM Score: +2.56 (highest in entire protein) | Lysis: Not yet tested
Lysine (K) is a charged, hydrophilic amino acid — unusual to find it buried deep in a
hydrophobic transmembrane helix. The ESM model assigns the highest score in the entire
protein to replacing it with leucine (hydrophobic), which is thermodynamically much more
compatible with a membrane environment. This substitution could improve membrane
insertion efficiency, increase protein expression, or stabilize the TM assembly.
It is acknowledged that four other K50 variants (K50E, K50N, K50I, K50Q) have failed in
the lab, suggesting this position may be sensitive. K50L is a hydrophobic substitution,
chemically distinct from the charged/polar variants K50E, K50N, and K50Q. K50I is also
hydrophobic and still failed, so K50L carries real risk, but its extremely high ESM score
justifies testing it as a novel candidate.
Highest ESM score in protein; removes charged residue from TM core
AI Prompt used in this section for mutation selection:
Given the provided mutations, could you explain the rationale behind each and why would each serve as potentially candidates?
Week 6 HW: Genetic Circuits Part I
DNA Assembly Questions
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion DNA Polymerase: Building enzyme that reads the original DNA and constructs the new copies with high accuracy.
dNTPs (nucleotides): The building blocks that the polymerase joins together to make the new DNA strands.
Optimized reaction buffer: A liquid that maintains the perfect chemical environment and pH for the enzyme to work.
MgCl2: Helper molecule (cofactor) that the polymerase needs to function properly.
What are some factors that determine primer annealing temperature during PCR?
Primer Length: Longer primers have more binding area, so they require higher temperatures.
GC Content: Guanine (G) and Cytosine (C) pair with three hydrogen bonds, while Adenine (A) and Thymine (T) pair with only two. Therefore, primers with more Gs and Cs hold on tighter and require a higher temperature.
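Both factors show up in the Wallace rule, a rule-of-thumb estimate of melting temperature for short primers: Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). The sketch below is only that approximation (it ignores salt concentration and is not meant for long primers), and the example primers are made up.

```python
def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    unexpected = set(primer) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

# Made-up primers of equal length: the GC-rich one gets the higher estimate.
for seq in ("AAAATTTT", "GGGGCCCC", "ATGCATGCATGC"):
    print(seq, wallace_tm(seq))
```

Each extra base raises the estimate (length), and a G or C raises it twice as much as an A or T (GC content).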
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
PCR Protocol: Uses heat cycles to melt DNA apart, lets primers attach, and uses an enzyme to build new copies.
When to use: When you have a tiny amount of DNA and need billions of copies of a very specific segment, or when you want to add custom ends to a DNA sequence.
Restriction Digest Protocol: Mixes DNA with restriction enzymes and incubates them at a steady temperature. The enzymes physically cut the DNA at specific sequences.
When to use: When you want to extract a specific chunk of DNA out of a larger, already-existing piece, or when you want to verify that a DNA sequence is correct by seeing what sizes it cuts into.
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
Design the PCR primers so that the ends of adjacent DNA pieces overlap: the tail end of piece A must have the exact same sequence (usually 15 to 40 base pairs) as the starting end of piece B. The Gibson mix will chew back one strand of these ends, allowing the matching sequences to find each other and stick together like perfect puzzle pieces.
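The overlap requirement can be checked before ordering primers. This is a minimal sketch that looks for the longest stretch shared between the end of one fragment and the start of the next, within the 15 to 40 bp window mentioned above; the function name `shared_overlap` and the example sequences are hypothetical.

```python
def shared_overlap(upstream, downstream, min_len=15, max_len=40):
    """Length of the longest suffix of `upstream` that equals a prefix of
    `downstream`, searched between min_len and max_len; 0 if none is found."""
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0

# Made-up fragments sharing a 20 bp junction.
junction = "ACGTTGCAAGCTTGGATCCA"
piece_a = "TTTTCCCCGGGG" + junction
piece_b = junction + "AAAATTTTCCCC"

print(shared_overlap(piece_a, piece_b))
print(shared_overlap("AAAA", "CCCC"))
```

The check only confirms that the two ends are identical; it does not assess melting temperature or secondary structure of the overlap.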
How does the plasmid DNA enter the E. coli cells during transformation?
Usually through heat shock or electroporation.
Heat Shock (Chemical): The bacteria are treated with chemicals (like calcium) to neutralize their charge, then subjected to a sudden spike in heat. This sudden temperature change creates temporary “pores” or holes in the bacterial wall, allowing the DNA to slip inside.
Electroporation: The bacteria are hit with a quick zap of electricity, which shocks the cell membrane into opening those temporary pores.
Describe another assembly method in detail (such as Golden Gate Assembly)
Golden Gate assembly is a method for joining multiple DNA fragments together in a single tube. It uses special “molecular scissors” called Type IIS restriction enzymes. Unlike normal restriction enzymes that cut exactly where they bind, Type IIS enzymes bind to a recognition sequence but reach over and cut the DNA a few steps away. Because they cut outside their recognition site, they leave behind custom “sticky ends” (overhangs) that you can design to match perfectly with the next piece of DNA. When the matching pieces snap together, an enzyme called ligase glues them shut permanently. Crucially, the original enzyme recognition site is cut off and left behind in this process, meaning the final assembled DNA has no “scars” or unwanted leftover sequences. Because the assembled product can no longer be cut by the enzyme, the cutting and gluing can happen simultaneously in one reaction tube.
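To illustrate the "cuts a few steps away" idea, the sketch below assumes BsaI as the Type IIS enzyme (the answer above does not name one). BsaI recognises GGTCTC and cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt overhang. The function name and the example sequence are hypothetical, and only sites on the given strand are scanned.

```python
BSAI_SITE = "GGTCTC"  # assumed enzyme: BsaI, cutting at N1 (top) / N5 (bottom)

def bsai_overhangs(seq):
    """Return the 4-nt overhang generated downstream of each forward-strand
    BsaI site. Reverse-complement sites are ignored in this sketch."""
    seq = seq.upper()
    overhangs = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1  # skip the site plus one spacer base
        overhang = seq[cut:cut + 4]
        if len(overhang) == 4:
            overhangs.append(overhang)
        start = seq.find(BSAI_SITE, start + 1)
    return overhangs

# Made-up part: the designer chooses the four bases after "GGTCTC" + 1 nt.
part = "TTGGTCTCAAATGCCCC"
print(bsai_overhangs(part))
```

Because the overhang lies outside the recognition site, its four bases are free to design, which is what lets each junction be made unique.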
Model this assembly method with Benchling or Asimov Kernel!
Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository
Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository
Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook
Construct Glyphs
Model: color-coded cassettes, includes pUC-SpecR v1 backbone
My Build: same 3 cassettes, no backbone, monochrome glyphs
Simulation Results
Model: 24h, clean phase separation, transcripts named by repressor
My Build: 72h, oscillation sustained but curves heavily overlapping
| | Model | My Build |
|---|---|---|
| Backbone | pUC-SpecR v1 included | Not added |
| Duration | 24 hours | 72 hours |
| Oscillation | Clear phase separation between curves | Sustained but three curves blur together |
| RNAP flux pattern | Stepped bars (1.57 / 0.65 / 2.87) | Similar stepped pattern (3.1 / 1.25 / 0.65) |
| Noise bands | Moderate spread | Wider spread |
Build three of your own Constructs using the parts in the Characterized Bacterials Parts Repo
Explain in the Notebook Entry how you think each of the Constructs should function
Run the simulator and share your results in the Notebook Entry
Two cassettes mutually silence each other. The system snaps to one of two stable states — either LacI is high and TetR is low, or vice versa. Acts as a bistable memory switch: once flipped, it holds its state.
No → bistable lock Expect: one protein high, one flat zero
2 — NOR Gate pAmtR → AmtR ⟐ pPsrA → PsrA Both repress pAmeR → LambdaCI
Two input repressors each independently silence the output promoter pAmeR. LambdaCI is only produced when neither AmtR nor PsrA is present — a true NOR logic gate.
A two-stage repression cascade. When the upstream signal (pAmtR) is active, it silences the chain, keeping output OFF. Remove the signal → repression lifts through both stages → LambdaCI output turns ON.
Signal present → Output OFF Signal removed → Output ON
| | Toggle Switch | NOR Gate | Inducible Reporter |
|---|---|---|---|
| Cassettes | 2 | 3 | 3 |
| Logic | Bistable memory | NOR (A=0 AND B=0) | Signal-gated ON/OFF |
| Output when inputs silent | Locked state | ON | ON |
| Key behaviour | Snap to one stable state | Universal logic gate | Controlled expression |
| Ideal sim duration | 24h | 24h | 48h |
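The toggle switch's expected "snap to one stable state" behaviour can be sketched with a simple dimensionless model of two mutually repressing genes, integrated with Euler steps. This is not the model used by the simulator; the equations, the parameter values (`alpha`, `n`), and the function name are assumptions chosen only to show bistability.

```python
def simulate_toggle(laci0, tetr0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Integrate d(LacI)/dt = alpha / (1 + TetR^n) - LacI and the mirrored
    equation for TetR; return the final (LacI, TetR) levels."""
    laci, tetr = laci0, tetr0
    for _ in range(steps):
        d_laci = alpha / (1.0 + tetr ** n) - laci
        d_tetr = alpha / (1.0 + laci ** n) - tetr
        laci, tetr = laci + dt * d_laci, tetr + dt * d_tetr
    return laci, tetr

# Same circuit, two different starting points.
laci_a, tetr_a = simulate_toggle(laci0=5.0, tetr0=0.1)
laci_b, tetr_b = simulate_toggle(laci0=0.1, tetr0=5.0)

print(f"Start LacI-high: LacI={laci_a:.2f}, TetR={tetr_a:.2f}")
print(f"Start TetR-high: LacI={laci_b:.2f}, TetR={tetr_b:.2f}")
```

Whichever repressor starts ahead ends up high while the other is held low, which is the memory behaviour expected from the two-cassette design.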
Week 7 HW: Genetic Circuits Part II
Intracellular Artificial Neural Networks
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits operate on Boolean logic (AND, OR, NOT), which digitizes biological signals into strict ON (1) or OFF (0) states. IANNs, which operate on analog logic, allows for
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.
Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
Layer 2 is an INHIBIT gate: X3 is the excitatory input (fluorescent protein mRNA), RNase2 from Layer 1 is the inhibitory input, and fluorescence only appears when X3 is present and Layer 1 has successfully suppressed RNase2 via RNase1.
An intracellular two-layer perceptron in which Layer 1 produces an endoribonuclease that post-transcriptionally regulates the Layer 2 fluorescent protein output.
Fungal Materials
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
Most existing fungal materials are made from mycelium and are used for biopackaging and for fungal leather and textiles. The main advantage is sustainability: mycelium is 100% compostable and makes efficient use of resources. The downside is that it is susceptible to moisture, and the living nature of the biomaterial makes standardization harder.
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
Fungi could be useful in tackling environmental issues; for example, they could be engineered to absorb and sequester heavy metals and radioactive waste from contaminated soil.
Fungi is better than bacteria because it’s a fun guy! (not funny..)
Week 9 HW: Cell Free Systems
General homework questions
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Describe the main components of a cell-free expression system and explain the role of each component.
Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.
Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
Homework question from Kate Adamala
Design an example of a useful synthetic minimal cell as follows:
Pick a function and describe it.
What would your synthetic cell do? What is the input and what is the output?
Would this function be realized by cell-free Tx/Tl alone, without encapsulation?
Could this function be realized by genetically modified natural cell?
Describe the desired outcome of your synthetic cell operation.
Design all components that would need to be part of your synthetic cell.
What would be the membrane made of?
What would you encapsulate inside? Enzymes, small molecules.
Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)
How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)
Experimental details
List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
How will you measure the function of your system?
Homework question from Peter Nguyen
Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:
Write a one-sentence summary pitch sentence describing your concept.
How will the idea work, in more detail? Write 3-4 sentences or more.
What societal challenge or market need will this address?
How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?
Homework question from Ally Huang
Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!
For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .
Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)
Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)
Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)
Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)
Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)
Week 10 HW: Imaging and Measurement
Week 11 HW: Bioproduction and Cloud Labs
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork
This is a lovely piece of art created by HTGAA community, I love the bio elements and the niche reference to DNA yay.
I received the link but forgot to contribute. But that wasn’t intentional because maybe someday I will return as a TA for this course.
Although with my pathetic knowledge in bio I will probably get fired on the spot.
What I really liked about the project is the creative use of color palette and the layout of words, and also the fact that I was able to see the quantitative recollection of people’s contribution.
Part B: Cell-Free Protein Synthesis | Cell-Free Reagents
Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.
E. coli Lysate
| Component | Role |
|---|---|
| BL21(DE3) Star Lysate (with T7 RNA Polymerase) | Provides the complete transcription and translation machinery: ribosomes, tRNAs, aminoacyl-tRNA synthetases, initiation/elongation factors, and chaperones. The DE3 genomic insertion encodes T7 RNA Polymerase, enabling high-efficiency transcription from T7 promoter-driven DNA templates. |
Salts and Buffer
| Component | Role |
|---|---|
| Potassium Glutamate | Primary K⁺ source for ribosome function and osmotic balance; glutamate is a preferred counterion over Cl⁻, which is inhibitory to translation |
| HEPES-KOH pH 7.5 | Maintains physiological pH to stabilize enzymatic activity throughout the reaction |
| Magnesium Glutamate | Supplies Mg²⁺, essential for ribosome assembly, RNA structural integrity, and phosphotransfer reactions |
| Potassium phosphate (monobasic/dibasic) | Provides a phosphate buffer reserve and inorganic phosphate for nucleotide phosphorylation reactions |
Energy and Nucleotide System
| Component | Role |
|---|---|
| Glucose | Primary carbon and energy source; feeds glycolysis to drive ATP regeneration and downstream metabolism |
| Ribose | Enters the pentose phosphate pathway (PPP) to generate PRPP for nucleotide salvage and NADPH for redox balance |
| AMP, CMP, GMP, UMP | Nucleoside monophosphates (NMPs) serve as transcription precursors, phosphorylated in situ to NTPs by endogenous kinases |
| Guanine | Free nucleobase salvaged via HGPRT to produce GMP, supplementing the GTP pool for transcription (see Bonus) |
Translation Mix (Amino Acids)
| Component | Role |
|---|---|
| 17 Amino Acid Mix | Provides the bulk substrates required for ribosomal translation and polypeptide elongation |
| Tyrosine | Added separately due to its poor aqueous solubility at neutral pH; typically prepared as a pH 12 suspension |
| Cysteine | Added separately due to oxidation sensitivity and reactivity; prone to disulfide formation in mixed stock solutions |
Additives
| Component | Role |
|---|---|
| Nicotinamide | NAD⁺ precursor (vitamin B3) that sustains the redox cofactor pool required for glycolysis and energy metabolism; also inhibits NAD⁺-consuming enzymes (e.g., sirtuins, PARPs) that would otherwise deplete the pool |
Backfill
| Component | Role |
|---|---|
| Nuclease-Free Water | Brings the reaction to final volume without introducing RNases that would degrade mRNA templates or tRNAs |
Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix. (2-3 sentences)
The 1-hour PEP-NTP system supplies NTPs directly (ATP 1.5 mM, GTP 1.5 mM, CTP/UTP 875 µM each) alongside phosphoenolpyruvate (PEP-Mono, 17.5 mM) and Maltodextrin as fast-acting energy donors — this provides immediate substrates for transcription and translation but is short-lived because PEP is rapidly exhausted and accumulating inorganic phosphate (Pᵢ) inhibits the reaction.
The 20-hour NMP-Ribose-Glucose system instead supplies nucleoside monophosphates (AMP, CMP, UMP) and substitutes GMP entirely with free Guanine (200 µM), relying on endogenous cellular enzymes to phosphorylate NMPs to NTPs using metabolic energy regenerated from Ribose (77.4 mM) and Glucose (6.9 mM), which avoids rapid Pᵢ accumulation and sustains productive synthesis far longer. The PEP-NTP formulation also includes a richer additive cocktail (Spermidine, DMSO, cAMP, NAD, Folinic Acid) to maximize short-burst translation efficiency, whereas the NMP-Ribose system is simplified to Nicotinamide alone and compensates with higher amino acid concentrations (~4.1 mM vs. 2.5 mM) to support extended protein production.
Bonus question: How can transcription occur if GMP is not included but Guanine is?
Guanine is converted to GMP via the purine salvage pathway:
Guanine + PRPP →(HGPRT)→ GMP + PPi
PRPP (5-phosphoribosyl-1-pyrophosphate) is generated from ribose-5-phosphate, a product of the pentose phosphate pathway fed by ribose. The GMP produced is then sequentially phosphorylated by endogenous kinases:
GMP →(Guanylate kinase)→ GDP →(NDP kinase)→ GTP
GTP is the actual substrate incorporated by T7 RNAP during transcription. Using free Guanine rather than GMP is both cost-effective and avoids the chemical instability of pre-formed GTP in the reaction mix — the lysate’s endogenous HGPRT activity handles the conversion efficiently.
Image 1 (Mid-run photograph): The photograph taken during electrophoresis shows the gel submerged in TAE within the gel box. Two colored dye fronts are faintly visible — a blue band and a dark purple band — but they appear localized to only one or two lanes. The majority of the gel appears empty, with no visible dye migration in the other wells. This is already an early indicator that most wells were either not loaded successfully or contained insufficient DNA.
Image 2 (GeneSnap image): The final imaging result is largely dark. Only a single lane shows any detectable fluorescence — a faint, somewhat smeared signal concentrated in what appears to be one lane, with no clearly resolved discrete bands. The remaining lanes are entirely blank. This represents an unsuccessful gel run in terms of the intended gel art pattern.
Analysis of What Went Wrong
Based on the observations made during lab sessions and the photographic evidence, several compounding factors likely contributed to the result:
Pipetting error during well loading. When I was loading the fourth slot, the pipette tip was not properly inserted into the well. This is a critical failure point. In submerged gel electrophoresis, the wells are filled with buffer. The loading dye’s density causes the sample to sink — but only if it is dispensed directly into the well. If the tip hovers above the well or is positioned outside it, the sample disperses into the surrounding buffer and is effectively lost. This likely explains why most lanes are empty on the final image.
Insufficient electrophoresis run time due to electrical issues. There was an unforeseen electrical short circuit that cut the run time short. This is consistent with the imaging result — even in the one lane that has signal, the DNA has not migrated very far, and there is no clear band resolution. A truncated run means fragments have not separated sufficiently, resulting in a compressed, smeared appearance rather than discrete bands. The faint dye fronts visible in Image 1 also suggest limited migration distance.
Potential variability in reaction preparation. Differences in mixing or component proportions across the PCR tubes may also have contributed. If the Lambda DNA stock was not thoroughly vortexed or flicked, its concentration could vary between tubes. Similarly, enzyme or buffer pipetting errors at the 1–3 μL scale are common and can result in incomplete digestion or no digestion at all, though the imaging suggests the bigger problem was DNA not being present in the wells at all.
Low overall signal intensity. Even the one visible lane is quite faint. This could indicate that the total DNA mass loaded was below the detection threshold of SYBR Safe under blue light excitation. With 1.5 μg of Lambda DNA per reaction and SYBR Safe staining, bands should normally be clearly visible. The faintness suggests either DNA was lost during loading, the stain was not adequately mixed into the gel, or the transilluminator exposure settings were suboptimal.
Week 3 Lab: Opentrons Art
Week 6 Lab: Gibson Assembly
Overview
In this experiment we engineer color variants of the purple Acropora millepora chromoprotein (amilCP) by introducing targeted mutations at the chromophore (CP) site: cagTGTCAGtac. Substituting the TGTCAG hexamer with variant codons shifts the expressed color to orange, pink, magenta, or blue, as described by Liljeruhm et al. (2018).
Part 1 covers the preparation of two PCR fragments — a Backbone fragment and a Color insert fragment — which will be joined by Gibson Assembly and transformed into E. coli in Part 2.
Two parallel PCR reactions were prepared on ice using the mUAV plasmid as template. The Backbone reaction amplifies the vector (ori + CmR + promoter + RBS), while the Color reaction amplifies the chromophore region with a mutant forward primer that introduces the desired codon substitution at the CP site.
Fig. 1 — Completed PCR reaction setup tables for Backbone and Color fragments.
Reagent Tables
Backbone DNA Fragment (Primers: Backbone Fwd + Backbone Rev)

| Reagent | Stock Conc. | Desired Conc. | Volume (µL) |
|---|---|---|---|
| Template mUAV Plasmid | 38.5 ng/µL | 20 ng/µL | 0.8 |
| Backbone Forward Primer | 5 µM | 0.5 µM | 2.5 |
| Backbone Reverse Primer | 5 µM | 0.5 µM | 2.5 |
| Phusion HF PCR Mix | 2× | 1× | 12.5 |
| Nuclease-free water | — | — | 6.8 |
| Total Volume | | | 25.0 |
Color DNA Fragment (Primers: Color Fwd + Color Rev)

| Reagent | Stock Conc. | Desired Conc. | Volume (µL) |
|---|---|---|---|
| Template mUAV Plasmid | 38.5 ng/µL | 20 ng/µL | 0.8 |
| Color Forward Primer | 5 µM | 0.5 µM | 2.5 |
| Color Reverse Primer | 5 µM | 0.5 µM | 2.5 |
| Phusion HF PCR Mix | 2× | 1× | 12.5 |
| Nuclease-free water | — | — | 6.8 |
| Total Volume | | | 25.0 |
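The volumes in the two reagent tables follow from the dilution relation C1·V1 = C2·V2. The sketch below reproduces that arithmetic for a 25 µL reaction; the function and variable names are illustrative and not part of the lab protocol. Under these numbers, 0.8 µL of the 38.5 ng/µL stock delivers about 30.8 ng of template, and water by subtraction comes to 6.7 µL, within 0.1 µL of the recorded 6.8 µL (presumably a rounding difference).

```python
def stock_volume_ul(stock_conc, final_conc, total_ul):
    """Solve C1*V1 = C2*V2 for V1; both concentrations must share units."""
    return final_conc * total_ul / stock_conc


TOTAL_UL = 25.0

primer_ul = stock_volume_ul(5.0, 0.5, TOTAL_UL)  # 5 uM stock -> 0.5 uM final
mix_ul = stock_volume_ul(2.0, 1.0, TOTAL_UL)     # 2x master mix -> 1x final

template_ul = 0.8                 # volume recorded in the tables
template_ng = template_ul * 38.5  # plasmid mass delivered by that volume

# Water fills whatever volume the other components leave.
water_ul = TOTAL_UL - (template_ul + 2 * primer_ul + mix_ul)

print(f"primer (each): {primer_ul:.1f} uL, master mix: {mix_ul:.1f} uL")
print(f"template mass: {template_ng:.1f} ng, water: {water_ul:.1f} uL")
```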
Thermocycler Programs
Backbone Fragment (BB_PCR) — run on Bio-Rad T100, 25 µL volume
Fig. 2a — PCR tubes labeled on ice prior to thermocycler loading.
Fig. 2b — Bio-Rad T100 Thermal Cycler running the BB_PCR program (57°C anneal, 26 cycles, 25 µL volume).
The Color forward primer carries an intentional mismatch in the 6-bp chromophore region (e.g. TGTCAG → GTTGGA for orange). Because the mismatch sits in the 5′ overhang, Phusion polymerase still extends efficiently from the matched 3′ binding region. The mutation is thus incorporated into every PCR copy and all downstream clones.
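As a small illustration of the substitution described above, the sketch below swaps the wildtype hexamer inside the CP site and counts how many of the six positions differ. The helper names are hypothetical; only the site sequence and the orange hexamer come from the text.

```python
CP_SITE_WT = "cagTGTCAGtac"  # chromophore site, wildtype hexamer in upper case
WT_HEXAMER = "TGTCAG"


def mutate_cp_site(site, new_hexamer):
    """Return the CP site with the wildtype hexamer replaced by new_hexamer."""
    if len(new_hexamer) != 6:
        raise ValueError("replacement must be exactly 6 nt")
    if WT_HEXAMER not in site:
        raise ValueError("wildtype hexamer not found in site")
    return site.replace(WT_HEXAMER, new_hexamer.upper(), 1)


def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


orange_site = mutate_cp_site(CP_SITE_WT, "GTTGGA")
print(orange_site)                                  # cagGTTGGAtac
print(count_mismatches(WT_HEXAMER, "GTTGGA"))       # 5
```

The flanking lower-case bases are untouched, so the reading frame around the chromophore is preserved.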
Part 1a — DpnI Digest
⏱ Time estimate: 45 min at 37°C
After PCR, 1 µL of DpnI was added directly to each 25 µL reaction and incubated at 37°C for 30–60 minutes. DpnI recognises methylated 5′-Gm6ATC-3′ sequences present on E. coli-propagated plasmid template, but absent from unmethylated PCR products. The enzyme therefore selectively digests the parental template while leaving new amplicons intact.
Residual un-digested template will generate wildtype (purple) background colonies that compete with and obscure your color-mutant transformants.
Part 1b — DNA Purification & Quantification
⏱ Time estimate: 30 min
PCR products were purified using the Zymo DNA Clean & Concentrator kit (silica-column adsorption) to remove primers, dNTPs, polymerase, and buffer salts before Gibson Assembly.
Equipment & Consumables
Zymo DNA Clean & Concentrator kit (columns + buffers)
Add 50 µL PCR product + 250 µL DNA Binding Buffer to a 1.5 mL tube. Vortex briefly.
Transfer all 300 µL to a Zymo-Spin Column seated in a Collection Tube. Centrifuge 1 min at 13,000 rpm. Discard flow-through; keep the collection tube.
Add 200 µL Wash Buffer. Centrifuge 1 min. Discard flow-through. Repeat once (2 washes total). Transfer column to a fresh 1.5 mL tube; discard the collection tube.
Add 6 µL nuclease-free water directly to the column membrane. Rest at room temperature for 2 min. Centrifuge 1 min. Collect and save the elution.
Measure concentration on Nanodrop: 2 µL per read. Target ≥ 30 ng/µL, A260/A280 ≈ 1.8–2.0.
Fig. 3 — Eppendorf Centrifuge 5415C used for all column spin steps at 13,000 rpm.
Part 1c — Diagnostic Gel Electrophoresis
⏱ Time estimate: ~15 min at 100 V
Purified fragments were run on a 1% agarose E-Gel EX (Invitrogen) to confirm fragment sizes. Each lane received 3 µL sample + 3.3 µL 6× Loading Dye. DNA ladder loaded in lane M (leftmost).
Fig. 4 — 1% agarose E-Gel EX result. Lanes M (ladder) and 1–5 loaded; lanes 6–10 empty.
Band Interpretation
| Lane | Observation | Interpretation |
|---|---|---|
| M | Ladder bands across full range | Reference marker |
| 1 | Faint band ~400–500 bp | Likely primer-dimer or low-yield non-specific product |
| 2 | Faint band, similar to lane 1 | Same as above; low amplification |
| 3 | Bright band ~600–750 bp | Color insert fragment (~700 bp); strong, clean yield |
| 4 | Faint lower band | Minor non-specific; likely negligible for downstream steps |
| 5 | Bright band ~2.7–2.9 kb | Backbone fragment (~2800 bp); strong, clean yield |
| 6–10 | Empty | — |
Expected fragment sizes:
Backbone: ~2800 bp (ori + CmR + promoter + RBS)
Color insert: ~700 bp (24 bp upstream of CP site + chromophore + terminator)
Lanes 3 and 5 show bright, clean bands at the expected sizes for Color insert and Backbone respectively. Faint bands in lanes 1, 2, and 4 represent minor non-specific products that will be diluted out during Gibson Assembly and will not affect the outcome. Both fragments are confirmed — proceed to Gibson Assembly.
All colonies across every plate — regardless of intended color variant (blue, pink, light pink) — express a uniform blue-purple color consistent with wildtype amilCP. The intended color shifts to pink or blue did not appear. One notable exception is the red-circled colony in the final plate (Fig. 5h), which is transparent/colorless.
Analysis
Imbalanced Insert:Backbone Molar Ratio
Gibson Assembly outcome is highly sensitive to the molar ratio of insert to backbone, not just the volumes used. The protocol specifies 0.5 µL backbone and 1.0 µL insert — but those volumes assume both fragments are at exactly the stated concentrations after purification.
When backbone is in excess, the probability of the two backbone ends annealing to each other increases sharply — rather than each end finding the insert:
Too much backbone → backbone ends self-anneal → re-circularization
→ carries CmR + original amilCP promoter
→ colonies survive selection AND express wildtype purple
Because the ratio imbalance originates in the Gibson reaction before transformation, it would affect all three volume groups (2µL, 4µL, 7µL) equally — explaining the consistency of the wildtype purple outcome across all plates.
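To make the molar-ratio argument concrete, the sketch below converts fragment masses to picomoles using the common approximation of about 650 Da per base pair of double-stranded DNA. The fragment lengths (700 bp and 2800 bp) and pipetted volumes (1.0 µL insert, 0.5 µL backbone) come from this lab; the concentrations passed in are hypothetical, since the actual Nanodrop readings are not recorded here.

```python
DA_PER_BP = 650.0  # approximate average mass of one dsDNA base pair, in daltons


def pmol_dsdna(mass_ng, length_bp):
    """Convert a dsDNA mass in ng to pmol for a fragment of length_bp."""
    return mass_ng * 1000.0 / (length_bp * DA_PER_BP)


def insert_to_backbone_ratio(insert_ng_per_ul, backbone_ng_per_ul,
                             insert_ul=1.0, backbone_ul=0.5,
                             insert_bp=700, backbone_bp=2800):
    """Molar insert:backbone ratio for the pipetted volumes."""
    insert_pmol = pmol_dsdna(insert_ng_per_ul * insert_ul, insert_bp)
    backbone_pmol = pmol_dsdna(backbone_ng_per_ul * backbone_ul, backbone_bp)
    return insert_pmol / backbone_pmol


# Hypothetical concentrations: equal yields vs. a 10x more concentrated backbone.
balanced = insert_to_backbone_ratio(30.0, 30.0)
backbone_heavy = insert_to_backbone_ratio(30.0, 300.0)
print(f"equal concentrations: {balanced:.1f}:1 insert:backbone")
print(f"backbone at 10x:      {backbone_heavy:.1f}:1 insert:backbone")
```

With the same pipetted volumes, a tenfold difference in backbone concentration moves the reaction from insert excess to backbone excess, which is why the ratio should be computed from measured concentrations rather than assumed from volumes.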
Transparent Colony
The transparent colony represents a partial success. It is consistent with a scenario where the backbone reassembled without the color insert: the Gibson exonuclease chewed back both ends of the backbone, they annealed to each other rather than to the insert, and ligase sealed the nick. The result is a backbone-only plasmid that carries CmR but lacks the amilCP CDS entirely, hence no color.
Alternatively, the insert was incorporated but with a frameshift or premature stop codon introduced during the Gibson join, knocking out chromoprotein expression without replacing it with a new color.
Either way, this colony is evidence that the Gibson Assembly chemistry was active and processing DNA correctly. The colorless result is not a failure — it is a partial success where the backbone was modified but the color swap did not complete as intended.
The wildtype blue-purple across all other colonies most likely reflects surviving template from incomplete DpnI digestion, while the single transparent colony shows that at least one genuine assembly event occurred.
Week 7 Lab: NeuroMorphic Circuits
For the neuromorphic circuit, our group aimed to design an “L”-shaped heatmap. We added two biases, corresponding to the X1 and X2 ERNs.
Looked perfect
I think we might’ve submitted the wrong file ahahaha, so the final output only displayed the bottom part of the “L”
Each dot in these scatterplots represents a single human cell. The color shows the level of output (mNeonGreen) as a function of X1 and X2 and, optionally, varying levels of bias.
Week 11 Lab: Cloud Labs
Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems (hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each).
The amino acid sequences are shown in the HTGAA Cell-Free Benchling folder.
sfGFP: primary advantage is robust folding kinetics; it is engineered to fold correctly even when fused to insoluble proteins, making it highly resistant to aggregation in the crowded environment of a cell-free extract.
mRFP1: characterized by slow maturation kinetics and a tendency for photobleaching; the delay between peptide synthesis and chromophore formation can lead to an underestimation of protein yield in short-term reactions.
mKO2: features fast maturation and oxygen dependence; while it reaches peak fluorescence quickly, the final oxidative step of chromophore formation requires sufficient O2 levels, which may become limiting in deep-well plates.
mTurquoise2: known for high quantum yield and acid stability; its low pKa makes it less sensitive to the pH drops that naturally occur as metabolic byproducts (like organic acids) accumulate during long-term cell-free incubation.
mScarlet_I: a high-brightness variant with accelerated maturation compared to earlier red FPs; however, it remains sensitive to the oxidative environment, as oxygen is required to complete the cyclization of its chromophore.
Electra2: optimized for ultra-fast maturation; its rapid “time-to-bright” makes it the ideal candidate for real-time monitoring of transcription-translation (TX-TL) kinetics where immediate feedback is required.
Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.
Protein: mScarlet_I
Reagent Adjustment: Increase Glucose and Nicotinamide concentrations while utilizing a semi-permeable reaction seal.
Expected Effect: In a 36-hour run, the primary bottleneck for a bright red FP like mScarlet_I is the depletion of energy and the requirement for oxygen for chromophore maturation. By increasing Glucose and Nicotinamide, we extend the metabolic “runway” for ATP regeneration via the NMP-Ribose-Glucose pathway; combining this with a semi-permeable seal ensures a constant influx of O2 to drive the oxidative maturation of the chromophore, thereby maximizing the total fluorescent signal over the extended incubation period.
The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by 4/24). Make sure that your final project slide is in the slide deck below to be included!
The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (TBD!).
PosmY is an osmotic and nutrient-responsive promoter that becomes transcriptionally active as available carbon sources and nitrogen sources deplete in the surrounding media. As bacteria consume LB broth over hours, PosmY activity increases and mCherry accumulates. mCherry is a fast-maturing, photostable red fluorescent protein.
Excitation: 590nm (amber LED)
Emission: ~610nm (red)
Response time: 2–8 hours (accumulation-based; slow and persistent)
Visual: mCherry emission is visible to the naked eye as a red glow through translucent PDMS at high expression — this is the intended aesthetic centrepiece of the device
Readout method: OPT101 photodiode positioned below the optics stack reads integrated red fluorescence intensity; also directly visible through the PDMS top surface
Why mCherry specifically:
mCherry’s 610nm emission is far enough from cellular autofluorescence (peak ~500–550nm) that a simple longpass filter rejects most of that background. Its Stokes shift is small, however (590nm ex → 610nm em, about 20nm), so the filter cutoff has to sit between the amber LED excitation and the emission peak to keep scattered excitation light out of the reading. It matures faster than mKate or mRFP and is more photostable under repeated amber LED illumination than DsRed variants.
Chassis: E. coli K12
Ideally a thyA⁻ auxotrophic strain (cannot survive outside thymidine-supplemented media). BSL-1 organism. Standard good microbiological practice applies. Transformed with pBad/T7 or equivalent low-copy plasmid carrying PosmY→mCherry + antibiotic resistance marker (kanamycin recommended; ampicillin degrades in media and loses selective pressure over hours).
Bacterial chamber design — hybrid agar approach
Bottom layer: 0.5–0.8mm of 0.5% low-melting-point agarose in LB, with bacteria embedded at OD600 ~0.25–0.5
Top layer: 0.7–1.2mm of liquid LB + kanamycin as a nutrient reservoir
Total well depth: 1.5–2.0mm (set by spacer frame)
Well diameter: 7mm (recommended)
The agar layer immobilizes cells, giving consistent geometry for fluorescence reads and EIS measurements. The liquid layer on top depletes over hours to drive PosmY induction (hunger signal) and acidification (pH signal).
Lab Protocol: Bio-Tamagotchi Metabolic State Validation
Project: PosmY-mCherry Induction & Recovery Characterization
Objective: To quantitatively validate the transition of E. coli through three metabolic states—Thriving, Hungry, and Recovering—using fluorescence intensity (RFU) and optical density (OD600) as indicators of osmotic stress response.
1. Equipment and Materials
1.1 Equipment
| Item | Specifications | Function |
|---|---|---|
| Multi-mode Microplate Reader | Capable of OD600 (±5 nm) and fluorescence (Ex: 587 nm / Em: 610 nm for mCherry) | Performs kinetic assays measuring bacterial density and mCherry expression |
| Inverted Fluorescence Microscope | mCherry filter set (Ex: 580/20 nm, Em: 630/60 nm) | Visual validation of fluorescence in PDMS microfluidic chambers |
| Shaking Incubator | 37°C ± 0.5°C, orbital shaking at 220 RPM | Aerobic growth conditions for starter cultures |
| Vacuum Desiccator | -15 to -20 inHg | Degassing PDMS during device fabrication |
| Pipettes | P20, P200, P1000 with sterile filtered tips | Aseptic liquid handling |
1.2 Biologicals & Reagents
| Item | Specifications | Function |
|---|---|---|
| E. coli Biosensor Strain | DH5α or similar, transformed with pTwist-Cm-PosmY-mCherry | |
Use two-tailed Student’s t-test for pairwise comparisons
Report p-values and indicate significance thresholds (p < 0.05*, p < 0.01**, p < 0.001***)
For multiple timepoints, consider Bonferroni correction
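A minimal sketch of how the significance labels and the Bonferroni correction above could be applied once p-values are in hand. It does not compute the t-test itself; the function name and the example p-values are hypothetical.

```python
def significance_label(p_value, n_comparisons=1):
    """Return a star label after Bonferroni adjustment (p multiplied by the
    number of comparisons and capped at 1.0)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    p_adj = min(1.0, p_value * n_comparisons)
    if p_adj < 0.001:
        return "***"
    if p_adj < 0.01:
        return "**"
    if p_adj < 0.05:
        return "*"
    return "ns"


# Hypothetical p-values: a single comparison vs. the same test at 5 timepoints.
print(significance_label(0.004))                    # one comparison
print(significance_label(0.004, n_comparisons=5))   # adjusted p = 0.02
print(significance_label(0.03, n_comparisons=5))    # adjusted p = 0.15
```

Multiplying the p-value by the number of comparisons is equivalent to dividing each threshold by that number, so the star thresholds stay as stated above.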
4. Results Presentation
4.1 Primary Figure: Dual-Axis Growth and Fluorescence Plot
Figure 1: Bio-Tamagotchi Metabolic State Transitions
Plot Specifications:
X-axis: Time (hours, 0–24 h)
Left Y-axis: OD600 (Growth, log scale or linear 0–5)
Right Y-axis: Normalized Fluorescence (RFU/OD600, linear scale)
Data to Include:
OD600 as solid blue line with error bars (SEM, n=3)
Normalized fluorescence as solid red line with error bars (SEM, n=3)
Vertical dashed line indicating feeding event (e.g., t = 12 h)
Annotate three states with shaded regions or labels:
Thriving (Green): t = 0–6 h, low normalized RFU
Hungry (Orange): t = 6–12 h, rising normalized RFU
Recovery (Blue): t = 12–20 h, declining normalized RFU
Caption Example:
“Figure 1. Characterization of PosmY-mCherry biosensor dynamics. E. coli cultures were monitored for growth (OD600, blue) and osmotic stress response (normalized mCherry fluorescence, red) over 24 hours. Nutrient depletion triggers PosmY activation (Hungry state), which is reversed upon feeding with 10× LB (dashed line, t = 12 h). Data represent mean ± SEM (n=3 biological replicates).”
4.2 Supporting Figure: Microscopy Comparison
Figure 2: Single-Cell Fluorescence Validation
Panel A: Phase contrast image of cells at t = 2 h (Thriving)
Panel B: mCherry fluorescence of same field (Thriving) - minimal signal
Panel C: Phase contrast image of cells at t = 12 h (Hungry)
Panel D: mCherry fluorescence of same field (Hungry) - bright cytoplasmic signal
Include scale bar (5 µm)
Use identical exposure times across A-B and C-D pairs
Inset histogram showing fluorescence intensity distribution (optional)
Caption Example:
“Figure 2. Single-cell validation of metabolic state. Representative micrographs showing (A,B) Thriving cells with minimal mCherry expression and (C,D) Hungry cells with elevated osmotic stress response. Scale bar = 5 µm.”
4.3 Quantitative Summary Table
| Metric | Value | Units |
|---|---|---|
| Baseline Fluorescence (Thriving) | 150 ± 20 | RFU/OD |
| Peak Fluorescence (Hungry) | 1200 ± 180 | RFU/OD |
| Fold Induction | 8.0 ± 1.2 | Fold |
| Time to Peak | 11.5 ± 0.8 | Hours |
| Recovery Half-Time | 2.3 ± 0.4 | Hours |
| Growth Rate (Exponential Phase) | 0.65 ± 0.05 | h⁻¹ |
Values reported as mean ± SD from n=3 independent experiments
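The summary metrics can be derived from raw plate-reader values roughly as follows. This is a sketch: the baseline and peak means are taken from the table, while the replicate readings are hypothetical placeholders chosen only to show the calculation.

```python
import statistics


def normalized_rfu(rfu, od600):
    """Fluorescence per unit optical density (RFU/OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return rfu / od600


def mean_sd_sem(values):
    """Mean, sample standard deviation, and standard error of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / len(values) ** 0.5


def fold_induction(peak, baseline):
    return peak / baseline


# Means from the summary table (RFU/OD).
fold = fold_induction(1200.0, 150.0)

# Hypothetical (RFU, OD600) pairs for three biological replicates at peak.
readings = [(570.0, 0.5), (600.0, 0.5), (630.0, 0.5)]
peak_mean, peak_sd, peak_sem = mean_sd_sem(
    [normalized_rfu(rfu, od) for rfu, od in readings]
)
print(f"fold induction: {fold:.1f}")
print(f"peak: {peak_mean:.0f} +/- {peak_sd:.0f} SD ({peak_sem:.1f} SEM)")
```

Reporting SD and SEM side by side makes it easier to keep figure error bars and table values on the same footing.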
5. Troubleshooting Guide
| Problem | Possible Cause | Solution |
|---|---|---|
| No fluorescence increase | Plasmid loss | Re-streak from glycerol stock; verify Cm resistance |
Recovery State: Nutrient restoration reduces osmotic stress; pre-existing mCherry dilutes through cell division, signal decays
Relevance:
This system models a simplified “digital pet” where fluorescence serves as a real-time readout of bacterial metabolic health, demonstrating principles of:
Stress-responsive promoters in synthetic biology
Quantitative phenotyping via plate-based assays
Dynamic feedback systems in living cells
## 7. References and Further Reading
1. **P<sub>osmY</sub> Promoter Characterization:**
Yim, H. H., & Villarejo, M. (1992). osmY, a new hyperosmotically inducible gene, encodes a periplasmic protein in *Escherichia coli*. *Journal of Bacteriology*, 174(11), 3637-3644.
2. **mCherry Fluorescent Protein:**
Shaner, N. C., et al. (2004). Improved monomeric red, orange and yellow fluorescent proteins derived from *Discosoma* sp. red fluorescent protein. *Nature Biotechnology*, 22(12), 1567-1572.
3. **Microplate Reader Assays:**
Myers, J. A., et al. (2013). Improving accuracy of cell and chromophore concentration measurements using optical density. *BMC Biophysics*, 6(1), 4.
4. **Synthetic Biology Education:**
Smanski, M. J., et al. (2014). Functional optimization of gene clusters by combinatorial design and assembly. *Nature Biotechnology*, 32(12), 1241-1249.
---
## Appendix A: Preparation of 10× Concentrated LB
**Recipe (for 100 mL):**
- Tryptone: 10 g
- Yeast Extract: 5 g
- NaCl: 10 g
- Distilled H₂O: to 100 mL final volume
**Procedure:**
1. Weigh out components and dissolve in 80 mL distilled water with stirring
2. Adjust to 100 mL final volume
3. Autoclave at 121°C for 20 minutes
4. After cooling, filter-sterilize through 0.22 µm filter to remove particulates
5. Store at 4°C for up to 1 month; check for precipitates before use
**Quality Control:**
- Dilute 1:10 and measure pH (should be 7.0 ± 0.2)
- Perform growth test: diluted 10× LB should support E. coli growth equivalent to standard LB
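For other batch sizes, the component masses can be scaled from the 1× composition. The sketch below assumes the standard LB (Miller) formulation of 10 g tryptone, 5 g yeast extract, and 10 g NaCl per litre at 1×; the function name is illustrative.

```python
# Assumed 1x LB (Miller) composition, grams per litre.
LB_1X_G_PER_L = {"Tryptone": 10.0, "Yeast Extract": 5.0, "NaCl": 10.0}


def lb_recipe_g(fold, volume_ml):
    """Grams of each component for `volume_ml` of `fold`-times concentrated LB."""
    if fold <= 0 or volume_ml <= 0:
        raise ValueError("fold and volume_ml must be positive")
    return {
        name: grams_per_l * fold * volume_ml / 1000.0
        for name, grams_per_l in LB_1X_G_PER_L.items()
    }


recipe_10x_100ml = lb_recipe_g(fold=10, volume_ml=100)
for name, grams in recipe_10x_100ml.items():
    print(f"{name}: {grams:g} g")
```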
---
## Appendix B: Chloramphenicol Stock Preparation
**1000× Stock (34 mg/mL in 100% ethanol):**
1. Weigh 340 mg chloramphenicol in chemical fume hood
2. Dissolve in 10 mL absolute ethanol (molecular biology grade)
3. Mix by vortexing until fully dissolved
4. Aliquot into 1 mL microcentrifuge tubes
5. Store at -20°C (stable for 1 year)
6. Thaw aliquot before use; do not refreeze
**Working Concentration:** Add 1 µL of 1000× stock per 1 mL of media for final concentration of 34 µg/mL
Visionary Lookout:
## LAYER 3: ELECTROCHEMICAL SENSING SUBSYSTEM
This is the new core of v2. Two unengineered biological signals are read directly from the well using miniaturized electrodes embedded in the spacer frame.
### Signal 1: pH (Metabolic Acid Production)
**What it measures:** As E. coli actively metabolize LB (consuming amino acids, sugars, and organic nitrogen), they excrete metabolic byproducts including acetate, succinate, formate, and lactate. This naturally acidifies the media. A healthy, growing culture drops pH from ~7.2 (fresh LB) toward ~5.8–6.2 over 4–12 hours.
**Game meaning:** pH dropping = bacteria are metabolically active = "alive and growing." pH flat at baseline = dormant or freshly fed. pH flat at low value for extended time = nutrients exhausted, culture stressed.
**Electrode design:**
The pH electrode cannot be a standard glass probe — too fragile, too large, and reference junction too bulky for a 7mm well. Instead, use a two-electrode system threaded through holes drilled or printed into the spacer frame sidewall:
- Working electrode: IrOx-coated platinum wire, 0.5mm diameter, inserted through spacer frame wall at a 30° downward angle into the agar layer. IrOx (iridium oxide) has a Nernstian pH response of ~59mV/pH unit, is biocompatible, and can be electrodeposited onto Pt wire in the lab (protocol: cycle in 0.4mM IrCl₃ + 0.4mM oxalic acid solution, 25 cycles, −0.5V to +1.4V at 50mV/s).
- Reference electrode: Ag/AgCl wire (0.5mm diameter silver wire, chloridized by immersion in FeCl₃ solution for 10 min or by brief anodization in NaCl). Inserted through the opposite side of the spacer frame into the liquid LB layer. This provides a stable reference potential independent of solution composition.
**Signal conditioning:** IrOx pH electrodes have high source impedance (~MΩ). The signal must be buffered before reaching the Arduino analog pin. Use a single-supply rail-to-rail op-amp in voltage-follower configuration (unity gain buffer) between the working electrode output and Arduino A0:
- Recommended op-amp: MCP6001 (1.8–6V supply, rail-to-rail I/O, single-supply, SOT-23 package, ~$0.40). Wire: VDD to 5V, VSS to GND, V+ from IrOx wire, V− connected to Vout (follower config), Vout to A0.
- Expected output range: ~0.2V (pH 4) to ~1.0V (pH 8) relative to Ag/AgCl reference. Calibrate at startup using two-point calibration buffers (pH 4.0 and pH 7.0 standard buffer solutions).
**Arduino read:** Analog A0, 10-bit ADC. Calibrate slope (mV/pH) and intercept at device startup using known buffers. Store calibration constants in EEPROM.
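A quick estimate of what the 10-bit ADC can resolve, sketched in Python for clarity rather than as firmware. It assumes the Nano's default 5 V analog reference and no gain stage after the unity-gain buffer; the ~59 mV/pH slope is the Nernstian figure quoted above.

```python
ADC_BITS = 10
V_REF = 5.0              # assumed: default 5 V analog reference on the Nano
MV_PER_PH = 59.0         # Nernstian slope quoted for the IrOx electrode


def adc_to_volts(counts, v_ref=V_REF, bits=ADC_BITS):
    """Convert raw ADC counts to volts (one count = v_ref / 2**bits)."""
    return counts * v_ref / (2 ** bits)


mv_per_count = adc_to_volts(1) * 1000.0
ph_per_count = mv_per_count / MV_PER_PH
print(f"{mv_per_count:.2f} mV per count -> {ph_per_count:.3f} pH per count")
```

At roughly 0.08 pH per count, averaging many reads (as the calibration routine does) or adding gain before the ADC would help resolve the ΔpH thresholds used by the game logic.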
---
### Signal 2: Electrochemical Impedance Spectroscopy (EIS) via AD5933
**What it measures:** The AD5933 applies a small sinusoidal voltage across two electrodes in the well and measures the resulting current to extract the complex impedance (Z = R + jX) of the sample. In a bacterial culture:
- **Cell density:** As bacteria proliferate, they displace conductive medium and add membrane capacitance. Bulk resistance (R) increases with cell crowding at low frequencies; double-layer capacitance (X) increases with total cell membrane area. A growing healthy culture shows characteristic impedance changes detectable within 1–2 hours.
- **Membrane integrity:** Healthy cells maintain high intracellular ion concentration behind an intact lipid bilayer — this contributes a capacitive signature at 10–100kHz. Dead or lysed cells lose membrane integrity, causing characteristic phase angle shifts and capacitance drop. This distinguishes "cells present but dead" from "cells present and healthy."
**Game meaning:** Impedance stable in expected range = population healthy and intact. Impedance falling (phase angle shifting toward purely resistive) = cells dying, membranes compromised. Impedance rising abnormally = possible biofilm formation or media precipitation.
**IC: AD5933 (Analog Devices)**
- I2C interface, address 0x0D (no conflict with SSD1306 at 0x3C)
- Frequency range: 1kHz – 100kHz (programmable sweep)
- Output excitation: 200mV peak-to-peak (suitable for biological samples — non-damaging)
- Supply: 3.3V or 5V (tie to 3.3V to protect OLED on shared I2C bus)
- Returns: real (R) and imaginary (X) impedance components as 16-bit integers
**Gain-setting resistor (RFB):** Critical calibration component. RFB sets the transimpedance gain of the AD5933's internal current-to-voltage converter. It must be matched to the expected well impedance range. For LB agar + liquid layer at 7mm well diameter:
- Expected impedance: ~500Ω–5kΩ at 10–50kHz
- Recommended RFB: 1kΩ (1% tolerance) — this keeps the AD5933 output in its linear range for the expected well impedance. If impedance is consistently saturating or clipping (check by verifying |Z| = √(R²+X²) makes physical sense), swap RFB to 2.2kΩ or 510Ω accordingly.
**EIS electrode design:**
Two-electrode configuration (adequate for relative monitoring; 4-electrode Kelvin measurement is more accurate but unnecessary for game-state detection):
- Material: 0.5mm platinum wire (biocompatible, stable, doesn't corrode in LB, can be autoclaved)
- Placement: two Pt wires inserted through the spacer frame on opposing sides, penetrating 1–2mm into the agar layer. Space electrodes ~5mm apart (across the 7mm well diameter with ~1mm offset from center to avoid blocking the optical axis).
- Connection: short Teflon-insulated lead wires from spacer frame to PCB pads. The AD5933's VOUT (excitation output) connects to electrode 1; VIN (current measurement input) connects to electrode 2, with RFB wired between the RFB and VIN pins.
> **Important geometry note:** The EIS electrodes and the OPT101 optical axis must not conflict. The OPT101 sits directly below the coverslip. The EIS Pt wires enter from the sides of the spacer frame, parallel to the coverslip plane. At 30° downward angle, their tips sit in the agar without blocking the optical path. Verify clearance in the spacer frame design before printing.
**Calibration procedure at startup:**
1. Measure RFB resistance directly (known calibration resistor in place of well) — store as `Z_cal`
2. With well loaded, measure at 5 frequencies (e.g., 5, 10, 25, 50, 100kHz)
3. Store initial baseline |Z| values per frequency in EEPROM
4. Game logic uses *delta from baseline*, not absolute impedance values
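The calibration steps above can be sketched as follows, using the gain-factor method from the AD5933 datasheet: the magnitude of the raw real/imaginary register values measured on a known resistor gives a gain factor, which then converts later readings to ohms. The code is Python for readability, not firmware, and the raw register values are hypothetical.

```python
import math


def magnitude(real, imag):
    """Magnitude of the raw real/imaginary register values."""
    return math.hypot(real, imag)


def gain_factor(real_cal, imag_cal, z_cal_ohm):
    """Gain factor from a reading taken across a known calibration resistor."""
    return (1.0 / z_cal_ohm) / magnitude(real_cal, imag_cal)


def impedance_ohm(real, imag, gf):
    """Convert a raw reading to |Z| in ohms using the stored gain factor."""
    return 1.0 / (gf * magnitude(real, imag))


def delta_percent(z_now, z_baseline):
    """Relative change from the stored baseline, in percent."""
    return 100.0 * (z_now - z_baseline) / z_baseline


# Hypothetical raw values: calibration across a 1 kOhm resistor, then the well.
gf = gain_factor(3000, 4000, z_cal_ohm=1000.0)
z_baseline = impedance_ohm(1500, 2000, gf)
z_later = impedance_ohm(1875, 2500, gf)
print(f"baseline |Z| = {z_baseline:.0f} ohm, "
      f"change = {delta_percent(z_later, z_baseline):+.0f}%")
```

The gain factor varies with frequency, so one value per sweep frequency would be stored alongside the baseline.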
**Software:** Use the ZSweeper or Electrochemistry Arduino library for AD5933. For v2 prototype, single-frequency measurement at 10kHz is sufficient. Full sweep (5 frequencies) gives richer data for debugging.
---
## LAYER 4: HARDWARE SYSTEM
### Component list (v2, updated)
**Microcontroller:**
- Arduino Nano (ATmega328P) — unchanged. Plugs into female headers on PCB.
**Sensors:**
- IrOx/Pt pH working electrode + Ag/AgCl reference (in well, through spacer frame) → analog A0 via MCP6001 buffer
- FSR force sensitive resistor (feed button, on case surface) → analog A1 via 10kΩ voltage divider
- OPT101 photodiode + transimpedance amplifier → analog A2 (reads mCherry fluorescence from below)
- AD5933 impedance analyzer (I2C, 0x0D) → reads EIS from Pt well electrodes
**Display:**
- SSD1306 128×64 OLED (I2C, 0x3C) → SDA/SCL (A4/A5)
**Excitation LED:**
- Amber LED 590nm on D6 (PWM) + 220Ω — excites mCherry only. No multiplexing required.
> **v1 → v2 hardware removals:** Blue 470nm LED (D3), Green 530nm LED (D5), and Velostat petting pad (A0 in v1) are all eliminated. This removes 3 components and simplifies the PCB significantly.
**Optical filter:**
- 600nm longpass filter (Rosco #19 "Fire" gel, or equivalent) — upgraded from 500nm longpass in v1. The tighter cutoff at 600nm more aggressively blocks the 590nm amber excitation scatter while still passing mCherry emission at 610nm. This improves SNR for mCherry specifically (critical since mCherry's Stokes shift is only 20nm).
**Translucent PDMS membrane:**
- Use standard Sylgard 184 cured at 10:1 base:crosslinker ratio — this gives optically clear, slightly translucent PDMS. The red mCherry emission at 610nm transmits through 300–500μm PDMS with minimal attenuation. Users looking at the top of the device can see the biological glow. This is intentional and is the primary user-facing aesthetic of v2.
**Optics stack (bottom to top):**
1. OPT101 sensor (face up, on PCB)
2. 600nm longpass filter (resting on sensor)
3. Glass coverslip #1.5 (170μm)
4. Spacer frame (7mm well, 2mm thick, black PLA or acrylic) — with IrOx/Pt pH electrode and two Pt EIS electrodes threaded through sidewalls
5. Agar layer (~0.5–0.8mm, bacteria embedded)
6. Liquid LB layer (~0.7–1.2mm)
7. Translucent PDMS membrane (300–500μm) — visible red glow surface
8. User's hand / ambient light above
**Power:** USB-powered via Nano's USB port. No battery in current design.
### Pin mapping (v2)
Off-Arduino (in well, wired to PCB pads):
PCB pad “pH_W” → IrOx/Pt working electrode wire
PCB pad “pH_R” → Ag/AgCl reference electrode wire
PCB pad “EIS_1” → AD5933 VOUT (excitation) → Pt electrode 1
PCB pad “EIS_2” → AD5933 VIN (RFB 1kΩ between RFB and VIN pins) → Pt electrode 2
> **I2C bus note:** AD5933 (0x0D) and SSD1306 (0x3C) addresses do not conflict. Both operate at 3.3V VDD. Pull-up resistors: 4.7kΩ on SDA and SCL to 3.3V (not 5V — both devices are 3.3V I2C). The Nano's A4/A5 are 5V-tolerant with 3.3V pull-ups; this is fine.
### Physical layout (v2 updates)
Egg-shaped enclosure, black 3D-printed PLA, same form factor as v1 with the following changes:
- **Top surface:** Translucent PDMS window replaces the opaque membrane — this is now the primary visual feature. At high mCherry expression, a diffuse red glow is visible to the naked eye from several feet away in a dimmed room.
- **FSR feed button:** Remains on lower front face as a tactile soft pad.
- **Side port:** Feeding port (syringe injection for LB + kanamycin) unchanged.
- **Electrode ports:** Two additional small sealed ports in spacer frame for pH and EIS electrode wires. These are internal (not user-accessible); wires exit through the frame and terminate at PCB solder pads.
- **OLED position:** Unchanged — sits beside the well on the PCB, visible through cutout.
- **USB port:** Unchanged — bottom of device.
---
## LAYER 5: SIGNAL PROCESSING AND GAME LOGIC
### Sensing timeline and polling rates
| Signal | Hardware | Poll interval | Biological timescale |
|---|---|---|---|
| mCherry fluorescence | OPT101 + amber LED | Every 10 seconds | Hours (protein accumulation) |
| pH | IrOx electrode → A0 | Every 30 seconds | Hours (acid accumulation) |
| EIS impedance | AD5933 at 10kHz | Every 60 seconds | Hours (cell growth/death) |
| FSR (feed input) | A1 | Interrupt / 50ms | Instant |
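
The poll intervals in the table can be driven from a single non-blocking loop. The sketch below is one possible way to do it, not the project's actual firmware: `PollTimer` and the constant names are hypothetical, and the caller is assumed to pass in the current `millis()` value.

```cpp
#include <stdint.h>

// Poll intervals from the table above, in milliseconds.
const uint32_t MCHERRY_INTERVAL_MS = 10000UL;
const uint32_t PH_INTERVAL_MS      = 30000UL;
const uint32_t EIS_INTERVAL_MS     = 60000UL;
const uint32_t FSR_INTERVAL_MS     = 50UL;

// Hypothetical helper: one periodic task tracked by its last run time.
struct PollTimer {
  uint32_t intervalMs;  // how often the task should run
  uint32_t lastRunMs;   // timestamp of the most recent run

  // Returns true (and re-arms the timer) once intervalMs has elapsed.
  // Unsigned subtraction stays correct across the millis() rollover.
  bool due(uint32_t nowMs) {
    if ((uint32_t)(nowMs - lastRunMs) >= intervalMs) {
      lastRunMs = nowMs;
      return true;
    }
    return false;
  }
};
```

In `loop()`, each sensor would own one timer, for example `PollTimer phTimer = {PH_INTERVAL_MS, 0};` followed by `if (phTimer.due(millis())) { /* read pH */ }`, so the slow EIS sweep never delays the 50ms FSR check for longer than the sweep itself takes.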
### Startup calibration routine (mandatory)
Before entering the game loop, the device runs a 3-step calibration:
1. **pH calibration:** Prompt user (OLED message) to place pH 7.0 buffer on well, read A0 for 10 seconds averaged → store as `pH7_ADC`. Then pH 4.0 buffer → store as `pH4_ADC`. Compute slope = (7.0 − 4.0) / (pH7_ADC − pH4_ADC) and intercept = 7.0 − slope × pH7_ADC. Store both in EEPROM. This is valid for the life of the IrOx electrode (weeks to months).
2. **EIS baseline:** With bacteria freshly loaded at OD600 ~0.25–0.5, run AD5933 sweep at 10kHz for 30 seconds (5 readings averaged) → store as `Z_baseline`. This baseline shifts when bacteria are reloaded — recalibrate at each reload.
3. **mCherry baseline:** Run 3 amber LED fluorescence readings averaged → store as `mCherry_baseline`. This accounts for any background signal from the agar matrix or optical path.
All subsequent readings use delta from baseline: `ΔpH`, `ΔZ`, `ΔmCherry`.
### Signal delta interpretation
Use an exponential moving average (α = 0.1) on all three deltas to smooth noise before game-state evaluation.
### Game states (v2 — 5 states)
| State | pH signal | Impedance | mCherry | Interpretation |
|---|---|---|---|---|
| THRIVING | ΔpH < −0.3 (acidifying) | ΔZ within ±15% of baseline | ΔmCherry low (<+20%) | Active growth, happy. “Alive and eating well” |
| HUNGRY | ΔpH flat (near baseline) | ΔZ within ±15% of baseline | ΔmCherry high (>+50%) | Nutrients depleted, PosmY active. “Feed me” |
| SICK | ΔpH flat or rising | ΔZ < −20% (dropping) | ΔmCherry high (>+50%) | Cells dying, membranes compromised. “Sick, may not recover” |
| RECOVERING | ΔpH drops rapidly after feeding event | ΔZ stable or improving from SICK low | ΔmCherry dropping from peak | Just fed, bacteria consuming fresh LB, PosmY shutting off |
| OVERFED | ΔpH < −0.8 (acidifying severely) | ΔZ rising above baseline (cell crowding) | ΔmCherry low (< baseline) | pH crash from excess feeding; media too acidic for growth |
> **Disambiguation of THRIVING vs RECOVERING:** Both show pH dropping. The key difference: RECOVERING is entered explicitly after a feeding event (FSR press + confirmed LB injection timestamp). Use a `lastFedTime` variable — if a feeding occurred within the past 2 hours AND pH is dropping AND mCherry is falling, state = RECOVERING. THRIVING is the steady-state equivalent in the absence of a recent feeding event.
### State transition logic (pseudocode)
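```cpp
// All deltas are EMA-smoothed before this evaluation.
// The declarations below are placeholders so the function compiles on its own;
// in the full sketch they are updated by the sensing code.
enum GameState { THRIVING, HUNGRY, SICK, RECOVERING, OVERFED };
GameState state = THRIVING;     // initial value is arbitrary
float deltaPH = 0;              // pH units vs. baseline
float deltaZ_pct = 0;           // impedance change, % of Z_baseline
float deltaMCherry_pct = 0;     // fluorescence change, % of mCherry_baseline
unsigned long lastFedTime = 0;  // millis() at last FSR feed press; with 0, the
                                // first 2 hours after boot count as "recently fed"

void evaluateGameState() {
  bool acidifying  = (deltaPH < -0.3);
  bool acidCrash   = (deltaPH < -0.8);
  bool impedOK     = (fabs(deltaZ_pct) < 15);
  bool impedDrop   = (deltaZ_pct < -20);
  bool impedHigh   = (deltaZ_pct > +20);
  bool mCherryHigh = (deltaMCherry_pct > 50);
  bool mCherryLow  = (deltaMCherry_pct < 20);
  bool recentFeed  = ((millis() - lastFedTime) < 7200000UL);  // 2hr window

  // Rules are checked in priority order; the first match wins.
  if (acidCrash && impedHigh && mCherryLow)      { state = OVERFED;    return; }
  if (impedDrop && mCherryHigh)                  { state = SICK;       return; }
  if (acidifying && recentFeed && !mCherryHigh)  { state = RECOVERING; return; }
  if (mCherryHigh && !impedDrop)                 { state = HUNGRY;     return; }
  if (acidifying && impedOK && mCherryLow)       { state = THRIVING;   return; }
  // Default / ambiguous: no rule matched, so state keeps its previous value.
  // A DORMANT fallback would need a sixth state added to GameState.
}
```
### Fluorescence reading procedure (simplified vs v1)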
With only one LED and one fluorescent protein, the timing cycle is straightforward:
1. All LEDs off → OPT101 reads ambient baseline → store as `ambientRead`
2. Amber LED (D6) ON at PWM 180/255 → wait 50ms → OPT101 reads for 100ms averaged
3. Amber LED OFF
4. `mCherry_raw` = reading − `ambientRead`
5. Apply EMA: `mCherry_EMA = α * mCherry_raw + (1-α) * mCherry_EMA`
6. Repeat every 10 seconds

No multiplexing required. The simpler timing also improves SNR: each mCherry read is averaged over a full 100ms rather than the briefer reads forced by v1’s 3-LED rotation.
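
Steps 1 to 4 of the procedure above could look roughly like the following. To keep the sketch independent of the board, the hardware is reached through three function pointers; `FluorHw` and `readMCherryRaw` are hypothetical names, and on the Nano the pointers would wrap `analogRead()` on the OPT101 pin, `analogWrite(6, pwm)` and `delay()`. Splitting the 100ms window into evenly spaced samples is also an assumption.

```cpp
// Hardware access is injected so the logic can be exercised off-device.
struct FluorHw {
  int  (*readSensor)();              // returns one OPT101 ADC sample
  void (*setLed)(int pwm);           // sets amber LED PWM duty (0 = off)
  void (*waitMs)(unsigned long ms);  // blocking wait
};

// Runs one ambient-subtracted fluorescence measurement (steps 1 to 4).
float readMCherryRaw(const FluorHw& hw, int samples) {
  if (samples < 1) {
    samples = 1;
  }
  hw.setLed(0);                                  // step 1: LED off
  float ambientRead = (float)hw.readSensor();    //         ambient baseline
  hw.setLed(180);                                // step 2: LED on at 180/255
  hw.waitMs(50);                                 //         settle for 50ms
  long sum = 0;
  for (int i = 0; i < samples; ++i) {            //         average over ~100ms
    sum += hw.readSensor();
    hw.waitMs(100UL / (unsigned long)samples);
  }
  hw.setLed(0);                                  // step 3: LED off
  return (float)sum / (float)samples - ambientRead;  // step 4
}
```

The returned value corresponds to `mCherry_raw`; the EMA in step 5 and the 10-second repeat in step 6 are left to the caller.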
## MAINTENANCE
- **Feed every 6–12 hours:** Inject 20–50μL fresh LB + kanamycin through the side port. Press the FSR feed button at the same time; this sets `lastFedTime` for the RECOVERING state logic.
- **Reload bacteria every 1–2 weeks:** Warm well to 42°C to melt agarose, flush with sterile PBS, recast with fresh bacteria. Recalibrate EIS baseline immediately after.
- **pH electrode maintenance:** IrOx electrodes are stable for weeks to months in aqueous media. If readings drift by >0.5 pH units vs. known buffer, re-run two-point calibration. If drift persists, electrodeposit a fresh IrOx layer.
- **Ag/AgCl reference:** Replace every 2–4 weeks or when the reference potential shifts (observed as anomalous pH readings unresolvable by calibration). Re-chloridize silver wire in 1M FeCl₃ for 10 min.
- **Keep at 20–25°C.** Avoid direct sunlight: UV light triggers the SOS response in E. coli even without the sulA construct, causes oxidative stress that can kill the culture, and photobleaches mCherry.
- **Temperature and impedance:** EIS measurements are temperature-dependent (solution conductivity changes ~2%/°C). For quantitative impedance data, keep the device in a stable-temperature environment or add an NTC thermistor for temperature compensation in software. For game-state detection using relative changes, this is optional.