<Elsa Muleya> — HTGAA Spring 2026

![cover image](ELSA MULEYA. jpg)

About me

I am a student at Copperbelt University in Zambia and a researcher in the How to Grow (Almost) Anything (HTGAA) 2026 course.

My Mission: Sustainable Agriculture Through Synthetic Biology

My primary focus is the development of sustainable, bio-based solutions for agriculture. Currently, my research explores the use of cyanobiochar as a biofertilizer. By leveraging the nitrogen-fixing capabilities of cyanobacteria combined with the structural benefits of biochar, I aim to create a natural, high-efficiency alternative to chemical fertilizers that can revitalize soil health in my local community and beyond.

Strategic Goals & Personal Development

To push the boundaries of my final year project, I am focusing on two key development pillars during HTGAA:

Space-Hardened Extremotolerant Stocks: I am interested in exploring how exposure to extreme environments—specifically launching samples into space—can help select for or engineer extremotolerant strains of cyanobacteria. These “space-hardened” stocks could offer superior resilience to the harsh environmental stressors found on Earth, such as drought and high salinity.

Environmental Biosensors: As a secondary goal, I am exploring synthetic biology to create low-cost biosensors that detect heavy metal contamination, ensuring the water used in sustainable irrigation is safe and clean.

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices


  • Week 10 HW: Advanced Imaging & Measurement Technology


  • Week 2 HW: DNA READ WRITE AND EDIT


  • Week 3 HW: Lab Automation

    Week 3: Lab Automation & Opentrons Art. Introduction: This week’s focus is on the intersection of biology, robotics, and creative coding. As part of the HTGAA 2026 cohort based in Zambia, I am exploring how liquid-handling automation (specifically the Opentrons OT-2) can streamline laboratory workflows. Beyond the technical utility, this assignment challenged us to use the robot as a canvas, translating digital coordinates into physical biological art.

  • Week 4 HW: Protein Design I

    Homework: Protein Design I. Part A: Conceptual Questions. Assignment: Proteins and Amino Acids. 1. Amino Acids in 500 g of Meat: To calculate the total molecules, we first look at the protein density. Meat is roughly 20% protein by mass.

  • Week 5 HW: Protein Design Part II

    Week 5: Protein Design Part II SOD1 Binder Peptide Design and Evaluation Part 1: Generate Binders with PepMLM The human SOD1 sequence was retrieved from UniProt (P00441). The A4V mutation (Alanine to Valine at residue 4) was introduced to the wild-type sequence to create the target for peptide generation. Using the PepMLM-650M model, four 12-amino acid peptides were generated, and the known binder FLYRWLPSRRGG was added as a control.

  • Week 6 HW: Genetic Circuits Part 1

    Assignment: DNA Assembly

    1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

      • Phusion DNA Polymerase: This is the “engine.” It’s a highly thermostable enzyme that synthesizes new DNA strands. It’s “High-Fidelity” because it has $3' \rightarrow 5'$ exonuclease activity (proofreading), making significantly fewer mistakes than standard Taq.
      • dNTPs (Deoxynucleotide Triphosphates): These are the molecular building blocks (A, T, C, and G) used by the polymerase to construct the new DNA strand.
      • Buffer (containing $Mg^{2+}$): Maintains the optimal pH for enzymatic activity and provides essential divalent cations. Magnesium ions act as a cofactor for the polymerase, helping it catalyze phosphodiester bond formation.
      • Stabilizers: Often includes detergents or proprietary chemicals to prevent the enzyme from denaturing or sticking to the tube walls during the high-heat cycles.

    2. What are some factors that determine primer annealing temperature during PCR?

      • Primer Length: Longer primers generally have higher melting temperatures and so tolerate higher annealing temperatures.
      • GC Content: G-C pairs have three hydrogen bonds compared to the two in A-T pairs. Therefore, primers with higher GC content have higher melting temperatures ($T_m$).
      • Salt Concentration: The concentration of monovalent cations (like $K^+$) in the buffer affects the stability of the DNA duplex.
      • Primer Concentration: Higher concentrations can slightly shift the kinetics of annealing.
      • Mismatches: If the primer isn’t a 100% match to the template, the $T_m$ will decrease.

    Note: The annealing temperature ($T_a$) is usually chosen to be $3$ to $5^\circ\text{C}$ below the $T_m$ of the primers to balance specificity and yield.

  • Week 7 HW: GENETIC circuits II

    Week 7: IANNs & Fungal Materials Part 1: Intracellular Artificial Neural Networks (IANNs) Question 1 What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

  • Week 9 HW: Cell-Free Systems

    HTGAA Homework — Cell-Free Systems Part A: General & Lecturer-Specific Questions General Question 1 Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Subsections of Homework

Week 1 HW: Principles and Practices

Week 1: Project Concept — The “Copper-Sentinel” Initiative

My Vision: Why This Matters

Living in the Copperbelt, we see the good and bad aspects of mining every day—it drives our economy, but it also leaves a heavy footprint on our groundwater. I want to build Copper-Sentinel, a low-cost, decentralized tool for real-time water monitoring.

Instead of traditional sensors that require expensive labs, I’m looking at using Cell-Free Synthetic Biology. Basically, we take the “machinery” out of a cell (the parts that can read DNA and make proteins) and freeze-dry them onto simple paper strips. When a person dips this strip into their well water, a specific DNA circuit I’ve designed reacts to copper ions. If the copper is above the safe limit, the strip turns a vivid purple. Because there are no living bacteria involved, there’s no risk of accidentally releasing a “GMO” into our local environment.


Ensuring an Ethical Future (Governance & Policy)

It isn’t enough to just hand out sensors; we have to think about the “what ifs.” My goal is to ensure this technology contributes to an ethical future where people are protected, not just informed.

Goal 1: Environmental Safety (Non-maleficence)

  • Specific Sub-goal A: We must stick strictly to a Cell-Free platform. By ensuring the tool is non-living, we avoid the ethical nightmare of synthetic organisms self-replicating in our rivers.
  • Specific Sub-goal B: We need a clear “End-of-Life” protocol for these strips so they don’t become a new source of litter or chemical waste.

Goal 2: Data Equity & Autonomy

  • Specific Sub-goal A: I want the results to be owned by the community. If a village finds high copper, they should have the first right to that data before it goes to a corporation or a government agency.
  • Specific Sub-goal B: The science needs to be “legible”—meaning a person without a science degree should be able to look at the strip and understand exactly what it means for their health.

How We Make This Work (The Governance Matrix)

| Aspect | Action 1: The Technical “Kill-Switch” | Action 2: The Community “Water Union” | Action 3: National Bio-Policy |
| --- | --- | --- | --- |
| Purpose | Using “Cell-Free” extracts instead of live bacteria to prevent any biological spread. | Training local youth and leaders to act as “Sentinel Guardians” of their own data. | Proposing that the Zambian government recognizes citizen-led bio-data as legal evidence. |
| Design (Actors) | Synthetic biologists and molecular designers (like us in HTGAA). | Local community leaders, NGOs, and residents. | ZEMA (Zambia Environmental Management Agency) and the Ministry of Mines. |
| Assumptions | We’re assuming these delicate biological reagents can survive the Zambian heat without a fridge. | We’re assuming that mining firms won’t try to suppress the findings of local citizens. | We assume the government is willing to prioritize public health over short-term mining profits. |
| Risks of Failure & Success | Failure: The strip gives a “false safe” reading because it got too hot, and people drink toxic water. | Failure: The community finds high copper but has no money or help to dig a new, cleaner well. | Success Risk: We find so much pollution that land values drop, causing an economic crisis for the locals. |

Scoring the Governance Actions

I’ve rated these from 1 (Most Effective/Easiest) to 3 (Hardest/Riskiest).

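| Does the option: | Option 1 (Technical) | Option 2 (Community) | Option 3 (Legal) |
| --- | --- | --- | --- |
| Enhance Biosecurity | 1 | 2 | 2 |
| Foster Lab & Field Safety | 1 | 1 | 2 |
| Protect the Environment | 1 | 2 | 1 |
| Minimize Costs & Burdens | 2 | 1 | 3 |
| Feasibility? | 2 | 1 | 3 |
| Promote Constructive Use | 1 | 1 | 2 |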

My Recommendation & Trade-offs

If I have to choose, I’m prioritizing a combination of the Technical (Cell-Free) and Community-led models (Options 1 and 2).

The “Cell-Free” design is a non-negotiable for me because it’s the most responsible way to use biotech in the wild. But a tool is useless if the people don’t trust it. By building a “Water Union,” we empower people. The biggest trade-off here is the cost of cell-free reagents, which are currently more expensive than living bacteria. However, I believe the environmental safety is worth the extra few cents per test.

I’d present this plan to the Zambian Ministry of Green Economy and Environment. We need them to create a “Safe Sandbox” for us to test these sensors without being buried in the red tape that usually slows down biotech.


Personal Reflection

This week made me realize that biotech isn’t just about what happens in a test tube. I was struck by the idea of Dual-Use risks. A sensor that finds copper could, in the wrong hands, be used to sabotage water supplies or manipulate land prices.

Also, a new ethical concern for me was technological paternalism: the idea of an expert coming in with a fancy tool and then leaving. To fix this, our governance needs to focus on remediation. It’s not enough to tell someone their water is poisoned; we must also provide the biological tools (like copper-absorbing biopolymers) to help them clean it.

Copper-Sentinel Model Sketch

Week 2 Lecture Prep: Reading and Writing Life

Part 1: Professor Jacobson’s Questions

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The Discrepancy: The error rate of standard DNA polymerases is roughly 1 in 10,000 to 1 in 100,000 nucleotides. Since the human genome has approximately 3 billion base pairs, relying solely on basic polymerase would mean roughly 30,000 to 300,000 errors every time a cell copies its genome.

The Solution: Biology uses a multi-layered “spell-check” system. First, the polymerase has proofreading abilities (exonuclease activity) that catch most mistakes as they happen. Second, Mismatch Repair (MMR) proteins scan the strands to fix remaining errors. This brings the final error rate down to about 1 in a billion.

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work?

The Numbers: There are an astronomical number of ways to write the same protein due to code degeneracy. With 61 sense codons for 20 amino acids, each amino acid has about three synonymous codons on average, so an average human protein (~400 amino acids) has on the order of $3^{400} \approx 10^{190}$ possible DNA sequences.

Practical Constraints: Not all codes work because some codons are “rare,” so the matching tRNA is scarce and production stalls. Additionally, certain sequences can create hairpins (the strand folding on itself) or unintended stop signals that terminate the process prematurely.


Part 2: Dr. LeProust’s Questions

  1. What’s the most commonly used method for oligo synthesis currently?

    The gold standard is Phosphoramidite synthesis. This chemical process builds DNA one nucleotide at a time on a solid surface.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

    It comes down to coupling efficiency. Even at 99% efficiency per step, errors compound over 200 steps: only about 13% of the strands ($0.99^{200}$) come out full-length and correct; the rest are “trash” sequences with missing letters.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis?

    The math implies the yield for a 2000bp strand would be effectively zero: at 99% per step, $0.99^{2000}$ is about 2 in a billion. Instead, scientists synthesize many short 100-200nt pieces and “glue” them together using enzymes (assembly).
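The compounding argument in the two answers above can be checked with a few lines of Python. This is a minimal sketch that assumes one coupling step per nucleotide and the same efficiency at every step; the function name `full_length_fraction` is only illustrative.

```python
def full_length_fraction(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of strands for which every coupling step succeeded.

    Simplifying assumption: one coupling per nucleotide, each with the
    same independent success probability.
    """
    return coupling_efficiency ** length_nt


if __name__ == "__main__":
    for length in (20, 200, 2000):
        fraction = full_length_fraction(0.99, length)
        print(f"{length:>5} nt at 99% coupling: {fraction:.3g} full-length")
```

The same function can be rerun with a higher efficiency (for example 0.995) to see how sensitive long syntheses are to small per-step gains.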


Part 3: George Church’s Question

  1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 Essentials: Phenylalanine, Valine, Threonine, Tryptophan, Isoleucine, Methionine, Histidine, Arginine, Leucine, and Lysine.

My View: In Jurassic Park, the “Lysine Contingency” was a fictional “kill switch.” However, in reality, all animals (including humans) are unable to make lysine. It isn’t a special safety feature; it is a fundamental natural limitation that shows how all life depends on its environment and diet for survival.

Week 10 HW: Advanced Imaging & Measurement Technology

Laboratory Report: Advanced Mass Spectrometric Analysis of eGFP

Course: How to Grow Almost Anything (HTGAA) — Week 10


Final Project: Measurement Plan

Zambia Mineral-Waste Bioremediation Predictor

My final project uses a genetically engineered Bacillus subtilis strain expressing a metallothionein (MT) protein (accession WP_070466881.1) to remove copper and other heavy metals from mine-contaminated water in Zambia’s Copperbelt Province. The system also includes a copper-sensing genetic circuit (CopA-CueR), a MazF/MazE kill switch for biocontainment, and a dual-layer hydrogel encapsulation system called ZAMGEL.

The table below summarizes what I need to measure, why it matters, and how I will measure it:

| What I Am Measuring | Why It Matters | How I Will Measure It |
| --- | --- | --- |
| Metallothionein (MT) protein mass (~5.2 kDa, 49 amino acids) | Confirm the protein was successfully expressed; detect copper bound to the protein | Intact LC-MS (native mode): run the protein in ammonium acetate buffer to preserve metal binding; each copper ion bound adds ~61.5 Da to the mass, allowing me to count how many copper ions are attached |
| MT protein amino acid sequence | Confirm all 11 cysteines are present (these are the copper-grabbing residues); check there are no mutations | Tryptic peptide mapping (LC-MS/MS): digest the protein with trypsin, then identify the resulting peptides by mass and fragmentation (the same method as this week’s eGFP lab) |
| Copper binding capacity | Measure exactly how many copper ions one MT protein molecule can hold | Native MS + ICP-MS: native MS gives the mass of the copper-protein complex; ICP-MS (Inductively Coupled Plasma MS) measures copper concentration in solution per mole of protein |
| CopA sensor circuit activity | Check that the genetic circuit switches on in response to copper | Fluorescence plate reader: if a GFP reporter is placed downstream of the CopA copper-sensing promoter, fluorescence will increase when copper is present; I will measure this at different copper concentrations to build a dose-response curve |
| MazF/MazE kill switch expression | Confirm the biocontainment system works: the bacteria must die when the switch is triggered | Western blot + quantitative PCR (qRT-PCR): detect the toxin (MazF) and antitoxin (MazE) proteins; measure mRNA levels to confirm the switch triggers correctly |
| Heavy metal removal from water | Prove the system actually removes copper from real Copperbelt water samples | ICP-MS or Atomic Absorption Spectroscopy (AAS): measure copper, cobalt, and manganese concentrations before and after treatment; compare to ZEMA (Zambia Environmental Management Agency) water quality limits |
| ZAMGEL hydrogel structure | Confirm the hydrogel bead is porous enough for water and copper to pass through, but tight enough to keep bacteria inside | Scanning Electron Microscopy (SEM): image the hydrogel surface and pores at high magnification |
| Bacterial viability inside ZAMGEL | Confirm bacteria stay alive and active inside the hydrogel under Copperbelt water conditions | Colony Forming Unit (CFU) counts + LIVE/DEAD fluorescence staining: count living versus dead cells inside the beads |

In simple terms: I am checking that (1) the MT protein is made correctly and grabs copper, (2) the genetic switch turns on only when copper is present, (3) the kill switch works to destroy the bacteria when needed, and (4) the whole system actually cleans up copper-contaminated water.


Part I: Intact Protein Analysis — Molecular Weight of eGFP

Question 1: Theoretical Molecular Weight

The full eGFP sequence (247 amino acids, including the His₆-tag and LE linker) was entered into the ExPASy Compute pI/Mw tool and verified in Benchling.

| Parameter | Value |
| --- | --- |
| Total amino acids | 247 |
| Isoelectric point (pI) | 5.90 |
| Theoretical MW (average mass) | 28,006.60 Da |
| Theoretical MW (monoisotopic) | 27,988.96 Da |

For intact proteins at this size, the average mass (28,006.60 Da) is the appropriate theoretical reference because mass spectrometers detect the centre of the unresolved isotope envelope.

Figure 1: Theoretical Molecular Weight and Isoelectric Point (pI) calculation for eGFP via ExPASy.

Figure 2: Primary sequence analysis in Benchling confirming a 247-residue length.

Question 2: Experimental Molecular Weight Using the Adjacent Charge State Method

Two adjacent charge state peaks were selected from the denatured intact LC-MS spectrum (Figure 1):

| Peak | m/z value |
| --- | --- |
| Peak A, (m/z)_n | 1037.4423 |
| Peak B, (m/z)_(n+1) | 1000.5021 |

Step 1 — Calculate the charge state (z):

z = (m/z)_(n+1) / [(m/z)_n − (m/z)_(n+1)]

z = 1000.5021 / (1037.4423 − 1000.5021)

z = 1000.5021 / 36.9402 = 27.08 → z = 27


Step 2 — Calculate the experimental molecular weight:

MW_experiment = z × [(m/z)_n − 1.0073]

MW_experiment = 27 × (1037.4423 − 1.0073)

MW_experiment = 27 × 1036.435 = 27,983.75 Da

(1.0073 Da = mass of one proton)
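Steps 1 and 2 can be reproduced in a short script. This is a sketch rather than the lab's official procedure; the function names are illustrative. It uses the exact form of the adjacent-peak relation, which subtracts the proton mass in the numerator; that shifts the unrounded value slightly but rounds to the same integer charge.

```python
PROTON = 1.0073  # Da, the proton mass used in this report


def charge_of_higher_mz_peak(mz_n: float, mz_n_plus_1: float) -> float:
    """Unrounded charge n of the higher-m/z peak of two adjacent charge states.

    From M = n * (mz_n - H) = (n + 1) * (mz_n_plus_1 - H), solving for n
    gives n = (mz_n_plus_1 - H) / (mz_n - mz_n_plus_1).
    """
    return (mz_n_plus_1 - PROTON) / (mz_n - mz_n_plus_1)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass from an m/z value and its integer charge."""
    return z * (mz - PROTON)


peak_a, peak_b = 1037.4423, 1000.5021  # values from the table above
z = round(charge_of_higher_mz_peak(peak_a, peak_b))
mass_from_a = neutral_mass(peak_a, z)
mass_from_b = neutral_mass(peak_b, z + 1)  # cross-check with the neighbour

if __name__ == "__main__":
    print(f"z = {z}")
    print(f"mass from peak A: {mass_from_a:.2f} Da")
    print(f"mass from peak B: {mass_from_b:.2f} Da")
```

Computing the mass from both peaks is a cheap consistency check: the two values should agree to within a few Da if the charge assignment is right.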


Step 3 — Calculate mass accuracy:

Accuracy (ppm) = |MW_experiment − MW_theory| / MW_theory × 10⁶

Accuracy (ppm) = |27,983.75 − 28,006.60| / 28,006.60 × 10⁶

Accuracy (ppm) = 22.85 / 28,006.60 × 10⁶ = 816 ppm


Interpretation of the 816 ppm offset: This is not an analytical error — it is a biochemical signal. The ExPASy tool calculates the mass of the unmatured linear peptide chain. In living cells, eGFP undergoes spontaneous chromophore formation involving two modifications: dehydration (−18.01 Da) and oxidation (−2.02 Da), a total loss of ~20 Da.

Corrected theoretical mass for mature eGFP:

MW_matured = 28,006.60 − 18.01 − 2.02 = 27,986.57 Da

Revised accuracy = |27,983.75 − 27,986.57| / 27,986.57 × 10⁶

Revised accuracy = 2.82 / 27,986.57 × 10⁶ = 101 ppm

This is consistent with expected intact protein LC-MS performance on the Xevo G3 and confirms the protein carries a mature fluorescent chromophore.
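The ppm arithmetic above is easy to script. The sketch below assumes only the masses quoted in this report; `ppm_error` is an illustrative helper name.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Absolute mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


observed = 27983.75                    # Da, from the adjacent charge state method
unmatured = 28006.60                   # Da, ExPASy average mass
matured = unmatured - 18.01 - 2.02     # Da, after loss of H2O and H2

if __name__ == "__main__":
    print(f"vs. unmatured: {ppm_error(observed, unmatured):.0f} ppm")
    print(f"vs. matured:   {ppm_error(observed, matured):.0f} ppm")
```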


Question 3: Charge State from the Zoomed-in Peak (Figure 1)

Yes, the charge state can be observed. At 30,000 resolution on the Xevo G3, the spacing between adjacent isotope peaks within a charge state envelope equals 1/z Da. For the z = 27 peak at m/z ≈ 1037.44:

Δ(m/z) = 1/z = 1/27 ≈ 0.037 Da

At 30,000 resolution, peaks separated by 0.037 Da at m/z ~1037 are resolvable because:

Resolving power needed = m/z ÷ Δ(m/z) = 1037 ÷ 0.037 ≈ 28,000

This is within the instrument’s 30,000 specification. The zoomed inset therefore shows a resolved isotope ladder confirming z = +27.
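As a quick check of the resolving-power estimate, here is a small sketch that uses the same 1/z isotope spacing as the text (the true spacing is slightly larger than 1/z, so this is a mildly conservative approximation). The function name is illustrative.

```python
def required_resolving_power(mz: float, z: int) -> float:
    """Resolving power needed to separate adjacent isotope peaks at charge z."""
    isotope_spacing = 1.0 / z  # spacing on the m/z axis, 1/z approximation
    return mz / isotope_spacing


needed = required_resolving_power(1037.44, 27)

if __name__ == "__main__":
    print(f"needed: {needed:.0f}; instrument specification: 30000")
```

The margin is small, so the isotope ladder at this charge state is near the limit of what the stated specification can resolve.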


Part II: Protein Conformation — Native vs. Denatured

Question 1: Difference Between Native and Denatured States

When a protein unfolds, it loses its compact three-dimensional structure. All the amino acid residues that were buried inside the core become exposed to the surrounding solution. This is important for mass spectrometry because basic residues (Lys, Arg, His) that are normally hidden inside the protein can now all pick up protons from the solvent. More protons attached = higher charge = lower m/z values.

| Feature | Denatured State (Figure 2, top) | Native State (Figure 2, bottom) |
| --- | --- | --- |
| Protein structure | Unfolded; all residues exposed | Compact; interior residues hidden |
| Protonation sites available | All basic residues accessible | Only surface-exposed basic residues |
| Charge state range (for a ~28 kDa protein over the m/z range below) | roughly z = +23 to +35 (high charge) | roughly z = +10 to +12 (low charge) |
| m/z range in spectrum | ~800–1,200 (low m/z) | ~2,400–2,800 (high m/z) |
| Envelope shape | Broad, many charge states | Narrow, few charge states |
| MS buffer conditions | Low pH, organic solvents (LC-MS) | Aqueous ammonium acetate, pH ~7 |

How the mass spectrometer detects this: The denatured spectrum shows a wide distribution of peaks at low m/z, reflecting the many charge states a fully exposed chain can adopt. The native spectrum shows a narrow cluster of peaks at much higher m/z, because the folded protein’s buried core limits proton access. This difference in charge state distribution is the direct readout of protein folding state in the mass spectrum.


Question 2: Charge State at ~2800 m/z (Figure 3)

Yes, the charge state can be determined. Using the isotope spacing in the zoomed inset at the neighbouring ~2545 m/z peak:

Step 1 — Calculate z from the 2545 peak isotope spacing:

Δ(m/z)_2545 = 2545.1304 − 2545.0388 = 0.0916 Da

z_2545 = 1 / 0.0916 = 10.9 → z = +11


Step 2 — Determine z for the ~2800 peak:

Since adjacent charge state peaks differ by z = ±1, and the ~2800 m/z peak sits at higher m/z (therefore lower charge) than the +11 peak:

z_2800 = 11 − 1 = +10


Step 3 — Verify by back-calculating the mass:

MW = z × [(m/z) − 1.0073]

MW = 10 × (2800 − 1.0073)

MW = 10 × 2798.9927 ≈ 27,990 Da

This matches the matured eGFP theoretical mass of ~27,986.57 Da, confirming the assignment.

How you can tell visually: At 30,000 resolution and m/z ~2800, isotopes are 1/10 = 0.1 Da apart — resolvable in the inset as a clear ladder of peaks separated by 0.1 Da, which is exactly how one can confirm the charge state is +10.
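The three steps of this question can be sketched in code. The values are the ones read from the inset in this report; the nominal 2800 m/z is a rounded reading, so the back-calculated mass is only approximate. Function names are illustrative.

```python
PROTON = 1.0073  # Da, the proton mass used in this report


def charge_from_isotope_spacing(mz_a: float, mz_b: float) -> int:
    """Charge state from two adjacent isotope peaks, using spacing = 1/z."""
    return round(1.0 / abs(mz_b - mz_a))


z_2545 = charge_from_isotope_spacing(2545.0388, 2545.1304)
z_2800 = z_2545 - 1                     # next peak at higher m/z has one less charge
mass_2800 = z_2800 * (2800.0 - PROTON)  # back-calculated neutral mass

if __name__ == "__main__":
    print(f"z at ~2545: +{z_2545}, z at ~2800: +{z_2800}")
    print(f"mass from ~2800 peak: {mass_2800:.0f} Da")
```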


Part III: Peptide Mapping — Primary Structure Confirmation

Question 1: Lysine and Arginine Count

The eGFP sequence (247 aa) was entered into Benchling (Biochemical Properties tab) and confirmed with ExPASy PeptideMass:

| Amino Acid | Count |
| --- | --- |
| Lysine (K) | 20 |
| Arginine (R) | 6 |
| Total cleavage sites for trypsin | 26 |

Predicted number of peptides:

Trypsin cleaves after every K and R residue (except when followed by proline). With 26 cleavage sites:

Peptides = cleavage sites + 1 = 26 + 1 = 27 peptides
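The cleavage rule can be illustrated with a tiny in-silico digest. This is a sketch, not the ExPASy implementation: it applies only the "after K or R, unless followed by P" rule. The example input is the C-terminal stretch (positions 217–247) taken from the PeptideMass peptide list in this report, and `AKPGR` is a made-up toy string that exercises the proline exception.

```python
def trypsin_digest(sequence: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return peptides


c_terminal_stretch = "DHMVLLEFVTAAGITLGMDELYKLEHHHHHH"  # positions 217-247
fragments = trypsin_digest(c_terminal_stretch)

if __name__ == "__main__":
    print(fragments)                 # one cleavage site gives two peptides
    print(trypsin_digest("AKPGR"))   # K before P is not cleaved
```

Because the protein's C-terminus is the His-tag rather than K or R, the "sites + 1" count holds for the full sequence just as it does for this stretch.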


Question 2: Predicted Tryptic Peptides from ExPASy PeptideMass

Parameters used: Enzyme = Trypsin, Missed cleavages = 0, Cysteines = reduced form, Methionines = unoxidized.

The tool predicts 27 peptides. The full list is shown below (masses are monoisotopic):

| Mass (Da) | Position | Peptide Sequence |
| --- | --- | --- |
| 4472.1752 | 170–210 | HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK |
| 2566.2931 | 217–239 | DHMVLLEFVTAAGITLGMDELYK |
| 2437.2608 | 5–27 | GEELFTGVVPILVELDGDVNGHK |
| 2378.2577 | 54–74 | LPVPWPTLVTTLTYGVQCFSR |
| 1973.9062 | 142–157 | LEYNYNSHNVYIMADK |
| 1503.6597 | 28–42 | FSVSGEGEGDATYGK |
| 1266.5783 | 87–97 | SAMPEGYVQER |
| 1083.4979 | 240–247 | LEHHHHHH |
| 1050.5214 | 115–123 | FEGDTLVNR |
| 982.4952 | 133–141 | EDGNILGHK |
| 821.3940 | 81–86 | QHDFFK |
| 790.3552 | 75–80 | YPDHMK |
| 769.3913 | 47–53 | FICTTGK |
| 711.2944 | 103–108 | DDGNYK |
| 655.3813 | 98–102 | TIFFK |
| 602.2780 | 211–215 | DPNEK |
| 579.3137 | 128–132 | GIDFK |
| 507.2925 | 164–167 | VNFK |
| 502.3235 | 124–127 | IELK |

(Peptides below 500 Da: MVSK, LTLK, TR, AEVK, QK, NGIK, IR, and R. These are not shown by ExPASy at default settings and represent the remaining ~9.3% of the sequence, 23 of 247 residues.)

ExPASy PeptideMass reports that 90.7% of the sequence is covered by peptides ≥ 500 Da at the default display threshold.
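The 90.7% figure can be cross-checked from the position ranges in the peptide list above. The sketch below simply sums the lengths of the listed ranges (they do not overlap) and divides by the 247-residue sequence length; the variable names are illustrative.

```python
SEQUENCE_LENGTH = 247

# (start, end) positions of the peptides >= 500 Da, copied from the list above
listed_ranges = [
    (170, 210), (217, 239), (5, 27), (54, 74), (142, 157), (28, 42),
    (87, 97), (240, 247), (115, 123), (133, 141), (81, 86), (75, 80),
    (47, 53), (103, 108), (98, 102), (211, 215), (128, 132), (164, 167),
    (124, 127),
]

covered = sum(end - start + 1 for start, end in listed_ranges)
coverage_pct = 100.0 * covered / SEQUENCE_LENGTH

if __name__ == "__main__":
    print(f"{covered} of {SEQUENCE_LENGTH} residues covered ({coverage_pct:.1f}%)")
```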


Question 3: Chromatographic Peaks in Figure 5a

From the Total Ion Chromatogram (Figure 5a), counting all peaks with relative abundance >10% between 0.5 and 6 minutes:

Observed chromatographic peaks: approximately 20–22 major peaks


Question 4: Do Peaks Match Predicted Peptides?

There are fewer peaks in the chromatogram than predicted peptides. The TIC shows ~20–22 peaks versus 27 predicted, for the following reasons:

  • Very small, highly hydrophilic peptides (for example MVSK, LTLK, AEVK, and TR, all < 500 Da) bind poorly to the reverse-phase C18 column and elute in or near the void volume, so they are not detected as distinct peaks
  • Some peptides may co-elute and appear as a single unresolved peak in the TIC
  • The His₆-tag peptide (LEHHHHHH) may ionize poorly due to its unusual composition

Question 5: Identify m/z, Charge State, and [M+H]⁺ for the 2.78 min Peak

From Figure 5b:

m/z = 525.7602


Charge state from isotope spacing in the zoomed inset:

Δ(m/z) = 0.499 Da

z = 1 / 0.499 ≈ 2 → z = +2


Calculate singly charged mass [M+H]⁺:

[M+H]⁺ = (m/z × z) − (z − 1) × 1.0073

[M+H]⁺ = (525.7602 × 2) − (1 × 1.0073)

[M+H]⁺ = 1051.5204 − 1.0073 = 1050.51 Da
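The conversion from an observed multiply charged m/z to the singly protonated mass is a one-line formula; a minimal sketch with an illustrative function name:

```python
def singly_protonated_mass(mz: float, z: int, proton: float = 1.0073) -> float:
    """[M+H]+ from an observed m/z at charge z: remove the extra (z - 1) protons."""
    return mz * z - (z - 1) * proton


mh_plus = singly_protonated_mass(525.7602, 2)

if __name__ == "__main__":
    print(f"[M+H]+ = {mh_plus:.4f} Da")
```

For a singly charged ion (z = 1) the function returns the input unchanged, which is a useful sanity check.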


Question 6: Peptide Identification and Mass Accuracy

Searching the ExPASy PeptideMass output for the peptide whose monoisotopic mass is closest to 1050.51 Da:

| Peptide | Position | Theoretical Mass (Da) | Δ from observed |
| --- | --- | --- | --- |
| FEGDTLVNR | 115–123 | 1050.5214 | 0.008 Da ✓ |
| EDGNILGHK | 133–141 | 982.4952 | too far |

The identified peptide is FEGDTLVNR (positions 115–123).


Mass accuracy in ppm:

ppm error = |MW_experiment − MW_theory| / MW_theory × 10⁶

ppm error = |1050.5131 − 1050.5214| / 1050.5214 × 10⁶

ppm error = 0.0083 / 1050.5214 × 10⁶ = 7.9 ppm

This is excellent performance, fully consistent with Waters BioAccord LC-MS specifications (< 10 ppm for peptides).


Question 7: Sequence Coverage (Figure 6)

From the ExPASy PeptideMass output, 90.7% of the eGFP sequence is represented by tryptic peptides with mass ≥ 500 Da. The amino acid coverage map (Figure 6) from the Waters BioAccord LC-MS experimentally confirms this coverage through detected peptide masses and fragmentation patterns. The remaining ~9.3% corresponds to very small peptides below the instrument’s reliable detection threshold.


Bonus Question 1: Peptide Sequence from Fragmentation Spectrum (Figure 5c)

The sequence FEGDTLVNR was entered into the SystemsBiology Fragment Ion Servlet (monoisotopic masses, +1 charge, b/y ion series). The predicted fragment ions are:

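| # | Residue | b-ion (m/z) | y-ion (m/z) | # from C-term |
| --- | --- | --- | --- | --- |
| 1 | F | 148.07574 | 1050.52149 | 9 |
| 2 | E | 277.11833 | 903.45308 | 8 |
| 3 | G | 334.13979 | 774.41049 | 7 |
| 4 | D | 449.16673 | 717.38902 | 6 |
| 5 | T | 550.21441 | 602.36208 | 5 |
| 6 | L | 663.29848 | 501.31440 | 4 |
| 7 | V | 762.36689 | 388.23034 | 3 |
| 8 | N | 876.40982 | 289.16192 | 2 |
| 9 | R | 1032.51093 | 175.11900 | 1 |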

Mass/Charge Table for FEGDTLVNR:

| Species | Monoisotopic (Da) | Average (Da) |
| --- | --- | --- |
| (M) | 1049.51422 | 1050.13629 |
| (M+H)⁺ | 1050.52149 | 1051.14356 |
| (M+2H)²⁺ | 525.76441 | 526.07544 |
| (M+3H)³⁺ | 350.84538 | 351.05273 |
| (M+4H)⁴⁺ | 263.38586 | 263.54138 |

Confirmation of match — ppm error on the (M+2H)²⁺ ion:

Predicted (M+2H)²⁺ = 525.76441 vs. observed = 525.7602

ppm error = |525.7602 − 525.76441| / 525.76441 × 10⁶

ppm error = 0.00421 / 525.76441 × 10⁶ = 8.0 ppm


Sequence confirmed via ion series:

Matching the y-ion series from Figure 5c against predicted values reads the C-terminal sequence inward:

  • y1 = 175.12 → R
  • y2 = 289.16 → NR
  • y3 = 388.23 → VNR
  • y4 = 501.31 → LVNR
  • y5 = 602.36 → TLVNR

The b-ion series confirms the N-terminal sequence reading outward:

  • b1 = 148.08 → F
  • b2 = 277.12 → FE
  • b3 = 334.14 → FEG

The peptide sequence is confirmed as FEGDTLVNR.
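For readers who want to see where the predicted fragment masses come from, here is a minimal sketch of the singly charged b- and y-ion calculation. It is not the Fragment Ion Servlet's code: it uses standard monoisotopic residue masses (only the residues in this peptide are included) and a proton mass of 1.007276 Da, so values may differ from the servlet output in the fourth or fifth decimal place.

```python
# Standard monoisotopic residue masses (Da) for the residues in FEGDTLVNR
RESIDUE_MASS = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.010565   # Da, monoisotopic H2O
PROTON = 1.007276   # Da


def b_ions(peptide: str) -> list[float]:
    """Singly charged b-ions: N-terminal residue sums plus one proton."""
    ions, running = [], 0.0
    for residue in peptide:
        running += RESIDUE_MASS[residue]
        ions.append(running + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y-ions: C-terminal residue sums plus water plus one proton."""
    ions, running = [], 0.0
    for residue in reversed(peptide):
        running += RESIDUE_MASS[residue]
        ions.append(running + WATER + PROTON)
    return ions


b_series = b_ions("FEGDTLVNR")
y_series = y_ions("FEGDTLVNR")

if __name__ == "__main__":
    for i, (b, y) in enumerate(zip(b_series, y_series), start=1):
        print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

The last y-ion (y9) is the full peptide plus water plus a proton, so it equals the (M+H)⁺ value and ties the fragment series back to the precursor mass.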


Bonus Question 2: Does the Peptide Map Confirm eGFP Identity?

Yes, the peptide map data unambiguously confirms that the protein is eGFP. Three independent lines of evidence support this conclusion:

  1. Mass accuracy: Tryptic peptide masses match ExPASy PeptideMass theoretical predictions within < 10 ppm — consistent with authentic eGFP sequence
  2. MS/MS fragmentation: The fragmentation spectrum of the 2.78-min peak matches the predicted b- and y-ion series for FEGDTLVNR, confirming the amino acid sequence residue by residue
  3. Sequence coverage: Figure 6 shows that >90% of the eGFP primary sequence is experimentally confirmed, leaving no significant unexplained regions

Part IV: Oligomeric States of KLH by Charge Detection Mass Spectrometry

CDMS enables direct, single-particle mass measurement of very large protein complexes without requiring resolved charge states. Using known subunit masses (7FU = 340 kDa; 8FU = 400 kDa), the expected masses for each oligomeric species are calculated below:

| Oligomeric Species | Subunit | Calculation | Theoretical Mass | Location on Figure 7 |
| --- | --- | --- | --- | --- |
| 7FU Decamer | 7FU (340 kDa) | 10 × 340 kDa | 3,400 kDa (3.4 MDa) | Leftmost peak, ~3.4 MDa |
| 8FU Didecamer | 8FU (400 kDa) | 20 × 400 kDa | 8,000 kDa (8.0 MDa) | ~8.0 MDa |
| 8FU 3-Decamer | 8FU (400 kDa) | 30 × 400 kDa | 12,000 kDa (12.0 MDa) | ~12.0 MDa |
| 8FU 4-Decamer | 8FU (400 kDa) | 40 × 400 kDa | 16,000 kDa (16.0 MDa) | Rightmost peak, ~16.0 MDa |

Calculations shown explicitly:

  • 7FU Decamer: 10 × 340 = 3,400 kDa
  • 8FU Didecamer: 20 × 400 = 8,000 kDa
  • 8FU 3-Decamer: 30 × 400 = 12,000 kDa
  • 8FU 4-Decamer: 40 × 400 = 16,000 kDa

Reading Figure 7 left-to-right, the four peaks correspond to these four species in increasing order of mass. CDMS is uniquely suited for this measurement because the extremely large size of these complexes (3.4–16 MDa) makes charge state resolution impossible in conventional ESI-MS — CDMS bypasses this by measuring charge on each individual particle directly.


Part V: Final Assessment — Did I Make eGFP?

Summary Table

| Quantity | Theoretical (unmatured) | Theoretical (matured) | Observed (Intact LC-MS) | PPM Error |
| --- | --- | --- | --- | --- |
| Molecular Weight (Da) | 28,006.60 | 27,986.57 | 27,983.75 | 816 ppm (vs. unmatured) / 101 ppm (vs. matured) |

PPM error calculation (vs. unmatured):

ppm = |27,983.75 − 28,006.60| / 28,006.60 × 10⁶ = 22.85 / 28,006.60 × 10⁶ = 816 ppm

PPM error calculation (vs. matured eGFP):

ppm = |27,983.75 − 27,986.57| / 27,986.57 × 10⁶ = 2.82 / 27,986.57 × 10⁶ = 101 ppm


Verdict: Yes — eGFP was successfully produced.

The mass difference of ~23 Da between the unmatured theoretical mass and the observed mass is not analytical error; most of it (~20 Da) is the biochemical signature of chromophore maturation (loss of H₂O and H₂ during spontaneous cyclization and oxidation of the Thr65-Tyr66-Gly67 tripeptide; residue 65 is Ser in wild-type GFP and Thr in eGFP, as seen in the LTYG motif of the peptide at positions 54–74). When compared against the correct matured eGFP mass of 27,986.57 Da, the measurement accuracy is 101 ppm, consistent with intact protein LC-MS performance.

This is further confirmed by: (1) tryptic peptide mapping recovering >90% of the primary sequence with < 10 ppm mass accuracy, and (2) native MS (Part II) showing a compact charge state distribution at high m/z confirming the protein is properly folded into its characteristic β-barrel structure. The combination of mass, sequence, and folding data provides complete confirmation that the expressed protein is functional eGFP.


References

Carr, S. (2012). Fundamentals of peptide and protein mass spectrometry [Video]. Broad Institute of MIT and Harvard. https://www.youtube.com/watch?v=PFOodSbH9IY

Eiler, S., Gangloff, M., & Duclohier, H. (2020). Native vs denatured: An in-depth investigation of charge state and isotope distributions. Journal of the American Society for Mass Spectrometry, 31(10). https://pmc.ncbi.nlm.nih.gov/articles/PMC7539638/

Jorgenson, J. (2012). History of LC and mass spectrometry [Video]. Vimeo. https://player.vimeo.com/video/53604465

Tucholski, T., Coon, J. J., & Ge, Y. (2019). Best practices for intact protein analysis for top-down mass spectrometry. Nature Methods, 16(7), 587–594. https://doi.org/10.1038/s41592-019-0457-0

Swiss Institute of Bioinformatics. (2024). ExPASy Compute pI/Mw tool. https://web.expasy.org/compute_pi/

Swiss Institute of Bioinformatics. (2024). ExPASy PeptideMass tool. https://web.expasy.org/peptide_mass/

University of Washington. (2024). Fragment Ion Servlet. SystemsBiology.net. http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html

Waters Corporation. (2024). Waters Xevo G3 QTof mass spectrometer. https://www.waters.com

Week 2 HW: DNA READ WRITE AND EDIT

Part 1: Benchling & In-silico Gel Art

In-Silico Gel Art: Latent Figure Protocol

Project Overview

For this week’s assignment, I used Benchling to simulate restriction enzyme digests on the Lambda Phage genome (NC_001416). My goal was to move beyond simple data analysis and create “Gel Art” in the style of Paul Vanouse’s Latent Figure Protocol.

The Visual Design

I designed a zigzag pattern that emerges from a complex reference lane. By selecting specific enzymes, I was able to control the migration height of the DNA bands to create a deliberate visual W shape.

Enzyme Key and Lane Setup

| Lane | Enzyme Combination | Visual Goal |
|---|---|---|
| Ladder | NEB 2-Log | Size reference for the DNA bands. |
| Lane 1 | All 7 Enzymes | The Master Key: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, SalI. |
| Lane 2 | SacI | Low Point: Sharp band near the bottom. |
| Lane 3 | BamHI + SalI | Mid-Point: Moving the pattern upward. |
| Lane 4 | EcoRI | High Point: The peak of the zigzag. |
| Lane 5 | BamHI + SalI | Mid-Point: Symmetric return to the middle. |
| Lane 6 | SacI | Low Point: Completing the zigzag at the bottom. |

Final Result

Zigzag Gel Art

Reflection

Working with EcoRV was a challenge because it cuts the genome 21 times, resulting in a significant amount of noise. By isolating simpler cutters, such as SacI and EcoRI, in the later lanes, I was able to make the intended artwork much clearer.

View my Benchling Virtual Digest Project

Part 3: DNA Design Challenge

3.1. Choose Your Protein

  • Protein Chosen: Insulin (Homo sapiens)
  • Why: I chose Insulin because it is a vital hormone for glucose regulation and holds historical significance as the first human protein to be manufactured using recombinant DNA technology.
  • Protein Sequence (FASTA format):

>sp|P01308|INS_HUMAN Insulin OS=Homo sapiens OX=9606 GN=INS PE=1 SV=1
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN


3.2. Reverse Translate

Process: Using the Sequence Manipulation Suite, I reverse-translated the Insulin amino acid sequence into a DNA sequence. For each amino acid I used the single most likely codon from the tool's codon usage table, which yields one concrete, non-degenerate sequence.

Insulin DNA Sequence (Naive/Initial):

atggcgctgtggatgcgcctgctgccgctgctggcgctgctggcgctgtggggcccggatccggcggcggcgtttgtgaaccagcatctgtgcggcagccatctggtggaagcgctgtatctggtgtgcggcgaacgcggctttttttataccccgaaaacccgccgcgaagcggaagatctgcaggtgggccaggtggaactgggcggcggcccgggcgcgggcagcctgcagccgctggcgctggaaggcagcctgcagaaacgcggcattgtggaacagtgctgcaccagcatttgcagcctgtatcagctggaaaactattgcaactaa
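The reverse-translation step can be sketched in a few lines. The codon choices below are simply read off the naive sequence above (one codon per amino acid); the table and the function name `reverse_translate` are illustrative and are not the Sequence Manipulation Suite's own code.

```python
# One codon per amino acid, as observed in the naive insulin sequence above.
# Illustrative only: a real tool derives these from a codon usage table.
PREFERRED_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}
STOP_CODON = "taa"


def reverse_translate(protein: str, add_stop: bool = True) -> str:
    """Map each amino acid to its single preferred codon and join the result."""
    codons = [PREFERRED_CODON[aa] for aa in protein.upper()]
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


# First 12 residues of preproinsulin; the output should match the start of
# the naive sequence above.
print(reverse_translate("MALWMRLLPLLA", add_stop=False))
```

Because every amino acid always maps to the same codon, the result is deterministic but ignores context such as repeats or GC content, which is one reason a separate optimization step follows.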

3.3. Codon optimization.

Chosen Organism: Escherichia coli (E. coli)

Why do we need to optimize? Different organisms have different “preferences” for which synonymous codons they use to build proteins. If we put a sequence rich in codons that are rare in E. coli directly into the bacteria, the matching tRNAs may be scarce and translation can slow down or stall. Using the IDT Codon Optimization Tool, I swapped codons for ones that E. coli uses more frequently, which should improve the speed and reliability of protein production.

Optimized Insulin DNA Sequence (for E. coli): ATG GCA CTG TGG ATG CGC CTG CTG CCG TTG TTA GCT CTG CTG GCG TTA TGG GGG CCG GAT CCG GCG GCG GCC TTC GTG AAT CAG CAT TTA TGT GGC TCA CAC CTG GTC GAA GCC TTG TAC TTA GTC TGT GGT GAA CGT GGT TTT TTT TAC ACA CCG AAA ACC CGC CGT GAA GCG GAG GAC CTT CAG GTG GGC CAG GTT GAA CTG GGC GGC GGT CCG GGC GCG GGA TCT CTT CAG CCT CTG GCT TTA GAA GGA AGC CTG CAG AAA CGC GGC ATT GTG GAG CAG TGC TGT ACC TCT ATT TGC TCC CTG TAT CAG TTG GAA AAC TAT TGT AAT TAA
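Codon optimization should change the DNA but never the protein. A quick sanity check is to translate both versions and compare them. The sketch below builds the standard genetic code table and checks only the first 12 codons of each sequence; the function name `translate` is my own, and the same call can be run on the full sequences.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces and case ignored) into one-letter amino acids."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


naive_prefix = "atggcgctgtggatgcgcctgctgccgctgctggcg"
optimized_prefix = "ATG GCA CTG TGG ATG CGC CTG CTG CCG TTG TTA GCT"

# Different codons, same amino acids.
assert translate(naive_prefix) == translate(optimized_prefix)
print(translate(optimized_prefix))
```

Stop codons are reported as `*`, so a premature stop introduced by a copy-paste error would also show up in the output.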

3.4. You have a sequence! Now what?

To turn my digital sequence into a physical protein, I would use the following technologies:

  • Chemical DNA Synthesis: I would send my optimized sequence to a vendor like IDT to synthesize the physical DNA strands.
  • Recombinant Expression: I would insert this DNA into a plasmid and transform it into E. coli cells. The bacteria act as a biological factory, using transcription and translation to manufacture the insulin.
  • Cell-Free Synthesis: Alternatively, I could use a TX-TL (cell-free transcription-translation) system, which uses cellular machinery in a test tube to produce the protein without needing a living host.

3.5. Biological Systems

How can a single gene code for multiple proteins? Nature is far more efficient than a simple 1:1 “one gene, one protein” rule. Through Alternative Splicing, a cell can choose which sections of an RNA transcript to keep and which to discard. This allows the same gene to produce several different versions of a protein, known as isoforms, which can have different functions in the body.

Case Study: Human Insulin (P01308)

  • Isoforms: This gene produces 2 isoforms via alternative splicing.
  • Maturation: Insulin also undergoes Post-translational processing, where it is trimmed from a long Preproinsulin chain into the final active hormone.

The Biomolecular Flow: Below is the full breakdown of how my digital DNA sequence becomes a functional protein.

| Level | Sequence | Key Change |
|---|---|---|
| DNA | ATG GCA CTG TGG... | Optimized for E. coli host |
| RNA | AUG GCA CUG UGG... | Transcribed copy; T is now U |
| Protein | M A L W ... | Translated amino acid sequence |

Part 4: Prepare a Twist DNA Synthesis Order

Project: Insulin_v1.0_System_Architecture

Developer: [Elsa Muleya]
Status: Compiled & Verified
Target Environment: E. coli OS

1. The Source Code (DNA)

The circular plasmid represents the permanent Read-Only Memory (ROM) of the biological system.

  • ENTRY_POINT (promoter): Executes the START command. It signals the system’s hardware to begin data processing at position 1.
  • DATA_PACKET (RBS): The Ribosome Binding Site acts as the Buffer. It prepares the hardware to load the upcoming instructions.
  • MAIN_APP (Insulin CDS): The primary logic gate. This is the raw sequence that defines the structure of the final output (Insulin).
  • METADATA_TAG (7x His Tag): An attached Header. This 7-histidine string acts as a unique ID for downstream sorting and purification.
  • EOF_MARKER (Terminator): The exit(0) command. It forces the system to stop reading and release the hardware resources.

2. The Compiler (Transcription)

This is the process of converting the High-Level Code (DNA) into Machine Code (mRNA).

  • The system’s compiler (RNA Polymerase) docks at the ENTRY_POINT.
  • It generates a temporary copy of the data. This is equivalent to loading an application from the Hard Drive (DNA) into RAM (mRNA) for active execution.

3. The Execution (Translation)

The system hardware (Ribosome) executes the instructions stored in the RAM (mRNA).

  • BIT_READING: The hardware reads the code in 3-letter segments (three nucleotides each) called Codons.
  • OUTPUT_GENERATION: For every 3 letters read, the system adds one unit (amino acid) to the physical product.
  • FRAME_CHECK: I have verified the 7x His Tag is in-frame, ensuring the Metadata Header is correctly attached to the Main App without data corruption.
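A frame check of this kind can be sketched as follows. It assumes the tag is fused to the C-terminus, so the stop codon of the coding sequence has to be removed before the tag is appended. The His-tag encoding (`CAT` repeated 7 times) and the function name `is_in_frame` are hypothetical, chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(orf: str) -> bool:
    """True if orf starts with ATG, is a whole number of codons long,
    and its only stop codon is the final one."""
    orf = orf.upper()
    if len(orf) % 3 != 0 or not orf.startswith("ATG"):
        return False
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    has_internal_stop = any(c in STOP_CODONS for c in codons[:-1])
    return codons[-1] in STOP_CODONS and not has_internal_stop


cds_prefix = "ATGGCACTGTGG"   # first four codons of the optimized CDS, stop removed
his_tag = "CAT" * 7           # hypothetical encoding of a 7x His tag

# Tag fused in frame, single stop at the very end.
print(is_in_frame(cds_prefix + his_tag + "TAA"))
# Stop codon left in front of the tag: the tag would never be translated.
print(is_in_frame(cds_prefix + "TAA" + his_tag + "TAA"))
```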

4. System Security & Multi-Threading (The Vector)

The design uses the pTwist Amp High Copy backbone for optimized performance.

  • FIREWALL (AmpR): Provides Ampicillin resistance. This acts as a security filter; any cell that does not contain the “authorized” plasmid is deleted by the antibiotic.
  • MULTI-THREADING (colE1_high_copy): Forces the cell to run hundreds of instances of the program simultaneously. This maximizes the Data Throughput, resulting in high-volume insulin production.

Build Logs:

  • Coordinates: 1-2761 bp
  • Topology: Circular
  • Resistance: Ampicillin
  • Integrity: Verified

Plasmid Map

Part 5: 5.1 DNA READ | 5.2 DNA WRITE | 5.3 DNA EDIT

5.1 DNA READ: PALEOVIROMICS & PERMAFROST SURVEILLANCE

(i) WHAT DNA AND WHY? I intend to sequence ancient viral DNA/RNA (eDNA) extracted from Siberian permafrost cores (Reference: Alempic et al., 2023). As climate change accelerates permafrost thaw, dormant viruses such as Pithovirus and Pandoravirus, giant viruses that infect amoebae, are resurfacing. Sequencing them allows for the creation of a Pre-emptive Pandemic Library to identify ancestral motifs and develop vaccine scaffolds before any zoonotic spillover occurs.

(ii) TECHNOLOGY & METHODOLOGY: Technology: Oxford Nanopore Technologies (ONT) Ultra-Long Read Sequencing.

  • GENERATION: 3rd Generation (Single-molecule, real-time sequencing).
  • INPUT: Environmental DNA (eDNA) from permafrost meltwater.
  • PREPARATION STEPS:
    1. EXTRACTION: Bead-based magnetic isolation of fragmented ancient DNA.
    2. REPAIR: End-repair and A-tailing to fix degraded DNA termini.
    3. TARGETED ENRICHMENT: Hybrid capture using RNA-probe baits to isolate viral sequences from bacterial/fungal background.
    4. ADAPTER LIGATION: Ligating sequencing adapters that carry motor proteins, which feed the DNA through the pores.
  • DECODING (BASE CALLING): DNA passes through a protein nanopore, disrupting an ionic current. Each short stretch of bases creates a characteristic squiggle (electrical signature). Neural-network basecallers such as ‘Dorado’ translate these signals into A, T, C, G sequences.
  • OUTPUT: FASTQ files containing Long Reads (10kb - 2Mb), enabling high-fidelity de novo assembly of unknown viral genomes.
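A first look at the FASTQ output usually starts with read-length statistics. The sketch below assumes plain four-line FASTQ records and uses a tiny made-up sample; the function names `read_lengths` and `n50` are my own, and real runs would read from a file instead of a string.

```python
def read_lengths(fastq_text: str) -> list[int]:
    """Return the length of every read, assuming 4 lines per FASTQ record."""
    lines = fastq_text.strip().splitlines()
    return [len(seq.strip()) for seq in lines[1::4]]


def n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Made-up sample with three very short "reads".
SAMPLE_FASTQ = """\
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTA
+
IIIII
@read3
ACG
+
III
"""

lengths = read_lengths(SAMPLE_FASTQ)
print(lengths, n50(lengths))
```

N50 is a common summary for long-read data because a few very long reads matter more for assembly than many short ones.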

5.2 DNA WRITE: DE NOVO ANTIFREEZE GLYCOPROTEINS (AFGPs)

(i) WHAT DNA AND WHY? I want to synthesize DNA encoding a De Novo Synthetic Antifreeze Glycoprotein (AFGP), inspired by the AFGPs of polar fishes (Antarctic notothenioids and Arctic cods; Reference: Zhuang, 2014).

  • SEQUENCE: [Ala-Ala-Thr]n repeats, optimized for human tissue compatibility.
  • WHY: To enable “Supercooling” in organ transplantation. This DNA would produce proteins that prevent ice crystal formation, extending the viability of donor organs from hours to several days.

(ii) TECHNOLOGY & METHODOLOGY: Technology: Silicon-based Phosphoramidite Synthesis (e.g., Twist Bioscience).

  • ESSENTIAL STEPS:
    1. DE-BLOCKING: Acidic removal of the DMT protective group from the silicon-bound nucleotide.
    2. COUPLING: Addition of the next phosphoramidite monomer (A,T,C, or G).
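    3. CAPPING: Acetylation of unreacted 5′-hydroxyl groups so that failed strands cannot keep extending and produce deletion errors.
    4. OXIDATION: Converting the unstable phosphite triester linkage into a stable phosphate triester.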
  • LIMITATIONS:
    1. SPEED: Chemical synthesis is a multi-day process involving logistics.
    2. SCALABILITY: Individual oligos are limited to ~300bp; longer constructs require Gibson Assembly, which is difficult for repetitive sequences like [Ala-Ala-Thr]n.
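The length limit follows from the chemistry: every coupling step is slightly imperfect, so the fraction of full-length strands shrinks geometrically with length. The sketch below assumes a per-step coupling efficiency of 99.5%, which is an illustrative value and not a vendor specification; the function name `full_length_fraction` is my own.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that received every base.

    The first nucleotide is already attached to the support, so a strand of
    length_nt bases needs length_nt - 1 successful couplings.
    """
    return coupling_efficiency ** (length_nt - 1)


ASSUMED_EFFICIENCY = 0.995  # illustrative assumption, not a measured value

for length in (60, 150, 300):
    fraction = full_length_fraction(length, ASSUMED_EFFICIENCY)
    print(f"{length} nt: {fraction:.1%} full-length")
```

Under this assumption only a bit over a fifth of 300 nt strands would be full length, which is why longer constructs are assembled from shorter pieces.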

5.3 DNA EDIT: MUTATION-AGNOSTIC PROGERIA CORRECTION

(i) WHAT DNA AND WHY? I want to edit the LMNA gene in human fibroblasts to treat Hutchinson-Gilford Progeria Syndrome (HGPS).

  • THE EDIT: Deletion of the CAAX box motif at the C-terminus.
  • WHY: Instead of fixing a patient-specific mutation, removing the CAAX box prevents the toxic protein (progerin) from anchoring to the nuclear membrane. This is a Mutation-Agnostic therapeutic approach applicable to all HGPS patients.

(ii) TECHNOLOGY & METHODOLOGY: Technology: Prime Editing (PE).

  • HOW IT EDITS: Uses an engineered Cas9 nickase fused to a Reverse Transcriptase (RT). It uses a Search-and-Replace mechanism without causing double-strand breaks.
  • ESSENTIAL STEPS:
    1. SEARCH: The pegRNA (prime editing guide RNA) targets the LMNA site.
    2. NICK: The Cas9 nickase cuts only one DNA strand (the PAM-containing strand), leaving the other strand intact.
    3. REPLACE: The RT enzyme synthesizes new DNA directly from the pegRNA template into the nicked site.
  • INPUTS & PREPARATION:
    1. INPUT: Plasmids/mRNA encoding the PE protein, pegRNA, and a nick-gRNA.
    2. DESIGN: Computational modeling of the Primer Binding Site (PBS) thermodynamics to ensure stable hybridization.
  • LIMITATIONS:
    1. EFFICIENCY: Prime editing often has lower “on-target” efficiency in primary cells compared to standard CRISPR.
    2. DELIVERY: The PE complex is too large for many standard viral delivery vectors (AAVs).

REFERENCES & RESOURCES

  1. Alempic, J. M., et al. (2023). “An update on eukaryotic viruses revived from ancient permafrost.” Viruses.
  2. Zhuang, X. (2014). “Creating sense from non-sense DNA: de novo genesis and evolutionary history of antifreeze glycoprotein gene.” UIUC.
  3. Anzalone, A. V., et al. (2019). “Search-and-replace genome editing without double-strand breaks or donor DNA.” Nature.
  4. Twist Bioscience Technical Documentation (2024). “Silicon-based DNA Synthesis.”

Week 3 HW: Lab Automation

Week 3: Lab Automation & Opentrons Art

Introduction

This week’s focus is on the intersection of biology, robotics, and creative coding. As part of the HTGAA 2026 cohort based in Zambia, I am exploring how liquid-handling automation (specifically the Opentrons OT-2) can streamline laboratory workflows. Beyond the technical utility, this assignment challenged us to use the robot as a canvas, translating digital coordinates into physical biological art.

Lab automation isn’t just about efficiency; it’s about precision in environments where resources must be used optimally. My work this week involves a Python-based protocol that instructs the robot to “paint” a design using colored liquids in a 96-well plate.

AI Documentation (Opentrons Python Script)

Model used: Gemini 3 Flash (Free Tier)

Description of AI Contribution: AI was utilized to translate the artistic concept from the Opentrons GUI into a functional Python script using the Opentrons Python API v2 (the script below declares apiLevel 2.20). Specifically, the AI assisted in:

  • Optimization Logic: Implementing a conditional loop (if spots_drawn % 8 == 0) to handle bulk aspiration, which reduces the number of trips the pipette makes to the source reservoir.
  • Spatial Mapping: Calculating relative coordinates using types.Point for precise deposition on an agar plate or flat-bottom well plate.
  • Troubleshooting: Ensuring proper tip handling (e.g., including drop_tip() commands) to prevent cross-contamination and robot errors.
  • Metadata Structure: Properly formatting the protocol metadata and labware loading sequences required for the robot to recognize the script.

The final art concept and the selection of the specific visual ID (zjiq3p93t07ee2n) were directed by the student, while the AI served as a technical co-pilot for the Python implementation.


The Artwork Design

I used the Opentrons Art GUI to map out the coordinates for my design. The visual representation and the specific well-mapping for this protocol can be viewed at the link below:

View my design here: Opentrons Art Design - x8zh29jmvm87u3v


Opentrons Python Protocol

Below is the Python script generated to execute the design. This script defines the labware (tips, reservoir, and plate) and the specific pipetting movements required to recreate the art.

from opentrons import types

metadata = {    # see https://docs.opentrons.com/v2/tutorial.html#tutorial-metadata
    'author': 'ELSA MULEYA',
    'protocolName': 'HTGAA Agar Art - Full Set',
    'description': 'FLORAL ART',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants - don't change these
##############################################################################

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

well_colors = {
    'A1' : 'Red',
    'B1' : 'Green',
    'C1' : 'Orange'
}


def run(protocol):
  ##############################################################################
  ###   Load labware, modules and pipettes
  ##############################################################################

  # Tips
  tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

  # Pipettes
  pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

  # Modules
  temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)

  # Temperature Module Plate
  temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul',
                                                      'Cold Plate')
  # Choose where to take the colors from
  color_plate = temperature_plate

  # Agar Plate
  agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')  ## TA MUST CALIBRATE EACH PLATE!
  # Get the top-center of the plate, make sure the plate was calibrated before running this
  center_location = agar_plate['A1'].top()

  pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

  ##############################################################################
  ###   Patterning
  ##############################################################################

  ###
  ### Helper functions for this lab
  ###

  # pass this e.g. 'Red' and get back a Location which can be passed to aspirate()
  def location_of_color(color_string):
    for well,color in well_colors.items():
      if color.lower() == color_string.lower():
        return color_plate[well]
    raise ValueError(f"No well found with color {color_string}")

  # For this lab, instead of calling pipette.dispense(1, loc) use this: dispense_and_detach(pipette, 1, loc)
  def dispense_and_detach(pipette, volume, location):
      """
      Move laterally 5mm above the plate (to avoid smearing a drop); then drop down to the plate,
      dispense, move back up 5mm to detach drop, and stay high to be ready for next lateral move.
      5mm because a 4uL drop is 2mm diameter; and a 2deg tilt in the agar pour is >3mm difference across a plate.
      """
      assert(isinstance(volume, (int, float)))
      # Location.move() adds an offset, so z=5 alone gives a point 5mm above
      above_location = location.move(types.Point(z=5))
      pipette.move_to(above_location)       # Go to 5mm above the dispensing location
      pipette.dispense(volume, location)    # Go straight downwards and dispense
      pipette.move_to(above_location)       # Go straight up to detach drop and stay high

  ###
  ### YOUR CODE HERE to create your design
  mrfp1_points = [(-8.8, 24.2),(-6.6, 24.2),(6.6, 24.2),(8.8, 24.2),(-11, 22),(-8.8, 22),(-6.6, 22),(-4.4, 22),(4.4, 22),(6.6, 22),(8.8, 22),(11, 22),(-11, 19.8),(-8.8, 19.8),(-6.6, 19.8),(-4.4, 19.8),(-2.2, 19.8),(2.2, 19.8),(4.4, 19.8),(6.6, 19.8),(8.8, 19.8),(11, 19.8),(-11, 17.6),(-8.8, 17.6),(-6.6, 17.6),(-4.4, 17.6),(-2.2, 17.6),(2.2, 17.6),(4.4, 17.6),(6.6, 17.6),(8.8, 17.6),(11, 17.6),(-11, 15.4),(-8.8, 15.4),(-4.4, 15.4),(-2.2, 15.4),(2.2, 15.4),(6.6, 15.4),(8.8, 15.4),(11, 15.4),(-11, 13.2),(-8.8, 13.2),(-2.2, 13.2),(2.2, 13.2),(8.8, 13.2),(11, 13.2),(-22, 11),(-19.8, 11),(-17.6, 11),(-15.4, 11),(-8.8, 11),(-6.6, 11),(6.6, 11),(8.8, 11),(15.4, 11),(17.6, 11),(19.8, 11),(22, 11),(-24.2, 8.8),(-22, 8.8),(-19.8, 8.8),(-17.6, 8.8),(-15.4, 8.8),(-13.2, 8.8),(13.2, 8.8),(15.4, 8.8),(17.6, 8.8),(19.8, 8.8),(22, 8.8),(24.2, 8.8),(-24.2, 6.6),(-22, 6.6),(-19.8, 6.6),(-17.6, 6.6),(-13.2, 6.6),(-11, 6.6),(0, 6.6),(4.4, 6.6),(11, 6.6),(17.6, 6.6),(19.8, 6.6),(22, 6.6),(24.2, 6.6),(-22, 4.4),(-19.8, 4.4),(-17.6, 4.4),(-6.6, 4.4),(6.6, 4.4),(15.4, 4.4),(17.6, 4.4),(19.8, 4.4),(22, 4.4),(24.2, 4.4),(-19.8, 2.2),(-17.6, 2.2),(-15.4, 2.2),(-13.2, 2.2),(13.2, 2.2),(15.4, 2.2),(17.6, 2.2),(19.8, 2.2),(22, 2.2),(-8.8, 0),(0, 0),(8.8, 0),(-19.8, -2.2),(-17.6, -2.2),(-15.4, -2.2),(-13.2, -2.2),(-6.6, -2.2),(0, -2.2),(2.2, -2.2),(6.6, -2.2),(13.2, -2.2),(15.4, -2.2),(17.6, -2.2),(19.8, -2.2),(-22, -4.4),(-19.8, -4.4),(-17.6, -4.4),(-13.2, -4.4),(-6.6, -4.4),(-2.2, -4.4),(6.6, -4.4),(17.6, -4.4),(19.8, -4.4),(22, -4.4),(-24.2, -6.6),(-22, -6.6),(-19.8, -6.6),(-17.6, -6.6),(-11, -6.6),(-4.4, -6.6),(4.4, -6.6),(11, -6.6),(13.2, -6.6),(17.6, -6.6),(19.8, -6.6),(22, -6.6),(24.2, -6.6),(-24.2, -8.8),(-22, -8.8),(-19.8, -8.8),(-17.6, -8.8),(-15.4, -8.8),(-13.2, -8.8),(-11, -8.8),(0, -8.8),(11, -8.8),(13.2, -8.8),(15.4, -8.8),(17.6, -8.8),(19.8, -8.8),(22, -8.8),(24.2, -8.8),(-22, -11),(-19.8, -11),(-17.6, -11),(-15.4, -11),(-13.2, -11),(-8.8, -11),(-6.6, -11),(6.6, -11),(8.8, 
-11),(13.2, -11),(15.4, -11),(17.6, -11),(19.8, -11),(22, -11),(-11, -13.2),(-8.8, -13.2),(-2.2, -13.2),(2.2, -13.2),(8.8, -13.2),(11, -13.2),(-11, -15.4),(-8.8, -15.4),(-6.6, -15.4),(-2.2, -15.4),(2.2, -15.4),(4.4, -15.4),(8.8, -15.4),(11, -15.4),(-11, -17.6),(-8.8, -17.6),(-6.6, -17.6),(-4.4, -17.6),(-2.2, -17.6),(2.2, -17.6),(4.4, -17.6),(6.6, -17.6),(8.8, -17.6),(11, -17.6),(-11, -19.8),(-8.8, -19.8),(-6.6, -19.8),(-4.4, -19.8),(-2.2, -19.8),(2.2, -19.8),(4.4, -19.8),(6.6, -19.8),(8.8, -19.8),(11, -19.8),(-11, -22),(-8.8, -22),(-6.6, -22),(-4.4, -22),(4.4, -22),(6.6, -22),(8.8, -22),(11, -22),(-8.8, -24.2),(-6.6, -24.2),(6.6, -24.2),(8.8, -24.2)]
  sfgfp_points = [(-11, 28.6),(11, 28.6),(-13.2, 26.4),(-11, 26.4),(-8.8, 26.4),(8.8, 26.4),(11, 26.4),(13.2, 26.4),(-13.2, 24.2),(-11, 24.2),(11, 24.2),(13.2, 24.2),(-13.2, 22),(13.2, 22),(-13.2, 19.8),(13.2, 19.8),(-13.2, 17.6),(13.2, 17.6),(-13.2, 15.4),(-6.6, 15.4),(4.4, 15.4),(13.2, 15.4),(-26.4, 13.2),(-24.2, 13.2),(-22, 13.2),(-19.8, 13.2),(-17.6, 13.2),(-6.6, 13.2),(-4.4, 13.2),(4.4, 13.2),(6.6, 13.2),(17.6, 13.2),(19.8, 13.2),(22, 13.2),(24.2, 13.2),(26.4, 13.2),(28.6, 13.2),(-28.6, 11),(-26.4, 11),(-24.2, 11),(24.2, 11),(26.4, 11),(28.6, 11),(30.8, 11),(-26.4, 8.8),(-2.2, 8.8),(0, 8.8),(2.2, 8.8),(26.4, 8.8),(28.6, 8.8),(-15.4, 6.6),(-4.4, 6.6),(-2.2, 6.6),(2.2, 6.6),(13.2, 6.6),(15.4, 6.6),(26.4, 6.6),(-15.4, 4.4),(-13.2, 4.4),(-8.8, 4.4),(-2.2, 4.4),(2.2, 4.4),(8.8, 4.4),(13.2, 4.4),(-8.8, 2.2),(-6.6, 2.2),(-4.4, 2.2),(0, 2.2),(4.4, 2.2),(6.6, 2.2),(8.8, 2.2),(-6.6, 0),(6.6, 0),(-8.8, -2.2),(-4.4, -2.2),(4.4, -2.2),(8.8, -2.2),(-15.4, -4.4),(-8.8, -4.4),(2.2, -4.4),(8.8, -4.4),(13.2, -4.4),(15.4, -4.4),(-15.4, -6.6),(-13.2, -6.6),(-2.2, -6.6),(0, -6.6),(2.2, -6.6),(15.4, -6.6),(-26.4, -8.8),(-2.2, -8.8),(2.2, -8.8),(26.4, -8.8),(-28.6, -11),(-26.4, -11),(-24.2, -11),(24.2, -11),(26.4, -11),(28.6, -11),(-26.4, -13.2),(-24.2, -13.2),(-22, -13.2),(-19.8, -13.2),(-17.6, -13.2),(-6.6, -13.2),(-4.4, -13.2),(4.4, -13.2),(6.6, -13.2),(17.6, -13.2),(19.8, -13.2),(22, -13.2),(24.2, -13.2),(26.4, -13.2),(-13.2, -15.4),(-4.4, -15.4),(6.6, -15.4),(13.2, -15.4),(-13.2, -17.6),(13.2, -17.6),(-13.2, -19.8),(13.2, -19.8),(-13.2, -22),(13.2, -22),(-13.2, -24.2),(-11, -24.2),(11, -24.2),(13.2, -24.2),(-13.2, -26.4),(-11, -26.4),(-8.8, -26.4),(8.8, -26.4),(11, -26.4),(13.2, -26.4),(-11, -28.6),(11, -28.6)]

  # Combine the point data with their corresponding well colors into an art_data dictionary
  art_data = {
      'Red': {
          'well': 'A1',
          'points': mrfp1_points
      },
      'Green': {
          'well': 'B1',
          'points': sfgfp_points
      },
      'Orange': {
          'well': 'C1',
          'points': [] # Add points for Orange if needed, otherwise leave empty
      }
  }

  # --- EXECUTION LOGIC ---
  # All design coordinates are offsets (in mm) from the centre of the agar plate.
  # center_location was defined above as agar_plate['A1'].top(); this assumes the
  # custom htgaa_agar_plate labware exposes a single well 'A1' centred on the plate.

  for color_name, data in art_data.items():
      # Skip colors with no points so that no tip is wasted (e.g. 'Orange')
      if not data["points"]:
          continue

      source = color_plate[data["well"]]  # source well holding this color
      pipette_20ul.pick_up_tip()

      spots_drawn = 0
      for x, y in data["points"]:
          # Refill every 8 spots: each spot is 2uL, so a full load is 16uL.
          # The 'min' keeps the last load down to what the remaining spots need.
          if spots_drawn % 8 == 0:
              pipette_20ul.aspirate(min(16, (len(data["points"])-spots_drawn) * 2), source)

          # Create the relative coordinate on the agar plate
          target = center_location.move(types.Point(x=x, y=y, z=0))

          # Use the helper function to dispense and detach the drop
          dispense_and_detach(pipette_20ul, 2, target)

          spots_drawn += 1

      pipette_20ul.drop_tip()
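The batching logic in the script can be checked without a robot. The sketch below reproduces only the volume arithmetic (2 uL per spot, refill every 8 spots); the function name `plan_aspirations` is hypothetical and is not part of the Opentrons API.

```python
def plan_aspirations(n_spots: int, spot_ul: float = 2, spots_per_load: int = 8) -> list[float]:
    """Return the volume aspirated at each refill for a design with n_spots spots."""
    volumes = []
    for drawn in range(0, n_spots, spots_per_load):
        remaining = n_spots - drawn
        # Same rule as the protocol: a full load, or less if fewer spots remain.
        volumes.append(min(spots_per_load, remaining) * spot_ul)
    return volumes


# A 19-spot design needs two full loads and one partial load.
print(plan_aspirations(19))
```

Summing the returned list gives the total volume of each color that has to be in the source well, plus whatever dead volume the labware needs.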

Opentrons Art Design Results

3. Final Project Ideas


Idea 1: Zambia Mineral-Waste Bioremediation Predictor

  • Technical Problem: Mining tailings in Zambia contain high levels of $Cu$ and $Zn$. Traditional cleaning is too expensive. We need “extremophiles” to stabilize these metals.
  • The Project: A computational pipeline to analyze the genomes of Bacillus and Pseudomonas from Zambian sites. I will search for protein sequences (Metallothioneins) that bind heavy metals.
  • Data Source: NCBI SRA data for “Zambian Mine Tailings,” specifically searching for the pbr (lead) and mer (mercury) operons.

Idea 2: Maize Lethal Necrosis (MLN) Genomic Tracker

  • Technical Problem: MLN is a double infection (MCMV + SCMV) devastating maize. It’s hard to distinguish strains visually.
  • The Project: A Comparative Genomics study comparing RNA sequences of MCMV from East Africa vs. South Africa to see if a unique “Zambian strain” is emerging.
  • Data Source: Nextstrain.org and GenBank, focusing on mutations in the Coat Protein (CP) gene.

Idea 3: Maize Yield “Climate-Window” Predictor

  • Technical Problem: Maize is highly vulnerable to moisture stress during the 2-week silking stage. Climate change has shifted Zambia’s rainy season.
  • The Project: An automated Predictive Model using “Agro-Meteorological” data to calculate Growing Degree Days (GDD) for Zambian hybrids (SeedCo/MRI) against 20 years of rainfall patterns.
  • Data Source: CHIRPS rainfall data for Zambia.
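For Idea 3, the core GDD calculation could look like the sketch below. It uses the simple averaging method with a base temperature of 10 °C, a common convention for maize that I am assuming here; the actual hybrids may call for different thresholds, and the temperatures are made up.

```python
def daily_gdd(t_max_c: float, t_min_c: float, t_base_c: float = 10.0) -> float:
    """Growing degree days for one day: mean temperature above the base, never negative."""
    return max(0.0, (t_max_c + t_min_c) / 2 - t_base_c)


def accumulated_gdd(daily_temps, t_base_c: float = 10.0) -> float:
    """Sum daily GDD over a sequence of (t_max_c, t_min_c) pairs."""
    return sum(daily_gdd(t_max, t_min, t_base_c) for t_max, t_min in daily_temps)


# Made-up (Tmax, Tmin) values in degrees Celsius for four days.
sample_days = [(30, 18), (32, 20), (28, 16), (12, 6)]
print(accumulated_gdd(sample_days))
```

The cold fourth day contributes zero rather than a negative value, which is the point of the `max(0.0, ...)` floor.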

Week 4 HW: Protein Design I

Homework: Protein Design I

Part A. Conceptual Questions

Assignment: Proteins and Amino Acids

1. Amino Acids in 500g of Meat

To calculate the total molecules, we first look at the protein density. Meat is roughly 20% protein by mass.

  • Protein Mass: $500\text{g} \times 0.20 = 100\text{g}$
  • Average Molecular Weight (MW): $100\text{ Daltons (g/mol)}$
  • Moles of AA: $100\text{g} / 100\text{g/mol} = 1\text{ mole}$

Using Avogadro’s number, 1 mole contains approximately $6.022 \times 10^{23}$ molecules. That is roughly 600 sextillion amino acids in a single large steak.


2. Metabolic Identity: Why don’t we turn into cows?

When we ingest beef or fish, our digestive system performs proteolysis. Enzymes such as pepsin and trypsin break down foreign proteins into their constituent amino acids. Our ribosomes then take those bricks and reassemble them into human-specific proteins according to the instructions in our DNA. We don’t become the cow because we recycle the parts, not the blueprints.


3. The Standard 20

While there are hundreds of amino acids found in nature, only 20 are universally encoded.

  • The Frozen Accident Theory: Francis Crick proposed that once life settled on a set of 20 that covered the necessary chemical functionalities (acidic, basic, polar, non-polar), the translation machinery became too complex to change. Adding a new one would have required re-coding the entire genome, which would be evolutionarily lethal.

4. Non-Natural Amino Acids (nAAs)

We can expand the genetic code. By engineering aminoacyl-tRNA synthetases, we can incorporate synthetic amino acids.

  • Design Proposal: p-Azidophenylalanine (pAzF).
  • Function: It contains an azide group ($N_3$) that allows for Click Chemistry. This lets us chemically staple drugs or fluorescent dyes to a protein at a precise location that nature never intended.

5. Pre-Biotic Origins

Before life and enzymes, amino acids were produced through abiotic synthesis.

  • The Miller-Urey Experiment: Demonstrated that simple gases (methane, ammonia, hydrogen) plus an energy source (lightning/sparks) could spontaneously generate glycine and alanine.
  • Astrobiology: Analysis of the Murchison meteorite showed that amino acids can form abiotically in space, plausibly via Strecker-type synthesis, suggesting the ingredients for life are widespread in the solar system.

6. Handedness of D-amino Acid Helices

In biology, we use L-amino acids, which form right-handed α-helices. If you synthesize a peptide using D-amino acids (the mirror image), the resulting helix will be left-handed. The mirror-image stereochemistry of the D-side chains makes a right-handed twist sterically strained and energetically unfavorable.


7. Discovering New Helices

Beyond the common α-helix, we find the 3₁₀-helix and the π-helix. We “discover” these by plotting the backbone dihedral angles φ and ψ on a Ramachandran Plot. By using β-peptides (which have an extra carbon in the backbone), we can create entirely new foldamers with geometries that nature hasn’t explored.


8. Why the Right-Handed Preference?

It comes down to the L-configuration of the alpha-carbon. In a right-handed helix, the side chains (R-groups) point away from the centre, minimising steric clashes. In a left-handed helix made of L-amino acids, the side chains would bump into the backbone and each other, making the structure unstable.


9. β-sheet Aggregation

β-sheets are inherently sticky because they have hydrogen bond donors and acceptors along their edges that are exposed.

  • Driving Force: The Hydrophobic Effect. When β-strands come together, they bury their oily (hydrophobic) side chains away from water. The formation of inter-strand hydrogen bonds then locks them into place. Stacking β-sheets together gives them a crystalline-like lattice.

10. Amyloids: From Disease to Materials

Amyloid plaques (associated with Alzheimer’s) are essentially β-sheets that have aggregated out of control.

  • Utility: These structures are incredibly stable, with a strength comparable to that of steel in some measurements. Scientists are now using amyloid-inspired β-sheets to create functional nanomaterials, such as conductive nanowires or ultra-stable drug-delivery scaffolds.

Part B: Protein Analysis and Visualization

1. Protein Selection: The Bacterial Buster

For this assignment, I chose Hen Egg-White Lysozyme (HEWL). I selected this protein because it is a classic example of structure-equals-function. It acts as a biological weapon by physically slicing through bacterial cell walls. It was also the first enzyme ever to have its 3D structure solved by X-ray crystallography, making it a landmark in biotechnology. It was discovered by Alexander Fleming (before he found penicillin) because he noticed his own nasal mucus could kill bacteria.


2. Sequence Analysis

  • Amino Acid Sequence: KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
  • Length: 129 amino acids.
  • The most frequent amino acid is Asparagine (N), followed by Glycine (G) and Alanine (A), which are tied. The Cysteine (C) residues are the most structurally significant, however, as the 8 cysteines form 4 disulfide “staples” that keep the protein stable.

| Amino Acid | Count | Percentage |
|---|---|---|
| Asparagine (N) | 14 | 10.85% |
| Glycine (G) | 12 | 9.30% |
| Alanine (A) | 12 | 9.30% |

  • Why Asparagine (N)? In the structure of Lysozyme, Asparagine is crucial for its function as a Bacterial Buster. Because Asparagine is excellent at forming hydrogen bonds, these 14 residues act like molecular velcro on the surface of the protein. They help the enzyme stick to the bacterial cell wall (peptidoglycan) so it can stay in place long enough to perform its catalytic cut.
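The composition figures can be reproduced directly from the sequence. This is a minimal sketch using only the Python standard library; the variable name `HEWL` is my own, and the sequence is the one listed above, split across two lines for readability.

```python
from collections import Counter

# Hen egg-white lysozyme sequence from the Sequence Analysis section.
HEWL = (
    "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINS"
    "RWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL"
)

counts = Counter(HEWL)

print("Length:", len(HEWL))
for residue, n in counts.most_common(3):
    print(residue, n, f"{n / len(HEWL):.2%}")
# Eight cysteines are enough for four disulfide bonds.
print("Cys:", counts["C"])
```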

Homologs and Evolutionary Relatives

Using the UniProt BLAST tool, I searched for sequences similar to my Lysozyme query.

  • Homolog Count: The search returned over 250 homologs.
  • Diverse Species: Homologs were found across a wide range of vertebrates, including:
  • Birds: Quail (Colinus virginianus), Pheasants (Phasianus colchicus), and Turkeys (Meleagris gallopavo).
  • Reptiles: Turtles (Chelydra serpentina) and Alligators (Alligator sinensis).
  • Mammals: Humans (Homo sapiens), Gorillas, and even Milk isozymes in Cattle (Bos taurus).

Protein Family Classification

My protein belongs to the Glycosyl Hydrolase Family 22 (GH22). Members of this family are specialized enzymes that identify and break the $\beta(1\rightarrow4)$ glycosidic bonds in the peptidoglycan of bacterial cell walls. Essentially, being part of this family means the protein’s primary job is to act as a highly specific pair of molecular scissors.


3. Protein Structure & Bioinformatics Analysis: 1LZ1

a. RCSB PDB Structure Overview

I identified the structural data for my protein using the RCSB Protein Data Bank. The entry I analyzed, 1LZ1, is human lysozyme, a C-type homolog of HEWL that shares the same fold.

  • PDB ID: 1LZ1
  • Structure Title: Refinement of Human Lysozyme at 1.5 Angstroms Resolution.
  • Release Date: The structure was officially released on 1985-01-02.
  • Resolution: 1.50 Å.
  • Quality Assessment: This is an excellent-quality structure. Because 1.50 Å is a much smaller (better) value than the 2.70 Å benchmark, it provides near-atomic detail, allowing us to see precise hydrogen-bond interactions.

b. Composition and Non-Protein Molecules

Apart from the protein chain, the solved crystal structure contains:

  • Nitrate Ions ($NO_3^-$): Found in the crystallization buffer.
  • Water ($H_2O$): Ordered water molecules in the crystal help explain the protein’s stability in a liquid environment. Water is also a reactant: lysozyme is a hydrolase, meaning its biological function is to use water to break chemical bonds in sugars.

Structural Classification Analysis

I analyzed the structural hierarchy of my protein using the SCOP2 (Structural Classification of Proteins) database.

  • PDB ID: 1LZ1
  • SCOP Representative: 2NWD X
  • Class: Alpha and beta proteins (a+b)
  • Fold: Lysozyme-like
  • Superfamily: Lysozyme-like
  • Family: C-type lysozyme

Summary: My protein belongs to the C-type lysozyme family. This structural classification is significant because it groups my protein with other evolutionary relatives (like the chicken lysozyme found in my BLAST search) that share a specific alpha+beta fold. This shape creates the cleft, or active site, that allows the protein to function as a hydrolase, breaking down bacterial cell walls. On the RCSB PDB page, the protein is formally classified as a HYDROLASE (O-GLYCOSYL), which confirms its mechanistic class: enzymes that use a water molecule to break the sugar bonds in bacterial cell walls.


4. PyMOL Protein Analysis: Lysozyme (PDB 1LZ1, the Human Homolog of HEWL)

a. Protein Visualizations

I visualized the Lysozyme protein using three different representation methods to understand its structure at various scales.

  • Cartoon: Shows the overall 3D folding architecture and secondary structure flow.
  • Ribbon: Simplifies the view by tracing only the polypeptide backbone.
  • Ball and Stick (Sticks): Reveals the precise location of every atom and the chemical bonds connecting them.
[Figure: View 1]

b. Secondary Structure: Helices vs. Sheets

By coloring the protein by its secondary structure, I analyzed the building blocks of its shape.

  • Observation: The protein is dominated by alpha-helices (colored red).
  • Analysis: Several prominent spiral helices form the core of the protein. In contrast, there is only one small anti-parallel beta-sheet (colored yellow) acting as a structural wing on the side.
[Figure: View 2]

c. Residue Distribution: Hydrophobic vs. Hydrophilic

I colored the residues to see how the protein interacts with its watery environment in an egg white.

  • Hydrophobic (Orange): These water-fearing residues are almost entirely tucked away inside the protein’s core.
  • Hydrophilic (Gray): These water-loving residues dominate the outer surface.
  • Conclusion: This follows the oil drop model of protein folding, where the hydrophobic core is shielded from water to maintain stability.

My PyMOL Protein Views

[Figure: View 3]

d. Surface Analysis and Binding Pockets

Visualizing the molecular surface allows us to see how the protein grabs its targets.

  • Observation: The protein is not a solid sphere; it has a very distinct binding pocket.
  • Analysis: A deep canyon or cleft is clearly visible cutting across the center of the molecule.
  • Function: This cleft is the active site where the lysozyme captures and breaks down the sugar chains (polysaccharides) of bacterial cell walls.

[Figure: View 4]

C1. Protein Language Modeling

In this section, I explored the capabilities of modern protein AI models using Bacteriorhodopsin (PDB: 1C3W) as a model system. Bacteriorhodopsin is a sophisticated light-driven proton pump found in Halobacterium salinarum, characterized by its iconic seven-transmembrane alpha-helical structure.


a. Deep Mutational Scanning with ESM2

Using the ESM2 language model (specifically the esm2_t6_8M_UR50D variant), I generated an unsupervised deep mutational scan of the 1C3W sequence. The model predicts the “fitness” of every possible single-point mutation by calculating the log-likelihood of each amino acid at every position in the sequence.
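
To make the scoring idea concrete, here is a minimal sketch of how a per-position mutation score can be derived from a language model's output. It assumes the common convention of scoring a mutation as log p(mutant) minus log p(wild type); the notebook I used may normalize differently. The logits below are made-up toy numbers, not ESM2 output, and the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mutation_scores(position_logits, wild_type):
    """Score each amino acid at one position as log p(aa) - log p(wild type)."""
    log_probs = log_softmax(position_logits)
    wt_log_prob = log_probs[AMINO_ACIDS.index(wild_type)]
    return {aa: lp - wt_log_prob for aa, lp in zip(AMINO_ACIDS, log_probs)}


# Toy logits for a single position (assumption: NOT real model output).
toy_logits = [0.0] * 20
toy_logits[AMINO_ACIDS.index("L")] = 3.0   # model strongly prefers Leu
toy_logits[AMINO_ACIDS.index("P")] = -2.0  # model disfavors Pro

scores = mutation_scores(toy_logits, wild_type="L")
print(f"L->P score: {scores['P']:.1f}")
```

Repeating this for every position gives the position-by-amino-acid matrix that the heatmap displays. Under this convention a score near zero means the substitution is about as likely as the wild-type residue, and a strongly negative score marks a disfavored one.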

The Heatmap Analysis

The heatmap visualizes the model scores, where the x-axis represents the residue position and the y-axis represents the 20 standard amino acids:

  • High Scores (Yellow/Light Green): Indicate mutations the AI predicts are favorable or neutral.
  • Low Scores (Dark Purple/Blue): Indicate mutations predicted to be destabilizing or functionally detrimental.

Identifying a Standout Mutation

A particularly interesting pattern emerged at Position 168:

  • The Observation: While many transmembrane residues are highly constrained (visible as dark vertical columns), position 168 shows a high tolerance for Proline (P), with a model score of 5.394987.
  • The Interpretation: In the context of a 7-helix bundle, Proline usually acts as a helix breaker. However, the AI’s high score suggests that at this specific coordinate, the structural “kink” or rigidity introduced by Proline is actually beneficial for the protein’s native fold or its conformational light-cycle.
[Figure: Mutation Scan Heatmap]

(Bonus) Experimental Comparison

Experimental data for Bacteriorhodopsin highlights critical residues like D85 and D96 as essential for proton transport. My ESM2 scan is consistent with this: these positions appear as dark vertical stripes, meaning the language model assigned low likelihoods to almost all mutations at these sites. This suggests that the model has learned functional biological constraints purely from evolutionary sequence data.


Latent Space Analysis: Mapping the Protein Universe

After processing 15,177 sequences from the ASTRAL dataset through the ESM2 transformer, I projected the resulting high-dimensional embeddings into a 3D latent space using t-SNE. This visualization allows us to see how the AI categorizes proteins without any human-labeled data.

Neighborhood Analysis: Structural Peer Groups

Looking at the 3D scatter plot, it is clear that the neighborhoods are not random. The clusters represent distinct structural architectures:

  • The Neighborhoods: The map forms a dense central mass of globular, soluble proteins with distinct arms extending outward. These arms represent specialized folds, such as all-beta sheets or long alpha-helical bundles.
  • Biological Logic: Proteins in the same neighborhood share similar biophysical properties. By hovering over the data points, I found that proteins clustered near my target are often involved in energy transduction or membrane transport.

1C3W Position & Neighborhood

I placed my protein, Bacteriorhodopsin (1C3W), into this map to see who its neighbors are.

  • The Neighbors: My protein landed in a cluster populated by other transmembrane proteins, such as Vacuolar ATP synthase subunits (visible in my analysis as the yellow cluster).
  • Position Significance: 1C3W sits in a specialized island on the periphery of the main protein cloud. This position is highly significant because it reflects the protein’s hydrophobic nature.
  • Conclusion: The AI grouped Bacteriorhodopsin with other membrane-embedded proton pumps and synthases. Even though the sequence identity might be low, the model appears to recognize the shared “structural grammar” required to span a lipid bilayer. This suggests that the ESM2 latent space approximates biological function and fold topology purely from sequence data.
[Figure: Universal Protein Map]

C2. Protein Folding with ESMFold

In this stage, I used ESMFold to predict the 3D atomic structure of Bacteriorhodopsin (1C3W) directly from its amino acid sequence. This test determines if the AI can accurately recreate the physical geometry of a complex membrane protein.

Fold Results & Structural Accuracy

The ESMFold prediction was highly successful. The model generated a clear, seven-transmembrane alpha-helical bundle that aligns almost perfectly with the original experimental structure from the PDB.

  • The Verdict: The predicted coordinates match the original structure with high confidence. The AI correctly identified the hydrophobic nature of the sequence and packed the helices into the characteristic seven-helix bundle required for its function as a proton pump.
[Figure: ESMFold Predicted Structure]
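
One way to put a number on “high confidence” is to average the per-residue confidence (pLDDT) stored in the predicted PDB file. The sketch below assumes that the prediction tool writes pLDDT into the B-factor column of each ATOM record, which is the usual convention for ESMFold and AlphaFold outputs; whether the scale runs 0 to 1 or 0 to 100 depends on the tool version. The helper names and the three-atom demo file are hypothetical.

```python
def atom_line(serial, name, res_name, res_seq, b_factor):
    """Build a fixed-column PDB ATOM record (dummy coordinates) for the demo."""
    return "ATOM  {:5d} {:<4s} {:3s} A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, name, res_name, res_seq, 0.0, 0.0, 0.0, 1.0, b_factor
    )


def mean_ca_bfactor(pdb_text):
    """Average the B-factor column (columns 61-66) over C-alpha atoms only."""
    values = []
    for line in pdb_text.splitlines():
        # Atom name lives in columns 13-16 of an ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha atoms found")
    return sum(values) / len(values)


# Tiny made-up structure: two residues, B-factors standing in for pLDDT.
demo_pdb = "\n".join([
    atom_line(1, " N  ", "MET", 1, 90.0),
    atom_line(2, " CA ", "MET", 1, 90.0),
    atom_line(3, " CA ", "ALA", 2, 70.0),
])
demo_mean = mean_ca_bfactor(demo_pdb)
print(f"mean C-alpha pLDDT: {demo_mean:.1f}")
```

Using one value per residue (the C-alpha atom) avoids weighting large side chains more heavily than small ones.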

Sequence Resilience & Mutation Testing

I performed two separate “stress tests” on the sequence to see how much change the structure could tolerate before it collapsed.

1. Small Mutations (The Point Test)

I first introduced minor point mutations into the loop regions of the protein.

  • Observation: The protein was remarkably resilient. The overall 7-helix bundle remained intact, with only tiny shifts in the flexible loops. This shows the fold is robust against minor “noise” in non-structural areas.
[Figure: Small Mutation Fold Structure]

2. Large Segments (The Collapse Test)

I then replaced a large, 20-residue segment of a core transmembrane helix with flexible Glycines to break the structural pillar.

  • Observation: The structure was not resilient to this change. The helical bundle was significantly distorted, and the roughly parallel packing of the helices caved in.
  • Conclusion: Bacteriorhodopsin is resilient to surface-level mutations but highly dependent on the integrity of its transmembrane helices. The “grammar” of this protein requires these rigid pillars to stay upright; once a pillar is removed, the entire architecture fails.
[Figure: Large Mutation Fold Structure]
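
Both stress tests come down to simple string edits before the sequence is submitted for folding. The sketch below shows the two operations under stated assumptions: the toy sequence and the positions are placeholders rather than the ones I used for 1C3W, and the helper names are my own.

```python
def point_mutate(seq, position, new_residue):
    """Substitute one residue; position is 1-based."""
    i = position - 1
    if not 0 <= i < len(seq):
        raise IndexError("position outside sequence")
    return seq[:i] + new_residue + seq[i + 1:]


def replace_segment(seq, start, length, filler="G"):
    """Overwrite `length` residues beginning at 1-based `start` with `filler`."""
    i = start - 1
    if i < 0 or i + length > len(seq):
        raise IndexError("segment outside sequence")
    return seq[:i] + filler * length + seq[i + length:]


# Placeholder 40-residue sequence (assumption: not bacteriorhodopsin).
toy = "ACDEFGHIKLMNPQRSTVWY" * 2

small = point_mutate(toy, 3, "A")        # "point test"
large = replace_segment(toy, 11, 20)     # "collapse test": 20 glycines

print(small)
print(large)
```

Both helpers keep the sequence length unchanged, so residue numbering in the predicted structures stays comparable to the original fold.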

C3. Protein Generation (Inverse Folding)

In the final stage of my project, I moved beyond studying natural proteins to de novo sequence design. I used ProteinMPNN to perform inverse folding: the process of providing the AI with a fixed 3D backbone and asking it to “dream up” a brand-new amino acid sequence that would stabilize that specific shape.

The Inverse-Folding Process

  • The Blueprint: I provided the high-confidence 3D coordinates (PDB file) of my Bacteriorhodopsin fold as the structural input.
  • Sequence Analysis: The model generated several candidate sequences. My top-ranked design had a Sequence Recovery of 47.3%.
  • Observations: This means that the model chose a different residue at 52.7% of the positions. While the “letters” changed significantly, the predicted sequence probabilities remained high for residues that maintain the hydrophobic core of the helices.

`> Generating sequences…`

`tmp, score=1.6136, fixed_chains=[], designed_chains=['A'], model_name=v_48_020`

`TGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGTILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFSMRPEVASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGFGLILLRSRAIFG`

`T=0.1, sample=0, score=0.9199, seq_recovery=0.4730`

`APDLSKPWWAIGTIVFLLGTIFFAVRGLLVTDPRARVFYILCTLIPLIMFICYLAILLGFGWVLVPVDGKLRKVPVARYVGWLLTGPLILLCLARLVDAPAGLVALLVALWLVSVLTALLGALSTVPPLRFVFLAISTAALLLILAILLFGFTLDPRVRPTYLVLVALFVVLWLLYPVVLLLGPEGLNVIPLHVFEALVTVLDILLTVGFGLILVSSPAIFS`

`T=0.1, sample=0, score=0.9118, seq_recovery=0.4459`

`APRLWAPWVALGTAVMAAGAVYFAARGARVTDPRARWFYVLATLIPLIMAVCFLAILLGLGVVLVPKDGKLRPIPVMLFVGWLLTGPLILLCLARLVDASPALIALLVALWVIAVLSALIGALSTIPPLRFVFLAISTLALLIILYILLFGFTLDPRVRPTYLVLVTLFIILWSLYPIILLLGPFGLNLIPLSVFMALITVLDILLTVGFGLILLASPAIRA`

New Sequence: `APDLSKPWWAIGTIVFLLGTIFFAVRGLLVTDPRARVFYILCTLIPLIMFICYLAILLGFGWVLVPVDGKLRKVPVARYVGWLLTGPLILLCLARLVDAPAGLVALLVALWLVSVLTALLGALSTVPPLRFVFLAISTAALLLILAILLFGFTLDPRVRPTYLVLVALFVVLWLLYPVVLLLGPEGLNVIPLHVFEALVTVLDILLTVGFGLILVSSPAIFS`
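
The `seq_recovery` value in the log is the fraction of positions where the designed sequence keeps the native residue. A minimal sketch of that calculation follows; the function name is my own, and the two eight-residue strings are toy inputs rather than the real 1C3W sequences.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length (fixed-backbone design)")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Toy example: 6 of 8 positions are identical.
toy_native = "MKTLLVAG"
toy_design = "MKSLIVAG"
recovery = sequence_recovery(toy_native, toy_design)
print(f"recovery = {recovery:.3f}, redesigned = {1 - recovery:.3f}")
```

Because inverse folding keeps the backbone fixed, the two sequences have the same length and can be compared position by position without an alignment step.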


Validation: AI Sequence vs. Natural Shape

To test whether the design worked, I took the top-ranked AI-generated sequence (the “New Sequence” listed above) and fed it back into ESMFold to see whether it would still form the seven-transmembrane bundle.

  • The Result: The validation was successful. Despite sharing less than half of its residues with the natural sequence, the synthetic sequence was predicted to fold into the same 7-helix architecture.
  • Comparison: The predicted structure for the synthetic sequence closely matches the original 1C3W backbone, suggesting that the model captured the “structural grammar” of the protein.
[Figure: ESMFold result of the AI-designed synthetic sequence]

Final Project Conclusion

This journey from sequence analysis to de novo design highlights a fundamental principle of modern bioengineering: Structure is more conserved than sequence. Through this lab, I have demonstrated that:

  1. Language Models (ESM2) can organize the protein universe by structural similarity without being explicitly taught physics.
  2. Folding models (ESMFold) can accurately predict complex transmembrane architectures.
  3. Inverse-folding models (ProteinMPNN) allow us to design entirely new, non-natural sequences that fulfill specific geometric goals.

This capability is the cornerstone of the next generation of drug discovery and synthetic biology, allowing us to build custom molecular machines from the ground up.

link to protein language modeling work: https://colab.research.google.com/drive/1TyJ7DqysYyLd2P1MPcW8_aDekQIB6x07?usp=sharing


HTGAA 2026: Bacteriophage Engineering Project

Topic: Engineering the MS2 L-Protein for Enhanced Lytic Kinetics


1. Project Goals

Our team is focusing on two primary engineering objectives for the MS2 bacteriophage L-protein:

  • Increased Toxicity (Hard): Optimize lytic kinetics to trigger faster host cell lysis by bypassing the DnaJ-dependent “damping” mechanism.
  • Increased Stability (Easy): Redesign the N-terminal and transmembrane domains to prevent proteolytic degradation, ensuring robust protein accumulation.

2. Proposed Computational Pipeline

Step 1: Generative Sequence Design (Evo 2)

  • Approach: We will utilize the Evo 2 genome language model to generate a library of novel MS2 L variants. We will specifically prompt the model to design “L-odj-like” variants (L-overcomes-DnaJ) by modifying the N-terminal Domain 1.
  • Reasoning: Evo 2 can navigate novel evolutionary spaces beyond the 67 unique mutations identified in natural screens, accessing sequence diversity that purely experimental methods might miss.

Step 2: Sequence Stability Optimization (ProteinMPNN)

  • Approach: Use ProteinMPNN to perform inverse folding on the core Transmembrane Domain (TMD) of the generated candidates.
  • Reasoning: ProteinMPNN redesigns sequences to fit the specific 3D backbone required for membrane insertion while optimizing for thermodynamic stability, preventing accumulation defects.

Step 3: Functional Motif Tuning (ESM-2 / ESM-3)

  • Approach: Use ESM-2/3 protein language models to extract embeddings and perform in silico mutagenesis on the essential Leu48-Ser49 (LS) motif.
  • Reasoning: ESM models identify which substitutions in the surrounding Domain 2 and Domain 4 preserve the critical hydrophobic and polar character necessary for function.

Step 4: Oligomerization Verification (AlphaFold-Multimer)

  • Approach: Use AlphaFold-Multimer to predict the ability of designed variants to assemble into high-order oligomeric complexes (decamers or higher).
  • Reasoning: MS2 L must form large membrane-disrupting clusters. This step validates if mutations at the TMD interface promote or hinder essential assembly.

3. Pipeline Schematic: From Sequence to Pore

To engineer the MS2 L-protein, we utilize a tiered computational pipeline. This workflow moves from broad “sequence discovery” to high-resolution “structural validation,” ensuring each candidate is both stable and functional before experimental testing.

Phase 1: Sequence Discovery via Evo 2

The Architect: We initiate the pipeline using Evo 2, a genomic-scale language model. By providing the MS2 genome as context, we prompt the model to generate novel L-protein sequences. Unlike traditional mutagenesis, Evo 2 identifies long-range dependencies within the genome, allowing us to design “L-odj” (overcomes DnaJ) variants that can bypass host inhibitory mechanisms while maintaining the integrity of the viral life cycle.

Phase 2: Stability Refinement via ProteinMPNN

The Reinforcer: Generative models can sometimes produce “orphan” sequences that are theoretically toxic but physically unstable. We use ProteinMPNN to perform inverse folding on the Transmembrane Domain (TMD). By fixing the 3D backbone required for membrane insertion and “redesigning” the amino acid side chains, we maximize the thermodynamic stability of the protein. This ensures the L-protein accumulates in the E. coli membrane rather than being degraded by host proteases.

Phase 3: Functional Filtering via ESM-2/3

The Evaluator: To ensure our redesigned sequences haven’t lost their “killing power,” we use ESM-2/3 (Evolutionary Scale Models). We extract embeddings to perform zero-shot fitness predictions, specifically focusing on the essential Leu48-Ser49 (LS) motif. This step acts as a filter: any sequence that deviates from the hydrophobic and polar requirements of the LS-motif (the core engine of MS2-induced lysis) is discarded.

Phase 4: Quaternary Validation via AlphaFold-Multimer

The Gatekeeper: The final and most rigorous check involves AlphaFold-Multimer. MS2 L-protein does not work in isolation; it must oligomerize into high-order clusters (likely decamers) to create a pore large enough for cytoplasmic leakage. We model the top 10 candidates in a 10-mer configuration to verify that our mutations haven’t disrupted the protein-protein interfaces required for assembly. Only candidates that show a stable, pore-forming geometry are selected for synthesis.

4. Potential Pitfalls

  • The Suicide Problem: If our engineered L protein is too toxic and bypasses DnaJ entirely, it might lyse the E. coli before the phage has finished replicating its genome and assembling new particles. This premature lysis would yield few or no phage progeny, making the engineering a failure for phage therapy applications.

  • Membrane Complexity: Most of these tools (like AlphaFold and ProteinMPNN) were trained predominantly on soluble proteins. Modeling a protein that lives entirely inside a lipid bilayer is computationally noisy, and the predicted oligomers might not behave the same way in a real, pressurized bacterial membrane.

References

  1. Nelson, D. L., & Cox, M. M. (2021). Lehninger Principles of Biochemistry. 8th Ed.
  2. Miller, S. L. (1953). “A Production of Amino Acids Under Possible Primitive Earth Conditions.” Science.
  3. Dobson, C. M. (2003). “Protein folding and misfolding.” Nature.
  4. Crick, F. H. (1968). “The origin of the genetic code.” Journal of Molecular Biology.

Week 5 HW: Protein Design Part II

Week 5: Protein Design Part II

SOD1 Binder Peptide Design and Evaluation

Part 1: Generate Binders with PepMLM

The human SOD1 sequence was retrieved from UniProt (P00441). The A4V mutation (Alanine to Valine) was introduced into the wild-type sequence to create the target for peptide generation. A4V is numbered on the mature protein, which lacks the initiator methionine, so it corresponds to residue 5 of the UniProt sequence below. Using the PepMLM-650M model, four 12-amino acid peptides were generated, and the known binder FLYRWLPSRRGG was added as a control.

[Figure: htgaa-week5-sod1-protein]

Wild-type sequence:

`>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2`

`MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ`

The human SOD1 sequence with the A4V mutation:

`MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ`
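
Because A4V is conventionally numbered on the mature protein (without the initiator methionine) while the UniProt entry counts that methionine, it is easy to mutate the wrong residue. The following is a small sketch of a substitution helper that checks the expected wild-type residue before editing; the function name and the `offset` parameter are my own illustration, not part of PepMLM or UniProt tooling.

```python
import re

# First 11 residues of UniProt P00441, initiator Met included.
SOD1_N_TERM = "MATKAVCVLKG"


def apply_substitution(seq, mutation, offset=0):
    """Apply a mutation such as 'A4V'; `offset` shifts the numbering (1 if seq keeps Met1)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position + offset - 1
    if not 0 <= index < len(seq):
        raise ValueError("position outside sequence")
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + new + seq[index + 1:]


mutant = apply_substitution(SOD1_N_TERM, "A4V", offset=1)
print(SOD1_N_TERM, "->", mutant)
```

With `offset=0` the helper raises an error because residue 4 of the UniProt numbering is a lysine, which is a useful guard against an off-by-one mistake.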

Sequence Data & Perplexity Scores

The perplexity scores below represent the model’s confidence in the generated sequences (lower scores generally indicate higher confidence).

[Figure: pepmlm-650m-peptide-generation]

| Peptide ID | Sequence | Perplexity |
|---|---|---|
| Peptide 1 | WHYPVVAVALKX | 9.85 |
| Peptide 2 | WHYPAVGLALKX | 9.74 |
| Peptide 3 | WLSYAVAAALGE | 10.14 |
| Peptide 4 | WLVGVTVLRLKE | 25.60 |
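
For intuition about what these numbers mean, perplexity is the exponential of the average negative log-probability the model assigns to each residue. The sketch below uses made-up per-residue log-probabilities; exactly how PepMLM masks and scores the peptide positions is not reproduced here, so treat this as an illustration of the formula only.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability over the tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy case: a 12-residue peptide where every residue gets probability 0.1.
toy_log_probs = [math.log(0.1)] * 12
toy_ppl = perplexity(toy_log_probs)
print(f"perplexity = {toy_ppl:.2f}")
```

A perplexity near 10 therefore means the model was, on average, about as uncertain as if it were choosing among ten equally likely residues at each position, which puts Peptide 4's score of 25.60 in context.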

Part 2: Evaluate Binders with AlphaFold3

Each peptide was modeled in complex with the A4V mutant SOD1 using the AlphaFold Server. The following results detail the structural affinity and localization of each candidate.

1. WHYPVVAVALKX

  • Scores: ipTM: 0.44 | pTM: 0.79
  • Binding Site: This peptide engages the β-barrel region on the exterior surface of the protein.
  • Localization: It does not localize near the N-terminus/A4V mutation site.
  • Burial State: It appears surface-bound, showing moderate contact with the protein exterior but remaining mostly exposed to the solvent.
[Figure: pepmlm-650m-peptide-generation-1]

2. WHYPAVGLALKX

  • Scores: ipTM: 0.38 | pTM: 0.79
  • Binding Site: This peptide appears loosely bound to a distal loop area far from the mutation.
  • Localization: It fails to localize to the N-terminus or the dimer interface.
  • Burial State: It is surface-bound and lacks a deep binding pocket, suggesting a weak interaction.
[Figure: pepmlm-650m-peptide-generation-4]

3. WLSYAVAAALGE

  • Scores: ipTM: 0.30 | pTM: 0.73
  • Binding Site: This sequence shows no specific site preference and remains dissociated.
  • Localization: No proximity to the A4V site or the β-barrel.
  • Burial State: It appears unbound/solvent-exposed, indicating a non-binder.
[Figure: pepmlm-650m-peptide-generation-3]

4. WLVGVTVLRLKE

  • Scores: ipTM: 0.30 | pTM: 0.80
  • Binding Site: Similar to Peptide 3, this peptide remains detached from the protein body.
  • Localization: Far from the A4V mutation site.
  • Burial State: Fully exposed; the model shows no structured interaction with the SOD1 surface.
[Figure: pepmlm-650m-peptide-generation-4]

5. FLYRWLPSRRGG (Known Binder)

  • Scores: ipTM: 0.36 | pTM: 0.83
  • Binding Site: Unexpectedly, AlphaFold places this binder against the β-barrel rather than the N-terminus.
  • Localization: It does not localize to the destabilized A4V region in this specific mutant model.
  • Burial State: It is partially buried against the barrel but does not form a deep complex.
[Figure: htgaa-sod1-peptide-validation]

Comparative Analysis of ipTM Values

The observed ipTM values across all five peptides range from 0.30 to 0.44, all of which fall below the 0.5 threshold generally required for a “high-confidence” interaction. Peptide 1 (WHYPVVAVALKX) achieved the highest score at 0.44, followed by Peptide 2 at 0.38. Interestingly, the known binder FLYRWLPSRRGG yielded an ipTM of only 0.36, meaning that my top PepMLM-generated peptide (Peptide 1) exceeded the known binder in predicted interface confidence. While none of the peptides “capped” the A4V mutation at the N-terminus, the AI-generated sequences showed comparable, and for Peptides 1 and 2 slightly higher, interface confidence than the established baseline. Note that ipTM reflects the model's confidence in how the chains are placed relative to each other; it is not a direct measure of binding affinity or stability.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence from AlphaFold3 provides a visual starting point, but therapeutic viability requires assessing physicochemical properties. Using PeptiVerse, I evaluated the solubility, toxicity, and chemical affinity of the generated sequences against the A4V mutant SOD1 protein.

Therapeutic Property Data

| Peptide | Sequence | ipTM (AF3) | Binding Affinity (pKd/pKi) | Solubility (Prob.) | Hemolysis (Prob.) | Net Charge (pH 7) |
|---|---|---|---|---|---|---|
| Peptide 1 | WHYPVVAVALKX | 0.44 | 5.643 (Weak) | 1.000 (Soluble) | 0.045 (Non-Hemo) | +0.85 |
| Peptide 2 | WHYPAVGLALKX | 0.38 | 5.802 (Weak) | 1.000 (Soluble) | 0.028 (Non-Hemo) | +0.85 |
| Peptide 3 | WLSYAVAAALGE | 0.30 | 6.110 (Weak) | 1.000 (Soluble) | 0.120 (Non-Hemo) | -1.23 |
| Peptide 4 | WLVGVTVLRLKE | 0.30 | 6.504 (Weak) | 1.000 (Soluble) | 0.121 (Non-Hemo) | +0.77 |

Comparative Analysis

In comparing the structural data from AlphaFold3 with the chemical predictions from PeptiVerse, there is an inverse relationship between structural confidence and predicted chemical affinity in this dataset. While Peptides 3 and 4 showed the highest predicted chemical affinity (6.110 and 6.504 pKd/pKi respectively), they displayed the lowest structural confidence (ipTM 0.30) and appeared dissociated in AlphaFold3 models. Conversely, Peptide 1, which had the highest structural docking confidence (ipTM 0.44), showed a lower chemical affinity score of 5.643.
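
The inverse trend can be checked with a quick correlation over the four generated peptides. This is only a rough sketch: with four data points the coefficient is descriptive rather than statistically meaningful, and the helper `pearson_r` is written out by hand so the example has no dependencies.

```python
import math

# Values copied from the table above (Peptides 1-4).
IPTM = [0.44, 0.38, 0.30, 0.30]
AFFINITY = [5.643, 5.802, 6.110, 6.504]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(IPTM, AFFINITY)
print(f"Pearson r between ipTM and predicted affinity: {r:.2f}")
```

A negative coefficient here simply restates the observation in the paragraph above: in this small set, the peptides that AlphaFold3 placed most confidently were not the ones PeptiVerse scored as the tightest binders.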

Regarding safety, all peptides are predicted to be highly soluble (1.000 probability). However, Peptides 3 and 4 show a significantly higher hemolysis probability (~0.12) compared to Peptide 1 (0.045) and Peptide 2 (0.028), making them riskier for blood-contacting therapeutic applications.

Final Selection & Justification

Selected Candidate: Peptide 1 (WHYPVVAVALKX)

Justification: I have chosen to advance Peptide 1 as the lead candidate for SOD1 stabilization. Although PeptiVerse predicted weak affinity for all candidates, Peptide 1 has the highest interface confidence in AlphaFold3, suggesting a more defined binding pose compared to the others. Critically, it pairs this structural potential with a favorable predicted safety profile: a solubility probability of 1.000 and the second-lowest hemolysis risk (0.045). This combination of docking confidence and low predicted toxicity makes it the most viable candidate for further in vitro synthesis and stabilization assays for the A4V mutant.

Part 4: Targeted Peptide Design with moPPIt

In this final phase, I transitioned from general sequence sampling to directed design using moPPIt (Multi-Objective Guided Discrete Flow Matching). While my earlier work with PepMLM was useful for identifying potential binders across the protein surface, moPPIt allowed me to specifically “steer” the AI to design peptides for the A4V mutation site while simultaneously optimizing for therapeutic safety.

Design Strategy and Hotspot Targeting

To address the destabilization caused by the A4V mutation, I constrained the design process to residues 1-10 (the N-terminus). I enabled multi-objective guidance to prioritize high Affinity and Solubility while minimizing Hemolysis risk. The model utilized “Motif Guidance” to sculpt 10-mer peptides specifically for this pocket.

moPPIt Generated Results

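| Binder | Sequence | Hemolysis (Prob) | Solubility | Affinity (pKd) | Motif Score |
|---|---|---|---|---|---|
| Lead 1 | AGWLLGQTLA | 0.849 | 0.40 | 5.858 | 0.018 |
| Lead 2 | DYYEKWKATN | 0.923 | 0.80 | 5.223 | 0.210 |
| Lead 3 | WQKWVKRTAC | 0.916 | 0.60 | 4.389 | 0.315 |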

Analysis: moPPIt vs. PepMLM

Comparing these results to the initial PepMLM sequences reveals a significant shift in design quality:

  1. Controlled Localization: PepMLM binders primarily docked to the stable $\beta$-barrel. In contrast, the moPPIt sequences were steered to interact specifically with the N-terminal residues (1-10) where the A4V mutation resides.
  2. Property Trade-offs: There are clear trade-offs between objectives. For example, Lead 2 (DYYEKWKATN) achieved a high solubility score (0.80), but its predicted affinity was lower than Lead 1. This demonstrates moPPIt’s ability to provide a range of candidates with balanced therapeutic profiles.
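
The trade-offs can be made explicit with a Pareto check: a lead is “dominated” if another lead is at least as good on every objective and strictly better on one. The sketch below assumes that higher is better for solubility, affinity, and motif score. I left the hemolysis column out because the table does not state which direction of that score is favorable. The helper names are my own.

```python
# (solubility, affinity pKd, motif score) copied from the moPPIt table.
LEADS = {
    "Lead 1": (0.40, 5.858, 0.018),
    "Lead 2": (0.80, 5.223, 0.210),
    "Lead 3": (0.60, 4.389, 0.315),
}


def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        others = (s for other, s in candidates.items() if other != name)
        if not any(dominates(s, scores) for s in others):
            front.append(name)
    return sorted(front)


front = pareto_front(LEADS)
print(front)
```

Under these assumptions none of the three leads dominates another, so choosing between them depends on which objective matters most for the application.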

Pre-Clinical Evaluation Pipeline

To evaluate these moPPIt-generated peptides before advancing to clinical studies, the following validation steps are required:

  • Biophysical Verification: Synthesize the leads and use Surface Plasmon Resonance (SPR) to measure the actual $K_D$ (binding affinity) against recombinant A4V SOD1.
  • Serum Stability: Conduct stability assays to ensure these 10-mer peptides resist degradation by circulating proteases.
  • Functional Rescue: Test the candidates in human iPSC-derived motor neurons to confirm they prevent toxic SOD1 aggregation and restore cellular health.
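
To compare an SPR-measured $K_D$ with the predicted values in the tables, the pKd scores need to be converted to concentrations. The sketch below assumes the usual definition pKd = -log10(K_D in mol/L); I have not confirmed the exact units used by PeptiVerse or moPPIt, so treat the conversion as an approximation.

```python
import math


def pkd_to_kd_molar(pkd):
    """Convert pKd to a dissociation constant in mol/L, assuming pKd = -log10(Kd)."""
    return 10.0 ** (-pkd)


def kd_molar_to_pkd(kd):
    """Inverse conversion, for turning a measured Kd back into pKd."""
    if kd <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd)


# Predicted affinities copied from the tables in this write-up.
for label, pkd in [("Peptide 1", 5.643), ("Lead 1", 5.858)]:
    kd_micromolar = pkd_to_kd_molar(pkd) * 1e6
    print(f"{label}: pKd {pkd} is about {kd_micromolar:.2f} micromolar")
```

Predicted pKd values between 5 and 6 correspond to micromolar dissociation constants, which is why the tools label these binders as weak.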

Final Conclusion

The moPPIt process provided more drug-like leads than simple sampling. Lead 2 stands out as the most promising candidate due to its superior balance of solubility and motif targeting, offering a potential path forward for stabilizing the destabilized SOD1 dimer interface in A4V-mediated ALS.


HTGAA 2026: Phage Lysis Protein Design Challenge

Author: Elsa Muleya
Affiliation: Copperbelt University (CBU), Zambia
Project Date: March 2026
Objective: To engineer MS2 bacteriophage L-protein variants capable of bypassing host DnaJ-mediated resistance and optimizing membrane lysis efficiency through structural modeling and rational design.


1. Project Background and Introduction

The Bacteriophage MS2 is a single-stranded RNA virus that specifically targets E. coli. A single protein, the Lysis (L) protein (75 residues), is responsible for creating pores in the bacterial membrane to release new viral progeny. However, this viral assassin is not entirely independent; it relies on the host chaperone protein DnaJ for proper folding.

A critical hurdle in phage therapy is the evolution of bacterial resistance. E. coli can develop single point mutations in the DnaJ chaperone that prevent the L-protein from interacting with it. When this interaction is broken, the L-protein fails to fold and function, and the infection cycle stops. My research focuses on introducing mutations into the L-protein to either achieve DnaJ-independence or increase the speed of lysis, thereby reducing the window for the host to acquire resistance.


2. Evolutionary Context and Design Methodology

Before making mutations, I used BLASTp and Clustal Omega to perform a multiple sequence alignment. This allowed me to distinguish between highly conserved residues (essential for structural integrity) and variable regions (potential targets for engineering).

Figure 1: Multiple Sequence Alignment highlighting evolutionary conservation (MS2_L-protein_ClustalOmega_MSA).

My design strategy utilizes AlphaFold2-Multimer to predict how these mutants interact with the DnaJ chaperone. By analyzing the Predicted Aligned Error (PAE) plots, I can assess the confidence of the protein-protein interaction. High confidence (dark blue at the interface) suggests the protein still binds to the chaperone, whereas high error (red/green) indicates a potential disruption of that dependency.
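
The interface assessment can also be reduced to a single number: the mean PAE over residue pairs that span the two chains. The sketch below assumes the PAE matrix is a square list of lists with the L-protein residues first and the DnaJ residues after them; the 4x4 matrix is a toy example, not model output, and the function name is hypothetical.

```python
def mean_interface_pae(pae, len_chain_a):
    """Average PAE over pairs where one residue is in chain A and the other in chain B."""
    n = len(pae)
    if any(len(row) != n for row in pae):
        raise ValueError("PAE matrix must be square")
    if not 0 < len_chain_a < n:
        raise ValueError("chain A length must split the matrix into two chains")
    values = [
        pae[i][j]
        for i in range(n)
        for j in range(n)
        if (i < len_chain_a) != (j < len_chain_a)  # exactly one index in chain A
    ]
    return sum(values) / len(values)


# Toy complex: 2 residues per chain, low error within chains, high error between.
toy_pae = [
    [1.0, 1.0, 20.0, 20.0],
    [1.0, 1.0, 20.0, 20.0],
    [20.0, 20.0, 1.0, 1.0],
    [20.0, 20.0, 1.0, 1.0],
]
interface_pae = mean_interface_pae(toy_pae, len_chain_a=2)
print(f"mean interface PAE: {interface_pae:.1f} Å")
```

Comparing this value between the wild-type and each mutant complex gives a simple, if coarse, way to rank how much a mutation changes the predicted docking confidence.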


3. Analysis of Engineered Mutants

I selected five positions for mutation, ensuring two were in the soluble N-terminal region (residues 1-40) and three were in the transmembrane C-terminal region (residues 41-75).
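
As a quick sanity check on the design set, each variant name can be parsed and assigned to a region using the residue boundaries stated above. This is a minimal sketch; the function name and region labels are my own.

```python
import re

SOLUBLE_END = 40        # residues 1-40: soluble N-terminal region
L_PROTEIN_LENGTH = 75   # residues 41-75: transmembrane C-terminal region


def classify(mutation):
    """Return the region of the MS2 L-protein that a mutation like 'T3I' falls in."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    position = int(match.group(2))
    if not 1 <= position <= L_PROTEIN_LENGTH:
        raise ValueError("position outside the 75-residue L-protein")
    return "soluble N-terminal" if position <= SOLUBLE_END else "transmembrane C-terminal"


variants = ["T3I", "Q11A", "I42V", "L61G", "V63Q"]
regions = {v: classify(v) for v in variants}
for variant, region in regions.items():
    print(f"{variant}: {region}")
```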

Variant 1: T3I (Soluble Region)

  • Design Rationale: I targeted a variable site at the extreme N-terminus. By swapping Threonine for the more hydrophobic Isoleucine, I aimed to test if a slight shift in the N-terminal anchor could alter chaperone docking requirements.
  • Computational Results: The AlphaFold2 results showed a high pLDDT score for the fold, and the PAE plots indicated that the docking confidence with DnaJ remained high, so this substitution alone does not appear to weaken the predicted interaction.

Figures: 3D structure of T3I mutant; validation plots for T3I mutant.

Variant 2: Q11A (Soluble Region)

  • Design Rationale: This polar-to-hydrophobic swap was intended to disrupt the polar (hydrogen-bonding) surface interaction with DnaJ.
  • Computational Results: Similar to T3I, the structural integrity remained intact, but the model still predicted a strong binding event with the host chaperone.

Figures: 3D structure of Q11A mutant; validation plots for Q11A mutant.

Variant 3: I42V (Transmembrane Region - Control)

  • Design Rationale: This acts as a conservative control. By swapping Isoleucine for Valine (both hydrophobic and branched), I expected minimal impact on the pore-forming helix.
  • Computational Results: The PAE plot showed very low error across the complex, suggesting that this region is structurally robust and can tolerate minor volume changes without losing the predicted DnaJ docking.

Figures: 3D structure of I42V mutant; validation plots for I42V mutant.

Variant 4: L61G (Transmembrane Region)

  • Design Rationale: Introducing a Glycine “hinge” into a rigid alpha-helix increases conformational flexibility. This was designed to allow the L-protein to insert into the membrane more dynamically.
  • Computational Results: There was a slight increase in the predicted error at the interface, suggesting the hinge might slightly destabilize the rigid docking required by DnaJ.

Figures: 3D structure of L61G mutant; validation plots for L61G mutant.

Variant 5: V63Q (Transmembrane Region - Lead Candidate)

  • Design Rationale: This is my most disruptive design. Substituting a polar Glutamine (Q) into the hydrophobic core of the helix is intended to trigger a “forced” conformational change or rapid membrane disruption.
  • Computational Results: The PAE plots for V63Q showed a significant loss of confidence (red and light green coloring) at the DnaJ interface. This suggests the mutation disrupts the predicted docking with DnaJ, which could allow the protein to bypass the chaperone if it can still fold and insert on its own.

Figures: 3D structure of V63Q mutant; validation plots for V63Q mutant.


4. Synthesis and Wet-Lab Implementation

To test these variants, I have codon-optimized the sequences for E. coli expression. These will be synthesized via Twist Bioscience and assembled into the pBAD expression vector using Gibson Assembly.

Reference Sequences (Optimized DNA)

Variant 1 (T3I):

atggaaatccgttttccgcagcagtctcagcagaccccggcttctaccaaccgtcgtcgtccgttcaaacacgaagactacccgtgccgtcgtcagcagcgttcttctaccctgtacgttctgatcttcctggctatcttcctgtctaaattcaccaaccagctgctgctgtctctgctggaagctgttatccgtaccgttaccaccctgcagcagctgctgacc

Variant 2 (Q11A):

atggaaacccgttttccgcagcagtctcaggcgaccccggcttctaccaaccgtcgtcgtccgttcaaacacgaagactacccgtgccgtcgtcagcagcgttcttctaccctgtacgttctgatcttcctggctatcttcctgtctaaattcaccaaccagctgctgctgtctctgctggaagctgttatccgtaccgttaccaccctgcagcagctgctgacc

Variant 3 (I42V):

atggaaacccgttttccgcagcagtctcagcagaccccggcttctaccaaccgtcgtcgtccgttcaaacacgaagactacccgtgccgtcgtcagcagcgttcttctaccctgtacgttctggttttcctggctatcttcctgtctaaattcaccaaccagctgctgctgtctctgctggaagctgttatccgtaccgttaccaccctgcagcagctgctgacc

Variant 4 (L61G):

atggaaacccgttttccgcagcagtctcagcagaccccggcttctaccaaccgtcgtcgtccgttcaaacacgaagactacccgtgccgtcgtcagcagcgttcttctaccctgtacgttctgatcttcctggctatcttcctgtctaaattcaccaaccagctgctgctgtctggtctggaagctgttatccgtaccgttaccaccctgcagcagctgctgacc

Variant 5 (V63Q):

atggaaacccgttttccgcagcagtctcagcagaccccggcttctaccaaccgtcgtcgtccgttcaaacacgaagactacccgtgccgtcgtcagcagcgttcttctaccctgtacgttctgatcttcctggctatcttcctgtctaaattcaccaaccagctgctgctgtctctgctggaagctcagatccgtaccgttaccaccctgcagcagctgctgacc
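
Before ordering synthesis, it is worth confirming that each DNA sequence encodes exactly the intended substitution at the intended position. The following is a minimal sketch, assuming the standard genetic code and in-frame sequences of equal length; the helper names `translate` and `substitutions` are hypothetical. The example uses only the first 36 nucleotides of the Variant 3 and Variant 1 sequences above (Variant 3 is unchanged from the reference in this stretch because its substitution lies at codon 42).

```python
from itertools import product

# Standard genetic code, built in T-C-A-G order for each codon position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.lower()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def substitutions(reference, variant):
    """List substitutions such as 'T3I' between two equal-length proteins."""
    if len(reference) != len(variant):
        raise ValueError("proteins must have the same length")
    return [
        f"{r}{pos}{v}"
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


ref_prefix = "atggaaacccgttttccgcagcagtctcagcagacc"   # first 12 codons, reference
t3i_prefix = "atggaaatccgttttccgcagcagtctcagcagacc"   # first 12 codons, Variant 1

print(translate(ref_prefix))                                        # METRFPQQSQQT
print(substitutions(translate(ref_prefix), translate(t3i_prefix)))  # ['T3I']
```

Running the same comparison on each full-length sequence should return a single entry that matches the variant name; anything else means the DNA and the design have drifted apart.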

5. Final Reflection and Future Directions

The computational data suggest that V63Q is the most promising lead candidate. Weakening the predicted interaction between the L-protein and DnaJ provides a plausible pathway to overcome host resistance. One potential risk discussed during the design phase is that disrupting the chaperone interaction might also impair the protein’s ability to self-oligomerize to form the pore.

Strategic Analysis: The Synthetic Biology Trade-off

In engineering the V63Q variant, I am addressing a fundamental challenge in protein design: the balance between chaperone independence and structural stability. While the L-protein typically requires DnaJ as a structural scaffold to reach the membrane, my design tests whether a mutation in the transmembrane region can bypass this requirement.

Theoretical Outcomes for Variant V63Q

There are two primary biochemical scenarios that this mutation aims to explore during experimental validation:

  1. The Auto-Insertion Success: In this scenario, the V63Q mutation increases the protein’s affinity for the lipid bilayer to such an extent that it no longer requires DnaJ-mediated folding. The protein effectively auto-inserts into the membrane, oligomerizes, and induces lysis independently of host machinery.
  2. The Aggregation Failure: Conversely, without the DnaJ chaperone to shield hydrophobic patches during translation, the polar Glutamine (Q) at position 63 may cause the transmembrane helices to clump together inappropriately in the cytoplasm. This would form an inactive inclusion body that never reaches the membrane.

Refining the Strategy

To mitigate risks during the experimental phase, my strategy focuses on the specific Surface Area of Interaction:

  • DnaJ Binding Site: Usually involves the soluble N-terminus (residues 1–40).
  • Self-Oligomerization Site: Usually involves the transmembrane C-terminus (residues 41–75).

By focusing disruptive mutations like V63Q in the transmembrane region, I am testing the theory that the L-protein can auto-insert into the membrane.

Validation through Plaque Assays

The upcoming plaque assays will provide a direct experimental test of this design’s viability:

  • Clear Zones (Lysis): If the assay shows clear zones on a host carrying a resistant DnaJ allele, it indicates that DnaJ is not strictly necessary for pore formation and the bypass was successful.
  • No Plaques: If no plaques are visible, it suggests the mutation disrupted the protein’s ability to self-assemble or fold without chaperone assistance.

Note: This analysis will be validated using the 3D structures and validation plots generated for all five variants to correlate predicted stability with observed lysis activity.

References

Chamakura, K. R., et al. (2017). “Mutational analysis of the MS2 lysis protein L.” Microbiology, 163(7), 961–969.

Hyman, P., et al. (2023). “Phage therapy: From biological mechanisms to future directions.” Microbiology Research Reviews.

UniProt Consortium. “Lysis protein L - Bacteriophage MS2 (P03609).”

Week 6 HW: Genetic Circuits Part 1

Assignment: DNA Assembly

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

  • Phusion DNA Polymerase: This is the “engine.” It’s a highly thermostable enzyme that synthesizes new DNA strands. It’s “High-Fidelity” because it has $3’ \rightarrow 5’$ exonuclease activity (proofreading), making significantly fewer mistakes than standard Taq.
  • dNTPs (Deoxynucleotide Triphosphates): These are the molecular building blocks (A, T, C, and G) used by the polymerase to construct the new DNA strand.
  • Buffer (containing $Mg^{2+}$): Maintains the optimal pH for enzymatic activity and provides essential divalent cations. Magnesium ions act as a cofactor for the polymerase, helping it catalyze the phosphodiester bond.
  • Stabilizers: Often includes detergents or proprietary chemicals to prevent the enzyme from denaturing or sticking to the tube walls during the high-heat cycles.

2. What are some factors that determine primer annealing temperature during PCR?

  • Primer Length: Longer primers generally have a higher $T_m$, so they tolerate higher annealing temperatures while remaining specific.
  • GC Content: G-C pairs have three hydrogen bonds compared to the two in A-T pairs. Therefore, primers with higher GC content have higher melting temperatures ($T_m$).
  • Salt Concentration: The concentration of monovalent cations (like $K^+$) in the buffer affects the stability of the DNA duplex.
  • Primer Concentration: Higher concentrations can slightly shift the kinetics of annealing.
  • Mismatches: If the primer isn’t a 100% match to the template, the $T_m$ will decrease.

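Note: For Taq-based PCR, the annealing temperature ($T_a$) is usually chosen to be $3$–$5^\circ\text{C}$ below the $T_m$ of the primers to balance specificity and yield. For Phusion, NEB instead recommends annealing at about $3^\circ\text{C}$ above the $T_m$ of the lower-$T_m$ primer when primers are longer than 20 nt.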
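
As a rough illustration of how length and GC content feed into $T_m$, here is a minimal sketch using the Wallace rule ($T_m \approx 2(A+T) + 4(G+C)$), which is only a crude estimate for short oligos; vendor calculators use nearest-neighbor thermodynamics and salt corrections and should be preferred for real primer design. The function names and the example primer are illustrative.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return sum(primer.count(base) for base in "GC") / len(primer)


def wallace_tm(primer):
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


primer = "ATGGAAACCCGTTTTCCG"  # 18-nt example
print(gc_content(primer))  # 0.5
print(wallace_tm(primer))  # 54
```

Swapping an A or T for a G or C raises the estimate by 2 °C, which is the GC-content effect described in the list above.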


3. Compare and contrast PCR vs. Restriction Enzyme Digests.

| Feature | PCR (Polymerase Chain Reaction) | Restriction Enzyme Digest |
| --- | --- | --- |
| Mechanism | Enzymatic synthesis of new DNA strands. | Enzymatic “cutting” of existing DNA strands. |
| Input | Template DNA + Primers + Polymerase. | Plasmid or genomic DNA + Specific Enzymes. |
| Output | Exponentially amplified linear fragments. | Linearized fragments (no amplification). |
| Customization | Very high; you define the ends via primers. | Limited to where specific “sites” (e.g., EcoRI) exist. |
| Accuracy | Risk of point mutations (minimized by Phusion). | Highly accurate sequence retention. |

When to use which?

  • Use PCR when you need to add specific “overhangs” for Gibson assembly or when you have a very small amount of starting material.
  • Use Restriction Digest when you are moving a large chunk of DNA from a “classic” vector that already contains the necessary sites, or when you want to avoid the risk of PCR-induced mutations in a large gene.

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

For Gibson Assembly to work, your fragments must have homologous overlapping ends (typically 20–40 base pairs).

  • For PCR: You must design your primers so that the 5’ end of the primer contains a sequence that matches the end of the adjacent fragment.
  • For Digest: You must ensure the restriction site is positioned such that the resulting linearized DNA shares overlap with the next piece, or use a “Stitch PCR” on the digested fragment to add the necessary overlaps.
  • Verification: Use a tool like NEB’s Gibson Assembly Designer or Benchling to simulate the “junctions” and confirm the overlaps are in the correct orientation ($5’ \rightarrow 3’$) and have a high enough $T_m$ to stay stable during the reaction.
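
The overlap requirement can also be checked with a few lines of code before using a design tool. This is a minimal sketch, assuming both fragments are written 5'→3' on the same strand; the function name, the default 20–40 bp window, and the example sequences are illustrative and are not taken from any assembly software.

```python
def shared_overlap(upstream, downstream, min_len=20, max_len=40):
    """Longest end-of-upstream / start-of-downstream match, or '' if none.

    Only overlaps between min_len and max_len bases are considered.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


overlap = "GATTACAGATTACAGATTACAGCCA"       # 25-nt homology region
frag_a = "ACGTACGTACGTACGTACGT" + overlap    # overlap at the 3' end
frag_b = overlap + "TTGACATTGACATTGACATTGACA"  # overlap at the 5' end

print(len(shared_overlap(frag_a, frag_b)))        # 25
print(shared_overlap("ACGT" * 10, "TTTT" * 10))   # empty string: no overlap
```

An empty result for any junction means the primers or digest sites need to be redesigned before attempting the assembly.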

5. How does the plasmid DNA enter the E. coli cells during transformation?

In the HTGAA lab context, we usually use Chemically Competent cells:

  1. Heat Shock: Cells are kept on ice with DNA, then suddenly moved to $42^\circ\text{C}$.
  2. Pore Formation: The sudden temperature rise is thought to create a thermal imbalance across the chemically weakened cell membrane, transiently opening “pores”.
  3. DNA Uptake: The DNA moves through these temporary pores into the cytoplasm.
  4. Recovery: Cells are placed back on ice and then incubated in SOC/LB media at $37^\circ\text{C}$ to “heal” the membrane and begin expressing the antibiotic resistance gene before plating.

6. Describe another assembly method in detail: Golden Gate Assembly.

Golden Gate Assembly relies on Type IIS restriction enzymes (like BsaI or BpiI). Unlike standard enzymes, these cut outside of their recognition sequence, creating custom non-palindromic 4-base overhangs.

Because the recognition sites are cut away and are absent from the ligated product, correctly assembled junctions cannot be re-cut, so the reaction is “directional” and “seamless.” This allows for a “one-pot” reaction where digestion and ligation happen simultaneously in the same tube. You can assemble multiple fragments (10 or more) in a specific order by designing a unique 4-bp overhang for each junction. It is highly efficient and leaves no “scar” sequences if designed correctly.
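
Choosing the junction overhangs is the step most likely to go wrong, so a quick sanity check helps. The sketch below assumes a set of candidate 4-nt overhangs and applies three basic rules: each must be unique, none may be palindromic (it would ligate to itself), and none may be the reverse complement of another (the two junctions would cross-ligate). The function names and example overhangs are hypothetical, and the check does not model ligation fidelity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_overhangs(overhangs):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    seen = set()
    for oh in (o.upper() for o in overhangs):
        if len(oh) != 4 or set(oh) - set("ACGT"):
            problems.append(f"{oh}: not a 4-nt DNA overhang")
            continue
        if oh == revcomp(oh):
            problems.append(f"{oh}: palindromic, can self-ligate")
        if oh in seen:
            problems.append(f"{oh}: duplicated")
        elif revcomp(oh) in seen:
            problems.append(f"{oh}: reverse complement of another overhang")
        seen.add(oh)
    return problems


print(check_overhangs(["AATG", "GCTT", "CGCT"]))  # []
print(check_overhangs(["AATG", "CATT"]))
# ['CATT: reverse complement of another overhang']
```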

Week 7 HW: Genetic Circuits II

Week 7: IANNs & Fungal Materials

Part 1: Intracellular Artificial Neural Networks (IANNs)

Question 1

What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Traditional genetic circuits typically operate on Boolean logic (AND, OR, NOT), which processes inputs as binary states (0 or 1). IANNs offer several distinct advantages:

  • Analog Processing: IANNs can process continuous, fuzzy signals rather than just binary ones, allowing cells to respond to gradients of environmental stimuli (Beardall et al., 2022).
  • Pattern Recognition: Unlike simple logic gates, IANNs can perform complex classification tasks, such as identifying specific combinations of biomarkers that do not follow a simple all-or-nothing rule (Moghimianavval et al., 2024).
  • Robustness to Noise: Neural network architectures are inherently better at filtering molecular noise. By using weighted sums and non-linear activation functions, they can ignore minor fluctuations in input and only trigger an output when a meaningful threshold is reached (Pandi et al., 2019).
  • Adaptability: While a Boolean circuit is hard-wired for one function, the weights in an IANN (represented by enzyme concentrations) can theoretically be tuned or learned over time to optimize the cell’s response to its environment.

Question 2

Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

Application: Smart Cancer Diagnostics An IANN could be engineered into a cell to detect a specific fingerprint of microRNAs (miRNAs) that characterize a tumor.

Input/Output Behavior

  • Inputs ($X_1, X_2… X_n$): The concentrations of five different miRNAs associated with a specific cancer type.
  • Processing: The IANN assigns weights to each miRNA. If the weighted sum of these inputs exceeds a threshold, it indicates the presence of a malignant state rather than a healthy one.
  • Output ($Y$): Production of a pro-apoptotic protein to trigger cell death (the kill switch) or a fluorescent reporter for diagnostic imaging.
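
The input/output behavior above corresponds to a single perceptron. Here is a minimal sketch, assuming a sigmoid activation and a decision threshold of 0.5; the weights, bias, and miRNA levels are made-up illustrative numbers, not measured values, and in a cell the weights would be set by molecular parameters rather than by code.

```python
import math


def perceptron(inputs, weights, bias):
    """Sigmoid of the weighted sum of inputs plus a bias term."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify(inputs, weights, bias, threshold=0.5):
    """True if the activation reaches the threshold (output Y switched on)."""
    return perceptron(inputs, weights, bias) >= threshold


# Hypothetical weights: three miRNAs raised in the tumor, two lowered.
weights = [2.0, 2.0, 2.0, -1.0, -1.0]
bias = -3.0

tumor_like = [1.0, 1.0, 1.0, 0.0, 0.0]    # normalized miRNA levels
healthy_like = [0.0, 0.0, 0.0, 1.0, 1.0]

print(classify(tumor_like, weights, bias))    # True
print(classify(healthy_like, weights, bias))  # False
```

Negative weights let a miRNA that is abundant in healthy cells argue against activation, which a simple AND gate cannot express.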

Limitations

  • Metabolic Burden: Complex IANNs require significant cellular resources (ATP, ribosomes). This metabolic load can slow cell growth or cause the circuit to fail (Moghimianavval et al., 2024).
  • Orthogonality: It is difficult to ensure that the IANN parts do not interfere with the host cell’s native genetic machinery.

Question 3

Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

Figure: Intracellular Multilayer Perceptron Diagram

Part 2: Fungal Materials

Question 1

What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

| Example | Use Case |
| --- | --- |
| Mycelium Brick/Packaging | Biodegradable alternative to Styrofoam or concrete. |
| Fungal Leather (Myco-leather) | Sustainable fashion alternative to animal leather. |

Advantages:

  • Sustainability: They are carbon-negative or neutral and biodegradable.
  • Low Energy: Fungi grow on agricultural waste (sawdust, straw) at room temperature, requiring far less energy than plastic or metal production.

Disadvantages:

  • Water Sensitivity: Fungal materials can be hydrophilic (absorb water), leading to structural weakness in humid environments.
  • Consistency: Unlike synthetic plastics, biological growth can be variable, making it harder to ensure uniform density and strength.

Question 2

What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Engineering Goals: One might engineer fungi to secrete specific enzymes for breaking down complex environmental toxins (bioremediation) or to incorporate conductive nanoparticles into their mycelium to create living electronics or sensors.

Advantages over Bacteria:

  1. Complex Secretion: Fungi are naturally prolific secretors; they can export large, complex proteins more efficiently than many bacteria (like E. coli).
  2. Eukaryotic Processing: As eukaryotes, fungi can perform post-translational modifications (like glycosylation) necessary for human-like proteins.
  3. Structural Integrity: Mycelium forms a physical, fibrous network that can span meters, allowing for the creation of large-scale physical structures which bacteria (which usually form biofilms) cannot achieve.

References

  • Beardall, W. A. V., Stan, G.-B., & Dunlop, M. J. (2022). Deep learning concepts and applications for synthetic biology. GEN Biotechnology, 1(5), 360–371.
  • Moghimianavval, H., et al. (2024). Engineering sequestration-based biomolecular classifiers with shared resources. BioSystems, 238, 105164.
  • Pandi, A., et al. (2019). Metabolic perceptrons for neural computing in biological systems. Nature Communications, 10(1), 3854.

Week 9 HW: Cell-Free Systems

HTGAA Homework — Cell-Free Systems


Part A: General & Lecturer-Specific Questions


General Question 1

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free protein synthesis (CFPS) has genuinely changed how I think about expressing proteins, and the more I explore it, the more obvious it becomes why it is increasingly preferred for certain applications. The single biggest advantage is freedom from the constraints of a living cell. In traditional in vivo systems, the host organism has its own agenda — it needs to survive, divide, and regulate its own metabolic processes. This means if your target protein is toxic to the host, aggregates in the cytoplasm, or competes with essential cellular functions, you are fighting the cell the entire time (Pardee et al., 2016, Cell, 167(1), pp.248–259).

In a cell-free system, you lyse the cells and work directly with the molecular machinery (ribosomes, tRNA, aminoacyl-tRNA synthetases, chaperones) without the overhead of cellular regulation. This gives extraordinary control: you can directly titrate DNA template concentration, add non-natural amino acids, introduce isotopic labels for NMR, adjust ionic strength, or add chemical inhibitors, all of which are difficult or impossible in a living cell without enormous genetic engineering effort (Silverman et al., 2020, Nature Structural & Molecular Biology, 17(12), pp.1241–1252).

Two clear cases where cell-free expression outperforms cell-based production:

Case 1 — Toxic or antimicrobial proteins. If you want to express a bacteriocin, a membrane-disrupting peptide, or a cytotoxic protein, the host cell will die before useful product accumulates. In a cell-free system there is no living cell to kill — you simply add the template to the extract and collect the protein (Rosenblum & Cooperman, 2014, Trends in Biochemical Sciences, 39(10), pp.475–486).

Case 2 — Membrane proteins. Overexpression of membrane proteins in living cells typically overwhelms the insertion machinery and produces inclusion bodies. In CFPS, detergent micelles, liposomes, or nanodiscs are added directly to the reaction, providing a lipid environment for the protein to fold into as it is being synthesised — an approach that has successfully expressed GPCRs and ion channels that were completely intractable in cellular systems (Sachse et al., 2014, PLOS ONE, 9(3), e96825).


General Question 2

Describe the main components of a cell-free expression system and explain the role of each component.

A cell-free expression system is essentially a reconstituted cytoplasm — all the molecular machines a cell normally uses for gene expression, running in a tube without the cell itself. The core components are:

Cell extract: Typically prepared from E. coli, wheat germ, or rabbit reticulocytes, this contains ribosomes, translation factors (initiation, elongation, and release), aminoacyl-tRNA synthetases, and molecular chaperones; in T7-based systems, T7 RNA polymerase is either supplied by the source strain or added separately. It is the engine of the system, the component that actually reads the mRNA and assembles the protein chain (Shin & Noireaux, 2012, ACS Synthetic Biology, 1(1), pp.29–41).

DNA or RNA template: Your genetic instruction. A plasmid or linear PCR product carrying the gene of interest under a strong promoter (usually T7) is added to the extract, which transcribes it into mRNA for translation. The ability to add naked DNA without cloning into a host chromosome is one of the biggest time-saving features of CFPS.

Amino acids: All 20 standard amino acids must be supplied exogenously at millimolar concentrations, since the extract does not contain enough free amino acids to sustain prolonged synthesis. In advanced applications, unnatural amino acids can be substituted at specific positions for site-specific labelling or chemical modification.

Energy regeneration system: Translation is energetically expensive — each peptide bond costs multiple ATP equivalents. Without a continuous ATP supply, the reaction exhausts itself within minutes. Creatine phosphate/creatine kinase (CP/CK), phosphoenolpyruvate (PEP), or glucose-based oxidative phosphorylation systems are used to continuously regenerate ATP (Jewett & Swartz, 2004, Molecular Systems Biology, published online).

Salts and cofactors: Magnesium (Mg²⁺), potassium (K⁺), and polyamines (spermidine, putrescine) are critical for ribosome structural integrity and activity. Optimising Mg²⁺ concentration alone can alter protein yield by several-fold.

RNase inhibitors and reducing agents: RNase inhibitors (e.g., SUPERaseIn) protect the mRNA template from nuclease degradation. Reducing agents such as DTT maintain the reducing cytoplasmic environment needed for most cytoplasmic proteins.


General Question 3

Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Translation is one of the most energy-intensive processes in biology. Each peptide bond costs at least 4 ATP equivalents: 2 for aminoacyl-tRNA charging (ATP is hydrolysed to AMP and pyrophosphate), plus one GTP at the EF-Tu step and one at the EF-G step during elongation. In a living cell, the electron transport chain (in mitochondria for eukaryotes, at the plasma membrane for bacteria) continuously regenerates ATP from ADP, so the cell never runs dry as long as there is a carbon source. In a cell-free system, you start with a finite pool of ATP, and once it is depleted the ribosomes stall, mRNA remains untranslated, and protein synthesis stops completely, often within 20–40 minutes without intervention (Jewett & Swartz, 2004, Molecular Systems Biology, published online).

This is why energy regeneration is not an optional detail — it determines whether you get a useful yield or an empty tube.

The most widely used method is the creatine phosphate / creatine kinase (CP/CK) system. Creatine phosphate donates its high-energy phosphate group to ADP via the enzyme creatine kinase, directly regenerating ATP:

Creatine phosphate + ADP → Creatine + ATP

Typically 20–80 mM creatine phosphate and 0.5–2 mg/mL creatine kinase are added to the CFPS reaction at the start. This system sustains ATP levels for 1–2 hours in batch mode (Ryabova et al., 1995, Nucleic Acids Research, 23(13), pp.2401–2407).
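
To get a feel for the numbers, the energy budget can be turned into a back-of-the-envelope ceiling on yield. This is a minimal sketch under strong simplifying assumptions: each creatine phosphate regenerates one ATP, every ATP equivalent goes into elongation at 4 per residue, and transcription, initiation, and background ATP hydrolysis are ignored. Real yields are far below this ceiling; the function name is hypothetical.

```python
ATP_EQ_PER_RESIDUE = 4  # 2 for tRNA charging + 2 GTP during elongation


def max_protein_mM(phosphate_donor_mM, n_residues,
                   atp_eq_per_residue=ATP_EQ_PER_RESIDUE):
    """Idealized upper bound on protein concentration (mM) from the energy pool."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    cost_per_protein = n_residues * atp_eq_per_residue
    return phosphate_donor_mM / cost_per_protein


# 49-residue metallothionein at the two ends of the 20-80 mM range.
for cp_mM in (20, 80):
    print(cp_mM, round(max_protein_mM(cp_mM, 49), 3))
# 20 0.102
# 80 0.408
```

The useful takeaway is the scaling: doubling the protein length halves the ceiling for the same energy pool, which is why long proteins are hit hardest by ATP depletion.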

For my Zambia metallothionein project, I would use this system and run the reaction at 37°C, a temperature at which E. coli extracts are highly active. I would also monitor ATP concentration using a luciferase-based ATP assay at 30-minute intervals and add fresh creatine phosphate and creatine kinase at the 60-minute mark to extend the reaction and maximise yield of the 49-amino-acid MT protein.


General Question 4

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic and eukaryotic CFPS systems are both powerful, but they serve fundamentally different purposes, and choosing the wrong system for a given protein is a costly and common mistake.

Prokaryotic CFPS — almost always based on E. coli S30 extract — is inexpensive, fast to prepare, gives the highest volumetric yields (up to 2–3 mg/mL in optimised systems), and is simple to work with. Its limitation is the absence of eukaryotic post-translational modifications: no N-linked glycosylation, no complex disulfide isomerisation pathways, and no signal peptide processing. For proteins that fold well in a reducing environment and do not require PTMs for function, E. coli CFPS is the obvious choice (Gregorio et al., 2019, Scientific Reports, 9(1), p.6771).

For my project, I would express the Zambian metallothionein (WP_070466881.1) in a prokaryotic E. coli CFPS. The protein is 49 amino acids, originates from a prokaryote (Bacillus cereus group), has no glycosylation sites, and its cysteine-rich structure folds through Cu²⁺ coordination rather than classical disulfide bonding. An E. coli extract supplemented with Cu²⁺ ions would allow real-time monitoring of metal uptake without the complexity of eukaryotic systems.

Eukaryotic CFPS — wheat germ extract (WGE), rabbit reticulocyte lysate (RRL), or HeLa cell extracts — is essential for proteins requiring eukaryotic PTMs. Glycosylation profoundly affects protein half-life, receptor binding, and immunogenicity in ways that bacteria simply cannot replicate. Signal peptides are correctly processed when microsomal membranes are supplemented (Endo & Sawasaki, 2006, Current Opinion in Biotechnology, 17(4), pp.373–380).

For a eukaryotic CFPS example, I would express human erythropoietin (EPO). EPO is a heavily N-glycosylated cytokine where the glycan chains are not mere decoration — they constitute approximately 40% of the molecular weight and are essential for correct in vivo half-life and receptor binding. A wheat germ extract system supplemented with dog pancreatic microsomes for signal peptide cleavage and glycosylation machinery would produce biologically relevant EPO that a prokaryotic system structurally cannot.


General Question 5

How would you design a cell-free experiment to optimise the expression of a membrane protein? Discuss the challenges and how you would address them.

Membrane proteins account for roughly 30% of protein-coding genes but are dramatically underrepresented in structural databases because they are so difficult to express and purify: their hydrophobic transmembrane helices aggregate rapidly when exposed to aqueous environments without a lipid scaffold (Klammt et al., 2006, FEBS Journal, 273(18), pp.4141–4153). Cell-free systems are well suited to address this because you can supply the lipid environment directly into the reaction as the protein is being synthesised.

My experimental design would proceed in three stages:

Stage 1 — Detergent-supplemented CFPS. I would use an E. coli S30 extract supplemented with mild non-ionic detergents added just above their CMC — screening DDM (n-dodecyl-β-D-maltoside, CMC = 0.17 mM), LMNG (lauryl maltose neopentyl glycol, CMC = 0.01 mM), and digitonin (CMC = 0.5 mM). The detergent micelles intercept the emerging hydrophobic transmembrane helices at the ribosomal exit tunnel, preventing aggregation (Kalmbach et al., 2007, Journal of Structural Biology, 159(2), pp.194–205).

Stage 2 — Nanodisc-supplemented CFPS. For proteins requiring a true bilayer environment for correct folding, I would add empty nanodiscs (DOPE:DOPG:DOPC at a ratio mimicking the E. coli inner membrane) to the CFPS reaction. Nanodiscs are discoidal lipid bilayer patches stabilised by membrane scaffold proteins (MSPs) that allow co-translational membrane insertion into a native-like environment.

Stage 3 — Parameter optimisation. I would screen Mg²⁺ concentration (4–16 mM), DNA template concentration (1–100 nM), reaction temperature (25°C, 30°C, 37°C), and incubation time (2–6 hours) in a factorial design. Yield would be quantified by SDS-PAGE densitometry with His-tag western blot, and folding quality assessed by circular dichroism (CD) spectroscopy — a correctly folded helical membrane protein produces a characteristic double-minimum CD spectrum at 208 and 222 nm.
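
The Stage 3 screen can be enumerated programmatically so that no combination is missed when setting up plates. This is a minimal sketch of a full factorial design; the temperatures come from the text, but the specific levels chosen within the stated Mg²⁺, DNA, and time ranges are assumptions, as are the variable names.

```python
from itertools import product

# Levels per factor; intermediate values within each range are assumed.
factors = {
    "mg_mM": [4, 8, 12, 16],
    "dna_nM": [1, 10, 100],
    "temp_C": [25, 30, 37],
    "time_h": [2, 4, 6],
}


def full_factorial(factors):
    """Every combination of factor levels, one dict per reaction condition."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


conditions = full_factorial(factors)
print(len(conditions))   # 108 reactions (4 * 3 * 3 * 3)
print(conditions[0])     # {'mg_mM': 4, 'dna_nM': 1, 'temp_C': 25, 'time_h': 2}
```

With 108 conditions before replicates, a fractional design or a two-step screen (Mg²⁺ and DNA first, then temperature and time) may be more practical at small scale.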


General Question 6

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons and suggest a troubleshooting strategy for each.

Low yield is almost always diagnosable if you approach it systematically:

Reason 1 — mRNA instability or poor translation initiation. If residual nucleases in the extract rapidly degrade the mRNA, or if the ribosome binding site (RBS) is poorly configured for the E. coli ribosome, translation will fail even if transcription is normal. Troubleshoot by adding an RNase inhibitor (e.g., SUPERaseIn, 1 U/µL) and sampling mRNA levels at 0, 30, and 60 minutes via gel electrophoresis. Separately, redesign the RBS using the Salis Lab RBS Calculator to maximise translation initiation rate, and switch to a codon-optimised synthetic gene to eliminate rare codon pauses (Salis et al., 2009, Nature Biotechnology, 27(10), pp.946–950).

Reason 2 — Energy and ATP depletion. If the creatine phosphate supply is insufficient, or if creatine kinase has lost activity due to a freeze-thaw cycle, the ribosomes stall early. Test by adding fresh creatine phosphate (80 mM) and creatine kinase (2 mg/mL) at the 30-minute mark and monitoring whether yield recovers. Directly measure ATP using a luciferase-based ATP assay kit at multiple time points — if ATP falls below 1 mM within the first hour, shift to a glucose-based energy system or increase the initial creatine phosphate concentration (Jewett & Swartz, 2004, Molecular Systems Biology, published online).

Reason 3 — Protein aggregation post-synthesis. The protein may be expressed normally but immediately misfold and aggregate. Check by running both the supernatant and the pellet fractions on SDS-PAGE after centrifugation at 14,000 rpm for 10 minutes — if the target band appears only in the pellet, aggregation is occurring. Address this by supplementing the CFPS reaction with molecular chaperones (DnaK/DnaJ/GrpE system, 1–4 µM each), reducing reaction temperature to 25°C, and in my case adding Cu²⁺ ions co-translationally to drive the metallothionein into its correctly folded metal-bound conformation before aggregation can occur (Hartl et al., 2011, Nature, 475(7356), pp.324–332).


Homework Question from Kate Adamala

Design an example of a useful synthetic minimal cell.

Based on: Rampioni, G. et al., 2018. Synthetic cells produce a quorum sensing chemical signal perceived by Pseudomonas aeruginosa. Chemical Communications, 54(18), pp.2090–2093.


1. Pick a function and describe it.

1.1 What would your synthetic cell do? What is the input and what is the output?

Function: expand the metal-sensing capacity of engineered Bacillus subtilis for bioremediation of Cu²⁺-contaminated mine water. The synthetic minimal cell (SMC) acts as a molecular translator: it detects dissolved Cu²⁺ ions in Zambian mine water (which cannot directly activate the B. subtilis MT expression system at sub-threshold concentrations) and responds by expressing a membrane pore that releases encapsulated IPTG into the surrounding medium. The IPTG then derepresses a lac operator–controlled metallothionein (MT) gene in nearby B. subtilis cells.

  • Input: Cu²⁺ ions (dissolved copper from Copperbelt mine leachate, threshold ≥ 5 mg/L).
  • Output of the SMC: IPTG (isopropyl β-D-1-thiogalactopyranoside).
  • Output of the whole system: Metallothionein protein expressed in B. subtilis, actively sequestering Cu²⁺ from the surrounding water.

(Riboswitch-based metal sensing: Dambach, M. et al., 2015. The ubiquitous yybP-ykoY riboswitch is a manganese-responsive regulatory element. Molecular Cell, 57(6), pp.1099–1109. CsoR-based copper sensing: Liu, T. et al., 2007. CsoR is a novel Mycobacterium tuberculosis copper-sensing transcriptional regulator. Nature Chemical Biology, 3(1), pp.60–68.)

1.2 Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

No. If the IPTG were not encapsulated inside the SMC, it would diffuse freely into the B. subtilis cells regardless of whether Cu²⁺ is present, bypassing the copper-sensing circuit entirely. The encapsulation is what creates the conditional logic — IPTG is only released when Cu²⁺ enters the SMC and activates the internal copper-responsive gene expression system that drives synthesis of the membrane pore. Without the vesicle compartment, the SMC actuator does not exist and the system has no Cu²⁺ specificity.

1.3 Could this function be realized by a genetically modified natural cell?

Yes, in principle: a Cu²⁺-responsive riboswitch or CsoR-regulated promoter could be incorporated into a transformed B. subtilis strain to directly drive MT expression upon copper exposure. However, this approach lacks generality and introduces biosafety concerns — a genetically modified organism that grows, divides, and spreads in a Zambian mine site raises significant regulatory and ecological risks. The SMC approach is inherently safer: it is a non-replicating lipid vesicle with no genome, no ability to proliferate, and predictable degradation in the environment. Furthermore, using an SMC means that a single B. subtilis reporter strain can be paired with different SMCs tuned to different metal ions (Cu²⁺, Zn²⁺, Pb²⁺), without re-engineering the bacterium each time.

1.4 Describe the desired outcome of your synthetic cell operation.

In the presence of SMCs, B. subtilis cells sense Cu²⁺ at ecologically relevant concentrations and produce metallothionein to sequester the metal. In the absence of SMCs, the B. subtilis MT system remains silent regardless of Cu²⁺ concentration, because the lacI repressor blocks MT expression until IPTG is present. When mine water Cu²⁺ exceeds the threshold, SMCs autonomously bridge the chemical gap — translating the inorganic copper signal into an organic molecular signal (IPTG) that the bacteria can respond to.
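The intended behaviour amounts to a two-input AND: metallothionein is expressed only when SMCs are present and Cu²⁺ is at or above the threshold. The sketch below is an idealised step-function model of that logic, assuming the 5 mg/L threshold from section 1.1; real release and induction would be graded and leaky, and the function names are hypothetical.

```python
CU_THRESHOLD_MG_PER_L = 5.0  # input threshold stated in section 1.1


def smc_releases_iptg(cu_mg_per_l: float) -> bool:
    """Idealised SMC: the pore forms and IPTG is released at or above threshold."""
    return cu_mg_per_l >= CU_THRESHOLD_MG_PER_L


def mt_expressed(cu_mg_per_l: float, smc_present: bool) -> bool:
    """B. subtilis expresses MT only if IPTG is available to relieve lacI repression."""
    iptg_available = smc_present and smc_releases_iptg(cu_mg_per_l)
    return iptg_available


# Print the truth table for a few example copper concentrations.
for smc_present in (False, True):
    for cu in (0.0, 1.0, 5.0, 50.0):
        state = "ON" if mt_expressed(cu, smc_present) else "off"
        print(f"SMC present={smc_present!s:5}  Cu={cu:5.1f} mg/L  MT {state}")
```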


2. Design all components that would need to be part of your synthetic cell.

2.1 What would be the membrane made of?

POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) + cholesterol (4:1 molar ratio). POPC provides a fluid, permeable bilayer at the 24–38°C ambient temperature range of the Zambian Copperbelt, while cholesterol increases mechanical rigidity and reduces passive permeability to IPTG (ensuring IPTG stays encapsulated until the pore is formed). The membrane is naturally permeable to small Cu²⁺ ions, which enter via passive diffusion down their concentration gradient, eliminating the need for an input channel.

2.2 What would you encapsulate inside? Enzymes, small molecules.

  • E. coli S30 cell-free Tx/Tl extract (transcription/translation machinery)
  • Pre-loaded IPTG (5 mM internal concentration, sufficient to derepress MT expression in surrounding B. subtilis)
  • Linear DNA template encoding α-hemolysin (aHL) under the control of a CsoR-regulated copper-responsive promoter (PcopA, from the Bacillus subtilis copper resistance operon)
  • NTPs (ATP, GTP, CTP, UTP) and amino acid mix to sustain CFPS
  • Creatine phosphate (50 mM) + creatine kinase (1 mg/mL) for energy regeneration

2.3 Which organism will your Tx/Tl system come from? Is bacterial OK, or do you need a mammalian system?

Bacterial (E. coli S30 extract) is appropriate here, because the copper-sensing regulatory element is the CsoR-responsive PcopA promoter — a prokaryotic transcriptional control element that does not require mammalian-specific transcription factors or chromatin remodelling. There is no need for Tet-ON or other mammalian small-molecule-modulated systems. E. coli extract is also ideal for cost-effectiveness at the volumes needed for environmental deployment.

2.4 How will your synthetic cell communicate with the environment?

The outer POPC/cholesterol membrane is naturally permeable to Cu²⁺ ions (ionic radius 0.73 Å), which enter the SMC passively when external concentration exceeds approximately 5 mg/L. Once inside, Cu²⁺ binds to the CsoR repressor, releasing it from the PcopA promoter and derepressing transcription of the aHL gene. The resulting α-hemolysin monomers self-assemble into a heptameric pore in the SMC membrane, creating a ~2 nm channel through which the pre-loaded IPTG diffuses out into the surrounding water. The surrounding B. subtilis cells then take up IPTG and produce metallothionein. The output communication is therefore chemical — IPTG crossing the SMC membrane via the expressed aHL pore.


3. Experimental details.

3.1 List all lipids and genes.

  • Lipids: POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine), cholesterol
  • Enzymes: E. coli S30 cell-free Tx/Tl extract; creatine kinase
  • Small molecules (encapsulated): IPTG (5 mM), creatine phosphate (50 mM), NTP mix, amino acid mix
  • Genes:
    • α-hemolysin (aHL; gene: hla, UniProt P09616) — encapsulated in SMC under PcopA promoter control; forms the IPTG-release pore upon Cu²⁺ activation
    • CsoR (copper-sensing repressor; gene: csoR, NCBI Gene ID: 936347) — co-encapsulated to regulate PcopA; released from promoter upon Cu²⁺ binding
  • Biological cells: Bacillus subtilis 168 transformed with MT gene (WP_070466881.1) under T7 promoter and lac operator; lacI constitutively expressed to keep MT repressed until IPTG is released by the SMC

3.2 How will you measure the function of your system?

Primary readout: Measure MT protein yield in B. subtilis culture supernatant via SDS-PAGE and western blot (anti-His-tag, if MT is His-tagged) as a function of external Cu²⁺ concentration (0, 1, 5, 10, 50, 100 mg/L). Confirm metal sequestration by ICP-MS of the growth medium supernatant — a reduction in dissolved Cu²⁺ concentration confirms the functional output of the full SMC → B. subtilis → MT system.

Secondary readout: Replace MT with GFP under the same lac operator in a control construct, and measure GFP fluorescence (Ex 488 nm / Em 510 nm) via plate reader as a proxy for SMC-triggered IPTG release. This provides a fast, high-throughput screen for SMC function before moving to the full MT assay.

Negative controls: SMCs without CsoR (constitutive aHL expression, IPTG leaks regardless of Cu²⁺); B. subtilis without SMCs (no IPTG source, no MT expression); buffer with Cu²⁺ but no SMCs (confirms Cu²⁺ alone does not induce MT in unmodified B. subtilis).


Diagram concept: The SMC (circle) floats in Cu²⁺-contaminated mine water alongside B. subtilis cells (oblong). (a) In the absence of SMCs, B. subtilis cannot respond to Cu²⁺ because the lacI repressor blocks MT expression. (b) When Cu²⁺ enters the SMC, CsoR releases PcopA, aHL is expressed and inserts into the SMC membrane, and pre-loaded IPTG diffuses out into the water. B. subtilis takes up IPTG, derepresses the MT gene, produces metallothionein, and sequesters Cu²⁺ from the surrounding water.


Homework Question from Peter Nguyen

Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems functionally integrated into the material.

Application field: Architecture

One-sentence summary pitch: Freeze-dried cell-free biosensor panels embedded in building facade tiles activate upon contact with heavy metal–contaminated rainwater and produce a visible colour change, turning a building’s exterior into a self-reporting, equipment-free environmental monitoring system for Zambian Copperbelt communities.

How will the idea work? The product is a modular ceramic tile containing freeze-dried CFPS reactions embedded in a porous chitosan hydrogel matrix within micro-channels printed across the tile surface. When contaminated rainwater contacts the tile surface, it rehydrates the CFPS reaction. The encapsulated DNA template encodes a Cu²⁺-responsive genetic circuit: the CsoR-regulated PcopA promoter drives expression of a laccase enzyme, which oxidises a pre-loaded colourless substrate (ABTS, 2,2’-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid)) into a dark blue-green product visible from street level without any equipment (Pardee et al., 2014, Cell, 159(4), pp.940–954). Each tile acts as an independent, single-use test unit; tiles are replaced monthly and spent tiles safely incinerated. Because the CFPS components are lyophilised with trehalose as a cryoprotectant, tiles are shelf-stable for up to 18 months in ambient conditions, making them practical for the Zambian supply chain.

Societal challenge addressed: Communities in Kitwe, Chingola, and Mufulira, built directly adjacent to active Copperbelt mine tailings, have historically lacked affordable, accessible, real-time environmental monitoring. Professional ICP-MS testing costs hundreds of dollars per sample and requires samples to be shipped to Lusaka. These biosensor tiles place continuous heavy metal monitoring in the hands of communities who currently have none, directly addressing environmental justice gaps and complementing Zambia’s wider commitments on toxic metal pollution (for example under the Minamata Convention on Mercury, which covers mercury rather than copper).

Addressing cell-free system limitations: The single-use nature of CFPS reactions is here reframed as a design advantage: tiles are designed as replaceable consumable panels, similar to air filters, with a defined replacement schedule. Lyophilisation with trehalose addresses long-term stability (Pardee et al., 2016, Cell, 167(1), pp.248–259). Water activation is inherent to the outdoor application, since contaminated rainwater is the activating agent by design. To prevent false positives from clean rain, the activation threshold of the CsoR/PcopA circuit is calibrated above the WHO drinking-water guideline value for copper (2 mg/L), so only genuinely contaminated runoff produces a signal. For the one-time-use limitation, a tile refresh subscription service model, supplying replacement tile panels quarterly to mine-adjacent communities, creates a sustainable commercial and social impact model.


Homework Question from Ally Huang

Develop a mock Genes in Space proposal incorporating the BioBits® cell-free protein expression system.

1. Background (≤100 words)

Long-duration spaceflight profoundly disrupts the human gut microbiome. Microgravity, ionising radiation, and chronic psychological stress cause measurable shifts in microbial community composition — reductions in beneficial commensals such as Lactobacillus and Bifidobacterium, alongside blooms of potentially pathogenic genera (Turroni et al., 2020, Frontiers in Physiology, 11, p.553). These dysbiotic shifts are linked to immune dysregulation, inflammatory conditions, and impaired nutrient absorption in astronauts. On multi-year Mars missions where no medical evacuation is possible, early detection of gut dysbiosis could prevent life-threatening complications. Yet current microbiome diagnostics require complex laboratory infrastructure unavailable aboard spacecraft.

2. Molecular or genetic target (≤30 words)

Indole (produced by tryptophanase-expressing gut commensals) and butyrate (produced by Faecalibacterium prausnitzii) as proxy biomarkers of gut microbiome health status, detectable non-invasively in astronaut saliva.

3. Relationship to space biology challenge (≤100 words)

Indole is produced exclusively by tryptophanase-expressing bacteria — predominantly healthy gut commensals including E. coli and Bacteroides species — while butyrate is generated by the fermentation activity of Faecalibacterium prausnitzii and Roseburia, both of which decline sharply during spaceflight-associated dysbiosis. A drop in salivary indole and butyrate below an astronaut’s personal pre-flight baseline would serve as an early, non-invasive warning signal that the gut microbiome is shifting toward a dysbiotic state, allowing intervention — probiotic supplementation or dietary adjustment — before clinical symptoms appear (Lee & Lee, 2010, FEMS Microbiology Letters, 313(2), pp.120–128).

4. Hypothesis (≤150 words)

I hypothesise that freeze-dried BioBits® cell-free reactions containing riboswitch-based genetic circuits sensitive to indole and butyrate can be rehydrated with a single drop of astronaut saliva to produce a fluorescent output proportional to biomarker concentration — providing a rapid, equipment-minimal, and quantitative readout of gut microbiome health status aboard the ISS or a Mars transit vehicle. Specifically, I predict that astronauts showing greater than 50% reduction in salivary indole from their personal pre-flight baseline will demonstrate concurrent immune and gastrointestinal stress markers, validating the cell-free biosensor as a clinically meaningful diagnostic tool. The reasoning is grounded in published correlations between indole production and Lactobacillus-dominated healthy microbiomes, and the established capacity of riboswitch-based CFPS circuits to generate threshold-responsive fluorescent outputs at microgram-scale reagent quantities (Pardee et al., 2016, Cell, 167(1), pp.248–259).

5. Experimental plan (≤100 words)

Samples: Weekly saliva (100 µL) from four ISS crew members over a 6-month mission. Controls: pre-flight baseline saliva (personal reference), Earth-based healthy volunteer saliva, and synthetic indole/butyrate standard curves (0–500 µM). Protocol: Rehydrate one BioBits® freeze-dried pellet per sample with 5 µL of saliva. Incubate at 37°C (body temperature, maintained by crew member hand-warming pouch) for 2 hours. Read GFP fluorescence using the P51 Molecular Fluorescence Viewer. Confirm positive results with miniPCR® amplification of the tryptophanase gene (tnaA) from saliva as a microbial community abundance proxy. Data recorded: fluorescence intensity, tnaA band intensity, weekly dietary log, and crew health self-assessment scores.


Subsections of Labs

Week 1 Lab: Pipetting

Subsections of Projects

Individual Final Project

Subsections of Individual Final Project

Week 1 HW: Principles and Practices

Zambia Mineral-Waste Bioremediation Predictor

From Metagenome to Marketable Bioremediation Product

HTGAA 2026 Final Project · Elsa Muleya · SynBio USFQ Node


Project Rationale

Zambia’s Copperbelt Province faces severe heavy metal contamination from decades of copper mining at Konkola, Nchanga, Mufulira, and Chingola. Cu²⁺, Zn²⁺, Co²⁺, and Pb²⁺ leach from mine tailings into groundwater and agricultural soils at concentrations far exceeding WHO limits, with no affordable or accessible remediation solution for affected communities.

This project designs, validates, and packages a living biological solution: engineered Bacillus subtilis carrying a novel metallothionein (MT) gene discovered from Zambian mine-associated bacterial genomes, encapsulated in a field-deployable dual-layer hydrogel biocontainment system — ZAMGEL — that can be commercially produced and applied without specialist equipment or laboratory infrastructure.


Three-Aim Project Structure

| Aim | Title | Focus |
|---|---|---|
| 1 | Bioinformatics Discovery & Genetic Design | Metagenomics, structural prediction, circuit design |
| 2 | Wet Lab Validation Under Zambian Conditions | Transformation, metal assays, pH & stress testing |
| 3 | ZAMGEL Containment & Commercial Product Design | Hydrogel bioencapsulation, kill-switch, market pathway |

Aim 1: Bioinformatics Discovery & Genetic Design

Goal: Identify and structurally validate novel metallothioneins from Zambian mine-associated bacterial genomes, and design a complete synthetic expression cassette ready for wet lab transformation.

Sub-aim 1a: Metagenomic Mining of Zambian Copperbelt Sequences

Mine publicly available sequencing datasets from NCBI SRA, MG-RAST, and IMG/M targeting the Konkola, Nchanga, and Mufulira mine regions. The full computational pipeline:

FASTQ → fastp (QC trim) → MEGAHIT (assembly) → Prodigal (ORF prediction) → BLASTp + Prokka (annotation)
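As a rough sketch, the five steps could be chained as the command lines below. This assumes paired-end reads; the file names, the sample label, the MT reference database name and the E-value cutoff are illustrative placeholders, not choices made in this project. The script only builds and prints the commands, it does not run them.

```python
import shlex
from pathlib import Path
from typing import List


def build_pipeline(sample: str, r1: str, r2: str, mt_db: str,
                   workdir: str = "work") -> List[List[str]]:
    """Return the five pipeline commands, in order, as argument lists."""
    out = Path(workdir) / sample
    trimmed_r1 = str(out / "trimmed_R1.fastq.gz")
    trimmed_r2 = str(out / "trimmed_R2.fastq.gz")
    assembly_dir = out / "megahit"  # MEGAHIT expects this directory not to exist yet
    contigs = str(assembly_dir / "final.contigs.fa")
    proteins = str(out / "proteins.faa")
    return [
        # 1. Quality and adapter trimming of paired-end reads
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2],
        # 2. Metagenome assembly
        ["megahit", "-1", trimmed_r1, "-2", trimmed_r2, "-o", str(assembly_dir)],
        # 3. ORF prediction in metagenome mode, writing protein translations
        ["prodigal", "-i", contigs, "-a", proteins, "-p", "meta",
         "-f", "gff", "-o", str(out / "genes.gff")],
        # 4. Homology search against a reference MT protein database (tabular output)
        ["blastp", "-query", proteins, "-db", mt_db, "-outfmt", "6",
         "-evalue", "1e-5", "-out", str(out / "mt_hits.tsv")],
        # 5. Functional annotation of the assembled contigs
        ["prokka", "--metagenome", "--outdir", str(out / "prokka"),
         "--prefix", sample, contigs],
    ]


# Hypothetical sample and file names, for illustration only.
for cmd in build_pipeline("SAMPLE01", "SAMPLE01_1.fastq.gz",
                          "SAMPLE01_2.fastq.gz", "reference_mt_db"):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists means each one can later be passed to `subprocess.run` unchanged once the inputs exist.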

Filter candidates by the presence of the Cys-X-Cys motif, the canonical Cu/Zn coordination fingerprint in prokaryotic metallothioneins, and cross-reference against known prokaryotic MT families (SmtA-like, BmtA-like) and other metal-resistance determinants (CzcA efflux operons, CopA ATPases). Build a maximum-likelihood phylogenetic tree using IQ-TREE 2 to confirm novelty.
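The motif filter can be sketched as a regular-expression scan over the predicted protein sequences. The minimum motif count used below is a hypothetical cutoff chosen for illustration, and the two sequences are invented toy examples rather than real candidates.

```python
import re
from typing import Dict, Iterator, Tuple

# Lookahead so that overlapping motifs (e.g. CSCTC holds CSC and CTC) are all counted.
CXC = re.compile(r"(?=C.C)")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_cxc(seq: str) -> int:
    """Count Cys-X-Cys occurrences, including overlapping ones."""
    return len(CXC.findall(seq.upper()))


def filter_candidates(fasta_text: str, min_motifs: int = 2) -> Dict[str, int]:
    """Keep sequences with at least `min_motifs` Cys-X-Cys motifs (hypothetical cutoff)."""
    hits = {}
    for header, seq in read_fasta(fasta_text):
        n = count_cxc(seq)
        if n >= min_motifs:
            hits[header] = n
    return hits


# Invented toy sequences, for illustration only.
toy = """>toy_mt_like
MKCTCGSCKCAGE
>toy_no_cys
MSTNPKPQRKTKRNTNRRPQDVK
"""
print(filter_candidates(toy))
```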

| Database | Purpose |
|---|---|
| NCBI SRA | Primary source for Zambian mine metagenome FASTQ files |
| MG-RAST | Mine microbiome metagenomes with functional annotation |
| IMG/M | Integrated Microbial Genomes: metal resistance gene clusters |
| UniProt/SwissProt | Reference MT homology and Cys-X-Cys motif validation |

Sub-aim 1b: Structural Validation & Synthetic Expression Cassette Design

For the top 5 MT candidates from Sub-aim 1a, simultaneously validate 3D structural integrity and design the full synthetic genetic system.

Structural Validation

  • Submit top candidate sequences to AlphaFold3 to generate .pdb files and visualise cysteine-rich metal-binding pockets
  • Pass threshold: pLDDT > 85 across the metal-binding domain; ipTM > 0.80 for confident fold prediction
  • Quantify binding pocket geometry in PyMOL / ChimeraX: pocket volume (ų), solvent accessibility, Cys coordination angle, and closest Cys–Cys distance (target < 6 Å for effective Cu²⁺ coordination)
  • Calculate predicted dissociation constant: Kd = e^(ΔG/RT) at T = 310 K (37°C); expected range 10⁻¹³ to 10⁻¹⁵ M for high-performance prokaryotic MTs
  • Compare all candidates against reference proteins (SmtA from Synechococcus PCC 7942; BmtA from Pseudomonas) on Kd, Cys count, and pLDDT
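The Kd relation above can be checked numerically. The sketch below assumes the binding free energy ΔG is supplied in kcal/mol by whatever estimator is used (the source does not specify one) and shows which ΔG values correspond to the expected Kd range at 310 K.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol·K)


def kd_from_dg(dg_kcal_per_mol: float, temp_k: float = 310.0) -> float:
    """Kd = exp(ΔG / RT); ΔG is the (negative) binding free energy."""
    return math.exp(dg_kcal_per_mol / (R_KCAL_PER_MOL_K * temp_k))


def dg_from_kd(kd_molar: float, temp_k: float = 310.0) -> float:
    """Inverse relation: ΔG = RT ln(Kd)."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


# ΔG values that bracket the expected 1e-13 to 1e-15 M range.
for kd in (1e-13, 1e-15):
    print(f"Kd = {kd:.0e} M  ->  ΔG = {dg_from_kd(kd):.2f} kcal/mol")
```

At 310 K the expected Kd range corresponds to ΔG of roughly −18.4 to −21.3 kcal/mol, which gives a quick sanity check on any predicted binding energy.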

Expression Cassette Design (Benchling)

  • Codon-optimise the best-scoring MT sequence for B. subtilis 168 using Benchling’s built-in optimiser
  • Design a metal-responsive synthetic circuit in Cello 2.0: Cu²⁺ sensor (PcopA or PmtA promoter) → NOT gate logic → MT expressed only when Cu²⁺ exceeds threshold
  • Include eGFP fluorescent reporter downstream of MT as a real-time visual proxy for circuit activation
5'─[PcopA/PmtA]─[RBS B0034]─[MT_Bsubtilis_optimised]─[eGFP]─[T_B0015]─3'
    Cu²⁺ sensor   strong RBS    codon-optimised         reporter  terminator
  • Verify BioBrick RFC10 compatibility in Benchling
  • Submit all sequences through Twist Bioscience biosecurity screening (“Green” classification required before synthesis order)

Aim 2: Wet Lab Validation Under Zambian Environmental Conditions

Goal: Transform the computationally designed system into a living, functional biosensor-remediator and rigorously stress-test it against the real environmental conditions of the Zambian Copperbelt.

Sub-aim 2a: Chassis Construction & Verification

Transform B. subtilis 168 with the assembled MT expression plasmid and confirm successful integration using four independent assays before proceeding to metal exposure experiments:

| Assay | Method | Pass Criterion |
|---|---|---|
| Colony PCR | MT-specific primers flanking insert; 30 cycles, 55°C annealing | Band at expected insert size |
| Sanger Sequencing | Sequence full insert with M13 forward/reverse primers | 100% identity to designed cassette |
| SDS-PAGE + Western Blot | Anti-His-tag antibody; 4h induction at 37°C | Band at ~6 kDa (49 AA protein) |
| GFP Fluorescence Microscopy | Image colonies in Cu²⁺-spiked media at Ex 488 / Em 510 nm | > 5× fluorescence over water control |

Sub-aim 2b: Metal Ion Concentration Response Assays

Expose the engineered B. subtilis to a full Cu²⁺ concentration gradient spanning real Copperbelt mine drainage (reported range: 0.5–500 mg/L). Measure metal removal using ICP-MS on growth media supernatant and calculate Bio-Sequestration Efficiency (%BSE):

%BSE = ([Metal]₀ − [Metal]f) ÷ [Metal]₀ × 100

| Cu²⁺ Concentration | Environmental Context | Measurements |
|---|---|---|
| 0 mg/L | Negative control | GFP baseline, OD600, ICP-MS |
| 0.5 mg/L | WHO drinking water limit | GFP, OD600, ICP-MS |
| 5 mg/L | WHO industrial discharge limit | GFP, OD600, ICP-MS |
| 50 mg/L | Typical Konkola drainage concentration | GFP, OD600, ICP-MS |
| 500 mg/L | Peak Copperbelt leachate concentration | GFP, OD600, ICP-MS, survival rate |
| 1000 mg/L | Toxicity threshold: LD50 determination | Colony viability, LD50 endpoint |
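The %BSE formula is simple enough to compute directly from the ICP-MS readings. In the sketch below the initial and final concentrations are made-up example numbers, not measured results.

```python
def percent_bse(initial_mg_per_l: float, final_mg_per_l: float) -> float:
    """Bio-Sequestration Efficiency: percentage of dissolved metal removed."""
    if initial_mg_per_l <= 0:
        raise ValueError("initial concentration must be positive")
    return (initial_mg_per_l - final_mg_per_l) / initial_mg_per_l * 100.0


# Hypothetical example: 50 mg/L Cu2+ before treatment, 18 mg/L after.
bse = percent_bse(50.0, 18.0)
print(f"%BSE = {bse:.1f}%")
```

The 0 mg/L negative control has no defined %BSE, which is why the function rejects a non-positive starting concentration.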

Sub-aim 2c: pH Stress Testing

Zambian mine tailings range from pH 2.5–4.5 (active acid mine drainage) to pH 8–9 (alkaline neutralisation runoff). Test bacteria across this full range at fixed 50 mg/L Cu²⁺ to define the operational pH window and inform ZAMGEL outer shell buffer design.

| pH | Environmental Context (Zambia) | Measurements |
|---|---|---|
| 2.5 | Active acid mine drainage leachate | GFP, OD600, ICP-MS |
| 3.5 | Tailing pond runoff | GFP, OD600, ICP-MS |
| 4.5 | Near-tailing agricultural soil leachate | GFP, OD600, ICP-MS |
| 5.5 | Mildly acidic Copperbelt soil | GFP, OD600, ICP-MS |
| 6.5 ★ | Neutral control (laboratory standard) | GFP, OD600, ICP-MS |
| 7.5 | Borehole drinking water (Kitwe) | GFP, OD600, ICP-MS |
| 8.5 | Alkaline mine neutralisation runoff | GFP, OD600, ICP-MS |
| 9.0 | Extreme alkaline drainage (worst case) | GFP, OD600, ICP-MS |

Sub-aim 2d: Multi-Stressor Environmental Simulation

Real Copperbelt soil presents multiple co-occurring stresses. Bacteria must survive all of these simultaneously to be field-deployable. Each stressor is tested at fixed Cu²⁺ = 50 mg/L and pH 6.5 to isolate the effect; a final cocktail experiment combines all worst-case stressors simultaneously.

| Stressor | Zambia-Specific Condition | Test Parameters | Output Measured |
|---|---|---|---|
| Temperature | Avg 24°C; dry season peak 38°C | 20, 28, 37, 42°C | OD600, GFP, %BSE |
| Co-metal toxicity | Cu²⁺ + Zn²⁺ + Co²⁺ + Pb²⁺ co-contamination | Single vs cocktail, 50 mg/L each | ICP-MS all ions, GFP |
| Desiccation | Dry season soil water activity < 0.85 | aw 0.85, 0.90, 0.95 via NaCl | OD600, colony viability |
| UV exposure | High solar UV at 12–15°S latitude | UV-C 254 nm: 0, 10, 30, 60 s pulse | Colony survival, DNA damage gel |
| Competing microbiome | Indigenous Copperbelt soil microbiome | 10% v/v heat-killed soil extract | GFP, OD600, ICP-MS |

Aim 3: ZAMGEL Containment System & Commercial Product Design

Goal: Design a biomaterial containment system that physically and genetically contains the engineered bacteria inside a field-deployable carrier, preventing environmental escape while maintaining full metal-sequestration function — creating a product that can be commercially sold and applied without ecological risk.

Sub-aim 3a: ZAMGEL Dual-Layer Hydrogel Bioencapsulation

The ZAMGEL biocapsule is a three-layer biomaterial architecture: two hydrogel layers (outer shell and inner core) separated by a size-selective membrane. Each layer performs a distinct function, together creating a self-contained living bioreactor deployable directly onto mine tailings:

| Layer | Composition | Function | Sourcing |
|---|---|---|---|
| Outer shell | Calcium alginate + CaCO₃ nanoparticles | pH buffering: neutralises acidic mine leachate to pH 5.5–6.5 before bacteria are exposed; structural integrity in soil | Food-grade alginate; CaCO₃ from local limestone |
| Middle membrane | Cellulose nanofibre + chitosan crosslink | Size-selective filter: 200 nm pores allow Cu²⁺ ions (0.73 Å) to enter freely; bacteria (1–2 µm) physically cannot escape | Local agricultural waste cellulose; chitosan import |
| Inner core | PVA + gelatin hydrogel + activated charcoal | Bacteria viability matrix at 10⁸ CFU/mL; activated charcoal provides passive metal co-adsorption during biological lag phase | Commercial PVA/gelatin; charcoal from local Copperbelt source |

Sub-aim 3b: Containment Validation & Kill-Switch Integration

Containment Validation

| Test | Protocol | Pass Threshold |
|---|---|---|
| Bacterial escape | Plate surrounding water on LB agar at 7, 14, 30 days | < 1 CFU/mL at 30 days |
| Ion permeability | ICP-MS of surrounding fluid vs bead interior after 24h Cu²⁺ exposure | Cu²⁺ enters freely; bacteria absent in external fluid |
| Mechanical durability | Compression to 50 kPa (equivalent to 30 cm soil overburden) | No structural failure; containment maintained |
| Biodegradation rate | Bury spent beads in Zambian soil analogue at 28°C; measure mass loss weekly | Full degradation in 90–180 days; no persistent residue |

Genetic Kill-Switch (MazF/MazE Toxin-Antitoxin)

A MazF/MazE kill-switch is integrated into the B. subtilis chromosome (not plasmid, to prevent loss). MazE antitoxin is expressed under a Ptet promoter requiring anhydrotetracycline (aTc) to remain active. When aTc is withdrawn (ZAMGEL retrieved or degraded at end of life), MazE degrades, MazF mRNA interferase cleaves all mRNA, and all bacteria die within 48 hours. A secondary CcdB/CcdA kill-switch on the plasmid backbone provides an orthogonal safety layer.

aTc present → MazE expressed → MazF neutralised → Bacteria LIVE
aTc absent  → MazE degraded  → MazF active      → Bacteria DEAD within 48h

Sub-aim 3c: Commercial Product Formats & Digital Predictor App

| Format | Description | Use Case | Deployment |
|---|---|---|---|
| ZAMGEL Beads | 3–5 mm spheres, ~10⁸ CFU/bead | Mine water treatment ponds | Broadcast by hand or machine |
| ZAMGEL Sheets | 10×10 cm biodegradable mats | Soil surface tailing cap treatment | Lay directly on contaminated soil |
| ZAMGEL Cartridges | Inline filter column packed with beads | Borehole and drainage pipe treatment | Install in drainage infrastructure |

A Streamlit-based mobile web app (offline-capable PWA) allows community members and mine site managers to input local soil Cu²⁺ concentration, pH, temperature, and treatment area, and receive a data-driven treatment recipe — number of ZAMGEL beads, predicted %BSE, and estimated remediation timeline — based on dose-response curves generated in Aim 2. No laboratory equipment required.
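The core of the treatment recipe could be a calculation along the following lines. This is only a sketch for a volume of contaminated water: the per-bead copper capacity is a hypothetical placeholder that would have to come from the Aim 2 dose-response data, and the function and parameter names are illustrative.

```python
import math


def beads_required(cu_mg_per_l: float, volume_l: float,
                   target_bse_percent: float,
                   capacity_mg_per_bead: float) -> int:
    """Estimate the bead count needed to remove the target fraction of Cu2+.

    capacity_mg_per_bead is a hypothetical placeholder; the real value
    would be derived from the dose-response curves measured in Aim 2.
    """
    if capacity_mg_per_bead <= 0:
        raise ValueError("bead capacity must be positive")
    # Total copper mass (mg) that must be sequestered to reach the target %BSE.
    cu_to_remove_mg = cu_mg_per_l * volume_l * target_bse_percent / 100.0
    return math.ceil(cu_to_remove_mg / capacity_mg_per_bead)


# Hypothetical inputs: 1000 L pond at 50 mg/L, 60% removal target,
# and an assumed capacity of 0.5 mg Cu per bead.
print(beads_required(50.0, 1000.0, 60.0, 0.5))
```

A soil-based recipe would additionally need the treated area and depth to be converted into an equivalent contaminated volume or mass.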

Regulatory pathway: Zambia Environmental Management Agency (ZEMA) contained-use application under Biosafety Act No. 10 of 2007; Nagoya Protocol compliance for use of indigenous Zambian microbial genetic resources; community consent framework with Copperbelt mining communities. Primary commercial client: ZCCM-IH.


15-Week Project Timeline

| Week | Aim | Activity |
|---|---|---|
| 1 | 1a | SRA/MG-RAST/IMG/M search for Konkola, Nchanga, Mufulira mine datasets; quality trim with fastp |
| 2 | 1a | MEGAHIT assembly → Prodigal ORF prediction → BLASTp + Prokka annotation of metal resistance genes |
| 3 | 1a | Cys-X-Cys motif filter → top 5 candidates selected; IQ-TREE 2 maximum-likelihood phylogenetic tree |
| 4 | 1b | AlphaFold3 structure prediction for all 5 candidates; retrieve .pdb files |
| 5 | 1b | PyMOL/ChimeraX binding pocket quantification: volume, Cys coordination geometry, pLDDT mapping |
| 6 | 1b | Benchling codon optimisation + Cello 2.0 logic gate design + Twist Bioscience DNA order |
| 7 | 2a | B. subtilis 168 transformation; colony PCR; Sanger sequencing verification |
| 8 | 2a | SDS-PAGE + western blot + GFP fluorescence microscopy to confirm MT expression |
| 9 | 2b | Cu²⁺ concentration gradient assays (0–1000 mg/L); ICP-MS; GFP plate reader; dose-response curve |
| 10 | 2c | pH stress assays (pH 2.5–9.0) at 50 mg/L Cu²⁺; identify operational pH window |
| 11 | 2d | Multi-stressor factorial experiment: temperature × co-metals × UV × desiccation × microbiome cocktail |
| 12 | 3a | ZAMGEL prototype fabrication: alginate outer shell + chitosan membrane + PVA/gelatin inner core |
| 13 | 3b | Containment validation: LB plating, ICP-MS permeability, compression testing, biodegradation assay |
| 14 | 3b | MazF/MazE kill-switch chromosomal integration + aTc withdrawal 48h death assay; CcdB/CcdA backup |
| 15 | 3c | Streamlit app prototype; ZEMA regulatory pathway draft; final in silico feasibility report |

Validation Criteria & Contingency Plans

| Experiment | Pass Threshold | If Fail: Contingency |
|---|---|---|
| AlphaFold3 pLDDT (binding domain) | > 85 on core domain; ipTM > 0.80 | Use SmtA (Synechococcus PCC 7942) as positive control scaffold; re-run with AlphaFold2 |
| GFP activation in Cu²⁺ media | > 5× fluorescence over background | Redesign Cello promoter with stronger RBS; increase plasmid copy number |
| ICP-MS metal removal (%BSE) | > 60% BSE at 50 mg/L Cu²⁺ | Increase MT copy number via multi-copy plasmid (pHT01); co-express CopA copper ATPase |
| pH operational window | Active sequestration at pH 4.5–8.0 | Increase CaCO₃ loading in ZAMGEL outer shell; add internal carbonate buffer inside PVA core |
| ZAMGEL containment (30 days) | < 1 CFU/mL in surrounding medium | Increase chitosan crosslink density; reduce pore size to 100 nm |
| Kill-switch efficacy | 100% cell death within 48h of aTc removal | Switch to CcdB/CcdA system; add second orthogonal kill-switch on separate chromosome locus |

Why This Project Matters

Existing Copperbelt remediation approaches — lime neutralisation, chemical precipitation, pump-and-treat — are capital-intensive, infrastructure-dependent, and inaccessible to subsistence communities adjacent to mine tailings. The ZAMGEL system offers:

  • No electricity or specialist infrastructure required — scatter-and-forget deployment
  • Zero environmental release — physically contained by 200 nm membrane; genetically contained by dual kill-switch
  • Self-regulating — MT only expressed when Cu²⁺ exceeds threshold; GFP reporter confirms activity in real time
  • Locally grounded — MT gene discovered from Zambian mine-associated bacterial genomes
  • Commercially viable — manufacturable from locally sourced materials; approvable under existing Zambian biosafety law
  • Community-facing — Streamlit app enables treatment planning without laboratory equipment or expertise

Group Final Project
