Subsections of Homework
Week 1 HW: Principles and Practices
1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about. 🧬
Bio-Hybrid Fusion Blanket
Research Context:
I am currently a research assistant investigating Magnetohydrodynamics (MHD), specifically focusing on the complex interactions between magnetic fields and 150-million-degree plasma. My work involves optimizing plasma confinement within Tokamak reactors. At these extreme temperatures, the behaviour of the plasmas is governed by a delicate balance of magnetic pressure and fluid dynamics, creating an environment that is incredibly hostile to the physical structures surrounding it.
Physics Problem:
In a deuterium-tritium fusion reactor, the blanket is a critical component that lines the interior of the reactor. It captures the high-energy neutrons released by fusion, converting their kinetic energy into heat that is used to generate electricity. It also contains lithium, which breeds tritium when struck by those neutrons, producing fuel we can recycle in the reactor.
Currently, these blankets are limited by severe material degradation. High-energy neutron bombardment causes metals to swell, become brittle, and crack from the inside out as waste reactants accumulate. Plasma is also a volatile fluid that is difficult to control with magnets; sudden disruptions can dump massive thermal loads onto the reactor walls. Since current materials are rigid and static, they cannot absorb or repair these shocks, leading to surface melting and catastrophic structural failures.
The proposal:
The idea is to develop a bio-hybrid, self-healing blanket for the fusion reactor replacing rigid metal walls with a dynamic system where biology acts as both the architect and the maintenance crew.
One idea could be utilizing synthetic biology to grow the initial reactor structure. By using biology as a 3D template, we can grow a reactor structure that places lithium atoms with perfect accuracy. This creates a more efficient fuel-making system inside a heat-shield wall filled with tiny, vein-like cooling channels that traditional machines simply can’t build.
Another idea involves using the reactor’s downtime as a biological recovery phase. Once the system is cooled, the network of vascular channels becomes a highway for bespoke, bio-engineered cells designed to seek out and clear trapped helium waste. These cellular workers then secrete new mineral precursors to “re-grow” the scaffolding at the site of neutron-induced cracks, allowing the blanket to rejuvenate its structural integrity like a self-maintaining organ.
2. Governance or policy goals for an ethical future ⚖️
One goal would be to ensure the bio-hybrid blanket is easy to clean up. We want to avoid creating bio-nuclear waste that is harder to handle than regular blanket material.
Sub goal 1: Easy deconstruction – the biological structure should be non-toxic and easy to recycle and dissolve away after use; we should be able to filter out and recycle expensive metals used such as lithium.
Sub goal 2: Chemical safety – the maintenance cells must be engineered so they don’t produce harmful chemicals while they work, so the reactor process doesn’t require hazardous waste treatment.
3. Governance actions across actors 🏛️
Action 1: Digital DNA Registry (Technical Strategy)
Actor: Researchers
Purpose: Move away from secret, proprietary cell design to a shared public database of genetic blueprints.
Design: Researchers must upload their genetic code of their cells to a registry so other people know how to handle and recycle them.
Assumptions: Assumes labs will share blueprints, and that a global standard for DNA data would work.
Risk of Failure: Bad actors could learn how to destabilise or reverse engineer cells since it’s public.
Success: Any country could build and recycle their reactors and blankets.
Action 2: Green Fusion Tax Credits (Financial Incentive)
Purpose: Reward reactors that prove they are highly recyclable.
Design: The government would give extra funding or tax breaks to companies whose bio blankets leave minimal toxic waste behind.
Assumptions: Assumes financial incentives are strong enough for companies to prioritize recyclability over speed of reactor development.
Risk of Failure: Companies might greenwash their data to get money without actually being clean.
Success: Low-waste reactors become the most profitable to run and become the industry standard.
Action 3: Biological Security (New Rule)
Purpose: Prevent technology from being turned into a biological weapon that can survive extreme environments.
Design: Require fusion labs to store biological material in high security facilities with background checks like those used for handling nuclear fuel.
Assumptions: Assumes these bio engineered cells are dangerous.
Risk of Failure: Expensive security could slow down science, so we never get to clean fusion energy.
Success: Only good actors working on building innovative materials to help achieve clean energy get access to these biomaterials.
4. Scoring governance options 📊
| Does the option: | Option 1: DNA Registry | Option 2: Green Tax Credits | Option 3: Bio-Security Rule |
|---|---|---|---|
| Enhance Biosecurity | | | |
| By preventing incidents | 3 | 2 | 1 |
| By helping respond | 1 | 3 | 2 |
| Foster Lab Safety | | | |
| By preventing incidents | 2 | 3 | 1 |
| By helping respond | 1 | 3 | 2 |
| Protect the environment | | | |
| By preventing incidents | 2 | 1 | 2 |
| By helping respond | 1 | 2 | 3 |
| Other considerations | | | |
| Minimizing costs or burdens | 1 | 2 | 3 |
| Feasibility | 2 | 1 | 3 |
| Not impede research | 2 | 3 | 1 |
| Promote constructive applications | 1 | 2 | 3 |
5. Recommended governance pathway 🎯
I would prioritize Action 3: Biological Security as the main requirement, addressed to the U.S. Departments of Energy and Defense. We should first address the immediate risk of creating bioweapons that can withstand radiation and high temperatures, so that the foundation of the industry is built on containment and control before scaling or commercialization. Once the technology is regulated in a similar manner to nuclear fuel, Action 1 should be incentivized as a long-term safety net, providing a transparent repair manual for materials once they are safely deployed.
Trade-offs and Uncertainties:
Innovation vs. Security: The primary trade-off is that high security increases costs and can slow down academic research. There is a risk that over-regulating early-stage biology could delay clean fusion energy development.
Assumption of Risk: This plan assumes these bio-engineered cells are dangerous enough to warrant military-grade security. If the cells are actually fragile outside the reactor, the security measures might be unnecessary.
Questions from Professor Jacobson 🧪
Error rate for polymerase is 1 in 10^6 bases. The human genome length is 3.2 × 10^9 bases, so an uncorrected copy would carry roughly 3,200 errors. Biology deals with the discrepancy using the MutS repair system.
Average Human Protein: 1036bp = 345 amino acids
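There are 61 sense codons shared among 20 amino acids, so each amino acid has about three synonymous codons on average. A 345-residue protein can therefore be encoded in roughly 3^345 different ways (more than 10^160). Most of these codes don’t work in practice because differences in codon bias, mRNA structure, and translation efficiency can disrupt expression, stability, or correct protein production.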
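The two estimates in this answer can be checked numerically. This is a minimal sketch: the error rate, genome length, and protein length come from the text, while the figure of three synonymous codons per amino acid is an averaging assumption (61 sense codons spread over 20 amino acids, rounded). The exact count depends on the actual sequence.

```python
import math

POLYMERASE_ERROR_RATE = 1e-6   # errors per base copied (from the text)
GENOME_LENGTH = 3.2e9          # bases in the human genome (from the text)
PROTEIN_LENGTH_AA = 345        # average human protein length (from the text)
AVG_CODONS_PER_AA = 3          # assumption: 61 sense codons / 20 amino acids, rounded

# Expected errors in one genome copy if nothing repaired them
expected_errors = POLYMERASE_ERROR_RATE * GENOME_LENGTH

# The number of synonymous encodings is too large to print usefully,
# so work with its base-10 logarithm instead
log10_encodings = PROTEIN_LENGTH_AA * math.log10(AVG_CODONS_PER_AA)

print(f"Expected errors per genome copy: {expected_errors:.0f}")
print(f"Synonymous encodings: about 10^{log10_encodings:.1f}")
```

Working in logarithms keeps the second number readable; it also makes clear that even a modest average degeneracy gives an astronomically large design space.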
Questions from Dr. LeProust 🧬
Phosphoramidite DNA synthesis.
The error rate is high, about 1 in 10^2 per base, so errors and truncated products accumulate exponentially with each base-addition cycle.
After 2000 chemical synthesis cycles, errors and incomplete couplings accumulate at each step, and because the process has no proofreading, nearly all strands become truncated or mutated, leaving virtually no correct full-length product.
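The compounding effect can be sketched with a one-line model. It assumes each cycle fails independently with the 1-in-10^2 rate quoted above; the function name is illustrative, and real chemistries have step-dependent yields.

```python
def full_length_fraction(n_cycles, per_base_error=1e-2):
    """Fraction of strands that are still error-free after n_cycles additions."""
    return (1.0 - per_base_error) ** n_cycles

# Compare a short oligo, a long oligo, and a 2000-base direct synthesis
for n in (20, 200, 2000):
    print(f"{n:>5} cycles: {full_length_fraction(n):.3g} of strands are correct")
```

Under this assumption the correct fraction drops to a few parts per billion by 2000 cycles, which is why long constructs are assembled from short, verified fragments rather than synthesized in one pass.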
Question from George Church 🧠
10 essential amino acids which can’t be synthesized in the body:
Phenylalanine
Valine
Threonine
Tryptophan
Isoleucine
Methionine
Histidine
Arginine
Leucine
Lysine
Since lysine is one of the amino acids that can’t be synthesised, the lysine contingency exploits this natural dependency as a biocontainment strategy: the organism survives only where lysine is supplied.
Sources:
https://nutrenaworld.com/blog/horses/what-are-essential-amino-acids-in-protein-and-why-do-they-matter/
AI prompt – What is Lysine Contingency:
Lysine Contingency is a biocontainment strategy where an engineered organism is made unable to synthesize lysine, so it can only survive if lysine is externally supplied.
Week 2 HW: DNA Design Challenge
⚙️ 3.1 Choose a protein
I chose the ATP synthase beta subunit because it’s essentially a biological motor and connects to my broader interest in energy systems:
Protons flow down their gradient across the mitochondrial membrane, almost like current moving through a circuit, and that flow physically spins part of the protein like a tiny turbine. That rotation drives changes in the beta subunits, which catalyze the formation of ATP from ADP and phosphate.
So it’s literally energy stored in a gradient being converted into mechanical motion and then into chemical energy. I find that idea really compelling, it’s molecular thermodynamics in action, where fundamental physics laws become something tangible inside living cells.
From NCBI I obtained the protein sequence:
https://www.ncbi.nlm.nih.gov/protein/NP_001677.2/
https://www.ncbi.nlm.nih.gov/protein/NP_001677.2?report=fasta
NP_001677.2 ATP synthase F(1) complex subunit beta, mitochondrial precursor [Homo sapiens]
MLGFVGRVAAAPASGALRRLTPSASLPPAQLLLRAAPTAVHPVRDYAAQTSPSPKAGAATGRIVAVIGAV
VDVQFDEGLPPILNALEVQGRETRLVLEVAQHLGESTVRTIAMDGTEGLVRGQKVLDSGAPIKIPVGPET
LGRIMNVIGEPIDERGPIKTKQFAPIHAEAPEFMEMSVEQEILVTGIKVVDLLAPYAKGGKIGLFGGAGV
GKTVLIMELINNVAKAHGGYSVFAGVGERTREGNDLYHEMIESGVINLKDATSKVALVYGQMNEPPGARA
RVALTGLTVAEYFRDQEGQDVLLFIDNIFRFTQAGSEVSALLGRIPSAVGYQPTLATDMGTMQERITTTK
KGSITSVQAIYVPADDLTDPAPATTFAHLDATTVLSRAIAELGIYPAVDPLDSTSRIMDPNIVGSEHYDV
ARGVQKILQDYKSLQDIIAILGMDELSEEDKLTVSRARKIQRFLSQPFQVAEVFTGHMGKLVPLKETIKG
FQQILAGEYDHLPEQAFYMVGPIEEAVAKADKLAEEHSS
🔁 3.2 Reverse translate a protein sequence
We know we go from 3 DNA bases → RNA → 1 Codon → 1 Amino Acid → 1 Protein letter
We can find the nucleotide record
https://www.ncbi.nlm.nih.gov/nuccore/NM_001686.4
https://www.ncbi.nlm.nih.gov/nuccore/NM_001686.4?report=fasta
NM_001686.4 Homo sapiens ATP synthase F1 subunit beta (ATP5F1B), mRNA; nuclear gene for mitochondrial product
AGTCTCCACCCGGACTACGCCATGTTGGGGTTTGTGGGTCGGGTGGCCGCTGCTCCGGCCTCCGGGGCCT
TGCGGAGACTCACCCCTTCAGCGTCGCTGCCCCCAGCTCAGCTCTTACTGCGGGCCGCTCCGACGGCGGT
CCATCCTGTCAGGGACTATGCGGCGCAAACATCTCCTTCGCCAAAAGCAGGCGCCGCCACCGGGCGCATC
GTGGCGGTCATTGGCGCAGTGGTGGACGTCCAGTTTGATGAGGGACTACCACCAATTCTAAATGCCCTGG
AAGTGCAAGGCAGGGAGACCAGACTGGTTTTGGAGGTGGCCCAGCATTTGGGTGAGAGCACAGTAAGGAC
TATTGCTATGGATGGTACAGAAGGCTTGGTTAGAGGCCAGAAAGTACTGGATTCTGGTGCACCAATCAAA
ATTCCTGTTGGTCCTGAGACTTTGGGCAGAATCATGAATGTCATTGGAGAACCTATTGATGAAAGAGGTC
CCATCAAAACCAAACAATTTGCTCCCATTCATGCTGAGGCTCCAGAGTTCATGGAAATGAGTGTTGAGCA
GGAAATTCTGGTGACTGGTATCAAGGTTGTCGATCTGCTAGCTCCCTATGCCAAGGGTGGCAAAATTGGG
CTTTTTGGTGGTGCTGGAGTTGGCAAGACTGTACTGATCATGGAGTTAATCAACAATGTCGCCAAAGCCC
ATGGTGGTTACTCTGTGTTTGCTGGTGTTGGTGAGAGGACCCGTGAAGGCAATGATTTATACCATGAAAT
GATTGAATCTGGTGTTATCAACTTAAAAGATGCCACCTCTAAGGTAGCGCTGGTATATGGTCAAATGAAT
GAACCACCTGGTGCTCGTGCCCGGGTAGCTCTGACTGGGCTGACTGTGGCTGAATACTTCAGAGACCAAG
AAGGTCAAGATGTACTGCTATTTATTGATAACATCTTTCGCTTCACCCAGGCTGGTTCAGAGGTGTCTGC
ATTATTGGGCCGAATCCCTTCTGCTGTGGGCTATCAGCCTACCCTGGCCACTGACATGGGTACTATGCAG
GAAAGAATTACCACTACCAAGAAGGGATCTATCACCTCTGTACAGGCTATCTATGTGCCTGCTGATGACT
TGACTGACCCTGCCCCTGCTACTACGTTTGCCCATTTGGATGCTACCACTGTACTGTCGCGTGCCATTGC
TGAGCTGGGCATCTATCCAGCTGTGGATCCTCTAGACTCCACCTCTCGTATCATGGATCCCAACATTGTT
GGCAGTGAGCATTACGATGTTGCCCGTGGGGTGCAAAAGATCCTGCAGGACTACAAATCCCTCCAGGATA
TCATTGCCATCCTGGGTATGGATGAACTTTCTGAGGAAGACAAGTTGACCGTGTCCCGTGCACGGAAAAT
ACAGCGTTTCTTGTCTCAGCCATTCCAGGTTGCTGAGGTCTTCACAGGTCATATGGGGAAGCTGGTACCC
CTGAAGGAGACCATCAAAGGATTCCAGCAGATTTTGGCAGGTGAATATGACCATCTCCCAGAACAGGCCT
TCTATATGGTGGGACCCATTGAAGAAGCTGTGGCAAAAGCTGATAAGCTGGCTGAAGAGCATTCATCGTG
AGGGGTCTTTGTCCTCTGTACTGTCTCTCTCCTTGCCCCTAACCCAAAAAGCTTCATTTTTCTGTGTAGG
CTGCACAAGAGCCTTGATTGAAGATATATTCTTTCTGAACAGTATTTAAGGTTTCCAATAAAATGTACAC
CCCTCAGAA
🧪 3.3 Codon Optimization
Multiple codons can code for the same amino acid, but different organisms prefer certain codons over others. We therefore optimize codon usage for the specific host organism; otherwise translation might be inefficient. The goal is to use codons whose matching tRNAs are plentiful, since each tRNA binds its codon and delivers the corresponding amino acid.
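The core idea can be sketched as a lookup from amino acid to a single preferred codon. This is only an illustration: the small table below is hand-picked from codons that appear in the optimized result later in this section, and the function names are hypothetical. Real optimizers weigh full codon-usage tables and also vary codons to avoid repeats and unwanted sequence features.

```python
# Illustrative table only: one codon per amino acid, covering just the
# residues needed for the short example peptide below.
PREFERRED_CODON = {
    "M": "ATG", "L": "CTG", "G": "GGC", "F": "TTT",
    "V": "GTG", "R": "CGT", "A": "GCG",
}

def reverse_translate(protein):
    """Map each residue to its single preferred codon."""
    missing = [aa for aa in protein.upper() if aa not in PREFERRED_CODON]
    if missing:
        raise ValueError(f"No codon listed for residues: {sorted(set(missing))}")
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())

def translate(dna):
    """Translate back using the same table, as a round-trip check."""
    codon_to_aa = {codon: aa for aa, codon in PREFERRED_CODON.items()}
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, len(dna), 3))

peptide = "MLGFVGRVAAA"   # first residues of the ATP synthase beta precursor
dna = reverse_translate(peptide)
print(dna)
print(translate(dna) == peptide)
```

The round trip confirms that changing codons leaves the protein unchanged, which is the property any codon optimization has to preserve.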
I have chosen E. coli as the organism to optimize the protein sequence for, since we use it in the fluorescent bacteria artwork lab!
Above is the entire mRNA sequence, but we need only the coding sequence (CDS), which runs from the start codon to the stop codon. The mRNA record also contains untranslated regions on either side. We can go to the CDS record instead, obtain the coding sequence, and then run codon optimization on it.
https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS8924.1
https://www.idtdna.com/CodonOpt
We get this result:
ATG CTG GGA TTT GTT GGA CGT GTG GCT GCC GCG CCT GCG TCA GGA GCA CTG CGC CGC CTG ACT CCT TCT GCC TCT CTG CCG CCG GCG CAG CTG CTG CTG CGT GCG GCG CCA ACC GCG GTT CAC CCG GTG CGT GAT TAT GCC GCG CAG ACC TCG CCC TCT CCG AAA GCC GGT GCG GCC ACC GGC CGT ATC GTC GCG GTG ATC GGC GCG GTG GTA GAT GTA CAG TTT GAT GAA GGT CTG CCG CCG ATT CTC AAT GCG CTG GAA GTT CAG GGC CGT GAA ACC CGC CTG GTT CTG GAG GTA GCG CAG CAC CTG GGT GAG AGC ACC GTC CGT ACC ATT GCT ATG GAC GGC ACC GAA GGT CTG GTG CGT GGT CAG AAA GTG CTG GAT TCT GGT GCA CCG ATC AAA ATC CCG GTT GGC CCG GAA ACG TTG GGG CGT ATC ATG AAC GTC ATT GGT GAA CCG ATT GAT GAA CGT GGA CCG ATC AAA ACC AAA CAG TTT GCG CCG ATC CAT GCG GAA GCG CCG GAG TTT ATG GAA ATG AGC GTT GAG CAG GAG ATC CTG GTG ACC GGC ATC AAA GTG GTT GAT CTG CTG GCG CCG TAT GCC AAA GGC GGC AAA ATC GGC CTG TTC GGC GGT GCG GGT GTC GGC AAA ACC GTG CTG ATC ATG GAG CTG ATC AAC AAC GTG GCG AAA GCG CAC GGT GGT TAC AGC GTC TTT GCC GGT GTC GGT GAG CGC ACC CGT GAA GGT AAC GAC CTG TAT CAC GAA ATG ATT GAG AGC GGT GTG ATC AAC CTG AAA GAT GCG ACC AGC AAG GTC GCG CTG GTT TAC GGC CAG ATG AAC GAG CCG CCA GGT GCG CGT GCC CGT GTT GCG CTG ACT GGC CTG ACG GTA GCT GAG TAC TTC CGT GAC CAG GAA GGT CAG GAT GTG CTG CTG TTT ATC GAC AAC ATC TTC CGC TTC ACC CAG GCA GGC TCT GAA GTC TCT GCG CTG CTG GGT CGC ATC CCC TCA GCG GTT GGC TAT CAG CCG ACC CTG GCG ACC GAC ATG GGC ACC ATG CAG GAG CGT ATC ACC ACC ACC AAA AAA GGC TCT ATC ACC TCG GTT CAG GCG ATC TAT GTG CCG GCT GAT GAT CTG ACT GAT CCG GCA CCG GCA ACC ACC TTT GCC CAC CTG GAT GCC ACC ACC GTG CTC AGC CGT GCG ATT GCC GAG CTG GGT ATC TAC CCG GCG GTG GAT CCG CTG GAC AGC ACC TCG CGT ATT ATG GAC CCC AAC ATT GTC GGC TCT GAA CAC TAC GAT GTG GCG CGC GGC GTG CAG AAG ATC CTG CAG GAC TAC AAA AGC CTG CAG GAT ATC ATT GCC ATC CTG GGT ATG GAT GAA CTC TCT GAA GAA GAT AAA CTG ACC GTT AGC CGT GCG CGC AAA ATC CAG CGC TTC CTG AGC CAG CCG TTC CAG GTG GCG GAA GTG TTC ACC GGT CAC ATG GGC AAA CTG GTG CCG CTG AAA GAG ACT ATT AAA GGC TTC CAG CAG ATT CTG GCG GGT GAG TAC GAC 
CAC CTG CCG GAA CAG GCG TTC TAT ATG GTG GGC CCG ATT GAA GAG GCG GTG GCG AAA GCG GAT AAA CTG GCG GAA GAA CAT AGC AGC TAA
🧫 3.4 What technologies could be used to produce this protein from your DNA?
We can use cell-dependent expression: clone the optimized DNA sequence into a plasmid vector and introduce it into a host organism such as E. coli. Once inside, the promoter recruits RNA polymerase, which transcribes the DNA sequence into mRNA. Ribosomes then bind to the mRNA, and tRNAs match codons and deliver amino acids, which are linked together to form the protein. The bacteria would then produce the ATP synthase beta subunit using their own cellular machinery.
🧩 4.1-2 Build your DNA insert sequence
Expression Cassette
https://benchling.com/s/seq-QDGibA4g7TjoTuX3lb5A?m=slm-Gx8zqXYh9sr4lxSK0Xqu
🔄 4.3-6 Twist, Vector choice, Sequence Download
We can view the full plasmid sequence (circular DNA) for our clonal gene and the pTwist Amp High Copy cloning vector in Benchling:
https://benchling.com/s/seq-wsl9w63Z5DcxN7rlp5cG?m=slm-ndl9y5U2FSsJgNYW6z7
🧬 5.1 What DNA would you want to sequence and technologies used?
I would choose to sequence the DNA of extremophiles that thrive in high-radiation or high-temperature environments. By sequencing genes involved in radiation resistance, DNA repair, and protein stabilization, we could better understand the molecular mechanisms that allow biological systems to survive under extreme stress. This knowledge could help inform the engineering of radiation-resistant biological materials or bio-hybrid systems designed to operate in harsh energy environments. Studying these organisms connects molecular biology with broader challenges in advanced energy systems.
I would use Illumina sequencing to sequence the DNA since it provides high accuracy and high throughput and is well suited for whole-genome sequencing and variant detection. It’s a second generation technique, sequencing millions of short DNA fragments in parallel using sequencing-by-synthesis. Illumina sequencing reads DNA by copying it one base at a time and taking a picture after each base is added.
The input would be the extracted genomic DNA.
Preparation steps:
- Fragment DNA into short pieces
- Ligate sequencing adapters
- PCR amplify fragments
- Load onto flow cell for cluster amplification
Essential sequencing steps:
- DNA fragments bind to flow cell
- Bridge amplification forms clusters
- Fluorescently labeled nucleotides are added one at a time
- A camera detects the fluorescent signal for each incorporated base
- The color signal determines the base
The output would be millions of short sequence reads containing nucleotide sequences and quality scores which can be assembled into a genome or aligned to a reference.
🧪 5.2 What DNA would you want to synthesize and technologies used?
I would want to synthesize a cluster of genes involved in enhanced DNA repair and protein stabilization from extremophiles and express them in a model organism. By combining multiple protective pathways, we could engineer cells with improved resistance to radiation and thermal stress. The idea would be to use this to develop radiation-resistant biomaterials or biological components for extreme energy environments. We could build a genetic circuit that enables engineered bacteria to sense and respond to radiation stress. This circuit could include radiation response promoters, DNA repair genes and protective protein pathways that activate under high oxidative or ionizing radiation conditions.
To synthesize this genetic circuit, we could use Twist combined with phosphoramidite solid-phase DNA synthesis and Gibson Assembly for multi-fragment assembly.
Essential steps:
- Design optimized DNA sequence computationally
- Chemically synthesize short oligonucleotides (base-by-base addition)
- Cleave and purify oligos
- Assemble fragments into full-length gene (e.g., Gibson Assembly)
- Clone into plasmid backbone
- Sequence-verify construct
Limitations:
Length limits: Direct chemical synthesis is reliable only for short fragments, so longer genes require assembly. Base errors also occur, so the construct needs sequencing validation. Large gene clusters can be expensive and slow to synthesize.
✏️ 5.3 What DNA would you want to edit and why? What technologies?
I would edit the genomes of photosynthetic microorganisms such as algae to improve their efficiency in converting light energy into chemical fuels. I could target genes involved in photosystem efficiency, carbon fixation pathways, and hydrogen production.
Photosynthesis is essentially a natural solar energy conversion system, but it is quite inefficient. We could modify regulatory genes to reduce energy losses or redirect metabolic pathways toward hydrogen or biofuel production, so we could have biological systems that convert sunlight into storable chemical energy more efficiently.
I am interested as it connects directly to large-scale energy systems and treating living cells as programmable energy conversion platforms, similar to designing more efficient reactors or turbines.
We could use CRISPR-Cas12a for genome editing in cyanobacteria.
How does it edit DNA?
- Design guide RNAs targeting specific genes.
- Deliver Cas12a and guide RNAs into the cells.
- Cas12a cuts the DNA at precise locations.
- The cell repairs the cut using a donor DNA template to insert the optimized sequence.
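The guide-design step can be sketched as a scan for candidate target sites. Cas12a is commonly described as recognizing a T-rich PAM (TTTV, where V is A, C, or G) immediately 5' of the target. The 20-nt spacer length, the function name, and the toy sequence are illustrative assumptions, and only the given strand is scanned.

```python
import re

def find_cas12a_sites(seq, spacer_len=20):
    """Return (position, PAM, spacer) for each TTTV PAM followed by a full spacer."""
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported
    pattern = re.compile(r"(?=(TTT[ACG])([ACGT]{%d}))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Toy sequence: 2 nt of flank, one PAM, a 20-nt target, 2 nt of flank
toy = "GG" + "TTTA" + "CCGGAATTCCGGAATTCCGG" + "AA"
for pos, pam, spacer in find_cas12a_sites(toy):
    print(pos, pam, spacer)
```

A real design would also scan the reverse complement and score each candidate for off-target matches, which ties into the limitations listed below.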
Design:
- Identify metabolic bottlenecks in photosynthesis or fuel production.
- Design guide RNAs.
- Design donor DNA templates if inserting new sequences.
Inputs:
- Cas enzyme
- Guide RNAs
- Donor DNA (if needed)
- Host cells (e.g., cyanobacteria)
Limitations
- Off-target edits may occur.
- Large pathway rewiring is complex.
- Efficiency gains may be modest due to thermodynamic constraints.
Week 3 HW: OpenTrons and Python
OpenTrons, Python and Hypotrochoid Patterns 🧪
We learned how to use the Opentrons Python API to write a protocol, essentially a set of instructions that controls the robot’s pipettes. Instead of manually pipetting, we defined coordinates, volumes, and movement steps in code so the robot could deposit liquid precisely into specific wells to create a defined pattern.
Also we could simulate the protocol before running it on the actual robot. This let us preview how the design would look, check for mistakes, and adjust the pattern in software first.
1. Importing the tools we need 🧰
from opentrons import types
import math
This protocol relies on two key libraries.
The math module provides the trigonometric functions ( sin , cos , pi ) needed to compute the hypotrochoid curve.
The Opentrons types module allows us to describe 3-dimensional positions on the robot deck. In particular, we use types.Point() to move the pipette relative to a reference point on the agar plate.
Together these allow us to convert mathematical coordinates into physical robot movements.
2. Protocol metadata 📋
metadata = {
    'protocolName': 'HTGAA_SAMI',
    'author': 'Sami',
    'description': 'Hypotrochoid loops',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}
Every Opentrons protocol contains metadata describing the experiment.
This includes:
• the name of the protocol
• the author
• a description of the experiment
• the API version
The API level is particularly important because it determines which robot commands are available.
3. Defining the robot deck layout 🧭
TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'
These constants define where each piece of labware sits on the robot deck.
For this experiment we use:
• a 20µL tip rack
• a temperature-controlled plate containing colored liquids
• an agar plate where the design will be drawn
Separating these as constants makes the protocol easier to modify if the deck layout changes.
4. Mapping colors to wells 🎨
well_colors = {
    'A1': 'Red',
    'B1': 'Yellow',
    'C1': 'Green',
    'D1': 'Cyan',
    'E1': 'Blue'
}
Each colored dye is stored in a specific well on the cold plate.
This dictionary creates a simple mapping between well locations and color names so that later in the protocol we can refer to colors directly (e.g., “blue”) rather than remembering the exact well coordinates.
5. Initializing the robot and loading labware 🤖
tips_20ul = protocol.load_labware(
    'opentrons_96_tiprack_20ul',
    TIP_RACK_DECK_SLOT
)
pipette_20ul = protocol.load_instrument(
    "p20_single_gen2",
    "right",
    [tips_20ul]
)
Inside the run() function, the robot is configured by loading labware and instruments.
Here we load:
• a 20 µL tip rack
• a P20 single-channel pipette
The pipette is mounted on the robot’s right arm and is linked to the tip rack so the robot knows where to pick up tips.
6. Finding color locations automatically 🔎
def location_of_color(color_string):
    for well, color in well_colors.items():
        if color.lower() == color_string.lower():
            return color_plate[well]
Instead of hardcoding well positions throughout the code, this helper function allows us to request colors by name.
For example:
location_of_color("blue")
The function searches the well_colors dictionary and returns the corresponding well location on the plate.
This keeps the protocol clean and readable.
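One caveat is that a lookup like this silently returns None for a misspelled color. A stricter variant might look like the sketch below. The function name and the plain dictionary standing in for the loaded color plate are hypothetical, so the snippet runs without a robot.

```python
well_colors = {"A1": "Red", "B1": "Yellow", "C1": "Green", "D1": "Cyan", "E1": "Blue"}

# Hypothetical stand-in for the loaded labware: well name -> placeholder location
color_plate = {well: f"<well {well}>" for well in well_colors}

def location_of_color_strict(color_string):
    """Case-insensitive lookup that fails loudly on an unknown color."""
    wanted = color_string.strip().lower()
    for well, color in well_colors.items():
        if color.lower() == wanted:
            return color_plate[well]
    known = ", ".join(sorted(well_colors.values()))
    raise ValueError(f"Unknown color {color_string!r}; expected one of: {known}")

print(location_of_color_strict("blue"))
```

Failing early is useful here because a None location would otherwise only surface later, when the pipette is asked to aspirate.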
7. Calculating hypotrochoid curves 🧮
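def hypotrochoid_points(R_mm, r_mm, d_mm, n_steps, n_turns):
    # Assumed sampling: n_steps evenly spaced values of t over n_turns revolutions
    k = (R_mm - r_mm) / r_mm
    ts = [2 * math.pi * n_turns * i / n_steps for i in range(n_steps + 1)]
    return [((R_mm - r_mm) * math.cos(t) + d_mm * math.cos(k * t),
             (R_mm - r_mm) * math.sin(t) - d_mm * math.sin(k * t)) for t in ts]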
The core of the design is the hypotrochoid equation, the same mathematical curve used in spirograph toys.
A hypotrochoid describes the path traced by a point on a circle rolling inside a larger circle.
The parameters control the shape:
• R – radius of the large circle
• r – radius of the rolling circle
• d – distance of the pen from the rolling circle center
The function evaluates these equations at many values of t to generate a list of (x, y) points representing the curve.
These coordinates later become robot movement instructions.
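A practical question when sampling t is how many revolutions are needed before the curve closes on itself. For integer radii the curve repeats after r / gcd(R, r) revolutions, which the sketch below checks numerically. The helper names and the example values of R, r, and d are hypothetical and are not the ones used on the plate.

```python
import math

def turns_to_close(R, r):
    """Revolutions of t after which the curve repeats (integer radii assumed)."""
    return r // math.gcd(R, r)

def hypotrochoid_xy(R, r, d, t):
    """Evaluate the hypotrochoid equations at a single angle t."""
    k = (R - r) / r
    return ((R - r) * math.cos(t) + d * math.cos(k * t),
            (R - r) * math.sin(t) - d * math.sin(k * t))

R, r, d = 5, 3, 5   # hypothetical example values
n_turns = turns_to_close(R, r)
start = hypotrochoid_xy(R, r, d, 0.0)
after_one = hypotrochoid_xy(R, r, d, 2 * math.pi)
end = hypotrochoid_xy(R, r, d, 2 * math.pi * n_turns)
print(n_turns, start, end)
```

Choosing the number of turns this way avoids both an unfinished pattern and wasted droplets retracing a path that has already been drawn.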
8. Transforming the curve 🔄
Rotation
def rotate_points(pts, deg):
    th = math.radians(deg)
    c, s = math.cos(th), math.sin(th)
    return [(x*c - y*s, x*s + y*c) for x, y in pts]
Scaling
def scale_points(pts, scale):
    return [(x * scale, y * scale) for x, y in pts]
This shrinks or expands the pattern.
By applying these transformations we can create multiple interwoven layers of the same curve.
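A small self-contained check of the two transforms, assuming both act about the origin (the plate center). It shows that uniform scaling and rotation can be applied in either order with the same result. The two test points are arbitrary.

```python
import math

def rotate_points(pts, deg):
    """Rotate points counter-clockwise about the origin by deg degrees."""
    th = math.radians(deg)
    c, s = math.cos(th), math.sin(th)
    return [(x * c - y * s, x * s + y * c) for x, y in pts]

def scale_points(pts, scale):
    """Scale points uniformly about the origin."""
    return [(x * scale, y * scale) for x, y in pts]

pts = [(1.0, 0.0), (0.0, 1.0)]
scaled_then_rotated = rotate_points(scale_points(pts, 0.5), 90)
rotated_then_scaled = scale_points(rotate_points(pts, 90), 0.5)
print(scaled_then_rotated)
print(rotated_then_scaled)
```

Because the order does not matter for these two operations, each layer can be described by just an angle and a scale factor.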
9. Converting curve points into droplets 💧
loc = center_location.move(types.Point(x, y))
dispense_and_detach(pipette_20ul, drop_ul, loc)
Each (x, y) coordinate is translated into a physical position on the agar plate relative to the plate center.
The robot then:
- moves above the point
- dispenses a tiny droplet
- lifts the pipette slightly to detach the drop
This produces a sequence of small droplets that trace the mathematical curve.
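Before dispensing, a densely sampled curve usually needs thinning so that neighboring droplets do not merge, and it is worth checking that every point stays on the agar. The sketch below shows one simple way to do both. The helper names and the 35 mm safe radius are hypothetical and not taken from the original protocol.

```python
import math

def thin_by_spacing(pts, step_mm):
    """Keep a point only if it is at least step_mm from the last kept point."""
    if not pts:
        return []
    kept = [pts[0]]
    for x, y in pts[1:]:
        last_x, last_y = kept[-1]
        if math.hypot(x - last_x, y - last_y) >= step_mm:
            kept.append((x, y))
    return kept

def fits_on_plate(pts, safe_radius_mm):
    """True if every point lies within safe_radius_mm of the plate center."""
    return all(math.hypot(x, y) <= safe_radius_mm for x, y in pts)

dense = [(float(i), 0.0) for i in range(6)]   # points 1 mm apart along x
drops = thin_by_spacing(dense, step_mm=2.5)
print(drops)
print(fits_on_plate(drops, safe_radius_mm=35.0))
```

This greedy filter measures straight-line distance between kept points, so it works on any sampled curve without needing the curve's arc length.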
10. Creating layered designs 🧵
layers = [
    # (color, rotation in degrees, scale, droplet volume in µL, spacing in mm)
    ('cyan', 0, 1.00, 0.2, 2.5),
    ('blue', 18, 1.00, 0.2, 2.5),
    ('green', 36, 0.985, 0.2, 2.5),
    ('yellow', 54, 0.97, 0.2, 2.5),
]
Instead of drawing a single curve, the protocol draws multiple layers.
Each layer specifies:
• a color
• a rotation angle
• a scale factor
• droplet size
• spacing between droplets
By rotating and slightly scaling each layer, the curves weave together into a complex multi-color pattern.
11. Drawing the pattern ✏️
for color, rot_deg, scl, dot_ul, step_mm in layers:
    pts = scale_points(base_pts, scl)
    pts = rotate_points(pts, rot_deg)
    dispense_path(color, pts)
For each layer the protocol:
- scales the base hypotrochoid
- rotates it
- sends the points to the dispensing routine
The robot then physically draws the pattern on the agar plate.
12. Adding a final decorative ring ✨
ring_pts = []
for i in range(80):
    t = 2 * math.pi * i / 80
    ring_pts.append((ring_r * math.cos(t), ring_r * math.sin(t)))
Finally, a small circular ring of yellow droplets is added at the center of the design.
This creates a visual “sparkle” effect and highlights the symmetry of the pattern.
Week 4 HW: Protein Design I
🔵 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Protein in 500 g of meat:
100 g → 26 g protein
500 g → 130 g protein
Mass of one amino acid:
1 Dalton = 1.66 × 10⁻²⁴ g
Average amino acid ≈ 100 Da
→ 100 × 1.66 × 10⁻²⁴ = 1.66 × 10⁻²² g
Number of amino acid molecules:
130 g ÷ (1.66 × 10⁻²² g) ≈ 7.83 × 10²³ molecules
Convert to moles using Avogadro’s number:
(7.83 × 10²³) ÷ (6.022 × 10²³) ≈ 1.30 mol
Final answer:
≈ 7.8 × 10²³ amino acid molecules
≈ 1.3 mol of amino acids
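The same arithmetic as a short script, using only the figures stated above (26 g protein per 100 g of meat, 100 Da per amino acid). The variable names are illustrative.

```python
DALTON_IN_GRAMS = 1.66e-24    # mass of 1 Da in grams
AVOGADRO = 6.022e23           # molecules per mole

meat_g = 500
protein_g = meat_g * 26 / 100             # 26 g protein per 100 g meat
amino_acid_g = 100 * DALTON_IN_GRAMS      # average amino acid of about 100 Da

n_molecules = protein_g / amino_acid_g
n_moles = n_molecules / AVOGADRO

print(f"{n_molecules:.2e} amino acid molecules")
print(f"{n_moles:.2f} mol of amino acids")
```

As a cross-check, 100 Da corresponds to about 100 g/mol, so 130 g of protein divided by 100 g/mol gives the same 1.3 mol directly.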
🔵 2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
When we eat beef or fish, the body breaks it down into basic building blocks like amino acids, so at that point it is no longer “cow” or “fish” but just raw materials that we use to build our own cells and proteins; the DNA in food is also broken down and cannot function in our bodies, and since our cells only follow human DNA instructions, we are simply using the materials rather than becoming what we eat.
🔵 3. Why are there only 20 natural amino acids?
It is not fully understood why there are only 20 natural amino acids. One idea, proposed by Francis Crick, is the frozen accident theory, which suggests that the genetic code is not perfectly optimized but instead came from an early, somewhat arbitrary setup that later became fixed. In that sense, the 20 amino acids we see today may have just been what happened to get locked in at the start of life. At the same time, studies suggest these amino acids cover a good spread of chemical properties—like charge, polarity, hydrophobicity, and size—so they are diverse enough to build a wide range of protein structures.
🔵 4. Can you make other non-natural amino acids? Design some new amino acids.
Yes, you can create non-natural amino acids. A well-known example is the work of Floyd E. Romesberg, particularly the paper A semi-synthetic organism with an expanded genetic alphabet (Nature, 2014), which demonstrated that the genetic alphabet can be expanded by introducing an unnatural base pair. In principle, this allows cells to encode and incorporate non-natural amino acids into proteins by creating new codons.
🔵 5. Where did amino acids come from before enzymes that make them, and before life started?
Amino acids likely formed before life through simple chemistry on early Earth rather than through enzymes. A classic example is the experiment by Stanley Miller, who showed in his 1953 Science paper that if you simulate early Earth conditions (basic gases plus an energy source like lightning), amino acids can form spontaneously. This lines up with ideas going back to Charles Darwin, who speculated that life might have first emerged in a warm little pond with the right chemicals and energy. So the building blocks of life can arise from pretty simple ingredients without any biology involved, and interestingly, even now we still haven’t been able to create life itself from scratch.
🔵 6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Because D-amino acids are mirror images of L-amino acids, they naturally form the opposite helix. So while normal L amino acids form right-handed α-helices, D-amino acids form left-handed ones.
🔵 7. Can you discover additional helices in proteins?
Yes, proteins can form additional types of helices beyond the standard α-helix. These include 3₁₀-helices and π-helices, which differ in how tightly they coil and in their hydrogen bonding patterns.
🔵 8. Why are most molecular helices right-handed?
Most molecular helices are right-handed because biological building blocks are chiral with L-amino acids and D-sugars favoring right-handed structures that minimize steric clashes and optimize hydrogen bonding. Exceptions like Z-DNA exist but are less common and form under specific conditions.
🔵 9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?
This happens because the edges of β-sheets can easily form hydrogen bonds with other strands, and many of the side chains involved are hydrophobic, so they cluster together to avoid water. The main driving force is therefore hydrophobic interactions, along with additional stabilization from hydrogen bonding between sheets.
🔵 10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
Amyloid diseases form β-sheets because they are very stable, so misfolded proteins stack into insoluble “cross-β” fibrils that build up over time, as seen in Alzheimer’s Disease, Type 2 Diabetes, and Creutzfeldt–Jakob disease; that same stability also makes these structures useful as materials like nanofibers and scaffolds.
🧪 Part B: Protein Analysis and Visualization
🔵 1. Briefly describe the protein you selected and why you selected it.
I selected green fluorescent protein (GFP) because it is a well-known protein that clearly links structure to function. GFP naturally fluoresces due to a chromophore formed within its folded structure, which makes it widely used to track gene expression and protein location in cells. I also chose it because I’ve enjoyed working with fluorescent systems in biology so far, like in the Opentrons lab, so it feels familiar and intuitive while still being a powerful example of how protein structure leads to function.
🔵 2. Identify the amino acid sequence of your protein.
From NCBI, I obtained the amino acid sequence of Aequorea victoria green fluorescent protein:
https://www.ncbi.nlm.nih.gov/nuccore/L29345.1
MSKGEELFTGVVPILVELDGDVNGQKFSVSGEGEGDATYGKLTL
KFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKD
DGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKMEYNYNSHNVYIMADKPKNG
IKVNFKIRHNIKDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHM
ILLEFVTAAGITHGMDELYK
Results for the GFP sequence above:
Length: 238 amino acids
Most frequent: G (22 times, 9.2%)
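The length and most-frequent-residue figures above can be recomputed with a few lines of Python. This is a minimal sketch rather than the tool actually used for the homework; the function name `composition` is hypothetical, and the example runs on only the first 20 residues of the GFP sequence to keep it short.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Return the length and the most frequent residue of a protein sequence."""
    seq = "".join(seq.split()).upper()  # drop spaces and line breaks from pasted text
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    return {
        "length": len(seq),
        "residue": residue,
        "count": n,
        "percent": round(100 * n / len(seq), 1),
    }


# Illustration only: the first 20 residues of the GFP sequence
fragment = "MSKGEELFTG VVPILVELDG"
print(composition(fragment))
```

Pasting the full sequence into `composition` in place of `fragment` gives the whole-protein numbers.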
How many protein sequence homologs are there for your protein?
Uniprot id: P42212 - 205 hits found
Does your protein belong to any protein family?
GFP belongs to the green fluorescent protein (GFP) family. This family includes a range of fluorescent proteins found in organisms like jellyfish and corals, all of which share a similar β-barrel structure and fluorescent chromophore but can emit different colors (e.g. green, blue, cyan, yellow, red).
🔵 Structure Analysis
https://www.rcsb.org/3d-view/1EMA - 1EMA structure for GFP
The GFP structure (PDB ID: 1EMA) was solved by Ormö, Remington and colleagues in 1996 (deposited on August 1 and released on November 8). It was determined by X-ray diffraction at a resolution of 1.90 Å, indicating a high-quality structure with well-resolved atomic positions. In addition to the protein, the structure includes water molecules and the chromophore (listed as a non-standard residue). Structurally, GFP has an all-β, β-barrel fold, commonly referred to as the GFP-like fold in classification systems such as SCOP.
🔵 3D Visualization
dss
color red, ss h
color yellow, ss s
color green, ss l
From the structure, we can infer that hydrophobic residues are predominantly located in the interior of the protein, forming a stable core, while hydrophilic residues are mainly exposed on the surface where they can interact with the surrounding aqueous environment. This distribution is consistent with typical protein folding and helps stabilize the β-barrel structure of GFP.
The protein appears mostly smooth and compact with no large exposed binding pockets on the exterior. There are only small surface indentations, but no obvious deep cavities. This suggests that GFP does not have a typical surface binding site; instead, its main “hole” is an internal cavity within the β-barrel, which is not visible from the outside surface.
🤖 Part C: Using ML-Based Protein Design Tools
At a high level, we are using a pretrained machine learning model to learn patterns from large numbers of protein sequences and then apply that knowledge to analyze a specific protein. In the deep mutational scan, we systematically mutate each position in the protein and use the model to estimate how likely or tolerated each mutation is, which helps identify important versus flexible regions of the protein. In latent space analysis, we convert entire protein sequences into vector embeddings and visualize them in a reduced-dimensional space, where proteins with similar structure or function cluster together. Together, these approaches let us explore both how individual mutations affect a protein and how whole proteins relate to each other, without directly simulating their physical behavior.
🧪 Deep Mutational Scans
The mutation scan heatmap shows how each possible amino acid substitution affects every position in the protein. The x-axis represents the position along the protein sequence (from residue 1 to ~238 for GFP), and the y-axis represents the 20 possible amino acids that could be substituted at each position. Each cell in the heatmap corresponds to a specific mutation (e.g. position i mutated to amino acid j), and the color indicates the model’s score or likelihood for that mutation: brighter colors (yellow/green) indicate mutations that are more likely or tolerated, while darker colors (blue/purple) indicate mutations that are unlikely and likely destabilizing.
By looking vertically at a single column (one position), we can see how sensitive that position is to mutation. Columns that are mostly dark suggest highly conserved, functionally or structurally critical residues, whereas columns with many lighter colors indicate positions that are more flexible and tolerant to change. Patterns across the heatmap therefore reveal which regions of the protein are constrained (e.g. core or active regions) versus more variable (e.g. surface or loop regions), giving insight into the protein’s stability and function.
For example, at a position in the core of the protein (e.g. around residue ~65, near the chromophore region in GFP), most substitutions are dark (low likelihood), but mutations to similar amino acids (e.g. hydrophobic → hydrophobic) may be slightly less penalized. This suggests that the residue is highly conserved and structurally important, and changing it disrupts the local environment required for stability or function. In contrast, substituting with a chemically similar residue is less disruptive, which is why those mutations appear slightly more tolerated.
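To make the heatmap layout concrete, here is a small sketch of how such a grid can be built once a model has produced per-position log-probabilities. It assumes the score for each cell is a log-likelihood ratio (mutant minus wild type), which is one common choice and not necessarily what the course notebook computes. `fake_log_probs` is a hypothetical stand-in for real language-model output, so the numbers themselves carry no biological meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def llr_heatmap(log_probs, wild_type):
    """Rows = substituted amino acid, columns = position.

    Each cell holds log p(mutant) - log p(wild type) at that position,
    so wild-type cells are exactly 0 and disfavoured mutations are negative.
    """
    heatmap = [[0.0] * len(wild_type) for _ in AMINO_ACIDS]
    for i, wt in enumerate(wild_type):
        wt_lp = log_probs[i][AMINO_ACIDS.index(wt)]
        for a in range(len(AMINO_ACIDS)):
            heatmap[a][i] = log_probs[i][a] - wt_lp
    return heatmap


def fake_log_probs(length, seed=0):
    """Hypothetical placeholder for model output: random normalised log-probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(length):
        logits = [rng.gauss(0, 1) for _ in AMINO_ACIDS]
        norm = math.log(sum(math.exp(x) for x in logits))
        rows.append([x - norm for x in logits])
    return rows


wild_type = "MSKGEELFTG"  # first 10 residues of GFP, for illustration
heatmap = llr_heatmap(fake_log_probs(len(wild_type)), wild_type)

# Column means summarise tolerance: strongly negative = constrained position
column_means = [
    sum(row[i] for row in heatmap) / len(AMINO_ACIDS) for i in range(len(wild_type))
]
```

Reading down one column of `heatmap` corresponds to reading one position of the plot, and `column_means` is a simple way to flag the mostly dark columns.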
🌌 Latent Space Analysis
Each point represents a protein, and proximity reflects similarity in learned sequence features. After placing GFP into this space, its nearest neighbor was another GFP sequence, confirming the model correctly captures sequence similarity. However, other nearby proteins were functionally different and relatively distant, suggesting that GFP is somewhat isolated in this dataset due to a lack of closely related sequences. This indicates that while the latent space captures meaningful relationships, the dataset composition strongly influences the observed neighborhoods.
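The nearest-neighbor lookup described here can be sketched as a cosine-similarity search over embedding vectors. The three-dimensional vectors and protein names below are made up for illustration; real embeddings have hundreds of dimensions, and the notebook may use a different distance measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, embeddings, k=2):
    """Return the names of the k embeddings most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings
embeddings = {
    "GFP_variant": [0.9, 0.1, 0.0],
    "kinase": [0.1, 0.9, 0.2],
    "protease": [0.0, 0.3, 0.9],
}
gfp = [1.0, 0.0, 0.0]
print(nearest_neighbors(gfp, embeddings, k=2))
```

In this toy dataset the second neighbor is far less similar than the first, which mirrors the observation that GFP sits in a sparse region when few related sequences are present.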
🧩 Protein Folding
Protein folding is important because a protein’s function is determined by its three-dimensional structure rather than just its amino acid sequence. The way a protein folds defines its active sites, binding interactions, and overall stability, which in turn controls how it behaves in a biological system. Misfolding can lead to loss of function or disease, while correct folding enables proteins to carry out roles such as catalysis, signaling, and structural support. Being able to understand and predict how a sequence folds therefore allows us to infer function, study the effects of mutations, and design new proteins or therapeutics without relying solely on experimental methods.

In this task, we used ESMFold to predict protein structure directly from sequence. The model is first pretrained as a protein language model (ESM-2), learning patterns from large datasets of sequences, and then passes this information into a folding module that outputs predicted 3D coordinates. In the diagram, the sequence is encoded into embeddings, processed through a series of network blocks, and iteratively refined to produce a final structure along with a confidence estimate. We then compare the predicted structure to known experimental structures and test how mutations affect folding, allowing us to explore how robust the protein’s structure is to changes in its sequence.
🧬 Protein Generation – Inverse Folding
Inverse protein folding is the process of starting with a desired 3D protein structure (its backbone shape) and designing an amino acid sequence that will fold into that structure. Instead of predicting structure from a sequence, you reverse the problem: given a fixed geometry, a model like ProteinMPNN selects residues that fit spatially, stabilize interactions, and satisfy physical constraints. Because many different sequences can produce the same structure, the goal is to find one (or several) that make the structure energetically stable. The designed sequence is then typically validated by folding it again with a model like ESMFold and checking whether it reproduces the original structure.
Week 5 HW: Protein Design II
🧬 Part 1: Generate Binders with PepMLM
Human SOD1 Sequence:
https://www.uniprot.org/uniprotkb/P00441/entry
https://www.uniprot.org/uniprotkb/P00441/entry#sequences
SOD1 sequence
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLS
RKHGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLGDHCIIGRTLVVHEKADDLGKGGNEESTKT
GNAGSRLACGVIGIAQ
SOD1 sequence with A4V mutation
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLS
RKHGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLGDHCIIGRTLVVHEKADDLGKGGNEESTKT
GNAGSRLACGVIGIAQ
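A small helper makes the A4V substitution explicit. Note that in the sequences above the changed residue is the fifth character, because the conventional SOD1 numbering does not count the initiator methionine; the `offset` parameter below is a hypothetical way to account for that. The function name `apply_mutation` is also just an illustration.

```python
import re


def apply_mutation(seq: str, mutation: str, offset: int = 0) -> str:
    """Apply a point mutation such as 'A4V' to a protein sequence.

    offset shifts the numbering, e.g. offset=1 when the mutation is numbered
    without the initiator Met but the sequence includes it.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognised mutation format: {mutation}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1 + offset
    if seq[index] != wild_type:
        raise ValueError(f"Expected {wild_type} at index {index}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


sod1_start = "MATKAVCVLKGDG"  # start of the SOD1 sequence above
print(apply_mutation(sod1_start, "A4V", offset=1))
```

The wild-type check catches numbering mistakes: calling the function with `offset=0` raises an error because residue 4 of the Met-containing sequence is K, not A.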
Here is a table with the binders ranked and compared against a known binder:
| Rank | Peptide Source | Sequence | Pseudo Perplexity |
|---|---|---|---|
| 1 | Reference (Experimental) | FLYRWLPSRRGG | 2.2833 |
| 2 | PepMLM (Candidate 0) | KLVPAVVLAHKX | 7.4714 |
| 3 | PepMLM (Candidate 1) | KRSYPTALRHWX | 10.1367 |
| 4 | PepMLM (Candidate 2) | WRYPVAABHGK | 11.0383 |
| 5 | PepMLM (Candidate 3) | WHVYVVGLRHKE | 25.8914 |
Perplexity measures how “surprised” a model is by a sequence, so a lower score means the model considers the peptide more plausible as a binder for the target; it reflects model confidence rather than a direct measurement of affinity. Here, the known binder FLYRWLPSRRGG acts as a benchmark, scoring 2.28 in pseudo-perplexity, which is well below the scores of the newly generated designs. The binders in the table are ranked from lowest to highest pseudo-perplexity.
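For intuition, pseudo-perplexity is the exponential of the average negative log-probability that the model assigns to each residue when that residue is masked. The sketch below shows only this final arithmetic; the per-residue log-probabilities would come from the masked language model, and the values used here are invented.

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp of the mean negative log-probability over all masked positions."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical 12-residue peptide where the model gives every true residue p = 0.5
uniform = [math.log(0.5)] * 12
print(pseudo_perplexity(uniform))  # close to 2: like a coin flip at each position
```

A score near 2.28, as for the reference peptide, therefore means the model is on average choosing between roughly two equally likely residues at each position, whereas a score near 26 corresponds to much greater uncertainty.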
🔬 Part 2: Evaluate Binders with AlphaFold3
| Rank | Job Name | ipTM | pTM | Primary Binding Location | Target Engagement |
|---|---|---|---|---|---|
| 1 | SOD1 and KLVPAVVLAHK | 0.58 | 0.82 | N-terminus Groove | High (Pocket) |
| 2 | SOD1 and WHVYVVGLRHKE | 0.49 | 0.81 | Upper β-barrel Ridge | Moderate (Surface) |
| 3 | SOD1 and KRSYPTALRHW | 0.44 | 0.90 | β-barrel Loops | Moderate (Surface) |
| 4 | SOD1 and WRYPVAABHGK | 0.39 | 0.83 | Lower Dimer Interface | Low/Mod (Surface) |
| 5 | SOD1 and FLYRWLPSRRGG (Ref) | 0.26 | 0.81 | Surface Loops | Low (Transient) |
Key
| Confidence Level | pLDDT Range | Corresponding Color |
|---|---|---|
| Very High | pLDDT > 90 | Dark Blue |
| Confident | 90 > pLDDT > 70 | Light Blue (Cyan) |
| Low | 70 > pLDDT > 50 | Yellow |
| Very Low | pLDDT < 50 | Orange |
Protein-peptide complex Models using AlphaFold3 and Residue Alignment Charts (Green)
They are ordered by ipTM score, from the highest (KLVPAVVLAHK, 0.58) to the lowest (FLYRWLPSRRGG, 0.26).
AlphaFold 3 modelling supported the binding potential of the peptides generated using PepMLM. Notably, all four model-generated peptides outperformed the experimental reference peptide, FLYRWLPSRRGG, in terms of ipTM (interface confidence), despite the reference having the lowest pseudo-perplexity score.
Candidate 0, KLVPAVVLAHK, achieved the highest ipTM score of 0.58 and also had the lowest pseudo-perplexity among the generated candidates (7.4714). Its ipTM score suggests a relatively confident predicted interface, with the peptide docking deep within the N-terminal groove of SOD1, near the ALS-associated A4V mutation site. In contrast, the remaining peptides displayed differing binding preferences across the β-barrel region and dimer interface.
The second strongest binder was Candidate 3, WHVYVVGLRHKE, with an ipTM score of 0.49. Interestingly, this peptide also had the highest pseudo-perplexity score at 25.8914, indicating that although it demonstrates favourable binding to mutant SOD1, its sequence is less likely to occur naturally compared with the other generated candidates.
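The disagreement between the two rankings is easy to see when both are computed side by side. The sketch below simply re-sorts the values from the two tables above; the dictionary layout is an arbitrary choice for illustration.

```python
# Values copied from the PepMLM and AlphaFold3 tables above
peptides = {
    "Reference": {"ppl": 2.2833, "iptm": 0.26},
    "Candidate 0": {"ppl": 7.4714, "iptm": 0.58},
    "Candidate 1": {"ppl": 10.1367, "iptm": 0.44},
    "Candidate 2": {"ppl": 11.0383, "iptm": 0.39},
    "Candidate 3": {"ppl": 25.8914, "iptm": 0.49},
}

# Lower pseudo-perplexity is better; higher ipTM is better
by_ppl = sorted(peptides, key=lambda name: peptides[name]["ppl"])
by_iptm = sorted(peptides, key=lambda name: peptides[name]["iptm"], reverse=True)

for name in peptides:
    print(f"{name}: perplexity rank {by_ppl.index(name) + 1}, "
          f"ipTM rank {by_iptm.index(name) + 1}")
```

The reference peptide moves from first place by pseudo-perplexity to last place by ipTM, while Candidate 3 moves from last to second, so the two scores clearly capture different things.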
🧪 Part 3: Evaluate Properties of Generated Peptides in PeptiVerse
In the search for peptides capable of stabilizing the SOD1 protein, a major therapeutic target in ALS research, the focus shifts from structural prediction in AlphaFold 3 to therapeutic evaluation in PeptiVerse. While AlphaFold 3 provides insight into the three-dimensional binding structure of a peptide, the 11 profiling metrics generated by PeptiVerse offer a broader assessment of how each candidate may behave in a biological and therapeutic context. Shown below are the results of evaluating the four PepMLM-designed peptide candidates against the established reference binder, FLYRWLPSRRGG, ranked from highest to lowest ipTM score.
| Metric / Property | KLVPAVVLAHK | WHVYVVGLRHKE | KRSYPTALRHW | WRYPVAABHGK | FLYRWLPSRRGG (Ref) |
|---|---|---|---|---|---|
| ipTM (Structural) | 0.58 | 0.49 | 0.44 | 0.39 | 0.26 |
| Solubility | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Permeability | 0.242 | 0.143 | 0.849 | 0.359 | 0.862 |
| Hemolysis | 0.032 | 0.052 | 0.022 | 0.010 | 0.047 |
| Non-Fouling | 0.285 | 0.297 | 0.549 | 0.480 | 0.666 |
| Half-Life (hrs) | 0.438 | 0.412 | 0.342 | 0.339 | 0.310 |
| Binding (pKd) | 5.528 | 5.919 | 5.965 | 5.300 | 5.968 |
| Length (aa) | 11 | 12 | 11 | 11 | 12 |
| Mol. Weight (Da) | 1174.5 | 1522.8 | 1414.6 | 1166.5 | 1507.7 |
| Net Charge (pH 7) | +1.59 | +0.94 | +2.85 | +1.85 | +2.76 |
| Isoelectric Point | 10.00 | 8.60 | 11.00 | 9.99 | 11.71 |
| GRAVY (Hydrophobicity) | 1.02 | -0.38 | -1.44 | -0.73 | -0.71 |
The results revealed an interesting trade-off between structural binding confidence and therapeutic potential.
Although Candidate 0 (KLVPAVVLAHK) achieved the highest ipTM score of 0.58, indicating that AlphaFold 3 predicts a highly confident structural interaction with mutant SOD1, PeptiVerse’s therapeutic profiling identified Candidate 1 (KRSYPTALRHW) as the most promising overall candidate despite its lower ipTM score of 0.44.
This distinction likely arises from the balance between binding performance and drug-like properties. Candidate 1 had the second-lowest pseudo-perplexity among the generated peptides at 10.1367, suggesting that its sequence remains relatively biologically plausible and potentially more nature-like. In addition, it achieved the highest predicted binding affinity of the generated candidates, with a pKd score of 5.965, alongside the strongest permeability score among them of 0.849, indicating an increased likelihood of penetrating cells and reaching intracellular mutant SOD1 targets.
Importantly, Candidate 1 also displayed the highest positive net charge of all tested peptides, including the reference peptide, with a score of +2.85. This characteristic may enhance its ability to cross the blood–brain barrier and interact with the negatively charged aggregates associated with mutant SOD1 pathology.
Taken together, these results suggest that while Candidate 0 demonstrates the strongest predicted structural fit, Candidate 1 offers the most balanced combination of binding capability, permeability, and therapeutic suitability, making it the strongest candidate for further investigation.
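As a sanity check on one of the simpler metrics, the GRAVY row of the table can be recomputed by hand: GRAVY is the mean Kyte-Doolittle hydropathy value over all residues. The sketch below assumes PeptiVerse uses the standard Kyte-Doolittle scale, which is consistent with the table values for the peptides tested here. The non-standard residue B in WRYPVAABHGK has no value in this scale, so that peptide is left out.

```python
# Standard Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"No hydropathy value for: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


for peptide in ["KLVPAVVLAHK", "KRSYPTALRHW", "FLYRWLPSRRGG"]:
    print(peptide, round(gravy(peptide), 2))
```

Positive values indicate an overall hydrophobic peptide, which is why Candidate 0 (rich in V, L and A) stands apart from the other, more polar and charged candidates.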
⚙️ Part 4: Generate Optimized Peptides with moPPIt
| Run | Sequence | Affinity (pKd) | Solubility | Specificity | Motif Score | Hemolysis |
|---|---|---|---|---|---|---|
| #1 | RFKCIVKVMVRR | 8.881 | 0.500 | 0.615 | 0.553 | 0.944 |
| #2 | KRLQLYRKKCAE | 7.193 | 0.750 | 0.737 | 0.634 | 0.964 |
| #3 | QRACDYFRDDED | 7.783 | 0.833 | 0.679 | 0.059 | 0.895 |
| #4 | KEKEGPCWESEK | 7.360 | 0.833 | 0.871 | 0.002 | 0.962 |
The PepMLM-generated peptides primarily emphasize high-confidence structural docking alongside balanced biophysical properties, resulting in a more conservative yet affinity-improving profile relative to the baseline reference. In contrast, the moPPIt-generated peptides explore a broader chemical space and place greater emphasis on targeted binding interactions.
Week 6 HW: Genetic Circuits I
🧬 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion High-Fidelity PCR Master Mix contains several important components needed for accurate DNA amplification during PCR. The main component is Phusion DNA Polymerase, which is a highly accurate and thermostable enzyme that quickly copies DNA while minimizing mistakes. This makes it especially useful for applications such as cloning and DNA sequencing where precision is important.
The mix also contains deoxynucleotide triphosphates (dNTPs), which are the building blocks used to create new DNA strands. In addition, there is an optimized reaction buffer that provides the ideal chemical environment for the polymerase to work efficiently by maintaining the correct pH and ionic strength, while also helping stabilize the enzyme during the high temperatures of PCR.
Another key component is magnesium chloride (MgCl₂). Magnesium ions act as essential cofactors for the polymerase, allowing it to catalyse DNA synthesis by helping form phosphodiester bonds between nucleotides. They also help primers anneal to the template DNA by reducing electrostatic repulsion between the negatively charged DNA strands.
🧪 2. What are some factors that determine primer annealing temperature during PCR?
Some of the main factors that determine primer annealing temperature during PCR include the primer’s melting temperature (Tm), primer length, GC content, primer concentration, and the ionic strength of the reaction buffer. Primers with higher GC content generally require higher annealing temperatures because GC base pairs form three hydrogen bonds compared with two in AT base pairs, making them more stable. Longer primers also tend to have higher melting temperatures. In addition, buffer conditions and salt concentration influence how strongly the primer binds to the template DNA, which can affect the optimal annealing temperature.
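The effect of length and GC content can be illustrated with the Wallace rule, a rough estimate for short primers: Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This is only a back-of-the-envelope sketch; it ignores salt and primer concentration, and for a real Phusion reaction the supplier's Tm calculator should be used instead. The example primer is made up.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


primer = "ATGAGTAAAGGAGAAGAAC"  # hypothetical 19-nt primer
print(f"GC content: {gc_content(primer):.0%}, estimated Tm: {wallace_tm(primer)} °C")
```

Swapping an A or T for a G or C raises the estimate by 2 °C, and each added base raises it by 2 or 4 °C, matching the trends described above.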
🧫 3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
PCR and restriction enzyme digests both create linear DNA fragments, but they do so in very different ways and are used for different purposes. PCR is an additive process that amplifies a specific DNA sequence, essentially acting like a biological photocopier. It uses thermal cycling, DNA polymerase, primers, and dNTPs to generate millions of copies of a target DNA fragment. PCR is most useful when only a very small amount of DNA is available, such as from a cheek swab or ancient DNA, and when a specific gene or sequence needs to be isolated and amplified from an entire genome.
In contrast, a restriction enzyme digest is a subtractive process that cuts DNA at specific recognition sequences using restriction endonucleases, acting like biological scissors. The reaction is usually performed at a constant temperature, around 37°C, and produces multiple DNA fragments of different sizes. Restriction digests are mainly used to manipulate or verify existing DNA, particularly plasmids, such as checking whether a gene has been successfully inserted or cutting plasmids open for cloning and ligation. They were also historically important for genomic mapping techniques like RFLP analysis. Overall, PCR is primarily used for finding and amplifying DNA, whereas restriction enzyme digests are mainly used for cutting, modifying, and analysing DNA.
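The “biological scissors” idea can be sketched as a simple in-silico digest. The example below assumes a linear template and the enzyme EcoRI, which recognises GAATTC and cuts the top strand after the first G; it tracks only the top strand and ignores the sticky-end geometry. The toy sequence is invented.

```python
def digest_linear(seq: str, site: str = "GAATTC", cut_offset: int = 1):
    """Cut a linear sequence at every recognition site.

    cut_offset is the number of bases into the site where the top strand
    is cut (1 for EcoRI: G^AATTC). Returns the top-strand fragments.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


toy = "AAAGAATTCCCCGAATTCTT"  # hypothetical template with two EcoRI sites
fragments = digest_linear(toy)
print(fragments, [len(f) for f in fragments])
```

Two sites in a linear molecule give three fragments, which is the pattern one would then look for as bands on a gel.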
🧠 4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
To ensure that DNA fragments produced through PCR and restriction enzyme digestion are suitable for Gibson cloning, several factors must be considered. First, the DNA should be purified using a PCR clean-up kit to remove residual enzymes, salts, and buffers that could interfere with the Gibson Assembly Master Mix. Gel electrophoresis should then be used to confirm that the correct DNA fragments and gene sizes were successfully generated. In addition, the fragments must contain overlapping regions of at least ~20 base pairs so they can anneal correctly during Gibson assembly. This is usually achieved by designing PCR primers with appropriate overlap tails that match the adjacent DNA fragment or vector sequence.
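Before ordering primers, the overlap requirement can be checked in silico. The sketch below looks for the longest stretch at the 3′ end of one fragment that exactly matches the 5′ end of the next and reports it only if it reaches the minimum length; the 20 bp threshold follows the text above, and the sequences are invented. It compares top strands only, so both fragments must be written in the same orientation.

```python
def shared_overlap(upstream: str, downstream: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `upstream` that equals a prefix of
    `downstream`, or 0 if no such overlap of at least min_len exists."""
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(len(upstream), len(downstream))
    for n in range(longest, max(min_len, 1) - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


overlap = "ACGTACGTTAGCTAGCTAGGCTAAC"  # hypothetical 25-bp homology region
insert = "TTTT" + overlap               # insert ends with the homology
vector = overlap + "GGGG"               # vector begins with the same homology

print(shared_overlap(insert, vector))   # length of the usable overlap
print(shared_overlap(insert, "GGGG" + overlap))  # 0: homology is not at the junction
```

Running the same check on every junction (insert to vector, and vector back to insert for a circular product) confirms that each pair of neighbours can anneal.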
🦠 5. How does the plasmid DNA enter the E. coli cells during transformation?
Plasmid DNA enters E. coli cells during transformation through pores that are temporarily created in the bacterial cell membrane. This can be achieved through heat shock or electroporation. In heat shock transformation, the cells are exposed to a sudden change in temperature, while electroporation uses a brief high-voltage electrical pulse. Both methods disrupt the membrane enough to allow plasmid DNA to diffuse into the cells.
After transformation, the E. coli cells are incubated in a nutrient-rich broth such as LB or SOB at 37°C to allow them to recover, begin multiplying, and express the antibiotic resistance gene carried by the plasmid. The cells are then plated onto agar containing antibiotics, so only bacteria that successfully took up the plasmid survive and form colonies. If the plasmid contains a reporter gene such as GFP, the transformed colonies may also display visible fluorescence or colour after incubation.
⚙️ 6. Describe another assembly method in detail (such as Golden Gate Assembly)
a. Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
b. Model this assembly method with Benchling or Asimov Kernel!
Golden Gate Assembly is a cloning method that allows multiple DNA fragments to be assembled seamlessly in a single reaction using Type IIS restriction enzymes such as BsaI. Unlike standard restriction enzymes, Type IIS enzymes cut outside of their recognition sequence, creating custom 4-base overhangs that determine the exact order in which fragments join together. In a single tube, the DNA fragments, destination vector, restriction enzyme, T4 DNA ligase, and reaction buffer are combined and cycled through alternating temperatures for digestion and ligation. During the reaction, correctly assembled DNA loses the restriction sites, making it resistant to further cutting, while incorrect products continue to be digested. This makes the process highly efficient and scarless, meaning no unwanted sequences are left between assembled fragments. After the reaction is complete, the enzymes are heat-inactivated and the assembled plasmid can be transformed into E. coli for propagation.
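Because BsaI cuts outside its recognition sequence, the overhang is set by whatever bases follow the site. On the top strand BsaI recognises GGTCTC, skips one base, then cuts, leaving the next four bases as the overhang. The sketch below extracts those four bases for each forward-oriented site; it does not scan for reverse-oriented sites (GAGACC), and the example sequence is invented.

```python
def bsai_overhangs(seq: str):
    """Return the 4-nt overhang generated at each top-strand BsaI site.

    Pattern: GGTCTC N ^ NNNN  (cut 1 nt after the site; the next 4 nt
    form the single-stranded overhang).
    """
    site = "GGTCTC"
    seq = seq.upper()
    overhangs = []
    i = seq.find(site)
    while i != -1:
        start = i + len(site) + 1  # skip the single spacer base
        if start + 4 <= len(seq):
            overhangs.append(seq[start:start + 4])
        i = seq.find(site, i + 1)
    return overhangs


part = "GGTCTCAAATGCCC"  # hypothetical part with one BsaI site
print(bsai_overhangs(part))
```

Designing each junction with a unique four-base overhang is what fixes the order of the fragments, so a quick check that no overhang is repeated across parts is a useful design step.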
Week 7 HW: Genetic Circuits II
🧠 1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits based on Boolean logic work in a binary way, where genes are basically either on or off. In contrast, IANNs use analog signalling, meaning they can process information in a more continuous and brain-like way. Instead of just sensing whether a signal is there or not, they can also respond to how strong the signal is, which is important because biological systems are noisy and constantly changing.
One major advantage of IANNs is that they allow much finer control over gene expression instead of relying on strict thresholds. They are also more robust to stochastic biological noise, making them better suited to real cellular environments. Unlike simple AND/OR logic gates, IANNs can integrate and weight multiple inputs at the same time, similar to artificial neural networks, allowing for much more complex decision-making. They can also perform these functions with fewer genetic components, which reduces metabolic burden on the cell. Overall, IANNs are more flexible, scalable, and capable of handling complex biological tasks than traditional Boolean genetic circuits.
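The contrast between Boolean and analog processing can be shown with a toy comparison. The sketch below models a Boolean AND gate with a fixed threshold and a single analog “neuron” as a weighted sum passed through a sigmoid. The weights, bias and threshold are arbitrary illustrative numbers, not parameters from any real genetic circuit.

```python
import math


def boolean_and(x1: float, x2: float, threshold: float = 0.5) -> int:
    """Digital gate: output is 1 only if both inputs exceed the threshold."""
    return int(x1 > threshold and x2 > threshold)


def analog_perceptron(inputs, weights, bias: float) -> float:
    """Analog node: weighted sum of inputs squashed to the range 0..1."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


weights, bias = [4.0, 2.0], -3.0  # hypothetical: input 1 counts twice as much
for x1, x2 in [(0.4, 0.9), (0.6, 0.6), (0.9, 0.9)]:
    print((x1, x2), boolean_and(x1, x2),
          round(analog_perceptron([x1, x2], weights, bias), 2))
```

The Boolean gate discards how far an input is above or below the threshold, while the perceptron output changes smoothly with input strength and lets one input matter more than the other.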
🧬 2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
One useful application of an IANN would be cancer detection and targeted therapy. In the lecture, Ron Weiss discussed using around 10 different biomarkers to identify whether a cell is cancerous, which I found incredibly inspiring because it highlights how future cancer therapies could become far more precise and intelligent. Detecting cancer often depends on recognising complex patterns between many biomarkers rather than relying on a single signal, and this is where IANNs become especially powerful. They could process combinations of RNA expression, protein levels, mutations, and metabolic signals simultaneously to identify more nuanced cancer signatures that traditional genetic circuits might miss.
The output of the system could then trigger a therapeutic response only when the overall cellular profile strongly matches a cancerous state. For example, the circuit could activate apoptosis-inducing genes, release immune-signalling molecules, or express fluorescent markers for detection. Because these systems can integrate and weight multiple biological signals continuously, they could potentially reduce false positives and distinguish cancer cells from healthy cells more accurately. Ron Weiss also mentioned the possibility of tailoring these genetic networks to specific tumour profiles or patients in the future, allowing for even more targeted treatments.
However, there are still limitations. Biomarker expression is noisy and variable, making it difficult to perfectly tune the system across different cells and environments. Delivering these genetic circuits safely into the body and preventing unintended activation in healthy tissue also remains a major challenge. In addition, larger and more complex networks may place metabolic burden on the cell and become harder to engineer reliably.
🔬 3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.
Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
🍄 Assignment Part 2: Fungal Materials
🌱 1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
Examples of fungal materials include mycelium-based composites and biocement. Mycelium materials are already being used for sustainable packaging, insulation, furniture, leather alternatives, and experimental building materials. In class, Renn also talked about her work with NASA exploring mycelium-based space habitats, which I thought was incredibly cool. The idea is that astronauts could potentially grow building materials directly in space instead of transporting heavy construction materials from Earth. Honestly, one day I would love to have space mushroom farmer as my LinkedIn title xD.
One of the main advantages of fungal materials is that they are biodegradable, sustainable, and can often be grown from agricultural or food waste. They are lightweight, easy to shape, and provide good thermal and acoustic insulation. Compared to traditional materials, they also tend to have a much lower environmental impact and require less energy to produce.
However, there are still limitations. Fungal materials are often weaker and more brittle than conventional materials like plastics, concrete, or metals. They can also be difficult to scale consistently because biological growth is sensitive to environmental conditions such as temperature and humidity. In addition, growing these materials takes time, making production slower than traditional manufacturing methods.
🧫 2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
One interesting goal would be to genetically engineer fungi to produce stronger, more flexible, and programmable materials that could be used in things like textiles, wearable technology, furniture, or even durable building components. Right now, many mycelium-based materials are lightweight and sustainable but can still be brittle compared to traditional materials. By modifying how the fungal cell wall is formed or introducing proteins that alter the mechanical properties of the mycelium network, it may be possible to create fungal materials with tunable strength, elasticity, or even responsive behaviours. I also think it would be fascinating to engineer fungi that could self-repair damage or adapt to different environmental conditions, especially for applications like sustainable architecture or even future space habitats.
One major advantage of using fungi for synthetic biology instead of bacteria is that fungi naturally grow as large interconnected networks of mycelium, making them much better suited for producing macroscopic structures and materials. Bacteria are generally better for producing small molecules or chemicals, whereas fungi can physically grow into complex 3D forms. Fungi can also grow on inexpensive agricultural waste and be shaped directly in moulds during growth, making fabrication relatively sustainable and low-cost. In addition, fungi are eukaryotic organisms, meaning they can carry out more complex post-translational modifications and biological processes than bacteria, which can be useful for engineering advanced material properties.
Week 9 HW: Cell Free Systems
🧪 Homework Part A: General and Lecturer-Specific Questions
🧬 1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Cell-free protein synthesis essentially uses biology as an engineering tool without needing living cells. Traditional in vivo systems require cells to stay alive, meaning you constantly need to maintain the correct conditions such as nutrients, water, gases, temperature, pressure, and energy supply. In contrast, cell-free systems remove many of these constraints, giving much greater flexibility and control over experimental variables. Since there are no living cells, researchers can directly tune reaction conditions, add or remove components easily, and rapidly test biological circuits or protein designs without worrying about cell survival or toxicity.
Another major advantage is portability and stability. Cell-free systems can be freeze-dried and stored for long periods, sometimes up to a year, then simply activated again by adding water. This makes them extremely useful for therapeutics on demand, rapid manufacturing, and applications where maintaining living cells would be difficult. They also have improved biosafety because there is less risk of engineered organisms escaping into the environment.
Cell-free expression is especially beneficial in environments such as space, where sustaining living cell cultures is difficult and resources are limited. It is also useful in developing regions or disaster zones where supply chains and laboratory infrastructure may not be reliable. Other important applications include rapid protein engineering, biosensors, metabolic engineering, and testing CRISPR or synthetic biology systems in a highly controlled environment.
⚙️ 2. Describe the main components of a cell-free expression system and explain the role of each component.
A cell-free expression system contains all the molecular machinery needed for transcription and translation without requiring living cells. One of the main components is the whole cell extract, which contains ribosomes, tRNAs, aminoacyl-tRNA synthetases, translation factors, and often RNA polymerase. Together, these provide the machinery required to transcribe mRNA and translate it into protein.
Another key component is the DNA template, usually in the form of a plasmid or linear PCR product. This acts as the blueprint for the desired protein because it contains the coding sequence as well as a promoter, such as a T7 promoter, which allows RNA polymerase to initiate transcription. Amino acids and nucleotides (NTPs) are also required because they serve as the building blocks for proteins and mRNA respectively.
Since there is no living metabolism present, the system also requires an external energy supply: ATP and GTP to power protein synthesis, plus a regeneration system such as phosphoenolpyruvate with pyruvate kinase to convert ADP back into ATP. In addition, salts and buffers are needed to maintain the correct chemical environment. For example, magnesium stabilises ribosomes and supports polymerase activity, potassium helps maintain ionic strength for enzyme activity and protein folding, and buffers such as HEPES maintain a stable pH. Finally, chaperones and protease inhibitors are often included to help proteins fold correctly and prevent them from being degraded during synthesis.
🔋 3. Why is energy provision and regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.
Energy provision and regeneration are critical in cell-free systems because there is no living cell metabolism to continuously produce ATP. The extract lacks the intact, self-sustaining metabolism of a living cell, yet transcription and translation consume large amounts of ATP and GTP. Without a continuous energy supply, protein synthesis would quickly stop.
One common method for maintaining ATP levels is using phosphoenolpyruvate (PEP) together with the enzyme pyruvate kinase (PK). PEP acts as a high-energy phosphate donor, while pyruvate kinase catalyses the transfer of a phosphate group from PEP onto ADP, regenerating ATP. As the ribosomes and other molecular machinery consume ATP during protein synthesis, ADP accumulates in the reaction mixture. Pyruvate kinase then converts this ADP back into ATP using the energy stored in PEP, allowing the system to continue functioning until the PEP supply is depleted.
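The PEP/pyruvate kinase idea can be illustrated with a toy bookkeeping model. This is only a sketch: the function name `run_reaction`, the arbitrary units, and the fixed consumption per step are all assumptions made for illustration, not measured kinetics.

```python
def run_reaction(atp, pep, use_per_step, steps):
    """Toy PEP/PK model: return the ATP level after each step (arbitrary units)."""
    adp = 0.0
    history = []
    for _ in range(steps):
        # Translation and other machinery consume ATP, producing ADP.
        used = min(atp, use_per_step)
        atp -= used
        adp += used
        # Pyruvate kinase: PEP + ADP -> pyruvate + ATP, limited by remaining PEP.
        regenerated = min(adp, pep)
        atp += regenerated
        adp -= regenerated
        pep -= regenerated
        history.append(atp)
    return history

# Hypothetical starting amounts, chosen only to show the trend.
with_pep = run_reaction(atp=2.0, pep=30.0, use_per_step=1.0, steps=40)
without_pep = run_reaction(atp=2.0, pep=0.0, use_per_step=1.0, steps=40)
print("ATP stays available for", sum(1 for a in with_pep if a > 0), "steps with PEP")
print("ATP stays available for", sum(1 for a in without_pep if a > 0), "steps without PEP")
```

In this simplified picture the ATP pool holds steady for as long as PEP lasts and then drains, which mirrors the statement that the reaction runs until the PEP supply is depleted.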
🧫 4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
Prokaryotic and eukaryotic cell-free expression systems each have different advantages depending on the type of protein being produced. Prokaryotic systems, most commonly based on E. coli extracts, are typically faster, lower cost, and capable of producing very high protein yields. They are therefore ideal for simple proteins and high-throughput screening applications. In contrast, eukaryotic systems such as rabbit reticulocyte, wheat germ, HeLa, or CHO cell extracts are slower, more expensive, and generally produce lower yields, but they are much better at handling complex protein folding and post-translational modifications such as glycosylation.
A good example of a protein suited for a prokaryotic cell-free system is GFP. GFP is a relatively robust and simple protein that does not require major post-translational modifications in order to function, making it ideal for rapid and inexpensive production in E. coli-based systems.
In contrast, a protein such as human erythropoietin (EPO) is much better suited to a eukaryotic cell-free system. Although EPO is not extremely large, it is a glycoprotein hormone that requires glycosylation to become biologically active and stable in the human body. Around 40% of its mass consists of carbohydrate chains. Standard prokaryotic systems cannot naturally perform these modifications, meaning the resulting protein would be non-functional in a medical context. Eukaryotic systems contain the necessary enzymes and endoplasmic reticulum-derived vesicles required for glycosylation and proper folding, allowing complex proteins like EPO to be produced correctly.
🧠 5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
To design a cell-free experiment for optimizing membrane protein expression, the main challenge is dealing with the hydrophobic parts of the protein. In a normal cell, these transmembrane regions are stabilised by the phospholipid bilayer, but in a cell-free extract there is no natural membrane environment. This means the protein can easily misfold, aggregate, or become insoluble.
To address this, I would add synthetic membrane-like systems directly into the cell-free reaction. For example, liposomes could be used to provide a membrane compartment for the protein to insert into, while nanodiscs could help keep the membrane protein soluble and properly stabilised. I would then test different concentrations and types of liposomes or nanodiscs to see which gives the highest yield of correctly folded protein.
I would also add molecular chaperones to help newly synthesised proteins fold into their correct 3D structure and reduce aggregation. Finally, I would optimize variables such as temperature, magnesium concentration, reaction time, and DNA template concentration, then check expression and folding using a fluorescence tag, Western blot, or activity assay. Overall, the goal would be to recreate enough of a membrane-like environment that the protein can fold and function properly outside of a living cell.
🛠️ 6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
A low yield in a cell-free system could happen for several reasons. One common issue is energy depletion. Protein synthesis uses a lot of ATP and GTP, and once these energy sources are used up, translation slows down or stops. Some energy systems also create inhibitory byproducts such as inorganic phosphate, which can disrupt the reaction. To troubleshoot this, I would switch to a cleaner energy regeneration system such as glucose or pyruvate, or use dialysis so fresh substrates can diffuse in while inhibitory byproducts diffuse out.
Another possible reason is template instability or poor template quality. If the DNA or mRNA template is degraded by nucleases in the extract, the ribosomes will not have enough time to produce the target protein. To fix this, I would use a circular plasmid instead of a linear PCR product, since plasmids are generally more resistant to nuclease degradation. I could also add RNase inhibitors or protect linear DNA using GamS protein or phosphorothioate-modified primers.
A third issue could be protein folding or solubility. The protein may be synthesised but then misfold, aggregate, or become insoluble, especially if it has hydrophobic regions or needs disulfide bonds. To troubleshoot this, I would lower the reaction temperature to slow down translation and give the protein more time to fold properly. I would also add chaperones such as DnaK/J or GroEL/ES, include mild detergents if needed, and adjust the redox environment with GSH/GSSG if the protein requires disulfide bond formation. Finally, I would check basic reaction conditions such as magnesium, potassium, pH, and codon usage, since poor tuning of these variables can also reduce yield.
🧲 Homework question from Kate Adamala
🧬 What would your synthetic cell do? What is the input and what is the output?
I would design a magnetically guided synthetic minimal cell that can sense a disease-like environment and produce a signal or therapeutic output. The input could be a small molecule associated with cancer or inflammation, such as high lactate. The output would first be something easy to measure, like sfGFP fluorescence, but later this could be replaced with a therapeutic protein.
🧪 Could this function be realized by cell-free Tx/Tl alone, without encapsulation?
Partly yes, but encapsulation makes it more useful because the membrane gives the system a cell-like boundary. It protects the reaction, allows communication with the environment, and makes the system behave more like a programmable artificial cell rather than just a test-tube reaction.
🦠 Could this function be realized by genetically modified natural cell?
Yes, but a synthetic minimal cell is safer and more controllable because it is not alive and cannot replicate. This is useful for therapeutic or environmental applications where you do not want engineered cells spreading.
🎯 Describe the desired outcome of your synthetic cell operation.
The desired outcome is that the synthetic cell only produces a fluorescent or therapeutic output when it detects the correct disease-associated signal. Ideally, it could also be guided or concentrated using magnetic particles.
🧫 2. Design all components that would need to be part of your synthetic cell
🫧 What would the membrane be made of?
The membrane could be made from a simple lipid vesicle using POPC and cholesterol, with a small amount of DOTAP to tune membrane charge and stability.
⚙️ What would you encapsulate inside? Enzymes, small molecules.
Inside, I would encapsulate an E. coli cell-free Tx/Tl system, DNA templates, ribosomes, tRNAs, amino acids, NTPs, ATP/GTP, an energy regeneration system, salts, buffer, and magnetic nanoparticles.
🧬 Which organism would your Tx/Tl system come from? Is bacterial OK, or do you need a mammalian system?
A bacterial E. coli Tx/Tl system would be fine for the first version because it is fast, cheap, and high-yield. Since the output is sfGFP or a simple protein, we do not need a mammalian system unless the protein requires complex folding, glycosylation, or mammalian regulatory systems like Tet-On.
🌍 How will your synthetic cell communicate with the environment?
Small molecules could enter through a membrane pore. A good example is α-hemolysin, encoded by the hla gene, which forms pores that allow small molecules to pass into the vesicle and activate the internal expression system.
🧪 3. Experimental details
🧬 List all lipids and genes.
The lipids would be POPC, cholesterol, and possibly DOTAP. The genes would include sfGFP as the reporter gene, a sensor-controlled promoter for the disease-associated input, and hla if I wanted the vesicle to express or contain α-hemolysin membrane pores.
📈 How will you measure the function of your system?
I would measure sfGFP fluorescence over time using a plate reader or fluorescence microscope. I would compare vesicles with and without the input signal, and also compare vesicles with and without α-hemolysin pores. If the system works, only vesicles exposed to the correct input should become fluorescent.
🤖 Homework question from Peter Nguyen
💡 Write a one-sentence summary pitch sentence describing your concept.
I would design a soft robotic skin embedded with freeze-dried cell-free systems that allows robots to chemically sense and respond to their environment like a form of synthetic biological touch.
🧠 How will the idea work, in more detail?
The idea would involve integrating freeze-dried cell-free biosensors directly into the flexible outer layer of a soft robot. When exposed to moisture or environmental chemicals, the embedded cell-free systems would activate and detect specific signals such as toxins, pH changes, bacterial contamination, or stress-related molecules. Depending on the detected input, the system could generate fluorescent outputs, trigger enzymatic reactions, or even alter the physical properties of the robotic material itself, such as stiffness, adhesion, or permeability. For example, a search-and-rescue robot could detect dangerous gas leaks or bacterial contamination in environments where traditional electronic sensors struggle. I think the exciting part is that instead of just giving robots electronic sensors, you are essentially giving them a programmable biochemical layer inspired by living tissue.
🌍 What societal challenge or market need will this address?
This could help address the need for safer and more adaptable robots in hazardous environments such as disaster zones, chemical spills, industrial sites, or healthcare settings. Traditional sensors are often rigid, power-intensive, and limited in the types of molecules they can detect. A biologically integrated robotic skin could allow robots to sense subtle chemical changes in real time while remaining lightweight and flexible. It could also reduce reliance on expensive sensor hardware and open up new possibilities for soft robotics and human-robot interaction.
🧊 How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?
The cell-free systems could be freeze-dried into hydrogel compartments or microcapsules embedded throughout the robotic skin, allowing them to remain stable until activated by water or environmental moisture. To address one-time use limitations, the robotic skin could contain replaceable sensing patches or layered compartments that activate sequentially over time. Stability could be improved using protective polymer coatings, antioxidants, and UV-resistant materials to protect the biological components during long-term operation.
🚀 Homework question from Ally Huang
Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!
For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .
🛰️ Background information
Spaceflight exposes astronauts to microgravity and radiation, both of which can stress human cells and disrupt normal biological function. One major concern is DNA damage, since long-duration missions to the Moon or Mars will involve greater radiation exposure than life on Earth. Understanding how cells respond to DNA damage in space is important for astronaut health, cancer risk, and future space medicine. It is also scientifically interesting because space acts like an extreme biological environment, revealing how fundamental repair pathways behave when normal gravity and environmental conditions are removed.
🧬 Molecular or genetic target
The target is the DNA damage response protein p53, encoded by the TP53 gene, which would be monitored using a p53-responsive fluorescent reporter in the BioBits® cell-free system.
🔭 Relationship to space biology question
p53 is a key regulator of the cellular response to DNA damage. When DNA damage occurs, p53 helps activate repair pathways, cell-cycle arrest, or apoptosis depending on the level of stress. Since radiation in space can damage DNA, studying p53-related activity provides a useful way to model how human cells might respond to spaceflight conditions. In a BioBits® cell-free system, a p53-responsive fluorescent reporter could provide a simplified, safe way to measure whether DNA-damage signalling is being activated without needing to culture living human cells in space.
🧪 Hypothesis or research goal
My research goal is to test whether a BioBits® cell-free system can be used as a simple biosensor for space-like DNA damage stress. I hypothesize that DNA templates exposed to radiation or simulated damage will produce a stronger fluorescent output from a p53-responsive reporter compared with undamaged controls. The reasoning is that p53 is one of the most important proteins involved in sensing and responding to DNA damage in human cells. If this pathway can be modelled in a freeze-dried cell-free reaction, it could become a portable tool for monitoring biological stress during space missions. This would be useful because cell-free systems are lightweight, stable, and do not require living cells, making them well-suited for constrained environments like spacecraft.
🧫 Experimental plan
I would test BioBits® reactions containing a p53-responsive fluorescent reporter. Samples would include an undamaged DNA template control, a radiation-exposed DNA template, and a positive control designed to strongly activate fluorescence. After adding water to activate the freeze-dried reactions, samples would be incubated using the miniPCR® thermal cycler if temperature control is needed. Fluorescence would be measured using the P51 Molecular Fluorescence Viewer. The main data collected would be fluorescence intensity over time, comparing damaged versus undamaged samples to determine whether the system can detect DNA-damage-related signalling.
Week 10 HW: Imaging & Measurement Technology
🧪 Final Project
📋 For your final project:
- Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.
- Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
- What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.
For my final project, I would measure whether GFP was successfully conjugated to magnetic beads and whether those GFP-coated magnetic beads can activate anti-GFP synNotch/SNIPR-style receptors in cells. The main thing I care about is whether magnetic presentation of the ligand changes receptor activation compared to normal soluble GFP.
First, I would measure GFP attachment to the magnetic beads. I could do this by measuring the fluorescence of the supernatant before and after conjugation. If GFP successfully binds to the beads, the remaining supernatant should become less fluorescent because less free GFP is left in solution. I could also image the beads under a fluorescence microscope to see whether the magnetic particles show green fluorescence, although this is more qualitative. A more quantitative method would be using a plate reader to compare GFP fluorescence in the starting solution, wash fractions, and final bead fraction.
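The supernatant-depletion measurement described above can be turned into a simple estimate of how much GFP ended up on the beads. This is a minimal sketch: the function name `bound_fraction` and the plate-reader numbers are hypothetical, and it assumes that all fractions have equal volume and that the readings are already background-subtracted.

```python
def bound_fraction(start, supernatant, washes):
    """Estimate the fraction of GFP bound to beads from fluorescence readings.

    start       -- fluorescence of the GFP solution before conjugation
    supernatant -- fluorescence left in solution after conjugation
    washes      -- fluorescence of each wash fraction (unbound GFP rinsed off)
    """
    unbound = supernatant + sum(washes)
    return 1 - unbound / start

# Hypothetical plate-reader values in arbitrary fluorescence units.
fraction = bound_fraction(start=10000, supernatant=3500, washes=[400, 100])
print(f"Estimated GFP bound to beads: {fraction:.0%}")
```

Comparing this estimate against the fluorescence of the final bead fraction would act as a sanity check, since bead-bound GFP may fluoresce differently from free GFP.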
Second, I would measure whether the cells are actually receiving and expressing the synNotch receptor and reporter plasmids. This could be checked using fluorescence microscopy or flow cytometry if the system includes a reporter such as mNeonGreen, mKO2, or eBFP2. Flow cytometry would be especially useful because it would let me quantify what percentage of cells are fluorescent and how strong the signal is per cell.
Third, I would measure synNotch activation itself. The output would be reporter expression downstream of the UAS promoter, such as mNeonGreen or another fluorescent protein. I would compare cells exposed to soluble GFP, GFP-conjugated magnetic beads without a magnet, and GFP-conjugated magnetic beads with magnetic guidance. If the magnetic system works, I would expect stronger or more spatially localized reporter expression in the magnetic bead condition.
The main technologies I would use are fluorescence microscopy, plate reader fluorescence measurements, magnetic separation, and potentially flow cytometry. Fluorescence microscopy would show where the GFP-beads are located and whether reporter activation is spatially patterned. A plate reader would give a bulk quantitative measurement of GFP conjugation and reporter output. Flow cytometry would give a more precise single-cell measurement of activation across the cell population. Together, these measurements would let me test both parts of the project: whether the magnetic GFP ligand was made successfully, and whether it can control synthetic receptor activation in cells.
⚖️ Waters Part I Molecular Weight
1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight?
To calculate the theoretical molecular weight of eGFP, we used the full amino acid sequence provided, including the eGFP core protein, the LE linker, and the 6x-His purification tag. The sequence was entered into the ExPASy Compute pI/Mw tool, which calculates the predicted molecular weight based on the amino acid composition of the protein.
The calculated theoretical molecular weight was 27,988.97 Da. This value represents the expected mass of the intact eGFP protein before experimental measurement by mass spectrometry.
2. Calculate the molecular weight of the eGFP using the adjacent charge state approach
To experimentally determine the molecular weight of eGFP from the mass spectrum, we used the adjacent charge state method. In electrospray ionization mass spectrometry (ESI-MS), proteins acquire multiple positive charges, producing a series of peaks corresponding to different charge states. By selecting two adjacent peaks, we can calculate the charge state and then determine the molecular weight.
We selected two adjacent peaks from the spectrum:
- Peak 1: 875.4421 m/z
- Peak 2: 903.7148 m/z
2.1 Determine z for each adjacent pair of peaks
We used the charge state equation:
z = ((m/z)_(n+1) − 1.0078) / ((m/z)_(n+1) − (m/z)_n)
Substituting the values:
z = (903.7148 − 1.0078) / (903.7148 − 875.4421)
z = 902.707 / 28.2727
z = 31.93
Rounding to the nearest integer gives a charge state of 32+ for the 875.4 peak and 31+ for the 903.7 peak.
2.2 Determine the molecular weight of the protein
Using the 32+ charge state:
MW = (m/z × z) − (z × 1.0078)
MW = (875.4421 × 32) − (32 × 1.0078)
MW = 28014.15 − 32.25
MW = 27981.90 Da
Therefore, the experimentally determined molecular weight of eGFP was 27,981.90 Da.
2.3 Calculate the accuracy of the measurement
To compare the experimental value to the theoretical value, we calculated the error in parts per million (ppm):
Error (ppm) = |MWexp − MWtheory| / MWtheory × 1,000,000
Substituting the values:
Error = |27981.90 − 27988.97| / 27988.97 × 1,000,000
Error = 252.6 ppm
The difference of about 7 Da (roughly 0.025%) shows that the experimentally measured molecular weight was close to the predicted theoretical mass.
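The adjacent charge state calculation can be reproduced in a few lines of Python. This is a sketch using the peak values and the 1.0078 Da proton mass quoted in this section; the function names are illustrative only.

```python
PROTON = 1.0078  # Da, the value used in the calculations above

def charge_from_adjacent_peaks(mz_low, mz_high, proton=PROTON):
    """Charge of the lower-m/z peak, given two adjacent charge-state peaks."""
    return (mz_high - proton) / (mz_high - mz_low)

def neutral_mass(mz, z, proton=PROTON):
    """Neutral molecular weight from an m/z value and its integer charge."""
    return mz * z - z * proton

def ppm_error(measured, theoretical):
    """Absolute mass error in parts per million."""
    return abs(measured - theoretical) / theoretical * 1e6

z_raw = charge_from_adjacent_peaks(875.4421, 903.7148)
z = round(z_raw)                   # nearest integer charge for the 875.4 peak
mw = neutral_mass(875.4421, z)
error = ppm_error(mw, 27988.97)    # theoretical MW from ExPASy

print(f"z = {z_raw:.2f}, rounded to {z}+")
print(f"MW = {mw:.2f} Da")
print(f"error = {error:.1f} ppm")
```

The ppm value can differ in the last digit from the hand calculation depending on whether the molecular weight is rounded before the error is computed.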
3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?
No, the charge state cannot be read directly from the zoomed-in peak, because the individual isotopic peaks are not clearly resolved. The protein is detected in a highly charged denatured state around 32+, meaning the isotopic peak spacing becomes very small:
1/z = 1/32 ≈ 0.03 m/z
At this spacing, the peaks are too close together for a mass spectrometer with a resolution of 30,000 to distinguish individually. Instead of separate isotopic peaks, the signal appears as a broad unresolved isotopic envelope.
🧬 Waters Part II — Secondary/Tertiary Structure
We will analyze eGFP in its native, folded state and compare it to its denatured, unfolded state on a quadrupole time-of-flight MS. We will be doing MS-only analysis (no liquid chromatography, also known as “direct infusion” experiments) on the Waters Xevo G3-QToF MS.
1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?
In the denatured state, the protein becomes unfolded, usually due to acidic solvents, organic solvents, or heat. This unfolding exposes many basic amino acid residues such as lysine, arginine, and histidine that were previously buried inside the protein structure. Because more protonation sites become accessible, the protein picks up many protons during electrospray ionization, leading to high charge states. As a result, the denatured protein appears at lower m/z values in the mass spectrum, typically in the ~500–1500 m/z range.
In the native state, the protein remains folded in its normal 3D conformation. For eGFP, this corresponds to its compact beta-barrel structure. Since many basic residues remain buried within the folded protein, fewer sites are available for protonation. Consequently, the protein acquires fewer charges and appears at higher m/z values in the mass spectrum, typically in the ~2000–4000 m/z range.
By comparing the spectra, we observe that the denatured spectrum contains a broad distribution of many highly charged peaks at low m/z, whereas the native spectrum contains fewer charge states shifted toward much higher m/z values. This reflects the difference between an unfolded and compact folded protein structure.
2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 m/z in Figure 3? What is it? How can you tell?
Yes. The charge state of the peak at ~2800 m/z is 10+.
To determine this, we examine the zoomed-in isotopic distribution shown in Figure 3. The individual isotopic peaks are clearly resolved, and the spacing between adjacent isotopes is approximately 0.1 m/z.
In mass spectrometry, isotopic peak spacing is equal to 1/z, where z is the charge state.
Since the observed spacing is ~0.1 m/z:
z = 1 / 0.1 = 10
Therefore the peak corresponds to a 10+ charge state.
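As a quick cross-check, the charge state from the isotope spacing can be combined with the molecular weight measured in Part I to predict where the peak should sit. This is a sketch with illustrative function names; it assumes the native protein has the same mass as the denatured measurement (27,981.90 Da).

```python
PROTON = 1.0078  # Da, the value used in Part I

def charge_from_isotope_spacing(spacing_mz):
    """Isotopic peaks are 1/z apart, so z is the reciprocal of the spacing."""
    return round(1.0 / spacing_mz)

def expected_mz(mass, z, proton=PROTON):
    """m/z at which a protein of the given neutral mass appears with charge z."""
    return (mass + z * proton) / z

z = charge_from_isotope_spacing(0.1)
mz = expected_mz(27981.90, z)
print(f"z = {z}+, expected m/z = {mz:.1f}")
```

The predicted position lands close to 2800 m/z, which is consistent with assigning the peak as 10+.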
🧪 Waters Part III — Peptide Mapping - primary structure
We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide – generating a “peptide map”. This process is used to confirm the primary structure of the protein.
There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.
1. How many Lysines (K) and Arginines (R) are in eGFP?
To determine how many trypsin cleavage sites exist in eGFP, we analyzed the amino acid sequence and counted the number of Lysine (K) and Arginine (R) residues, since trypsin specifically cleaves after K and R residues.
From the sequence analysis, eGFP contains:
- 20 Lysines (K)
- 6 Arginines (R)
These residues define the locations where trypsin can digest the protein into smaller peptide fragments.
2. How many peptides will be generated from tryptic digestion of eGFP?
To predict the number of peptides produced after digestion, we used the PeptideMass tool from ExPASy. The full eGFP amino acid sequence was entered into the program and digested in silico using trypsin with zero missed cleavages.
The prediction generated:
27 peptides
This represents the theoretical number of peptide fragments expected after complete tryptic digestion.
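The in silico digestion can be sketched in Python. This is a simplified illustration, not the PeptideMass implementation: the function name `tryptic_digest` is made up, the test sequence is a short hypothetical string rather than the real eGFP sequence, and the optional rule of not cleaving before proline is a commonly used convention that is assumed here.

```python
def tryptic_digest(sequence, proline_rule=True):
    """Split a sequence after every K or R, with zero missed cleavages."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last:
            # Optional convention: trypsin does not cleave when the next residue is P.
            if proline_rule and sequence[i + 1] == "P":
                continue
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides

# Hypothetical toy sequence for illustration only (not eGFP).
toy = "MVSKGEELFTRPAGHKFEGDTLVNRIELK"
print("K count:", toy.count("K"), "R count:", toy.count("R"))
for peptide in tryptic_digest(toy):
    print(peptide)
```

With no cleavage exceptions, a protein with n internal K/R sites gives n + 1 peptides, which is why 26 cleavage sites in eGFP correspond to 27 predicted peptides.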
3. How many chromatographic peaks do you see between 0.5 and 6 minutes in Figure 5a?
We examined the Total Ion Chromatogram (TIC) shown in Figure 5a and counted peaks with greater than 10% relative abundance between 0.5 and 6 minutes retention time.
Approximately:
18 distinct chromatographic peaks were observed.
Each peak likely corresponds to one or more peptides eluting from the LC column during the separation.
4. Does the number of peaks match the number of peptides predicted?
No. The number of observed chromatographic peaks does not exactly match the predicted number of peptides.
- Predicted peptides: 27
- Observed peaks: 18
This difference is expected in LC-MS peptide mapping because some peptides may be too small to retain on the chromatography column, some may co-elute at the same retention time, and others may ionize poorly or fall below the detection threshold.
5. Identify the m/z and charge (z) of the peptide in Figure 5b. Calculate the mass of the singly charged form (MH+).
From Figure 5b, the most abundant peptide peak was observed at:
525.767 m/z
To determine the charge state, we examined the isotope spacing in the zoomed-in spectrum. The isotopic peaks were separated by approximately:
0.5 m/z
Since isotopic spacing equals 1/z, we calculate:
z = 2
Therefore, the peptide has a:
2+ charge state
To calculate the singly charged mass (MH+):
MH+ = (m/z × z) − (z − 1)(1.0078)
Substituting the values:
MH+ = (525.767 × 2) − 1.0078
MH+ = 1050.53 Da
Thus, the singly charged peptide mass is:
1050.53 Da
6. Identify the peptide and calculate the mass accuracy in ppm.
The experimentally measured peptide mass was compared with the predicted peptide list generated by PeptideMass.
The peptide was identified as:
FEGDTLVNR
The theoretical singly charged mass for this peptide is:
1050.52 Da
The experimental mass accuracy was calculated in parts per million (ppm):
Error (ppm) = |MWexp − MWtheory| / MWtheory × 1,000,000
The calculated error was:
5.7 ppm
This very small error indicates high confidence in the peptide identification.
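The peptide mass check can be reproduced from standard monoisotopic residue masses. This is a sketch with illustrative function names; it keeps the 1.0078 Da proton value used in this section (a bare proton is closer to 1.00728 Da, which does not change the result at two decimal places). For FEGDTLVNR the neutral monoisotopic mass comes out near 1049.51 Da and the protonated MH+ form near 1050.52 Da, and the ppm comparison uses the MH+ value.

```python
# Monoisotopic residue masses in Da (only the residues needed for this peptide).
RESIDUE_MASS = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.0078    # value used in this section

def neutral_peptide_mass(sequence):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def mh_plus_from_mz(mz, z, proton=PROTON):
    """Convert an observed m/z at charge z to the singly charged mass MH+."""
    return mz * z - (z - 1) * proton

def ppm_error(measured, theoretical):
    return abs(measured - theoretical) / theoretical * 1e6

neutral = neutral_peptide_mass("FEGDTLVNR")
theory_mh = neutral + PROTON
measured_mh = mh_plus_from_mz(525.767, 2)
error = ppm_error(measured_mh, round(theory_mh, 2))

print(f"neutral mass   = {neutral:.2f} Da")
print(f"theoretical MH+ = {theory_mh:.2f} Da")
print(f"measured MH+    = {measured_mh:.2f} Da")
print(f"error           = {error:.1f} ppm")
```

The exact ppm figure shifts slightly depending on how many decimal places are carried through, but it stays in the single-digit ppm range.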
7. What is the percentage of the sequence confirmed?
Using the peptide mapping results shown in Figure 6, the LC-MS analysis confirmed:
88% sequence coverage
This means that peptides corresponding to 88% of the amino acid sequence of eGFP were experimentally detected and identified.
8. Bonus: What is the sequence for the fragmentation spectrum in Figure 5c?
The fragmentation spectrum in Figure 5c corresponds to the peptide:
FEGDTLVNR
This was determined by matching the observed fragment ions to the expected b-ion and y-ion fragmentation pattern generated from this peptide sequence.
9. Bonus: Does the peptide map data make sense?
Yes. The peptide map data is highly consistent with the expected eGFP protein standard.
The experiment achieved:
- High sequence coverage (88%)
- High mass accuracy (5.7 ppm)
- Matching fragmentation spectra for identified peptides
Together, these results strongly confirm that the analyzed protein is eGFP and demonstrate successful peptide mapping using LC-MS/MS.
🧬 Waters Part IV — Oligomers
We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):
- 7FU Decamer
- 8FU Didecamer
- 8FU 3-Decamer
- 8FU 4-Decamer
Based on the CDMS mass spectrum and the known KLH subunit masses, the oligomeric states can be assigned by multiplying the subunit mass by the number of subunits present.
The 7FU decamer contains 10 copies of the 7FU subunit. Since each 7FU subunit is 340 kDa, the expected mass is about 3.4 MDa, which matches the peak observed at 3.4 MDa.
The 8FU didecamer contains 20 copies of the 8FU subunit. Since each 8FU subunit is 400 kDa, the expected mass is about 8.0 MDa, which corresponds closely to the large peak observed at 8.33 MDa.
The 8FU 3-decamer contains 30 copies of the 8FU subunit, giving an expected mass of about 12.0 MDa. This matches the peak observed at 12.67 MDa.
The 8FU 4-decamer contains 40 copies of the 8FU subunit giving an expected mass of about 16.0 MDa. This would correspond to the lower-abundance peaks furthest to the right, around 16 to 17 MDa.
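The oligomer assignments can be tabulated with a short script. This is a sketch using the subunit masses and observed peak positions quoted above; the 4-decamer is left without a single observed value because only a 16 to 17 MDa range is given.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (species, subunit type, number of copies, observed peak in MDa or None)
ASSIGNMENTS = [
    ("7FU decamer", "7FU", 10, 3.4),
    ("8FU didecamer", "8FU", 20, 8.33),
    ("8FU 3-decamer", "8FU", 30, 12.67),
    ("8FU 4-decamer", "8FU", 40, None),
]

def expected_mda(subunit, copies):
    """Expected oligomer mass in MDa from the subunit mass and copy number."""
    return SUBUNIT_KDA[subunit] * copies / 1000

for name, subunit, copies, observed in ASSIGNMENTS:
    expected = expected_mda(subunit, copies)
    if observed is None:
        print(f"{name}: expected {expected:.1f} MDa")
    else:
        deviation = (observed - expected) / expected * 100
        print(f"{name}: expected {expected:.1f} MDa, observed {observed} MDa ({deviation:+.1f}%)")
```

The observed 8FU peaks sit a few percent above the simple multiples of 400 kDa, so the assignments rely on the peaks being the closest match rather than an exact one.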
🟢 Waters Part V — Did I make GFP?
Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.
| Metric | Theoretical | Observed (Measured) | PPM Mass Error |
|---|---|---|---|
| Molecular weight (kDa) | 27.989 kDa | 27.982 kDa | 252.6 ppm |
Week 11 HW: Bioproduction and Cloud Labs
🎨 The 1,536 Pixel Artwork Canvas
Everyone on the HTGAA network contributed to this global piece of artwork: https://rcdonovan.com/synbiobeta (I contributed by adding a few yellow cells in the bottom centre of the plate for the design). Shout out to Ronan Donovan, our TA. I think it's absolutely awesome turning biology into a medium for artistic expression!
This gave me a fun idea: the pixel art aesthetic kind of reminds me of Conway's Game of Life. What if we made a little simulation where fluorescent-protein pixels evolved over time using the rules from the Game of Life, like a living fluorescent colony? I might vibe code this up as a fun weekend project :)
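A first pass at that weekend project could look like the sketch below. It assumes the canvas is laid out as 32 rows by 48 columns (which gives 1,536 positions), treats each lit pixel as a live cell, and uses non-wrapping edges; the glyphs and the starting pattern are arbitrary choices.

```python
ROWS, COLS = 32, 48  # assumed plate layout: 32 x 48 = 1,536 pixels

def life_step(grid):
    """Apply one generation of Conway's Game of Life (edges do not wrap)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbours in the surrounding 3 x 3 block.
            neighbours = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            survives = grid[r][c] == 1 and neighbours in (2, 3)
            born = grid[r][c] == 0 and neighbours == 3
            new[r][c] = 1 if survives or born else 0
    return new

def render(grid):
    """Draw live cells as '#' (a fluorescent pixel) and dead cells as '.'."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)

# Seed the plate with a "blinker": three lit pixels in a row.
plate = [[0] * COLS for _ in range(ROWS)]
for col in (4, 5, 6):
    plate[5][col] = 1

next_plate = life_step(plate)
print(render(next_plate)[: (COLS + 1) * 8])  # show the top eight rows
```

Swapping the '#' glyph for a colour per fluorescent protein would be one way to turn this into something closer to the plate artwork.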
🧪 Cell Free Protein Synthesis | Cell Free Reagents
🧫 What is the role of the BL21 (DE3) Star lysate?
The lysate basically provides all the cellular machinery needed for protein production outside of living cells, including ribosomes, enzymes, and cofactors. It also contains T7 RNA polymerase, which transcribes the DNA template into mRNA using the T7 promoter system.
🧂 Why is potassium glutamate included?
Potassium glutamate helps recreate the ionic conditions normally found inside cells, which keeps enzymes active and stabilizes ribosomes during transcription and translation.
🧪 What does HEPES-KOH do?
HEPES-KOH acts as a buffer to keep the reaction at a stable physiological pH (~7.5), which is important because the transcription and translation enzymes work best under those conditions.
⚙️ Why is magnesium glutamate important?
Magnesium ions are essential cofactors for many biological processes in the reaction, especially ribosome function and RNA polymerase activity.
🧬 What is the purpose of potassium phosphate monobasic and dibasic?
Together, these phosphate salts help maintain pH balance and provide phosphate ions that are important for nucleotide metabolism and energy transfer.
⚡ Why are ribose and glucose included in the energy system?
Ribose and glucose act as energy and carbon sources that help regenerate nucleotides and ATP over time, allowing the reaction to continue for much longer incubations.
🔬 What roles do AMP, CMP, GMP, and UMP play?
These nucleotide monophosphates serve as precursors that can be converted into ATP, CTP, GTP, and UTP, which are needed for transcription, translation, and energy metabolism.
🧬 Why is guanine added separately?
Guanine can be salvaged by enzymes in the lysate and converted into GMP/GTP, helping replenish the guanosine nucleotide pool needed for transcription and translation.
🧱 What is the purpose of the amino acid mix?
The amino acid mix supplies the building blocks needed by ribosomes to synthesize proteins.
🧪 Why are tyrosine and cysteine added separately?
Tyrosine is added separately because it has poor solubility at neutral pH, while cysteine is kept separate because its thiol group is highly reactive and oxidizes readily in solution, so it is more stable when prepared on its own.
🔋 What does nicotinamide do?
Nicotinamide is a precursor to NAD+, which supports redox reactions and helps regenerate energy during the cell-free reaction.
💧 Why is nuclease-free water used?
Nuclease-free water is used to bring the reaction to the correct final volume without introducing RNases or DNases that could degrade the nucleic acids in the reaction.
⏱️ 2. What are the main differences between the 1-hour PEP/NTP mix and the 20-hour NMP-ribose mix?
The biggest difference is how they generate energy and nucleotides. The 1-hour PEP/NTP mix supplies ready-to-use NTPs and uses PEP as a fast, direct energy source, so the reaction starts quickly but doesn’t last very long. In contrast, the 20-hour NMP-ribose mix relies on NMPs, ribose, and glucose, which the lysate enzymes gradually convert into usable nucleotides and ATP, making the reaction slower but much more sustainable over long incubations.
The 1-hour system is optimized for rapid protein production, so it includes extra additives that boost transcription and translation efficiency immediately. The 20-hour system is designed for long-term stability, so it uses a simpler formulation with fewer additives.
🧬 Bonus question: How can transcription occur if GMP is not included but guanine is?
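Even though GMP is not directly added, the lysate can recycle guanine through the nucleotide salvage pathway. A phosphoribosyltransferase in the lysate attaches guanine to a phosphoribosyl group donated by PRPP (which can be made from the ribose in the mix) to form GMP, which is then phosphorylated to GDP and GTP and used for transcription.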
🌍 Planning the Global Experiment | Cell-Free Master Mix Design
🧬 1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)
🟢 sfGFP
sfGFP is engineered for extremely fast and robust folding, which makes it one of the most reliable fluorescent reporters in cell-free expression systems. Its fluorescence develops quickly and consistently even under less-than-ideal reaction conditions, although chromophore maturation still depends on oxygen availability.
🔴 mRFP1
mRFP1 has a relatively slow maturation time compared to newer red fluorescent proteins, so fluorescence often appears significantly later than the actual protein translation event. It is also less bright than modern red reporters, which can reduce signal sensitivity in low-yield reactions.
🟠 mKO2
mKO2 is a very bright orange fluorescent protein, making it useful for strong signal detection in multiplexed experiments. However, its fluorescence can be sensitive to acidic pH shifts and photobleaching during long imaging experiments, which may reduce signal stability over time.
🔵 mTurquoise2
mTurquoise2 has an exceptionally high quantum yield and strong photostability, allowing sensitive fluorescence detection even at relatively low protein concentrations. It also matures rapidly, which helps produce fast fluorescence readouts in cell-free reactions.
🌹 mScarlet-I
mScarlet-I is one of the brightest monomeric red fluorescent proteins and matures faster than many earlier red reporters, making it highly effective for real-time fluorescence measurements. Like most fluorescent proteins, its chromophore formation requires oxygen, so low-oxygen conditions can limit fluorescence development.
💙 Electra2
Electra2 was engineered for high stability and rapid maturation, which allows fluorescence to closely track ongoing protein production in real time. Its blue fluorescence also provides good spectral separation from green and red proteins, making it useful for multicolor cell-free experiments.
🧪 2. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.
Target Protein: mRFP1
Reagent Adjustment: Add a small amount of GMP and slightly increase cysteine in the 36-hour cell-free mastermix.
Hypothesis: Because mRFP1 has relatively slow maturation and lower brightness, adding GMP could improve GTP availability for sustained transcription, leading to more mRNA and more total protein production. Increasing cysteine may also help support proper folding, so together these changes should increase the amount of mature fluorescent mRFP1 produced over the 36-hour incubation.
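To get a feel for how maturation time delays the fluorescence readout, here is a toy model that assumes simple first-order maturation of protein that has already been translated. The half-times are placeholders chosen only for illustration, not measured values for mRFP1 or any other protein, and `mature_fraction` is a made-up helper name.

```python
import math

def mature_fraction(t_hours, half_time_hours):
    """Fraction of translated protein that is fluorescent after t hours,
    assuming first-order maturation with the given half-time."""
    return 1 - math.exp(-math.log(2) * t_hours / half_time_hours)

# Placeholder half-times (hours) for illustration only, not measured values.
hypothetical_half_times = {"fast maturer": 0.25, "slow maturer": 1.0}

for name, t_half in hypothetical_half_times.items():
    fractions = [mature_fraction(t, t_half) for t in (0.5, 1, 2, 4)]
    print(name, [f"{f:.2f}" for f in fractions])
```

In this simple picture the maturation lag matters most in the first few hours, so over a 36-hour incubation the total amount of protein made is likely the bigger lever, which is the part the GMP adjustment targets.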