Week 1 HW: Principles and Practices
1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about. 🧬
Bio-Hybrid Fusion Blanket
Research Context:
I am currently a research assistant investigating Magnetohydrodynamics (MHD), specifically focusing on the complex interactions between magnetic fields and 150-million-degree plasma. My work involves optimizing plasma confinement within Tokamak reactors. At these extreme temperatures, the behaviour of the plasmas is governed by a delicate balance of magnetic pressure and fluid dynamics, creating an environment that is incredibly hostile to the physical structures surrounding it.
Physics Problem:
In a deuterium-tritium fusion reactor, the blanket is a critical component that lines the interior of the reactor. It captures the high-energy neutrons released by fusion, converting their kinetic energy into heat that is used to generate electricity. It also contains lithium which, when struck by those neutrons, breeds tritium, the fuel we can recycle in the reactor.
Currently, these blankets are limited by severe material degradation. High-energy neutron bombardment causes metals to swell, become brittle, and crack from the inside out as waste reactants accumulate. Plasma is also a volatile fluid that is difficult to control with magnets; sudden disruptions can dump massive thermal loads onto the reactor walls. Since current materials are rigid and static, they cannot absorb or repair these shocks, leading to surface melting and catastrophic structural failures.
The proposal:
The idea is to develop a bio-hybrid, self-healing blanket for the fusion reactor, replacing rigid metal walls with a dynamic system where biology acts as both the architect and the maintenance crew.
One idea could be utilizing synthetic biology to grow the initial reactor structure. By using biology as a 3D template, we can grow a reactor structure that places lithium atoms with perfect accuracy. This creates a more efficient fuel-making system inside a heat-shield wall filled with tiny, vein-like cooling channels that traditional machines simply can’t build.
Another idea involves using the reactor’s downtime as a biological recovery phase. Once the system is cooled, the network of vascular channels becomes a highway for bespoke, bio-engineered cells designed to seek out and clear trapped helium waste. These cellular workers then secrete new mineral precursors to “re-grow” the scaffolding at the site of neutron-induced cracks, allowing the blanket to rejuvenate its structural integrity like a self-maintaining organ.
2. Governance or policy goals for an ethical future ⚖️
One goal would be to ensure the bio-hybrid blanket is easy to clean up. We want to avoid creating bio-nuclear waste that is harder to handle than regular blanket material.
Sub goal 1: Easy deconstruction – the biological structure should be non-toxic and easy to recycle and dissolve away after use; we should be able to filter out and recycle expensive metals used such as lithium.
Sub goal 2: Chemical safety – the maintenance cells must be engineered so they don’t produce harmful chemicals while they work, so the reactor process doesn’t require hazardous waste treatment.
3. Governance actions across actors 🏛️
Action 1: Digital DNA Registry (Technical Strategy)
Actor: Researchers
Purpose: Move away from secret, proprietary cell design to a shared public database of genetic blueprints.
Design: Researchers must upload the genetic code of their cells to a registry so that others know how to handle and recycle them.
Assumptions: Assumes labs will share blueprints, and that a global standard for DNA data would work.
Risk of Failure: Bad actors could learn how to destabilise or reverse engineer cells since it’s public.
Success: Any country could build and recycle their reactors and blankets.
Action 2: Green Fusion Tax Credits (Financial Incentive)
Purpose: Reward reactors that prove they are highly recyclable.
Design: The government would give extra funding or tax breaks to companies whose bio blankets leave minimal toxic waste behind.
Assumptions: Assumes financial incentives are strong enough for companies to prioritize recyclability over speed of reactor development.
Risk of Failure: Companies might greenwash their data to get money without being clean.
Success: Low-waste reactors become the most profitable to run and become the industry standard.
Action 3: Biological Security (New Rule)
Purpose: Prevent technology from being turned into a biological weapon that can survive extreme environments.
Design: Require fusion labs to store biological material in high security facilities with background checks like those used for handling nuclear fuel.
Assumptions: Assumes these bio engineered cells are dangerous.
Risk of Failure: Expensive security could slow down science, so we never get to clean fusion energy.
Success: Only good actors working on building innovative materials to help achieve clean energy get access to these biomaterials.
4. Scoring governance options 📊
| Does the option: | Option 1: DNA Registry | Option 2: Green Tax Credits | Option 3: Bio-Security Rule |
|---|---|---|---|
| Enhance Biosecurity | | | |
| By preventing incidents | 3 | 2 | 1 |
| By helping respond | 1 | 3 | 2 |
| Foster Lab Safety | | | |
| By preventing incidents | 2 | 3 | 1 |
| By helping respond | 1 | 3 | 2 |
| Protect the environment | | | |
| By preventing incidents | 2 | 1 | 2 |
| By helping respond | 1 | 2 | 3 |
| Other considerations | | | |
| Minimizing costs or burdens | 1 | 2 | 3 |
| Feasibility | 2 | 1 | 3 |
| Not impede research | 2 | 3 | 1 |
| Promote constructive applications | 1 | 2 | 3 |
5. Recommended governance pathway 🎯
I would prioritize Action 3: Biological Security as the main requirement, addressed to the U.S. Departments of Energy and Defense. This is because we should first and foremost address the immediate risk of creating bioweapons that can withstand radiation and high temperatures. This ensures that the foundation of the industry is built on containment and control before scaling or commercialization. Once the technology is regulated in a similar manner to nuclear fuel, Action 1 should be incentivized as a long-term safety net, providing a transparent repair manual for materials once they are safely deployed.
Trade-offs and Uncertainties:
Innovation vs. Security: The primary trade-off is that high security increases costs and can slow down academic research. There is a risk that over-regulating early-stage biology could delay clean fusion energy development.
Assumption of Risk: This plan assumes these bio-engineered cells are dangerous enough to warrant military-grade security. If the cells are actually fragile outside the reactor, the security measures might be unnecessary.
Questions from Professor Jacobson 🧪
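The error rate of DNA polymerase is about 1 in 10^6 bases, while the human genome is 3.2 × 10^9 bases long, so a single round of replication would leave roughly 3,200 errors if nothing corrected them. Biology deals with this discrepancy using the MutS mismatch repair system.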
Average Human Protein: 1036bp = 345 amino acids
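There are 61 sense codons shared among the 20 amino acids, so each amino acid has about 3 synonymous codons on average (from 1 for methionine and tryptophan up to 6 for leucine, serine, and arginine). A 345-residue protein can therefore be encoded in roughly 3^345 different ways, a number on the order of 10^164. Most of these sequences don’t work well in practice because differences in codon bias, mRNA structure, and translation efficiency can disrupt expression, stability, or correct protein production.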
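The number of DNA sequences that encode one specific protein is the product of the synonymous codon counts at each position. The following is a minimal sketch of that calculation, assuming the standard genetic code; the function names `count_encodings` and `log10_encodings` are illustrative and not from the course material.

```python
import math

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of synonymous codons per amino acid (stop codons excluded).
DEGENERACY = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def count_encodings(protein):
    """Exact number of DNA sequences that encode `protein`."""
    return math.prod(DEGENERACY[aa] for aa in protein)


def log10_encodings(protein):
    """log10 of the same count, convenient for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


# First six residues of the ATP synthase beta precursor used later on.
print(count_encodings("MLGFVG"))
```

Methionine and tryptophan contribute a factor of 1 each, so proteins rich in leucine, serine, or arginine have the largest number of possible encodings.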
Questions from Dr. LeProust 🧬
Phosphoramidite DNA synthesis.
The per-base error rate is high, about 1 in 10^2, so errors and truncated products accumulate exponentially with each base addition cycle.
After 2000 chemical synthesis cycles, errors and incomplete couplings accumulate at each step, and because the process has no proofreading, nearly all strands become truncated or mutated, leaving virtually no correct full-length product.
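The exponential loss of correct product can be estimated with one line of arithmetic. This is a rough sketch that assumes every cycle fails independently with the 1 in 10^2 error rate quoted above; `full_length_fraction` is an illustrative name.

```python
def full_length_fraction(n_cycles, error_rate):
    """Fraction of strands that survive n_cycles with no error at any step."""
    return (1.0 - error_rate) ** n_cycles


# Compare a short oligo, a long oligo, and a gene-length strand.
for n in (20, 200, 2000):
    print(n, full_length_fraction(n, 0.01))
```

Under this assumption about 82% of 20-mers are correct, about 13% of 200-mers, and only around 2 strands in a billion at 2000 cycles, which is why long genes are assembled from short oligos rather than synthesized directly.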
Question from George Church 🧠
10 essential amino acids which can’t be synthesized in the body:
Phenylalanine
Valine
Threonine
Tryptophan
Isoleucine
Methionine
Histidine
Arginine
Leucine
Lysine
Since lysine is one of the amino acids that cannot be synthesized, the lysine contingency exploits this natural dependency as a biocontainment strategy.
Sources:
https://nutrenaworld.com/blog/horses/what-are-essential-amino-acids-in-protein-and-why-do-they-matter/
AI prompt – What is Lysine Contingency:
Lysine Contingency is a biocontainment strategy where an engineered organism is made unable to synthesize lysine, so it can only survive if lysine is externally supplied.
Week 2 HW: DNA Design Challenge
⚙️ 3.1 Choose a protein
I chose the ATP synthase beta subunit because it’s essentially a biological motor and connects to my broader interest in energy systems:
Protons flow down their gradient across the mitochondrial membrane, almost like current moving through a circuit, and that flow physically spins part of the protein like a tiny turbine. That rotation drives changes in the beta subunits, which catalyze the formation of ATP from ADP and phosphate.
So it’s literally energy stored in a gradient being converted into mechanical motion and then into chemical energy. I find that idea really compelling: it’s molecular thermodynamics in action, where fundamental physics laws become something tangible inside living cells.
From NCBI I obtained the protein sequence:
https://www.ncbi.nlm.nih.gov/protein/NP_001677.2/
https://www.ncbi.nlm.nih.gov/protein/NP_001677.2?report=fasta
NP_001677.2 ATP synthase F(1) complex subunit beta, mitochondrial precursor [Homo sapiens]
MLGFVGRVAAAPASGALRRLTPSASLPPAQLLLRAAPTAVHPVRDYAAQTSPSPKAGAATGRIVAVIGAV
VDVQFDEGLPPILNALEVQGRETRLVLEVAQHLGESTVRTIAMDGTEGLVRGQKVLDSGAPIKIPVGPET
LGRIMNVIGEPIDERGPIKTKQFAPIHAEAPEFMEMSVEQEILVTGIKVVDLLAPYAKGGKIGLFGGAGV
GKTVLIMELINNVAKAHGGYSVFAGVGERTREGNDLYHEMIESGVINLKDATSKVALVYGQMNEPPGARA
RVALTGLTVAEYFRDQEGQDVLLFIDNIFRFTQAGSEVSALLGRIPSAVGYQPTLATDMGTMQERITTTK
KGSITSVQAIYVPADDLTDPAPATTFAHLDATTVLSRAIAELGIYPAVDPLDSTSRIMDPNIVGSEHYDV
ARGVQKILQDYKSLQDIIAILGMDELSEEDKLTVSRARKIQRFLSQPFQVAEVFTGHMGKLVPLKETIKG
FQQILAGEYDHLPEQAFYMVGPIEEAVAKADKLAEEHSS
🔁 3.2 Reverse translate a protein sequence
Translation reads three DNA bases as one mRNA codon, which specifies one amino acid (one letter of the protein sequence). Reverse translation runs this mapping backwards.
We can find the nucleotide record here:
https://www.ncbi.nlm.nih.gov/nuccore/NM_001686.4
https://www.ncbi.nlm.nih.gov/nuccore/NM_001686.4?report=fasta
NM_001686.4 Homo sapiens ATP synthase F1 subunit beta (ATP5F1B), mRNA; nuclear gene for mitochondrial product
AGTCTCCACCCGGACTACGCCATGTTGGGGTTTGTGGGTCGGGTGGCCGCTGCTCCGGCCTCCGGGGCCT
TGCGGAGACTCACCCCTTCAGCGTCGCTGCCCCCAGCTCAGCTCTTACTGCGGGCCGCTCCGACGGCGGT
CCATCCTGTCAGGGACTATGCGGCGCAAACATCTCCTTCGCCAAAAGCAGGCGCCGCCACCGGGCGCATC
GTGGCGGTCATTGGCGCAGTGGTGGACGTCCAGTTTGATGAGGGACTACCACCAATTCTAAATGCCCTGG
AAGTGCAAGGCAGGGAGACCAGACTGGTTTTGGAGGTGGCCCAGCATTTGGGTGAGAGCACAGTAAGGAC
TATTGCTATGGATGGTACAGAAGGCTTGGTTAGAGGCCAGAAAGTACTGGATTCTGGTGCACCAATCAAA
ATTCCTGTTGGTCCTGAGACTTTGGGCAGAATCATGAATGTCATTGGAGAACCTATTGATGAAAGAGGTC
CCATCAAAACCAAACAATTTGCTCCCATTCATGCTGAGGCTCCAGAGTTCATGGAAATGAGTGTTGAGCA
GGAAATTCTGGTGACTGGTATCAAGGTTGTCGATCTGCTAGCTCCCTATGCCAAGGGTGGCAAAATTGGG
CTTTTTGGTGGTGCTGGAGTTGGCAAGACTGTACTGATCATGGAGTTAATCAACAATGTCGCCAAAGCCC
ATGGTGGTTACTCTGTGTTTGCTGGTGTTGGTGAGAGGACCCGTGAAGGCAATGATTTATACCATGAAAT
GATTGAATCTGGTGTTATCAACTTAAAAGATGCCACCTCTAAGGTAGCGCTGGTATATGGTCAAATGAAT
GAACCACCTGGTGCTCGTGCCCGGGTAGCTCTGACTGGGCTGACTGTGGCTGAATACTTCAGAGACCAAG
AAGGTCAAGATGTACTGCTATTTATTGATAACATCTTTCGCTTCACCCAGGCTGGTTCAGAGGTGTCTGC
ATTATTGGGCCGAATCCCTTCTGCTGTGGGCTATCAGCCTACCCTGGCCACTGACATGGGTACTATGCAG
GAAAGAATTACCACTACCAAGAAGGGATCTATCACCTCTGTACAGGCTATCTATGTGCCTGCTGATGACT
TGACTGACCCTGCCCCTGCTACTACGTTTGCCCATTTGGATGCTACCACTGTACTGTCGCGTGCCATTGC
TGAGCTGGGCATCTATCCAGCTGTGGATCCTCTAGACTCCACCTCTCGTATCATGGATCCCAACATTGTT
GGCAGTGAGCATTACGATGTTGCCCGTGGGGTGCAAAAGATCCTGCAGGACTACAAATCCCTCCAGGATA
TCATTGCCATCCTGGGTATGGATGAACTTTCTGAGGAAGACAAGTTGACCGTGTCCCGTGCACGGAAAAT
ACAGCGTTTCTTGTCTCAGCCATTCCAGGTTGCTGAGGTCTTCACAGGTCATATGGGGAAGCTGGTACCC
CTGAAGGAGACCATCAAAGGATTCCAGCAGATTTTGGCAGGTGAATATGACCATCTCCCAGAACAGGCCT
TCTATATGGTGGGACCCATTGAAGAAGCTGTGGCAAAAGCTGATAAGCTGGCTGAAGAGCATTCATCGTG
AGGGGTCTTTGTCCTCTGTACTGTCTCTCTCCTTGCCCCTAACCCAAAAAGCTTCATTTTTCTGTGTAGG
CTGCACAAGAGCCTTGATTGAAGATATATTCTTTCTGAACAGTATTTAAGGTTTCCAATAAAATGTACAC
CCCTCAGAA
🧪 3.3 Codon Optimization
Multiple codons can code for the same amino acid, but different organisms prefer certain codons over others. We therefore have to optimize codon usage for the specific host organism; otherwise translation might be inefficient. We want to use codons whose matching tRNAs are plentiful, since each tRNA binds its specific codon and delivers the corresponding amino acid.
I have chosen E. coli as the organism to optimize the protein sequence for, since we use it in the fluorescent bacteria artwork lab!
Above is the entire mRNA sequence, but we need only the coding sequence (CDS). The mRNA also contains untranslated regions (UTRs) before the start codon and after the stop codon. We can go to the CDS record instead, obtain the coding sequence, and then run codon optimization on it.
https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS8924.1
https://www.idtdna.com/CodonOpt
We get this result:
ATG CTG GGA TTT GTT GGA CGT GTG GCT GCC GCG CCT GCG TCA GGA GCA CTG CGC CGC CTG ACT CCT TCT GCC TCT CTG CCG CCG GCG CAG CTG CTG CTG CGT GCG GCG CCA ACC GCG GTT CAC CCG GTG CGT GAT TAT GCC GCG CAG ACC TCG CCC TCT CCG AAA GCC GGT GCG GCC ACC GGC CGT ATC GTC GCG GTG ATC GGC GCG GTG GTA GAT GTA CAG TTT GAT GAA GGT CTG CCG CCG ATT CTC AAT GCG CTG GAA GTT CAG GGC CGT GAA ACC CGC CTG GTT CTG GAG GTA GCG CAG CAC CTG GGT GAG AGC ACC GTC CGT ACC ATT GCT ATG GAC GGC ACC GAA GGT CTG GTG CGT GGT CAG AAA GTG CTG GAT TCT GGT GCA CCG ATC AAA ATC CCG GTT GGC CCG GAA ACG TTG GGG CGT ATC ATG AAC GTC ATT GGT GAA CCG ATT GAT GAA CGT GGA CCG ATC AAA ACC AAA CAG TTT GCG CCG ATC CAT GCG GAA GCG CCG GAG TTT ATG GAA ATG AGC GTT GAG CAG GAG ATC CTG GTG ACC GGC ATC AAA GTG GTT GAT CTG CTG GCG CCG TAT GCC AAA GGC GGC AAA ATC GGC CTG TTC GGC GGT GCG GGT GTC GGC AAA ACC GTG CTG ATC ATG GAG CTG ATC AAC AAC GTG GCG AAA GCG CAC GGT GGT TAC AGC GTC TTT GCC GGT GTC GGT GAG CGC ACC CGT GAA GGT AAC GAC CTG TAT CAC GAA ATG ATT GAG AGC GGT GTG ATC AAC CTG AAA GAT GCG ACC AGC AAG GTC GCG CTG GTT TAC GGC CAG ATG AAC GAG CCG CCA GGT GCG CGT GCC CGT GTT GCG CTG ACT GGC CTG ACG GTA GCT GAG TAC TTC CGT GAC CAG GAA GGT CAG GAT GTG CTG CTG TTT ATC GAC AAC ATC TTC CGC TTC ACC CAG GCA GGC TCT GAA GTC TCT GCG CTG CTG GGT CGC ATC CCC TCA GCG GTT GGC TAT CAG CCG ACC CTG GCG ACC GAC ATG GGC ACC ATG CAG GAG CGT ATC ACC ACC ACC AAA AAA GGC TCT ATC ACC TCG GTT CAG GCG ATC TAT GTG CCG GCT GAT GAT CTG ACT GAT CCG GCA CCG GCA ACC ACC TTT GCC CAC CTG GAT GCC ACC ACC GTG CTC AGC CGT GCG ATT GCC GAG CTG GGT ATC TAC CCG GCG GTG GAT CCG CTG GAC AGC ACC TCG CGT ATT ATG GAC CCC AAC ATT GTC GGC TCT GAA CAC TAC GAT GTG GCG CGC GGC GTG CAG AAG ATC CTG CAG GAC TAC AAA AGC CTG CAG GAT ATC ATT GCC ATC CTG GGT ATG GAT GAA CTC TCT GAA GAA GAT AAA CTG ACC GTT AGC CGT GCG CGC AAA ATC CAG CGC TTC CTG AGC CAG CCG TTC CAG GTG GCG GAA GTG TTC ACC GGT CAC ATG GGC AAA CTG GTG CCG CTG AAA GAG ACT ATT AAA GGC TTC CAG CAG ATT CTG GCG GGT GAG TAC GAC 
CAC CTG CCG GAA CAG GCG TTC TAT ATG GTG GGC CCG ATT GAA GAG GCG GTG GCG AAA GCG GAT AAA CTG GCG GAA GAA CAT AGC AGC TAA
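Codon optimization should change the codons but not the protein. A quick check is to translate both versions and compare. The sketch below does this for the first ten codons only, copied from the mRNA record and from the optimized result above; the `translate` helper is illustrative and assumes the standard genetic code.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First ten codons, starting at the ATG in NM_001686.4.
native_start = "ATGTTGGGGTTTGTGGGTCGGGTGGCCGCT"
# First ten codons of the E. coli optimized sequence.
optimized_start = "ATG CTG GGA TTT GTT GGA CGT GTG GCT GCC"

print(translate(native_start))
print(translate(optimized_start))
print(translate(native_start) == translate(optimized_start))
```

Both fragments translate to the same ten residues that open the NCBI protein record (MLGFVGRVAA), even though six of the ten codons differ.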
🧫 3.4 What technologies could be used to produce this protein from your DNA?
We can use cell-dependent expression: cloning the optimized DNA sequence into a plasmid vector and introducing it into a host organism such as E. coli. Once the plasmid is inside the cell, the promoter recruits RNA polymerase, which transcribes the DNA sequence into mRNA. Ribosomes then bind to the mRNA, and tRNAs match the codons and deliver amino acids. The amino acids are linked together to form the protein. The bacteria would then produce the ATP synthase beta subunit using their own cellular machinery.
🧩 4.1-2 Build your DNA insert sequence
Expression Cassette
https://benchling.com/s/seq-QDGibA4g7TjoTuX3lb5A?m=slm-Gx8zqXYh9sr4lxSK0Xqu
🔄 4.3-6 Twist, Vector choice, Sequence Download
We can view the full plasmid sequence (circular DNA) for our cloned gene in the pTwist Amp High Copy cloning vector in Benchling:
https://benchling.com/s/seq-wsl9w63Z5DcxN7rlp5cG?m=slm-ndl9y5U2FSsJgNYW6z7
🧬 5.1 What DNA would you want to sequence and technologies used?
I would choose to sequence the DNA of extremophiles that thrive in high-radiation or high-temperature environments. By sequencing genes involved in radiation resistance, DNA repair, and protein stabilization, we could better understand the molecular mechanisms that allow biological systems to survive under extreme stress. This knowledge could help inform the engineering of radiation-resistant biological materials or bio-hybrid systems designed to operate in harsh energy environments. Studying these organisms connects molecular biology with broader challenges in advanced energy systems.
I would use Illumina sequencing to sequence the DNA since it provides high accuracy and high throughput and is well suited for whole-genome sequencing and variant detection. It’s a second generation technique, sequencing millions of short DNA fragments in parallel using sequencing-by-synthesis. Illumina sequencing reads DNA by copying it one base at a time and taking a picture after each base is added.
The input would be the extracted genomic DNA.
Preparation steps:
- Fragment DNA into short pieces
- Ligate sequencing adapters
- PCR amplify fragments
- Load onto flow cell for cluster amplification
Essential sequencing steps:
- DNA fragments bind to flow cell
- Bridge amplification forms clusters
- Fluorescently labeled nucleotides are added one at a time
- A camera detects the fluorescent signal for each incorporated base
- The color signal determines the base
The output would be millions of short sequence reads containing nucleotide sequences and quality scores which can be assembled into a genome or aligned to a reference.
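The quality scores attached to each read are Phred scores, defined by Q = -10 · log10(p), where p is the estimated probability that the base call is wrong. As a small sketch, assuming the Phred+33 character encoding used in current Illumina FASTQ files, the scores can be decoded like this. The record shown is a made-up example, not real data.

```python
def decode_quality(quality_string, offset=33):
    """Convert FASTQ quality characters to integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Invert the Phred definition Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


# Hypothetical four-line FASTQ record: header, bases, separator, qualities.
record = ["@read_1", "GATTACA", "+", "IIIII5!"]
header, bases, _, quality = record

for base, q in zip(bases, decode_quality(quality)):
    print(base, q, error_probability(q))
```

A score of 40 corresponds to a 1 in 10,000 chance of a wrong base, 20 to 1 in 100, and 0 to no confidence at all, which is why low-quality read ends are usually trimmed before assembly or alignment.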
🧪 5.2 What DNA would you want to synthesize and technologies used?
I would want to synthesize a cluster of genes involved in enhanced DNA repair and protein stabilization from extremophiles and express them in a model organism. By combining multiple protective pathways, we could engineer cells with improved resistance to radiation and thermal stress. The idea would be to use this to develop radiation-resistant biomaterials or biological components for extreme energy environments. We could build a genetic circuit that enables engineered bacteria to sense and respond to radiation stress. This circuit could include radiation response promoters, DNA repair genes, and protective protein pathways that activate under high oxidative or ionizing radiation conditions.
To synthesize this genetic circuit, we could order fragments from Twist, made by phosphoramidite solid-phase DNA synthesis, and join them with Gibson Assembly for multi-fragment assembly.
Essential steps:
- Design optimized DNA sequence computationally
- Chemically synthesize short oligonucleotides (base-by-base addition)
- Cleave and purify oligos
- Assemble fragments into full-length gene (e.g., Gibson Assembly)
- Clone into plasmid backbone
- Sequence-verify construct
Limitations:
Length limits: Direct chemical synthesis is reliable only for short fragments, so longer genes require assembly. Base errors also occur, so the construct must be validated by sequencing. For large gene clusters the process can be very expensive and time-consuming.
✏️ 5.3 What DNA would you want to edit and why? What technologies?
I would edit the genomes of photosynthetic microorganisms such as cyanobacteria or algae to improve their efficiency in converting light energy into chemical fuels. I could target genes involved in photosystem efficiency, carbon fixation pathways, and hydrogen production.
Photosynthesis is essentially a natural solar energy conversion system, but it is quite inefficient. We could modify regulatory genes to reduce energy losses or redirect metabolic pathways toward hydrogen or biofuel production, so we could have biological systems that convert sunlight into storable chemical energy more efficiently.
I am interested as it connects directly to large-scale energy systems and treating living cells as programmable energy conversion platforms, similar to designing more efficient reactors or turbines.
We could use CRISPR-Cas12a for genome editing in cyanobacteria.
How does it edit DNA?
- Design guide RNAs targeting specific genes.
- Deliver Cas12a and guide RNAs into the cells.
- Cas12a cuts the DNA at precise locations.
- The cell repairs the cut using a donor DNA template to insert the optimized sequence.
Design:
- Identify metabolic bottlenecks in photosynthesis or fuel production.
- Design guide RNAs.
- Design donor DNA templates if inserting new sequences.
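Guide design for Cas12a starts by finding its PAM, 5'-TTTV-3' (V = A, C, or G), which sits immediately upstream of the target site. The sketch below scans one strand of a sequence for candidate sites. It is a simplified illustration under stated assumptions: the function name, the 20-nt spacer length, and the test sequence are all hypothetical, and a real design would also scan the reverse strand and screen for off-targets.

```python
def find_cas12a_sites(seq, spacer_len=20):
    """Return (position, PAM, spacer) for each TTTV PAM on the given strand."""
    seq = seq.upper()
    sites = []
    # Stop early enough that a full-length spacer still fits after the PAM.
    for i in range(len(seq) - 4 - spacer_len + 1):
        pam = seq[i:i + 4]
        if pam.startswith("TTT") and pam[3] in "ACG":
            spacer = seq[i + 4:i + 4 + spacer_len]
            sites.append((i, pam, spacer))
    return sites


# Made-up target: two flanking bases, a TTTA PAM, then a 20-nt protospacer.
example = "GG" + "TTTA" + "ACGTACGTACGTACGTACGT" + "CC"
for position, pam, spacer in find_cas12a_sites(example):
    print(position, pam, spacer)
```

Each returned spacer is a candidate guide sequence; positions where the fourth PAM base is T are skipped because TTTT is not a TTTV match.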
Inputs:
- Cas enzyme
- Guide RNAs
- Donor DNA (if needed)
- Host cells (e.g., cyanobacteria)
Limitations
- Off-target edits may occur.
- Large pathway rewiring is complex.
- Efficiency gains may be modest due to thermodynamic constraints.
Week 3 HW: OpenTrons and Python
OpenTrons, Python and Hypotrochoid Patterns 🧪
We learned how to use the Opentrons Python API to write a protocol, essentially a set of instructions that controls the robot’s pipettes. Instead of manually pipetting, we defined coordinates, volumes, and movement steps in code so the robot could deposit liquid precisely into specific wells to create a defined pattern.
We could also simulate the protocol before running it on the actual robot. This let us preview how the design would look, check for mistakes, and adjust the pattern in software first.
1. Importing the tools we need 🧰
from opentrons import types
import math
This protocol relies on two key libraries.
The math module provides the trigonometric functions (sin, cos, pi) needed to compute the hypotrochoid curve.
The Opentrons types module allows us to describe 3-dimensional positions on the robot deck. In particular, we use types.Point() to move the pipette relative to a reference point on the agar plate.
Together these allow us to convert mathematical coordinates into physical robot movements.
2. Protocol metadata 📋
metadata = {
    'protocolName': 'HTGAA_SAMI',
    'author': 'Sami',
    'description': 'Hypotrochoid loops',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}
Every Opentrons protocol contains metadata describing the experiment.
This includes:
• the name of the protocol
• the author
• a description of the experiment
• the API version
The API level is particularly important because it determines which robot commands are available.
3. Defining the robot deck layout 🧭
TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'
These constants define where each piece of labware sits on the robot deck.
For this experiment we use:
• a 20µL tip rack
• a temperature-controlled plate containing colored liquids
• an agar plate where the design will be drawn
Separating these as constants makes the protocol easier to modify if the deck layout changes.
4. Mapping colors to wells 🎨
well_colors = {
    'A1': 'Red',
    'B1': 'Yellow',
    'C1': 'Green',
    'D1': 'Cyan',
    'E1': 'Blue'
}
Each colored dye is stored in a specific well on the cold plate.
This dictionary creates a simple mapping between well locations and color names so that later in the protocol we can refer to colors directly (e.g., “blue”) rather than remembering the exact well coordinates.
5. Initializing the robot and loading labware 🤖
tips_20ul = protocol.load_labware(
    'opentrons_96_tiprack_20ul',
    TIP_RACK_DECK_SLOT
)
pipette_20ul = protocol.load_instrument(
    'p20_single_gen2',
    'right',
    [tips_20ul]
)
Inside the run() function, the robot is configured by loading labware and instruments.
Here we load:
• a 20 µL tip rack
• a P20 single-channel pipette
The pipette is mounted on the robot’s right arm and is linked to the tip rack so the robot knows where to pick up tips.
6. Finding color locations automatically 🔎
def location_of_color(color_string):
    for well, color in well_colors.items():
        if color.lower() == color_string.lower():
            return color_plate[well]
Instead of hardcoding well positions throughout the code, this helper function allows us to request colors by name.
For example:
location_of_color("blue")
The function searches the well_colors dictionary and returns the corresponding well location on the plate.
This keeps the protocol clean and readable.
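One detail worth noting is that the helper silently returns None if the requested color is not in the dictionary, which would only fail later when the robot tries to aspirate. The variant below is a sketch that raises an error instead. It takes the plate as a parameter and uses a plain dictionary (`fake_plate`) as a stand-in for the real labware object so it can run without a robot; both are assumptions for illustration, not part of the original protocol.

```python
well_colors = {
    'A1': 'Red',
    'B1': 'Yellow',
    'C1': 'Green',
    'D1': 'Cyan',
    'E1': 'Blue',
}


def location_of_color(color_string, plate):
    """Return the well holding `color_string`, ignoring letter case."""
    for well, color in well_colors.items():
        if color.lower() == color_string.lower():
            return plate[well]
    # Fail loudly rather than handing None to the pipette later.
    raise ValueError(f"No well is mapped to color {color_string!r}")


# Stand-in for the labware object; on the robot this would be color_plate.
fake_plate = {well: f"well {well}" for well in well_colors}

print(location_of_color("blue", fake_plate))
```

Raising the error at lookup time means a typo in a color name is caught during simulation instead of partway through a run.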
7. Calculating hypotrochoid curves 🧮
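def hypotrochoid_points(R_mm, r_mm, d_mm, n_steps, n_turns):
    # Excerpt: these two lines are evaluated for each sampled angle t
    x = (R_mm - r_mm) * math.cos(t) + d_mm * math.cos((R_mm - r_mm) / r_mm * t)
    y = (R_mm - r_mm) * math.sin(t) - d_mm * math.sin((R_mm - r_mm) / r_mm * t)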
The core of the design is the hypotrochoid equation, the same mathematical curve used in spirograph toys.
A hypotrochoid describes the path traced by a point on a circle rolling inside a larger circle.
The parameters control the shape:
• R – radius of the large circle
• r – radius of the rolling circle
• d – distance of the pen from the rolling circle center
The function evaluates these equations at many values of t to generate a list of (x, y) points representing the curve.
These coordinates later become robot movement instructions.
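For reference, a complete version of the point generator might look like the sketch below. The excerpt above does not show how t is sampled, so the even spacing of t over `n_turns` full revolutions is an assumption, as are the example radii.

```python
import math


def hypotrochoid_points(R_mm, r_mm, d_mm, n_steps, n_turns):
    """Sample a hypotrochoid at n_steps + 1 evenly spaced angles."""
    k = (R_mm - r_mm) / r_mm  # how fast the pen spins relative to the roll
    pts = []
    for i in range(n_steps + 1):
        # Assumed sampling: t runs from 0 to n_turns full revolutions.
        t = 2 * math.pi * n_turns * i / n_steps
        x = (R_mm - r_mm) * math.cos(t) + d_mm * math.cos(k * t)
        y = (R_mm - r_mm) * math.sin(t) - d_mm * math.sin(k * t)
        pts.append((x, y))
    return pts


# Example radii chosen for illustration only.
pts = hypotrochoid_points(R_mm=5, r_mm=3, d_mm=5, n_steps=300, n_turns=3)
print(len(pts), pts[0], pts[-1])
```

For integer radii the curve closes after r / gcd(R, r) revolutions, which is 3 in this example, so the first and last points coincide.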
8. Transforming the curve 🔄
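Rotation
def rotate_points(pts, deg):
    th = math.radians(deg)
    c, s = math.cos(th), math.sin(th)  # rotation matrix entries
    return [(x * c - y * s, x * s + y * c) for x, y in pts]
This turns the pattern about the plate center.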
Scaling
def scale_points(pts, scale):
    return [(x * scale, y * scale) for x, y in pts]
This shrinks or expands the pattern.
By applying these transformations we can create multiple interwoven layers of the same curve.
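A small worked example makes the two transforms concrete. This sketch applies them to four test points rather than a real curve (the `square` list is made up for illustration) and checks that rotation leaves each point's distance from the center unchanged while scaling multiplies it.

```python
import math


def rotate_points(pts, deg):
    """Rotate points counter-clockwise about the origin by `deg` degrees."""
    th = math.radians(deg)
    c, s = math.cos(th), math.sin(th)
    return [(x * c - y * s, x * s + y * c) for x, y in pts]


def scale_points(pts, scale):
    """Scale points uniformly about the origin."""
    return [(x * scale, y * scale) for x, y in pts]


# Four points at distance 1 from the center.
square = [(1, 0), (0, 1), (-1, 0), (0, -1)]

# Shrink to half size, then turn by a quarter revolution.
layer = rotate_points(scale_points(square, 0.5), 90)

for (x, y) in layer:
    print(round(x, 6), round(y, 6), round(math.hypot(x, y), 6))
```

Because both transforms act about the origin, the order of scaling and rotating does not matter here, and every point ends up 0.5 from the center.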
9. Converting curve points into droplets 💧
loc = center_location.move(types.Point(x, y))
dispense_and_detach(pipette_20ul, drop_ul, loc)
Each (x, y) coordinate is translated into a physical position on the agar plate relative to the plate center.
The robot then:
- moves above the point
- dispenses a tiny droplet
- lifts the pipette slightly to detach the drop
This produces a sequence of small droplets that trace the mathematical curve.
10. Creating layered designs 🧵
layers = [
    ('cyan', 0, 1.00, 0.2, 2.5),
    ('blue', 18, 1.00, 0.2, 2.5),
    ('green', 36, 0.985, 0.2, 2.5),
    ('yellow', 54, 0.97, 0.2, 2.5),
]
Instead of drawing a single curve, the protocol draws multiple layers.
Each layer specifies:
• a color
• a rotation angle
• a scale factor
• droplet size
• spacing between droplets
By rotating and slightly scaling each layer, the curves weave together into a complex multi-color pattern.
11. Drawing the pattern ✏️
for color, rot_deg, scl, dot_ul, step_mm in layers:
    pts = scale_points(base_pts, scl)
    pts = rotate_points(pts, rot_deg)
    dispense_path(color, pts)
For each layer the protocol:
- scales the base hypotrochoid
- rotates it
- sends the points to the dispensing routine
The robot then physically draws the pattern on the agar plate.
12. Adding a final decorative ring ✨
ring_pts = []
for i in range(80):
    t = 2 * math.pi * i / 80
    ring_pts.append((ring_r * math.cos(t), ring_r * math.sin(t)))
Finally, a small circular ring of yellow droplets is added at the center of the design.
This creates a visual “sparkle” effect and highlights the symmetry of the pattern.
Week 4 HW: Protein Design Part 1
🔵 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Protein in 500 g of meat:
100 g → 26 g protein
500 g → 130 g protein
Mass of one amino acid:
1 Dalton = 1.66 × 10⁻²⁴ g
Average amino acid ≈ 100 Da
→ 100 × 1.66 × 10⁻²⁴ = 1.66 × 10⁻²² g
Number of amino acid molecules:
130 g ÷ 1.66 × 10⁻²² g ≈ 7.83 × 10²³ molecules
Convert to moles using Avogadro’s number:
7.83 × 10²³ ÷ 6.022 × 10²³ ≈ 1.30 mol
Final answer:
≈ 7.8 × 10²³ amino acid molecules
≈ 1.3 mol of amino acids
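The same arithmetic can be written as a short script. It uses the figures from the calculation above (26 g of protein per 100 g of meat, about 100 Da per amino acid) and relies on the fact that a mass of 1 Da per molecule is equivalent to 1 g/mol; the function name is illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_count(meat_g, protein_fraction=0.26, residue_mass_da=100.0):
    """Return (moles, molecules) of amino acids in a portion of meat."""
    protein_g = meat_g * protein_fraction
    moles = protein_g / residue_mass_da  # Da per molecule equals g per mol
    return moles, moles * AVOGADRO


moles, molecules = amino_acid_count(500)
print(f"{moles:.2f} mol, {molecules:.2e} molecules")
```

Going through moles first gives the same result as dividing by the mass of a single amino acid in grams, and avoids handling numbers like 1.66 × 10⁻²² directly.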
🔵 2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
When we eat beef or fish, the body breaks it down into basic building blocks like amino acids, so at that point it is no longer “cow” or “fish” but just raw materials that we use to build our own cells and proteins; the DNA in food is also broken down and cannot function in our bodies, and since our cells only follow human DNA instructions, we are simply using the materials rather than becoming what we eat.
🔵 3. Why are there only 20 natural amino acids?
It is not fully understood why there are only 20 natural amino acids. One idea, proposed by Francis Crick, is the frozen accident theory, which suggests that the genetic code is not perfectly optimized but instead came from an early, somewhat arbitrary setup that later became fixed. In that sense, the 20 amino acids we see today may have just been what happened to get locked in at the start of life. At the same time, studies suggest these amino acids cover a good spread of chemical properties (such as charge, polarity, hydrophobicity, and size), so they are diverse enough to build a wide range of protein structures.
🔵 4. Can you make other non-natural amino acids? Design some new amino acids.
Yes, you can create non-natural amino acids. A well-known example is the work of Floyd E. Romesberg, particularly the paper A Semi-Synthetic Organism with an Expanded Genetic Alphabet (Nature, 2014), which demonstrated that the genetic alphabet can be expanded by introducing an unnatural base pair into living cells. The extra base pair creates new codons, which in principle allows cells to encode and incorporate non-natural amino acids into proteins.
🔵 5. Where did amino acids come from before enzymes that make them, and before life started?
Amino acids likely formed before life through simple chemistry on early Earth rather than through enzymes. A classic example is the experiment by Stanley Miller, who showed in his 1953 Science paper that if you simulate early Earth conditions (basic gases plus an energy source like lightning), amino acids can form spontaneously. This lines up with ideas going back to Charles Darwin, who speculated that life might have first emerged in a warm little pond with the right chemicals and energy. So the building blocks of life can arise from pretty simple ingredients without any biology involved, and interestingly, even now we still haven’t been able to create life itself from scratch.
🔵 6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Because D-amino acids are mirror images of L-amino acids, they naturally form the opposite helix. So while normal L amino acids form right-handed α-helices, D-amino acids form left-handed ones.
🔵 7. Can you discover additional helices in proteins?
Yes, proteins can form additional types of helices beyond the standard α-helix. These include 3₁₀-helices and π-helices, which differ in how tightly they coil and in their hydrogen bonding patterns.
🔵 8. Why are most molecular helices right-handed?
Most molecular helices are right-handed because biological building blocks are chiral, with L-amino acids and D-sugars favoring right-handed structures that minimize steric clashes and optimize hydrogen bonding. Exceptions like Z-DNA exist but are less common and form under specific conditions.
🔵 9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?
This happens because the edges of β-sheets can easily form hydrogen bonds with other strands, and many of the side chains involved are hydrophobic, so they cluster together to avoid water. The main driving force is therefore hydrophobic interactions, along with additional stabilization from hydrogen bonding between sheets.
🔵 10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
Many amyloid diseases involve β-sheets because β-sheet-rich assemblies are very stable: misfolded proteins stack into insoluble “cross-β” fibrils that build up over time, as seen in Alzheimer’s disease, type 2 diabetes, and Creutzfeldt–Jakob disease. That same stability also makes these structures useful as materials such as nanofibers and scaffolds.
🧪 Part B: Protein Analysis and Visualization
🔵 1. Briefly describe the protein you selected and why you selected it.
I selected green fluorescent protein (GFP) because it is a well-known protein that clearly links structure to function. GFP naturally fluoresces due to a chromophore formed within its folded structure, which makes it widely used to track gene expression and protein location in cells. I also chose it because I’ve enjoyed working with fluorescent systems in biology so far, like in the Opentrons lab, so it feels familiar and intuitive while still being a powerful example of how protein structure leads to function.
🔵 2. Identify the amino acid sequence of your protein.
From NCBI I obtained the amino acid sequence of Aequorea victoria green fluorescent protein:
https://www.ncbi.nlm.nih.gov/nuccore/L29345.1
MSKGEELFTGVVPILVELDGDVNGQKFSVSGEGEGDATYGKLTL
KFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKD
DGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKMEYNYNSHNVYIMADKPKNG
IKVNFKIRHNIKDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHM
ILLEFVTAAGITHGMDELYK
Sequence analysis results:
Length: 238 amino acids
Most frequent: G (22 times, 9.2%)
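The length and most-frequent-residue figures above can be reproduced with a few lines of Python. This is a minimal sketch, not the course notebook's code: the function name `composition_summary` is hypothetical, and the demo runs on only the first 20 residues of the sequence, so its printed numbers differ from the full-length results.

```python
from collections import Counter


def composition_summary(sequence: str):
    """Return (length, most frequent residue, its count, its percentage)."""
    # Remove line breaks and spaces so a wrapped sequence can be pasted as-is.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    residue, count = Counter(seq).most_common(1)[0]
    return len(seq), residue, count, round(100 * count / len(seq), 1)


# Demo on the first 20 residues only (a fragment, not the full protein).
fragment = "MSKGEELFTGVVPILVELDG"
length, residue, count, percent = composition_summary(fragment)
print(f"Length: {length} amino acids")
print(f"Most frequent: {residue} ({count} times, {percent}%)")
```

Pasting the full sequence (line breaks included) into `composition_summary` gives the whole-protein statistics.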
How many protein sequence homologs are there for your protein?
UniProt ID: P42212 - 205 hits found
Does your protein belong to any protein family?
GFP belongs to the green fluorescent protein (GFP) family. This family includes a range of fluorescent proteins found in organisms like jellyfish and corals, all of which share a similar β-barrel structure and fluorescent chromophore but can emit different colors (e.g. green, blue, cyan, yellow, red).
🔵 Structure Analysis
https://www.rcsb.org/3d-view/1EMA - 1EMA structure for GFP
The GFP structure (PDB ID: 1EMA) was solved by Ormo and Remington in 1996; it was deposited on August 1 and released on November 8. It was determined by X-ray diffraction at a resolution of 1.90 Å, indicating a high-quality structure with well-resolved atomic positions. In addition to the protein, the structure includes water molecules and the chromophore (listed as a non-standard residue). Structurally, GFP adopts an all-β, β-barrel fold, commonly referred to as the GFP-like fold in classification systems such as SCOP.
🔵 3D Visualization
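PyMOL commands used to assign secondary structure and color by it (helices red, strands yellow, loops green):
dss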
color red, ss h
color yellow, ss s
color green, ss l
From the structure, we can infer that hydrophobic residues are predominantly located in the interior of the protein, forming a stable core, while hydrophilic residues are mainly exposed on the surface where they can interact with the surrounding aqueous environment. This distribution is consistent with typical protein folding and helps stabilize the β-barrel structure of GFP.
The protein appears mostly smooth and compact with no large exposed binding pockets on the exterior. There are only small surface indentations, but no obvious deep cavities. This suggests that GFP does not have a typical surface binding site; instead, its main “hole” is an internal cavity within the β-barrel, which is not visible from the outside surface.
🤖 Part C: Using ML-Based Protein Design Tools
At a high level, we are using a pretrained machine learning model to learn patterns from large numbers of protein sequences and then apply that knowledge to analyze a specific protein. In the deep mutational scan, we systematically mutate each position in the protein and use the model to estimate how likely or tolerated each mutation is, which helps identify important versus flexible regions of the protein. In latent space analysis, we convert entire protein sequences into vector embeddings and visualize them in a reduced-dimensional space, where proteins with similar structure or function cluster together. Together, these approaches let us explore both how individual mutations affect a protein and how whole proteins relate to each other, without directly simulating their physical behavior.
🧪 Deep Mutational Scans
The mutation scan heatmap shows how each possible amino acid substitution affects every position in the protein. The x-axis represents the position along the protein sequence (from residue 1 to 238 for GFP), and the y-axis represents the 20 possible amino acids that could be substituted at each position. Each cell in the heatmap corresponds to a specific mutation (e.g. position i mutated to amino acid j), and the color indicates the model’s score or likelihood for that mutation: brighter colors (yellow/green) indicate mutations that are more likely or tolerated, while darker colors (blue/purple) indicate mutations that the model considers unlikely and that are probably destabilizing.
Looking vertically at a single column (one position) shows how sensitive that position is to mutation: columns that are mostly dark suggest highly conserved, functionally or structurally critical residues, whereas columns with many lighter colors indicate positions that are more flexible and tolerant to change. Patterns across the heatmap therefore reveal which regions of the protein are constrained (e.g. core or active regions) versus more variable (e.g. surface or loop regions), giving insight into the protein’s stability and function.
For example, at a position in the core of the protein (e.g. around residue 65, in the chromophore region of GFP), most substitutions are dark (low likelihood), but mutations to similar amino acids (e.g. hydrophobic → hydrophobic) may be slightly less penalized. This suggests that the residue is highly conserved and structurally important, and that changing it disrupts the local environment required for stability or function. Substituting a chemically similar residue is less disruptive, which is why those mutations appear slightly more tolerated.
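The heatmap logic can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's implementation: the per-position probabilities are made-up stand-ins for a language model's output, the score used (log probability of the mutant minus log probability of the wild type) is one common choice, and the names `mutation_scores`, `position_tolerance`, and `peaked` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutation_scores(wild_type, position_probs):
    """Score each substitution as log p(mutant) - log p(wild type).

    position_probs[i] maps every amino acid to a probability at position i
    (assumed model output). Returns scores[aa][i], i.e. one heatmap row per
    amino acid and one column per position.
    """
    scores = {aa: [] for aa in AMINO_ACIDS}
    for i, wt in enumerate(wild_type):
        probs = position_probs[i]
        for aa in AMINO_ACIDS:
            scores[aa].append(math.log(probs[aa]) - math.log(probs[wt]))
    return scores


def position_tolerance(scores, wild_type, i):
    """Mean score of the 19 substitutions at position i (lower = more constrained)."""
    values = [scores[aa][i] for aa in AMINO_ACIDS if aa != wild_type[i]]
    return sum(values) / len(values)


def peaked(wt, p_wt):
    """Toy distribution: probability p_wt on the wild type, the rest spread evenly."""
    other = (1.0 - p_wt) / (len(AMINO_ACIDS) - 1)
    return {aa: (p_wt if aa == wt else other) for aa in AMINO_ACIDS}


# Toy two-residue "protein": position 1 is strongly constrained,
# position 2 has a uniform distribution (fully tolerant).
wild_type = "YK"
probs = [peaked("Y", 0.905), {aa: 0.05 for aa in AMINO_ACIDS}]
scores = mutation_scores(wild_type, probs)
for i in range(len(wild_type)):
    print(i + 1, wild_type[i], round(position_tolerance(scores, wild_type, i), 2))
```

A strongly negative column mean corresponds to a mostly dark column in the heatmap, while a mean near zero corresponds to a tolerant position.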
🌌 Latent Space Analysis
Each point represents a protein, and proximity reflects similarity in learned sequence features. After placing GFP into this space, its nearest neighbor was another GFP sequence, confirming the model correctly captures sequence similarity. However, other nearby proteins were functionally different and relatively distant, suggesting that GFP is somewhat isolated in this dataset due to a lack of closely related sequences. This indicates that while the latent space captures meaningful relationships, the dataset composition strongly influences the observed neighborhoods.
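A nearest-neighbor lookup in embedding space can be sketched as follows. The three-dimensional vectors and protein names are invented for illustration, and cosine similarity is an assumed metric; the actual analysis may use a different distance or work in the reduced-dimensional space.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, embeddings, k=3):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real protein embeddings have hundreds of dimensions.
embeddings = {
    "GFP_variant": [0.9, 0.1, 0.0],
    "kinase": [0.1, 0.9, 0.2],
    "protease": [0.0, 0.3, 0.9],
}
gfp_query = [1.0, 0.0, 0.0]

for name, similarity in nearest_neighbors(gfp_query, embeddings):
    print(f"{name}: {similarity:.3f}")
```

In this toy data the second-nearest neighbor already has low similarity, which mirrors the observation that GFP sits somewhat isolated when the dataset contains few related sequences.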
🧩 Protein Folding
Protein folding is important because a protein’s function is determined by its three-dimensional structure rather than just its amino acid sequence. The way a protein folds defines its active sites, binding interactions, and overall stability, which in turn controls how it behaves in a biological system. Misfolding can lead to loss of function or disease, while correct folding enables proteins to carry out roles such as catalysis, signaling, and structural support. Being able to understand and predict how a sequence folds therefore allows us to infer function, study the effects of mutations, and design new proteins or therapeutics without relying solely on experimental methods.

In this task, we used ESMFold to predict protein structure directly from sequence. The model is first pretrained as a protein language model (ESM-2), learning patterns from large datasets of sequences, and then passes this information into a folding module that outputs predicted 3D coordinates. In the diagram, the sequence is encoded into embeddings, processed through a series of network blocks, and iteratively refined to produce a final structure along with a confidence estimate. We then compare the predicted structure to known experimental structures and test how mutations affect folding, allowing us to explore how robust the protein’s structure is to changes in its sequence.
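One simple way to quantify the comparison between a predicted and an experimental structure is the root-mean-square deviation (RMSD) over matched atoms. The sketch below assumes the two coordinate sets are already superimposed and list the same atoms in the same order; a real comparison would first align the structures. The coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy C-alpha coordinates in angstroms (hypothetical values).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

print(f"RMSD: {rmsd(experimental, predicted):.2f} A")
```

A low RMSD means the prediction closely matches the experimental backbone; recomputing it after introducing mutations gives a rough measure of how much the predicted fold changes.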
🧬 Protein Generation – Inverse Folding
Inverse protein folding is the process of starting with a desired 3D protein structure (its backbone shape) and designing an amino acid sequence that will fold into that structure. Instead of predicting structure from a sequence, you reverse the problem: given a fixed geometry, a model like ProteinMPNN selects residues that fit spatially, stabilize interactions, and satisfy physical constraints. Because many different sequences can produce the same structure, the goal is to find one (or several) that make the structure energetically stable. The designed sequence is then typically validated by folding it again with a model like ESMFold and checking whether it reproduces the original structure.
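A designed sequence is often compared with the native one by sequence recovery, the fraction of positions where the two agree. The sketch below is illustrative only: the designed fragment is made up rather than real ProteinMPNN output, and `sequence_recovery` is a hypothetical helper name.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


native_fragment = "MSKGEELFTG"    # first 10 residues of the GFP sequence
designed_fragment = "MSRGEDLFSG"  # hypothetical design for the same backbone

print(f"Recovery: {sequence_recovery(native_fragment, designed_fragment):.0%}")
```

A design can have modest recovery and still refold into the target backbone, which is the point made above that many different sequences can produce the same structure.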