Project Propalsal:
A small, low-cost desktop platform that combines short DNA synthesis with cell-free expression. Users (students, community labs, small clinics) design short DNA sequences through a web interface, send them to a benchtop “DNA printer,” and immediately test them in a cell-free system. This pushes “personal fabrication” into biology and could support education and grassroots innovation, but raises serious questions about biosecurity, safety, and equity when DNA writing becomes cheap and widely accessible.
3.1 Choose your protein
I chose Green Fluorescent Protein (GFP) from the jellyfish Aequorea victoria.
Reasons:
-Classic reporter protein in molecular biology and imaging
-Small, monomeric, and widely used as a fusion tag
sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1 MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
3.2 Reverse translate (protein → DNA) reverse translation of sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1 to a 714 base sequence of most likely codons. atgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtggaactggatggc gatgtgaacggccataaatttagcgtgagcggcgaaggcgaaggcgatgcgacctatggc aaactgaccctgaaatttatttgcaccaccggcaaactgccggtgccgtggccgaccctg gtgaccacctttagctatggcgtgcagtgctttagccgctatccggatcatatgaaacag catgatttttttaaaagcgcgatgccggaaggctatgtgcaggaacgcaccatttttttt aaagatgatggcaactataaaacccgcgcggaagtgaaatttgaaggcgataccctggtg aaccgcattgaactgaaaggcattgattttaaagaagatggcaacattctgggccataaa ctggaatataactataacagccataacgtgtatattatggcggataaacagaaaaacggc attaaagtgaactttaaaattcgccataacattgaagatggcagcgtgcagctggcggat cattatcagcagaacaccccgattggcgatggcccggtgctgctgccggataaccattat ctgagcacccagagcgcgctgagcaaagatccgaacgaaaaacgcgatcatatggtgctg ctggaatttgtgaccgcggcgggcattacccatggcatggatgaactgtataaa
Reference paper: HYDRA – hydrogels by robotic automation Citation
Torchia, E. et al. Fabrication of cell culture hydrogels by robotic liquid handling automation for high-throughput drug testing. Communications Engineering, 4, 222 (2025).
What they did
This paper introduces HYDRA (HYDrogels by Robotic liquid-handling Automation), a method to fabricate thin, planar hydrogel films directly inside standard 96- and 384-well plates using liquid-handling robots.
Part A. Conceptual Questions Question: How many molecules of amino acids do you take with a piece of 500 grams of meat?
(On average an amino acid is ~100 Daltons.)
Answer:
5 mol × 6.02 × 10²³ ≈ 3 × 10²⁴ amino acid molecules.
So eating 500 g of meat gives you on the order of 10²⁴ (about three septillion) amino acid molecules.
Part 1 1. Retrieve SOD1 sequence and introduce A4V MATKVCVVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
PepMLM-generated 12-aa peptides + known binder (lower = higher model confidence in the binder).
ID Peptide (12 aa) Source Perplexity (PepMLM) P1 WHSPVAAARLKE PepMLM 11.721713 P2 KRYGAAAARHKK PepMLM 11.211369 P3 WRYPVAGLALKE PepMLM 13.068802 P4 WHSPPAAVALGE PepMLM 12.159801 Ref FLYRWLPSRRGG known SOD1 binder N/A Observation (PepMLM confidence by perplexity):
Among generated candidates, P2 (KRYGAAAARHKK) has the lowest perplexity (highest PepMLM confidence), while P3 (WRYPVAGLALKE) has the highest perplexity (lowest confidence among the four).
Answers to Questions About This Week’s Lab Protocol 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion High-Fidelity PCR Master Mix typically contains several important components: Phusion DNA polymerase, dNTPs, reaction buffer, and MgCl2. The polymerase synthesizes new DNA strands and has proofreading activity, which lowers the error rate compared with standard Taq polymerase. The dNTPs provide the nucleotide building blocks needed to make the new DNA strands. The buffer maintains the proper chemical environment, including pH and salt concentration, for the enzyme to work efficiently. MgCl2 is an essential cofactor that allows the polymerase to function properly. In this lab, the master mix is provided as a 2X mix, so only template DNA, primers, and water are added separately. (neb.com)
Subsections of Homework
Week 1 HW: Principles and Practices
Project Propalsal:
A small, low-cost desktop platform that combines short DNA synthesis with cell-free expression. Users (students, community labs, small clinics) design short DNA sequences through a web interface, send them to a benchtop “DNA printer,” and immediately test them in a cell-free system. This pushes “personal fabrication” into biology and could support education and grassroots innovation, but raises serious questions about biosecurity, safety, and equity when DNA writing becomes cheap and widely accessible.
Option 1: Mandatory sequence screening and basic customer vetting for all DNA synthesis providers (including cartridge vendors), coordinated through national / international standards.
Option 2: Built-in technical safeguards in desktop devices (on-device sequence screening, hard limits on sequence length and volume, whitelist mode for education deployments).
Option 3: Community lab / school codes of conduct, safety & security training, and an incident-report network co-developed with public agencies and DIYbio / professional societies.
Does the option:
Option 1
Option 2
Option 3
Enhance Biosecurity
• By preventing incidents
1
2
2
• By helping respond
2
3
1
Foster Lab Safety
• By preventing incident
2
2
1
• By helping respond
3
3
1
Protect the environment
• By preventing incidents
2
2
1
• By helping respond
3
3
1
Other considerations
• Minimizing costs and burdens to stakeholders
3
2
1
• Feasibility?
2
3
1
• Not impede research
2
3
1
• Promote constructive applications
2
2
1
Based on this scoring, I would prioritize a combination of Option 1 and Option 3, with Option 2 as a complementary, medium-term measure.
Option 1 scores best on preventing high-consequence biosecurity incidents, especially if screening standards are coordinated internationally and made affordable for smaller providers. However, it is costly and risks concentrating DNA synthesis capacity in a few large actors. Option 3 scores best on lab safety, environmental protection, and promoting constructive applications in community labs and schools, but it is weaker for deterring sophisticated malicious actors. Option 2 could add an important technical layer of protection, yet it faces feasibility and “jailbreaking” challenges and could more easily impede legitimate research if designed too rigidly.
For a national science policy audience or major funders, I would recommend:
Supporting shared, affordable sequence-screening tools and minimum standards (Option 1).
Investing in training, codes of conduct, and incident-report networks for community labs and schools (Option 3).
Encouraging research and early deployment of built-in safeguards, while monitoring how they affect usability and innovation (Option 2).
Key uncertainties include how quickly desktop DNA platforms will diffuse, how easy it will be to circumvent safeguards, and how governance choices in one country will shift risks and opportunities globally.
Reflecting on this week’s class, one ethical concern that became more salient to me is how routine DNA writing already is in modern biology. It no longer feels like a rare, “sci-fi” capability but a basic infrastructure, which makes dual-use risks more mundane and distributed. Another concern is equity: if governance relies only on heavy regulation and expensive compliance, advanced tools may become concentrated in a few wealthy institutions, while informal or under-resourced spaces are pushed into a gray zone with less support and oversight.
In the local context of MIT and Harvard, I think appropriate governance actions include: brief, practical training on DNA synthesis ethics for people who can place synthesis orders; centrally provided sequence-screening tools so individual labs do not each have to solve the problem; and safe channels to ask questions about “borderline” projects and to report concerns. These measures align with Option 1 and Option 3, and feel tractable at the institutional level.
Homework Questions:
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
A:Polymerase is ~1 error per 10⁶ bases, which would mean thousands of errors across the 3.2×10⁹-bp human genome, so cells rely on proofreading plus mismatch repair to bring the effective error rate way down.
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
A:Because many amino acids have multiple synonymous codons, an average-length protein can be encoded by an astronomically large number of DNA sequences, but many fail in practice due to codon bias/rare tRNAs, harmful mRNA structures, and unintended regulatory or splicing signals that reduce or disrupt expression.
What’s the most commonly used method for oligo synthesis currently?
A:
Oligonucleotide synthesis
Why is it difficult to make oligos longer than 200nt via direct synthesis?
A:Its gonna have errors.
Why can’t you make a 2000bp gene via direct oligo synthesis?
A:Its gonna have a lots of errors.
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
A:The “10 essential amino acids” mnemonic often used for animals is PVT TIM HALL: Phenylalanine, Valine, Threonine, Tryptophan, Isoleucine, Methionine, Histidine, Arginine, Leucine, Lysine.
Since lysine is already an essential amino acid (animals generally can’t synthesize it and must get it from diet), “making an animal lysine-dependent” is basically making it normal, so as a containment strategy it’s weak unless you also control lysine access or engineer dependence on something non-natural (a synthetic nutrient) rather than a widely available dietary essential.
Week 2 HW: DNA Read, Write, & Edit
3.1 Choose your protein
I chose Green Fluorescent Protein (GFP) from the jellyfish Aequorea victoria.
Reasons:
-Classic reporter protein in molecular biology and imaging
-Small, monomeric, and widely used as a fusion tag
sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL
VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV
NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD
HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
3.2 Reverse translate (protein → DNA)
reverse translation of sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1 to a 714 base sequence of most likely codons.
atgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtggaactggatggc
gatgtgaacggccataaatttagcgtgagcggcgaaggcgaaggcgatgcgacctatggc
aaactgaccctgaaatttatttgcaccaccggcaaactgccggtgccgtggccgaccctg
gtgaccacctttagctatggcgtgcagtgctttagccgctatccggatcatatgaaacag
catgatttttttaaaagcgcgatgccggaaggctatgtgcaggaacgcaccatttttttt
aaagatgatggcaactataaaacccgcgcggaagtgaaatttgaaggcgataccctggtg
aaccgcattgaactgaaaggcattgattttaaagaagatggcaacattctgggccataaa
ctggaatataactataacagccataacgtgtatattatggcggataaacagaaaaacggc
attaaagtgaactttaaaattcgccataacattgaagatggcagcgtgcagctggcggat
cattatcagcagaacaccccgattggcgatggcccggtgctgctgccggataaccattat
ctgagcacccagagcgcgctgagcaaagatccgaacgaaaaacgcgatcatatggtgctg
ctggaatttgtgaccgcggcgggcattacccatggcatggatgaactgtataaa
3.3 Organism chosen and why:
I optimized the sequence for Escherichia coli (e.g. K-12 lab strain).
E. coli is cheap, grows fast, and is a standard workhorse for expressing GFP. There are many well-characterized plasmids and promoters for high-level GFP expression in E. coli.
There are two main ways to produce my GFP protein from this DNA: cell-dependent and cell-free expression.
Cell-dependent method (E. coli expression) I can clone my codon-optimized GFP sequence into an expression plasmid under a strong promoter (for example a T7 or lac promoter) with a ribosome binding site and terminator. The plasmid is transformed into E. coli. Inside the cells, bacterial RNA polymerase transcribes the GFP gene into mRNA, and ribosomes translate this mRNA into the GFP polypeptide, reading it codon by codon. The peptide folds into the GFP β-barrel and forms its chromophore, so the cells become fluorescent under blue/UV light. This is a classic, cell-dependent way to produce GFP.
Cell-free method (in vitro transcription–translation) Alternatively, I can add the same GFP DNA template to a cell-free transcription–translation system made from E. coli lysate. The lysate contains RNA polymerase, ribosomes, tRNAs, amino acids, NTPs, and energy regeneration components. In the tube, the DNA is transcribed into mRNA and then translated into GFP, again following the central dogma (DNA → RNA → protein), but without living cells. After incubation, the reaction mixture will glow green if GFP is correctly produced and folded.
Week 3 HW: LabAutomation
1. Reference paper: HYDRA – hydrogels by robotic automation
Citation
Torchia, E. et al. Fabrication of cell culture hydrogels by robotic liquid handling automation for high-throughput drug testing. Communications Engineering, 4, 222 (2025).
What they did
This paper introduces HYDRA (HYDrogels by Robotic liquid-handling Automation), a method to fabricate thin, planar hydrogel films directly inside standard 96- and 384-well plates using liquid-handling robots.
Normally, when you cast hydrogels into small wells, capillary forces at the sidewalls create a curved meniscus, which:
makes hydrogel thickness non-uniform across the well
disturbs cell seeding density and imaging focus
reduces the reliability of high-throughput drug screening
HYDRA solves this by:
Robotically dispensing a sub-contact volume of hydrogel precursor (fish gelatin + microbial transglutaminase) into each well, carefully avoiding contact with the sidewalls.
Immediately re-aspirating most of that volume with precisely controlled height and flow rate.
Using contact angle hysteresis so that a thin, meniscus-free layer (about 10–50 μm) remains at the bottom.
The authors show that these hydrogels support drug dose–response assays on engineered epithelial cells and allow long-term imaging on soft, biomimetic substrates. They also demonstrate that HYDRA can be implemented on an open-source Opentrons OT-2 robot, effectively turning a liquid-handling platform into a simple, programmable soft-materials fabrication tool.
They combine this with plate-scale quality control and show that the hydrogels support:
Drug dose–response assays with engineered epithelial cells
Long-term holographic and fluorescence microscopy on soft, biomimetic substrates
How the automation is implemented (and why it’s relevant)
They explicitly use an Opentrons OT-2 as an open-source, low-cost platform to implement HYDRA. The pipeline is built with Opentrons Protocol Designer and custom Python to control dispense/aspirate heights, volumes, and speeds. :contentReference[oaicite:4]{index=4}
The OT-2 mixes gelatin and transglutaminase stocks, casts the precursor into plates, and re-aspirates to leave a controlled film.
Conceptually, this turns a “liquid-handling robot” into a materials fabrication tool: it is designing not only concentrations but also geometry (flat films with controlled thickness) and mechanical properties (tunable stiffness ~1.5–6 kPa). :contentReference[oaicite:5]{index=5}
For my interests (digital fabrication, soft metamaterials, auxetics), this is very close to “2.5D soft material printing”:
Process parameters: volumes, pipette height, flow rate
Output behavior: thickness, flatness, stiffness and cell response
This is exactly the kind of workflow I want to adapt: using Opentrons not just as a biology helper, but as a programmable fabrication device for soft, structured materials.
Opentrons-printed auxetic hydrogel tiles for programmable mechanics
Core idea
Use the Opentrons OT-2 as a “dot-matrix printer” for soft materials: it will deposit small droplets of hydrogel precursor with different formulations onto a thin flexible substrate, forming a 2D auxetic (negative Poisson’s ratio) pattern.
By controlling which beams/tiles are soft vs stiff (or swell more vs less), the overall structure exhibits programmable shape change or auxetic behavior when stretched or stimulated.
This combines:
HYDRA’s idea of robotic hydrogel fabrication
My background in digital fabrication and mechanical metamaterials
A design space of structure × material formulation × process
Biological / material system (high-level)
I will use a crosslinkable hydrogel system compatible with HTGAA and the Opentrons:
Option A: fish gelatin + microbial transglutaminase (following HYDRA)
Option B: a photo-crosslinkable gelatin (e.g., GelMA) if lab infrastructure favors photogels
Each “unit” in the auxetic pattern is defined by two main parameters:
Base stiffness – controlled by gelatin concentration (e.g., 5%, 10%, 20%)
Crosslink density – controlled by enzyme concentration or light exposure time
These combinations create:
Soft segments that deform easily
Stiff segments that act as constraints
Arranged in an auxetic geometry (e.g., re-entrant honeycomb, rotating squares), the global behavior becomes a tunable mechanical metamaterial.
Cells are optional at this stage; the primary goal is to demonstrate programmable mechanical behavior. Cells could later be seeded to test how different stiffness regions affect attachment and morphology.
What will be automated
Automated formulation library
The OT-2 prepares a small library of hydrogel precursor formulations in a 96-well plate:
“Soft”: 5% gelatin + 0.5% crosslinker
“Medium”: 10% gelatin + 1% crosslinker
“Stiff”: 20% gelatin + 2% crosslinker
Using standard pipetting commands, the robot mixes stock solutions to create these recipes in defined wells.
This step establishes a recipe space that can be expanded later (different polymer, different additives, etc.).
Geometric pattern → robot coordinates
I design an auxetic pattern in Rhino/Grasshopper or Python (e.g., 6 × 6 re-entrant lattice).
Each structural element (“beam”) is discretized into a small number of print points.
Each point is annotated with a recipe ID (“soft”, “medium”, “stiff”).
I then map this point set into the OT-2 deck coordinate system by:
3D-printing a simple holder that clamps a thin transparent membrane (PDMS or plastic) on the deck
Calibrating its four corners relative to the robot origin
Converting design coordinates (x, y) into deck positions via a linear transform + offsets
The result: a table of points like (x, y, recipe_id) that the robot can iterate through.
Printing and curing
The robot:
Aspirates small volumes (e.g., 2–5 µL) of each formulation from the 96-well “recipe plate”
Moves to specified (x, y) positions over the substrate
Deposits droplets in the order dictated by the auxetic pattern
After printing:
The membrane is transferred to an incubator (37 °C) or UV station for crosslinking
Once cured, the membrane can be mounted in a simple tensile frame and mechanically tested (even by hand with phone video) to observe auxetic deformation.
This is a direct analog of HYDRA (controlling meniscus and layer thickness), but applied to patterned, multi-formulation structures instead of uniform coatings.
Experimental workflow (high-level)
Preparation (manual)
Prepare gelatin and crosslinker stock solutions.
Design and 3D-print a membrane holder compatible with a specific Opentrons deck slot.
Attach a thin transparent membrane on top and calibrate four reference points on the OT-2.
Deck layout
Slot 1: Eppendorf rack with stock solutions (gelatin, crosslinker, buffer).
Slot 2: 96-well “recipe plate” where the robot generates soft/medium/stiff formulations.
Slot 3: custom membrane holder (“printing bed”).
Slot 4–5: tip racks (P20).
Slot 6: waste reservoir.
Opentrons protocol (concept)
Step 1 – Formulation generation
Robot mixes gelatin + crosslinker into the recipe plate:
Use P300/P20 to combine stocks in wells A1–A3 to produce “soft”, “medium”, “stiff”.
Optional: generate more variants across the plate to explore stiffness/swell space.
Step 2 – Auxetic pattern printing
Load auxetic point list (x, y, recipe) from a CSV or embedded list.
For each point:
Pick up a tip
Aspirate 2 µL from the well corresponding to recipe
Move to the transformed (x, y, z) above the membrane
Dispense the droplet
Drop the tip
Repeat until all beams in the pattern are printed.
Step 3 – Curing and testing
Move the printed membrane to an incubator / UV lamp for crosslinking.
After curing, mount on a simple frame, apply tension, and record deformation.
Optional: overlay grid or markers to track local strain.
My Final Project – Possible Directions
1. Opentrons-printed auxetic hydrogel tiles
One-liner Use an Opentrons robot as a “2.5D printer” to deposit hydrogel droplets in an auxetic pattern, and study how composition plus geometry shape the mechanical behavior.
What I would automate
Opentrons mixes a small library of hydrogel formulations in a 96-well plate (for example, soft / medium / stiff).
It then “prints” droplets onto a thin flexible film in a precomputed auxetic pattern (re-entrant squares, rotating squares, etc.).
Each beam or node of the pattern can use a different formulation, so the auxetic response is not only geometric but also rheology-informed.
Why it is interesting
Treats the OT-2 as a soft-material fabrication machine instead of just a pipetting tool.
Links soft-matter rheology, mechanical metamaterials, and lab automation: geometry, composition, and process are all programmable.
Can start as a purely mechanical experiment and later add biological layers if there is time.
2. High-throughput hydrogel rheology map in a plate
One-liner Use lab automation to build a small materials map of hydrogel rheology in a 96-well plate, linking formulation to mechanical properties as a design tool for later printing.
What I would automate
Opentrons prepares a combinatorial grid of hydrogel recipes across the plate (for example, rows = polymer concentration, columns = crosslinker concentration).
For each well, I run a simple, automatable mechanical proxy (for example, indentation depth under a fixed weight, or image-based deformation).
Collect all measurements into a composition–property map that I can use to choose formulations for the auxetic printing in project 1.
Why it is interesting
Turns subjective “this gel feels soft or stiff” into structured data driven by automation.
Shows a clear automation loop: the robot explores a soft-material design space, not just prepares biological assays.
Directly supports and informs the first project idea.
One-liner Use automation to assemble and place small cell-free reactions that behave like simple logic gates, so that a 2D fluorescent pattern encodes a truth table in space.
What I would automate
Opentrons prepares cell-free reactions with different DNA constructs that approximate AND / OR / NOT behavior (or simpler ON / OFF variants).
Each well corresponds to an input combination (00, 01, 10, 11), arranged in the plate as a visual truth table, or optionally printed as droplets onto a flat substrate.
After incubation, the pattern of fluorescence across wells or positions becomes a spatial representation of the logic.
Why it is interesting
Uses automation to build and arrange many small reactions that would be tedious by hand.
Connects synthetic biology, simple computation, and digital fabrication: logic is expressed both in biochemical reactions and in spatial layout.
Offers a complementary direction where the “printed pattern” carries information and function, not just mechanical behavior.
Week 4 HW: ProteinDesign
Part A. Conceptual Questions
Question: How many molecules of amino acids do you take with a piece of 500 grams of meat? (On average an amino acid is ~100 Daltons.)
So eating 500 g of meat gives you on the order of 10²⁴ (about three septillion) amino acid molecules.
Question: Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Answer:
These free amino acids are absorbed and reused as generic building blocks to make human proteins, under the instructions of human DNA and our own gene expression programs.
Part B. Protein Analysis and Visualization
In this part, pick any protein (with a known 3D structure) and answer the following.
B1. Protein choice
Briefly describe the protein you selected and why you selected it.
Answer:
I chose the spider dragline silk protein, specifically a major ampullate spidroin (MaSp1). This is one of the main structural proteins that spiders use to spin their dragline silk, the “lifeline” they hang from and use as a safety rope. Dragline silk is famous for combining very high tensile strength (comparable to steel) with remarkable toughness and elasticity, and these mechanical properties come directly from its unusual amino acid sequence: long, repetitive blocks that form a mix of crystalline β-sheet nanodomains and more disordered, elastic regions.
I selected MaSp1 because it sits at the intersection of protein sequence, hierarchical structure, and macroscopic material behavior. It’s a natural example of how repeating sequence motifs can encode a programmable mechanical material, which fits nicely with the theme of protein design and with thinking about proteins as engineerable structural materials, not just enzymes or receptors.
This sequence is heavily enriched in glycine (G) and alanine (A), which together make up the majority of residues. Glycine is extremely small and flexible, while alanine is small and slightly hydrophobic. In spider silk, repeated (Gly–Ala)n motifs are known to stack into tightly packed β-sheet nanocrystals that provide the fiber’s high stiffness and tensile strength.
B4. Sequence homologs
How many protein sequence homologs are there for your protein?
Answer:
UniProt BLAST returns hundreds of homologous sequences, mostly other spider silk proteins from related species, showing that MaSp1 belongs to a large and diversified silk protein family.
B5. Protein family
Does your protein belong to any protein family?
Answer:
Belongs to the silk fibroin family
B6. Structure identification (RCSB PDB)
Identify the structure page of your protein in RCSB.
When was the structure solved? Is it a good quality structure?
Answer:
Year solved: 2016
Resolution: 2.02A
Comment on quality: Good
B8. Other molecules in the structure
Are there any other molecules in the solved structure apart from the protein?
Answer:
Ligands: None – there are no bound small-molecule ligands or cofactors reported.
Ions: None – no metal ions are annotated in this structure.
Water / other components: Only crystallographic water molecules (HOH) are present as part of the solvent.
Comments: The 5IZ2 structure is essentially just the N-terminal domain of MaSp1A plus bulk water, with no extra cofactors or metal ions. This makes it a relatively “clean” system for analyzing the intrinsic protein–protein interface and packing.
B9. Structural classification
Does your protein belong to any structure classification family? (e.g., SCOP, CATH)
This shows that the N-terminal domain of MaSp1 belongs to a specific spidroin N-terminal structural family in SCOP.
B10. 3D visualization
B10.1 Cartoon / ribbon / ball-and-stick views
In PyMOL, I loaded the N-terminal domain of MaSp1 (PDB 5IZ2) and visualized it with different representations:
Cartoon view (secondary structure):
Ribbon view:
Ball-and-stick view (side chains):
B10.2 Hydrophobic vs hydrophilic residue distribution
When I color the protein by residue type / hydrophobicity, I observe a typical hydrophobic-core, hydrophilic-surface pattern:
Many hydrophobic residues (e.g., Leu, Val, Ile, Phe) are buried in the interior of each helical bundle and at the dimer interface.
Polar and charged residues (e.g., Gln, Glu, Lys, Arg, Ser) are mostly exposed on the outside, where they can interact with solvent and potentially mediate pH-dependent behavior.
This distribution is consistent with the idea that the MaSp1 N-terminal domain is a soluble, dimerizing helical bundle, with its hydrophobic core shielded from water and its polar residues decorating the surface.
B10.3 Surface and potential “holes” (pockets)
Finally, I visualized the molecular surface of the dimer:
The overall shape is compact and smooth, but there are some shallow grooves and one noticeable depression on the surface. This depression looks like a small, shallow pocket rather than a deep enzyme-like active site or tunnel. It is more likely to be a protein–protein interaction groove (or simply a consequence of how the helices pack) than a dedicated small-molecule binding pocket.
Overall, the surface of 5IZ2 matches its role as a structural N-terminal dimerization domain, not a classical ligand-binding enzyme.
Part C. Using ML-Based Protein Design Tools
C1. Protein Language Modeling
C1.1 Deep mutational scan with ESM2
Use ESM2 to generate an unsupervised deep mutational scan of your protein. Can you explain any particular pattern? (Choose a residue and a mutation that stands out.)
Answer:
Using ESM2, I generated a deep mutational scan for the full MaSp1 sequence. A clear pattern is that the model strongly prefers small residues like glycine (G) and alanine (A) in the long repetitive core of the silk protein, and strongly dislikes large or strongly charged residues at those positions.
For example, I looked at one position in the middle of the repetitive region where the wild-type residue is glycine. In the heatmap, mutating this site to alanine (G → A) has a relatively neutral or mildly favorable score (greenish color), but mutating it to tryptophan (G → W) gives a very unfavorable score (dark purple). This makes sense: natural spider dragline silk is built from G/A-rich motifs such as (Gly–Ala)n that pack into tight β-sheet nanocrystals. Replacing a small glycine with a bulky aromatic residue like tryptophan would likely disrupt this packing, so the language model correctly treats this mutation as strongly deleterious.
C1.2 Latent Space Analysis
Using the notebook’s SCOPe ASTRAL dataset, I first embedded all protein domains with ESM2 and then reduced the embedding dimensionality to 3D using t-SNE (n_components = 3, perplexity = 30). Each point in the plot below corresponds to one protein domain in the SCOPe dataset; nearby points represent proteins that the language model considers similar in terms of sequence statistics and learned evolutionary patterns.
In this 3D map, the SCOPe domains do not form one perfectly uniform “cloud”: there is a dense central region and a somewhat more diffuse shell of points around it. Hovering over different neighborhoods in the plot shows that proteins with related properties tend to live near each other (for example, domains with similar lengths and compositional biases cluster more closely than very different domains). This suggests that the latent space is grouping broadly similar proteins into local neighborhoods.
I then added my own protein sequence — a fragment of the spider dragline silk protein MaSp1 — as an extra point in the embedding. In the t-SNE plot, this MaSp1 point is highlighted in a different color (red in the figure). It does not appear as an isolated outlier: instead, it lies on the outer part of the main cloud, close to other domains that, based on their descriptions, also have biased compositions and/or repetitive, low-complexity features.
This placement is consistent with what we know about MaSp1. The sequence is very long and strongly enriched in glycine and alanine repeats, designed to form structural silk fibers rather than a compact catalytic enzyme. In the latent space, the model therefore locates MaSp1 in a neighborhood of other non-enzymatic, compositionally biased domains, rather than in the core region where many typical enzyme-like domains cluster. In other words, the latent space neighborhoods do approximate “similar” proteins, and MaSp1’s position in the map matches its role as a G/A-rich structural material protein.
C2. Protein Folding (ESMFold)
C2.1 Folding your protein
Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
Answer:
C3. Protein Generation
C3.1 Inverse folding with ProteinMPNN
Use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
Observation (PepMLM confidence by perplexity): Among generated candidates, P2 (KRYGAAAARHKK) has the lowest perplexity (highest PepMLM confidence), while P3 (WRYPVAGLALKE) has the highest perplexity (lowest confidence among the four).
Part 2 – Evaluate binders with AlphaFold3
Peptide
Sequence
Binding description (1–2 sentences)
Near N-terminus / A4V?
P1
WHSPVAAARLKE
surface-bound
no
P2
KRYGAAAARHKK
surface-bound
no
P3
WRYPVAGLALKE
surface-bound
no
P4
WHSPPAAVALGE
surface-bound
yes
Ref
FLYRWLPSRRGG
Surface-associated
no
Part 3 – Evaluate Properties of Generated Peptides in the PeptiVerse
Peptide
Solubility
Solubility Score
Hemolysis
Hemolysis Score
Length (aa)
Molecular Weight (Da)
Net Charge (pH 7)
Isoelectric Point (pH)
Hydrophobicity (GRAVY)
WHSPVAAARLKE
Soluble
1.000
Non-hemolytic
0.013
12
1364.6
0.85
8.76
-0.42
KRYGAAAARHKK
Soluble
1.000
Non-hemolytic
0.009
12
1356.6
4.84
11.17
-1.53
WRYPVAGLALKE
Soluble
1.000
Non-hemolytic
0.025
12
1402.6
0.77
8.59
-0.06
WHSPPAAVALGE
Soluble
1.000
Non-hemolytic
0.041
12
1234.4
-1.14
5.47
0.12
From the AlphaFold3 models, all of the peptides showed relatively low and fairly similar ipTM values, and structurally they all appeared to bind in a shallow, surface associated manner rather than forming a clearly buried or highly specific interface on mutant SOD1. This means that higher ipTM did not translate into an obviously much stronger or more convincing binding pose in the structural models. When compared with the PeptiVerse predictions, the four generated peptides were all predicted to be fully soluble and non hemolytic, so none of the better candidates raised an immediate concern for poor solubility or red blood cell toxicity. Among them, KRYGAAAARHKK provides the best overall balance. It had the lowest PepMLM perplexity, indicating the highest generation confidence, and it also showed excellent therapeutic property predictions, including full solubility, very low hemolysis risk, and a strongly positive net charge that could support interaction with exposed regions of SOD1.
I would advance KRYGAAAARHKK. Although its AlphaFold3 model did not show a dramatically stronger binding geometry than the others, it combines the best overall computational profile across generation confidence, structural plausibility, solubility, and safety related properties. In other words, it is the most balanced candidate for further optimization and experimental validation.
Part 4 – Generate Optimized Peptides with moPPIt
Binder
Hemolysis
Solubility
Motif
CTWVKKTKKQVT
0.979838
0.833333
0.809806
GYKQKTCNTVKW
0.96261
0.833333
0.782062
STAEFTRQTKKM
0.954297
0.75
0.839537
RGKTTTQNGKVI
0.978479
0.833333
0.820856
GFGTQKKTKCG
0.964351
0.916667
0.849918
RTDQGGVKITLE
0.969846
0.833333
0.905129
ETKKRQKFKTDF
0.974053
0.833333
0.887031
KGETTDKIQKTM
0.975992
0.833333
0.841838
RETVGKKTQTKC
0.981714
0.916667
0.81274
The moPPIt peptides differ noticeably from the PepMLM peptides in both composition and design emphasis. The PepMLM peptides I generated for SOD1 were short 12 amino acid candidates with relatively simple compositions, and several were enriched in alanine, valine, lysine, and histidine, which made them look like general target conditioned binders. In contrast, the moPPIt peptides appear more motif driven and more strongly enriched in lysine and threonine, with some cysteine, glycine, phenylalanine, and aspartate included in specific patterns. This makes them look more constrained by the optimization objectives, especially motif retention and therapeutic property balancing, rather than just sequence likelihood conditioned on the target. In other words, PepMLM gave plausible binder candidates, while moPPIt seems to generate peptides that are more explicitly shaped by multiple design goals.
Before advancing any of these peptides toward clinical studies, I would evaluate them in several stages. First, I would confirm binding experimentally using assays such as surface plasmon resonance, biolayer interferometry, or microscale thermophoresis to measure affinity and specificity for mutant SOD1 compared with wild type SOD1 and unrelated proteins. Second, I would test whether the peptides actually improve a disease relevant phenotype, for example by reducing SOD1 aggregation, toxicity, or misfolding in cell based ALS models. Third, I would assess safety and developability, including hemolysis, cytotoxicity, serum stability, protease resistance, immunogenicity risk, and off target effects. Fourth, I would evaluate pharmacology, including uptake, biodistribution, half life, and whether the peptide can reach the relevant tissues, especially the central nervous system. Only peptides that show convincing binding, functional benefit, acceptable safety, and realistic delivery potential should move forward into animal studies and later clinical development.
L-Protein Engineering | Option 3: Random Mutagenesis
Goal
In this option, I generated random multi-site L-protein mutants using mutation information from prior mutational analysis experiments, then selected candidates for co-folding with DnaJ using AF2 Multimer. The goal is to explore combinations of tolerated or favorable mutations rather than relying only on single-site manual design. The L protein contains a soluble N-terminal domain followed by a transmembrane region in the last 35 residues, so both structural context and prior mutation evidence were considered when building the mutation pool. :contentReference[oaicite:1]{index=1}
Strategy
Instead of introducing fully random amino acid substitutions, I used a constrained random mutagenesis strategy. I first built a mutation pool from mutations that were supported by prior mutational analysis experiments or appeared tolerated or favorable in the provided mutation data. Then I randomly sampled combinations containing at least 2 mutated positions. This makes the search more biologically grounded than unconstrained random mutagenesis.
A good mutant should not simply contain many mutations. I define an effective mutant as one that satisfies several criteria:
It preserves or improves the foldability and structural plausibility of the L protein.
It does not obviously disrupt the soluble N-terminal region that is implicated in DnaJ interaction.
It avoids strongly unfavorable substitutions in conserved or functionally critical positions.
It is compatible with productive interaction behavior when co-folded with DnaJ.
Ideally, it improves properties related to lysis efficiency, DnaJ independence, or expression, which are the broader goals of this assignment. :contentReference[oaicite:2]{index=2}
Python function for constrained random mutagenesis
importrandomfromtypingimportDict,List,TupleL_PROTEIN_WT="METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"# Example mutation pool format:# key = 1-indexed residue position# value = (wildtype_residue, [allowed_mutant_residues])## You should replace these example entries with positions supported by your# mutational analysis sheet or notebook results.mutation_pool:Dict[int,Tuple[str,List[str]]]={8:("Q",["K","R"]),14:("A",["S","T"]),24:("E",["D","Q"]),31:("R",["K"]),35:("S",["T","A"]),41:("L",["I","V"]),44:("A",["V"]),58:("L",["I","V"]),63:("A",["V","L"]),69:("V",["I","L"]),}defapply_mutations(sequence:str,mutations:List[Tuple[int,str]])->str:seq_list=list(sequence)forpos,new_resinmutations:seq_list[pos-1]=new_resreturn"".join(seq_list)defformat_mutation_name(sequence:str,mutations:List[Tuple[int,str]])->str:names=[]forpos,new_resinmutations:wt=sequence[pos-1]names.append(f"{wt}{pos}{new_res}")return",".join(names)defgenerate_random_mutant(sequence:str,pool:Dict[int,Tuple[str,List[str]]],min_sites:int=2,max_sites:int=4):n_sites=random.randint(min_sites,max_sites)chosen_positions=random.sample(list(pool.keys()),n_sites)mutations=[]forposinchosen_positions:wt_expected,allowed=pool[pos]wt_actual=sequence[pos-1]ifwt_actual!=wt_expected:raiseValueError(f"WT mismatch at position {pos}: expected {wt_expected}, found {wt_actual}")new_res=random.choice(allowed)mutations.append((pos,new_res))mutations=sorted(mutations,key=lambdax:x[0])mutant_seq=apply_mutations(sequence,mutations)mutant_name=format_mutation_name(sequence,mutations)return{"mutations":mutant_name,"sequence":mutant_seq,"num_mutations":len(mutations)}defgenerate_unique_mutants(sequence:str,pool:Dict[int,Tuple[str,List[str]]],n_mutants:int=10,min_sites:int=2,max_sites:int=4):seen=set()results=[]whilelen(results)<n_mutants:mutant=generate_random_mutant(sequence,pool,min_sites,max_sites)key=mutant["mutations"]ifkeynotinseen:seen.add(key)results.append(mutant)returnresults# Example usagerandom.seed(42)mutants=generate_unique_mutants(sequence=L_PROTEIN_WT,pool=mutation_pool,n_mutants=10,min_sites=2,max_sites=4)fori,minenumerate(mutants,1):print(f"Mutant {i}")print("Mutations:",m["mutations"])print("Sequence :",m["sequence"])print("Count :",m["num_mutations"])print()
How I would use this function
I would first replace the example mutation_pool with residue positions and substitutions supported by the provided L-protein mutational analysis data. Then I would generate a small library of random double, triple, or quadruple mutants. From that library, I would select a few candidates that look structurally reasonable and co-fold them with DnaJ using AF2 Multimer.
Co-folding setup with DnaJ
For each selected mutant, I would submit the mutant L-protein sequence together with the DnaJ sequence as separate chains in AF2 Multimer. The purpose is to compare whether different mutation combinations change the predicted interaction pattern between the L-protein soluble region and DnaJ. The assignment notes that this kind of prediction may be difficult, especially for membrane related systems, so these structures should be interpreted cautiously.
How I define a “good” mutant
I define a good or effective mutant as one that balances multiple factors rather than optimizing only one metric. First, it should remain structurally plausible as an L-protein variant and avoid obviously destabilizing substitutions. Second, it should preserve or improve a meaningful interaction pattern with DnaJ in the soluble region if the biological mechanism still depends on that interaction. Third, if the long-term goal is DnaJ independence, then a good mutant could also be one that remains well folded without requiring the same interaction geometry. Finally, the mutant should be supported by prior mutational evidence and should not rely on highly disruptive changes at conserved positions. In practice, the best mutant is the one that shows a reasonable structural model, uses biologically tolerated substitutions, and best aligns with the assignment goals of improved folding, lysis efficiency, or reduced dependence on bacterial chaperones.
Week 6 HW: Genetic Circuits Part I: Assembly Technologies
Answers to Questions About This Week’s Lab Protocol
1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion High-Fidelity PCR Master Mix typically contains several important components: Phusion DNA polymerase, dNTPs, reaction buffer, and MgCl2. The polymerase synthesizes new DNA strands and has proofreading activity, which lowers the error rate compared with standard Taq polymerase. The dNTPs provide the nucleotide building blocks needed to make the new DNA strands. The buffer maintains the proper chemical environment, including pH and salt concentration, for the enzyme to work efficiently. MgCl2 is an essential cofactor that allows the polymerase to function properly. In this lab, the master mix is provided as a 2X mix, so only template DNA, primers, and water are added separately. (neb.com)
2. What are some factors that determine primer annealing temperature during PCR?
Primer annealing temperature is mainly determined by the melting temperature (Tm) of the primer binding region. According to the course page, a good primer binding region is usually about 18–22 base pairs, with a Tm around 52–58 °C, and the Tm values of the two primers should ideally be within about 5 °C of each other. GC content affects annealing temperature because GC base pairs are more stable than AT base pairs, so primers with higher GC content tend to have higher Tm values. A GC clamp at the 3′ end can improve binding stability, while hairpins or primer dimers can reduce efficiency. In this lab, the backbone PCR uses an annealing temperature of 57 °C, while the insert PCR uses 53 °C, showing that different primers may require different annealing conditions. (docs.google.com)
3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
PCR creates a linear DNA fragment by amplifying a selected region from a DNA template using primers, polymerase, dNTPs, and a thermocycler. In this week’s lab, PCR is used both to generate the backbone fragment and the color fragment, and it also introduces mutations through the primer sequence. Restriction enzyme digestion, by contrast, cuts DNA at specific recognition sites already present in the sequence. This means restriction digestion is limited by whether suitable restriction sites exist in the right positions, while PCR allows much more flexibility in choosing the exact boundaries of the fragment. (docs.google.com)
PCR is often preferable when you want to customize the fragment, such as adding Gibson overlaps, introducing mutations, or amplifying a region that does not have convenient restriction sites. Restriction enzyme digestion is often preferable when a plasmid already has well-placed restriction sites and you want a straightforward way to cut out a fragment without designing long primers. PCR is more versatile but can introduce amplification errors or require more optimization. Restriction digests are simpler in some cases, but they are less flexible because they depend on the DNA sequence already containing the right enzyme recognition sites. In this lab, PCR is the better choice because the experiment depends on introducing specific mutations into the color sequence and creating Gibson-compatible overlaps at the same time. (docs.google.com)
4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
To make DNA fragments appropriate for Gibson cloning, the adjacent fragments must contain the correct overlapping homologous sequences. The course page states that Gibson Assembly usually requires 20–40 bp overlaps between neighboring fragments, and these overlaps are designed directly into the primers. You also need to verify that the fragments are in the correct orientation and that their ends match exactly with the intended assembly junctions. In practice, this means checking the fragment design in a sequence editor before running the experiment, and then confirming fragment size by gel electrophoresis after PCR. (docs.google.com)
Another important step in this lab is DpnI digestion. DpnI digests the original methylated plasmid template but leaves the new unmethylated PCR products intact. This reduces background from the original plasmid and helps ensure that the fragments going into Gibson Assembly are the newly generated ones rather than leftover parental template. After that, the DNA is purified and quantified before assembly, which helps confirm that the samples are clean and present at usable concentrations. (docs.google.com)
5. How does the plasmid DNA enter the E. coli cells during transformation?
In this lab, plasmid DNA enters E. coli cells during transformation after the cells are made permeable by heat shock. The course page explains that heat shock or electroporation causes the bacterial membrane to temporarily open up, allowing plasmid DNA to enter the cell by diffusion. After the DNA enters, the cells are placed in SOC medium to recover and start expressing the antibiotic resistance gene carried by the plasmid. The cells are then plated on selective media containing chloramphenicol, so only cells that successfully took up the plasmid can grow. (docs.google.com)
6. Describe another assembly method in detail (such as Golden Gate Assembly).
Another widely used DNA assembly method is Golden Gate Assembly. Golden Gate uses a Type IIS restriction enzyme such as BsaI or BsmBI together with T4 DNA ligase in a one-pot reaction. Unlike standard restriction enzymes, Type IIS enzymes cut outside of their recognition sequence, which allows the user to design custom sticky ends. These sticky ends determine the order in which fragments assemble, so multiple fragments can be assembled directionally in one reaction. During repeated digestion-ligation cycles, incorrectly assembled products are recut, while correctly assembled products are ligated and preserved because the recognition sites are removed in the final construct. This makes Golden Gate highly efficient for assembling multiple modular parts in a predefined order, often without leaving scars between fragments. (neb.com)
7. Explain the other method in 5–7 sentences plus diagrams.
Golden Gate Assembly is a one-pot DNA assembly method that uses a Type IIS restriction enzyme and DNA ligase. The Type IIS enzyme cuts outside its recognition site, creating user-defined sticky ends instead of fixed ones. Those sticky ends determine which DNA fragments can join to each other, allowing multiple fragments to assemble in a chosen order. During thermal cycling, the enzyme cuts unassembled or misassembled molecules, while ligase seals the correct junctions. Because the recognition sites are typically removed during assembly, the final product is often scarless and is no longer recut in the same reaction. Compared with Gibson Assembly, Golden Gate uses short sticky ends rather than long homologous overlaps. This makes it especially useful for modular cloning systems with many interchangeable parts. (neb.com)
Diagram
Fragment A Fragment B Vector
[ BsaI ][ A overhang ] [ matching ][ BsaI ] [ BsaI ][ matching ][ BsaI ]
Step 1: Type IIS digestion
BsaI cuts outside its recognition sequence
↓ ↓ ↓
sticky end exposed matching sticky end exposed vector ends exposed
Step 2: Ligation
[ A overhang ] + [ matching overhang ] → joined fragment
Step 3: Final product
Correctly assembled product no longer contains the original Type IIS sites at the junction
→ scarless, ordered assembly