Part 1: Gel Electrophoresis Due to no access to equipment and space for gel electrophoresis I simulated the same to understand the process on https://www.labxchange.org/library/items/lb:LabXchange:9548bee3:lx_simulation:1?fullscreen=true
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion High-Fidelity PCR Master Mix contains most of the key ingredients needed for PCR, except the template DNA and primers. It is designed to make DNA amplification more accurate and easier to set up.
Some of the main components are:
Part1: Intracellular Artificial Neural Networks What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Traditional genetic circuits treat inputs as binary. This works for simple logic but breaks down when you need nuanced, graded decisions based on multiple continuous signals. Biology itself is almost never binary; cells exist on spectrums of gene expression and signalling intensity. IANNs overcome this by operating in the analog domain. An IANN computes a weighted sum of all inputs and applies a nonlinear activation function, exactly like an artificial neuron. The same molecular parts can be reused to implement completely different decision boundaries just by changing the weights, without engineering new biological parts from scratch. IANNs can also be stacked into multiple layers, enabling hierarchical computation that is completely impossible with single-layer Boolean circuits.
General Questions 1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Cell-free protein synthesis gives you a level of control over the reaction environment that you simply cannot get when working inside a living cell. Because there’s no cell membrane, you can directly add or remove components, adjust concentrations in real time, and introduce molecules that would be toxic to a living cell without worrying about killing your chassis. You also get direct access to the product without needing to lyse cells or purify through layers of cellular debris.
Waters Part I Molecular Weight Question 1: Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? Using the ExPASy Compute pI/Mw tool with the provided eGFP sequence
Theoretical MW = 28,006.60 Da
Subsections of Homework
Week 1 HW: Principles and Practices
1.Biological engineering tool/application
I am trying to develop a dyeing method for fabrics and surfaces that uses Physarum polycephalum, the slime mould, as an activator. The aim is to let the slime mould create one-of-one designs by growing on the surface, so that the unpredictability of its growth controls the outcome. Slime moulds are very good at creating pathways while expanding in search of optimum survival conditions. As they travel, they tend to leave behind residual pigment, usually yellow in colour. After drying it looks something like this.
In this bioengineered application, Physarum polycephalum expresses a pigment-forming enzyme (a tyrosinase/laccase-type oxidase) that catalyzes the oxidation of benign phenolic or catechol precursors into reactive quinones, which polymerize into an insoluble melanin-like pigment.
The target surface/fabric is first coated with a reservoir layer (mild binder + humectant) that is stable and non-coloured when dry. As the plasmodium (the active foraging stage of the slime mould) moves across the surface, it leaves behind a hydrated, anionic extracellular slime film (rich in acidic polysaccharides) that locally rehydrates the layer and provides a high-water, ionically active environment for the reaction to take place. Enzyme delivered to the surface by the organism converts the reservoir layer into pigment only within the trail's footprint, and the newly formed polymer precipitates in place. The slime's polyanionic matrix and the binder layer together act as an immobilizing scaffold, physically and electrostatically retaining the pigment on the fibres, so the organism keeps moving while the dyed path remains as a persistent spatial record of its presence.
2.Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.
2.Safety + Non-malfeasance
Exposure:
Ensuring rigorous quality tests so that the engineered organism, pigment polymer, and enzyme do not create risks such as allergens, irritants, or sensitizers, and that no unsafe binders or precursors are used which result in volatile or unpredictable by-products.
Developing a narrow functional envelope for the engineered organism to curb new emergent pathways that may produce undocumented results.
Create a timeline documenting the processes that have been enacted and by which actors. Ensure “program changes” cannot be done by end-users (e.g., no easy swapping of genetic payloads or addition of external DNA to redirect production).
Containment and handling:
Developing systems that prevent accidental spread or mishandling of the GMO, from R&D through distribution to end-of-life. (Develop clear handling protocols, containment during demonstrations and training, maintaining workspaces, etc.)
Ensuring design features that reduce aerosolization/smearing (sealed edges, protective breathable membranes, simple decontamination steps for handlers).
Making failure modes public will also ensure the same errors are not repeated.
Environmental safety:
Ensuring that all the agents used in the process, especially the GMO, are assessed for whether they can sporulate in local environments, and strengthening safeguards accordingly.
Assessing toxicity levels for precursors and binders to avoid accumulative compounds post end-of-life. Ensuring biological activity is terminated before disposal and the waste is integrated with local waste stream systems.
A radial graph to show the level of involvement of different actors in enforcing policies
5. Ideal combination
My choice of policies is to combine dual safeguarding and screening of the developed application with standardized end-of-life management.
Choosing Option 1 alone would reduce the scope of innovation, but Option 2 ensures thorough assessment of the modified product, which enables it to be replicated and scaled widely. It also mitigates concerns like pathogenic propagation risks, mutations in local environments, and any unintended consequences, since a standardized model of development will be certified and followed.
Standardization of post-use processes also ensures responsible disposal of the product, again applied at the same scale.
Answers to questions from Professor Jacobson
DNA polymerase has an inherent error rate of about 1 in 10⁵ to 10⁶ bases. The human genome is ≈ 3 × 10⁹ base pairs. If replication were perfectly accurate, no errors would occur; at an error rate of 10⁻⁵ per base, one pass over the genome would produce about 30,000 errors (about 3,000 at 10⁻⁶). Post-replication mismatch repair reduces the final error rate in human cells to fewer than 10 mutations per genome per replication. To achieve this, the replicative polymerases (δ and ε) proofread each nucleotide as they go, removing mispaired bases immediately and increasing accuracy roughly 100-fold. After the replication fork passes, mismatch repair proteins scan the newly synthesized DNA for mismatches that slipped past the proofreading step, and throughout the cell cycle other mechanisms such as base excision repair and nucleotide excision repair fix spontaneous damage that could otherwise cause a failure.
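The arithmetic behind these error counts can be checked with a short sketch. The function name is illustrative, and the genome size and error rates are the approximate figures quoted in the answer above.

```python
GENOME_BP = 3e9  # approximate human genome size in base pairs (from the text)


def expected_errors(error_rate_per_base: float, genome_bp: float = GENOME_BP) -> float:
    """Expected number of misincorporated bases in one pass over the genome."""
    return error_rate_per_base * genome_bp


# Raw polymerase error rates quoted above: 1 in 10^5 to 1 in 10^6 bases.
for rate in (1e-5, 1e-6):
    print(f"rate {rate:g} per base -> about {expected_errors(rate):,.0f} errors per replication")
```

The gap between these raw counts and the fewer than 10 mutations observed per replication is what proofreading and mismatch repair have to close.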
An average human protein (~450-500 amino acids) can be encoded by many different DNA sequences, potentially exceeding 10¹⁰⁰ possibilities, because the genetic code is degenerate (61 sense codons for 20 amino acids). Many of these sequences can still fail to produce functional protein because of improper protein folding, premature stop codons, incorrect splicing, etc.
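As a rough illustration of how quickly the number of synonymous DNA sequences grows, the sketch below multiplies the codon degeneracy of each residue, working in log10 to avoid huge integers. The degeneracy values are those of the standard genetic code; the function name and the example peptide are illustrative only.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def log10_synonymous_sequences(protein: str) -> float:
    """log10 of the number of distinct DNA sequences encoding the given protein."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein)


# Illustrative input: the first 20 residues of GFP.
example = "MSKGEELFTGVVPILVELDG"
print(f"{len(example)} residues -> about 10^{log10_synonymous_sequences(example):.1f} encodings")
```

Because each residue multiplies the count by its own degeneracy, the exponent grows roughly linearly with protein length.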
Answers to questions from Dr.LeProust
The most used method currently is solid-phase phosphoramidite chemistry.
It is difficult due to exponential accumulation of minor chemical errors and significant drops in overall yield.
It is again not possible because of the limitations of phosphoramidite chemistry. While long constructs can be made by assembling multiple shorter, purified, and error-checked oligonucleotides of around 50-100 bases, attempting to make them in one go results in extremely low yields, high error rates, and an inability to purify the correct long single-stranded molecule.
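The yield drop can be made concrete with a minimal sketch. It assumes a constant per-step coupling efficiency, and the default of 0.99 is an illustrative assumption, not a figure from the lecture. The full-length fraction is then the efficiency raised to the number of coupling steps.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of strands that are full length after (length_nt - 1) coupling steps.

    coupling_efficiency is a hypothetical, constant per-step value.
    """
    return coupling_efficiency ** (length_nt - 1)


for n in (50, 100, 200, 1000):
    print(f"{n:>5} nt -> {full_length_fraction(n):.4%} full-length product")
```

Even a small per-step loss compounds with length, which is why long constructs are assembled from shorter oligonucleotides rather than synthesized in a single run.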
Answers to questions from Prof. George Church
The 10 essential amino acids are lysine, methionine, tryptophan, threonine, valine, isoleucine, leucine, arginine, histidine and phenylalanine. I think the lysine contingency is not a failsafe biocontainment strategy, because lysine is available in food. It is a good way to move from what started as an example from fiction to understanding biocontainment in real-life scenarios: what will happen if a synthetic organism is released in the wild, and how will it evolve as natural forces act upon it?
Workflow
Design plasmid DNA with protein of interest →Transform bacteria with plasmid DNA→Get many copies of plasmid DNA→introduction of plasmid DNA to cells
Working in Benchling
After signing in I imported it into Benchling and ran digests for
EcoRI
HindIII
BamHI
KpnI
EcoRV
SacI
SalI
And then ran digests on
SalI
SacI
BamHI KpnI
EcoRV
BamHI KpnI
SacI
SalI
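What a digest does to a linear sequence can be sketched in a few lines. The recognition sites and cut offsets below are the standard ones for these seven enzymes; the function names and the toy sequence are hypothetical, and this is not a reproduction of Benchling's digest tool.

```python
# Enzyme name -> (recognition site, cut offset within the site on the top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut positions; the sites are palindromic, so one strand suffices."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_sizes(seq: str, enzymes: list[str]) -> list[int]:
    """Fragment lengths, in order, for a linear molecule cut by all listed enzymes."""
    cuts = sorted({p for name in enzymes for p in cut_positions(seq, name)})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with one EcoRI site and one BamHI site.
toy = "AAAA" + "GAATTC" + "CCCC" + "GGATCC" + "TTTT"
print(fragment_sizes(toy, ["EcoRI"]))
print(fragment_sizes(toy, ["EcoRI", "BamHI"]))
```

The fragment lengths are what would appear as bands on a gel, which is why combining enzymes changes the band pattern.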
I would recommend a plasmid-based cloning approach for initial expression work.
The optimized DNA can be inserted into a standard expression plasmid and introduced into E. coli via transformation. Once inside the bacterial cells, the plasmid replicates autonomously, allowing the host machinery to transcribe my DNA into mRNA and subsequently translate it into actin protein. However, since actin is a eukaryotic cytoskeletal protein and my sequence lacks the signal peptides and targeting sequences necessary for membrane localization or secretion, the expressed protein will likely accumulate intracellularly. This necessitates cell lysis and downstream purification via affinity chromatography or other protein separation techniques to isolate and characterize my recombinant actin.
Alternatively, the PURE system (Protein synthesis Using Recombinant Elements) presents a compelling option due to its turnaround time. In this cell-free approach, my DNA template is incubated with a defined set of purified recombinant components (rather than a crude cell extract) that provide the necessary transcriptional and translational machinery. This in vitro reaction proceeds rapidly without the overhead of maintaining living cells, generating my actin protein directly in the reaction mixture. The resulting product must subsequently be purified via affinity chromatography to obtain homogeneous, functional protein suitable for my downstream biochemical investigations.
DNA synthesis order
I want to use Green Fluorescent Protein because it is a good tool for understanding and tracking other proteins. Physarum polycephalum relies predominantly on actin and myosin, and fluorescence can help reveal the movements within the Physarum tubes.
Protein sequence copied from UniProt:
sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL
VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV
NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD
HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
reverse translation of sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1 to a 714 base sequence of most likely codons.
atgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtggaactggatggc
gatgtgaacggccataaatttagcgtgagcggcgaaggcgaaggcgatgcgacctatggc
aaactgaccctgaaatttatttgcaccaccggcaaactgccggtgccgtggccgaccctg
gtgaccacctttagctatggcgtgcagtgctttagccgctatccggatcatatgaaacag
catgatttttttaaaagcgcgatgccggaaggctatgtgcaggaacgcaccatttttttt
aaagatgatggcaactataaaacccgcgcggaagtgaaatttgaaggcgataccctggtg
aaccgcattgaactgaaaggcattgattttaaagaagatggcaacattctgggccataaa
ctggaatataactataacagccataacgtgtatattatggcggataaacagaaaaacggc
attaaagtgaactttaaaattcgccataacattgaagatggcagcgtgcagctggcggat
cattatcagcagaacaccccgattggcgatggcccggtgctgctgccggataaccattat
ctgagcacccagagcgcgctgagcaaagatccgaacgaaaaacgcgatcatatggtgctg
ctggaatttgtgaccgcggcgggcattacccatggcatggatgaactgtataaa
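A quick sanity check on a reverse translation is to translate the DNA back and compare it with the protein. The sketch below builds the standard codon table and checks the first line of the sequence above; the function name is illustrative, and only the first 60 bases are included to keep the example short.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


first_line = "atgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtggaactggatggc"
print(translate(first_line))
```

The same function can be run on the full 714-base sequence, which should give back the 238-residue protein (238 × 3 = 714).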
The metadata was then submitted to the Opentrons Google Form.
Week 4 HW: Protein Design Part 1
Part A: Questions by Shuguang Zhang
How many molecules of amino acids do you take with a piece of 500 grams of meat?
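500 g divided by an average of about 100 g/mol per amino acid (100 Da) gives about 5 mol, which is about 3 × 10²⁴ molecules (5 × 6.022 × 10²³), treating the whole piece as protein. So there are roughly 3 trillion trillion amino acids in a single serving of meat.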
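The estimate can be written out as a small calculation. The 100 Da average mass is the figure used in the answer; the `protein_fraction` parameter and its illustrative value of 0.2 are assumptions added here, to show how the count changes if only part of the meat is protein.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_count(mass_g: float, protein_fraction: float = 1.0,
                     mean_residue_mass_da: float = 100.0) -> float:
    """Approximate number of amino acids in a sample of the given mass."""
    moles = mass_g * protein_fraction / mean_residue_mass_da
    return moles * AVOGADRO


print(f"all protein:            {amino_acid_count(500):.2e}")
# Hypothetical protein content of 20% by mass, for comparison only.
print(f"assuming 20% protein:   {amino_acid_count(500, protein_fraction=0.2):.2e}")
```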
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Digestion breaks everything down to bare amino acids first. The original protein blueprint is completely destroyed. Then our ribosomes rebuild new proteins using our own genetic code, not the cow’s or the fish’s.
Why are there only 20 natural amino acids?
It is probably just a frozen evolutionary accident. Early life found 20 that worked well enough and the genetic code hardwired them in. At that point there is no going back without breaking every living thing on the planet.
Can you make other non-natural amino acids? Design some new amino acids.
You just swap out the side chain for something chemically stable. For example, you can replace a hydrogen on alanine's methyl group with fluorine and get fluoroalanine, which is more hydrophobic and harder to degrade. You can add an azide group for click chemistry. You can even shift to beta amino acids by inserting an extra carbon in the backbone, which makes them resistant to proteases.
Where did amino acids come from before enzymes that make them, and before life started?
They formed abiotically. The Miller-Urey experiment showed that just mixing early atmospheric gases with lightning produces amino acids spontaneously. They also show up on meteorites: glycine has been found in carbonaceous chondrites. No enzymes needed at all.
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
It would be left-handed. Normal L-amino acids form right-handed helices because of their backbone dihedral angle preferences. Mirror the chirality and you mirror the helix.
Can you discover additional helices in proteins?
We already know the 3-10 helix and the pi helix exist beyond the standard alpha helix. With cryo-EM resolution improving and AlphaFold predictions getting better, there are likely more unusual helical conformations hiding in membrane proteins and intrinsically disordered regions.
Why are most molecular helices right-handed?
Because life uses L-amino acids, and L-amino acids have backbone angles that naturally favor a right-handed turn. It traces all the way back to whichever chirality got selected early in evolution and then just stuck.
Why do β-sheets tend to aggregate?
What is the driving force for β-sheet aggregation?
Edge strands have exposed hydrogen bond donors and acceptors sitting there unsatisfied. They are basically sticky edges looking for a partner. The driving force is intermolecular hydrogen bonding combined with hydrophobic burial, and water gets released in the process which makes it entropically favorable too.
Why do many amyloid diseases form β-sheets?
Can you use amyloid β-sheets as materials?
When proteins misfold under stress they expose hydrophobic patches that seed beta sheet stacking. Once that nucleus forms it is thermodynamically very stable so more protein keeps piling on. As for materials, amyloid fibers are actually incredibly strong, comparable to silk, and they are self-assembling and tunable. People are already engineering them into scaffolds, nanowires, and hydrogels.
Part B: Protein Analysis and Visualization
1. Briefly describe the protein you selected and why you selected it.
In the plasmodium of Physarum polycephalum, the F-actin capping activity of the actin-fragmin complex is regulated by phosphorylation of actin, mediated by a novel type of protein kinase with no sequence homology to eukaryotic-type protein kinases.
This protein sits at the heart of what makes Physarum behavior fascinating. The oscillatory protoplasmic streaming that drives Physarum's decision-making and network formation depends on rapid, rhythmic reorganization of the actin cytoskeleton. Actin-fragmin kinase (AFK) is the molecular switch that controls it: by phosphorylating actin, it determines whether actin filaments are being capped and severed (disrupting the cytoskeleton) or allowed to grow (driving streaming). Studying this kinase is therefore studying the molecular basis of Physarum's behavioral intelligence.
The signalling pathway results in phosphorylation of actin, and stage-dependent phosphorylation of actin is associated with morphological alterations and reorganization of the actin cytoskeleton.
2. Identify the amino acid sequence of your protein.
The protein sequence has a total length of 737 amino acids. The most frequent amino acid is Serine (S) with 84 occurrences (11.40%), followed by Leucine (L) with 58 occurrences (7.87%), and Glycine (G) with 56 occurrences (7.60%). The least frequent is Cysteine (C) with 10 occurrences (1.36%).
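Counts and percentages like these can be reproduced with a short composition script. The AFK sequence is not reproduced in this write-up, so the sketch runs on a hypothetical toy string; the function name is illustrative.

```python
from collections import Counter


def composition(protein: str) -> list[tuple[str, int, float]]:
    """Return (residue, count, percent) tuples, most frequent first."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return [(aa, n, round(100 * n / total, 2)) for aa, n in counts.most_common()]


# Hypothetical toy input; replace with the real 737-residue sequence.
for aa, n, pct in composition("SSLG"):
    print(f"{aa}: {n} ({pct}%)")

# Cross-check of a percentage quoted above: 84 serines out of 737 residues.
print(round(100 * 84 / 737, 2))
```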
Protein family
AFK belongs to the eukaryotic protein kinase (ePK) superfamily structurally, but functionally it is classified as the founding member of a unique actin kinase family. It is structurally related to the phosphoinositide kinase superfamily rather than classical Ser/Thr kinases, placing it in an unusual evolutionary position.
3. Identify structure page of your protein
It was solved in 1999. At 2.9 Å, you can reliably identify the backbone fold, secondary structure elements, and the position of the AMP ligand, but side-chain details are slightly less precise than higher-resolution structures.
The structure contains the protein (actin-fragmin kinase) and adenosine monophosphate (AMP).
AMP is not a random co-crystal contaminant. AMP occupies the ATP binding pocket of the kinase. This tells you precisely where the nucleotide binding site is and how the kinase is oriented to receive ATP before phosphorylating actin. In the context of Physarum behavior, this pocket is a potential target for disrupting the actin phosphorylation cycle to study what happens to streaming oscillations when AFK is inhibited.
4. Open the structure of your protein in any 3D molecule visualization software
P.S. There are double protein structures in the screenshots accidentally.
visualizing as ‘cartoon’, ‘ribbon’ and ‘ball and stick’
cartoon view
ribbon view
ball and stick
Looking at the structure image, the catalytic module spans about 160 residues, with the nucleotide binding site and catalytic machinery tucked into the cleft between the two lobes. According to PubMed, there is a pretty balanced mix of alpha helices and beta sheets, which is exactly what you expect from this bilobal kinase fold.
The protein surface is a sea of blue hydrophilic residues, which is what allows it to stay dissolved in the crowded cytoplasm of the cell.
In contrast, the protein core is packed with orange hydrophobic residues. These are tucked away from water, creating the internal glue that keeps the entire structure stable and folded correctly. In the AMP binding pocket the hydrophobic patches grip the adenine ring of the nucleotide, while polar residues reach out to coordinate the phosphate groups.
This mapping is really the key to Physarum biology. Since the kinase has to dock onto actin filaments, that unique flat substrate recognition domain is covered in hydrophilic patches specifically designed to recognize and stick to actin’s surface chemistry.
First, there’s the ATP/AMP binding pocket, a deep cleft that’s carved right between the N-terminal and C-terminal lobes. Since you can clearly see the yellow AMP ligand tucked inside, it’s obviously the biggest “hole” on the surface and the best place for drug targeting. Second, check out the flat substrate recognition domain. Unlike most kinases that have a narrow groove, AFK uses a remarkably flat, broad surface to dock with the large actin substrate. This unique structural flatness is a huge defining trait for this enzyme.
C1. Protein Language Modelling
Position 109 (Asp/D) shows the strongest conservation signal in the mutational scan — nearly all substitutions receive strongly negative log-likelihood scores. This is consistent with this residue being the catalytic base in the kinase active site, directly involved in phosphotransfer to actin’s Thr202. Even conservative mutations (D→E) are penalized, suggesting the precise geometry of this aspartate is essential.
Yes, the t-SNE map forms meaningful neighborhoods where evolutionarily related proteins cluster tightly together, confirming that ESM2 has successfully learned to group biologically similar sequences into shared regions of the latent space.
I ran a request in Gemini to create another 3D t-SNE with AFK highlighted, and this is how it looked:
AFK from Physarum polycephalum lands at coordinates (−3.39, −0.29, −0.89) in a sparse, isolated region of the map with no tight cluster, reflecting its status as an evolutionarily unique kinase with no sequence homology to classical eukaryotic protein kinases. Its nearest neighbors are similarly atypical, low-homology proteins rather than mainstream kinases or cytoskeletal proteins like actin.
Protein Folding
Predicted structure after running it through Colab
RMSD score Executive: RMSD = 0.728 (1913 to 1913 atoms)
ESMFold predicted the 3D structure of Actin-Fragmin Kinase from Physarum polycephalum using sequence alone, achieving an RMSD of 0.728 Å against the experimentally determined crystal structure 1CJA
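For reference, RMSD is the square root of the mean squared distance between matched atoms. The minimal sketch below computes it for two coordinate lists that are already superimposed and paired atom by atom. That is a simplification: structure-alignment tools also perform the superposition, and may reject outlier atoms, before reporting a value. The function name and coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD between two equally sized, already superimposed coordinate sets."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


# Two hypothetical atoms, each displaced by 1 Å along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```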
Mutation 1 - position 45, changed S (Serine) to A (Alanine)
Executive: RMSD = 0.744 (1918 to 1918 atoms) only
Mutation 2 - changed position 155, which is in the catalytic core. L (Leucine) to P (Proline)
Human SOD1 sequence
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
After adding A4V mutation
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
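Note that the A4V numbering skips the initiator methionine, so the alanine being replaced is the fifth character of the sequence as written. A minimal sketch of applying such a mutation with an explicit numbering offset is shown below; the function name and the `numbering_offset` parameter are illustrative, and only the first 20 residues are used to keep the example short.

```python
def apply_mutation(seq: str, mutation: str, numbering_offset: int = 0) -> str:
    """Apply a point mutation such as 'A4V' and verify the wild-type residue first.

    numbering_offset shifts the position, e.g. 1 when the mutation name does not
    count the initiator methionine that is present in the sequence string.
    """
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1 + numbering_offset
    if seq[index] != wild_type:
        raise ValueError(f"expected {wild_type} at index {index}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


sod1_start = "MATKAVCVLKGDGPVQGIIN"  # first 20 residues of the sequence above
print(apply_mutation(sod1_start, "A4V", numbering_offset=1))
```

Checking the wild-type residue before substituting catches off-by-one numbering mistakes.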
Therefore, produced peptides:
| Index | Binder | Pseudo Perplexity |
|---|---|---|
| 1 | WLYVVAAVRWKX | 23.320599604199636 |
| 2 | WRYVAAAAAHKE | 8.96053025308908 |
| 3 | WLYVPAGLALWX | 13.021677157633269 |
| 4 | WLYYVVAVAHKX | 15.430388570774006 |
| 5 | FLYRWLPSRRGG | 11.545571242285833 |
Part 2: Evaluating Binders with AlphaFold3
The AlphaFold results are for some reason not loading for me, despite multiple attempts and troubleshooting. Hence the results were analyzed with the help of Claude using the PAE matrices.
peptide 1 ipTM 0.38
The PAE matrix shows a uniformly mid-green inter-chain strip with no distinct dark patch, indicating no preferred binding site and the peptide appears to be floating without specific engagement.
peptide 2 ipTM 0.35
The inter-chain strip is mostly light green with a very faint darker region around residues 60–100, suggesting a weak, non-specific affinity toward the β-barrel region, though confidence is low.
peptide 3 ipTM 0.36
The inter-chain strip is the lightest and most uniform of all five, indicating the highest positional uncertainty. It appears to have the least defined interaction with SOD1.
peptide 4 ipTM 0.37
A slightly darker patch in the inter-chain strip around residues 1–30 hints at proximity to the N-terminal region where the A4V mutation sits, making this the most therapeutically interesting placement among the PepMLM peptides.
peptide 5 ipTM 0.41
Shows the darkest and most defined inter-chain strip overall, with a signal around residues 60–110 suggesting some affinity toward the β-barrel mid-region consistent with it being a known SOD1 binder and having the highest ipTM.
Part 3: Evaluating properties of generated peptides in PeptiVerse
Peptide 1 WLYVVAAVRWKA
Peptide 2 WRYVAAAAAHKE
Peptide 3 WLYVPAGLALWA
Peptide 4 WLYYVVAVAHKA
Peptide 5 FLYRWLPSRRGG
All four peptides demonstrated favorable therapeutic profiles when evaluated through PeptiVerse, and outperformed FLYRWLPSRRGG in predicted binding affinity. Every peptide showed perfect solubility (1.000 probability) and was predicted to be non-hemolytic, confirming a safe baseline. In terms of binding affinity, Peptide 3 (WLYVPAGLALWA) emerged as the strongest binder with a medium binding score of 7.599 pKd/pKi, followed by Peptide 1 (WLYVVAAVRWKA) at 7.214. FLYRWLPSRRGG achieved a weak binding score of 5.968. This is a significant finding as it suggests PepMLM successfully generated peptides with stronger predicted affinity than an experimentally validated binder. Based on this analysis, Peptide 3 (WLYVPAGLALWA) remains the top candidate to advance. It has the highest predicted binding affinity, full solubility, low hemolytic risk, and a drug-like molecular weight of 1359.6 Da, making it the strongest overall therapeutic candidate from this screen pending AlphaFold3 structural confirmation.
Interpretation of PeptiVerse results
The generated peptides showed trade-offs between predicted binding affinity, therapeutic safety, and developability.
Peptide 7 (GKRYYYYKDKCF) showed the strongest predicted binding affinity (pKd = 9.123), making it the most promising binder from an interaction standpoint. However, it had a relatively low motif score (0.340), suggesting weaker alignment with the desired design motif.
Peptide 8 (VGTCYCIKKKKM) had the highest hemolysis probability (0.978), which makes it less attractive as a therapeutic candidate despite a reasonably strong predicted affinity (pKd = 7.123) and a strong motif score (0.730).
Peptide 9 (TKQCKFTRPQNE) had the strongest motif score (0.876), indicating good alignment with the desired interaction pattern, but its predicted binding affinity (pKd = 5.533) was lower than the best-performing candidates.
Overall, Peptide 7 appears strongest in terms of predicted affinity, while Peptide 9 may represent a more motif-consistent but weaker-binding alternative. Since all candidates showed high hemolysis probabilities, additional optimization would likely be required before therapeutic development.
Part 4: Optimized peptide generation with moPPIt
| Index | Peptide | Hemolysis | Solubility | Affinity (pKd) | Motif Score |
|---|---|---|---|---|---|
| 6 | GKCGKNEVHKHR | 0.955 | 0.917 | 5.692 | 0.396 |
| 7 | GKRYYYYKDKCF | 0.945 | 0.917 | 9.123 | 0.340 |
| 8 | VGTCYCIKKKKM | 0.978 | 0.750 | 7.123 | 0.730 |
| 9 | TKQCKFTRPQNE | 0.955 | 0.833 | 5.533 | 0.876 |
Overall, moPPIt gives more rational, multi-objective candidates anchored to a therapeutic hypothesis (binding the A4V site), while PepMLM provides broader sequence diversity without site or safety guidance.
Among the moPPIt candidates, GKRYYYYKDKCF (Peptide 7) is the strongest candidate to advance. It has by far the highest predicted binding affinity (9.12 pKd), a hemolysis score of 0.945 (the lowest of the four, though still high in absolute terms), and a solubility score of 0.917. Its motif score of 0.340 is the lowest among the four, suggesting it may not perfectly engage the exact residues targeted near position 4, but given that its affinity is dramatically higher than all other candidates from both tools, it warrants further structural and experimental investigation to determine where exactly it binds SOD1.
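The trade-offs among the four moPPIt candidates can be laid out programmatically. The values below are copied from the table above; ranking by a single property at a time is an illustrative choice, not the scoring that moPPIt or PeptiVerse uses internally.

```python
candidates = [
    {"index": 6, "peptide": "GKCGKNEVHKHR", "hemolysis": 0.955, "solubility": 0.917, "affinity": 5.692, "motif": 0.396},
    {"index": 7, "peptide": "GKRYYYYKDKCF", "hemolysis": 0.945, "solubility": 0.917, "affinity": 9.123, "motif": 0.340},
    {"index": 8, "peptide": "VGTCYCIKKKKM", "hemolysis": 0.978, "solubility": 0.750, "affinity": 7.123, "motif": 0.730},
    {"index": 9, "peptide": "TKQCKFTRPQNE", "hemolysis": 0.955, "solubility": 0.833, "affinity": 5.533, "motif": 0.876},
]

best_affinity = max(candidates, key=lambda c: c["affinity"])
best_motif = max(candidates, key=lambda c: c["motif"])
lowest_hemolysis = min(candidates, key=lambda c: c["hemolysis"])

print("highest affinity: ", best_affinity["index"], best_affinity["peptide"])
print("highest motif:    ", best_motif["index"], best_motif["peptide"])
print("lowest hemolysis: ", lowest_hemolysis["index"], lowest_hemolysis["peptide"])
```

No single candidate wins on every axis, which is the trade-off described in the interpretation above.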
Part B skipped since optional
Part C: Final project L-Protein Mutants
Option 1: Mutagenesis
Attaching MSA output
Looking at the TM region in Image 2, almost every sequence ends with EAVIRTVTTLQQLLT. This stretch is extremely conserved, which means residues 63–75 (VIRTVTTLQQLLT) are very risky to mutate.
And the L Protein mutation heatmap,
The heatmap x-axis follows the full L-protein sequence. Mapping positions to amino acids:
M(1) E(2) T(3) R(4) F(5) P(6) Q(7) Q(8) S(9) Q(10)Q(11)T(12)P(13)A(14)S(15)T(16)
N(17)R(18)R(19)R(20)P(21)F(22)K(23)H(24)E(25)D(26)Y(27)P(28)C(29)R(30)R(31)Q(32)
Q(33)R(34)S(35)S(36)T(37)L(38)Y(39)V(40) | L(41)I(42)F(43)L(44)A(45)I(46)F(47)L(48)
S(49)K(50)F(51)T(52)N(53)Q(54)L(55)L(56)L(57)S(58)L(59)L(60)E(61)A(62)V(63)I(64)
R(65)T(66)V(67)T(68)T(69)L(70)Q(71)Q(72)L(73)L(74)T(75)
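The position mapping above can be cross-checked by locating the conserved motifs in the sequence. The sequence string is assembled from the residue map above; the function name is illustrative.

```python
# L-protein sequence assembled from the position map above (75 residues).
L_PROTEIN = (
    "METRFPQQSQQTPAST"
    "NRRRPFKHEDYPCRRQ"
    "QRSSTLYV"
    "LIFLAIFL"
    "SKFTNQLLLSLLEAVI"
    "RTVTTLQQLLT"
)


def motif_span(seq: str, motif: str) -> tuple[int, int]:
    """Return the 1-based (start, end) positions of the first occurrence of motif."""
    start = seq.find(motif)
    if start == -1:
        raise ValueError(f"{motif} not found")
    return start + 1, start + len(motif)


print(len(L_PROTEIN))
print(motif_span(L_PROTEIN, "EAVIRTVTTLQQLLT"))
print(motif_span(L_PROTEIN, "VIRTVTTLQQLLT"))
```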
ESM Score vs. Experimental Data Correlation
To evaluate whether the ESM-based mutational scores capture real functional information, I cross-referenced the heatmap against the experimental L-protein mutant dataset from the spreadsheet. Positions such as those in the conserved EAVIRTVTTLQQLLT stretch of the TM domain (residues 61–75) consistently appear as dark columns in the ESM heatmap, indicating strong negative predicted fitness for any substitution. This aligns well with the MSA data, where these positions show near-zero variation across related phage sequences. Conversely, some positions in the soluble N-terminal domain (residues 1–40) show yellow-to-neutral scores at certain substitutions, suggesting the model predicts these changes are tolerable, consistent with the experimental observation that many soluble-domain mutations retain partial lysis activity.
The following 5 mutations were selected based on positive ESM LLR scores, MSA
conservation analysis, and structural reasoning. Two mutations fall in the TM domain
(residues 41–75) and three in the soluble N-terminal domain (residues 1–40). The mutations I chose to continue with in the Soluble Domain and Transmembrane domain are:
| Index | Position | Wildtype_AA | Mutation_AA | LLR Score |
|---|---|---|---|---|
| 1 | 29 | C | R | 2.3954 |
| 2 | 9 | S | Q | 2.014 |
| 3 | 50 | K | L | 2.5615 |
| 4 | 53 | N | L | 1.8649 |
| 5 | 22 | F | R | 1.6020 |
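To guard against off-by-one errors when carrying these mutations forward, each one can be checked against the wildtype sequence before building mutant constructs. The following is a sketch under the assumption that positions are 1-based and that the wildtype is the 75-residue sequence from the position mapping; `apply_mutation` and `domain` are hypothetical helper names.

```python
WILDTYPE = (
    "METRFPQQSQQTPAST"
    "NRRRPFKHEDYPCRRQ"
    "QRSSTLYVLIFLAIFL"
    "SKFTNQLLLSLLEAVI"
    "RTVTTLQQLLT"
)

# (position, wildtype residue, mutant residue, ESM LLR score) for the selected mutations
MUTATIONS = [
    (29, "C", "R", 2.3954),
    (9, "S", "Q", 2.014),
    (50, "K", "L", 2.5615),
    (53, "N", "L", 1.8649),
    (22, "F", "R", 1.6020),
]


def apply_mutation(seq: str, position: int, wt: str, mut: str) -> str:
    """Substitute one residue, refusing if the wildtype residue does not match."""
    if seq[position - 1] != wt:
        raise ValueError(f"expected {wt} at {position}, found {seq[position - 1]}")
    return seq[: position - 1] + mut + seq[position:]


def domain(position: int) -> str:
    """Soluble N-terminal domain is residues 1-40, TM domain is 41-75."""
    return "soluble" if position <= 40 else "TM"


if __name__ == "__main__":
    for pos, wt, mut, llr in MUTATIONS:
        apply_mutation(WILDTYPE, pos, wt, mut)  # raises if the row is inconsistent
        print(f"{wt}{pos}{mut}\t{domain(pos)}\tLLR={llr}")
```

If any position or wildtype residue were mistyped, `apply_mutation` would raise instead of silently producing the wrong mutant.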
AlphaFold Multimer runs
Run 1: L-Protein Octamer
8 chains of L-protein (including proposed mutations) separated by colons,
total length 600 residues.
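As a sanity check on the input format, the colon-separated query can be assembled and its length verified in Python. This sketch assumes the ColabFold convention of joining chains with `:` and uses the wildtype sequence as a stand-in, since the exact mutant sequence submitted is not reproduced here; `multimer_query` is an illustrative name.

```python
# Wildtype L-protein used as a stand-in for the submitted (mutant) chain.
L_PROTEIN = (
    "METRFPQQSQQTPAST"
    "NRRRPFKHEDYPCRRQ"
    "QRSSTLYVLIFLAIFL"
    "SKFTNQLLLSLLEAVI"
    "RTVTTLQQLLT"
)


def multimer_query(sequence: str, copies: int) -> str:
    """Join identical chains with ':' to form a homo-oligomer query string."""
    return ":".join([sequence] * copies)


query = multimer_query(L_PROTEIN, 8)
total_residues = len(query.replace(":", ""))  # chain separators are not residues
```

For eight 75-residue chains, `total_residues` matches the 600 residues reported for this run.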
All five ranked models show uniformly very low pLDDT scores (20–28, well below the
cutoff of 50 under which predictions count as very low confidence). The PAE matrices are nearly uniformly red (~25–30 Å error) across all off-diagonal inter-chain blocks, with confidence only on the per-chain diagonal. This means the model cannot confidently place any chain relative to any other.
Despite the low confidence scores, the predicted structures display a biologically
interesting pattern: in the Mol* viewer, helical secondary structure is
visible at the center of the assembly, with disordered tails radiating outward in a
sunburst arrangement. This is consistent with the pore-formation hypothesis for the
L-protein: the TM helices converge at the central axis (as expected for a membrane
pore), while the soluble N-terminal domains remain disordered and point outward into
the cytoplasm. The per-position pLDDT plot shows periodic peaks that
correspond to the TM helix region of each chain, which is the only portion with
marginally higher local confidence (~40–50).
Run 2: L-Protein + DnaJ CoFold
L-protein (mutant sequence, 75 residues, Chain A) + DnaJ (357 residues,
Chain B), submitted as a two-chain heterodimer to ColabFold AlphaFold2 Multimer v3
In contrast to the octamer run, the L-protein + DnaJ co-fold produces
substantially higher confidence scores across all five models (pLDDT 70–78, pTM
~0.527), indicating that AlphaFold2 can form a meaningful structural prediction for
this complex. This difference is expected: DnaJ is a well-characterised soluble
protein with rich MSA coverage (~2000 sequences, Image 1), which anchors the
prediction and allows confident inter-chain contact modeling.
The per-position pLDDT plot reveals the key asymmetry of the complex:
Chain A (L-protein, positions 0–75) consistently scores in the 20–50 range across
all models while Chain B (DnaJ, positions 75–450) scores
80–95 throughout, well into the “confident” to “very high” range. This is biologically
meaningful: the L-protein is a largely disordered, membrane-dependent protein that AF2
cannot confidently fold in isolation, while DnaJ is a structured chaperone that the
model predicts with high accuracy. The L-protein’s low per-residue confidence does
not invalidate the interaction prediction; it reflects the intrinsic disorder of
the L-protein rather than a failure of the complex model.
All five ranked models show distinct blue (low error, ~0–10 Å) regions in the inter-chain quadrants. Specifically, the L-protein (chain A, rows 0–75) shows confident predicted placement relative to the N-terminal J-domain region of DnaJ (approximately positions 100–250 in chain B). This is a strong signal: the model is confidently predicting that the L-protein contacts DnaJ, and that the interaction interface is localised rather than diffuse. Crucially, the contact region maps to L-protein residues in the soluble N-terminal domain (residues 1–40), not the TM domain, which is consistent with the
published biological evidence that DnaJ interacts with the soluble domain of the
L-protein (Chamakura et al., 2017).
The Mol* structure shows DnaJ folded as a large, confident beta-sheet and
helix domain (blue/dark blue throughout), with the L-protein appearing as a short
helix (red, low pLDDT) docked against DnaJ’s surface at the J-domain. The helical
secondary structure of the L-protein’s TM region is partially preserved even in this
soluble context, appearing as a compact helical element adjacent to the DnaJ
interaction surface.
Relevance to proposed mutations:
Three of the five proposed mutations (S9Q, F22R, and C29R) fall directly within
the soluble domain (residues 1–40) that the PAE matrix identifies as the predicted
DnaJ contact region. This strongly supports their therapeutic rationale:
C29R introduces a positively charged arginine at a cysteine position within
the predicted interface. This could either strengthen hydrophilic contacts with
DnaJ or, more importantly, sterically and electrostatically disrupt the native
interaction — potentially enabling DnaJ-independent folding by forcing the
L-protein to adopt a stable conformation without chaperone assistance.
F22R replaces an aromatic residue with arginine at another interface-proximal
position, similarly altering the electrostatic character of the binding surface.
S9Q lies at the N-terminal edge of the predicted contact zone; the glutamine
substitution introduces new hydrogen-bonding capacity that could stabilize the
soluble domain’s fold autonomously.
The two TM mutations (K50L and N53L) fall in Chain A positions beyond the confident
inter-chain contact region, consistent with TM residues not participating in DnaJ
binding — instead targeting membrane insertion efficiency independently of the
DnaJ interaction.
The L-protein’s low per-residue pLDDT throughout means the exact
contact geometry should be treated as a hypothesis rather than a reliable atomic
model. AlphaFold2 lacks membrane context, so the TM domain is modeled as if soluble.
Validation via co-immunoprecipitation or crosslinking mass spectrometry of the
wildtype and mutant complexes would be required to confirm the predicted interface.
A more reliable structural prediction for just the soluble domain co-folded with
DnaJ’s J-domain (rather than full-length L-protein) could also be attempted, as
this would focus modeling resources on the well-defined interaction region.
Week 6 HW: Genetic Circuits Part I
1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion High-Fidelity PCR Master Mix contains most of the key ingredients needed for PCR, except the template DNA and primers. It is designed to make DNA amplification more accurate and easier to set up.
Some of the main components are:
Phusion High-Fidelity DNA Polymerase – the enzyme that synthesizes new DNA strands.
dNTPs – the nucleotide building blocks (A, T, G, and C) used to build the new DNA.
MgCl₂ – provides magnesium ions, which are required for the polymerase to function.
Reaction buffer – maintains the correct pH and salt conditions so the reaction can proceed efficiently.
Together, these components create the right environment for accurate and efficient PCR amplification.
2. What are some factors that determine primer annealing temperature during PCR?
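The annealing temperature in PCR depends on how well the primers can bind to the target DNA sequence, which is summarised by their melting temperature (Tm). Tm rises with primer length and GC content, and it also shifts with the salt and Mg²⁺ concentration of the buffer, the primer concentration, and any mismatches between primer and template. Because both primers anneal in the same step, the lower of the two Tm values usually sets the limit. If the temperature is too low, the primers might bind non-specifically. If it is too high, they may not bind properly at all. So overall, the annealing temperature is chosen to balance specificity and efficiency during PCR.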
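Primer melting temperature (Tm) is the main quantity behind the choice of annealing temperature, and a rough estimate is easy to compute. The sketch below uses the Wallace rule (2 °C per A/T plus 4 °C per G/C), which is only a coarse approximation for short oligos and ignores salt and concentration effects. The primer sequences are hypothetical, and the "lower Tm minus 5 °C" starting point is a generic rule of thumb, not the Phusion-specific recommendation.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


def suggested_annealing(fwd: str, rev: str, offset: int = 5) -> int:
    """Start from the lower of the two primer Tm values minus an offset."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset


if __name__ == "__main__":
    fwd = "ATGGAAACCCGATTCCCT"  # hypothetical forward primer
    rev = "TTAAGTAAGCAATTGCTG"  # hypothetical reverse primer
    print(wallace_tm(fwd), wallace_tm(rev), suggested_annealing(fwd, rev))
```

For a real reaction, a nearest-neighbour calculator matched to the polymerase and buffer would give a better starting value than this estimate.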
3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
PCR works by amplifying a specific region of DNA using primers, DNA polymerase, dNTPs, and thermal cycling. The main advantage of PCR is that it is highly flexible. PCR is especially useful when I need a specific insert or when I only have a small amount of starting DNA.
Restriction enzyme digestion works by cutting DNA at specific recognition sites using restriction enzymes. Unlike PCR, it does not amplify DNA but cuts the DNA wherever those enzyme sites are present. This method is often easier and more straightforward if the plasmid or DNA sequence already contains the right restriction sites.
4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
To make sure that the DNA fragments are appropriate for Gibson cloning, the main thing I need to check is whether they have the correct overlapping ends. Gibson Assembly works by joining DNA fragments that share homologous sequences at their ends, so the insert and the vector backbone need to have matching overlap regions. Usually, these overlaps are designed into the PCR primers so that the amplified insert already contains the right sequences for assembly. The plasmid backbone also needs to be linearized in a way that exposes the corresponding matching ends.
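The overlap check described above can be sketched in Python. This example assumes the fragments are given 5′ to 3′ on the same strand in assembly order, and treats a 20 to 40 bp window as the acceptable overlap length; that window is a commonly quoted range used here as an assumption, and `shared_overlap` and `check_circular_assembly` are hypothetical helpers, not Benchling functions.

```python
def shared_overlap(upstream: str, downstream: str,
                   min_len: int = 20, max_len: int = 40) -> int:
    """Length of the longest end of `upstream` that equals the start of
    `downstream`, searched between min_len and max_len. Returns 0 if none."""
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


def check_circular_assembly(fragments: list[str],
                            min_len: int = 20, max_len: int = 40) -> list[int]:
    """Check every junction in order, including last -> first, which closes the plasmid."""
    count = len(fragments)
    return [
        shared_overlap(fragments[i], fragments[(i + 1) % count], min_len, max_len)
        for i in range(count)
    ]


if __name__ == "__main__":
    # Toy sequences with 4 bp overlaps, so min_len is lowered for the demo.
    vector = "CATGAAAAAAAAGGTC"
    insert = "GGTCTTTTTTCATG"
    print(check_circular_assembly([vector, insert], min_len=4))
```

A zero at any junction means the two neighbouring fragments do not share a usable overlap, so the primers for that junction need redesigning.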
5. How does the plasmid DNA enter the E. coli cells during transformation?
Plasmid DNA enters E. coli cells when the cell membrane is temporarily made permeable during transformation. Normally, DNA cannot easily cross the bacterial membrane because of charge repulsion and the barrier created by the cell envelope.
In chemical transformation, the cells are made competent using salts such as calcium chloride, which helps the DNA interact more easily with the cell surface. A brief heat shock is then used to create temporary changes in membrane permeability, allowing the plasmid DNA to enter.
In electroporation, a short electrical pulse creates temporary pores in the membrane, and the DNA enters through those openings.
6. Describe another assembly method in detail (such as Golden Gate Assembly)
Gibson Assembly is a molecular cloning method that joins multiple DNA fragments in a single, isothermal reaction. Each fragment is designed with short overlapping ends, and a mix of enzymes, a 5′ exonuclease, DNA polymerase, and DNA ligase, works together to assemble them seamlessly. The exonuclease creates single-stranded overhangs, allowing complementary regions to anneal; the polymerase fills in gaps, and the ligase seals the nicks. This enables rapid and scarless construction of complex DNA constructs without the need for restriction enzymes.
Gibson Assembly — Construct Design in Benchling
I used Benchling’s Gibson Assembly tool with pSB1C3 as the expression vector.
I chose pSB1C3 because it is the standard iGEM backbone used throughout
this course, is high-copy, and works reliably in E. coli, making it the most practical choice for expressing the L-protein mutant.
The construct architecture I used was:
Anderson constitutive promoter BBa_J23106, followed by the Elowitz RBS
BBa_B0034, the mutant L-protein coding sequence, and the double terminator
BBa_B0015, all cloned into the pSB1C3 backbone. I built each part as a
separate DNA sequence in Benchling, then concatenated them into a single
insert fragment per construct before attempting assembly.
The assembly process was not straightforward. When I first tried using
Benchling’s new Gibson Assembly tool, the vector slot showed a persistent
orange dot indicating it couldn’t resolve the cut site on the circular
pSB1C3 sequence.
I tried lowering the minimum Tm, widening the homology length range, and increasing the Tm difference tolerance, but the orange dot remained. After troubleshooting, I realised
the issue was that Benchling’s newer assembly interface couldn’t
automatically determine where to linearise an imported iGEM vector, likely
because pSB1C3 lacks a standard cut site annotation that the tool
expects.
To navigate this, I switched to Benchling’s legacy assembly tool, which
handles vector linearisation differently and gave me direct control over
the cut position. I also manually created a linearised version of pSB1C3
by reorienting the sequence to start at position 22, effectively
pre-cutting the vector at the BioBrick MCS insertion site between the
BioBrick suffix and the his operon terminator. This bypassed the
auto-detection issue entirely. Using the linearised vector with the legacy
tool, the assembly ran successfully on the first attempt.
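Reorienting a circular sequence so that it starts at a chosen position is just a rotation of the string. A minimal sketch, assuming 1-based positions as shown in the sequence editor; the toy sequence stands in for pSB1C3 and `linearise_at` is an illustrative name.

```python
def linearise_at(circular_seq: str, position: int) -> str:
    """Rotate a circular sequence so that the 1-based `position` becomes base 1.

    The result has the same bases in the same cyclic order; only the point
    where the circle is opened changes.
    """
    if not 1 <= position <= len(circular_seq):
        raise ValueError("position is outside the sequence")
    i = position - 1
    return circular_seq[i:] + circular_seq[:i]


if __name__ == "__main__":
    toy_plasmid = "ABCDEFGH"  # placeholder letters, not real DNA
    print(linearise_at(toy_plasmid, 3))
```

Because no bases are added or removed, the linearised sequence keeps the original length, which is a quick check that the manual re-indexing did not drop anything.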
The final assembled plasmid for Construct 1 (F22R + C29R) came out at
2483 bp, with all four insert annotations (J23106 promoter, B0034 RBS,
L-protein CDS, and B0015 terminator) correctly placed and visible in
the circular map. Benchling also auto-designed all four Gibson primers
(vector forward, vector reverse, insert forward, insert reverse) with
appropriate overlapping tails for in vitro assembly.
Asimov Kernel
Bacterial Demo
I ran the bacterial demo in the repository on Asimov first,
and then tried to recreate it using the given parts in the Characterized bacterial parts repository.
The recreated Repressilator appears to match the original very closely at the circuit-design and dynamical-behavior level. The topology is preserved, the annotated sequence structure is essentially the same, and the simulated outputs show the expected three-node oscillatory repression dynamics.
L-Protein Mutant Constructs
Construct 1: Constitutive GFP Expression Circuit
This is a simple constitutive expression circuit used as a reference
design to test the Kernel simulation environment.
The second construct uses a Short HsEef1a1 promoter driving expression of
BBa_K3630002 and BBa_K3128009, with an L3S2P24 bacterial terminator.
The RNAP flux was lower here (~0.27 relative units) compared to
Construct 1, which makes sense since the HsEef1a1 promoter is a
mammalian promoter and not optimised for bacterial simulation contexts. I also asked the Asimov AI for assistance.
This uses the constitutive promoter BBa_J23119, RBS BBa_B0034, the L-protein
coding sequence BBa_E0040, a coding sequence extension BBa_B0032, an insulator BBa_E1010, and terminator BBa_B0015. The simulation showed RNAP flux, and interestingly the ribosome flux graph showed two distinct peaks suggesting the simulator is resolving translation at two separate coding regions. This is the construct architecture most directly applicable to expressing L-protein mutants in E. coli for the downstream plaque assay experiments.
Week 7 HW: Genetic Circuits Part II
Part1: Intracellular Artificial Neural Networks
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits treat inputs as binary. This works for simple logic but breaks down when you need nuanced, graded decisions based on multiple continuous signals. Biology itself is almost never binary; cells exist on spectrums of gene expression and signalling intensity. IANNs overcome this by operating in the analog domain. An IANN computes a weighted sum of all inputs and applies a nonlinear activation function, much like an artificial neuron. The same molecular parts can be reused to implement different decision boundaries just by changing the weights, without engineering new biological parts from scratch. IANNs can also be stacked into multiple layers, enabling hierarchical computation that single-layer Boolean circuits cannot perform.
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Application: multi-signal tumour detection
A compelling use case is engineering a cancer-detecting IANN in CAR-T cells that triggers apoptosis only when multiple tumour markers are simultaneously present at the right levels, while ignoring healthy cells that express some markers at lower concentrations.
Three inputs (HER2, MUC1, HIF-1a) drive promoters at strengths proportional to their concentration. Those promoters produce endoribonucleases whose expression encodes the weighted input combination. Layer 1 outputs Csy4, whose concentration reflects the weighted sum. In layer 2, a caspase gene carries a Csy4-recognition hairpin in its 5’ UTR. If Csy4 is below threshold, the hairpin is intact and the cell triggers apoptosis in the target. If Csy4 is high, it cleaves the mRNA and nothing happens.
Limitations: the number of well-characterised orthogonal ERNs is small, capping practical input dimensionality. The system is also sensitive to transcriptional noise at low signal concentrations, and tuning promoter strengths reliably across cell types is difficult.
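The input/output behaviour described above can be illustrated with a toy two-layer model. This is a sketch under stated assumptions, not a validated design: it assumes the tumour markers act with negative weights on Csy4 (for example through upstream endoribonucleases that cleave the Csy4 transcript), so that high marker levels leave the caspase mRNA intact. All weights, the basal level, and the Hill parameters are hypothetical.

```python
def csy4_level(markers, weights, basal=1.0):
    """Layer 1: basal Csy4 production minus the weighted marker signal, floored at 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, markers))
    return max(0.0, basal - weighted_sum)


def caspase_output(csy4, k=0.2, n=4):
    """Layer 2: fraction of caspase mRNA escaping Csy4 cleavage (repressive Hill curve)."""
    return 1.0 / (1.0 + (csy4 / k) ** n)


WEIGHTS = (0.5, 0.3, 0.2)  # hypothetical weights for HER2, MUC1, HIF-1a

tumour = (2.0, 2.0, 2.0)   # all three markers high (arbitrary units)
healthy = (0.2, 0.1, 0.0)  # some markers present at low levels

tumour_out = caspase_output(csy4_level(tumour, WEIGHTS))
healthy_out = caspase_output(csy4_level(healthy, WEIGHTS))
```

In this toy setting `tumour_out` is close to full caspase expression while `healthy_out` stays near zero; changing `WEIGHTS` moves the decision boundary without changing the structure of the circuit.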
Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
cover image
Part 2: Fungal Materials
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
The most developed fungal material is mycelium composite, where filaments of fungi like Ganoderma are grown through agricultural waste substrates like corn stalks and grain husks. The mycelium binds these particles into a solid mass that can be moulded. Ecovative Design uses this for packaging foam replacing expanded polystyrene. Bolt Threads grows mycelium leather sheets (Mylo) used by fashion brands, and Mogu produces acoustic wall panels and floor tiles.
Here are some samples of mycelium I grew in 2024. The strains used are reishi and Florida oyster mushroom strains.
Mycelium composites are biodegradable, grown on agricultural waste with no petrochemical inputs, naturally fire-resistant, and thermally insulating. Mycelium leather avoids tanning chemicals and animal welfare concerns, and unlike synthetic PU leather it does not shed microplastics.
Disadvantages: the material must be heat-killed at the end of growth to stop fungal activity, causing dehydration and shrinkage that can warp precision shapes. Moisture resistance is limited without coatings. The growth process is sensitive to contamination. And mechanical properties like tensile strength still fall short of high-performance synthetics.
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
The most impactful engineering target is producing complex therapeutic glycoproteins that bacteria cannot make correctly. Beyond therapeutics, engineering mycelium to produce chitin fibres with controlled orientation or to express spider silk proteins could yield composites with dramatically improved mechanical properties. Fungi could also be engineered for mycoremediation, bioaccumulating heavy metals from contaminated soil.
Fungi secrete proteins at rates 10 to 1000 times higher than bacteria, survive harsh conditions like low pH and desiccation, and build three-dimensional hyphal networks enabling solid-state fermentation without large water volumes.
Week 9 HW: Cell Free Systems
General Questions
1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Cell-free protein synthesis gives you a level of control over the reaction
environment that you simply cannot get when working inside a living cell.
Because there’s no cell membrane, you can directly add or remove
components, adjust concentrations in real time, and introduce molecules
that would be toxic to a living cell without worrying about killing your
chassis. You also get direct access to the product without needing to
lyse cells or purify through layers of cellular debris.
Two cases where cell-free is better than cell-based production are toxic proteins and rapid variant screening. The
MS2 L-protein punches holes in membranes and kills bacteria, so you can’t
reasonably produce it inside a living E. coli because it would lyse its
own host before you get a meaningful yield. Cell-free lets you synthesize
a toxic protein in a controlled environment without that problem. It also lets you iterate and test dozens of variants quickly.
2. Describe the main components of a cell-free expression system and explain the role of each component.
A cell-free expression system is essentially the inside of a cell,
extracted and reconstituted in a tube. It consists of:
Cell extract: This is the ‘machinery’: ribosomes,
translation factors, chaperones, and everything else needed to read
an mRNA and assemble a protein.
DNA template or mRNA: This is what you want expressed. You can add
a plasmid, linear PCR product, or pre-transcribed mRNA depending on
whether you want transcription to happen in the reaction or not.
RNA polymerase: Needed if you’re starting from DNA. Typically T7
RNAP is added for prokaryotic systems since it’s fast and highly
processive.
Amino acids: The building blocks. You supply all 20 at defined
concentrations so the ribosomes have raw material.
Energy regeneration system: ATP is consumed rapidly during
translation. You need a system to regenerate it, typically
phosphocreatine + creatine kinase, or PEP (phosphoenolpyruvate).
3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.
Energy regeneration is critical because translation is ATP-intensive. The cell-free reaction has a finite supply, and without regeneration the reaction stalls within minutes.
The most common approach is the phosphocreatine/creatine kinase system, in which creatine kinase catalyzes the transfer of a phosphate group from phosphocreatine to ADP, regenerating ATP. This is simple to add and works well for reactions up to a few hours.
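The effect of regeneration can be illustrated with a toy model. The sketch below is a simple Euler simulation in which translation converts ATP to ADP at a fixed rate and creatine kinase converts ADP back to ATP for as long as phosphocreatine lasts. Every concentration and rate is a made-up illustrative value, not a measured one, and `simulate_atp` is a hypothetical helper.

```python
def simulate_atp(atp0, pcr0, consumption, regen_rate, minutes, dt=0.1):
    """Toy Euler model of ATP use and creatine-kinase regeneration.

    Concentrations are in mM and rates in mM/min; all values are illustrative.
    Returns the ATP concentration after each time step.
    """
    atp, adp, pcr = atp0, 0.0, pcr0
    trace = []
    for _ in range(round(minutes / dt)):
        used = min(atp, consumption * dt)                    # translation: ATP -> ADP
        regenerated = min(adp + used, pcr, regen_rate * dt)  # PCr + ADP -> ATP + creatine
        atp += regenerated - used
        adp += used - regenerated
        pcr -= regenerated
        trace.append(atp)
    return trace


# Hypothetical starting values, chosen only to show the qualitative difference.
without_regen = simulate_atp(atp0=1.5, pcr0=0.0, consumption=0.1,
                             regen_rate=0.5, minutes=60)
with_regen = simulate_atp(atp0=1.5, pcr0=30.0, consumption=0.1,
                          regen_rate=0.5, minutes=60)
```

With no phosphocreatine the ATP trace falls to zero partway through the run, whereas with phosphocreatine it stays at its starting level until that reservoir is used up.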
4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
Prokaryotic cell-free systems (E. coli-based) are faster to prepare,
cheaper, and give higher yields for most simple proteins. The extract is
easy to make in bulk and the system is well characterized. I’d use it to
produce the MS2 L-protein: its natural context is E. coli, all the relevant chaperones are present in the E. coli extract, and I need high yield quickly for membrane insertion assays.
Eukaryotic systems are needed when your protein requires post-translational modifications like glycosylation, disulfide bond formation in the ER, or mammalian-specific folding chaperones. I’d use a mammalian cell-free system to produce
human SOD1. It is a cytosolic metalloenzyme that requires proper
copper and zinc cofactor loading, and its folding energetics in the A4V
mutant form are already perturbed, so having the right chaperone
environment matters.
5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
Membrane proteins are the hardest class to express in cell-free systems
because they’re hydrophobic and aggregate instantly in aqueous solution
without a membrane to insert into. The key is to provide a hydrophobic
environment during synthesis.
I would design the experiment as follows: use an E. coli-based cell-free
system supplemented with nanodiscs or liposomes added directly to the
reaction so the protein co-translationally inserts into a lipid bilayer
as it comes off the ribosome. For the L-protein specifically, I’d prepare
nanodiscs made from POPC and MSP1D1 scaffold protein, add them at ~0.2 mg/mL
to the cell-free reaction, and run the reaction under conditions that slow translation slightly (for example, a lower incubation temperature) to give the protein more time to fold before the next ribosome catches up.
The main challenges are: (1) aggregation before membrane insertion,
addressed by pre-adding nanodiscs before starting transcription; (2) low
yield because hydrophobic proteins titrate out ribosomes, addressed by
using a PURE system where you have more control over ribosome
concentration; (3) confirming proper insertion, addressed by running a
protease protection assay where correctly inserted protein is shielded
from externally added proteinase K.
6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
Reason 1: The genetic template isn’t intact or there isn’t enough of it.
The machinery can only build what it can read. If the DNA or RNA blueprint
has degraded, or if there simply isn’t enough of it in the reaction, the
output will be low no matter how healthy everything else is. To fix this,
I’d first verify the quality and quantity of my template before adding it
to the reaction. If the instructions are broken, no amount of tweaking
elsewhere will help. I’d also protect the template from being destroyed
mid-reaction by adding agents that block the enzymes responsible for
degrading nucleic acids.
Reason 2: The energy or building blocks ran out.
Protein synthesis is energy-hungry, and a cell-free reaction
has a fixed starting supply. Once it is exhausted, the machinery stops,
even if everything else is fine. Similarly, if the amino acid pool gets
depleted partway through, the ribosomes stall. To troubleshoot this, I’d
make sure the reaction includes an energy regeneration system so the fuel
gets continuously recycled rather than just consumed, and I’d check that
all twenty amino acids are present and well-supplied throughout the
reaction.
Reason 3: The reaction environment isn’t right for this particular protein.
The chemical conditions inside the tube (things like salt balance and
pH) affect how well the machinery functions and whether the protein folds
correctly after being made. A protein that misfolds immediately gets
flagged and broken down, so even if translation is happening, the yield
of intact product stays low. I’d troubleshoot this by running a small
set of test reactions where I vary the buffer conditions slightly and see
which environment gives the best result for my specific protein, rather
than assuming the default conditions work for everything.
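The small set of test reactions for Reason 3 can be laid out as a grid of conditions. This is a minimal sketch: the Mg²⁺, K⁺, and pH values are hypothetical placeholders rather than recommended concentrations, and `screening_grid` is an illustrative helper.

```python
from itertools import product


def screening_grid(mg_mM, k_mM, ph):
    """Build every combination of the supplied buffer conditions."""
    return [
        {"Mg2+ (mM)": mg, "K+ (mM)": k, "pH": p}
        for mg, k, p in product(mg_mM, k_mM, ph)
    ]


# Hypothetical ranges; a real screen would be centred on the kit's defaults.
grid = screening_grid(mg_mM=[6, 8, 10, 12], k_mM=[60, 100, 140], ph=[7.4])

if __name__ == "__main__":
    for well, condition in enumerate(grid, start=1):
        print(well, condition)
```

Each dictionary corresponds to one reaction tube or plate well, so the yield readout can be recorded against the exact condition that produced it.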
Homework Question from Kate Adamala
1. Function
a. What would your synthetic cell do? What is the input and what is
the output?
My synthetic cell would act as a targeted antibiotic delivery vesicle
for treating antibiotic-resistant bacterial infections. The input is a
specific lipopolysaccharide (LPS) signature from a pathogenic gram-negative
bacterium (e.g. K. pneumoniae). The output is localized release of a
pore-forming peptide payload directly at the bacterial surface, lysing
the pathogen without systemic antibiotic exposure.
b. Could this function be realized by cell-free Tx/Tl alone, without
encapsulation?
No. Without encapsulation, there is no spatial specificity. The
pore-forming peptide would be released everywhere and would be toxic to
host cells as well. Encapsulation is what makes the delivery targeted:
the synthetic cell only releases its payload when it docks onto a
pathogen-specific surface signal.
c. Could this function be realized by a genetically modified natural
cell?
Not easily. A living cell programmed to lyse bacteria would face serious
immune clearance, regulatory hurdles, and the risk of horizontal gene
transfer to other organisms. A synthetic minimal cell is non-replicating,
non-living, and therefore much safer and more controllable.
d. Describe the desired outcome of your synthetic cell operation.
When the synthetic cell encounters a K. pneumoniae surface, an LPS-sensing
aptamer on the membrane surface triggers expression of a pore-forming
peptide (colistin mimetic) from the encapsulated Tx/Tl system. The peptide
inserts into the bacterial membrane, causing lysis specifically at the site
of infection, while host mammalian cells, which lack LPS, are untouched.
2. Component Design
a. What would the membrane be made of?
POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) as the main
structural lipid, supplemented with 30% cholesterol for membrane stability,
and 5% DSPE-PEG2000 for steric stabilization and extended circulation
time in biological fluids. The LPS-sensing aptamer would be conjugated to
DSPE-PEG-maleimide on the outer leaflet.
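For preparing the lipid film, the percentages above translate directly into molar amounts. The sketch below assumes the percentages are mol% and that POPC makes up the remaining 65 mol%; the 10 µmol total is an arbitrary example and `lipid_amounts` is a hypothetical helper.

```python
# Membrane composition in mol%; POPC is taken as the remainder after
# 30% cholesterol and 5% DSPE-PEG2000 (an assumption about the intended recipe).
COMPOSITION_MOL_PCT = {
    "POPC": 65.0,
    "cholesterol": 30.0,
    "DSPE-PEG2000": 5.0,
}


def lipid_amounts(total_umol: float, composition=COMPOSITION_MOL_PCT) -> dict:
    """Split a total lipid amount (micromoles) according to mol% composition."""
    if abs(sum(composition.values()) - 100.0) > 1e-9:
        raise ValueError("composition must sum to 100 mol%")
    return {name: total_umol * pct / 100.0 for name, pct in composition.items()}


if __name__ == "__main__":
    for lipid, umol in lipid_amounts(10.0).items():
        print(f"{lipid}: {umol:.2f} umol")
```

Converting these molar amounts into masses or stock volumes would additionally need each lipid's molecular weight and stock concentration, which are left out here.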
b. What would you encapsulate inside?
Bacterial cell-free Tx/Tl system (E. coli S30 extract)
Linear DNA template encoding the pore-forming peptide under a T7
promoter with an aptazyme riboswitch responsive to LPS
ATP regeneration mix (phosphocreatine + creatine kinase)
All 20 amino acids at standard PURE system concentrations
Mg²⁺ optimized to 8 mM
c. Which organism will your Tx/Tl system come from?
Bacterial (E. coli S30 extract) — this is sufficient because the trigger
is an aptazyme riboswitch, which works in bacterial Tx/Tl. No mammalian
promoter system is needed since I’m not using Tet-ON or similar
mammalian-specific inducible systems.
d. How will your synthetic cell communicate with the environment?
The LPS signal is detected by a surface-conjugated aptamer that, upon
binding, triggers local membrane destabilization — releasing the Tx/Tl
system contents or initiating fusion with the bacterial outer membrane.
The pore-forming peptide produced inside the synthetic cell is
hydrophobic enough to insert directly into the adjacent bacterial membrane
upon release, without needing a dedicated membrane channel for export.
3. Experimental Details
a. List all lipids and genes:
Lipids:
POPC (main bilayer)
Cholesterol (30 mol%)
DSPE-PEG2000-maleimide (5 mol%, for aptamer conjugation)
Genes:
Pore-forming peptide gene: synthetic codon-optimized gene encoding
Magainin-2 (a well-characterized antimicrobial peptide) under T7
promoter, with an LPS-responsive aptazyme (based on the
OxyS aptazyme scaffold) in the 5’ UTR
T7 RNA polymerase gene: for transcription of the peptide gene inside
the vesicle
Aptamer: LPS-binding aptamer sequence (Johnson et al., 2008, derived from
SELEX against LPS from E. coli O111:B4) conjugated to DSPE-PEG-maleimide
via thiol chemistry.
b. How will you measure the function of your system?
Primary readout: mix synthetic cells with K. pneumoniae in liquid culture
and measure optical density at 600 nm over 6 hours; a drop in OD600
indicates bacterial lysis. Secondary readout: add SYTOX Green (a
membrane-impermeant DNA dye) to the co-culture. If bacteria are lysed, SYTOX
enters and fluorescence increases, which can be quantified by plate reader
or flow cytometry.
Homework Question from Peter Nguyen
Field chosen: Architecture
One-sentence pitch:
A building facade material embedded with dormant slime mould networks
and freeze-dried cell-free reporters that together map and visually
display real-time moisture stress, structural load distribution, and
ventilation dead zones across a building’s surface, thereby turning the wall
itself into a living diagnostic instrument.
How it works:
Slime mould (Physarum polycephalum) is a remarkable organism that
naturally grows its network along paths of least resistance, optimises
for efficient transport between nodes, and retreats from dry or
chemically hostile zones. These are exactly the same problems a building
faces: where is moisture accumulating behind cladding? Where are thermal
bridges concentrating stress? Where is air circulation failing?
The material would work in two layers. The first is a slime mould
network layer: a thin hydrogel matrix embedded in the interior face
of a facade panel, seeded with dormant freeze-dried Physarum. When
humidity inside the wall cavity rises above a threshold (indicating
moisture ingress, condensation, or a failing vapour barrier), the
slime mould rehydrates and begins growing. Because Physarum
preferentially colonises humid corridors and avoids dry zones, its
network topology after 24–48 hours of growth literally traces the
moisture distribution map of that wall section — the densest growth
appears where the problem is worst.
The second layer is a freeze-dried cell-free biosensor layer sitting
just inside the visible surface of the panel. As the slime mould
network grows, it releases metabolic byproducts, specifically
extracellular ATP and changes in local pH, that diffuse into the
cell-free layer. These chemical signals activate a riboswitch in the
encapsulated Tx/Tl system, driving expression of a pigment or
structural protein that causes a visible color shift on the panel’s
surface. The wall literally marks its own problem zones in a colour
visible from outside, without any wiring, sensors, or power supply.
When the moisture problem is resolved and the wall dries out, Physarum
desiccates back into its dormant sclerotium state, the cell-free reaction
stops (no more trigger signal), and the panel resets, ready to respond
again if the problem returns. Multiple panels across a facade create a
distributed, self-reporting moisture map of the entire building skin.
Societal challenge addressed:
Hidden moisture damage is one of the most expensive and dangerous
failure modes in construction. It causes structural rot, mould growth,
and insulation failure, and it is almost always detected too late because
it is invisible until the damage is severe. Current monitoring requires
either invasive physical inspection or expensive embedded electronic
sensor networks that need power, maintenance, and replacement. A
passive biological system that self-activates, self-maps, and self-resets
would give architects and building managers a continuous, maintenance-free
diagnostic layer in the fabric of the building itself. This is particularly
valuable in social housing, schools, and infrastructure in lower-resource
settings where sensor networks are not economically viable.
Addressing cell-free limitations:
The one-time-use limitation is turned into a feature here. Each
activation event corresponds to a real moisture event, and the system
resetting when conditions improve means the panel is always ready for
the next event rather than giving a permanent false positive. Stability
is handled by the slime mould’s own biology: when dry, it naturally
forms desiccation-resistant sclerotia, which can survive for years
without nutrients, and the freeze-dried cell-free layer sits dormant in
the same conditions. Activation is not by externally added water but by
the building’s own pathological moisture. The system only triggers when
there is a genuine problem, not from rain on the outer surface or ambient
humidity fluctuations. The spatial resolution of the diagnostic comes for
free from Physarum’s network growth dynamics.
Homework Question from Ally Huang
Using BioBits® Cell-Free Protein Expression System
1. Background
Astronauts on long-duration missions experience significant immune
dysregulation, including reduced lymphocyte function and increased
susceptibility to latent viral reactivation. In space, standard
laboratory-based immune monitoring is completely out of reach. Early
detection of immune stress markers is critical for crew health,
especially on future Mars missions where communication delays make
real-time Earth-based medical support impossible. A lightweight,
freeze-dried diagnostic system that can be activated on demand would
directly address this gap.
2. Molecular Target
Interleukin-6 (IL-6) mRNA — an early biomarker of systemic immune
activation, inflammation, and viral reactivation in astronauts.
3. How the target relates to the challenge
IL-6 spikes within hours of infection or physiological stress and has
been documented at elevated levels in astronaut blood samples linked to
latent herpesvirus reactivation during ISS missions. Detecting IL-6 mRNA
using a cell-free toehold switch biosensor gives real-time immune status
information without cold-chain reagents, trained personnel, or
centrifuges.
4. Hypothesis
I hypothesize that a freeze-dried BioBits cell-free expression system
programmed with an IL-6 mRNA-responsive toehold switch will reliably
detect elevated IL-6 transcript levels aboard the ISS, producing a
fluorescent output measurable by the P51 Molecular Fluorescence Viewer.
The toehold switch keeps the ribosome binding site sequestered in a
hairpin until the target IL-6 mRNA binds and unfolds it, triggering
translation of sfGFP. A visible fluorescence signal indicates immune
activation. The system will be validated against known IL-6 concentration
standards before flight.
5. Experimental Plan
Freeze-dried BioBits pellets will be rehydrated with a small whole blood
lysate sample from crew members at pre-flight, mid-mission, and
post-flight timepoints. The miniPCR thermal cycler will maintain isothermal
incubation conditions, and fluorescence will be read on the P51 viewer.
Controls include a synthetic IL-6 mRNA positive control and a buffer-only
negative control. Fluorescence presence or absence relative to a set
threshold identifies immune activation events across mission timepoints.
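The threshold call in the experimental plan can be sketched in a few lines. This is only an illustration: it assumes the fluorescence is quantified as a number (for example from a photo of the tubes), uses the common "negative-control mean plus three standard deviations" convention as the threshold, and all readings below are hypothetical values, not data.

```python
from statistics import mean, stdev

def detection_threshold(negative_controls, k=3.0):
    """Threshold = mean of the buffer-only controls + k standard deviations.

    The k = 3 convention is an assumption, not part of the plan above.
    """
    return mean(negative_controls) + k * stdev(negative_controls)

def call_activation(reading, threshold):
    """True if a sample's fluorescence exceeds the threshold."""
    return reading > threshold

# Hypothetical fluorescence readings in arbitrary units.
negative_controls = [100.0, 104.0, 96.0]
threshold = detection_threshold(negative_controls)

samples = {"pre-flight": 105.0, "mid-mission": 150.0, "post-flight": 110.0}
for timepoint, reading in samples.items():
    status = "immune activation" if call_activation(reading, threshold) else "baseline"
    print(f"{timepoint}: {status}")
```

The synthetic IL-6 mRNA positive control would be checked the same way: it should always be called as activated, otherwise the run is invalid.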
Week 10 HW: Imaging and Measurement
Waters Part I Molecular Weight
Question 1: Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight?
Using the ExPASy Compute pI/Mw tool with the provided eGFP sequence:
Theoretical MW = 28,006.60 Da
Question 2: Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation.
Question 3: Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?
No, the charge state cannot be determined from the zoomed-in peak. Determining the charge state requires at least two adjacent charge-state peaks so their spacing can be used to calculate $z$. In the zoomed region, only a single isolated peak is shown with no neighboring charge-state peak visible, so there is insufficient information to assign a charge state.
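To make the two-peak requirement concrete, here is a minimal sketch of the adjacent charge state calculation. It assumes positive-mode ions of the form [M+zH]^z+ and uses synthetic peaks generated from the theoretical MW above; the charge states 20 and 21 are hypothetical and were not read from the spectrum.

```python
PROTON = 1.0073  # proton mass in Da, rounded

def mz_for(mass, z):
    """m/z of an ion with neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z

def adjacent_charge_state(mz_high, mz_low):
    """Return (z, neutral mass) from two adjacent charge-state peaks.

    mz_high is the peak at higher m/z (charge z); mz_low is its
    neighbour at lower m/z (charge z + 1). From
        mz_high = M/z + H   and   mz_low = M/(z + 1) + H
    it follows that z = (mz_low - H) / (mz_high - mz_low).
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass

# Synthetic peaks built from the theoretical MW, NOT measured values.
theoretical_mw = 28006.60
peak_a = mz_for(theoretical_mw, 20)
peak_b = mz_for(theoretical_mw, 21)

z, mass = adjacent_charge_state(peak_a, peak_b)
print(z, round(mass, 2))
```

With a single isolated peak there is no `mz_low` to pair with `mz_high`, which is why the zoomed-in view alone cannot give the charge state this way.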
Waters Part III — Peptide Mapping (Primary Structure)
Q1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above.
There are 20 Lysines (K) and 6 Arginines (R) in the eGFP sequence, for a total of 26 cleavage sites.
Q2. How many peptides will be generated from tryptic digestion of eGFP?
Using the PeptideMass tool at https://web.expasy.org/peptide_mass/ with the eGFP sequence, trypsin as enzyme, no missed cleavages, and the parameters shown in Figure 4, the tool generates 19 tryptic peptides.
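The counting behind Q1 and Q2 can be sketched with a simplified in-silico digest. This assumes the usual simplified trypsin rule (cleave after K or R, but not when the next residue is P) and no missed cleavages; the sequence below is a hypothetical toy example, not eGFP, and the function names are illustrative only. A tool may report fewer peptides than this rule produces, for example if very small fragments fall below its reporting mass cutoff.

```python
def count_k_r(seq):
    """Return the number of lysines (K) and arginines (R) in a sequence."""
    return seq.count("K"), seq.count("R")

def tryptic_digest(seq):
    """Split a sequence after every K or R that is not followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # C-terminal peptide that does not end in K or R
        peptides.append(seq[start:])
    return peptides

# Hypothetical toy sequence for illustration (NOT the eGFP sequence).
toy = "MAKRPGKLLR"
print(count_k_r(toy))
print(tryptic_digest(toy))
```

In the toy sequence the R at position 4 is followed by P, so it is counted as an arginine but does not produce a cleavage.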
Q3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.
There are around 19 peaks between 0.5 and 6 minutes.
Q4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?
Counting all peaks, the total would be 22, which is more than the 19 predicted peptides.
Q5. Identify the mass-to-charge of the peptide shown in Figure 5b. What is the charge of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state)? Calculate the mass of the singly charged form of the peptide ([M+H]⁺) based on its m/z and z.
The highlighted peak shows the most abundant isotope at m/z = 2.78 (the apex of the green-circled envelope).
In TOF-MS, the isotope spacing reveals the charge state. The relationship is:
$$\text{Isotope spacing} = \frac{1}{z}$$
Looking at the isotope pattern around the peak at retention time ~2.7 min, you can see fine structure. The isotope spacing (distance between consecutive 13C isotopes) is approximately 0.33-0.35 m/z units.
$$z = \frac{1}{\text{isotope spacing}} = \frac{1}{0.33} \approx 3$$
The charge state is z = 3 (triply charged ion, [M+3H]³⁺)
Using the relationship between observed m/z, charge state, and molecular mass:
$$m/z = \frac{M + nH}{z}$$
Rearranging to solve for M:
$$M = (m/z \times z) - nH$$
Where n = z (number of protons added). For z = 3:
$$M = (2.78 \times 3) - (3 \times 1.0073) = 8.34 - 3.0219 = 5.3181 \text{ kDa} \approx 5.32 \text{ kDa}$$
The singly charged form [M+H]⁺ would have m/z equal to the neutral mass plus one proton:
$$[M+H]^+ = 5318.1 + 1.0073 = 5319.1 \text{ m/z} \approx 5319 \text{ Da}$$
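The same three steps (isotope spacing to charge, charge to neutral mass, neutral mass to [M+H]⁺) can be written as a short sketch. The isotope peak positions below are hypothetical values chosen for illustration, not readings from Figure 5b, and all quantities are kept in Da throughout.

```python
PROTON = 1.0073  # proton mass in Da, rounded as in the calculation above

def charge_from_isotope_spacing(spacing):
    """Charge state from the m/z gap between neighbouring isotope peaks (~1/z)."""
    return round(1.0 / spacing)

def neutral_mass(mz, z):
    """Neutral mass M from m/z = (M + z*H) / z."""
    return mz * z - z * PROTON

def singly_charged_mz(mz, z):
    """m/z of the [M+H]+ form of an ion observed at `mz` with charge z."""
    return neutral_mass(mz, z) + PROTON

# Hypothetical isotope peaks (m/z), NOT taken from the lab data.
isotope_peaks = [500.2500, 500.5833]
spacing = isotope_peaks[1] - isotope_peaks[0]

z = charge_from_isotope_spacing(spacing)
mh = singly_charged_mz(isotope_peaks[0], z)
print(z, round(mh, 4))
```

Keeping every input in the same unit (m/z in Th, masses in Da) makes it easy to check that the [M+H]⁺ value is comparable to the masses listed by the PeptideMass tool.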
Q6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is the mass accuracy of the measurement? Please calculate the error in ppm.
Observed peptide mass (MW_experiment): 1050.52438 Da
Theoretical peptide mass (MW_theory): 1050.5214 Da
The mass accuracy is calculated using the absolute difference between experimental and theoretical mass values, normalized to the theoretical mass: