Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about. I have worked with the concept of CA before within design and 3d space generative making through creating tools for generating patterns and environments, so it was really fascinating to see it being brought up during class. So, for my idea I’d like to merge my previous digital experience with CA and synthetic biology tooling in a form of a computer aided design tool for spatial synthetic biology
Week 2 HW: DNA Read, Write, & Edit
Part 1: Benchling & In-silico Gel Art Here’s my version of made in Benchling pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks :) .. It’s a frog!!! https://benchling.com/s/seq-E1w2i5oXtxnqS8BWnwcZ?m=slm-mz29OlZAWUwDeJTn4nir I also played around separately to that drawing with Ronan’s tool, and this is what I got ( I wanted to go for something more abstract this time)
Week 3 HW: Principles and Practices
For this lab, I made a little bunny drawing! I used Ronan’s tool as a base and then fixed it to work with colab environment and colours that we have in our node. from opentrons import types import math metadata = { 'protocolName': 'Bunny', 'author': 'HTGAA', 'source': 'HTGAA 2026', 'apiLevel': '2.20' } # Coordinate Data mkate2_points = [(12.1, 29.7),(14.3, 29.7),(16.5, 29.7),(18.7, 29.7),(20.9, 29.7),(23.1, 29.7),(9.9, 27.5),(12.1, 27.5),(16.5, 27.5),(18.7, 27.5),(20.9, 27.5),(23.1, 27.5),(-1.1, 25.3),(1.1, 25.3),(3.3, 25.3),(5.5, 25.3),(7.7, 25.3),(20.9, 25.3),(23.1, 25.3),(-3.3, 23.1),(-1.1, 23.1),(7.7, 23.1),(14.3, 23.1),(20.9, 23.1),(25.3, 23.1),(27.5, 23.1),(-7.7, 20.9),(14.3, 20.9),(20.9, 20.9),(29.7, 20.9),(12.1, 18.7),(16.5, 18.7),(18.7, 18.7),(29.7, 18.7),(-12.1, 16.5),(12.1, 16.5),(14.3, 16.5),(16.5, 16.5),(18.7, 16.5),(20.9, 16.5),(23.1, 16.5),(25.3, 16.5),(27.5, 16.5),(-14.3, 14.3),(12.1, 14.3),(16.5, 14.3),(18.7, 14.3),(25.3, 14.3),(-14.3, 12.1),(27.5, 12.1),(-16.5, 9.9),(27.5, 9.9),(-16.5, 7.7),(29.7, 7.7),(-16.5, 5.5),(-14.3, 5.5),(29.7, 5.5),(-16.5, 3.3),(-14.3, 3.3),(31.9, 3.3),(-14.3, 1.1),(-12.1, 1.1),(31.9, 1.1),(-12.1, -3.3),(34.1, -3.3),(-16.5, -5.5),(-14.3, -5.5),(34.1, -5.5),(-27.5, -7.7),(-25.3, -7.7),(-23.1, -7.7),(-18.7, -7.7),(-16.5, -7.7),(34.1, -7.7),(-27.5, -9.9),(-23.1, -9.9),(-20.9, -9.9),(-18.7, -9.9),(-16.5, -9.9),(-7.7, -9.9),(-5.5, -9.9),(34.1, -9.9),(-27.5, -12.1),(-18.7, -12.1),(-16.5, -12.1),(-14.3, -12.1),(-12.1, -12.1),(-9.9, -12.1),(-7.7, -12.1),(-5.5, -12.1),(34.1, -12.1),(-25.3, -14.3),(-16.5, -14.3),(-14.3, -14.3),(-12.1, -14.3),(-29.7, -16.5),(-27.5, -16.5),(-25.3, -16.5),(-23.1, -16.5),(-12.1, -16.5),(-9.9, -16.5),(31.9, -16.5),(-29.7, -18.7),(-20.9, -18.7),(-29.7, -20.9),(-18.7, -20.9),(-16.5, -20.9),(-14.3, -20.9),(-12.1, -20.9),(27.5, -20.9),(-27.5, -23.1),(-12.1, -23.1),(25.3, -23.1),(27.5, -23.1),(-27.5, -25.3),(-25.3, -25.3),(25.3, -25.3),(-23.1, -27.5),(-20.9, -27.5),(-18.7, -27.5),(-16.5, -27.5),(1.1, -27.5),(3.3, -27.5),(5.5, -27.5),(7.7, -27.5),(23.1, -27.5),(-14.3, -29.7),(-12.1, -29.7),(-9.9, -29.7),(-7.7, -29.7),(-3.3, -29.7),(5.5, -29.7),(7.7, -29.7),(9.9, -29.7),(12.1, -29.7),(14.3, -29.7),(18.7, -29.7),(20.9, -29.7)] mcerulean3_points = [(14.3, 25.3),(27.5, 20.9),(23.1, 18.7),(25.3, 18.7),(27.5, 18.7),(-5.5, 12.1),(-7.7, 9.9),(-5.5, 9.9)] TIP_RACK_DECK_SLOT = 9 COLORS_DECK_SLOT = 6 AGAR_DECK_SLOT = 5 PIPETTE_STARTING_TIP_WELL = 'A1' well_colors = { 'A1' : 'Purple', 'B1' : 'Pink', 'C3' : 'Cyan' } def run(protocol): tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips') pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul]) temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT) color_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate') agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate') center_location = agar_plate['A1'].top() pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL) def location_of_color(color_string): for well,color in well_colors.items(): if color.lower() == color_string.lower(): return color_plate[well] raise ValueError(f"No well found with color {color_string}") def dispense_and_detach(pipette, volume, location): above_location = location.move(types.Point(z=location.point.z + 5)) pipette.move_to(above_location) pipette.dispense(volume, location) pipette.move_to(above_location) drop_vol = 0.5 # Draw mcerulean3_points using Pink pipette_20ul.pick_up_tip() pipette_20ul.aspirate(len(mcerulean3_points) * drop_vol, location_of_color('Pink')) for x, y in mcerulean3_points: dispense_and_detach(pipette_20ul, drop_vol, center_location.move(types.Point(x, y))) pipette_20ul.drop_tip() # Draw mkate2_points using cornflowerblue pipette_20ul.pick_up_tip() num_batches = math.ceil((len(mkate2_points) * drop_vol) / 18) points_per_batch = math.floor(len(mkate2_points) / num_batches) for b in range(num_batches): start = b * points_per_batch end = min((b + 1) * points_per_batch, len(mkate2_points)) current_batch = mkate2_points[start:end] pipette_20ul.aspirate(len(current_batch) * drop_vol, location_of_color('Cyan')) for x, y in current_batch: dispense_and_detach(pipette_20ul, drop_vol, center_location.move(types.Point(x, y))) pipette_20ul.drop_tip() Post-Lab Questions Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
Week 4 HW: Protein Design Part I
Part A. Conceptual Questions 1.How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) 500g of meat = ~20% protein 100g ÷ 110 g/mol ≈ 0.91 mol 0.91 × 6.022 × 10²³ ≈ 5.5 × 10²³ molecules* Why are there only 20 natural amino acids?
Week 5 HW: Protein Design Part II
Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ (taken from https://rest.uniprot.org/uniprotkb/P00441.fasta ) -> muted form A-> V MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ (2-5) # Binder Pseudo Perplexity Notes 0 WHYGAAQAAHWX 7.60026377248636 High confidence 1 WLYGASAAAWKK 7.46473740432208 Highest confidence!!! 2 WLYGAAGVAWKE 10.9325804754158 Moderate confidence 3 WLYYPQAAKLKK 15.5499787120909 Lowest confidence — FLYRWLPSRRGG 20.9180890005569 Known binder (control) Part 2: Evaluate Binders with AlphaFold3
Week 6 HW: Genetic Circuits Part I: Assembly Technologies
Assignment: DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Based on Protocol for Phusion™ High-Fidelity PCR Master Mix with GC Buffer (https://www.neb.com/en-gb/protocols/protocol-for-phusion-high-fidelity-pcr-master-mix-with-gc-buffer-m0532) Phusion DNA Polymerase - a high-fidelity enzyme that synthesises new DNA strands with 3’→5’ exonuclease activity for proofreading dNTPs - building blocks for DNA synthesis (200 µM each at 1X concentration) MgCl₂ - Cofactor required for polymerase activity (1.5 mM at 1X concentration) Possibly optional as the question doesn’t mention it (GC Buffer - optimised buffer for amplifying difficult templates with high GC content or secondary structure) DMSO (optional additive) - helps denature secondary structures in GC-rich or difficult templates (recommended at 3% final concentration)
Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Analogue quality of computations (not just on and off, but rather treat them more like waves ->easier to work with) -> Thresholds finetuning works You can also work with more inputs simultaneously that will automatically become a weighted sum Which also means they can be trained like neural networks! whereas Boolean functions are just True/False with operators AND, OR, and NOT
Week 9 HW: Cell-Free Systems
Homework Part A: General and Lecturer-Specific Questions General homework questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. As the reaction happens outside living cells, you have much more flexibility and control over experimental variables!
Week 10 HW: Advanced Imaging & Measurement Technology
Final Project Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. molecular validation (to confirm whether the Twist plasmids are correct and transformed properly) through the presence of the sender and receiver plasmid, correct insert size and sequence identity, then successful bacterial transformation functional signalling validation / AHL sensing ability through receiver activation threshold, GFP expression strength, dose-response curve, signal saturation spatial pattern formation to measure whether LuxI actually produces AHL through sender-generated AHL activity, effective AHL equivalent concentration, and hopefully signalling consistency between cultures ML/CAD dataset features through OT-2 plate maps, fluorescence reads and metadata logs Please describe all of the elements you would like to measure, and furthermore, describe how you will perform these measurements.
Week 11 HW: Bioproduction & Cloud Labs
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork At this point, I’m not entirely sure what I’ve contributed to, as the artwork changed quite a lot. I’ve added a few yellow and green pixels when we first got access to the board, but as time went it changed so much that I don’t think any of them were left in the same spots. I did like the collaborative aspect of it, but going forward, I kind of wish everyone could contribute to just one/two pixels that couldn’t be overwritten. That would make the pointing-out aspect of it so much easier
Week 12 HW: Building Genomes
See week 11~

Week 1 HW: Principles and Practices

First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

I have worked with the concept of CA before within design and 3d space generative making through creating tools for generating patterns and environments, so it was really fascinating to see it being brought up during class. So, for my idea I’d like to merge my previous digital experience with CA and synthetic biology tooling in a form of a computer aided design tool for spatial synthetic biology

Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

Ensuring biosafety:

a) Possibility of the prediction of biological behaviour

b) Testable behaviour (not just false confidence in whatever is happening, both for a and b!)

c) safety protocols built in within the ux/ui of the software

d) training provided

Transparency whilst preventing harm:

a) Maintenance of accountability over the created projects (as in not fully automated)

b) Responsible use

c) (Ideally!) some sort of encouragement of socially beneficial applications

Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).
1. Purpose: What is done now and what changes are you proposing?
2. Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
3. Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
4. Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	2	1	2
• By helping respond	3	2	2
Foster Lab Safety
• By preventing incident	1	1	2
• By helping respond	2	1	3
Protect the environment
• By preventing incidents	3	2	1
• By helping respond	2	2	1
Other considerations
• Minimizing costs and burdens to stakeholders	3	2	1
• Feasibility?	2	2	1
• Not impede research	2	3	1
• Promote constructive applications	1	1	2

Homework Questions from Professor Jacobson: [Lecture 2 slides]

1)Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase?

1:10⁶

How does this compare to the length of the human genome?

approximately 3.2 Gbp

How does biology deal with that discrepancy?

Through 5’-3’ error-correcting exonuclease and 3’-5’ proofreading exonucleas + there is also MutS Repair System

2)How many different ways are there to code (DNA nucleotide code) for an average human protein?

2³⁴⁵ or higher (as there is 1,036 bp and 345 amino acids)

In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

because of the secondary structure formation
RNA cleavage
mRNA instability
different organisms might have preferences for certain codons

Homework Questions from Dr. LeProust: [Lecture 2 slides]

1.What’s the most commonly used method for oligo synthesis currently?

The phosphoramidite method

2.Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because of the 1)error accumulation, 2)”enhanced chemistry” is required

3.Why can’t you make a 2000bp gene via direct oligo synthesis?

Error accumulation and length limitations > assembly approach is required

Homework Question from George Church: [Lecture 2 slides]

[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

A protein–protein interaction code

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

Here’s my version of made in Benchling pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks :)

.. It’s a frog!!!

https://benchling.com/s/seq-E1w2i5oXtxnqS8BWnwcZ?m=slm-mz29OlZAWUwDeJTn4nir

I also played around separately to that drawing with Ronan’s tool, and this is what I got ( I wanted to go for something more abstract this time)

Enzyme	Number of Cuts	Number of Fragments	Fragment Sizes (bp)
EcoRI	5	6	21,226 / 7,421 / 5,804 / 5,643 / 4,878 / 3,530
HindIII	6	7	9,416 / 6,682 / 4,361 / 3,130 / 2,322 / 2,027 / 564
BamHI	5	6	16,841 / 7,233 / 6,770 / 6,527 / 5,626 / 5,505
KpnI	2	3	29,942 / 17,057 / 1,503
EcoRV	21	22	5,765 / 5,376 / 4,613 / 3,873 / 3,744 / 3,595 / 2,884 / 2,674 / 1,921 / 1,679 / 1,434 / 1,403 / 1,377 / 1,313 / 738 / 655 / 618 / 597 / 588 / 268 / 52 / 35
SacI	2	3	24,776 / 22,621 / 1,105
SalI	2	3	32,745 / 15,258 / 499

Restriction Enzymes Used

EcoRI
EcoRV
HindIII
KpnI
BamHI
SacI
SalI

Restriction Digest Setup

Lane	Water	CutSmart Buffer	λ DNA	Enzyme(s)
M (Ladder)	14 μL	2 μL	3 μL	-
1	13 μL	2 μL	3 μL	1 μL EcoRI
2	13 μL	2 μL	3 μL	1 μL KpnI + 1 μL BamHI
3	14 μL	2 μL	3 μL	1 μL EcoRI + 1 μL HindIII
4	14 μL	2 μL	3 μL	1 μL EcoRV
5	13 μL	2 μL	3 μL	1 μL EcoRI + 1 μL KpnI
6	13 μL	2 μL	3 μL	1 μL SacI + 1 μL HindIII
7	13 μL	2 μL	3 μL	1 μL SacI + 1 μL SacI
8	13 μL	2 μL	3 μL	1 μL SalI + 1 μL KpnI
9	14 μL	2 μL	3 μL	1 μL SacI + 1 μL SacI
10	13 μL	2 μL	3 μL	1 μL HindIII + 1 μL SacI

Total volume per tube: 20 μL

Restriction Digest Parameters

Incubation: 37°C for 60 minutes
Heat Inactivation (optional): 80°C for 20 minutes

DNA Gel Electrophoresis

Goal: 100-150 ng of DNA per lane

Hand-Cast Gel Protocol

Digest Sample:

14.7 μL Water
3.3 μL Loading Dye
2 μL Digest
Total: 20 μL

Ladder:

6.6 μL Water
3.3 μL Loading Dye
10 μL Ladder (15 ng/μL stock)
Total: 20 μL

E-Gel Protocol

Digest Sample:

18 μL Water
2 μL Digest
Total: 20 μL

Ladder:

10 μL Water
10 μL Ladder (15 ng/μL stock)
Total: 20 μL

Part 3: DNA Design Challenge

3.1 For this task, I’d like to use green fluorescent protein also known as GFP as it is, firstly, safe and pretty well characterised (as described here! https://pubmed.ncbi.nlm.nih.gov/8303295/), essential for CA mechanics to be working (https://www.science.org/doi/10.1126/science.abb8205) and just generally directly connects to tool making (https://www.nature.com/articles/s41467-024-53078-8)

sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1 MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPT LVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDT LVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQ LADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

3.2 Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

based on this tool https://www.bioinformatics.org/sms2/rev_trans.html

reverse translation of sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1 to a 714 base sequence of consensus codons. atgwsnaarggngargarytnttyacnggngtngtnccnathytngtngarytngayggn gaygtnaayggncayaarttywsngtnwsnggngarggngarggngaygcnacntayggn aarytnacnytnaarttyathtgyacnacnggnaarytnccngtnccntggccnacnytn gtnacnacnttywsntayggngtncartgyttywsnmgntayccngaycayatgaarcar caygayttyttyaarwsngcnatgccngarggntaygtncargarmgnacnathttytty aargaygayggnaaytayaaracnmgngcngargtnaarttygarggngayacnytngtn aaymgnathgarytnaarggnathgayttyaargargayggnaayathytnggncayaar ytngartayaaytayaaywsncayaaygtntayathatggcngayaarcaraaraayggn athaargtnaayttyaarathmgncayaayathgargayggnwsngtncarytngcngay caytaycarcaraayacnccnathggngayggnccngtnytnytnccngayaaycaytay ytnwsnacncarwsngcnytnwsnaargayccnaaygaraarmgngaycayatggtnytn ytngarttygtnacngcngcnggnathacncayggnatggaygarytntayaar

3.3. Codon optimization.

based on this tool - https://en.vectorbuilder.com/tool/codon-optimization/51ebfb2e-a00a-4190-9fe9-c1578d3ecfbe.html

Pasted Sequence: GC=48.60%, CAI=1.00

ATGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAACGGCCATAAATTTAGCGTGAGCGGCGAAGGCGAAGGCGATGCGACCTATGGCAAACTGACCCTGAAATTTATTTGCACCACCGGCAAACTGCCGGTGCCGTGGCCGACCCTGGTGACCACCTTTAGCTATGGCGTGCAGTGCTTTAGCCGCTATCCGGATCATATGAAACAGCATGATTTTTTTAAAAGCGCGATGCCGGAAGGCTATGTGCAGGAACGCACCATTTTTTTTAAAGATGATGGCAACTATAAAACCCGCGCGGAAGTGAAATTTGAAGGCGATACCCTGGTGAACCGCATTGAACTGAAAGGCATTGATTTTAAAGAAGATGGCAACATTCTGGGCCATAAACTGGAATATAACTATAACAGCCATAACGTGTATATTATGGCGGATAAACAGAAAAACGGCATTAAAGTGAACTTTAAAATTCGCCATAACATTGAAGATGGCAGCGTGCAGCTGGCGGATCATTATCAGCAGAACACCCCGATTGGCGATGGCCCGGTGCTGCTGCCGGATAACCATTATCTGAGCACCCAGAGCGCGCTGAGCAAAGATCCGAACGAAAAACGCGATCATATGGTGCTGCTGGAATTTGTGACCGCGGCGGGCATTACCCATGGCATGGATGAACTGTATAAA

Improved DNA[1]: GC=48.88%, CAI=0.97

ATGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCGATGTGAATGGCCATAAATTTAGCGTGAGCGGCGAAGGTGAAGGCGATGCGACCTATGGCAAACTGACCCTGAAATTTATCTGCACCACCGGTAAACTGCCGGTGCCGTGGCCGACCCTGGTGACCACCTTCAGCTACGGCGTGCAGTGTTTTAGCCGCTACCCGGATCATATGAAACAGCATGATTTTTTTAAAAGCGCGATGCCGGAAGGCTATGTGCAGGAACGCACCATTTTTTTCAAAGATGATGGCAATTACAAAACCCGTGCCGAAGTGAAATTCGAAGGCGATACCCTGGTGAATCGCATTGAACTGAAAGGCATTGATTTTAAAGAAGATGGTAACATTCTGGGCCACAAACTGGAATACAACTATAACAGCCATAACGTGTACATTATGGCGGATAAACAGAAAAATGGCATTAAAGTGAACTTTAAAATTCGCCATAACATTGAAGATGGCTCAGTGCAGCTGGCGGATCACTATCAGCAGAACACCCCGATTGGCGATGGCCCGGTTCTGCTGCCGGATAACCACTATCTGAGCACCCAGAGCGCGCTGTCGAAAGATCCGAACGAAAAACGCGATCACATGGTGCTGCTGGAATTTGTGACCGCCGCGGGCATCACCCATGGTATGGATGAACTGTATAAA Avoid cleavage sites of restriction enzymes: BbsI BsaI

It’s done to get a more reliable and consistent protein expression!

I chose Escherichia coli K-12 substr. MG1655 because it’s standard and saf to work with + GFP expression is well established within it

3.4 If it’s a cell-dependent method the DNA sequence can be transcribed and translated into my protein through these steps: 1. Transformation 2.Transcription 3.Translation 4.Protein Folding 5.Fluorescence

Part 4: Prepare a Twist DNA Synthesis Order

https://benchling.com/s/seq-TYyZiuVgAJUcWN179kJj?m=slm-SR2kt8YFMq4ngkmbdM2Q

Afterwards in Twist I got a bunch of errors when importing as when I was making a sequence a copy pasted an article “a"into it as it was a part of it. It is also visible here as 931 is not divisible by 3 fixed version!> how it works>

final link https://benchling.com/s/seq-jNpnBifWoDGPy7RLJoHu?m=slm-nUfWbFSbXctEauGsTTaq

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

So far some sort of plasmids encoding CA-based pattern formation circuits in bacterial populations to be able to design patterns with them for the final project tool

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Probably something like Illumina as it is pretty commonly used which means it will be easier to find

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

GFP-based circuits so I could prototype programmable living pattern systems

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

PCR or Gibson assembly to do precise construction of my custom genetic circuit

And then in terms of computer based tech Benchling and Twist

5.3 DNA Edit

(i) What DNA would you want to edit and why?

So far based on my research for this project E. coli K-12 MG1655as it is safe, well-documented, accessible and also works amazing with my project

(ii) What technology or technologies would you use to perform these DNA edits and why?

I’m not fully sure what’s possible to use for the projects yet but if I can CRISPR-Cas9 as it can precisely edit genetic circuits

Week 3 HW: Principles and Practices

For this lab, I made a little bunny drawing! I used Ronan’s tool as a base and then fixed it to work with colab environment and colours that we have in our node.

    from opentrons import types
    import math
    
    metadata = {
        'protocolName': 'Bunny',
        'author': 'HTGAA',
        'source': 'HTGAA 2026',
        'apiLevel': '2.20'
    }
    
    # Coordinate Data
    mkate2_points = [(12.1, 29.7),(14.3, 29.7),(16.5, 29.7),(18.7, 29.7),(20.9, 29.7),(23.1, 29.7),(9.9, 27.5),(12.1, 27.5),(16.5, 27.5),(18.7, 27.5),(20.9, 27.5),(23.1, 27.5),(-1.1, 25.3),(1.1, 25.3),(3.3, 25.3),(5.5, 25.3),(7.7, 25.3),(20.9, 25.3),(23.1, 25.3),(-3.3, 23.1),(-1.1, 23.1),(7.7, 23.1),(14.3, 23.1),(20.9, 23.1),(25.3, 23.1),(27.5, 23.1),(-7.7, 20.9),(14.3, 20.9),(20.9, 20.9),(29.7, 20.9),(12.1, 18.7),(16.5, 18.7),(18.7, 18.7),(29.7, 18.7),(-12.1, 16.5),(12.1, 16.5),(14.3, 16.5),(16.5, 16.5),(18.7, 16.5),(20.9, 16.5),(23.1, 16.5),(25.3, 16.5),(27.5, 16.5),(-14.3, 14.3),(12.1, 14.3),(16.5, 14.3),(18.7, 14.3),(25.3, 14.3),(-14.3, 12.1),(27.5, 12.1),(-16.5, 9.9),(27.5, 9.9),(-16.5, 7.7),(29.7, 7.7),(-16.5, 5.5),(-14.3, 5.5),(29.7, 5.5),(-16.5, 3.3),(-14.3, 3.3),(31.9, 3.3),(-14.3, 1.1),(-12.1, 1.1),(31.9, 1.1),(-12.1, -3.3),(34.1, -3.3),(-16.5, -5.5),(-14.3, -5.5),(34.1, -5.5),(-27.5, -7.7),(-25.3, -7.7),(-23.1, -7.7),(-18.7, -7.7),(-16.5, -7.7),(34.1, -7.7),(-27.5, -9.9),(-23.1, -9.9),(-20.9, -9.9),(-18.7, -9.9),(-16.5, -9.9),(-7.7, -9.9),(-5.5, -9.9),(34.1, -9.9),(-27.5, -12.1),(-18.7, -12.1),(-16.5, -12.1),(-14.3, -12.1),(-12.1, -12.1),(-9.9, -12.1),(-7.7, -12.1),(-5.5, -12.1),(34.1, -12.1),(-25.3, -14.3),(-16.5, -14.3),(-14.3, -14.3),(-12.1, -14.3),(-29.7, -16.5),(-27.5, -16.5),(-25.3, -16.5),(-23.1, -16.5),(-12.1, -16.5),(-9.9, -16.5),(31.9, -16.5),(-29.7, -18.7),(-20.9, -18.7),(-29.7, -20.9),(-18.7, -20.9),(-16.5, -20.9),(-14.3, -20.9),(-12.1, -20.9),(27.5, -20.9),(-27.5, -23.1),(-12.1, -23.1),(25.3, -23.1),(27.5, -23.1),(-27.5, -25.3),(-25.3, -25.3),(25.3, -25.3),(-23.1, -27.5),(-20.9, -27.5),(-18.7, -27.5),(-16.5, -27.5),(1.1, -27.5),(3.3, -27.5),(5.5, -27.5),(7.7, -27.5),(23.1, -27.5),(-14.3, -29.7),(-12.1, -29.7),(-9.9, -29.7),(-7.7, -29.7),(-3.3, -29.7),(5.5, -29.7),(7.7, -29.7),(9.9, -29.7),(12.1, -29.7),(14.3, -29.7),(18.7, -29.7),(20.9, -29.7)]
    mcerulean3_points = [(14.3, 25.3),(27.5, 20.9),(23.1, 18.7),(25.3, 18.7),(27.5, 18.7),(-5.5, 12.1),(-7.7, 9.9),(-5.5, 9.9)]
    
    TIP_RACK_DECK_SLOT = 9
    COLORS_DECK_SLOT = 6
    AGAR_DECK_SLOT = 5
    PIPETTE_STARTING_TIP_WELL = 'A1'
    
    well_colors = {
        'A1' : 'Purple',
        'B1' : 'Pink',
        'C3' : 'Cyan'
    }
    
    def run(protocol):
      tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')
      pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])
      temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)
      color_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate')
      agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
      center_location = agar_plate['A1'].top()
      pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)
    
      def location_of_color(color_string):
        for well,color in well_colors.items():
          if color.lower() == color_string.lower():
            return color_plate[well]
        raise ValueError(f"No well found with color {color_string}")
    
      def dispense_and_detach(pipette, volume, location):
          above_location = location.move(types.Point(z=location.point.z + 5))
          pipette.move_to(above_location)
          pipette.dispense(volume, location)
          pipette.move_to(above_location)
    
      drop_vol = 0.5
    
      # Draw mcerulean3_points using Pink
      pipette_20ul.pick_up_tip()
      pipette_20ul.aspirate(len(mcerulean3_points) * drop_vol, location_of_color('Pink'))
      for x, y in mcerulean3_points:
        dispense_and_detach(pipette_20ul, drop_vol, center_location.move(types.Point(x, y)))
      pipette_20ul.drop_tip()
    
      # Draw mkate2_points using cornflowerblue
      pipette_20ul.pick_up_tip()
      num_batches = math.ceil((len(mkate2_points) * drop_vol) / 18)
      points_per_batch = math.floor(len(mkate2_points) / num_batches)
    
      for b in range(num_batches):
        start = b * points_per_batch
        end = min((b + 1) * points_per_batch, len(mkate2_points))
        current_batch = mkate2_points[start:end]
        pipette_20ul.aspirate(len(current_batch) * drop_vol, location_of_color('Cyan'))
        for x, y in current_batch:
          dispense_and_detach(pipette_20ul, drop_vol, center_location.move(types.Point(x, y)))
      pipette_20ul.drop_tip()

Post-Lab Questions

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

https://www.nature.com/articles/s41598-024-64938-0

This study by Norton-Baker et al. (2024) used an Opentrons OT-2 liquid-handling tool to efficiently characterise a large number of proteins. They also described a generalizable pipeline for high-throughput protein purification using small-scale expression in E. coli and an affordable liquid-handling robot. As a result, the automation significantly increased throughput, reduced manual labour, and improved consistency across samples, demonstrating how accessible robotics can accelerate biological research workflows. It also allowed to confirm the validity of previous findings.

On a separate note, I really enjoyed their explanation for using the tool : “Therefore, we aimed to develop a protocol using the OT-2 to provide a low-cost option for the purification and analysis of enzymes, or other proteins, making high-throughput studies more accessible to a broader range of research laboratories. Beyond increasing efficiency, this automation-assisted approach reduces the labor burden on researchers and lowers the risk of repetitive use injuries.”"

Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

For my final project, I plan on using Opentrons to support a design, then build, test and learn pipeline for programmable living CA.

Things to do:

build libraries of sender receiver circuits with a range of parameters (I’ve done a similar thing with chi.bio before so should be straightforward). I’d also like to see how I can integrate it within the pre existing lab environment (Airtable) to get more transparent data for the future
Go through the process of general screening of spatial rule variants in plates to test pattern formation under different genetic and environmental conditions, depending on available options
In the later stages. use Cloud lab integration (e.g., Ginkgo Nebula) to scale, run large combinatorial libraries, and feed results back into the CA simulation engine for model fine tunning

All in all, this automation will increase reproducibility, enable systematic exploration of rule space, and accelerate the feedback loop between digital design and wet-lab experimentation.

I would also like to explore more of the idea using Ginkgo’s cloud lab (Nebula) to run large combinatorial libraries of CA rule variants. That would allow me to test hundreds of spatial rule configurations and feed experimental results back into the simulation engine!

Final Project Ideas:

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

1.How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

500g of meat = ~20% protein 100g ÷ 110 g/mol ≈ 0.91 mol 0.91 × 6.022 × 10²³ ≈ 5.5 × 10²³ molecules*

Why are there only 20 natural amino acids?

Because of the frozen accident!! https://pubmed.ncbi.nlm.nih.gov/27926995/ Evolution has developed so, and it’s too late to change that also, they have the optimal protein structure

Where did amino acids come from before enzymes that make them, and before life started? Based on this Nature article( https://www.nature.com/scitable/topicpage/an-evolutionary-perspective-on-amino-acids-14568445/ ) from metabolic pathways up to Archean Eon and chemical synthesis up to Hadean Eon

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect? Left!

Can you discover additional helices in proteins? Yes, through like DSSP or PROCHECK software.

Why are most molecular helices right-handed? They’re more stable! https://www.ch.ic.ac.uk/rzepa/blog/?p=3802

Why do β-sheets tend to aggregate? their structure has edge strands with hydrogen-bonding capacity, making them thermodynamically prone to stacking through edge-to-edge or face-to-face interactions

What is the driving force for β-sheet aggregation?

It is determined by ATR-FTIR spectroscopy

Why do many amyloid diseases form β-sheets?

“Amyloid beta peptide (Aβ) is produced through the proteolytic processing of a transmembrane protein, amyloid precursor protein (APP), by β- and γ-secretases. Aβ accumulation in the brain is proposed to be an early toxic event”. https://www.nature.com/articles/aps201728

Can you use amyloid β-sheets as materials? Design a β-sheet motif that forms a well-ordered structure. Yes, they can be used as stable nano materials with applications in environmental sciences, material engineering, and translational medicines https://pmc.ncbi.nlm.nih.gov/articles/PMC8508955/. Still unsure how to design it

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

I’ve selected LuxR as it controls bioluminescence through quorum sensing which is a “is the regulation of gene expression in response to fluctuations in cell-population density”(https://pubmed.ncbi.nlm.nih.gov/11544353/). As I want to do project in connection to CA it seemed fitting. For structure visualizatio I will be using TraR (PDB: 1L3L)

2. Identify the amino acid sequence of your protein.

from collections import Counter: Protein sequence provided by the user protein_sequence = “MSSNIESLYRQMLNEIIEQMEKEGKISREEALAVLESKLKKSNVSRDIILSHGFNSGVTQSAQPSSFLNNMAAIAKQNLHFSDIFTDLENQFIAELEELEQSYKNLVTKLSQKLDLSSLTEQALVELRSYLNQIKQSGLLDSAQNIFNFSEQISALQEAIGQLSQSIANLELQSREISELNQRVQDLEQELSFLSQQLASAQTQASQLQAQINQLNHQLEALEKAHQEALSRAKQEIEKLRS”

The length of the protein is: 242 aminoacids. The most common amino acid is: L, which appears 36 times.

How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs. 250!

*The UniProt BLAST search identified over 250 homologs with highly significant E-values (ranging from ~0 to 10⁻²⁴), indicating LuxR is part of a large, well-conserved protein family. LuxR belongs to the LuxR-FixJ superfamily and the LuxR family of transcriptional regulators, which are widespread across gram-negative bacteria and typically involved in quorum sensing and gene regulation.

LuxR belongs to the TetR family RCSB PDB of transcriptional regulators, specifically to Vibrio alginolyticus variant, and more broadly to LuxR-FixJ

3. Identify the structure page of your protein in RCSB

https://www.uniprot.org/uniprotkb/P12746/entry#sequences

Resolution: 1.66 Å

Are there any other molecules in the solved structure apart from protein?

Yes! LAE (3-OXO-OCTANOIC ACID (2-OXO-TETRAHYDRO-FURAN-3-YL)-AMIDE) and MSE (L-PEPTIDE LINKING)

Does your protein belong to any structure classification family?

Yes! https://www.uniprot.org/uniprotkb/P33905/entry#family_and_domains

LuxR (TraR) belongs to the autoinducer-regulated transcriptional regulatory protein family. The 2 domains are an N-terminal autoinducer-binding domain (Pfam: Autoind_bind) that binds the AHL quorum sensing signal, and a C-terminal HTH LuxR-type DNA-binding domain (Pfam: GerE) that activates transcription. Also classified under the LuxR-FixJ superfamily (SUPFAM) and the Winged Helix DNA-binding superfamily for its C-terminal domain.

4.Open the structure of your protein in any 3D molecule visualization software:

:The structure is predominantly alpha-helical.

Hydrophobic residues are concentrated in the protein core, forming the structural interior and lining the AHL ligand-binding pocket.

It does have binding pockets

Part C. Using ML-Based Protein Design Tools

I also asked Gemini to identify mutation hotspots based on the compiled data in the heatmap:**

“Let’s analyze the mutation scan heatmap to identify potential mutation hotspots. I’ll write some code to scan through each position in the protein sequence and highlight those where several different amino acid substitutions are predicted to be favorable (meaning they have a high positive Log-Likelihood Ratio). Here’s the analysis:

Mutation Hotspots Identified (positions with at least 3 mutations having LLR > 0.5):

Position 2 (Original: Q):

Mutate to M: LLR = 7.67
Mutate to A: LLR = 2.68
Mutate to L: LLR = 2.55
Mutate to V: LLR = 2.43
Mutate to R: LLR = 2.33
Mutate to E: LLR = 2.02
Mutate to I: LLR = 2.00
Mutate to T: LLR = 1.99
Mutate to S: LLR = 1.91
Mutate to K: LLR = 1.90
Mutate to Y: LLR = 1.89
Mutate to F: LLR = 1.69
Mutate to G: LLR = 1.59
Mutate to Q: LLR = 1.59
Mutate to H: LLR = 1.33
Mutate to D: LLR = 1.31
Mutate to N: LLR = 1.26
Mutate to P: LLR = 1.16 Position 4 (Original: W):
Mutate to A: LLR = 1.31
Mutate to E: LLR = 1.11
Mutate to T: LLR = 1.07
Mutate to S: LLR = 1.01
Mutate to R: LLR = 1.00
Mutate to Q: LLR = 0.96
Mutate to D: LLR = 0.95
Mutate to K: LLR = 0.80 Position 5 (Original: L):
Mutate to L: LLR = 2.45
Mutate to I: LLR = 1.99
Mutate to A: LLR = 1.96
Mutate to V: LLR = 1.77
Mutate to T: LLR = 1.17
Mutate to F: LLR = 1.07
Mutate to R: LLR = 0.80
Mutate to Y: LLR = 0.76
Mutate to E: LLR = 0.72
Mutate to S: LLR = 0.56 Position 15 (Original: E):
Mutate to A: LLR = 1.13
Mutate to S: LLR = 1.00
Mutate to T: LLR = 0.62
Mutate to R: LLR = 0.62
Mutate to L: LLR = 0.58 Position 20 (Original: I):
Mutate to E: LLR = 5.70
Mutate to A: LLR = 5.51
Mutate to D: LLR = 5.31
Mutate to T: LLR = 4.86
Mutate to Q: LLR = 4.72
Mutate to S: LLR = 4.71
Mutate to R: LLR = 4.31
Mutate to K: LLR = 4.31
Mutate to G: LLR = 4.15
Mutate to N: LLR = 4.03
Mutate to P: LLR = 3.68
Mutate to V: LLR = 3.55
Mutate to L: LLR = 3.52
Mutate to H: LLR = 3.47
Mutate to I: LLR = 2.68
Mutate to M: LLR = 2.11
Mutate to F: LLR = 2.08
Mutate to Y: LLR = 1.95
Mutate to W: LLR = 1.11 Position 25 (Original: L):
Mutate to A: LLR = 3.12
Mutate to V: LLR = 2.93
Mutate to L: LLR = 2.73
Mutate to I: LLR = 2.70
Mutate to T: LLR = 1.85
Mutate to M: LLR = 1.40
Mutate to F: LLR = 1.15
Mutate to C: LLR = 0.80 Position 28 (Original: I):
Mutate to E: LLR = 0.99
Mutate to A: LLR = 0.73
Mutate to L: LLR = 0.64 Position 38 (Original: A):
Mutate to V: LLR = 1.76
Mutate to A: LLR = 1.56
Mutate to I: LLR = 1.23
Mutate to S: LLR = 0.90
Mutate to L: LLR = 0.81
Mutate to G: LLR = 0.79
Mutate to T: LLR = 0.53 Position 40 (Original: L):
Mutate to I: LLR = 1.66
Mutate to L: LLR = 1.64
Mutate to V: LLR = 1.53
Mutate to F: LLR = 0.63 Position 42 (Original: I):
Mutate to D: LLR = 2.31
Mutate to E: LLR = 1.39
Mutate to L: LLR = 1.08
Mutate to S: LLR = 0.71
Mutate to G: LLR = 0.63
Mutate to N: LLR = 0.62
Mutate to V: LLR = 0.62
Mutate to T: LLR = 0.60 Position 43 (Original: Q):
Mutate to D: LLR = 4.50
Mutate to E: LLR = 2.76
Mutate to N: LLR = 2.38
Mutate to P: LLR = 2.35
Mutate to S: LLR = 2.06
Mutate to T: LLR = 1.76
Mutate to A: LLR = 1.44
Mutate to G: LLR = 1.35
Mutate to H: LLR = 1.29
Mutate to R: LLR = 1.26
Mutate to Q: LLR = 1.06
Mutate to K: LLR = 0.92
Mutate to L: LLR = 0.80
Mutate to V: LLR = 0.51 Position 44 (Original: H):
Mutate to D: LLR = 1.89
Mutate to E: LLR = 1.59
Mutate to P: LLR = 1.31
Mutate to G: LLR = 0.72
Mutate to A: LLR = 0.55 Position 45 (Original: R):
Mutate to D: LLR = 2.75
Mutate to N: LLR = 1.61
Mutate to E: LLR = 1.59
Mutate to G: LLR = 1.42
Mutate to S: LLR = 1.10
Mutate to T: LLR = 0.62 Position 46 (Original: H):
Mutate to D: LLR = 1.93
Mutate to G: LLR = 1.62
Mutate to E: LLR = 1.26
Mutate to N: LLR = 0.61 Position 47 (Original: I):
Mutate to E: LLR = 1.81
Mutate to D: LLR = 1.51
Mutate to R: LLR = 1.04
Mutate to T: LLR = 0.98
Mutate to S: LLR = 0.90
Mutate to G: LLR = 0.88
Mutate to Q: LLR = 0.77
Mutate to K: LLR = 0.74
Mutate to A: LLR = 0.60 Position 48 (Original: T):
Mutate to G: LLR = 1.72
Mutate to L: LLR = 1.12
Mutate to D: LLR = 0.98
Mutate to E: LLR = 0.85
Mutate to A: LLR = 0.75
Mutate to R: LLR = 0.50 Position 53 (Original: Y):
Mutate to G: LLR = 1.56
Mutate to E: LLR = 0.93
Mutate to A: LLR = 0.90
Mutate to D: LLR = 0.89
Mutate to S: LLR = 0.68 Position 55 (Original: R):
Mutate to G: LLR = 1.31
Mutate to S: LLR = 1.27
Mutate to D: LLR = 1.14
Mutate to T: LLR = 1.12
Mutate to P: LLR = 0.75
Mutate to A: LLR = 0.56 Position 56 (Original: Q):
Mutate to P: LLR = 1.07
Mutate to G: LLR = 0.79
Mutate to D: LLR = 0.77
Mutate to E: LLR = 0.59 Position 57 (Original: W):
Mutate to D: LLR = 1.10
Mutate to G: LLR = 0.99
Mutate to E: LLR = 0.93
Mutate to S: LLR = 0.56 Position 58 (Original: Q):
Mutate to G: LLR = 2.29
Mutate to D: LLR = 1.82
Mutate to E: LLR = 1.73
Mutate to P: LLR = 1.65
Mutate to A: LLR = 1.52
Mutate to S: LLR = 1.45
Mutate to T: LLR = 1.40
Mutate to L: LLR = 1.33
Mutate to R: LLR = 1.20
Mutate to N: LLR = 1.14
Mutate to Q: LLR = 1.01
Mutate to K: LLR = 0.96
Mutate to H: LLR = 0.91
Mutate to V: LLR = 0.90
Mutate to I: LLR = 0.84
Mutate to F: LLR = 0.60
Mutate to Y: LLR = 0.57 Position 59 (Original: S):
Mutate to G: LLR = 1.33
Mutate to L: LLR = 1.02
Mutate to A: LLR = 0.89
Mutate to D: LLR = 0.89
Mutate to S: LLR = 0.84
Mutate to P: LLR = 0.82
Mutate to T: LLR = 0.63
Mutate to E: LLR = 0.61
Mutate to I: LLR = 0.53 Position 61 (Original: Y):
Mutate to E: LLR = 0.68
Mutate to D: LLR = 0.61
Mutate to A: LLR = 0.60 Position 63 (Original: D):
Mutate to D: LLR = 1.12
Mutate to S: LLR = 0.94
Mutate to A: LLR = 0.84
Mutate to G: LLR = 0.75
Mutate to P: LLR = 0.74
Mutate to T: LLR = 0.67
Mutate to L: LLR = 0.59
Mutate to E: LLR = 0.56 Position 65 (Original: K):
Mutate to A: LLR = 0.89
Mutate to P: LLR = 0.88
Mutate to E: LLR = 0.87
Mutate to L: LLR = 0.85
Mutate to G: LLR = 0.80
Mutate to D: LLR = 0.67
Mutate to R: LLR = 0.65 Position 66 (Original: F):
Mutate to S: LLR = 1.46
Mutate to P: LLR = 1.41
Mutate to D: LLR = 1.29
Mutate to T: LLR = 1.27
Mutate to G: LLR = 0.86
Mutate to A: LLR = 0.79
Mutate to R: LLR = 0.78
Mutate to E: LLR = 0.75 Position 75 (Original: K):
Mutate to A: LLR = 1.30
Mutate to E: LLR = 1.23
Mutate to L: LLR = 1.01
Mutate to R: LLR = 0.86
Mutate to D: LLR = 0.65
Mutate to S: LLR = 0.59
Mutate to W: LLR = 0.52 Position 78 (Original: R):
Mutate to L: LLR = 2.00
Mutate to I: LLR = 1.32
Mutate to F: LLR = 0.60
Mutate to V: LLR = 0.56 Position 80 (Original: R):
Mutate to G: LLR = 1.42
Mutate to E: LLR = 0.95
Mutate to D: LLR = 0.84
Mutate to S: LLR = 0.69
Mutate to A: LLR = 0.64 Position 82 (Original: H):
Mutate to G: LLR = 1.36
Mutate to E: LLR = 1.11
Mutate to D: LLR = 1.00
Mutate to T: LLR = 0.96
Mutate to P: LLR = 0.96
Mutate to S: LLR = 0.92
Mutate to R: LLR = 0.89
Mutate to A: LLR = 0.87
Mutate to L: LLR = 0.55 Position 84 (Original: F):
Mutate to L: LLR = 1.07
Mutate to V: LLR = 1.05
Mutate to I: LLR = 0.99 Position 85 (Original: T):
Mutate to V: LLR = 0.84
Mutate to L: LLR = 0.65
Mutate to I: LLR = 0.54 Position 86 (Original: W):
Mutate to V: LLR = 2.01
Mutate to L: LLR = 1.91
Mutate to I: LLR = 1.90
Mutate to A: LLR = 1.64
Mutate to T: LLR = 1.52
Mutate to S: LLR = 1.39
Mutate to R: LLR = 1.12
Mutate to F: LLR = 1.10
Mutate to E: LLR = 0.90
Mutate to D: LLR = 0.82
Mutate to G: LLR = 0.71
Mutate to Y: LLR = 0.63
Mutate to K: LLR = 0.62
Mutate to H: LLR = 0.62
Mutate to P: LLR = 0.56
Mutate to N: LLR = 0.53 Position 90 (Original: H):
Mutate to L: LLR = 1.31
Mutate to I: LLR = 1.28
Mutate to V: LLR = 1.09
Mutate to A: LLR = 1.03
Mutate to T: LLR = 0.94
Mutate to P: LLR = 0.79
Mutate to S: LLR = 0.76
Mutate to G: LLR = 0.75
Mutate to D: LLR = 0.64
Mutate to E: LLR = 0.61 Position 92 (Original: R):
Mutate to L: LLR = 1.41
Mutate to I: LLR = 0.92
Mutate to D: LLR = 0.91
Mutate to V: LLR = 0.67 Position 93 (Original: P):
Mutate to L: LLR = 1.17
Mutate to I: LLR = 0.76
Mutate to V: LLR = 0.73 Position 97 (Original: K):
Mutate to A: LLR = 1.41
Mutate to P: LLR = 1.39
Mutate to E: LLR = 0.93
Mutate to D: LLR = 0.80 Position 105 (Original: H):
Mutate to L: LLR = 1.84
Mutate to A: LLR = 1.60
Mutate to I: LLR = 1.40
Mutate to F: LLR = 1.26
Mutate to V: LLR = 1.24
Mutate to E: LLR = 0.89
Mutate to Y: LLR = 0.75
Mutate to T: LLR = 0.72
Mutate to R: LLR = 0.71 Position 107 (Original: S):
Mutate to L: LLR = 1.67
Mutate to R: LLR = 1.33
Mutate to I: LLR = 1.30
Mutate to V: LLR = 1.14
Mutate to A: LLR = 0.70 Position 108 (Original: D):
Mutate to R: LLR = 1.01
Mutate to A: LLR = 0.62
Mutate to E: LLR = 0.57
Mutate to K: LLR = 0.51 Position 118 (Original: P):
Mutate to A: LLR = 0.97
Mutate to G: LLR = 0.81
Mutate to R: LLR = 0.67
Mutate to L: LLR = 0.50 Position 123 (Original: N):
Mutate to D: LLR = 1.15
Mutate to T: LLR = 0.76
Mutate to S: LLR = 0.58 Position 125 (Original: F):
Mutate to E: LLR = 1.39
Mutate to V: LLR = 1.08
Mutate to R: LLR = 1.00
Mutate to L: LLR = 0.98
Mutate to K: LLR = 0.93
Mutate to I: LLR = 0.87
Mutate to T: LLR = 0.87
Mutate to D: LLR = 0.72
Mutate to A: LLR = 0.64 Position 126 (Original: X):
Mutate to V: LLR = 5.51
Mutate to I: LLR = 5.07
Mutate to L: LLR = 4.74
Mutate to A: LLR = 4.02
Mutate to T: LLR = 3.96
Mutate to F: LLR = 3.36
Mutate to R: LLR = 3.28
Mutate to G: LLR = 3.26
Mutate to E: LLR = 3.18
Mutate to S: LLR = 3.08
Mutate to P: LLR = 3.07
Mutate to Y: LLR = 3.06
Mutate to C: LLR = 2.88
Mutate to M: LLR = 2.85
Mutate to K: LLR = 2.72
Mutate to D: LLR = 2.59
Mutate to W: LLR = 2.55
Mutate to Q: LLR = 2.46
Mutate to H: LLR = 2.39
Mutate to N: LLR = 1.89 Position 127 (Original: S):
Mutate to G: LLR = 1.53
Mutate to A: LLR = 1.52
Mutate to L: LLR = 1.44
Mutate to V: LLR = 1.37
Mutate to I: LLR = 1.23
Mutate to Y: LLR = 0.84
Mutate to F: LLR = 0.69
Mutate to T: LLR = 0.51 Position 128 (Original: X):
Mutate to V: LLR = 6.03
Mutate to A: LLR = 5.56
Mutate to L: LLR = 5.32
Mutate to I: LLR = 5.28
Mutate to T: LLR = 4.73
Mutate to G: LLR = 4.70
Mutate to F: LLR = 4.55
Mutate to S: LLR = 4.33
Mutate to M: LLR = 4.08
Mutate to Y: LLR = 3.94
Mutate to C: LLR = 3.80
Mutate to R: LLR = 2.84
Mutate to E: LLR = 2.66
Mutate to H: LLR = 2.64
Mutate to Q: LLR = 2.62
Mutate to W: LLR = 2.53
Mutate to N: LLR = 2.39
Mutate to D: LLR = 2.05
Mutate to P: LLR = 1.81
Mutate to K: LLR = 1.69 Position 129 (Original: F):
Mutate to L: LLR = 1.23
Mutate to I: LLR = 1.05
Mutate to V: LLR = 0.93 Position 130 (Original: T):
Mutate to L: LLR = 1.51
Mutate to I: LLR = 1.10
Mutate to V: LLR = 1.04 Position 131 (Original: X):
Mutate to V: LLR = 4.62
Mutate to I: LLR = 4.53
Mutate to L: LLR = 4.41
Mutate to A: LLR = 4.02
Mutate to G: LLR = 3.53
Mutate to F: LLR = 3.40
Mutate to S: LLR = 3.16
Mutate to M: LLR = 3.09
Mutate to T: LLR = 2.82
Mutate to C: LLR = 2.71
Mutate to Y: LLR = 1.81
Mutate to D: LLR = 1.67
Mutate to E: LLR = 1.65
Mutate to N: LLR = 1.64
Mutate to W: LLR = 1.44
Mutate to H: LLR = 1.37
Mutate to Q: LLR = 1.36
Mutate to R: LLR = 1.35
Mutate to P: LLR = 0.87 Position 137 (Original: V):
Mutate to R: LLR = 1.52
Mutate to G: LLR = 1.38
Mutate to S: LLR = 0.79
Mutate to A: LLR = 0.66
Mutate to D: LLR = 0.63 Position 138 (Original: I):
Mutate to A: LLR = 0.73
Mutate to S: LLR = 0.66
Mutate to L: LLR = 0.53 Position 139 (Original: D):
Mutate to F: LLR = 2.18
Mutate to W: LLR = 1.91
Mutate to P: LLR = 1.48
Mutate to L: LLR = 1.48
Mutate to Y: LLR = 1.07
Mutate to A: LLR = 0.53 Position 142 (Original: ):
Mutate to D: LLR = 0.74
Mutate to A: LLR = 0.69
Mutate to E: LLR = 0.59 Position 149 (Original: A):
Mutate to L: LLR = 1.31
Mutate to I: LLR = 0.67
Mutate to V: LLR = 0.53 Position 152 (Original: A):
Mutate to L: LLR = 1.36
Mutate to A: LLR = 1.14
Mutate to V: LLR = 0.83
Mutate to I: LLR = 0.71 Position 155 (Original: G):
Mutate to L: LLR = 1.54
Mutate to A: LLR = 1.41
Mutate to V: LLR = 0.68
Mutate to I: LLR = 0.65
Mutate to E: LLR = 0.63
Mutate to R: LLR = 0.61 Position 157 (Original: I):
Mutate to L: LLR = 2.75
Mutate to A: LLR = 2.72
Mutate to I: LLR = 1.95
Mutate to V: LLR = 1.88
Mutate to E: LLR = 1.73
Mutate to G: LLR = 1.66
Mutate to S: LLR = 1.64
Mutate to F: LLR = 1.55
Mutate to T: LLR = 1.35
Mutate to R: LLR = 1.11
Mutate to M: LLR = 1.03
Mutate to Q: LLR = 0.90
Mutate to D: LLR = 0.67
Mutate to K: LLR = 0.57 Position 161 (Original: I):
Mutate to E: LLR = 1.09
Mutate to A: LLR = 1.02
Mutate to D: LLR = 0.97 Position 162 (Original: S):
Mutate to A: LLR = 3.09
Mutate to E: LLR = 2.57
Mutate to R: LLR = 2.27
Mutate to S: LLR = 2.09
Mutate to T: LLR = 1.97
Mutate to D: LLR = 1.89
Mutate to K: LLR = 1.80
Mutate to Q: LLR = 1.79
Mutate to G: LLR = 1.70
Mutate to L: LLR = 1.47
Mutate to P: LLR = 1.05
Mutate to V: LLR = 1.02
Mutate to N: LLR = 0.98
Mutate to H: LLR = 0.90 Position 174 (Original: A):
Mutate to A: LLR = 3.50
Mutate to R: LLR = 3.12
Mutate to S: LLR = 3.01
Mutate to T: LLR = 2.99
Mutate to L: LLR = 2.88
Mutate to K: LLR = 2.82
Mutate to G: LLR = 2.71
Mutate to E: LLR = 2.64
Mutate to D: LLR = 2.51
Mutate to V: LLR = 2.51
Mutate to Q: LLR = 2.31
Mutate to N: LLR = 2.19
Mutate to I: LLR = 2.11
Mutate to P: LLR = 1.95
Mutate to H: LLR = 1.79
Mutate to M: LLR = 1.36
Mutate to F: LLR = 1.32
Mutate to Y: LLR = 0.83 Position 178 (Original: P):
Mutate to A: LLR = 1.11
Mutate to R: LLR = 0.96
Mutate to Q: LLR = 0.52 Position 181 (Original: A):
Mutate to E: LLR = 2.23
Mutate to D: LLR = 1.54
Mutate to R: LLR = 1.26
Mutate to Q: LLR = 0.98
Mutate to A: LLR = 0.85
Mutate to K: LLR = 0.67 Position 182 (Original: T):
Mutate to V: LLR = 4.57
Mutate to I: LLR = 3.69
Mutate to L: LLR = 3.16
Mutate to F: LLR = 2.15
Mutate to A: LLR = 2.15
Mutate to C: LLR = 1.91
Mutate to T: LLR = 1.13
Mutate to M: LLR = 0.77 Position 185 (Original: R):
Mutate to L: LLR = 2.84
Mutate to A: LLR = 1.98
Mutate to G: LLR = 1.76
Mutate to M: LLR = 1.67
Mutate to R: LLR = 1.53
Mutate to Q: LLR = 1.14
Mutate to S: LLR = 1.00
Mutate to E: LLR = 0.96
Mutate to H: LLR = 0.74
Mutate to T: LLR = 0.71
Mutate to D: LLR = 0.67
Mutate to F: LLR = 0.64
Mutate to K: LLR = 0.57
Mutate to Y: LLR = 0.56 Position 188 (Original: A):
Mutate to E: LLR = 3.31
Mutate to A: LLR = 2.80
Mutate to D: LLR = 2.68
Mutate to S: LLR = 2.50
Mutate to Q: LLR = 2.36
Mutate to T: LLR = 1.84
Mutate to N: LLR = 1.65
Mutate to G: LLR = 1.42
Mutate to L: LLR = 1.39
Mutate to R: LLR = 1.37
Mutate to K: LLR = 1.36
Mutate to H: LLR = 0.71 Position 192 (Original: T):
Mutate to V: LLR = 7.46
Mutate to N: LLR = 7.19
Mutate to Q: LLR = 6.97
Mutate to I: LLR = 6.92
Mutate to A: LLR = 6.88
Mutate to L: LLR = 6.77
Mutate to T: LLR = 6.67
Mutate to D: LLR = 6.52
Mutate to E: LLR = 6.46
Mutate to S: LLR = 6.35
Mutate to P: LLR = 6.23
Mutate to Y: LLR = 6.07
Mutate to M: LLR = 5.65
Mutate to G: LLR = 5.41
Mutate to R: LLR = 5.30
Mutate to H: LLR = 5.06
Mutate to F: LLR = 4.99
Mutate to K: LLR = 4.49
Mutate to W: LLR = 3.79
Mutate to C: LLR = 3.53 Position 198 (Original: D):
Mutate to A: LLR = 1.24
Mutate to E: LLR = 0.90
Mutate to L: LLR = 0.81
Mutate to I: LLR = 0.68 Position 199 (Original: V):
Mutate to L: LLR = 3.77
Mutate to M: LLR = 1.63
Mutate to V: LLR = 1.14
Mutate to I: LLR = 0.95
Mutate to F: LLR = 0.67
Mutate to T: LLR = 0.61 Position 202 (Original: V):
Mutate to S: LLR = 1.98
Mutate to P: LLR = 1.27
Mutate to A: LLR = 1.20 Position 203 (Original: K):
Mutate to V: LLR = 2.31
Mutate to P: LLR = 1.82
Mutate to I: LLR = 1.59
Mutate to L: LLR = 1.38
Mutate to E: LLR = 1.20
Mutate to A: LLR = 1.18
Mutate to R: LLR = 0.76
Mutate to T: LLR = 0.67 Position 208 (Original: R):
Mutate to T: LLR = 1.82
Mutate to N: LLR = 1.48
Mutate to S: LLR = 1.02 Position 209 (Original: V):
Mutate to H: LLR = 3.42
Mutate to Y: LLR = 2.66
Mutate to R: LLR = 1.62
Mutate to Q: LLR = 0.73
Mutate to L: LLR = 0.59 Position 212 (Original: R):
Mutate to N: LLR = 2.55
Mutate to S: LLR = 1.41
Mutate to R: LLR = 0.98
Mutate to A: LLR = 0.88 Position 214 (Original: E):
Mutate to M: LLR = 8.36
Mutate to Y: LLR = 8.16
Mutate to L: LLR = 8.13
Mutate to R: LLR = 7.78
Mutate to F: LLR = 7.65
Mutate to E: LLR = 7.36
Mutate to A: LLR = 7.24
Mutate to Q: LLR = 7.22
Mutate to K: LLR = 7.17
Mutate to I: LLR = 7.12
Mutate to V: LLR = 6.98
Mutate to S: LLR = 6.58
Mutate to C: LLR = 6.40
Mutate to T: LLR = 6.04
Mutate to W: LLR = 5.67
Mutate to H: LLR = 5.62
Mutate to G: LLR = 5.38
Mutate to N: LLR = 4.57
Mutate to D: LLR = 3.21
Mutate to P: LLR = 1.85 Position 224 (Original: K):
Mutate to E: LLR = 2.96
Mutate to Q: LLR = 2.18
Mutate to D: LLR = 1.11
Mutate to A: LLR = 0.56

Total hotspots found: 75”

These results seemed a bit too excessive so I changed the code around a little bit and got this

Mutation Hotspots Identified (positions with at least 3 mutations having LLR > 5):

Position 20 (Original: I):

Mutate to E: LLR = 5.70
Mutate to A: LLR = 5.51
Mutate to D: LLR = 5.31 Position 128 (Original: X):
Mutate to V: LLR = 6.03
Mutate to A: LLR = 5.56
Mutate to L: LLR = 5.32
Mutate to I: LLR = 5.28 Position 192 (Original: T):
Mutate to V: LLR = 7.46
Mutate to N: LLR = 7.19
Mutate to Q: LLR = 6.97
Mutate to I: LLR = 6.92
Mutate to A: LLR = 6.88
Mutate to L: LLR = 6.77
Mutate to T: LLR = 6.67
Mutate to D: LLR = 6.52
Mutate to E: LLR = 6.46
Mutate to S: LLR = 6.35
Mutate to P: LLR = 6.23
Mutate to Y: LLR = 6.07
Mutate to M: LLR = 5.65
Mutate to G: LLR = 5.41
Mutate to R: LLR = 5.30
Mutate to H: LLR = 5.06 Position 214 (Original: E):
Mutate to M: LLR = 8.36
Mutate to Y: LLR = 8.16
Mutate to L: LLR = 8.13
Mutate to R: LLR = 7.78
Mutate to F: LLR = 7.65
Mutate to E: LLR = 7.36
Mutate to A: LLR = 7.24
Mutate to Q: LLR = 7.22
Mutate to K: LLR = 7.17
Mutate to I: LLR = 7.12
Mutate to V: LLR = 6.98
Mutate to S: LLR = 6.58
Mutate to C: LLR = 6.40
Mutate to T: LLR = 6.04
Mutate to W: LLR = 5.67
Mutate to H: LLR = 5.62
Mutate to G: LLR = 5.38

Total hotspots found: 4

One thing I noticed whilst analysing the heatmap, the x spots around 16, 122 to 126, 189 and 211 appear brighter. That means that these are the areas where the protein’s sequence might be quite adaptable, making them interesting for further investigation into their role and potential for modification.

New sequence for folding: MNFLDDVEKLSKVKGDDETFKKGLEEIAKKHGFSGWNFVYDNNGHLKVWSNLDPALLKEYFEKNWYKVDPVIKKAREEEKLFTWSWDEEWPKLTEEEKALGEEYKKYGVKSGITIPIKAENGTRAYFTFYSKKDKIELKEPIDDEKAKKVTKLLYKKIVDNKWTPTGHTPMSLNEKERAYLKLRSKGYSQREIAEKLGVAQSTVSRTIRKAMEKFGVKTIEELIKIAKEMGLI

What it looks like -> somewhat similar to the original

Part D. Group Brainstorm on Bacteriophage Engineering

*Claude AI prompt: Can you please analyse the results of UniProtKB LuxR search ** Gemini built in to colab AI prompt “identify mutation hotspots based on the compiled data in the heatmap”

Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ (taken from https://rest.uniprot.org/uniprotkb/P00441.fasta ) -> muted form A-> V MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

(2-5)

#	Binder	Pseudo Perplexity	Notes
0	WHYGAAQAAHWX	7.60026377248636	High confidence
1	WLYGASAAAWKK	7.46473740432208	Highest confidence!!!
2	WLYGAAGVAWKE	10.9325804754158	Moderate confidence
3	WLYYPQAAKLKK	15.5499787120909	Lowest confidence
—	FLYRWLPSRRGG	20.9180890005569	Known binder (control)

Part 2: Evaluate Binders with AlphaFold3

So what I found out is that in WHYGAAQAAHWX X is an unknown amino acid and that AlphaFold3 is not going to work with it so I’m just skipping it for now.

SOD1_A4V_WLYYPQAAKLKK

SOD1_A4V_WLYGASAAAWKK

SOD1_A4V_WLYGAAGVAWKE

SOD1_A4V_FLYRWLPSRRGG

Peptide	ipTM	pTM	Binding Location
FLYRWLPSRRGG (known)	0.34	0.83	Surface-bound, near bottom/loop region
WLYGAAGVAWKE	0.45	0.88	Engages β-barrel, partially buried near core
WLYGASAAAWKK	0.24	0.78	Surface-bound, near N-terminus/top loops
WLYYPQAAKLKK	0.27	0.70	Surface-bound, loose association near loops

ipTM scores across all peptides ranged from 0.24 to 0.45 -> suggests weak-to-moderate predicted interface confidence. The PepMLM-generated peptide WLYGAAGVAWKE (ipTM = 0.45) outperformed the known binder FLYRWLPSRRGG (ipTM = 0.34), appearing more engaged with the β-barrel core of SOD1. The remaining two peptides WLYGASAAAWKK (ipTM = 0.24) and WLYYPQAAKLKK (ipTM = 0.27) scored below FLYRWLPSRRGG and appeared loosely surface-bound. None exceeded an ipTM of 0.5, which should be expected for short peptides against a structured target

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

WLYYPQAAKLKK_pv

WLYGASAAAWKK_pv

WLYGAAGVAWKE_pv

FLYRWLPSRRGG_pv

Peptide	ipTM	Binding Affinity (pKd)	Solubility	Hemolysis	Net Charge
FLYRWLPSRRGG (known)	0.34	5.968	Soluble	Non-hemolytic	+2.76
WLYGAAGVAWKE	0.45	6.683	Soluble	Non-hemolytic	-0.23
WLYGASAAAWKK	0.24	6.071	Soluble	Non-hemolytic	+1.76
WLYYPQAAKLKK	0.27	5.482	Soluble	Non-hemolytic	+2.76

WLYGAAGVAWKE had both the highest ipTM (0.45) and the strongest predicted binding affinity (6.683 pKd), suggesting that structural confidence and binding prediction align for this peptide. All peptides were soluble and non-hemolytic, meaning none raised safety red flags. Notably, higher ipTM did loosely correlate with stronger affinity — WLYGAAGVAWKE topped both metrics while WLYYPQAAKLKK scored lowest on affinity and had a weak ipTM.

Peptide to Advance: WLYGAAGVAWKE!!!

WLYGAAGVAWKE has the best structural binding confidence (ipTM = 0.45), the strongest predicted affinity (6.683 pKd), is fully soluble, and non-hemolytic. It outperforms the known binder FLYRWLPSRRGG on every key metric! Therefore, we should use this one

Part 4: Generate Optimized Peptides with moPPIt

I tried to run for Motif positions 1, 2, 3, 4, 5, 6, 7 but it’s just taking so long, so, I’m nit going to finosh running it but here’re some results that I got so far:

moPPIt Generated Peptides

#	Peptide	Target Residues
1	GKTEKTYTDCCD	1, 2, 3, 4, 5, 6, 7
2	EEQNTCIQTTKA	1, 2, 3, 4, 5, 6, 7

Comparison: moPPIt vs PepMLM Peptides:

moPPIt peptides differ notably from PepMLM-generated ones in both composition and design intent. PepMLM peptides (e.g., WLYGAAGVAWKE, WLYGASAAAWKK) were dominated by W at position 1 and showed a hydrophobic character, as the model simply sampled sequences likely to bind SOD1 without any spatial constraints. In contrast, moPPIt peptides (GKTEKTYTDCCD, EEQNTCIQTTKA) are more polar and charged while balancing affinity, solubility and hemolysis objectives. moPPIt asks “what binds specifically near position 4, and is also therapeutically viable?”

Before advancing the studies of moPPIt peptides, the following next steps/ evaluations needed:

SPR or ITC should be used to measure actual binding affinity
AlphaFold3 rystallography to confirm binding near the target residues 1–7
To confirm non-toxic
Need to assess peptide half-life through test degradation in serum
Check for cell permeability
ALS mouse model

new!! TKCVATKKLQED

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

waiting to be accepted onto the platform

Part C: Final Project: L-Protein Mutants

I’ve also asked Claude* for help with the validation of computations, and this was the correlation assessment

STRONG CORRELATION between computational scores and experimental LLR values:

Computational Prediction	Experimental Validation	Agreement
K50L (Score: 2.56)	Bright yellow	Excellent
C29R (Score: 2.40)	Position 29 hotspot	Excellent
Y39L (Score: 2.24)	Bright at pos 39	Strong
N53L (Score: 1.87)	TM leucine pattern	Strong
S9Q (Score: 2.01)	Some positive signals	Good

Conclusion: The ESM2 language model predictions correlate well with experimental data, particularly for:

Identifying hotspot positions (29, 39, 50)
Predicting beneficial amino acid types (hydrophobic in TM, removing C29)
Overall mutation effects (positive vs negative)

This validates using computational approaches for rational protein design, though experimental validation remains essential.

Top 20 Mutations with Scores

Rank	Mutation	Original AA	New AA	Position	Score	Region
1	K50L	K	L	50	2.561	Transmembrane
2	C29R	C	R	29	2.395	Soluble
3	Y39L	Y	L	39	2.242	Soluble
4	C29S	C	S	29	2.043	Soluble
5	S9Q	S	Q	9	2.014	Soluble
6	C29Q	C	Q	29	1.997	Soluble
7	C29P	C	P	29	1.971	Soluble
8	C29L	C	L	29	1.961	Soluble
9	K50I	K	I	50	1.929	Transmembrane
10	N53L	N	L	53	1.865	Transmembrane
11	E61L	E	L	61	1.818	Transmembrane
12	T52L	T	L	52	1.814	Transmembrane
13	K50F	K	F	50	1.802	Transmembrane
14	C29T	C	T	29	1.797	Soluble
15	C29K	C	K	29	1.796	Soluble
16	F5Q	F	Q	5	1.795	Soluble
17	F5R	F	R	5	1.660	Soluble
18	C29A	C	A	29	1.649	Soluble
19	Y27R	Y	R	27	1.628	Soluble
20	F22R	F	R	22	1.602	Soluble

my final 5 mutations: K50L (Transmembrane) 2.561 C29R (Soluble) 2.395 Y39L (Soluble) 2.242 N53L (Transmembrane) 1.865 S9Q (Soluble) 2.014

*prompt used with Claude find correlation between the experimental data L-Protein Mutants - Sheet1.csv and protein_mutations_scores.csv

Week 6 HW: Genetic Circuits Part I: Assembly Technologies

Assignment: DNA Assembly

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Based on Protocol for Phusion™ High-Fidelity PCR Master Mix with GC Buffer (https://www.neb.com/en-gb/protocols/protocol-for-phusion-high-fidelity-pcr-master-mix-with-gc-buffer-m0532)

Phusion DNA Polymerase - a high-fidelity enzyme that synthesises new DNA strands with 3’→5’ exonuclease activity for proofreading dNTPs - building blocks for DNA synthesis (200 µM each at 1X concentration) MgCl₂ - Cofactor required for polymerase activity (1.5 mM at 1X concentration) Possibly optional as the question doesn’t mention it (GC Buffer - optimised buffer for amplifying difficult templates with high GC content or secondary structure) DMSO (optional additive) - helps denature secondary structures in GC-rich or difficult templates (recommended at 3% final concentration)

What are some factors that determine primer annealing temperature during PCR?

Primer Tm as in melting temperature, primer length, GC content, primer pair Tm similarity

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR vs. Restriction Enzyme Digests

Comparison Table

Aspect	PCR	Restriction Enzyme Digests
Mechanics	Amplifies specific DNA regions using primers	Cuts DNA at specific recognition sequences
Flexibility	More flexible; doesn’t require specific cut sites	Limited by existing restriction sites in sequence
Mutations	Can introduce mutations via primer design	Cannot introduce mutations
End Products	Variable ends based on primer design	Creates predictable, precise ends
Time needed	~90 minutes with thermal cycling	Faster (~1-2 hours incubation)
Template amount	Can work with small amounts of template	Requires sufficient DNA quantity
Sticky ends	Not applicable	Can create compatible sticky ends
When to use	When introducing mutations into sequences, no suitable restriction sites exist in the DNA, from small amounts of template DNA	When restriction sites are conveniently located in the sequence, Creating compatible sticky ends for cloning, Cutting plasmids for traditional cloning applications, Precise, predictable cuts are required

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Ensure primers have 20-22 bp overhangs that are complementary between fragments
After PCR and DpnI digest, purify the DNA using the Zymo DNA Clean & Concentrator kit and measure concentration using Nanodrop/Qubit. The concentration should be above ~30 ng/µL.
Run the PCR products on an agarose gel to verify correct band sizes. Mix 3 µL of sample with 3.3 µL of 6x Loading Dye, run at ~100 mV for 15 min, and compare against a DNA ladder. Calculate your predicted digest on Benchling to verify the correct band size matches what you see on the gel
Ensure only unmethylated PCR products (not the original methylated template plasmid) are present through DpnI Treatment, reducing background colonies from unmutated template.

How does the plasmid DNA enter the E. coli cells during transformation?

By diffusion when the cells are shocked

Describe another assembly method in detail (such as Golden Gate Assembly) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Golden Gate assembly is a molecular cloning technique used to join DNA fragments in a single reaction. Methods for golden gate include MoClo, GoldenBraid 2.0, Mobius assembly and EMMA. It’s an efficient and cost-effective means to generate construct variants. Type IIS restriction enzymes are used to assemble multiple fragments in a single reaction. It’s also important to make sure that the Type IIS included in the process is not present in the vector or insert sequences

Screenshot taken from https://www.youtube.com/watch?v=JuiJASyG9gc

Model this assembly method with Benchling or Asimov Kernel!

As we don’t have access to Asimove I’ll be modelling this with Benchling

First, I added the DNA sequense of the well-known green fluorescent protein (GFP), which is native to the jellyfish Aequorea victoria
Changed the GGTCTC to GGTCTT to be able to work with the Golden Gate Assembly. Then I’ve added the plasmid to the project. I used the same one described in the lab. Then I checked for the presence of GGTCTC to use as the typeII enzyme.. there were 2 of those. Checked the other ones 2 of those again. Basically all three Type IIS enzymes (BsaI, BbsI, and Esp3I) have recognition sites in the mUAV plasmid
I ended up editing the GGTCTC ones as they were all in the base parts of the plasmid so they didn’t affect much. ( GGTCTC → GGTCTT )
Then I figured out that amilCP gene starts at 2114 and ends at 2779 Complement: A↔T, G↔C
I then loooked for stop codons and found the TAA one. For the primer design i also needed the 20 protein sequence of the GFP which was atgtctaaaggtgaagaatt. Basically my primer components ended up being BsaI site: GGTCTC, Overhang A: AATG (includes the ATG start codon) and the binding region which is those first 20 bp of GFP = atgtctaaaggtgaagaatt. Then for the reverse binding region, the positions of 666 to 685 (20 before the TAA) are TGGTCTTGTTAGAATTTGTT therefore reverse complement ones are AACAAATTCTAACAAGACCA
End primers are GFP Forward 5’-GGTCTCAATGTCTAAAGGTGAAGAATT-3’ and GFP Reverse 5’-GAGACCAAGCAACAAATTCTAACAAGACCA-3'
Then as for the backbone forward primer 5’-GGTCTCGCTTAAGCTTCAAATAAAACGAA-3’ as it is (taagcttcaaataaaacgaa) where GGTCTC is a BsaI recognition site, G a spacer base, and CTTAAGCTTCAAATAAAACGAA a binding region.
Backbone Reverse ended up being 5’-GAGACCCATTTTAGTATTTCTCCTCTTTCT-3'
Then knowing all this it’s time to do the Golden Gate Assembly
Needed to change the setting from use cut lines otherwise i was getting an error but other than that it was a pretty straightforward process

and here are some final setting that I used

all done !!! https://benchling.com/s/seq-7kPMc9tGfTBvUaQHkniq?m=slm-VIKLDWpNc1o4GKeuM4Ii

Assignment: Asimov Kernel

(didn’t get access)

Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Analogue quality of computations (not just on and off, but rather treat them more like waves ->easier to work with)
-> Thresholds finetuning works
You can also work with more inputs simultaneously that will automatically become a weighted sum
Which also means they can be trained like neural networks!

whereas Boolean functions are just True/False with operators AND, OR, and NOT

Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

Actually, mycelium described in the later parts of the homework is a great application of IANN due to its non-uniform growth, which is basically a response to chemical gradients, nutrient availability, CO₂ levels, humidity, and light. That’s all analogue and continuous signals! Therefore, we could engineer a fungal cell to use an IANN to sense its local environment and decide how to grow, giving mycelium a programmable, environmentally-responsive architecture, which is basically a CA <3 where simple local rules produce complex emergent global patterns!!

So then, for input behaviour, we have

-Local nutrient and carbon concentration sensed via metabolite-responsive promoters -CO₂ or O₂ levels -Mechanical compression (fungi lowkey already have these) -Neighbour cell density, which would act like quorum-sensing signalling molecules

Then, for output behaviour, we have

-Expression of growth-promoting genes -Expression of growth-halting genes → denser, stiffer composite in that region -Secretion of signalling molecules → coordinating behaviour with neighbouring cells

Limitations: intercellular signal noise (“local” signals may not stay local, warping and structural variability (Walter and Gürsoy (2022)), instability of speed as the IANN may lag behind rapidly changing environmental conditions

Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

Here! A super quick sketch made in figma I tried to make it as close to the example as possible

Assignment Part 2: Fungal Materials

What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

I ended up reading a lot about acoustic dampening and how fungal materials are used for it. So the most established fungal material is the mycelium-based composite. It is produced when the vegetative root network of fungi (mycelium), is grown on agricultural or waste plant substrates and then dried to stop growth. The resulting material has the potential to replace petrochemical-based materials within architectural systems and can serve as a biodegradable(!!!) alternative to synthetic sound-absorbing materials. Based on this paper https://pmc.ncbi.nlm.nih.gov/articles/PMC9394424/ , the acoustic performance is genuinely competitive. Mycelium-based composites cultivated on shredded and fine cardboard can both be considered sound-absorbing materials and have the potential to compete with the performance of synthetic sound absorbers as both sample groups outperformed a polypropylene acoustic panel at low frequencies (125 Hz, 250 Hz, 500 Hz). An earlier study found that even the lowest-performing substrate, 100% cotton bur fibre, still yielded higher than 70% acoustic absorption at 1000 Hz.

Advantages over traditional counterparts like fibreglass or stone wool are primarily environmental. As for disadvantages, performance is variable and hard to control because mycelium growth is sensitive to humidity, temperature, and other things.

What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

I’d be so interested in researching further applications for acoustics. Making a speaker out of a part made out of fungi would be really interesting from a sound perspective. Like mycelium composites are already being used as acoustic panels because their fibrous, porous structure absorbs and diffuses sound well.

It’s so much easier to create a mater out of fungi. They are cheaper and easier to grow at scale. You can also also mold it, control its density, and at the end harvest a structured composite

Assignment Part 3: First DNA Twist Order

So, my order is the LuxI/LuxR sender-receiver circuit: the LuxI expression cassette (constitutive promoter + LuxI + terminator) and the LuxR receiver cassette (LuxR + pLux promoter + GFP + terminator)

For a backbone vector, the construct will be synthesized in the pTwist Chlor High Copy backbone which is a high-copy pMB1-origin plasmid with chloramphenicol resistance

https://pmc.ncbi.nlm.nih.gov/articles/PMC9394424/ https://www.mdpi.com/2313-7673/7/3/100

Week 9 HW: Cell-Free Systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

As the reaction happens outside living cells, you have much more flexibility and control over experimental variables!

To be more specific, the biggest difference is the promoter. In cell-free protein synthesis, a T7 promoter is often preferred because T7 RNA polymerase is readily added or already present in the extract, making transcription simple and highly controllable. And this is why we have a much more precise control over transcription This doesn’t happen with real bacteria, as gene expression is tied to cell growth phase, metabolism and others. That also means that prototyping is faster as you can propagate the plasmid in E. coli, then purify plasmid DNA, then add it to cell-free protein synthesis and be kinda done within so much faster

Two cases where cell-free expression is more beneficial than cell production:

unstable proteins degraded in vivo
toxic proteins, where something might kill bacteria before enough protein accumulates or like if one needs to do any sort of rapid screening

Describe the main components of a cell-free expression system and explain the role of each component.

Cell Extract (Lysate). It provides the machinery for transcription/translation
Genetic Template (DNA or mRNA). Contains the gene encoding the target protein.
Energy Regeneration System. Sustain long-term protein synthesis through supplying ATP/GTP
Amino Acids and Building Blocks. Needed for polypeptide synthesis
Buffer and Salts. Stabilise the reaction environment and optimise enzyme function and others

Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Transcription and translation consume ATP super fast -> ATP is quickly depleted. To avoid that a common method is adding PEP with pyruvate kinase (Calhoun and Swartz, 2007)

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic cell-free systems - rapid, high-yield production of simple proteins -> Green fluorescent protein (GFP) (small, soluble, no need for complicated glycosylation) Eukaryotic systems - complex proteins requiring proper folding and post-translational modification -> Erythropoietin (EPO) (requires specific disulfide bonds and glycosylation)

How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

To express a membrane protein, I would use a eukaryotic cell-free system with liposomes. Then I’d also include chaperones, and optimise reaction conditions to promote proper folding and membrane insertion while preventing aggregation.

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Low yield may result from ATP depletion (->add an energy regeneration system), protein misfolding (->include chaperones or membrane mimetics), or poor template quality(->would be solved by optimising template design and reaction conditions)

Homework question from Kate Adamala

Design an example of a useful synthetic minimal cell as follows: Pick a function and describe it.

For this mini project/ question, I’d like to explore how synthetic minimal cells for the wols of biocensors specifically, I’d like to look into the possibilities of heavy metal detection in water. It would be also interesting to explore what’s possible to be done after the detection as a form of capturing it

What would your synthetic cell do? What is the input and what is the output?

The synthetic cell would function as a heavy metal detector targeting mercury (Hg²⁺) contamination in water. It will first sense dissolved Hg²⁺ through a metal-responsive transcription factor, then synthesise both a fluorescent reporter (so you can see the signal) and a metal-binding protein to physically capture the mercury.

Input Hg²⁺ ions in the surrounding aqueous environment Output GFP fluorescence, which would act as a quantitative detection signal

Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

No, as it would be ineffective

Could this function be realized by genetically modified natural cell?

Yes, it’s possible to use E.coli but that wouldn’t be safe for the environment.

Describe the desired outcome of your synthetic cell operation.

The desired outcome would be a real-time fluorescent detection of mercury contamination (+passive remediation!)

Design all components that would need to be part of your synthetic cell.

PURE system - E. coli Tx/Tl machinery
MerR + PmerT-DNA - Hg²⁺ sensor + promoter
GFP - fluorescent reporter
Metallothionein - Hg²⁺ binding protein
ATP regen (creatine-P / CK) - energy source

What would be the membrane made of?

DPPC (structural bilayer backbone), cholesterol (reduced passive permeability), DSPE-PEG2000PE (for stability in water), DPhPC (help with protein reconstitution)

What would you encapsulate inside? Enzymes, small molecules.

Genetic material: A linear DNA expression cassette encoding gfp and MT1 (human metallothionein-1) under a single MerR-regulated PmerT promoter — both genes transcribed as a bicistronic message driven by the same mercury-sensing switch. Tx/Tl machinery: PURE system (PUREfrex 2.0 or similar), providing all ribosomes, elongation factors, tRNAs, aminoacyl-tRNA synthetases, and RNA polymerase needed for in-vesicle protein synthesis. Pre-loaded proteins/small molecules:

MerR transcription factor (pre-expressed and loaded — it is the sensor that detects Hg²⁺ and activates PmerT) Creatine phosphate + creatine kinase (ATP regeneration system to power the PURE system) All 20 amino acids and NTP mix RNase inhibitor (to protect mRNA during synthesis)

Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promoters, like Tet-ON, you need mammalian)

A bacterial PURE system is the right choice here because all genes are prokaryotic (merR, merB, gfp, MT1 with bacterial codon optimization),

How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

Through MerB and MscL

Experimental details List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)

Lipids:

DPPC (Avanti Polar Lipids )
Cholesterol (Sigma C8667)
DSPE-PEG2000
DPhPC

Genes / DNA Cassettes:

merR Uniprot P0A3C3
merB Uniprot P0A393
gfp Uniprot P42212
MT1 Uniprot P04731
mscL Uniprot P0A742

How will you measure the function of your system?

In vitro validation (to consider dose-response fluorescence, mercury sequestration, Tx/Tl confirmation, membrane integrity controls)
Then test the same synthetic cells against Pb, Cd, Cu, Zn, and Fe at equivalent concentrations
No-MerR negative control (cells assembled without the MerR protein but with the PmerT-gfp cassette should be no flouresebce)
Environmental validation through spiked water sample test

Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch: Write a one-sentence summary pitch sentence describing your concept.

I’m also really interested in how acoustics work and operate. I know there’s been a lot of recent development with mycelium-based material research for speakers, but for this project, I’d like to look into wall panels, which you can find a lot in music venues. In most cases, such venues require a specific setup for the type of music that’s going to be played, and this would solve this issue. I imagine them being updated with using freeze-dried cell-free enzymes to modify material porosity/stiffness in response to sound frequency and humidity, optimising acoustics in real-time for different performances.

How will the idea work, in more detail? Write 3-4 sentences or more.

The panels contain freeze-dried cell-free systems with mechanosensitive promoters and enzymes that degrade polymers in the material matrix. When sound waves vibrate the material and ambient humidity activates the system -> the enzymes modify the panel’s density and porosity. High-frequency performances trigger different enzymatic pathways than bass-heavy music, creating frequency-specific acoustic reflection. The material could shift from absorptive (for loud rock) to reflective (for classical) over the course of hours.

What societal challenge or market need will this address?

Live music venues struggle with “one-size-fits-all” acoustics; a room great for orchestras sounds terrible for others. Acoustic treatment is expensive and static. This addresses the massive live music industry with adaptive infrastructure that reduces the need for costly manual reconfiguration.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

One could use multiple “generations” of enzymes with different activation thresholds. Early activators work on first moisture exposure, but reserve populations activate only after pH changes from initial reactions, creating extended activity. Incorporate hygroscopic materials that capture atmospheric moisture in humid London venues to enable repeated activation cycles.

Homework question from Ally Huang

Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

When in space, astronauts lose quite a bit of their bone density 1-2%, which is the same thing as having osteoporosis on Earth for a year. Therefore, it’s important to understand how bone-building and bone-resorbing cells operate for long missions in space. There is a chance that cellular automata could have an answer for that. With CA, we could simulate spatial-temporal patterns of bone remodelling to predict how mechanical unloading disrupts the delicate balance between mineralisation and resorption. This research could inform is of possible countermeasures

Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

The RANK/RANKL/OPG signalling pathway regulating osteoblast-osteoclast communication (RUNX2, osteocalcin, collagen type I).

Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

The RANK/RANKL/OPG pathway controls bone remodelling balance. In microgravity, mechanical unloading downregulates osteoblast activity (reduced RUNX2 expression) while RANKL signalling increases osteoclast formation. Cellular automata can model how local cell-cell signalling rules—based on real gene expression data from spaceflight experiments—produce emergent bone microarchitecture patterns.

Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

Altered RANK/RANKL/OPG expression ratios in microgravity create spatial signalling gradients that, when modelled using cellular automata rules, will reproduce bone loss patterns observed in astronaut imaging studies. Traditional biomechanical models miss the discrete cellular interactions driving remodelling. Cellular automata excel at capturing emergent behaviour from simple local rules.

Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

Samples: Analyse existing NASA GeneLab datasets from microgravity analogue rodent hindlimb unloading experiments, measuring RANK/RANKL/OPG expression in bone tissue at multiple timepoints. Controls: Earth-gravity parameter sets; randomised signalling gradients. Data collected: Bone volume fraction, trabecular thickness, connectivity metrics from CA output; compare against μCT imaging data from actual experiments.

Homework Part B: Individual Final Project

https://docs.google.com/presentation/d/142YNBXXcDJBfGO_OaF0DpeaF_287YsDeH1-Acp7kUI0/edit?slide=id.g3d41c2c8c8b_41_231#slide=id.g3d41c2c8c8b_41_231

Bibliography

Calhoun, K.A. and Swartz, J.R. (2007). Energy Systems for ATP Regeneration in Cell-Free Protein Synthesis Reactions. In Vitro Transcription and Translation Protocols, pp.3–17. doi:https://doi.org/10.1007/978-1-59745-388-2_1.

‌

Week 10 HW: Advanced Imaging & Measurement Technology

Final Project

Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

molecular validation (to confirm whether the Twist plasmids are correct and transformed properly) through the presence of the sender and receiver plasmid, correct insert size and sequence identity, then successful bacterial transformation
functional signalling validation / AHL sensing ability through receiver activation threshold, GFP expression strength, dose-response curve, signal saturation
spatial pattern formation to measure whether LuxI actually produces AHL through sender-generated AHL activity, effective AHL equivalent concentration, and hopefully signalling consistency between cultures
ML/CAD dataset features through OT-2 plate maps, fluorescence reads and metadata logs

Please describe all of the elements you would like to measure, and furthermore, describe how you will perform these measurements.

The project will measure genetic construct integrity, quorum-sensing response strength, sender-derived signalling output, and spatial cellular automata pattern formation

What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

Plasmid integrity will be measured using agarose gel electrophoresis to confirm the expected insert sizes of the LuxI sender and LuxR-sfGFP receiver constructs.
Functional LuxR receiver performance will be quantified by exposing transformed receiver E. coli to a defined dilution series of 3-oxo-C6-HSL and measuring GFP fluorescence over time using a handmade(! to plan) fluorescence plate reader
LuxI sender activity will be measured indirectly by collecting cell-free sender supernatant and applying it to the receiver strain, using the previously generated AHL-GFP calibration curve to understand the effective signalling molecule concentration.
Spatial pattern formation relevant to biological cellular automata will be measured using Opentrons (OT-2) automated plate layouts combined with fluorescence time-lapse imaging. Metrics including activation radius, neighbourhood response, and endpoint fluorescence distribution will then be extracted and used as training data for ML-enhanced CAD prediction in Aim 2.

Waters Part I — Molecular Weight

1. Calculated molecular weight

Compute pI/Mw - Results Theoretical pI/Mw (average)

Sequence: 10 20 30 40 50 60 MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT

    70         80         90        100        110        120

LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL

   130        140        150        160        170        180

VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA

   190        200        210        220        230        240

DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYKL

EHHHHHH

Theoretical pI/Mw: 5.90 / 28006.60

2. Calculated molecular weight of the eGFP using the adjacent charge state approach described in the recitation

For my data I chose peaks at $m/z = 933.8044$ and $m/z = 966.0037$

$$966.0037 - 933.8044 = 32.1993$$

For $n$ and $n-1$:

$$\frac{MW}{n} - \frac{MW}{n-1} \approx \frac{MW}{n(n-1)}$$

$z = 30$ and $z = 29$:

$$\text{Expected spacing} \approx \frac{MW}{30 \times 29} \approx \frac{28000}{870} \approx 32.2$$

$m/z = 933.8044$ is $z = 30$

$m/z = 966.0037$ is $z = 29$

MW ($z = 30$):

$$MW = (30 \times 933.8044) - (30 \times 1.007825)$$ $$= 28,014.13 - 30.23$$ $$= 27,983.9\ \text{Da}$$

Super close to the theoretical MW of $28,006.60\ \text{Da}$ from the first step

3. accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1

$$\text{Accuracy} = \frac{|MW_{\text{experiment}} - MW_{\text{theory}}|}{MW_{\text{theory}}}$$ $$= \frac{|27,983.9 - 28,006.60|}{28,006.60}$$ $$= \frac{22.7}{28,006.60}$$ $$= 0.00081$$ $$= 0.081\%$$

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

No, as the protein is in its denatured state at a high charge state (32+)

Waters Part III — Peptide Mapping - primary structure

1. Lysines (K) and Arginines (R) are in eGFP

K R

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYGKLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

K=20 R=6

2. Peptides generated from tryptic digestion of eGFP?

Theoretical pI: 5.90
Mw (average mass): 28006.60
Mw (monoisotopic mass): 27988.96

#	Mass	Position	Peptide Sequence
1	4472.1752	170–210	HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK
2	2566.2931	217–239	DHMVLLEFVTAAGITLGMDELYK
3	2437.2608	5–27	GEELFTGVVPILVELDGDVNGHK
4	2378.2577	54–74	LPVPWPTLVTTLTYGVQCFSR
5	1973.9062	142–157	LEYNYNSHNVYIMADK
6	1503.6597	28–42	FSVSGEGEGDATYGK
7	1266.5783	87–97	SAMPEGYVQER
8	1083.4979	240–247	LEHHHHHH
9	1050.5214	115–123	FEGDTLVNR
10	982.4952	133–141	EDGNILGHK
11	821.3940	81–86	QHDFFK
12	790.3552	75–80	YPDHMK
13	769.3913	47–53	FICTTGK
14	711.2944	103–108	DDGNYK
15	655.3813	98–102	TIFFK
16	602.2780	211–215	DPNEK
17	579.3137	128–132	GIDFK
18	507.2925	164–167	VNFK
19	502.3235	124–127	IELK

Predicted peptides = 19

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

-5.

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

Observed LC peaks = ~19

Observed peaks are about the same as predicted

5. Identify the mass-to-charge of the peptide shown in Figure 5b. What is the charge of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide based on its m/z and z

Main peptide peak = 525.76712

Spacing = 0.5 m/z, => most abundant charge state is 2+. The singly charged mass is 1050.52 Da

Charge = +2 [ M + H ] + [M+H] + = 1050.52145 Da [ M + H ] + [M+H] + = 1050.52438 Da

Identified peptide: FEGDTLVNR

the error is about 2.8 ppm

6. Identify the peptide from PeptideMass

FEGDTLVNR has a theoretical monoisotopic mass of 1050.52143 Da

7. 88 is the percentage of the sequence that is confirmed by peptide mapping

Waters Part IV — Oligomers

7FU Decamer = 3.40 MDa

8FU Didecamer = 8 MDa

8FU 3-Decamer = 12 MDa

8FU 4-Decamer = 16 MDa

Waters Part V — Did I make GFP?

Yes!

	Theoretical	Observed/Measured on Intact LC-MS	PPM Mass Error	comparing with
Molecular Weight (kDa)	26,905 Da	26,903 Da	0.7 ppm	26.90

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

At this point, I’m not entirely sure what I’ve contributed to, as the artwork changed quite a lot. I’ve added a few yellow and green pixels when we first got access to the board, but as time went it changed so much that I don’t think any of them were left in the same spots. I did like the collaborative aspect of it, but going forward, I kind of wish everyone could contribute to just one/two pixels that couldn’t be overwritten. That would make the pointing-out aspect of it so much easier

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

E. coli Lysate Provides the cellular machinery for transcription and translation, including ribosomes, tRNAs, translation factors, and other essential proteins.

Salts/Buffer

Potassium Glutamate Acts as the primary osmolyte to maintain ionic strength and pH stability in the reaction mixture.

HEPES-KOH pH 7.5 Maintains optimal pH for enzyme activity and protein synthesis throughout the reaction.

Magnesium Glutamate Provides essential Mg²⁺ cofactor. Responsible for ribosome function, translation factors, and RNA stability.

Potassium phosphate monobasic & dibasic Work together as a phosphate buffer system. Maintains pH stability and provide inorganic phosphate for nucleotide synthesis.

Energy/Nucleotide System

Ribose: A pentose sugar that serves as the sugar backbone for nucleotide synthesis

Glucose: Provides metabolic energy and carbon source

AMP, CMP, GMP, UMP Nucleoside monophosphates serve as building blocks for RNA synthesis during transcription. They are recycled during the reaction

Guanine A purine base that can be salvaged and converted into GMP. Gives an alternative route for guanine nucleotide synthesis when needed

Translation Mix (Amino Acids)

17 Amino Acid Mix Provides the standard amino acids (minus Tyr and Cys, which are listed separately) needed for protein synthesis during translation(!)

Tyrosine & Cysteine Amino acids requiring special handling (Tyr for fluorescent protein variants, Cys to prevent oxidation or for selenomethionine incorporation). They are also needed for protein synthesis during translation though

Additives

Nicotinamide: Acts as a precursor for NAD⁺/NADH. Needed for essential cofactors for energy metabolism within the cell-free system.

Backfill

Nuclease Free Water: Dissolves all components and serves as the reaction medium. The goal of nuclease-free formulation is to prevent degradation of RNA transcripts.

Main Differences:

1-hour PEP-NTP vs. 20-hour NMP-Ribose-Glucose Master Mixes The 1-hour optimised PEP-NTP master mix uses PEP and NTPs, which provide rapid, readily available energy and nucleotides for quick transcription/translation reactions with high yields but shorter reaction windows. In contrast, the 20-hour NMP-Ribose-Glucose master mix uses NMPs + glucose and ribose, for a more sustainable, self-regenerating system where nucleotides and energy are generated on-demand through metabolic pathways - > reaction continues for much longer periods BUT! longer reaction times to reach similar protein yields.

The longer format is designed for extended protein synthesis reactions, while the PEP-NTP system is optimised for speed

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Part 1: Biophysical/Functional Properties of Each Fluorescent Protein

sfGFP can fold in under 10 minutes, making it one of the fastest-maturing green fluorescent proteins.
mRFP1 is described as a somewhat slowly-maturing monomer with low acid sensitivity.
mKO2 is an orange fluorescent protein with moderate acid sensitivity
In mTurquoise2, the reaction with oxygen is a rate-determining step
mScarlet is a red fluorescent protein with moderate acid sensitivity, a fluorescence lifetime of 3.9 ns (which the highest value recorded to date for other red mRFPs) also generally highest calculated brightness in the mRFP spectral class.
Electra2 is a flavin-based fluorescent protein. fluorescence intensity is independent of molecular oxygen.

Part 2: Hypothesis for Optimization

So, for my hypothesis, I’d like to check whether fluorescence in sfGFP increases most when glucose and phosphate are added together, suggesting energy regeneration (not just translation) can be the limiting factor over long incubations. My guess here is that increasing magnesium concentration will enhance sfGFP fluorescence up to an optimal point (well 1), while simultaneous increases in glucose and phosphate (well 2) will further boost fluorescence over 36 hours through improving sustained energy regeneration. I also added just an increased glucose well (well 3) as another control to further support this hypothesis. This is based on research done for CFPS studies. Especially through Silverman et al. (2026) where they performed multi-dimensional optimization and emphasized Mg²⁺ and phosphate as key variables. They also showed that replacing GMP with guanine+ribose works (so transcription will succeed in all our conditions even though GMP is omitted). Then, Xu et al. (2026) found that single-component tweaks often had no effect, but certain pairs (e.g. sugar+phosphate) did, guiding us to test glucose+phosphate together

Week 12 HW: Building Genomes

See week 11~