<Jenn Leung> — HTGAA Spring 2026


About me

Jenn Leung is a researcher and creative technologist working at the intersection of synthetic biology, real-time simulation, and living neural systems. She is a Senior Lecturer in Creative Technology & Design at University of the Arts London, a researcher at LifeFabs Institute, and a Visiting Researcher at The Bartlett School of Architecture, UCL, working on the 100 Minds in Motion project combining EEG, eye-tracking, and movement data within agent-based simulation.

Her research currently focuses on developing Unreal Engine interfaces for living neurons and agent behaviour simulations. Two papers have recently been published in the MIT Antikythera Journal, and a paper on UE-API for brain-on-a-chip platforms was presented at NeurIPS 2025. Since 2025, she has collaborated with the biocomputing start-up Cortical Labs to create human-synthetic biological intelligence visualizations. In 2026, she is also collaborating with Michael Levin, Agnieszka Kurant, and Emily Ertle for Rhizome’s 7x7 programme.

Previously, Jenn was a studio researcher at Antikythera’s Cognitive Infrastructures Studio in 2024, supported by the Berggruen Institute; led community tech support for Off World Live; and served as Programme Head at the Architectural Association VS Unit 5 Xalon. Her work has been exhibited at Epic Games Innovation Lab, Ars Electronica, Medialab Matadero, W1 Curates, Tai Kwun Hong Kong, LACMA Digital Leaders, National Communication Museum (Australia), CIVA Festival, DAE Research Festival, PAF Olomouc, ALife Conference Kyoto, Aksioma, and Museum of Art in Public Spaces (Køge), among others, and has been featured in Dazed, TANK Magazine, DIS, SHOWStudio, Art Asia Pacific, COEVAL Magazine, and AQNB.

In collaboration with Daniel Felstead, she has produced the short film series ‘I’m so Janky’ on DIS, which explores the myths, ideologies and realities of the metaverse, AI, and Neuralink. She also collaborates with dmstfctn on simulation projects for Serpentine Arts Technologies and the Leonardo Supercomputer at Bologna’s Tecnopolo.


Homework

Weekly homework submissions:

  • Week 1: Principles and Practices

    Question 1: First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about.

  • Week 3: Lab Automation

    Lab: Opentrons Art

  • Week 4: Protein Design Part I

    Protein Design Part I (Thras Karydis, Jon Kaufman)
    Lab: Protein Design I

  • Week 11: Bioproduction & Cloud Labs

    Bioproduction & Cloud Labs (Reshma Shetty)
    Lab: Cloud Lab

  • Week 10: Imaging and Measurement

    Week 10: Advanced Imaging & Measurement Technology (Apr 7). Advanced Imaging & Measurement Tech (Evan Daugharthy, Waters Corp.). Lab: Mass Spectrometry. This lecture presents a range of advanced technologies for precision measurement of proteins at atomic scales, characterizing chemical composition, and detecting protein sequence and structure.

  • Week 12: Bioproduction

    Week 12: Bioproduction Homework — DUE BY START OF APR 28 LECTURE (TBD)

  • Week 13: Bio Design Living Materials

    Week 13: Bio Design Living Materials. Homework: Work on your Final Project. Present it on May 12 (MIT/Harvard) or May 13 (Committed Listeners).

  • Week 14: Biofabrication

    Week 14: Biofabrication. Homework: Finish your Final Project. Present it on May 12 (MIT/Harvard) or May 13 (Committed Listeners).

  • Week 2: DNA Read, Write, and Edit

    Week 2: DNA Read, Write, and Edit. Part 1: Benchling & In-silico Gel Art (1.1 Import Lambda DNA; Simulate Restriction Enzyme Digestion; Virtual Gel). Part 2: Gel Art. I chose to create gel art of a person doing a jumping jack using a randomization method.

  • Week 5: Protein Design Part II

    Week 5: Protein Design Part II. Homework: DUE BY START OF MAR 10 LECTURE. Part A: SOD1 Binder Peptide Design. Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

  • Week 6: Genetic Circuits Part I

    Week 6: Genetic Circuits Part I. Homework: DUE BY START OF MAR 17 LECTURE. Assignment: DNA Assembly. Answer these questions about the protocol in this week’s lab:

  • Week 7: Genetic Circuits Part II

    Week 7: Genetic Circuits Part II. Assignment Part 1: Intracellular Artificial Neural Networks (IANNs). What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? An artificial neuron passes a weighted sum of its inputs through an activation function to produce an output; networks of these neurons form an ANN. Intracellular artificial neural networks still use a weighted summation and a non-linear activation function, but implement them with gene circuits. The main difference is that IANN inputs can be combined by both addition and subtraction, giving graded rather than Boolean outputs. For addition, a promoter drives transcription of a gene whose mRNA is translated into protein, so inputs that express the same output add together. To subtract, we can treat input x1 as the endoribonuclease CasE, which binds and cleaves the target RNA sequence and so reduces the output. Here x1 has a negative weight and x2 a positive weight, and the output follows max(x2 − x1, 0). This is also referred to as sequestration: the endoribonuclease removes the transcribed mRNA from circulation (acting as a single-turnover enzyme), which produces the non-linearity.

  • Week 9: Cell-Free Systems

    Week 9: Cell-Free Systems. Homework: DUE BY START OF Apr 7 LECTURE. Homework Part A: General and Lecturer-Specific Questions. General homework questions: Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Cell-free systems help us understand biology ‘from scratch’ and bioengineer from smaller units. They offer wider flexibility for scaffolding biology from the ground up and for controlling the environment in a complete model, whereas existing living cells are already incredibly complex and hence harder to control in experimental settings. Synthetic cell engineering allows flexibility in the size of the cell and its proteins, and even in expanding the chemistry of the cell. The first scenario is when you want to control the size of the cell and need uniform control; a cell-free system may be ideal here. The second scenario is when you want to engineer a specific chemical environment, or want chemical diversity that is not naturally common in, or compatible with, cells. Finally, compared with in vivo expression, which requires building plasmids, cell-free protein expression is faster and cheaper to set up and supports quick iterations with linear DNA fragments instead of plasmids.
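The Week 7 summary above describes an IANN neuron whose output follows max(x2 − x1, 0). As a rough illustration only, the Python sketch below models that idealized behaviour; `iann_node` and `two_input_node` are hypothetical helpers, and the model ignores expression noise, saturation, and enzyme kinetics.

```python
from typing import Sequence


def iann_node(x_pos: Sequence[float], w_pos: Sequence[float],
              x_neg: Sequence[float], w_neg: Sequence[float]) -> float:
    """Idealised IANN neuron (hypothetical helper, not a published model).

    Positively weighted inputs add up as expressed output mRNA/protein;
    negatively weighted inputs (e.g. an endoribonuclease such as CasE)
    sequester that mRNA, so the output is clipped at zero.
    """
    drive = sum(w * x for w, x in zip(w_pos, x_pos))
    sequestered = sum(w * x for w, x in zip(w_neg, x_neg))
    return max(drive - sequestered, 0.0)


def two_input_node(x1: float, x2: float) -> float:
    """The two-input case from the text: x1 weighted -1, x2 weighted +1."""
    return iann_node([x2], [1.0], [x1], [1.0])
```

Sweeping x2 from 0 to 3 with x1 = 1 gives 0, 0, 1, 2: a graded, rectified response rather than the on/off output of a Boolean gate.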


Week 1: Principles and Practices


Question 1

First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about.

Answer 1

I would like to expand my Unreal Engine API project for brain-on-a-chip platforms, which was presented at NeurIPS 2025 (https://openreview.net/forum?id=BroaBkQAGa). The project proposes an API between living neurons interfaced with microelectrode arrays (MEAs) and virtual game environments, so that researchers and designers can use these environments to visualize spiking behavior across MEA channels and use reinforcement learning algorithms within the game environment to train neuronal cultures as game agents.

I’m currently collaborating with Cortical Labs, using the CL1 connected over UDP to design closed-loop, real-time visualization systems at the National Communication Museum in Melbourne. To start the loop, I send blob-tracking data to the CL1 for processing. The spikes from the CL1 are then streamed to Unreal Engine so that the neuronal activity can transform agent parameters. https://ncm.org.au/exhibitions/cortical-labs https://jennleung.xyz/corticallabs
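As a minimal sketch of the streaming step described above (spikes sent over UDP and mapped onto agent parameters), the Python below packs per-channel spike counts into a datagram and maps them to an agent speed. The packet layout, `activity_to_speed`, and the speed range are all assumptions for illustration; this is not Cortical Labs’ CL1 API or the actual NCM installation code.

```python
import socket
import struct
from typing import List, Sequence, Tuple

# Hypothetical wire format, NOT Cortical Labs' actual protocol:
# big-endian uint32 frame index, uint16 channel count, then one uint16
# spike count per MEA channel for that frame.
HEADER = struct.Struct("!IH")


def encode_frame(frame: int, spike_counts: Sequence[int]) -> bytes:
    """Pack one frame of per-channel spike counts into a UDP payload."""
    body = struct.pack(f"!{len(spike_counts)}H", *spike_counts)
    return HEADER.pack(frame, len(spike_counts)) + body


def decode_frame(packet: bytes) -> Tuple[int, List[int]]:
    """Inverse of encode_frame, as the game-engine side would run it."""
    frame, n_channels = HEADER.unpack_from(packet, 0)
    counts = struct.unpack_from(f"!{n_channels}H", packet, HEADER.size)
    return frame, list(counts)


def activity_to_speed(spike_counts: Sequence[int], max_count: int,
                      min_speed: float = 0.5, max_speed: float = 3.0) -> float:
    """Map mean normalised activity onto a hypothetical agent speed parameter."""
    if not spike_counts:
        return min_speed
    clipped = [min(c, max_count) for c in spike_counts]
    level = sum(clipped) / (max_count * len(clipped))
    return min_speed + level * (max_speed - min_speed)


def send_frame(sock: socket.socket, addr: Tuple[str, int], frame: int,
               spike_counts: Sequence[int]) -> None:
    """Fire-and-forget UDP send; lost packets are simply skipped."""
    sock.sendto(encode_frame(frame, spike_counts), addr)
```

On the receiving side, `sock.recvfrom(65535)` on a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` bound to a local port returns payloads that `decode_frame` can unpack; in the real project that role is played by Unreal Engine. UDP suits this kind of loop because a dropped frame is simply superseded by the next one.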

Question 2

Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

Answer 2

One of the main objectives of this project is to provide an open playground for benchmarking open-source and non-standardized brain-on-a-chip platforms. As we expect these systems to become democratized and decentralized, many different configurations of physical and neural assemblies will emerge alongside advances in MEA design, bioprinting technologies, and microfluidic platforms. It is therefore important to support:

  1. Benchmarking integrity and reproducibility. For example, how do we measure spiking activity across different systems? How do we make sure experiments are scientifically meaningful? How do we translate and deliver virtual environments to channels on different MEA geometries?
  2. Accessibility for independent researchers. For example, writing software environments not only for proprietary technologies such as Cortical Labs’ CL1 or FinalSpark’s Neuroplatform. Governance here means committing to abstraction layers that treat the CL1 as one implementation among many.
  3. Responsible scalability across new substrates. For example, new substrates include increasingly complex organoids or assembloids, which should go through rigorous bioethical frameworks.
  4. Sustainability and longevity of the substrates. There should be rate limits so that cells aren’t overstimulated and at risk of early death.
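The accessibility goal calls for abstraction layers that treat the CL1 as one implementation among many. A minimal sketch of that idea in Python: game code depends only on an abstract `MEABackend`, and each platform would supply its own subclass. All class and method names, the microamp parameter, and the feedback rule in `feedback_step` are hypothetical placeholders; no vendor SDK is used or implied.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class MEABackend(ABC):
    """Hypothetical hardware-agnostic interface for an MEA platform."""

    @property
    @abstractmethod
    def channel_count(self) -> int: ...

    @abstractmethod
    def read_spike_counts(self) -> List[int]: ...

    @abstractmethod
    def stimulate(self, channels: Sequence[int], amplitude_ua: float) -> None: ...


class MockMEA(MEABackend):
    """Simulated backend for testing game logic without living cells."""

    def __init__(self, channel_count: int, frames: List[List[int]]):
        self._channels = channel_count
        self._frames = list(frames)
        self.stim_log: List[Tuple[Tuple[int, ...], float]] = []

    @property
    def channel_count(self) -> int:
        return self._channels

    def read_spike_counts(self) -> List[int]:
        # Replay scripted frames, then report silence.
        return self._frames.pop(0) if self._frames else [0] * self._channels

    def stimulate(self, channels: Sequence[int], amplitude_ua: float) -> None:
        if any(not 0 <= c < self._channels for c in channels):
            raise ValueError("channel out of range for this MEA geometry")
        self.stim_log.append((tuple(channels), amplitude_ua))


def feedback_step(mea: MEABackend, target_channels: Sequence[int]) -> int:
    """Read activity, then deliver feedback; works with any backend.

    The amplitude rule is a placeholder, not a validated protocol.
    """
    total = sum(mea.read_spike_counts())
    mea.stimulate(target_channels, amplitude_ua=1.0 if total > 0 else 0.5)
    return total
```

A CL1 or Neuroplatform adapter would implement the same three members, and `MockMEA` lets environment code be tested before it ever touches a culture.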

Question 3

Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

For each governance action, address:

  • Purpose: What is done now and what changes are you proposing?
  • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
  • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Answer 3

  1. Benchmarking metadata across different brain-on-a-chip platforms

Purpose: Currently there are multiple commercial/proprietary brain-on-a-chip platforms, such as Cortical Labs’ CL1 and FinalSpark’s Neuroplatform, but there is no standardization of, or comparative metadata for, these systems. I am proposing to compile metadata on existing platforms and develop an open-access metadata standard that documents different MEA geometries, channel counts, and substrates.

Design: Map out the academic researchers working on organoid intelligence and synthetic bioengineered intelligence standardization, as well as manufacturers such as MaxWell Biosystems and Cortical Labs, and join community labs or open-source groups doing open-source research. For implementation, I will need to consult all of these groups to create a UE plugin that responds to their needs. AHRC/UKRI grants would be a good funding route.

Assumptions: This action assumes that all parties are happy to share their manuals or manufacturing details; however, some of this data might be protected under NDA.

Risks of Failure and Success: There’s a high chance the open-source projects will grow exponentially, making this metadata impossible to manage at scale.
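To make the proposed metadata standard concrete, here is a minimal sketch of one record type with JSON round-tripping. The field names and validation rules are assumptions, not an existing standard, and the example values in the usage below are placeholders rather than real platform specifications.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PlatformRecord:
    """One entry in a hypothetical open metadata standard for MEA platforms."""
    platform: str
    vendor: str
    channel_count: int
    electrode_pitch_um: Optional[float]  # None when the vendor does not disclose it
    layout: str                          # e.g. "grid" or "irregular"
    substrates: List[str] = field(default_factory=list)
    open_source: bool = False

    def validate(self) -> None:
        if self.channel_count <= 0:
            raise ValueError(f"{self.platform}: channel_count must be positive")
        if self.electrode_pitch_um is not None and self.electrode_pitch_um <= 0:
            raise ValueError(f"{self.platform}: electrode_pitch_um must be positive")


def to_json(records: List[PlatformRecord]) -> str:
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


def from_json(text: str) -> List[PlatformRecord]:
    records = [PlatformRecord(**item) for item in json.loads(text)]
    for record in records:
        record.validate()
    return records
```

Keeping undisclosed fields as `null` (the NDA concern under Assumptions) lets partial records still enter the shared dataset instead of being excluded.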

  2. Developing stimulation protocols at the API layer

Purpose: Since there are many different types of brain-on-a-chip platforms, each company or lab has its own protocols for stimulating and recording these systems. I propose a stimulation protocol that is initiated by the API/game environment.

Design: Study the stimulation protocols across different systems and apply appropriate time scales, rate limits, response rates, and stimulation/discretization patterns so that we can formalize communication with living neurons.

Assumptions: The biggest assumption here is that standardization is applicable and scientifically meaningful across different biological systems. Because of biological variability, it might not be: systems vary by culture, by physical assembly, and by MEA type.

Risk of Failure: Overstandardization might lead to less meaningful scientific experiments. Fixed rate limits and standards might fail to account for neuronal plasticity and assume that the technology will not evolve. Constant review and negotiation are needed to make this option work!
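One way to put Option 2’s rate limits at the API layer is a per-channel token bucket, a standard rate-limiting technique borrowed from networking. The sketch assumes timestamps in seconds and uses placeholder numbers; safe stimulation rates would have to come from the biology of each culture, not from this code.

```python
from typing import Dict


class StimulationRateLimiter:
    """Per-channel token bucket: up to `burst` pulses back to back,
    refilled at `rate_hz` tokens per second (placeholder values)."""

    def __init__(self, rate_hz: float, burst: int):
        if rate_hz <= 0 or burst < 1:
            raise ValueError("rate_hz must be > 0 and burst >= 1")
        self.rate_hz = rate_hz
        self.burst = float(burst)
        self._tokens: Dict[int, float] = {}
        self._last_seen: Dict[int, float] = {}

    def allow(self, channel: int, now_s: float) -> bool:
        """Return True (and spend a token) if this channel may be stimulated now."""
        tokens = self._tokens.get(channel, self.burst)
        elapsed = max(0.0, now_s - self._last_seen.get(channel, now_s))
        tokens = min(self.burst, tokens + elapsed * self.rate_hz)
        self._last_seen[channel] = now_s
        allowed = tokens >= 1.0
        self._tokens[channel] = tokens - 1.0 if allowed else tokens
        return allowed
```

In a live loop the game would call `limiter.allow(channel, time.monotonic())` before every stimulation request and drop (or queue) requests that return `False`, so no environment design can overstimulate a channel regardless of what the agent does.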

  3. Developing a wide range of benchmarking gaming environments/templates

Purpose: Cortical Labs has compared living neurons against RL algorithms in Pong. I would like to expand on this and develop something similar to OpenAI Gym, so that we can create environments for synthetic bioengineered intelligence.

Design: These might include standardized task environments that allow researchers to compare RL agent performance on identical tasks, or multiplayer/team battles between two systems for performance evaluation. Standardized environments help make experimental results reproducible and comparable across institutions.

Assumptions: The templates assume that biological variability can be characterized statistically across many runs; if variability is too high, the benchmarks may not be informative.

Risks of Failure and Success: Templates might restrict certain experiment designs, so it will be important to balance standardization and benchmarking against openness.
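Option 3 proposes Gym-like benchmark templates. A minimal sketch of one such template follows, written against the Gymnasium-style convention (`reset()` returns `(observation, info)`, `step()` returns `(observation, reward, terminated, truncated, info)`) but without importing Gymnasium. `CatchEnv` is a hypothetical, simplified Pong-like task, not Cortical Labs’ Pong setup; a policy could be an RL agent or, hypothetically, a decoder that turns culture activity into actions.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Obs = Tuple[int, int, int]  # (ball_x, ball_y, paddle_x)


class CatchEnv:
    """Toy 'catch' task: a ball falls one row per step and the agent moves
    a paddle left (-1), stays (0) or moves right (+1) to catch it."""

    def __init__(self, width: int = 8, height: int = 8, seed: Optional[int] = None):
        self.width = width
        self.height = height
        self.rng = random.Random(seed)  # seeded so every system sees the same balls
        self.ball_x = self.ball_y = self.paddle_x = 0

    def _obs(self) -> Obs:
        return (self.ball_x, self.ball_y, self.paddle_x)

    def reset(self) -> Tuple[Obs, Dict]:
        self.ball_x = self.rng.randrange(self.width)
        self.ball_y = self.height - 1
        self.paddle_x = self.width // 2
        return self._obs(), {}

    def step(self, action: int) -> Tuple[Obs, float, bool, bool, Dict]:
        if action not in (-1, 0, 1):
            raise ValueError("action must be -1, 0 or 1")
        # Move the paddle, clamped to the arena, then let the ball fall.
        self.paddle_x = min(self.width - 1, max(0, self.paddle_x + action))
        self.ball_y -= 1
        terminated = self.ball_y == 0
        reward = 0.0
        if terminated:
            reward = 1.0 if self.paddle_x == self.ball_x else -1.0
        return self._obs(), reward, terminated, False, {}


def run_episode(env: CatchEnv, policy: Callable[[Obs], int]) -> float:
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


def tracking_policy(obs: Obs) -> int:
    """Baseline that always moves toward the ball."""
    ball_x, _, paddle_x = obs
    return (ball_x > paddle_x) - (ball_x < paddle_x)
```

Because the environment is seeded, an RL baseline and a living culture can be scored on exactly the same ball sequences, which is the reproducibility property the Design section asks for.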

Question 4

Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Answer 4


| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 2 | 1 | 3 |
| • By helping respond | 1 | 3 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 3 | 1 | 2 |
| • By helping respond | 1 | 2 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 2 | 1 | 3 |
| • By helping respond | 2 | 1 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility? | 2 | 1 | 3 |
| • Not impede research | 1 | 2 | 3 |
| • Promote constructive applications | 3 | 1 | 2 |

Question 5

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

Answer 5

Option 2 seems to be the strongest option because it builds on fundamental knowledge of other research institutions’ practices and existing start-up solutions. It is the governance action that most directly addresses the biological welfare and safety concerns unique to this field. Since we can’t retroactively undo damage to a neuronal culture, embedding safety protocols at the API layer is the most impactful intervention point.

Question 6

Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.

Answer 6

I am interested in the concept of the pharmakon: the success of research can also come at the cost of creating new problems, such as bioweapons or the unregulated production of illegal substances (biosecurity). The governance actions I am interested in are on the cloud/API side, around how we might apply trust-based connectivity from software design to bio-design. For example, cloud infrastructure already uses trust models, and we could learn from internet architecture to regulate or model remote access to living biological systems.

Homework Questions from Professor Jacobson (Lecture 2 slides)

Question 7

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

Answer 7

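Error rate: about 1 in 10^6 (1:10^6). Throughput error rate product differential: ~10^8. The human genome is about 3.2 billion base pairs long, so copying it at this error rate would introduce roughly 3,200 mistakes (3.2 × 10^9 × 10^-6). Biology reduces the error rate through proofreading: when a mismatched base pair forms, the polymerase removes the wrong nucleotide and tries again with the correct one.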
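The numbers in Answer 7 reduce to one multiplication, and the same relation shows how much more accurate copying must be to average fewer than one mistake per genome. A minimal Python sketch using only the figures quoted above (the function names are illustrative):

```python
def expected_copy_errors(genome_length_bp: float, error_rate: float) -> float:
    """Expected number of mistakes in one copy: length x per-base error rate."""
    return genome_length_bp * error_rate


def required_error_rate(max_errors: float, genome_length_bp: float) -> float:
    """Per-base error rate needed to average at most `max_errors` per copy."""
    return max_errors / genome_length_bp


HUMAN_GENOME_BP = 3.2e9  # figure used in Answer 7
POLYMERASE_ERROR_RATE = 1e-6  # 1:10^6 from the lecture slide
```

`expected_copy_errors(HUMAN_GENOME_BP, POLYMERASE_ERROR_RATE)` gives about 3,200, and `required_error_rate(1, HUMAN_GENOME_BP)` gives about 3 × 10^-10, roughly 3,200 times lower than 1:10^6. That is the gap proofreading (and further repair) has to close.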

Question 8

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Answer 8

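There is an astronomical number of ways to encode an average human protein. Each amino acid is encoded by about 3 codons on average (61 sense codons for 20 amino acids), and an average human protein is more than 300 amino acids long, giving on the order of 3^300 ≈ 10^143 possible coding sequences. In practice, not all of these work: codons differ in how abundant their matching tRNAs are, so sequences rich in rare codons can cause ribosomes to stall, fall off, or misread, which leads to less protein being produced.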
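To make the “astronomical” count concrete, the sketch below multiplies the number of synonymous codons for each residue of a given protein (standard genetic code, stop codon ignored). The helper names are hypothetical; the per-amino-acid codon counts are the standard table.

```python
import math
from typing import Dict

# Synonymous codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons excluded).
CODONS_PER_AA: Dict[str, int] = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Exact number of DNA coding sequences for a protein (stop codon ignored)."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein.upper())


def log10_encodings(protein: str) -> float:
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein.upper())
```

For a 300-residue protein averaging about 3 codons per residue, `3 ** 300` has 144 digits, i.e. roughly 10^143 possible coding sequences.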

Homework Questions from Dr. LeProust (Lecture 2 slides)

Question 9

What’s the most commonly used method for oligo synthesis currently?

Answer 9

Phosphoramidite method by Caruthers

Question 10

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Answer 10

The chemistry accumulates damage with every coupling cycle, so yield and quality hit a wall at around 200 nucleotides.

Question 11

Why can’t you make a 2000bp gene via direct oligo synthesis?

Answer 11

The error rate is about 1 in 3,000 bp, so errors are distributed across too many molecules and the correct product becomes impossible to purify. It would also require good sequencing and fragment analysis, as well as uniform quality across all oligos.
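The cumulative damage in Answers 10 and 11 can be made concrete: if each coupling step succeeds with probability p, only p^n of strands reach full length n. The sketch assumes an illustrative coupling efficiency of 99.5% (not a figure from the lecture) and reuses the 1-in-3,000 error rate from Answer 11, treating errors as independent.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that completed every coupling step (p ** n)."""
    return coupling_efficiency ** length_nt


def error_free_fraction(length: int, errors_per_base: float) -> float:
    """Probability that a product of this length carries no errors,
    assuming errors occur independently at each position."""
    return (1.0 - errors_per_base) ** length


ASSUMED_COUPLING_EFFICIENCY = 0.995  # illustrative assumption only
```

Under these assumptions about 37% of 200-mers are full length but only about 0.004% of 2,000-mers, and even among full-length 2,000 bp products only about half would be error-free at 1 in 3,000.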

Homework Question from George Church (Lecture 2 slides)

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

Option A – Question 12

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Answer 12 (if you choose Option A)

Arginine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine.

The lysine contingency from Jurassic Park is irrelevant here, since no animal can synthesize lysine; all animals already have to obtain it from food.

Option B – Question 13

[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

Answer 13 (if you choose Option B)

(Write your answer here.)

Option C – Question 14 (Advanced students)

[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own: https://arpa-h.gov/explore-funding/programs/boss https://www.darpa.mil/research/programs/smart-rbc https://www.darpa.mil/research/programs/go

Answer 14 (if you choose Option C)

(Write your answer here.)

Week 3: Lab Automation

Python Script for Opentrons Artwork

Question 1

Generate an artistic design using the GUI at opentrons-art.rcdonovan.com.

I first generated a design from an image of a voxelated ragdoll. The pixelated shapes should simplify the image so that it can be plotted similarly in the dish.


Because of the lack of contrast and the limited range of colors, the image looked different from what I expected.

Meanwhile, at LifeFabs we only had access to three colors: Pink, Blue, and Purple. So I reduced the number of fluorescent proteins used to three and generated the coordinates accordingly.


Question 2

Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons.

These were the coordinates generated from the GUI using three fluorescent proteins:

azurite_points = [(-5, 39),(-3, 39),...]
tdtomato_points = [(5, 35),(5, 33),...]
tagrfp_points = [(11, 29),(13, 29),...]

Using the Opentrons Colab document, I successfully integrated the point data into the code:

from opentrons import types

metadata = {
    'author': 'Jenn Leung',
    'protocolName': 'Opentrons Cat',
    'description': 'HTGAA 2026 Opentrons cat drawing',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Colour mapping
###   A1 = Blue   → Azurite (cat outline and body)
###   B1 = Purple → mCherry + mPlum (shadow and accent details)
###   C1 = Pink   → tdTomato + tagRFP + mHoneydew (warm fill and face)
##############################################################################

well_colors = {
    'A1': 'Blue',
    'B1': 'Purple',
    'C1': 'Pink',
}

##############################################################################
###   Point data
##############################################################################

azurite_points = [...]
tdtomato_points = [...]
tagrfp_points = [...]

# ... (full code in OpentronsProtocol.py)

# Blue (A1) — Azurite: cat outline and body
paint_layer(azurite_points, 'Blue')

# Pink (C1) — tdTomato + tagRFP + mHoneydew: warm fill and face detail
paint_layer(tagrfp_points, 'Pink')

# Purple (B1) — mCherry + mPlum: shadow accents and deep detail
paint_layer(tdtomato_points, 'Purple')

This is the result of the final preview on the colab document, using the three colors available.
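Before running a layer on the robot, it can help to sanity-check the GUI coordinates and plan how many dots fit in one tip load. This pre-flight sketch assumes the points are millimetre offsets from the dish centre, a 40 mm usable radius, 1 µL per dot, and a 20 µL tip; all of these numbers, and the helper names, are assumptions rather than values from the course Colab or OpentronsProtocol.py.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) offset from the dish centre, assumed to be mm


def within_dish(points: List[Point], radius_mm: float) -> bool:
    """True if every dot lies inside a circle of the given radius."""
    return all(x * x + y * y <= radius_mm * radius_mm for x, y in points)


def dispense_batches(points: List[Point], ul_per_dot: float,
                     max_tip_ul: float) -> List[List[Point]]:
    """Split a layer into groups that fit in one aspiration of the tip."""
    dots_per_load = int(max_tip_ul // ul_per_dot)
    if dots_per_load < 1:
        raise ValueError("tip volume is smaller than a single dot")
    return [points[start:start + dots_per_load]
            for start in range(0, len(points), dots_per_load)]
```

Each batch would then correspond to one aspirate followed by one dispense per dot inside the protocol, which keeps the number of tip loads predictable for each color layer.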


Post-Lab Questions

Question 1

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Answer:
‘Fluidic Programmable Gravi-maze Array for High Throughput Multiorgan Drug Testing’ by Wong et al. proposes OrganRX, a multi-organ-on-a-chip system that is compatible with automated liquid-dispensing robots such as the Opentrons OT-2.

The programmable part of the microfluidic architecture uses robotic liquid handlers and automated plate readers, which can help researchers program how much media reaches each organ compartment.

There is also a programmable tilting recirculation mechanism that drives flow between the corner wells of the chip, allowing for directional flow.

The authors developed a Bluetooth-enabled iOS app for remote control of the recirculation system, allowing users to select from multiple shear flow rates, set programmable waiting times between tilt-direction changes, and reset the system.


Question 2

Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more.

As my research focuses on brains-on-chips and facilitating closed-loop interactions between living substrates and software systems, I’m curious to develop something similar to the OrganRX platform that utilizes Opentrons for chemical I/O with synthetic bioengineered intelligence. The direction is to look into facilitating biochemical feedback loops and designing custom plates for Opentrons via 3D printing.

#pseudo-code for HTGAA final project Assembloid Agency
#design plasmids and custom plate MEA holder > order Twist DNA > 3D print labware > script OT2 protocol > measure spike changes and chemical delivery

prep:
>list the types of materials needed to facilitate chemical I/O for wetware, e.g. placeholder neurons, Opentrons OT-2, custom labware, Twist order, microfluidic design, media

synbio component:
>design a plasmid in Benchling and identify a chemical 'handshake' between Opentrons and neurons
>synthetic gene for measurable and identifiable chemical signals
>research DREADD hM3Dq (human M3 muscarinic receptor DREADD coupled to Gq)
>benchling design!

physical system:
>design microfluidics system and composition of the custom labware
>print custom labware for the Opentrons OT-2 that holds an MEA chip, with separate wells for cultures and reinforcement agents, plus waste perfusion/filter
>the wells should hold basal media, reinforcement agents, and waste buffer - maybe model after the OrganRX chip to 'tilt' agents into center/substrate.

software:
>develop an API for the OT-2 to detect

assembly:
>test and try to connect the synbio parts, with hardware, and software!
>measure spikes from neurons placeholder after robotic chemical delivery
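The last pseudocode step, measuring spikes after robotic chemical delivery, could start with a simple before/after firing-rate comparison. This sketch assumes sorted spike timestamps in seconds from one channel and a known delivery time; `delivery_response` and the 10 s window are hypothetical choices.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def rate_hz(spike_times: Sequence[float], start: float, end: float) -> float:
    """Mean firing rate in [start, end); spike_times must be sorted (seconds)."""
    n_spikes = bisect_left(spike_times, end) - bisect_left(spike_times, start)
    return n_spikes / (end - start)


def delivery_response(spike_times: Sequence[float], t_delivery: float,
                      window_s: float = 10.0) -> Tuple[float, float]:
    """Firing rate in equal windows before and after a chemical delivery."""
    before = rate_hz(spike_times, t_delivery - window_s, t_delivery)
    after = rate_hz(spike_times, t_delivery, t_delivery + window_s)
    return before, after
```

Repeating this per channel and per delivery gives a first, crude readout of whether a reinforcement agent changes activity, before any more careful statistics.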

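WIP JSON code for custom labware (laid out after the Opentrons labware definition schema v2; the namespace, loadName, and totalLiquidVolume values are placeholders):

{
  "schemaVersion": 2,
  "version": 1,
  "namespace": "custom_beta",
  "ordering": [["A1"], ["A2"]],
  "brand": {"brand": "CorticalLabs-Custom"},
  "metadata": {
    "displayName": "Assembloid Agency Chemical IO Plate",
    "displayCategory": "other",
    "displayVolumeUnits": "µL"
  },
  "parameters": {
    "format": "irregular",
    "isTiltable": false,
    "isMagneticModuleCompatible": false,
    "loadName": "assembloid_agency_chemical_io_plate"
  },
  "cornerOffsetFromSlot": {"x": 0, "y": 0, "z": 0},
  "dimensions": {
    "xDimension": 127.76,
    "yDimension": 85.48,
    "zDimension": 15.0
  },
  "wells": {
    "A1": {
      "depth": 10.0,
      "diameter": 3.0,
      "shape": "circular",
      "totalLiquidVolume": 70,
      "x": 20.0,
      "y": 40.0,
      "z": 5.0
    },
    "A2": {
      "depth": 10.0,
      "diameter": 3.0,
      "shape": "circular",
      "totalLiquidVolume": 70,
      "x": 40.0,
      "y": 40.0,
      "z": 5.0
    }
  },
  "groups": [{"metadata": {}, "wells": ["A1", "A2"]}]
}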
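Because the labware definition is still WIP, a small pre-flight check can catch structural mistakes before loading it onto the OT-2. The sketch below is not the official Opentrons validator (Opentrons’ own labware schema remains the authoritative check); it only tests a few assumptions: that the top-level keys of schema v2 are present, that each `ordering` entry is one column (e.g. `[["A1"], ["A2"]]`, not `[["A1", "A2"]]`), that no well rises above the labware top, and that each circular well’s `totalLiquidVolume` fits its geometry. Well names are assumed to be one row letter followed by a column number.

```python
import json
import math
from typing import Dict, List

REQUIRED_KEYS = ["schemaVersion", "version", "namespace", "metadata", "brand",
                 "parameters", "cornerOffsetFromSlot", "ordering",
                 "dimensions", "wells", "groups"]


def check_labware(defn: Dict) -> List[str]:
    """Return human-readable problems found in a labware definition dict."""
    problems = [f"missing top-level key: {k}" for k in REQUIRED_KEYS if k not in defn]
    wells = defn.get("wells", {})
    ordering = defn.get("ordering", [])
    listed = [name for column in ordering for name in column]
    if sorted(listed) != sorted(wells):
        problems.append("ordering does not list every well exactly once")
    for column in ordering:
        # Every entry in one ordering column should share a column number.
        if len({name[1:] for name in column}) > 1:
            problems.append(f"ordering column {column} mixes column numbers")
    top = defn.get("dimensions", {}).get("zDimension", math.inf)
    for name, well in wells.items():
        if well.get("z", 0.0) + well.get("depth", 0.0) > top:
            problems.append(f"{name}: well extends above the labware top")
        if (well.get("shape") == "circular" and "diameter" in well
                and "totalLiquidVolume" in well):
            capacity = math.pi * (well["diameter"] / 2) ** 2 * well["depth"]  # mm^3 = µL
            if well["totalLiquidVolume"] > capacity:
                problems.append(f"{name}: totalLiquidVolume exceeds well geometry")
    return problems


def check_labware_file(path: str) -> List[str]:
    with open(path, encoding="utf-8") as fh:
        return check_labware(json.load(fh))
```

An empty list from `check_labware_file("labware.json")` only means these particular assumptions hold; the definition should still be loaded and tested in the Opentrons app before a run.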

Final Project Ideas — DUE BY START OF FEB 24 LECTURE

As explained in this week’s recitation, add 1-3 slides with 3 ideas you have for an Individual Final Project in the appropriate slide deck for MIT/Harvard/Wellesley students or for Committed Listeners. Be sure to put your name on your slide(s); for CLs, also put your city and country on your slide(s) and be sure you’re putting your slide(s) in your Node’s section of the deck.

Assembloid Agency is a bio-digital interface platform designed to facilitate closed-loop biochemical communication between synthetic neural substrates and automated software systems. I will integrate the Opentrons OT-2 with a multi-electrode array (MEA) to create a chemical I/O bridge between neural substrates and software systems.

I’m looking into using DREADDs to allow software-controlled chemical I/O, as well as designing custom 3D-printed labware that houses the biological assembly while providing microfluidic channels for automated media exchange, chemical reinforcement signals, and waste management. The aim is real-time, closed-loop chemical communication with the substrate.

Reading & Resources

Week 4: Protein Design Part I

This week focuses on how sequence, structure, and energetics can be modeled and manipulated to create or optimize proteins with specified functions.

Objective:

  1. Learn basic concepts:
    • amino acid structure
    • 3D protein visualization
    • the variety of ML-based design tools
  2. Brainstorm as a group how to apply these tools to engineer a better bacteriophage (setting the stage for the final project).

Part A. Conceptual Questions

| Assignees for the following sections | |
|---|---|
| MIT/Harvard students | Required |
| Committed Listeners | Required |

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

For 500 grams of meat, roughly 20–25% of the mass is protein, while fat, fiber, and water make up the rest. Taking 20%, about 100 grams is protein. An average amino acid of ~100 Da has a molar mass of ~100 g/mol, so:
$$m_{\text{protein}} \approx 0.20 \times 500\ \text{g} = 100\ \text{g}$$
$$n = \frac{100\ \text{g}}{100\ \text{g/mol}} = 1\ \text{mol}$$
$$\text{Molecules} = 1\ \text{mol} \times 6.022 \times 10^{23}\ \text{molecules/mol} \approx 6.022 \times 10^{23}$$
There are roughly 602 sextillion amino acids.

  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

When humans eat beef, chewing and digestion break it down into smaller units. Proteins are first cut by enzymes (proteases) into shorter chains of amino acids in the stomach, and these chains are further broken down into individual amino acids in the small intestine. Once these amino acids enter the bloodstream, our cells reassemble them into new proteins following instructions from human DNA. Human DNA encodes different proteins from cow or fish DNA, so the amino acids are built into human proteins rather than into a cow or a fish.

  3. Why are there only 20 natural amino acids?

It may be an evolutionary mystery that almost all living things are built from the same 20 natural amino acids. These 20 amino acids are the building blocks of most proteins. Each is encoded by codons, 3-letter nucleotide sequences that the ribosome reads (via mRNA) to build proteins following the DNA sequence. Reading 3 bases at a time gives 4^3 = 64 possible codons, which is more than enough to encode 20 amino acids plus stop signals, and expansive enough for the making of diverse lifeforms.

  4. Can you make other non-natural amino acids? Design some new amino acids.

Yes, many non-natural amino acids exist. Designing a new amino acid keeps the same backbone (chassis) but redesigns the R-group, the side chain of the amino acid, to alter its chemistry. For example, one could attach an azide to the side chain so that it can form strong covalent links via click chemistry, for stickiness or as a bio-glue. For experiments, some researchers also use non-natural fluorescent amino acids such as acridonylalanine to glow under microscopy or in photographs.

  5. Where did amino acids come from before enzymes that make them, and before life started?

This might be related to assembly theory. Lee Cronin proposes that life is scaffolded by energy, raw materials, and minerals through increasingly complex interactions, which yield amino acids and then longer chains. Gases and energy together can create amino acids: the Miller-Urey experiment used water, methane, ammonia, and hydrogen, with electric sparks as an energy source, to produce amino acids.

  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Left-handed. D-amino acids produce the mirror image of the α-helix, because both the building blocks and the resulting structure are completely mirrored.

  7. Can you discover additional helices in proteins?

Yes. Since 2020, AlphaFold has allowed us to rapidly predict new helices and how proteins fold, revealing millions of protein structures.

  8. Why are most molecular helices right-handed?

Because of chirality, most helices are not identical to their mirror images. As biological amino acids are almost all L-form, the most efficient way for them to stack is to twist to the right, where they can form stable backbone hydrogen bonds with enough room between side chains.

  1. Why do β-sheets tend to aggregate?

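β-strands are held together in sheets by backbone hydrogen bonds. The geometry is a pleated, zigzag, sheet-like structure with side chains protruding alternately above and below the sheet. The edge strands of a sheet still have free hydrogen-bond donors and acceptors, so they can pair with strands from other molecules.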

  • What is the driving force for β-sheet aggregation?

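They tend to aggregate because of their geometry: the hydrophobic faces of different sheets can sandwich together and stick to hide from water. This hydrophobic effect, the pressure from surrounding water to minimize contact with hydrophobic surfaces, becomes the driving force for clumping.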

  1. Why do many amyloid diseases form β-sheets?
    • Can you use amyloid β-sheets as materials?
  2. Design a β-sheet motif that forms a well-ordered structure.
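The hydrophobic-face argument can be made concrete with a short sketch. Because side chains alternate above and below a β-strand, a sequence that alternates hydrophobic and polar residues yields one hydrophobic face (prone to sandwiching with other sheets) and one water-facing face. The strand `VKVEVKVE` and the simple hydrophobic set below are illustrative assumptions, not a validated design.

```python
# Residues classed as hydrophobic here (one simple convention among several)
HYDROPHOBIC = set("AVILMFWC")

def strand_faces(strand):
    """Split a beta-strand into its two faces.

    Consecutive side chains point to alternate sides of the sheet, so residues
    0, 2, 4, ... form one face and residues 1, 3, 5, ... form the other.
    """
    return strand[0::2], strand[1::2]

def hydrophobic_fraction(residues):
    """Fraction of residues in the hydrophobic set."""
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)

# Hypothetical strand with an alternating hydrophobic/charged pattern
strand = "VKVEVKVE"
face_a, face_b = strand_faces(strand)
print(face_a, hydrophobic_fraction(face_a))  # VVVV 1.0 -> fully hydrophobic face
print(face_b, hydrophobic_fraction(face_b))  # KEKE 0.0 -> water-facing face
```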

Part B: Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

  1. Briefly describe the protein you selected and why you selected it.

I chose GPR3, an orphan G protein-coupled receptor, in complex with dominant-negative Gs (PDB 8U8F). GPR3 is a class A orphan G protein-coupled receptor (GPCR) broadly expressed across brain regions, including the hypothalamus, hippocampus, and cortex, as well as in peripheral tissues such as liver and ovary. It may help modulate a number of brain functions, including behavioral responses to stress, amyloid-beta peptide generation in neurons, and neurite outgrowth. For brain-on-a-chip research, I'm interested in the different types of expression in the central nervous system and the brain.

  1. Identify the amino acid sequence of your protein.
    • How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

    There are four protein chains, with lengths (counted from the sequences below) of Chain A: 394, Chain B: 340, Chain C: 58, and Chain D: 390 residues. The most frequent amino acid seems to be leucine, a hydrophobic (water-repelling) amino acid.

    >8U8F_4|Chain D[auth R]|G-protein coupled receptor 3|Homo sapiens (9606)
    NSTMKTIIALSYIFCLVFADYKDDDDLEVLFQGPAMWGAGSPLAWLSAGSGNVNVSSVGPAEGPTGPAAPLPSPKAWDVVLCISGTLVSCENALVVAIIVGTPAFRAPMFLLVGSLAVADLLAGLGLVLHFAAVFCIGSAEMSLVLVGVLAMAFTASIGSLLAITVDRYLSLYNALTYYSETTVTRTYVMLALVWGGALGLGLLPVLAWNCLDGLTTCGVVYPLSKNHLVVLAIAFFMVFGIMLQLYAQICRIVCRHAQQIALQRHLLPASHYVATRKGIATLAVVLGAFAACWLPFTVYCLLGDAHSPPLYTYLTLLPATYNSMINPIIYAFRNQDVQKVLWAVCCCCSSSKIPFRSRSPSDVPAGLEVLFQGPHHHHHHHHAAAFESR
    >8U8F_3|Chain C[auth G]|Guanine nucleotide-binding protein G(I)/G(S)/G(O) subunit gamma-2|Homo sapiens (9606)
    NTASIAQARKLVEQLKMEANIDRIKVSKAAADLMAYCEAHAKEDPLLTPVPASENPFR
    >8U8F_2|Chain B|Guanine nucleotide-binding protein G(I)/G(S)/G(T) subunit beta-1|Homo sapiens (9606)
    QSELDQLRQEAEQLKNQIRDARKACADATLSQITNNIDPVGRIQMRTRRTLRGHLAKIYAMHWGTDSRLLVSASQDGKLIIWDSYTTNKVHAIPLRSSWVMTCAYAPSGNYVACGGLDNICSIYNLKTREGNVRVSRELAGHTGYLSCCRFLDDNQIVTSSGDTTCALWDIETGQQTTTFTGHTGDVMSLSLAPDTRLFVSGACDASAKLWDVREGMCRQTFTGHESDINAICFFPNGNAFATGSDDATCRLFDLRADQELMTYSHDNIICGITSVSFSKSGRLLLAGYDDFNCNVWDALKADRAGVLAGHDNRVSCLGVTDDGMAVATGSWDSFLKIWN
    >8U8F_1|Chain A|Guanine nucleotide-binding protein G(s) subunit alpha isoforms short|Homo sapiens (9606)
    MGCLGNSKTEDQRNEEKAQREANKKIEKQLQKDKQVYRATHRLLLLGAGESGKNTIVKQMRILHVNGFNGEGGEEDPQAARSNSDGEKATKVQDIKNNLKEAIETIVAAMSNLVPPVELANPENQFRVDYILSVMNVPDFDFPPEFYEHAKALWEDEGVRACYERSNEYQLIDCAQYFLDKIDVIKQADYVPSDQDLLRCRVLTSGIFETKFQVDKVNFHMFDVGAQRDERRKWIQCFNDVTAIIFVVASSSYNMVIREDNQTNRLQAALKLFDSIWNNKWLRDTSVILFLNKQDLLAEKVLAGKSKIEDYFPEFARYTTPEDATPEPGEDPRVTRAKYFIRDEFLRISTASGDGRHYCYPHFTCSVDTENIRRVFNDCRDIIQRMHLRQYELL
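A minimal sketch of the length and frequency count, in case the Colab notebook is unavailable. It uses only the standard library and is shown on the short chain C; the same function can be applied to each chain (or to all chains concatenated) to check which residue is most frequent overall.

```python
from collections import Counter

def composition(sequence):
    """Return the sequence length and residue counts, most common first."""
    counts = Counter(sequence)
    return len(sequence), counts.most_common()

# Chain C (G-protein gamma-2 subunit) of 8U8F, copied from the FASTA above
chain_c = "NTASIAQARKLVEQLKMEANIDRIKVSKAAADLMAYCEAHAKEDPLLTPVPASENPFR"

length, ranked = composition(chain_c)
print(length)     # 58
print(ranked[0])  # ('A', 11): alanine is the most frequent residue in this chain
```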
    • How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

    There are thousands of homologs, including in human, pygmy chimpanzee, olive baboon, and cotton-top tamarin. The protein seems highly conserved across these species.

    • Does your protein belong to any protein family?

    G Protein-Coupled Receptor (GPCR) Family

  2. Identify the structure page of your protein in RCSB
    • When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

    The structure was deposited around September 2023 and released in March 2024. It was solved by electron microscopy at a resolution of 3.49 Å, which is lower resolution (a larger value) than the 2.70 Å example.

    • Are there any other molecules in the solved structure apart from protein?

    Yes, I see palmitic acid in the structure in addition to the protein.

    It is a membrane protein and falls under the 7-transmembrane receptors (GPCRs).

  3. Open the structure of your protein in any 3D molecule visualization software:
    • PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
    • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

    Cartoon, ribbon, and ball and stick views (images).

    • Color the protein by secondary structure. Does it have more helices or sheets? It has a lot more helices than sheets.
    • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

    I used an additional script, adapted from the PyMOL wiki's Color_h, to color residues on a hydrophobicity scale: hydrophobic residues are red and hydrophilic (polar/charged) residues are white. Overall, the protein appears slightly more hydrophobic.

# Adapted from https://pymolwiki.org/index.php/Color_h
from pymol import cmd

# Residue types in the order used by Color_h (most to least hydrophobic)
RESIDUES = ["ile", "phe", "val", "leu", "trp", "met", "ala", "gly", "cys", "tyr",
            "pro", "thr", "ser", "his", "glu", "asn", "gln", "asp", "lys", "arg"]

# color_h palette: red (hydrophobic) fading to white (hydrophilic)
RED_TO_WHITE = [
    [0.996, 0.062, 0.062], [0.996, 0.109, 0.109], [0.992, 0.156, 0.156],
    [0.992, 0.207, 0.207], [0.992, 0.254, 0.254], [0.988, 0.301, 0.301],
    [0.988, 0.348, 0.348], [0.984, 0.394, 0.394], [0.984, 0.445, 0.445],
    [0.984, 0.492, 0.492], [0.980, 0.539, 0.539], [0.980, 0.586, 0.586],
    [0.980, 0.637, 0.637], [0.977, 0.684, 0.684], [0.977, 0.730, 0.730],
    [0.973, 0.777, 0.777], [0.973, 0.824, 0.824], [0.973, 0.875, 0.875],
    [0.899, 0.922, 0.922], [0.899, 0.969, 0.969],
]

# color_h2 palette: white (hydrophobic) deepening to green (hydrophilic)
WHITE_TO_GREEN = [[v, 1, v] for v in (
    0.938, 0.891, 0.844, 0.793, 0.746, 0.699, 0.652, 0.606, 0.555, 0.508,
    0.461, 0.414, 0.363, 0.316, 0.27, 0.223, 0.176, 0.125, 0.078, 0.031)]

def _color_by_residue(selection, palette, suffix):
    """Define one named color per residue type and apply it to the selection."""
    s = str(selection)
    print(s)
    for resn, rgb in zip(RESIDUES, palette):
        name = "color_" + resn + suffix
        cmd.set_color(name, rgb)
        cmd.color(name, "(" + s + " and resn " + resn + ")")

def color_h(selection="all"):
    """Red-to-white hydrophobicity coloring."""
    _color_by_residue(selection, RED_TO_WHITE, "")

def color_h2(selection="all"):
    """White-to-green hydrophobicity coloring."""
    _color_by_residue(selection, WHITE_TO_GREEN, "2")

# Register both functions as PyMOL commands, e.g. `color_h chain R`
cmd.extend("color_h", color_h)
cmd.extend("color_h2", color_h2)
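The color ramp gives a visual impression; to put a number on "slightly more hydrophobic", one common option is the GRAVY score (grand average of hydropathy) on the Kyte-Doolittle scale, where a positive value indicates an overall hydrophobic sequence. This is a sketch that works on sequence alone; it does not account for which residues are buried or exposed in the 3D structure. Paste in a chain from the FASTA above to compare chains.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic)
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def _values(sequence):
    # Unknown symbols (e.g. 'X' for unresolved residues) are skipped
    return [KYTE_DOOLITTLE[r] for r in sequence.upper() if r in KYTE_DOOLITTLE]

def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over residues."""
    values = _values(sequence)
    return sum(values) / len(values)

def hydrophobic_share(sequence):
    """Fraction of residues with a positive Kyte-Doolittle value."""
    values = _values(sequence)
    return sum(v > 0 for v in values) / len(values)

print(round(gravy("LLLLKK"), 2))      # 1.23
print(hydrophobic_share("LLLLKK"))
```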
  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Yes, it appears to have a hole in the middle.

Part C. Using ML-Based Protein Design Tools

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

  1. Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.
  2. Choose your favorite protein from the PDB.
  3. We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

Picture Source: Bordin, Nicola et al (2023). Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, Volume 48, Issue 4, 345 - 359


  1. Deep Mutational Scans

    1. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
    > Using ESM2, the mutational scan of 8U8F looks like this (image).
    2. Can you explain any particular pattern? (choose a residue and a mutation that stands out)
    > There appear to be vertical bands: positions where almost every substitution is predicted to score low, which might reflect residues that are highly conserved for functional or structural reasons. There is a yellow band at position 243. Lysine is a common amino acid here, but the lysine row shows lots of dark spots and low scores (many blue bands), possibly because a charged residue is a mismatch at hydrophobic positions. In contrast, the leucine row, an uncharged hydrophobic residue, mostly scores high.
    3. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
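As a rough guide to reading the heatmap: ESM2-based scans are commonly scored as a log-odds ratio between the mutant and wild-type residue at each position. The sketch below assumes that convention (the notebook may differ, for example masked versus unmasked marginals) and uses a toy probability table in place of real model output, so no ESM2 download is needed.

```python
import math

def mutation_scores(sequence, log_probs):
    """Log-odds score for every substitution at every position.

    log_probs[i] maps each amino acid to the model's log-probability at
    position i. The score is log p(mutant) - log p(wild type): negative values
    mean the model finds the substitution less likely than the original residue.
    """
    scores = {}
    for i, wt in enumerate(sequence):
        for aa, lp in log_probs[i].items():
            if aa != wt:
                scores[f"{wt}{i + 1}{aa}"] = lp - log_probs[i][wt]
    return scores

# Toy probabilities for a hypothetical 2-residue sequence (not real ESM2 output)
toy = [
    {"L": math.log(0.7), "K": math.log(0.1), "A": math.log(0.2)},
    {"L": math.log(0.3), "K": math.log(0.3), "A": math.log(0.4)},
]
for name, s in sorted(mutation_scores("LA", toy).items(), key=lambda kv: kv[1]):
    print(name, round(s, 2))
```

In this toy table, position 1 is strongly "conserved" (every substitution scores well below zero), which is what a dark vertical band in the heatmap looks like.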
  2. Latent Space Analysis

    1. Use the provided sequence dataset to embed proteins in reduced dimensionality.
    > Embedding map (image).
    2. Analyze the different formed neighborhoods: do they approximate similar proteins?
    > The neighborhoods are positioned far away from each other, and they contain very different proteins.
    3. Place your protein in the resulting map and explain its position and similarity to its neighbors.
    > The G-protein subunits (α, β, and γ) are much closer to each other on the map. Chain G (listed as chain C, the γ-2 subunit) is much shorter, only 58 amino acids, and is structurally very different from the other proteins: it is essentially just two small α-helices connected by a loop.
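Neighborhoods in such a map reflect similarity between embedding vectors. A minimal sketch of how one might check a protein's nearest neighbors directly in embedding space, before any 2-D projection (which can distort distances), uses cosine similarity. The vectors below are made-up placeholders, not ESM2 output.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_neighbors(name, embeddings, k=2):
    """Names of the k other entries most similar to `name`."""
    query = embeddings[name]
    others = [n for n in embeddings if n != name]
    others.sort(key=lambda n: cosine(query, embeddings[n]), reverse=True)
    return others[:k]

# Hypothetical 3-D "embeddings" (real ESM2 embeddings have hundreds of dimensions)
embeddings = {
    "seq_a": [0.9, 0.1, 0.0],
    "seq_b": [0.8, 0.3, 0.1],
    "seq_c": [0.1, 0.2, 0.9],
    "seq_d": [0.2, 0.9, 0.1],
}
print(nearest_neighbors("seq_a", embeddings))  # ['seq_b', 'seq_d']
```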

C2. Protein Folding

Picture Source: Lin et al (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model.


Folding a protein

  1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
  1. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

Changing small snippets of the sequence made little visible difference to the predicted structure, but adding longer runs of the same amino acid made twists much more visible, so the structure seems fairly resilient to small mutations.
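To make the mutation experiment more systematic and reproducible, the variants can be generated with a small helper before pasting them into ESMFold. This is a hypothetical sketch: the helper names and example edits are illustrative, with proline chosen because it tends to break α-helices.

```python
def apply_point_mutations(sequence, mutations):
    """Apply mutations written as e.g. 'L45A' (wild type, 1-based position, mutant)."""
    residues = list(sequence)
    for m in mutations:
        wt, pos, mt = m[0], int(m[1:-1]), m[-1]
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: position {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = mt
    return "".join(residues)

def replace_segment(sequence, start, end, new_segment):
    """Replace residues start..end (1-based, inclusive) with new_segment."""
    return sequence[:start - 1] + new_segment + sequence[end:]

# Chain C of 8U8F as a small test case; the edits are arbitrary examples
chain_c = "NTASIAQARKLVEQLKMEANIDRIKVSKAAADLMAYCEAHAKEDPLLTPVPASENPFR"
point_mutant = apply_point_mutations(chain_c, ["L11P", "L15P"])
segment_mutant = replace_segment(chain_c, 20, 29, "G" * 10)
print(point_mutant)
print(segment_mutant)
```

The wild-type check in `apply_point_mutations` catches off-by-one position errors, which are easy to make when editing long sequences by hand.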

C3. Protein Generation

Picture Source: 1. Post from Sergey Ovchinnikov 2. Roney, Ovchinnikov et al (2022). State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101


Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

Using fixed-backbone design, we kept the 3D backbone of 8U8F chain A and redesigned its sequence. ProteinMPNN ended up rewriting about 75% of the protein (sequence recovery of roughly 23 to 29% across the samples), with a high frequency of leucine and lysine. My results look like:

Model weights found in ProteinMPNN/vanilla_model_weights
Using device: cuda:0
Number of edges: 48
Training noise level: 0.2A
Model loaded
{'8u8f': (['A'], [])}
Length of chain A is 381
Generating sequences...
>8u8f, score=2.1622, fixed_chains=[], designed_chains=['A'], model_name=v_48_020
NEEKAQREANKKIEKQLQKDKQVYRATHRLLLLGAGESGKNTIVKQMRIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSGIFETKFQVDKVNFHMFDVGAQRDERRKWIQCFNDVTAIIFVVASSSYXXXXXXXXQTNRLQAALKLFDSIWNNKWLRDTSVILFLNKQDLLAEKVLAGKSKIEDYFPEFARYTTPEDATPEPGEDPRVTRAKYFIRDEFLRISTASGDGRHYCYPHFTCSVDTENIRRVFNDCRDIIQRMHLRQYELL
>T=0.1, sample=0, score=1.0949, seq_recovery=0.2511
ELLKLLEELLKKLAEKLKKEEEEEKKIKKILLLGSPSSGKTTLLKNIKKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXEPEEVVEFTIDGKKYKIYDLKNQPPDLREVLAKYKDAKVIIYVFPLGSFXXXXXXXXPEDLEKVALEELEWIWNHPDLKNVPILVIFNRPELLRERVLSGKNPIEERFPEYKGYELPKEVKPPEGVPEEWVKVLAFIIDKILKFANKNRGGIREVYPVISSPESKDIKQIIYDAIKKAEERKKLIAEGKL
>T=0.1, sample=0, score=1.1122, seq_recovery=0.2338
LLLLLLLLLLLLLLVLLLLKLLEESKIKKLLLLGSPSSGKTSLLENIEKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXEPERVLEFEIDGVKYRIIDLSNLPPDLSDVLSEYSDCEIIIYVFSTGSYXXXXXXXXPEDLESVDLERLKWIWNHPALKNTPILVIFNRPELLAKRVLSGEKPIEERFPEYKGYKLPENVKPPPGVPEETVKVLSFLIDKVLEFANQNRGGIREVYPVISSVKSKEIKEIIYEAVKKAEERKKLIAQGLL
>T=0.1, sample=0, score=1.0975, seq_recovery=0.2554
KEEEKKKELEEKLKKEEEKKKEEEEKVIKLLLLGLPNSGKTTILENIKKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXEPEEVIEFEIEGKKYRIVDLKNLPPDLSEILEKYSDCKILVYIFPTGSFXXXXXXXXPENLEKEALELLKRIWNHPSLKNVPLLVIFNRAEKLKEIVLSGEKPIEEYFPEYKGYKLPESAKPPPNTDPEVVKVLSFLIDKILEYANQNRGGIRKVFPVISSPESKDIREIIYKAVKEAEERKKLIALGLL
>T=0.1, sample=0, score=1.1196, seq_recovery=0.2857
AALAEELAKKKALAALKKKEEEEESKVKKLLLLGGPSSGKTTLLENISKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSSIRELEFEIDGVKYKILDLENRPEDLSEILSEFKDCEIIIYVFPLGSFXXXXXXXXPENLLKKALEEFERIWNHPDLKDVPILVLFNRPELLKEKVLSGKKPLEEIFPEYKGWELPEDAKPPPNTPLEWVKALYFLKEKVLEIANKNRGGRREVFPFIVSPKSKDIKEIIYNAVKEAEKRKALIAAGLL
>T=0.1, sample=0, score=1.1445, seq_recovery=0.2381
LLLLLLLALLLALAALLAALAEEEKKVRKLLLLGLPNSGKTTLLKNISKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXEPEEILKFEIDGVKYEIKDLKNRPPDLSDILKEYSDCDIIIYVFPSGLFXXXXXXXXPENLEEVALEQLKNLLNNPDLKNVPILVLFNRPELLKKIVESGKRPLEEIFPEYKGYELPESAVCPPNTPLEWCKAIYFLIDKILEFANQNRGGISEVYPHITSPDSKDIKQIIYDAVKKAEERKKLIAAGKL

New Sequence: DKKIKKDDKKIIKDIKIIDDDDDIIHIIHKKKFRNRRFISSKKIMHIIYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYDNDDTTDESHCFIIWIHWCKIMPNNCKQDTKIWICITHHWTENKFREYYYYYYYYNDCKDITKDDKDVHVMGNCKIMTNHKTHEMQNDKKQDQTKRFIMNHDDQENDWIFWDKNIDTINNDFTNDDVTITKEHHCIHKIEMIMQFFHQDTWNTHRRNDRICHIPHHWCHIIDDQIIKHDFIK

============================================================
Summary
============================================================
Sequence 1: score=1.0949, recovery=25.11%
Sequence 2: score=1.1122, recovery=23.38%
Sequence 3: score=1.0975, recovery=25.54%
Sequence 4: score=1.1196, recovery=28.57%
Sequence 5: score=1.1445, recovery=23.81%
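The `seq_recovery` values in the log are the fraction of designed positions that match the native residue, so about 25% recovery corresponds to roughly 75% of the sequence being rewritten. A minimal sketch of that calculation, assuming unresolved positions (`X`) are excluded; ProteinMPNN's own implementation may differ in details such as masking.

```python
def sequence_recovery(native, designed, unknown="X"):
    """Fraction of positions where the designed residue matches the native one.

    Positions that are unresolved ('X') in either sequence are ignored.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = [(n, d) for n, d in zip(native, designed) if unknown not in (n, d)]
    return sum(n == d for n, d in pairs) / len(pairs)

# Short toy example (not taken from the run above)
print(sequence_recovery("LLKXXE", "LAKXXD"))  # 0.5: 2 of 4 resolved positions match
```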

GPU acceleration did not work for me on Google Colab, so I cloned the repository and ran it locally.

  1. Input this sequence into ESMFold and compare the predicted structure to your original.

New sequence folded with ESMFold (images):

DKKIKKDDKKIIKDIKIIDDDDDIIHIIHKKKFRNRRFISSKKIMHIIYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYDNDDTTDESHCFIIWIHWCKIMPNNCKQDTKIWICITHHWTENKFREYYYYYYYYNDCKDITKDDKDVHVMGNCKIMTNHKTHEMQNDKKQDQTKRFIMNHDDQENDWIFWDKNIDTINNDFTNDDVTITKEHHCIHKIEMIMQFFHQDTWNTHRRNDRICHIPHHWCHIIDDQIIKHDFIK

The predicted structure retains the overall fold, but when the two are compared in PyMOL, the new structure (white) looks displaced.
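A displaced-looking overlay does not by itself mean the fold changed: ESMFold writes coordinates in its own reference frame, so two models usually need to be superimposed (for example with PyMOL's `align` command) before they can be compared. A frame-independent alternative, sketched below under the assumption that matching Cα coordinates have already been extracted from both models, is the distance RMSD, which compares internal distances instead of absolute positions.

```python
import math
from itertools import combinations

def distance_rmsd(coords_a, coords_b):
    """RMSD between the internal pairwise-distance matrices of two structures.

    Both lists hold matching atoms (e.g. Cα atoms) in the same order. Only
    intra-structure distances are compared, so the result does not change
    when one structure is translated or rotated as a rigid body.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    sq_diffs = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))

# Toy check: a rotated and shifted copy has a distance RMSD of ~0
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.5, 0.0), (4.0, 5.0, 3.0)]
angle = math.radians(30)
moved = [(x * math.cos(angle) - y * math.sin(angle) + 10.0,
          x * math.sin(angle) + y * math.cos(angle) - 4.0,
          z + 2.0) for x, y, z in original]
print(round(distance_rmsd(original, moved), 6))  # 0.0
```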

Part D. Group Brainstorm on Bacteriophage Engineering

Assignees for the following sections
MIT/Harvard students: Optional
Committed Listeners: Required
  1. Find a group of ~3–4 students

  2. Read through the Phage Reading material listed under “Reading & Resources” below.

  3. Review the Bacteriophage Final Project Goals for engineering the L Protein:

    • Increased stability (easiest)
    • Higher titers (medium)
    • Higher toxicity of lysis protein (hard)
  4. Brainstorm Session

    • Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).

    Our goals: (1) optimize the L protein’s binding affinity to its E. coli partner to trigger lysis faster, and (2) increase the stability of the L protein, ensuring it is folded and integrated into the membrane to perform its function.

    • Write a 1-page proposal (bullet points or short paragraphs) describing:

      • Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).

      We would like to use protein language models such as ESM2 (in the Colab notebook) to perform in silico mutagenesis: we will score single-point mutations across the L protein sequence and try to identify mutations that are more evolutionarily favorable. As in the assignment, I am interested in using ProteinMPNN to redesign and generate new sequences. Given the backbone structure of the L protein, this tool can propose alternative sequences that keep the same fold but may have higher thermal stability, supporting our goals. AlphaFold-Multimer is particularly interesting too, as it predicts 3D structures of protein complexes (co-folding multiple chains), which would let us explore how variants interact with E. coli proteins.
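The size of an in silico single-point scan is easy to estimate: a protein of length L has 19 × L single substitutions, which is what makes a language-model scan so much faster than testing each variant in the lab. A small, hypothetical sketch of the enumeration step (each variant would then be scored with ESM2); the peptide is a placeholder, not the L protein.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def single_mutants(sequence):
    """Yield (name, mutant_sequence) for every single-point substitution.

    Names follow the wild type / 1-based position / mutant convention, e.g. 'M1A'.
    """
    for i, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{i + 1}{aa}", sequence[:i] + aa + sequence[i + 1:]

# Hypothetical 5-residue peptide, standing in for the L protein sequence
variants = list(single_mutants("METRK"))
print(len(variants))  # 95 = 5 positions x 19 substitutions
print(variants[0])    # ('M1A', 'AETRK')
```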

      • Why do you think those tools might help solve your chosen sub-problem?

      ProteinMPNN was very robust at generating sequences that fit a specific shape, so it offers a promising (though not guaranteed) route to increasing protein stability. ESM2 lets us scan a huge number of mutations at once, so we can narrow down a direction far faster than we could in a wet-lab setting.

      • Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).

      The L protein is a membrane protein. Most standard protein models, including AlphaFold-Multimer, seem to be trained primarily on soluble proteins, so the specific lipid-protein interactions required for lysis may not be fully captured. This could lead to “stable” designs that fail to insert into the membrane. In my own assignment, I still don’t fully understand how the redesigned shape fits, since it seems displaced.

      • Include a schematic of your pipeline.
    • This resource may be useful: HTGAA Protein Engineering Tools

  5. Each individually put your plan on your HTGAA website

    • Include your group’s short plan for engineering a bacteriophage

1. Input an L protein sequence.
2. Use ESM2 to generate favorable mutations; the heatmap should show green-light versus no-go directions in the sequence.
3. Use ProteinMPNN on the backbone (skeleton) template to generate sequences for core stability.
4. Add complexity with AlphaFold-Multimer by predicting an interaction.
5. Use PyMOL to check shape and geometry.
6. Calculate a binding affinity score in Colab and select the best candidates!
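The pipeline above is essentially a funnel of filters. A hypothetical sketch of the final selection step, assuming each variant has already been given an ESM2 score, a structure-confidence score (pLDDT, as reported by ESMFold and AlphaFold), and some binding estimate; the thresholds and field names are placeholders for whatever the group settles on.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    esm_score: float      # language-model log-odds (higher = more plausible)
    plddt: float          # predicted structure confidence, 0-100
    binding_score: float  # hypothetical binding estimate (lower = tighter)

def select_candidates(candidates, min_esm=0.0, min_plddt=70.0, top_k=3):
    """Filter candidates stage by stage, then rank survivors by binding score."""
    passed = [c for c in candidates if c.esm_score >= min_esm]  # ESM2 "green light"
    passed = [c for c in passed if c.plddt >= min_plddt]         # folds confidently
    passed.sort(key=lambda c: c.binding_score)                   # best binders first
    return passed[:top_k]

# Made-up numbers purely to show the flow of the pipeline
pool = [
    Candidate("v1", 0.8, 85.0, -9.1),
    Candidate("v2", -1.2, 90.0, -11.0),  # rejected: unfavorable ESM2 score
    Candidate("v3", 0.3, 62.0, -10.5),   # rejected: low structure confidence
    Candidate("v4", 0.5, 78.0, -9.8),
]
print([c.name for c in select_candidates(pool)])  # ['v4', 'v1']
```

Running the cheap filters first means the expensive structure and binding steps only see variants that already passed the sequence-level check.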


Reading & Resources

Tools

Phage Reading

Week 11: Bioproduction & Cloud Labs

Cloud laboratories are making science accessible, affordable, and reproducible. Our aim this semester is to showcase how they can enable human creativity at scale, and how they provide a platform for collaboration and community.

How To Grow (Almost) Anything is about synthetic biology, bioengineering, robotics, automation, art, and AI. But it is also about friendship, shared purpose, and the freedom to build beyond what we know and to be inspired by what can be. To that end, the goal with this cloud lab unit and homework assignment is to inspire collaboration and creativity while designing a scientifically rigorous cell-free fluorescent protein optimization experiment together.

Lecture (Tues, Apr 14)

Bioproduction & Cloud Labs
(▶️Recording)
Reshma Shetty

Recitation (Wed, Apr 15)

Cloud laboratories
(▶️Recording | 💻Slides)
Ronan Donovan

Lab (Thurs-Fri, Apr 16 - 17)
Tip

As you plan for final projects, you may want to refer to the provided non-exhaustive list of common Nebula protocols and their parameters in the “Reading & Resources” section below.

Homework — DUE BY START OF APR 28 LECTURE

Info

Note that this homework is due a week later than it ordinarily would due to its release a week later than normal.

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required
  1. Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST.
    • A personalized URL was sent to the email address associated with your Discourse account, and you can discuss the artwork on the Discourse.
    • If you did not have a chance to contribute, it’s okay, just make sure you become a TA this fall! 😉
  2. Make a note on your HTGAA webpages including:
    • what you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”)
    • what you liked about the project, and
    • what about this collaborative art experiment could be made better for next year.

Here’s some evidence of what I contributed to the collective artwork: I chose specific pixels that were assigned different colors but sat adjacent to the core shapes and forms. The part I enjoyed most was during our recitation, when our classmate Constantin found a way to hack the system and repeatedly draw a LifeFabs representation onto the community art, so we had a leading presence!


Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required
  1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

    E. coli Lysate

    • BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)
    • Clarified E. coli lysate containing the transcription and translation machinery, such as ribosomes, tRNAs, aminoacyl-tRNA synthetases, and elongation factors, plus T7 RNA polymerase to transcribe the template.

    • BL21(DE3) is a derivative of the E. coli B strain that lacks the Lon protease and is also deficient in the outer membrane protease OmpT. Lacking these two key proteases reduces degradation of heterologous proteins expressed in the strain. The “Star” variant also carries the rne131 mutation, a truncated RNase E that reduces mRNA degradation.

    • It is likely optimized for use with low-copy-number, T7 promoter-based plasmids, with fast growth in minimal medium and the ability to reach high cell density.

    Salts/Buffer

    • Potassium Glutamate
    • A salt that creates a monovalent ionic environment so that ribosomes can maintain their structure and function. Glutamate is used instead of chloride because high concentrations of chloride ions inhibit translation.

    • HEPES-KOH pH 7.5
    • Keeps the reaction at a stable, enzyme-friendly pH during incubation, so that metabolism does not make the mix too acidic and denature proteins.

    • Magnesium Glutamate
    • Magnesium is essential for ribosome assembly, tRNA binding, and the enzymes involved in nucleotide metabolism; the right concentration ensures these components can bind and function.

    • Potassium phosphate monobasic
    • Potassium phosphate dibasic
    • Together, these two potassium phosphates supply phosphate that feeds back into nucleotide regeneration, helping sustain ATP and NTP synthesis over time. As a monobasic/dibasic pair, they also act as a pH buffer.

    Energy / Nucleotide System

    • Ribose
    • Central substrate of this NMP-Ribose energy system. Endogenous kinases phosphorylate ribose into ribose-5-phosphate, which feeds both nucleotide biosynthesis and the pentose phosphate pathway, providing a sustained source of NTPs to drive transcription and translation.

    • Glucose
    • Supplementary energy source that feeds glycolysis, generating ATP and NADH alongside the ribose pathway to extend the productive lifetime of the reaction.

    • AMP
    • CMP
    • GMP
    • UMP
    • Nucleoside monophosphates that serve as precursors to the activated NTP forms (ATP, CTP, UTP) required for mRNA synthesis and as the universal energy currency powering translation.

    • GMP, however, is set to zero here: the guanine salvage pathway produces enough GTP without supplementing GMP directly.

    • Guanine
    • Feeds the purine salvage pathway, where it is converted to GMP and then phosphorylated up to GTP. This is a more cost-effective way to maintain the GTP pool than building it from scratch via de novo synthesis.

    Translation Mix (Amino Acids)

    • 17 Amino Acid Mix
    • Substrates for the ribosome: building blocks that are loaded onto tRNAs and incorporated into the peptide chain during translation.

    • Tyrosine
    • Dissolved at pH 12 because tyrosine is poorly soluble at neutral pH.

    • Cysteine
    • Oxidises in mixed solution, forming disulfide bonds or reacting with metal ions, which can deplete the free cysteine available for the target protein.

    Additives

    • Nicotinamide
    • I learnt about this from skincare (haha): it is a precursor to NAD+, replenishing the NAD+ that drives glycolysis and other metabolic reactions.

    Backfill

    • Nuclease Free Water
    • Brings the reaction to the right working volume without adding contaminating RNases or DNases that would degrade the DNA template or mRNA.

  2. Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

The main difference between the two systems is how they supply energy and nucleotides. The 1-hour PEP-NTP system delivers ready-made NTPs and fast-burning energy donors directly, so transcription and translation kick off immediately, but the fuel runs out quickly as PEP is consumed and inhibitory phosphate builds up. The 20-hour NMP-Ribose-Glucose system instead supplies simpler precursors (NMPs, ribose, glucose) and lets the lysate’s own enzymes continuously regenerate NTPs from them, which is slower to get going but sustains productive protein synthesis much longer. The short-burst system also uses a richer cocktail of additives to squeeze maximum output from a brief reaction window, while the long-run system compensates with higher amino acid concentrations to keep the ribosomes fed over the extended incubation.

  1. Bonus question: How can transcription occur if GMP is not included but Guanine is?

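Guanine is converted to GMP via the purine salvage pathway (a phosphoribosyltransferase attaches it to PRPP, which can be made from the supplied ribose), and endogenous kinases then phosphorylate GMP to GDP and GTP. GTP is the actual substrate incorporated by T7 RNAP during transcription.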

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required
  1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)

    1. sfGFP
    2. mRFP1
    3. mKO2
    4. mTurquoise2
    5. mScarlet_I
    6. Electra2

    The amino acid sequences are shown in the HTGAA Cell-Free Benchling folder.

  • sfGFP: a superfolder GFP engineered to fold correctly even under stress conditions. It matures so quickly that it produces a reliable green signal faster than almost any other variant, making it a great baseline readout in cell-free reactions.
  • mRFP1: the first monomeric red fluorescent protein. It is notably slow to mature and not particularly bright, so in a cell-free context you often have to wait significantly longer for meaningful signal than with newer red variants.
  • mKO2: needs oxygen to complete its chromophore. Because cell-free reactions in small droplets or dense solutions can become oxygen-depleted, its orange signal may be weaker or delayed relative to what you’d see in a well-aerated system.
  • mTurquoise2: very efficient at converting absorbed light into emitted fluorescence (a high quantum yield), so even modest expression levels produce a strong, stable signal that holds up well over long imaging periods.
  • mScarlet_I: a newer red fluorescent protein (RFP) that matures much faster and produces a brighter signal sooner, making it well suited for watching protein production happen in real time. (This could be good for my experiment with transient transfections?)
  • Electra2: optimized specifically for speed; it folds and forms its chromophore almost immediately after translation, so it is particularly useful when you need to detect protein expression as early as possible in the reaction.

  1. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

To maximise mKO2 fluorescence over a 36-hour incubation, we would increase nicotinamide to 1.5x its standard concentration and slightly raise the ratio of dibasic potassium phosphate to keep the reaction pH stable for longer. Nicotinamide sustains the metabolic pathways that regenerate NAD⁺ and keep translation running, while the adjusted phosphate balance prevents the gradual acidification that would otherwise slow or stall the ribosomes. Together these changes give mKO2 the stable, long-running reaction environment it needs to complete its slow, oxygen-dependent chromophore maturation and accumulate detectable fluorescence over the full incubation window.

  1. The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here.

    Important

    In order to be eligible for this, make sure that your final project slide is in the “2026 Committed Listener ONE FINAL PROJECT IDEA” slide deck.

I started drafting some mixes. Apart from the nicotinamide mentioned above, adding a low concentration of GMP and slightly increasing cysteine will likely help mRFP1. Because mRFP1 is slow to mature and not particularly bright, the strategy is to compensate by producing more total protein: more mRNA transcribed means more chances for mature fluorescent protein to accumulate by the end of the incubation. The extra GMP, once phosphorylated to GTP by kinases in the lysate, tops up the GTP pool that RNA polymerase draws on during transcription, helping sustain mRNA production over the full reaction window. Cysteine is a structural amino acid that appears in mRFP1’s folding core, so keeping it in good supply should reduce the chance of misfolding.

  1. The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (date TBD!). The reaction composition for each well will be as follows:

    • 6 μL of Lysate
    • 10 μL of 2X Optimized Master Mix from above
    • 2 μL of assigned fluorescent protein DNA template
    • 2 μL of your custom reagent supplements

    Total: 20 μL reaction
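Because the custom supplements enter as a small spike into the 20 μL reaction, the stock concentration each one needs follows C_stock × V_added = C_final × V_total. The sketch below is a planning aid under that assumption only; the GMP and cysteine targets are hypothetical placeholders, not the lab's recipe, and `plan_supplements` is an illustrative helper name.

```python
REACTION_UL = 20.0   # total reaction volume in the recipe above
SUPPLEMENT_UL = 2.0  # volume reserved for custom reagent supplements


def stock_needed(final_mM: float, added_ul: float, total_ul: float = REACTION_UL) -> float:
    """Stock concentration (mM) such that added_ul of stock gives final_mM in total_ul
    (C_stock * V_added = C_final * V_total)."""
    if added_ul <= 0:
        raise ValueError("added volume must be positive")
    return final_mM * total_ul / added_ul


def plan_supplements(targets: dict) -> dict:
    """targets maps reagent name -> (extra final concentration in mM, volume in uL).
    Returns reagent name -> required stock concentration in mM."""
    used = sum(vol for _, vol in targets.values())
    if used > SUPPLEMENT_UL + 1e-9:
        raise ValueError(f"supplements use {used} uL but only {SUPPLEMENT_UL} uL is available")
    return {name: stock_needed(final, vol) for name, (final, vol) in targets.items()}


# Hypothetical targets, not the lab's recipe: +0.5 mM GMP and +1 mM cysteine, 1 uL each
plan = plan_supplements({"GMP": (0.5, 1.0), "cysteine": (1.0, 1.0)})
print(plan)
```

Any unused part of the 2 μL would be made up with water. Since the 10 μL of 2X master mix ends up at 1X in 20 μL, a 1.5x nicotinamide target means the supplement only has to supply the extra 0.5x.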

N/A: no new data was returned that week, but the existing data provided earlier can be used instead.

Part D: Build-A-Cloud-Lab | (optional) Bonus Assignment

Assignees for the following sections
MIT/Harvard students: Optional
Committed Listeners: Optional
Ginkgo Nebula Cloud Laboratory Rendering, 2025

  1. Use this simulation tool to create an interesting looking cloud lab out of the Ginkgo Reconfigurable Automation Carts. This is just a minimal implementation so far, but I would love to see some fun designs!
    Tip

    Note from Ronan: If you are interested in helping me build out future HTGAA cloud lab software, please fill out this form!

This was skipped!


Reading & Resources

Reading:

Common Nebula protocols & their parameters

Generic_atc_run_protocol
{
    "bs_shake": false,
    "storage_rac": "ambistore-1",
    "hig_pre_spin": false,
    "hig_post_spin": false,
    "storage_stacker": "10-position",
    "atc_sample_volume": 10,
    "bs_model": "3000",
    "bs_speed": 1500,
    "bs_duration": 30,
    "hig_pre_g_force": 1500,
    "hig_pre_spin_time": 0,
    "hig_post_g_force": 1500,
    "hig_post_spin_time": 0,
    "atc_block_format": 96
}
Generic_bravo_stamp
{
    "pl_dest_seal": false,
    "bs_dest_shake": false,
    "hig_dest_spin": false,
    "pl_source_seal": false,
    "bs_source_shake": false,
    "hig_source_spin": false,
    "xpeel_dest_peel": false,
    "bravo_asp_height": 2,
    "dest_storage_rac": "ambistore-1",
    "bravo_disp_height": 2,
    "bravo_head_format": 96,
    "xpeel_source_peel": false,
    "bravo_liquid_class": "Aqueous",
    "source_storage_rac": "ambistore-1",
    "dest_storage_stacker": "10-position",
    "bravo_dest_mix_cycles": 0,
    "bravo_dest_mix_volume": 0,
    "source_storage_stacker": "10-position",
    "bravo_source_mix_cycles": 0,
    "bravo_source_mix_volume": 0,
    "trash_submodule_type_name": "trash-1",
    "bravo_dest_mix_liquid_class": "Gentle",
    "bravo_source_mix_liquid_class": "Gentle",
    "hig_source_g_force": 1500,
    "hig_source_spin_time": 0,
    "pl_source_seal_temp": 166,
    "pl_source_seal_time": 2.5,
    "pl_source_seal_type": "alu-1",
    "hig_dest_g_force": 1500,
    "hig_dest_spin_time": 0,
    "pl_dest_seal_temp": 166,
    "pl_dest_seal_time": 2.5,
    "pl_dest_seal_type": "alu-1",
    "bs_dest_model": "3000",
    "bs_dest_speed": 200,
    "bs_dest_duration": 0,
    "bs_source_model": "3000",
    "bs_source_speed": 200,
    "bs_source_duration": 0
}
Generic_cytomat_incubate
{
    "cytomat_stacker": "7-position",
    "store_payload_after_incubation": false,
    "storage_rac": "ambistore-1",
    "storage_stacker": "10-position"
}
Generic_echo_hitpick
{
    "pl_dest_seal": false,
    "bs_dest_shake": false,
    "hig_dest_spin": false,
    "pl_source_seal": false,
    "source_centric": false,
    "bs_source_shake": false,
    "hig_source_spin": false,
    "xpeel_dest_peel": false,
    "dest_storage_rac": "ambistore-1",
    "xpeel_source_peel": false,
    "source_storage_rac": "ambistore-1",
    "dest_storage_stacker": "10-position",
    "source_storage_stacker": "10-position",
    "echo_submodule_type_name": "echo",
    "bs_source_model": "3000",
    "bs_source_speed": 1500,
    "bs_source_duration": 0,
    "hig_source_g_force": 1500,
    "hig_source_spin_time": 0,
    "bs_dest_model": "3000",
    "bs_dest_speed": 1500,
    "bs_dest_duration": 0,
    "hig_dest_g_force": 1500,
    "hig_dest_spin_time": 0,
    "pl_source_seal_type": "alu-1",
    "pl_dest_seal_type": "alu-1",
    "echo_source_liquid_type": "AQ_BP",
    "echo_transfer_information": [
	{
	    "plate_map": {},
	    "source_payload_id": "placeholder-plate-id",
	    "source_liquid_type": "placeholder-liquid-type",
	    "source_payload_type": "placeholder-plate-type",
	    "destination_payload_id": "placeholder-plate-id",
	    "destination_payload_type": "placeholder-plate-type"
	}
    ],
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5
}
Generic_hig_centrifuge
{
    "hig_g_force": 1500,
    "storage_rac": "ambistore-1",
    "hig_spin_time": 60,
    "storage_stacker": "10-position",
    "hig_spin_two_payloads": false
}
Generic_floi8_cherry_pick
{
    "pl_dest_seal": false,
    "bs_dest_shake": false,
    "hig_dest_spin": false,
    "pl_source_seal": false,
    "source_centric": true,
    "bs_source_shake": false,
    "hig_source_spin": false,
    "xpeel_dest_peel": false,
    "dest_storage_rac": "ambistore-1",
    "xpeel_source_peel": false,
    "floi8_request_tips": true,
    "source_storage_rac": "ambistore-1",
    "dest_storage_stacker": "10-position",
    "floi8_tip_preferences": [
	"f50"
    ],
    "source_storage_stacker": "10-position",
    "floi8_cherry_pick_plans": [
	{
	    "well_transfers": {},
	    "dest_payload_id": "placeholder-plate-id",
	    "dest_payload_type": "placeholder-plate-type",
	    "pipetting_profile": "placeholder-liquid-type",
	    "source_payload_id": "placeholder-plate-id",
	    "source_payload_type": "placeholder-plate-type"
	}
    ],
    "hig_source_g_force": 250,
    "hig_source_spin_time": 0,
    "bs_source_model": "3000",
    "bs_source_speed": 200,
    "bs_source_duration": 0,
    "floi8_source_delid_relid": false,
    "hig_dest_g_force": 250,
    "hig_dest_spin_time": 0,
    "bs_dest_model": "3000",
    "bs_dest_speed": 200,
    "bs_dest_duration": 0,
    "floi8_dest_delid_relid": false,
    "pl_source_seal_temp": 166,
    "pl_source_seal_time": 2.5,
    "pl_source_seal_type": "alu-1",
    "pl_dest_seal_temp": 166,
    "pl_dest_seal_time": 2.5,
    "pl_dest_seal_type": "alu-1"
}
generic_multiflo_dispense
{
    "bs_shake": false,
    "hig_spin": false,
    "mf_shake": false,
    "xpeel_peel": false,
    "storage_rac": "ambistore-1",
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5,
    "pl_seal_type": "alu-1",
    "storage_stacker": "10-position",
    "mf_dispense_type": "peripump",
    "mf_source_content": "reagent",
    "mf_dispense_volume": 25,
    "mf_cols_to_dispense": [],
    "mf_num_pre_dispenses": 2,
    "mf_pre_dispense_volume": 100,
    "mf_submodule_type_name": "multiflo-1",
    "mf_soak_duration": 0,
    "mf_shake_duration": 0,
    "mf_shake_intensity": "medium",
    "hig_g_force": 250,
    "hig_spin_time": 0,
    "bs_model": "3000",
    "bs_speed": 1500,
    "bs_duration": 0,
    "mf_flow_rate": "med",
    "mf_pump_or_syringe_name": "primary"
}
generic_spark_read
{
    "pl_seal": false,
    "bs_shake": false,
    "hig_spin": false,
    "xpeel_peel": false,
    "storage_rac": "ambistore-1",
    "storage_stacker": "10-position",
    "spark_delid_relid": false,
    "spark_protocol_duration": 60,
    "spark_run_custom_protocol": false,
    "bs_model": "3000",
    "bs_speed": 200,
    "bs_duration": 0,
    "hig_g_force": 250,
    "hig_spin_time": 0,
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5,
    "pl_seal_type": "alu-1",
    "spark_protocol_steps": []
}
generic_pherastar_read
{
    "pl_seal": true,
    "bs_shake": false,
    "hig_spin": false,
    "xpeel_peel": true,
    "storage_rac": "ambistore-1",
    "storage_stacker": "10-position",
    "ps_protocol_duration": 90,
    "bs_model": "3000",
    "bs_speed": 1500,
    "bs_duration": 0,
    "hig_g_force": 1500,
    "hig_spin_time": 0,
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5,
    "pl_seal_type": "alu-1"
}
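These templates read like parameter dictionaries full of defaults. One way to work with them, sketched below, is to copy a template and override only the keys you need, rejecting unknown keys so a typo cannot silently create a new parameter. How Nebula actually validates or submits protocols is not described here, so `customize` is a hypothetical helper, not part of any Ginkgo API, and the template is an excerpt of `generic_spark_read` above.

```python
import copy
import json

# Excerpt of the generic_spark_read template above
GENERIC_SPARK_READ = json.loads("""
{
    "pl_seal": false,
    "bs_shake": false,
    "spark_protocol_duration": 60,
    "spark_run_custom_protocol": false,
    "spark_protocol_steps": []
}
""")


def customize(template: dict, **overrides) -> dict:
    """Return a deep copy of template with overrides applied.
    Unknown keys raise KeyError; values of a different type raise TypeError."""
    unknown = set(overrides) - set(template)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = copy.deepcopy(template)  # never mutate the shared template
    for key, value in overrides.items():
        expected = type(template[key])
        if not isinstance(value, expected):
            raise TypeError(f"{key} expects {expected.__name__}, got {type(value).__name__}")
        params[key] = value
    return params


run = customize(GENERIC_SPARK_READ, pl_seal=True, spark_protocol_duration=120)
print(json.dumps(run, indent=4))
```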

Week 10: Imaging and Measurement



Week 10 (Apr 7): Advanced Imaging & Measurement Technology, with Evan Daugharthy (Waters Corp.)
Lab: Mass Spectrometry

This lecture presents a range of advanced technologies to do precision measurement of proteins at atomic scales, characterizing chemical composition, and detecting protein sequence and structure.

Lecture (Tues, Apr 7)

Advanced Imaging & Measurement Tech
(▶️Recording)
Evan Daugharthy, Lindsay Morrison.

Recitation (Wed, Apr 8)

Mass spectrometry
(▶️Recording | 💻Slides)
Waters Corp. Team

Lab (Thurs-Fri, Apr 9 - 10)

Homework — DUE BY START OF Apr 14 LECTURE

Homework is partly based on data that will be generated in the Waters Immerse Lab in Cambridge, MA. Students will characterize green fluorescent protein (eGFP, a recombinant protein standard) structure (primary, secondary/tertiary) in the lab using liquid chromatography and mass spectrometry, as well as Keyhole Limpet Hemocyanin (KLH) oligomeric states using charge detection mass spectrometry (CDMS). Data generated in the lab needed to do the homework is included both within this document and in the Appendix of the laboratory protocol.

Homework: Final Project

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

For your final project:

  • Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

Because I will be differentiating PC12 cells toward a dopaminergic phenotype using an mRNA approach, the first thing I will measure in my project is the quantity of dopamine released by the PC12 cells into the surrounding media. To measure dopamine concentration, I will use an ELISA (enzyme-linked immunosorbent assay), a plate-based quantitative immunoassay.

Thermo Fisher’s instant ELISA kits follow roughly 7 steps, running from rehydrating the standard and sample wells on the plate, through incubation, washing, adding TMB substrate and adding stop solution, to calculating the results.

Absorbance is read at 450 nm on the Spark plate reader, and dopamine concentration will be calculated from a standard curve built from known dopamine standards ranging from 0 to 50 nM.

The expected dopamine range is 3–50 nM, and if we follow Kim et al. (2017) closely, wells co-transfected with Nurr1-GFP and FoxA2-RFP should show 3–5x the dopamine of the negative control.
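The standard-curve and fold-change steps can be sketched as below. This uses piecewise-linear interpolation between standards, which is a simplification (kits often recommend a four-parameter logistic fit), and the A450 values are invented placeholders, not data from this project. Readings outside the standards return `None` instead of being extrapolated.

```python
from bisect import bisect_left


def interpolate_concentration(a450, standards):
    """Back-calculate concentration (nM) from an A450 reading by piecewise-linear
    interpolation between standards, given as (concentration_nM, A450) pairs.
    The curve must be monotonic; readings outside the standards return None."""
    pts = sorted(standards, key=lambda p: p[1])  # order points along the A450 axis
    a_values = [a for _, a in pts]
    if a450 < a_values[0] or a450 > a_values[-1]:
        return None  # outside the calibrated range: do not extrapolate
    i = bisect_left(a_values, a450)
    if a_values[i] == a450:
        return pts[i][0]
    (c0, a0), (c1, a1) = pts[i - 1], pts[i]
    return c0 + (a450 - a0) * (c1 - c0) / (a1 - a0)


def fold_change(sample_nM, control_nM):
    """Ratio of sample to negative-control dopamine concentration."""
    if control_nM <= 0:
        raise ValueError("control concentration must be positive")
    return sample_nM / control_nM


# Invented standard curve for illustration: (dopamine nM, A450)
STANDARDS = [(0, 0.05), (3, 0.20), (10, 0.55), (25, 1.20), (50, 2.10)]

control = interpolate_concentration(0.20, STANDARDS)
treated = interpolate_concentration(0.55, STANDARDS)
print(control, treated, round(fold_change(treated, control), 2))
```

If the kit's signal falls as dopamine rises (competitive format), the same function still works because it sorts points by absorbance; the fold change can then be compared with the 3–5x target.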

  • Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

After successful transfection of my fusion proteins Nurr1-GFP and FoxA2-RFP, I also plan to measure GFP and RFP fluorescence to validate expression and subcellular localization of these fusion proteins in PC12 cells. Their presence will help confirm that the mRNA constructs were taken up, translated into protein, and localized to the nucleus, which would indicate the transcription factors are in place to activate dopamine synthesis in the PC12 cells. This can be measured with live fluorescence microscopy and a quantitative plate reader. Fluorescence microscopy illuminates a sample with a specific wavelength of light, causing tagged structures (fluorophores) to emit lower-energy light that reveals specific cellular components. GFP is expected to be excited at ~488 nm and emit at ~520 nm, while RFP is excited at ~555 nm and emits at ~610 nm. If GFP and RFP are both present and successfully transfected, the green and red signals should overlap in the nucleus, appearing yellow in the merged image.
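A merged image that "looks yellow" can be backed with a number. One common choice is Pearson's correlation coefficient between green and red pixel intensities inside a nuclear region of interest: values near +1 mean the two channels rise and fall together. The sketch assumes you have already extracted matched pixel intensities for one ROI as plain lists (for example, exported from Fiji); the example intensities are made up.

```python
from math import sqrt


def pearson_colocalization(green, red):
    """Pearson's correlation between matched green and red pixel intensities
    (same ROI, same pixel order). +1 = perfectly co-varying, 0 = unrelated."""
    if len(green) != len(red) or len(green) < 2:
        raise ValueError("channels need the same number of pixels (at least 2)")
    n = len(green)
    mean_g = sum(green) / n
    mean_r = sum(red) / n
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(green, red))
    var_g = sum((g - mean_g) ** 2 for g in green)
    var_r = sum((r - mean_r) ** 2 for r in red)
    if var_g == 0 or var_r == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / sqrt(var_g * var_r)


# Made-up intensities for five pixels inside one nucleus
green_roi = [120, 340, 560, 410, 90]
red_roi = [100, 300, 520, 380, 110]
print(round(pearson_colocalization(green_roi, red_roi), 3))
```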

  • What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

I will use an ELISA kit, the Spark plate reader, and fluorescence microscopy, as described above.

Homework: Waters Part I — Molecular Weight

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

  1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

    eGFP Sequence:
    MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH
    Note: This contains a His-purification tag (HHHHHH) and a linker (the LE before it).

I used the ExPASy tool with the full eGFP sequence (including the LE linker and His-tag); the calculated theoretical average mass comes to about 28,006.60 Da, with a pI of 5.90. However, eGFP does not stay chemically identical to its sequence once it folds: three amino acids (Thr-Tyr-Gly) cyclise to form the fluorescent chromophore, and this reaction releases small molecules (water and hydrogen) that reduce the protein’s mass by roughly 20 Da. Accounting for this maturation step gives a theoretical mass of approximately 27,986.60 Da.
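The ExPASy number can be cross-checked by summing average residue masses over the sequence and adding one water for the free termini. The residue masses below are standard average values (worth checking against ExPASy's own table if you need more decimals), and the ~20.03 Da chromophore loss is an assumption that maturation removes exactly one H2O and one H2.

```python
# Average residue masses (Da), i.e. amino acid mass minus one water
AVG_RESIDUE = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVG = 18.01524  # one water for the free N- and C-termini
H2_AVG = 2.01588
CHROMOPHORE_LOSS = WATER_AVG + H2_AVG  # assumed: one dehydration + one oxidation

# eGFP sequence from above (LE linker + 6xHis), spaces removed
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE "
    "HHHHHH"
).replace(" ", "")


def average_mass(seq: str) -> float:
    """Average (isotope-weighted) mass of an unmodified linear peptide chain."""
    return sum(AVG_RESIDUE[aa] for aa in seq) + WATER_AVG


mw = average_mass(EGFP)
print(f"{len(EGFP)} residues: {mw:.2f} Da unmodified, {mw - CHROMOPHORE_LOSS:.2f} Da mature")
```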

  1. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:
    1. Determine $z$ for each adjacent pair of peaks $(n, n+1)$ using: $$ {\large z} = {\Large \frac{\frac{m}{z_{n+1}}}{\frac{m}{z_n} - \frac{m}{z_{n+1}}}} $$

I selected these two peaks: Peak 1 (n): m/z = 933.7349; Peak 2 (n+1): m/z = 903.7148.

Because mass is constant and m/z decreases as charge increases, the higher m/z peak (933.7349) carries the lower charge, and the formula returns the charge of that peak n:

$$ z_n = \frac{903.7148}{933.7349 - 903.7148} = \frac{903.7148}{30.0201} = 30.10 \approx 30 $$

So the 933.7349 peak is z = 30, and the adjacent 903.7148 peak is z = 31.

  1. Determine the MW of the protein using the relationship between $\frac{m}{z_n}$, $MW$, and $z$

Each charge state arises from protons (mass = 1.00728 Da) attaching to the protein, so:

$$ M = \left(\frac{m}{z}\right) \times z - (z \times 1.00728) $$

For the z = 30 peak (m/z = 933.7349):

$$ M = (933.7349 \times 30) - (30 \times 1.00728) = 28{,}012.047 - 30.218 = 27{,}981.83 \ \text{Da} $$

For the z = 31 peak (m/z = 903.7148):

$$ M = (903.7148 \times 31) - (31 \times 1.00728) = 28{,}015.159 - 31.226 = 27{,}983.93 \ \text{Da} $$

The two values agree to within about 2.1 Da (roughly 75 ppm).

  1. Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1 using: $$ \text{Accuracy} = \frac{|MW_{\text{experiment}} - MW_{\text{theory}}|}{MW_{\text{theory}}} $$
    Figure 1. Mass Spectrum of intact eGFP protein from the Waters Xevo G3 LC-MS (a mass spectrometer with 30,000 resolution) with individual charge state peaks labeled with $\frac{m}{z}$ values.

Taking the average of the two calculated masses as the experimental MW:

$$ MW_{\text{exp}} = \frac{27{,}981.83 + 27{,}983.93}{2} = 27{,}982.88 \ \text{Da} $$

Comparing to the theoretical MW of 28,006.60 Da from 2.1:

$$ \text{Accuracy} = \frac{|27{,}982.88 - 28{,}006.60|}{28{,}006.60} = \frac{23.72}{28{,}006.60} = 0.000847 = 847 \ \text{ppm} $$

Against the chromophore-corrected value of 27,986.60 Da, the error drops to:

$$ \text{Accuracy} = \frac{|27{,}982.88 - 27{,}986.60|}{27{,}986.60} = \frac{3.72}{27{,}986.60} = 0.000133 = 133 \ \text{ppm} $$
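The adjacent-charge-state arithmetic can be checked with a short script. It implements the question's formula (which, as written, returns the charge of the higher-m/z peak n), uses the 1.00728 Da proton mass from the calculation above, and compares the averaged mass with both theoretical values from 2.1. It is a sketch for checking hand calculations, not a substitute for the instrument's deconvolution software.

```python
PROTON = 1.00728  # Da, as used in the calculation above


def charge_of_peak_n(mz_n: float, mz_n1: float) -> int:
    """Charge of the higher-m/z peak n from z_n = (m/z_{n+1}) / (m/z_n - m/z_{n+1}),
    rounded to the nearest integer. Peak n+1 then carries z_n + 1."""
    if mz_n <= mz_n1:
        raise ValueError("peak n must have the higher m/z of the pair")
    return round(mz_n1 / (mz_n - mz_n1))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M = (m/z) * z - z * proton."""
    return mz * z - z * PROTON


def ppm_error(measured: float, theoretical: float) -> float:
    return abs(measured - theoretical) / theoretical * 1e6


mz_n, mz_n1 = 933.7349, 903.7148
z_n = charge_of_peak_n(mz_n, mz_n1)
masses = [neutral_mass(mz_n, z_n), neutral_mass(mz_n1, z_n + 1)]
mw_exp = sum(masses) / len(masses)
print(z_n, [round(m, 2) for m in masses], round(mw_exp, 2))
print(round(ppm_error(mw_exp, 28006.60)), round(ppm_error(mw_exp, 27986.60)))
```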

  1. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

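No, the charge state cannot be read from the zoomed-in peak alone. For an intact protein of about 28 kDa carrying around 30 charges, adjacent isotope peaks are only about 1/30 ≈ 0.03 m/z apart, which is comparable to the peak width at 30,000 resolution near m/z 900 (about 900/30,000 ≈ 0.03 m/z), so the isotopes are not resolved. Instead, the charge state is determined by comparing the spacing between adjacent charge-state peaks in the full spectrum.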

Homework: Waters Part II — Secondary/Tertiary structure

Assignees for the following sections
MIT/Harvard students: Optional but highly recommended
Committed Listeners: Optional but highly recommended

We will analyze eGFP in its native, folded state and compare it to its denatured, unfolded state on a quadrupole time-of-flight MS. We will be doing MS-only analysis (no liquid chromatography, also known as “direct infusion” experiments) on the Waters Xevo G3-QToF MS.

  1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?
    Figure 2.  Comparison of the mass spectra between denatured (top) and native (bottom) eGFP standard on the Waters Xevo G3 QTof MS.

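In their native state, proteins retain their three-dimensional architecture: the secondary structures (helices and sheets), the overall folded shape (tertiary structure), and any multi-subunit assemblies (quaternary structure) are all intact. When a protein denatures, those structures collapse and it reverts to a loose, unfolded chain, essentially just its primary sequence with no organised shape remaining. Unfolding breaks the non-covalent interactions that hold the fold together and exposes residues that were buried, including the hydrophobic core and basic sites that can be protonated. During electrospray ionization the unfolded chain therefore picks up more protons than the compact native protein. In the mass spectrum, this shows up as a shift to higher charge states (lower m/z) and a broader charge-state distribution for the denatured protein, while the native protein gives fewer, lower charge states clustered at higher m/z.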

  1. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 $\frac{m}{z}$? What is the charge state? How can you tell?
    Figure 3.  Native eGFP mass spectrum from the Waters Xevo G3 Q-Tof MS.  The inset is a zoomed-in view of the charge state at ~2800 $\frac{m}{z}$ on a mass spectrometer with 30,000 resolution.

Skipped…

Homework: Waters Part III — Peptide Mapping - primary structure

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide – generating a “peptide map”. This process is used to confirm the primary structure of the protein.

There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.

  1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

From Benchling:
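Amino Acid Frequencies

| Amino acid | Count | Frequency |
|---|---|---|
| Ala (A) | 8 | 3.2% |
| Arg (R) | 6 | 2.4% |
| Asn (N) | 13 | 5.3% |
| Asp (D) | 18 | 7.3% |
| Cys (C) | 2 | 0.8% |
| Gln (Q) | 8 | 3.2% |
| Glu (E) | 17 | 6.9% |
| Gly (G) | 22 | 8.9% |
| His (H) | 15 | 6.1% |
| Ile (I) | 12 | 4.9% |
| Leu (L) | 22 | 8.9% |
| Lys (K) | 20 | 8.1% |
| Met (M) | 6 | 2.4% |
| Phe (F) | 12 | 4.9% |
| Pro (P) | 10 | 4.0% |
| Ser (S) | 10 | 4.0% |
| Thr (T) | 16 | 6.5% |
| Trp (W) | 1 | 0.4% |
| Tyr (Y) | 11 | 4.5% |
| Val (V) | 18 | 7.3% |
| Pyl (O) | 0 | 0.0% |
| Sec (U) | 0 | 0.0% |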

There are 20 Lysines (K) and 6 Arginines (R) in eGFP.

  1. How many peptides will be generated from tryptic digestion of eGFP?
    1. Navigate to https://web.expasy.org/peptide_mass/

    2. Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.

    3. Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.

    4. Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.

      Figure 4.  Example conditions for predicting the number of tryptic peptides from the eGFP standard.  Please replicate all parameters shown above.

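19 peptides are over 500 Da, and together these displayed peptides cover 90.7% of the sequence. (The sequence pasted into PeptideMass ends in five His residues rather than six, so its C-terminal peptide is LEHHHHH and its average mass, 27,869.46 Da, is one His residue, about 137.14 Da, lighter than the ExPASy value in Part I; the count of peptides over 500 Da is unaffected.)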

Sequence as pasted into PeptideMass (246 residues):

1–60: MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT
61–120: LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL
121–180: VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA
181–240: DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYKL
241–246: EHHHHH

Selected enzyme: Trypsin. Maximum number of missed cleavages (MC): 0. Cysteine modifications: all cysteines in reduced form. Methionine modifications: methionines have not been oxidized. Mass of displayed peptides: over 500 Da. Mass calculation: monoisotopic masses of the occurring amino acid residues, with peptide masses given as [M+H]+.

Theoretical pI: 5.84 / Mw (average mass): 27869.46 / Mw (monoisotopic mass): 27851.90

| [M+H]+ (Da) | Position | #MC | Peptide sequence |
|---|---|---|---|
| 4472.1752 | 170-210 | 0 | HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK |
| 2566.2931 | 217-239 | 0 | DHMVLLEFVTAAGITLGMDELYK |
| 2437.2608 | 5-27 | 0 | GEELFTGVVPILVELDGDVNGHK |
| 2378.2577 | 54-74 | 0 | LPVPWPTLVTTLTYGVQCFSR |
| 1973.9062 | 142-157 | 0 | LEYNYNSHNVYIMADK |
| 1503.6597 | 28-42 | 0 | FSVSGEGEGDATYGK |
| 1266.5783 | 87-97 | 0 | SAMPEGYVQER |
| 1050.5214 | 115-123 | 0 | FEGDTLVNR |
| 982.4952 | 133-141 | 0 | EDGNILGHK |
| 946.4390 | 240-246 | 0 | LEHHHHH |
| 821.3940 | 81-86 | 0 | QHDFFK |
| 790.3552 | 75-80 | 0 | YPDHMK |
| 769.3913 | 47-53 | 0 | FICTTGK |
| 711.2944 | 103-108 | 0 | DDGNYK |
| 655.3813 | 98-102 | 0 | TIFFK |
| 602.2780 | 211-215 | 0 | DPNEK |
| 579.3137 | 128-132 | 0 | GIDFK |
| 507.2925 | 164-167 | 0 | VNFK |
| 502.3235 | 124-127 | 0 | IELK |

No modifications were applied to any peptide. 90.7% of the sequence is covered (the input parameters can be changed to also display peptides below 500 Da).
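The digest settings above (trypsin, 0 missed cleavages, reduced cysteines, no oxidation, monoisotopic [M+H]+) can be reproduced in a few lines, which also confirms the K/R count. The sketch assumes the usual trypsin rule (cleave after K or R unless the next residue is P) and standard monoisotopic residue masses; it uses the full six-His sequence from Waters Part I, so its C-terminal peptide is LEHHHHHH rather than the LEHHHHH shown in the table.

```python
# Monoisotopic residue masses (Da)
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

# eGFP sequence from Waters Part I (LE linker + 6xHis), spaces removed
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE "
    "HHHHHH"
).replace(" ", "")


def tryptic_digest(seq: str) -> list:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ with cysteines reduced and no modifications."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER + PROTON


peptides = tryptic_digest(EGFP)
over_500 = [p for p in peptides if mh_plus(p) >= 500]
print(EGFP.count("K"), EGFP.count("R"), len(peptides), len(over_500))
```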

  1. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.
    Figure 5a. Total ion chromatogram (TIC) of the eGFP peptide map. The peak at 2.78 minutes is circled, and its MS data is shown in the mass spectrum in Figure 5b, below.

Counting across the 0.5–6 minute window, there are roughly 20 peaks visible above the 10% relative abundance threshold.

  1. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

Pretty much: 20 observed peaks versus 19 predicted peptides is close, with one more peak than predicted. A slight excess is normal in LC-MS experiments, where background noise, trace impurities, or an occasional missed cleavage by trypsin can produce an extra signal or two.

  1. Identify the mass-to-charge ($\frac{m}{z}$) of the peptide shown in Figure 5b. What is the charge ($z$) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ($\small{[M\!\!+\!\!H]^+}$) based on its $\frac{m}{z}$ and $z$.
    Figure 5b. Mass spectrum figure to show $\frac{m}{z}$ for the chromatographic peak at 2.78 min from Figure 5a above. The inset is a zoom-in of the peak at $\frac{m}{z}$ 525.76, to discern the isotope peaks.

    Figure 5c. Fragmentation spectrum of the peptide eluting at retention time 2.78 minutes in Figure 5a (above).

The most abundant peak in Figure 5b sits at m/z = 525.767. In the zoomed inset, the isotope peaks are separated by about 0.5 m/z units; since adjacent isotopes differ by about 1 Da, a spacing of 1/z = 0.5 means the charge state is z = 2. Converting to the singly charged form gives [M+H]+ = (525.767 × 2) − 1.00728 ≈ 1050.527 Da.

  1. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm. (Recall that $ \text{Accuracy} = \frac{|MW_{\text{experiment}} - MW_{\text{theory}}|}{MW_{\text{theory}}} $ )

Matching that mass against the PeptideMass output points to the peptide FEGDTLVNR (residues 115–123), which has a theoretical [M+H]+ of 1050.5214 Da. The error is |1050.527 − 1050.5214| / 1050.5214 ≈ 5 ppm, which is within the expected accuracy range.
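The three steps above (charge from isotope spacing, conversion to [M+H]+, ppm error) fit in a short sketch. It assumes the isotope spacing is 1/z (treating adjacent isotopes as 1 Da apart) and reuses the 1.00728 Da proton mass; the inputs are the values read from Figure 5b and the PeptideMass table.

```python
PROTON = 1.00728  # Da


def charge_from_isotope_spacing(spacing_mz: float) -> int:
    """Adjacent isotopes differ by ~1 Da, so their m/z spacing is ~1/z."""
    return round(1.0 / spacing_mz)


def mh_plus_from_mz(mz: float, z: int) -> float:
    """Convert an ion observed at m/z with charge z into its singly protonated mass [M+H]+."""
    return mz * z - (z - 1) * PROTON


def ppm_error(measured: float, theoretical: float) -> float:
    return abs(measured - theoretical) / theoretical * 1e6


z = charge_from_isotope_spacing(0.5)
mh = mh_plus_from_mz(525.767, z)
print(z, round(mh, 4), round(ppm_error(mh, 1050.5214), 1))
```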

  1. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)
    Figure 6.  Amino Acid Coverage Map of eGFP based on BioAccord LC-MS peptide identification data.

The coverage map from the BioAccord run shows that 88% of the eGFP sequence was confirmed by peptide mapping.

Bonus Peptide Map Questions

  1. Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this tool online to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html.) What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?
  2. Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.

Homework: Waters Part IV — Oligomers

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):

  • 7FU Decamer
  • 8FU Didecamer
  • 8FU 3-Decamer
  • 8FU 4-Decamer

KLH is made of two types of subunit: a smaller 7FU subunit (340 kDa) and a larger 8FU subunit (400 kDa). They naturally cluster into rings of 10 (decamers) and larger stacked assemblies.

| Assignment | Calculation | Expected Mass | CDMS Peak Observed | Match |
|---|---|---|---|---|
| 7FU Decamer | 10 × 340 kDa | 3,400 kDa (3.40 MDa) | ~3.40 MDa | Exact |
| 8FU Decamer | 10 × 400 kDa | 4,000 kDa (4.00 MDa) | ~4.01 MDa | 0.3% off |
| 8FU Didecamer | 20 × 400 kDa | 8,000 kDa (8.00 MDa) | ~8.33 MDa | 4.1% off |
| 8FU 3-Decamer | 30 × 400 kDa | 12,000 kDa (12.00 MDa) | ~12.67 MDa | 5.6% off |
| 8FU 4-Decamer | 40 × 400 kDa | 16,000 kDa (16.00 MDa) | Not visible | |

The 7FU decamer matches its theoretical mass well, and the 8FU decamer is within 0.3%. The larger assemblies (didecamer and 3-decamer) show slightly higher masses than predicted, which is expected because proteins tend to carry along associated water, salt ions, or lipid molecules that add a small amount of extra mass on top of the protein itself. The 4-decamer doesn’t show up clearly in the spectrum at all, which suggests it is either not forming in this sample, present in such small amounts that it falls below the detection threshold, or too heavy for the instrument to resolve clearly at the high-mass end. The two smaller peaks visible around 0.79 MDa and 1.52 MDa in Figure 7 are likely sub-decameric fragments: smaller, incomplete assemblies such as dimers or tetramers of individual subunits that haven’t fully assembled into the complete decamer ring.
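The expected masses and percentage deviations in the table can be regenerated from the two subunit masses. The sketch uses only the 340 kDa and 400 kDa subunit values and the observed peak positions quoted in the table; unrounded deviations differ slightly from the table's rounded figures.

```python
SUBUNIT_KDA = {"7FU": 340.0, "8FU": 400.0}  # subunit masses quoted above


def oligomer_mass_mda(subunit: str, n_subunits: int) -> float:
    """Expected mass (MDa) of an assembly of n identical subunits."""
    return SUBUNIT_KDA[subunit] * n_subunits / 1000.0


def percent_off(observed: float, expected: float) -> float:
    return abs(observed - expected) / expected * 100.0


# (label, subunit type, subunit count, observed CDMS peak in MDa; None = not visible)
ASSIGNMENTS = [
    ("7FU Decamer", "7FU", 10, 3.40),
    ("8FU Decamer", "8FU", 10, 4.01),
    ("8FU Didecamer", "8FU", 20, 8.33),
    ("8FU 3-Decamer", "8FU", 30, 12.67),
    ("8FU 4-Decamer", "8FU", 40, None),
]

for label, fu, n, observed in ASSIGNMENTS:
    expected = oligomer_mass_mda(fu, n)
    note = "not visible" if observed is None else f"{percent_off(observed, expected):.2f}% off"
    print(f"{label}: expected {expected:.2f} MDa, {note}")
```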

Figure 7.  Mass spectrum of Keyhole Limpet Hemocyanin (KLH) acquired on the CDMS.

Homework: Waters Part V — Did I make GFP?

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.

| Property | Theoretical | Calculation | Observed (Intact LC-MS) | PPM Error |
|---|---|---|---|---|
| Molecular weight (kDa) | 28.007 kDa | From ExPASy sequence input | ~27.984 kDa | ~820 ppm |
| Chromophore correction | −20 Da | 28,007 − 20 = 27,987 Da | | |
| Peptide mapping coverage | 100% | 19 predicted peptides, 88% identified | 88% | |
| Peptide FEGDTLVNR mass (Da) | 1050.5214 Da | From PeptideMass tool | 1050.5270 Da | ~5 ppm |

Yes, it worked. The observed intact mass is within ~820 ppm of the theoretical value, and most of that difference is explained by chromophore maturation: when eGFP folds and forms its fluorescent chromophore, it loses roughly 20 Da. After that correction the expected mass is 27,987 Da, only about 3 Da (roughly 110 ppm) away from the observed ~27,984 Da. The peptide map independently confirms 88% of the amino acid sequence with ppm-level mass accuracy (about 5 ppm for FEGDTLVNR), meaning the instrument directly detected peptides covering most of the protein. What was expressed and purified is very likely eGFP.
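The ppm errors above follow from one formula, (observed − theoretical) / theoretical × 10⁶. A minimal Python sketch using the values from the table; the ~20 Da chromophore correction is commonly attributed to the loss of water during cyclization plus two hydrogens during oxidation, and the intact observed mass is treated as exact here even though the table gives it as approximate.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million (negative = lighter than expected)."""
    return (observed - theoretical) / theoretical * 1e6


THEORETICAL_DA = 28_007.0   # from the ExPASy sequence input
CHROMOPHORE_DA = 20.0       # approximate mass lost on chromophore maturation
OBSERVED_DA = 27_984.0      # intact LC-MS (approximate)

print(f"uncorrected: {ppm_error(OBSERVED_DA, THEORETICAL_DA):.0f} ppm")
print(f"corrected:   {ppm_error(OBSERVED_DA, THEORETICAL_DA - CHROMOPHORE_DA):.0f} ppm")
print(f"FEGDTLVNR:   {ppm_error(1050.5270, 1050.5214):.1f} ppm")
```

The uncorrected error is about −821 ppm; after the chromophore correction it drops to about −107 ppm.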

Reading & Resources

Week 12: Bioproduction


Homework — DUE BY START OF APR 28 LECTURE

(TBD)

Week 13: Bio Design Living Materials


Homework: Work on your Final Project

Present it May 12 (MIT/Harvard) or May 13 (Committed Listeners)

Week 14: Biofabrication


Homework: Finish your Final Project

Present it May 12 (MIT/Harvard) or May 13 (Committed Listeners)

Week 2: DNA Read, Write, and Edit


Part 1: Benchling & In-silico Gel Art

1.1 Import Lambda DNA

Lambda DNA Import

Simulate Restriction Enzyme Digestion

Restriction Digest

Virtual Gel

Virtual Gel

Part 2: Gel Art

I chose to create gel art of a person doing a jumping jack, using a randomization method.

Gel Art

Part 3: DNA Sequence Design

3.1 Protein Selection

I chose IL23 because I am interested in autoimmune diseases such as psoriasis. The IL-23 pathway is involved in inflammation, and I am curious to learn more about biologics in general. The sequence used below is that of the IL-23 receptor (IL23R, UniProt Q5VWK5).


3.2 Reverse Translation

Reverse translation of sp|Q5VWK5|IL23R_HUMAN Interleukin-23 receptor OS=Homo sapiens OX=9606 GN=IL23R PE=1 SV=3 to a 1887 base sequence of most likely codons.

atgaaccaggtgaccattcagtgggatgcggtgattgcgctgtatattctgtttagctgg
tgccatggcggcattaccaacattaactgcagcggccatatttgggtggaaccggcgacc
atttttaaaatgggcatgaacattagcatttattgccaggcggcgattaaaaactgccag
ccgcgcaaactgcatttttataaaaacggcattaaagaacgctttcagattacccgcatt
aacaaaaccaccgcgcgcctgtggtataaaaactttctggaaccgcatgcgagcatgtat
tgcaccgcggaatgcccgaaacattttcaggaaaccctgatttgcggcaaagatattagc
agcggctatccgccggatattccggatgaagtgacctgcgtgatttatgaatatagcggc
aacatgacctgcacctggaacgcgggcaaactgacctatattgataccaaatatgtggtg
catgtgaaaagcctggaaaccgaagaagaacagcagtatctgaccagcagctatattaac
attagcaccgatagcctgcagggcggcaaaaaatatctggtgtgggtgcaggcggcgaac
gcgctgggcatggaagaaagcaaacagctgcagattcatctggatgatattgtgattccg
agcgcggcggtgattagccgcgcggaaaccattaacgcgaccgtgccgaaaaccattatt
tattgggatagccagaccaccattgaaaaagtgagctgcgaaatgcgctataaagcgacc
accaaccagacctggaacgtgaaagaatttgataccaactttacctatgtgcagcagagc
gaattttatctggaaccgaacattaaatatgtgtttcaggtgcgctgccaggaaaccggc
aaacgctattggcagccgtggagcagcctgttttttcataaaaccccggaaaccgtgccg
caggtgaccagcaaagcgtttcagcatgatacctggaacagcggcctgaccgtggcgagc
attagcaccggccatctgaccagcgataaccgcggcgatattggcctgctgctgggcatg
attgtgtttgcggtgatgctgagcattctgagcctgattggcatttttaaccgcagcttt
cgcaccggcattaaacgccgcattctgctgctgattccgaaatggctgtatgaagatatt
ccgaacatgaaaaacagcaacgtggtgaaaatgctgcaggaaaacagcgaactgatgaac
aacaacagcagcgaacaggtgctgtatgtggatccgatgattaccgaaattaaagaaatt
tttattccggaacataaaccgaccgattataaaaaagaaaacaccggcccgctggaaacc
cgcgattatccgcagaacagcctgtttgataacaccaccgtggtgtatattccggatctg
aacaccggctataaaccgcagattagcaactttctgccggaaggcagccatctgagcaac
aacaacgaaattaccagcctgaccctgaaaccgccggtggatagcctggatagcggcaac
aacccgcgcctgcagaaacatccgaactttgcgtttagcgtgagcagcgtgaacagcctg
agcaacaccatttttctgggcgaactgagcctgattctgaaccagggcgaatgcagcagc
ccggatattcagaacagcgtggaagaagaaaccaccatgctgctggaaaacgatagcccg
agcgaaaccattccggaacagaccctgctgccggatgaatttgtgagctgcctgggcatt
gtgaacgaagaactgccgagcattaacacctattttccgcagaacattctggaaagccat
tttaaccgcattagcctgctggaaaaa

Reverse translation of sp|Q5VWK5|IL23R_HUMAN Interleukin-23 receptor OS=Homo sapiens OX=9606 GN=IL23R PE=1 SV=3 to a 1887 base sequence of consensus codons.

atgaaycargtnacnathcartgggaygcngtnathgcnytntayathytnttywsntgg
tgycayggnggnathacnaayathaaytgywsnggncayathtgggtngarccngcnacn
athttyaaratgggnatgaayathwsnathtaytgycargcngcnathaaraaytgycar
...

3.3 Codon Optimization

Original Sequence

  • GC Content: 49.34%
  • CAI: 0.83
ATGAACCAGGTGACCATTCAGTGGGATGCGGTGATTGCGCTGTATATTCTGTTTAGCTGGTGCCATGGCGGCATTACCAACATTAACTGCAGCGGCCATATTTGGGTGGAACCGGCGACCATTTTTAAAATGGGCATGAACATTAGCATTTATTGCCAGGCGGCGATTAAAAACTGCCAGCCGCGCAAACTGCATTTTTATAAAAACGGCATTAAAGAACGCTTTCAGATTACCCGCATTAACAAAACCACCGCGCGCCTGTGGTATAAAAACTTTCTGGAACCGCATGCGAGCATGTATTGCACCGCGGAATGCCCGAAACATTTTCAGGAAACCCTGATTTGCGGCAAAGATATTAGCAGCGGCTATCCGCCGGATATTCCGGATGAAGTGACCTGCGTGATTTATGAATATAGCGGCAACATGACCTGCACCTGGAACGCGGGCAAACTGACCTATATTGATACCAAATATGTGGTGCATGTGAAAAGCCTGGAAACCGAAGAAGAACAGCAGTATCTGACCAGCAGCTATATTAACATTAGCACCGATAGCCTGCAGGGCGGCAAAAAATATCTGGTGTGGGTGCAGGCGGCGAACGCGCTGGGCATGGAAGAAAGCAAACAGCTGCAGATTCATCTGGATGATATTGTGATTCCGAGCGCGGCGGTGATTAGCCGCGCGGAAACCATTAACGCGACCGTGCCGAAAACCATTATTTATTGGGATAGCCAGACCACCATTGAAAAAGTGAGCTGCGAAATGCGCTATAAAGCGACCACCAACCAGACCTGGAACGTGAAAGAATTTGATACCAACTTTACCTATGTGCAGCAGAGCGAATTTTATCTGGAACCGAACATTAAATATGTGTTTCAGGTGCGCTGCCAGGAAACCGGCAAACGCTATTGGCAGCCGTGGAGCAGCCTGTTTTTTCATAAAACCCCGGAAACCGTGCCGCAGGTGACCAGCAAAGCGTTTCAGCATGATACCTGGAACAGCGGCCTGACCGTGGCGAGCATTAGCACCGGCCATCTGACCAGCGATAACCGCGGCGATATTGGCCTGCTGCTGGGCATGATTGTGTTTGCGGTGATGCTGAGCATTCTGAGCCTGATTGGCATTTTTAACCGCAGCTTTCGCACCGGCATTAAACGCCGCATTCTGCTGCTGATTCCGAAATGGCTGTATGAAGATATTCCGAACATGAAAAACAGCAACGTGGTGAAAATGCTGCAGGAAAACAGCGAACTGATGAACAACAACAGCAGCGAACAGGTGCTGTATGTGGATCCGATGATTACCGAAATTAAAGAAATTTTTATTCCGGAACATAAACCGACCGATTATAAAAAAGAAAACACCGGCCCGCTGGAAACCCGCGATTATCCGCAGAACAGCCTGTTTGATAACACCACCGTGGTGTATATTCCGGATCTGAACACCGGCTATAAACCGCAGATTAGCAACTTTCTGCCGGAAGGCAGCCATCTGAGCAACAACAACGAAATTACCAGCCTGACCCTGAAACCGCCGGTGGATAGCCTGGATAGCGGCAACAACCCGCGCCTGCAGAAACATCCGAACTTTGCGTTTAGCGTGAGCAGCGTGAACAGCCTGAGCAACACCATTTTTCTGGGCGAACTGAGCCTGATTCTGAACCAGGGCGAATGCAGCAGCCCGGATATTCAGAACAGCGTGGAAGAAGAAACCACCATGCTGCTGGAAAACGATAGCCCGAGCGAAACCATTCCGGAACAGACCCTGCTGCCGGATGAATTTGTGAGCTGCCTGGGCATTGTGAACGAAGAACTGCCGAGCATTAACACCTATTTTCCGCAGAACATTCTGGAAAGCCATTTTAACCGCATTAGCCTGCTGGAAAAA

Improved DNA Sequence

  • GC Content: 51.56%
  • CAI: 0.91
ATGAACCAGGTGACTATCCAGTGGGACGCCGTTATCGCACTGTATATCCTGTTCAGCTGGTGCCACGGGGGCATTACCAACATAAACTGTAGCGGGCACATCTGGGTGGAACCTGCGACCATCTTCAAGATGGGCATGAATATCTCTATCTACTGTCAGGCCGCCATTAAGAACTGCCAGCCCAGGAAGCTGCATTTCTATAAGAATGGGATCAAGGAAAGGTTCCAGATCACCCGGATCAATAAGACCACAGCCCGCCTGTGGTACAAGAATTTTCTCGAGCCTCATGCCTCTATGTACTGTACAGCAGAGTGTCCTAAGCACTTCCAGGAGACTCTGATCTGCGGCAAAGATATTAGCTCCGGGTACCCCCCCGACATCCCCGACGAAGTGACCTGCGTGATCTATGAGTACTCCGGGAATATGACCTGCACCTGGAATGCCGGCAAGCTGACTTACATTGATACAAAGTACGTGGTGCATGTGAAGAGTCTGGAAACTGAGGAGGAACAGCAGTACCTGACAAGCTCCTATATCAATATTTCTACCGACTCTCTGCAGGGCGGCAAGAAGTACCTGGTGTGGGTGCAGGCCGCCAACGCTCTGGGCATGGAAGAGTCTAAGCAGCTGCAGATTCACCTAGATGATATTGTGATCCCATCCGCCGCCGTGATCAGCCGTGCAGAGACAATCAACGCCACCGTGCCTAAAACCATCATCTACTGGGACTCCCAAACCACCATTGAAAAGGTGAGTTGCGAAATGAGGTATAAGGCCACCACCAATCAGACCTGGAACGTGAAGGAATTCGACACAAACTTTACATATGTGCAGCAGAGCGAGTTTTATCTGGAGCCTAATATCAAGTACGTGTTCCAGGTCAGGTGTCAGGAGACAGGGAAGCGCTACTGGCAGCCCTGGAGTTCCCTGTTCTTTCACAAAACCCCAGAAACCGTGCCTCAGGTGACCTCCAAGGCCTTTCAGCATGACACCTGGAATTCCGGCCTGACTGTGGCCTCAATCTCAACTGGACATCTGACCAGCGATAATAGAGGAGACATAGGCCTGCTGCTGGGCATGATCGTGTTCGCAGTGATGCTGAGCATCCTGTCCCTGATCGGGATCTTCAATAGGTCTTTCCGCACCGGCATCAAGAGGAGGATCCTGCTGCTGATCCCCAAGTGGCTGTATGAGGATATCCCCAACATGAAGAACTCAAATGTGGTGAAGATGCTGCAGGAGAATTCCGAACTGATGAACAACAACAGCTCTGAGCAGGTGCTGTATGTGGACCCCATGATTACCGAGATCAAGGAAATCTTCATACCTGAGCACAAGCCCACAGACTACAAAAAAGAGAACACCGGACCACTGGAGACAAGGGATTATCCACAGAATAGCCTTTTCGATAATACAACCGTGGTGTACATCCCCGATCTGAACACCGGCTACAAACCCCAGATCTCTAACTTCCTGCCTGAGGGCTCCCACCTGTCCAATAACAACGAGATCACCAGCCTGACCCTGAAGCCCCCAGTGGACTCCCTGGACTCCGGCAATAATCCCAGACTGCAAAAACACCCTAACTTCGCGTTTTCCGTGTCAAGCGTGAATTCCCTGAGCAACACCATTTTCCTGGGCGAGCTGTCACTGATCCTGAACCAGGGCGAGTGCTCAAGCCCAGACATCCAGAACTCTGTCGAGGAGGAGACTACGATGCTGCTGGAGAATGATAGTCCCTCCGAAACAATCCCAGAGCAGACCCTGCTGCCTGATGAGTTTGTCAGCTGCCTGGGCATCGTGAACGAGGAGCTGCCCTCCATAAATACCTATTTCCCCCAGAATATCCTGGAATCCCACTTCAACAGAATTAGCCTGCTGGAGAAG

Avoid Cleavage Sites

  • BbsI
  • BsaI
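Both enzymes are Type IIS enzymes, presumably avoided because they are used for Golden Gate-style cloning, where an internal site would cut the insert. A minimal Python sketch of scanning a sequence for their recognition sites (BsaI: GGTCTC, BbsI: GAAGAC); because these sites are not palindromic, the reverse complement must also be searched. The test string is hypothetical.

```python
# Recognition sequences (top strand, 5'->3') for the enzymes listed above.
SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return 0-based start positions of each site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        for motif in (site, revcomp(site)):
            start = seq.find(motif)
            while start != -1:
                positions.append(start)
                start = seq.find(motif, start + 1)
        hits[name] = sorted(positions)
    return hits


print(find_sites("ttGGTCTCaaGTCTTCaa"))  # {'BsaI': [2], 'BbsI': [10]}
```

An empty list for both enzymes means the optimized sequence is free of internal sites.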

Why Codon Optimization is Important

Codon optimization matters because of codon usage bias: different organisms, such as humans and E. coli, prefer different synonymous codons for the same amino acid. Expressing a human gene like IL23 in E. coli can be difficult because some codons common in human genes are rare in E. coli. If the bacterium has low levels of the corresponding tRNAs, translation slows or stalls at those codons, which lowers protein yield.

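Here, optimization raised the GC content from 49.34% to 51.56%, which may improve mRNA stability, and raised the codon adaptation index (CAI) from 0.83 to 0.91, meaning the codon choices now more closely match those the host uses in its highly expressed genes.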
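Both metrics are simple to define. A minimal Python sketch: GC content is the percentage of G and C bases, and CAI is the geometric mean of each codon's relative adaptiveness w (its usage divided by that of the most-used synonymous codon in highly expressed host genes). The weight table below is hypothetical and covers leucine only; the values reported above came from Benchling and a real host codon table.

```python
import math


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def cai(seq: str, weights: dict) -> float:
    """Geometric mean of relative adaptiveness w (0 < w <= 1) over codons.

    Codons absent from `weights` are skipped (real CAI tables leave out
    ATG, TGG and stop codons, which have no synonymous alternatives).
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights for leucine codons only (illustration, not real data).
LEU_WEIGHTS = {"CTG": 1.0, "CTC": 0.2, "CTT": 0.2, "TTA": 0.25, "TTG": 0.25, "CTA": 0.07}

print(round(gc_content("ATGCTGCTGTTA"), 2))
print(round(cai("CTGCTGTTA", LEU_WEIGHTS), 3))
```

Because CAI is a geometric mean, a single very rare codon pulls the score down more than an arithmetic mean would.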


3.4 Protein Expression

Now we will use this optimized DNA sequence to produce the IL23 protein. First, we clone the codon-optimized sequence into an expression vector and transform the plasmid into E. coli cells, typically by heat shock. Protein expression is then switched on, usually by adding an inducer (for example, IPTG for lac- or T7-based vectors). The cell's RNA polymerase reads the DNA and makes an mRNA copy, and ribosomes then translate the mRNA into protein, with tRNAs delivering the amino acid that matches each codon.
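Translation reads the mRNA three bases at a time, and codon optimization should only swap synonymous codons, so both sequences must encode the same protein. A minimal Python sketch that translates the first 60 bases of the original and optimized sequences above using the standard genetic code (written on the DNA coding strand, so T stands in for U).

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of a coding-strand DNA sequence."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# First 60 bases of the original and optimized IL23R sequences above.
original = "ATGAACCAGGTGACCATTCAGTGGGATGCGGTGATTGCGCTGTATATTCTGTTTAGCTGG"
optimized = "ATGAACCAGGTGACTATCCAGTGGGACGCCGTTATCGCACTGTATATCCTGTTCAGCTGG"

print(translate(original))                          # MNQVTIQWDAVIALYILFSW
print(translate(optimized) == translate(original))  # True
```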

Finally, the protein is purified by chromatography, which separates it from everything else in the cell. Because the vector provides a His-tag, nickel affinity chromatography is a common choice.


Part 4: IL23 Sequence Analysis

Summary

| Property | Value |
|---|---|
| Gene | IL23 |
| Benchling Link | https://benchling.com/s/seq-009SW3mnB5zCD8Vhh1Tp?m=slm-3ISQ8GXHvPtygDx4UjjQ |
| Start Codon (ATG) | Positions 1–3 |
| Coding Sequence | Positions 1 through the end |
| Stop Codon | Missing; needs to be added |
| Promoter, RBS, His-tag, Terminator | All missing; provided by the vector |

Download IL23 Plasmid Map (PDF)

Part 5: DNA Read/Write/Edit

5.1 DNA Read (i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would like to sequence genes that could support brain-on-a-chip research. Human DNA is interesting, but I am more curious about biocompatible materials or bio-glues that could help interface living neuronal tissue with hardware such as microelectrode arrays. Such genes usually come from microbial or environmental DNA, where we can look for sequences that could be engineered into biocompatible hydrogels.

DNA-based digital data storage technology. Source: Archives in DNA: Workshop Exploring Implications of an Emerging Bio-Digital Technology through Design Fiction - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/DNA-based-digital-data-storage-technology_fig1_353128454 [accessed 11 Feb 2025]

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

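I would use Sanger sequencing. It uses ddNTPs to terminate the growing chains, producing fragments that end at every position of the sequence, and it is well suited to accurately reading a single gene or plasmid insert of up to roughly 1,000 bases.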

Also answer the following questions:

Is your method first-, second- or third-generation or other? How so?

Sanger sequencing is a first-generation method. First-generation sequencing uses chain termination: a polymerase copies the DNA, and fluorescently labelled ddNTPs stop extension at random positions, producing fragments of every length that are then separated by electrophoresis. Second-generation (next-generation) sequencing reads many short fragments simultaneously. Third-generation sequencing reads long single molecules, for example by pulling single strands through a nanopore in a membrane and reading the resulting changes in ionic current.

What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

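The input is purified DNA. We would prepare it by growing the bacteria that carry the plasmid and extracting the DNA by plasmid purification, then design a sequencing primer in Benchling. The sequencing reaction is a chain-termination PCR that mixes the template DNA with a DNA polymerase, a primer that binds next to the target region, normal dNTPs, and fluorescently labelled ddNTPs that terminate the chains.

Capillary electrophoresis then separates the terminated fragments by size: the negatively charged DNA backbone pulls the fragments toward the positive electrode, and shorter fragments move faster. A laser reads the fluorescent dye at the end of each fragment as it passes the detector, so the order of colours gives the base sequence (base calling). The output is a chromatogram (trace) of fluorescence peaks together with the called sequence.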
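The chain-termination logic can be sketched in a few lines. A minimal, idealized Python simulation, assuming the template is written 3'->5' so the new strand is its base-by-base complement, that termination happens at every position in some molecule, and that sorting by length stands in for electrophoresis; real base calling works on noisy fluorescence traces.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template_3to5: str) -> list:
    """Return (length, dye-labelled last base) for every terminated fragment.

    The new strand is synthesized 5'->3' as the complement of the template;
    a ddNTP can stop extension at any position, so every length occurs.
    """
    new_strand = "".join(COMPLEMENT[b] for b in template_3to5.upper())
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def base_call(fragments: list) -> str:
    """Sort fragments by length (shortest migrates fastest) and read the dyes."""
    return "".join(base for _, base in sorted(fragments))


frags = sanger_fragments("TACGGATC")
random.shuffle(frags)     # fragments leave the reaction in no particular order
print(base_call(frags))   # ATGCCTAG
```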

5.2 DNA Write (i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

Although it is unrelated to my final project, I have always been fascinated by biologics such as adalimumab, a therapeutic monoclonal antibody produced by living cells that carry recombinant DNA encoding it. For the final project, I would probably synthesize something that makes biological tissue adhere better to microelectrodes, to improve electrical communication between them. I am also interested in bioprinting microfluidics.

See some famous examples of DNA design

DNA origami by Paul W. K. Rothemund, California Institute of Technology, 2004. 100 nanometers in diameter.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use Benchling, a platform for importing and editing DNA and protein sequences, performing in silico restriction digests, and designing gel layouts. In silico, we would cut with restriction enzymes, copy fragments by PCR, and perform DNA cloning to design the construct before it is synthesized.

Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

5.3 DNA Edit (i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

If working with neural tissue, I would like to edit neuroplasticity-related genes to study how plasticity can be modified or reinforced. The goal would be to make the neurons more responsive to electrical and chemical stimulation, which could make reinforcement learning experiments easier.

Colossal Biosciences Inc., a biotechnology company using genetic engineering to de-extinct various historic animals such as the woolly mammoth, dodo, and dire wolf.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps? What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? What are the limitations of your editing methods (if any) in terms of efficiency or precision?

I would first perform PCR and a restriction digest, and then assemble the fragments, for example to convert a GFP construct into an RFP construct.

Week 5: Protein Design Part II


Homework — DUE BY START OF MAR 10 LECTURE

Part A: SOD1 Binder Peptide Design

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge:

  1. Design short peptides that bind mutant SOD1.
  2. Then decide which ones are worth advancing toward therapy.

You will use three models developed in our lab:

  • PepMLM: target sequence-conditioned peptide generation via masked language modeling
  • PeptiVerse: therapeutic property prediction
  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

  1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Modified A4V:

>sp|P00441|SODC_HUMAN_A4V
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
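The A4V substitution can also be applied programmatically, which makes one detail explicit: SOD1 mutation names use the historical numbering that omits the initiator methionine, so "residue 4" is position 5 of the UniProt sequence (the second A in MATKAV). A minimal Python sketch; the helper `apply_mutation` and its `offset` parameter are hypothetical names for illustration.

```python
# Wild-type human SOD1 (UniProt P00441), as retrieved above.
WT = ("MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
      "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
      "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ")


def apply_mutation(seq: str, mutation: str, offset: int = 1) -> str:
    """Apply a point mutation written like 'A4V'.

    `offset` converts the mutation's residue number to a 1-based position in
    `seq`; offset=1 accounts for SOD1 numbering that skips the initiator Met.
    Raises ValueError if the wild-type residue does not match.
    """
    wt_aa, pos, new_aa = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos + offset - 1  # 0-based index into seq
    if seq[idx] != wt_aa:
        raise ValueError(f"expected {wt_aa} at {pos}, found {seq[idx]}")
    return seq[:idx] + new_aa + seq[idx + 1:]


MUT = apply_mutation(WT, "A4V")
print(MUT[:10])  # MATKVVCVLK
```

The wild-type check catches numbering mistakes: with `offset=0` the function finds K instead of A and raises an error.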
  1. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
  2. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

| Peptide | Pseudo-perplexity |
|---|---|
| WRSGATVARHAX | 6.030266 |
| WRYGAAAVELKE | 11.785982 |
| WHSGVVGLARGX | 6.638643 |
| WSYPWVALELGK | 16.418794 |

  1. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

Pseudo Perplexity for binder ‘FLYRWLPSRRGG’ with protein sequence: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ is: 20.63523127283615

  1. Record the perplexity scores that indicate PepMLM’s confidence in the binders.

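Lower pseudo-perplexity indicates higher model confidence, so PepMLM's most confident binder is WRSGATVARHAX (6.03). All four generated peptides score lower (more confident) than the known binder FLYRWLPSRRGG (20.64).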
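Pseudo-perplexity is usually defined as the exponential of the negative mean log-probability the model assigns to each true residue when that position is masked; PepMLM's exact implementation may differ in details. A minimal Python sketch of that definition, plus a ranking of the scores reported above.

```python
import math


def pseudo_perplexity(log_probs: list) -> float:
    """exp(-mean log p), where each p is the model's probability for the
    true residue when that position is masked."""
    return math.exp(-sum(log_probs) / len(log_probs))


# Scores reported above (lower = more confident).
scores = {
    "WRSGATVARHAX": 6.030266,
    "WRYGAAAVELKE": 11.785982,
    "WHSGVVGLARGX": 6.638643,
    "WSYPWVALELGK": 16.418794,
    "FLYRWLPSRRGG": 20.63523127283615,  # known binder
}
for peptide in sorted(scores, key=scores.get):
    print(f"{peptide}\t{scores[peptide]:.2f}")
```

A model that assigns probability 0.5 to every masked residue has a pseudo-perplexity of exactly 2.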

Part 2: Evaluate Binders with AlphaFold3

  1. Navigate to the AlphaFold Server: alphafoldserver.com

  2. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

  3. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

| Peptide | ipTM | pTM |
|---|---|---|
| WRYGAAAVELKE | 0.28 | 0.78 |
| WSYPWVALELGK | 0.67 | 0.88 |
| WRSGATVARHAX | 0.42 | 0.86 |
| WHSGVVGLARGX | 0.32 | 0.85 |

In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

One PepMLM-generated peptide, WSYPWVALELGK (ipTM = 0.67), clearly outperforms the other three (ipTM 0.28 to 0.42). No ipTM was recorded for the known binder FLYRWLPSRRGG, so this is not a direct comparison; if it scores in the 0.50–0.60 range often seen in similar AlphaFold benchmarks, WSYPWVALELGK would exceed it. While WRYGAAAVELKE (ipTM = 0.28) failed to find a stable "home" on the SOD1 surface, the high-scoring candidate suggests that PepMLM can identify a sequence with a plausible binding pose on the protein. ipTM values between 0.6 and 0.8 are generally treated as a grey zone, so this is a moderate-confidence prediction rather than proof of binding. Still, it suggests that the language model can generate de novo sequences that are structurally compatible with the mutated A4V surface.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes:
    • Predicted binding affinity
    • Solubility
    • Hemolysis probability
    • Net charge (pH 7)
    • Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

WSYPWVALELGK,💧 Solubility,Soluble,1.000,Probability
WSYPWVALELGK,🩸 Hemolysis,Non-hemolytic,0.064,Probability
WSYPWVALELGK,🔗 Binding Affinity,Weak binding,6.048,pKd/pKi
WSYPWVALELGK,📏 Length,,12,aa
WSYPWVALELGK,⚖️ Molecular Weight,,1448.7,Da
WSYPWVALELGK,⚡ Net Charge (pH 7),,-0.24,
WSYPWVALELGK,🎯 Isoelectric Point,,6.00,pH
WSYPWVALELGK,💦 Hydrophobicity (GRAVY),,0.02,GRAVY
WHSGVVGLARGX,💧 Solubility,Soluble,1.000,Probability
WHSGVVGLARGX,🩸 Hemolysis,Non-hemolytic,0.039,Probability
WHSGVVGLARGX,🔗 Binding Affinity,Weak binding,5.754,pKd/pKi
WHSGVVGLARGX,📏 Length,,12,aa
WHSGVVGLARGX,⚖️ Molecular Weight,,1120.5,Da
WHSGVVGLARGX,⚡ Net Charge (pH 7),,0.85,
WHSGVVGLARGX,🎯 Isoelectric Point,,9.76,pH
WHSGVVGLARGX,💦 Hydrophobicity (GRAVY),,0.28,GRAVY
WRYGAAAVELKE,💧 Solubility,Soluble,1.000,Probability
WRYGAAAVELKE,🩸 Hemolysis,Non-hemolytic,0.049,Probability
WRYGAAAVELKE,🔗 Binding Affinity,Weak binding,6.266,pKd/pKi
WRYGAAAVELKE,📏 Length,,12,aa
WRYGAAAVELKE,⚖️ Molecular Weight,,1392.6,Da
WRYGAAAVELKE,⚡ Net Charge (pH 7),,-0.23,
WRYGAAAVELKE,🎯 Isoelectric Point,,6.28,pH
WRYGAAAVELKE,💦 Hydrophobicity (GRAVY),,-0.38,GRAVY
WRSGATVARHAX,💧 Solubility,Soluble,1.000,Probability
WRSGATVARHAX,🩸 Hemolysis,Non-hemolytic,0.013,Probability
WRSGATVARHAX,🔗 Binding Affinity,Weak binding,5.451,pKd/pKi
WRSGATVARHAX,📏 Length,,12,aa
WRSGATVARHAX,⚖️ Molecular Weight,,1193.5,Da
WRSGATVARHAX,⚡ Net Charge (pH 7),,1.85,
WRSGATVARHAX,🎯 Isoelectric Point,,12.00,pH
WRSGATVARHAX,💦 Hydrophobicity (GRAVY),,-0.45,GRAVY

I would advance WRYGAAAVELKE. It has the highest predicted binding affinity of the four ($pK_d \approx 6.27$, i.e. $K_d \approx 0.5\ \mu M$), which is a reasonable starting point for a de novo peptide, even though PeptiVerse still labels it as weak binding.

Its negative GRAVY score (-0.38) shows that it is hydrophilic (only WRSGATVARHAX, at -0.45, is more so), which is consistent with its predicted solubility and low hemolysis probability (0.049).

Structurally, its ipTM was the lowest of the four (0.28), so higher ipTM did not translate into stronger predicted affinity here: WSYPWVALELGK had the highest ipTM (0.67) but a lower predicted $pK_d$ (6.05). I still prefer WRYGAAAVELKE because its chemical makeup makes it a safer scaffold than a peptide that might bind tightly but aggregate in the blood.
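Whether higher ipTM goes with stronger predicted affinity can be checked with a rank correlation. A minimal Python sketch of Spearman's rho (the no-ties formula), taking ipTM from the AlphaFold3 results and pKd from the PeptiVerse output above, and assuming the 0.32 recorded for WHSGVVGLARGX is its ipTM. With only four peptides the result is indicative at best.

```python
IPTM = {"WRYGAAAVELKE": 0.28, "WSYPWVALELGK": 0.67,
        "WRSGATVARHAX": 0.42, "WHSGVVGLARGX": 0.32}
PKD = {"WSYPWVALELGK": 6.048, "WHSGVVGLARGX": 5.754,
       "WRYGAAAVELKE": 6.266, "WRSGATVARHAX": 5.451}


def ranks(values: list) -> list:
    """1-based ranks (smallest value = 1); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x: list, y: list) -> float:
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


peptides = sorted(IPTM)
rho = spearman([IPTM[p] for p in peptides], [PKD[p] for p in peptides])
print(f"Spearman rho (ipTM vs pKd): {rho:.2f}")
```

A negative rho here agrees with the observation that ipTM and predicted affinity do not line up for these peptides.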

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

  1. Open the moPPIt Colab linked from the HuggingFace moPPIt model card
  2. Make a copy and switch to a GPU runtime.
  3. In the notebook:
    • Paste your A4V mutant SOD1 sequence.
    • Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
    • Set peptide length to 12 amino acids.
    • Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
  4. After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

I ran the notebook locally because of GPU allocation limits. Results:

  • Peptide 1: EPTEEEQRTCGT (affinity score 9.17, solubility score 0.70)
  • Peptide 2: YYLRRCGYYQRV (affinity score 8.33, solubility score 0.79)

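moPPIt optimizes directly for binding, so it tends to generate sequences with higher predicted affinity scores than the PepMLM peptides scored in PeptiVerse (moPPIt's affinity score and PeptiVerse's pKd may not be on the same scale, so this comparison is approximate). Peptide 1, EPTEEEQRTCGT, has the higher affinity score of the two and may be physically complementary to the chosen A4V binding residues.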

Part B: BRD4 Drug Discovery Platform Tutorial (Optional)

Part C: Final Project: L-Protein Mutants

High level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein of an MS2 phage. This mechanism is key to understanding how phages could potentially help address antibiotic resistance.

This homework requires computation that might take you a while to run, so please get started early.



Reading & Resources

Tools

Week 6: Genetic Circuits Part I


Homework — DUE BY START OF MAR 17 LECTURE

Assignment: DNA Assembly

Answer these questions about the protocol in this week’s lab:

  1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion High-Fidelity PCR Master Mix provides a chemical environment designed to duplicate DNA as accurately as possible. At its core is Phusion DNA Polymerase, an enzyme created by fusing a Pyrococcus-like proofreading polymerase to a double-stranded DNA-binding domain. The proofreading (3'→5' exonuclease) activity removes misincorporated bases, giving a significantly lower error rate, while the DNA-binding domain keeps the enzyme attached to the template much longer than standard enzymes, giving high processivity. To support this activity, the mix contains a balanced concentration of dNTPs, which serve as the raw material for the new DNA strands, and a specialized reaction buffer. This buffer includes magnesium chloride, a vital cofactor that helps the polymerase coordinate with the phosphate groups of the incoming nucleotides, as well as stabilizers that help the enzyme withstand the repeated heating of thermal cycling.

  1. What are some factors that determine primer annealing temperature during PCR?

Determining the correct annealing temperature for a PCR reaction requires balancing several molecular factors. The primary driver is the melting temperature (Tm) of the primers, which is largely dictated by their length and their GC content. Because guanine-cytosine pairs are held together by three hydrogen bonds compared to the two bonds in adenine-thymine pairs, primers with more G and C bases require more energy (and thus a higher temperature) to separate and re-anneal. Furthermore, salts and ions in the master mix, such as potassium and magnesium, shield the negative charges on the DNA backbone, stabilizing the primer-template duplex and raising the melting temperature, and therefore the usable annealing temperature. If the temperature is set too low, the primers may bind non-specifically to the wrong parts of the template, while setting it too high may prevent the primers from binding at all, resulting in no amplification.
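Length and GC content enter Tm through simple rules of thumb. A minimal Python sketch of two textbook approximations: the Wallace rule, Tm = 2(A+T) + 4(G+C), for short primers, and the basic formula Tm = 64.9 + 41(nGC − 16.4)/N for longer ones. Neither models salt or primer concentration; polymerase suppliers provide nearest-neighbor calculators whose values differ from these, and annealing is then typically set relative to that Tm. The example primer is simply the first 24 bases of the optimized IL23R sequence above.

```python
def tm_wallace(primer: str) -> float:
    """Wallace rule in deg C; only a rough guide for short primers (~14 nt or less)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def tm_basic(primer: str) -> float:
    """Basic GC-content formula in deg C for longer primers."""
    p = primer.upper()
    n_gc = p.count("G") + p.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(p)


primer = "ATGAACCAGGTGACTATCCAGTGG"  # first 24 nt of the optimized IL23R sequence
print(tm_wallace(primer), round(tm_basic(primer), 1))
```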

  1. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR: PCR is a synthetic process that builds billions of new copies of a specific DNA segment using thermal cycling and a polymerase enzyme, making it the preferred choice when starting with a tiny amount of DNA or when you need to add custom sequences, like Gibson tails, to the ends of a fragment. Restriction enzyme digests: a restriction digest is an analytical or preparatory process that uses restriction enzymes to cut an existing, purified piece of DNA at specific recognition sequences. While PCR is ideal for generating large quantities of modified DNA, restriction digests are often preferred for simpler tasks like subcloning between classic plasmids or performing a diagnostic check to see whether a plasmid contains the correct insert.
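The outcome of a digest is a set of fragment lengths, which is what appears as bands on a gel such as the Benchling virtual gel. A minimal Python sketch for a linear molecule and a palindromic recognition site, using EcoRI (G^AATTC, cutting after the first base) as the example; the test sequences are hypothetical, and circular plasmids would need the first and last pieces joined.

```python
def digest_linear(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths from cutting linear DNA at every occurrence of a
    palindromic site; `cut_offset` is the top-strand cut position within
    the site (EcoRI, G^AATTC, has cut_offset=1)."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


print(digest_linear("AAGAATTCAA", "GAATTC", 1))      # [3, 7]
print(digest_linear("GAATTCAAGAATTC", "GAATTC", 1))  # [1, 8, 5]
```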

  1. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Careful attention must be paid to the design of the fragment ends. Unlike traditional cloning, Gibson Assembly relies on an exonuclease enzyme that chews back the ends of DNA to reveal single-stranded overlaps. Therefore, you must ensure that each PCR-generated fragment has a “tail” that is identical to the end of the adjacent fragment, typically between 20 and 40 base pairs in length. It is also critical to treat your PCR products with an enzyme like DpnI to destroy any original template DNA and to purify the final product through a column or a gel. This prevents leftover primers or incorrect DNA templates from interfering with the assembly process, ensuring that only the intended overlapping fragments are available for the final reaction.
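The overlap design described above can be expressed as primer construction: each primer carries a 5' tail copied from the neighbouring fragment plus a 3' region that anneals to its own template. A minimal Python sketch under simplifying assumptions (top-strand sequences, fixed 20-nt overlap and annealing lengths, no Tm balancing); the function names are hypothetical, and DpnI treatment and purification happen at the bench, not in code.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gibson_forward_primer(upstream: str, insert: str,
                          overlap: int = 20, anneal: int = 20) -> str:
    """5' tail = last `overlap` bases of the upstream fragment;
    3' part anneals to the first `anneal` bases of the insert."""
    return (upstream[-overlap:] + insert[:anneal]).upper()


def gibson_reverse_primer(insert: str, downstream: str,
                          overlap: int = 20, anneal: int = 20) -> str:
    """Reverse complement of (last `anneal` insert bases + first `overlap`
    downstream bases), so the tail overlaps the downstream fragment."""
    return revcomp(insert[-anneal:] + downstream[:overlap])


print(gibson_forward_primer("AAAACCCC", "GGGGTTTT", overlap=4, anneal=4))  # CCCCGGGG
print(gibson_reverse_primer("GGGGTTTT", "ACGTACGT", overlap=4, anneal=4))  # ACGTAAAA
```

With the defaults each primer is 40 nt, giving a 20 bp overlap at the low end of the 20 to 40 bp range mentioned above.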

  1. How does the plasmid DNA enter the E. coli cells during transformation?

Transformation is the process by which a circular plasmid is introduced into a bacterial cell, and it relies on making the E. coli “competent” to receive foreign DNA. In a typical chemical transformation, the cells are treated with a calcium chloride solution, whose calcium ions help neutralize the repulsion between the negatively charged DNA and the negatively charged cell membrane. A sudden heat shock at 42°C is thought to create a thermal imbalance across the membrane that momentarily opens small pores or “adhesion zones” in the lipid bilayer. This allows the plasmid to enter the cytoplasm, where the cell can then begin to express the genes carried on the plasmid, such as antibiotic resistance.

  1. Describe another assembly method in detail (such as Golden Gate Assembly):

    • Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

    Golden Gate Assembly is a method that uses Type IIS restriction enzymes, such as BsaI, to assemble multiple parts simultaneously. These enzymes are unusual because they recognize a DNA sequence but cut several bases away from that site, allowing the creation of custom four-base overhangs. Because the recognition site is removed during the cutting process, the final assembled product no longer contains the enzyme's “handle” and cannot be cut again. This creates a “one-pot” reaction where digestion and ligation happen in a continuous cycle, driving the reaction toward the fully assembled, stable circular plasmid. The method is highly modular and is frequently used in synthetic biology toolkits to mix and match different promoters, genes, and terminators with high efficiency.

    • Model this assembly method with Benchling or Asimov Kernel!
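The order of a Golden Gate assembly is fixed entirely by which four-base overhangs match. A minimal Python sketch of that logic, assuming each part is described only by its left and right overhangs after digestion and ignoring ligation fidelity and overhang cross-talk; the overhang sequences and part names are hypothetical.

```python
def golden_gate_order(parts: dict, start: str, end: str) -> list:
    """Order parts by matching 4-nt overhangs.

    `parts` maps a part name to (left_overhang, right_overhang). Assembly
    starts at the vector's `start` overhang and must finish at its `end`
    overhang, using every part exactly once.
    """
    by_left = {}
    for name, (left, right) in parts.items():
        if left in by_left:
            raise ValueError(f"overhang {left} is not unique")
        by_left[left] = (name, right)

    order, current = [], start
    while current != end:
        if current not in by_left or len(order) == len(parts):
            raise ValueError(f"no part continues from overhang {current}")
        name, current = by_left[current]
        order.append(name)
    if len(order) != len(parts):
        raise ValueError("some parts were not incorporated")
    return order


# Hypothetical overhangs for a promoter-CDS-terminator assembly.
parts = {
    "terminator": ("TACT", "CGCT"),
    "promoter": ("GGAG", "AATG"),
    "cds": ("AATG", "TACT"),
}
print(golden_gate_order(parts, start="GGAG", end="CGCT"))  # ['promoter', 'cds', 'terminator']
```

Because the parts can be listed in any order and still assemble correctly, the sketch mirrors why the one-pot reaction works: the overhangs, not the pipetting order, define the product.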

Assignment: Asimov Kernel

No Asimov account yet, so this section is on pause.

  1. Create a Repository for your work
  2. Create a blank Notebook entry to document the homework and save it to that Repository
  3. Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel)
  4. Create a blank Construct and save it to your Repository:
    • Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository
    • Search the parts using the Search function in the right menu
    • Drag and drop the parts into the Construct
    • Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository
    • Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook
  5. Build three of your own Constructs using the parts in the Characterized Bacterial Parts Repo:
    • Explain in the Notebook Entry how you think each of the Constructs should function
    • Run the simulator and share your results in the Notebook Entry
    • If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome
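Without an Asimov account, one way to preview what the Repressilator simulation should show is the classic dimensionless repressilator model (Elowitz & Leibler, 2000), in which three repressors inhibit each other in a loop. The sketch below integrates it with a simple forward-Euler loop; the parameter values (alpha, alpha0, beta, n) are illustrative assumptions, not the characterized values of the Kernel parts.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=5.0, n=2.0,
                           t_end=200.0, dt=0.01):
    """Forward-Euler integration of the dimensionless repressilator model.

    m[i], p[i]: mRNA and protein of gene i. Gene i is repressed by protein i-1,
    closing a three-gene loop (0 <- 2, 1 <- 0, 2 <- 1).
      dm_i/dt = -m_i + alpha / (1 + p_{i-1}^n) + alpha0   (alpha0 = leaky expression)
      dp_i/dt = -beta * (p_i - m_i)                        (beta = decay-rate ratio)
    Returns a list of (t, p0, p1, p2) samples.
    """
    m = [1.0, 0.0, 0.0]  # asymmetric start so the loop leaves its fixed point
    p = [2.0, 1.0, 3.0]
    trace = []
    for step in range(int(round(t_end / dt))):
        dm = [-m[i] + alpha / (1.0 + p[i - 1] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(((step + 1) * dt, p[0], p[1], p[2]))
    return trace


trace = simulate_repressilator()
late = [row[1] for row in trace if row[0] > 100.0]  # skip the start-up transient
print(f"protein 0 oscillates between {min(late):.1f} and {max(late):.1f}")
```

If the late-time range collapses to a single value, the loop has settled to a steady state instead of oscillating; in this model, lowering alpha or raising the leakiness alpha0 pushes it toward that regime, which is one way to reason about simulator results that do not match expectations.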

Reading & Resources

Week 7: Genetic Circuits Part II


Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

  1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

An artificial neuron computes a weighted sum of its inputs and passes it through an activation function to produce an output; many neurons connected together form an artificial neural network (ANN). Intracellular artificial neural networks keep the weighted summation and the nonlinear activation function, but implement them with gene circuits. The main difference from Boolean circuits is that an IANN node can add and subtract graded inputs instead of only switching ON or OFF. Addition is straightforward: a promoter drives transcription of a gene into mRNA, which is translated into protein, so more input DNA gives more output. To subtract, input x1 can encode an endoribonuclease such as CasE, which binds and cleaves a recognition sequence in the output mRNA and removes it. x1 therefore carries a negative weight and x2 a positive weight, and the output follows max(x2 - x1, 0). This is also referred to as sequestration: because the endoribonuclease acts as a single-turnover enzyme that takes its target mRNA out of circulation, the output stays at zero until x2 exceeds x1, which creates the nonlinearity.
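The max(x2 - x1, 0) behaviour can be written down directly. This idealized Python sketch assumes each endoribonuclease molecule removes one output transcript (stoichiometric, single-turnover cleavage) and contrasts it with a Boolean "x2 AND NOT x1" gate whose ON/OFF threshold of 0.5 is an arbitrary assumption.

```python
def sequestration_neuron(x2: float, x1: float,
                         w2: float = 1.0, w1: float = 1.0) -> float:
    """Idealized output of a single intracellular neuron.

    x2: dose of DNA encoding the output gene (positive weight w2)
    x1: dose of DNA encoding the endoribonuclease (negative weight w1)
    Cleavage is treated as stoichiometric, so the output is the weighted
    difference clipped at zero: max(w2*x2 - w1*x1, 0), a ReLU-like activation.
    """
    return max(w2 * x2 - w1 * x1, 0.0)


def boolean_x2_and_not_x1(x2: float, x1: float, threshold: float = 0.5) -> float:
    """A Boolean gate for comparison: inputs are read as ON/OFF at a threshold."""
    return 1.0 if x2 > threshold and x1 <= threshold else 0.0


for x1 in [0.0, 1.0, 2.0, 3.0]:
    print(x1, sequestration_neuron(3.0, x1), boolean_x2_and_not_x1(3.0, x1))
```

The perceptron output decreases step by step as x1 grows, while the Boolean gate collapses to 0 as soon as x1 crosses its threshold; that graded, weighted response is the advantage over Boolean circuits.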

  1. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

I think an interesting use of IANNs would be in culturing and programming organoids. The Weiss Lab also focuses a lot on programmable patterning to trigger cell changes, and I can see this leading to very useful applications in supporting organoid intelligence. I think using endoribonucleases for waste management in microfluidics might be useful.

Microfluidic devices supporting organoid growth may accumulate extracellular vesicles, shed RNA, and dead cells, and the resulting RNA-protein aggregates can clog channels. We could use RNase H to clear RNA, Cas13 endoribonuclease to cleave transcripts, or CasE to cut fragments down so they can pass through filters, followed perhaps by further rounds of flushing and binding for removal. Limitations: it is not clear how these components would exit the microfluidics; more research is needed.

  1. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.

Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
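The layered design can be checked the same way. In this hypothetical sketch, Csy4 (from the single-layer example) regulates a second endoribonuclease, CasE, in layer 1, and CasE in turn regulates the fluorescent protein in layer 2; each node uses the same idealized max(positive - negative, 0) rule, and the DNA doses are made-up numbers.

```python
def relu_node(x_pos: float, x_neg: float) -> float:
    """Idealized sequestration node: output = max(x_pos - x_neg, 0)."""
    return max(x_pos - x_neg, 0.0)


def two_layer_perceptron(csy4_dna: float, case_dna: float, gfp_dna: float) -> float:
    """Hypothetical two-layer IANN: Csy4 -| CasE -| GFP."""
    # Layer 1: Csy4 cleaves the CasE transcript, so CasE = max(CasE DNA - Csy4, 0)
    case_level = relu_node(case_dna, csy4_dna)
    # Layer 2: CasE cleaves the GFP transcript, so GFP = max(GFP DNA - CasE, 0)
    return relu_node(gfp_dna, case_level)


print(two_layer_perceptron(0.0, 2.0, 3.0))  # 1.0: CasE knocks GFP down
print(two_layer_perceptron(2.0, 2.0, 3.0))  # 3.0: Csy4 removes CasE, GFP recovers
```

Because two subtractive layers are stacked, Csy4 ends up acting as a positive input to fluorescence: removing the layer-1 endoribonuclease releases the layer-2 reporter.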

Assignment Part 2: Fungal Materials

  1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

There’s a lot of use of mycelium leather, which is currently being manufactured into fashion products. It can be grown on agricultural waste as feedstock and, with enough coating, can develop good strength and malleability. Producing mycelium leather needs far fewer resources and creates much less waste than animal leather, and it improves animal welfare.

  1. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

It might be possible to create mycelium electronics, which I’m super interested in. I would like to genetically engineer mycelium for better conductivity or bioelectric activity, or to use bioglue to attach it to sensors for activity readout. Fungi are adaptable, respond quickly to their environment, and can act as good living sensors. They ‘learn’ quickly and can be interfaced with EMG sensors without needing to be submerged in culture media or treated with antibiotics.

Assignment Part 3: First DNA Twist Order

  1. Review the Individual Final Project documentation guidelines.
  2. Submit this Google Form with your draft Aim 1, final project summary, HTGAA industry council selections, and shared folder for DNA designs. DUE MARCH 20 FOR MIT/HARVARD/WELLESLEY STUDENTS

Done this!

  1. Review Part 3: DNA Design Challenge of the week 2 homework. Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website.

Reading & Resources

Week 9: Cell-Free Systems


Homework — DUE BY START OF Apr 7 LECTURE

Homework Part A: General and Lecturer-Specific Questions

General homework questions

  1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free systems help us understand biology ‘from scratch’ and bioengineer from smaller units. They give us more flexibility to scaffold biology from the ground up and to control the environment in a complete model. Living cells as we know them are already incredibly complex and hence harder to control in experimental settings. Synthetic cell engineering allows flexibility in the size of the cell, its proteins, and even its chemistry. So there are two scenarios where a cell-free system is more beneficial: first, when you want uniform control over cell size; second, when you want to engineer a specific chemical environment, or chemical diversity, that is not naturally common in or compatible with cells. Compared with in vivo expression, where you have to build plasmids, cell-free protein expression is faster and cheaper to set up and supports quick iterations with linear fragments, without plasmids.

  1. Describe the main components of a cell-free expression system and explain the role of each component.

The anatomy of the synthetic cell has multiple parts:

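  1. phospholipids and cholesterol, which form a strong lipid membrane around the reaction
  2. cytoplasm-like buffer containing small molecules such as nucleotides, amino acids, and ATP/GTP, with a buffer to stabilize the pH
  3. cell extract, which supplies ribosomes, enzymes, and transcription/translation factors
  4. tRNAs, which carry amino acids to the ribosome during translation
  5. DNA templates (plasmids or linear PCR products) encoding the genes, plus membrane channels for communication with the environment

  1. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.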

Within normal cells, energy is continuously regenerated through metabolism, but cell-free reactions usually run in microfluidic devices or vesicles and cannot sustain the same glucose-to-ATP metabolism as a living cell. To achieve continuous protein synthesis we must therefore add energy substrates and enzymatic regeneration systems. Common practice is to add phosphoenolpyruvate (PEP), creatine phosphate (CP), or acetyl phosphate (AcP) for rapid ATP regeneration via kinases in the extract (creatine phosphate is usually paired with added creatine kinase).
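A toy kinetic model shows why a regeneration system matters and why it eventually runs out. This sketch assumes simple mass-action kinetics: transcription and translation consume ATP at a first-order rate, and a kinase in the extract uses PEP to convert ADP back to ATP. All concentrations and rate constants are hypothetical and only meant to show the trend.

```python
def simulate_atp(pep0_mm: float, atp0_mm: float = 1.2, k_use: float = 0.1,
                 k_regen: float = 1.0, t_end: float = 60.0, dt: float = 0.01) -> float:
    """Return ATP (mM) after t_end minutes in a toy cell-free reaction.

    ATP -> ADP at rate k_use * ATP (consumption by transcription/translation);
    PEP + ADP -> pyruvate + ATP at rate k_regen * PEP * ADP (kinase in the extract).
    Rate constants are hypothetical; forward-Euler integration with step dt.
    """
    atp, adp, pep = atp0_mm, 0.0, pep0_mm
    for _ in range(int(round(t_end / dt))):
        use = k_use * atp
        regen = k_regen * pep * adp
        atp += dt * (regen - use)
        adp += dt * (use - regen)
        pep -= dt * regen  # each regenerated ATP spends one PEP
    return atp


print(f"60 min, no PEP:    {simulate_atp(0.0):.3f} mM ATP")
print(f"60 min, 30 mM PEP: {simulate_atp(30.0):.3f} mM ATP")
print(f"20 h, 30 mM PEP:   {simulate_atp(30.0, t_end=1200.0):.3f} mM ATP")
```

Without PEP the ATP pool drains within the first hour, while with PEP it stays near its starting level until the phosphate donor is used up; real extracts add further limits, such as inorganic phosphate accumulation, which this model ignores.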

  1. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Transcription happens in nucleus for eukaryotes but in cytoplasm for prokaryotes.

Within prokaryotic cell-free systems, transcription and translation are coupled (they happen at the same time), so these systems are much faster and more productive. In eukaryotic cells the two steps are separated, and transcripts must be spliced by dedicated machinery that removes introns and joins exons. Eukaryotic extracts retain ribosomes, chaperones, and modification enzymes, which support correct folding and processing of complex proteins.

GFP can be produced in a prokaryotic system and is commonly made in E. coli lysate: it is small and does not require glycosylation.

Complex human proteins are better made in eukaryotic cell-free systems. Membrane proteins such as GPCRs are often expressed in wheat germ extract. Because they are hydrophobic and span the membrane seven times, they are difficult to fold while inserting into a membrane and require a lipid bilayer.

  1. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Membrane proteins like GPCRs are difficult because they often need eukaryotic chaperones to fold correctly. Bacterial systems like E. coli lysate lack the machinery to handle hydrophobic transmembrane domains without aggregation.

To avoid aggregation, liposomes or nanodiscs can be added to the reaction mixture to support co-translational membrane insertion.

Another problem with cell-free systems is that it’s hard to separate the protein from everything else in the mixture. A His-tag (a short run of histidine residues fused to the protein) is a useful way to pull out the specific protein by binding it to metal-affinity beads and washing away the other components.

Fusion protein
  |
His-tag (histidine)
  |
Ligand
  |
Bead 
------
Magnetic block

  1. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

First, ATP may be depleted faster than it is regenerated. In this case we could switch from PEP to a slower, longer-lasting system such as creatine phosphate with creatine kinase.

Second, the protein may aggregate because of its hydrophobicity: hydrophobic segments that would normally embed in a nearby membrane clump together with other hydrophobic sections instead, leading to misfolding. Again, nanodiscs might help by providing a hydrophobic environment, so that the proteins insert into the discs rather than binding to each other.

Third, mRNA is unstable. Cell-free lysates can degrade the mRNA quickly, so there may not be enough translation. RNase inhibitors are typically added to stabilize it.

Homework question from Kate Adamala

Design an example of a useful synthetic minimal cell as follows:

  1. Pick a function and describe it.
    1. What would your synthetic cell do? What is the input and what is the output?
    2. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?
    3. Could this function be realized by genetically modified natural cell?
    4. Describe the desired outcome of your synthetic cell operation.

    I am interested in a synthetic minimal cell that acts as an artificial dopaminergic synapse sensor. The input would be extracellular dopamine released by differentiated PC12 cells; to make this signal visible, the output would be a GFP or RFP fluorescent signal proportional to dopamine concentration, reporting a reward signal. This most likely needs a membrane: the dopamine receptor DRD1 is a GPCR, a membrane protein that must be expressed in a lipid environment. One way to work with this is a eukaryotic cell-free system with liposomes or nanodiscs added directly to the reaction. Yes, it could also be realized in a genetically modified natural cell: for part of my final project I am working on this exact function by overexpressing the DRD1 gene with a GFP construct in PC12 cells. But natural cells have hundreds of competing pathways activated by dopamine, and it is easier to control density and concentrations in a synthetic cell. The desired outcome is a synthetic minimal cell that operates as a dopamine-responsive optical reporter, with fluorescence that maps dopamine release events.

2. Design all components that would need to be part of your synthetic cell.
  1. What would be the membrane made of?
  2. What would you encapsulate inside? Enzymes, small molecules.
  3. Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)
  4. How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

The membrane will likely be a cholesterol-containing liposome bilayer. GPCRs such as DRD1 need cholesterol-rich membranes to insert and adopt the correct conformation, and cholesterol increases membrane rigidity to support that insertion. For the transcription and translation machinery, we might use a HEK293 cell-free lysate with T7 RNA polymerase for in vitro transcription. The DNA constructs will include a DRD1 expression plasmid and a GFP reporter construct, either combined with or separate from a cAMP response element (CRE) signalling sequence. A mammalian set-up is needed because DRD1 is a membrane protein (a GPCR), and a HEK293 lysate retains the glycosylation machinery DRD1 needs to be expressed. The cell will communicate with the environment through receptor signalling rather than transport: dopamine binds the extracellular side of DRD1 and triggers intracellular cAMP signalling, so dopamine never needs to cross the membrane.

  1. Experimental details
    1. List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
    2. How will you measure the function of your system?

Lipids: POPC (palmitoyloleoylphosphatidylcholine), DOPG (dioleoylphosphatidylglycerol), and cholesterol, to form the liposome bilayer at POPC:DOPG:cholesterol = 60:10:30 mol%. Genes: DRD1, protein kinase A (PKA), cAMP response element binding protein 1 (CREB1), EGFP for fluorescence, and T7 RNA polymerase (from the E. coli bacteriophage T7). Function will be measured with a plate reader fluorescence assay: in a 96-well plate, we add extracellular dopamine at 0, 1 nM, 10 nM, 100 nM, 1 µM, and 10 µM to the liposome reactions and measure GFP fluorescence every 30 min.
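The dose series and lipid ratio translate into pipetting numbers with a little arithmetic. This Python sketch assumes a 10-fold serial dilution from a 10 µM top well in 100 µL wells, and vendor-style molar masses for the lipids (POPC 760.08, DOPG sodium salt 797.03, cholesterol 386.65 g/mol) that should be checked against the actual datasheets.

```python
def serial_dilution(top_nm: float, fold: float, steps: int, well_ul: float):
    """Volumes and concentrations for a simple serial dilution.

    Each well starts with `diluent` µL of buffer; `transfer` µL is carried in
    from the previous (more concentrated) well, mixed, and carried on.
    """
    transfer = well_ul / fold
    diluent = well_ul - transfer
    concs = [top_nm / fold ** k for k in range(steps)]
    return transfer, diluent, concs


# (molar mass g/mol, mol%) for POPC:DOPG:cholesterol = 60:10:30.
# Assumed values: DOPG taken as the sodium salt; check the vendor datasheet.
LIPIDS = {"POPC": (760.08, 60.0), "DOPG": (797.03, 10.0), "cholesterol": (386.65, 30.0)}


def lipid_masses_ug(total_umol: float) -> dict:
    """Mass of each lipid (µg) for a given total lipid amount (µmol)."""
    return {name: total_umol * pct / 100.0 * mw for name, (mw, pct) in LIPIDS.items()}


transfer, diluent, concs = serial_dilution(top_nm=10_000, fold=10, steps=5, well_ul=100)
print(f"carry {transfer} uL into {diluent} uL buffer per step")
print(concs + [0.0])  # 10 µM down to 1 nM, plus a no-dopamine control
print({name: round(ug, 1) for name, ug in lipid_masses_ug(1.0).items()})
```

Multiplying µmol by g/mol gives µg directly, so 1 µmol of total lipid at 60:10:30 mol% needs roughly 456 µg POPC, 80 µg DOPG, and 116 µg cholesterol.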

Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:

  • Write a one-sentence summary pitch sentence describing your concept.
  • How will the idea work, in more detail? Write 3-4 sentences or more.
  • What societal challenge or market need will this address?
  • How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

I would like to design a biosensor for robotics ‘in the wild’ that activates a freeze-dried cell-free system only under specific weather conditions, for repair or environment-adaptive changes: for example, localized self-repair protein expression on the robot’s skin. The idea is to think of robotics as regenerative rather than designed and completed ‘at a factory’. Like a tub of instant coffee used ‘on tap’, the freeze-dried system stays stable until it is activated with water!

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

  1. Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

There’s a lot of current research on brain organoids in space, for example by Alysson Muotri at UC San Diego. There is a growing need to think about hybrid brains or neural repair in space, which might help rescue living substrates mid-journey. Performing neural surgery on brain organoids, or supporting organoid fusion in space, would also mean doing bioprinting or scaffolding with cell-free systems via lab robots.

  1. Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

I am interested in the dopamine receptor D1 (DRD1) and the transcription factors Forkhead Box A2 (FoxA2) and Nurr1, which drive dopaminergic differentiation in cells. We could do more chemical signalling and reinforcement if we can make these cells more receptive to signals.

  1. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

DRD1 is the primary receptor mediating dopamine’s effects on motivation, working memory, and motor coordination. By expressing DRD1 in a BioBits cell-free system with fluorescent reporter, we can either detect dopamine concentration in biofluid samples or test receptiveness of cells in space experiments.

  1. Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

The hypothesis will be centered around whether dopaminergic functions will be retained after freeze-drying and rehydration under simulated microgravity conditions.

  1. Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

This could be a comparative experiment between freeze-dried, rehydrated samples and living samples, testing whether freeze-drying retains dopaminergic function. Since PC12-based dopamine research on Earth requires living cell cultures and CO2 incubators, a freeze-dried format would streamline the workflow and could potentially serve as a neurochemical monitoring toolkit.

Protein synthesis requires transcription and translation 

Transcription
>in the nucleus in eukaryotes or in the cytoplasm in prokaryotes
>RNA polymerase
>DNA
>nucleotides to make RNA
>polymerase will bind promoter and in the space will 

Translation
>tRNA, amino acids and ribosomes, mRNA
>inside the nucleus (eukaryotes), splicing removes introns and leaves exons


>RBS (ribosome binding site): where the mRNA binds the small ribosomal subunit
>the initiator tRNA binds the start codon (AUG in mRNA, ATG in DNA) through its complementary anticodon
>codons are 3 nucleotides
>E, P, A are the three tRNA sites on the ribosome [exit, peptidyl, aminoacyl]
>the initiator tRNA starts in the P site, which holds the growing peptide; each new amino acid arrives on a tRNA in the A site, and empty tRNAs leave the ribosome from the E site

> can happen outside of cells


> TX transcription
> TL translation
> CFPS (Cell free protein synthesis)


Cell lysate
>ribosomes for translations
>tRNA
>Initiation, transcription, and translation factors
>Microsomes (membranes phospholipid bilayer)

Template
>plasmid DNA (more stable)
>linear PCR

Supplements
>Nucleotides
>amino acids
>ATP (transcription)
>GTP (translation)



buffer stabilizes the pH

Prokaryotes: transcription and translation can happen at the same time
Eukaryotes: they are separated (exons and introns), and you need specific splicing machinery to take out the introns

Cell-free systems are given the coding sequence directly, so there is no need to deal with introns and exons
plasmid -> coding sequence -> messenger RNA -> insulin



Endosymbiotic theory (mitochondria were once bacteria, with circular DNA; too complex, and it takes more energy to keep mitochondria working)

-no time-consuming cloning steps required
-reaction conditions can be fully controlled and modified
-proteins that are toxic to cells can still be produced
-

Tx-TL system can be classified by the source of the cell extract 
> bacterial cell-free system
  >E Coli
> eukaryotic 
  >yeast, mammalian, insect, plant

Insulin is made of two polypeptide chains joined by disulfide bonds


Cell lysis and a lot of purification

Chromoproteins
just have colors 

GFP 
Green fluorescent protein needs a blacklight to see its fluorescence

RFP



purification filters the protein you want

his-tag + protein of interest --> the tag binds to immobilized metal ions (e.g. nickel) on the beads


Fusion protein
  |
His-tag (histidine)
  |
Ligand
  |
Bead 
------
Magnetic block 




Today's protocol: 

GFP, RFP, mix 
Affinity Purification to isolate our samples 
Mixture --> will get separation 

1 GFP
2 RFP
3 mix
4 mix

Homework Part B: Individual Final Project

Check final project page!


Reading & Resources

Labs

Lab writeups:

  • PCR

  • Week 1 Lab: Pipetting

  • Week 11 Lab: Microfluidics

  • Week 7 Lab: Cell-free systems

PCR

PCR Photocopier and amplifier

qPCR quantitative PCR

mastermix pcr tubes

DENATURATION

ANNEALING

EXTENSION
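The "photocopier" idea can be put into numbers. This Python sketch assumes ideal exponential amplification (efficiency 1.0 means perfect doubling per cycle) and uses the standard 2^-ΔΔCt calculation for relative qPCR quantification; the Ct values in the example are made up.

```python
def pcr_copies(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Ideal PCR yield: each denaturation-annealing-extension cycle multiplies
    the template by (1 + efficiency).

    Real reactions run below 100% efficiency and plateau once primers or
    dNTPs run out, which this model ignores.
    """
    return start_copies * (1.0 + efficiency) ** cycles


def ddct_fold_change(ct_target: float, ct_ref: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by the 2^-ddCt method (assumes ~100% efficiency).

    ct_ref / ct_ref_ctrl are a reference (housekeeping) gene in the sample and
    in the control condition.
    """
    ddct = (ct_target - ct_ref) - (ct_target_ctrl - ct_ref_ctrl)
    return 2.0 ** (-ddct)


print(pcr_copies(10, 30))                 # ~1.07e10 copies from 10 templates
print(ddct_fold_change(20, 15, 22, 15))   # 4.0: target is 4x higher than control
```

Each cycle earlier in Ct means roughly twice as much starting template, which is why a 2-cycle shift corresponds to a 4-fold difference.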

Week 1 Lab: Pipetting

Week 11 Lab: Microfluidics

Bends in microfluidic devices separate and sort particles

Add weight and shape in particle

Microreactors: cavities in the middle (a stationary area) where an assay can form, with UV curing, heat, or time used to activate the reaction

Reynolds number

Re is the ratio of inertial forces to viscous forces.

Capillary Number

Peclet Number
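The three dimensionless numbers can be compared with a quick calculation. This sketch uses the standard definitions (Re = ρvL/μ, Ca = μv/γ, Pe = vL/D) with assumed example values for water in a 100 µm channel at 1 mm/s; the interfacial tension and diffusivity are typical-order assumptions, not measured values.

```python
def reynolds(density: float, velocity: float, length: float, viscosity: float) -> float:
    """Re = rho * v * L / mu: inertial vs viscous forces."""
    return density * velocity * length / viscosity


def capillary(viscosity: float, velocity: float, interfacial_tension: float) -> float:
    """Ca = mu * v / gamma: viscous forces vs interfacial tension (droplets)."""
    return viscosity * velocity / interfacial_tension


def peclet(velocity: float, length: float, diffusivity: float) -> float:
    """Pe = v * L / D: transport by flow vs transport by diffusion (mixing)."""
    return velocity * length / diffusivity


# Assumed example, SI units: water in a 100 µm channel at 1 mm/s.
rho, mu, v, L = 1000.0, 1e-3, 1e-3, 100e-6
gamma = 0.05  # assumed water/oil interfacial tension, N/m
D = 1e-9      # assumed small-molecule diffusivity, m^2/s

print(f"Re = {reynolds(rho, v, L, mu):.2g}")  # ~0.1: laminar flow
print(f"Ca = {capillary(mu, v, gamma):.2g}")  # ~2e-05: interfacial tension dominates
print(f"Pe = {peclet(v, L, D):.2g}")          # ~100: flow outpaces diffusion
```

A Péclet number around 100 means solutes are carried downstream faster than they diffuse across the channel, which is one reason herringbone grooves are used to force mixing.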

Types of channels

  • Rectangular channels
  • Circular channels
  • Trapezoidal channels
  • V-shaped channels
  • Herringbone or grooves (things rolling along the pattern)

Cavities

Network Architectures

  • Chamber
  • Filter
  • Tesla valve
  • Droplet

If the device is built as a sandwich of layers, the layers need sealing, e.g. a PDMS bonding system

Stereolithography DLP

Syringe pump, Flow.io: flow rates in nanolitres per minute

Design Challenge

Fluid3D

Week 7 Lab: Cell-free systems

Protein synthesis requires transcription and translation

Transcription

  • Happens in the nucleus in eukaryotes or in the cytoplasm in prokaryotes
  • RNA polymerase
  • DNA
  • Nucleotides to make RNA
  • Polymerase will bind promoter and in the space will

Translation

  • tRNA, amino acids, ribosomes, mRNA
  • Inside the nucleus (eukaryotes), splicing removes introns and leaves exons

  • RBS (ribosome binding site): where the mRNA binds the small ribosomal subunit
  • The initiator tRNA binds the start codon (AUG in mRNA, ATG in DNA) through its complementary anticodon
  • Codons are 3 nucleotides
  • E, P, A are the three tRNA sites on the ribosome [exit, peptidyl, aminoacyl]: the growing peptide is held in the P site, each new amino acid arrives on a tRNA in the A site, and empty tRNAs leave the ribosome from the E site

can happen outside of cells

TX: transcription; TL: translation; CFPS: cell-free protein synthesis

Cell lysate

  • Ribosomes for translation
  • tRNA
  • Initiation, transcription, and translation factors
  • Microsomes (membranes, phospholipid bilayer)

Template

  • Plasmid DNA (more stable)
  • Linear PCR product

Supplements

  • Nucleotides
  • Amino acids
  • ATP (transcription)
  • GTP (translation)

Buffer stabilizes the pH

Prokaryotes: transcription and translation can happen at the same time. Eukaryotes: they are separated (exons and introns), and you need specific splicing machinery to take out the introns.

Cell-free systems are given the coding sequence directly, so there is no need to deal with introns and exons: plasmid -> coding sequence -> messenger RNA -> insulin

Endosymbiotic theory (mitochondria were once bacteria, with circular DNA; too complex, and it takes more energy to keep mitochondria working)

  • No time-consuming cloning steps required
  • Reaction conditions can be fully controlled and modified
  • Proteins that are toxic to cells can still be produced

Tx-TL system can be classified by the source of the cell extract

  • Bacterial cell-free system: E. coli
  • Eukaryotic: yeast, mammalian, insect, plant

Insulin is made of two polypeptide chains joined by disulfide bonds

Cell lysis and a lot of purification

Chromoproteins just have colors

GFP: green fluorescent protein, needs a blacklight to see its fluorescence

RFP

purification filters the protein you want

his-tag + protein of interest -> the tag binds to immobilized metal ions (e.g. nickel) on the beads

Fusion protein | His-tag (histidine) | Ligand | Bead

Magnetic block

Today’s protocol:

GFP, RFP, mix

Affinity purification to isolate our samples: the mixture will get separated

Tubes: 1 GFP, 2 RFP, 3 mix, 4 mix

Projects

Final projects:

Group Final Project

Individual Final Project: Organoid-lab-on-a-chip (BSL-1)


Organoid-lab-on-a-chip for BSL-1 Labs

Closed-loop biochemical reinforcement learning for brains-on-chips via mRNA-mediated dopaminergic differentiation of PC12 cells with Nurr1 and FoxA2

Jenn Leung | LifeFabs Institute | London

HTGAA 2026: Individual Final Project Documentation

Section 1: Abstract

This project is an attempt at prototyping a minimal brain-on-chip platform designed to facilitate closed-loop biochemical communication between synthetic neural substrates and automated software systems. It is inspired by Cortical Labs’ CL-1 and follows on from my previous collaboration with the start-up on closed-loop reinforcement learning games and creative technology experiments with biocomputers.

A brain-on-chip is generally composed of human-derived cortical neurons, high-density microelectrode arrays, microfluidics, and a software system that analyzes the electrical activity of the neurons. For my final project, I am looking to build a minimal version of that with a focus on biochemical delivery.

The stack looks to integrate three major components: Wetware: Synthetic mRNA-Mediated Dopaminergic Differentiation of PC12 Cells with Nurr1 and FoxA2 mRNA; Software: Automated chemical delivery via Opentrons OT-2 with Python and chemical/electrical activity readout; and Hardware: 3D-printed microfluidics custom labware with Multi-Electrode Array to create biochemical I/O between neural substrates and liquid handling systems.

While various labs are already working towards organoid intelligence, treating living neurons as the computational substrate, for LifeFabs I have adapted my protocol to adhere to BSL-1 safety standards, using PC12 cells (Rattus norvegicus) derived from rat pheochromocytoma as a basis for modeling neuronal differentiation for synaptic transmission (See PMC12696136). I am interested in studying dopamine synthesis in PC12 cells as a measurable signal for reinforcement learning, which is why I have looked into Nurr1 and FoxA2 as transcription factors that can drive neuron differentiation (10.1002/stem.294).

For the synthetic biology component of the protocol, there are two general directions depending on cost and ease of delivery:

  1. Twist synthetic DNA sequences encoding the Nurr1 and FoxA2 transcription factors, to promote their expression
  2. Direct mRNA synthesis for ready-to-transfect mRNA

With this, I can then transfect the mRNA into PC12 cells, using Opentrons OT-2 robot for lab automation and to facilitate synergistic dopaminergic differentiation. Dopamine synthesis will be measured.

Significance

Current platforms such as Cortical Labs’ CL-1 rely on human-derived cortical neurons requiring BSL-2 containment, which limits accessibility for independent researchers and creative technologists. There is a need for minimal, accessible brain-on-chip prototypes that maintain biological relevance while operating within BSL-1 safety constraints. The minimal set-up already provides useful contexts for the development of other hardware and physical systems such as microfluidics, custom labware, and microelectrode arrays (MEA).

Broad Objective

This project aims to prototype a minimal brain-on-chip platform integrating synthetic dopaminergic neural substrates, automated biochemical delivery, and multi-electrode array readout into a closed-loop biochemical I/O system, with the goal to start a foundation for reinforcement learning experiments with living neural tissue at BSL-1.

Hypothesis

Synthetic mRNA-mediated co-expression of Nurr1 and FoxA2 transcription factors in PC12 cells is likely to drive measurable dopaminergic differentiation, producing quantifiable dopamine synthesis that can serve as a biochemical state signal in reinforcement learning experiments automated via liquid handling robotics.

Specific Aims

Common approaches to brains-on-chips use electrical or chemical stimulation as reinforcement learning methods. By genetically modifying PC12 cells, I hope to study dopamine synthesis driven by the activated Nurr1 and FoxA2 transcription factors, which help facilitate neuronal differentiation.

Methods

The wetware component employs PC12 cells derived from rat pheochromocytoma as a BSL-1 compatible model for dopaminergic neuronal differentiation. Cells are differentiated with NGF prior to transfection with synthetic Nurr1 and FoxA2 mRNA using Lipofectamine MessengerMAX.

Two synthesis routes are under consideration depending on cost and accessibility: Twist Bioscience DNA synthesis with subsequent in vitro transcription, or direct ready-to-transfect mRNA from a commercial synthesis service. Dopamine synthesis is quantified by high-sensitivity ELISA as the primary biochemical readout and reinforcement learning state signal. The software component uses Python on the Opentrons OT-2 platform to automate stimulus delivery and implement a closed-loop reward logic based on dopamine concentration thresholds. The hardware component integrates custom 3D-printed microfluidic labware with a multi-electrode array to enable simultaneous biochemical delivery and electrophysiological recording, creating bidirectional I/O between the neural substrate and the automated software system.
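For the Twist DNA + in vitro transcription route, the insert has to read through correctly from the promoter to the poly(A) tail. This Python sketch assembles a linear IVT template (T7 promoter, 5' UTR, Kozak sequence, CDS, 3' UTR, encoded poly(A)) and flags common CDS mistakes. The UTRs, the poly(A) length of 120, and the toy CDS are placeholders, not the Nurr1, FoxA2, or GFP sequences; capping and any nucleoside modification happen in the IVT reaction itself and are not modeled.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # T7 RNA polymerase starts at the final G
KOZAK = "GCCACC"                    # placed directly before the ATG start codon
STOPS = {"TAA", "TAG", "TGA"}


def check_cds(cds: str) -> list[str]:
    """Return a list of problems with a coding sequence (empty list = OK)."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(cds) % 3:
        problems.append("length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOPS:
        problems.append("does not end with a stop codon")
    if any(codon in STOPS for codon in codons[:-1]):
        problems.append("has an in-frame stop codon before the end")
    return problems


def ivt_template(cds: str, utr5: str = "", utr3: str = "", poly_a: int = 120) -> str:
    """Assemble a linear IVT template; the UTRs and poly_a are placeholders."""
    issues = check_cds(cds)
    if issues:
        raise ValueError("; ".join(issues))
    return T7_PROMOTER + utr5 + KOZAK + cds.upper() + utr3 + "A" * poly_a


toy_cds = "ATGGCTAAAGGCTAA"  # hypothetical 4-codon ORF plus a stop codon
print(check_cds(toy_cds))           # []
print(check_cds("ATGTAAGCTTAA"))    # premature stop is flagged
print(len(ivt_template(toy_cds)))   # 18 + 6 + 15 + 120 = 159
```

Running the real coding sequences through check_cds before ordering would catch premature stops or length errors introduced while editing the design in Benchling.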

Section 2: Project Aims

Aim 1: Experimental Aim

My aim 1 is to define and test a synthetic biology approach to designing better dopaminergic reinforcement learning signals for brains-on-chips. The experimental aim is to generate dopaminergically differentiated PC12 cells through synthetic Nurr1 and FoxA2 mRNA transfection as the computational substrate for minimal brains-on-chips, following established protocols for mRNA-based dopaminergic neuron generation from PC12 cells. Methods and protocols include:

Designing DNA templates encoding Nurr1 and FoxA2 in Benchling

Developing synthetic Nurr1, FoxA2 and GFP mRNA (commercial synthesis or Twist DNA + IVT)

Lipofectamine MessengerMAX transfection protocol (Kim et al. 2017, PMC5589083)

NGF differentiation protocol for PC12 cells

Dopamine ELISA readout (Eagle Biosciences)

Aim 2: Development Aim

Measure and respond to biochemical signals from the substrate in real time to support reinforcement learning experiments.

This will be developed on the Opentrons OT-2 with a custom Python reinforcement learning script.

Designing custom labware for the OT-2 to facilitate chemical delivery and signal readout (inspired by OrganRX)

Aim 3: Visionary Aim

Develop an open, accessible, and low-cost framework for biological reinforcement learning in DIY brains-on-chips in BSL-1 labs.

This may include:

  • Using commercially available immortalised cell lines such as PC12

  • Developing open-source repositories and libraries for liquid handling automation for lab robotics

  • Designing custom ‘organoid-lab-on-a-chip’ labware and allowing open access to its Opentrons JSON definition

  • Using the organoid-lab-on-a-chip as a platform for developing and testing cheaper microelectrode arrays for electrical interfacing

Section 3: Background

Background and Literature Context

Foxa2 and Nurr1 Synergistically Yield A9 Nigral Dopamine Neurons

Lee et al. 2010

This paper establishes why Nurr1 and FoxA2 should be used together. The combination improves dopaminergic differentiation of neural precursor cells (NPCs), which is the rationale for co-expressing both factors in PC12 cells in this project. FOXA2 (Forkhead Box A2) is a transcription factor critical for embryonic development. Nurr1 is a transcription factor essential for the development, survival, and maintenance of midbrain dopaminergic neurons.

The authors showed that Nurr1 alone cannot generate fully mature midbrain dopamine neurons as it produces cells with only partial dopaminergic identity and fails entirely in mouse and human-derived precursors. FoxA2, a forkhead transcription factor expressed early in midbrain development, was identified as the critical co-factor.

Efficient Generation of Dopamine Neurons by Synthetic Transcription Factor mRNAs

Kim et al. 2017 (PMC5589083)

Kim et al. asked whether Nurr1 and FoxA2 could be delivered as synthetic mRNA rather than via viral vectors. They designed a custom vector (pcDNA/UTR120A) with optimised 5’ and 3’ UTRs, produced mRNA by in vitro transcription, and transfected it into rat neural precursor cells. mRNA transfection alone was sufficient to drive full dopaminergic differentiation. The cells became electrophysiologically active, released measurable dopamine, and expressed the full dopamine synthesis machinery.

Fluidic Programmable Gravi-maze Array for High Throughput Multiorgan Drug Testing

Wong et al. 2025 (bioRxiv 2025.06.18.660241)

Biopico Systems presents OrganRX™, a modular, gravity-driven multiorgan-on-a-plate (MOAP) system integrating gut, liver, kidney, brain and endothelium within a single microfluidic architecture.

This paper matters for my project in two ways. First, OrganRX is compatible with the Opentrons OT-2 for automated liquid dispensing, which is exactly what I want to adapt. Second, it uses gravity-driven, pump-free design principles. I may be able to model my 3D-printed microfluidic labware on these principles, which would support the development of accessible, minimal brain-on-chip hardware.

PC12 Cell Line: Cell Types, Coating of Culture Vessels, Differentiation and Other Culture Conditions

Wiatrak et al. 2020 (Cells 9(4):958)

The paper systematically compared the two ATCC PC12 variants across coating types, NGF concentrations, and incubation times. Several of its findings directly affect my protocol design.

The authors concluded that only traditional PC12 cells (CRL-1721) should be used for neurobiological studies. With NGF (nerve growth factor), these cells stop dividing, develop neurites, and adopt a neuronal phenotype. The adherent variant (PC12 Adh, CRL-1721.1) behaves fundamentally differently and does not differentiate effectively with NGF.

Accordingly, my protocol uses 100 ng/mL rat NGF for 14 days, with media changes every 48 hours.

Innovation

Brains-on-chips at BSL-2 typically use human iPSC-derived neurons, which are extremely expensive to reproduce. These systems have demonstrated learning through their rich electrical activity. However, they do not yet use dopamine release as an alternative signal for reinforcement learning.

The project is novel in two ways. First, it proposes a BSL-1-accessible method for learning about and prototyping minimal brains-on-chips. It also offers a cheaper alternative to iPSC-derived neurons by differentiating neuron-like cells from the immortalized PC12 cell line. Because no human-derived cells are needed, the approach is easier to democratize.

Second, the project takes a distinctive synthetic biology approach to brains-on-chips. There is currently little research using PC12 cells for brains-on-chips, and learning has not been demonstrated with them. PC12 cells show only limited electrical activity. However, differentiating them with mRNA encoding two complementary transcription factors (FoxA2 and Nurr1) may generate dopaminergic neurons that are both electrically excitable and dopaminergically functional. This would make both biochemical and electrical reinforcement learning possible for these BSL-1 brains-on-chips.

Significance

Synthetic biological intelligence offers alternative frameworks for rethinking our current silicon-based computational infrastructure. This project explores living neurons as a computational substrate for information processing, interfaced with multi-electrode arrays (MEAs) for electrical stimulation and recording.

Because biological substrates are neuroplastic and difficult to scale, their potential to change silicon-based computing infrastructure is currently underexplored. Future synthetic biological intelligence will come in different cultures, assemblies, sizes, shapes, and forms, and its dimensions will vary with MEA design. There is therefore no one-size-fits-all method for understanding these cognitive assemblies.

To develop meaningful communication across these intelligences, we must understand them through higher-order behavioral traits and interface with them ‘at their scale’. This means approaching biological intelligence through meaningful communication rather than through mechanistic interpretability.

The project also aims to address the privatization of brains-on-chips research. Since 2022, leading brains-on-chips research has been concentrated in a few start-ups and university institutions. Examples include Harvard’s Arlotta Lab, the Swiss start-up FinalSpark, and the Australian/Singaporean start-up Cortical Labs.

Current biotechnology start-ups are primarily focused on delivering biocomputing at scale and building products quickly. This era of biocomputing deserves research into more diverse methods of meaningful communication. This project is instead interested in democratizing the building, development, and networking of new biocomputers.

Bioethical Considerations

If, as we speculate, brains-on-chips systems become democratized and decentralized, advances in MEA design, bioprinting technologies, and microfluidic platforms will produce many different configurations of physical and neural assemblies. This raises four considerations:

1) Benchmarking integrity and reproducibility. For example, how do we measure spiking activity across different systems? How do we make sure experiments are scientifically meaningful? How do we translate and deliver virtual environments to channels on different MEA geometries?

2) Ensuring accessibility for independent researchers. For example, software environments should not be written only for proprietary technologies such as Cortical Labs’ CL1 or FinalSpark’s Neuroplatform. Governance here means committing to abstraction layers that treat CL1 as one implementation among many.

3) Responsible scalability across new substrates. For example, new substrates include increasingly complex organoids or assembloids, which should go through rigorous bioethical frameworks.

4) Supporting the sustainability and longevity of substrates. Stimulation should be rate-limited so that cells are not overstimulated and put at risk of early death.

These considerations mean that the project must comply with bioethical standards and obtain research ethics committee approval. Wet-laboratory licensing requirements must also be met through partner wet-lab facilities.
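Points 2 and 4 above can also be expressed in software. The sketch below uses a hypothetical `StimulationBackend` interface that each platform (CL1, Neuroplatform, or a DIY MEA) would implement. A wrapper then caps how many stimuli reach the cells within a sliding time window. None of these class names come from a vendor SDK. The limit values are placeholders that would need to be set from viability data for the specific culture.

```python
import time
from abc import ABC, abstractmethod
from collections import deque


class StimulationBackend(ABC):
    """Hypothetical platform-agnostic interface; each MEA platform would subclass it."""

    @abstractmethod
    def stimulate(self, channel: int, amplitude_ua: float) -> None:
        """Deliver one stimulus pulse on the given electrode channel."""


class LoggingBackend(StimulationBackend):
    """Stand-in backend that records calls instead of driving hardware."""

    def __init__(self):
        self.calls = []

    def stimulate(self, channel, amplitude_ua):
        self.calls.append((channel, amplitude_ua))


class RateLimitExceeded(RuntimeError):
    """Raised when a stimulus would exceed the configured rate limit."""


class RateLimitedBackend(StimulationBackend):
    """Wraps any backend and allows at most max_events stimuli per sliding window."""

    def __init__(self, backend, max_events, window_s, clock=time.monotonic):
        self.backend = backend
        self.max_events = max_events
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._timestamps = deque()

    def stimulate(self, channel, amplitude_ua):
        now = self.clock()
        # Forget stimuli that have left the sliding window.
        while self._timestamps and now - self._timestamps[0] >= self.window_s:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_events:
            raise RateLimitExceeded(
                f"{self.max_events} stimuli already delivered in the last {self.window_s} s"
            )
        self._timestamps.append(now)
        self.backend.stimulate(channel, amplitude_ua)
```

For example, `RateLimitedBackend(LoggingBackend(), max_events=10, window_s=60)` would reject an eleventh stimulus within any 60-second window. Keeping the limit in a wrapper, rather than inside each backend, means the same safeguard applies whichever hardware is underneath.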

First, to develop a general-use minimal brain-on-a-chip model for BSL-1 labs, we will need to benchmark the brain-on-a-chip platforms that are currently commercially available.

For example, there are multiple proprietary brain-on-a-chip platforms, such as Cortical Labs’ CL1 and FinalSpark’s Neuroplatform, but there is no standardized, comparable metadata for these systems. I propose to compile metadata on existing platforms and systems. I also propose to develop an open-access metadata standard that documents MEA geometry, channel count, neural substrate, culture medium, and experimental protocols.

Building this standard will involve three steps:

  • identifying academic researchers who have been working on organoid intelligence and synthetic bioengineered intelligence standardization
  • engaging manufacturers such as MaxWell Biosystems and Cortical Labs
  • joining community labs or open-source groups doing open-source research

This assumes that all parties are willing to share their manuals or manufacturing details; however, some of this data may be protected under NDA. The approach also carries risks of both failure and success. In particular, there is a high chance that open-source projects will grow exponentially, making this metadata impossible to manage at scale.
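The proposed metadata standard could start as a simple typed record whose fields mirror the list above: MEA geometry, channel count, neural substrate, culture medium, and experimental protocols. Serializing it to JSON would let it be shared and versioned in an open repository. This is a minimal sketch. `PlatformRecord`, its field names, and its validation rules are hypothetical. Any values stored in it should come from each platform's own documentation rather than from this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PlatformRecord:
    """One entry in a hypothetical open metadata standard for brain-on-a-chip platforms."""

    platform: str            # platform or system name
    mea_geometry: str        # free-text description of the electrode layout
    channel_count: int       # number of recording/stimulation channels
    neural_substrate: str    # cell type or tissue used
    culture_medium: str
    protocols: List[str] = field(default_factory=list)  # links or IDs of experimental protocols

    def __post_init__(self):
        # Minimal validation so malformed community submissions fail early.
        if not self.platform:
            raise ValueError("platform name is required")
        if self.channel_count <= 0:
            raise ValueError("channel_count must be a positive integer")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PlatformRecord":
        return cls(**json.loads(text))
```

A record round-trips through `to_json()` and `PlatformRecord.from_json()`, so entries contributed by different labs are validated as they are loaded.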

Section 4: Experimental Design, Techniques, Tools, and Technology

The Stack

Neural Substrate (PC12 Cells)

DNA Construct Design

Two IVT template plasmids are designed on Benchling and ordered from Twist Bioscience in pTwist Amp High Copy:

FoxA2_T7_EGFP_Kozak_3UTR

Nurr1_T7_Kozak_RFP_3UTR

Construct 1: Nurr1-GFP IVT Template

Architecture:

5' buffer — T7 promoter — 5'UTR — Kozak — Nurr1 CDS — GGGSGGGS linker — EGFP — stop codon — 3'UTR — 3' buffer

Element sources:

  • 5’ buffer: AAAACCCAAA — IVT initiation efficiency

  • T7 promoter: TAATACGACTCACTATA — T7 RNA polymerase recognition (Studier & Moffatt 1986)

  • 5’UTR: Xenopus beta-globin derived — from Kim et al. (2017)

  • Kozak: GCCACCATG — optimal translation initiation (Kozak 1987)

  • Nurr1 CDS: rat NM_019328, positions 112–1908 (1797 nt), stop codon removed

  • Linker: GGCGGCGGCTCCGGCGGCGGCTCC → GGGSGGGS (flexible fusion linker)

  • EGFP: AAB02572.1 (Cormack et al. 1996)

  • Stop codon: TAA

  • 3’UTR: double beta-globin from Kim et al. (2017)

  • 3’ buffer: GCGGCCGC (NotI site)
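A single frameshift or a retained stop codon would prevent the fluorescent fusion protein from being made. Each construct can therefore be checked programmatically before ordering and again after Sanger sequencing (Step 5). The sketch below uses only the Python standard library and the standard genetic code. The helper names are illustrative, and Biopython's `Seq.translate()` would be an equivalent alternative.

For a fusion CDS with its stop codon removed, `stop_positions()` should return an empty list over the region from the start codon to the end of the linker. The linker should also translate to the intended peptide: the 24-nt linker listed above, for instance, translates to GGGSGGGS.

```python
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def clean(seq: str) -> str:
    """Uppercase and keep only A/C/G/T/N, e.g. to strip numbers and spaces from a GenBank ORIGIN block."""
    return "".join(ch for ch in seq.upper() if ch in "ACGTN")


def translate(seq: str) -> str:
    """Translate in frame 1; a trailing partial codon is ignored, unknown codons give 'X'."""
    seq = clean(seq)
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def stop_positions(cds: str) -> list:
    """1-based nucleotide positions of every in-frame stop codon in the given CDS."""
    return [3 * i + 1 for i, aa in enumerate(translate(cds)) if aa == "*"]


def find_element(seq: str, motif: str):
    """1-based start of the first exact match of motif in seq, or None if absent."""
    idx = clean(seq).find(clean(motif))
    return idx + 1 if idx >= 0 else None
```

For example, `find_element()` on the first ORIGIN line of Construct 1 locates the T7 promoter at position 11, matching the `promoter 11..27` feature.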

LOCUS       Nurr1_GFP_IVT_template       4156 bp    DNA     linear
DEFINITION  Nurr1-GFP fusion IVT template in pTwist Amp High Copy
ACCESSION   .
VERSION     .
KEYWORDS    Nurr1; GFP; IVT; mRNA; dopaminergic; PC12
SOURCE      synthetic construct
  ORGANISM  synthetic construct
FEATURES             Location/Qualifiers
     misc_feature    1..10
                     /label="5prime_buffer"
                     /note="AAAACCCAAA"
     promoter        11..27
                     /label="T7_promoter"
                     /note="TAATACGACTCACTATA"
     5'UTR           28..71
                     /label="5prime_UTR"
                     /note="Xenopus beta-globin derived, Kim et al. 2017"
     regulatory      72..80
                     /label="Kozak"
                     /note="GCCACCATG"
     CDS             78..1874
                     /label="Nurr1_CDS"
                     /note="rat NM_019328, stop codon removed"
                     /codon_start=1
     misc_feature    1875..1898
                     /label="GGGSGGGS_linker"
                     /note="GGCGGCGGCTCCGGCGGCGGCTCC"
     CDS             1899..2618
                     /label="EGFP"
                     /note="AAB02572.1, Cormack et al. 1996"
     misc_feature    2619..2621
                     /label="stop_codon"
                     /note="TAA"
     3'UTR           2622..2715
                     /label="3prime_UTR"
                     /note="double beta-globin, Kim et al. 2017"
     misc_feature    2716..2723
                     /label="3prime_buffer"
                     /note="GCGGCCGC NotI site"
ORIGIN
1 aaaacccaaa taatacgact cactataggg aaataagaga gaaaagaaga gtaagaagaa
61 atataagagc cgccaccatg cactcggctt ccagtatgct gggagccgtg aagatggaag
121 ggcacgagcc atccgactgg agcagctact acgcggagcc tgagggctac tcttccgtga
181 gcaacatgaa cgccagcctg gggatgaatg gcatgaacac ttacatgagc atgtccgcgg
241 ctgcaatggg cagtggttcc ggcaacatga gcgcaggctc catgaacatg tcatcctatg
301 tgggcgctgg aatgagcccg tcgctggctg gcatgtcccc gggcgcgggc gccatggcgg
361 gcatgagcgg ctcagctggg gcggccggcg tggcgggcat gggaccgcac ctgagtccga
421 gtctgagccc actcgggggc aggcggccgg ggctatgggt ggccttgctc cctacgccat
481 atgaactcca tgagtcctat gtacgggcag gcgggcctga gccgcgctcg ggaccccaag
541 acgtaccggc gcagctacac tcacgccaag cctccctact cgtacatctc gctcatcacc
601 atggccatcc agcagagccc caacaagatg ctgacgctga gcgagatcta tcagtggatc
661 atggacctct tccctttcta ccggcagaac cagcagcgct ggcagaactc catccgtcat
721 tctctctcct tcaacgactg ctttctcaag gtgccccgct cgccagacaa gcctggcaag
781 ggctccttct ggaccctgca ccctgactct ggcaacatgt tcgagaacgg ttgctacctg
841 cgccgccaga agcgcttcaa gtgtgagaag caactggcgt tgaaggaagc agcgggtgcg
901 ggcagtggcg gaggcaagaa gaccgctcct gggacacagg cttctcaggt tcagctcggg
961 gaggccgcag gctcggcctc tgagactccg gcgggcaccg agtcccccat tccagcgctt
1021 ctccgtgtca ggagcacaag cgaggtggcc tgagcgagct gaagggaaca cctgcctctg
1081 cgctgagtcc tccggagccg gcgccctcgc ctgggcagca gcagcaggct gcagcccacc
1141 tgctggtccc acctcaccat cctggcctgc caccagaggc ccacctgaag cccgagcacc
1201 attacgcctt caaccacccc ttctctatca acaacctcat gtcctccgag cagcaacatc
1261 atcacagcca ccaccaccat cagccccaca aaatggacct caagacctac gaacaggtca
1321 tgcactaccc tgggggctac ggttccccca tgccaggcag cttggccatg ggcccagtca
1381 cgaacaaagc cggcctggat gcctcgcccc tggctgcaga cacttcctac taccagggag
1441 tgtactccag gcctattatg aactcgtcct aaggcggcgg ctccggcggc ggctccatgg
1501 tctccaaagg tgaagaattg tttactggag ttgttccgat tctcgtggaa ctcgatggag
1561 atgtgaatgg gcataaattt tccgtcagcg gggaaggaga aggagacgca acatatggga
1621 aactcactct taaatttata tgtacaacag ggaaactccc ggttccgtgg ccaactcttg
1681 tcactactct cacatatggt gtccaatgtt ttagcaggta tcctgatcat atgaaacaac
1741 atgatttctt taaatcagca atgcctgagg gatatgtgca ggaaagaacc attttcttta
1801 aagacgatgg taattataaa acaagagctg aagtcaaatt tgagggagat acactcgtca
1861 atcggattga actcaaaggg attgacttta aagaagatgg gaatattctg ggccataaac
1921 ttgaatacaa ttataattca cataatgtgt acataatggc ggataagcaa aagaacggaa
1981 taaaagtcaa ctttaaaatt aggcataata tagaagatgg gagtgtacaa ctggcagatc
2041 attatcaaca aaatactcca attggcgatg ggccagtcct tttgcctgat aatcattatc
2101 tctcaactca aagcgctctt tcaaaggatc caaatgagaa acgagaccat atggtgttgc
2161 tcgaatttgt tactgctgct ggcattaccc ttgggatgga tgaattgtat aaataatgcc
2221 ttctgcgggg cttgccttct ggccatgccc ttcttctctc ccttgcacct gtacctcttg
2281 gtctttgaat aaagcctgag taggaagtga gggtgaagag cctgcacctc ggcgcggccgc
//
Construct 2: FoxA2-RFP IVT Template

Architecture:

5' buffer — T7 promoter — 5'UTR — Kozak — FoxA2 CDS — GGGSGGGS linker — mCherry — stop codon — 3'UTR — 3' buffer

Element sources:

  • FoxA2 CDS: rat NM_012743, positions 190–1569 (1380 nt), stop codon removed

  • mCherry: Shaner et al. (2004)

  • All other elements identical to Construct 1

LOCUS       FoxA2_RFP_IVT_template       3739 bp    DNA     linear
DEFINITION  FoxA2-mCherry fusion IVT template in pTwist Amp High Copy
ACCESSION   .
VERSION     .
KEYWORDS    FoxA2; mCherry; RFP; IVT; mRNA; dopaminergic; PC12
SOURCE      synthetic construct
  ORGANISM  synthetic construct
FEATURES             Location/Qualifiers
     misc_feature    1..10
                     /label="5prime_buffer"
     promoter        11..27
                     /label="T7_promoter"
     5'UTR           28..71
                     /label="5prime_UTR"
     regulatory      72..80
                     /label="Kozak"
     CDS             78..1457
                     /label="FoxA2_CDS"
                     /note="rat NM_012743, stop codon removed"
     misc_feature    1458..1481
                     /label="GGGSGGGS_linker"
     CDS             1482..2198
                     /label="mCherry"
                     /note="Shaner et al. 2004"
     misc_feature    2199..2201
                     /label="stop_codon"
                     /note="TAA"
     3'UTR           2202..2295
                     /label="3prime_UTR"
     misc_feature    2296..2303
                     /label="3prime_buffer"
                     /note="GCGGCCGC"
ORIGIN
1 aaaacccaaa taatacgact cactataggg aaataagaga gaaaagaaga gtaagaagaa
61 atataagagc cgccaccatg ctgggagcag tgaagatgga agggcacgag ccatccgact
121 ggttcagctc tggggaggcc gcaggctcgg cctctgagac tccggcgggc accgagtccc
181 cccattccag cgcttctccg tgtcaggagc acaagcgagg tggcctgagc gagctgaagg
241 gaacacctgc ctctgcgctg agtcctccgg agccggcgcc ctcgcctggg cagcagcagc
301 aggctgcagc ccacctgctg gtcccacctc accatcctgg cctgccacca gaggcccacc
361 tgaagcccga gcaccattac gccttcaacc accccttctc tatcaacaac ctcatgtcct
421 ccgagcagca acatcatcac agccaccacc accatcagcc ccacaaaatg gacctcaaga
481 cctacgaaca ggtcatgcac taccctgggg gctacggttc ccccatgcca ggcagcttgg
541 ccatgggccc agtcacgaac aaagccggcc tggatgcctc gcccctggct gcagacactt
601 cctactacca gggagtgtac tccaggccta ttatgaactc gtcctaaggc ggcggctccg
661 gcggcggctc catggtgagc aagggcgagg aggataacat ggccatcatc aaggagttca
721 tgcgcttcaa ggtgcacatg gagggctccg tgaacggcca cgagttcgag atcgagggcg
781 agggcgaggg ccgcccctac gagggcaccc agaccgccaa gctgaaggtg accaagggcg
841 gcccgctgcc cttcgcctgg gacatcctgt cccctcagtt catgtacggc tccaaggcct
901 acgtgaagca ccccgccgac atccccgact acttgaagct gtccttcccc gagggcttca
961 agtgggagcg cgtgatgaac ttcgaggacg gcggcgtggt gaccgtgacc caggactcct
1021 ccctgcagga cggcgagttc atctacaagg tgaagctgcg cggcaccaac ttcccctccg
1081 acggccccgt aatgcagaag aagaccatgg gctgggaggc ctccaccgag cggatgtacc
1141 ccgaggacgg cgccctgaag ggcgagatca agcagaggct gaagctgaag gacggcggcc
1201 actacgacgc cgaggtcaag accacctaca aggccaagaa gcccgtgcag ctgcccggcg
1261 cctacaacgt caacatcaag ctggacatca cctcccacaa cgaggactac accatcgtgg
1321 agcagtacga gcgcgccgag ggccgccact ccaccggcgg catggacgag ctgtacaagt
1381 aatgccttct gcggggcttg ccttctggcc atgcccttct tctctccctt gcacctgtac
1441 ctcttggtct ttgaataaag cctgagtagg aagtgagggt gaagagcctg cacctcggcg
1501 cggccgc
//

These constructs were ordered, codon-optimized, and validated through Twist Bioscience. Evidence of the validation is shown below:

[Figure: Twist Bioscience construct validation]

Custom labware for Opentrons OT-2

A 3D-printed custom microfluidic labware plate with inlets and outlets for the OT-2.

The design was first prototyped in Flui3D. [Figures: Flui3D, Flui3D2]

In its first iteration, the design was sized and tested to fit within a single-well plate.

The designs were then modeled in Blender, 3D-printed, and converted into an Opentrons labware JSON definition using Labware Creator. [Figures: Custom labware, Custom labware 2]
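Labware Creator produces a JSON file that follows the Opentrons labware definition schema (version 2). When iterating on port layouts, the same kind of file can be generated from code. This sketch makes several assumptions:

  • the chip is mounted in a holder with the standard SBS/ANSI microplate footprint (127.76 × 85.48 mm)
  • all ports are circular and lie in one row named A1, A2, ...
  • the coordinates in the usage example are placeholders

The generated file should still be checked by importing it into Labware Creator or the Opentrons App and calibrating before use.

```python
import json

# ANSI/SLAS standard microplate footprint (mm)
SBS_X_MM = 127.76
SBS_Y_MM = 85.48


def make_chip_definition(load_name, ports, port_diameter_mm, port_depth_mm,
                         port_volume_ul, height_mm):
    """Build a minimal Opentrons labware (schema v2) dict for a chip with circular ports.

    ports: mapping of well name -> (x_mm, y_mm) port centre, measured from the
    front-left corner of the footprint. Assumes a single row of ports (A1, A2, ...).
    """
    wells = {}
    for name, (x, y) in ports.items():
        if not (0 < x < SBS_X_MM and 0 < y < SBS_Y_MM):
            raise ValueError(f"port {name} lies outside the plate footprint")
        wells[name] = {
            "shape": "circular",
            "diameter": port_diameter_mm,
            "depth": port_depth_mm,
            "totalLiquidVolume": port_volume_ul,
            "x": x,
            "y": y,
            "z": height_mm - port_depth_mm,  # height of the port bottom
        }
    names = list(ports)
    return {
        "schemaVersion": 2,
        "version": 1,
        "namespace": "custom_beta",
        "metadata": {
            "displayName": load_name,
            "displayCategory": "wellPlate",
            "displayVolumeUnits": "µL",
            "tags": [],
        },
        "brand": {"brand": "Custom", "brandId": []},
        "parameters": {
            "format": "irregular",
            "quirks": [],
            "isTiprack": False,
            "isMagneticModuleCompatible": False,
            "loadName": load_name,
        },
        "dimensions": {"xDimension": SBS_X_MM, "yDimension": SBS_Y_MM, "zDimension": height_mm},
        "cornerOffsetFromSlot": {"x": 0, "y": 0, "z": 0},
        "ordering": [[n] for n in names],  # one column per port in a single row
        "wells": wells,
        "groups": [{"metadata": {"wellBottomShape": "flat"}, "wells": names}],
    }
```

For example, `make_chip_definition("organoid_chip_v1", {"A1": (20.0, 42.74), "A2": (107.76, 42.74)}, 3.0, 8.0, 50.0, 14.0)` describes a two-port chip. Writing it with `json.dump(..., ensure_ascii=False)` keeps the `µL` unit readable. A protocol can then load the dictionary with `protocol.load_labware_from_definition(definition, slot)`.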

Opentrons OT-2 Deck Layout Design

Experimental Protocol and Automated Workflows

Below is a simplified version of the experimental protocol; a more detailed version is in the appendix.

Step 1 — Sequence design and SecureDNA screening

Method: Sequences designed in Benchling. All sequences submitted to SecureDNA for screening prior to ordering.

Automation: Manual — Benchling platform

Plate: N/A

Expected result: Sequences cleared — no sequences of concern identified

Timeline: Day 1 — 2 hours


Step 2 — Twist Bioscience DNA order

Method: Insert sequences submitted to Twist Bioscience as clonal gene orders in pTwist Amp High Copy. Two constructs ordered: Nurr1-GFP IVT template and FoxA2-RFP IVT template.

Automation: Twist online ordering portal

Plate: N/A (Twist ships as glycerol stock or resuspended plasmid)

Expected result: Plasmid delivery in 7-10 business days

Timeline: Day 1 — 30 minutes submission


Step 3 — Cell-free mRNA expression validation

Method: Upon receipt of Twist plasmids, validate construct expression using cell-free protein synthesis (CFPS) before committing to full IVT and PC12 transfection. Plasmid transcribed in vitro using HiScribe T7 kit. Resulting mRNA added to cell-free expression system. GFP and mCherry fluorescence measured to confirm both fusion proteins are produced.

Automation: Echo525 — acoustic liquid transfer of CFPS reagents to 384 Greiner black-well clear-bottom plate; PHERAstar FSX — fluorescence detection (GFP: ex 488/em 520; mCherry: ex 555/em 610)

Plate: 384 Greiner black-well clear-bottom

Expected result: Fluorescence signal above background in construct wells; no signal in no-template controls

Timeline: Day 10-11 (upon Twist delivery)


Step 4 — Bacterial transformation and plasmid amplification

Method: Both Twist plasmids transformed into NEB 10-beta competent E. coli. Overnight culture on LB + ampicillin (100 µg/ml) plates. 3-6 colonies picked per construct for overnight liquid culture.

Automation: Manual transformation; Cytomat shaking incubator for overnight bacterial growth

Plate: N/A (standard Petri dishes)

Expected result: 10-100 colonies per construct

Timeline: Day 10-11


Step 5 — Miniprep and sequencing verification

Method: GeneJET Miniprep Kit. Elute in 50 µl elution buffer. Quantify by Nanodrop. Send one sample per construct for Sanger sequencing. Verify T7 promoter, 5’UTR, Kozak, CDS, linker junction, fluorescent protein, stop codon, and 3’UTR all intact.

Automation: HiG Centrifuge for miniprep spins

Plate: N/A

Expected result: >50 ng/µl, A260/280 ratio 1.8-2.0. Sequencing confirms all junctions correct.

Timeline: Day 12-13


Step 6 — Template linearisation and IVT

Method: Plasmid linearised by KpnI digestion (1 hour, 37°C). Linearisation verified on 1% agarose gel. Purified by GeneJET PCR purification. IVT performed using HiScribe T7 High Yield RNA Synthesis Kit (2 hours, 37°C). TURBO DNase added to remove template (15 min). Poly-A tail added enzymatically using E. coli Poly-A Polymerase (NEB M0276, 45 min). mRNA purified by RNeasy Mini Kit. Quantified by Nanodrop. Quality verified on agarose gel.

Automation: Inheco Plate Incubator for IVT reaction; HiG Centrifuge for RNeasy spins

Plate: N/A (microcentrifuge tubes — RNase-free)

Expected result: Single clean mRNA band on gel. Yield 50-100 µg per reaction. A260/280 ~2.0.

Timeline: Day 14 — full day
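The Nanodrop QC thresholds in Steps 5 and 6 can be logged and checked alongside the run. The sketch below uses the conventional A260 conversion factors: 50 µg/ml per A260 unit for double-stranded DNA and 40 µg/ml for RNA. It also estimates how many 250 ng single-construct doses (Step 10) one purified mRNA prep can supply. The function names are illustrative. Nanodrop software already reports concentration directly, so this is mainly for planning and record-keeping.

```python
# Conventional A260 conversion factors (µg/ml per A260 unit, 10 mm path).
A260_FACTOR = {"dsDNA": 50.0, "RNA": 40.0}


def a260_qc(a260, a280, kind, elution_ul, dilution=1.0):
    """Return concentration (ng/µl), A260/280 ratio and total yield (µg)."""
    conc_ng_per_ul = a260 * A260_FACTOR[kind] * dilution  # µg/ml equals ng/µl
    return {
        "conc_ng_per_ul": conc_ng_per_ul,
        "a260_280": a260 / a280,
        "yield_ug": conc_ng_per_ul * elution_ul / 1000.0,
    }


def miniprep_passes(qc, min_conc=50.0, ratio_range=(1.8, 2.0)):
    """Step 5 acceptance criteria: >50 ng/µl and A260/280 between 1.8 and 2.0."""
    lo, hi = ratio_range
    return qc["conc_ng_per_ul"] > min_conc and lo <= qc["a260_280"] <= hi


def transfections_supported(qc, ng_per_well=250.0):
    """Number of single-construct well doses (Step 10 uses 250 ng) a prep can supply."""
    total_ng = qc["yield_ug"] * 1000.0
    return int(total_ng // ng_per_well)
```

For example, consider an mRNA prep reading A260 = 1.5 and A280 = 0.75 in 50 µl. It gives 60 ng/µl, an A260/280 of 2.0, and 3.0 µg in total, which is enough for 12 single-construct doses.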


Step 7 — PC12 cell thawing and expansion

Method: PC12 CRL-1721 (suspension variant) thawed from liquid nitrogen into warm growth medium (DMEM + 10% horse serum + 5% FBS + 1% P/S). Expanded in T25 flask. Passage 15-25 cells used throughout.

Automation: Manual — laminar flow hood and CO2 incubator

Plate: T25 flask

Expected result: Cells attach as clusters; healthy morphology by Day 2

Timeline: Day 1 (parallel with DNA work)


Step 8 — Plate coating (PDL + Collagen Type I)

Method: 6-well plates coated sequentially. PDL (50 µg/ml) applied for 1 hour at RT, aspirated, dried. Collagen Type I (10 µg/ml in PBS) applied overnight at 4°C. Residual collagen left on surface — not washed.

Automation: Opentrons OT-2 with P300 single channel — PDL and collagen dispense steps

Plate: Corning 6-well flat bottom plate

Expected result: Uniform coating visible; cells will adhere and extend neurites

Timeline: Day 7-8


Step 9 — PC12 cell plating and NGF differentiation

Method: Cell clusters broken up by passing through syringe needles (18G x2, 22G x3, 25G x3). Counted by haemocytometer. Plated at 50,000 cells/well in growth medium. After 24h, switched to differentiation medium (DMEM + 1% horse serum + 1% P/S + 100 ng/ml rat NGF). Media changed every 48h with fresh NGF for 14 days.

Automation: Opentrons OT-2 — media change steps (aspirate 1ml, dispense 1ml fresh NGF medium per well, 6 wells per run, every 48h)

Plate: Corning 6-well flat bottom plate

Expected result: Neurites visible by Day 10-11; mature network by Day 14

Timeline: Days 8-22
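The haemocytometer count in Step 9 converts to a seeding volume with simple arithmetic. Each large corner square of a standard haemocytometer holds 0.1 µl. Cells per ml is therefore the mean count per square multiplied by the dilution factor and by 10^4. Only the 50,000 cells/well target comes from the protocol. The sketch assumes:

  • a 1:1 trypan blue dilution
  • 2 ml of medium per well of the 6-well plate
  • 10% pipetting overage

```python
def cells_per_ml(square_counts, dilution_factor=2.0):
    """Haemocytometer estimate; each large square holds 0.1 µl, hence the 1e4 factor.

    dilution_factor=2 assumes cells were mixed 1:1 with trypan blue.
    """
    if not square_counts:
        raise ValueError("need at least one square count")
    mean = sum(square_counts) / len(square_counts)
    return mean * dilution_factor * 1e4


def seeding_plan(conc_per_ml, cells_per_well=50_000, n_wells=6,
                 well_volume_ml=2.0, overage=0.1):
    """Volumes of cell suspension and growth medium for plating, with pipetting overage."""
    scale = n_wells * (1 + overage)
    suspension_ml = cells_per_well * scale / conc_per_ml
    total_ml = well_volume_ml * scale
    if suspension_ml > total_ml:
        raise ValueError("cell suspension too dilute for the chosen well volume")
    return {"suspension_ml": suspension_ml, "medium_ml": total_ml - suspension_ml}
```

For example, counts of 50, 46, 54 and 50 give 1.0 × 10^6 cells/ml. Six wells with 10% overage then need 0.33 ml of cell suspension made up with 12.87 ml of growth medium.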


Step 10 — Daily mRNA transfection (Days 15-21)

Method: Per well: Tube A — 25 µl Opti-MEM + 500 ng mRNA total (250 ng Nurr1-GFP + 250 ng FoxA2-RFP for experimental wells). Tube B — 25 µl Opti-MEM + 1.5 µl MessengerMAX. Combine, incubate 5 min. Remove half media from well. Add complex dropwise. Rock plate. Incubate 4 hours. Replace with fresh NGF differentiation medium. Repeat daily for 7 days.

Automation: Opentrons OT-2 — complex addition, half-media removal, fresh medium addition

Plate: Corning 6-well flat bottom

Expected result: GFP and RFP nuclear fluorescence visible from Day 16. Yellow overlap cells (both TFs) appear by Day 17-18.

Timeline: Days 15-21 daily
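Preparing Step 10 as a master mix avoids pipetting sub-microlitre volumes into each well every day. For example, 250 ng from a 500 ng/µl stock is only 0.5 µl. The sketch scales the per-well amounts given above (25 µl Opti-MEM per tube, 250 ng of each mRNA, 1.5 µl MessengerMAX) to a chosen number of wells. It also estimates the total mRNA each construct needs across the seven daily transfections. The 10% overage and the example stock concentrations are assumptions.

```python
def messengermax_mix(n_wells, stock_ng_per_ul, mrna_ng_per_construct=250.0,
                     optimem_ul=25.0, reagent_ul=1.5, overage=0.1):
    """Master-mix volumes (µl) for Step 10, scaled to n_wells plus overage.

    stock_ng_per_ul: dict of construct name -> mRNA stock concentration (ng/µl).
    """
    scale = n_wells * (1 + overage)
    tube_a = {"Opti-MEM": optimem_ul * scale}
    for construct, stock in stock_ng_per_ul.items():
        tube_a[construct] = mrna_ng_per_construct / stock * scale
    tube_b = {"Opti-MEM": optimem_ul * scale, "MessengerMAX": reagent_ul * scale}
    return tube_a, tube_b


def mrna_needed_ng(n_wells, days=7, mrna_ng_per_construct=250.0):
    """Total mRNA (ng) per construct for daily transfection over the course, without overage."""
    return n_wells * days * mrna_ng_per_construct
```

Take three experimental wells and 500 ng/µl stocks as an example:

  • tube A takes 82.5 µl Opti-MEM and 1.65 µl of each mRNA
  • tube B takes 82.5 µl Opti-MEM and 4.95 µl MessengerMAX
  • each construct needs 5,250 ng over seven days, before overage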


Step 11 — Live fluorescence imaging

Method: Cells imaged at 10x and 20x on fluorescence microscope. GFP channel (ex 488/em 520) for Nurr1-GFP. RFP channel (ex 555/em 610) for FoxA2-RFP. Merged image assessed for yellow double-positive cells. GFP and RFP signal in nucleus confirms correct transcription factor localisation.

Automation: Spark Plate Reader — fluorescence quantification in plate format (if 96-well plate format used); manual fluorescence microscope for imaging

Plate: 96-round-axygen-pdw11cs-halfdeep for Spark quantification of conditioned media samples

Expected result: 20-40% cells GFP+, 20-40% RFP+, 10-20% yellow double-positive

Timeline: Days 16, 18, 20
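Once individual cells have been segmented, the expected GFP+, RFP+ and double-positive percentages can be computed from per-cell intensities. This sketch assumes segmentation has already been done elsewhere, for example in Fiji or CellProfiler. It uses a mean + 3 SD cut-off from untransfected control cells. That cut-off is a common convention, used here as an assumption rather than as a requirement of the protocol.

```python
import statistics


def threshold_from_control(control_values, k=3.0):
    """Positivity cut-off: mean + k standard deviations of untransfected-control cells."""
    return statistics.mean(control_values) + k * statistics.stdev(control_values)


def positive_fractions(gfp, rfp, gfp_cut, rfp_cut):
    """Fractions of cells that are GFP+, RFP+ and double-positive from per-cell intensities."""
    if len(gfp) != len(rfp) or not gfp:
        raise ValueError("need matched, non-empty per-cell intensity lists")
    g = [v >= gfp_cut for v in gfp]
    r = [v >= rfp_cut for v in rfp]
    n = len(gfp)
    return {
        "GFP+": sum(g) / n,
        "RFP+": sum(r) / n,
        "double+": sum(a and b for a, b in zip(g, r)) / n,
    }
```

Comparing the returned fractions against the expected 20-40% single-positive and 10-20% double-positive ranges gives a quick, reproducible check for each imaging day.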


Step 12 — TH immunostaining (Day 20)

Method: Cells fixed with 4% PFA (15 min RT). Washed 3x PBS. Permeabilised with 0.25% Triton X-100 (10 min). Blocked with 3% BSA (1 hour). Anti-TH primary antibody (1:500, overnight 4°C). Secondary antibody Alexa Fluor 647 (1:500, 1 hour RT). DAPI (1:1000, 5 min). Imaged on fluorescence microscope. Far-red channel (Alexa 647) does not overlap with GFP or mCherry.

Automation: Manual immunostaining

Plate: Corning 6-well

Expected result: Nurr1-GFP + FoxA2-RFP co-transfected wells show 3-5x higher TH fluorescence than single-TF or negative controls

Timeline: Day 20-21


Step 13 — Dopamine ELISA (Days 18 and 21)

Method: Replace medium with serum-free DMEM 2 hours before collection. Add 56 mM KCl to stimulate dopamine release (15 min). Collect 200 µl conditioned medium per well into pre-chilled tubes. Centrifuge 300 x g for 5 min. Transfer supernatant. Run high-sensitivity dopamine ELISA (Eagle Biosciences) following kit protocol. Read absorbance at 450 nm.

Automation: Opentrons OT-2 — media collection and transfer to 96-well ELISA plate; Spark Plate Reader — absorbance at 450 nm; Plateloc — seal ELISA plate during incubation steps

Plate: 96-round-axygen-pdw11cs-halfdeep (sample collection); 96-Armadillo-PCR-AB2396X (ELISA)

Expected result: Nurr1-GFP + FoxA2-RFP wells show 3-5x dopamine above baseline. Target: 5-50 nM.

Timeline: Days 18 and 21
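ELISA absorbance has to be converted to a concentration before it can act as the RL state. Small-molecule kits such as dopamine ELISAs are typically competitive, so absorbance falls as concentration rises. The log-linear interpolation below works in either direction as long as the standard curve is monotonic.

This is a simpler stand-in for the curve fit recommended in the kit instructions (often a four-parameter logistic). The standards in the usage example are invented. `REWARD_THRESHOLD_NM` in the protocol code is in nM, so a reading reported in ng/ml also needs converting using dopamine's molar mass (153.18 g/mol).

```python
import math

DOPAMINE_MW_G_PER_MOL = 153.18  # dopamine free base, C8H11NO2


def ng_per_ml_to_nm(ng_per_ml):
    """1 ng/ml = 1e-6 g/l; divide by molar mass and scale mol/l to nM."""
    return ng_per_ml / DOPAMINE_MW_G_PER_MOL * 1000.0


def interpolate_concentration(od, standards):
    """Log-linear interpolation of a sample OD on a monotonic standard curve.

    standards: list of (concentration, OD) pairs with concentration > 0.
    Returns None if the OD falls outside the standards (dilute and re-run).
    """
    pts = sorted(standards, key=lambda p: p[1])  # order by OD
    for (c1, od1), (c2, od2) in zip(pts, pts[1:]):
        if od1 <= od <= od2:
            if od2 == od1:
                return c1
            frac = (od - od1) / (od2 - od1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None
```

Suppose the standards are `[(0.5, 2.0), (5.0, 1.0), (50.0, 0.2)]`, given as (ng/ml, OD). An OD of 1.0 then maps back to 5.0 ng/ml, which `ng_per_ml_to_nm` converts to about 32.6 nM.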


Step 14 — Closed-loop Opentrons RL experiment (Day 21)

Method: Python script on Opentrons OT-2. Baseline ELISA read → deliver dopamine stimulus (1 µM start) → 15 min incubation → collect media sample → ELISA read → Python threshold decision → escalate to 5 µM if above threshold or maintain. Repeat 3-5 cycles.

Automation: Opentrons OT-2 — all liquid handling; Spark Plate Reader — ELISA absorbance reads between cycles

Plate: Corning 6-well (cells); 96-Armadillo-PCR-AB2396X (ELISA reads)

Expected result: Measurable dopamine concentration change with each stimulus cycle. Loop runs without error. State signal varies with stimulus.

Timeline: Day 21 — 6-8 hour session

Example code for automated RL chemical delivery and signal readout

The example code aims to deliver:

  • Baseline media sampling for ELISA
  • Dopamine stimulus delivery
  • Post-stimulus media sampling
  • Fresh media replenishment between cycles

The Python RL agent reads ELISA dopamine concentrations, either entered manually between cycles or via a plate reader API if one is available. It then decides the next stimulus concentration using threshold logic (Aim 1) or Q-learning (an extension under Aim 2).
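One possible shape for the Q-learning extension is a table of values over binned dopamine readings (states) and the three stimulus levels (actions). The agent chooses actions epsilon-greedily and updates the table with the standard one-step Q-learning rule. This is a minimal sketch. The bin edges, learning rate, discount, and exploration settings are placeholders. The reward reuses the +1.0 / -0.5 threshold scheme of `ThresholdRLAgent`.

```python
import bisect
import random


class QLearningAgent:
    """Tabular Q-learning over binned dopamine readings (hypothetical Aim 2 extension)."""

    ACTIONS = ("low", "medium", "high")  # same stimulus levels as ThresholdRLAgent

    def __init__(self, bin_edges_nm, alpha=0.5, gamma=0.9, epsilon=0.1, seed=None):
        self.bin_edges_nm = sorted(bin_edges_nm)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        n_states = len(self.bin_edges_nm) + 1
        self.q = [[0.0] * len(self.ACTIONS) for _ in range(n_states)]

    def state(self, dopamine_nm):
        """Map a dopamine concentration (nM) to a discrete state index."""
        return bisect.bisect_right(self.bin_edges_nm, dopamine_nm)

    def choose(self, state):
        """Epsilon-greedy choice of an action index."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.ACTIONS))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        """One-step update: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def threshold_reward(dopamine_nm, threshold_nm):
    """Same reward scheme as ThresholdRLAgent.decide: +1.0 at/above threshold, else -0.5."""
    return 1.0 if dopamine_nm >= threshold_nm else -0.5
```

Each cycle would proceed as follows:

  • map the ELISA reading to a state with `state()`
  • deliver `ACTIONS[choose(state)]`
  • call `update()` once the next reading arrives

With only 3-5 cycles per session (Step 14), the table would likely need to persist across sessions before its values become informative.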

The example protocol may be adapted for custom labware in the future. Currently, it uses the following commercially available plates and tip racks:

Slot 1: NEST 12-well reservoir (reagents)
  • A1: Dopamine 1 µM (in DMEM + 0.1 mg/ml ascorbic acid)
  • A2: Dopamine 5 µM (in DMEM + 0.1 mg/ml ascorbic acid)
  • A3: Dopamine 10 µM (in DMEM + 0.1 mg/ml ascorbic acid)
  • A4: PBS wash buffer
  • A5: Fresh differentiation medium + NGF (100 ng/ml)
  • A6: Waste

Slot 2: Corning 6-well plate (PC12 cells)
  • Wells A1-A3: experimental wells (Nurr1-GFP + FoxA2-RFP)
  • Wells B1-B3: control wells (NGF only / GFP only / RFP only)

Slot 3: 96-well flat bottom plate (ELISA sample collection)
  • Row A: Cycle 0 baseline samples (wells A1-A6, one per cell-plate well)
  • Row B: Cycle 1 post-stimulus samples
  • Row C: Cycle 2 post-stimulus samples
  • Row D: Cycle 3 post-stimulus samples
  • Row E: Cycle 4 post-stimulus samples
  • Row F: Cycle 5 post-stimulus samples

Slot 4: Opentrons 96 tip rack (300 µl)

Slot 5: Opentrons 96 tip rack (300 µl), second rack for long sessions

from opentrons import protocol_api
import time

# ============================================================
# METADATA
# ============================================================

metadata = {
    'apiLevel': '2.13',
    'protocolName': 'Organoid-lab-on-a-chip:RL-Loop',
    'author': 'Jenn Leung',
    'description': (
        'Closed-loop dopamine reinforcement learning in NGF-differentiated '
        'PC12 cells transfected with Nurr1-GFP and FoxA2-RFP mRNA. '
        'Opentrons delivers dopamine stimuli and collects media samples. '
        'ELISA reads dopamine concentration as the RL state signal.'
    )
}

# ============================================================
# EXPERIMENTAL PARAMETERS — edit these before each run
# ============================================================

# Number of RL cycles to run (each cycle = stimulus delivery + wait + sample)
N_CYCLES = 5

# Incubation time after each dopamine stimulus (minutes)
INCUBATION_MIN = 15

# Volume to sample for ELISA per well (µl)
SAMPLE_VOLUME_UL = 150

# Volume of dopamine stimulus to deliver per well (µl)
STIMULUS_VOLUME_UL = 100

# Volume of fresh media to add after each cycle (µl)
MEDIA_REPLENISH_UL = 500

# Dopamine concentration threshold for reward decision (nM)
# Set this based on your Day 18 baseline ELISA reading
REWARD_THRESHOLD_NM = 15.0

# Starting stimulus — 'low', 'medium', or 'high'
INITIAL_STIMULUS = 'low'

# Wells to treat as experimental (Nurr1-GFP + FoxA2-RFP)
EXPERIMENTAL_WELLS = ['A1', 'A2', 'A3']

# Wells to treat as controls
CONTROL_WELLS = ['B1', 'B2', 'B3']

# All wells combined
ALL_WELLS = EXPERIMENTAL_WELLS + CONTROL_WELLS

# ============================================================
# RL AGENT — Threshold Logic (Aim 1)
# ============================================================

class ThresholdRLAgent:
    """
    Simple threshold-based RL agent for Aim 1.

    Logic:
        - If dopamine above threshold → reward (escalate to next stimulus level)
        - If dopamine below threshold → withhold (maintain or reduce stimulus)

    State:  dopamine concentration (nM) from ELISA
    Action: stimulus concentration to deliver next cycle
    """

    def __init__(self, threshold_nm, initial_stimulus='low'):
        self.threshold_nm = threshold_nm
        self.stimulus_levels = {
            'none': 0,
            'low':  1,     # 1 µM dopamine
            'medium': 5,   # 5 µM dopamine
            'high': 10     # 10 µM dopamine
        }
        self.current_stimulus = initial_stimulus
        self.history = []

    def decide(self, dopamine_nm):
        """
        Given current dopamine reading, decide next stimulus.

        Args:
            dopamine_nm (float): dopamine concentration in nM from ELISA

        Returns:
            tuple: (next_stimulus, reward), where next_stimulus is 'low',
                'medium' or 'high' and reward is +1.0 if dopamine_nm is at or
                above the threshold, otherwise -0.5
        """
        reward = 0.0

        if dopamine_nm >= self.threshold_nm:
            reward = 1.0
            # Escalate to next level if not already at max
            if self.current_stimulus == 'low':
                next_stimulus = 'medium'
            elif self.current_stimulus == 'medium':
                next_stimulus = 'high'
            else:
                next_stimulus = 'high'  # Already at max — maintain
        else:
            reward = -0.5
            # Reduce or maintain
            if self.current_stimulus == 'high':
                next_stimulus = 'medium'
            elif self.current_stimulus == 'medium':
                next_stimulus = 'low'
            else:
                next_stimulus = 'low'

        self.history.append({
            'dopamine_nm': dopamine_nm,
            'stimulus': self.current_stimulus,
            'reward': reward,
            'next_stimulus': next_stimulus
        })

        self.current_stimulus = next_stimulus
        return next_stimulus, reward

    def get_stimulus_reservoir(self, stimulus_level):
        """Return reservoir well for given stimulus level."""
        mapping = {
            'low':    'A1',   # 1 µM
            'medium': 'A2',   # 5 µM
            'high':   'A3'    # 10 µM
        }
        return mapping.get(stimulus_level, 'A1')

    def print_summary(self):
        """Print RL session summary."""
        print("\n" + "="*60)
        print("RL SESSION SUMMARY")
        print("="*60)
        print(f"Threshold: {self.threshold_nm} nM")
        print(f"Cycles completed: {len(self.history)}")
        print(f"\nCycle-by-cycle results:")
        print(f"{'Cycle':>6} {'Dopamine (nM)':>14} {'Stimulus':>10} {'Reward':>8} {'Next':>10}")
        print("-"*55)
        for i, h in enumerate(self.history):
            print(
                f"{i+1:>6} "
                f"{h['dopamine_nm']:>14.1f} "
                f"{h['stimulus']:>10} "
                f"{h['reward']:>8.1f} "
                f"{h['next_stimulus']:>10}"
            )
        total_reward = sum(h['reward'] for h in self.history)
        print(f"\nTotal reward: {total_reward:.1f}")
        print("="*60)


# ============================================================
# RL AGENT — Q-Learning (Aim 2 extension)
# ============================================================

class QLearningRLAgent:
    """
    Q-learning agent for Aim 2 — more sophisticated RL.

    State space:  dopamine concentration binned into discrete levels
    Action space: stimulus concentrations (none, low, medium, high)

    Q-table updated after each cycle using Bellman equation.
    Epsilon-greedy exploration allows agent to occasionally try
    non-optimal actions to discover better strategies.
    """

    def __init__(self, threshold_nm, alpha=0.1, gamma=0.9, epsilon=0.2):
        """
        Args:
            threshold_nm:  reward threshold in nM
            alpha:         learning rate (0-1) — how fast Q-values update
            gamma:         discount factor (0-1) — how much future rewards matter
            epsilon:       exploration rate (0-1) — random action probability
        """
        import numpy as np
        self.np = np
        self.threshold_nm = threshold_nm
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon

        # State bins — dopamine concentration ranges (nM)
        self.state_bins = [0, 5, 10, 20, 40, 80, float('inf')]
        self.n_states = len(self.state_bins) - 1

        # Actions
        self.actions = ['none', 'low', 'medium', 'high']
        self.n_actions = len(self.actions)

        # Q-table initialised to zero
        self.q_table = np.zeros((self.n_states, self.n_actions))

        self.history = []
        self.current_state = 0

    def get_state(self, dopamine_nm):
        """Bin continuous dopamine reading into discrete state."""
        for i in range(len(self.state_bins) - 1):
            if self.state_bins[i] <= dopamine_nm < self.state_bins[i + 1]:
                return i
        return self.n_states - 1

    def get_reward(self, dopamine_before, dopamine_after):
        """
        Reward based on change in dopamine release.

        Positive reward for increase above threshold.
        Negative reward for decrease.
        """
        delta = dopamine_after - dopamine_before
        if dopamine_after >= self.threshold_nm and delta > 0:
            return 1.0
        elif delta > 5:
            return 0.5
        elif delta < -5:
            return -0.5
        else:
            return 0.0

    def choose_action(self, state):
        """Epsilon-greedy action selection."""
        import random
        if random.random() < self.epsilon:
            return random.randint(0, self.n_actions - 1)
        return int(self.np.argmax(self.q_table[state]))

    def update(self, state, action_idx, reward, next_state):
        """Q-learning update rule (Bellman equation)."""
        current_q = self.q_table[state, action_idx]
        max_next_q = self.np.max(self.q_table[next_state])
        new_q = current_q + self.alpha * (
            reward + self.gamma * max_next_q - current_q
        )
        self.q_table[state, action_idx] = new_q

    def decide(self, dopamine_before, dopamine_after):
        """Full Q-learning update and next action decision."""
        state = self.get_state(dopamine_before)
        next_state = self.get_state(dopamine_after)
        reward = self.get_reward(dopamine_before, dopamine_after)

        # Get action that was taken (stored from previous cycle)
        action_idx = getattr(self, '_last_action_idx', 1)

        # Update Q-table
        self.update(state, action_idx, reward, next_state)

        # Choose next action
        next_action_idx = self.choose_action(next_state)
        self._last_action_idx = next_action_idx
        next_action = self.actions[next_action_idx]

        self.history.append({
            'dopamine_before': dopamine_before,
            'dopamine_after': dopamine_after,
            'state': state,
            'next_state': next_state,
            'reward': reward,
            'next_action': next_action
        })

        return next_action, reward

    def get_stimulus_reservoir(self, stimulus_level):
        mapping = {
            'none':   None,
            'low':    'A1',
            'medium': 'A2',
            'high':   'A3'
        }
        return mapping.get(stimulus_level)

    def print_q_table(self):
        """Print current Q-table values."""
        print("\nQ-TABLE (rows=states, cols=actions)")
        print(f"{'State (nM)':>15}", end="")
        for a in self.actions:
            print(f"{a:>10}", end="")
        print()
        state_labels = ['0-5', '5-10', '10-20', '20-40', '40-80', '80+']
        for i, label in enumerate(state_labels):
            print(f"{label:>15}", end="")
            for j in range(self.n_actions):
                print(f"{self.q_table[i,j]:>10.3f}", end="")
            print()


# ============================================================
# MAIN PROTOCOL
# ============================================================

def run(protocol: protocol_api.ProtocolContext):

    # ----------------------------------------------------------
    # LABWARE SETUP
    # ----------------------------------------------------------

    # Tip racks
    tiprack_1 = protocol.load_labware(
        'opentrons_96_tiprack_300ul', 4,
        label='Tip Rack 1'
    )
    tiprack_2 = protocol.load_labware(
        'opentrons_96_tiprack_300ul', 5,
        label='Tip Rack 2'
    )

    # Reagent reservoir (NEST 12-well)
    reservoir = protocol.load_labware(
        'nest_12_reservoir_15ml', 1,
        label='Reagent Reservoir'
    )

    # PC12 cell plate (6-well)
    cell_plate = protocol.load_labware(
        'corning_6_wellplate_16.8ml_flat', 2,
        label='PC12 Cell Plate'
    )

    # ELISA sample collection plate (96-well flat)
    elisa_plate = protocol.load_labware(
        'corning_96_well_plate_360ul_flat', 3,
        label='ELISA Sample Plate'
    )

    # ----------------------------------------------------------
    # REAGENT POSITIONS
    # ----------------------------------------------------------

    dopamine_1uM  = reservoir['A1']   # Low stimulus
    dopamine_5uM  = reservoir['A2']   # Medium stimulus
    dopamine_10uM = reservoir['A3']   # High stimulus
    pbs_wash      = reservoir['A4']   # PBS wash
    fresh_medium  = reservoir['A5']   # Fresh NGF differentiation medium
    waste         = reservoir['A6']   # Liquid waste

    stimulus_wells = {
        'low':    dopamine_1uM,
        'medium': dopamine_5uM,
        'high':   dopamine_10uM
    }

    # ----------------------------------------------------------
    # PIPETTE SETUP
    # ----------------------------------------------------------

    p300 = protocol.load_instrument(
        'p300_single_gen2',
        mount='right',
        tip_racks=[tiprack_1, tiprack_2]
    )

    # ----------------------------------------------------------
    # ELISA PLATE MAP
    # Row A = baseline (cycle 0)
    # Row B = cycle 1
    # Row C = cycle 2 ... etc.
    # Columns 1-3 = experimental wells A1,A2,A3
    # Columns 4-6 = control wells B1,B2,B3
    # ----------------------------------------------------------

    elisa_row_letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

    def get_elisa_well(cycle, well_index):
        """
        Get ELISA plate well for a given cycle and cell plate well.
        cycle 0 = baseline, cycle 1-5 = post-stimulus
        well_index 0-5 corresponds to A1,A2,A3,B1,B2,B3
        """
        row = elisa_row_letters[cycle]
        col = well_index + 1
        return elisa_plate[f'{row}{col}']

    # ----------------------------------------------------------
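    # RL AGENT INITIALISATION
    # Aim 1 uses ThresholdRLAgent. For Aim 2, swap in QLearningRLAgent,
    # change agent.decide(dopamine_after) in the RL loop to
    # agent.decide(dopamine_before, dopamine_after), and replace
    # agent.print_summary() with agent.print_q_table().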
    # ----------------------------------------------------------

    agent = ThresholdRLAgent(
        threshold_nm=REWARD_THRESHOLD_NM,
        initial_stimulus=INITIAL_STIMULUS
    )

    # ----------------------------------------------------------
    # HELPER FUNCTIONS
    # ----------------------------------------------------------

    def collect_sample(cell_well, elisa_well, volume_ul):
        """
        Collect conditioned medium sample from cell well
        and transfer to ELISA plate.
        Always uses a fresh tip.
        """
        p300.pick_up_tip()
        p300.aspirate(volume_ul, cell_plate[cell_well].bottom(z=2))
        p300.dispense(volume_ul, elisa_well)
        p300.blow_out(elisa_well)
        p300.drop_tip()

    def deliver_stimulus(cell_well, stimulus_level, volume_ul):
        """
        Deliver dopamine stimulus to cell well.
        Always uses a fresh tip.
        """
        if stimulus_level == 'none':
            protocol.comment(f'  Withholding stimulus for {cell_well}')
            return

        source = stimulus_wells[stimulus_level]
        p300.pick_up_tip()
        p300.aspirate(volume_ul, source)
        p300.dispense(volume_ul, cell_plate[cell_well].bottom(z=3))
        # Gentle mix — 3 times at 80 µl to avoid disturbing cells
        p300.mix(3, 80, cell_plate[cell_well].bottom(z=3))
        p300.blow_out(cell_plate[cell_well].top(z=-5))
        p300.drop_tip()

    def replenish_medium(cell_well, volume_ul):
        """
        Add fresh differentiation medium + NGF to well.
        Uses a single fresh tip. transfer() splits volumes larger than
        the P300's 300 µl maximum (e.g. MEDIA_REPLENISH_UL = 500) into
        several aspirate/dispense steps; a single aspirate() would fail.
        """
        p300.transfer(
            volume_ul,
            fresh_medium,
            # Dispense 5 mm above the well bottom rather than directly
            # onto the cell layer, to limit disturbance of cells/neurites
            cell_plate[cell_well].bottom(z=5),
            new_tip='once'
        )

    def get_dopamine_reading():
        """
        Pause the protocol so the ELISA can be run on the collected samples.
        An Opentrons pause cannot capture typed input, so this currently
        returns a placeholder. In Aim 2 it would be replaced by a direct
        plate reader API call.

        Returns:
            float: dopamine concentration in nM
        """
        protocol.pause(
            'ELISA READING REQUIRED\n'
            'Run ELISA on collected samples now.\n'
            'Note dopamine concentration (nM) for experimental wells.\n'
            'Resume protocol when ready.'
        )
        # In a real run, you would either:
        # (a) manually note the reading and resume (the value is not
        #     passed back into the protocol), or
        # (b) integrate with the Spark Plate Reader to read automatically.
        # Sketch of (b). SparkReader and calculate_from_standard_curve are
        # hypothetical placeholders, not an existing API:
        # from spark_api import SparkReader
        # reader = SparkReader()
        # reading = reader.read_absorbance(plate=elisa_plate, wavelength=450)
        # dopamine_nm = calculate_from_standard_curve(reading)
        # return dopamine_nm
        # With this placeholder the agent always sees 0 nM and steps down.
        return 0.0  # Placeholder: replace with actual reading

    # ----------------------------------------------------------
    # PROTOCOL EXECUTION
    # ----------------------------------------------------------

    protocol.comment("="*60)
    protocol.comment("ASSEMBLOID AGENCY — BIOCHEMICAL RL LOOP")
    protocol.comment(f"Cycles: {N_CYCLES}")
    protocol.comment(f"Incubation per cycle: {INCUBATION_MIN} min")
    protocol.comment(f"Reward threshold: {REWARD_THRESHOLD_NM} nM")
    protocol.comment(f"Initial stimulus: {INITIAL_STIMULUS}")
    protocol.comment("="*60)

    # ----------------------------------------------------------
    # STEP 1: COLLECT BASELINE SAMPLES (Cycle 0)
    # ----------------------------------------------------------

    protocol.comment("\n--- BASELINE SAMPLING (Cycle 0) ---")

    # Order matters: get_elisa_well() maps positions in this list to ELISA columns 1-6
    all_wells = ALL_WELLS

    for well_idx, well in enumerate(all_wells):
        elisa_dest = get_elisa_well(cycle=0, well_index=well_idx)
        protocol.comment(f'  Collecting baseline from {well} → ELISA {elisa_dest}')
        collect_sample(well, elisa_dest, SAMPLE_VOLUME_UL)

    # Prompt for baseline ELISA reading
    protocol.pause(
        'BASELINE ELISA\n'
        'Run ELISA on Row A of sample plate.\n'
        'Record baseline dopamine (nM) for each well.\n'
        'This sets your reference point for RL decisions.\n'
        'Resume when ready.'
    )

    # ----------------------------------------------------------
    # STEP 2: DELIVER INITIAL STIMULUS
    # ----------------------------------------------------------

    protocol.comment(f"\n--- INITIAL STIMULUS DELIVERY ({INITIAL_STIMULUS}) ---")

    current_stimulus = INITIAL_STIMULUS

    for well in EXPERIMENTAL_WELLS:
        protocol.comment(f'  Delivering {current_stimulus} dopamine to {well}')
        deliver_stimulus(well, current_stimulus, STIMULUS_VOLUME_UL)

    # Controls receive PBS instead of dopamine
    for well in CONTROL_WELLS:
        protocol.comment(f'  PBS to control well {well}')
        p300.pick_up_tip()
        p300.aspirate(STIMULUS_VOLUME_UL, pbs_wash)
        p300.dispense(STIMULUS_VOLUME_UL, cell_plate[well].bottom(z=3))
        p300.drop_tip()

    # ----------------------------------------------------------
    # STEP 3: RL CYCLES
    # ----------------------------------------------------------

    dopamine_before = REWARD_THRESHOLD_NM  # Starting estimate — updated by ELISA

    for cycle in range(1, N_CYCLES + 1):

        protocol.comment(f"\n{'='*60}")
        protocol.comment(f"RL CYCLE {cycle} of {N_CYCLES}")
        protocol.comment(f"Current stimulus: {current_stimulus}")
        protocol.comment(f"{'='*60}")

        # ---- INCUBATION ----
        protocol.comment(f"  Incubating {INCUBATION_MIN} minutes...")
        protocol.delay(
            minutes=INCUBATION_MIN,
            msg=f'RL Cycle {cycle} — stimulus incubation. '
                f'Keep cells on deck. Resume automatically.'
        )

        # ---- COLLECT POST-STIMULUS SAMPLES ----
        protocol.comment(f"  Collecting post-stimulus samples (Cycle {cycle})...")

        for well_idx, well in enumerate(all_wells):
            elisa_dest = get_elisa_well(cycle=cycle, well_index=well_idx)
            protocol.comment(f'    {well} → ELISA {elisa_dest}')
            collect_sample(well, elisa_dest, SAMPLE_VOLUME_UL)

        # ---- GET ELISA READING ----
        # In Aim 1: manual ELISA run, user enters reading, resumes
        # In Aim 2: automated plate reader API call
        dopamine_after = get_dopamine_reading()

        # ---- RL DECISION ----
        protocol.comment(f"\n  RL DECISION:")
        protocol.comment(f"  Dopamine before: {dopamine_before:.1f} nM")
        protocol.comment(f"  Dopamine after:  {dopamine_after:.1f} nM")

        next_stimulus, reward = agent.decide(dopamine_after)

        protocol.comment(f"  Reward:          {reward:.1f}")
        protocol.comment(f"  Next stimulus:   {next_stimulus}")

        # ---- REPLENISH MEDIUM ----
        protocol.comment("  Replenishing medium...")
        for well in all_wells:
            replenish_medium(well, MEDIA_REPLENISH_UL)

        # ---- DELIVER NEXT STIMULUS ----
        if cycle < N_CYCLES:
            protocol.comment(f"  Delivering {next_stimulus} stimulus...")
            for well in EXPERIMENTAL_WELLS:
                deliver_stimulus(well, next_stimulus, STIMULUS_VOLUME_UL)
            for well in CONTROL_WELLS:
                p300.pick_up_tip()
                p300.aspirate(STIMULUS_VOLUME_UL, pbs_wash)
                p300.dispense(STIMULUS_VOLUME_UL, cell_plate[well].bottom(z=3))
                p300.drop_tip()

        # Update tracking variables
        dopamine_before = dopamine_after
        current_stimulus = next_stimulus

    # ----------------------------------------------------------
    # STEP 4: FINAL SUMMARY
    # ----------------------------------------------------------

    agent.print_summary()

    protocol.comment("\n" + "="*60)
    protocol.comment("RL SESSION COMPLETE")
    protocol.comment(f"Total cycles completed: {N_CYCLES}")
    protocol.comment("ELISA sample plate is ready for final analysis.")
    protocol.comment("Return cells to incubator if further culture needed.")
    protocol.comment("="*60)

    protocol.pause(
        'SESSION COMPLETE\n'
        'Collect ELISA sample plate for final analysis.\n'
        'Return PC12 cell plate to incubator.\n'
        'Check RL summary printed to console.'
    )


# ============================================================
# STANDALONE SIMULATION MODE
# Run this file directly (not via Opentrons App) to simulate
# the RL agent logic without hardware
# ============================================================

if __name__ == '__main__':
    """
    Simulate the RL agent logic offline.
    Useful for testing threshold and Q-learning parameters
    before running on the actual OT-2.

    Usage:
        python assembloid_rl_protocol.py
    """

    print("ASSEMBLOID AGENCY — RL Agent Simulation (no hardware)")
    print("="*60)

    # Simulated dopamine readings (nM) — replace with real ELISA values
    simulated_readings = [8.2, 12.4, 18.6, 24.3, 38.7, 41.2]

    # Test threshold agent
    print("\n--- THRESHOLD AGENT SIMULATION ---")
    threshold_agent = ThresholdRLAgent(
        threshold_nm=15.0,
        initial_stimulus='low'
    )

    for i, reading in enumerate(simulated_readings[1:], 1):
        stimulus, reward = threshold_agent.decide(reading)
        print(
            f"Cycle {i}: "
            f"dopamine={reading:.1f} nM | "
            f"reward={reward:.1f} | "
            f"next stimulus={stimulus}"
        )

    threshold_agent.print_summary()

    # Test Q-learning agent
    print("\n--- Q-LEARNING AGENT SIMULATION ---")
    try:
        import numpy as np
        q_agent = QLearningRLAgent(
            threshold_nm=15.0,
            alpha=0.1,
            gamma=0.9,
            epsilon=0.2
        )
        q_agent._last_action_idx = 1  # Start with 'low'

        readings_before = simulated_readings[:-1]
        readings_after = simulated_readings[1:]

        for i, (before, after) in enumerate(
            zip(readings_before, readings_after), 1
        ):
            action, reward = q_agent.decide(before, after)
            print(
                f"Cycle {i}: "
                f"before={before:.1f} nM | "
                f"after={after:.1f} nM | "
                f"reward={reward:.1f} | "
                f"next={action}"
            )

        q_agent.print_q_table()

    except ImportError:
        print("numpy not available — skipping Q-learning simulation")
        print("Install with: pip install numpy")
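The update in `QLearningRLAgent.update` is the standard tabular Q-learning rule. Below it is written out with the code's parameters (learning rate α = `alpha`, discount γ = `gamma`). It is then worked through for a first rewarded visit under the default settings (α = 0.1, γ = 0.9, Q-table initialised to zero). The numbers are illustrative only.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

Q_{\text{new}} = 0 + 0.1 \left[ 1.0 + 0.9 \times 0 - 0 \right] = 0.1
```

Because α is small, one rewarded cycle moves a Q-value only a tenth of the way toward its target. With only a few cycles per session, each state-action pair is visited rarely, so the offline simulation mode is the practical place to tune α, γ and ε.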
        

Step 15 — Data analysis and RL loop evaluation

Method: Dopamine concentration values from each ELISA cycle plotted against cycle number. Q-table values (Q-learning agent, Aim 2) assessed for convergence across cycles. Fluorescence data quantified for transfection efficiency. TH immunostaining quantified by mean fluorescence intensity per condition.

Automation: Python (pandas, matplotlib, numpy) for data processing and visualisation

Plate: N/A

Expected result: Positive correlation between dual-TF transfection and dopamine release. RL loop demonstrates state-dependent stimulus delivery.

Timeline: Day 22-23
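The "dopamine vs cycle number" analysis reduces to a least-squares trend and a correlation coefficient. The sketch below uses only the standard library, although the project itself lists pandas, matplotlib and numpy. The readings are the simulated values from the protocol's offline simulation mode, not measured data.

```python
from math import sqrt
from statistics import fmean

# Simulated dopamine readings (nM) copied from the protocol's offline
# simulation mode: index 0 = pre-RL baseline, 1-5 = post-stimulus cycles.
# Replace with real per-cycle ELISA means.
simulated_readings = [8.2, 12.4, 18.6, 24.3, 38.7, 41.2]


def dopamine_trend(readings):
    """Least-squares fit of dopamine (nM) against cycle number.

    Returns (slope in nM/cycle, intercept in nM, Pearson r).
    """
    cycles = list(range(len(readings)))
    x_bar, y_bar = fmean(cycles), fmean(readings)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cycles, readings))
    sxx = sum((x - x_bar) ** 2 for x in cycles)
    syy = sum((y - y_bar) ** 2 for y in readings)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = dopamine_trend(simulated_readings)
print(f"slope = {slope:.2f} nM/cycle, intercept = {intercept:.2f} nM, r = {r:.3f}")
```

In a closed loop the stimulus level changes from cycle to cycle, so a trend against cycle number mixes the effects of time and stimulus. Plotting dopamine against the stimulus actually delivered in each cycle separates the two.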


Example ELISA Plate Layout (96-well, Day 21)

| Row | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | STD1 | STD1 | STD2 | STD2 | STD3 | STD3 | STD4 | STD4 | STD5 | STD5 | STD6 | STD6 |
| B | STD7 | STD7 | STD8 | STD8 | BLK | BLK | NC1 | NC1 | NC2 | NC2 | NC3 | NC3 |
| C | GFP | GFP | RFP | RFP | N+F1 | N+F1 | N+F2 | N+F2 | RL-0 | RL-0 | RL-1 | RL-1 |
| D | RL-2 | RL-2 | RL-3 | RL-3 | RL-4 | RL-4 | RL-5 | RL-5 | SPARE | SPARE | SPARE | SPARE |

Legend:

  • STD1-8 = Dopamine standard curve (0, 0.5, 1, 2, 5, 10, 25, 50 nM)
  • BLK = Blank (media only, no cells)
  • NC1-3 = Negative controls (no mRNA transfection; NGF-only wells)
  • GFP = Nurr1-GFP only transfected
  • RFP = FoxA2-RFP only transfected
  • N+F1/2 = Nurr1-GFP + FoxA2-RFP co-transfected (technical replicates)
  • RL-0 = Pre-RL baseline sample
  • RL-1..5 = Successive RL cycle post-stimulus samples
  • SPARE = Reserve wells
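The STD1-8 wells exist so that sample absorbances can be converted to concentrations. The protocol's commented-out automation path calls a hypothetical `calculate_from_standard_curve`. Below is one possible version, with several assumptions:

- It uses piecewise-linear interpolation instead of the four-parameter logistic fit usually applied to ELISA curves.
- It assumes a competitive-format kit, where A450 falls as dopamine rises.
- The standard absorbances are invented for illustration.

```python
from bisect import bisect_left

# Standard concentrations (nM) from the plate legend.
STD_NM = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 25.0, 50.0]
# Hypothetical mean A450 of each duplicate standard pair (not measured data).
STD_A450 = [2.10, 1.95, 1.80, 1.55, 1.10, 0.75, 0.40, 0.22]


def calculate_from_standard_curve(a450, std_nm=STD_NM, std_a450=STD_A450):
    """Interpolate dopamine (nM) from a sample's A450.

    Assumes absorbance strictly decreases with concentration (competitive
    ELISA). Readings outside the standard range are clamped to its ends.
    """
    if any(nxt >= cur for cur, nxt in zip(std_a450, std_a450[1:])):
        raise ValueError("standard A450 must strictly decrease with concentration")
    # Reverse both lists so absorbance is ascending, as bisect requires.
    a = std_a450[::-1]
    c = std_nm[::-1]
    if a450 <= a[0]:
        return c[0]   # below the lowest absorbance = above the top standard
    if a450 >= a[-1]:
        return c[-1]  # above the highest absorbance = below the bottom standard
    i = bisect_left(a, a450)  # a[i - 1] < a450 <= a[i]
    frac = (a450 - a[i - 1]) / (a[i] - a[i - 1])
    return c[i - 1] + frac * (c[i] - c[i - 1])


# Average the duplicate wells before interpolating.
sample_a450 = (0.58 + 0.57) / 2
print(f"{calculate_from_standard_curve(sample_a450):.1f} nM")
```

Blank-subtracting all wells (BLK) before interpolation, and re-fitting the curve for every plate, are usual practice. A 4PL fit (for example with `scipy.optimize.curve_fit`) would replace the interpolation step.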

SECTION 5: TECHNIQUES, TOOLS, AND TECHNOLOGY

Technique Checklist

  • Pipetting

  • Lab Safety

  • Bioethical Considerations

  • DNA sequence design (Benchling)

  • Gene synthesis (Twist Bioscience)

  • Databases (e.g., GenBank, NCBI, Ensembl, and UCSC Genome Browser)

  • Bacterial transformation

  • Plasmid miniprep

  • Sanger sequencing verification

  • Restriction enzyme digestion

  • In vitro transcription (IVT)

  • RNA purification

  • Cell-free protein expression (validation)

  • Mammalian cell culture

  • Lipid nanoparticle transfection (MessengerMAX)

  • Fluorescence microscopy

  • Immunostaining

  • ELISA

  • Automated liquid handling (Opentrons OT-2)

  • Python scripting and reinforcement learning


Industry Partner Connections

| Partner | Relevance |
| --- | --- |
| Twist Bioscience | Direct: ordering both IVT template constructs |
| Opentrons | Direct: OT-2 as primary automation platform for RL loop, media changes, ELISA |
| New England Biolabs | Direct: HiScribe T7 IVT kit, Poly-A Polymerase, KpnI, competent cells |
| Thermo Fisher Scientific | Direct: MessengerMAX, Opti-MEM, DMEM, TURBO DNase, PFA |
| Millipore Sigma | Direct: PDL, dopamine HCl, LB broth |
| SecureDNA | Direct: sequence screening prior to Twist order |
| Asimov (Kernel Platform) | Potential: genetic circuit design and construct optimisation for the Aim 2 GRAB-DA sensor construct |
| Helix Nano | Potential: lipid nanoparticle delivery optimisation for more efficient mRNA transfection in the Aim 2 microfluidic format |

Technique Expansion


1. In Vitro Transcription (IVT)

In vitro transcription is a cell-free method for producing RNA from a DNA template using a bacteriophage RNA polymerase. In this project that polymerase is T7 RNA polymerase from the HiScribe T7 High Yield RNA Synthesis Kit (New England Biolabs). The linearised plasmid template is incubated with T7 polymerase, nucleoside triphosphates (ATP, CTP, GTP, UTP), and reaction buffer. The polymerase initiates at the T7 promoter and synthesises an RNA strand complementary to the template strand with high processivity, producing up to 100 µg of mRNA from a single 20 µl reaction. After transcription, the DNA template is removed by TURBO DNase treatment. A poly-A tail of approximately 150-200 adenosine residues is then added enzymatically using E. coli Poly-A Polymerase; the tail stabilises the mRNA and enhances translation efficiency in mammalian cells. The mRNA is purified using the RNeasy Mini Kit and quality-checked in two ways:

  • Nanodrop spectrophotometry: an A260/280 ratio of ~2.0 indicates pure RNA
  • Agarose gel electrophoresis: a single clean band indicates intact, non-degraded product

IVT is the core production step that converts the Twist-synthesised DNA template into transfectable mRNA, and its success depends entirely on keeping conditions RNase-free throughout. RNases are ubiquitous on surfaces and skin, and any contamination will degrade the mRNA. The result is failed transfection, with no fluorescence and no dopaminergic differentiation.
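The Nanodrop QC step is simple arithmetic. The sketch below uses the conventional factor for single-stranded RNA: an A260 of 1.0 over a 1 cm path corresponds to about 40 µg/ml, which equals 40 ng/µl. The instrument software normally reports these values directly. The 30 µl elution volume comes from the validation protocol. The 1.9-2.1 acceptance window around the ~2.0 target is an assumption.

```python
# Conventional conversion: A260 of 1.0 (1 cm path) ~ 40 µg/ml single-stranded RNA.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_qc(a260, a280, dilution_factor=1.0, elution_volume_ul=30.0,
           ratio_window=(1.9, 2.1)):
    """Return (ng/µl, total µg, A260/280, purity_ok) for an RNA sample.

    ratio_window is an assumed acceptance range around the ~2.0 target.
    """
    conc_ng_per_ul = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor  # µg/ml == ng/µl
    total_ug = conc_ng_per_ul * elution_volume_ul / 1000.0
    ratio = a260 / a280
    purity_ok = ratio_window[0] <= ratio <= ratio_window[1]
    return conc_ng_per_ul, total_ug, ratio, purity_ok


# Example readings (invented), reported at a 10 mm path equivalent.
conc, total, ratio, ok = rna_qc(a260=25.0, a280=12.5)
print(f"{conc:.0f} ng/µl, {total:.1f} µg in 30 µl, A260/280 = {ratio:.2f}, pure: {ok}")
```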

2. Lipid Nanoparticle mRNA Transfection (MessengerMAX)

Lipofectamine MessengerMAX is an ionisable lipid transfection reagent engineered specifically for mRNA delivery into mammalian cells. It addresses a fundamental problem: mRNA is large, negatively charged and membrane-impermeant, so it cannot cross the cell membrane unaided. When mixed with mRNA in Opti-MEM, the ionisable lipids spontaneously self-assemble around the mRNA cargo. They form lipid nanoparticles approximately 100-200 nm in diameter, and the particles' positively charged outer surface is electrostatically attracted to the negatively charged cell membrane. The cell internalises the nanoparticle by endocytosis, forming an endosome. The acidic endosomal environment (pH ~5-6) makes the MessengerMAX lipids more highly charged, and they destabilise the endosomal membrane. This endosomal escape step releases the mRNA into the cytoplasm, where ribosomes can translate it.

MessengerMAX is preferred over DNA-optimised reagents such as Lipofectamine 2000 because mRNA is considerably more fragile than DNA. It degrades rapidly in acidic endosomes if escape is delayed, and MessengerMAX's formulation is optimised for rapid endosomal escape. It is also suited to repeat daily transfection, which the Kim et al. (2017) protocol requires over 7 consecutive days. The mRNA degrades within 24-48 hours in the cytoplasm, so it must be replenished continuously to keep Nurr1 and FoxA2 protein at levels sufficient for sustained dopaminergic differentiation.
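The case for daily re-dosing can be made concrete with a first-order decay model. This is a sketch under stated assumptions:

- The 12 h half-life is a hypothetical value, chosen only to be consistent with the 24-48 h loss described above.
- The model tracks mRNA in arbitrary "dose" units.
- It ignores protein turnover and transfection efficiency.

```python
def fraction_remaining(hours, half_life_h):
    """First-order decay: fraction of mRNA left after `hours`."""
    return 0.5 ** (hours / half_life_h)


def daily_dosing_troughs(n_doses, half_life_h, dose=1.0, interval_h=24.0):
    """mRNA level (in dose units) just before each subsequent dose."""
    survive = fraction_remaining(interval_h, half_life_h)
    level, troughs = 0.0, []
    for _ in range(n_doses):
        level = (level + dose) * survive  # add a dose, then decay for one interval
        troughs.append(level)
    return troughs


HALF_LIFE_H = 12.0  # hypothetical cytoplasmic half-life, for illustration only
single = fraction_remaining(7 * 24, HALF_LIFE_H)
daily = daily_dosing_troughs(7, HALF_LIFE_H)
print(f"single dose, day 7: {single:.1e} of dose remaining")
print("daily dosing troughs:", [round(x, 3) for x in daily])
```

A single dose is essentially gone within days. Under daily dosing, the trough level approaches r / (1 - r), where r is the fraction surviving one 24 h interval (r = 0.25 here, giving 1/3 of a dose). This is why the 7-day schedule keeps transcription-factor mRNA continuously present.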


SECTION 6: PROJECT VALIDATION

6a — Validation Choice

Cell-free protein synthesis (CFPS) from the IVT mRNA is selected as the primary validation experiment because it tests the core functional assumption of the entire project before any PC12 cells are involved. That assumption is that the Twist-synthesised DNA constructs, once transcribed and translated, produce correctly folded, fluorescent Nurr1-GFP and FoxA2-RFP fusion proteins. This is the most efficient validation gate. If the constructs fail to produce fluorescent protein in a cell-free system, there is no point proceeding to the more complex and time-consuming PC12 transfection experiment. The failure would also point to a construct- or IVT-level problem rather than a cell biology problem.

6b — Validation Protocol

  1. Receive Twist plasmid delivery — resuspend both constructs to 100 ng/µl in TE buffer
  2. Linearise 1 µg of each plasmid with KpnI (1 hour, 37°C) in CutSmart Buffer
  3. Heat inactivate at 65°C for 20 minutes
  4. Purify linearised template with GeneJET PCR Purification Kit, elute in RNase-free water
  5. Set up IVT reaction (HiScribe T7 kit): 1 µg linearised template, NTPs, T7 polymerase, buffer — 20 µl total, 2 hours at 37°C
  6. Add 2 µl TURBO DNase, incubate 15 minutes at 37°C
  7. Purify mRNA with RNeasy Mini Kit, elute in 30 µl RNase-free water
  8. Quantify mRNA by Nanodrop — record concentration and A260/280
  9. Run 1 µl on 1% agarose gel — confirm single clean band
  10. Prepare 10 µl cell-free expression reactions in a 384-well Greiner black-wall, clear-bottom plate:
    • Wells A1-A4: Nurr1-GFP mRNA (500 ng) + CFPS master mix
    • Wells B1-B4: FoxA2-RFP mRNA (500 ng) + CFPS master mix
    • Wells C1-C4: No-template control (RNase-free water + CFPS master mix)
    • Wells D1-D4: Positive control (commercial GFP mRNA + CFPS master mix)
  11. Seal plate with an A4s breathable seal using the PlateLoc
  12. Incubate 2-4 hours at 37°C in Inheco Plate Incubator
  13. Read fluorescence on PHERAstar FSX: GFP channel (ex 488/em 520), mCherry channel (ex 555/em 610)
  14. Compare signal-to-background ratio — target >3:1 for positive validation
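Step 10 depends on the mRNA concentration measured in step 8. The sketch below computes per-well volumes for 500 ng of template in a 10 µl reaction, and the total mRNA needed for four replicate wells. The 6 µl master-mix volume is a hypothetical placeholder; the CFPS kit's own value should be used.

```python
def plan_cfps_template(conc_ng_per_ul, template_ng=500.0, reaction_ul=10.0,
                       mix_ul=6.0, replicates=4):
    """Per-well pipetting volumes for one construct in the CFPS plate.

    mix_ul is a hypothetical master-mix volume per 10 µl reaction; use the
    value specified by the CFPS kit actually used.
    """
    mrna_ul = template_ng / conc_ng_per_ul
    water_ul = reaction_ul - mix_ul - mrna_ul
    if water_ul < 0:
        raise ValueError(
            f"{mrna_ul:.2f} µl of mRNA does not fit in a {reaction_ul} µl reaction; "
            "concentrate the mRNA or reduce the template amount"
        )
    return {
        "mrna_ul": mrna_ul,
        "water_ul": water_ul,
        "mix_ul": mix_ul,
        "mrna_needed_ng": template_ng * replicates,
    }


print(plan_cfps_template(conc_ng_per_ul=1000.0))
```

A dilute preparation (for example 100 ng/µl) raises an error here. That is the cue to re-concentrate the mRNA before plating, rather than silently under-loading the wells.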

6c — Techniques Used

Cell-free protein synthesis (CFPS) reconstitutes the transcription and translation machinery in vitro, so that gene expression can be measured outside living cells. It uses a concentrated E. coli or wheat germ lysate that supplies ribosomes, tRNAs, amino acids, an energy regeneration system, and cofactors. In this validation, the mRNA produced by IVT is added directly to the CFPS master mix. This bypasses transfection entirely and allows the construct's coding capacity to be assessed in a controlled, cell-free environment within 2-4 hours, rather than the 16-24 hours required for mammalian cell expression. The PHERAstar FSX plate reader detects GFP and mCherry fluorescence in the 384-well format. Its quantitative readout distinguishes correctly folded fluorescent fusion proteins from non-fluorescent or misfolded products. A positive result confirms that both the IVT reaction and the fusion protein coding sequences are functional before any PC12 cells are used: GFP signal from Nurr1-GFP mRNA and mCherry signal from FoxA2-RFP mRNA, both above the no-template control. This step is particularly critical for detecting linker junction errors, such as frameshifts, that would allow Nurr1 or FoxA2 protein to be made without GFP or mCherry. In cells, that scenario would produce no fluorescence despite successful transfection, making it impossible to distinguish from a failed transfection without this prior validation.
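Both linker junction failures (a retained stop codon and a frameshift) can be caught at the sequence-design stage with a simple reading-frame check. The sketch below uses short hypothetical toy fragments, not the real Nurr1, FoxA2 or GFP sequences. In practice the same check would run on the full fusion CDS exported from Benchling.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_fusion_orf(seq):
    """Return a list of reading-frame problems in a fusion CDS (empty if OK)."""
    seq = seq.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("no ATG start codon")
    if len(seq) % 3:
        problems.append(f"length {len(seq)} is not a multiple of 3 (frameshift)")
    # Split into complete codons in frame 0; any trailing partial codon is ignored
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"in-frame stop before the end at codon index {internal}")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    return problems


# Hypothetical toy fragments, NOT the real Nurr1/FoxA2/GFP sequences
TF_CDS = "ATGGCTAAA"        # transcription-factor CDS with its stop removed
LINKER = "GGTGGCTCT"        # Gly-Gly-Ser linker
FP_CDS = "ATGGTGAGCAAGTAA"  # fluorescent-protein CDS ending in a stop

correct = TF_CDS + LINKER + FP_CDS
retained_stop = TF_CDS + "TAA" + LINKER + FP_CDS
frameshift = TF_CDS + "G" + LINKER + FP_CDS

for name, seq in [("correct", correct), ("retained stop", retained_stop),
                  ("frameshift", frameshift)]:
    print(name, check_fusion_orf(seq) or "OK")
```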

6d — Hypothetical Data

Hypothetical CFPS validation fluorescence results:

| Condition | GFP signal (RFU) | mCherry signal (RFU) |
| --- | --- | --- |
| Nurr1-GFP mRNA | 4850 ± 320 | 180 ± 45 |
| FoxA2-RFP mRNA | 210 ± 38 | 6240 ± 415 |
| No template control | 195 ± 28 | 165 ± 32 |
| GFP positive control | 5120 ± 280 | 190 ± 40 |
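Step 14's acceptance criterion (signal-to-background above 3:1) can be applied directly to the hypothetical means in the CFPS table. Each sample is divided by the no-template control in its own channel. This sketch ignores the standard deviations and is not a statistical test.

```python
# Hypothetical mean RFU values from the CFPS table.
NTC_RFU = {"GFP": 195.0, "mCherry": 165.0}
SAMPLES = {
    "Nurr1-GFP mRNA": ("GFP", 4850.0),
    "FoxA2-RFP mRNA": ("mCherry", 6240.0),
    "GFP positive control": ("GFP", 5120.0),
}
TARGET_RATIO = 3.0  # ">3:1" acceptance criterion from step 14


def evaluate_cfps(samples, ntc, target=TARGET_RATIO):
    """Signal-to-background ratio of each sample against the no-template control."""
    results = {}
    for name, (channel, signal) in samples.items():
        ratio = signal / ntc[channel]
        results[name] = (channel, ratio, ratio > target)
    return results


for name, (channel, ratio, passed) in evaluate_cfps(SAMPLES, NTC_RFU).items():
    print(f"{name:<22} {channel:<8} S/B = {ratio:5.1f}  {'PASS' if passed else 'FAIL'}")
```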

Hypothetical dopamine ELISA results (Day 21):

| Condition | Dopamine released (nM) |
| --- | --- |
| Negative control (NGF only) | 3.2 ± 0.8 |
| GFP mRNA only | 3.8 ± 0.9 |
| Nurr1-GFP only | 7.4 ± 1.2 |
| FoxA2-RFP only | 6.9 ± 1.4 |
| Nurr1-GFP + FoxA2-RFP | 18.6 ± 2.3 |
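These hypothetical ELISA values are easier to compare as fold changes over the NGF-only negative control. The sketch propagates the standard deviations to first order, assuming the two measurements are independent. That gives a rough spread, not a significance test.

```python
from math import sqrt

# Hypothetical Day 21 ELISA means ± SD (nM) from the table.
ELISA_NM = {
    "Negative control (NGF only)": (3.2, 0.8),
    "GFP mRNA only": (3.8, 0.9),
    "Nurr1-GFP only": (7.4, 1.2),
    "FoxA2-RFP only": (6.9, 1.4),
    "Nurr1-GFP + FoxA2-RFP": (18.6, 2.3),
}
BASELINE = "Negative control (NGF only)"


def fold_change(mean, sd, ref_mean, ref_sd):
    """Ratio to the reference with first-order (uncorrelated) error propagation."""
    fc = mean / ref_mean
    fc_sd = fc * sqrt((sd / mean) ** 2 + (ref_sd / ref_mean) ** 2)
    return fc, fc_sd


ref_mean, ref_sd = ELISA_NM[BASELINE]
for condition, (mean, sd) in ELISA_NM.items():
    fc, fc_sd = fold_change(mean, sd, ref_mean, ref_sd)
    print(f"{condition:<28} {fc:4.2f} ± {fc_sd:.2f} x")
```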

Hypothetical RL loop dopamine readings across 5 cycles (Day 21):

| RL Cycle | Stimulus delivered (µM) | Dopamine measured (nM) | Reward decision |
| --- | --- | --- | --- |
| 0 (baseline) | – | 18.6 | – |
| 1 | 1.0 | 24.3 | Threshold met → escalate |
| 2 | 5.0 | 38.7 | Threshold met → maintain |
| 3 | 5.0 | 41.2 | Threshold met → maintain |
| 4 | 5.0 | 35.8 | Threshold met → maintain |
| 5 | 5.0 | 39.4 | Threshold met → maintain |

Troubleshooting

The most critical failure point in this experiment is mRNA degradation by RNase contamination. A single lapse with non-sterile tips or unwashed bench surfaces can silently destroy the entire mRNA preparation, with no visible sign until the experiment fails. If no fluorescence is observed in PC12 cells after transfection, the first diagnostic step should always be to run the remaining mRNA on an agarose gel. Smearing rather than a clean band indicates degradation, and a fresh IVT batch should be prepared before attempting transfection again.

A second significant risk is an error at the linker junction. If the stop codon was not removed from the Nurr1 or FoxA2 CDS before the linker, translation stops at the end of the transcription factor. If the junction introduces a frameshift, the ribosome reads the fluorescent protein out of frame. Either way, no fluorescent protein is produced. The likely outcome is TH upregulation and dopamine production with no fluorescence, a partial success that would go undetected without TH staining as a parallel readout.

If fluorescence is observed but the dopamine ELISA shows no increase above baseline, there are three likely explanations:

  • The fusion protein is expressed but misfolded and unable to bind DNA
  • KCl stimulation failed (prepare fresh KCl solution)
  • The ELISA kit is not sensitive enough

An alternative strategy is to extend the transfection period from 7 to 10 days and add 0.5 mM dibutyryl-cAMP to the differentiation medium; Kim et al. (2017) showed this approximately doubles expression efficiency in rat cells.


SECTION 7: ADDITIONAL INFORMATION

References

  • Kim S et al. (2017) Efficient Generation of Dopamine Neurons by Synthetic Transcription Factor mRNAs. Molecular Therapy. PMC5589083.
  • Lee HS et al. (2010) Foxa2 and Nurr1 Synergistically Yield A9 Nigral Dopamine Neurons Exhibiting Improved Differentiation, Function, and Cell Survival. Stem Cells. DOI: 10.1002/stem.294.
  • Wiatrak B et al. (2020) PC12 Cell Line: Cell Types, Coating of Culture Vessels, Differentiation and Other Culture Conditions. Cells 9(4):958.
  • Wong HC et al. (2025) Fluidic Programmable Gravi-maze Array for High Throughput Multiorgan Drug Testing. bioRxiv 2025.06.18.660241.
  • Sari Y et al. (2024) Comprehensive evaluation of T7 promoter for enhanced yield and quality in mRNA production. Scientific Reports. PMC11053036.
  • Kozak M (1987) An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Research 15(20):8125-8148.
  • Cormack BP et al. (1996) FACS-optimized mutants of the green fluorescent protein (GFP). Gene 173(1):33-38.
  • Shaner NC et al. (2004) Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein. Nature Biotechnology 22(12):1567-1572.
  • Kagan BJ et al. (2022) In vitro neurons learn and exhibit sentience when embodied in a simulated game-world. Neuron 110(23):3952-3969. (DishBrain/Cortical Labs)
  • Studier FW and Moffatt BA (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. Journal of Molecular Biology 189(1):113-130.
  • Prasher DC et al. (1992) Primary structure of the Aequorea victoria green-fluorescent protein. Gene 111(2):229-233.

Supplies and Budget

| Item | Supplier | Est. Cost (GBP) | Link |
| --- | --- | --- | --- |
| Nurr1-GFP IVT template (pTwist Amp High Copy) | Twist Bioscience | £150-200 | twist.com |
| FoxA2-RFP IVT template (pTwist Amp High Copy) | Twist Bioscience | £150-200 | twist.com |
| HiScribe T7 High Yield RNA Synthesis Kit | NEB (E2040S) | £150-180 | neb.com |
| E. coli Poly-A Polymerase | NEB (M0276S) | £50-70 | neb.com |
| TURBO DNase | Thermo Fisher (AM2238) | £60-80 | thermofisher.com |
| RNeasy Mini Kit (50 preps) | Qiagen (74104) | £80-100 | qiagen.com |
| RNase-free water (500 ml) | Thermo Fisher (AM9937) | £25-35 | thermofisher.com |
| NEB 10-beta Competent E. coli | NEB (C3019H) | £50-70 | neb.com |
| GeneJET Miniprep Kit (50 preps) | Thermo Fisher (K0503) | £60-80 | thermofisher.com |
| KpnI restriction enzyme | NEB (R0142S) | £30-40 | neb.com |
| GeneJET PCR Purification Kit | Thermo Fisher (K0701) | £60-80 | thermofisher.com |
| PC12 cells (CRL-1721) | ATCC | £400-500 | atcc.org |
| DMEM high glucose + L-glutamine | Thermo Fisher (11965092) | £30-45 | thermofisher.com |
| Horse serum, heat inactivated | Thermo Fisher (26050088) | £80-120 | thermofisher.com |
| FBS, heat inactivated | Thermo Fisher (10500064) | £60-100 | thermofisher.com |
| NGF 2.5S, rat origin | Alomone Labs (N-100) | £80-120 | alomone.com |
| Poly-D-Lysine | Millipore Sigma (P6407) | £30-50 | sigmaaldrich.com |
| Collagen Type I, rat tail | Corning (354236) | £60-90 | fishersci.co.uk |
| Lipofectamine MessengerMAX | Thermo Fisher (LMRNA003) | £85-100 | thermofisher.com |
| Opti-MEM (500 ml) | Thermo Fisher (31985062) | £30-45 | thermofisher.com |
| Dopamine ELISA Kit (High Sensitivity) | Eagle Biosciences (EA101RB) | £300-400 | eaglebiosciences.com |
| Anti-TH antibody | Abcam (ab112) | £80-110 | abcam.com |
| Secondary antibody, Alexa Fluor 647 | Abcam (ab150083) | £60-90 | abcam.com |
| 4% Paraformaldehyde | Thermo Fisher (28908) | £40-60 | thermofisher.com |
| Dopamine hydrochloride | Millipore Sigma (H8502) | £30-50 | sigmaaldrich.com |
| Corning 6-well plates | Corning (3516) | £20-30 | fishersci.co.uk |
| 96-well flat-bottom plates (ELISA) | Corning (3596) | £15-20 | fishersci.co.uk |
| 384-well Greiner black, clear-bottom plates | Greiner Bio-One | £30-50 | gbo.com |
| RNase-free tubes and tips | Thermo Fisher (AM12450) | £40-60 | thermofisher.com |
| LB Broth + Agar + Ampicillin | Millipore Sigma | £40-60 | sigmaaldrich.com |
| Miscellaneous consumables | Various | £80-120 | |
| TOTAL ESTIMATED | | £2,455-3,355 | |
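A budget total can easily drift out of step as line items change, so it is worth recomputing it from the table. The following is a minimal sketch. The item list is transcribed by hand from the Supplies and Budget table and has to be kept in sync manually.

```python
# (item, low GBP, high GBP), transcribed from the Supplies and Budget table.
BUDGET_GBP = [
    ("Nurr1-GFP IVT template", 150, 200),
    ("FoxA2-RFP IVT template", 150, 200),
    ("HiScribe T7 High Yield RNA Synthesis Kit", 150, 180),
    ("E. coli Poly-A Polymerase", 50, 70),
    ("TURBO DNase", 60, 80),
    ("RNeasy Mini Kit", 80, 100),
    ("RNase-free water", 25, 35),
    ("NEB 10-beta Competent E. coli", 50, 70),
    ("GeneJET Miniprep Kit", 60, 80),
    ("KpnI restriction enzyme", 30, 40),
    ("GeneJET PCR Purification Kit", 60, 80),
    ("PC12 cells", 400, 500),
    ("DMEM high glucose", 30, 45),
    ("Horse serum", 80, 120),
    ("FBS", 60, 100),
    ("NGF 2.5S", 80, 120),
    ("Poly-D-Lysine", 30, 50),
    ("Collagen Type I", 60, 90),
    ("Lipofectamine MessengerMAX", 85, 100),
    ("Opti-MEM", 30, 45),
    ("Dopamine ELISA Kit", 300, 400),
    ("Anti-TH antibody", 80, 110),
    ("Secondary antibody Alexa Fluor 647", 60, 90),
    ("4% Paraformaldehyde", 40, 60),
    ("Dopamine hydrochloride", 30, 50),
    ("Corning 6-well plates", 20, 30),
    ("96-well plates (ELISA)", 15, 20),
    ("384-well Greiner plates", 30, 50),
    ("RNase-free tubes and tips", 40, 60),
    ("LB broth + agar + ampicillin", 40, 60),
    ("Miscellaneous consumables", 80, 120),
]


def budget_range(items: list[tuple[str, int, int]]) -> tuple[int, int]:
    """Sum the low and high estimates separately."""
    low = sum(lo for _name, lo, _hi in items)
    high = sum(hi for _name, _lo, hi in items)
    return low, high


if __name__ == "__main__":
    low, high = budget_range(BUDGET_GBP)
    print(f"Total estimated: £{low:,}-{high:,}")
```

Summing the 31 line items gives £2,455-3,355.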

Proposal generated May 2026. All sequences designed in Benchling. SecureDNA screening to be completed prior to Twist order submission. IBC notification to be filed before transfection work commences.

Appendix

##### Experimental Protocol
###### 1. Overview and Scientific Rationale
This protocol describes a complete experimental workflow for: (1) generating synthetic mRNA encoding the dopaminergic transcription factors Nurr1 and FoxA2; (2) transfecting NGF-differentiated PC12 rat pheochromocytoma cells; (3) verifying dopaminergic differentiation via dopamine ELISA and TH immunostaining; and (4) implementing a minimal closed-loop chemical reinforcement learning (RL) system using the Opentrons liquid handling robot.


The approach is based on Kim et al. (2017), who demonstrated that transfecting synthetic Nurr1 and FoxA2 mRNA into rat neural precursor cells generates functional dopaminergic neurons with the electrophysiological and biochemical properties characteristic of midbrain dopamine neurons. This protocol adapts their methodology to PC12 cells as a BSL-1-compatible, immortalised dopaminergic model system.


###### 1.1 Scientific Background


PC12 cells are an immortalised rat adrenal pheochromocytoma cell line that synthesises and stores catecholamines, predominantly dopamine. Upon NGF treatment they extend neurites and acquire a sympathetic neuron-like phenotype. Adding Nurr1 and FoxA2 mRNA is intended to push these cells further toward a midbrain dopaminergic identity by upregulating the dopaminergic machinery, including tyrosine hydroxylase (TH), the dopamine transporter (DAT), and vesicular monoamine transporter 2 (VMAT2).


###### 1.2 Chemical Reinforcement Learning Concept


The closed-loop RL system uses Opentrons to deliver dopamine pulses as reward signals to the differentiated PC12 cell culture. Dopamine release from cells is measured by ELISA as the state readout. If dopamine release exceeds a defined threshold following a stimulus, a reward pulse is delivered. This creates a minimal biological RL loop in which cellular dopaminergic activity is the state variable and exogenous dopamine delivery is the reward signal.


###### 2. Safety


###### 2.1 Biosafety Classification


All components of this protocol are BSL-1 safe (see the BSL classification table in Section 14).


###### 2.2 Personal Protective Equipment


Lab coat, nitrile gloves, and eye protection at all times


All cell work performed in laminar flow biosafety cabinet (Class II)


Dopamine: prepare in fume hood, avoid skin contact, protect from light


RNA work: always use RNase-free consumables, change gloves frequently


All liquid waste decontaminated with 10% bleach before disposal


###### 3. Materials and Equipment


###### 3.1 Cell Line


###### 3.2 Constructs (from Twist Bioscience)


NOTE: All constructs use Kim et al. (2017) UTR design: T7 promoter + Xenopus 5'UTR + Kozak + CDS + custom 3'UTR. Poly-A tail added enzymatically post-IVT.


###### 3.3 Reagents


###### 3.4 Equipment


Laminar flow biosafety cabinet (Class II)


CO2 incubator (37°C, 5% CO2)


Centrifuge


Plate reader (absorbance and fluorescence)


Opentrons OT-2 with P300 single-channel pipette (or Opentrons Flex with a Flex 1-channel pipette, since P300 pipettes are OT-2 hardware)


Fluorescence microscope


Nanodrop spectrophotometer


Water bath (37°C)


Thermal cycler (for optional PCR verification)


Agarose gel electrophoresis system


###### 3.5 Consumables


6-well plates (PDL/laminin coated)


96-well plates (for ELISA)


T25 and T75 flasks


RNase-free tubes and tips (all RNA work)


Standard filtered pipette tips


15 ml and 50 ml centrifuge tubes


Cryovials (for cell stock preparation)


###### 4. Media and Reagent Preparation


###### 4.1 Growth Medium


DMEM high glucose


10% horse serum (heat inactivated)


5% FBS (heat inactivated)


1% Penicillin/Streptomycin


NOTE: Warm to 37°C before use. Store at 4°C for up to 4 weeks.


###### 4.2 Differentiation Medium


DMEM high glucose


1% horse serum (heat inactivated)


1% Penicillin/Streptomycin


50 ng/ml NGF 2.5S (add fresh at each media change)


NOTE: Prepare NGF working stock fresh from aliquots. Never refreeze thawed NGF.


######  4.3 NGF Stock Solution


Reconstitute NGF in sterile PBS + 0.1% BSA to 100 µg/ml


Aliquot into single-use volumes (typically 5-10 µl)


Store at -80°C


Dilute to working concentration (50-100 ng/ml) in differentiation medium immediately before use
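The working dilution is a simple C1V1 = C2V2 calculation. In the sketch below, the medium volume (six wells at 2 ml each) is an assumption for illustration only. The protocol does not state a per-well volume.

```python
def stock_volume_ul(stock_conc: float, final_conc: float, final_volume_ml: float) -> float:
    """Solve C1*V1 = C2*V2 for V1, returned in microlitres.

    stock_conc and final_conc must share units (here ng/ml).
    """
    if not 0 < final_conc <= stock_conc:
        raise ValueError("final concentration must be positive and no higher than the stock")
    return final_conc * final_volume_ml / stock_conc * 1000.0


NGF_STOCK_NG_PER_ML = 100_000  # 100 µg/ml stock from step 4.3

if __name__ == "__main__":
    medium_ml = 6 * 2.0  # ASSUMPTION: six wells of a 6-well plate at 2 ml each
    for working in (50, 100):  # ng/ml, the working range in step 4.3
        ul = stock_volume_ul(NGF_STOCK_NG_PER_ML, working, medium_ml)
        print(f"{working} ng/ml in {medium_ml} ml: add {ul:.1f} µl of stock")
```

For 12 ml of medium this gives 6 µl of stock at 50 ng/ml and 12 µl at 100 ng/ml. At the lower dose, that falls within the 5-10 µl aliquot size suggested above.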


######  4.4 Dopamine Working Solution


Dissolve dopamine hydrochloride in sterile PBS to 10 mM stock


Add ascorbic acid to 0.1 mg/ml final concentration


Prepare fresh on day of experiment


Keep on ice, protect from light, use within 4 hours


Prepare working dilutions: 1 µM, 5 µM, 10 µM in differentiation medium


NOTE: Dopamine oxidises rapidly. Yellow/brown discolouration indicates degradation — discard and prepare fresh.
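The amount of solid to weigh out and the stock volumes for each working dilution can be calculated from the molar mass of dopamine hydrochloride (C8H11NO2·HCl, 189.64 g/mol). This is a minimal sketch. The 1 ml stock volume and the 10 ml final volume per working dilution are assumptions chosen for illustration.

```python
DOPAMINE_HCL_MW = 189.64  # g/mol, dopamine hydrochloride (C8H11NO2·HCl)


def stock_mass_mg(conc_mm: float, volume_ml: float, mw: float = DOPAMINE_HCL_MW) -> float:
    """Mass of solid (mg) for `volume_ml` of a `conc_mm` stock.

    mM x ml = µmol; µmol x g/mol = µg; divide by 1000 for mg.
    """
    return conc_mm * volume_ml * mw / 1000.0


def dilution_volume_ul(stock_um: float, target_um: float, final_ml: float) -> float:
    """Volume of stock (µl) to reach `target_um` in `final_ml` of medium (C1V1 = C2V2)."""
    return target_um * final_ml / stock_um * 1000.0


if __name__ == "__main__":
    print(f"10 mM x 1 ml stock: weigh {stock_mass_mg(10, 1):.2f} mg dopamine HCl")
    for target in (1, 5, 10):  # µM working dilutions from step 4.4
        ul = dilution_volume_ul(10_000, target, 10.0)  # ASSUMED 10 ml final volume
        print(f"{target} µM in 10 ml: {ul:.1f} µl of 10 mM stock")
```

At these volumes the 1 µM dilution needs only 1 µl of stock, which is hard to pipette accurately. An intermediate dilution (for example to 100 µM) is one way to keep pipetting volumes larger.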


######  5. Part 1 — Bacterial Transformation and Plasmid Preparation


Timeline: Days 1-4. Perform for all three constructs simultaneously.


###### 5.1 Transformation (Day 1)


Remove NEB 10-beta competent cells from -80°C, thaw on ice 10 minutes


Add 1-2 µl plasmid DNA (from Twist delivery) to competent cells


Flick gently to mix — do not vortex


Incubate on ice 30 minutes


Heat shock at 42°C for exactly 45 seconds


Return immediately to ice for 2 minutes


Add 250 µl LB broth (no ampicillin)


Incubate at 37°C for 1 hour shaking at 200 rpm


Spread onto LB agar + ampicillin (100 µg/ml) plates


Incubate overnight at 37°C


###### 5.2 Colony Picking and Overnight Culture (Day 2)


Check plates — expect 10-100 colonies per construct


Pick 3-6 individual well-separated colonies per construct


Inoculate each into 5 ml LB broth + ampicillin (100 µg/ml) in 15 ml tube


Grow overnight at 37°C shaking at 200 rpm


###### 5.3 Miniprep (Day 3)


Follow GeneJET Miniprep Kit protocol exactly


Elute in 50 µl elution buffer


Measure concentration by Nanodrop — aim for >50 ng/µl


Check A260/280 ratio — should be 1.8-2.0


Send one sample per construct for Sanger sequencing


Store at -20°C


###### 5.4 Sequencing Verification (Day 4)


Align sequencing results against Benchling sequences. Confirm:


T7 promoter intact: TAATACGACTCACTATA


5'UTR intact: AAATAAGAGAGAAAAGAAGAGTAAGAAGAAATATAAGAGCC


Kozak + ATG intact: GCCACCATG


CDS correct with no frameshifts or premature stop codons


3'UTR intact


NOTE: Do not proceed to IVT until sequencing confirms all three constructs are correct.
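Alignment in Benchling remains the primary check. The motif and ORF checks listed above can also be scripted as a second pass. The sketch below assumes a full-length consensus has already been assembled from the Sanger reads (single reads are shorter and may contain ambiguous bases), and that the Benchling reference is exported as plain sequence. Because the 3'UTR sequence is not listed here, it is covered only by the whole-sequence comparison. The function names are hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}
MOTIFS = {
    "t7_promoter": "TAATACGACTCACTATA",
    "utr5": "AAATAAGAGAGAAAAGAAGAGTAAGAAGAAATATAAGAGCC",
    "kozak_atg": "GCCACCATG",
}


def normalise(seq: str) -> str:
    """Uppercase and strip whitespace/newlines."""
    return "".join(seq.split()).upper()


def orf_codons(seq: str, start: int) -> int | None:
    """Codons from `start` up to and including the first in-frame stop (None if no stop)."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3 + 1
    return None


def verify_read(read: str, reference: str) -> dict[str, bool]:
    """Compare an assembled full-length read against the Benchling reference."""
    read, reference = normalise(read), normalise(reference)
    report = {name: motif in read for name, motif in MOTIFS.items()}
    atg_offset = len(MOTIFS["kozak_atg"]) - 3  # index of the A of ATG within the motif
    read_kozak = read.find(MOTIFS["kozak_atg"])
    ref_kozak = reference.find(MOTIFS["kozak_atg"])
    if read_kozak == -1 or ref_kozak == -1:
        report["orf_matches_reference"] = False
    else:
        # A premature stop or a frameshift changes the in-frame ORF length.
        report["orf_matches_reference"] = (
            orf_codons(read, read_kozak + atg_offset)
            == orf_codons(reference, ref_kozak + atg_offset)
        )
    # Catches changes anywhere, including the 3'UTR.
    report["identical_to_reference"] = read == reference
    return report


if __name__ == "__main__":
    # Toy reference, not a real construct: T7 + GGG + 5'UTR + Kozak/ATG + 3 codons + stop.
    ref = MOTIFS["t7_promoter"] + "GGG" + MOTIFS["utr5"] + MOTIFS["kozak_atg"] + "GCTGCTGCTTAA"
    print(verify_read(ref, ref))
```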


###### 6. Part 2 — In Vitro Transcription (IVT) and mRNA Preparation


Timeline: Day 5. Perform in RNase-free conditions throughout. Change gloves frequently.


######  6.1 Template Linearisation


Set up KpnI digestion for each construct:


2 µg plasmid DNA


2 µl CutSmart Buffer (10x)


1 µl KpnI enzyme


Make up to 20 µl with RNase-free water


Incubate 1 hour at 37°C


Heat inactivate 20 minutes at 65°C


Verify linearisation on 1% agarose gel — should show single linear band


Purify linearised template using GeneJET PCR Purification Kit


Elute in RNase-free water, quantify by Nanodrop


NOTE: Linearisation is critical — circular template produces run-on transcripts that reduce mRNA quality.
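In a 20 µl digest with 2 µl buffer and 1 µl enzyme, at most 17 µl is left for DNA. That means 2 µg fits only if the miniprep is above roughly 118 ng/µl. At the 50 ng/µl minimum accepted in step 5.3, 2 µg takes 40 µl. The sketch below computes the volumes and flags reactions that do not fit. Scaling to a 50 µl reaction with 5 µl buffer is one option, but it is an assumption and not part of the original protocol.

```python
def kpni_digest_setup(
    plasmid_ng_per_ul: float,
    dna_ug: float = 2.0,
    total_ul: float = 20.0,
    buffer_10x_ul: float = 2.0,
    enzyme_ul: float = 1.0,
) -> dict[str, float]:
    """Volumes (µl) for the linearisation digest; raises if the reaction will not fit."""
    if abs(buffer_10x_ul * 10 - total_ul) > 1e-9:
        raise ValueError("10x buffer volume should be one tenth of the total volume")
    plasmid_ul = dna_ug * 1000.0 / plasmid_ng_per_ul
    water_ul = total_ul - plasmid_ul - buffer_10x_ul - enzyme_ul
    if water_ul < 0:
        raise ValueError(
            f"{dna_ug} µg at {plasmid_ng_per_ul} ng/µl needs {plasmid_ul:.1f} µl of plasmid; "
            "scale the reaction up (keeping the buffer at 1x) or concentrate the DNA"
        )
    return {"plasmid": plasmid_ul, "cutsmart_10x": buffer_10x_ul, "kpni": enzyme_ul, "water": water_ul}


if __name__ == "__main__":
    print(kpni_digest_setup(200))  # 20 µl reaction from a 200 ng/µl miniprep
    print(kpni_digest_setup(50, total_ul=50, buffer_10x_ul=5))  # scaled-up option
```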


###### 6.2 IVT Reaction (HiScribe T7 Kit)


Following Kim et al. (2017) protocol with HiScribe T7 High Yield RNA Synthesis Kit:


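Assemble in an RNase-free tube at room temperature rather than on ice, as NEB recommends for HiScribe kits (the spermidine-containing reaction buffer can precipitate template DNA when cold), using the component volumes in the IVT reaction table in Section 14.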


Mix gently by flicking — do not vortex


Incubate at 37°C for 2 hours


Add 2 µl TURBO DNase, mix gently, incubate 15 minutes at 37°C


NOTE: DNase step removes DNA template — essential to prevent DNA contamination of your mRNA product.


###### 6.3 Poly-A Tailing (E. coli Poly-A Polymerase)


Following Kim et al. (2017) — poly-A added enzymatically post-IVT:


To the 20 µl IVT product add:


4 µl 10x E. coli Poly-A Polymerase Reaction Buffer


2 µl ATP (10 mM)


2 µl E. coli Poly-A Polymerase


12 µl RNase-free water (total reaction 40 µl)


Incubate 45 minutes at 37°C


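NOTE: This enzymatically adds a tail of roughly 150-200 A's. Tail length varies with incubation time and RNA input, so the product carries a heterogeneous tail, unlike a poly-A encoded in the DNA template; enzymatic tailing is used here following Kim et al. (2017).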


###### 6.4 mRNA Purification (RNeasy Kit)


Follow RNeasy Mini Kit protocol for RNA cleanup


Elute in 50 µl RNase-free water


Quantify by Nanodrop — note A260/280 and A260/230 ratios


Run 1 µl on agarose gel to verify mRNA integrity — should show single band


Aliquot into single-use volumes


Store at -80°C


NOTE: Expected yield: 50-100 µg mRNA per 20 µl IVT reaction. A260/280 should be ~2.0.


###### 7. Part 3 — PC12 Cell Culture and NGF Differentiation


Timeline: Days 1-15. Run in parallel with bacterial/IVT work.


###### 7.1 Thawing PC12 Cells (Day 1)


Warm growth medium to 37°C


Remove PC12 vial from liquid nitrogen, thaw quickly in 37°C water bath


Transfer dropwise into 9 ml warm growth medium in 15 ml tube — add medium slowly


Centrifuge 200 x g for 5 minutes


Aspirate supernatant carefully


Resuspend pellet in 5 ml warm growth medium


Transfer to T25 flask


Incubate at 37°C, 5% CO2


Check next day — cells should be attached


###### 7.2 Plate Coating (Day 7)


Prepare PDL solution: 50 µg/ml in sterile water


Add 1 ml per well of 6-well plate


Incubate 1 hour at room temperature


Aspirate PDL completely


Wash 3x with sterile water


Allow to dry completely in laminar flow hood


Prepare laminin solution: 10 µg/ml in PBS


Add 1 ml per well


Incubate overnight at 37°C


NOTE: PDL coating is essential for adhesion of standard PC12 cells (CRL-1721). Laminin significantly improves neurite extension and process stability.


###### 7.3 Cell Plating and Differentiation (Day 8)


Aspirate laminin solution — do not wash, leave residual laminin


Trypsinise PC12 cells from flask


Count using haemocytometer


Plate at 50,000 cells/well in growth medium


Incubate overnight


Day 9: Replace with differentiation medium containing 50 ng/ml NGF


Days 9-15: Change half the medium every 2-3 days with fresh NGF


NOTE: Always add medium to the side of the well — never directly onto cells. Neurites are fragile.
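Haemocytometer counts convert to a plating volume in two short steps. This is a minimal sketch assuming a standard Neubauer chamber (each large 1 mm × 1 mm square holds 0.1 µl) and a 1:1 trypan blue dilution. Both the chamber type and the dilution factor are assumptions, because the protocol does not specify them.

```python
def cells_per_ml(square_counts: list[int], dilution_factor: float = 2.0) -> float:
    """cells/ml = mean count per large square x dilution factor x 10^4.

    Each large Neubauer square holds 0.1 µl, hence the 10^4 factor.
    dilution_factor=2.0 ASSUMES a 1:1 mix with trypan blue.
    """
    if not square_counts:
        raise ValueError("need at least one square count")
    mean = sum(square_counts) / len(square_counts)
    return mean * dilution_factor * 1e4


def suspension_ul_per_well(conc_cells_per_ml: float, cells_per_well: int = 50_000) -> float:
    """Volume of cell suspension (µl) that contains `cells_per_well` cells."""
    return cells_per_well * 1000.0 / conc_cells_per_ml


if __name__ == "__main__":
    conc = cells_per_ml([50, 48, 52, 50])  # hypothetical counts from four corner squares
    print(f"{conc:.3g} cells/ml; add {suspension_ul_per_well(conc):.1f} µl per well")
```

With the example counts, the suspension is 1 × 10^6 cells/ml, so 50 µl delivers the 50,000 cells per well specified above.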


###### 8. Part 4 — mRNA Transfection


Timeline: Day 15 onwards. Daily transfection for 5-7 days following Kim et al. (2017).


###### 8.1 Experimental Groups


###### 8.2 Daily Transfection Protocol (Kim et al. Method)


Perform daily for 5-7 consecutive days starting Day 15:


###### Prepare mRNA-MessengerMAX complex (per well):


Dilute mRNA in Opti-MEM: 500 ng total mRNA in 25 µl Opti-MEM


For Nurr1 + FoxA2 wells: 250 ng Nurr1 mRNA + 250 ng FoxA2 mRNA


For GFP wells: 500 ng GFP mRNA


Dilute MessengerMAX: 1.5 µl MessengerMAX in 25 µl Opti-MEM


Incubate separately 5 minutes at room temperature


Combine mRNA and MessengerMAX dilutions


Mix gently by pipetting 3-4 times — do not vortex


Incubate 5 minutes at room temperature


###### Transfect cells:


Aspirate half the medium from each well carefully


Add transfection complex dropwise to cells


Rock plate gently to distribute


Incubate 4 hours at 37°C, 5% CO2


Replace with fresh differentiation medium + NGF


NOTE: Following Kim et al. (2017): daily transfection is required because mRNA degrades within 24-48 hours. Adding db-cAMP (0.5 mM) alongside transfection significantly improves expression efficiency in rat cells — add to differentiation medium if available.
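Before starting daily transfections, it helps to check that there is enough mRNA and reagent. The sketch below totals mRNA per construct, MessengerMAX and Opti-MEM over the course, using the well layout from the experimental-groups table in Section 14. The protocol does not state the dose for single-factor wells (A5, A6), so this sketch assumes 250 ng, the per-factor dose used in the Nurr1 + FoxA2 wells. The negative-control well is left untreated rather than mock-transfected, which is also an assumption.

```python
# Per-well daily mRNA doses (ng). ASSUMPTION: single-factor wells get 250 ng;
# change to 500 if they should match the total mRNA dose of the other wells.
PLATE_LAYOUT_NG = {
    "A1": {},  # negative control, no mRNA
    "A2": {"GFP": 500},
    "A3": {"Nurr1": 250, "FoxA2": 250},
    "A4": {"Nurr1": 250, "FoxA2": 250},
    "A5": {"Nurr1": 250},
    "A6": {"FoxA2": 250},
}
MESSENGERMAX_UL_PER_WELL = 1.5
OPTIMEM_UL_PER_WELL = 2 * 25  # one 25 µl tube for mRNA, one for MessengerMAX


def course_totals(
    layout: dict[str, dict[str, int]], days: int, overage: float = 0.0
) -> tuple[dict[str, float], float, float]:
    """Total mRNA per construct (ng), MessengerMAX (µl) and Opti-MEM (µl) for the course."""
    scale = days * (1.0 + overage)
    mrna_ng: dict[str, float] = {}
    for doses in layout.values():
        for construct, ng in doses.items():
            mrna_ng[construct] = mrna_ng.get(construct, 0.0) + ng * scale
    transfected = sum(1 for doses in layout.values() if doses)
    return (
        mrna_ng,
        transfected * MESSENGERMAX_UL_PER_WELL * scale,
        transfected * OPTIMEM_UL_PER_WELL * scale,
    )


if __name__ == "__main__":
    mrna, messengermax, optimem = course_totals(PLATE_LAYOUT_NG, days=6)  # Days 15-20
    print(mrna, f"MessengerMAX {messengermax} µl", f"Opti-MEM {optimem} µl")
```

Over six daily transfections (Days 15-20), this comes to 4.5 µg each of Nurr1 and FoxA2 mRNA, 3 µg of GFP mRNA and 45 µl of MessengerMAX. That is well within the expected 50-100 µg yield of a single IVT reaction. The `overage` argument adds a pipetting margin if desired.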


###### 8.3 Transfection Timeline


###### 9. Part 5 — Verification of Dopaminergic Differentiation


###### 9.1 TH Immunostaining


Tyrosine hydroxylase (TH) is the rate-limiting enzyme in dopamine synthesis. Increased TH expression confirms successful dopaminergic differentiation.


Aspirate medium and wash cells 2x with PBS


Fix with 4% paraformaldehyde for 15 minutes at room temperature


Wash 3x with PBS


Permeabilise with 0.25% Triton X-100 in PBS for 10 minutes


Wash 3x with PBS


Block with 3% BSA in PBS for 1 hour at room temperature


Add anti-TH primary antibody (1:500 in blocking buffer)


Incubate overnight at 4°C


Wash 3x with PBS


Add FITC-conjugated secondary antibody (1:500) for 1 hour at room temperature in dark


Wash 3x with PBS


Add DAPI (1:1000) for 5 minutes


Wash 2x with PBS


Image on fluorescence microscope


Positive result: increased TH+ green fluorescence in Nurr1 + FoxA2 wells compared to GFP control.


###### 9.2 Dopamine ELISA


Primary readout for both differentiation verification and chemical RL loop. Follows methodology used in the PC12 dopamine literature (Eagle Biosciences High Sensitivity Dopamine ELISA Kit).


At least 2 hours before ELISA: replace medium with 1 ml fresh DMEM without serum


Stimulate with 56 mM KCl for 15 minutes to trigger dopamine release


Collect conditioned medium into RNase-free tube


Centrifuge at 300 x g for 5 minutes to remove cell debris


Transfer supernatant — use immediately or store at -80°C


Follow Eagle Biosciences ELISA kit protocol exactly


Read plate at 450 nm on plate reader


Calculate dopamine concentration from standard curve
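The kit manual's curve-fitting instructions take precedence, and many ELISA kits specify a four-parameter logistic (4PL) fit. As a dependency-free stand-in, the sketch below uses a simpler technique: it interpolates log concentration linearly between neighbouring standards. It also converts ng/ml to nM using the molar mass of dopamine free base (153.18 g/mol), in case the kit's standards are expressed in mass units. The standard values in the usage example are made up.

```python
import math

DOPAMINE_MW = 153.18  # g/mol, dopamine free base (C8H11NO2)


def interpolate_concentration(od: float, standards: list[tuple[float, float]]) -> float:
    """Piecewise-linear interpolation of log10(concentration) against OD.

    `standards` are (concentration, OD) pairs; zero standards are skipped because
    log10(0) is undefined. Works whether OD rises or falls with concentration,
    provided the standards are monotonic.
    """
    points = sorted((o, math.log10(c)) for c, o in standards if c > 0)
    if len(points) < 2:
        raise ValueError("need at least two non-zero standards")
    if not points[0][0] <= od <= points[-1][0]:
        raise ValueError("OD outside the standard range; dilute or re-assay the sample")
    for (od_lo, log_lo), (od_hi, log_hi) in zip(points, points[1:]):
        if od_lo <= od <= od_hi:
            frac = 0.0 if od_hi == od_lo else (od - od_lo) / (od_hi - od_lo)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return 10 ** points[-1][1]  # not reached for in-range ODs


def ng_per_ml_to_nm(conc_ng_per_ml: float, mw: float = DOPAMINE_MW) -> float:
    """ng/ml equals µg/L; divided by g/mol gives µmol/L, times 1000 gives nM."""
    return conc_ng_per_ml / mw * 1000.0


if __name__ == "__main__":
    # Made-up standards (ng/ml, OD450) for a competitive ELISA: OD falls as dopamine rises.
    standards = [(0.1, 1.9), (0.5, 1.4), (2.0, 0.9), (10.0, 0.4)]
    conc = interpolate_concentration(1.1, standards)
    print(f"{conc:.2f} ng/ml = {ng_per_ml_to_nm(conc):.1f} nM")
```

Samples whose OD falls outside the standards raise an error rather than being extrapolated.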


###### 10. Part 6 — Chemical Reinforcement Learning with Opentrons


Timeline: Day 21. Perform after dopamine ELISA confirms differentiation is successful.


###### 10.1 Opentrons Deck Setup


###### 10.2 RL Loop Logic


The Opentrons Python script implements the following logic:


Baseline read: collect 50 µl conditioned medium, transfer to ELISA plate


Define reward threshold: dopamine concentration above X nM (determined from Day 18 baseline ELISA)


Deliver stimulus: Opentrons dispenses 100 µl dopamine solution (start at 1 µM)


Incubation: 15 minutes


State read: collect medium sample for ELISA


Decision: if dopamine above threshold → escalate to 5 µM; if below → maintain or withhold


Repeat for 3-5 cycles per session
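The decision rule above can be written as a small pure function, separate from any robot code. The sketch below replays a session from a list of ELISA readings. The 20 nM threshold in the test is a hypothetical stand-in for the undetermined X. The dose ladder stops at 5 µM, as in the hypothetical RL table in the main proposal, and "withhold" is used for below-threshold readings (the text allows either withhold or maintain). All names are illustrative.

```python
from dataclasses import dataclass

# Dose ladder in µM; 0.0 means "withhold". Sessions start at 1.0 and escalate to 5.0.
# The 10 µM working solution could be appended as a further rung.
DOSE_LADDER_UM = (0.0, 1.0, 5.0)


@dataclass
class Cycle:
    cycle: int
    dose_um: float  # stimulus delivered this cycle
    reading_nm: float  # ELISA dopamine measured after the stimulus
    label: str  # decision for the next cycle: "escalate", "maintain" or "withhold"


def next_dose(current_um: float, reading_nm: float, threshold_nm: float,
              on_below: str = "withhold") -> tuple[str, float]:
    """Step 10.2: above threshold, move up the ladder (or hold at the top rung)."""
    if reading_nm > threshold_nm:
        rung = DOSE_LADDER_UM.index(current_um)  # ValueError if dose is not on the ladder
        if rung + 1 < len(DOSE_LADDER_UM):
            return "escalate", DOSE_LADDER_UM[rung + 1]
        return "maintain", current_um
    if on_below == "maintain":
        return "maintain", current_um
    return "withhold", DOSE_LADDER_UM[0]


def run_session(readings_nm: list[float], threshold_nm: float, start_um: float = 1.0,
                on_below: str = "withhold") -> list[Cycle]:
    """Replay a session: readings_nm[i] is the ELISA value after cycle i + 1."""
    dose = start_um
    history = []
    for i, reading in enumerate(readings_nm, start=1):
        label, upcoming = next_dose(dose, reading, threshold_nm, on_below)
        history.append(Cycle(i, dose, reading, label))
        dose = upcoming
    return history


if __name__ == "__main__":
    for c in run_session([24.3, 38.7, 41.2, 35.8, 39.4], threshold_nm=20.0):
        print(c)
```

With the hypothetical readings and a 20 nM threshold, this reproduces the decisions in the RL table: escalate after cycle 1, then maintain at 5 µM.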


###### 10.3 Opentrons Python Protocol
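The following is a minimal sketch of one RL cycle as an OT-2 protocol using the Opentrons Python API v2, following the deck layout table in Section 14. It is a hypothetical illustration, not the original script. The cell wells, the reservoir well for each dose, the sample-collection plate in slot 5, the right-mount pipette and the API level are all assumptions. The ELISA runs off-robot, so each run delivers a single pulse and collects samples. The operator then applies the step 10.2 decision and sets `DOSE_UM` before the next run.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only used for the type hint; the robot passes the context in
    from opentrons import protocol_api

metadata = {
    "protocolName": "PC12 chemical RL - single dopamine pulse",
    "description": "Sketch: one dopamine pulse, 15 min wait, 50 µl sample per well for ELISA",
    "apiLevel": "2.13",
}

DOSE_UM = 1.0  # set before each run from the previous cycle's decision
CELL_WELLS = ["A1", "A2", "A3"]  # HYPOTHETICAL wells on the cell plate
DOSE_TO_RESERVOIR_WELL = {1.0: "A1", 5.0: "A2", 10.0: "A3"}  # ASSUMED reservoir layout
PULSE_UL = 100
SAMPLE_UL = 50
INCUBATION_MIN = 15


def source_well_for(dose_um: float) -> str:
    """Reservoir well holding the working solution for `dose_um`."""
    if dose_um not in DOSE_TO_RESERVOIR_WELL:
        raise ValueError(f"no working solution prepared for {dose_um} µM")
    return DOSE_TO_RESERVOIR_WELL[dose_um]


def run(protocol: protocol_api.ProtocolContext) -> None:
    reservoir = protocol.load_labware("nest_12_reservoir_15ml", 1)
    cells = protocol.load_labware("corning_96_wellplate_360ul_flat", 2)
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", 3)
    # ASSUMPTION: slot 5 holds an empty plate to collect medium samples for the ELISA.
    samples = protocol.load_labware("corning_96_wellplate_360ul_flat", 5)
    p300 = protocol.load_instrument("p300_single_gen2", "right", tip_racks=[tips])
    p300.flow_rate.dispense = 50  # µl/s, slower than the default to limit disturbance

    source = reservoir[source_well_for(DOSE_UM)]
    for name in CELL_WELLS:
        p300.pick_up_tip()
        p300.aspirate(PULSE_UL, source)
        # Dispense 2 mm below the well top rather than at the bottom, near the cells.
        p300.dispense(PULSE_UL, cells[name].top(z=-2))
        p300.drop_tip()

    protocol.delay(minutes=INCUBATION_MIN, msg="Dopamine pulse incubation")

    for name in CELL_WELLS:
        p300.transfer(SAMPLE_UL, cells[name].bottom(z=3), samples[name], new_tip="always")
    protocol.comment("Run the ELISA on the sample plate, then set DOSE_UM for the next cycle.")
```

The waste reservoir in slot 4 is not used in this sketch. The sample is drawn from the same well that received the pulse, so the sampled medium contains the dopamine just delivered as well as any dopamine released by the cells.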

###### 10.4 Data Recording


Record for every RL session:


Date and time of each dopamine delivery


Dopamine concentration dispensed (µM)


Volume delivered (µl)


ELISA dopamine reading (nM) before and after each pulse


Cell passage number


Days post transfection


Days post NGF differentiation


Any observations about cell morphology
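Writing one row per pulse to a CSV file keeps these records consistent across sessions. The sketch below mirrors the list above in a dataclass. Date and time are merged into a single timestamp, and the "before and after" ELISA readings become two fields. The field names and example values are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime


@dataclass
class PulseRecord:
    """One dopamine delivery in an RL session (fields follow the list in 10.4)."""
    timestamp: datetime
    dose_um: float
    volume_ul: float
    elisa_before_nm: float
    elisa_after_nm: float
    passage_number: int
    days_post_transfection: int
    days_post_ngf: int
    morphology_notes: str = ""


def write_records(records: list[PulseRecord], stream) -> None:
    """Write records as CSV, one row per pulse, with a header row."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(PulseRecord)])
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["timestamp"] = record.timestamp.isoformat(timespec="minutes")
        writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    # Hypothetical Day 21 pulse: first transfection on Day 15, NGF from Day 9.
    write_records([PulseRecord(datetime(2026, 5, 1, 10, 0), 1.0, 100.0, 18.6, 24.3, 12, 6, 12)], buffer)
    print(buffer.getvalue())
```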


###### 11. Troubleshooting


###### 12. Complete Experiment Timeline


###### 13. Key References


1. Kim S et al. (2017) Efficient Generation of Dopamine Neurons by Synthetic Transcription Factor mRNAs. Molecular Therapy. PMC5589083.


2. Sari Y, Sousa Rosa S et al. (2024) Comprehensive evaluation of T7 promoter for enhanced yield and quality in mRNA production. Scientific Reports. PMC11053036.


3. Lee HS et al. (2010) Foxa2 and Nurr1 Synergistically Yield A9 Nigral Dopamine Neurons Exhibiting Improved Differentiation, Function, and Cell Survival. Stem Cells. PMID:20049900.


4. Yang K et al. (2019) Synaptic dopamine release is positively regulated by SNAP-25 that involves in benzo[a]pyrene-induced neurotoxicity. Chemosphere 237:124378.


5. Homberg JR et al. (2016) The role of the dopamine D1 receptor in social cognition: studies using a novel genetic rat model. Disease Models & Mechanisms 9:1147-1158.


6. Wiatrak B et al. (2020) PC12 Cell Line: Cell Types, Coating of Culture Vessels, Differentiation and Other Culture Conditions. Cells 9(4):958.


7. Greene LA, Tischler AS (1976) Establishment of a noradrenergic clonal line of rat adrenal pheochromocytoma cells which respond to nerve growth factor. PNAS 73:2424-2428.


###### 14. Experimental Notes
###### Observations:
| RL Component | Biological Implementation |
| --- | --- |
| State | Dopamine concentration measured by ELISA from conditioned media |
| Action | Opentrons delivers dopamine pulse to culture well |
| Reward signal | Dopamine above threshold triggers further stimulation |
| Suppression signal | Below threshold — withhold stimulus |
| Learning readout | Shift in baseline dopamine release over multiple sessions |


| Component | BSL Classification | Notes |
| --- | --- | --- |
| PC12 cells (CRL-1721) | BSL-1 | Rat, non-human, non-primate |
| Nurr1 mRNA | BSL-1 | Synthetic mRNA, no pathogen sequences |
| FoxA2 mRNA | BSL-1 | Synthetic mRNA, no pathogen sequences |
| GFP mRNA | BSL-1 | Control, no pathogen sequences |
| Lipofectamine MessengerMAX | BSL-1 | Chemical transfection reagent |
| Dopamine HCl | BSL-1 | Standard laboratory chemical |
| E. coli (NEB 10-beta) | BSL-1 | Non-pathogenic laboratory strain |


| Item | Supplier | Catalogue Number |
| --- | --- | --- |
| PC12 (CRL-1721) | ATCC | CRL-1721 |
| Note: standard PC12 not Adh variant — responds to NGF |  |  |


| Construct | Vector | Purpose |
| --- | --- | --- |
| rNurr1_mRNA_template_FINAL | pTwist Amp High Copy | IVT template for Nurr1 mRNA |
| rFoxA2_mRNA_template_FINAL | pTwist Amp High Copy | IVT template for FoxA2 mRNA |
| GFP_mRNA_template_FINAL | pTwist Amp High Copy | IVT template for GFP control mRNA |


| Reagent | Supplier | Notes |
| --- | --- | --- |
| DMEM high glucose with L-glutamine | Thermo Fisher | Base medium |
| Horse serum, heat inactivated | Thermo Fisher | 10% for growth, 1% for differentiation |
| FBS, heat inactivated | Thermo Fisher | 5% for growth |
| Penicillin/Streptomycin | Thermo Fisher | 100 U/ml + 100 µg/ml |
| PBS | Thermo Fisher | Sterile, calcium/magnesium-free |
| NGF 2.5S | Alomone / Sigma | 50-100 ng/ml working; store -80°C in BSA |
| BSA | Sigma | 0.1% for NGF dilution and storage |
| Poly-D-Lysine (PDL) | Sigma | 50 µg/ml coating solution |
| Laminin | Sigma/Thermo | 10 µg/ml coating solution |
| LB Broth | Sigma | Bacterial culture |
| LB Agar | Sigma | Bacterial plating |
| Ampicillin | Sigma | 100 µg/ml for bacterial selection |
| Lipofectamine MessengerMAX | Thermo Fisher | mRNA-specific transfection reagent |
| Opti-MEM | Thermo Fisher | For transfection complex preparation |
| HiScribe T7 High Yield RNA Synthesis Kit | NEB | IVT reaction |
| E. coli Poly-A Polymerase | NEB M0276 | Post-IVT poly-A addition |
| RNeasy Mini Kit | Qiagen | mRNA purification |
| RNase-free water | Thermo Fisher | All RNA steps |
| RNase inhibitor | NEB / Thermo | mRNA protection |
| KpnI restriction enzyme | NEB | Template linearisation |
| CutSmart Buffer | NEB | KpnI digestion |
| GeneJET Miniprep Kit | Thermo Fisher | Plasmid extraction |
| Dopamine hydrochloride | Sigma | Prepare fresh; store -80°C |
| Ascorbic acid | Sigma | 0.1 mg/ml; prevents dopamine oxidation |
| Dopamine ELISA Kit (High Sensitivity) | Eagle Biosciences / Abcam | Quantitative dopamine readout |
| Anti-TH antibody | Abcam | For differentiation verification |
| Secondary antibody (FITC) | Abcam | Fluorescent detection |
| DAPI | Thermo Fisher | Nuclear counterstain |
| 4% Paraformaldehyde | Sigma | Cell fixation |
| Triton X-100 | Sigma | Cell permeabilisation |
| Competent E. coli NEB 10-beta | NEB | Transformation |


| Component | Volume |
| --- | --- |
| RNase-free water | To 20 µl total |
| 10x Reaction Buffer | 2 µl |
| ATP (100 mM) | 2 µl |
| CTP (100 mM) | 2 µl |
| GTP (100 mM) | 2 µl |
| UTP (100 mM) | 2 µl |
| Linearised DNA template (1 µg/µl) | 1 µl |
| T7 RNA Polymerase Mix | 2 µl |


| Day | Expected Observation |
| --- | --- |
| 9 | Cells healthy, attached, no visible neurites |
| 10-11 | Short process initiation visible |
| 12-13 | Clear neurite extensions 1-2 cell lengths |
| 14-15 | Mature neurite network, differentiation established |


| Well | Condition | mRNA transfected | Purpose |
| --- | --- | --- | --- |
| A1 | Negative control | None | Baseline PC12 |
| A2 | Transfection control | GFP mRNA only | Confirms transfection efficiency |
| A3 | Experimental | Nurr1 + FoxA2 mRNA | Dopaminergic differentiation |
| A4 | Experimental replicate | Nurr1 + FoxA2 mRNA | Technical replicate |
| A5 | Single factor control | Nurr1 mRNA only | Shows FoxA2 requirement |
| A6 | Single factor control | FoxA2 mRNA only | Shows Nurr1 requirement |


| Day | Action |
| --- | --- |
| 15 | First transfection — all wells |
| 16 | Second transfection + first TH staining check (optional) |
| 17 | Third transfection |
| 18 | Fourth transfection + first dopamine ELISA |
| 19 | Fifth transfection |
| 20 | Final transfection + TH immunostaining |
| 21 | Chemical RL experiment |


| Condition | Expected dopamine (relative) |
| --- | --- |
| Negative control (no mRNA) | Baseline (low) |
| GFP control | Similar to negative control |
| Nurr1 only | Modest increase over baseline |
| FoxA2 only | Modest increase over baseline |
| Nurr1 + FoxA2 | Significantly elevated — 3-5x baseline (Kim et al.) |


| Deck Position | Labware | Contents |
| --- | --- | --- |
| 1 | NEST 12-well reservoir | Dopamine working solutions (1, 5, 10 µM) + PBS wash |
| 2 | Corning 96-well flat plate | PC12 cells (transferred from 6-well) |
| 3 | Opentrons 96 tip rack (300 µl) | Standard tips |
| 4 | NEST 12-well reservoir (waste) | Aspirate waste |