Sofía Segura Cárdenas — HTGAA Spring 2026


Biological Engineering · Medical physics · Space science

About me

Hello! Welcome to my How To Grow (Almost) Anything (HTGAA) - Spring 2026 page.

My name is Sofía, and I am a final-year Biological Engineering undergraduate student. My academic background is mainly focused on biomaterials, biological systems, modeling, simulation and engineering approaches to working with living matter.

Alongside this, my main scientific interests lean strongly toward physics-related fields, particularly medical physics, space medicine, and the study of extreme environments—ranging from radiation effects in matter to broader interests in space and astrophysics.


Contact info

Email: seguracardenassofia@gmail.com


Homework


Node - Lifefabs Institute

  • HTGAA UK Node (Digby Usher, Danny Chen)
    TAs
    Homework review sessions & mentorship
    Thursdays: 09:00–10:00 & 21:00–22:00 (UK Time)

Labs


Projects

  • The three FP ideas requested by HTGAA

    • Dual-Signal Biosensor for Functional Radiation Dosimetry
    • Smart Bacteria for Targeted Oil Spill Cleanup
    • Self-Destructing Antimicrobial Biofilms
  • Self-Destructing Antimicrobial Biofilms

    • Bacillus subtilis for AMR and Space Science applications
    • Antimicrobial market and new necessities
    • Sequential Logic Circuit and genetic constructs
    • Experimental lab protocol


Weekly Homework





HTGAA - Week 1: Principles and Practices



My Homework

WEEK 1 - SAVE THE WORLD OR DESTROY THE WORLD

This week lays the foundation for ethics, safety, and governance in biotechnology.

Lecture (Tues, Feb 3)

Principles & Practices
(▶️Recording)
David Kong
George Church
Joe Jacobson

Recitation (Wed, Feb 4)

Principles, Ethics, Practices
(▶️Recording | 💻Slides)
Ronan Donovan
Suvin Sundararajan
Subastian Kamau
Greg Galperin


Instructions

  1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about.

  2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals, for example those relating to equity or autonomy.

  3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

    Example
    • Purpose: What is done now and what changes are you proposing?
    • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
    • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
    • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

  4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | | | |
| • By helping respond | | | |
| Foster Lab Safety | | | |
| • By preventing incidents | | | |
| • By helping respond | | | |
| Protect the environment | | | |
| • By preventing incidents | | | |
| • By helping respond | | | |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | | | |
| • Feasibility? | | | |
| • Not impede research | | | |
| • Promote constructive applications | | | |

  5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

PART 1. FIXING THE COURSE
  1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about.

NOTE: This project is only an initial idea and may be changed or upgraded in the near future.

Project: Reversible Cell-Free Biosensor for ROS-Mediated Radiation Damage

This project aims to design a reversible, cell-free biosensor capable of reporting radiation-induced oxidative damage through a visible biochemical signal.

The system is based on a DNA-programmed TX–TL circuit embedded within a hydrogel matrix, inspired by biological systems that can transition between active and inactive states under physical stress. Upon exposure to radiation-induced reactive oxygen species (ROS), the biosensor activates a transient fluorescent response, which gradually returns to a basal state once the stimulus is removed, enabling reuse of the material.

By decoupling damage sensing from living cells, this platform provides a controllable and modular approach to studying radiation effects on biological matter.

One-sentence project goal

The goal of this project is to engineer a reversible, reusable, cell-free biosensor that translates radiation-induced oxidative damage into a transient biochemical signal.

Background, application, and why it matters

The primary application of this biosensor is in radiation physics, medical physics and even space science, where it can be used as a reusable biological dosimetry platform to study oxidative damage induced by ionizing radiation.

Rather than measuring radiation directly, the system reports biologically relevant damage, specifically ROS generation, offering insight into how physical radiation translates into molecular stress in biological systems. This makes the material particularly valuable for experimental radiation setups, calibration studies, and comparative stress assays, without the need for living models.

The material functions as a reversible biological stress reporter. Instead of permanently activating or degrading under radiation-induced stress, it temporarily switches state to signal damage and then returns to baseline, enabling repeated use and long-term monitoring.

In medical physics and radiobiology, many existing sensing systems present fundamental limitations:

  • They degrade over time
  • They saturate under high stimulus
  • They are single-use
  • They cannot be reset or recovered

Similarly, most biological sensors:

  • lose viability
  • or remain irreversibly activated after damage

This creates a gap between physical radiation sensing and biologically meaningful damage reporting. In this design the hydrogel is not just a container: it is a functional component that shapes how the stimulus reaches and leaves the circuit. While individual stress-responsive genetic elements are well characterized, their integration into a reusable, reversible cell-free biomaterial capable of multiple stress-response cycles remains largely unexplored.


Inspiration

The project is inspired by simple biological systems, such as jellyfish, which exhibit functional resilience and reversible state transitions despite minimal organizational complexity. These organisms demonstrate that biological function does not always require permanent activation or structural complexity, but can instead rely on transient, physics-driven responses to environmental stress.

Translating this principle into a synthetic, cell-free context, the proposed biosensor explores how biological states—such as gene expression and signal emission—can be reversibly triggered by physical damage and allowed to relax back to a stable baseline.


What makes this a synthetic biology project

This project constitutes a synthetic biology approach by designing and programming a DNA-based TX–TL circuit that links oxidative stress sensing to a controlled biochemical output, a visible fluorescent signal. The circuit architecture, combined with material constraints imposed by the hydrogel matrix, enables tunable activation, decay, and reversibility of the signal.

Signal intensity correlates with stress magnitude, while signal reversibility reflects the system’s ability to recover to a baseline state. System reversibility is achieved through the co-design of a stress-responsive genetic circuit and a diffusion-regulated material matrix, enabling transient activation and passive return to a basal state without permanent system alteration. The system does not shut down because it fails; it shuts down because it is designed to relax back to its original state.

This platform is intended to be modular, allowing future expansion to additional damage types. Rather than engineering a new organism, the project focuses on engineering biological function, emphasizing control, modularity, and reusability.

Conceptual state transition

  1. The system starts in an OFF (basal) state
  2. Oxidative stress is applied (e.g. H₂O₂ or radiation-induced ROS)
  3. The system enters a “damage state”
  4. A fluorescent signal is activated
  5. The stress is removed
  6. The system relaxes back to its basal state
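
The six steps above can be sketched as a toy two-variable model: a stimulus S that is cleared from the gel, and a fluorescent reporter F that is produced while S is present and then degraded. This is only an illustrative sketch. The model form and every rate constant below (`k_clear`, `k_syn`, `k_deg`) are assumptions chosen for illustration, not measured values or the final circuit design.

```python
# Toy model of the OFF -> damage -> signal -> relaxation cycle.
# All rate constants are hypothetical placeholders, not measured values.

def simulate(pulses, t_end=600.0, dt=0.1,
             k_clear=0.05,   # assumed ROS clearance rate from the gel (1/min)
             k_syn=1.0,      # assumed reporter synthesis per unit ROS (a.u./min)
             k_deg=0.03):    # assumed degron-driven reporter decay rate (1/min)
    """Forward-Euler integration of
         dS/dt = -k_clear * S           (stimulus leaves the gel)
         dF/dt = k_syn * S - k_deg * F  (reporter is made, then degraded)
    `pulses` maps a time in minutes to the amount of ROS added at that time.
    Returns a list of (time, stimulus, fluorescence) tuples."""
    s, f = 0.0, 0.0                      # step 1: OFF (basal) state
    pending = sorted(pulses.items())
    trace = []
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        while pending and pending[0][0] <= t:
            s += pending.pop(0)[1]       # step 2: oxidative stress is applied
        trace.append((t, s, f))
        ds = -k_clear * s                # step 5: the stress is removed over time
        df = k_syn * s - k_deg * f       # steps 3-4 and 6: signal rises, then relaxes
        s += ds * dt
        f += df * dt
    return trace

trace = simulate({10.0: 1.0})            # a single ROS pulse at t = 10 min
peak_t, _, peak_f = max(trace, key=lambda row: row[2])
print(f"peak fluorescence {peak_f:.2f} a.u. at t = {peak_t:.0f} min")
print(f"fluorescence at the end of the run: {trace[-1][2]:.6f} a.u.")
```

Calling `simulate({10.0: 1.0, 300.0: 1.0})` applies a second pulse after the first has decayed, which is one way to explore the reuse behaviour in silico before any wet-lab work.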

Engineering design decisions

| Biological Circuit Controls | Material (Hydrogel) Controls |
|---|---|
| What is detected (ROS, damage, stress) | How much stimulus enters the system |
| What signal is produced (fluorescence) | How fast the stimulus diffuses |
| Activation threshold and sensitivity | How long the stimulus is retained |
| Timing of signal initiation | Rate of stimulus clearance |
| Duration of protein expression | Smoothness of system shutdown |
| Signal termination mechanisms | Buffering of damage spikes |
| Susceptibility to noise or false positives | Protection of TX–TL components |

Key tunable parameters in the system design include:

  • duration of protein expression
  • protein degradation rate
  • response speed
  • energy consumption
  • lifetime of the TX–TL system

Primary and secondary reporting strategy

  • Primary signal: fluorescence intensity
  • Secondary signal: temporal dynamics of activation and decay

Interpretation:

  • Fluorescence intensity reflects the magnitude of ROS-induced damage
  • Signal duration and decay profile reflect the dynamic response of the system under stress

In simpler terms:

  • How much it glows → magnitude of the damage
  • How fast it starts glowing → intensity of the stress
  • How the signal declines → dynamics of the system under damage

Reversibility is not interpreted as a property of the damage itself, but as a designed feature of the biosensor, enabling repeated use under multiple damage cycles.
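
As a rough illustration of how the primary and secondary signals could be read from a recorded fluorescence trace, the sketch below extracts three numbers from one activation cycle: peak intensity, onset time, and decay half-time. The function name `readouts`, the 10% onset threshold, and the synthetic rise-and-decay curve are all assumptions made for this example. Real plate-reader data would need baseline subtraction and noise handling first.

```python
import math

def readouts(times, signal, onset_fraction=0.1):
    """Return (peak, onset_time, decay_half_time) for one activation cycle.
    onset_fraction is an arbitrary illustrative threshold, not a validated one."""
    peak = max(signal)
    i_peak = signal.index(peak)
    # Onset: first time the signal reaches onset_fraction of its peak.
    onset = next(t for t, y in zip(times, signal) if y >= onset_fraction * peak)
    # Decay: first time after the peak that the signal falls to half the peak.
    half = next((t for t, y in zip(times[i_peak:], signal[i_peak:])
                 if y <= 0.5 * peak), None)
    decay_half_time = None if half is None else half - times[i_peak]
    return peak, onset, decay_half_time

# Synthetic trace (hypothetical rates), sampled once per minute for 300 min.
times = [float(t) for t in range(301)]
signal = [50.0 * (math.exp(-0.03 * t) - math.exp(-0.05 * t)) for t in times]

peak, onset, t_half = readouts(times, signal)
print(f"peak = {peak:.2f} a.u., onset = {onset:.0f} min, decay half-time = {t_half:.0f} min")
```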

Circuit architecture

[ROS-sensitive promoter] → [Fluorescent protein + degron] → [Terminator]

Why this is non-trivial (and why it’s innovative)

Poor design choices lead to failure modes such as:
  • Gel too dense → stimulus never reaches the circuit → no activation
  • Gel too loose → excessive activation → no shutdown
  • Reporter too stable → permanent signal → no reuse
  • Circuit too sensitive → noise and false positives

PART 2. PROJECT CONSIDERATIONS
  1. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals, for example those relating to equity or autonomy.

Governance and Policy Considerations

flowchart TB
    G["Governance & Policy Goals"]

    G --> A["Non-malfeasance<br/>(Preventing Harm)"]
    A --> A1["Cell-free TX–TL limits dual-use potential"]
    A --> A2["Avoids human or clinical deployment"]
    A --> A3["Environmentally friendly"]

    G --> B["Safe and Responsible Research"]
    B --> B1["Transparency in system limitations"]
    B --> B2["Reproducibility and containment"]
    B --> B3["Ensuring personal safety and training"]
    B --> B4["Financial responsibility"]

    G --> C["Constructive & Equitable Use"]
    C --> C1["Accessibility of the platform"]
    C --> C2["Supports education and interdisciplinary research"]
    C --> C3["Promotion of heuristic rules/method"]
    
  • Goal 1A. Non-malfeasance (Preventing Harm)
  • Goal 1B. Safe and Responsible Research
  • Goal 1C. Constructive and Equitable Use
  • Sub-Goal 1A. Cell-free TX–TL limits dual-use potential
    • The biosensor is designed as a cell-free system, preventing replication, evolution, or environmental persistence, thereby reducing biosafety and biosecurity risks.
  • Sub-Goal 2A. Avoids human or clinical deployment
    • The system is not intended for in vivo, clinical, or diagnostic use; clear communication of this limitation helps prevent inappropriate application and avoids the ethical concerns raised by animal and human clinical trials.
  • Sub-Goal 3A. Environmentally friendly
    • This project prioritizes environmentally responsible design by relying on hydrogel matrices derived from biodegradable, bio-based, or naturally sourced polymers. Such materials are often obtained from renewable resources or industrial by-products, reducing environmental impact compared to synthetic, non-degradable sensing technologies. Additionally, the reusability of the biosensor minimizes material waste and lowers the frequency of disposal, contributing to a more sustainable experimental practice.
  • Sub-Goal 1B. Transparency in system limitations
    • The biosensor reports oxidative damage via ROS signaling rather than direct radiation dose, and this distinction must be clearly stated to avoid misinterpretation.
  • Sub-Goal 2B. Reproducibility and containment
    • The use of in silico circuit design and controlled TX–TL systems improves reproducibility while minimizing unintended biological interactions.
  • Sub-Goal 3B. Ensuring personal safety and training
    • Because the system is intended for studying radiation-induced damage in controlled environments, its use must be accompanied by appropriate safety protocols and user training. This biosensor is explicitly not designed to replace personal dosimeters or occupational safety monitoring devices. Clear operational guidelines, radiation-handling protocols, and user training are required to ensure that the biosensor is employed strictly as an experimental tool, without increasing risk to personnel.
  • Sub-Goal 4B. Financial responsibility
    • The proposed system emphasizes cost-effective design through the use of low-cost materials, minimal infrastructure requirements, and a reusable sensing strategy. By enabling multiple experimental cycles within the same biosensor material, the system reduces recurring expenses associated with single-use sensors or consumables. This extended operational lifetime represents a significant financial advantage for laboratories and institutions, supporting responsible allocation of economic resources.
  • Sub-Goal 1C. Accessibility of the platform
    • Cell-free and hydrogel-based systems lower infrastructure barriers, making the platform more accessible to educational and research laboratories.
  • Sub-Goal 2C. Supports education and interdisciplinary research
    • The project bridges synthetic biology, materials science, and medical physics while maintaining clear ethical boundaries around scope and use.
  • Sub-Goal 3C. Promotion of heuristic rules
    • This project adopts a heuristic-driven design philosophy, leveraging simple, interpretable rules to guide system construction and experimentation. Material properties, circuit dynamics, and experimental steps are intentionally ordered to maximize efficiency—favoring low-cost, low-complexity processes early and reserving more resource-intensive steps for later stages. This approach improves time efficiency, reduces unnecessary expenditures, and promotes accessible, transferable design strategies that can be adapted across laboratories and disciplines.

PART 3. THE WHO AND THE HOW
  1. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).
Example
  • Purpose: What is done now and what changes are you proposing?
  • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
  • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Governance Action 1 — Mandatory contextual labeling and use limitation

Actor(s): Academic researchers, research institutions, funding agencies.

Purpose

Currently, biosensors designed for radiation-related applications can be misinterpreted as direct radiation detectors or clinical tools. This project proposes a mandatory contextual labeling requirement stating that the system detects ROS-mediated damage, not radiation dose, and is intended strictly for in vitro experimental use. The change ensures that the tool is not misapplied in clinical, occupational, or regulatory contexts.

Design

To make this work, institutions and funding bodies would require that:

  • All documentation, publications, and public-facing descriptions explicitly state the system’s scope and limitations.
  • Experimental protocols include a standardized disclaimer clarifying that the biosensor does not replace dosimeters or personal safety devices.
  • Course projects and academic demonstrations reinforce correct interpretation through documentation templates and reporting guidelines.

Assumptions

This action assumes that misinterpretation is a primary pathway for harm and that clear documentation meaningfully influences user behavior. It also assumes that researchers and students will comply with labeling norms when they are formally required.

Risks of Failure & “Success”

  • Failure risk: Labels may be ignored, especially when the system performs well and appears “sensor-like.”
  • Risk of success: If widely adopted, the tool could become a de facto standard for damage reporting, tempting users to extend it beyond its intended domain without appropriate validation.

Governance Action 2 — Safety training and protocol integration as a prerequisite for use

Actor(s): Research institutions, laboratory safety committees, instructors.

Purpose

Radiation-related experimentation already requires specialized training, but novel biosensors can create a false sense of safety. This action proposes that use of the biosensor be explicitly tied to existing radiation safety training and protocols, reinforcing that the tool supplements—but does not replace—established safety infrastructure.

Design

This action would require:

  • Integration of the biosensor into institutional radiation safety manuals as an experimental reporting tool.
  • Mandatory user training that explains what the biosensor measures, what it does not measure, and how to interpret its output.
  • Oversight by institutional safety committees when the system is used in radiation-adjacent experiments.

Assumptions

This approach assumes that institutions already have safety frameworks capable of absorbing new tools, and that users are more likely to behave responsibly when a technology is embedded within formal safety structures.

Risks of Failure & “Success”

  • Failure risk: Training could become procedural rather than substantive, reducing its effectiveness.
  • Risk of success: If the biosensor becomes normalized within safety workflows, it may be incorrectly perceived as an authoritative indicator of safety rather than an experimental proxy.

Governance Action 3 — Incentivizing reusable, low-waste biosensing systems

Actor(s): Funding agencies, academic programs, sustainability-focused research initiatives.

Purpose

Many sensing technologies are single-use, expensive, or environmentally burdensome. This action proposes incentivizing reusable and low-waste biosensor designs, positioning reusability and material efficiency as desirable research outcomes rather than secondary considerations.

Design

This could be implemented through:

  • Evaluation criteria that favor reusability, material sustainability, and life cycle efficiency.
  • Creation and promotion of open repositories that document reuse cycles, material performance, and design adaptations for biosensing platforms.
  • Recognition or funding bonuses for designs that reduce consumables and experimental waste.

Assumptions

This action assumes that researchers respond to incentive structures and that sustainability metrics can be meaningfully evaluated without stifling innovation or creativity.

Risks of Failure & “Success”

  • Failure risk: Incentives may encourage superficial reuse claims without rigorous validation.
  • Risk of success: Strong emphasis on reuse could discourage exploration of necessary single-use or high-sensitivity designs in certain contexts.

PART 4. HOW WELL DO YOU DO?
  1. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

[Image: governance scoring matrix]

PART 5. PRIORITIES
  1. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

Prioritized governance strategy and rationale

Drawing upon the governance scoring matrix, the most effective strategy for guiding the responsible development and use of the proposed reversible cell-free biosensor is a combined prioritization of Governance Options 1 and 2, with Governance Option 3 acting as a reinforcing, longer-term incentive mechanism.

Primary priority: Governance Options 1 and 2 (combined)

  • Option 1 — Mandatory contextual labeling and use limitation +
  • Option 2 — Safety training and protocol integration as prerequisites

These two options consistently score highest across biosafety, lab safety, and environmental protection, particularly in their ability to prevent incidents rather than merely respond to them. Together, they address the most immediate risks associated with misuse, misinterpretation, or inappropriate deployment of the biosensor.

Option 1 ensures that the system is clearly framed as:

  • A cell-free, non-replicative biosensing platform
  • Not a personal radiation dosimeter
  • Not intended for clinical or in vivo use

This directly reduces the risk of over-interpretation of fluorescence signals and prevents the technology from being deployed outside its validated scope.

Option 2 complements this by embedding the biosensor within existing institutional safety cultures, requiring that users receive appropriate training in:

  • Radiation handling protocols
  • Interpretation of indirect ROS-based signals
  • Limitations of TX–TL systems

Importantly, this option does not introduce new regulatory burdens but instead leverages existing laboratory training and approval workflows, making it both feasible and scalable.

Trade-off considered: These measures may slow early adoption or increase onboarding time for new users. However, this is outweighed by the reduction in misuse risk and the preservation of trust in the technology.


Secondary priority: Governance Option 3 (Incentive-based reinforcement)

Option 3 — Incentivizing reusable, low-waste biosensing systems

While Option 3 scores lower in immediate incident prevention, it plays a crucial role in shaping long-term research behavior and system design choices. Incentives that reward reusability, lifecycle efficiency, and reduced consumables encourage adoption of the very properties that distinguish this biosensor from traditional single-use sensors.

Rather than acting as a primary safeguard, this option functions best as:

  • A structural reinforcement mechanism +
  • A signal to researchers and institutions that sustainability and reuse are valued outcomes

Trade-off considered:
Incentive-based mechanisms depend on institutional uptake and may have uneven effects across well-funded versus resource-limited laboratories. Their impact is therefore slower and less uniform than mandatory requirements.


Assumptions and Uncertainties

This prioritization assumes that:

  • Institutions and laboratories already possess baseline safety infrastructure
  • Users are willing to engage with training and labeling requirements
  • Regulatory bodies are receptive to non-single-use technologies

Uncertainties remain regarding:

  • How fluorescence-based damage reporting might be interpreted by non-experts
  • Variability in institutional enforcement of training standards
  • How incentive structures translate into real design decisions over time

This governance strategy is primarily directed toward:

  1. Institutional biosafety committees and laboratory leadership
  2. Funding agencies and regulatory bodies overseeing research infrastructure
  3. Organizations setting best-practice standards for cell-free and biosensing technologies

By acting at this institutional and regulatory level, the proposed governance combination balances safety, feasibility, innovation, and sustainability, aligning closely with the technical and ethical goals of the project.


WEEK 2 - LECTURE PREP

In preparation for Week 2’s lecture on “DNA Read, Write, and Edit,” please review the following materials:

  • Lecture 2 slides as posted below
  • The associated papers that are referenced in those slides

In addition, answer these questions in each faculty member’s section:

Homework Questions from Professor Jacobson: [Lecture 2 slides]

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA Polymerase error rate and genome fidelity

DNA polymerase, the enzyme responsible for copying DNA during replication, has an intrinsic error rate of approximately 1 mistake per 10⁵ nucleotides incorporated.

The human genome contains about 3 × 10⁹ base pairs. At this raw error rate, tens of thousands of mutations would occur every time a human cell divides, which would be incompatible with life.

How biology addresses this discrepancy

Biological systems reduce replication errors through multiple layers of error correction:

  1. Proofreading by DNA polymerase
    Many DNA polymerases possess 3′→5′ exonuclease activity, which allows them to remove incorrectly incorporated nucleotides immediately. This improves fidelity to roughly 1 error per 10⁷ nucleotides.

  2. Post-replication mismatch repair (MMR)
    Additional cellular repair systems detect and correct mismatches that escape proofreading, further reducing the error rate to approximately 1 error per 10⁹–10¹⁰ nucleotides.

As a result, the final error rate is low enough that most cell divisions occur without introducing harmful mutations.
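
The effect of each correction layer can be checked with a quick calculation of the expected number of errors per copy of the genome, using only the approximate figures quoted above. The dictionary keys and variable names are illustrative, and the mismatch-repair entry uses the less favourable end (10⁻⁹) of the quoted range.

```python
GENOME_BP = 3e9  # approximate human genome length quoted above

# Approximate per-nucleotide error rates quoted above.
error_rates = {
    "polymerase alone": 1e-5,
    "with proofreading": 1e-7,
    "with mismatch repair": 1e-9,  # upper end of the 1e-9 to 1e-10 range
}

# Expected errors per genome copy = genome length x per-base error rate.
expected = {stage: GENOME_BP * rate for stage, rate in error_rates.items()}

for stage, n_errors in expected.items():
    print(f"{stage:22s} ~{n_errors:,.0f} expected errors per genome copy")
```

With these inputs the expected count drops from about 30,000 to about 300 and then to about 3 errors per genome copy, and to roughly 0.3 at the 10⁻¹⁰ end of the range.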


How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Coding Capacity of DNA for an Average Human Protein

An average human protein is approximately 300 amino acids long. Each amino acid is encoded by a codon, a sequence of three nucleotides.

Because there are 4 possible nucleotides (A, T, C, G), there are:

  • 4³ = 64 possible codons
  • Only 20 amino acids (plus stop signals)

This means the genetic code is degenerate, and most amino acids are encoded by multiple codons.

Number of Possible DNA Sequences for One Protein

If an average amino acid is encoded by ~3 synonymous codons, then the total number of possible DNA sequences that could encode a 300–amino acid protein is approximately:

3³⁰⁰

This is an astronomically large number, meaning there are many distinct DNA sequences that can, in theory, encode the same protein.
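
To make the estimate concrete, the sketch below counts synonymous DNA encodings using the codon degeneracy of the standard genetic code (61 sense codons shared among 20 amino acids, which gives the average of about 3 used above). The helper name `count_codings` and the example peptide are hypothetical, and stop codons are ignored.

```python
# Number of codons per amino acid in the standard genetic code (one-letter codes).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}

def count_codings(protein):
    """Number of distinct DNA sequences encoding `protein` (stop codon ignored)."""
    total = 1
    for amino_acid in protein:
        total *= DEGENERACY[amino_acid]
    return total

print(sum(DEGENERACY.values()))   # 61 sense codons in total
print(count_codings("MKWL"))      # 1 * 2 * 1 * 6 = 12 encodings
print(len(str(3 ** 300)))         # number of decimal digits in 3^300
```

The last line shows that 3³⁰⁰ has 144 decimal digits, so the estimate is on the order of 10¹⁴³ possible sequences.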


Why Most Possible Codes Do Not Work in Practice

Despite this theoretical flexibility, not all synonymous DNA sequences function equally well due to several biological constraints:

  1. Codon usage bias
    Organisms preferentially use certain codons over others. Rare codons can slow translation or cause ribosome stalling.

  2. mRNA secondary structure
    Certain nucleotide sequences form stable secondary structures that hinder ribosome binding or elongation.

  3. Translational accuracy and efficiency
    Codon choice can affect misincorporation rates and protein folding during translation.

  4. Regulatory elements embedded in coding sequences
    Coding regions may overlap with regulatory signals affecting splicing, mRNA stability, or localization.

  5. GC content and genome stability
    Extreme nucleotide compositions can impact DNA replication and transcription efficiency.

Because of these factors, only a small subset of all theoretically possible DNA sequences is biologically viable for producing a functional protein at appropriate levels.


Homework Questions from Dr. LeProust: [Lecture 2 slides]

What’s the most commonly used method for oligo synthesis currently?

The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite chemistry.
In this method, DNA is synthesized stepwise from the 3′ to the 5′ end on a solid support. Each cycle consists of four main steps: deprotection, coupling of a phosphoramidite nucleotide, capping of unreacted chains, and oxidation. This approach is highly automated, fast, and reliable, making it the standard technique used by commercial DNA synthesis providers.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

It is difficult to synthesize oligos longer than ~200 nucleotides because errors accumulate with each synthesis cycle.
Each nucleotide addition has a small but nonzero failure rate (incomplete coupling, side reactions, or deletions). As the oligo length increases, these errors compound exponentially, leading to a low fraction of full-length, correct sequences. Additionally, longer oligos are harder to purify effectively, since truncated products differ only slightly in length from the desired product.

Why can’t you make a 2000bp gene via direct oligo synthesis?

A 2000 bp gene cannot be made via direct oligo synthesis because the cumulative error rate would be extremely high, resulting in an almost negligible yield of error-free full-length DNA.
Beyond error accumulation, chemical synthesis efficiency, purification limitations, and cost make direct synthesis impractical at this scale. Instead, long genes are constructed by assembling shorter, overlapping oligos using enzymatic methods such as PCR-based assembly or Gibson assembly, followed by cloning and sequence verification.
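The length limit in the answers above can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a per-step coupling efficiency of 99.5%, which is an illustrative value rather than a figure from the lecture, and treats every failed coupling as a lost chain.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.995) -> float:
    """Fraction of chains that reach full length.

    An oligo of `length_nt` bases needs (length_nt - 1) successful couplings
    after the first base, which is already attached to the support.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 100, 200, 2000):
    fraction = full_length_fraction(length)
    print(f"{length:>5} nt: {fraction:.3g} of chains are full length")
```

Under this assumption roughly a third of the chains are full length at 200 nt, and essentially none at 2000 nt, which is why long genes are assembled from shorter oligos.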


Homework Question from George Church: [Lecture 2 slides]

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in animals are those that cannot be synthesized de novo and therefore must be obtained from the diet:

  1. Histidine
  2. Isoleucine
  3. Leucine
  4. Lysine
  5. Methionine
  6. Phenylalanine
  7. Threonine
  8. Tryptophan
  9. Valine
  10. Arginine (essential in all animals during growth; in many adult animals it is conditionally essential)

How does this affect the view of the “Lysine Contingency”?

The “Lysine Contingency” comes from Jurassic Park: the dinosaurs were engineered so that they could not make lysine and would die unless their keepers supplied it. Because lysine is already essential in all animals, that modification adds no new dependence. Every animal has to obtain lysine from its diet, and it can do so by eating plants, microbes, or other animals. Lysine is, however, often limiting in plant-based diets (especially cereal grains), so the underlying nutritional constraint is real.

This changes my view of the lysine contingency: it is weak as a containment strategy, but the dependence it relies on is biologically significant:

  • Animals are metabolically constrained by the lack of lysine biosynthesis pathways.
  • Ecosystems and food webs are shaped by lysine availability and by organisms (plants, fungi, bacteria) that can synthesize it.
  • It helps explain why lysine supplementation or biofortification (e.g., high-lysine crops) has a major impact on nutrition and health.

Overall, the universality of lysine as an essential amino acid in animals shows that lysine availability is a key nutritional bottleneck rather than a trivial dietary detail, and also that a dependence on a nutrient found in ordinary food is not a reliable biocontainment safeguard.


[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?
[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:
https://arpa-h.gov/explore-funding/programs/boss
https://www.darpa.mil/research/programs/smart-rbc
https://www.darpa.mil/research/programs/go


Resources

  1. Nguyen, P. Q., Soenksen, L. R., Donghia, N. M., Angenent-Mari, N. M., de Puig, H., Huang, A., Lee, R. A., Slomovic, S., Galbersanini, T., Lansberry, G., Sallum, H. M., Zhao, E. M., Niemi, J. B. & Collins, J. J. (2021). Wearable materials with embedded synthetic biology sensors for biomolecule detection. Nature Biotechnology, 39(11). https://doi.org/10.1038/s41587-021-00950-3
  2. Karim, M. M. and Lasker, T. (2025). Electrochemical Biosensors for Cancer Biomarker Detection: Basic Concept, Design Strategy and Cutting‐Edge Development. Electrochemical Science Advances. https://doi.org/10.1002/elsa.70007
  3. Liang, Q., Lu, Y. & Zhang, Q. (2022). Hydrogels‐Based Electronic Devices for Biosensing Applications. In Smart Stimuli-Responsive Polymers, Films, and Gels. https://doi.org/10.1002/9783527832385.ch10
  4. Zhang, M., Xu, T., Liu, K., Zhu, L., Miao, C., Chen, T., Gao, M., Wang, J. & Si, C. (2024). Modulation and Mechanisms of Cellulose‐Based Hydrogels for Flexible Sensors. SusMat, 5. https://doi.org/10.1002/sus2.255
  5. Ahmed, S. N. (2015). Physics and Engineering of Radiation Detection. Choice Reviews Online. https://doi.org/10.1016/C2013-0-15270-1
  6. Ng, K., Ung, N. & Hill, R. (2022). Problems and Solutions in Medical Physics: Radiotherapy Physics. CRC Press. https://doi.org/10.1201/9780429159466
  7. Abaza, A. M. H. (2017). New Trend in Radiation Dosimeters. American Journal of Modern Physics, 7(1), 21-30. https://doi.org/10.11648/j.ajmp.20180701.13
  8. Bartoloni, A. & Strigari, L. (2025). Space Radiobiology: Synergies between Astroparticle and Medical Physics. World Scientific.
  9. Zhang, J., Liu, J., Qiao, L., Zhang, Q., Hu, J. & Zhang, C. (2024). Recent Advance in Single-Molecule Fluorescent Biosensors for Tumor Biomarker Detection. Biosensors, 14(11). https://doi.org/10.3390/bios14110540
  10. Cao, X., Lv, D., Zhang, L & Xing, Z. (2020). Adaptive Governance, Loose Coupling, Forward-Looking Strategies and Responsible Innovation (September 2020). https://doi.org/10.1109/ACCESS.2020.3046095
  11. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/

HTGAA - Week 2: DNA read, write and edit



My Homework

WEEK 2 - CUT, COPY AND STITCH

This week explores the read–write–edit toolkit: sequencing and synthesis workflows, restriction digests and gel electrophoresis, and early genome-editing frameworks.

Lecture (Tues, Feb 10)

DNA Read, Write, & Edit
George Church [slides]
Joe Jacobson [slides]
Emily Leproust [slides]

(The recording will be posted here when available)

Recitation (Wed, Feb 11)

DNA Gel, restriction enzymes, Benchling intro, Twist intro
(▶️Recording | 💻Slides)
Ice Kiattisewee


Documentation

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Instructions

Part 0: Basics of Gel Electrophoresis

Attend or watch all lecture and recitation videos. Optionally watch bootcamp.

Electrophoresis chamber setup (left) and injected gel matrix (right).

Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details. Overview:

  • Make a free account at benchling.com
  • Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes:
    • EcoRI
    • HindIII
    • BamHI
    • KpnI
    • EcoRV
    • SacI
    • SalI
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
  • You might find Ronan’s website a helpful tool for quickly iterating on designs!
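As a rough picture of what the Benchling digest simulation does, the sketch below finds the recognition sites of the seven enzymes in a linear sequence and returns the fragment sizes. The short test sequence is a made-up example; the real Lambda DNA sequence would be passed in instead. The helper name `digest_linear` is hypothetical.

```python
import re

# Recognition site and cut position on the top strand (bases from the start of the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def digest_linear(sequence: str, enzyme: str) -> list[int]:
    """Return fragment lengths (largest first) for a single-enzyme digest of linear DNA."""
    site, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    # All seven sites are palindromic, so searching the top strand is enough.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]
    edges = [0] + cuts + [len(sequence)]
    return sorted((end - start for start, end in zip(edges, edges[1:])), reverse=True)


toy_dna = "AAAAGAATTCTTTTGGATCCAAAA"  # made-up sequence with one EcoRI and one BamHI site
for name in ("EcoRI", "BamHI", "HindIII"):
    print(name, digest_linear(toy_dna, name))
```

Each lane of the gel art corresponds to one such list of fragment sizes, with larger fragments staying closer to the well.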

Ronan Donovan’s electrophoresis simulation

Attempt to generate a Jellyfish design!

Benchling simulation

Generating a gel design with Benchling enzymes on site

Final download:

Yes well, we tried that jellyfish (hope you can see it too!), mad respect to Paul Vanouse.


Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Optional (for those with Lab access)

Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.


Part 3: DNA Design Challenge

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

[Example from our group homework; you may notice the particular format. The example below came from UniProt]

>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL
EAVIRTVTTLQQLLT


Chosen protein: α-Bungarotoxin

I chose α-bungarotoxin, a neurotoxin found in the venom of the snake Bungarus multicinctus.

Bungarus multicinctus - Many-banded krait (left) and α-Bungarotoxin protein form by UniProt (right).

Why?

  • It is an extremely potent toxin.
  • It binds almost irreversibly to nicotinic acetylcholine receptors.
  • It causes paralysis by blocking neuromuscular transmission.
  • It is widely used in neurobiological research.
  • It is relatively small (95 amino acids), which makes it manageable for sequence analysis.

Protein sequence (UniProt-style format)

P60615|3L21A_BUNMU Alpha-bungarotoxin

>sp|P60615|3L21A_BUNMU Alpha-bungarotoxin OS=Bungarus multicinctus OX=8616 PE=1 SV=1
MKTLLLTLVVVTIVCLDLGYTIVCHTTATSPISAVTCPPGENLCYRKMWCDAFCSSRGKV
VELGCAATCPSKKPYEEVTCCSTDKCNPHPKQRPG

(95 amino acids in the UniProt precursor entry, which includes the N-terminal signal peptide; cysteine-rich protein with multiple disulfide bonds)


3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

[Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]

Lysis protein DNA sequence
atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa


As discussed in class, the genetic code is degenerate: multiple codons can encode the same amino acid, so reverse translation from a protein sequence to a DNA sequence is not unique.

Reverse translation was performed using the Bioinformatics reverse translation tool. The resulting DNA sequence is therefore only one of many possible coding sequences for the selected protein.

Complete pdf of the reverse translation: [pdf]

A possible nucleotide sequence corresponding to the selected protein is shown below:

Reverse translation of Untitled to a 285 base sequence of most likely codons

atgaaaaccctgctgctgaccctggtggtggtgaccattgtgtgcctggatctgggctataccattgtgtgccataccaccgcgaccagcccgattagcgcggtgacctgcccgccgggcgaaaacctgtgctatcgcaaaatgtggtgcgatgcgttttgcagcagccgcggcaaagtggtggaactgggctgcgcggcgacctgcccgagcaaaaaaccgtatgaagaagtgacctgctgcagcaccgataaatgcaacccgcatccgaaacagcgcccgggc
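The "most likely codons" approach can be sketched as a simple lookup. The table below is read off the tool output above (one codon per amino acid, as it appears in that sequence) and is not an independent codon-usage table; the function name `reverse_translate` is an illustrative choice.

```python
# One codon per amino acid, taken from the reverse-translated sequence shown above.
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein: str) -> str:
    """Replace each amino acid by a single fixed codon."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


# First ten residues of the alpha-bungarotoxin precursor.
print(reverse_translate("MKTLLLTLVV"))
```

Because every residue always maps to the same codon, this gives exactly one of the many possible coding sequences, which is why a separate codon optimization step is still useful.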


3.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

[Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]

Lysis protein DNA sequence with Codon-Optimization
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA


Why is codon optimization necessary?

Although the genetic code is degenerate, meaning that multiple codons can encode the same amino acid, organisms do not use synonymous codons with equal frequency. Each organism has a preferred codon usage bias that reflects the abundance of its tRNAs.

If a gene containing rare codons is introduced into a heterologous host organism, several issues may arise:

  • Reduced translation efficiency
  • Lower protein yield
  • Increased risk of ribosome stalling
  • Potential misfolding due to slowed or irregular translation kinetics

Therefore, codon optimization is performed to adapt the coding sequence to the codon usage preferences of the chosen host organism, improving translation efficiency and overall protein production.

Selected organism for optimization

The coding sequence was optimized for expression in: Escherichia coli

Why E. coli?

  • It is the most widely used bacterial expression system.
  • It is cost-effective, fast-growing, and easy to genetically manipulate.
  • It is ideal for recombinant protein production.

Although α-bungarotoxin is a cysteine-rich protein containing multiple disulfide bonds, specialized strains (e.g., oxidative cytoplasm strains) or periplasmic targeting strategies can facilitate proper folding.

Codon-optimized sequence for E. coli

Codon optimization was performed using the Expression Optimization (Pilot) algorithm provided by Integrated DNA Technologies. The amino acid sequence was optimized for expression in Escherichia coli while avoiding BsaI, BsmBI and BbsI restriction sites.

The resulting optimized coding sequence (285 bp) is:

ATG AAA ACG TTA CTG CTG ACG CTG GTG GTG GTC ACC ATT GTT TGC CTG GAT CTG GGC TAC ACC ATT GTT TGC CAC ACC ACC GCG ACC TCA CCG ATC TCT GCG GTC ACC TGC CCG CCG GGT GAA AAT CTG TGC TAT CGC AAA ATG TGG TGC GAT GCG TTC TGC AGC AGC CGC GGC AAA GTG GTG GAG CTG GGC TGC GCG GCG ACC TGC CCG AGC AAA AAA CCG TAT GAA GAA GTG ACC TGC TGC AGC ACC GAC AAA TGC AAC CCG CAT CCG AAA CAG CGC CCG GGC

Sequence analysis indicated low complexity (score 2.1), suggesting no anticipated synthesis issues. Internal restriction sites unrelated to the cloning strategy were detected but do not interfere with the intended design.
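Two of the checks mentioned here are easy to reproduce by hand. The sketch below scans a sequence (both strands) for the BsaI, BsmBI and BbsI recognition sites and reports GC content. It is a minimal illustration and not the IDT or Twist algorithm; the helper names are hypothetical.

```python
# Type IIS recognition sites that the optimization was asked to avoid.
FORBIDDEN_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(sequence: str) -> str:
    """Upper-case the sequence and drop spaces or line breaks between codons."""
    return "".join(sequence.upper().split())


def reverse_complement(sequence: str) -> str:
    return clean(sequence).translate(COMPLEMENT)[::-1]


def find_forbidden_sites(sequence: str) -> list[str]:
    """Names of enzymes whose site occurs on either strand."""
    seq = clean(sequence)
    return sorted(
        name
        for name, site in FORBIDDEN_SITES.items()
        if site in seq or reverse_complement(site) in seq
    )


def gc_content(sequence: str) -> float:
    seq = clean(sequence)
    return (seq.count("G") + seq.count("C")) / len(seq)


opening = "ATG AAA ACG TTA CTG CTG"  # first codons of the optimized sequence
print("Forbidden sites:", find_forbidden_sites(opening))
print(f"GC content: {gc_content(opening):.1%}")
```

Running the same two functions on the full 285 bp sequence would be a quick sanity check before placing a synthesis order.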


3.4. We have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.


Once the codon-optimized DNA sequence has been obtained, several biotechnological strategies can be used to produce the corresponding protein. These methods rely on the fundamental biological processes of transcription and translation. Protein production can be achieved using either cell-dependent systems or cell-free expression systems.



3.5. [Optional] How does it work in nature/biological systems?

  1. Describe how a single gene codes for multiple proteins at the transcriptional level.
  2. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.

In natural biological systems, a single gene can give rise to multiple protein products. Although the classical view suggests that one gene encodes one protein, several regulatory mechanisms allow diversification at the transcriptional and post-transcriptional levels.

Mechanisms that allow one gene to produce multiple proteins

1. Alternative splicing

In eukaryotic organisms, genes contain exons and introns. During RNA processing, introns are removed and exons are joined together. However, different combinations of exons can be assembled, producing distinct mRNA variants from the same gene.

This process, known as alternative splicing, results in different protein isoforms with potentially different functions.

2. Alternative promoters

A single gene may contain multiple promoter regions. Depending on which promoter is activated, transcription may begin at different start sites, generating mRNAs with different 5′ ends. This can influence translation efficiency or alter the protein sequence.

3. Alternative translation initiation sites

Some mRNAs contain more than one possible start codon (AUG). Ribosomes may initiate translation at different positions, leading to proteins of different lengths.

4. RNA editing

In certain organisms, specific nucleotides in the RNA sequence are chemically modified after transcription. This can change codons and therefore alter the amino acid sequence of the final protein.

Alignment example: DNA → RNA → Protein

Using our optimized coding sequence as an example:

DNA (coding strand)

5′- ATG AAA ACG TTA CTG CTG -3′

Transcribed mRNA

(Thymine is replaced by uracil)

5′- AUG AAA ACG UUA CUG CUG -3′

Translated protein

Met – Lys – Thr – Leu – Leu – Leu

This alignment illustrates the central dogma of molecular biology:

DNA → RNA → Protein
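The same alignment can be reproduced in a few lines of Python. This is a minimal sketch using the standard genetic code; the function names `transcribe` and `translate` are illustrative, and the output uses one-letter amino acid codes (M K T L L L for Met, Lys, Thr, Leu, Leu, Leu).

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """mRNA has the same sequence as the coding strand, with U instead of T."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    dna = coding_dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGAAAACGTTACTGCTG"
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
print("DNA    ", " ".join(codons))
print("mRNA   ", " ".join(transcribe(c) for c in codons))
print("Protein", "   ".join(translate(dna)))
```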

In prokaryotes such as Escherichia coli, transcription and translation are coupled and occur simultaneously in the cytoplasm. In contrast, in eukaryotic cells, transcription occurs in the nucleus and translation occurs in the cytoplasm after RNA processing.


Part 4: Preparing a Twist DNA Synthesis Order

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

SECTION A. BENCHLING

This is a practice exercise, not necessarily the real Twist order!

4.1. Creating a Twist account, and Benchling account

4.2. Building a DNA Insert Sequence

We’ll make a sequence that will allow E. coli to glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein):

  1. In Benchling, we select New DNA/RNA sequence
  2. Now name the insert sequence and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).
  3. We go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with the codon-optimized DNA sequence of interest). Each time we add a new piece, we annotate it by right-clicking over the sequence and creating an annotation that describes what the piece is (e.g., Promoter, RBS, etc.; see image below).
WHAT YOU SHOULD PINPOINT

Promoter (e.g. BBa_J23106)
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (e.g. BBa_B0034 with spacers for optimal expression)
CATTAAAGAGGAGAAAGGTACC

Start Codon
ATG

Coding Sequence (your codon optimized DNA for a protein of interest, sfGFP for example)
AGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli)
CATCACCATCACCATCATCAC

Stop Codon
TAA

Terminator (e.g. BBa_B0015)
CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

  4. Once this is completed, we click on Linear Map to preview the entire sequence.

Note: This is not required for this exercise, but to share the design with others, ensure that link sharing is turned on!

The insert sequence that was built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). We now download the FASTA file for the sequence made.
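The same cassette can also be assembled outside Benchling. The sketch below concatenates the parts listed above, records where each one starts and ends (the same information as the Benchling annotations), and writes a FASTA record. The coding sequence here is only a short placeholder (the first codons after ATG of the optimized α-bungarotoxin sequence); the record name and the function names are hypothetical.

```python
import textwrap

PARTS = [
    ("promoter", "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"),  # BBa_J23106
    ("rbs", "CATTAAAGAGGAGAAAGGTACC"),                    # BBa_B0034 with spacers
    ("start_codon", "ATG"),
    ("coding_sequence", "AAAACGTTACTGCTG"),               # placeholder, truncated
    ("his_tag", "CATCACCATCACCATCATCAC"),                 # 7x His
    ("stop_codon", "TAA"),
    ("terminator",                                        # BBa_B0015
     "CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGG"
     "TGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA"),
]


def build_cassette(parts):
    """Concatenate parts in order; return the sequence and 1-based annotations."""
    sequence = ""
    annotations = []
    for name, part in parts:
        start = len(sequence) + 1
        sequence += part
        annotations.append((name, start, len(sequence)))
    return sequence, annotations


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    return f">{name}\n" + "\n".join(textwrap.wrap(sequence, width)) + "\n"


cassette, annotations = build_cassette(PARTS)
for name, start, end in annotations:
    print(f"{name:<16} {start:>4}..{end}")
print(to_fasta("expression_cassette", cassette))
```

Keeping the parts as an ordered list makes it easy to swap the coding sequence while leaving the promoter, RBS, tag and terminator untouched.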

It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey the design. Here’s an example of what we just annotated in Benchling:


SECTION B. TWIST


Part 5: DNA Read/Write/Edit

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

For my project — Reversible Cell-Free Biosensor for ROS-Mediated Radiation Damage — we would want to sequence three categories of DNA:

1. The ROS-responsive promoter region
- Specifically, oxidative stress–responsive regulatory elements (e.g., OxyR/SoxR-regulated promoters from E. coli).
Why?
- To verify the exact sequence integrity of the promoter controlling our reporter.
- Small mutations in regulatory regions can drastically alter activation threshold, leakiness, or response dynamics.
- Since our system depends on reversible, tunable activation (not binary irreversible switching), promoter fidelity is critical for predictable behavior.
2. The full genetic construct used in the TX–TL system
This includes:
- Promoter
- RBS
- Reporter gene (e.g., GFP variant)
- Degron tag
- Terminator
Why?
- To confirm assembly correctness after cloning or synthesis.
- To ensure no frameshifts, truncations, or rearrangements occurred.
- To validate that the degron sequence is intact (since reversibility depends on controlled protein degradation).
3. DNA stability after ROS exposure (damage assessment)
- Because the biosensor operates in oxidative environments, we may also sequence recovered plasmid DNA after repeated ROS cycles.
Why?
- To assess oxidative damage accumulation.
- To evaluate mutation rates under stress.
- To determine long-term reusability limits of the system.

This directly connects to governance and safety: understanding failure modes prevents misleading signal interpretation.


(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Also answer the following questions:

  1. Is your method first-, second- or third-generation or other? How so?
  2. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
  3. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
  4. What is the output of your chosen sequencing technology?

For this project, I would use a combination of Sanger sequencing and Illumina sequencing, depending on the question being asked.

1. Sanger


Sanger functioning basics. By Merck©.

2. Illumina


Next-generation sequencing (NGS) vs Sanger sequencing. By Illumina©.

Final Justification for Technology Choice

For the reversible ROS biosensor:

  1. Sanger sequencing is sufficient and ideal for construct validation.
  2. Illumina sequencing becomes valuable when studying oxidative mutation accumulation and long-term robustness.

This sequencing strategy directly supports:

  • Reliability
  • Reversibility characterization
  • Governance considerations
  • Failure-mode understanding
  • Safe system deployment

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

For the reversible ROS-mediated hydrogel biosensor, we would synthesize a minimal genetic circuit designed for oxidative stress detection and transient fluorescent output.

The construct would include:

  • A ROS-responsive promoter (e.g., OxyR-regulated promoter)
  • A ribosome binding site (RBS)
  • A fluorescent reporter gene (e.g., sfGFP)
  • A short degron tag to ensure rapid protein degradation
  • A transcriptional terminator

Why synthesize this DNA?

Because:

  • The promoter must be precisely tuned to oxidative stress.
  • The degron must be fused correctly to ensure reversibility.
  • The full construct must function in a cell-free TX–TL system.
  • Synthetic DNA reduces cloning errors.
  • It enables modular optimization.

We are not synthesizing a whole genome.
We are synthesizing a minimal functional sensing circuit embedded in a biomaterial.


(ii) What technology or technologies would you use to perform this DNA synthesis and why?

Also answer the following questions:

  1. What are the essential steps of your chosen synthesis methods?
  2. What are the limitations of your synthesis method (if any) in terms of speed, accuracy, scalability?

We would use a commercial gene synthesis service such as Twist Bioscience.

Providers of this kind use high-throughput DNA synthesis platforms based on phosphoramidite chemistry and silicon-based parallel synthesis.

Simplified process:

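  • Attach the first base to a solid surface.
  • Remove the protecting group from the growing chain (deprotection).
  • Couple the next chemically protected phosphoramidite nucleotide.
  • Cap chains that failed to react and oxidize the new linkage.
  • Repeat the cycle.

Each cycle adds one base, and the whole process is automated.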

For longer fragments:

  • Short oligos are synthesized.
  • Then assembled enzymatically into longer genes.
  • Verified by sequencing.
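The "short oligos, then assembly" idea can be sketched as tiling a gene into overlapping pieces. This is a simplified picture: it tiles only the top strand, whereas real assembly designs usually alternate strands and tune overlaps by melting temperature. The oligo length and overlap values, and the function names, are assumptions for illustration.

```python
def tile_oligos(gene: str, oligo_length: int = 60, overlap: int = 20) -> list[str]:
    """Split `gene` into oligos of `oligo_length` that share `overlap` bases."""
    if not 0 < overlap < oligo_length:
        raise ValueError("overlap must be positive and shorter than the oligo")
    step = oligo_length - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(gene[start:start + oligo_length])
        if start + oligo_length >= len(gene):
            break
        start += step
    return oligos


def reassemble(oligos: list[str], overlap: int) -> str:
    """Join the oligos back together through their shared overlaps."""
    gene = oligos[0]
    for oligo in oligos[1:]:
        if not gene.endswith(oligo[:overlap]):
            raise ValueError("neighbouring oligos do not overlap as expected")
        gene += oligo[overlap:]
    return gene


toy_gene = "ACGTTGCA" * 10  # 80 nt made-up sequence
pieces = tile_oligos(toy_gene, oligo_length=30, overlap=10)
print(len(pieces), "oligos:", [len(p) for p in pieces])
print("Reassembled correctly:", reassemble(pieces, overlap=10) == toy_gene)
```

Each oligo stays well below the length where direct synthesis yield collapses, and the overlaps carry the ordering information that the enzymatic assembly step relies on.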

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

For this biosensor, I would edit the genetic circuit itself to optimize sensing dynamics, reversibility, and robustness.

Specifically, we would edit:


(ii) What technology or technologies would you use to perform these DNA edits and why?

Also answer the following questions:

  1. How does your technology of choice edit DNA? What are the essential steps?
  2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
  3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The most appropriate technology for precise edits in genetic constructs would be:

CRISPR-based editing systems.

Specifically: CRISPR-Cas9

Limitations of DNA Editing Methods

1. Off-target effects (CRISPR)

  • Cas9 can cut unintended regions.
  • Less relevant for small plasmids, more relevant for genomes.

2. Efficiency variability

  • Not all cells incorporate edits.
  • Requires screening.

3. Repair pathway dependence

  • Precise edits require homologous recombination.
  • Not always efficient.

4. Context sensitivity

  • Changing one base can unpredictably alter promoter behavior.
  • Requires iterative testing.

For this project DNA editing is not strictly required for initial system implementation. However, it would be essential for iterative optimization of promoter sensitivity, degradation kinetics, and response tuning.


Resources

  1. Secuenciación Sanger Pasos y método [Sanger sequencing: steps and method]. (n.d.). Merck©. https://www.sigmaaldrich.com/MX/es/technical-documents/protocol/genomics/sequencing/sanger-sequencing
  2. Differences between NGS and Sanger sequencing. (n.d.). Illumina©. https://www.illumina.com/science/technology/next-generation-sequencing/beginners/advantages/ngs-vs-sanger.html
  3. A Simple Guide to Phosphoramidite Chemistry and How it Fits in Twist Bioscience’s Commercial Engine. Twist Bioscience. https://www.twistbioscience.com/blog/science/simple-guide-phosphoramidite-chemistry-and-how-it-fits-twist-biosciences-commercial
  4. CRISPR: ¿Qué es y cómo funciona? [CRISPR: What is it and how does it work?]. (n.d.). genotipia. https://genotipia.com/crispr-cas/
  5. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/

HTGAA - Week 3: Lab Automatisation



My Homework

WEEK 3 - CODING AND AUTOMATISATION

This week we get hands-on (or at least code-on) with pipetting robots.

Lecture (Tues, Feb 17)

NO LECTURE THIS WEEK!

Recitation (Wed, Feb 18)

Lab automation: autonomous lab intro
(▶️Recording | 💻Slides)
Ronan Donovan, Alex Tran, Ana Arkhipenkova, Kourosh Afshinjoo


Instructions

1. Assignment: Python Script for Opentrons Artwork

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Your task this week is to Create a Python file to run on an Opentrons liquid handling robot.

  1. Review this week’s recitation and this week’s lab for details on the Opentrons and programming it.
  2. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com.

The original idea was to create a piece based on Gothic architecture featuring a stained glass rose window.

The inspo vs the reality.

However, the result was closer to a Mario Bros castle and I didn’t quite like it. So I made a second attempt with two options: one to satisfy my Gothic rose window greed, and a simpler one with a Snoopy design, keeping in mind the time required to create it on the Opentrons machine.

The first idea vs the final idea

Rose window (left), full final design (center) and simplified final design (right).

The link to the final published design on the GUI site: Click here


  3. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons.
    • You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept.
    • If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead.

For the Python code in Google Colab:

I tried to write the Python file myself instead of relying only on the Python download from Ronan’s site, and I encountered a few issues while coding.

HTGAA Opentrons Setup Code Analysis

1. Environment Setup

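import sys, os

# Build a private package folder that matches the running Python version.
py = f"{sys.version_info.major}.{sys.version_info.minor}"
PKG = f"/content/venv/lib/python{py}/site-packages"
os.makedirs(PKG, exist_ok=True)

# Put that folder first on the import path so its packages take priority.
if PKG not in sys.path: sys.path.insert(0, PKG)

# Make pip install into the folder and skip the user site-packages directory.
os.environ["PIP_TARGET"] = PKG
os.environ["PYTHONNOUSERSITE"] = "1"

# Colab/IPython magic: install Opentrons and its dependencies into PKG.
%pip install -q --upgrade --target "$PKG" opentrons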

Explanation:

  • Google Colab comes with a newer numpy version that is incompatible with Opentrons.
  • To avoid restarting the runtime repeatedly, we create a venv-like environment where Opentrons and its compatible dependencies are installed.
  • This ensures the rest of the protocol works without conflicts.

plt.rcParams["figure.figsize"] = (10,10) sets the default figure size for visualizations of the Petri dish and droplets.


2. Petri Dish Constants

PETRI_INNER_DIAMETER = 84
MAX_DRAW_RADIUS = PETRI_INNER_DIAMETER/2 - 2

Explanation:

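  • PETRI_INNER_DIAMETER is the inner diameter of the dish (84 mm), chosen to simulate a real 90 mm plate.
  • MAX_DRAW_RADIUS (84/2 - 2 = 40 mm) leaves a 2 mm margin to prevent dispensing outside the plate due to tip size or miscalibration.
  • The raw GUI coordinates reach ±36.3 mm on each axis, so a point such as (-25.3, 36.3) lies about 44.2 mm from the center, outside the 40 mm limit. After scaling by SCALE = 0.7 the same point is about 31.0 mm from the center, safely inside.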

3. Helper Classes and Functions

nullLocation

nullLocation = types.Location(types.Point(x=250, y=250, z=250), None)

  • Placeholder for pipette location before dispensing anything.

same2DLocation(loc1, loc2): Compares x and y only, ignores z, to detect whether two points are essentially the same on the Petri dish.
mock_print(str): A silent print function used instead of standard print(), to avoid cluttering output logs during simulation.
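As a rough illustration of these helpers, here is a minimal sketch. It assumes a plain namedtuple in place of the Opentrons `types.Point` object so that it runs without the library; the snake_case names and the `tol` parameter are hypothetical and are not taken from the notebook.

```python
from collections import namedtuple

# Stand-in for opentrons.types.Point (assumption: only x, y, z are needed here).
Point = namedtuple("Point", ["x", "y", "z"])

# Far-away placeholder used before anything has been dispensed.
NULL_POINT = Point(x=250, y=250, z=250)


def same_2d_location(p1, p2, tol=1e-6):
    """Return True when two points share x and y; z is ignored."""
    return abs(p1.x - p2.x) <= tol and abs(p1.y - p2.y) <= tol


def mock_print(*args, **kwargs):
    """Silent replacement for print(): accepts anything and does nothing."""
    return None


a = Point(10.0, -5.0, 2.0)
b = Point(10.0, -5.0, 30.0)  # same spot on the dish, different height
print(same_2d_location(a, b))           # True
print(same_2d_location(a, NULL_POINT))  # False
```

Ignoring z is what lets the simulator treat "hovering above a spot" and "touching that spot" as the same dish position.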


4. Pipette Simulation Class (PipetteSim)

This is the heart of the setup, emulating an Opentrons pipette for aspirating, dispensing, and tracking droplets.

Key components:

self.droplets_x, self.droplets_y, self.droplets_size, self.droplets_color

  • Tracks droplet positions, sizes, and colors for visualization.

self.smears

  • Originally draws lines connecting sequential dispenses to simulate smearing/dragging of droplets.

Important: SMEAR Handling

# for xlist,ylist,color in self.smears:  
#     plt.gca().plot(xlist, ylist, color=color, linewidth=4, solid_capstyle='round')   
  • Commented out to remove unwanted lines in the visualization.
  • Concept: Each time the pipette moves after dispensing, the simulator connects the last droplet to the new location with a line.
  • We replaced it with plt.scatter() for droplets only, avoiding the “demonic laser beams of death” - ChatGPT, 2026.
Figures 1-3 (from the left) show the output with the self.smears lines still active; figure 4 (far right) shows the output after commenting them out: the before and after.

5. Scaling and Coordinates

  • Coordinates for droplets (like electra2_points from the GUI site) originally go up to ±36.3 mm.
  • With SCALE = 0.7, all points safely fit inside the MAX_DRAW_RADIUS = 40 mm.

This prevents runtime errors like:

ValueError: Dispensing outside "safe" area: Point (-25.3, 36.3) is more than 40.0mm away

Math used: simple multiplication for scaling each (x, y) coordinate

scaled_x = original_x * SCALE
scaled_y = original_y * SCALE

Mathematical calculations for coordinate scaling

We scale each coordinate $(x, y)$ to fit inside the safe dispensing area using a scale factor:

$$ x_{\text{scaled}} = x_{\text{original}} \cdot SCALE $$ $$ y_{\text{scaled}} = y_{\text{original}} \cdot SCALE $$

Example:

  • Original point: $(-36.3, 25.3)$ mm
  • Scale factor: $SCALE = 0.7$

$$ x_{\text{scaled}} = -36.3 \cdot 0.7 = -25.41 \text{ mm} $$
$$ y_{\text{scaled}} = 25.3 \cdot 0.7 = 17.71 \text{ mm} $$

Radius Check

The simulator only allows dispensing inside the safe radius:

$$ R_{\text{max}} = \frac{\text{PETRI INNER DIAMETER}}{2} - 2 = 40 \text{ mm} $$

A point $(x_{\text{scaled}}, y_{\text{scaled}})$ is valid if:

$$ \sqrt{x_{\text{scaled}}^2 + y_{\text{scaled}}^2} \leq R_{\text{max}} $$

Example:

$$ \sqrt{(-25.41)^2 + (17.71)^2} = \sqrt{645.7 + 313.6} = \sqrt{959.3} \approx 30.97 \text{ mm} < 40 \text{ mm} $$

Safe to dispense.
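The scaling and radius check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the constants come from the setup code above, while the function names `scale_point` and `check_safe` and the wording of the error message are hypothetical.

```python
import math

PETRI_INNER_DIAMETER = 84                       # mm, from the setup code
MAX_DRAW_RADIUS = PETRI_INNER_DIAMETER / 2 - 2  # 40 mm safe radius
SCALE = 0.7


def scale_point(x, y, scale=SCALE):
    """Multiply both coordinates by the scale factor."""
    return x * scale, y * scale


def check_safe(x, y, max_radius=MAX_DRAW_RADIUS):
    """Return the distance from the dish center, or raise if it is too far."""
    r = math.hypot(x, y)
    if r > max_radius:
        raise ValueError(
            f"Point ({x}, {y}) is {r:.2f} mm from the center, beyond {max_radius} mm"
        )
    return r


xs, ys = scale_point(-36.3, 25.3)
r = check_safe(xs, ys)
print(round(xs, 2), round(ys, 2), round(r, 2))  # -25.41 17.71 30.97
```

Calling `check_safe(-25.3, 36.3)` on the unscaled point from the error message raises, because that point lies about 44.2 mm from the center.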


Droplet Size Mapping

Droplet volume $V$ in μL is mapped to a visual size $S$ for plotting:

$$ S = V \cdot K $$ Where $K = 100$ in our code.

Example:

$$ V = 1 \mu L \implies S = 1 \cdot 100 = 100 \text{ (scatter marker size)} $$

Summary Formula for Visualization

For each original coordinate $(x, y)$ and droplet volume $V$: $$ \begin{cases} x_{\text{scaled}} = x \cdot SCALE \\ y_{\text{scaled}} = y \cdot SCALE \\ S = V \cdot 100 \\ \text{Check: } \sqrt{x_{\text{scaled}}^2 + y_{\text{scaled}}^2} \leq 40 \end{cases} $$

Example Table

| Original $(x, y)$ | Scaled $(x, y)$ | Volume $(\mu L)$ | Size $S$ |
|---|---|---|---|
| (-36.3, 25.3) | (-25.41, 17.71) | 1 | 100 |
| (29.7, -16.5) | (20.79, -11.55) | 2 | 200 |
| (-12.1, -36.3) | (-8.47, -25.41) | 0.5 | 50 |
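The rows of the example table can be reproduced with a short script. This is a sketch that reuses SCALE = 0.7 and K = 100 from the text; the helper name `table_row` is hypothetical.

```python
SCALE = 0.7
K = 100  # marker-size factor, S = V * K

# (original point in mm, volume in µL) for each row of the example table
ROWS = [((-36.3, 25.3), 1), ((29.7, -16.5), 2), ((-12.1, -36.3), 0.5)]


def table_row(point, volume_ul, scale=SCALE, k=K):
    """Return the scaled point (rounded to 0.01 mm) and the marker size."""
    x, y = point
    scaled = (round(x * scale, 2), round(y * scale, 2))
    return scaled, volume_ul * k


for point, volume in ROWS:
    scaled, size = table_row(point, volume)
    print(point, scaled, volume, size)
```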

AI really helped make these calculations neat and quick to implement in the Python code.


6. Pipette Operations

Dispense

self.droplets_x.append(location.point.x)  
self.droplets_y.append(location.point.y)  
self.droplets_size.append(volume * 100)  
self.droplets_color.append(color)      
  • Maps volume → size of droplet visually (unprincipled scaling, but works for display).
  • Updates self.totalDispensed to track volumes per color.

Aspirate

  • Checks for tip presence, maximum volume, and cross-contamination.
  • Updates self.totalAspirated.

Pick Up & Drop Tip

  • Ensures the pipette is always aware of whether it holds a tip, preventing accidental dispensing or aspirating without one.
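To make the bookkeeping concrete, here is a toy model of the tip, aspirate, and dispense checks described above. It is not the PipetteSim class from the notebook: the class name, the 20 µL capacity, and the error messages are all assumptions made for illustration.

```python
class TinyPipetteSim:
    """Toy model of tip, aspirate, and dispense bookkeeping (hypothetical)."""

    def __init__(self, max_volume=20.0):
        self.max_volume = max_volume  # µL, assumed capacity
        self.has_tip = False
        self.volume = 0.0
        self.color = None
        self.total_aspirated = {}
        self.total_dispensed = {}
        self.droplets = []  # (x, y, marker_size, color)

    def pick_up_tip(self):
        if self.has_tip:
            raise RuntimeError("Already holding a tip")
        self.has_tip = True

    def drop_tip(self):
        if not self.has_tip:
            raise RuntimeError("No tip to drop")
        self.has_tip, self.volume, self.color = False, 0.0, None

    def aspirate(self, volume, color):
        if not self.has_tip:
            raise RuntimeError("Cannot aspirate without a tip")
        if self.color not in (None, color):
            raise RuntimeError("Cross-contamination: change the tip first")
        if self.volume + volume > self.max_volume:
            raise RuntimeError("Volume exceeds pipette capacity")
        self.volume += volume
        self.color = color
        self.total_aspirated[color] = self.total_aspirated.get(color, 0) + volume

    def dispense(self, volume, x, y):
        if not self.has_tip:
            raise RuntimeError("Cannot dispense without a tip")
        if volume > self.volume:
            raise RuntimeError("Not enough liquid in the tip")
        self.volume -= volume
        self.droplets.append((x, y, volume * 100, self.color))  # volume -> size
        self.total_dispensed[self.color] = (
            self.total_dispensed.get(self.color, 0) + volume
        )


p = TinyPipetteSim()
p.pick_up_tip()
p.aspirate(5, "pink")
p.dispense(1, -25.41, 17.71)
print(p.droplets, p.total_dispensed)
```

Keeping the color on the tip after dispensing is what allows the cross-contamination check to fire if a second color is aspirated before `drop_tip()`.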

7. Petri Dish Mapping (petriLocOfWell)

x=(x-ord('D')) * MAX_DRAW_RADIUS/4  
y=(y-6) * MAX_DRAW_RADIUS/6  
  • Converts well IDs (A1-H12) into (x, y) coordinates on the Petri dish.
  • ord('D') and y-6 center the mapping around the dish.
  • Ensures droplets are placed accurately relative to the plate center.
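The two lines above can be wrapped into a small function. This is a sketch only: it assumes the row letter feeds x and the column number feeds y (inferred from `ord('D')` and `y-6`), and the snake_case name and the parsing of the well ID are hypothetical.

```python
MAX_DRAW_RADIUS = 40  # mm, from the Petri dish constants


def petri_loc_of_well(well_id):
    """Map a well ID such as 'A1' to (x, y) in mm relative to the dish center."""
    row_letter = well_id[0].upper()
    column = int(well_id[1:])
    x = (ord(row_letter) - ord("D")) * MAX_DRAW_RADIUS / 4
    y = (column - 6) * MAX_DRAW_RADIUS / 6
    return x, y


print(petri_loc_of_well("D6"))  # (0.0, 0.0)
print(petri_loc_of_well("A1"))
print(petri_loc_of_well("H12"))
```

Under these assumptions, corner wells such as H12 map to (40.0, 40.0), which is farther than 40 mm from the center, so a radius check is still needed after the mapping.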
Visual Representation of the Petri Dish and Safe Dispensing Area

We define:

  • Petri dish radius:
$R_\text{petri} = 42\text{ mm}$ (inner diameter 84 mm)
  • Safe dispensing radius: $R_\text{max} = 40\text{ mm}$
  • Sample points scaled by $SCALE = 0.7$

Legend:

  • Black circle: Petri dish outer edge
  • Red dashed circle: Maximum safe dispensing radius
  • Colored dots: Scaled droplet coordinates
  • Axes: $x$ and $y$ in mm

How it works:

  • The \draw commands make the dish and safe area.
  • The \filldraw commands place your points after scaling with SCALE = 0.7.
  • You can add more points by duplicating \filldraw[...] (x_scaled, y_scaled) ....

8. Visualization (visualize())

  • Draws the Petri dish with plt.Circle.
  • Displays droplets with plt.scatter.

Smears are commented out to prevent unwanted lines:

# for xlist,ylist,color in self.smears:  
#    plt.gca().plot(...)      
  • X and Y limits are set slightly beyond the dish to avoid clipping.
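A stripped-down version of this kind of plot might look like the following. It is a sketch, not the notebook's visualize() method: the function name `draw_dish`, the 3 mm padding, and the color names are assumptions, and the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

PETRI_RADIUS = 42      # mm (inner diameter 84 mm)
MAX_DRAW_RADIUS = 40   # mm


def draw_dish(xs, ys, sizes, colors, pad=3):
    """Draw the dish outline, the safe radius, and the droplets only."""
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.add_patch(plt.Circle((0, 0), PETRI_RADIUS, fill=False, edgecolor="black"))
    ax.add_patch(
        plt.Circle((0, 0), MAX_DRAW_RADIUS, fill=False,
                   edgecolor="red", linestyle="--")
    )
    ax.scatter(xs, ys, s=sizes, c=colors)  # no smear lines, droplets only
    ax.set_xlim(-PETRI_RADIUS - pad, PETRI_RADIUS + pad)
    ax.set_ylim(-PETRI_RADIUS - pad, PETRI_RADIUS + pad)
    ax.set_aspect("equal")
    return fig, ax


fig, ax = draw_dish(
    xs=[-25.41, 20.79, -8.47],
    ys=[17.71, -11.55, -25.41],
    sizes=[100, 200, 50],
    colors=["hotpink", "blue", "purple"],
)
# fig.savefig("dish_preview.png")  # uncomment to write the image to disk
```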

9. Color & Well Handling

Additionally, we discovered that in the simulator:

  • Well A1 gives pink, A2 gives blue, B1 gives purple, C1 gives green, and D1 gives yellow.
  • Columns beyond D may not exist in some mock labware.
  • This required careful mapping of colors to well IDs.
  • We also used the color mapping to differentiate bio-inks visually.
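One simple way to keep this mapping in a single place is a dictionary lookup. The well assignments below are the ones observed in the simulator; the names `COLOR_TO_WELL` and `well_for_color` are hypothetical.

```python
# Color -> source well, as observed in the simulator
COLOR_TO_WELL = {
    "pink": "A1",
    "blue": "A2",
    "purple": "B1",
    "green": "C1",
    "yellow": "D1",
}


def well_for_color(color):
    """Look up the source well for a bio-ink color, failing loudly if unknown."""
    try:
        return COLOR_TO_WELL[color.lower()]
    except KeyError:
        raise ValueError(f"No well recorded for color {color!r}") from None


print(well_for_color("Blue"))  # A2
```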

10. Optional Future Feature

  • A PNG → Opentrons coordinates converter could automate mapping any pixel art (Snoopy, logos, text) into pipette instructions (this part really makes your life easier!).
  • Could be useful for quickly generating complex designs. However, we still have to scale the coordinates.
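As a hint of how such a converter could work, the sketch below maps a tiny text-based pixel grid to dish coordinates and scales it to fit the safe radius. Everything here is an assumption for illustration: a real converter would read a PNG, and the function name `grid_to_points`, the `#` marker, and the 0.9 margin are hypothetical.

```python
import math

MAX_DRAW_RADIUS = 40  # mm


def grid_to_points(grid, max_radius=MAX_DRAW_RADIUS, margin=0.9):
    """Map '#' cells of a text grid to centered (x, y) mm coordinates."""
    rows = len(grid)
    cols = max(len(line) for line in grid)
    cx, cy = (cols - 1) / 2, (rows - 1) / 2
    # Row 0 is the top of the picture, so y is flipped.
    raw = [
        (c - cx, cy - r)
        for r, line in enumerate(grid)
        for c, ch in enumerate(line)
        if ch == "#"
    ]
    if not raw:
        return []
    farthest = max(math.hypot(x, y) for x, y in raw)
    # Scale so the farthest pixel sits at margin * max_radius.
    scale = margin * max_radius / farthest if farthest else 1.0
    return [(x * scale, y * scale) for x, y in raw]


HEART = [
    ".#.#.",
    "#####",
    ".###.",
    "..#..",
]
points = grid_to_points(HEART)
print(len(points), max(math.hypot(x, y) for x, y in points))
```

Choosing the scale from the farthest pixel removes the need to guess a value like SCALE = 0.7 by hand.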

Summary of ChatGPT - AI Contributions

  • Analyzed and adapted the Opentrons mock environment to work in Colab with new numpy versions.
  • Applied scaling (SCALE = 0.7) to prevent MAX_DRAW_RADIUS errors.
  • Commented out smears to clean the visualization (plt.scatter() only).
  • Helped map real coordinates and colors into Opentrons wells for the simulator.
  • Explained the logic behind dispense, aspirate, tip handling, and visualization.
  • Suggested a PNG → coordinates converter for rapid design automation.

Now, for the code used

The colors specified by the Lifefabs Institute, London - Node are blue, pink and purple, so two versions were made.

Link to the Google Colab Opentrons Python notebook: Click here


The final take

Final design in pink and purple (left) and second final design option in blue and pink (right).

4. If the Python component is proving too problematic even with AI and human assistance, download the full Python script from the GUI website and submit that:

Use the download icon pointed to by the red arrow in this diagram.


These are the Python files with the final design downloaded directly from the GUI site:

5. If you use AI to help complete this homework or lab, document how you used AI and which models made contributions.

Did you use AI to help write your code? If so, what was your experience & which AI tool did you find most helpful?

Did I use AI? For sure! I used AI to help write and optimize my code. I primarily used ChatGPT, which was extremely helpful in reviewing my code, explaining tricky parts, and suggesting optimizations. I also tried Google Colab’s Gemini, but I found its responses less useful and not satisfactory for my needs, even when providing it with access to the code. ChatGPT really guided me step by step, helping me understand how to structure the Opentrons protocol correctly and troubleshoot potential issues, which made the process much smoother and more reliable.

That said, even with ChatGPT’s guidance, we encountered several issues that we were not able to fully resolve, so while it significantly helped improve and clarify the code, it didn’t solve every problem.


  1. Sign up for a robot time slot if you are at MIT/Harvard/Wellesley or at a Node offering Opentrons automation. The Python script you created will be run on the robot to produce your work of art!
    • At MIT/Harvard? Lab times are on Thursday Feb.19 between 10AM and 6PM.
    • At other Nodes? Please coordinate with your Node.
  2. Submit your Python file via this form.

2. Post-Lab Questions

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

The paper chosen was:

PlasmoTron: an open-source platform for automated culture of malaria parasites.

Sanderson, T. & Rayner, J. C. (2018). PlasmoTron: an open-source platform for automated culture of malaria parasites. bioRxiv. https://doi.org/10.1101/241596

About this article:


Also, some other papers that are very interesting about this topic:

1. Semi-automated Production of Cell-free Biosensors.

Brown, D. M., Phillips, D. A., Garcia, D. C., et al. (2024). Semi-automated Production of Cell-free Biosensors. bioRxiv. https://doi.org/10.1101/2024.10.13.618078

2. Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research.

Hatakeyama-Sato, K., Nishida, T., Kitamura, K., et al. (2025). Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research. arXiv. arXiv:2506.12312 [cs.RO]. https://doi.org/10.48550/arXiv.2506.12312

3. BOTany Methods: Accessible Automation for Plant Synthetic Biology.

Qiande, M., Lin, A., Larson, L., et al. (2026). BOTany Methods: Accessible Automation for Plant Synthetic Biology. Plant Physiology. https://doi.org/10.1093/plphys/kiag066


  1. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory.

Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell-free protein synthesis.

  1. Echo transfer biosensor constructs and any required cofactors into specified wells.
  2. Bravo stamp in CFPS reagent master mix into all wells of a 96-well / 384-well plate.
  3. Multiflo dispense the CFPS lysate to all wells to start protein expression.
  4. PlateLoc seal the plate.
  5. Inheco incubate the plate at 37°C while the biosensor proteins are synthesized.
  6. XPeel remove the seal.
  7. PHERAstar measure fluorescence to compare biosensor responses.

I decided to hold off on this section for the moment, since I might change my project this week!


3. Final Project Ideas

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

As explained in this week’s recitation, add 1-3 slides in your Node’s section of this slide deck with 3 ideas you have for an Individual Final Project. Be sure to put your name, city, and country on your slide!


The submitted project ideas are as follows:

Project N° 1: Dual-Signal Biosensor for Functional Radiation Dosimetry


Project N° 2: Living Sound-to-Color Interface Using Optogenetic Bacteria


Project N° 3: Engineered Microbial Sensor for Deep-Ocean Environments


Resources

  1. Opentrons API Documentation: https://docs.opentrons.com/python-api/
  2. Opentrons Artwork GUI Website: http://opentrons-art.rcdonovan.com/
  3. Opentrons Artwork Colab: HTGAA26 Opentrons Colab
  4. Automation Equipment: HTGAA 2026 Recitation: Lab Automation, Opentrons Art, Intro to Cloud Laboratories
  5. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/

HTGAA - Week 4: Protein Design Part I



My Homework

WEEK 4 - BIOINFORMATICS PART I

This week focuses on how sequence, structure, and energetics can be modeled and manipulated to create or optimize proteins with specified functions.

Lecture (Tues, Feb 24)

Protein Design Part I
(▶️Recording)
Thras Karydis, Jon Kaufman

Recitation (Wed, Feb 25)

Protein folding
(▶️Recording)
Allan Costa


Protein Design I

Objective:

  1. Learn basic concepts:
    • amino acid structure
    • 3D protein visualization
    • the variety of ML-based design tools
  2. Brainstorm as a group how to apply these tools to engineer a better bacteriophage (setting the stage for the final project).

Part A. Conceptual Questions

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

The following questions will be answered: 1, 2, 3, 4, 5, 6, 8, 9, and 10

The following questions will be left unanswered: 7 and 11


1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

To estimate how many amino acid molecules are consumed in a 500 g piece of meat, we need to make reasonable biochemical assumptions.


1.1. Estimate protein content in meat

On average, raw beef contains approximately 20–25% protein by weight.

To stay conservative, we assume:

Protein content ≈ 20%

\[ 500 \, g \times 0.20 = 100 \, g \text{ protein} \]

So, 500 g of meat contains approximately:

100 g of protein


1.2. Estimate number of moles of amino acids

We are told that the average amino acid has a molecular weight of approximately:

\[ 100 \, \text{Daltons} = 100 \, g/mol \]

If we assume protein is fully hydrolyzed into individual amino acids, then:

\[ \frac{100 \, g}{100 \, g/mol} = 1 \, mol \]

So, 100 g of amino acids corresponds to:

1 mole of amino acids


1.3. Convert moles to number of molecules

Using Avogadro’s number:

\[ 1 \, mol = 6.022 \times 10^{23} \text{ molecules} \]

Therefore:

You ingest approximately 6.0 × 10²³ amino acid molecules
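The three steps can be checked with a few lines of Python. This is a sketch of the same order-of-magnitude estimate; the function name and parameter names are hypothetical, and the 25% case simply uses the upper end of the protein range quoted above.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_count(meat_g=500, protein_fraction=0.20, residue_mass=100):
    """Estimate amino acid molecules in a piece of meat (masses in g, g/mol)."""
    protein_g = meat_g * protein_fraction   # step 1.1
    moles = protein_g / residue_mass        # step 1.2
    return moles * AVOGADRO                 # step 1.3


print(f"{amino_acid_count():.2e}")                       # 6.02e+23
print(f"{amino_acid_count(protein_fraction=0.25):.2e}")  # 7.53e+23
```

Moving from 20% to 25% protein changes the result by a quarter, which does not affect the order of magnitude.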


Final Answer

A 500 g piece of meat contains on the order of:

~ 6 × 10²³ amino acid molecules

(approximately one mole of amino acids)


Important Notes

  • This is an order-of-magnitude estimate.
  • Real proteins are polymers, so their molecular weights are much larger.
  • The calculation assumes complete digestion into free amino acids.
  • Water content and protein percentage vary by meat type and preparation.

Interpretation

Eating 500 g of meat means consuming an Avogadro-scale quantity of amino acids: about 6 × 10²³ individual molecules, or roughly one mole.

This illustrates how biological systems operate at unimaginably large molecular scales, even in everyday nutrition.


2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

The answer is straightforward: we do not incorporate foreign organisms as whole structures — we digest them into molecular building blocks.


2.1. Digestion breaks macromolecules into basic units

Proteins in beef or fish are large, highly ordered macromolecules. During digestion:

  • Stomach acid (HCl) denatures proteins.
  • Proteases such as pepsin, trypsin, and chymotrypsin cleave peptide bonds.
  • Proteins are hydrolyzed into short peptides and free amino acids.

By the time nutrients are absorbed in the small intestine, the original protein structures no longer exist.

We absorb:

  • Amino acids
  • Simple sugars
  • Fatty acids
  • Nucleotides

not intact tissues


2.2. Molecular identity is lost during digestion

A cow muscle protein (for example, bovine actin) is not transferred into your muscles as bovine actin. It is broken down into its constituent amino acids:

\[ \text{Protein} \rightarrow \text{Amino acids} \]

Those amino acids are chemically indistinguishable from amino acids obtained from fish, plants, or synthesized endogenously.

At the molecular level:

An amino acid is just an amino acid — it carries no “species identity.”


2.3. Your genome determines what you become

Once absorbed, amino acids enter your metabolic pool. Your ribosomes then synthesize proteins according to:

\[ \text{DNA} \rightarrow \text{RNA} \rightarrow \text{Protein} \]

Your DNA sequence encodes human proteins, not cow or fish proteins.

Therefore:

  • You rebuild human actin.
  • You rebuild human collagen.
  • You rebuild human enzymes.

Your phenotype is determined by your genome, not by the origin of your nutrients.


2.4. Information vs. Matter

This question highlights a fundamental biological principle:

Biological identity is determined by information, not raw material.

Matter (carbon, nitrogen, oxygen, amino acids) is universal.
Biological structure depends on how that matter is organized, and organization is encoded in DNA.


Final Answer

Humans do not become cows or fish after eating them because digestion reduces food to molecular building blocks. These building blocks are then reassembled according to human genetic instructions.

We recycle matter — but we do not inherit structural identity from what we eat.


3. Why are there only 20 natural amino acids?

Indeed, why are there only 20 amino acids when the triplet genetic code has 64 codons available? Similarly, could the system work effectively with fewer than 20? The existence of 20 canonical amino acids is not a chemical inevitability; it is the result of evolutionary optimization. There is no fundamental law of physics that limits proteins to 20 amino acids. Instead, the number reflects a balance between chemical diversity, translational fidelity, and evolutionary stability.


3.1. The genetic code constrains the set

Proteins are encoded by triplet codons:

\[ 4^3 = 64 \text{ possible codons} \]

Out of these:

  • 61 encode amino acids
  • 3 are stop codons

The canonical genetic code maps these 61 codons to 20 amino acids. This mapping is highly redundant (degenerate), which increases robustness against mutations.
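The codon counts are easy to verify by enumeration. The sketch below lists all triplets over the four RNA bases and removes the three standard stop codons (UAA, UAG, UGA); the variable names are arbitrary.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons), len(sense_codons), len(STOP_CODONS))  # 64 61 3
print(len(sense_codons) / 20)  # 3.05 codons per amino acid on average
```

An average of about three codons per amino acid is the redundancy that buffers many point mutations.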

Expanding the number of amino acids would require:

  • New tRNAs
  • New aminoacyl-tRNA synthetases
  • Rewiring of codon assignments

This is evolutionarily costly.


3.2. Chemical sufficiency

The 20 amino acids provide a remarkably broad range of chemical functionality:

  • Nonpolar (hydrophobic packing)
  • Polar uncharged (hydrogen bonding)
  • Charged (electrostatics)
  • Aromatic (π interactions)
  • Special cases (glycine flexibility, proline rigidity, cysteine disulfide bonding)

With just 20 building blocks, proteins can:

  • Fold into stable 3D structures
  • Catalyze diverse chemical reactions
  • Form dynamic assemblies

Adding many more amino acids would yield diminishing functional returns.


3.3. Evolutionary “freeze” of the code

Once the genetic code became established in early life, it became extremely difficult to change.

This is known as the frozen accident hypothesis:

Once organisms shared a common genetic code, large-scale changes would be lethal.

Thus, the 20 amino acids became locked in by evolutionary history.


3.4. Are there really only 20?

Interestingly, modern biology slightly exceeds 20:

  • Selenocysteine (21st amino acid)
  • Pyrrolysine (22nd amino acid)

These are incorporated via special recoding mechanisms.

Additionally, synthetic biology has engineered organisms that incorporate noncanonical amino acids, proving that 20 is not a hard biochemical limit — just the natural evolutionary standard.


Final Answer

There are 20 natural amino acids because evolution selected a chemically sufficient, robust, and efficient set early in the history of life.

The genetic code then became evolutionarily fixed, making large-scale expansion unlikely. The number 20 reflects evolutionary optimization — not chemical necessity.


4. Can you make other non-natural amino acids? Design some new amino acids.

Yes. Non-natural (noncanonical) amino acids can be synthesized chemically and even incorporated into proteins using engineered translation systems.

There is no chemical rule limiting amino acids to the 20 canonical ones. The only strict requirement for incorporation into proteins is that the molecule must:

  • Contain an α-amino group
  • Contain an α-carboxyl group
  • Be compatible with ribosomal geometry
  • Be recognized by a tRNA / aminoacyl-tRNA synthetase pair

Modern synthetic biology has successfully expanded the genetic code to include dozens of artificial amino acids.


4.1 Design some new amino acids

4.1.1. Design Principles

When designing a new amino acid, we must consider:

  1. Side-chain size and steric compatibility
  2. Polarity and hydrogen bonding capacity
  3. Electronic effects
  4. Stability under physiological conditions
  5. Synthetic accessibility

A) Fluorinated Hydrophobic Amino Acid

Structure Concept:

Replace both terminal methyl groups of leucine with trifluoromethyl groups (the hexafluoroleucine pattern).

Side chain: \[ -CH_2-CH(CF_3)_2 \]

Purpose:

  • Increase hydrophobicity
  • Alter packing interactions
  • Increase metabolic stability

Fluorinated residues are useful for:

  • Stabilizing protein cores
  • Modifying membrane interactions
  • 19F NMR tracking

B) Photo-Crosslinking Amino Acid

Structure Concept:

Attach a diazirine group to a phenylalanine-like ring.

Side chain: \[ -CH_2-\text{phenyl}-\text{diazirine} \]

Purpose:

  • UV-activated covalent crosslinking
  • Study protein–protein interactions
  • Capture transient binding events

This would allow light-controlled structural locking of proteins.


C) Redox-Active Aromatic Amino Acid

Structure Concept:

Modify tyrosine to include a quinone-like moiety.

Side chain:

\[ -CH_2-\text{aromatic}-\text{quinone} \]

Purpose:

  • Electron transfer capability
  • Catalysis in synthetic enzymes
  • Bioelectronic interfaces

This could enhance long-range electron transport in engineered proteins.


Are These Realistic?

Yes. Variants of these ideas already exist in synthetic biology:

  • Fluorinated amino acids
  • Photo-reactive amino acids
  • Click-chemistry compatible residues
  • Redox-active artificial cofactors

Genetic code expansion techniques allow site-specific incorporation using engineered:

  • Orthogonal tRNA
  • Engineered aminoacyl-tRNA synthetase
  • Reassigned stop codons (often UAG)

Final Answer

Yes, non-natural amino acids can be synthesized and incorporated into proteins. The natural 20 amino acids represent an evolutionary standard, not a chemical limit.

By modifying side chains, we can design amino acids with enhanced hydrophobicity, photo-reactivity, redox properties, or catalytic potential — dramatically expanding the functional landscape of proteins.


5. Where did amino acids come from before enzymes that make them, and before life started?

Amino acids did not require life to exist. They can form through purely abiotic chemical processes under the right physical conditions. Before enzymes evolved, amino acids were likely synthesized through prebiotic chemistry on early Earth — and possibly delivered from space.


5.1. Prebiotic Atmospheric Chemistry

In 1953, Stanley Miller and Harold Urey demonstrated that amino acids can form spontaneously from simple gases when energy is supplied.

They simulated early Earth conditions using:

  • Methane (CH₄)
  • Ammonia (NH₃)
  • Hydrogen (H₂)
  • Water vapor (H₂O)
  • Electrical sparks (lightning)

After several days, the system produced amino acids such as:

  • Glycine
  • Alanine
  • Aspartic acid

This experiment showed that amino acids can emerge from non-living chemistry.

Reaction principle (simplified):

\[ \text{Simple gases} + \text{energy} \rightarrow \text{Organic molecules} \]

5.2. Hydrothermal Vent Chemistry

Another hypothesis suggests that amino acids formed near deep-sea hydrothermal vents.

These environments provide:

  • Mineral catalysts (iron, nickel sulfides)
  • Redox gradients
  • Thermal energy
  • High pressure

Mineral surfaces may have catalyzed the formation of organic molecules and concentrated them locally.


5.3. Extraterrestrial Delivery

Amino acids have been detected in carbonaceous meteorites, such as the Murchison meteorite.

These findings suggest that:

  • Amino acids can form in interstellar space
  • They can survive planetary accretion
  • Early Earth may have received organic molecules via meteorite bombardment

Thus, part of Earth’s prebiotic inventory may have been extraterrestrial.


5.4. No Enzymes Required

Modern organisms synthesize amino acids using enzyme-catalyzed pathways. However, enzymes are highly evolved catalysts.

Before life:

  • Chemistry was driven by thermodynamics and energy input
  • Catalysis may have been mineral-based
  • Reaction networks were simpler but chemically plausible

Life did not invent amino acids — it inherited them from chemistry.


Final Answer

Amino acids likely originated through abiotic chemical reactions on early Earth (e.g., atmospheric discharge or hydrothermal systems) and possibly through extraterrestrial synthesis. They existed before enzymes because their formation does not require biological catalysis — only appropriate chemical conditions and energy sources.


6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

An α-helix formed from D-amino acids would be left-handed.


6.1. Chirality Determines Helical Handedness

Natural proteins are built from L-amino acids.

When L-amino acids adopt an α-helical conformation, they form a:

Right-handed α-helix

This is the most energetically favorable geometry due to:

  • Backbone bond angles (φ and ψ)
  • Steric constraints
  • Optimal hydrogen bonding alignment

6.2. Mirror Symmetry Argument

D-amino acids are the mirror images of L-amino acids.

Because chirality inverts stereochemistry at the α-carbon, the entire conformational energy landscape is mirrored.

Therefore:

  • L-amino acids → right-handed α-helix
  • D-amino acids → left-handed α-helix

The structures are mirror images of each other.
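The mirror argument can be illustrated numerically. The sketch below builds an idealized helix from Cα-like points (100° per residue, 1.5 Å rise, 2.3 Å radius, the usual textbook α-helix values), reflects it, and reads the handedness from the sign of a triple product. It is a geometric toy, not a molecular model, and all function names are hypothetical.

```python
import math


def helix_points(n=8, radius=2.3, rise=1.5, turn_deg=100.0):
    """Points on an idealized helix; positive turn and rise give a right-handed one."""
    return [
        (radius * math.cos(math.radians(turn_deg * i)),
         radius * math.sin(math.radians(turn_deg * i)),
         rise * i)
        for i in range(n)
    ]


def mirror(points):
    """Reflect through the yz-plane (x -> -x), as an L -> D swap would."""
    return [(-x, y, z) for x, y, z in points]


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def handedness(points):
    """Sign of (v1 x v2) . v3 summed over consecutive chord vectors."""
    total = 0.0
    for a, b, c, d in zip(points, points[1:], points[2:], points[3:]):
        v1, v2, v3 = _sub(b, a), _sub(c, b), _sub(d, c)
        cross = (v1[1] * v2[2] - v1[2] * v2[1],
                 v1[2] * v2[0] - v1[0] * v2[2],
                 v1[0] * v2[1] - v1[1] * v2[0])
        total += cross[0] * v3[0] + cross[1] * v3[1] + cross[2] * v3[2]
    return "right" if total > 0 else "left"


helix = helix_points()
print(handedness(helix))          # right
print(handedness(mirror(helix)))  # left
```

A reflection flips the sign of every triple product, which is why no rotation can turn the mirrored helix back into the original.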


6.3. Hydrogen Bond Geometry

The α-helix is stabilized by hydrogen bonds:

\[ C=O_{(i)} \rightarrow H-N_{(i+4)} \]

The spatial orientation required for optimal hydrogen bonding depends on backbone stereochemistry.

Switching from L to D reverses:

  • Dihedral angle preferences
  • Side-chain orientation
  • Overall helical twist direction
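The i → i+4 pattern can be listed explicitly for a short helix. This is a small sketch with 1-based residue numbering; the function name `helix_hbonds` is hypothetical.

```python
def helix_hbonds(n_residues):
    """Return (C=O residue, N-H residue) pairs for an ideal alpha-helix."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


pairs = helix_hbonds(10)
print(pairs)       # [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
print(len(pairs))  # 6
```

A helix of n residues therefore has n - 4 backbone hydrogen bonds, so the first four N-H groups and the last four C=O groups remain unpaired.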

6.4. Energetics

For L-amino acids:

  • Right-handed helices are lower in energy.
  • Left-handed helices are sterically disfavored.

For D-amino acids:

  • The energetic preference is inverted.

Thus, a D-polypeptide naturally favors a left-handed α-helix.


Final Answer

An α-helix composed entirely of D-amino acids would adopt a left-handed conformation, because reversing chirality at the α-carbon mirrors the backbone geometry and inverts the preferred helical handedness.


8. Why are most molecular helices right-handed?

Most molecular helices in biology are right-handed because life is built almost exclusively from L-amino acids and D-sugars. Molecular chirality determines the preferred helical geometry. Right-handed helices are not universally required by physics — they are a consequence of stereochemistry and evolutionary selection.


8.1. Chirality Bias in Biology

Biological systems exhibit homochirality:

  • Proteins are built from L-amino acids.
  • Nucleic acids contain D-ribose or D-deoxyribose.

Because helices emerge from repeating chiral building blocks, their handedness is dictated by the stereochemistry of those monomers.

For example:

  • L-amino acids → right-handed α-helices
  • D-sugars → right-handed DNA double helix (B-DNA)

If chirality were inverted, handedness would invert.


8.2. Energetic Favorability

Helical structures form when:

  • Backbone dihedral angles minimize steric clashes
  • Hydrogen bonds align optimally
  • Side chains pack efficiently

For L-amino acids, the lowest-energy α-helical conformation is right-handed. Left-handed helices are possible but typically sterically disfavored in L-polypeptides. Thus, the dominance of right-handed helices reflects energetic optimization under stereochemical constraints.


8.3. Repeating Geometry and Twist

A helix arises from repeating units with constrained bond angles.

Because bond rotations are not symmetric in chiral molecules, the accumulation of small angular preferences results in a macroscopic twist.

This is an emergent geometric property:

\[ \text{Local chirality} \rightarrow \text{Global helical handedness} \]

8.4. Is Right-Handedness Universal?

No.

  • Polymers built from D-amino acids form left-handed helices.
  • Synthetic achiral polymers can form either handedness.
  • Certain protein segments (e.g., polyproline II helices) may adopt left-handed conformations.

Thus, right-handed dominance in biology reflects molecular asymmetry, not universal physical law.


Final Answer

Most molecular helices in biology are right-handed because they are built from chiral building blocks (L-amino acids and D-sugars) whose stereochemistry favors right-handed twist geometries. Helical handedness emerges from the accumulation of local stereochemical constraints into a global structural bias.


9. Why do β-sheets tend to aggregate?

  • What is the driving force for β-sheet aggregation?

β-sheets tend to aggregate because their structure exposes hydrogen-bonding backbone groups and often hydrophobic side chains, making intermolecular association energetically favorable.

The driving force is primarily:

  • Backbone hydrogen bonding
  • The hydrophobic effect
  • Reduction of solvent-exposed surface area
  • Overall free energy minimization

9.1. Backbone Hydrogen Bonding Is Not Fully Satisfied

In an isolated or partially unfolded polypeptide:

  • Carbonyl (C=O) groups
  • Amide (N–H) groups

are capable of forming hydrogen bonds. If these groups are not satisfied intramolecularly, they seek partners intermolecularly.

When multiple β-strands align:

\[ C=O_{(i)} \leftrightarrow H-N_{(j)} \]

They form extended hydrogen-bond networks between molecules. This makes β-sheets particularly prone to forming intermolecular structures.


9.2. The Extended Geometry of β-Strands

β-strands are relatively:

  • Extended
  • Planar
  • Repetitive

This geometry allows:

  • Easy stacking
  • Sheet-to-sheet association
  • Formation of fibrillar structures

Unlike α-helices (which are internally hydrogen-bonded), β-strands expose bonding potential along their length.


9.3. Hydrophobic Effect

Many β-sheet–forming sequences contain hydrophobic residues.

When strands aggregate:

  • Hydrophobic side chains become buried
  • Ordered water molecules are released into bulk solvent
  • Solvent entropy increases

This contributes favorably to:

\[ \Delta G = \Delta H - T\Delta S \]

The increase in solvent entropy (ΔS > 0) often drives aggregation.
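A quick numerical illustration shows how a positive entropy term can make ΔG negative even when ΔH is slightly unfavorable. The values below are made-up placeholders chosen only to show the sign logic; they are not measurements for any real peptide, and the function name `gibbs` is hypothetical.

```python
def gibbs(delta_h, delta_s, temperature=310.15):
    """Return ΔG = ΔH - TΔS in kJ/mol (ΔS in kJ/(mol·K), T in K)."""
    return delta_h - temperature * delta_s


# Placeholder values: slightly unfavorable enthalpy, favorable solvent entropy.
dg = gibbs(delta_h=5.0, delta_s=0.05)
print(f"{dg:.2f}")  # -10.51

# Same enthalpy with no entropy gain: aggregation would not be favored.
print(gibbs(delta_h=5.0, delta_s=0.0) > 0)  # True
```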


9.4. Structural Complementarity

β-sheets allow:

  • Tight side-chain interdigitation
  • Steric zipper formation
  • Highly ordered packing

This geometric complementarity stabilizes aggregates such as amyloid fibrils.


9.5. Thermodynamic Perspective

Aggregation is favored when:

\[ \Delta G_{aggregation} < 0 \]

This occurs due to:

  • Enthalpic gain from hydrogen bonding
  • Entropic gain from water release
  • Reduced solvent-exposed surface area

Thus, β-sheet aggregation is often thermodynamically favorable, especially at high concentration or under partially denaturing conditions.
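
As a rough numerical illustration of the sign argument above, the sketch below evaluates ΔG = ΔH − TΔS for made-up values. The enthalpy and entropy numbers are illustrative assumptions only, not measurements for any real peptide.

```python
def gibbs_free_energy(delta_h_kj, delta_s_kj_per_k, temperature_k=298.15):
    """Return ΔG = ΔH - T·ΔS in kJ/mol."""
    return delta_h_kj - temperature_k * delta_s_kj_per_k

# Hypothetical values, chosen only to illustrate the sign of each term.
delta_h = -20.0   # kJ/mol: enthalpic gain from new backbone hydrogen bonds
delta_s = 0.05    # kJ/(mol·K): net entropy gain from released water

delta_g = gibbs_free_energy(delta_h, delta_s)
print(f"ΔG = {delta_g:.1f} kJ/mol")  # negative, so aggregation is favorable here
```

With a negative ΔH and a positive ΔS, both terms push ΔG below zero, which is the situation described in the list above.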


Final Answer

β-sheets tend to aggregate because their extended backbone structure allows extensive intermolecular hydrogen bonding and efficient hydrophobic packing. The primary driving forces are backbone hydrogen bonding and the hydrophobic effect, which together lower the system’s free energy and stabilize ordered aggregates.


10. Why do many amyloid diseases form β-sheets?

  • Can you use amyloid β-sheets as materials?

Many amyloid diseases are associated with β-sheet formation because β-sheets provide a structurally stable, energetically favorable architecture for protein aggregation. The same physical principles that stabilize β-sheets in normal proteins can drive pathological self-assembly under destabilizing conditions.


10.1. Misfolding Exposes Aggregation-Prone Regions

Many proteins contain segments with high β-sheet propensity.

Under normal conditions:

  • Proteins fold into native conformations
  • Aggregation-prone regions are buried

However, mutations, oxidative stress, or aging can:

  • Destabilize native folds
  • Expose hydrophobic and hydrogen-bonding surfaces

Once exposed, these regions can align into intermolecular β-sheets.


10.2. Cross-β Architecture

Amyloid fibrils share a characteristic structural motif:

  • β-strands run perpendicular to the fibril axis
  • Hydrogen bonds run parallel to the fibril axis

This “cross-β” structure creates:

  • Extensive hydrogen-bond networks
  • High mechanical stability
  • Repetitive, ordered packing

Because backbone hydrogen bonds are strong and directional, β-sheets form highly stable fibrillar assemblies.


10.3. Thermodynamic Driving Forces

Amyloid formation is driven by:

  • Backbone hydrogen bonding
  • Hydrophobic packing
  • Release of ordered water (entropy gain)
  • Reduction of exposed surface area

Thus, amyloid fibrils often represent a deep thermodynamic minimum. In some cases, the amyloid state may be more stable than the native fold.


10.4. Why So Many Diseases?

Examples include:

  • Alzheimer’s disease
  • Parkinson’s disease
  • Huntington’s disease
  • Prion diseases

In each case, a normally soluble protein adopts an aggregation-prone β-sheet–rich structure. Because β-sheets allow extensive intermolecular stabilization, once nucleation occurs, fibril growth can become self-propagating.


10.5. Can Amyloid β-Sheets Be Used as Materials?

Yes. Although pathological in some contexts, amyloid fibrils have remarkable material properties:

  • High tensile strength
  • Nanometer-scale precision
  • Self-assembly capability
  • Chemical robustness

Potential applications include:

  • Biomaterials and scaffolds
  • Nanowires
  • Drug delivery systems
  • Tissue engineering frameworks
  • Bioelectronic interfaces

Some organisms naturally use functional amyloids (e.g., bacterial biofilms), demonstrating that amyloid structures are not inherently pathological.


Final Answer

Many amyloid diseases form β-sheets because the β-sheet architecture allows extensive intermolecular hydrogen bonding and hydrophobic packing, creating highly stable cross-β fibrils. While pathological in neurodegenerative diseases, amyloid β-sheet assemblies can also be harnessed as robust, self-assembling nanomaterials in biotechnology and materials science.


Part B: Protein Analysis and Visualization

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:


1. Briefly describe the protein you selected and why you selected it.

I selected the protein RecA from the extremophilic bacterium Deinococcus radiodurans. RecA is a DNA recombination and repair protein that plays a central role in homologous recombination and in the repair of double-strand DNA breaks.

D. radiodurans is known for its extraordinary resistance to ionizing radiation, desiccation, and other extreme environmental stresses. It can survive radiation levels thousands of times higher than those lethal to humans. RecA is essential for this remarkable resilience, as it facilitates DNA strand exchange and genome reassembly after severe DNA fragmentation.

I selected this protein because of its strong relevance to space biology and astrobiology. Radiation is one of the main challenges for life beyond Earth, and understanding the molecular mechanisms that enable DNA repair under extreme radiation conditions provides insight into how life might survive in extraterrestrial environments such as Mars. Additionally, RecA belongs to a highly conserved protein family, making it ideal for evolutionary and structural analysis.


2. Identify the amino acid sequence of your protein.

2.1. How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

2.2. How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

2.3. Does your protein belong to any protein family?

The amino acid sequence was obtained from UniProt:

RecA - Amino acid sequence

sp|P42443|RECA_DEIRA Protein RecA OS=Deinococcus radiodurans (strain ATCC 13939 / DSM 20539 / JCM 16871 / CCUG 27074 / LMG 4051 / NBRC 15346 / NCIMB 9279 / VKM B-1422 / R1) OX=243230 GN=recA PE=1 SV=2

MSKDATKEISAPTDAKERSKAIETAMSQIEKAFGKGSIMKLGAESKLDVQVVSTGSLSLDLALGVGGIPRGRITEIYGPESGGKTTLALAIVAQAQKAGGTCAFIDAEHALDPVYARALGVNTDELLVSQPDNGEQALEIMELLVRSGAIDVVVVDSVAALTPRAEIEGDMGDSLPGLQARLMSQALRKLTAILSKTGTAAIFINQVREKIGVMYGNPETTTGGRALKFYASVRLDVRKIGQPTKVGNDAVANTVKIKTVKNKVAAPFKEVELALVYGKGFDQLSDLVGLAADMDIIKKAGSFYSYGDERIGQGKEKTIAYIAERPEMEQEIRDRVMAAIRAGNAGEAPALAPAPAAPEAAEA

  • Protein length: 363 amino acids, which matches the length reported on the UniProt entry

  • Most frequent amino acid: Alanine (Ala, A), with a count of 53 residues (14.60%)
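
The counting step can also be reproduced without the course notebook. The following is a minimal sketch, not the Colab code itself; the function name `amino_acid_frequencies` is my own, and the input here is only the first 20 residues of the sequence above, used as a toy example.

```python
from collections import Counter

def amino_acid_frequencies(sequence):
    """Return (residue, count, percent) tuples sorted by descending count."""
    sequence = "".join(sequence.split()).upper()  # drop whitespace and newlines
    counts = Counter(sequence)
    total = len(sequence)
    return [(aa, n, 100.0 * n / total) for aa, n in counts.most_common()]

# Toy input: the first 20 residues of the RecA sequence listed above.
fragment = "MSKDATKEISAPTDAKERSK"
for aa, n, pct in amino_acid_frequencies(fragment)[:3]:
    print(f"{aa}: {n} ({pct:.2f}%)")
```

Passing the full 363-residue sequence instead of the fragment gives the complete frequency table.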


Counting result by Google Colab notebook:


Counting result by JupyterLab:


There was a small error in the second half of the Colab code, but the first part runs without issues. Even with assistance from Gemini AI, it was not possible to get the second half to run correctly:

Access to the Google Colab notebook used to count the frequency of amino acids: Click here


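Protein sequence homologs: 250 hits found with UniProt's BLAST tool. Since 250 is the default maximum number of hits the tool returns, the actual number of homologs is probably larger.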


Protein affiliation (family): it belongs to the RecA family

According to UniProt and InterPro classification, RecA belongs to the RecA/Rad51 protein family. This family includes bacterial RecA, archaeal RadA, and eukaryotic Rad51 proteins. These proteins share a conserved ATPase domain of the P-loop NTP-binding superfamily and play essential roles in homologous recombination and DNA repair.


3. Identify the structure page of your protein in RCSB

3.1. When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

3.2. Are there any other molecules in the solved structure apart from protein?

3.3. Does your protein belong to any structure classification family?


Structure released

  • Deposited: 2004-10-08
  • Released: 2004-12-21
  • Deposition Author(s): Bell, C.E., Rajan, R. (2004) J Mol Biology 344: 951-963.
  • Title: Crystal structure of RecA from Deinococcus radiodurans: insights into the structural basis of extreme radioresistance.
  • DOI: https://doi.org/10.1016/j.jmb.2004.09.087
  • Quality: Resolution of 2.50 Å

This is considered a good quality crystal structure. In X-ray crystallography, the resolution indicates the level of structural detail observed in the electron density map. Lower values correspond to higher structural precision.

  • Molecules in the structure:

Yes. According to the RCSB entry, the structure contains additional molecules besides the RecA protein. These include:

  • ADP (adenosine diphosphate) — a nucleotide bound to the ATPase active site
  • Magnesium ion (Mg²⁺) — a cofactor required for nucleotide binding and ATP hydrolysis
  • Water molecules (HOH) — commonly observed in crystal structures

These molecules are functionally relevant because RecA is an ATPase, and nucleotide binding plays an important role in its mechanism during DNA repair and homologous recombination.

Protein affiliation:

Yes. According to the SCOP structural classification database, the protein belongs to the following hierarchy:

  • Fold: RecA-like classic
  • Superfamily: RecA-like P-loop NTPases
  • Family: RecA/Rad51/KaiC-like ATPases
  • SCOP ID: 4004007

This classification groups proteins with similar three-dimensional folds and ATPase domains, even if their sequences differ. Members of this superfamily share a conserved P-loop NTP-binding domain involved in nucleotide binding and hydrolysis.


4. Open the structure of your protein in any 3D molecule visualization software:

  • PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)

  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

  • Color the protein by secondary structure. Does it have more helices or sheets?

  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Protein visualization in PyMOL:

Cartoon

Ribbon

Ball and Stick


Helices or Sheets?

🔴 α-helices
🟡 β-sheets
🟢 loops or flexible regions

Cartoon:

The cartoon representation highlights the secondary structure elements of the protein. The structure contains several α-helices and a smaller number of β-sheets arranged near the center of the protein. Overall, α-helices appear to be more abundant than β-sheets.

Ribbon:

The ribbon representation reveals the overall fold of the protein. The structure consists of a central β-sheet region surrounded by α-helices, which is characteristic of the RecA-like fold found in ATP-binding proteins.

Ball and Stick:

The ball-and-stick representation shows the detailed atomic arrangement of the amino acid residues. Hydrophobic residues appear mostly buried within the protein core, whereas hydrophilic residues are more exposed on the surface, which is typical for soluble cytoplasmic proteins.


Hydrophobic vs Hydrophilic

Hydrophobic residues tend to cluster within the interior of the protein structure, while hydrophilic and charged residues are more exposed on the protein surface, which is typical for soluble proteins interacting with the aqueous cytoplasm.

🔵 Positively charged residues (hydrophilic)
🔴 Negatively charged residues (hydrophilic)
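
To complement the visual inspection, the overall composition by residue class can be computed from the sequence. This is only a sketch: the grouping in `RESIDUE_CLASSES` is one common textbook convention that I am assuming here (other schemes place G, P, C, Y or H differently), and it describes composition, not where the residues sit in the 3D structure.

```python
# Assumed grouping of the 20 standard amino acids (one convention among several).
RESIDUE_CLASSES = {
    "hydrophobic": set("AVLIMFWPG"),
    "polar": set("STCYNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
}

def class_fractions(sequence):
    """Fraction of residues in each class for a one-letter amino acid sequence."""
    seq = sequence.upper()
    return {
        name: sum(aa in members for aa in seq) / len(seq)
        for name, members in RESIDUE_CLASSES.items()
    }

# Toy input: the first 20 residues of the RecA sequence.
for name, fraction in class_fractions("MSKDATKEISAPTDAKERSK").items():
    print(f"{name}: {fraction:.2f}")
```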


Binding pockets

Surface representation of the protein reveals several small cavities distributed across the structure. These cavities likely correspond to potential ligand-binding pockets. In RecA proteins, such pockets are typically involved in nucleotide binding (ATP/ADP), which is required for DNA repair activity.


Part C. Using ML-Based Protein Design Tools

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.


C1. Protein Language Modeling

C1.1. Deep mutational scans

Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.

The deep mutational scan generated using the protein language model ESM-2 evaluates how likely each amino-acid substitution is at every position in the protein sequence of RecA.

In the Heatmap:

  • The x-axis represents the position of residues in the protein sequence.
  • The y-axis represents the amino acid substituted at that position.
  • Each cell corresponds to a single mutation.

The color scale represents the model score, which reflects how compatible a mutation is with the learned evolutionary patterns of proteins.

| Color | Score range | Interpretation |
|---|---|---|
| 🟡 Yellow | Positive (> 0) | Favorable or tolerated mutation |
| 🟢 Green | Around 0 | Neutral mutation |
| 🔵 Blue | Moderately negative | Likely destabilizing mutation |
| 🟣 Purple | Very negative (< -8) | Strongly deleterious mutation |

Mutations with strongly negative scores are predicted to be highly disruptive to the protein structure or function.
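
One common way to obtain such scores is a log-odds ratio between the probability the model assigns to the mutant residue and to the wild-type residue at the same position. The sketch below assumes that scoring scheme (I have not checked that the course notebook uses exactly this) and uses hypothetical probabilities instead of real ESM-2 outputs.

```python
import math

def mutation_scores(position_probs, wild_type):
    """Log-odds of each substitution relative to the wild-type residue.

    position_probs: dict mapping amino acid -> model probability at one position.
    """
    log_wt = math.log(position_probs[wild_type])
    return {aa: math.log(p) - log_wt for aa, p in position_probs.items()}

# Hypothetical probabilities at a single, strongly conserved glycine position.
probs = {"G": 0.90, "A": 0.06, "S": 0.03, "W": 0.01}
scores = mutation_scores(probs, wild_type="G")
for aa, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"G -> {aa}: {score:+.2f}")
```

Under this convention the wild-type residue always scores 0, and a substitution the model finds unlikely scores strongly negative.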


Can you explain any particular pattern? (choose a residue and a mutation that stands out)

One of the most noticeable patterns in the heatmap is the presence of vertical dark bands across several sequence positions. These vertical bands indicate positions that are highly sensitive to mutation, where almost any substitution results in a strongly negative score. This pattern suggests that these residues are evolutionarily conserved and structurally or functionally important.

In proteins such as RecA, highly conserved residues often correspond to:

  • Catalytic residues
  • ATP-binding residues
  • Residues located in the structural core of the protein

Mutations at these positions are therefore predicted to disrupt protein folding or functional activity.


Strongly unfavorable mutations

a) Cysteine mutations

A particularly prominent pattern in the heatmap corresponds to mutations to cysteine (C), which frequently show very negative scores across many sequence positions. Introducing cysteine residues can be problematic for several reasons:

  • Disulfide bond formation: Cysteine residues can form disulfide bonds, which may introduce unintended cross-links that disrupt the protein’s native structure.
  • Structural constraints: Cysteine has a reactive thiol group that may interfere with local interactions within the protein core.
  • Protein environment mismatch: In cytosolic proteins such as RecA, cysteines are relatively rare and often occur only at specific functional sites.

Because of these factors, many cysteine substitutions are predicted to be structurally destabilizing, which explains the strong negative scores observed in the heatmap.

b) Tryptophan mutations

Another notable pattern is the strong negative scores observed for mutations to tryptophan (W) at many positions. Tryptophan is the largest amino acid, and its bulky aromatic side chain can disrupt tightly packed regions of the protein structure. When introduced at positions that cannot accommodate large residues, it may:

  • Create steric clashes
  • Disturb secondary structure packing
  • Destabilize the hydrophobic core

As a result, many tryptophan substitutions receive very negative model scores, indicating that these mutations are likely to be deleterious.


Favorable or tolerated mutations

Some amino acids show mostly neutral or favorable scores across the sequence. One example in the heatmap is serine (S), which appears largely green and occasionally yellow. Serine substitutions are often tolerated because:

  • it is small in size
  • it is polar but not strongly charged
  • it can participate in hydrogen bonding
  • it does not introduce major steric clashes

Because of these properties, serine can frequently replace other small or polar residues without significantly disrupting the protein structure.


Neutral mutations

Neutral mutations (shown in green) typically occur when the substituted amino acid has similar physicochemical properties to the original residue. Examples include substitutions between:

  • Hydrophobic residues (e.g., V → I)
  • Polar residues (e.g., S → T)
  • Similarly sized amino acids

These mutations tend to preserve the overall structural stability and local interactions of the protein.


C1.2. Latent space analysis
  • Use the provided sequence dataset to embed proteins in reduced dimensionality.
  • Analyze the different formed neighborhoods: do they approximate similar proteins?
  • Place your protein in the resulting map and explain its position and similarity to its neighbors.

The dimensionality reduction technique used here, t-SNE, preserves local similarity relationships, so sequences with similar structural or evolutionary features cluster together in the resulting latent space.

NOTE

Some adjustments were made. During the latent space analysis, an error occurred while applying the t-SNE dimensionality reduction with scikit-learn: the program returned the message ValueError: perplexity must be less than n_samples. The error arose because the input dataset initially contained only a single protein sequence, RecA from Deinococcus radiodurans. t-SNE needs multiple samples to estimate neighborhood relationships between points, and the perplexity parameter (set to 30) must always be smaller than the number of samples in the dataset. With only one sequence, the algorithm could not compute the embedding.

The issue was resolved with assistance from Gemini, which identified that the FASTA input needed multiple sequences. The original FASTA link was therefore replaced with a dataset of 50 homologous protein sequences related to RecA, which allowed the model to generate valid embeddings and complete the latent space analysis.
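
A defensive way to avoid this error is to cap the perplexity at one less than the number of samples before calling t-SNE. The sketch below is not the notebook's code: `safe_perplexity` and `embed_tsne` are hypothetical helper names, and scikit-learn is imported lazily so the first helper works on its own.

```python
def safe_perplexity(n_samples, requested=30.0):
    """Largest usable perplexity that does not exceed the requested value."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two samples")
    return min(requested, n_samples - 1)

def embed_tsne(embeddings, n_components=3, requested_perplexity=30.0):
    """Project an (n_samples, n_features) array with scikit-learn's TSNE."""
    from sklearn.manifold import TSNE  # imported lazily; requires scikit-learn
    perplexity = safe_perplexity(len(embeddings), requested_perplexity)
    tsne = TSNE(n_components=n_components, perplexity=perplexity, random_state=0)
    return tsne.fit_transform(embeddings)

print(safe_perplexity(50))  # 50 sequences: the requested value of 30 is kept
print(safe_perplexity(10))  # 10 sequences: perplexity is capped at 9
```

With a single sequence, `safe_perplexity` raises a clear error up front, which is the case that occurred here.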

With error


Fixed


Final take

The following color encoding helps visualize how proteins are distributed along the third dimension of the latent space and highlights subtle structural relationships that may not be obvious from spatial position alone.

| Color | TSNE3 value | Interpretation |
|---|---|---|
| 🔵 Dark purple | Low values (negative) | Proteins positioned in lower regions of the latent dimension |
| 🟣 Magenta / 🩷 pink | Intermediate values | Proteins occupying middle regions of the embedding |
| 🟠 Orange / 🟡 yellow | High values (positive) | Proteins separated along the third dimension |

NOTE: The color scale represents the TSNE3 coordinate and does not indicate protein quality or functional superiority. Instead, it simply reflects the relative position of proteins along the third dimension of the embedding space.


a) Neighborhood structure in the embedding space

The resulting 3D visualization reveals a clear clustering pattern, where the majority of sequences form a dense neighborhood in the latent space. This clustering indicates that many sequences in the dataset share similar sequence patterns, structural motifs, or evolutionary relationships, and likely belong to related protein families or structural classes. Protein language models capture evolutionary constraints during training, meaning sequences with similar functional or structural properties tend to occupy nearby regions in the embedding space.

Within the visualization, most sequences form a compact cluster, suggesting they share significant sequence similarity and may belong to related recombination or DNA-binding protein families.

b) Outlier sequences

A small number of sequences appear separated from the main cluster, forming outliers in the latent space. These outliers may correspond to proteins that:

  • Contain significant sequence divergence
  • Belong to more distant homologous families
  • Contain additional domains or structural insertions

Protein language models often place functionally related but evolutionarily distant proteins in nearby regions, but sequences that diverge significantly may appear as isolated points in the reduced dimensional space.


Placement of the selected protein

RecA from Deinococcus radiodurans appears within the main cluster of sequences in the embedding space. Its coordinates (TSNE1 ≈ −39, TSNE2 ≈ −5, TSNE3 ≈ −0.5) place it close to several other proteins in the dataset. The surrounding points share similar color and spatial proximity, indicating that these proteins have similar embedding representations.
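
The neighbors of a point in the embedding can be listed by ranking Euclidean distances. In the sketch below the query uses the approximate RecA coordinates reported above, while the other labels and coordinates are hypothetical placeholders, not values from the actual plot.

```python
import math

def nearest_neighbors(query, points, k=3):
    """Return the k labels closest to `query` by Euclidean distance."""
    ranked = sorted(points.items(), key=lambda item: math.dist(query, item[1]))
    return [label for label, _ in ranked[:k]]

reca = (-39.0, -5.0, -0.5)  # approximate t-SNE coordinates reported above
others = {                  # hypothetical points for illustration only
    "homolog_A": (-37.5, -4.0, 0.2),
    "homolog_B": (-41.0, -6.5, -1.0),
    "outlier_C": (25.0, 30.0, 12.0),
}
print(nearest_neighbors(reca, others, k=2))
```

Because t-SNE mainly preserves local structure, short distances like these are more meaningful than distances between far-apart clusters.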


C2. Protein Folding

1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

ESMFOLDING RESULTS
Using device: cuda:0  
Total sequence length: 363  
Running ESMFold inference for sequence with length 363...  
Prediction complete. ptm: 0.919 plddt: 94.831  
Results saved to RecA_5fccb/  
CPU times: user 45.5 s, sys: 8.63 s, total: 54.1 s  
Wall time: 1min 24s  

Displays

a) Sidechain


b) Mainchain


c) Sidechain + Mainchain


Comparison

Overall, they do look similar 😃! The structure predicted with ESMFold closely resembles the monomeric fold of the RecA proteins reported in experimental structures and AlphaFold models on UniProt. The predicted structure contains the characteristic α/β core domain and several α-helices typical of RecA family proteins. However, it does not reproduce the circular oligomeric structure observed in some UniProt models (PDB 1XP8), because those structures represent multimeric assemblies composed of several RecA subunits. ESMFold predicts the structure of a single polypeptide chain (monomer), which explains why the predicted structure resembles monomeric crystal structures such as 2ofo.1 rather than the oligomeric filament assemblies (1xp8.1.F).


SWISS-MODEL: SMTL ID : 2ofo.1
Average Model Confidence (QMEANDisCo): 0.77 ± 0.05


SWISS-MODEL homology model based on template 2ofo.1.
RecA structure predicted using ESMFold.

SWISS-MODEL: AlphaFold Model AF-P42443-F1
Average Model Confidence (pLDDT): 88.88


SWISS-MODEL homology model based on template AlphaFold: AF-P42443-F1.
RecA structure predicted using ESMFold.

The RecA structure predicted with ESMFold showed a very high confidence score (pLDDT = 94.83) and was compared with previously available structural models, including the SWISS-MODEL homology model based on the crystal structure 2ofo.1 (QMEANDisCo = 0.77 ± 0.05) and the AlphaFold prediction AF-P42443-F1 (average pLDDT = 88.88). All models display the characteristic RecA fold, consisting of a central α/β ATPase domain with a β-sheet core surrounded by multiple α-helices. The arrangement of secondary structural elements, including β-strands, α-helices, and connecting loops, is largely conserved between the three structures. Minor differences are mainly observed in flexible regions such as loops and the C-terminal tail, which are known to exhibit conformational variability. Overall, the high pLDDT value obtained with ESMFold indicates that the predicted structure is highly reliable and consistent with experimentally derived and AI-predicted RecA models.
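
The average pLDDT can be recomputed from a predicted PDB file, assuming the per-residue pLDDT is stored in the B-factor column, as ESMFold and AlphaFold outputs conventionally do. The sketch below uses a toy three-atom structure built in the code itself rather than the real prediction; `atom_record` is a hypothetical helper that only exists to generate that toy data.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column over C-alpha atoms of a PDB-format string."""
    values = []
    for line in pdb_text.splitlines():
        # Atom name: columns 13-16; B-factor: columns 61-66 (1-indexed).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)

def atom_record(serial, atom_name, res_name, res_seq, b_factor):
    """Build a fixed-width ATOM record with dummy coordinates (toy data only)."""
    return (
        "ATOM  {:5d} {:<4s} {:3s} A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
    ).format(serial, atom_name, res_name, res_seq, 0.0, 0.0, 0.0, 1.0, b_factor)

toy_pdb = "\n".join([
    atom_record(1, "N", "MET", 1, 10.0),   # ignored: not a C-alpha atom
    atom_record(2, "CA", "MET", 1, 95.0),
    atom_record(3, "CA", "SER", 2, 85.0),
])
print(mean_plddt(toy_pdb))
```

Averaging over C-alpha atoms counts each residue once, which keeps large side chains from being over-weighted.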


2. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?


C3. Protein Generation

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN
  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
  2. Input this sequence into ESMFold and compare the predicted structure to your original.

The “New Sequence” generated by ProteinMPNN is not merely a mutated version of the original; it’s a de novo design tailored for the provided 3D protein structure (PDB 1XP8). ProteinMPNN operates on the principle of inverse protein folding, where instead of predicting the 3D structure from a given amino acid sequence, it takes a fixed 3D backbone and designs an amino acid sequence that is predicted to be compatible with and stable within that structure.

How the new sequence is generated

1. Structural Input: The model uses the atomic coordinates of the protein backbone (from PDB 1XP8) as its primary input.

2. Per-Position Probability: For each amino acid position along the chain, ProteinMPNN evaluates the local structural environment (neighboring residues, local geometry). Based on this context, it predicts a probability distribution over all 20 standard amino acids (and sometimes ‘X’ for ambiguous or masked regions) for that specific position.

3. Sampling: A new amino acid is then sampled from this probability distribution for each position. The sampling_temp parameter (set to 0.1 in our run) influences this sampling: a lower temperature means the model is more likely to pick the highest-probability amino acid, leading to more conservative designs, while a higher temperature introduces more diversity.

4. Novelty and Mutations: The resulting “New Sequence” is thus a sequence that ProteinMPNN predicts to be compatible with the input 3D conformation. Differences between this generated sequence and the “Native Sequence” (from the original PDB) represent designed mutations. In this run the sequence recovery was 0.4497, meaning that roughly 45% of the designed positions kept the native residue. The mutations reflect what the model considers the best fit for the given backbone, which may improve stability; the model does not explicitly optimize for function.
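
The effect of the sampling temperature in step 3 can be sketched for a single position. The logits below are hypothetical, and this is a simplification: as far as I understand, ProteinMPNN decodes residues one after another, each conditioned on those already chosen, whereas this sketch treats one position in isolation.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Softmax over logits / temperature."""
    scaled = {aa: value / temperature for aa, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {aa: math.exp(v - peak) for aa, v in scaled.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

def sample_residue(logits, temperature, rng=random):
    """Draw one amino acid from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    residues = list(probs)
    return rng.choices(residues, weights=[probs[aa] for aa in residues], k=1)[0]

# Hypothetical logits for one position.
logits = {"A": 2.0, "L": 1.5, "G": 0.5}
for t in (0.1, 1.0):
    print(t, {aa: round(p, 3) for aa, p in temperature_probs(logits, t).items()})
```

At a temperature of 0.1 almost all probability mass falls on the top-scoring residue, while at 1.0 the alternatives keep a substantial share, which is why low temperatures give more conservative designs.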


Amino acid probability map

Each row on the y-axis represents a different amino acid, and each column on the x-axis corresponds to a specific position in the protein sequence. The color intensity at each cell (intersection of an amino acid and a position) indicates the average probability that ProteinMPNN assigned to that amino acid at that position.

Interpretation

  • Hotter/Brighter Colors (yellow, white): Indicate a high probability for that specific amino acid at that particular position. This suggests that the model strongly prefers or predicts that amino acid to be structurally compatible at that site. These positions are often critical for the protein’s fold or function, or simply have strong local preferences.

  • Colder/Darker Colors (blue, purple): Indicate a low probability for that amino acid at that position. The model considers these amino acids unlikely or incompatible with the structural context of that site.

  • Vertical Stripes of Hot Colors: If a column (a specific position) shows a very bright stripe concentrated on one or a few amino acids, it means that position is highly constrained or conserved. The model has a very strong preference for only a few types of amino acids there.

  • Scattered Hot Colors within a Column: If a position has several amino acids with moderately high probabilities, it suggests more variability or plasticity at that site. The structure can tolerate different amino acids there. A horizontal stripe, by contrast, means that the same amino acid is favored across many neighboring positions.

The “New Sequence” is derived from these probability distributions. The amino acid selected for each position in the new sequence would typically be one of the high-probability amino acids shown in the heatmap for that specific position, especially with a low sampling_temp.
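
How constrained a position is can be summarized with the Shannon entropy of its column in the probability map: low entropy means one residue dominates, high entropy means several residues are plausible. The two distributions below are hypothetical examples, not values read from the heatmap.

```python
import math

def column_entropy(probs):
    """Shannon entropy (in bits) of one position's amino acid distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical per-position distributions.
constrained = {"G": 0.97, "A": 0.02, "S": 0.01}
tolerant = {"A": 0.25, "S": 0.25, "T": 0.25, "E": 0.25}

print(f"constrained: {column_entropy(constrained):.2f} bits")
print(f"tolerant:    {column_entropy(tolerant):.2f} bits")
```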


GENERATED SEQUENCE

Generating sequences…

1XP8, score=1.5369, fixed_chains=[], designed_chains=['A'], model_name=v_48_020
AKERSKAIETAMSQIEKAFGKGSIMKLGAESKLDVQVVSTGSLSLDLALGVGGIPRGRITEIYGPESGGKTTLALAIVAQAQKAGGTCAFIDAEHALDPVYARALGVNTDELLVSQPDNGEQALEIMELLVRSGAIDVVVVDSVAALTPRAEIXXXXXXXXPGLQARLMSQALRKLTAILSKTGTAAIFINQVXXXXXXXXXXXXXXXGGRALKFYASVRLDVRKIGQPTXXXXXXVANTVKIKTVKNKVAAPFKEVELALVYGKGFDQLSDLVGLAADMDIIKKAGSFYSYGDERIGQGKEKTIAYIAERPEMEQEIRDRVMAAIR

T=0.1, sample=0, score=0.7948, seq_recovery=0.4497
GELAALLERLARALLWAANGPGAAYAPDERAGLPVEVVPTGVVSLDRALGVGGLPKGRITLIQGPPGSGKTTLALQVVAEAQRAGGKAALIDMEGELDEARAAALGVDVAALERYRPENGEEALEKTLELVSSGEYDVVVIDSLQAAVPKAELXXXXXXXXAGLRAELFARYLARLKEVLAGTGTCLIILHHTXXXXXXXXXXXXXXXGIEAVREAASVILDVRRVGEPEXXXXXXRSWRVEIRVVKNTVAPAGRSVVVTLTEGEGFDRIADLVEEAARLGIIERDGNKYSYKNKFIGEGEKAAAATIAKDPALEEEIRREVLERIR

New Sequence:GELAALLERLARALLWAANGPGAAYAPDERAGLPVEVVPTGVVSLDRALGVGGLPKGRITLIQGPPGSGKTTLALQVVAEAQRAGGKAALIDMEGELDEARAAALGVDVAALERYRPENGEEALEKTLELVSSGEYDVVVIDSLQAAVPKAELXXXXXXXXAGLRAELFARYLARLKEVLAGTGTCLIILHHTXXXXXXXXXXXXXXXGIEAVREAASVILDVRRVGEPEXXXXXXRSWRVEIRVVKNTVAPAGRSVVVTLTEGEGFDRIADLVEEAARLGIIERDGNKYSYKNKFIGEGEKAAAATIAKDPALEEEIRREVLERIR
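
The native and designed sequences can be compared position by position to estimate sequence recovery. This is a minimal sketch that skips positions marked X (residues missing from the structure); ProteinMPNN's own seq_recovery value may be computed slightly differently, so the result need not match 0.4497 exactly. The inputs here are short toy fragments, not the full sequences above.

```python
def sequence_recovery(native, designed, skip="X"):
    """Fraction of compared positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    pairs = [
        (n, d) for n, d in zip(native, designed)
        if n not in skip and d not in skip
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(n == d for n, d in pairs) / len(pairs)

# Toy fragments for illustration only.
print(f"{sequence_recovery('AKERXXSK', 'GELAXXLK'):.3f}")
```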


Part D. Group Brainstorm on Bacteriophage Engineering

Assignees for the following sections
MIT/Harvard students: Optional
Committed Listeners: Required
  1. Find a group of ~3–4 students
  2. Read through the Phage Reading material listed under “Reading & Resources” below.
  3. Review the Bacteriophage Final Project Goals for engineering the L Protein:
    • Increased stability (easiest)
    • Higher titers (medium)
    • Higher toxicity of lysis protein (hard)
  4. Brainstorm Session
    • Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).
    • Write a 1-page proposal (bullet points or short paragraphs) describing:
      • Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).
      • Why do you think those tools might help solve your chosen sub-problem?
        • Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).
      • Include a schematic of your pipeline.
    • This resource may be useful: HTGAA Protein Engineering Tools
  5. Each individually put your plan on your HTGAA website
    • Include your group’s short plan for engineering a bacteriophage

Resources

  1. USDA FoodData Central. Beef, raw — nutritional composition. https://fdc.nal.usda.gov/
  2. IUPAC Gold Book. Avogadro constant. https://goldbook.iupac.org/
  3. Nelson, D. L., & Cox, M. M. (2021). Lehninger Principles of Biochemistry (8th ed.). W.H. Freeman.
  4. Hall, J. E. (2020). Guyton and Hall Textbook of Medical Physiology (14th ed.). Elsevier.
  5. Alberts, B. et al. (2022). Molecular Biology of the Cell (7th ed.). Garland Science.
  6. Crick, F. H. C. (1968). The origin of the genetic code. Journal of Molecular Biology, 38(3), 367–379.
  7. Liu, C. C., & Schultz, P. G. (2010). Adding new chemistries to the genetic code. Annual Review of Biochemistry, 79, 413–444.
  8. Lobanov, A. V., et al. (2009). Selenocysteine: The 21st amino acid. Journal of Biological Chemistry, 284(44), 28532–28536.
  9. Weber, A.L., Miller, S.L. Reasons for the occurrence of the twenty coded protein amino acids. J Mol Evol 17, 273–284 (1981). https://doi.org/10.1007/BF01795749
  10. Doig, A.J. (2017), Frozen, but no accident – why the 20 standard amino acids were selected. FEBS J, 284: 1296-1305. https://doi.org/10.1111/febs.13982
  11. Young, T. S., & Schultz, P. G. (2010). Beyond the Canonical 20 Amino Acids: Expanding the Genetic Lexicon. Journal of Biological Chemistry, 285(15), 11039–11044. https://doi.org/10.1074/jbc.R109.091306
  12. Atkins, J., Gesteland, R. The twenty-first amino acid. Nature 407, 463–464 (2000). https://doi.org/10.1038/35035189
  13. Miles, S. A., Nillama, J. A., & Hunter, L. (2023). Tinker, Tailor, Soldier, Spy: The Diverse Roles That Fluorine Can Play within Amino Acid Side Chains. Molecules, 28(17), 6192. https://doi.org/10.3390/molecules28176192
  14. Lee, Hyang-Yeol; Lee, Kyung-Hoon; Al-Hashimi, Hashim M.; Marsh, E. Neil G. . (2006). Modulating Protein Structure with Fluorous Amino Acids: Increased Stability and Native-like Structure Conferred on a 4-Helix Bundle Protein by Hexafluoroleucine. Journal of the American Chemical Society, 128(1), 337–343. doi:10.1021/ja0563410
  15. Buer, B.C. and Marsh, E.N.G. (2012), Fluorine: A new element in protein design. Protein Science, 21: 453-462. https://doi.org/10.1002/pro.2030
  16. Benjamin C. Buer; E. Neil G. Marsh. (2012). Fluorine: A new element in protein design. , 21(4), 453–462. doi:10.1002/pro.2030
  17. Buer, B.C., Meagher, J.L., Stuckey, J.A. & Marsh, E.N.G. (2012). Structural basis for the enhanced stability of highly fluorinated proteins, Proc. Natl. Acad. Sci. U.S.A. 109 (13) 4810-4815. https://doi.org/10.1073/pnas.1120112109
  18. Costantino, A., Pham, L.B.T., Barbieri, L., Calderone, V., Ben-Nissan, G., Sharon, M., et al. Controlling the incorporation of fluorinated amino acids in human cells and its structural impact. Protein Science. 2024; 33(3):e4910. https://doi.org/10.1002/pro.4910
  19. Zhang, Huimin; Song, Yanling; Zou, Yuan; Ge, Yun; An, Yuan; Ma, Yanli; Zhu, Zhi; Yang, Chaoyong James . (2014). A diazirine-based photoaffinity probe for facile and efficient aptamer–protein covalent conjugation. Chemical Communications, 50(38), 4891–. doi:10.1039/c4cc01528b
  20. S. Ravindra, C. P. Irfana Jesin, A. Shabashini, G. C. Nandi. (2021). Recent Advances in the Preparations and Synthetic Applications of Oxaziridines and Diaziridines. Catal. 363, 1756. https://doi.org/10.1002/adsc.202001372
  21. Jian Fan, Qingyao Shu, Yi-Ming Li b, Jing Shi. (2022). Efficient synthesis of terminal-diazirine-based histone peptide probes. Tetrahedron Letters Volume 102, 153878. https://doi.org/10.1016/j.tetlet.2022.153878
  22. Famiano, M.A., Boyd, R.N., Kajino, T. et al. Amino Acid Chiral Selection Via Weak Interactions in Stellar Environments: Implications for the Origin of Life. Sci Rep 8, 8833 (2018). https://doi.org/10.1038/s41598-018-27110-z
  23. Ronald Breslow, The origin of homochirality in amino acids and sugars on prebiotic earth. Tetrahedron Letters, Volume 52, Issue 32, 2011, Pages 4228-4232, ISSN 0040-4039. https://doi.org/10.1016/j.tetlet.2011.06.002
  24. Engel, M. H.; Macko, S. A. . (1997). . Nature, 389(6648), 265–268. https://doi.org/10.1038/38460
  25. M.H Engel; S.A Macko. (2001). The stereochemistry of amino acids in the Murchison meteorite. , 106(1-2), 0–45. https://doi.org/10.1016/s0301-9268(00)00123-6
  26. Glavin, D.P., Elsila, J.E., McLain, H.L., Aponte, J.C., Parker, E.T., Dworkin, J.P., Hill, D.H., Connolly, H.C., Jr. and Lauretta, D.S. (2021), Extraterrestrial amino acids and L-enantiomeric excesses in the CM2 carbonaceous chondrites Aguas Zarcas and Murchison. Meteorit Planet Sci, 56: 148-173. https://doi.org/10.1111/maps.13451
  27. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/
  28. GeminiAI. (2026). Gemini (Gemini 2.5 Flash) [Large language model].

Tools


Phage Reading

HTGAA - Week 5: Protein Design Part II



My Homework

WEEK 5 - BIOINFORMATICS PART II

This week we learn how cutting-edge AI and protein language models are used to design functional proteins and peptides “in silico”.

Lecture (Tues, Mar 3)

Protein Design Part II
Gabriele Corso ▶️Recording
Pranam Chatterjee ▶️Recording

Recitation (Wed, Mar 4)

Phage Therapy
(▶️Recording | 💻Slides)
Suvin Sundararajan, Dominika Wawrzyniak



Part 1: SOD1 Binder Peptide Design (From Pranam)

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Challenge:

  1. Design short peptides that bind mutant SOD1.
  2. Then decide which ones are worth advancing toward therapy.

Available models:

  • PepMLM: target sequence-conditioned peptide generation via masked language modeling
  • PeptiVerse: therapeutic property prediction
  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

A: Generate Binders with PepMLM

1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

UniProt - SOD1 (P00441) sequence

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ


Mutation A → V

MATK A VCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
MATK V VCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Mutated sequence:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
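
The substitution above can also be reproduced programmatically. The following is a minimal sketch, not part of any tool used in this homework: the helper name `apply_mutation` and its `offset` argument are assumptions introduced here for illustration. The offset of 1 reflects the alignment shown above, where the "residue 4" alanine is the fifth letter of the UniProt sequence because the initiator methionine is not counted in the A4V numbering.

```python
import re

# Wild-type human SOD1 (UniProt P00441), copied from the sequence above.
WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(seq, mutation, offset=0):
    """Apply a point mutation such as 'A4V' to a protein sequence.

    offset is a hypothetical convenience argument: the number of leading
    residues (here the initiator Met) not counted by the mutation numbering.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Cannot parse mutation {mutation!r}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    idx = pos + offset - 1  # convert to a 0-based string index
    if seq[idx] != old:
        raise ValueError(
            f"Expected {old} at position {pos} (offset {offset}), found {seq[idx]}"
        )
    return seq[:idx] + new + seq[idx + 1:]


MUTANT_SOD1 = apply_mutation(WT_SOD1, "A4V", offset=1)
print(MUTANT_SOD1[:10])
```

The residue check is the useful part: calling the helper with `offset=0` raises an error because the fourth letter of the UniProt sequence is K, not A, which catches a numbering mismatch before a wrong sequence is submitted to any model.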


2. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:

  • Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
  • To the generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
  • Record the perplexity scores that indicate PepMLM’s confidence in the binders.

PepMLM Colab used:

The default values were changed as follows:

  • Updated Peptide Length: 12
  • Updated Top K Value: 3
  • Updated num_binders: 16

The 16 generated peptides, plus a 17th (the known binder) added as a control:

PepMLM assigns a pseudo-perplexity score to each generated peptide, reflecting the model’s confidence in the sequence given the target protein context. Lower pseudo-perplexity values indicate higher model confidence and a better fit to the learned sequence distribution of potential binders.
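
To make the score more concrete: for masked language models, pseudo-perplexity is commonly defined as the exponential of the mean negative log-probability the model assigns to each residue when that residue is masked in turn. The sketch below assumes this standard definition; PepMLM's own implementation may differ in detail, and the log-probabilities used here are made-up inputs rather than model outputs.

```python
import math


def pseudo_perplexity(token_log_probs):
    """Pseudo-perplexity from per-residue natural-log probabilities.

    token_log_probs[i] is assumed to be log p(residue_i | all other residues,
    target sequence), obtained by masking position i.
    """
    if not token_log_probs:
        raise ValueError("Need at least one log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical 12-mer where the model is maximally unsure at every position:
# each of the 20 amino acids is equally likely.
uniform = [math.log(1 / 20)] * 12
print(pseudo_perplexity(uniform))  # approximately 20
```

Under this definition, a pseudo-perplexity near 8 means the model is, on average, about as uncertain as if it were choosing among 8 equally likely residues at each position, while a value near 20 corresponds to no preference at all.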

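| Index | Binder | Pseudo-perplexity | Contains X |
|---|---|---|---|
| 0 | WRYYATAVEHKX | 10.445826 | Yes |
| 1 | WRYYAVAVRHKX | 12.354167 | Yes |
| 2 | WRYPVVALALKE | 11.448351 | No |
| 3 | HRYGATVVAWKE | 11.524772 | No |
| 4 | WRYYAAALEHGX | 8.100808 | Yes |
| 5 | WLYYAAALRHKX | 15.539718 | Yes |
| 6 | HHSYPVALEHWK | 14.301987 | No |
| 7 | HHYYAVAAAWKK | 13.441748 | No |
| 8 | WRSGPVAARWWX | 8.107713 | Yes |
| 9 | WLYGATGAAHGE | 9.124785 | No |
| 10 | WLYPAVAAELKX | 9.295740 | Yes |
| 11 | WLYPVTVLELKE | 19.095537 | No |
| 12 | WLYPVVALAHGX | 10.353661 | Yes |
| 13 | WLYGAAAVEWGE | 14.981852 | No |
| 14 | WHYGAAAVRWKX | 10.837565 | Yes |
| 15 | HRYPAVAVRHGX | 12.434339 | Yes |
| 16 | FLYRWLPSRRGG | N/A | No |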

Several generated peptides contain the residue X, which represents an ambiguous or unknown amino acid in protein sequence notation. In peptide design workflows, X typically appears when the model has uncertainty about the most probable residue at that position. Because X cannot be synthesized or interpreted structurally, these peptides are generally considered lower-confidence candidates for downstream therapeutic design and may be deprioritized in later filtering steps.


Observed sequence pattern

Many of the generated peptides begin with W, H, or the motif WR. Examples include sequences such as WRYY…, WLY…, and HRY…. This pattern suggests that PepMLM may have identified an aromatic and positively charged motif favorable for interaction with SOD1.

A possible explanation is related to the chemical properties of these residues:

  • W (Tryptophan) can participate in hydrophobic and aromatic interactions, which often stabilize protein–peptide binding interfaces.
  • R and K (Arginine, Lysine) are positively charged at physiological pH, and H (Histidine) can be partially protonated; these residues can contribute to electrostatic interactions with negatively charged regions on the protein surface.

Together, these features may help promote stable binding between the designed peptides and the mutant SOD1 protein.


Selection of the four best candidate peptides

To select candidates for further evaluation, peptides were prioritized based on:

  • Low pseudo-perplexity scores (higher model confidence)
  • Absence of ambiguous residues (X)
  • Reasonable sequence composition for peptide stability

| Peptide | Pseudo-perplexity | Justification |
|---|---|---|
| WLYGATGAAHGE | 9.1248 | Lowest perplexity among sequences without ambiguous residues; strong model confidence. |
| WRYPVVALALKE | 11.4484 | Moderate perplexity and no ambiguous residues; hydrophobic core may favor binding. |
| HRYGATVVAWKE | 11.5248 | Balanced composition with aromatic and hydrophobic residues that may stabilize interactions. |
| HHSYPVALEHWK | 14.3020 | Slightly higher perplexity but still valid; contains aromatic and charged residues that could support binding. |
| FLYRWLPSRRGG | N/A | Known SOD1-binding peptide used as a benchmark |
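
The first two selection criteria (no ambiguous residues, then low pseudo-perplexity) can be expressed as a short filter-and-sort. This is a sketch using the values from the PepMLM table above; the function name `rank_candidates` is an assumption for illustration, and the third criterion (sequence composition) is a manual judgment that the code does not model.

```python
# (binder, pseudo-perplexity) pairs for the 16 PepMLM-generated peptides.
GENERATED = [
    ("WRYYATAVEHKX", 10.445826), ("WRYYAVAVRHKX", 12.354167),
    ("WRYPVVALALKE", 11.448351), ("HRYGATVVAWKE", 11.524772),
    ("WRYYAAALEHGX", 8.100808), ("WLYYAAALRHKX", 15.539718),
    ("HHSYPVALEHWK", 14.301987), ("HHYYAVAAAWKK", 13.441748),
    ("WRSGPVAARWWX", 8.107713), ("WLYGATGAAHGE", 9.124785),
    ("WLYPAVAAELKX", 9.295740), ("WLYPVTVLELKE", 19.095537),
    ("WLYPVVALAHGX", 10.353661), ("WLYGAAAVEWGE", 14.981852),
    ("WHYGAAAVRWKX", 10.837565), ("HRYPAVAVRHGX", 12.434339),
]


def rank_candidates(peptides):
    """Drop peptides containing X, then sort by ascending pseudo-perplexity."""
    unambiguous = [(seq, ppl) for seq, ppl in peptides if "X" not in seq]
    return sorted(unambiguous, key=lambda pair: pair[1])


RANKED = rank_candidates(GENERATED)
for seq, ppl in RANKED:
    print(f"{seq}  {ppl:.4f}")
```

Seven peptides survive the X filter. The three lowest-perplexity sequences are the first three rows of the selection table; the fourth pick, HHSYPVALEHWK, ranks fifth by perplexity alone, so its inclusion rests on the composition criterion rather than on the score.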

B: Evaluate Binders with AlphaFold3

Scoring peptides
  1. Navigate to the AlphaFold Server: alphafoldserver.com
  2. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
  3. Record the ipTM score and describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
  4. Describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

All AlphaFold predictions were run using a fixed random seed [100] to ensure reproducibility across peptide–protein complex predictions.
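
Submitting the five complexes with the same seed can be scripted by preparing a batch job file. The sketch below writes jobs in what is assumed to be the AlphaFold Server JSON job format (`name`, `modelSeeds`, `sequences` with `proteinChain` entries, `dialect`, `version`); these field names are recalled from the server's published format and should be checked against the current documentation before upload. The job names are arbitrary labels chosen here.

```python
import json

# A4V mutant SOD1, copied from the mutated sequence in Part A.
MUTANT_SOD1 = (
    "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

PEPTIDES = [
    "WLYGATGAAHGE", "WRYPVVALALKE", "HRYGATVVAWKE",
    "HHSYPVALEHWK", "FLYRWLPSRRGG",
]


def build_jobs(target, peptides, seed=100):
    """One job per peptide: target and peptide as separate protein chains."""
    jobs = []
    for pep in peptides:
        jobs.append({
            "name": f"SOD1_A4V_{pep}",      # arbitrary label (assumption)
            "modelSeeds": [str(seed)],      # assumed to be a list of strings
            "sequences": [
                {"proteinChain": {"sequence": target, "count": 1}},
                {"proteinChain": {"sequence": pep, "count": 1}},
            ],
            "dialect": "alphafoldserver",
            "version": 1,
        })
    return jobs


JOBS = build_jobs(MUTANT_SOD1, PEPTIDES)
print(json.dumps(JOBS[0], indent=2))
```

Keeping the seed and chain order identical across jobs means differences in ipTM between runs come from the peptide sequence rather than from sampling.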

Run 1: WLYGATGAAHGE
Parameters
Run 1 (ipTM = 0.49, pTM = 0.83)

In the structure:

  • SOD1 appears dark blue, meaning the protein structure is predicted with very high confidence.
  • The peptide is yellow/orange, meaning low confidence in its position and structure.

This usually indicates that AlphaFold is uncertain about the peptide’s binding pose, which is consistent with the ipTM of 0.49.

The peptide WLYGATGAAHGE produced an ipTM score of 0.49, indicating low confidence in the predicted protein–peptide interaction. The overall structure of SOD1 was predicted with high confidence (dark blue pLDDT values), while the peptide displayed lower confidence scores (yellow/orange). Structural inspection shows the peptide positioned along the surface of the SOD1 β-barrel, rather than binding near the N-terminal region where the A4V mutation is located. The low pLDDT values suggest that the peptide adopts a flexible or weakly defined binding conformation, consistent with a surface-associated interaction rather than a tightly bound interface.


Does it localize near the N-terminus where A4V sits?

No. The peptide does not appear to bind near the N-terminal region where the A4V mutation is located. Instead, it is positioned further along the side of the protein.

Does it engage the β-barrel region or approach the dimer interface?

β-barrel: yes. The peptide is located along the surface of the SOD1 β-barrel, which is the central structural feature of the protein composed of several β-strands (the arrow-shaped ribbons in the structure). This suggests a surface interaction with the β-barrel region.

Dimer interface: no. The model shows only a single SOD1 monomer, so the dimer interface is not present in this prediction. Therefore, the peptide cannot be interacting with the dimer interface in this model.

Does it appear surface-bound or partially buried?

It appears surface-associated but weakly constrained. The peptide is positioned near the surface of the protein, but the yellow/orange coloring indicates low structural confidence, meaning AlphaFold is not strongly confident about the exact binding pose. This suggests that the peptide may transiently interact with the protein surface rather than forming a stable, well-defined interface.


Run 2: WRYPVVALALKE
Parameters
Run 2 (ipTM = 0.43, pTM = 0.80)

Compared to the first peptide:

  • ipTM decreased slightly (0.49 → 0.43) → weaker predicted interaction
  • pTM also decreased (0.83 → 0.80) → slightly lower confidence in the overall predicted complex

Important observation: protein color changes

The protein is no longer uniformly dark blue. This suggests:

  • Local decreases in pLDDT
  • Possible structural perturbations induced by the peptide

The peptide may be destabilizing local regions of SOD1 or AlphaFold is uncertain about the interface, propagating uncertainty into nearby residues. A peptide can appear to interact more broadly but still produce lower confidence, indicating a less stable or more disruptive interaction.

The peptide WRYPVVALALKE produced an ipTM score of 0.43, indicating low confidence in the predicted protein–peptide interface. The peptide appears to align along the surface of the β-barrel, forming broader contact with the protein compared to the first design. However, it does not localize near the N-terminal region where the A4V mutation resides, and no interaction with the dimer interface can be assessed. The peptide shows partial structural definition, with a central region of somewhat higher confidence and flexible termini. Notably, the SOD1 structure exhibits localized decreases in confidence, suggesting possible structural perturbation or uncertainty induced by the peptide. Overall, the interaction appears surface-bound and weakly defined, without a clear binding pocket or stable interface.


Does it localize near the N-terminus where A4V sits?

No, not clearly. The peptide is positioned along the side of the β-barrel, not near the top region where the N-terminus (and A4V mutation) is located. Therefore, it does not appear to target the mutation site directly.

Does it engage the β-barrel region or approach the dimer interface?

β-barrel: yes, more convincingly than Run 1. The peptide runs along the surface of the β-sheets, appearing to align with the β-barrel architecture. This suggests a surface-guided interaction, possibly stabilized by:

  • hydrophobic residues (V, L, A)
  • aromatic residues (W, Y)

However, it still does not insert into a defined binding pocket.

Dimer interface: no. Again, only a monomer is modeled, so the dimer interface is absent. No conclusions can be drawn about dimer stabilization.

Does it appear surface-bound or partially buried?

Partially surface-bound, partially flexible. The central region of the peptide (yellow) has low-to-moderate confidence (pLDDT roughly 50–70). The ends (orange) remain highly flexible/unresolved. This indicates:

  • Some transient or weak interaction with the protein surface
  • No stable, well-defined binding conformation

Run 3: HRYGATVVAWKE
Parameters
Run 3 (ipTM = 0.26, pTM = 0.87)

  • The protein is predicted extremely well
  • The peptide is not interacting meaningfully at all

The peptide HRYGATVVAWKE produced an ipTM score of 0.26, indicating very low confidence in the predicted protein–peptide interaction, while the overall SOD1 structure was predicted with high confidence (pTM = 0.87). The peptide appears completely detached from the protein, with no visible interaction with the β-barrel or any defined binding region. It does not localize near the N-terminal region where the A4V mutation is located, and no interaction with the dimer interface can be assessed. The peptide exhibits very low confidence (orange coloring) across most of its length, suggesting high flexibility and lack of a stable conformation. Overall, this model indicates no meaningful binding interaction, representing the weakest candidate among the peptides tested.


Does it localize near the N-terminus where A4V sits?

No. The peptide is located far from the N-terminal region of SOD1. It does not approach the top portion of the structure where the A4V mutation resides.

Does it engage the β-barrel region or approach the dimer interface?

  • β-barrel: no. Unlike Run 2, this peptide does not even align along the β-barrel surface. It is clearly spatially separated from the structured core of the protein.
  • Dimer interface: no. Again, only a monomer is modeled, so the dimer interface is not present.

Does it appear surface-bound or partially buried?

Completely detached. This is actually the cleanest negative result so far! These are the key observations:

  • The peptide is far away from the protein
  • It is colored mostly orange, indicating very low confidence and high flexibility
  • There is no visible interaction interface
  • This is essentially a non-binding prediction.

Why this happens

The sequence contains:

  • H (potentially charged)
  • W/Y (aromatic)
  • hydrophobic residues (V, A)

However, the arrangement and context of residues matter more than composition. This peptide likely does not form a compatible interface geometry, remains too flexible to stabilize binding, or is treated by AlphaFold as an independent chain.


Run 4: HHSYPVALEHWK
Parameters
Run 4 (ipTM = 0.27, pTM = 0.87)

Same pattern as Run 3:

  • Protein is very well predicted
  • Interaction is essentially absent

Important observation: peptide secondary structure

The peptide looks “thicker” and more structured (helix-like or sheet-like); it may be forming a transient secondary structure (likely an α-helix). However, internal folding ≠ binding: the peptide can stabilize itself but still fail to interact with SOD1.

This suggests: Binding requires complementarity, not just structure.

Even with aromatic residues (Y, W) and potentially charged residues (H), the peptide does not match the geometry or chemistry of the binding surface.

The peptide HHSYPVALEHWK produced an ipTM score of 0.27, indicating very low confidence in the predicted protein–peptide interaction, while the SOD1 structure was predicted with high confidence (pTM = 0.87). The peptide appears fully detached from the protein, with no observable interaction with the β-barrel or the N-terminal region containing the A4V mutation. Interestingly, unlike other non-binding peptides, this sequence adopts a more compact and partially structured conformation, suggesting the formation of internal secondary structure. Despite this, the peptide does not form a stable interface with SOD1, indicating that self-folding alone is insufficient for binding. Overall, this model represents a non-binding case with increased peptide structural definition.


Does it localize near the N-terminus where A4V sits?

No. The peptide is clearly distant from the N-terminal region and does not approach the area where the A4V mutation is located.

Does it engage the β-barrel region or approach the dimer interface?

  • β-barrel: no. There is no contact with the β-barrel surface. The peptide is positioned away from the structured core of the protein.
  • Dimer interface: no. As in all previous runs, only a monomer is modeled, so the dimer interface is not represented.

Does it appear surface-bound or partially buried?

Detached, but structurally more defined than previous cases. This is the key difference:

  • The peptide is still far from the protein (no interaction)
  • But unlike Run 3, it is not just a random flexible chain
  • It appears to form a more compact, partially folded structure

Run 5: FLYRWLPSRRGG
Parameters
Run 5 (ipTM = 0.30, pTM = 0.78)

The protein structure is still predicted well, but the interaction between the peptide and SOD1 is predicted very poorly!

The control peptide FLYRWLPSRRGG produced an ipTM score of 0.30, indicating very low confidence in the predicted protein–peptide interface. While the overall fold of SOD1 was predicted with reasonable confidence (pTM = 0.78), the peptide displayed very low pLDDT values across its entire length, suggesting high structural uncertainty. Visual inspection shows that the peptide lies loosely along the surface of the β-barrel, but it does not form a well-defined binding interface and does not localize near the N-terminal region where the A4V mutation occurs. Instead, the peptide appears highly flexible and partially detached from the protein surface.


Does it localize near the N-terminus where A4V sits?

No. The peptide does not appear to bind near the N-terminal region of SOD1. The N-terminus is located in the upper portion of the structure, while the peptide is positioned toward the lower region of the protein. Therefore, the peptide does not interact with the region where the A4V mutation occurs in this prediction.

Does it engage the β-barrel region or approach the dimer interface?

β-barrel: partially, but only loosely. The peptide lies along the outer surface of the β-barrel, but it does not form a clear or well-defined binding interface. It appears to pass across the surface rather than docking into a specific pocket.

Dimer interface: no. The model again contains only a single SOD1 monomer, so the dimer interface is not present in this prediction. Therefore, the peptide cannot be interacting with the dimer interface.

Does it appear surface-bound or partially buried?

It appears largely unbound and highly flexible. The peptide is colored orange across nearly its entire length, indicating very low pLDDT (<50). This means AlphaFold has very little confidence in the peptide’s structure or position. This suggests that the peptide does not form a stable interaction with the protein in the predicted model and may be essentially floating near the protein surface.


Final results

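| Run | Peptide | Seed | ipTM | pTM | Protein confidence |
|---|---|---|---|---|---|
| 1 | WLYGATGAAHGE | 100 | 0.49 | 0.83 | stable |
| 2 | WRYPVVALALKE | 100 | 0.43 | 0.80 | slightly perturbed |
| 3 | HRYGATVVAWKE | 100 | 0.26 | 0.87 | stable |
| 4 | HHSYPVALEHWK | 100 | 0.27 | 0.87 | stable |
| 5 | FLYRWLPSRRGG | 100 | 0.30 | 0.78 | stable |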

Across all predictions, the PepMLM-generated peptides exhibited a range of interaction behaviors with SOD1, but none achieved high-confidence binding according to AlphaFold ipTM scores. The best-performing designs (WLYGATGAAHGE and WRYPVVALALKE) reached ipTM values of 0.49 and 0.43 and appeared to interact weakly along the β-barrel surface, although without forming well-defined binding pockets or localizing near the N-terminal region containing the A4V mutation. In contrast, HRYGATVVAWKE and HHSYPVALEHWK (ipTM 0.26 and 0.27) displayed little to no interaction, remaining largely detached from the protein even though HHSYPVALEHWK adopted partial secondary structure. Surprisingly, the known binder (FLYRWLPSRRGG) also yielded a low ipTM score (0.30) and showed no clear binding interface in the predicted model. Numerically, two PepMLM designs exceeded the known binder's ipTM and two fell slightly below it, but because every value lies in the low-confidence range, these differences do not show that any design clearly outperforms the control. These results highlight important limitations of structure-based prediction for short, flexible peptides: low-confidence AlphaFold outputs do not necessarily rule out experimental binding, and additional validation methods would be required to accurately assess peptide affinity.
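
The comparison against the known binder can be made explicit with a few lines of code. This is a small sketch over the ipTM values in the results table; the function name `delta_vs_control` is an assumption introduced for illustration.

```python
# ipTM values from the AlphaFold runs reported above.
IPTM = {
    "WLYGATGAAHGE": 0.49,
    "WRYPVVALALKE": 0.43,
    "HRYGATVVAWKE": 0.26,
    "HHSYPVALEHWK": 0.27,
    "FLYRWLPSRRGG": 0.30,  # known binder (control)
}
CONTROL = "FLYRWLPSRRGG"


def delta_vs_control(scores, control):
    """ipTM difference of each designed peptide relative to the control."""
    reference = scores[control]
    return {
        pep: round(score - reference, 2)
        for pep, score in scores.items()
        if pep != control
    }


DELTAS = delta_vs_control(IPTM, CONTROL)
for pep, delta in sorted(DELTAS.items(), key=lambda item: -item[1]):
    print(f"{pep}  {delta:+.2f}")
```

Two designs sit above the control and two just below it. Since all five scores are low, the sign of the difference is best read as a ranking aid rather than as evidence of better binding.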


C: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of the peptides!

For each PepMLM-generated peptide:
  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes
    1. Predicted binding affinity
    2. Solubility
    3. Hemolysis probability
    4. Net charge (pH 7)
    5. Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.


Run 1: WLYGATGAAHGE

Good drug-like properties, but weak efficacy

The peptide WLYGATGAAHGE shows a favorable therapeutic profile despite only moderate structural interaction with Superoxide dismutase 1 predicted by AlphaFold (ipTM ≈ 0.49). It is predicted to be highly soluble (1.000) and non-hemolytic (0.042), which are desirable properties for therapeutic development. However, the peptide is classified as non-permeable (0.058) and has a relatively short predicted half-life (0.266 hours), which may limit its bioavailability. The predicted binding affinity is weak (pKd/pKi = 5.779), consistent with the moderate and surface-level interaction observed structurally. The peptide carries a slight negative charge at physiological pH (-1.15) and exhibits near-neutral hydrophobicity (GRAVY = -0.13), suggesting a balanced but not strongly interacting physicochemical profile. Overall, while structural predictions suggest limited binding strength, the peptide demonstrates good safety and solubility characteristics, making it a reasonable candidate for further optimization rather than immediate therapeutic application.


Run 2: WRYPVVALALKE

The peptide WRYPVVALALKE shows a slightly improved predicted binding affinity (pKd/pKi = 6.143) compared to WLYGATGAAHGE, which is consistent with its somewhat more extensive surface interaction observed in AlphaFold (ipTM ≈ 0.43). Like the previous peptide, it is predicted to be highly soluble (1.000) and non-hemolytic (0.047), indicating a favorable safety profile. However, it remains non-permeable (0.170) and exhibits only a modest increase in predicted half-life (0.367 hours). Notably, this peptide is more hydrophobic (GRAVY = 0.32) and carries a slightly positive charge at physiological pH (0.77), which may contribute to its somewhat improved binding affinity through enhanced surface interactions. Despite these improvements, the peptide is still classified as a weak binder, and the interaction observed structurally remains surface-level and not well-defined. Overall, this peptide demonstrates a better balance between binding potential and physicochemical properties compared to Run 1, although significant limitations remain for therapeutic application.


Run 3: HRYGATVVAWKE

The peptide HRYGATVVAWKE shows a weaker predicted binding affinity (pKd/pKi = 5.669) compared to the previous candidates, which is consistent with the very low interaction confidence observed in AlphaFold (ipTM ≈ 0.26). Structurally, this peptide appeared fully detached from Superoxide dismutase 1, indicating no meaningful binding interaction. Despite this, the peptide retains favorable therapeutic properties, including high solubility (1.000) and low hemolysis probability (0.037). It also exhibits one of the longest predicted half-lives so far (0.421 hours) among the tested peptides. However, it remains non-permeable (0.071) and shows relatively high fouling potential (0.327). The peptide carries a positive net charge (0.85) but is overall more hydrophilic (GRAVY = -0.53), which may reduce its ability to form stable hydrophobic interactions with the protein surface. Overall, both structural and physicochemical predictions consistently indicate that this peptide is a poor binder, despite having acceptable safety and solubility characteristics.


Run 4: HHSYPVALEHWK

The peptide HHSYPVALEHWK shows the weakest predicted binding affinity among all candidates (pKd/pKi = 4.808), which is consistent with the very low interaction confidence observed in AlphaFold (ipTM ≈ 0.27). Structurally, the peptide appeared fully detached from Superoxide dismutase 1, indicating no meaningful interaction. Despite this, it exhibits several favorable therapeutic properties, including high solubility (1.000) and the lowest hemolysis probability (0.017) among all peptides. It also shows the longest predicted half-life (0.484 hours), suggesting improved stability relative to other candidates. However, it presents the highest fouling propensity (0.504) and remains non-permeable (0.172). The peptide is nearly neutral at physiological pH (net charge ≈ 0.02) and highly hydrophilic (GRAVY = -0.98), which may limit its ability to form stable hydrophobic interactions with the protein surface. Overall, both structural and physicochemical analyses indicate that this peptide is not a viable binder, despite its favorable safety and stability profile.


Run 5 - Control peptide: FLYRWLPSRRGG

The control peptide FLYRWLPSRRGG exhibits a distinct physicochemical profile compared to the PepMLM-generated candidates. While its predicted binding affinity remains in the weak range (pKd/pKi = 5.968), consistent with the low interaction confidence observed in AlphaFold (ipTM ≈ 0.30), it demonstrates several advantageous therapeutic properties. Notably, it is predicted to be highly permeable (0.862), in contrast to all generated peptides, which were non-permeable. Additionally, it is classified as non-fouling (0.666) and non-hemolytic (0.047), indicating favorable biocompatibility. The peptide carries a strong positive charge (2.76) and a high isoelectric point (11.71), which may facilitate interactions with negatively charged cellular membranes and contribute to its permeability. Despite these advantages, its binding affinity and structural predictions do not indicate a strong or well-defined interaction with Superoxide dismutase 1. Overall, the control peptide highlights a trade-off between cellular delivery properties and binding specificity, suggesting that effective therapeutic peptides must balance both aspects.
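
The GRAVY values quoted for the four generated peptides can be checked by hand: GRAVY is the mean Kyte-Doolittle hydropathy over all residues. The sketch below assumes PeptiVerse uses the standard Kyte-Doolittle scale, which is not stated in the tool output; the function name `gravy` is chosen here for illustration.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    if not sequence:
        raise ValueError("Empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


for peptide in ["WLYGATGAAHGE", "WRYPVVALALKE", "HRYGATVVAWKE", "HHSYPVALEHWK"]:
    print(f"{peptide}  {gravy(peptide):.3f}")
```

The results (about -0.133, 0.325, -0.533, and -0.983) agree with the reported GRAVY figures of -0.13, 0.32, -0.53, and -0.98, which supports the assumption about the scale. Positive values indicate a more hydrophobic peptide and negative values a more hydrophilic one.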


Final insights

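| Peptide | ipTM | Affinity (pKd/pKi) | Permeability | Key takeaway |
|---|---|---|---|---|
| WLYGATGAAHGE | 0.49 | 5.78 | Non-permeable (0.058) | best structure |
| WRYPVVALALKE | 0.43 | 6.14 | Non-permeable (0.170) | best affinity |
| HRYGATVVAWKE | 0.26 | 5.67 | Non-permeable (0.071) | no binding |
| HHSYPVALEHWK | 0.27 | 4.81 | Non-permeable (0.172) | worst binder |
| FLYRWLPSRRGG | 0.30 | 5.97 | Permeable (0.862) | best delivery properties |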
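
One way to answer whether higher ipTM goes with stronger predicted affinity is a rank correlation over the five peptides. The sketch below implements Spearman's coefficient directly; it assumes no tied values (true for this data), and the helper names are illustrative.

```python
# Values from the AlphaFold and PeptiVerse results above, in the same order:
# WLYGATGAAHGE, WRYPVVALALKE, HRYGATVVAWKE, HHSYPVALEHWK, FLYRWLPSRRGG
IPTM_VALUES = [0.49, 0.43, 0.26, 0.27, 0.30]
AFFINITY = [5.779, 6.143, 5.669, 4.808, 5.968]  # predicted pKd/pKi


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


RHO = spearman(IPTM_VALUES, AFFINITY)
print(f"Spearman rho = {RHO:.2f}")
```

The coefficient works out to 0.6 (sum of squared rank differences 8, n = 5): a positive trend, but with only five points it is weak evidence, and the top ipTM peptide is not the top affinity peptide.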

Winner peptide! 😀

Run 2 - WRYPVVALALKE  

Among the evaluated candidates, WRYPVVALALKE represents the best balance between predicted binding and therapeutic properties. This peptide exhibited the highest predicted binding affinity (pKd/pKi = 6.143) and showed moderate interaction with SOD1 in AlphaFold predictions, suggesting some potential for target engagement. While it remains non-permeable and displays only moderate stability, it is highly soluble and non-hemolytic, indicating a favorable safety profile. In comparison, other peptides either showed weaker binding or no interaction, while the control peptide demonstrated superior permeability but no improved binding. Therefore, WRYPVVALALKE would be the most suitable candidate to advance, as it provides the best compromise between binding potential and acceptable physicochemical properties, and could be further optimized to improve delivery and stability.
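
To put the predicted affinities on a more familiar scale, a pKd can be converted to a dissociation constant with Kd = 10^(-pKd) mol/L. The sketch below assumes the PeptiVerse value is a base-10 pKd (or pKi) in molar units; the function name is illustrative.

```python
def pkd_to_kd_micromolar(pkd):
    """Convert pKd (= -log10 of Kd in mol/L) to Kd in micromolar."""
    return 10 ** (-pkd) * 1e6


PREDICTED_PKD = {
    "WLYGATGAAHGE": 5.779,
    "WRYPVVALALKE": 6.143,
    "HRYGATVVAWKE": 5.669,
    "HHSYPVALEHWK": 4.808,
    "FLYRWLPSRRGG": 5.968,
}

for peptide, pkd in PREDICTED_PKD.items():
    print(f"{peptide}  pKd {pkd:.3f}  ->  Kd ~ {pkd_to_kd_micromolar(pkd):.2f} uM")
```

On this scale the selected peptide (pKd 6.143) corresponds to a Kd of roughly 0.7 µM and the weakest candidate (pKd 4.808) to roughly 16 µM, so the spread between best and worst prediction is a little over one order of magnitude.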


D: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

How to:
  1. Open the moPPit Colab linked from the HuggingFace moPPIt model card
  2. Make a copy and switch to a GPU runtime.
  3. In the notebook:
    1. Paste your A4V mutant SOD1 sequence.
    2. Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
    3. Set peptide length to 12 amino acids.
    4. Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
  4. After generation, briefly describe how these moPPit peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

moPPIt Colab used:

The motif positions were constrained to residues 1–6 of the peptide to bias binding toward the N-terminal region of SOD1, where the A4V mutation is located (residues 1–10). By restricting the motif to the N-terminal portion of the peptide, the design encourages early contact formation between the peptide and the target region of interest. This does not enforce a one-to-one positional interaction but instead promotes favorable orientation and interaction propensity near the mutation site. Additionally, limiting the motif to this short window reduces the search space, improving computational efficiency while maintaining biologically relevant targeting.

Due to computational limitations in the Colab environment, particularly related to GPU memory, it was not feasible to optimize all properties simultaneously in moPPIt while generating multiple peptide candidates. Including all objectives significantly increases the complexity of the multi-objective optimization process, leading to higher memory usage and instability during execution. As a result, it was necessary to reduce the number of properties selected to successfully generate peptide sequences.


Selected properties

| Property | Objective importance | Selection |
|---|---|---|
| Hemolysis | 1 | Yes |
| Non-Fouling | 0 | No |
| Solubility | 1 | Yes |
| Half-Life | 1 | Yes |
| Affinity | 1 | Yes |
| Motif | 1 | Yes |
| Specificity | 1 | Yes |

The Non-fouling property was sacrificed as an optimization objective. When designing a therapeutic peptide targeting mutant SOD1, the most reasonable property to relax in a multi-objective optimization framework such as moPPIt would be Non-fouling. While properties such as solubility and non-hemolytic behavior are essential for safety and delivery, and binding affinity is the primary objective, some degree of nonspecific interaction may be tolerated during early-stage design to enhance binding strength. Non-fouling behavior can often be improved in later optimization steps, whereas insufficient binding cannot be easily rescued. Therefore, allowing partial fouling enables exploration of sequences with stronger interaction potential, which can subsequently be refined for selectivity.


Generated binders

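| Run | Binder | Hemolysis | Solubility | Half-Life | Affinity | Motif | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 6 | WILIKKLGGSTA | 0.912 | 0.50 | 5.063 | 5.824 | 0.030 | 0.853 |
| 7 | KTEEEWKALFAD | 0.915 | 0.58 | 12.482 | 6.501 | 0.011 | 0.712 |
| 8 | ETPTEIAQKLKE | 0.923 | 0.67 | 4.499 | 5.145 | 0.612 | 0.724 |
| 9 | KTAGETILQWFM | 0.939 | 0.50 | 7.405 | 6.474 | 0.599 | 0.609 |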

All moPPIt-generated peptides are strongly predicted to be hemolytic

Many high-affinity peptides resemble antimicrobial peptides, which are often hemolytic because of their ability to disrupt lipid membranes.

All moPPIt-generated peptides exhibited very high hemolysis probabilities (>0.9), indicating a strong tendency to disrupt cellular membranes. This is likely a consequence of the optimization strategy, in which the Non-Fouling objective was excluded and binding affinity was prioritized. As a result, the model may have favored sequences with physicochemical properties similar to membrane-active peptides, such as high charge and amphipathicity, which are known to correlate with hemolytic activity. This highlights an important trade-off in peptide design: improving binding and target interaction can inadvertently increase toxicity. Therefore, although these peptides may have promising binding characteristics toward superoxide dismutase 1 (SOD1), their high hemolytic potential makes them unsuitable for direct therapeutic application without further optimization.


Run 6: WILIKKLGGSTA - ipTM = 0.4, pTM = 0.83
Run 7: KTEEEWKALFAD - ipTM = 0.35, pTM = 0.87
Run 8: ETPTEIAQKLKE - ipTM = 0.45, pTM = 0.88
Run 9: KTAGETILQWFM - ipTM = 0.52, pTM = 0.88

Although the generated peptides (Runs 6–9) show some favorable predicted properties, such as moderate solubility and acceptable structural stability, the predictions obtained from AlphaFold and PeptiVerse indicate that they do not achieve the intended functional objective of binding to the N-terminal region of the mutated protein.

Specifically:

  • The ipTM values (0.35–0.52) suggest low confidence in protein–peptide interactions, indicating that binding is likely weak or non-specific.
  • In contrast, the pTM values (~0.83–0.88) are relatively high, reflecting accurate prediction of the overall protein structure, but this does not imply successful peptide binding.
  • Visual inspection in AlphaFold shows that:
    • The peptides do not localize to the N-terminal region (residues 1–4), which was the intended binding site.
    • Instead, they remain dispersed near the β-barrel, without forming stable or consistent interactions.
    • The peptides appear in yellow coloration, particularly in Runs 7–9, corresponding to moderate confidence scores (pLDDT ~50–70), which suggests structural flexibility or lack of a well-defined binding conformation.
    • The mutated protein remains in dark blue, indicating that its structural integrity is preserved, but without evidence of functional interaction with the peptides.

Resources

  1. HTGAA Protein Engineering Tools spreadsheet
  2. AlphaFold Server. https://alphafoldserver.com/
  3. PeptiVerse. ChatterjeeLab. https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse
  4. Chen, L.T., Quinn, Z., Dumas, M. et al. (2025). Target sequence-conditioned design of peptide binders using masked language modeling. Nat Biotechnol. https://doi.org/10.1038/s41587-025-02761-2
  5. Chen, T., Dumas, M., Watson, R., et al. (2023). PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling. arXiv. https://doi.org/10.48550/arXiv.2310.03842
  6. Chen, T., Quinn, Z., Mishra, K., et al. (2026). moPPIt: De Novo Generation of Motif-Specific and Functionally Active Peptide Binders via Discrete Flow Matching. https://doi.org/10.1101/2024.07.31.606098
  7. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/

HTGAA - Week 6: Genetic Circuits Part I: Assembly Technologies



My Homework

WEEK 6 - LAB DIGESTION AND CLONING TECHNIQUES

This week we learn core molecular biology tools and techniques for processing and assembling DNA, including PCR and Gibson Assembly.

Lecture (Tues, Mar 10)

Genetic Circuits Part I: Assembly Technologies
(▶️Recording)
Doug Densmore, Traci Haddock

Recitation (Wed, Mar 11)

PCR, Gibson Assembly
(▶️Recording | 💻Slides)
Eyal Perry, Ronan Donovan



Assignment: DNA Assembly

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Answer these questions about the protocol in this week’s lab:

  1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
  2. What are some factors that determine primer annealing temperature during PCR?
  3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
  4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
  5. How does the plasmid DNA enter the E. coli cells during transformation?
  6. Describe another assembly method in detail (such as Golden Gate Assembly)
    1. Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
    2. Model this assembly method with Benchling or Asimov Kernel!

Resources

  1. Primer Design: HTGAA’s Supplement to Gibson Assembly Recitation
  2. NEB’s (New England Biolabs) video Introduction to Gibson Assembly
  3. NEB’s (New England Biolabs) explanation & protocols for Gibson Assembly®
  4. PCR Master Mix. (n.d.). © Merck. https://www.sigmaaldrich.com/MX/es/technical-documents/technical-article/genomics/pcr/pcr-master-mix
  5. PCR Cycling Parameters–Six Key Considerations for Success. (n.d.). © ThermoFisher Scientific. https://www.thermofisher.com/mx/es/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/pcr-education/pcr-reagents-enzymes/pcr-cycling-considerations.html
  6. Restriction Enzyme Digestion. (n.d.). © New England Biolabs. https://www.neb.com/en/applications/cloning-and-synthetic-biology/dna-preparation/restriction-enzyme-digestion
  7. 2.14: Restriction Enzyme Digests. (2013). [video]. Basic Biology. JoVE. https://www.jove.com/v/5070/restriction-enzyme-digests-principle-procedure-and-applications
  8. Restriction Enzyme Digestion: Key Considerations. (n.d.). © ThermoFisher Scientific. https://www.thermofisher.com/mx/es/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/molecular-cloning/restriction-enzymes/restriction-enzyme-key-considerations.html
  9. PCR Basics. (n.d.). © ThermoFisher Scientific. https://www.thermofisher.com/mx/es/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/pcr-education/pcr-reagents-enzymes/pcr-basics.html
  10. Polymerase Chain Reaction. (n.d.). SnapGene. https://www.snapgene.com/guides/polymerase-chain-reaction
  11. Polymerase Chain Reaction (PCR). (n.d.). Addgene. https://www.addgene.org/protocols/pcr/
  12. Panja, S., Aich, P., Jana, B., & Basu, T. (2008). How does plasmid DNA penetrate cell membranes in artificial transformation process of Escherichia coli? Molecular Membrane Biology, 25(5), 411–422. https://doi.org/10.1080/09687680802187765
  13. Bacterial Transformation. (n.d.). Addgene. https://www.addgene.org/protocols/bacterial-transformation/
  14. Bacterial Transformation Workflow. (n.d.). © ThermoFisher Scientific. https://www.thermofisher.com/mx/es/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/molecular-cloning/transformation/bacterial-transformation-workflow.html
  15. Bacterial transformation & selection. (n.d.). Khan Academy. https://www.khanacademy.org/science/biology/biotech-dna-technology/dna-cloning-tutorial/a/bacterial-transformation-selection
  16. NEBridge® Golden Gate Assembly. (n.d.). © New England Biolabs. https://www.neb.com/en/applications/cloning-and-synthetic-biology/dna-assembly-and-cloning/golden-gate-assembly
  17. Golden Gate Assembly. (n.d.). SnapGene. https://www.snapgene.com/guides/golden-gate-assembly

HTGAA - Week 7: Genetic Circuits Part II: Neuromorphic Circuits



My Homework

WEEK 7 - SMART COMPUTATIONAL SOLUTIONS

This week covers neuromorphic genetic circuits, showing how engineered gene networks can implement neural-network “perceptron”-like computation and learning.

Lecture (Tues, Mar 17)

Genetic Circuits Part II: Neuromorphic Circuits
(▶️Recording)
Ron Weiss

Recitation (Wed, Mar 18)

Neuromorphic circuits & Biomaterials
(▶️Recording | 💻Slides)
Evan Holbrook, Ren Ramlan



Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

About IANNs

Intracellular Artificial Neural Networks (IANNs) are engineered gene networks inside living cells that mimic the behavior of artificial neural networks, particularly simple models like perceptrons.

In a traditional artificial neural network, you have inputs, weights, a summation step, and an activation function. IANNs recreate these same components using biological parts:

  • Inputs → concentrations of molecules (transcription factors, small molecules, or signals)
  • Weights → regulatory strengths (such as promoter strength, ribosome binding sites, or binding affinities between regulators and DNA)
  • Summation → combined effect of multiple regulators acting on a promoter
  • Activation function → nonlinear gene expression response (e.g., sigmoidal response of transcription)

So instead of silicon-based computation, the “computation” happens through gene expression and molecular interactions inside the cell.

What makes IANNs especially interesting is that they allow cells to perform analog, multi-input decision-making, rather than simple Boolean logic. For example, a cell could integrate several environmental signals and produce an output only if a weighted combination of those signals crosses a threshold—just like a perceptron classifying data. Additionally, some IANN designs incorporate mechanisms for tuning or learning, where the effective “weights” can be adjusted (for example, by modifying gene expression levels or regulatory interactions), allowing the system to adapt to new conditions.
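The perceptron analogy above can be sketched in a few lines of Python. This is only an illustrative toy, not a model of any real circuit: the weights, the bias, and the choice of a logistic sigmoid as the activation function are assumptions made for the example, with inputs standing in for normalized molecule concentrations.

```python
import math

def sigmoid(z: float) -> float:
    """Smooth, saturating activation; stands in for a sigmoidal gene-expression response."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs passed through the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical example: two activating signals and one repressing signal.
# Sign = activation vs. repression, magnitude = regulatory strength (assumed values).
weights = [2.0, 1.5, -3.0]
bias = -2.0  # sets the threshold that the weighted sum must cross

low = perceptron_output([0.2, 0.1, 0.9], weights, bias)   # weak activators, strong repressor
high = perceptron_output([1.0, 1.0, 0.0], weights, bias)  # strong activators, no repressor
print(f"low-input output:  {low:.3f}")
print(f"high-input output: {high:.3f}")
```

The output is graded rather than binary: it rises smoothly as the weighted combination of inputs approaches and crosses the threshold set by the bias.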

In short, IANNs are a bridge between:

  • Synthetic biology (engineering gene circuits)
  • Machine learning concepts (like neural networks and learning)

This bridge enables living cells to carry out more sophisticated computations such as pattern recognition, classification, and adaptive responses.


Advantages

  • IANNs can process continuous (analog) inputs rather than being limited to binary ON/OFF states. Traditional Boolean circuits treat signals as discrete, which restricts the complexity of responses. In contrast, IANNs allow graded responses to varying concentrations of molecules, making them more biologically realistic.

  • IANNs enable integration of multiple inputs in a weighted manner. Instead of simple logical operations like AND or OR, they can assign different “weights” to each input, allowing more nuanced decision-making—similar to how perceptrons work in artificial neural networks.

  • They provide greater computational complexity and flexibility. Boolean circuits scale poorly when trying to implement complex behaviors, often requiring many layers and components. IANNs can implement sophisticated functions (like classification or pattern recognition) more efficiently within a single network.

  • IANNs can, in principle, learn and adapt. While traditional genetic circuits are typically static once designed, IANNs could be engineered to adjust their parameters (such as weights) in response to environmental signals, enabling adaptive behavior.

  • IANNs also better mimic natural cellular decision-making, which is rarely purely binary. This makes them especially useful for synthetic biology applications where cells must respond to complex, noisy, and dynamic environments.

Overall, IANNs expand the capabilities of synthetic gene networks from simple logical operations to more powerful, flexible, and biologically relevant computation.


2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

A) Conceptual, idealized application: IANN-Based Metabolic Optimization Under Space Stress

Concept

An Intracellular Artificial Neural Network (IANN) can be engineered to function as a closed-loop metabolic controller, allowing cells to dynamically redistribute metabolic resources under the extreme conditions studied in space medicine, such as microgravity, radiation, and nutrient limitation.

Instead of relying on fixed metabolic pathways, the system continuously evaluates the internal physiological state of the cell and adjusts metabolic fluxes to maintain homeostasis and viability.

In effect, the system asks: “Given my current stress and resources, how should I reroute metabolism?”

Inputs (continuous intracellular signals)

The IANN integrates multiple metabolic and stress-related variables, such as:

  • ATP/ADP ratio (cellular energy status)
  • NADH/NAD⁺ ratio (redox balance)
  • Reactive oxygen species (ROS) levels
  • Availability of key nutrients (e.g., glucose, amino acids)

These inputs are inherently continuous and noisy, making them well-suited for neural-like computation rather than Boolean logic.

Processing (IANN computation)

Each input is assigned a regulatory weight based on its impact on cellular fitness. The IANN performs a weighted integration of these signals and applies a nonlinear activation function to classify the overall metabolic state, for example:

  • Energy-efficient state
  • Oxidative stress state
  • Nutrient-limited state

This enables the system to interpret complex intracellular conditions as distinct physiological regimes.

Outputs (active metabolic control)

Based on this classification, the IANN actively regulates cellular metabolism by:

  • Upregulating or downregulating key metabolic enzymes
  • Redirecting flux between pathways (glycolysis vs. oxidative phosphorylation)
  • Activating antioxidant and stress-response pathways

Unlike a biosensor, the system directly modifies cellular behavior, forming a true closed-loop control system.

Usefulness

In space environments, cells experience combined stresses such as radiation, microgravity, and limited nutrients. These conditions disrupt metabolic balance.

An IANN-based controller allows cells to:

  • Adapt in real time to fluctuating conditions
  • Maintain energy and redox balance
  • Improve survival and functional stability

Limitations and Challenges

  • Complexity of metabolic networks: Metabolic pathways are highly interconnected, making precise control difficult.
  • Weight tuning and calibration: Determining appropriate regulatory weights is challenging and may vary across conditions.
  • Cellular burden: Implementing complex circuits may reduce overall cellular efficiency.
  • Dynamic instability: Feedback regulation could lead to oscillations or unintended metabolic states.

Why IANNs are necessary

Traditional genetic circuits based on Boolean logic cannot effectively handle:

  • Continuous metabolic variables
  • Multi-factorial regulation
  • Dynamic adaptation

IANNs enable graded, multi-input decision-making, making them uniquely suited for real-time metabolic control.

This application highlights how IANNs can transform engineered cells from passive sensors into adaptive metabolic systems capable of maintaining homeostasis under extreme and unpredictable environments.
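As a rough illustration of the weighted integration and state classification described for this concept, the sketch below scores three metabolic states from four normalized signals and picks the most likely one with a softmax. Every number here (weights, biases, input values) is a made-up assumption for demonstration, and the signal names are hypothetical labels; a real circuit would need these parameters to be measured or tuned.

```python
import math

# Hypothetical, normalized (0..1) intracellular signals, in a fixed order.
SIGNALS = ["atp_adp", "nadh_nad", "ros", "nutrients"]

# Assumed regulatory weights: one row per metabolic state, one entry per signal.
WEIGHTS = {
    "energy_efficient": [2.0, 0.5, -2.0, 1.5],
    "oxidative_stress": [-0.5, 0.5, 3.0, 0.0],
    "nutrient_limited": [-1.0, -0.5, 0.0, -3.0],
}
# Assumed per-state offsets (thresholds).
BIAS = {"energy_efficient": -1.5, "oxidative_stress": -1.0, "nutrient_limited": 1.0}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_state(signals):
    """Return the most likely state and the probability of every state."""
    names = list(WEIGHTS)
    scores = [
        sum(w * signals[key] for w, key in zip(WEIGHTS[name], SIGNALS)) + BIAS[name]
        for name in names
    ]
    probs = softmax(scores)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))

state, probs = classify_state(
    {"atp_adp": 0.4, "nadh_nad": 0.5, "ros": 0.9, "nutrients": 0.6}
)
print(state, {name: round(p, 3) for name, p in probs.items()})
```

In a cell, each state score would correspond to the activity of a regulator that drives the matching response, for example antioxidant pathways when the oxidative stress score dominates.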


B) Tested, real, and successful application: machine-learning prediction of laboratory earthquakes

Concept

In this study (Rouet-Leduc et al., 2017), a machine learning system (Random Forest) was applied to continuous acoustic emission data from a laboratory fault. The system learns to predict the time remaining before failure (a lab earthquake) by identifying subtle patterns in the signal that are not detectable by humans. Unlike traditional methods based on recurrence intervals, this approach uses instantaneous physical signal features and reveals previously overlooked precursors hidden in what was thought to be noise.


Inputs (continuous physical signals)

The model analyzes continuous acoustic emission (AE) signals generated by the fault. From these signals, it extracts statistical features such as:

  • Mean (average signal amplitude)
  • Variance (signal fluctuation intensity)
  • Kurtosis (presence of outliers / impulsive events)
  • Autocorrelation (temporal structure of the signal)

These inputs are continuous, noisy, and high-dimensional, making them ideal for machine learning rather than classical threshold-based analysis.

Processing (ML computation)

The system uses a Random Forest algorithm, which consists of multiple decision trees. Each tree evaluates the statistical features extracted from short time windows of the acoustic signal.

The model:

  • Selects the most relevant features recursively
  • Assigns implicit weights through tree decisions
  • Combines outputs from multiple trees

It performs a nonlinear mapping from signal features → predicted time to failure. Importantly, predictions are made using only the current time window, without relying on historical data.

Outputs (prediction / system response)

The system outputs a continuous prediction of time remaining before the next failure event.

Key characteristics:

  • Real-time prediction (“now” prediction)
  • High accuracy (R² ≈ 0.89)
  • Works across the entire stress cycle, not just near failure

This demonstrates that the system continuously tracks the progression toward failure.

Usefulness

This approach overcomes limitations of traditional earthquake prediction methods by:

  • Identifying hidden signals previously classified as noise
  • Providing continuous, real-time forecasting
  • Reducing human bias in feature detection

It suggests that failure systems (earthquakes, avalanches, material fracture) emit continuous predictive signals, enabling earlier and more reliable forecasting.

Key scientific insight

The study reveals that low-amplitude acoustic signals, previously ignored, contain critical information about the system’s state. These signals likely originate from continuous grain motion in the fault gouge, reflecting gradual stress accumulation.

The system shows that failure is not sudden, but a progressive, measurable process.

Limitations and Challenges

  • Laboratory conditions differ from real Earth systems (scale, pressure, temperature)
  • Shear rates are much higher than natural faults
  • Translation to real earthquakes remains uncertain
  • Model predicts timing, not magnitude
This is a real demonstration of a system that:

  • Integrates continuous noisy signals
  • Performs nonlinear multi-parameter analysis
  • Produces real-time predictions

This system is computational (an external ML model), whereas an IANN would implement similar logic inside living cells.
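To make the input stage of this example more concrete, here is a minimal sketch of how the four statistical features named above (mean, variance, kurtosis, autocorrelation) could be computed for each time window of a signal. It is not the study's code: the function names, the window size, and the choice of lag-1 autocorrelation and population statistics are assumptions. The resulting feature vectors would then be passed to a regressor (for instance a Random Forest implementation such as scikit-learn's `RandomForestRegressor`), which is not shown here.

```python
from statistics import fmean

def window_features(window):
    """Mean, variance, kurtosis, and lag-1 autocorrelation of one signal window."""
    n = len(window)
    mean = fmean(window)
    centered = [x - mean for x in window]
    variance = sum(c * c for c in centered) / n  # population variance
    if variance == 0.0:
        # Flat window: higher-order statistics are undefined, so report zeros.
        return {"mean": mean, "variance": 0.0, "kurtosis": 0.0, "autocorr_lag1": 0.0}
    fourth_moment = sum(c ** 4 for c in centered) / n
    kurtosis = fourth_moment / variance ** 2  # non-excess kurtosis (3 for a Gaussian)
    autocorr = sum(a * b for a, b in zip(centered, centered[1:])) / (n * variance)
    return {"mean": mean, "variance": variance, "kurtosis": kurtosis, "autocorr_lag1": autocorr}

def sliding_windows(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

# Hypothetical usage on a short synthetic signal.
signal = [0.1, 0.3, -0.2, 0.0, 0.4, -0.1, 2.5, -0.3, 0.2, 0.1]
for window in sliding_windows(signal, size=5, step=5):
    print(window_features(window))
```

Each window is summarized independently, which matches the idea that predictions rely only on the current time window rather than on the signal's history.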

3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.

The diagram represents an intracellular single-layer perceptron, where biological components mimic a simple neural network:

  • Inputs are encoded as DNA sequences
  • Processing occurs through gene expression (transcription + translation)
  • Output is a measurable protein signal

System Breakdown

Inputs

  • X₁ (DNA → Csy4 endoribonuclease): Encodes an enzyme capable of cleaving RNA.
  • X₂ (DNA → fluorescent protein): Encodes the output protein, but its expression is regulated post-transcriptionally.

Processing (Tx + Tl)

  • Tx (Transcription): DNA → mRNA
  • Tl (Translation): mRNA → Protein

👉 X₁ produces Csy4 enzyme
👉 X₂ produces fluorescent protein mRNA

Regulation Mechanism

  • The Csy4 endoribonuclease recognizes and cleaves specific sequences in the mRNA of the fluorescent protein.
  • This acts like a negative weight (inhibition)

Output

Final fluorescence depends on:

  • Presence of X₂ (can produce protein)
  • Presence of X₁ (can suppress it)

Layer 1 flow

X1 ──| inhibitory weight |──┐
                              ↓
X2 ────────────────────────> OUTPUT (fluorescence)
  • X₂ = main signal
  • X₁ = regulator (weight)
  • Output = activation level

👉 This is equivalent to a single neuron with weighted inputs. The system does not just detect signals—it computes a weighted response through molecular interactions.
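The inhibitory role of X₁ can be illustrated with a toy steady-state model. The assumptions are loud ones: Csy4 level is taken to be proportional to the X₁ DNA dose, reporter mRNA is produced in proportion to the X₂ dose and removed by basal decay plus Csy4 cleavage, and fluorescence is proportional to intact mRNA. The parameter names and values (`alpha`, `delta`, `k_cleave`) are hypothetical and chosen only for illustration.

```python
def fluorescence(x1_dose, x2_dose, alpha=10.0, delta=1.0, k_cleave=5.0):
    """Toy steady-state output of the single-layer circuit.

    x1_dose : amount of DNA encoding Csy4 (acts as the negative weight)
    x2_dose : amount of DNA encoding the fluorescent reporter
    alpha   : assumed reporter production rate per unit of X2 DNA
    delta   : assumed basal mRNA decay rate
    k_cleave: assumed cleavage rate per unit of Csy4
    """
    csy4 = x1_dose  # assumption: Csy4 protein scales linearly with X1 dose
    intact_mrna = alpha * x2_dose / (delta + k_cleave * csy4)
    return intact_mrna  # assumption: fluorescence is proportional to intact mRNA

# More X1 (Csy4) lowers the output for the same amount of X2.
for x1 in (0.0, 0.5, 1.0):
    print(f"X1={x1:.1f}, X2=1.0 -> output {fluorescence(x1, 1.0):.2f}")
```

In this picture, `k_cleave` plays the role of the magnitude of the inhibitory weight: the larger it is, the more strongly a given X₁ dose suppresses the output.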


Now, we draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

The original system is limited because:

  • Regulation happens in a single step
  • No intermediate processing exists

Therefore: A multilayer intracellular perceptron is proposed, where biological regulators act as hidden nodes enabling hierarchical computation.

Layer 1: Inputs → Regulatory molecule (E1)
Layer 2: Regulatory molecule → Output protein  

System flow

Layer 1 — Signal Integration

Goal: Convert multiple inputs into a regulatory signal

Inputs: X₁, X₂ = DNA-encoded signals

Process: Transcription + translation produce an endoribonuclease (E1)

This layer performs:

👉 signal integration
👉 feature extraction


Hidden Node (E1)

  • E1 acts as a biological neuron output
  • Encodes processed information from Layer 1

E1 can:

  • cleave RNA
  • modulate stability
  • control translation efficiency

Equivalent to: hidden layer activation in neural networks

X1 ──┐
     ├──> [Layer 1: Transcription/Translation] ──> Endoribonuclease (E1)
X2 ──┘

E1 ────────────────| regulates |───────────────> mRNA (Fluorescent Protein)

                                   ↓
                           [Layer 2: Translation]

                                   ↓
                        Fluorescent Protein (OUTPUT)   
Layer 2 — Output Generation

Goal: Transform regulatory signal into measurable output

Input: E1 (from Layer 1)

Process: E1 regulates fluorescent protein mRNA

Output: Fluorescent protein level

Inputs (X1, X2)
        ↓
[Layer 1: Tx/Tl]
        ↓
Endoribonuclease (E1)
        ↓
[Layer 2: mRNA regulation + Tl]
        ↓
Fluorescent Output 

The new design allows:

  • Nonlinear responses
  • Signal filtering
  • Amplification or suppression
  • Temporal dynamics (if extended)

By introducing an intermediate regulatory layer, intracellular circuits can implement hierarchical computation, where molecular species act as hidden nodes transforming input signals before generating an output response.
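A minimal numerical sketch of this two-layer idea is given below. It assumes that layer 1 combines X₁ and X₂ into a weighted drive that sets the level of the endoribonuclease E1 through a Hill function, and that layer 2 is a reporter whose mRNA is cleaved by E1. All function names, weights, and rate constants are hypothetical and serve only to show how a hidden node transforms the inputs before the output is produced.

```python
def hill(s, K=1.0, n=2.0):
    """Hill activation: 0 when s = 0, approaching 1 when s is much larger than K."""
    return s ** n / (K ** n + s ** n)

def layer1_e1(x1, x2, w1=1.0, w2=1.0, e1_max=1.0):
    """Layer 1: weighted integration of the inputs sets the E1 (hidden node) level."""
    drive = max(0.0, w1 * x1 + w2 * x2)  # assumed weighted sum, clipped at zero
    return e1_max * hill(drive)

def layer2_output(e1, alpha=10.0, delta=1.0, k_cleave=9.0):
    """Layer 2: reporter level at steady state, reduced by E1 cleavage of its mRNA."""
    return alpha / (delta + k_cleave * e1)

def two_layer(x1, x2):
    """Full circuit: inputs -> E1 -> fluorescent output."""
    return layer2_output(layer1_e1(x1, x2))

for x1, x2 in [(0.0, 0.0), (1.0, 0.0), (2.0, 2.0)]:
    print(f"X1={x1}, X2={x2} -> E1={layer1_e1(x1, x2):.2f}, output={two_layer(x1, x2):.2f}")
```

Because the Hill function saturates, the hidden node filters the inputs: small changes in a weak input barely move the output, while the response flattens again once E1 is near its maximum.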


Sketch

INPUT LAYER
-----------

  X1 (DNA → Csy4)        X2 (DNA → regulator)
        │                      │
        └──────┬───────────────┘
               ↓
        ┌───────────────────┐
        │   Layer 1         │
        │ (Tx + Tl)         │
        │                   │
        │  produces E1      │
        └────────┬──────────┘
                 ↓
        Endoribonuclease (E1)


HIDDEN → OUTPUT CONNECTION
--------------------------

        E1 ───────┤ cleavage/regulation ├───────┐
                                                   ↓


OUTPUT LAYER
------------

        ┌────────────────────────────┐
        │   Layer 2                  │
        │ (regulated mRNA → Tl)      │
        │                            │
        │ Fluorescent Protein        │
        └────────────────────────────┘
| Neural Network | Biological |
| --- | --- |
| Neuron | Gene circuit |
| Weight | Strength of regulation (cleavage efficiency) |
| Activation | Protein expression level |
| Hidden layer | Regulatory molecule (endoribonuclease) |

The multilayer intracellular perceptron enables hierarchical signal processing, where intermediate biomolecular regulators act as hidden nodes transforming input signals into controlled gene expression outputs.


Assignment Part 2: Fungal Materials

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Fungal materials are bio-based composites made primarily from mycelium, the root-like network of fungi. The mycelium acts as a natural binder, growing through agricultural waste (e.g., straw, husks) and forming a solid structure.

Examples of existing fungal materials

| Material | Example | Uses | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Mycelium-Based Packaging | Ecovative Design | Replacement for polystyrene; protective packaging for electronics and furniture | Biodegradable (compostable in weeks); renewable raw materials; low energy production | Lower durability than plastics; sensitive to moisture; limited shelf life |
| Mycelium Composites (Furniture & Panels) | MycoWorks; mycelium boards in architecture | Furniture (chairs, tables); interior panels; lightweight structural components | Lightweight; good compressive strength; moldable into custom shapes | Lower tensile strength than wood/metal; not suitable for load-bearing structures; variability in properties |
| Mycelium Leather (Biofabricated Leather) | Bolt Threads (Mylo™); MycoWorks (Reishi™) | Fashion (bags, shoes, clothing); upholstery | Animal-free; lower environmental impact than leather; customizable texture and thickness | Scaling challenges; durability not yet equal to high-grade leather; can be expensive |
| Mycelium Bricks & Construction Materials | Experimental architectural applications | Insulation panels; temporary structures; sustainable construction | Fire-resistant; good thermal and acoustic insulation; very low carbon footprint | Low structural strength; not standardized for building codes; degradation over time |
| Mycelium Foams | | Insulation; shock absorption; acoustic dampening | Replaces petroleum-based foams; biodegradable; non-toxic | Less uniform than synthetic foams; performance variability |

Comparison with traditional materials

| Property | Fungal Materials | Traditional Materials |
| --- | --- | --- |
| Source | Renewable (biological) | Fossil-based / mined |
| Biodegradability | High | Low (plastics persist) |
| Energy use | Low (grown, not manufactured) | High (industrial processing) |
| Mechanical strength | Moderate–low | High |
| Durability | Limited | High |
| Environmental impact | Low | High (CO₂, pollution) |

Key advantages

Sustainability

  • Grown from waste
  • Compostable
  • Circular economy compatible

Energy Efficiency

  • No high-temperature processing
  • Self-assembling material

Design Flexibility

  • Can grow into molds
  • Tunable properties via growth conditions

Key limitations

  • Mechanical Constraints: Not as strong as steel, concrete, or advanced polymers
  • Environmental Sensitivity: Moisture and biological degradation
  • Standardization Issues
    • Variability between batches
    • Hard to scale consistently

2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Environmental Context – Mexico Oil Spill (YeT aGaIN! 🙃🫠)

Recent petroleum spills in my cute little Mexico have once again highlighted the vulnerability of marine and coastal ecosystems to hydrocarbon contamination. These events have led to severe environmental consequences, including water pollution, damage to marine habitats, and visible distress and mortality in wildlife. In response, local communities and volunteers around the country have mobilized grassroots efforts, such as collecting human and animal hair to create absorbent barriers for oil cleanup. While these initiatives demonstrate remarkable social engagement, they also underscore the limitations of current response strategies, which are often reactive, labor-intensive, and insufficient for large-scale remediation. This situation emphasizes the urgent need for innovative, scalable, and biologically driven solutions. Engineering fungi capable of degrading petroleum compounds offers a promising approach, as such systems could actively break down pollutants in situ, complementing physical cleanup efforts and contributing to faster and more sustainable ecosystem recovery.

Environmental damage

Engineering Fungi for Oil Spill Remediation

The proposal is to genetically engineer fungi to detect, absorb, and degrade petroleum hydrocarbons in contaminated marine and coastal environments.

Fungi would be designed to:

  • Secrete hydrocarbon-degrading enzymes, such as laccases and peroxidases, to break down complex petroleum compounds into less toxic molecules.
  • Enhance their mycelial network structure to physically trap and retain oil, similar to how materials like human hair are currently used in cleanup efforts.
  • Sense the presence of hydrocarbons and upregulate degradation pathways only when pollutants are detected, improving efficiency and reducing unnecessary metabolic burden.

The motivation for this approach comes from the urgent need for more effective and scalable responses to oil spills. Current methods are often limited to physical removal or absorption, which do not fully eliminate contaminants. In contrast, engineered fungi could provide a self-sustaining, in situ bioremediation system that not only contains but actively neutralizes pollutants, accelerating ecosystem recovery.

Conceptual system overview

Hydrocarbons (oil spill)
            ↓
     [Detection system]
            ↓
  Activation of gene circuits
            ↓
 Enzyme secretion + mycelial growth
            ↓
Oil degradation + physical trapping
            ↓
 Reduced toxicity / ecosystem recovery

Why use fungi instead of bacteria?

Advantages of Fungi

1. Physical Structure (Mycelium Networks)

Fungi form extensive filamentous networks that can:

  • Penetrate contaminated sediments and shorelines
  • Physically trap oil particles
  • Cover large surface areas

This makes them both a material and a metabolic system, unlike bacteria.

2. Ability to Degrade Complex Compounds

Fungi naturally degrade highly complex organic materials such as lignin, an aromatic polymer with structural similarities to many petroleum compounds (for example, polycyclic aromatic hydrocarbons). This gives them a strong advantage in breaking down recalcitrant hydrocarbons.

3. Environmental Robustness

Many fungi can tolerate:

  • Low oxygen environments
  • Nutrient-poor conditions
  • Harsh and variable ecosystems

This makes them well suited to real-world spill conditions.

4. In Situ Growth and Self-Propagation

Once deployed, fungal mycelium can:

  • Expand across contaminated areas
  • Continuously produce degrading enzymes
  • Self-repair and persist over time

5. Potential for Integrated Sensing and Response

Fungi can be engineered with regulatory circuits to:

  • Detect pollutants
  • Dynamically adjust enzyme production

This enables smart, adaptive bioremediation systems.


Limitations compared to bacteria

  • Slower growth rates
  • More complex genetic engineering
  • Fewer standardized tools and models
  • Greater difficulty in precise control

Assignment Part 3: First DNA Twist Order

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

  1. Review the Individual Final Project documentation guidelines.
  2. Submit this Google Form with your draft Aim 1, final project summary, HTGAA industry council selections, and shared folder for DNA designs. DUE MARCH 20 FOR MIT/HARVARD/WELLESLEY STUDENTS
  3. Review Part 3: DNA Design Challenge of the week 2 homework. Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website.

Resources

  1. The perceptron, the basis of artificial neural networks: https://www.geeksforgeeks.org/deep-learning/what-is-perceptron-the-simplest-artificial-neural-network/
  2. Many examples of artificial neural networks made using biomolecules: https://doi.org/10.1016/j.biosystems.2024.105164
  3. Montesinos López O. A., Montesinos López A., & Crossa, J. Chapter 10: Fundamentals of Artificial Neural Networks and Deep Learning. Multivariate Statistical Machine Learning Methods for Genomic Prediction. (2022). Springer. https://www.ncbi.nlm.nih.gov/books/NBK583971/
  4. Nilsson, A., Peters, J. M., Meimetis, N., et al. (2022). Artificial neural networks enable genome-scale simulations of intracellular signaling. Massachusetts Institute of Technology. Nature Communications, 13 (1). https://hdl.handle.net/1721.1/147780
  5. Rouet-Leduc, B., Hulbert, C., Lubbers, N., et al. (2017). Machine learning predicts laboratory earthquakes. Geophysical Research Letters, 44, 9276–9282. https://doi.org/10.1002/2017GL074677
  6. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/

HTGAA - Week 8: Spring Break

HTGAA - Week 9: Cell-Free Systems



My Homework

WEEK 9 - X

This week introduces synthesis of proteins using cellular machinery outside of a cell.

Lecture (Tues, Mar 17)

Cell-Free Systems
(▶️Recording)
Kate Adamala, Peter Nguyen, Ally Huang

Recitation (Wed, Mar 18)

Cell-free protein synthesis
(▶️Recording | 💻Slides)
Ben Arias-Almeida, Ice Kiattisewee



Homework Part A: General and Lecturer-Specific Questions

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

A.1. General homework questions

  1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
  2. Describe the main components of a cell-free expression system and explain the role of each component.
  3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.
  4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
  5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
  6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

For the answers to Homework Part A - General homework questions and Homework question from Kate Adamala - see document below 😀👇🏻


A.2. Homework question from Kate Adamala

Design an example of a useful synthetic minimal cell as follows:

Example solution

Based on: Lentini, R. et al., 2014. Nat comm, 5, p.4012.

  1. Pick a function and describe it.
    1. What would your synthetic cell do? What is the input and what is the output?
      Expand the sensing capacity of bacteria. Input: theophylline (inert to bacteria). Output of the SMC: IPTG. Output of the whole system: GFP produced in bacteria. (Theophylline aptamer reference: *Martini, L. & Mansy, S.S., 2011. Cell-like systems with riboswitch controlled gene expression. Chemical Communications, 47(38), p.10734.*)
    2. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?
      No. If the IPTG were not encapsulated, it would go into the bacteria without the need of theophylline-induced membrane channel synthesis, thus the synthetic cell actuator would not exist.
    3. Could this function be realized by a genetically modified natural cell?
      Yes, in this particular case: the theophylline aptamer could be incorporated into a transformed gene. This lacks generality, though: it is easier to make an SMC than to modify bacteria, so in this system a single bacterial reporter can be used to detect various small molecules.
    4. Describe the desired outcome of your synthetic cell operation.
      In the presence of SMC, bacteria sense theophylline.
  2. Design all components that would need to be part of your synthetic cell.
    1. What would be the membrane made of?
      Phospholipids + cholesterol.
    2. What would you encapsulate inside? Enzymes, small molecules.
      cell-free Tx/Tl system, IPTG, gene for membrane transporter under the control of theophylline aptamer.
    3. Which organism will your Tx/Tl system come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small-molecule-modulated promoters, like Tet-ON, you need mammalian)
      Bacterial, because of the theophylline riboswitch used as SMC input.
    4. How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)
      The membrane is permeable to the input molecule (theophylline); the output is IPTG, which crosses the membrane via the pore formed after theophylline-initiated gene expression.
  3. Experimental details
    1. List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
      • Lipids: POPC, cholesterol
      • Enzymes: bacterial cell-free Tx/Tl
      • Genes: α-hemolysin (αHL) to encapsulate in SMC
      • Biological cells: *E. coli* transformed with GFP under T7 promoter and a lac operator
    2. How will you measure the function of your system?
      Measure GFP output of the cells via flow cytometry. Alternatively, use enzymatic reporter, like luciferase, and measure bulk output of the enzyme.
Artificial cells translate chemical signals for E. coli.
(a) In the absence of artificial cells (circles), E. coli (oblong) cannot sense theophylline.
(b) Artificial cells can be engineered to detect theophylline and in response release IPTG, a chemical signal that induces a response in E. coli.
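
The signal chain in the example solution can be read as simple Boolean logic. The sketch below is only an idealized illustration of that reasoning, not a kinetic model: the function name `gfp_reported` and its parameters are hypothetical, and every step (riboswitch activation, pore formation, IPTG release, induction) is assumed to be all-or-nothing.

```python
def gfp_reported(theophylline, smc_present, iptg_encapsulated=True):
    """Return True if the reporter E. coli would be expected to produce GFP.

    Idealized on/off logic taken from the example solution above.
    """
    if not smc_present:
        # Without artificial cells there is no IPTG source,
        # and E. coli cannot sense theophylline on its own.
        return False
    if not iptg_encapsulated:
        # Free IPTG reaches the bacteria whether or not theophylline
        # is present, so the output no longer depends on the input.
        return True
    # Theophylline switches on pore (aHL) expression inside the SMC,
    # and the pore lets the encapsulated IPTG escape.
    pore_expressed = theophylline
    return pore_expressed


for theo in (False, True):
    for smc in (False, True):
        print(f"theophylline={theo!s:5}  SMC={smc!s:5}  GFP={gfp_reported(theo, smc)}")
```

With encapsulation, GFP is reported only when both theophylline and the SMC are present, which is why answer 1.2 argues that encapsulation is needed.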



Answers to Homework Part A.1 and A.2: 👇🏻


Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:

  • Write a one-sentence summary pitch sentence describing your concept.
  • How will the idea work, in more detail? Write 3-4 sentences or more.
  • What societal challenge or market need will this address?
  • How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

Application field: Architecture

1. One-sentence summary pitch

“Bioactive architectural lattices that detect airborne pathogens and release antimicrobial peptides on-demand, creating self-sanitizing indoor environments activated by humidity or water spray.”


2. How will the idea work?

This concept builds upon wearable freeze-dried cell-free (wFDCF) technology developed by Nguyen and colleagues at the Wyss Institute, but scales it up for architectural applications. The system consists of 3D-printed biopolymer lattices (composed of cellulose fibers, chitosan gels, and silk fibroin) embedded with freeze-dried “biosites”: porous pellets containing cell-free TXTL (transcription-translation) machinery, DNA circuits encoding antimicrobial peptides (such as nisin or LL-37), and riboswitch-based pathogen sensors.

When airborne pathogens (S. aureus, E. coli, or influenza virus) contact the lattice surface, they are captured by the porous biopolymer matrix and detected via toehold switch sensors or CRISPR-Cas12a-based genetic circuits that specifically recognize pathogen-derived nucleic acids. Upon detection, the riboswitch-triggered circuit activates expression of antimicrobial peptides, which are immediately released from the cell-free system into the surrounding environment to neutralize the threat. The entire system is activated by ambient humidity or controlled water misting, eliminating the need for living cells while providing programmable, on-demand biocidal functionality within building materials.

The lattices are designed with functionally graded porosity: denser regions provide structural integrity, while sparser, high-porosity zones maximize air contact with biosites and facilitate capillary-driven fluid distribution during rehydration. The modular, foldable geometry allows installation as ceiling-hung ribbons, wall partitions, or facade elements that maximize surface area exposure to air circulation.


3. Societal challenge and market need

This technology addresses the global challenge of healthcare-associated infections (HAIs) and poor indoor air quality. HAIs alone cost the US healthcare system approximately $28–45 billion annually and cause about 99,000 deaths per year. The COVID-19 pandemic starkly revealed the lack of rapid, accurate environmental diagnostics and the vulnerability of indoor spaces to airborne pathogen transmission.

Current solutions rely on passive HEPA filtration or chemical disinfectants that require manual application and provide no real-time detection capability. This bioactive architectural system offers:

  • Real-time pathogen detection without laboratory infrastructure
  • Autonomous, targeted antimicrobial response rather than blanket chemical treatment
  • Biodegradable, non-toxic materials (silk fibroin, cellulose, chitosan) that replace carcinogenic and carbon-positive conventional building materials
  • Scalability through additive manufacturing and modular assembly

The market need extends beyond healthcare to include schools, public transportation hubs, food processing facilities, and residential buildings—any indoor environment where air quality and pathogen control are critical.


4. Addressing limitations of Cell-Free reactions

a) Activation with water

Rather than viewing water-activation as a limitation, this system leverages it as a controlled activation mechanism. The biosites are designed to respond to:

  • Ambient humidity (40–60% RH typical of indoor environments) for passive, continuous low-level monitoring
  • Controlled water misting systems (similar to existing building humidification or fire suppression systems) for active, on-demand activation when elevated pathogen risk is detected

The biopolymer matrix (silk fibroin and sodium alginate) naturally regulates water uptake through capillary action, ensuring consistent rehydration of embedded cell-free pellets without manual intervention. The system uses 1.5×-concentrated cell-free reactions to accelerate signal output, ensuring antimicrobial peptide production completes before evaporation terminates the reaction.


b) Stability and Shelf-Life

Freeze-dried cell-free systems have demonstrated shelf stability for months to years when properly sealed and stored at room temperature. To enhance longevity in architectural applications:

  • Biosites are encapsulated in lyophilized biopolymer sponges that protect against oxidation and moisture ingress during storage
  • Silk fibroin stabilization (which showed 74% expression retention compared to buffer-diluted controls) provides a protective, crowding environment that enhances protein synthesis kinetics
  • Modular replacement design: Individual biosite pellets can be swapped out when depleted, similar to changing air filters, without replacing entire structural elements

c) One-time use

While individual biosites are single-use (one activation cycle per freeze-dried pellet), the system architecture is designed for modularity and serviceability:

  • Biosites are press-fitted into lattice cells, allowing easy removal and replacement
  • Distributed sensing arrays ensure that only activated zones require replacement, while the structural lattice remains intact for years
  • Future iterations could incorporate regenerative capsules containing fresh freeze-dried TXTL reservoirs that auto-dispense to replenish spent biosites, though this remains an area for future development

Additional mitigation strategies

  • Evaporation control: Impermeable silicone elastomer barriers (as demonstrated in wFDCF wearables) constrain rehydration volume to ~50 μL per sensor, preventing excessive dilution
  • Signal amplification: CRISPR-Cas12a’s collateral cleavage activity provides signal amplification, enabling detection at femtomolar sensitivity even with limited reaction time
  • Colorimetric readout: For maintenance purposes, visible color change (via LacZ or other enzymatic reporters) indicates which biosites have been activated and require replacement

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

  1. Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)
  2. Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)
  3. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)
  4. Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)
  5. Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

Proposal: Real-Time Monitoring of Radiation-Induced DNA Damage Response in Space Using BioBits® Cell-Free Synthesis of γ-H2AX and 53BP1 Repair Proteins


1. Background information

Space radiation poses severe health risks to astronauts, causing DNA double-strand breaks (DSBs) that can lead to cancer, immune dysfunction, and cardiovascular disease. Current biodosimetry requires blood sample return to Earth, creating critical delays in assessing astronaut health during long-duration missions. Cell-free protein synthesis (CFPS) has been validated aboard the ISS, demonstrating that BioBits® can produce functional proteins and biosensors in microgravity using minimal resources. This proposal addresses the urgent need for real-time, in-situ DNA damage assessment capability to enable immediate medical countermeasures and personalized radiation protection during deep space exploration to the Moon and Mars.

2. Molecular/Genetic target

Primary targets: γ-H2AX (phosphorylated H2A.X histone) and 53BP1 (tumor suppressor p53-binding protein 1) DNA damage response proteins; secondary target: fluorescent reporter (mCherry or sfGFP) for visualization.

3. Relationship between target and Space Biology challenge

γ-H2AX and 53BP1 are critical biomarkers of DNA DSBs, the most dangerous form of radiation-induced damage. These proteins form nuclear foci at damage sites, with γ-H2AX appearing within minutes and 53BP1 recruiting repair machinery. Astronauts experience elevated cell-free mitochondrial DNA and persistent DNA damage during spaceflight, correlating with immune dysfunction and long-term health risks. By synthesizing these repair proteins in real-time using BioBits®, we can develop a quantitative biosensor that measures radiation exposure through functional DNA repair capacity rather than just damage accumulation, providing actionable data for crew health management during missions beyond low-Earth orbit where radiation exposure increases dramatically.

4. Hypothesis and research goal

Hypothesis: BioBits® cell-free systems can synthesize functional γ-H2AX and 53BP1 proteins in microgravity that retain DNA damage-binding activity, enabling development of a rapid, fluorescence-based assay for monitoring astronaut cellular radiation response without requiring living cells or sample return to Earth.

Reasoning: Previous Genes in Space experiments validated that BioBits® performs comparably in space and on Earth for protein expression and biosensor applications. The 2024 winning proposal demonstrated cell-free bacteriophage synthesis in space, establishing precedent for complex macromolecular assembly. γ-H2AX and 53BP1 are well-characterized, robustly folding proteins that do not require eukaryotic post-translational modifications for their damage-recognition functions. By expressing these proteins with fluorescent tags (mCherry-γ-H2AX and sfGFP-53BP1 fusion proteins), we can visualize protein synthesis using the P51™ Fluorescence Viewer and validate functionality through DNA-binding assays. This approach leverages the freeze-dried, room-temperature-stable nature of BioBits® to create a “just-add-water” diagnostic platform suitable for resource-constrained spacecraft environments.

5. Experimental plan

Samples: BioBits® freeze-dried pellets with plasmids encoding mCherry-γ-H2AX and sfGFP-53BP1; positive control (RFP expression plasmid); negative control (no DNA template).

Procedure: Rehydrate pellets with nuclease-free water, incubate at 37°C using the miniPCR® thermal cycler for 90 minutes, and visualize fluorescence with the P51™ Viewer. Functional validation: add synthesized proteins to DNA-coated microbeads treated with bleomycin (a DNA damage inducer) and assess binding via fluorescence microscopy or the P51™ Viewer.

Measurements: Fluorescence intensity (protein yield), DNA-binding efficiency (functional assay), comparison between spaceflight and ground controls. Data recorded via iPad imaging for quantitative analysis.


Homework Part B: Individual Final Project

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

We’d like students to start exploring their final project in depth this week! Of your three Aims, for this week you should have at least Aim 1 decided and written down.

  1. Put your chosen final project slide in the appropriate slide deck following the instructions on slide 1:
  2. Submit this Final Project selection form if you have not already.
  3. Begin planning how you will write your final project documentation based on these guidelines
  4. Prepare your first DNA order and put it in the “Twist (MIT)” or “Twist (Nodes)” tab of the 2026 HTGAA Ordering: DNA, Reagents, Consumables spreadsheet, as appropriate.
    • First Twist order deadline for MIT/Harvard/Wellesley students is Friday, April 3 at 11PM ET
    • First Twist order deadline for Committed Listeners is Friday, April 10 at 11PM ET. (Your Node Lead will place the Twist order, so please work with them to finalize your constructs and ordering decisions.)

Resources

  1. Cell-free protein synthesis (explanation by minipcr’s DNAdots)
  2. Validation of Cell-Free Protein Synthesis Aboard the International Space Station (ACS Synthetic Biology paper by Ally Huang et al.)

  1. Lang, X., Zhang, C., Lin, J., et al. (2025). A simplified and highly efficient cell-free protein synthesis system for prokaryotes. eLife 14:RP109495. https://doi.org/10.7554/eLife.109495.1
  2. Hunt, A. C., Rasor, B. J., Seki, K., et al. (2024). Cell-Free Gene Expression: Methods and Applications. Chemical Reviews, 125, 1, 91–149. https://doi.org/10.1021/acs.chemrev.4c00116
  3. Cell-free Protein Synthesis. (n.d.). Isomerase.
    https://isomerase.com/about-us/articles/cell-free-protein-synthesis-isomerase
  4. Challener, C. A. (2024). Cell-Free Protein Synthesis Holds Real Potential to Transform Drug Development and Manufacturing. Pharma’s Almanac.
    https://www.pharmasalmanac.com/articles/cell-free-protein-synthesis-holds-real-potential-to-transform-drug-development-and-manufacturing
  5. Cell-free Protein Synthesis: Principle, Advantages, and Applications. SinoBiological.
    https://www.sinobiological.com/resource/antibody-technical/cell-free-protein-synthesis
  6. Zemella, A., Thoring, L., Hoffmeister, C., et al. (2015). Cell-Free Protein Synthesis: Pros and Cons of Prokaryotic and Eukaryotic Systems. ChemBioChem, 16(17):2420-2431.
    https://doi.org/10.1002/cbic.201500340
  7. An Introduction to Protein Expression. (n.d.). Promega Corporation.
    https://www.promega.com/resources/guides/protein-analysis/protein-expression-methods/
  8. Steinkühler, J., Peruzzi, J. A., Krüger, A., et al. (2023). Improving Cell-Free Expression of Model Membrane Proteins by Tuning Ribosome Cotranslational Membrane Association and Nascent Chain Aggregation. ACS Synthetic Biology, 13, 1, 129–140.
    https://doi.org/10.1021/acssynbio.3c00357
  9. Cell-Free Protein Expression Systems. (n.d.). Promega Corporation, Technical Guide.
    https://www.promega.com/-/media/files/resources/product-guides/proteomics/cell-free-protein-expression-systems.pdf
  10. Yadav, S., Perkins, A. J. P., Liyanagedera, S. B. W., et al. (2025). ATP Regeneration from Pyruvate in the PURE System. ACS Synthetic Biology, 14, 1, 247–256.
    https://doi.org/10.1021/acssynbio.4c00697
  11. Batista, A.C., Soudier, P., Kushwaha, M. and Faulon, J. L. (2021), Optimising protein synthesis in cell-free systems, a review. Eng. Biol, 5: 10-19. https://doi.org/10.1049/enb2.12004
  12. Wang, Y., Zhang, YH. P. (2009). Cell-free protein synthesis energized by slowly-metabolized maltodextrin. BMC Biotechnol, 9:58.
    https://doi.org/10.1186/1472-6750-9-58
  13. Anderson, M. J., Stark, J. C., Hodgman, C. et al. (2015). Energizing eukaryotic cell-free protein synthesis with glucose metabolism, FEBS Letters, 589. https://pmc.ncbi.nlm.nih.gov/articles/PMC4651010/
  14. Troubleshooting Protein Folding Issues in Cell-Free Synthesis: Tips from Industry Experts. (n.d.). CD Biosynsis. https://www.biosynsis.com/troubleshooting-protein-folding-issues-in-cell-free-synthesis-tips-from-industry-experts.html

  1. Chen, Z., Wang, J., Sun, W. et al. (2018). Synthetic beta cells for fusion-mediated dynamic insulin secretion. Nat Chem Biol., 14(1):86-93.
    https://pmc.ncbi.nlm.nih.gov/articles/PMC6053053/
  2. Webber, M. J., Anderson, D. G. & Langer, R. (2015). Engineering Synthetically Modified Insulin for Glucose-Responsive Diabetes Therapy. Expert Rev Endocrinol Metab., 10(5):483-489.
    https://pmc.ncbi.nlm.nih.gov/articles/PMC4999256/
  3. Liu, J., Xue, J., Fu, L. et al. (2022). Genetically Encoded Synthetic Beta Cells for Insulin Biosynthesis and Release under Hyperglycemic Conditions. Adv. Funct. Mater., 32, 2111271.
    https://doi.org/10.1002/adfm.202111271
  4. NCBI Gene Database. ompF outer membrane porin F [Escherichia coli str. K-12 substr. MG1655]. Gene ID: 945554.
    https://www.ncbi.nlm.nih.gov/gene/945554
  5. Hilburger, C. E., Jacobs, M. L., Lewis, K. R. et al. (2019). Controlling Secretion in Artificial Cells with a Membrane AND Gate. ACS Synth Biol., 8(6):1224-1230.
    https://pmc.ncbi.nlm.nih.gov/articles/PMC6885402/

  1. Green, T. P., Talley, J. P., & Bundy, B. C. (2025). Recent Advances in Developing Cell-Free Protein Synthesis Biosensors for Medical Diagnostics and Environmental Monitoring. Biosensors, 15(8), 499.
    https://doi.org/10.3390/bios15080499
  2. Ho, G., Kubušová, V., Irabien, C. et al. (2023). Multiscale design of cell-free biologically active architectural structures. Front. Bioeng. Biotechnol. 11:1125156.
    https://doi.org/10.3389/fbioe.2023.1125156
  3. Nguyen, P.Q., Soenksen, L.R., Donghia, N.M. et al. (2021). Wearable materials with embedded synthetic biology sensors for biomolecule detection. Nat Biotechnol 39, 1366–1374.
    https://doi.org/10.1038/s41587-021-00950-3
  4. Wyss Institute. 2021. Face masks that can diagnose COVID-19.
    https://wyss.harvard.edu/news/face-masks-that-can-diagnose-covid-19/
  5. SynBioBeta. 2023. Designing Cell-Free, Biologically Active Architecture.
    https://www.synbiobeta.com/read/designing-cell-free-biologically-active-architecture
  6. Wyss Institute, 2021. wFDCF Face Masks: A Wearable COVID-19 Diagnostic.
    https://wyss.harvard.edu/technology/wfdcf-face-masks-a-wearable-covid-19-diagnostic/

  1. Kim, S., Min, K., Park, YG. et al. Stem cells in space: microgravity effects on stem cell fate and implications for regenerative medicine. npj Microgravity 12, 6 (2026). https://doi.org/10.1038/s41526-025-00547-z
  2. Beheshti, A., McDonald, J. T., Hada, M. et al. (2021). Genomic Changes Driven by Radiation-Induced DNA Damage and Microgravity in Human Cells. International Journal of Molecular Sciences, 22(19), 10507. https://doi.org/10.3390/ijms221910507
  3. Bisserier, M., Shanmughapriya, S., Rai, A. K. et al. (2021). Cell-Free Mitochondrial DNA as a Potential Biomarker for Astronauts’ Health. Journal of the American Heart Association, AHA Journals, 10(21).
    https://doi.org/10.1161/JAHA.121.022055
  4. Genes in Space, 2021. Meet the Genes in Space Toolkit: BioBits® cell-free system. https://www.genesinspace.org/news/blog/meet-the-genes-in-space-toolkit-biobits-cell-free-system/
  5. Bezdan, D., Grigorev, K., Meydan, C. et al. (2020). Cell-free DNA (cfDNA) and Exosome Profiling from a Year-Long Human Spaceflight Reveals Circulating Biomarkers. iScience, 23. Cell Press.
    https://doi.org/10.1016/j.isci.2020.101844
  6. Kocalar, S., Miller, B. M., Huang, A., et al. (2024). Validation of Cell-Free Protein Synthesis Aboard the International Space Station. ACS Synth Biol. 15;13(3):942-950. https://doi.org/10.1021/acssynbio.3c00733
  7. Genes in Space, 2025. Genes in Space winners receive a message from the ISS. https://www.genesinspace.org/news/blog/genes-in-space-winners-receive-a-message-from-the-iss/
  8. Genes in Space, 2025. Genes in Space 2024 student experiment successfully launched to the International Space Station. https://www.genesinspace.org/news/press/genes-in-space-2024-student-experiment-successfully-launched-to-the-international-space-station/
  9. Moreno-Villanueva, M., Wong, M., Lu, T. et al. (2017). Interplay of space radiation and microgravity in DNA damage and DNA damage response. npj Microgravity 3, 14. https://doi.org/10.1038/s41526-017-0019-7
  10. Pariset, E., Bertucci, A., Petay, M. et al. (2020). DNA Damage Baseline Predicts Resilience to Space Radiation and Radiotherapy. Cell Rep. 8;33(10):108434.
    https://doi.org/10.1016/j.celrep.2020.108434

HTGAA - Week 10: Advanced Imaging & Measurement Technology


Lecture (Tues, Apr 7)

Advanced Imaging & Measurement Tech
(▶️Recording)
Evan Daugharthy, Lindsay Morrison.

Recitation (Wed, Apr 8)

Mass spectrometry
(▶️Recording | 💻Slides)
Waters Corp. Team



My Homework

WEEK 10 - X

This lecture presents a range of advanced technologies for precise measurement of proteins at atomic scales, characterization of chemical composition, and detection of protein sequence and structure.

Homework is partly based on data that will be generated in the Waters Immerse Lab in Cambridge, MA. Students will characterize green fluorescent protein (eGFP, a recombinant protein standard) structure (primary, secondary/tertiary) in the lab using liquid chromatography and mass spectrometry, as well as Keyhole Limpet Hemocyanin (KLH) oligomeric states using charge detection mass spectrometry (CDMS). Data generated in the lab needed to do the homework is included both within this document and in the Appendix of the laboratory protocol.


Homework: Final Project

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

For your final project:

  • Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.
  • Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
  • What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

Homework: Waters Part I — Molecular Weight

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

  1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

    eGFP Sequence:
    MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH
    Note: This contains a His-purification tag (HHHHHH) and a linker (the LE before it).

  2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:
    1. Determine $z$ for each adjacent pair of peaks $(n, n+1)$ using: $$ {\large z} = {\Large \frac{\frac{m}{z_{n+1}}}{\frac{m}{z_n} - \frac{m}{z_{n+1}}}} $$
    2. Determine the MW of the protein using the relationship between $\frac{m}{z_n}$, $MW$, and $z$
    3. Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1 using: $$ \text{Accuracy} = \frac{|MW_{\text{experiment}} - MW_{\text{theory}}|}{MW_{\text{theory}}} $$
      Figure 1. Mass Spectrum of intact eGFP protein from the Waters Xevo G3 LC-MS (a mass spectrometer with 30,000 resolution) with individual charge state peaks labeled with $\frac{m}{z}$ values.


  3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?
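
The calculations in questions 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference solution. The helper names are hypothetical. The residue table holds standard average residue masses, so the result should be close to, but not necessarily identical with, the Expasy output. No modifications are applied: chromophore maturation, commonly cited as making GFP roughly 20 Da lighter, would have to be subtracted separately. The two peaks are synthetic values generated from an arbitrary 28,000 Da mass, not readings from Figure 1.

```python
# Standard average residue masses in Da (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528   # Da, added once for the free N- and C-termini
PROTON = 1.00728   # Da, charge carrier assumed for positive-mode ESI

# eGFP sequence exactly as given in question 1 (spaces are ignored below).
EGFP = """MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL
VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF
KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY
LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH"""


def average_mw(sequence):
    """Average molecular weight of an unmodified linear peptide chain."""
    seq = "".join(sequence.split()).upper()
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def charge_from_adjacent_peaks(mz_n, mz_n1):
    """Charge of the peak at mz_n, given the adjacent peak mz_n1 < mz_n.

    Uses the formula from question 2.1 and rounds to the nearest integer.
    """
    return round(mz_n1 / (mz_n - mz_n1))


def mw_from_peak(mz, z):
    """Neutral mass from one peak, assuming m/z = (MW + z * PROTON) / z."""
    return z * (mz - PROTON)


def relative_error(mw_experiment, mw_theory):
    """Accuracy as defined in question 2.3 (multiply by 1e6 for ppm)."""
    return abs(mw_experiment - mw_theory) / mw_theory


# Synthetic adjacent peaks for a hypothetical 28,000 Da protein (z = 20, 21).
true_mw = 28000.0
peak_z20 = (true_mw + 20 * PROTON) / 20
peak_z21 = (true_mw + 21 * PROTON) / 21

z = charge_from_adjacent_peaks(peak_z20, peak_z21)
mw = mw_from_peak(peak_z20, z)
print("charge of higher m/z peak:", z)
print("recovered MW (Da):", round(mw, 3))
print("relative error:", relative_error(mw, true_mw))
print("unmodified eGFP average MW (Da):", round(average_mw(EGFP), 1))
```

To use it on the real data, replace `peak_z20` and `peak_z21` with two neighbouring labeled peaks from Figure 1, passing the higher m/z value first, and compare the result against the sequence-based mass.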

Homework: Waters Part II — Secondary/Tertiary structure

Assignees for the following sections
MIT/Harvard students: Optional but highly recommended
Committed Listeners: Optional but highly recommended

We will analyze eGFP in its native, folded state and compare it to its denatured, unfolded state on a quadrupole time-of-flight MS. We will be doing MS-only analysis (no liquid chromatography, also known as “direct infusion” experiments) on the Waters Xevo G3 QTof MS.

  1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?
    Figure 2.  Comparison of the mass spectra between denatured (top) and native (bottom) eGFP standard on the Waters Xevo G3 QTof MS.


  2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 $\frac{m}{z}$? What is the charge state? How can you tell?
    Figure 3.  Native eGFP mass spectrum from the Waters Xevo G3 Q-Tof MS.  The inset is a zoomed-in view of the charge state at ~2800 $\frac{m}{z}$ on a mass spectrometer with 30,000 resolution.
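Whether a charge state can be read directly from a zoomed-in peak depends on whether the instrument resolves the isotope peaks, which sit about 1.00335/z apart on the $\frac{m}{z}$ axis. The sketch below compares that spacing with the peak width implied by the stated 30,000 resolution, assuming the common definition R = (m/z)/FWHM. The function names and the example charge states are illustrative assumptions, not values taken from Figure 3.

```python
ISOTOPE_MASS_STEP = 1.00335  # Da, approximate 13C - 12C mass difference


def peak_width_fwhm(mz, resolution):
    """Peak width (FWHM) implied by R = (m/z) / FWHM."""
    return mz / resolution


def isotope_spacing(z):
    """Spacing between neighbouring isotope peaks for charge z."""
    return ISOTOPE_MASS_STEP / z


def charge_from_spacing(spacing):
    """Charge state inferred from a measured isotope spacing."""
    return round(ISOTOPE_MASS_STEP / spacing)


width = peak_width_fwhm(2800, 30000)   # peak position and resolution from the text
for z in (5, 10, 15):                  # hypothetical charge states to compare
    print(z, round(isotope_spacing(z), 4), round(width, 4))
```

Isotope peaks can only be counted when their spacing is clearly larger than the peak width, so this comparison is a rough guide rather than a definitive answer.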



Homework: Waters Part III — Peptide Mapping - primary structure

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide, generating a “peptide map”. This process is used to confirm the primary structure of the protein.

There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.
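The cleavage rule described above can also be sketched in a few lines of Python, which is handy for sanity-checking the PeptideMass output. This is a simplified illustration under stated assumptions: the function names are hypothetical, the toy sequence is not eGFP, and the rule cuts after every K or R with no missed cleavages and no proline exception, so counts may differ from the tool depending on the parameters chosen there.

```python
import re


def count_k_and_r(sequence):
    """Count lysine (K) and arginine (R) residues in a protein sequence."""
    return sequence.count("K"), sequence.count("R")


def tryptic_digest(sequence):
    """Split a sequence after every K or R (no missed cleavages).

    Simplification: the proline rule and other PeptideMass options are
    ignored here, so counts can differ from the tool's output.
    """
    return re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)


# Hypothetical toy sequence for illustration only (not eGFP).
toy = "ACDKEFGRHIKLM"
print(count_k_and_r(toy))
print(tryptic_digest(toy))
```

Under this simplified rule, the number of peptides equals the number of K and R residues plus one, unless the sequence itself ends in K or R.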

  1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).
  2. How many peptides will be generated from tryptic digestion of eGFP?
    1. Navigate to https://web.expasy.org/peptide_mass/
    2. Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.
    3. Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.
    4. Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.
      Figure 4.  Example conditions for predicting the number of tryptic peptides from the eGFP standard.  Please replicate all parameters shown above.


  3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.
    Figure 5a. Total ion chromatogram (TIC) of the eGFP peptide map. The peak at 2.78 minutes is circled, and its MS data is shown in the mass spectrum in Figure 5b, below.


  4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?
  5. Identify the mass-to-charge ($\frac{m}{z}$) of the peptide shown in Figure 5b. What is the charge ($z$) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ($\small{[M\!\!+\!\!H]^+}$) based on its $\frac{m}{z}$ and $z$.
    Figure 5b. Mass spectrum figure to show $\frac{m}{z}$ for the chromatographic peak at 2.78 min from Figure 5a above. The inset is a zoom-in of the peak at $\frac{m}{z}$ 525.76, to discern the isotope peaks.


    Figure 5c. Fragmentation spectrum of the peptide eluting at retention time 2.78 minutes in Figure 5a (above).


  6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is the mass accuracy of the measurement? Please calculate the error in ppm. (Recall that $ \text{Accuracy} = \frac{|MW_{\text{experiment}} - MW_{\text{theory}}|}{MW_{\text{theory}}} $; multiply by $10^6$ to express it in ppm.)
  7. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)
    Figure 6.  Amino Acid Coverage Map of eGFP based on BioAccord LC-MS peptide identification data.
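The arithmetic behind questions 5 and 6 can be sketched as below. The $\frac{m}{z}$ value of 525.76 comes from the Figure 5b caption; the isotope spacing of 0.5 is a hypothetical placeholder to be replaced with the spacing read from the inset, and the function names are illustrative. The proton mass of 1.00728 Da is a standard constant.

```python
PROTON_MASS = 1.00728  # Da


def charge_from_isotope_spacing(spacing):
    """Isotope peaks are about 1/z apart on the m/z axis."""
    return round(1.0 / spacing)


def singly_charged_mass(mz, z):
    """[M+H]+ from an [M+zH]z+ ion: z * m/z - (z - 1) * proton mass."""
    return z * mz - (z - 1) * PROTON_MASS


def ppm_error(mass_experiment, mass_theory):
    """|experiment - theory| / theory, expressed in ppm."""
    return abs(mass_experiment - mass_theory) / mass_theory * 1e6


mz_observed = 525.76     # m/z quoted in the Figure 5b caption
assumed_spacing = 0.5    # hypothetical isotope spacing; read yours from the inset
z = charge_from_isotope_spacing(assumed_spacing)
mh = singly_charged_mass(mz_observed, z)
print(z, round(mh, 4))
```

The resulting $\small{[M\!\!+\!\!H]^+}$ value can then be passed to `ppm_error` together with the matching theoretical mass from the PeptideMass list.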



Bonus Peptide Map Questions

  1. Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this online tool to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html.) What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?
  2. Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.

Homework: Waters Part IV — Oligomers

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):

  • 7FU Decamer
  • 8FU Didecamer
  • 8FU 3-Decamer
  • 8FU 4-Decamer
| Polypeptide Subunit Name | Subunit Mass |
| --- | --- |
| 7FU | 340 kDa |
| 8FU | 400 kDa |

Table 1: KLH Subunit Masses
Figure 7.  Mass spectrum of Keyhole Limpet Hemocyanin (KLH) acquired on the CDMS.
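The expected positions of the four oligomers follow from the Table 1 subunit masses by simple multiplication. The sketch below assumes that a decamer contains 10 subunits and that a didecamer, 3-decamer, and 4-decamer are 2, 3, and 4 stacked decamers; the function and variable names are hypothetical.

```python
SUBUNIT_MASS_KDA = {"7FU": 340, "8FU": 400}   # from Table 1
SUBUNITS_PER_DECAMER = 10


def oligomer_mass_mda(subunit, n_decamers):
    """Expected mass in MDa of an oligomer built from n stacked decamers."""
    n_subunits = SUBUNITS_PER_DECAMER * n_decamers
    return SUBUNIT_MASS_KDA[subunit] * n_subunits / 1000


species = {
    "7FU Decamer": ("7FU", 1),
    "8FU Didecamer": ("8FU", 2),
    "8FU 3-Decamer": ("8FU", 3),
    "8FU 4-Decamer": ("8FU", 4),
}
expected = {name: oligomer_mass_mda(*spec) for name, spec in species.items()}
for name, mass in expected.items():
    print(f"{name}: {mass} MDa")
```

These are nominal masses only; measured CDMS peaks may be shifted by bound ions, glycosylation, or solvent.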



Homework: Waters Part V — Did I make GFP?

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.

| | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error |
| --- | --- | --- | --- |
| Molecular weight (kDa) | | | |

Resources

HTGAA - Week 11: Bioproduction & Cloud Labs


Cloud laboratories are making science accessible, affordable, and reproducible. Our aim this semester is to showcase how they can enable human creativity at scale, and how they provide a platform for collaboration and community.

How To Grow (Almost) Anything is about synthetic biology, bioengineering, robotics, automation, art, and AI. But it is also about friendship, shared purpose, and the freedom to build beyond what we know and to be inspired by what can be. To that end, the goal with this cloud lab unit and homework assignment is to inspire collaboration and creativity while designing a scientifically rigorous cell-free fluorescent protein optimization experiment together.


Lecture (Tues, Apr 14)

Bioproduction & Cloud Labs
(▶️Recording)
Reshma Shetty

Recitation (Wed, Apr 15)

Cloud laboratories
(▶️Recording | 💻Slides)
Ronan Donovan



My Homework

WEEK 11 - X

This week focuses on bioproduction and cloud laboratories.


Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

1. Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST.

2. Make a note on your HTGAA webpages including:


My contribution to the community bioart project

The canvas changed many times during the allocated time period; sometimes you could see some defined shapes, and at other times it just looked like a colorful mess. In the early stages, I contributed to a wide patch of “blue” pixels made of Electra2, trying to make the figures that were already there a little more defined without erasing other people’s work. In the end, much of that blue patch was replaced by other patterns and little remained, but part of it was preserved, and blue pixels with the Electra2 configuration can still be found scattered on both the right and left sides of the canvas. I also helped with some of the yellow mKO2 pixels in the top left corner of the piece with the “2026” design; specifically, the yellow ones surrounding and filling the first “2” and the “0” have my name.

I did contribute to pixels of other colors, but they were a minority compared to the Electra2 and mKO2 ones, and by now they are mostly gone, so most of my record is plastered with blue Electra2 pixels.


What I liked about the project

I personally enjoyed the “being part of something” aspect of this initiative; it was nice to include us all in a group project! :D


What about this collaborative art experiment could be made better for next year

Definitely the teamwork logistics! I know we had Discourse for this project, but it still proved insufficient for clear organization, since everyone had their own agenda for the design. My Node even mentioned that the lab logo they had worked very hard on was removed SEVERAL times, which was very sad. Next time, maybe add a rule asking for an initial design idea that we can all agree on before starting the actual thing, or let everyone submit a version, vote for the most loved one, and then recreate it together on the final canvas using it as a blueprint.


Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

E. coli Lysate
BL21 (DE3) Star Lysate (includes T7 RNA Polymerase): Provides the complete translational machinery (ribosomes, tRNAs, aminoacyl-tRNA synthetases, elongation factors, chaperones) and the T7 RNA polymerase required to transcribe the DNA template into mRNA, enabling coupled transcription-translation in a single reaction.
Salts/Buffer
Potassium Glutamate: Acts as the primary counter-ion that stabilizes macromolecular interactions and optimizes ribosome function; glutamate is preferred over chloride as it is less inhibitory to translation.
HEPES-KOH pH 7.5: Maintains a stable physiological pH throughout the reaction, preventing acidification that could denature enzymes or inhibit translation.
Magnesium Glutamate: Provides Mg²⁺ as an essential cofactor for ribosome assembly, tRNA binding to the ribosome, and polymerase activity; its concentration is critical for translation efficiency.
Potassium phosphate monobasic/dibasic: Contributes to buffering capacity and provides phosphate ions that serve as building bloc
Energy / Nucleotide System
Ribose: Serves as a carbon source and precursor for de novo nucleotide synthesis through the pentose phosphate pathway, enabling sustained NTP regeneration in long reactions.
Glucose: Primary energy substrate that fuels glycolytic and oxidative phosphorylation pathways within the lysate to regenerate ATP and other NTPs.
AMP, CMP, GMP, UMP: Nucleoside monophosphates that act as substrates for cellular kinases in the lysate, which phosphorylate them to regenerate the corresponding NTPs (ATP, CTP, GTP, UTP) required for transcription.
Guanine: Free base that can be salvaged and converted to GMP via phosphoribosyltransferases and kinases present in the lysate, providing an alternative route for GTP regeneration.
Translation Mix (Amino Acids)
17 Amino Acid Mix: Provides the standard set of proteinogenic amino acids (excluding tyrosine and cysteine) as substrates for protein synthesis by the ribosome.
Tyrosine: Supplemented separately due to its poor solubility at neutral pH; it is essential for proteins containing tyrosine residues and for proper folding of certain fluorescent proteins.
Cysteine: Added separately because it is chemically unstable and prone to oxidation; it is critical for disulfide bond formation and the structural integrity of many proteins.
Additives
Nicotinamide: Serves as a precursor for NAD⁺ biosynthesis, supporting redox balance and energy metabolism within the lysate during extended incubations.
Backfill
Nuclease Free Water: Used to bring the reaction to the final volume without introducing RNases or DNases that would degrade the DNA template or synthesized mRNA.

2. Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

The 1-hour PEP-NTP mix is optimized for immediate high-yield protein production by supplying pre-formed NTPs (ATP, GTP, CTP, UTP) and phosphoenolpyruvate (PEP) as a direct energy source, enabling rapid transcription and translation but exhausting its energy reserves quickly. In contrast, the 20-hour NMP-Ribose-Glucose mix uses a regenerative strategy where simple precursors (nucleoside monophosphates + ribose + glucose) are converted into NTPs over time by the endogenous metabolic enzymes remaining in the E. coli lysate, creating a slower but sustainable energy and nucleotide supply that supports protein synthesis for extended durations.


3. Bonus question: How can transcription occur if GMP is not included but Guanine is?

Transcription can proceed because the E. coli lysate contains phosphoribosyltransferases (such as hypoxanthine-guanine phosphoribosyltransferase, HGPRT) and nucleoside/nucleotide kinases that can salvage the free guanine base. These enzymes convert guanine into GMP by transferring a phosphoribosyl group from phosphoribosyl pyrophosphate (PRPP), which is generated from ribose-5-phosphate (derived from the supplied ribose via the pentose phosphate pathway). Subsequently, cellular kinases phosphorylate GMP to GDP and then to GTP, making it available for T7 RNA polymerase during transcription. This salvage pathway demonstrates the metabolic versatility of the crude lysate system.


Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each).

1. sfGFP
Property: Fast folding kinetics and high solubility with rapid maturation. sfGFP was engineered with six mutations (S30R, Y39N, N105T, Y145F, I171V, A206V) that enhance β-barrel folding efficiency and resistance to chemical denaturants. Its chromophore (derived from Thr65-Tyr66-Gly67, since sfGFP carries the S65T substitution) matures quickly once the protein folds, making it highly reliable in cell-free systems where rapid signal development is desired.
Effect on cell-free expression: The fast folding minimizes aggregation in the lysate environment, but chromophore formation still requires molecular oxygen. In sealed 384-well plates used for long incubations, oxygen depletion can become limiting, potentially slowing maturation despite fast folding.
2. mRFP1
Property: Slow maturation with a characteristic half-time (t₅₀) of ~22-24 minutes at 37°C and strict dependence on molecular oxygen for chromophore oxidation. The chromophore forms from the tripeptide Gln66-Tyr67-Gly68 through cyclization and oxidation steps, with oxidation being rate-limiting.
Effect on cell-free expression: The slow maturation creates a significant delay between protein synthesis and fluorescence detection. In cell-free reactions where oxygen is rapidly consumed by metabolic processes, mRFP1 maturation can be further delayed, making it challenging to achieve maximum fluorescence during short incubations.
3. mKO2
Property: Moderate acid sensitivity (pKa ~5.5-6.0) and strong dependence on oxygen tension for chromophore maturation. The chromophore requires a second oxidation step to form an acylimine linkage followed by spontaneous formation of a thiazole ring from Cys65. mKO2 exhibits slower oxidation kinetics compared to green FPs like mAG, with a pO₂·50 (oxygen tension for 50% fluorescence reduction) of 0.9%.
Effect on cell-free expression: The combination of acid sensitivity and high oxygen demand makes mKO2 vulnerable to pH drift and hypoxic conditions in extended cell-free reactions. Acidification of the lysate over time can protonate the chromophore, reducing fluorescence quantum yield.
4. mTurquoise2
Property: Exceptionally high quantum yield (QY = 0.93) and rapid maturation, but exhibits “complex maturation” kinetics where the rate progressively changes over time rather than following simple first-order kinetics. The I146F mutation improves chromophore packing through enhanced van der Waals interactions, increasing brightness but potentially making folding more sensitive to ionic conditions.
Effect on cell-free expression: While its high QY provides excellent signal-to-noise ratio, the complex maturation kinetics suggest that mTurquoise2 may be sensitive to fluctuations in Mg²⁺ and K⁺ concentrations that affect ribosome-associated chaperones during extended synthesis.
5. mScarlet-I
Property: Rapidly maturing variant of mScarlet with a T74I mutation that accelerates cellular maturation significantly (t₅₀ ~26 min at 37°C) at the cost of reduced quantum yield (0.54) and fluorescence lifetime (3.1 ns). It maintains high brightness and monomeric structure with moderate acid sensitivity (pKa ~5.3).
Effect on cell-free expression: The fast maturation makes mScarlet-I ideal for time-course experiments, but like all red FPs, it requires oxygen for chromophore formation. The T74I mutation may alter the protein’s interaction with endogenous chaperones in the lysate, potentially affecting folding efficiency in crude extracts.
6. Electra2
Property: Blue fluorescent protein derived from mRuby3 with a TagBFP-like chromophore that forms via a non-fluorescent precursor in a rate-limiting step. Electra2 exhibits photoactivation behavior where UV illumination can accelerate chromophore formation from the non-fluorescent intermediate, suggesting slow spontaneous maturation kinetics. It is 2.1× brighter than mTagBFP2 but shows aggregation tendencies in some cellular contexts.
Effect on cell-free expression: The rate-limiting chromophore formation from the non-fluorescent precursor means that Electra2 may accumulate as non-fluorescent protein during the initial hours of cell-free synthesis. The aggregation tendency suggests sensitivity to the reducing conditions and macromolecular crowding in the lysate.

The amino acid sequences are shown in the HTGAA Cell-Free Benchling folder.
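To give the maturation half-times quoted above a more concrete meaning, the sketch below evaluates a single-step first-order maturation model, in which the fluorescent fraction of already-folded protein after time t is 1 − 2^(−t/t₅₀). This is an illustrative assumption only: it ignores synthesis time, oxygen limitation, and the complex kinetics noted for mTurquoise2, and the 23-minute value for mRFP1 is simply the midpoint of the ~22-24 minute range given above. The function name is hypothetical.

```python
def fraction_matured(t_min, t50_min):
    """Fraction of already-folded protein that is fluorescent after t minutes,
    assuming single-step first-order maturation with half-time t50."""
    return 1.0 - 2.0 ** (-t_min / t50_min)


# Half-times quoted in the text above (minutes at 37 °C).
half_times = {"mRFP1": 23.0, "mScarlet-I": 26.0}

for name, t50 in half_times.items():
    row = [round(fraction_matured(t, t50), 3) for t in (30, 60, 120)]
    print(name, row)
```

Under this model, maturation half-times of tens of minutes matter mostly for short reads and become a minor factor over a 36-hour incubation, unless oxygen becomes limiting.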


2. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

Hypothesis for 36-Hour Incubation Optimization
Protein: mKO2
Identified Property: Strong oxygen dependence for chromophore maturation (pO₂·50 = 0.9%) and moderate acid sensitivity (pKa ~5.5-6.0).
Reagent Adjustment: Increase Potassium phosphate buffer concentration and supplement with additional Magnesium Glutamate.
Expected Effect: The additional phosphate buffer (increasing from 5.6 mM to ~8-10 mM total phosphate) will enhance pH buffering capacity over 36 hours, preventing acidification that would protonate the mKO2 chromophore and reduce fluorescence. The increased Mg²⁺ (from 7.0 mM to ~8-9 mM) will support sustained activity of metabolic enzymes in the lysate that regenerate ATP and maintain oxygen-consuming pathways, ensuring adequate oxygen availability for the second oxidation step required for mKO2 chromophore maturation. This should increase total accumulated fluorescence by maintaining optimal pH and energy metabolism throughout the extended incubation.
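The pH part of this hypothesis can be illustrated with the Henderson-Hasselbalch relation. The sketch below assumes a single titratable chromophore site whose deprotonated form is the fluorescent one, so the fluorescent fraction is 1 / (1 + 10^(pKa − pH)); the pKa values are the mKO2 range quoted above, while the pH values are hypothetical points along a possible acidification drift, not measurements.

```python
def fraction_deprotonated(ph, pka):
    """Fraction of chromophore in the deprotonated (assumed fluorescent) state
    for a single titratable site: 1 / (1 + 10^(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


for pka in (5.5, 6.0):               # mKO2 pKa range quoted above
    for ph in (7.5, 7.0, 6.5, 6.0):  # hypothetical pH drift during incubation
        print(pka, ph, round(fraction_deprotonated(ph, pka), 3))
```

In this simple model the signal loss is small while the pH stays well above the pKa and grows steeply as the pH approaches it, which is the regime the extra buffering is meant to avoid.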

3. The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here.

Important

In order to be eligible for this, make sure that your final project slide is in the “2026 Committed Listener ONE FINAL PROJECT IDEA” slide deck.

Optimized 36-Hour Artwork Master Mix Composition

Based on the progression from the 1-hour PEP/NTP mix (immediate energy) to the 20-hour NMP-Ribose mix (regenerative energy), the 36-hour Artwork mix must provide sustainable, cost-effective production with enhanced buffering and metabolic stability. The following composition optimizes for long-duration fluorescence maintenance:

| Component | 36-Hour Artwork Concentration | Rationale |
| --- | --- | --- |
| Potassium Glutamate | 300 mM | Slightly reduced from 20-hour mix to balance ionic strength while maintaining ribosome stability |
| HEPES-KOH pH 7.5 | 50 mM | Increased from 45 mM to provide additional pH buffering over 36 hours, preventing acidification that affects mKO2 and mRFP1 |
| Magnesium Glutamate | 8.0 mM | Increased from 7.0 mM to support sustained kinase activity for NTP regeneration and chaperone function |
| Potassium phosphate (1.6:1 dibasic:monobasic) | 8.0 mM | Increased from 5.6 mM to enhance buffering capacity and provide phosphate for energy metabolism |
| Potassium phosphate (1.6:1 monobasic:dibasic) | 8.0 mM | Maintains the dual phosphate system for robust pH control |
| Ribose | 80 mM (12.0 g/L) | Slightly increased from 77.4 mM to ensure sustained PRPP generation for nucleotide salvage and NTP regeneration |
| Glucose | 8.0 mM (1.44 g/L) | Increased from 6.9 mM to fuel extended glycolytic ATP regeneration without causing excessive acidification |
| AMP | 600 μM (0.6 mM) | Maintained from 20-hour mix; sufficient for adenylate kinase-mediated ATP regeneration |
| CMP | 400 μM (0.4 mM) | Maintained from 20-hour mix |
| GMP | 0 μM | Maintained; guanine salvage provides alternative GTP source |
| UMP | 400 μM (0.4 mM) | Maintained from 20-hour mix |
| Guanine | 200 μM (0.2 mM) | Maintained; salvaged to GMP via HGPRT in lysate |
| 17 Amino Acid Mix | 4.0 mM | Slightly reduced from 4.1 mM to minimize osmotic stress while maintaining synthesis capacity |
| Tyrosine pH 12 | 4.0 mM | Maintained; critical for sfGFP and mTurquoise2 maturation |
| Cysteine | 4.0 mM | Maintained; essential for mKO2 thiazole ring formation and disulfide stability |
| Nicotinamide | 4.0 mM | Increased from 3.1 mM to enhance NAD⁺ regeneration for redox balance and extended metabolic activity |

Converting 80 mM (millimolar) ribose to g/L, using the molar mass of ribose and unit conversion:

  1. The molar mass of ribose (C₅H₁₀O₅) is 150.13 g/mol
  2. Convert millimolar (mM) to (g/L):
  • 1 mM = 1 mmol/L = 0.001 mol/L
  • 80 mM = 80×10^−3 mol/L = 0.08 mol/L
  • g/L = 0.08 mol/L × 150.13 g/mol = 12.0104 g/L

Converting 8 mM (millimolar) glucose to g/L, using the molar mass of glucose and unit conversion:

  1. The molar mass of glucose (C₆H₁₂O₆) is 180.156 g/mol
  2. Convert millimolar (mM) to molar (M):
  • 8 mM = 8×10^−3 mol/L
  3. Multiply by molar mass to get grams per liter:
  • (8×10^−3 mol/L) × 180.156 g/mol = 1.44125 g/L

600 µM of AMP to mM

  • 1 mM = 1000 µM
  • 600 μM = 600/1000 mM = 0.6 mM
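The same conversions can be checked with a short script. This is a minimal sketch using the molar masses stated above; the function names are hypothetical helpers introduced only for illustration.

```python
MOLAR_MASS_G_PER_MOL = {"ribose": 150.13, "glucose": 180.156}  # values used above


def mm_to_g_per_l(conc_mm, molar_mass):
    """mM -> g/L: (mmol/L) * (g/mol) / 1000."""
    return conc_mm * molar_mass / 1000.0


def um_to_mm(conc_um):
    """µM -> mM (1 mM = 1000 µM)."""
    return conc_um / 1000.0


print(round(mm_to_g_per_l(80, MOLAR_MASS_G_PER_MOL["ribose"]), 4))
print(round(mm_to_g_per_l(8, MOLAR_MASS_G_PER_MOL["glucose"]), 5))
print(um_to_mm(600))
```

Keeping the molar masses in one dictionary makes it easy to add other components if their concentrations also need to be reported in g/L.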

Reaction composition per well

  • 6 μL BL21 (DE3) Star Lysate
  • 10 μL 2X Optimized 36-Hour Artwork Master Mix (concentrations above are 1X; 2X stocks double these values)
  • 2 μL Assigned fluorescent protein DNA template
  • 2 μL Custom reagent supplements (additional phosphate or Mg²⁺ for mKO2/mRFP1 wells)

Total: 20 μL reaction


4. The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (date TBD!).

The reaction composition for each well will be as follows:

  • 6 μL of Lysate
  • 10 μL of 2X Optimized Master Mix from above
  • 2 μL of assigned fluorescent protein DNA template
  • 2 μL of your custom reagent supplements

Total: 20 μL reaction


Final optimized 8 wells: Master Mix combos


Reagent Supplement JSON

[
  {
    "quadrant": "Q4",
    "well_label": "A6",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  },
  {
    "quadrant": "Q1",
    "well_label": "B19",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  },
  {
    "quadrant": "Q1",
    "well_label": "B2",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  },
  {
    "quadrant": "Q4",
    "well_label": "D14",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  },
  {
    "quadrant": "Q3",
    "well_label": "F12",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  },
  {
    "quadrant": "Q3",
    "well_label": "G21",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  },
  {
    "quadrant": "Q2",
    "well_label": "I22",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  },
  {
    "quadrant": "Q2",
    "well_label": "N23",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1325
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 50
      },
      {
        "id": "hepes_koh",
        "supplemental_volume_nl": 100
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "guanine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 25
      },
      {
        "id": "potassium_phosphate_dibasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "potassium_phosphate_monobasic",
        "supplemental_volume_nl": 100
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  }
]
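
As a quick sanity check on the listing above, the nine supplement volumes in each well shown here add up to 2000 nL. The sketch below recomputes that total for one well (I22) copied from the list. The helper names `total_supplement_nl` and `wells_off_target` are hypothetical, and treating 2000 nL as the intended per-well total is an assumption drawn only from these values, not from the assignment text.

```python
import json

# One well entry copied from the list above (Q2, well I22).
WELL_JSON = """
{
  "quadrant": "Q2",
  "well_label": "I22",
  "supplements": [
    {"id": "nuclease_free_water", "supplemental_volume_nl": 1325},
    {"id": "magnesium_glutamate", "supplemental_volume_nl": 50},
    {"id": "hepes_koh", "supplemental_volume_nl": 100},
    {"id": "ribose", "supplemental_volume_nl": 75},
    {"id": "guanine", "supplemental_volume_nl": 50},
    {"id": "glucose", "supplemental_volume_nl": 25},
    {"id": "potassium_phosphate_dibasic", "supplemental_volume_nl": 100},
    {"id": "potassium_phosphate_monobasic", "supplemental_volume_nl": 100},
    {"id": "nicotinamide", "supplemental_volume_nl": 175}
  ]
}
"""


def total_supplement_nl(well):
    """Sum the supplemental volumes (in nL) listed for one well."""
    return sum(s["supplemental_volume_nl"] for s in well["supplements"])


def wells_off_target(wells, expected_nl):
    """Return the labels of wells whose total differs from expected_nl."""
    return [w["well_label"] for w in wells if total_supplement_nl(w) != expected_nl]


well = json.loads(WELL_JSON)
total = total_supplement_nl(well)
print(well["well_label"], total, "nL")

# Assumption: 2000 nL is the per-well target, inferred from the data only.
print("wells off target:", wells_off_target([well], expected_nl=2000))
```

Running the same check over the full list would flag any well where a volume was mistyped.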

Master Mix contributions: Canvas final look


Part D: Build-A-Cloud-Lab | (optional) Bonus Assignment

Assignees for the following sections
MIT/Harvard students: Optional
Committed Listeners: Optional
Ginkgo Nebula Cloud Laboratory Rendering, 2025


  1. Use this simulation tool to create an interesting looking cloud lab out of the Ginkgo Reconfigurable Automation Carts. This is just a minimal implementation so far, but I would love to see some fun designs!
    Tip

    Note from Ronan: If you are interested in helping me build out future HTGAA cloud lab software, please fill out this form!


Resources

Reading:

Common Nebula protocols & their parameters

Generic_atc_run_protocol
{
    "bs_shake": false,
    "storage_rac": "ambistore-1",
    "hig_pre_spin": false,
    "hig_post_spin": false,
    "storage_stacker": "10-position",
    "atc_sample_volume": 10,
    "bs_model": "3000",
    "bs_speed": 1500,
    "bs_duration": 30,
    "hig_pre_g_force": 1500,
    "hig_pre_spin_time": 0,
    "hig_post_g_force": 1500,
    "hig_post_spin_time": 0,
    "atc_block_format": 96
}
Generic_bravo_stamp
{
    "pl_dest_seal": false,
    "bs_dest_shake": false,
    "hig_dest_spin": false,
    "pl_source_seal": false,
    "bs_source_shake": false,
    "hig_source_spin": false,
    "xpeel_dest_peel": false,
    "bravo_asp_height": 2,
    "dest_storage_rac": "ambistore-1",
    "bravo_disp_height": 2,
    "bravo_head_format": 96,
    "xpeel_source_peel": false,
    "bravo_liquid_class": "Aqueous",
    "source_storage_rac": "ambistore-1",
    "dest_storage_stacker": "10-position",
    "bravo_dest_mix_cycles": 0,
    "bravo_dest_mix_volume": 0,
    "source_storage_stacker": "10-position",
    "bravo_source_mix_cycles": 0,
    "bravo_source_mix_volume": 0,
    "trash_submodule_type_name": "trash-1",
    "bravo_dest_mix_liquid_class": "Gentle",
    "bravo_source_mix_liquid_class": "Gentle",
    "hig_source_g_force": 1500,
    "hig_source_spin_time": 0,
    "pl_source_seal_temp": 166,
    "pl_source_seal_time": 2.5,
    "pl_source_seal_type": "alu-1",
    "hig_dest_g_force": 1500,
    "hig_dest_spin_time": 0,
    "pl_dest_seal_temp": 166,
    "pl_dest_seal_time": 2.5,
    "pl_dest_seal_type": "alu-1",
    "bs_dest_model": "3000",
    "bs_dest_speed": 200,
    "bs_dest_duration": 0,
    "bs_source_model": "3000",
    "bs_source_speed": 200,
    "bs_source_duration": 0
}
Generic_cytomat_incubate
{
    "cytomat_stacker": "7-position",
    "store_payload_after_incubation": false,
    "storage_rac": "ambistore-1",
    "storage_stacker": "10-position"
}
Generic_echo_hitpick
{
    "pl_dest_seal": false,
    "bs_dest_shake": false,
    "hig_dest_spin": false,
    "pl_source_seal": false,
    "source_centric": false,
    "bs_source_shake": false,
    "hig_source_spin": false,
    "xpeel_dest_peel": false,
    "dest_storage_rac": "ambistore-1",
    "xpeel_source_peel": false,
    "source_storage_rac": "ambistore-1",
    "dest_storage_stacker": "10-position",
    "source_storage_stacker": "10-position",
    "echo_submodule_type_name": "echo",
    "bs_source_model": "3000",
    "bs_source_speed": 1500,
    "bs_source_duration": 0,
    "hig_source_g_force": 1500,
    "hig_source_spin_time": 0,
    "bs_dest_model": "3000",
    "bs_dest_speed": 1500,
    "bs_dest_duration": 0,
    "hig_dest_g_force": 1500,
    "hig_dest_spin_time": 0,
    "pl_source_seal_type": "alu-1",
    "pl_dest_seal_type": "alu-1",
    "echo_source_liquid_type": "AQ_BP",
    "echo_transfer_information": [
        {
            "plate_map": {},
            "source_payload_id": "placeholder-plate-id",
            "source_liquid_type": "placeholder-liquid-type",
            "source_payload_type": "placeholder-plate-type",
            "destination_payload_id": "placeholder-plate-id",
            "destination_payload_type": "placeholder-plate-type"
        }
    ],
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5
}
Generic_hig_centrifuge
{
    "hig_g_force": 1500,
    "storage_rac": "ambistore-1",
    "hig_spin_time": 60,
    "storage_stacker": "10-position",
    "hig_spin_two_payloads": false
}
Generic_floi8_cherry_pick
{
    "pl_dest_seal": false,
    "bs_dest_shake": false,
    "hig_dest_spin": false,
    "pl_source_seal": false,
    "source_centric": true,
    "bs_source_shake": false,
    "hig_source_spin": false,
    "xpeel_dest_peel": false,
    "dest_storage_rac": "ambistore-1",
    "xpeel_source_peel": false,
    "floi8_request_tips": true,
    "source_storage_rac": "ambistore-1",
    "dest_storage_stacker": "10-position",
    "floi8_tip_preferences": [
	"f50"
    ],
    "source_storage_stacker": "10-position",
    "floi8_cherry_pick_plans": [
        {
            "well_transfers": {},
            "dest_payload_id": "placeholder-plate-id",
            "dest_payload_type": "placeholder-plate-type",
            "pipetting_profile": "placeholder-liquid-type",
            "source_payload_id": "placeholder-plate-id",
            "source_payload_type": "placeholder-plate-type"
        }
    ],
    "hig_source_g_force": 250,
    "hig_source_spin_time": 0,
    "bs_source_model": "3000",
    "bs_source_speed": 200,
    "bs_source_duration": 0,
    "floi8_source_delid_relid": false,
    "hig_dest_g_force": 250,
    "hig_dest_spin_time": 0,
    "bs_dest_model": "3000",
    "bs_dest_speed": 200,
    "bs_dest_duration": 0,
    "floi8_dest_delid_relid": false,
    "pl_source_seal_temp": 166,
    "pl_source_seal_time": 2.5,
    "pl_source_seal_type": "alu-1",
    "pl_dest_seal_temp": 166,
    "pl_dest_seal_time": 2.5,
    "pl_dest_seal_type": "alu-1"
}
generic_multiflo_dispense
{
    "bs_shake": false,
    "hig_spin": false,
    "mf_shake": false,
    "xpeel_peel": false,
    "storage_rac": "ambistore-1",
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5,
    "pl_seal_type": "alu-1",
    "storage_stacker": "10-position",
    "mf_dispense_type": "peripump",
    "mf_source_content": "reagent",
    "mf_dispense_volume": 25,
    "mf_cols_to_dispense": [],
    "mf_num_pre_dispenses": 2,
    "mf_pre_dispense_volume": 100,
    "mf_submodule_type_name": "multiflo-1",
    "mf_soak_duration": 0,
    "mf_shake_duration": 0,
    "mf_shake_intensity": "medium",
    "hig_g_force": 250,
    "hig_spin_time": 0,
    "bs_model": "3000",
    "bs_speed": 1500,
    "bs_duration": 0,
    "mf_flow_rate": "med",
    "mf_pump_or_syringe_name": "primary"
}
generic_spark_read
{
    "pl_seal": false,
    "bs_shake": false,
    "hig_spin": false,
    "xpeel_peel": false,
    "storage_rac": "ambistore-1",
    "storage_stacker": "10-position",
    "spark_delid_relid": false,
    "spark_protocol_duration": 60,
    "spark_run_custom_protocol": false,
    "bs_model": "3000",
    "bs_speed": 200,
    "bs_duration": 0,
    "hig_g_force": 250,
    "hig_spin_time": 0,
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5,
    "pl_seal_type": "alu-1",
    "spark_protocol_steps": []
}
generic_pherastar_read
{
    "pl_seal": true,
    "bs_shake": false,
    "hig_spin": false,
    "xpeel_peel": true,
    "storage_rac": "ambistore-1",
    "storage_stacker": "10-position",
    "ps_protocol_duration": 90,
    "bs_model": "3000",
    "bs_speed": 1500,
    "bs_duration": 0,
    "hig_g_force": 1500,
    "hig_spin_time": 0,
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5,
    "pl_seal_type": "alu-1"
}
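
The parameter blocks above read as sets of default values. One way to work with them when drafting a protocol is to keep a block as the baseline and change only the fields a given step needs. The sketch below does this for the Generic_hig_centrifuge block. The `with_overrides` helper is hypothetical and is not part of any Nebula software. The new values are illustrative, and no units are assumed because the listing does not state them.

```python
import json

# Defaults copied from the Generic_hig_centrifuge block above.
HIG_CENTRIFUGE_DEFAULTS = json.loads("""
{
    "hig_g_force": 1500,
    "storage_rac": "ambistore-1",
    "hig_spin_time": 60,
    "storage_stacker": "10-position",
    "hig_spin_two_payloads": false
}
""")


def with_overrides(defaults, **overrides):
    """Return a copy of defaults with selected fields replaced.

    Hypothetical helper: it rejects unknown keys and values whose type
    differs from the default, so typos fail early instead of silently
    adding a new field.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    for key, value in overrides.items():
        if type(value) is not type(defaults[key]):
            raise TypeError(f"{key} expects {type(defaults[key]).__name__}")
    return {**defaults, **overrides}


# Illustrative values only.
params = with_overrides(HIG_CENTRIFUGE_DEFAULTS, hig_g_force=500, hig_spin_time=120)
print(json.dumps(params, indent=4))
```

The strict type check is a design choice for this sketch. A looser version could accept an integer where the default is a float, as in the `pl_seal_time` fields.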

HTGAA - Week 12: Building Genomes


Lecture (Tues, Apr 21)

Building Genomes
(▶️Recording)
George Church, John Glass, Jef Boeke

Recitation (Wed, Apr 22)

CRISPR-based Metabolic Engineering
(▶️Recording | 💻Slides)
Ice Kiattisewee



My Homework

WEEK 12 - X

This week focuses on designing, synthesizing, and editing whole genomes, from minimal cells to refactored microbes and synthetic chromosomes.

Important

Be sure you’ve seen the updated week 11 homework which is due at the start of the April 28 lecture.

Tip

Continue making progress this week on your Individual Final Project and on DNA orders (due Friday midnight ET).


Resources

HTGAA - Week 13: Biodesign & Engineered Living Materials



My Homework

WEEK 13 - X

This week covers designing, programming, and fabricating engineered living materials — such as self-healing concretes, adaptive biofilms, and responsive biomaterials — by integrating genetic circuit design, materials science, and bioprocess engineering.

Subsections of Node - Lifefabs Institute

Get to know the Lifefabs Institute


About Lifefabs

The Lifefabs Institute is an interdisciplinary biofabrication research institute based in London, United Kingdom, dedicated to advancing impactful and inclusive innovation in the life sciences. Founded in 2024, Lifefabs provides an open-access, collaborative laboratory environment where scientists, engineers, creatives, and community members can design, prototype, and translate biological ideas into real-world applications.

As part of the global network of How to Grow (Almost) Anything (HTGAA), Lifefabs serves as a Global Node, supporting our learning journey as Committed Listeners. Their team plays a key mentorship role, reviewing assignments, providing technical guidance, and acting as teaching assistants throughout the course.

Beyond mentorship, Lifefabs enriches the HTGAA experience by offering additional educational sessions and hands-on laboratory opportunities aligned with the course topics. These include workshops, experimental demonstrations, and lab-based activities designed to strengthen practical skills in synthetic biology and biofabrication. Participants can join remotely or, if they are based in the UK, in person, which supports both accessibility and real-world laboratory experience.

At its core, Lifefabs emphasizes collaborative learning, community-driven research, and the democratization of biotechnology. By bridging academic knowledge with hands-on practice, the institute empowers students and researchers to actively participate in shaping the future of bioengineering and sustainable innovation.

Find more on the official Lifefabs website: Here


Meet the team! The committed TAs for this 2026 HTGAA chapter:

Lifefabs Institute - Activities & Itinerary


Official Node weekly meetings

Lifefabs Node Homework Session: every Thursday, 21:00-22:00 UK Time
Zoom Link: https://ucl.zoom.us/j/4101967739


Lifefabs supplementary labs schedule


Subsections of Labs

Week 1 Lab: Pipetting


Projects

Throughout the term each student defines and executes an Individual Final Project and then presents their work before the class as a culmination of their semester. This applies to all students including local for-credit MIT/Harvard students as well as the Global “Committed Listeners” who present their projects on Zoom to the Course Instructors, Lecturers and Teaching Staff (note that this presentation is one of the requirements for Committed Listeners to earn a Certificate of Completion for the course).

Info

May 12, 2026: MIT / Harvard Individual Final Project Presentations (~3 Hours)
May 13, 2026: Global Committed Listener Individual Final Project Presentations (~9-12 Hours)

In addition, all students have the opportunity to contribute to the Group Final Project, a collaborative effort towards a significant research result which runs through the term and sometimes beyond.


  • The three FP ideas requested by HTGAA

    • Dual-Signal Biosensor for Functional Radiation Dosimetry
    • Smart Bacteria for Targeted Oil Spill Cleanup
    • Self-Destructing Antimicrobial Biofilms
  • Self-Destructing Antimicrobial Biofilms

    • Bacillus subtilis for AMR and Space Science applications
    • Antimicrobial market and new necessities
    • Sequential Logic Circuit and genetic constructs
    • Experimental lab protocol


Switch view to Homework

Subsections of Projects

HTGAA Final Project Ideas

Individual Final Project Guidelines

Important Dates

Feb 25, 2026: Share 3 Individual Final Project ideas (1 slide each, in Google slide deck to be provided)
Mar 18, 2026: Finalize Individual Final Project topic; send TAs Twist designs
Apr 30 & May 1, 2026: Final project open Lab session #1 (MIT/Harvard)
May 7 & 8, 2026: Final project open Lab session #2 (MIT/Harvard)
May 12, 2026: MIT / Harvard Individual Final Project presentations (~3 Hours)
May 13, 2026: Global Committed Listener Individual Final Project presentations (~9-12 Hours)

Links (available later in the semester):

  • (Signup sheets for a presentation slot)
  • (A schedule of MIT/Harvard TA availability for lab work)
  • (Signup sheet for MIT/Harvard Lab slots)

See the work of the 2025 How To Grow (Almost) Anything students: Notion page



My Final project

Self-Destructing Antimicrobial Biofilms

1. Full project documentation


Overview

The following documentation presents the complete scientific rationale, engineering logic, and biological architecture for a novel self-destructing antimicrobial biofilm system. The project addresses the dual challenge of antimicrobial surface protection and biological containment through engineered Bacillus subtilis 168 biofilms capable of producing two distinct antimicrobial agents, followed by programmed self-lysis via a phage-derived kill switch. The system is designed as a three-module genetic circuit: (1) a sensor-priming module that detects microbial contamination and activates the antimicrobial arsenal; (2) an effector module producing the lantibiotic subtilin and the antimicrobial peptide cecropin, with broad-spectrum activity against Gram-positive and Gram-negative pathogens; and (3) a kill switch module utilizing the PBSX prophage holin-endolysin genes xhlA and xhlB under a time-delayed or chemically inducible promoter. The documentation also covers: a market justification based on a USD 10.98 billion antimicrobial coatings market; the genetic circuit architecture, including why sequential coincidence detection logic supersedes classical Boolean AND gates; the molecular biology of biofilm formation and chassis selection; detailed antimicrobial mechanisms; kill switch technology; and molecular methods including PCR, Golden Gate assembly, and chromosomal integration. The work is structured around three strategic aims that progress from justification through design to implementation, culminating in a 10-week experimental protocol.


1. Project Justification & Market Analysis

Global Antimicrobial Markets

The antimicrobial surface technology sector represents one of the most rapidly expanding segments of the biotechnology market. According to industry analyses by Research and Markets and Grand View Research, the global antimicrobial coatings market reached approximately USD 10.98 billion in 2025 and is projected to grow at a compound annual growth rate (CAGR) of 12.2% to 13.5% through 2030. This expansion is driven by increasing awareness of hospital-acquired infections (HAIs), the persistent challenge of microbial contamination in closed environments, and the emergence of antimicrobial resistance (AMR) among common pathogens.

Within this broader market, antimicrobial textiles represent a particularly relevant segment for biofilm-based approaches. The global antimicrobial textiles market was valued at USD 11.55 billion in 2024 and is projected to reach USD 16.37 billion by 2029, growing at a CAGR of 7.23%. The antimicrobial plastic packaging market, another adjacent segment, is anticipated to reach USD 17.2 billion by 2030. These figures demonstrate substantial commercial interest and investment in surfaces that actively prevent microbial colonization rather than passively resisting it.
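
The growth figures quoted above can be cross-checked with the standard compound-growth formula, value × (1 + CAGR)^years. The sketch below applies it to the textiles figures (2024 to 2029) and to the coatings range. Treating the coatings projection as five years of growth from the 2025 base is an assumption. The function names are illustrative.

```python
def project(value, cagr, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return value * (1 + cagr) ** years


def implied_cagr(start, end, years):
    """Annual growth rate implied by a start value, end value, and horizon."""
    return (end / start) ** (1 / years) - 1


# Antimicrobial textiles: USD 11.55 billion (2024) at 7.23% CAGR to 2029.
textiles_2029 = project(11.55, 0.0723, years=5)
textiles_rate = implied_cagr(11.55, 16.37, years=5)
print(f"textiles 2029: USD {textiles_2029:.2f} billion")
print(f"implied CAGR:  {textiles_rate:.2%}")

# Antimicrobial coatings: USD 10.98 billion (2025) at 12.2% to 13.5% CAGR.
# Assumption: "through 2030" means five years of growth from the 2025 base.
coatings_low = project(10.98, 0.122, years=5)
coatings_high = project(10.98, 0.135, years=5)
print(f"coatings 2030: USD {coatings_low:.1f} to {coatings_high:.1f} billion")
```

The textiles projection reproduces the quoted 2029 value to within rounding, so those three numbers are mutually consistent.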

The global biocides market, which encompasses the active chemical agents used in antimicrobial formulations, was valued at approximately USD 9.78 billion in 2025. However, traditional biocides face increasing regulatory scrutiny due to environmental persistence, toxicity concerns, and the evolution of resistance. The European Biocidal Products Regulation (BPR, Regulation (EU) 528/2012) has significantly restricted the approved biocidal active substances, creating demand for novel antimicrobial mechanisms that do not rely on conventional chemical biocides.

The presented self-destructing biofilm technology would occupy a unique position at the intersection of these markets. Unlike passive antimicrobial coatings that rely on silver nanoparticles, copper alloys, or quaternary ammonium compounds with fixed release kinetics, a living biofilm system provides active, responsive antimicrobial production. The engineered biofilm detects contamination and responds by producing antimicrobial peptides on demand. The self-destruct capability addresses the critical containment and end-of-life concerns that currently limit the deployment of living engineered systems in consumer and healthcare settings.


Space Microbiology & The ISS Contamination Crisis

Microbial contamination in closed environments represents a critical operational and health risk with quantified economic and safety implications. The International Space Station (ISS) serves as the most thoroughly documented example of this phenomenon in an extreme environment. Since the ISS began continuous human habitation in November 2000, astronauts have been exposed to an environment where terrestrial microorganisms adapt to the unique pressures of microgravity, radiation, and metal-rich surfaces.

In 2023, researchers at NASA and collaborating institutions reported the discovery of three previously unknown strains of multidrug-resistant bacteria aboard the ISS: Enterobacter bugandensis strains IF7SW-B2, IIF1SW-B5, and IF4SW-B5. These strains were isolated from the station toilet area and demonstrated resistance to multiple antibiotics including cephalosporins, tetracyclines, and aminoglycosides. Genome analysis revealed 112 virulence factor genes, 95 of which were associated with human pathogenicity. This discovery followed earlier documentation of Staphylococcus aureus, Staphylococcus epidermidis, and various Enterobacteriaceae persisting on ISS surfaces for extended durations.

The persistence of these organisms is not merely an academic concern. An outbreak of drug-resistant Acinetobacter pittii aboard the ISS was documented over a 5-month period, demonstrating that spaceflight conditions can select for and amplify resistant populations. The cost of crew illness events in space is estimated at millions of dollars per incident when accounting for mission delays, medical intervention, and potential evacuation. For future long-duration missions to Mars, where resupply is impossible and medical evacuation impractical, microbial contamination represents a mission-critical risk.

Biofilms exacerbate this risk through their extraordinary resilience. In microgravity, fluid dynamics change dramatically: buoyancy-driven convection is eliminated, and surface-associated flow dominates. Under these conditions, bacteria exhibit altered biofilm formation kinetics. Studies of Pseudomonas aeruginosa in simulated microgravity demonstrated enhanced biofilm biomass and altered extracellular matrix composition compared to 1g controls. Bacillus subtilis, our chosen chassis, has also been studied in spaceflight conditions and exhibits altered spore formation and biofilm morphologies, though it maintains its fundamental genetic programmability.

Current antimicrobial countermeasures on the ISS rely on silver-impregnated surfaces, periodic chemical disinfection with quaternary ammonium compounds, and HEPA filtration. These approaches have proven insufficient for complete microbial suppression, and chemical residues pose their own health concerns in closed-loop life support systems. A self-regenerating, self-destructing antimicrobial biofilm that actively produces antimicrobial peptides and then eliminates itself would represent a paradigm shift in closed-environment hygiene technology.


Biofilm-Associated Healthcare Burden

Biofilms are responsible for an estimated 80% of all bacterial infections in humans. The Centers for Disease Control and Prevention (CDC) estimates that approximately 1.7 million hospital-acquired infections (HAIs) occur annually in the United States alone, resulting in approximately 99,000 deaths and adding USD 28.4 to 45 billion in direct medical costs each year. A substantial proportion of these infections are biofilm-associated, including catheter-related bloodstream infections, ventilator-associated pneumonia, surgical site infections, and urinary tract infections associated with indwelling catheters.

Biofilm-related infections are particularly costly because they resist conventional antibiotic therapy. The minimum inhibitory concentration (MIC) for biofilm-embedded bacteria can be 10 to 1,000 times higher than for planktonic cells. This tolerance arises from multiple mechanisms: the extracellular matrix (ECM) acts as a diffusion barrier for antibiotics; cells within biofilms enter metabolically dormant states that reduce antibiotic susceptibility; and horizontal gene transfer is enhanced in biofilm communities, accelerating resistance spread. The cumulative economic cost of antimicrobial resistance (AMR) is projected to reach USD 100 trillion by 2050 if current trends continue, according to the Review on Antimicrobial Resistance (the O'Neill Report).

Wound care represents yet another high-value application. Chronic wounds, including diabetic foot ulcers, venous leg ulcers, and pressure injuries, affect approximately 8.2 million people in the United States annually. The global wound care market was valued at USD 22.8 billion in 2023 and is growing at 4.5% CAGR. Biofilm presence in chronic wounds is documented in over 78% of cases and is a primary driver of delayed healing. An engineered biofilm that delivers antimicrobial peptides directly to the wound bed and then eliminates itself would address both infection and the foreign body response that impairs healing.


Why This Project Matters: Validity, Relevance, Importance, and Innovation

This project addresses a genuine, quantified market and clinical need. The validity of the approach rests on three converging technological foundations: (1) the well-established capacity of Bacillus subtilis to form robust, genetically tractable biofilms; (2) the proven antimicrobial efficacy of subtilin and cecropin against clinically relevant pathogens; and (3) the demonstrated functionality of phage-derived kill switches in bacterial containment. No existing technology combines all three capabilities in a single, self-regulating system.

The relevance extends across multiple sectors. In healthcare, it addresses the HAI crisis and the limitations of current antimicrobial surfaces. In aerospace, it targets the documented ISS contamination problem and the anticipated needs of long-duration spaceflight. In consumer applications, it offers a biodegradable alternative to persistent silver and copper coatings that accumulate in the environment. In food safety, active antimicrobial packaging that self-destructs after product use would eliminate persistent packaging waste while maintaining safety during shelf life.

The importance is amplified by the antimicrobial resistance crisis. The World Health Organization has declared AMR one of the top ten global public health threats facing humanity. New antimicrobial strategies that do not rely on conventional antibiotics and that minimize resistance selection are urgently needed. Antimicrobial peptides (AMPs) like subtilin and cecropin kill bacteria through membrane disruption mechanisms that are less prone to single-step resistance evolution than conventional antibiotics. Combining two AMPs with different mechanisms further reduces the probability of resistance.

The innovation lies in the integration of active antimicrobial production with biological containment through genetic programming. Unlike passive coatings that leach antimicrobial agents continuously, our system produces antimicrobials only when needed, responding to microbial contamination. Unlike persistent living coatings, our system includes a genetically encoded expiration mechanism. And unlike conventional antimicrobial surfaces that require replacement or cleaning, our system leaves behind only degraded extracellular matrix and lysed cellular debris that can be wiped away or left to biodegrade.


Market Segmentation & Competitive Landscape

Current antimicrobial surface technologies fall into four categories: (1) metal-based coatings (silver, copper, zinc) that release toxic ions; (2) organic biocide coatings (quaternary ammonium compounds, triclosan, biguanides); (3) passive physical modifications (nano-roughness, anti-adhesive polymers); and (4) antibiotic-impregnated materials used primarily in medical devices. Each category has significant limitations.

Metal-based coatings face regulatory restrictions due to environmental accumulation and emerging evidence of mammalian cell toxicity. Silver nanoparticles, the dominant antimicrobial coating technology, have been restricted in certain textile applications by the European Commission. Copper surfaces require continuous oxidation to release Cu2+ ions, and their efficacy diminishes over time as surface oxide layers thicken.

Organic biocides face the most severe regulatory challenges. Triclosan has been banned in hand soaps by the U.S. FDA and restricted in Europe. Quaternary ammonium compounds are under increasing scrutiny for their role in antimicrobial resistance selection and environmental persistence. Biguanides (chlorhexidine) are effective but staining and skin irritation limit their application range. Passive physical modifications (like Sharklet micro-patterned surfaces) prevent bacterial attachment without chemical toxicity but provide no active killing mechanism. Once bacteria adhere, these surfaces offer no antimicrobial protection.

This project's engineered biofilm technology occupies a fifth category: active biological antimicrobial systems. This category does not yet exist in the commercial market, representing a blue-ocean opportunity. The competitive advantage derives from three features: (a) on-demand antimicrobial production rather than passive release; (b) self-limiting duration through genetic programming; and (c) biodegradability and environmental compatibility compared to persistent metal or chemical coatings.


Regulatory Pathway & Commercialization Strategy

The regulatory pathway for genetically engineered living products is complex but increasingly well-defined. In the United States, the Environmental Protection Agency (EPA) regulates microbial pesticides under the Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA). However, the EPA has a specific exemption for genetically engineered microorganisms used in contained manufacturing processes. For consumer-facing applications, the FDA Center for Food Safety and Applied Nutrition (CFSAN) would evaluate food-contact applications, while the FDA Center for Devices and Radiological Health (CDRH) would regulate medical device coatings.

The presence of a functional kill switch significantly enhances the regulatory profile. The 2016 Presidential Commission for the Study of Bioethical Issues recommended that engineered organisms intended for environmental release incorporate multiple layers of biological containment. The “Self-Destructing Antimicrobial Biofilms” three-module architecture, with the kill switch as an integral component rather than an afterthought, aligns with these recommendations. The use of Bacillus subtilis, a GRAS (Generally Recognized As Safe) organism with decades of safe use in food fermentation and industrial enzyme production, provides a favorable starting point for regulatory engagement.

The commercialization strategy would prioritize contained-use applications initially, where regulatory barriers are lowest. These include closed-environment antimicrobial systems for spacecraft, clean rooms, and controlled manufacturing facilities. As safety data accumulates, the technology could progress to semi-contained applications (wound dressings, dental applications) and eventually to broader consumer products. This staged approach mirrors the commercialization trajectory of other engineered biological systems, including genetically modified probiotics and living therapeutics.


2. Genetic Circuit Architecture & Systems Logic

From Boolean Logic to Genetic Circuits

The design of genetic circuits draws conceptual inspiration from electronic logic circuits, but the analogy must be applied with careful attention to biological reality. In electronic systems, Boolean logic gates (AND, OR, NOT, NAND, NOR, XOR) process discrete binary signals (0/1, low/high voltage) through physically separated conductive pathways with minimal noise and rapid switching times (nanoseconds to microseconds). In genetic circuits, signals are concentrations of transcription factors, RNA polymerase activity, and metabolite levels. These signals are analog rather than digital, noisy rather than deterministic, and slow rather than fast, with switching times typically measured in minutes to hours.

Despite these differences, the abstraction of transcriptional logic has proven powerful for engineering predictable biological behaviors. In a classical genetic AND gate, two input promoters (P_A and P_B) each drive expression of a split transcription factor or intermediate regulator. Only when both inputs are present simultaneously does the output promoter activate. For example, one input might express the DNA-binding domain of a transcription factor while the other expresses the activation domain; functional transcription only occurs when both domains are present to form a complete factor.
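
To make the AND-gate abstraction concrete, the sketch below uses a common phenomenological model in which each input activates through a Hill function and the output is the product of the two terms. The half-activation constants, the Hill coefficient, and the input levels are illustrative assumptions, not measurements, and the product form is only one of several ways such gates are modeled.

```python
def hill(x, k, n):
    """Activating Hill function: fraction of maximal activity at input level x."""
    return x**n / (k**n + x**n)


def and_gate(a, b, k_a=1.0, k_b=1.0, n=2.0):
    """Output in [0, 1): the product of two Hill terms.

    The product is close to 1 only when both inputs are well above their
    half-activation constants, which is the AND-like behavior.
    """
    return hill(a, k_a, n) * hill(b, k_b, n)


# Truth-table style scan with "low" = 0 and "high" = 10 (arbitrary units).
for a in (0.0, 10.0):
    for b in (0.0, 10.0):
        print(f"A={a:>4}  B={b:>4}  output={and_gate(a, b):.3f}")

# Intermediate inputs give intermediate outputs: the gate is analog, not digital.
print(f"A=1.0  B=1.0  output={and_gate(1.0, 1.0):.3f}")
```

The last line illustrates the point made earlier about analog signals: with both inputs at their half-activation level, the output sits at a quarter of maximum instead of at a clean 0 or 1.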

Numerous natural biological systems exhibit Boolean-like behavior. The lac operon in Escherichia coli demonstrates AND-like logic: full induction requires both lactose (to relieve LacI repression) and low glucose (to activate CRP-cAMP positive regulation). The arabinose operon shows a similar coincidence dependency. Synthetic biology has engineered many artificial AND gates using split transcription factors, interlocked promoters, and cooperative binding architectures. The 2012 paper by Siuti, Yazbek, and Lu demonstrated a genetic AND gate in E. coli using T7 RNA polymerase split into N-terminal and C-terminal fragments, each expressed from a different input promoter.

However, the application of AND logic to our antimicrobial biofilm system encounters fundamental mismatches between the electronic abstraction and biological constraints. The initial conceptual design for this project proposed an AND-type gate where the presence of a pathogen would serve as Input A and the surface colonization signal would serve as Input B, with the output being antimicrobial production. This design was ultimately rejected after rigorous analysis revealed that the temporal dynamics, signal integration, and functional requirements of the biological system demand a fundamentally different logic architecture.


Why NOT a Classical AND Gate

Three categories of constraints preclude the use of a classical AND gate for our system: temporal dynamics, signal-to-noise ratios in promoter threshold detection, and the fundamentally sequential nature of biofilm-based antimicrobial delivery.

Temporal Dynamics and Sequential Requirements

A classical AND gate requires the simultaneous presence of both inputs within the switching window of the gate. In our system, Input 1 is the detection of environmental contamination (or the decision to activate the system), and Input 2 is the biofilm maturity required for effective antimicrobial production and delivery. These inputs are inherently sequential, not simultaneous. The biofilm must form first, establish the sessile community, build the extracellular matrix, and reach sufficient cell density before antimicrobial production is useful. If antimicrobial peptides were produced before biofilm maturation, they would diffuse away from the surface rather than being concentrated at the target interface.

Biofilm formation in Bacillus subtilis proceeds through well-characterized stages: initial attachment (0-2 hours), microcolony formation (2-8 hours), matrix production and maturation (8-24 hours), and steady-state maintenance (24+ hours). The production of subtilin and cecropin by planktonic cells during the attachment phase would be pharmacologically wasteful, as the antimicrobials would disperse into the surrounding medium rather than being retained at the surface. The AND gate architecture, by requiring both inputs simultaneously, would force this wasteful early production or, alternatively, would require the second input to be artificially delayed, effectively converting the AND gate into a sequential circuit by adding delay elements.
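
The stage timeline above can be written as a small lookup, which makes the "too early to produce" argument concrete. This is a minimal sketch: the stage boundaries are the nominal hour ranges quoted above, while the function names and the choice of "matrix production and maturation" as the earliest useful stage are illustrative assumptions, not measured values.

```python
from bisect import bisect_right

# Stage start times in hours, taken from the nominal timeline above.
STAGE_STARTS = [0, 2, 8, 24]
STAGE_NAMES = [
    "initial attachment",
    "microcolony formation",
    "matrix production and maturation",
    "steady-state maintenance",
]


def biofilm_stage(hours: float) -> str:
    """Return the nominal biofilm stage at a given time after inoculation."""
    if hours < 0:
        raise ValueError("time must be non-negative")
    return STAGE_NAMES[bisect_right(STAGE_STARTS, hours) - 1]


def production_is_useful(
    hours: float, min_stage: str = "matrix production and maturation"
) -> bool:
    """True once the biofilm has reached at least `min_stage` (an assumed cutoff)."""
    reached = STAGE_NAMES.index(biofilm_stage(hours))
    return reached >= STAGE_NAMES.index(min_stage)
```

Under these assumptions, `production_is_useful(1)` is false (attachment phase, antimicrobials would disperse) and `production_is_useful(10)` is true.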

This temporal ordering is not merely an implementation detail but a fundamental requirement for the biological function. The biofilm serves as the delivery platform, and the antimicrobials are the payload. The platform must exist before the payload is deployed. This sequential dependency (Platform First, Then Payload) is irreconcilable with the simultaneous-input requirement of a classical AND gate.

Promoter Threshold Detection and Signal-to-Noise Limitations

AND gates in genetic circuits typically require each input promoter to exceed a threshold activation level for the output to trigger. When both inputs are near their threshold boundaries, biological noise (stochastic variation in transcription, translation, and degradation) causes frequent mis-switching. Moon et al. (2012) demonstrated that genetic AND gates exhibit substantial leakage when one input is absent and the other is near threshold, and that this leakage increases with the dynamic range of the promoters.

In our system, the contamination detection signal (whether through a biosensor promoter or a manual induction decision) would need to integrate with a biofilm maturation signal. Biofilm maturation is not a binary state but a continuous progression. The transition from immature to mature biofilm involves gradual increases in extracellular matrix production, cell density, and structural complexity. A threshold-based AND gate would be susceptible to switching at suboptimal biofilm stages, producing antimicrobials before the biofilm could effectively retain them, or failing to switch despite adequate maturation due to noise-driven fluctuations below threshold.
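
The mis-switching argument can be illustrated with a toy Monte Carlo simulation. This is a sketch under stated assumptions, not a model of any real promoter: the gate is an idealized hard threshold, noise is Gaussian, and all signal levels, the threshold, and the noise magnitude are arbitrary illustrative numbers.

```python
import random


def and_gate_on(a: float, b: float, threshold: float = 1.0) -> bool:
    """Idealized AND gate: ON only when both inputs exceed the threshold."""
    return a > threshold and b > threshold


def misswitch_rate(
    mean_b: float, noise_sd: float = 0.1, trials: int = 10_000, seed: int = 0
) -> float:
    """Fraction of trials in which noise flips the gate from its noise-free state.

    Input A is held clearly ON (2.0, arbitrary units); input B fluctuates
    around `mean_b`. All values are hypothetical.
    """
    rng = random.Random(seed)
    expected = and_gate_on(2.0, mean_b)
    flips = 0
    for _ in range(trials):
        noisy_b = rng.gauss(mean_b, noise_sd)
        if and_gate_on(2.0, noisy_b) != expected:
            flips += 1
    return flips / trials


near = misswitch_rate(mean_b=1.05)  # input B just above threshold
far = misswitch_rate(mean_b=1.50)   # input B well above threshold
```

With these numbers the gate flips in a substantial fraction of trials when input B sits half a standard deviation above threshold, and almost never when it sits five standard deviations above. A continuous, slowly rising maturity signal spends a long time in the first regime.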

The threshold problem is further complicated by the heterogeneity of biofilm microenvironments. Cells at the biofilm-surface interface experience different nutrient and oxygen conditions than cells at the biofilm-liquid interface. This spatial heterogeneity means that a single threshold for biofilm maturity may not accurately reflect the state of the entire community. A more robust architecture allows the biofilm to develop fully before any antimicrobial production decision is made, rather than attempting to gate production on a noisy maturity signal.

Functional Logic: Sequential Coincidence Detection

The logic of our system is better described as sequential coincidence detection with temporal ordering, not Boolean AND. The system has two phases: (Phase 1) Biofilm formation and priming, and (Phase 2) Antimicrobial production and eventual self-destruction. These phases are mutually exclusive in time: the system cannot be producing antimicrobials effectively before the biofilm is mature, and once the kill switch activates, the system self-destructs and ceases all function.

This sequential architecture can be understood through an analogy to an electronic sequential logic circuit with a state machine, rather than a combinational logic gate. The system has two operating states, PRIMED (biofilm growing, sensors active, effector genes repressed) and ACTIVE (antimicrobials being produced, kill switch armed), followed by a terminal state, TERMINATED. The transition from PRIMED to ACTIVE is triggered by a contamination detection event (or manual induction), but this transition is only possible after a minimum biofilm maturation time has elapsed. The transition from ACTIVE to TERMINATED occurs when the kill switch activates, either on a timer or by chemical induction.
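
The state-machine analogy can be sketched in a few lines. This is an illustrative toy, not the genetic implementation: the class name, the 24-hour minimum maturation time, and the choice to let the kill signal act from PRIMED as well as ACTIVE (on the assumption that the kill switch is armed for the whole biofilm lifetime) are all assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    PRIMED = "primed"
    ACTIVE = "active"
    TERMINATED = "terminated"


class BiofilmController:
    """Toy state machine for the PRIMED -> ACTIVE -> TERMINATED logic."""

    def __init__(self, min_maturation_h: float = 24.0):
        # min_maturation_h is an assumed placeholder, not a measured value.
        self.min_maturation_h = min_maturation_h
        self.state = State.PRIMED

    def step(self, hours: float, activation: bool = False, kill: bool = False) -> State:
        """Update the state from the biofilm age and the two external signals."""
        if self.state is State.TERMINATED:
            return self.state  # terminal state: lysis is irreversible
        if kill:
            # Assumption: the kill switch overrides everything else.
            self.state = State.TERMINATED
        elif (
            self.state is State.PRIMED
            and activation
            and hours >= self.min_maturation_h
        ):
            self.state = State.ACTIVE
        return self.state
```

An activation signal at 4 hours leaves the controller in PRIMED; the same signal at 30 hours moves it to ACTIVE; a later kill signal moves it to TERMINATED, after which no input has any effect. The output depends on stored state and elapsed time, which is what distinguishes it from a combinational gate.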

In control systems engineering, this architecture is called a supervisory control system with mode switching. The biofilm formation module operates autonomously in Mode 1. Upon receiving an activation signal, the system switches to Mode 2, enabling the antimicrobial module. A separate supervisory signal (the kill switch trigger) forces transition to Mode 3, where the lysis genes execute and the system self-destructs. This three-mode architecture is fundamentally different from a two-input AND gate and provides more robust, predictable behavior.


Sequential Coincidence Detection: The Actual Circuit Logic

The designed genetic circuit implements sequential coincidence detection through three functionally separated modules that operate in temporal sequence rather than in parallel combination. This architecture provides inherent noise suppression, temporal ordering of biological events, and multiple containment layers.

Module 1 (Sensor-Priming) contains the biofilm formation genes and the environmental sensing system. Biofilm formation is either constitutive or auto-induced through the natural quorum sensing of Bacillus subtilis, which uses the ComQXPA system and the Rap-Phr family of signaling peptides. As cells proliferate on the surface, they secrete ComX pheromone, which accumulates in the extracellular matrix. When ComX reaches a threshold concentration, the ComP histidine kinase activates and phosphorylates the response regulator ComA. Together with Rap-Phr signaling, this feeds into the phosphorelay that phosphorylates Spo0A, the master regulator of biofilm formation. This is a natural sequential process that requires time to develop.
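
To give a feel for why quorum sensing acts as a built-in delay, the time needed to reach the signaling threshold can be estimated with a simple growth model. This is a rough sketch under strong assumptions: cell density follows logistic growth, and the ComX concentration tracks cell density closely enough that a ComX threshold corresponds to a density threshold. The function names and every parameter value are hypothetical.

```python
import math


def logistic_density(t: float, n0: float, k: float, r: float) -> float:
    """Cell density at time t under logistic growth (closed-form solution)."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


def time_to_quorum(n0: float, k: float, r: float, n_quorum: float) -> float:
    """Time at which density first reaches n_quorum, by inverting the logistic curve."""
    if not (0 < n0 < n_quorum < k):
        raise ValueError("require 0 < n0 < n_quorum < k")
    return math.log(n_quorum * (k - n0) / (n0 * (k - n_quorum))) / r


# Hypothetical values: densities as fractions of carrying capacity, r in 1/hour.
delay_h = time_to_quorum(n0=0.01, k=1.0, r=0.5, n_quorum=0.5)
```

In this model, a lower starting density or a slower growth rate lengthens the delay before the threshold is crossed, so the priming time is set by growth on the surface rather than by any external input.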

In parallel with biofilm formation, Module 1 includes an environmental sensing promoter. In the current design, this is an IPTG-inducible P_lac promoter or a pathogen-detecting biosensor promoter. The key feature is that the output of this sensor is not directly connected to antimicrobial production (as it would be in an AND gate), but rather primes the system for activation. In the absence of the sensor signal, the antimicrobial genes are held in a repressed or silent state even if the biofilm is fully mature. When the sensor signal appears, it does not immediately activate antimicrobial production; instead, it licenses the transition to Module 2.

Module 2 (Effector) contains the antimicrobial production genes: subtilin (spaS, spaB, spaC, spaT in the spaBTCS operon) and cecropin (custom synthetic sequence adapted for B. subtilis codon usage). These genes are placed under the control of a strong, inducible promoter that is activated only after the system has been primed by Module 1. The transition from Module 1 to Module 2 can be implemented in two ways: (1) through a single chemical inducer (IPTG) that serves as both the environmental proxy and the production trigger, with biofilm maturation providing the temporal delay naturally; or (2) through a two-step system where a first inducer primes the system and a second signal (a different inducer, or the same inducer at a higher concentration) triggers production after a delay.

The natural temporal delay between Module 1 and Module 2 is the critical feature that makes this a sequential coincidence detector rather than an AND gate. In an AND gate, both inputs must be present simultaneously at the gate input. In our system, the biofilm formation input must precede the antimicrobial production input by hours to days. The coincidence is detected across time, not at a single time point.
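
The difference between the two logics can be made explicit by comparing how each evaluates its inputs. This is a minimal sketch: the function names and the 24-hour minimum lead time are illustrative assumptions. The AND gate looks at two signal levels at one instant, whereas the sequential detector looks at two event times and their order.

```python
from typing import Optional


def and_gate(biofilm_signal: bool, inducer_signal: bool) -> bool:
    """Classical AND: both inputs are evaluated at the same instant."""
    return biofilm_signal and inducer_signal


def sequential_coincidence(
    t_biofilm_start_h: Optional[float],
    t_inducer_h: Optional[float],
    min_lead_h: float = 24.0,
) -> bool:
    """True only if biofilm formation began at least min_lead_h before the inducer.

    Event times are in hours; None means the event never happened.
    min_lead_h is an assumed maturation delay, not a measured value.
    """
    if t_biofilm_start_h is None or t_inducer_h is None:
        return False
    return t_inducer_h - t_biofilm_start_h >= min_lead_h
```

For example, `sequential_coincidence(0, 30)` is true, while `sequential_coincidence(0, 5)` (inducer too early) and `sequential_coincidence(30, 0)` (wrong order) are both false, even though an instantaneous AND gate sampled late enough would see both inputs present in all three cases.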

Module 3 (Kill Switch) operates independently but with temporal sequencing. The lysis genes (xhlA and xhlB from the Bacillus subtilis PBSX prophage) are placed under a chemically inducible promoter (P_xyl for xylose induction) that is distinct from the Module 2 promoter. This separation ensures that antimicrobial production and cell lysis are genetically decoupled and can be triggered at different times. The kill switch is armed throughout the biofilm lifetime but is only triggered when the operator decides the antimicrobial mission is complete or when a pre-programmed timer expires.


3. Design

Module 1: Sensor-Priming Circuit

The sensor-priming module serves as the interface between the environment and the engineered biofilm. Its functions are: (a) to promote biofilm formation on the target surface; (b) to detect environmental conditions that warrant antimicrobial activation; and (c) to maintain the antimicrobial genes in a silent state until activation is licensed.

Biofilm formation in Bacillus subtilis is regulated by a sophisticated genetic network centered on Spo0A, the sporulation and biofilm master regulator. When nutrient conditions are favorable and cell density increases, the ComQXPA quorum sensing system activates, leading to ComA phosphorylation and the induction of surfactin biosynthesis genes (srfAA operon). Simultaneously, Rap proteins (RapA, RapE, RapK) are inhibited by Phr peptides, allowing Spo0A phosphorylation through the phosphorelay (KinA/KinB -> Spo0F -> Spo0B -> Spo0A). Phosphorylated Spo0A (Spo0A~P) directly activates the epsA-O and tapA-sipW-tasA operons, which encode the exopolysaccharide (EPS) and TasA amyloid fiber components of the biofilm matrix.

In our system, we leverage this natural biofilm program to give B. subtilis 168 the capacity to form robust biofilms. The 168 strain does not produce surfactin: it carries the sfp0 allele, a disrupted copy of the sfp gene encoding the 4'-phosphopantetheinyl transferase required to activate the surfactin synthetase encoded by the srfAA operon. However, EPS and TasA production remain functional. This natural biofilm formation provides the temporal delay required by our sequential logic: the biofilm must grow, the matrix must accumulate, and the community must reach sufficient density before the antimicrobial module is activated.


2. Lab protocol and materials tables (costs and suppliers)

Group Final Project