Sofía Segura Cárdenas — HTGAA Spring 2026


Biological Engineering · Medical physics · Space science

About me

Hello! My name is Sofía, and I am a final-year Biological Engineering undergraduate student. My academic background is mainly focused on biomaterials, biological systems, modeling, simulation and engineering approaches to working with living matter.

Alongside this, my main scientific interests lean strongly toward physics-related fields, particularly medical physics, space medicine, and the study of extreme environments—ranging from radiation effects in matter to broader interests in space and astrophysics.


Contact info

Email: seguracardenassofia@gmail.com



Weekly Homework




HTGAA - Week 1: Principles and Practices



My Homework

WEEK 1 - SAVE THE WORLD OR DESTROY THE WORLD

This week lays the foundation for ethics, safety, and governance in biotechnology.

Instructions

  1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

  2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

  3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

    Example
    • Purpose: What is done now and what changes are you proposing?
    • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
    • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
    • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

  4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option:                                  | Option 1 | Option 2 | Option 3
Enhance Biosecurity                               |          |          |
• By preventing incidents                         |          |          |
• By helping respond                              |          |          |
Foster Lab Safety                                 |          |          |
• By preventing incidents                         |          |          |
• By helping respond                              |          |          |
Protect the environment                           |          |          |
• By preventing incidents                         |          |          |
• By helping respond                              |          |          |
Other considerations                              |          |          |
• Minimizing costs and burdens to stakeholders    |          |          |
• Feasibility?                                    |          |          |
• Not impede research                             |          |          |
• Promote constructive applications               |          |          |

  5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

PART 1. FIXING THE COURSE
  1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

NOTE: This project is only an initial idea and may be changed or upgraded in the near future.

Project: Reversible Cell-Free Biosensor for ROS-Mediated Radiation Damage

This project aims to design a reversible, cell-free biosensor capable of reporting radiation-induced oxidative damage through a visible biochemical signal.

The system is based on a DNA-programmed TX–TL circuit embedded within a hydrogel matrix, inspired by biological systems that can transition between active and inactive states under physical stress. Upon exposure to radiation-induced reactive oxygen species (ROS), the biosensor activates a transient fluorescent response, which gradually returns to a basal state once the stimulus is removed, enabling reuse of the material.

By decoupling damage sensing from living cells, this platform provides a controllable and modular approach to studying radiation effects on biological matter.

One-sentence project goal

The goal of this project is to engineer a reversible, reusable, cell-free biosensor that translates radiation-induced oxidative damage into a transient biochemical signal.

Background, application and why does it matter

The primary application of this biosensor is in radiation physics, medical physics and even space science, where it can be used as a reusable biological dosimetry platform to study oxidative damage induced by ionizing radiation.

Rather than measuring radiation directly, the system reports biologically relevant damage, specifically ROS generation, offering insight into how physical radiation translates into molecular stress in biological systems. This makes the material particularly valuable for experimental radiation setups, calibration studies, and comparative stress assays, without the need for living models.

The material functions as a reversible biological stress reporter. Instead of permanently activating or degrading under radiation-induced stress, it temporarily switches state to signal damage and then returns to baseline, enabling repeated use and long-term monitoring.

In medical physics and radiobiology, many existing sensing systems present fundamental limitations:

  • They degrade over time
  • They saturate under high stimulus
  • They are single-use
  • They cannot be reset or recovered

Similarly, most biological sensors:

  • lose viability
  • or remain irreversibly activated after damage

This creates a gap between physical radiation sensing and biologically meaningful damage reporting. While individual stress-responsive genetic elements are well characterized, their integration into a reusable, reversible cell-free biomaterial capable of multiple stress-response cycles remains largely unexplored. In this design the hydrogel is not just a container: it is an active part of the system that regulates how the stimulus reaches the circuit and how it is cleared.


Inspiration

The project is inspired by simple biological systems, such as jellyfish, which exhibit functional resilience and reversible state transitions despite minimal organizational complexity. These organisms demonstrate that biological function does not always require permanent activation or structural complexity, but can instead rely on transient, physics-driven responses to environmental stress.

Translating this principle into a synthetic, cell-free context, the proposed biosensor explores how biological states—such as gene expression and signal emission—can be reversibly triggered by physical damage and allowed to relax back to a stable baseline.


What makes this a synthetic biology project

This project constitutes a synthetic biology approach by designing and programming a DNA-based TX–TL circuit that links oxidative stress sensing to a controlled biochemical output: a visible fluorescent signal. The circuit architecture, combined with material constraints imposed by the hydrogel matrix, enables tunable activation, decay, and reversibility of the signal.

Signal intensity correlates with stress magnitude, while signal reversibility reflects the system’s ability to recover to a baseline state. System reversibility is achieved through the co-design of a stress-responsive genetic circuit and a diffusion-regulated material matrix, enabling transient activation and passive return to a basal state without permanent system alteration. The system does not shut down because it fails; it shuts down because it is designed to relax back to its original state.

This platform is intended to be modular, allowing future expansion to additional damage types. Rather than engineering a new organism, the project focuses on engineering biological function, emphasizing control, modularity, and reusability.

Conceptual state transition

  1. The system starts in an OFF (basal) state
  2. Oxidative stress is applied (e.g. H₂O₂ or radiation-induced ROS)
  3. The system enters a “damage state”
  4. A fluorescent signal is activated
  5. The stress is removed
  6. The system relaxes back to its basal state

Engineering design decisions

Biological Circuit Controls                  | Material (Hydrogel) Controls
What is detected (ROS, damage, stress)       | How much stimulus enters the system
What signal is produced (fluorescence)       | How fast the stimulus diffuses
Activation threshold and sensitivity         | How long the stimulus is retained
Timing of signal initiation                  | Rate of stimulus clearance
Duration of protein expression               | Smoothness of system shutdown
Signal termination mechanisms                | Buffering of damage spikes
Susceptibility to noise or false positives   | Protection of TX–TL components

Key tunable parameters in the system design include:

  • duration of protein expression
  • protein degradation rate
  • response speed
  • energy consumption
  • lifetime of the TX–TL system

Primary and secondary reporting strategy

  • Primary signal: fluorescence intensity
  • Secondary signal: temporal dynamics of activation and decay

Interpretation:

  • Fluorescence intensity reflects the magnitude of ROS-induced damage
  • Signal duration and decay profile reflect the dynamic response of the system under stress

In simpler terms:

  • How much it glows → magnitude of the damage
  • How fast it starts glowing → intensity of the stress
  • How the signal declines → dynamics of the system under damage

Reversibility is not interpreted as a property of the damage itself, but as a designed feature of the biosensor, enabling repeated use under multiple damage cycles.
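
The three readouts above can be illustrated with a minimal sketch. It assumes a toy first-order model that is not part of the original design: the reporter is produced at a rate proportional to the ROS level and removed at a constant degradation rate. The names and values (`k_prod`, `k_deg`, the pulse timing) are hypothetical and chosen only for illustration.

```python
def simulate_reporter(k_prod=1.0, k_deg=0.2, ros_on=10.0, ros_off=30.0,
                      t_end=100.0, dt=0.01):
    """Euler integration of the toy model dF/dt = k_prod * ROS(t) - k_deg * F.

    ROS(t) is a square pulse equal to 1 between ros_on and ros_off, else 0.
    All parameters are hypothetical, in arbitrary units.
    """
    n_steps = round(t_end / dt)
    on_step, off_step = round(ros_on / dt), round(ros_off / dt)
    times, signal = [], []
    f = 0.0  # reporter level, starting in the OFF (basal) state
    for i in range(n_steps + 1):
        times.append(i * dt)
        signal.append(f)
        ros = 1.0 if on_step <= i < off_step else 0.0
        f += dt * (k_prod * ros - k_deg * f)
    return times, signal


def readouts(times, signal):
    """Return peak intensity, time of half-maximal rise, and decay half-time."""
    peak = max(signal)
    i_peak = signal.index(peak)
    # First time the rising signal reaches half of the peak
    t_half_rise = next(t for t, f in zip(times, signal) if f >= peak / 2)
    # Time after the peak for the signal to fall back to half of the peak
    t_half_decay = next(
        t for t, f in zip(times[i_peak:], signal[i_peak:]) if f <= peak / 2
    ) - times[i_peak]
    return peak, t_half_rise, t_half_decay


times, signal = simulate_reporter()
peak, t_half_rise, t_half_decay = readouts(times, signal)
print(f"peak intensity       : {peak:.2f}")
print(f"half-maximal rise at : t = {t_half_rise:.2f}")
print(f"decay half-time      : {t_half_decay:.2f}")
print(f"residual at t_end    : {signal[-1] / peak:.1e} of peak")
```

In this toy model the decay half-time depends only on `k_deg`, which is why the degradation rate of the reporter appears among the key tunable parameters.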

Circuit architecture

[ROS-sensitive promoter] → [Fluorescent protein + degron] → [Terminator]

Why this is non-trivial (and why it’s innovative)

Poor design choices lead to failure modes such as:
  • Gel too dense → stimulus never reaches the circuit → no activation
  • Gel too loose → excessive activation → no shutdown
  • Reporter too stable → permanent signal → no reuse
  • Circuit too sensitive → noise and false positives
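
Two of these failure modes can be put into rough numbers. This is a back-of-the-envelope sketch that assumes first-order reporter decay and simple one-dimensional diffusion (mean squared displacement = 2·D·t). The half-lives, rest time, gel thickness, and diffusivities are hypothetical placeholders, not measured values for this system.

```python
def residual_fraction(half_life, rest_time):
    """Fraction of reporter still present after rest_time (first-order decay)."""
    return 0.5 ** (rest_time / half_life)


def diffusion_time(thickness, diffusivity):
    """Characteristic 1D diffusion time, from <x^2> = 2 * D * t."""
    return thickness ** 2 / (2 * diffusivity)


# Hypothetical values, for illustration only
REST_TIME_MIN = 60.0  # rest period between two damage cycles, in minutes
for label, half_life_min in [("degron-tagged reporter", 10.0),
                             ("stable reporter", 600.0)]:
    left = residual_fraction(half_life_min, REST_TIME_MIN)
    print(f"{label}: {left:.1%} of the signal left after {REST_TIME_MIN:.0f} min")

GEL_THICKNESS_M = 1e-3  # 1 mm gel layer (assumed)
for label, d_m2_s in [("loose gel", 1e-9), ("dense gel", 1e-11)]:
    t_s = diffusion_time(GEL_THICKNESS_M, d_m2_s)
    print(f"{label}: stimulus needs ~{t_s:,.0f} s to cross the gel")
```

With these placeholder numbers, a stable reporter keeps most of its signal through the rest period (no reuse), and a hundredfold drop in diffusivity makes the stimulus a hundred times slower to reach the circuit.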

PART 2. PROJECT CONSIDERATIONS
  1. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

Governance and Policy Considerations

flowchart TB
    G["Governance & Policy Goals"]

    G --> A["Non-malfeasance<br/>(Preventing Harm)"]
    A --> A1["Cell-free TX–TL limits dual-use potential"]
    A --> A2["Avoids human or clinical deployment"]
    A --> A3["Environmentally friendly"]

    G --> B["Safe and Responsible Research"]
    B --> B1["Transparency in system limitations"]
    B --> B2["Reproducibility and containment"]
    B --> B3["Ensuring personal safety and training"]
    B --> B4["Financial responsibility"]

    G --> C["Constructive & Equitable Use"]
    C --> C1["Accessibility of the platform"]
    C --> C2["Supports education and interdisciplinary research"]
    C --> C3["Promotion of heuristic rules/method"]
    
  • Goal 1A. Non-malfeasance (Preventing Harm)
  • Goal 1B. Safe and Responsible Research
  • Goal 1C. Constructive and Equitable Use
  • Sub-Goal 1A. Cell-free TX–TL limits dual-use potential
    • The biosensor is designed as a cell-free system, preventing replication, evolution, or environmental persistence, thereby reducing biosafety and biosecurity risks.
  • Sub-Goal 2A. Avoids human or clinical deployment
    • The system is not intended for in vivo, clinical, or diagnostic use; clear communication of this limitation helps prevent inappropriate application and avoids the ethical concerns associated with animal and human clinical trials.
  • Sub-Goal 3A. Environmentally friendly
    • This project prioritizes environmentally responsible design by relying on hydrogel matrices derived from biodegradable, bio-based, or naturally sourced polymers. Such materials are often obtained from renewable resources or industrial by-products, reducing environmental impact compared to synthetic, non-degradable sensing technologies. Additionally, the reusability of the biosensor minimizes material waste and lowers the frequency of disposal, contributing to a more sustainable experimental practice.
  • Sub-Goal 1B. Transparency in system limitations
    • The biosensor reports oxidative damage via ROS signaling rather than direct radiation dose, and this distinction must be clearly stated to avoid misinterpretation.
  • Sub-Goal 2B. Reproducibility and containment
    • The use of in silico circuit design and controlled TX–TL systems improves reproducibility while minimizing unintended biological interactions.
  • Sub-Goal 3B. Ensuring personal safety and training
    • Because the system is intended for studying radiation-induced damage in controlled environments, its use must be accompanied by appropriate safety protocols and user training. This biosensor is explicitly not designed to replace personal dosimeters or occupational safety monitoring devices. Clear operational guidelines, radiation-handling protocols, and user training are required to ensure that the biosensor is employed strictly as an experimental tool, without increasing risk to personnel.
  • Sub-Goal 4B. Financial responsibility
    • The proposed system emphasizes cost-effective design through the use of low-cost materials, minimal infrastructure requirements, and a reusable sensing strategy. By enabling multiple experimental cycles within the same biosensor material, the system reduces recurring expenses associated with single-use sensors or consumables. This extended operational lifetime represents a significant financial advantage for laboratories and institutions, supporting responsible allocation of economic resources.
  • Sub-Goal 1C. Accessibility of the platform
    • Cell-free and hydrogel-based systems lower infrastructure barriers, making the platform more accessible to educational and research laboratories.
  • Sub-Goal 2C. Supports education and interdisciplinary research
    • The project bridges synthetic biology, materials science, and medical physics while maintaining clear ethical boundaries around scope and use.
  • Sub-Goal 3C. Promotion of heuristic rules
    • This project adopts a heuristic-driven design philosophy, leveraging simple, interpretable rules to guide system construction and experimentation. Material properties, circuit dynamics, and experimental steps are intentionally ordered to maximize efficiency—favoring low-cost, low-complexity processes early and reserving more resource-intensive steps for later stages. This approach improves time efficiency, reduces unnecessary expenditures, and promotes accessible, transferable design strategies that can be adapted across laboratories and disciplines.

PART 3. THE WHO AND THE HOW
  1. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).
Example
  • Purpose: What is done now and what changes are you proposing?
  • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
  • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Governance Action 1 — Mandatory contextual labeling and use limitation

Actor(s): Academic researchers, research institutions, funding agencies.

Purpose

Currently, biosensors designed for radiation-related applications can be misinterpreted as direct radiation detectors or clinical tools. This project proposes a mandatory contextual labeling requirement stating that the system detects ROS-mediated damage, not radiation dose, and is intended strictly for in vitro experimental use. The change ensures that the tool is not misapplied in clinical, occupational, or regulatory contexts.

Design

To make this work, institutions and funding bodies would require that:

  • All documentation, publications, and public-facing descriptions explicitly state the system’s scope and limitations.
  • Experimental protocols include a standardized disclaimer clarifying that the biosensor does not replace dosimeters or personal safety devices.
  • Course projects and academic demonstrations reinforce correct interpretation through documentation templates and reporting guidelines.

Assumptions

This action assumes that misinterpretation is a primary pathway for harm and that clear documentation meaningfully influences user behavior. It also assumes that researchers and students will comply with labeling norms when they are formally required.

Risks of Failure & “Success”

  • Failure risk: Labels may be ignored, especially when the system performs well and appears “sensor-like.”
  • Risk of success: If widely adopted, the tool could become a de facto standard for damage reporting, tempting users to extend it beyond its intended domain without appropriate validation.

Governance Action 2 — Safety training and protocol integration as a prerequisite for use

Actor(s): Research institutions, laboratory safety committees, instructors.

Purpose

Radiation-related experimentation already requires specialized training, but novel biosensors can create a false sense of safety. This action proposes that use of the biosensor be explicitly tied to existing radiation safety training and protocols, reinforcing that the tool supplements—but does not replace—established safety infrastructure.

Design

This action would require:

  • Integration of the biosensor into institutional radiation safety manuals as an experimental reporting tool.
  • Mandatory user training that explains what the biosensor measures, what it does not measure, and how to interpret its output.
  • Oversight by institutional safety committees when the system is used in radiation-adjacent experiments.

Assumptions

This approach assumes that institutions already have safety frameworks capable of absorbing new tools, and that users are more likely to behave responsibly when a technology is embedded within formal safety structures.

Risks of Failure & “Success”

  • Failure risk: Training could become procedural rather than substantive, reducing its effectiveness.
  • Risk of success: If the biosensor becomes normalized within safety workflows, it may be incorrectly perceived as an authoritative indicator of safety rather than an experimental proxy.

Governance Action 3 — Incentivizing reusable, low-waste biosensing systems

Actor(s): Funding agencies, academic programs, sustainability-focused research initiatives.

Purpose

Many sensing technologies are single-use, expensive, or environmentally burdensome. This action proposes incentivizing reusable and low-waste biosensor designs, positioning reusability and material efficiency as desirable research outcomes rather than secondary considerations.

Design

This could be implemented through:

  • Evaluation criteria that favor reusability, material sustainability, and life cycle efficiency.
  • Creation and promotion of open repositories that document reuse cycles, material performance, and design adaptations for biosensing platforms.
  • Recognition or funding bonuses for designs that reduce consumables and experimental waste.

Assumptions

This action assumes that researchers respond to incentive structures and that sustainability metrics can be meaningfully evaluated without stifling innovation or creativity.

Risks of Failure & “Success”

  • Failure risk: Incentives may encourage superficial reuse claims without rigorous validation.
  • Risk of success: Strong emphasis on reuse could discourage exploration of necessary single-use or high-sensitivity designs in certain contexts.

PART 4. HOW WELL DO YOU DO?
  1. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

[Image: scoring matrix of the three governance actions against the policy goals]

PART 5. PRIORITIES
  1. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

Prioritized governance strategy and rationale

Drawing upon the governance scoring matrix, the most effective strategy for guiding the responsible development and use of the proposed reversible cell-free biosensor is a combined prioritization of Governance Options 1 and 2, with Governance Option 3 acting as a reinforcing, longer-term incentive mechanism.

Primary priority: Governance Options 1 and 2 (combined)

  • Option 1: Mandatory contextual labeling and use limitation
  • Option 2: Safety training and protocol integration as prerequisites

These two options consistently score best across biosafety, lab safety, and environmental protection, particularly in their ability to prevent incidents rather than merely respond to them. Together, they address the most immediate risks associated with misuse, misinterpretation, or inappropriate deployment of the biosensor.

Option 1 ensures that the system is clearly framed as:

  • A cell-free, non-replicative biosensing platform
  • Not a personal radiation dosimeter
  • Not intended for clinical or in vivo use

This directly reduces the risk of over-interpretation of fluorescence signals and prevents the technology from being deployed outside its validated scope.

Option 2 complements this by embedding the biosensor within existing institutional safety cultures, requiring that users receive appropriate training in:

  • Radiation handling protocols
  • Interpretation of indirect ROS-based signals
  • Limitations of TX–TL systems

Importantly, this option does not introduce new regulatory burdens but instead leverages existing laboratory training and approval workflows, making it both feasible and scalable.

Trade-off considered: These measures may slow early adoption or increase onboarding time for new users. However, this is outweighed by the reduction in misuse risk and the preservation of trust in the technology.


Secondary priority: Governance Option 3 (Incentive-based reinforcement)

Option 3 — Incentivizing reusable, low-waste biosensing systems

While Option 3 scores lower in immediate incident prevention, it plays a crucial role in shaping long-term research behavior and system design choices. Incentives that reward reusability, lifecycle efficiency, and reduced consumables encourage adoption of the very properties that distinguish this biosensor from traditional single-use sensors.

Rather than acting as a primary safeguard, this option functions best as:

  • A structural reinforcement mechanism
  • A signal to researchers and institutions that sustainability and reuse are valued outcomes

Trade-off considered:
Incentive-based mechanisms depend on institutional uptake and may have uneven effects across well-funded versus resource-limited laboratories. Their impact is therefore slower and less uniform than mandatory requirements.


Assumptions and Uncertainties

This prioritization assumes that:

  • Institutions and laboratories already possess baseline safety infrastructure
  • Users are willing to engage with training and labeling requirements
  • Regulatory bodies are receptive to non-single-use technologies

Uncertainties remain regarding:

  • How fluorescence-based damage reporting might be interpreted by non-experts
  • Variability in institutional enforcement of training standards
  • How incentive structures translate into real design decisions over time

This governance strategy is primarily directed toward:

  1. Institutional biosafety committees and laboratory leadership
  2. Funding agencies and regulatory bodies overseeing research infrastructure
  3. Organizations setting best-practice standards for cell-free and biosensing technologies

By acting at this institutional and regulatory level, the proposed governance combination balances safety, feasibility, innovation, and sustainability, aligning closely with the technical and ethical goals of the project.


WEEK 2 - LECTURE PREP

In preparation for Week 2’s lecture on “DNA Read, Write, and Edit,” please review the following materials:

Lecture 2 slides as posted below. The associated papers that are referenced in those slides.
In addition, answer these questions in each faculty member’s section:

Homework Questions from Professor Jacobson: [Lecture 2 slides]

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA Polymerase error rate and genome fidelity

DNA polymerase, the enzyme responsible for copying DNA during replication, has an intrinsic error rate of approximately 1 mistake per 10⁵ nucleotides incorporated.

The human genome contains about 3 × 10⁹ base pairs. At this raw error rate, tens of thousands of mutations would occur every time a human cell divides, which would be incompatible with life.

How biology addresses this discrepancy

Biological systems reduce replication errors through multiple layers of error correction:

  1. Proofreading by DNA polymerase
    Many DNA polymerases possess 3′→5′ exonuclease activity, which allows them to remove incorrectly incorporated nucleotides immediately. This improves fidelity to roughly 1 error per 10⁷ nucleotides.

  2. Post-replication mismatch repair (MMR)
    Additional cellular repair systems detect and correct mismatches that escape proofreading, further reducing the error rate to approximately 1 error per 10⁹–10¹⁰ nucleotides.

As a result, the final error rate is low enough that most cell divisions occur without introducing harmful mutations.
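
The arithmetic behind these figures can be checked with a short sketch. It uses only the error rates and genome size quoted above; for mismatch repair it takes the upper end of the quoted range (1 error per 10⁹ nucleotides). The variable and function names are illustrative.

```python
GENOME_BP = 3e9  # approximate human genome size quoted in the text

# Errors per nucleotide at each layer of correction, as quoted above
ERROR_RATES = {
    "polymerase alone": 1e-5,
    "with proofreading": 1e-7,
    "with mismatch repair": 1e-9,  # upper end of the 1e-9 to 1e-10 range
}


def expected_errors(rate, genome_bp=GENOME_BP):
    """Expected number of errors when copying genome_bp nucleotides once."""
    return rate * genome_bp


for stage, rate in ERROR_RATES.items():
    print(f"{stage:<22}: ~{expected_errors(rate):,.0f} errors per genome copy")
```

Each layer removes roughly two orders of magnitude of errors, taking the expected count from tens of thousands per genome copy down to a handful or fewer.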


How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Coding Capacity of DNA for an Average Human Protein

An average human protein is approximately 300 amino acids long. Each amino acid is encoded by a codon, a sequence of three nucleotides.

Because there are 4 possible nucleotides (A, T, C, G), there are:

  • 4³ = 64 possible codons
  • Only 20 amino acids (plus stop signals)

This means the genetic code is degenerate, and most amino acids are encoded by multiple codons.

Number of Possible DNA Sequences for One Protein

If an average amino acid is encoded by ~3 synonymous codons, then the total number of possible DNA sequences that could encode a 300–amino acid protein is approximately:

3³⁰⁰ (about 10¹⁴³)

This is an astronomically large number, meaning there are many distinct DNA sequences that can, in theory, encode the same protein.
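
The estimate can be made exact for a specific sequence by multiplying the number of synonymous codons at each position. This minimal sketch uses the standard genetic code; the peptide `MKTLLW` is a made-up example, not a real protein.

```python
from collections import Counter
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons for each amino acid ("*" marks the stop codons)
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(DEGENERACY[aa] for aa in protein)


example = "MKTLLW"  # hypothetical peptide, for illustration only
print(f"{example}: {count_encodings(example)} synonymous DNA sequences")
print(f"3^300 has {len(str(3 ** 300))} decimal digits")
```

Methionine and tryptophan each have a single codon, while leucine, serine, and arginine have six, so the exact count depends on the amino acid composition of the protein.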


Why Most Possible Codes Do Not Work in Practice

Despite this theoretical flexibility, not all synonymous DNA sequences function equally well due to several biological constraints:

  1. Codon usage bias
    Organisms preferentially use certain codons over others. Rare codons can slow translation or cause ribosome stalling.

  2. mRNA secondary structure
    Certain nucleotide sequences form stable secondary structures that hinder ribosome binding or elongation.

  3. Translational accuracy and efficiency
    Codon choice can affect misincorporation rates and protein folding during translation.

  4. Regulatory elements embedded in coding sequences
    Coding regions may overlap with regulatory signals affecting splicing, mRNA stability, or localization.

  5. GC content and genome stability
    Extreme nucleotide compositions can impact DNA replication and transcription efficiency.

Because of these factors, only a small subset of all theoretically possible DNA sequences are biologically viable for producing a functional protein at appropriate levels.
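
As a small illustration of how such constraints shrink the usable set, the sketch below enumerates every synonymous DNA sequence for a short peptide and keeps only those whose GC content falls inside a window. The peptide `MKL` and the 40–60% window are hypothetical choices; real design tools also weigh codon usage, secondary structure, and regulatory motifs.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it
SYNONYMS = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def encodings(peptide):
    """Yield every DNA sequence that encodes the peptide."""
    for codons in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(codons)


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


peptide = "MKL"  # hypothetical peptide, for illustration only
all_seqs = list(encodings(peptide))
kept = [s for s in all_seqs if 0.40 <= gc_content(s) <= 0.60]  # assumed window
print(f"{len(all_seqs)} synonymous sequences, {len(kept)} within the GC window:")
for s in kept:
    print(f"  {s}  GC = {gc_content(s):.2f}")
```

Even a single, loosely chosen filter removes most candidates for this three-residue peptide; stacking several constraints on a full-length protein narrows the space far more.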


Homework Questions from Dr. LeProust: [Lecture 2 slides]

What’s the most commonly used method for oligo synthesis currently?

The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite chemistry.
In this method, DNA is synthesized stepwise from the 3′ to the 5′ end on a solid support. Each cycle consists of four main steps: deprotection, coupling of a phosphoramidite nucleotide, capping of unreacted chains, and oxidation. This approach is highly automated, fast, and reliable, making it the standard technique used by commercial DNA synthesis providers.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

It is difficult to synthesize oligos longer than ~200 nucleotides because errors accumulate with each synthesis cycle.
Each nucleotide addition has a small but nonzero failure rate (incomplete coupling, side reactions, or deletions). Because every cycle must succeed, the fraction of full-length, correct sequences falls exponentially with oligo length. Additionally, longer oligos are harder to purify effectively, since truncated products differ only slightly in length from the desired product.

Why can’t you make a 2000bp gene via direct oligo synthesis?

A 2000 bp gene cannot be made via direct oligo synthesis because the cumulative error rate would be extremely high, resulting in an almost negligible yield of error-free full-length DNA.
Beyond error accumulation, chemical synthesis efficiency, purification limitations, and cost make direct synthesis impractical at this scale. Instead, long genes are constructed by assembling shorter, overlapping oligos using enzymatic methods such as PCR-based assembly or Gibson assembly, followed by cloning and sequence verification.
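
A rough calculation shows how quickly yield collapses with length. The sketch assumes that each coupling step succeeds independently with the same efficiency and that the first nucleotide is pre-loaded on the support, so an n-nt oligo needs n − 1 couplings. The efficiencies of 99% and 99.5% are assumed illustrative values, not figures from the lecture.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of chains that completed every coupling step.

    Assumes independent steps with equal efficiency and a pre-loaded first
    nucleotide, so an n-nt oligo requires n - 1 successful couplings.
    """
    return coupling_efficiency ** (length_nt - 1)


for efficiency in (0.99, 0.995):  # assumed per-step coupling efficiencies
    for length in (20, 200, 2000):
        fraction = full_length_fraction(length, efficiency)
        print(f"efficiency {efficiency:.3f}, {length:>4} nt: "
              f"{fraction:.2e} full-length")
```

At 99% efficiency only about 13% of chains reach 200 nt, and essentially none reach 2000 nt. This counts full-length chains only; the fraction that is also free of sequence errors is lower still.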


Homework Question from George Church: [Lecture 2 slides]

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in animals are those that cannot be synthesized de novo and therefore must be obtained from the diet:

  1. Histidine
  2. Isoleucine
  3. Leucine
  4. Lysine
  5. Methionine
  6. Phenylalanine
  7. Threonine
  8. Tryptophan
  9. Valine
  10. Arginine (essential in all animals during growth; in many adult animals it is conditionally essential)

How does this affect the view of the “Lysine Contingency”?

The “Lysine Contingency” comes from Jurassic Park: the dinosaurs were engineered so that they could not synthesize lysine, with the intent that they would die without lysine supplements provided by the park. Since lysine is already essential in all animals, this engineered dependency adds nothing new: no animal can synthesize lysine, and every animal obtains it from its diet.

This makes the lysine contingency look like a weak containment strategy, while the underlying biology remains significant:

  • Animals are already metabolically constrained by the lack of lysine biosynthesis pathways, so an escaped animal could obtain lysine the same way wild animals do.
  • Ecosystems and food webs are shaped by lysine availability and by organisms (plants, fungi, bacteria) that can synthesize it.
  • Lysine is often limiting in plant-based diets (especially cereal grains), which helps explain why lysine supplementation or biofortification (e.g., high-lysine crops) has a major impact on nutrition and health.

Overall, the universality of lysine as an essential amino acid in animals means that lysine dependence is the normal state of animal metabolism: a real nutritional constraint rather than a trivial dietary detail, but not an effective safeguard by itself.


[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?
[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:
https://arpa-h.gov/explore-funding/programs/boss
https://www.darpa.mil/research/programs/smart-rbc
https://www.darpa.mil/research/programs/go


Resources

  1. Nguyen, P. Q., Soenksen, L. R., Donghia, N. M., Angenent-Mari, N. M., de Puig, H., Huang, A., Lee, R. A., Slomovic, S., Galbersanini, T., Lansberry, G., Sallum, H. M., Zhao, E. M., Niemi, J. B. & Collins, J. J. (2021). Wearable materials with embedded synthetic biology sensors for biomolecule detection. Nature Biotechnology, 39(11). https://doi.org/10.1038/s41587-021-00950-3
  2. Karim, M. M. & Lasker, T. (2025). Electrochemical Biosensors for Cancer Biomarker Detection: Basic Concept, Design Strategy and Cutting‐Edge Development. Electrochemical Science Advances. https://doi.org/10.1002/elsa.70007
  3. Liang, Q., Lu, Y. & Zhang, Q. (2022). Hydrogels‐Based Electronic Devices for Biosensing Applications. In Smart Stimuli-Responsive Polymers, Films, and Gels. https://doi.org/10.1002/9783527832385.ch10
  4. Zhang, M., Xu, T., Liu, K., Zhu, L., Miao, C., Chen, T., Gao, M., Wang, J. & Si, C. (2024). Modulation and Mechanisms of Cellulose‐Based Hydrogels for Flexible Sensors. SusMat, 5. https://doi.org/10.1002/sus2.255
  5. Ahmed, S. N. (2015). Physics and Engineering of Radiation Detection. Choice Reviews Online. https://doi.org/10.1016/C2013-0-15270-1
  6. Ng, K., Ung, N. & Hill, R. (2022). Problems and Solutions in Medical Physics: Radiotherapy Physics. CRC Press. https://doi.org/10.1201/9780429159466
  7. Abaza, A. M. H. (2017). New Trend in Radiation Dosimeters. American Journal of Modern Physics, 7(1), 21-30. https://doi.org/10.11648/j.ajmp.20180701.13
  8. Bartoloni, A. & Strigari, L. (2025). Space Radiobiology: Synergies between Astroparticle and Medical Physics. World Scientific.
  9. Zhang, J., Liu, J., Qiao, L., Zhang, Q., Hu, J. & Zhang, C. (2024). Recent Advance in Single-Molecule Fluorescent Biosensors for Tumor Biomarker Detection. Biosensors, 14(11). https://doi.org/10.3390/bios14110540
  10. Cao, X., Lv, D., Zhang, L. & Xing, Z. (2020). Adaptive Governance, Loose Coupling, Forward-Looking Strategies and Responsible Innovation (September 2020). https://doi.org/10.1109/ACCESS.2020.3046095
  11. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/

HTGAA - Week 2: DNA read, write and edit



My Homework

WEEK 2 - CUT, COPY AND STITCH

This week explores the read–write–edit toolkit: sequencing and synthesis workflows, restriction digests and gel electrophoresis, and early genome-editing frameworks.

Lecture (Tues, Feb 10)

DNA Read, Write, & Edit
George Church [slides]
Joe Jacobson [slides]
Emily Leproust [slides]

(The recording will be posted here when available)

Recitation (Wed, Feb 11)

DNA Gel, restriction enzymes, Benchling intro, Twist intro
Ice Kiattisewee

(The recording and slides will be posted here when available)


Documentation

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Instructions

Part 0: Basics of Gel Electrophoresis

Attend or watch all lecture and recitation videos. Optionally watch bootcamp.

Electrophoresis chamber setup (left) and injected gel matrix (right).

Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details. Overview:

  • Make a free account at benchling.com
  • Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes:
    • EcoRI
    • HindIII
    • BamHI
    • KpnI
    • EcoRV
    • SacI
    • SalI
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
  • You might find Ronan’s website a helpful tool for quickly iterating on designs!
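Before opening Benchling, the digest simulation itself can be sketched in a few lines. This is a minimal sketch for a linear molecule: the recognition sites and top-strand cut offsets are the commonly listed ones for these seven enzymes, the function names are hypothetical, and the toy sequence stands in for Lambda DNA, which is not included here.

```python
# Recognition site and cut offset on the top strand (number of bases
# of the site that stay on the left fragment).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def cut_positions(sequence: str, enzyme: str) -> list[int]:
    """Return top-strand cut positions for one enzyme.

    All seven sites are palindromic, so scanning one strand is enough.
    """
    site, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    positions = []
    index = sequence.find(site)
    while index != -1:
        positions.append(index + offset)
        index = sequence.find(site, index + 1)
    return positions


def digest(sequence: str, enzymes: list[str]) -> list[int]:
    """Return fragment lengths (largest first) for a linear molecule."""
    cuts = sorted({p for name in enzymes for p in cut_positions(sequence, name)})
    bounds = [0, *cuts, len(sequence)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Toy sequence with one EcoRI and one BamHI site (not Lambda DNA).
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GGATCC" + "T" * 5
print("EcoRI:", digest(toy, ["EcoRI"]))
print("EcoRI + BamHI:", digest(toy, ["EcoRI", "BamHI"]))
```

Each lane of a gel design corresponds to one call to `digest` with a different enzyme combination; the fragment lengths are what determine the band positions.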

Ronan Donovan’s electrophoresis simulation

Attempt to generate a Jellyfish design!



Benchling simulation

Generating a gel design with Benchling enzymes on site





Final download:

Yes well, we tried that jellyfish (hope you can see it too!), mad respect to Paul Vanouse.


Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Optional (for those with Lab access)

Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.


Part 3: DNA Design Challenge

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

[Example from our group homework, you may notice the particular format; the example below came from UniProt]

>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL
EAVIRTVTTLQQLLT


Chosen protein: α-Bungarotoxin

I chose α-bungarotoxin, a neurotoxin found in the venom of the snake Bungarus multicinctus.

Bungarus multicinctus - Many-banded krait (left) and α-bungarotoxin protein structure from UniProt (right).

Why?

  • It is an extremely potent toxin.
  • It binds almost irreversibly to nicotinic acetylcholine receptors.
  • It causes paralysis by blocking neuromuscular transmission.
  • It is widely used in neurobiological research.
  • It is relatively small (95 amino acids), which makes it manageable for sequence analysis.





Protein sequence (UniProt-style format)

P60615|3L21A_BUNMU Alpha-bungarotoxin

>sp|P60615|3L21A_BUNMU Alpha-bungarotoxin OS=Bungarus multicinctus OX=8616 PE=1 SV=1
MKTLLLTLVVVTIVCLDLGYTIVCHTTATSPISAVTCPPGENLCYRKMWCDAFCSSRGKV
VELGCAATCPSKKPYEEVTCCSTDKCNPHPKQRPG

(95 amino acids; cysteine-rich protein with multiple disulfide bonds)


3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

[Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]

Lysis protein DNA sequence
atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa


As discussed in class, due to the degeneracy of the genetic code, multiple codons can encode the same amino acid. Therefore, reverse translation from a protein sequence to a DNA sequence is not unique.

Reverse translation was performed using the Bioinformatics reverse translation tool. Due to codon degeneracy, the resulting DNA sequence represents one possible coding sequence corresponding to the selected protein.


Complete pdf of the reverse translation: [pdf]

A possible nucleotide sequence corresponding to the selected protein is shown below:

Reverse translation of Untitled to a 285 base sequence of most likely codons

atgaaaaccctgctgctgaccctggtggtggtgaccattgtgtgcctggatctgggctataccattgtgtgccataccaccgcgaccagcccgattagcgcggtgacctgcccgccgggcgaaaacctgtgctatcgcaaaatgtggtgcgatgcgttttgcagcagccgcggcaaagtggtggaactgggctgcgcggcgacctgcccgagcaaaaaaccgtatgaagaagtgacctgctgcagcaccgataaatgcaacccgcatccgaaacagcgcccgggc
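The reverse translation step can be reproduced with a simple lookup table. This is a minimal sketch, not the tool's actual implementation: the one-codon-per-amino-acid table below was read off from the output shown above, and the function name is hypothetical.

```python
# One codon per amino acid, taken from the tool output shown above.
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}

# Alpha-bungarotoxin sequence (UniProt P60615), as listed in section 3.1.
PROTEIN = (
    "MKTLLLTLVVVTIVCLDLGYTIVCHTTATSPISAVTCPPGENLCYRKMWCDAFCSSRGKV"
    "VELGCAATCPSKKPYEEVTCCSTDKCNPHPKQRPG"
)


def reverse_translate(protein: str) -> str:
    """Map each amino acid to a single chosen codon."""
    return "".join(MOST_LIKELY_CODON[residue] for residue in protein.upper())


dna = reverse_translate(PROTEIN)
print(len(dna), "bases")
print(dna[:30] + "...")
```

Because each amino acid is mapped to a single fixed codon, the result is only one of the many coding sequences that would translate to the same protein.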


3.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

[Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]

Lysis protein DNA sequence with Codon-Optimization
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA


Why is codon optimization necessary?

Although the genetic code is degenerate, meaning that multiple codons can encode the same amino acid, organisms do not use synonymous codons with equal frequency. Each organism has a preferred codon usage bias that reflects the abundance of its tRNAs.

If a gene containing rare codons is introduced into a heterologous host organism, several issues may arise:

  • Reduced translation efficiency
  • Lower protein yield
  • Increased risk of ribosome stalling
  • Potential misfolding due to slowed or irregular translation kinetics

Therefore, codon optimization is performed to adapt the coding sequence to the codon usage preferences of the chosen host organism, improving translation efficiency and overall protein production.

Selected organism for optimization

The coding sequence was optimized for expression in: Escherichia coli

Why E. coli?

  • It is the most widely used bacterial expression system.
  • It is cost-effective, fast-growing, and easy to genetically manipulate.
  • It is ideal for recombinant protein production.

Although α-bungarotoxin is a cysteine-rich protein containing multiple disulfide bonds, specialized strains (e.g., oxidative cytoplasm strains) or periplasmic targeting strategies can facilitate proper folding.

Codon-optimized sequence for E. coli

Codon optimization was performed using the Expression Optimization (Pilot) algorithm provided by Integrated DNA Technologies. The amino acid sequence was optimized for expression in Escherichia coli while avoiding BsaI, BsmBI and BbsI restriction sites.



The resulting optimized coding sequence (285 bp) is:

ATG AAA ACG TTA CTG CTG ACG CTG GTG GTG GTC ACC ATT GTT TGC CTG GAT CTG GGC TAC ACC ATT GTT TGC CAC ACC ACC GCG ACC TCA CCG ATC TCT GCG GTC ACC TGC CCG CCG GGT GAA AAT CTG TGC TAT CGC AAA ATG TGG TGC GAT GCG TTC TGC AGC AGC CGC GGC AAA GTG GTG GAG CTG GGC TGC GCG GCG ACC TGC CCG AGC AAA AAA CCG TAT GAA GAA GTG ACC TGC TGC AGC ACC GAC AAA TGC AAC CCG CAT CCG AAA CAG CGC CCG GGC

Sequence analysis indicated low complexity (score 2.1), suggesting no anticipated synthesis issues. Internal restriction sites unrelated to the cloning strategy were detected but do not interfere with the intended design.
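The restriction-site constraint can be double-checked independently of the vendor tool. This is a minimal sketch that scans both strands for the BsaI, BsmBI and BbsI recognition sequences; the function names are hypothetical, and the sequence is copied from the optimized result above.

```python
# Type IIS recognition sequences (top strand, 5' to 3').
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}

OPTIMIZED = (
    "ATG AAA ACG TTA CTG CTG ACG CTG GTG GTG GTC ACC ATT GTT TGC CTG GAT CTG GGC "
    "TAC ACC ATT GTT TGC CAC ACC ACC GCG ACC TCA CCG ATC TCT GCG GTC ACC TGC CCG "
    "CCG GGT GAA AAT CTG TGC TAT CGC AAA ATG TGG TGC GAT GCG TTC TGC AGC AGC CGC "
    "GGC AAA GTG GTG GAG CTG GGC TGC GCG GCG ACC TGC CCG AGC AAA AAA CCG TAT GAA "
    "GAA GTG ACC TGC TGC AGC ACC GAC AAA TGC AAC CCG CAT CCG AAA CAG CGC CCG GGC"
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, sites: dict[str, str] = TYPE_IIS_SITES) -> list[tuple[str, str, int]]:
    """Return (enzyme, strand, 0-based index) for every site on either strand."""
    seq = seq.replace(" ", "").upper()
    hits = []
    for name, site in sites.items():
        for strand, pattern in (("+", site), ("-", reverse_complement(site))):
            index = seq.find(pattern)
            while index != -1:
                hits.append((name, strand, index))
                index = seq.find(pattern, index + 1)
    return hits


print("Type IIS sites found:", find_sites(OPTIMIZED))
```

An empty list means none of the three enzymes would cut inside the coding sequence, which matters if the part is later used in Golden Gate style assembly.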


3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.


Once the codon-optimized DNA sequence has been obtained, several biotechnological strategies can be used to produce the corresponding protein. These methods rely on the fundamental biological processes of transcription and translation. Protein production can be achieved using either cell-dependent systems or cell-free expression systems.



3.5. [Optional] How does it work in nature/biological systems?

  1. Describe how a single gene codes for multiple proteins at the transcriptional level.
  2. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.

In natural biological systems, a single gene can give rise to multiple protein products. Although the classical view suggests that one gene encodes one protein, several regulatory mechanisms allow diversification at the transcriptional and post-transcriptional levels.

Mechanisms that allow one gene to produce multiple proteins

1. Alternative splicing

In eukaryotic organisms, genes contain exons and introns. During RNA processing, introns are removed and exons are joined together. However, different combinations of exons can be assembled, producing distinct mRNA variants from the same gene.

This process, known as alternative splicing, results in different protein isoforms with potentially different functions.

2. Alternative promoters

A single gene may contain multiple promoter regions. Depending on which promoter is activated, transcription may begin at different start sites, generating mRNAs with different 5′ ends. This can influence translation efficiency or alter the protein sequence.

3. Alternative translation initiation sites

Some mRNAs contain more than one possible start codon (AUG). Ribosomes may initiate translation at different positions, leading to proteins of different lengths.

4. RNA editing

In certain organisms, specific nucleotides in the RNA sequence are chemically modified after transcription. This can change codons and therefore alter the amino acid sequence of the final protein.

Alignment example: DNA → RNA → Protein

Using our optimized coding sequence as an example:

DNA (coding strand)

5′- ATG AAA ACG TTA CTG CTG -3′

Transcribed mRNA

(Thymine is replaced by uracil)

5′- AUG AAA ACG UUA CUG CUG -3′

Translated protein

Met – Lys – Thr – Leu – Leu – Leu

This alignment illustrates the central dogma of molecular biology:

DNA → RNA → Protein
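The same alignment can be reproduced programmatically. This is a minimal sketch using the standard genetic code; the function names are hypothetical, and the input is the first six codons of the optimized sequence.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with U instead of T."""
    return coding_strand.replace(" ", "").upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATG AAA ACG TTA CTG CTG"
mrna = transcribe(dna)
print("DNA:    ", dna)
print("mRNA:   ", " ".join(mrna[i:i + 3] for i in range(0, len(mrna), 3)))
print("Protein:", " - ".join(translate(mrna)))
```

The output uses one-letter amino acid codes (M, K, T, L, L, L), which correspond to Met, Lys, Thr, Leu, Leu, Leu in the alignment above.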

In prokaryotes such as Escherichia coli, transcription and translation are coupled and occur simultaneously in the cytoplasm. In contrast, in eukaryotic cells, transcription occurs in the nucleus and translation occurs in the cytoplasm after RNA processing.


Part 4: Preparing a Twist DNA Synthesis Order

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

SECTION A. BENCHLING

This is a practice exercise, not necessarily the real Twist order!

4.1. Creating a Twist account, and Benchling account

4.2. Building a DNA Insert Sequence

We’ll make a sequence that will allow E. coli to glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein):

  1. In Benchling, we select New DNA/RNA sequence
  2. Now name the insert sequence and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).
  3. We go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with the codon optimized DNA sequence of interest). Each time we add a new piece of the sequence, we make sure to annotate by right clicking over the sequence and creating an annotation that describes what each piece (e.g., Promoter, RBS, etc.) is (see image below).
WHAT YOU SHOULD PINPOINT

Promoter (e.g. BBa_J23106)
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (e.g. BBa_B0034 with spacers for optimal expression)
CATTAAAGAGGAGAAAGGTACC

Start Codon
ATG

Coding Sequence (your codon optimized DNA for a protein of interest, sfGFP for example)
AGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli)
CATCACCATCACCATCATCAC

Stop Codon
TAA

Terminator (e.g. BBa_B0015)
CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

  4. Once this is completed, we click on Linear Map to preview the entire sequence.

Note: This is not required for this exercise, but to share the design with others, ensure that link sharing is turned on!







The insert sequence that was built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). We now download the FASTA file for the sequence made.
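The paste-and-annotate procedure amounts to concatenating the parts in order while recording where each one starts and ends. The following is a minimal sketch of that bookkeeping, not a Benchling feature: the function names are hypothetical, and the coding sequence is a deliberately shortened stub (the first 18 nt of the sfGFP example) to keep the listing short.

```python
PARTS = [
    ("Promoter (BBa_J23106)", "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"),
    ("RBS (BBa_B0034 + spacers)", "CATTAAAGAGGAGAAAGGTACC"),
    ("Start codon", "ATG"),
    # Stub only: replace with the full codon-optimized coding sequence.
    ("Coding sequence (stub)", "AGCAAAGGAGAAGAACTT"),
    ("7x His tag", "CATCACCATCACCATCATCAC"),
    ("Stop codon", "TAA"),
    ("Terminator (BBa_B0015)",
     "CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGG"
     "TGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA"),
]


def build_cassette(parts):
    """Concatenate parts and record 1-based (name, start, end) annotations."""
    sequence = ""
    annotations = []
    for name, part in parts:
        start = len(sequence) + 1
        sequence += part
        annotations.append((name, start, len(sequence)))
    return sequence, annotations


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as FASTA text with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


cassette, annotations = build_cassette(PARTS)
for name, start, end in annotations:
    print(f"{start:>4}-{end:<4} {name}")
print(to_fasta("expression_cassette", cassette))
```

One detail worth checking when swapping in a protein of interest: the part list already contains a separate start codon, so a coding sequence that itself begins with ATG (as the optimized α-bungarotoxin sequence above does) would gain a second methionine unless its leading ATG is removed.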




It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey the design. Here’s an example of what we just annotated in Benchling:


SECTION B. TWIST


Part 5: DNA Read/Write/Edit

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

For my project — Reversible Cell-Free Biosensor for ROS-Mediated Radiation Damage — we would want to sequence three categories of DNA:

1. The ROS-responsive promoter region
- Specifically, oxidative stress–responsive regulatory elements (e.g., OxyR/SoxR-regulated promoters from E. coli).
Why?
- To verify the exact sequence integrity of the promoter controlling our reporter.
- Small mutations in regulatory regions can drastically alter activation threshold, leakiness, or response dynamics.
- Since our system depends on reversible, tunable activation (not binary irreversible switching), promoter fidelity is critical for predictable behavior.
2. The full genetic construct used in the TX–TL system
This includes:
- Promoter
- RBS
- Reporter gene (e.g., GFP variant)
- Degron tag
- Terminator
Why?
- To confirm assembly correctness after cloning or synthesis.
- To ensure no frameshifts, truncations, or rearrangements occurred.
- To validate that the degron sequence is intact (since reversibility depends on controlled protein degradation).
3. DNA stability after ROS exposure (damage assessment)
- Because the biosensor operates in oxidative environments, we may also sequence recovered plasmid DNA after repeated ROS cycles.
Why?
- To assess oxidative damage accumulation.
- To evaluate mutation rates under stress.
- To determine long-term reusability limits of the system.

This directly connects to governance and safety: understanding failure modes prevents misleading signal interpretation.


(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Also answer the following questions:

  1. Is your method first-, second- or third-generation or other? How so?
  2. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
  3. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
  4. What is the output of your chosen sequencing technology?

For this project, I would use a combination of Sanger sequencing and Illumina sequencing, depending on the question being asked.

1. Sanger


Sanger functioning basics. By Merck©.

2. Illumina


Next-generation sequencing (NGS) vs Sanger sequencing. By Illumina©.

Final Justification for Technology Choice

For the reversible ROS biosensor:

  1. Sanger sequencing is sufficient and ideal for construct validation.
  2. Illumina sequencing becomes valuable when studying oxidative mutation accumulation and long-term robustness.

This sequencing strategy directly supports:

  • Reliability
  • Reversibility characterization
  • Governance considerations
  • Failure-mode understanding
  • Safe system deployment

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

For the reversible ROS-mediated hydrogel biosensor, we would synthesize a minimal genetic circuit designed for oxidative stress detection and transient fluorescent output.

The construct would include:

  • A ROS-responsive promoter (e.g., OxyR-regulated promoter)
  • A ribosome binding site (RBS)
  • A fluorescent reporter gene (e.g., sfGFP)
  • A short degron tag to ensure rapid protein degradation
  • A transcriptional terminator

Why synthesize this DNA?

Because:

  • The promoter must be precisely tuned to oxidative stress.
  • The degron must be fused correctly to ensure reversibility.
  • The full construct must function in a cell-free TX–TL system.
  • Synthetic DNA reduces cloning errors.
  • It enables modular optimization.

We are not synthesizing a whole genome.
We are synthesizing a minimal functional sensing circuit embedded in a biomaterial.


(ii) What technology or technologies would you use to perform this DNA synthesis and why?

Also answer the following questions:

  1. What are the essential steps of your chosen synthesis methods?
  2. What are the limitations of your synthesis method (if any) in terms of speed, accuracy, scalability?

We would use commercial gene synthesis services, such as those provided by:

Twist Bioscience

These companies use high-throughput DNA synthesis platforms based on phosphoramidite chemistry and silicon-based parallel synthesis.

Simplified process:

  • Attach first base to solid surface.
  • Add chemically protected nucleotide.
  • Remove protective group.
  • Add next nucleotide.
  • Repeat cycle.

Each cycle adds ONE base. This is automated.

For longer fragments:

  • Short oligos are synthesized.
  • Then assembled enzymatically into longer genes.
  • Verified by sequencing.

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

For this biosensor, I would edit the genetic circuit itself to optimize sensing dynamics, reversibility, and robustness.

Specifically, we would edit:


(ii) What technology or technologies would you use to perform these DNA edits and why?

Also answer the following questions:

  1. How does your technology of choice edit DNA? What are the essential steps?
  2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
  3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The most appropriate technology for precise edits in genetic constructs would be:

CRISPR-based editing systems.

Specifically: CRISPR-Cas9

Limitations of DNA Editing Methods

1. Off-target effects (CRISPR)

  • Cas9 can cut unintended regions.
  • Less relevant for small plasmids, more relevant for genomes.

2. Efficiency variability

  • Not all cells incorporate edits.
  • Requires screening.

3. Repair pathway dependence

  • Precise edits require homologous recombination.
  • Not always efficient.

4. Context sensitivity

  • Changing one base can unpredictably alter promoter behavior.
  • Requires iterative testing.

For this project, DNA editing is not strictly required for initial system implementation. However, it would be essential for iterative optimization of promoter sensitivity, degradation kinetics, and response tuning.


Resources

  1. Secuenciación Sanger Pasos y método. (n.d.). Merck©. https://www.sigmaaldrich.com/MX/es/technical-documents/protocol/genomics/sequencing/sanger-sequencing
  2. Differences between NGS and Sanger sequencing. (n.d.). Illumina©. https://www.illumina.com/science/technology/next-generation-sequencing/beginners/advantages/ngs-vs-sanger.html
  3. A Simple Guide to Phosphoramidite Chemistry and How it Fits in Twist Bioscience’s Commercial Engine. Twist Bioscience. https://www.twistbioscience.com/blog/science/simple-guide-phosphoramidite-chemistry-and-how-it-fits-twist-biosciences-commercial
  4. CRISPR: ¿Qué es y cómo funciona? (n.d.). genotipia. https://genotipia.com/crispr-cas/
  5. OpenAI. (2026). ChatGPT (GPT-5.2) [Large language model]. https://chat.openai.com/

HTGAA - Week 3: Lab Automatisation



My Homework

WEEK 3 - CODING AND AUTOMATISATION

This week we get hands-on (or at least code-on) with pipetting robots.


Instructions

1. Assignment: Python Script for Opentrons Artwork

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Your task this week is to create a Python file to run on an Opentrons liquid handling robot.

  1. Review this week’s recitation and this week’s lab for details on the Opentrons and programming it.
  2. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com.

The original idea was to create a piece based on Gothic architecture featuring a stained glass rose window.

The inspo vs the reality.

However, the results were closer to a Mario Bros castle and I didn’t quite like it, so I made a second attempt with two options: one for my Gothic rose window greed and a simpler one with a Snoopy design, keeping in mind the time required to create it on the Opentrons machine.

The first idea vs the final idea

Rose window (left), full final design (center) and simplified final design (right).

The link for the final published design on the GUI site is this: Click here


  1. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons.
    • You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept.
    • If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead.

For the Python code in Google Colab:

I tried to write the Python file myself rather than using the Python download from Ronan’s site, and I encountered a few issues while coding.

HTGAA Opentrons Setup Code Analysis

1. Environment Setup

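import os
import sys

# Version-specific directory that will hold Opentrons and its dependencies
py = f"{sys.version_info.major}.{sys.version_info.minor}"
PKG = f"/content/venv/lib/python{py}/site-packages"
os.makedirs(PKG, exist_ok=True)
# Search this directory first when importing packages
if PKG not in sys.path:
    sys.path.insert(0, PKG)
# Tell pip to install there and to ignore the user site-packages
os.environ["PIP_TARGET"] = PKG
os.environ["PYTHONNOUSERSITE"] = "1"

# Colab/IPython magic: install Opentrons into the directory above
%pip install -q --upgrade --target "$PKG" opentrons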

Explanation:

  • Google Colab comes with a newer numpy version that is incompatible with Opentrons.
  • To avoid restarting the runtime repeatedly, we create a venv-like environment where Opentrons and its compatible dependencies are installed.
  • This ensures the rest of the protocol works without conflicts.

plt.rcParams["figure.figsize"] = (10,10) sets the default figure size for visualizations of the Petri dish and droplets.


2. Petri Dish Constants

`PETRI_INNER_DIAMETER = 84`
`MAX_DRAW_RADIUS = PETRI_INNER_DIAMETER/2 - 2`

Explanation:

  • Defines the Petri dish size (in mm) to simulate a real 90 mm plate.
  • MAX_DRAW_RADIUS leaves a 2 mm margin to prevent dispensing outside the plate due to tip size or miscalibration.
  • After scaling (e.g., SCALE = 0.7), coordinates that originally reach ±36.3 mm shrink to at most ±25.41 mm, so every point stays safely under the 40 mm limit.

3. Helper Classes and Functions

nullLocation

nullLocation = types.Location(types.Point(x=250, y=250, z=250), None)

  • Placeholder for pipette location before dispensing anything.

same2DLocation(loc1, loc2): Compares x and y only, ignores z, to detect whether two points are essentially the same on the Petri dish.
mock_print(str): A silent print function used instead of standard print(), to avoid cluttering output logs during simulation.
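The 2D comparison can be sketched as follows. This is an illustration only, not the Colab's code: `Point` and `Location` here are simplified stand-ins for the Opentrons types so the example runs without the library, `same_2d_location` is a hypothetical name, and the small tolerance is my own assumption (the original helper may compare exactly).

```python
from typing import NamedTuple, Optional


class Point(NamedTuple):
    """Stand-in for a 3D point with x, y, z coordinates in mm."""
    x: float
    y: float
    z: float


class Location(NamedTuple):
    """Stand-in for a location: a point plus an optional labware reference."""
    point: Point
    labware: Optional[object]


def same_2d_location(loc1: Location, loc2: Location, tolerance: float = 1e-6) -> bool:
    """Compare x and y only; z (the height above the plate) is ignored."""
    return (abs(loc1.point.x - loc2.point.x) <= tolerance
            and abs(loc1.point.y - loc2.point.y) <= tolerance)


above = Location(Point(10.0, 5.0, 20.0), None)
on_agar = Location(Point(10.0, 5.0, 0.5), None)
elsewhere = Location(Point(12.0, 5.0, 0.5), None)
print(same_2d_location(above, on_agar))    # same spot, different height
print(same_2d_location(above, elsewhere))  # different spot
```

Ignoring z is what lets the simulator treat "hovering above a spot" and "touching the agar at that spot" as the same droplet position.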


4. Pipette Simulation Class (PipetteSim)

This is the heart of the setup, emulating an Opentrons pipette for aspirating, dispensing, and tracking droplets.

Key components:

self.droplets_x, self.droplets_y, self.droplets_size, self.droplets_color

  • Tracks droplet positions, sizes, and colors for visualization.

self.smears

  • Originally draws lines connecting sequential dispenses to simulate smearing/dragging of droplets.

Important: SMEAR Handling

# for xlist,ylist,color in self.smears:  
#     plt.gca().plot(xlist, ylist, color=color, linewidth=4, solid_capstyle='round')   
  • Commented out to remove unwanted lines in the visualization.
  • Concept: Each time the pipette moves after dispensing, the simulator connects the last droplet to the new location with a line.
  • We replaced it with plt.scatter() for droplets only, avoiding the “demonic laser beams of death” - ChatGPT, 2026.
Before and after: figures 1-3 (from the left) show the output with the self.smears code still active; figure 4 (far right) shows the output after commenting it out, which removes the smear lines.

5. Scaling and Coordinates

  • Coordinates for droplets (like electra2_points from the GUI site) originally go up to ±36.3 mm on each axis.
  • With SCALE = 0.7, all points safely fit inside the MAX_DRAW_RADIUS = 40 mm.

This prevents runtime errors like:

ValueError: Dispensing outside "safe" area: Point (-25.3, 36.3) is more than 40.0mm away

Math used: simple multiplication for scaling each (x, y) coordinate

scaled_x = original_x * SCALE
scaled_y = original_y * SCALE

Mathematical calculations for coordinate scaling

We scale each coordinate $(x, y)$ to fit inside the safe dispensing area using a scale factor:

$$ x_{\text{scaled}} = x_{\text{original}} \cdot SCALE $$ $$ y_{\text{scaled}} = y_{\text{original}} \cdot SCALE $$

Example:

  • Original point:
$(-36.3, 25.3)$ mm
  • Scale factor:
$SCALE = 0.7$ $$ x_{\text{scaled}} = -36.3 \cdot 0.7 = -25.41 \text{ mm} $$ $$ y_{\text{scaled}} = 25.3 \cdot 0.7 = 17.71 \text{ mm} $$

Radius Check

The simulator only allows dispensing inside the safe radius:

$$ R_{\text{max}} = \frac{\text{PETRI INNER DIAMETER}}{2} - 2 = 40 \text{ mm} $$ A point $(x_{\text{scaled}}, y_{\text{scaled}})$ is valid if: $$ \sqrt{x_{\text{scaled}}^2 + y_{\text{scaled}}^2} \leq R_{\text{max}} $$

Example:

$$ \sqrt{(-25.41)^2 + (17.71)^2} = \sqrt{645.67 + 313.64} = \sqrt{959.31} \approx 30.97 \text{ mm} < 40 \text{ mm} $$

Safe to dispense.
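
The scaling and the radius check above can be sketched in a few lines. The function names `scale_point` and `is_safe` are hypothetical; only SCALE, the 40 mm limit and the example point come from the text.

```python
import math

SCALE = 0.7
MAX_DRAW_RADIUS = 40.0  # mm

def scale_point(x, y, scale=SCALE):
    """Multiply both coordinates by the scale factor."""
    return (x * scale, y * scale)

def is_safe(x, y, max_radius=MAX_DRAW_RADIUS):
    """A point is valid when its distance from the center is within the safe radius."""
    return math.hypot(x, y) <= max_radius

original = (-36.3, 25.3)
scaled = scale_point(*original)

print(is_safe(*original))  # False: about 44.25 mm from the center
print(is_safe(*scaled))    # True: about 30.97 mm from the center
```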


Droplet Size Mapping

Droplet volume $V$ in μL is mapped to a visual size $S$ for plotting:

$$ S = V \cdot K $$ Where $K = 100$ in our code.

Example:

$$ V = 1 \mu L \implies S = 1 \cdot 100 = 100 \text{ (scatter marker size)} $$

Summary Formula for Visualization

For each original coordinate $(x, y)$ and droplet volume $V$: $$ \begin{cases} x_{\text{scaled}} = x \cdot SCALE \\ y_{\text{scaled}} = y \cdot SCALE \\ S = V \cdot 100 \\ \text{Check: } \sqrt{x_{\text{scaled}}^2 + y_{\text{scaled}}^2} \leq 40 \end{cases} $$

Example Table

| Original $(x,y)$ | Scaled $(x,y)$ | Volume $(\mu L)$ | Size $S$ |
|---|---|---|---|
| (-36.3, 25.3) | (-25.41, 17.71) | 1 | 100 |
| (29.7, -16.5) | (20.79, -11.55) | 2 | 200 |
| (-12.1, -36.3) | (-8.47, -25.41) | 0.5 | 50 |
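
As a quick check, the table rows can be regenerated from the summary formula. This is a sketch: `plot_entry` and `SIZE_PER_UL` are illustrative names, and rounding to two decimals is only for display.

```python
SCALE = 0.7
SIZE_PER_UL = 100  # K in S = V * K

# (x, y, volume in uL) taken from the example table.
rows = [(-36.3, 25.3, 1), (29.7, -16.5, 2), (-12.1, -36.3, 0.5)]

def plot_entry(x, y, volume):
    """Return the scaled point (rounded to 2 decimals) and the scatter marker size."""
    return (round(x * SCALE, 2), round(y * SCALE, 2), volume * SIZE_PER_UL)

for x, y, v in rows:
    print((x, y), "->", plot_entry(x, y, v))
```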

AI really helped me work through these calculations neatly and quickly, so they could be implemented directly in the Python code.


6. Pipette Operations

Dispense

self.droplets_x.append(location.point.x)  
self.droplets_y.append(location.point.y)  
self.droplets_size.append(volume * 100)  
self.droplets_color.append(color)      
  • Maps volume → size of droplet visually (unprincipled scaling, but works for display).
  • Updates self.totalDispensed to track volumes per color.

Aspirate

  • Checks for tip presence, maximum volume, and cross-contamination.
  • Updates self.totalAspirated.

Pick Up & Drop Tip

  • Ensures the pipette is always aware of whether it holds a tip, preventing accidental dispensing or aspirating without one.
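
To make the checks above concrete, here is a stripped-down sketch of the same bookkeeping. `TinyPipetteSim` is a hypothetical class, not the notebook's PipetteSim: the 20 µL capacity, the error messages and the attribute names are assumptions. Only the `volume * 100` marker size comes from the code shown above.

```python
class TinyPipetteSim:
    """Hypothetical, minimal stand-in for the PipetteSim described above."""

    def __init__(self, max_volume=20.0):
        self.max_volume = max_volume   # assumed capacity in uL
        self.has_tip = False
        self.current_volume = 0.0
        self.current_color = None
        self.droplets = []             # (x, y, marker_size, color)

    def pick_up_tip(self):
        if self.has_tip:
            raise RuntimeError("Already holding a tip")
        self.has_tip = True

    def drop_tip(self):
        if not self.has_tip:
            raise RuntimeError("No tip to drop")
        self.has_tip = False
        self.current_volume = 0.0
        self.current_color = None

    def aspirate(self, volume, color):
        if not self.has_tip:
            raise RuntimeError("Cannot aspirate without a tip")
        if self.current_color not in (None, color):
            raise RuntimeError("Cross-contamination: tip already holds another color")
        if self.current_volume + volume > self.max_volume:
            raise RuntimeError("Aspirating beyond the pipette capacity")
        self.current_volume += volume
        self.current_color = color

    def dispense(self, volume, x, y):
        if not self.has_tip:
            raise RuntimeError("Cannot dispense without a tip")
        if volume > self.current_volume:
            raise RuntimeError("Dispensing more than was aspirated")
        self.current_volume -= volume
        # Same volume-to-marker-size mapping as in the dispense code above.
        self.droplets.append((x, y, volume * 100, self.current_color))


sim = TinyPipetteSim()
sim.pick_up_tip()
sim.aspirate(5, "pink")
sim.dispense(1, -25.41, 17.71)
sim.drop_tip()
print(sim.droplets)
```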

7. Petri Dish Mapping (petriLocOfWell)

x=(x-ord('D')) * MAX_DRAW_RADIUS/4  
y=(y-6) * MAX_DRAW_RADIUS/6  
  • Converts well IDs (A1-H12) into (x, y) coordinates on the Petri dish.
  • ord('D') and y-6 center the mapping around the dish.
  • Ensures droplets are placed accurately relative to the plate center.
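
A self-contained sketch of this mapping is shown below. It assumes that the well letter is converted with ord() before the subtraction and that the number is parsed as an integer, since the two quoted lines do not show those steps. The function name `petri_loc_of_well` is a hypothetical snake_case stand-in for petriLocOfWell.

```python
MAX_DRAW_RADIUS = 40.0  # mm

def petri_loc_of_well(well_id, max_radius=MAX_DRAW_RADIUS):
    """Map a well ID such as 'A1' to (x, y) in mm, centered on well D6."""
    letter = well_id[0].upper()
    number = int(well_id[1:])
    x = (ord(letter) - ord("D")) * max_radius / 4   # letter sets the x offset
    y = (number - 6) * max_radius / 6               # number sets the y offset
    return (x, y)

print(petri_loc_of_well("D6"))  # (0.0, 0.0): the dish center
print(petri_loc_of_well("A1"))
```

With these step sizes, wells far from D6 can map to points outside the 40 mm safe radius, so the radius check still applies after the mapping.
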
Visual Representation of the Petri Dish and Safe Dispensing Area

We define:

  • Petri dish radius:
$R_\text{petri} = 42\text{ mm}$ (inner diameter 84 mm)
  • Safe dispensing radius: $R_\text{max} = 40\text{ mm}$
  • Sample points scaled by $SCALE = 0.7$
\[
\begin{tikzpicture}[scale=0.15]
  % Petri dish (outer circle)
  \draw[thick, fill=black!10] (0,0) circle (42);
  % Safe radius
  \draw[dashed, thick, color=red] (0,0) circle (40);
  % Axes
  \draw[->] (-45,0) -- (45,0) node[right] {$x$ (mm)};
  \draw[->] (0,-45) -- (0,45) node[above] {$y$ (mm)};
  % Sample points (scaled examples)
  \filldraw[fill=green!70!black, draw=black] (-25.41,17.71) circle (1.5) node[above right] {Point A};
  \filldraw[fill=blue!70!black, draw=black] (20.79,-11.55) circle (1.5) node[below right] {Point B};
  \filldraw[fill=orange, draw=black] (-8.47,-25.41) circle (1.5) node[below left] {Point C};
  % Labels
  \node at (0,-44) {Petri dish center (0,0)};
  \node[color=red] at (30,30) {Safe dispensing area};
\end{tikzpicture}
\]

Legend:

  • Black circle: Petri dish inner wall (radius 42 mm)
  • Red dashed circle: Maximum safe dispensing radius
  • Colored dots: Scaled droplet coordinates
  • Axes: $x$ and $y$ in mm

How it works:

  • The \draw commands make the dish and safe area.
  • The \filldraw commands place your points after scaling with SCALE = 0.7.
  • You can add more points by duplicating \filldraw[...] (x_scaled, y_scaled) ....

8. Visualization (visualize())

  • Draws the Petri dish with plt.Circle.
  • Displays droplets with plt.scatter.

Smears are commented out to prevent unwanted lines:

# for xlist,ylist,color in self.smears:  
#    plt.gca().plot(...)      
  • X and Y limits are set slightly beyond the dish to avoid clipping.

9. Color & Well Handling

Additionally, we discovered that in the simulator:

  • Well A1 gives pink, A2 blue, B1 purple, C1 green, and D1 yellow.
  • Columns beyond D may not exist in some mock labware.
  • This required careful mapping of colors to well IDs.
  • We also used the color mapping to differentiate bio-inks visually.

10. Optional Future Feature

  • A PNG → Opentrons coordinates converter could automate mapping any pixel art (Snoopy, logos, text) into pipette instructions (this part really makes your life easier!).
  • Could be useful for quickly generating complex designs. However, we still have to scale the coordinates.
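
A rough sketch of what such a converter could look like. To stay self-contained it reads a small text grid instead of a real PNG (loading an actual image would need an image library, which is not shown here). The function `pixels_to_points`, the "X" marker and the 2 mm pitch are all hypothetical choices.

```python
import math

MAX_DRAW_RADIUS = 40.0  # mm

def pixels_to_points(rows, pitch_mm=2.0, max_radius=MAX_DRAW_RADIUS):
    """Map 'X'-marked cells of a text grid to centered (x, y) points in mm."""
    height = len(rows)
    width = max(len(r) for r in rows)
    points = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell != "X":
                continue
            x = (c - (width - 1) / 2) * pitch_mm
            y = ((height - 1) / 2 - r) * pitch_mm  # flip so row 0 is at the top
            if math.hypot(x, y) > max_radius:
                raise ValueError(f"Point ({x}, {y}) is outside the safe area")
            points.append((x, y))
    return points

art = [
    "X.X",
    ".X.",
    "X.X",
]
print(pixels_to_points(art))
```

Here the pitch plays the role of SCALE: choosing it small enough keeps the whole drawing inside the safe radius, and the function raises an error otherwise.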

Summary of ChatGPT - AI Contributions

  • Analyzed and adapted the Opentrons mock environment to work in Colab with new numpy versions.
  • Applied scaling (SCALE = 0.7) to prevent MAX_DRAW_RADIUS errors.
  • Commented out smears to clean the visualization (plt.scatter() only).
  • Helped map real coordinates and colors into Opentrons wells for the simulator.
  • Explained the logic behind dispense, aspirate, tip handling, and visualization.
  • Suggested a PNG → coordinates converter for rapid design automation.

Now, for the code used

The colors specified by the Lifefabs Institute, London Node are blue, pink and purple, so two versions were made.

Link to the Google Colab Opentrons Python notebook: Click here


The final take

Final design in pink and purple (left) and second final design option in blue and pink (right).

4. If the Python component is proving too problematic even with AI and human assistance, download the full Python script from the GUI website and submit that:

Use the download icon pointed to by the red arrow in this diagram.


These are the Python files with the final design, downloaded directly from the GUI site:

5. If you use AI to help complete this homework or lab, document how you used AI and which models made contributions.

Did you use AI to help write your code? If so, what was your experience & which AI tool did you find most helpful?

Did I use AI? For sure! I used AI to help write and optimize my code. I primarily used ChatGPT, which was extremely helpful in reviewing my code, explaining tricky parts, and suggesting optimizations. I also tried Google Colab’s Gemini, but I found its responses less useful and not satisfactory for my needs, even when providing it with access to the code. ChatGPT really guided me step by step, helping me understand how to structure the Opentrons protocol correctly and troubleshoot potential issues, which made the process much smoother and more reliable.

That said, even with ChatGPT’s guidance, we encountered several issues that we were not able to fully resolve, so while it significantly helped improve and clarify the code, it didn’t solve every problem.


  1. Sign up for a robot time slot if you are at MIT/Harvard/Wellesley or at a Node offering Opentrons automation. The Python script you created will be run on the robot to produce your work of art!
    • At MIT/Harvard? Lab times are on Thursday Feb.19 between 10AM and 6PM.
    • At other Nodes? Please coordinate with your Node.
  2. Submit your Python file via this form.

2. Post-Lab Questions

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

For this week, we’d like for you to do the following:

  1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

The paper chosen was:

PlasmoTron: an open-source platform for automated culture of malaria parasites.

Theo Sanderson, Julian C. Rayner. (February 2018). PlasmoTron: an open-source platform for automated culture of malaria parasites. bioRxiv. https://doi.org/10.1101/241596

About this article:


Some other very interesting papers on this topic:

  1. Semi-automated Production of Cell-free Biosensors.

Dylan M. Brown, Daniel A. Phillips, David C. Garcia, et al. (October 2024). Semi-automated Production of Cell-free Biosensors. bioRxiv. https://doi.org/10.1101/2024.10.13.618078

  2. Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research.

Kan Hatakeyama-Sato, Toshihiko Nishida, Kenta Kitamura, Yoshitaka Ushiku, Koichi Takahashi, Yuta Nabae, Teruaki Hayakawa. (June 2025). Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research. arXiv:2506.12312 [cs.RO]. https://doi.org/10.48550/arXiv.2506.12312

  3. BOTany Methods: Accessible Automation for Plant Synthetic Biology. If you can get access.

Moni Qiande, Abigail Lin, Lianna Larson, Cătălin Voiniciuc. (2026). BOTany Methods: Accessible Automation for Plant Synthetic Biology, Plant Physiology. kiag066, https://doi.org/10.1093/plphys/kiag066


  1. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.

Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory.

Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell-free protein synthesis.

  1. Echo transfer biosensor constructs and any required cofactors into specified wells.
  2. Bravo stamp in CFPS reagent master mix into all wells of a 96-well / 384-well plate.
  3. Multiflo dispense the CFPS lysate to all wells to start protein expression.
  4. PlateLoc seal the plate.
  5. Inheco incubate the plate at 37°C while the biosensor proteins are synthesized.
  6. XPeel remove the seal.
  7. PHERAstar measure fluorescence to compare biosensor responses.

I decided to hold off on this section for the moment, since I might change my project this week!


3. Final Project Ideas

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

As explained in this week’s recitation, add 1-3 slides in your Node’s section of this slide deck with 3 ideas you have for an Individual Final Project. Be sure to put your name, city, and country on your slide!


The submitted project ideas are as follows:

Project N° 1: Dual-Signal Biosensor for Functional Radiation Dosimetry


Project N° 2: Living Sound-to-Color Interface Using Optogenetic Bacteria


Project N° 3: Engineered Microbial Sensor for Deep-Ocean Environments


Resources

HTGAA - Week 4: Protein Design Part I



My Homework


This week focuses on how sequence, structure, and energetics can be modeled and manipulated to create or optimize proteins with specified functions.

Lecture (Tues, Feb 10)

Protein Design Part I
(▶️Recording)
Thras Karydis, Jon Kaufman

Recitation (Wed, Feb 11)

Protein folding
(▶️Recording)
Allan Costa


Protein Design I

Objective:

  1. Learn basic concepts:
    • amino acid structure
    • 3D protein visualization
    • the variety of ML-based design tools
  2. Brainstorm as a group how to apply these tools to engineer a better bacteriophage (setting the stage for the final project).

Part A. Conceptual Questions

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
  3. Why are there only 20 natural amino acids?
  4. Can you make other non-natural amino acids? Design some new amino acids.
  5. Where did amino acids come from before enzymes that make them, and before life started?
  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
  7. Can you discover additional helices in proteins?
  8. Why are most molecular helices right-handed?
  9. Why do β-sheets tend to aggregate?
    • What is the driving force for β-sheet aggregation?
  10. Why do many amyloid diseases form β-sheets?
    • Can you use amyloid β-sheets as materials?
  11. Design a β-sheet motif that forms a well-ordered structure.

Part B: Protein Analysis and Visualization

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

  1. Briefly describe the protein you selected and why you selected it.
  2. Identify the amino acid sequence of your protein.
    • How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
    • How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.
    • Does your protein belong to any protein family?
  3. Identify the structure page of your protein in RCSB
    • When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)
    • Are there any other molecules in the solved structure apart from protein?
    • Does your protein belong to any structure classification family?
  4. Open the structure of your protein in any 3D molecule visualization software:
    • PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
    • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
    • Color the protein by secondary structure. Does it have more helices or sheets?
    • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
    • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Part C. Using ML-Based Protein Design Tools

Assignees for the following sections
MIT/Harvard students: Required
Committed Listeners: Required

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

  1. Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.
  2. Choose your favorite protein from the PDB.
  3. We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

Picture Source: Bordin, Nicola et al (2023). Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, Volume 48, Issue 4, 345 - 359


  1. Deep Mutational Scans
    1. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
    2. Can you explain any particular pattern? (choose a residue and a mutation that stands out)
    3. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
  2. Latent Space Analysis
    1. Use the provided sequence dataset to embed proteins in reduced dimensionality.
    2. Analyze the different formed neighborhoods: do they approximate similar proteins?
    3. Place your protein in the resulting map and explain its position and similarity to its neighbors.

C2. Protein Folding

Picture Source: Lin et al (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model.


Folding a protein

  1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
  2. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

C3. Protein Generation

Picture Source: 1. Post from Sergey Ovchinnikov 2. Roney, Ovchinnikov et al (2022). State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101


Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
  2. Input this sequence into ESMFold and compare the predicted structure to your original.

Part D. Group Brainstorm on Bacteriophage Engineering

Assignees for the following sections
MIT/Harvard students: Optional
Committed Listeners: Required
  1. Find a group of ~3–4 students
  2. Read through the Phage Reading material listed under “Reading & Resources” below.
  3. Review the Bacteriophage Final Project Goals for engineering the L Protein:
    • Increased stability (easiest)
    • Higher titers (medium)
    • Higher toxicity of lysis protein (hard)
  4. Brainstorm Session
    • Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).
    • Write a 1-page proposal (bullet points or short paragraphs) describing:
      • Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).
      • Why do you think those tools might help solve your chosen sub-problem?
        • Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).
      • Include a schematic of your pipeline.
    • This resource may be useful: HTGAA Protein Engineering Tools
  5. Each individually put your plan on your HTGAA website
    • Include your group’s short plan for engineering a bacteriophage

Resources

HTGAA - Week 5: Protein Design Part II



My Homework

Subsections of Labs

Week 1 Lab: Pipetting


Subsections of Projects

Individual Final Project


Group Final Project
