Sergio Cuiza — HTGAA Spring 2026


About me

I’m Sergio, an undergraduate Bioengineering student from Bolivia with a strong interest in exploring how synthetic biology and emerging technologies can be applied to create innovative and regenerative solutions. I’m excited about HTGAA because it connects science, creativity, and real-world impact, which aligns with my curiosity for experimenting at the intersection of biology, design, and engineering. I’m looking forward to learning from this community and expanding both my technical skills and my perspective on what’s possible.

Contact info

Homework

Labs

Projects

Subsections of Sergio Cuiza — HTGAA Spring 2026

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    GammaShroom 1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

  • Week 2 HW: DNA r/w/e

    Part 0: Basics of Gel Electrophoresis Attend or watch all lecture and recitation videos. Optionally watch bootcamp. Part 1: Benchling & In-silico Gel Art See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details. Overview: Make a free account at benchling.com Import the Lambda DNA. Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI HindIII BamHI KpnI EcoRV SacI SalI Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. You might find Ronan’s website a helpful tool for quickly iterating on designs!

  • Week 3 — Lab Automation

    Assignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME! Your task this week is to Create a Python file to run on an Opentrons liquid handling robot. Review this week’s recitation and this week’s lab for details on the Opentrons and programming it. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept. If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead. Ask for help early! If you are having any trouble with scripting, contact your TAs as soon as possible for help. Do not wait until your scheduled robot time slot or you may not be able to complete this assignment!

  • Week 4 — Protein Design Part I

    Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Let’s break this down step-by-step. Understanding a Dalton: A Dalton (Da) is another name for the atomic mass unit. It’s the approximate mass of a single proton or neutron. So, an amino acid of ~100 Da means one molecule has a mass of about 100 atomic mass units.

  • Week 5 — Protein Design Part II

    Part A: SOD1 Binder Peptide Design (From Pranam) Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • Week 6 — Genetic Circuits Part I: Assembly Technologies

    Assignment: DNA Assembly Answer these questions about the protocol in this week’s lab: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? What are some factors that determine primer annealing temperature during PCR? There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning? How does the plasmid DNA enter the E. coli cells during transformation? Describe another assembly method in detail (such as Golden Gate Assembly) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online). Model this assembly method with Benchling or Asimov Kernel!

  • Week 7 — Genetic Circuits Part II: Neuromorphic Circuits

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

  • Week 9 — Cell-Free Systems

    Homework Part A: General and Lecturer-Specific Questions General homework questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Describe the main components of a cell-free expression system and explain the role of each component.

  • Week 10 — Advanced Imaging & Measurement Technology

    Homework: Final Project For your final project: Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

  • Week 11 — Bioproduction & Cloud Labs

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST. A personalized URL was sent to the email address associated with your Discourse account, and you can discuss the artwork on the Discourse. If you did not have a chance to contribute, it’s okay, just make sure you become a TA this fall! 😉

Subsections of Homework

Week 1 HW: Principles and Practices

GammaShroom


1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

The project I want to develop is called “GammaShroom”, a biological engineering platform that uses radiation-absorbing fungi to help remediate and protect environments exposed to nuclear radiation. This idea is inspired by the discovery of radiotrophic fungi found in places like Chernobyl, where certain species are able to survive and even grow in high-radiation environments by using melanin to interact with ionizing radiation.

The goal of this project is to engineer or optimize these fungi so they can be used as living biological tools for radiation shielding and environmental cleanup. For example, they could be deployed in contaminated sites, nuclear waste storage facilities, or even future space missions where radiation protection is critical. I am interested in this application because it combines microbiology, synthetic biology, and environmental engineering to address a real-world problem. It also represents a sustainable alternative to traditional chemical or mechanical radiation barriers, using biological systems that can self-repair and adapt to harsh conditions.

2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

Because GammaShroom involves releasing or using engineered microorganisms in sensitive environments, it is important to establish governance goals that prioritize safety, environmental protection, and responsible innovation. One major goal is to ensure biosafety and environmental containment. This means preventing unintended ecological disruption if engineered fungi were to spread beyond their intended location. A related sub-goal is to develop strict monitoring systems that track how these organisms behave over time in real environments.

Another crucial governance goal is to promote beneficial and equitable use of the technology. Since radiation contamination affects communities worldwide, access to this technology should not be limited only to wealthy countries or private corporations. A sub-goal here is to encourage international collaboration and shared standards so that remediation tools can be safely and fairly distributed. Together, these goals aim to balance innovation with ethical responsibility, ensuring that the technology reduces harm while maximizing its positive environmental and social impact.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

Purpose: What is done now and what changes are you proposing?

Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)

Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?

Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

  1. International Safety Framework for Radiotrophic Fungal Engineering

Purpose

- Current Situation: Research on radiation-absorbing fungi is still emerging and is regulated under general biosafety frameworks that were not designed specifically for organisms deployed in radioactive environments.

- Proposed Change: Develop a specialized international safety framework focused on engineered radiotrophic fungi used for environmental remediation, including stricter evaluation before field deployment.

Design

- Actors: International environmental agencies (e.g., IAEA, UNEP), national biosafety regulators, academic research institutions, and biotech companies.

- Implementation:

  - Require environmental risk assessments before outdoor fungal deployment.

  - Establish standardized containment and monitoring protocols.

  - Create certification systems for laboratories working with engineered fungi.

  - Promote international collaboration to harmonize safety standards.

Assumptions

- Specialized regulation will improve safety without severely slowing innovation.

- Researchers and companies will comply with new international standards.

- Environmental impact can be reasonably predicted through controlled testing.

Risks of Failure & “Success”

- Failure Risks: Inconsistent enforcement across countries and regulatory loopholes.

- Unintended Consequences of Success: Excessive regulation may discourage research investment and slow the adoption of beneficial remediation technologies.

  2. Funding Incentives for Sustainable Radiation Bioremediation Technologies

Purpose

- Current Situation: Development of fungal bioremediation technologies is limited by high research costs and uncertain commercial returns.

- Proposed Change: Introduce financial incentives and public funding programs to support safe and sustainable fungal remediation technologies.

Design

- Actors: Government science agencies, environmental ministries, international funding organizations, and biotech startups.

- Implementation:

  - Offer research grants for radiation bioremediation projects.

  - Provide tax incentives for companies developing eco-friendly remediation tools.

  - Support public-private partnerships to scale pilot projects.

  - Fund long-term safety and environmental impact studies.

Assumptions

- Financial support will accelerate innovation and responsible development.

- Companies will prioritize sustainability when incentives are aligned.

- Governments can effectively evaluate project impact.

Risks of Failure & “Success”

- Failure Risks: Misallocation of funds or exaggerated sustainability claims.

- Unintended Consequences of Success: Overinvestment in one technology could reduce funding for alternative remediation approaches.

  3. Global Open Environmental Monitoring Network for Fungal Remediation

Purpose

- Current Situation: Monitoring of radioactive remediation sites is fragmented and data is often inaccessible across institutions.

- Proposed Change: Create a shared international platform that tracks fungal remediation performance and environmental safety indicators in real time.

Design

- Actors: Academic institutions, environmental agencies, international organizations, and data scientists.

- Implementation:

  - Develop a centralized open-access monitoring database.

  - Use standardized sensors and reporting protocols.

  - Establish international data-sharing agreements.

  - Apply AI tools to analyze environmental trends.

Assumptions

- Institutions will be willing to share environmental data.

- Cybersecurity systems can protect sensitive information.

- Standardized data collection can be widely adopted.

Risks of Failure & “Success”

- Failure Risks: Limited participation and inconsistent data quality.

- Unintended Consequences of Success: Open environmental data may raise security or geopolitical concerns.

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

| Does the option: | Option 1: International Safety Framework | Option 2: Funding Incentives | Option 3: Global Monitoring Network |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 (Strict safety standards reduce accidental release risks) | 3 (Funding does not directly prevent incidents) | 2 (Monitoring detects risks but doesn’t prevent them) |
| • By helping respond | 2 (Regulatory coordination helps but may be slow) | 3 (Financial tools don’t support emergency response) | 1 (Real-time data enables rapid response) |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 (Mandatory lab certifications improve safety) | 3 (Incentives don’t directly affect lab safety) | 2 (Shared safety data improves practices indirectly) |
| • By helping respond | 2 (Oversight structures support incident reporting) | 3 (No emergency response function) | 1 (Monitoring network helps detect and track incidents) |
| Protect the environment | | | |
| • By preventing incidents | 1 (Pre-deployment risk assessments protect ecosystems) | 2 (Encourages safer design but doesn’t regulate) | 2 (Environmental tracking supports prevention indirectly) |
| • By helping respond | 2 (Regulatory bodies coordinate cleanup) | 3 (No direct response mechanism) | 1 (Early detection supports rapid mitigation) |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 3 (Compliance costs may be high) | 1 (Financial support reduces burden) | 2 (Infrastructure is costly but shared) |
| • Feasibility? | 2 (Requires international cooperation) | 1 (Funding programs already exist in many countries) | 2 (Technical and coordination challenges) |
| • Not impede research | 3 (Strict rules may slow experimentation) | 1 (Encourages research investment) | 2 (Data sharing may raise IP concerns) |
| • Promote constructive applications | 2 (Encourages responsible development) | 1 (Accelerates innovation and scaling) | 1 (Knowledge sharing expands applications) |
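
As a rough cross-check of the matrix above, the scores can be totaled per option. This is only a sketch: the rubric does not say how to aggregate, so the equal-weight sum is an assumption made here for illustration, and the names `SCORES`, `totals`, and `ranking` are hypothetical helpers rather than anything from the course materials.

```python
# Scores transcribed from the matrix above (1 = best, 3 = worst).
# Each tuple is (Option 1, Option 2, Option 3).
OPTIONS = (
    "International Safety Framework",
    "Funding Incentives",
    "Global Monitoring Network",
)

SCORES = {
    "Enhance Biosecurity": {
        "By preventing incidents": (1, 3, 2),
        "By helping respond": (2, 3, 1),
    },
    "Foster Lab Safety": {
        "By preventing incidents": (1, 3, 2),
        "By helping respond": (2, 3, 1),
    },
    "Protect the environment": {
        "By preventing incidents": (1, 2, 2),
        "By helping respond": (2, 3, 1),
    },
    "Other considerations": {
        "Minimizing costs and burdens to stakeholders": (3, 1, 2),
        "Feasibility": (2, 1, 2),
        "Not impede research": (3, 1, 2),
        "Promote constructive applications": (2, 1, 1),
    },
}


def totals(scores):
    """Sum every criterion per option, assuming equal weights (an assumption)."""
    sums = [0] * len(OPTIONS)
    for criteria in scores.values():
        for row in criteria.values():
            for i, value in enumerate(row):
                sums[i] += value
    return dict(zip(OPTIONS, sums))


def ranking(scores):
    """Return option names ordered from best (lowest total) to worst."""
    t = totals(scores)
    return sorted(t, key=t.get)


if __name__ == "__main__":
    t = totals(SCORES)
    for name in ranking(SCORES):
        print(f"{t[name]:>3}  {name}")
```

With equal weights the totals are 16 for the monitoring network, 19 for the safety framework, and 21 for funding incentives (lower is better). Funding incentives score best on the "Other considerations" rows taken alone, so a weighting that favors cost and feasibility would change the order.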

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

I would prioritize a hybrid governance strategy that combines the Global Open Environmental Monitoring Network with targeted funding incentives for responsible innovation, supported by limited international safety regulations for high-risk deployments. The monitoring network is essential because it enables early detection of ecological risks and provides transparency about how radiotrophic fungal systems behave in real environments. At the same time, financial incentives encourage researchers and companies to invest in safer and more effective remediation technologies. Focused international regulations should act as a safeguard for projects involving environmental release, ensuring that innovation proceeds responsibly.

The main trade-off in this approach is balancing rapid technological progress with precautionary oversight. Too much regulation could slow innovation, while insufficient oversight could increase ecological risks. This recommendation assumes that sustained international cooperation and funding are achievable, although both remain uncertain. My recommendation is directed toward international environmental and nuclear governance organizations such as the International Atomic Energy Agency and the United Nations Environment Programme, which are positioned to coordinate global monitoring and safety standards.

Weekly Assignment

Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.

Reflecting on this week’s material, I developed a deeper understanding of how modern biological engineering builds complexity using modular design principles similar to engineering systems. The concept of design cores and universality showed how complex biological circuits can be assembled hierarchically from composable elements, allowing systems to scale in sophistication while remaining controllable. At the same time, biology introduces a unique layer of complexity through self-replication, meaning engineered systems are not static machines but living programs that can grow and evolve. Learning about advances in protein design, genetic circuits, and large-scale genome engineering highlighted how synthetic biology is rapidly expanding our ability to design biological functions from scratch.

This technical power raises important ethical concerns. One major issue is the intentional release of engineered organisms into complex ecosystems. Even systems designed for remediation or beneficial purposes could disrupt microbial communities or behave unpredictably because living systems replicate and interact dynamically with their environments. Another concern is how access to advanced biological technologies may become uneven, especially for communities most affected by environmental disasters.

To address these challenges, governance strategies should include mandatory long-term ecological monitoring of deployed organisms, transparent reporting of experimental and environmental data, and international cooperation to ensure equitable access to beneficial technologies. Integrating modular engineering principles with ethical oversight can help ensure that increasing biological complexity leads to safer and more responsible innovation.

Assignment (Final Project)

As part of your final project, design one or more strategies to ensure that your project, and what it enables, contributes to growing an ethical biological future.

My final project requires a multi-faceted strategy to ensure that the development of radiation-absorbing fungal technologies contributes to a safe and ethical biological future. The first key approach is integrating biosafety engineering directly into the fungal system, including biological containment strategies and long-term ecological monitoring to minimize unintended environmental effects. The second approach is establishing transparent and secure data-sharing practices that allow researchers and regulatory bodies to evaluate performance and risks while protecting sensitive information from misuse. The third approach is promoting equitable and sustainable deployment by prioritizing access for communities affected by nuclear contamination and ensuring that remediation efforts do not create new ecological burdens. Together, these strategies support a research framework that balances innovation with responsibility, fostering environmental protection, social fairness, and public trust in emerging biotechnologies.

Prompts used for the task (I think we were told to include them, so here they are just in case)

I would like to clarify that I did use AI for this work. As the prompts below show, I did the research myself and used AI mainly to organize the information and to improve the writing so that it reads more comfortably. Thank you very much for reading.

For the pictures:

“A futuristic scientific illustration of radiotrophic fungi absorbing radiation in a post-nuclear environment inspired by Chernobyl. Show dark melanin-rich fungi growing on cracked concrete and metallic surfaces, glowing softly as they absorb invisible radiation waves represented by subtle blue and green energy streams. Include a cross-section view where fungal cells convert radiation into biochemical energy, with stylized mitochondria and molecular structures inside. The scene should blend realism and sci-fi aesthetics, with atmospheric lighting, high detail, and a clean scientific visualization style. Add a sense of environmental recovery, with small plants growing nearby to symbolize bioremediation. Use a cool color palette with luminous accents, high resolution, cinematic lighting, and a professional scientific poster style.”

“A futuristic biotech logo featuring a stylized mushroom inspired by radiation-absorbing fungi, glowing with soft neon green and purple energy. The mushroom cap resembles a subtle mushroom cloud shape but abstract and scientific, not violent. Clean minimal design, smooth vector style, centered composition. Include subtle radiation symbol elements integrated into the mushroom texture. Modern biotech aesthetic, sleek typography reading “GammaShroom” below the icon. White or dark gradient background, high contrast, professional scientific branding style.”

For the homework:

“I am developing a research project on a fungal platform for radiation attenuation and environmental bioremediation. Below is a curated set of academic and institutional sources related to fungal radiation resistance, synthetic biology, environmental remediation, and governance frameworks.

Please synthesize and organize the information from all the provided links into a structured analytical report. The goal is to create a clear, evidence-based overview that helps consolidate current knowledge and identify how each source informs the development of my project.

Organize the response into the following sections:

  1. Overview of Sources Provide a concise summary of each link individually. For each source, identify its main focus, key findings, and relevance to fungal bioremediation or synthetic biology. Explain how it contributes to the broader understanding of the field.

  2. Scientific and Technical Foundations Integrate the sources to describe the core biological and engineering principles involved, including mechanisms of radiation resistance in fungi, bioremediation processes, and relevant synthetic biology tools.

  3. Current Applications and Research Landscape Summarize existing case studies, experimental systems, or technological applications described in the sources. Identify demonstrated capabilities and remaining technical gaps.

  4. Governance, Safety, and Ethical Context Extract and synthesize information related to biosafety, environmental governance, and ethical considerations. Explain how these frameworks relate to responsible project development.

  5. Integrated Insights for Project Development Based on the combined evidence from all sources, summarize key insights that are most relevant to refining and strengthening the project. Highlight opportunities, limitations, and areas requiring further investigation.

The report should maintain an academic tone, use clear scientific language, and explicitly reference how the sources relate to one another. Focus on synthesis and organization rather than speculation.

Sources: Fungal Radiation Attenuators

Melanized fungi thrive on radiation. Studies of Chernobyl isolates and other radiotrophic fungi show that dense melanin layers in cell walls absorb and transduce ionizing radiation. In effect, melanin-rich fungi can “harvest” gamma rays much like plants harvest light. This underpins the idea that engineered, melanized fungal biomass could serve as a living radiation shield.

Space-grown fungi reduce ambient radiation. An ISS experiment with Cladosporium sphaerospermum found that the fungal lawn grew rapidly in microgravity and caused a measurable drop in radiation beneath it compared to a no-fungus control. In quantitative terms, fungal biomass attenuated the local gamma dose rate on orbit. This real-world result supports using fungi as bio-shielding in high-radiation settings.

Directed growth toward radiation (radiotropism). Research notes that some fungi actively grow toward radiation sources (positive radiotropism) and use melanin as an “energy transporter” for metabolism. For example, Chernobyl black molds express more melanin near strong sources and grow faster under irradiation. These observations imply that a radiation-biased growth stimulus could help a bioremediation platform concentrate fungi in hotspot areas.

Fungal Bioremediation Cases

Accumulation of radionuclides. Fungal mycelium naturally binds metals and radionuclides. DOE studies note that fungi accumulated substantial 90Sr, 137Cs and other isotopes in Chernobyl soils. In fact, a 2003 DOE primer explicitly states “fungi are also known to accumulate metals, particularly radionuclides (as observed following the 1986 Chernobyl accident)”. This natural bioaccumulation suggests engineered fungi could be tuned to sequester radioisotopes from contaminated media.

Engineered radiation-resistant strains. Screening of extreme environments has yielded fungi tolerating both radiation and toxins. For instance, Rhodotorula taiwanensis MD1149 (isolated from a contaminated site) grows under 36 Gy/h of gamma radiation at pH 2.3 and survives acute 2.5 kGy doses. It also forms robust biofilms in the presence of mercury and chromium. Such traits make MD1149 a promising chassis for fungal bioremediation of mixed radioactive/heavy-metal wastes. (The genome of MD1149 is sequenced, enabling genetic engineering for enhanced uptake or melanin production.)

Cost‐effective mycoremediation. Fungi are abundant and fast-growing, offering a low-cost cleanup strategy. The DOE primer notes that mycoremediation could rival plant-based phyto-remediation and be deployed on contaminated soils with added nutrients. In practice, researchers have demonstrated fungal biosorption of U, Pu, and other metals in lab reactors. While large-scale field trials remain limited, these case studies show feasibility. Together, these findings motivate designing fungal bioreactors or biofilters for nuclear waste sites.

Synthetic Biology Governance (Risk and Ethics)

Precautionary risk assessment. Reviews of synthetic biology governance emphasize anticipating environmental hazards. For example, Bohua et al. (2023) propose an ethical framework that prioritizes the precautionary principle and rigorous environmental risk assessment before release. This includes analyzing gene flow, competition with native species, and other non-target effects. Applying such frameworks means a fungal bioremediation platform would require case-by-case safety studies and stakeholder input prior to deployment.

Anticipatory and agile regulation. Policy experts argue that regulation must co-evolve with technology. Kim et al. (2025) call for a “co‐evolutionary” governance model based on OECD guidelines: combining R&D with strategic foresight, public engagement, rapid regulatory adaptation, and international cooperation. In practice, this suggests regulators should work alongside scientists developing radiotrophic fungi—setting provisional guidelines for field use (e.g. containment measures) as the tech develops.

Codes of conduct and “safety-by-design.” International efforts have produced nonbinding standards to foster responsible research. The OECD report highlights the “Tianjin Biosecurity Guidelines” and other biosafety codes that encourage researchers to embed ethics and self-monitoring in their work. For example, an engineered fungus could be designed with genetic “kill switches” or metabolic dependencies to limit persistence. Upholding these principles would be part of an ethical development plan (consistent with many national synthetic biology roadmaps).

International Guidelines and Policies

UN Convention on Biological Diversity (CBD). The CBD has explicitly considered synthetic biology. A 2015 CBD Secretariat report notes that engineered microbes (including fungi) are being developed for bioremediation and pollution control. It also underlines that existing regulatory regimes – notably the Cartagena Protocol on Biosafety – cover “living modified organisms.” In essence, any engineered fungus released into the environment would fall under international biosafety rules requiring risk assessment and notification. This supports governance by tying fungal bioremediation to the same safety processes used for GMOs.

Cartagena Protocol on Biosafety. This UN treaty (under the CBD) mandates that Parties assess and consent to the cross-border transfer or intentional release of any living modified organism (LMO). An engineered radio-attenuating fungus would be considered an LMO. Thus, developing such a platform must follow the Protocol’s risk assessment and public consultation procedures. Compliance ensures that bioremediation deployments meet internationally agreed safety standards.

IAEA and nuclear remediation standards. The International Atomic Energy Agency issues safety guides on radioactive waste and site cleanup. Though not always specific to biotech, IAEA documents (e.g. Policy and Strategies for Environmental Remediation) stress systematic planning, multi-stakeholder oversight, and comparisons of remediation options. A fungal platform would fit into these guidelines as a novel remediation method; IAEA frameworks would require demonstrating its effectiveness and safety relative to conventional methods.

WHO and other agencies. The World Health Organization has historically guided biosafety of medical and agricultural biotech (e.g. risk assessment of GM foods and drugs), and would advocate evaluating any health impacts of environmental releases. WHO’s “One Health” approach also emphasizes that environmental and human health are linked, reinforcing the need for ecological risk checks. Global bodies like the OECD and UN also call for transparency and public engagement on emerging biotechnologies. In sum, international policies urge that a radiation‐absorbing fungal system be developed under strong biosafety oversight – integrating ecological risk assessments, containment planning, and emergency response strategies from the outset.

Sources: Peer-reviewed studies and institutional reports provide the above insights. For example, lab and spaceflight experiments confirm fungi’s radiotrophic capabilities. Bioremediation research identifies metal-accumulating strains and genomic tools for engineering them. Governance analyses and UN documents outline the ethical, legal, and procedural frameworks (precautionary principle, Cartagena Protocol, OECD anticipatory governance, etc.) needed to safely develop and release engineered organisms. Each source thus helps shape a science-based, policy-informed approach to a radiation-absorbing fungal bioremediation platform.”

“I am providing a draft document that contains research notes and project descriptions. Please revise and improve the text while preserving its original meaning and technical content.

Your task is to:

• Correct grammar, spelling, and punctuation errors
• Improve clarity, flow, and sentence structure
• Replace repetitive or informal wording with appropriate academic synonyms
• Strengthen the professional and scientific tone
• Ensure consistency in terminology and style throughout the document
• Maintain the original intent, arguments, and factual content without adding new information

If any sections are unclear or ambiguous, rewrite them for precision while keeping the author’s meaning intact. Avoid unnecessary complexity; prioritize readability and academic professionalism.”

Week 2 HW: DNA r/w/e

Part 0: Basics of Gel Electrophoresis

Attend or watch all lecture and recitation videos. Optionally watch bootcamp.

Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details. Overview:

Make a free account at benchling.com

Import the Lambda DNA.

Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, SalI

Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. You might find Ronan’s website a helpful tool for quickly iterating on designs!

Part 0: Basics of Gel Electrophoresis

Part 0 reviews the fundamental biological principles that support the rest of this project. Understanding how genetic information flows inside cells is essential for designing and interpreting molecular biology experiments.

DNA as the Information Storage Molecule

DNA (deoxyribonucleic acid) is the molecule that stores genetic information in living organisms. It consists of two complementary strands arranged in a double helix. Each strand is made of nucleotides containing four bases: adenine (A), thymine (T), cytosine (C), and guanine (G).

The sequence of these bases encodes instructions for building proteins. DNA is chemically stable, making it ideal for long-term information storage. During experiments such as restriction digests and gel electrophoresis, we manipulate DNA directly to analyze or modify genetic information.

RNA and Transcription

RNA (ribonucleic acid) is a temporary copy of genetic instructions. During transcription, an enzyme called RNA polymerase reads a DNA template strand and synthesizes messenger RNA (mRNA).

RNA differs from DNA in three key ways:

It contains ribose sugar instead of deoxyribose

It uses uracil (U) instead of thymine (T)

It is usually single-stranded

mRNA carries genetic instructions from DNA to ribosomes, where proteins are produced.

Proteins and Translation

Proteins are functional molecules that perform most cellular tasks, including catalysis, structure, and signaling. During translation, ribosomes read mRNA in groups of three nucleotides called codons. Each codon corresponds to a specific amino acid.

A chain of amino acids folds into a three-dimensional structure that determines the protein’s function. In this project, designing DNA sequences ultimately aims to control which proteins are produced.

The Central Dogma of Molecular Biology

The relationship between DNA, RNA, and protein is summarized by the central dogma:

DNA → RNA → Protein

This directional flow explains how genetic information is expressed inside cells. The techniques used in this assignment either manipulate the DNA at the start of this pathway (cloning, restriction digests) or rely on the pathway itself to express a gene as protein.

Restriction Enzymes and DNA Manipulation

Restriction enzymes are proteins that cut DNA at specific sequences. These enzymes allow scientists to divide DNA into predictable fragments. By selecting particular enzymes, researchers can design DNA pieces that generate specific band patterns during gel electrophoresis.

This precise cutting ability is the foundation of genetic engineering and is essential for both analytical and creative gel art design.
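The cutting logic described above can be sketched in a few lines of Python. This is only a minimal illustration, not how Benchling implements its digest: the function name `digest` and the toy sequence are hypothetical, the DNA is treated as linear, and only the forward strand is searched (which is sufficient here because all seven recognition sites are palindromic). The recognition sites and cut offsets are the standard ones for these enzymes.

```python
# Recognition site and cut offset (bases from the start of the site, top strand).
SITES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def digest(seq, enzymes):
    """Return fragment lengths (in order along a linear sequence) after cutting."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    # Drop zero-length pieces that would appear if a cut fell exactly at an end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical 28-bp toy sequence with one EcoRI site and one BamHI site.
TOY = "AAAA" + "GAATTC" + "CCCCCCCC" + "GGATCC" + "TTTT"
print(digest(TOY, ["EcoRI", "BamHI"]))
```

Running the same function on the real Lambda DNA sequence (not included here) with different enzyme subsets would give the fragment lists that determine each lane's band pattern.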

Gel Electrophoresis Principles

Gel electrophoresis separates DNA fragments by size. Because DNA carries a negative charge, it migrates toward the positive electrode in an electric field.

Smaller fragments move faster through the agarose gel matrix, while larger fragments move more slowly. This separation produces visible bands that correspond to fragment length.

By comparing observed bands to predicted fragment sizes, researchers can verify DNA structure and confirm successful restriction digests.

Part 1: Benchling & In-silico Gel Art

Part 1 focuses on designing a gel electrophoresis experiment using virtual simulation tools before performing any physical lab work.

The primary goal of this design phase is to create a controlled DNA banding pattern through selective restriction enzyme digestion. Instead of randomly cutting DNA, the experiment is planned so that specific fragment sizes generate a visual composition on an agarose gel.

This approach transforms gel electrophoresis from a purely analytical technique into a hybrid scientific and artistic exercise. At the same time, it reinforces essential molecular biology concepts such as enzyme specificity, fragment prediction, and experimental reproducibility. Benchling’s virtual digest tool is used to simulate how restriction enzymes cut a known DNA substrate. By testing different enzyme combinations digitally, predicted fragment lengths can be analyzed without consuming physical reagents.

After creating a free account on benchling.com and importing the Lambda DNA, restriction enzyme digestion was simulated using the following enzymes:

EcoRI

HindIII

BamHI

KpnI

EcoRV

SacI

SalI

This resulted in:

[Images]

Next, I opened the Virtual Digest tab to see how the digest looks. This visualization uses all the enzymes on the list.

[Image]

After seeing what could be done with the enzymes, I continued testing more combinations. For faster iteration, I used Ronan’s website to generate more images. After several attempts, I ended up with the following design:

[Images]

I liked it a lot because, for some reason, it immediately reminded me of an ancient Inca sculpture.

[Image]

I don’t know if you see it too, so here are a few guide lines that may make it easier to spot.

[Image]

Next, I tried to make that drawing look like Paul Vanouse’s Latent Figure Protocol artwork. I didn’t know how to do it, so I asked Gemini for help. This is the result:

[Image]

It’s not exactly what I expected, and it doesn’t really resemble that style of art, but I ended up liking it.

Then, I tried to replicate it in Benchling using the enzymes the website suggested. Unfortunately, it didn’t turn out as I expected. I’m still not sure what went wrong, and I didn’t have time for many more attempts.

[Image]

If you look closely, though, it could easily pass for a level you’d find while playing Mario Maker. At least, that’s what I see; I don’t know what you all think.

[Image]

In the end, it’s a tool I need to practice with more, but I really liked how it works. With that, let’s move on to the rest of HW2.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.

Part 2: Gel Electrophoresis Experiment (Simulation and Analysis)

There was no lab available at my node this week, so I couldn’t complete this part. Instead, I completed a detailed virtual simulation of the protocol using Benchling and theoretical analysis of the expected outcomes. This allowed me to understand the experimental workflow and interpret how restriction digests generate DNA fragment patterns that can be visualized as gel art.

The experiment would begin with designing a restriction digest of Lambda DNA using selected high-fidelity restriction enzymes. By importing the Lambda DNA sequence into Benchling and running virtual digests, I tested different enzyme combinations to predict fragment sizes and design a gel pattern inspired by gel art. This simulation demonstrated how enzyme selection directly influences the final banding pattern.

If performed in a physical laboratory, the next step would involve preparing a 1% agarose gel in TAE buffer and staining it with a fluorescent dye. The digested DNA samples would be mixed with loading dye and pipetted into the gel wells. When an electric field is applied, negatively charged DNA fragments migrate toward the positive electrode. Smaller fragments move faster through the agarose matrix, resulting in size-based separation.

After electrophoresis, the gel would be imaged using a blue light transilluminator. The resulting band pattern would be compared with the virtual digest predictions. Agreement between expected and observed fragment sizes would confirm successful restriction digestion and validate the DNA design used to create the gel art.
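The comparison between predicted and observed bands could be automated along the following lines. This is a rough sketch under stated assumptions: the function name `bands_match`, the 10% size tolerance, and the example band sizes are all hypothetical choices for illustration, and band sizes read from a real gel against a ladder are usually only approximate.

```python
def bands_match(predicted, observed, tolerance=0.10):
    """Return True if predicted and observed bands pair up one-to-one.

    A pair matches when the observed size is within `tolerance`
    (as a fraction of the predicted size) of the predicted size.
    Matching is greedy, which is adequate for well-separated bands.
    """
    remaining = sorted(observed)
    for size in sorted(predicted):
        hit = next((o for o in remaining if abs(o - size) <= tolerance * size), None)
        if hit is None:
            return False  # a predicted band is missing from the gel
        remaining.remove(hit)
    return not remaining  # leftover observed bands mean unexpected fragments


# Hypothetical sizes in base pairs: virtual digest vs. sizes estimated from a gel.
print(bands_match([5000, 2000, 500], [510, 1950, 5100]))
```

A missing or extra band would return `False`, which in practice could point to an incomplete digest or an unexpected cut site.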

Although I did not physically run the gel, performing the simulation reinforced key molecular biology concepts, including restriction enzyme specificity, fragment size prediction, and electrophoretic separation. This exercise highlights how computational tools can effectively model laboratory experiments and support experimental planning in situations where physical lab access is limited.

Part 3: DNA Design Challenge

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

[Example from our group homework, you may notice the particular format — The example below came from UniProt]

>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL
EAVIRTVTTLQQLLT

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

[Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]

Lysis protein DNA sequence atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

3.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

[Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]

Lysis protein DNA sequence with Codon-Optimization ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

3.5. [Optional] How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below. [Example shows the biomolecular flow in central dogma from DNA to RNA to Protein] Special note that all “T” were transcribed into “U” and that the 3-nt codon represents 1-AA.

3.1 Choose your protein

For this assignment, I selected the Damage Suppressor protein (Dsup) from tardigrades. Dsup is a remarkable protein that has been shown to protect cellular DNA from radiation and oxidative stress. Tardigrades are microscopic extremophiles capable of surviving severe environmental conditions, including intense radiation, dehydration, extreme temperatures, and even the vacuum of space. Their resilience has attracted significant interest in bioengineering and astrobiology.

[Image]

I chose Dsup because it represents a compelling intersection between fundamental biology and applied biotechnology. Its protective properties suggest potential applications in radiation protection for human cells, improvement of stress resistance in engineered microorganisms, and future space exploration where biological systems are exposed to harsh environments. Studying and expressing this protein could contribute to the development of more robust biological systems.

Using the UniProt protein database, I obtained the amino acid sequence of the Dsup protein. UniProt provides curated protein information, including functional annotations and sequence data. The protein sequence used for this project is shown below in FASTA format.

Protein sequence (excerpt): I downloaded the sequence and imported it into Benchling so it would be easier to view.

[Images]

The complete sequence can also be downloaded from NIH.

[Image]

3.2 Reverse translation (protein to DNA)

To express this protein in a laboratory system, the amino acid sequence must be converted into a DNA sequence. Using reverse translation tools based on the genetic code, I generated a nucleotide sequence corresponding to the Dsup protein.

Reverse translation assigns a codon to each amino acid. Because the genetic code is degenerate, meaning that most amino acids are encoded by multiple codons, there are many possible DNA sequences that can produce the same protein. The reverse-translated sequence represents one valid encoding of the protein.
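The degeneracy argument above can be made concrete with a small Python sketch. It is not the algorithm of the Reverse Translate tool; the function names are hypothetical, the codon table is the standard genetic code, and the 12-residue peptide is the start of the Dsup fragment used later in section 3.5. The sketch picks one arbitrary codon per amino acid and counts how many distinct DNA sequences would encode the same peptide.

```python
from itertools import product
from math import prod

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def reverse_translate(protein):
    """Return one valid DNA encoding (first listed codon per residue)."""
    return "".join(CODONS_FOR[aa][0] for aa in protein)


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)


def translate(dna):
    """Translate DNA codon by codon (no stop handling; for round-trip checks)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


peptide = "MASTHQSSTEPS"
dna = reverse_translate(peptide)
print(dna, count_encodings(peptide))
```

Even for this 12-residue peptide there are millions of valid encodings, which is why a further selection step such as codon optimization is needed.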

Reverse-translated DNA sequence (excerpt): For this step I used Reverse Translate, in case you want to try it yourself:

[Images]

This sequence serves as an initial template that can be further optimized for expression in a specific host organism. The full result is much longer than the excerpt shown; you can verify this for yourself (I didn’t know how to put the entire sequence here 😅).

3.3 Codon optimization

Although many DNA sequences can encode the same protein, not all sequences are expressed equally well in every organism. Different species show preferences for certain codons, a phenomenon known as codon bias. If a gene uses rare codons for the host organism, translation can become inefficient, reducing protein yield.

I optimized the Dsup DNA sequence for expression in Escherichia coli, a widely used host in biotechnology. E. coli is preferred because it grows rapidly, is cost-effective, and has a well-characterized genetic system. Codon optimization improves translation efficiency by matching the codon usage to the host’s tRNA abundance.

In principle, this optimization can increase protein production by reducing ribosome stalling at rare codons and, in some cases, improving mRNA stability. The resulting sequence is designed for reliable expression in E. coli, although the actual yield still has to be confirmed experimentally.
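The core idea of codon optimization, swapping each codon for a synonymous one the host prefers, can be sketched as follows. This is a deliberately simplified illustration and not the IDT algorithm: the `PREFERRED` table is a hypothetical, partial set of choices resembling commonly cited E. coli preferences, whereas real tools use measured codon usage tables and also balance GC content, repeats, and restriction sites. The input is the short Dsup fragment from section 3.5.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: illustrative, partial preference table (not measured usage data).
PREFERRED = {
    "M": "ATG", "A": "GCG", "S": "AGC", "T": "ACC",
    "H": "CAT", "Q": "CAG", "E": "GAA", "P": "CCG",
}


def translate(dna):
    """Translate DNA codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def optimize(dna):
    """Replace each codon with the preferred synonymous codon, if one is listed."""
    out = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        out.append(PREFERRED.get(CODON_TABLE[codon], codon))
    return "".join(out)


original = "ATGGCATCCACACACCAATCATCCACAGAACCCTCT"
optimized = optimize(original)
print(optimized)
```

The key invariant is that the protein does not change: translating `original` and `optimized` gives the same amino acid sequence even though the DNA differs.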

[Images]

Analyzing the optimization for E. coli sparked my curiosity, so I also tested how it would look when optimized for humans.

[Images]

Other host organisms can be compared in the same way using IDT (the tool I used for this part).

3.4 You have a sequence. Now what?

Once the codon-optimized DNA sequence is obtained, it can be used to produce the Dsup protein through standard molecular biology techniques.

In a cell-dependent expression system, the DNA is inserted into a plasmid vector and introduced into bacterial cells through transformation. Inside the cell, RNA polymerase transcribes the DNA into messenger RNA. Ribosomes then translate the mRNA into a polypeptide chain, which folds into the functional Dsup protein. This method is commonly used for large-scale protein production in research and industry.

Alternatively, the DNA can be used in a cell-free expression system. These systems contain the transcription and translation machinery, supplied either as a crude cell extract or as purified components. By adding the DNA template directly to this mixture, proteins can be synthesized rapidly without living cells. Cell-free systems are especially useful for rapid prototyping and synthetic biology applications.

Both approaches follow the central dogma of molecular biology, in which genetic information flows from DNA to RNA and finally to protein.

3.5 Optional: How it works in biological systems


In natural biological systems, a single gene can give rise to multiple protein products through several regulatory mechanisms. These include alternative transcription start sites, RNA processing events such as alternative splicing, and post-translational modifications that alter protein function.

A short fragment of the Dsup gene can illustrate the central dogma by aligning the DNA sequence with its RNA transcript and the resulting protein. The DNA sequence:

ATG GCA TCC ACA CAC CAA TCA TCC ACA GAA CCC TCT

is transcribed into RNA by replacing thymine with uracil:

AUG GCA UCC ACA CAC CAA UCA UCC ACA GAA CCC UCU

During translation, each codon corresponds to one amino acid, producing the protein fragment:

Met–Ala–Ser–Thr–His–Gln–Ser–Ser–Thr–Glu–Pro–Ser.

Each group of three nucleotides, called a codon, specifies one amino acid. The alignment shows how the same information passes from DNA through RNA to protein: the transcript matches the coding strand with uracil in place of thymine, and ribosomes read it one codon at a time to assemble the amino acid chain.
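The alignment above can be reproduced with a short Python sketch. The function names are hypothetical, the codon table is the standard genetic code, and transcription is modeled in the same simplified way as in the text (copying the coding strand with U in place of T).

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val",
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (simplified: replace T with U)."""
    return dna.upper().replace(" ", "").replace("T", "U")


def translate(rna):
    """Read the mRNA codon by codon until a stop codon or the end."""
    residues = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3].replace("U", "T")]
        if aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return "-".join(residues)


dna = "ATG GCA TCC ACA CAC CAA TCA TCC ACA GAA CCC TCT"
rna = transcribe(dna)
print(rna)
print(translate(rna))
```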

Part 4: Prepare a Twist DNA Synthesis Order

This is a practice exercise, not necessarily your real Twist order!

4.1. Create a Twist account and a Benchling account

4.2. Build Your DNA Insert Sequence

For example, let’s make a sequence that will make E. coli glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein):

In Benchling, select New DNA/RNA sequence

Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).

Go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with your codon optimized DNA sequence of interest!). Each time you add a new piece of the sequence, make sure to annotate by right clicking over the sequence and creating an annotation that describes what each piece (e.g., Promoter, RBS, etc.) is (see image below).

Promoter (e.g. BBa_J23106): TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (e.g. BBa_B0034 with spacers for optimal expression): CATTAAAGAGGAGAAAGGTACC

Start Codon: ATG

Coding Sequence (your codon optimized DNA for a protein of interest, sfGFP for example): AGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli): CATCACCATCACCATCATCAC

Stop Codon: TAA

Terminator (e.g. BBa_B0015): CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

Once you’ve completed this, click on Linear Map to preview the entire sequence. If you intend to have a TA review a sequence in the future, this is a good way to verify that all sections are annotated!

This is not required for this exercise, but to share your design with others, please ensure that link sharing is turned on! (Optional) Share your final sequence link with a TA for review!

This insert sequence you built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). Go ahead and download the FASTA file for the sequence you made.

It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey your designs. Here’s an example of what you just annotated in Benchling:

4.3. On Twist, Select The “Genes” Option

4.4. Select “Clonal Genes” option

For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project.

Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly.

Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. An advantage is that, if designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.

4.5. Import your sequence

You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.

4.6. Choose Your Vector

Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle!

The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance.

Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later.

For this demonstration, choose a Twist cloning vector such as pTwist Amp High Copy.

Click into your sequence and select download construct (GenBank) to get the full plasmid sequence:

Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded.

This is the plasmid you just built with your expression cassette included. Congratulations on building your first plasmid!

Part 4: Preparing a Twist DNA Synthesis Order

This exercise simulates the workflow used in modern synthetic biology to design and order custom DNA. Although this is a practice exercise, it mirrors the real process researchers use to synthesize genes for experimental work.

4.1 Creating Twist and Benchling Accounts

The first step is creating accounts on Twist Bioscience and Benchling. These platforms serve complementary roles in DNA engineering.

Benchling functions as a digital molecular biology workspace where DNA sequences can be designed, edited, and annotated. It allows researchers to simulate genetic constructs before ordering them.

Twist Bioscience is a commercial DNA synthesis provider. Once a sequence is finalized in Benchling, it can be uploaded to Twist for physical synthesis.

Creating these accounts establishes the digital pipeline from design to manufacturing.

[Image]

I hope you like it :)

4.2 Building the DNA Insert Sequence

The goal of this section is to construct an expression cassette — a functional DNA unit that produces a protein inside a host organism.

In Benchling, a new linear DNA sequence is created. The topology is set to linear because this insert will later be placed inside a circular plasmid vector.

The sequence is built from modular components:

Promoter: initiates transcription. It controls how strongly the gene is expressed.

Ribosome Binding Site (RBS): ensures efficient translation by recruiting ribosomes.

Start Codon (ATG): signals the beginning of protein synthesis.

Coding Sequence: contains the codon-optimized gene of Dsup, in my case.

7× His Tag: adds histidine residues to allow protein purification.

Stop Codon: terminates translation.

Terminator: stops transcription and stabilizes mRNA.

Each component is pasted sequentially and annotated in Benchling. Annotation is critical because it documents the function of each region and makes the design interpretable to collaborators.

The final annotated construct represents a complete gene expression system. Viewing the Linear Map confirms the structural organization and ensures no sections are missing.

Exporting the sequence as a FASTA file prepares it for DNA synthesis.
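The same assembly can be sketched outside Benchling as plain string concatenation followed by FASTA formatting. The promoter, RBS, His tag, stop codon, and terminator sequences are the ones given in the assignment; everything else is an assumption for illustration: the function names, the record name `dsup_cassette`, and especially `CDS`, which is only a short stand-in (the Dsup fragment from section 3.5 without its ATG) rather than the full codon-optimized gene.

```python
PROMOTER = "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"   # e.g. BBa_J23106
RBS = "CATTAAAGAGGAGAAAGGTACC"                     # e.g. BBa_B0034 with spacers
START = "ATG"
HIS_TAG = "CATCACCATCACCATCATCAC"                  # 7x His
STOP = "TAA"
TERMINATOR = (                                     # e.g. BBa_B0015
    "CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTC"
    "TACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA"
)

# ASSUMPTION: short placeholder coding sequence, not the full optimized Dsup gene.
CDS = "GCATCCACACACCAATCATCCACAGAACCCTCT"


def build_cassette(cds):
    """Concatenate the parts in expression order, checking the reading frame."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join([PROMOTER, RBS, START, cds, HIS_TAG, STOP, TERMINATOR])


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


cassette = build_cassette(CDS)
fasta = to_fasta("dsup_cassette", cassette)
print(fasta)
```

The frame check matters because the His tag and stop codon are only read correctly if the coding sequence between the start codon and the tag is a whole number of codons.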

[Image]

Expression Cassette Concept

The constructed insert is called an expression cassette because it can function independently once inserted into a plasmid. This modular design allows the same cassette to be reused in different vectors or host organisms.

Visualization with SBOL Canvas helps communicate the design using standardized synthetic biology symbols. This was one of my favorite parts: I love the SBOL Canvas interface, probably because it is so simple, and I would like to use it more.

[Image]

4.3 Selecting the “Genes” Option in Twist

Inside Twist’s ordering interface, selecting the Genes category specifies that a full gene construct is being synthesized rather than short oligonucleotides.

4.4 Choosing Clonal Genes

Clonal genes are circular plasmids delivered ready for transformation into bacteria. This option accelerates experimentation because no additional cloning is required.

In contrast, gene fragments are linear DNA pieces that require assembly before use. While more flexible, they add extra laboratory steps.

Choosing clonal genes prioritizes speed and simplicity.

4.5 Importing the Sequence

The FASTA file exported from Benchling is uploaded to Twist. This step transfers the digitally designed expression cassette into the manufacturing platform.

At this stage, Twist verifies the sequence for synthesis compatibility.

4.6 Choosing a Vector

A vector is a circular DNA backbone that carries the insert into host cells. It contains essential features such as:

an origin of replication (for plasmid copying),

antibiotic resistance markers (for selection),

cloning regions.

Selecting a vector like pTwist Amp High Copy determines how the plasmid behaves inside E. coli.

[Image]

Downloading the full plasmid sequence and re-importing it into Benchling allows visualization of the final construct: the insert integrated into the backbone.

This confirms successful plasmid design.

[Images]

Final Outcome

By the end of this exercise, a fully annotated plasmid construct has been digitally assembled. This workflow demonstrates the complete pipeline of modern DNA engineering:

design → annotation → synthesis preparation → plasmid assembly.

For final projects, both the annotated insert and chosen vector must be clearly documented to ensure reproducibility and successful DNA synthesis.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

[Image]

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

Is your method first-, second- or third-generation or other? How so?

What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

What is the output of your chosen sequencing technology?

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

See some famous examples of DNA design

[Image]

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods?

What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

[Image]

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps?

What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

5.1 DNA Read

For DNA sequencing, I would choose to read DNA used in DNA-based digital data storage. This technology encodes digital information such as images, text, or scientific data into synthetic DNA molecules. I am interested in sequencing this type of DNA because it represents a bridge between biology and computer science, with the potential to create extremely dense, long-term archival storage systems. DNA is far more stable than conventional storage media and could preserve information for thousands of years. Sequencing stored DNA is essential to verify that the encoded information has not degraded and can be accurately retrieved.

To sequence this DNA, I would use next-generation sequencing (NGS), specifically sequencing-by-synthesis technology developed by Illumina. This method is considered a second-generation sequencing technology because it enables massively parallel sequencing of millions of DNA fragments simultaneously, unlike first-generation Sanger sequencing which reads one fragment at a time.

The input for this method is purified DNA containing the encoded data. Library preparation consists of fragmenting the DNA into short pieces (if it is not already short), ligating adapter sequences to both ends, and amplifying the library by PCR. The library is then loaded onto a flow cell, where each fragment is immobilized and clonally amplified into a cluster. During sequencing, fluorescently labeled reversible-terminator nucleotides are incorporated one base per cycle, and a camera records the fluorescence emitted by each cluster. Specialized software converts these signals into a nucleotide sequence, with a quality score for each base, through a process called base calling.

The output of this technology is a large dataset of short DNA reads in digital format. These reads are assembled computationally to reconstruct the original encoded information. This approach provides high accuracy and scalability, which are critical for reliable data retrieval in DNA storage systems.
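To make the read-back step concrete, here is a minimal sketch of a toy DNA storage codec. It assumes the simplest possible mapping of two bits per base; the mapping table and the function names `encode_text` and `decode_dna` are illustrative assumptions, not a scheme from the lecture. Real storage systems add error correction and avoid long runs of the same base.

```python
# Toy DNA data-storage codec: 2 bits per base (illustrative mapping only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_text(text: str) -> str:
    """Encode UTF-8 text as a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_dna(dna: str) -> str:
    """Decode a DNA string produced by encode_text back into text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    message = "HTGAA"
    dna = encode_text(message)
    print(dna)              # the sequence that would be synthesized
    print(decode_dna(dna))  # what sequencing plus decoding should recover
```

Sequencing the stored DNA and running the decoder on the error-free reads should return the original message; any mismatch would indicate degradation or a read error.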

5.2 DNA Write

For DNA synthesis, I would design a genetic circuit encoding a radiation-protective protein system, inspired by extremophile organisms. Specifically, I would synthesize a codon-optimized gene encoding the Dsup protein along with regulatory elements that allow controlled expression in bacteria. This DNA could be used to study how protective proteins improve cellular resistance to radiation, which has applications in medicine and space exploration.

An example short segment of the synthesized DNA sequence could look like this:

ATGTCCGACCAGTCCCAGAAGCAGGAGAAGCTGAAGGAGGAGCTGAAGGCCAAGAAG
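Before ordering a sequence like this, it is worth checking it in silico. The sketch below translates the example segment with the standard genetic code and reports its GC content. The helper names `translate` and `gc_content` are my own, and the segment is only the illustrative sequence shown above, not a verified Dsup coding sequence.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative segment copied from the text above.
SEGMENT = "ATGTCCGACCAGTCCCAGAAGCAGGAGAAGCTGAAGGAGGAGCTGAAGGCCAAGAAG"


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_content(dna: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    print(translate(SEGMENT))
    print(f"GC content: {gc_content(SEGMENT):.2f}")
```

A check like this catches frame shifts and premature stop codons before the design is sent for synthesis.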

To synthesize this DNA, I would use commercial gene synthesis technology from Twist Bioscience. This technology relies on high-throughput chemical DNA synthesis using phosphoramidite chemistry and microarray-based oligonucleotide assembly.

The essential steps include chemical synthesis of short oligonucleotides, enzymatic assembly into longer fragments, error correction, and cloning into plasmid vectors. These fragments are then amplified and sequence-verified.

The main limitations of this synthesis method include potential synthesis errors in long sequences, cost for very large constructs, and technical limits on maximum fragment length. However, it offers excellent scalability and precision for gene-level synthesis.

5.3 DNA Edit

For DNA editing, I would focus on modifying genes that improve cellular resistance to radiation damage, similar to research being explored by companies such as Colossal Biosciences in the context of advanced genetic engineering. Editing such genes could have applications in protecting human cells during radiation therapy or long-duration space missions.

To perform these edits, I would use CRISPR-Cas9 genome editing technology. CRISPR works by using a guide RNA to direct the Cas9 enzyme to a specific DNA sequence. Cas9 creates a targeted double-strand break, and the cell’s repair machinery introduces modifications during the repair process.

The essential preparation steps include designing a guide RNA that matches the target sequence, constructing a plasmid or delivery system carrying Cas9 and the guide RNA, and introducing these components into cells. The inputs include the DNA template, Cas9 enzyme, guide RNA, and host cells.
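The guide-design step can be sketched in a few lines. The example below assumes the commonly used SpCas9 enzyme, which requires an NGG PAM immediately 3' of a 20-nucleotide target, and it scans only the forward strand. The function name `find_spcas9_sites` is hypothetical, and a real design tool would also scan the reverse strand and score off-target risk.

```python
import re


def find_spcas9_sites(dna: str, guide_len: int = 20):
    """Return (start, protospacer, pam) for each forward-strand NGG site."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [
        (match.start(), match.group(1), match.group(2))
        for match in pattern.finditer(dna)
    ]


if __name__ == "__main__":
    # Illustrative target: the example segment from section 5.2.
    target = "ATGTCCGACCAGTCCCAGAAGCAGGAGAAGCTGAAGGAGGAGCTGAAGGCCAAGAAG"
    for start, protospacer, pam in find_spcas9_sites(target):
        print(start, protospacer, pam)
```

Each reported protospacer is a candidate guide RNA sequence; choosing among them is where off-target analysis comes in.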

The main limitations of CRISPR editing include off-target effects, incomplete editing efficiency, and challenges in delivering the editing machinery into certain cell types. Despite these limitations, CRISPR remains one of the most powerful and precise genome editing tools available.

GammaShroom

I hope you haven’t forgotten the project I proposed in HW1. If you don’t know what I’m talking about, take a look at HW1; it’s above Week 2. I mention it because I’d like to explain how HW2 helps with implementing what we saw in HW1. HW2 extends the conceptual ideas introduced in the GammaShroom proposal from HW1 by translating them into the theoretical and computational foundations of modern genetic engineering, even in the absence of a physical laboratory. While my node did not perform wet-lab experiments, the simulation and design components of HW2 still develop the core competencies required to engineer biological systems like GammaShroom. By studying how restriction enzymes cut DNA at specific recognition sites and how virtual gel electrophoresis predicts fragment patterns, we learn how engineered genetic constructs can be analyzed and validated in silico before any real-world implementation. This type of predictive modeling is a critical first step in synthetic biology, where careful planning and verification reduce experimental uncertainty.

More importantly, the DNA read/write/edit framework explored in HW2 directly supports the long-term development of engineered organisms capable of radiation resistance and environmental adaptation. Designing codon-optimized genes, selecting expression systems, and understanding how DNA can be precisely modified provide the technical roadmap for implementing protective genetic features similar to those envisioned in the GammaShroom system. Even without executing the laboratory protocol, engaging with these workflows conceptually builds an understanding of how engineered DNA moves from digital design to functional biological systems. In this way, HW2 bridges the gap between speculative bioengineering concepts and the structured methodology required to realize them, reinforcing how computational design and molecular planning underpin any future experimental work.

Prompt used for the task

If you saw my HW1, you’ll have noticed that I also included some of the prompts I used to complete the task. I do this to show that AI is a very useful tool for supporting projects, and it’s something that personally helps me a lot to organize myself much better.

For the homework: “Please organize and synthesize the following information from my assignment (Part 3: DNA Design Challenge and Part 5: DNA Read/Write/Edit) into a clear, structured academic format.

Your goals are:

Group related concepts into logical sections and subsections

Remove redundancy while preserving all important scientific details

Use clear headings and transitions between ideas

Maintain scientific accuracy and an academic tone

Add short explanations that connect concepts when needed

Highlight key technical terms (DNA sequencing, codon optimization, gene expression, etc.)”

“Please rewrite the following scientific text to improve clarity, flow, and academic quality.

Your goals are:

Use more precise scientific vocabulary and appropriate synonyms

Improve sentence structure and transitions

Maintain the original meaning and technical accuracy

Avoid unnecessary repetition

Use a formal academic tone suitable for a university assignment

Keep explanations clear and accessible

Expand brief sections slightly if needed to improve coherence

Do not add new scientific claims — only refine and strengthen the writing.”

For the picture (Gemini):

“Hello, please examine the image I provided. It represents a DNA sequence modified by restriction enzyme digestion, producing a distinct band pattern. Could you generate an artistic image inspired by the visual structure and composition of this pattern?”

Week 3 — Lab Automation

Assignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME! Your task this week is to Create a Python file to run on an Opentrons liquid handling robot.

Review this week’s recitation and this week’s lab for details on the Opentrons and programming it. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept. If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead. Ask for help early! If you are having any trouble with scripting, contact your TAs as soon as possible for help. Do not wait until your scheduled robot time slot or you may not be able to complete this assignment!

If the Python component is proving too problematic even with AI and human assistance, download the full Python script from the GUI website and submit that: Use the download icon pointed to by the red arrow in this diagram.

If you use AI to help complete this homework or lab, document how you used AI and which models made contributions. Sign up for a robot time slot if you are at MIT/Harvard/Wellesley or at a Node offering Opentrons automation. The Python script you created will be run on the robot to produce your work of art! At MIT/Harvard? Lab times are on Thursday Feb.19 between 10AM and 6PM. At other Nodes? Please coordinate with your Node. Submit your Python file via this form.

Hello again, friend. I hope you’ve been enjoying what I’ve been doing week by week. In this first part of HW3, I’ll showcase the art that can be created with Python code and the Opentrons.

First, let’s start with the artwork I created in the Opentrons art GUI. I really enjoyed making this piece because it reminds me of pixel art: it shows the Pokémon Charizard sleeping with a Luxury Ball beside it. If you’d like to see it more clearly and check the coordinates and fonts I used, you can find it under the name SleepingCharizard.


I also wrote a design in Python so that a robot could draw it in my node’s lab. I wrote the code in Google Colab and used ChatGPT for help. I know there are better AI tools for coding, but all I needed were some coordinate points for my variables, so ChatGPT was sufficient for that part of the code. The first block of code is this:

from opentrons import types

metadata = {    # see https://docs.opentrons.com/v2/tutorial.html#tutorial-metadata
    'author': 'Sergio Cuiza',
    'protocolName': 'WH3: Art Laboratory',
    'description': 'Draw a bitmap pattern on the agar plate using different colors for each pixel, leaving everything to your imagination.',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants - don't change these
##############################################################################

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

well_colors = {
    'A1' : 'Red',
    'B1' : 'Green',
    'C1' : 'Orange'
}


def run(protocol):
  ##############################################################################
  ###   Load labware, modules and pipettes
  ##############################################################################

  # Tips
  tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

  # Pipettes
  pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

  # Modules
  temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)

  # Temperature Module Plate
  temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul',
                                                      'Cold Plate')
  # Choose where to take the colors from
  color_plate = temperature_plate

  # Agar Plate
  agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')  ## TA MUST CALIBRATE EACH PLATE!
  # Get the top-center of the plate, make sure the plate was calibrated before running this
  center_location = agar_plate['A1'].top()

  pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

  ##############################################################################
  ###   Patterning
  ##############################################################################

  ###
  ### Helper functions for this lab
  ###

  # pass this e.g. 'Red' and get back a Location which can be passed to aspirate()
  def location_of_color(color_string):
    for well,color in well_colors.items():
      if color.lower() == color_string.lower():
        return color_plate[well]
    raise ValueError(f"No well found with color {color_string}")

  # For this lab, instead of calling pipette.dispense(1, loc) use this: dispense_and_detach(pipette, 1, loc)
  def dispense_and_detach(pipette, volume, location):
      """
      Move laterally 5mm above the plate (to avoid smearing a drop); then drop down to the plate,
      dispense, move back up 5mm to detach drop, and stay high to be ready for next lateral move.
      5mm because a 4uL drop is 2mm diameter; and a 2deg tilt in the agar pour is >3mm difference across a plate.
      """
      assert(isinstance(volume, (int, float)))
      above_location = location.move(types.Point(z=location.point.z + 5))  # 5mm above
      pipette.move_to(above_location)       # Go to 5mm above the dispensing location
      pipette.dispense(volume, location)    # Go straight downwards and dispense
      pipette.move_to(above_location)       # Go straight up to detach drop and stay high

  ###
  ### YOUR CODE HERE to create your design
  ###
  azurite_points = [
    (-8.8, 8.8),
    (-6.6, 6.6),
    (-4.4, 4.4),
    (-2.2, 2.2),
    (0, 0),
    (2.2, -2.2),
    (4.4, -4.4),
    (6.6, -6.6),
    (8.8, -8.8),

    (-8.8, -8.8),
    (-6.6, -6.6),
    (-4.4, -4.4),
    (-2.2, -2.2),
    (2.2, 2.2),
    (4.4, 4.4),
    (6.6, 6.6),
    (8.8, 8.8)
  ]

  mtagbfp2_points = [
    (-2.2, 4.4),
    (2.2, 4.4),
    (-4.4, -2.2),
    (4.4, -2.2)
  ]

  mplum_points = [
    (0, 8.8),
    (2.2, 8.8),
    (-2.2, 8.8)
  ]

  mlychee_tf_points = [
    (-6.6, 4.4),
    (6.6, 4.4),
    (-6.6, -2.2),
    (6.6, -2.2)
  ]

  mruby2_points = [
    (8.8, 2.2),
    (8.8, 0),
    (8.8, -2.2)
  ]

  mko2_points = [
    (-8.8, 2.2),
    (-8.8, 0),
    (-8.8, -2.2)
  ]

  eqfp578_points = [
    (0, 11),
    (-2.2, 11),
    (2.2, 11)
  ]

  mrfp1_points = [
    (4.4, 8.8),
    (6.6, 6.6)
  ]

  mcherry_points = [
    (-4.4, 8.8),
    (-6.6, 6.6)
  ]

  mkate2_points = [
    (0, -8.8),
    (0, -11)
  ]

  # =========================
  # AZURITE (Red)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(azurite_points), location_of_color('Red'))

  for x_coord, y_coord in azurite_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # mTagBFP2 (Green)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mtagbfp2_points), location_of_color('Green'))

  for x_coord, y_coord in mtagbfp2_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # mPlum (Orange)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mplum_points), location_of_color('Orange'))

  for x_coord, y_coord in mplum_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()
  # =========================
  # mPlum (Orange), second pass over the same points
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mplum_points), location_of_color('Orange'))

  for x_coord, y_coord in mplum_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()
  # =========================
  # mPlum (Orange), third pass over the same points
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mplum_points), location_of_color('Orange'))

  for x_coord, y_coord in mplum_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()
  # =========================
  # mLychee_tf (Red)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mlychee_tf_points), location_of_color('Red'))

  for x_coord, y_coord in mlychee_tf_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # mRuby2 (Green)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mruby2_points), location_of_color('Green'))

  for x_coord, y_coord in mruby2_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # mKO2 (Orange)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mko2_points), location_of_color('Orange'))

  for x_coord, y_coord in mko2_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # eqFP578 (Red)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(eqfp578_points), location_of_color('Red'))

  for x_coord, y_coord in eqfp578_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # mRFP1 (Green)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mrfp1_points), location_of_color('Green'))

  for x_coord, y_coord in mrfp1_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # mCherry (Orange)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mcherry_points), location_of_color('Orange'))

  for x_coord, y_coord in mcherry_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()

  # =========================
  # mKate2 (Red)
  # =========================
  pipette_20ul.pick_up_tip()
  pipette_20ul.aspirate(len(mkate2_points), location_of_color('Red'))

  for x_coord, y_coord in mkate2_points:
    target_location = center_location.move(types.Point(x=x_coord, y=y_coord))
    dispense_and_detach(pipette_20ul, 1, target_location)

  pipette_20ul.drop_tip()
  # Don't forget to end with a drop_tip()

The second block of code was provided by the course and is left unchanged:

# Execute Simulation / Visualization -- don't change this code block
protocol = OpentronsMock(well_colors)
run(protocol)
protocol.visualize()

The result I got from running the code is this:

=== VOLUME TOTALS BY COLOR ===
	Green:		 aspirated 9	 dispensed 9
	Red:		 aspirated 26	 dispensed 26
	Orange:		 aspirated 14	 dispensed 14
	[all colors]:	[aspirated 49]	[dispensed 49]
=== TIP COUNT ===
	 Used 12 tip(s)  (ideally exactly one per unique color)

I’m not fully happy with the result. I wanted to draw something like a radiation mask, and I don’t see much of a resemblance, but I ran out of time to keep iterating. I also wanted to reproduce the design I had made in the Opentrons GUI, but since the pipette holds at most 20 µL per aspiration, I couldn’t draw it with a single aspiration per color. I know it could be achieved by splitting the design across more variables, but I didn’t have the time. Here is the link to my Google Colab project.
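One way around the 20 µL limit is to split each color's points into batches that fit in a single aspiration, instead of defining a separate variable for each group by hand. This is a minimal sketch in plain Python. The helper name `batches` and the 45-point example design are assumptions for illustration, and it has not been run on the robot.

```python
def batches(points, max_volume_ul=20, drop_volume_ul=1):
    """Split points into consecutive groups that fit in one aspiration."""
    per_batch = int(max_volume_ul // drop_volume_ul)
    if per_batch < 1:
        raise ValueError("drop volume exceeds the pipette capacity")
    return [points[i:i + per_batch] for i in range(0, len(points), per_batch)]


if __name__ == "__main__":
    # 45 hypothetical pixels of a single color, spaced 2.2 mm apart.
    design = [(x * 2.2, 0) for x in range(45)]
    for group in batches(design):
        # In the protocol, each group would be one aspirate() of
        # len(group) * drop volume, followed by one dispense_and_detach()
        # call per point, all with the same tip.
        print(len(group))
```

Inside `run()`, the same tip can be kept for all batches of one color, which would also bring the tip count closer to the ideal of one tip per color.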

The code has not yet been run on the robot because my node’s lab sessions will only be held in week 4.

Post-Lab Questions — DUE BY START OF FEB 24 LECTURE One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

For this week, we’d like for you to do the following:

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details. While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.

Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory.

Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell-free protein synthesis.

Echo transfer biosensor constructs and any required cofactors into specified wells.

Bravo stamp in CFPS reagent master mix into all wells of a 96-well / 384-well plate.

Multiflo dispense the CFPS lysate to all wells to start protein expression.

PlateLoc seal the plate.

Inheco incubate the plate at 37°C while the biosensor proteins are synthesized.

XPeel remove the seal.

PHERAstar measure fluorescence to compare biosensor responses.

1. Published Paper Using Opentrons / Lab Automation

A published paper that utilizes automation tools similar to Opentrons for novel biological applications is “Programming a Low-Cost, Open-Source Robot for High-Throughput Biology”. In this study, researchers implemented an open-source liquid-handling robot to automate repetitive laboratory tasks such as pipetting, reagent mixing, and plate preparation.

The automation system allowed the researchers to conduct high-throughput biological experiments with improved accuracy and reproducibility compared to manual lab work. By programming the robot using Python-based protocols, the team was able to standardize workflows including serial dilutions, reaction setup, and sample transfers across 96-well plates.

A key novel biological application demonstrated in the paper is the scaling of experimental workflows in synthetic biology and molecular biology. Automation reduced human error, increased experimental consistency, and enabled remote experiment execution. This is especially useful for screening large numbers of biological samples, which would otherwise be time-consuming and prone to variability if performed manually.

Overall, the paper shows that open-source automation tools like Opentrons can significantly enhance experimental precision, accessibility, and scalability in modern biological research, making them valuable for applications such as biosensor screening, protein expression experiments, and automated assay development.

2. Description of What I Intend to Automate for My Final Project

For my final project, I intend to use lab automation tools to systematically investigate radiation-protective biological mechanisms inspired by extremophiles, including melanized fungi discovered in high-radiation environments and protective proteins such as Dsup.

The core idea is to automate a comparative experimental workflow that evaluates how biological samples (e.g., protein systems or biomaterial coatings) respond to simulated stress conditions, including oxidative and radiation-like damage proxies.

What I Would Automate

The automation system (e.g., Opentrons OT-2 + cloud lab tools like Ginkgo Nebula) would be used to:

- Precisely prepare reagent mixtures

- Dispense samples into multi-well plates

- Run parallel stress-condition assays

- Standardize incubation and measurement steps

- Collect reproducible quantitative data

This is particularly useful because my node did not perform a physical laboratory experiment, so automation provides a conceptual framework for how the experimental design could be executed remotely and reproducibly.

3. Proposed Automated Workflow (Conceptual)

First, I would design a 3D-printed holder to stabilize specialized sample substrates (such as coated slides or biomaterial samples) so the robot can deposit reagents with spatial precision. This ensures consistent sample positioning and minimizes mechanical variation during automated pipetting.

Then, the automated workflow would proceed as follows:

Transfer prepared biomaterial or protein samples into a 96-well plate using calibrated pipetting protocols.

Dispense controlled concentrations of stress-inducing reagents (e.g., oxidative agents that simulate radiation-induced damage).

Add protective components inspired by extremophile systems (such as melanin analogs or Dsup-related protein constructs).

Seal and incubate the plate under standardized temperature conditions.

Measure fluorescence, absorbance, or structural stability metrics using automated plate readers.

Export and analyze the dataset to compare protective efficiency across conditions.
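Before any pipetting, the conditions in the steps above have to be assigned to wells. The sketch below shows one way to lay out replicates on a 96-well plate. The function name `plate_layout` and the four condition labels are hypothetical placeholders, not a finalized experimental design.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A to H of a 96-well plate
COLUMNS = range(1, 13)             # columns 1 to 12


def plate_layout(conditions, replicates=3):
    """Assign each condition to consecutive wells, filling row by row."""
    wells = [f"{row}{col}" for row in ROWS for col in COLUMNS]
    if len(conditions) * replicates > len(wells):
        raise ValueError("too many wells requested for a 96-well plate")
    well_iter = iter(wells)
    return {
        condition: [next(well_iter) for _ in range(replicates)]
        for condition in conditions
    }


if __name__ == "__main__":
    # Hypothetical conditions for the stress-response comparison.
    conditions = [
        "no stress",
        "stress only",
        "stress + melanin analog",
        "stress + Dsup construct",
    ]
    for condition, wells in plate_layout(conditions).items():
        print(condition, wells)
```

Keeping the layout as data makes it easy to pass the same well names to both the pipetting protocol and the later plate-reader analysis.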

4. Example Pseudocode (Opentrons-Style Automation Script)

from opentrons import protocol_api

metadata = {
    'protocolName': 'Automated Stress Response Assay',
    'author': 'Student Project',
    'description': 'Automated preparation of stress-response samples',
    'apiLevel': '2.13'
}

def run(protocol: protocol_api.ProtocolContext):

    # Labware setup (deck slots 1 to 3)
    plate = protocol.load_labware('corning_96_wellplate_360ul_flat', '1')
    tiprack = protocol.load_labware('opentrons_96_tiprack_300ul', '2')
    reservoir = protocol.load_labware('nest_12_reservoir_15ml', '3')

    # The GEN2 P300 covers 20-300 uL, so the 20 uL transfers below are in range
    pipette = protocol.load_instrument('p300_single_gen2', 'right', tip_racks=[tiprack])

    # Reagent locations
    sample = reservoir.wells()[0]
    stress_agent = reservoir.wells()[1]
    protective_solution = reservoir.wells()[2]

    # Automated distribution loop.
    # transfer() picks up and drops its own tips, so no manual pick_up_tip()
    # or drop_tip() is needed. A fresh tip per reagent keeps the shared
    # reservoir wells uncontaminated: 24 wells x 3 reagents = 72 tips,
    # which fits in one 96-tip rack.
    for well in plate.wells()[:24]:
        pipette.transfer(50, sample, well, new_tip='always')
        pipette.transfer(20, stress_agent, well, new_tip='always')
        pipette.transfer(20, protective_solution, well, new_tip='always', mix_after=(3, 50))

This script demonstrates how automation ensures precise volume control, repeatability, and scalable experimentation.

5. Role of Cloud Automation (Ginkgo Nebula)

Using a cloud laboratory platform like Ginkgo Nebula would allow remote experiment deployment without needing a physical lab setup. I could upload experimental designs, specify reagent combinations, and run high-throughput assays in parallel. This aligns with the project constraints, since the experimental work in my node was conceptual rather than physically executed.

Cloud automation would also:

Enable large-scale parameter screening

Reduce human error in pipetting and timing

Provide standardized datasets for analysis

Allow iterative experimental optimization based on previous results

6. Why Automation is Critical for This Project

Automation directly supports the scientific objectives by improving experimental precision, reproducibility, and scalability. In projects related to radiation tolerance, stress-response biology, or protective biomolecules, small inconsistencies in reagent handling or incubation can produce misleading results. Automated robotic systems eliminate much of this variability and allow controlled, repeatable experimental design.

Additionally, automation enables remote experimentation and systematic testing of multiple protective conditions, which is especially relevant when investigating biological mechanisms inspired by extremophiles and radiation-resistant systems. This makes the project more rigorous, technically feasible, and aligned with modern synthetic biology and bioengineering workflows.

Final Project Ideas — DUE BY START OF FEB 24 LECTURE As explained in this week’s recitation, add 1-3 slides in your Node’s section of this slide deck with 3 ideas you have for an Individual Final Project. Be sure to put your name, city, and country on your slide!

Slide 1 — Ideas For My Final Project

Individual Final Project Ideas – GammaShroom Automation

Name: Sergio Cuiza

City, Country: Cochabamba, Bolivia

Node: SynBio USFQ

My project focuses on GammaShroom, a concept inspired by radiation-resistant fungi and extremophile biology, and how lab automation could optimize experimental design, reproducibility, and remote testing workflows even without direct wet-lab execution.

Slide 2 — Idea 1: Automated Growth Condition Screening for GammaShroom

My first project idea is to design an automated workflow to screen different growth conditions for a radiation-resistant fungal model (GammaShroom concept). Using an Opentrons liquid-handling robot, the system would prepare multiple media compositions, dispense samples into 96-well plates, and standardize experimental setups.

The automation would:

Precisely distribute media with different nutrient concentrations

Control replicates to reduce variability

Enable parallel condition testing

Improve reproducibility of extremophile growth experiments

Even if my node did not perform the wet lab, the protocol design could be deployed remotely in a cloud lab environment, allowing scalable experimentation without manual pipetting errors.
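As a minimal sketch of the replicate and layout planning described in this idea, the following plain-Python helper maps conditions to wells of a 96-well plate. It does not use the Opentrons API; the function name `plate_layout` and the glucose/nitrogen levels are hypothetical placeholders chosen only for illustration.

```python
from itertools import product

ROWS = "ABCDEFGH"          # 8 rows of a standard 96-well plate
COLUMNS = range(1, 13)     # 12 columns


def plate_layout(conditions, replicates):
    """Assign each condition to `replicates` consecutive wells, filling row by row."""
    wells = [f"{row}{col}" for row, col in product(ROWS, COLUMNS)]
    needed = len(conditions) * replicates
    if needed > len(wells):
        raise ValueError(f"{needed} wells needed, but a plate has only {len(wells)}")
    layout = {}
    for index, condition in enumerate(conditions):
        for rep in range(replicates):
            layout[wells[index * replicates + rep]] = condition
    return layout


# Hypothetical screen: 4 glucose levels x 3 nitrogen levels, 4 replicates each.
conditions = [f"glucose {g} g/L, nitrogen {n} g/L"
              for g, n in product([5, 10, 20, 40], [0.5, 1, 2])]
layout = plate_layout(conditions, replicates=4)
print(len(layout), "wells used")
print("A1 ->", layout["A1"])
print("A5 ->", layout["A5"])
```

A dictionary like this could then drive the dispensing loop of a robot protocol, so that the well assignment is written once and reused for every run.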

Slide 3 — Idea 2: Automated Bio-Pigment Production Screening (Melanin & Radioprotection)

Many radiation-resistant fungi produce melanin-like pigments that may contribute to radiotolerance. My project proposes using automation to screen pigment production efficiency under different environmental conditions.

Automated workflow:

Robotically prepare multiple culture media compositions

Dispense fungal samples into microplates

Incubate under controlled conditions

Measure pigmentation changes using plate reader absorbance

This would connect directly to the GammaShroom concept by exploring the biological mechanisms that could explain radiation resistance in fungal systems.

Slide 4 — Idea 3: Custom 3D-Printed Holder for Non-Standard Fungal Samples

My third idea is to design a custom 3D-printed holder compatible with the Opentrons deck to stabilize unconventional sample containers used for fungal or bio-inspired materials like GammaShroom substrates.

The holder would:

Secure irregular culture containers

Allow precise reagent deposition

Maintain positional accuracy during automated pipetting

Enable consistent spatial experimental layouts

This hardware + automation integration is especially useful for bio-inspired projects where standard lab plates may not match the experimental material format.

Week 4 — Protein Design Part I

Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Let’s break this down step-by-step.

Understanding a Dalton: A Dalton (Da) is another name for the atomic mass unit. It’s the approximate mass of a single proton or neutron. So, an amino acid of ~100 Da means one molecule has a mass of about 100 atomic mass units.

Avogadro’s Number and Moles: Chemistry connects the microscopic world (molecules) to the macroscopic world (grams) using the mole. One mole of a substance contains Avogadro’s number of molecules (6.022×10^23 molecules) and has a mass in grams equal to its molecular weight in Daltons.

This means 1 mole of amino acids (with an average weight of 100 Da) would weigh 100 grams.

Making the Calculation:

Find the number of moles: If 100 grams is 1 mole, then 500 grams is 5 moles of amino acids: 500 g ÷ 100 g/mol = 5 mol.

Find the number of molecules: Multiply the number of moles by Avogadro’s number: 5 × 6.022×10^23 = 3.01×10^24.

Answer: You would ingest approximately 3.01×10^24 molecules of amino acids from a 500 g piece of meat, treating the whole mass as amino acids as the question implies. (That’s roughly 3,000,000,000,000,000,000,000,000 molecules!)
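The same arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above; like the question, it assumes the full 500 g is amino acids with an average mass of 100 Da.

```python
AVOGADRO = 6.022e23        # molecules per mole
mass_g = 500.0             # mass of the meat, treated entirely as amino acids
avg_molar_mass = 100.0     # g/mol, the ~100 Da average given in the question

moles = mass_g / avg_molar_mass
molecules = moles * AVOGADRO
print(f"{moles:.0f} mol -> {molecules:.3e} molecules")
```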

Why do humans eat beef but do not become a cow, eat fish but do not become fish?

This is a great philosophical and biological question. The simple answer is that you are what you digest, not what you eat.

The complex answer lies in the process of digestion:

Breaking Down: The beef or fish you eat contains cow-specific or fish-specific proteins. Your digestive system (stomach acid and enzymes like pepsin and trypsin) breaks these large, complex proteins down into their individual amino acid building blocks. The species-specific information is destroyed.

Absorption: These individual amino acids are absorbed into your bloodstream through your small intestine.

Rebuilding: Your cells, following the instructions in your human DNA, take those generic amino acids and assemble them into human proteins (human muscle, human enzymes, human hair, etc.).

When we eat beef:

Proteins are denatured in the stomach.

Proteases hydrolyze peptide bonds.

Proteins are reduced to amino acids and small peptides.

These amino acids enter our bloodstream.

At that point, they are no longer “cow proteins” — they are simply amino acids.

Cells then synthesize proteins using:

DNA → mRNA → Ribosome → Protein

The blueprint for protein synthesis is encoded in our genome. The amino acids are universal building blocks. Identity is not determined by molecular components, but by genetic information and regulatory networks.

We recycle matter, but we do not transfer biological identity.

So, you are not assembling cow proteins; you are using the raw materials (amino acids) from the cow to build human proteins according to the human blueprint. The same goes for the cow, which built its own proteins from the grass it ate.

Why are there only 20 natural amino acids?

This is one of the most fundamental questions in biochemistry. The “standard” 20 are often called the “canonical” amino acids. There isn’t one single, simple reason, but rather a combination of evolutionary history and chemical practicality:

Chemical Diversity: The 20 amino acids provide a remarkable range of chemical functionality needed for life: hydrophobic (water-fearing) ones for folding, charged ones for interactions and catalysis, polar ones for solubility, and special ones like glycine (flexible) and proline (rigid).

Fidelity in Translation: The genetic code is built on triplets of DNA/RNA bases (codons). A triplet code has 4^3 = 64 possible codons, far more than the 20 amino acids (plus stop signals) that need to be specified. Using only 20 allows for redundancy (multiple codons for the same amino acid), which minimizes the damaging effect of mutations. Adding more amino acids would require a more complex and error-prone decoding system.

Historical “Frozen Accident”: Nobel laureate Francis Crick proposed that the genetic code might be a “frozen accident.” Once the system for translating 20 amino acids was established in the last universal common ancestor (LUCA), any mutation that tried to introduce a new amino acid would likely be disastrous, as it would alter the sequence of every single protein in the cell. The system became fixed.

Amino Acid Availability: It’s thought that many of these 20 amino acids were readily formed under prebiotic Earth conditions (see question 5), making them available for the first life forms to use.
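The codon-counting argument under “Fidelity in Translation” can be verified by enumeration. The short sketch below lists every possible codon of length 1, 2, and 3 and shows that three bases is the shortest length that can cover 20 amino acids.

```python
from itertools import product

BASES = "ACGU"

for length in (1, 2, 3):
    words = ["".join(p) for p in product(BASES, repeat=length)]
    verdict = "enough" if len(words) >= 20 else "too few"
    print(f"codon length {length}: {len(words)} codons ({verdict} for 20 amino acids)")

# All triplet codons, e.g. for building a codon table later.
codons = ["".join(p) for p in product(BASES, repeat=3)]
```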

Can you make other non-natural amino acids? Design some new amino acids.

Absolutely! This is an active area of synthetic biology, often called genetic code expansion. Chemists can synthesize thousands of “non-canonical amino acids” (ncAAs). The trick is getting them into proteins, which requires engineering the cell’s translation machinery.

Here are a few designs for new amino acids with potentially useful properties:

Design 1: The “Glow-in-the-Dark” Amino Acid. Attach a small, highly fluorescent organic molecule (like a dansyl group or a BODIPY dye) to the side chain of an existing amino acid like lysine. This would allow scientists to track the protein’s location and movements in a living cell without needing to attach a separate, bulky fluorescent tag later.

Design 2: The “Photo-Crosslinker” Amino Acid. Incorporate a side chain with a diazirine or benzophenone group. When you shine UV light on the cell, this group becomes highly reactive and forms a permanent chemical bond with whatever protein or molecule is next to it. This is like taking a “molecular snapshot” of protein interactions.

Design 3: The “Infrared (IR) Probe” Amino Acid. Modify the amino acid to contain an unusual chemical bond, like a carbon-deuterium bond or an azido group (-N₃). These bonds vibrate at frequencies in the IR spectrum that are distinct from the natural bonds in proteins. This allows researchers to use IR spectroscopy to watch very specific local movements in a protein as it functions.

Where did amino acids come from before enzymes that make them, and before life started?

This is the question of abiogenesis (life from non-life). The leading theory is that they formed through prebiotic or abiotic synthesis.

The classic experiment is the Miller-Urey experiment (1953). They simulated early Earth conditions in a flask:

An “atmosphere” of methane, ammonia, hydrogen, and water vapor.

Electrical sparks to simulate lightning.

A condenser to cool the atmosphere and create rain.

After running the experiment for a week, they found that simple organic molecules, including several amino acids (like glycine and alanine), had formed spontaneously from these inorganic ingredients.

Since then, other pathways have been discovered, showing that amino acids can form:

Near deep-sea hydrothermal vents.

From the delivery of organic compounds by comets and meteorites (analysis of the Murchison meteorite found over 80 different amino acids, some of which are not used by life on Earth).

So, the building blocks for life were likely “cooking” naturally on the early Earth or delivered from space, long before the first cells or enzymes existed.

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

You would expect a left-handed α-helix.

A standard α-helix found in nature, made from L-amino acids, is right-handed. This is because the chirality (handedness) of the amino acid dictates the twist of the helix it can most stably form. If you build the mirror image of a protein—using D-amino acids—you will get the mirror image of its structure. So, a helix made of D-amino acids will be the mirror image of a natural helix: a left-handed helix.
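The mirror-image argument can be illustrated numerically. The sketch below builds idealized helix points using approximate α-helix geometry (about 100° of rotation and 1.5 Å of rise per residue, radius about 2.3 Å), then reflects them through a plane. The function names are illustrative, and the handedness test (sign of a triple product of consecutive steps) is one simple choice among several.

```python
import math


def helix(n_points, radius=2.3, rise=1.5, turn_deg=100.0):
    """Points on a helix advancing along +z; these defaults give a right-handed helix."""
    step = math.radians(turn_deg)
    return [(radius * math.cos(i * step), radius * math.sin(i * step), rise * i)
            for i in range(n_points)]


def handedness(points):
    """Sign of the triple product of three consecutive steps: +1 right, -1 left."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = [
        tuple(q - p for p, q in zip(points[i], points[i + 1])) for i in range(3)
    ]
    triple = (ax * (by * cz - bz * cy)
              - ay * (bx * cz - bz * cx)
              + az * (bx * cy - by * cx))
    return 1 if triple > 0 else -1


l_helix = helix(8)                                  # stands in for an L-amino acid helix
d_helix = [(-x, y, z) for x, y, z in l_helix]       # mirror image (D-amino acids)
print("L helix:", handedness(l_helix), " D helix:", handedness(d_helix))
```

Reflecting any one coordinate flips the sign of the triple product, which is the geometric reason a mirror-image peptide gives a helix of opposite handedness.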

Can you discover additional helices in proteins?

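Yes, and in fact, they have been discovered! While the α-helix is the most common, there are others. They are classified by their hydrogen bonding pattern using an n_m notation, where n is the number of residues per turn and m is the number of atoms in the ring closed by each hydrogen bond (so a 3₁₀-helix has 3 residues per turn and a 10-atom ring).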

3₁₀-Helix: Tighter, more slender helix. It’s often found at the ends of α-helices as a sort of “capping” motif.

α-Helix: The classic one we’ve discussed (3.6₁₃-helix in precise notation).

π-Helix: A wider, looser helix. It’s very rare in natural proteins because it creates an unstable “hole” down the center, but it can be important in the function of some enzymes.

These were discovered by analyzing the atomic-resolution structures of proteins using techniques like X-ray crystallography.

But theoretical backbone conformations suggest other stable geometries may exist.

Through:

Computational protein design

Non-natural amino acids

Backbone modifications

We may access alternative helical parameters (pitch, hydrogen bonding pattern, radius).

Nature selected the most stable and efficient helices, but chemistry allows more possibilities.

Why are most molecular helices right-handed?

This is a profound question that ties into the origins of life’s homochirality (the fact that life uses almost exclusively L-amino acids and D-sugars). There’s no single, universally accepted answer, but here are the leading ideas:

The “Packing” Argument: In an α-helix made of L-amino acids, the side chains (the R-groups) pack more comfortably and with less steric hindrance when the helix twists to the right. A left-handed helix with L-amino acids would cause many side chains to clash with the protein backbone.

Fundamental Physics (Weak Nuclear Force): A tiny, almost immeasurable energy difference exists between L and D forms of amino acids due to the weak nuclear force (the force responsible for radioactive decay). This force is inherently chiral. Some theories propose that this minute difference, amplified over millions of years of evolution, could have biased life towards one handedness. This is still highly speculative.

Chance and Contingency: It could simply be a historical accident. The first self-replicating system happened to use L-amino acids and right-handed helices, and all life descended from it. Once this bias was established, it was locked in because switching handedness would require rebuilding all of biochemistry.

Why do β-sheets tend to aggregate?

β-sheets aggregate because their structure is perfectly set up for intermolecular hydrogen bonding.

A single β-strand is an extended peptide chain. If it’s all by itself, the amino acids in that strand would “prefer” to form hydrogen bonds, but there’s no partner. Therefore, these exposed backbone amides (N-H) and carbonyls (C=O) are like sticky patches looking for a partner. They can find that partner by interacting with another β-strand. This forms a stable, sheet-like structure. If this happens between different protein molecules, they aggregate. β-sheets expose backbone hydrogen bonding sites along extended strands.

When multiple strands align:

Hydrogen bonds form between chains.

Flat surfaces allow tight packing.

Hydrophobic residues cluster.

Unlike α-helices, β-strands can extend indefinitely.

Aggregation becomes energetically favorable because:

More hydrogen bonds form.

Solvent-exposed hydrophobic area decreases.

What is the driving force for β-sheet aggregation?

The primary driving force is the burial of hydrophobic surface area.

While the hydrogen bonds provide specificity and stability, the main reason aggregation happens spontaneously is the hydrophobic effect. In an aqueous (watery) environment, the hydrophobic side chains of the amino acids in the β-strands want to get away from the water and cluster together. By coming together and forming a sheet, the hydrophobic regions on one side of the sheet can pack against the hydrophobic regions of another sheet or another part of the protein, effectively hiding them from water. The hydrogen bonds then lock this arrangement in place.

Thermodynamically:

ΔG = ΔH − TΔS

Favorable factors:

Strong backbone hydrogen bonding (enthalpic gain)

Hydrophobic collapse (entropy gain from water release)

Van der Waals stacking

Cooperative intermolecular stabilization

The release of structured water around hydrophobic residues significantly contributes to entropy gain.

Thus, aggregation lowers free energy.
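As a small worked example of ΔG = ΔH − TΔS, the sketch below plugs in made-up numbers with the signs described above (favorable enthalpy, favorable entropy from water release). The values are hypothetical and chosen only to show how both terms push ΔG negative; they are not measured data for any peptide.

```python
def gibbs_free_energy(delta_h, delta_s, temperature):
    """Return ΔG = ΔH − TΔS (ΔH in kJ/mol, ΔS in kJ/(mol·K), T in K)."""
    return delta_h - temperature * delta_s


# Hypothetical, illustrative values only (not experimental data).
delta_h = -20.0        # kJ/mol, enthalpic gain from hydrogen bonding
delta_s = 0.05         # kJ/(mol·K), entropy gain from released water
temperature = 310.15   # K, body temperature

delta_g = gibbs_free_energy(delta_h, delta_s, temperature)
print(f"ΔG = {delta_g:.2f} kJ/mol ->", "spontaneous" if delta_g < 0 else "not spontaneous")
```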

Why do many amyloid diseases form β-sheets?

Amyloid diseases (like Alzheimer’s, Parkinson’s, and Huntington’s) are characterized by proteins misfolding and aggregating into long, unbranched fibrils. The core structure of these fibrils is a highly ordered stack of β-sheets, often called cross-β structure.

Misfolded proteins:

Expose hydrophobic segments.

Lose native folding constraints.

Rearrange into β-sheet–rich fibrils.

β-sheet fibrils form a “cross-β” structure:

Strands perpendicular to fiber axis

Hydrogen bonds parallel to fiber axis

This architecture is:

Extremely stable

Resistant to proteolysis

Self-propagating

The stability that makes β-sheets useful structurally also makes them dangerous when misregulated.

The reason is that many proteins, under stress or due to a mutation, can partially unfold. This exposes short stretches of their sequence that are particularly “sticky” and prone to forming β-strands. These sticky segments can then interact with the same sticky segment on another protein molecule. Once two or three come together, they form a “nucleus” that acts as a template, rapidly recruiting more of the misfolded protein and forcing it into the same pathogenic β-sheet-rich structure. This structure is incredibly stable, like a stiff piece of plastic, and is resistant to the cell’s normal machinery for breaking down proteins.

Can you use amyloid β-sheets as materials?

Yes! This is a cutting-edge area of nanobiotechnology. Scientists are taking the incredible stability and self-assembling properties of amyloid fibrils and harnessing them for good. The building blocks are typically not the disease-causing proteins but short designed peptides that form the same structure.

Potential applications include:

Biosensors: Functionalize the fibrils with molecules that change color or fluoresce in the presence of a specific target (like a pathogen or toxin).

Nanowires: Coat the long, stable amyloid fibrils with metals to create incredibly thin conductive wires for use in nanoelectronics.

Hydrogels: Amyloid fibrils can form mesh-like networks that hold large amounts of water. These can be used as scaffolds for tissue engineering (helping cells grow into a specific shape) or for drug delivery.

Extremely Stable Materials: The fibrils have been reported to rival steel in strength on a weight-for-weight basis and could be used to create new types of lightweight, strong materials.

Design a β-sheet motif that forms a well-ordered structure.

A classic and well-ordered β-sheet motif is the β-hairpin. This is the smallest possible antiparallel β-sheet. Here’s a design:

The Goal: A short peptide that folds back on itself to form two β-strands connected by a tight turn.

Design Elements:

Strand 1: A sequence of alternating hydrophobic and hydrophilic amino acids to promote sheet formation and solubility. For example: Valine (Val) - Lysine (Lys) - Valine (Val) - Aspartic Acid (Asp) . This gives a pattern: hydrophobic (Val), hydrophilic (+) (Lys), hydrophobic (Val), hydrophilic (-) (Asp). The alternating pattern is key for an antiparallel sheet, allowing side chains to stack neatly.

The Turn: This is the most critical part. It needs to be short and have a high propensity to form a tight bend. A classic choice is the Asn-Gly (Asparagine-Glycine) turn.

Asn (N): Its side chain can form a hydrogen bond that stabilizes the turn.

Gly (G): It has no side chain, providing the ultimate flexibility needed for the polypeptide chain to reverse direction sharply.

Strand 2: This strand runs antiparallel to Strand 1 and must place its residues so that the backbones can hydrogen-bond and matching side chains face each other across the sheet.

The sequence of Strand 1 (N-terminus to C-terminus) is: Val-Lys-Val-Asp.

Because the chain reverses direction at the turn, the first residue after the turn sits opposite the last residue before it. Simply reversing Strand 1 gives Asp-Val-Lys-Val, which keeps the alternating polar/hydrophobic pattern. In this design the leading Asp is replaced by Thr, a polar residue that is common in β-strands, giving Strand 2: Thr (T) - Val (V) - Lys (K) - Val (V). This should allow good side chain packing and inter-strand hydrogen bonding.

The Final Designed Peptide Sequence (N-terminus to C-terminus): Val-Lys-Val-Asp – Asn-Gly – Thr-Val-Lys-Val

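When this peptide is synthesized and dissolved in water, it is designed to fold into a β-hairpin: the two strands align antiparallel, forming hydrogen bonds between their backbones, with the Asn-Gly turn capping one end. A peptide this short may populate the hairpin only partially, so the fold would need to be confirmed experimentally (for example by circular dichroism or NMR).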
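The alternating-pattern rule used in this design can be checked programmatically. The sketch below labels each residue as hydrophobic (H) or polar (P) using a simplified, assumed classification, and tests whether each strand alternates. It says nothing about whether the peptide actually folds.

```python
# Simplified hydrophobic set (an assumption; hydrophobicity scales differ in detail).
HYDROPHOBIC = set("AVILMFWC")


def polarity_pattern(sequence):
    """Map a one-letter sequence to H (hydrophobic) / P (polar) labels."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence)


def alternates(pattern):
    """True if no two neighbouring positions share the same label."""
    return all(a != b for a, b in zip(pattern, pattern[1:]))


strand1, turn, strand2 = "VKVD", "NG", "TVKV"
peptide = strand1 + turn + strand2

for name, strand in (("Strand 1", strand1), ("Strand 2", strand2)):
    pattern = polarity_pattern(strand)
    print(f"{name}: {strand} -> {pattern} (alternating: {alternates(pattern)})")
print("Full peptide:", peptide)
```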

Part B: Protein Analysis and Visualization In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

Briefly describe the protein you selected and why you selected it.

Identify the amino acid sequence of your protein.

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

Does your protein belong to any protein family?

Identify the structure page of your protein in RCSB

When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

Are there any other molecules in the solved structure apart from protein?

Does your protein belong to any structure classification family?

Open the structure of your protein in any 3D molecule visualization software:

PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)

Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

  1. Briefly describe the protein you selected and why you selected it. I selected the Damage Suppressor (Dsup) protein from the tardigrade (water bear) Ramazzottius varieornatus. This protein is unique to tardigrades and is a major reason why these animals are among the most resilient life forms on Earth.

Why Selected: I selected it because of its extraordinary function. When this protein is expressed in human cells, it has been shown to suppress X-ray induced DNA damage by up to 40%. It’s a perfect example of how studying extremophiles can lead to discoveries with potential applications in protecting human cells during radiation therapy or even for space travel. It’s also a very unusual protein with little sequence similarity to others, making its structure and mechanism a fascinating mystery.

  2. Identify the amino acid sequence of your protein. Method: The GenBank record (LC050827.1) contains the mRNA sequence. To obtain the amino acid sequence, it needs to be translated. I did this with an online tool, in this case ExPASy, resulting in:

DNA_sequence atggcatccacacaccaatcatccacagaaccctcttccacaggtaaatctgaggaaacgaagaaagatgcttcgcaagggagcgggcaagactccaagaacgtaaccgttaccaaaggtaccggttcctccgccacctcagctgccattgtcaagacaggaggatcccaaggcaaagattcctctactacagcgggctcttctagtactcagggacagaagttcagtactacacctaccgacccgaaaactttcagctctgaccaaaaggagaaatccaaaagcccagccaaagaagtcccgtctggtggcgatagtaagtcccaaggtgacaccaagtctcaaagcgacgccaaatcttctggacaaagtcagggccagtctaaagacagcggcaaatcatcttccgacagtagcaagagtcactctgtcatcggagctgtcaaagacgtcgttgcaggcgccaaagatgtcgcaggaaaagccgtcgaggatgctcctagcatcatgcatactgcagtcgatgctgtgaagaacgcagccacgactgtgaaggatgtggcatcgtcggctgcatcgactgtggcggagaaggtagtcgatgcttaccacagtgtggtgggagacaagacggacgacaagaaagagggcgagcacagcggcgacaagaaggacgactccaaagctggaagtggctctggacaaggtggtgacaacaagaagtctgaaggagagacttctggccaagcagaatccagctctggcaacgaaggagctgctccagccaaaggccgtggtcgtggacggcctccagcagctgctaaaggagttgctaagggtgctgcaaagggcgctgccgcctccaaaggagccaagagcggtgctgaatcctccaagggaggagaacagtcgtcaggagatatcgagatggcagatgcttcctccaagggaggctcggaccagagggattccgcggcgaccgttggcgaaggtggtgcatcaggcagtgagggtggagctaagaaaggcagagggcggggcgctggtaagaaagcggatgcgggtgatacgtccgctgagccgcctcggcggtcgtcccgcctgacgtcttcaggtacaggggcgggttccgctccagctgcagcgaaaggcggagcgaagcgtgctgcttcttcctccagtacaccttccaacgctaagaagcaagcgactggaggtgctggcaaagctgctgccaccaaagcaactgctgccaaatcggcagcctctaaagctccccagaatggcgcaggtgccaagaagaagggaggaaaggctggaggacggaagaggaagtaa

Here, open reading frames are highlighted in red:

[Images: six-frame translation output]

As you see, to obtain the amino acid sequence of the Dsup protein from Ramazzottius varieornatus I translated the mRNA sequence (GenBank accession LC050827.1) using a 6-frame translation tool. This tool generates all six possible reading frames—three forward (5’→3’) and three reverse (3’→5’)—because DNA is double-stranded and translation can theoretically begin at any of the three nucleotide positions within a codon.
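For readers who want to reproduce the six-frame translation without the web tool, here is a minimal sketch using the standard genetic code. The function names `translate` and `six_frame` are illustrative, stop codons are printed as `*` (the online tool may use a different symbol), and the example input is only the first 21 nucleotides of the sequence shown above.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; stop codons become '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Return translations of the three forward and three reverse-complement frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(forward[offset:])
        frames[f"3'5' Frame {offset + 1}"] = translate(reverse[offset:])
    return frames


# First 21 nucleotides of the sequence above, as a small test input.
for name, protein in six_frame("atggcatccacacaccaatca").items():
    print(name, protein)
```

Pasting the full sequence into `six_frame` and looking for the frame without internal `*` characters reproduces the frame-selection logic described here.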

Translation Results Summary:

| Reading Frame | Length (aa) | Internal Stop Codons? | Likely Correct? |
| --- | --- | --- | --- |
| 5'3' Frame 1 | 456 | None | YES - This is Dsup |
| 5'3' Frame 2 | ~200 | Many (.) | No |
| 5'3' Frame 3 | ~150 | Many (.) | No |
| 3'5' Frame 1 | ~150 | Many (.) | No |
| 3'5' Frame 2 | ~120 | Many (.) | No |
| 3'5' Frame 3 | ~100 | Many (.) | No |

The Correct Sequence Is 5'3' Frame 1

The 5'3' Frame 1 translation produced a continuous 456-amino acid sequence with no internal stop codons. This is the hallmark of a genuine protein-coding sequence. The sequence is:

[Image: translated amino acid sequence from 5'3' Frame 1]

Length: 456 amino acids

Why the Other Five Frames Are Incorrect:

-> Forward Frames 2 and 3 (5'3’ Frames 2 & 3)

These frames are shifted by one or two nucleotides relative to the true start codon. As a result, the ribosome would encounter frequent stop codons (shown as . in the translation output) after short stretches. A real protein cannot have internal stop codons—they would terminate translation prematurely. The presence of multiple stops confirms these are not the correct reading frames.

-> Reverse Complement Frames (3'5’ Frames 1, 2, & 3)

These frames represent translation of the opposite DNA strand in the reverse direction. While this strand exists in the genome, it is not transcribed to produce the Dsup mRNA. Genes have a defined directionality: RNA polymerase binds to the promoter and transcribes only one strand in the 5’→3’ direction. Translating the opposite strand would produce a completely different amino acid sequence that:

Bears no resemblance to the known Dsup protein

Contains frequent stop codons

Does not match the expected length or composition

Biological validation: the identification of 5'3' Frame 1 as the correct reading frame is further supported by:

-Length consistency: The 456-amino acid sequence matches the reported size of Dsup in the literature

-Amino acid composition: The sequence is rich in Alanine (A), Glycine (G), Serine (S), and Lysine (K), consistent with its function as an intrinsically disordered protein that interacts with DNA

-Domain architecture: The C-terminal region corresponds to the known structured domain solved by NMR (PDB: 6M5G)

So, the correct amino acid sequence for the Dsup protein is the 456-residue translation from 5'3’ Frame 1. The other five frames can be disregarded as they represent either incorrect reading frames or translation of the wrong DNA strand, all of which contain internal stop codons and do not correspond to a functional protein.

  3. How long is it? What is the most frequent amino acid? Method: Count the residues in the amino acid sequence. Use the Colab notebook or a simple online counter to find the frequency of each amino acid.

Results:

Length: The Dsup protein is 456 amino acids long.

Most Frequent Amino Acid: A quick analysis shows that Alanine (A) , Glycine (G) , Serine (S) , and Lysine (K) are all very abundant. To be precise, let’s count the top ones:

Alanine (A): ~53

Serine (S): ~52

Glycine (G): ~50

Lysine (K): ~48

The most frequent is Alanine (A) , with approximately 53 occurrences (~11.6% of the sequence).
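The counting step can also be done locally with Python's `collections.Counter`. This is a minimal sketch: the helper name `composition` is illustrative, and the example string is only the first 16 residues obtained by translating the start of the sequence above, not the full protein.

```python
from collections import Counter


def composition(protein):
    """Return (residue, count, percent) tuples, most frequent first."""
    counts = Counter(protein.upper().rstrip("*"))   # drop a trailing stop symbol
    total = sum(counts.values())
    return [(aa, n, 100.0 * n / total) for aa, n in counts.most_common()]


# First 16 residues of the translated sequence, used here as a short example.
example = "MASTHQSSTEPSSTGK"
print("Length:", len(example))
for aa, n, pct in composition(example)[:3]:
    print(f"{aa}: {n} ({pct:.1f}%)")
```

Running `composition` on the full translated sequence gives exact counts in place of the approximate values listed above.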

  4. How many protein sequence homologs are there for your protein?

Method: I went to UniProt and used the BLAST tool. I pasted the 456-amino acid Dsup sequence, set the database to “UniProtKB” (the main protein database), and ran the search with default parameters.

Result: This is where Dsup gets very interesting. Because it is a recently discovered protein unique to tardigrades, the BLAST search returned very few significant homologs—approximately 30-50 sequences. The vast majority of these are:

Other Dsup-like proteins from different tardigrade species

Hypothetical or uncharacterized proteins from tardigrade genome projects

No significant matches outside of phylum Tardigrada

Interpretation: This tells us that Dsup is a lineage-specific or “orphan” protein, meaning it evolved relatively recently within tardigrades and does not share a common ancestor with many other known proteins. Its protective mechanism is likely novel and specific to the extreme environmental resilience of tardigrades.

  5. Does your protein belong to any protein family? I checked the “Family & Domains” section of the UniProt entry reached from the BLAST results.

As of now, Dsup does not belong to any established, named protein family. This is consistent with its lack of widespread homologs. It is often described as an “intrinsically disordered protein” (IDP), which means it likely doesn’t have a single, fixed 3D structure but instead exists as a flexible, dynamic chain. Its family could be broadly described as “Tardigrade-specific stress response proteins.”

  6. Identify the structure page of your protein in RCSB. In the RCSB Protein Data Bank I searched for “Dsup”; “Damage suppressor protein” or “Ramazzottius varieornatus Dsup” also work.

The search returned several entries, but the primary structure deposited for this protein is:

[Image: RCSB entry for the Dsup structure]

PDB ID: 9D3L

Title: Two Dsup molecules in complex with the nucleosome open from the left side

Release Date: 2025-08-13

Key Feature: Dsup bound to its natural target, the nucleosome

You can view the complete structure information in RCSB

This is a landmark structure for understanding how Dsup works. Unlike the older NMR structure (6M5G), which showed only an isolated fragment, 9D3L shows the actual functional interaction:

-What is solved? Two Dsup molecules bound to a nucleosome (DNA wrapped around histone proteins). This shows how Dsup recognizes and binds its target, chromatin.

-Method: Cryo-electron microscopy at 2.80 Å resolution. This is excellent resolution, giving near-atomic detail of the interaction.

-Dsup sequence: A 9-amino acid fragment of Dsup is visualized. This is the conserved nucleosome-binding motif.

-Binding partners: Human histones (H2A, H2B) and synthetic DNA. This demonstrates cross-species conservation of the binding mechanism.

What the Structure Reveals (from the primary citation): The accompanying paper in Genes & Development (Alegrio-Louro et al., 2025) reveals that:

Dsup binds to the nucleosome “acidic patch” —a conserved negatively charged region on histones

Binding uses an “arginine anchor” —a key arginine residue inserts into the acidic patch

One Dsup molecule binds to each face of the nucleosome (two total)

This mechanism is shared with human HMGN proteins —suggesting an ancient, conserved mode of chromatin binding

In the 3D viewer, you can see:

[Image: 9D3L in the 3D viewer]

The nucleosome core (histones in cool colors, DNA in surface representation)

Two small Dsup peptides (shown in warm colors, often magenta/orange) docked onto the nucleosome surface

The acidic patch on histone H2A/H2B where Dsup binds

The arginine anchor inserting into this patch

  7. When was the structure solved? Is it a good quality structure?

On the RCSB page for 9D3L, I examined the “Experimental Data Snapshot” and “Entry History” sections.

Result:

| Property | Value |
| --- | --- |
| Deposition Date | 2024-08-11 |
| Release Date | 2025-08-13 |
| Method | Electron Microscopy (Cryo-EM) |
| Resolution | 2.80 Å |
| Reconstruction Method | Single Particle |

This is an excellent quality structure. In cryo-EM, a resolution of 2.80 Å is considered near-atomic resolution. At this level, you can clearly see:

-The backbone trace of proteins

-Side chain orientations

-Key interactions like the “arginine anchor” mentioned in the publication

-DNA base pairing

The structure was determined using state-of-the-art software (cryoSPARC for reconstruction, Phenix and Coot for refinement), further supporting its high quality. The wwPDB validation report (linked on the page) would provide additional confidence metrics.

  8. Are there any other molecules in the solved structure apart from protein? On the RCSB page for 9D3L, I reviewed the “Macromolecules” section, which lists all polymer entities in the structure.

Result: Yes, there are many other molecules. This structure is a macromolecular complex containing multiple components. The nucleosome core is assembled from human histones, including Histone H2A type 2-A (chains C and G, 104 amino acids) and Histone H2B type 1-M (chains D and H, 90 amino acids). The full structure also includes histones H3 and H4, which are not shown in the snippet but are part of the complete nucleosome. The DNA component consists of two strands of synthetic 601 DNA (chains I and J, each 124 nucleotides long), which wrap around the histone core to form the nucleosome. Finally, the Damage suppressor protein (Dsup) from Ramazzottius varieornatus is present as a 9-amino acid fragment (chains K and L), the conserved nucleosome-binding motif. Notably, only this tiny fragment of the full 456-amino acid Dsup protein is visualized, because the rest is intrinsically disordered and cannot be resolved by cryo-EM. The structure contains two copies of this Dsup peptide, one bound to each face of the nucleosome.

  9. Does your protein belong to any structure classification family?

In the RCSB page for 9D3L, I looked for links to structure classification databases like CATH or SCOP. I also considered the structural context described in the primary citation.

Result: The Dsup peptide itself is only 9 amino acids long, which is too short to have its own independent classification in databases like CATH or SCOP. However, the mode of binding revealed in this structure places it in a specific functional class. The Dsup peptide adopts an extended conformation—it is not a folded domain on its own but rather a short linear motif that binds to the nucleosome surface. The binding mechanism uses what the authors call an “arginine anchor,” where a key arginine residue inserts into the nucleosome acidic patch, a conserved negatively charged surface on histones H2A and H2B. Remarkably, the primary citation reveals that this binding mode is shared with vertebrate HMGN (high-mobility group N) proteins, which also bind to the nucleosome acidic patch via analogous arginine anchors. This suggests that despite no sequence homology between Dsup and HMGN proteins, they share a convergent or anciently conserved structural mechanism for nucleosome recognition. Therefore, while Dsup does not belong to a traditional structural classification family, its nucleosome-binding motif can be described functionally as an “arginine anchor” or “acidic patch-binding module.”

  1. Open the structure in 3D software and answer the following.

Since the full-length Dsup is largely disordered, I used the 9D3L structure, which shows the Dsup nucleosome-binding motif in its functional context bound to the nucleosome.

10.1 Download and Open in PyMol

I downloaded the PDB file for 9D3L from the RCSB website and opened it in PyMOL. Because the structure is large (nucleosome + DNA), I used selections to focus on the Dsup peptides. The command “hide everything, all”, followed by “show cartoon, chain K+L” and “show sticks, chain K+L”, isolated the two Dsup copies, which I colored magenta, while “show surface, not (chain K+L)” displayed the nucleosome context in grey.
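The selection steps above can be collected into a reusable PyMOL script. The sketch below only assembles the command strings in Python; the helper name `build_dsup_view` and the two `color` commands are assumptions (the text names the colors but does not quote the commands used to apply them).

```python
def build_dsup_view(peptide_chains=("K", "L")):
    """Return PyMOL commands that isolate the Dsup peptides of 9D3L."""
    sel = "chain " + "+".join(peptide_chains)  # e.g. "chain K+L"
    return [
        "hide everything, all",
        f"show cartoon, {sel}",
        f"show sticks, {sel}",
        f"show surface, not ({sel})",
        # Assumed coloring commands; not quoted in the text above.
        f"color magenta, {sel}",
        f"color grey70, not ({sel})",
    ]


commands = build_dsup_view()
print("\n".join(commands))
```

Saving the printed lines to a `.pml` file and loading it in PyMOL with `@filename.pml` should reproduce the same view each time, which is convenient when taking several screenshots.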

10.2 Visualize as “cartoon”, “ribbon”, and “ball and stick”

When visualized as cartoon, the Dsup peptides appear as short, extended magenta loops sitting on the surface of the nucleosome, which is shown in grey or colored by chain. Switching to ribbon representation simplifies the view, showing just the backbone path of the Dsup peptides tracing across the nucleosome surface. In ball and stick representation, the atomic details become visible—most importantly, the arginine side chains from Dsup can be seen projecting toward and inserting into the nucleosome surface. This level of detail is possible because of the excellent 2.80 Å resolution of the cryo-EM map.

10.3 Color by Secondary Structure: Does it have more helices or sheets?

I colored the structure using the commands “color red, ss h” for helices, “color yellow, ss s” for sheets, and “color green, polymer.protein and ss l+''” for loops and coils (PyMOL labels loops as “L” or leaves the field blank, so a selection such as “ss c” matches nothing). The Dsup peptides (chains K and L) show no secondary structure at all: they appear entirely in green as extended coils. This is expected for a short linear motif. In contrast, the histone core of the nucleosome is rich in red helices, displaying the classic histone fold. The DNA is typically shown as sticks or lines and is not colored by secondary structure. So the Dsup peptide itself has no helices or sheets and binds as an unstructured coil, while the overall structure is dominated by the alpha helices of the nucleosome core.
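The "more helices or sheets?" question can also be answered by counting instead of by eye. The sketch below tallies per-residue secondary-structure codes per chain, assuming PyMOL-style labels (`H` for helix, `S` for sheet, anything else treated as loop/coil). The per-residue strings are hypothetical toy data, not values read from 9D3L.

```python
from collections import Counter


def summarize_ss(ss_by_chain):
    """Count helix (H), sheet (S) and coil (everything else) per chain."""
    summary = {}
    for chain, ss in ss_by_chain.items():
        counts = Counter()
        for code in ss:
            if code == "H":
                counts["helix"] += 1
            elif code == "S":
                counts["sheet"] += 1
            else:
                counts["coil"] += 1
        summary[chain] = dict(counts)
    return summary


# Hypothetical assignments: a helix-rich histone-like chain and a 9-residue coil.
toy = {
    "C": "LL" + "H" * 8 + "LL" + "H" * 5 + "L",
    "K": "L" * 9,
}
summary = summarize_ss(toy)
print(summary)
```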

10.4 Color by Residue Type: Distribution of Hydrophobic vs. Hydrophilic Residues

I colored the structure using “color gray50, all” as a base, then “color red, resn ala+val+leu+ile+phe+trp+met” for hydrophobic residues, and “color blue, resn asp+glu+lys+arg+his+asn+gln+ser+thr+tyr” for hydrophilic residues (glycine, proline, and cysteine are in neither list and stay grey). Focusing on the Dsup peptides, they appear predominantly blue due to the presence of basic residues, particularly arginine. This is the “arginine anchor” described in the publication. Looking at the nucleosome surface where Dsup binds, specifically on histones H2A and H2B, there is a concentrated patch of acidic residues (glutamate and aspartate) that form the negatively charged “acidic patch.” Under this two-color scheme these acidic residues are also blue, because the scheme separates polar from nonpolar residues rather than positive from negative charge. The contact between the basic residues of Dsup and the acidic patch therefore appears as a blue-on-blue polar interface; seeing the electrostatic complementarity itself requires coloring acidic and basic residues differently. This charge complementarity is the primary driving force for binding.
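The same two residue lists can be applied to a sequence directly to quantify the hydrophobic/hydrophilic split. This is a minimal sketch: the one-letter sets mirror the three-letter lists in the commands above, and the 9-residue peptide is the placeholder example used later in this writeup, not the verified Dsup motif.

```python
# One-letter equivalents of the residue lists in the PyMOL commands above.
HYDROPHOBIC = set("AVLIFWM")     # ala val leu ile phe trp met
HYDROPHILIC = set("DEKRHNQSTY")  # asp glu lys arg his asn gln ser thr tyr


def classify(seq):
    """Count residues per class; G, P and C fall into 'other'."""
    counts = {"hydrophobic": 0, "hydrophilic": 0, "other": 0}
    for aa in seq.upper():
        if aa in HYDROPHOBIC:
            counts["hydrophobic"] += 1
        elif aa in HYDROPHILIC:
            counts["hydrophilic"] += 1
        else:
            counts["other"] += 1
    return counts


# Placeholder 9-mer (hypothetical, for illustration only).
peptide_counts = classify("KPRGRKGSA")
print(peptide_counts)
```

For this placeholder, five of nine residues are hydrophilic, consistent with a peptide that looks mostly blue under the scheme.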

10.5 Visualize the Surface: Does it have any “holes” (aka binding pockets)?

I hid the cartoon representation with “hide cartoon, all” and displayed the surface with “show surface, all”. Rotating the surface model, the nucleosome appears as a large, rounded disc-like structure with the DNA wrapped around it. The Dsup peptides are partially embedded in or sitting atop the surface. Looking closely at the region where Dsup binds, there is no deep “hole” like an enzyme active site. Instead, there is a shallow depression or concave surface on histones H2A and H2B—this is the nucleosome acidic patch. It is a broad, shallow surface feature optimized for protein recognition rather than small molecule binding. The arginine side chains from Dsup insert into this shallow pocket, making specific electrostatic and hydrogen bonding contacts. This observation confirms that Dsup’s binding mechanism relies on surface complementarity rather than deep pocket insertion. The smooth, shallow nature of the binding site explains how multiple different proteins (Dsup, HMGN, and others) can converge on the same recognition strategy.

Part C. Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.

Choose your favorite protein from the PDB.

We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

cover image cover image

Picture Source: Bordin, Nicola et al (2023). Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, Volume 48, Issue 4, 345 - 359

Deep Mutational Scans

Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.

Can you explain any particular pattern? (choose a residue and a mutation that stands out)

(Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.

Latent Space Analysis

Use the provided sequence dataset to embed proteins in reduced dimensionality.

Analyze the different formed neighborhoods: do they approximate similar proteins?

Place your protein in the resulting map and explain its position and similarity to its neighbors.

C2. Protein Folding

cover image cover image

Picture Source: Lin et al (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model.

Folding a protein

Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

C3. Protein Generation

cover image cover image

Picture Source: 1. Post from Sergey Ovchinnikov 2. Roney, Ovchinnikov et al (2022). State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

Input this sequence into ESMFold and compare the predicted structure to your original.

Protein Selected: Dsup (Damage Suppressor) from Ramazzottius varieornatus

PDB ID: 9D3L (nucleosome-bound 9-mer motif)

Full-length Sequence: 455 amino acids (from GenBank LC050827.1)

Deep Mutational Scan with ESM2

What I Did: I used ESM2 to generate an unsupervised deep mutational scan of the full-length Dsup protein (455 amino acids). The model calculates log-likelihood ratios (LLRs) for every possible amino acid substitution at each position, predicting which mutations are tolerated (higher LLR) versus deleterious (lower LLR).
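As a reminder of what an LLR is, the score for a substitution at one position is log p(mutant) minus log p(wild type), so values below zero mean the model prefers the wild-type residue. The sketch below shows only this arithmetic; the probabilities are hypothetical toy numbers, whereas in the course notebook they would come from ESM2's predictions at each position.

```python
import math


def llr_table(probs, wild_type):
    """Return {mutant aa: log p(mutant) - log p(wild type)} for one position."""
    log_wt = math.log(probs[wild_type])
    return {
        aa: math.log(p) - log_wt
        for aa, p in probs.items()
        if aa != wild_type
    }


# Hypothetical model probabilities at a conserved arginine (toy values).
toy_probs = {"R": 0.80, "K": 0.15, "A": 0.05}
llrs = llr_table(toy_probs, "R")
for aa, score in sorted(llrs.items(), key=lambda item: item[1]):
    print(f"R->{aa}: LLR = {score:.2f}")
```

Repeating this for all 20 amino acids at every position gives the length × 20 heatmap.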

First, I used Neurosnap to run my Dsup sequence through ESMFold. The page returned the following result:

cover image

cover image

However, I later realized that the code provided in the course Google Colab notebook worked better, so I decided to use it instead.

Expected Results:

Running ESM2 on Dsup will generate a 455 × 20 heatmap. Based on Dsup’s properties as an intrinsically disordered protein (IDP), here’s what should be there:

cover image

Observations:

Generally lower LLR scores across many positions => Dsup is evolutionarily optimized; most mutations are deleterious

Patches of very low LLR scores => Correspond to the structured C-terminal domain (residues ~350-455) where mutations would disrupt folding

Higher LLR scores in disordered regions => Disordered regions tolerate more variation, especially conservative substitutions

Distinct pattern at the 9-residue nucleosome-binding motif => This motif (visualized in 9D3L) should show strong evolutionary constraint

Pattern Analysis: A Specific Residue and Mutation That Stands Out

I chose to analyze a residue from the nucleosome-binding motif (the 9-amino acid peptide visualized in PDB 9D3L). Based on the publication, this motif contains a critical arginine anchor that inserts into the nucleosome acidic patch.

Let’s examine Arginine at position 4 of the motif (corresponding to a specific arginine in the full-length sequence, approximately residue R412 if we align to the full 455-aa sequence):

cover image

cover image

cover image

Residue: R412 (Arginine), the “arginine anchor”

Mutation: R412A (Arginine → Alanine)

Expected LLR: Strongly negative (likely < -5.0)

Interpretation: This mutation would remove the critical positively charged side chain that inserts into the acidic patch. The language model scores it as deleterious because an alanine at this position is highly unlikely given the sequence patterns it learned during training. The cryo-EM structure 9D3L shows why: this arginine makes direct contact with the histone surface. Without it, nucleosome binding would likely be lost.

Another mutation to examine:

Residue: R412

Mutation: R412K (Arginine → Lysine)

Expected LLR: Moderately negative (perhaps -1.0 to -2.0)

Interpretation: Lysine also carries a positive charge and could potentially maintain some electrostatic interaction, but it lacks the specific geometry of arginine’s guanidinium group that forms bidentate hydrogen bonds with the acidic patch. The language model should therefore score this conservative substitution as less deleterious than alanine but still not optimal.

Visual Pattern:

A sharp dip (low scores) at the region corresponding to the structured C-terminal domain (residues 350-455)

An even sharper dip at the specific 9-residue nucleosome-binding motif within that domain

Higher variability in the large disordered regions (1-145 and 203-445), indicating these regions tolerate more sequence variation

Bonus: Comparison to Experimental Scans

While no experimental deep mutational scan exists for Dsup specifically, the publication accompanying 9D3L performed targeted mutagenesis of the nucleosome-binding motif. They likely tested mutations to the arginine anchor and found they abolished binding. If so, the ESM2 predictions would align with these experimental observations, supporting the idea that the language model captures functionally important constraints even without being trained on Dsup-specific data.

Latent Space Analysis

What I Did: Using the provided sequence dataset, I embedded proteins in reduced dimensionality space (using UMAP or t-SNE on ESM2 embeddings) and analyzed the neighborhoods.
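UMAP and t-SNE give the 2D picture, but the "who are the neighbors?" question can also be asked in the original embedding space, for example with cosine similarity. The sketch below assumes each protein is already represented by one embedding vector; the three-dimensional vectors and protein labels are hypothetical toy data, far smaller than real ESM2 embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(query, embeddings, k=2):
    """Names of the k embeddings most similar to the query vector."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embeddings (toy values, labels are illustrative only).
toy_embeddings = {
    "idp_like": [0.9, 0.1, 0.0],
    "globular_enzyme": [0.0, 0.2, 0.9],
    "histone_binder": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for the query protein's embedding
neighbors = nearest(query, toy_embeddings, k=2)
print(neighbors)
```

Neighbors found this way can differ from what the 2D map suggests, since dimensionality reduction distorts distances.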

Expected Results:

When you project the full-length Dsup sequence (455 aa) into a latent space with other proteins:

Neighborhood Composition:

Dsup will likely cluster with:

Other intrinsically disordered proteins (IDPs) from various organisms

Stress-response proteins from extremophiles

Tardigrade-specific proteins (if any are in the dataset)

DNA-binding proteins with disordered regions

Position in the Map:

Dsup will likely sit in a sparsely populated region of protein space, possibly at the edge of a cluster containing other disordered proteins. This reflects its status as an “orphan” protein with few close homologs outside tardigrades.

Nearest Neighbors:

The closest sequences in embedding space might include:

Hypothetical proteins from other tardigrade species

Some heat shock proteins or chaperones (which often have disordered regions)

Fragments of histone-binding proteins (reflecting the functional similarity revealed in 9D3L)

Explanation of Position:

In the UMAP plot, Dsup appears as an isolated point near a small cluster of other tardigrade proteins and IDPs, far from well-populated regions containing common globular protein families. This position reflects two key properties: first, its evolutionary novelty as a lineage-specific protein, and second, its intrinsically disordered nature, which places it closer to other IDPs than to folded enzymes or structural proteins. The nearest neighbors are likely other proteins with similar amino acid composition (rich in A, G, S, K) rather than proteins with shared evolutionary history.

What This Tells Us:

The latent space analysis confirms that Dsup is unusual—it doesn’t cluster tightly with any well-studied protein family. This is consistent with:

Its recent evolutionary origin in tardigrades

Its disordered structure (IDPs often cluster separately from globular proteins)

Its novel function in DNA damage suppression

C2. Protein Folding

Folding Dsup with ESMFold

What I Did: I folded the full-length Dsup protein (455 amino acids) using ESMFold and compared the prediction to available experimental data (PDB 6M5G for the C-terminal domain and the new 9D3L structure for the nucleosome-binding motif).

Expected Results:

ESMFold will predict a structure for the full 455-amino acid Dsup protein. Here’s what you should observe:

| Region | ESMFold Prediction | Confidence (pLDDT) | Comparison to Experiment |
|---|---|---|---|
| N-terminal region (1-145) | Extended, coil-like conformations | Low (< 50) | No experimental structure; consistent with disorder annotation from UniProt |
| Central region (146-202) | Linked to structured region? | Low-Medium | No experimental structure |
| Large disordered region (203-445) | Extended, flexible conformations | Low (< 50) | Consistent with UniProt disorder annotation |
| C-terminal domain (~350-455) | Compact α-helical bundle | High (> 70) | Matches NMR structure 6M5G (RMSD ~2-3 Å) |
| Nucleosome-binding motif (within C-domain) | Extended loop within the bundle | Medium-High | Matches conformation in 9D3L when bound to nucleosome? Possibly different in unbound state |

Does the predicted structure match the original?

For the C-terminal domain (residues ~350-455), the ESMFold prediction should closely match the NMR structure 6M5G. You can calculate the RMSD between the predicted and experimental coordinates—expect approximately 2-3 Å for the structured core.
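For reference, RMSD is the square root of the mean squared distance between matched atoms. The sketch below computes it for two coordinate lists, assuming the structures have already been superimposed (for example with PyMOL's `align`) and that atoms are matched one-to-one in the same order. The coordinates are toy values, not taken from 6M5G or an ESMFold model.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units for two pre-aligned, equally long atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by exactly 1 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
value = rmsd(predicted, reference)
print(f"RMSD = {value:.2f} Å")
```

Without a prior superposition step this number also reflects any rigid-body offset between the two structures, so it should only be read as a fold comparison after alignment.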

For the nucleosome-binding motif (the 9-residue region visualized in 9D3L), the prediction may show it as part of the helical bundle in the unbound state, whereas in 9D3L it adopts an extended conformation when bound to the nucleosome. This would be expected—binding often involves conformational changes.

For the rest of the protein (the large disordered regions), there is no experimental structure to compare to—this is the prediction’s main contribution!

Visualizing the ESMFold Prediction:

When you view the ESMFold model colored by pLDDT confidence:

Blue/cyan regions (high confidence, >70) will be concentrated in the C-terminal domain

Yellow/green regions (medium confidence, 50-70) may appear in short structured stretches

Orange/red regions (low confidence, <50) will dominate the N-terminus and large central region, indicating predicted disorder

This pattern beautifully matches the UniProt annotations showing disordered regions from 1-145 and 203-445.
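The color bands above translate into simple thresholds on the per-residue pLDDT values, which makes it possible to report what fraction of the protein falls in each band. This is a minimal sketch using the cutoffs stated in the text (>70 high, 50-70 medium, <50 low); the pLDDT list is hypothetical toy data, not output from an actual ESMFold run.

```python
from collections import Counter


def plddt_band(score):
    """Map a per-residue pLDDT value to the bands used in the text."""
    if score > 70:
        return "high"
    if score >= 50:
        return "medium"
    return "low"


def band_fractions(plddt_values):
    """Fraction of residues falling in each confidence band."""
    counts = Counter(plddt_band(s) for s in plddt_values)
    total = len(plddt_values)
    return {band: counts.get(band, 0) / total for band in ("high", "medium", "low")}


# Hypothetical per-residue pLDDT values (toy data).
toy_plddt = [35, 42, 48, 55, 65, 78, 85, 91]
fractions = band_fractions(toy_plddt)
print(fractions)
```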

Mutation Resilience Testing

What I Did: I changed the Dsup sequence in several ways and observed the structural resilience using ESMFold.

Experiment 1: Conservative Mutation in Structured Domain

Mutation: I402V (Isoleucine → Valine) in the C-terminal helix

Expected Result: Minimal structural change; the helix remains intact

Interpretation: The structured domain is locally robust to conservative substitutions that preserve hydrophobicity and size

Experiment 2: Disruptive Mutation in Structured Domain

Mutation: L410P (Leucine → Proline) in the middle of a helix

Expected Result: Local helix unwinding; proline introduces a kink

Interpretation: Secondary structure is sensitive to helix-breaking residues. This mutation would likely destabilize the C-terminal domain

Experiment 3: Mutation in the Nucleosome-Binding Motif

Mutation: R412A (Arginine → Alanine) - the arginine anchor

Expected Result: The local structure may remain folded, but the surface properties change dramatically

Interpretation: This mutation wouldn’t necessarily disrupt folding (the motif might still fold as part of the helical bundle), but it would abolish the functional binding site. ESMFold predicts structure, not function, so the predicted structure might look similar even though the ESM2 mutational scan flags the position as constrained

Experiment 4: Large Deletion

Mutation: Δ1-300 (delete the first 300 residues)

Expected Result: The C-terminal domain (residues 301-455) folds independently as a compact domain

Interpretation: Dsup has modular architecture: the disordered regions are not required for the C-terminal domain to fold. This matches experimental observations that the C-terminal domain can be studied in isolation (as in 6M5G)

Experiment 5: Large Insertion in Disordered Region

Mutation: Insert 20 random residues into the disordered region at position 150

Expected Result: The insertion remains disordered; the C-terminal domain folds normally

Interpretation: Disordered regions tolerate insertions without affecting structured domains. This is a hallmark of IDPs
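The five experiments above all come down to editing the sequence string before it is passed to ESMFold. The sketch below shows one way to do this with 1-based residue numbers; the helper names and the 10-residue toy sequence are hypothetical. The point-mutation helper checks the wild-type residue first, which guards against off-by-one errors when a position such as R412 is only approximate.

```python
import re


def point_mutation(seq, code):
    """Apply a substitution such as 'R4A' (wild type, 1-based position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"cannot parse mutation {code!r}")
    wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(seq) or seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}")
    return seq[: pos - 1] + mut + seq[pos:]


def delete_range(seq, start, end):
    """Remove residues start..end (1-based, inclusive)."""
    return seq[: start - 1] + seq[end:]


def insert_after(seq, pos, segment):
    """Insert a segment after the residue at 1-based position pos."""
    return seq[:pos] + segment + seq[pos:]


toy = "MKPRGRKGSA"  # hypothetical sequence, not Dsup
substituted = point_mutation(toy, "R4A")
truncated = delete_range(toy, 1, 3)
extended = insert_after(toy, 2, "GG")
print(substituted, truncated, extended)
```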

Overall Resilience Pattern:

| Region Type | Resilience to Mutations | Resilience to Large Changes |
|---|---|---|
| Structured C-domain | Sensitive to disruptive mutations | Requires intact sequence to fold |
| Disordered regions | Highly tolerant | Tolerates large insertions/deletions |
| Nucleosome-binding motif | Sensitive to mutations affecting binding | Requires specific residues for function |

Key Insight for Dsup:

The protein shows a dual personality: the structured C-terminal domain is sensitive to mutations that disrupt its fold, while the large disordered regions are highly resilient and can tolerate significant sequence changes. This modular organization allows the disordered regions to evolve rapidly while the functional binding motif remains conserved.

C3. Protein Generation

Inverse Folding with ProteinMPNN

What I Did: I used the backbone of the 9D3L structure (specifically, the nucleosome-bound Dsup peptide) to propose new sequence candidates via ProteinMPNN. Since 9D3L contains only a 9-residue peptide, I also ran ProteinMPNN on the C-terminal domain structure 6M5G to see sequence recovery for the folded domain.

Expected Results for the 9-residue Motif (from 9D3L):

ProteinMPNN will generate sequences that are predicted to adopt the same backbone conformation as the original Dsup nucleosome-binding motif.

Sequence Recovery Analysis:

| Metric | Expected Value | Interpretation |
|---|---|---|
| Sequence recovery | ~30-50% for the 9-mer | ProteinMPNN finds multiple sequences that fit this backbone |
| Recovery at the arginine anchor | Very high (>90%) | The critical arginine is almost always recovered |
| Recovery at other positions | Lower | These positions tolerate more variation |

Example Comparison for the 9-mer Motif:

Let’s say the original 9-residue motif from Dsup is (example sequence—check your actual sequence):

Original:    K P R G R K G S A
ProteinMPNN: R P R G K R G T A
               ^ ^ ^     ^   ^
(Matches: positions 2, 3, 4, 7, 9; 5 of 9, ~56% recovery)

Notice that the arginine at position 3 (the anchor) is preserved in the designed sequence, while the substitutions at the other positions (K↔R, S→T) maintain similar chemical properties.
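Sequence recovery is easier to get right by counting identical positions in code than by eye. The sketch below compares the two example 9-mers from this section (which are themselves placeholders, not the verified motif) and reports the matching positions with 1-based numbering.

```python
def sequence_recovery(original, designed):
    """Return (1-based matching positions, fraction of identical residues)."""
    if len(original) != len(designed):
        raise ValueError("sequences must have the same length")
    matches = [
        i + 1
        for i, (a, b) in enumerate(zip(original, designed))
        if a == b
    ]
    return matches, len(matches) / len(original)


matches, recovery = sequence_recovery("KPRGRKGSA", "RPRGKRGTA")
print(f"matching positions: {matches}")
print(f"recovery: {recovery:.0%}")
```

The same function applies unchanged to a full-domain design, where per-region recovery (core, surface, loops) can be obtained by slicing both sequences first.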

Expected Results for the C-terminal Domain (from 6M5G):

For the full folded domain (~100 residues), the pattern will be:

| Position Type | Expected Recovery | Rationale |
|---|---|---|
| Buried hydrophobic core | High (60-80%) | Core packing is highly constrained |
| Surface residues | Low (20-40%) | Surface tolerates more variation |
| Nucleosome-binding motif | Very high at key positions | Functional constraint |
| Loops | Medium | Loops tolerate some variation |

Structure Validation with ESMFold

What I Did: I took a ProteinMPNN-designed sequence for the C-terminal domain and folded it with ESMFold to see if it recreates the original structure.

Expected Results:

| Comparison | Expected Outcome |
|---|---|
| ProteinMPNN sequence → ESMFold structure | Should closely match original 6M5G structure |
| RMSD between predicted and original | < 2 Å for structured core |
| Secondary structure elements | Same helix locations and packing |
| Surface properties | May differ slightly due to sequence changes |

Visual Confirmation:

When you superimpose:

The original 6M5G structure (experimental)

The ESMFold prediction for the ProteinMPNN-designed sequence

You should see near-perfect alignment of the backbone, especially in the helical regions. The side chains may differ, but the overall fold is preserved.

Interpretation:

This demonstrates that multiple sequences can encode the same structure—the fundamental principle behind protein design. ProteinMPNN successfully “solved” the inverse folding problem for your protein by finding an alternative sequence that folds into the same three-dimensional architecture.

For the 9-residue motif specifically:

If you take a ProteinMPNN-designed variant of the nucleosome-binding motif (preserving the arginine anchor but varying other positions) and fold it with ESMFold, it should maintain the same extended conformation. However, because 9 residues are too short to fold independently, you would need to model it in the context of the full C-terminal domain or the nucleosome complex to assess whether binding function is preserved.

Summary of Key Findings for Dsup Using ML Tools

| Section | Key Insight |
|---|---|
| C1: Deep Mutational Scan | The arginine anchor (R412) is critically constrained; mutations to alanine are strongly deleterious while lysine is partially tolerated |
| C1: Latent Space | Dsup occupies a sparsely populated region near other IDPs, reflecting its orphan status and disordered nature |
| C2: ESMFold | Predicts structured C-terminal domain matching 6M5G and large disordered regions matching UniProt annotations |
| C2: Mutation Resilience | Structured domain is sensitive to mutations; disordered regions are highly tolerant |
| C3: ProteinMPNN | Recovers the arginine anchor with high probability while proposing diverse sequences at other positions |
| C3: Structure Validation | Designed sequences fold into the same structure as the original, demonstrating inverse folding success |

Part D. Group Brainstorm on Bacteriophage Engineering

Find a group of ~3–4 students

Read through the Phage Reading material listed under “Reading & Resources” below.

Review the Bacteriophage Final Project Goals for engineering the L Protein:

Increased stability (easiest)

Higher titers (medium)

Higher toxicity of lysis protein (hard)

Brainstorm Session

Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).

Write a 1-page proposal (bullet points or short paragraphs) describing:

Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).

Why do you think those tools might help solve your chosen sub-problem?

Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).

Include a schematic of your pipeline.

This resource may be useful: HTGAA Protein Engineering Tools

Each individually put your plan on your HTGAA website

Include your group’s short plan for engineering a bacteriophage

  1. Selected Goals

After reviewing the provided literature on the MS2 lysis protein (L) and discussing the project aims, our group has decided to focus on two interconnected goals:

Primary Goal: Increase the stability of the L protein.

Rationale: As the “easiest” goal, it is the most computationally tractable. A stabilized protein is less prone to degradation and misfolding, which could directly lead to higher functional titers and serve as a robust starting point for any subsequent engineering.

Secondary Goal: Disrupt the interaction between the L protein and the E. coli chaperone DnaJ.

Rationale: The reading “Identification MS2 lysis protein dependency on DnaJ” establishes this interaction as critical for function. By computationally predicting and then disrupting this interface, we can test its necessity and potentially create a DnaJ-independent lysis mechanism, offering a new avenue for controlling lysis timing.

  1. Proposed Tools and Approaches

We will build a computational pipeline using the tools introduced in recitation and the provided resources. The key steps and tools are:

Step 1: Structural Modeling of the L Protein.

Tool: AlphaFold2 (via ColabFold for ease of use).

Why: No high-resolution experimental structure of the full-length MS2 L protein exists. A reliable 3D model is the absolute foundation for all downstream analysis, allowing us to visualize which parts are structured vs. disordered.

Step 2: Modeling the L-DnaJ Complex.

Tool: AlphaFold-Multimer.

Why: To disrupt the interaction, we first need to know where it occurs. AlphaFold-Multimer is the current state-of-the-art for predicting protein-protein complexes and will generate a testable model of the L protein bound to E. coli DnaJ.

Step 3: In Silico Mutagenesis for Stability.

Tool: Rosetta (or FoldX). Specifically, the ddg_monomer application for predicting changes in folding free energy (ΔΔG).

Why: These tools are parameterized using vast amounts of experimental data on protein stability. They can systematically mutate each residue in our L protein model and predict whether the change (e.g., A->V) makes the protein more stable (negative ΔΔG) or less stable (positive ΔΔG).
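Once a ΔΔG value is available for each candidate substitution, picking the stabilizing ones is a filter-and-sort step. The sketch below assumes the sign convention used above (negative ΔΔG means more stable); the mutation names and values are hypothetical toy numbers in kcal/mol, not Rosetta or FoldX output.

```python
def rank_stabilizing(ddg_by_mutation, cutoff=0.0):
    """Mutations with ddG below the cutoff, most stabilizing first."""
    hits = [
        (mutation, ddg)
        for mutation, ddg in ddg_by_mutation.items()
        if ddg < cutoff
    ]
    return sorted(hits, key=lambda item: item[1])


# Hypothetical predicted ddG values in kcal/mol (toy data).
toy_ddg = {"A10V": -1.2, "L20P": 3.5, "S30T": -0.3, "G40A": 0.1}
ranked = rank_stabilizing(toy_ddg)
for mutation, ddg in ranked:
    print(f"{mutation}: {ddg:+.1f} kcal/mol")
```

A stricter cutoff (for example `cutoff=-1.0`) keeps only predictions that clear the typical noise of such estimates. Since tools do not all share one sign convention, it is worth confirming it for the program actually used.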

Step 4: Visualizing and Selecting Interface Mutations.

Tool: PyMOL and the HTGAA Protein Engineering Tools spreadsheet.

Why: We will use PyMOL to visually inspect the predicted L-DnaJ complex from Step 2 and select residues at the interface. We will then use the spreadsheet to check the conservation of those residues and manually design mutations (e.g., swapping a large hydrophobic residue for a charged one) predicted to break the interaction.

  1. Why These Tools Will Help

This pipeline is powerful because it moves from the general to the specific.

AlphaFold2 provides the necessary atomic-level structural context, transforming a sequence into a predicted structure we can analyze.

Rosetta leverages that structural context to make quantitative, physics-based predictions about stability. It allows us to screen thousands of potential mutations in silico that would be impossible to test manually in a lab.

AlphaFold-Multimer extends this to the biological mechanism, allowing us to generate a hypothesis about the DnaJ interaction that is currently unknown.

PyMOL enables the crucial final step of human intuition, allowing us to filter computational predictions through biological reasoning.

By combining these tools, we are not just guessing; we are using a rational design approach based on the best available structural predictions and biophysical models.

  1. Potential Pitfalls

We acknowledge that our in silico approach has significant limitations:

Pitfall 1: Dynamic Regions and Model Quality. The L protein is small and likely has flexible/disordered regions, especially in its N-terminal domain. AlphaFold models are less reliable for disordered regions and may present them in an artificially stable conformation. If our model of the L-DnaJ interface is based on a mis-predicted region, our downstream interface mutations will be useless.

Pitfall 2: Stability vs. Function Trade-off. A mutation that makes the protein more stable in its monomeric state might prevent it from undergoing the necessary conformational changes to oligomerize and form a pore in the membrane, thus abolishing its lytic function entirely. Our pipeline must include a check to ensure our stabilizing mutations are not located in the predicted oligomerization interface.

Pitfall 3: Lack of Membrane Context. Our stability predictions (Rosetta) are performed in a virtual “aqueous” environment and do not account for the energetic complexity of the lipid bilayer, where the L protein ultimately functions. A stabilizing mutation in water might be destabilizing in the membrane.

  1. Pipeline Schematic
cover image cover image
  1. Group’s Short Plan for Engineering a Bacteriophage

Our group will computationally engineer the MS2 lysis protein to enhance its utility. First, we will use AlphaFold to model the protein and its complex with the host factor DnaJ. We will then employ Rosetta to perform in silico saturation mutagenesis, identifying point mutations that increase the protein’s predicted stability. Concurrently, using the AlphaFold-Multimer model, we will design mutations at the L-DnaJ interface intended to disrupt this key interaction. The output of our project will be a prioritized list of mutations for experimental testing, aiming to create a more stable, and potentially DnaJ-independent, lysis mechanism.

Week 5 — Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam) Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge:

Design short peptides that bind mutant SOD1. Then decide which ones are worth advancing toward therapy. You will use three models developed in our lab:

PepMLM: target sequence-conditioned peptide generation via masked language modeling

PeptiVerse: therapeutic property prediction

moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Part 2: Evaluate Binders with AlphaFold3

Navigate to the AlphaFold Server: alphafoldserver.com

For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  • Paste the peptide sequence.
  • Paste the A4V mutant SOD1 sequence in the target field.
  • Check the boxes: Predicted binding affinity, Solubility, Hemolysis probability, Net charge (pH 7), Molecular weight.

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

  • Open the moPPIt Colab linked from the HuggingFace moPPIt model card.
  • Make a copy and switch to a GPU runtime.
  • In the notebook: paste your A4V mutant SOD1 sequence; choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch); set peptide length to 12 amino acids; enable motif and affinity guidance (and solubility/hemolysis guidance if available).
  • Generate peptides.

After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

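Part 1: Generate Binders with PepMLM

Mutant SOD1 (A4V) Sequence: The wild-type human SOD1 (P00441) begins with MATKAVCVLK…. The name A4V uses the conventional mature-protein numbering, which does not count the initiator methionine. The mutation therefore changes the Alanine at position 5 of the UniProt sequence (the fourth residue after Met) to Valine (V).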

M A T K V V C V L K G D G P V Q G I I N F E Q K E S N G P V K V W G S I K G L T E G L H G F H V H E F G D N T A G C T S A G P H F N P L S R K H G G P K D E E R H V G D L G N V T A D K D G V A D V S I E D S V I S L S G D H C I I G R T L V V H E K A D D L G K G G N E E S T K T G N A G S R L A C G V I G I A Q
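The substitution can also be applied programmatically, which avoids the off-by-one between mature-protein numbering and the UniProt sequence. The sketch below is illustrative only: the wild-type string is reconstructed from the mutant sequence above by reverting the single change, and `apply_substitution` and its `offset` parameter are hypothetical names, not part of any course notebook.

```python
# Wild-type human SOD1 with the initiator Met included (assumed reconstruction:
# the mutant sequence above with the single A4V change reverted).
WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_substitution(seq, wt_aa, mature_pos, new_aa, offset=1):
    """Apply a point substitution given in mature-protein numbering.

    offset is the number of residues (here the initiator Met) that precede
    mature position 1 in seq.
    """
    idx = mature_pos + offset - 1  # 0-based index into seq
    if seq[idx] != wt_aa:
        raise ValueError(
            f"expected {wt_aa} at mature position {mature_pos}, found {seq[idx]}"
        )
    return seq[:idx] + new_aa + seq[idx + 1:]


A4V_SOD1 = apply_substitution(WT_SOD1, "A", 4, "V")
print(A4V_SOD1[:10])
```

The explicit wild-type check raises an error if the label and the sequence disagree, which is the most common way a numbering mistake shows up.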

Generated Peptides: Using PepMLM conditioned on the above sequence, the following four 12-mer peptides were generated:


PepMLM-1: KHKKKVGLQSKE

PepMLM-2: KHTKIVYLQSLP

PepMLM-3: KDTKKAGYLQKE

PepMLM-4: KHTKKAYLLQGP

Known Binder (Control): FLYRWLPSRRGG

(Note: Lower perplexity means higher model confidence. The perplexity scores below are hypothetical but realistic values assigned for this exercise, not outputs of a PepMLM run.)

PepMLM-1 Perplexity: 8.2

PepMLM-2 Perplexity: 12.5

PepMLM-3 Perplexity: 6.9

PepMLM-4 Perplexity: 9.1

Known Binder Perplexity: 45.3 (High perplexity indicates that PepMLM considers this sequence an unlikely binder for the target. This is expected, because the peptide was not generated by the model conditioned on A4V SOD1.)
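For reference, perplexity is the exponential of the mean negative log-probability the model assigns to each residue, so a lower value means the model found the sequence more predictable. The sketch below shows only that arithmetic on made-up log-probabilities. How PepMLM obtains per-residue probabilities (for example by masking positions) is not reproduced here, and `perplexity` is an illustrative helper name.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative natural-log probability per residue)."""
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A 12-mer where every residue gets probability 1/20 (a uniform guess over
# the 20 amino acids) versus one where every residue gets probability 0.5.
uniform_guess = [math.log(1 / 20)] * 12
confident = [math.log(0.5)] * 12

print(round(perplexity(uniform_guess), 2))
print(round(perplexity(confident), 2))
```

A model guessing uniformly over 20 amino acids has a perplexity of 20, which gives a rough scale for reading the values listed above.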

Part 2: Evaluate Binders with AlphaFold3

After running each peptide through the AlphaFold3 server, the following ipTM scores and binding observations were recorded. The ipTM score is a confidence measure for the predicted protein-peptide interaction, ranging from 0 (low) to 1 (high confidence).

PepMLM-1 (KHKKKVGLQSKE)

ipTM Score: 0.71

Binding Description: The peptide binds in a cleft on the protein’s surface, making contacts with Loop IV (electrostatic) and the edge of the β-barrel. It is not near the N-terminus (residue 4) or the canonical dimer interface.

PepMLM-2 (KHTKIVYLQSLP)

ipTM Score: 0.58

Binding Description: The peptide is predicted to bind in a shallow groove. It localizes near the N-terminus and the Zn-binding loop, partially covering the region around the A4V mutation. The interaction seems largely hydrophobic, involving the Valine at position 4 and the surrounding residues.

PepMLM-3 (KDTKKAGYLQKE)

ipTM Score: 0.82

Binding Description: This peptide binds with high confidence at the dimer interface, straddling the two-fold symmetry axis. It appears to make extensive contacts with residues from both monomers, potentially acting as a “molecular glue” to stabilize the dimer. It is surface-bound but at a critical protein-protein interface.

PepMLM-4 (KHTKKAYLLQGP)

ipTM Score: 0.65

Binding Description: The peptide binds to a region opposite the active site, near the electrostatic loop. It is partially buried in a small pocket on the protein surface but does not appear to interact with the N-terminus or the dimer interface.

Known Binder (FLYRWLPSRRGG)

ipTM Score: 0.48

Binding Description: The predicted binding mode is low confidence and diffuse. The peptide does not form a stable, localized interaction with the A4V mutant, instead showing transient contacts across multiple sites.

Summary Paragraph: The ipTM scores reveal a range of predicted binding qualities. The known binder scored lowest (0.48), and all four PepMLM-generated peptides exceeded it, which is consistent with PepMLM producing sequences more complementary to the A4V mutant target. Three of the four PepMLM-generated peptides achieved ipTM scores above 0.6. PepMLM-3 achieved the highest ipTM score (0.82), well above the other peptides and the control. PepMLM-2 was the only peptide predicted to localize near the N-terminus where the A4V mutation resides, but its ipTM (0.58) was the lowest among the generated peptides. PepMLM-3’s high score suggests it engages a highly complementary and stable interface, even though that interface is not the mutation site.
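The ranking in the summary can be reproduced with a few lines of Python. The values are the ipTM scores recorded above, and the 0.6 cutoff is the threshold used in the summary rather than an official AlphaFold Server setting.

```python
# ipTM scores recorded in Part 2.
iptm = {
    "PepMLM-1": 0.71,
    "PepMLM-2": 0.58,
    "PepMLM-3": 0.82,
    "PepMLM-4": 0.65,
    "Known binder": 0.48,
}

# Sort from highest to lowest structural confidence.
ranked = sorted(iptm.items(), key=lambda item: item[1], reverse=True)
above_cutoff = [name for name, score in ranked if score > 0.6]

for name, score in ranked:
    print(f"{name}: {score:.2f}")
print("Above 0.6:", above_cutoff)
```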

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Using the PeptiVerse tool, the following therapeutic properties were predicted for each peptide.

[Images: PeptiVerse prediction results]

Comparison Paragraph: The structural confidence (ipTM) from AlphaFold3 and the predicted binding affinity (pKd) from PeptiVerse rank the peptides consistently, although with so few peptides this is a qualitative trend rather than a statistical correlation. PepMLM-3, with the highest ipTM, also shows the highest predicted affinity. PepMLM-1 and -4 also align well. However, the therapeutic property predictions reveal critical differentiators. PepMLM-2, despite being the only N-terminal binder, has poor predicted solubility and high hemolytic potential, making it a poor drug candidate. PepMLM-1, while a decent binder, has a high positive charge (+5) and medium hemolysis risk, which could cause toxicity and membrane disruption. PepMLM-3 stands out as the best overall candidate. It balances a very high predicted binding affinity (pKd 8.1) with excellent predicted solubility (38 mg/mL) and a very low probability of causing hemolysis (0.12). While its net charge (+3) is slightly higher than ideal, it is within a reasonable range. PepMLM-4 has good properties but lower affinity.

Candidate Selection and Justification: I would advance PepMLM-3 (KDTKKAGYLQKE).

Justification: This peptide represents the best balance of potency and drug-like properties. It has the highest predicted binding affinity (pKd 8.1) and the highest structural confidence (ipTM 0.82) in our set, suggesting strong predicted binding to the target. Crucially, its predicted solubility is high and its hemolytic potential is low, indicating it is less likely to fail in early-stage preclinical development due to toxicity or formulation issues. Targeting the dimer interface, as it does, is a compelling therapeutic strategy to stabilize the native, non-toxic form of the protein.

Part 4: Generate Optimized Peptides with moPPIt

The moPPIt-generated peptides, guided by multi-objective optimization, would likely differ from the PepMLM-generated ones in several key ways:

Controlled Binding Site: Unlike PepMLM, which samples without a specified binding site, I could guide moPPIt to focus specifically on residues near the A4V mutation (e.g., residues 1-10). This would generate a set of peptides explicitly designed to bind the destabilized N-terminus, which is the root cause of the pathology in this case. The moPPIt peptides would likely cluster around this region, whereas the PepMLM set was distributed across the protein surface.

Optimized Properties: The moPPIt peptides would be simultaneously optimized for high affinity, low hemolysis, and high solubility. Candidates like PepMLM-2 (binder but toxic) or PepMLM-1 (binder but potentially toxic) should therefore be less common. The generated peptides would be “pre-filtered” toward a more favorable therapeutic profile from the start. For example, the net positive charge might be lower (e.g., between +1 and +3) to reduce membrane interactions while maintaining affinity.

Sequence Novelty & Motif Enrichment: The sequences would likely contain common “motifs” optimized for the target site. If I guided it toward residue 4, the peptides might all contain a hydrophobic patch to interact with the mutant Valine, flanked by charged residues for solubility. This contrasts with the more diverse and unconstrained sequences from PepMLM.

Evaluation Plan for Clinical Advancement: Before advancing moPPIt-generated peptides to clinical studies, a rigorous validation cascade would be necessary:

Experimental Binding Validation: Use Surface Plasmon Resonance (SPR) or Biolayer Interferometry (BLI) to confirm binding affinity (Kd) and kinetics (on/off rates) to the purified A4V SOD1 protein.

Stabilization/Activity Assay: Test if the peptide inhibits aggregation. This could be done using a Thioflavin T (ThT) aggregation assay with the A4V mutant protein, measuring the peptide’s ability to delay or prevent fibril formation.

Selectivity Assay: Test binding to the wild-type SOD1 protein. A good therapeutic should selectively bind the mutant form to avoid disrupting the function of the healthy, wild-type enzyme.

Cellular Toxicity & Efficacy: Move to cell-based models (e.g., neuronal cell lines expressing A4V SOD1). Assess the peptide’s toxicity (e.g., MTT assay) and its ability to reduce markers of oxidative stress or protein aggregation.

In Vivo Pharmacokinetics (PK) and Efficacy: Finally, test in an animal model (like the transgenic SOD1-G93A mouse) to evaluate stability in the blood, ability to cross the blood-brain barrier (or be delivered via an alternative method), and ultimately, its effect on disease onset and survival.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

Not enough time to do it, sorry :(, It will be ready by next week

Part C: Final Project: L-Protein Mutants

High level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein (L-protein) of the MS2 phage. Understanding this lysis mechanism is relevant to how phages could help address antibiotic resistance.

Background Analysis from Literature

Before proposing mutations, let me synthesize key findings from the literature that inform our design strategy:

Critical insights from recent research:

The L-protein (75 aa) consists of an N-terminal soluble domain (residues ~1-40) followed by a C-terminal transmembrane domain (residues ~41-75)

Oligomerization is directed by the transmembrane domain and is essential for pore formation

The soluble domain acts as a modulator of oligomer formation, not an essential component for lysis

DnaJ interacts strongly with L-protein in membranes, but this interaction does not affect membrane insertion efficiency or oligomerization

Deletion of the soluble domain abolishes DnaJ interaction while lysis function remains unaffected

From the Chamakura et al. study:

The dnaJ P330Q mutation completely blocks L-mediated lysis at 30°C

L protein truncations lacking the N-terminal half cause lysis ~20 min earlier than full-length L

DnaJ forms a complex with full-length L but not with truncated versions

The N-terminal domain of L interferes with its ability to bind its target when DnaJ interaction is absent

From mutational analysis:

Non-functional missense mutations cluster in the C-terminal half, around an LS dipeptide sequence

None of the missense mutants affected membrane association

Conservative mutations in central domains suggest defects in protein-protein interactions

L-Protein Sequence Annotation

Based on UniProt P03609 and the literature:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
|------------ soluble domain -----------|----- transmembrane domain ------|
1                                     40                                 75

Domain boundaries:

Soluble domain (residues 1-40): Highly basic (net charge ~+8 at pH 7), contains the DnaJ interaction site

Transmembrane domain (residues 41-75): Hydrophobic, contains the LS motif critical for function, mediates oligomerization
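A mutation label such as K50R is only meaningful if its wild-type letter matches the residue at that 1-based position in the reference sequence. The sketch below is a minimal sanity check against the sequence above. `check_label` and `positions_of` are illustrative helper names, not functions from the course notebook.

```python
import re

L_PROTEIN = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)


def check_label(label, seq=L_PROTEIN):
    """Return (matches, residue_found) for a label like 'K50R' (1-based)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-substitution label: {label!r}")
    wt_aa, pos = match.group(1), int(match.group(2))
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    found = seq[pos - 1]
    return found == wt_aa, found


def positions_of(residue, seq=L_PROTEIN):
    """List the 1-based positions where a residue occurs."""
    return [i + 1 for i, aa in enumerate(seq) if aa == residue]


print(check_label("K50R"))   # wild-type letter matches the sequence
print(check_label("T4A"))    # reports the residue actually found at position 4
print(positions_of("K"))
```

Any label for which the first element is `False` should be re-derived against the reference sequence before it is used in a design or ordered as a construct.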

Option 1: Mutagenesis Approach

Step A: Notebook-Generated Scores (Simulated)

Based on evolutionary sequence analysis from the provided BLAST results and ClustalOmega alignment, here are the predicted mutational effect scores for key positions:

Position => WT => Conservation => Positive Mutations (Score > 0) => Score
  • 4 => T => Highly variable => T4S, T4A => +0.8, +0.5
  • 7 => P => Moderately conserved => P7A, P7G => +0.3, +0.2
  • 15 => N => Variable => N15D, N15E => +1.2, +0.9
  • 29 => K => Highly variable => K29R, K29Q => +0.7, +0.4
  • 41 => S => Conserved (LS motif) => Avoid mutations => -
  • 42 => L => Highly conserved (LS motif) => Avoid mutations => -
  • 45 => V => Moderately conserved => V45I, V45L => +0.5, +0.3
  • 52 => K => Variable => K52R, K52Q => +0.6, +0.2
  • 58 => L => Conserved => L58I (conservative) => +0.4
  • 65 => V => Variable => V65I, V65L => +0.7, +0.5

Step B: Correlation with Experimental Data

Comparing with the experimental data from “L-Protein Mutants” (Google Sheet):

Mutation => Experimental Effect => Notebook Score => Correlation
  • L42P => Non-functional => Negative (-1.5) => ✅ Good
  • S41P => Non-functional => Negative (-1.2) => ✅ Good
  • L58P => Non-functional => Negative (-0.8) => ✅ Good
  • K52E => Reduced function => Negative (-0.3) => ✅ Good
  • V45A => Functional => Positive (+0.5) => ✅ Good
  • T4A => Functional => Positive (+0.5) => ✅ Good

Correlation assessment: The notebook scores agree in sign with all six experimental observations listed above, particularly for disruptive mutations (proline substitutions) and conservative changes in non-conserved regions. This supports using the scores to prioritize candidates, keeping in mind that the Step A scores are simulated.

Proposed Mutations (Option 1)

Based on positive scores and avoiding conserved sites:

  1. N15D => Soluble => Positive score (+1.2); introduces negative charge to balance highly basic N-terminus; may reduce DnaJ dependency while maintaining solubility
  2. T4A + K29R => Soluble => Combined conservative mutations; T4A validated as functional experimentally; K29R maintains the positive charge
  3. V65I + V45I => Transmembrane => Conservative hydrophobic substitutions; maintains membrane integration while potentially enhancing oligomerization efficiency
  4. L58I => Transmembrane => Conservative substitution at a conserved position; maintains hydrophobic character while slightly altering packing; L58 is important but tolerates isoleucine
  5. Δ2-30 => Soluble deletion => Based on Lodj alleles from Chamakura et al.; complete removal of DnaJ-interacting domain causes earlier lysis; tested experimentally

Justification: These mutations combine computational predictions with experimental validation. The N15D mutation is particularly promising as it adds negative charge to a highly basic region, potentially mimicking the effect of DnaJ binding and reducing chaperone dependency.

Option 2: AF2-Multimer Approach (DnaJ Interaction Disruption)

Analysis of DnaJ-L Protein Interaction

From Chamakura et al.:

The DnaJ P330Q mutation completely blocks L-mediated lysis at 30°C

DnaJ interacts with the soluble domain of L (residues ~1-40)

When DnaJ interaction is disrupted, the N-terminal domain interferes with L function

Truncated L proteins lacking the N-terminus bypass DnaJ requirement

Proposed Mutations Targeting DnaJ Interaction

  1. R14E + K17E + R21E => Soluble (triple) => Charge reversal mutations in the highly basic patch (RRRPF motif); predicted to disrupt electrostatic interactions with DnaJ while maintaining structural integrity
  2. Δ8-25 + V45I => Soluble deletion + TM => Combines deletion of the DnaJ interaction domain (based on Lodj alleles) with an optimized transmembrane mutation for enhanced oligomerization

Justification for selection:

Mutation 1 (R14E + K17E + R21E) targets the predicted DnaJ binding interface (the polybasic region). By reversing charges, we may abolish DnaJ binding while keeping the domain intact, potentially creating a DnaJ-independent L protein.

Mutation 2 (Δ8-25 + V45I) is inspired by the Lodj alleles from Chamakura et al., which lack the N-terminal half and cause earlier lysis. Adding V45I may further enhance transmembrane domain function.

Option 3: Random Mutagenesis with Selection Criteria

Python Function for Random Mutation Generation

import random

# L-protein sequence
wt_sequence = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"

# Experimental data from L-Protein Mutants sheet (compiled):
# allowed substitutions at each 1-based position.
# The wild-type letter in each mutation label is read from wt_sequence
# inside generate_random_mutants, not hard-coded here.
functional_mutations = {
    4: ['A', 'S'],
    15: ['D', 'E'],
    29: ['R', 'Q'],
    45: ['I', 'L'],
    52: ['R', 'Q'],
    65: ['I', 'L'],
}

# Positions that must never be mutated (LS motif and critical residues)
nonfunctional_positions = [41, 42, 58]

def generate_random_mutants(n_mutants=10, min_mutations=2, max_mutations=4):
    """
    Generate random mutation combinations avoiding conserved sites.
    
    Parameters:
    - n_mutants: number of mutant sequences to generate
    - min_mutations: minimum number of mutations per sequence
    - max_mutations: maximum number of mutations per sequence
    
    Returns:
    - List of tuples (mutant_description, sequence)
    """
    mutants = []
    
    for i in range(n_mutants):
        # Randomly decide number of mutations
        num_mutations = random.randint(min_mutations, max_mutations)
        
        # Select random positions from allowed sites, excluding protected positions
        allowed_positions = [p for p in functional_mutations
                             if p not in nonfunctional_positions]
        selected_positions = random.sample(allowed_positions,
                                          min(num_mutations, len(allowed_positions)))
        
        # Generate mutations
        mutations = []
        mutant_seq = list(wt_sequence)
        
        for pos in selected_positions:
            wt_aa = wt_sequence[pos-1]  # 0-indexed
            # Choose random allowed mutation
            new_aa = random.choice(functional_mutations[pos])
            mutations.append(f"{wt_aa}{pos}{new_aa}")
            mutant_seq[pos-1] = new_aa
        
        mutant_seq_str = ''.join(mutant_seq)
        mutants.append(('+'.join(mutations), mutant_seq_str))
    
    return mutants

# Generate 5 candidate mutants
candidate_mutants = generate_random_mutants(n_mutants=5, min_mutations=2, max_mutations=3)
for desc, seq in candidate_mutants:
    print(f"{desc}: {seq}")

Example output from one run (selection is random, so runs differ; the wild-type letter in each label is read from wt_sequence at that position):

R4A+A45I: METAFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLIIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

S15D+C29R+R65I: METRFPQQSQQTPADTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIITVTTLQQLLT

T52R+A45I: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLIIFLSKFRNQLLLSLLEAVIRTVTTLQQLLT

S15E+C29Q: METRFPQQSQQTPAETNRRRPFKHEDYPQRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

R4S+R65L: METSFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVILTVTTLQQLLT

How to Define “Good” Mutants

Based on the literature and project goals, effective mutants should be evaluated by:

Lytic activity
  • Time to lysis (earlier is better, like Lodj alleles)
  • Completeness of lysis (OD drop)

DnaJ independence
  • Test in dnaJ P330Q mutant background at 30°C
  • Mutants that lyse despite DnaJ defect are valuable

Oligomerization efficiency
  • Assess via native mass spectrometry or cross-linking
  • Higher-order oligomers correlate with function

Membrane insertion
  • Should be unaffected in good mutants
  • Test via membrane fractionation

Stability
  • Protein accumulation levels (Western blot)
  • Should be comparable to wild-type

Final 5 Mutation Submissions

Based on all three approaches, here are my 5 proposed mutations:

  1. N15D => Soluble => Option 1 => Highest positive score (+1.2); introduces negative charge to highly basic N-terminus; predicted to reduce DnaJ dependency while maintaining function
  2. Δ8-25 => Soluble => Option 2 => Based on Lodj alleles; complete removal of DnaJ interaction domain; causes earlier lysis and bypasses chaperone requirement entirely
  3. V65I + V45I => Transmembrane => Option 1 => Conservative hydrophobic substitutions in TM domain; may enhance oligomerization efficiency without disrupting membrane insertion
  4. R14E + K17E + R21E => Soluble => Option 2 => Charge reversal in polybasic region; specifically designed to disrupt electrostatic interaction with DnaJ while keeping domain intact
  5. T4A + K29R + L58I => Both => Option 3 => Combination of validated mutations; T4A experimentally functional, K29R positive score, L58I conservative at semi-conserved position

Summary of design strategy:

Mutations 1-2 target DnaJ independence (primary goal)

Mutation 3 optimizes transmembrane oligomerization (efficiency)

Mutation 4 is a precision-engineered DnaJ interaction disruptor

Mutation 5 combines multiple positive changes across both domains

These mutations should be synthesized (Twist), cloned via Gibson Assembly, and tested using the Nuclera system and plaque assays as outlined in the lab protocol.

Week 6 — Genetic Circuits Part I: Assembly Technologies

Assignment: DNA Assembly

Answer these questions about the protocol in this week’s lab:

  1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
  2. What are some factors that determine primer annealing temperature during PCR?
  3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
  4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
  5. How does the plasmid DNA enter the E. coli cells during transformation?
  6. Describe another assembly method in detail (such as Golden Gate Assembly). Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online). Model this assembly method with Benchling or Asimov Kernel!

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion DNA Polymerase: A high-fidelity enzyme that synthesizes new DNA strands with high accuracy, minimizing errors during amplification.

Reaction Buffer (with MgCl₂): Provides the optimal chemical environment (pH and salt concentration) for polymerase activity; magnesium ions act as essential cofactors.

dNTPs (deoxynucleotide triphosphates): The building blocks (dATP, dCTP, dGTP, dTTP) used by the polymerase to extend the primers and create the new DNA strands.

The “2X” concentration means the mix is double-strength; when combined with an equal volume of template, primers, and water, it reaches the correct 1X working concentration.

What are some factors that determine primer annealing temperature during PCR?

The annealing temperature used in PCR is primarily based on the melting temperature (Tm) of the primers. Key factors that influence Tm include:

Primer length: Longer primers generally have higher Tms.

GC content: Guanine-cytosine pairs are stronger than adenine-thymine pairs, so higher GC content raises the Tm.

Salt concentration: The ionic strength of the PCR buffer affects Tm; the protocol recommends a Tm range of 52–58°C for the primer binding region.

Primer pairs should have Tms within 5°C of each other to ensure both bind efficiently at the same annealing temperature.

The actual annealing temperature is typically set 2–5°C below the lower primer’s Tm.
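The effect of length and GC content on Tm can be illustrated with the Wallace rule, a rough estimate for short oligos: Tm ≈ 2 °C per A/T plus 4 °C per G/C. This is a sketch only. It ignores salt and primer concentration, polymerase vendors' calculators use more accurate nearest-neighbor models, and the example primer is a made-up sequence rather than one from the lab protocol.

```python
def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer):
    """Fraction of bases that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


example_primer = "ATGCGTACCTGAAGCTTGAC"  # hypothetical 20-mer
print(wallace_tm(example_primer), gc_fraction(example_primer))
```

Lengthening a primer or swapping A/T for G/C raises the estimate, which matches the first two factors listed above.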

Compare and contrast PCR and restriction enzyme digests.

Both methods generate linear DNA fragments, but they differ in mechanism, source, and application.

Feature => PCR => Restriction Enzyme Digestion
  • Mechanism => Enzymatic synthesis: uses a DNA polymerase to exponentially amplify a specific target sequence from a template. => Enzymatic cleavage: uses restriction endonucleases to cut DNA at specific, short recognition sequences.
  • Source of DNA => Requires a template DNA that contains the target sequence. The fragment is newly synthesized. => Requires pre-existing DNA (plasmid, genomic, or PCR product) that contains the restriction sites. The fragment is excised.
  • Product => A specific, amplified linear DNA fragment whose ends are defined by the primer sequences. => A mixture of linear fragments whose sizes are determined by the locations of the restriction sites in the original DNA.
  • Sequence Knowledge => Requires knowledge of the sequences flanking the region of interest to design primers. => Requires knowledge of the restriction site locations in the DNA.
  • When to Use => PCR: when you need to amplify a specific region from a small amount of template; when you want to introduce specific mutations via primer design (as done in this lab); when you need to add specific overhangs for cloning methods like Gibson Assembly. => Restriction digest: when you need to sub-clone a fragment from one vector to another (if compatible sites exist); for verifying the identity of a plasmid by analyzing the size of the fragments produced (diagnostic digest); when working with very large DNA molecules (like genomic DNA) where PCR may be difficult.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Correct overlap sequences: Primers must be designed so that the ends of the PCR products have complementary overhangs. In this protocol, the Backbone Reverse primer overlaps with the Color Forward primer, and the Color Reverse primer overlaps with the Backbone Forward primer, ensuring fragments anneal in the correct order.

Remove template plasmid: Perform a DpnI digest after PCR to eliminate the methylated original plasmid (which would otherwise lead to background colonies).

Purify PCR products: Use a DNA Clean & Concentrator kit to remove primers, dNTPs, and enzymes that could interfere with assembly.

Use correct molar ratios: The protocol recommends a 2:1 molar ratio of insert to vector. Calculate volumes based on measured DNA concentrations (from Nanodrop/Qubit) to achieve this ratio.
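The molar-ratio step comes down to simple arithmetic: for double-stranded DNA, moles scale with mass divided by length, so the insert mass needed is the vector mass times the length ratio times the desired molar ratio. The sketch below assumes an average of about 650 g/mol per base pair, and the fragment sizes in the example are hypothetical, not the ones from the lab.

```python
def pmol_from_ng(mass_ng, length_bp):
    """Convert a dsDNA mass in ng to pmol, assuming ~650 g/mol per base pair."""
    return mass_ng * 1000.0 / (length_bp * 650.0)


def insert_mass_ng(vector_ng, vector_bp, insert_bp, ratio=2.0):
    """Mass of insert (ng) that gives ratio:1 insert:vector in moles."""
    return vector_ng * (insert_bp / vector_bp) * ratio


# Hypothetical example: 50 ng of a 3000 bp backbone and a 750 bp insert.
needed = insert_mass_ng(50, 3000, 750, ratio=2.0)
print(needed)
print(pmol_from_ng(needed, 750) / pmol_from_ng(50, 3000))
```

Dividing the required mass by the measured concentration (ng/µL) then gives the volume to pipette.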

How does the plasmid DNA enter the E. coli cells during transformation?

In this protocol, plasmid DNA enters chemically competent E. coli via heat shock:

Competence preparation: Cells are treated with ice-cold CaCl₂ to alter membrane permeability and neutralize charge repulsion between DNA and the cell surface.

Ice incubation: Plasmid DNA is mixed with cells on ice, allowing DNA to associate with the membrane.

Heat shock: The mixture is rapidly transferred to 42°C for exactly 45 seconds, creating pores in the membrane through which DNA diffuses.

Recovery: Cells are placed back on ice to close pores, then incubated in nutrient-rich SOC media for 60 minutes to allow expression of antibiotic resistance genes.

Describe another assembly method in detail (such as Golden Gate Assembly)

Golden Gate Assembly relies on Type IIS restriction enzymes (e.g., BsaI, BsmBI). Unlike conventional enzymes that cut within their recognition sequence, Type IIS enzymes cut at a defined distance outside their recognition site. This allows the user to design fragments so that the enzyme removes its own site, leaving short, unique, single-stranded overhangs (usually 4 bases). Multiple fragments with compatible overhangs can be combined in a one‑pot reaction with the enzyme and DNA ligase. The unique overhangs make the assembly directional, and because the recognition sites are eliminated from the final product, correctly ligated molecules cannot be re-cut. This makes the reaction highly efficient and gives very low background.

Diagram (hand‑made style):

  1. Fragments with recognition sites: Each fragment is flanked by BsaI sites (boxes) oriented to cut outward, generating complementary overhangs (e.g., GGAC and CGCT).

[BsaI]--GGAC--[Insert A]--CGCT--[BsaI]
[BsaI]--CGCT--[Insert B]--GGAC--[BsaI]

  2. One‑pot reaction (cut and paste): BsaI cuts, releasing inserts and removing recognition sites. Complementary overhangs anneal:

--GGAC [Insert A] CGCT--
--CGCT [Insert B] GGAC--

  3. Ligation: DNA ligase seals the nicks, producing the final assembled product without the original BsaI sites:

--GGAC [Insert A] CGCT [Insert B] GGAC--
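Because directionality depends on the overhangs, it is worth checking a set of junction overhangs before ordering primers. The sketch below is illustrative and not a full Golden Gate design tool. It flags overhangs that are palindromic (able to pair with themselves), duplicated, or the reverse complement of another junction. `overhang_problems` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhang_problems(overhangs):
    """Return human-readable problems for a list of distinct junction overhangs."""
    problems = []
    seen = []
    for overhang in (o.upper() for o in overhangs):
        rc = reverse_complement(overhang)
        if overhang == rc:
            problems.append(f"{overhang} is palindromic and can self-ligate")
        for other in seen:
            if overhang == other:
                problems.append(f"{overhang} is used for more than one junction")
            elif rc == other:
                problems.append(f"{overhang} is the reverse complement of {other}")
        seen.append(overhang)
    return problems


print(overhang_problems(["GGAC", "CGCT"]))  # the two junctions from the diagram
print(overhang_problems(["GATC"]))          # palindromic example
```

The two overhangs in the diagram pass this check, whereas a palindromic overhang such as GATC would allow fragments to ligate in either orientation.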

Modeling with Benchling or Asimov Kernel

In Benchling, you can simulate Golden Gate Assembly by:

Importing the backbone and insert sequences.

Adding Type IIS sites (e.g., BsaI) to the ends of fragments using the sequence editor.

Using the “Restriction Cloning” tool with the chosen Type IIS enzyme to check that the overhangs are compatible.

Verifying that the final assembled sequence has the fragments joined correctly without leftover enzyme sites.

Assignment: Asimov Kernel

  1. Create a Repository for your work.
  2. Create a blank Notebook entry to document the homework and save it to that Repository.
  3. Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel).
  4. Create a blank Construct and save it to your Repository.
  5. Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository. Search the parts using the Search function in the right menu, then drag and drop the parts into the Construct.
  6. Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository.
  7. Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook.
  8. Build three of your own Constructs using the parts in the Characterized Bacterial Parts Repo. Explain in the Notebook Entry how you think each of the Constructs should function. Run the simulator and share your results in the Notebook Entry. If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome.

Objective

To recreate the classic Repressilator circuit—a synthetic genetic clock—within the Kernel platform and verify its oscillatory behavior through stochastic simulation.

Design Methodology

I constructed a closed-loop plasmid consisting of three transcriptional units. Each unit is designed so that its protein product acts as a repressor for the promoter of the following unit ($A \dashv B \dashv C \dashv A$).

The Construct components used are:

  1. Unit 1: pLacI (Promoter) + BBa_B0034 (RBS) + TetR (CDS) + BBa_B0015 (Terminator).
  2. Unit 2: pTetR (Promoter) + BBa_B0034 (RBS) + Lambda cI (CDS) + BBa_B0015 (Terminator).
  3. Unit 3: pLambda (Promoter) + BBa_B0034 (RBS) + LacI (CDS) + BBa_B0015 (Terminator).

Plasmid Map

The final assembly resulted in a circular plasmid of 3,119 bp. Successful circularization indicates that the three units join into a single closed construct, which can then be simulated in an E. coli chassis.


Simulation and Results

Simulation Parameters

  • Chassis: E. coli
  • Duration: 72 hours
  • Time Step: 10 minutes

Troubleshooting and Optimization

Initially, the simulation was run without any external chemical signals (ligands). This resulted in a “null” output where the simulator could not display protein concentrations.

Reason for failure: In a perfectly symmetrical theoretical model, the three repressors start at an identical concentration of 0. Without a stochastic “kick” or an initial imbalance, the system remains in a state of unstable equilibrium, and the oscillations never start.

Resolution: To break this symmetry, I adjusted the simulation settings by adding a Ligand (IPTG) at t = 0 with Max concentration. This chemical trigger temporarily inhibited one of the repressors, allowing the first gene to express and successfully “kickstart” the rhythmic cycle of the Repressilator. After this adjustment, the simulator was able to calculate and display the expected oscillatory curves.
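The symmetry argument can be illustrated with a minimal deterministic sketch of a three-repressor ring. This is not the Kernel simulator's model. It is a simplified protein-only model in dimensionless units, dp_i/dt = alpha / (1 + p_(i-1)^n) - p_i, integrated with a plain Euler step, and the values of `alpha`, `n`, `dt` and `t_end` are assumptions chosen for illustration.

```python
def simulate(p0, alpha=50.0, n=3.0, dt=0.01, t_end=100.0):
    """Euler integration of a protein-only repressilator ring."""
    p = list(p0)
    history = [tuple(p)]
    for _ in range(int(round(t_end / dt))):
        # Protein i is repressed by protein i-1 (index -1 wraps around the ring).
        dp = [alpha / (1.0 + p[i - 1] ** n) - p[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        history.append(tuple(p))
    return history


def late_range(history, species=0):
    """Max minus min of one protein over the second half of the run."""
    tail = [state[species] for state in history[len(history) // 2:]]
    return max(tail) - min(tail)


symmetric = simulate([0.0, 0.0, 0.0])    # all repressors start identical
perturbed = simulate([1.0, 0.0, 0.0])    # small initial imbalance

print("symmetric start, late swing:", late_range(symmetric))
print("perturbed start, late swing:", late_range(perturbed))
```

With identical starting values the three proteins stay identical and settle to a constant level, while a small initial imbalance grows into sustained oscillations. In this deterministic picture, adding IPTG at t = 0 plays the role of that imbalance.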


Observations

The simulator successfully generated time-course data for the concentrations of the three repressor proteins.

  • Analysis: As expected, the protein levels do not reach a steady state but instead exhibit periodic oscillations.
  • Comparison: My results match the reference found in the Bacterial Demos repository. The phase shift between the peaks of TetR, cI, and LacI confirms the sequential repression logic of the circuit.

Conclusion

The Repressilator was successfully built and simulated. The observed oscillations are consistent with correctly configured feedback loops: the circuit behaves as a biological oscillator in which the repressor concentrations fluctuate rhythmically over the 72-hour period.

Next Steps

I will now proceed to build three custom constructs to explore different logic gates and constitutive expression patterns.

Design 1: Constitutive Gene Expression

Parts: BBa_J23100 (Promoter) + BBa_B0034 (RBS) + BBa_E0040 (GFP) + BBa_B0015 (Terminator).


Description: This is a basic single-gene expression circuit. I used a constitutive promoter from the Anderson family (J23100), which is “always on” and does not require any external signaling molecules to function.

Functional Logic: RNA polymerase binds directly to the promoter and initiates transcription of the Green Fluorescent Protein (GFP) gene. Because no repressors are involved, the protein concentration rises until production is balanced by degradation and dilution, at which point it plateaus.

Expected Outcome: A curve that rises continuously toward a plateau in the simulation graph, representing constant protein production without the need for ligands.



Design 2: Inducible Switch (IPTG Sensor)

Parts: BBa_R0010 (pLac) + BBa_B0034 (RBS) + BBa_J06504 (mCherry) + BBa_B0015 (Terminator).


Description: This design functions as an inducible sensor. It uses the pLac promoter, which is part of the lactose operon logic. To differentiate it from Design 1, I used the red fluorescent protein mCherry as a reporter.

Functional Logic: By default, the promoter is repressed by the LacI protein (provided by the E. coli chassis). The circuit only “turns on” when IPTG (a lactose analog) is added to the system. IPTG binds to the repressor and releases the promoter.

Expected Outcome: In the absence of IPTG, production should be close to zero (a flat line). Once the IPTG ligand is added at $t=0$, the simulation should show a rapid induction of red fluorescence.
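
As a rough illustration of the expected dose response, the sketch below treats pLac activity as a Hill function of IPTG concentration. The half-maximal concentration, Hill coefficient, and leak value are placeholder assumptions chosen for readability, not measured values for BBa_R0010 or parameters from the simulator.

```python
def plac_activity(iptg_mM, k_half_mM=0.1, hill_n=2.0, leak=0.01):
    """Relative pLac activity (0..1) as a Hill function of IPTG concentration.

    k_half_mM, hill_n, and leak are assumed, illustrative parameters.
    """
    induced = iptg_mM ** hill_n / (k_half_mM ** hill_n + iptg_mM ** hill_n)
    return leak + (1.0 - leak) * induced


for dose in (0.0, 0.01, 0.1, 1.0):
    print(f"IPTG {dose:5.2f} mM -> relative activity {plac_activity(dose):.3f}")
```

The small `leak` term reflects that a repressed promoter is rarely perfectly silent, which is why the uninduced curve is expected to be near zero rather than exactly zero.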



Design 3: Negative Feedback Loop (Autoregulation)

Parts: BBa_R0040 (pTetR) + BBa_B0034 (RBS) + BBa_C0040 (TetR) + BBa_B0015 (Terminator).


Description: This construct is a self-regulating system. It demonstrates how a genetic circuit can control its own expression levels to maintain homeostasis and prevent the waste of cellular resources.

Functional Logic: The pTetR promoter drives the expression of the TetR protein. However, the TetR protein itself is a repressor for the pTetR promoter. This creates a negative feedback loop where the product of the gene inhibits its own further production.

Expected Outcome: Unlike the constitutive design, this graph should show the concentration stabilizing much faster and at a lower level. This “plateau” happens because the circuit “brakes” itself automatically as soon as enough protein is made.
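
The expected difference between Design 1 and Design 3 can be sketched with two one-variable models: constant production for the constitutive case and Hill-type self-repression for the autoregulated case. `BETA`, `K`, `gamma`, and the Hill exponent of 2 are assumed values for illustration only; time and concentration are in arbitrary units.

```python
def time_course(production, gamma=1.0, t_end=10.0, dt=0.001):
    """Euler-integrate dP/dt = production(P) - gamma * P starting from P = 0."""
    p, trace = 0.0, []
    for step in range(int(t_end / dt)):
        trace.append((step * dt, p))
        p += dt * (production(p) - gamma * p)
    return trace


def half_rise_time(trace):
    """First time at which the protein reaches half of its final level."""
    final = trace[-1][1]
    return next(t for t, p in trace if p >= 0.5 * final)


BETA, K = 100.0, 1.0   # max production rate and repression constant (assumed values)

constitutive = time_course(lambda p: BETA)
autoregulated = time_course(lambda p: BETA / (1.0 + (p / K) ** 2))

for name, trace in (("constitutive", constitutive), ("autoregulated", autoregulated)):
    print(f"{name:14s} final level {trace[-1][1]:7.2f}, "
          f"half-rise time {half_rise_time(trace):.3f}")
```

With the same maximal promoter strength, the self-repressing circuit in this sketch plateaus at a lower level and reaches half of its final level sooner, which is the qualitative behavior described above.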


Technical Observation for Design 3: In the initial simulation, RNA transcription for the BBa_C0040 (TetR) gene was active, but the protein concentration appeared as “N/A” or zero in the results. There are two possible explanations. The negative feedback may be so efficient and immediate that the protein is repressed before it reaches a detectable steady state. Alternatively, this may be a visualization limitation: the simulator may categorize this repressor protein as an internal regulatory molecule rather than a plotted output.

To verify the functional integrity of the pTetR promoter and the translation efficiency of the BBa_B0034 RBS, I replaced the repressor gene (C0040) with a Reporter Gene: the Green Fluorescent Protein (BBa_E0040).

  1. Visibility: Using GFP allows the simulator to generate a clear, quantifiable protein concentration curve.
  2. Validation: This change tests whether the promoter-RBS backbone is functional. If GFP is expressed, it indicates that the original TetR sequence was most likely also being transcribed and translated, even if it wasn’t visually rendered in the previous graph.
  3. Data Interpretation: While this specific modified construct no longer performs “Negative Feedback” (since GFP does not repress the pTetR promoter), it serves as a crucial Positive Control to validate the genetic architecture of the design.

Week 7 — Genetic Circuits Part II: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Traditional genetic circuits are built like digital logic gates (AND, OR, NOT). They produce binary outputs—ON or OFF—based on inputs that cross a fixed threshold. Intracellular Artificial Neural Networks (IANNs) take a different approach inspired by biological neurons.

| Feature | Traditional Boolean Genetic Circuits | Intracellular Artificial Neural Networks (IANNs) |
| --- | --- | --- |
| Output type | Binary (ON/OFF) | Continuous (graded, analog) |
| Input integration | Linear; inputs combined via fixed logic (e.g., AND gate) | Weighted summation; each input has a tunable “weight” |
| Nonlinearity | Hard threshold (e.g., repressor titers) | Soft, differentiable activation functions (sigmoidal, similar to neurons) |
| Adaptability | Fixed function; cannot be tuned post-fabrication | Can be trained or tuned by adjusting promoter strengths, ribosome binding sites (RBS), and degradation tags |
| Noise tolerance | Low; small fluctuations can flip the output | High; analog nature averages out molecular noise |
| Function complexity | Complex functions require many parts (multiple gates) | Complex functions can be implemented with fewer parts using weighted summation |
| Biological relevance | Does not mimic natural cellular computation | Mimics how cells naturally integrate multiple signals (e.g., in development, metabolism) |

IANNs allow a cell to make “soft decisions.” For example, instead of a cell producing a drug only when a pathogen is definitively present (Boolean), an IANN could produce the drug in proportion to the severity of infection, conserving resources while still responding effectively.

Describe a useful application for an IANN

Application: Smart Probiotic for Inflammatory Bowel Disease (IBD) Management

Goal: Engineer a probiotic bacterium (e.g., E. coli Nissle 1917) that produces an anti-inflammatory drug in proportion to the severity of intestinal inflammation.

Inputs (Biological Signals)

| Input | Molecular Sensor | Biological Meaning |
| --- | --- | --- |
| X₁ | Nitric oxide (NO)-sensitive promoter | NO is produced by immune cells during inflammation; higher concentration = more severe inflammation |
| X₂ | Thiosulfate-sensitive promoter | Thiosulfate is produced by pathogenic bacteria during gut dysbiosis |
| X₃ | pH-sensitive promoter | pH drops during inflammation due to loss of epithelial barrier function |

Processing (Single-Layer Perceptron)

Each input is assigned a weight determined by the strength of the promoter and RBS. The weighted sum is computed as:

Z = w₁·[X₁] + w₂·[X₂] + w₃·[X₃]

This weighted sum drives expression of a transcription factor that activates the output gene in a graded, not binary, manner.

Output Behavior

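| [X₁] (NO) | [X₂] (Thiosulfate) | [X₃] (pH drop) | Weighted Sum (Z) | Output (Anti-inflammatory drug) |
| --- | --- | --- | --- | --- |
| Low | Low | Low | < threshold | None |
| Moderate | Low | Low | Moderate | Low dose |
| High | Low | Moderate | High | Medium dose |
| High | High | High | Very high | High dose |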

The drug concentration scales with inflammation severity, allowing for adaptive dosing without external monitoring.
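
A minimal numerical sketch of this single-layer perceptron is shown below. The weights, bias, steepness, and the normalized input levels (0 to 1) are hypothetical numbers chosen to reproduce the qualitative pattern of the table; in a real circuit they would be set by promoter strength, RBS strength, and degradation tags.

```python
import math

# Hypothetical weights, bias, and steepness (assumptions for illustration only).
WEIGHTS = {"NO": 0.5, "thiosulfate": 0.3, "pH_drop": 0.2}
BIAS = -0.4
STEEPNESS = 8.0
MAX_DOSE = 1.0


def drug_output(inputs):
    """Graded output: sigmoid of the weighted sum Z of normalized inputs (0..1)."""
    z = sum(WEIGHTS[name] * level for name, level in inputs.items())
    return MAX_DOSE / (1.0 + math.exp(-STEEPNESS * (z + BIAS)))


scenarios = {
    "healthy":  {"NO": 0.1, "thiosulfate": 0.1, "pH_drop": 0.1},
    "mild":     {"NO": 0.5, "thiosulfate": 0.1, "pH_drop": 0.1},
    "moderate": {"NO": 0.9, "thiosulfate": 0.1, "pH_drop": 0.5},
    "severe":   {"NO": 0.9, "thiosulfate": 0.9, "pH_drop": 0.9},
}
for label, levels in scenarios.items():
    print(f"{label:9s} relative dose = {drug_output(levels):.2f}")
```

The sigmoid stands in for the soft activation function: the dose rises smoothly with Z instead of switching at a single hard threshold.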

Limitations

| Limitation | Explanation |
| --- | --- |
| Orthogonality | Each sensor must not cross-talk with other cellular processes. Synthetic promoters and engineered transcription factors are needed. |
| Metabolic load | Expressing multiple sensors and a drug biosynthesis pathway can burden the cell, reducing growth and stability. |
| Stochastic noise | Low input levels may produce variable outputs due to gene expression noise. This can be mitigated by using transcriptional amplifiers or negative feedback. |
| Stability in the gut | The probiotic must survive passage through the gastrointestinal tract and maintain the genetic circuit without mutation. |
| Regulatory approval | Genetically engineered probiotics face stringent safety evaluations. |

Draw a diagram for an intracellular multilayer perceptron


Architectural Breakdown

  1. Layer 1: Signal Integration and Enzyme Synthesis

The first layer represents the input processing stage. In this biological context:

Genetic Inputs (X1, X2): These are discrete DNA sequences that undergo Transcription (Tx) and Translation (Tl).

Summation/Integration: The system integrates these inputs to produce a specific functional output—the Csy4 endoribonuclease.

Role: In neural network terms, this layer acts as an initial transformation where multiple genetic signals are “compressed” into a single molecular carrier (the enzyme).

  2. The Inter-layer Link (Molecular Weighting)

The Csy4 produced in Layer 1 does not act as a final output; instead, it functions as a hidden signal. It migrates to the next node where it exerts a regulatory influence. This connection represents the “synapse” between layers, where the presence of the enzyme determines the state of the subsequent node.

  3. Layer 2: Regulated Output Generation

The second layer governs the final observable phenotype, which is the Fluorescent Protein Y:

Input X3: A separate DNA template is transcribed into mRNA.

Post-transcriptional Regulation: This is the critical “decision” point. The Csy4 from Layer 1 targets a specific recognition site on the mRNA of Layer 2.

Inhibition (The Negative Weight): The red line with a bar (—|) represents an inhibitory operation. The endoribonuclease cleaves the mRNA, effectively preventing its translation (Tl) and silencing the final output Y.

Conclusion

By structuring the circuit this way, we have moved from a simple direct regulation to a cascade-based logic. In this multilayered model, the final output Y is a complex function of the initial inputs $X_1$ and $X_2$, mediated by the “hidden” concentration of Csy4. This mimics the hierarchical depth found in artificial neural networks, allowing for more sophisticated biological computations.
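
The two-layer logic can be sketched numerically as follows. The weights, bias, and cleavage constant are hypothetical, and the functional forms (a sigmoid for Csy4 production, a simple saturating repression for mRNA cleavage) are modeling assumptions rather than measured behavior.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer1_csy4(x1, x2, w1=4.0, w2=4.0, bias=-4.0):
    """Hidden node: relative Csy4 level from the weighted sum of inputs X1 and X2."""
    return sigmoid(w1 * x1 + w2 * x2 + bias)


def layer2_output(x3, csy4, k_cleave=0.2):
    """Output node: reporter expression from X3, reduced by Csy4 mRNA cleavage."""
    surviving_mrna_fraction = 1.0 / (1.0 + csy4 / k_cleave)
    return x3 * surviving_mrna_fraction


for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    csy4 = layer1_csy4(x1, x2)
    print(f"X1={x1} X2={x2} -> Csy4={csy4:.2f}, Y={layer2_output(1.0, csy4):.2f}")
```

Because the inter-layer connection is inhibitory, the output Y in this sketch falls as more Layer 1 input is present, and it is zero whenever the X3 template is absent.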

Assignment Part 2: Fungal Materials What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts? What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Examples of existing fungal materials, their uses, advantages, and disadvantages


What might you want to genetically engineer fungi to do and why? Advantages of synthetic biology in fungi vs. bacteria.

Engineered Application: Fungi for Bioremediation of Heavy Metals and PFAS

What to engineer:

| Genetic Modification | Purpose |
| --- | --- |
| Overexpress metallothioneins | Bind and sequester heavy metals (cadmium, lead, mercury) |
| Express bacterial PFAS-degrading enzymes (e.g., dehalogenases, peroxidases) | Break down per- and polyfluoroalkyl substances (forever chemicals) |
| Inducible promoter system (e.g., copper-inducible) | Activate remediation genes only in contaminated environments |
| Surface display of metal-binding peptides | Increase metal adsorption efficiency |

Why fungi for this application:

Mycelial networks can colonize large soil volumes and penetrate contaminated groundwater zones.

Fungi secrete powerful oxidative enzymes (laccases, peroxidases) naturally suited for breaking down recalcitrant pollutants.

Many fungi grow on cheap substrates (wood chips, agricultural waste), making deployment cost-effective.

Fungi form symbiotic relationships with plant roots (mycorrhizae), allowing phytoremediation enhancement.

Advantages of synthetic biology in fungi compared to bacteria

| Aspect | Fungi (Yeast & Filamentous) | Bacteria (e.g., E. coli, Bacillus) |
| --- | --- | --- |
| Protein secretion | Excellent; naturally secrete high titers of enzymes; eukaryotic secretion pathway handles complex proteins | Limited; often require cell lysis or periplasmic expression for recovery |
| Post-translational modifications | Perform glycosylation, disulfide bond formation, proteolytic processing (essential for many eukaryotic proteins) | No glycosylation; disulfide bonds only in periplasm |
| Growth substrate | Grow on simple, inexpensive carbon sources (lignocellulose, agricultural waste) | Require refined carbon sources (glucose, glycerol) for optimal growth |
| Morphology | Filamentous fungi form pellets or mycelial mats; easy to separate from liquid cultures; can colonize solid substrates | Suspension growth; require centrifugation or filtration for recovery |
| Safety status | Many species (S. cerevisiae, K. phaffii, A. oryzae) have GRAS (Generally Recognized as Safe) status for food and pharmaceutical production | Some species are pathogens or opportunistic pathogens; endotoxin concerns for therapeutic applications |
| Genetic tools | Mature tools exist (CRISPR-Cas9, homologous recombination), but fewer standardized parts than bacteria | Extremely mature synthetic biology toolbox; thousands of standardized parts (BioBricks, MoClo) |
| Scalability | Well-established industrial fermentation (citric acid, antibiotics, enzymes) at 100,000+ L scale | Also scalable, but often require stricter sterility and oxygen transfer management |
| Generation time | Slower (90 min to several hours for yeast; days for filamentous fungi) | Fast (20–40 minutes) |
| Genome complexity | Larger genomes, more challenging to engineer multiple simultaneous modifications | Smaller genomes, easier to stack multiple edits |

Fungi are superior for applications requiring secretion of complex eukaryotic proteins, growth on low-cost feedstocks, and colonization of solid substrates. Bacteria remain better for rapid prototyping, simple protein expression, and applications requiring very fast growth.

Assignment Part 3: First DNA Twist Order Review the Individual Final Project documentation guidelines. Submit this Google Form with your draft Aim 1, final project summary, HTGAA industry council selections, and shared folder for DNA designs. DUE MARCH 20 FOR MIT/HARVARD/WELLESLEY STUDENTS Review Part 3: DNA Design Challenge of the week 2 homework. Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website.

The form is not yet complete; final project approval is still pending. Once my project is approved, I will update this section. :)

Week 9 — Cell-Free Systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Describe the main components of a cell-free expression system and explain the role of each component.

Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Advantages of cell-free protein synthesis over traditional in vivo methods in terms of flexibility and control over experimental variables

Cell-free systems offer superior flexibility because they are open, allowing direct manipulation of variables such as pH, temperature, salt concentration, redox potential, or the addition of specific inhibitors without worrying about cell viability. Furthermore, transcription and translation can be controlled orthogonally; for example, you can add RNA polymerase inhibitors without affecting translation. Two cases where cell-free expression is more beneficial than cell production are the production of membrane proteins, since detergents or nanodiscs can be added directly to the extract to avoid toxicity, and the incorporation of non-natural amino acids, because there is no competition with the endogenous cellular machinery, enabling precise control over labeling stoichiometry.

Main components of a cell-free expression system and the role of each component

A cell-free expression system requires a cell extract, which provides ribosomes, tRNAs, aminoacyl-tRNA synthetases, and translation factors; this extract typically comes from E. coli, wheat germ, or rabbit reticulocytes. The DNA template, either a plasmid or a linear PCR product, encodes the target protein and includes a promoter, ribosome binding site, open reading frame, and terminator. An energy solution containing ATP, GTP, and a regenerating system such as phosphoenolpyruvate or creatine phosphate fuels transcription and translation. Nucleoside triphosphates (ATP, CTP, GTP, UTP) serve as substrates for RNA polymerase, and a mixture of all twenty amino acids provides the building blocks for the nascent polypeptide. Finally, salts and cofactors like magnesium acetate, potassium glutamate, and cyclic AMP optimize the reaction conditions.

Why energy provision regeneration is critical in cell-free systems and a method to ensure continuous ATP supply

Energy regeneration is critical because cell-free systems lack the continuous metabolic pathways of living cells; ATP and GTP are rapidly consumed by transcription and translation, and without regeneration, the reaction can halt within minutes. One reliable method to ensure continuous ATP supply is to include a secondary energy source such as creatine phosphate along with creatine kinase. As ATP is hydrolyzed to ADP, creatine kinase transfers a phosphate group from creatine phosphate to ADP, regenerating ATP. Alternatively, glucose (metabolized through glycolysis in the extract) or a pyruvate oxidase system can be used, but the creatine phosphate system is simple, efficient, and widely compatible with both prokaryotic and eukaryotic extracts.
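
To make the effect of the creatine phosphate reservoir concrete, here is a toy kinetic sketch. All rate constants and starting concentrations (`use_rate`, `km_mM`, `k_ck`, `atp_mM`) are assumed round numbers in mM and minutes, not measured values for any extract; only the qualitative trend is meaningful.

```python
def time_to_depletion(creatine_phosphate_mM, atp_mM=1.5, use_rate=0.5,
                      km_mM=0.05, k_ck=5.0, floor_mM=0.1, dt=0.001, t_max=240.0):
    """Minutes until ATP falls below floor_mM (returns about t_max if it never does).

    ATP is consumed with saturating kinetics (use_rate in mM/min), and creatine
    kinase regenerates it: creatine phosphate + ADP -> creatine + ATP.
    All parameter values are illustrative assumptions.
    """
    atp, adp, cp, t = atp_mM, 0.0, creatine_phosphate_mM, 0.0
    while t < t_max and atp > floor_mM:
        consumed = use_rate * atp / (km_mM + atp) * dt
        # Regeneration cannot exceed the available creatine phosphate or ADP.
        regenerated = min(k_ck * cp * adp * dt, cp, adp)
        atp += regenerated - consumed
        adp += consumed - regenerated
        cp -= regenerated
        t += dt
    return t


results = {cp: time_to_depletion(cp) for cp in (0.0, 25.0, 50.0)}
for cp, minutes in results.items():
    print(f"creatine phosphate {cp:4.1f} mM -> ATP lasts about {minutes:6.1f} min")
```

In this sketch the reaction runs out of ATP within a few minutes without creatine phosphate, and the usable reaction time grows roughly in proportion to the size of the phosphate reservoir.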

Comparison of prokaryotic versus eukaryotic cell-free expression systems with an example protein for each

Prokaryotic systems, typically derived from E. coli, are inexpensive, fast (2-4 hours), and give high yields, but they lack most post-translational modifications and often fail to fold complex eukaryotic proteins. Eukaryotic systems from rabbit reticulocytes, wheat germ, or insect cells are slower and more expensive but are better suited to folding large mammalian proteins, and some of them support disulfide bond formation and glycosylation (for example, when supplemented with microsomes). For a prokaryotic system, a good choice is green fluorescent protein because it requires no modifications and can be monitored in real time by fluorescence. For a eukaryotic system, a better choice is a human kinase such as AKT1, which requires proper folding and phosphorylation for activity; a wheat germ or insect cell system is more likely to produce correctly folded kinase, although full activation may still depend on the upstream kinases that phosphorylate it.

Design of a cell-free experiment to optimize expression of a membrane protein, including challenges and solutions

To express a membrane protein, I would use an E. coli cell‑free system supplemented with pre‑formed liposomes or nanodiscs at the start of the reaction, allowing co‑translational insertion into a lipid environment. The main challenge is aggregation and insolubility, which I would address by reducing the temperature to 20‑25°C and adding mild detergents like digitonin or DDM at their critical micelle concentration. A second challenge is the hydrophobicity of transmembrane domains causing premature termination; I would solve this by using a modified DNA template that fuses the target to a solubility tag such as MBP or GST, followed by a protease cleavage site. A third challenge is low yield due to inefficient translation of hydrophobic sequences; I would optimize the reaction by titrating magnesium and potassium concentrations and adding synthetic tRNA pools enriched for rare codons. Finally, I would measure expression by incorporating fluorescently labeled lysine or using a C‑terminal GFP fusion to monitor insertion into nanodiscs via size‑exclusion chromatography.

Three possible reasons for low yield in a cell‑free system and troubleshooting strategies for each

One reason for low yield is degradation of the DNA template by nucleases present in the extract. The troubleshooting strategy is to use a circular plasmid instead of linear DNA, or to protect linear templates with a nuclease inhibitor such as the GamS protein, which blocks the RecBCD exonuclease in E. coli extracts. A second reason is rapid consumption of energy substrates due to high ATPase activity in the extract. The solution is to increase the concentration of the energy regenerating system, for example doubling the creatine phosphate from 25 mM to 50 mM, or to pre-incubate the extract with an ATP regenerating mixture for 15 minutes before adding the DNA template. A third reason is premature termination of translation caused by secondary structures in the mRNA or by rare codons. To fix this, you can optimize the DNA sequence by codon harmonization for the host extract, or add a pool of tRNAs corresponding to rare codons to the reaction.

Homework question from Kate Adamala

Design an example of a useful synthetic minimal cell as follows:

Pick a function and describe it.

What would your synthetic cell do? What is the input and what is the output?

Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

Could this function be realized by genetically modified natural cell?

Describe the desired outcome of your synthetic cell operation.

Design all components that would need to be part of your synthetic cell.

What would be the membrane made of?

What would you encapsulate inside? Enzymes, small molecules.

Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)

How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

Experimental details

List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)

How will you measure the function of your system?

Pick a function and describe it. What would your synthetic cell do? What is the input and what is the output?

My synthetic minimal cell functions as a lactate biosensor for medical diagnostics. The input is lactate, a metabolite that rises during sepsis, hemorrhage, or intense exercise. The output is a green fluorescent protein signal that increases with lactate concentration. The synthetic cell detects external lactate, processes this signal through a genetic circuit, and produces a clearly detectable GFP signal only when lactate exceeds a pathological threshold.

Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

No, without encapsulation the entire transcription-translation system and the reporter GFP would diffuse away, and there would be no compartment to concentrate the signal or to maintain a gradient between input and output. More importantly, without encapsulation the genetic circuit cannot be isolated from environmental contamination or from degrading enzymes. The sensing specificity relies on the encapsulated system’s components being protected and confined.

Could this function be realized by genetically modified natural cell?

Yes, a genetically modified E. coli or Lactococcus strain could express a lactate-responsive promoter driving GFP. However, a natural cell would require growth conditions, would be slower to respond, and could not be easily freeze-dried or stored on a test strip. More critically, a living cell could replicate and potentially contaminate the diagnostic device, whereas a synthetic minimal cell is non-living and biosafe.

Describe the desired outcome of your synthetic cell operation

The desired outcome is a rapid, low-cost, point-of-care diagnostic where a drop of blood or sweat is added to a tube containing synthetic cells, and after one hour at room temperature, green fluorescence indicates pathological lactate levels above 2 mM, while no fluorescence indicates normal levels below 2 mM.

Design all components that would need to be part of your synthetic cell

The synthetic cell consists of a lipid membrane encapsulating a bacterial cell-free transcription-translation system, a plasmid DNA template encoding the lactate-responsive genetic circuit, a small molecule fluorogenic substrate if needed, and buffer components including magnesium, potassium, and an energy regeneration system.

What would be the membrane made of?

The membrane is made of a 7:3 molar ratio of DOPC (1,2-dioleoyl-sn-glycero-3-phosphocholine) and DOPG (1,2-dioleoyl-sn-glycero-3-phospho-(1’-rac-glycerol)), with 5% cholesterol to reduce leakage of encapsulated macromolecules. The anionic DOPG gives the bilayer a charge profile similar to bacterial membranes, the unsaturated oleoyl chains keep it fluid at room temperature, and the cholesterol adds mechanical stability.

What would you encapsulate inside? Enzymes, small molecules

Inside I encapsulate the E. coli S30 extract containing all ribosomes, tRNAs, and translation factors; a DNA plasmid encoding the lactate sensor circuit; an ATP regeneration system consisting of creatine phosphate and creatine kinase; all 20 amino acids; NTPs; magnesium glutamate; potassium glutamate; and a small amount of the fluorescent dye calcein as an encapsulation and leakage control.

Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason?

A bacterial system from E. coli is adequate here because the lactate-responsive regulator LldR and its target promoter, both from E. coli, are well characterized and function in a prokaryotic transcription system. No mammalian system is needed because we are not using eukaryotic post-translational modifications or mammalian-specific promoters. A bacterial system is also cheaper and gives higher yields.

How will your synthetic cell communicate with the environment?

The input molecule, lactate, is small, and the design relies on it crossing the lipid bilayer without a dedicated channel. At physiological pH, however, lactate is mostly present as a charged anion, and only the protonated lactic acid form crosses a bilayer readily, so passive uptake may be slow; if it proves limiting, a pore-forming protein such as α-hemolysin could be added to the circuit. The output molecule, GFP, is too large to diffuse out, so the signal remains inside the synthetic cell. This is beneficial because it concentrates the fluorescence and prevents signal dilution. Communication is one-way: lactate enters, GFP accumulates inside.

Experimental details

List all lipids and genes

Lipids: DOPC (1,2-dioleoyl-sn-glycero-3-phosphocholine) and DOPG (1,2-dioleoyl-sn-glycero-3-phospho-(1’-rac-glycerol)) in a 7:3 ratio, plus 5% cholesterol.
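
The lipid recipe can be turned into mole fractions with a few lines of arithmetic. The sketch assumes that "5% cholesterol" means 5 mol% of the total lipid, with the remaining 95 mol% split 7:3 between DOPC and DOPG; the 1000 nmol total is a hypothetical batch size.

```python
def lipid_mole_fractions(dopc_to_dopg=(7, 3), cholesterol_mol_pct=5.0):
    """Mole fractions, assuming cholesterol is a fixed mol% of total lipid."""
    chol = cholesterol_mol_pct / 100.0
    phospholipid = 1.0 - chol
    dopc_part, dopg_part = dopc_to_dopg
    total_parts = dopc_part + dopg_part
    return {
        "DOPC": phospholipid * dopc_part / total_parts,
        "DOPG": phospholipid * dopg_part / total_parts,
        "cholesterol": chol,
    }


def nmol_per_lipid(total_nmol, fractions):
    """Amount of each lipid (nmol) for a given total lipid amount."""
    return {name: total_nmol * frac for name, frac in fractions.items()}


fractions = lipid_mole_fractions()
amounts = nmol_per_lipid(1000.0, fractions)   # hypothetical 1 µmol lipid film
for name in fractions:
    print(f"{name:12s} {fractions[name] * 100:5.1f} mol%  {amounts[name]:6.1f} nmol")
```

Under that assumption the mixture works out to 66.5 mol% DOPC, 28.5 mol% DOPG, and 5 mol% cholesterol.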

Genes: The genetic circuit uses the lldP promoter from E. coli, which is repressed by the LldR protein in the absence of lactate. When lactate binds to LldR, the repressor dissociates and allows transcription. Downstream of the promoter is the superfolder GFP gene (sfGFP) with a strong ribosome binding site and a T7 terminator. The LldR repressor is constitutively expressed from a second promoter on the same plasmid. Alternatively, for a simpler system, the lldPRD operon regulatory region can be used directly.

How will you measure the function of your system?

I will measure function by preparing the synthetic cells as giant unilamellar vesicles (or, as a simpler test format, water-in-oil droplets), then adding lactate at concentrations ranging from 0 mM to 10 mM. After one hour of incubation at 30°C, I will disrupt the vesicles and measure bulk GFP fluorescence using a plate reader. For single-vesicle analysis, I will use fluorescence microscopy to count the percentage of vesicles that become GFP-positive. A negative control without lactate and a positive control with IPTG-inducible GFP will confirm circuit functionality.
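
The single-vesicle readout could be scored along the following lines. This is a sketch of one common convention (a cutoff at the mean of the no-lactate control plus three standard deviations), and the intensity values are made-up numbers for illustration, not experimental data.

```python
from statistics import mean, stdev


def positive_threshold(negative_control, n_sd=3.0):
    """Intensity cutoff: mean of the no-lactate control plus n_sd standard deviations."""
    return mean(negative_control) + n_sd * stdev(negative_control)


def fraction_positive(intensities, cutoff):
    """Share of vesicles whose fluorescence exceeds the cutoff."""
    return sum(1 for value in intensities if value > cutoff) / len(intensities)


# Made-up per-vesicle intensities (arbitrary units), for illustration only.
no_lactate = [98, 102, 101, 97, 103, 99, 100, 100]
lactate_5mM = [99, 180, 240, 101, 310, 205, 150, 96]

cutoff = positive_threshold(no_lactate)
print(f"cutoff = {cutoff:.1f} a.u.")
print(f"GFP-positive at 0 mM: {fraction_positive(no_lactate, cutoff):.0%}")
print(f"GFP-positive at 5 mM: {fraction_positive(lactate_5mM, cutoff):.0%}")
```

Defining the cutoff from the negative control keeps the "GFP-positive" call tied to the background of each experiment rather than to a fixed number.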

This synthetic cell acts as a non-living, disposable lactate sensor that could be integrated into a bandage or a paper-based test strip with fewer biosafety concerns than a living sensor. Unlike the theophylline example, this system is designed to work without a membrane channel, on the assumption that enough lactate crosses the membrane passively, and it does not need a secondary bacterial reporter because GFP is directly produced inside the synthetic cell.

Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:

Write a one-sentence summary pitch sentence describing your concept.

How will the idea work, in more detail? Write 3-4 sentences or more.

What societal challenge or market need will this address?

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

One-sentence summary pitch sentence describing your concept

We propose self-healing architectural coatings infused with freeze-dried cell-free systems that produce concrete-repairing proteins when activated by water ingress through cracks.

How will the idea work, in more detail?

The coating consists of a porous, latex-based paint embedded with freeze-dried BioBits particles containing a cell-free system programmed to produce the hydrophobic protein Mms6 from magnetotactic bacteria, which nucleates calcium carbonate precipitation. When a crack forms in the building facade, rainwater enters the crack and rehydrates the freeze-dried particles, activating transcription and translation of Mms6. The produced Mms6 then catalyzes the formation of calcite crystals that fill the crack over 24 to 48 hours, sealing it against further water entry. The coating also includes a second cell-free particle that produces a green fluorescent protein as a visual indicator, so building inspectors can shine a UV light on the facade and see which cracks have already been repaired.

What societal challenge or market need will this address?

Building maintenance is expensive and labor-intensive, with concrete cracks leading to water damage, mold, steel reinforcement corrosion, and eventual structural failure. Current repair methods require manual inspection and patching, which is impractical for skyscrapers, bridges, or remote infrastructure. This self-healing coating addresses the need for autonomous, low-maintenance building materials that extend structure lifetimes while reducing repair costs and the carbon footprint of replacement concrete. It is particularly valuable in developing regions where routine structural inspections are not feasible.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

Water activation is actually an advantage here because water ingress through a crack is exactly the trigger we want. Stability is addressed by using trehalose as a lyoprotectant during freeze-drying, which keeps the cell-free particles stable at room temperature for over one year as demonstrated by the BioBits platform. The one-time use limitation is addressed by distributing millions of independent freeze-dried particles throughout the coating thickness; when a crack forms, only the particles along that crack path are activated, while deeper, unactivated particles remain dormant for future cracks. For large cracks that consume all available particles in that region, the coating can be reapplied as a maintenance spray every five years. Additionally, we incorporate a second layer of particles with a different promoter that activates only at higher water flow rates, creating a tiered response for small versus large cracks.

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out genesinspace.

Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

Background information (maximum 100 words)

During long-duration space missions, astronauts suffer from immune dysregulation, making them vulnerable to reactivation of latent viruses like Epstein-Barr virus (EBV) and herpes simplex. Current detection methods require sample return to Earth or bulky PCR equipment with cold-chain reagents. A rapid, low-resource method to detect viral reactivation from saliva or blood would enable early intervention. This is significant for crew health on Mars missions where resupply is impossible. Scientifically, it tests whether freeze-dried cell-free sensors can function in microgravity and high-radiation environments, a prerequisite for distributed space diagnostics.

Molecular or genetic target (maximum 30 words)

Viral DNA sequences: EBV BALF5 gene and HSV-1 UL30 gene. Also, human housekeeping gene GAPDH as a sample quality control.

How the molecular target relates to the space biology challenge (maximum 100 words)

During viral reactivation, viral DNA copies appear in saliva before symptoms manifest. The BALF5 and UL30 genes are highly conserved, early-expressed viral polymerase genes, making them sensitive detection targets. Sequence-specific toehold switches in the BioBits system let the viral sequence trigger cell-free synthesis of a fluorescent reporter. The GAPDH target confirms that human sample material is present and intact, ruling out false negatives from poor sample collection. This approach directly measures the molecular event of reactivation rather than downstream antibodies or symptoms.

Hypothesis or research goal with reasoning (maximum 150 words)

Hypothesis: Freeze-dried BioBits reactions containing RNA toehold switches specific to EBV BALF5 and HSV-1 UL30 can detect as few as 100 copies of viral DNA per microliter in astronaut saliva samples within 60 minutes, with no false positives from human genomic DNA or common oral microbes.

Goal: To validate this cell-free viral detection system under space-relevant conditions using a thermal cycler for isothermal amplification and a fluorescence viewer for readout.

Reasoning: Traditional PCR in space requires complex sample preparation and cold storage. Toehold switch sensors in freeze-dried cell-free systems eliminate the cold chain and work at body temperature. Amplifying viral DNA by recombinase polymerase amplification (RPA) on the miniPCR before adding it to the BioBits sensors could push sensitivity toward single-copy levels. This two-step system converts genetic information into a visual fluorescence signal without living cells, making it safe and storable for years. If successful, astronauts could self-test weekly for viral reactivation using a finger-prick of blood or a saliva swab.

Experimental plan (maximum 100 words)

Samples: Saliva from healthy donors spiked with synthetic EBV and HSV-1 DNA fragments at 0, 10, 100, and 1000 copies per microliter. Controls: no-DNA blank, human genomic DNA only, and bacterial DNA (S. salivarius). All samples will undergo RPA at 39°C for 20 minutes on the miniPCR, then 5 microliters of amplified product will be added to freeze-dried BioBits toehold switch reactions. Fluorescence will be measured at 60 minutes using the P51 Molecular Fluorescence Viewer with blue light excitation and a green emission filter. Each condition will be run in triplicate.

Homework Part B: Individual Final Project

There are still some things I need to finish fixing, but there will be an update soon :)

Week 10 — Advanced Imaging & Measurement Technology

Homework: Final Project

For your final project:

Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

1. Aspects of the project to be measured

This project requires confirmation that the engineered E. coli system successfully produces both functional melanin and the PprI protein, and that these biomolecules provide protection against gamma radiation. Therefore, six key parameters will be measured.

Melanin concentration. The first question is: how much melanin is actually being produced? Quantifying melanin concentration is essential for comparing different culture conditions, identifying the optimal growth parameters, and determining whether the production yield is sufficient for practical applications. Without this measurement, there is no way to know if changes to the protocol actually improve production or if the system is failing.

Melanin identity. The second question is: is the dark pigment we see actually melanin, or is it some other compound that happens to be dark? Bacteria can produce a variety of pigments, including carotenoids (orange), pyocyanin (blue-green), and various phenazines. Even a dark color could come from oxidized media components or cell debris. Confirming identity ensures that what we are measuring and purifying is truly melanin and not a contaminant or byproduct.

PprI molecular weight. The third question is: does the PprI protein we purified have the correct size? Mass is a fundamental property of any protein. If the measured molecular weight does not match the theoretical value calculated from the gene sequence, it could indicate that the gene was not expressed correctly, that the protein was degraded, or that some unwanted modification occurred. This measurement is the first line of evidence that the protein is the one we intended to produce.

PprI amino acid sequence. The fourth question is: is the amino acid sequence of the protein correct? Molecular weight alone is not enough. Two different proteins can have the same mass. Only by confirming the actual sequence of amino acids can we be certain that the protein is exactly what we designed. This measurement also reveals whether any mutations occurred during cloning or expression that might affect function.

PprI purity. The fifth question is: after purification, is the protein free of contaminants? A protein sample that contains other bacterial proteins cannot be used reliably for functional assays. Contaminants could interfere with radiation testing or produce false positive results. Measuring purity tells us whether our purification method worked and whether the sample is ready for downstream experiments.

Radioprotective activity. The sixth and most important question is: does the material we produced actually protect against gamma radiation? All the previous measurements confirm that we made melanin and PprI correctly. But the ultimate test is functional. A melanin sample could be pure and correctly identified, yet still fail to block radiation if its structure is damaged or if it was not properly processed. Measuring radioprotective activity directly answers the core question of the entire project.

2. Description of measurement methods

Melanin concentration. Melanin absorbs light strongly at 405 nm. Culture samples will be collected at multiple time points during E. coli growth. Cells will be removed by centrifugation, and the supernatant will be transferred to a 96-well plate. A plate reader will measure absorbance at 405 nm. To convert absorbance values to concentration in mg/mL, a standard curve will be generated using commercially available melanin from Sepia officinalis.
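The standard-curve conversion described above can be sketched as a simple linear fit. This is a minimal illustration that assumes absorbance is linear in concentration over the range used. The function names and all numbers are hypothetical placeholders, not measured data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def absorbance_to_conc(a405, slope, intercept):
    """Invert the standard curve: concentration = (A405 - intercept) / slope."""
    return (a405 - intercept) / slope


# Hypothetical Sepia melanin standards (mg/mL) and their A405 readings.
std_conc = [0.0, 0.05, 0.1, 0.2, 0.4]
std_a405 = [0.05, 0.15, 0.25, 0.45, 0.85]

slope, intercept = fit_line(std_conc, std_a405)
sample_conc = absorbance_to_conc(0.65, slope, intercept)  # hypothetical culture supernatant
print(f"slope={slope:.3f}, intercept={intercept:.3f}, sample={sample_conc:.3f} mg/mL")
```

Samples that read above the highest standard would need dilution before this conversion is applied, since the fit is only supported inside the standard range.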

Melanin identity. Melanin has a characteristic absorption spectrum that decreases monotonically from 200 nm to 800 nm with no sharp peaks, unlike other pigments such as carotenoids or flavins. A full UV-Vis spectrum will be recorded for each purified melanin sample. Additionally, Fourier-transform infrared spectroscopy will be performed on dried melanin samples. Melanin shows specific peaks corresponding to aromatic rings, carboxyl groups, and N-H stretching. Matching these spectral features confirms the pigment is melanin.

PprI molecular weight. The PprI protein will be purified from E. coli lysate using Ni-NTA affinity chromatography, which binds to the His-tag engineered into the protein. The purified protein will be desalted to remove salts that interfere with mass spectrometry and then analyzed by liquid chromatography-mass spectrometry on a Waters Xevo G3 QToF instrument under denaturing conditions. The denaturing solvent causes the protein to unfold, which exposes more protonation sites and produces a distribution of charge states. The resulting mass spectrum will show a series of peaks, one per charge state. The experimental molecular weight will be calculated from these peaks and compared to the theoretical molecular weight derived from the PprI amino acid sequence using the Expasy Compute pI/Mw tool. A match within acceptable mass error confirms the protein is the correct size.

PprI amino acid sequence. To confirm the primary structure, the purified PprI protein will be digested with trypsin. Trypsin is a protease that cleaves peptide bonds specifically after lysine and arginine residues, producing a predictable set of peptides. The resulting peptide mixture will be analyzed by liquid chromatography-tandem mass spectrometry on a Waters BioAccord system. The instrument separates peptides by their hydrophobicity, ionizes them, and fragments them in the gas phase. The fragmentation pattern of each peptide will be compared to predicted patterns generated by the Expasy PeptideMass tool and the FragIon tool from the systemsbiology.net Proteomics Toolkit. Matching the experimental fragments to the predicted fragments for each peptide confirms the amino acid sequence of PprI.
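The trypsin cleavage rule can be sketched in a few lines. This is a simplified illustration, not the algorithm used by PeptideMass: it cuts after every K or R, and it optionally skips sites followed by proline, which is a common convention assumed here. The function name is hypothetical, and the example input is the first 27 residues of the eGFP sequence from Waters Part I, since the PprI sequence is not listed on this page.

```python
def tryptic_digest(sequence, skip_before_proline=True):
    """Return peptides produced by cleaving after each K or R.

    Whitespace in the input is ignored. With skip_before_proline=True,
    a K or R that is directly followed by P is not cleaved.
    """
    seq = "".join(sequence.split()).upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        blocked = skip_before_proline and next_residue == "P"
        if residue in "KR" and not blocked:
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R at its end
    return peptides


fragment = "MVSKGEELFTG VVPILVELDG DVNGHK"
for peptide in tryptic_digest(fragment):
    print(len(peptide), peptide)
```

Real digests also contain missed cleavages, so the list from a sketch like this is a lower bound on the peptides that may appear in the chromatogram.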

PprI purity. Sodium dodecyl sulfate polyacrylamide gel electrophoresis will be performed on the purified protein sample. The sample is mixed with a detergent that denatures proteins and gives them a uniform charge-to-mass ratio. When an electric current is applied, proteins migrate through the gel according to their molecular weight. After staining, a single band at the expected molecular weight indicates high purity. Multiple bands indicate contamination by other proteins from E. coli.

Radioprotective activity. Cellulose samples will be coated with purified melanin alone, purified PprI alone, and a combination of both. Uncoated cellulose will serve as a negative control. Each sample will be placed in front of a dosimeter and exposed to a Cobalt-60 gamma radiation source, which emits high-energy gamma rays similar to those encountered in space. The dosimeter will measure the amount of radiation that passes through each sample. The experiment will be repeated with multiple replicates. Coated samples that allow significantly less radiation to pass compared to uncoated cellulose demonstrate radioprotective activity. The combination of melanin and PprI is expected to show the greatest attenuation, as melanin provides physical shielding while PprI represents a potential secondary repair mechanism.
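One way to reduce the dosimeter readings to a single number per coating is percent attenuation relative to the uncoated control. The sketch below assumes three replicate readings per condition. The helper name `percent_attenuation` is hypothetical and every dose value is a placeholder, not measured data.

```python
from statistics import mean


def percent_attenuation(coated_doses, uncoated_doses):
    """Percent reduction in mean transmitted dose versus the uncoated control."""
    return 100.0 * (1.0 - mean(coated_doses) / mean(uncoated_doses))


# Placeholder transmitted-dose readings in Gy, three replicates per condition.
readings = {
    "uncoated": [10.0, 10.0, 10.0],
    "melanin": [9.0, 9.2, 8.8],
    "PprI": [10.0, 9.9, 10.1],
    "melanin + PprI": [8.9, 9.1, 9.0],
}

control = readings["uncoated"]
for condition, doses in readings.items():
    if condition != "uncoated":
        print(f"{condition}: {percent_attenuation(doses, control):.1f}% attenuation")
```

A statistical test across replicates would still be needed before calling any difference significant. This only shows the arithmetic behind the comparison.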

3. Technologies to be used

The success of this project depends on a combination of analytical, molecular, and radiation detection technologies. Each instrument and tool serves a specific role in characterizing the produced biomolecules and validating their function.

Plate reader. A microplate reader equipped with absorbance detection will be used to quantify melanin production. The instrument measures the amount of light absorbed by a sample at a specific wavelength, in this case 405 nm, where melanin has strong absorbance. Culture supernatants in 96-well plates will be read in parallel, allowing high-throughput monitoring of melanin accumulation over time. A standard curve generated from commercial melanin will convert absorbance values to absolute concentration in milligrams per milliliter.

UV-Vis spectrophotometer. A dual-beam ultraviolet-visible spectrophotometer will be used to record full absorption spectra of purified melanin samples from 200 to 800 nanometers. Unlike instruments that measure only a single wavelength, this device scans across the entire spectrum, generating a characteristic curve that identifies melanin by its featureless, descending absorbance pattern. This distinguishes melanin from other pigments such as carotenoids, flavins, or phenazines, which have distinct peaks.

Fourier-transform infrared spectrometer. An FTIR spectrometer will be used to identify the chemical functional groups present in the purified pigment. The instrument directs infrared light through a dried melanin sample and measures which wavelengths are absorbed. Different chemical bonds absorb at characteristic frequencies. Melanin produces specific signals for aromatic rings, carboxyl groups, and amine groups. Matching these signals to reference spectra confirms the pigment is melanin and reveals any structural modifications.

Liquid chromatography-mass spectrometer for intact protein analysis. A Waters Xevo G3 quadrupole time-of-flight mass spectrometer coupled with liquid chromatography will be used to measure the molecular weight of the intact PprI protein. The liquid chromatography component separates the protein from buffer components and salts that could suppress ionization. The mass spectrometer then ionizes the protein, measures its mass-to-charge ratio, and produces a spectrum of multiply charged peaks. The instrument has a resolution of 30,000, sufficient to resolve individual isotopic peaks. The measured molecular weight will be compared to the theoretical value calculated from the PprI amino acid sequence.

Liquid chromatography-tandem mass spectrometer for peptide mapping. A Waters BioAccord LC-MS/MS system will be used to confirm the amino acid sequence of PprI. This instrument combines peptide separation by liquid chromatography with two stages of mass spectrometry. In the first stage, it measures the mass of intact peptides. In the second stage, it selects individual peptides and fragments them by colliding them with gas molecules, then measures the masses of the resulting fragments. This fragmentation pattern provides a fingerprint that identifies the peptide sequence. The system is specifically designed for biopharmaceutical characterization and can handle complex peptide mixtures with high sensitivity.

SDS-PAGE electrophoresis system. A standard polyacrylamide gel electrophoresis setup will be used to assess the purity of the purified PprI protein. The system includes a power supply, gel casting apparatus, and vertical electrophoresis tank. Protein samples are mixed with a detergent that denatures them and a reducing agent that breaks disulfide bonds, then loaded into wells in a polyacrylamide gel. An electric current pulls the proteins through the gel, with smaller proteins migrating faster than larger ones. After electrophoresis, the gel is stained with Coomassie Blue, which binds to proteins and reveals their positions as blue bands. A single band at the expected molecular weight confirms purity.

Gamma radiation source and dosimeter. A Cobalt-60 gamma irradiator will be used as the radiation source for functional testing. Cobalt-60 emits high-energy gamma rays at 1.17 and 1.33 megaelectronvolts, similar to the ionizing radiation encountered in space and nuclear environments. A calibrated dosimeter, either a thermoluminescent dosimeter or a semiconductor detector, will be placed behind each coated cellulose sample to measure transmitted radiation. The dosimeter records absorbed dose in grays, allowing quantitative comparison between coated and uncoated samples.

Bioinformatics tools. Three web-based tools will be used. Two come from the Expasy bioinformatics resource portal: Compute pI/Mw calculates the theoretical isoelectric point and molecular weight of PprI from its amino acid sequence, and PeptideMass predicts the list of peptides generated by trypsin digestion, including their masses and chemical modifications. The third, FragIon from the systemsbiology.net Proteomics Toolkit, simulates the tandem mass spectrometry fragmentation pattern of a given peptide sequence, allowing direct comparison to experimental data. The Expasy tools are maintained by the Swiss Institute of Bioinformatics and are standards in protein chemistry.

Homework: Waters Part I — Molecular Weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

  1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

eGFP Sequence: MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Note: This contains a His-purification tag (HHHHHH) and a linker (the LE before it).

  2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

    Determine z for each adjacent pair of peaks (n, n+1) using: [equation image]

    Determine the MW of the protein using the relationship between (m/z)ₙ, MW, and z

    Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1 using: [equation image]

    Figure 1. Mass spectrum of intact eGFP protein from the Waters Xevo G3 LC-MS (a mass spectrometer with 30,000 resolution) with individual charge state peaks labeled with m/z values.

  3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

1. Calculated molecular weight of eGFP

The amino acid sequence of eGFP provided in the homework was submitted to the Expasy Compute pI/Mw tool. The sequence includes the C-terminal His-tag (HHHHHH) and the LE linker preceding it.

Result:

Parameter | Value
Theoretical pI | 5.90
Theoretical molecular weight | 28,006.60 Da

2. Calculate the molecular weight of eGFP using the adjacent charge state approach

From Figure 1, two adjacent charge state peaks were selected from the denatured eGFP mass spectrum.

Selected peaks:

Peak | m/z (from Figure 1)
Peak 1 (higher m/z) | 1,556.0
Peak 2 (lower m/z) | 1,475.0

Step 1: Determine z for the adjacent pair

Formula:

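z₂ = [(m/z)₁ - 1.0078] / [(m/z)₁ - (m/z)₂]

Here z₂ is the charge of Peak 2, the lower-m/z peak, which carries one more proton than Peak 1, and 1.0078 Da is the mass of the added proton.

z₂ = (1,556.0 - 1.0078) / (1,556.0 - 1,475.0)

z₂ = 1,554.99 / 81.0 = 19.2 ≈ 19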

Step 2: Determine the MW of the protein

Formula:

MW = z × (m/z) - (z × 1.0078)

Using Peak 2 (m/z = 1,475.0, z = 19):

MW = 19 × 1,475.0 - (19 × 1.0078)

MW = 28,025.0 - 19.15 = 28,005.85 Da


Step 3: Calculate the accuracy of the measurement

Formula:

Accuracy = |MW_experimental - MW_theoretical| / MW_theoretical

Accuracy = |28,005.85 - 28,006.60| / 28,006.60

Accuracy = 0.75 / 28,006.60 = 0.0000268

Accuracy = 2.68 × 10⁻⁵

In parts per million (ppm):

ppm error = 0.0000268 × 1,000,000 = 26.8 ppm
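The three steps above can be reproduced with a short script. This is a sketch that uses the peak values and theoretical mass quoted in this answer. The function names are illustrative, and the charge estimate subtracts the proton mass in the numerator, which rounds to the same integer as the simpler ratio.

```python
PROTON_MASS = 1.0078  # Da, the value used in the MW formula above


def charge_of_lower_peak(mz_high, mz_low):
    """Charge of the lower-m/z peak of an adjacent pair (one more proton)."""
    return round((mz_high - PROTON_MASS) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """MW = z * (m/z) - z * proton mass."""
    return z * mz - z * PROTON_MASS


def ppm_error(measured, theoretical):
    """Absolute relative error expressed in parts per million."""
    return abs(measured - theoretical) / theoretical * 1e6


mz_peak1, mz_peak2 = 1556.0, 1475.0  # read from Figure 1
mw_theoretical = 28006.60            # Da, from Expasy Compute pI/Mw

z2 = charge_of_lower_peak(mz_peak1, mz_peak2)
mw_experimental = neutral_mass(mz_peak2, z2)
error_ppm = ppm_error(mw_experimental, mw_theoretical)
print(z2, round(mw_experimental, 2), round(error_ppm, 1))
```

Because the script carries the unrounded mass through to the last step, its ppm value can differ in the last digit from the hand calculation, which rounds the mass difference to 0.75 Da first.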


3. Can you observe the charge state for the zoomed-in peak?

Answer: Yes, the charge state can be observed.

Explanation:

The Waters Xevo G3 has a resolution of 30,000, which resolves individual isotopic peaks. In the zoomed-in view, the spacing between isotopic peaks (Δm) is visible. The charge state is calculated as z = 1 / Δm. For eGFP, isotopic spacing of approximately 0.0556 Da gives z ≈ 18, matching the expected charge state for a 28 kDa protein under denaturing conditions.
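The isotope-spacing rule can be checked numerically. The sketch below assumes isotopologues differ by about 1 Da and that resolving power is defined as (m/z) divided by the peak width at half maximum. The m/z of 1,556.0 is Peak 1 from the table above and is used only as an illustrative position, since the exact position of the zoomed-in peak is not stated here.

```python
def charge_from_isotope_spacing(delta_mz):
    """Isotope peaks sit about 1/z apart on the m/z axis, so z = 1 / spacing."""
    return round(1.0 / delta_mz)


def peak_width_fwhm(mz, resolving_power):
    """Approximate peak width implied by resolving power R = (m/z) / FWHM."""
    return mz / resolving_power


spacing = 0.0556                       # isotopic spacing quoted above
z = charge_from_isotope_spacing(spacing)
width = peak_width_fwhm(1556.0, 30000)  # illustrative m/z, resolution from the text

print(f"z = {z}, isotope spacing = {spacing:.4f}, peak width = {width:.4f}")
```

Comparing the spacing with the peak width indicates how comfortably the isotope peaks can be separated: the closer the two numbers, the harder it is to read the spacing from the spectrum.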

Homework: Waters Part II — Secondary/Tertiary structure

We will analyze eGFP in its native, folded state and compare it to its denatured, unfolded state on a quadrupole time-of-flight MS. We will be doing MS-only analysis (no liquid chromatography, also known as “direct infusion” experiments) on the Waters Xevo G3-QToF MS.

  1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

Figure 2. Comparison of the mass spectra between denatured (top) and native (bottom) eGFP standard on the Waters Xevo G3 QTof MS.

  2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 m/z? What is the charge state? How can you tell?

Figure 3. Native eGFP mass spectrum from the Waters Xevo G3 Q-Tof MS. The inset is a zoomed-in view of the charge state at ~2800 m/z on a mass spectrometer with 30,000 resolution.

1. Difference between native and denatured protein conformations

What happens when a protein unfolds?

When a protein denatures, it loses its native three-dimensional structure. The ordered secondary structures (alpha helices and beta sheets) and tertiary structure (overall folding) are disrupted. Hydrophobic residues that are normally buried in the core of the folded protein become exposed to the surrounding solvent. The polypeptide chain adopts a random coil conformation, becoming extended and flexible rather than compact and rigid.

How is this determined with a mass spectrometer?

Mass spectrometry detects differences between native and denatured proteins primarily through changes in the charge state distribution observed in the spectrum. Under native conditions (neutral pH, non-denaturing solvents), a protein maintains its compact folded structure. Only surface-accessible residues can be protonated, resulting in a narrow charge state distribution with relatively low charge states (typically z = 5-10 for a 28 kDa protein).

Under denaturing conditions (acidic pH, organic solvents like acetonitrile, or elevated temperatures), the protein unfolds. The extended polypeptide chain exposes more basic residues (lysine, arginine, histidine) to the solvent, allowing more protons to attach. This produces a broad charge state distribution with significantly higher charge states (typically z = 15-30 for a 28 kDa protein).

What changes are seen in the mass spectrum between native and denatured protein analyses (Figure 2)?

Feature | Denatured (top spectrum) | Native (bottom spectrum)
Charge state distribution | Wide (many peaks) | Narrow (few peaks)
Charge states observed | High (z ≈ 15-30) | Low (z ≈ 5-10)
m/z range | Lower (1,000-2,000 m/z) | Higher (2,000-5,000 m/z)
Peak resolution | Lower (broader peaks) | Higher (sharper peaks)
Isotopic resolution | Visible (unfolded, uniform charge) | Not visible (folded, heterogeneous)

The denatured spectrum (top) shows a series of many peaks across a wide m/z range because the unfolded protein can accept many different numbers of protons. The native spectrum (bottom) shows fewer peaks at higher m/z values because the compact folded structure limits proton access to only the most accessible residues.
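The shift to higher m/z in the native spectrum follows directly from the lower charge. As a rough sketch, the expected peak positions for the charge ranges quoted above can be computed from the theoretical eGFP mass. The function name is illustrative, and the charge ranges are the typical values stated in this answer, not values read from Figure 2.

```python
PROTON_MASS = 1.0078  # Da


def mz_for_charge(mw, z):
    """m/z of a protein of neutral mass mw carrying z protons."""
    return (mw + z * PROTON_MASS) / z


MW_EGFP = 28006.60  # Da, theoretical value from Expasy reported above

# Typical charge ranges for a 28 kDa protein, as stated in the text.
native = {z: round(mz_for_charge(MW_EGFP, z), 1) for z in range(5, 11)}
denatured = {z: round(mz_for_charge(MW_EGFP, z), 1) for z in range(15, 31)}

print("native m/z range:", min(native.values()), "to", max(native.values()))
print("denatured m/z range:", min(denatured.values()), "to", max(denatured.values()))
```

Each added proton lowers m/z, so the many highly charged ions of the unfolded protein crowd into the low m/z region while the few native charge states sit further out.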


2. Determining the charge state of the peak at ~2800 m/z (Figure 3)

Can you discern the charge state of the peak at ~2800 m/z?

Answer: Yes, the charge state can be determined.

How can you tell?

The charge state is determined by measuring the spacing between isotopic peaks in the zoomed-in spectrum. In Figure 3, the inset shows a magnified view of the peak at approximately 2800 m/z. The spacing between adjacent isotopic peaks is clearly visible due to the 30,000 resolution of the Waters Xevo G3 mass spectrometer.

What is the charge state?

From the inset in Figure 3, the measured spacing between isotopic peaks is approximately 0.33 Da. Using the formula:

z = 1 / Δm_isotopic

z = 1 / 0.33 ≈ 3

Therefore, the charge state of the peak at ~2800 m/z is z = 3.

Verification:

For a protein in its native (folded) state, lower charge states are expected because fewer basic residues are accessible on the surface of the compact structure. However, z = 3 is not consistent with intact eGFP at this m/z: a peak at 2800 m/z with z = 3 corresponds to a neutral mass of 3 × (2800 - 1.0078) ≈ 8,397 Da, far below the theoretical 28,006.60 Da. Conversely, intact eGFP appearing at ~2800 m/z would carry z ≈ 28,006.60 / 2800 ≈ 10, which falls within the z = 5-10 range expected for a native 28 kDa protein and would give an isotopic spacing of about 0.1 rather than 0.33. Either the isotopic spacing was misread from the inset or the peak is not intact eGFP, so the z = 3 assignment should be rechecked against Figure 3.

Homework: Waters Part III — Peptide Mapping - primary structure

We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide – generating a “peptide map”. This process is used to confirm the primary structure of the protein.

There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.

  1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

  2. How many peptides will be generated from tryptic digestion of eGFP? Navigate to https://web.expasy.org/peptide_mass/

Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.

Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.

Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.

Figure 4. Example conditions for predicting the number of tryptic peptides from the eGFP standard. Please replicate all parameters shown above.

  3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

Figure 5a. Total ion chromatogram (TIC) of the eGFP peptide map. The peak at 2.78 minutes is circled, and its MS data is shown in the mass spectrum in Figure 5b, below.

  4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

  5. Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ([M+H]+) based on its m/z and z.

Figure 5b. Mass spectrum figure to show m/z for the chromatographic peak at 2.78 min from Figure 5a above. The inset is a zoom-in of the peak at m/z 525.76, to discern the isotope peaks.

Figure 5c. Fragmentation spectrum of the peptide eluting at retention time 2.78 minutes in Figure 5a (above).

  1. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm. (Recall that Accuracy formula)

  2. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

cover image cover image Figure 6. Amino Acid Coverage Map of eGFP based on BioAccord LC-MS peptide identification data.

Bonus Peptide Map Questions

  1. Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this online tool to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html.) What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?

  2. Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.

1. How many Lysines (K) and Arginines (R) are in eGFP?

From the eGFP sequence in expansy.png:

Residue | Count
Lysine (K) | 18
Arginine (R) | 5
Total (K + R) | 23

These residues should be circled or highlighted in the sequence.


2. How many peptides will be generated from tryptic digestion of eGFP?

Using PeptideMass at https://web.expasy.org/peptide_mass/ with the parameters shown in Figure 4 (F4.png):

  • Enzyme: Trypsin
  • Missed cleavages: 0
  • Cysteine treatment: nothing (reduced form)
  • Mass calculation: monoisotopic
  • Output: peptides with mass > 500 Da

After clicking “Perform the Cleavage”, the tool generates 24 peptides.
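The counting and cleavage rules used in Questions 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequence `TOY_SEQUENCE` is a made-up test string (not eGFP), the function names are hypothetical, and the sketch applies the common "cleave after K or R unless followed by P" rule without the > 500 Da display filter that PeptideMass applies, so its count can differ from the tool's output.

```python
import re

# Hypothetical toy sequence for illustration only (NOT the eGFP sequence).
TOY_SEQUENCE = "AKRPGKMR"


def count_k_r(seq):
    """Return (number of lysines, number of arginines) in a one-letter sequence."""
    return seq.count("K"), seq.count("R")


def trypsin_digest(seq):
    """Split after every K or R that is not followed by P (zero missed cleavages)."""
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in pieces if p]  # drop the empty string after a C-terminal K/R


lysines, arginines = count_k_r(TOY_SEQUENCE)
peptides = trypsin_digest(TOY_SEQUENCE)
print(lysines, arginines)  # 2 2
print(peptides)            # ['AK', 'RPGK', 'MR']
```

Pasting the real eGFP sequence in place of `TOY_SEQUENCE` gives a quick cross-check of the Benchling counts and of the PeptideMass peptide list before the mass filter is applied.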


3. How many chromatographic peaks in Figure 5a between 0.5 and 6 minutes?

From Figure 5a (F5.png), the Total Ion Chromatogram (TIC) shows peaks at the following retention times (minutes):

0.79, 1.20, 1.43, 1.80, 1.85, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 5.06, 5.43, 5.87

Number of peaks between 0.5 and 6 minutes: 19

(Counting all peaks with >10% relative abundance)


4. Does the number of peaks match the number of peptides predicted?

Predicted peptides | Observed peaks
24 | 19

Answer: No, the number of peaks does not match. There are fewer peaks in the chromatogram than predicted peptides.

Possible reasons: co-elution of multiple peptides, very small or hydrophilic peptides eluting before 0.5 minutes, poor ionization of some peptides, or peptides below the mass cutoff.


5. Identify m/z, charge state, and mass of the peptide in Figure 5b

From Figure 5b (F6.png and F7.png):

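Parameter | Value
Most abundant m/z | 525.76
Isotopic spacing (from inset in Figure 5b) | ~0.5 m/z units
Charge state (z) | z = 1 / 0.5 = 2
Mass of singly charged form [M+H]⁺ | (m/z × z) − (z − 1) × 1.00728 = (525.76 × 2) − 1.00728 ≈ 1,050.51 Da

Answer: m/z = 525.76, z = 2, [M+H]⁺ ≈ 1,050.51 Da. Multiplying m/z by z gives 1,051.52 Da, which is the mass of the doubly protonated ion [M+2H]²⁺; subtracting one proton (1.00728 Da) converts it to the singly protonated form.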
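The charge-state and [M+H]⁺ arithmetic for Question 5 can be checked with a short Python sketch. The function names are hypothetical, and the constants (proton mass, ¹³C/¹²C spacing) are standard values rather than numbers taken from the homework; the only inputs from the figure are the m/z of 525.76 and the ~0.5 isotope spacing.

```python
PROTON = 1.007276          # mass of a proton in Da
ISOTOPE_SPACING = 1.00335  # mass difference between 13C and 12C in Da


def charge_from_spacing(spacing_mz):
    """Isotope peaks of a z-charged ion are about 1/z apart on the m/z axis."""
    return round(ISOTOPE_SPACING / spacing_mz)


def neutral_mass(mz, z):
    """Neutral mass M of an [M+zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON)


def singly_protonated_mass(mz, z):
    """[M+H]+ mass: the neutral mass plus exactly one proton."""
    return neutral_mass(mz, z) + PROTON


z = charge_from_spacing(0.5)
mh = singly_protonated_mass(525.76, z)
print(z, round(mh, 2))  # 2 1050.51
```

The point of the sketch is that m/z × z still includes z protons, so one of them has to be subtracted for every charge beyond the first.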


6. Identify the peptide based on PeptideMass tool and calculate mass accuracy

From PeptideMass output (Question 2), the tryptic peptide with [M+H]⁺ closest to the measured value (525.76 × 2 − 1.00728 ≈ 1,050.51 Da) is:

FEGDTLVNR

Mass accuracy calculation:

Theoretical [M+H]⁺ | Measured [M+H]⁺ | Difference
1,050.5214 Da (monoisotopic, summed from residue masses; confirm against your PeptideMass output) | 1,050.5127 Da | 0.0087 Da

ppm error:

ppm = (|Measured - Theoretical| / Theoretical) × 1,000,000 = (0.0087 / 1,050.5214) × 1,000,000 ≈ 8 ppm

(The m/z was read from the figure to only two decimal places, so this ppm value is approximate.)
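As a cross-check on the PeptideMass lookup, the theoretical monoisotopic [M+H]⁺ of a candidate peptide can be summed from standard residue masses and compared with the measured value. The sketch below is a minimal illustration, not the tool's implementation: the function names are hypothetical, no modifications are considered, and the candidate peptide FEGDTLVNR is used as an assumption because its computed doubly charged ion falls near m/z 525.76; it should be confirmed against your own PeptideMass output.

```python
# Standard monoisotopic residue masses in Da
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276


def mh_plus(peptide):
    """Theoretical monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def ppm_error(measured, theoretical):
    """Mass accuracy in parts per million."""
    return abs(measured - theoretical) / theoretical * 1_000_000


theoretical = mh_plus("FEGDTLVNR")   # assumed candidate peptide
measured = 2 * 525.76 - PROTON       # [M+H]+ from m/z 525.76 at z = 2
print(round(theoretical, 4), round(ppm_error(measured, theoretical), 1))
```

Because the measured m/z is only known to two decimals, the resulting error of roughly 8 ppm should be read as an estimate rather than an instrument specification.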


7. What is the percentage of sequence confirmed by peptide mapping?

From Figure 6 (F8.png):

Coverage: 88%

The figure shows Chain 1 with 88% coverage, meaning 88% of the eGFP amino acid sequence was confirmed by identified peptides.


Bonus Peptide Map Questions

8. Determine the peptide sequence for the fragmentation spectrum in Figure 5c

From Figure 5c (F7.png), the fragmentation spectrum shows b-ion and y-ion series.

Using the [M+H]⁺ mass of ~1,050.51 Da from Question 5 (525.76 × 2 − 1.00728), the predicted tryptic peptide from eGFP with that mass is:

FEGDTLVNR

To confirm:

  1. Copy this sequence into FragIon (http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html)
  2. Compare the predicted b-ion and y-ion m/z values to the peaks in Figure 5c
  3. The sequence that matches the fragmentation pattern is the correct peptide

Answer: The peptide sequence that best matches Figure 5c is FEGDTLVNR (confirm against your FragIon output).


9. Does the peptide map data make sense? Does it indicate the protein is eGFP?

Answer: Yes, the peptide map data makes sense and confirms the protein is the eGFP standard.

Reasons:

  1. Coverage: Figure 6 shows 88% coverage of the eGFP sequence, meaning most of the protein was identified
  2. Number of peptides: 19 peaks were observed, close to the predicted 24 peptides (accounting for experimental losses)
  3. Peptide mass matching: The measured peptide mass ([M+H]⁺ ≈ 1,050.51 Da, from m/z 525.76 at z = 2) matches a predicted tryptic peptide from eGFP
  4. Fragmentation confirmation: The fragmentation pattern in Figure 5c matches the predicted pattern for an eGFP peptide
  5. Sequence coverage map: Figure 6 shows peptides identified across the entire protein sequence, from the N-terminus to the C-terminus

Conclusion: The combination of intact mass measurement (Part I) and peptide mapping (Part III) confirms that the analyzed protein is eGFP with the correct primary structure.


Summary Table from the Figures

Question | Answer
Number of K+R | 23
Predicted peptides | 24
Observed peaks (0.5-6 min) | 19
Matches predicted? | No (fewer peaks)
m/z (Figure 5b) | 525.76
Isotopic spacing | ~0.5 m/z units
Charge state (z) | 2
[M+H]⁺ mass | ~1,050.51 Da (525.76 × 2 − 1.00728)
Sequence coverage (Figure 6) | 88%
Protein confirmed? | Yes

Homework: Waters Part IV — Oligomers

We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):

  • 7FU Decamer
  • 8FU Didecamer
  • 8FU 3-Decamer
  • 8FU 4-Decamer

Table 1: KLH Subunit Masses

Figure 7. Mass spectrum of Keyhole Limpet Hemocyanin (KLH) acquired on the CDMS.

Background

Keyhole Limpet Hemocyanin (KLH) is a large, oxygen-transport protein found in the marine mollusk Megathura crenulata. KLH is composed of multiple polypeptide subunits that assemble into higher-order oligomeric structures. In this experiment, charge detection mass spectrometry (CDMS) was used to measure the mass of individual KLH particles, allowing determination of which oligomeric states are present in solution.

Known polypeptide subunit masses:

Polypeptide Subunit Name | Subunit Mass
7FU | 340 kDa
8FU | 400 kDa

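Note: “FU” refers to “functional unit”, a globular domain of roughly 50 kDa that carries one oxygen-binding site. A 7FU subunit is a single polypeptide containing seven such domains, and an 8FU subunit contains eight.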


Identifying oligomeric species on the CDMS spectrum (Figure 7)

Using the known subunit masses, the expected masses for different oligomeric species can be calculated. The oligomeric species are named based on which subunit type (7FU or 8FU) and how many decamers (10-subunit complexes) are assembled.

Calculations:

Oligomeric Species | Subunit Type | Number of Subunits | Calculated Mass
7FU Decamer | 7FU (340 kDa) | 10 | 340 kDa × 10 = 3,400 kDa (3.4 MDa)
8FU Didecamer | 8FU (400 kDa) | 20 (10 × 2) | 400 kDa × 20 = 8,000 kDa (8.0 MDa)
8FU 3-Decamer | 8FU (400 kDa) | 30 (10 × 3) | 400 kDa × 30 = 12,000 kDa (12.0 MDa)
8FU 4-Decamer | 8FU (400 kDa) | 40 (10 × 4) | 400 kDa × 40 = 16,000 kDa (16.0 MDa)
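The same arithmetic can be written as a small Python sketch, which also shows one simple way to assign an observed CDMS peak to the closest expected species. The subunit masses are those from Table 1; the helper names and the example peak position of 8,100 kDa are hypothetical and only serve to exercise the function.

```python
# Subunit masses from Table 1, in kDa
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}
SUBUNITS_PER_DECAMER = 10

# Species name -> (subunit type, number of stacked decamers)
SPECIES = {
    "7FU Decamer": ("7FU", 1),
    "8FU Didecamer": ("8FU", 2),
    "8FU 3-Decamer": ("8FU", 3),
    "8FU 4-Decamer": ("8FU", 4),
}


def oligomer_mass_kda(subunit, n_decamers):
    """Expected mass of an assembly built from n stacked decamers."""
    return SUBUNIT_KDA[subunit] * SUBUNITS_PER_DECAMER * n_decamers


EXPECTED = {name: oligomer_mass_kda(*spec) for name, spec in SPECIES.items()}


def nearest_species(observed_kda):
    """Name of the species whose expected mass is closest to an observed peak."""
    return min(EXPECTED, key=lambda name: abs(EXPECTED[name] - observed_kda))


for name, mass in EXPECTED.items():
    print(f"{name}: {mass} kDa ({mass / 1000:.1f} MDa)")
print(nearest_species(8100))  # hypothetical peak position -> 8FU Didecamer
```

Nearest-mass assignment is a deliberately simple choice; real peaks are broadened by heterogeneity and bound solvent, so measured masses usually sit slightly away from the calculated values.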

Expected m/z or mass values for CDMS:

CDMS measures the m/z and the charge (z) of each individual ion at the same time and multiplies them to obtain that ion's mass. The spectrum in Figure 7 is therefore plotted with mass on the x-axis (typically in MDa or kDa) rather than m/z, and the oligomeric species should appear at the calculated masses above.

Identification on Figure 7:

Peak Position (approximate mass) | Oligomeric Species
~3,400 kDa (3.4 MDa) | 7FU Decamer
~8,000 kDa (8.0 MDa) | 8FU Didecamer
~12,000 kDa (12.0 MDa) | 8FU 3-Decamer
~16,000 kDa (16.0 MDa) | 8FU 4-Decamer

Additional notes on interpretation:

  1. 7FU vs 8FU: These are different polypeptide subunits of KLH. 7FU has a mass of 340 kDa per subunit, while 8FU has 400 kDa per subunit. “FU” stands for “functional unit”: each subunit is a chain of seven or eight globular domains of roughly 50 kDa, and each domain carries one oxygen-binding site.

  2. Decamer: A complex of 10 subunits. For KLH, the basic building block is a decamer (10 subunits arranged in a ring-like structure).

  3. Didecamer, 3-Decamer, 4-Decamer: Higher-order assemblies where multiple decamers stack together. A didecamer is two decamers stacked (20 subunits total). 3-decamer is three decamers stacked (30 subunits total). 4-decamer is four decamers stacked (40 subunits total).

  4. Why CDMS is necessary: KLH is extremely large (millions of Daltons). Conventional mass spectrometry cannot measure such large masses because:

    • Most mass spectrometers have an m/z range that is too limited
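    • Large ions carry many charges, and the closely spaced charge states of such heterogeneous assemblies overlap, so the charge cannot be assigned and the spectrum cannot be deconvolved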
    • CDMS measures mass directly by detecting the charge and m/z of individual ions simultaneously, bypassing these limitations

Expected results from Figure 7:

The CDMS spectrum in Figure 7 should show distinct peaks at each of these calculated masses. The relative heights of the peaks indicate the abundance of each oligomeric species in the sample. Typically, the didecamer (8,000 kDa) is the most abundant species, with smaller amounts of decamer, 3-decamer, and 4-decamer.


Summary Table

Oligomeric Species | Subunit | Number of Subunits | Calculated Mass | Expected Peak Location in Figure 7
7FU Decamer | 7FU (340 kDa) | 10 | 3,400 kDa | Leftmost major peak
8FU Didecamer | 8FU (400 kDa) | 20 | 8,000 kDa | Center peak (most abundant)
8FU 3-Decamer | 8FU (400 kDa) | 30 | 12,000 kDa | Right of center
8FU 4-Decamer | 8FU (400 kDa) | 40 | 16,000 kDa | Rightmost peak

Homework: Waters Part V — Did I make GFP?

Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.


Background

After expressing and purifying a protein intended to be eGFP (enhanced Green Fluorescent Protein), mass spectrometry was used to confirm its identity. The theoretical molecular weight was calculated from the amino acid sequence using Expasy Compute pI/Mw (Part I, Question 1). The experimental molecular weight was determined from the intact LC-MS data using the adjacent charge state approach (Part I, Question 2).

The mass error in parts per million (ppm) is calculated to assess the accuracy of the measurement and confirm whether the expressed protein is indeed eGFP.
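The adjacent charge state approach mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions: the two m/z values are synthesized from the theoretical mass of 28,006.60 Da at hypothetical charge states 30 and 31 rather than read from the lab spectrum, and the function names are made up for this example.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_of(mass, z):
    """m/z of an [M+zH]z+ ion for a given neutral mass."""
    return (mass + z * PROTON) / z


def deconvolve_adjacent(mz_low, mz_high):
    """Charge and neutral mass from two neighbouring peaks of one protein.

    mz_high carries charge z and mz_low carries charge z + 1, so
    z * (mz_high - p) = (z + 1) * (mz_low - p), which rearranges to
    z = (mz_low - p) / (mz_high - mz_low).
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass


# Synthetic peaks generated from the theoretical mass (not measured data)
theoretical = 28006.60
peak_31 = mz_of(theoretical, 31)
peak_30 = mz_of(theoretical, 30)

z, mass = deconvolve_adjacent(peak_31, peak_30)
print(z, round(mass, 2))  # 30 28006.6
```

With real data, the same calculation is usually repeated over several neighbouring peak pairs and the resulting masses are averaged.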


Data Table

Fill in the table below with the data acquired from the lab work at the Waters Immerse Lab in Cambridge, or using the data screenshots provided in the homework document.

Quantity | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error
Molecular weight (kDa) | 28.0066 kDa | [Insert your measured value] | [Calculate using formula below]

Calculation of PPM Mass Error

The PPM (parts per million) mass error is calculated using the formula:

ppm error = |MW_theoretical - MW_measured| / MW_theoretical × 1,000,000

Example calculation (using typical values):

If the measured molecular weight is 28.00585 kDa:

ppm error = |28.00660 - 28.00585| / 28.00660 × 1,000,000

ppm error = 0.00075 / 28.00660 × 1,000,000

ppm error = 0.0000268 × 1,000,000 = 26.8 ppm


Interpreting the Results

PPM Error Range | Interpretation
< 50 ppm | Excellent agreement – protein identity is confirmed
50-100 ppm | Good agreement – protein is very likely correct
100-200 ppm | Moderate agreement – possible modification or minor error
> 200 ppm | Poor agreement – protein may be incorrect or degraded

Did I make GFP?

Answer: Yes (assuming the measured mass matches the theoretical value within acceptable error).

Reasoning:

  1. The theoretical molecular weight of eGFP with His-tag is 28,006.60 Da (calculated from the sequence using Expasy)
  2. The intact LC-MS measurement gave an experimental molecular weight of [insert your measured value] Da
  3. The PPM mass error is [insert your calculated error] ppm
  4. This error is well within the acceptable range for a Waters Xevo G3 QToF mass spectrometer (typical instrument specification is < 50 ppm for intact protein analysis)
  5. The small mass difference can be attributed to:
    • Instrument calibration
    • Deconvolution and peak-picking error (a true post-translational modification would shift the mass by tens of Da, far more than a few ppm)
    • Measurement uncertainty

Conclusion: The mass spectrometry data confirms that the expressed and purified protein is eGFP with the correct molecular weight.


Complete Data Table (Example with calculated values)

Quantity | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error
Molecular weight (kDa) | 28.0066 kDa | 28.0059 kDa | 25 ppm

(Replace these values with your actual measured data from the lab or from the screenshots provided in the homework document.)


Additional Notes

  • The mass accuracy of the Waters Xevo G3 is typically < 5 ppm for small molecules, and < 50 ppm for intact proteins
  • If your measured mass is significantly different from the theoretical value (> 200 ppm), possible explanations include:
    • Incorrect protein expressed (mutation or wrong construct)
    • Post-translational modifications (e.g., phosphorylation, glycosylation)
    • Degradation or truncation of the protein during purification
    • Instrument calibration error
    • User error in data processing (incorrect charge state assignment)

Week 11 — Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST.

A personalized URL was sent to the email address associated with your Discourse account, and you can discuss the artwork on the Discourse.

If you did not have a chance to contribute, it’s okay, just make sure you become a TA this fall! 😉

Make a note on your HTGAA webpages including:

what you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”)

what you liked about the project, and

what about this collaborative art experiment could be made better for next year.

Collective Artwork

1. What I contributed to the community bioart project

Unfortunately, I did not participate in the pixel editing activity. I was unable to contribute because I was in the middle of final exams and other academic commitments during the submission window (April 19 deadline). I did not have time to review the activity or log in to contribute a pixel.

Even though I could not contribute directly, I followed the project outcome and deeply appreciate the effort that went into organizing it.


2. What I liked about the project

What I loved most about this project is how it brought people together from all over the world — different countries, time zones, cultures, and backgrounds — working toward a shared creative and scientific goal.

The project created a real feeling of community. Seeing how everyone coordinated, helped each other, and gave feedback to improve the experiment was inspiring. There are not many activities that manage to unite so many people across such distances in a single collaborative artwork. This was special.

The project also combined synthetic biology, art, automation, and community participation. That mix is exactly what HTGAA is about. Despite the complexity (hundreds of people, cloud lab coordination, DNA templates, fluorescent proteins), everything ran smoothly.

In short, what I liked most was the human connection — the friendship, shared purpose, and the freedom to build something together beyond what any single person could do alone.


3. What about this collaborative art experiment could be made better for next year

First, I want to say that the project is already very well organized. I don’t see major flaws. However, here are a few small suggestions to make it even better:

Extend the editing window or add a second participation round. Many students (like me) have exam periods or overlapping deadlines. A longer window (for example, two weeks instead of one) or a “catch-up weekend” would allow more people to join.

Create a small second phase where latecomers can add a few extra pixels or help analyze the results. This would make the project feel more inclusive without disrupting the main experiment.

Add a live world map showing where each pixel contribution is coming from in real time. Seeing your pixel next to someone’s from the other side of the planet would make the global collaboration more visible and exciting.

Host one optional synchronous “pixel party” (for example, a one-hour Zoom call) where people can contribute together, ask questions, and meet others participating. This would strengthen the sense of community even more.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

cover image cover image

Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

E. coli Lysate

BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)

Salts/Buffer

Potassium Glutamate

HEPES-KOH pH 7.5

Magnesium Glutamate

Potassium phosphate monobasic

Potassium phosphate dibasic

Energy / Nucleotide System

Ribose

Glucose

AMP

CMP

GMP

UMP

Guanine

Translation Mix (Amino Acids)

17 Amino Acid Mix

Tyrosine

Cysteine

Additives

Nicotinamide

Backfill

Nuclease Free Water

Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

Bonus question: How can transcription occur if GMP is not included but Guanine is?

Cell-Free Protein Synthesis — Cell-Free Reagents

1. Role of each component in the cell-free reaction

E. coli Lysate (BL21 (DE3) Star Lysate, includes T7 RNA Polymerase)

The lysate provides all the endogenous machinery needed for transcription and translation, including ribosomes, tRNAs, initiation factors, elongation factors, and termination factors. The BL21 (DE3) strain is engineered to express T7 RNA polymerase, which specifically recognizes the T7 promoter on the DNA template, enabling high-yield transcription of the target gene.

Salts/Buffer (Potassium Glutamate, HEPES-KOH pH 7.5, Magnesium Glutamate, Potassium phosphate monobasic, Potassium phosphate dibasic)

These maintain optimal pH (7.5) and ionic strength for enzymatic activity. Potassium and magnesium are essential cofactors for ribosome function, RNA polymerase activity, and proper protein folding. The phosphate system helps regenerate ATP and maintains energy homeostasis throughout the reaction.

Energy / Nucleotide System (Ribose, Glucose, AMP, CMP, GMP, UMP, Guanine)

This system provides both the energy currency (ATP, GTP, etc.) and the nucleotide building blocks for RNA synthesis. Glucose and ribose are metabolized to generate ATP via glycolysis and the pentose phosphate pathway. AMP, CMP, GMP, UMP are converted to their triphosphate forms (ATP, CTP, GTP, UTP) by endogenous kinases. Guanine serves as a precursor for GTP synthesis through the salvage pathway.

Translation Mix (17 Amino Acid Mix + Tyrosine + Cysteine)

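This supplies the amino acids required for protein synthesis. The 17-amino-acid mix plus tyrosine and cysteine accounts for 19 of the 20; the remaining one, glutamate, is presumably covered by the potassium and magnesium glutamate salts, which are present at high concentration. Tyrosine and cysteine are added separately because tyrosine is poorly soluble at neutral pH and cysteine oxidizes readily in solution. If any amino acid runs out, the ribosome stalls during translation.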

Additives (Nicotinamide)

Nicotinamide is an NAD+ precursor: salvage enzymes in the lysate can convert it to NAD+, the redox cofactor that glycolysis needs in order to turn glucose into ATP. Supplying it helps sustain energy metabolism and redox balance over a long reaction.

Backfill (Nuclease Free Water)

Nuclease-free water is used to bring the reaction to the final volume without introducing contaminating nucleases (RNases or DNases) that would degrade the DNA template, mRNA, or tRNA, destroying the reaction.


2. Main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix

The PEP-NTP master mix supplies nucleoside triphosphates (NTPs) directly and uses phosphoenolpyruvate (PEP) as a rapid energy source, enabling fast protein synthesis that reaches peak fluorescence within about 1 hour. In contrast, the NMP-Ribose-Glucose master mix supplies nucleoside monophosphates (NMPs) plus ribose and glucose, so the lysate must first phosphorylate the NMPs to NTPs using energy from sugar metabolism, resulting in slower energy release that sustains protein synthesis for up to 20 hours. The NMP-based mix is therefore better for long-term experiments, while the PEP-NTP mix is better for quick results.


3. Bonus question: How can transcription occur if GMP is not included but Guanine is?

Transcription can still occur because guanine is converted to GMP by a purine salvage enzyme, a guanine phosphoribosyltransferase (in E. coli mainly xanthine-guanine phosphoribosyltransferase, Gpt), which transfers a phosphoribosyl group from phosphoribosyl pyrophosphate (PRPP) to guanine, producing GMP. The PRPP itself can be made from the ribose in the mix. Once GMP is formed, guanylate kinase and nucleoside diphosphate kinase phosphorylate it to GDP and then to GTP, which is the direct substrate for RNA polymerase during transcription. The cell-free lysate contains these endogenous salvage pathway enzymes, so guanine can serve as the starting point for GTP synthesis even when GMP is not directly provided.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)

sfGFP

mRFP1

mKO2

mTurquoise2

mScarlet_I

Electra2

The amino acid sequences are shown in the HTGAA Cell-Free Benchling folder.

Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here.

Important In order to be eligible for this, make sure that your final project slide is in the “2026 Committed Listener ONE FINAL PROJECT IDEA” slide deck.

The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (date TBD!). The reaction composition for each well will be as follows:

6 μL of Lysate

10 μL of 2X Optimized Master Mix from above

2 μL of assigned fluorescent protein DNA template

2 μL of your custom reagent supplements

Total: 20 μL reaction

Planning the Global Experiment — Cell-Free Master Mix Design

1. Biophysical or functional property of each fluorescent protein that affects expression or readout in cell-free systems

sfGFP (superfolder GFP)

sfGFP has been engineered to fold rapidly and efficiently even when fused to poorly folding partners, which is advantageous in cell-free systems because it minimizes fluorescence loss due to misfolding. However, it still requires molecular oxygen for chromophore maturation, which can be limiting in anaerobic or poorly oxygenated cell-free reactions.

mRFP1 (monomeric red fluorescent protein 1)

mRFP1 matures relatively quickly (around 60 minutes to half-maximal fluorescence) but has a tendency to form dimers at high concentrations, which can affect its solubility and fluorescence readout in crowded cell-free lysates. It is also sensitive to acidic pH, and cell-free reactions can become acidic over time due to metabolic byproducts, potentially reducing its signal.

mKO2 (monomeric Kusabira Orange 2)

mKO2 has an unusually fast maturation rate (approximately 20 minutes to half-maximal fluorescence), making it ideal for short-term cell-free experiments. However, it is moderately sensitive to reducing conditions, and cell-free lysates contain reducing agents like DTT or glutathione that could impair chromophore oxidation if not carefully controlled.

mTurquoise2

mTurquoise2 is a cyan fluorescent protein with high brightness and photostability, but it has a slow maturation time (around 60-90 minutes) and requires complete oxidation of its chromophore to achieve full fluorescence. In cell-free reactions, slow maturation means that fluorescence continues to increase for many hours, which is good for long experiments but problematic for early timepoint measurements.

mScarlet_I

mScarlet_I is a bright monomeric red fluorescent protein. It is the faster-maturing variant of mScarlet (it carries the T74I substitution), trading some brightness for a maturation time of tens of minutes rather than hours, so its red signal appears relatively early in a cell-free reaction. Like other red fluorescent proteins, it still needs molecular oxygen for chromophore formation, so readout can lag behind translation in poorly oxygenated wells.

Electra2

Electra2 is a recently developed blue fluorescent protein that is described as tolerant of acidic pH and oxidative conditions, which is beneficial for cell-free reactions that may drift in pH over time. However, it has a slower maturation rate compared to sfGFP, and its chromophore forms autocatalytically with molecular oxygen as the only required cofactor, so poorly oxygenated wells may show delayed or reduced signal.


2. Hypothesis for adjusting reagents to improve fluorescence over 36 hours

Protein selected: mTurquoise2

Reagent(s) to adjust: Add an oxygen-generating supplement (e.g., catalase together with a low dose of hydrogen peroxide) and increase the concentration of oxidized glutathione (GSSG) to shift the redox environment toward oxidizing conditions.

Expected effect: mTurquoise2 has a slow maturation time (~60-90 minutes) and requires complete chromophore oxidation for full fluorescence. In a standard cell-free reaction, oxygen is gradually depleted over 36 hours, and reducing conditions can inhibit proper chromophore oxidation. Catalase converts hydrogen peroxide to water and oxygen (2 H₂O₂ → 2 H₂O + O₂), so a small peroxide supplement can replenish dissolved O₂, while the added oxidized glutathione buffers the redox environment. With both adjustments, I expect that mTurquoise2 will continue to mature for the full 36-hour incubation rather than plateauing early. This should result in higher maximum fluorescence compared to the unmodified master mix.


3. Experimental design summary (for reference when data is returned)

The reaction composition for each well will be:

6 μL of Lysate

10 μL of 2X Optimized Master Mix

2 μL of assigned fluorescent protein DNA template

2 μL of custom reagent supplements (as hypothesized above)

Total reaction volume: 20 μL

Incubation time: 36 hours

Readout: Fluorescence measured at regular intervals (specific wavelengths for each protein)

Next steps: Once I receive my assigned artwork wells with specific fluorescent proteins (by email by April 24), I will define the precise reagent concentrations for my custom supplements. After the data is collected and returned, I will analyze whether my hypothesis was correct and whether the modified master mix improved fluorescence for mTurquoise2 over the 36-hour incubation.
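When defining the supplement concentrations, it helps to remember that the 2 μL supplement is diluted tenfold in the 20 μL reaction. The sketch below works through that dilution arithmetic in Python. The volumes come from the reaction composition above; the function names and the example target of 1 mM final GSSG are hypothetical placeholders, not values from the assignment.

```python
# Per-well volumes from the reaction composition, in microlitres
VOLUMES_UL = {
    "lysate": 6.0,
    "2X master mix": 10.0,
    "DNA template": 2.0,
    "custom supplement": 2.0,
}
TOTAL_UL = sum(VOLUMES_UL.values())


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration in the well after diluting a stock into the full reaction."""
    return stock * volume_ul / total_ul


def stock_for_target(target_final, volume_ul, total_ul=TOTAL_UL):
    """Stock concentration needed to reach a target final concentration."""
    return target_final * total_ul / volume_ul


# The 2X master mix ends up at 1X in the well.
print(final_concentration(2.0, VOLUMES_UL["2X master mix"]))  # 1.0

# Hypothetical example: a 1 mM final GSSG target needs a 10 mM supplement stock.
print(stock_for_target(1.0, VOLUMES_UL["custom supplement"]))  # 10.0
```

The same two functions can be reused for every supplement once the target final concentrations are chosen.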

Part D: Build-A-Cloud-Lab | (optional) Bonus Assignment

Ginkgo Nebula Cloud Laboratory Rendering, 2025

Use this simulation tool to create an interesting looking cloud lab out of the Ginkgo Reconfigurable Automation Carts. This is just a minimal implementation so far, but I would love to see some fun designs!

Subsections of Labs

Week 1 Lab: Pipetting


Week 2 Lab: Gel Art

Projects

Final projects:

  • Automated High-Throughput Screening Platform for Radioprotective Microbial Melanins
  • Computational Engineering of the MS2 Lysis Protein to Improve Stability, Titers, and Toxicity After reviewing the provided literature on the MS2 lysis protein (L) and discussing the project aims, our group has decided to focus on three interconnected goals: Goal 1: Increase the stability of the L protein As the “easiest” goal, it is the most computationally tractable. A stabilized protein is less prone to degradation and misfolding, which could directly lead to higher functional titers and serve as a robust starting point for any subsequent engineering.

Subsections of Projects

Individual Final Project


Automated High-Throughput Screening Platform for Radioprotective Microbial Melanins

ABSTRACT

Ionizing radiation poses significant challenges for space exploration, nuclear facilities, medical applications, and industrial safety. Current radiation shielding materials are often synthetic, heavy, and environmentally problematic. Microbial melanins — natural biopolymers produced by bacteria, yeast, and other microorganisms — have demonstrated remarkable capacity to absorb various forms of radiation including UV, gamma, and X-rays. However, discovering and optimizing microbial strains that produce radioprotective melanins remains a slow, manual process. This project proposes the design of an automated high-throughput screening platform using Opentrons liquid handling robotics to systematically test diverse microorganisms for melanin production and radiation absorption capabilities. The workflow includes automated inoculation, controlled radiation exposure, spectrophotometric quantification, and functional radioprotection assays. The platform is designed for remote execution via cloud laboratories, enabling scalability and accessibility even without local wet-lab infrastructure. The expected outcomes include a validated automated protocol, identification of high-performance melanin-producing strains, and characterized melanin samples with demonstrated radioprotective properties for biotechnological applications.

1. INTRODUCTION

1.1 The Radiation Problem

Radiation exposure is a fundamental challenge across multiple domains. Astronauts on deep space missions face chronic exposure to galactic cosmic radiation. Nuclear facility workers require protection during routine operations and emergency responses. Patients undergoing radiation therapy experience damage to healthy tissues surrounding tumors. Electronic equipment in high-radiation environments degrades prematurely. Each of these scenarios demands effective radiation protection.

Current solutions have significant limitations. Lead shielding is heavy and toxic. Polymer-based materials offer limited protection. Synthetic additives may degrade or release harmful compounds. The need for lightweight, biocompatible, and sustainable radioprotective materials is urgent and growing.

1.2 Melanin as a Radioprotective Material

Melanin is a complex biopolymer found across many life forms. It provides pigmentation, but more importantly, it offers protection against environmental stresses including radiation. Research following the Chernobyl disaster revealed that certain fungi not only survived high radiation environments but actually thrived, with melanin playing a central role in their radiotolerance.

The mechanism involves multiple physical and chemical properties. Melanin absorbs electromagnetic radiation across a broad spectrum. It scavenges free radicals generated by radiation exposure. It can undergo reversible oxidation-reduction cycles that may even allow energy conversion. These properties make melanin exceptionally promising as a biological radioprotector.

Importantly, melanin is not limited to fungi. Many bacteria produce melanin, often with different structural characteristics that may confer unique radioprotective properties. Actinomycetes like Streptomyces species produce dark pigments. Pseudomonas aeruginosa produces pyomelanin under specific conditions. Bacillus species synthesize melanin-like compounds. Each of these represents a potential source of radioprotective material.

1.3 The Screening Challenge

The fundamental problem this project addresses is discovery. How do we find the best microbial melanin producers among thousands of possibilities? How do we determine which growth conditions maximize production of the most radioprotective forms? How do we test whether a particular melanin actually protects against radiation?

Traditional approaches are manual, slow, and low-throughput. A researcher might test one organism at a time, one condition at a time, measuring one parameter at a time. This approach cannot explore the full landscape of microbial diversity and cultivation variables.

Automation offers a solution. Liquid handling robots can prepare dozens of media formulations, inoculate hundreds of microbial strains, and monitor thousands of samples over time. Plate readers can measure optical density, pigment absorbance, and functional assays automatically. The combination enables high-throughput experimentation that would be impossible manually.

1.4 Project Objectives

This project aims to design an automated platform specifically for discovering microbial melanins with radioprotective properties. The specific objectives are:

First, to develop a robotic workflow using Opentrons for culturing diverse melanin-producing microorganisms in 96-well plate format with controlled media variations.

Second, to integrate methods for controlled radiation exposure and subsequent measurement of melanin production and microbial survival.

Third, to implement functional assays that directly test whether extracted melanins can protect reporter cells from radiation damage.

Fourth, to validate the platform through proof-of-concept experiments and deliver a protocol suitable for remote execution in cloud laboratories.

2. BIOLOGICAL FOUNDATIONS

2.1 Melanin Types and Their Properties

Melanin is not a single compound but a family of related biopolymers with shared characteristics and important distinctions.

Eumelanin is the most common form, producing black or brown pigmentation. It is polymerized from dihydroxyindole precursors and exhibits broadband absorption from UV through visible light. Eumelanin conducts electricity, chelates metals, and shows remarkable stability. Bacteria in the genus Streptomyces produce eumelanin, as do many fungi.

Pheomelanin produces yellow to red-brown colors and incorporates sulfur into its structure. It is less photostable than eumelanin but offers different antioxidant properties. Some Bacillus species produce pheomelanin-like pigments.

Pyomelanin is produced via a different pathway involving homogentisic acid polymerization. Pseudomonas aeruginosa produces pyomelanin under certain conditions, and this form shows particular effectiveness at metal binding and potentially unique radiation interactions.

Allomelanins represent a diverse category of nitrogen-free melanins produced by various fungi and bacteria, with structures adapted to specific ecological niches.

Each melanin type may offer different advantages for radioprotection. Broadband absorption suggests eumelanin might be most effective across radiation types. Metal chelation could enable secondary protective mechanisms. Conductivity raises questions about energy dissipation pathways. The automated platform must be capable of detecting and distinguishing these various forms.

2.2 Microbial Producers

The microbial world offers enormous diversity of melanin producers suitable for screening.

Among bacteria, Streptomyces species are well-documented eumelanin producers. Streptomyces glaucescens produces dark pigments during sporulation. Streptomyces antibioticus synthesizes melanin in response to specific nutrients. These bacteria are generally non-pathogenic and genetically tractable, although they grow more slowly than common laboratory hosts such as E. coli or Bacillus subtilis.

Pseudomonas aeruginosa produces pyomelanin when the tyrosine degradation pathway is active. P. aeruginosa is an opportunistic pathogen, so even environmental isolates should be handled at the biosafety level set by an institutional risk assessment (commonly BSL-2). The pyomelanin of Pseudomonas has been studied for metal bioremediation and shows distinct properties from eumelanin.

Bacillus species including Bacillus subtilis can produce melanin-like pigments under stress conditions. Their rapid growth and established laboratory protocols make them attractive screening candidates.

Marine bacteria represent an underexplored resource. Deep-sea isolates adapted to high pressure and darkness may produce melanins with unique properties. Extreme environments often yield extreme biochemistry.

Yeasts, including Cryptococcus species, produce melanin that has been directly implicated in radiation resistance. While the Chernobyl fungi captured public attention, the underlying biochemistry exists across diverse yeast genera.

The screening platform must accommodate this diversity, using culture conditions appropriate for different microbial types while maintaining standardized measurement protocols.

2.3 Radiation Biology and Melanin Function

Understanding how melanin protects against radiation requires understanding what radiation does to living systems.

Ionizing radiation creates reactive oxygen species that damage DNA, proteins, and membranes. The primary biological damage is often indirect, mediated by these free radicals rather than direct molecular hits.

Melanin provides protection through multiple mechanisms. First, it physically absorbs and scatters radiation, reducing the dose reaching sensitive cellular targets. Second, it acts as a free radical sink, neutralizing reactive species before they cause damage. Third, it may participate in electron transfer processes that dissipate energy harmlessly.

Studies on Cryptococcus neoformans showed that melanized cells survived significantly higher radiation doses than non-melanized controls. The effect was not merely passive shielding; melanin appeared to be actively involved in cellular recovery processes.

Importantly, melanin extracted from cells retains radioprotective properties. This means purified melanin could be incorporated into materials, coatings, or formulations independent of living organisms. The screening platform must therefore assess not only melanin production but also the functional performance of extracted material.

3. AUTOMATION PLATFORM DESIGN

3.1 Hardware Components

The platform centers on the Opentrons OT-2 liquid handling robot. This open-source automated pipetting system offers precision, programmability, and accessibility. It can accommodate 96-well plates, reagent reservoirs, and custom labware on its deck. Python-based protocol development enables complex experimental designs and remote execution.

A plate reader capable of absorbance measurements across the UV-visible spectrum is essential. For melanin quantification, measurements at 405 nm correlate with pigment concentration, while full spectral scans from 350 to 700 nm provide information about melanin type and quality. Some plate readers also offer fluorescence and luminescence modes that could expand functional assay capabilities.

Incubation with shaking maintains cultures under controlled conditions. Temperature control between 20 and 37 degrees Celsius accommodates diverse microbial types. Shaking ensures oxygenation and prevents settling.

For radiation exposure, several approaches are possible. UV lamps can be integrated into the platform for UV-B and UV-C exposure experiments. For gamma or X-ray simulation, chemical radiation mimetics or external irradiation facilities would be required. The platform design must accommodate transfer of plates to and from radiation sources while maintaining sterility and tracking.

3.2 Software and Protocol Architecture

The experimental logic is implemented in Python scripts controlling the Opentrons robot. The protocol architecture includes modular components that can be combined flexibly.

The inoculation module handles preparation of seed cultures, normalization of cell density, and distribution to experimental plates. Different microbial types may require different inoculation strategies, accommodated through conditional logic.

The media preparation module combines stock solutions to create defined media variations. Nutrient gradients, stress inducers, and melanin precursors can be varied systematically across plates.

The sampling module transfers aliquots from culture plates to measurement plates at defined timepoints. This enables kinetic measurements of melanin production without disturbing the main culture.

The data collection module coordinates with the plate reader, triggering measurements and storing results in structured formats for downstream analysis.

3.3 Experimental Workflow

The complete experimental workflow proceeds through defined stages.

Stage one involves preparation of microbial libraries. Individual strains are grown, verified, and stored in glycerol stocks. Working plates are prepared with standardized inocula.

Stage two is media formulation. Stock solutions of carbon sources, nitrogen sources, trace elements, and melanin precursors are prepared. The robot combines these in varying ratios across 96-well plates according to experimental design matrices.
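The experimental design matrix described in stage two can be sketched as a full-factorial mapping of factor levels onto wells. The factor names, levels, and replicate count below are illustrative assumptions, not values taken from the project.

```python
from itertools import product
from string import ascii_uppercase


def well_names(rows=8, cols=12):
    """Return 96-well names in row-major order: A1, A2, ..., H12."""
    return [f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)]


def full_factorial_layout(factors, replicates=1):
    """Assign every combination of factor levels (with replicates) to a well."""
    names = list(factors)
    conditions = [dict(zip(names, combo)) for combo in product(*factors.values())]
    # Replicates of the same condition are placed in consecutive wells.
    conditions = [c for c in conditions for _ in range(replicates)]
    wells = well_names()
    if len(conditions) > len(wells):
        raise ValueError(f"{len(conditions)} conditions exceed {len(wells)} wells")
    return dict(zip(wells, conditions))


# Hypothetical factors and levels, for illustration only.
factors = {
    "tyrosine_mM": [0, 1, 2, 4],
    "copper_uM": [0, 10, 50],
    "carbon_source": ["glucose", "glycerol"],
}
layout = full_factorial_layout(factors, replicates=3)
print(len(layout), "wells used")
```

Placing replicates side by side keeps this sketch easy to read; a real screen would more likely shuffle positions so that plate position is not confounded with condition.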

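Stage three is inoculation. The robot transfers standardized inoculum to each well. Plates are then sealed with gas-permeable membranes, a step done by hand or with a separate plate sealer because the OT-2 cannot apply seals itself, and incubation begins.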
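A minimal sketch of what the inoculation step might look like as an Opentrons Python protocol. The deck slots, labware choices, inoculum volume, and the decision to fill only row A are assumptions made for illustration; the project's actual protocol may differ.

```python
# Sketch of an OT-2 inoculation protocol (Opentrons Python API v2).
# Deck slots, labware, and volumes are assumed for illustration.
metadata = {
    "protocolName": "Melanin screen: inoculation (sketch)",
    "apiLevel": "2.13",
}

INOCULUM_UL = 20  # hypothetical inoculum volume per well


def run(protocol):
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", 1)
    seed = protocol.load_labware("nest_12_reservoir_15ml", 2)
    culture = protocol.load_labware("corning_96_wellplate_360ul_flat", 3)
    pipette = protocol.load_instrument("p300_single_gen2", "right", tip_racks=[tips])

    # Inoculate row A only in this sketch; rows()[0] is the list of wells A1..A12.
    for dest in culture.rows()[0]:
        # A fresh tip per transfer limits carry-over between wells.
        pipette.transfer(
            INOCULUM_UL,
            seed["A1"],
            dest,
            new_tip="always",
            mix_after=(3, 50),
        )
```

A protocol like this is normally checked in the Opentrons simulator before it touches the robot, which catches labware and volume mistakes early.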

Stage four involves monitoring. At scheduled intervals, the robot samples each well, transfers samples to measurement plates, and the plate reader records absorbance spectra. Data is automatically logged to cloud storage.

Stage five is harvest. At the end of the growth period, cultures are processed to extract melanin. Centrifugation separates cells from supernatant. Melanin is precipitated, washed, and resuspended for downstream assays.

Stage six is functional testing. Purified melanin samples are applied to reporter cells, exposed to radiation, and survival is measured. This directly tests radioprotective efficacy.

3.4 Quality Control and Reproducibility

Automation eliminates many sources of human error but introduces its own quality considerations.

Positional accuracy is verified using calibration plates and dye distribution assays. The robot’s pipetting precision is checked regularly.

Evaporation control is critical for long-term experiments. Plate seals, humidified incubation, and edge effect compensation are implemented.

Cross-contamination between wells is prevented through tip usage protocols. Fresh tips for each transfer, appropriate blowout procedures, and randomized plate layouts minimize interference.
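One way to combine randomized layouts with edge-effect avoidance is to place samples only in interior wells, in a seeded random order. This is a sketch under that assumption; the sample names and seed are hypothetical.

```python
import random
from string import ascii_uppercase


def interior_wells(rows=8, cols=12):
    """Wells not on the plate border, where evaporation is usually strongest."""
    return [
        f"{ascii_uppercase[r]}{c}"
        for r in range(1, rows - 1)
        for c in range(2, cols)
    ]


def randomized_layout(samples, seed=0):
    """Assign samples to randomly chosen interior wells, reproducibly."""
    wells = interior_wells()
    if len(samples) > len(wells):
        raise ValueError("more samples than interior wells")
    rng = random.Random(seed)  # fixed seed so the layout can be regenerated
    chosen = rng.sample(wells, len(samples))
    return dict(zip(chosen, samples))


# Hypothetical sample list: 10 strains x 3 replicates.
samples = [f"strain_{i:02d}_rep{r}" for i in range(1, 11) for r in range(1, 4)]
layout = randomized_layout(samples)
```

The unused border wells can be filled with sterile medium so they act as an evaporation buffer.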

Biological replicates and technical replicates are built into experimental designs. Statistical power is calculated to ensure meaningful hit identification.
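As a rough illustration of the power calculation, the sketch below uses the standard normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{power})·σ/δ)². Choosing this formula, and the default α and power values, are assumptions; the project does not specify a method.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(effect, sd, alpha=0.05, power=0.8):
    """Replicates per group needed to detect a mean difference `effect`
    given a within-group standard deviation `sd` (two-sided test,
    normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Detecting a difference of one standard deviation needs far more
# replicates than detecting a difference of two.
print(replicates_per_group(effect=1.0, sd=1.0))
print(replicates_per_group(effect=2.0, sd=1.0))
```

The approximation slightly underestimates n for very small groups, where a t-based calculation would be more appropriate.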

4. ASSAY DEVELOPMENT

4.1 Melanin Quantification

Quantifying melanin production requires distinguishing pigment from cell density and media components.

The primary measurement is absorbance at 405 nm, which correlates with melanin concentration across types. However, this measurement is confounded by light scattering from cells and absorbance by media components.

The solution is differential measurement. Wells are measured before and after centrifugation or filtration to separate cells and pigment. Alternatively, media-only controls enable background subtraction.
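The background subtraction described above might be sketched as follows, assuming cell-free supernatant readings and a set of media-only blank wells. The function name and data shapes are hypothetical.

```python
from statistics import mean


def melanin_signal(supernatant_a405, media_blank_a405):
    """Blank-corrected A405 per well.

    supernatant_a405: dict of well name -> A405 of cell-free supernatant
    media_blank_a405: list of A405 readings from media-only control wells
    """
    blank = mean(media_blank_a405)
    # Clamp at zero so reader noise cannot produce negative pigment signals.
    return {well: max(a - blank, 0.0) for well, a in supernatant_a405.items()}


corrected = melanin_signal({"A1": 0.56, "B1": 0.04}, [0.05, 0.07])
```

This only captures secreted pigment; cell-associated melanin would need a separate whole-cell or extraction-based measurement.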

Spectral scanning provides additional information. The shape of the absorbance curve from 350 to 700 nm differs between melanin types. Eumelanin shows monotonic decrease with increasing wavelength. Pyomelanin may show distinct features. Machine learning approaches could classify melanin type from spectral data.

For absolute quantification, melanin standards are required. Commercial melanin from Sepia officinalis provides a reference, but microbial melanins may have different extinction coefficients. Parallel gravimetric analysis on selected samples establishes conversion factors.
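Converting absorbance to concentration with a melanin standard can be sketched as an ordinary least-squares line through the standards. The concentrations and readings below are made-up numbers, and the linear range is an assumption that would need checking for each melanin type.

```python
def fit_standard_curve(conc_ug_ml, a405):
    """Least-squares fit of A405 = slope * concentration + intercept."""
    n = len(conc_ug_ml)
    mean_x = sum(conc_ug_ml) / n
    mean_y = sum(a405) / n
    sxx = sum((x - mean_x) ** 2 for x in conc_ug_ml)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc_ug_ml, a405))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absorbance_to_conc(a405, slope, intercept):
    """Invert the standard curve to estimate concentration (ug/mL)."""
    return (a405 - intercept) / slope


# Hypothetical standard dilution series.
slope, intercept = fit_standard_curve([0, 10, 20, 40], [0.05, 0.15, 0.25, 0.45])
estimate = absorbance_to_conc(0.25, slope, intercept)
```

Because microbial melanins may differ in extinction coefficient from the standard, values from this curve are best read as standard-equivalent concentrations until gravimetric conversion factors are available.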

4.2 Radiation Exposure Assays

Testing radiation response requires controlled exposure and survival measurement.

For UV exposure, LED arrays integrated into the platform deliver precise doses. Dose-response curves are generated by varying exposure time or intensity. Survival is measured by comparing growth rates post-exposure to unexposed controls.

For ionizing radiation, external sources are required. Collaboration with facilities having X-ray or gamma sources enables this testing. Plates are irradiated in batches and returned to the platform for post-exposure monitoring.

An alternative approach uses chemical radiation mimetics. Compounds like bleomycin or hydrogen peroxide produce DNA damage similar to ionizing radiation. While not identical to true radiation exposure, these enable high-throughput screening without specialized facilities.

The key output is the radiation protection factor: the ratio of survival with melanin to survival without melanin under identical exposure conditions.
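The protection factor defined above is a simple ratio of two survival fractions. The sketch below assumes survival is measured as exposed counts relative to unexposed counts, for example colony-forming units; the numbers are invented.

```python
def survival_fraction(exposed, unexposed):
    """Fraction surviving: exposed count relative to the unexposed control."""
    if unexposed <= 0:
        raise ValueError("unexposed control count must be positive")
    return exposed / unexposed


def protection_factor(survival_with_melanin, survival_without_melanin):
    """Ratio of survival with melanin to survival without it,
    under identical exposure conditions."""
    if survival_without_melanin <= 0:
        raise ValueError("protection factor is undefined when no cells survive without melanin")
    return survival_with_melanin / survival_without_melanin


# Hypothetical colony counts at a single dose.
with_melanin = survival_fraction(exposed=120, unexposed=200)
without_melanin = survival_fraction(exposed=30, unexposed=200)
pf = protection_factor(with_melanin, without_melanin)
```

A value above 1 indicates protection, a value near 1 indicates no effect, and a value below 1 suggests the sample itself is harmful under exposure.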

4.3 Functional Protection Assays

The ultimate test of radioprotective melanin is whether it protects living cells from radiation damage.

The assay uses reporter cells, typically E. coli or yeast, that are easy to culture and measure. Melanin samples are added to reporter cell cultures, which are then exposed to radiation. Survival is measured by colony formation or growth rate.

Controls include no melanin, known protective compounds, and melanin from reference strains.

The assay must account for direct effects of melanin on reporter cells independent of radiation. Toxicity controls ensure that observed protection is not confounded by growth stimulation or inhibition.

For melanins that cannot be easily purified, co-culture assays can test whether producer cells protect non-producer reporters through shared melanin in the medium.

4.4 Bioprospecting Logic

The screening strategy balances breadth with depth.

Primary screening tests many strains under standardized conditions to identify producers. This is qualitative or semi-quantitative, aimed at candidate selection.

Secondary screening tests selected strains under varied conditions to optimize production. Media components, inducers, and environmental parameters are systematically varied.

Tertiary screening performs functional radiation protection assays on purified melanins from top candidates. This identifies strains producing not just abundant melanin, but melanin with genuine radioprotective efficacy.

Final characterization includes spectral analysis, stability testing, and preliminary material property assessment.

5. EXPECTED OUTCOMES AND APPLICATIONS

5.1 Deliverables

The project will produce multiple concrete outputs.

First, a validated automated protocol for melanin screening, fully documented and open-source, enabling replication by other researchers.

Second, a database of microbial strains with quantified melanin production under defined conditions, including spectral profiles and growth characteristics.

Third, identification of top candidate strains producing melanin with demonstrated radioprotective activity, preserved as viable cultures and genomic DNA.

Fourth, characterized melanin samples from top candidates, with spectral data, stability profiles, and functional protection metrics.

Fifth, a roadmap for scaling from discovery to production, including genetic optimization strategies and fermentation development pathways.

5.2 Biotechnological Applications

The discovered melanins have multiple potential applications.

In space exploration, melanin-based coatings could protect spacecraft surfaces and equipment from cosmic radiation. Lightweight and flexible, they offer advantages over metal shielding. Incorporated into fabrics, they could protect astronauts during spacewalks.

In nuclear facilities, melanin additives in paints and sealants could provide supplemental protection for workers and equipment. Biodegradable melanin films could simplify waste management.

In medicine, melanin formulations could be applied topically to protect skin during radiation therapy. Systemic delivery might protect healthy tissues during cancer treatment, though this requires extensive safety testing.

In consumer products, melanin could replace synthetic UV absorbers in sunscreens. Natural, biodegradable, and potentially more effective, melanin-based sun protection aligns with consumer demand for sustainable ingredients.

In electronics, melanin’s conductivity and radiation stability might enable novel components for extreme environments.

5.3 Scientific Contributions

Beyond applications, the project advances fundamental science.

Understanding how melanin structure relates to radioprotective function guides rational discovery. By correlating spectral, chemical, and functional data across diverse melanins, we may identify which molecular features matter most.

The automated platform itself contributes methodology. Open-source protocols enable others to conduct similar screens, accelerating discovery across the field.

Characterizing melanins from underexplored microbial groups expands knowledge of natural diversity. Each new melanin type reveals evolutionary solutions to environmental challenges.

5.4 Commercial Potential

The project has clear commercialization pathways.

One pathway is licensing top-producing strains to biotechnology companies for melanin production. Fermentation scale-up could supply material for multiple applications.

A second is developing melanin formulations for specific markets: sunscreens, industrial coatings, medical devices. Each application requires formulation optimization and regulatory approval.

A third is offering screening services to companies seeking customized melanins for specific applications. The platform can test customer strains or conditions under contract.

A fourth is selling characterized melanin samples as research reagents. The growing interest in melanin biology creates demand for well-defined reference materials.

6. IMPLEMENTATION PLAN

6.1 Phase One: Platform Development

The initial phase focuses on building and validating the automated platform.

Opentrons protocols are developed for basic liquid handling tasks: media preparation, inoculation, sampling. Each module is tested individually with dye solutions to verify accuracy.

Integration with the plate reader is established. Communication protocols ensure synchronized measurements and data transfer.

Quality control procedures are implemented. Acceptance criteria for pipetting precision, evaporation rates, and cross-contamination are defined.
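An acceptance criterion for pipetting precision is often expressed as a coefficient of variation across replicate dye wells. The 5% limit below is a placeholder assumption, not a value defined by the project.

```python
from statistics import mean, stdev


def cv_percent(readings):
    """Coefficient of variation (%) of replicate absorbance readings."""
    return 100 * stdev(readings) / mean(readings)


def passes_precision(readings, max_cv=5.0):
    """True if replicate dye wells agree within the (assumed) CV limit."""
    return cv_percent(readings) <= max_cv


# Hypothetical absorbance readings from wells that received the same dye volume.
good_run = [1.00, 1.02, 0.98, 1.01, 0.99]
bad_run = [1.0, 1.2, 0.8]
print(round(cv_percent(good_run), 2), passes_precision(good_run))
print(round(cv_percent(bad_run), 2), passes_precision(bad_run))
```

Running the same check per column or per pipette channel helps localize a problem to specific tips or deck positions.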

A small test set of known melanin producers validates the complete workflow. Expected results confirm that the platform detects melanin production as anticipated.

6.2 Phase Two: Library Assembly

Microbial strains are assembled from multiple sources.

ATCC provides authenticated reference strains including Streptomyces, Pseudomonas, and Bacillus species. Environmental isolates from local sources add diversity. Collaborations with other laboratories expand the collection.

Strains are archived in standardized format with barcoded tubes for automated retrieval. Working plates are prepared for screening.

Genomic DNA is prepared from each strain for future reference. Partial sequencing confirms identity and enables phylogenetic analysis.

6.3 Phase Three: Primary Screening

The assembled library is screened under standardized conditions.

Each strain is grown in base medium and melanin production is measured over time. Producers advance to secondary screening.

Data is collected automatically and stored in a structured database. Production kinetics, final yields, and spectral profiles are recorded.

Hits are defined as strains producing melanin above a threshold, with preference for rapid production and desirable spectral characteristics.
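The hit threshold is not specified in the plan. One common convention, assumed here, is the mean of non-producing control wells plus three standard deviations. The strain names and readings are hypothetical.

```python
from statistics import mean, stdev


def call_hits(signals, negative_controls, n_sd=3.0):
    """Return (threshold, hits), with hits ranked from strongest to weakest.

    signals: dict of strain name -> blank-corrected melanin signal
    negative_controls: signals from wells with a known non-producer
    """
    threshold = mean(negative_controls) + n_sd * stdev(negative_controls)
    hits = {s: v for s, v in signals.items() if v > threshold}
    ranked = dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))
    return threshold, ranked


threshold, hits = call_hits(
    {"S1": 0.60, "S2": 0.07, "S3": 0.31},
    negative_controls=[0.05, 0.06, 0.04, 0.05],
)
```

Preferences such as rapid production or particular spectral features could be layered on as secondary filters applied to the ranked hits.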

6.4 Phase Four: Secondary Optimization

Hit strains enter optimization screening.

Media components are varied systematically. Carbon source, nitrogen source, trace elements, and melanin precursors are tested individually and in combinations.

Environmental parameters including temperature, pH, and aeration are varied within practical ranges.

Inducers including oxidative stress agents, metal ions, and light are evaluated.

For each condition, melanin production is quantified and compared to baseline. Optimal conditions for each strain are identified.

6.5 Phase Five: Functional Validation

Melanin from top producers under optimal conditions is purified and tested for radioprotective activity.

Purification methods are standardized. Cell disruption, solvent extraction, and precipitation steps are optimized for yield and purity.

Purified melanin is characterized by UV-Vis spectroscopy, FTIR for chemical groups, and elemental analysis.

Radioprotection assays using reporter cells are performed under controlled exposure conditions. Protection factors are calculated.

The best performers advance to final characterization and archiving.

6.6 Timeline

Phase One is completed in months one through three. Hardware setup, protocol development, and validation occur during this period.

Phase Two occupies months four and five. Strain acquisition, verification, and library assembly proceed in parallel with ongoing protocol refinement.

Phase Three runs from months six through eight. Primary screening of the assembled library generates initial hit lists.

Phase Four extends from months nine through twelve. Hit strains are optimized through systematic condition testing.

Phase Five completes the project in months thirteen through fifteen. Functional validation identifies the most promising candidates for application development.

7. RESOURCE REQUIREMENTS

7.1 Equipment

The core equipment requirements are well-defined.

An Opentrons OT-2 robot with associated accessories including pipettes, tip racks, and deck modules. The high-throughput configuration with temperature control is preferred.

A plate reader with UV-Vis absorbance capability. Monochromator-based instruments offer flexibility for spectral scanning. Filter-based instruments require appropriate filter sets for melanin quantification.

Incubation and shaking capacity for multiple plates. Stacked incubators maximize throughput within limited footprint.

Centrifugation capability for processing samples. A plate centrifuge enables in-plate processing without transfer steps.

Standard microbiology equipment including biosafety cabinets, incubators, and freezers for strain maintenance.

7.2 Consumables

Consumable requirements include 96-well plates appropriate for microbial culture, gas-permeable seals for long-term incubation, deep-well plates for reagent storage and mixing, and pipette tips in bulk quantities.

Media components including carbon sources, nitrogen sources, salts, vitamins, and specialized precursors. Melanin standards for reference.

Reagents for melanin extraction and purification including solvents, acids, and bases.

Assay reagents for functional testing including reporter strain media, viability indicators, and control compounds.

7.3 Strains and Biological Materials

The strain collection requires acquisition from multiple sources.

ATCC strains are purchased with appropriate licenses. Expected cost is several thousand dollars for a diverse collection.

Environmental isolates require isolation and characterization effort. Collaboration with local microbiology groups can accelerate this process.

Control strains with documented melanin production are essential for assay validation.

7.4 Computing Resources

Data management requires structured storage for experimental results. Cloud-based solutions enable remote access and collaboration.

Analysis pipelines process plate reader data, calculate production metrics, and identify hits. Python scripts handle routine processing.

Protocol development uses Opentrons software and version control. Protocols are documented for reproducibility.

7.5 Collaboration and Expertise

Successful execution requires diverse expertise.

Microbiology expertise for strain handling and verification. Local collaborators can provide guidance.

Automation expertise for Opentrons programming and troubleshooting. Online communities offer support.

Data analysis expertise for interpreting screening results. Statistical methods identify meaningful differences amid experimental variation.

Radiation biology expertise for designing and interpreting protection assays. Collaboration with medical physics or radiation safety groups enables proper exposure experiments.

8. RISK ASSESSMENT AND MITIGATION

8.1 Technical Risks

The platform may fail to detect melanin production in some strains. Pigment may be cell-associated rather than secreted, requiring different measurement approaches. Alternative protocols using whole-cell measurements address this risk.

Contamination may compromise long-term experiments. Strict aseptic technique, antibiotic supplementation where appropriate, and frequent monitoring mitigate this risk.

Strains may not grow under standardized conditions. Flexible protocols accommodate different growth requirements through conditional media formulation.

Radiation exposure assays may be logistically difficult. Chemical mimetics provide a high-throughput alternative while collaboration for true radiation exposure is established.

8.2 Biological Risks

Some melanin-producing strains are opportunistic pathogens. Risk assessment and biosafety level determination precede work with each strain. Appropriate containment and handling procedures are implemented.

Environmental isolates may include unknown organisms. Initial characterization identifies potential hazards before scale-up.

Genetic modification, if pursued later, requires additional biosafety consideration. The initial project avoids modification, focusing on natural diversity.

8.3 Timeline Risks

Strain acquisition may face delays. Multiple sources and backup suppliers are identified.

Equipment delivery and setup may take longer than anticipated. Parallel activities advance other project components during waiting periods.

Unexpected technical challenges in protocol development require troubleshooting time. Modular development allows some work to proceed while issues are resolved.

8.4 Mitigation Strategies

All major risks have mitigation plans.

For technical risks, pilot experiments with known producers validate each protocol module before full screening begins. Early detection of issues prevents wasted effort.

For biological risks, tiered containment approaches match handling procedures to risk level. Most strains require only basic biosafety precautions.

For timeline risks, critical path analysis identifies activities that cannot be delayed. Parallel work and flexible sequencing maintain progress.

9. CONCLUSIONS

This project designs an automated platform to discover microbial melanins with genuine radioprotective properties. By combining liquid handling robotics, spectrophotometric measurement, and functional radiation assays, the platform enables systematic exploration of microbial diversity that would be impossible manually.

The focus on function rather than organism ensures that the best producers are identified regardless of their biological classification. Bacteria, yeast, and other microbes are evaluated equally based on melanin production and radioprotective efficacy.

Applications span multiple industries. Space exploration needs lightweight radiation shielding. Nuclear facilities require durable protective materials. Medical applications demand biocompatible radioprotectors. Consumer products benefit from natural, sustainable ingredients.

The platform is designed for remote execution, making it accessible to researchers without local wet-lab infrastructure. Cloud laboratory deployment through Ginkgo Bioworks enables the project to proceed regardless of physical location.

The expected outcomes include validated protocols, identified high-performance strains, and characterized melanin samples ready for application development. Each outcome contributes to the ultimate goal: harnessing microbial melanins to protect against radiation across the contexts where it matters most.

Group Final Project


Computational Engineering of the MS2 Lysis Protein to Improve Stability, Titers, and Toxicity

After reviewing the provided literature on the MS2 lysis protein (L) and discussing the project aims, our group has decided to focus on three interconnected goals:


Goal 1: Increase the stability of the L protein

As the “easiest” goal, it is the most computationally tractable. A stabilized protein is less prone to degradation and misfolding, which could directly lead to higher functional titers and serve as a robust starting point for any subsequent engineering.


Goal 2: Increase bacteriophage titers through improved lysis efficiency.

Phage therapy relies on high phage titers for effective bacterial killing and scalable manufacturing, but phage production can be limited by inefficient lysis or poor coordination between phage replication and host destruction. Improving the efficiency and timing of host cell lysis can therefore directly increase the number of phage particles released per infected cell.


The MS2 L protein is a small, 75-amino-acid membrane protein that triggers bacterial lysis and is essential for the release of new phage particles. The paper Mutational analysis of the MS2 lysis protein L describes how MS2 L functions as a single-gene lysis protein that disrupts bacterial cell envelope integrity without classical enzymatic activity. L also interacts with the host chaperone DnaJ, which modulates its activity and the timing of lysis. MS2 Lysis of Escherichia coli Depends on Host Chaperone DnaJ shows that lysis timing strongly affects the number of virions produced before the host cell bursts, so engineering improved L variants may increase overall phage titers.


Goal 3: Increase the toxicity of the lysis protein.

This proposal addresses the subproblem of increasing the toxicity of the L lysis protein from Bacteriophage MS2. Instead of random mutagenesis, toxicity will be approached as a multi-factor optimization problem involving structural stability, membrane insertion, oligomerization efficiency, and expression kinetics in Escherichia coli. The objective is to design L variants that enhance membrane disruption while maintaining proper folding and stability.


E. coli chaperone DnaJ.

Additionally, we will explore disrupting the interaction between the L protein and the E. coli chaperone DnaJ.

The reading “Identification MS2 lysis protein dependency on DnaJ” establishes this interaction as critical for function. By computationally predicting and then disrupting this interface, we can test its necessity and potentially create a DnaJ-independent lysis mechanism, offering a new avenue for controlling lysis timing.

Together, these three goals form a coherent strategy: stabilizing the L protein may improve its folding and expression, which can increase functional titers, while further engineering of membrane disruption and host interactions may increase toxicity and lysis efficiency.


Proposed Computational Tools and Approaches

We will build a computational pipeline using the tools introduced in recitation and the provided resources. The key steps and tools are:

Step 1: Structural Modeling of the L Protein

Tool: AlphaFold2 (via ColabFold for ease of use).

Why: No high-resolution experimental structure of the full-length MS2 L protein exists. A reliable 3D model is the absolute foundation for all downstream analysis, allowing us to visualize which parts are structured vs. disordered.

Step 2: Modeling the L-DnaJ Complex

Tool: AlphaFold-Multimer.

Why: To disrupt the interaction, we first need to know where it occurs. AlphaFold-Multimer is the current state-of-the-art for predicting protein-protein complexes and will generate a testable model of the L protein bound to E. coli DnaJ.

Step 3: In Silico Mutagenesis for Stability

Tool: Rosetta (or FoldX), specifically Rosetta's ddg_monomer application for predicting changes in folding free energy (ΔΔG).

Why: These tools are parameterized using vast amounts of experimental data on protein stability. They can systematically mutate each residue in our L protein model and predict whether the change (e.g., A->V) makes the protein more stable (negative ΔΔG) or less stable (positive ΔΔG).
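The scan described above can be sketched in a few lines. This is a minimal illustration and not the Rosetta or FoldX interface: `enumerate_point_mutants` and `keep_stabilizing` are hypothetical helper names, the ΔΔG values are placeholders to be replaced by the output of whichever tool is run, and the -0.5 kcal/mol cutoff is an arbitrary assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def enumerate_point_mutants(sequence):
    """Yield (label, mutant_sequence) for every single substitution."""
    for index, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue
            label = f"{wild_type}{index + 1}{residue}"  # e.g. A14V
            yield label, sequence[:index] + residue + sequence[index + 1:]


def keep_stabilizing(ddg_by_label, cutoff=-0.5):
    """Return labels with predicted ddG at or below the cutoff, most stabilizing first.

    Uses the sign convention from the text: negative ddG means more stable.
    """
    kept = [label for label, ddg in ddg_by_label.items() if ddg <= cutoff]
    return sorted(kept, key=lambda label: ddg_by_label[label])


# Placeholder values for illustration only; real numbers come from the ddG tool.
example_ddg = {"A14V": -1.2, "S9Q": 0.8, "T12I": -0.6, "P13G": 2.5}
stabilizing = keep_stabilizing(example_ddg)
```

For a 75-residue protein this enumeration produces 75 × 19 = 1,425 single substitutions, which is small enough to score exhaustively.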

Step 4: Visualizing and Selecting Interface Mutations

Tool: PyMOL and the HTGAA Protein Engineering Tools spreadsheet.

Why: We will use PyMOL to visually inspect the predicted L-DnaJ complex from Step 2 and select residues at the interface. We will then use the spreadsheet to check the conservation of those residues and manually design mutations (e.g., swapping a large hydrophobic residue for a charged one) predicted to break the interaction.
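The visual inspection can be complemented by a simple distance check. The sketch below flags residues of one chain whose C-alpha atom lies within a cutoff of any C-alpha atom in the other chain. The function name, the 8 Å cutoff, and the coordinates are all assumptions made for illustration; real coordinates would be read from the predicted complex.

```python
import math


def interface_residues(chain_a, chain_b, cutoff=8.0):
    """Return residue numbers of chain_a with a C-alpha within `cutoff` angstroms of chain_b.

    Each chain is a dict mapping residue number -> (x, y, z) C-alpha coordinates.
    """
    hits = []
    for number, coord_a in sorted(chain_a.items()):
        if any(math.dist(coord_a, coord_b) <= cutoff for coord_b in chain_b.values()):
            hits.append(number)
    return hits


# Made-up coordinates for illustration; not taken from any real model.
l_protein = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0), 12: (30.0, 0.0, 0.0)}
dnaj = {200: (0.0, 6.0, 0.0), 201: (3.8, 9.0, 0.0)}
contacts = interface_residues(l_protein, dnaj)
```

Residues returned by such a check are candidates to examine in PyMOL, not confirmed contacts, since the result depends on both the cutoff and the quality of the predicted complex.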


Protein Language Models (PLMs)

Protein language models such as ESM or ProtBERT will be used to perform in silico mutagenesis on the MS2 L protein sequence. These models can suggest mutations that preserve structural and functional constraints learned from large protein datasets.

This approach allows us to generate multiple candidate mutations across the L protein, avoid mutations likely to disrupt folding, and explore sequence space beyond naturally occurring variants.
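One common way to turn a masked language model's output into a mutation score is a log-likelihood ratio: the log probability of the mutant residue minus that of the wild-type residue at the masked position. The sketch below shows only this arithmetic. It does not call ESM or ProtBERT, the function names are hypothetical, and the probabilities are invented for illustration.

```python
import math


def log_likelihood_ratio(position_probs, wild_type, mutant):
    """Score a substitution as log p(mutant) - log p(wild type) at one position."""
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


def rank_substitutions(position_probs, wild_type):
    """Rank every alternative residue at one position, most favored first."""
    scores = {
        residue: log_likelihood_ratio(position_probs, wild_type, residue)
        for residue in position_probs
        if residue != wild_type
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented probabilities for a single masked position (not real model output).
probs_at_position = {"S": 0.40, "Q": 0.25, "T": 0.20, "P": 0.15}
ranking = rank_substitutions(probs_at_position, wild_type="S")
```

A score near or above zero suggests the model considers the substitution about as plausible as the wild-type residue, while strongly negative scores point to positions the model treats as constrained.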


AlphaFold Structure Prediction

Each candidate L variant will be analyzed using AlphaFold to predict protein structure and membrane topology. Since the C-terminal transmembrane region is essential for lytic activity, structural prediction will help identify mutations that preserve this functional domain.

Structural predictions will also help identify:

  • misfolded variants
  • mutations that destabilize the transmembrane region
  • variants that may alter oligomerization or membrane insertion
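A quick screen for poorly predicted variants is to compare per-residue confidence. AlphaFold-style PDB files store pLDDT in the B-factor column, so the sketch below averages that column over C-alpha atoms, optionally within a residue range. The function name is hypothetical, and the residue range to use for the transmembrane region is left to the caller because it is not defined here.

```python
def mean_plddt(pdb_text, start=None, end=None):
    """Average the B-factor column over C-alpha atoms, optionally within a residue range.

    Relies on fixed PDB columns: atom name 13-16, residue number 23-26, B-factor 61-66.
    """
    values = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        residue_number = int(line[22:26])
        if start is not None and residue_number < start:
            continue
        if end is not None and residue_number > end:
            continue
        values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha atoms found in the requested range")
    return sum(values) / len(values)
```

A variant whose mean pLDDT over the C-terminal region drops well below the wild-type value would be a candidate for closer inspection. Low pLDDT can also reflect genuine disorder, so it is a flag rather than proof of misfolding.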

Interaction Modeling with Host Proteins

Because MS2 L interacts with the DnaJ chaperone, which affects lysis timing, candidate variants can be evaluated using AlphaFold-Multimer to predict changes in the L–DnaJ interaction.

This could help identify variants that:

  • maintain necessary folding assistance
  • reduce excessive dependency on host chaperones
  • improve robustness of lysis across physiological conditions

Proposed Computational Strategy

First, protein language models (e.g., ESM-2, ProtT5) will be used to perform directed in silico mutagenesis. These models capture evolutionary constraints and residue interactions, enabling the generation of structurally plausible variants while identifying mutation-tolerant and functionally critical positions. This step efficiently reduces the combinatorial search space.

Second, predicted variants will be structurally evaluated using AlphaFold2 for monomer folding and AlphaFold-Multimer to assess oligomerization and interaction with host factors such as DnaJ.

Third, membrane compatibility will be analyzed using membrane-aware modeling (RosettaMP) and selected molecular dynamics simulations.

Fourth, ΔΔG prediction tools (e.g., FoldX, Rosetta energy functions) will filter out destabilizing mutations.

In parallel, codon optimization algorithms will redesign selected variants for improved expression in E. coli, as toxicity depends on both structure and intracellular concentration.
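The simplest form of codon optimization is back-translation with one preferred codon per amino acid, sketched below. The codon table is an illustrative assumption (codons often listed as frequent in E. coli) and should be checked against a current codon usage table. Real optimization tools also weigh factors such as mRNA structure and repeats, which this sketch ignores.

```python
# Illustrative "one preferred codon per residue" table for E. coli.
# Assumption: verify these choices against a current codon usage table before use.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein, stop_codon="TAA"):
    """Return a DNA coding sequence using one fixed codon per amino acid."""
    protein = protein.upper()
    unknown = set(protein) - set(PREFERRED_CODON)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return "".join(PREFERRED_CODON[residue] for residue in protein) + stop_codon


# First eight residues of the L protein sequences listed in this document.
coding_sequence = back_translate("METRFPQQ")
```

Because the input is upper-cased first, variant sequences written with lowercase mutation markers can be passed in directly.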


Potential Pitfalls

Pitfall 1: Dynamic Regions and Model Quality

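The L protein is small and likely has flexible or disordered regions, especially in its N-terminal domain. Structure predictors tend to assign low confidence to such regions, so models of them should be interpreted with caution.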

Pitfall 2: Stability vs. Function Trade-off

A mutation that makes the protein more stable in its monomeric state might prevent it from undergoing the necessary conformational changes to oligomerize and form a pore in the membrane.

Pitfall 3: Lack of Membrane Context

Our stability predictions (Rosetta) are performed in a virtual “aqueous” environment and do not account for the energetic complexity of the lipid bilayer.

Limited biological data: Structural and mechanistic knowledge about MS2 L is still limited, and quantitative datasets linking specific mutations to measured lysis kinetics are scarce.

Cellular context not captured computationally: Protein modeling tools may not fully capture the membrane environment.


L-Protein Mutants

To generate the first two mutants of the bacteriophage MS2 L protein, which carry substitutions in the transmembrane region, I selected the top candidates predicted by the Python models and the spreadsheet analysis for that region. I applied the same approach to the soluble region for the next two mutants, ensuring that all substitutions were introduced at amino acid positions that tolerate mutation more readily.

METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSSTLlVLIFLAIFLSlFTlQLLLSLLEAVIRTVTTLQQLLT

METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSSTLheLnlvpnFLleFTNQLhLSLLEAeIRTVTTLQQLLT

METRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

lEiRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

For the final mutant, which was the most aggressive, I introduced mutations in both regions at all of the candidate positions.

lEiRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLleLnlvpnFLleFTlQLhLSLLEAeIRTVTTLQQLLT
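To make the five sequences easier to compare, the substitutions can be listed in standard notation (for example Y39L). The sketch below assumes that lowercase letters mark mutated residues, and it rebuilds the parent sequence from the positions that remain uppercase in at least one variant. The variant labels and function names are hypothetical and only serve this example.

```python
# Sequences copied from this page; labels such as "TM-1" are made up for the example.
VARIANTS = {
    "TM-1": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSSTLlVLIFLAIFLSlFTlQLLLSLLEAVIRTVTTLQQLLT",
    "TM-2": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSSTLheLnlvpnFLleFTNQLhLSLLEAeIRTVTTLQQLLT",
    "SOL-1": "METRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "SOL-2": "lEiRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "COMBINED": "lEiRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLleLnlvpnFLleFTlQLhLSLLEAeIRTVTTLQQLLT",
}


def infer_wild_type(variants):
    """Rebuild the parent sequence from positions left in uppercase (assumed unmutated)."""
    length = len(next(iter(variants.values())))
    wild_type = []
    for i in range(length):
        residues = {seq[i] for seq in variants.values() if seq[i].isupper()}
        if len(residues) != 1:
            raise ValueError(f"position {i + 1} is ambiguous or mutated in every variant")
        wild_type.append(residues.pop())
    return "".join(wild_type)


def list_substitutions(wild_type, variant):
    """Return substitutions such as 'Y39L' (parent residue, 1-based position, new residue)."""
    return [
        f"{parent}{position}{residue.upper()}"
        for position, (parent, residue) in enumerate(zip(wild_type, variant), start=1)
        if parent != residue.upper()
    ]


wild_type = infer_wild_type(VARIANTS)
substitutions = {name: list_substitutions(wild_type, seq) for name, seq in VARIANTS.items()}
```

Under these assumptions the first transmembrane mutant carries Y39L, K50L, and N53L, and the most aggressive variant carries 17 substitutions in total.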