Individual Final Project
Here are my three individual final project ideas!
An interactive 3D bio-art sculpture where human touch meets a living bioluminescent bloom.
A river-sensing automated robot system that advances research while serving as a tourist spot where the community can gather and spend time together.
Chlorella vulgaris in silico optimization and automation for the optimal accumulation of polyhydroxybutyrate (PHB).
After some consideration, I decided to pursue my third idea for my individual final project. A more detailed view of it follows:
SECTION 1: ABSTRACT
Provide a concise, self-contained summary of your project (minimum 150 words).
The abstract should allow a reader to understand the purpose, approach, and expected outcomes of the work without referring to other sections.
Your abstract should briefly address the following elements:
Significance: What problem or question does the project address, and why is it important?
The accumulation of petroleum-derived plastics has caused lasting ecological damage, particularly in marine ecosystems such as those along the coast of Peru. Polyhydroxybutyrate (PHB) offers a biodegradable and biocompatible alternative, but its industrial scale-up is hindered by low yields and the high cost of optimizing metabolic pathways. Microalgae such as Chlorella vulgaris are attractive “chassis” organisms because they fix CO2, yet current optimization methods rely on slow, manual trial and error, leaving a significant gap between laboratory research and industrial application.
Broad Objective: What is the overall goal of the project?
The overall goal of the project is to develop a high-throughput, closed-loop pipeline that integrates computational metabolic modeling with robotic automation. The pipeline should rapidly identify and implement the “metabolic stress” environments that shift Chlorella vulgaris from standard vegetative growth to bioplastic accumulation, making the production of sustainable materials more predictable and efficient.
Hypothesis: What prediction or principle is the project testing or demonstrating?
I hypothesize that Flux Balance Analysis (FBA) can identify the metabolic “tipping points” at which nutrient limitation (e.g., of nitrogen or phosphorus) forces a shift in carbon flux. By using these models to predict the nutrient concentrations that restrict biomass production without killing the cell, we can design “stress recipes” that maximize the partitioning of carbon into the PHB biosynthetic pathway (encoded by phaA, phaB, and phaC).
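To make the idea of a tipping point concrete, here is a minimal sketch. It is a toy two-sink carbon balance, not a genome-scale FBA: the uptake values and the carbon and nitrogen costs per unit of biomass are hypothetical numbers chosen only for illustration. Growth is maximized first, and any carbon that growth cannot use is routed to PHB, which is the closed-form optimum of this tiny two-constraint problem.

```python
def partition_carbon(carbon_uptake, nitrogen_uptake,
                     carbon_per_biomass=10.0,   # hypothetical cost
                     nitrogen_per_biomass=1.0,  # hypothetical cost
                     carbon_per_phb=4.0):       # 4 carbons per 3-hydroxybutyrate unit
    """Return (biomass_flux, phb_flux) for a toy network with two carbon sinks."""
    # Growth is capped by whichever nutrient runs out first.
    biomass = min(carbon_uptake / carbon_per_biomass,
                  nitrogen_uptake / nitrogen_per_biomass)
    # Carbon that growth cannot consume is diverted to the polymer.
    spare_carbon = carbon_uptake - biomass * carbon_per_biomass
    return biomass, spare_carbon / carbon_per_phb


def sweep_nitrogen(carbon_uptake, nitrogen_levels):
    """Evaluate the partition at several nitrogen uptake bounds."""
    return [(n, *partition_carbon(carbon_uptake, n)) for n in nitrogen_levels]


if __name__ == "__main__":
    for n, biomass, phb in sweep_nitrogen(100.0, [12.0, 10.0, 8.0, 4.0]):
        print(f"N uptake {n:5.1f} -> biomass {biomass:5.1f}, PHB {phb:5.1f}")
```

With these placeholder numbers, PHB flux stays at zero until nitrogen uptake drops below 10 and rises as nitrogen becomes more limiting. A real model would replace the closed form with a linear program over the full stoichiometric matrix.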
Specific Aims: What key steps or milestones will be completed to achieve the objective?
The project covers the following steps:
Step 1: In Silico Modeling
Reconstruct and simulate a genome-scale metabolic model of C. vulgaris using COBRApy to identify optimal flux distributions for PHB precursors (Acetyl-CoA). This involves mapping the metabolic shifts during the transition from biomass growth to polymer accumulation.
Step 2: Automated Protocol Design
Develop and simulate Python-based automation protocols for the Opentrons OT-2 to execute parallelized micro-cultivations. The goal is to program the logic for complex “nutrient stress recipes” that can be deployed in a Biofoundry setting to validate model-driven growth and yield predictions.
Step 3: Visionary Scaling of the Process
Develop a theoretical framework for an AI-guided, modular photobioreactor system. This conceptual design focuses on the integration of real-time sensing data with metabolic models to enable decentralized bioplastic manufacturing in coastal communities, empowering them to utilize local biodiversity.
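The “nutrient stress recipes” in Step 2 come down to dilution arithmetic (C1·V1 = C2·V2) repeated over a grid of conditions. The sketch below computes per-well transfer volumes for a nitrogen by phosphorus grid on a 24-well plate. The stock concentrations, target levels, and 1000 µL well volume are placeholder values, and the function names are my own; the resulting list is the kind of table an OT-2 protocol could loop over.

```python
from itertools import product

ROWS, COLS = "ABCD", 6  # 24-well plate layout


def transfer_volume_ul(target_mm, stock_mm, final_volume_ul):
    """Stock volume that brings the well to target_mm (C1*V1 = C2*V2)."""
    if not 0 <= target_mm <= stock_mm:
        raise ValueError("target must lie between 0 and the stock concentration")
    return final_volume_ul * target_mm / stock_mm


def build_recipes(n_levels_mm, p_levels_mm, n_stock_mm, p_stock_mm,
                  final_volume_ul=1000.0):
    """One recipe per (nitrogen, phosphorus) pair, assigned to wells row by row."""
    conditions = list(product(n_levels_mm, p_levels_mm))
    if len(conditions) > len(ROWS) * COLS:
        raise ValueError("more conditions than wells on a 24-well plate")
    recipes = []
    for index, (n_mm, p_mm) in enumerate(conditions):
        n_ul = transfer_volume_ul(n_mm, n_stock_mm, final_volume_ul)
        p_ul = transfer_volume_ul(p_mm, p_stock_mm, final_volume_ul)
        base_ul = final_volume_ul - n_ul - p_ul  # nutrient-free base medium
        if base_ul < 0:
            raise ValueError("stock volumes exceed the well volume")
        recipes.append({
            "well": f"{ROWS[index // COLS]}{index % COLS + 1}",
            "n_stock_ul": n_ul,
            "p_stock_ul": p_ul,
            "base_medium_ul": base_ul,
        })
    return recipes


if __name__ == "__main__":
    for recipe in build_recipes([0.0, 1.0, 2.0], [0.0, 0.5],
                                n_stock_mm=20.0, p_stock_mm=10.0):
        print(recipe)
```

Keeping the arithmetic separate from the robot commands means the recipe table can be checked on a laptop before any liquid is moved.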
Methods: What experimental or technical approaches will be used?
The project follows a Design-Build-Test-Learn (DBTL) cycle, prioritizing computational engineering and standardized automation protocols:
Metabolic Modeling (Design): I will utilize COBRApy (Python) to perform Flux Balance Analysis (FBA) on a genome-scale metabolic model of Chlorella vulgaris. This involves defining a mathematical objective function for PHB production and applying environmental constraints (nitrogen, phosphorus, and carbon levels) to predict the metabolic states that maximize bioplastic precursors like acetyl-CoA.
Automation Protocol Development (Build): The predicted nutrient “recipes” will be translated into Python scripts for the Opentrons OT-2 robot. This ensures that the logic for combinatorial screening is high-precision and reproducible. By programming the liquid handling workflows, the project demonstrates how to scale the preparation of complex media that would be prone to human error if done manually.
Micro-scale Validation (Test, if resource availability permits): If laboratory access and resources allow, the programmed protocols will be executed in 24-well or 96-well plates. Growth dynamics would be monitored via optical density (OD) measurements at 680 nm, where chlorophyll absorbs, and at 750 nm, which tracks cell density with little pigment interference, providing the real-world data needed to refine the initial FBA models.
Genetic Construct Design (Future Build): Using Benchling, I will design a synthetic operon containing the phaA, phaB, and phaC genes. These sequences will be codon-optimized for Chlorella and placed under the control of nitrate-inducible promoters (e.g., NIT1). This genetic “switch” ensures that bioplastic synthesis only triggers when the system detects or creates a nitrogen-depleted environment, as predicted by the model.
System Architecture (Vision): For the visionary phase, the focus shifts to a theoretical system architecture for a modular photobioreactor. Instead of physical manufacturing, this involves defining the integration of low-cost sensors (pH, temperature, and light) with a closed-loop AI model to maintain the optimal metabolic conditions identified during the computational phase.
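For the micro-scale validation step in the list above, one simple way to turn OD750 readings into a number that can be compared with model-predicted growth is the specific growth rate, μ = ln(OD2/OD1)/(t2 − t1). The sketch below applies that formula to two time points. The readings are invented for illustration, and the calculation assumes the culture is in exponential phase between them.

```python
import math


def specific_growth_rate(od_start, od_end, hours):
    """Specific growth rate (per hour) between two OD readings."""
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("OD readings and elapsed time must be positive")
    return math.log(od_end / od_start) / hours


def doubling_time(mu):
    """Hours needed for the culture to double at growth rate mu."""
    if mu <= 0:
        raise ValueError("doubling time is only defined for positive growth")
    return math.log(2) / mu


if __name__ == "__main__":
    mu = specific_growth_rate(0.1, 0.4, 48)  # hypothetical OD750 readings
    print(f"mu = {mu:.4f} per hour, doubling time = {doubling_time(mu):.1f} h")
```

A stress recipe that works as intended should show a lower μ than the nutrient-replete control while the culture stays viable.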
The global crisis of plastic pollution demands a shift toward biodegradable materials like polyhydroxybutyrate (PHB) or polylactic acid (PLA). Microalgae such as Chlorella vulgaris are natural candidates for bio-factories, but the traditional trial-and-error process to optimize their growth conditions remains a major industrial bottleneck. This project addresses the challenge by developing an integrated pipeline that combines in silico metabolic modeling with standardized automation protocols. The broad objective is to maximize PHB yield from Peruvian coastal strains by treating biology as an engineering problem optimized through data-driven predictions. I hypothesize that constraint-based modeling via COBRApy can accurately predict which nutrient stress conditions trigger the highest carbon flux toward PHB synthesis.
To test this, my experimental approach involves using Flux Balance Analysis (FBA) to design specific culture recipes translated into Python-based automation protocols for the Opentrons OT-2. This demonstrates how metabolic logic can be scaled in a Biofoundry setting to ensure high-precision media preparation. If resource availability permits, these protocols will be validated through micro-scale cultivations to refine the model. Furthermore, the project includes the design of synthetic genetic constructs in Benchling to overexpress the PHB pathway under inducible control. Finally, the visionary aim proposes a theoretical architecture for AI-guided modular photobioreactors, focusing on decentralizing production to empower coastal communities by transforming local biodiversity into sustainable materials through a closed-loop integration of sensors and models.
SECTION 2: PROJECT AIMS
Define three aims for your final project (minimum one sentence per aim).
Aim 1: Experimental Aim (this project):
“The first aim of my final project is to [achievable experimental goal] by utilizing [protocols, tools, or strategies].”
This aim should describe the core experimental objective you will attempt during this class. List or link any relevant methods or resources you plan to use (e.g., experimental protocols, automation workflows, DNA or protein designs, protein design tools, or Twist orders).
You will provide a detailed step-by-step experimental plan for Aim 1 in the Experimental Design section of this assignment.
“The first aim of my final project is to identify the optimal metabolic conditions for PHB accumulation in Chlorella vulgaris by utilizing COBRApy for Flux Balance Analysis (FBA) and developing automated Python-based protocols for the Opentrons OT-2 to simulate high-throughput nutrient stress ‘recipes.’”
Aim 2: Development Aim:
Describe the next step that would follow a successful Aim 1, extending the work beyond the scope of this course. This aim should represent a realistic progression of the project, such as executing additional experiments, solving a technical limitation, or developing the system or technology further.
“The next step following the computational validation would be to experimentally implement the designed genetic constructs using a modular synthetic biology approach, overexpressing the PHB biosynthetic pathway under the control of the nitrate-inducible NIT1 promoter to evaluate real-time polymer accumulation.”
Aim 3: Visionary Aim:
Describe the long-term vision for the project. Explain how the broader concept could have an impact if fully realized.
Examples include:
Challenging an existing paradigm or clinical practice.
Addressing a major barrier in a field.
Enabling a new experimental capability or research approach.
“The long-term vision is to establish a decentralized framework for bioplastic production through AI-guided, closed-loop modular photobioreactors, enabling coastal communities to transform local microalgae biodiversity into sustainable, high-value compostable materials that replace petroleum-based plastics.”
SECTION 3: BACKGROUND
Background and Literature Context
Provide background research that explains the current state of knowledge and identifies the gap in knowledge or capability that your project addresses.
Briefly summarize two peer-reviewed research citations relevant to your research (minimum four sentences).
Explain how your project is novel or innovative. (Minimum 3 sentences.)
Examples of topics to discuss:
New applications or uses of existing biological tools or concepts.
Development of new approaches, methodologies, or technologies.
Ways the project challenges existing paradigms or assumptions.
How the work expands the boundaries of synthetic biology.
Explain why your project matters and what impact it could have. (Minimum 5 sentences.)
Examples of topics to discuss:
The problem addressed: What pressing real-world problem does your project attempt to solve?
Importance of the problem: Why is this problem significant, or what critical barrier to progress in the field does it represent?
Broader societal contribution: How could the outcomes of your project benefit society beyond the immediate research context?
Advancement of knowledge or capability: How might the project improve scientific understanding, technical capability, or clinical practice within one or more fields?
Field-level change: If your aims are achieved, how could the concepts, methods, technologies, treatments, services, or preventative approaches used in this field of research change?
Describe the ethical implications associated with your project and identify relevant ethical principles (e.g., non-maleficence, beneficence, justice, or responsibility). (Minimum 2 paragraphs.)
First paragraph: Describe the ethical implications involved in your project. Try to suggest the ethical principle(s) you may apply (e.g., non-maleficence, justice).
Second paragraph: Describe the measures that should be taken to ensure that your project is ethical (both in how the research is conducted and in its broader implications for society). You may wish to answer the following questions:
What action(s) do you propose?
What are potential unintended consequences of your proposed actions?
What could you have gotten wrong (e.g., incorrect assumptions and uncertainties)?
What are alternatives to your proposed actions?
Note: in an NIH proposal, an ethics statement is used to describe the relevance of this research to public health
SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY
Use Claude AI skills to refine your HTGAA final project experimental design here
Create a detailed experimental plan for your final project. Include a timeline for each part of your experimental plan (i.e., how long you would expect each step in your final project to take). (min. 15 lines/sentences—a numbered list is acceptable)
Include specific methods/tools/technologies/biological concepts for each part of the final project and analysis
This section will be used to determine whether the experiments are well designed, feasible, and likely to succeed in testing your hypothesis
Often this section is broken into discrete tasks/sub-aims
For each experiment and/or analysis, include a description of your expected results
If possible, include figure(s) that visually shows a broad workflow of your project or a specific aspect of your experimental plan
Reminder: All HTGAA projects must include some DNA design! Make sure this form is submitted.
We discussed and practiced various techniques related to synthetic biology throughout the semester. Place a check next to the techniques relevant to your project.
Pipetting
Pipetting
Lab Safety
Bioethical Considerations (must check this box)
DNA Gel Art
DNA Sequencing
DNA Editing
DNA Construct Design
Restriction Enzyme Digestion
Gel Electrophoresis
DNA Purification From Gel
Databases (e.g., GenBank, NCBI, Ensembl, and UCSC Genome Browser)
Lab Automation
Creating Code for Laboratory Automation
Using Liquid Handling Robots (e.g., Opentrons)
Designing a Twist Order
Creating a plan to use the Autonomous lab at Ginkgo Bioworks
Protein Design
Protein Design
Use of Boltz or PepMLM
Use of Asimov Kernel
Use of Benchling
Models and Notebooks
Databases
Bioproduction
Bioproduction
Chassis Selection (e.g., DH5alpha)
Registry of Standard Biological Parts
Plasmid Preparation
Bacterial Culturing
Quality Control/Analysis
Bacterial Processing (e.g., Centrifugation, Lysis, DNA Purification)
Cell-Free Systems
Cell Free Reactions
Freeze-Dried Cell Free Systems
miniPCR Tools
Protein Purification
Gibson Assembly
Primer Design or Selection
PCR Reactions
Gibson Assembly
Other Cloning Methods (e.g., Restriction Enzyme Digestion or Gateway Cloning)
CRISPR
CRISPR/Cas9
Designing Prime Editing gRNA
Expand upon two techniques you checked in the previous question by describing how you would utilize those techniques in your final project. (min. 4 sentences)
Identify any How To Grow (Almost) Anything Industry Council companies which are associated with your final project (optional)
You are required to validate at least one aspect of your final project aims. This is to ensure that you are able to successfully apply a relevant synthetic biology technique to your project. Include figures if you have them—accuracy is critical in figures, tables, and graphs
Here is a non-exhaustive list of acceptable validations:
Designing DNA relevant to your final project
Performing a PCR reaction using primers relevant to your final project
Performing a Gibson assembly relevant to your final project
Creating and performing a cell-free assay related to your final project
Creating and running code to validate an aspect of your final project
Developing a model or completing a computational analysis relevant to your project
Designing DNA construct(s) that can express at least one gene of interest, ordering it (via Twist), and testing of the expression of the construct(s) (potentially using an Opentrons robot)
What aspect of your final project did you choose to validate? (min. 2 sentences)
Write down a detailed protocol of how you validated this aspect of your final project. (Numbered list or paragraph is fine)
What synthetic biology techniques did you utilize in validating this aspect of your final project? You can refer to the list of techniques in question 8. (min. 4 sentences)
You must present data as part of your final project and include some analysis of that data. The data may be collected experimentally in the lab or generated as simulated data (e.g., using the Asimov Kernel or another simulation method). (min. 2 sentences)
Did you encounter any unexpected challenge(s) when performing your validation? If so, describe the challenge(s) and strategies to overcome it. If not, discuss potential problems, difficulties, limitations, and/or alternative strategies to overcome challenges in your final project. (min. 4 sentences).
SECTION 6: ADDITIONAL INFORMATION
List all references cited in this assignment (bullet-point list)
Create a supply list and budget for your project (bullet-point list)
What supplies, equipment, and budget is needed for your project to work?
Group Final Project
Bacteriophage Engineering: L-Protein Optimization
1. Hypothesis: Engineering Lysis Protein Stability
Our core hypothesis is that the thermodynamic stability and lytic efficiency of the MS2 L-protein can be enhanced through two strategic pathways:
Structural Reinforcement: Introducing targeted mutations that promote independent folding or stabilize the 7-helix bundle, reducing dependence on the host chaperone DnaJ.
Generative Optimization: Utilizing evolutionary conservation data and generative protein design to create variants with improved membrane-insertion kinetics and host compatibility, thereby minimizing host-mediated resistance.
2. Specific Aims and Validation Pipeline
Aim 1: Mutation Design via Conservation and Predictive Modeling
We will perform Clustal Omega alignments of homologous lysis proteins to identify conserved residues (specifically the “HEDYPCRRQQRSST” island). This is followed by:
In silico Mutational Scanning: Using ESM-2 embeddings and LLR scores to nominate stabilizing mutations.
Folding Assessment: Validation of fold accuracy via ESMFold and AlphaFold-Multimer to ensure independent folding propensity and multimeric pore symmetry.
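As a rough illustration of how conserved positions can be flagged once an alignment exists, the sketch below scores each alignment column by the frequency of its most common residue. The four aligned strings are made-up toy sequences, not real Clustal Omega output, and the threshold of 1.0 (fully conserved) is an arbitrary choice.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Fraction of sequences sharing the most common residue in each column."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned_seqs):
        residues = [res for res in column if res != "-"]  # gaps never count as matches
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(aligned_seqs, threshold=1.0):
    """1-based alignment columns whose conservation meets the threshold."""
    scores = column_conservation(aligned_seqs)
    return [i + 1 for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    toy_alignment = ["HEDYPC", "HEDYAC", "HEDFPC", "HED-PC"]  # hypothetical
    print(column_conservation(toy_alignment))
    print(conserved_positions(toy_alignment))
```

Runs of consecutive conserved columns are what would show up as an “island” in the real alignment.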
Aim 2: Generative Design for Chaperone Independence
We propose using generative models (like ProteinMPNN or RFdiffusion) to optimize the soluble N-terminal domain. The goal is to redesign the interface to either:
Enhance co-folding with DnaJ under controlled structural constraints.
Enable “folding rescue” by alternative or orthogonal chaperones (e.g., DnaK or GroEL) to bypass host adaptation.
Aim 3: Evolutionary Analysis and Host Factor Integration
Using BLASTp to survey orthologs, we will reconstruct evolutionary trajectories of stability. Candidates will be screened against E. coli host factors to minimize proteotoxicity while maximizing the “aggressive” lytic potential identified through Genomic Language Models (GLMs).
3. Computational Tools and Workflow
Our design-build-test-learn (DBTL) framework utilizes the following stack:
Sequence & Conservation: Clustal Omega and BLASTp for “evolutionary grammar” analysis.
Mutational Analysis: ESM-2 (LLR scores) for mutational scanning, with ESMFold for high-speed structural feedback on the 7-helix bundle.
Generative Design: ProteinMPNN for sequence design on a fixed backbone.
Structural Validation: AlphaFold 3 and AlphaFold-Multimer to ensure biophysical plausibility of the 8-chain pore assembly.
4. Potential Pitfalls
Contextual Gap: A lack of specific data regarding the host bacteria’s in vivo environment may lead to unexpected results despite positive simulations.
Functional Trade-offs: Mutations that improve structural stability might inadvertently perturb the membrane-interaction properties or the native lytic activity, leading to a loss of function.
Misfolding Risk: Compact lysis proteins are highly sensitive; even high-confidence predicted folds (pLDDT > 80) may aggregate or fail to insert into the membrane in a real biological system.
5. Expected Outcomes
If successful, this framework will yield L-protein variants with:
Increased Stability: Robust functionality under diverse environmental conditions.
Reduced Host Dependency: Decreased reliance on native DnaJ interactions, making the phage less vulnerable to host-dependent failure modes.
Optimized Lysis: Retention of a mature fold compatible with aggressive lytic activity, establishing a generalizable template for synthetic antimicrobial modules.
The study explains that the MS2 phage L protein is a 75-amino-acid polypeptide that kills bacteria through a distinct mechanism. Unlike other single-gene lysis proteins such as E or A2, which block cell wall synthesis, the L protein does not affect peptidoglycan production. Using a genetic screen with a lacZ reporter to filter out false positives, the researchers found that L depends on the host chaperone DnaJ to function. Notably, a specific DnaJ mutation, P330Q, completely blocks lysis at 30°C. Pulldown assays confirmed that DnaJ physically binds the N-terminal “head” of the protein, which is rich in basic residues and is dispensable for killing, serving instead as a control unit.
The final model proposes that this N-terminal domain of L acts as a biological brake that auto-inhibits the protein. The DnaJ chaperone acts like a key that releases this brake, allowing the hydrophobic tail of the protein to reach its target inside the cell. This model is supported by Lodj mutants, which are versions of the L protein lacking the head. These mutants do not need DnaJ and kill the bacteria 20 minutes faster than the wild type. The system mirrors what happens with the E protein and its chaperone SlyD, suggesting that phages evolved these charged domains as a strategy to control lysis timing and give the virus enough time to replicate before destroying the host.
The MS2 lysis protein (L) is a 75-amino acid polypeptide that triggers bacterial cell death without disrupting net peptidoglycan synthesis. Research reveals a conserved LS (Leu-Ser) dipeptide motif at residues Leu48-Ser49, which serves as the essential core for protein-protein interactions. While the N-terminal half of the protein is dispensable for lytic activity, the C-terminal domain is critical; specifically, the S49C mutation in the LS motif causes an absolute lysis defect. This motif is highly conserved across diverse phages, indicating it is a universal structural requirement for the lytic function in amurins.
The study suggests that the L protein interacts with a host membrane target through the LS motif and surrounding essential domains. The N-terminus functions as a regulatory domain that naturally inhibits this interaction, while the host chaperone DnaJ binds to the N-terminus to displace it from its inhibitory position. Interestingly, deleting the basic N-terminal domain allows the protein to bypass the need for DnaJ entirely. This confirms that the N-terminus acts as a regulatory gatekeeper, and DnaJ is the key that unlocks the protein’s ability to engage its cellular target.
The MS2 bacteriophage lysis protein (MS2L) facilitates host cell escape by punching holes in the bacterial wall through a dual-domain mechanism. It consists of a soluble HEAD domain and a transmembrane TAIL domain that anchors into membranes, behaving similarly to soap or micelles. A key finding is that the TAIL domain drives oligomerization, causing 10 or more proteins to clump into large complexes. CryoEM data confirms these clusters gather at specific spots to trigger a sequential rupture: first the outer membrane breaks, followed by the peptidoglycan layer, and finally the inner membrane, causing the cell contents to leak out.
The researchers identified the HEAD domain as a biological brake that regulates the timing of lysis. While the full MS2L protein is difficult to insert into membranes, removing the HEAD allows for relatively easy insertion, suggesting it functions as a timer to prevent premature cell death. Additionally, the helper protein DnaJ binds to MS2L but does not influence its membrane entry or oligomerization. From an engineering perspective, removing the HEAD domain could bypass this brake to achieve a “quicker kill,” a strategic goal for optimizing lytic toxicity in synthetic biology.
This paper explains how phages have evolved from a biological curiosity into a sophisticated therapeutic tool by focusing on their life cycles and resistance mechanisms. The review highlights that success in therapy depends on more than just injecting phages; it requires a deep understanding of pharmacokinetics and the patient’s immune response, as the body might neutralize the viruses before they reach the infection site. A key advancement mentioned is the use of genetic engineering to create “designer” phages that do more than just kill bacteria, such as degrading biofilms or working alongside traditional antibiotics to restore drug sensitivity. The future of the field points toward precision medicine where specific phages are selected or edited for each patient to overcome the regulatory and technical barriers that previously limited mass clinical use.
This text explores the historical evolution and the modern resurgence of phage therapy in response to the global antibiotic resistance crisis. It begins by reminding us that phages were used long before penicillin but were largely forgotten in the West due to a lack of standardized protocols and the convenience of broad-spectrum antibiotics. Currently, we are in a stage of “compassionate use” where phages are successfully applied in desperate cases of multi-drug resistant infections, which is driving new controlled clinical trials. The study concludes that the biggest challenge today is not just biological but also logistical and legal, as a global infrastructure is needed to collect and characterize phage libraries that can be quickly deployed against emerging pathogens. This marks a shift from general treatments to a completely personalized paradigm.
This research utilizes the Evo 1 and Evo 2 DNA foundation models to design functional biological systems at the whole-genome scale. Using the phiX174 lytic phage as a chassis, the AI successfully generated 16 viable phages with substantial evolutionary novelty. Some variants were highly distant from common natural sequences, proving that genomic language models (GLMs) can expand the known biological space. This is critical for phage therapy, as these AI-designed variants demonstrated a superior ability to overcome bacterial resistance in E. coli strains where natural phages failed.
The computational method employed taxonomic prompting (e.g., Riboviria) to guide the generative process toward specific viral realms. Novelty was rigorously validated using nucleotide BLAST against core databases to confirm the emergence of original sequences. This strategy offers a robust framework for creating diverse phage cocktails, a key requirement for modern antimicrobial treatments. By leveraging taxonomic labels and pretraining, the study establishes a “design-build-test” workflow for engineering complex, multi-gene systems beyond the limits of natural evolution.
Review the Bacteriophage Final Project Goals for engineering the L Protein:
Increased stability (easiest)
Higher titers (medium)
Higher toxicity of lysis protein (hard)
Brainstorm Session
Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).
We will focus on increasing the structural stability of the L protein to ensure it remains functional under different environmental conditions.
We will also attempt to increase the toxicity of the lysis protein by optimizing its target regions to enhance bacterial cell wall disruption.
Write a 1-page proposal (bullet points or short paragraphs) describing:
Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).
We will use ESM-2 to perform in silico mutational scanning and identify target regions in the L protein, with ESMFold providing rapid structural checks.
We propose using Genomic Language Models (GLMs) to design and optimize sequences with higher lytic potential.
Finally, we will use AlphaFold-Multimer to validate the folding and stability of the engineered protein complexes.
Why do you think those tools might help solve your chosen sub-problem?
ESMFold allows for high-speed structural feedback, making it easier to test how mutations affect the 7-helix bundle.
GLMs are essential for capturing the “evolutionary grammar” of toxicity, helping to design proteins that are more aggressive than natural variants.
AlphaFold ensures that our computational designs are biophysically plausible and stable before any potential wet-lab implementation.
Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).
Contextual Gap: There is a lack of specific data regarding the host bacteria’s environment, which might lead to unexpected results in vivo.
Misfolding Risk: The engineered protein might still misfold or aggregate in a real biological system despite having positive simulation results in the pipeline.
Include a schematic of your pipeline.
Here’s a short written schematic of our pipeline:
[Sequence Input] → [ESM-2 Mutational Scan] → [GLM Toxicity Optimization] → [AlphaFold Validation] → [Final Design]
Each individually put your plan on your HTGAA website
Include your group’s short plan for engineering a bacteriophage
Part C: Final Project: L-Protein Mutants
High-level summary: The objective of this assignment is to improve the stability and autonomous folding of the lysis protein of the MS2 phage. This mechanism is key to understanding how phages could help address antibiotic resistance.
L-Protein Engineering - Option 1: Mutagenesis
Step 1: Information Gathering
Here are the L-protein and DnaJ sequences
Lysis Protein Sequence (UniProtKB ID: P03609)
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
Note: The lysis protein contains a soluble N-terminal domain followed by a transmembrane (TM) domain (blue; the last 35 residues). The TM domain affects lysis activity, while the soluble domain (green) is responsible for the interaction with DnaJ.
Soluble N-terminal domain: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV
TM domain: LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
Additionally, here’s a screenshot of the BLAST results for L-protein:
Lastly, these results were aligned using Clustal Omega, revealing a highly conserved “island” (HEDYPCRRQQRSST) at residues 24-37. These sites will be avoided where possible during mutagenesis to preserve the critical interaction with DnaJ and the overall biological function of the phage.
My approach combines computational LLR scores with experimental lab data, using a copy of the HTGAA Colab. I kept mutations that showed “active lysis” (value 1) in the experimental spreadsheet and high positive LLR scores in the notebook.
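The filtering step can be sketched as follows. The LLR of a substitution is commonly taken as log p(mutant) − log p(wild type) at that position under the language model, so positive values mean the model prefers the mutant residue. The two named rows reuse values reported on this page; the other two rows are hypothetical placeholders, and the cutoff of 1.0 is an assumed threshold rather than the one used in the Colab.

```python
def select_candidates(records, min_llr=1.0):
    """Keep lysis-active mutations with LLR at or above the cutoff, best first."""
    kept = [r for r in records if r["lysis"] == 1 and r["llr"] >= min_llr]
    return sorted(kept, key=lambda r: r["llr"], reverse=True)


if __name__ == "__main__":
    records = [
        {"mutation": "S9Q", "llr": 2.01, "lysis": 1},
        {"mutation": "C29R", "llr": 2.39, "lysis": 1},
        # Hypothetical rows, included only to show what gets filtered out:
        {"mutation": "hypothetical_A", "llr": 2.50, "lysis": 0},   # inactive in the lab
        {"mutation": "hypothetical_B", "llr": -0.80, "lysis": 1},  # model disfavors it
    ]
    for row in select_candidates(records):
        print(row["mutation"], row["llr"])
```

Requiring both criteria means a high model score cannot rescue a mutation that failed experimentally, and vice versa.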
Step 3: Filtering and Ranking
I used AlphaFold 3 to model the 8-chain assembly. This step was used to rank candidates that showed both positive computational scores and confirmed experimental activity, ensuring they don’t disrupt the pore’s symmetry.
Step 4: Final Mutated Sequences
These five mutations were selected because they were experimentally shown to maintain lysis (score 1) and have favorable computational scores.
| Region | Mutation | LLR Score (ESM-2) | Experimental Lysis | Rationale |
| --- | --- | --- | --- | --- |
| Soluble | S9Q | 2.01 | Active (1) | High computational confidence; replaces serine with glutamine, which may stabilize the N-terminal loop. |
| Soluble | C29R | 2.39 | Active (1) | One of the top scores; removing this cysteine likely prevents incorrect disulfide bonding. Note that C29 falls within the conserved island. |
| Soluble (TM boundary) | Y39L | 2.24 | Active (1) | High confidence score at the TM interface; increases hydrophobicity for membrane entry. |
| TM Domain | A45L | 1.53 | Active (1) | Consistent with experimental data; may improve the hydrophobic core of the lytic pore. |
| TM Domain | N53L | 1.86 | Active (1) | Replaces a polar asparagine with leucine, which may improve helix-helix packing in the multimer. |
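The five substitutions above can be sanity-checked against the wild-type sequence given in Step 1. The sketch below confirms that each mutation names the correct wild-type residue, builds the mutant sequence, and reports whether the position falls inside the HEDYPCRRQQRSST island. The helper names are my own.

```python
L_PROTEIN = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"   # soluble N-terminal domain
             "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")        # TM domain
ISLAND = "HEDYPCRRQQRSST"
MUTATIONS = ["S9Q", "C29R", "Y39L", "A45L", "N53L"]


def apply_mutation(seq, mutation):
    """Apply a substitution such as 'S9Q' after checking the wild-type residue."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[position - 1] != wild_type:
        raise ValueError(f"{mutation}: found {seq[position - 1]} at position {position}")
    return seq[:position - 1] + new + seq[position:]


def in_island(seq, position, island=ISLAND):
    """True if the 1-based position lies inside the conserved island."""
    start = seq.find(island)  # 0-based index, or -1 if the island is absent
    if start < 0:
        return False
    return start + 1 <= position <= start + len(island)


if __name__ == "__main__":
    for mutation in MUTATIONS:
        mutant = apply_mutation(L_PROTEIN, mutation)
        position = int(mutation[1:-1])
        print(mutation, mutant, "in island:", in_island(L_PROTEIN, position))
```

All five wild-type residues match the sequence, and C29R is the only one of the five that falls inside the conserved island.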
S9Q mutation 8-chain assembly:
C29R mutation 8-chain assembly:
Y39L mutation 8-chain assembly:
A45L mutation 8-chain assembly:
N53L mutation 8-chain assembly:
While AF3 structures were used to visualize the multimeric orientation, the ipTM scores remained low (~0.17) across all mutations. This is expected given the small, intrinsically disordered nature of the L-protein and the high flexibility required for its lytic function, which challenges standard multimeric confidence metrics.