Group Final Project

Group Members
@2026a-keerthana-gunaretnam, @2026a-nourelden-rihan, @2026a-ritika-saha, @2026a-rahul-yaji
Project Goals
Bacteriophages are a promising alternative to antibiotics for addressing the global challenge of antimicrobial resistance (AMR), as documented in historical and contemporary reviews (Barron, 2022). The MS2 phage, a single-stranded RNA bacteriophage, encodes the lysis protein L, which disrupts the host bacterial cell wall to release phage progeny (Chamakura et al., 2017a). Engineering the L protein aims to optimize phage performance for therapeutic use, including improved stability, production yields (titers), and lytic potency. This review analyzes each goal using insights from mutational studies (Chamakura et al., 2017b), in vitro characterizations (Mezhyrova et al., 2023), engineering approaches (Lin et al., 2023), phage therapy overviews (Barron, 2022), and computational design strategies (King et al., 2025). The analysis ranks the goals by difficulty, as specified, and clarifies the meaning of “higher toxicity”.
Increased stability (easiest)
Improving the stability of the L protein, whether thermal, structural, or proteolytic, is considered the least difficult goal because it depends on specific alterations that maintain basic functionality. By reducing degradation during phage manufacturing or application, stability enhancements can increase shelf life and environmental robustness.
The literature supports this assessment. Mutational studies of L have identified residues essential for folding and stability; conservative substitutions increase resistance to unfolding without sacrificing lytic activity (Chamakura et al., 2017b). In vitro investigations further characterize L’s biochemical properties and show potential for stabilization through domain-specific engineering (Mezhyrova et al., 2023). Furthermore, L’s reliance on host chaperones such as DnaJ for correct folding implies that reducing this dependence through protein redesign may yield more stable, autonomous variants (Chamakura et al., 2017a). Emerging computational tools such as genome language models enable generative design of stable phage proteins and allow rapid iteration (King et al., 2025). In the context of phage therapy, stable lysis proteins support dependable production, removing a significant obstacle to clinical use (Barron, 2022).
Higher titers (medium)
Achieving greater phage titers requires optimizing L protein expression and lysis timing to maximize burst size (the number of progeny phages released per infected cell) while preventing premature host death. This goal is moderately challenging because it requires balancing phage replication cycles with lysis efficiency, which frequently calls for iterative testing in host systems.
Supporting research indicates that mutations in L can alter lysis kinetics, possibly raising titers by delaying lysis to allow more phage multiplication (Chamakura et al., 2017b). For example, modified L variants have shown increased yield without compromising infectivity in Escherichia coli models (Chamakura et al., 2017a). Engineering-focused research highlights scalable changes to phage genomes, including lysis genes, that improve production efficiency (Lin et al., 2023). In vitro characterization underscores L’s role in membrane disruption and implies that targeted enhancements could optimize burst dynamics (Mezhyrova et al., 2023). The broader phage therapy literature identifies low titers as a manufacturing difficulty; nevertheless, developments in synthetic biology and AI-driven design offer routes to high-yield phage variants (Barron, 2022; King et al., 2025).
Higher toxicity of lysis protein (hard)
Project Proposal
We decided to focus on increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing its dependency on host DnaJ while maintaining lytic activity. We discussed the tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold. BLAST can retrieve homologous lysis proteins from sequence databases. Clustal Omega can build multiple sequence alignments (MSAs) to identify the essential L48–S49 residues and the pore-forming regions that must not be mutated. ESM can generate mutation heatmaps, which guide the use of ESMFold to find the highest-scoring folds for variants in mutable regions. AlphaFold Multimer can predict whether the subunits of our protein can assemble into a pore in the host membrane, and can also be used to check whether changes to the N-terminus break the interaction with DnaJ. We also identified a few pitfalls, the major ones being limited training datasets that may not be well suited to designing a transmembrane lysis protein. Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stabilized protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.
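As a rough illustration of how an MSA could be turned into a list of positions to protect from mutation, the sketch below scores each alignment column by Shannon entropy and flags low-entropy (highly conserved) columns. The toy alignment, the function names, and the 0.5-bit cutoff are all assumptions made for illustration; they are not real L homologs or settings from any of the tools above.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def protected_positions(alignment, max_entropy=0.5):
    """Return 1-based alignment columns whose entropy is at or below max_entropy."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    protected = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if column_entropy(column) <= max_entropy:
            protected.append(i + 1)
    return protected


# Toy alignment (hypothetical sequences, NOT real lysis protein homologs).
toy_alignment = [
    "MKLSLLEA",
    "MRLSLIEA",
    "MKLSVLDA",
    "MQLS-LEA",
]

print(protected_positions(toy_alignment))
```

In practice the alignment would come from Clustal Omega output, and the cutoff would need tuning, since a small or poorly annotated homolog set can make many columns look conserved by chance.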
MS2 Lysis of Escherichia coli Depends on Host Chaperone DnaJ
The study shows that the MS2 phage lysis protein L requires the host chaperone DnaJ for efficient host cell lysis. A missense mutation (P330Q) in the highly conserved C-terminal domain of DnaJ blocks MS2 L-mediated lysis at 30 °C and delays lysis at higher temperatures, without affecting overall L protein synthesis. The defect is specific to L-mediated lysis and does not affect lysis by other phage lysis proteins.
Genetic suppressor screening identified Lodj alleles of the L gene that bypass the DnaJ requirement. These alleles encode truncated L proteins lacking the highly basic N-terminal domain, indicating that this domain confers dependence on DnaJ. Biochemical assays demonstrated that wild-type L forms a membrane-associated complex with DnaJ, whereas the P330Q DnaJ variant cannot interact with L.
The authors propose that DnaJ functions as a chaperone that facilitates proper folding or conformational activation of full-length L, preventing steric interference from the N-terminal domain and allowing L to interact with its unknown cellular target. Removal of the dispensable N-terminal domain eliminates the need for chaperone assistance and accelerates lysis.
The work identifies DnaJ as a host factor regulating MS2 lysis timing and suggests that chaperone-dependent modulation of lysis may be an evolutionary strategy to optimize phage replication cycles.
https://pmc.ncbi.nlm.nih.gov/articles/PMC5775895/
This study performed comprehensive mutational and genetic analyses of the MS2 phage lysis protein L to identify residues and domains required for function. Random mutagenesis of the 75-aa L protein showed that most loss-of-function mutations cluster in the C-terminal half of the protein, especially around a conserved Leu-Ser (LS) dipeptide motif. Many inactivating mutations were conservative amino-acid substitutions and did not affect protein accumulation or membrane association, suggesting that L function depends on specific protein–protein interactions rather than nonspecific membrane disruption.
Functional studies demonstrated that L-mediated lysis requires interaction with the host chaperone DnaJ. The highly basic N-terminal domain of L is dispensable for lytic activity but mediates DnaJ dependence. Truncation of this domain or certain suppressor mutations bypassed the chaperone requirement and restored rapid lysis.
Biochemical and genetic data support a model in which L is an integral membrane protein whose essential domains (including the LS motif and neighboring regions) form a helical structure that likely engages a host membrane target protein. The interaction may occur near sites of membrane curvature associated with peptidoglycan biosynthesis rather than by forming nonspecific membrane lesions.
The work, supported in part by the Center for Phage Technology and associated laboratories including research by Ry Young, suggests that MS2 L functions through a specific heterotypic protein–protein interaction mechanism and that chaperone-dependent regulation helps control lysis timing during infection.
The study refines the mechanistic model of MS2 lysis, proposing that conserved structural motifs rather than general membrane disruption drive lytic activity.
https://pmc.ncbi.nlm.nih.gov/articles/PMC10688784/
This study provides detailed in vitro and in vivo characterization of the MS2 lysis protein MS2-L, focusing on its membrane insertion mechanism, oligomerization behavior, and interaction with the host chaperone DnaJ. Key findings show that MS2-L is a 75-amino-acid phage toxin whose essential lytic activity resides in the C-terminal ~35 amino acids, which form a hydrophobic transmembrane region. The N-terminal soluble domain is not required for bacterial killing but modulates folding, membrane insertion efficiency, and chaperone interaction.
Biochemical assays demonstrate that MS2-L interacts directly with DnaJ, primarily through the soluble N-terminal domain. However, this interaction does not significantly affect membrane insertion, solubilization, or oligomerization of the toxin, suggesting that DnaJ functions as a folding or stabilization partner rather than being essential for lytic activity.
Native mass spectrometry revealed that MS2-L assembles into high-order oligomeric complexes (≥10 monomers) after insertion into lipid nanodiscs, and that oligomerization is driven mainly by the transmembrane domain. In detergent environments, oligomer formation is reduced, indicating that membrane lipid context is important for stable assembly.
Fluorescence microscopy and cryo-electron microscopy showed that MS2-L expression in bacteria leads to peripheral membrane clustering, followed by sequential lesion formation beginning in the outer membrane, then disruption of the peptidoglycan layer, and finally inner membrane disintegration with cytoplasmic leakage. The data support a model in which MS2-L functions as a pore-forming phage toxin that kills cells through higher-order oligomerization within the bacterial membrane, rather than by directly inhibiting peptidoglycan biosynthesis.
Chaperone DnaJ binds MS2-L but is not required for membrane insertion or pore assembly, suggesting its role is mainly in modulating toxin folding or stability. These findings strengthen the concept that MS2-L belongs to the amurin/single-gene lysis protein family and may be useful for bioengineering applications such as bacterial ghost cell production and antimicrobial design.
https://pubmed.ncbi.nlm.nih.gov/36608652/
This review from Elsevier surveys the biological mechanisms, clinical development, and future directions of phage therapy as a strategy to combat antimicrobial resistance. It explains that therapeutic phages should ideally be strictly lytic, highly host-specific, and thoroughly characterized to ensure safety and efficacy.
The article describes how phages kill bacteria through mechanisms such as inhibition of essential cellular processes, expression of lysis proteins, or disruption of bacterial membranes. It also discusses advances in phage engineering, including synthetic genome construction and modification of phage host range and virulence.
Clinical applications of phage therapy are highlighted, particularly for treating drug-resistant infections where antibiotics are ineffective. However, challenges remain, including bacterial resistance to phages, regulatory hurdles, manufacturing standardization, and the need to understand phage–host interactions.
Future directions include the use of genetically modified or synthetic phages, computational prediction of therapeutic candidates, and integration of phage therapy with conventional antimicrobial strategies. Overall, phage therapy is presented as a promising but still developing alternative to antibiotics in the fight against antimicrobial resistance.
https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1.full
This preprint reports the first experimental demonstration of generative design of complete bacteriophage genomes using genome language models (Evo 1 and Evo 2). The authors fine-tuned models on about 15,000 Microviridae phage genomes to enable autoregressive generation of full viral genomes guided by template-based prompts and biologically motivated design constraints.
The workflow involved computational generation followed by multi-tier filtering for sequence quality, host tropism specificity, and evolutionary diversity. Constraints included genome length (4–6 kb), GC content, absence of long homopolymers, preservation of phage-like gene architecture, and spike protein similarity to the template phage to maintain host targeting.
Experimental validation showed that about 285 of 302 synthesized genome candidates could be assembled, and 16 produced viable infectious phages that inhibited growth of the target host strain. These generated phages displayed substantial sequence novelty, containing hundreds of mutations relative to natural Microviridae genomes, while preserving functional genome organization.
Structural and functional analyses indicated that some generated phages possessed altered protein interfaces but maintained compatible capsid–protein interactions. Cryo-electron microscopy and structure prediction suggested context-dependent co-evolution of structural proteins such as capsid and packaging proteins. Fitness assays showed that several AI-generated phages matched or exceeded the replication and lytic performance of the template phage, and phage cocktail experiments demonstrated rapid suppression of resistant bacterial strains through recombination and mutation-driven adaptation.
The study was conducted with biosafety considerations, including restricting model training to bacteriophage genomes and using well-characterized laboratory strains.
The work involved researchers affiliated with institutions such as Stanford University and the Arc Institute. Overall, the paper proposes a framework for generative genome engineering, showing that AI models can design biologically viable and evolutionarily novel bacteriophages, potentially enabling future synthetic biology and phage-based therapeutic development.
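To make the sequence-level constraints in this summary concrete, here is a minimal sketch of three of them (genome length, GC content, and homopolymer length) as simple filters. Only the 4–6 kb length window comes from the summary; the GC range, the homopolymer cutoff, the function names, and the synthetic test sequences are assumptions chosen for illustration, and this is not the authors' filtering code.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    longest = 0
    run = 0
    previous = None
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def passes_basic_filters(seq, min_len=4000, max_len=6000,
                         gc_range=(0.35, 0.55), max_homopolymer=10):
    """Apply length, GC, and homopolymer checks.

    The length window follows the 4-6 kb constraint in the summary;
    gc_range and max_homopolymer are assumed values, not from the paper.
    """
    if not min_len <= len(seq) <= max_len:
        return False
    low, high = gc_range
    if not low <= gc_content(seq) <= high:
        return False
    return longest_homopolymer(seq) <= max_homopolymer


# Synthetic sequences for demonstration only (not real genomes).
candidates = {
    "balanced_5kb": "ACGT" * 1250,
    "too_short": "ACGT" * 500,
    "long_A_run": "A" * 20 + "ACGT" * 1245,
}

for name, seq in candidates.items():
    print(name, passes_basic_filters(seq))
```

Filters such as host tropism or gene architecture need annotation tools and are outside the scope of this sketch.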
Project Proposal: Engineering the MS2 Phage Lysis Protein L

1. Project Goal
Our primary goal is to increase the structural stability of the MS2 bacteriophage lysis protein (L) while maintaining its ability to lyse bacterial cells. Our secondary goal is to reduce the dependency of L on the host chaperone DnaJ, which normally assists the protein in folding or activation. Reducing this dependency could allow the lysis protein to function more efficiently and independently in engineered systems.
The MS2 L protein is a 75-amino-acid single-gene lysis toxin whose C-terminal region forms a hydrophobic transmembrane domain responsible for membrane disruption and pore formation, while the basic N-terminal domain interacts with host factors such as DnaJ. Previous studies show that truncation of the N-terminal region can bypass the DnaJ requirement while preserving lysis activity. Therefore, our design strategy focuses on:
- Stabilizing the transmembrane and oligomerization regions
- Maintaining essential functional motifs such as the L48–S49 motif
- Exploring modifications to the N-terminal region to reduce DnaJ dependence

2. Computational Tools and Approaches
We will use a multi-step computational protein engineering pipeline combining sequence analysis, machine-learning mutagenesis predictions, and structural modeling.

2.1 BLAST – Homolog Discovery
First, we will use BLAST to identify homologous lysis proteins from related bacteriophages.
Purpose:
- Identify evolutionarily conserved residues
- Discover natural sequence variations that maintain function
- Build a dataset for multiple sequence alignment
This will help determine which regions are functionally constrained vs mutable.

2.2 Clustal Omega – Multiple Sequence Alignment (MSA)
Using sequences obtained from BLAST, we will perform multiple sequence alignment with Clustal Omega.
Purpose:
- Identify highly conserved residues, especially around the L48–S49 motif
- Map essential structural regions
- Determine which residues are safe to mutate
Regions with high conservation will be protected from mutation, while variable regions may be targeted for stability improvements.

2.3 ESM (Protein Language Models) – In Silico Mutagenesis
Next, we will use ESM (Evolutionary Scale Modeling) protein language models to perform systematic mutation scanning.
Purpose:
- Generate mutation heatmaps
- Predict which amino acid substitutions improve protein fitness or stability
- Identify mutations compatible with the evolutionary sequence landscape
This step will guide rational mutation selection instead of random mutagenesis.

2.4 ESMFold – Structure Prediction for Mutants
Promising mutations from ESM analysis will be modeled using ESMFold.
Purpose:
- Predict 3D structures of mutant proteins
- Evaluate structural stability
- Ensure the transmembrane helix remains intact
Mutations that significantly distort the fold will be discarded.

2.5 AlphaFold Multimer – Oligomerization and Host Interaction
Finally, we will use AlphaFold Multimer to analyze:
- L protein oligomerization
- Potential interactions with DnaJ
Purpose:
- Predict whether mutated L proteins can form the oligomeric pore complex
- Evaluate whether N-terminal mutations reduce interaction with DnaJ
Since MS2-L likely forms large oligomeric pores (≥10 subunits) in the membrane, maintaining correct protein interactions is essential.
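As a minimal sketch of how candidates for the mutation scan could be enumerated while protecting conserved positions, the code below generates every single-residue substitution outside a protected set and then applies a crude hydrophobicity check to a transmembrane window. The Kyte-Doolittle scale is a standard hydropathy scale, but using its window mean as a stand-in for "the transmembrane helix remains intact" is our simplifying assumption, not a replacement for ESM or ESMFold. The toy sequence, protected positions, window, and cutoff are hypothetical and do not correspond to the real L protein.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def single_mutants(sequence, protected=frozenset()):
    """Yield (label, mutant) for each single substitution outside protected 1-based positions."""
    for pos, wild_type in enumerate(sequence, start=1):
        if pos in protected:
            continue
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                label = f"{wild_type}{pos}{aa}"
                yield label, sequence[:pos - 1] + aa + sequence[pos:]


def mean_hydropathy(sequence, start, end):
    """Mean Kyte-Doolittle hydropathy over 1-based inclusive positions start..end."""
    window = sequence[start - 1:end]
    return sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)


# Hypothetical 12-residue toy peptide; positions 5-6 stand in for a conserved LS motif.
toy_sequence = "MKRQLSLLAVIL"
toy_protected = {5, 6}
tm_start, tm_end = 7, 12   # assumed transmembrane window in the toy peptide
min_mean = 3.0             # assumed hydropathy cutoff

all_labels = [label for label, _ in single_mutants(toy_sequence, toy_protected)]
kept = [
    label
    for label, mutant in single_mutants(toy_sequence, toy_protected)
    if mean_hydropathy(mutant, tm_start, tm_end) >= min_mean
]

print(f"{len(kept)} of {len(all_labels)} candidates pass the hydropathy check")
```

The surviving labels (for example `A9V` in the toy case) would then be the inputs to language-model scoring and structure prediction, where the real ranking happens.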

3. Proposed Engineering Pipeline
Computational workflow we will follow:
  1. Phage L protein sequence
  2. BLAST search (find homologous lysis proteins)
  3. Multiple sequence alignment (Clustal Omega): identify conserved vs mutable residues
  4. ESM mutation scanning (generate mutation heatmaps)
  5. Select candidate mutations (stability or N-terminal modifications)
  6. Structure prediction (ESMFold)
  7. Complex/oligomer prediction (AlphaFold Multimer)
  8. Final mutant candidates (stable + functional lysis protein)

4. Expected Outcomes
Our pipeline aims to produce engineered variants of the MS2 L protein with:
- Increased structural stability
- Reduced aggregation risk
- Maintained transmembrane insertion
- Potentially reduced dependency on host DnaJ
These optimized proteins could be useful in applications such as:
- Synthetic phage engineering
- Bacterial ghost cell production
- Antimicrobial protein development

5. Potential Pitfalls

5.1 Limited Training Data
Most protein language models and structural predictors are trained primarily on globular proteins, not small transmembrane phage toxins. This may reduce prediction accuracy for MS2 L.

5.2 Risk of Over-Stabilization
Mutations designed to increase stability may cause:
- Protein aggregation
- Improper membrane insertion
- Loss of functional oligomerization
Thus stability must be balanced with function.

5.3 Poor Annotation of Amurin Proteins
Single-gene lysis proteins (also called amurins) are poorly annotated in sequence databases. This may limit the quality of homologous sequences used for alignment and training.

5.4 Host Protease Sensitivity
Mutations may unintentionally expose protease cleavage sites, making the engineered protein less stable inside bacterial cells.

6. Future Work
If promising mutants are identified computationally, the next steps would include:
- Experimental expression in E. coli
- Measuring lysis timing
- Measuring protein stability
- Testing DnaJ independence
This would validate whether computational predictions translate into improved biological function.
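For the lysis-timing measurement, one simple way to compare variants would be to read the lysis onset off an optical density (OD) growth curve. The sketch below takes the onset as the first time point after the OD peak where the culture has lost a chosen fraction of its peak density. The 20% drop criterion, the function name, and the OD readings are all hypothetical values for illustration, not experimental data or an established protocol.

```python
def lysis_onset(times, od, drop_fraction=0.2):
    """Return the first time after the OD peak where OD falls below
    (1 - drop_fraction) * peak, or None if no such drop is observed."""
    if len(times) != len(od) or not times:
        raise ValueError("times and od must be non-empty and the same length")
    peak_index = max(range(len(od)), key=od.__getitem__)
    threshold = od[peak_index] * (1 - drop_fraction)
    for t, value in zip(times[peak_index + 1:], od[peak_index + 1:]):
        if value < threshold:
            return t
    return None


# Hypothetical OD readings (minutes after induction); not measured data.
minutes = [0, 10, 20, 30, 40, 50, 60]
wild_type_od = [0.20, 0.28, 0.39, 0.52, 0.48, 0.30, 0.12]
variant_od = [0.20, 0.28, 0.39, 0.25, 0.10, 0.06, 0.05]
no_lysis_od = [0.20, 0.28, 0.39, 0.52, 0.66, 0.80, 0.95]

for name, curve in [("wild type", wild_type_od),
                    ("variant", variant_od),
                    ("no lysis", no_lysis_od)]:
    print(name, lysis_onset(minutes, curve))
```

Running the same comparison in a DnaJ-deficient host background would be one way to test DnaJ independence, since a variant that still lyses there no longer needs the chaperone.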