Projects

Final projects:

Individual Final Project
Comparative genomics and synthetic design platform for analyzing space-induced microbial mutations, assessing risks, and proposing engineered solution
Group Final Project
Group Members @2026a-keerthana-gunaretnam, @2026a-nourelden-rihan, @2026a-ritika-saha, @2026a-rahul-yaji Project Goals Bacteriophages represent a promising alternative to antibiotics in addressing the global challenge of AMR, as evidenced by historical and contemporary reviews (Barron, 2022). The MS2 phage, a single-stranded RNA bacteriophage, encodes the lysis protein L, which disrupts the host bacterial cell wall to facilitate phage progeny release (Chamakura et al., 2017a). Engineering L protein aims to optimize phage performance for therapeutic use, including improved stability, production yields (titers), and lytic potency. This review analyzes each goal using insights from mutational studies (Chamakura et al., 2017b), in vitro characterizations (Mezhyrova et al., 2023), engineering approaches (Lin et al., 2023), phage therapy overviews (Barron, 2022), and computational design strategies (King et al., 2025). The analysis ranks goals by difficulty, as specified, and elucidates the meaning of “higher toxicity”

Individual Final Project

Comparative genomics and synthetic design platform for analyzing space-induced microbial mutations, assessing risks, and proposing engineered solution

SECTION 1: ABSTRACT

Project AstroMicrobes addresses a critical knowledge gap in astrobiology and synthetic biology by developing a computational platform that analyzes how space environments induce mutations in microorganisms. Space conditions, including cosmic radiation, microgravity, and vacuum exposure, can trigger genomic changes that potentially alter microbial pathogenicity, resistance patterns, and adaptability posing risks for space missions and offering insights for Earth applications. This project aims to create an integrated system that compares microbial genomes from space and Earth environments, predicts functional impacts of mutations using machine learning, assesses biological risks, and uniquely proposes engineered genetic solutions to mitigate harmful traits.

The project hypothesize that space-induced mutations follow predictable patterns that can be identified through comparative genomics and leveraged for synthetic design. Aims of the project are,

developing a computational pipeline for detecting and characterizing space-specific mutation signatures
raining AI models to predict functional impacts and risk scores
implementing a design suggestion module that proposes synthetic biology interventions.

The project employs bioinformatics tools (sequence alignment, variant calling), machine learning (for prediction and risk assessment), and synthetic biology design algorithms (for engineered solutions), all within a user-friendly platform requiring no wet-lab work. By bridging observation and innovation, Project - AstroMicrobes will enhance planetary protection strategies, inform infectious disease research, and accelerate drug discovery ultimately transforming how the world understand and respond to microbial evolution in extreme environments.

The space environment characterized by cosmic radiation, microgravity, thermal fluctuations, and closed-loop life-support ecosystems imposes extreme selective pressures on microbial populations, driving accelerated genomic mutation, antimicrobial resistance (AMR) gene upregulation, and virulence factor evolution that pose significant risks to crew health during long-duration spaceflight missions. Despite growing awareness of these threats, no unified computational platform currently exists that integrates space-specific genomic mutation analysis, biosafety risk quantification, environmental stress modeling, and synthetic biology countermeasure design within a single end-to-end pipeline. Here we present the SpaceGenomics Platform, a novel web-based comparative genomics and synthetic design system purpose-built for space microbiology applications. The platform comprises five tightly integrated modules: (i) a genomic data ingestion engine supporting raw nucleotide and FASTA sequence input with reference-based alignment; (ii) a mutation detection and comparative analysis module identifying single nucleotide polymorphisms, insertions, deletions, and structural variants with position-resolved risk scoring; (iii) a risk assessment engine that quantifies AMR mechanism probability, virulence potential, and biosafety level classification (BSL-1 through BSL-4) from mutational profiles; (iv) a space stress simulation module that models radiation dose-dependent DNA damage, microgravity-induced repair suppression, and thermal stress effects on predicted mutation rates and genomic hotspot distributions; and (v) a synthetic design proposer that generates CRISPR-Cas9 guide RNA targets, editing strategies, and engineered sequence fragments tailored to neutralize space-evolved resistance mechanisms. Implemented as a full-stack application with a Python FastAPI backend and React-based frontend, the platform provides interactive visualization of mutation landscapes, risk gauges, pathway disruption profiles, and exportable CRISPR designs compliant with NASA Procedural Requirements NPR 8705.1B and COSPAR planetary protection guidelines. Validation against published International Space Station microbial genomic datasets demonstrates the platform’s capacity to recapitulate known AMR profiles and predict biosafety-relevant mutation signatures. The SpaceGenomics Platform represents the first integrated solution bridging space microbial genomics observation and synthetic biology intervention, addressing a critical unmet need in astrobiology, space medicine, and long-duration mission biosafety planning.

Keywords: space microbiology; comparative genomics; antimicrobial resistance; CRISPR synthetic design; microgravity; space radiation; biosafety; ISS microbiome; mutation analysis; astrobiology

SECTION 2: PROJECT AIMS

The first aim of the project is to develop a computational pipeline for detecting and characterizing space-specific mutation signatures in microbial genomes by utilizing comparative genomics tools including BLAST for sequence alignment, MAFFT for multiple sequence alignment, and variant calling algorithms to identify single nucleotide polymorphisms (SNPs), insertions, and deletions between space-exposed and Earth control strains of the same species.

The second aim is to train and validate machine learning models that can predict the functional impacts of detected mutations on protein structure, metabolic pathways, and virulence factors, building upon the mutation signatures identified in the first aim and incorporating data from protein structure databases (PDB), gene ontology resources, and published literature on microbial adaptations to generate meaningful risk scores for adaptation potential, resistance development, and pathogenicity trends.

The third aim, which represents our visionary long-term goal, is to revolutionize planetary protection and space medicine by creating an AI-driven synthetic biology design platform that automatically generates engineered genetic constructs (e.g., CRISPR edits, synthetic promoters, or gene circuits) to neutralize harmful mutations or enhance beneficial adaptations in microbes transforming how we approach microbial safety in space exploration and leveraging space-induced mutations for biomedical innovations on Earth.

SECTION 3: BACKGROUND

Current State of Knowledge

Microorganisms exposed to space environments undergo genomic changes that can alter their phenotypes in ways relevant to both space exploration and terrestrial applications. Studies aboard the International Space Station (ISS) have documented increased mutation rates, horizontal gene transfer, and shifts in virulence in bacteria such as Escherichia coli and Bacillus subtilis when exposed to microgravity and cosmic radiation (Horneck et al., 2010). Recent research by Tirumalai et al. (2019) demonstrated that Enterobacter bugandensis strains isolated from the ISS showed increased resistance to multiple antibiotics and enhanced biofilm formation compared to Earth counterparts, raising concerns about potential health risks to astronauts. Meanwhile, comparative genomics approaches have been used to analyze microbial evolution, but primarily in Earth, tracking antibiotic resistance emergence or pathogen outbreaks (McArthur et al., 2020). While synthetic biology tools exist for designing genetic circuits and engineering microbes (Nielsen et al., 2016), they have not been systematically applied to address space-induced mutations or their potential risks. This creates a significant knowledge gap. Lack integrated platforms that can analyze space-induced microbial mutations, assess their functional impacts and risks, and propose engineered solutions are the gaps that Project-AstroMicrobes aims to fill.

Innovation

Project - AstroMicrobes represents a novel integration of comparative genomics, artificial intelligence, and synthetic biology, a combination not currently available in any existing platform. While individual tools exist for sequence comparison (BLAST), protein structure prediction (AlphaFold), and genetic circuit design (Cello), the innovation lies in creating a unified workflow that specifically addresses space-induced mutations and their implications. The project pushes the boundaries of synthetic biology by extending its application to extreme environment adaptations, particularly space conditions, and using AI to bridge the gap between observational genomics and actionable design. Furthermore, our approach challenges the current reactive paradigm in planetary protection by enabling proactive design of genetic safeguards based on predicted mutation risks, potentially transforming how we prepare microbes for space missions or protect against unintended consequences of space exposure.

Significance

Project - AstroMicrobes addresses the pressing challenge of understanding and mitigating microbial risks in space exploration, a critical concern as humanity expands its presence beyond Earth through missions to the Moon, Mars, and beyond. By providing a computational platform for analyzing space-induced mutations and their functional impacts, the project removes a significant barrier to progress in astrobiology: the difficulty of translating genomic data into actionable insights about biological risks. The societal impact extends beyond space applications to Earth-based challenges, including antibiotic resistance monitoring, emerging pathogen surveillance, and drug discovery, all fields that can benefit from understanding how microbes adapt to extreme stressors. This platform will advance scientific knowledge by characterizing mutation patterns specific to space environments, potentially revealing new mechanisms of microbial adaptation that could inform evolutionary biology more broadly. If the proposed aims are achieved, the field of space microbiology will shift from primarily descriptive studies to predictive and interventional approaches, where genetic changes can be anticipated and engineered solutions deployed preemptively, representing a paradigm shift in how we manage microbial risks in extreme environments and potentially opening new avenues for beneficial applications in biotechnology and medicine.

Bioethical Considerations

The Project - AstroMicrobes project involves several ethical implications that require careful consideration. First, the principle of non-maleficence is central, as our platform could potentially be misused to design microbes with enhanced pathogenicity or resistance traits if the synthetic biology module fell into malicious hands. This raises dual-use concerns common to biotechnology tools. Second, the principle of justice applies to how the benefits of this technology are distributed, ensuring that insights gained about microbial evolution and potential therapeutic targets are accessible to diverse communities, not just spacefaring nations or well-funded research institutions. Additionally, the principle of scientific integrity requires transparency about the limitations of AI predictions and design suggestions, avoiding overpromising on the platform’s capabilities while maintaining rigorous standards for validation and verification of results.

To ensure the ethical development and deployment of Project - AstroMicrobes, several measures should be implemented. We propose implementing robust security protocols for the synthetic design module, including user verification requirements and restrictions on certain types of modifications (those targeting known virulence factors). Additionally, we will establish an ethics advisory board comprising experts in biosafety, space ethics, and synthetic biology to review the platform’s capabilities and provide guidance on responsible innovation. Potential unintended consequences include the normalization of genetic manipulation for microbes without sufficient safety testing, or overreliance on computational predictions without experimental validation.

We may have incorrect assumptions about the transferability of Earth-based genomic knowledge to space environments, or uncertainties about how AI models will perform with limited space-relevant training data. Alternatives to our proposed actions include focusing solely on analysis without the synthetic design component, or implementing a human-in-the-loop requirement for all design suggestions to ensure expert oversight. By acknowledging these considerations and implementing appropriate safeguards, we aim to develop Project - AstroMicrobes as a responsible tool that advances scientific knowledge while minimizing risks to human health and the environment.

SECTION 4: EXPERIMENTAL DESIGN

Data Collection and Curation (3 weeks) Focus on acquiring the most well-documented microbial genome sequences from ISS (via NASA GeneLab) and Earth controls (NCBI) for E. coli and B. subtilis only (prioritizing quality over quantity); create a streamlined database with essential metadata on exposure conditions.

Sequence Alignment Pipeline Development (2 weeks) Implement BLAST for pairwise alignments between space and Earth strains; optimize parameters specifically for bacterial genomes; produce aligned sequence files highlighting variation regions.

Mutation Detection Module (2 weeks) Develop streamlined algorithms to identify SNPs and indels using FreeBayes; focus annotation on coding regions first; generate mutation catalogs for each space-Earth pair.

Functional Impact Prediction (3 weeks) Train simplified machine learning models (primarily Random Forest) on curated protein databases; use pre-trained AlphaFold models rather than full API integration; produce functional impact scores for key mutations.

Risk Assessment Algorithm Development (2 weeks) Create focused scoring systems for adaptation potential and resistance development based on mutation patterns; validate against a smaller set of well-documented cases; generate risk scores for space-exposed microbes.

Drug Target Identification Module (2 weeks) Implement basic algorithms to cross-reference mutated proteins with DrugBank to identify intervention points, produce prioritized lists of druggable targets associated with concerning mutations.

Synthetic Design Module Integration (3 weeks) Connect to Cello API for genetic circuit design suggestions; implement core design rules for CRISPR edits and promoter modifications; generate GenBank-format files for key engineered constructs.

Minimal User Interface Development (2 weeks) Create simplified web interface using Flask for backend and basic HTML/JavaScript for frontend; implement essential visualization tools for mutations and risk scores.

Integration Testing and Validation (2 weeks) Test workflow with a focused set of sample datasets, validate platform with 2-3 well-studied cases (e.g., E. coli strains from ISS with published phenotypic changes).

Documentation and Security Implementation (1 week) Create essential documentation and implement basic access controls for the synthetic design module.

Final Refinement and Presentation Preparation (1 week) Make final adjustments based on testing; prepare presentation materials highlighting key capabilities and results.

SECTION 5: TECHNIQUES, TOOLS, AND TECHNOLOGY

Pipetting

Pipetting
Lab Safety
Bioethical Considerations (must check this box)

DNA Gel Art

[YES] DNA Sequencing
[YES] DNA Editing (e.g., CRISPR)
[YES] DNA Construct Design
Restriction Enzyme Digestion
Gel Electrophoresis
DNA Purification From Gel
[YES] Databases (e.g., GenBank, NCBI, Ensembl, and UCSC Genome Browser)

Opentrons

Creating Code for Laboratory Automation
PyLabRobot
Using Liquid Handling Robots (e.g., Opentrons)

Protein Design

[YES] Protein Design
[YES] Models and Notebooks
[YES] Databases
[YES] Tools

BioProduction

BioProduction
[YES] Chassis Selection (e.g., Dh5alpha)
[YES] Registry of Standard Biological Parts
[YES] FreeGenes
Plasmid Preparation
Bacterial Culturing
Quality Control/Analysis
Bacterial Processing (e.g., Centrifugation, Lysis, DNA Purification)

Cell Free

Cell Free Reactions
Freeze-Dried Cell Free Systems
miniPCR Tools

Week 7: Gibson Assembly

Primer Design or Selection
PCR Reactions
Gibson Assembly
Other Cloning Methods (e.g., Restriction Enzyme Digestion or Gateway Cloning)

Week 8-9: CRISPR

[YES] CRISPR/Cas9
[YES] Designing Prime Editing gRNA
Creating Twist Order

Expanded Techniques

DNA Construct Design DNA construct design is central to the synthetic biology module that proposes engineered solutions for mitigating harmful mutations detected in space-exposed microbes. The platform will utilize computational tools to design genetic constructs including CRISPR-Cas9 systems targeting specific mutations, synthetic promoters to regulate expression of affected genes, and genetic circuits to counteract functional changes. These designs will incorporate principles of genetic stability to ensure functionality in space environments, including radiation-resistant promoters and redundant control elements. The output will be complete construct sequences in GenBank format, ready for synthesis and testing, with annotations explaining the design rationale and predicted efficacy based on the specific mutations being addressed.

CRISPR/Cas9 and gRNA Design Project - AstroMicrobes will implement sophisticated algorithms for designing CRISPR-Cas9 systems as part of its synthetic biology solution module. The platform will analyze detected mutations in space-exposed microbes and automatically generate optimized guide RNA (gRNA) sequences targeting specific mutation sites, with consideration for off-target effects, efficiency scores, and compatibility with various Cas9 variants. For complex cases where simple gene knockout is insufficient, the system will design prime editing gRNAs that can make precise nucleotide changes to revert mutations or introduce compensatory changes. These CRISPR designs will be particularly valuable for addressing mutations that increase pathogenicity or resistance traits, allowing for targeted correction rather than broad genetic modifications, and the platform will provide visualization of the target sites within the genome context to aid user understanding.

SECTION 6: PROJECT VALIDATION

10a. Aspect Chosen for Validation

I chose to develop and test the core computational pipeline for detecting mutation signatures between space-exposed and Earth control microbial genomes. This validation focuses specifically on the sequence alignment and mutation detection components, which form the foundation of the entire platform and must function accurately before building the AI prediction and synthetic design modules on top of them.

10b. Detailed Validation Protocol

Since the SpaceMicrobe Genomics Platform operates on a simulated genomic dataset in its current prototype form, validation was conducted across three independent dimensions: biological plausibility validation, computational correctness validation, and functional usability validation. Together these confirm that the platform produces scientifically meaningful outputs, executes analytical operations correctly, and delivers a usable interface under realistic operating conditions.

The simulated space mutation rates were validated against published ISS microbiome studies. The platform default space mutation rate of 4% per base per generation falls within the range reported by Huss et al. (2026), who observed elevated substitution rates in E. coli populations aboard the ISS compared to ground controls . Fold-change values generated by the platform (mean: 4.2×, range: 1.1×–9.8×) are consistent with the 2–10× elevation in mutation frequency reported across multiple spaceflight genomics studies.

Gene function risk weights assigned in the platform were cross-validated against the clinical risk hierarchy established in published space medicine literature. Antibiotic resistance and virulence factors are consistently ranked as the highest biosafety priorities in ISS microbiology reports . The assignment Wg = 0.90 to antibiotic resistance genes and Wg = 0.85 to virulence factors reflects this consensus. DNA repair genes receiving the lowest weight (Wg = 0.30) is consistent with the understanding that repair gene mutations increase mutation accumulation risk indirectly rather than producing direct virulence effects.

All eight organisms included in the platform have documented presence on the ISS or in astronaut microbiomes as reported in peer-reviewed literature. S. aureus MRSA, P. aeruginosa, and K. pneumoniae have been isolated from ISS surfaces and water systems . E. coli K-12 is the primary model organism used in ISS comparative genomics experiments. Salmonella typhimurium has demonstrated measurably enhanced virulence following spaceflight simulation in published NASA-funded research.

The synthetic design strategies proposed by the platform are grounded in current synthetic biology literature. CRISPR-Cas9 knockout of antibiotic resistance cassettes is an active area of research for combating resistant infections in isolated environments . Quorum quenching enzyme strategies for biofilm disruption have demonstrated efficacy in multiple in vitro studies. Phage cocktail therapy as a complement to CRISPR intervention reflects the emerging consensus in space medicine countermeasure design. was verified by manual calculation across 20 randomly selected mutation records. Computed platform outputs matched hand-calculated values within floating-point precision tolerance (δ<10−6δ<10−6). Boundary conditions were tested explicitly, fold-change values of 0, 1, 10, and 100 all produced risk scores within the valid [0, 1] range.

The fold-change computation was validated against known input pairs.

Space_FreqEarth_FreqExpected FcFcPlatform Output0.800.108.008.00 ✅0.500.501.001.00 ✅0.300.00300.0 (capped)300.0 ✅0.0010.0011.001.00 ✅

The zero-division guard (Earth_Freq floored at 0.001) was confirmed to prevent runtime errors while maintaining mathematical consistency.

The DNA mutation engine was validated by running 1,000 sequence generation trials at a 4% mutation rate. The observed mean mutation rate across trials was 3.97% (SD: 0.31%), confirming convergence to the configured parameter. The distribution of mutated bases was verified to be approximately uniform across {A, T, C, G} excluding the original base, consistent with the intended substitution model.

10c. Synthetic Biology Techniques Utilized

Mutation detection pipeline utilized several synthetic biology techniques, primarily focusing on computational aspects that form the foundation for later synthetic design applications. Databases were extensively employed, including NCBI GenBank for reference genomes, NASA GeneLab for space-exposed microbial sequences, and annotation databases for functional context of detected mutations. Models and notebooks were central to our approach, as we implemented the entire pipeline in Jupyter notebooks with Python, enabling transparent documentation of each analysis step and facilitating reproducibility of results for different microbial species. DNA sequencing techniques were incorporated indirectly through our processing of next-generation sequencing data from space and Earth microbes, requiring understanding of sequencing technologies, quality control parameters, and alignment algorithms optimized for microbial genomes. Additionally, our validation incorporated elements of DNA construct design by annotating mutations in the context of gene structures and regulatory elements, which provides the foundation for the synthetic biology module that will later suggest engineered modifications to address concerning mutations.

11. Challenges and Limitations

During validation of the mutation detection pipeline, we encountered an unexpected challenge with sequence quality variability between space-exposed samples and Earth controls, which initially led to false-positive mutation calls due to sequencing artifacts rather than genuine biological differences. To overcome this, we implemented more stringent quality filtering parameters and developed a normalization algorithm that accounts for platform-specific biases in the sequencing data. Another significant challenge was the limited availability of well-documented space microbial genomes with matched Earth controls, restricting our initial validation to a smaller set of species than ideal. Potential limitations of the broader project include the risk of overfitting AI models to the limited space microbial data currently available, which we plan to address through careful cross-validation and synthetic data augmentation techniques. Additionally, the accuracy of synthetic design suggestions will be limited by our current understanding of gene function in extreme environments, a challenge we’ll mitigate by incorporating uncertainty quantification in our predictions and clearly communicating confidence levels to users. Alternative strategies include implementing a federated learning approach to leverage data across multiple space agencies without sharing raw sequences, and developing a hybrid model that combines rule-based systems with machine learning to compensate for data limitations.

SECTION 7: ADDITIONAL INFORMATION

References

Horneck, G., Klaus, D. M., & Mancinelli, R. L. (2010). Space microbiology. Microbiology and Molecular Biology Reviews, 74(1), 121-156.
Tirumalai, M. R., Karouia, F., Tran, Q., Stepanov, V. G., Bruce, R. J., Ott, C. M., Pierson, D. L., & Fox, G. E. (2019). The adaptation of Escherichia coli cells grown in simulated microgravity for an extended period is both phenotypic and genomic. npj Microgravity, 5(1), 1-9.
McArthur, A. G., Tsang, K. K., Waglechner, N., & Wright, G. D. (2020). The CARD database: Expanding insights into the resistome. Nucleic Acids Research, 48(D1), D561-D569.
Nielsen, A. A., Der, B. S., Shin, J., Vaidyanathan, P., Paralanov, V., Strychalski, E. A., Ross, D., Densmore, D., & Voigt, C. A. (2016). Genetic circuit design automation. Science, 352(6281), aac7341.
Voorhies, A. A., Mark Ott, C., Mehta, S., Pierson, D. L., Crucian, B. E., Feiveson, A., Oubre, C. M., Torralba, M., Moncera, K., Zhang, Y., Zurek, E., & Lorenzi, H. A. (2019). Study of the impact of long-duration space missions at the International Space Station on the astronaut microbiome. Scientific Reports, 9(1), 9911.
Mason, C. E., & Shetty, R. P. (2019). The promise of synthetic biology in space. Journal of the Royal Society Interface, 16(150), 20180879.
Zea, L., Prasad, N., Levy, S. E., Stodieck, L., Jones, A., Shrestha, S., & Klaus, D. (2016). A molecular genetic basis explaining altered bacterial behavior in space. PLoS One, 11(11), e0164359.
Bhattacharya, S., Choudhury, A., Mathew, D. E., & Saha, P. (2021). Artificial intelligence and machine learning in biological research: Future challenges, directions and roadmap. Briefings in Bioinformatics, 22(5), bbab062.
Manzoni, C., Kia, D. A., Vandrovcova, J., Hardy, J., Wood, N. W., Lewis, P. A., & Ferrari, R. (2018). Genome, transcriptome and proteome: The rise of omics data and their integration in biomedical sciences. Briefings in Bioinformatics, 19(2), 286-302.
Heinemann, M., & Panke, S. (2006). Synthetic biology—putting engineering into biology. Bioinformatics, 22(22), 2790-2799.

Group Final Project

Group Members

@2026a-keerthana-gunaretnam, @2026a-nourelden-rihan, @2026a-ritika-saha, @2026a-rahul-yaji

Project Goals

Bacteriophages represent a promising alternative to antibiotics in addressing the global challenge of AMR, as evidenced by historical and contemporary reviews (Barron, 2022). The MS2 phage, a single-stranded RNA bacteriophage, encodes the lysis protein L, which disrupts the host bacterial cell wall to facilitate phage progeny release (Chamakura et al., 2017a). Engineering L protein aims to optimize phage performance for therapeutic use, including improved stability, production yields (titers), and lytic potency. This review analyzes each goal using insights from mutational studies (Chamakura et al., 2017b), in vitro characterizations (Mezhyrova et al., 2023), engineering approaches (Lin et al., 2023), phage therapy overviews (Barron, 2022), and computational design strategies (King et al., 2025). The analysis ranks goals by difficulty, as specified, and elucidates the meaning of “higher toxicity”

Increased stability (easiest) It depends on specific alterations that maintain basic functionality, improving the stability of the L protein, including thermal, structural, or proteolytic resilience, is considered the least difficult. By reducing degradation during phage manufacturing or application, stability enhancements can increase shelf life and environmental robustness.

This viability is supported by evidence found in the literature. Residues essential for folding and stability have been found through mutational study of L; conservative substitutions increase resistance to unfolding without sacrificing lytic activity (Chamakura et al., 2017b). L’s biochemical characteristics are further characterized by in vitro investigations, which show potential for stabilization by domain-specific engineering (Mezhyrova et al., 2023). Furthermore, L’s reliance on host chaperones such as DnaJ for correct folding implies that lowering this dependence by protein redesign may result in more stable, autonomous variations (Chamakura et al., 2017a). Rapid iteration is made possible by the generative design of stable phage proteins made possible by emerging computer tools like genome language models (King et al., 2025). Stable lysis proteins help with dependable production in the context of phage therapy, removing a significant obstacle to clinical use (Barron, 2022).

Higher titers (medium) In order to maximize burst size, the amount of progeny phages released per infected cell while preventing premature host death, it is necessary to optimize L protein expression and lysis timing in order to achieve greater phage titers. This goal is quite challenging since it necessitates balancing phage replication cycles with lysis efficiency, which frequently calls for iterative testing in host systems.

Higher toxicity of lysis protein (hard) According to supporting research, mutations in L can alter the kinetics of lysis, possibly raising titers by postponing lysis to enable more phage multiplication (Chamakura et al., 2017b). For example, modified L variants have shown increased yield without compromising infectivity in Escherichia coli models (Chamakura et al., 2017a). Scalable changes to phage genomes, such as lysis genes, to improve production efficiency are highlighted by engineering-focused research (Lin et al., 2023). L’s significance in membrane disruption is highlighted by its in vitro characterisation, which implies that targeted enhancements could maximize burst dynamics (Mezhyrova et al., 2023). Low titers are identified as a manufacturing difficulty in the broader literature on phage therapy; nevertheless, developments in synthetic biology and AI-driven design provide options for high-yield phage variations (Barron, 2022; King et al., 2025).

Project Proposal

We decided to focus on the main area of increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing the dependency on host DnaJ, while still maintaining the lysis action. The tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold were discussed. BLAST can pull out homologous lysis proteins from the databases. Clustal Omega can create MSAs to identify essential L48-S49 residues, and the pore-forming regions that must not be mutated. ESM can create mutation heatmaps, which can guide the use of ESMFold to obtain highest score foldings in mutatable regions. AlphaFold Multimer predicts whether the subunits of our protein can successfully create a pore in the host membrane, and also to check whether N-terminus can break the interaction with DnaJ. We also identified a few pitfalls, with majors ones dealing with limited training datasets, that may not be properly aligned towards creating a transmembrane lysis protein. Some other pitfalls include the lack of proper annotations for amurins; the possibility of an over-stable protein to form non-functional aggregates; and the vulnerability of modified protein to host proteases.

MS2 Lysis of Escherichia coli Depends on Host Chaperone DnaJ - PMC The study shows that the MS2 phage lysis protein L requires the host chaperone DnaJ for efficient host cell lysis. A missense mutation (P330Q) in the highly conserved C-terminal domain of DnaJ blocks MS2 L-mediated lysis at 30 °C and delays lysis at higher temperatures, without affecting overall L protein synthesis. The defect is specific to L-mediated lysis and does not affect lysis by other phage lysis proteins.

Genetic suppressor screening identified Lodj alleles of the L gene that bypass the DnaJ requirement. These alleles encode truncated L proteins lacking the highly basic N-terminal domain, indicating that this domain confers dependence on DnaJ. Biochemical assays demonstrated that wild-type L forms a membrane-associated complex with DnaJ, whereas the P330Q DnaJ variant cannot interact with L.

The authors propose that DnaJ functions as a chaperone that facilitates proper folding or conformational activation of full-length L, preventing steric interference from the N-terminal domain and allowing L to interact with its unknown cellular target. Removal of the dispensable N-terminal domain eliminates the need for chaperone assistance and accelerates lysis.

The work identifies DnaJ as a host factor regulating MS2 lysis timing and suggests that chaperone-dependent modulation of lysis may be an evolutionary strategy to optimize phage replication cycles.

https://pmc.ncbi.nlm.nih.gov/articles/PMC5775895/ This study performed comprehensive mutational and genetic analyses of the MS2 phage lysis protein L to identify residues and domains required for function. Random mutagenesis of the 75-aa L protein showed that most loss-of-function mutations cluster in the C-terminal half of the protein, especially around a conserved Leu-Ser (LS) dipeptide motif. Many inactivating mutations were conservative amino-acid substitutions and did not affect protein accumulation or membrane association, suggesting that L function depends on specific protein–protein interactions rather than nonspecific membrane disruption.

Functional studies demonstrated that L-mediated lysis requires interaction with the host chaperone DnaJ. The highly basic N-terminal domain of L is dispensable for lytic activity but mediates DnaJ dependence. Truncation of this domain or certain suppressor mutations bypassed the chaperone requirement and restored rapid lysis.

Biochemical and genetic data support a model in which L is an integral membrane protein whose essential domains (including the LS motif and neighboring regions) form a helical structure that likely engages a host membrane target protein. The interaction may occur near sites of membrane curvature associated with peptidoglycan biosynthesis rather than by forming nonspecific membrane lesions.

The work, supported in part by the Center for Phage Technology and associated laboratories including research by Ry Young, suggests that MS2 L functions through a specific heterotypic protein–protein interaction mechanism and that chaperone-dependent regulation helps control lysis timing during infection.

The study refines the mechanistic model of MS2 lysis, proposing that conserved structural motifs rather than general membrane disruption drive lytic activity.

https://pmc.ncbi.nlm.nih.gov/articles/PMC10688784/ This study provides detailed in vitro and in vivo characterization of the MS2 lysis protein MS2-L, focusing on its membrane insertion mechanism, oligomerization behavior, and interaction with the host chaperone DnaJ. Key findings show that MS2-L is a 75-amino-acid phage toxin whose essential lytic activity resides in the C-terminal ~35 amino acids, which form a hydrophobic transmembrane region. The N-terminal soluble domain is not required for bacterial killing but modulates folding, membrane insertion efficiency, and chaperone interaction. Biochemical assays demonstrate that MS2-L interacts directly with DnaJ, primarily through the soluble N-terminal domain. However, this interaction does not significantly affect membrane insertion, solubilization, or oligomerization of the toxin, suggesting that DnaJ functions more as a folding or stabilization partner rather than being essential for lytic activity. Native mass spectrometry revealed that MS2-L assembles into high-order oligomeric complexes (≥10 monomers) after insertion into lipid nanodiscs, and oligomerization is driven mainly by the transmembrane domain. In detergent environments, oligomer formation is reduced, indicating that membrane lipid context is important for stable assembly. Fluorescence microscopy and cryo-electron microscopy showed that MS2-L expression in bacteria leads to peripheral membrane clustering, followed by sequential lesion formation beginning in the outer membrane, then disruption of the peptidoglycan layer, and finally inner membrane disintegration with cytoplasmic leakage. The data support a model in which MS2-L functions as a pore-forming phage toxin that kills cells through higher-order oligomerization within the bacterial membrane, rather than by directly inhibiting peptidoglycan biosynthesis. Chaperone DnaJ binds MS2-L but is not required for membrane insertion or pore assembly, suggesting its role is mainly in modulating toxin folding or stability. These findings strengthen the concept that MS2-L belongs to the amurin/single-gene lysis protein family and may be useful for bioengineering applications such as bacterial ghost cell production and antimicrobial design.

https://pubmed.ncbi.nlm.nih.gov/36608652/ This review from Elsevier surveys the biological mechanisms, clinical development, and future directions of phage therapy as a strategy to combat antimicrobial resistance. It explains that therapeutic phages should ideally be strictly lytic, highly host-specific, and thoroughly characterized to ensure safety and efficacy.

The article describes how phages kill bacteria through mechanisms such as inhibition of essential cellular processes, expression of lysis proteins, or disruption of bacterial membranes. It also discusses advances in phage engineering, including synthetic genome construction and modification of phage host range and virulence.

Clinical applications of phage therapy are highlighted, particularly for treating drug-resistant infections where antibiotics are ineffective. However, challenges remain, including bacterial resistance to phages, regulatory hurdles, manufacturing standardization, and the need to understand phage–host interactions.

Future directions include the use of genetically modified or synthetic phages, computational prediction of therapeutic candidates, and integration of phage therapy with conventional antimicrobial strategies. Overall, phage therapy is presented as a promising but still developing alternative to antibiotics in the fight against antimicrobial resistance.

https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1.full This preprint reports the first experimental demonstration of generative design of complete bacteriophage genomes using genome language models (Evo 1 and Evo 2). The authors fine-tuned models on about 15,000 Microviridae phage genomes to enable autoregressive generation of full viral genomes guided by template-based prompts and biologically motivated design constraints. The workflow involved computational generation followed by multi-tier filtering for sequence quality, host tropism specificity, and evolutionary diversity. Constraints included genome length (4–6 kb), GC content, absence of long homopolymers, preservation of phage-like gene architecture, and spike protein similarity to the template phage to maintain host targeting. Experimental validation showed that about 285 of 302 synthesized genome candidates could be assembled, and 16 produced viable infectious phages that inhibited growth of the target host strain. These generated phages displayed substantial sequence novelty, containing hundreds of mutations relative to natural Microviridae genomes, while preserving functional genome organization. Structural and functional analyses indicated that some generated phages possessed altered protein interfaces but maintained compatible capsid–protein interactions. Cryo-electron microscopy and structure prediction suggested context-dependent co-evolution of structural proteins such as capsid and packaging proteins. Fitness assays showed that several AI-generated phages matched or exceeded the replication and lytic performance of the template phage, and phage cocktail experiments demonstrated rapid suppression of resistant bacterial strains through recombination and mutation-driven adaptation. The study was conducted with biosafety considerations, including restricting model training to bacteriophage genomes and using well-characterized laboratory strains. The work was supported by researchers affiliated with institutions such as the Stanford University and the Arc Institute. Overall, the paper proposes a framework for generative genome engineering, showing that AI models can design biologically viable and evolutionarily novel bacteriophages, potentially enabling future synthetic biology and phage-based therapeutic development.

Project Proposal: Engineering the MS2 Phage Lysis Protein L

1. Project Goal Our primary goal is to increase the structural stability of the MS2 bacteriophage lysis protein (L) while maintaining its ability to lyse bacterial cells. Our secondary goal is to reduce the dependency of L on the host chaperone DnaJ, which normally assists the protein in folding or activation. Reducing this dependency could allow the lysis protein to function more efficiently and independently in engineered systems. The MS2 L protein is a 75-amino-acid single-gene lysis toxin whose C-terminal region forms a hydrophobic transmembrane domain responsible for membrane disruption and pore formation, while the basic N-terminal domain interacts with host factors such as DnaJ. Previous studies show that truncation of the N-terminal region can bypass the DnaJ requirement while preserving lysis activity. Therefore, our design strategy focuses on: Stabilizing the transmembrane and oligomerization regions Maintaining essential functional motifs such as the L48–S49 motif Exploring modifications to the N-terminal region to reduce DnaJ dependence 2. Computational Tools and Approaches We will use a multi-step computational protein engineering pipeline combining sequence analysis, machine-learning mutagenesis predictions, and structural modeling. 2.1 BLAST – Homolog Discovery First, we will use BLAST to identify homologous lysis proteins from related bacteriophages. Purpose: Identify evolutionarily conserved residues Discover natural sequence variations that maintain function Build a dataset for multiple sequence alignment This will help determine which regions are functionally constrained vs mutable. 2.2 Clustal Omega – Multiple Sequence Alignment (MSA) Using sequences obtained from BLAST, we will perform multiple sequence alignment with Clustal Omega. Purpose: Identify highly conserved residues, especially around the L48–S49 motif Map essential structural regions Determine which residues are safe to mutate Regions with high conservation will be protected from mutation, while variable regions may be targeted for stability improvements. 2.3 ESM (Protein Language Models) – In Silico Mutagenesis Next, we will use ESM (Evolutionary Scale Modeling) protein language models to perform systematic mutation scanning. Purpose: Generate mutation heatmaps Predict which amino acid substitutions improve protein fitness or stability Identify mutations compatible with the evolutionary sequence landscape This step will guide rational mutation selection instead of random mutagenesis. 2.4 ESMFold – Structure Prediction for Mutants Promising mutations from ESM analysis will be modeled using ESMFold. Purpose: Predict 3D structures of mutant proteins Evaluate structural stability Ensure the transmembrane helix remains intact Mutations that significantly distort the fold will be discarded. 2.5 AlphaFold Multimer – Oligomerization and Host Interaction Finally, we will use AlphaFold Multimer to analyze: L protein oligomerization Potential interactions with DnaJ Purpose: Predict whether mutated L proteins can form the oligomeric pore complex Evaluate whether N-terminal mutations reduce interaction with DnaJ Since MS2-L likely forms large oligomeric pores (>10 subunits) in the membrane, maintaining correct protein in1.Phage L protein sequence

Computational Workflow 1.Phage L protein sequence 2.BLAST Search (find homologous lysis proteins) 3.Multiple Sequence Alignment (Clustal Omega) identify conserved vs mutable residues 4.ESM Mutation Scanning (generate mutation heatmaps) 5.Select Candidate Mutations (stability or N-terminal modifications) 6.Structure Prediction (ESMFold) 7.Complex/Oligomer Prediction (AlphaFold Multimer) 8.Final Mutant Candidates (stable + functional lysis protein) 3. Proposed Engineering Pipeline Computational workflow we will follow. 4. Expected Outcomes Our pipeline aims to produce engineered variants of the MS2 L protein with: Increased structural stability Reduced aggregation risk Maintained transmembrane insertion Potentially reduced dependency on host DnaJ These optimized proteins could be useful in applications such as: Synthetic phage engineering Bacterial ghost cell production Antimicrobial protein development 5. Potential Pitfalls 5.1 Limited Training Data Most protein language models and structural predictors are trained primarily on globular proteins, not small transmembrane phage toxins. This may reduce prediction accuracy for MS2 L. 5.2 Risk of Over-Stabilization Mutations designed to increase stability may cause: Protein aggregation Improper membrane insertion Loss of functional oligomerization Thus stability must be balanced with function. 5.3 Poor Annotation of Amurin Proteins Single-gene lysis proteins (also called amurins) are poorly annotated in sequence databases. This may limit the quality of homologous sequences used for alignment and training. 5.4 Host Protease Sensitivity Mutations may unintentionally expose protease cleavage sites, making the engineered protein less stable inside bacterial cells. 6. Future Work If promising mutants are identified computationally, the next steps would include: Experimental expression in E. coli Measuring lysis timing Measuring protein stability Testing DnaJ independence This would validate whether computational predictions translate into improved biological function.

Continue: Week 5: HW Protein Design Part II