Group Final Project

PROJECT OBJECTIVE

  • Engineer the L protein of the MS2 phage to increase structural stability.
  • Disrupt or reduce its interaction with the bacterial chaperone DnaJ.
  • Preserve the C-terminal lysis domain to maintain lytic function.
  • Avoid mutations that interfere with structurally or evolutionarily coupled residues.

Phase 1: Mapping the DnaJ Interaction Interface

Since the exact binding interface between the L protein and DnaJ is unknown, the first step is to identify it computationally rather than introducing arbitrary mutations.

  • Use AlphaFold-Multimer to model the complex between L protein and DnaJ.
  • Generate multiple structural predictions and select the top-ranked models.
  • Identify consensus interface residues that consistently appear in the predicted binding interface.
  • Perform in silico alanine scanning of the N-terminal residues in the complex to determine which residues significantly contribute to binding energy (ΔΔG).
  • Analyze whether the N-terminal region resembles known DnaJ-binding motifs, typically hydrophobic residues flanked by basic amino acids.

This phase identifies the residues that are critical for the interaction, so that they can be targeted deliberately rather than mutated at random.
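As a rough illustration of two of the steps above, the sketch below computes consensus interface residues across several predicted models and scans the N-terminal region for hydrophobic residues with a nearby basic residue. The interface sets are hypothetical placeholders (in practice they would be extracted from the AlphaFold-Multimer models), and the residue alphabets, the 80% consensus threshold, and the two-residue window are assumptions chosen for the example.

```python
from collections import Counter

# Residues 1-40 of the L protein (everything before the last 35 residues).
N_TERM = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"

HYDROPHOBIC = set("LIVFMWY")  # assumed hydrophobic alphabet
BASIC = set("KR")             # assumed basic alphabet


def consensus_interface(models, min_fraction=0.8):
    """Return residue numbers present at the interface in at least
    min_fraction of the predicted models."""
    counts = Counter(res for residues in models for res in set(residues))
    threshold = min_fraction * len(models)
    return sorted(res for res, n in counts.items() if n >= threshold)


def motif_candidates(seq, window=2):
    """Return (position, residue) for hydrophobic residues that have a basic
    residue within `window` positions on either side (1-based positions)."""
    hits = []
    for i, aa in enumerate(seq):
        if aa not in HYDROPHOBIC:
            continue
        neighbours = seq[max(0, i - window):i] + seq[i + 1:i + 1 + window]
        if any(n in BASIC for n in neighbours):
            hits.append((i + 1, aa))
    return hits


# Hypothetical interface residue sets from five predicted models.
models = [
    {18, 19, 20, 22, 23},
    {18, 19, 20, 22},
    {19, 20, 22, 23, 27},
    {18, 19, 20, 22, 23},
    {19, 20, 22, 30},
]

print(consensus_interface(models))  # [19, 20, 22]
print(motif_candidates(N_TERM))     # [(5, 'F'), (22, 'F')]
```

The motif scan is only a crude filter; it flags candidates for closer inspection and does not show that a residue binds DnaJ.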

Phase 2: Targeted N-Terminal Redesign

Instead of deleting regions or performing extensive random substitutions, introduce controlled amino acid substitutions that disrupt the DnaJ interaction while preserving structural stability.

  • Focus on charge inversion strategies:

    • Basic residues (K, R) → Acidic residues (E, D)
    • Acidic residues (E, D) → Basic residues (K, R)
  • Disrupt hydrophobic interaction patches:

    • Hydrophobic residues (L, I, V, F) → Polar residues (S, T, N, Q)
    • Aromatic residues (F, Y, W) → Aliphatic or small residues
  • Generate a graded library of variants:

    • Minor charge modifications
    • Moderate interface perturbations
    • Strong hydrophobic disruption
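A minimal sketch of how single-substitution variants could be enumerated from these rules. The interface positions are hypothetical (they would come from Phase 1), and choosing one replacement per residue type (for example K to E rather than K to D) is an assumption made to keep the example small.

```python
# Wild-type L protein sequence (UniProt P03609).
WT = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"
      "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")

# One illustrative replacement per residue type, following the rules above.
CHARGE_INVERSION = {"K": "E", "R": "E", "E": "K", "D": "K"}
HYDROPHOBIC_TO_POLAR = {"L": "S", "I": "T", "V": "T", "F": "S"}


def single_variants(seq, positions, rules):
    """Yield (label, mutant_sequence) for each 1-based position whose
    wild-type residue has a replacement defined in `rules`."""
    for pos in positions:
        wt = seq[pos - 1]
        if wt in rules:
            mut = rules[wt]
            yield f"{wt}{pos}{mut}", seq[:pos - 1] + mut + seq[pos:]


# Hypothetical interface positions, standing in for the Phase 1 output.
interface_positions = [18, 19, 20, 22, 23]

charge_variants = dict(single_variants(WT, interface_positions, CHARGE_INVERSION))
hydrophobic_variants = dict(single_variants(WT, interface_positions, HYDROPHOBIC_TO_POLAR))

print(list(charge_variants))       # ['R18E', 'R19E', 'R20E', 'K23E']
print(list(hydrophobic_variants))  # ['F22S']
```

Combining these single substitutions in increasing numbers would give the graded series from minor to strong perturbation.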

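Scoring this library for both predicted DnaJ binding and predicted stability makes it possible to identify a Pareto front of variants, that is, variants for which reduced DnaJ interaction cannot be improved further without losing protein stability.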
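The Pareto selection can be sketched as follows. The variant names and scores are hypothetical placeholders. The sign conventions are also assumptions: a higher binding ΔΔG is taken to mean weaker predicted DnaJ binding (desired), and a lower folding ΔΔG is taken to mean a more stable fold (desired).

```python
def pareto_front(variants):
    """Return names of variants not dominated by any other variant.

    Each variant is (name, ddg_binding, ddg_fold). A variant is dominated
    if another one is at least as good on both objectives and strictly
    better on at least one.
    """
    front = []
    for name, bind, fold in variants:
        dominated = any(
            b >= bind and f <= fold and (b > bind or f < fold)
            for _, b, f in variants
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores (kcal/mol), for illustration only.
variants = [
    ("minor_charge", 0.5, 0.1),
    ("moderate_interface", 1.5, 0.6),
    ("strong_hydrophobic", 3.0, 2.0),
    ("poor_design_1", 0.4, 1.5),
    ("poor_design_2", 1.4, 0.9),
]

print(pareto_front(variants))
# ['minor_charge', 'moderate_interface', 'strong_hydrophobic']
```

The two discarded designs are each beaten on both objectives by another variant, so they offer no useful trade-off.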

Phase 3: Stability and Functional Filtering

To ensure that redesigned variants remain structurally viable and functionally relevant:

  • Use Rosetta or FoldX to calculate ΔΔG and verify that mutations do not destabilize the overall protein fold.

  • Confirm that mutations in the N-terminal region do not propagate structural stress toward the C-terminal lysis domain.

  • Perform co-evolutionary analysis (e.g., EVcouplings):

    • Identify residue pairs that co-evolved between the N-terminal and C-terminal regions.
    • Avoid mutating co-evolved residues independently to prevent functional disruption.
  • Evaluate aggregation propensity using tools such as Aggrescan3D to ensure that mutations do not create exposed hydrophobic patches leading to cytoplasmic aggregation.

  • Assess sequence plausibility using protein language models such as ESM to filter out unlikely or non-natural variants.
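One possible way to combine these filters is sketched below. The thresholds, the co-evolved pairs, and the variant scores are all hypothetical placeholders. In practice the values would come from Rosetta or FoldX, EVcouplings, and Aggrescan3D, whose output formats are not reproduced here.

```python
def violates_coupling(mutated_positions, coupled_pairs):
    """True if exactly one member of any co-evolved pair is mutated."""
    mutated = set(mutated_positions)
    return any((i in mutated) != (j in mutated) for i, j in coupled_pairs)


def passes_filters(variant, coupled_pairs, max_ddg_fold=1.0, max_aggregation=0.0):
    """Apply stability, aggregation and co-evolution filters to one variant.

    Both cutoffs are placeholder values; lower scores are assumed better.
    """
    return (
        variant["ddg_fold"] <= max_ddg_fold
        and variant["aggregation"] <= max_aggregation
        and not violates_coupling(variant["positions"], coupled_pairs)
    )


# Hypothetical co-evolved residue pairs and variant scores.
coupled_pairs = [(23, 50), (9, 61)]
variants = [
    {"name": "A", "positions": [23], "ddg_fold": 0.3, "aggregation": -0.5},
    {"name": "B", "positions": [23, 50], "ddg_fold": 0.4, "aggregation": -0.2},
    {"name": "C", "positions": [18], "ddg_fold": 2.5, "aggregation": -0.4},
    {"name": "D", "positions": [19], "ddg_fold": 0.2, "aggregation": -0.6},
]

kept = [v["name"] for v in variants if passes_filters(v, coupled_pairs)]
print(kept)  # ['B', 'D']
```

Variant A is rejected because it changes only one residue of a coupled pair, and variant C because its predicted destabilization exceeds the cutoff.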

Key Limitations:

  • The DnaJ binding mode may be transient or dynamic, reducing AlphaFold-Multimer accuracy.
  • Protein language model scores do not guarantee in vivo functionality.
  • Intrinsically disordered regions may not be accurately modeled.
  • Computational predictions must ultimately be validated experimentally.

From WEEK 5 HW:

High-level summary: The objective of this assignment is to improve the stability and autonomous folding of the lysis protein of the MS2 phage. Understanding this protein's lysis mechanism is relevant to how phages could help address antibiotic resistance.

Lysis Protein Sequence (UniProtKB ID: P03609, https://www.uniprot.org/uniprotkb/P03609/entry):

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Note: The lysis protein contains a soluble N-terminal domain followed by a transmembrane domain (blue; the last 35 residues). The transmembrane domain affects the lysis activity. The soluble domain (green) is responsible for the interaction with DnaJ.
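For reference, the domain boundary described in the note can be expressed in a few lines. This is a small sketch that takes "last 35 residues" literally; the hydrophobic alphabet is an assumption.

```python
L_PROTEIN = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"
             "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")
TM_LENGTH = 35  # "last 35 residues", as stated in the note

soluble = L_PROTEIN[:-TM_LENGTH]        # residues 1-40
transmembrane = L_PROTEIN[-TM_LENGTH:]  # residues 41-75


def hydrophobic_fraction(seq, hydrophobic="AILMFVWY"):
    """Fraction of residues belonging to an assumed hydrophobic alphabet."""
    return sum(aa in hydrophobic for aa in seq) / len(seq)


print(len(L_PROTEIN), len(soluble), len(transmembrane))  # 75 40 35
print(f"soluble: {hydrophobic_fraction(soluble):.2f}")
print(f"transmembrane: {hydrophobic_fraction(transmembrane):.2f}")
```

With this split, the soluble domain covers residues 1-40 and the transmembrane domain covers residues 41-75.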

L-Protein Engineering | Option 1: Mutagenesis

STEP 1:

A multiple sequence alignment of homologous L-protein sequences was performed using Clustal Omega to identify conserved and variable regions across related bacteriophages. The alignment revealed that the transmembrane region, located in the C-terminal portion of the protein, is highly conserved, particularly in residues forming a hydrophobic helix (LYVLIFLAIFLSKFTNQLLLSLL). This high level of conservation suggests a critical functional role in membrane insertion and pore formation during bacterial lysis. In contrast, the N-terminal soluble region displayed greater sequence variability, indicating a higher tolerance to mutations. Based on these observations, conserved residues were avoided during mutational design, while more variable positions, especially in the soluble domain, were prioritized as potential targets for mutation.
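Conservation in an alignment can be quantified per column, for example with Shannon entropy. The sketch below uses a toy alignment (not the real Clustal Omega output) and an arbitrary entropy cutoff. It illustrates the idea and is not necessarily the procedure used for this assignment.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def variable_positions(alignment, min_entropy=1.0):
    """Return 1-based alignment columns whose entropy is >= min_entropy."""
    return [
        i for i, col in enumerate(zip(*alignment), start=1)
        if column_entropy(col) >= min_entropy
    ]


# Toy alignment of equal-length rows, for illustration only.
alignment = [
    "MSTLIF",
    "MATLIF",
    "MKTLIF",
    "MS-LIF",
]

print(variable_positions(alignment))  # [2]
```

Column numbers refer to the alignment, so they need to be mapped back to L-protein residue numbers when the alignment contains gaps.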

STEP 2:

To evaluate the effect of mutations across the L-protein sequence, a protein language model (ESM-2) was used to compute log-likelihood ratio (LLR) scores for all possible amino acid substitutions at each position. The LLR compares the model's probability for the substituted residue with its probability for the wild-type residue at the same position, based on sequence patterns learned from large protein datasets. A positive LLR means the model considers the substitution more likely than the wild-type residue, which is used here as a proxy for a mutation that is tolerated or beneficial for protein stability; a negative score suggests a deleterious effect. The results were compiled into a ranked list of candidate mutations to identify the positions and substitutions with the highest predicted improvement. These scores were then used as a primary filter to guide mutation selection, in combination with conservation analysis from the multiple sequence alignment.
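The LLR itself is simple arithmetic once per-position residue probabilities are available: LLR = log p(mutant) - log p(wild type). The sketch below assumes those probabilities have already been obtained from the language model. The numbers are hypothetical placeholders, and the way ESM-2 was queried is not specified in the write-up.

```python
import math


def llr_scores(wt_seq, position_probs):
    """Compute LLR = log p(mutant) - log p(wild type) for each substitution.

    position_probs maps a 1-based position to {amino_acid: probability}.
    """
    scores = {}
    for pos, probs in position_probs.items():
        wt = wt_seq[pos - 1]
        for aa, p in probs.items():
            if aa != wt:
                scores[f"{wt}{pos}{aa}"] = math.log(p) - math.log(probs[wt])
    return scores


WT = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"
      "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")

# Hypothetical model probabilities at position 50 (wild type K).
position_probs = {50: {"K": 0.05, "L": 0.40, "R": 0.05, "D": 0.01}}

scores = llr_scores(WT, position_probs)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}\t{score:+.2f}")
```

A substitution that is eight times more probable than the wild-type residue scores ln(8), about 2.08, and an equally probable one scores 0.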

The protein language model identified several mutations with high positive LLR scores, indicating potentially favorable substitutions. The top-ranked mutations were K50L (LLR = 2.56), C29R (LLR = 2.39), Y39L (LLR = 2.24), C29S (LLR = 2.04), and S9Q (LLR = 2.01). Additional high-scoring mutations, such as N53L (LLR = 1.86), T52L (LLR = 1.81), and A45L (LLR = 1.54), fall within the transmembrane domain and favor substitutions to hydrophobic residues. These results suggest that increasing hydrophobicity in the membrane region and selecting tolerated substitutions in variable regions may improve protein stability and folding.
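The reported scores can be ranked and annotated by domain with a short script. The LLR values are the ones quoted above. The domain boundary follows the "last 35 residues" definition of the transmembrane domain, which is an assumption about where exactly the boundary lies.

```python
import re

# LLR values quoted in the text.
reported_llr = {
    "K50L": 2.56, "C29R": 2.39, "Y39L": 2.24, "C29S": 2.04, "S9Q": 2.01,
    "N53L": 1.86, "T52L": 1.81, "A45L": 1.54,
}

SEQ_LENGTH = 75
TM_START = SEQ_LENGTH - 35 + 1  # residue 41, first of the last 35 residues


def region(mutation):
    """Classify a mutation such as 'K50L' by the domain of its position."""
    pos = int(re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation).group(1))
    return "transmembrane" if pos >= TM_START else "soluble"


ranked = sorted(reported_llr.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}\t{score:.2f}\t{region(name)}")
```

Under this boundary, K50L, N53L, T52L and A45L are transmembrane substitutions, while C29R, Y39L, C29S and S9Q lie in the soluble domain.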

STEP 3:

To assess how well the model predictions reflect real functional outcomes, the LLR scores were compared with available experimental lysis data for L-protein mutants. While some overlap between high-scoring mutations and experimentally tested variants was observed, many of the top-ranked mutations identified by the model were not present in the experimental dataset. Therefore, the experimental data was used when available, but for many candidate mutations, selection relied primarily on LLR scores in combination with conservation analysis.
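The selection logic described here (use experimental data where it exists, otherwise fall back on LLR) can be sketched as below. The mutation labels, scores, and lysis outcomes are placeholders and do not correspond to real measurements. Treating a positive LLR as a prediction of retained lysis is likewise an assumption made for the example.

```python
def split_by_evidence(llr, experimental):
    """Partition candidates into those with experimental data and those
    without; untested candidates are returned sorted by descending LLR."""
    tested = {m: (llr[m], experimental[m]) for m in llr if m in experimental}
    untested = sorted(
        (m for m in llr if m not in experimental),
        key=lambda m: llr[m],
        reverse=True,
    )
    return tested, untested


# Placeholder labels and values, not real results.
llr = {"mutA": 2.5, "mutB": 1.9, "mutC": 1.2, "mutD": -0.8}
experimental = {"mutB": True, "mutD": False, "mutE": True}  # True = lysis retained

tested, untested = split_by_evidence(llr, experimental)

# Fraction of tested mutations where the sign of the LLR matches the outcome.
agreement = (
    sum((score > 0) == lysis for score, lysis in tested.values()) / len(tested)
    if tested else None
)

print(sorted(tested), untested, agreement)
```

With so few overlapping mutations, an agreement fraction of this kind is only a sanity check and not a validation of the model.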

STEP 4:

Based on the combined analysis of LLR scores, sequence conservation, and structural considerations, five variants were selected as potential candidates for improving the L-protein. In the soluble region, the mutations S9Q and K23R were chosen due to their high LLR scores and location in more variable regions, suggesting a higher tolerance for substitutions that may improve folding stability. In the transmembrane region, K50L and T52L were selected, as both mutations introduce more hydrophobic residues, which is consistent with the hydrophobic character of this conserved domain and may enhance membrane insertion and pore formation. Additionally, a combined mutant (S9Q + K50L) was designed to explore potential additive effects between improved folding in the soluble region and enhanced hydrophobicity in the transmembrane domain.
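To avoid transcription errors, the mutant sequences can be generated from the wild type rather than edited by hand. The helper below is a minimal sketch; its name is illustrative. It checks that the stated wild-type residue is actually present at each position before substituting.

```python
import re

WT = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"
      "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")


def apply_mutations(seq, mutations):
    """Apply substitutions such as 'S9Q' (1-based positions), verifying
    the wild-type residue before each change."""
    residues = list(seq)
    for mutation in mutations:
        wt, pos, mut = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation).groups()
        index = int(pos) - 1
        if residues[index] != wt:
            raise ValueError(
                f"{mutation}: expected {wt} at position {pos}, found {residues[index]}"
            )
        residues[index] = mut
    return "".join(residues)


designs = {
    "S9Q": ["S9Q"],
    "K23R": ["K23R"],
    "K50L": ["K50L"],
    "T52L": ["T52L"],
    "S9Q_K50L": ["S9Q", "K50L"],
}

for name, mutations in designs.items():
    print(f">{name}\n{apply_mutations(WT, mutations)}")
```

The wild-type check catches off-by-one numbering mistakes, which are easy to make when counting residues manually.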

FASTA SEQUENCES:

>WT_L_protein
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.44

>S9Q
METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>K23R
METRFPQQSQQTPASTNRRRPFRHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>K50L
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>T52L
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.46

>S9Q_K50L
METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

AlphaFold predictions were used to assess the structural impact of the selected mutations. The wild-type protein had a pTM score of 0.44, while most mutants had a value of 0.43, indicating no major change in the predicted fold. The T52L mutant had a slightly higher pTM score of 0.46. Because pTM is a confidence estimate for the predicted structure rather than a direct measure of thermodynamic stability, and because all values here are below 0.5, a difference of 0.02 should be read as at most a weak hint of improvement. It is nonetheless consistent with the introduction of a more hydrophobic residue in the transmembrane region, which may favor membrane insertion. Overall, these findings suggest that the proposed mutations are structurally tolerated; whether they improve protein stability remains to be tested.
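The comparison with the wild type can be tabulated directly from the pTM values reported above. This is a small bookkeeping sketch and adds no new data.

```python
# pTM values reported for each predicted structure.
ptm = {
    "WT": 0.44,
    "S9Q": 0.43,
    "K23R": 0.43,
    "K50L": 0.43,
    "T52L": 0.46,
    "S9Q_K50L": 0.43,
}

# Difference of each mutant relative to the wild type, rounded to 2 decimals.
delta = {
    name: round(score - ptm["WT"], 2)
    for name, score in ptm.items()
    if name != "WT"
}
best = max(delta, key=delta.get)

for name, d in delta.items():
    print(f"{name}\t{d:+.2f}")
print("largest increase:", best)  # T52L
```

All differences fall between -0.01 and +0.02, which is why the mutants are described as structurally similar to the wild type.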