Group Final Project

Part D: Group Brainstorm on Bacteriophage Engineering

Because our Node started later, we had limited time to form groups and arrange a meeting. The drafts from our group are therefore mainly individual work and have not yet been discussed together.

Goal

We target two complementary objectives: (A) increased stability of the L protein, specifically engineering DnaJ-independent variants that fold correctly without host chaperone assistance; and (B) higher toxicity / faster lysis, by optimizing the transmembrane oligomerization interface to accelerate pore formation. Goal A is a prerequisite for Goal B: a stable, chaperone-independent L is resistant to the best-documented E. coli escape mechanism (the DnaJ P330Q mutation), and faster lysis narrows the window for resistance acquisition.

Scientific Rationale

Three findings define our design space.

  1. DnaJ binds the highly basic N-terminal domain (res. 1–36) of L and relieves a steric inhibition blocking target engagement; removing this domain eliminates DnaJ dependency and accelerates lysis (Chamakura, J Bacteriol 2017).
  2. Near-saturating mutagenesis shows that the LS motif (Leu48-Ser49) and flanking residues form a heterotypic interface with an unknown target. Even highly conservative substitutions matter (L44V = dead, L44I = functional), and all mutations are recessive, pointing to a specific binding event rather than general membrane disruption (Chamakura, Microbiology 2017).
  3. MS2-L oligomerizes into 10+ mers in nanodisc membranes via its TM domain; cryo-EM shows large envelope lesions starting at the outer membrane (Mezhyrova et al., 2023).

Strategy: neutralize basic charges in Domain 1 so DnaJ is no longer required, while leaving Domains 2–4 (the lytic machinery) untouched.
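
The strategy above can be sketched as a simple sequence operation. This is only an illustration under stated assumptions: the K/R-to-Q replacement is our placeholder choice (not taken from the cited papers), histidine is left untouched, and the 75-residue sequence below is a toy stand-in, not the real MS2 L sequence.

```python
# Minimal sketch: neutralize basic residues in Domain 1 (res. 1-36) only.
# ASSUMPTION: K->Q and R->Q are placeholder substitutions chosen for illustration.
DEFAULT_REPLACEMENTS = {"K": "Q", "R": "Q"}


def neutralize_domain1(seq, domain_end=36, replacements=None):
    """Return a variant with basic residues replaced in positions 1..domain_end.

    Positions after domain_end (Domains 2-4) are copied unchanged.
    """
    replacements = replacements or DEFAULT_REPLACEMENTS
    head = "".join(replacements.get(aa, aa) for aa in seq[:domain_end])
    return head + seq[domain_end:]


def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E (histidine ignored)."""
    return sum((aa in "KR") - (aa in "DE") for aa in seq)


# Toy 75-residue sequence (hypothetical, NOT the real L protein).
toy_l = "MKRA" * 9 + "LSKR" + "A" * 35
variant = neutralize_domain1(toy_l)

print(net_charge(toy_l[:36]), "->", net_charge(variant[:36]))
```

Keeping the untouched region as a plain slice makes it easy to verify that Domains 2–4 are byte-for-byte identical between wild type and variant before any structure prediction is run.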

Computational Tools

| Tool | Application | Why it helps |
| --- | --- | --- |
| Clustal Omega | Align L homologs to identify which amino acids are freely mutable | Reproduces and extends the LS-motif alignment from Chamakura (2017). Essential first step: tells us where NOT to mutate. |
| ESMFold | Predict the 3D structure of L and of each designed variant; verify the TM helix remains intact after mutations | Fast single-sequence predictor. For a 75 aa peptide with few homologs, much more practical than full AlphaFold for screening many candidates. |
| AlphaFold-Multimer | Model the L–DnaJ complex; confirm charge-neutralized variants show reduced interface confidence. Also model L–L homodimers to check TM packing. | Key validation for Goal A: if the predicted L–DnaJ interface weakens for our variants, that supports DnaJ independence. |
| ProteinMPNN | Inverse folding: redesign Domain 1 (res. 1–36) to be uncharged while fitting the ESMFold-predicted backbone. Domains 2–4 fixed as hard constraints. | Designs a new sequence for an existing fold with position-specific constraints. Generates diverse candidates we can then filter with ESM-2. |
| ESM-2 | Zero-shot fitness scoring: rank all candidate variants by pseudo-log-likelihood as a sequence-level sanity check | Independent of structure prediction. Benchmarked first against known mutants: if it captures L biology, we use it to filter; if not, we rely on conservation alone. |
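
The "benchmark first, then decide" rule for the zero-shot scorer can be written down as a small check. The sketch below is model-agnostic: it takes precomputed scores (for example, pseudo-log-likelihoods) and asks how often a functional mutant outscores a dead one. The function names, the 0.8 threshold, and the numeric scores in the usage example are all hypothetical placeholders; only the L44I/L44V phenotypes come from the mutagenesis data cited above.

```python
from itertools import product


def pairwise_auc(scores, phenotypes):
    """Fraction of (functional, dead) pairs where the functional mutant scores higher.

    scores:     dict mutant -> model score (higher = predicted fitter)
    phenotypes: dict mutant -> True if functional, False if dead
    Ties count as half a win, so 0.5 means no better than chance.
    """
    functional = [scores[m] for m, ok in phenotypes.items() if ok]
    dead = [scores[m] for m, ok in phenotypes.items() if not ok]
    if not functional or not dead:
        raise ValueError("need at least one functional and one dead mutant")
    wins = sum(
        1.0 if f > d else 0.5 if f == d else 0.0
        for f, d in product(functional, dead)
    )
    return wins / (len(functional) * len(dead))


def choose_filter(scores, phenotypes, threshold=0.8):
    """Use the model as a filter only if it separates known mutants well enough.

    ASSUMPTION: the 0.8 threshold is an arbitrary illustrative cutoff.
    """
    if pairwise_auc(scores, phenotypes) >= threshold:
        return "esm2+conservation"
    return "conservation-only"


# Phenotypes from the cited mutagenesis; scores are made-up placeholders.
known = {"L44I": True, "L44V": False}
placeholder_scores = {"L44I": -1.0, "L44V": -3.0}
print(choose_filter(placeholder_scores, known))
```

With only a handful of known mutants the estimate is very coarse, so in practice the benchmark set would need to include as many characterized substitutions as the mutagenesis data provide.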

Schematic

[Figure: Schematic Bacteriophage]

Pitfalls

We cannot computationally model the most critical interaction: that between L and its unidentified host target. ML models may also fail to capture L biology, because L is a 75 aa phage toxin with very few homologs, far outside the training distribution of ESM-2 and AlphaFold.