Group Final Project
Part D: Group Brainstorm on Bacteriophage Engineering
Due to the late start of our Node, we had limited time to form groups and schedule a meeting; the drafts from our group are therefore mainly individual work and have not yet been discussed together.
Goal
We target two complementary objectives: (A) increased stability of the L protein, specifically engineering DnaJ-independent variants that fold correctly without host chaperone assistance; and (B) higher toxicity / faster lysis, by optimizing the transmembrane oligomerization interface to accelerate pore formation. Goal A is a prerequisite for Goal B: a stable, chaperone-independent L is resistant to the most documented E. coli escape mechanism (DnaJ P330Q mutation), and faster lysis narrows the window for resistance acquisition.
Scientific Rationale
Three findings define our design space.
- DnaJ binds the highly basic N-terminal domain (res. 1–36) of L and relieves a steric inhibition blocking target engagement; removing this domain eliminates DnaJ dependency and accelerates lysis (Chamakura, J Bacteriol 2017).
- Near-saturating mutagenesis shows the LS motif (Leu48-Ser49) and flanking residues form a heterotypic interface with an unknown target; even very conservative substitutions matter (L44V = dead, L44I = functional), and all are recessive, pointing to a specific binding event rather than nonspecific membrane disruption (Chamakura, Microbiology 2017).
- MS2-L oligomerizes into 10+ mers in nanodisc membranes via its TM domain; cryo-EM shows large envelope lesions starting at the outer membrane (Mezhyrova et al., 2023).
Strategy: neutralize basic charges in Domain 1 so DnaJ is no longer required, while leaving Domains 2–4 (the lytic machinery) untouched.
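As a minimal sketch of this strategy, the snippet below replaces basic residues (K/R) only within residues 1–36 and leaves everything downstream untouched. Several things here are assumptions for illustration and not part of our actual design: the choice of glutamine as the replacement, the crude charge count (histidine is ignored), the function names, and the toy sequence, which is a placeholder and not the real MS2 L sequence.

```python
# Sketch only: neutralize basic residues in Domain 1 (residues 1-36).
# The replacement residue "Q" and the toy sequence are assumptions.
BASIC = {"K", "R"}


def neutralize_domain1(seq, start=1, end=36, replacement="Q"):
    """Replace K/R in positions start..end (1-based, inclusive).

    Returns the variant sequence and a list of mutation labels (e.g. "K2Q").
    Residues outside the window are never modified.
    """
    residues = list(seq)
    mutations = []
    for pos in range(start, min(end, len(seq)) + 1):
        aa = residues[pos - 1]
        if aa in BASIC:
            residues[pos - 1] = replacement
            mutations.append(f"{aa}{pos}{replacement}")
    return "".join(residues), mutations


def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E (His ignored)."""
    positive = sum(seq.count(a) for a in "KR")
    negative = sum(seq.count(a) for a in "DE")
    return positive - negative


# Placeholder sequence (NOT the real L protein): 36 "Domain 1" residues
# followed by 5 residues standing in for Domains 2-4.
toy = "MKRAS" + "A" * 31 + "KLSRD"
variant, muts = neutralize_domain1(toy)
print(muts)                       # mutations introduced in Domain 1
print(net_charge(variant[:36]))   # Domain 1 charge after neutralization
```

In practice one would probably enumerate partial combinations of these substitutions rather than neutralizing every basic residue at once, since the fully neutralized variant may not be the best folder.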
Computational Tools
| Tool | Application | Why it helps |
|---|---|---|
| Clustal Omega | Align L homologs to identify which amino acids are freely mutable | Reproduces and extends the LS-motif alignment from Chamakura (Microbiology 2017). Essential first step: tells us where NOT to mutate. |
| ESMFold | Predict the 3D structure of wild-type L and of each designed variant; verify the TM helix remains intact after mutations | Fast single-sequence predictor. For a 75 aa peptide with few homologs, much more practical than full AlphaFold for screening many candidates. |
| AlphaFold-Multimer | Model the L–DnaJ complex; confirm charge-neutralized variants show reduced interface confidence. Also model L–L homodimers to check TM packing. | Key validation for Goal A: if predicted L–DnaJ interface weakens for our variants, that supports DnaJ independence. |
| ProteinMPNN | Inverse folding: redesign Domain 1 (res. 1–36) to be uncharged while fitting the ESMFold-predicted backbone. Domains 2–4 fixed as hard constraints. | Designs a new sequence for an existing fold under position-specific constraints. Generates diverse candidates we can then filter with ESM-2. |
| ESM-2 | Zero-shot fitness scoring: rank all candidate variants by pseudo-log-likelihood as a sequence-level sanity check | Independent of structure prediction. Benchmarked first against known mutants: if it captures L biology, we use it to filter; if not, we rely on conservation alone. |
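The benchmarking step for the zero-shot scorer could look roughly like the sketch below: given a score per known mutant and its observed phenotype, compute how often a functional mutant outscores a non-functional one (AUROC) and only use the scorer for filtering if that value clears a cutoff. All numeric scores here are made-up placeholders, not model output; the 0.8 cutoff, the function names, and the `placeholder_*` entries are assumptions. Only L44I (functional) and L44V (dead) are taken from the literature cited above.

```python
# Sketch only: decide whether a sequence scorer separates known functional
# mutants from known dead ones. Scores below are invented placeholders.


def auroc(scores, labels):
    """Fraction of (functional, dead) pairs where the functional mutant
    has the higher score; ties count as half a win."""
    functional = [s for s, ok in zip(scores, labels) if ok]
    dead = [s for s, ok in zip(scores, labels) if not ok]
    if not functional or not dead:
        raise ValueError("need at least one functional and one dead mutant")
    wins = sum((f > d) + 0.5 * (f == d) for f in functional for d in dead)
    return wins / (len(functional) * len(dead))


def use_for_filtering(auc, cutoff=0.8):
    """Assumed decision rule: trust the scorer only above the cutoff."""
    return auc >= cutoff


# mutant -> (hypothetical score, functional?)
benchmark = {
    "L44I": (-1.2, True),            # phenotype from Chamakura, Microbiology 2017
    "L44V": (-3.5, False),           # phenotype from Chamakura, Microbiology 2017
    "placeholder_1": (-0.8, True),   # invented entry
    "placeholder_2": (-2.9, False),  # invented entry
    "placeholder_3": (-3.0, True),   # invented entry
}

scores = [s for s, _ in benchmark.values()]
labels = [ok for _, ok in benchmark.values()]
auc = auroc(scores, labels)
print(f"AUROC = {auc:.2f}; use scorer for filtering: {use_for_filtering(auc)}")
```

If the scorer fails this check on the real mutagenesis data, the fallback described in the table applies: rank candidates by conservation alone.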
Schematic

Pitfalls
We cannot model the most critical interaction (L with its unidentified host target) computationally. ML models may not capture L biology, as L is a 75 aa phage toxin with very few homologs, far outside the training distribution of ESM-2 and AlphaFold.