Week 5 HW: Protein Design Part 2


Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

Sequence:

  • MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Binder         Pseudo-perplexity
WRYYVAAVALWX   16.2395343215211
WHYYAVAAEWKX   13.6945052943038
WLVPAAAAAHGK   7.93406338297721
WRYGPVAVRHWK   14.3797355975152
FLYRWLPSRRGG   20.6352312728361
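
For reference, pseudo-perplexity as commonly defined for masked language models is the exponential of the mean negative log-probability of each residue when that residue alone is masked; lower values mean the model finds the sequence more plausible. The sketch below is a minimal illustration of that formula only. It assumes the per-position log-probabilities have already been obtained, and the example values are made up rather than taken from PepMLM.

```python
import math

def pseudo_perplexity(log_probs):
    """Return exp(mean negative log-probability) over masked positions.

    log_probs: natural-log probabilities of the true residue at each
    position, each obtained with only that position masked (assumed input).
    """
    if not log_probs:
        raise ValueError("need at least one position")
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical log-probabilities for a 4-residue peptide (illustration only).
example = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(round(pseudo_perplexity(example), 4))

# A model that is uniform over 20 amino acids scores about 20 for any length.
uniform = [math.log(1 / 20)] * 12
print(round(pseudo_perplexity(uniform), 4))
```

Against that uniform baseline of about 20, the values in the table range from roughly 7.9 to 20.6.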

Part 2: Evaluate Binders with AlphaFold3

Note: PepMLM emitted X (unknown residue) at the final position of two peptides; X was replaced with A for AlphaFold3 input.
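
The substitution is a plain string replacement. A minimal sketch, where the helper name `replace_unknown` is hypothetical and alanine is the replacement chosen in this homework:

```python
def replace_unknown(seq, replacement="A"):
    """Replace the unknown-residue symbol X with a standard amino acid."""
    return seq.upper().replace("X", replacement)

# The four PepMLM outputs from Part 1.
pepmlm_binders = ["WRYYVAAVALWX", "WHYYAVAAEWKX", "WLVPAAAAAHGK", "WRYGPVAVRHWK"]
af3_inputs = [replace_unknown(s) for s in pepmlm_binders]
print(af3_inputs)
```

Only the first two sequences change; the other two contain no X and pass through unmodified.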

Binder         ipTM score
WRYYVAAVALWA   0.34
WHYYAVAAEWKA   0.23
WLVPAAAAAHGK   0.48
WRYGPVAVRHWK   0.36
FLYRWLPSRRGG   0.38
  • Across the five AlphaFold3 complex predictions, ipTM values ranged from 0.23 to 0.48. The known binder FLYRWLPSRRGG gave ipTM = 0.38 and appeared weakly defined, remaining largely surface-adjacent/partially detached rather than buried in a clear pocket, with no obvious localization near the N-terminus where A4V sits. Three PepMLM peptides (WHYYAVAAEWKA, 0.23; WRYYVAAVALWA, 0.34; WRYGPVAVRHWK, 0.36) similarly showed low-confidence interfaces, tending to lie loosely on the β-barrel exterior instead of concentrating at the A4V region. In contrast, WLVPAAAAAHGK produced the strongest prediction (ipTM = 0.48) and appeared more plausibly docked along a β-barrel/loop-adjacent surface, making it the only PepMLM-generated peptide that exceeded the known binder’s ipTM in this set.
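
The comparison against the known binder can be checked with a few lines of Python. This is only a bookkeeping sketch over the ipTM values reported above; the variable names are illustrative.

```python
# ipTM values from the AlphaFold3 runs reported in this section.
iptm = {
    "WRYYVAAVALWA": 0.34,
    "WHYYAVAAEWKA": 0.23,
    "WLVPAAAAAHGK": 0.48,
    "WRYGPVAVRHWK": 0.36,
    "FLYRWLPSRRGG": 0.38,  # known binder, used as the reference
}
known = "FLYRWLPSRRGG"

# Rank all peptides from highest to lowest ipTM.
ranked = sorted(iptm.items(), key=lambda kv: kv[1], reverse=True)

# Keep the generated peptides that beat the known binder.
better = [pep for pep, score in ranked if pep != known and score > iptm[known]]

for pep, score in ranked:
    print(f"{pep}  {score:.2f}")
print("Exceeds known binder:", better)
```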

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Peptide        Predicted binding affinity   Solubility        Hemolysis probability   Net charge (pH 7)   Molecular weight
WRYYVAAVALWA   Medium binding (8.472)       Soluble (0.999)   Non-hemolytic (0.178)   0.76                1468.7
WHYYAVAAEWKA   Weak binding (5.407)         Soluble (1.000)   Non-hemolytic (0.042)   -0.15               1494.6
WLVPAAAAAHGK   Weak binding (5.328)         Soluble (1.000)   Non-hemolytic (0.019)   0.85                1191.4
WRYGPVAVRHWK   Weak binding (5.615)         Soluble (1.000)   Non-hemolytic (0.018)   2.85                1554.8
FLYRWLPSRRGG   Weak binding (5.968)         Soluble (1.000)   Non-hemolytic (0.047)   2.76                1507.7
  • Comparing PeptiVerse to the AlphaFold3 complexes, the two signals only partially agree. Structurally, WLVPAAAAAHGK had the highest interface confidence (ipTM = 0.48), but PeptiVerse predicts only weak binding (5.328), so higher ipTM did not map cleanly to higher predicted affinity in this set. Instead, the strongest PeptiVerse binder is WRYYVAAVALWA with medium binding (8.472), even though its AF3 interface confidence was lower (ipTM = 0.34) and its pose looked more loosely surface-associated. Encouragingly, all peptides are predicted soluble (0.999–1.000) and non-hemolytic (hemolysis probabilities 0.018–0.178), so there is no obvious developability red flag from solubility or hemolysis. Balancing predicted binding and therapeutic properties, WRYYVAAVALWA stands out as the best overall: it has the strongest predicted affinity, remains soluble, and is non-hemolytic with a moderate net charge (+0.76).

  • Peptide to advance: WRYYVAAVALWA — it offers the best binding/developability tradeoff (highest predicted affinity while still soluble and non-hemolytic), and its AF3 pose is at least compatible with a surface-binding interaction even if the interface confidence is modest.
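
To put a number on how loosely ipTM and predicted affinity track each other, one option is a Spearman rank correlation over the five peptides. The sketch below uses the values reported above and a hand-rolled rank formula that assumes no tied values. With only five points it is purely descriptive, not a statistical test.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Order: WRYYVAAVALWA, WHYYAVAAEWKA, WLVPAAAAAHGK, WRYGPVAVRHWK, FLYRWLPSRRGG
iptm = [0.34, 0.23, 0.48, 0.36, 0.38]
affinity = [8.472, 5.407, 5.328, 5.615, 5.968]

rho = spearman(iptm, affinity)
print(f"Spearman rho = {rho:.2f}")
```

A weakly negative value is consistent with the observation that the top peptide by ipTM is the weakest by predicted affinity.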

Part 4: Generate Optimized Peptides with moPPIt

Peptide        Hemolysis   Non-Fouling   Solubility   Affinity    Motif       Specificity
RFEEEERRRRRA   0.9574970   0.9754648     0.8333333    5.5774255   0.0013071   0.9743590
GKCTLNNSQCQV   0.8824576   0.6816904     0.8333333    6.0879078   0.8724478   0.6346154
PSPKKKRRKRCL   0.9656924   0.9467938     0.7500000    6.5786953   0.0183732   0.9743590
DEKDDDHTCHEK   0.9114821   0.8636953     1.0000000    5.7854242   0.7418348   0.7628205
  • moPPIt produced more “designed” peptides than PepMLM: instead of mostly natural-looking, Ala-rich 12-mers, it generated strongly biased, highly charged sequences (very cationic for PSPKKKRRKRCL, blocks of Glu and Arg for RFEEEERRRRRA, and acidic for DEKDDDHTCHEK), likely because it was optimizing multiple objectives at once. With hemolysis, non-fouling, solubility, affinity, motif, and specificity all enabled (weights = 1; motif positions 1–4), the peptides generally looked developable: non-fouling scores were high (0.68–0.98) and solubility was good (0.75–1.00). Predicted affinity ranked highest for PSPKKKRRKRCL (6.58), then GKCTLNNSQCQV (6.09), DEKDDDHTCHEK (5.79), and RFEEEERRRRRA (5.58), while motif scores varied widely (about 0.001 for RFEEEERRRRRA and 0.018 for PSPKKKRRKRCL, versus 0.87 for GKCTLNNSQCQV and 0.74 for DEKDDDHTCHEK). Before any real advancement, I would next run AlphaFold3 to confirm the peptides actually dock to SOD1 (ipTM + binding site), then validate with basic binding experiments (SPR/BLI/MST), plus solubility, hemolysis/cytotoxicity, and stability tests, since model scores don’t guarantee true binding in biology.
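
The ranges and ranking quoted above can be reproduced from the score table with a short script. This is a bookkeeping sketch only: the column names are labels chosen here, and no assumption is made about how moPPIt combines the objectives internally.

```python
# moPPIt scores as reported in the table above.
columns = ("hemolysis", "non_fouling", "solubility", "affinity", "motif", "specificity")
scores = {
    "RFEEEERRRRRA": (0.9574970, 0.9754648, 0.8333333, 5.5774255, 0.0013071, 0.9743590),
    "GKCTLNNSQCQV": (0.8824576, 0.6816904, 0.8333333, 6.0879078, 0.8724478, 0.6346154),
    "PSPKKKRRKRCL": (0.9656924, 0.9467938, 0.7500000, 6.5786953, 0.0183732, 0.9743590),
    "DEKDDDHTCHEK": (0.9114821, 0.8636953, 1.0000000, 5.7854242, 0.7418348, 0.7628205),
}

def column(name):
    """Return {peptide: value} for one named score column."""
    idx = columns.index(name)
    return {pep: vals[idx] for pep, vals in scores.items()}

# Peptides ordered from highest to lowest predicted affinity.
affinity = column("affinity")
by_affinity = sorted(affinity, key=affinity.get, reverse=True)
print("Affinity ranking:", by_affinity)

# Min/max per column, to check the ranges quoted in the text.
for name in columns:
    vals = column(name).values()
    print(f"{name:12s} {min(vals):.2f} to {max(vals):.2f}")
```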

Part C: Final Project: L-Protein Mutants

Soluble (1–40):

  1. P13L — LLR +0.100
  2. S15A — LLR +0.036
  3. R30L — LLR −0.130

Transmembrane (41–75):

  1. A45P — LLR +0.038
  2. I46F — LLR −0.096
  • Every mutation is supported by lab evidence (lysis happens and the protein is detectable).
  • LLR scores are mostly near-neutral to mildly positive; even the slightly negative ones are not extremely deleterious by the model, which is a reasonable sanity check.
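
For context on the sign convention, a log-likelihood ratio for a substitution is commonly computed as the log-probability the language model assigns to the mutant residue minus that of the wild-type residue at the same position. The sketch below assumes that definition; the probabilities are hypothetical placeholders, not ESM outputs for the L-protein.

```python
import math

def llr(p_mutant, p_wildtype):
    """Log-likelihood ratio: log p(mutant) - log p(wild type).

    Positive means the model prefers the mutant residue at that position;
    negative means it prefers the wild type.
    """
    return math.log(p_mutant) - math.log(p_wildtype)

# Hypothetical model probabilities at one position (illustration only).
print(round(llr(0.20, 0.10), 3))   # mutant favored, positive LLR
print(round(llr(0.05, 0.10), 3))   # mutant disfavored, negative LLR
```

On this scale, an LLR near zero means the model sees the mutant and wild-type residues as about equally likely.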

I prioritized variants that were lysis-positive in the experimental mutant dataset, and further preferred those with detectable protein levels, since the project goal includes improving stability/expression in addition to lysis. I then cross-referenced each candidate with the ESM mutational LLR scan as a plausibility check, avoiding strongly disfavored substitutions; however, when directly compared across the set of tested mutants, LLR showed no meaningful correlation with lysis outcomes (AUC ≈ 0.48), so the experimental phenotype was treated as the primary selection signal. To satisfy the assignment constraints, I selected three soluble-domain mutants and two transmembrane mutants.
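
The AUC mentioned above can be read as the probability that a randomly chosen lysis-positive mutant has a higher LLR than a randomly chosen lysis-negative one, so a value near 0.5 means LLR ranks the two groups no better than chance. A minimal sketch of that pairwise calculation follows. The LLR values in it are hypothetical and are not the assignment's mutant dataset.

```python
def roc_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores
    higher, counting ties as half a win. Equals the ROC AUC."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical LLR values, NOT the real experimental data.
lysis_pos = [0.10, -0.20]
lysis_neg = [0.00, -0.30]
auc = roc_auc(lysis_pos, lysis_neg)
print(auc)
```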