Week 5 HW: Protein Design Part II

Part 1: Generate Binders with PepMLM

Sequence Preparation

Retrieved WT SOD1 from UniProt P00441 and introduced A4V (position 5 in full sequence, position 4 in mature protein after Met cleavage):

WT:  M A T K A V C V L K ...
A4V: M A T K V V C V L K ...
             ^
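
The numbering offset between the mature protein (A4V) and the full sequence including Met1 is easy to get wrong, so a small check helps. The sketch below is illustrative only: `apply_mutation` and its `offset` argument are hypothetical helper names, and only the first ten residues shown above are used.

```python
def apply_mutation(seq: str, mutation: str, offset: int = 0) -> str:
    """Apply a point mutation such as 'A4V' to seq.

    offset is added to the residue number in the mutation string to get the
    1-based position in seq (offset=1 when the numbering skips Met1).
    """
    wt, pos, new = mutation[0], int(mutation[1:-1]) + offset, mutation[-1]
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


# First 10 residues of the precursor sequence shown above (Met1 included).
wt_prefix = "MATKAVCVLK"
a4v_prefix = apply_mutation(wt_prefix, "A4V", offset=1)
print(a4v_prefix)
```

The wild-type check raises an error if the offset is wrong, which catches an off-by-one before the sequence is sent to any model.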

PepMLM Results

Model: PepMLM-650M (ESM-2 based, Colab, T4 GPU) | Parameters: length = 12, num_binders = 4, top_k = 3

| Rank | Peptide | Pseudo-Perplexity |
|------|---------|-------------------|
| 1 | WRYPAVAAEHWK | 12.41 |
| 2 | WRYPAVAAEHWE | 13.54 |
| 3 | WRYYPAGVAWKE | 14.19 |
| 4 | WLYPVAVLRWKX | 17.70 |
| ref | FLYRWLPSRRGG | n/a (known SOD1 binder) |

Three of the four designs share a WRY N-terminal motif (aromatic + cationic). The best peptide (PPL = 12.41) and the known binder both feature aromatic-rich N-termini and positively charged residues, which may indicate electrostatic complementarity with SOD1’s surface.
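
The motif observation can be reproduced directly from the sequences. This is a minimal sketch; the helper names and the choice of F/W/Y as "aromatic" and K/R as "cationic" are assumptions made for illustration.

```python
designs = ["WRYPAVAAEHWK", "WRYPAVAAEHWE", "WRYYPAGVAWKE", "WLYPVAVLRWKX"]
known_binder = "FLYRWLPSRRGG"


def count_prefix(peptides, motif):
    """Number of peptides that start with the given motif."""
    return sum(p.startswith(motif) for p in peptides)


def count_residues(peptide, residues):
    """Number of positions in peptide occupied by any of the given residues."""
    return sum(peptide.count(r) for r in residues)


wry_count = count_prefix(designs, "WRY")

# Aromatic residues (F, W, Y) in the first four positions, and K/R over the whole peptide.
n_term_aromatics = {p: count_residues(p[:4], "FWY") for p in designs + [known_binder]}
cationic = {p: count_residues(p, "KR") for p in designs + [known_binder]}

print(f"{wry_count}/{len(designs)} designs start with WRY")
for p in designs + [known_binder]:
    print(p, "N-terminal aromatics:", n_term_aromatics[p], "K/R:", cationic[p])
```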


Part 2: Evaluate Binders with Boltz-2

Method

Each peptide was modeled as a complex with SOD1-A4V using Boltz-2 (via Amina CLI), a diffusion-based structure prediction model that produces ipTM scores comparable to AlphaFold-Multimer. SOD1-A4V was chain A and the peptide was chain B.

Note: Peptide 4 contained an ambiguous X residue; this was replaced with Ala for structure prediction.
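
The X-to-Ala replacement can be applied uniformly to every sequence before structure prediction. A minimal sketch follows; `replace_nonstandard` is a hypothetical helper, and choosing Ala as the substitute simply mirrors the note above.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def replace_nonstandard(peptide: str, substitute: str = "A") -> str:
    """Replace any character outside the 20 standard amino acids."""
    return "".join(aa if aa in STANDARD_AA else substitute for aa in peptide.upper())


cleaned = replace_nonstandard("WLYPVAVLRWKX")
print(cleaned)
```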

Results

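| Peptide | Pseudo-Perplexity | ipTM | pLDDT | Confidence | pDockQ2 |
|---------|-------------------|------|-------|------------|---------|
| WRYPAVAAEHWK | 12.41 | 0.894 | 93.6 | 0.928 | 0.458 |
| WLYPVAVLRWKA | 17.70 | 0.769 | 93.7 | 0.904 | 0.308 |
| WRYYPAGVAWKE | 14.19 | 0.753 | 91.3 | 0.881 | 0.195 |
| FLYRWLPSRRGG (known) | n/a | 0.675 | 92.8 | 0.877 | 0.095 |
| WRYPAVAAEHWE | 13.54 | 0.530 | 93.1 | 0.851 | 0.042 |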

Analysis

The top PepMLM design (WRYPAVAAEHWK) substantially outperforms the known binder, with ipTM = 0.894 vs 0.675, and is the only peptide above the 0.8 threshold generally considered confident for protein-protein interactions. Three of four PepMLM peptides exceed the known binder’s ipTM.

The best-scoring peptide also had the lowest PepMLM perplexity (12.41), so the language model and the structure predictor agree on the top candidate. Beyond that the two rankings diverge: peptide 4 (WLYPVAVLRWKA) has the highest perplexity (17.70) yet the second-highest ipTM (0.769). The most striking case is peptide 2 (WRYPAVAAEHWE), which differs from peptide 1 by a single residue (K→E at position 12) but drops sharply in ipTM (0.894 → 0.530), suggesting that this C-terminal charge may be critical for the binding interaction.
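
How closely the perplexity ranking tracks the ipTM ranking can be summarized with a Spearman rank correlation. The sketch below uses the values from the table above and a hand-written Spearman formula that assumes no ties. With only four peptides the number is descriptive, not a statistical test.

```python
def ranks(values, reverse=False):
    """Rank values starting at 1 for the best; assumes no tied values."""
    order = sorted(values, reverse=reverse)
    return [order.index(v) + 1 for v in values]


def spearman(rank_a, rank_b):
    """Spearman rho from two rank lists without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


ppl = {"WRYPAVAAEHWK": 12.41, "WRYPAVAAEHWE": 13.54,
       "WRYYPAGVAWKE": 14.19, "WLYPVAVLRWKA": 17.70}
iptm = {"WRYPAVAAEHWK": 0.894, "WRYPAVAAEHWE": 0.530,
        "WRYYPAGVAWKE": 0.753, "WLYPVAVLRWKA": 0.769}

peptides = list(ppl)
ppl_rank = ranks([ppl[p] for p in peptides])                  # lower perplexity is better
iptm_rank = ranks([iptm[p] for p in peptides], reverse=True)  # higher ipTM is better
rho = spearman(ppl_rank, iptm_rank)
print(f"Spearman rho between perplexity rank and ipTM rank: {rho:.2f}")
```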

Predicted structures are available as PDB files in C2_boltz2/ for visualization.


Part 3: Evaluate Therapeutic Properties with PeptiVerse

Method

All peptides (4 PepMLM designs + known binder) were submitted to PeptiVerse with SOD1-A4V as the target. Properties evaluated: binding affinity (pKd), solubility, hemolysis, molecular weight, net charge (pH 7).

Results

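| Peptide | ipTM (Boltz-2) | Binding Affinity (pKd) | Solubility | Hemolysis (prob) | Net Charge (pH 7) | MW (Da) |
|---------|----------------|------------------------|------------|------------------|-------------------|---------|
| WRYPAVAAEHWK | 0.894 | 5.44 | Soluble | 0.023 | +0.85 | 1513.7 |
| WRYPAVAAEHWE | 0.530 | 5.58 | Soluble | 0.035 | −1.14 | 1514.6 |
| WRYYPAGVAWKE | 0.753 | 5.95 | Soluble | 0.022 | +0.77 | 1525.7 |
| WLYPVAVLRWKA | 0.769 | 6.48 | Soluble | 0.043 | +1.76 | 1501.8 |
| FLYRWLPSRRGG (known) | 0.675 | 5.97 | Soluble | 0.047 | +2.76 | 1507.7 |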

Analysis

All peptides are predicted soluble and non-hemolytic (all < 5% probability), which is encouraging for therapeutic development. Binding affinities are all in the “weak binding” range (pKd 5.4–6.5), which is typical for short linear peptides.
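
To put the pKd range in more familiar units, pKd can be converted to a dissociation constant using pKd = −log10(Kd in mol/L). The helper below is a small sketch with a hypothetical function name; the two inputs are the extremes from the table above.

```python
def pkd_to_kd_micromolar(pkd: float) -> float:
    """Convert pKd = -log10(Kd in mol/L) to Kd in micromolar."""
    return 10 ** (-pkd) * 1e6


for peptide, pkd in [("WRYPAVAAEHWK", 5.44), ("WLYPVAVLRWKA", 6.48)]:
    print(f"{peptide}: pKd {pkd} -> Kd of about {pkd_to_kd_micromolar(pkd):.2f} uM")
```

Under this conversion the predicted affinities span roughly 0.3 to 3.6 µM.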

Interestingly, ipTM and predicted affinity do not fully agree. The best structural binder (WRYPAVAAEHWK, ipTM = 0.894) has the lowest predicted affinity (pKd = 5.44), while peptide 4 (WLYPVAVLRWKA) has the highest affinity (pKd = 6.48) but only moderate ipTM (0.769). This likely reflects the fact that PeptiVerse predicts affinity from sequence alone, while Boltz-2 evaluates structural complementarity; the two metrics capture different aspects of binding.

Recommended peptide to advance: WRYPAVAAEHWK. It has the strongest structural evidence for binding (ipTM = 0.894, above the 0.8 confidence threshold), a favorable predicted safety profile (non-hemolytic, soluble, near-neutral charge), and the lowest PepMLM perplexity (12.41). While its sequence-predicted affinity is the lowest of the set (pKd = 5.44), the high ipTM and pDockQ2 (0.458) suggest a well-defined binding interface that may translate better to experimental validation than affinity predictions alone.
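
The selection logic can be written as a filter followed by a sort. This is only one possible formalization: the 0.05 hemolysis cutoff and the decision to rank by ipTM are assumptions that mirror the reasoning above, not a PeptiVerse or Boltz-2 feature.

```python
candidates = [
    {"peptide": "WRYPAVAAEHWK", "iptm": 0.894, "pkd": 5.44, "soluble": True, "hemolysis": 0.023},
    {"peptide": "WRYPAVAAEHWE", "iptm": 0.530, "pkd": 5.58, "soluble": True, "hemolysis": 0.035},
    {"peptide": "WRYYPAGVAWKE", "iptm": 0.753, "pkd": 5.95, "soluble": True, "hemolysis": 0.022},
    {"peptide": "WLYPVAVLRWKA", "iptm": 0.769, "pkd": 6.48, "soluble": True, "hemolysis": 0.043},
    {"peptide": "FLYRWLPSRRGG", "iptm": 0.675, "pkd": 5.97, "soluble": True, "hemolysis": 0.047},
]


def shortlist(cands, max_hemolysis=0.05):
    """Keep soluble, low-hemolysis peptides and rank them by ipTM (highest first)."""
    safe = [c for c in cands if c["soluble"] and c["hemolysis"] < max_hemolysis]
    return sorted(safe, key=lambda c: c["iptm"], reverse=True)


ranked = shortlist(candidates)
for c in ranked:
    print(c["peptide"], c["iptm"], c["pkd"])
```

Sorting by `pkd` instead would put WLYPVAVLRWKA first, which is the ipTM/affinity disagreement in concrete form.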


Part 4: Generate Optimized Peptides with moPPIt

Method

moPPIt-v3 uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific target residues while optimizing multiple therapeutic properties simultaneously. Unlike PepMLM (which samples plausible binders from sequence alone), moPPIt lets you specify where to bind and what properties to optimize.

Parameters: Target = SOD1-A4V, length = 12, motif positions = 1–10 (N-terminal region near A4V), objectives = Hemolysis, Solubility, Motif (weight = 1.0 each), GPU = L4

Results

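| Peptide | Hemolysis Score | Solubility Score | Motif Score |
|---------|-----------------|------------------|-------------|
| DTKVKCGGNTQW | 0.968 | 0.833 | 0.803 |
| GCFEKTTGKTQD | 0.971 | 0.917 | 0.769 |
| KTGGKTQKITWH | 0.962 | 0.833 | 0.757 |
| TDTIRYKRQADE | 0.974 | 0.833 | 0.664 |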

(Scores closer to 1.0 = better for hemolysis/solubility; motif score reflects binding to target residues 1–10.)

PepMLM vs moPPIt Comparison

| Property | PepMLM Peptides | moPPIt Peptides |
|----------|-----------------|-----------------|
| Design strategy | Sequence-conditioned sampling | Multi-objective guided flow matching |
| Composition | Aromatic-rich (WRY motifs), hydrophobic | Charged/polar (K, T, D, E), hydrophilic |
| Target awareness | Whole protein (implicit) | Specific residues 1–10 (explicit) |
| Therapeutic optimization | None (binding only) | Hemolysis, solubility optimized jointly |

The moPPIt peptides are strikingly different in character: they are dominated by polar, charged, and small residues (Lys, Thr, Asp, Glu, Gly) rather than the aromatic-heavy composition of the PepMLM designs. This likely reflects the multi-objective optimization steering away from hydrophobic residues (which can cause hemolysis and poor solubility) toward safer, more soluble compositions.
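
The compositional difference can be quantified by pooling residues per method. The residue classes below (aromatic = F/W/Y, charged = D/E/K/R, polar uncharged = S/T/N/Q) are one common grouping chosen for illustration; other groupings would shift the counts.

```python
CLASSES = {
    "aromatic": set("FWY"),
    "charged": set("DEKR"),
    "polar_uncharged": set("STNQ"),
}

pepmlm = ["WRYPAVAAEHWK", "WRYPAVAAEHWE", "WRYYPAGVAWKE", "WLYPVAVLRWKX"]
moppit = ["DTKVKCGGNTQW", "GCFEKTTGKTQD", "KTGGKTQKITWH", "TDTIRYKRQADE"]


def class_counts(peptides):
    """Count residues per class over all peptides pooled together."""
    pooled = "".join(peptides)
    return {name: sum(aa in members for aa in pooled) for name, members in CLASSES.items()}


pepmlm_counts = class_counts(pepmlm)
moppit_counts = class_counts(moppit)
total = len("".join(pepmlm))  # 48 residues per set

for name in CLASSES:
    print(f"{name:16s} PepMLM {pepmlm_counts[name]:2d}/{total}   moPPIt {moppit_counts[name]:2d}/{total}")
```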

How to Evaluate Before Clinical Advancement

Before advancing any peptide toward clinical studies, the following evaluations would be needed:

  • Binding validation: Surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) to measure actual binding affinity
  • Structural confirmation: Cryo-EM or X-ray crystallography of the peptide-SOD1 complex
  • Selectivity testing: Ensure binding to mutant A4V-SOD1 over wild-type (to avoid disrupting normal SOD1 function)
  • Cell-based assays: Test whether peptide-E3 ligase fusions can degrade mutant SOD1 in cellular models
  • Stability and pharmacokinetics: Serum stability, half-life, and cell permeability measurements
  • In vivo efficacy: Animal models of SOD1-ALS (e.g., SOD1-G93A transgenic mice)


Part C: Phage Lysis Protein Design Challenge

Background

The MS2 phage L-protein (75 residues) lyses E. coli by forming membrane pores. E. coli can resist by mutating its DnaJ chaperone, preventing L-protein folding. Our goal: design L-protein mutants that reduce DnaJ dependence while maintaining lysis activity.

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
|________ Soluble domain (1-39) _______|_____ Transmembrane (40-75) ______|
         (interacts with DnaJ)                (forms membrane pores)
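
The domain boundaries in the diagram can be checked against the sequence itself. The sketch below slices the two regions using the 1–39 / 40–75 split given above and counts Lys/Arg and Cys residues; `region` is a hypothetical helper.

```python
L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"


def region(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    return seq[start - 1:end]


soluble = region(L_PROTEIN, 1, 39)
transmembrane = region(L_PROTEIN, 40, 75)

basic_soluble = sum(soluble.count(r) for r in "KR")
basic_tm = sum(transmembrane.count(r) for r in "KR")
cys_positions = [i for i, aa in enumerate(L_PROTEIN, start=1) if aa == "C"]

print(len(L_PROTEIN), len(soluble), len(transmembrane))
print("Lys/Arg in soluble domain:", basic_soluble, "| in transmembrane region:", basic_tm)
print("Cys positions:", cys_positions)
```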

Approach: Option 2 (Boltz-2 Co-folding + DMS Analysis)

Step 1: Deep Mutational Scan (ESM-1v)

Tool: amina run esm1v -m dms with 5-model ensemble (1,500 mutations scored across 75 positions)
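
For reference, the size of the scan follows from enumerating every single-residue substitution. The sketch below does not call ESM-1v or the Amina CLI; it only shows the bookkeeping. The 1,500 figure matches 75 × 20, which counts the wild-type residue at each position; excluding it gives 1,425. Averaging across the five models is an assumed way of combining an ensemble, included as a hypothetical helper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"


def enumerate_substitutions(seq, include_wild_type=False):
    """List single-residue substitutions as strings such as 'K23E'."""
    muts = []
    for pos, wt in enumerate(seq, start=1):
        for aa in AMINO_ACIDS:
            if aa != wt or include_wild_type:
                muts.append(f"{wt}{pos}{aa}")
    return muts


def ensemble_mean(per_model_scores):
    """Average one mutation's score across models (assumed combination rule)."""
    return sum(per_model_scores) / len(per_model_scores)


n_all = len(enumerate_substitutions(L_PROTEIN, include_wild_type=True))
n_nonsynonymous = len(enumerate_substitutions(L_PROTEIN))
print(n_all, n_nonsynonymous)
```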

DMS vs Experimental Data Correlation

Cross-referencing the DMS scores with published experimental mutation data (Chamakura et al., 2017):

Experimentally validated lysis-positive mutations in the soluble domain:

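| Position | Change | Lysis | Protein | DMS Score | DMS Assessment |
|----------|--------|-------|---------|-----------|----------------|
| 13 | P→L | 1 | 1 | −0.37 | Tolerated |
| 15 | S→A | 1 | 1 | +0.01 | Favorable |
| 18 | R→G | 1 | 1 | −0.92 | Tolerated |
| 18 | R→I | 1 | 1 | −1.17 | Tolerated |
| 23 | K→E | 1 | 0 | +0.58 | Favorable |
| 25 | E→G | 1 | 0 | +0.23 | Favorable |
| 25 | E→V | 1 | 0 | +0.04 | Favorable |
| 25 | E→D | 1 | 0 | +0.15 | Favorable |
| 26 | D→G | 1 | 0 | +0.14 | Favorable |
| 30 | R→Q | 1 | 1 | −0.46 | Tolerated |
| 30 | R→L | 1 | 1 | −0.35 | Tolerated |
| 31 | R→I | 1 | 1 | −1.23 | Tolerated |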
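
One simple way to cross-reference the two data sources is to group the DMS scores by the experimental Protein column. The sketch below uses only the twelve rows tabulated above; it is a descriptive summary of this small table, not a validation of the DMS model.

```python
from statistics import mean

# (position, wild type, mutant, lysis, protein, DMS score), copied from the table above.
validated = [
    (13, "P", "L", 1, 1, -0.37), (15, "S", "A", 1, 1, 0.01),
    (18, "R", "G", 1, 1, -0.92), (18, "R", "I", 1, 1, -1.17),
    (23, "K", "E", 1, 0, 0.58), (25, "E", "G", 1, 0, 0.23),
    (25, "E", "V", 1, 0, 0.04), (25, "E", "D", 1, 0, 0.15),
    (26, "D", "G", 1, 0, 0.14), (30, "R", "Q", 1, 1, -0.46),
    (30, "R", "L", 1, 1, -0.35), (31, "R", "I", 1, 1, -1.23),
]

by_protein = {}
for pos, wt, mut, lysis, protein, score in validated:
    by_protein.setdefault(protein, []).append(score)

# For each Protein value: (number of mutations, mean DMS score).
summary = {flag: (len(scores), round(mean(scores), 3)) for flag, scores in by_protein.items()}
n_positive = sum(score > 0 for *_, score in validated)

print(summary)
print("mutations with DMS score > 0:", n_positive, "of", len(validated))
```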

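Critical discrepancy at position 29 (Cys): The DMS model scores C29 as the most mutable position (avg = +2.45), yet experimentally C29R abolishes both lysis and protein expression. The language model misses this, possibly because C29 has a structural role that sequence statistics do not capture, such as an intermolecular disulfide bond (C29 is the only Cys in the sequence, so an intramolecular one is not possible) or a folding nucleation site. This highlights the importance of cross-validating computational predictions with experimental data.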

Most tolerant positions (DMS): 29 (but experimentally essential!), 27, 24, 5, 23, 22, 39, 25
Most conserved positions (DMS): 1 (Met, start codon), 38, 31, 19, 30, 18, 20, 11

Step 2: Boltz-2 Co-fold (L-protein + DnaJ)

Tool: amina run boltz2 with L-protein (chain A, 75 aa) + DnaJ (chain B, 376 aa)

Results: ipTM = 0.165, pLDDT = 71.5, pDockQ2 = 0.009

The very low ipTM confirms what the assignment warned: folding models struggle with this system. The L-protein has a disordered soluble domain and a transmembrane region, neither of which folds well in isolation. However, the low-confidence prediction still places the soluble domain (residues 1–39) in proximity to DnaJ’s N-terminal J-domain, which is compatible with a chaperone-substrate contact, although at ipTM = 0.165 this placement should be treated as a hypothesis rather than evidence.

Step 3: Designed Mutations

Based on combining: (1) DMS-favorable scores, (2) experimentally validated lysis-positive mutations, and (3) avoidance of conserved/essential positions, we propose 5 L-protein variants:

| Variant | Mutations | Region | Rationale |
|---------|-----------|--------|-----------|
| V1 | K23E + E25G + D26G | Soluble | All three individually maintain lysis. Redistributes charge across positions 23–26 (K23E flips +1 to −1, while E25G and D26G each remove a −1, so the net charge is unchanged) and adds flexibility via two Gly. All DMS-favorable. |
| V2 | R18G + R20L + K23E | Soluble | Removes three positive charges from the Arg/Lys-rich stretch (18–23), which is the predicted DnaJ interaction surface. R18G and K23E are experimentally lysis-positive; R20L is not among the validated mutations tabulated above. May reduce DnaJ dependence by altering the chaperone recognition motif. |
| V3 | S15A + E25G + R30Q | Soluble | Combines the most conservative experimentally validated mutations across different subregions. S15A and R30Q both maintain lysis AND protein levels, minimizing risk. |
| V4 | P13L + R18I + E25D | Soluble | P13L disrupts a potential turn structure; R18I replaces a charged Arg with hydrophobic Ile; E25D is a conservative acidic→acidic swap. All lysis-positive experimentally. |
| V5 | R19S + K23E + D26G | Soluble | Targets the cationic cluster (R19, K23) that likely mediates DnaJ binding. Replacing Arg/Lys with neutral/acidic residues may enable DnaJ-independent folding while maintaining the downstream lysis machinery. |
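
The five variants can be built and sanity-checked programmatically. The sketch below verifies that each mutation's wild-type residue matches the L-protein sequence and estimates the change in formal charge, counting K/R as +1 and D/E as −1 and ignoring His. That charge model is a deliberate simplification, and `build_variant` is a hypothetical helper.

```python
import re

L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # simplified formal charges, His ignored

VARIANTS = {
    "V1": ["K23E", "E25G", "D26G"],
    "V2": ["R18G", "R20L", "K23E"],
    "V3": ["S15A", "E25G", "R30Q"],
    "V4": ["P13L", "R18I", "E25D"],
    "V5": ["R19S", "K23E", "D26G"],
}


def build_variant(seq, mutations):
    """Return (mutant sequence, change in formal charge) after applying the mutations."""
    residues = list(seq)
    delta = 0
    for m in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", m)
        if match is None:
            raise ValueError(f"cannot parse mutation {m!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: sequence has {residues[pos - 1]} at position {pos}")
        residues[pos - 1] = new
        delta += CHARGE.get(new, 0) - CHARGE.get(wt, 0)
    return "".join(residues), delta


results = {name: build_variant(L_PROTEIN, muts) for name, muts in VARIANTS.items()}
for name, (variant_seq, delta) in results.items():
    print(name, "+".join(VARIANTS[name]), f"charge change {delta:+d}")
    print(variant_seq)
```

Under this simplified model the charge changes range from 0 (V1, V3) to −4 (V2), so the variants probe different degrees of charge remodeling.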

Design principles:

  • All individual mutations except R20L (V2) and R19S (V5) appear in the table of experimentally validated lysis-positive mutations above; those two fall at positions 20 and 19, which the DMS ranks among the most conserved, so they carry more risk
  • Mutations mainly target the Arg/Lys-rich region (positions 18–26) that likely mediates DnaJ recognition
  • No mutations at position 29 (Cys, essential despite DMS scores) or position 1 (Met, start codon)
  • Each variant has ≥3 mutations for meaningful charge/surface remodeling

Tools Used

| Step | Tool | Details |
|------|------|---------|
| Deep mutational scan | amina run esm1v -m dms | 5-model ensemble, 1500 mutations |
| Co-fold L-protein + DnaJ | amina run boltz2 | 2-chain complex, diffusion model |
| Experimental cross-validation | Published data | Chamakura et al., 2017 |