Week 5 HW: Protein Design Part II
Part 1: Generate Binders with PepMLM
Sequence Preparation
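Retrieved wild-type SOD1 from UniProt (P00441) and introduced the A4V mutation. A4V uses mature-protein numbering (after cleavage of the initiator Met), so the mutated Ala is residue 4 of the mature protein but residue 5 of the full UniProt sequence.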
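The numbering offset is easy to get wrong, so here is a minimal sketch (the helper name `apply_mutation` is hypothetical) that applies a mutation written in mature-protein numbering to the full UniProt sequence and checks the wild-type residue first. For brevity it uses only the first ten residues of P00441 (MATKAVCVLK) as a stand-in for the full sequence.

```python
def apply_mutation(seq: str, mutation: str, offset: int = 1) -> str:
    """Apply a point mutation given in mature-protein numbering, e.g. 'A4V'.

    `offset` is the number of N-terminal residues removed before mature
    numbering starts (1 for the cleaved initiator Met), so mature position p
    is full-sequence position p + offset (both 1-based).
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos + offset - 1  # 0-based index into the full sequence
    if seq[idx] != wt:
        raise ValueError(
            f"expected {wt} at full-sequence position {idx + 1}, found {seq[idx]}"
        )
    return seq[:idx] + mut + seq[idx + 1:]


# Stand-in: first ten residues of UniProt P00441 (pass the full sequence in practice).
sod1_nterm = "MATKAVCVLK"
print(apply_mutation(sod1_nterm, "A4V"))  # MATKVVCVLK
```

The wild-type check makes an off-by-one error fail loudly instead of silently mutating the wrong residue.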
PepMLM Results
Model: PepMLM-650M (ESM-2 based, Colab, T4 GPU) | Parameters: length = 12, num_binders = 4, top_k = 3
| Rank | Peptide | Pseudo-Perplexity |
|---|---|---|
| 1 | WRYPAVAAEHWK | 12.41 |
| 2 | WRYPAVAAEHWE | 13.54 |
| 3 | WRYYPAGVAWKE | 14.19 |
| 4 | WLYPVAVLRWKX | 17.70 |
| ref | FLYRWLPSRRGG | N/A (known SOD1 binder) |
3/4 designs share a WRY N-terminal motif (aromatic + cationic). The best peptide (PPL=12.41) and the known binder both feature aromatic-rich N-termini and positively charged residues, suggesting electrostatic complementarity with SOD1’s surface.
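For reading the perplexity column: the standard masked-language-model pseudo-perplexity masks each residue in turn and exponentiates the negative mean log-probability of the true residue, PPL = exp(−(1/L) Σ log p(x_i | x_\i)). That PepMLM computes it exactly this way (including conditioning on the SOD1 target) is an assumption in the sketch below. Under this definition, a PPL of 12.41 corresponds to a geometric-mean probability of about 1/12.41 ≈ 0.081 per masked residue, versus 1/20 = 0.05 for uniform guessing over the 20 amino acids.

```python
import math


def pseudo_perplexity(masked_log_probs):
    """Pseudo-perplexity from per-position log-probabilities.

    masked_log_probs[i] is log p(x_i | all other tokens), obtained by masking
    position i and reading the model's probability of the true residue.
    """
    return math.exp(-sum(masked_log_probs) / len(masked_log_probs))


# Toy check: if every residue of a 12-mer gets probability 1/20 (uniform over
# amino acids), the pseudo-perplexity equals 20.
print(pseudo_perplexity([math.log(1 / 20)] * 12))
```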
Part 2: Evaluate Binders with Boltz-2
Method
Each peptide was modeled in complex with SOD1-A4V using Boltz-2 (via the Amina CLI), a diffusion-based structure prediction model that reports interface confidence as ipTM, analogous to AlphaFold-Multimer. SOD1-A4V was chain A and the peptide was chain B.
Note: Peptide 4 contained an ambiguous X residue; this was replaced with Ala for structure prediction.
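A minimal sketch of this input preparation: replace non-standard residues such as PepMLM's X with Ala, then write a two-chain complex. It assumes the YAML input schema of the open-source Boltz package (a `version` field and a `sequences` list of `protein` entries with `id` and `sequence`); the input the Amina CLI actually takes is not shown in this writeup, MSA settings are omitted, and the helper names are hypothetical. The target string is a truncated placeholder, not the full SOD1-A4V sequence.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_peptide(peptide: str, fill: str = "A") -> str:
    """Replace ambiguous/non-standard residues (e.g. 'X') with `fill`."""
    return "".join(aa if aa in STANDARD_AA else fill for aa in peptide.upper())


def boltz_complex_yaml(target_seq: str, peptide: str) -> str:
    """Build a Boltz-style YAML string: target = chain A, peptide = chain B."""
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        "      id: A\n"
        f"      sequence: {target_seq}\n"
        "  - protein:\n"
        "      id: B\n"
        f"      sequence: {clean_peptide(peptide)}\n"
    )


# Placeholder target: only the first 10 residues of SOD1-A4V, for illustration.
print(boltz_complex_yaml("MATKVVCVLK", "WLYPVAVLRWKX"))
```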
Results
| Peptide | Pseudo-Perplexity | ipTM | pLDDT | Confidence | pDockQ2 |
|---|---|---|---|---|---|
| WRYPAVAAEHWK | 12.41 | 0.894 | 93.6 | 0.928 | 0.458 |
| WLYPVAVLRWKA | 17.70 | 0.769 | 93.7 | 0.904 | 0.308 |
| WRYYPAGVAWKE | 14.19 | 0.753 | 91.3 | 0.881 | 0.195 |
| FLYRWLPSRRGG (known) | N/A | 0.675 | 92.8 | 0.877 | 0.095 |
| WRYPAVAAEHWE | 13.54 | 0.530 | 93.1 | 0.851 | 0.042 |
Analysis
The top PepMLM design (WRYPAVAAEHWK) substantially outperforms the known binder in predicted interface confidence (ipTM = 0.894 vs 0.675) and sits well above the 0.8 threshold commonly treated as a confident prediction for protein-protein interfaces. Three of the four PepMLM peptides exceed the known binder's ipTM.
The best-scoring peptide also had the lowest PepMLM perplexity (12.41), showing partial agreement between the language model's confidence and the structure predictor's assessment. The main exception is peptide 2 (WRYPAVAAEHWE), which differs from peptide 1 by a single substitution (K12E) yet drops sharply in ipTM (0.894 → 0.530), suggesting that the C-terminal positive charge may be important for the predicted interface.
Predicted structures are available as PDB files in C2_boltz2/ for visualization.
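To put a number on the perplexity/ipTM agreement, a Spearman rank correlation over the four PepMLM designs is a quick check. This is a minimal sketch with hypothetical helper names; it assumes no tied values (true for this data), and with n = 4 the result is descriptive only.

```python
def ranks(values, descending=False):
    """Rank values 1..n (assumes no ties); rank 1 = smallest, or largest if descending."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_no_ties(x_ranks, y_ranks):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(x_ranks)
    d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# PepMLM peptides 1-4, values from the tables above.
ppl = [12.41, 13.54, 14.19, 17.70]
iptm = [0.894, 0.530, 0.753, 0.769]

# Lower perplexity and higher ipTM are both "better", so rank each accordingly.
rho = spearman_no_ties(ranks(ppl), ranks(iptm, descending=True))
print(round(rho, 2))  # 0.2
```

The weak positive value (ρ = 0.2) comes largely from peptide 2, which has the second-lowest perplexity but the lowest ipTM.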
Part 3: Evaluate Therapeutic Properties with PeptiVerse
Method
All peptides (4 PepMLM designs + known binder) were submitted to PeptiVerse with SOD1-A4V as the target. Properties evaluated: binding affinity (pKd), solubility, hemolysis, molecular weight, net charge (pH 7).
Results
| Peptide | ipTM (Boltz-2) | Binding Affinity (pKd) | Solubility | Hemolysis (prob) | Net Charge (pH 7) | MW (Da) |
|---|---|---|---|---|---|---|
| WRYPAVAAEHWK | 0.894 | 5.44 | Soluble | 0.023 | +0.85 | 1513.7 |
| WRYPAVAAEHWE | 0.530 | 5.58 | Soluble | 0.035 | −1.14 | 1514.6 |
| WRYYPAGVAWKE | 0.753 | 5.95 | Soluble | 0.022 | +0.77 | 1525.7 |
| WLYPVAVLRWKA | 0.769 | 6.48 | Soluble | 0.043 | +1.76 | 1501.8 |
| FLYRWLPSRRGG (known) | 0.675 | 5.97 | Soluble | 0.047 | +2.76 | 1507.7 |
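The net-charge column can be approximated with the Henderson-Hasselbalch equation, summing the fractional charge of each ionizable group. The pKa values below are illustrative assumptions; tools use different pKa sets, and PeptiVerse's method is not documented here, so the magnitudes will not match the table exactly.

```python
# Illustrative pKa values (an assumption; published sets differ).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at `ph`."""
    pos_groups = ["Nterm"] + [aa for aa in seq if aa in PKA_POS]
    neg_groups = ["Cterm"] + [aa for aa in seq if aa in PKA_NEG]
    # Fraction protonated (positive) for bases, deprotonated (negative) for acids.
    positive = sum(1 / (1 + 10 ** (ph - PKA_POS[g])) for g in pos_groups)
    negative = sum(1 / (1 + 10 ** (PKA_NEG[g] - ph)) for g in neg_groups)
    return positive - negative


for pep in ["WRYPAVAAEHWK", "WRYPAVAAEHWE", "FLYRWLPSRRGG"]:
    print(pep, round(net_charge(pep), 2))
```

With these values the signs and ordering agree with the table (WRYPAVAAEHWE negative, FLYRWLPSRRGG the most positive), while the magnitudes differ.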
Analysis
All peptides are predicted soluble and non-hemolytic (hemolysis probability < 5% for every peptide), which is encouraging for therapeutic development. Predicted binding affinities all fall in the “weak binding” range (pKd 5.4–6.5, i.e., Kd of roughly 0.3–3.6 µM), which is typical for short linear peptides.
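Since pKd = −log10(Kd) with Kd in molar, converting back gives a more intuitive scale. A minimal sketch using the PeptiVerse values from the table above (the helper name is hypothetical):

```python
def kd_from_pkd(pkd: float) -> float:
    """Convert pKd (= -log10 Kd, Kd in molar) to Kd in molar."""
    return 10 ** (-pkd)


pkd_values = {
    "WRYPAVAAEHWK": 5.44,
    "WRYPAVAAEHWE": 5.58,
    "WRYYPAGVAWKE": 5.95,
    "WLYPVAVLRWKA": 6.48,
    "FLYRWLPSRRGG": 5.97,
}
for pep, pkd in pkd_values.items():
    print(f"{pep}: Kd ≈ {kd_from_pkd(pkd) * 1e6:.2f} µM")
```

Each pKd unit is a tenfold change in Kd, so the 1.04-unit gap between WRYPAVAAEHWK (5.44) and WLYPVAVLRWKA (6.48) is roughly an 11-fold difference in predicted Kd.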
Interestingly, ipTM and predicted affinity do not fully agree. The best structural binder (WRYPAVAAEHWK, ipTM = 0.894) has the lowest predicted affinity (pKd = 5.44), while peptide 4 (WLYPVAVLRWKA) has the highest affinity (pKd = 6.48) but only moderate ipTM (0.769). This reflects the fact that PeptiVerse predicts affinity from sequence alone, while Boltz-2 evaluates structural complementarity; the two metrics capture different aspects of binding.
Recommended peptide to advance: WRYPAVAAEHWK. It has the strongest structural evidence for binding (ipTM = 0.894, well above the 0.8 confidence threshold), a favorable predicted safety profile (non-hemolytic, soluble, near-neutral charge), and the lowest PepMLM perplexity (12.41). Although its sequence-predicted affinity is the lowest of the set, its high ipTM and pDockQ2 (0.458, the highest of all peptides) suggest a well-defined binding interface that may hold up better in experimental validation than affinity predictions alone would indicate.
Part 4: Generate Optimized Peptides with moPPIt
Method
moPPIt-v3 uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific target residues while optimizing multiple therapeutic properties simultaneously. Unlike PepMLM (which samples plausible binders from sequence alone), moPPIt lets you specify where to bind and what properties to optimize.
Parameters: Target = SOD1-A4V, length = 12, motif positions = 1–10 (N-terminal region near A4V), objectives = Hemolysis, Solubility, Motif (weight = 1.0 each), GPU = L4
Results
| Peptide | Hemolysis Score | Solubility Score | Motif Score |
|---|---|---|---|
DTKVKCGGNTQW | 0.968 | 0.833 | 0.803 |
GCFEKTTGKTQD | 0.971 | 0.917 | 0.769 |
KTGGKTQKITWH | 0.962 | 0.833 | 0.757 |
TDTIRYKRQADE | 0.974 | 0.833 | 0.664 |
(Scores closer to 1.0 are better for hemolysis and solubility; the motif score reflects predicted binding to target residues 1–10.)
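Because moPPIt optimizes several objectives at once, a natural post-hoc check is which of the returned peptides are Pareto-optimal, i.e., not beaten on every objective by another peptide. This sketch is an illustrative analysis of the table above, not moPPIt's internal procedure; it treats all three scores as higher-is-better, as stated in the note under the table.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (hemolysis, solubility, motif); higher is better

MOPPIT_SCORES: Dict[str, Scores] = {
    "DTKVKCGGNTQW": (0.968, 0.833, 0.803),
    "GCFEKTTGKTQD": (0.971, 0.917, 0.769),
    "KTGGKTQKITWH": (0.962, 0.833, 0.757),
    "TDTIRYKRQADE": (0.974, 0.833, 0.664),
}


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Scores]) -> List[str]:
    """Names of peptides that no other peptide dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


print(pareto_front(MOPPIT_SCORES))
# ['DTKVKCGGNTQW', 'GCFEKTTGKTQD', 'TDTIRYKRQADE']
```

Three of the four peptides are non-dominated (best motif, best solubility, and best hemolysis score, respectively); KTGGKTQKITWH is dominated by DTKVKCGGNTQW.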
PepMLM vs moPPIt Comparison
| Property | PepMLM Peptides | moPPIt Peptides |
|---|---|---|
| Design strategy | Sequence-conditioned sampling | Multi-objective guided flow matching |
| Composition | Aromatic-rich (WRY motifs), hydrophobic | Charged/polar (K, T, D, E), hydrophilic |
| Target awareness | Whole protein (implicit) | Specific residues 1–10 (explicit) |
| Therapeutic optimization | None (binding only) | Hemolysis, solubility optimized jointly |
The moPPIt peptides are strikingly different in character: they are dominated by polar and charged residues (Lys, Thr, Asp, Glu, Gly) rather than the aromatic-heavy composition of the PepMLM designs. This likely reflects the multi-objective optimization steering away from hydrophobic residues (which can cause hemolysis and poor solubility) toward safer, more soluble compositions.
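The compositional contrast can be quantified directly from the sequences. This sketch computes, for each design set, the fraction of all residues that are aromatic (F, W, Y) and the fraction drawn from the residues named above (K, T, D, E, G); the residue groupings are a choice made for this illustration.

```python
AROMATIC = set("FWY")
NAMED_POLAR_CHARGED = set("KTDEG")  # residues listed in the comparison above

PEPMLM = ["WRYPAVAAEHWK", "WRYPAVAAEHWE", "WRYYPAGVAWKE", "WLYPVAVLRWKX"]
MOPPIT = ["DTKVKCGGNTQW", "GCFEKTTGKTQD", "KTGGKTQKITWH", "TDTIRYKRQADE"]


def fraction(peptides, residues):
    """Fraction of all residues across `peptides` that belong to `residues`."""
    total = sum(len(p) for p in peptides)
    hits = sum(aa in residues for p in peptides for aa in p)
    return hits / total


for label, group in [("PepMLM", PEPMLM), ("moPPIt", MOPPIT)]:
    print(label, round(fraction(group, AROMATIC), 3), round(fraction(group, NAMED_POLAR_CHARGED), 3))
```

This prints 0.271 and 0.167 for PepMLM versus 0.083 and 0.625 for moPPIt: roughly a threefold drop in aromatic content and a near fourfold rise in K/T/D/E/G content.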
How to Evaluate Before Clinical Advancement
Before advancing any peptide toward clinical studies, one would need to:
- Binding validation: Surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) to measure actual binding affinity
- Structural confirmation: Cryo-EM or X-ray crystallography of the peptide-SOD1 complex
- Selectivity testing: Ensure binding to mutant A4V-SOD1 over wild-type (to avoid disrupting normal SOD1 function)
- Cell-based assays: Test whether peptide-E3 ligase fusions can degrade mutant SOD1 in cellular models
- Stability and pharmacokinetics: Serum stability, half-life, and cell permeability measurements
- In vivo efficacy: Animal models of SOD1-ALS (e.g., SOD1-G93A transgenic mice)
Part C: Phage Lysis Protein Design Challenge
Background
The MS2 phage L-protein (75 residues) lyses E. coli by forming membrane pores. E. coli can resist by mutating its DnaJ chaperone, preventing L-protein folding. Our goal: design L-protein mutants that reduce DnaJ dependence while maintaining lysis activity.
Approach: Option 2 (Boltz-2 Co-folding + DMS Analysis)
Step 1: Deep Mutational Scan (ESM-1v)
Tool: amina run esm1v -m dms with 5-model ensemble (1,500 mutations scored across 75 positions)
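For context on how these scores behave, ESM-1v-style masked-marginal scoring masks a position and compares the model's log-probability of each substitute with that of the wild type; a per-position "mutability" can then be taken as the average over the 19 substitutions (with a 5-model ensemble, scores are presumably averaged across models). Note that 75 positions × 20 amino acids = 1,500, which suggests the reported count includes the wild-type residue at each position; that is an inference. The sketch uses a hypothetical toy distribution, not real ESM-1v output, to show one plausible way a position like C29 can look highly "mutable": if the model assigns the wild-type residue a low probability, every substitution scores positive.

```python
import math
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def masked_marginal_scores(log_probs: dict, wt: str) -> dict:
    """Score each substitution at one masked position: log p(mut) - log p(wt).

    Positive means the model prefers the mutant residue over the wild type.
    """
    return {aa: log_probs[aa] - log_probs[wt] for aa in AMINO_ACIDS if aa != wt}


def position_mutability(log_probs: dict, wt: str) -> float:
    """Average score over the 19 substitutions (higher = more 'tolerant')."""
    return mean(masked_marginal_scores(log_probs, wt).values())


# Hypothetical toy distribution: wild-type Cys gets p = 0.01; the other 19
# residues share the remaining 0.99 equally.
toy = {aa: math.log(0.99 / 19) for aa in AMINO_ACIDS}
toy["C"] = math.log(0.01)
print(round(position_mutability(toy, "C"), 2))  # 1.65
```

The score only reflects what the model finds plausible in sequence space; it has no access to the experimental fact that C29R abolishes lysis.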
DMS vs Experimental Data Correlation
Cross-referencing the DMS scores with published experimental mutation data (Chamakura et al., 2017):
Experimentally validated lysis-positive mutations in the soluble domain:
| Position | Change | Lysis (1 = yes) | Protein detected (1 = yes) | DMS Score | DMS Assessment |
|---|---|---|---|---|---|
| 13 | P→L | 1 | 1 | −0.37 | Tolerated |
| 15 | S→A | 1 | 1 | +0.01 | Favorable |
| 18 | R→G | 1 | 1 | −0.92 | Tolerated |
| 18 | R→I | 1 | 1 | −1.17 | Tolerated |
| 23 | K→E | 1 | 0 | +0.58 | Favorable |
| 25 | E→G | 1 | 0 | +0.23 | Favorable |
| 25 | E→V | 1 | 0 | +0.04 | Favorable |
| 25 | E→D | 1 | 0 | +0.15 | Favorable |
| 26 | D→G | 1 | 0 | +0.14 | Favorable |
| 30 | R→Q | 1 | 1 | −0.46 | Tolerated |
| 30 | R→L | 1 | 1 | −0.35 | Tolerated |
| 31 | R→I | 1 | 1 | −1.23 | Tolerated |
Critical discrepancy at position 29 (Cys): The DMS model scores C29 as the most mutable position (avg = +2.45), yet experimentally C29R abolishes both lysis and protein expression. The language model misses this; C29 may form a disulfide bond or act as a folding nucleation site. This highlights the importance of cross-validating computational predictions with experimental data.
- Most tolerant positions (DMS): 29 (but experimentally essential!), 27, 24, 5, 23, 22, 39, 25
- Most conserved positions (DMS): 1 (Met, start codon), 38, 31, 19, 30, 18, 20, 11
Step 2: Boltz-2 Co-fold (L-protein + DnaJ)
Tool: amina run boltz2 with L-protein (chain A, 75 aa) + DnaJ (chain B, 376 aa)
Results: ipTM = 0.165, pLDDT = 71.5, pDockQ2 = 0.009
The very low ipTM confirms what the assignment warned: folding models struggle with this system. The L-protein has a disordered soluble domain and a transmembrane region, neither of which folds well in isolation. The low-confidence model does place the soluble domain (residues 1–39) near DnaJ's N-terminal J-domain, but at ipTM = 0.165 this placement is a weak hypothesis rather than evidence for a specific chaperone-substrate binding mode.
Step 3: Designed Mutations
Based on combining: (1) DMS-favorable scores, (2) experimentally validated lysis-positive mutations, and (3) avoidance of conserved/essential positions, we propose 5 L-protein variants:
| Variant | Mutations | Region | Rationale |
|---|---|---|---|
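| V1 | K23E + E25G + D26G | Soluble | All three individually maintain lysis. Remodels the charge landscape of positions 23–26: K23E removes a positive charge (adding a negative one), while E25G and D26G each remove a negative charge, so the segment's net charge is unchanged but its charge distribution shifts and the glycines add flexibility. All DMS-favorable. |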
| V2 | R18G + R20L + K23E | Soluble | Removes three positive charges from the Arg/Lys-rich stretch (18β23), which is the predicted DnaJ interaction surface. All experimentally lysis-positive. May reduce DnaJ dependence by altering the chaperone recognition motif. |
| V3 | S15A + E25G + R30Q | Soluble | Combines the most conservative experimentally validated mutations across different subregions. S15A and R30Q both maintain lysis AND protein levels, minimizing risk. |
| V4 | P13L + R18I + E25D | Soluble | P13L disrupts a potential turn structure; R18I replaces a charged Arg with hydrophobic Ile; E25D is a conservative acidic-to-acidic swap. All lysis-positive experimentally. |
| V5 | R19S + K23E + D26G | Soluble | Targets the cationic cluster (R19, K23) that likely mediates DnaJ binding. Replacing Arg/Lys with neutral/acidic residues may enable DnaJ-independent folding while maintaining the downstream lysis machinery. |
Design principles:
- All individual mutations are experimentally validated as lysis-positive
- Most mutations target the Arg/Lys-rich region (positions 18–26) that likely mediates DnaJ recognition
- No mutations at position 29 (Cys, essential despite DMS scores) or position 1 (Met, start codon)
- Each variant has ≥3 mutations for meaningful charge/surface remodeling
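The design principles can be checked mechanically. This sketch (hypothetical helper names) uses a crude ±1 charge per Lys/Arg/Asp/Glu side chain, ignoring His and the termini, to compute each variant's net charge change, and flags any mutation at a forbidden position or missing from the experimental table above.

```python
# Lysis-positive mutations listed in the Chamakura et al. (2017) table above.
VALIDATED = {"P13L", "S15A", "R18G", "R18I", "K23E", "E25G", "E25V", "E25D",
             "D26G", "R30Q", "R30L", "R31I"}
FORBIDDEN_POSITIONS = {1, 29}  # initiator Met and the experimentally essential Cys
# Crude side-chain charges near pH 7 (His and termini ignored for simplicity).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

VARIANTS = {
    "V1": ["K23E", "E25G", "D26G"],
    "V2": ["R18G", "R20L", "K23E"],
    "V3": ["S15A", "E25G", "R30Q"],
    "V4": ["P13L", "R18I", "E25D"],
    "V5": ["R19S", "K23E", "D26G"],
}


def parse(mutation: str):
    """Split e.g. 'K23E' into ('K', 23, 'E')."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def check_variant(mutations):
    """Return (net charge change, mutations absent from VALIDATED, rule violations)."""
    delta, violations = 0, []
    for m in mutations:
        wt, pos, mut = parse(m)
        delta += CHARGE.get(mut, 0) - CHARGE.get(wt, 0)
        if pos in FORBIDDEN_POSITIONS:
            violations.append(f"{m}: forbidden position")
    if len(mutations) < 3:
        violations.append("fewer than 3 mutations")
    unlisted = [m for m in mutations if m not in VALIDATED]
    return delta, unlisted, violations


for name, muts in VARIANTS.items():
    print(name, check_variant(muts))
```

Under this crude model V1 and V3 leave net charge unchanged, while V2, V4, and V5 lower it by 4, 1, and 2. The check also shows that R20L (V2) and R19S (V5) are not among the mutations in the experimental table; if they are lysis-positive elsewhere in Chamakura et al. (2017), they belong in `VALIDATED`.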
Tools Used
| Step | Tool | Details |
|---|---|---|
| Deep mutational scan | amina run esm1v -m dms | 5-model ensemble, 1500 mutations |
| Co-fold L-protein + DnaJ | amina run boltz2 | 2-chain complex, diffusion model |
| Experimental cross-validation | Published data | Chamakura et al., 2017 |