Week 5 HW: Protein Design II


Week 5: Protein Design II

This week focuses on designing and evaluating therapeutic peptides for SOD1 mutant A4V, a key player in familial Amyotrophic Lateral Sclerosis (ALS).


Part A: SOD1 Binder Peptide Design

1. Preparation: Mutant SOD1 Sequence

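I retrieved the human SOD1 sequence (UniProt P00441) and introduced the A4V mutation (alanine to valine at residue 4). The numbering is relative to the processed chain, which lacks the initiator methionine, so the substitution falls at position 5 of the full-length sequence shown here.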

Original Sequence (P00441):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Mutant Sequence (A4V):

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
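The numbering offset is easy to get wrong, so here is a minimal sketch of the substitution. The helper name `apply_mutation` and its `has_initiator_met` flag are illustrative only and are not part of any tool used in this assignment.

```python
# Wild-type human SOD1 (UniProt P00441), including the initiator Met.
WILD_TYPE = "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"


def apply_mutation(seq: str, mutation: str, has_initiator_met: bool = True) -> str:
    """Apply a point mutation such as 'A4V', numbered on the processed chain."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    # Processed-chain numbering skips the initiator Met, so residue 4 sits at
    # 1-based position 5 (0-based index 4) of the full-length sequence.
    index = pos if has_initiator_met else pos - 1
    if seq[index] != wt:
        raise ValueError(f"expected {wt} at residue {pos}, found {seq[index]}")
    return seq[:index] + mut + seq[index + 1:]


mutant = apply_mutation(WILD_TYPE, "A4V")
print(mutant[:10])  # MATKVVCVLK
```

The wild-type check inside the function guards against applying the mutation at the wrong offset: if the residue found at the computed index is not alanine, the call fails instead of silently producing a different mutant.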

Part 1: Generate Binders with PepMLM

PepMLM-650M

The first step is to generate candidate binders using target-conditioned masked language modeling. I used the PepMLM-650M model to sample 12-residue peptides conditioned on the A4V mutant SOD1 sequence.

| Peptide ID | Sequence (12 AA) | Perplexity Score |
| --- | --- | --- |
| Known Binder | FLYRWLPSRRGG | (Reference) |
| PepMLM-0 | WRSYVVAVRHKA | 13.12 |
| PepMLM-1 | WRSPVTAAALKK | 8.76 |
| PepMLM-2 | WLYGAVGARHKE | 12.66 |
| PepMLM-3 | WRYYVAVVRHKE | 26.45 |

Observations:

  • Amino Acid Substitution: The model generated an undefined amino acid “X” at the C-terminus of PepMLM-0. To enable structural prediction in AlphaFold3, I replaced it with Alanine (A).
  • PepMLM-1 achieved the lowest perplexity score (8.76), meaning the model is most confident in this sequence given the mutant SOD1 target. Perplexity is a proxy for binder plausibility rather than a direct measure of affinity.
  • Most generated sequences show a high frequency of positively charged residues (Lysine, Arginine) or hydrophobic residues (Valine, Alanine), which may be important for interacting with the destabilized N-terminus of SOD1.
  • These candidates will now be validated structurally using AlphaFold3.
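To make the perplexity numbers easier to interpret, here is a minimal sketch of the general definition: the exponential of the mean negative log-likelihood per residue. How PepMLM masks positions and collects the per-residue probabilities is not described in this write-up, so the `log_probs` input below is a hypothetical list of per-position log-probabilities, not actual model output.

```python
import math


def perplexity(log_probs):
    """Perplexity = exp of the mean negative log-likelihood per residue."""
    if not log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


# Sanity check: a model guessing uniformly over the 20 amino acids at each
# of 12 positions has a perplexity of 20.
uniform = [math.log(1 / 20)] * 12
print(round(perplexity(uniform), 2))
```

Read this way, a perplexity of 8.76 means the model is roughly as uncertain as if it were choosing among about nine equally likely residues at each position, compared with about 26 for PepMLM-3.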

Part 2: Evaluate Binders with AlphaFold3

AlphaFold3 Server

I modeled the candidate peptides with the mutant SOD1 (A4V) using the AlphaFold3 Server to evaluate structural confidence and binding sites.

Comparison Result: PepMLM-0 (WRSYVVAVRHKA)

Figure 2: AlphaFold3 prediction of PepMLM-0 (Yellow/Orange).

| Metric | Value |
| --- | --- |
| ipTM Score | 0.39 |

Key Result: PepMLM-1 (WRSPVTAAALKK)

Figure 3: AlphaFold3 prediction of PepMLM-1 docking to SOD1 A4V (Blue).

| Metric | Value |
| --- | --- |
| ipTM Score | 0.56 |

Comparison Result: PepMLM-2 (WLYGAVGARHKE)

Figure 4: AlphaFold3 prediction of PepMLM-2.

| Metric | Value |
| --- | --- |
| ipTM Score | 0.38 |

Comparison Result: PepMLM-3 (WRYYVAVVRHKE)

Figure 5: AlphaFold3 prediction of PepMLM-3.

| Metric | Value |
| --- | --- |
| ipTM Score | 0.30 |

Reference: Known Binder (FLYRWLPSRRGG)

Figure 6: AlphaFold3 prediction of the known SOD1-binding peptide.

| Metric | Value |
| --- | --- |
| ipTM Score | 0.34 |

Analysis & Comparison:

  • PepMLM-1 vs. Known Binder: PepMLM-1 (ipTM 0.56) scores well above the known binder (ipTM 0.34) in predicted interface confidence. This suggests that target-conditioned generation via PepMLM can yield candidates that AlphaFold3 docks more confidently than a previously identified sequence, although ipTM is a confidence metric and not a measured affinity.
  • Correlation with Perplexity: Perplexity tracks ipTM at the extremes. PepMLM-1 has the lowest perplexity (8.76) and the highest ipTM (0.56), while PepMLM-3 has the highest perplexity (26.45) and the lowest ipTM (0.30). PepMLM-0 and PepMLM-2 sit in between on both metrics (perplexity 13.12 and 12.66, ipTM 0.39 and 0.38). With only four designs, this trend is suggestive rather than conclusive.
  • Common Binding Motifs: Both the PepMLM peptides and the known binder tend to localize on the exposed surface loops or β-sheet edges of the SOD1 β-barrel. This implies a general affinity for the protein’s “sticky” solvent-exposed patches.
  • Site Localization: None of the peptides, including the known binder, was predicted to bind deep in the pocket around the N-terminal A4V mutation. The designs are therefore plausible surface binders, and specific “pocket-filling” designs may require the site-specific guidance of models like moPPIt.
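The perplexity and ipTM relationship can be quantified with a Spearman rank correlation over the four PepMLM designs. The sketch below uses a small pure-Python implementation (the helper names `ranks` and `spearman` are my own) and assumes no tied values, which holds for this data.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Values reported above, in the order PepMLM-0 to PepMLM-3.
perplexities = [13.12, 8.76, 12.66, 26.45]
iptms = [0.39, 0.56, 0.38, 0.30]

rho = spearman(perplexities, iptms)
print(round(rho, 2))  # -0.8
```

The negative sign means lower perplexity goes with higher ipTM. The only rank disagreement is between PepMLM-0 and PepMLM-2, whose scores are nearly identical on both metrics, and four data points are too few to draw a firm conclusion.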

Part 3: Evaluate Properties in PeptiVerse

PeptiVerse

Beyond structural docking, we must evaluate the pharmacological and therapeutic properties of the designed peptides. I used PeptiVerse to predict how these candidates would behave in a biological environment.

| Peptide Index | Sequence | Affinity | Solubility | Hemolysis | Net Charge | AF3 ipTM |
| --- | --- | --- | --- | --- | --- | --- |
| Reference | FLYRWLPSRRGG | [Pending] | [Pending] | Non-hemolytic (0.047) | +2.76 | 0.34 |
| 0 (X→A) | WRSYVVAVRHKA | [Pending] | [Pending] | Non-hemolytic (0.031) | +2.85 | 0.39 |
| 1 | WRSPVTAAALKK | [Pending] | [Pending] | Non-hemolytic (0.020) | +2.76 | 0.56 |
| 2 | WLYGAVGARHKE | [Pending] | [Pending] | Non-hemolytic (0.035) | +0.85 | 0.38 |
| 3 | WRYYVAVVRHKE | [Pending] | [Pending] | Non-hemolytic (0.057) | +1.85 | 0.30 |

Observations:

  • AI-Designed vs. Known Binder: The AI-designed lead candidate, PepMLM-1, demonstrates superior structural confidence (ipTM 0.56) compared to the known binder (ipTM 0.34).
  • Safety Profile: PepMLM-1 also shows a lower predicted hemolysis probability (0.020) than the reference sequence (0.047), suggesting that target-conditioned generation can produce candidates that combine high structural confidence with a favorable predicted safety profile.
  • Biochemical Consistency: The top-scoring candidates (PepMLM-0 and PepMLM-1) and the known binder share a high positive net charge (+2.76 to +2.85) at physiological pH, which may facilitate the initial attraction to the target protein’s surface.
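For reference, net charge values like these can be estimated by summing Henderson-Hasselbalch partial charges over the ionizable groups. How PeptiVerse computes its value is not stated here, so the pKa set and the pH of 7.0 in this sketch are assumptions; other published pKa sets shift the result by a few tenths of a charge unit.

```python
# Assumed pKa values; other published sets give slightly different charges.
POSITIVE_PKA = {"K": 10.0, "R": 12.0, "H": 5.98}
NEGATIVE_PKA = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
N_TERM_PKA, C_TERM_PKA = 7.5, 3.55


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch net charge of a peptide with free termini."""

    def protonated(pka):  # fractional +1 charge carried by a basic group
        return 1.0 / (1.0 + 10 ** (ph - pka))

    def deprotonated(pka):  # fractional -1 charge carried by an acidic group
        return 1.0 / (1.0 + 10 ** (pka - ph))

    charge = protonated(N_TERM_PKA) - deprotonated(C_TERM_PKA)
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += protonated(POSITIVE_PKA[aa])
        elif aa in NEGATIVE_PKA:
            charge -= deprotonated(NEGATIVE_PKA[aa])
    return charge


for name, seq in [("Reference", "FLYRWLPSRRGG"), ("PepMLM-1", "WRSPVTAAALKK")]:
    print(name, round(net_charge(seq), 2))
```

Under these assumptions the three basic side chains in each peptide contribute almost a full positive charge each, the C-terminus contributes about -1, and the partially protonated N-terminus accounts for the fractional part, which lands close to the values in the Net Charge column.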

Recommendation: Based on the integrated analysis of structural confidence and predicted safety, I recommend advancing PepMLM-1 (WRSPVTAAALKK) as the lead candidate, pending the affinity and solubility predictions. It offers the best overall profile:

  1. Superior Binding: Highest ipTM score (0.56), well above the known binder (0.34).
  2. Optimal Safety: Lowest predicted hemolysis probability (0.020) among all tested sequences.
  3. Physicochemical Favorability: Strong net positive charge (+2.76) at physiological pH, matching that of the known binder (+2.76).
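The selection logic behind this recommendation can be written down as a simple ranking. This is a sketch of one reasonable ordering (ipTM first, hemolysis as a tie-breaker) and not a standard scoring scheme; the `Candidate` class and `rank` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    sequence: str
    iptm: float       # AlphaFold3 interface confidence (higher is better)
    hemolysis: float  # predicted hemolysis probability (lower is better)


# Values taken from the PeptiVerse summary table.
CANDIDATES = [
    Candidate("Reference", "FLYRWLPSRRGG", 0.34, 0.047),
    Candidate("PepMLM-0", "WRSYVVAVRHKA", 0.39, 0.031),
    Candidate("PepMLM-1", "WRSPVTAAALKK", 0.56, 0.020),
    Candidate("PepMLM-2", "WLYGAVGARHKE", 0.38, 0.035),
    Candidate("PepMLM-3", "WRYYVAVVRHKE", 0.30, 0.057),
]


def rank(cands):
    """Sort by ipTM descending, breaking ties by hemolysis ascending."""
    return sorted(cands, key=lambda c: (-c.iptm, c.hemolysis))


ranked = rank(CANDIDATES)
safest = min(CANDIDATES, key=lambda c: c.hemolysis)
print(ranked[0].name, safest.name)  # PepMLM-1 PepMLM-1
```

PepMLM-1 comes out on top under both criteria, so the choice does not depend on how the two metrics are weighted against each other.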

Part 4: Optimized Design with moPPIt

moPPIt (MOG-DFM)

While PepMLM provides plausible binders based on sequence context, moPPIt (Multi-Objective Guided Discrete Flow Matching) allows for controlled design. I used moPPIt to steer peptide generation toward specific surface patches on SOD1 and optimize for multiple objective functions simultaneously (Affinity, Solubility, and Hemolysis).

moPPIt Generated Candidates:

| Sequence | Motif Score | Binding Metric | Solubility Score | Hemolysis Score |
| --- | --- | --- | --- | --- |
| NKKSGEWFQKPG | 0.75 | 5.75 | 0.68 | 0.58 |
| KQTKIERPCCVQ | 0.75 | 6.62 | 0.67 | 0.57 |
| QACGTGVVGTTF | 0.67 | 6.88 | 0.67 | 0.63 |
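Because moPPIt optimizes several objectives at once, a natural way to read these scores is through Pareto dominance: a candidate is kept if no other candidate is at least as good on every objective and strictly better on one. The sketch below assumes every column is oriented so that higher is better, which this write-up does not confirm for the hemolysis score; the helper names are my own.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# (motif, binding, solubility, hemolysis) for the three moPPIt candidates.
# Assumption: every column is oriented so that higher is better.
SCORES = {
    "NKKSGEWFQKPG": (0.75, 5.75, 0.68, 0.58),
    "KQTKIERPCCVQ": (0.75, 6.62, 0.67, 0.57),
    "QACGTGVVGTTF": (0.67, 6.88, 0.67, 0.63),
}

front = pareto_front(SCORES)
print(front)
```

All three candidates are on the front: each one wins on at least one objective and loses on another. That conclusion does not hinge on the hemolysis orientation, because the motif, binding, and solubility columns already trade off against one another for every pair.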

Analysis: moPPIt vs. PepMLM

  • Targeted Binding: Unlike the PepMLM leads, which tended to bind general surface loops, the moPPIt-generated sequences such as NKKSGEWFQKPG were steered toward a specified motif. By specifying residue indices near position 4, moPPIt could “search” for sequences that specifically complement the destabilized N-terminus environment.
  • Complexity of Design: The moPPIt candidates exhibit a more diverse range of chemical functionalities, including specific motifs (e.g., the Proline-Glycine “turn” in ...QKPG) that are optimized to fit the target surface while maintaining high solubility.
  • Evaluation for Clinical Use: Before advancing these moPPIt designs, I would validate them using specialized assays:
    1. Biolayer Interferometry (BLI): To measure the actual $k_{on}$ and $k_{off}$ rates of the synthetic peptides against the recombinant A4V SOD1 protein.
    2. Aggregation Inhibition Assay: Since A4V causes aggressive aggregation, the ultimate test is whether these peptides prevent the mutant SOD1 from forming toxic fibrils in vitro.
    3. Cell-based Toxicity Rescue: Testing whether the peptides can rescue motor neuron-like cells (e.g., NSC-34) expressing the A4V mutant from SOD1-mediated proteotoxicity.
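For the BLI step, the two fitted rate constants combine into the equilibrium dissociation constant, $K_D = k_{off} / k_{on}$. The short sketch below shows the arithmetic; the rate values are hypothetical placeholders for illustration and are not measurements.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_D in M from k_on in 1/(M*s) and k_off in 1/s."""
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


# Hypothetical rates for illustration only, not measured values.
kd = dissociation_constant(k_on=1.0e5, k_off=1.0e-3)
print(f"K_D = {kd:.1e} M")  # K_D = 1.0e-08 M
```

A lower $K_D$ means tighter binding, and two peptides with the same $K_D$ can still differ in how long they stay bound, which is why both rates are worth reporting.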

Part C: Final Project - L-Protein Mutants

Objective: Improve the stability and auto-folding of the lysis protein of the MS2 phage.

Current Progress:

  • [Task 1: Retrieve L-protein wild-type sequence]
  • [Task 2: Identify potential destabilizing regions]
  • [Task 3: Plan ML-guided mutagenesis]