Week 5 HW: Protein Design II

This week focuses on designing and evaluating therapeutic peptides for SOD1 mutant A4V, a key player in familial Amyotrophic Lateral Sclerosis (ALS).

Part A: SOD1 Binder Peptide Design

1. Preparation: Mutant SOD1 Sequence

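I retrieved the human SOD1 sequence (UniProt P00441) and introduced the A4V mutation: alanine to valine at residue 4 of the processed chain. Because the UniProt sequence below still includes the initiator methionine, the substituted residue sits at position 5 of the listed string.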

Original Sequence (P00441):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Mutant Sequence (A4V):

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
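
The off-by-one between mature-chain numbering and the UniProt string is easy to get wrong, so here is a minimal sketch of the substitution. The helper name `apply_point_mutation` and its `offset` parameter are illustrative choices, not part of any tool used in this homework.

```python
def apply_point_mutation(seq, wt, pos, mut, offset=0):
    """Replace residue `wt` at 1-based position `pos + offset` with `mut`.

    `offset` shifts mature-chain numbering onto a sequence that still
    carries extra N-terminal residues (here, the initiator Met).
    """
    idx = pos + offset - 1  # convert to a 0-based string index
    if seq[idx] != wt:
        raise ValueError(
            f"expected {wt} at position {pos + offset}, found {seq[idx]}"
        )
    return seq[:idx] + mut + seq[idx + 1:]


# Wild-type sequence exactly as listed above (P00441, Met included).
WILD_TYPE = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

# offset=1 accounts for the initiator Met at the start of the string.
A4V = apply_point_mutation(WILD_TYPE, "A", 4, "V", offset=1)
print(A4V[:10])  # MATKVVCVLK
```

The wild-type check inside the helper fails loudly if the numbering is off. Calling it with `offset=0`, for example, raises an error because position 4 of the listed string is a lysine.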

Part 1: Generate Binders with PepMLM

PepMLM-650M

The first step is to generate candidate binders using target-conditioned masked language modeling. I used the PepMLM-650M model to sample 12-residue peptides conditioned on the A4V mutant SOD1 sequence.

Peptide ID	Sequence (12 AA)	Perplexity Score
Known Binder	`FLYRWLPSRRGG`	(Reference)
PepMLM-0	`WRSYVVAVRHKA`	13.12
PepMLM-1	`WRSPVTAAALKK`	8.76
PepMLM-2	`WLYGAVGARHKE`	12.66
PepMLM-3	`WRYYVAVVRHKE`	26.45

Observations:

Amino Acid Substitution: The model generated an undefined amino acid “X” at the C-terminus of PepMLM-0. To enable structural prediction in AlphaFold3, I replaced it with Alanine (A).
PepMLM-1 achieved the lowest perplexity score (8.76), meaning the model is most confident in this sequence as a binder for the mutant SOD1 target. Perplexity reflects model confidence and is not a measured affinity.
Most generated sequences show a high frequency of positively charged residues (Lysine, Arginine) or hydrophobic residues (Valine, Alanine), which may be important for interacting with the destabilized N-terminus of SOD1.
These candidates will now be validated structurally using AlphaFold3.
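
The X→A substitution and the residue-composition observation above can be checked with a few lines of Python. This is only a sketch: the raw PepMLM-0 string ending in `X` is inferred from the substitution described above, and the helper names are my own.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def sanitize(peptide, placeholder="A"):
    """Replace any non-standard symbol (such as X) with `placeholder`."""
    return "".join(aa if aa in STANDARD else placeholder for aa in peptide)


def composition(peptide):
    """Count positively charged (K, R) and Val/Ala residues."""
    return {
        "positive": sum(peptide.count(aa) for aa in "KR"),
        "val_ala": sum(peptide.count(aa) for aa in "VA"),
    }


generated = {
    "PepMLM-0": "WRSYVVAVRHKX",  # assumed raw output before the X -> A fix
    "PepMLM-1": "WRSPVTAAALKK",
    "PepMLM-2": "WLYGAVGARHKE",
    "PepMLM-3": "WRYYVAVVRHKE",
}

for name, seq in generated.items():
    clean = sanitize(seq)
    print(name, clean, composition(clean))
```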

Part 2: Evaluate Binders with AlphaFold3

AlphaFold3 Server

I modeled the candidate peptides with the mutant SOD1 (A4V) using the AlphaFold3 Server to evaluate structural confidence and binding sites.

Comparison Result: PepMLM-0 (WRSYVVAVRHKA)

Figure 2: AlphaFold3 prediction of PepMLM-0 (Yellow/Orange).

Metric	Value
ipTM Score	0.39

Key Result: PepMLM-1 (WRSPVTAAALKK)

Figure 3: AlphaFold3 prediction of PepMLM-1 docking to SOD1 A4V (Blue).

Metric	Value
ipTM Score	0.56

Comparison Result: PepMLM-2 (WLYGAVGARHKE)

Figure 4: AlphaFold3 prediction of PepMLM-2.

Metric	Value
ipTM Score	0.38

Comparison Result: PepMLM-3 (WRYYVAVVRHKE)

Figure 5: AlphaFold3 prediction of PepMLM-3.

Metric	Value
ipTM Score	0.30

Reference: Known Binder (FLYRWLPSRRGG)

Figure 6: AlphaFold3 prediction of the known SOD1-binding peptide.

Metric	Value
ipTM Score	0.34

Analysis & Comparison:

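PepMLM-1 vs. Known Binder: PepMLM-1 (ipTM 0.56) scores well above the known binder (ipTM 0.34) in predicted interface confidence. This suggests that target-conditioned generation via PepMLM can yield candidates that AlphaFold3 docks more confidently than a previously identified sequence. ipTM measures confidence in the predicted interface, not binding affinity, and all five values here fall below 0.6, the level under which AlphaFold3 interface predictions are generally treated as unreliable.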
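Correlation with Perplexity: Across the four PepMLM candidates, lower perplexity broadly tracks higher ipTM. PepMLM-1 has both the lowest perplexity (8.76) and the highest ipTM (0.56), while PepMLM-3 has the highest perplexity (26.45) and the lowest ipTM (0.30). The trend is not strictly monotonic (PepMLM-0 and PepMLM-2 swap order), and four data points are too few to establish a correlation. The known binder has no PepMLM perplexity score, so it is excluded from this comparison.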
Common Binding Motifs: Both the PepMLM peptides and the known binder tend to localize on the exposed surface loops or β-sheet edges of the SOD1 β-barrel. This implies a general affinity for the protein’s “sticky” solvent-exposed patches.
Site Localization: None of the peptides, including the known binder, bound deeply at the N-terminal A4V site in these predictions. The models place the peptides on the SOD1 surface, but specific “pocket-filling” designs may require the site-specific guidance of models like moPPIt.
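
To put a number on the perplexity-versus-ipTM trend, here is a small Spearman rank correlation over the four PepMLM candidates, written in plain Python with no ties assumed. With only four points, the result describes these data and is not evidence of a general relationship.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for untied data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Values copied from the tables above, in the order PepMLM-0..3.
perplexity = [13.12, 8.76, 12.66, 26.45]
iptm = [0.39, 0.56, 0.38, 0.30]

rho = spearman(perplexity, iptm)
print(round(rho, 2))  # -0.8
```

The negative sign is the expected direction: lower perplexity goes with higher ipTM.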

Part 3: Evaluate Properties in PeptiVerse

PeptiVerse

Beyond structural docking, we must evaluate the pharmacological and therapeutic properties of the designed peptides. I used PeptiVerse to predict how these candidates would behave in a biological environment.

Peptide Index	Sequence	Affinity	Solubility	Hemolysis	Net Charge	AF3 ipTM
Reference	`FLYRWLPSRRGG`	[Pending]	[Pending]	Non-hemolytic (0.047)	+2.76	0.34
0 (X→A)	`WRSYVVAVRHKA`	[Pending]	[Pending]	Non-hemolytic (0.031)	+2.85	0.39
1	`WRSPVTAAALKK`	[Pending]	[Pending]	Non-hemolytic (0.020)	+2.76	0.56
2	`WLYGAVGARHKE`	[Pending]	[Pending]	Non-hemolytic (0.035)	+0.85	0.38
3	`WRYYVAVVRHKE`	[Pending]	[Pending]	Non-hemolytic (0.057)	+1.85	0.30

Observations:

AI-Designed vs. Known Binder: The AI-designed lead candidate, PepMLM-1, demonstrates superior structural confidence (ipTM 0.56) compared to the known binder (ipTM 0.34).
Safety Profile: PepMLM-1 also shows a lower predicted hemolysis probability (0.020) than the reference sequence (0.047), although all five peptides are predicted to be non-hemolytic. PepMLM is not conditioned on hemolysis, so this is a favorable property of the sampled sequence and not something the model optimized.
Biochemical Consistency: The two top-scoring candidates (PepMLM-0 and PepMLM-1) and the known binder share a high positive net charge (+2.76 to +2.85) at physiological pH, which may facilitate the initial attraction to the target protein’s surface.
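
The net-charge column can be sanity-checked with a Henderson-Hasselbalch estimate. The sketch below is not PeptiVerse's implementation: the pKa values are an assumed, commonly used set, and pH 7.0 is my assumption for "physiological pH". With these assumptions the estimates should land close to the values in the table.

```python
# Assumed pKa values for ionizable groups; other published sets differ slightly.
POSITIVE_PKA = {"K": 10.0, "R": 12.0, "H": 5.98, "Nterm": 7.5}
NEGATIVE_PKA = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0, "Cterm": 3.55}


def net_charge(peptide, ph=7.0):
    """Estimate net charge as the sum of fractional charges of each group."""
    groups = list(peptide) + ["Nterm", "Cterm"]  # free termini assumed
    charge = 0.0
    for group in groups:
        if group in POSITIVE_PKA:
            # Fraction of the basic group that is protonated (+1).
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[group]))
        elif group in NEGATIVE_PKA:
            # Fraction of the acidic group that is deprotonated (-1).
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[group] - ph))
    return charge


peptides = [
    "FLYRWLPSRRGG",
    "WRSYVVAVRHKA",
    "WRSPVTAAALKK",
    "WLYGAVGARHKE",
    "WRYYVAVVRHKE",
]
for seq in peptides:
    print(seq, f"{net_charge(seq):+.2f}")
```

The fractional part of the charge comes mostly from the N-terminal amine, which is only partly protonated near neutral pH under the assumed pKa of 7.5.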

Recommendation: Based on the integrated analysis of structural confidence and therapeutic safety, I recommend advancing PepMLM-1 (WRSPVTAAALKK) toward clinical development. It offers the best overall profile:

Superior Binding: Highest ipTM score (0.56), well above the known binder (0.34).
Optimal Safety: Lowest predicted hemolysis probability (0.020) among all tested sequences.
Physicochemical Favorability: Strong net positive charge (+2.76) at physiological pH, identical to the predicted net charge of the known SOD1 binder.

Part 4: Optimized Design with moPPIt

moPPIt (MOG-DFM)

While PepMLM provides plausible binders based on sequence context, moPPIt, which builds on Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), allows for controlled design. I used moPPIt to steer peptide generation toward specific surface patches on SOD1 while optimizing several objectives simultaneously (affinity, solubility, and hemolysis).

moPPIt Generated Candidates:

Sequence	Motif Score	Binding Metric	Solubility Score	Hemolysis Score
`NKKSGEWFQKPG`	0.75	5.75	0.68	0.58
`KQTKIERPCCVQ`	0.75	6.62	0.67	0.57
`QACGTGVVGTTF`	0.67	6.88	0.67	0.63
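
Because moPPIt balances several objectives, a Pareto-dominance check is one simple way to see whether any candidate is strictly worse than another. The sketch below assumes every column in the table is "higher is better". That is my assumption about how the scores are reported, and it matters most for the hemolysis score.

```python
# (motif, binding, solubility, hemolysis) copied from the table above.
# Assumption: higher is better for every column.
candidates = {
    "NKKSGEWFQKPG": (0.75, 5.75, 0.68, 0.58),
    "KQTKIERPCCVQ": (0.75, 6.62, 0.67, 0.57),
    "QACGTGVVGTTF": (0.67, 6.88, 0.67, 0.63),
}


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def pareto_front(scores):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


print(pareto_front(candidates))
```

Under that assumption, no candidate dominates another: each one wins on at least one objective, so choosing among them comes down to which objective matters most.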

Analysis: moPPIt vs. PepMLM

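Targeted Binding: Unlike the PepMLM leads, which tended to bind general surface loops, the moPPIt-generated sequences were steered toward a chosen site. By specifying residue indices near position 4, moPPIt was able to “search” for sequences that specifically complement the destabilized N-terminus environment. In the table above, NKKSGEWFQKPG and KQTKIERPCCVQ have the highest motif score (0.75), while QACGTGVVGTTF trades a lower motif score (0.67) for the highest binding metric (6.88).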
Complexity of Design: The moPPIt candidates exhibit a more diverse range of chemical functionalities, including specific motifs (e.g., the Proline-Glycine “turn” in ...QKPG) that are optimized to fit the target surface while maintaining high solubility.
Evaluation for Clinical Use: Before advancing these moPPIt designs, I would validate them using specialized assays:
1. Biolayer Interferometry (BLI): To measure the actual $k_{on}$ and $k_{off}$ rates of the synthetic peptides against the recombinant A4V SOD1 protein.
2. Aggregation Inhibition Assay: Since A4V causes aggressive aggregation, the ultimate test is whether these peptides prevent the mutant SOD1 from forming toxic fibrils in vitro.
3. Cell-based Toxicity Rescue: Testing whether the peptides can rescue motor neuron-like cells (e.g., NSC-34) expressing the A4V mutant from SOD1-mediated proteotoxicity.
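
For the BLI measurement in step 1, the two rate constants combine into the equilibrium dissociation constant, the usual single-number summary of affinity. Assuming a simple 1:1 binding model:

$$K_D = \frac{k_{off}}{k_{on}}$$

A lower $K_D$ means tighter binding. As a purely hypothetical illustration (not measured data), $k_{on} = 10^{5}\ \mathrm{M^{-1}s^{-1}}$ and $k_{off} = 10^{-3}\ \mathrm{s^{-1}}$ would give $K_D = 10^{-8}\ \mathrm{M}$, i.e. 10 nM.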

Part C: Final Project - L-Protein Mutants

Objective: Improve the stability and auto-folding of the lysis protein of the MS2 phage.

Current Progress:

[Task 1: Retrieve L-protein wild-type sequence]
[Task 2: Identify potential destabilizing regions]
[Task 3: Plan ML-guided mutagenesis]