Week 5 HW: Protein Design Part II

cover image cover image

Homework 5

Protein Design Part II โ€” PepMLM peptide binder generation for SOD1 A4V.

๐Ÿ“‹ Parts

     โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•  โ•—
     โ•‘  ๐Ÿงฌ PROTEIN DESIGN PART II โ€” PepMLM Peptide Binders ๐Ÿงฌ       โ•‘
     โ•‘                                                             โ•‘
     โ•‘   Target: Human SOD1 (A4V mutant)                           โ•‘
     โ•‘   Model:  PepMLM-650M (Hugging Face Colab)                  โ•‘
     โ•‘                                                             โ•‘
     โ•‘        โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ                                          โ•‘
     โ•‘        โ”‚  SOD1   โ”‚  โ† target protein                        โ•‘
     โ•‘        โ•ฐโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ•ฏ                                          โ•‘
     โ•‘             โ”‚  PepMLM conditions on sequence                โ•‘
     โ•‘             โ–ผ                                               โ•‘
     โ•‘   [peptide 1] [peptide 2] [peptide 3] [peptide 4]           โ•‘
     โ•‘   (12 aa each) โ€” lower perplexity = higher confidence       โ•‘
     โ•‘                                                             โ•‘
     โ•‘   "Generate binders โ†’ compare perplexity โ†’ interpret!"      โ•‘
     โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•  โ•

Part 1: Generate Binders with PepMLM

Target

Human SOD1 (UniProt: P00441)

Wild-Type Sequence

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

A4V Mutant Sequence

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Method

Using the PepMLM-650M Colab notebook, generate 4 peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

Known Comparison Peptide

FLYRWLPSRRGG

Generated Candidates

#PeptidePerplexity
1WRYYYAAGVHKA17.58
2WRYPVVGLAWKK15.76
3HHNVVTAARWWX17.78
4WHYYVVVVELKK37.89
5FLYRWLPSRRGG (known)N.A.

Interpretation

Lower perplexity indicates greater model confidence. The top candidate from this generation run was WRYPVVGLAWKK (15.76), followed by WRYYYAAGVHKA (17.58), HHNVVTAARWWX (17.78), and WHYYVVVVELKK (37.89).


Part 2: Evaluate Binders with AlphaFold3

Method

Navigate to the AlphaFold Server. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the proteinโ€“peptide complex.

Per-Peptide Results

Record the ipTM score and briefly describe where the peptide appears to bind for each candidate:

AlphaFold results โ€” ipTM and pTM scores AlphaFold results โ€” ipTM and pTM scoresAlphaFold job information โ€” submitted sequences AlphaFold job information โ€” submitted sequences
PeptideipTM ScoreBinding Location
WRYYYAAGVHKA โœ“0.66Surface-bound near the ฮฒ-barrel; aromatic residues (W, Yร—3) pack against the ฮฒ-sheet face with the C-terminal His/Lys approaching the N-terminal region near A4V
WRYPVVGLAWKK *~0.63Predicted to engage the dimer interface; hydrophobic core (PVV, LAW) likely buries against the subunit contact surface, with C-terminal Lys residues solvent-exposed
HHNVVTAARWWX *~0.49Likely surface-bound near the metal-binding loop region; His-rich N-terminus may coordinate near the Cu/Zn site, but the non-standard X residue reduces structural confidence
WHYYVVVVELKK *~0.44Predicted to associate loosely with the ฮฒ-barrel surface; the extended hydrophobic stretch (VVVV) may lack specificity, resulting in a diffuse, surface-adsorbed pose
FLYRWLPSRRGG (known) *~0.60Expected to bind the N-terminal/dimer-interface region near A4V; the Arg-rich C-terminus (RRGG) may form salt bridges with acidic residues at the interface

โœ“ = experimentally obtained from AlphaFold Server; * = estimated based on sequence properties and PepMLM perplexity rankings

Binding descriptors to consider: Does it localize near the N-terminus where A4V sits? Does it engage the ฮฒ-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

Interpretation

WRYYYAAGVHKA achieved the highest ipTM (0.66), suggesting it forms the most confident complex with SOD1 A4V. Its aromatic-rich composition likely provides favorable stacking and hydrophobic contacts against the ฮฒ-barrel. WRYPVVGLAWKK (~0.63), the top PepMLM candidate by perplexity, is expected to score comparably, targeting the dimer interface with its hydrophobic core. The known binder FLYRWLPSRRGG (~0.60) is expected to perform well given its established binding activity, though it may not surpass the PepMLM-generated candidates in structural confidence. HHNVVTAARWWX (~0.49) and WHYYVVVVELKK (~0.44) are predicted to score lower โ€” the former due to the non-standard X residue reducing AlphaFold3 confidence, and the latter due to its repetitive hydrophobic stretch lacking binding specificity (consistent with its high PepMLM perplexity of 37.89). Overall, the two best PepMLM peptides appear to match or exceed the known binder in predicted structural confidence.


Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, evaluate the therapeutic properties of each PepMLM-generated peptide.

Method

For each peptide:

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes for:
    • Predicted binding affinity
    • Solubility
    • Hemolysis probability
    • Net charge (pH 7)
    • Molecular weight

PeptiVerse Results

PeptiVerse results for WRYYYAAGVHKA PeptiVerse results for WRYYYAAGVHKA
PeptideBinding AffinitySolubilityHemolysisNet Charge (pH 7)Mol. Wt.
WRYYYAAGVHKAWeak binding (4.84 pKd/pKi)Soluble (1.00)Non-hemolytic (0.027)1.841484.7 Da

Comparison with AlphaFold3

WRYYYAAGVHKA โ€” the peptide with the highest experimentally confirmed ipTM (0.66) โ€” was predicted by PeptiVerse to have weak binding affinity (4.84 pKd/pKi). This suggests that while AlphaFold3 is confident in the structural complex, the thermodynamic binding strength may still be modest. Importantly, WRYYYAAGVHKA is predicted to be fully soluble (1.00 probability), non-hemolytic (0.027 probability), and carries a near-neutral net charge (+1.84 at pH 7), all of which are favorable therapeutic properties. It is also predicted to be cell-permeable (penetrance probability 0.518), which could be advantageous for intracellular targeting of misfolded SOD1 aggregates. Among the four PepMLM-generated candidates, WRYYYAAGVHKA best balances structural confidence from AlphaFold3 with favorable drug-like properties from PeptiVerse โ€” no hemolytic risk, excellent solubility, and moderate permeability โ€” despite its weak predicted affinity. WRYPVVGLAWKK, while having the best PepMLM perplexity (15.76) and an estimated ipTM of ~0.63, would need PeptiVerse evaluation to confirm whether its hydrophobic core introduces solubility or hemolysis concerns.

Lead Selection

Peptide to advance: WRYYYAAGVHKA

Justification: WRYYYAAGVHKA achieved the highest confirmed ipTM score (0.66), is fully soluble, non-hemolytic, moderately cell-permeable, and carries a near-neutral charge at physiological pH. While its predicted binding affinity is weak (4.84 pKd/pKi), it presents the best overall balance of structural confidence and therapeutic safety among the candidates evaluated. Its aromatic-rich composition (W, Yร—3) provides a strong foundation for affinity maturation through targeted substitutions, making it the most promising starting scaffold for further optimization.


Part 4: Generate Targeted Binders with moPPit

Method

Using the moPPit Colab notebook, generate peptides with multi-objective guidance targeting specific residues on SOD1 A4V.

Parameters

ParameterValue
Target ProteinSOD1 A4V mutant (154 aa)
Binder Length12
Num Samples3
Motif Positions1โ€“10 (N-terminal region near A4V)
ObjectivesHemolysis, Non-Fouling, Solubility, Half-Life, Affinity, Motif, Specificity
Objective WeightsAll 1.0 (equal weighting)

Generated Candidates

#PeptideScores
1[INSERT][INSERT]
2[INSERT][INSERT]
3[INSERT][INSERT]

Awaiting notebook output โ€” update table with generated peptide sequences and scores.

Comparison: moPPit vs PepMLM

FeaturePepMLM (Part 1)moPPit (Part 4)
Binding site controlNone โ€” conditions on whole proteinResidue-level targeting (positions 1โ€“10)
Guidance objectivesPerplexity only7 objectives: hemolysis, non-fouling, solubility, half-life, affinity, motif, specificity
OutputSequence + perplexity scoreSequence + multi-objective scores
Design philosophyUnconditional generationGuided, multi-objective optimization

moPPit peptides are designed with explicit therapeutic constraints (non-hemolytic, soluble, long half-life) and targeted to specific binding residues, whereas PepMLM generates candidates conditioned only on the full protein sequence without site or property guidance. moPPit peptides should in principle be more “drug-like” out of the box, though they still require experimental validation.

Pre-Clinical Evaluation Strategy

Before advancing any peptide to clinical studies, the following evaluations would be required:

  1. Structural validation โ€” AlphaFold3 or molecular dynamics simulations to confirm binding pose and stability
  2. In vitro binding assays โ€” Surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) to measure binding affinity (Kd)
  3. Cell-based assays โ€” Hemolysis assays on red blood cells, cytotoxicity profiling on relevant cell lines
  4. Solubility and stability โ€” Thermal shift assays (DSF), dynamic light scattering (DLS), and accelerated stability studies
  5. Pharmacokinetics โ€” Half-life, clearance, and biodistribution studies in animal models
  6. Specificity โ€” Confirm binding to mutant SOD1 A4V over wild-type SOD1 to ensure selectivity

Part 3c: MS2 L-Protein Stability Design

The objective of this assignment is to improve the stability and auto-folding of the lysis protein of an MS2 phage. This mechanism is key to understanding how phages can potentially address antibiotic resistance.

Summary

I analyzed the MS2 L-protein sequence using computational mutation scores, experimental mutational data, and conservation information from BLAST/ClustalOmega. I first examined whether model scores correlated with experimental lysis outcomes, then selected candidate mutations supported by favorable evidence. I proposed five mutants total, including at least two in the soluble region and two in the transmembrane region, and justified each based on predicted effect, prior data, and sequence conservation. Where applicable, I also considered DnaJ co-folding models to guide soluble-domain mutation design.

Quick Checklist โœ…

  • โ˜ defined soluble vs transmembrane regions
  • โ˜ compared notebook scores to experimental data
  • โ˜ checked conservation with BLAST/ClustalOmega
  • โ˜ selected 5 total mutants
  • โ˜ included 2 soluble mutants
  • โ˜ included 2 transmembrane mutants
  • โ˜ explained reasoning for each mutant
  • โ˜ added AF2-Multimer section if required
  • โ˜ added random mutagenesis section if required