Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

  1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

P00441 can be found on the UniProt site. It has the following sequence:

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

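A4V uses the conventional SOD1 numbering, which does not count the initiator methionine. The residue to change is therefore the alanine at position 5 of the UniProt sequence. I swapped MATKA → MATKV to get:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ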
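The substitution can also be applied and checked in code. Below is a minimal Python sketch. It assumes SOD1's conventional numbering, where the initiator Met is not counted, so the code applies an offset of one position. `apply_mutation` is a hypothetical helper, not part of any library.

```python
# Minimal sketch: apply a SOD1 point mutation such as "A4V" to the UniProt sequence.
# Assumption: SOD1 mutations use legacy numbering that skips the initiator Met,
# so residue n in that numbering is residue n + 1 in the UniProt sequence.
WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(seq: str, mutation: str, met_offset: int = 1) -> str:
    """Return seq with a substitution like 'A4V' applied.

    met_offset=1 shifts legacy SOD1 numbering onto UniProt numbering;
    pass 0 if the mutation is already written in UniProt numbering.
    """
    wt, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos + met_offset - 1  # 0-based Python index
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} for {mutation}, found {seq[idx]}")
    return seq[:idx] + new + seq[idx + 1:]


mutant = apply_mutation(WT_SOD1, "A4V")
print(len(mutant), mutant[:10])  # 154 MATKVVCVLK
```

The residue check is the useful part. Calling `apply_mutation(WT_SOD1, "A4V", met_offset=0)` raises a `ValueError` because UniProt position 4 is a lysine, which catches the numbering mismatch.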

  1. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:

See table below.

  1. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

See table below.

  1. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

See table below.

  1. Record the perplexity scores that indicate PepMLM’s confidence in the binders.
| Binder | Pseudo-perplexity |
|---|---|
| WHYGAVAAAHKE | 7.5475612006564115 |
| WRYGATGARHKE | 11.178195011097767 |
| WRYPVAALELWK | 21.190836127543506 |
| WRYPAVVLRLKE | 13.790132945872145 |
| FLYRWLPSRRGG (control) | |
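Pseudo-perplexity, as commonly defined for masked language models, is the exponential of the average negative log-probability of each true residue when only that position is masked. Lower values mean the model is more confident. The sketch below assumes the per-position probabilities have already been extracted from the model. The numbers in it are made up for illustration and are not PepMLM output.

```python
import math


def pseudo_perplexity(position_probs: list[float]) -> float:
    """exp(mean negative log-likelihood), where position_probs[i] is the
    probability of the true residue at position i with that position masked."""
    nll = [-math.log(p) for p in position_probs]
    return math.exp(sum(nll) / len(nll))


# Illustrative (made-up) probabilities for a 12-residue peptide.
uniform_guess = [1 / 20] * 12  # no better than picking 1 of 20 amino acids
confident = [0.5] * 12

print(round(pseudo_perplexity(uniform_guess), 6))  # 20.0
print(round(pseudo_perplexity(confident), 6))      # 2.0
```

On this scale, the 21.2 for WRYPVAALELWK is roughly what a model guessing uniformly over the 20 amino acids would score. The 7.5 for WHYGAVAAAHKE reflects noticeably higher confidence.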

Part 2: Evaluate Binders with AlphaFold3

  1. Navigate to the AlphaFold Server
  2. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
  3. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
| Binder | ipTM | Binding site |
|---|---|---|
| WHYGAVAAAHKE | 0.31 | Not near the N-terminus; engages the β-barrel region. Surface-bound. |
| WRYGATGARHKE | 0.33 | Not near the N-terminus; possibly engages the β-barrel region. Surface-bound. |
| WRYPVAALELWK | 0.21 | Not near the N-terminus; possibly engages the β-barrel region. Surface-bound. |
| WRYPAVVLRLKE | 0.30 | Somewhat near the N-terminus; does not engage the β-barrel region. Surface-bound. |
| FLYRWLPSRRGG (control) | 0.36 | On the side opposite the N-terminus, with the β-barrel in between. Surface-bound. |
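Submitting five jobs by hand is repetitive, and AlphaFold Server also accepts an uploaded JSON file of jobs. The sketch below builds one job per peptide, with the mutant SOD1 and the peptide as two separate protein chains. The field names (`name`, `modelSeeds`, `sequences`, `proteinChain`, `count`) are an assumption about the server's JSON upload format. Check them against the server's documentation before uploading. The job names are arbitrary.

```python
import json

# A4V mutant SOD1 (UniProt P00441 with A4V applied).
MUTANT_SOD1 = (
    "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)
PEPTIDES = ["WHYGAVAAAHKE", "WRYGATGARHKE", "WRYPVAALELWK",
            "WRYPAVVLRLKE", "FLYRWLPSRRGG"]


def server_job(name: str, receptor: str, peptide: str) -> dict:
    """One job: receptor and peptide as separate protein chains.
    Field names are assumed from the server's JSON upload format."""
    return {
        "name": name,
        "modelSeeds": [],  # assumed: an empty list lets the server pick a seed
        "sequences": [
            {"proteinChain": {"sequence": receptor, "count": 1}},
            {"proteinChain": {"sequence": peptide, "count": 1}},
        ],
    }


jobs = [server_job(f"SOD1_A4V_{p}", MUTANT_SOD1, p) for p in PEPTIDES]
payload = json.dumps(jobs, indent=2)
print(f"{len(jobs)} jobs, {len(payload)} characters of JSON")
```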
  1. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

None of the PepMLM peptides match or exceed the known binder. The best generated peptide reaches an ipTM of 0.33, versus 0.36 for the control. However, all of the values are low: below 0.6, AlphaFold Server's guidance treats an interface prediction as likely failed, so none of these predicted poses should be trusted much. Visually, some of the generated peptides appear to bind better than the known binder (e.g. WHYGAVAAAHKE and its contacts with the β-barrel).
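The comparison can be made explicit by binning each ipTM score with AlphaFold Server's rough guidance:

- above 0.8 is confident;
- 0.6 to 0.8 is a grey zone;
- below 0.6 is likely a failed prediction.

Each peptide can then be measured against the control. The minimal sketch below uses the scores from the table above. `iptm_band` is a hypothetical helper.

```python
IPTM = {
    "WHYGAVAAAHKE": 0.31,
    "WRYGATGARHKE": 0.33,
    "WRYPVAALELWK": 0.21,
    "WRYPAVVLRLKE": 0.30,
    "FLYRWLPSRRGG": 0.36,  # known binder (control)
}
CONTROL = "FLYRWLPSRRGG"


def iptm_band(score: float) -> str:
    """Bin an ipTM score using AlphaFold Server's rough interpretation guide."""
    if score > 0.8:
        return "confident"
    if score >= 0.6:
        return "grey zone"
    return "likely failed"


control_score = IPTM[CONTROL]
for peptide, score in sorted(IPTM.items(), key=lambda kv: kv[1], reverse=True):
    delta = score - control_score
    print(f"{peptide}  ipTM={score:.2f}  {iptm_band(score):<13}  vs control {delta:+.2f}")
```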

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes for:
    1. Predicted binding affinity
    2. Solubility
    3. Hemolysis probability
    4. Net charge (pH 7)
    5. Molecular weight
| Binder | Predicted binding affinity (pKd/pKi) | Solubility (0 to 1) | Hemolysis probability (0 to 1) | Net charge (pH 7) | Molecular weight (Da) |
|---|---|---|---|---|---|
| WHYGAVAAAHKE | Weak binding, 5.893 | Soluble, 1.000 | Non-hemolytic, 0.024 | -0.06 | 1339.5 |
| WRYGATGARHKE | Weak binding, 6.087 | Soluble, 1.000 | Non-hemolytic, 0.028 | 1.85 | 1431.6 |
| WRYPVAALELWK | Weak binding, 6.446 | Soluble, 0.982 | Non-hemolytic, 0.061 | 0.76 | 1531.8 |
| WRYPAVVLRLKE | Weak binding, 6.506 | Soluble, 0.816 | Non-hemolytic, 0.050 | 1.77 | 1529.8 |
| FLYRWLPSRRGG (control) | Weak binding, 6.361 | Soluble, 0.608 | Non-hemolytic, 0.047 | 2.76 | 1507.7 |
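Two of these columns can be sanity-checked by hand:

- Molecular weight is the sum of average residue masses plus one water.
- Net charge at pH 7 follows from the Henderson-Hasselbalch equation applied to each ionizable group.

The sketch below is minimal. The pKa values are one common textbook-style set and are an assumption, since other tools (possibly including PeptiVerse) use different sets or treat the termini differently.

```python
# Average residue masses (Da), i.e. amino acid mass minus one water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Assumed pKa values; other tools use slightly different sets.
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def molecular_weight(seq: str) -> float:
    """Average mass of a linear peptide with free termini."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Sum of fractional charges from the Henderson-Hasselbalch equation."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))  # free N-terminal amine
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))  # free C-terminal carboxyl
    for aa in seq:
        if aa in PKA_BASIC:
            positive += 1 / (1 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            negative += 1 / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


for pep in ["WHYGAVAAAHKE", "WRYGATGARHKE", "WRYPVAALELWK",
            "WRYPAVVLRLKE", "FLYRWLPSRRGG"]:
    print(f"{pep}  MW={molecular_weight(pep):.1f} Da  charge={net_charge(pep):+.2f}")
```

With these masses, the computed weights match the table to within about 0.1 Da. The net charges come out roughly 0.2 to 0.5 higher than PeptiVerse's, and the histidine-containing WHYGAVAAAHKE even comes out slightly positive. This is a reminder that charge estimates depend on the pKa set.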

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see.

Structurally, the clear outlier is WRYPVAALELWK, with an ipTM of 0.21, but its other predicted properties do not stand out as much. Its binding affinity, solubility, and net charge all fall within the range of the other peptides. The one notable difference is that it has the highest hemolysis probability (0.061), though that is still low enough to be classified as non-hemolytic.

The two best performers, WRYGATGARHKE (ipTM 0.33) and the control (0.36), have solubilities at opposite extremes (1.000 and 0.608) and low hemolysis probabilities (0.028 and 0.047). Both do have above-average net charge (1.85 and 2.76), which might be something to leverage.
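With only five peptides, any trend is anecdotal, but the comparison above can be quantified by correlating ipTM with each PeptiVerse property. The minimal sketch below computes Pearson's r by hand from the two tables. The `pearson` helper is written out here rather than taken from a library.

```python
import math

# Values copied from the ipTM and PeptiVerse tables, in the order
# WHYGAVAAAHKE, WRYGATGARHKE, WRYPVAALELWK, WRYPAVVLRLKE, FLYRWLPSRRGG.
IPTM_SCORES = [0.31, 0.33, 0.21, 0.30, 0.36]
PROPERTIES = {
    "pKd/pKi": [5.893, 6.087, 6.446, 6.506, 6.361],
    "solubility": [1.000, 1.000, 0.982, 0.816, 0.608],
    "hemolysis": [0.024, 0.028, 0.061, 0.050, 0.047],
    "net charge": [-0.06, 1.85, 0.76, 1.77, 2.76],
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


for name, values in PROPERTIES.items():
    print(f"ipTM vs {name:<10} r = {pearson(IPTM_SCORES, values):+.2f}")
```

Pearson's r is sensitive to single outliers such as WRYPVAALELWK. With n = 5, it is better read as a way to spot which property moves with ipTM than as evidence of a relationship.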

Choose one peptide you would advance and justify your decision briefly.

I don’t have an obvious front-runner, but I would advance WRYGATGARHKE. It is just behind the control in ipTM (0.33 vs. 0.36) yet has a quite different property profile (higher solubility and lower hemolysis probability). Optimizing it could reveal a new direction for designing good peptides.

Part 4: Generate Optimized Peptides with moPPIt

  1. Open the moPPIt Colab linked from the HuggingFace moPPIt model card.
  2. Make a copy and switch to a GPU runtime.
  3. In the notebook:
    1. Paste your A4V mutant SOD1 sequence.
    2. Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
    3. Set peptide length to 12 amino acids.
    4. Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.

I used the following inputs for generation (screenshot: Binder Generation Setup).

  1. After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?
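| Binder | Predicted binding affinity (pKd/pKi) | Solubility (0 to 1) | Hemolysis probability (0 to 1) | ipTM |
|---|---|---|---|---|
| CTAGSTVGVGVW | 6.7832 | 0.9996 | 0.0618 | 0.36 |
| ASATFEPPPVCH | 5.8068 | 1 | 0.0223 | 0.39 |
| VSEKYCVQFGKT | 6.2623 | 1 | 0.0405 | 0.33 |
| MSAGICNEFKQK | 5.6404 | 1 | 0.0238 | 0.55 |
| KNPCEAYCFNWV | 6.7200 | 1 | 0.0346 | 0.28 |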

The moPPIt peptides are more varied. PepMLM reused the same starting and ending residues across its peptides (three of the four begin with WRY and three end in KE), whereas the moPPIt sequences look completely unique. That variety is not reflected as much in the predicted properties, though. To evaluate them, I would run each sequence through the same tools (AlphaFold3 for ipTM, PeptiVerse for properties) and compare against the PepMLM peptides to see whether anything improves. For example, MSAGICNEFKQK jumped to an ipTM of 0.55. That is promising, even though it is still below 0.6.
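The difference in variety can be quantified with mean pairwise sequence identity. Here, identity is the fraction of aligned positions that match between two peptides, with no gaps. That simple definition works because every peptide here is a 12-mer. The sketch below is minimal. `identity` and `mean_pairwise_identity` are hypothetical helpers.

```python
from itertools import combinations

PEPMLM = ["WHYGAVAAAHKE", "WRYGATGARHKE", "WRYPVAALELWK", "WRYPAVVLRLKE"]
MOPPIT = ["CTAGSTVGVGVW", "ASATFEPPPVCH", "VSEKYCVQFGKT",
          "MSAGICNEFKQK", "KNPCEAYCFNWV"]


def identity(a: str, b: str) -> float:
    """Fraction of positions with the same residue (ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(peptides: list[str]) -> float:
    """Average identity over every unordered pair of peptides."""
    pairs = list(combinations(peptides, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


print(f"PepMLM: {mean_pairwise_identity(PEPMLM):.2f}")  # PepMLM: 0.46
print(f"moPPIt: {mean_pairwise_identity(MOPPIT):.2f}")  # moPPIt: 0.11
```

Ungapped identity is the simplest choice for fixed-length peptides. Peptides of different lengths would need an alignment-based identity instead.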

Part C: Final Project: L-Protein Mutants

High-level summary: The objective of this project is to improve the stability and self-folding of the lysis (L) protein of the MS2 phage. Its lysis mechanism is key to understanding how phages could potentially help address antibiotic resistance.