Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

  1. SOD1 sequence (taken from https://rest.uniprot.org/uniprotkb/P00441.fasta ):

     >sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
     MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
     AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
     HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

     Mutated form (A4V: A -> V at position 4, counted without the initiator Met):

     MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
     AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
     HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

  2. (2-5)
    #  Binder        Pseudo Perplexity  Notes
    0  WHYGAAQAAHWX  7.60026377248636   High confidence
    1  WLYGASAAAWKK  7.46473740432208   Highest confidence!!!
    2  WLYGAAGVAWKE  10.9325804754158   Moderate confidence
    3  WLYYPQAAKLKK  15.5499787120909   Lowest confidence
    -  FLYRWLPSRRGG  20.9180890005569   Known binder (control)
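Step 1 above swaps a single residue in the UniProt sequence. Here is a minimal sketch of that substitution. It assumes the conventional SOD1 numbering, which does not count the initiator Met, so A4 sits one position later in the FASTA sequence. The helper name `apply_mutation` and its `offset` parameter are illustrative and not part of any tool used here.

```python
# Wild-type SOD1 sequence as given in the UniProt FASTA record P00441.
WILD_TYPE = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(seq, wt_aa, position, new_aa, offset=0):
    """Return seq with one substitution applied.

    position is 1-based; offset shifts it when the numbering scheme
    skips leading residues (assumption: offset=1 for the initiator Met).
    """
    idx = position + offset - 1
    if seq[idx] != wt_aa:
        raise ValueError(f"expected {wt_aa} at index {idx}, found {seq[idx]}")
    return seq[:idx] + new_aa + seq[idx + 1:]


a4v = apply_mutation(WILD_TYPE, "A", 4, "V", offset=1)
print(a4v[:11])  # first residues of the mutated sequence
```

The check on `wt_aa` catches off-by-one numbering mistakes before the mutated sequence is passed to any downstream tool.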

Part 2: Evaluate Binders with AlphaFold3

In WHYGAAQAAHWX, the X stands for an unknown amino acid, and AlphaFold3 is not going to work with it, so I’m skipping that peptide for now.
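The same screening step can be done in a few lines before submitting anything to AlphaFold3. This is only a sketch: it keeps peptides built from the 20 standard amino acids and orders them by pseudo-perplexity (lower meaning higher PepMLM confidence). The variable and function names are illustrative.

```python
# The 20 standard amino acids; anything else (such as X) is filtered out.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")

# Peptide -> pseudo-perplexity, copied from the PepMLM table above.
candidates = {
    "WHYGAAQAAHWX": 7.60026377248636,
    "WLYGASAAAWKK": 7.46473740432208,
    "WLYGAAGVAWKE": 10.9325804754158,
    "WLYYPQAAKLKK": 15.5499787120909,
    "FLYRWLPSRRGG": 20.9180890005569,  # known binder (control)
}


def is_canonical(peptide):
    """True if every residue is one of the 20 standard amino acids."""
    return set(peptide) <= CANONICAL


# Peptides that can go forward, lowest pseudo-perplexity first.
usable = sorted((p for p in candidates if is_canonical(p)), key=candidates.get)
skipped = [p for p in candidates if not is_canonical(p)]

print("usable:", usable)
print("skipped:", skipped)
```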

[Image: SOD1_A4V_WLYYPQAAKLKK]

[Image: SOD1_A4V_WLYGASAAAWKK]

[Image: SOD1_A4V_WLYGAAGVAWKE]

[Image: SOD1_A4V_FLYRWLPSRRGG]

Peptide               ipTM  pTM   Binding Location
FLYRWLPSRRGG (known)  0.34  0.83  Surface-bound, near bottom/loop region
WLYGAAGVAWKE          0.45  0.88  Engages β-barrel, partially buried near core
WLYGASAAAWKK          0.24  0.78  Surface-bound, near N-terminus/top loops
WLYYPQAAKLKK          0.27  0.70  Surface-bound, loose association near loops

  1. ipTM scores across all peptides ranged from 0.24 to 0.45, which suggests weak-to-moderate predicted interface confidence. The PepMLM-generated peptide WLYGAAGVAWKE (ipTM = 0.45) outperformed the known binder FLYRWLPSRRGG (ipTM = 0.34) and appeared more engaged with the β-barrel core of SOD1. The remaining two peptides, WLYGASAAAWKK (ipTM = 0.24) and WLYYPQAAKLKK (ipTM = 0.27), scored below FLYRWLPSRRGG and appeared loosely surface-bound. None exceeded an ipTM of 0.5, which is not unusual for short peptides against a structured target.
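The comparison against the control can be written down as a short script. This is a sketch under the assumption that ipTM alone is used for this step; the 0.5 cutoff is the one mentioned in the text, and all names are illustrative.

```python
# ipTM values copied from the AlphaFold3 table above.
iptm = {
    "FLYRWLPSRRGG": 0.34,  # known binder (control)
    "WLYGAAGVAWKE": 0.45,
    "WLYGASAAAWKK": 0.24,
    "WLYYPQAAKLKK": 0.27,
}
CONTROL = "FLYRWLPSRRGG"
THRESHOLD = 0.5  # cutoff referred to in the text

lowest, highest = min(iptm.values()), max(iptm.values())

# Generated peptides whose ipTM is higher than the control's.
beats_control = [p for p, s in iptm.items() if p != CONTROL and s > iptm[CONTROL]]

# Any peptide above the 0.5 cutoff.
above_threshold = [p for p, s in iptm.items() if s > THRESHOLD]

print(f"ipTM range: {lowest} to {highest}")
print("beats control:", beats_control)
print("above threshold:", above_threshold)
```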

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

[Image: WLYYPQAAKLKK_pv]

[Image: WLYGASAAAWKK_pv]

[Image: WLYGAAGVAWKE_pv]

[Image: FLYRWLPSRRGG_pv]

Peptide               ipTM  Binding Affinity (pKd)  Solubility  Hemolysis      Net Charge
FLYRWLPSRRGG (known)  0.34  5.968                   Soluble     Non-hemolytic  +2.76
WLYGAAGVAWKE          0.45  6.683                   Soluble     Non-hemolytic  -0.23
WLYGASAAAWKK          0.24  6.071                   Soluble     Non-hemolytic  +1.76
WLYYPQAAKLKK          0.27  5.482                   Soluble     Non-hemolytic  +2.76

WLYGAAGVAWKE had both the highest ipTM (0.45) and the strongest predicted binding affinity (6.683 pKd), suggesting that structural confidence and binding prediction align for this peptide. All peptides were predicted to be soluble and non-hemolytic, so none raised red flags on those two properties. Higher ipTM correlated only loosely with stronger affinity: WLYGAAGVAWKE topped both metrics and WLYYPQAAKLKK scored lowest on affinity with a weak ipTM, but WLYGASAAAWKK had the lowest ipTM and the second-highest affinity.
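To put a number on "loosely", one option is a Spearman rank correlation between ipTM and pKd. The sketch below implements it by hand for the four peptides, assuming no tied values. With only four data points the result is descriptive at best and not a statistical test.

```python
# (ipTM, pKd) pairs copied from the PeptiVerse table above.
data = {
    "FLYRWLPSRRGG": (0.34, 5.968),
    "WLYGAAGVAWKE": (0.45, 6.683),
    "WLYGASAAAWKK": (0.24, 6.071),
    "WLYYPQAAKLKK": (0.27, 5.482),
}


def ranks(values):
    """1-based ranks in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


iptm_values = [v[0] for v in data.values()]
pkd_values = [v[1] for v in data.values()]
rho = spearman(iptm_values, pkd_values)
print(f"Spearman rho = {rho:.2f}")
```

A rho of 1 would mean both metrics rank the peptides identically, while a value near 0 would mean the rankings are unrelated.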

Peptide to Advance: WLYGAAGVAWKE!!!

WLYGAAGVAWKE has the best structural binding confidence (ipTM = 0.45) and the strongest predicted affinity (6.683 pKd), and it is predicted to be soluble and non-hemolytic. It outperforms the known binder FLYRWLPSRRGG on ipTM and predicted affinity and matches it on solubility and hemolysis. Therefore, we should use this one.
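The selection logic can be sketched as a filter followed by a ranking. The rule below (keep soluble, non-hemolytic peptides, then rank by ipTM and break ties with pKd) is one possible formalization, not a prescribed procedure. The fold-change line assumes the usual definition pKd = -log10(Kd in M), and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    seq: str
    iptm: float
    pkd: float
    soluble: bool
    hemolytic: bool


# Values copied from the PeptiVerse table above.
peptides = [
    Peptide("FLYRWLPSRRGG", 0.34, 5.968, True, False),  # known binder
    Peptide("WLYGAAGVAWKE", 0.45, 6.683, True, False),
    Peptide("WLYGASAAAWKK", 0.24, 6.071, True, False),
    Peptide("WLYYPQAAKLKK", 0.27, 5.482, True, False),
]


def pick_lead(candidates):
    """Drop insoluble or hemolytic peptides, then take the best (ipTM, pKd)."""
    safe = [p for p in candidates if p.soluble and not p.hemolytic]
    return max(safe, key=lambda p: (p.iptm, p.pkd))


lead = pick_lead(peptides)
control = peptides[0]

# Assumption: pKd = -log10(Kd), so a pKd difference maps to a fold change in Kd.
fold_tighter = 10 ** (lead.pkd - control.pkd)

print("lead:", lead.seq)
print(f"predicted Kd is about {fold_tighter:.1f}x lower than the control's")
```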

Part 4: Generate Optimized Peptides with moPPIt

I tried to run moPPIt for motif positions 1, 2, 3, 4, 5, 6, 7, but it’s taking so long that I’m not going to finish running it. Here are some results that I got so far:

moPPIt Generated Peptides

#  Peptide       Target Residues
1  GKTEKTYTDCCD  1, 2, 3, 4, 5, 6, 7
2  EEQNTCIQTTKA  1, 2, 3, 4, 5, 6, 7

Comparison: moPPIt vs PepMLM Peptides:

moPPIt peptides differ notably from PepMLM-generated ones in both composition and design intent. PepMLM peptides (e.g., WLYGAAGVAWKE, WLYGASAAAWKK) were dominated by W at position 1 and showed a hydrophobic character, as the model simply sampled sequences likely to bind SOD1 without any spatial constraints. In contrast, moPPIt peptides (GKTEKTYTDCCD, EEQNTCIQTTKA) are more polar and charged while balancing affinity, solubility and hemolysis objectives. moPPIt asks “what binds specifically near position 4, and is also therapeutically viable?”
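The composition difference described above can be checked with a simple residue count. This is a rough sketch: the hydrophobic set (A, V, I, L, M, F, W, Y) is one common grouping chosen here as an assumption, and other hydrophobicity scales would give somewhat different numbers.

```python
HYDROPHOBIC = set("AVILMFWY")  # assumption: one common grouping, not a standard
CHARGED = set("DEKR")

groups = {
    "PepMLM": ["WLYGAAGVAWKE", "WLYGASAAAWKK", "WLYYPQAAKLKK"],
    "moPPIt": ["GKTEKTYTDCCD", "EEQNTCIQTTKA"],
}


def fraction(peptide, residues):
    """Fraction of positions in peptide that belong to the residue set."""
    return sum(aa in residues for aa in peptide) / len(peptide)


def group_mean(peptides, residues):
    """Mean fraction across all peptides in a group."""
    return sum(fraction(p, residues) for p in peptides) / len(peptides)


summary = {
    name: {
        "hydrophobic": group_mean(seqs, HYDROPHOBIC),
        "charged": group_mean(seqs, CHARGED),
    }
    for name, seqs in groups.items()
}

for name, stats in summary.items():
    print(f"{name}: hydrophobic={stats['hydrophobic']:.2f}, charged={stats['charged']:.2f}")
```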

Before advancing the studies of moPPIt peptides, the following next steps and evaluations are needed:

  1. SPR or ITC to measure actual binding affinity
  2. AlphaFold3 prediction or crystallography to confirm binding near the target residues 1–7
  3. Toxicity testing to confirm the peptides are non-toxic
  4. Serum degradation testing to assess peptide half-life
  5. Cell permeability check
  6. Testing in an ALS mouse model

new!! TKCVATKKLQED

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

Waiting to be accepted onto the platform.

Part C: Final Project: L-Protein Mutants

[Image: SOD1_A4V_WLYYPQAAKLKK]

I’ve also asked Claude* for help with validating the computations, and this was the correlation assessment:

STRONG CORRELATION between computational scores and experimental LLR values:

Computational Prediction  Experimental Validation  Agreement
K50L (Score: 2.56)        Bright yellow            Excellent
C29R (Score: 2.40)        Position 29 hotspot      Excellent
Y39L (Score: 2.24)        Bright at pos 39         Strong
N53L (Score: 1.87)        TM leucine pattern       Strong
S9Q (Score: 2.01)         Some positive signals    Good

Conclusion: The ESM2 language model predictions correlate well with experimental data, particularly for:

  • Identifying hotspot positions (29, 39, 50)
  • Predicting beneficial amino acid types (hydrophobic in TM, removing C29)
  • Overall mutation effects (positive vs negative)

This validates using computational approaches for rational protein design, though experimental validation remains essential.

Top 20 Mutations with Scores

Rank  Mutation  Original AA  New AA  Position  Score  Region
1     K50L      K            L       50        2.561  Transmembrane
2     C29R      C            R       29        2.395  Soluble
3     Y39L      Y            L       39        2.242  Soluble
4     C29S      C            S       29        2.043  Soluble
5     S9Q       S            Q       9         2.014  Soluble
6     C29Q      C            Q       29        1.997  Soluble
7     C29P      C            P       29        1.971  Soluble
8     C29L      C            L       29        1.961  Soluble
9     K50I      K            I       50        1.929  Transmembrane
10    N53L      N            L       53        1.865  Transmembrane
11    E61L      E            L       61        1.818  Transmembrane
12    T52L      T            L       52        1.814  Transmembrane
13    K50F      K            F       50        1.802  Transmembrane
14    C29T      C            T       29        1.797  Soluble
15    C29K      C            K       29        1.796  Soluble
16    F5Q       F            Q       5         1.795  Soluble
17    F5R       F            R       5         1.660  Soluble
18    C29A      C            A       29        1.649  Soluble
19    Y27R      Y            R       27        1.628  Soluble
20    F22R      F            R       22        1.602  Soluble

My final 5 mutations:

  • K50L (Transmembrane) 2.561
  • C29R (Soluble) 2.395
  • Y39L (Soluble) 2.242
  • N53L (Transmembrane) 1.865
  • S9Q (Soluble) 2.014
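One rule that happens to reproduce these five picks is "take the best-scoring mutation at each distinct position, then keep the first five". The sketch below applies that rule to the top-20 table and also counts how often each position appears. This is an illustration of one possible rule, not necessarily how the five were actually chosen, and the names are illustrative.

```python
import re
from collections import Counter

# (mutation, score) pairs copied from the top-20 table above.
top20 = [
    ("K50L", 2.561), ("C29R", 2.395), ("Y39L", 2.242), ("C29S", 2.043),
    ("S9Q", 2.014), ("C29Q", 1.997), ("C29P", 1.971), ("C29L", 1.961),
    ("K50I", 1.929), ("N53L", 1.865), ("E61L", 1.818), ("T52L", 1.814),
    ("K50F", 1.802), ("C29T", 1.797), ("C29K", 1.796), ("F5Q", 1.795),
    ("F5R", 1.660), ("C29A", 1.649), ("Y27R", 1.628), ("F22R", 1.602),
]


def parse_mutation(name):
    """Split a name such as 'K50L' into (original AA, position, new AA)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {name}")
    return match.group(1), int(match.group(2)), match.group(3)


# How many top-20 entries fall on each position.
position_counts = Counter(parse_mutation(name)[1] for name, _ in top20)

# Best-scoring mutation per position, visited in descending score order.
best_per_position = {}
for name, score in sorted(top20, key=lambda item: item[1], reverse=True):
    position = parse_mutation(name)[1]
    best_per_position.setdefault(position, name)

first_five = list(best_per_position.values())[:5]

print("most frequent positions:", position_counts.most_common(3))
print("first five distinct-position picks:", first_five)
```

Spreading the picks across positions avoids spending several of the five slots on position 29, which dominates the top 20.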

*Prompt used with Claude: “find correlation between the experimental data L-Protein Mutants - Sheet1.csv and protein_mutations_scores.csv”