Week 05 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

The original sequence of SOD1 is:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

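Mutate alanine 4 to valine (A4V). SOD1 residue numbering conventionally omits the initiator methionine, so A4 is the 5th residue of the sequence above: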

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
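
The substitution can also be applied programmatically, which avoids off-by-one mistakes. The sketch below assumes the conventional SOD1 numbering that does not count the initiator methionine, so position 4 maps to the fifth character of the sequence. The helper `apply_mutation` and its `offset` parameter are illustrative names, not part of any library.

```python
# Wild-type SOD1 sequence as given above (includes the initiator Met).
WT_SOD1 = "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"


def apply_mutation(seq: str, mutation: str, offset: int = 1) -> str:
    """Apply a point mutation such as 'A4V' to seq.

    offset=1 accounts for the initiator Met that the conventional
    numbering skips (assumption stated in the text above).
    """
    wt, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1 + offset  # 0-based index into the full sequence
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[idx]}")
    return seq[:idx] + new + seq[idx + 1:]


mutant = apply_mutation(WT_SOD1, "A4V")
print(mutant[:10])  # MATKVVCVLK
```

The wild-type check inside the helper fails loudly if the numbering offset is wrong, which is the main reason to wrap a one-character edit in a function.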

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:

Index  Binder        Pseudo-perplexity
0      HLYYAVALELKX  13.299815648347872
1      WRSYAVVLELWK  17.97100111129112
2      WRYYPVAAAWKK  11.081842724779028
3      WHYGAVGLRHKX  13.983770011694478

The pseudo-perplexity of the reference SOD1-binding peptide FLYRWLPSRRGG is 20.63523127283615, higher than that of all four generated peptides:

ppl_value = compute_pseudo_perplexity(model, tokenizer, protein_seq, "FLYRWLPSRRGG")
ppl_value  # Output: 20.63523127283615
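
For orientation, the pseudo-perplexity of a masked language model is commonly defined as the exponential of the mean negative log-probability the model assigns to each true residue when that position is masked. The sketch below assumes `compute_pseudo_perplexity` follows this standard definition (its internals are not shown here). The function `pseudo_perplexity` and its input list are hypothetical stand-ins for the model's per-position probabilities.

```python
import math


def pseudo_perplexity(true_token_probs):
    """exp(mean negative log-probability) over the masked positions.

    true_token_probs: probability the model gave to the correct residue
    at each position when that position was masked (hypothetical input).
    """
    if not true_token_probs:
        raise ValueError("need at least one position")
    nll = [-math.log(p) for p in true_token_probs]
    return math.exp(sum(nll) / len(nll))


# Sanity check: a uniform guess over 20 amino acids at all 12 positions.
uniform = [1 / 20] * 12
print(round(pseudo_perplexity(uniform), 6))  # 20.0
```

Under this definition, lower values mean the model finds the peptide more plausible given the target, and a uniform guess over the 20 amino acids would score 20.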

Part 2: Evaluate Binders with AlphaFold3

The ipTM scores for the reference peptide against wild-type and mutant SOD1 are both low (0.36 and 0.41, respectively), indicating that AlphaFold3 has little confidence in the predicted binding interface. The first three generated peptides have ipTM scores of 0.24, 0.25, and 0.32, all below the reference. The last generated peptide has an ipTM score of 0.43, slightly above the reference.

Only the third generated peptide (A4V-2, WRYYPVAAAWKK) is predicted to bind at the dimerization interface of SOD1.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Predicted properties of the generated peptides (the trailing X in binders 0 and 3 is omitted, leaving 11 residues):

Sequence      Property                Prediction     Value    Unit
HLYYAVALELK   Solubility              Soluble        1.000    Probability
HLYYAVALELK   Hemolysis               Non-hemolytic  0.081    Probability
HLYYAVALELK   Binding Affinity        Weak binding   6.052    pKd/pKi
HLYYAVALELK   Length                                 11       aa
HLYYAVALELK   Molecular Weight                       1319.5   Da
HLYYAVALELK   Net Charge (pH 7)                      -0.15
HLYYAVALELK   Isoelectric Point                      6.75     pH
HLYYAVALELK   Hydrophobicity (GRAVY)                 0.55     GRAVY
WRSYAVVLELWK  Solubility              Soluble        1.000    Probability
WRSYAVVLELWK  Hemolysis               Non-hemolytic  0.130    Probability
WRSYAVVLELWK  Binding Affinity        Weak binding   6.818    pKd/pKi
WRSYAVVLELWK  Length                                 12       aa
WRSYAVVLELWK  Molecular Weight                       1549.8   Da
WRSYAVVLELWK  Net Charge (pH 7)                      0.76
WRSYAVVLELWK  Isoelectric Point                      8.59     pH
WRSYAVVLELWK  Hydrophobicity (GRAVY)                 0.17     GRAVY
WRYYPVAAAWKK  Solubility              Soluble        1.000    Probability
WRYYPVAAAWKK  Hemolysis               Non-hemolytic  0.021    Probability
WRYYPVAAAWKK  Binding Affinity        Weak binding   6.124    pKd/pKi
WRYYPVAAAWKK  Length                                 12       aa
WRYYPVAAAWKK  Molecular Weight                       1538.8   Da
WRYYPVAAAWKK  Net Charge (pH 7)                      2.76
WRYYPVAAAWKK  Isoelectric Point                      10.00    pH
WRYYPVAAAWKK  Hydrophobicity (GRAVY)                 -0.72    GRAVY
WHYGAVGLRHK   Solubility              Soluble        1.000    Probability
WHYGAVGLRHK   Hemolysis               Non-hemolytic  0.023    Probability
WHYGAVGLRHK   Binding Affinity        Weak binding   5.442    pKd/pKi
WHYGAVGLRHK   Length                                 11       aa
WHYGAVGLRHK   Molecular Weight                       1323.5   Da
WHYGAVGLRHK   Net Charge (pH 7)                      1.93
WHYGAVGLRHK   Isoelectric Point                      9.99     pH
WHYGAVGLRHK   Hydrophobicity (GRAVY)                 -0.73    GRAVY

Predicted properties of the reference peptide:

Sequence      Property                Prediction     Value    Unit
FLYRWLPSRRGG  Solubility              Soluble        1.000    Probability
FLYRWLPSRRGG  Hemolysis               Non-hemolytic  0.047    Probability
FLYRWLPSRRGG  Binding Affinity        Weak binding   5.968    pKd/pKi
FLYRWLPSRRGG  Length                                 12       aa
FLYRWLPSRRGG  Molecular Weight                       1507.7   Da
FLYRWLPSRRGG  Net Charge (pH 7)                      2.76
FLYRWLPSRRGG  Isoelectric Point                      11.71    pH
FLYRWLPSRRGG  Hydrophobicity (GRAVY)                 -0.71    GRAVY

The peptide WHYGAVGLRHK has the highest ipTM score (0.43) but the lowest predicted binding affinity (5.442 pKd/pKi). The peptide WRSYAVVLELWK has a lower ipTM score (0.25) but the highest predicted binding affinity (6.818 pKd/pKi). All generated peptides are predicted to be soluble and non-hemolytic. On balance, I would pick WRSYAVVLELWK: it has the highest predicted binding affinity among the generated peptides and acceptable predicted solubility and hemolysis. Its ipTM score is low, however, so AlphaFold3 gives little structural support for the predicted binding pose.
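
The trade-off between structural confidence and predicted affinity is easier to see when the two metrics are ranked side by side. The sketch below uses the values reported above; it assumes the ipTM scores were listed in the same order as the generated peptides (the text confirms this for WRSYAVVLELWK and WHYGAVGLRHK only). The dictionary layout and the `rank_by` helper are illustrative.

```python
# ipTM from the AlphaFold3 runs, pKd and hemolysis from PeptiVerse.
peptides = {
    "HLYYAVALELK":  {"iptm": 0.24, "pkd": 6.052, "hemolysis": 0.081},
    "WRSYAVVLELWK": {"iptm": 0.25, "pkd": 6.818, "hemolysis": 0.130},
    "WRYYPVAAAWKK": {"iptm": 0.32, "pkd": 6.124, "hemolysis": 0.021},
    "WHYGAVGLRHK":  {"iptm": 0.43, "pkd": 5.442, "hemolysis": 0.023},
}


def rank_by(metric, higher_is_better=True):
    """Return peptide names sorted from best to worst on one metric."""
    return sorted(peptides, key=lambda p: peptides[p][metric],
                  reverse=higher_is_better)


by_iptm = rank_by("iptm")
by_pkd = rank_by("pkd")

for name in peptides:
    print(f"{name:<13} ipTM rank {by_iptm.index(name) + 1}, "
          f"affinity rank {by_pkd.index(name) + 1}")
```

No peptide ranks first on both metrics, so any choice among them reflects how much weight is given to each predictor.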

Part 4: Generate Optimized Peptides with moPPIt

Generated peptide sequence with predicted solubility score, affinity score, and hemolysis score:

['DFRQSTTYQY']
[0.9166666865348816, 6.323781490325928, 0.7198045253753662]

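The moPPIt-generated peptide DFRQSTTYQY has a predicted affinity score of 6.32, which is higher than three of the four PepMLM-generated peptides but below WRSYAVVLELWK (6.818). Its solubility score of 0.92 is slightly below the 1.000 predicted for the PepMLM peptides. Both comparisons assume that the moPPIt scores are on the same scale as the PeptiVerse predictions. Before advancing this peptide to clinical studies, I would measure its binding affinity experimentally in vitro and then assess its stability, toxicity, and pharmacokinetic properties in cell and animal models.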

Part C: L-Protein Mutants

I first used Boltz-2 to predict the complex structure of the wild-type L-protein and DnaJ protein:

Next, I used FoldX, a force-field-based protein design tool that can predict the effects of mutations on protein-protein interfaces. The goal was to identify L-protein mutations that are predicted to stabilize the interaction with DnaJ.

To do this, I first relaxed the side chains in the predicted structure using the following command:

foldx --command=RepairPDB --pdb=result.pdb

The relaxation slightly adjusts side-chain conformations to minimize steric clashes and optimize interactions. The relaxed structure is shown below in blue; green and cyan are the original structure predicted by Boltz-2:

Next, I scanned the L-protein, mutating each residue to all 20 amino acids and computing the change in binding energy (ΔΔG) for each mutation with the following commands:

# Soluble
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=MA1a,EA2a,TA3a,RA4a,FA5a,PA6a,QA7a,QA8a,SA9a,QA10a,QA11a,TA12a,PA13a,AA14a,SA15a,TA16a,NA17a,RA18a,RA19a,RA20a,PA21a,FA22a,KA23a,HA24a,EA25a,DA26a,YA27a,PA28a,CA29a,RA30a,RA31a,QA32a,QA33a,RA34a,SA35a,SA36a,TA37a,LA38a,YA39a,VA40a

# Transmembrane
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=LA41a,IA42a,FA43a,LA44a,AA45a,IA46a,FA47a,LA48a,SA49a,KA50a,FA51a,TA52a,NA53a,QA54a,LA55a,LA56a,LA57a,SA58a,LA59a,LA60a,EA61a,AA62a,VA63a,IA64a,RA65a,TA66a,VA67a,TA68a,TA69a,LA70a,QA71a,QA72a,LA73a,LA74a,TA75a
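
The long `--positions` arguments are tedious to type by hand and can be generated from the sequence instead. The sketch below rebuilds the L-protein sequence from the residue letters in the two commands above and reproduces the same position format (wild-type residue, chain, residue number, then the `a` suffix used in those commands). The helper name `pssm_positions` is illustrative.

```python
# L-protein sequence reconstructed from the residue letters in the
# two --positions lists above (residues 1-75, chain A).
L_PROTEIN = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"   # 1-40, soluble
             "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")       # 41-75, TM


def pssm_positions(seq, start, end, chain="A"):
    """Build a comma-separated positions string for residues start..end
    (1-based, inclusive), e.g. 'MA1a,EA2a,...'."""
    return ",".join(f"{seq[i - 1]}{chain}{i}a" for i in range(start, end + 1))


soluble = pssm_positions(L_PROTEIN, 1, 40)
transmembrane = pssm_positions(L_PROTEIN, 41, 75)

print(soluble.split(",")[:3])         # ['MA1a', 'EA2a', 'TA3a']
print(transmembrane.split(",")[-2:])  # ['LA74a', 'TA75a']
```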

The results for the soluble region are shown below. Green indicates mutations predicted to stabilize the interaction (negative ΔΔG), and red indicates mutations predicted to destabilize it (positive ΔΔG):

The results for the transmembrane (TM) region are shown below:

Based on the results above, I propose the following multi-site mutations in the soluble region (sum = summed single-mutation ΔΔG, in kcal/mol as reported by FoldX):

  • DA26L + NA17W + CA29W: sum = −9.23
  • DA26L + EA25P + QA8T + FA22H: sum = −9.03
  • DA26L + NA17W + RA4E + SA9F: sum = −9.95
  • DA26L + EA2D + FA5M + RA20N: sum = −8.69
  • DA26L + HA24P + PA28K + RA34M: sum = −7.63

My rationale is that combining individually stabilizing mutations will have an approximately additive effect on the overall binding affinity. However, this assumption ignores potential epistatic (non-additive) interactions between mutations.
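
The additive estimate is simple to script, and the mutation codes can be checked against the sequence first. This is a minimal sketch: the L-protein sequence is reconstructed from the Pssm position lists above, and `parse_mutation`, `matches_wild_type`, and `additive_ddg` are hypothetical helper names. The per-mutation ΔΔG values would come from the FoldX scan output and are passed in as a plain dictionary.

```python
import re

# L-protein sequence reconstructed from the Pssm position lists above.
L_PROTEIN = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"
             "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")

# Code format used above: wild-type residue, chain, position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])([A-Z])(\d+)([A-Z])$")


def parse_mutation(code):
    """Split a code such as 'DA26L' into (wt, chain, position, new)."""
    match = MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code}")
    wt, chain, pos, new = match.groups()
    return wt, chain, int(pos), new


def matches_wild_type(code, seq=L_PROTEIN):
    """True if the wild-type residue in the code agrees with the sequence."""
    wt, _, pos, _ = parse_mutation(code)
    return 1 <= pos <= len(seq) and seq[pos - 1] == wt


def additive_ddg(codes, single_ddg):
    """Sum single-mutation ddG values; assumes no epistasis."""
    return sum(single_ddg[code] for code in codes)


combo = ["DA26L", "NA17W", "CA29W"]
print(all(matches_wild_type(code) for code in combo))  # True
```

Because the sum treats each site independently, a follow-up FoldX run on the full multi-site mutant would be one way to check how far the combined prediction departs from the additive estimate.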