Week 05 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

The original sequence of SOD1 is:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

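Mutate alanine 4 to valine (A4V). SOD1 residue numbering conventionally omits the initiator methionine, so A4 is the fifth residue of the sequence above: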

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
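
The substitution can be checked programmatically. The following is a minimal sketch, assuming the conventional SOD1 numbering that omits the initiator methionine (hence `offset=1`); the helper name `apply_point_mutation` is illustrative and not part of any library:

```python
WT_SOD1 = "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"


def apply_point_mutation(seq: str, wt: str, pos: int, mut: str, offset: int = 0) -> str:
    """Return seq with residue `pos` (1-based, shifted by `offset`) changed from wt to mut."""
    idx = pos - 1 + offset  # convert to a 0-based string index
    if seq[idx] != wt:
        # Guard against numbering mistakes such as forgetting the Met offset
        raise ValueError(f"expected {wt} at position {pos}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


# offset=1 skips the initiator methionine, which SOD1 numbering omits
A4V_SOD1 = apply_point_mutation(WT_SOD1, "A", 4, "V", offset=1)
print(A4V_SOD1[:10])
```

Calling the helper without the offset raises a `ValueError`, because the fourth character of the full sequence is K rather than A.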

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:

index	Binder	Pseudo Perplexity
0	HLYYAVALELKX	13.299815648347872
1	WRSYAVVLELWK	17.97100111129112
2	WRYYPVAAAWKK	11.081842724779028
3	WHYGAVGLRHKX	13.983770011694478

The pseudo-perplexity of the reference SOD1-binding peptide FLYRWLPSRRGG is about 20.64, higher than that of all four generated peptides:

ppl_value = compute_pseudo_perplexity(model, tokenizer, protein_seq, "FLYRWLPSRRGG")
ppl_value  # Output: 20.63523127283615
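
For context, the pseudo-perplexity of a masked language model is usually defined as the exponential of the mean negative log-likelihood of each token when it is masked in turn. The sketch below shows only that final arithmetic on a list of per-residue log-probabilities. It is an assumption about what `compute_pseudo_perplexity` does internally, and the helper name and inputs are hypothetical:

```python
import math
from typing import Sequence


def pseudo_perplexity_from_log_probs(log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability over the masked positions."""
    if not log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical input: a model that assigns 1/20 to the true residue at
# each of 12 positions, i.e. a uniform guess over the 20 amino acids.
uniform_log_probs = [math.log(1 / 20)] * 12
ppl_uniform = pseudo_perplexity_from_log_probs(uniform_log_probs)
print(round(ppl_uniform, 3))
```

Under this definition, a uniform guess over the 20 amino acids scores about 20, and lower values mean the model finds the peptide more plausible given the target sequence.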

Part 2: Evaluate Binders with AlphaFold3

The ipTM scores for the reference peptide against wild-type and mutant SOD1 are both low (0.36 and 0.41, respectively), indicating that AlphaFold3 has little confidence in the predicted binding interface. The first three generated peptides score 0.24, 0.25, and 0.32, all below the reference. The last generated peptide scores 0.43, slightly above the reference.

Only the third generated peptide (A4V-2, WRYYPVAAAWKK) is predicted to bind at the dimerization interface of SOD1.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Predicted properties of the generated peptides (the trailing X on binders 0 and 3 was dropped, leaving 11-residue sequences):

Sequence	Property	Prediction	Value	Unit
HLYYAVALELK	💧 Solubility	Soluble	1.000	Probability
HLYYAVALELK	🩸 Hemolysis	Non-hemolytic	0.081	Probability
HLYYAVALELK	🔗 Binding Affinity	Weak binding	6.052	pKd/pKi
HLYYAVALELK	📏 Length		11	aa
HLYYAVALELK	⚖️ Molecular Weight		1319.5	Da
HLYYAVALELK	⚡ Net Charge (pH 7)		-0.15
HLYYAVALELK	🎯 Isoelectric Point		6.75	pH
HLYYAVALELK	💦 Hydrophobicity (GRAVY)		0.55	GRAVY
WRSYAVVLELWK	💧 Solubility	Soluble	1.000	Probability
WRSYAVVLELWK	🩸 Hemolysis	Non-hemolytic	0.130	Probability
WRSYAVVLELWK	🔗 Binding Affinity	Weak binding	6.818	pKd/pKi
WRSYAVVLELWK	📏 Length		12	aa
WRSYAVVLELWK	⚖️ Molecular Weight		1549.8	Da
WRSYAVVLELWK	⚡ Net Charge (pH 7)		0.76
WRSYAVVLELWK	🎯 Isoelectric Point		8.59	pH
WRSYAVVLELWK	💦 Hydrophobicity (GRAVY)		0.17	GRAVY
WRYYPVAAAWKK	💧 Solubility	Soluble	1.000	Probability
WRYYPVAAAWKK	🩸 Hemolysis	Non-hemolytic	0.021	Probability
WRYYPVAAAWKK	🔗 Binding Affinity	Weak binding	6.124	pKd/pKi
WRYYPVAAAWKK	📏 Length		12	aa
WRYYPVAAAWKK	⚖️ Molecular Weight		1538.8	Da
WRYYPVAAAWKK	⚡ Net Charge (pH 7)		2.76
WRYYPVAAAWKK	🎯 Isoelectric Point		10.00	pH
WRYYPVAAAWKK	💦 Hydrophobicity (GRAVY)		-0.72	GRAVY
WHYGAVGLRHK	💧 Solubility	Soluble	1.000	Probability
WHYGAVGLRHK	🩸 Hemolysis	Non-hemolytic	0.023	Probability
WHYGAVGLRHK	🔗 Binding Affinity	Weak binding	5.442	pKd/pKi
WHYGAVGLRHK	📏 Length		11	aa
WHYGAVGLRHK	⚖️ Molecular Weight		1323.5	Da
WHYGAVGLRHK	⚡ Net Charge (pH 7)		1.93
WHYGAVGLRHK	🎯 Isoelectric Point		9.99	pH
WHYGAVGLRHK	💦 Hydrophobicity (GRAVY)		-0.73	GRAVY

Predicted properties of the reference peptide:

Sequence	Property	Prediction	Value	Unit
FLYRWLPSRRGG	💧 Solubility	Soluble	1.000	Probability
FLYRWLPSRRGG	🩸 Hemolysis	Non-hemolytic	0.047	Probability
FLYRWLPSRRGG	🔗 Binding Affinity	Weak binding	5.968	pKd/pKi
FLYRWLPSRRGG	📏 Length		12	aa
FLYRWLPSRRGG	⚖️ Molecular Weight		1507.7	Da
FLYRWLPSRRGG	⚡ Net Charge (pH 7)		2.76
FLYRWLPSRRGG	🎯 Isoelectric Point		11.71	pH
FLYRWLPSRRGG	💦 Hydrophobicity (GRAVY)		-0.71	GRAVY

The peptide WHYGAVGLRHK has the highest ipTM score (0.43) but the lowest predicted binding affinity (5.442 pKd/pKi). The peptide WRSYAVVLELWK has a lower ipTM score (0.25) but the highest predicted binding affinity (6.818 pKd/pKi). None of the generated peptides is predicted to be hemolytic or poorly soluble. On these predictions, WRSYAVVLELWK best balances binding and therapeutic properties: it has the highest predicted binding affinity among the generated peptides and is predicted to be soluble and non-hemolytic, although its low ipTM score means the predicted binding pose is uncertain.
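
To put the affinity predictions on a more familiar scale, a pKd value can be converted to a dissociation constant with Kd = 10^(-pKd) mol/L. The following is a minimal sketch, assuming the PeptiVerse values are base-10 pKd (or pKi) in molar units; the function name is illustrative:

```python
def pkd_to_kd_nM(pkd: float) -> float:
    """Convert pKd = -log10(Kd in mol/L) to Kd in nanomolar."""
    return 10 ** (-pkd) * 1e9


# Predicted binding affinities copied from the tables above
predicted_pkd = {
    "HLYYAVALELK": 6.052,
    "WRSYAVVLELWK": 6.818,
    "WRYYPVAAAWKK": 6.124,
    "WHYGAVGLRHK": 5.442,
}

kd_nM = {seq: pkd_to_kd_nM(pkd) for seq, pkd in predicted_pkd.items()}
for seq, kd in sorted(kd_nM.items(), key=lambda item: item[1]):
    print(f"{seq}\t{kd:.0f} nM")
```

Under that assumption, 6.818 corresponds to roughly 0.15 µM and 5.442 to roughly 3.6 µM, about a 24-fold difference between the strongest and weakest predicted binders.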

Part 4: Generate Optimized Peptides with moPPIt

Generated peptide sequence with predicted solubility score, affinity score, and hemolysis score:

['DFRQSTTYQY']
[0.9166666865348816, 6.323781490325928, 0.7198045253753662]

The moPPIt-generated peptide DFRQSTTYQY has a predicted affinity score (6.32) above three of the four PepMLM-generated peptides but below WRSYAVVLELWK (6.818), and a solubility score (0.92) slightly below theirs (1.000), assuming the moPPIt scores are on the same scale as the PeptiVerse predictions. Before advancing this peptide to clinical studies, I would measure its binding affinity experimentally in vitro, and further assess its stability, toxicity, and pharmacokinetic properties in cell and animal models.

Part C: L-Protein Mutants

I first used Boltz-2 to predict the complex structure of the wild-type L-protein and DnaJ protein:

Next, I used FoldX, a force-field-based protein design tool that can predict the effects of mutations on protein-protein interfaces. The goal was to identify L-protein mutations that are predicted to stabilize the interaction with DnaJ.

To do this, I first relaxed the side chains of the predicted structure using the following command:

foldx --command=RepairPDB --pdb=result.pdb

The relaxation slightly adjusts side-chain conformations to minimize steric clashes and optimize interactions. The relaxed structure is shown below in blue; green and cyan show the original structure predicted by Boltz-2:

Next, I scanned the L-protein, mutating each residue to all 20 amino acids and computing the change in binding energy (ΔΔG) for each mutation with the following commands:

# Soluble
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=MA1a,EA2a,TA3a,RA4a,FA5a,PA6a,QA7a,QA8a,SA9a,QA10a,QA11a,TA12a,PA13a,AA14a,SA15a,TA16a,NA17a,RA18a,RA19a,RA20a,PA21a,FA22a,KA23a,HA24a,EA25a,DA26a,YA27a,PA28a,CA29a,RA30a,RA31a,QA32a,QA33a,RA34a,SA35a,SA36a,TA37a,LA38a,YA39a,VA40a

# Transmembrane
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=LA41a,IA42a,FA43a,LA44a,AA45a,IA46a,FA47a,LA48a,SA49a,KA50a,FA51a,TA52a,NA53a,QA54a,LA55a,LA56a,LA57a,SA58a,LA59a,LA60a,EA61a,AA62a,VA63a,IA64a,RA65a,TA66a,VA67a,TA68a,TA69a,LA70a,QA71a,QA72a,LA73a,LA74a,TA75a
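
Typing long `--positions` lists by hand is error-prone, so they can be generated from the sequence instead. The sketch below is one way to do that, assuming the L-protein is chain A and using the sequence read off the position lists in the two commands above; the function name `foldx_positions` is hypothetical:

```python
# L-protein sequence as implied by the position lists in the commands above
L_PROTEIN_SOLUBLE = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"  # residues 1-40
L_PROTEIN_TM = "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"  # residues 41-75


def foldx_positions(segment: str, chain: str, start: int) -> str:
    """Build '<residue><chain><number>a' tokens, comma-separated, for a segment."""
    return ",".join(f"{aa}{chain}{start + i}a" for i, aa in enumerate(segment))


soluble_positions = foldx_positions(L_PROTEIN_SOLUBLE, "A", 1)
tm_positions = foldx_positions(L_PROTEIN_TM, "A", 41)

print(f"--positions={soluble_positions}")
print(f"--positions={tm_positions}")
```

The two printed strings can be pasted into the Pssm commands in place of the hand-written lists.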

The results for mutations in the soluble region are shown below. Green indicates mutations predicted to stabilize the interaction (negative ΔΔG), and red indicates mutations predicted to destabilize it (positive ΔΔG):

The results for mutations in the transmembrane (TM) region are shown below:

Based on the results above, I would propose the following multi-site mutations in the soluble region:

DA26L + NA17W + CA29W: summed ΔΔG = −9.23 kcal/mol
DA26L + EA25P + QA8T + FA22H: summed ΔΔG = −9.03 kcal/mol
DA26L + NA17W + RA4E + SA9F: summed ΔΔG = −9.95 kcal/mol
DA26L + EA2D + FA5M + RA20N: summed ΔΔG = −8.69 kcal/mol
DA26L + HA24P + PA28K + RA34M: summed ΔΔG = −7.63 kcal/mol

My rationale is that stabilizing single mutations combine additively, so the predicted ΔΔG of a multi-site mutant is the sum of its single-site values. However, this assumption ignores potential epistatic (non-additive) interactions between mutations.
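
The additive scoring can be sketched as follows. This is only an illustration of the bookkeeping: the helper names are hypothetical, and the ΔΔG values in `toy_ddg` are placeholders, not the values from the scan above. The mutation codes follow the pattern used in the list above (wild-type residue, chain, position, mutant residue).

```python
import re
from typing import Dict, List, NamedTuple


class Mutation(NamedTuple):
    wild_type: str
    chain: str
    position: int
    mutant: str


_MUTATION_RE = re.compile(r"^([A-Z])([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> Mutation:
    """Split a code such as 'DA26L' into (wild type, chain, position, mutant)."""
    match = _MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code}")
    wt, chain, pos, mut = match.groups()
    return Mutation(wt, chain, int(pos), mut)


def additive_ddg(codes: List[str], single_ddg: Dict[str, float]) -> float:
    """Sum single-site ddG values, assuming no epistasis between sites."""
    sites = [(m.chain, m.position) for m in map(parse_mutation, codes)]
    if len(set(sites)) != len(sites):
        raise ValueError("a combination cannot mutate the same site twice")
    return sum(single_ddg[code] for code in codes)


# Placeholder values for illustration only; NOT the ddG values from the scan.
toy_ddg = {"DA26L": -1.0, "NA17W": -2.0, "CA29W": -0.5}
toy_total = additive_ddg(["DA26L", "NA17W", "CA29W"], toy_ddg)
print(toy_total)
```

One way to test the additivity assumption would be to model each multi-site mutant directly and compare its predicted ΔΔG with the additive sum; a large gap between the two would point to epistasis.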