Week 5 HW: Protein design part II
Homework
Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
1- Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
(SOD1)
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
(SOD1 A4V)
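The substitution can be checked with a short script. This is a minimal sketch: `apply_mutation` and its `offset` parameter are illustrative names, and the offset of 1 assumes that the A4V numbering does not count the initiator methionine, which is consistent with the two sequences above (the changed residue is the fifth character).

```python
def apply_mutation(sequence: str, mutation: str, offset: int = 0) -> str:
    """Apply a point mutation such as 'A4V' to a protein sequence.

    offset is the number of leading residues that the mutation numbering
    skips (assumed to be 1 here: the initiator Met is not counted).
    """
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1 + offset  # convert to a 0-based string index
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at index {index}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# First residues of the SOD1 sequence quoted above
sod1_start = "MATKAVCVLKGDGPVQ"
sod1_a4v_start = apply_mutation(sod1_start, "A4V", offset=1)
print(sod1_a4v_start)
```

The wild-type check raises an error if the numbering convention is wrong, which catches an off-by-one mistake before the sequence is submitted anywhere.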
2- Use the PepMLM Colab linked from the HuggingFace PepMLM-650M model card.
3- Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:
0 WHYGVTGLELKE, 21.127040
1 WRVYAVALEWKX, 12.035041
2 KRYYAVAVAHK, 18.943191
3 WRYPVAAAEHGX, 8.500123
Control: FLYRWLPSRRGG (every X was replaced with A in all the peptides)
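The X-to-A replacement and a length check can be done in a few lines before submitting the peptides. This is only a sanity-check sketch; the variable names are illustrative, and the sequences are copied from the list above.

```python
# Peptides as listed above, before replacing the unknown residue X
raw_peptides = ["WHYGVTGLELKE", "WRVYAVALEWKX", "KRYYAVAVAHK", "WRYPVAAAEHGX"]

# Replace every X with alanine, as done for the AlphaFold submissions
cleaned = [peptide.replace("X", "A") for peptide in raw_peptides]
lengths = [len(peptide) for peptide in cleaned]

for index, (peptide, length) in enumerate(zip(cleaned, lengths)):
    note = "" if length == 12 else "  <- not 12 residues as listed"
    print(f"P{index}: {peptide} ({length} aa){note}")
```

Reporting the length next to each sequence makes it easy to spot an entry that lost a residue when it was copied.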
Part 2: Evaluate Binders with AlphaFold3
1- Navigate to the AlphaFold Server: alphafoldserver.com.
2- For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
3- Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
Results:
P0: ipTM = 0.22, pTM = 0.8. It appears to bind at the dimer interface, surface-bound.
P1: ipTM = 0.2, pTM = 0.83. Binds between the N-terminus and the β-barrel, partially buried. (The ipTM was 0.3 in another run that I made accidentally.)
P2: ipTM = 0.48, pTM = 0.89. Binds between the β-barrel and the dimer interface, partially buried?
P3: ipTM = 0.37, pTM = 0.83. Dimer interface, partially buried?
Control: ipTM = 0.38, pTM = 0.83. Dimer interface, partially buried.
4- In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.
The ipTM values were generally low, but P2 (ipTM = 0.48) stands out among the generated peptides. It is also the only one that exceeds the known binder (control, ipTM = 0.38).
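The comparison against the control can be made explicit with a small ranking script. This is a sketch using the ipTM values recorded above; the dictionary and variable names are illustrative.

```python
# ipTM values recorded from the AlphaFold Server runs above
iptm = {"P0": 0.22, "P1": 0.20, "P2": 0.48, "P3": 0.37}
control_iptm = 0.38  # known binder FLYRWLPSRRGG

# Sort from highest to lowest ipTM
ranked = sorted(iptm.items(), key=lambda item: item[1], reverse=True)

# Peptides that match or exceed the control
beats_control = [name for name, value in ranked if value >= control_iptm]

for name, value in ranked:
    print(f"{name}: ipTM = {value:.2f} (difference to control: {value - control_iptm:+.2f})")
print("Match or exceed the control:", beats_control)
```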
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
1- Paste the peptide sequence. 2- Paste the A4V mutant SOD1 sequence in the target field. 3- Check the boxes
(P0 as an example)
💧 Solubility: Soluble (1.000)
🔬 Permeability (Penetrance): Non-permeable (0.043)
🩸 Hemolysis: Non-hemolytic (0.067)
👯 Non-Fouling: Fouling (0.360)
⏱️ Half-Life: 1.167 h
🔗 Binding Affinity: Weak binding (5.364)
⚖️ Molecular Weight: 1431.6
⚡ Net Charge (pH 7): -1.14
🎯 Isoelectric Point: 5.55
💦 Hydrophobicity (GRAVY): -0.50
Predicted binding affinity (pKd/pKi):
P0: 5.364
P1: 6.013
P2: 5.406
P3: 5.446
Control: 5.555
Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
Not necessarily. P2 had the highest ipTM but not the strongest predicted affinity, which belongs to P1. There is no clear correlation between the AlphaFold 3 and PeptiVerse predictions. None of the peptides is predicted to be highly hemolytic or poorly soluble. I am going to continue with P1, because it has the best predicted binding, its other properties seem acceptable, and its ipTM in AlphaFold 3 was not too bad.
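The relationship between ipTM and predicted affinity can be quantified with a Pearson correlation coefficient. This is a minimal sketch on the five data points recorded above (four peptides plus the control); `pearson` is a helper written for this example, and with so few points the coefficient should be read only as a rough indication.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Order: P0, P1, P2, P3, control
iptm_values = [0.22, 0.20, 0.48, 0.37, 0.38]
affinity_values = [5.364, 6.013, 5.406, 5.446, 5.555]  # PeptiVerse pKd/pKi

r = pearson(iptm_values, affinity_values)
print(f"Pearson r between ipTM and predicted affinity: {r:.2f}")
```

A coefficient near +1 would mean that higher ipTM goes with stronger predicted affinity; a value near zero or below supports the conclusion that the two predictors do not agree on this set.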
Part 4: Generate Optimized Peptides with moPPIt
After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?
1: DPTVLLLDFNWL a: 5.3629 (AF: ipTM = 0.48 pTM = 0.86)
2: MLAGIFGATPGT a: 5.3119 (ipTM = 0.37 pTM = 0.87)
3: WTPEEQRQWREL a: 7.0440 (ipTM = 0.69 pTM = 0.88)
4: TWEEWMKIYYEA a: 7.6667 (ipTM = 0.67 pTM = 0.89)
5: CPSAAWVCGPIW a: 7.5019 (ipTM = 0.18 pTM = 0.69)
ipTM values were mostly low, although peptides 3 and 4 reached 0.69 and 0.67. Some peptides bound far away from the selected region (residues 1-10). Peptides 1, 4 and 5 bound closest to the N-terminus, all of them surface-bound.
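One simple way to describe how the two peptide sets differ is to compare their hydrophobicity. The sketch below computes the GRAVY score (mean Kyte-Doolittle hydropathy) for each peptide; the PepMLM sequences use A in place of X, and the function and variable names are illustrative. It is not the PeptiVerse implementation, only a standard formula that can be compared with the GRAVY value reported for P0.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


pepmlm = ["WHYGVTGLELKE", "WRVYAVALEWKA", "KRYYAVAVAHK", "WRYPVAAAEHGA"]
moppit = ["DPTVLLLDFNWL", "MLAGIFGATPGT", "WTPEEQRQWREL",
          "TWEEWMKIYYEA", "CPSAAWVCGPIW"]

for label, peptides in (("PepMLM", pepmlm), ("moPPIt", moppit)):
    for peptide in peptides:
        print(f"{label:7s} {peptide:13s} GRAVY = {gravy(peptide):+.2f}")
```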
Part C: Final Project: L-Protein Mutants
I am going with Option 2: mutagenesis using AF2-Multimer. The aim is to optimize the structure of the N-terminal domain while selecting for weaker binding to DnaJ.
Useful data:
L protein sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
1-30: METRFPQQSQQTPASTNRRRPFKHEDYPCR
31-75: RQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
Dnaj sequence: MAKQDYYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDSQKRAAYDQYGHAAFEQGGMGGGGFGGGADFSDIFGDVFGDIFGGGRGRQRAARGADLRYNMELTLEEAVRGVTKEIRIPTLEECDVCHGSGAKPGTQPQTCPTCHGSGQVQMRQGFFAVQQTCPHCQGRGTLIKDPCNKCHGHGRVERSKTLSVKIPAGVDTGDRIRLAGEGEAGEHGAPAGDLYVQVQVKQHPIFEREGNNLYCEVPINFAMAALGGEIEVPTLDGRVKLKVPGETQTGKLFRMRGKGVKSVRGGAQGDLLCRVVVETPVGLNERQKQLLQELQESFGGPTGEHNSPRSKSFFDGVKKFFDDLTR
Index  Position  Wild_Type_AA  Mutation_AA  LLR_Score
989    50        K             L            2.561464
574    29        C             R            2.395427
769    39        Y             L            2.241778
575    29        C             S            2.043150
173    9         S             Q            2.014323
573    29        C             Q            1.997049
572    29        C             P            1.971028
569    29        C             L            1.960646
987    50        K             I            1.928798
1049   53        N             L            1.864930
1209   61        E             L            1.818096
1029   52        T             L            1.813965
984    50        K             F            1.802066
576    29        C             T            1.797247
568    29        C             K            1.795877
93     5         F             Q            1.795244
94     5         F             R            1.659716
560    29        C             A            1.648655
534    27        Y             R            1.628060
434    22        F             R            1.602028
92     5         F             P            1.596889
997    50        K             V            1.594573
995    50        K             S            1.574555
96     5         F             T            1.559023
95     5         F             S            1.556416
889    45        A             L            1.539248
775    39        Y             S            1.517457
535    27        Y             S            1.497052
789    40        V             L            1.477630
529    27        Y             L            1.474638
435    22        F             S            1.423357
563    29        C             E            1.383282
760    39        Y             A            1.364997
571    29        C             N            1.362601
980    50        K             A            1.357792
567    29        C             I            1.344121
89     5         F             L            1.332615
334    17        N             R            1.323652
767    39        Y             I            1.320101
776    39        Y             T            1.302803
514    26        D             R            1.268762
566    29        C             H            1.246106
764    39        Y             F            1.245850
777    39        Y             V            1.244389
454    23        K             R            1.236555
494    25        E             R            1.229349
474    24        H             R            1.227778
996    50        K             T            1.222128
533    27        Y             Q            1.218850
536    27        Y             T            1.215567
Index  Amino Acid  Position  Score
0      L           50        2.561468
1      L           39        2.241780
2      I           50        1.928801
3      L           53        1.864932
4      L           52        1.813968
5      F           50        1.802069
6      V           50        1.594576
7      S           50        1.574557
8      L           45        1.539248
9      S           39        1.517457
10     L           40        1.477630
11     A           39        1.364999
12     A           50        1.357795
13     I           39        1.320103
14     T           39        1.302804
15     F           39        1.245851
16     V           39        1.244390
17     T           50        1.222131
18     L           54        1.120860
19     R           39        1.064191
Index  Position  Wild_Type_AA  Mutation_AA  LLR_Score
989    50        K             L            2.561468
574    29        C             R            2.395427
769    39        Y             L            2.241780
575    29        C             S            2.043150
173    9         S             Q            2.014325
573    29        C             Q            1.997049
572    29        C             P            1.971029
569    29        C             L            1.960646
987    50        K             I            1.928801
1049   53        N             L            1.864932
996    50        K             T            1.222131
533    27        Y             Q            1.218851
536    27        Y             T            1.215567
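For reference, the sketch below shows how a score of this kind is usually defined. It assumes that LLR_Score is the standard log-likelihood ratio, log p(mutant) minus log p(wild type), taken from the language model's predicted amino acid distribution at the mutated position. The probabilities in the example are hypothetical and are not ESM2 output.

```python
import math


def log_likelihood_ratio(position_probs: dict, wild_type: str, mutant: str) -> float:
    """log p(mutant) - log p(wild type) at one sequence position.

    position_probs maps amino acids to the model's predicted probabilities
    at that position. Positive values mean the model prefers the mutant.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Hypothetical probabilities for one position (not real model output)
example_probs = {"K": 0.05, "L": 0.40, "I": 0.20}

llr_k_to_l = log_likelihood_ratio(example_probs, wild_type="K", mutant="L")
print(f"LLR for K -> L: {llr_k_to_l:.3f}")
```

Under this definition, every row in the tables above has a positive score, meaning the model rates the substitution as more likely than the wild-type residue.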
8 copies: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
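Rather than pasting the sequence eight times by hand, the multi-chain query can be built programmatically. This sketch assumes the string is meant as a ColabFold query, where chains are separated by a colon; the variable names are illustrative.

```python
# L protein sequence as listed under "Useful data"
l_protein = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCR"                    # residues 1-30
    "RQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"     # residues 31-75
)
n_copies = 8

# One chain per copy, joined with ':' as the chain separator
query = ":".join([l_protein] * n_copies)

print(f"{n_copies} chains of {len(l_protein)} residues each")
print(query)
```

Changing `n_copies` gives the query for a different oligomeric state without risking a copy-and-paste error in one of the chains.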
I selected these mutations based on the scores predicted by the models, keeping only those that fall in the N-terminal region used for modeling (residues 1-30): C29R, C29S, S9Q, C29Q, C29P, C29L, C29T, C29K, F5Q, F5R, C29A, Y27R, F22R, F5P, F5T, F5S, Y27S, Y27L, F22S, C29E, C29N, C29I, F5L, N17R, D26R, C29H, K23R, E25R, H24R, Y27Q, Y27T
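This selection amounts to filtering the LLR table by position. The sketch below applies that filter to the ten highest-scoring rows of the table; the tuple layout and the cutoff name `n_terminal_end` are illustrative choices.

```python
# (position, wild-type residue, mutant residue, LLR score): top ten table rows
llr_rows = [
    (50, "K", "L", 2.561468), (29, "C", "R", 2.395427),
    (39, "Y", "L", 2.241780), (29, "C", "S", 2.043150),
    (9, "S", "Q", 2.014325), (29, "C", "Q", 1.997049),
    (29, "C", "P", 1.971029), (29, "C", "L", 1.960646),
    (50, "K", "I", 1.928801), (53, "N", "L", 1.864932),
]
n_terminal_end = 30  # last residue of the modeled N-terminal region

# Keep only mutations inside residues 1-30, preserving the score ranking
selected = [
    f"{wild_type}{position}{mutant}"
    for position, wild_type, mutant, _score in llr_rows
    if position <= n_terminal_end
]
print(", ".join(selected))
```

Run on these ten rows, the filter keeps six mutations, in the same order as the start of the list above.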
I wrote a script with ChatGPT that runs this pipeline: mutations (using only residues 1-30 of the L protein) → ESM2 LLR → select the top stable candidates → ColabFold on the peptide alone → ColabFold on peptide + chaperone → final score based on:
- high mono_pLDDT
- high mono_pTM
- low complex_ipTM
- few peptide-chaperone contacts
- high final_score
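The criteria above can be combined into a single number in many ways. The sketch below is one hypothetical combination, not the script that was actually used: the equal weighting, the `max_contacts` normalization and the two example candidates (with made-up values) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mono_plddt: float    # mean pLDDT of the peptide alone, 0-100
    mono_ptm: float      # pTM of the peptide alone, 0-1
    complex_iptm: float  # ipTM of peptide + chaperone, 0-1
    contacts: int        # number of peptide-chaperone contacts


def final_score(candidate: Candidate, max_contacts: int = 50) -> float:
    """Hypothetical score in [0, 1]; higher is better.

    Rewards a confident monomer (high pLDDT, high pTM) and penalizes
    predicted chaperone binding (high ipTM, many contacts).
    """
    contact_fraction = min(candidate.contacts, max_contacts) / max_contacts
    terms = [
        candidate.mono_plddt / 100.0,
        candidate.mono_ptm,
        1.0 - candidate.complex_iptm,
        1.0 - contact_fraction,
    ]
    return sum(terms) / len(terms)


# Made-up example values, not results from this project
candidates = [
    Candidate("example_A", mono_plddt=80.0, mono_ptm=0.60, complex_iptm=0.20, contacts=5),
    Candidate("example_B", mono_plddt=60.0, mono_ptm=0.40, complex_iptm=0.70, contacts=30),
]
ranking = sorted(candidates, key=final_score, reverse=True)
for candidate in ranking:
    print(f"{candidate.name}: final_score = {final_score(candidate):.3f}")
```

Scaling every term to the 0-1 range before averaging keeps any single metric, such as pLDDT on its 0-100 scale, from dominating the ranking.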
