Week 5 HW: Protein Design Part II
Part A. SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM

This homework starts with the protein database UniProt, from which I copied the FASTA-format amino acid (AA) sequence of superoxide dismutase 1 (SOD1). According to UniProt, SOD1 catalyzes the oxidation of hydrogen sulfide (H2S) to sulfate, playing a crucial role in detoxifying H2S and limiting the accumulation of reactive sulfur species (RSS). It is also critical for destroying free radicals that are produced within cells and are toxic to biological systems. Mutations in the gene that encodes the enzyme can therefore cause serious health complications. One such mutation is the alanine-to-valine substitution at codon 4 (A4V) of SOD1, which causes a rapidly progressive, dominantly inherited form of amyotrophic lateral sclerosis (ALS). ALS is a terminal disease in which the progressive loss of motor neurons leads to paralysis and death (Saeed et al., 2009). The sequences below show the normal SOD1 protein and its mutated counterpart.
>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
The A4V numbering does not count the initiator methionine, so the mutated residue is the fourth AA after the start methionine (position 5 in the FASTA sequence above). Changing that alanine to valine gives the following sequence:
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
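The substitution above can also be applied programmatically, which avoids off-by-one mistakes when editing the sequence by hand. The following is a minimal Python sketch, not part of the original homework workflow. The helper name `apply_substitution` and its `offset` parameter are hypothetical, and the sketch assumes that the mutation numbering skips the initiator methionine.

```python
# Wild-type SOD1 sequence from the FASTA record above (spaces removed).
WILD_TYPE = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_substitution(sequence, mutation, offset=1):
    """Apply a point substitution written like 'A4V' to a sequence.

    offset is the number of leading residues that the mutation numbering
    does not count (assumed to be 1 here, for the initiator methionine).
    """
    old, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position + offset - 1  # convert to a 0-based string index
    if sequence[index] != old:
        raise ValueError(
            f"expected {old} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]


mutant = apply_substitution(WILD_TYPE, "A4V")
print(mutant[:10])  # first ten residues of the mutant sequence
```

The check on the original residue is there so that a wrong numbering convention raises an error instead of silently mutating the wrong position.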
The next part of the homework is to generate four 12-amino-acid peptides conditioned on the mutant SOD1 sequence using PepMLM, a target-sequence-conditioned generator of de novo linear peptide binders (Chen et al., 2023). PepMLM learns amino acid patterns from biological databases and uses them to predict or suggest residues in peptide sequences, enabling the in silico design and evaluation of natural or de novo peptides.
Inputting the mutated sequence into PepMLM
Changing the peptide length parameter from 15 AAs to 12 AAs.
With the mutated SOD1 sequence as input, PepMLM generated the following peptide sequences.
| No. | Peptide Sequence | Perplexity Score |
|---|---|---|
| 1 | WLYYAVAVRHGX | 13.922991 |
| 2 | WHSPAVAVAHWK | 8.861463 |
| 3 | WRYYPVALRHKX | 11.572637 |
| 4 | WHYYPVVVRWKX | 16.873592 |
| Reference | FLYRWLPSRRGG | - |
The table above reports a perplexity score for each peptide. Perplexity measures how well PepMLM predicts a sequence: the lower the score, the better the sequence fits the patterns the model learned from real proteins, while a higher score means the model is less confident in the sequence and considers it less probable. The four peptides generated from the mutated SOD1 sequence score between about 8.9 and 16.9, which is relatively low. Peptide 2 (WHSPAVAVAHWK) has the lowest perplexity and peptide 4 (WHYYPVVVRWKX) the highest.
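To make the idea of perplexity concrete, the sketch below uses the generic definition: the exponential of the average negative log-probability that a model assigns to each residue. This is an illustration only. PepMLM's own scoring procedure may differ in detail, and the per-residue probabilities here are made-up values, not model output. The second half of the sketch ranks the four generated peptides by the scores from the table.

```python
import math


def perplexity(residue_probabilities):
    """Generic perplexity: exp of the mean negative log-probability."""
    mean_nll = -sum(math.log(p) for p in residue_probabilities) / len(
        residue_probabilities
    )
    return math.exp(mean_nll)


# Hypothetical per-residue probabilities for two 12-residue peptides.
confident = [0.5] * 12   # model gives each residue probability 0.5
uncertain = [0.05] * 12  # model gives each residue probability 0.05
print(perplexity(confident))  # about 2: low perplexity, high confidence
print(perplexity(uncertain))  # about 20: high perplexity, low confidence

# Perplexity scores copied from the table above.
scores = {
    "WLYYAVAVRHGX": 13.922991,
    "WHSPAVAVAHWK": 8.861463,
    "WRYYPVALRHKX": 11.572637,
    "WHYYPVVVRWKX": 16.873592,
}

# Rank peptides from lowest (best) to highest (worst) perplexity.
ranked = sorted(scores, key=scores.get)
for peptide in ranked:
    print(f"{peptide}  {scores[peptide]:.2f}")
```

A uniform guess over the 20 standard amino acids would give each residue a probability of 0.05 and a perplexity of about 20, which offers a rough reference point for reading the scores in the table.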
Part 2: Evaluate Binders with AlphaFold3
References:
Chen, T., Dumas, M., Watson, R., Vincoff, S., Peng, C., Zhao, L., Hong, L., Pertsemlidis, S., Shaepers-Cheu, M., Wang, T. Z., Srijay, D., Monticello, C., Vure, P., Pulugurta, R., Kholina, K., Goel, S., DeLisa, M. P., Truant, R., Aguilar, H. C., . . . Chatterjee, P. (2023). PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling. ArXiv. https://arxiv.org/abs/2310.03842
Saeed, M., Yang, Y., Deng, H. X., Hung, W. Y., Siddique, N., Dellefave, L., Gellera, C., Andersen, P. M., & Siddique, T. (2009). Age and founder effect of SOD1 A4V mutation causing ALS. Neurology, 72(19), 1634–1639. https://doi.org/10.1212/01.wnl.0000343509.76828.2a
AI Prompts:
OpenAI. (2026, March 10). “Describe PepMLM to me. What are perplexity scores? What’s the function of PepMLM?” Prompt to ChatGPT.