Week 5: Protein Design 2

This week, we need to find a favorable peptide that binds the mutated (A4V) version of the SOD1 enzyme.

Part 1: Generate Binders with PepMLM

This is the original sequence of SOD1 as it appears here.

>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

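The A4V mutation changes the 4th residue from alanine (A) to valine (V). In much of the SOD1/ALS literature, the first methionine (M) is considered removed in the mature protein, so residue numbers shift by minus 1. Residue 4 is therefore the fifth letter of the sequence above.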

wild type: MATKAVCV…
mutant: MATKVVCV…
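The numbering shift is easy to get wrong, so here is a minimal sketch of how the substitution can be applied programmatically. The helper name `apply_mutation` and its `met_removed_numbering` flag are my own illustration, not part of PepMLM or any other tool used here.

```python
# Precursor SOD1 sequence (UniProt P00441), including the initiator methionine.
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(seq, mutation, met_removed_numbering=True):
    """Apply a point mutation written like 'A4V' to seq.

    Hypothetical helper. With met_removed_numbering=True, position 1 is the
    residue AFTER the initiator Met, so mature position N sits at 0-based
    index N of the Met-containing sequence.
    """
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position if met_removed_numbering else position - 1
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at index {index}, found {seq[index]}"
        )
    return seq[:index] + new + seq[index + 1:]


SOD1_A4V = apply_mutation(SOD1_WT, "A4V")
print(SOD1_WT[:8], "->", SOD1_A4V[:8])
```

The wild-type check is the useful part: if the numbering convention is wrong, the function fails loudly instead of silently mutating the wrong residue.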

I ran the A4V-mutated SOD1 sequence through the PepMLM Colab notebook, asking for 4 different peptides of 12 amino acids each with K = 5.

Note that X appears in 3 out of the 4 sequences; more specifically, the top 2 scoring sequences end with an X. This may be an issue downstream.

Since the AlphaFold Server does not accept sequences containing an X residue, I returned to the PepMLM notebook and kept generating peptides under the same parameters until I had 4 valid candidates for AlphaFold.
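The filtering step can be sketched in a few lines. This is only an illustration of the check I did by eye: the function names are mine, and the two X-containing sequences in the example are made-up placeholders, not the actual PepMLM outputs.

```python
# The 20 canonical amino acids in one-letter code.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_peptide(seq):
    """True if seq is non-empty and uses only canonical one-letter codes."""
    return len(seq) > 0 and set(seq.upper()) <= CANONICAL


def keep_valid(candidates, wanted=4):
    """Return the first `wanted` candidates that pass the check, in order."""
    valid = [p for p in candidates if is_valid_peptide(p)]
    return valid[:wanted]


# Placeholder batch: the sequences ending in X are hypothetical examples.
batch = [
    "WRYYAAQAAWKX",  # hypothetical, rejected because of the trailing X
    "WRYYAAQAAWKE",
    "WLYPYVAVALAX",  # hypothetical, rejected because of the trailing X
    "WLYPYVAVALAA",
    "WRVSVVGVVHGG",
    "KVNGAYAGRWLE",
]
print(keep_valid(batch))
```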

It took a couple more iterations, but I finally got 4 valid generated peptides.

I first ran the SOD1 A4V mutant alone in the AlphaFold Server to get familiar with its shape before adding peptides to the scene.

For clarity, the mutated residue is highlighted in pink (I manually replaced the A residue in the 4th position with a V residue before submitting the sequence to the AlphaFold Server).

I can also verify that this seems to be a good representation of the mutated SOD1 protein using its Alignment Residue graph and its high pTM score of 0.96. Below is the explanation of the scores provided by the AlphaFold Server.

How can I interpret confidence metrics to check the accuracy of structures?

pTM and ipTM scores: the predicted template modeling (pTM) score and the interface predicted template modeling (ipTM) score are both derived from a measure called the template modeling (TM) score. This measures the accuracy of the entire structure (Zhang and Skolnick, 2004; Xu and Zhang, 2010). A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure. ipTM measures the accuracy of the predicted relative positions of the subunits within the complex. Values higher than 0.8 represent confident high-quality predictions, while values below 0.6 suggest likely a failed prediction. ipTM values between 0.6 and 0.8 are a gray zone where predictions could be correct or incorrect. TM score is very strict for small structures or short chains, so pTM assigns values less than 0.05 when fewer than 20 tokens are involved; for these cases PAE or pLDDT may be more indicative of prediction quality.
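To keep myself honest about these thresholds, here is a small sketch that applies the ipTM cut-offs quoted above (above 0.8 confident, below 0.6 likely failed, otherwise gray zone). The function name is my own, and the ipTM values are the ones AlphaFold reported for the complexes discussed later in this write-up.

```python
def classify_iptm(iptm):
    """Map an ipTM value to the bands described in the AlphaFold Server FAQ."""
    if iptm > 0.8:
        return "confident"
    if iptm < 0.6:
        return "likely failed"
    return "gray zone"


# ipTM values reported for each peptide/SOD1 complex in this write-up.
iptm_scores = {
    "WRYYAAQAAWKE": 0.17,
    "WLYPYVAVALAA": 0.56,
    "WRVSVVGVVHGG": 0.71,
    "KVNGAYAGRWLE": 0.43,
    "FLYRWLPSRRGG": 0.32,  # reference peptide
}

for peptide, score in iptm_scores.items():
    print(f"{peptide}  ipTM={score:.2f}  {classify_iptm(score)}")
```

By these bands, none of the five complexes reaches the confident range; the best one lands in the gray zone.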

Moving on to model the interactions of the PepMLM-generated peptides with SOD1 A4V.

For the first peptide on our list, WRYYAAQAAWKE, the model places it near the N-terminus where the mutation sits, and in the illustrated view we can see it is actually binding. According to the ipTM and pTM scores, the fold prediction has high confidence, while confidence in the spatial prediction (how the two chains actually interact in the simulated space) is very low, making this first simulation unreliable.

For the second peptide, WLYPYVAVALAA, the model again shows no binding between the peptide and the SOD1 A4V protein. This time, the model gives a high confidence score for the fold (pTM), but the ipTM score reads as a ‘failed prediction’ (“values below 0.6 suggest likely a failed prediction”), and the peptide sits even further from the target site (residue 4, near the N-terminus). In conclusion, this peptide does not serve us well.

For the third peptide, WRVSVVGVVHGG, the results look more promising. The ipTM and pTM scores both look strong. The cartoon view shows that the peptide binds to the SOD1 protein, and it sits relatively closer to the N-terminus. As I got more comfortable with the AlphaFold UI, I found an option to switch the illustration mode, in which the peptide appears to actually bind SOD1 (I went ahead and updated all the other simulations as well). I’m unsure whether this counts as binding ‘on the target site’, as it’s not precisely touching it, but it is very close. Another observation worth mentioning is that this interaction seems to have completely changed SOD1’s spatial configuration. I think this peptide is a good candidate to proceed with.

For the fourth peptide, KVNGAYAGRWLE, which also had the worst pseudo-perplexity score from the PepMLM model (27.7), we see a failed ipTM score, making this simulation irrelevant. A strong pTM result gives confidence in the folded structure; notably, this is the first generated peptide to show a beta-sheet secondary structure when folded. However, the low ipTM score does not allow me to proceed with this one.

Lastly, simulating the reference peptide FLYRWLPSRRGG also yields a failed ipTM score. I find this surprising, as I expected the reference peptide to demonstrate the desired behavior. While this peptide seems to bind the protein for the most part, I cannot trust the outcome because the model has very low confidence in the spatial prediction between the two chains. Kind of happy to know one of my generated peptides (WRVSVVGVVHGG) seems more promising than the reference one.

Moving onward to the PeptiVerse model to analyze the metrics we care about for our generated peptides:

💧 Solubility
🔬 Permeability (Penetrance)
🩸 Hemolysis
👯 Non-Fouling
⏱️ Half-Life
🔗 Binding Affinity (in context of our mutated SOD1-A4V sequence)
📏 Length
⚖️ Molecular Weight
⚡ Net Charge (pH 7.4)
🎯 Isoelectric Point
💦 Hydrophobicity (GRAVY)
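Of these metrics, GRAVY is simple enough to check by hand: it is the mean Kyte-Doolittle hydropathy value over all residues. The sketch below is my own re-computation for a sanity check, assuming the standard Kyte-Doolittle scale; PeptiVerse's internal implementation may differ.

```python
# Kyte-Doolittle hydropathy scale (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


peptides = [
    "WRYYAAQAAWKE",
    "WLYPYVAVALAA",
    "WRVSVVGVVHGG",
    "KVNGAYAGRWLE",
    "FLYRWLPSRRGG",  # reference peptide
]
for p in peptides:
    print(f"{p}  length={len(p)}  GRAVY={gravy(p):.2f}")
```

A positive GRAVY indicates a hydrophobic peptide on average, a negative one a hydrophilic peptide.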

Peptide 1: WRYYAAQAAWKE

AlphaFold3 gives this complex the lowest ipTM (0.17), and structurally the peptide looks mostly extended and only lightly attached, with limited contact against the SOD1 surface. That visually suggests a weak or unstable interface. PeptiVerse agrees: it predicts weak binding (6.13 pKd/pKi). On the therapeutic side it is still attractive in that it is soluble, permeable, and non-hemolytic, but the structural model does not make it look like a strong binder.
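Since PeptiVerse reports affinity on a pKd/pKi scale, it helps me to convert the scores to concentrations. The sketch below assumes the score is the negative base-10 logarithm of a dissociation constant in molar units (the usual definition of pKd); the function name is mine.

```python
def pkd_to_nanomolar(pkd):
    """Convert pKd (= -log10 of Kd in molar) to Kd in nanomolar."""
    return 10 ** (-pkd) * 1e9


# Predicted affinity scores from PeptiVerse for the five peptides.
affinity_scores = {
    "WRYYAAQAAWKE": 6.13,
    "WLYPYVAVALAA": 7.367,
    "WRVSVVGVVHGG": 6.653,
    "KVNGAYAGRWLE": 6.095,
    "FLYRWLPSRRGG": 5.968,  # reference peptide
}

for peptide, pkd in affinity_scores.items():
    print(f"{peptide}  pKd={pkd}  Kd ~ {pkd_to_nanomolar(pkd):.0f} nM")
```

Because the scale is logarithmic, a difference of one pKd unit corresponds to a tenfold difference in Kd.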

Peptide 2: WLYPYVAVALAA

This one looks much better structurally: the peptide appears to sit along one face of SOD1 with a broader contact patch, and the ipTM of 0.56 is clearly stronger than pep1, pep4, or the reference. PeptiVerse also gives it the best affinity score of the set (7.367, medium binding). It is predicted to be soluble, permeable, and non-hemolytic, so this is the clearest case where the structure and property model are both favorable. The main caution is that it is fairly hydrophobic and not predicted as “non-fouling,” but overall it still looks like the best-balanced lead.

Peptide 3: WRVSVVGVVHGG

Structurally, this is the strongest-looking AlphaFold3 hit: it has the highest ipTM (0.71) and the peptide seems to make a long, continuous interface across the protein surface, which is exactly the kind of pose you would hope for in a binder. But PeptiVerse does not rank it as the best binder; it is still only weak binding (6.653) and is also predicted to be non-permeable. So this is the clearest mismatch between the two models: best structural interface confidence, but not best therapeutic profile.

Peptide 4: KVNGAYAGRWLE

This complex is intermediate-to-weak by structure: ipTM is 0.43, and the peptide looks more like a surface appendage on one side rather than a strongly buried binder. PeptiVerse is consistent with that and predicts weak binding (6.095). It is still soluble and non-hemolytic, but also non-permeable, so there is not a strong reason to prioritize it over pep2 or even pep3.

Reference peptide: FLYRWLPSRRGG

The reference peptide gives a low ipTM (0.32) and looks only partially engaged, with the chain remaining fairly extended and not deeply wrapped into the SOD1 surface. PeptiVerse also predicts weak binding (5.968), the weakest of the five by affinity score. Its upside is that it is soluble, highly permeable, non-hemolytic, and the only one predicted as non-fouling, which is a nice therapeutic feature, but its binding looks less compelling than the better generated candidates.

Across the whole set, higher ipTM does not map perfectly onto stronger predicted affinity. The clearest example is peptide 3, which has the highest ipTM and pTM scores (which made me favor it earlier) but only weak predicted affinity, while peptide 2 has the best PeptiVerse affinity despite a lower ipTM than pep3. Also, none of the candidates is predicted to be hemolytic or poorly soluble; all five are predicted soluble and non-hemolytic. The best overall balance of predicted binding plus therapeutic properties is peptide 2 (WLYPYVAVALAA). I would choose peptide 2 because it gives the best combined picture across both models: the strongest predicted affinity in PeptiVerse, the second-highest AlphaFold3 ipTM (0.56), and predictions of being soluble, permeable, and non-hemolytic.
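The disagreement between the two rankings is easier to see side by side. Here is a minimal sketch that sorts the five peptides by each score, using only the ipTM and affinity numbers reported above; the short labels and the `rank_by` helper are my own.

```python
# (ipTM from AlphaFold3, predicted affinity score from PeptiVerse)
candidates = {
    "pep1": (0.17, 6.13),
    "pep2": (0.56, 7.367),
    "pep3": (0.71, 6.653),
    "pep4": (0.43, 6.095),
    "ref": (0.32, 5.968),
}


def rank_by(column):
    """Return peptide labels sorted from best to worst on the given column."""
    return sorted(candidates, key=lambda name: candidates[name][column], reverse=True)


by_iptm = rank_by(0)
by_affinity = rank_by(1)

print("by ipTM:    ", by_iptm)
print("by affinity:", by_affinity)

# Peptides whose rank differs between the two orderings.
moved = [name for name in candidates if by_iptm.index(name) != by_affinity.index(name)]
print("rank differs for:", moved)
```

Only pep2 and pep3 appear in the top two of both orderings, which is why the choice comes down to those two.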

In the next step we are tasked with running the moPPIt model in the Google Colab notebook provided. Following the instructions, I entered all the required inputs. However, the final step of retrieving a CSV file with the scored results failed. The original run was on the T4 GPU runtime; I then tried switching the runtime to CPU and to the v5e-1 TPU, and both failed as well. The error reads “RuntimeError: moo.py failed with code 2”.
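The exit code alone does not say what went wrong, so the next thing I would try is capturing the script's stderr. Below is a generic sketch using Python's standard `subprocess` module. The helper name is mine, and the demo command is a stand-in that simply exits with code 2; I do not know the actual command line the notebook uses for moo.py, so that part would need to be copied from the failing cell. One thing worth checking: Python's `argparse` exits with status 2 on invalid arguments, though whether moo.py uses `argparse` is an assumption on my part.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd, and on failure print the exit code and the tail of stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"command failed with code {result.returncode}")
        # The last part of stderr usually contains the real error message.
        print(result.stderr[-2000:])
    return result


# Stand-in for the real call: a child process that writes to stderr and exits 2.
demo = run_and_report(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"]
)
```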