Week 5 HW: Protein Design II

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge:

Design short peptides that bind mutant SOD1. Then decide which ones are worth advancing toward therapy.

You will use three models developed in our lab:

PepMLM: target sequence-conditioned peptide generation via masked language modeling

PeptiVerse: therapeutic property prediction

moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

Amino Acid Sequence: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
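The A4V name numbers SOD1 residues from the mature protein, without the initiator Met. Residue 4 is therefore position 5 of the full UniProt P00441 sequence. For reference, the sequence listed above begins MATKAV, which is the wild-type N-terminus. With A4V applied, it begins MATKVV. Below is a minimal sketch of making the substitution programmatically. `apply_point_mutation` and its `met_offset` parameter are hypothetical names, and only the first 20 residues are used to keep it short.

```python
def apply_point_mutation(seq: str, mutation: str, met_offset: int = 1) -> str:
    """Apply a substitution such as "A4V" to a full-length protein sequence.

    met_offset=1 assumes the mutation name skips the initiator Met, so residue
    N in the name sits at 0-based index N in the full sequence.
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1 + met_offset  # 0-based index into the full sequence
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at index {idx}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


# First 20 residues of UniProt P00441 (wild type), for brevity.
wt_prefix = "MATKAVCVLKGDGPVQGIIN"
mutant_prefix = apply_point_mutation(wt_prefix, "A4V")
print(mutant_prefix)  # MATKVVCVLKGDGPVQGIIN
```

The wild-type check makes an off-by-one numbering mistake fail loudly instead of silently mutating the wrong residue.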

Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

Record the perplexity scores that indicate PepMLM’s confidence in the binders.
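The downstream steps assume 12-mers. A quick check of the candidate list before submitting it to AlphaFold can catch length or alphabet problems early. This is a minimal sketch using a hypothetical helper, `check_candidates`, which is not part of the PepMLM Colab. Run on the two PepMLM peptides discussed later in this write-up, it flags both as 11 residues long, while the known binder passes.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
KNOWN_BINDER = "FLYRWLPSRRGG"


def check_candidates(generated: list[str], target_len: int = 12) -> dict[str, list[str]]:
    """Return {peptide: [problems]} for any candidate that fails a basic check."""
    problems: dict[str, list[str]] = {}
    for pep in generated + [KNOWN_BINDER]:  # add the known binder for comparison
        issues = []
        if len(pep) != target_len:
            issues.append(f"length {len(pep)} != {target_len}")
        bad = sorted(set(pep) - CANONICAL)
        if bad:
            issues.append(f"non-canonical letters {bad}")
        if issues:
            problems[pep] = issues
    return problems


print(check_candidates(["WRVYAALALWE", "WRYPAVAAHKE"]))
```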

Part 2: Evaluate Binders with AlphaFold3

Navigate to the AlphaFold Server: alphafoldserver.com

For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

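From the AlphaFold predictions, the PepMLM-generated binders have low ipTM scores of 0.23 to 0.28. ipTM values below 0.6 suggest a likely failed prediction, so AlphaFold has little confidence in these peptide-protein interfaces. The predicted binding is therefore unlikely to reflect a real interaction.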

https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/confidence-scores-in-alphafold-multimer/
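The EBI page linked above gives rough cut-offs for ipTM:

- Above 0.8 indicates a confident, high-quality prediction.
- 0.6 to 0.8 is a grey zone.
- Below 0.6 suggests the prediction likely failed.

The minimal sketch below bins the scores reported in this write-up. How values landing exactly on 0.6 or 0.8 are assigned is an assumption.

```python
def iptm_band(iptm: float) -> str:
    """Bin an ipTM score using the 0.8 / 0.6 cut-offs from the EBI guidance."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM must be between 0 and 1")
    if iptm > 0.8:
        return "confident, high-quality"
    if iptm >= 0.6:  # boundary handling is an assumption
        return "grey zone"
    return "likely failed"


for name, score in [("FLYRWLPSRRGG", 0.24), ("WRVYAALALWE", 0.28), ("WRYPAVAAHKE", 0.28)]:
    print(name, score, iptm_band(score))
```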

Vocab

Pseudo-perplexity: measures the model's uncertainty when predicting an amino acid sequence; lower values mean higher confidence.

ipTM (interface predicted template modeling score): AlphaFold's estimate of the accuracy of the predicted relative positions of the subunits forming the protein-protein complex, on a scale from 0 to 1.
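Pseudo-perplexity for a masked language model is usually computed in three steps:

1. Mask one position at a time.
2. Read the probability the model assigns to the true residue at that position.
3. Exponentiate the negative mean log-probability: PPPL = exp(-(1/L) * sum_i log p(x_i | rest of sequence)).

The sketch below assumes those per-position probabilities are already available. It does not call PepMLM, and the probabilities in the example are made-up toy values.

```python
import math


def pseudo_perplexity(true_residue_probs: list[float]) -> float:
    """exp of the negative mean log-probability of the true residues.

    Each entry is assumed to be p(x_i | all other positions), obtained by
    masking position i and reading the model's probability for the true residue.
    """
    if not true_residue_probs:
        raise ValueError("need at least one position")
    mean_log_prob = sum(math.log(p) for p in true_residue_probs) / len(true_residue_probs)
    return math.exp(-mean_log_prob)


# Toy probabilities, not real PepMLM output: a model that gives every true
# residue probability 0.25 has a pseudo-perplexity of about 4.
print(pseudo_perplexity([0.25] * 12))
```

A perfectly confident model (probability 1 everywhere) would score 1. Larger values mean the model found the sequence less expected.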

In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

None of the PepMLM binders exceeds the known binder in any meaningful way. AlphaFold gave FLYRWLPSRRGG an ipTM of 0.24, only 0.04 below the best generated binders, and all of these scores are low confidence. In the AlphaFold structures, the peptide chains were not attached to the β-barrel, which further reinforces the low confidence in these binders.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

Paste the peptide sequence.

Paste the A4V mutant SOD1 sequence in the target field.

Check the boxes:

Predicted binding affinity
Solubility
Hemolysis probability
Net charge (pH 7)
Molecular weight
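The "Net charge (pH 7)" property can be approximated by hand with the Henderson-Hasselbalch equation. The net charge is the sum of the fractional charges of each ionizable side chain plus the two termini. The minimal sketch below uses a pKa table that is a rounded approximation chosen for illustration (an assumption). PeptiVerse may use a different method, so its numbers need not match exactly.

```python
# Rounded, textbook-style pKa values (an assumption; tools use different tables).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(peptide: str, ph: float = 7.0) -> float:
    """Estimate net charge at a given pH with the Henderson-Hasselbalch equation."""
    pos_pkas = [N_TERM_PKA] + [POSITIVE_PKA[aa] for aa in peptide if aa in POSITIVE_PKA]
    neg_pkas = [C_TERM_PKA] + [NEGATIVE_PKA[aa] for aa in peptide if aa in NEGATIVE_PKA]
    # Basic groups: fraction protonated (+1) = 1 / (1 + 10^(pH - pKa))
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in pos_pkas)
    # Acidic groups: fraction deprotonated (-1) = 1 / (1 + 10^(pKa - pH))
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in neg_pkas)
    return positive - negative


for pep in ["FLYRWLPSRRGG", "WRVYAALALWE", "WRYPAVAAHKE"]:
    print(pep, round(net_charge(pep), 2))
```

With three arginines and otherwise near-cancelling termini, the known binder FLYRWLPSRRGG comes out at roughly +3.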

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

All of the peptides had similar molecular weights and solubility scores. Four of the five had weak predicted binding, which is consistent with AlphaFold's low-confidence ipTM scores. In the AlphaFold 3D structures, the binders did not attach to the β-barrel. The pH scores ranged from 4.05 to 11.71.

WRVYAALALWE is the only peptide predicted to have medium binding affinity, but it also has the highest hemolysis score. I would like to know whether this pairing is consistent and, if so, what the biochemical explanation is.
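One way to start probing the affinity-hemolysis pairing is to compare hydrophobicity. More hydrophobic peptides are often reported to interact more strongly with membranes and to be more hemolytic. The minimal sketch below uses the Kyte-Doolittle hydropathy scale and its per-residue average (GRAVY). This is a rough heuristic, not how PeptiVerse predicts hemolysis. On these sequences, WRVYAALALWE comes out with a positive GRAVY (hydrophobic), while WRYPAVAAHKE comes out negative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


for pep in ["WRVYAALALWE", "WRYPAVAAHKE", "FLYRWLPSRRGG"]:
    print(pep, round(gravy(pep), 2))
```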

Choose one peptide you would advance and justify your decision briefly.

I would advance WRVYAALALWE or WRYPAVAAHKE as a candidate, for several reasons.

In the AlphaFold 3D structure, WRVYAALALWE is the only binder that forms a β-sheet, which loosely suggests stronger peptide-protein binding. Its ipTM of 0.28 is still low confidence, but higher than the lowest scores of 0.23 and 0.24. WRVYAALALWE is also the only peptide that PeptiVerse predicts to have medium binding affinity.

However, WRYPAVAAHKE has the lowest hemolysis score (0.013), a solubility score of 1, and the same ipTM (0.28) as WRVYAALALWE.

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.
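moPPIt's multi-objective guidance happens inside MOG-DFM sampling, which is not reproduced here. As a far simpler illustration of what balancing several objectives means when comparing finished candidates, the sketch below keeps only the Pareto-optimal peptides: those that no other candidate beats on every objective at once. The peptide names and scores are hypothetical, and treating higher affinity scores as better is an assumption.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    affinity: float    # higher assumed better
    solubility: float  # higher is better
    hemolysis: float   # lower is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    no_worse = (a.affinity >= b.affinity and a.solubility >= b.solubility
                and a.hemolysis <= b.hemolysis)
    better = (a.affinity > b.affinity or a.solubility > b.solubility
              or a.hemolysis < b.hemolysis)
    return no_worse and better


def pareto_front(candidates: dict[str, Scores]) -> list[str]:
    """Names of candidates that no other candidate dominates."""
    return [name for name, s in candidates.items()
            if not any(dominates(other, s) for other in candidates.values())]


# Hypothetical scores for illustration only (not model output).
toy = {
    "pep_A": Scores(affinity=6.5, solubility=0.6, hemolysis=0.40),
    "pep_B": Scores(affinity=5.0, solubility=1.0, hemolysis=0.02),
    "pep_C": Scores(affinity=4.8, solubility=0.9, hemolysis=0.05),
}
print(pareto_front(toy))  # ['pep_A', 'pep_B']
```

Here pep_C drops out because pep_B is at least as good on all three objectives. pep_A and pep_B both survive because each wins on something the other loses.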

Open the moPPIt Colab linked from the HuggingFace moPPIt model card.

Make a copy and switch to a GPU runtime.

In the notebook:

Paste your A4V mutant SOD1 sequence.

Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).

Set peptide length to 12 amino acids.

Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
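When picking motif residues, it helps to confirm which amino acids the chosen positions correspond to and to be explicit about the numbering. The minimal sketch below uses a hypothetical helper, `motif_window`. Whether the moPPIt notebook expects 0-based or 1-based indices is an assumption to check against the notebook itself. Here, positions 2 to 8 of the full sequence (counting the initiator Met as 1) cover the mutated valine and its neighbors.

```python
def motif_window(seq: str, start: int, end: int) -> tuple[list[int], str]:
    """Return 0-based indices and residues for 1-based inclusive positions start..end."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions out of range")
    zero_based = list(range(start - 1, end))
    return zero_based, seq[start - 1:end]


# First 20 residues of SOD1 with A4V applied (MATKV... instead of MATKA...).
mutant_prefix = "MATKVVCVLKGDGPVQGIIN"
indices, residues = motif_window(mutant_prefix, 2, 8)
print(indices, residues)  # [1, 2, 3, 4, 5, 6, 7] ATKVVCV
```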

Unfortunately, when I tried to switch to a GPU runtime, I hit a paywall that I could not bypass. Since I have done this experiment before, I relied on previous runs and on information gathered from my peers in the node.

When I tried running the Colab anyway, without the hardware accelerator, I kept running into an error.

I asked Gemini to review the code in case it could be run despite the hardware limitations.

To give the experiment one more fighting chance, I accepted the changes Gemini recommended.

However, despite Gemini's edits, I was already down a rabbit hole of errors. I pivoted to using the data that Amanda Mainello, the BUGSS instructor, was able to load and save from the moPPIt Colab. I was not able to run these peptides through AlphaFold for further comparison because the sequences were obscured.

After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

The moPPIt peptides show higher hemolysis scores, but this may reflect differences in how the two models scale their outputs. The moPPIt affinity scores are comparable to the PeptiVerse scores, which ranged from 4 to 7, so the moPPIt peptides would also be classified as weak binders. Their overall solubility scores are lower than those from PeptiVerse.

To evaluate these peptides, I would start by better understanding the value ranges and categories each model uses for these binder properties.
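PeptiVerse and moPPIt may score on different scales. One simple way to compare them without assuming the scales match is to convert each model's scores to within-model ranks. The minimal sketch below uses a hypothetical helper, `rank_normalize`. The example values are placeholders, not scores from either model.

```python
def rank_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Map one model's scores to rank fractions in [0, 1]; tied scores share their mean rank."""
    ordered = sorted(scores.values())
    n = len(ordered)
    if n == 1:
        return {name: 0.5 for name in scores}
    result = {}
    for name, value in scores.items():
        first = ordered.index(value)               # lowest position of this value
        last = n - 1 - ordered[::-1].index(value)  # highest position of this value
        result[name] = (first + last) / 2 / (n - 1)
    return result


# Placeholder hemolysis scores from two hypothetical models on different scales.
model_a = {"pep_A": 0.12, "pep_B": 0.45, "pep_C": 0.30}
model_b = {"pep_A": 3.0, "pep_B": 9.0, "pep_C": 5.5}
print(rank_normalize(model_a) == rank_normalize(model_b))  # True
```

Matching rank maps mean the two models agree on the ordering of the peptides, even though their raw numbers differ.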

Based on the information provided and my limited knowledge, I would take the following approach.

First, define my research goals and the experimental parameters specific to how the peptide should act on the protein.

Next, cross-compare the generated peptides to check for major inconsistencies in key properties such as hemolysis, pH, specificity, and binding affinity to the protein.

Finally, after rigorous comparisons and selection of candidate binders, test them in vivo in animals to see whether the predicted behavior holds up. This testing would be required before clinical studies.