Week 5 HW: Protein Design Part II

This page tackles all homeworks of week 5.

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

  • – Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
  • Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
  • – Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
  • – To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
  • – Record the perplexity scores that indicate PepMLM’s confidence in the binders.

The human SOD1 sequence is: MATVAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Post-translation, the starting Methionine gets removed.
After A4V Mutation, it becomes: ATVVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Generated four candidate binder peptides (length = 12 aa) using PepMLM conditioned on the mutant SOD1 (A4V) sequence. Lower pseudo-perplexity values indicate higher model confidence in the generated binder sequence.

The following abbreviations ae used in the table: PeptiVerse (PV); AlphaFold (AF);

BinderPseudo PerplexityAF ipTMAF pTTMPV 💧 Solubility [Probability]PV 🩸 Hemolysis [Probability]PV 🔗 Binding Affinity [pKd/pKi]PV 📏 Length [aa]⚖️ Molecular Weight [Da]⚡ Net Charge (pH 7)🎯 Isoelectric Point [pH]💦 Hydrophobicity [GRAVY]
GDNVSAAGRPWW29.5290090.420.891 (Soluble)0.022 (Non-hemolytic)6.54 (Weak binding)121315.4-0.245.84-0.72
RSPPVVGVVRDE23.5625440.390.870.987 (Soluble)0.04 (Non-hemolytic)6.501 (Weak binding)121309.5-0.236.25-0.3
KDRSAVGAKRKE21.2789730.50.91 (Soluble)0.026 (Non-hemolytic)5.677 (Weak binding)121344.52.7710.28-1.76
WSYWAVLAYLKR19.1595060.50.8350.901 (Soluble)0.155 (Non-hemolytic)6.646 (Weak binding)121555.81.769.70.15
WRSYAVAIGHKK16.8721210.320.821 (Soluble)0.019 (Non-hemolytic)6.121 (Weak binding)121415.62.8410.29-0.55
FLYRWLPSRRGG (known SOD1-binding peptide; comparison control)N/A0.290.790.608 (Soluble)0.047 (Non-hemolytic)6.366 (Weak binding)121507.72.7611.71-0.71
DRYYAQVIRRKX15.398977
WSSVVTGLKLKX12.151349
WSYPAVAARLKX6.810960

Among the generated candidates, WSYPAVAARLKX achieved the lowest pseudo-perplexity (6.81), suggesting the highest confidence prediction by PepMLM among the generated sequences.

Important Note: X denotes a masked/unspecified amino acid token generated by the model pipeline and may require post-processing or substitution depending on downstream analysis. The “X” causes problems in Alphafold and so the first four were generated by editing that Google Collab…

Part 2: Evaluate Binders with AlphaFold3

  • – Navigate to the AlphaFold Server: alphafoldserver.com
  • – For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
  • – Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
  • – In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

Most information is consolidated in the above table.

Among the peptides, I will be going forward with WRSYAVAIGHKK as its parameters (in the above table) seem to be the closest to the known binding peptide. The peptide is localised near the C-terminus; but both the C and N terminus are also near each other…

For FLYRWLPSRRGG, the peptide seems to be closer to the beta-sheets and more near to the N-terminus than the C-terminus.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, we evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  • – Paste the peptide sequence..
  • – Check the boxes: Predicted binding affinity, Solubility, Hemolysis probability, Net charge (pH 7), Molecular weight
  • – Paste the A4V mutant SOD1 sequence in the target field.
  • – Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
  • – Choose one peptide you would advance and justify your decision briefly.

(Above table is to be referred…)

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

  • – Open the moPPit Colab linked from the HuggingFace moPPIt model card (Make a copy and switch to a GPU runtime.)
  • – In the notebook:
    - Paste your A4V mutant SOD1 sequence.
    - Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
    - Set peptide length to 12 amino acids.
    - Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
  • – After generation, briefly describe how these moPPit peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

The below peptides were generated using moPPIt:

BinderHemolysisSolubilityAffinityMotif
CATGCNVWPGVI0.04778951416.5986404420.667649984
ADSEFTAPSEAH0.05709111715.7097411160.711770117
ESEKYGVQCHIT0.06423294516.1200928690.721853793
CFAGIYKQKEQT0.04899525616.0077400210.786077559
QAQCGQFQFNVE0.0400816216.1981964110.903030217
SQCTRVLVPTIC0.11427587316.3552865980.730438828
ARKPCFAALQSA0.02695757216.2208309170.638868332
EKPDYHDGPCWI0.0466313360.999999946.4837784770.73085916

These peptides have similar ranges for the parameters mentioned in the above table when compared to the first table on this page; further analysis on multiple other dimensions need to be performed for a thorough comparison.

Part C: Final Project: L-Protein Mutants

L-Protein Engineering | Option 3: Random Mutagenesis

Based on the Table Information provided, and filtering to only the mutations where Lysis is happenning and protein levels are assigned as 1, I identify the following mutations should take place:

Position of the mutation in LBase Pair Changed (RNA nucleotide coordinate of the MS2 phage genome)Amino Acid PositionAmino Acid ChangeLysisProtein Levels (ND=Not determined)
38C->T13P->L11
43T->G15S->A11
52A->G18R->G11
53G->T18R->I11
89G->A30R->Q11
89G->T30R->L11
92G->T31R->I11
131T->C44L->P11
131T->C44L->P11
133G->C45A->P11
136A->T46I->F11

We use an L-protein with the AA mutations at the following positions 13, 30 and 46(from the above table) thereby leading to this L-protein sequence: METRFPQQSQQTLASTNRRRPFKHEDYPCLRQQRSSTLYVLIFLAFFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

DnaJ is completely wild-type (unmutated): MAKQDYYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDSQKRAAYDQYGHAAFEQGGMGGGGFGGGADFSDIFGDVFGDIFGGGRGRQRAARGADLRYNMELTLEEAVRGVTKEIRIPTLEECDVCHGSGAKPGTQPQTCPTCHGSGQVQMRQGFFAVQQTCPHCQGRGTLIKDPCNKCHGHGRVERSKTLSVKIPAGVDTGDRIRLAGEGEAGEHGAPAGDLYVQVQVKQHPIFEREGNNLYCEVPINFAMAALGGEIEVPTLDGRVKLKVPGETQTGKLFRMRGKGVKSVRGGAQGDLLCRVVVETPVGLNERQKQLLQELQESFGGPTGEHNSPRSKSFFDGVKKFFDDLTR

Alphofold3 co-folding result of 3-mutated L-protein with DnaJ
Alphofold3 co-folding result of 3-mutated L-protein with DnaJ