Week 5 HW: Protein Design II

Part A: SOD1 Binder Peptide Design

Part 1:

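The first step was to retrieve the human SOD1 sequence from UniProt and introduce the A4V mutation. A4V is numbered without the initiator methionine, so the substituted alanine is residue 5 of the sequence as listed. Here’s the SOD1 sequence: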

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Here’s the mutated SOD1 sequence:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
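The substitution can be double-checked programmatically. This is a minimal sketch, assuming the usual SOD1 convention that A4V is counted without the initiator methionine (hence `offset=1`); the helper name `apply_mutation` is my own and not part of any homework tool.

```python
# Wild-type human SOD1 as listed above (includes the initiator Met).
WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(seq, wt, pos, mut, offset=0):
    """Replace residue `wt` at 1-based position `pos + offset` with `mut`.

    `offset` shifts the numbering, e.g. offset=1 when the mutation name
    ignores the initiator methionine (an assumption for A4V here).
    """
    idx = pos + offset - 1  # convert to 0-based index
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at index {idx}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


MUTANT_SOD1 = apply_mutation(WT_SOD1, "A", 4, "V", offset=1)
print(MUTANT_SOD1[:10])  # MATKVVCVLK
```

The wild-type check inside the helper guards against applying the mutation at the wrong index, which is the easiest mistake to make with the off-by-one numbering.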

I used the PepMLM Colab to generate 4 peptides, as specified in the homework, setting the peptide length to 12 and the number of binders to 4. Note that the four generated sequences below are each 15 residues long, not 12. This was the result.

| Binder | Sequence | Pseudo Perplexity |
|---|---|---|
| 0 | FLYRWLPSRRGG | n/a (known binder, added to the list as instructed in the homework) |
| 1 | SRWDEYTAVVAWARK | 9.686584 |
| 2 | SWYGEYTGVVAWRKK | 14.675614 |
| 3 | AHWPEYVVVVEWKKK | 20.736155 |
| 4 | SRVDEYTVRKKWARK | 15.232643 |
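For context, pseudo-perplexity for a masked language model is usually defined as shown below. This is the standard definition and I am assuming, not confirming from the notebook, that it is what the PepMLM Colab reports. Here $L$ is the peptide length and $x_{\setminus i}$ is the input (peptide together with the target) with peptide position $i$ masked.

```latex
\mathrm{PPL}(x) = \exp\!\left( -\frac{1}{L} \sum_{i=1}^{L} \log p\left(x_i \mid x_{\setminus i}\right) \right)
```

Lower values mean the model finds the peptide more plausible for the target, so by this measure Binder 1 (9.69) is the model's most confident candidate and Binder 3 (20.74) its least confident.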

Part 2: Evaluate Binders with AlphaFold3

The next step was to evaluate the binders. I went to alphafoldserver.com, logged in with my Google account, and was greeted by this screen.

For each peptide, I pasted the mutant SOD1 sequence and the peptide sequence as separate chains to model the protein-peptide complex. There were 5 jobs to submit in total.
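I entered the jobs by hand, but the five submissions could also be prepared as one JSON file. The sketch below is hedged: the field names (`proteinChain`, `modelSeeds`, `dialect`, `version`) follow my understanding of the AlphaFold Server JSON upload format and should be checked against the server's own documentation before use. The job names and the `make_job` helper are hypothetical.

```python
import json

# A4V mutant SOD1 sequence from Part 1.
MUTANT_SOD1 = (
    "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

# Binder 0 is the known binder; binders 1-4 come from PepMLM.
PEPTIDES = [
    "FLYRWLPSRRGG",
    "SRWDEYTAVVAWARK",
    "SWYGEYTGVVAWRKK",
    "AHWPEYVVVVEWKKK",
    "SRVDEYTVRKKWARK",
]


def make_job(name, target, peptide):
    """Build one two-chain job (target + peptide); schema is an assumption."""
    return {
        "name": name,
        "modelSeeds": [],  # empty list: let the server pick the seed
        "sequences": [
            {"proteinChain": {"sequence": target, "count": 1}},
            {"proteinChain": {"sequence": peptide, "count": 1}},
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }


jobs = [
    make_job(f"SOD1_A4V_binder_{i}", MUTANT_SOD1, pep)
    for i, pep in enumerate(PEPTIDES)
]

with open("sod1_binder_jobs.json", "w") as fh:
    json.dump(jobs, fh, indent=2)
```

Keeping the target and peptide as two separate chains is what makes the server report an interface score (ipTM) for the pair.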

Job 1: Mutated SOD1 and the known binder: FLYRWLPSRRGG

The peptide (shown in orange/red) visually wraps around one face of the β-barrel, appearing partially surface-bound. It does not appear to penetrate deeply. Its position is consistent with engagement near the loop regions connecting β-strands rather than strictly at the N-terminus where A4V (position 4) sits.

Job 2: Mutated SOD1 and Binder 1: SRWDEYTAVVAWARK

The peptide (visible as a yellow coil) appears to dock away from the main β-barrel body, positioned more distally and looking loosely tethered. It does not appear buried and likely represents a surface-level interaction, possibly near an external loop rather than directly at the A4V mutation site.

Job 3: Mutated SOD1 and Binder 2: SWYGEYTGVVAWRKK

The peptide (orange/red loop) appears to contact the β-barrel on a lateral face and partially approaches what could be the dimer interface region. It sits surface-exposed rather than buried.

Job 4: Mutated SOD1 and Binder 3: AHWPEYVVVVEWKKK

The peptide (yellow, compact) appears to bind near the front face of the β-barrel and shows relatively close association with the protein body. It could be engaging a region near the electrostatic loop or β-barrel surface, though not deeply buried.

Job 5: Mutated SOD1 and Binder 4: SRVDEYTVRKKWARK

The peptide (orange/red) drapes along one edge of the SOD1 structure, appearing surface-bound. Its positioning is loosely consistent with an approach toward the N-terminal β-strand region where A4V resides, though definitive localization is limited without residue-level contact maps.

ipTM Scores and Binding Description

| Binder | Sequence | ipTM | pTM |
|---|---|---|---|
| 0 (Known) | FLYRWLPSRRGG | 0.32 | 0.77 |
| 1 | SRWDEYTAVVAWARK | 0.44 | 0.87 |
| 2 | SWYGEYTGVVAWRKK | 0.22 | 0.81 |
| 3 | AHWPEYVVVVEWKKK | 0.35 | 0.83 |
| 4 | SRVDEYTVRKKWARK | 0.31 | 0.87 |

None of the peptides show convincing deep burial, suggesting predominantly surface-level or shallow groove engagement with the β-barrel exterior.

The ipTM scores across all five complexes range from 0.22 to 0.44, which sits in the low-to-moderate confidence range for inter-chain interaction quality. The known binder (FLYRWLPSRRGG) achieves an ipTM of 0.32, which serves as the reference benchmark. Two PepMLM-generated peptides exceed it: Binder 3 (AHWPEYVVVVEWKKK) marginally, at 0.35, and Binder 1 (SRWDEYTAVVAWARK) more clearly, at 0.44, an improvement of about 0.12 over the known binder. Binder 1 therefore stands out as the most structurally promising of the generated candidates.
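The comparison against the known binder can be reproduced directly from the table values. This is a minimal Python sketch; the variable names are my own. It ranks the complexes by ipTM and lists which generated binders score above the reference.

```python
# (sequence, ipTM, pTM) per binder, copied from the table above.
RESULTS = {
    0: ("FLYRWLPSRRGG", 0.32, 0.77),      # known binder (reference)
    1: ("SRWDEYTAVVAWARK", 0.44, 0.87),
    2: ("SWYGEYTGVVAWRKK", 0.22, 0.81),
    3: ("AHWPEYVVVVEWKKK", 0.35, 0.83),
    4: ("SRVDEYTVRKKWARK", 0.31, 0.87),
}

reference_iptm = RESULTS[0][1]

# Rank all complexes from highest to lowest ipTM.
ranked = sorted(RESULTS.items(), key=lambda kv: kv[1][1], reverse=True)
for binder, (seq, iptm, ptm) in ranked:
    delta = iptm - reference_iptm
    print(f"Binder {binder}  {seq:<16} ipTM={iptm:.2f}  delta={delta:+.2f}")

# Generated binders (1-4) whose ipTM is above the known binder's.
above_reference = sorted(
    b for b, (_, iptm, _) in RESULTS.items() if b != 0 and iptm > reference_iptm
)
print("Above reference:", above_reference)
```

With differences this small and all values well below the range usually treated as confident, the ranking is best read as a rough ordering rather than a firm result.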

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Next, I evaluated the properties of the generated peptides using PeptiVerse.

The workflow was simple: paste the peptide sequence and the mutated SOD1 sequence, then check the boxes specified in the homework (predicted binding affinity, solubility, hemolysis probability, net charge at pH 7, and molecular weight). Here are the results.

For Peptide 1: SRWDEYTAVVAWARK

For Peptide 2: SWYGEYTGVVAWRKK

For Peptide 3: AHWPEYVVVVEWKKK

For Peptide 4: SRVDEYTVRKKWARK

For Peptide 0, not generated but known: FLYRWLPSRRGG

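| Binder | Sequence | Solubility | Hemolysis Prob. | Binding Affinity (pKd) | Net Charge (pH 7) | MW (Da) |
|---|---|---|---|---|---|---|
| 0 | FLYRWLPSRRGG | 0.608 | 0.047 | 6.361 | +2.76 | 1507.7 |
| 1 | SRWDEYTAVVAWARK | 1.000 | 0.078 | 6.905 | +0.46 | 1838.0 |
| 2 | SWYGEYTGVVAWRKK | 1.000 | 0.038 | 6.728 | +1.46 | 1830.0 |
| 3 | AHWPEYVVVVEWKKK | 1.000 | 0.027 | 6.808 | +0.88 | 1898.2 |
| 4 | SRVDEYTVRKKWARK | 1.000 | 0.038 | 6.380 | +3.46 | 1922.2 |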

When the PeptiVerse predictions are overlaid with the AlphaFold3 structural data, a reasonably coherent picture emerges. Binder 1 (SRWDEYTAVVAWARK) leads on both fronts, with the highest ipTM (0.44) and the strongest predicted binding affinity (pKd 6.905), so the structural confidence in its interface agrees with a tighter predicted interaction. This is the clearest case of agreement between the structural and property predictions. Binder 3 (AHWPEYVVVVEWKKK) also performs consistently: its moderate ipTM of 0.35 pairs with a pKd of 6.808 and the lowest hemolysis probability (0.027), making it the strongest candidate on the safety metric.

Binder 1 is the clear choice to advance. It has the highest structural confidence (ipTM = 0.44, which is 0.12 above the known binder), the highest predicted binding affinity (pKd 6.905), and the maximum predicted solubility score (1.000). Its predicted hemolysis probability (0.078) is the highest of the five peptides but still low. Its near-neutral net charge (+0.46) is also favorable.
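To put the pKd differences on a more intuitive scale, the values can be converted to dissociation constants. This sketch assumes the conventional definition pKd = -log10(Kd in mol/L); I have not confirmed that PeptiVerse uses exactly this convention, and the function name is my own.

```python
# Predicted pKd values from the PeptiVerse table above.
PKD = {
    "0 (known)": 6.361,
    "1": 6.905,
    "2": 6.728,
    "3": 6.808,
    "4": 6.380,
}


def pkd_to_kd_nm(pkd):
    """Convert pKd to Kd in nanomolar, assuming pKd = -log10(Kd / 1 M)."""
    return 10 ** (9 - pkd)  # 10**(-pkd) mol/L, times 1e9 nM per mol/L


kd_nm = {binder: pkd_to_kd_nm(p) for binder, p in PKD.items()}

# Tightest predicted binder (smallest Kd) first.
for binder, kd in sorted(kd_nm.items(), key=lambda kv: kv[1]):
    print(f"Binder {binder}: pKd {PKD[binder]:.3f} -> Kd ~ {kd:.0f} nM")

fold = kd_nm["0 (known)"] / kd_nm["1"]
print(f"Binder 1 vs known binder: ~{fold:.1f}-fold lower predicted Kd")
```

Under that assumption, the 0.544-unit pKd gap between Binder 1 and the known binder corresponds to roughly a 3.5-fold lower predicted Kd, which is a modest difference for a predicted value.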

Part 4: Generate Optimized Peptides with moPPIt

For the last part, I was supposed to generate optimized peptides with the moPPIt Colab. I copied the Colab notebook and ran the first two cells, which clone the GitHub repo and install the requirements. In the cell with the generation setup, I chose ‘de-novo synthesis’ and pasted my mutated SOD1 sequence into the target protein sequence box, which opens up when motif/affinity guidance is selected. I chose residues 1-10, set the peptide length to 12, enabled motif and affinity guidance, and started the generation.

My parameters.

This was as far as the code ran. The Colab kept raising the same error. I tried a new runtime and various workarounds suggested by Gemini, but none of them worked. I also looked for another platform offering moPPIt, but could not find one. In the end, I looked at the peptides other people had generated. The moPPIt-generated peptides appear to be more closely optimized for the specific objectives we set (solubility, hemolysis, etc.), which I think comes from the objective weights.

As for evaluation before clinical studies, I would run more in-silico binding simulations, then screen the successful candidates in in-vitro binding assays such as ELISA, then test for actual hemolytic and cytotoxic activity, and then possibly move on to animal studies.

Part C: L-Protein Mutants

Option 1: Mutagenesis

I used the Colab notebook provided in the PDF. I pasted in the L-protein sequence and ran the notebook to get the per-substitution LLR (log-likelihood ratio) scores, then extracted the top-ranked mutations with positive scores. The 50 highest-scoring substitutions are listed here.

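| Position | Wild_Type_AA | Mutation_AA | LLR_Score |
|---|---|---|---|
| 50 | K | L | 2.561468 |
| 29 | C | R | 2.395427 |
| 39 | Y | L | 2.24178 |
| 29 | C | S | 2.04315 |
| 9 | S | Q | 2.014325 |
| 29 | C | Q | 1.997049 |
| 29 | C | P | 1.971029 |
| 29 | C | L | 1.960646 |
| 50 | K | I | 1.928801 |
| 53 | N | L | 1.864932 |
| 61 | E | L | 1.818098 |
| 52 | T | L | 1.813968 |
| 50 | K | F | 1.802069 |
| 29 | C | T | 1.797247 |
| 29 | C | K | 1.795878 |
| 5 | F | Q | 1.795244 |
| 5 | F | R | 1.659717 |
| 29 | C | A | 1.648656 |
| 27 | Y | R | 1.628061 |
| 22 | F | R | 1.602028 |
| 5 | F | P | 1.596891 |
| 50 | K | V | 1.594576 |
| 50 | K | S | 1.574557 |
| 5 | F | T | 1.559024 |
| 5 | F | S | 1.556417 |
| 45 | A | L | 1.539248 |
| 39 | Y | S | 1.517457 |
| 27 | Y | S | 1.497053 |
| 40 | V | L | 1.47763 |
| 27 | Y | L | 1.474637 |
| 22 | F | S | 1.423358 |
| 29 | C | E | 1.383281 |
| 39 | Y | A | 1.364999 |
| 29 | C | N | 1.362601 |
| 50 | K | A | 1.357795 |
| 29 | C | I | 1.344121 |
| 5 | F | L | 1.332615 |
| 17 | N | R | 1.323651 |
| 39 | Y | I | 1.320103 |
| 39 | Y | T | 1.302804 |
| 26 | D | R | 1.268762 |
| 29 | C | H | 1.246107 |
| 39 | Y | F | 1.245851 |
| 39 | Y | V | 1.24439 |
| 23 | K | R | 1.236555 |
| 25 | E | R | 1.22935 |
| 24 | H | R | 1.227779 |
| 50 | K | T | 1.222131 |
| 27 | Y | Q | 1.218851 |
| 27 | Y | T | 1.215567 |

Cross-checking these predictions against the experimental data provided with the homework, I found only two overlapping mutations. In both cases the experiment recorded Lysis=0 despite the high LLR score:
  1. Position 29: C→R, Experimental Lysis=0, LLR Score=2.3954
  2. Position 50: K→I, Experimental Lysis=0, LLR Score=1.9288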
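For reference, an LLR score compares the probability a protein language model assigns to the mutant residue with the probability it assigns to the wild-type residue at the same position. This is a minimal sketch of that calculation, assuming natural logarithms. The probabilities are made up purely for illustration and are not output from the notebook.

```python
import math


def llr(position_probs, wt_aa, mut_aa):
    """Log-likelihood ratio of mutant vs wild-type residue at one position.

    `position_probs` maps amino-acid letters to model probabilities at that
    position. A positive value means the model prefers the mutant residue.
    """
    return math.log(position_probs[mut_aa]) - math.log(position_probs[wt_aa])


# Hypothetical probabilities at a single position (illustrative only).
example_probs = {"K": 0.05, "L": 0.20, "D": 0.01}

print(f"K->L: {llr(example_probs, 'K', 'L'):+.3f}")  # model prefers L over K
print(f"K->D: {llr(example_probs, 'K', 'D'):+.3f}")  # model disfavors D
```

A high LLR only says the substitution looks natural to the model. It does not measure lysis activity, which is consistent with the two overlapping mutations scoring well yet showing Lysis=0.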

Next, categorizing the experimental mutations by effect and region, I found that:

  • 19 mutations improved lysis (Lysis=1)
  • 63 mutations impaired lysis (Lysis=0)

Regional Distribution

| Region | Positions | Beneficial Mutations |
|---|---|---|
| Transmembrane | 35–59 | 3 |
| Soluble | 1–34, 60–75 | 16 |

84% (16 of 19) of the beneficial mutations occur in the soluble regions.
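The regional split can be checked with a few lines of Python. This sketch uses only the region boundaries and counts from the table above. The helper names are my own, and the per-position rate is an extra normalization I added because the two regions differ in length.

```python
# Region boundaries (1-based, inclusive) from the table above.
REGIONS = {
    "Transmembrane": [(35, 59)],
    "Soluble": [(1, 34), (60, 75)],
}

# Beneficial (Lysis=1) mutation counts per region from the table above.
BENEFICIAL = {"Transmembrane": 3, "Soluble": 16}


def region_of(position):
    """Return the region name that contains a 1-based residue position."""
    for name, spans in REGIONS.items():
        if any(start <= position <= end for start, end in spans):
            return name
    raise ValueError(f"position {position} is outside the protein")


def region_length(name):
    """Number of residue positions covered by a region."""
    return sum(end - start + 1 for start, end in REGIONS[name])


total = sum(BENEFICIAL.values())
soluble_pct = 100 * BENEFICIAL["Soluble"] / total
print(f"Soluble share of beneficial mutations: {soluble_pct:.1f}%")  # 84.2%

# Beneficial mutations per position, to account for region size.
rate = {name: BENEFICIAL[name] / region_length(name) for name in REGIONS}
for name, value in rate.items():
    print(f"{name}: {value:.2f} beneficial mutations per position")
```

The soluble regions cover 50 of the 75 positions, so some excess there is expected by length alone. Per position, the rate is 0.32 in the soluble regions versus 0.12 in the transmembrane region, so the enrichment remains after normalizing.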

Top Beneficial Experimental Mutations

Soluble Region

| Position | Mutation | Effect |
|---|---|---|
| 13 | P→L | Improved lysis |
| 15 | S→A | Improved lysis |
| 18 | R→G | Improved lysis |
| 18 | R→I | Improved lysis |
| 19 | R→S | Improved lysis |

Transmembrane Region

| Position | Mutation | Effect |
|---|---|---|
| 44 | L→P | Improved lysis |
| 45 | A→P | Improved lysis |
| 46 | I→F | Improved lysis |

Top Language Model Predictions

| Position | Mutation | LLR Score | Region |
|---|---|---|---|
| 50 | K→L | 2.5615 | Transmembrane |
| 29 | C→R | 2.3954 | Soluble |
| 39 | Y→L | 2.2418 | Transmembrane |
| 29 | C→S | 2.0431 | Soluble |
| 9 | S→Q | 2.0143 | Soluble |

Selection of 5 Candidate Mutations

The selection criteria were:

  • Prioritized high LLR scores
  • Considered experimental evidence
  • Maintained regional balance (TM + soluble)
  • Avoided redundant positions
  • Focused on mutations supported by both datasets

Here’s the list of selected candidate mutations:

| Mutation | Region | LLR Score | Key Rationale |
|---|---|---|---|
| K50L | Transmembrane | 2.5615 | Highest-scoring mutation; increases TM hydrophobicity and may improve membrane insertion |
| Y39L | Transmembrane | 2.2418 | Enhances hydrophobic packing within TM helix |
| C29R | Soluble | 2.3954 | High-scoring mutation that may improve oligomerization, although the experimental data record Lysis=0 for it |
| S9Q | Soluble | 2.0143 | Likely enhances hydrogen bonding and N-terminal stability |
| A45P | Transmembrane | Not in the LLR list above (1.5392 is the score for A45L) | Experimentally validated (Lysis=1); may improve pore geometry via helix kink formation |