Protein Design Part II - L-Protein Mutants

BRD4 Drug Discovery Platform • L-Protein Mutants

Part A: SOD1 Binder Peptide Design

Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence and add the known SOD1-binding peptide FLYRWLPSRRGG for comparison. Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Peptide Sequence        Perplexity Score
WRSYVAALAHWK            12.24
WHYPAVAAAWKE            9.54
WRYGVAAAEHKK            12.30 (Best)
WRYYAVAAELWK            16.22
FLYRWLPSRRGG (Known)    20.63

The perplexity score for the known binder was calculated by reusing the function compute_pseudo_perplexity() with the following call:

compute_pseudo_perplexity(model, tokenizer, protein_seq, "FLYRWLPSRRGG")
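
The internals of compute_pseudo_perplexity() are not shown here, so the following is only a minimal sketch of the usual definition of pseudo-perplexity: the exponential of the negative mean log-probability that the model assigns to each true residue when that residue is masked. The function name pseudo_perplexity and the example log-probabilities are hypothetical and are not taken from PepMLM.

```python
import math


def pseudo_perplexity(token_log_probs):
    """Return exp(-mean(log p)) over the masked positions of one peptide.

    token_log_probs: natural-log probabilities of the true residue at each
    position, each obtained with that position masked (assumed to be given).
    """
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    mean_negative_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_negative_log_likelihood)


# Hypothetical values for a 4-residue peptide (illustration only).
example_log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(round(pseudo_perplexity(example_log_probs), 3))
```

Under this definition a lower value means the model finds the peptide more plausible given the target sequence.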

Evaluate Binders with AlphaFold3

WRSYVAALAHWK ipTM: 0.32
WRSYVAALAHWK Structure WRSYVAALAHWK Binding
This peptide appears to be surface-bound, but the binding is predicted with very low confidence, as shown by both the low ipTM score and the expected position error.
WHYPAVAAAWKE ipTM: 0.33
WHYPAVAAAWKE Structure WHYPAVAAAWKE Binding
This peptide forms an α-helix that is not buried in the protein and only binds at the surface. The low scores suggest very poor positional confidence. The peptide sits on the side of the protein opposite the N-terminal region that carries the A4V mutation.
WRYGVAAAEHKK ipTM: 0.37
WRYGVAAAEHKK Structure WRYGVAAAEHKK Binding
This peptide is also surface-bound, with very low positional confidence. It wraps around the β-barrel in the same plane as the N-terminus, but in the upper part of the protein.
WRYYAVAAELWK ipTM: 0.21
WRYYAVAAELWK Structure WRYYAVAAELWK Binding
This peptide has the lowest ipTM score of all the peptides. It does not insert into the protein and is only surface-bound, with very little positional confidence. It wraps around the top part of the β-barrel.
FLYRWLPSRRGG [KNOWN BINDER] ipTM: 0.31
FLYRWLPSRRGG Structure FLYRWLPSRRGG Binding
This peptide sits on the top part of the protein, opposite the N-terminus, and wraps around the β-barrel. The prediction confidence is low.

All ipTM scores ranged from 0.21 to 0.37, indicating low overall confidence in the protein-peptide interfaces across all models. The peptide WRYYAVAAELWK has the lowest ipTM score (0.21), while WRSYVAALAHWK (0.32), WHYPAVAAAWKE (0.33), and WRYGVAAAEHKK (0.37) all exceed the score of the known binder (0.31). WRYGVAAAEHKK stands out as the interface prediction with the highest confidence.
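
The comparison against the known binder can be reproduced with a short script. This is a minimal sketch that only re-uses the ipTM values reported above; the variable names are illustrative.

```python
# ipTM scores reported above for each AlphaFold3 model.
iptm = {
    "WRSYVAALAHWK": 0.32,
    "WHYPAVAAAWKE": 0.33,
    "WRYGVAAAEHKK": 0.37,
    "WRYYAVAAELWK": 0.21,
    "FLYRWLPSRRGG": 0.31,  # known binder
}
KNOWN_BINDER = "FLYRWLPSRRGG"

# Sort from highest to lowest interface confidence.
ranked = sorted(iptm.items(), key=lambda item: item[1], reverse=True)

# Generated peptides whose ipTM exceeds that of the known binder.
above_known = [
    peptide
    for peptide, score in ranked
    if peptide != KNOWN_BINDER and score > iptm[KNOWN_BINDER]
]

for peptide, score in ranked:
    label = " (known binder)" if peptide == KNOWN_BINDER else ""
    print(f"{peptide}  ipTM={score:.2f}{label}")
print("Above known binder:", ", ".join(above_known))
```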

Evaluate Properties in the PeptiVerse

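Peptide Focus                      Property           Prediction      Value   Unit
WRYYAVAAELWK                       Solubility         Soluble         1.000   Probability
WRYYAVAAELWK                       Hemolysis          Non-hemolytic   0.101   Probability
WRYYAVAAELWK                       Binding Affinity   Medium binding  7.116   pKd/pKi
WRYYAVAAELWK                       Length             -               12      aa
WRYYAVAAELWK                       Molecular Weight   -               1555.8  Da
WRYYAVAAELWK                       Net Charge (pH 7)  -               0.76    -
WRYYAVAAELWK                       Isoelectric Point  -               8.50    pH
WRYYAVAAELWK                       Hydrophobicity     -               -0.24   GRAVY
WHYPAVAAAWKE (Selected Candidate)  Solubility         Soluble         1.000   Probability
WHYPAVAAAWKE (Selected Candidate)  Hemolysis          Non-hemolytic   0.025   Probability
WHYPAVAAAWKE (Selected Candidate)  Binding Affinity   Weak binding    5.140   pKd/pKi
WHYPAVAAAWKE (Selected Candidate)  Length             -               12      aa
WHYPAVAAAWKE (Selected Candidate)  Molecular Weight   -               1428.6  Da
WHYPAVAAAWKE (Selected Candidate)  Net Charge (pH 7)  -               -0.15   -
WHYPAVAAAWKE (Selected Candidate)  Isoelectric Point  -               6.76    pH
WHYPAVAAAWKE (Selected Candidate)  Hydrophobicity     -               -0.32   GRAVY
WRYGVAAAEHKK                       Solubility         Soluble         1.000   Probability
WRYGVAAAEHKK                       Hemolysis          Non-hemolytic   0.017   Probability
WRYGVAAAEHKK                       Binding Affinity   Weak binding    5.487   pKd/pKi
WRYGVAAAEHKK                       Length             -               12      aa
WRYGVAAAEHKK                       Molecular Weight   -               1415.6  Da
WRYGVAAAEHKK                       Net Charge (pH 7)  -               1.85    -
WRYGVAAAEHKK                       Isoelectric Point  -               9.70    pH
WRYGVAAAEHKK                       Hydrophobicity     -               -1.00   GRAVY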

There appears to be no direct correlation between structural confidence and predicted binding affinity: the peptide with the highest ipTM score (WRYGVAAAEHKK, 0.37) shows only weak predicted affinity (5.487 pKd/pKi), while the one with the strongest predicted affinity (WRYYAVAAELWK, 7.116 pKd/pKi) has the lowest ipTM score (0.21). All three peptides show favorable predicted solubility and toxicity profiles, as they are all predicted to be soluble and non-hemolytic.
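
To put a number on this observation, a rank correlation between ipTM and predicted affinity can be computed for the three peptides evaluated in PeptiVerse. The sketch below implements Spearman's coefficient by hand, assuming no tied values. With only three data points the result is purely descriptive and carries no statistical weight.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for equally long sequences without ties."""
    n = len(x)
    squared_rank_diffs = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * squared_rank_diffs / (n * (n ** 2 - 1))


# Order: WRYYAVAAELWK, WHYPAVAAAWKE, WRYGVAAAEHKK (values reported above).
iptm_scores = [0.21, 0.33, 0.37]
affinities = [7.116, 5.140, 5.487]  # predicted pKd/pKi

rho = spearman(iptm_scores, affinities)
print(f"Spearman rho = {rho:.2f}")
```

A negative coefficient here is consistent with the peptide of strongest predicted affinity having the lowest ipTM score.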

I would choose a peptide to move forward by balancing structural confidence, binding affinity, and therapeutic properties. In this context, WHYPAVAAAWKE emerges as a good candidate: it has one of the higher ipTM scores in the set (0.33, above the known binder), a modest predicted binding affinity (5.140 pKd/pKi), a low hemolysis probability (0.025), and a predicted solubility probability of 1.000.

Generate Optimized Peptides with moPPIt

Peptide       Hemolysis  Solubility  Affinity  Motif
RCQRKEFTNLAA  0.94       0.67        5.96      0.83
GGTQCEVKKISW  0.96       0.75        6.69      0.79
ETYAPEYTDINA  0.94       0.67        5.78      0.81

The peptides generated with moPPIt are optimized for multiple therapeutic objectives, such as low hemolysis, high solubility, strong binding affinity, and motif presence. The moPPIt peptides differ from the PepMLM-generated ones in the following ways:

  • They exhibit very high hemolysis values (0.94–0.96), making them potentially toxic, whereas all PepMLM peptides are predicted to be non-hemolytic (≤0.101).
  • Their solubility values are lower (0.67–0.75 vs. 1.000), and their predicted binding affinities (5.78–6.69 pKd/pKi) fall within the range of the PepMLM peptides (5.14–7.12 pKd/pKi).
  • moPPIt peptides were optimized for motif presence, reaching motif scores above 0.80 in two cases (0.83 and 0.81).

Next Steps: To evaluate these peptides before clinical studies, I would first perform in vitro hemolysis and cytotoxicity assays on relevant cell lines, followed by binding affinity measurements and structural validation to confirm specific target engagement and rule out undesired effects. Stability and solubility tests would also be essential. Finally, I would consider the delivery method to assess how effectively the peptide reaches the target [1].

BRD4 Drug Discovery Platform Tutorial

Not completed: credits for the Boltz platform have not been received.

Final Project: L-Protein Mutants

Comparison between experimental results and language model predictions.

Observation: There is very little correlation between the two. The language model often assigns high scores to mutations at positions where the experimentally tested mutants showed no lysis (Lysis=0), indicating that it captures function poorly.

Legend:
  • Site and functional correlation (Match & Lysis=1)
  • Site correlation (Match & Lysis=0)
  • No correlation
Table 1 – Experimental Results on Mutagenesis
Position  Mutation  Lysis  Protein Levels
1         M->I      0      0
1         M->I      0      0
1         M->T      0      0
2         E->Stop   0      N.D.
3         T->I      0      0
3         T->S      0      0
6         P->L      0      0
8         Q->Stop   0      N.D.
8         Q->L      0      0
8         Q->L      0      0
10        Q->Stop   0      N.D.
11        Q->Stop   0      N.D.
13        P->L      1      1
13        P->L      1      1
15        S->A      1      1
18        R->G      1      1
18        R->I      1      1
18        R->Stop   0      N.D.
19        R->S      1      0
19        R->H      1      0
20        R->W      1      0
20        R->W      1      0
20        R->L      1      0
23        K->E      1      0
23        K->E      1      0
23        K->Stop   0      N.D.
23        K->E      1      0
23        K->Stop   0      N.D.
25        E->V      1      0
25        E->G      1      0
25        E->D      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
25        E->G      1      0
26        D->G      1      0
27        Y->Stop   0      N.D.
29        C->Stop   0      N.D.
29        C->R      0      0
29        C->Stop   0      N.D.
30        R->Q      1      1
30        R->L      1      1
30        R->Stop   0      N.D.
30        R->Stop   0      N.D.
31        R->Stop   0      N.D.
31        R->Stop   0      N.D.
31        R->I      1      1
32        Q->Stop   0      N.D.
33        Q->H      0      1
33        Q->H      0      1
33        Q->Stop   0      N.D.
33        Q->Stop   0      N.D.
34        R->Stop   0      N.D.
34        R->Stop   0      N.D.
36        S->Stop   0      N.D.
36        S->Stop   0      N.D.
39        Y->H      0      0
39        Y->Stop   0      N.D.
39        Y->Stop   0      N.D.
40        V->E      0      0
41        L->Stop   0      N.D.
41        L->Stop   0      N.D.
41        L->Stop   0      N.D.
42        I->N      0      0
43        F->L      0      1
44        L->V      0      1
44        L->P      1      1
44        L->P      1      1
45        A->P      1      1
46        I->N      0      0
46        I->F      1      1
46        I->N      0      0
47        F->Y      0      1
47        F->Y      0      1
47        F->Y      0      1
48        L->P      0      1
49        S->L      0      1
49        S->Stop   0      N.D.
49        S->L      0      1
49        S->T      0      1
49        S->Stop   0      N.D.
49        S->T      0      1
50        K->E      0      1
50        K->N      0      1
50        K->N      0      1
50        K->Stop   0      N.D.
50        K->I      0      1
50        K->N      0      1
50        K->Q      0      0
50        K->I      0      1
50        K->E      0      1
50        K->N      0      1
50        K->Stop   0      N.D.
50        K->N      0      1
50        K->Stop   0      N.D.
51        F->S      0      1
51        F->S      0      1
52        T->N      0      0
53        N->S      0      1
53        N->S      0      1
53        N->D      0      1
53        N->H      0      1
53        N->S      0      1
53        N->I      0      0
53        N->Q      0      0
53        N->K      0      0
53        N->Q      0      0
54        Q->Stop   0      0
55        L->Stop   0      N.D.
55        L->Stop   0      N.D.
56        L->H      0      1
56        L->H      0      1
56        L->H      0      1
56        L->P      0      0
56        L->P      0      0
56        L->H      0      1
57        L->P      0      0
60        L->P      0      0
60        L->V      0      0
60        L->Q      0      0
60        L->P      0      0
60        L->Q      0      0
63        V->E      0      1
63        V->E      0      1
66        T->K      0      1
66        T->R      0      0
69        T->S      0      0
71        Q->Stop   0      N.D.
72        Q->Stop   0      N.D.
73        L->Stop   0      N.D.
73        L->Stop   0      N.D.
73        L->Stop   0      N.D.
Table 2 – Predicted scores via Language Models
Position  Initial AA  Mut AA  LLR Score
50        K           L       2.56
29        C           R       2.40
39        Y           L       2.24
29        C           S       2.04
9         S           Q       2.01
29        C           Q       2.00
29        C           P       1.97
29        C           L       1.96
50        K           I       1.93
53        N           L       1.86
61        E           L       1.82
52        T           L       1.81
50        K           F       1.80
29        C           T       1.80
29        C           K       1.80
5         F           Q       1.80
5         F           R       1.66
29        C           A       1.65
27        Y           R       1.63
22        F           R       1.60
5         F           P       1.60
50        K           V       1.59
50        K           S       1.57
5         F           T       1.56
5         F           S       1.56
45        A           L       1.54
39        Y           S       1.52
27        Y           S       1.50
40        V           L       1.48
27        Y           L       1.47
22        F           S       1.42
29        C           E       1.38
39        Y           A       1.36
29        C           N       1.36
50        K           A       1.36
29        C           I       1.34
5         F           L       1.33
17        N           R       1.32
39        Y           I       1.32
39        Y           T       1.30
26        D           R       1.27
29        C           H       1.25
39        Y           F       1.25
39        Y           V       1.24
23        K           R       1.24
25        E           R       1.23
24        H           R       1.23
50        K           T       1.22
27        Y           Q       1.22
27        Y           T       1.22
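
The comparison between the two tables can be sketched in code. The example below uses only a handful of rows copied from Table 1 and Table 2 and sorts each predicted mutation into one of the three legend categories. It assumes that "Match" means the predicted position was also tested experimentally, and that a position counts as functional if any tested mutant there gave Lysis=1. This matching rule is an interpretation, not something the tables state.

```python
from collections import defaultdict

# Subset of Table 1: (position, wild type, mutant, lysis).
experimental = [
    (25, "E", "G", 1),
    (29, "C", "R", 0),
    (45, "A", "P", 1),
    (50, "K", "I", 0),
    (50, "K", "N", 0),
]

# Subset of Table 2: (position, wild type, mutant, LLR score).
predicted = [
    (50, "K", "L", 2.56),
    (29, "C", "R", 2.40),
    (9, "S", "Q", 2.01),
    (45, "A", "L", 1.54),
    (25, "E", "R", 1.23),
]

# Collect all lysis outcomes observed at each tested position.
lysis_by_position = defaultdict(list)
for position, _wild_type, _mutant, lysis in experimental:
    lysis_by_position[position].append(lysis)


def classify(position):
    """Assign a legend category to a predicted position (assumed rule)."""
    outcomes = lysis_by_position.get(position)
    if not outcomes:
        return "No correlation"
    if any(outcomes):
        return "Site and functional correlation"
    return "Site correlation"


for position, wild_type, mutant, llr in predicted:
    print(f"{wild_type}{position}{mutant}  LLR={llr:.2f}  {classify(position)}")
```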

Original Protein

Original

Original + DnaJ

Predicted Mutant Structures

Structures predicted via MF2 Multimer for selected L-protein mutants. Each mutation targets either the transmembrane region (AA 41–75) or the soluble region (AA 1–40).
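
As a cross-check, mutant sequences can be generated programmatically rather than edited by hand. This is a minimal sketch in which WILD_TYPE is reconstructed by reverting the substitutions in the mutant sequences listed in this section, and the helper names apply_point_mutation and region_of are hypothetical. The region boundaries follow the ranges given above.

```python
import re

# Wild-type L-protein (75 aa), reconstructed from the listed mutants.
WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"  # soluble region, AA 1-40
    "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"       # transmembrane region, AA 41-75
)


def apply_point_mutation(sequence, mutation):
    """Apply a substitution such as 'L44P' (1-based position) to a sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + mutant + sequence[position:]


def region_of(mutation):
    """Return the region hit by a mutation, using the AA ranges given above."""
    position = int(re.search(r"\d+", mutation).group())
    return "transmembrane" if position >= 41 else "soluble"


for name in ["L44P", "A45P", "V63E", "C29R"]:
    print(name, region_of(name), apply_point_mutation(WILD_TYPE, name))
```

The wild-type check raises an error when a mutation label and the sequence disagree, which catches off-by-one mistakes before a structure prediction is run.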

Mutations in Transmembrane Region (AA 41–75)

L44P: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFPAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

L44P

L44P + DnaJ

A45P: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLPIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

A45P

A45P + DnaJ

V63E: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAEIRTVTTLQQLLT

V63E

V63E + DnaJ

Mutations in Soluble Region (AA 1–40)

R30Q: METRFPQQSQQTPASTNRRRPFKHEDYPCQRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

R30Q

R30Q + DnaJ

C29R: METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

C29R

C29R + DnaJ

References

  [1] WuXi AppTec, "A Strategic Roadmap for Peptide Preclinical Studies: 3 Key Stages," labtesting.wuxiapptec.com, Oct. 30, 2025. [Online]. Available: https://labtesting.wuxiapptec.com/2025/10/30/a-strategic-roadmap-for-peptide-preclinical-studies-3-key-stages/. [Accessed: Mar. 16, 2026].