Protein Design Part II

BRD4 Drug Discovery Platform • L-Protein Mutants

Part A: SOD1 Binder Peptide Design

Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
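The A4V label follows the numbering of the mature protein, which omits the initiator methionine, so the substituted alanine is the fifth residue of the UniProt entry. A minimal sketch of that offset, using only the first 16 residues of P00441; the helper name `substitute` and its `met_offset` parameter are illustrative and not part of any tool used here:

```python
# Wild-type N-terminus of human SOD1 (UniProt P00441), initiator Met included.
WT_PREFIX = "MATKAVCVLKGDGPVQ"


def substitute(seq, wt, pos, mut, met_offset=1):
    """Apply a point substitution numbered on the mature protein.

    met_offset=1 shifts the position past the initiator Met
    (hypothetical helper, for illustration only).
    """
    idx = pos - 1 + met_offset  # 0-based index into the Met-containing sequence
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


mutant_prefix = substitute(WT_PREFIX, "A", 4, "V")
print(mutant_prefix)  # MATKVVCVLKGDGPVQ
```

The residue check guards against applying the label with the wrong numbering convention: with `met_offset=0` the helper would find K instead of A and raise an error.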

Generate four 12-residue peptides conditioned on the mutant SOD1 sequence, and add the known SOD1-binding peptide FLYRWLPSRRGG for comparison. Record the perplexity scores, which indicate PepMLM’s confidence in each binder (lower perplexity means higher confidence).

Peptide Sequence	Perplexity Score
WRSYVAALAHWK	12.24
WHYPAVAAAWKE	9.54
WRYGVAAAEHKK (highest ipTM)	12.30
WRYYAVAAELWK	16.22
FLYRWLPSRRGG (Known)	20.63

The perplexity score for the known binder was calculated by reusing the compute_pseudo_perplexity() function with the following snippet:

compute_pseudo_perplexity(model, tokenizer, protein_seq, "FLYRWLPSRRGG")
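The body of compute_pseudo_perplexity() is not reproduced here. Assuming it follows the usual masked-language-model definition (mask each peptide position in turn, take the log-probability of the true residue, then exponentiate the negative mean), the final arithmetic can be sketched as follows. The function name `pseudo_perplexity` and the toy inputs are illustrative only:

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp of the negative mean log-probability of the true residues."""
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Toy input: if every residue were assigned probability 0.5, perplexity is 2.
toy = [math.log(0.5)] * 12
print(round(pseudo_perplexity(toy), 2))  # 2.0

# Reading a score back: a perplexity P corresponds to a geometric-mean
# per-residue probability of 1 / P.
for name, ppl in [("WHYPAVAAAWKE", 9.54), ("FLYRWLPSRRGG", 20.63)]:
    print(name, round(1 / ppl, 3))
```

Under this reading, the known binder's higher perplexity means the model assigns its residues a lower average probability than the generated peptides, which is expected because the generated peptides were sampled from the model itself.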

Evaluate Binders with AlphaFold3

WRSYVAALAHWK ipTM: 0.32

This peptide appears to be surface-bound, but the binding is predicted with very low confidence, as shown by both the low ipTM score and the expected position error.

WHYPAVAAAWKE ipTM: 0.33

This peptide forms an α-helix that is not buried in the protein and remains surface-bound. The low scores suggest very poor positional confidence. The peptide sits on the side opposite the N-terminus, where the A4V mutation is located.

WRYGVAAAEHKK ipTM: 0.37

This peptide is also surface-bound, with very low positional confidence. It wraps around the β-barrel in the same plane as the N-terminus, but along the upper part of the protein.

WRYYAVAAELWK ipTM: 0.21

This peptide has the lowest ipTM score of all the peptides. It makes no close contact with the protein and is only loosely surface-bound, with very little positional confidence. It wraps around the top part of the β-barrel.

FLYRWLPSRRGG [KNOWN BINDER] ipTM: 0.31

This peptide sits on the top part of the protein, opposite the N-terminus, and wraps around the β-barrel. Its confidence is also low.

All ipTM scores ranged from 0.21 to 0.37, indicating low overall confidence in the protein-peptide interfaces across all models. WRYYAVAAELWK has the lowest ipTM score, while WRSYVAALAHWK, WHYPAVAAAWKE, and WRYGVAAAEHKK all exceed the score of the known binder (0.31). WRYGVAAAEHKK is the most confident interface prediction (0.37).
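The comparison against the known binder can be reproduced from the ipTM values reported above. This is only a small bookkeeping sketch; the variable names are illustrative:

```python
# ipTM scores reported above for each peptide in complex with mutant SOD1.
iptm = {
    "WRSYVAALAHWK": 0.32,
    "WHYPAVAAAWKE": 0.33,
    "WRYGVAAAEHKK": 0.37,
    "WRYYAVAAELWK": 0.21,
    "FLYRWLPSRRGG": 0.31,  # known binder
}
KNOWN = "FLYRWLPSRRGG"

# Sort from most to least confident interface prediction.
ranked = sorted(iptm.items(), key=lambda item: item[1], reverse=True)

# Generated peptides whose ipTM exceeds that of the known binder.
above_known = [pep for pep, score in ranked if pep != KNOWN and score > iptm[KNOWN]]

for pep, score in ranked:
    print(f"{pep}\t{score:.2f}")
print("Above known binder:", above_known)
```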

Evaluate Properties in the PeptiVerse

Peptide Focus	Property	Prediction	Value	Unit
WRYYAVAAELWK	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.101	Probability
	Binding Affinity	Medium binding	7.116	pKd/pKi
	Length	-	12	aa
	Molecular Weight	-	1555.8	Da
	Net Charge (pH 7)	-	0.76	-
	Isoelectric Point	-	8.50	pH
	Hydrophobicity	-	-0.24	GRAVY
WHYPAVAAAWKE (Selected Candidate)	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.025	Probability
	Binding Affinity	Weak binding	5.140	pKd/pKi
	Length	-	12	aa
	Molecular Weight	-	1428.6	Da
	Net Charge (pH 7)	-	-0.15	-
	Isoelectric Point	-	6.76	pH
	Hydrophobicity	-	-0.32	GRAVY
WRYGVAAAEHKK	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.017	Probability
	Binding Affinity	Weak binding	5.487	pKd/pKi
	Length	-	12	aa
	Molecular Weight	-	1415.6	Da
	Net Charge (pH 7)	-	1.85	-
	Isoelectric Point	-	9.70	pH
	Hydrophobicity	-	-1.00	GRAVY
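The Hydrophobicity rows are GRAVY values, that is, the mean Kyte-Doolittle hydropathy over all residues. How PeptiVerse computes them internally is not shown here, so the following is a sketch under the assumption that it uses the standard Kyte-Doolittle scale; the function name `gravy` is illustrative:

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


for peptide in ("WRYYAVAAELWK", "WHYPAVAAAWKE", "WRYGVAAAEHKK"):
    print(peptide, f"{gravy(peptide):.3f}")
```

Under this assumption the computed values agree with the table to within rounding. All three peptides are negative, meaning net hydrophilic, which is consistent with the solubility predictions.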

There appears to be no positive correlation between structural confidence and predicted binding affinity: the peptide with the highest ipTM score (WRYGVAAAEHKK, 0.37) is predicted to bind only weakly (5.487 pKd/pKi), while the one with the strongest predicted affinity (WRYYAVAAELWK, 7.116 pKd/pKi) has the lowest ipTM (0.21). All peptides show potentially good toxicity profiles, as they are all predicted to be soluble and non-hemolytic.
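The relationship between ipTM and predicted affinity can be quantified with a Spearman rank correlation over the three peptides evaluated in PeptiVerse. With only three points the coefficient is descriptive at best, so this is a sketch of the calculation rather than a statistical test. It assumes no tied values, and the helper names are illustrative:

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Order: WRYYAVAAELWK, WHYPAVAAAWKE, WRYGVAAAEHKK
iptm = [0.21, 0.33, 0.37]
affinity = [7.116, 5.140, 5.487]  # predicted pKd/pKi

print(spearman(iptm, affinity))  # -0.5
```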

I would choose a peptide to move forward by balancing structural confidence, binding affinity, and therapeutic properties. In this context, WHYPAVAAAWKE emerges as a good candidate: it has the second-highest ipTM among the PepMLM peptides (0.33), a modest predicted binding affinity (5.14 pKd/pKi), a low hemolysis probability (0.025), and a predicted solubility probability of 1.000.

Generate Optimized Peptides with moPPIt

Peptide	Hemolysis	Solubility	Affinity	Motif
RCQRKEFTNLAA	0.94	0.67	5.96	0.83
GGTQCEVKKISW	0.96	0.75	6.69	0.79
ETYAPEYTDINA	0.94	0.67	5.78	0.81

The peptides generated with moPPIt are optimized for multiple therapeutic objectives, such as low hemolysis, high solubility, strong binding affinity, and motif presence. The moPPIt peptides differ from the PepMLM-generated ones in the following ways:

Read as hemolysis probabilities, their hemolysis scores are very high (0.94–0.96), making them potentially toxic, whereas all PepMLM peptides are predicted to be non-hemolytic (≤0.101).
Their solubility is lower (0.67–0.75 vs. 1.000). Their predicted binding affinities (5.78–6.69 pKd/pKi) fall within the range of the PepMLM peptides (5.14–7.12 pKd/pKi); two are similar to each other (5.78 and 5.96) and GGTQCEVKKISW is somewhat stronger (6.69).
moPPIt peptides were also optimized for motif presence, reaching motif scores above 0.80 in two of the three cases (0.79–0.83 overall).

Next Steps: To evaluate these peptides before clinical studies, I would first perform in vitro hemolysis and cytotoxicity assays on relevant cell lines, followed by binding affinity measurements and structural validation to confirm specific target engagement and rule out undesired effects. Stability and solubility tests would also be essential. Finally, I would consider the delivery method to assess how effectively the peptide reaches the target [1].

BRD4 Drug Discovery Platform Tutorial

Not completed: credits for the Boltz platform have not been received.

Final Project: L-Protein Mutants

Comparison between experimental results and language model predictions.

Observation: There is very little correlation. The language model often favors mutations that gave Lysis=0 experimentally, which suggests that it captures function poorly.

Legend:
Site and functional correlation (Match & Lysis=1)
Site correlation (Match & Lysis=0)
No correlation

Table 1 – Experimental Results on Mutagenesis

Position	Mutation	Lysis	Protein Levels
1	M->I	0	0
1	M->I	0	0
1	M->T	0	0
2	E->Stop	0	N.D.
3	T->I	0	0
3	T->S	0	0
6	P->L	0	0
8	Q->Stop	0	N.D.
8	Q->L	0	0
8	Q->L	0	0
10	Q->Stop	0	N.D.
11	Q->Stop	0	N.D.
13	P->L	1	1
13	P->L	1	1
15	S->A	1	1
18	R->G	1	1
18	R->I	1	1
18	R->Stop	0	N.D.
19	R->S	1	0
19	R->H	1	0
20	R->W	1	0
20	R->W	1	0
20	R->L	1	0
23	K->E	1	0
23	K->E	1	0
23	K->Stop	0	N.D.
23	K->E	1	0
23	K->Stop	0	N.D.
25	E->V	1	0
25	E->G	1	0
25	E->D	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
25	E->G	1	0
26	D->G	1	0
27	Y->Stop	0	N.D.
29	C->Stop	0	N.D.
29	C->R	0	0
29	C->Stop	0	N.D.
30	R->Q	1	1
30	R->L	1	1
30	R->Stop	0	N.D.
30	R->Stop	0	N.D.
31	R->Stop	0	N.D.
31	R->Stop	0	N.D.
31	R->I	1	1
32	Q->Stop	0	N.D.
33	Q->H	0	1
33	Q->H	0	1
33	Q->Stop	0	N.D.
33	Q->Stop	0	N.D.
34	R->Stop	0	N.D.
34	R->Stop	0	N.D.
36	S->Stop	0	N.D.
36	S->Stop	0	N.D.
39	Y->H	0	0
39	Y->Stop	0	N.D.
39	Y->Stop	0	N.D.
40	V->E	0	0
41	L->Stop	0	N.D.
41	L->Stop	0	N.D.
41	L->Stop	0	N.D.
42	I->N	0	0
43	F->L	0	1
44	L->V	0	1
44	L->P	1	1
44	L->P	1	1
45	A->P	1	1
46	I->N	0	0
46	I->F	1	1
46	I->N	0	0
47	F->Y	0	1
47	F->Y	0	1
47	F->Y	0	1
48	L->P	0	1
49	S->L	0	1
49	S->Stop	0	N.D.
49	S->L	0	1
49	S->T	0	1
49	S->Stop	0	N.D.
49	S->T	0	1
50	K->E	0	1
50	K->N	0	1
50	K->N	0	1
50	K->Stop	0	N.D.
50	K->I	0	1
50	K->N	0	1
50	K->Q	0	0
50	K->I	0	1
50	K->E	0	1
50	K->N	0	1
50	K->Stop	0	N.D.
50	K->N	0	1
50	K->Stop	0	N.D.
51	F->S	0	1
51	F->S	0	1
52	T->N	0	0
53	N->S	0	1
53	N->S	0	1
53	N->D	0	1
53	N->H	0	1
53	N->S	0	1
53	N->I	0	0
53	N->Q	0	0
53	N->K	0	0
53	N->Q	0	0
54	Q->Stop	0	0
55	L->Stop	0	N.D.
55	L->Stop	0	N.D.
56	L->H	0	1
56	L->H	0	1
56	L->H	0	1
56	L->P	0	0
56	L->P	0	0
56	L->H	0	1
57	L->P	0	0
60	L->P	0	0
60	L->V	0	0
60	L->Q	0	0
60	L->P	0	0
60	L->Q	0	0
63	V->E	0	1
63	V->E	0	1
66	T->K	0	1
66	T->R	0	0
69	T->S	0	0
71	Q->Stop	0	N.D.
72	Q->Stop	0	N.D.
73	L->Stop	0	N.D.
73	L->Stop	0	N.D.
73	L->Stop	0	N.D.

Table 2 – Predicted scores via Language Models

Position	Initial AA	Mut AA	LLR Score
50	K	L	2.56
29	C	R	2.40
39	Y	L	2.24
29	C	S	2.04
9	S	Q	2.01
29	C	Q	2.00
29	C	P	1.97
29	C	L	1.96
50	K	I	1.93
53	N	L	1.86
61	E	L	1.82
52	T	L	1.81
50	K	F	1.80
29	C	T	1.80
29	C	K	1.80
5	F	Q	1.80
5	F	R	1.66
29	C	A	1.65
27	Y	R	1.63
22	F	R	1.60
5	F	P	1.60
50	K	V	1.59
50	K	S	1.57
5	F	T	1.56
5	F	S	1.56
45	A	L	1.54
39	Y	S	1.52
27	Y	S	1.50
40	V	L	1.48
27	Y	L	1.47
22	F	S	1.42
29	C	E	1.38
39	Y	A	1.36
29	C	N	1.36
50	K	A	1.36
29	C	I	1.34
5	F	L	1.33
17	N	R	1.32
39	Y	I	1.32
39	Y	T	1.30
26	D	R	1.27
29	C	H	1.25
39	Y	F	1.25
39	Y	V	1.24
23	K	R	1.24
25	E	R	1.23
24	H	R	1.23
50	K	T	1.22
27	Y	Q	1.22
27	Y	T	1.22
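The comparison between the two tables can be sketched as a lookup from predicted positions into the experimental results. The sketch uses a small subset of rows from Table 1 (missense only) and Table 2, and it assumes that "Match" in the legend means a match on position; that interpretation, the function name `classify`, and the choice of rows are mine rather than part of the original analysis:

```python
# Subset of Table 1, missense rows only: (position, mutant residue, lysis)
experimental = [
    (29, "R", 0), (39, "H", 0), (45, "P", 1),
    (50, "E", 0), (50, "N", 0), (50, "I", 0), (50, "Q", 0),
]

# Subset of Table 2: (position, wild-type residue, mutant residue, LLR score)
predicted = [
    (50, "K", "L", 2.56), (29, "C", "R", 2.40), (39, "Y", "L", 2.24),
    (9, "S", "Q", 2.01), (50, "K", "I", 1.93), (45, "A", "L", 1.54),
]


def classify(position, mutant):
    """Return (legend category, True if the exact substitution was tested)."""
    rows = [(m, lysis) for p, m, lysis in experimental if p == position]
    if not rows:
        return "no correlation", False
    exact = any(m == mutant for m, _ in rows)
    if any(lysis == 1 for _, lysis in rows):
        return "site and functional correlation", exact
    return "site correlation", exact


for position, wt, mutant, llr in predicted:
    category, exact = classify(position, mutant)
    label = f"{wt}{position}{mutant}"
    print(f"{label}\tLLR={llr:.2f}\t{category}\texact={exact}")
```

In this subset the only two predictions that were tested as exact substitutions, C29R and K50I, both gave Lysis=0 experimentally, which is in line with the observation above.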

Original Protein

Original

Original + DnaJ

Predicted Mutant Structures

Structures were predicted via MF2 Multimer for selected L-protein mutants, each alone and together with DnaJ. Each mutation targets either the transmembrane region (AA 41–75) or the soluble region (AA 1–40).
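Mutant sequences of this kind can be generated and checked programmatically. The sketch below assumes the 75-residue wild-type sequence obtained by reverting the substitutions in the mutant sequences of this section; the helper names `apply_mutation` and `region` are illustrative:

```python
import re

# Wild-type L-protein sequence (75 aa), reconstructed from the mutant
# sequences in this section by reverting each substitution.
WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"
    "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)


def parse_label(label):
    """Split a label such as 'L44P' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(seq, label):
    """Return seq with the substitution applied, checking the original residue."""
    wt, pos, mut = parse_label(label)
    if seq[pos - 1] != wt:
        raise ValueError(f"{label}: found {seq[pos - 1]} at position {pos}")
    return seq[:pos - 1] + mut + seq[pos:]


def region(label):
    """Soluble region is AA 1-40, transmembrane region is AA 41-75."""
    _, pos, _ = parse_label(label)
    return "soluble" if pos <= 40 else "transmembrane"


for label in ("L44P", "A45P", "V63E", "C29R"):
    print(label, region(label), apply_mutation(WILD_TYPE, label))
```

The residue check in `apply_mutation` catches off-by-one errors, which are easy to make in stretches of repeated residues such as R30/R31 or Q32/Q33.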

Mutations in Transmembrane Region (AA 41–75)

L44P: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFPAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

L44P

L44P + DnaJ

A45P: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLPIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

A45P

A45P + DnaJ

V63E: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAEIRTVTTLQQLLT

V63E

V63E + DnaJ

Mutations in Soluble Region (AA 1–40)

R30Q: METRFPQQSQQTPASTNRRRPFKHEDYPCQRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

R30Q

R30Q + DnaJ

C29R: METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

C29R

C29R + DnaJ

References

[1] WuXi AppTec, "A Strategic Roadmap for Peptide Preclinical Studies: 3 Key Stages," labtesting.wuxiapptec.com, Oct. 30, 2025. [Online]. Available: https://labtesting.wuxiapptec.com/2025/10/30/a-strategic-roadmap-for-peptide-preclinical-studies-3-key-stages/. [Accessed: Mar. 16, 2026].