Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design

Superoxide dismutase 1 (SOD1) is a cytosolic enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Ala to Val at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge:

Design short peptides that bind mutant SOD1.
Then decide which ones are worth advancing toward therapy.

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Fig 1. The human SOD1 protein sequence.

At position 4 (numbering excludes the initiator Met), change Ala → Val.

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Fig 2. The human SOD1 variant carrying an A4V mutation.
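The substitution shown in Fig 1 and Fig 2 can also be applied programmatically, which avoids off-by-one errors from the numbering convention. The following is a minimal sketch, assuming mutation strings of the form `A4V` and numbering that skips the initiator Met; the helper `apply_mutation` and its `skip_initiator_met` flag are hypothetical names, not part of PepMLM or any other tool used here.

```python
# Wild-type human SOD1 (UniProt P00441), copied from Fig 1.
WILD_TYPE = "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"


def apply_mutation(seq, mutation, skip_initiator_met=True):
    """Apply a point mutation such as 'A4V' to a protein sequence.

    Assumption: with skip_initiator_met=True, residue 1 is the residue
    that follows the leading Met, so residue N sits at string index N.
    """
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = pos if skip_initiator_met else pos - 1
    if seq[index] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {seq[index]}"
        )
    return seq[:index] + alt + seq[index + 1:]


mutant = apply_mutation(WILD_TYPE, "A4V")
print(mutant[:11])  # MATKVVCVLKG
```

The reference-residue check is the useful part: if the numbering convention is wrong (for example, counting the Met as residue 1), the function raises an error instead of silently mutating Lys to Val.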

Using the PepMLM CoLab linked from the HuggingFace PepMLM-650M model card:
Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Peptide Name	Sequence	Perplexity
peptide_1	WRYYVVVAEWGE	30.94
peptide_2	WLYYATVARWGK	20.55
peptide_3	WHYYVVGLRWWE	28.21
peptide_4	WRYYVTGAAWWK	17.13
known binder peptide	FLYRWLPSRRGG	20.6

Table 1. Four peptides generated by PepMLM against SOD1_A4V, with their perplexity values. The known binder is shown in the bottom row for comparison.

The newly generated peptides span a range of pseudo-perplexity values. Because a lower perplexity means that PepMLM is more confident in a binder, peptide_4 (17.13) and peptide_2 (20.55) score better than the known binding peptide, FLYRWLPSRRGG (20.6), and may bind at least as well.
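To make the perplexity values easier to read, the sketch below computes a pseudo-perplexity from per-residue log-probabilities. It assumes the usual definition (the exponential of the mean negative log-likelihood, using natural logs); the exact masking scheme in the PepMLM notebook is not reproduced here, and the input list is a hypothetical stand-in for the model's output.

```python
import math


def pseudo_perplexity(log_probs):
    """Return exp(mean negative log-likelihood) over peptide positions.

    log_probs: natural-log probabilities the model assigned to the true
    residue at each position (hypothetical input for illustration).
    """
    if not log_probs:
        raise ValueError("log_probs must not be empty")
    return math.exp(-sum(log_probs) / len(log_probs))


# Reference point: a model that guesses uniformly over the 20 standard
# amino acids at every position of a 12-mer.
uniform = [math.log(1 / 20)] * 12
print(round(pseudo_perplexity(uniform), 2))  # 20.0
```

Under this definition, a value of 20 corresponds to uniform guessing over the 20 standard amino acids, which gives a rough scale for the values in Table 1.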

Part 2: Evaluate Binders with AlphaFold3

Navigate to the AlphaFold Server: alphafoldserver.com
For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
Record the ipTM score and briefly describe where the peptide appears to bind.

Does it localize near the N-terminus where A4V sits?
Does it engage the β-barrel region or approach the dimer interface?
Does it appear surface-bound or partially buried?

Peptides	ipTM Values	Localization Characteristics
peptide_1	0.32	Surface-bound, may have partial interaction with beta-sheets
peptide_2	0.33	Interacts with beta-sheets, proximity to dimer interface residues
peptide_3	0.49	Surface-bound, does not interact with beta-sheets
peptide_4	0.42	Surface-bound, may have partial interaction with beta-sheets
known binder peptide	0.38	Surface-bound, does not interact with beta-sheets

Table 2. Modeling a peptide-protein complex. Interface predicted template modeling (ipTM) scores were obtained from AlphaFold analysis.

None of the peptides engages the N-terminus, where the A4V mutation is found. The possible exception is peptide_2, which may interact with residue 153, the last residue at the C-terminus and a participant in the dimer interface.

(A) peptide_1	(B) peptide_2

(C) peptide_3	(D) peptide_4

(E) known binder peptide

Fig 3. AlphaFold modeling of PepMLM peptides with SOD1_A4V, visualized in PyMOL. SOD1_A4V monomer (blue) and binder peptides (orange). Highlights: A4V residue with sidechain in stick (red) and dimer interface residues (green) 50–53, 114, 148, and 150–153, as referenced in Hough et al. 2004.

In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

All peptides, including the benchmark, have ipTM values between 0.3 and 0.5, a range that indicates weak but still possible interactions (Table 3). Two generated peptides, peptide_3 (0.49) and peptide_4 (0.42), exceed the known binder (0.38). As ChatGPT research indicated, AlphaFold struggles with short peptides such as these 12-mers. Having a known binder peptide as a benchmark helps interpret the values, even when they are categorized as weak.

A low perplexity score indicates that the protein language model considers a peptide more likely. Among the new peptides, peptide_4 has the lowest perplexity score, 17.13, which is even lower than that of the known binder peptide, 20.6. Two of the peptides, peptide_1 and peptide_3, have higher perplexity scores, 30.94 and 28.21, respectively, placing them in the disfavored category.

ipTM score	Interpretation
> 0.7	confident interaction
0.5 - 0.7	possible interaction
0.3 - 0.5	weak interaction
< 0.3	likely no interaction

Table 3. Reference for interpreting ipTM scores.
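The bands in Table 3 can be applied to the ipTM values in Table 2 with a small lookup. This is a minimal sketch; Table 3 does not say which band the boundary values belong to, so placing exactly 0.3 in the "weak" band and exactly 0.5 and 0.7 in the "possible" band is an assumption, and `interpret_iptm` is an illustrative name.

```python
# ipTM values copied from Table 2.
IPTM = {
    "peptide_1": 0.32,
    "peptide_2": 0.33,
    "peptide_3": 0.49,
    "peptide_4": 0.42,
    "known binder peptide": 0.38,
}


def interpret_iptm(score):
    """Map an ipTM score to the interpretation bands of Table 3.

    Assumption: a boundary value falls into the band that starts at it,
    except 0.7, which stays in 'possible' because Table 3 says '> 0.7'.
    """
    if score > 0.7:
        return "confident interaction"
    if score >= 0.5:
        return "possible interaction"
    if score >= 0.3:
        return "weak interaction"
    return "likely no interaction"


for name, score in IPTM.items():
    print(f"{name}\t{score:.2f}\t{interpret_iptm(score)}")
```

Every peptide in Table 2 lands in the "weak interaction" band, so the known binder is the more informative reference point for comparing them.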

I’ve used ChatGPT research to interpret the results.

Prompt:

Can you explain the results from AlphaFold structural analysis for binder peptides to the SOD1_A4V variant? Results in Tables 1 and 2.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

Paste the peptide sequence
Paste the A4V mutant SOD1 sequence in the target field.
Check the boxes

Predicted binding affinity
Solubility
Hemolysis probability
Net charge (pH 7)
Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

Peptide	Property	Prediction	Value	Unit
peptide_1, WRYYVVVAEWGE	Binding Affinity	Weak binding	6.750	pKd/pKi
	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.258	Probability
	Net Charge (pH 7)		-1.23
	Molecular Weight		1556.7	Da
peptide_2, WLYYATVARWGK	Binding Affinity	Weak binding	6.729	pKd/pKi
	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.054	Probability
	Net Charge (pH 7)		1.76
	Molecular Weight		1513.7	Da
peptide_3, WHYYVVGLRWWE	Binding Affinity	Weak binding	6.733	pKd/pKi
	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.135	Probability
	Net Charge (pH 7)		-0.15
	Molecular Weight		1693.9	Da
peptide_4, WRYYVTGAAWWK	Binding Affinity	Medium binding	7.059	pKd/pKi
	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.041	Probability
	Net Charge (pH 7)		1.76
	Molecular Weight		1586.8	Da
known binder peptide, FLYRWLPSRRGG	Binding Affinity	Weak binding	5.968	pKd/pKi
	Solubility	Soluble	1.000	Probability
	Hemolysis	Non-hemolytic	0.047	Probability
	Net Charge (pH 7)		2.76
	Molecular Weight		1507.7	Da

Table 4. Summary of the PeptiVerse analysis.

I chose peptide_2 for therapeutic advancement, for the following reasons:

Based on the PeptiVerse evaluation of the new peptides, the binding affinities of 3 out of 4 were predicted to be weak (Table 4). The known binder peptide is also scored with weak binding affinity in PeptiVerse. Only one peptide, peptide_4, had medium affinity, with a higher affinity score than the other peptides. However, the AlphaFold structure analysis places peptide_4 elsewhere on the protein, not near A4V or the dimer interface (Fig 3D).

On the other hand, the second-best-ranked model for peptide_2 showed that this peptide could be localized near A4V and closer to the dimer interface (Fig 3B), where it may interact with one of the dimer interface residues, 153, and prevent aggregation. Notably, other model predictions for this peptide, including the best-ranking one, showed that the peptide might also be located elsewhere.

Given the location in the second-best predicted model, I favored peptide_2 because it could have a more relevant binding location. All peptides score in the weak but possible interaction range (ipTM between 0.3 and 0.5). The visualization of peptide_2 in Fig 3B reflects its predicted binding location in that second-best model. The ranking scores for the best and second-best models were 0.43 and 0.42, respectively, as reported in the AlphaFold analysis; the ipTM is 0.33 for the best and 0.31 for the second-best. Figs. 3A, 3C, 3D, and 3E show the best-ranked models of peptide_1, peptide_3, peptide_4, and the known binder peptide, respectively.
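The question of whether higher ipTM goes with stronger predicted affinity can be checked with a rank correlation over the values in Tables 2 and 4. The sketch below uses the simple Spearman formula, which assumes no tied values (true for this data); the helper names are illustrative.

```python
# (ipTM from Table 2, predicted affinity in pKd/pKi from Table 4)
DATA = {
    "peptide_1": (0.32, 6.750),
    "peptide_2": (0.33, 6.729),
    "peptide_3": (0.49, 6.733),
    "peptide_4": (0.42, 7.059),
    "known binder peptide": (0.38, 5.968),
}


def ranks(values):
    """Return 1-based ascending ranks (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


iptm = [v[0] for v in DATA.values()]
affinity = [v[1] for v in DATA.values()]
rho = spearman_rho(iptm, affinity)
print(round(rho, 2))  # 0.1
```

A coefficient this close to zero suggests that, for these five peptides, ipTM rank and predicted affinity rank are essentially unrelated. With so few data points the number is descriptive only.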

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

Open the moPPIt CoLab linked from the HuggingFace moPPIt model card.
Make a copy and switch to a GPU runtime.
In the notebook,

Paste your A4V mutant SOD1 sequence.
Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
Set peptide length to 12 amino acids.
Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

moPPIt Peptide	Binder Sequence	Hemolysis	Solubility	Affinity	Motif
m_peptide_1	TEQKKFTCRTQD	0.98065103	0.91666669	5.93099356	0.92896324
m_peptide_2	YQKCLVRETTGV	0.96973951	0.75	6.67283011	0.86192995
m_peptide_3	KEKKQRVQCTDG	0.97739551	0.91666669	5.56807947	0.85476714
m_peptide_4	SVKTTHCEQGKP	0.97208287	0.83333331	5.50222826	0.84767622

Table 5. moPPIt-generated binder peptides and their in-silico attributes. Generation targeted position 4 with the following objective weights: Hemolysis=1, Solubility=1, Affinity=1, Motif=1.

Unlike the PepMLM peptides, none of the moPPIt peptides achieved a desirably low score in the hemolytic activity prediction; all of them had high hemolysis scores. This may be because the region surrounding position 4 is largely hydrophobic, which favors hydrophobic binder peptides and, in turn, high predicted hemolytic activity. Further optimization is needed to reduce the net hydrophobicity of the peptide binder and to balance its charged residues. Because the peptides were designed for a specific region, it can be harder to achieve every desired trait, so trade-offs must be considered.

As a next step, I applied more stringent weights to optimize the peptide binders in moPPIt (see results in Table 6 and Fig 4).

moPPIt Peptide	Binder Sequence	Hemolysis	Solubility	Affinity	Motif
m2_peptide_1	DTECTQTRLKKS	0.9731365	0.916666687	5.666040897	0.787758291
m2_peptide_2	YDVTTRLYFGRW	0.94705141	0.666666627	6.606595039	0.334373683
m2_peptide_3	KDEFDCKPCYNL	0.93650597	0.75	7.194892883	0.706446946
m2_peptide_4	TEKTIEKKQWCA	0.98217107	0.75	6.305016994	0.889526725
m2_peptide_5	SKECGTLRFKQR	0.96697627	0.833333313	6.679063797	0.910008729
m2_peptide_6	YKKETVKTNQFH	0.97450537	0.833333313	5.35283041	0.899615645
m2_peptide_7	TTSTHICTCPLC	0.87881172	0.75	5.995385647	0.758235991
m2_peptide_8	TGDTTCLKKQHF	0.97177865	0.833333313	5.857715607	0.851789355

Table 6. moPPIt-generated binder peptides and their in-silico attributes, generated with more stringent objective weights. Generation targeted position 4 with the following weights: Hemolysis=10, Solubility=5, Affinity=5, Motif=10.
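Because no single peptide in Table 6 is best on every objective, one way to shortlist candidates is to keep only the non-dominated (Pareto-optimal) ones. The sketch below is an illustration of that idea, not a description of how moPPIt itself selects peptides. It follows the reading used in this write-up, in which a lower hemolysis score is better; if the notebook reports that column as a non-hemolytic probability instead, the sign flip in `as_maximize` should be removed. Values are rounded from Table 6.

```python
# (hemolysis, solubility, affinity, motif), rounded from Table 6.
TABLE_6 = {
    "m2_peptide_1": (0.9731, 0.9167, 5.6660, 0.7878),
    "m2_peptide_2": (0.9471, 0.6667, 6.6066, 0.3344),
    "m2_peptide_3": (0.9365, 0.7500, 7.1949, 0.7064),
    "m2_peptide_4": (0.9822, 0.7500, 6.3050, 0.8895),
    "m2_peptide_5": (0.9670, 0.8333, 6.6791, 0.9100),
    "m2_peptide_6": (0.9745, 0.8333, 5.3528, 0.8996),
    "m2_peptide_7": (0.8788, 0.7500, 5.9954, 0.7582),
    "m2_peptide_8": (0.9718, 0.8333, 5.8577, 0.8518),
}


def as_maximize(scores):
    """Negate hemolysis so that higher is better for every objective.

    Assumption: lower hemolysis score is desirable (this write-up's reading).
    """
    hemolysis, solubility, affinity, motif = scores
    return (-hemolysis, solubility, affinity, motif)


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(table):
    """Return the names of candidates not dominated by any other candidate."""
    scored = {name: as_maximize(values) for name, values in table.items()}
    return [
        name
        for name, score in scored.items()
        if not any(dominates(other, score)
                   for other_name, other in scored.items()
                   if other_name != name)
    ]


print(pareto_front(TABLE_6))
# ['m2_peptide_1', 'm2_peptide_3', 'm2_peptide_5', 'm2_peptide_7']
```

Each of the four survivors is the best candidate on one objective (solubility, affinity, motif, and hemolysis, respectively), which makes the trade-offs between objectives explicit.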

(A) m2_peptide_1	(B) m2_peptide_2

(C) m2_peptide_3	(D) m2_peptide_4

(E) m2_peptide_5	(F) m2_peptide_6

(G) m2_peptide_7	(H) m2_peptide_8

Fig 4. AlphaFold modeling of moPPIt peptides with SOD1_A4V, visualized in PyMOL. SOD1_A4V monomer (blue) and binder peptides (orange). Highlights: A4V residue with sidechain in stick (red), dimer interface residues (green) 50-53, 114, 148, and 150-153, and sidechains of residues highlighted in peptide binders possibly interacting with A4V in stick (orange): m2_peptide_1, residue 4C; m2_peptide_3, residue 10V; m2_peptide_4, residue 11C; m2_peptide_5, residue 4C; m2_peptide_7, residue 10P, and m2_peptide_8, residue 7C. Based on the best-scoring models of m2_peptide_2 and m2_peptide_6, these binders did not interact with the A4V residue.

The new set of moPPIt-generated binder peptides targeting position 4 still fell short of achieving low hemolysis scores with the settings used for this moPPIt run. This indicates that further optimization is needed.

Affinity scores ranged between 5.35 and 7.19, similar to the previous set generated with less stringent weights (5.50 to 6.67). Notably, the motif score for m2_peptide_2 was low, 0.33, and this peptide is localized elsewhere. m2_peptide_6 also localized elsewhere, even though its motif score was higher, 0.89; the reason for this is unclear. The rest of the moPPIt peptides localized as expected relative to position 4, and by visual inspection, they may interact with position 4 and the surrounding hydrophobic dimer interface, indicating promise for advancement. The only issue is the high predicted hemolytic activity. To reduce it, hydrophobic amino acids can be replaced by less hydrophobic ones, and charged residues can be balanced. As Gemini AI research also suggests, there are other approaches to reducing hemolytic activity, such as peptide cyclization and conjugating peptides to polymers.
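Before substituting residues, the hydrophobicity of each candidate can be quantified with a simple hydropathy score. The sketch below uses the Kyte-Doolittle scale and the GRAVY average (mean hydropathy per residue) as a stand-in. It is not the method used by the hemolysis predictors in moPPIt or PeptiVerse, and the function name `gravy` is illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Return the mean Kyte-Doolittle hydropathy of a peptide.

    Positive values indicate a hydrophobic peptide overall,
    negative values a hydrophilic one.
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


# Binder sequences copied from Table 6.
for sequence in ["DTECTQTRLKKS", "YDVTTRLYFGRW", "TTSTHICTCPLC"]:
    print(f"{sequence}\t{gravy(sequence):+.2f}")
```

Comparing the score of a candidate before and after a proposed substitution gives a quick check that the change moves hydrophobicity in the intended direction, before the sequence is run through the hemolysis predictor again.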

I’ve used Gemini AI for research.

Prompts:

Can you explain why hemolytic activity is seen in some peptides designed for therapeutic purposes? How to avoid hemolytic activity in peptides for better therapeutics?

Part B: BRD4 Drug Discovery Platform Tutorial (Optional)

Part C: Final Project: L-Protein Mutants

High-level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein of the MS2 phage. This mechanism is key to understanding how phages can potentially solve antibiotic resistance.

The objective is to improve the stability and autofolding of the lysis protein. More specifically, we want to engineer the lysis protein to increase the ability of MS2 to overcome a common E. coli resistance mechanism: a single point mutation in DnaJ prevents the binding of the lysis protein. We can attempt this by mutating the lysis protein to change its properties. Together, we aim to find mutations that change the lysis protein in one of the following ways: (1) independence of lysis protein processing from DnaJ or other bacterial chaperones, (2) faster or more efficient killing of E. coli to reduce the window in which the host can acquire resistance, or (3) higher lysis protein expression. In the course of this class, we will proceed through the following stages to create and test new MS2 phage mutants:

In this subset:

Stage 1: Engineer novel L-protein mutants using protein design tools

Stage 2: Synthesize the L-protein mutant gene via Twist

Stage 3: Clone the L-protein mutant gene into a plasmid using Gibson Assembly

Stage 4: Test the L-protein mutant’s structural integrity using the Nuclera system

Stage 5: Test the L-protein in E. coli with plaque assays