Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge: Design short peptides that bind mutant SOD1. Then decide which ones are worth advancing toward therapy.

You will use three models developed in our lab:

PepMLM: target sequence-conditioned peptide generation via masked language modeling
PeptiVerse: therapeutic property prediction
moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation. Original sequence:

>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Sequence with mutation:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
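The substitution can also be made programmatically, which guards against editing the wrong position. The sketch below assumes the conventional SOD1 numbering, in which the initiator methionine is not counted, so A4 is the fifth residue of the UniProt sequence. `apply_substitution` and its `offset` parameter are illustrative names, not part of PepMLM or any other tool used here.

```python
# Wild-type human SOD1 (UniProt P00441), including the initiator Met.
WT_SOD1 = "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"


def apply_substitution(seq: str, mutation: str, offset: int = 0) -> str:
    """Apply a point substitution written like 'A4V'.

    offset is added to the residue number to obtain the 1-based position in
    seq (offset=1 when the numbering skips the initiator Met).
    """
    wt_aa, number, new_aa = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = number + offset - 1  # 0-based index into seq
    if seq[index] != wt_aa:
        raise ValueError(
            f"expected {wt_aa} at residue {number}, found {seq[index]}"
        )
    return seq[:index] + new_aa + seq[index + 1:]


MUTANT_SOD1 = apply_substitution(WT_SOD1, "A4V", offset=1)
print(MUTANT_SOD1[:10])  # MATKVVCVLK
```

The wild-type check fails loudly if the numbering convention is wrong: calling the function with `offset=0` raises an error because residue 4 of the full UniProt sequence is lysine, not alanine.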

Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Generated sequences:

Binder	Pseudo Perplexity
WRYGVTALAHWX	10.28 ⭐
KRYPVVGLEWKX	14.16
KHYPPVVVAHKK	14.86
WRYYAAVVRHKK	19.84

Known Binder: FLYRWLPSRRGG

Four candidate peptides of length 12 were generated using PepMLM conditioned on the mutant SOD1 sequence. PepMLM assigns each peptide a pseudo-perplexity, which reflects how likely the sequence is under the model given the target; lower values indicate higher model confidence. Among the generated peptides, WRYGVTALAHWX had the lowest pseudo-perplexity (10.28), making it the model's most confident binder candidate. Two of the generated peptides end in X, which denotes an unspecified residue; in the later parts these two sequences are written with alanine at that position (WRYGVTALAHWA and KRYPVVGLEWKA). The known SOD1-binding peptide FLYRWLPSRRGG was added to the list for comparison.
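To make the score concrete: pseudo-perplexity is commonly defined as the exponential of the mean negative log-probability that a masked language model assigns to each true residue when that residue is masked. The sketch below assumes this definition; the log-probabilities are made-up inputs, not PepMLM output, and `pseudo_perplexity` is an illustrative helper.

```python
import math


def pseudo_perplexity(log_probs):
    """Exponential of the mean negative log-probability of the true residues.

    log_probs[i] is log p(residue i | all other positions), i.e. the score a
    masked language model gives the true residue when position i is masked.
    """
    if not log_probs:
        raise ValueError("need at least one position")
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical per-residue log-probabilities for a 12-mer.
uniform = [math.log(1 / 20)] * 12  # no preference among the 20 amino acids
confident = [math.log(0.5)] * 12   # each true residue predicted with p = 0.5

print(round(pseudo_perplexity(uniform), 2))    # 20.0
print(round(pseudo_perplexity(confident), 2))  # 2.0
```

Under this definition, a pseudo-perplexity near 10 corresponds to the model being, on average, about as uncertain as a uniform choice among ten residues at each position.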

Part 2: Evaluate Binders with AlphaFold3

Navigate to the AlphaFold Server: alphafoldserver.com
For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

a) ipTM = 0.45, pTM = 0.78

b) ipTM = 0.38, pTM = 0.87

c) ipTM = 0.24, pTM = 0.87

d) ipTM = 0.27, pTM = 0.71

e) Known Binder: ipTM = 0.35, pTM = 0.83

Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

Model 1

The peptide sits along the side of the β-sheet barrel.
It contacts surface loops near the barrel edge.
Surface-bound, not buried.
Not near the extreme N-terminus (A4V region).

Model 2

The peptide is positioned above the β-barrel core, interacting mainly with loop regions.
Still surface-exposed.
No clear contact with the dimer interface.

Model 3

The peptide extends toward a flexible loop projecting from the barrel.
Appears loosely associated, with much of the peptide exposed.
Again not near the N-terminal A4V site.

Model 4

The peptide sits between β-strands and adjacent loops, slightly closer to the barrel surface.
Part of the peptide appears tucked against the protein, but it remains largely surface-bound.

Model 5

The peptide lies along the β-sheet surface, contacting residues on the outer barrel face.
The orientation is consistent with surface docking rather than deep insertion.

In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

AlphaFold3 was used to model complexes between mutant SOD1 and each candidate peptide. The predicted interface TM-scores (ipTM) ranged from 0.24 to 0.45, all below the roughly 0.6 level usually taken to indicate a confident interface, so the predicted interactions are weak but plausible. All peptides appeared surface-bound along the β-barrel of SOD1, interacting primarily with exposed loops rather than the N-terminal region where the A4V mutation occurs. The known SOD1-binding peptide FLYRWLPSRRGG produced an ipTM of 0.35. Two PepMLM-generated peptides exceeded this value: WRYGVTALAHWA (0.45) and, by a small margin, KRYPVVGLEWKA (0.38). Because even the known binder scores low, these values are better read as a relative ranking than as evidence of binding, but they suggest that some generated peptides may be promising candidates for further optimization and evaluation.
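The comparison against the known binder can be checked with a few lines of code. This is a minimal sketch using the ipTM values listed for models a) to e); the 0.6 cutoff is an assumed rule of thumb for a confident interface, not a value taken from this assignment.

```python
# ipTM values from the AlphaFold3 runs reported above (models a-d).
generated_iptm = {
    "WRYGVTALAHWA": 0.45,
    "KRYPVVGLEWKA": 0.38,
    "KHYPPVVVAHKK": 0.24,
    "WRYYAAVVRHKK": 0.27,
}
KNOWN_BINDER_IPTM = 0.35  # FLYRWLPSRRGG (model e)
CONFIDENT_IPTM = 0.6      # assumed cutoff for a confident interface

# Generated peptides that score above the known binder, best first.
beats_known = sorted(
    (pep for pep, score in generated_iptm.items() if score > KNOWN_BINDER_IPTM),
    key=generated_iptm.get,
    reverse=True,
)
# Generated peptides that reach the assumed confidence cutoff.
confident = [p for p, s in generated_iptm.items() if s >= CONFIDENT_IPTM]

print(beats_known)  # ['WRYGVTALAHWA', 'KRYPVVGLEWKA']
print(confident)    # []
```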

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

Paste the peptide sequence.
Paste the A4V mutant SOD1 sequence in the target field.
Check the boxes for:

Predicted binding affinity
Solubility
Hemolysis probability
Net charge (pH 7)
Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

Peptide	ipTM (AF3)	Binding Affinity (pKd)	Solubility	Hemolysis (Prob)	Net Charge
WRYGVTALAHWA	0.45	6.223	Soluble (1.00)	0.048	+1.85
KRYPVVGLEWKA	0.38	5.542	Soluble (1.00)	0.033	+1.76
KHYPPVVVAHKK	0.24	4.835	Soluble (1.00)	0.018	+2.93
WRYYAAVVRHKK	0.27	6.045	Soluble (1.00)	0.032	+3.84
FLYRWLPSRRGG	0.35	5.968	Soluble (1.00)	0.047	+2.76

PeptiVerse predictions were used to evaluate the therapeutic properties of the peptides, including binding affinity, solubility, hemolysis probability, and net charge. WRYGVTALAHWA showed both the highest predicted binding affinity (pKd = 6.223) and the highest AlphaFold3 interface score (ipTM = 0.45), suggesting a relatively stronger interaction with SOD1 than the other candidates and the reference peptide FLYRWLPSRRGG. Higher ipTM does not track predicted affinity perfectly, however: WRYYAAVVRHKK has the second-highest pKd (6.045) despite a low ipTM (0.27), while KRYPVVGLEWKA has the second-highest ipTM (0.38) but a lower pKd (5.542). All peptides were predicted to be soluble (probability 1.00) with hemolysis probabilities below 0.05, so none of the stronger binders is flagged as hemolytic or poorly soluble. Net charge ranged from +1.76 to +3.84; WRYGVTALAHWA sits at the low end (+1.85). Based on the combined structural and therapeutic predictions, WRYGVTALAHWA provides the best balance of binding strength and developability, and I selected it as the peptide to advance for further optimization.
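Whether ipTM and predicted affinity agree can be summarized with a rank correlation. The sketch below computes Spearman's coefficient for the four PepMLM-generated peptides using the values in the table; with only four points this is a descriptive summary, not a statistical test, and the helper names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (largest) downward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Order: WRYGVTALAHWA, KRYPVVGLEWKA, KHYPPVVVAHKK, WRYYAAVVRHKK
iptm = [0.45, 0.38, 0.24, 0.27]
pkd = [6.223, 5.542, 4.835, 6.045]

rho = spearman(iptm, pkd)
print(round(rho, 2))  # 0.8
```

The coefficient is positive but below 1 because KRYPVVGLEWKA and WRYYAAVVRHKK swap places between the two rankings.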

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

Open the moPPIt Colab linked from the HuggingFace moPPIt model card.
Make a copy and switch to a GPU runtime.
In the notebook:

Paste your A4V mutant SOD1 sequence.
Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
Set peptide length to 12 amino acids.
Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.

After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

Peptides generated:

a) STCKYKKIGGTL

b) GRYKCYCRDSRY

c) DDTITCKKKQCT

In this step, peptides were generated using moPPIt, which applies Multi-Objective Guided Discrete Flow Matching to steer binders toward specific residues on the target protein while simultaneously optimizing objectives such as binding affinity, solubility, and toxicity. The A4V mutant SOD1 sequence was provided as the target, and residues 4-7, near the A4V mutation, were selected to guide binding toward the N-terminal region of the protein. Compared to the PepMLM peptides, the moPPIt peptides differ mainly in composition: every moPPIt peptide contains at least one cysteine (the PepMLM peptides contain none), which may contribute to stabilizing protein–peptide interactions, and positively charged residues (lysine and arginine) are somewhat more frequent on average. This is consistent with moPPIt performing directed optimization toward a chosen binding site rather than broadly sampling plausible sequences.
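The compositional differences can be tallied directly from the sequences. This is a minimal sketch; the grouping into basic (K, R), acidic (D, E), and cysteine residues is a choice made here for illustration, and the PepMLM sequences are written as in the property table.

```python
pepmlm = ["WRYGVTALAHWA", "KRYPVVGLEWKA", "KHYPPVVVAHKK", "WRYYAAVVRHKK"]
moppit = ["STCKYKKIGGTL", "GRYKCYCRDSRY", "DDTITCKKKQCT"]


def composition(peptide: str) -> dict:
    """Count basic (K, R), acidic (D, E), and cysteine residues."""
    return {
        "basic": sum(peptide.count(aa) for aa in "KR"),
        "acidic": sum(peptide.count(aa) for aa in "DE"),
        "cys": peptide.count("C"),
    }


def mean_count(peptides, key):
    """Average count of one residue class across a set of peptides."""
    return sum(composition(p)[key] for p in peptides) / len(peptides)


for name, group in (("PepMLM", pepmlm), ("moPPIt", moppit)):
    print(
        name,
        "mean K+R:", round(mean_count(group, "basic"), 2),
        "total Cys:", sum(composition(p)["cys"] for p in group),
    )
# PepMLM mean K+R: 2.75 total Cys: 0
# moPPIt mean K+R: 3.33 total Cys: 5
```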

Before advancing these peptides toward clinical studies, several validation steps would be required. First, computational structural modeling using AlphaFold3 or molecular docking could confirm whether the peptides bind near the intended SOD1 residues. Property prediction tools such as PeptiVerse could further evaluate binding affinity, solubility and toxicity risks. Finally, experimental validation would be necessary, including in vitro binding assays, aggregation inhibition assays for mutant SOD1, and toxicity testing in relevant cellular systems.

Part C: Final Project: L-Protein Mutants

High-level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein (L-protein) of the MS2 phage. Understanding this lysis mechanism is relevant to how phages could help address antibiotic resistance.

Lysis Protein Sequence (UniProtKB ID: P03609, https://www.uniprot.org/uniprotkb/P03609/entry)

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Note: The lysis protein contains a soluble N-terminal domain (green) followed by a transmembrane domain (blue; last 35 residues). The transmembrane domain affects lysis activity, while the soluble domain is responsible for interaction with DnaJ.

L-Protein Engineering | Option 1: Mutagenesis

STEP 1:

A multiple sequence alignment of homologous L-protein sequences was performed using Clustal Omega to identify conserved and variable regions across related bacteriophages. The alignment revealed that the transmembrane region, located in the C-terminal portion of the protein, is highly conserved, particularly in residues forming a hydrophobic helix (LYVLIFLAIFLSKFTNQLLLSLL). This high level of conservation suggests a critical functional role in membrane insertion and pore formation during bacterial lysis. In contrast, the N-terminal soluble region displayed greater sequence variability, indicating a higher tolerance to mutations. Based on these observations, conserved residues were avoided during mutational design, while more variable positions, especially in the soluble domain, were prioritized as potential targets for mutation.
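Conservation can be quantified per alignment column, for example with Shannon entropy, where 0 bits means a fully conserved position. The sketch below uses a toy alignment made up for illustration (these are not real L-protein homologs), and the 1.0-bit threshold for calling a column variable is an arbitrary assumption.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy in bits of one alignment column (gaps count as a symbol)."""
    total = len(column)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(column).values()
    )


# Toy alignment, one row per sequence; hypothetical data.
toy_alignment = [
    "MSTLFLA",
    "MKTLFLA",
    "MQSLFLA",
    "MRTLFLA",
]

# zip(*rows) walks the alignment column by column.
entropies = [column_entropy("".join(col)) for col in zip(*toy_alignment)]
variable_positions = [i + 1 for i, h in enumerate(entropies) if h > 1.0]

print([round(h, 2) for h in entropies])  # [0.0, 2.0, 0.81, 0.0, 0.0, 0.0, 0.0]
print(variable_positions)                # [2]
```

In a real analysis the rows would come from the Clustal Omega output, and low-entropy columns would mark the positions to leave untouched.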

STEP 2:

To evaluate the effect of mutations across the L-protein sequence, a protein language model (ESM-2) was used to compute log-likelihood ratio (LLR) scores for all possible amino acid substitutions at each position. The LLR compares the probability the model assigns to the mutant residue with the probability it assigns to the wild-type residue at the same position, based on sequence patterns learned from large protein datasets. Positive LLR scores mark substitutions the model considers more likely than the wild-type residue in that context, which is used here as a proxy for being tolerated or beneficial for stability; negative scores suggest deleterious effects. The results were compiled into a ranked list of candidate mutations, allowing the identification of positions and substitutions with the highest predicted improvement. These scores were then used as a primary filter to guide mutation selection, in combination with conservation analysis from the multiple sequence alignment.
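The score itself is simple once the model's per-position output is available: LLR = log p(mutant) - log p(wild type). The sketch below assumes the model returns one logit per amino acid at each position; the logits are invented for illustration and are not ESM-2 output, and the helper names and alphabet ordering are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_z for v in logits]


def llr(logits_at_position, wt_aa, mut_aa):
    """Log-likelihood ratio: log p(mutant) - log p(wild type) at one position."""
    log_p = log_softmax(logits_at_position)
    return log_p[AMINO_ACIDS.index(mut_aa)] - log_p[AMINO_ACIDS.index(wt_aa)]


# Hypothetical logits for a single position where the wild type is K.
logits = [0.0] * len(AMINO_ACIDS)
logits[AMINO_ACIDS.index("K")] = 1.0
logits[AMINO_ACIDS.index("L")] = 3.0

print(round(llr(logits, "K", "L"), 2))  # 2.0
print(round(llr(logits, "K", "K"), 2))  # 0.0
```

Repeating this for every position and every alternative residue gives the full substitution matrix from which a ranked list can be drawn.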

The protein language model identified several mutations with high positive LLR scores, indicating potentially favorable substitutions. The top-ranked mutations included K50L (LLR = 2.56), C29R (LLR = 2.39), Y39L (LLR = 2.24), C29S (LLR = 2.04), and S9Q (LLR = 2.01). Additional high-scoring mutations were observed at positions within both the soluble and transmembrane regions, such as T52L (LLR = 1.81), N53L (LLR = 1.86), and A45L (LLR = 1.54), particularly favoring substitutions to hydrophobic residues in the transmembrane domain. These results suggest that increasing hydrophobicity in the membrane region and selecting tolerated substitutions in variable regions may improve protein stability and folding.

STEP 3:

To assess how well the model predictions reflect real functional outcomes, the LLR scores were compared with available experimental lysis data for L-protein mutants. While some overlap between high-scoring mutations and experimentally tested variants was observed, many of the top-ranked mutations identified by the model were not present in the experimental dataset. Therefore, the experimental data was used when available, but for many candidate mutations, selection relied primarily on LLR scores in combination with conservation analysis.

STEP 4:

Based on the combined analysis of LLR scores, sequence conservation, and structural considerations, five variants were selected as candidates for improving the L-protein: four single mutants and one double mutant. In the soluble region, S9Q and K23R were chosen for their high LLR scores and their location in more variable regions, suggesting a higher tolerance for substitutions that may improve folding stability. In the transmembrane region, K50L and T52L were selected because both replace a charged or polar residue with leucine, which is consistent with the hydrophobic character of this conserved domain and may enhance membrane insertion and pore formation. Additionally, a combined mutant (S9Q + K50L) was designed to explore potential additive effects between improved folding in the soluble region and enhanced hydrophobicity in the transmembrane domain.
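The hydrophobicity argument can be checked against a standard scale. The sketch below uses the Kyte-Doolittle hydropathy values; choosing this scale is an assumption made here for illustration, since the selection above relied on LLR scores and conservation.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(mutation: str) -> float:
    """Hydropathy of the new residue minus that of the original (e.g. 'K50L')."""
    return KYTE_DOOLITTLE[mutation[-1]] - KYTE_DOOLITTLE[mutation[0]]


for mutation in ("S9Q", "K23R", "K50L", "T52L"):
    print(mutation, round(hydropathy_change(mutation), 1))
# S9Q -2.7
# K23R -0.6
# K50L 7.7
# T52L 4.5
```

On this scale both transmembrane substitutions increase hydropathy substantially, while the two soluble-domain substitutions leave the residue polar.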

FASTA SEQUENCES:

>WT_L_protein
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.44

>S9Q
METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>K23R
METRFPQQSQQTPASTNRRRPFRHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>K50L
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>T52L
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.46

>S9Q_K50L
METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

AlphaFold predictions were used to assess the structural impact of the selected mutations. The wild-type protein had a pTM score of 0.44, and four of the five variants scored 0.43, indicating no predicted structural disruption. The T52L mutant scored slightly higher (0.46). Because pTM measures the model's confidence in the overall fold rather than thermodynamic stability, and the difference is only 0.02, this is at most a weak hint of improvement. It is nonetheless consistent with the introduction of a more hydrophobic residue in the transmembrane region, which may favor membrane insertion. Overall, these findings indicate that the proposed mutations are structurally tolerated in the predictions and may contribute to improved protein stability.