Week 5 HW: Protein Design Part II


Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
Record the perplexity scores that indicate PepMLM’s confidence in the binders.
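Introducing A4V is a one-character substitution, but the numbering needs care. The conventional SOD1 mutation names (including A4V) omit the initiator methionine, while the UniProt P00441 sequence includes it, so A4V corresponds to residue 5 of the UniProt sequence. Below is a minimal sketch that applies a point mutation and checks the wild-type residue first. The helper name, the `offset` parameter, and the truncated 20-residue fragment are illustrative assumptions; paste the full P00441 sequence in practice.

```python
def apply_point_mutation(seq: str, mutation: str, offset: int = 0) -> str:
    """Apply a substitution written as <wt><position><mut>, e.g. 'A4V'.

    `offset` is added to the mutation's position to get the 1-based index in
    `seq`; use offset=1 when the mutation name omits the initiator Met.
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos + offset - 1  # convert to a 0-based Python index
    if not 0 <= idx < len(seq):
        raise IndexError(f"position {pos} (offset {offset}) is outside the sequence")
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at sequence position {idx + 1}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


# First 20 residues of UniProt P00441 (illustrative fragment; use the full sequence).
sod1_nterm = "MATKAVCVLKGDGPVQGIIN"
mutant = apply_point_mutation(sod1_nterm, "A4V", offset=1)
print(mutant)  # MATKVVCVLKGDGPVQGIIN
```

With `offset=0` the check fails (residue 4 of the UniProt sequence is Lys), which is exactly the numbering slip the wild-type check is meant to catch.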

![Peptides + Perplexity Scores]( )
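Perplexity is the exponential of the average negative log-likelihood that the model assigns to the peptide's residues, so lower values mean higher model confidence. Below is a minimal sketch of that definition. It assumes you already have one natural-log probability per peptide position. The PepMLM Colab computes its own score and may mask and average positions differently.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood over peptide positions (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a 12-mer where every residue gets probability 0.5.
print(perplexity([math.log(0.5)] * 12))  # approximately 2.0
```

A perplexity of 1 would mean the model is certain of every residue. A model guessing uniformly over the 20 amino acids would score 20.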

Part 2: Evaluate Binders with AlphaFold3

Navigate to the AlphaFold Server: alphafoldserver.com
For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.
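With several peptides, the AlphaFold Server jobs can also be prepared as a JSON upload instead of being typed into the web form one by one. Below is a minimal sketch. It assumes the AlphaFold Server JSON job format: a list of jobs, each with `name`, `modelSeeds`, `sequences` of `proteinChain` entries, `dialect` set to `"alphafoldserver"`, and `version` 1. Check the server's own documentation before uploading. The helper name and the placeholder SOD1 fragment are hypothetical.

```python
import json


def af_server_job(name: str, target_seq: str, peptide_seq: str, target_copies: int = 1) -> dict:
    """One job with the target and the peptide as separate protein chains."""
    return {
        "name": name,
        "modelSeeds": [],  # empty: no fixed seed requested
        "sequences": [
            {"proteinChain": {"sequence": target_seq, "count": target_copies}},
            {"proteinChain": {"sequence": peptide_seq, "count": 1}},
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }


# Placeholder target: replace with the full A4V mutant SOD1 sequence.
sod1_a4v = "MATKVVCVLKGDGPVQGIIN"
peptides = {"binder1": "WHYYPAAARWKA", "control": "FLYRWLPSRRGG"}

jobs = [af_server_job(f"SOD1_A4V_{name}", sod1_a4v, seq) for name, seq in peptides.items()]
payload = json.dumps(jobs, indent=2)  # save this string to a .json file for upload
print(payload)
```

Setting `target_copies=2` would model the SOD1 homodimer, which matters if you want to see whether a peptide approaches the dimer interface.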

![SOD + Binder 1]( )

Across all predicted complexes, the designed 12-mer peptides mainly occupy the outer flexible loops and exposed surfaces of the SOD1 β-barrel rather than burying themselves in the core. For the top-performing candidate, Peptide 1 (WHYYPAAARWKA), the predicted structure places the peptide adjacent to the terminal loops, where it remains largely surface-bound. Peptide 2 (WLYYPVVVALWK) uses its bulky hydrophobic residues (tyrosines and valines) to become partially buried in a shallow hydrophobic pocket on the barrel. In contrast, Peptide 3 (WLYPAAALEHKE) is poorly placed: it stays flexible and extends away from both the dimer interface and the N-terminal region around residue 4.

The observed ipTM values range from 0.26 to 0.42. These values indicate relatively low confidence in the precise interface geometry, a common limitation when AlphaFold3 models short, flexible linear peptides against a rigid, dimeric enzyme. Peptide 1 (WHYYPAAARWKA) achieved the highest confidence, with an ipTM of 0.42 and a pTM of 0.86, outperforming the known positive-control binder FLYRWLPSRRGG (ipTM = 0.38). Peptide 2 also slightly surpassed the control, with an ipTM of 0.39. This suggests that PepMLM's target-conditioned generation sampled sequences whose predicted interface confidence matches or exceeds that of an experimentally validated binder in silico, although the margins over the control (0.01 to 0.04) are small.
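To put these numbers in context, AlphaFold's published guidance treats ipTM above 0.8 as a confident prediction, 0.6 to 0.8 as a grey zone, and below 0.6 as a likely failed interface prediction. These bands are rough for short peptides. Below is a minimal sketch that ranks the reported values against the control and tags each with its band. The function name is ours.

```python
def iptm_band(iptm: float) -> str:
    """Rough bands from AlphaFold guidance: >0.8 confident, 0.6-0.8 grey zone, <0.6 low."""
    if iptm > 0.8:
        return "confident"
    if iptm >= 0.6:
        return "grey zone"
    return "low"


# ipTM values reported in this write-up (AlphaFold Server predictions).
iptm = {"Peptide 0": 0.34, "Peptide 1": 0.42, "Peptide 2": 0.39,
        "Peptide 3": 0.26, "Control": 0.38}

for name, score in sorted(iptm.items(), key=lambda kv: kv[1], reverse=True):
    delta = score - iptm["Control"]
    print(f"{name}: ipTM={score:.2f} ({iptm_band(score)}), vs control {delta:+.2f}")
```

All five complexes fall in the lowest band. Keep this in mind when reading the small margins over the control.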

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

Paste the peptide sequence.
Paste the A4V mutant SOD1 sequence in the target field.
Check the boxes for a) Predicted binding affinity, b) Solubility, c) Hemolysis probability, d) Net charge (pH 7), and e) Molecular weight.
Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
Choose one peptide you would advance and justify your decision briefly.
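Two of the requested properties, net charge at pH 7 and molecular weight, can be sanity-checked by hand. Below is a minimal sketch using the Henderson-Hasselbalch equation and average residue masses. The pKa values are one common set of approximate terminal and side-chain values, chosen as an assumption. Other tools, likely including PeptiVerse, use different sets, so expect small differences.

```python
# Assumed approximate pKa values; other tools use slightly different sets.
PKA_POS = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEG = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch net charge of an unmodified linear peptide."""
    pos_groups = ["Nterm"] + [aa for aa in seq if aa in PKA_POS]
    neg_groups = ["Cterm"] + [aa for aa in seq if aa in PKA_NEG]
    pos = sum(1 / (1 + 10 ** (ph - PKA_POS[g])) for g in pos_groups)
    neg = sum(1 / (1 + 10 ** (PKA_NEG[g] - ph)) for g in neg_groups)
    return pos - neg


def molecular_weight(seq: str) -> float:
    """Average molecular weight in Da: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


for pep in ["WHYYPAAARWKA", "WLYYPVVVALWK"]:
    print(pep, f"charge={net_charge(pep):+.2f}", f"MW={molecular_weight(pep):.1f} Da")
```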

Comparing the AlphaFold3 structural metrics with the PeptiVerse property predictions, we observe that structural confidence (ipTM) does not closely track the sequence-based property predictions. Peptide 2 (WLYYPVVVALWK) received the highest predicted binding affinity (pKd/pKi = 7.136, "Medium") and the second-highest ipTM (0.39). However, its strongly hydrophobic composition (GRAVY score of +1.01) undermines its drug-like profile. It has the lowest solubility probability (0.461) and the highest hemolysis probability (0.240, versus 0.047 or less for every other peptide). By contrast, Peptide 1 (WHYYPAAARWKA) combined the highest structural confidence (ipTM = 0.42) with a favorable property profile: a solubility probability of 1.000 and a hemolysis probability of 0.020. This illustrates why structural confidence and safety-related properties need to be screened together.
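The GRAVY score quoted above is the mean Kyte-Doolittle hydropathy over the sequence, the same definition ExPASy ProtParam uses. Below is a minimal sketch that reproduces it for the two peptides being compared.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)


for name, pep in [("Peptide 1", "WHYYPAAARWKA"), ("Peptide 2", "WLYYPVVVALWK")]:
    print(f"{name} {pep}: GRAVY = {gravy(pep):+.2f}")
```

This reproduces the +1.01 reported for Peptide 2 and gives about -0.87 for Peptide 1, which is consistent with Peptide 1's higher predicted solubility.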

I choose to advance Peptide 1 (WHYYPAAARWKA). Although Peptide 2 has a higher predicted binding affinity, its hemolysis probability (0.240) raises a toxicity concern for systemic administration in an ALS setting. Peptide 1 balances three strengths. It shows high model confidence, with the lowest pseudo-perplexity in PepMLM. It has higher predicted interface confidence than the known positive-control binder FLYRWLPSRRGG (ipTM = 0.42 vs. 0.38). It also has a favorable predicted safety profile, with high solubility and low hemolysis risk.

| Peptide ID | Sequence (12 aa) | AlphaFold3 ipTM | Solubility (prob.) | Hemolysis (prob.) | Binding affinity (pKd/pKi) |
|---|---|---|---|---|---|
| 0 | `WRYPVAGLAHWK` | 0.34 | 0.838 | 0.020 | 6.249 (Weak) |
| 1 (Advanced) | `WHYYPAAARWKA` | 0.42 | 1.000 | 0.020 | 6.393 (Weak) |
| 2 | `WLYYPVVVALWK` | 0.39 | 0.461 | 0.240 | 7.136 (Medium) |
| 3 | `WLYPAAALEHKE` | 0.26 | 1.000 | 0.019 | 5.980 (Weak) |
| Control | `FLYRWLPSRRGG` | 0.38 | 0.608 | 0.047 | 6.353 (Weak) |
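The trade-off question in Part 3 can be made explicit with a Pareto-front filter over the table. A peptide is kept unless another peptide is at least as good on every column and strictly better on at least one. This is a generic multi-objective screening sketch, not the method PeptiVerse or moPPIt uses. It also treats the four columns as equally important.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    iptm: float        # higher is better
    solubility: float  # higher is better
    hemolysis: float   # lower is better
    affinity: float    # predicted pKd/pKi, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    a_vec = (a.iptm, a.solubility, -a.hemolysis, a.affinity)
    b_vec = (b.iptm, b.solubility, -b.hemolysis, b.affinity)
    return all(x >= y for x, y in zip(a_vec, b_vec)) and any(x > y for x, y in zip(a_vec, b_vec))


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]


table = [
    Candidate("0", 0.34, 0.838, 0.020, 6.249),
    Candidate("1", 0.42, 1.000, 0.020, 6.393),
    Candidate("2", 0.39, 0.461, 0.240, 7.136),
    Candidate("3", 0.26, 1.000, 0.019, 5.980),
    Candidate("Control", 0.38, 0.608, 0.047, 6.353),
]
print([c.name for c in pareto_front(table)])  # ['1', '2', '3']
```

On these values the front contains Peptides 1, 2, and 3. The filter therefore narrows the field, but the final pick still depends on how much weight hemolysis and solubility get relative to predicted affinity.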

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

Open the moPPIt Colab linked from the HuggingFace moPPIt model card.
Make a copy and switch to a GPU runtime.
In the notebook: a) Paste your A4V mutant SOD1 sequence. b) Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch). c) Set peptide length to 12 amino acids. d) Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

Unlike PepMLM, which samples plausible binding sequences conditioned only on the full target sequence, moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM). This lets us explicitly steer generation toward chosen residue indices (here, the A4V-destabilized N-terminal region). It also optimizes properties such as solubility and low hemolysis during generation itself, rather than relying on a screening filter after generation.

Evaluation before clinical studies: before these computational candidates move toward clinical phases, they must pass through a standard wet-lab validation pipeline:
In vitro biophysical characterization using Surface Plasmon Resonance (SPR) or Isothermal Titration Calorimetry (ITC) to determine experimental binding affinity (K_d).
Circular Dichroism (CD) to evaluate peptide secondary structure stability.
Cellular assays (e.g., patient-derived motor neuron cultures) to confirm that the peptide inhibits toxic SOD1 aggregation and preserves cell viability without cytotoxicity, together with a red blood cell hemolysis assay to test the predicted low hemolytic activity.
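To compare the experimental measurements with the PeptiVerse predictions, both need to be on the same scale. pKd is -log10 of Kd in molar units. In SPR, the kinetic Kd is k_off / k_on. Below is a minimal sketch of these conversions, applied to the predicted values from the table. The helper names are ours.

```python
import math


def kd_from_pkd(pkd: float) -> float:
    """Dissociation constant in molar from pKd = -log10(Kd / 1 M)."""
    return 10 ** (-pkd)


def pkd_from_kd(kd_molar: float) -> float:
    """pKd from a measured Kd in molar (e.g. from an SPR or ITC fit)."""
    return -math.log10(kd_molar)


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Kinetic Kd = k_off / k_on, with k_on in M^-1 s^-1 and k_off in s^-1."""
    return k_off / k_on


# Predicted pKd values from the PeptiVerse table, expressed as Kd in nM.
for name, pkd in [("Peptide 1", 6.393), ("Peptide 2", 7.136), ("Control", 6.353)]:
    print(f"{name}: Kd ~ {kd_from_pkd(pkd) * 1e9:.0f} nM")
```

Peptide 1's predicted pKd of 6.393 corresponds to roughly 0.4 µM, and Peptide 2's 7.136 corresponds to roughly 73 nM. Measured SPR or ITC values can be converted with `pkd_from_kd` for a direct comparison.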

Part C: Final Project: L-Protein Mutants

To specifically disrupt the binding interface with the E. coli DnaJ chaperone while preserving the overall structural integrity of the L-protein’s soluble domain, we designed two multi-site mutant candidates (each containing 3 distinct substitutions within residues 1-40):

Multi-Site Candidate 1: `R23A / R24A / P27A`

Mutated Sequence (Soluble Domain): METRFPQQSQQTPASTNRRRPFKAADYACRRQQRSST...
Justification: This design targets the highly basic, rigid positive cluster (RRRPF) in the soluble domain. Mutating Arg23 and Arg24 to alanine removes the positively charged guanidinium side chains that coordinate with negatively charged pockets on the DnaJ surface. The P27A mutation additionally removes a rigid proline kink, introducing backbone flexibility. Together, these three changes are intended to disrupt DnaJ recognition both sterically and electrostatically, favoring a chaperone-independent folding pathway.

Multi-Site Candidate 2: `Q8E / Q9E / H20Y`

Mutated Sequence (Soluble Domain): METRFPQEELTPASTNRRRPFKYEDYPCRRQQRSST...
Justification: This combination focuses on optimizing net charge and evolutionary surface compatibility. The dual Q8E / Q9E substitution introduces a strong local negative charge density at the N-terminus. Our ESM log-likelihood heatmap scores this change favorably, and we expect it to increase cytosolic solubility and expression yield. Mutating the histidine at position 20 to tyrosine (H20Y) introduces an aromatic side chain capable of stacking interactions. This may stabilize the local α-helical fold within the monomer, minimizing kinetic misfolding traps without requiring chaperone assistance.
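Each candidate is reported both as mutation labels and as an explicit sequence, so it is worth checking mechanically that the two agree before ordering constructs. Below is a minimal sketch. It assumes the wild-type and mutant soluble-domain sequences are already aligned with no insertions or deletions. The helper name is ours, and the example uses toy sequences.

```python
def list_substitutions(wild_type: str, mutant: str, offset: int = 0) -> list[str]:
    """Return substitutions like 'Q8E' between two equal-length sequences.

    Positions are 1-based; `offset` shifts the reported numbering if your
    convention differs from simple sequence position.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences differ in length; align them first (indels are not handled)")
    return [
        f"{w}{i + 1 + offset}{m}"
        for i, (w, m) in enumerate(zip(wild_type, mutant))
        if w != m
    ]


print(list_substitutions("METRFPQQ", "METRFPEE"))  # ['Q7E', 'Q8E']
```

If the returned list does not match the labels, either the labels or the pasted sequence needs correcting. A length mismatch is itself a sign that a residue was dropped or inserted.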

![HeatMap]( )

Based on the generated ESM-MaskedLM log-likelihood ratio heatmap, I selected five variants (four point mutations and one double mutant), filtering for the highest-scoring (yellow/bright) hotspots while respecting domain boundaries:
Soluble Region - Q9E: Selected for its prominent positive log-likelihood score at position 9. It replaces a neutral glutamine with a charged glutamic acid. ESM scores this substitution favorably, and we expect it to increase surface solubility and expression in the cytosol.
Soluble Region - T15A: Located in a highly tolerant structural loop. The heatmap shows bright yellow pixels for alanine at this position, suggesting a substitution that preserves structural integrity while potentially testing chaperone-independent folding routes.
Transmembrane Region - I47V: The heatmap shows a continuous horizontal yellow streak for valine (V) across the transmembrane segment. This indicates that keeping a hydrophobic side chain while slightly reducing its volume is evolutionarily favored, which may optimize pore oligomerization kinetics.
Transmembrane Region - F50L: Replacing the aromatic phenylalanine with an aliphatic leucine shows a strong positive score. We expect this change to improve helix-helix packing during multimeric pore assembly.
Combinatorial Variant - Q9E / I47V: A double mutant designed to combine high cytosolic accumulation (soluble-domain optimization) with rapid membrane perforation (transmembrane-domain optimization).
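Each heatmap cell is a log-likelihood ratio. At a given position, it is the model's log-probability of the mutant residue minus that of the wild-type residue, so positive (bright) cells mark substitutions the model prefers over wild type. Below is a minimal sketch of the scoring and ranking step. It assumes you have already extracted per-position log-probabilities from ESM with each position masked in turn; the ESM calls themselves are omitted. The function names are ours, and the toy probabilities are invented for illustration.

```python
import math


def llr_scores(wild_type: str, log_probs: list[dict[str, float]]) -> dict[str, float]:
    """Masked-marginal log-likelihood ratio for every substitution.

    log_probs[i][aa] is the log-probability of amino acid `aa` at 1-based
    position i + 1 when that position is masked.
    """
    scores = {}
    for i, wt in enumerate(wild_type):
        for aa, lp in log_probs[i].items():
            if aa != wt:
                scores[f"{wt}{i + 1}{aa}"] = lp - log_probs[i][wt]
    return scores


def top_mutations(scores: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """The k highest-scoring substitutions (brightest heatmap cells)."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy two-residue example with made-up probabilities (not ESM output).
toy = [
    {"Q": math.log(0.2), "E": math.log(0.6), "A": math.log(0.2)},
    {"T": math.log(0.7), "A": math.log(0.2), "S": math.log(0.1)},
]
print(top_mutations(llr_scores("QT", toy), k=2))
```

A positive score only means the language model finds the substitution more plausible than wild type. Properties such as solubility or oligomerization kinetics remain hypotheses to test experimentally.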