Week 5 HW: Protein Design Part II
Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
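The human SOD1 sequence (UniProt P00441) was retrieved and the A4V mutation was introduced by replacing alanine with valine at residue 4. SOD1 variants are conventionally numbered without the initiator methionine, so residue 4 is the fifth character of the UniProt sequence below.
UniProt (P00441) Sequence: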
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Mutated (A4V) Sequence:
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
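A point substitution is easy to shift by one position or to turn into an insertion by accident, so it can help to apply it programmatically and check the result. The sketch below is illustrative only: `apply_point_mutation` and its `offset` parameter are hypothetical names, and it assumes the numbering convention in which A4V is counted without the initiator methionine. Only the first 20 residues of the P00441 sequence are used to keep the example short.

```python
def apply_point_mutation(seq: str, wt: str, pos: int, mut: str, offset: int = 0) -> str:
    """Return seq with residue `pos` (1-based) changed from `wt` to `mut`.

    `offset` shifts the numbering; offset=1 means position 1 is the residue
    after the initiator Met (an assumption about the numbering convention).
    """
    idx = pos - 1 + offset  # convert to a 0-based string index
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


# First 20 residues of the UniProt P00441 sequence shown above.
wild_type = "MATKAVCVLKGDGPVQGIIN"
mutant = apply_point_mutation(wild_type, "A", 4, "V", offset=1)

# A substitution must preserve length and change exactly one position.
changed = [i for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]
print(mutant, changed)
```

The wild-type check raises an error if the numbering convention is off, and the length comparison separates a true substitution from an insertion.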
PepMLM-650M, conditioned on the mutant sequence, was used to generate peptides of length 12. Three are reported below alongside the known binder FLYRWLPSRRGG, which was added for comparison. Pseudo-perplexity scores were computed for all peptides using the compute_pseudo_perplexity function from the PepMLM Colab notebook. Lower pseudo-perplexity means the model assigns the peptide a higher likelihood given the target; it is a proxy for binder plausibility, not a measured affinity.
| Index | Binder | Pseudo Perplexity |
|---|---|---|
| 1 | KHYPVVAAELKA | 10.79 |
| 2 | WHYYAAALAHKA | 14.46 |
| 3 | WHVVAAAVRWKE | 20.09 |
| 4 | FLYRWLPSRRGG (known binder) | 20.59 |
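For reference, pseudo-perplexity for a masked language model is generally defined as the exponential of the negative mean log-probability of each residue when that residue is masked. The sketch below shows only that final arithmetic step. It is not the notebook's compute_pseudo_perplexity, which may differ in detail, and the log-probabilities are hypothetical placeholders rather than model output.

```python
import math


def pseudo_perplexity(token_log_probs: list[float]) -> float:
    """exp of the negative mean log-probability of each masked residue."""
    if not token_log_probs:
        raise ValueError("need at least one residue log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-residue log-probabilities for a 12-mer (not real model output).
example_log_probs = [-2.4] * 12
ppl = pseudo_perplexity(example_log_probs)
print(round(ppl, 2))
```

A peptide whose residues are each predicted with probability 1 would score exactly 1.0, the lowest possible value.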
Part 2: Evaluate Binders with AlphaFold3
The PepMLM-generated peptides listed above and the known binder FLYRWLPSRRGG were each submitted to the AlphaFold3 server as a separate chain alongside the A4V mutant SOD1 sequence. The resulting ipTM scores were KHYPVVAAELKA (0.48), WHYYAAALAHKA (0.46), WHVVAAAVRWKE (0.37), and FLYRWLPSRRGG (0.32). All three PepMLM candidates scored above the known binder. KHYPVVAAELKA and WHYYAAALAHKA appeared to engage the surface near the β-barrel region, while WHVVAAAVRWKE showed a more peripheral interaction. No peptide reached an ipTM above 0.5, which is consistent with the difficulty of modeling short peptide-protein complexes, so the ranking should be read as relative rather than as evidence of confident binding.
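The pseudo-perplexity and ipTM rankings can be compared directly with a rank correlation. The sketch below is a minimal, tie-free Spearman calculation over the four values reported above; with only four points it is descriptive, not a statistical test, and the function names are illustrative.

```python
def ranks(values: list[float]) -> list[int]:
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Values reported above, in table order.
perplexity = [10.79, 14.46, 20.09, 20.59]
iptm = [0.48, 0.46, 0.37, 0.32]
rho = spearman_rho(perplexity, iptm)
print(rho)
```

A value of -1 means the two orderings are exact mirror images: in this small set, lower pseudo-perplexity always coincides with higher ipTM.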
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
PeptiVerse predicted all four peptides to be soluble and non-hemolytic, with predicted binding affinities of 5.89 to 6.70 pKd (all classified as weak binding). ipTM and predicted affinity did not track each other: WHVVAAAVRWKE had the highest pKd (6.70) despite a lower ipTM (0.37), while KHYPVVAAELKA had the best ipTM (0.48) but the lowest pKd (5.89). The known binder FLYRWLPSRRGG showed the lowest solubility probability (0.61) and the highest net charge (+2.76), which may be liabilities for development despite its experimental validation.
Selected candidate: KHYPVVAAELKA. This peptide ranked first in ipTM (0.48) and in PepMLM pseudo-perplexity (10.79), has a predicted solubility probability of 1.00, is predicted to be non-hemolytic (0.07), and carries a near-neutral net charge (+0.85). Its predicted affinity is the lowest of the set, but all four peptides fall in the same weak-binding class, so it remains the most promising candidate overall.
Part 4: Generate Optimized Peptides with moPPIt
Using moPPIt with affinity and motif guidance targeting residues 1–10 of the A4V mutant SOD1 (the N-terminal region destabilized by the mutation), 10 peptides of length 12 were generated. The results showed a trade-off between affinity and motif scores. CVYCCVDGCVWV achieved the highest predicted affinity (pKd 8.12), crossing the threshold for moderate binding, but had a low motif score (0.39), suggesting that it does not specifically engage the target region. In contrast, DTPPCYAPVICY combined strong motif engagement (0.73) with reasonable affinity (pKd 6.81). Compared with the PepMLM peptides, the moPPIt candidates are richer in cysteines, which could form stabilizing disulfide bonds, and show higher predicted affinity overall. This is consistent with a benefit of multi-objective guided optimization over target-conditioned sampling without explicit objectives.
Before advancing any of these peptides toward clinical studies, further evaluation would include: (1) experimental binding validation via SPR or ITC; (2) AlphaFold3 structural modeling of the moPPIt candidates; (3) stability and proteolysis assays; (4) cell-based toxicity screens; and (5) in vivo pharmacokinetic profiling.
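The affinity and motif trade-off described for the moPPIt candidates can be expressed as a Pareto comparison, where a candidate is kept unless another is at least as good on both objectives and strictly better on one. The sketch below uses only the two candidates whose scores are quoted in this section; the function names are illustrative and are not part of moPPIt.

```python
def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: dict[str, tuple[float, float]]) -> list[str]:
    """Names of candidates not dominated by any other candidate."""
    return [
        name
        for name, score in candidates.items()
        if not any(dominates(other, score) for o, other in candidates.items() if o != name)
    ]


# (predicted pKd, motif score) for the two candidates discussed in this section.
candidates = {
    "CVYCCVDGCVWV": (8.12, 0.39),
    "DTPPCYAPVICY": (6.81, 0.73),
}
front = pareto_front(candidates)
print(front)
```

Neither peptide dominates the other, so both stay on the front; choosing between them depends on how much weight is given to engaging the N-terminal target region.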
Part B: MS2 Phage Lysis Protein Engineering (Option 1: Mutagenesis)
Overview
The MS2 lysis protein (L-protein, UniProtKB P03609) was analyzed using the ESM2 protein language model to predict the effect of single amino acid substitutions across all positions. The sequence used was:
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
The protein contains two functional regions:
- Soluble N-terminal domain (residues 1–37): responsible for interaction with the E. coli chaperone DnaJ
- Transmembrane domain (residues 38–75): responsible for membrane integration and lysis activity
ESM2 Scoring and Correlation with Experimental Data
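The ESM2 notebook generated log-likelihood ratio (LLR) scores for all possible single amino acid substitutions at every position. The LLR is the model's log-probability of the mutant residue minus its log-probability of the wild-type residue at that position, so a positive score indicates that the model considers the mutation more favorable than the wild-type residue in that context.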
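As a minimal illustration of the score itself, the sketch below computes an LLR from a per-position probability table. The probabilities are hypothetical placeholders, not ESM2 output, and the way the notebook obtains them (for example, whether the position is masked) is not specified here.

```python
import math


def llr(probs: dict[str, float], wild_type: str, mutant: str) -> float:
    """log p(mutant) - log p(wild type) at a single position."""
    return math.log(probs[mutant]) - math.log(probs[wild_type])


# Hypothetical model probabilities at one position (not real ESM2 output).
position_probs = {"K": 0.05, "N": 0.30, "D": 0.10}
score = llr(position_probs, wild_type="K", mutant="N")
print(round(score, 2))
```

Scoring the wild-type residue against itself gives exactly 0, which is the natural baseline for reading the sign of the score.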
Cross-referencing the ESM2 scores with the experimental L-protein mutant dataset revealed a weak trend in the expected direction: mutants with confirmed lysis activity (Lysis=1) had a mean ESM2 score of -0.156, compared with -0.407 for non-functional mutants (Lysis=0). This suggests that ESM2 captures some evolutionary signal relevant to protein function but is insufficient as a standalone predictor. A notable exception was C29R (score +2.40), which the model ranked highly but which showed no lysis activity experimentally, highlighting the limitations of sequence-only models for membrane proteins.
Proposed Mutations
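Five mutations were selected based on positive ESM2 scores, avoidance of experimentally confirmed loss-of-function substitutions, and biological rationale. One caveat applies to C29S: the non-lytic C29R falls at the same position, although serine is a more conservative replacement for cysteine than arginine. The selected mutations are: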
| # | Mutation | Region | ESM2 Score | Rationale |
|---|---|---|---|---|
| 1 | F5Q | Soluble | +1.80 | Position not highly conserved; Q substitution predicted favorable by ESM2 and may reduce hydrophobic burial in the soluble domain |
| 2 | C29S | Soluble | +2.04 | Free cysteines can form aberrant disulfide bonds destabilizing the protein; serine is a conservative, polar replacement with high ESM2 confidence |
| 3 | Y39L | Transmembrane | +2.24 | Tyrosine is poorly suited for membrane-buried positions; leucine increases hydrophobicity and favors stable TM helix formation |
| 4 | A45L | Transmembrane | +1.54 | Leucine substitution at this position increases hydrophobic packing within the TM helix, potentially improving membrane integration |
| 5 | E61L | Transmembrane | +1.82 | Glutamate is a charged residue unfavorable in transmembrane regions; leucine substitution improves the hydrophobic character of the helix |
Mutations 1 and 2 target the soluble domain, which mediates the DnaJ interaction, and may alter how much the protein depends on DnaJ for folding. Mutations 3, 4, and 5 target the transmembrane domain to improve membrane integration efficiency, which could accelerate lysis kinetics and shorten the window in which E. coli can acquire resistance.
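The proposed mutation codes can be checked against the L-protein sequence before any further analysis. The sketch below confirms that each wild-type residue matches the sequence given in the overview and assigns the region using the 1–37 and 38–75 boundaries stated there. `parse_mutation` and `check_mutation` are hypothetical helper names, not part of the ESM2 notebook.

```python
import re

L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
TM_START = 38  # transmembrane domain spans residues 38-75, as stated in the overview


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a code such as 'F5Q' into (wild type, 1-based position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code}")
    wt, pos, mut = match.groups()
    return wt, int(pos), mut


def check_mutation(code: str, seq: str = L_PROTEIN) -> str:
    """Confirm the wild-type residue matches and return the region name."""
    wt, pos, _ = parse_mutation(code)
    if not 1 <= pos <= len(seq) or seq[pos - 1] != wt:
        raise ValueError(f"{code} does not match the sequence")
    return "Transmembrane" if pos >= TM_START else "Soluble"


proposed = ["F5Q", "C29S", "Y39L", "A45L", "E61L"]
regions = {code: check_mutation(code) for code in proposed}
print(regions)
```

All five codes match the sequence, and the region assignments agree with the Region column of the table.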