Week 5 hw: protein design part ii
Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
- Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
{Normal} MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
{Mutation} MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
- Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card: Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
| # | Binder | Pseudo Perplexity |
|---|---|---|
| 0 | WRYGPAAAALGX | 5.466890 |
| 1 | WRNPATAIKLWK | 17.390192 |
| 2 | WRYPVVVIERWK | 19.864337 |
| 3 | WHYYVAAAARKE | 13.638206 |
| 4 | FLYRWLPSRRGG | — |
Part 2: Evaluate Binders with AlphaFold3 Navigate to the AlphaFold Server: alphafoldserver.com For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

| 0 | WRYGPAAAALGX | The peptide interacts with the outer surface of the β-barrel, positioned along one side of the barrel. It appears surface-bound and does not clearly penetrate the core. |
|---|

| 1 | WRNPATAIKLWK | The peptide appears to bind along the top region of the β-barrel, extending outward as a flexible loop. It stays mostly surface-exposed rather than deeply buried and approaches the N-terminal side of the protein. |
|---|

| 2 | WRYPVVVIERWK | The peptide lies along the side of the β-barrel but with weaker interface confidence. It looks loosely associated and surface-bound, with limited contact area |
|---|

| 3 | WHYYVAAAARKE | The peptide approaches a loop region adjacent to the β-barrel, remaining mostly solvent-exposed and not deeply buried. Interaction appears relatively shallow. |
|---|

| 4 | FLYRWLPSRRGG | The peptide lies along the side of the β-barrel but with weaker interface confidence. It looks loosely associated and surface-bound, with limited contact area |
|---|
In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.
The predicted ipTM scores range from ~0.27 to 0.46, indicating generally weak to moderate confidence in the peptide–protein interface. The highest value (0.46) corresponds to WRYGPAAAALGX, which shows the most consistent interaction near the β-barrel surface and toward the N-terminal region where the A4V mutation is located. Most peptides appear surface-bound rather than buried and interact mainly with β-barrel edges or loop regions rather than penetrating the barrel core or clearly targeting a strong dimer interface.
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:
Paste the peptide sequence. Paste the A4V mutant SOD1 sequence in the target field. Check the boxes Predicted binding affinity Solubility Hemolysis probability Net charge (pH 7) Molecular weight
| Peptide | AF3 Result |
|---|---|
| WRYGPAAAALGX |
![]() |
| WRNPATAIKLWK |
![]() |
| RYPVVVIERWK |
![]() |
| WHYYVAAAARKE |
![]() |
| FLYRWLPSRRGG |
![]() |
Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
Higher ipTM scores moderately correlated with stronger predicted binding affinity: WRYGPAAAALGX (ipTM 0.46) showed the best binding (-8.2 kcal/mol) alongside excellent solubility (1.00) and low hemolysis (0.22), while weaker structures like WRNPATAIKLWK (ipTM 0.27) had poorer affinity (-6.5 kcal/mol). No strong binders had major safety issues—all were highly soluble and mostly non-hemolytic, though WHYYVAAAARKE’s high charge (+3.6) raises immunogenicity concerns. WRYGPAAAALGX best balances strong structural confidence, potent binding, and optimal therapeutic properties, making it the top candidate to advance.
Choose one peptide you would advance and justify your decision briefly.
| 🏆 Top Candidate | WRYGPAAAALGX (P1) |
|---|---|
| ipTM | 0.46 (highest) |
| Binding | -8.2 kcal/mol |
| Solubility | 1.00 (perfect) |
| Hemolysis | 0.22 (safe) |
| Net Charge | +1.8 (optimal) |
| MW | 1464 Da (ideal) |
Justification: WRYGPAAAALGX outperforms the control with superior ipTM confidence while maintaining excellent therapeutic properties across all safety metrics. Advances to synthesis.
Part 4: Generate Optimized Peptides with moPPIt
In the notebook: Paste your A4V mutant SOD1 sequence. Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch). Set peptide length to 12 amino acids. Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.

After generation, briefly describe how these moPPit peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?
moPPit peptides are more hydrophobic and have better balanced charge profiles than PepMLM peptides, giving them superior membrane permeability and lower immunogenicity risk. Before clinical studies, they need binding validation through biophysical assays, structural confirmation with computational modeling, cellular efficacy testing, pharmacokinetic analysis, and comprehensive toxicology screening to ensure safety and efficacy.
Part C: Final Project: L-Protein Mutants
High level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein of a MS2-phage. This mechanism is key to the understanding of how phages can potentially solve antibiotic-resistance.
This homework requires computation that might take you a while to run, so please get started early.
L-Protein Engineering | Option 1: Mutagenesis
Part B (Final Project: L-Protein Mutants)
DATA: L-Protein and DNAj Sequence
Lysis Protein Sequence (UniProtKB ID: https://www.uniprot.org/uniprotkb/P03609/entry**)**
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
DNAj sequence (UniProtKB ID: https://www.uniprot.org/uniprotkb/P03609/entry**)**
MAKQDYYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDSQKRAAYDQYGHAAFEQGGMGGGGFGGGADFSDIFGDVFGDIFGGGRGRQRAARGADLRYNMELTLEEAVRGVTKEIRIPTLEECDVCHGSGAKPGTQPQTCPTCHGSGQVQMRQGFFAVQQTCPHCQGRGTLIKDPCNKCHGHGRVERSKTLSVKIPAGVDTGDRIRLAGEGEAGEHGAPAGDLYVQVQVKQHPIFEREGNNLYCEVPINFAMAALGGEIEVPTLDGRVKLKVPGETQTGKLFRMRGKGVKSVRGGAQGDLLCRVVVETPVGLNERQKQLLQELQESFGGPTGEHNSPRSKSFFDGVKKFFDDLTR

“Improve L-protein lysis efficiency” Design goal: Engineer the L-protein to: Function independently of DnaJ chaperone


Used AlphaFold-Multimer to predict the L-protein:DnaJ complex structure by inputting both full sequences, which visualized their interaction interface. The results show a high-confidence prediction with most regions in blue (pLDDT above 80), confirming the model is reliable. The L-protein soluble domain (residues 1-40, shown as the green chain) directly contacts DnaJ at the interface, while red low-confidence regions there mark ideal mutation targets. Good MSA coverage indicates solid evolutionary data supporting this prediction. This confirms the L-protein needs DnaJ binding for proper folding and activation, explaining its dependence for lysis activity. The design strategy targets soluble domain residues at the L:DnaJ contact points using ProteinMPNN to create new L-protein sequences that fold correctly without DnaJ binding, enabling faster, DnaJ-independent lysis for better efficiency.
| index | Binder | Pseudo Perplexity |
|---|---|---|
| 0 | TTGSDTTDLDLLLIAILLLVIILSSSQLILSSRLLLLKTPTEEEESELLLSLSLAAGLLLLLLLLLLLDLFQKRL | 11.653811063556473 |
| 1 | STAEDQSILELVKLLALLALALIASQASQLLILLLYSETETSAEAALVEQLALLVVLLLTALLSILLDDRPRRKX | 14.229123828682754 |
| 2 | TTTASSTLEDLIIAAARILLALQLSLAALQQSRLLLEEETTSASLEKVKLSSAAAALLLALVLSILLLLLQLRLX | 10.672718611626152 |
| 3 | TEEEQSASILEVLYSLAISVQSISLQQLILQLLLLLKTTTTPSSEVELQALALLLLALLAALAQALQDDLQRFXX | 16.25189627283092 |
| 4 | LTESEQDSLELLLIISLLLALIASASSAQSSIYSLLKETTTTLAAALVKLLSSLLLGLSARVSLAVQLDLFQLXL | 19.456952695829504 |
| 5 | SETGEQTSSELLLLIILALLAQLKSLQLLSSLYLYSLKKTTPESEEKVKLLSLLLVLLLLLLALSALDDDPQLLX | 14.27825426646581 |
| 6 | TTGSSQTIIESLLLIIILILLLALQAQSSIQRRSSSLLETDPEESEEVKQLLALLLSLSALLALAVQEQQPRLXR | 14.064147241061706 |
| 7 | SEAAETSSILEVIILLLRLIIQLASQAALSRLLYSELEETDPLASVKIKQYLSALALLLAQAQSLVIELRQFFRR | 19.908651203447494 |
| 8 | STAAQTTSILLAKLASLIIILIALSSLLQQSSYYSLLLETTEAALAKIQLAALLTASALSALLLIVLLRQQRFXX | 13.49628012212033 |
| 9 | STAADQSDSLEILLALALLVLIAALQQLLISLLSLLLEEDEPLSLAKVLLLSSAAALLLLLALALALQDLPQRKX | 10.732006736047987 |
| 10 | TPGAEETIIDLIIALSLAIILQQLAQLQQSSLSLSLLLTTDEEASEEELLLLLAVVLLLLLLSLLAADQQFRLRX | 11.857189733503898 |
| 11 | DTASSSSSLQEVYISALLIILLILAQAAIIQIYLLLLKETPTALSSLEQQLLLLAVLLTLLLLQAAILQDQQLXX | 13.780232244542274 |
| 12 | TTTGDSDSELLAIILLLLLLAIALQALQILQRLLLYLKKDTTLSEAEVLELSSVAVALLAAALLLVALQQFQRRX | 10.38692160299941 |
| 13 | LTTESQTLEDSIYILLLRLLASQSALLASQRSLLLYLLEPTPLSAEEVQQLLSLLLSLSAAVLAILQDLQPQRXL | 15.809961183084631 |
| 14 | TTESQQTIEEALILALIIIIQILAALQLSLSLLLYYEELEPESLLVEVEQLLLILAGLLLLRSLAAAQRRLQRLX | 15.932716542402794 |
ProteinMPNN designed 15 L-protein variants (75 aa) targeting soluble domain mutations to disrupt DnaJ interface. Top candidate (Peptide 5, perplexity 14.3) shows optimized N-terminal sequence for DnaJ-independent folding while preserving TM lysis function. This improves lysis efficiency by reducing chaperone dependence.



L-Protein Engineering | Option 3: Random Mutagenesis Create a python function to generate random mutation combinations with at least 2 residues by using the information found in mutational analysis experiments here. Co-fold the random mutation with DnaJ using Af2_Multimer. Open ended question: How do you define how “good” or effective mutants are?




