Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge: Design short peptides that bind mutant SOD1. Then decide which ones are worth advancing toward therapy.

You will use three models developed in our lab:

PepMLM: target sequence-conditioned peptide generation via masked language modeling https://colab.research.google.com/drive/1z37YcBtwhd-RzNrWaTiKUhTHwZGKzdGc?usp=sharing

Field	Value
Binder sequence	KDVEHTLDHYALKNR
Length	15 amino acids
Pseudo Perplexity	17.59

PeptiVerse: therapeutic property prediction
moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation. Using the PepMLM Colab linked from the Hugging Face PepMLM-650M model card:
- Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
- To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
- Record the perplexity scores that indicate PepMLM’s confidence in the binders.

I retrieved the human SOD1 sequence from UniProt entry P00441 and introduced the A4V mutation.

Mutant SOD1 sequence: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
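
The substitution step can be sketched in a few lines of Python. SOD1 variants such as A4V are conventionally numbered on the mature protein, without the initiator methionine, so alanine 4 sits at position 5 of the UniProt sequence, which is where the valine appears in the mutant sequence above. The helper name `apply_point_mutation` and its `offset` parameter are illustrative assumptions, and only the N-terminal residues are used to keep the example short.

```python
def apply_point_mutation(seq: str, wt: str, pos: int, mut: str, offset: int = 0) -> str:
    """Return seq with the residue at 1-based position pos (+ offset) changed from wt to mut."""
    idx = pos + offset - 1  # convert to a 0-based string index
    if not 0 <= idx < len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


# N-terminal residues of wild-type human SOD1, initiator Met included.
WT_NTERM = "MATKAVCVLKG"

# offset=1 accounts for the initiator Met that mature numbering leaves out.
mutant_nterm = apply_point_mutation(WT_NTERM, "A", 4, "V", offset=1)
print(mutant_nterm)  # MATKVVCVLKG
```

Checking the wild-type residue before substituting catches the off-by-one case: without the offset, position 4 is a lysine and the function raises an error instead of silently mutating the wrong residue.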

Using the PepMLM Colab linked from the Hugging Face PepMLM-650M model card, I ran peptide generation conditioned on the mutant SOD1 sequence. The output returned the following peptide:

Binder	Pseudo Perplexity
SHYPEVTAVYKAKKX	10.084192055372938
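
For context on the score, pseudo-perplexity for a masked language model is commonly defined as the exponential of the mean negative log-probability the model assigns to each true residue when that residue is masked. The sketch below assumes those per-residue probabilities are already available; PepMLM's own scoring code is not shown in this writeup, so the function name and the input values are illustrative only.

```python
import math


def pseudo_perplexity(token_probs):
    """exp(mean negative log-probability) over per-residue masked predictions."""
    if not token_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical input: a model that guesses uniformly over the 20 amino acids
# for every position of a 12-residue peptide scores about 20.
uniform_probs = [1 / 20] * 12
print(pseudo_perplexity(uniform_probs))
```

Lower values mean the model finds the peptide more plausible, so a uniform guess over the 20 amino acids (about 20) is a useful reference point for the scores in this section.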

For comparison, I added the known SOD1-binding peptide:

Binder	Pseudo Perplexity
FLYRWLPSRRGG	N.A.

Note: Although the assignment requested four peptides of length 12 amino acids, my PepMLM run returned only one peptide. The returned sequence, SHYPEVTAVYKAKKX, is 15 amino acids long and contains a terminal X, so it does not exactly match the requested format. I am reporting the model output as returned by the Colab.

Part 2: Evaluate Binders with AlphaFold3

Navigate to the AlphaFold Server: alphafoldserver.com

For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex. Record the ipTM score and briefly describe where the peptide appears to bind:
- Does it localize near the N-terminus where A4V sits?
- Does it engage the β-barrel region or approach the dimer interface?
- Does it appear surface-bound or partially buried?

In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

I submitted the mutant SOD1 sequence together with the generated peptide and the known binder as separate chains to the AlphaFold Server. However, the PepMLM-generated peptide contained a terminal X (SHYPEVTAVYKAKKX), which is a non-standard/ambiguous residue and may have interfered with processing on the server. At the time of submission, the job was saved but no completed structure or ipTM output was returned, so I was not able to extract a confident ipTM score or binding pose for this sequence.

I therefore note that AlphaFold3 evaluation of the PepMLM-generated peptide may require either a rerun or a cleaned peptide sequence without ambiguous residues. The known SOD1-binding peptide FLYRWLPSRRGG was tested separately as a control.
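
A minimal sketch of the cleaning step, assuming the only problem is a non-canonical residue at the peptide termini. The function names are hypothetical, and whether the AlphaFold Server actually rejects X was not confirmed in this run.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def nonstandard_positions(peptide):
    """List (1-based position, residue) for every non-canonical residue."""
    return [(i + 1, aa) for i, aa in enumerate(peptide) if aa not in CANONICAL]


def strip_terminal_unknowns(peptide):
    """Trim non-canonical residues from both ends; interior ones are left untouched."""
    start, end = 0, len(peptide)
    while start < end and peptide[start] not in CANONICAL:
        start += 1
    while end > start and peptide[end - 1] not in CANONICAL:
        end -= 1
    return peptide[start:end]


raw = "SHYPEVTAVYKAKKX"
print(nonstandard_positions(raw))    # [(15, 'X')]
print(strip_terminal_unknowns(raw))  # SHYPEVTAVYKAKK
```

The trimmed peptide is 14 residues long, so it still differs from the 12-residue length requested in the assignment.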

I modeled mutant SOD1 together with the known binder FLYRWLPSRRGG in AlphaFold3. The resulting multichain prediction gave an ipTM of 0.90 and a pTM of 0.92, indicating a strong and confident predicted interface. The output included two SOD1 chains and one peptide chain, consistent with a dimeric SOD1 complex with peptide binding. This suggests that the known binder forms a plausible interaction with mutant SOD1 and provides a strong positive-control reference for comparison with the PepMLM-generated peptide.
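
Because this prediction contains two SOD1 chains and one peptide chain, the global ipTM also reflects the SOD1 dimer interface, so the peptide-specific pairwise values are worth checking. The sketch below assumes the summary-confidences JSON downloaded from the AlphaFold Server exposes `iptm`, `ptm`, and a `chain_pair_iptm` matrix; those key names should be verified against the actual files. The `iptm` and `ptm` numbers are the ones reported above, while the matrix entries are made-up placeholders, not results.

```python
import json

# Hypothetical stand-in for one downloaded summary-confidences file.
# The chain_pair_iptm matrix is placeholder data for illustration only.
EXAMPLE_JSON = """
{
  "iptm": 0.90,
  "ptm": 0.92,
  "chain_pair_iptm": [[0.90, 0.91, 0.50],
                      [0.91, 0.90, 0.50],
                      [0.50, 0.50, 0.40]]
}
"""


def peptide_interface_scores(summary, peptide_chain):
    """Return the pairwise ipTM between the peptide chain and every other chain."""
    row = summary["chain_pair_iptm"][peptide_chain]
    return [score for j, score in enumerate(row) if j != peptide_chain]


summary = json.loads(EXAMPLE_JSON)
print(summary["iptm"], summary["ptm"])
# Chain order assumed here: SOD1, SOD1, peptide (index 2).
print(peptide_interface_scores(summary, peptide_chain=2))
```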

Peptide	ipTM	Binding location	Structural interpretation
SHYPEVTAVYKAKKX	N/A	Not determined	PepMLM peptide contains terminal X, which may complicate AlphaFold evaluation
FLYRWLPSRRGG	0.90	[inspect visually]	Strong predicted interface; useful positive control

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:
- Paste the peptide sequence.
- Paste the A4V mutant SOD1 sequence in the target field.
- Check the boxes: Predicted binding affinity, Solubility, Hemolysis probability, Net charge (pH 7), Molecular weight.

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties? Choose one peptide you would advance and justify your decision briefly.

I evaluated the PepMLM-generated peptide SHYPEVTAVYKAKKX in PeptiVerse against mutant A4V SOD1. The peptide was predicted to be soluble with a solubility score of 1.000, and non-hemolytic with a hemolysis probability of 0.022. It had a predicted net charge of +1.55 at pH 7, a molecular weight of 1603.0 Da, an isoelectric point of 9.40, and a hydrophobicity (GRAVY) of -0.81, suggesting a relatively hydrophilic peptide with favorable physicochemical properties.
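
As a sanity check on the hydrophobicity value, GRAVY can be recomputed by hand as the mean Kyte-Doolittle hydropathy over the sequence. How PeptiVerse treats the ambiguous X is not stated, so the `count_unknown` switch below is an assumption introduced for illustration.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq, count_unknown=False):
    """Mean hydropathy; unknown residues add 0 and optionally count toward the length."""
    known = [KYTE_DOOLITTLE[aa] for aa in seq if aa in KYTE_DOOLITTLE]
    n = len(seq) if count_unknown else len(known)
    if n == 0:
        raise ValueError("no residues to score")
    return sum(known) / n


peptide = "SHYPEVTAVYKAKKX"
print(round(gravy(peptide), 2))                      # -0.86 (X ignored)
print(round(gravy(peptide, count_unknown=True), 2))  # -0.81 (X counted in the length)
```

The second value matches the reported -0.81, which suggests, but does not confirm, that the terminal X was counted in the peptide length.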

Compared with the AlphaFold3 analysis, the generated peptide showed a promising therapeutic-property profile, but its structural evaluation was limited because the sequence ends with an ambiguous residue (X). In contrast, the known SOD1-binding peptide FLYRWLPSRRGG produced a strong AlphaFold3 interface score (ipTM = 0.90), giving more confidence in its binding. Therefore, although SHYPEVTAVYKAKKX appears attractive in terms of solubility and low hemolysis risk, the current evidence does not show that it has stronger structural binding than the known binder.

If I had to advance one peptide, I would choose FLYRWLPSRRGG because it currently has the strongest direct structural support for binding to SOD1. However, SHYPEVTAVYKAKKX could still be a promising candidate for follow-up if the ambiguous terminal residue is resolved and the complex is re-evaluated structurally.

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

Open the moPPIt Colab linked from the Hugging Face moPPIt model card. Make a copy and switch to a GPU runtime. In the notebook:
- Paste your A4V mutant SOD1 sequence.
- Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
- Set peptide length to 12 amino acids.
- Enable motif and affinity guidance (and solubility/hemolysis guidance if available).
- Generate peptides.

After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

I opened the moPPIt Colab notebook and configured a de novo peptide design run for the A4V mutant SOD1 target. The design setup was saved successfully with a binder length of 12 amino acids and 10 samples requested. The enabled optimization objectives were hemolysis, non-fouling, solubility, and half-life, all with equal weights ([1.0, 1.0, 1.0, 1.0]). No fixed motif positions were specified in this run.
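
To make the equal-weight setting concrete, one simple way to combine several objectives is a normalized weighted sum. This only illustrates what a weight vector expresses; it is not moPPIt's MOG-DFM guidance. The candidate names and scores are hypothetical placeholders, each assumed to be scaled so that higher is better.

```python
OBJECTIVES = ("hemolysis", "non_fouling", "solubility", "half_life")
WEIGHTS = (1.0, 1.0, 1.0, 1.0)  # equal weights, as in the saved setup


def weighted_score(scores, weights=WEIGHTS):
    """Weighted mean of per-objective scores (higher is assumed to be better)."""
    total = sum(weights)
    return sum(w * scores[name] for name, w in zip(OBJECTIVES, weights)) / total


# Hypothetical placeholder scores, not moPPIt output.
candidates = {
    "candidate_A": {"hemolysis": 0.9, "non_fouling": 0.5, "solubility": 0.8, "half_life": 0.4},
    "candidate_B": {"hemolysis": 0.6, "non_fouling": 0.7, "solubility": 0.7, "half_life": 0.7},
}

ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
print(ranked)  # ['candidate_B', 'candidate_A']
```

With equal weights no single property dominates, so a balanced candidate can outrank one that excels on a single objective.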

Compared with PepMLM, moPPIt represents a more controlled peptide-design framework. PepMLM generates plausible binders conditioned primarily on the target sequence, whereas moPPIt is designed to optimize multiple therapeutic and developability objectives simultaneously during generation. This means moPPIt is better suited for producing peptide candidates that are not only potential binders, but also more favorable in terms of safety and drug-like properties such as reduced hemolysis, improved solubility, lower fouling tendency, and longer half-life.

In this case, I was able to configure the moPPIt design setup, but full peptide generation could not be completed reliably because the notebook required a higher-memory GPU runtime than was available in my Colab environment. Therefore, I could not obtain final optimized peptide sequences for direct comparison with the PepMLM output. Nevertheless, the moPPIt setup illustrates how peptide design can be shifted from simple sampling toward multi-objective optimization.

Before advancing any moPPIt-generated peptide toward clinical studies, I would first evaluate the candidates computationally using structure prediction and peptide-property assessment tools, then validate binding experimentally using biophysical assays such as SPR, MST, or ITC. The most promising peptides would next be tested in cell-based assays for effects on mutant SOD1 function, aggregation, and toxicity, followed by preclinical studies of safety, stability, pharmacokinetics, and efficacy. These steps would be essential before considering any clinical development.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

Boltz JQ1 Output

Boltz Hit Output

Boltz Lead Output

Boltz predicted pocket structure

Part C: Final Project: L-Protein Mutants

High-level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein of the MS2 phage. Understanding this mechanism is key to understanding how phages could help address antibiotic resistance.

This homework requires computation that might take you a while to run, so please get started early.

Mutagenesis

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'A', 'LLR_Score': 0.17826223373413086}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'C', 'LLR_Score': -1.320490837097168}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'D', 'LLR_Score': -0.9523532390594482}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'E', 'LLR_Score': -0.7839055061340332}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'F', 'LLR_Score': -0.09411430358886719}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'G', 'LLR_Score': -0.478163480758667}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'H', 'LLR_Score': -1.1169145107269287}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'I', 'LLR_Score': 0.03905367851257324}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'K', 'LLR_Score': -0.6343975067138672}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'L', 'LLR_Score': 0.9823817014694214}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'M', 'LLR_Score': -1.3153700828552246}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'N', 'LLR_Score': -0.8423151969909668}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'P', 'LLR_Score': -0.3809645175933838}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'Q', 'LLR_Score': -0.44191431999206543}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'R', 'LLR_Score': -0.36664485931396484}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'S', 'LLR_Score': 0.11946296691894531}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'T', 'LLR_Score': 0.0}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'V', 'LLR_Score': 0.13974881172180176}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'W', 'LLR_Score': -1.768484115600586}
{'Position': 66, 'Wild_Type_AA': 'T', 'Mutation_AA': 'Y', 'LLR_Score': -1.20965576171875}
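
For reference, a log-likelihood ratio (LLR) of this kind is usually the log-probability of the mutant residue minus that of the wild-type residue at the same position, which is why the T to T row above scores exactly 0. The sketch below uses hypothetical toy logits over three residues instead of real language-model output, and the function names are illustrative.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a dict of residue -> logit."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {aa: v - log_z for aa, v in logits.items()}


def llr_scores(logits, wild_type):
    """LLR of each residue relative to the wild-type residue at one position."""
    logp = log_softmax(logits)
    return {aa: logp[aa] - logp[wild_type] for aa in logp}


# Hypothetical logits for one position; a real model scores all 20 residues.
toy_logits = {"T": 1.0, "L": 2.0, "W": -1.0}
scores = llr_scores(toy_logits, wild_type="T")
print(scores)
```

A positive LLR means the model considers the substitution more likely than the wild-type residue, and a negative LLR means less likely.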
      Position Wild_Type_AA Mutation_AA  LLR_Score
989         50            K           L   2.561468
574         29            C           R   2.395427
769         39            Y           L   2.241780
575         29            C           S   2.043150
173          9            S           Q   2.014325
573         29            C           Q   1.997049
572         29            C           P   1.971029
569         29            C           L   1.960646
987         50            K           I   1.928801
1049        53            N           L   1.864932
1209        61            E           L   1.818098
1029        52            T           L   1.813968
984         50            K           F   1.802069
576         29            C           T   1.797247
568         29            C           K   1.795878
93           5            F           Q   1.795244
94           5            F           R   1.659717
560         29            C           A   1.648656
534         27            Y           R   1.628061
434         22            F           R   1.602028
92           5            F           P   1.596891
997         50            K           V   1.594576
995         50            K           S   1.574557
96           5            F           T   1.559024
95           5            F           S   1.556417
889         45            A           L   1.539248
775         39            Y           S   1.517457
535         27            Y           S   1.497053
789         40            V           L   1.477630
529         27            Y           L   1.474637
435         22            F           S   1.423358
563         29            C           E   1.383281
760         39            Y           A   1.364999
571         29            C           N   1.362601
980         50            K           A   1.357795
567         29            C           I   1.344121
89           5            F           L   1.332615
334         17            N           R   1.323651
767         39            Y           I   1.320103
776         39            Y           T   1.302804
514         26            D           R   1.268762
566         29            C           H   1.246107
764         39            Y           F   1.245851
777         39            Y           V   1.244390
454         23            K           R   1.236555
494         25            E           R   1.229350
474         24            H           R   1.227779
996         50            K           T   1.222131
533         27            Y           Q   1.218851
536         27            Y           T   1.215567
   Amino Acid  Position     Score
0           L        50  2.561468
1           L        39  2.241780
2           I        50  1.928801
3           L        53  1.864932
4           L        52  1.813968
5           F        50  1.802069
6           V        50  1.594576
7           S        50  1.574557
8           L        45  1.539248
9           S        39  1.517457
10          L        40  1.477630
11          A        39  1.364999
12          A        50  1.357795
13          I        39  1.320103
14          T        39  1.302804
15          F        39  1.245851
16          V        39  1.244390
17          T        50  1.222131
18          L        54  1.120860
19          R        39  1.064191
Position of the mutation in L	Base Pair Changed	Amino Acid Position	Amino Acid Change	Lysis	Protein Levels (N.D. = not determined)
3	G->T	1	M->I	0	0
3	G->A	1	M->I	0	0
2	T->C	1	M->T	0	0
4	G->T	2	E->Stop	0	N.D.
8	C->T	3	T->I	0	0
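
The relationship between the nucleotide position and the amino acid position in these rows is simple codon arithmetic. The sketch assumes the first column is a 1-based coordinate within the L coding sequence, which is consistent with the rows shown; the function name is illustrative.

```python
def codon_position(nt_position):
    """Map a 1-based nucleotide position to (codon number, position within codon), both 1-based."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


for nt in (2, 3, 4, 8):
    codon, offset = codon_position(nt)
    print(f"nucleotide {nt} -> codon {codon}, base {offset} of the codon")
```

For example, nucleotide 4 is the first base of codon 2, and nucleotide 8 is the second base of codon 3, matching the amino acid positions listed in the table.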
      Position Wild_Type_AA Mutation_AA  LLR_Score
1332        50            K           L   2.561468
770         29            C           R   2.395427
777         29            C           R   2.395427
1035        39            Y           L   2.241780
771         29            C           S   2.043150
778         29            C           S   2.043150
229          9            S           Q   2.014325
236          9            S           Q   2.014325
776         29            C           Q   1.997049
769         29            C           Q   1.997049

                                     mutated_sequence amino_acid  position  effect_esm
0   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLAVLIFLAI...          A        38    0.007527
1   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLCVLIFLAI...          C        38    0.007512
2   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLDVLIFLAI...          D        38    0.007670
3   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLEVLIFLAI...          E        38    0.007671
4   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLFVLIFLAI...          F        38    0.007551
5   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLGVLIFLAI...          G        38    0.007582
6   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLHVLIFLAI...          H        38    0.007564
7   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLIVLIFLAI...          I        38    0.007540
8   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLKVLIFLAI...          K        38    0.007574
9   METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAI...          L        38    0.007545
10  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLMVLIFLAI...          M        38    0.007590
11  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLNVLIFLAI...          N        38    0.007572
12  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLPVLIFLAI...          P        38    0.007646
13  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLQVLIFLAI...          Q        38    0.007608
14  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLRVLIFLAI...          R        38    0.007573
15  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLSVLIFLAI...          S        38    0.007546
16  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLTVLIFLAI...          T        38    0.007537
17  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLVVLIFLAI...          V        38    0.007585
18  METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLWVLIFLAI...          W        38    0.007533
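
The 19 rows above correspond to every non-wild-type substitution at a single site. A minimal sketch of that enumeration follows, assuming `position` is a 0-based index (index 38 is Tyr39 in the LLR tables). Only the sequence prefix visible in the output is used, since the remainder is elided there, and the function name is illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(seq, index):
    """Return (residue, mutated sequence) for every non-wild-type residue at a 0-based index."""
    wild_type = seq[index]
    return [
        (aa, seq[:index] + aa + seq[index + 1:])
        for aa in AMINO_ACIDS
        if aa != wild_type
    ]


# Visible prefix of the L protein only; the full sequence is truncated in the output.
L_PREFIX = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAI"

variants = single_site_variants(L_PREFIX, 38)
print(len(variants))  # 19
print(variants[0])    # ('A', 'METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLAVLIFLAI')
```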