Week 5 HW: hw-protein-design-part-ii

🐉 Part A: SOD1 Binder Peptide Design (From Pranam)

Background: Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 are a well-established cause of familial amyotrophic lateral sclerosis (ALS). Among them, the A4V mutation (alanine → valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Part 1: Generate Binders with PepMLM

1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

```text
>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

>sp|P00441|SOD1_A4V (Clinical A4V Mutation, Internal Position 5)
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
```

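Residue numbering below follows UniProt, which includes the initiator Met. Clinical numbering omits Met1, so clinical A4 is UniProt position 5.

```text
Position:   1  2  3  4  5  6  7  8  9  10
-----------------------------------------
WT :        M  A  T  K  A  V  C  V  L  K ...
A4V:        M  A  T  K  V  V  C  V  L  K ...
                        ^
                     [Mutant]
```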
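Introducing the mutation in code makes the numbering offset explicit. A minimal sketch, assuming the only difference between clinical and UniProt numbering is the removed initiator Met (offset of one residue); `apply_clinical_mutation` is a hypothetical helper, not part of any library.

```python
# Human SOD1 (UniProt P00441), including the initiator Met
WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

def apply_clinical_mutation(seq: str, wt: str, pos: int, mut: str, offset: int = 1) -> str:
    """Apply a point mutation written in clinical numbering (e.g. A4V).

    offset=1 assumes clinical numbering skips the initiator Met, so clinical
    position 4 is UniProt position 5 (0-based index 4).
    """
    idx = pos - 1 + offset
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at UniProt position {idx + 1}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]

SOD1_A4V = apply_clinical_mutation(WT_SOD1, "A", 4, "V")
print(SOD1_A4V[:10])  # MATKVVCVLK
```

The sanity check on the wild-type residue catches an off-by-one numbering error before any downstream design step uses the wrong sequence.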

2-3. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

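| Binder Sequence | Pseudo-perplexity (PPPL) |
|---|---|
| `WHYPPAGAAHGX`  | `8.107327` |

`X` at position 12 is the code for an unknown or unspecified residue rather than one of the 20 standard amino acids, so that position must be assigned before the peptide can be synthesized.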
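For reference, masked language models are usually scored with a pseudo-perplexity that masks one position at a time. Written for a binder $b = (b_1, \dots, b_L)$ conditioned on a target sequence $T$ (an assumption about how the PepMLM score is conditioned, not a detail confirmed here):

$$
\mathrm{PPPL}(b \mid T) = \exp\left(-\frac{1}{L}\sum_{i=1}^{L} \log p\left(b_i \mid T,\, b_{\setminus i}\right)\right)
$$

where $b_{\setminus i}$ is the binder with position $i$ masked. Lower is better: a model that assigned uniform probability $1/20$ to every amino acid would give $\mathrm{PPPL} = 20$.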

Peptide Sequence Comparison Report

🧬 Reference Sequence

```python
# Reference SOD1-binding peptide and the PepMLM-generated binder(s)
ref = "FLYRWLPSRRGG"

generated = [
    "WHYPPAGAAHGX"
]

def identity(seq1, seq2):
    # Fraction of positions in seq1 that match seq2 at the same index (ungapped)
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return matches / len(seq1)

def mutations(seq1, seq2):
    # 1-based positions where residues differ: (position, ref residue, new residue)
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]

results = []

for seq in generated:
    results.append({
        "seq": seq,
        "identity": identity(ref, seq),
        "mutations": mutations(ref, seq),
    })

for r in results:
    print(f"\nSequence: {r['seq']}")
    print(f"Identity: {r['identity']:.2f}")
    print(f"Mutations: {r['mutations']}")
```
```text
Sequence: WHYPPAGAAHGX
Identity: 0.17
Mutations: [(1, 'F', 'W'), (2, 'L', 'H'), (4, 'R', 'P'), (5, 'W', 'P'), (6, 'L', 'A'), (7, 'P', 'G'), (8, 'S', 'A'), (9, 'R', 'A'), (10, 'R', 'H'), (12, 'G', 'X')]
```

🧬 PepMLM Peptide Perplexity Analysis

⚙️ Device Setup

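```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed PepMLM checkpoint on the Hugging Face Hub; use whichever checkpoint your notebook loaded
MODEL_NAME = "TianlaiChen/PepMLM-650M"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

def compute_perplexity(seq, tokenizer, model):
    # exp(mean cross-entropy) over all tokens, special tokens included, with
    # nothing masked and no SOD1 context: not a masked pseudo-perplexity
    inputs = tokenizer(seq, return_tensors="pt")

    # ⭐ Move inputs to the same device as model
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
        loss = outputs.loss
        ppl = torch.exp(loss)

    return ppl.item()

ref = "FLYRWLPSRRGG"
gen = "WHYPPAGAAHGX"

ppl_ref = compute_perplexity(ref, tokenizer, model)
ppl_gen = compute_perplexity(gen, tokenizer, model)

print(f"Reference PPL: {ppl_ref:.4f}")
print(f"Generated PPL: {ppl_gen:.4f}")
```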
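`compute_perplexity` scores every token with nothing masked, so the model sees each residue it is asked to predict, and the peptide is scored without the SOD1 sequence. A masked pseudo-perplexity instead hides one position at a time. The sketch below is model-agnostic so it runs without a checkpoint; `pseudo_perplexity` and `make_mlm_scorer` are hypothetical helpers, and the adapter assumes an ESM-style tokenizer (one token per residue, a single `<cls>` token at index 0). It has not been checked against PepMLM's own scoring code.

```python
import math
from typing import Callable, List, Optional, Sequence

# masked_log_prob(tokens, i) -> natural-log probability of tokens[i]
# when ONLY position i is masked
MaskedLogProb = Callable[[Sequence[str], int], float]

def pseudo_perplexity(
    tokens: Sequence[str],
    masked_log_prob: MaskedLogProb,
    positions: Optional[Sequence[int]] = None,
) -> float:
    """exp(-mean log p(x_i | x with i masked)) over the chosen positions."""
    idx: List[int] = list(range(len(tokens)) if positions is None else positions)
    nll = -sum(masked_log_prob(tokens, i) for i in idx) / len(idx)
    return math.exp(nll)

def make_mlm_scorer(model, tokenizer) -> MaskedLogProb:
    """Hypothetical adapter for a Hugging Face masked LM (e.g. PepMLM).
    Assumes one residue per token and a single <cls> token at index 0."""
    import torch  # imported here so the toy example below runs without torch

    def masked_log_prob(tokens: Sequence[str], i: int) -> float:
        ids = tokenizer("".join(tokens), return_tensors="pt")["input_ids"].to(model.device)
        ids[0, i + 1] = tokenizer.mask_token_id  # +1 skips <cls>
        with torch.no_grad():
            logits = model(input_ids=ids).logits
        true_id = tokenizer.convert_tokens_to_ids(tokens[i])
        return torch.log_softmax(logits[0, i + 1], dim=-1)[true_id].item()

    return masked_log_prob

# Toy check: a "model" that is uniform over the 20 amino acids gives PPPL = 20
uniform = lambda tokens, i: math.log(1 / 20)
print(pseudo_perplexity(list("WHYPPAGAAHGX"), uniform))
```

To score a binder in the context of the target, pass `list(target + binder)` as `tokens` and restrict `positions` to `range(len(target), len(target) + len(binder))`, so only the binder residues are averaged.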

🧬 PepMLM Perplexity Analysis Report


📊 Results Summary

| Sequence type | PPL (`compute_perplexity`, unmasked) |
|---|---|
| Reference (SOD1-binding peptide) | 2.7808 |
| Generated peptide (`WHYPPAGAAHGX`) | 3.5465 |

🧠 Interpretation

The generated peptide has a higher perplexity than the reference SOD1-binding peptide (3.5465 vs. 2.7808).

This indicates that:

  • It has a lower probability under the PepMLM protein language model
  • It deviates more from learned natural peptide patterns
  • It is less consistent with native-like sequence motifs

🔬 Biological implication

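Because these scores are computed on the peptide alone, without SOD1 as context, they reflect sequence plausibility rather than binding; read cautiously, they suggest: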

  • Reduced structural plausibility
  • Potential loss of native-like binding characteristics
  • Increased deviation from known SOD1-binding sequence features


Fig. 1 Protein-peptide complex predicted by AlphaFold Server

“The A4V mutation lies near the core of the SOD1 β-barrel, leading to structural instability and a higher propensity for the protein to dissociate (or ‘fall apart’). In the predicted complex, the peptide binds in the vicinity of the A4V mutation site.”


Fig. 2 Binding site of protein and short peptide

“The observed pTM of 0.85 indicates that AlphaFold is highly confident in the overall fold of the predicted structure, which is dominated by the SOD1 chain. However, the ipTM of 0.39 is low: AlphaFold Server guidance treats ipTM above 0.8 as a confident prediction and below 0.6 as a likely failed one. The model is therefore uncertain about the relative positioning of, and the interface between, the protein and the short peptide. This suggests that the binding conformation may be unstable or that multiple binding modes are possible.”
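When several binders are folded, the pTM/ipTM check can be automated. A minimal sketch, assuming the AlphaFold Server summary-confidence JSON has top-level `ptm` and `iptm` keys (verify against your own download); `classify_complex` and the example filename are hypothetical. The cut-offs follow the AlphaFold Server documentation: pTM above 0.5 suggests the overall fold may be correct, ipTM above 0.8 indicates a confident interface, below 0.6 a likely failed prediction, and values in between are a grey zone.

```python
from typing import Dict

def classify_complex(conf: Dict[str, float]) -> Dict[str, str]:
    """Label pTM/ipTM from an AlphaFold Server summary-confidence dict."""
    ptm, iptm = conf["ptm"], conf["iptm"]
    fold = "fold plausibly correct" if ptm > 0.5 else "fold uncertain"
    if iptm > 0.8:
        interface = "confident interface"
    elif iptm >= 0.6:
        interface = "grey zone"
    else:
        interface = "likely failed prediction"
    return {"fold": fold, "interface": interface}

# Values reported for the SOD1(A4V) + WHYPPAGAAHGX complex
print(classify_complex({"ptm": 0.85, "iptm": 0.39}))

# With a downloaded file (hypothetical filename):
# import json
# with open("fold_sod1_a4v_summary_confidences_0.json") as fh:
#     print(classify_complex(json.load(fh)))
```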


WHYPPAGAAHGX shows a good balance between predicted binding and developability:

Safety and Solubility: With a predicted solubility probability of 1.000 and a hemolysis probability of only 0.017, this peptide should pose few problems for experimental handling or biocompatibility.

Modest Affinity: Its predicted $pK_d$ of 5.350 ($K_d \approx 4.5\ \mu\mathrm{M}$) is classified as “weak binding” (micromolar range). Even so, as a short 12-residue peptide with almost no net charge, it is a clean scaffold that leaves ample room for optimization.

Conclusion: By trading away some affinity, this peptide gains favorable predicted physicochemical properties. Compared with sequences that bind strongly but are poorly soluble, it stands out as a candidate with better druggability potential.

Selection and Evaluation of a Stochastic Peptide Sequence

Which one is better?

| Binder Sequence | Pseudo-perplexity (PPPL) |
|---|---|
| `WRSYVVAVELGE`  | `18.629115501481596` |

Peptide Comparative Analysis: WRSYVVAVELGE vs. WHYPPAGAAHGX

📝 Executive Summary

While WRSYVVAVELGE shows slightly higher predicted binding affinity, WHYPPAGAAHGX has a more favorable predicted "druggability" profile, particularly in membrane permeability, non-fouling behavior, and solubility under physiological conditions.

---

🔍 Comparative Analysis

1. Binding & Potency

  • WRSYVVAVELGE: Demonstrates a higher binding affinity ($pK_d/pK_i$ of 5.814).
  • WHYPPAGAAHGX: Shows slightly weaker binding (5.350).

Takeaway: The difference in affinity is modest ($\Delta pK_d \approx 0.46$ log units, roughly a 2.9-fold difference in $K_d$). In many lead-optimization contexts, this small loss in potency is a reasonable trade-off for the improved predicted ADME properties of WHYPPAGAAHGX.
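Because $pK_d = -\log_{10}(K_d / 1\,\mathrm{M})$, each log unit is a 10-fold change in $K_d$. A short sketch converting the two predicted binding scores from this section into micromolar $K_d$ and computing the fold gap between them; `pkd_to_kd_uM` is a hypothetical helper name.

```python
def pkd_to_kd_uM(pkd: float) -> float:
    """Kd in micromolar from pKd = -log10(Kd / 1 M)."""
    return 10 ** (-pkd) * 1e6

# Predicted pKd values reported for the two peptides
pkd = {"WRSYVVAVELGE": 5.814, "WHYPPAGAAHGX": 5.350}
for seq, value in pkd.items():
    print(f"{seq}: pKd {value:.3f} -> Kd ~ {pkd_to_kd_uM(value):.2f} uM")

# A difference in log units becomes a multiplicative factor in Kd
delta = pkd["WRSYVVAVELGE"] - pkd["WHYPPAGAAHGX"]
print(f"Delta pKd = {delta:.2f} log units -> {10 ** delta:.1f}-fold difference in Kd")
```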

2. ADME & Pharmacokinetics

| Feature | WRSYVVAVELGE | WHYPPAGAAHGX | Winner |
|---|---|---|---|
| Permeability | 0.030 (Low) | 0.571 (Moderate) | 🏆 WHYPPAGAAHGX |
| Half-life | 0.493 h | 0.618 h | 🏆 WHYPPAGAAHGX |
| Fouling | Fouling (0.467) | Non-fouling (0.671) | 🏆 WHYPPAGAAHGX |
| Hemolysis | 0.112 (Low) | 0.017 (Negligible) | 🏆 WHYPPAGAAHGX |

Permeability: This is the most striking difference. WHYPPAGAAHGX has a $\approx 19\times$ higher predicted permeability probability (0.571 vs. 0.030), making it a much better candidate for reaching intracellular targets such as cytosolic SOD1, and potentially for oral delivery.

Stability: WHYPPAGAAHGX also has a longer predicted half-life (0.618 vs. 0.493 h) and better predicted resistance to non-specific protein adsorption (non-fouling).
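The fold differences quoted for permeability and half-life follow directly from the predicted values in the ADME table. A trivial sketch that recomputes them (values copied from the table; for hemolysis, where lower is better, a ratio below 1 favors WHYPPAGAAHGX):

```python
# Predicted ADME values copied from the table: (WRSYVVAVELGE, WHYPPAGAAHGX)
adme = {
    "permeability": (0.030, 0.571),
    "half_life_h": (0.493, 0.618),
    "hemolysis": (0.112, 0.017),  # lower is better
}

for name, (wrsy, whyp) in adme.items():
    print(f"{name}: WHYPPAGAAHGX / WRSYVVAVELGE = {whyp / wrsy:.2f}")
```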

3. Physicochemical Properties

Hydrophobicity (GRAVY): WRSYVVAVELGE is slightly hydrophobic (0.28), which is consistent with its fouling tendency and lower predicted solubility in complex environments. WHYPPAGAAHGX is distinctly hydrophilic (-0.60), consistent with its excellent predicted solubility.

Isoelectric point (pI) and charge: WRSYVVAVELGE (pI 4.86) carries a net negative charge at pH 7 (-1.23), whereas WHYPPAGAAHGX (pI 6.92) is nearly neutral at pH 7 (-0.07). Lower net charge generally makes it easier for a peptide to cross lipid bilayers, which is consistent with its higher permeability score.
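The GRAVY, charge and pI values can be recomputed from sequence alone. A minimal, self-contained sketch: GRAVY uses the Kyte-Doolittle scale with the non-standard `X` scored as 0 but still counted toward length (an assumption), and charge/pI use the Henderson-Hasselbalch equation with pKa values assumed to match Biopython's `IsoelectricPoint` defaults. The helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy in tenths, kept as integers so the sum is exact
KD10 = {
    "A": 18, "R": -45, "N": -35, "D": -35, "C": 25, "Q": -35, "E": -35,
    "G": -4, "H": -32, "I": 45, "L": 38, "K": -39, "M": 19, "F": 28,
    "P": -16, "S": -8, "T": -7, "W": -9, "Y": -13, "V": 42,
}

# pKa values assumed to match Biopython's IsoelectricPoint defaults
POS_PK = {"Nterm": 7.5, "K": 10.0, "R": 12.0, "H": 5.98}
NEG_PK = {"Cterm": 3.55, "D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
NTERM_PK = {"A": 7.59, "M": 7.0, "S": 6.93, "P": 8.36, "T": 6.82, "V": 7.44, "E": 7.7}
CTERM_PK = {"D": 4.55, "E": 4.75}

def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy; unknown residues (X) score 0 but count toward length."""
    return sum(KD10.get(aa, 0) for aa in seq) / (10 * len(seq))

def net_charge(seq: str, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch net charge; non-standard residues (X) are ignored."""
    # Terminal pKa depends on the terminal residue, when a specific value is listed
    pos = dict(POS_PK, Nterm=NTERM_PK.get(seq[0], POS_PK["Nterm"]))
    neg = dict(NEG_PK, Cterm=CTERM_PK.get(seq[-1], NEG_PK["Cterm"]))
    charge = 1 / (1 + 10 ** (ph - pos["Nterm"])) - 1 / (1 + 10 ** (neg["Cterm"] - ph))
    for aa in seq:
        if aa in pos:
            charge += 1 / (1 + 10 ** (ph - pos[aa]))
        elif aa in neg:
            charge -= 1 / (1 + 10 ** (neg[aa] - ph))
    return charge

def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection on pH: net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for seq in ("WRSYVVAVELGE", "WHYPPAGAAHGX"):
    print(f"{seq}: GRAVY {gravy(seq):.2f}, pI {isoelectric_point(seq):.2f}, "
          f"charge@pH7 {net_charge(seq):+.2f}")
```

With this pKa set the results should land close to the values quoted in the paragraph above; tools that use a different pKa table (for example EMBOSS) will shift pI and charge slightly.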


🎯 Conclusion

WHYPPAGAAHGX is the superior scaffold for further development.

Although WRSYVVAVELGE has a stronger predicted binding score, its poor predicted permeability and fouling tendency are significant “developability” hurdles. WHYPPAGAAHGX strikes a much better balance: it is predicted to be highly soluble and non-hemolytic while retaining enough permeability to reach an intracellular target. Its low interface confidence (ipTM 0.39), however, means the predicted binding mode still needs validation.