Week 5 HW: Protein Design Part II

Part 1: Generate binders with PepMLM

Step 1 — Get the mutant SOD1 sequence

Original SOD1 sequence from Uniprot

>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

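A4V mutant SOD1 sequence

The name A4V uses mature-protein numbering, which omits the initiator Met, so the Ala → Val substitution falls at position 5 of the UniProt sequence.

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ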
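The substitution can also be made programmatically, which guards against editing the wrong residue. This is a minimal sketch: the helper `apply_point_mutation` and its `has_initiator_met` flag are hypothetical names, and it assumes the mutation name follows mature-protein numbering while the input sequence still starts with Met.

```python
# Wild-type human SOD1 as downloaded from UniProt (P00441), 154 residues with Met1.
WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_point_mutation(seq, mutation, has_initiator_met=True):
    """Apply a mutation such as 'A4V' to seq.

    Assumption: the mutation name uses mature-protein numbering (Met1 not
    counted). If seq still contains Met1, the 0-based index equals the
    mature position; otherwise it is position - 1.
    """
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position if has_initiator_met else position - 1
    if seq[index] != wild_type:
        raise ValueError(
            f"Expected {wild_type} at index {index}, found {seq[index]}"
        )
    return seq[:index] + mutant + seq[index + 1:]


A4V_SOD1 = apply_point_mutation(WT_SOD1, "A4V")
print(A4V_SOD1[:10])
```

The residue check raises an error if the numbering convention is off by one, which is the easiest mistake to make with this protein.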

Step 2 — Run PepMLM

I used the Google Colab notebook for this step.

https://colab.research.google.com/drive/1j4HO5MPdIlCLZFvqYnjy_Ug--8oxTB9Z#scrollTo=VtfbXYndhyle

#	Binder	Pseudo Perplexity
1	WLSYPVVLEWGE	16.4334209
2	WHYYVVAVRWKE	31.112519
3	WHSYAAAAALWE	12.9540785
4	WLYYAVGLAWKX	14.2779037
5	FLYRWLPSRRGG	XXXX

(Lower pseudo-perplexity = the model is more confident that the peptide is a good binder for this target.)
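For intuition, pseudo-perplexity for a masked language model is usually defined as the exponential of the negative mean log-probability of each residue when it is masked in turn. The sketch below assumes those per-residue log-probabilities are already available; the function name is hypothetical and the PepMLM notebook may compute the score differently.

```python
import math


def pseudo_perplexity(token_log_probs):
    """Return exp(-mean(log p)) over the peptide positions.

    token_log_probs: natural-log probabilities the model assigned to the
    true residue at each position when that position was masked.
    """
    if not token_log_probs:
        raise ValueError("Need at least one log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# Toy example: if the model gives every residue of a 12-mer probability 0.5,
# the pseudo-perplexity is 2 (the model is "choosing between 2 options").
toy_score = pseudo_perplexity([math.log(0.5)] * 12)
print(round(toy_score, 3))
```

Under this definition a score of about 13 means the model is roughly as uncertain as if it were picking among 13 equally likely residues per position.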

Part 2: AlphaFold3 structure prediction

For each result, I looked at the ipTM score (higher is better; values closer to 1.0 indicate a more confident binding prediction).

Binder	Pseudo Perplexity	ipTM score
WLSYPVVLEWGE	16.4334209	0.21
WHYYVVAVRWKE	31.112519	0.35
WHSYAAAAALWE	12.9540785	0.27
WLYYAVGLAWKX	14.2779037	Could not be generated because it had a letter X
FLYRWLPSRRGG	XXXX	0.3
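Because one peptide could not be submitted due to the letter X, it can help to screen sequences before running a structure prediction. The sketch below checks each binder against the 20 standard amino acid letters; the function name is hypothetical.

```python
# The 20 standard amino acids in one-letter code.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def non_canonical_positions(peptide):
    """Return (1-based position, letter) for every non-standard residue."""
    return [
        (i + 1, aa)
        for i, aa in enumerate(peptide)
        if aa.upper() not in CANONICAL
    ]


binders = [
    "WLSYPVVLEWGE",
    "WHYYVVAVRWKE",
    "WHSYAAAAALWE",
    "WLYYAVGLAWKX",
    "FLYRWLPSRRGG",
]

for peptide in binders:
    problems = non_canonical_positions(peptide)
    status = "ok" if not problems else f"non-standard residues: {problems}"
    print(peptide, status)
```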

The ipTM score (interface predicted TM-score) measures how confidently AlphaFold3 predicts that the two chains interact. It ranges from 0 to 1:
- Below 0.4 = poor
- 0.4–0.6 = moderate
- Above 0.6 = good
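The thresholds above can be applied to the scores in the table with a small helper. This is only a sketch of the rule of thumb stated here; the function name is hypothetical, and treating exactly 0.4 and 0.6 as "moderate" is an assumption.

```python
def classify_iptm(iptm):
    """Map an ipTM score to the poor / moderate / good bands used above."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM must be between 0 and 1")
    if iptm < 0.4:
        return "poor"
    if iptm <= 0.6:
        return "moderate"
    return "good"


# ipTM scores from the table (the peptide containing X has no score).
iptm_scores = {
    "WLSYPVVLEWGE": 0.21,
    "WHYYVVAVRWKE": 0.35,
    "WHSYAAAAALWE": 0.27,
    "FLYRWLPSRRGG": 0.30,
}

labels = {peptide: classify_iptm(score) for peptide, score in iptm_scores.items()}
for peptide, label in labels.items():
    print(peptide, iptm_scores[peptide], label)
```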

Where the peptide appears to dock (look at the 3D viewer — is it near position 4 at the N-terminus? At the dimer interface? Surface-exposed?)

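One open question: my second binder (WHYYVVAVRWKE) has the highest pseudo-perplexity (31.1), which suggests PepMLM is least confident in it, yet it has the highest ipTM score (0.35) in AlphaFold3. The two metrics come from different models and measure different things (sequence likelihood versus predicted interface confidence), so they need not agree. All of my ipTM scores are also below 0.4, so none of the predicted complexes is confident.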
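To put a number on the disagreement, the two rankings can be compared with a Spearman rank correlation. The sketch below uses the three peptides that have both scores and a plain-Python formula that assumes no ties; with only three points the result is illustrative and not statistically meaningful.

```python
def ranks(values, best_is_high):
    """Rank values from 1 (best) to n (worst); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=best_is_high)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(rank_a, rank_b):
    """Spearman correlation from two rank lists without ties."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


peptides = ["WLSYPVVLEWGE", "WHYYVVAVRWKE", "WHSYAAAAALWE"]
perplexity = [16.4334209, 31.112519, 12.9540785]  # lower is better
iptm = [0.21, 0.35, 0.27]                         # higher is better

rho = spearman_rho(
    ranks(perplexity, best_is_high=False),
    ranks(iptm, best_is_high=True),
)
print(rho)
```

A value of 1 would mean both tools rank the peptides identically, and a negative value means the rankings tend to run in opposite directions.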

Part 3: PeptiVerse therapeutic properties

Binder	Solubility	Hemolysis	Binding Affinity	Length	Molecular Weight	Net Charge (pH 7)	Isoelectric Point	Hydrophobicity (GRAVY)
WLSYPVVLEWGE	Soluble - 1.00	Non-hemolytic 0.112	Weak binding - 5.952	12	1477.7	-2.23	4.24	0.26
WHYYVVAVRWKE	Soluble - 1.00	Non-hemolytic 0.069	Weak binding - 6.345	12	1635.9	0.85	8.5	-0.42
WHSYAAAAALWE	Soluble - 1.00	Non-hemolytic 0.035	Weak binding - 5.863	12	1375.5	-1.15	5.47	0.18
WLYYAVGLAWKX	Soluble - 1.00	Non-hemolytic 0.115	Medium binding - 7.101	12	1351.8	0.76	8.5	0.56

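The best candidate balances strong predicted binding, low hemolysis, and good solubility. All four peptides are predicted to be soluble and non-hemolytic, so binding affinity separates them: WLYYAVGLAWKX scores highest (7.101, medium binding) but contains the non-standard residue X, and among the remaining peptides WHYYVVAVRWKE scores highest (6.345).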
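The selection can be written down as a simple filter and sort over the PeptiVerse values in the table. This is a sketch under two assumptions: the hemolysis number is a probability where lower is safer, and peptides containing X are excluded because they cannot be synthesized as written.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    sequence: str
    affinity: float   # PeptiVerse binding affinity score (higher is better)
    hemolysis: float  # assumed: predicted hemolysis probability (lower is better)


candidates = [
    Candidate("WLSYPVVLEWGE", 5.952, 0.112),
    Candidate("WHYYVVAVRWKE", 6.345, 0.069),
    Candidate("WHSYAAAAALWE", 5.863, 0.035),
    Candidate("WLYYAVGLAWKX", 7.101, 0.115),
]

# Drop peptides with the placeholder residue X, then sort by affinity
# (descending) and break ties with hemolysis (ascending).
valid = [c for c in candidates if "X" not in c.sequence]
ranked = sorted(valid, key=lambda c: (-c.affinity, c.hemolysis))

for c in ranked:
    print(c.sequence, c.affinity, c.hemolysis)
```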

Part 4: moPPIt optimized design

https://colab.research.google.com/drive/1OYDpzq1pC35539nINWGZwmf4Jw5zss-Y#scrollTo=19a09b27

I initially tried to use the shared Google Colab notebook with the inputs I thought were correct, but I kept getting an IndexError.
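One common cause of an IndexError in this kind of notebook is a binding-residue index that falls outside the target sequence; whether that was the cause here is an assumption. A quick check before launching a run might look like the sketch below, where `validate_indices` is a hypothetical helper and not part of moPPIt.

```python
def validate_indices(sequence, indices):
    """Return sorted unique 0-based indices, or raise if any is out of range."""
    out_of_range = [i for i in indices if not 0 <= i < len(sequence)]
    if out_of_range:
        raise ValueError(
            f"Indices {out_of_range} are outside 0..{len(sequence) - 1} "
            f"for a sequence of length {len(sequence)}"
        )
    return sorted(set(indices))


# Placeholder target of the same length as the 154-residue SOD1 sequence.
target = "A" * 154

checked = validate_indices(target, [88, 3, 49, 3])
print(checked)

try:
    validate_indices(target, [3, 154])  # 154 is one past the last valid index
except ValueError as err:
    print("Rejected:", err)
```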

I used Claude to help me work out what to do, and it provided the instructions below. This section summarizes what I did and my conclusions.

Steps taken

I then went to Gemini, pasted in the instructions, and let it run.

After the run completed, it gave the output below.

moPPIt uses discrete flow matching to steer generation toward specific binding sites and simultaneously optimize multiple properties. moPPIt peptides are expected to have higher binding affinity and better drug-like properties (solubility, safety) than PepMLM outputs.

SOD1 Structure Guide: Targeting Residues for moPPIt

What Is SOD1?

Superoxide Dismutase 1 (SOD1)

Role: Antioxidant enzyme in cells; protects against free radical damage
Structure: Homodimer (~16 kDa monomer × 2, ~32 kDa total) with zinc and copper cofactors
Disease Link: >180 mutations cause Familial ALS (fALS); A4V is one of the most severe
A4V Mutation Impact:
- Position 4: Alanine → Valine
- Makes protein unstable, prone to misfolding and aggregation
- Leads to neuronal toxicity → motor neuron death → ALS

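SOD1 Sequence Map (Human, UniProt P00441, 154 residues including the initiator Met)

POSITION   SEQUENCE (wild type, 10 residues per block)
  1- 50    MATKAVCVLK GDGPVQGIIN FEQKESNGPV KVWGSIKGLT EGLHGFHVHE
 51-100    FGDNTAGCTS AGPHFNPLSR KHGGPKDEER HVGDLGNVTA DKDGVADVSI
101-154    EDSVISLSGD HCIIGRTLVV HEKADDLGKG GNEESTKTGN AGSRLACGVI GIAQ

Landmarks (UniProt sequence positions, Met1 = 1):
- A4V (mut.): position 5, Ala → Val (A4 in mature-protein numbering, which omits Met1)
- Cu²⁺ ligands: H47, H49, H64, H121 (H46, H48, H63, H120 in mature numbering)
- Zn²⁺ ligands: H64, H72, H81, D84 (H63, H71, H80, D83 in mature numbering)
- Dimer interface regions used in this guide: positions 50-60 and 85-100
- C-terminus: position 154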

Key Regions to Target in moPPIt

1. A4V Mutation Site (Most Direct)

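Position:  1  2  3  4  5    6  7  8  9  10
Sequence:  M  A  T  K  A→V  V  C  V  L  K
Index:     0  1  2  3  4    5  6  7  8  9
                       ↑
                  TARGET INDEX: 4

(A4V is named in mature-protein numbering, which skips Met1. In the 154-residue sequence that starts with Met, the mutated residue is position 5, index 4.)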

Why target here?
✓ Direct site of pathogenic mutation
✓ Likely destabilized region
✓ Peptide here could stabilize or mark for degradation
✓ Highest specificity to A4V disease

Recommended for: Stabilization or selective targeting of mutant

2. Metal Cofactor Binding Sites (Stabilization)

Zinc Binding (Zn²⁺):

Position:   64   72   81   84
Sequence:   H    H    H    D
Index:      63   71   80   83

(Positions follow the 154-residue UniProt sequence. In mature numbering these are H63, H71, H80, and D83; H63 bridges the Cu and Zn sites.)

Why target here?
✓ Zinc stabilizes SOD1 structure
✓ A4V mutants have weaker Zn²⁺ binding
✓ Peptide here could enhance Zn²⁺ coordination
✓ Stabilizes fold → prevents aggregation

Recommended indices: [63, 71, 80, 83]
Or 1-indexed positions: 64, 72, 81, 84

Copper Binding (Cu²⁺):

Position:   47   49   64   121
Sequence:   H    H    H    H
Index:      46   48   63   120

(Positions follow the 154-residue UniProt sequence. In mature numbering these are H46, H48, H63, and H120.)

Key residues:
- H47, H49, H121 - coordinate Cu²⁺
- H64 - bridges Cu²⁺ and Zn²⁺

Why target here?
✓ Critical for catalytic activity
✓ A4V mutants lose Cu²⁺ stability faster
✓ Peptide here could improve metal retention
✓ Even small improvements help

Recommended indices: [46, 48, 63, 120]
Or 1-indexed positions: 47, 49, 64, 121

3. Dimer Interface (Aggregation Prevention)

MONOMER A:                     MONOMER B:
┌──────────────────┐           ┌──────────────────┐
│                  │           │                  │
│  Position 50-60  │◄─────────►│  Position 50-60  │
│  (contacts B)    │ Interface │  (contacts A)    │
│                  │           │                  │
│  Position 85-100 │◄─────────►│  Position 85-100 │
│  (contacts B)    │ Interface │  (contacts A)    │
│                  │           │                  │
└──────────────────┘           └──────────────────┘

Dimer Interface A (Region 1):
Position:   50   51   52   53   54   55   56   57   58   59   60
Sequence:   E    F    G    D    N    T    A    G    C    T    S
Index:      49   50   51   52   53   54   55   56   57   58   59
            ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

Dimer Interface B (Region 2):
Position:   85   86   87   88   89   90   91   92   93   94  ...  100
Sequence:   L    G    N    V    T    A    D    K    D    G   ...  I
Index:      84   85   86   87   88   89   90   91   92   93  ...  99
            ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

(Sequence rows follow the 154-residue UniProt sequence, Met1 = position 1.)

Why target here?
✓ A4V mutants hyperaggregate (misfolded dimers)
✓ Peptide here could disrupt "bad" dimers
✓ Or stabilize native dimers
✓ Critical for ALS pathogenesis

Recommended indices: [49, 51, 53, 55, 57, 84, 86, 88, 90]
Or 1-indexed positions: 50, 52, 54, 56, 58, 85, 87, 89, 91

4. General Surface Patches (High Accessibility)

SOD1 accessible surface:
- N-terminal region (1-20): Accessible, flexible
- Loop regions (40-50, 60-70): Flexible, exposed
- C-terminal region (140-154): Accessible, flexible
- Anywhere NOT in active site or dimer interface

Benefits:
✓ Easier for peptide to access
✓ Less steric clashes
✓ Multiple contact points

Recommended indices: [0-20] or [100-153] or scattered accessible residues

CHOOSING YOUR BINDING RESIDUES FOR moPPIt

Decision Tree:

What do you want the peptide to do?

├─ STABILIZE SOD1 (prevent misfolding)
│  └─ Target metal binding sites:
│     └─ USE: [46, 48, 63, 71, 80, 83, 120]
│
├─ DISRUPT BAD DIMERS (anti-aggregation)
│  └─ Target dimer interface:
│     └─ USE: [49, 51, 53, 55, 57, 84, 86, 88, 90]
│
├─ SPECIFICALLY TARGET A4V MUTATION
│  └─ Target mutation site directly:
│     └─ USE: [4] (+ maybe nearby: [2, 3, 5, 6])
│
└─ BALANCED APPROACH (be conservative)
   └─ Target multiple sites:
      └─ USE: [4, 49, 55, 63, 71, 80, 84, 88]

(All indices are 0-based into the 154-residue sequence that starts with Met.)

EXAMPLE: Building Your Binding Residue List

Example 1: Stabilization Strategy

Goal: Keep SOD1 folded (prevent A4V misfolding)
Strategy: Reinforce metal binding sites

Biology positions:  47  49  64  72  81  84  121
Code indices:       46  48  63  71  80  83  120

For Colab input:
binding_residue_indices = [46, 48, 63, 71, 80, 83, 120]

Explanation: "I'm targeting the Zn²⁺ and Cu²⁺ coordination sites. A4V 
mutants are unstable, so a peptide that reinforces these metal-binding 
regions could restore structural integrity."

Example 2: Anti-Aggregation Strategy

Goal: Prevent dimerization (block aggregation)
Strategy: Disrupt dimer interface

Biology positions:  50  52  54  56  58  85  87  89  91
Code indices:       49  51  53  55  57  84  86  88  90

For Colab input:
binding_residue_indices = [49, 51, 53, 55, 57, 84, 86, 88, 90]

Explanation: "I'm targeting the dimer interface. A4V causes pathogenic 
aggregates, so a peptide at the dimer interface could selectively bind 
and disaggregate or prevent formation of toxic SOD1 oligomers."

Example 3: Multi-Site Strategy (Recommended)

Goal: Multi-pronged approach
Strategy: Combine mutation site + metal binding + accessible surface

Targets:
- A4V site: sequence position 5 (A4 in mature numbering) → index 4
- Zn binding: positions 64, 72, 81, 84 → indices 63, 71, 80, 83
- Cu binding: positions 47, 49, 121 → indices 46, 48, 120
- Surface accessibility: position 100 → index 99

For Colab input:
binding_residue_indices = [4, 46, 48, 63, 71, 80, 83, 99, 120]

Explanation: "I'm targeting multiple sites: the A4V mutation site (direct 
targeting), the Zn²⁺ and Cu²⁺ binding regions (structural stabilization), 
and a surface-accessible region (for cell recognition). This multi-target 
approach should generate peptides with both high affinity and therapeutic 
breadth."

INDEX CONVERSION QUICK TABLE

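If you see position X in UniProt (154-residue sequence, Met1 = position 1):

Biology Position	Code Index	Region
5	4	A4V mutation (A4 in mature numbering)
64, 72, 81, 84	63, 71, 80, 83	Zn²⁺ ligands
47, 49, 64, 121	46, 48, 63, 120	Cu²⁺ ligands
50-60	49-59	Dimer interface 1
85-100	84-99	Dimer interface 2
1-20	0-19	N-terminus
140-154	139-153	C-terminus

Formula: Code Index = Biology Position - 1, where Biology Position is the position in the UniProt sequence. Names from publications that use mature-protein numbering (such as A4V) skip Met1, so for those Code Index = published position.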
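The conversion formula is easy to apply by hand but also easy to get wrong for ranges. The sketch below turns a string of 1-based positions and ranges into 0-based indices; the function name and the input format are hypothetical conveniences, not part of the moPPIt notebook.

```python
def positions_to_indices(spec):
    """Convert 1-based positions such as '1-3, 10' to 0-based indices.

    Ranges are inclusive on both ends, matching how residue ranges are
    usually written.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            indices.extend(range(start - 1, end))
        else:
            indices.append(int(part) - 1)
    if any(i < 0 for i in indices):
        raise ValueError("Positions are 1-based and must be at least 1")
    return indices


example_indices = positions_to_indices("1-3, 10")
print(example_indices)
```

The resulting list can be pasted directly as the value of `binding_residue_indices`.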

SAMPLE OUTPUT FROM moPPIt

Once you run generation, you’ll see output like this:

=== MOPPIT PEPTIDE GENERATION RESULTS ===

Target: SOD1_A4V (154 residues)
Binding sites: [4, 49, 55, 63, 71, 80, 84, 88]
Peptide length: 12
Guidance: motif + affinity + solubility + hemolysis
Strength: 0.7

Generated 10 peptides:

1. MWFFLKPLYLRP
   ├─ Affinity Score: 8.5/10 ✓ (strong binder)
   ├─ Solubility: 0.88/1.0 ✓ (soluble)
   ├─ Hemolysis: 0.95/1.0 ✓ (non-toxic)
   └─ Motif fit: 7/8 binding residues contacted
   
2. LRPPGKVFLWMY
   ├─ Affinity Score: 8.3/10 ✓
   ├─ Solubility: 0.85/1.0 ✓
   ├─ Hemolysis: 0.92/1.0 ✓
   └─ Motif fit: 6/8 binding residues contacted
   
[...etc...]

WHAT GOOD SCORES LOOK LIKE

Affinity Score:
  8.5-10.0  →  Excellent (strong, specific binding)
  7.0-8.4   →  Good (decent binder)
  6.0-6.9   →  Okay (marginal)
  <6.0      →  Poor (weak or no binding)

Solubility:
  0.8-1.0   →  Excellent (soluble, won't aggregate)
  0.6-0.79  →  Good (mostly soluble)
  0.4-0.59  →  Fair (some precipitation risk)
  <0.4      →  Poor (will aggregate)

Hemolysis (non-toxicity):
  0.8-1.0   →  Excellent (safe to blood)
  0.6-0.79  →  Good (low toxicity)
  0.4-0.59  →  Fair (some toxicity risk)
  <0.4      →  Poor (likely toxic)

Overall = Average of normalized scores
Best candidates: All three scores >0.8
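The scoring rule above can be sketched in a few lines. Since affinity is reported out of 10 while solubility and hemolysis are out of 1, this sketch assumes "normalized" means dividing affinity by 10; that choice, the function names, and the use of the illustrative sample values from the guide are all assumptions.

```python
def normalized_scores(affinity_out_of_10, solubility, non_hemolysis):
    """Put all three scores on a 0-1 scale (assumed: affinity / 10)."""
    return (affinity_out_of_10 / 10.0, solubility, non_hemolysis)


def overall_score(affinity_out_of_10, solubility, non_hemolysis):
    """Average of the three normalized scores."""
    scores = normalized_scores(affinity_out_of_10, solubility, non_hemolysis)
    return sum(scores) / len(scores)


def is_best_candidate(affinity_out_of_10, solubility, non_hemolysis, cutoff=0.8):
    """True when every normalized score is above the cutoff."""
    scores = normalized_scores(affinity_out_of_10, solubility, non_hemolysis)
    return all(score > cutoff for score in scores)


# Illustrative values taken from the sample output above.
samples = {
    "MWFFLKPLYLRP": (8.5, 0.88, 0.95),
    "LRPPGKVFLWMY": (8.3, 0.85, 0.92),
}

for peptide, values in samples.items():
    print(peptide, round(overall_score(*values), 3), is_best_candidate(*values))
```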