Week 5: Protein Design - Part II
SOD Peptide Design from Pranam
Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.
Your challenge:
a. Design short peptides that bind mutant SOD1.
b. Then decide which ones are worth advancing toward therapy.
All the given steps were followed and the results are summarized below.
| Binder | Perplexity | ipTM score | Binding Region |
|---|---|---|---|
| ARWDPVVGAVEARAK | 12.81 | 0.34 | Forms a helix; binds away from the N-terminus, around residues 70–100 |
| ARWPEYVAVYEAKRA | 12.16 | 0.25 | Forms a helix; binds near the N-terminus |
| WRWGVVTARKKAARK | 12.24 | 0.48 | Binds away from the N-terminus, around residues 70–100 |
| AWVPVLTARVELAKX | 18.139 | 0.42 | Binds near the N-terminus |
| FLYRWLPSRRGG (known binder) | - | 0.39 | Binds near the N-terminus |
The ipTM scores of the PepMLM-generated peptides range from 0.25 to 0.48, indicating variable confidence in the predicted interfaces. The known binder (FLYRWLPSRRGG) scores 0.39 for its interaction near the N-terminus, and two of the designed peptides exceed this baseline: WRWGVVTARKKAARK has the highest ipTM (0.48) and binds away from the N-terminus (around residues 70–100), while AWVPVLTARVELAKX scores 0.42 and binds near the N-terminus. The other two PepMLM candidates, ARWDPVVGAVEARAK (0.34) and ARWPEYVAVYEAKRA (0.25), fall below the known binder's score.
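The baseline comparison described above can be sketched in a few lines of Python. The ipTM values are copied from the table; the function name `exceeds_baseline` and the constant names are illustrative choices, not part of any tool used in the assignment.

```python
# ipTM values copied from the results table for the four PepMLM peptides.
IPTM = {
    "ARWDPVVGAVEARAK": 0.34,
    "ARWPEYVAVYEAKRA": 0.25,
    "WRWGVVTARKKAARK": 0.48,
    "AWVPVLTARVELAKX": 0.42,
}
KNOWN_BINDER = "FLYRWLPSRRGG"
BASELINE_IPTM = 0.39  # ipTM of the known binder, from the same table


def exceeds_baseline(scores, baseline):
    """Return (peptide, ipTM) pairs strictly above the baseline, best first."""
    hits = [(peptide, score) for peptide, score in scores.items() if score > baseline]
    return sorted(hits, key=lambda item: item[1], reverse=True)


above_baseline = exceeds_baseline(IPTM, BASELINE_IPTM)
for peptide, score in above_baseline:
    print(f"{peptide}\t{score:.2f}\t(+{score - BASELINE_IPTM:.2f} vs {KNOWN_BINDER})")
```

Because ipTM is a confidence metric, this filter only says which predicted complexes the model is more confident about than the reference; it does not rank affinity.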
| Peptide | MW (Da) | Charge | GRAVY | Solubility | Hemolysis |
|---|---|---|---|---|---|
| AWVPVLTARVELAKX | 1534.9 | +0.80 | 0.70 | Soluble (1.000) | Low (0.087) |
| WRWGVVTARKKAARK | 1813.2 | +5.76 | -0.95 | Soluble (0.985) | Low (0.023) |
| ARWPEYVAVYEAKRA | 1809.0 | +0.80 | -0.63 | Soluble (1.000) | Low (0.084) |
| ARWDPVVGAVEARAK | 1624.8 | +0.80 | -0.20 | Soluble (1.000) | Low (0.049) |
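As a cross-check on the GRAVY column, the score can be recomputed as the mean Kyte-Doolittle hydropathy over the sequence. This is a minimal sketch, not the code of the tool that produced the table. Treating the unknown residue `X` as contributing 0 while still counting toward the length is an assumption; it happens to reproduce the tabulated 0.70 for AWVPVLTARVELAKX.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean hydropathy of a peptide.

    Assumption: residues outside the standard 20 (e.g. X) add 0 to the sum
    but are still counted in the length.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = sum(KYTE_DOOLITTLE.get(residue, 0.0) for residue in sequence.upper())
    return total / len(sequence)


for peptide in ("AWVPVLTARVELAKX", "WRWGVVTARKKAARK",
                "ARWPEYVAVYEAKRA", "ARWDPVVGAVEARAK"):
    print(f"{peptide}\t{gravy(peptide):+.2f}")
```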
Analysis & Structural Correlation
- ipTM vs. Affinity: A higher AlphaFold3 ipTM score indicates greater confidence in the predicted interface, which is often associated with tight structural complementarity (hydrogen bonding, salt bridges, or hydrophobic packing). ipTM is a confidence metric rather than a direct measure of affinity, so a high score suggests, but does not establish, a more favorable (more negative) binding free energy.
- Safety Profiles: None of the stronger binders shows a predicted safety risk. All sequences have high predicted solubility (≥ 0.985) and low predicted hemolysis (≤ 0.087). Even the most hydrophobic sequence, AWVPVLTARVELAKX (GRAVY: 0.70), is predicted to be non-hemolytic.
- Optimal Balance: ARWDPVVGAVEARAK offers the best balance of predicted properties. It pairs maximum solubility (1.000) and low hemolysis (0.049) with a moderate GRAVY score (-0.20), which limits the aggregation risk associated with highly hydrophobic variants.
Lead Selection & Justification
Selection: ARWDPVVGAVEARAK
Justification: Its ipTM score of 0.34 is modest and falls below the known binder's 0.39, so the case for advancing this peptide rests mainly on its therapeutic profile: maximal predicted solubility (1.000), low hemolytic risk (0.049), and a balanced hydrophobicity index (GRAVY -0.20) that should limit in vitro aggregation.
Binders obtained through moPPIt:
| Sample Sequence | Hemolysis Score | Solubility Score | Affinity Score |
|---|---|---|---|
| RAVARNVWFAWF | 0.0126 | 1.0000 | 7.4749 |
| ACEEFGFEAVCA | 0.0077 | 1.0000 | 6.9697 |
| GSRRWWVYWHYT | 0.0244 | 1.0000 | 7.5653 |
| CFAGAGNRYGWK | 0.0054 | 0.9998 | 6.5160 |
| GRRCAGPYYNWG | 0.0068 | 1.0000 | 7.3545 |
| CRDTRVGCPHRC | 0.0220 | 1.0000 | 7.7880 |
| WCCQWADGRWER | 0.0158 | 0.9995 | 7.2062 |
| PFFCREYALYCY | 0.0043 | 0.9983 | 7.7089 |
| FAYYRPCGCGCR | 0.0251 | 1.0000 | 8.1801 |
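The moPPIt scores in the table above can be ranked and summarized with a short script. The values are copied from the table. Ranking in descending order assumes that a higher affinity score means stronger predicted binding, which should be confirmed against the moPPIt output description.

```python
# (sequence, hemolysis, solubility, affinity) rows copied from the moPPIt table.
MOPPIT = [
    ("RAVARNVWFAWF", 0.0126, 1.0000, 7.4749),
    ("ACEEFGFEAVCA", 0.0077, 1.0000, 6.9697),
    ("GSRRWWVYWHYT", 0.0244, 1.0000, 7.5653),
    ("CFAGAGNRYGWK", 0.0054, 0.9998, 6.5160),
    ("GRRCAGPYYNWG", 0.0068, 1.0000, 7.3545),
    ("CRDTRVGCPHRC", 0.0220, 1.0000, 7.7880),
    ("WCCQWADGRWER", 0.0158, 0.9995, 7.2062),
    ("PFFCREYALYCY", 0.0043, 0.9983, 7.7089),
    ("FAYYRPCGCGCR", 0.0251, 1.0000, 8.1801),
]

# Assumption: higher affinity score = stronger predicted binding.
ranked = sorted(MOPPIT, key=lambda row: row[3], reverse=True)

summary = {
    "min_affinity": min(row[3] for row in MOPPIT),
    "max_affinity": max(row[3] for row in MOPPIT),
    "max_hemolysis": max(row[1] for row in MOPPIT),
    "min_solubility": min(row[2] for row in MOPPIT),
}

for sequence, hemolysis, solubility, affinity in ranked[:3]:
    print(f"{sequence}\taffinity={affinity:.4f}\themolysis={hemolysis:.4f}")
print(summary)
```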
- PepMLM vs. moPPIt Peptides
- PepMLM generates general linear binders across a target sequence. moPPIt utilizes discrete flow matching to target hyper-localized, functional subsequence motifs.
- PepMLM candidates are designed as general sequence binders ideal for scaffolding or linking to functional domains (e.g., degradation tags), whereas moPPIt candidates act as compact, high-density competitive inhibitors tailored to disrupt precise protein-protein interaction interfaces.
- Both sets show favorable predicted safety metrics (solubility ≥ 0.98; hemolysis < 0.09 for PepMLM and < 0.03 for moPPIt), but moPPIt additionally reports a quantitative predicted affinity score for each peptide (6.52 to 8.18 here).
- Pre-Clinical Evaluation Strategy
Phase I: Structural Validation
- AF3 Conformation: Run AlphaFold3 to assess structural complementarity and docking metrics (ipTM/pLDDT).
- Disulfide Mapping: Predict intra-chain disulfide bonding patterns for cysteine-rich variants to evaluate self-cyclization.
Phase II: Biophysical Characterization
- Binding Kinetics: Synthesize top leads and determine absolute Kd values via Surface Plasmon Resonance (SPR).
- Proteolytic Stability: Incubate in human serum to quantify metabolic half-life and assess the need for backbone stabilization (e.g., D-amino acids, cyclization).
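The two Phase II readouts reduce to simple relations: the equilibrium dissociation constant is the ratio of the SPR rate constants (Kd = k_off / k_on), and a serum half-life follows from a first-order decay rate (t1/2 = ln 2 / k). The sketch below assumes 1:1 binding and first-order degradation; the function names and the example numbers are illustrative, not measured values.

```python
import math


def dissociation_constant(k_on, k_off):
    """K_d in M from k_on (1/(M*s)) and k_off (1/s), assuming 1:1 binding."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def half_life_from_two_points(t0, frac0, t1, frac1):
    """Half-life from two measurements of intact peptide fraction.

    Assumes first-order decay between the two time points; the result is in
    the same time unit as t0 and t1.
    """
    if t1 <= t0 or frac0 <= frac1 <= 0 or frac1 >= frac0:
        raise ValueError("need t1 > t0 and 0 < frac1 < frac0")
    rate = math.log(frac0 / frac1) / (t1 - t0)
    return math.log(2) / rate


# Illustrative numbers only (not experimental data).
print(dissociation_constant(k_on=1e5, k_off=1e-3))       # K_d in M
print(half_life_from_two_points(0.0, 1.0, 2.0, 0.5))     # half-life in hours
```

In practice a fit over the full time course is preferable to a two-point estimate; the two-point form is used here only to keep the arithmetic visible.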
Phase III: Functional & Safety Assays
- Toxicity Screening: Perform wet-lab erythrocyte hemolysis and cellular viability (MTT) assays to validate computational safety metrics.
- On-Target Efficacy: Run cell-based functional assays to confirm target inhibition or degradation capability.
Final Project: L-Protein Mutants
Option 1: Mutagenesis
- Experimental vs. ESM Score Correlation
The experimental data correlate poorly with the small variations in the ESM snippet scores.
The ESM snippet shows tight, high scores (0.00753 to 0.00759) for mutating position 39 (Y39).
In the dataset, the wild-type residue Y39 is highly critical. Mutating it to Histidine (Y39H) completely abolishes function (Lysis: 0, Protein Levels: 0).
Conclusion: The language model identifies this region as highly significant (yielding high absolute scores), but its tiny fractional score differences do not capture the binary functional disruption seen experimentally.
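One way to put a number on this comparison is a Spearman rank correlation between per-mutation ESM scores and the experimental lysis labels. The sketch below implements it in plain Python with average ranks for ties, which matters because the lysis labels are binary. The example lists are toy placeholders, not values from the dataset.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy placeholders only: hypothetical scores and binary lysis outcomes.
toy_scores = [0.1, 0.4, 0.2, 0.9, 0.7]
toy_lysis = [0, 1, 0, 1, 1]
print(round(spearman(toy_scores, toy_lysis), 3))
```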
- 5 Engineered Mutants
The 5 mutants I have chosen are:
Soluble Region Variants
Variant 1: P13L
Why: Directly validated in the data to maintain full function (Lysis: 1, Protein Levels: 1). It confirms the soluble loop tolerates hydrophobic substitution.
Variant 2: S15A
Why: Directly validated in the data (Lysis: 1, Protein Levels: 1). Substituting Serine with Alanine preserves functional spacing while removing the hydroxyl group.
Transmembrane (TM) Region Variants
Variant 3: R30L
Why: Validated in the data to maintain full function (Lysis: 1, Protein Levels: 1). Replacing a charged Arginine with Leucine at the TM boundary increases local hydrophobicity, and the data show this substitution is tolerated.
Variant 4: I46F
Why: Validated in the data to maintain full function (Lysis: 1, Protein Levels: 1). Swapping Isoleucine for Phenylalanine maintains hydrophobicity and supports membrane-disrupting alpha-helix packing.
Robust Combination Variant
Variant 5: P13L + S15A
Why: Combines two adjacent, experimentally proven functional mutations from the soluble domain. This tests whether the local region accommodates additive hydrophobic structural changes without losing stability.
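To build the variant sequences without transcription errors, labels such as `P13L` can be parsed and applied programmatically, with a check that the stated wild-type residue actually sits at that position. This is a minimal sketch; `apply_mutations` is a hypothetical helper, and `TOY_SEQUENCE` is a placeholder that only mimics P13 and S15, not the real L-protein sequence.

```python
import re

# Matches labels of the form <wild-type residue><1-based position><mutant residue>.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence, mutations):
    """Return the sequence with point mutations such as 'P13L' applied."""
    residues = list(sequence)
    for label in mutations:
        match = MUTATION_RE.match(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label}")
        wild_type, mutant = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert 1-based position to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if sequence[index] != wild_type:
            raise ValueError(f"{label}: expected {wild_type}, found {sequence[index]}")
        residues[index] = mutant
    return "".join(residues)


# Placeholder sequence (NOT the L-protein): P at position 13, S at position 15.
TOY_SEQUENCE = "A" * 12 + "P" + "A" + "S" + "A" * 5
print(apply_mutations(TOY_SEQUENCE, ["P13L", "S15A"]))
```

The wild-type check catches off-by-one numbering errors, which are easy to make when a dataset and a structure file number residues differently.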
Option 2: Mutagenesis Using AF2 Multimer
| Component / Section | Key Findings & Metric Analysis | Structural / Biophysical Implications |
|---|---|---|
| AlphaFold2 Model Metrics | • Model 5 top rank (pLDDT = 84.1, pTM = 0.575). • High MSA depth (>15,000 sequences). | Robust, high-confidence structural predictions that have stably converged over 3 recycles. |
| Domain Confidence Profile | • Domain 1: Residues 1–70 (pLDDT > 80). • Loop: Residues 75–105 (pLDDT ≈ 20–40). • Domain 2: Residues 110–350+ (pLDDT > 80). | Avoid engineering mutations in the flexible loop (75–105), as disordered regions tolerate changes without disrupting binding. |
| Mutation Set 1: Electrostatic Inversion | • M1 (Domain 1): Basic to Acidic (e.g., K/R → E). • M2 (Domain 2): Acidic to Basic (e.g., D/E → K). • M3 (Domain 2): Basic to Acidic (e.g., R → E). | Destabilizes binding by shattering existing salt bridges and generating active electrostatic repulsion at the interface. |
| Mutation Set 2: Steric & Hydrophobic Clash | • M4 (Domain 1): Hydrophobic to Charged (e.g., L/I → R). • M5 (Domain 2): Small to Bulky Aromatic (e.g., A/G → W). | Drives a massive energetic desolvation penalty and introduces severe steric hindrance to physically wedge the complex apart. |
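The selection logic in the table (skip low-confidence regions, then invert charge at interface residues) can be sketched as a small filter. Everything here is illustrative: `charge_inversion_candidates` is a hypothetical helper, the pLDDT cutoff of 80 mirrors the domain thresholds in the table, and the interface positions must be supplied by the user because the table does not list specific residues.

```python
# Charge-reversal substitutions from Mutation Set 1: basic -> E, acidic -> K.
CHARGE_INVERSION = {"K": "E", "R": "E", "D": "K", "E": "K"}


def charge_inversion_candidates(sequence, plddt, interface_positions, min_plddt=80.0):
    """Propose charge-reversal mutations at confident, charged interface residues.

    sequence: one-letter amino acid string.
    plddt: per-residue pLDDT values, same length as the sequence.
    interface_positions: 1-based positions assumed to lie at the interface.
    """
    if len(sequence) != len(plddt):
        raise ValueError("sequence and pLDDT must have the same length")
    proposals = []
    for position in sorted(interface_positions):
        residue = sequence[position - 1]
        if plddt[position - 1] < min_plddt:
            continue  # low-confidence (likely disordered) region, skipped
        if residue in CHARGE_INVERSION:
            proposals.append(f"{residue}{position}{CHARGE_INVERSION[residue]}")
    return proposals


# Toy inputs only: not the modeled complex.
toy_sequence = "AKDLRE"
toy_plddt = [90.0, 85.0, 30.0, 88.0, 92.0, 81.0]
print(charge_inversion_candidates(toy_sequence, toy_plddt, {2, 3, 5, 6}))
```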