Week 5 HW: Protein Design - Part 2

Part A. SOD1 Binder Peptide Design

The perplexity scores for the candidate peptides were:

  • Candidate 1 (SDGAVLLGSDGE): 16.25
  • Candidate 2 (LLGSDGALQVGS): 14.65
  • Candidate 3 (SGVAVLCSDGQG): 25.34
  • Candidate 4 (AVGVCGVAVLGN): 17.20

Lower perplexity scores suggest that the model finds a sequence more “familiar” or “expected,” potentially correlating with higher biological plausibility, conformational stability, or likelihood of interaction. Candidate 2, with a perplexity of 14.65, appears to be the most promising candidate from the mutated SOD1 sequence for further investigation, being closest to the known binding peptide’s score.
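As a quick check of this ranking, the short sketch below sorts the four candidates by their reported perplexity. It uses only the scores listed above; the variable names are illustrative and not part of the PepMLM output.

```python
# Perplexity scores as reported for the four PepMLM candidates
# (lower = the model finds the sequence more "expected").
perplexity = {
    "SDGAVLLGSDGE": 16.25,  # Candidate 1
    "LLGSDGALQVGS": 14.65,  # Candidate 2
    "SGVAVLCSDGQG": 25.34,  # Candidate 3
    "AVGVCGVAVLGN": 17.20,  # Candidate 4
}

# Sort ascending so the most "familiar" sequence comes first.
ranked = sorted(perplexity.items(), key=lambda item: item[1])

for rank, (sequence, score) in enumerate(ranked, start=1):
    print(f"{rank}. {sequence}  perplexity = {score:.2f}")

best_sequence = ranked[0][0]
```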

Candidate 1: ipTM = 0.52 pTM = 0.88


  • Localization: The peptide (the small chain with yellow and orange segments at the top) is localized near the N-terminus. It sits atop the $\beta$-barrel, in the region where the initial helix and the N-terminal end of SOD1 (shown in dark blue) connect with the rest of the protein structure.
  • Interaction with the $\beta$-barrel: The peptide directly engages the $\beta$-barrel region. It appears to be “resting” on the upper $\beta$-sheets, effectively acting as a molecular cap.
  • Surface-bound vs. buried: The peptide appears primarily surface-bound. While it is not deeply buried within the protein core, it maintains extensive contact with the exposed surface, a characteristic typical of stabilizing peptides.
  • Relationship to A4V: Being situated at the top, near the start of the sequence, the peptide is in the immediate vicinity of residue 4. This suggests it may help “anchor” the N-terminus, preventing it from detaching or unfolding due to the destabilizing effect of the A4V mutation.


Upon closer inspection of the AlphaFold3 model, a hydrogen bond is visible between the peptide backbone and the SOD1 $\beta$-sheet. This inter-chain interaction suggests that, in the predicted structure, the peptide is not just near the enzyme but docks onto it through a polar contact, which is consistent with the observed ipTM of 0.52.

Candidate 2: ipTM = 0.54 pTM = 0.89


Although Candidate 2 shows a similar confidence score (ipTM = 0.54), the structural model reveals a more extensive interaction network. Detailed inspection shows multiple hydrogen bonds between the peptide backbone and the SOD1 surface loops, compared with the single-point attachment of Candidate 1. This “molecular velcro” effect could provide better stabilization of the N-terminus, making this peptide a stronger therapeutic lead against the A4V mutation.

Candidate 3: ipTM = 0.35 pTM = 0.88


Although the global view suggests a weak interaction (ipTM ≈ 0.35), close-up inspection of the model shows a specific docking pose. The peptide binds to a flexible, disordered loop (cyan region) rather than the rigid $\beta$-barrel core. The interaction is mediated by a hydrogen-bond network and appears to include electrostatic contacts between an arginine (Arg) side chain and acidic residues. Because SGVAVLCSDGQG contains no arginine (its only charged side chain is Asp), the arginine seen in the close-up must belong to SOD1, so the chain assignment of this contact should be rechecked in the viewer. A second key residue (appearing as a nitrogen-rich heterocycle) is also positioned to stabilize the enzyme’s loop through electrostatic contacts. Together, these contacts suggest that the peptide could act as an allosteric stabilizer, reducing the flexibility of critical loops near the N-terminus.

Candidate 4: ipTM = 0.49 pTM = 0.89


  • ipTM of 0.49: By falling just below the 0.5 threshold, the score indicates that AlphaFold has low confidence in a stable binding interface.
  • Visual evidence: The image shows the peptide (yellow chain) physically separated from the SOD1 enzyme (blue chain). Although the peptide adopts a self-folded conformation, there are no hydrogen bonds connecting it to the protein.
  • pTM of 0.89: As in the previous cases, this high value indicates that the SOD1 fold itself is modeled with confidence; the issue is the lack of a predicted peptide interface.
  • Why does it fail to bind? With its numerous valines (V) and glycines (G), this peptide appears to be excessively hydrophobic and prone to self-folding. Instead of targeting the SOD1 surface, the peptide prefers to interact with itself, remaining “afloat” in the solvent without engaging the target.

After a comprehensive analysis using PepMLM for generation and AlphaFold3 for structural validation, two distinct strategies for stabilizing the mutant SOD1 emerge:

Candidate 2 (LLGSDGALQVGS) - The Structural Lead: With the lowest perplexity score (14.65) and the highest ipTM of the set (0.54), this peptide stands out as the most confidently predicted binder. Its aliphatic composition allows it to dock firmly against the protein’s core, acting as a reliable “patch” for the hydrophobic vulnerability created by the A4V mutation. 💎

Candidate 3 (SGVAVLCSDGQG) - The “Dark Horse” Candidate: While its global confidence metrics are lower, close-up inspection of the model suggests an “allosteric” mechanism. Candidate 3 shows a hydrogen-bond network that “clamps” onto disordered surface loops. By immobilizing these flexible regions near the N-terminus, it could provide a distinct form of protection against the unfolding process that leads to toxic aggregation. 💎

Final Recommendation: While Candidate 2 is the primary choice for advancement toward therapy due to its overall stability, Candidate 3 warrants further investigation. Its ability to “freeze” specific protein loops offers a complementary approach to traditional binding, potentially providing a more nuanced way to rescue the native fold of the SOD1-A4V enzyme.

The known binder presents ipTM = 0.33 and pTM = 0.79, so by these metrics it is clearly not a better choice 👎 👎 👎
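To compare all five complexes side by side, the sketch below tabulates the reported ipTM and pTM values and flags which ones clear the 0.5 ipTM level used as a rough cutoff in the Candidate 4 discussion. The cutoff is treated here as a rule of thumb rather than a hard criterion, and the names are illustrative.

```python
# (ipTM, pTM) values as reported for each AlphaFold3 prediction.
scores = {
    "Candidate 1": (0.52, 0.88),
    "Candidate 2": (0.54, 0.89),
    "Candidate 3": (0.35, 0.88),
    "Candidate 4": (0.49, 0.89),
    "Known binder": (0.33, 0.79),
}

IPTM_CUTOFF = 0.5  # rough threshold assumed from the discussion above

# Rank by interface confidence (ipTM), highest first.
ranked = sorted(scores, key=lambda name: scores[name][0], reverse=True)
above_cutoff = [name for name in ranked if scores[name][0] >= IPTM_CUTOFF]

for name in ranked:
    iptm, ptm = scores[name]
    flag = "above" if iptm >= IPTM_CUTOFF else "below"
    print(f"{name:<13} ipTM = {iptm:.2f}  pTM = {ptm:.2f}  ({flag} cutoff)")
```

Only Candidates 1 and 2 clear the cutoff, and the known binder scores lowest on both metrics.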

NOTE: During the interaction with Gemini, the following suggestion was received: “SOD1 is a metalloenzyme, meaning it requires Copper (Cu) and Zinc (Zn) to be stable. If you want an ultra-precise model, you could add this.” In practice, this means adding the metals as additional entities (ligands or ions) in the AlphaFold job, using their SMILES strings or chemical identifiers. While the current model focuses on the protein-peptide interface, including these metallic cofactors would better simulate the native, stabilized state of the SOD1 enzyme.

PeptiVerse

Candidate 4 (AVGVCGVAVLGN) presents a contradictory profile. PeptiVerse predicts the highest binding affinity of the set ($pK_d$ = 6.651), yet the AlphaFold3 structural model showed no physical contact with the enzyme (ipTM 0.49). This discrepancy is likely explained by the peptide’s high hydrophobicity (GRAVY: 1.83). Such extreme hydrophobicity often leads to non-specific interactions or self-aggregation rather than targeted docking at the A4V site. Furthermore, its hemolysis probability (0.132) is considerably higher than that of the other candidates, making it a less safe therapeutic option.
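The GRAVY value can be reproduced by hand. The sketch below assumes GRAVY is the mean Kyte-Doolittle hydropathy over all residues (the standard definition); whether PeptiVerse computes it in exactly this way is an assumption, although the result matches the reported 1.83.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


candidates = {
    "Candidate 1": "SDGAVLLGSDGE",
    "Candidate 2": "LLGSDGALQVGS",
    "Candidate 3": "SGVAVLCSDGQG",
    "Candidate 4": "AVGVCGVAVLGN",
}

for name, seq in candidates.items():
    print(f"{name} ({seq}): GRAVY = {gravy(seq):.2f}")
```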

A comparison between structural modeling and pharmacological prediction reveals a trade-off. Candidate 2 (LLGSDGALQVGS) has the highest structural confidence (ipTM 0.54), but Candidate 3 (SGVAVLCSDGQG) shows a much stronger predicted binding affinity in PeptiVerse ($pK_d$ of 6.242 vs 4.502). Visually, this is consistent with the dense hydrogen-bond network observed in the AlphaFold3 close-up, where Candidate 3 “clamps” onto the surface loops. Both peptides show favorable therapeutic profiles, with maximum predicted solubility and negligible hemolysis probability, and both avoid the unfavorable traits of highly hydrophobic non-binders like Candidate 4.

PeptiVerse Evaluation: Candidate 3 (SGVAVLCSDGQG)

  • Binding Affinity: 6.242 ($pK_d$). This is significantly higher than Candidate 2 (4.502). In logarithmic terms, this represents a much stronger predicted affinity for the target.
  • Solubility: 1.000. Like Candidate 2, it is predicted to be perfectly soluble.
  • Hemolysis: 0.022. Even lower than Candidate 2, making it exceptionally safe for systemic use.
  • Net Charge: -1.55. Its slightly more negative charge might contribute to its better solubility and specific interaction with the mutant site.

Interestingly, Candidate 3 (SGVAVLCSDGQG) emerges as a superior pharmacological lead.
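To make the logarithmic comparison concrete, the sketch below converts the two predicted values to dissociation constants, assuming the usual convention $pK_d = -\log_{10}(K_d)$ with $K_d$ in mol/L. The function name is illustrative.

```python
def pkd_to_kd_molar(pkd: float) -> float:
    """Convert pKd to Kd in mol/L, assuming pKd = -log10(Kd)."""
    return 10.0 ** (-pkd)


kd_candidate3 = pkd_to_kd_molar(6.242)  # roughly 0.57 micromolar
kd_candidate2 = pkd_to_kd_molar(4.502)  # roughly 31 micromolar

# A difference of 1.74 pKd units corresponds to about a 55-fold change in Kd.
fold_difference = kd_candidate2 / kd_candidate3

print(f"Candidate 3: Kd = {kd_candidate3 * 1e6:.2f} uM")
print(f"Candidate 2: Kd = {kd_candidate2 * 1e6:.2f} uM")
print(f"Predicted fold difference: {fold_difference:.0f}x")
```

Under this convention, the predicted affinity of Candidate 3 is about 55 times stronger than that of Candidate 2.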

Optimized Peptides with moPPIt

The peptides generated by moPPIt represent a significant shift from “plausible sampling” to “precision engineering.” Compared to the PepMLM candidates, several key differences emerge:

  • Chemical Diversity and Functional Groups: The moPPIt sequences incorporate a wider variety of amino acids, such as Cysteine (C), Tyrosine (Y), and Phenylalanine (F). While the PepMLM leads were primarily aliphatic or polar (rich in L, V, S, D, G), the presence of Cysteine in the moPPIt leads allows for potential disulfide bond formation, which can stabilize the peptide’s conformation and enhance its “clamping” effect on the SOD1 surface.
  • Targeted Structural Anchoring: Unlike the stochastic nature of PepMLM, which sampled sequences that could theoretically bind anywhere, moPPIt was guided to specific residues near the A4V mutation site. This targeted approach results in sequences that are chemically optimized to interact with the specific structural pocket destabilized by the mutation.
  • Pre-optimized Therapeutic Metrics: By incorporating solubility and hemolysis guidance during the generation process, moPPIt avoids the pitfalls of extreme hydrophobicity seen in some PepMLM candidates (like Candidate 4, which had a GRAVY score of 1.83). This ensures that the generated sequences are not just good binders but also safe pharmacological leads.

Pre-Clinical Evaluation Strategy: Before advancing these moPPIt candidates to clinical trials, they must be validated through a multi-step process:

  1. Structural Validation (In Silico)
  • Molecular Dynamics (MD) Simulations: Static models like AlphaFold3 are insufficient. MD simulations are required to evaluate the binding residence time and ensure the peptide remains docked at the A4V site under physiological fluctuations.
  2. Biochemical Assays (In Vitro)
  • Thioflavin T (ThT) Fibrillization Assay: This is the most critical functional test. It determines if the peptide can successfully inhibit the aggregation of mutant SOD1 into toxic fibrils.
  • Surface Plasmon Resonance (SPR): This provides an accurate measurement of the dissociation constant ($K_d$) and binding kinetics (on/off rates) to verify the affinity predicted by PeptiVerse.
  3. Biological & Safety Testing
  • Proteolytic Stability: Since these are peptides, they must be tested for resistance to serum proteases to ensure a sufficient half-life in the human body.
  • Cellular Toxicity: The leads must be tested on motor neuron cultures expressing the A4V mutation to confirm they reduce cellular stress and improve neuron survival without inducing off-target toxicity.

Note on AI Collaboration: The technical responses and structural analyses presented in this work were developed with the assistance of Gemini, an artificial intelligence model by Google. Gemini provided the initial drafts and technical frameworks based on the raw data from AlphaFold3, PepMLM, PeptiVerse, and moPPIt. The final review, polishing, and scientific validation were performed by the student to ensure accuracy and alignment with the course objectives.

Part C: Final Project: L-Protein Mutants

Analysis of Clustal Omega Alignment

  1. Soluble Region (Residues 1–40): This region is critical for DnaJ chaperone interaction.
Highly Variable (Ideal for Mutation):
  • Positions 1–6: The start of the protein shows significant variation (METRFP vs METQSP vs MEIRFP). Position 4 is particularly flexible.
  • Positions 15–19: This loop varies between STNRR, STNRF, and STNRY. Mutating these could alter the binding surface for DnaJ.
Conserved (Avoid Mutating):
  • Positions 21–25 (PFKHE): These residues are almost identical across all sequences, suggesting they are structurally vital.
  • Positions 30–38 (RRQQRSST): This motif is highly conserved.
  2. Transmembrane Region (Residues 41–75): This region integrates into the membrane to form pores.
Variable (Ideal for Mutation):
  • Position 45: Changes between F (Phenylalanine) and C (Cysteine).
  • Position 73: Varies between Q (Glutamine) and R (Arginine). Adding a charge here could affect how the protein sits in the membrane.
Conserved (Avoid Mutating):
  • Positions 48–60 (LAIFLSKFTNQLL): This hydrophobic core is very consistent, as it must maintain a specific shape to span the lipid bilayer.
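The column-by-column reading of the alignment can be sketched in a few lines. The example below uses only the three N-terminal fragments quoted above (positions 1–6), not the full Clustal Omega output, so it illustrates the method rather than reproducing the whole analysis. The function name is hypothetical.

```python
# The three N-terminal variants quoted from the alignment (positions 1-6).
variants = ["METRFP", "METQSP", "MEIRFP"]


def column_residues(sequences):
    """Map each 1-based alignment position to the sorted set of residues seen there."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return {
        position: sorted(set(column))
        for position, column in enumerate(zip(*sequences), start=1)
    }


columns = column_residues(variants)

# A column is "variable" if more than one residue type appears in it.
variable_positions = [pos for pos, residues in columns.items() if len(residues) > 1]

for pos, residues in columns.items():
    label = "variable" if len(residues) > 1 else "conserved"
    print(f"position {pos}: {'/'.join(residues)} ({label})")
```

Within these three fragments, positions 3, 4, and 5 vary while positions 1, 2, and 6 are conserved.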

According to the graph, we should in general avoid the amino acids Cysteine, Methionine, and Tryptophan. Residue 4 and residues 21–28 seem to be good options to mutate.

Looking at the Excel spreadsheet with the experimental data, there is a clear “functional window” for engineering between residues 13 and 31 of the soluble domain. In this region, multiple mutations (such as R18G, R20W, and K23E) maintain a Lysis score of 1, showing that this domain is structurally flexible and can tolerate amino acid substitutions without losing functional integrity. In contrast, the N-terminal start (residues 1–11) and the transmembrane core (residues 48–60) are highly sensitive: most mutations there result in Lysis 0 due to the disruption of protein production or pore-forming capability. Therefore, our engineering strategy focuses on introducing mutations within the residue 13–31 range to optimize DnaJ-independent folding and protein expression while preserving the essential lytic activity of the phage.

Functional Robustness in the TM Domain: Between residues 38 and 75, there are numerous “safe” substitutions, such as T49S, A63V, and T69S, all of which maintain full lysis activity. This suggests that while the helix must span the membrane, it can tolerate many conservative amino acid changes without losing its pore-forming ability.

The “Lethal” Exceptions: Even within this functional window, there are critical “black holes” where most or all changes cause failure. For example, position 49 is highly sensitive; while S49L works, several other mutations at this exact spot lead to 0 lysis. Position 60 is a “dead zone”: L60P, L60V, and L60Q all result in a total loss of function.

The Soluble Domain (Residues 1–40): This region interacts with the host chaperone DnaJ.
Highly Sensitive Sites (Lysis = 0):
  • Position 1 (M1I, M1T): Any change to the start codon abolishes protein production and lysis.
  • Position 3 (T3I, T3S): Mutations here result in a total loss of function.
  • Position 33 (Q33H): Changing Glutamine to Histidine at this position stops lysis.
Tolerant Sites (Lysis = 1):
  • Position 18 (R18G, R18I): The protein remains functional, suggesting this part of the soluble domain is flexible.
  • Position 20 (R20W, R20L): These substitutions are well-tolerated.
  • Position 23 (K23E): This site is resilient to change.
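The position-by-position classification above can be sketched as a small tally. The dictionary below contains only mutations explicitly quoted in this analysis, not the full spreadsheet, and the labels "tolerant", "sensitive", and "mixed" are illustrative names rather than columns from the data.

```python
import re
from collections import defaultdict

# Lysis outcomes (1 = lysis retained, 0 = lysis lost) for mutations quoted in the text.
lysis = {
    "M1I": 0, "M1T": 0, "T3I": 0, "T3S": 0, "Q33H": 0,
    "L60P": 0, "L60V": 0, "L60Q": 0,
    "R18G": 1, "R18I": 1, "R20W": 1, "R20L": 1, "K23E": 1,
    "T49S": 1, "A63V": 1, "T69S": 1,
}


def parse_mutation(label: str):
    """Split a label such as 'R18G' into (wild type, position, substitution)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


# Group outcomes by residue position.
outcomes = defaultdict(list)
for label, score in lysis.items():
    _, position, _ = parse_mutation(label)
    outcomes[position].append(score)

# Classify each position by whether all, none, or some mutations keep lysis.
classification = {}
for position, scores in sorted(outcomes.items()):
    if all(scores):
        classification[position] = "tolerant"
    elif not any(scores):
        classification[position] = "sensitive"
    else:
        classification[position] = "mixed"
    print(f"position {position}: {classification[position]} ({len(scores)} mutation(s))")
```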

Proposed Mutations for MS2 L-Protein Engineering

R18G + R20W + K23E: Combines three sites proven to be functional (Lysis 1) in the lab data. By changing these three residues simultaneously, we drastically alter the electrostatic surface of the N-terminal domain to ensure DnaJ independence while maintaining high protein levels.

S15A + R19S + Q32E: Targets the highly variable “loop” residues identified in the Clustal Omega alignment. Replacing these with smaller or differently charged residues (S to A, R to S, Q to E) aims to create a “stealth” soluble domain that fails to bind the host’s mutated DnaJ chaperone.

F45A + A63V: Combines two experimentally validated “safe” sites in the lysis-active region. This combination aims to stabilize the hydrophobic helix (A63V) while testing if the removal of the bulky Phenylalanine (F45A) facilitates faster pore assembly.

T69S + L73R: Uses a proven functional mutation (T69S) paired with an evolutionary change seen in Emesvirus (L73R). The goal is to optimize the C-terminal “anchor” to improve membrane penetration and accelerate bacterial killing.

R18I + T75S: Combines a high-expression soluble mutation (R18I) with a conservative C-terminal tail modification. This variant is designed to test if increasing the initial stability of the protein translates into more efficient processing at the membrane interface.
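Before ordering any of these variants, it is worth checking that each mutation label names the correct wild-type residue. The sketch below does this against the sequence fragments quoted in the alignment analysis (positions 1–6, 15–19, 21–25, and 48–60, taking the first variant listed for each). Positions outside those fragments are reported as "unknown" rather than guessed, and the full wild-type sequence should be used for a real check.

```python
import re

# Wild-type fragments quoted in the alignment analysis, keyed by start position.
fragments = {1: "METRFP", 15: "STNRR", 21: "PFKHE", 48: "LAIFLSKFTNQLL"}

# Partial reference: position -> residue, only where the text gives the residue.
reference = {
    start + offset: residue
    for start, fragment in fragments.items()
    for offset, residue in enumerate(fragment)
}

proposals = [
    ["R18G", "R20W", "K23E"],
    ["S15A", "R19S", "Q32E"],
    ["F45A", "A63V"],
    ["T69S", "L73R"],
    ["R18I", "T75S"],
]


def check_mutation(label: str) -> str:
    """Return 'match', 'mismatch', or 'unknown' for the wild-type residue in a label."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    wild_type, position = match.group(1), int(match.group(2))
    known = reference.get(position)
    if known is None:
        return "unknown"
    return "match" if known == wild_type else "mismatch"


report = {label: check_mutation(label) for combo in proposals for label in combo}

for combo in proposals:
    print(" + ".join(combo), "->", [report[label] for label in combo])
```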