Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design

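| Peptide | Perplexity | ipTM score | N terminus | β-barrel | Dimer interface |
|---|---|---|---|---|---|
| WRYPAAAAALKX | 4.30808 | 0.30 | Close | No | Surface bound |
| WRYGATVAAHKX | 5.811953 | 0.48 | Far | No | Partially buried |
| WLSGAAALALKX | 5.716131 | 0.45 | Close | No | Surface bound |
| WLYPAAALALKX | 8.30171 | 0.36 | Far | No | Partially buried |
| FLYRWLPSRRGG | | 0.38 | Far | No | Surface bound |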

The predicted protein–peptide complexes produced low ipTM scores overall, indicating low confidence in the modeled interactions. The PepMLM-generated peptides had ipTM values ranging from 0.30 to 0.48. The highest score was observed for WRYGATVAAHKX (ipTM = 0.48), followed by WLSGAAALALKX (ipTM = 0.45); both exceeded the ipTM score of the known SOD1-binding peptide FLYRWLPSRRGG (ipTM = 0.38). Despite these slightly higher scores, none of the predicted peptides appeared to interact with the β-barrel region of SOD1, and all were either surface-bound or only partially buried on the protein surface. Overall, while two PepMLM-generated peptides showed marginally higher ipTM scores than the known binder, the predicted interactions remain weak and uncertain.
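
The comparison against the known binder can be reproduced with a short script. This is a minimal sketch that only re-ranks the ipTM values already listed in the table above; the function name `rank_against_control` is hypothetical and not part of PepMLM or AlphaFold3.

```python
# ipTM values copied from the table above (PepMLM-generated peptides).
IPTM = {
    "WRYPAAAAALKX": 0.30,
    "WRYGATVAAHKX": 0.48,
    "WLSGAAALALKX": 0.45,
    "WLYPAAALALKX": 0.36,
}
# Known SOD1-binding peptide, used here as the control.
CONTROL_NAME, CONTROL_IPTM = "FLYRWLPSRRGG", 0.38


def rank_against_control(scores, control_score):
    """Sort peptides by ipTM (highest first) and flag those above the control."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(peptide, score, score > control_score) for peptide, score in ranked]


for peptide, score, beats_control in rank_against_control(IPTM, CONTROL_IPTM):
    marker = "above" if beats_control else "not above"
    print(f"{peptide}: ipTM = {score:.2f} ({marker} {CONTROL_NAME})")
```

A higher ipTM than the control does not by itself establish binding; the ranking is only a way to order candidates for closer inspection.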

| Peptide | Predicted binding affinity | Solubility | Hemolysis probability | Net charge | Molecular weight (Da) |
|---|---|---|---|---|---|
| WRYPAAAAALKX | 5.437 | Soluble | Non-hemolytic | 1.76 | 1199.6 |
| WRYGATVAAHKX | 5.440 | Soluble | Non-hemolytic | 1.85 | 1241.6 |
| WLSGAAALALKX | 6.550 | Soluble | Non-hemolytic | 0.76 | 1082.6 |
| WLYPAAALALKX | 6.693 | Soluble | Non-hemolytic | 0.76 | 1198.7 |
| FLYRWLPSRRGG | 5.96 | Soluble | Non-hemolytic | 2.76 | 1507.7 |

The peptide property predictions were broadly favorable, since all candidates were predicted to be soluble and non-hemolytic. However, the AlphaFold3 results showed only modest ipTM values, suggesting weak to moderate confidence in the predicted protein–peptide interactions. The peptide with the highest ipTM score was WRYGATVAAHKX (0.48), while the lowest predicted binding affinity value, read here as the best, was observed for WRYPAAAAALKX (5.437), so the peptide with the highest ipTM was not the one with the best predicted affinity. Overall, WRYGATVAAHKX appears to offer the best balance between structural binding potential and therapeutic properties, so it would be the strongest candidate to advance.

I would choose WRYGATVAAHKX, because:

  • it has the highest ipTM (0.48)
  • it is soluble
  • it is non-hemolytic
  • its net charge is moderate (1.85)
  • it outperformed the known binder in ipTM (0.48 vs. 0.38)
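
The selection logic in the list above can be written as a simple filter followed by a ranking. The sketch below is one possible reading of those criteria, not a standard scoring scheme: the `Candidate` class and `select_candidate` function are hypothetical names, and the values are copied from the two tables in Part A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sequence: str
    iptm: float
    soluble: bool
    hemolytic: bool
    net_charge: float


# PepMLM-generated peptides, with values taken from the Part A tables.
CANDIDATES = [
    Candidate("WRYPAAAAALKX", 0.30, True, False, 1.76),
    Candidate("WRYGATVAAHKX", 0.48, True, False, 1.85),
    Candidate("WLSGAAALALKX", 0.45, True, False, 0.76),
    Candidate("WLYPAAALALKX", 0.36, True, False, 0.76),
]
KNOWN_BINDER_IPTM = 0.38  # FLYRWLPSRRGG


def select_candidate(candidates, control_iptm):
    """Keep soluble, non-hemolytic peptides whose ipTM exceeds the control,
    then return the one with the highest ipTM (None if nothing passes)."""
    passing = [
        c for c in candidates
        if c.soluble and not c.hemolytic and c.iptm > control_iptm
    ]
    return max(passing, key=lambda c: c.iptm, default=None)


best = select_candidate(CANDIDATES, KNOWN_BINDER_IPTM)
print(best.sequence if best else "no candidate passed the filters")
```

Because every peptide here is predicted to be soluble and non-hemolytic, the filters do not discriminate between candidates, and the choice is driven by ipTM alone.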

Part C: Final Project: L-Protein Mutants

| Variant | Mutation | Region | Experimental evidence | Conservation analysis | Expected effect | Rationale |
|---|---|---|---|---|---|---|
| V1 | P13L | Soluble region | Lysis = 1; Protein level = 1 | Highly conserved; keep with caution | May alter soluble-domain behavior while preserving lysis | Selected because it retained lysis activity and detectable protein expression in the experimental mutant dataset. Although the site is conserved, it is kept as a cautious candidate because experimental data supports functionality. |
| V2 | S15A | Soluble region | Lysis = 1; Protein level = 1 | Moderately conserved / partially variable | May preserve or improve folding while maintaining lysis | Selected because it is a small amino acid change, retained lysis activity, and occurs in a less constrained region than fully conserved sites. |
| V3 | R30Q | Soluble region | Lysis = 1; Protein level = 1 | Highly conserved; keep with caution | May affect DnaJ-associated interaction or soluble-domain properties | Selected because the soluble domain is associated with DnaJ interaction, and this mutant retained lysis and protein expression experimentally. |
| V4 | L44P | Transmembrane region | Lysis = 1; Protein level = 1 | Highly conserved; keep with caution | May alter membrane-associated lysis activity | Selected because the transmembrane region affects lysis activity, and this mutation remained functional in the experimental data. |
| V5 | A45P | Transmembrane region | Lysis = 1; Protein level = 1 | Moderately conserved / partially variable | May modify transmembrane behavior while preserving lysis | Selected because it retained both lysis activity and detectable protein expression and is less strictly conserved than nearby transmembrane residues. |

To select MS2 L-protein mutant candidates, I first divided the protein into two functional regions: the soluble N-terminal region, which is associated with DnaJ interaction, and the C-terminal transmembrane region, which affects lysis activity. I prioritized mutations that retained lysis activity and detectable protein levels in the experimental mutant dataset. I then retrieved homologous L-protein sequences with BLASTp and aligned them with Clustal Omega to evaluate whether each candidate position was conserved or variable. Highly conserved sites were interpreted with caution, while partially variable sites were considered more permissive for mutation. The final five variants include mutations in both the soluble and transmembrane regions, allowing the design to test effects on DnaJ-related behavior and membrane-associated lysis.
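
The conservation check described above can be approximated by scoring each alignment column as the fraction of sequences that share its most common residue. The sketch below is illustrative only: the toy alignment is NOT real L-protein data, the 0.9 cutoff is an assumed threshold, and the function names are hypothetical. With a real Clustal Omega output, alignment columns must first be mapped to MS2 L-protein residue numbers, since gaps shift the numbering.

```python
from collections import Counter


def column_conservation(aligned_seqs, column):
    """Fraction of sequences sharing the most common symbol at a 1-based
    alignment column. Gap characters ('-') are counted as their own symbol."""
    symbols = [seq[column - 1] for seq in aligned_seqs]
    most_common_count = Counter(symbols).most_common(1)[0][1]
    return most_common_count / len(symbols)


def classify(score, high_cutoff=0.9):
    """Label a column using an assumed cutoff (0.9 is an arbitrary choice)."""
    return "highly conserved" if score >= high_cutoff else "partially variable"


# Toy alignment for illustration only; these are not L-protein homologs.
toy_alignment = [
    "MKTAL",
    "MKSAL",
    "MRTAL",
    "MKTVL",
]

for column in range(1, len(toy_alignment[0]) + 1):
    score = column_conservation(toy_alignment, column)
    print(f"column {column}: {score:.2f} ({classify(score)})")
```

A score like this ignores how chemically similar the substituted residues are, so it is a first-pass filter rather than a replacement for inspecting the alignment directly.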

The expected outcome is that at least one of these L-protein variants will maintain lysis activity while changing properties related to folding, DnaJ interaction, or membrane-associated lysis. Soluble-domain mutants may help test whether the protein can become less dependent on DnaJ-mediated processing, while transmembrane mutants may affect the speed or efficiency of bacterial lysis. Because some selected residues are conserved, these mutations should be interpreted as candidates for testing rather than guaranteed improvements. The next experimental step would be to synthesize the mutant genes, clone them into the appropriate construct, and compare their lysis activity against the wild-type L-protein.

Figure 1. Clustal Omega alignment of selected MS2 L-protein homologs.
The alignment was used to evaluate whether candidate mutation sites were highly conserved or partially variable before selecting final L-protein mutant variants.


Note. Candidate mutations were interpreted using conservation patterns across homologous L-protein sequences. Highly conserved residues were kept only with caution when supported by experimental lysis and protein-expression data.