Week 5 HW: Protein Design Part II

Protein Design II

SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

  1. Retrieval of SOD1 Sequence

The human Superoxide Dismutase 1 (SOD1) protein sequence was retrieved from UniProt (Accession P00441).

Wild-type sequence (first region):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
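  2. Introduction of the A4V Mutation

The classical ALS mutation A4V replaces Alanine (A) with Valine (V) near the N-terminus.

At first glance, the UniProt sequence does not appear to have an Alanine at position 4:

Position | Residue
1 | M
2 | A
3 | T
4 | K
5 | A
6 | V

Residue 4 of the UniProt sequence is Lysine, not Alanine. The reason is that SOD1 mutations are conventionally numbered on the mature protein, without the initiator Methionine, so Ala4 in the mutation name corresponds to position 5 of the UniProt sequence. The A→V substitution was therefore applied at position 5: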

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

This substitution increases hydrophobicity near the N-terminus and is known to destabilize SOD1, promoting aggregation associated with aggressive familial ALS.
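The position check can be made explicit in code. The following is a minimal sketch, not part of the original workflow: the helper name `apply_point_mutation` and its `offset` parameter are hypothetical, and only the first 24 residues of the sequence are used for brevity. The sketch assumes that the A4V name follows the mature-protein numbering (initiator Methionine not counted), which is why an offset of 1 is needed against the UniProt sequence.

```python
# First 24 residues of the UniProt P00441 sequence listed above (shortened for brevity).
WT_SOD1_NTERM = "MATKAVCVLKGDGPVQGIINFEQK"


def apply_point_mutation(seq, wt, pos, mut, offset=0):
    """Replace residue `wt` at 1-based position `pos + offset` with `mut`.

    `offset` (hypothetical parameter) shifts the numbering, e.g. offset=1 when
    the mutation name does not count the initiator Methionine.
    """
    index = pos + offset - 1  # convert to a 0-based string index
    if seq[index] != wt:
        raise ValueError(
            f"expected {wt} at sequence position {index + 1}, found {seq[index]}"
        )
    return seq[:index] + mut + seq[index + 1:]


# A4V in mature-protein numbering maps to position 5 of the UniProt sequence.
mutant_nterm = apply_point_mutation(WT_SOD1_NTERM, "A", 4, "V", offset=1)
print(mutant_nterm)
```

Calling the helper without the offset raises an error because position 4 of the UniProt sequence is Lysine, which is the mismatch described above.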

  3. Peptide Generation with PepMLM

With the PepMLM-650M Colab notebook, the mutant SOD1 sequence was used as the conditioning context to generate four peptides, each 12 amino acids long.

[image]

When the PepMLM Colab notebook was run, the generation step produced the same sequence, WRYYAVAAAHKX, for all four peptides. A likely explanation is deterministic (greedy) decoding: if the model selects the highest-probability amino acid at each position, the same conditioning sequence (the A4V mutant SOD1) and the same generation settings yield the same peptide on every run instead of diverse sequences. The “X” at the end of the sequence indicates that the model predicted an unknown or unresolved amino acid token at that position.

Because all four peptides were identical, only one unique PepMLM candidate was available. The control peptide FLYRWLPSRRGG was added separately for comparison, as required in the assignment.
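The difference between deterministic and stochastic decoding can be illustrated with a toy example. This sketch does not call PepMLM: the per-position distributions in `TOY_DISTRIBUTIONS` are invented for illustration, and the function names are hypothetical. It only shows why picking the most probable residue at every position returns the same peptide each time, while sampling from the distribution can return different ones.

```python
import random

# Hypothetical per-position amino acid probabilities (not real PepMLM output).
TOY_DISTRIBUTIONS = [
    {"W": 0.56, "F": 0.30, "Y": 0.14},
    {"R": 0.40, "K": 0.35, "H": 0.25},
    {"Y": 0.46, "F": 0.34, "W": 0.20},
]


def greedy_decode(distributions):
    """Pick the highest-probability residue at every position (deterministic)."""
    return "".join(max(d, key=d.get) for d in distributions)


def sample_decode(distributions, rng):
    """Draw each residue at random in proportion to its probability."""
    return "".join(
        rng.choices(list(d), weights=list(d.values()))[0] for d in distributions
    )


# Greedy decoding repeated four times always yields the same peptide.
greedy_peptides = [greedy_decode(TOY_DISTRIBUTIONS) for _ in range(4)]

# Sampling can yield different peptides from the same distributions.
rng = random.Random(0)
sampled_peptides = [sample_decode(TOY_DISTRIBUTIONS, rng) for _ in range(4)]

print(greedy_peptides)
print(sampled_peptides)
```

If the notebook exposes sampling settings such as top-k or temperature, enabling them would be the analogous change, but the exact option names depend on the notebook version.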

Snapshot of part of the output:

[image]

The final generated peptides and the control sequence are as follows:

Peptide | Sequence
Pep1 | WRYYAVAAAHKX
Pep2 | WRYYAVAAAHKX
Pep3 | WRYYAVAAAHKX
Pep4 | WRYYAVAAAHKX
Control | FLYRWLPSRRGG

PepMLM Token Prediction Scores:

Position | Amino Acid | Score
1 | W | 0.562357
2 | R | 0.230632
3 | Y | 0.458953
4 | Y | 0.257805
5 | A | 0.329096
6 | V | 0.214972
7 | A | 0.337871
8 | A | 0.136613
9 | A | 0.123724
10 | H | 0.186813
11 | K | 0.268938
12 | X | 0.243224
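A short summary of these scores helps locate the least certain positions. The sketch below assumes the scores are per-position values where higher means more confident; that interpretation is an assumption, since the notebook output does not define them here. The variable names are illustrative.

```python
# Token prediction scores copied from the table above: (amino acid, score).
token_scores = [
    ("W", 0.562357), ("R", 0.230632), ("Y", 0.458953), ("Y", 0.257805),
    ("A", 0.329096), ("V", 0.214972), ("A", 0.337871), ("A", 0.136613),
    ("A", 0.123724), ("H", 0.186813), ("K", 0.268938), ("X", 0.243224),
]

# Average score across the 12 positions.
mean_score = sum(score for _, score in token_scores) / len(token_scores)

# Position (1-based) with the lowest score.
lowest_pos, (lowest_aa, lowest_score) = min(
    enumerate(token_scores, start=1), key=lambda item: item[1][1]
)

# Position (1-based) with the highest score.
highest_pos, (highest_aa, highest_score) = max(
    enumerate(token_scores, start=1), key=lambda item: item[1][1]
)

print(f"mean score: {mean_score:.3f}")
print(f"lowest:  position {lowest_pos} ({lowest_aa}, {lowest_score})")
print(f"highest: position {highest_pos} ({highest_aa}, {highest_score})")
```

Under that assumption, the central run of Alanines (positions 8 and 9) is where the model is least certain, and the N-terminal Tryptophan is where it is most certain.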

Part 2: Evaluate Binders with AlphaFold3

Submission to AlphaFold Server

The mutant A4V SOD1 FASTA sequence was submitted to the AlphaFold Server. For each test, the SOD1 mutant sequence was entered as the first chain, followed by the peptide sequence as the second chain to model the protein–peptide complex.

The following image shows the submission of the SOD1 mutant sequence to the AlphaFold Server:

[image]

The result generated from this submission is as follows:

[image]

  1. Peptide 1 Evaluation

Original PepMLM Sequence

WRYYAVAAAHKX

Because X represents an unknown amino acid, it was replaced with E (Glutamic acid) before submission to AlphaFold:

Final peptide used:

WRYYAVAAAHKE
[image]

AlphaFold Scores

Metric | Value
ipTM | 0.26
pTM | 0.71

Structural Observation

The AlphaFold prediction produced an ipTM score of 0.26 and a pTM score of 0.71. The pTM value indicates that the overall SOD1 protein structure is predicted with reasonable confidence. However, the very low ipTM score suggests weak or negligible interaction between the peptide and SOD1.

Visualization of the predicted complex shows that the peptide is loosely positioned on the surface of the protein and does not form a clear binding interface. The peptide does not appear to localize near the N-terminal region where the A4V mutation occurs. Additionally, it does not penetrate the β-barrel core or interact with the dimer interface of the protein.

This result suggests that the PepMLM-generated peptide is unlikely to bind strongly to mutant SOD1.

  2. Control Peptide Evaluation

Control Sequence

FLYRWLPSRRGG

[image]

AlphaFold Scores

Metric | Value
ipTM | 0.32
pTM | 0.82

Structural Observation

The AlphaFold prediction for the control peptide produced an ipTM score of 0.32 and a pTM score of 0.82. The relatively high pTM value indicates that the overall SOD1 protein structure was predicted with high confidence, consistent with its known β-barrel fold.

However, the ipTM score remains relatively low, suggesting weak or unreliable interaction between the peptide and SOD1. Visualization of the predicted complex shows that the peptide is positioned along the outer surface of the protein rather than occupying a well-defined binding pocket.

The peptide does not localize near the N-terminal region containing the A4V mutation and does not strongly engage the β-barrel core or the dimer interface. Instead, the peptide remains largely surface-bound, suggesting that the interaction may be nonspecific or transient.

  3. Summary of AlphaFold Results

Peptide | Sequence | ipTM | Binding Observation
PepMLM peptide | WRYYAVAAAHKE | 0.26 | Peptide appears loosely positioned on the surface of SOD1 and does not form a well-defined binding interface. It does not localize near the A4V mutation site.
Control peptide | FLYRWLPSRRGG | 0.32 | Peptide remains surface-bound and does not strongly interact with the β-barrel core or dimer interface.

  4. Binding Site Analysis

Region | Observation
N-terminus (A4V site) | Peptide does not bind near this region
β-barrel core | Peptide does not penetrate the barrel
Dimer interface | Peptide does not appear positioned between monomers
Protein surface | Peptide appears loosely surface-bound

  5. Final Interpretation

The AlphaFold predictions produced relatively low ipTM scores for both peptides, indicating weak predicted interactions with the SOD1 protein. The PepMLM-generated peptide (WRYYAVAAAHKE) showed an ipTM value of 0.26, suggesting very little confidence in a stable binding interface. The control peptide (FLYRWLPSRRGG) produced a slightly higher ipTM value of 0.32, but this is still well below the range usually taken to indicate a reliable interface: values above about 0.8 are generally considered confident, while values below about 0.6 suggest a failed interface prediction.

Visualization of the predicted complexes shows that both peptides remain largely surface-bound and do not interact strongly with the N-terminal A4V mutation site, the β-barrel core, or the dimer interface. None of the PepMLM-generated peptides matched or exceeded the predicted binding strength of the control peptide, and both peptides appear to form weak and nonspecific interactions with SOD1.
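The ipTM comparison can be expressed as a small rule of thumb. The sketch below is illustrative only: the cutoffs of 0.6 and 0.8 follow commonly cited guidance for interpreting ipTM and are treated here as assumptions, and `classify_iptm` is a hypothetical helper name.

```python
def classify_iptm(iptm):
    """Map an ipTM score to a coarse confidence label (assumed cutoffs: 0.6, 0.8)."""
    if iptm >= 0.8:
        return "confident"
    if iptm >= 0.6:
        return "uncertain"
    return "likely failed"


# ipTM values reported above for the two complexes.
iptm_results = {
    "WRYYAVAAAHKE": 0.26,  # PepMLM peptide (X replaced with E)
    "FLYRWLPSRRGG": 0.32,  # control peptide
}

iptm_labels = {peptide: classify_iptm(v) for peptide, v in iptm_results.items()}
for peptide, label in iptm_labels.items():
    print(peptide, iptm_results[peptide], label)
```

Both peptides fall in the lowest band, so the difference between 0.26 and 0.32 should not be read as a meaningful ranking.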

Highlighting the N-terminus Region

To further examine the predicted binding location, the N-terminal region of the SOD1 protein, which contains the A4V mutation, was highlighted in the AlphaFold structure. This visualization allowed for direct observation of whether the peptide interacts with or binds near this mutation site.

Upon inspection of the predicted complex, the peptide does not localize near the N-terminal region and does not appear to form interactions with residues surrounding the A4V mutation. Instead, the peptide remains positioned on the outer surface of the protein, away from the mutation site. This observation suggests that the peptide is unlikely to specifically target the A4V region of the mutant SOD1 protein.

[image]

Highlighting the Control Peptide Sequence

[image]

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

The therapeutic properties of the generated peptides were analyzed using the PeptiVerse platform.

[image]

Results obtained:

[image]

Therapeutic Property Evaluation Using PeptiVerse

The peptide WRYYAVAAAHKE was further analyzed using PeptiVerse to evaluate its potential therapeutic properties. The peptide sequence and the A4V mutant SOD1 sequence were provided as inputs, and several relevant properties were predicted.

Predicted Peptide Properties

Property | Predicted Value
Solubility Probability | 1.00
Hemolysis Probability | 0.018
Net Charge (pH 7) | 0.85
Molecular Weight | 1464.6 Da
GRAVY Hydrophobicity | −0.60
Cell Permeability | 0.494
Estimated Half-Life | ~0.46 hours

The peptide is predicted to be highly soluble, which is a desirable property for therapeutic peptides. It also shows a very low hemolysis probability, suggesting that it is unlikely to damage red blood cells. The moderate molecular weight and near-neutral net charge may support reasonable biological compatibility.

The GRAVY hydrophobicity score of −0.60 indicates that the peptide is relatively hydrophilic, which aligns with the predicted high solubility. However, the predicted cell permeability is moderate, and the estimated half-life of approximately 0.46 hours suggests limited stability in biological environments.
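Two of these values can be checked directly from the sequence. The sketch below is an independent calculation, not PeptiVerse's own code: it assumes GRAVY is the mean Kyte-Doolittle hydropathy and that molecular weight is the sum of average residue masses plus one water. The function names are illustrative.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def molecular_weight(seq):
    """Average molecular weight of a linear, unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


peptide = "WRYYAVAAAHKE"
peptide_gravy = gravy(peptide)
peptide_mw = molecular_weight(peptide)
print(f"GRAVY: {peptide_gravy:.2f}")
print(f"Molecular weight: {peptide_mw:.1f} Da")
```

Under these assumptions the calculation gives a GRAVY of −0.60 and a molecular weight of about 1464.6 Da, in agreement with the reported values.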

Comparison of Structural and Therapeutic Predictions

The structural predictions and the therapeutic property analysis address different aspects of the peptide, and they do not contradict each other. The low ipTM value from AlphaFold3 indicates weak predicted binding between the peptide and SOD1, and the structural visualization supports this by showing a surface-bound peptide without a well-defined binding interface.

Although the peptide does not demonstrate strong predicted binding affinity, it does not exhibit problematic therapeutic properties, such as high hemolysis risk or poor solubility, which are common limitations in peptide drug candidates.

Peptide Selection for Advancement

WRYYAVAAAHKE represents a reasonable peptide candidate to advance for further study. While its predicted binding strength to SOD1 is relatively weak, it demonstrates favorable therapeutic characteristics, including high solubility, low hemolysis probability, and acceptable physicochemical properties.

Future optimization approaches, such as targeted peptide redesign or guided peptide generation methods, could potentially improve binding affinity while preserving these favorable therapeutic traits.

Part 4: Generate Optimized Peptides with moPPIt

The mutant sequence was used as the target to generate the optimized peptides:

[image]

The motif positions were set to residues 1–10 during peptide generation. Additionally, only three optimization properties were selected in the notebook because the computation was performed on a T4 GPU in Google Colab, which has limited computational resources. Reducing the number of selected properties helped ensure that the notebook ran efficiently within the available GPU memory and runtime constraints.

The notebook took more than 40 minutes to run.

[image]

moPPIt Generated Peptides

The model generated three candidate peptides with predicted values for solubility, binding affinity, and motif score.

Binder | Solubility | Predicted Affinity | Motif Score
YNQKYSQCKYAC | 0.9167 | 6.42 | 0.68
IKYINQKLKELR | 0.6667 | 7.18 | 0.75
QDDKSEEEEDGQ | 1.00 | 4.70 | 0.34

Comparison of moPPIt Peptides vs PepMLM Peptide

The moPPIt binder predictions produced the three peptide candidates listed above, with varying physicochemical and predicted binding properties.

For comparison, the PepMLM-generated peptide (WRYYAVAAAHKE) evaluated earlier showed:

  1. Excellent solubility (1.0)

  2. Very low hemolysis probability (0.018), indicating favorable therapeutic safety

However, AlphaFold3 predicted weak structural binding with an ipTM ≈ 0.26, suggesting low confidence in stable interaction with the SOD1 A4V protein.

In contrast, the moPPIt peptides have predicted binding affinity scores of 4.70–7.18. These scores are not on the same scale as ipTM, and no affinity score from the same predictor is reported here for the PepMLM peptide, so the comparison is indirect. The scores nonetheless suggest stronger potential interaction with the target protein. The moPPIt peptides also vary more in solubility. For example, IKYINQKLKELR shows only moderate solubility (0.67), which could affect therapeutic delivery.

The moPPIt peptides appear optimized for binding affinity, whereas the PepMLM peptide, which was not explicitly optimized for any property, happens to show favorable therapeutic properties such as solubility and safety.
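The trade-off among the three moPPIt candidates can be checked with a simple Pareto comparison. This sketch is an illustration and not how moPPIt itself ranks peptides. It assumes that higher is better for all three reported scores, and the helper name `dominates` is hypothetical.

```python
# (solubility, predicted affinity, motif score) from the moPPIt results table.
candidates = {
    "YNQKYSQCKYAC": (0.9167, 6.42, 0.68),
    "IKYINQKLKELR": (0.6667, 7.18, 0.75),
    "QDDKSEEEEDGQ": (1.00, 4.70, 0.34),
}


def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# A candidate is on the Pareto front if no other candidate dominates it.
pareto_front = [
    name
    for name, scores in candidates.items()
    if not any(
        dominates(other_scores, scores)
        for other, other_scores in candidates.items()
        if other != name
    )
]

print(pareto_front)
```

None of the three peptides is better than another on every score, so all three sit on the Pareto front. Choosing among them therefore depends on how solubility is weighed against predicted affinity and motif score.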

Evaluation Before Clinical Advancement

Before advancing any of these peptides to clinical studies, several additional evaluations would be necessary.

  1. Structural Validation

Further structural analysis should be performed using tools such as AlphaFold3 or molecular docking to confirm the predicted binding interface with the A4V mutant SOD1 protein. This would help determine whether the peptide binds near the N-terminal A4V mutation site, the β-barrel region, or the dimer interface.

  2. Binding Affinity Testing

Experimental assays such as surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) should be performed to measure the actual binding strength between the peptide and the SOD1 protein.

  3. Stability and Pharmacokinetics

Peptides should be evaluated for serum stability and biological half-life. Additional studies should assess protease resistance and degradation rates to determine whether the peptide remains stable in physiological conditions.

  4. Toxicity and Safety

Safety evaluation is essential before clinical use. Experiments should test hemolysis, cytotoxicity, and potential immunogenic responses in relevant cell culture models.

  5. Functional Assays

Functional assays should determine whether the peptide can reduce aggregation or toxicity of mutant SOD1, which is an important mechanism in ALS therapeutic development.

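Interpretation

The moPPIt peptides demonstrate stronger predicted binding affinity, particularly IKYINQKLKELR, which shows the highest affinity and motif score among the generated candidates. The PepMLM peptide, in turn, has the most complete favorable property profile so far, with a predicted solubility of 1.0 and a low hemolysis probability. Hemolysis was not among the three properties selected for the moPPIt run.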

An ideal therapeutic peptide would balance strong binding affinity with favorable physicochemical and safety properties. Therefore, further computational validation and experimental testing would be required to determine which peptide candidate provides the best overall balance of binding performance, stability, and therapeutic safety.

Visualization of moPPIt Peptides

  1. YNQKYSQCKYAC
[image]
  2. IKYINQKLKELR
[image]
  3. QDDKSEEEEDGQ
[image]

FINAL GROUP PROJECT: Phage Lysis Protein Design Challenge

  1. Introduction

Bacteriophage lysis proteins are responsible for disrupting the host bacterial membrane during phage infection, allowing the release of viral particles. The MS2 lysis protein is a small membrane-associated protein composed of 75 amino acids and contains two major functional regions:

Domain | Residues | Function
Soluble domain | 1–40 | Interaction with host chaperone protein DnaJ
Transmembrane helix | 41–75 | Membrane insertion and pore formation

Lysis Protein Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
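The two regions can be separated and compared programmatically. The sketch below uses the 1–40 and 41–75 boundaries from the domain table. The set of residues counted as hydrophobic (`HYDROPHOBIC`) is a simplifying assumption chosen for illustration.

```python
MS2_LYSIS = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)

# Residues treated as hydrophobic in this sketch (an illustrative choice).
HYDROPHOBIC = set("AVILMFWC")

# Residues 1-40 (soluble) and 41-75 (transmembrane), using 0-based slicing.
soluble_region = MS2_LYSIS[:40]
transmembrane_region = MS2_LYSIS[40:]


def hydrophobic_fraction(seq):
    """Fraction of residues that belong to the HYDROPHOBIC set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


soluble_fraction = hydrophobic_fraction(soluble_region)
transmembrane_fraction = hydrophobic_fraction(transmembrane_region)

print(f"soluble (1-40):        {soluble_fraction:.2f}")
print(f"transmembrane (41-75): {transmembrane_fraction:.2f}")
```

The transmembrane segment has a markedly higher hydrophobic fraction than the soluble segment, which is what the domain assignment would predict.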

Design Objective

Design five mutations in the lysis protein:

  1. 2 mutations in the soluble region

  2. 2 mutations in the transmembrane region

  3. 1 mutation anywhere in the sequence

These mutations should preserve protein function while potentially improving stability or membrane activity.

  2. Evolutionary Analysis

2.1 Protein BLAST

Homologous sequences for the MS2 lysis protein were obtained using Protein BLAST.

The sequences were downloaded in FASTA format and used for multiple sequence alignment.

[image]

2.2 Multiple Sequence Alignment

Multiple sequence alignment was performed using Clustal Omega.

Tool used:

https://www.ebi.ac.uk/jdispatcher/msa/clustalo

Homologous sequences used

  • WP_434006754.1

  • WP_434006752.1

  • SNQ28029.1

  • ACN90570.1

  • AAF19634.1

  • ACN90183.1

  • ACN90501.1

  • ACN90441.1

  • ACN90250.1

These sequences represent related phage lysis proteins.

After running BLAST, the FASTA (cluster) file was downloaded:

[image]

  3. Conservation Analysis

Clustal Omega indicates conservation using the following symbols:

Symbol | Meaning
* | Fully conserved residue
: | Strongly conserved
. | Weakly conserved

Example conservation pattern:

** *  :***:**.  ** ***: ****** ** **
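How the fully conserved marker arises can be shown on a toy alignment. The sketch below is a simplification and not Clustal Omega's implementation: it marks only fully conserved columns with `*` and leaves all other columns blank, omitting the `:` and `.` classes, which depend on substitution-group definitions. The three aligned fragments in `toy_alignment` are invented for illustration.

```python
# Hypothetical aligned fragments of equal length ("-" marks a gap).
toy_alignment = [
    "METRFPQQ",
    "METRYPQK",
    "METKFPQ-",
]


def conservation_line(alignment):
    """Return '*' for columns where every sequence has the same residue."""
    marks = []
    for column in zip(*alignment):  # iterate over alignment columns
        residues = set(column)
        fully_conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)


conservation = conservation_line(toy_alignment)
for row in toy_alignment:
    print(row)
print(conservation)
```

Columns 4, 5, and 8 differ between the toy sequences, so they are left unmarked. In a real alignment these would be the first places to look for mutable positions.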

Key Conserved Motifs

Highly conserved motifs observed in the alignment include:

METRFPQQSQQTPAST
PCRRQQRSSTLY

These residues are likely essential for structural stability or host protein interaction, particularly with DnaJ.

Therefore, fully conserved residues should not be mutated.

[image]

  4. Variable Regions

Regions showing substitutions or alignment gaps indicate evolutionary variability.

Example variable region:

RYRRPRGSNTGKEYRLKKFCRNI

Variation is also observed in the C-terminal region, where some sequences contain truncations or insertions.

Implication

Variable regions are better candidates for mutational engineering because they are less likely to disrupt protein function.

  5. Domain Analysis

The MS2 lysis protein contains two main structural regions:

Region | Residues | Function
Soluble domain | 1–40 | Interaction with DnaJ
Transmembrane domain | 41–75 | Membrane insertion and pore formation

  6. Soluble Region Conservation

The N-terminal soluble domain shows high conservation across homologous sequences.

Example conserved sequence:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLY

Mutations in this region must therefore be chosen carefully.

Candidate mutation sites

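Position | Residue | Reason
12 | Q | Weakly conserved
17 | N | Variable among homologs
26 | Y | Moderate variability

These positions may tolerate substitutions without disrupting protein folding. The residue labels should be re-checked against the alignment: in the lysis protein sequence listed earlier, position 12 is T and position 26 is D, while the nearest Q and Y are at positions 11 and 27.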

  7. Transmembrane Region Conservation

The C-terminal region forms a transmembrane helix.

Example sequence:

LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

This region is highly hydrophobic, which is required for membrane insertion.

However, conservative substitutions between hydrophobic residues may be tolerated.

Candidate mutation sites

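Position | Residue | Reason
52 | L | Hydrophobic substitution possible
55 | I | Minor hydrophobic change
59 | V | Frequently mutated experimentally

As with the soluble-region candidates, the residue labels should be re-checked: in the lysis protein sequence listed earlier, position 52 is T and positions 55 and 59 are both L.

  8. Key Observations from Alignment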
  • The N-terminal region is highly conserved, indicating functional importance in host interaction.

  • Some residues in the soluble domain show moderate variability.

  • The transmembrane region remains hydrophobic but allows conservative substitutions.

  • Some homologous proteins exhibit C-terminal truncations, suggesting structural flexibility in this region.

  9. Mutation Design Strategy

The mutation design followed several biological constraints:

Rules applied

  • Avoid fully conserved residues

  • Prefer weakly conserved or variable residues

  • Maintain hydrophobicity in transmembrane helices

  • Preserve overall protein folding and stability

  10. Mutational Scoring Using Protein Language Models

Mutation effects are predicted using protein language models, such as:

  • ESM-1b

  • MSA Transformer

  • ProteinBERT

Mutation scoring used log-likelihood ratio (LLR) values.

LLR Interpretation

Score | Interpretation
> 2 | Very favorable
1–2 | Moderately favorable
0–1 | Weakly favorable
< 0 | Unfavorable
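The LLR of a substitution is the log-probability the model assigns to the mutant residue minus the log-probability it assigns to the wild-type residue at the same position. The sketch below shows only that arithmetic and the interpretation bands from the table. The probabilities are made up, no language model is called, and the handling of scores exactly at 0, 1, or 2 is an assumption because the table does not specify it.

```python
import math


def log_likelihood_ratio(p_mutant, p_wildtype):
    """LLR = log p(mutant residue) - log p(wild-type residue) at one position."""
    return math.log(p_mutant) - math.log(p_wildtype)


def interpret_llr(score):
    """Map an LLR score to the interpretation bands in the table above."""
    if score > 2:
        return "Very favorable"
    if score >= 1:
        return "Moderately favorable"
    if score >= 0:
        return "Weakly favorable"
    return "Unfavorable"


# Hypothetical model probabilities at one position (not real ESM output).
example_llr = log_likelihood_ratio(p_mutant=0.26, p_wildtype=0.02)
print(f"LLR = {example_llr:.2f} -> {interpret_llr(example_llr)}")
```

A positive LLR means the model finds the mutant residue more plausible in that sequence context than the wild-type residue. It is a plausibility score, not a direct measurement of stability or function.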

The following image shows the results obtained using the Protein Language Models (ESM).ipynb notebook.

[image]

  11. Top Ranked Mutations

Position | WT | Mutation | LLR Score
50 | K | L | 2.56
29 | C | R | 2.39
39 | Y | L | 2.24
29 | C | S | 2.04
9 | S | Q | 2.01
53 | N | L | 1.86
52 | T | L | 1.81
61 | E | L | 1.81

Many of the favorable mutations convert residues to Leucine (L). For positions in the transmembrane region, this is plausibly because leucine is strongly hydrophobic and is common in membrane helices.

  12. Mapping Mutations to Protein Regions

Soluble Region (1–40)

Mutation | Score
C29R | 2.39
C29S | 2.04
S9Q | 2.01
Y39L | 2.24
F5Q | 1.79

Transmembrane Region (41–75)

Mutation | Score
K50L | 2.56
T52L | 1.81
N53L | 1.86
E61L | 1.81
A45L | 1.53

  13. Biological Filtering

Risky mutations were removed using biological constraints.

Mutations excluded

  • C29R
  • C29S

Reason: cysteine residues may form structural interactions.

Safer alternatives

  • Y39L
  • S9Q
  • F5Q
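The filtering and selection steps can be sketched as a small greedy procedure. This is one possible formalization, not necessarily how the selection was actually made. It drops mutations at cysteine positions, then fills two soluble slots, two transmembrane slots, and one free slot in order of LLR score. The 40/41 region boundary comes from the domain table, and the variable names are illustrative.

```python
# (wild-type residue, position, mutant residue, LLR score) from the tables above.
ranked_mutations = [
    ("K", 50, "L", 2.56), ("C", 29, "R", 2.39), ("Y", 39, "L", 2.24),
    ("C", 29, "S", 2.04), ("S", 9, "Q", 2.01), ("N", 53, "L", 1.86),
    ("T", 52, "L", 1.81), ("E", 61, "L", 1.81), ("F", 5, "Q", 1.79),
    ("A", 45, "L", 1.53),
]


def region(position):
    """Assign a position to the soluble (1-40) or transmembrane (41-75) region."""
    return "soluble" if position <= 40 else "transmembrane"


# Biological filter: exclude substitutions that replace a cysteine.
filtered = [m for m in ranked_mutations if m[0] != "C"]
by_score = sorted(filtered, key=lambda m: m[3], reverse=True)

selected, used_positions = [], set()
quota = {"soluble": 2, "transmembrane": 2}

# Fill the per-region quotas with the highest-scoring mutations.
for wt, pos, mut, score in by_score:
    if quota[region(pos)] > 0 and pos not in used_positions:
        selected.append((wt, pos, mut, score))
        used_positions.add(pos)
        quota[region(pos)] -= 1

# Fill the free slot with the best remaining mutation from any region.
for wt, pos, mut, score in by_score:
    if pos not in used_positions:
        selected.append((wt, pos, mut, score))
        used_positions.add(pos)
        break

selected_names = [f"{wt}{pos}{mut}" for wt, pos, mut, _ in selected]
print(selected_names)
```

T52L and E61L tie at 1.81 for the free slot. The sketch keeps the one listed first in the ranking, so a different tie-break would change the fifth mutation.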
  14. Final Selected Mutations

Mutation | Region | LLR Score
S9Q | Soluble | 2.01
Y39L | Soluble | 2.24
K50L | Transmembrane | 2.56
T52L | Transmembrane | 1.81
N53L | Anywhere (free choice; lies in the transmembrane region) | 1.86

  15. Mutated Protein Sequence

Original Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Mutated Sequence

METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSLFLLQLLLSLLEAVIRTVTTLQQLLT

Mutations applied:

  • S9Q
  • Y39L
  • K50L
  • T52L
  • N53L
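Applying the substitutions in code avoids hand-editing errors such as an accidentally dropped residue. The sketch below is a minimal illustration: `apply_mutations` is a hypothetical helper that checks each wild-type residue before substituting and then reports which positions changed.

```python
import re

WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)
MUTATIONS = ["S9Q", "Y39L", "K50L", "T52L", "N53L"]


def apply_mutations(seq, mutations):
    """Apply substitutions written as <WT><position><mutant>, e.g. 'S9Q'."""
    residues = list(seq)
    for name in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
        if match is None:
            raise ValueError(f"cannot parse mutation {name!r}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if residues[pos - 1] != wt:
            raise ValueError(f"{name}: found {residues[pos - 1]} at position {pos}")
        residues[pos - 1] = mut  # substitution keeps the length unchanged
    return "".join(residues)


mutant = apply_mutations(WILD_TYPE, MUTATIONS)

# 1-based positions where the mutant differs from the wild type.
changed_positions = [
    i + 1 for i, (a, b) in enumerate(zip(WILD_TYPE, mutant)) if a != b
]

print(mutant)
print(len(mutant), changed_positions)
```

Because every change is a substitution, the mutant must remain 75 residues long and differ from the wild type at exactly positions 9, 39, 50, 52, and 53.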
  16. Comparison with Experimental Data

Experimental data support mutational tolerance at several selected positions.

Mutation | Position | Evidence | Interpretation
S9Q | 9 | No experimental mutation reported | Likely tolerant
Y39L | 39 | Y→H mutation reported | Position mutable
K50L | 50 | Multiple substitutions observed | Highly tolerant
T52L | 52 | Mutation recorded | Mutation tolerated
N53L | 53 | Several variants reported | Flexible boundary residue

These results support the predicted soluble and membrane domain boundaries.

  17. Structural Prediction Using AlphaFold

The mutated sequence was modeled using AlphaFold Multimer.

Several attempts were needed to obtain a PDB file. Initially, an eight-chain oligomer model was submitted for prediction, but the session crashed during the run because of the high computational load. After the input was adjusted and the analysis rerun, a prediction completed successfully, and the resulting outputs are documented below.

[image]

Interpretation

The AlphaFold Multimer predictions were run with several models, seeds, and recycling steps to evaluate the structural stability of the designed protein complex. Across all runs, the predicted local distance difference test (pLDDT) values ranged from approximately 32 to 40. Values below 50 are conventionally treated as very low confidence, which is common for small membrane-associated proteins and flexible regions. The pTM scores were generally between 0.19 and 0.31, and the ipTM scores ranged from about 0.13 to 0.27, so confidence in the inter-chain arrangement is also low.

Model 2 with seed 001 produced the highest scores (pLDDT ≈ 40.3, pTM ≈ 0.312, ipTM ≈ 0.275) and is the best of the tested configurations, although it remains in the low-confidence range. Most models converged after 6 recycling iterations, with total runtimes of approximately 258–323 seconds per model. Given these scores, the predicted fold and interaction patterns should be treated as tentative and used only for preliminary structural analysis.
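The pLDDT values can be placed into the confidence bands commonly used when reading AlphaFold models. The sketch below assumes the widely quoted band edges of 50, 70, and 90. `plddt_band` is a hypothetical helper, and the treatment of values exactly on a band edge is an arbitrary choice.

```python
def plddt_band(plddt):
    """Map a pLDDT value (0-100) to a coarse confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "high"
    if plddt > 50:
        return "low"
    return "very low"


# Approximate range reported for the lysis protein runs, plus the best model.
reported_values = {"lowest run": 32.0, "best run (model 2, seed 001)": 40.3}
reported_bands = {name: plddt_band(v) for name, v in reported_values.items()}

for name, band in reported_bands.items():
    print(f"{name}: pLDDT {reported_values[name]} -> {band}")
```

Both ends of the reported range fall in the lowest band, so per-residue details of these models should be interpreted with caution.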

To improve the prediction results, the analysis was repeated using a different input configuration. Instead of the eight-chain oligomer model, which had caused the session to crash, a four-chain oligomer setup was used. This reduced the computational load and allowed the prediction to run successfully, producing structural outputs for further analysis.

Results obtained:

[image]

  18. Co-Folding Analysis

The mutated lysis protein sequence was further analyzed using co-folding simulations with additional protein sequences to investigate potential protein–protein interactions.

Structural visualization tools such as Discovery Studio were used to examine key structural and interaction features, including:

  • Hydrogen bonding patterns
  • Protein–protein interface interactions
  • Membrane insertion orientation

Co-folding simulations were performed using both the AlphaFold Multimer v3 notebook and the AlphaFold Server to compare prediction consistency and interaction confidence across different platforms.

The results obtained from the AlphaFold Server are summarized as follows:

[image]

  19. Conclusion

This study applied evolutionary analysis, protein language models, and structural prediction to design mutations in the MS2 lysis protein.

Key findings:

  • The N-terminal region is highly conserved and involved in host interaction.

  • The C-terminal region forms a hydrophobic transmembrane helix.

  • Protein language model scoring identified favorable mutations.

  • Biological filtering ensured structural compatibility.

Final designed mutations

  • S9Q
  • Y39L
  • K50L
  • T52L
  • N53L
[image]