Week 5 HW: Protein Design Part II

Protein Design II

SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

Retrieval of SOD1 Sequence

The human Superoxide Dismutase 1 (SOD1) protein sequence was retrieved from UniProt (Accession P00441).

Wild-type sequence (full length, 154 residues):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Introduction of the A4V Mutation

The classical ALS mutation A4V replaces Alanine (A) with Valine (V) near the N-terminus.

However, the UniProt sequence, which includes the initiator methionine, begins as follows:

Position	Residue
1	M
2	A
3	T
4	K
5	A
6	V

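Residue 4 of the UniProt sequence is therefore lysine, not alanine. The name A4V follows the numbering of the mature protein, in which the initiator methionine has been removed, so Ala4 corresponds to position 5 of the UniProt sequence. The mutation was applied at that position: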

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

This substitution increases hydrophobicity near the N-terminus and is known to destabilize SOD1, promoting aggregation associated with aggressive familial ALS.
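The numbering offset can be made explicit in code. The sketch below is illustrative only: the function name `apply_mature_numbered_substitution` and its `offset` parameter are hypothetical, and the default offset of 1 assumes that the initiator methionine is the only difference between UniProt numbering and mature-protein numbering.

```python
# UniProt P00441 sequence as quoted above (includes the initiator methionine).
SOD1_WT = "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"


def apply_mature_numbered_substitution(precursor, wt, mature_pos, mut, offset=1):
    """Apply a substitution named in mature-protein numbering to a precursor sequence.

    offset is the number of N-terminal residues that the mature numbering skips
    (assumed to be 1 here, for the removed initiator Met).
    """
    index = mature_pos + offset - 1  # convert to a 0-based index in the precursor
    if precursor[index] != wt:
        raise ValueError(
            f"expected {wt} at precursor position {index + 1}, found {precursor[index]}"
        )
    return precursor[:index] + mut + precursor[index + 1:]


SOD1_A4V = apply_mature_numbered_substitution(SOD1_WT, "A", 4, "V")
print(SOD1_A4V[:10])  # MATKVVCVLK
```

The wild-type check guards against the off-by-one error: calling the function with `offset=0` raises an error because position 4 of the UniProt sequence is K, not A.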

Peptide Generation with PepMLM

In the PepMLM-650M Colab notebook, the mutant SOD1 sequence served as the conditioning context to generate four peptides of length 12 amino acids.

When the notebook was run, the generation step returned the same sequence, WRYYAVAAAHKX, for all four peptides. The most likely cause is deterministic (greedy) decoding, in which the model selects the highest-probability amino acid at each position. Because the conditioning sequence (the A4V mutant SOD1) and the generation settings were identical for every run, each run reproduced the same peptide instead of sampling diverse sequences. The trailing “X” indicates that the model predicted an unknown-residue token at the final position. All four generated peptides were therefore identical, and the control peptide FLYRWLPSRRGG was added separately for comparison, as required by the assignment.
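The difference between deterministic and sampled decoding can be illustrated with a toy example. The per-position probabilities below are invented for illustration and are not PepMLM outputs; `greedy_decode` and `sample_decode` are hypothetical helper names, not functions from the PepMLM notebook.

```python
import random

# Hypothetical per-position residue probabilities (NOT real PepMLM scores).
TOY_DISTRIBUTIONS = [
    {"W": 0.56, "F": 0.30, "Y": 0.14},
    {"R": 0.40, "K": 0.35, "S": 0.25},
    {"Y": 0.46, "F": 0.34, "H": 0.20},
    {"A": 0.50, "V": 0.30, "L": 0.20},
]


def greedy_decode(distributions):
    """Pick the most probable residue at every position (deterministic)."""
    return "".join(max(d, key=d.get) for d in distributions)


def sample_decode(distributions, rng, temperature=1.0):
    """Draw one residue per position; higher temperature gives more diversity."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peptide = []
    for d in distributions:
        residues = list(d)
        # p ** (1 / T) is equivalent to dividing the logits by T before softmax;
        # random.choices normalizes the weights itself.
        weights = [d[r] ** (1.0 / temperature) for r in residues]
        peptide.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(peptide)


rng = random.Random(0)
print([greedy_decode(TOY_DISTRIBUTIONS) for _ in range(4)])       # four identical peptides
print([sample_decode(TOY_DISTRIBUTIONS, rng) for _ in range(4)])  # peptides may differ
```

Greedy decoding returns the same string on every call because nothing in it is random, which matches the behavior observed in the notebook. Sampling draws from the same distributions and can return different peptides from one call to the next.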

Snapshot of the output (of a particular section, not all)

The final generated peptides and control sequence are as follows:

Peptide	Sequence
Pep1	WRYYAVAAAHKX
Pep2	WRYYAVAAAHKX
Pep3	WRYYAVAAAHKX
Pep4	WRYYAVAAAHKX
Control	FLYRWLPSRRGG

PepMLM Token Prediction Scores:

Position	Amino Acid	Score
1	W	0.562357
2	R	0.230632
3	Y	0.458953
4	Y	0.257805
5	A	0.329096
6	V	0.214972
7	A	0.337871
8	A	0.136613
9	A	0.123724
10	H	0.186813
11	K	0.268938
12	X	0.243224

Part 2: Evaluate Binders with AlphaFold3

Submission to AlphaFold Server

The mutant A4V SOD1 FASTA sequence was submitted to the AlphaFold Server. For each test, the SOD1 mutant sequence was entered as the first chain, followed by the peptide sequence as the second chain to model the protein–peptide complex.

The following image shows the submission of SOD1 mutant sequence to the AlphaFold Server:

The result generated through this submission is as follows:

Peptide 1 Evaluation

Original PepMLM Sequence

WRYYAVAAAHKX

Because X represents an unknown amino acid, it was replaced with E (Glutamic acid) before submission to AlphaFold:

Final peptide used:

WRYYAVAAAHKE
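A small helper can make this substitution reproducible. This is a sketch, not part of the PepMLM or AlphaFold tooling: `replace_noncanonical` is a hypothetical name, and the default replacement "E" simply mirrors the choice made above.

```python
CANONICAL_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def replace_noncanonical(peptide, replacement="E"):
    """Replace every residue outside the 20 standard amino acids."""
    if replacement not in CANONICAL_RESIDUES:
        raise ValueError("replacement must be a standard amino acid")
    return "".join(
        residue if residue in CANONICAL_RESIDUES else replacement
        for residue in peptide.upper()
    )


print(replace_noncanonical("WRYYAVAAAHKX"))  # WRYYAVAAAHKE
```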

AlphaFold Scores

Metric	Value
ipTM	0.26
pTM	0.71

Structural Observation

The AlphaFold prediction produced an ipTM score of 0.26 and a pTM score of 0.71. The pTM value indicates that the overall SOD1 protein structure is predicted with reasonable confidence. However, the very low ipTM score suggests weak or negligible interaction between the peptide and SOD1.

Visualization of the predicted complex shows that the peptide is loosely positioned on the surface of the protein and does not form a clear binding interface. The peptide does not appear to localize near the N-terminal region where the A4V mutation occurs. Additionally, it does not penetrate the β-barrel core or interact with the dimer interface of the protein.

This result suggests that the PepMLM-generated peptide is unlikely to bind strongly to mutant SOD1.

Control Peptide Evaluation

Control Sequence

FLYRWLPSRRGG

AlphaFold Scores

Metric	Value
ipTM	0.32
pTM	0.82

Structural Observation

The AlphaFold prediction for the control peptide produced an ipTM score of 0.32 and a pTM score of 0.82. The relatively high pTM value indicates that the overall SOD1 protein structure was predicted with high confidence, consistent with its known β-barrel fold.

However, the ipTM score remains relatively low, suggesting weak or unreliable interaction between the peptide and SOD1. Visualization of the predicted complex shows that the peptide is positioned along the outer surface of the protein rather than forming a well-defined binding pocket.

The peptide does not localize near the N-terminal region containing the A4V mutation and does not strongly engage the β-barrel core or the dimer interface. Instead, the peptide remains largely surface-bound, suggesting that the interaction may be nonspecific or transient.

Summary of AlphaFold Results

Peptide	Sequence	ipTM	Binding Observation
PepMLM peptide	WRYYAVAAAHKE	0.26	Peptide appears loosely positioned on the surface of SOD1 and does not form a well-defined binding interface. It does not localize near the A4V mutation site.
Control peptide	FLYRWLPSRRGG	0.32	Peptide remains surface-bound and does not strongly interact with the β-barrel core or dimer interface.

Binding Site Analysis

Region	Observation
N-terminus (A4V site)	Peptide does not bind near this region
β-barrel core	Peptide does not penetrate the barrel
Dimer interface	Peptide does not appear positioned between monomers
Protein surface	Peptide appears loosely surface-bound

Final Interpretation

The AlphaFold predictions produced low ipTM scores for both peptides, indicating weak predicted interactions with SOD1. The PepMLM-generated peptide (WRYYAVAAAHKE) had an ipTM of 0.26, giving very little confidence in a stable binding interface. The control peptide (FLYRWLPSRRGG) scored slightly higher at 0.32, but both values are far below the range usually treated as reliable: AlphaFold Server guidance regards ipTM above 0.8 as confident and below 0.6 as a likely failed interface prediction.

Visualization of the predicted complexes shows that both peptides remain largely surface-bound and do not interact strongly with the N-terminal A4V mutation site, the β-barrel core, or the dimer interface. Because all four PepMLM outputs were the same sequence, only one generated peptide could be tested, and it did not match or exceed the ipTM of the control peptide. Both peptides appear to form weak, nonspecific interactions with SOD1.
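The ipTM comparison can be written as a simple classification. The sketch below assumes cut-offs of 0.8 and 0.6, taken from general AlphaFold Server guidance rather than from this assignment; the function name `classify_iptm` and the category labels are hypothetical.

```python
def classify_iptm(iptm, confident=0.8, failed=0.6):
    """Bucket an ipTM score using assumed cut-offs (0.8 and 0.6 by default)."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM must lie between 0 and 1")
    if iptm > confident:
        return "confident interface"
    if iptm < failed:
        return "likely failed interface prediction"
    return "grey zone"


# ipTM values reported in the summary table above.
RESULTS = {"WRYYAVAAAHKE": 0.26, "FLYRWLPSRRGG": 0.32}

for peptide, iptm in RESULTS.items():
    print(peptide, iptm, classify_iptm(iptm))
```

Both peptides fall into the lowest category under these cut-offs, so the 0.06 difference between them should not be read as evidence that the control binds better.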

Highlighting the N-terminus Region

To further examine the predicted binding location, the N-terminal region of the SOD1 protein, which contains the A4V mutation, was highlighted in the AlphaFold structure. This visualization allowed for direct observation of whether the peptide interacts with or binds near this mutation site.

Upon inspection of the predicted complex, the peptide does not localize near the N-terminal region and does not appear to form interactions with residues surrounding the A4V mutation. Instead, the peptide remains positioned on the outer surface of the protein, away from the mutation site. This observation suggests that the peptide is unlikely to specifically target the A4V region of the mutant SOD1 protein.

Highlighting the Control Peptide Sequence

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

The therapeutic properties of the generated peptides were analyzed using the PeptiVerse platform.

Results obtained:

Therapeutic Property Evaluation Using PeptiVerse

The peptide WRYYAVAAAHKE was further analyzed using PeptiVerse to evaluate its potential therapeutic properties. The peptide sequence and the A4V mutant SOD1 sequence were provided as inputs, and several relevant properties were predicted.

Predicted Peptide Properties

Property	Predicted Value
Solubility Probability	1.00
Hemolysis Probability	0.018
Net Charge (pH 7)	0.85
Molecular Weight	1464.6 Da
GRAVY Hydrophobicity	−0.60
Cell Permeability	0.494
Estimated Half-Life	~0.46 hours

The peptide is predicted to be highly soluble, which is a desirable property for therapeutic peptides. It also shows a very low hemolysis probability, suggesting that it is unlikely to damage red blood cells. The moderate molecular weight and near-neutral net charge may support reasonable biological compatibility.

The GRAVY hydrophobicity score of −0.60 indicates that the peptide is relatively hydrophilic, which aligns with the predicted high solubility. However, the predicted cell permeability is moderate, and the estimated half-life of approximately 0.46 hours suggests limited stability in biological environments.
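The GRAVY value can be checked by hand. The sketch below assumes that PeptiVerse reports the standard GRAVY definition, the mean Kyte-Doolittle hydropathy over all residues; the function name `gravy` is illustrative.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


print(round(gravy("WRYYAVAAAHKE"), 2))  # -0.6
```

The hydropathy values sum to −7.2 over 12 residues, which reproduces the reported −0.60. The four alanines and the valine are outweighed by the strongly hydrophilic R, H, K, and E.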

Comparison of Structural and Therapeutic Predictions

The structural predictions and the therapeutic property analysis address different questions, and their results do not conflict. The low ipTM value from AlphaFold3 indicates weak predicted binding between the peptide and SOD1, and the structural visualization supports this by showing a surface-bound peptide without a well-defined binding interface.

Although the peptide does not demonstrate strong predicted binding affinity, it does not exhibit problematic therapeutic properties, such as high hemolysis risk or poor solubility, which are common limitations in peptide drug candidates.

Peptide Selection for Advancement

WRYYAVAAAHKE represents a reasonable peptide candidate to advance for further study. While its predicted binding strength to SOD1 is relatively weak, it demonstrates favorable therapeutic characteristics, including high solubility, low hemolysis probability, and acceptable physicochemical properties.

Future optimization approaches, such as targeted peptide redesign or guided peptide generation methods, could potentially improve binding affinity while preserving these favorable therapeutic traits.

Part 4: Generate Optimized Peptides with moPPIt

The mutant sequence was used as the input to generate the optimized peptides:

The motif positions were set to residues 1–10 during peptide generation. Additionally, only three optimization properties were selected in the notebook because the computation was performed on a T4 GPU in Google Colab, which has limited computational resources. Reducing the number of selected properties helped ensure that the notebook ran efficiently within the available GPU memory and runtime constraints.

The notebook took more than 40 minutes to run.

moPPIt Generated Peptides

The model generated three candidate peptides with predicted values for solubility, binding affinity, and motif score.

Binder	Solubility	Predicted Affinity	Motif Score
YNQKYSQCKYAC	0.9167	6.42	0.68
IKYINQKLKELR	0.6667	7.18	0.75
QDDKSEEEEDGQ	1.00	4.70	0.34

Comparison of moPPIt Peptides vs PepMLM Peptide

The moPPIt binder predictions produced three peptide candidates with varying physicochemical and predicted binding properties.

For comparison, the PepMLM-generated peptide (WRYYAVAAAHKE) evaluated earlier showed:

Excellent solubility (1.0)
Very low hemolysis probability (0.018), indicating favorable therapeutic safety

However, AlphaFold3 predicted weak structural binding with an ipTM ≈ 0.26, suggesting low confidence in stable interaction with the SOD1 A4V protein.

The moPPIt peptides have predicted affinity scores of 4.70–7.18. These scores are not on the same scale as the AlphaFold3 ipTM reported for the PepMLM peptide, so the comparison between the two methods is qualitative rather than numerical. Solubility also varies more among the moPPIt peptides: IKYINQKLKELR, the candidate with the highest predicted affinity, has only moderate predicted solubility (0.67), which could affect therapeutic delivery, while QDDKSEEEEDGQ is fully soluble but has the lowest affinity (4.70) and motif score (0.34).

moPPIt optimizes its peptides jointly for the selected properties (here solubility, affinity, and motif score), whereas the PepMLM peptide was generated without property guidance and happens to score well on solubility and hemolysis.

Evaluation Before Clinical Advancement

Before advancing any of these peptides to clinical studies, several additional evaluations would be necessary.

Structural Validation

Further structural analysis should be performed using tools such as AlphaFold3 or molecular docking to confirm the predicted binding interface with the A4V mutant SOD1 protein. This would help determine whether the peptide binds near the N-terminal A4V mutation site, the β-barrel region, or the dimer interface.

Binding Affinity Testing

Experimental assays such as surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) should be performed to measure the actual binding strength between the peptide and the SOD1 protein.

Stability and Pharmacokinetics

Peptides should be evaluated for serum stability and biological half-life. Additional studies should assess protease resistance and degradation rates to determine whether the peptide remains stable in physiological conditions.

Toxicity and Safety

Safety evaluation is essential before clinical use. Experiments should test hemolysis, cytotoxicity, and potential immunogenic responses in relevant cell culture models.

Functional Assays

Functional assays should determine whether the peptide can reduce aggregation or toxicity of mutant SOD1, which is an important mechanism in ALS therapeutic development.

Interpretation

Among the moPPIt peptides, IKYINQKLKELR has the highest predicted affinity and motif score. The PepMLM peptide has high predicted solubility (1.00, matched only by QDDKSEEEEDGQ) and a low predicted hemolysis probability; hemolysis was not among the properties reported for the moPPIt peptides, so safety cannot yet be compared directly.

An ideal therapeutic peptide would balance strong binding affinity with favorable physicochemical and safety properties. Therefore, further computational validation and experimental testing would be required to determine which peptide candidate provides the best overall balance of binding performance, stability, and therapeutic safety.
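One way to formalize this balance is a Pareto comparison across the reported properties. The sketch below uses the moPPIt values from the table above and assumes that higher is better for all three columns; the `Candidate` type and the `dominates` and `pareto_front` helpers are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    sequence: str
    solubility: float
    affinity: float
    motif: float


# Values copied from the moPPIt results table.
CANDIDATES = [
    Candidate("YNQKYSQCKYAC", 0.9167, 6.42, 0.68),
    Candidate("IKYINQKLKELR", 0.6667, 7.18, 0.75),
    Candidate("QDDKSEEEEDGQ", 1.00, 4.70, 0.34),
]


def dominates(a, b):
    """True if a is at least as good as b on every property and better on one."""
    a_scores, b_scores = a[1:], b[1:]
    return all(x >= y for x, y in zip(a_scores, b_scores)) and any(
        x > y for x, y in zip(a_scores, b_scores)
    )


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


for candidate in pareto_front(CANDIDATES):
    print(candidate.sequence)
```

With these values no peptide dominates another, so all three remain on the front. Each one trades solubility against affinity and motif score, which is why additional criteria such as stability or experimental binding data are needed to choose between them.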

Visualization of moPPIt Peptides

YNQKYSQCKYAC

IKYINQKLKELR

QDDKSEEEEDGQ

Final Group Project: Phage Lysis Protein Design Challenge

Introduction

Bacteriophage lysis proteins are responsible for disrupting the host bacterial membrane during phage infection, allowing the release of viral particles. The MS2 lysis protein is a small membrane-associated protein composed of 75 amino acids and contains two major functional regions:

Domain	Residues	Function
Soluble domain	1–40	Interaction with host chaperone protein DnaJ
Transmembrane helix	41–75	Membrane insertion and pore formation

Lysis Protein Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Design Objective

Design five mutations in the lysis protein:

2 mutations in the soluble region
2 mutations in the transmembrane region
1 mutation anywhere in the sequence

These mutations should preserve protein function while potentially improving stability or membrane activity.

Evolutionary Analysis

2.1 Protein BLAST

Homologous sequences for the MS2 lysis protein were obtained using Protein BLAST.

The sequences were downloaded in FASTA format and used for multiple sequence alignment.

2.2 Multiple Sequence Alignment

Multiple sequence alignment was performed using Clustal Omega.

Tool used:

https://www.ebi.ac.uk/jdispatcher/msa/clustalo

Homologous sequences used

WP_434006754.1
WP_434006752.1
SNQ28029.1
ACN90570.1
AAF19634.1
ACN90183.1
ACN90501.1
ACN90441.1
ACN90250.1

These sequences represent related phage lysis proteins.

After running BLAST, the FASTA (cluster) file was downloaded:

Conservation Analysis

Clustal Omega indicates conservation using the following symbols:

Symbol	Meaning
*	Fully conserved residue
:	Strongly similar residues (conservative substitution)
.	Weakly similar residues (semi-conservative substitution)

Example conservation pattern:

** *  :***:**.  ** ***: ****** ** **

Key Conserved Motifs

Highly conserved motifs observed in the alignment include:

METRFPQQSQQTPAST
PCRRQQRSSTLY

These residues are likely essential for structural stability or host protein interaction, particularly with DnaJ.

Therefore, fully conserved residues should not be mutated.
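The conservation line can be turned into a list of candidate columns programmatically. The sketch below parses the example pattern shown above; the column numbers refer to positions within that fragment, not to residue numbers in the lysis protein, and `columns_by_class` is a hypothetical helper name.

```python
# Clustal conservation symbols and their meaning; a space marks an unconserved column.
SYMBOL_CLASSES = {
    "*": "fully conserved",
    ":": "strongly similar",
    ".": "weakly similar",
    " ": "not conserved",
}

EXAMPLE_PATTERN = "** *  :***:**.  ** ***: ****** ** **"


def columns_by_class(conservation_line):
    """Group 1-based alignment columns by their conservation class."""
    groups = {label: [] for label in SYMBOL_CLASSES.values()}
    for column, symbol in enumerate(conservation_line, start=1):
        if symbol not in SYMBOL_CLASSES:
            raise ValueError(f"unexpected symbol {symbol!r} at column {column}")
        groups[SYMBOL_CLASSES[symbol]].append(column)
    return groups


groups = columns_by_class(EXAMPLE_PATTERN)
# Columns that are not fully conserved are the first places to look for mutations.
candidates = sorted(
    groups["strongly similar"] + groups["weakly similar"] + groups["not conserved"]
)
print(candidates)
```

To map a column back to a residue number, the gaps in the query row of the alignment must be skipped when counting, because alignment columns and sequence positions diverge after the first gap.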

Variable Regions

Regions showing substitutions or alignment gaps indicate evolutionary variability.

Example variable region:

RYRRPRGSNTGKEYRLKKFCRNI

Variation is also observed in the C-terminal region, where some sequences contain truncations or insertions.

Implication

Variable regions are better candidates for mutational engineering because they are less likely to disrupt protein function.

Domain Analysis

The MS2 lysis protein contains two main structural regions:

Region	Residues	Function
Soluble domain	1–40	Interaction with DnaJ
Transmembrane domain	41–75	Membrane insertion and pore formation

Soluble Region Conservation

The N-terminal soluble domain shows high conservation across homologous sequences.

Example conserved sequence (residues 1–39):

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLY

Mutations in this region must therefore be chosen carefully.

Candidate mutation sites

Position	Residue	Reason
12	T	Weakly conserved
17	N	Variable among homologs
26	D	Moderate variability

These positions may tolerate substitutions without disrupting protein folding.

Transmembrane Region Conservation

The C-terminal region forms a transmembrane helix.

Example sequence (residues 38–75):

LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

This region is highly hydrophobic, which is required for membrane insertion.

However, conservative substitutions between hydrophobic residues may be tolerated.
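The contrast in hydrophobicity between the two regions can be quantified with a simple residue count. The sketch below uses the 1–40 and 41–75 boundaries from the domain table; the choice of A, V, I, L, M, F, and W as the hydrophobic set is an assumption, and `hydrophobic_fraction` is a hypothetical helper name.

```python
LYSIS_WT = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"

# Assumed hydrophobic set; other definitions also include C, Y, or P.
HYDROPHOBIC = set("AVILMFW")


def hydrophobic_fraction(segment):
    """Fraction of residues in the segment that belong to the hydrophobic set."""
    return sum(residue in HYDROPHOBIC for residue in segment) / len(segment)


soluble = LYSIS_WT[:40]        # residues 1-40
transmembrane = LYSIS_WT[40:]  # residues 41-75

print(f"soluble:       {hydrophobic_fraction(soluble):.2f}")
print(f"transmembrane: {hydrophobic_fraction(transmembrane):.2f}")
```

Under this definition about 15% of the soluble-domain residues are hydrophobic, compared with 60% in the transmembrane region. A substitution in the transmembrane region can be screened the same way by confirming that the replacement residue stays within the hydrophobic set.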

Candidate mutation sites

Position	Residue	Reason
52	T	Hydrophobic substitution possible
55	L	Minor hydrophobic change
59	L	Frequently mutated experimentally

Key Observations from Alignment

The N-terminal region is highly conserved, indicating functional importance in host interaction.
Some residues in the soluble domain show moderate variability.
The transmembrane region remains hydrophobic but allows conservative substitutions.
Some homologous proteins exhibit C-terminal truncations, suggesting structural flexibility in this region.

Mutation Design Strategy

The mutation design followed several biological constraints:

Rules applied

Avoid fully conserved residues
Prefer weakly conserved or variable residues
Maintain hydrophobicity in transmembrane helices
Preserve overall protein folding and stability

Mutational Scoring Using Protein Language Models

Mutation effects are predicted using protein language models, such as:

ESM-1b
MSA Transformer
ProteinBERT

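Mutations were scored by log-likelihood ratio (LLR): the model's log-probability of the mutant residue minus its log-probability of the wild-type residue at the same position. A positive LLR means the model considers the mutant residue more likely than the wild type.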
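The scoring arithmetic is short enough to sketch directly. The probabilities in the example are invented and do not come from the ESM notebook; the natural logarithm is assumed, the thresholds follow the bands listed under LLR Interpretation, and `llr` and `interpret_llr` are hypothetical names.

```python
import math


def llr(p_mutant, p_wild_type):
    """Log-likelihood ratio of the mutant versus the wild-type residue."""
    return math.log(p_mutant) - math.log(p_wild_type)


def interpret_llr(score):
    """Map an LLR score to the interpretation bands used in this report."""
    if score > 2:
        return "Very favorable"
    if score >= 1:
        return "Moderately favorable"
    if score >= 0:
        return "Weakly favorable"
    return "Unfavorable"


# Hypothetical model probabilities at a single position (illustration only).
example = llr(p_mutant=0.26, p_wild_type=0.02)
print(round(example, 2), interpret_llr(example))
```

Because the score is a ratio of probabilities at one position, it measures how natural the substitution looks to the model. It is not a direct prediction of stability or lysis activity.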

LLR Interpretation

Score	Interpretation
> 2	Very favorable
1–2	Moderately favorable
0–1	Weakly favorable
< 0	Unfavorable

The following image shows the results obtained with the notebook Protein Language Models (ESM).ipynb:

Top Ranked Mutations

Position	WT	Mutation	LLR Score
50	K	L	2.56
29	C	R	2.39
39	Y	L	2.24
29	C	S	2.04
9	S	Q	2.01
53	N	L	1.86
52	T	L	1.81
61	E	L	1.81

Many of the top-scoring mutations substitute leucine (L). A plausible reason is that leucine is strongly hydrophobic and common in membrane helices, so the model assigns it a high probability, especially in the transmembrane region.

Mapping Mutations to Protein Regions

Soluble Region (1–40)

Mutation	Score
C29R	2.39
C29S	2.04
S9Q	2.01
Y39L	2.24
F5Q	1.79

Transmembrane Region (41–75)

Mutation	Score
K50L	2.56
T52L	1.81
N53L	1.86
E61L	1.81
A45L	1.53

Biological Filtering

Risky mutations were removed using biological constraints.

Mutations excluded

C29R
C29S

Reason: cysteine residues may form structural interactions.

Safer alternatives

Y39L
S9Q
F5Q

Final Selected Mutations

Mutation	Region	LLR Score
S9Q	Soluble	2.01
Y39L	Soluble	2.24
K50L	Transmembrane	2.56
T52L	Transmembrane	1.81
N53L	Anywhere	1.86

Mutated Protein Sequence

Original Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Mutated Sequence

METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSLFLLQLLLSLLEAVIRTVTTLQQLLT

Mutations applied:

S9Q
Y39L
K50L
T52L
N53L
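Applying the substitutions in code avoids transcription errors in the mutated sequence. The sketch below is a minimal helper under stated assumptions: mutations are written as wild-type residue, 1-based position, then mutant residue, and `apply_substitutions` is a hypothetical name rather than a function from any of the notebooks used.

```python
import re

LYSIS_WT = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
MUTATIONS = ["S9Q", "Y39L", "K50L", "T52L", "N53L"]


def apply_substitutions(sequence, mutations):
    """Apply point substitutions such as 'S9Q', checking the wild-type residue first."""
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(sequence):
            raise ValueError(f"{mutation}: position outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type}, found {sequence[position - 1]}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


lysis_mutant = apply_substitutions(LYSIS_WT, MUTATIONS)
print(lysis_mutant)
print(len(lysis_mutant) == len(LYSIS_WT))  # substitutions never change the length
```

The length check at the end is a quick safeguard: five substitutions must leave a 75-residue sequence that differs from the wild type at exactly five positions.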

Comparison with Experimental Data

Experimental data supports mutational tolerance at several selected positions.

Mutation	Position	Evidence	Interpretation
S9Q	9	No experimental mutation reported	Tolerance assumed, not yet tested
Y39L	39	Y→H mutation reported	Position mutable
K50L	50	Multiple substitutions observed	Highly tolerant
T52L	52	Mutation recorded	Mutation tolerated
N53L	53	Several variants reported	Flexible boundary residue

These results support the predicted soluble and membrane domain boundaries.

Structural Prediction Using AlphaFold

The mutated sequence was modeled using AlphaFold Multimer.

Obtaining a PDB file required several attempts. An eight-chain oligomer was submitted first, but the run crashed under the high computational load. After the input was adjusted and the analysis rerun, a prediction completed successfully, and the resulting outputs are documented as follows.

Interpretation

The AlphaFold Multimer predictions were run with several models, seeds, and recycling steps to evaluate the structural stability of the designed protein complex. Across all runs, the predicted local distance difference test (pLDDT) values ranged from approximately 32 to 40. Values below 50 fall in the very low confidence band, which is common for small membrane-associated proteins and flexible regions. The pTM scores were between 0.19 and 0.31 and the ipTM scores between about 0.13 and 0.27, so neither the overall fold nor the inter-chain arrangement is predicted with confidence. Model 2 with seed 001 produced the highest scores (pLDDT ≈ 40.3, pTM ≈ 0.312, ipTM ≈ 0.275) and is the best of the tested configurations, although it remains in the low-confidence range. Most models finished after 6 recycling iterations, with total runtimes of approximately 258–323 seconds per model. The similar scores across seeds and models show that the runs are reproducible, but at this confidence level the structures should be treated as rough hypotheses for preliminary analysis rather than as reliable models.
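The reported pLDDT values can be placed on the confidence scale commonly used for AlphaFold models. The band edges below (90, 70, and 50) are the conventional ones and are an assumption here, since the notebook output itself does not state them; `plddt_band` is a hypothetical helper name.

```python
def plddt_band(plddt):
    """Return the conventional confidence band for a pLDDT value (0-100)."""
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT must lie between 0 and 100")
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Lowest and highest pLDDT values reported for the runs above.
for value in (32.0, 40.3):
    print(value, plddt_band(value))
```

Both ends of the observed range map to the same band, so the differences between models and seeds do not change the overall level of confidence.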

To obtain a completed prediction, the analysis was repeated with a different input configuration. Instead of the eight-chain oligomer, which had caused the crash, a four-chain oligomer was used. This reduced the computational load and allowed the prediction to run successfully, producing structural outputs for further analysis.

Results obtained:

Co-Folding Analysis

The mutated lysis protein sequence was further analyzed using co-folding simulations with additional protein sequences to investigate potential protein–protein interactions.

Structural visualization tools such as Discovery Studio were used to examine key structural and interaction features, including:

Hydrogen bonding patterns
Protein–protein interface interactions
Membrane insertion orientation

Co-folding simulations were performed using both the AlphaFold Multimer v3 notebook and the AlphaFold Server to compare prediction consistency and interaction confidence across different platforms.

The results obtained from the AlphaFold Server are summarized as follows:

Conclusion

This study applied evolutionary analysis, protein language models, and structural prediction to design mutations in the MS2 lysis protein.

Key findings:

The N-terminal region is highly conserved and involved in host interaction.
The C-terminal region forms a hydrophobic transmembrane helix.
Protein language model scoring identified favorable mutations.
Biological filtering ensured structural compatibility.

Final designed mutations

S9Q
Y39L
K50L
T52L
N53L