week-05-HW-protein-design-part-II

cover image cover image
Part A: SOD1 Binder Peptide Design (From Pranam)

Assignees for this section MIT/Harvard students Required Committed Listeners Required Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge:

  1. Design short peptides that bind mutant SOD1.
  2. Then decide which ones are worth advancing toward therapy.

You will use three models developed in our lab:

  • PepMLM: target sequence-conditioned peptide generation via masked language modeling
  • PeptiVerse: therapeutic property prediction
  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

  1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
  2. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
  3. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
  4. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
  5. Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Part 2: Evaluate Binders with AlphaFold3

  1. Navigate to the AlphaFold Server: alphafoldserver.com
  2. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
  3. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
  4. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes
    1. Predicted binding affinity
    2. Solubility
    3. Hemolysis probability
    4. Net charge (pH 7)
    5. Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

  1. Open the moPPit Colab linked from the HuggingFace moPPIt model card
  2. Make a copy and switch to a GPU runtime.
  3. In the notebook:
    1. Paste your A4V mutant SOD1 sequence.
    2. Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
    3. Set peptide length to 12 amino acids.
    4. Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
  4. After generation, briefly describe how these moPPit peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

Part 1: Generate Binders with PepMLM

For the PepMLM analysis, the amino acid sequence of the normal Superoxide Dismutase 1 protein was obtained from UniProt using the accession number P00441. cover image cover image To simulate the disease-associated variant, the A4V mutation was then manually introduced into the sequence to generate the mutant form of the protein used for the peptide design experiments. This mutation corresponds to the substitution of alanine by valine at position 4 of the protein sequence.

  • Original Superoxide dismutase 1 (SOD1) sequence from Uniprot:

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

  • Mutant A4V sequence (A4V means Alanine → Valine at position 4):

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Note

The Problem: The A4V mutation you are studying is famous because it destabilizes the dimer interface. This causes the dimer to fall apart into monomers, which then misfold and aggregate into the toxic clumps seen in ALS patients.

After running the PepMLM model using mutant A4V sequence, I obtained four candidate peptide binder sequences. In the generated results, each sequence ended with the amino acid symbol ‘X’, which represents an undefined residue predicted by the model.

To proceed with the structural analysis and fold the predicted binders with the mutated SOD1 protein, I needed to assign a specific amino acid at this position. For this reason, I replaced the ‘X’ residue with alanine (A) in each sequence. I chose alanine because it is a small and neutral amino acid that generally has minimal effects on protein structure and interactions. This allowed me to obtain complete peptide sequences that could be used for the subsequent folding and interaction prediction with the A4V mutant SOD1 protein. cover image cover image

BinderOriginal SequenceModified SequencePerplexity
Known binderFLYRWLPSRRGG//////
1WLSPAAGVEWKXWLSPAAGVEWKA14.764
2WHYYVVVVRHWXWHYYVVVVRHWA28.661
3WRSYVVVVELKXWRSYVVVVELKA20.402
4WRYPAVVAAHGXWRYPAVVAAHGA5.136

Part 2: Evaluating Binders with AlphaFold3

The predicted peptide binders were evaluated using structural modeling. Each peptide was folded together with the A4V mutant of SOD1 to evaluate the potential protein–peptide interactions. cover image cover image cover image cover image

Note

pTM and ipTM scores: the predicted template modeling (pTM) score and the interface predicted template modeling (ipTM) score are both derived from a measure called the template modeling (TM) score. This measures the accuracy of the entire structure (Zhang and Skolnick, 2004; Xu and Zhang, 2010). A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure. ipTM measures the accuracy of the predicted relative positions of the subunits within the complex. Values higher than 0.8 represent confident high-quality predictions, while values below 0.6 suggest likely a failed prediction. ipTM values between 0.6 and 0.8 are a gray zone where predictions could be correct or incorrect. TM score is very strict for small structures or short chains, so pTM assigns values less than 0.05 when fewer than 20 tokens are involved; for these cases PAE or pLDDT may be more indicative of prediction quality.

  • The known binder (FLYRWLPSRRGG) showed a relatively low binding confidence, with an ipTM score of 0.28. The peptide binds mainly on the surface of the SOD1 β-barrel, close to the electrostatic loop and the zinc-binding loop. It does not bind near the N-terminus, where the A4V mutation is located, and it also does not interact with the dimer interface. The peptide remains mostly surface-bound rather than buried inside the protein structure. Several residues help stabilize the interaction. For example, Trp5 and Tyr3 can form aromatic contacts with the protein surface, while Arg8 and Arg9 may form hydrogen bonds with nearby residues of SOD1. However, the peptide does not form a strong or compact binding interface, which suggests that the interaction may be weak or transient.

  • Binder 1 (WLSPAAGVEWKA) showed a clear improvement compared with the known binder, with an ipTM score of 0.39. The peptide binds on a hydrophobic groove on the surface of the SOD1 β-barrel. In this interaction, Trp1 acts as an important anchoring residue, helping the peptide attach to a hydrophobic pocket on the protein surface. Other residues such as Ser3 and Pro4 help position the peptide backbone against the protein surface. In addition, Glu9 forms stabilizing hydrogen bonds with nearby residues on SOD1. Because of these interactions, the peptide forms a more compact and organized binding conformation than the known binder. Although the peptide still binds away from the A4V mutation site, the higher ipTM score and the stronger interaction network suggest that Binder 1 may represent a more promising peptide candidate.

  • Binder 2 (WHYYVVVVRHWA) showed a moderate interaction with SOD1, with an ipTM score of 0.33, which is higher than the known binder but lower than Binder 1. The peptide binds on a surface patch of the SOD1 β-barrel region. Several residues appear to contribute to this interaction. Trp1 participates in both hydrogen bonding and aromatic interactions with the protein surface, helping to anchor the peptide. Tyr3 and Arg9 also participate in hydrogen bonding that stabilizes the peptide orientation. In addition, the terminal residue Ala12 contributes to stabilizing the peptide backbone through hydrogen bonding with the protein surface. Compared with the known binder, Binder 2 shows a more localized and organized binding mode, although the peptide still binds mainly on the surface of the protein rather than deeply inside the structure.

  • Binder 3 (WRSYVVVVELKA) showed the lowest binding confidence among the designed peptides, with an ipTM score of 0.20, which is even lower than the known binder. The peptide still localizes on the surface of the SOD1 β-barrel, but the interaction appears weak and poorly defined. The interaction is mainly supported by Arg2 and Lys11, which can form hydrogen bonds with residues on the SOD1 surface. In addition, Tyr4 may contribute through aromatic interactions with the protein surface. However, the peptide forms only a limited number of stabilizing contacts, and the interaction appears less stable compared with Binder 1 and Binder 2. These results suggest that Binder 3 may not be a strong candidate for stable binding to the SOD1 mutant.

  • Binder 4 (WRYPAVVAAHGA) showed a moderate structural confidence, with an ipTM score of 0.33, similar to Binder 2 and higher than the known binder. The peptide binds on the surface of the SOD1 β-barrel region. Several residues contribute to this interaction. Trp3, Val6, and Gly11 appear to form hydrogen bonds with residues on the SOD1 surface, helping stabilize the interaction. In addition, an internal hydrogen bond between Val6 and His10 helps stabilize the peptide backbone and maintain its conformation. Compared with Binder 3, this peptide forms more defined interactions with the protein surface, which explains its higher predicted binding confidence. Although the peptide still binds away from the A4V mutation site, the interaction appears more organized and stable than the known binder.

To further explore whether peptide length influences binding stability, the same structural analysis was also performed using 11-residue versions of the peptides obtained by removing the final alanine that replaced the unknown residue X. For Binder 1, the ipTM score decreased from 0.39 (12 aa) to 0.27 (11 aa), indicating that the twelfth residue likely helps stabilize the interaction with the SOD1 surface. In contrast, Binder 2 showed a small increase in structural confidence, where the score changed from 0.33 (12 aa) to 0.35 (11 aa), suggesting that the slightly shorter peptide may adopt a somewhat better orientation on the protein surface. Binder 3 showed the strongest negative effect of shortening the peptide, with the score decreasing from 0.20 (12 aa) to 0.13 (11 aa), confirming that this peptide already forms weak interactions and becomes even less stable when shortened. Interestingly, Binder 4 showed the opposite trend, where the 11-residue version reached the highest score of all tested peptides (0.44) compared with 0.33 for the 12-residue version, suggesting that removing the last residue may allow the peptide to adopt a more favorable binding conformation. Overall, these exploratory results suggest that peptide length can influence binding stability, but the effect is sequence-dependent, since shortening the peptide reduced stability for some binders (Binder 1 and Binder 3) while improving it for others (Binder 2 and Binder 4).

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

I evaluated the therapeutic properties of all 12-residue peptide binders using PeptiVerse. cover image cover image cover image cover image cover image cover image The known binder FLYRWLPSRRGG showed weak binding affinity (pKd 5.97), good solubility, very low hemolysis probability (0.047), and a positive net charge of 2.76. Among the PepMLM-generated peptides, Binder 1 (WLSPAAGVEWKA) had weak binding affinity (pKd 5.61), excellent solubility, very low hemolysis (0.037), and a near-neutral net charge (-0.24). Binder 2 (WHYYVVVVRHWA) exhibited medium binding affinity (pKd 7.12), fully soluble, non-hemolytic (0.115), and slightly positive net charge (0.93). Binder 3 (WRSYVVVVELKA) showed weak binding (pKd 6.28), soluble, non-hemolytic (0.115), and net charge 0.76. Binder 4 (WRYPAVVAAHGA) had weak binding (pKd 5.22), soluble, non-hemolytic (0.037), and net charge 0.85. cover image cover image

BinderSequenceipTM (AlphaFold)Predicted Binding Affinity (pKd/pKi)SolubilityHemolysis ProbabilityNet Charge (pH 7)Molecular Weight (Da)Highlights
KnownFLYRWLPSRRGG0.28Weak (5.968)Soluble (1)0.0472.761507.7Surface-bound, low confidence; non-hemolytic
Binder 1WLSPAAGVEWKA0.39Weak (5.608)Soluble (1)0.037-0.241314.5Highest ipTM, stable hydrophobic groove binding; non-hemolytic
Binder 2WHYYVVVVRHWA0.33Medium (7.123)Soluble (1)0.1150.931614.8Good binding, slightly lower ipTM than Binder 1; non-hemolytic
Binder 3WRSYVVVVELKA0.20Weak (6.279)Soluble (1)0.1150.761448.7Lowest ipTM, weak structural stability; non-hemolytic
Binder 4WRYPAVVAAHGA0.33Weak (5.216)Soluble (1)0.0370.851297.5Medium ipTM, non-hemolytic, good solubility

Comparing these properties with the structural confidence from AlphaFold, we see that higher ipTM scores do not always directly match stronger predicted binding. For example, Binder 1 had the highest ipTM (0.39) but only weak predicted binding, while Binder 2 had slightly lower ipTM (0.33) but showed medium predicted binding. All generated peptides are soluble and non-hemolytic, which is favorable for therapeutic use. Considering both structural confidence and predicted properties, Binder 1 (WLSPAAGVEWKA) is the most promising overall: it has the highest structural stability on SOD1, is non-hemolytic, fully soluble, and has a near-neutral charge that may support safe and effective binding in a biological context.

Part 4: Generating Optimized Peptides with moPPIt

Mutations such as the A4V variant can destabilize the structure of Superoxide Dismutase 1, increasing the probability of protein misfolding, dissociation of the dimer, and toxic aggregation, which are processes associated with Amyotrophic Lateral Sclerosis (ALS). For this reason, the design strategy of this step focused on generating short peptides that can bind simultaneously to both monomers at the dimer interface, effectively acting as a molecular bridge that reconnects and stabilizes the two subunits. By reinforcing the interaction between the chains, these peptides may help restore a conformation closer to the native functional state of the SOD1 complex, while reducing the structural instability caused by the mutation.

To do so, several design parameters were selected before generating peptides. The peptide length was fixed at 12 amino acids. The motif position was focused on residues 3–10, meaning the central region of the peptide was encouraged to interact with the target protein. In addition, affinity guidance and solubility optimization were enabled, and hemolysis prediction was considered to reduce potential toxicity. These settings allow the model to design peptides that not only bind the protein but also have better therapeutic properties. cover image cover image cover image cover image cover image cover image

After generating the following peptides, their structures were evaluated using AlphaFold to predict how they interact with SOD1.

  • The generated sequences:
Optimized BinderSequenceIpTM ScoreBinding Localization
1KRQCEIFNQFMA0.91Interface between the two monomers
2EKDNKWVITSQF0.86Interface between the two monomers
3VCQFDYKTLFKK0.87Interface between the two monomers
4GQQSLFKTKTLD0.89The outer surface of a single SOD1 monomer
  • Binder 1 – KRQCEIFNQFMA (ipTM: 0.91) This peptide localizes at the dimer interface of the SOD1 homodimer and acts as a molecular bridge between the two monomers. Several residues of the peptide participate in stabilizing the interaction. Gln3 forms a hydrogen bond with residues on the first monomer, while Cys4 interacts with a cysteine residue on the second monomer. In addition, Asn8 forms multiple hydrogen bonds with residues on Chain A. These multiple contacts allow the peptide to connect both monomers simultaneously, which could help stabilize the dimer structure of SOD1. cover image cover image
  • Binder 2 – EKDNKWVITSQF (ipTM: 0.86) This peptide also binds at the dimer interface and connects the two monomers. The interaction is mainly driven by the N-terminal region of the peptide. Glu1 forms several hydrogen bonds with residues on Chain B, creating a strong anchoring point. In addition, Ser10 interacts with residues on Chain A. Through these interactions with both monomers, the peptide may help maintain the stability of the SOD1 dimer. cover image cover image
  • Binder 3 – VCQFDYKTLFKK (ipTM: 0.87) This peptide spans the interface between the two monomers, forming stabilizing contacts with both chains. Val1 forms a hydrogen bond with residues on Chain B, while Phe4 interacts with residues on Chain A. These interactions allow the peptide to bridge the two monomers and stabilize the interface region. cover image cover image
  • Binder 4 – GQQSLFKTKTLD (ipTM: 0.89) Unlike the previous peptides, this binder attaches to the outer surface of a single SOD1 monomer, particularly near the β-barrel structure. The interaction is mainly driven by residues near the C-terminus of the peptide. Thr10 forms a hydrogen bond with the monomer, while Asp12 forms two hydrogen bonds with residues on Chain A. Lys9 also contributes to stabilization by forming an additional hydrogen bond. This peptide does not bridge the dimer but instead stabilizes the surface structure of the monomer. cover image cover image The four peptides show two different binding strategies:

Three peptides (KRQCEIFNQFMA, EKDNKWVITSQF, and VCQFDYKTLFKK) bind at the dimer interface, where they interact with residues from both monomers. These peptides may help stabilize the SOD1 dimer by acting as a bridge between the two chains. In contrast, GQQSLFKTKTLD binds only to one monomer, specifically on the β-barrel surface. Instead of bridging the two chains, this peptide may stabilize the structure of the individual monomer.

Among the peptides, KRQCEIFNQFMA shows the highest ipTM score (0.91), suggesting the strongest predicted interaction with the protein complex. When comparing the peptides generated by PepMLM and moPPIt, the main difference lies in the design strategy. PepMLM mainly samples possible peptide sequences that could bind to the target protein based on patterns learned from protein sequence data. However, it does not allow the user to control exactly where the peptide should bind on the protein. As a result, the generated peptides are plausible binders, but their binding location and biochemical properties are not specifically optimized.

In contrast, moPPIt enables guided peptide design. In this approach, the user can select specific residues or regions on the protein where the peptide should bind, such as the dimer interface of Superoxide Dismutase 1 or regions near the A4V mutation. The model also optimizes several properties simultaneously, including binding affinity, solubility, hemolysis risk, and motif placement. Because of this multi-objective optimization, moPPIt peptides are designed to better satisfy several therapeutic requirements at the same time.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

Assignees for this section

MIT/Harvard students Required

Committed Listeners Required


For this part, unfortunately, I was unable to access the BRD4 Drug Discovery Platform, as the access was not granted to me despite my request.

Part C: Final Project: L-Protein Mutants

Assignees for this section

MIT/Harvard students Required

Committed Listeners Required


Option 1 : Improve autofolding and lysis efficiency

The goal of this part was to design mutations in the L-protein in order to improve its function. Two main objectives were considered. The first objective was to improve the autofolding ability of the L-protein so that it can fold correctly without strong dependence on host chaperones. The second objective was to improve the lysis efficiency of the protein by enhancing its ability to form pores in the E. coli membrane and promote faster or more efficient bacterial lysis.

To identify possible mutations, the provided mutation scoring notebook was used. This notebook evaluates all possible amino-acid substitutions in the L protein and assigns a score to each mutation.

After running the notebook, the resulted mutation predictions are presented in the following dataset:

Position (DNA)Position (Protein)Wild Type AAMutation AALLR Score
98950KL2.561468
57429CR2.395427
76939YL2.241780
57529CS2.043150
1739SQ2.014325
57329CQ1.997049
57229CP1.971029
56929CL1.960646
98750KI1.928801
104953NL1.864932
120961EL1.818098
102952TL1.813968
98450KF1.802069
57629CT1.797247
56829CK1.795878
935FQ1.795244
945FR1.659717
56029CA1.648656
53427YR1.628061
43422FR1.602028
925FP1.596891
99750KV1.594576
99550KS1.574557
965FT1.559024
955FS1.556417
88945AL1.539248
77539YS1.517457
53527YS1.497053
78940VL1.477630
52927YL1.474637
43522FS1.423358
56329CE1.383281
76039YA1.364999
57129CN1.362601
98050KA1.357795
56729CI1.344121
895FL1.332615
33417NR1.323651
76739YI1.320103
77639YT1.302804
51426DR1.268762
56629CH1.246107
76439YF1.245851
77739YV1.244390
45423KR1.236555
49425ER1.229350
47424HR1.227779
99650KT1.222131
53327YQ1.218851
53627YT1.215567

cover image cover image The predicted mutations were compared with the experimental dataset of L-protein mutants provided in the course material. This dataset contains mutations that were experimentally tested and their effect on lysis activity.

The goal of this comparison was to determine whether mutations with high prediction scores correspond to mutations that show improved lysis in experimental studies. This step helps evaluate the reliability of the prediction model. The results of this comparison revealed a limited overlap between predicted beneficial mutations and experimentally tested mutations. Two mutations, C29R and K50I, appeared in both datasets. However, experimental data indicated that these substitutions did not improve lysis activity. This suggests that, while the protein language model captures sequence compatibility, it does not fully predict functional outcomes such as lysis efficiency. For this raison, experimental validation remains essential to confirm computational predictions.

To avoid mutations that could disrupt essential protein functions, sequence conservation analysis was performed. Multiple sequences related to the MS2 L protein obtained from the BLAST results provided in the course folder were uploaded to Clustal Omega and aligned. cover image cover image cover image cover image cover image cover image The conserved regions of the L protein were identified after analyzing the multiple sequence alignment results. Highly conserved residues, which are the same across all sequences, were marked with stars (*) in the alignment output. while colon (:) indicates residues with strongly similar chemical properties. These positions were considered critical for protein function, so mutations at these residues were avoided. The remaining positions, which showed variability among sequences, were classified as non-conserved and were selected as potential sites for mutation. This approach ensured that the chosen mutations would minimize disruption of essential protein structure and function. cover image cover image Mutations were selected using the resulted mutation scoring predictions and evolutionary conservation analysis. Only residues located in non-conserved positions were chosen in order to reduce the risk of disrupting essential protein functions. The selected mutations (F5Q, S9Q, F22S, Y27L, and A45L) -as represented in the following table- are distributed between the N-terminal region, the central region, and the transmembrane domain of the L protein. This distribution allows the exploration of potential effects on protein autofolding and membrane activity, while maintaining the overall structural integrity of the protein. cover image cover image

MutationLLR Score*Protein RegionAA Property ChangeMutation TypeConserved Residue?Structural RiskRationale for Selection
S9Q~2.01N-terminal regionSmall polar → Polar amideConservativeUnconservedLowSimilar polarity; minimal structural disruption while potentially altering hydrogen bonding
F5Q~1.80N-terminal regionHydrophobic aromatic → Polar amideModerateUnconservedModerateIntroduces polarity which may affect folding and interaction with cytoplasmic environment
A45L~1.54Transmembrane helixSmall hydrophobic → Larger hydrophobicConservativeUnconservedLowMaintains hydrophobic nature; may stabilize helix packing in membrane
Y27L~1.47Near transmembrane regionAromatic → Hydrophobic aliphaticModerateUnconservedModerateMaintains hydrophobicity but removes aromatic ring; could affect membrane insertion
F22S~1.42Cytoplasmic / near TM regionHydrophobic aromatic → Small polarModerateUnconservedModerateReduces hydrophobicity; may influence membrane interaction and folding

Because the L gene overlaps with other genes in the MS2 genome, the nucleotide changes corresponding to the selected mutations were checked to ensure that they do not introduce stop codons in the overlapping reading frames. cover image cover image cover image cover image cover image cover image The mutations F5Q and S9Q are located in the region overlapping with the coat protein (CP) gene, near its C-terminal end, while the mutation A45L is located in the region overlapping with the replicase (Rep) gene, near its N-terminal region. For each mutation, the possible codon substitutions were examined and confirmed not to generate stop codons in the overlapping genes. Therefore, these mutations are considered compatible with the genome organization of MS2. cover image cover image

Option 2: Achieve DnaJ independence

Here the goal was to reduce or eliminate the dependence of the L-protein on the host chaperone DnaJ. By designing mutations in the soluble N-terminal domain of the L-protein, i aimed to weaken its interaction with DnaJ while maintaining proper folding. This approach could potentially allow the phage to function even if DnaJ is mutated or absent in the host.

To study the interaction, i used the AlphaFold2-Multimer notebook in ColabFold to co-fold the soluble domain of the L-protein with the full sequence of E. coli DnaJ. The sequences used were:

  • DnaJ sequence:

MAKQDYYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDSQKRAAYDQYGHAAFEQGGMGGGGFGGGADFSDIFGDVFGDIFGGGRGRQRAARGADLRYNMELTLEEAVRGVTKEIRIPTLEECDVCHGSGAKPGTQPQTCPTCHGSGQVQMRQGFFAVQQTCPHCQGRGTLIKDPCNKCHGHGRVERSKTLSVKIPAGVDTGDRIRLAGEGEAGEHGAPAGDLYVQVQVKQHPIFEREGNNLYCEVPINFAMAALGGEIEVPTLDGRVKLKVPGETQTGKLFRMRGKGVKSVRGGAQGDLLCRVVVETPVGLNERQKQLLQELQESFGGPTGEHNSPRSKSFFDGVKKFFDDLTR

  • The soluble domain of Lysis protein (N terminal Domain):

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSS

cover image cover image

The first co-folding run generated five ranked models with the following parameters. Here, pLDDT reflects the confidence in the predicted structure of the L–DnaJ complex, pTM indicates the overall predicted quality of the complex, and ipTM estimates the predicted strength of the interaction between L-protein and DnaJ:

Rank_004_alphafold2_multimer_v3_model_1: pLDDT=79.3 pTM=0.567 ipTM=0.198 cover image cover image Rank_005_alphafold2_multimer_v3_model_2 : pLDDT=78.9 pTM=0.573 ipTM=0.161 cover image cover image Rank_003_alphafold2_multimer_v3_model_3: pLDDT=78.4 pTM=0.572 ipTM=0.265 cover image cover image
Rank_001_alphafold2_multimer_v3_model_4: pLDDT=78.9 pTM=0.583 ipTM=0.373 cover image cover image
Rank_002_alphafold2_multimer_v3_model_5: pLDDT=77.9 pTM=0.572 ipTM=0.287 cover image cover image

  • Plots for L_DnaJ_complex cover image cover image

By comparing the different predicted models, the fourth model was identified as the best-ranked model because it showed the highest ipTM score, indicating the strongest predicted interaction between the L-protein and DnaJ. cover image cover image Using PyMol I Analysed the predicted L–DnaJ complex of the best predicted model (rank 4) and the results revealed multiple interaction residues located in the N-terminal region of the L protein as summarized in the following table:

ResidueTypeContacts with DnaJTypical Interaction Role
Met 1Hydrophobic (non-polar)ASP116, ARG113Hydrophobic contact
GLU2Negatively chargedALA115, ASP116Electrostatic / salt bridge
THR3Polar (uncharged)ASP116, LEU117, ARG118Hydrogen bonding
ARG4Positively chargedALA115, ARG118, LEU117, GLU233, ASP116Electrostatic / salt bridge
PHE5Hydrophobic aromaticASN120, LEU117, ARG118Hydrophobic packing
PRO6Hydrophobic (rigid)LEU117, ARG118, ASN120, TYR119, ASP116Structural / hydrophobic contact
GLN8Polar (uncharged)ASN120, ARG118, TYR119Hydrogen bonding
SER15Polar (uncharged)GLN252, GLU122, LYS251Hydrogen bonding
ASN17Polar (uncharged)GLN252Hydrogen bonding
ARG18Positively chargedVAL250, GLN252, GLN249, GLU122, LYS251Electrostatic interaction
ARG19Positively chargedGLN252Electrostatic interaction
ARG20Positively chargedGLN252Electrostatic interaction
PRO21HydrophobicGLN252, GLU257Structural / hydrophobic contact
PHE22Hydrophobic aromaticPRO254, GLU266, GLU257Hydrophobic contact
LYS23Positively chargedGLU266Electrostatic interaction
HIS24Positively charged / polarVAL326, ARG324, GLU266Electrostatic / hydrogen bond
GLU25Negatively chargedGLU266Electrostatic interaction
ASP26Negatively chargedARG324, GLU266, VAL326Electrostatic interaction
TYR27Aromatic polarVAL327, THR329, GLU328Hydrophobic + H-bond

Key residues such as Arg4, Thr3, Pro6, Phe5, Arg18, Lys23, His24, and Tyr27 were found to interact with several residues of DnaJ, including Asp116, Leu117, Arg118, Glu122, and Glu266. These interactions involve a combination of electrostatic, hydrophobic, and hydrogen-bond contacts. Residues forming multiple contacts were considered potential targets for mutagenesis aimed at reducing the dependence of the L protein on the DnaJ chaperone.

Two hydrophobic residues (Pro6 and Phe22), two positively charged residues (Arg4 and Arg18), and two negatively charged residues (Glu2 and Asp26) were selected for mutational analysis. These residues participate in multiple contacts with DnaJ and represent different physicochemical interaction types involved in stabilizing the L–DnaJ interface.

To evaluate the contribution of different interaction types at the L–DnaJ interface, selected residues were substituted with alanine using an alanine-scanning approach in order to remove their side-chain interactions while minimizing structural perturbation.

Original ResidueMutationReason
PRO6P6Aremoves rigid hydrophobic contact
PHE22F22Aremoves aromatic hydrophobic interaction
ARG4R4Aremoves positive charge
ARG18R18Aremoves strong electrostatic interaction
GLU2E2Aremoves negative charge
ASP26D26Aremoves negative charge

The resulting N-terminal sequence of the lysis protein was used to re-predict the interaction with the DnaJ protein in order to evaluate whether the introduced mutations could reduce the dependence of the lysis protein on the host chaperone:

MATAFAQQSQQTPASTNARRPAKHEAYPCRRQQRSS

The mutated L-protein was co-folded again with DnaJ using AlphaFold2-Multimer. The five ranked models obtained were:

RankpLDDTpTMipTM
378.70.5790.291
478.40.5740.235
577.10.5690.233
279.10.5810.219
179.40.5680.206

When we compared the new models with the wild-type complex, we can see clearly that the ipTM values were slightly lower. In the wild-type prediction, the best model showed an ipTM value of 0.373, while after mutation the highest ipTM value decreased to 0.291. Since ipTM reflects the predicted strength of interaction between two proteins, this decrease suggests that the interaction between the L-protein and DnaJ became weaker after the mutations were introduced. This reduction is consistent with the mutation strategy, where several key residues involved in hydrophobic and electrostatic contacts were replaced with alanine in order to remove their side-chain interactions.

Despite these changes, the overall structural confidence of the models (pLDDT values were 78.9 to 78.6) remained similar to the wild-type predictions, indicating that the L-protein is still likely to fold correctly. Therefore, these results suggest that the designed mutations may reduce the dependence of the L-protein on the DnaJ chaperone while maintaining a stable protein structure. This computational approach demonstrates how targeted mutagenesis combined with AlphaFold2-Multimer predictions can be used to design L-protein variants with potentially lower chaperone dependency.

in this homework, AI ChatGPT assisted me in organizing and clearly articulating my answers and descriptions, ensuring that the content is well-structured and easy to understand.