Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

Original: 

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS

AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV

HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Mutated:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS

AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV

HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
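
The substitution can also be reproduced and checked in a few lines of code. This is a minimal sketch, assuming the A4V label uses mature-protein numbering (the initiator methionine is not counted), so residue 4 is the fifth character of the UniProt sequence; this matches the mutated sequence shown above. The helper `apply_point_mutation` and its `offset` parameter are illustrative names, not part of any library.

```python
# Wild-type human SOD1 as listed above (UniProt P00441, including the initiator Met).
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_point_mutation(seq, mutation, offset=0):
    """Apply a label such as 'A4V' to seq.

    offset shifts the 1-based position in the label onto the string;
    offset=1 is the assumption that the label ignores the initiator Met.
    """
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1 + offset
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + mutant + seq[index + 1:]


SOD1_A4V = apply_point_mutation(SOD1_WT, "A4V", offset=1)
print(SOD1_A4V[:10])  # MATKVVCVLK
```

The wild-type check inside the helper is what catches numbering mistakes: with `offset=0` the call fails, because the fourth character of the UniProt sequence is K rather than A.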


2. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence, add the known SOD1-binding peptide FLYRWLPSRRGG, and record the perplexity scores of your generated binders.

| Binder | Pseudo-perplexity |
| --- | --- |
| WRYPVTVLRHKX | 11.329218109003818 |
| WRYYAVVLEHWX | 14.262868870651067 |
| WRYPVAALAHGX | 7.885580275805914 |
| WLYYATGAAWKK | 15.833767982778536 |
| FLYRWLPSRRGG (known binder) | |


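The perplexity scores of my peptides are relatively low: three of the four are under 15, and the lowest is 7.89 (WRYPVAALAHGX). A lower pseudo-perplexity means the model is more confident in the generated sequence, so these values indicate reasonable confidence in the samples.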
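
For context, pseudo-perplexity is usually defined as the exponential of the average negative log-likelihood that a masked language model assigns to each residue when that residue is masked out. The sketch below shows only that arithmetic. It assumes the per-residue log-probabilities have already been extracted from the model; the example values are made up for illustration and are not PepMLM output.

```python
import math


def pseudo_perplexity(log_probs):
    """Return exp(-mean(log p)) over the masked positions of one peptide."""
    if not log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical natural-log probabilities of the true residue at four positions.
example_log_probs = [math.log(p) for p in (0.20, 0.05, 0.10, 0.08)]
print(round(pseudo_perplexity(example_log_probs), 2))

# Reference point: guessing uniformly among the 20 amino acids.
uniform_log_probs = [math.log(1 / 20)] * 12
print(round(pseudo_perplexity(uniform_log_probs), 2))
```

A model that guesses uniformly among the 20 standard amino acids scores exactly 20, which is a useful reference point when reading the values in the table.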

3. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

| Binder | ipTM score | Binding description |
| --- | --- | --- |
| WRYPVTVLRHKA | 0.46 | The peptide appears to bind along the length of the beta-barrel, roughly a third of the barrel's diameter away from the N-terminus and the dimer interface. It appears to be surface-bound. |
| WRYYAVVLEHWA | 0.19 | The peptide appears to bind superficially at the point furthest from the beta-barrel, far from the N-terminus. One end could perhaps be described as close to the dimer interface. |
| WRYPVAALAHGA | 0.36 | The peptide appears to bind far from the N-terminus and the dimer interface. It approaches the beta-barrel region somewhat but still keeps a relatively large distance from it. It appears to be surface-bound. |
| WLYYATGAAWKK | 0.26 | The peptide appears to bind superficially at the edge of the beta-barrel, far from both the N-terminus and the dimer interface. |
| FLYRWLPSRRGG (known binder) | 0.31 | The peptide appears to be partially buried at the C-terminus. This terminus appears to be the part closest to the beta-barrel, although overall there seems to be very little interaction with that structure. It is far from both the N-terminus and the dimer interface. |

4. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

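All of my ipTM scores are low, ranging from 0.19 to 0.46, which is below the level usually taken to indicate a confident interface prediction (ipTM values under about 0.6 are commonly treated as unreliable). However, two of my PepMLM-generated peptides do exceed the known binder's score of 0.31: WRYPVAALAHGA (0.36) and WRYPVTVLRHKA (0.46).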
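
The comparison with the known binder can be made explicit with a short script. This is a small sketch that uses only the ipTM values recorded in the table above; the variable names are illustrative.

```python
# ipTM scores of the PepMLM-generated peptides, as recorded above.
generated_iptm = {
    "WRYPVTVLRHKA": 0.46,
    "WRYYAVVLEHWA": 0.19,
    "WRYPVAALAHGA": 0.36,
    "WLYYATGAAWKK": 0.26,
}

# Known SOD1-binding peptide FLYRWLPSRRGG.
known_binder_iptm = 0.31

# Rank from best to worst, then keep those that match or exceed the known binder.
ranked = sorted(generated_iptm.items(), key=lambda item: item[1], reverse=True)
at_least_as_good = [
    peptide for peptide, score in ranked if score >= known_binder_iptm
]

print(at_least_as_good)  # ['WRYPVTVLRHKA', 'WRYPVAALAHGA']
```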

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse



Based on the property predictions from PeptiVerse, all of the peptides are water-soluble and non-hemolytic. Both properties are essential for any potential application, since an insoluble or hemolytic peptide would be unusable or harmful.

All peptides are predicted to be weak binders, which matches the low ipTM scores of the AlphaFold models. This weak binding was also clearly seen in the visualisation of the peptides' binding to the target protein in the AlphaFold models.

The molecular weights of all peptides range between 1311 and 1592 Da, which is low. This could indicate a lower tendency to aggregate, which would be favourable.
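
The reported weights can be sanity-checked by summing average residue masses and adding one water molecule for the free termini. The sketch below assumes unmodified peptides and the alanine-terminated sequences used for the AlphaFold runs; PeptiVerse may compute its values differently, so small deviations are possible. The function name `molecular_weight` is illustrative.

```python
# Average masses (Da) of amino acid residues, i.e. the free amino acid minus water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight(peptide):
    """Average molecular weight in Da of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


peptides = [
    "WRYPVTVLRHKA",
    "WRYYAVVLEHWA",
    "WRYPVAALAHGA",
    "WLYYATGAAWKK",
    "FLYRWLPSRRGG",
]
for peptide in peptides:
    print(peptide, round(molecular_weight(peptide), 1))
```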

The SOD1 protein is negatively charged (-6 per monomer) at physiological pH (7.4) (Shi, 2014). The binders have a small positive charge, with the exception of WRYYAVVLEHWA, which has a very small negative charge of -0.15. A strongly positively charged binder would be more prone to non-specific binding to the protein and would be more toxic, so it is good that the binders are only slightly positively charged.
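
A peptide's net charge at a given pH can be estimated with the Henderson-Hasselbalch equation by summing the fractional charge of every ionisable group. The sketch below uses one assumed set of pKa values; published sets differ by a few tenths of a unit, so the output is not expected to reproduce the PeptiVerse numbers exactly. The function name `net_charge` is illustrative.

```python
# Assumed pKa values; other published sets differ slightly.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}             # basic side chains
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}    # acidic side chains
PKA_N_TERMINUS = 8.6
PKA_C_TERMINUS = 3.6


def net_charge(peptide, ph=7.4):
    """Estimate the net charge of an unmodified linear peptide at the given pH."""
    # Fraction of a basic group that is protonated (+1): 1 / (1 + 10^(pH - pKa)).
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    # Fraction of an acidic group that is deprotonated (-1): 1 / (1 + 10^(pKa - pH)).
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for aa in peptide:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


for peptide in ("WRYPVTVLRHKA", "WRYYAVVLEHWA", "WRYPVAALAHGA", "WLYYATGAAWKK"):
    print(peptide, round(net_charge(peptide), 2))
```

Because WRYYAVVLEHWA carries one arginine and one glutamate, its estimate sits close to zero, and the sign can flip depending on which pKa set is used.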

WRYPVTVLRHKA is the peptide that I would advance: it has the highest ipTM score (0.46) and binds along the length of the beta-barrel. It also has a very low hemolytic score of 0.038. Its predicted binding is relatively weak, but so is that of all the other options.

Reference:

Shi Y, Abdolvahabi A, Shaw BF. Protein charge ladders reveal that the net charge of ALS-linked superoxide dismutase can be different in sign and magnitude from predicted values. Protein Sci. 2014 Oct;23(10):1417-33. doi: 10.1002/pro.2526. Epub 2014 Aug 7. PMID: 25052939; PMCID: PMC4287002.

Part C: Final Project: L-Protein Mutants

Information about Lysis protein in Bacteriophage MS2

Amino acid sequence of Lysis

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

DNA Sequence of Lysis

atggaaacccgctttccgcagcagagccagcagaccccggcgagcaccaaccgccgccgcccgtttaaacatgaagattatccgtgccgccgccagcagcgcagcagcaccctgtatgtgctgatttttctggcgatttttctgagcaaatttaccaaccagctgctgctgagcctgctggaagcggtgattcgcaccgtgaccaccctgcagcagctgctgacc

DnaJ Sequence

     tcgacgctga atttgaagaa gtcaaagaca aaaaataatc gccctataaa cgggtaatta         60

     tactgacacg ggcgaagggg aatttcctct ccgcccgtgc attcatctag gggcaattta        120

     aaaaagatgg ctaagcaaga ttattacgag attttaggcg tttccaaaac agcggaagag     180

     cgtgaaatca gaaaggccta caaacgcctg gccatgaaat accacccgga ccgtaaccag        240

     ggtgacaaag aggccgaggc gaaatttaaa gagatcaagg aagcttatga agttctgacc        300

     gactcgcaaa aacgtgcggc atacgatcag tatggtcatg ctgcgtttga gcaaggtggc        360

     atgggcggcg gcggttttgg cggcggcgca gacttcagcg atatttttgg tgacgttttc         420

     ggcgatattt ttggcggcgg acgtggtcgt caacgtgcgg cgcgcggtgc tgatttacgc        480

     tataacatgg agctcaccct cgaagaagct gtacgtggcg tgaccaaaga gatccgcatt        540

     ccgactctgg aagagtgtga cgtttgccac ggtagcggtg caaaaccagg tacacagccg        600

     cagacttgtc cgacctgtca tggttctggt caggtgcaga tgcgccaggg attcttcgct         660

     gtacagcaga cctgtccaca ctgtcagggc cgcggtacgc tgatcaaaga tccgtgcaac        720

     aaatgtcatg gtcatggtcg tgttgagcgc agcaaaacgc tgtccgttaa aatcccggca        780

     ggggtggaca ctggagaccg catccgtctt gcgggcgaag gtgaagcggg cgagcatggc       840

     gcaccggcag gcgatctgta cgttcaggtt caggttaaac agcacccgat tttcgagcgt        900

     gaaggcaaca acctgtattg cgaagtcccg atcaacttcg ctatggcggc gctgggtggc        960

     gaaatcgaag taccgaccct tgatggtcgc gtcaaactga aagtgcctgg cgaaacccag       1020

     accggtaagc tattccgtat gcgcggtaaa ggcgtcaagt ctgtccgcgg tggcgcacag       1080

     ggtgatttgc tgtgccgcgt tgtcgtcgaa acaccggtag gcctgaacga aaggcagaaa      1140

     cagctgctgc aagagctgca agaaagcttc ggtggcccaa ccggcgagca caacagcccg       1200

     cgctcaaaga gcttctttga tggtgtgaag aagttttttg acgacctgac ccgctaacct        1260

     ccccaaaagc ctgcccgtgg gcaggcctgg gtaaaaatag ggtgcgttga agatatgcga       1320

     gcacctgtaa agtggcgggg atcactccca taagcgct                                 1358

Conserved sites


The provided article, "Mutational analysis of the MS2 lysis protein L" (Microbiology), describes how positions 48 and 49 (Leu and Ser) are consistently conserved across homologs. In the central domain, positions 38, 39 and 40 (Leu, Tyr and Val) are highly conserved across most homologs (Chamakura, 2017).


Known mutational effects from research. 

[Figure: overview of the mutational analysis of MS2 L, from Chamakura (2017)]

The figure above, taken from the aforementioned paper, gives an overview of the mutational analysis of MS2 L. It shows that certain missense mutations in the second domain do not affect lytic function (indicated below the L sequence). In the second, third and fourth domains, numerous mutations did negatively affect lysis function but did not result in protein accumulation (indicated above the L sequence). Green asterisks mark positions where a single-nucleotide change would produce a nonsense mutation.


References:

Chamakura, K. R., Edwards, G. B., & Young, R. (2017). Mutational analysis of the MS2 lysis protein L. Microbiology (Reading, England), 163(7), 961–969. https://doi.org/10.1099/mic.0.000485



My approach to selecting sequence variants is to cross-reference the provided experimental data on L-protein mutants with the point mutations, and their accompanying scores, generated by the provided protein language models.

As the assignment dictates, two mutations should be located in the soluble N-terminal domain (positions 1-39) and two in the transmembrane domain (positions 40-75).
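
The domain boundaries above translate directly into a small lookup. This is a minimal sketch, assuming mutation labels of the form wild-type residue, 1-based position, mutant residue; the labels in the loop are the candidates considered in this write-up, and the function names are illustrative.

```python
import re

# Domain boundaries as stated in the assignment (1-based, inclusive).
SOLUBLE_N_TERMINAL = range(1, 40)   # positions 1-39
TRANSMEMBRANE = range(40, 76)       # positions 40-75


def parse_mutation(label):
    """Split a label such as 'K50L' into ('K', 50, 'L')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def domain_of(label):
    """Return the L-protein domain in which the mutated position lies."""
    _, position, _ = parse_mutation(label)
    if position in SOLUBLE_N_TERMINAL:
        return "soluble N-terminal"
    if position in TRANSMEMBRANE:
        return "transmembrane"
    raise ValueError(f"position {position} is outside the 75-residue L protein")


for label in ("S9Q", "C29L", "Y39L", "K50L", "N53L"):
    print(label, "->", domain_of(label))
```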


Top 10 of colab 

[Image: top 10 mutation list from the Colab notebook]

When cross-referencing this list with the original experimental mutations, two mutations, namely C29R and K50I, appeared to result in neither lysis nor protein formation. They are therefore excluded from the potential mutation candidates.


This leaves the following mutations: 

K50L - While this mutation has the highest LLR score, lysine (K) and leucine (L) are structurally very different residues: lysine is positively charged, whereas leucine is hydrophobic. This could lead to problems with protein formation. However, because of its high score it should still be considered.

Y39L - Tyrosine and leucine

S9Q - Serine and glutamine

C29Q - Cysteine and glutamine

C29P - Cysteine and proline

C29L - Cysteine and leucine

N53L - Asparagine and leucine
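
For reference, the LLR (log-likelihood ratio) that protein language models typically report for a substitution is the log-probability of the mutant residue minus that of the wild-type residue at the same position, so a positive value means the model prefers the mutant. The sketch below shows only that arithmetic, assuming the notebook follows this usual definition; the probabilities are made up for illustration and are not output from the Colab notebook.

```python
import math


def log_likelihood_ratio(p_mutant, p_wild_type):
    """LLR = log p(mutant) - log p(wild type) at a single position."""
    return math.log(p_mutant) - math.log(p_wild_type)


# Hypothetical model probabilities at one position whose wild-type residue is K.
wild_type = "K"
position_probs = {"K": 0.05, "L": 0.30, "I": 0.10, "D": 0.01}

scores = {
    aa: log_likelihood_ratio(p, position_probs[wild_type])
    for aa, p in position_probs.items()
    if aa != wild_type
}

# Print substitutions from most to least favoured by the (hypothetical) model.
for aa, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{wild_type}->{aa}: {score:+.2f}")
```

A high LLR only says that the model finds the substitution plausible in its sequence context. It does not measure lysis activity, which is why the scores are cross-checked against the experimental data.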

I asked Perplexity (the AI search tool) to compare the structural properties of the wild-type amino acids with those of their replacements.

| Mutation | Amino Acids | Structural Similarity? | Reason |
| --- | --- | --- | --- |
| Y39L | Tyrosine → Leucine | No | Aromatic ring vs. branched alkyl chain; different bulk and polarity. |
| S9Q | Serine → Glutamine | Low | Short hydroxyl vs. longer amide; size/volume mismatch despite some polarity. |
| C29Q | Cysteine → Glutamine | No | Thiol (-SH) vs. amide; polarity similar but chain length and reactivity differ. |
| C29P | Cysteine → Proline | No | Linear thiol vs. rigid cyclic; proline disrupts helices unlike cysteine. |
| C29L | Cysteine → Leucine | No | Small polar thiol vs. large hydrophobic branch; major size/chemistry shift. |
| N53L | Asparagine → Leucine | No | Polar amide vs. hydrophobic branch; polarity and H-bonding lost. |


However, I will continue the assignment with the following five mutations:

K50L

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT


N53L

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT



S9Q

METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT





Y39L 

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT




C29L

METRFPQQSQQTPASTNRRRPFKHEDYPLRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT



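
The five mutant sequences above can be regenerated from the wild-type L protein, which also guards against copy-and-paste errors. This is a minimal sketch; the helper `mutate` is an illustrative name, and positions are 1-based and counted from the initiator methionine, as in the labels used in this write-up.

```python
# Wild-type MS2 L protein (75 residues), as listed earlier in this document.
L_WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)


def mutate(seq, label):
    """Apply a point mutation such as 'K50L' (1-based position) to seq."""
    wild_type, position, mutant = label[0], int(label[1:-1]), label[-1]
    if seq[position - 1] != wild_type:
        raise ValueError(
            f"{label}: expected {wild_type} at position {position}, "
            f"found {seq[position - 1]}"
        )
    return seq[:position - 1] + mutant + seq[position:]


mutants = {
    label: mutate(L_WILD_TYPE, label)
    for label in ("K50L", "N53L", "S9Q", "Y39L", "C29L")
}
for label, sequence in mutants.items():
    print(label)
    print(sequence)
```

The wild-type check raises an error if a label does not match the residue actually present at that position, so a mislabelled mutation fails loudly instead of silently producing a wrong sequence.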