Week 4 HW: Protein Design Part I

Part B: Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.

1. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions. Briefly describe the protein you selected and why you selected it.

I chose the p53 protein, a tumor suppressor that triggers programmed cell death (apoptosis) when a cell sustains extensive DNA damage from stressors such as UV light, oxygen radicals, or chemicals. In a damaged or cancerous cell, p53 moves to the nucleus and signals the mitochondria to release reactive oxygen species or raise calcium levels. Other death factors released include cytochrome c, which activates caspases, and SMAC, which blocks survival proteins (Fogg et al., 2011). I selected this protein because mutations in it can cause cancer and because it is vital for protecting the human genome from damage.

  1. Identify the amino acid sequence of your protein. How long is it? What is the most frequent amino acid? You can use this notebook to count most frequent amino acid - https://colab.research.google.com/drive/1vlAU_Y84lb04e4Nnaf1axU8nQA6_QBP1?usp=sharing

p53 is 393 amino acids long.

The most frequent amino acid is proline (P), which appears 45 times.
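The count above came from the course notebook. As a rough illustration of the same idea, here is a minimal sketch that tallies residues with Python's `collections.Counter`. The helper name `most_frequent_residue` is my own (hypothetical), and the demo input is only the first aligned Human_P53 row from the alignment further down, not the full 393-residue sequence, so its result is not expected to match the full-length count.

```python
from collections import Counter


def most_frequent_residue(sequence: str) -> tuple:
    """Return (residue, count) for the most common amino acid letter."""
    # Keep letters only, so alignment gaps ('-') and line breaks are ignored.
    cleaned = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not cleaned:
        raise ValueError("sequence contains no residues")
    # Ties are resolved in favour of the residue seen first in the sequence.
    return Counter(cleaned).most_common(1)[0]


# Demo input: first aligned Human_P53 row, gaps included (a fragment only).
fragment = "---MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLP---SQAMDDLMLSPDDIEQWF"
residue, count = most_frequent_residue(fragment)
print(f"residues in fragment: {len(fragment.replace('-', ''))}")
print(f"most frequent in fragment: {residue} ({count} times)")
```

To reproduce the full-length answer, the complete sequence from the UniProt entry would be pasted in place of `fragment`.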

How many protein sequence homologs are there for your protein? Hint: Use the pBLAST tool to search for homologs and ClustalOmega to align and visualize them.

Using the human p53 sequence (UniProt P04637, https://www.uniprot.org/uniprotkb/P04637/entry) as the pBLAST query, I found 175 homologs.

My Clustal Omega alignment of p53 from five species is below:

CLUSTAL O(1.2.4) multiple sequence alignment


Zebrafish_P53      -------------MAQNDSQEFAELWEKN----LISIQPPGGGSCWDII-----NDEEYL	38
Frog_P53           ----MEPSSETGMDPPLSQETFEDLWSLLPDPLQTVTCR-------------LDNLSEFP	43
Human_P53          ---MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLP---SQAMDDLMLSPDDIEQWF	54
Mouse_P53          MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPS-----PHCMDDLLL-PQDVEEFF	54
Rat_P53            ---MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNPMEDLFL-PQDVAELL	56
                                    ..: *  **.                           :  :  

Zebrafish_P53      ---PGSFDPNFFG-NV-----LEEQP------QPSTLPPTSTVPETSDYPGDHGFRLRFP	83
Frog_P53           D-YPLAADMTV------LQ--------EGLMGNAVPTVTSCAVPSTDDYAGKYGLQLDFQ	88
Human_P53          TEDPGPDEAPRMPEAAPPVAPAPATPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFL	114
Mouse_P53          E---GPSEALRVSGAPAAQDPVTETPGPVAPAPATPWPLSSFVPSQKTYQGNYGFHLGFL	111
Rat_P53            E---GPEEALQVS-APAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYGFHLGFL	112
                          :                               :. **. . * *.:*::* * 

Zebrafish_P53      QSGTAKSVTCTYSPDLNKLFCQLAKTCPVQMVVDVAPPQGSVVRATAIYKKSEHVAEVVR	143
Frog_P53           QNGTAKSVTCTYSPELNKLFCQLAKTCPLLVRVESPPPRGSILRATAVYKKSEHVAEVVK	148
Human_P53          HSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVR	174
Mouse_P53          QSGTAKSVMCTYSPPLNKLFCQLVKTCPVQLWVSATPPAGSRVRAMAIYKKSQHMTEVVR	171
Rat_P53            QSGTAKSVMCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTRVRAMAIYKKSQHMTEVVR	172
                   :.****** ****  ***:****.****: : *   ** *: :** *:**:*:*::***:

Zebrafish_P53      RCPHHERTP-DGDNLAPAGHLIRVEGNQRANYREDNITLRHSVFVPYEAPQLGAEWTTVL	202
Frog_P53           RCPHHERSVEPGEDAAPPSHLMRVEGNLQAYYMEDVNSGRHSVCVPYEGPQVGTECTTVL	208
Human_P53          RCPHHERCS-DSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIH	233
Mouse_P53          RCPHHERCS-DGDGLAPPQHLIRVEGNLYPEYLEDRQTFRHSVVVPYEPPEAGSEYTTIH	230
Rat_P53            RCPHHERCS-DGDGLAPPQHLIRVEGNPYAEYLDDKQTFRHSVVVPYEPPEVGSDYTTIH	231
                   *******    .:. **  **:*****    * :*  : **** **** *: *:: **: 

Zebrafish_P53      LNYMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEESNFKKDQ	262
Frog_P53           YNYMCNSSCMGGMNRRPILTIITLETPQGLLLGRRCFEVRVCACPGRDRRTEEDNYTKKR	268
Human_P53          YNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG	293
Mouse_P53          YKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKE	290
Rat_P53            YKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKE	291
                    :***********************  .* **** .*************:***.*  *. 

Zebrafish_P53      ETKTMAKTTTGTKRSLVKESSSATLRPEGSKKAKGSSSDEEIFTLQVRGRERYEILKKLN	322
Frog_P53           GLKPS------GKRELAHPPS---SEPPLPKKRLVVDDDEEIFTLRIKGRSRYEMIKKLN	319
Human_P53          EPHHELPPGS-TKRALPNNTS---SSPQPKKK----PLDGEYFTLQIRGRERFEMFRELN	345
Mouse_P53          VLCPELPPGS-AKRALPTCTS---ASPPQKKK----PLDGEYFTLKIRGRKRFEMFRELN	342
Rat_P53            EHCPELPPGS-AKRALPTSTS---SSPQQKKK----PLDGEYFTLKIRGRERFEMFRELN	343
                               ** *    *     *   **      * * ***:::**.*:*::::**

Zebrafish_P53      DSLELSDVVPASDAEKYRQKFMTKNKKENRGSSEPKQGKKLMVKDEGRSDSD	374
Frog_P53           DALELQESLDQQKVTI--------KCRKCRDEIKPKKGKKLLVKDEQPDSE-	362
Human_P53          EALELKDAQAGKEPGGSRAHS---SHLKSKKGQSTSRHKKLMFKTEGPDSD-	393
Mouse_P53          EALELKDAHATEESGDSRAHS---SYLKTKKGQSTSRHKKTMVKKVGPDSD-	390
Rat_P53            EALELKDAHAAEESGDSRAHS---SYPKTKKGQSTSRHK-PMIKKVGPDSD-	390
                   ::***.:    ..           .  : :   . .: *  :.*    ... 
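To put a number on how similar the aligned sequences are, one option is pairwise percent identity. The sketch below is a minimal version under one stated assumption: columns with a gap in either row are skipped, and identity is the share of the remaining columns that match (other tools use different denominators, so values may differ). The function name `percent_identity` is hypothetical, and the three rows are copied from the fifth alignment block above (the highly conserved DNA-binding region), so the numbers describe only that 60-column window.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of gap-free aligned columns where both rows share a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # assumption: ignore columns with a gap in either row
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Rows copied from one block of the Clustal Omega output (60 columns each).
block = {
    "Zebrafish_P53": "LNYMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEESNFKKDQ",
    "Human_P53": "YNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG",
    "Mouse_P53": "YKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKE",
}

for name, row in block.items():
    if name != "Human_P53":
        value = percent_identity(block["Human_P53"], row)
        print(f"Human_P53 vs {name}: {value:.1f}% identity in this block")
```

Running the same function over the full concatenated rows for each species would give whole-protein identities.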

Does your protein belong to any protein family?

The protein belongs to the p53 family.

Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good quality structure? A good quality structure is one with good resolution; the smaller the value, the better (e.g., Resolution: 2.70 Å).

The structure 1TUP (https://www.rcsb.org/structure/1TUP) was solved and made public in 1995. Its resolution is 2.20 Å. Because a smaller value means finer detail, this counts as a good quality structure.
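The resolution can also be read programmatically instead of from the web page. The sketch below is hedged: the endpoint `https://data.rcsb.org/rest/v1/core/entry/{id}` and the JSON field `rcsb_entry_info.resolution_combined` reflect my understanding of the RCSB Data API and should be checked against its documentation. The function names are hypothetical, and the 2.70 Å cutoff is simply the example value from the assignment prompt, not an official threshold. The demo uses a hand-written stand-in dictionary carrying the 2.20 Å value reported above, so it runs without network access.

```python
import json
import urllib.request
from typing import Optional

# Assumed RCSB Data API endpoint; verify against the official documentation.
RCSB_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def fetch_entry(pdb_id: str) -> dict:
    """Download the entry record for a PDB ID (requires network access)."""
    url = RCSB_ENTRY_URL.format(pdb_id=pdb_id)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def best_resolution(entry: dict) -> Optional[float]:
    """Return the smallest reported resolution in angstroms, if any."""
    # Assumed field layout: {"rcsb_entry_info": {"resolution_combined": [...]}}
    values = entry.get("rcsb_entry_info", {}).get("resolution_combined") or []
    return min(values) if values else None


def is_good_quality(resolution: Optional[float], cutoff: float = 2.7) -> bool:
    """Illustrative rule only: at or below the cutoff counts as good."""
    return resolution is not None and resolution <= cutoff


# Stand-in record using the resolution quoted in this write-up for 1TUP.
sample_entry = {"rcsb_entry_info": {"resolution_combined": [2.2]}}
resolution = best_resolution(sample_entry)
print(f"resolution: {resolution} Å, good quality: {is_good_quality(resolution)}")
```

With network access, `best_resolution(fetch_entry("1TUP"))` would replace the stand-in record.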

Are there any other molecules in the solved structure apart from protein?

Yes. The structure contains DNA bound in a complex with the p53 core domain, along with zinc ions.

Does your protein belong to any structure classification family?

Open the structure of your protein in any 3D molecule visualization software: PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands) Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
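The visualization steps above can be sketched as a small PyMOL command script. The Python below only assembles the script text and prints it, so it runs without PyMOL installed; the output can be saved as a `.pml` file and loaded in PyMOL. Several choices are my own assumptions rather than part of the assignment: the function name `build_pymol_script`, the colors, the "ball and stick" approximation (sticks plus small spheres), and the split of residues into hydrophobic and hydrophilic groups, which varies between textbooks.

```python
# Assumed grouping of residues; classifications differ between sources.
HYDROPHOBIC = ["ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"]
HYDROPHILIC = ["ARG", "LYS", "ASP", "GLU", "ASN", "GLN", "HIS", "SER", "THR"]


def build_pymol_script(pdb_id: str) -> list:
    """Return PyMOL command lines covering the views asked for above."""
    return [
        f"fetch {pdb_id}",
        "# Cartoon view",
        "hide everything",
        "show cartoon",
        "# Ribbon view",
        "hide everything",
        "show ribbon",
        "# Approximate ball-and-stick view: sticks plus small spheres",
        "hide everything",
        "show sticks",
        "show spheres",
        "set sphere_scale, 0.25",
        "set stick_radius, 0.15",
        "# Color by secondary structure: helices red, sheets yellow, loops green",
        "hide everything",
        "show cartoon",
        "color red, ss h",
        "color yellow, ss s",
        "color green, ss l+''",
        "# Color by residue type: hydrophobic orange, hydrophilic cyan",
        f"select hydrophobic, resn {'+'.join(HYDROPHOBIC)}",
        f"select hydrophilic, resn {'+'.join(HYDROPHILIC)}",
        "color orange, hydrophobic",
        "color cyan, hydrophilic",
        "# Surface view, slightly transparent, to look for pockets",
        "show surface",
        "set transparency, 0.3",
    ]


script = build_pymol_script("1tup")
print("\n".join(script))
```

Each commented section corresponds to one of the questions, so the sections can be run one at a time and the resulting view inspected before moving on.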

References

Fogg, V. C., Lanning, N. J., & MacKeigan, J. P. (2011). Mitochondria in cancer: At the crossroads of life and death. Chinese Journal of Cancer, 30(8), 526–539. https://doi.org/10.5732/cjc.011.10018