Week 4: Protein Design 1

Class Assignment

Part A.

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

500 grams is approximately 3.01e26 Daltons (1 gram ≈ 6.022e23 Daltons). At ~100 Daltons per amino acid, that converts to about 3.01e24 molecules of amino acids.
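The arithmetic can be checked with a short sketch. It assumes, as the question does, an average amino acid mass of 100 Daltons, and it treats the full 500 g as amino acids, which is a simplification. The function and constant names are illustrative only.

```python
# 1 gram corresponds to about 6.022e23 Daltons (Avogadro's number).
DALTONS_PER_GRAM = 6.022e23

def amino_acid_count(mass_grams, avg_residue_mass_da=100.0):
    """Estimate how many amino acid molecules make up a given mass.

    Assumption: the entire mass consists of amino acids of average mass
    avg_residue_mass_da (100 Da, as stated in the question).
    """
    total_daltons = mass_grams * DALTONS_PER_GRAM
    return total_daltons / avg_residue_mass_da

count = amino_acid_count(500)
print(f"{count:.2e}")  # 3.01e+24
```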

  1. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

The proteins you eat are digested into individual amino acids, which your cells then reassemble into human proteins according to your own genome. The fish/cow DNA in the food is likewise broken down rather than taken up into your cells (let alone their nuclei), so it is never read by the human transcription/translation machinery to produce fish/cow proteins. Furthermore, species-specific promoters and other regulatory sequences mean that the human machinery would not ‘read’ this foreign DNA as if it were human DNA.

  1. Why are there only 20 natural amino acids?

One leading hypothesis is that evolution settled on this number because it provides enough chemical variety for protein diversity without including amino acids that are unstable.

  1. Can you make other non-natural amino acids? Design some new amino acids.

Absolutely! There is a lot of existing research on producing non-natural amino acids for applications in drug therapeutics and molecular imaging (such as photo-reactive amino acids).

  1. Where did amino acids come from before enzymes that make them, and before life started?

Amino acids were first formed from random interactions in the ‘primordial soup’ of water, inorganic gases, and energy sources such as heat.

  1. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

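We would expect a left-handed helix, because a peptide built from D-amino acids is the mirror image of the same peptide built from L-amino acids, and mirroring reverses handedness.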
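The mirror-image argument can be illustrated with a small geometric sketch. It builds an idealized Cα trace of a right-handed helix, reflects it, and reads the handedness from the sign of a triple product. The helix parameters (100° turn and 1.5 Å rise per residue, 2.3 Å radius) are approximate textbook values used here as assumptions, and all function names are hypothetical.

```python
import math

def helix_trace(n_points, turn_deg=100.0, rise=1.5, radius=2.3):
    """Idealized C-alpha trace of a right-handed helix along the z axis.

    Assumption: approximate alpha-helix geometry, not fitted to any structure.
    """
    points = []
    for i in range(n_points):
        angle = math.radians(turn_deg * i)
        points.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return points

def mirror(points):
    """Reflect through the yz-plane (x -> -x), mimicking an L-to-D swap."""
    return [(-x, y, z) for x, y, z in points]

def handedness(points):
    """Classify by the sign of b1 . (b2 x b3) for three consecutive steps."""
    b1, b2, b3 = [
        tuple(q - p for p, q in zip(points[i], points[i + 1])) for i in range(3)
    ]
    cross = (
        b2[1] * b3[2] - b2[2] * b3[1],
        b2[2] * b3[0] - b2[0] * b3[2],
        b2[0] * b3[1] - b2[1] * b3[0],
    )
    triple = sum(a * c for a, c in zip(b1, cross))
    return "right" if triple > 0 else "left"

l_helix = helix_trace(8)   # stands in for a helix of L-amino acids
d_helix = mirror(l_helix)  # its mirror image, as for D-amino acids
print(handedness(l_helix), handedness(d_helix))  # right left
```

Reflection flips the sign of the triple product, which is why the mirrored trace is classified as left-handed regardless of the exact helix parameters chosen.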

  1. Why are most molecular helices right-handed?

In nature, amino acids exist in the L configuration, which results in right-handed helices due to steric constraints on stability. There are growing efforts to make D-amino acids (their mirror images). These ‘mirror life’ efforts have been controversial, however, with many scientists signing a statement about the risks of mirror life in December 2024.

  1. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

The backbone of a β-sheet can form hydrogen bonds with the backbones of other β-sheets, leading to aggregation. Stacked β-sheets have greater thermodynamic stability (a lower energy state), so this aggregation is favored.

  1. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

When proteins misfold in amyloid diseases, the β-sheet stacking structure is favored and builds up into the large aggregates known as amyloid fibers. These amyloid β-sheet structures are very strong, and so have been employed as materials for tissue engineering and even environmental filtering!

Part B.

  1. Briefly describe the protein you selected and why you selected it.

I selected the mCherry protein as I plan to use it as a ‘proof of concept’ reporter gene to see whether my integrative phage satellite can be used to engineer M. aichiense.

  1. Identify the amino acid sequence of your protein.

MVSKGEEDNM AIIKEFMRFK VHMEGSVNGH EFEIEGEGEG RPYEGTQTAK LKVTKGGPLP FAWDILSPQF MYGSKAYVKH PADIPDYLKL SFPEGFKWER VMNFEDGGVV TVTQDSSLQD GEFIYKVKLR GTNFPSDGPV MQKKTMGWEA SSERMYPEDG ALKGEIKQRL KLKDGGHYDA EVKTTYKAKK PVQLPGAYNV NIKLDITSHN EDYTIVEQYE RAEGRHSTGG MDELYK

  1. How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

It is 236 amino acids long and the most frequent amino acid is Glycine (25 residues).
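A minimal sketch of the counting step, using Python's standard `collections.Counter` rather than the Colab notebook mentioned in the question. The sequence is copied from the answer above, with the spaces between blocks stripped; variable names are illustrative.

```python
from collections import Counter

# mCherry sequence as listed above; whitespace between blocks is removed below.
raw_sequence = """
MVSKGEEDNM AIIKEFMRFK VHMEGSVNGH EFEIEGEGEG RPYEGTQTAK LKVTKGGPLP
FAWDILSPQF MYGSKAYVKH PADIPDYLKL SFPEGFKWER VMNFEDGGVV TVTQDSSLQD
GEFIYKVKLR GTNFPSDGPV MQKKTMGWEA SSERMYPEDG ALKGEIKQRL KLKDGGHYDA
EVKTTYKAKK PVQLPGAYNV NIKLDITSHN EDYTIVEQYE RAEGRHSTGG MDELYK
"""
sequence = "".join(raw_sequence.split())

# Tally each one-letter residue code and pick the most frequent one.
counts = Counter(sequence)
length = len(sequence)
residue, n = counts.most_common(1)[0]

print(length, residue, n)  # 236 G 25
```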

  1. How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

UniProt reports two similar proteins at 100% identity (both uncharacterized, from Pseudotamlana agarivorans and Purpureocillium lilacinum, also known as Paecilomyces lilacinus), but no official homologs.

  1. Does your protein belong to any protein family?

Yes, the Pfam family PF01353, which also includes many green fluorescent proteins.

  1. Identify the structure page of your protein in RCSB

The page can be found here: https://www.rcsb.org/3d-view/2H5Q/1

  1. When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better.

The crystal structure I found was published in 2006 at 1.36 Å resolution, which is high resolution and therefore a good quality structure.

  1. Are there any other molecules in the solved structure apart from protein?

There are some water molecules and one small ‘non-standard’ molecule, CH6 (the chromophore), at the center of the protein; otherwise the solved structure is just the protein.

  1. Does your protein belong to any structure classification family?

Its immediate family is the mFruits family of other red fluorescent proteins.

  1. Open the structure of your protein in any 3D molecule visualization software. Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

I used the RCSB visualization software.

Cartoon visualization: [cartoon image]

Ribbon visualization (in RCSB, this is called the ‘Backbone’ representation type): [backbone image]

Ball and stick visualization: [ball image]

  1. Color the protein by secondary structure. Does it have more helices or sheets?

[secondary image]

The protein appears to have more sheets (colored in yellow).

  1. Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

[hydrophobic image]

Hydrophilic residues are colored in orange and hydrophobic residues in green. The protein appears to have a fairly even distribution of the two types, which tend to alternate.
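A rough sequence-level check of the hydrophobic/hydrophilic mix is sketched below. It is not what the RCSB viewer computes: the set of residues treated as hydrophobic is one common grouping chosen here as an assumption, the helper names are hypothetical, and only the first 40 residues of the sequence are used to keep the example short.

```python
# Assumption: one common (not universal) grouping of hydrophobic side chains.
HYDROPHOBIC = set("AVILMFWC")

def hydropathy_pattern(seq):
    """Mark each residue 'h' (hydrophobic) or 'p' (polar/other)."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq)

def hydrophobic_fraction(seq):
    """Fraction of residues that fall in the HYDROPHOBIC set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)

# First 40 residues of the mCherry sequence given earlier.
fragment = "MVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEG"
pattern = hydropathy_pattern(fragment)

print(fragment)
print(pattern)
print(f"hydrophobic fraction: {hydrophobic_fraction(fragment):.2f}")
```

Printing the pattern under the sequence makes stretches of alternating `h` and `p` easy to spot, though the 3D view remains the better guide to where residues sit relative to the surface.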

  1. Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

[Surface image]

The image has been rotated to highlight the deepest hole observed in the surface of the protein, but I would not say there is a clearly apparent large binding pocket.

Part C.