Week 4: Protein Design Part I

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat?

    • For this exercise, we assume that about 20% of the meat's mass is protein; therefore, 500 g of meat contains about 100 g of protein.

    • An average amino acid has a mass of about 100 daltons (Da). Since a mass of
      1 Da per molecule corresponds to a molar mass of 1 g/mol,
      100 Da corresponds to 100 g/mol.

    $$ \frac{100\ \text{g}}{100\ \text{g/mol}} \times 6.022 \times 10^{23}\ \text{mol}^{-1} = 1\ \text{mol} \times 6.022 \times 10^{23}\ \text{mol}^{-1} = 6.022 \times 10^{23} $$

    • Therefore, 500 g of meat provides approximately
      $6.022 \times 10^{23}$ amino acid molecules (about one mole).
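The same arithmetic can be written as a small Python function. This is a sketch under the assumptions stated above (20% protein by mass, ~100 Da per amino acid, complete digestion into free amino acids); the function name and its default parameters are illustrative, not part of the assignment.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g: float, protein_fraction: float = 0.20,
                         avg_residue_da: float = 100.0) -> float:
    """Estimate how many amino acid molecules a piece of meat provides.

    Assumes all protein is fully digested into free amino acids and that an
    average amino acid has a mass of avg_residue_da daltons (i.e. g/mol).
    """
    protein_g = meat_g * protein_fraction  # grams of protein in the meat
    moles = protein_g / avg_residue_da     # g / (g/mol) = mol
    return moles * AVOGADRO                # mol * (molecules/mol)


n = amino_acid_molecules(500)
print(f"{n:.3e}")  # 6.022e+23
```

Swapping in the ~110 Da often quoted as the average residue mass, `amino_acid_molecules(500, avg_residue_da=110)`, lowers the estimate to roughly $5.5 \times 10^{23}$, so the order of magnitude does not change.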
  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Humans do not become cows or fish when eating them because dietary proteins are digested into individual amino acids and small peptides in the gastrointestinal tract. The original three-dimensional structure and biological function of these proteins are destroyed during digestion. These amino acids are then reused by our cells to synthesize new proteins according to our own DNA sequence. Biological identity is determined by genetic information encoded in DNA, not by the proteins we consume. Therefore, although we obtain amino acids from beef or fish, we use them to build human proteins, not cow or fish proteins.

  3. Why are there only 20 natural amino acids?

  4. Can you make other non-natural amino acids? Design some new amino acids.

  5. Where did amino acids come from before enzymes that make them, and before life started?

  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

  7. Can you discover additional helices in proteins?

  8. Why are most molecular helices right-handed?

  9. Why do β-sheets tend to aggregate?

  10. What is the driving force for β-sheet aggregation?

  11. Why do many amyloid diseases form β-sheets?

  12. Can you use amyloid β-sheets as materials?

  13. Design a β-sheet motif that forms a well-ordered structure.

Part B. Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

  1. Briefly describe the protein you selected and why you selected it. Thymidine phosphorylase (TP) is an enzyme that plays a critical role in the body's salvage of pyrimidine nucleosides released during DNA degradation. Its key characteristics are:
  • Enzymatic Activity: TP's function is primarily catabolic. It catalyzes the reversible phosphorolysis of thymidine into thymine and 2-deoxy-D-ribose-1-phosphate (2dDR1P), which is subsequently dephosphorylated into the sugar 2-deoxy-D-ribose (2dDR).
  • Structural Identity: TP is identical in amino acid sequence to platelet-derived endothelial cell growth factor (PD-ECGF); the two names refer to the same protein. Like any protein, it folds into a specific 3D structure that largely dictates its biological function.
  • Pro-Angiogenic Role: TP is known to stimulate angiogenesis (the formation of new blood vessels). Researchers believe this angiogenic activity is directly driven by its catalytic production of 2dDR, which acts as a pro-angiogenic byproduct.
  • Link to Cancer: Elevated TP activity has been reported in cancer patients compared with healthy controls. Because tumor growth depends heavily on angiogenesis, TP's ability to promote blood vessel formation makes it a notable factor in cancer progression.
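As a quick sanity check of the reaction described under Enzymatic Activity, the Python sketch below verifies that both steps are atom-balanced. It assumes neutral, fully protonated species (phosphate written as H3PO4) and ignores charges and the pH-dependent protonation states that apply in cells; the molecular formulas are standard, but the constant and function names are hypothetical.

```python
from collections import Counter

# Molecular formulas as element counts (neutral species; an assumption).
THYMIDINE = Counter(C=10, H=14, N=2, O=5)
PHOSPHATE = Counter(H=3, P=1, O=4)    # orthophosphoric acid, H3PO4
WATER = Counter(H=2, O=1)
THYMINE = Counter(C=5, H=6, N=2, O=2)
DR1P = Counter(C=5, H=11, O=7, P=1)   # 2-deoxy-D-ribose-1-phosphate (2dDR1P)
DR = Counter(C=5, H=10, O=4)          # 2-deoxy-D-ribose (2dDR)


def total(*species: Counter) -> Counter:
    """Sum the element counts of several molecules."""
    out = Counter()
    for s in species:
        out.update(s)  # Counter.update adds counts rather than replacing them
    return out


def is_balanced(reactants, products) -> bool:
    """True if every element appears equally often on both sides."""
    return total(*reactants) == total(*products)


# Step 1 (TP): thymidine + phosphate -> thymine + 2dDR1P
step1 = is_balanced([THYMIDINE, PHOSPHATE], [THYMINE, DR1P])
# Step 2 (dephosphorylation): 2dDR1P + water -> 2dDR + phosphate
step2 = is_balanced([DR1P, WATER], [DR, PHOSPHATE])
print(step1, step2)  # True True
```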
  2. Identify the amino acid sequence of your protein.

    • How long is it? 482 amino acids long.
    • What is the most frequent amino acid? Leucine (70 occurrences).
    • How many protein sequence homologs are there for your protein? Hint: Use UniProt's BLAST tool to search for homologs. The search returned 250 homologous proteins; since 250 is also the default maximum number of hits in UniProt BLAST, the true number of homologs may be larger.
    • Does your protein belong to any protein family? Yes. Thymidine phosphorylase belongs to the pyrimidine nucleoside phosphorylase family and is classified within glycosyltransferase family 3, as indicated by databases such as Pfam, InterPro and PANTHER.
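The length and most-frequent-residue answers can be computed directly from the FASTA sequence downloaded from UniProt. The Python sketch below assumes a single-record FASTA string; the toy sequence is hypothetical and only illustrates the call, so paste in the real TP sequence to reproduce the numbers above.

```python
from collections import Counter


def parse_fasta(text: str) -> str:
    """Return the sequence from a single-record FASTA string (header skipped)."""
    lines = [ln.strip() for ln in text.splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith(">")).upper()


def sequence_stats(seq: str) -> tuple[int, str, int]:
    """Return (length, most frequent residue, its count)."""
    residue, n = Counter(seq).most_common(1)[0]
    return len(seq), residue, n


# Hypothetical toy FASTA, for illustration only; not the TP sequence.
fasta = """>toy_example hypothetical sequence
MALLKLGAAL
LEGL
"""
length, top, n = sequence_stats(parse_fasta(fasta))
print(length, top, n)  # 14 L 6
```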
  3. Identify the structure page of your protein in the RCSB PDB.

  • When was the structure solved? Is it a good quality structure? A good-quality structure is one with good resolution, and for resolution, smaller is better (Resolution: 2.70 Å). The structures of native human thymidine phosphorylase and of its complex with 5-iodouracil were solved in 2009 by the Department of Biology and Biochemistry, University of Bath, with a good resolution of 2.50 Å.

  • Are there any other molecules in the solved structure apart from protein? Yes. Besides the protein, the structure contains the small molecule 5-iodouracil (IUR; $\mathrm{C_4H_3IN_2O_2}$), which is bound in the active site of thymidine phosphorylase.

  • Does your protein belong to any structure classification family? Yes. The protein belongs to the nucleoside phosphorylase/phosphoribosyltransferase structural superfamily.
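The ligand formula can also be turned into a molar mass, which links back to the dalton arithmetic in Part A. This Python sketch uses rounded standard atomic weights and a simple regex parser that only handles formulas without parentheses or charges; both are simplifying assumptions, and `molar_mass` is a hypothetical helper name.

```python
import re

# Rounded standard atomic weights (g/mol); only the elements needed here.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "I": 126.904}


def molar_mass(formula: str) -> float:
    """Molar mass of a simple formula such as 'C4H3IN2O2' (no parentheses)."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return total


mass = molar_mass("C4H3IN2O2")  # 5-iodouracil
print(f"{mass:.2f} g/mol")  # 237.98 g/mol
```

So 5-iodouracil weighs about 238 Da, a bit more than twice the ~100 Da assumed for an average amino acid.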

  4. Open the structure of your protein in any 3D molecule visualization software:
  • PyMOL tutorial here (hint: ChatGPT is good at PyMOL commands)
  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
  • Color the protein by secondary structure. Does it have more helices or sheets?
  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
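The helix-versus-sheet and hydrophobic-versus-hydrophilic questions can be cross-checked numerically. The Python sketch below assumes you already have a per-residue secondary-structure string in DSSP notation (H, G, I for helices; E, B for strands) and the amino acid sequence. It uses the Kyte-Doolittle hydropathy scale and treats any residue with a positive value as hydrophobic, a simple threshold chosen for illustration. A sequence-level count cannot show where residues sit in 3D, so the spatial-distribution question still needs the colored structure.

```python
# Kyte-Doolittle hydropathy values; positive = hydrophobic.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
      "R": -4.5}


def helix_vs_sheet(ss: str) -> dict[str, int]:
    """Count residues per class in a DSSP-style string (H/G/I helix, E/B strand)."""
    helix = sum(ss.count(c) for c in "HGI")
    sheet = sum(ss.count(c) for c in "EB")
    return {"helix": helix, "sheet": sheet, "other": len(ss) - helix - sheet}


def hydropathy_summary(seq: str) -> dict[str, float]:
    """GRAVY (mean hydropathy) and fraction of residues with positive KD value."""
    values = [KD[aa] for aa in seq.upper() if aa in KD]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    hydrophobic = sum(1 for v in values if v > 0)
    return {"gravy": sum(values) / len(values),
            "hydrophobic_fraction": hydrophobic / len(values)}


# Hypothetical toy inputs, for illustration only.
print(helix_vs_sheet("CCHHHHHHCCEEEECC"))  # {'helix': 6, 'sheet': 4, 'other': 6}
print(hydropathy_summary("LLKE"))
```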