Week 4 HW: Protein Engineering Part One

Protein Design Part One 

Bioinformatics tools, including protein structure prediction, generative models, protein sequence recovery, and visualization tools, are essential in protein engineering: they help us design sequences, predict structures, and understand molecular interactions at scale, with increasing precision and efficacy. However, these tools are imperfect, because they cannot fully capture how protein-related phenomena play out in the real, far more complicated world. To complicate things further, some functions (enzymatic activity, ligand binding) are relatively easy to measure experimentally, but for novel or poorly characterized proteins we often lack accessible assays or other means of evaluation. Determining the 3D structure of a protein is also a significant hurdle. An effective protein engineering pipeline must take all of this into account, so computational approaches should be chosen based on the requirements of the target protein and its desired function. Building an effective pipeline requires clarity and strategy.

This week’s objectives are to learn the concepts of amino acid structure and 3D protein visualization, and to work with a variety of machine learning-based design tools. We will then, hopefully, brainstorm as a group how to apply this week’s tools to engineer a better bacteriophage.

Part A. Conceptual Questions

We were to choose nine of the following questions from a list of eleven provided by Shuguang Zhang. I have presented them all below in the hopes that I will be able to find free time to come back and finish answering them all! 

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

    1. According to Google, 500 grams of meat contains about 130 g of protein. Since 100 Daltons corresponds to 100 g/mol, the number of moles is 130 g / (100 g/mol) = 1.3 mol of amino acids. At 6.022 × 10^23 molecules per mole, that is about 7.8 × 10^23 molecules.
  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

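    1. Our digestive system breaks dietary proteins down into amino acids and other nutrients. Our cells then use these to build our own cells according to our own genome; they do not replicate the organism that was eaten.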
  3. Why are there only 20 natural amino acids?

    1. In a word, evolution. This “canonical set” of amino acids was established billions of years ago, during the RNA world, and was selected to enable the formation of soluble structures with closely packed cores. This in turn allowed ordered binding pockets to form (Doig, 2017).
  4. Can you make other non-natural amino acids? Design some new amino acids.

    1. We can make non-natural amino acids by keeping the same basic amino acid backbone and chemically changing the side chain to give it new properties. One can also use enzymes or engineered cells to produce novel amino acids. There are other methods, but these two seem most common. We could use this knowledge to design amino acids that change how proteins behave in subtle ways. One example would be a metal-binding amino acid with a side chain designed to hold onto iron or copper more tightly than natural amino acids can. This could allow scientists to build proteins that carry out new types of chemical reactions or better control electron transfer, which is important in metabolism and energy production. We could also hypothetically create a pH-sensitive amino acid whose charge changes very sharply away from normal body pH. Adding this to a protein could make the protein switch shape or activity depending on its environment, such as becoming active in acidic conditions but inactive at neutral pH.
  5. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

    1. If I make an α-helix using D-amino acids, I would expect it to be left-handed. In proteins built from L-amino acids, α-helices are right-handed. With D-amino acids, the most stable helix is the mirror image, which twists in the left-handed direction.
  6. Can you discover additional helices in proteins?

    1. Indeed. We can discover additional helices in proteins by studying their three-dimensional structures. Scientists figure out the shape of proteins using techniques like X-ray crystallography or cryo-electron microscopy, and then look for repeating spiral patterns in the backbone of the protein. If a section of the protein consistently forms a spiral that is different from a typical alpha helix, it can be identified as another type of helix; thus, new helices are found by observing regular structural patterns in real proteins.
  7. Why are most molecular helices right-handed?

    1. Most molecular helices are right-handed because almost all amino acids are L-amino acids, which typically form right-handed alpha helices for the sake of stability. Thus, right-handed helices became the most common and energetically favorable orientation in nature.
  8. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

    1. β-sheets tend to aggregate because they are very stable and their backbone hydrogen bonds readily pair with strands from other molecules, so the proteins stick together. When certain proteins misfold, their backbone hydrogen bonds line up in a way that creates long, repeating β-sheets that stack on top of each other, and these stacks are very hard for our bodies to break down. This is how we get the signature plaques of Alzheimer’s disease, which I studied.
  9. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

    1. Please see above, and yes you can. They would be ideal for biological scaffolds, for example, an application long studied (Nowick, 1995).

  10. Design a β-sheet motif that forms a well-ordered structure.
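
Returning to question 1, the estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 130 g protein figure and the ~100 Da average residue mass are the assumptions stated in that answer, and the function name is made up for illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole, the value used in the answer


def amino_acid_molecules(protein_grams: float,
                         residue_mass_g_per_mol: float = 100.0) -> float:
    """Estimate the number of amino acid molecules in a given mass of protein."""
    moles = protein_grams / residue_mass_g_per_mol
    return moles * AVOGADRO


# 130 g of protein per 500 g of meat is the figure assumed in the answer to question 1.
estimate = amino_acid_molecules(130.0)
print(f"{estimate:.2e} amino acid molecules")
```

Changing the protein content (different cuts of meat vary) only rescales the result linearly, so the order of magnitude stays around 10^23 to 10^24.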

Part B: Protein Analysis and Visualization

Briefly describe the protein you selected and why you selected it. For my protein, I’ve chosen human interleukin-6 (IL-6). I studied this biomarker extensively in my exploratory work for Cognito Therapeutics and am simply fascinated by its role in Alzheimer’s Disease and inflammation in general. 

Identify the amino acid sequence of your protein.

MAPVPPGEDSKDVAAPHRQPLTSSERIDKQIRYILDGISALRKETCNKSNMCESSKEALAENNLNLPKMAEKDGCFQSGFNEETCLVKIITGLLEFEVYLEYLQNRFESSEEQARAVQMSTKVLIQFLQKKAKNLDAITTPDPTTNASLLTKLQAQNQWLQDMTTHLILRSFKEFLQSSLRALRQM

  • How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids. The sequence length of IL-6 is 186. The most common amino acid is leucine (L).

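  • How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs. The BLAST tool was very useful. The search returned 250 hits; since that may simply be the tool’s cap on reported hits, the true number of homologs could be larger.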

  • Does your protein belong to any protein family? IL-6 is part of the IL-6 cytokine family, a group of signaling proteins involved in immune responses and inflammation that all share a receptor component called gp130, which transmits signals inside cells. Other members include Interleukin-11 (also studied in Alzheimer’s Disease), Leukemia inhibitory factor, and Oncostatin M. They all help regulate immune function, inflammation, and cell growth.
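
The length and most frequent residue reported above can be checked without the Colab notebook. The snippet below is a minimal sketch using only the Python standard library; the variable names are my own, and the sequence is the one pasted earlier in this section.

```python
from collections import Counter

# IL-6 sequence as pasted in this write-up.
IL6_SEQ = "MAPVPPGEDSKDVAAPHRQPLTSSERIDKQIRYILDGISALRKETCNKSNMCESSKEALAENNLNLPKMAEKDGCFQSGFNEETCLVKIITGLLEFEVYLEYLQNRFESSEEQARAVQMSTKVLIQFLQKKAKNLDAITTPDPTTNASLLTKLQAQNQWLQDMTTHLILRSFKEFLQSSLRALRQM"

counts = Counter(IL6_SEQ)          # residue letter -> number of occurrences
length = len(IL6_SEQ)
most_common_residue, most_common_count = counts.most_common(1)[0]

print(f"length: {length}")
print(f"most frequent residue: {most_common_residue} ({most_common_count} occurrences)")

# Full composition, most frequent first.
for residue, n in counts.most_common():
    print(f"{residue}: {n} ({100 * n / length:.1f}%)")
```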

Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good quality structure? A good quality structure is one with good resolution; the smaller the value, the better (e.g., Resolution: 2.70 Å).

  1. The structure was solved in 1997. Yes, it has very good resolution (Resolution: 1.90 Å).

    • Are there any other molecules in the solved structure apart from protein?

      1. Yes, they are carbohydrate groups - it is a glycoprotein. 
    • Does your protein belong to any structure classification family?

      1. Yes, interleukin-6.
  2. Open the structure of your protein in any 3D molecule visualization software:

I downloaded PyMOL, went to File > Open, and chose the .pdb file I downloaded from RCSB.

  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

For my first visualization I used my browser instead of typed commands. Relevant commands are captured in each screenshot.

Screenshots: Cartoon, Ribbon, Ball and stick

Color the protein by secondary structure. Does it have more helices or sheets? Helices are red here, loops are green, and if there were sheets, they would be yellow. It has more helices.
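
For anyone who prefers typed commands to the menus, the views and the secondary-structure coloring described above could be scripted. The sketch below only builds the text of a PyMOL script; it does not need PyMOL installed to run. The file name `il6.pdb`, the function name, and the sphere and stick sizes are assumptions for illustration, and the colors follow the scheme used here (helices red, sheets yellow, loops green).

```python
# Representation name -> PyMOL commands that display it.
REPRESENTATIONS = {
    "cartoon": ["show cartoon"],
    "ribbon": ["show ribbon"],
    # Ball and stick approximated with thin sticks plus small spheres (assumed sizes).
    "ball_and_stick": [
        "show sticks",
        "show spheres",
        "set stick_radius, 0.15",
        "set sphere_scale, 0.25",
    ],
}

# (color, selection) pairs: helix, sheet, and loop/unassigned residues.
SS_COLORS = [("red", "ss h"), ("yellow", "ss s"), ("green", "ss l+''")]


def build_pymol_script(structure_file: str, representation: str = "cartoon") -> str:
    """Return PyMOL script text for one representation, colored by secondary structure."""
    if representation not in REPRESENTATIONS:
        raise ValueError(f"unknown representation: {representation}")
    lines = [f"load {structure_file}", "hide everything"]
    lines += REPRESENTATIONS[representation]
    lines += [f"color {color}, {selection}" for color, selection in SS_COLORS]
    return "\n".join(lines) + "\n"


# "il6.pdb" is a hypothetical file name for the structure downloaded from RCSB.
print(build_pymol_script("il6.pdb", "cartoon"))
```

Saving the printed text to a file such as `views.pml` and running it from the PyMOL command line with `@views.pml` should give a comparable picture, though I have not compared it against the screenshots.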

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

  • Red: Acidic residues, Hydrophilic
  • Blue: Basic residues, Hydrophilic
  • Green: Polar uncharged residues, Hydrophilic
  • Gray: Hydrophobic

It appears there are more hydrophobic than hydrophilic residues.
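
As a rough cross-check on the visual impression, the residues can also be tallied by class directly from the sequence. This is a sketch under an assumed grouping that mirrors the color legend above; other schemes place G, P, C, and Y differently, and a sequence-level count says nothing about which residues are buried or exposed, so it need not match what the rendering suggests.

```python
from collections import Counter

# IL-6 sequence as pasted in this write-up.
IL6_SEQ = "MAPVPPGEDSKDVAAPHRQPLTSSERIDKQIRYILDGISALRKETCNKSNMCESSKEALAENNLNLPKMAEKDGCFQSGFNEETCLVKIITGLLEFEVYLEYLQNRFESSEEQARAVQMSTKVLIQFLQKKAKNLDAITTPDPTTNASLLTKLQAQNQWLQDMTTHLILRSFKEFLQSSLRALRQM"

# Assumed grouping, chosen to mirror the color legend (red, blue, green, gray).
RESIDUE_CLASSES = {
    "acidic": "DE",
    "basic": "KRH",
    "polar_uncharged": "STNQCY",
    "hydrophobic": "AVLIMFWPG",
}
CLASS_OF = {aa: name for name, members in RESIDUE_CLASSES.items() for aa in members}


def tally_classes(sequence: str) -> Counter:
    """Count residues per class; unknown letters are reported as 'unclassified'."""
    return Counter(CLASS_OF.get(aa, "unclassified") for aa in sequence)


tally = tally_classes(IL6_SEQ)
hydrophilic = tally["acidic"] + tally["basic"] + tally["polar_uncharged"]

for name in RESIDUE_CLASSES:
    print(f"{name}: {tally[name]}")
print(f"hydrophilic total: {hydrophilic} of {len(IL6_SEQ)}")
```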

  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

It does not have deep pockets. Instead, it has shallow binding sites at the surface.

Citations:

  1. Doig AJ. Frozen, but no accident - why the 20 standard amino acids were selected. FEBS J. 2017 May;284(9):1296-1305. doi: 10.1111/febs.13982. Epub 2017 Jan 13. PMID: 27926995.
  2. Potiszil, C., Ota, T., Yamanaka, M., Sakaguchi, C., Kobayashi, K., Tanaka, R., … & Nakamura, E. (2023). Insights into the formation and evolution of extraterrestrial amino acids from the asteroid Ryugu. Nature communications, 14(1), 1482.
  3. Nowick, James S., Eric M. Smith, and Glenn Noronha. “Molecular Scaffolds. 3. An Artificial Parallel β-Sheet.” The Journal of Organic Chemistry 60.23 (1995): 7386-7387.