Week 4 HW: Protein Design I

Hi- It’s time to do your homework!

Part A

Questions:

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Solve- How many Daltons are in 500 grams? 1 g = 6.022 x 10^23 Daltons, so 500 g = 3.011 x 10^26 Daltons. Dividing by 100 Daltons per amino acid: (6.022 x 10^23 x 500)/100 = 3.01 x 10^24

    Answer- There are approximately 3.01 x 10^24 amino acid molecules in 500 grams of meat, treating the whole piece as amino acids.
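The arithmetic for question 1 can be checked with a short script. This is only a sketch: like the question, it assumes the whole 500 g is amino acids averaging 100 Da each, and the variable names are my own.

```python
# Amino acid count for question 1, assuming all 500 g is amino acids
# with an average mass of 100 Da each (the simplification in the question).
DALTONS_PER_GRAM = 6.022e23   # 1 g expressed in Daltons (Avogadro's number)
MEAT_GRAMS = 500
AVG_AMINO_ACID_DA = 100

total_daltons = MEAT_GRAMS * DALTONS_PER_GRAM
amino_acid_count = total_daltons / AVG_AMINO_ACID_DA

print(f"{amino_acid_count:.2e} amino acid molecules")
```

Real meat is mostly water, with protein only a fraction of the mass, so the true number would be lower by a roughly constant factor.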

  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

    Answer- The process of digestion uses acids and enzymes to break tissues down into simpler molecules. These molecules provide the resources and energy to power our own cellular processes, which reassemble the digested amino acids, sugars and such into the forms each cell needs to perform its function, as dictated by the organism’s own genetic code.

  3. Why are there only 20 natural amino acids?

    Answer- According to Andrew Doig, a chemical biologist at the University of Manchester in the UK, it goes all the way back to LUCA, the last universal common ancestor. The twenty natural amino acids found in all living organisms are the building blocks we share in common. It is speculated that these twenty in particular are the most stable due to their structure, and able to maintain their functional forms after being buried or exposed to the environmental conditions of Earth 3.5–3.8 billion years ago, when LUCA first emerged, probably from some archaic form of RNA. They are more hydrophobic than hydrophilic, and that may have something to do with their role in LUCA’s evolution because of the way they are able to fold and work together structurally.

  4. Can you make other non-natural amino acids? Design some new amino acids.

    Answer- Yes, absolutely! The only problem is that when synthetic amino acids are incorporated into the cell’s DNA/RNA-encoded machinery, their catalytic function also has to be integrated into the cell’s existing enzymatic chemistry.

  5. Where did amino acids come from before enzymes that make them, and before life started?

    Answer- Apparently, there are several theories, ranging from amino acids riding in on an asteroid to chemical reactions driven by electrical activity, i.e., lightning striking in areas rich in oxygen, nitrogen, hydrogen, carbon and sulphur compounds. The asteroid theory is supported by the discovery of at least 86 amino acids on the Murchison meteorite, which landed in Australia in 1969. I believe it was probably a combination of planetary chemical reactions and celestial seeding.

  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

    Answer- Left. The α-helix is usually right-handed for the common amino acids because their α-carbons have the L configuration. In D-amino acids the α-carbons have the D configuration, the mirror image of L, so the helix they form is the mirror image as well.
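The mirror-image argument in question 6 can be illustrated numerically. This is a rough sketch, not a structural model: it assumes an idealized helix with textbook α-helix geometry (about 100° per residue, 1.5 Å rise, 2.3 Å radius), and the function names are my own. Handedness is read from the sign of the triple product of consecutive step vectors.

```python
import math

def helix_points(n=20, radius=2.3, rise=1.5, turn_deg=100.0):
    """Points on an idealized right-handed helix running along +z."""
    pts = []
    for i in range(n):
        angle = math.radians(turn_deg) * i
        pts.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return pts

def mirror(pts):
    """Reflect through the yz-plane (x -> -x), giving the mirror-image object."""
    return [(-x, y, z) for x, y, z in pts]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def handedness(pts):
    """Sign of the summed triple products of consecutive steps: > 0 means right-handed."""
    total = 0.0
    for a, b, c, d in zip(pts, pts[1:], pts[2:], pts[3:]):
        total += dot(cross(sub(b, a), sub(c, b)), sub(d, c))
    return "right" if total > 0 else "left"

l_helix = helix_points()     # stands in for a helix of L-amino acids
d_helix = mirror(l_helix)    # stands in for the same helix built from D-amino acids

print(handedness(l_helix), handedness(d_helix))
```

Reflection flips the sign of every triple product, which is why the mirrored helix comes out with the opposite handedness whatever the exact geometry.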

  7. Can you discover additional helices in proteins?

    Answer- In theory, yes, although detecting structures outside the typical three (α-helix, 3₁₀ helix and π-helix) is difficult, and the physical and chemical properties of helical structures limit what is possible. But with synbio, discovering something new is certainly possible, maybe even likely.

  8. Why are most molecular helices right-handed?

    Answer- It’s a mystery… I think, but it has something to do with chirality: as in question 6, building blocks of a single handedness (L-amino acids) favor helices of a single handedness. Perhaps polarity plays a part, too? Or maybe the ionic charge of particles (opposites attract)? Who knows? Certainly, there must be an answer.

  9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

    Answer- It appears as though the protein is programmed to fold in such a way that it tends to form aggregate structures. This behavior is probably caused by the alternating arrangement of charged and hydrophobic amino acids, which makes the strands inclined to gather through the attraction of oppositely charged residues.

Part B

URD53675.1 orange carotenoid-binding protein (plasmid) Chroococcidiopsis sp. CCNUC1

  1. For the sake of consistency I decided to stick with the first protein from the week two homework. It’s interesting to see how many different organisms this protein exists in, all with their own unique functions using this molecule.
  2. It is 319 amino acids long; the most common is A, with a count of 29. According to UniProt homology, there are 239 homologs, and InterProScan identifies it as belonging to the NTF2-like domain superfamily. It belongs to the orange carotenoid-binding protein family.
  3. There were quite a few hits for OCP in RCSB PDB. The model above is a 3-D rendering of OCP2 Gloeocapsa sp. PCC 7428 from RCSB PDB.
  4. PyMOL Renderings: Stick and Ribbon (below)

PyMOL - Orange Carotenoid-binding Protein

Clockwise from top left: ribbon, secondary, stick, hydro

Number of helices: 10

Number of sheets: 6

Hydrophobic, with 2,363 atoms in hydrophobic residues vs. 782 atoms in hydrophilic residues

Binding pocket: Yes, definitely several pockets, and even some holes leading to the core.
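The residue count and the hydrophobic tally in Part B can be roughly cross-checked from the sequence alone. This is a minimal sketch with assumptions: the hydrophobic set is one common convention (other scales differ), the function name is my own, and it counts residues rather than atoms, so its numbers will not match the PyMOL atom counts directly.

```python
from collections import Counter

# Assumption: one common choice of hydrophobic residues; other scales differ.
HYDROPHOBIC = set("AVILMFWC")

def summarize(sequence):
    """Return length, most common residue with its count, and hydrophobic fraction."""
    seq = sequence.strip().upper()
    counts = Counter(seq)
    residue, count = counts.most_common(1)[0]
    hydrophobic = sum(n for aa, n in counts.items() if aa in HYDROPHOBIC)
    return {
        "length": len(seq),
        "most_common": (residue, count),
        "hydrophobic_fraction": hydrophobic / len(seq),
    }

# Toy sequence for illustration only; it is NOT the OCP sequence.
toy = "MAAKLVFAGDE"
summary = summarize(toy)
print(summary)
```

Pasting the real sequence from the protein record in place of the toy string should reproduce the length and the most common residue reported above.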

Part C - Using ML-Based Protein Design Tools

C1. Protein Language Modeling

  1. Deep Mutational Scans- pdb_00008qx5 Helical Carotenoid Protein 4 (HCP4) from Anabaena with bound Canthaxanthin

    Heat map visuals indicate the following binding probabilities:

    • Sites 43 and 112 exhibit high probability for the following amino acids: A, D, E, G, K, N, Q, R, S, and T.
    • C has low probability across all sites except 43 and 112, where it is medium.
    • Y and X are fairly low across the entire sequence.
  2. Latent Space Analysis
    In the examples above the closest related matches were:

    • Baker’s yeast (Saccharomyces cerevisiae) TaxId: 559292
    • Eukaryotic peptide chain release factor ERF2 C-terminal domain (fission yeast, Schizosaccharomyces pombe) TaxId: 48961
    • Cytochrome f subunit of the cytochrome b6f complex, transmembrane anchor (Green Alga, Chlamydomonas reinhardtii) TaxId: 3055

C2. Protein Folding

Amino Acid Probability - PDB 8qx5

Folding a protein:

Fold your protein with ESMFold.

Do the predicted coordinates match your original structure?

ESM fold/reverse fold comparison

The left column shows the ESMFold prediction from the PDB 8qx5 sequence; the right column shows the reverse fold results.
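One way to put a number on whether predicted coordinates match the original structure is the root-mean-square deviation (RMSD) between matched atoms. The sketch below is a simplified illustration with assumptions: both coordinate lists hold the same atoms in the same order (for example Cα atoms) and have already been superimposed (for example with PyMOL’s align command); the function name and the toy coordinates are my own.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy coordinates for illustration only, not taken from 8qx5.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd(experimental, predicted))
```

An RMSD near zero means the two sets of coordinates nearly coincide; larger values mean the prediction drifts further from the original.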

Citation:

  1. Rachel Brazil (2018). The Alphabet Soup of Life. Available at: https://www.chemistryworld.com/features/why-are-there-20-amino-acids/3009378.article (Accessed: 28 February 2026)
  2. Kirschning A. On the Evolutionary History of the Twenty Encoded Amino Acids. Chemistry. 2022 Oct 4;28(55):e202201419. doi: 10.1002/chem.202201419. Epub 2022 Jul 28. PMID: 35726786; PMCID: PMC9796705.
  3. Huang, W., Wang, S., Wei, Y. et al. Design and evolution of artificial enzyme with in-situ biosynthesized non-canonical amino acid. Nat Commun 16, 8698 (2025). https://doi.org/10.1038/s41467-025-63733-3
  4. Medeiros-Silva J, Dregni AJ, Hong M. Distinguishing Different Hydrogen-Bonded Helices in Proteins by Efficient 1H-Detected Three-Dimensional Solid-State NMR. Biochemistry. 2024 Jan 2;63(1):181-190. doi: 10.1021/acs.biochem.3c00589. Epub 2023 Dec 21. PMID: 38127783; PMCID: PMC10880114.
  5. Greg Huber et al, Entropy and chirality in sphinx tilings, Physical Review Research (2024). DOI: 10.1103/PhysRevResearch.6.013227
  6. Sigrist CJA, Cuche BA, de Castro E, Coudert E, Redaschi N, Bridge A. The PROSITE database for protein families, domains, and sites. Nucleic Acids Res. 2026; doi: 10.1093/nar/gkaf1188 [In press] PubMed:41263099