Week 04: Protein Design Part-1
Part A: Conceptual Questions
1. How many amino acid molecules are in 500 g of meat?
Meat is roughly 20% protein by weight, so the number of amino acid molecules can be estimated as follows:
- Protein Mass: 500 g × 0.20 = 100 g
- Molar Mass: Assume an average amino acid mass of about 100 Da, i.e. 100 g/mol. (The true average residue mass in proteins is closer to 110 Da, so this rounded value slightly overestimates the count.)
- Moles: 100 g / 100 g/mol = 1 mol
- Molecules: 1 mol × Avogadro’s number ($6.022 \times 10^{23}$ mol$^{-1}$) ≈ $6 \times 10^{23}$, so 500 g of meat contains roughly 600 sextillion amino acid molecules.
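The estimate can be written as a short Python function. The 20% protein fraction and the 100 Da average mass are the same rough assumptions used above, and the function name is only illustrative; rerunning it with 110 Da shows how much the answer depends on that choice.

```python
# Back-of-the-envelope estimate from question 1 (assumed inputs: 20% protein, 100 Da per amino acid).
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g: float,
                         protein_fraction: float = 0.20,
                         avg_aa_mass_g_per_mol: float = 100.0) -> float:
    """Estimate how many amino acid molecules are in a given mass of meat."""
    protein_g = meat_g * protein_fraction        # 500 g * 0.20 = 100 g of protein
    moles = protein_g / avg_aa_mass_g_per_mol    # 100 g / (100 g/mol) = 1 mol
    return moles * AVOGADRO                      # mol -> molecules


print(f"{amino_acid_molecules(500):.2e}")                             # → 6.02e+23
print(f"{amino_acid_molecules(500, avg_aa_mass_g_per_mol=110):.2e}")  # → 5.47e+23
```

Using the more realistic 110 Da lowers the estimate by about 10%, which does not change the order of magnitude.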
2. Why don’t humans become cows or fish after eating them?
When we consume protein, our digestive system does not keep the original structure intact. Instead, it breaks the long polymer chains down into individual amino acids.
These “bricks” are transported to our cells, where our own DNA provides the unique blueprint to reassemble them into specific human proteins. The biological identity of an organism is defined by the sequence and arrangement of these building blocks, not the raw materials themselves.
3. Why are there only 20 natural amino acids?
While many more amino acids exist in chemistry, these 20 provide sufficient chemical diversity (hydrophobicity, charge, and size) to fold into almost any functional shape required for life.
Evolution likely “settled” on this specific set early on because adding more would increase the complexity of the translation machinery (tRNAs and enzymes) without providing a significant survival advantage. It reached a point of diminishing returns.
4. Can you make non-natural amino acids?
Yes. Scientists use “expanded genetic codes” to incorporate synthetic amino acids into proteins for research and medicine.
- Design Example: One could design an amino acid carrying a cyano group (–C≡N) to act as a sensitive local probe of electric fields within a protein, or one with a photo-crosslinker that bonds to neighboring residues only when triggered by UV light.
5. Where did amino acids come from before life and enzymes?
Amino acids formed through abiotic synthesis (chemical processes without life).
- The Miller-Urey Experiment: This famous 1953 study showed that electric sparks simulating lightning, passed through a “primeval soup” of methane, ammonia, hydrogen, and water vapor, can spontaneously produce amino acids.
- Space: Amino acids have also been found in carbonaceous meteorites such as the Murchison meteorite, suggesting that the building blocks of life can form in space and may have been delivered to Earth by impacts.
6. What is the handedness of an alpha-helix made of D-amino acids?
Natural proteins are made of L-amino acids, which fold into right-handed alpha-helices. D-amino acids are the mirror images of L-amino acids, so every steric constraint (which atoms would bump into each other) is mirrored as well. A helix built entirely from D-amino acids is therefore left-handed: it is the mirror image of the natural L-helix.
7. Why are most molecular helices right-handed?
This is due to homochirality: life uses only one “hand” (the L-form) of amino acids. In a chain of L-amino acids, the geometry of the peptide backbone and the position of each side chain’s first carbon (Cβ) make the right-handed twist energetically favorable. A left-handed twist would cause the side chains to physically “clash” with the protein’s backbone.
8. Why do beta-sheets tend to aggregate?
Beta-sheets have “sticky” edges characterized by unsatisfied hydrogen bond donors and acceptors.
To reach a more stable, lower-energy state, these exposed edges seek out other strands to bond with. If they cannot find a partner within the same protein, they will bond with strands from other protein molecules, causing them to stack into large, insoluble clumps.
9. Why do amyloid diseases form beta-sheets?
Amyloid diseases (such as Alzheimer’s or Parkinson’s) occur when proteins misfold into extremely stable, “cross-beta” structures. These act like “molecular velcro,” where the sheets stack so tightly that water is completely excluded, making them very hard for the body to break down.
- Materials Use: These structures are actually quite useful in engineering. They are being researched as nanofibers for tissue scaffolding or as ultra-strong adhesives because they are incredibly resistant to heat and chemical degradation.
Part B: Protein Analysis and Visualization
Selected Protein: Transthyretin (TTR) (P02766)
Gene: TTR
Organism: Homo sapiens
1. Briefly describe the protein you selected and why you selected it.
Transthyretin is a homotetrameric transport protein produced mainly in the liver and in the choroid plexus of the brain. It carries thyroid hormone (thyroxine, T4) and retinol-binding protein through the bloodstream and cerebrospinal fluid.
2. Identify the amino acid sequence of your protein.
3. How long is the protein?
The protein is 147 amino acids long.
4. What is the most frequent amino acid in the sequence?
The most frequent amino acid is alanine (A), with 15 occurrences (10.20% of the 147 residues).
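A frequency table like this can be reproduced with Python's standard `collections.Counter`. The sequence in this sketch is a short placeholder, not transthyretin: paste the full UniProt P02766 sequence (without the `>` FASTA header line) in its place to recompute the length and the most frequent residue. `clean_sequence` and `aa_frequencies` are hypothetical helper names.

```python
from collections import Counter


def clean_sequence(raw: str) -> str:
    """Strip whitespace and line breaks from a pasted sequence and upper-case it."""
    return "".join(raw.split()).upper()


def aa_frequencies(seq: str) -> list[tuple[str, int, float]]:
    """Return (residue, count, percent) tuples, most frequent first."""
    counts = Counter(seq)
    return [(aa, n, 100.0 * n / len(seq)) for aa, n in counts.most_common()]


# Placeholder only (NOT transthyretin): replace with the full UniProt P02766 sequence.
sequence = clean_sequence("""
MKTAYIAKQR
QISFVKSHFS
""")

print("Length:", len(sequence))
for aa, n, pct in aa_frequencies(sequence)[:3]:
    print(f"{aa} | {n} | {pct:.2f}%")  # same "residue | count | percent" layout as above
```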
Amino Acid Frequency Analysis

5. How many protein sequence homologs are there for your protein?
UniProt BLAST returned thousands of homologous transthyretin sequences from vertebrates, including mammals, birds, reptiles, and fish, indicating that the protein is widely conserved across vertebrate evolution.
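One way to turn "thousands of hits" into a concrete number is to count distinct subject sequences in the BLAST output below an E-value cutoff. This sketch assumes the hits were exported in NCBI BLAST+ tabular format (`-outfmt 6`, 12 standard columns); UniProt's web BLAST offers other download formats, and the 1e-5 cutoff and the `count_homologs` name are illustrative choices, not values from the original search.

```python
import csv
import io


def count_homologs(tabular_text: str, max_evalue: float = 1e-5) -> int:
    """Count distinct subject sequences in BLAST tabular output at or below an E-value cutoff.

    Assumes the 12 default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    subjects = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                      # skip blank and comment lines
        if float(row[10]) <= max_evalue:  # column 11: evalue
            subjects.add(row[1])          # column 2: sseqid (one subject may have several HSPs)
    return len(subjects)


# Toy output with made-up identifiers: hitA has two HSPs, one of them too weak.
example = (
    "query\thitA\t98.0\t127\t2\t0\t1\t127\t1\t127\t1e-80\t250\n"
    "query\thitA\t40.0\t30\t18\t0\t90\t120\t5\t35\t0.5\t20\n"
    "query\thitB\t55.0\t120\t54\t1\t5\t124\t3\t122\t1e-30\t110\n"
)
print(count_homologs(example))  # → 2
```

Counting distinct `sseqid` values avoids double-counting a homolog that aligns to the query in more than one segment.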
6. Does your protein belong to any protein family?
It belongs to the Transthyretin/hydroxyisourate hydrolase superfamily.
7. Identify the structure page of your protein in RCSB.
RCSB PDB ID: 1DVQ
https://www.rcsb.org/structure/1DVQ
8. When was the structure solved?
This structure was deposited in 2000 and released in 2001. It was determined by X-ray diffraction at 2.00 Å resolution.
9. Is it a good-quality structure?
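Yes. A resolution of 2.00 Å is high for a protein crystal structure: the backbone and most side-chain positions are well defined.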
10. Are there any other molecules in the solved structure apart from the protein itself?
Yes. The structure also contains a thyroxine (T4) ligand in the binding pocket, plus water molecules.
11. Does your protein belong to any structural classification family?
Yes. Its structural classification family is Transthyretin (synonym: prealbumin).
Protein Visualization
12. Visualize the protein using different structural representations.
Cartoon Representation

13. Color the protein by secondary structure.

The protein contains far more beta-sheet than alpha-helix. This is expected for transthyretin: each monomer consists mainly of eight beta-strands arranged into two four-stranded beta-sheets, with only one short alpha-helix.
14. Color the protein by residue type.

Hydrophobic residues are mainly buried inside the protein core, helping stabilize the folded structure through hydrophobic interactions. Hydrophilic residues are mostly exposed on the protein surface, where they interact with water and other molecules.
15. Visualize the molecular surface of the protein.

The surface view shows cavities along the central channel that runs through the tetramer, where thyroxine binds. These cavities form the hormone-binding sites and are therefore central to transthyretin’s transport function.
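The four views in questions 12 to 15 can be scripted so they are reproducible. The writeup does not say which viewer was used; this sketch assumes PyMOL and generates one `.pml` command script per view. The colors, the hydrophobic residue grouping, and the helper names (`VIEWS`, `pymol_script`) are illustrative assumptions.

```python
# Hypothetical helper: writes one PyMOL command script (.pml) per view from questions 12-15.
HYDROPHOBIC = "ALA+VAL+LEU+ILE+MET+PHE+TRP+PRO"  # one common grouping; others exist

VIEWS = {
    "cartoon": ["show cartoon", "color grey80"],
    "secondary_structure": [
        "show cartoon",
        "dss",                   # assign helix/strand/loop with PyMOL's own algorithm
        "color red, ss h",       # helices
        "color yellow, ss s",    # strands (sheets)
        "color green, ss l+''",  # loops
    ],
    "residue_type": [
        "show cartoon",
        "show sticks, polymer and not name N+C+O",  # side chains (plus CA)
        "color skyblue",                             # hydrophilic by default
        f"color orange, resn {HYDROPHOBIC}",         # hydrophobic residues
    ],
    "surface": ["show surface, polymer", "color white"],
}


def pymol_script(pdb_id: str, view: str) -> str:
    """Return PyMOL commands that load pdb_id, draw one view, and save a PNG."""
    lines = [f"fetch {pdb_id}", "hide everything", *VIEWS[view],
             "show sticks, organic",      # keep the bound ligand visible
             "color magenta, organic",    # and give it a distinct color
             f"png {pdb_id}_{view}.png, width=1200, height=900, dpi=300, ray=1"]
    return "\n".join(lines) + "\n"


for view in VIEWS:
    with open(f"1dvq_{view}.pml", "w") as fh:
        fh.write(pymol_script("1dvq", view))
```

Each script can then be rendered without the GUI, for example with `pymol -cq 1dvq_secondary_structure.pml`.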
Part C: Using ML-Based Protein Design Tools
C1. Protein Language Modeling
Deep Mutational Scans

The heatmap shows mutation sensitivity across sequence positions. Certain residues are highly conserved, meaning mutations at these sites strongly reduce model likelihood. Hydrophobic core residues showed especially strong intolerance to mutation.
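The heatmap values can be read as a log-likelihood ratio: how much more or less probable the language model finds each substitution compared with the wild-type residue. The sketch below assumes the per-position log-probabilities have already been extracted from a model (for example ESM-2) as an `(L, 20)` array; it does not reproduce the notebook's exact scoring, random logits stand in for real model output, and `dms_heatmap` is a hypothetical name.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the log-probability array


def dms_heatmap(log_probs: np.ndarray, wild_type: str) -> np.ndarray:
    """Score every single substitution against the wild-type residue.

    log_probs: (L, 20) per-position log-probabilities, columns ordered as AMINO_ACIDS.
    Returns an (L, 20) array with entry [i, a] = log p(a at i) - log p(wt_i at i);
    negative values mean the model finds the mutant less likely than wild type.
    """
    wt_idx = np.array([AMINO_ACIDS.index(aa) for aa in wild_type])
    wt_logp = log_probs[np.arange(len(wild_type)), wt_idx]
    return log_probs - wt_logp[:, None]


# Toy stand-in for model output: random logits turned into log-probabilities.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 20))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))  # log-softmax
scores = dms_heatmap(log_probs, "MKTAY")
sensitivity = scores.mean(axis=1)  # more negative = position less tolerant of mutation
print(np.round(sensitivity, 2))
```

Averaging each row gives one tolerance value per position, which is what makes conserved core positions stand out as strongly negative rows in the heatmap.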
Latent Space Analysis

The embedding clusters proteins with similar sequence features close together in latent space. Transthyretin appears near proteins with similar beta-sheet-rich transport architectures.
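A latent-space plot like this is typically built by pooling each protein's per-residue embeddings into one vector and projecting the vectors to two dimensions. This sketch assumes mean pooling and PCA (the notebook may have used t-SNE or UMAP instead), uses random arrays in place of real language-model embeddings, and adds a cosine-similarity nearest-neighbor lookup to check which proteins sit closest to a query. All function names are hypothetical.

```python
import numpy as np


def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Average an (L, D) per-residue embedding into a single D-dimensional vector."""
    return per_residue.mean(axis=0)


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project (N, D) protein vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def nearest(vectors: np.ndarray, query_idx: int, k: int = 3) -> list[int]:
    """Indices of the k proteins most cosine-similar to vectors[query_idx]."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[query_idx]
    return [int(i) for i in np.argsort(-sims) if i != query_idx][:k]


rng = np.random.default_rng(0)
# Stand-ins for per-residue embeddings of six proteins of different lengths (D = 8).
proteins = [rng.normal(size=(length, 8)) for length in (147, 120, 300, 95, 210, 60)]
vectors = np.stack([mean_pool(p) for p in proteins])  # (6, 8)
coords = pca_2d(vectors)                              # (6, 2), ready to scatter-plot
print(nearest(vectors, query_idx=0, k=2))
```

Mean pooling makes proteins of different lengths comparable as fixed-size vectors; the neighbor lookup works in the full embedding space, so it is not distorted by the 2D projection.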
C2. Protein Folding
Folding a Protein with ESMFold
The predicted structure generated by ESMFold closely matched the experimentally solved structure from the PDB. The overall beta-sheet-rich fold was preserved, indicating that the model successfully captured the native topology of transthyretin.
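"Closely matched" can be quantified as the C-alpha RMSD between the ESMFold model and the PDB chain after optimal superposition. The sketch below implements the Kabsch algorithm with NumPy; it assumes the two coordinate arrays are already paired residue by residue (parsing the PDB and ESMFold files, e.g. with Biopython, is not shown), and `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two paired (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)                    # center both sets on the origin
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                               # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # -1 would mean a reflection; undo it
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

With matched C-alpha coordinates for the model and the crystal structure, `kabsch_rmsd(ca_model, ca_pdb)` returns the RMSD in Å; lower values mean closer agreement. The reflection check matters because an unconstrained fit could otherwise "superimpose" a mirror image.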
Small mutations generally did not dramatically disrupt the predicted fold, suggesting that the structure is relatively tolerant of conservative amino acid substitutions. Larger sequence alterations, however, caused visible deviations in the predicted structure, suggesting that they may destabilize the native fold.
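The specific mutants folded here are not listed, but a small helper makes it easy to generate them reproducibly from standard notation such as `K2A`. This is a hypothetical sketch. Note that transthyretin variant names such as V30M conventionally use mature-protein numbering, which omits the 20-residue signal peptide present in the 147-residue UniProt sequence, so position 30 in that convention is position 50 in the UniProt sequence.

```python
import re


def apply_mutations(seq: str, mutations: list[str]) -> str:
    """Apply substitutions written as <wild type><1-based position><mutant>, e.g. "K2A".

    Raises ValueError when the stated wild-type residue does not match the sequence,
    which catches off-by-one and numbering-convention mistakes.
    """
    residues = list(seq)
    for m in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", m)
        if match is None:
            raise ValueError(f"cannot parse mutation {m!r}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{m}: position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: position {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = mut
    return "".join(residues)


print(apply_mutations("MKTAYIAK", ["K2A", "Y5F"]))  # → MATAFIAK
```

The resulting sequence can be pasted directly into ESMFold.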
C3. Protein Generation
ProteinMPNN Sequence Probability Analysis

The probability map highlights which amino acids are favored at each position. Conserved residues showed high confidence scores, while flexible surface positions tolerated more variation.
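One way to summarize a probability map is the Shannon entropy of each position: near 0 bits when the model strongly prefers a single residue, up to log2(20) ≈ 4.32 bits when all 20 are equally likely. This sketch assumes the map has been exported as an `(L, 20)` NumPy array of probabilities; the column order and the `positional_entropy` name are assumptions, not ProteinMPNN's own output format.

```python
import numpy as np


def positional_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy, in bits, of each row of an (L, 20) probability matrix."""
    p = probs / probs.sum(axis=1, keepdims=True)         # renormalize rounding drift
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p), 0.0)     # define 0 * log(0) as 0
    return -terms.sum(axis=1)


# Two toy positions: one fully determined, one where all 20 residues are equally likely.
probs = np.vstack([np.eye(20)[0], np.full(20, 1 / 20)])
print(np.round(positional_entropy(probs), 2))
```

Low-entropy positions correspond to the high-confidence, conserved-looking residues; high-entropy positions are the variable surface sites.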
The generated sequence maintained many of the important structural residues found in the original transthyretin sequence. When folded using ESMFold, the predicted structure remained highly similar to the original backbone structure.
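How much of the native sequence a design keeps is usually reported as sequence recovery: the fraction of positions where the designed residue matches the native one. A minimal sketch, assuming the designed and native sequences are aligned position by position on the same backbone (`sequence_recovery` is a hypothetical name):

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must be the same length (same backbone)")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


print(sequence_recovery("MATAFIAK", "MKTAYIAK"))  # → 0.75
```

Recovery only measures identity to the native sequence; the structural similarity of the refolded design is a separate check, for example a superposition RMSD or TM-score against the original backbone.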