Week 4 HW: Protein design part I

Part A: Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

R/: Meat is approximately 20% protein by mass, so 500 g of meat contains roughly 100 g of protein. Taking the average amino acid to weigh ~100 Daltons (100 g/mol), 100 g of protein corresponds to about 1 mole of amino acids. Multiplying by Avogadro’s number gives approximately 6 × 10²³ molecules, or about 600 sextillion amino acid molecules in a single piece of meat.
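The estimate above can be reproduced in a few lines of Python. This is only a sketch: the 20% protein fraction and the 100 g/mol average are the same rough assumptions used in the answer, and the variable names are illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole

meat_mass_g = 500.0
protein_fraction = 0.20              # assumption: meat is ~20% protein by mass
avg_amino_acid_g_per_mol = 100.0     # assumption: ~100 Da per amino acid

protein_mass_g = meat_mass_g * protein_fraction
moles_amino_acids = protein_mass_g / avg_amino_acid_g_per_mol
n_molecules = moles_amino_acids * AVOGADRO

print(f"{n_molecules:.1e} amino acid molecules")
```

Changing the protein fraction (for example, to 0.25 for a leaner cut) scales the result linearly, so the order of magnitude stays the same.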

Why do humans eat beef but do not become a cow, eat fish but do not become fish?

R/: When we digest meat, our digestive enzymes (pepsin, trypsin, chymotrypsin) break dietary proteins down into their individual amino acid building blocks. These amino acids are chemically identical regardless of their source: glycine from a cow is the same as glycine from a fish. Once absorbed into the bloodstream, they are reassembled by our cells into specifically human proteins, with our own DNA acting as the sole blueprint. The organism’s identity is determined by its genetic instructions, not by the raw materials it consumes.

Why are there only 20 natural amino acids?

R/: The 20 natural amino acids represent an evolutionary solution that balances three factors: chemical diversity (together they cover the necessary range of properties: charged, neutral, hydrophobic, hydrophilic, rigid, and flexible), genetic coding capacity (the 4-letter nucleotide alphabet read in triplets gives 64 possible codons, more than enough to encode 20 amino acids and the stop signals with redundancy), and evolutionary stability (the genetic code is thought to have been fixed in the earliest cells, roughly 3.5 billion years ago or earlier, and changing it now would disrupt nearly every protein in a cell). Two natural additions exist: selenocysteine, which occurs in all three domains of life (including humans), and pyrrolysine, which occurs in some archaea and bacteria. Both are inserted by recoding a stop codon, which suggests that expanding the code is possible but rare and costly.
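The coding-capacity argument can be checked by enumerating the triplets directly. The sketch below assumes the standard genetic code, in which UAA, UAG, and UGA are stop codons; the variable names are illustrative.

```python
from itertools import product

BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons in the standard genetic code

# Every possible triplet of the four RNA bases.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

n_amino_acids = 20
print(len(codons))                         # total triplets, 4**3
print(len(sense_codons))                   # triplets that encode an amino acid
print(len(sense_codons) / n_amino_acids)   # average codons per amino acid
```

With roughly three codons per amino acid on average, many single-base mutations leave the encoded amino acid unchanged, which is the redundancy the answer refers to.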

Can you make other non-natural amino acids? Design some new amino acids.

Where did amino acids come from before enzymes that make them, and before life started?

R/: Amino acids originated through abiotic (non-biological) chemistry on early Earth. The landmark Miller-Urey experiment (1953) demonstrated that passing electric sparks through a mixture of simple gases (methane, ammonia, hydrogen, and water vapor), mimicking lightning in early Earth’s atmosphere, spontaneously produces amino acids. Additionally, amino acids have been found in carbonaceous meteorites such as the Murchison meteorite, indicating that they also form beyond Earth via basic organic chemistry. Hydrothermal vents on the ocean floor provide further conditions where amino acids can form abiotically through mineral-catalyzed reactions. This means amino acids predate life entirely: they are a natural product of carbon chemistry under energetic conditions, and early life simply inherited them and later optimized their production through enzymes.

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

R/: A D-amino acid α-helix would be left-handed. Normal α-helices built from L-amino acids are right-handed because the geometry of L-amino acids favors phi/psi backbone angles that produce right-handed coiling. D-amino acids are the exact mirror image of L-amino acids, so their backbone geometry favors the mirror-image conformation: a left-handed helix. The structure is equally stable and retains the same hydrogen bonding pattern, but it is the complete mirror image of a natural α-helix. This has been confirmed experimentally with synthetic D-peptides.
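The mirror-image argument can be illustrated with a purely geometric toy model. The sketch below is not a peptide simulation: it places points on an idealized helix (100° turn and 1.5 Å rise per residue, with an assumed radius of 2.3 Å), reflects them through a plane, and classifies handedness from the sign of a triple product. All function names are hypothetical.

```python
import math

def helix_points(n=12, turn_deg=100.0, rise=1.5, radius=2.3):
    """Points on an idealized helix; winds right-handed for rise > 0 and 0 < turn_deg < 180."""
    pts = []
    for k in range(n):
        a = math.radians(k * turn_deg)
        pts.append((radius * math.cos(a), radius * math.sin(a), k * rise))
    return pts

def handedness(pts):
    """Classify a helix by the sign of (d1 x d2) . d3 for its first three steps."""
    steps = [tuple(q[i] - p[i] for i in range(3)) for p, q in zip(pts, pts[1:4])]
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = steps
    cross = (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)
    triple = cross[0] * c1 + cross[1] * c2 + cross[2] * c3
    return "right" if triple > 0 else "left"

def mirror(pts):
    """Reflect every point through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in pts]

l_helix = helix_points()      # stands in for a helix of L-amino acids
d_helix = mirror(l_helix)     # its mirror image, as for D-amino acids

print(handedness(l_helix), handedness(d_helix))
```

A reflection flips the sign of the triple product while preserving every distance, which is why the mirrored helix keeps the same geometry and hydrogen bond spacing but the opposite handedness.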

Can you discover additional helices in proteins?

Why are most molecular helices right-handed?

R/: Most biological helices are right-handed because life universally uses L-amino acids, whose backbone dihedral angles (phi/psi) favor right-handed coiling as the lowest-energy conformation. This is a direct consequence of homochirality: life’s ancient, universal selection of L-amino acids over D-amino acids, which became permanently fixed early in evolution. The Ramachandran plot confirms that right-handed helical conformations occupy the most energetically favorable region for L-amino acid backbone geometry. The deeper origin of L-amino acid selection is still debated; proposals include circularly polarized radiation in space (for example from neutron stars), preferential adsorption on chiral mineral surfaces, and random chance reinforced by self-replication.

Why do β-sheets tend to aggregate?

R/: β-sheets aggregate due to two primary forces. First, the edge strands of β-sheets have unsatisfied hydrogen bond donors and acceptors along their backbone, which can form additional hydrogen bonds with the edges of neighboring β-sheets. Second, β-sheets frequently present hydrophobic surfaces on their faces; the hydrophobic effect drives these surfaces together to minimize unfavorable interactions with surrounding water molecules. These two forces, combined with the geometric flatness and complementarity of β-sheets, create highly stable aggregates. Aggregation follows nucleation-dependent kinetics (slow initial seed formation followed by rapid, self-propagating growth), which explains the progressive nature of amyloid diseases.

What is the driving force for β-sheet aggregation?

Why do many amyloid diseases form β-sheets?

R/: Amyloid diseases are characterized by β-sheet formation because β-sheets represent an extremely thermodynamically stable protein conformation. In normal protein folding, hydrophobic regions are buried inside the structure; under stress conditions (aging, mutations, pH changes), proteins partially unfold and expose their hydrophobic cores. These exposed regions can spontaneously reorganize into β-sheets, because the hydrogen bonds form along the protein backbone largely regardless of the specific amino acid sequence. Critically, β-sheets have flat, complementary surfaces that stack onto each other, allowing one misfolded protein to act as a template that recruits and converts neighboring proteins into the same conformation: a self-propagating cascade. The resulting fibers are insoluble and resistant to cellular degradation machinery, causing progressive accumulation and cell death. This mechanism underlies Alzheimer’s disease (amyloid-β and tau), Parkinson’s disease (α-synuclein), and prion diseases, among others.

Can you use amyloid β-sheets as materials?

R/: Yes, amyloid β-sheet fibers are increasingly recognized as valuable nanomaterials. Their properties, such as exceptional mechanical strength, chemical stability, nanoscale regularity, and spontaneous self-assembly, make them attractive for multiple applications. Current research includes their use as nanoscaffolds for assembling metal nanoparticles into nanowires, drug delivery vehicles that protect and release therapeutics under controlled conditions, food-grade thickeners and emulsifiers derived from whey protein amyloids, and living materials built from engineered bacterial curli fibers. Spider silk’s remarkable toughness is partially attributed to embedded β-sheet nanocrystals, which has inspired synthetic fiber design. The central challenge is controlling assembly precisely, that is, distinguishing between pathological uncontrolled aggregation and useful directed self-assembly.

Design a β-sheet motif that forms a well-ordered structure.

Part B: Protein Analysis and Visualization

Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

Briefly describe the protein you selected and why you selected it.

Identify the amino acid sequence of your protein.

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

How many protein sequence homologs are there for your protein? Hint: Use UniProt’s BLAST tool to search for homologs.

Does your protein belong to any protein family?
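For the sequence length and most-frequent-residue questions above, a few lines of Python are enough if the Colab notebook is not at hand. This is a minimal sketch, not the course notebook: the function name is hypothetical, and the example sequence is a made-up placeholder to be replaced with your protein's one-letter sequence (without the FASTA header line).

```python
from collections import Counter

def amino_acid_frequencies(sequence):
    """Return (length, [(residue, count), ...]) with residues ranked by count."""
    # Remove whitespace and line breaks, then normalize to upper case.
    seq = "".join(sequence.split()).upper()
    counts = Counter(seq)
    return len(seq), counts.most_common()

# Placeholder sequence (not a real protein); paste your own here.
example = "MKKLLAAAKG"
length, ranked = amino_acid_frequencies(example)

print(f"length: {length}")
for residue, count in ranked[:5]:
    print(f"{residue}: {count} ({100 * count / length:.1f}%)")
```

`Counter.most_common()` orders residues from most to least frequent, so the first entry answers the "most frequent amino acid" question directly.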

Identify the structure page of your protein in RCSB

When was the structure solved? Is it a good-quality structure? A good-quality structure is one with good resolution; the smaller the value, the better (e.g., Resolution: 2.70 Å).

Are there any other molecules in the solved structure apart from protein?

Does your protein belong to any structure classification family?

Open the structure of your protein in any 3D molecule visualization software:

PyMOL tutorial here (hint: ChatGPT is good at PyMOL commands)

Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
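To complement the visual inspection of hydrophobic vs hydrophilic residues, a sequence-level summary can be computed from the Kyte-Doolittle hydropathy scale. The sketch below is an assumption-laden aid, not part of the assignment: the function names are hypothetical, and a sequence-only score says nothing about where residues sit in the 3D structure, which is what the coloring step is meant to show.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def hydropathy_values(sequence):
    """Hydropathy value per standard residue; other characters are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard amino acids found in sequence")
    return values

def gravy(sequence):
    """Grand average of hydropathy: the mean Kyte-Doolittle value."""
    values = hydropathy_values(sequence)
    return sum(values) / len(values)

def hydrophobic_fraction(sequence):
    """Fraction of residues with a positive hydropathy value."""
    values = hydropathy_values(sequence)
    return sum(v > 0 for v in values) / len(values)

# Placeholder sequence (not a real protein); paste your own here.
example = "MKKLLAAAKG"
print(f"GRAVY: {gravy(example):.2f}")
print(f"hydrophobic fraction: {hydrophobic_fraction(example):.2f}")
```

A negative GRAVY value generally points to a more hydrophilic sequence overall and a positive value to a more hydrophobic one; the structure view then shows whether the hydrophobic residues are buried in the core or exposed on the surface.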