Week 4 HW: Protein Design Part I

This week focuses on how sequence, structure, and energetics can be modeled and manipulated to create or optimize proteins with specified functions.

Part A - Conceptual Questions

For this homework, I initiated a conversation with Claude AI (Sonnet v4.6). My prompting method poses each question, lets me give my own answer, and then returns an evaluation of my response that reinforces the key learning concepts. (Expand to see detailed responses to my answers.) I find this approach more interactive, and it leads to better knowledge retention.

Question 1: How many amino acid molecules are in a 500g piece of meat?

I used the given parameters: 500 g of meat and an average amino acid mass of 100 Daltons (100 g/mol).

Feedback…

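500 g ÷ 100 g/mol = 5 mol, and 5 mol × 6.022 × 10²³ molecules/mol ≈ 3 × 10²⁴ amino acid molecules: three septillion building blocks in a single meal. This treats the entire 500 g as amino acids. Meat is mostly water, so the true count is lower, but it stays on the order of 10²³ to 10²⁴.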
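The arithmetic above can be reproduced with a short sketch. The function name and the `protein_fraction` parameter are illustrative additions, and the 20% protein content used in the second call is an assumed round number, not a value given in the question.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

def amino_acid_count(mass_g, avg_residue_mass_da=100.0, protein_fraction=1.0):
    """Estimate the number of amino acid molecules in a sample.

    protein_fraction is a hypothetical knob: 1.0 reproduces the homework
    simplification that the whole mass is amino acids.
    """
    moles = mass_g * protein_fraction / avg_residue_mass_da
    return moles * AVOGADRO

whole = amino_acid_count(500)                       # homework assumption
lean = amino_acid_count(500, protein_fraction=0.2)  # assumed 20% protein by mass

print(f"all mass as amino acids: {whole:.2e}")
print(f"assuming 20% protein:    {lean:.2e}")
```

Either way the answer lands within an order of magnitude of 10²⁴, which is the point of the estimate.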

Question 2: Why do humans eat beef but not become a cow? Eat fish but not become a fish?

We digest and metabolize proteins rather than absorb and mutate. Our proteins are protected and encoded.

Feedback…

The digestive system is a sophisticated demolition process. Proteases break proteins down into individual amino acids, stripping all biological identity. Your DNA then encodes precisely which proteins your ribosomes build from those recycled raw materials. The cow’s sequence is erased; your genome writes a new one. Sequence is everything — same 20 amino acids, completely different instructions.

Question 3: Why are there only 20 natural amino acids?

Because that represents a large number of variations — when combined with every possible sequence, the combinations are exponentially high in the billions.

Feedback…

Your combinatorics intuition is correct and actually undersells it. For a protein just 100 amino acids long, the number of possible sequences is 20¹⁰⁰ ≈ 10¹³⁰, vastly more than the roughly 10⁸⁰ atoms in the observable universe. The 20 canonical amino acids cover the essential chemical toolkit life needs: acids, bases, hydrophobics, polars, aromatics, and the structurally unique proline. The genetic code likely became frozen early, once ribosome and tRNA machinery co-evolved around these 20. Selenocysteine is sometimes called the 21st, and synthetic biology is actively expanding the code. The 20 are evolution’s solution, not chemistry’s limit.
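To make the size of sequence space concrete, here is a small sketch using Python's arbitrary-precision integers. The figure of 10⁸⁰ atoms is a commonly quoted rough estimate and is included only for scale.

```python
import math

ALPHABET_SIZE = 20   # canonical amino acids
LENGTH = 100         # residues in the example protein

n_sequences = ALPHABET_SIZE ** LENGTH       # exact integer, no overflow in Python
log10_sequences = math.log10(n_sequences)   # order of magnitude

ATOMS_LOG10 = 80  # rough, commonly quoted estimate for the observable universe

print(f"20^100 has {len(str(n_sequences))} digits (about 10^{log10_sequences:.1f})")
print(f"that is about 10^{log10_sequences - ATOMS_LOG10:.0f} sequences per atom")
```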

Question 4: Can you make other non-natural amino acids? Design some new ones.

Yes. Evolution converged on 20, but others may exist outside evolutionary pressure, possibly arising from light spectrum properties or geological timescales.

Feedback…

Absolutely yes. The Murchison meteorite contained over 70 amino acids, most of them non-canonical, suggesting that amino acid chemistry is far broader than what evolution selected for. Synthetic non-natural amino acids already in use include p-azidophenylalanine, whose azide group serves as a handle for site-specific bioconjugation; β-amino acids, which give protease-resistant backbones; D-amino acids, the mirror images of the natural L-forms; and fluorinated amino acids, whose altered hydrophobicity is useful in drug design. To design a novel amino acid, you manipulate the R-group side chain, for example by engineering metal-binding groups, photoactivatable groups responsive to specific light wavelengths, or click-chemistry handles. Zhang’s QTY Code reflects related thinking: it exploits structural mimicry between amino acids to repurpose the existing 20 in combinations not found in nature.

Question 5: Where did amino acids come from before enzymes that make them, and before life started?

Geological, light energy, electrical, and even gravitational forces were all at play, resulting in amino acid formation before life as we understand it emerged.

Feedback…

Your answer is well-grounded and captures the essential abiotic forces. The Miller-Urey experiment (1953) demonstrated that electrical sparks passed through a mixture of water vapor, methane, ammonia, and hydrogen, simulating the early Earth atmosphere, produced amino acids within days, with no enzymes required. Hydrothermal vents provide another pathway, where iron-sulfur mineral surfaces act as primitive catalysts, substituting for enzymes that didn’t yet exist. Extraterrestrial delivery is supported by amino acids found in the Murchison meteorite, the Tagish Lake meteorite, and asteroid Ryugu samples: amino acids can form in space, for example through UV photochemistry on interstellar ice grains, and then reach Earth on infalling meteorites and dust, which is where your gravitational point fits in. The Strecker synthesis, which combines hydrogen cyanide, ammonia, aldehydes, and water, produces amino acids abiotically from ingredients plausibly available on early Earth. The deeper chicken-and-egg puzzle points toward the RNA World hypothesis, in which RNA molecules that could both carry information and catalyze reactions preceded both proteins and DNA, bootstrapping the system before protein synthesis machinery existed.

Question 6: If you make an α-helix using D-amino acids, what handedness would you expect?

An alpha-helix creates a right-handed coil.

Feedback…

Your answer describes the natural L-amino acid case correctly, but D-amino acids change everything. D-amino acids are mirror images of L-amino acids, so an α-helix built entirely from D-amino acids is left-handed: a mirror image of the natural right-handed helix. The hydrogen bonding pattern and the rise per residue remain the same, but the twist inverts. Because protease active sites are shaped for L-substrates, D-peptides are poorly recognized by most proteases, which makes them highly stable in biological environments. They are actively investigated for drug delivery and therapeutics.
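The mirror-image argument can be checked geometrically with a toy model. The sketch below places points on an idealized helix using textbook α-helix values (100° of rotation and 1.5 Å of rise per residue, with an approximate 2.3 Å Cα radius), then reflects the coordinates through a plane, which is the geometric equivalent of swapping every L-residue for its D-enantiomer. All function names are illustrative, and this is a point trace rather than a physical model of a peptide.

```python
import math

def ideal_helix(n=12, radius=2.3, rise=1.5, turn_deg=100.0):
    """Points on a right-handed helix along +z (idealized C-alpha trace)."""
    step = math.radians(turn_deg)
    return [(radius * math.cos(step * k), radius * math.sin(step * k), rise * k)
            for k in range(n)]

def mirror(points):
    """Reflect through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in points]

def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def handedness(points):
    """Sign of the summed scalar triple product of consecutive bond vectors."""
    bonds = [_sub(q, p) for p, q in zip(points, points[1:])]
    total = sum(_dot(_cross(b0, b1), b2)
                for b0, b1, b2 in zip(bonds, bonds[1:], bonds[2:]))
    return "right" if total > 0 else "left"

l_helix = ideal_helix()
d_helix = mirror(l_helix)
print(handedness(l_helix), handedness(d_helix))
```

The reflection leaves every z-coordinate untouched, so the rise per residue is identical in both helices; only the sense of the twist changes.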

Question 7: Can you discover additional helices in proteins?

Yes, since a protein may have many evolutionary and disrupted or folded variations.

Feedback…

Yes, definitively. Beyond the classic right-handed α-helix, whose backbone hydrogen bonds link residue i to residue i+4, the 3₁₀ helix links residue i to residue i+3 and appears frequently at helix termini. The π-helix links residue i to residue i+5 and was long considered rare until computational analyses revealed that it appears more often than expected, frequently at functionally important sites. The polyproline helix is unique: proline’s rigid ring forces a backbone geometry with no internal hydrogen bonds at all. As AlphaFold2 has predicted structures for hundreds of millions of proteins, helical motifs continue to be found in new contexts, including intrinsically disordered proteins that adopt helical structure only upon binding a partner. The full catalog of biologically relevant helix types remains incompletely mapped.
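One compact way to compare these helix types is by their backbone hydrogen-bond offset: the carbonyl of residue i pairs with the amide N-H of residue i+n, with n = 3 for the 3₁₀ helix, 4 for the α-helix, and 5 for the π-helix. The sketch below simply enumerates those pairs for a short segment. The dictionary and function names are illustrative.

```python
# Offset n: the C=O of residue i accepts a hydrogen bond from the N-H of residue i+n.
HELIX_HBOND_OFFSET = {
    "3-10 helix": 3,
    "alpha helix": 4,
    "pi helix": 5,
    "polyproline helix": None,  # no intrahelical backbone hydrogen bonds
}

def hbond_pairs(helix_type, n_residues):
    """List (acceptor, donor) residue pairs, numbering residues from 1."""
    offset = HELIX_HBOND_OFFSET[helix_type]
    if offset is None:
        return []
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]

for name in HELIX_HBOND_OFFSET:
    print(f"{name:18s} {hbond_pairs(name, 8)}")
```

For the same eight residues, a tighter helix (smaller offset) fits more hydrogen bonds, and the polyproline helix fits none.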

Question 8: Why are most molecular helices right-handed?

Due to molecular electrical charge initiating primary bonds resulting in a right-handed twist, with left-handed helices possible under favorable conditions.

Feedback…

Your answer captures something real — charge and bonding geometry do play a role — but the deeper answer touches one of the most profound unsolved questions in science: the origin of biological homochirality. The immediate structural reason is that L-amino acid backbone geometry makes the right-handed α-helix the lowest energy conformation, minimizing steric clashes while maximizing hydrogen bonding. The deeper question is why life chose L-amino acids at all. Leading hypotheses include circularly polarized UV light from cosmic sources preferentially destroying one mirror-image form — directly relevant to your light spectrum point. Parity violation in the weak nuclear force creates a vanishingly small but real energy difference between D and L molecules that could bias outcomes over geological timescales. A third possibility is frozen accident — an early self-replicating system happened to use L-amino acids and locked that choice in permanently. The handedness of life’s helices may trace all the way back to a molecular asymmetry that originated in the cosmos.

Question 9: Why do β-sheets tend to aggregate? What is the driving force?

β-sheets aggregate because they are flat and linear in design with bonding properties, repeating in a pattern or weave.

Feedback…

Your structural intuition is correct. β-sheets have exposed edges where hydrogen bond capacity is unfulfilled — unlike α-helices where all hydrogen bonding is internally satisfied. These exposed edges are essentially sticky, actively seeking additional strands to bond with. Hydrophobic stacking between sheet faces amplifies this — flat geometry allows face-to-face stacking driven by hydrophobic interactions between side chains above and below the sheet plane. Critically, aggregation is cooperative — each new strand makes the next addition more energetically favorable, which is why β-sheet aggregation can accelerate explosively once a nucleus forms. This cooperativity underlies amyloid formation in Alzheimer’s, Parkinson’s, Type 2 diabetes, and prion diseases — thermodynamically stable, protease-resistant, insoluble aggregates that are pathologically destructive precisely because they cannot be cleared.

Question 10: Why do amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Amyloid diseases form β-sheets due to misfolding pathology. Since they are β-folds they are sticky and thermodynamically strong, difficult to clear — the same properties that would make an excellent material such as a synthetic cement.

Feedback…

Your answer is strong on both counts and your cement analogy is more apt than you might realize. Proteins can sample alternative folding pathways under stress — aging, mutation, pH shifts — and for many proteins the cross-β amyloid structure represents a thermodynamic energy minimum more stable than the native fold. Once a misfolded nucleus forms it templates surrounding proteins in a prion-like propagation. The body cannot degrade these structures because dense hydrogen bond networks and hydrophobic cores resist proteases, and insolubility makes them inaccessible to cellular clearing machinery. As materials, amyloid fibrils have tensile strength comparable to steel on a per-weight basis, self-assemble from solution without external energy, and are chemically stable across wide pH and temperature ranges. Demonstrated applications include hydrogels for tissue scaffolding, conductive fibrils coated with metal nanoparticles for bioelectronics, water filtration membranes, and amyloid-silica composites as structural cement-like materials. Curli fibers — naturally occurring bacterial amyloid from E. coli biofilms — have been engineered as programmable living materials that assemble on demand. Biology’s problem becomes materials science’s solution.

Question 11: Design a β-sheet motif that forms a well-ordered structure.

A motif that acts as a 3-dimensional weave on the x, y, and z axis — resulting in a textile stronger than a simple x,y weave, useful in environments requiring strong resistant materials like Kevlar or heat resistant tiles.

Feedback…

Your 3D weave concept maps closely onto real structural strategies in nature and materials engineering. Well-ordered β-sheet design relies on strict alternation of hydrophobic and hydrophilic residues along each strand, so that hydrophobic side chains pack face-to-face between sheets while hydrophilic side chains point outward into solvent. Edge-capping residues at strand termini prevent runaway aggregation. Turn sequences need geometrically precise residues: proline enforces bends, and glycine provides backbone flexibility. Biology already builds your 3D concept: β-barrel proteins in bacterial outer membranes curve and close into cylinders of remarkable stability. Spider silk embeds nanocrystalline β-sheet domains in an amorphous matrix, distributing stress in three dimensions and outperforming Kevlar in toughness on a weight-normalized basis by absorbing energy through controlled deformation rather than brittle fracture. Computationally designed β-sheet proteins from David Baker’s group include closed barrels and extended lattices not found in nature. Your heat-resistant tile analogy is sound in spirit: a heat shield distributes energy across a 3D network with no single catastrophic failure point, which is what a 3D β-sheet lattice would aim to achieve. The key engineering challenge is controlling z-axis assembly, using sequence-encoded electrostatic repulsion between sheet faces to set precise interlayer spacing rather than letting the structure collapse into amorphous aggregates.
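The alternation rule for a β-strand can be checked with a few lines of code. This is a minimal sketch: the hydrophobic set below is one simple convention (classifications vary between scales), the two test strands are made-up illustrations rather than validated designs, and a high score says nothing about whether a sequence would actually fold.

```python
# One simple hydrophobic/polar split; other scales classify some residues differently.
HYDROPHOBIC = set("AVILMFW")

def hp_pattern(seq):
    """Map a sequence to H (hydrophobic) / P (polar) letters."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq.upper())

def alternation_fraction(seq):
    """Fraction of adjacent residue pairs that switch between H and P."""
    pattern = hp_pattern(seq)
    if len(pattern) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(pattern, pattern[1:]))
    return switches / (len(pattern) - 1)

alternating = "VKVKVKV"   # hypothetical strand with strict alternation
blocky = "VVVKKKV"        # hypothetical strand with clustered residues

for strand in (alternating, blocky):
    print(strand, hp_pattern(strand), round(alternation_fraction(strand), 2))
```

In a strictly alternating strand, every other side chain points to the same face of the sheet, which is what produces one hydrophobic face and one hydrophilic face.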


Part B: Protein Analysis and Visualization

Briefly describe the protein you selected and why you selected it.

Identify the amino acid sequence of your protein.

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
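If the Colab notebook is not at hand, length and composition can also be computed locally. The sketch below is one possible approach using the standard library; the sequence shown is a made-up placeholder, to be replaced with the FASTA text of the selected protein.

```python
from collections import Counter

def read_sequence(fasta_text):
    """Join sequence lines from FASTA-style text, skipping '>' header lines."""
    return "".join(line.strip() for line in fasta_text.splitlines()
                   if not line.startswith(">")).upper()

def composition(seq):
    """Return (length, [(residue, count), ...]) sorted by descending count."""
    return len(seq), Counter(seq).most_common()

# Placeholder input, NOT a real protein: paste your own FASTA text here.
example = """>hypothetical_example
MKTLLL
AGAVK
"""

sequence = read_sequence(example)
length, ranked = composition(sequence)
print(f"length: {length}")
print(f"most frequent: {ranked[0][0]} ({ranked[0][1]} of {length})")
```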

How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

Does your protein belong to any protein family?

Identify the structure page of your protein in RCSB

When was the structure solved? Is it a good-quality structure? A good-quality structure is one with good resolution; the smaller the value, the better (e.g., Resolution: 2.70 Å).

Are there any other molecules in the solved structure apart from protein?

Does your protein belong to any structure classification family?

Open the structure of your protein in any 3D molecule visualization software:

PyMOL Tutorial Here (hint: ChatGPT is good at PyMOL commands)

Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

