Week 4 HW: Protein Design Part I

Part A. Conceptual Question

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Meat contains about 20–25% protein by weight. Using the lower bound, 500 g of meat contains approximately 500 g × 0.20 = 100 g of protein.

The average molecular weight of an amino acid is ~100 Da, and 1 Da = 1 g/mol, so the molar mass is ~100 g/mol.

100 g protein / 100 g/mol = 1 mol of amino acids, and 1 mol = 6.022 × 10^23 molecules (Avogadro's number).

Therefore, 500 g of meat contains ~6.022 × 10^23 amino acid molecules (up to ~7.5 × 10^23 if the meat is 25% protein).
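The same estimate can be written as a short Python sketch. It keeps the question's ~100 Da average per amino acid and evaluates both ends of the 20–25% protein range; the function and constant names are illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
AVG_AA_MOLAR_MASS = 100.0    # g/mol, from the ~100 Da average given in the question

def amino_acid_molecules(meat_g: float, protein_fraction: float) -> float:
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction     # grams of protein
    moles = protein_g / AVG_AA_MOLAR_MASS     # 1 Da corresponds to 1 g/mol
    return moles * AVOGADRO

low = amino_acid_molecules(500, 0.20)   # 20% protein: 100 g -> 1.0 mol
high = amino_acid_molecules(500, 0.25)  # 25% protein: 125 g -> 1.25 mol
print(f"~{low:.2e} to ~{high:.2e} amino acid molecules")
```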

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

When we eat food, our digestive system breaks it down into basic components.

1. Mechanical breakdown: chewing and stomach churning physically break food into smaller pieces.
2. Chemical breakdown: digestive enzymes disassemble complex molecules into simple ones.

Proteins - Amino acids
Carbohydrates - Simple sugars
Fats - Fatty acids and glycerol

Why don’t we become what we eat? When we eat beef or fish, our digestive system doesn’t absorb intact cow or fish cells. Instead,

Proteins are broken down into individual amino acids.
The original biological structure is completely dismantled.

Building blocks, not blueprints

Our body uses these basic nutrients as raw materials, not as templates.
The amino acids from beef or fish are just building blocks that our body reassembles according to our own DNA instructions.
Our genetic code determines how these building blocks are used.

DNA remains unchanged

The food we eat provides resources, but our DNA provides the blueprint.

3. Why are there only 20 natural amino acids?

Genetic code

DNA uses a triplet code (codons) made of 4 nucleotides (A, T, G, C). This creates 4³ = 64 possible three-letter combinations. Of these 64 codons, 61 code for amino acids and 3 are stop codons (signaling the end of protein synthesis). The 61 sense codons are redundant: most amino acids are specified by more than one codon, so the 61 codons map onto only 20 standard amino acids. The code could in principle specify up to 61, so the number 20 itself is better explained by the evolutionary and chemical factors below.
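The codon arithmetic can be checked by enumerating all 64 codons against the standard genetic code. This Python sketch encodes NCBI translation table 1 as a 64-character string in TCAG order (a common compact representation) and counts stop codons, sense codons, and distinct amino acids.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1): one letter per codon, with codons
# enumerated in TCAG order at positions 1, 2, 3; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

codon_table = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")
sense_codons = [c for c, aa in codon_table.items() if aa != "*"]
degeneracy = Counter(codon_table[c] for c in sense_codons)  # codons per amino acid

print(len(codon_table), "codons")              # 4**3 = 64
print(len(stop_codons), "stop:", stop_codons)  # TAA, TAG, TGA
print(len(sense_codons), "sense codons ->", len(degeneracy), "amino acids")
print("codons per amino acid:", dict(sorted(degeneracy.items())))
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have only one, which is the redundancy described above.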

Evolutionary Optimization

The number 20 represents an evolutionary balance between,

Functional diversity - Having enough variety to create proteins with diverse functions
Metabolic efficiency - Each amino acid requires specific biosynthetic pathways
Error tolerance - Too many similar amino acids would increase translation errors

Chemical Considerations

The 20 standard amino acids provide,

A good range of chemical properties (hydrophobic, hydrophilic, acidic, basic, etc.)
Sufficient structural diversity for protein folding and function
A balance between stability and reactivity

Beyond the Standard 20

There are actually a few exceptions to the “only 20” rule.
Selenocysteine (the 21st amino acid) - Incorporated through a special mechanism using a UGA codon (normally a stop codon)
Pyrrolysine (the 22nd amino acid) - Encoded by UAG (normally a stop codon); found in some methanogenic archaea and bacteria
Post-translational modifications - After protein synthesis, amino acids can be modified (phosphorylation, glycosylation, etc.)

4. Can you make other non-natural amino acids? Design some new amino acids.

Natural amino acids share a common structure.

An amino group (-NH₂)
A carboxylic acid group (-COOH)
A central carbon (α-carbon)
A variable side chain (R-group) that gives each amino acid its unique properties

Designed Non-Natural Amino Acids

Fluorophenylalanine

    H O
    | ‖
H₂N-C-C-OH
    |
    CH₂
    |
    C₆H₄F

Properties

Hydrophobic with enhanced stability
Fluorine introduces unique electronic properties
Potential applications in protein engineering for increased thermal stability

Azidolysine

    H O
    | ‖
H₂N-C-C-OH
    |
   (CH₂)₄
    |
    N₃

Properties

Contains an azide group for click chemistry applications
Can be used for site-specific protein modification
Useful for bioorthogonal labeling in living systems

Photoswitchable Amino Acid

    H O
    | ‖
H₂N-C-C-OH
    |
    CH₂
    |
    C₆H₄-N=N-C₆H₅

Properties

Contains an azobenzene group that changes conformation when exposed to light
Could create proteins with light-controllable functions
Applications in optogenetics and responsive biomaterials

Metal-Chelating Amino Acid

    H O
    | ‖
H₂N-C-C-OH
    |
    CH₂
    |
    N
   / \
  N   N
  ‖   ‖
 HC---CH

Properties

Contains a triazole ring that can chelate metal ions
Could create artificial metalloenzymes
Applications in catalysis and sensing

Boronic Acid Amino Acid

    H O
    | ‖
H₂N-C-C-OH
    |
    CH₂
    |
    C₆H₄-B(OH)₂

Properties

Contains a boronic acid group that binds to sugars and diols
Could create glucose-responsive proteins
Applications in diabetes monitoring and treatment
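As a quick property check on the designs above, their molar masses can be computed from molecular formulas. This Python sketch sums rounded standard atomic weights; the formulas are for the free amino acids and assume azidolysine carries a terminal azide (-N₃) in place of lysine's ε-amino group (6-azido-L-lysine) and that the triazolyl group is C₂H₂N₃ (true for any triazolyl ring isomer). Phenylalanine is included as a reference.

```python
import re

# Standard atomic weights (g/mol), rounded; only the elements needed here.
ATOMIC_MASS = {"H": 1.008, "B": 10.81, "C": 12.011,
               "N": 14.007, "O": 15.999, "F": 18.998}

def molar_mass(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C9H11NO2' (no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total

designs = {
    "Phenylalanine (reference)": "C9H11NO2",
    "4-Fluorophenylalanine": "C9H10FNO2",
    "Azidolysine": "C6H12N4O2",
    "Phenylazophenylalanine": "C15H15N3O2",
    "Triazolylalanine": "C5H8N4O2",
    "4-Boronophenylalanine": "C9H12BNO4",
}
for name, formula in designs.items():
    print(f"{name:28s} {formula:11s} {molar_mass(formula):7.2f} g/mol")
```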

Applications of Non-Natural Amino Acids

Introducing fluorinated or other modified amino acids can increase thermal and chemical stability.

Amino acids with reactive handles (azides, alkynes) allow for specific chemical modifications in complex biological environments.

Creating amino acids with metal-binding or other catalytic properties can generate proteins with new enzymatic activities.

Photoswitchable or environmentally responsive amino acids can create smart materials that change properties in response to stimuli.

These amino acids can be incorporated into proteins by genetic code expansion, most commonly by reassigning the amber stop codon (UAG) to the non-natural amino acid with an orthogonal aminoacyl-tRNA synthetase/tRNA pair, creating proteins with entirely new functions.
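On the DNA side, genetic code expansion by amber suppression starts by recoding the target position to the amber codon TAG; an orthogonal aminoacyl-tRNA synthetase/tRNA pair (not modeled here) then charges the suppressor tRNA with the non-natural amino acid. This Python sketch shows only the codon edit. The function name and checks are illustrative, and refusing a TAG terminal stop (so that TAA ends the gene instead) is a common design precaution rather than a strict requirement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def place_amber_codon(cds: str, residue: int) -> str:
    """Return the coding sequence with the codon for 1-based `residue` replaced by TAG."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        raise ValueError("expected an ATG start codon and a terminal stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("premature in-frame stop codon found")
    if codons[-1] == "TAG":
        raise ValueError("terminal stop is TAG; change it to TAA first")
    if not 2 <= residue <= len(codons) - 1:
        raise ValueError("residue must lie between the start and stop codons")
    codons[residue - 1] = "TAG"   # this position will read the non-natural amino acid
    return "".join(codons)

# Hypothetical toy gene: Met-Ala-Phe-Lys-stop; recode Phe3 to amber.
print(place_amber_codon("ATGGCTTTCAAATAA", 3))   # -> ATGGCTTAGAAATAA
```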

5. Where did amino acids come from before enzymes that make them, and before life started?

The emergence of amino acids in prebiotic contexts represents a critical juncture in chemical evolution preceding biological systems. Contemporary research indicates multiple parallel pathways for abiotic amino acid formation that do not require enzymatic catalysis or biological machinery (Ruiz-Mirazo et al., 2014). These pathways constitute essential components of what may be termed “metabolism-first” or “RNA-world” hypotheses regarding abiogenesis.

Exogenous Delivery Mechanisms

Carbonaceous Chondrite Analysis

Analyses of carbonaceous chondrites, particularly the CM2 class exemplified by the Murchison meteorite, have revealed a diverse complement of amino acids (>80 distinct species), including both proteinogenic and non-proteinogenic variants (Burton et al., 2012). Some of these compounds, notably α-methyl amino acids such as isovaline, show significant L-enantiomeric excesses, suggesting potential extraterrestrial origins for homochirality (Glavin & Dworkin, 2009).

Interstellar Medium Chemistry

Spectroscopic evidence indicates the presence of amino acid precursors in the interstellar medium. The detection of glycine in comet 67P/Churyumov-Gerasimenko by the Rosetta mission provides compelling evidence for amino acid synthesis in extraterrestrial environments (Altwegg et al., 2016).

Endogenous Synthesis Pathways

Electric Discharge Reactions

The seminal Miller-Urey experimental paradigm demonstrated the formation of multiple amino acids under simulated primitive Earth atmospheric conditions (CH4, NH3, H2, H2O) with electrical discharge. Contemporary reanalysis of preserved samples using high-sensitivity analytical techniques has identified >20 amino acids, significantly expanding the known products of such reactions (Johnson et al., 2008).

Hydrothermal Systems

Submarine hydrothermal systems provide thermodynamically favorable environments for amino acid synthesis.

CO2 + NH3 + H2 + Energy → amino acids + H2O

Alkaline hydrothermal vents, characterized by pH gradients and mineral catalysts (e.g., Fe(Ni)S precipitates), facilitate reductive amination and carbonylation reactions that yield amino acids (Russell & Martin, 2004; Huber & Wächtershäuser, 2006).

Formamide-Based Chemistry

Formamide (H2NCHO) serves as a versatile precursor for prebiotic synthesis under concentrated conditions. In the presence of various mineral catalysts (TiO₂, montmorillonite), formamide can generate multiple amino acids through condensation reactions during wet-dry cycling (Saladino et al., 2012).

Mechanistic Considerations

Strecker Synthesis

The Strecker reaction represents a particularly plausible prebiotic pathway.

R-CHO + HCN + NH3 → R-CH(NH2)-CN → (hydrolysis) R-CH(NH2)-COOH

This reaction proceeds readily under aqueous conditions with moderate concentrations of precursors and produces α-amino acids with high yield and selectivity (Patel et al., 2015).

Reductive Amination

Direct reductive amination of α-keto acids constitutes an alternative pathway.

R-CO-COOH + NH3 + H2 → (catalyst) R-CH(NH2)-COOH + H2O

Metal sulfides common in hydrothermal systems effectively catalyze this transformation (Novikov & Copley, 2013).
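The reaction schemes in this section can be checked for atom balance with a small Python sketch. It instantiates the general R group as H (formaldehyde to glycine via the Strecker route) or as CH₃ (pyruvate to alanine by reductive amination), and balances the CO₂/NH₃/H₂ hydrothermal scheme for glycine. These specific substrates and coefficients are illustrative choices, not taken from the cited studies.

```python
import re
from collections import Counter

def atoms(formula):
    """Count atoms in a condensed formula such as 'C2H5NO2' (no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def side_total(side):
    """Total atoms on one side, given [(coefficient, formula), ...]."""
    total = Counter()
    for coeff, formula in side:
        for element, n in atoms(formula).items():
            total[element] += coeff * n
    return total

def is_balanced(left, right):
    return side_total(left) == side_total(right)

reactions = {
    # HCHO + HCN + NH3 -> aminoacetonitrile + H2O
    "Strecker step 1": ([(1, "CH2O"), (1, "HCN"), (1, "NH3")],
                        [(1, "C2H4N2"), (1, "H2O")]),
    # aminoacetonitrile + 2 H2O -> glycine + NH3
    "Strecker step 2 (hydrolysis)": ([(1, "C2H4N2"), (2, "H2O")],
                                     [(1, "C2H5NO2"), (1, "NH3")]),
    # pyruvic acid + NH3 + H2 -> alanine + H2O
    "Reductive amination": ([(1, "C3H4O3"), (1, "NH3"), (1, "H2")],
                            [(1, "C3H7NO2"), (1, "H2O")]),
    # 2 CO2 + NH3 + 3 H2 -> glycine + 2 H2O
    "CO2 route to glycine": ([(2, "CO2"), (1, "NH3"), (3, "H2")],
                             [(1, "C2H5NO2"), (2, "H2O")]),
}
for name, (left, right) in reactions.items():
    print(f"{name}: balanced = {is_balanced(left, right)}")
```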

Transition to Protobiological Systems

The concentration and polymerization of prebiotically synthesized amino acids likely occurred through several mechanisms.

Mineral surface adsorption, particularly on clay minerals (montmorillonite) and metal oxides
Eutectic freezing in aqueous solutions
Dehydration-hydration cycles in fluctuating aqueous environments
Lipid protocell encapsulation

These processes potentially facilitated the formation of oligopeptides with rudimentary catalytic functions, preceding the emergence of encoded protein synthesis (Forsythe et al., 2015).

References

Altwegg, K., et al. (2016). Prebiotic chemicals—amino acid and phosphorus—in the coma of comet 67P/Churyumov-Gerasimenko. Science Advances, 2(5), e1600285.

Burton, A. S., et al. (2012). Understanding prebiotic chemistry through the analysis of extraterrestrial amino acids and nucleobases in meteorites. Chemical Society Reviews, 41(16), 5459-5472.

Forsythe, J. G., et al. (2015). Ester-mediated amide bond formation driven by wet-dry cycles: A possible path to polypeptides on the prebiotic Earth. Angewandte Chemie International Edition, 54(34), 9871-9875.

Glavin, D. P., & Dworkin, J. P. (2009). Enrichment of the amino acid L-isovaline by aqueous alteration on CI and CM meteorite parent bodies. Proceedings of the National Academy of Sciences, 106(14), 5487-5492.

Huber, C., & Wächtershäuser, G. (2006). α-Hydroxy and α-amino acids under possible Hadean, volcanic origin-of-life conditions. Science, 314(5799), 630-632.

Johnson, A. P., et al. (2008). The Miller volcanic spark discharge experiment. Science, 322(5900), 404-404.

Novikov, Y., & Copley, S. D. (2013). Reactivity landscape of pyruvate under simulated hydrothermal vent conditions. Proceedings of the National Academy of Sciences, 110(33), 13283-13288.

Patel, B. H., et al. (2015). Common origins of RNA, protein and lipid precursors in a cyanosulfidic protometabolism. Nature Chemistry, 7(4), 301-307.

Ruiz-Mirazo, K., et al. (2014). Prebiotic systems chemistry: new perspectives for the origins of life. Chemical Reviews, 114(1), 285-366.

Russell, M. J., & Martin, W. (2004). The rocky roots of the acetyl-CoA pathway. Trends in Biochemical Sciences, 29(7), 358-363.

Saladino, R., et al. (2012). Formamide chemistry and the origin of informational polymers. Chemistry & Biodiversity, 9(3), 427-440.

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Stereochemical Principles

The handedness of an α-helix is fundamentally determined by the stereochemistry of its constituent amino acids through their influence on backbone dihedral angles. In canonical protein structures composed of L-amino acids, the α-helix adopts a right-handed conformation with characteristic φ and ψ angles of approximately -57° and -47°, respectively.

This right-handed preference arises from,

The L-stereochemistry at the α-carbon
Minimization of steric repulsion between side chains
Optimization of hydrogen bonding geometry in the helical backbone

Mirror Symmetry in Protein Stereochemistry

When considering the substitution of all L-amino acids with their D-enantiomers, a fundamental principle of molecular symmetry applies: complete enantiomeric substitution produces the mirror image of the original structure.

This principle has been experimentally validated through,

X-ray crystallography of synthetic D-protein structures
NMR studies of D-peptide conformations
Computational modeling of D-amino acid oligomers

Handedness of D-Amino Acid α-Helices

Based on the mirror symmetry principle, an α-helix composed entirely of D-amino acids would adopt a left-handed helical conformation. This represents the precise mirror image of the standard right-handed α-helix found in natural proteins.

The D-amino acid α-helix would exhibit,

φ and ψ angles of approximately +57° and +47° (opposite signs to L-amino acid helices)
Identical hydrogen-bond lengths and the same i→i+4 bonding pattern, arranged in mirror-image geometry
Equivalent stability to the L-amino acid right-handed helix
Reversed helical twist direction when viewed along the helical axis
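The mirror-symmetry argument can be checked numerically: reflecting a set of atoms flips the sign of every torsion angle, which is why the D-helix φ/ψ values above are the L values with their signs reversed. This Python sketch uses the standard atan2 dihedral formula; the four coordinates are arbitrary illustrative points, not a real peptide.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four points (atan2 formulation)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def mirror(point):
    """Reflect a point through the yz-plane (x -> -x), giving the enantiomer."""
    x, y, z = point
    return (-x, y, z)

# Arbitrary illustrative coordinates (Å), not taken from a real structure.
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5), (0.8, 1.1, 2.0)]
original = dihedral(*atoms)
mirrored = dihedral(*(mirror(p) for p in atoms))
print(f"torsion {original:+.1f} deg -> mirror image {mirrored:+.1f} deg")

# The same sign flip applied to the L-helix angles gives the D-helix values.
phi_psi_L = (-57, -47)
phi_psi_D = tuple(-angle for angle in phi_psi_L)
print("L-helix (phi, psi):", phi_psi_L, "-> D-helix:", phi_psi_D)
```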

Experimental Evidence

Multiple experimental studies have confirmed this stereochemical relationship,

Milton et al. (1992) synthesized D-amino acid peptides that formed left-handed helices with CD spectra that were mirror images of their L-amino acid counterparts
Karle et al. (1994) determined crystal structures of D-amino acid helices, confirming their left-handed conformation
NMR studies by Kraszewski et al. (2001) demonstrated the left-handed helical preference of D-amino acid sequences

Thermodynamic Equivalence

In an achiral environment, the free energy landscapes of L-amino acid right-handed helices and D-amino acid left-handed helices are precisely equivalent, because the two structures are enantiomers.

This results in,

Identical helix-coil transition temperatures
Equivalent folding kinetics
Mirror-image circular dichroism spectra
Identical resistance to thermal and chemical denaturation

Biological Implications

The left-handed helical preference of D-amino acid peptides has significant implications for,

Design of proteolytically stable peptide therapeutics
Development of antimicrobial peptides with novel mechanisms
Creation of orthogonal protein folding systems
Understanding the evolutionary selection of L-amino acids in biological systems

7. Can you discover additional helices in proteins?

The canonical α-helix (3.6₁₃) and 3₁₀-helix represent the predominant helical motifs in natural proteins, accounting for approximately 40% of all protein secondary structure. However, the conformational space accessible to polypeptide chains permits additional helical architectures that may be rare in nature but accessible through directed evolution or rational design approaches.

Theoretical Helical Conformations

Ramachandran analysis reveals several energetically favorable regions that can support helical conformations beyond the canonical structures.

π-Helix (4.4₁₆)

Characterized by (φ,ψ) angles of approximately (-57°, -70°)
4.4 residues per turn with i→i+5 hydrogen bonding pattern
Wider diameter than α-helix with distinct side chain packing geometry
Often found as local distortions within α-helices at functional sites

2₇-Helix (Ribbon Structure)

Characterized by (φ,ψ) angles of approximately (-78°, 59°)
2 residues per turn with i→i+2 hydrogen bonding pattern
Forms an extended ribbon-like structure
Stabilized in specific sequence contexts with β-branched residues

Polyproline Helices

PPII helix: (φ,ψ) ≈ (-75°, 145°), left-handed, 3 residues per turn
PPI helix: (φ,ψ) ≈ (-75°, 160°), right-handed, more compact
Lack traditional backbone hydrogen bonding
Critical in structural proteins and signaling domains
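The subscript notation used in this section (3.6₁₃, 3₁₀, 4.4₁₆, 2₇) follows from the hydrogen-bond offset: an i→i+k bond closes a ring of 3k + 1 atoms, and the number before the subscript is residues per turn. This Python sketch derives the symbols and, where a rise per residue is supplied, the pitch (residues per turn × rise). The rise values for the 3₁₀ (2.0 Å) and π (1.15 Å) helices are approximate textbook figures added here as assumptions; 1.5 Å is the usual α-helix value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Helix:
    name: str
    residues_per_turn: float      # n in the n_m symbol
    hbond_offset: int             # k in a C=O(i) -> N-H(i+k) hydrogen bond
    rise: Optional[float] = None  # Å per residue, if known

    @property
    def ring_atoms(self) -> int:
        # O and C of residue i, then N, Cα, C for each of the k - 1 residues
        # in between, then N and H of residue i + k: 2 + 3(k - 1) + 2 = 3k + 1.
        return 3 * self.hbond_offset + 1

    @property
    def symbol(self) -> str:
        return f"{self.residues_per_turn:g}_{self.ring_atoms}"

    @property
    def pitch(self) -> Optional[float]:
        # Pitch (Å per turn) = residues per turn x rise per residue.
        return None if self.rise is None else self.residues_per_turn * self.rise

helices = [
    Helix("3_10-helix", 3.0, 3, rise=2.0),   # rise: approximate, assumed
    Helix("alpha-helix", 3.6, 4, rise=1.5),
    Helix("pi-helix", 4.4, 5, rise=1.15),    # rise: approximate, assumed
    Helix("2_7 ribbon", 2.0, 2),
]
for h in helices:
    pitch = "n/a" if h.pitch is None else f"{h.pitch:.1f} Å"
    print(f"{h.name:12s} {h.symbol:7s} i->i+{h.hbond_offset}  pitch {pitch}")
```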

Computational Discovery

Recent computational methods have expanded our ability to identify and characterize novel helical conformations.

Fragment-Based Mining

Analysis of the Protein Data Bank using backbone fragment clustering has revealed localized adoption of non-canonical helical conformations, particularly at functional interfaces and ligand-binding sites (Hollingsworth et al., 2018).

Molecular Dynamics Simulations

Enhanced sampling techniques including replica exchange molecular dynamics have identified metastable helical conformations that may serve as intermediates in protein folding pathways or as targets for stabilization through rational design (Morrone et al., 2011).

Quantum Mechanical Calculations

Ab initio calculations of model peptides have identified additional minima in the conformational energy landscape that correspond to theoretically viable helical structures not yet observed in natural proteins (Improta et al., 2015).

Experimental Approaches to Novel Helix Discovery

De Novo Protein Design

Computational design of proteins with non-canonical backbone geometries has successfully generated novel helical structures.

The design of α/β-peptide hybrid helices with altered hydrogen bonding patterns
Stabilization of π-helical segments through strategic side chain interactions
Creation of extended helical structures with non-natural amino acids

Constrained Peptides

Introduction of covalent constraints through,

Hydrocarbon stapling
Lactam bridge formation
Disulfide engineering

These approaches have yielded conformationally restricted peptides adopting novel helical geometries with enhanced stability.

High-Resolution Structural Studies

Advances in cryo-electron microscopy and X-ray crystallography have revealed previously uncharacterized helical conformations in membrane proteins and enzyme active sites, particularly those involving,

Local unwinding of canonical helices
Transitions between different helical types
Composite helical structures with mixed hydrogen bonding patterns

Recently Characterized Novel Helical Structures

ω-Helix

Recently characterized in certain transmembrane proteins
Exhibits approximately 6.0 residues per turn
Stabilized by specific side chain-backbone interactions
Appears to play functional roles in ion channel gating

Composite α/3₁₀ Helices

Dynamic structures with regions transitioning between α and 3₁₀ geometries
Often found at protein-protein interfaces
Provide conformational plasticity for molecular recognition

Cross-α Amyloid Helices

Recently characterized in functional amyloids and designed proteins
Composed of α-helices arranged in a cross-α pattern
Form extended fibrillar structures with novel properties

Functional Implications of Novel Helical Structures

Enzymatic Function

Non-canonical helices often appear at enzyme active sites, where their distinct geometries position catalytic residues optimally for function. The π→α transition in particular has been implicated in enzymatic mechanisms requiring conformational changes.

Ligand Recognition

Novel helical geometries create unique binding surfaces for molecular recognition, particularly evident in,

Transcription factor-DNA interactions
Protein-protein recognition interfaces
Membrane protein ligand binding sites

Therapeutic Applications

The design of peptides with non-canonical helical structures has yielded,

Enhanced proteolytic stability
Improved target binding affinity
Novel mechanisms of action for antimicrobial peptides
Inhibitors of protein-protein interactions previously considered “undruggable”

References

Hollingsworth, S. A., et al. (2018). Novel protein structural states: Beyond the native state. Protein Science, 27(9), 1589-1601.

Improta, R., et al. (2015). Understanding the role of stereoelectronic effects in determining collagen stability. 1. A quantum mechanical study of proline, hydroxyproline, and fluoroproline dipeptide analogues in aqueous solution. Journal of the American Chemical Society, 137(3), 1048-1055.

Morrone, J. A., et al. (2011). Molecular dynamics simulations of α, β, and γ-peptides. Biopolymers, 96(4), 506-522.

8. Why are most molecular helices right-handed?

Helical structures represent fundamental architectural elements in biological macromolecules, including proteins (α-helices), nucleic acids (DNA, RNA), and polysaccharides. A striking feature of these helical conformations is their predominant right-handed orientation. This stereochemical preference constitutes a critical aspect of biological homochirality and warrants systematic investigation regarding its physicochemical basis and evolutionary significance.

Stereochemical Determinants of Helical Handedness

Thermodynamic Considerations

The preference for right-handed helical conformations in biological macromolecules derives primarily from the stereochemical properties of their constituent monomers.

Proteins and L-Amino Acids

The intrinsic stereochemistry of L-amino acids fundamentally constrains the accessible conformational space of polypeptide chains. Analysis of Ramachandran plots reveals that L-amino acids preferentially occupy regions corresponding to right-handed helical conformations (φ ≈ -60°, ψ ≈ -45°). This preference arises from,

Minimization of steric repulsion between side chains and backbone elements
Optimization of backbone hydrogen bonding geometry
Favorable electrostatic interactions between adjacent peptide dipoles

Consistent with this, left-handed α-helices are rare in proteins built from L-amino acids; a survey of protein structures found only a small number of short examples (Novotny & Kleywegt, 2005). Energy calculations place the right-handed α-helix below its left-handed counterpart, by an estimated 1-2 kcal/mol per residue for L-amino acids.

Nucleic Acids and D-Ribose/D-Deoxyribose

The stereochemistry of D-configured sugars in nucleic acids similarly constrains the conformational landscape of polynucleotide chains.

The C3’-endo and C2’-endo sugar puckers predominant in RNA and DNA, respectively, geometrically favor right-handed helical arrangements
Base stacking interactions achieve optimal overlap in right-handed conformations
The asymmetric phosphodiester backbone geometry imposes torsional constraints favoring right-handedness

Kinetic Factors

Beyond thermodynamic stability, kinetic factors may contribute to the prevalence of right-handed helices.

Nucleation rates for right-handed helical structures may exceed those for left-handed conformations
Cooperative folding mechanisms potentially amplify initial chiral biases
Templated synthesis processes in biological systems propagate existing chirality

Molecular Examples of Helical Handedness

Protein Helices

α-Helix

The canonical α-helix in proteins composed of L-amino acids adopts a right-handed conformation with,

3.6 residues per turn
Rise of 1.5 Å per residue
i→i+4 hydrogen bonding pattern

The energetic preference for this conformation has been quantified through both experimental and computational approaches. Notably, attempts to design stable left-handed α-helices using L-amino acids have been largely unsuccessful without extensive non-natural modifications.

Collagen Triple Helix

The collagen triple helix represents a specialized right-handed superhelical structure composed of three left-handed polyproline II-like helices. This complex architecture demonstrates how hierarchical organization can produce composite helical structures with mixed handedness.

Nucleic Acid Helices

B-DNA

The predominant form of DNA under physiological conditions adopts a right-handed double helix with,

10.5 base pairs per turn
Rise of 3.4 Å per base pair
Major and minor groove asymmetry

Z-DNA

Z-DNA represents a notable exception as a left-handed helical conformation, occurring in specific sequence contexts (alternating purine-pyrimidine sequences) and under high salt conditions. The energetic penalty for this conformation highlights the intrinsic preference for right-handedness in nucleic acid structures.
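Handedness can also be read directly from geometry. This Python sketch places points on an ideal helix and classifies its sense from the sign of the triple product of three consecutive steps, which for an ideal helix equals the sign of rise × sin(twist). It also computes pitch as units per turn × rise, using 3.6 residues per turn and 1.5 Å rise for the α-helix and 10.5 bp per turn and 3.4 Å rise for B-DNA. The Z-DNA numbers (≈12 bp per turn, ≈3.7 Å rise) are approximate values assumed here, and the radius is left at 1 because it does not affect handedness.

```python
import math

def helix_points(n, twist_deg, rise, radius=1.0):
    """Ideal helix about the z-axis: unit i at angle i * twist, height i * rise."""
    return [(radius * math.cos(math.radians(i * twist_deg)),
             radius * math.sin(math.radians(i * twist_deg)),
             i * rise) for i in range(n)]

def handedness(points):
    """Classify helix sense from three consecutive steps.

    For an ideal helix the triple product v1 . (v2 x v3) has the sign of
    rise * sin(twist), so with positive rise and |twist| < 180 degrees it is
    positive for right-handed and negative for left-handed helices.
    """
    a, b, c, d = points[:4]
    v1, v2, v3 = ([q - p for p, q in zip(s, e)] for s, e in ((a, b), (b, c), (c, d)))
    cross = (v2[1] * v3[2] - v2[2] * v3[1],
             v2[2] * v3[0] - v2[0] * v3[2],
             v2[0] * v3[1] - v2[1] * v3[0])
    triple = sum(x * y for x, y in zip(v1, cross))
    return "right-handed" if triple > 0 else "left-handed"

# (signed units per turn, rise in Å per unit); a negative count means the
# helix turns the opposite way, i.e. left-handed.
helices = {
    "alpha-helix": (3.6, 1.5),
    "B-DNA": (10.5, 3.4),
    "Z-DNA": (-12.0, 3.7),   # approximate values, assumed for illustration
}
for name, (per_turn, rise) in helices.items():
    twist = 360.0 / per_turn        # signed twist per unit, degrees
    pitch = abs(per_turn) * rise    # Å per full turn
    sense = handedness(helix_points(4, twist, rise))
    print(f"{name:12s} twist {twist:+6.1f} deg  pitch {pitch:5.1f} Å  {sense}")
```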

Evolutionary Implications

Origins of Biological Homochirality

The predominance of right-handed helices in biological macromolecules is intrinsically linked to the homochirality of their constituent monomers (L-amino acids, D-sugars). Several hypotheses address the evolutionary origins of this homochirality.

Stochastic models - Random fluctuations amplified through autocatalytic processes
Deterministic models - Physical forces (e.g., weak nuclear force) imposing slight energetic preferences
Extraterrestrial origins - Delivery of enantiomerically enriched compounds via meteorites

The selection of L-amino acids and D-sugars may have been reinforced by their compatibility in forming stable right-handed helical structures essential for early biological functions.

Functional Advantages

Stereochemical compatibility between interacting macromolecules (e.g., protein-DNA recognition)
Efficient packing in higher-order assemblies
Directional properties exploited in molecular machines and motility systems
Reduced conformational heterogeneity, simplifying the folding energy landscape

Exceptions and Their Significance

Left-handed Z-DNA: Forms under specific conditions and may serve regulatory functions
PPII helices: Left-handed polyproline II helices play important structural roles
D-amino acid-containing peptides: Some naturally occurring antimicrobial peptides incorporate D-amino acids, potentially adopting left-handed helical segments

These exceptions often occur in specialized biological contexts and frequently serve distinct functional roles, suggesting evolutionary exploitation of alternative helical geometries for specific purposes.

References

Novotny, M., & Kleywegt, G. J. (2005). A survey of left-handed helices in protein structures. Journal of Molecular Biology, 347(2), 231-241.

Pauling, L., Corey, R. B., & Branson, H. R. (1951). The structure of proteins: Two hydrogen-bonded helical configurations of the polypeptide chain. Proceedings of the National Academy of Sciences, 37(4), 205-211.

Saenger, W. (1984). Principles of nucleic acid structure. Springer-Verlag.

Blackmond, D. G. (2010). The origin of biological homochirality. Cold Spring Harbor Perspectives in Biology, 2(5), a002147.

Ho, B. K., & Brasseur, R. (2005). The Ramachandran plots of glycine and pre-proline. BMC Structural Biology, 5(1), 14.

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

In a β-strand, backbone groups (C=O and N–H) are not fully satisfied unless they form inter-strand hydrogen bonds. If strands are exposed (especially from partially unfolded proteins), they can pair with other strands from neighboring molecules, forming extended β-sheets and eventually fibrils.

Forces for β-sheet aggregation

Backbone hydrogen bonding

Strong directional H-bond networks form between strands. Cross-β assemblies maximize these interactions over long distances.

Hydrophobic effect

Many aggregation-prone segments are rich in hydrophobic residues. Association buries nonpolar side chains, releasing ordered water and lowering free energy.

π-stacking and side-chain packing (“steric zipper”)

In amyloid-like aggregates, side chains from facing sheets interdigitate tightly. Aromatic and branched residues stabilize dry, tightly packed interfaces.

Electrostatics

Reduced charge repulsion (near isoelectric pH, high salt) promotes association. Favorable salt bridges can further stabilize aggregates.

Entropy compensation via solvent release

Although protein conformational entropy decreases on aggregation, this is offset by:

release of bound water,
favorable enthalpy from H-bonds and packing.

Net result: negative ΔG under permissive conditions.
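The enthalpy-entropy balance described above can be written as one free-energy expression, splitting the entropy change into a conformational term (negative) and a solvent term (positive). This is a schematic decomposition, not a fitted model.

```latex
\Delta G_{\mathrm{agg}} = \Delta H_{\mathrm{agg}} - T\left(\Delta S_{\mathrm{conf}} + \Delta S_{\mathrm{solv}}\right),
\qquad \Delta S_{\mathrm{conf}} < 0, \quad \Delta S_{\mathrm{solv}} > 0

\Delta G_{\mathrm{agg}} < 0 \iff
\underbrace{-\Delta H_{\mathrm{agg}} + T\,\Delta S_{\mathrm{solv}}}_{\text{H-bonds, packing, water release}}
> \underbrace{T\,\lvert \Delta S_{\mathrm{conf}} \rvert}_{\text{lost chain freedom}}
```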

α-helices satisfy many H-bonds within one chain (intramolecular). β-strands often rely on inter-strand bonding, making intermolecular association easy when exposed. Thus partially unfolded proteins with exposed β-prone segments are high risk for aggregation.

β-sheet aggregation is driven primarily by formation of extensive intermolecular backbone hydrogen-bond networks, reinforced by hydrophobic collapse and tight side-chain packing (steric zipper), which together lower free energy despite loss of conformational entropy.

10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Why many amyloid diseases involve β-sheets

Many disease proteins (Aβ, tau, α-synuclein, prion protein, etc.) misfold into cross-β architecture because that state is often a very stable low-energy arrangement for exposed peptide backbones.

Backbone H-bond maximization - Unfolded/misfolded chains align into β-strands and form extensive intermolecular hydrogen-bond networks.
Hydrophobic residue burial - Aggregation hides nonpolar side chains from water.
Steric zipper packing - Side chains from adjacent β-sheets interdigitate tightly, creating highly stable “dry” interfaces.
Nucleation-polymerization kinetics - Once a small nucleus forms, fibrils elongate rapidly by templating additional monomers.
Proteostasis overload with age/stress - Reduced chaperone capacity, impaired proteasome/autophagy, mutations, PTMs, and concentration effects increase misfolding/aggregation risk.

So, β-sheet-rich amyloid is not random—it is a thermodynamically favorable and kinetically self-propagating endpoint for many aggregation-prone proteins.

Can amyloid β-sheets be used as materials?

Yes. Amyloid-like fibrils are actively explored as high-performance biomaterials.

They are attractive because of their,

High mechanical stiffness/strength (nanoscale fibrillar reinforcement)
Self-assembly from short peptides under mild conditions
Nanostructured order (high aspect ratio fibers, tunable networks)
Chemical tunability via sequence design and functionalization
Biocompatibility potential (depends on sequence, purity, endotoxin control)

Current applications

Tissue engineering scaffolds (cell adhesion and guidance)
Hydrogels for drug delivery and wound dressings
Nano/bioelectronics (templating conductive or semiconductive components)
Filtration/adsorption membranes (including heavy-metal capture)
Catalytic and enzyme-immobilization platforms
Sustainable structural composites

For biomedical use, designers use engineered, non-toxic amyloid-like peptides/proteins, not disease-associated toxic oligomers. Safety depends on,

sequence choice,
aggregation state control (avoid toxic intermediates),
immunogenicity/toxicology testing,
degradability and clearance profiles.

11. Design a β-sheet motif that forms a well-ordered structure.

1) Sequence (12 residues) KLVFFAEVKLVF

2) Why this works

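In an extended strand, side chains alternate between the two faces of the sheet. Here the even positions (L2, F4, A6, V8, L10, F12) form a fully hydrophobic face, while K1, E7, and K9 sit on the opposite face.
Hydrophobic core residues (L, V, F, F, A, V, L, V, F) promote sheet–sheet packing.
Charged residues (K, E) improve solubility and allow controlled electrostatic pairing.
The KLVFFAE segment is the central hydrophobic core of amyloid-β (residues 16–22) and forms cross-β fibrils on its own, so this sequence family strongly favors cross-β ordering.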

3) Expected structural organization

Each peptide adopts an extended β-strand.
Strands align into in-register β-sheets, parallel or antiparallel depending on conditions (the isolated KLVFFAE segment is reported to favor antiparallel alignment).
Sheets stack via hydrophobic/aromatic faces to form a steric-zipper-like interface.
Net result - ordered fibrillar assemblies with repeating β-sheet lattice.

4) Conditions to obtain well-ordered structure

Peptide concentration: 0.2–1.0 mM
Buffer: 10 mM phosphate, pH 7.4
Salt: 50–150 mM NaCl
Temperature: 20–25°C
Slow incubation: 24–72 h (quiescent or mild shaking)

5) How to verify order

CD spectroscopy: β-sheet minimum near 218 nm
FTIR (amide I): strong band near 1620–1630 cm⁻¹
ThT fluorescence: increased signal during fibril formation
TEM/AFM: long unbranched fibrils
X-ray fiber diffraction: cross-β reflections (~4.7 Å and ~10–11 Å)

6) A safer de novo (non-amyloid-disease-derived) motif

Use an alternating polar/nonpolar strand such as:

(FKFE)3 → FKFEFKFEFKFE

This is a classic engineered ionic-complementary β-sheet peptide that forms ordered nanofibers/hydrogels under suitable ionic conditions.
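The face argument for both candidate sequences can be checked with a short Python sketch: in an extended β-strand consecutive side chains point to opposite faces, so taking every other residue gives each face. The hydrophobic residue set and the side-chain-only charge count (K/R = +1, D/E = -1, His and termini ignored) are simplifying assumptions; this is a quick screen, not a structure prediction.

```python
HYDROPHOBIC = set("AVLIMFWY")                 # assumed hydrophobic set for this sketch
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}   # side chains near pH 7; His and termini ignored

def strand_faces(seq: str):
    """Split an extended strand into its two faces (alternating residues)."""
    return seq[0::2], seq[1::2]

def hydrophobic_fraction(residues: str) -> float:
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)

def net_charge(seq: str) -> int:
    return sum(CHARGE.get(r, 0) for r in seq)

for seq in ("KLVFFAEVKLVF", "FKFEFKFEFKFE"):
    face_a, face_b = strand_faces(seq)
    print(f"{seq}: face A {face_a} ({hydrophobic_fraction(face_a):.0%} hydrophobic), "
          f"face B {face_b} ({hydrophobic_fraction(face_b):.0%} hydrophobic), "
          f"net side-chain charge {net_charge(seq):+d}")
```

The (FKFE)3 peptide gives one all-Phe face and one alternating K/E face with zero net charge, which is the ionic-complementary pattern described above.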

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

TP53 was selected because it encodes p53, a central tumor-suppressor protein that protects cells from genomic instability. When DNA is damaged, p53 helps decide whether the cell should pause division to allow repair, activate repair genes, or undergo apoptosis if the damage is severe. This makes TP53 one of the most important regulators of cell fate under stress and a key factor in preventing cancer development. It is an appropriate choice for study because its function is clearly linked to genome maintenance, and defects in this pathway are widely associated with human disease.

TP53 is also highly relevant to space research. During spaceflight, astronauts are exposed to ionizing radiation and other physiological stresses that can increase DNA damage and disrupt normal cellular regulation. Since p53 coordinates major DNA-damage responses, studying TP53 helps researchers understand how human cells respond to space conditions and how cancer risk may change during long-duration missions. Knowledge of TP53 behavior in these environments can support the development of radiation-risk biomarkers, protective countermeasures, and safer health strategies for future lunar and Mars exploration.

2. Identify the amino acid sequence of your protein.
Protein: Cellular tumor antigen p53
Gene: TP53
Organism: Homo sapiens
UniProt accession: P04637
Canonical isoform: Isoform 1 (393 aa)

How long is it? What is the most frequent amino acid? (You can use this Colab notebook to count the frequency of amino acids.)
The canonical human TP53 protein (UniProt accession P04637, isoform 1) is 393 amino acids long. Based on amino acid composition analysis of this sequence, the most frequent amino acid is proline (P). This is consistent with the structural organization of p53, which contains a proline-rich region that contributes to protein–protein interactions and stress-response signaling.
Figure 1: IMEA surgface.png
Figure 2: IMEA with water.png
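The length and most frequent residue can also be checked with a few lines of Python outside the course notebook. This sketch assumes the sequence is pasted as a single-record FASTA string (for example, the FASTA text from the UniProt entry for P04637); the helper names are illustrative and are not part of the Colab notebook.

```python
from collections import Counter


def read_fasta_sequence(text: str) -> str:
    """Join the sequence lines of a single-record FASTA string, skipping the '>' header."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    ).upper()


def aa_composition(seq: str):
    """Return (length, [(residue, count), ...]) sorted from most to least frequent."""
    counts = Counter(seq)
    return len(seq), counts.most_common()


if __name__ == "__main__":
    fasta = ">toy example\nPPAK\nP\n"  # replace with the P04637 FASTA text
    length, ranked = aa_composition(read_fasta_sequence(fasta))
    print(length, ranked[0])
```

With the real P04637 FASTA, `length` should match the 393 aa reported by UniProt, and `ranked[0]` gives the most frequent residue and its count.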
How many protein sequence homologs are there for your protein? Hint: Use UniProt’s BLAST tool to search for homologs. For human TP53 (P04637), the number of homologs returned by UniProt BLAST is not a fixed value; it depends on the search settings (database scope, E-value cutoff, maximum number of hits, identity threshold, and whether fragments or unreviewed entries are included). Because the p53 family is strongly conserved across vertebrates and other taxa, a TP53 search typically returns a large number of homologs. For an academic report, state the exact count produced in your run and document the parameters used (e.g., UniProtKB subset searched, E-value threshold, maximum hits, date of access, and any filtering), because these settings directly determine the final homolog number. BLAST tool link: https://www.uniprot.org/blast/uniprotkb/ncbiblast-R20260309-080235-0613-77801143-p1m/overview
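Because the homolog count depends on the filters applied, one way to make it reproducible is to recount hits from an exported results table with explicit thresholds. The sketch below assumes the hits are in NCBI BLAST+ tabular format (`-outfmt 6`, default 12 columns); UniProt's web export uses different columns, so the column indices would need adjusting. The default cutoffs are illustrative, not recommended values.

```python
# Assumed column layout of BLAST+ -outfmt 6 (default fields):
# 0 qseqid, 1 sseqid, 2 pident, 3 length, 4 mismatch, 5 gapopen,
# 6 qstart, 7 qend, 8 sstart, 9 send, 10 evalue, 11 bitscore
SSEQID, PIDENT, EVALUE = 1, 2, 10


def count_homologs(lines, max_evalue=1e-5, min_identity=0.0):
    """Count distinct subject sequences with at least one row passing both filters."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        cols = line.rstrip("\n").split("\t")
        if float(cols[EVALUE]) <= max_evalue and float(cols[PIDENT]) >= min_identity:
            hits.add(cols[SSEQID])
    return len(hits)
```

For example, `with open("p53_hits.tsv") as fh: count_homologs(fh, max_evalue=1e-10, min_identity=40.0)` (the filename is hypothetical) gives a count whose meaning is fully defined by the two thresholds, which can then be reported alongside it.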
Does your protein belong to any protein family? Yes. Based on the UniProt BLAST results, TP53 belongs to the p53 protein family. The BLAST search returned 250 significant matches (if 250 was the maximum-hits setting for this run, the true number of homologs may be larger), and the top hits from human, bonobo, chimpanzee, gorilla, and other primates are all identified as “cellular tumor antigen p53” with the gene name TP53. These matches show very high sequence identity (about 96–100%) and reported E-values of 0, which indicates strong homology and high conservation across species. Therefore, the BLAST results support that the selected protein is a member of the p53 family.

3. Identify the structure page of your protein in RCSB

When was the structure solved? Is it a good quality structure? (A good-quality structure is one with good resolution; the smaller the value, the better. Reference resolution: 2.70 Å.)
The selected entry, 2OCJ, has a crystallographic resolution of 2.20 Å, which is better (smaller) than the 2.70 Å reference value. By this criterion, 2OCJ is a high-quality structure suitable for reliable interpretation of side-chain geometry and specific protein–DNA contacts. If a different PDB entry for TP53 is chosen, report its deposition/release year and compare its resolution to the 2.70 Å threshold in the same way.
Are there any other molecules in the solved structure apart from protein? Yes. In a representative human TP53 structure such as the p53 DNA‑binding core domain bound to DNA (RCSB PDB ID: 2OCJ), the model includes non‑protein components in addition to the protein chain(s). Specifically,
- Double‑stranded DNA (the p53 response element), forming a protein–DNA complex.
- A structural zinc ion (Zn2+), which stabilizes the p53 core domain fold.
- Crystallographic waters.
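The non-protein components listed above can be enumerated directly from the downloaded coordinate file. This sketch assumes a legacy fixed-column PDB file (not mmCIF), DNA residue names DA/DC/DG/DT, and water named HOH; modified amino acids would be reported under "other".

```python
# Standard amino acids and DNA nucleotides as named in PDB files.
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
DNA_BASES = {"DA", "DC", "DG", "DT"}


def classify_residues(pdb_lines):
    """Count distinct residues per category in ATOM/HETATM records.

    Returns (counts, other_names): counts maps protein/dna/water/other to the
    number of distinct residues; other_names lists residue names in 'other'.
    """
    groups = {"protein": set(), "dna": set(), "water": set(), "other": set()}
    for line in pdb_lines:
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, etc.
        res_name = line[17:20].strip()  # residue name, columns 18-20
        key = (line[21], line[22:26].strip(), res_name)  # chain, residue number, name
        if res_name in AMINO_ACIDS:
            groups["protein"].add(key)
        elif res_name in DNA_BASES:
            groups["dna"].add(key)
        elif res_name == "HOH":
            groups["water"].add(key)
        else:
            groups["other"].add(key)
    counts = {name: len(members) for name, members in groups.items()}
    other_names = sorted({key[2] for key in groups["other"]})
    return counts, other_names
```

Running it on the downloaded file (e.g. `with open("2ocj.pdb") as fh: print(classify_residues(fh))`) should list ZN under "other" if the structural zinc is present in the model.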
Does your protein belong to any structure classification family? Yes. The DNA-binding core domain of TP53 belongs to established structural classification families. On RCSB entries for the human p53 core domain (e.g., 2OCJ), the domain is annotated as the p53 DNA-binding domain and mapped to standard classification resources: it adopts the p53 DNA-binding domain fold (a β-sandwich core with loops forming the DNA interface and a structural Zn2+ site), it is assigned to Pfam PF00870 (p53 DNA-binding domain), and it is grouped with other transcription-factor DNA-binding domains in the p53-like transcription factor superfamily in SCOP/SCOPe and under a β-sandwich architecture in CATH.

4. Open the structure of your protein in any 3D molecule visualization software:

PyMOL Tutorial Here (hint: ChatGPT is good at PyMOL commands)
Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
Color the protein by secondary structure. Does it have more helices or sheets?
Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
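Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Steps performed in PyMOL:
- Removed the solvent
- Added hydrogens (h_add)
- Oriented the view (orient)
- Displayed the structure as cartoon, with cartoon smooth loops set on
- Ribbon view of polymer.protein and polymer.nucleic
- Displayed the zinc ion (residue ZN) as a sphere
- Colored by secondary structure:
  PyMOL>color salmon, ss h (helices: 738 atoms colored)
  PyMOL>color yellow, ss s (sheets: 2139 atoms colored)
  PyMOL>color cyan, ss '' (atoms with no helix or sheet assignment, i.e. loops: 8716 atoms colored)

More atoms are colored as sheet (2139) than as helix (738), so the structure has more β-sheet than α-helix content.
Helix res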
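As a sequence-level complement to coloring by residue type, the residues can be binned into chemical classes and their fractions compared. The grouping below is an assumption (conventions differ for Gly, Cys, Pro, His, and Tyr), and it says nothing about spatial distribution, which still has to be judged from the 3D view (for example, whether hydrophobic residues are buried in the β-sandwich core).

```python
from collections import Counter

# Assumed residue classes (one common convention; others differ for G, C, P, H, Y).
RESIDUE_CLASS = {
    **dict.fromkeys("AVILMFWPGC", "hydrophobic"),
    **dict.fromkeys("STNQYH", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KR", "basic"),
}


def class_fractions(seq: str) -> dict:
    """Fraction of residues in each class; unknown letters (e.g. X) are skipped."""
    classes = [RESIDUE_CLASS[aa] for aa in seq.upper() if aa in RESIDUE_CLASS]
    if not classes:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(classes)
    total = len(classes)
    return {name: counts[name] / total for name in ("hydrophobic", "polar", "acidic", "basic")}


if __name__ == "__main__":
    print(class_fractions("AVKDS"))
```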

Part C. Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them on your chosen protein.

Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a Colab instance with a GPU. Choose your favorite protein from the PDB. We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

Picture Source: Bordin, Nicola et al. (2023). Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, 48(4), 345–359.

Deep Mutational Scans: Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language-model likelihoods. Can you explain any particular pattern? (Choose a residue and a mutation that stands out.) (Bonus) Find sequences for which experimental scans exist, and compare the language model's predictions to the experimental data.
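A common way to turn ESM2 outputs into a mutational scan is to mask each position, read the predicted amino-acid probabilities, and score every substitution as log p(mutant) minus log p(wild type) at that position (often called a masked-marginal score). The sketch below covers only this scoring step; it assumes the per-position probabilities have already been extracted from the model and does not call ESM2 itself. The function names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def to_log_probs(prob_rows):
    """Convert per-position probability dicts ({aa: p}) to natural-log values."""
    return [{aa: math.log(p) for aa, p in row.items()} for row in prob_rows]


def dms_scores(wt_seq, log_probs):
    """Score every single substitution as log p(mut) - log p(wt) at its position.

    log_probs[i][aa] is assumed to be the model's log-probability of `aa` at
    position i (0-based) with that position masked. Negative scores mean the
    model finds the mutant less likely than the wild-type residue.
    """
    scores = {}
    for i, wt in enumerate(wt_seq):
        row = log_probs[i]
        for mut in AMINO_ACIDS:
            if mut != wt:
                scores[(i, wt, mut)] = row[mut] - row[wt]
    return scores


def most_disfavored(scores, n=5):
    """Return the n substitutions with the lowest (most negative) scores."""
    return sorted(scores.items(), key=lambda item: item[1])[:n]
```

Positions are 0-based here, so add 1 when naming a mutation in UniProt numbering (e.g. index 174 corresponds to residue 175). Strongly negative scores clustered at one position usually indicate a residue the model treats as conserved.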
Latent Space Analysis: Use the provided sequence dataset to embed proteins in a reduced-dimensional space. Analyze the neighborhoods that form: do they correspond to similar proteins? Place your protein in the resulting map and explain its position and its similarity to its neighbors.
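One way to check whether neighborhoods in the 2D map reflect real similarity is to compare them with nearest neighbors in the full embedding space. This sketch assumes one fixed-length embedding per protein (for example, a mean-pooled ESM2 representation) stored in a dict keyed by protein name; it uses cosine similarity and performs no dimensionality reduction.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, embeddings, k=5):
    """Return the k (name, similarity) pairs most similar to `query`."""
    ranked = sorted(
        ((name, cosine_similarity(query, vec)) for name, vec in embeddings.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]
```

If the query protein is itself in `embeddings`, it ranks first with similarity 1.0; the following entries are the neighbors to compare against its position in the reduced map.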