Week 4: Protein Design, Part 1

Basics

Questions

Amino Acids, Protein Structure, and β-Sheets

1. How many amino acid molecules do you take in when you eat a 500 g piece of meat?

It depends on the type and cut of meat, but a reasonable estimate is about 20–26 g of protein per 100 g of beef. Therefore, 500 g of meat would provide approximately 100–130 g of protein.

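If we take ~100 g/mol as a rounded average mass per amino acid residue (the commonly cited average is closer to 110 g/mol, so this slightly overestimates the count), this corresponds to about 1.0–1.3 moles of amino acid residues.

Since 1 mole contains 6.022 × 10²³ molecules (Avogadro's number), this means: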

1.0 mole ≈ 6 × 10²³ amino acid residues
1.3 moles ≈ 8 × 10²³ amino acid residues

So, eating 500 g of meat gives you on the order of 6 × 10²³ to 8 × 10²³ amino acid units.
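The arithmetic above fits in a few lines of Python. The 100 g/mol residue mass and the 20–26 g protein range are the rounded assumptions from the text; they are exposed as parameters so the estimate can be redone with other values. The function name is illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole


def residues_in_meat(meat_g, protein_per_100g, residue_mass=100.0):
    """Estimate the number of amino acid residues in a portion of meat.

    residue_mass is the assumed average residue mass in g/mol (rounded, as in the text).
    """
    protein_g = meat_g * protein_per_100g / 100.0  # grams of protein in the portion
    moles = protein_g / residue_mass               # moles of residues
    return moles * AVOGADRO                        # number of residues


low = residues_in_meat(500, 20)
high = residues_in_meat(500, 26)
print(f"{low:.1e} to {high:.1e} residues")  # 6.0e+23 to 7.8e+23 residues
```

Passing residue_mass=110 instead gives roughly 5.5 × 10²³ to 7.1 × 10²³ residues, the same order of magnitude.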

2. Why don't humans become cows when they eat beef, or become fish when they eat fish?

Humans do not become the organisms they eat because food is first digested. Proteins from beef or fish are broken down into amino acids and small peptides in the digestive system. These small molecules are then absorbed and reused by the body to build human proteins, following the instructions encoded in human DNA.

In other words, the body does not copy the identity of the food organism. It only reuses its chemical building blocks.

3. Why are there only 20 natural amino acids?

There are not literally only 20 amino acids in nature, but there are 20 standard amino acids that are universally encoded by the genetic code in most proteins.

These 20 were likely selected during early evolution because they provide:

a broad range of chemical properties
good structural diversity
compatibility with the ribosome
efficient use in the genetic code

They include hydrophobic, polar, charged, aromatic, small, and flexible side chains, which together allow proteins to fold and function in many different ways.

There are also rare exceptions, such as selenocysteine and pyrrolysine, but the core set remains the same.
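The idea of "efficient use in the genetic code" can be made concrete by counting how the 64 codons of the standard code map onto 20 amino acids plus stop signals. This sketch encodes NCBI translation table 1 as a 64-character string in TCAG order (a common compact representation); codons are written with DNA letters (T rather than U).

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

degeneracy = Counter(CODON_TABLE.values())  # codons per amino acid (and per stop)
amino_acids = sorted(aa for aa in degeneracy if aa != "*")

print(len(CODON_TABLE), "codons")        # 64 codons
print(len(amino_acids), "amino acids")   # 20 amino acids
print(degeneracy["*"], "stop codons")    # 3 stop codons
print("Leu/Ser/Arg codons:", degeneracy["L"], degeneracy["S"], degeneracy["R"])  # 6 6 6
```

The table also shows where the rare exceptions fit: selenocysteine and pyrrolysine are inserted at recoded UGA and UAG stop codons, respectively, rather than having codons of their own.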

4. Can you make other non-natural amino acids? Design some new amino acids.

Yes, scientists can make non-natural or non-canonical amino acids. These are useful in chemistry, protein engineering, and synthetic biology.

Examples of designed amino acids

Fluoro-leucine
Similar to leucine, but with one or more side-chain hydrogens replaced by fluorine (for example, a CF₃ group in place of a CH₃ group). This could change hydrophobicity and stability.
Photo-switch amino acid
An amino acid with an azobenzene group in its side chain, allowing it to change shape when exposed to light.
Metal-binding amino acid
An amino acid containing a bipyridine-like side chain that can bind metal ions such as copper or zinc.
Redox amino acid
An amino acid with a quinone or ferrocene-like group that could participate in electron transfer.
Click-ready amino acid
An amino acid containing an azide or alkyne group for bioorthogonal “click” chemistry.

These new amino acids could give proteins new properties such as:

light responsiveness
selective chemical reactivity
conductivity
catalytic activity
metal binding
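One quick way to compare a designed amino acid with its natural parent is its molar mass. The sketch below computes average masses from simple molecular formulas; 5,5,5-trifluoroleucine (one CH₃ replaced by CF₃) and azidohomoalanine (an azide-bearing, click-ready residue) are used as concrete instances of the fluoro-leucine and click-ready ideas above. The atomic weights are rounded standard values, and the parser only handles formulas without parentheses.

```python
import re

# Standard atomic weights (g/mol), rounded
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998, "S": 32.06}


def formula_mass(formula):
    """Average molar mass of a simple formula such as 'C6H13NO2' (no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return total


examples = {
    "leucine": "C6H13NO2",
    "5,5,5-trifluoroleucine (one CH3 -> CF3)": "C6H10F3NO2",
    "azidohomoalanine (azide, click-ready)": "C4H8N4O2",
}
for name, formula in examples.items():
    print(f"{name}: {formula_mass(formula):.2f} g/mol")
```

Swapping three hydrogens for fluorines adds about 54 g/mol, a sizeable change for a single side chain.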

5. Where did amino acids come from before enzymes that make them, and before life started?

Before life began, amino acids likely formed through prebiotic chemistry. This means they were produced by natural chemical reactions without enzymes or living cells.

Possible sources include:

reactions in the early Earth atmosphere
hydrothermal systems
lightning or UV-driven chemistry
meteorites and extraterrestrial delivery

This suggests that amino acids may have existed before life and later became incorporated into the first biological systems. Enzymes appeared later and made these processes faster and more controlled.

6. If you make an α-helix using D-amino acids, what handedness would you expect?

A normal α-helix made from L-amino acids is usually right-handed.

If the helix were made from D-amino acids, it would be expected to form a left-handed α-helix, which is the mirror image of the normal structure.
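The mirror-image argument can be checked numerically. The sketch builds an idealized right-handed α-helix Cα trace (approximate textbook parameters: 100° of rotation and 1.5 Å of rise per residue, 2.3 Å radius), reflects it through a plane (as happens when every residue is swapped for its D-enantiomer), and compares the signed Cα–Cα–Cα–Cα pseudo-dihedral. A right-handed helix gives a positive value (about +50° here); the mirror image gives the same magnitude with the opposite sign.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (IUPAC sign convention)."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def ideal_helix_ca(n, radius=2.3, turn_deg=100.0, rise=1.5):
    """Idealized right-handed alpha-helix C-alpha trace (approximate parameters)."""
    return [(radius * math.cos(math.radians(turn_deg * i)),
             radius * math.sin(math.radians(turn_deg * i)),
             rise * i) for i in range(n)]


def mirror(points):
    """Reflect through the yz-plane (x -> -x), giving the mirror-image structure."""
    return [(-x, y, z) for x, y, z in points]


l_helix = ideal_helix_ca(4)   # stands in for an all-L helix
d_helix = mirror(l_helix)     # stands in for the all-D mirror image
print(round(dihedral(*l_helix), 1), round(dihedral(*d_helix), 1))
```

The same sign flip applies to the backbone φ/ψ angles: the mirror image of a helix with φ ≈ −57°, ψ ≈ −47° has φ ≈ +57°, ψ ≈ +47°.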

7. Can you discover additional helices in proteins?

Yes. Besides the classical α-helix, proteins and peptides can adopt other helical forms.

Examples include:

3₁₀-helices
π-helices
left-handed helices in special contexts
synthetic helical structures designed in peptides and foldamers

It is possible to discover or design additional helices by studying unusual protein structures, computational modeling, and synthetic peptide chemistry.
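One computational route is to classify helices by their backbone hydrogen-bond spacing: α-helices use i → i+4 bonds, 3₁₀-helices i → i+3, and π-helices i → i+5. The sketch below is a simplified version of the "two consecutive n-turns" rule used by DSSP (Kabsch and Sander), with DSSP's letters H, G, and I. It takes the H-bond list as given (detecting H-bonds from coordinates is omitted), and letting α override the other types where they overlap is a simplifying assumption.

```python
HELIX_CODE = {3: "G", 4: "H", 5: "I"}  # 3-10, alpha, pi helices (DSSP letters)


def assign_helices(n_res, hbonds):
    """hbonds: set of (i, j) pairs meaning the C=O of residue i accepts
    an H-bond from the N-H of residue j (0-based indices)."""
    labels = ["-"] * n_res
    # Later passes overwrite earlier ones, so alpha (n=4) wins overlaps here.
    for n in (5, 3, 4):
        turns = {i for i in range(n_res) if (i, i + n) in hbonds}
        for i in range(1, n_res):
            if i in turns and (i - 1) in turns:  # two consecutive n-turns
                for k in range(i, min(i + n, n_res)):
                    labels[k] = HELIX_CODE[n]
    return "".join(labels)


alpha_hbonds = {(i, i + 4) for i in range(8)}    # ideal i -> i+4 pattern, 12 residues
three10_hbonds = {(i, i + 3) for i in range(9)}  # ideal i -> i+3 pattern, 12 residues
print(assign_helices(12, alpha_hbonds))    # -HHHHHHHHHH-
print(assign_helices(12, three10_hbonds))  # -GGGGGGGGGG-
```

An unusual H-bond pattern in a solved structure, one that fits none of these spacings, is the kind of signal that could point to a helix type not covered by the standard classes.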

8. Why are most molecular helices right-handed?

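In biology, most protein helices are right-handed because proteins are built mainly from L-amino acids. The stereochemistry of L-amino acids favors the formation of right-handed α-helices.

So the preference is not random: it arises from the chirality of the molecular building blocks. The same logic applies to DNA, whose right-handed double helix is built from D-deoxyribose sugars. Chirality does not force every helix to be right-handed, though: the polyproline II helices of individual collagen chains are left-handed even though they are made of L-amino acids.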

9. Why do β-sheets tend to aggregate?

β-sheets tend to aggregate because their peptide backbones can form extensive hydrogen-bonding networks between neighboring strands. These interactions are repetitive and highly stabilizing.

Also, β-strands often expose side chains in an alternating pattern, which makes them good at packing together into larger assemblies such as fibrils.
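The alternating pattern can be quantified with a hydrophobic moment (Eisenberg's idea): each residue's hydropathy value is treated as a vector rotated by δ per residue, with δ ≈ 180° for a β-strand and ≈ 100° for an α-helix, and the vectors are summed. This sketch uses the Kyte–Doolittle scale and an unnormalized sum, both choices of convenience (Eisenberg's original work used a different consensus scale). The test sequence is the alternating Val/Lys/Glu peptide from question 13.

```python
import math

# Kyte-Doolittle hydropathy values (one common scale; other scales give other numbers)
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}


def hydrophobic_moment(seq, angle_deg):
    """Magnitude of the sum of h_n * (cos(n*delta), sin(n*delta)) over the sequence."""
    delta = math.radians(angle_deg)
    sin_sum = sum(KD[aa] * math.sin(n * delta) for n, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(n * delta) for n, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum)


seq = "VKVEVKVE"  # alternating hydrophobic / charged residues
for label, angle in (("beta strand", 180), ("alpha helix", 100)):
    print(label, round(hydrophobic_moment(seq, angle), 2))
```

For a perfectly alternating sequence the moment at 180° reaches its maximum possible value (the sum of the absolute hydropathies), which is a numerical way of saying that one face of the strand is hydrophobic and the other is polar.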

10. What is the driving force for β-sheet aggregation?

The main driving forces are:

hydrogen bonding between peptide backbones
hydrophobic interactions between side chains
release of water molecules from the interface, which increases solvent entropy

Together, these effects make β-sheet assemblies, especially cross-β structures, very stable.

11. Why do many amyloid diseases form β-sheets?

Many amyloid diseases involve proteins that misfold and then assemble into β-sheet-rich fibrils. The cross-β structure is very stable and can grow by recruiting additional misfolded protein molecules.

This makes β-sheet aggregation a common structural feature in diseases such as:

Alzheimer’s disease
Parkinson’s disease
Huntington’s disease
other protein misfolding disorders

12. Can you use amyloid β-sheets as materials?

Yes. Amyloid β-sheet assemblies can be used as functional biomaterials because they are often:

strong
stable
self-assembling
nanoscale and highly ordered

Potential applications include:

tissue engineering scaffolds
nanomaterials
functional coatings
drug delivery systems
bio-inspired structural materials

So although amyloids are linked to disease, they can also be useful when carefully designed and controlled.

13. Design a β-sheet motif that forms a well-ordered structure.

A good β-sheet design should encourage:

β-strand formation
regular side-chain patterning
controlled intermolecular interactions
reduced disorder at the ends

Example 1: Amphipathic β-strand peptide

Sequence:
Ac–Val-Lys-Val-Glu-Val-Lys-Val-Glu–NH2

Why this may work

Val promotes β-strand structure and creates a hydrophobic face.
Lys and Glu create a charged face.
Oppositely charged residues can form salt bridges.
The alternating arrangement supports ordered packing.
N-terminal acetylation and C-terminal amidation reduce end effects.
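The charge balance of this design can be estimated with the Henderson–Hasselbalch equation. The pKa values below are approximate textbook-style numbers (tables differ, and real values shift with the local environment), and the function name is illustrative. Because the peptide is acetylated and amidated, the terminal charges are switched off.

```python
# Approximate pKa values (assumed; they vary between tables and with context)
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.25}
PKA_N_TERM, PKA_C_TERM = 9.0, 2.0


def net_charge(seq, ph=7.0, free_n_term=True, free_c_term=True):
    """Estimate net charge from side chains and (optionally) free termini."""
    charge = 0.0
    for aa in seq:
        if aa in PKA_BASIC:    # fraction protonated, +1 each
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:  # fraction deprotonated, -1 each
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    if free_n_term:
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    if free_c_term:
        charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    return charge


# Example 1: Ac-VKVEVKVE-NH2, both termini capped
for ph in (2.0, 7.0, 12.0):
    print(ph, round(net_charge("VKVEVKVE", ph, free_n_term=False, free_c_term=False), 2))
```

Near pH 7 the two Lys and two Glu residues roughly cancel, so the net charge is close to zero, consistent with oppositely charged residues pairing up. At low or high pH the peptide carries a net charge, which would be expected to oppose assembly.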

Example 2: More aggregation-prone fibril-forming motif

Sequence:
Ac–Phe-Val-Phe-Val-Lys-Glu-Phe-Val–NH2

Why this may work

Phe and Val strongly favor packing and aggregation.
Aromatic residues may strengthen intermolecular interactions.
Lys and Glu add charge, improving solubility and partly balancing the strong drive to assemble.

This sequence may form fibrils more easily, but it also carries a higher risk of uncontrolled aggregation.

Example 3: β-hairpin motif with defined turn

Sequence:
RGKWTWQ–D-Pro-Gly–QWTVKGR

Why this may work

The D-Pro-Gly pair promotes a defined hairpin turn.
The strands can align in a controlled intramolecular β-sheet.
Aromatic and charged residues can help stabilize folding and packing.

This design is often more controlled than open-ended fibril-forming strands.
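To see which residues face each other across the two strands, the sketch pairs residues outward from the turn. It assumes an idealized hairpin in which the residues immediately flanking D-Pro-Gly pair with each other and the strands stay in register moving away from the turn; real hairpins can deviate from this. Writing D-Pro as lowercase "p" is an illustrative convention chosen here, and the function name is hypothetical.

```python
def hairpin_pairs(strand1, turn, strand2):
    """Cross-strand partners for an idealized hairpin with a short turn.

    Assumes the residues immediately flanking the turn pair with each other and
    the strands stay in register moving away from the turn (a simplification).
    Returns (position, residue, position, residue) tuples with 1-based positions.
    """
    seq = strand1 + turn + strand2
    last1 = len(strand1) - 1            # strand-1 residue next to the turn
    first2 = len(strand1) + len(turn)   # strand-2 residue next to the turn
    pairs = []
    for k in range(min(len(strand1), len(strand2))):
        i, j = last1 - k, first2 + k
        pairs.append((i + 1, seq[i], j + 1, seq[j]))
    return pairs


# "p" marks D-Pro here
for pair in hairpin_pairs("RGKWTWQ", "pG", "QWTVKGR"):
    print(pair)
```

Under this assumption, Trp6 sits directly across from Trp11 and Trp4 across from Val13, which is one way the aromatic residues mentioned above could stabilize the fold.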

Briefly describe the protein you selected and why you selected it.

Identify the amino acid sequence of your protein. I am interested in proteins that enable movement or are relevant to designing soft robotics, so I selected the following proteins:

Silk-Elastin-Like Proteins (SELPs)

Silk-Elastin-Like Proteins (SELPs) are a class of genetically engineered, chimeric biopolymers that combine the structural and mechanical properties of silk (specifically Bombyx mori silk fibroin) with the elasticity and thermo-responsiveness of elastin. Because they are produced with recombinant DNA technology, these proteins can be precisely tailored for applications in biomedical engineering, drug delivery, and tissue engineering.

Silk-Elastin-Like Proteins (SELPs) are engineered block copolymers built from repeating amino acid sequences of silk, typically GAGAGS (Gly-Ala-Gly-Ala-Gly-Ser), and elastin, often GVGVP (Gly-Val-Gly-Val-Pro, a cyclic permutation of the classic elastin repeat VPGVG, Val-Pro-Gly-Val-Gly). A common, highly studied monomer unit combines both motifs in one repeat, pairing elastic and structural properties.

Composition: SELPs are block copolymers consisting of alternating silk-like motifs (typically GAGAGS) and elastin-like motifs (typically GVGVP).
Self-assembly: In aqueous solutions, SELPs form micelle-like nanoparticles, with the hydrophobic silk blocks forming the core and the hydrophilic elastin blocks forming the corona.
Stimuli-responsiveness: SELPs are “smart” materials that respond to environmental triggers, most notably temperature, but also pH, ionic strength, and light.
Mechanical properties: The silk-to-elastin ratio determines the mechanical behavior. Higher silk content increases β-sheet formation, resulting in stiffer materials, while higher elastin content increases flexibility.
Production: SELPs are produced in E. coli expression systems, allowing high control over sequence, molecular weight, and monodispersity, which improves reproducibility compared to natural materials.
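The silk-to-elastin ratio can be computed directly for a given repeat design. The sketch builds a sequence from the GAGAGS and GVGVP blocks named above; the helper names and the repeat counts in the example are hypothetical choices for illustration, not a specific published SELP.

```python
SILK_BLOCK = "GAGAGS"    # silk-like motif from the text
ELASTIN_BLOCK = "GVGVP"  # elastin-like motif from the text


def selp_sequence(silk_repeats, elastin_repeats, n_units):
    """Hypothetical helper: n_units copies of (silk block x m)(elastin block x n)."""
    unit = SILK_BLOCK * silk_repeats + ELASTIN_BLOCK * elastin_repeats
    return unit * n_units


def silk_fraction(silk_repeats, elastin_repeats):
    """Fraction of residues in one repeat unit that belong to silk-like blocks."""
    silk = len(SILK_BLOCK) * silk_repeats
    elastin = len(ELASTIN_BLOCK) * elastin_repeats
    return silk / (silk + elastin)


# Two hypothetical designs: elastin-rich vs. silk-rich
for m, n in ((2, 8), (4, 2)):
    seq = selp_sequence(m, n, n_units=3)
    print(f"silk x{m}, elastin x{n}: {len(seq)} residues, "
          f"silk fraction {silk_fraction(m, n):.2f}")
```

Following the trend described above, the silk-rich design would be expected to form more β-sheet and be stiffer, while the elastin-rich one would be more flexible.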

Picture: https://www.ncbi.nlm.nih.gov/core/lw/2.0/html/tileshop_pmc/tileshop_pmc_inline.html?title=Click%20on%20image%20to%20zoom&p=PMC3&id=7736173_nihms-1638934-f0007.jpg

Chambre L, Martín-Moldes Z, Parker RN, Kaplan DL. Bioengineered elastin- and silk-biomaterials for drug and gene delivery. Adv Drug Deliv Rev. 2020;160:186-198. doi: 10.1016/j.addr.2020.10.008. Epub 2020 Oct 17. PMID: 33080258; PMCID: PMC7736173.

Collagen

Collagen’s primary amino acid sequence is characterized by a repeating, unique motif, most commonly Glycine-Proline-X or Glycine-X-Hydroxyproline, where Glycine appears every third residue. These ~1,000 amino acid-long chains form a triple helix, rich in glycine, proline, and hydroxyproline.

Repeating units: Gly-X-Y triplets define the primary sequence and make up a significant portion of the chain.
Glycine (Gly): Occurs at every third position and is essential for the tight packing of the triple helix.
Proline (Pro) and Hydroxyproline (Hyp): The X and Y positions are most often occupied by Proline (about 28% of X positions) and Hydroxyproline (about 38% of Y positions), respectively.
Hydroxyproline and Hydroxylysine: These post-translationally modified residues are important for collagen stability and assembly. Hydroxyproline stabilizes the triple helix (through stereoelectronic effects on ring pucker as well as hydrogen bonding), and hydroxylysine provides sites for cross-linking and glycosylation.
Structure: Three left-handed polyproline II-type helices intertwine to create a right-handed superhelical triple helix, known as tropocollagen.
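The Gly-X-Y rule is easy to check on a sequence. The sketch tests whether Gly occupies every third position and reports how often a residue appears at the X and Y sites. It assumes the sequence starts in frame at a Gly; the model peptide is hypothetical, and "O" is used for hydroxyproline, a collagen-literature convention (in the IUPAC one-letter code, "O" denotes pyrrolysine).

```python
def gly_every_third(seq):
    """True if Gly occupies positions 1, 4, 7, ... (sequence assumed to start at Gly)."""
    return all(aa == "G" for aa in seq[0::3])


def xy_occupancy(seq, residue):
    """Fraction of X sites and of Y sites occupied by `residue` in Gly-X-Y triplets."""
    x_sites, y_sites = seq[1::3], seq[2::3]
    return x_sites.count(residue) / len(x_sites), y_sites.count(residue) / len(y_sites)


# Hypothetical model peptide; "O" = hydroxyproline here
peptide = "GPOGAOGPAGPO"
print(gly_every_third(peptide))    # True
print(xy_occupancy(peptide, "P"))  # (0.75, 0.0)
print(xy_occupancy(peptide, "O"))  # (0.0, 0.75)
```

Run on a real collagen chain, the X-site Pro and Y-site Hyp fractions should land near the percentages quoted above, provided Hyp is marked in the sequence (standard database sequences list it as Pro).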

Common collagen types differ in chain composition: Type I consists of two α1(I) chains and one α2(I) chain, while Type III consists of three α1(III) chains.
Resource found in: https://www.google.com/search?q=Identify+the+amino+acid+sequence+of+colageno+protein+&sca_esv=cad53a9b66261df5&rlz=1C5CHFA_enDE1097DE1097&biw=1821&bih=914&sxsrf=ANbL-n5wRETnA3Zz3e6j-U716rO5RCHt9w%3A1773161894209&ei=pk2waf2_DLmdhbIPsJzG8Ac&ved=0ahUKEwj9qtXm5pWTAxW5TkEAHTCOEX4Q4dUDCBE&uact=5&oq=Identify+the+amino+acid+sequence+of+colageno+protein+&gs_lp=Egxnd3Mtd2l6LXNlcnAiNUlkZW50aWZ5IHRoZSBhbWlubyBhY2lkIHNlcXVlbmNlIG9mIGNvbGFnZW5vIHByb3RlaW4gMgcQIRigARgKMgcQIRigARgKMgcQIRigARgKMgcQIRigARgKMgcQIRigARgKSIlbUABYqFNwAXgBkAEAmAHfAqABjByqAQcxLjguNy4yuAEDyAEA-AEC-AEBmAIToALRHMICBhAAGBYYHsICBRAAGO8FwgIIEAAYogQYiQXCAggQABiABBiiBMICCxAAGIAEGIYDGIoFwgIFECEYoAHCAgUQIRifBcICBhAhGBUYCsICBBAhGBWYAwDiAwUSATEgQJIHBzIuOC43LjKgB4ResgcHMS44LjcuMrgHzRzCBwUyLjkuOMgHLYAIAA&sclient=gws-wiz-serp

Actin and myosin

Actin and myosin are highly conserved, complex proteins, with actin typically comprising 374-376 amino acids and myosin (specifically the heavy chain) being a much larger molecule (~2000+ residues). Due to their size and various isoforms, they are generally identified by their full sequences in protein databases (like UniProt) rather than a single short string.

Below are the key details regarding their amino acid sequences based on rabbit skeletal muscle, which is the standard reference:

Actin Amino Acid Sequence (Rabbit Skeletal Muscle)

Actin is a 374-residue protein with a highly conserved sequence. It includes an unusual methylated histidine residue (3-methylhistidine).

Key features: High proportion of proline and glycine.
Sequence data source: The complete sequence was first determined by Elzinga et al. (1973).
Isoforms: While highly conserved, actin differs between skeletal, cardiac, and cytoplasmic isoforms (e.g., about 25 amino acid differences between skeletal and cytoplasmic actin).

Myosin Amino Acid Sequence (Heavy Chain/S1 Fragment)

Myosin is a large motor protein (a hexamer of 2 heavy chains and 4 light chains). The functional motor domain is the S1 fragment (the globular head).

Active site sequence: A key 20-residue peptide containing the active site has been identified in Acanthamoeba and rabbit skeletal myosin, with sequences such as Thr-Glu-Asn-Thr-Me2Lys-Lys (Me2Lys = dimethyllysine).
Fragment identification: Maita et al. identified a 92-residue fragment of the globular head that contains the reactive SH-1 and SH-2 thiol groups.
Motor domain: The motor domain of myosin II comprises approximately 700–800 amino acids at the N-terminus of the heavy chain.

Key Structural Sites (Interaction Points)

The interaction between actin and myosin involves specific binding sites on both proteins:
Actin-binding site on myosin: Located on the S1 head, this region involves multiple hinged segments that change shape to drive contraction.
Myosin-binding site on actin: The interaction involves specific residues that can be mapped using peptide fragments.
Loop 4/CM loop: Specific loops on the myosin head are critical for binding to actin.

For the full, exact sequences, search a protein database such as UniProt for “Rabbit skeletal muscle actin” or “Human Beta-Myosin Heavy Chain”.
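Sequences can also be pulled programmatically instead of copied by hand. This sketch assumes UniProt's REST endpoint of the form https://rest.uniprot.org/uniprotkb/<accession>.fasta and uses only the Python standard library; the function names are illustrative. The composition helper can be used to check claims such as the proline and glycine content noted for actin.

```python
import urllib.request
from collections import Counter


def fetch_uniprot_fasta(accession):
    """Download one FASTA record from UniProt (needs network access)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header, parts = "", []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header or parts:
                break  # stop at the start of a second record
            header = line[1:]
        elif line:
            parts.append(line)
    return header, "".join(parts)


def composition(seq, residues="PG"):
    """Fraction of the sequence made up of each requested residue."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in residues}
```

For example, `header, seq = parse_fasta(fetch_uniprot_fasta("Q9UBF9"))` should retrieve the canonical record for myotilin (the protein listed in these notes), and `composition(seq)` then reports its Pro and Gly fractions; the actin and myosin accessions found through the searches above can be used the same way.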

Some images from the AlphaFold Protein Structure Database:

Myotilin Monomer AF-Q9UBF9-2-F1-v6

Protein: Myotilin
Gene: MYOT
Source organism: Homo sapiens
UniProt: Q9UBF9-2
Experimental structures: 2 PDB structures for Q9UBF9-2
Global quality: average pLDDT 77.06 (High)
Sequence length: 314