Week 4 HW: Protein Design


Homework 4

Protein Design Part I: amino acids, protein structure, helices, and β-sheets.

📋 Parts

     ╔═════════════════════════════════════════════════════════════╗
     ║  🧬 PROTEIN DESIGN PART I: α-helices & β-sheets 🧬          ║
     ║                                                             ║
     ║        ╭───╮     right-handed α-helix                       ║
     ║       ╱  ●  ╲    (L-amino acids)                            ║
     ║      │ ●   ● │                                              ║
     ║       ╲  ●  ╱                                               ║
     ║        ╰───╯                                                ║
     ║                                                             ║
     ║     ════╲  ╱════    β-sheet (pleated, H-bonded)             ║
     ║          ╲╱                                                 ║
     ║     ════╱  ╲════                                            ║
     ║                                                             ║
     ║   "20 amino acids → infinite folds. Part A + Part B below!" ║
     ╚═════════════════════════════════════════════════════════════╝

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang (i.e. you can select two to skip).

Answers provided for 11 questions; 2 skipped: "Can you make other non-natural amino acids? Design some new amino acids." and "Design a β-sheet motif that forms a well-ordered structure."


1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Answer: Meat is roughly 15–25% protein by fresh (wet) weight; most of the rest is water and fat. For a rough estimate, assume 500 g of meat contains ~100 g of protein (≈20%). An average amino acid has a molecular mass of ~100 Daltons (Da), i.e. ~100 g/mol.

  • Number of amino acids in 100 g protein ≈ 100 g / (100 g/mol) ≈ 1 mol ≈ 6 × 10²³ molecules (Avogadro’s number).

Order of magnitude: ~6 × 10²³, i.e. 10²³–10²⁴ amino acid molecules per 500 g of meat.
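The estimate above can be reproduced in a few lines. This is a minimal sketch: the 20% protein fraction and the ~100 g/mol average mass are the rough assumptions stated in the answer, and the function name is only illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole

def amino_acid_molecules(meat_g, protein_fraction=0.20, residue_mass_g_per_mol=100.0):
    """Estimate how many amino acid molecules a piece of meat contains."""
    protein_g = meat_g * protein_fraction          # grams of protein
    moles = protein_g / residue_mass_g_per_mol     # moles of amino acids
    return moles * AVOGADRO

n = amino_acid_molecules(500)
print(f"{n:.2e} amino acid molecules")
```

Changing `protein_fraction` to 0.15 or 0.25 moves the result by only about ±25%, so the order of magnitude is unaffected.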


2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Answer: Dietary proteins are digested into amino acids and small peptides before absorption. They are absorbed as monomers, not as intact proteins.

  • Digestion: Stomach acid and proteases (pepsin, trypsin, chymotrypsin) hydrolyze peptide bonds.
  • Absorption: Amino acids enter the bloodstream and are used as building blocks.
  • Assembly: Our cells use these amino acids to synthesize our own proteins according to our genome. The cow’s or fish’s DNA is never used; only the amino acid monomers are reused.

Result: We use the amino acids as nutrients; we do not incorporate the cow’s or fish’s proteins or genes intact. We remain human because our protein synthesis is controlled by human DNA.


3. Why are there only 20 natural amino acids?

Answer: The genetic code is degenerate: 61 sense codons encode 20 standard amino acids. The number 20 reflects a balance of evolutionary and physicochemical constraints:

  • Evolution: Early life likely used a smaller set of amino acids; the canonical 20 were added over time as biosynthesis pathways evolved.
  • Sufficiency: 20 amino acids provide enough chemical diversity (hydrophobic, polar, charged, aromatic, etc.) to build proteins with diverse structures and functions.
  • Genetic code: The triplet code (4³ = 64 codons) could in principle encode more than 20 amino acids, but each addition would require a new aminoacyl-tRNA synthetase/tRNA pair and a reassigned codon; the cost of adding more may outweigh the benefit.
  • Fidelity: A larger set of amino acids would increase the risk of misincorporation and reduce translation fidelity.

Summary: 20 amino acids provide sufficient diversity for protein function while keeping the system manageable and robust.
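The codon arithmetic in this answer can be checked by enumeration. The sketch below assumes the standard genetic code, with its three stop codons; variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code

# All triplets over four bases: 4**3 = 64
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))              # 64
print(len(sense_codons))        # 61
print(len(sense_codons) / 20)   # average number of codons per amino acid
```

With 61 sense codons for 20 amino acids, each amino acid has about three codons on average, which is the degeneracy mentioned above.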


4. Where did amino acids come from before enzymes that make them, and before life started?

Answer: Abiotic (prebiotic) synthesis.

  • Miller–Urey experiment (1952; published 1953): Simulated early Earth conditions (reducing atmosphere, electric sparks as lightning, heat) produced amino acids (glycine, alanine, etc.) from simple precursors (H₂O, CH₄, NH₃, H₂).
  • Extraterrestrial sources: Amino acids (e.g., glycine) are found in meteorites (e.g., Murchison) and comets; they may have been delivered to early Earth.
  • Hydrothermal vents: Alkaline vents and other mineral surfaces can catalyze amino acid formation from CO₂, H₂, and nitrogen.
  • Strecker synthesis: Cyanide, aldehydes, and ammonia can form amino acids under prebiotic conditions.

Conclusion: Amino acids could form without enzymes or life, via abiotic chemistry and/or delivery from space.


5. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Answer: Left-handed (M-type) helix.

  • L-amino acids form right-handed (P-type) α-helices because, with L-stereochemistry at Cα, the right-handed twist keeps the side chain (Cβ) clear of the backbone carbonyl oxygen.
  • D-amino acids are the mirror image, so every steric preference is mirrored as well. A D-amino acid α-helix is therefore left-handed.

Summary: D-amino acid α-helix → left-handed; L-amino acid α-helix → right-handed.
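The mirror-image argument can be illustrated geometrically. The sketch below is an idealized model, not a structure prediction: it places points on a right-handed helix (100° per residue, i.e. 3.6 residues/turn; the 2.3 Å radius and 1.5 Å rise are approximate α-helix values assumed here), reflects them through a plane, and reads the handedness from the sign of a triple product.

```python
import math

def ideal_helix(n=8, radius=2.3, rise=1.5, turn_deg=100.0):
    """Points on an idealized right-handed helix (counterclockwise as z increases)."""
    pts = []
    for k in range(n):
        a = math.radians(k * turn_deg)
        pts.append((radius * math.cos(a), radius * math.sin(a), k * rise))
    return pts

def mirror(points):
    """Reflect through the yz-plane (x -> -x); a reflection swaps L and D chirality."""
    return [(-x, y, z) for x, y, z in points]

def handedness(points):
    """Classify by the sign of the triple product of three consecutive step vectors."""
    steps = [tuple(b[i] - a[i] for i in range(3)) for a, b in zip(points, points[1:])]
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = steps[0], steps[1], steps[2]
    triple = (ax * (by * cz - bz * cy)
              - ay * (bx * cz - bz * cx)
              + az * (bx * cy - by * cx))
    return "right" if triple > 0 else "left"

helix = ideal_helix()
print(handedness(helix))          # right
print(handedness(mirror(helix)))  # left
```

Any reflection flips the sign of the triple product, which is why mirroring every residue (L → D) mirrors the helix as a whole.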


6. Can you discover additional helices in proteins?

Answer: Yes. Beyond the canonical α-helix (3.6 residues/turn, i → i+4 hydrogen bonds), other helices exist:

  • 3₁₀ helix: 3 residues/turn with i → i+3 hydrogen bonds; tighter and narrower than the α-helix; often found as short segments at helix termini.
  • π-helix: ~4.4 residues/turn with i → i+5 hydrogen bonds; rare and energetically less favorable.
  • Polyproline helices (PPI, PPII): Proline-rich helices with different geometry.
  • Collagen-like structures: Triple helical motifs.
  • Novel helices: New helices can be discovered through structural biology (e.g., X-ray crystallography, cryo-EM) or designed de novo.

Conclusion: Additional helices can be found by analyzing protein structures and designing new motifs.
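The three backbone helices can be compared by their pitch (rise per turn). In the sketch below the residues-per-turn values come from the list above; the rise-per-residue numbers are approximate textbook values added here as assumptions, so the pitches are only indicative.

```python
# Approximate textbook geometry; rise values are assumptions, not measurements.
HELICES = {
    "3_10":  {"residues_per_turn": 3.0, "rise_A": 2.0,  "hbond": "i -> i+3"},
    "alpha": {"residues_per_turn": 3.6, "rise_A": 1.5,  "hbond": "i -> i+4"},
    "pi":    {"residues_per_turn": 4.4, "rise_A": 1.15, "hbond": "i -> i+5"},
}

def pitch(name):
    """Pitch in angstroms = residues per turn x rise per residue."""
    h = HELICES[name]
    return h["residues_per_turn"] * h["rise_A"]

for name, h in HELICES.items():
    print(f"{name:>5}: {h['hbond']}, pitch ~ {pitch(name):.2f} A")
```

A helix found in a new structure could be compared against these reference values to decide which class it most resembles.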


7. Why are most molecular helices right-handed?

Answer: Several factors favor right-handed helices:

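  • Chirality of L-amino acids: Ribosomally synthesized proteins are built from L-amino acids. The L-configuration favors right-handed α-helices and a right-handed twist in β-strands; a left-handed α-helix made of L-residues is sterically strained.
  • DNA: The common B-form double helix is right-handed, reflecting the chirality of D-deoxyribose (left-handed Z-DNA exists but is rare).
  • RNA: RNA helices are typically right-handed (A-form), again because of the D-sugar.
  • Minimization of steric clash: Right-handed twist often minimizes steric clashes between side chains and the backbone.
  • Evolution: Once one handedness of building blocks (L-amino acids, D-sugars) became fixed, biosynthesis and translation reinforced it, so right-handed helices dominate.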

Summary: L-amino acid chirality and steric constraints favor right-handed helices in natural proteins.


8. Why do β-sheets tend to aggregate?

Answer: The edge strands of a β-sheet have unpaired backbone N–H and C=O groups that can hydrogen-bond with strands from other molecules, so sheets readily extend and stack.

  • Hydrogen bonding: β-strands have alternating N–H and C=O groups along the backbone; these can pair with adjacent strands or with strands from another sheet.
  • Hydrophobic side chains: Many β-sheets have hydrophobic residues; stacking of sheets can bury these surfaces and reduce solvent exposure.
  • Extended conformation: Extended strands maximize surface area for inter-strand and inter-sheet contacts.
  • Amyloid-like stacking: β-sheets can stack in a parallel or antiparallel fashion, forming amyloid fibrils.

Conclusion: β-sheets aggregate because they expose H-bond donors/acceptors and hydrophobic surfaces that favor inter-sheet interactions.


9. What is the driving force for β-sheet aggregation?

Answer: Main driving forces:

  • Hydrogen bonding: Backbone–backbone H-bonds between strands from different molecules or sheets.
  • Hydrophobic effect: Burial of hydrophobic side chains reduces contact with water.
  • Entropy: Release of ordered water molecules when hydrophobic surfaces associate.
  • π–π stacking: Aromatic side chains (e.g., Phe, Tyr) can stack between sheets.
  • Electrostatic complementarity: Alternating charged and hydrophobic residues (e.g., in ionic self-complementary peptides) can drive ordered assembly.

Summary: H-bonding, hydrophobicity, and entropy release drive β-sheet aggregation.


10. Why do many amyloid diseases form β-sheets?

Answer: Many disease-associated proteins aggregate into amyloid fibrils rich in β-sheet structure:

  • Misfolding: Proteins that are normally α-helical or disordered can misfold into β-sheet-rich conformations under stress (e.g., pH, temperature, mutations).
  • Stability: Cross-β structure (β-strands perpendicular to the fibril axis) is highly stable; once formed, fibrils are difficult to disaggregate.
  • Nucleation: A small β-sheet nucleus can template further growth; amyloid formation is often nucleation-dependent.
  • Examples: Aβ (Alzheimer’s), α-synuclein (Parkinson’s), prion (PrP), huntingtin (Huntington’s).

Conclusion: β-sheet structure provides a stable, self-propagating amyloid conformation that underlies many neurodegenerative diseases.


11. Can you use amyloid β-sheets as materials?

Answer: Yes. Amyloid-like β-sheet structures are used as materials:

  • Self-assembling peptides: Shuguang Zhang’s ionic self-complementary peptides form stable β-sheet nanofibers and scaffolds for tissue engineering, drug delivery, and 3D cell culture.
  • Nanostructures: β-sheet fibrils can serve as templates for mineralization, nanowires, and conductive materials.
  • Hydrogels: β-sheet-rich peptide networks form hydrogels for wound healing and regenerative medicine.
  • Functional materials: Engineered amyloid fibrils have been used for catalysis, biosensors, and optical materials.

Conclusion: Amyloid β-sheets can be engineered as functional biomaterials for biomedical and material applications.
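To make "ionic self-complementary" concrete, the sketch below inspects RADA16-I (RADARADARADARADA), a well-known peptide of this class that is used here only as an example. The helper names are illustrative. It checks that charged and uncharged residues strictly alternate, which in a β-strand places them on opposite faces.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")

def charge_pattern(seq):
    """Map each residue to '+', '-' or '.' (uncharged)."""
    return "".join("+" if aa in POSITIVE else "-" if aa in NEGATIVE else "." for aa in seq)

def alternates_charged_and_neutral(seq):
    """True if charged and uncharged residues strictly alternate along the chain."""
    flags = [aa in POSITIVE or aa in NEGATIVE for aa in seq]
    return all(a != b for a, b in zip(flags, flags[1:]))

RADA16 = "RADARADARADARADA"
print(charge_pattern(RADA16))                  # +.-.+.-.+.-.+.-.
print(alternates_charged_and_neutral(RADA16))  # True
```

The pattern shows four positive and four negative charges, so the charged face can pair with a complementary face on a neighboring strand while the alanine face packs hydrophobically.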


Part B. Protein Analysis and Visualization

================================================================================
   ______   ______   _____   ____     ____  
  / ____/  / ____/  / ___/  / __ \   / __ \ 
 / __/    / /       \__ \  / / / /  / / / /
/ /___   / /___    ___/ / / /_/ /  / /_/ / 
/_____/  \____/   /____/  \____/   \____/  
Extracellular Superoxide Dismutase (ECSOD / SOD3): Homework Write-Up Template
================================================================================

Protein Selected

Field | Value
Name | Extracellular superoxide dismutase [Cu-Zn]
Gene | SOD3 (aka ECSOD)
Organism | Homo sapiens (human)
Chosen Structure (RCSB PDB) | 2JLP
Classification (RCSB) | OXIDOREDUCTASE

Why I selected it (brief):
ECSOD is a secreted antioxidant enzyme that detoxifies superoxide radicals in the extracellular space, helping protect tissues from oxidative stress. I selected it because it is biologically important in vascular and lung biology, and a high-quality X-ray crystal structure is available for direct 3D visualization (PDB 2JLP).


1) Identify the amino acid sequence of the protein

Canonical protein sequence source: UniProt (Entry: P08294)

IMPORTANT NOTE ABOUT SEQUENCE VS STRUCTURE:
The UniProt canonical protein is the biological sequence. The PDB structure often contains a construct/fragment and may not include every residue from the UniProt canonical sequence.

How to obtain the sequence (recommended workflow):

  • A) UniProt canonical sequence (P08294): Go to UniProt entry P08294 → Download the FASTA (canonical sequence)
  • B) PDB construct sequence (2JLP): Go to the RCSB page for 2JLP → Download FASTA Sequence
UniProt FASTA (P08294)
>sp|P08294|SODE_HUMAN Extracellular superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD3 PE=1 SV=2
MLALLCSCLLLAAGASDAWTGEDSAEPNSDSAEWIRDMYAKVTEIWQEVMQRRDDDGALH
AACQVQPSATLDAAQPRVTGVVLFRQLAPRAKLDAFFALEGFPTEPNSSSRAIHVHQFGD
LSQGCESTGPHYNPLAVPHPQHPGDFGNFAVRDGSLWRYRAGLAASLAGPHSIVGRAVVV
HAGEDDLGRGGNQASVENGNAGRRLACCVVGVCGPGLWERQAREHSERKKRRRESECKAA
PDB FASTA (2JLP)

Download from RCSB 2JLP → Sequence tab → FASTA. Each chain in 2JLP (chains A, B, C, D) is 222 aa.


2) How long is it? What is the most frequent amino acid?

Length type | Value
UniProt canonical length | 240 amino acids (from UniProt P08294; matches the FASTA record above, 4 lines × 60 residues)
PDB (2JLP) chain length | 222 amino acids (each chain A–D in 2JLP)

Most frequent amino acid (using the provided Colab notebook):

  • Input sequence used: UniProt canonical (P08294)
  • Most frequent amino acid: Alanine (A)
  • Count: 33
  • Note: Frequency depends on whether the canonical full-length or the crystallized construct sequence is used.
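The length and most frequent residue can be recomputed directly from the FASTA record printed above. This is a minimal sketch of that count, not necessarily how the course Colab notebook does it; `composition` is an illustrative helper name.

```python
from collections import Counter

# UniProt P08294 canonical sequence, copied from the FASTA record above.
SOD3 = (
    "MLALLCSCLLLAAGASDAWTGEDSAEPNSDSAEWIRDMYAKVTEIWQEVMQRRDDDGALH"
    "AACQVQPSATLDAAQPRVTGVVLFRQLAPRAKLDAFFALEGFPTEPNSSSRAIHVHQFGD"
    "LSQGCESTGPHYNPLAVPHPQHPGDFGNFAVRDGSLWRYRAGLAASLAGPHSIVGRAVVV"
    "HAGEDDLGRGGNQASVENGNAGRRLACCVVGVCGPGLWERQAREHSERKKRRRESECKAA"
)

def composition(seq):
    """Return (length, most frequent residue, its count)."""
    counts = Counter(seq)
    residue, count = counts.most_common(1)[0]
    return len(seq), residue, count

print(composition(SOD3))  # (240, 'A', 33)
```

Passing the 2JLP chain sequence instead shows how the answer shifts when the crystallized construct is used.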

3) How many protein sequence homologs are there? (UniProt BLAST)

Tool: UniProt BLAST

Procedure: Paste the FASTA sequence (UniProt canonical P08294) → Run BLAST with default settings.

Results to record:

  • Total hits/homologs: (run BLAST to fill)
  • Example organisms among top hits: (e.g., vertebrate species)
  • Typical identity range of strong hits: (e.g., 70–100%)

Write-up sentence:
“Using UniProt BLAST, ECSOD (SOD3) returned ______ homologous sequences under the selected parameters, with strong matches across vertebrate species.”


4) Does the protein belong to any protein family?

Yes.

  • Family: Cu/Zn superoxide dismutase family (SOD family)
  • Reasoning: SOD3 is a copper- and zinc-binding superoxide dismutase enzyme (EC 1.15.1.1) and is classified as a Cu/Zn SOD.

5) Identify the structure page in RCSB

Field | Value
PDB ID | 2JLP
Title | Crystal structure of human extracellular copper-zinc superoxide dismutase.
Link | RCSB PDB: 2JLP

6) When was the structure solved? Is it a good quality structure?

Field | Value
Deposited | 2008-09-14
Released | 2009-03-17
Experimental method | X-RAY DIFFRACTION
Resolution | 1.70 Å
R-work | 0.150
R-free | 0.185

Quality statement:
This is a good quality structure: its resolution (1.70 Å) is well below the 2.70 Å reference value (smaller Å = finer detail), and R-free (0.185) is only 0.035 above R-work (0.150), which suggests the model is not overfitted.
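The quality check can be written as a small function over the values in the table. The 2.70 Å reference comes from the statement above. The 0.05 limit on the R-free/R-work gap is a common rule of thumb assumed here, not an official cutoff.

```python
def assess_quality(resolution_A, r_work, r_free, max_resolution_A=2.70, max_gap=0.05):
    """Simple heuristics for an X-ray structure; thresholds are assumptions."""
    gap = round(r_free - r_work, 3)
    return {
        "resolution_ok": resolution_A < max_resolution_A,  # smaller is better
        "r_gap": gap,                                      # large gap hints at overfitting
        "r_gap_ok": gap <= max_gap,
    }

report = assess_quality(resolution_A=1.70, r_work=0.150, r_free=0.185)
print(report)
```

For 2JLP both checks pass, which agrees with the quality statement.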


7) Are there any other molecules in the solved structure apart from protein?

Yes.

Small-molecule ligands listed for 2JLP (3 unique):

Ligand | Description
CU | Copper (II) ion
ZN | Zinc ion
SCN | Thiocyanate ion

Also present: Solvent water molecules (HOH) are included in the crystal structure.

Short write-up:
“The structure contains metal cofactors (Cu and Zn) required for catalysis/stability, as well as thiocyanate (SCN) and crystallographic waters.”


8) Does the protein belong to any structure classification family?

Yes.

  • SCOPe / fold-level description: “Cu,Zn superoxide dismutase-like” fold/superfamily (structure classification consistent with Cu/Zn SOD enzymes)

Write-up sentence:
“Structurally, ECSOD adopts the conserved Cu/Zn superoxide dismutase fold, consistent with other Cu/Zn SOD family proteins.”


9) Open the structure in PyMOL + required visualizations

Load:

fetch 2jlp, async=0
hide everything
show cartoon

A) Cartoon:

hide everything
show cartoon, polymer.protein
Figure: Cartoon view of the ECSOD (2JLP) tetramer with Cu/Zn cofactors

B) Ribbon:

hide everything
show ribbon, polymer.protein
Figure: Ribbon view of the ECSOD backbone

C) Ball and stick:

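hide everything
show sticks, polymer.protein
show spheres, polymer.protein
# shrink spheres and thin the sticks so both remain visible (true ball-and-stick)
set stick_radius, 0.15
set sphere_scale, 0.25
Figure: Ball-and-stick view showing atomic detail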

D) Color by secondary structure (helices vs sheets):

hide everything
show cartoon, polymer.protein
color yellow, ss H
color cyan, ss S
color gray70, ss L+''
Figure: Secondary structure coloring (yellow = helices, cyan = sheets, gray = loops)
  • Observation: More helices or sheets? More sheets. Cu/Zn SODs adopt a β-rich fold; the structure confirms predominant β-sheets (cyan) with fewer α-helices (yellow).

E) Color by residue type (hydrophobic vs hydrophilic distribution):

select hydrophobic, resn ALA+VAL+LEU+ILE+MET+PHE+TRP+PRO
select polar, resn SER+THR+ASN+GLN+TYR+CYS
select charged, resn ASP+GLU+LYS+ARG+HIS
color orange, hydrophobic
color green, polar
color blue, charged
Figure: Residue-type coloring: hydrophobic (orange), polar (green), charged (blue)
  • Observation: Hydrophobics mostly: CORE
  • Observation: Hydrophilics mostly: SURFACE
  • Interpretation: “Hydrophobic residues tend to cluster in the core, while polar/charged residues tend to be more surface exposed (typical of soluble proteins).”
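The same residue classes can be tallied at the sequence level to complement the 3D view. The sketch below mirrors the three PyMOL selections above using one-letter codes. Glycine is in none of those selections, so it is counted as "unassigned". The function name is illustrative.

```python
HYDROPHOBIC = set("AVLIMFWP")  # ALA VAL LEU ILE MET PHE TRP PRO
POLAR = set("STNQYC")          # SER THR ASN GLN TYR CYS
CHARGED = set("DEKRH")         # ASP GLU LYS ARG HIS

def class_counts(seq):
    """Count residues per class, using the same grouping as the PyMOL selections."""
    counts = {"hydrophobic": 0, "polar": 0, "charged": 0, "unassigned": 0}
    for aa in seq.upper():
        if aa in HYDROPHOBIC:
            counts["hydrophobic"] += 1
        elif aa in POLAR:
            counts["polar"] += 1
        elif aa in CHARGED:
            counts["charged"] += 1
        else:
            counts["unassigned"] += 1  # e.g. GLY, not covered by the selections
    return counts

print(class_counts("AVGKDS"))
```

Running it on the 2JLP chain sequence gives the overall composition, while the PyMOL coloring shows where each class sits in the fold.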

F) Surface visualization + pockets/holes:

hide everything
show surface, polymer.protein
set transparency, 0.25
show cartoon, polymer.protein
set cartoon_transparency, 0.6
remove solvent
Figure: Surface view with pockets/holes (semi-transparent surface with cartoon underneath)
  • Observation: Any grooves/holes/binding pockets visible? Yes.
  • Where? Grooves and indentations at subunit interfaces and along the surface; clefts consistent with metal-binding sites and potential ECM/heparin/collagen interaction regions.
  • Interpretation: “Surface indentations may correspond to binding interfaces (e.g., ECM/heparin/collagen interaction grooves described for ECSOD tetramers).”

Part D. Group Brainstorm: Bacteriophage Engineering

Computational engineering plan for the MS2 L Lysis Protein (group of ~3–4 students).


1. Executive Summary

  • Goals chosen: (1) Increased stability (easiest); (2) Tunable toxicity: design a panel of L variants with graded lysis strength (attenuated → wild-type → enhanced) for predictable, dose-dependent control (hard).
  • Approach: Use Protein Language Models (e.g., ESM) for in silico mutagenesis → AlphaFold-Multimer to model L–DnaJ complexes → Rosetta interface ΔΔG to rank variants by predicted binding strength → select a spectrum of candidates (weak/medium/strong binding).
  • Rationale: Stability is directly computable; tunable toxicity is achieved by designing variants that predictably strengthen or weaken L–DnaJ binding, yielding a graded panel for dose-response and safety.

2. Scope and Assumptions

  • Scope: MS2 L protein (75 aa); focus on single-point and small combinatorial mutations at the L–DnaJ interface.
  • Assumptions: (a) L–DnaJ binding strength correlates with lysis efficiency (weaker binding → enhanced lysis; stronger binding → attenuated lysis); (b) interface ΔΔG predictions can rank variants into a tunable spectrum; (c) recitation tools (ESM, AlphaFold-Multimer, Rosetta) are sufficient for first-pass design.
  • Potential pitfalls:
    1. Limited training data on phage–bacteria interactions: models may not generalize well to L–DnaJ or other host targets.
    2. Overlapping gene constraints: the lys gene overlaps the coat and replicase genes, so nucleotide changes must keep the overlapping reading frames intact and avoid disrupting the adjacent genes.
    3. Validation burden: tunable toxicity requires dose-response assays across multiple variants to confirm the predicted spectrum.
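To gauge the size of a single-point scan, the sketch below enumerates all substitutions for a sequence. The toy sequence is a hypothetical placeholder, not the MS2 L sequence, and the function name is illustrative. In practice the `positions` argument would be limited to the predicted interface residues.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_point_variants(seq, positions=None):
    """Yield (label, mutant) pairs; positions are 1-based, default is every position."""
    if positions is None:
        positions = range(1, len(seq) + 1)
    for pos in positions:
        wt = seq[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{pos}{aa}", seq[:pos - 1] + aa + seq[pos:]

toy = "MKTAYIAKQR"  # hypothetical placeholder sequence
variants = list(single_point_variants(toy))
print(len(variants))  # 10 positions x 19 substitutions = 190
print(75 * 19)        # full scan of a 75-residue protein: 1425
```

A full scan of the 75-residue L protein is therefore about 1,425 variants, small enough to score exhaustively before any combinatorial designs are considered.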

3. Target Engineering Goals

Goal | Strategy | Tools
Increased stability | Identify stabilizing mutations (core packing, H-bonds) | ESM mutagenesis, Rosetta ΔΔG
Tunable toxicity | Design variants with graded L–DnaJ binding strength: attenuated (stronger binding) → wild-type → enhanced (weaker binding) | AlphaFold-Multimer (L + DnaJ), Rosetta interface ΔΔG

4. Proposed Pipeline Schematic

┌──────────────────────────────────────────────────────────────────────────────┐
│  MS2 L Lysis Protein: Computational Engineering Pipeline (Tunable Toxicity)  │
└──────────────────────────────────────────────────────────────────────────────┘

  MS2-L sequence (75 aa)
         │
         ▼
  ┌──────────────────┐
  │ Protein Language │  ← ESM / EVmutation: in silico mutagenesis at interface
  │ Model (ESM)      │    Generate candidate variants
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ AlphaFold-       │  ← Model L–DnaJ complex; identify interface residues
  │ Multimer         │    Structure for interface design
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ Rosetta ΔΔG      │  ← (a) Stability: filter destabilizing variants
  │ (stability +     │  ← (b) Interface: rank by L–DnaJ binding strength
  │  interface)      │     → spectrum: attenuated | wild-type | enhanced
  └────────┬─────────┘
           │
           ▼
  Panel of variants (attenuated → enhanced) → wet-lab validation (dose-response)
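The final ranking step of the pipeline could be sketched as a simple binning function. Everything below is an assumption for illustration: the sign convention (ΔΔG = mutant minus wild-type binding energy, so negative means stronger binding), the ±1.0 kcal/mol threshold, and the made-up scores. The mapping of stronger binding to "attenuated" follows assumption (a) in Section 2.

```python
def bin_by_interface_ddg(ddg_by_variant, threshold=1.0):
    """Sort variants by predicted interface ddG (kcal/mol) and assign them to bins.

    Assumed convention: ddG = dG_bind(mutant) - dG_bind(wild type);
    negative values mean stronger predicted L-DnaJ binding.
    """
    panel = {"attenuated": [], "wild-type-like": [], "enhanced": []}
    for variant, ddg in sorted(ddg_by_variant.items(), key=lambda kv: kv[1]):
        if ddg <= -threshold:
            panel["attenuated"].append(variant)      # stronger binding
        elif ddg >= threshold:
            panel["enhanced"].append(variant)        # weaker binding
        else:
            panel["wild-type-like"].append(variant)
    return panel

example_scores = {"V1": -2.1, "V2": 0.3, "V3": 1.8, "V4": -0.4}  # made-up values
panel = bin_by_interface_ddg(example_scores)
print(panel)
```

A real run would replace `example_scores` with Rosetta output and choose the threshold after checking the score distribution for the wild-type complex.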

5. Tools and Rationale

Tool | Why it helps
ESM / Protein LMs | Learn evolutionary constraints; predict tolerated vs. destabilizing mutations; generate interface-focused variants.
AlphaFold-Multimer | Model L–DnaJ complex structure; identify interface residues for tunable design (strengthen or weaken binding).
Rosetta ΔΔG | (a) Stability: filter destabilizing variants; (b) Interface: rank variants by L–DnaJ binding strength to build a graded panel (attenuated → enhanced).

Part | Content
Part A | 11 conceptual questions (Shuguang Zhang): amino acids, helices, β-sheets
Part B | ECSOD/SOD3 (PDB 2JLP): sequence, structure, PyMOL visualization
Part D | MS2 L Lysis Protein: group computational engineering proposal (stability + tunable toxicity)

Summary (EOF)

    ●──●──●──●──●──●──●──●──●──●
     \  \  \  \  \  \  \  \  \  \
      ●  ●  ●  ●  ●  ●  ●  ●  ●  ●   ← Random stuff - don't stare too deep
       \  \  \  \  \  \  \  \  \  \
        ●──●──●──●──●──●──●──●──●──●
              peptide backbone

Question | Topic
1 | Amino acid molecules in 500 g meat (~10²³ molecules)
2 | Digestion vs. incorporation: humans don’t become cow/fish
3 | Why only 20 natural amino acids
4 | Prebiotic amino acid synthesis
5 | D-amino acid α-helix → left-handed
6 | Additional helices beyond α (3₁₀, π, etc.)
7 | Right-handed helices due to L-amino acid chirality
8 | β-sheet aggregation: H-bonding, hydrophobicity
9 | Driving force: H-bonds, entropy, hydrophobicity
10 | Amyloid diseases: misfolding, β-sheet stability
11 | Amyloid β-sheets as materials (e.g., Zhang’s peptides)

Skipped: Can you make other non-natural amino acids? Design some new amino acids. | Design a β-sheet motif that forms a well-ordered structure.