Week 4 HW: Protein design P1

Part A. Conceptual Questions

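How many molecules of amino acids do you take with a piece of 500 grams of meat? Roughly 3 × 10^24 individual amino acid molecules if the whole 500 g is counted as protein (500 g ÷ ~110 g/mol per residue ≈ 4.5 mol, times Avogadro's number). Meat is only about 20% protein by mass, so a more realistic figure is around 5 × 10^23.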
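The estimate above is simple arithmetic and can be checked with a short sketch. The average residue mass of 110 g/mol and the 20% protein fraction are assumptions chosen for illustration, not values given in the assignment.

```python
AVOGADRO = 6.022e23        # molecules per mole
MEAN_RESIDUE_MASS = 110.0  # g/mol, assumed average mass of one amino acid residue


def amino_acid_count(meat_grams, protein_fraction):
    """Estimate the number of amino acid residues in a piece of meat."""
    protein_grams = meat_grams * protein_fraction
    moles = protein_grams / MEAN_RESIDUE_MASS
    return moles * AVOGADRO


upper = amino_acid_count(500, 1.0)    # treat all 500 g as protein
typical = amino_acid_count(500, 0.2)  # assume ~20% protein by mass
print(f"all protein: {upper:.1e}, 20% protein: {typical:.1e}")
```

Under these assumptions the two cases come out near 2.7 × 10^24 and 5.5 × 10^23, so the answer depends mostly on how much of the meat is counted as protein.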

1- Why do humans eat beef but do not become a cow, eat fish but do not become fish? Because digestion is a disassembly line. We chop up their proteins into individual amino acid building blocks, then use our own DNA blueprints to reassemble those into human proteins.

2- Why are there only 20 natural amino acids? It’s a perfect sweet spot. Those 20 give us enough chemical variety to build everything life needs, but using more would make the system too complicated and error-prone.

3- Can you make other non-natural amino acids? Design some new amino acids. Yes. For example, one could make a glowing amino acid by attaching a light-emitting group (a fluorophore) to the side chain.

4- Where did amino acids come from before enzymes that make them, and before life started? They were probably cosmic cooking experiments. Lightning and UV rays zapped simple gases and water in the early Earth’s atmosphere, spontaneously creating them.

5- If you make an α-helix using D-amino acids, what handedness would you expect? You’d get a left-handed helix. They’re mirror images, so using the mirror-image building blocks flips the spiral direction.

7- Can you discover additional helices in proteins? Yes. We’re still finding new ones. The π-helix (a wider, rarer spiral) and the 3₁₀-helix (a tighter, stubby one) are two examples already discovered.

8- Why are most molecular helices right-handed? It’s probably a historical accident. Once early life settled on right-handed sugars and left-handed amino acids, their interactions locked everything into a right-handed spiral trend.

9- Why do β-sheets tend to aggregate? What is the driving force? Their edges are like sticky Velcro. They have exposed backbone “claws” (called hydrogen bond donors and acceptors) that desperately want to latch onto another sheet’s identical claws to hide from water.

Part B: Protein Analysis and Visualization

Renin: I picked renin because it’s a popular target in my lab. It’s an enzyme that regulates blood pressure and blood volume.

The sequence I got from UniProt (listed as P00797 · RENI_HUMAN):

MDGWRRMPRWGLLLLLWGSCTFGLPTDTTTFKRIFLKRMPSIRESLKERGVDMARLGPEWSQPMKRLTLGNTTSSVILTNYMDTQYYGEIGIGTPPQTFKVVFDTGSSNVWVPSSKCSRLYTACVYHKLFDASDSSSYKHNGTELTLRYSTGTVSGFLSQDIITVGGITVTQMFGEVTEMPALPFMLAEFDGVVGMGFIEQAIGRVTPIFDNIISQGVLKEDVFSFYYNRDSENSQSLGGQIVLGGSDPQHYEGNFHYINLIKTGVWQIQMKGVSVGSSTLLCEDGCLALVDTGASYISGSTSSIEKLMEALGAKKRLFDYVVKCNEGPTLPDISFHLGGKEYTLTSADYVFQESYSSKKLCTLAIHAMDIPPPTGPTWALGATFIRKFYTEFDRRNNRIGFALAR

It has 406 amino acids, and the most frequent one is leucine (L), with 34 occurrences. It has 432 homologs.
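The length and most-frequent-residue figures can be reproduced with a few lines of Python. This is a minimal sketch: the `composition` helper is a name made up here, and the demo string is only the first 24 residues of the sequence above, so the full sequence needs to be pasted in to check the numbers in the text.

```python
from collections import Counter


def composition(seq):
    """Return (length, Counter of residues) for a one-letter protein sequence."""
    seq = "".join(seq.split()).upper()  # drop whitespace and newlines from copy-paste
    return len(seq), Counter(seq)


# First 24 residues of the renin sequence, used only as a short demo.
demo = "MDGWRRMPRWGLLLLLWGSCTFGL"
length, counts = composition(demo)
residue, n = counts.most_common(1)[0]
print(length, residue, n)  # prints: 24 L 6
```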

The structure was deposited on 1992-02-05 and has a resolution of 2.5 Å. It belongs to the pepsin-like family, and the structure includes an N-acetyl-D-glucosamine as part of the molecule.

[Image: 2RENcartoon]

Renin as cartoon, colored by secondary structure. It consists mainly of β-sheets.

[Image: 2RENribbonsresidues]

Renin as ribbons, colored by residue type. Code:

# Hydrophobic (Ala, Val, Ile, Leu, Met, Phe, Trp, Pro, Tyr)
color yellow, resn ALA+VAL+ILE+LEU+MET+PHE+TRP+PRO+TYR
# Positive charges (Arg, Lys, His)
color blue, resn ARG+LYS+HIS
# Negative charges (Asp, Glu)
color red, resn ASP+GLU
# Polar (Ser, Thr, Asn, Gln, Cys)
color magenta, resn SER+THR+ASN+GLN+CYS
# Glycines (often left as a special case)
color green, resn GLY
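One way to keep a coloring scheme like this consistent is to generate the PyMOL command lines from a single table and check that every standard residue is assigned exactly once. The sketch below only builds the command strings in plain Python and does not call PyMOL; the `CLASSES` table and `pymol_commands` helper are names invented for this example.

```python
# Residue classes and colors copied from the PyMOL commands above.
CLASSES = {
    "yellow": ["ALA", "VAL", "ILE", "LEU", "MET", "PHE", "TRP", "PRO", "TYR"],  # hydrophobic
    "blue": ["ARG", "LYS", "HIS"],                  # positive
    "red": ["ASP", "GLU"],                          # negative
    "magenta": ["SER", "THR", "ASN", "GLN", "CYS"],  # polar
    "green": ["GLY"],                               # glycine
}

STANDARD = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def pymol_commands(classes):
    """Build one 'color' command per class after checking residue coverage."""
    assigned = [r for residues in classes.values() for r in residues]
    if len(assigned) != len(set(assigned)):
        raise ValueError("a residue is assigned to more than one class")
    missing = STANDARD - set(assigned)
    if missing:
        raise ValueError(f"unassigned residues: {sorted(missing)}")
    return [f"color {color}, resn {'+'.join(residues)}"
            for color, residues in classes.items()]


for line in pymol_commands(CLASSES):
    print(line)
```

The printed lines can be pasted into the PyMOL command line as they are.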

[Image: 2RENBSRES]

Renin as ball-and-stick, still colored by residue type.

[Image: 2RENSurface]

Renin surface view, with the catalytic aspartates (the core of the catalytic mechanism) highlighted. This is also the site where the drug aliskiren (a direct renin inhibitor) binds.

Part C: Using ML-Based Protein Design Tools

Part D. Group Brainstorm on Bacteriophage Engineering

Project Objective

  • Engineer the L protein of the MS2 phage to increase structural stability.
  • Disrupt or reduce its interaction with the bacterial chaperone DnaJ.
  • Preserve the C-terminal lysis domain to maintain lytic function.
  • Avoid mutations that interfere with structurally or evolutionarily coupled residues.

Phase 1: Mapping the DnaJ Interaction Interface

Since the exact binding interface between the L protein and DnaJ is unknown, the first step is to identify it computationally rather than introducing arbitrary mutations.

  • Use AlphaFold-Multimer to model the complex between L protein and DnaJ.
  • Generate multiple structural predictions and select the top-ranked models.
  • Identify consensus interface residues that consistently appear in the predicted binding interface.
  • Perform in silico alanine scanning of the N-terminal residues in the complex to determine which residues significantly contribute to binding energy (ΔΔG).
  • Analyze whether the N-terminal region resembles known DnaJ-binding motifs, typically hydrophobic residues flanked by basic amino acids.
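The consensus step in the list above can be sketched as a simple vote across models. This assumes the interface residues of each predicted model have already been extracted (for example with a distance cutoff between chains, which is not shown); the `consensus_interface` helper, the 0.8 threshold, and the residue numbers are all made up for illustration.

```python
from collections import Counter


def consensus_interface(models, min_fraction=0.8):
    """Return residues found at the interface in at least min_fraction of models.

    models: list of sets of L-protein residue numbers, one set per predicted model.
    """
    if not models:
        return []
    votes = Counter(res for interface in models for res in interface)
    cutoff = min_fraction * len(models)
    return sorted(res for res, n in votes.items() if n >= cutoff)


# Hypothetical interface residues from five predicted models (made-up numbers).
models = [
    {3, 4, 7, 8, 11},
    {3, 4, 7, 11, 12},
    {3, 4, 8, 11},
    {3, 7, 8, 11, 15},
    {3, 4, 7, 11},
]
print(consensus_interface(models))  # prints: [3, 4, 7, 11]
```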

This phase defines which residues are critical for the interaction, so that they can be targeted deliberately instead of being mutated at random.
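For the in silico alanine scan, the list of point mutations to evaluate can be enumerated before it is handed to whichever ΔΔG tool is used. The sketch below only produces mutation labels such as `K2A`; the `alanine_scan` helper is a hypothetical name, and the ten-residue fragment is invented, not the real MS2 L protein sequence.

```python
def alanine_scan(seq, start=1, end=None):
    """List alanine point mutations (e.g. 'K2A') for residues start..end.

    Positions are 1-based and inclusive; existing alanines are skipped.
    """
    end = len(seq) if end is None else end
    mutations = []
    for pos in range(start, end + 1):
        wild_type = seq[pos - 1]
        if wild_type == "A":  # mutating Ala to Ala changes nothing
            continue
        mutations.append(f"{wild_type}{pos}A")
    return mutations


# Made-up 10-residue N-terminal fragment, NOT the real MS2 L sequence.
toy = "MKTRALGDFS"
print(alanine_scan(toy, 1, 10))
```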


Phase 2: Targeted N-Terminal Redesign

Instead of deleting regions or performing extensive random substitutions, introduce controlled chemical modifications to disrupt interaction while preserving structural stability.

  • Focus on charge inversion strategies:

    • Basic residues (K, R) → Acidic residues (E, D)
    • Acidic residues (E, D) → Basic residues (K, R)
  • Disrupt hydrophobic interaction patches:

    • Hydrophobic residues (L, I, V, F) → Polar residues (S, T, N, Q)
    • Aromatic residues (F, Y, W) → Aliphatic or small residues
  • Generate a graded library of variants:

    • Minor charge modifications
    • Moderate interface perturbations
    • Strong hydrophobic disruption

This creates a set of variants from which a Pareto front can be identified, balancing reduced DnaJ interaction against preserved protein stability.
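The substitution rules listed in this phase can be turned into a small variant generator. This is a sketch under stated assumptions: the specific partner chosen for each residue (for example K → E instead of K → D) is arbitrary, the `point_variants` helper is a hypothetical name, and both the toy sequence and the interface positions are invented.

```python
# Substitution rules following the strategies listed above; the exact partner
# for each residue is an arbitrary choice made for this sketch.
CHARGE_INVERSION = {"K": "E", "R": "D", "E": "K", "D": "R"}
HYDROPHOBIC_TO_POLAR = {"L": "S", "I": "T", "V": "T", "F": "N"}


def point_variants(seq, positions, rules):
    """Return {mutation_name: mutated_sequence} for single substitutions.

    positions are 1-based; positions whose residue has no rule are skipped.
    """
    variants = {}
    for pos in positions:
        wild_type = seq[pos - 1]
        new = rules.get(wild_type)
        if new is None:
            continue
        variants[f"{wild_type}{pos}{new}"] = seq[:pos - 1] + new + seq[pos:]
    return variants


toy = "MKTRALGDFS"         # made-up fragment, NOT the real MS2 L sequence
interface = [2, 4, 6, 9]   # hypothetical interface positions
print(point_variants(toy, interface, CHARGE_INVERSION))
print(point_variants(toy, interface, HYDROPHOBIC_TO_POLAR))
```

Running each rule set separately gives the "minor charge" and "hydrophobic disruption" tiers of the library; combining mutations from both would give the stronger perturbations.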


Phase 3: Stability and Functional Filtering

To ensure that redesigned variants remain structurally viable and functionally relevant:

  • Use Rosetta or FoldX to calculate ΔΔG and verify that mutations do not destabilize the overall protein fold.

  • Confirm that mutations in the N-terminal region do not propagate structural stress toward the C-terminal lysis domain.

  • Perform co-evolutionary analysis (e.g., EVcouplings):

    • Identify residue pairs that co-evolved between the N-terminal and C-terminal regions.
    • Avoid mutating co-evolved residues independently to prevent functional disruption.
  • Evaluate aggregation propensity using tools such as Aggrescan3D to ensure that mutations do not create exposed hydrophobic patches leading to cytoplasmic aggregation.

  • Assess sequence plausibility using protein language models such as ESM to filter out unlikely or non-natural variants.
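Once stability and binding scores exist for each variant, the filtering above and the Pareto front mentioned in Phase 2 could be combined roughly as follows. The sign conventions (positive folding ΔΔG means destabilizing, positive binding ΔΔG means weaker DnaJ binding), the 1.0 kcal/mol cutoff, and all scores are assumptions for illustration; real values would come from the chosen ΔΔG tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    ddg_fold: float     # folding ΔΔG in kcal/mol; positive = destabilizing (assumed)
    ddg_binding: float  # DnaJ-binding ΔΔG in kcal/mol; positive = weaker binding (assumed)


def filter_stable(variants, max_ddg_fold=1.0):
    """Drop variants predicted to destabilize the fold beyond the cutoff."""
    return [v for v in variants if v.ddg_fold <= max_ddg_fold]


def pareto_front(variants):
    """Keep variants that no other variant beats on both objectives."""
    front = []
    for v in variants:
        dominated = any(
            o.ddg_fold <= v.ddg_fold
            and o.ddg_binding >= v.ddg_binding
            and (o.ddg_fold < v.ddg_fold or o.ddg_binding > v.ddg_binding)
            for o in variants
        )
        if not dominated:
            front.append(v)
    return front


# Made-up scores for four hypothetical variants.
candidates = [
    Variant("K2E", 0.3, 1.2),
    Variant("R4D", 0.8, 2.5),
    Variant("L6S", 2.4, 3.0),  # fails the stability cutoff
    Variant("F9N", 0.9, 1.0),  # dominated by K2E and R4D
]
stable = filter_stable(candidates)
print([v.name for v in pareto_front(stable)])  # prints: ['K2E', 'R4D']
```

Aggregation and language-model scores could be added as further filters in the same way before the Pareto step.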


Key Limitations

  • The DnaJ binding mode may be transient or dynamic, reducing AlphaFold-Multimer accuracy.
  • Protein language model scores do not guarantee in vivo functionality.
  • Intrinsically disordered regions may not be accurately modeled.
  • Computational predictions must ultimately be validated experimentally.
[Image: Pipeline]