Week 4 HW: Protein Design Part I

cover image

HW4

Conceptual Questions

Here are my answers to the conceptual questions: [1] How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

~3x10^{24 molecules from (500g/(100g/mol))x6.02x10}23

[2] Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Proteins are broken into amino acids in the digestive system and reassembled using instructions from human DNA. We reuse the material, but not the entire structure.

[3] Why are there only 20 natural amino acids?

These 20 amino acids are diverse enough to form all of the complex functional proteins needed throughout evolution. Adding others does not necessarily provide an evolutionary advantage and increases complexity and error rates.

[4] Can you make other non-natural amino acids? Design some new amino acids.

Yes, you can do this by using engineered ribosomes and system. Some examples of this include adding fluorinated amino acids to increase hydrophobicity and stability and photo-reactive amino acids that respond to light.

[5] Where did amino acids come from before enzymes that make them, and before life started?

They came from abiotic chemical reactions on early Earth, such as from sources like gases with lightning and UV radiation as energy sources or hydrothermal vents from meteorites.

[6] If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Natural proteins are made of L-amino acids and form right-handed α-helices. If instead you used entirely D-amino acids you;d make a left-handed α-helix.

[7] Can you discover additional helices in proteins? -Yes, proteins can form multiple helical structures not just the α-helix, some examples are 3₁₀ helices and π-helices.

[8] Why are most molecular helices right-handed?

Life evolved using L-amino acids, which are right-handed.

[9] Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

β-sheets form planar structures with backbone hydrogen bond donors and acceptors along their edges. This lets multiple sheets form hydrogen bonds and stack together.

[10] Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Skipped

[11] Design a β-sheet motif that forms a well-ordered structure.

Skipped

Protein Analysis and Visualization - ATP Synthase: P0AB98 · ATP6_ECOLI

Briefly describe the protein you selected and why you selected it.

I selected ATP synthase because it is a fascinating molecular machine. As a mechanically driven enzyme, it directly connects structure, motion, and energy conversion. ATP synthase is a large multi-subunit complex composed of two major regions: F₀ (membrane-embedded rotor) and F₁ (soluble catalytic head). The amino acid sequence depends on the organism and subunit. I looked at ATP6_ECOLI which is a critical membrane component of the F₀ motor in E. coli. It has 271 amino acids and is highly hydrophobic with many transmembrane helices. The most frequent amino acid is leucine which appears 45 times.

Here’s the sequence: MASENMTPQDYIGHHLNNLQLDLRTFSLVDPQNPPATFWTINIDSMFFSVVLGLLFLVLFRSVAKKATSGVPGKFQTAIELVIGFVNGSVKDMYHGKSKLIAPLALTIFVWVFLMNLMDLLPIDLLPYIAEHVLGLPALRVVPSADVNVTLSMALGVFILILFYSIKMKGIGGFTKELTLQPFNHWAFIPVNLILEGVSLLSKPVSLGLRLFGNMYAGELIFILIAGLLPWWSQWILNVPWAIFHILIITLQAFIFMVLTIVYLSMASEEH

BLAST searching in UniProt revealed 250 homologs. The protein belongs to the F-type ATPase subunit family, which is common across many life forms because of its role in energy metabolism.

The earliest I could find this structure was solved was back in 2016 but in 2023 it was increased from 6.9 Å to 3.1 Å making it really close to a good quality structure. The solved structures of ATP synthase in E.coli includes multiple proteins and lipid molecules, detergent molecules, and even bound ATP or ADP.

In structural classification systems, subunit a belongs to the membrane protein α-helical bundle class, because it is mostly made of transmembrane α-helices.

cover image

Protein Visualizations: ATP Synthase 6oqw

cover image

Here's the protein as a cartoon in pyMOL

cover image

Here it is by secondary structure, it has a lot more helices than sheets

cover image

Here it is with hydrophobic in orange, polar in cyan, positive in blue, and negative in magenta. It's mostly hydrophobic, which is important for membrane stabilization.

cover image

The surface of the protein does not have any large holes and instead has narrow channels.

Group Brainstorm on Bacteriophage Engineering

Computational Engineering of the MS2 Lysis Protein (L) Background. The MS2 L protein is a 75-amino-acid polypeptide that lyses E. coli by an incompletely understood mechanism. Its C-terminal transmembrane (TM) domain inserts into the cytoplasmic membrane and oligomerizes, causing depolarization that triggers host autolytic enzymes to degrade the murein layer. Recessive, conservative missense mutations clustered around a conserved LS dipeptide strongly implies L engages an unidentified host protein target rather than simply disrupting the bilayer. The dispensable N-terminal domain binds chaperone DnaJ (with solved PDB structures), modulating lysis timing. Its removal causes lysis ~20 min earlier. No experimental structure of L exists. Goals. (1) Stabilize L for more robust membrane accumulation. (2) Accelerate lysis by bypassing DnaJ-dependent regulatory timing and improving delivery of functional L to the membrane. Because the downstream lytic target is unknown, we do not attempt to enhance per-molecule toxicity at the point of target engagement; we focus on removing regulatory brakes and increasing the supply of functional protein. Pipeline: Three Tools, Each Non-Redundant

Clustal Omega (Conservation Map). Align L homologs across Leviviridae (MS2, f2, R17, GA, PP7, AP205, PRR1, M12, KU1, JP34). Conserved C-terminal residues, especially the LS motif, are presumed to mediate the unknown heterotypic interaction and are excluded from mutation. This map constrains all downstream design.
ESM2 + Deep Combinatorial Scanning (Fitness Oracle). Score every single-point mutation by log-likelihood change: increases at mutable positions indicate stabilizing substitutions (Goal 1). N-terminal scanning identifies mutations that disrupt DnaJ binding (Goal 2). A strict preservation rule applies near the LS motif: mutations are evaluated for maintenance of wild-type fitness, not improvement. The genetics show even conservative changes there cause recessive loss of function. Pairwise combinatorial scanning (about ~2M pairs) captures epistatic synergies at mutable positions. This could be potentially pushed further with enough compute.
AlphaFold 3 (Structural Filter + Complex Model). Predicts variant structures as a sanity check (does the TM helix survive?) and models the L–DnaJ complex to verify that N-terminal truncations/mutations disrupt the regulatory interface. Used as a filter, not a design engine. PAE matrix identifies confident interface contacts. Ranking. Composite score: ESM2 log-likelihood gain (stability) + conservation preservation (all essential residues intact) + AF3-predicted DnaJ-binding disruption (for timing bypass). Top 10–20 variants advance to experimental validation. Pipeline Schematic

cover image

Why Not More Tools? ProteinMPNN is excluded because it is trained on crystallized globular PDB proteins, not predicted structures of disordered membrane peptides. The compute is invested in combinatorial ESM2 depth. Pitfalls No experimental structure: All structural reasoning rests on AF3 predictions for a challenging target; mitigated by treating AF3 as a filter and cross-referencing against the conservation map. Unknown lytic target: The central limitation. We cannot optimize target-binding affinity for an unidentified partner; engineering is restricted to upstream properties (stability, membrane delivery, DnaJ bypass). Autolysin bottleneck: If lysis rate is limited by host autolytic enzyme activity rather than L accumulation, stabilization gains may show diminishing returns; the plaque assay will reveal this.

Sources

AI Prompts

Write PyMOL code to help me visualize the protein as “cartoon”, “ribbon” and “ball and stick”, color the protein by secondary structure, color the protein by residue type, and visualize the surface of the protein