Week 4 HW: Protein Design Part I

Part A

  1. Number of amino acids = 500 g / (100 g/mol, i.e. about 100 Da per amino acid) = 5 mol ≈ 3.0 x 10^24 molecules of amino acids.
  2. Over the course of digestion, the polynucleotides are broken down into constituent nucleobases prior to absorption.
  3. Considering the degeneracy of codons, these 20 amino acids were sufficient to produce a diversity of proteins, and were evolutionarily conserved due to a lack of strong selection pressure against them (i.e. they were “good enough”).
  4. .
  5. They could’ve formed abiotically from organic precursors such as carboxylic acids, amines and small-molecule side groups, catalyzed by zeolites and driven by lightning on the primordial Earth or by heat on extraterrestrial asteroids.
  6. Left-handed spirals.
  7. Yes, some examples are $\pi$ helices and $3_{10}$ helices.
  8. It’s more energetically stable, given that L-amino acids predominate in natural protein helices and D-saccharides predominate in the sugar-phosphate backbone of polynucleotides.
  9. In aqueous environments, the hydrophobic portions of beta sheets are driven together by the hydrophobic effect while the hydrophilic groups facilitate relatively strong intermolecular hydrogen bonding.
  10. Beta sheets allow for strong intermolecular bonding, in hydrophobic zippers, that renders misfolded proteins more stable than their functional conformations and facilitates the addition of more proteins to the aggregate fibril - the biochemical basis of amyloid pathogenesis. Yes, you could make materials out of amyloid beta sheets but they’d only be stable under aqueous conditions and would be potential biotoxins.
  11. .
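The arithmetic in item 1 can be checked with a short script. This is only a sketch under the same assumptions as the answer: the full 500 g is treated as amino acids, and 100 Da is taken as the average mass per amino acid. The function name `amino_acid_count` is made up for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def amino_acid_count(mass_g, avg_mass_da=100.0):
    """Estimate the number of amino acid molecules in a given mass.

    Assumes the whole mass is amino acids with the stated average mass;
    1 Da per molecule corresponds to 1 g/mol.
    """
    moles = mass_g / avg_mass_da
    return moles * AVOGADRO

n_molecules = amino_acid_count(500.0)
print(f"{n_molecules:.1e} amino acids")
```

Changing `avg_mass_da` (for example to about 110 Da, a commonly quoted average residue mass) shifts the estimate only slightly, so the order of magnitude holds.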

Part B

  1. I chose to investigate an antifreeze protein from an arctic bacterium as a potential solution for winter snow-clearing.

  2. The sequence, as obtained from UniProt, is as follows:

>sp|H7FWB6|IBP_FLAFP Ice-binding protein OS=Flavobacterium frigoris (strain PS1) OX=1086011 GN=HJ01_03463 PE=1 SV=1
MKILKRIPVLAVLLVGLMTNCSNDSDSSSLSVANSTYETTALNSQKSSTDQPNSGSKSGQ
TLDLVNLGVAANFAILSKTGITDVYKSAITGDVGASPITGAAILLKCDEVTGTIFSVDAA
GPACKITDASRLTTAVGDMQIAYDNAAGRLNPDFLNLGAGTIGGKTLTPGLYKWTSTLNI
PTDITISGSSTDVWIFQVAGNLNMSSAVRITLAGGAQAKNIFWQTAGAVTLGSTSHFEGN
ILSQTGINMKTAASINGRMMAQTAVTLQMNTVTIPQ
  • It is 276 residues long. Threonine is the most common amino acid residue, appearing 33 times.
  • There are 250 homologs from across the tree of life.
  • It belongs to the ice-binding protein superfamily.
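The length and residue composition can be recomputed directly from the sequence pasted above. The snippet below is a minimal sketch using only the Python standard library; the variable name `IBP_FLAFP` is just a label chosen here, and the string is copied from the FASTA record with the header line omitted.

```python
from collections import Counter

# Sequence copied from the FASTA record above (header line omitted).
IBP_FLAFP = (
    "MKILKRIPVLAVLLVGLMTNCSNDSDSSSLSVANSTYETTALNSQKSSTDQPNSGSKSGQ"
    "TLDLVNLGVAANFAILSKTGITDVYKSAITGDVGASPITGAAILLKCDEVTGTIFSVDAA"
    "GPACKITDASRLTTAVGDMQIAYDNAAGRLNPDFLNLGAGTIGGKTLTPGLYKWTSTLNI"
    "PTDITISGSSTDVWIFQVAGNLNMSSAVRITLAGGAQAKNIFWQTAGAVTLGSTSHFEGN"
    "ILSQTGINMKTAASINGRMMAQTAVTLQMNTVTIPQ"
)

# Tally every residue and pick out the most frequent one.
counts = Counter(IBP_FLAFP)
most_common_residue, occurrences = counts.most_common(1)[0]

print("length:", len(IBP_FLAFP))
print("most common residue:", most_common_residue, occurrences)
```

Running the count on the sequence itself, rather than reading it off a database page, guards against transcription slips.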
  3. The structure was modelled in Apr 2014. The model is high-resolution, providing details down to 2.10 Å. There are no non-protein components in the final protein structure.

  4. Cartoon View [Figure: Cartoon View of IBP_FLAFP]

Ribbon View [Figure: Ribbon View of IBP_FLAFP]

Ball-and-Stick View [Figure: Ball-and-Stick View of IBP_FLAFP]

Labelling by Secondary Structure [Figure: Secondary Structure-Labelled View of IBP_FLAFP]

The protein has more sheets than helices, though the helices are substantially larger.

Labelling by residue hydropathy

Blue represents hydrophilicity while red represents hydrophobicity. [Figure: Hydropathic View of IBP_FLAFP]

Hydrophobic residues are encountered sparingly, on outward-oriented branches, while hydrophilic residues are found on the main beta sheets accessible to the protein’s surroundings.
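A per-residue hydropathy colouring like the one described can be approximated numerically. The sketch below assumes the Kyte-Doolittle scale, which may not be the scale the visualization software actually used; the helper names `hydropathy_profile` and `colour`, the window size of 9, and the zero cut-off between red and blue are all illustrative choices.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score over each sliding window of the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def colour(score):
    """Map a score onto the red (hydrophobic) / blue (hydrophilic) scheme."""
    return "red" if score > 0 else "blue"

# First 20 residues of IBP_FLAFP, taken from the sequence in Part B.
n_terminus = "MKILKRIPVLAVLLVGLMTN"
for start, score in enumerate(hydropathy_profile(n_terminus), start=1):
    print(start, round(score, 2), colour(score))
```

A windowed average smooths out single residues, so it shows hydrophobic stretches rather than the residue-by-residue colouring of the figure.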

Surface View [Figure: Surface Energy View of IBP_FLAFP]

There are a few pockets for water to bind to.

Part C

C1

Deep Mutational Scans

  1. [Figure: Mutational Heat Map]

  2. I think the bright bands which occur at regular intervals signify residues involved in linking helices together since they offer some flexibility in binding nucleation sites.
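For orientation, the layout behind such a heat map can be sketched in plain Python: every position is substituted with each of the 20 amino acids and each mutant is scored relative to the wild type. The scoring function here, `toy_score`, is a deliberately trivial placeholder and not a protein language model; in the real scan it would be replaced by the model's score for each mutant. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_point_mutants(seq):
    """Yield (position, wild_type, mutant_residue, mutant_sequence) tuples."""
    for index, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield index + 1, wild_type, aa, seq[:index] + aa + seq[index + 1:]

def scan_matrix(seq, score_fn):
    """Build a 20 x len(seq) matrix of score differences versus wild type.

    Rows follow AMINO_ACIDS, columns follow sequence positions; the entry
    for the wild-type residue at each position is 0 by construction.
    """
    baseline = score_fn(seq)
    return [
        [score_fn(seq[:col] + aa + seq[col + 1:]) - baseline
         for col in range(len(seq))]
        for aa in AMINO_ACIDS
    ]

def toy_score(seq):
    """Placeholder score (fraction of threonine); NOT a real fitness model."""
    return seq.count("T") / len(seq)

fragment = "MKILKRIPVL"  # first 10 residues of IBP_FLAFP
matrix = scan_matrix(fragment, toy_score)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

Columns whose entries are uniformly unfavourable correspond to positions that tolerate few substitutions, which is what bands in the heat map reflect.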

Latent Space Analysis

  1. Yes, they tend to be homologs.

C2

Folding [Figure: Protein Folding]

  1. It largely matches the original structure including the spiral linking several beta sheets and a large alpha helix. However, it fails to model the hydrophobic side branches of the protein separately and instead connects them using a polypeptide chain.
  2. No, it is quite sensitive to missense mutations. This comports with the earlier heatmap of mutational scans.

C3

Inverse Folding

[Figure: Inverse Folding]

Comparison of Structures

Original: [Figure: Protein Folding]

Modified Sequence: [Figure: Modified Protein]

In comparison to the original, the predicted structure of the modified protein retains the broad helix and beta sheet spiral structure, but its beta sheet spiral is misshapen and the hydrophobic branches are overlarge. Given the difference between the experimentally derived structure and the computational folding prediction for the original protein, I expect that this new sequence would fail to demonstrate ice-binding behavior in a biological environment.

D

Protein Engineering Goals

  1. Enhance the thermal stability of MS2-L peptide derived from the ssRNA Leviviridae phage

Protein Engineering Workflow

  1. Use Protein Language Models to conduct high-throughput in silico mutation of the protein
  2. Grade mutants’ thermal stability using Rosetta
  3. Carrying forward the top quintile of mutants as seeds for the next iteration, repeat the process till increments in stability are minimal
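The loop in the workflow above can be sketched as follows. This is a toy illustration, not the actual pipeline: `propose_mutants` uses random substitutions as a stand-in for protein-language-model proposals, `toy_stability` is a made-up score standing in for a Rosetta energy evaluation, and the seed peptide `MKTAYSGE` is an arbitrary placeholder rather than the MS2-L sequence. The pool size, tolerance and round limit are likewise assumed values.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose_mutants(seeds, n, rng):
    """Stand-in for PLM-guided mutation: n random single substitutions."""
    mutants = []
    for _ in range(n):
        parent = rng.choice(seeds)
        pos = rng.randrange(len(parent))
        mutants.append(parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:])
    return mutants

def toy_stability(seq):
    """Placeholder score (fraction of hydrophobic residues); NOT Rosetta."""
    return sum(aa in "AILMFVW" for aa in seq) / len(seq)

def optimise(seed, score_fn, pool_size=100, max_rounds=50, tol=1e-3, rng_seed=0):
    """Iterate mutate -> score -> keep top quintile until gains fall below tol."""
    rng = random.Random(rng_seed)
    seeds = [seed]
    best_seq, best_score = seed, score_fn(seed)
    for _ in range(max_rounds):
        # Current seeds stay in the pool, so the best score never decreases.
        pool = set(seeds) | set(propose_mutants(seeds, pool_size, rng))
        # Sort by descending score; ties are broken alphabetically.
        ranked = sorted(pool, key=lambda s: (-score_fn(s), s))
        seeds = ranked[:max(1, len(ranked) // 5)]  # top quintile
        top_score = score_fn(seeds[0])
        gain = top_score - best_score
        if gain > 0:
            best_seq, best_score = seeds[0], top_score
        if gain < tol:
            break  # increments in the score have become minimal
    return best_seq, best_score

best_seq, best_score = optimise("MKTAYSGE", toy_stability)
print(best_seq, round(best_score, 3))
```

Keeping the current seeds in each round's pool makes the search elitist, so stopping on a small gain cannot discard an earlier, better mutant.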