Week 4 HW: Protein Design Part I

Part A: Conceptual Questions

  1. Since 1 Da = 1 g/mol, 500 g of meat equates to about 5 mol of amino acids. The approximate number of amino acids within these 5 mols would be 3x10^24.

  2. When humans eat, the macromolecules the beef are made of are broken down during digestion into monomers. These monomers are common to all life, and humans use them to build human-specific macromolecules.

  3. These 20 amino acids are what evolution happened to select for. These 20 amino acids happen to be enough to build all the proteins that are necessary for life that has evolved on Earth. Theoretically, there could be more, but in our “system” of life, these 20 are enough.

  4. Theoretically, yes. The R group can be anything, but only some will be functional with the 20 natural amino acids. Below are designs of new amino acids; I based them off of preexisting ones and added/removed atoms new amino acids

  5. Amino acids formed under natural conditions on Earth. One example is demonstrated by the Miller-Urey experiments where they demonstrated that atmospheric gases could form amino acids spontaneously when energy is put in in the form of lightning.

  6. Left-handed because natural amino acids are L and create right-handed α-helixes. This means D-amino acids mirror L-amino acids, so the α-helix would also be mirrored.

  7. Yes, you can discover new α-helices that already exist in proteins using tools/techniques like AlphaFold and X-ray crystallography. Discovering novel types of helices would require changing the chemistry of the amino acids, so discovering pre-existing ones would be unlikely. Creating new types of helices would be interesting, though.

  8. This is because most biological molecules are D-enantiomers (right-handed), so they also create right-handed helices.

  9. β-sheets are formed using hydrogen bonds across backbones. β-sheet backbones “want” to form more hydrogen bonds with another peptide backbone, and can do that by stacking with other β-sheets.

  10. Many amyloid diseases form β-sheets because hydrogen bonding in β-sheets is extremely favorable, so if the protein is able to “misfold” into β-sheets, it will do that. β-sheets can be used as materials. In fact, silk is rich in β-sheets, which is why it has so much tensile strength.

Part B: Protein Analysis and Visualization

Part C: Using ML-Based Protein Design Tools

C1: Protein Language Modeling

C2: Protein Folding

C3: Protein Generation

Part D: Group Brainstorm on Bacteriophage Engineering

I collaborated with Heather Qian on this assignment!

Computational Goal: We will attempt to increase the titer of the L protein expressed by MS2 in the E. coli host.

Overall solution: create a new transcription factor that binds very tightly to the promoter, increasing expression of the L protein

Inverse protein folding using ProteinMPNN

  • Use structure of a transcription factor (a PDB file) that binds with very high affinity to a promoter that is highly homologous to the L protein promoter
  • Input the structure of the aforementioned transcription factor into ProteinMPNN along with the amino acid sequence of a native transcription factor that binds to the L protein promoter
  • This will theoretically generate the sequence of a protein that is structurally similar to a transcription factor with high DNA-binding affinity but is specific to the L protein promoter

Confirm the binding affinity between our designed transcription factor and the MS2 L protein promoter with a ligand-binding AI model

Why will these tools accomplish our computational goal?

  • Protein MPNN is an inverse-folding algorithm. The sequence of the L protein promoter and other transcription factors that bind to this promoter are known. However, to generate a transcription factor with higher affinity for this promoter sequence than the native L protein transcription factors, we will model our inverse-folding after an existing transcription factor that behaves in the manner we envision (binds with high affinity) for our engineered transcription factor. As discussed in this week’s lecture, existing AI algorithms are good at designing structures similar to previously-characterized structures but less good at designing different (novel) structures. As such, our inverse-folding approach hinges on the existence of another high affinity transcription factor for a homologous promoter
  • Using a ligand-binding AI model will provide a computational indication of the success of our engineering without the need to use an in vitro or in vivo model

Possible pitfalls:

  • We are unable to find a transcription factor that binds to a highly homologous promoter sequence with high affinity :(
  • Inaccuracies in AI predictions :(

Schematic: Bacteriophage Engineering Schematic Bacteriophage Engineering Schematic