Week 4 HW: Protein Design

Part A: Conceptual Questions (answer any of the following questions from Shuguang Zhang)

Why do beta-sheets tend to aggregate?

A beta-strand forms when a stretch of a protein’s backbone (the repeating NH–Cα–CO chain shared by every amino acid) extends into a nearly flat zigzag. When two or more of these strands line up side by side and are linked by hydrogen bonds (an N–H on one strand bonding to a C=O on its neighbor), they form a beta-sheet. The strands on the two outer edges, however, still have a full row of unpaired N–H and C=O groups. Any nearby strand, from the same protein or from another copy of it, can pair with that exposed edge, which creates a new exposed edge that can recruit yet another strand, and so on. This open-ended growth is why beta-sheets tend to aggregate.

What forces pull sheets together?

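  • The hydrophobic effect is the biggest one. In a beta-strand, side chains point alternately above and below the plane of the sheet, so one face of a sheet can end up mostly hydrophobic. Two sheets then stack face to face so that these greasy surfaces are buried between them, away from water.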

  • Hydrogen bonding gives the structure its regularity. Each new strand that joins a sheet edge contributes roughly one backbone H-bond per residue along its length. Individually, these H-bonds add little net stability in water, because breaking one with a neighboring strand just lets each group hydrogen-bond with a water molecule instead. Across a strand of ten or more residues, however, they add up meaningfully.

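  • Van der Waals packing stabilizes sheets once they have stacked together. Van der Waals forces are much weaker than hydrogen bonds and act only over very short distances; they arise from temporary, fluctuating dipoles. When the side chains of two facing sheets interlock tightly, though, the many close contacts add up.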
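The short range of van der Waals attraction can be illustrated with the Lennard-Jones 12-6 potential, a common simplified model for a pair of atoms (a general physics model, not something specific to beta-sheets). Here r is the distance between the atoms, ε is the depth of the energy well, and σ is the distance at which the potential crosses zero. The attractive −(σ/r)^6 term corresponds to the fluctuating-dipole (London dispersion) attraction, so doubling the distance weakens it 64-fold:

```latex
V_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right],
\qquad
\frac{V_{\mathrm{attr}}(2r)}{V_{\mathrm{attr}}(r)} = \left(\frac{1}{2}\right)^{6} = \frac{1}{64}
```

This steep falloff is why van der Waals packing only matters once two sheets are already in close contact.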

Part B: Protein Analysis and Design

Briefly describe the protein you selected and why you selected it.

I selected a macrocyclic peptide for the following reasons:

  • They can interfere with protein-protein interactions (PPIs), which makes them useful as therapeutics.
  • They can permeate membranes, since they are small and can change conformation depending on the hydrophobicity of their environment.
  • They can be designed with ML tools to bind specific targets.
  • Compared to linear peptides, they are more resistant to proteases, because their N- and C-termini are joined or otherwise shielded, leaving no free ends for proteases to attack.

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
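The length and amino-acid counts can also be computed in a few lines of plain Python. This is a minimal sketch, not the course notebook; the sequence in the example is a hypothetical placeholder, so paste in your own one-letter sequence. Since macrocyclic peptides may contain non-standard residues, any extra symbols are simply counted as written.

```python
from collections import Counter
from typing import List, Tuple


def clean_sequence(seq: str) -> str:
    """Uppercase the sequence and drop whitespace/newlines (e.g. from a pasted FASTA body)."""
    return "".join(seq.split()).upper()


def aa_frequencies(seq: str) -> Counter:
    """Count how many times each one-letter code occurs."""
    return Counter(clean_sequence(seq))


def most_frequent(seq: str) -> List[Tuple[str, int]]:
    """Return every residue tied for the highest count, sorted alphabetically."""
    counts = aa_frequencies(seq)
    if not counts:
        return []
    top = max(counts.values())
    return sorted((aa, n) for aa, n in counts.items() if n == top)


if __name__ == "__main__":
    # Hypothetical placeholder sequence, NOT the protein from this assignment.
    seq = "ACDKLLGKWLA"
    length = len(clean_sequence(seq))
    print("Length:", length)
    for aa, n in most_frequent(seq):
        print(f"Most frequent: {aa} ({n}x, {n / length:.1%})")
```

Returning all tied residues avoids silently picking one when two amino acids share the top count, which is common in short peptides.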

How many protein sequence homologs are there for your protein? Hint: Use UniProt’s BLAST tool to search for homologs.

When was the structure solved? Is it a good-quality structure? A good-quality structure is one with good (high) resolution; the smaller the resolution value in Å, the better. (Resolution: 2.70 Å)
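If the structure is downloaded in the legacy PDB text format, the resolution can be read from its REMARK 2 line. The sketch below assumes that format (mmCIF files store it differently), and `rank_by_resolution` is a hypothetical helper showing the "smaller is better" rule when comparing several candidate entries.

```python
import re
from typing import Dict, List, Optional

# Legacy PDB-format files state X-ray resolution in a REMARK 2 line such as
# "REMARK   2 RESOLUTION.    2.70 ANGSTROMS."
_RESOLUTION_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")


def parse_resolution(pdb_text: str) -> Optional[float]:
    """Return the resolution in Å, or None if the file states none
    (e.g. 'NOT APPLICABLE' for NMR structures)."""
    for line in pdb_text.splitlines():
        match = _RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


def rank_by_resolution(resolutions: Dict[str, float]) -> List[str]:
    """Hypothetical helper: sort entry IDs best-first (smaller value = finer detail)."""
    return sorted(resolutions, key=resolutions.get)
```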

Are there any other molecules in the solved structure apart from the protein?
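In a PDB-format file, non-protein molecules such as ligands, ions, and waters are usually written as HETATM records. The sketch below assumes that format and reads the fixed residue-name, chain, and residue-number columns to list each distinct hetero group. One caveat: modified residues that are part of the chain (for example selenomethionine, MSE) also appear as HETATM, so the result should be checked against the entry's ligand list.

```python
from collections import Counter


def hetero_groups(pdb_text: str, skip_water: bool = True) -> Counter:
    """Count distinct HETATM residues by residue name (e.g. ligands, ions)."""
    seen = set()
    counts: Counter = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        line = line.ljust(27)  # pad short lines so fixed-column slicing is safe
        resname = line[17:20].strip()  # columns 18-20: residue name
        if skip_water and resname == "HOH":
            continue
        # chain ID (col 22), residue number (cols 23-26), insertion code (col 27)
        key = (resname, line[21], line[22:26].strip(), line[26])
        if key not in seen:  # count each residue once, not once per atom
            seen.add(key)
            counts[resname] += 1
    return counts
```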

Does your protein belong to any structure classification family?

Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
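Coloring by residue type shows where hydrophobic residues sit in 3D; a sequence-level complement is the Kyte-Doolittle hydropathy scale. The sketch below computes the GRAVY score (mean hydropathy; positive means overall hydrophobic) and a sliding-window profile. It covers only the 20 standard amino acids, so non-canonical residues common in macrocyclic peptides would need their own values, and it says nothing about 3D placement.

```python
from typing import List

# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _clean(seq: str) -> str:
    return "".join(seq.split()).upper()


def gravy(seq: str) -> float:
    """Grand average of hydropathy. Raises KeyError for non-standard letters."""
    seq = _clean(seq)
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each sliding window; peaks suggest hydrophobic stretches."""
    if window < 1:
        raise ValueError("window must be >= 1")
    seq = _clean(seq)
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
```

For a short peptide, a small window (for example 3 to 5) is more informative than the default of 9; if the window is longer than the sequence, the profile is empty.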

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Part C: Using ML-Based Protein Design Tools