Week 5 HW: Protein Design Part II

  • [] Homework — DUE BY START OF MAR 10 LECTURE

Part A: SOD1 Binder Peptide Design (From Pranam)

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mechanis

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • [] Task A

Your challenge:

    Background: Design short peptides that bind mutant SOD1 and then decide which ones are worth advancing toward therapy. You will use three models developed in our lab:
  • PepMLM: target sequence-conditioned peptide generation via masked language modeling.
  • PeptiVerse: therapeutic property prediction.
  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

    Part 1: Generate Binders with PepMLM

                      Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
    

    🟢🟤🟡P00441

    Here is the fully translated superoxide dismutase protein P00441 in UniProt, with the initiator methionine included. We need to cleave that M off before we apply the requested mutation, so that residue numbers refer to the mature enzyme.

    So not this…

        1 2 3 4
        M A T K

    But this…

        1 2 3 4
        A T K A

    To create our A4V SOD (love the rhyme) mutant…

        1 2 3 4
        A T K V
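    The renumbering above can be sketched in a few lines of Python. This is only an illustration: the helper name apply_mature_mutation is hypothetical, and the 12-residue wild-type N-terminus is inferred from the numbering shown above rather than downloaded from UniProt.

```python
import re


def apply_mature_mutation(precursor: str, mutation: str) -> str:
    """Apply a point mutation written in mature-protein numbering
    (initiator Met removed) to a sequence that still starts with Met."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not precursor.startswith("M"):
        raise ValueError("expected a sequence that starts with the initiator Met")
    # Mature residue n sits at 0-based index n in the precursor,
    # because the initiator Met occupies index 0.
    index = position
    if index >= len(precursor) or precursor[index] != wild_type:
        raise ValueError(f"residue {position} is not {wild_type} in this sequence")
    return precursor[:index] + mutant + precursor[index + 1:]


# First 12 residues of wild-type SOD1 with the initiator Met (assumed from the text above).
wild_type_n_term = "MATKAVCVLKGD"
mutant_n_term = apply_mature_mutation(wild_type_n_term, "A4V")
print(mutant_n_term)
```

    The check on the wild-type residue is the useful part: if the numbering is off by one, the function raises instead of silently mutating the wrong position.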

    The savvy student who skips cleaving the first methionine (M) can still intuit which amino acid to change without thinking through any of the previous steps, but it’s nice to have a why in all things, since this is biology after all and we have evolution and ChatGPT. Please note that we do not want to use a protein sequence with any sort of truncation or line wrapping, so here are my sequences for PepMLM-650M.

                      Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card: 
    

    🤗pepmlm650mlink

                      Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
    

    Mutant A4V SOD1 for PepMLM-650M: there are two input options, the full protein sequence or a 12-residue N-terminal input, which I settled on in later runs.
    MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

    colabcode

    MATKVVCVLKGD

    Within the PepMLM-650M Google Colab notebook, there are sliders and input fields for setting the parameters of each run. However, the values I entered there didn’t seem to take effect, so I ended up hard-coding the changes, shown below as a series of excerpts pulled from the notebook.

    single_sequence = True #@param {type:"boolean"}
    protein_seq = "MATKVVCVLKGD" #@param {type:"string"}
    # Initial value for num_binders
    num_binders = 4
    # Initial values for top_k and peptide_length
    top_k = 3
    peptide_length = 12

    code_constrained_step

    Initial_4in1_SequenceSet

    Binder          Pseudo-perplexity score
    WVVVLVAGVVGE    35.014933
    LTLVVAVGEVGE    25.582245
    SVTEEVEDVDPV    21.336863
    LPTVVVEGVDPE    17.079494

    To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

    Record the perplexity scores that indicate PepMLM’s confidence in the binders.
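    To keep the recorded scores in one place, here is a small sketch that ranks the peptides from the table above. Lower perplexity means the model is more confident. The known binder is listed separately because it was not scored by PepMLM in this run; the variable and function names are illustrative only.

```python
# Pseudo-perplexity scores copied from the PepMLM run above.
scores = {
    "WVVVLVAGVVGE": 35.014933,
    "LTLVVAVGEVGE": 25.582245,
    "SVTEEVEDVDPV": 21.336863,
    "LPTVVVEGVDPE": 17.079494,
}
# Known SOD1-binding peptide added for comparison (no PepMLM score recorded here).
known_binder = "FLYRWLPSRRGG"


def rank_by_perplexity(peptide_scores):
    """Return (peptide, score) pairs sorted from most to least confident."""
    return sorted(peptide_scores.items(), key=lambda item: item[1])


ranked = rank_by_perplexity(scores)
for rank, (peptide, perplexity) in enumerate(ranked, start=1):
    print(f"{rank}. {peptide}  {perplexity:.2f}")
print(f"-  {known_binder}  (known binder, not scored in this run)")
```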

    Part 2: Evaluate Binders with AlphaFold3

    Navigate to the AlphaFold Server: alphafoldserver.com

    For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

    Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

    In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.
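    Submitting five complexes by hand is tedious, so one option is to prepare the jobs as a JSON file. The sketch below builds one two-chain job per peptide. The field names ("proteinChain", "modelSeeds", "dialect", "version") reflect my understanding of the AlphaFold Server JSON job format and are an assumption here; check them against the server's current documentation before uploading. The job names and the output filename are made up for illustration.

```python
import json

# Full A4V mutant sequence used in this write-up (initiator Met retained).
SOD1_A4V = (
    "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSR"
    "KHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGN"
    "AGSRLACGVIGIAQ"
)

PEPTIDES = [
    "WVVVLVAGVVGE",
    "LTLVVAVGEVGE",
    "SVTEEVEDVDPV",
    "LPTVVVEGVDPE",
    "FLYRWLPSRRGG",  # known SOD1-binding peptide, for comparison
]


def build_job(target: str, peptide: str) -> dict:
    """Describe one protein-peptide complex as two separate chains."""
    return {
        "name": f"SOD1_A4V_{peptide}",  # hypothetical naming scheme
        "modelSeeds": [],  # assumed: empty list lets the server pick a seed
        "sequences": [
            {"proteinChain": {"sequence": target, "count": 1}},
            {"proteinChain": {"sequence": peptide, "count": 1}},
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }


jobs = [build_job(SOD1_A4V, peptide) for peptide in PEPTIDES]
jobs_json = json.dumps(jobs, indent=2)

# Write the jobs to a file that could be uploaded (filename is arbitrary).
with open("sod1_a4v_jobs.json", "w") as handle:
    handle.write(jobs_json)
```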

    Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

    Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

    Paste the peptide sequence.

    Paste the A4V mutant SOD1 sequence in the target field.

    Check the boxes:
      • Predicted binding affinity
      • Solubility
      • Hemolysis probability
      • Net charge (pH 7)
      • Molecular weight

    Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
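    Two of the properties listed above, net charge at pH 7 and molecular weight, can be sanity-checked by hand. The sketch below is not PeptiVerse's method: it sums average residue masses plus one water, and estimates charge with the Henderson-Hasselbalch equation using assumed textbook-style pKa values. Small differences from the PeptiVerse output are expected.

```python
# Average residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

# Assumed pKa values; published scales differ slightly from one another.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def molecular_weight(peptide: str) -> float:
    """Average molecular weight of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def net_charge(peptide: str, ph: float = 7.0) -> float:
    """Estimated net charge of a linear peptide with free termini."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for aa in peptide:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


for peptide in ["WVVVLVAGVVGE", "LTLVVAVGEVGE", "SVTEEVEDVDPV",
                "LPTVVVEGVDPE", "FLYRWLPSRRGG"]:
    print(f"{peptide}  MW={molecular_weight(peptide):.1f} Da  "
          f"charge={net_charge(peptide):+.2f}")
```

    The PepMLM peptides are rich in D and E and come out negatively charged, while the known binder carries three arginines and comes out positive, which is worth keeping in mind when comparing solubility and hemolysis predictions.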

    Choose one peptide you would advance and justify your decision briefly.

    Part 4: Generate Optimized Peptides with moPPIt

    Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

    Open the moPPIt Colab linked from the HuggingFace moPPIt model card. Make a copy and switch to a GPU runtime.

    In the notebook:
      • Paste your A4V mutant SOD1 sequence.
      • Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
      • Set peptide length to 12 amino acids.
      • Enable motif and affinity guidance (and solubility/hemolysis guidance if available).
      • Generate peptides.

    After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?
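    When choosing residue indices near position 4, remember that "A4V" uses mature numbering while the pasted sequence may still include the initiator Met. The helper below is a hypothetical sketch that converts a mature position into a window of 1-based indices in the pasted sequence. Whether the moPPIt notebook expects 1-based indices, and how it wants them formatted, is an assumption to verify in the notebook itself.

```python
def motif_indices(center_mature: int, flank: int, seq_len: int,
                  has_initiator_met: bool = True) -> list:
    """Return 1-based indices in the pasted sequence for a window
    centered on a residue given in mature-protein numbering."""
    # The initiator Met shifts every mature position by one.
    offset = 1 if has_initiator_met else 0
    center = center_mature + offset
    # Clamp the window to the ends of the sequence.
    start = max(1, center - flank)
    end = min(seq_len, center + flank)
    return list(range(start, end + 1))


# Window of two residues on either side of V4 in the 154-residue sequence with Met.
window = motif_indices(center_mature=4, flank=2, seq_len=154)
print(window)  # [3, 4, 5, 6, 7]
```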

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

  • [] Task B - Boltz Document

https://docs.google.com/document/d/18Vd9TQL2FjpEU0QdlGCgHe1D0BDoMzcfPRiFEXQIAas/preview

Part C: Final Project: L-Protein Mutants

This homework requires computation that might take you a while to run, so please get started early.

Tools

See HTGAA Protein Engineering Tools spreadsheet