Week 5 HW: Protein Design Part II

  • [] Homework — DUE BY START OF MAR 10 LECTURE

Part A: SOD1 Binder Peptide Design (From Pranam)

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mechanis

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • [] Task A

Your challenge:

    Background: Design short peptides that bind mutant SOD1 and then decide which ones are worth advancing toward therapy. You will use three models developed in our lab:
  • PepMLM: target sequence-conditioned peptide generation via masked language modeling.
  • PeptiVerse: therapeutic property prediction.
  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)
  • Part 1: Generate Binders with PepMLM

                      Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
    

    🟢🟤🟡P00441

    This is the fully translated superoxide dismutase sequence, UniProt P00441, with the initiator methionine included. Mutation names such as A4V count from the mature enzyme, so we need to cleave that M off before applying the requested mutation.

    So not this…
        1 2 3 4
        M A T K

    But this…
        1 2 3 4
        A T K A

    To create our A4V SOD (love the rhyme) mutant…
        1 2 3 4
        A T K V

    A savvy student who skips cleaving the first methionine (M) can still intuit which amino acid to change: in the UniProt numbering, which keeps the M, the A4V alanine sits at position 5. Still, it’s nice to have a why in all things, since this is biology after all and we have evolution and ChatGPT. Note that the input sequence should not be truncated or line-wrapped, so here are my sequences for PepMLM-650M.
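A minimal Python sketch of the numbering logic above. It assumes that mutation names such as A4V count from the mature protein (initiator methionine removed) while the input sequence still carries the M. The function name `apply_mature_mutation` is hypothetical, and only the first 12 residues of P00441 are used.

```python
def apply_mature_mutation(uniprot_seq: str, wt: str, pos: int, mut: str) -> str:
    """Apply a mutation named in mature-protein numbering to a sequence
    that still includes the initiator methionine."""
    if not uniprot_seq.startswith("M"):
        raise ValueError("expected a sequence that starts with the initiator M")
    # Mature position `pos` is UniProt position pos + 1, i.e. 0-based index `pos`.
    idx = pos
    if uniprot_seq[idx] != wt:
        raise ValueError(
            f"expected {wt} at mature position {pos}, found {uniprot_seq[idx]}"
        )
    return uniprot_seq[:idx] + mut + uniprot_seq[idx + 1:]


wt_nterm = "MATKAVCVLKGD"  # first 12 residues of human SOD1 (P00441), M included
a4v_nterm = apply_mature_mutation(wt_nterm, "A", 4, "V")
print(a4v_nterm)  # MATKVVCVLKGD
```

The wild-type check guards against the off-by-one error described above: asking for A4V in UniProt numbering would hit the lysine at position 4 and raise an error instead of silently mutating the wrong residue.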

                      Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card: 
    

    🤗pepmlm650mlink

                      Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
    


    Mutant A4V SOD for PepMLM-650M. There are two input options: the full protein sequence (first, below) and a 12-residue N-terminal input (second), which I settled on in later runs.
    MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

    colabcode

    MATKVVCVLKGD

    The PepMLM-650M Google Colab notebook has sliders and input fields to parameterize individual runs. However, those settings didn’t seem to take effect, so I finally hard-coded the changes, shown below as a series of excerpts pulled from the codebase.

    single_sequence = True  #@param {type:"boolean"}
    protein_seq = "MATKVVCVLKGD"  #@param {type:"string"}
    # Initial value for num_binders
    num_binders = 4
    # Initial values for top_k and peptide_length
    top_k = 3
    peptide_length = 12

    code_constrained_step

    Initial_4in1_SequenceSet

    Binder          Pseudo_Perplexity_Score
    WVVVLVAGVVGE    35.014933
    LTLVVAVGEVGE    25.582245
    SVTEEVEDVDPV    21.336863
    LPTVVVEGVDPE    17.079494
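To compare candidates, the scores can be sorted so that the lowest pseudo-perplexity (the binder the model is most confident in) comes first. The sketch below uses only the four values from the table. The helper `pseudo_perplexity` is a hypothetical illustration of the usual definition (the exponential of the mean negative log-likelihood over masked positions), not code taken from the PepMLM notebook.

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp(-mean(log p)) over per-residue log-probabilities (assumed definition)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Scores copied from the table above.
scores = {
    "WVVVLVAGVVGE": 35.014933,
    "LTLVVAVGEVGE": 25.582245,
    "SVTEEVEDVDPV": 21.336863,
    "LPTVVVEGVDPE": 17.079494,
}

# Lower pseudo-perplexity means higher model confidence, so sort ascending.
ranked = sorted(scores.items(), key=lambda kv: kv[1])
for binder, ppl in ranked:
    print(f"{binder}\t{ppl:.2f}")
```

As a sanity check on the definition: if the model assigned probability 0.5 to every residue, the pseudo-perplexity would be 2, so a score near 17 corresponds to the model being roughly as uncertain as a uniform choice among 17 residues per position.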

    To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

    Scores above, I think. I feel like I’m in Gulliver’s Travels, returning to this exercise cold after 2 months, so I am going to try to piece my memory together using a peer’s homework. For the next phase I need the amino acid sequence for my SOD1 target. I’m not copying information: I went to the UniProt site myself, searched for SOD1 on the splash page, and scrolling down the first page of results I found human SODC, but I skipped over that one and picked the SOD for sheep:

    P09670 · SODC_SHEEP
    MATKAVCVLKG
    DGPVQGTIRFE
    AKGDKVVVTGS
    ITGLTEGDHGF
    HVHQFGDNTQG
    CTSAGPHFNPL
    SKKHGGPKDEE
    RHVGDLGNVKA
    DKNGVAIVDIV
    DPLISLSGEYS
    IIGRTMVVHEK
    PDDLGRGGNEE
    STKTGNAGGRL
    ACGVIGIAP

    Record the perplexity scores that indicate PepMLM’s confidence in the binders.

    Given my confusion, the perplexity scores are likely very high, and a higher pseudo-perplexity means lower PepMLM confidence. The structural model confidence reported on UniProt is spot on, though: it is very high (pLDDT > 90). pLDDT is the per-residue confidence score generated by AlphaFold, between 0 and 100. Now I am prepared to transition to Part 2.

    SheepSODC

    What about my mutants, though? I do not want to disrupt folding at random; I need an appropriate target region for the mutation logic, so I will leverage the MobiDB website.

    SheepSODCmutant

    Amino acids enumerated by position; residues were highlighted for subsequent mutagenesis based on segment flexibility, disruptability, and functional consequences

    Position  Amino Acid
    1         M
    2         A
    3         T
    4         K
    5         A
    6         V
    7         C
    8         V
    9         L
    10        K
    11        G
    12        D
    13        G
    14        P
    15        V
    16        Q
    17        G
    18        T
    19        I
    20        R
    21        F
    22        E
    23        A
    24        K
    25        G
    26        D
    27        K
    28        V
    29        V
    30        V
    31        T
    32        G
    33        S
    34        I
    35        T
    36        G
    37        L
    38        T
    39        E
    40        G
    41        D
    42        H
    43        G
    44        F
    45        H
    46        V
    47        H
    48        Q
    49        F
    50        G
    51        D
    52        N
    53        T
    54        Q
    55        G
    56        C
    57        T
    58        S
    59        A
    60        G
    61        P
    62        H
    63        F
    64        N
    65        P
    66        L
    67        S
    68        K
    69        K
    70        H
    71        G
    72        G
    73        P
    74        K
    75        D
    76        E
    77        E
    78        R
    79        H
    80        V
    81        G
    82        D
    83        L
    84        G
    85        N
    86        V
    87        K
    88        A
    89        D
    90        K
    91        N
    92        G
    93        V
    94        A
    95        I
    96        V
    97        D
    98        I
    99        V
    100       D
    101       P
    102       L
    103       I
    104       S
    105       L
    106       S
    107       G
    108       E
    109       Y
    110       S
    111       I
    112       I
    113       G
    114       R
    115       T
    116       M
    117       V
    118       V
    119       H
    120       E
    121       K
    122       P
    123       D
    124       D
    125       L
    126       G
    127       R
    128       G
    129       G
    130       N
    131       E
    132       E
    133       S
    134       T
    135       K
    136       T
    137       G
    138       N
    139       A
    140       G
    141       G
    142       R
    143       L
    144       A
    145       C
    146       G
    147       V
    148       I
    149       G
    150       I
    151       A
    152       P
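A position table like the one above does not need to be typed by hand. The sketch below rebuilds it from the P09670 sequence as listed earlier (same 11-residue chunks), numbering from 1 with the initiator methionine included, which is the convention the table uses. The variable names are illustrative.

```python
# Sheep SODC (P09670), in the same 11-residue chunks as listed above.
chunks = [
    "MATKAVCVLKG", "DGPVQGTIRFE", "AKGDKVVVTGS", "ITGLTEGDHGF",
    "HVHQFGDNTQG", "CTSAGPHFNPL", "SKKHGGPKDEE", "RHVGDLGNVKA",
    "DKNGVAIVDIV", "DPLISLSGEYS", "IIGRTMVVHEK", "PDDLGRGGNEE",
    "STKTGNAGGRL", "ACGVIGIAP",
]
sheep_sodc = "".join(chunks)

# 1-based position table: (position, residue) pairs.
position_table = list(enumerate(sheep_sodc, start=1))

print("Position  Amino Acid")
for position, residue in position_table:
    print(f"{position:<10}{residue}")

# Look up the residues considered for mutagenesis.
for position in (71, 72, 76):
    print(position, sheep_sodc[position - 1])
```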

    Amino acids selected for mutation

    Position  WT  Candidate  Reason
    71        G   A          Reduce flexibility
    72        G   A          Flexible glycine region
    76        E   P          Potential structural disruption site

    Final mutant sequence with three changes

    Sequence Type: G71A/G72A + E76P Mutant
    Amino Acid Sequence: MATKAVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHAAPKDPERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP

    firstalphafoldserverrun
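The three substitutions can be applied and checked programmatically instead of by hand-editing the sequence. This is a minimal sketch: `apply_mutations` is a hypothetical helper, and positions are 1-based with the initiator methionine counted, matching the position table above.

```python
import re

# Wild-type sheep SODC (P09670), in 11-residue chunks as listed earlier.
wt_chunks = [
    "MATKAVCVLKG", "DGPVQGTIRFE", "AKGDKVVVTGS", "ITGLTEGDHGF",
    "HVHQFGDNTQG", "CTSAGPHFNPL", "SKKHGGPKDEE", "RHVGDLGNVKA",
    "DKNGVAIVDIV", "DPLISLSGEYS", "IIGRTMVVHEK", "PDDLGRGGNEE",
    "STKTGNAGGRL", "ACGVIGIAP",
]
wild_type = "".join(wt_chunks)


def apply_mutations(seq, mutations):
    """Apply substitutions written like 'G71A' (1-based positions)."""
    residues = list(seq)
    for name in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
        if match is None:
            raise ValueError(f"cannot parse mutation {name!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        # Refuse to mutate if the wild-type residue does not match.
        if residues[pos - 1] != wt:
            raise ValueError(f"{name}: found {residues[pos - 1]} at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)


mutant = apply_mutations(wild_type, ["G71A", "G72A", "E76P"])
changed = [i + 1 for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]
print(changed)  # [71, 72, 76]
print(mutant[66:77])  # SKKHAAPKDPE
```

Listing the changed positions afterwards is a cheap way to confirm that the mutant sequence submitted to the server differs from wild type only where intended.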

    Part 2: Evaluate Binders with AlphaFold3

    Navigate to the AlphaFold Server: [alphafoldserver](https://alphafoldserver.com/welcome)

    WTSheepSODC

    For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

    MUTSheepSODC

    Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

    Sequence Type: A4V Mutant
    Amino Acid Sequence: MATKVVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP

    A4V_Mutant

    In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.
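Submitting five target-plus-peptide jobs by hand is repetitive, so the jobs can be written as one JSON file for upload. The sketch below assumes the server's JSON job format as I understand it (a list of jobs with `name`, `modelSeeds`, `sequences`, `dialect` and `version` fields); treat those field names as an assumption and check them against the current AlphaFold Server documentation before uploading. The target is the human A4V sequence given earlier, and the function name `build_jobs` is hypothetical.

```python
import json

# Human SOD1 A4V sequence from the PepMLM section (initiator M included).
target = (
    "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPL"
    "SRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEEST"
    "KTGNAGSRLACGVIGIAQ"
)

# Four PepMLM peptides plus the known binder for comparison.
peptides = [
    "WVVVLVAGVVGE",
    "LTLVVAVGEVGE",
    "SVTEEVEDVDPV",
    "LPTVVVEGVDPE",
    "FLYRWLPSRRGG",
]


def build_jobs(target_seq, peptide_seqs):
    """One job per peptide: target and peptide as two separate protein chains."""
    jobs = []
    for pep in peptide_seqs:
        jobs.append({
            "name": f"SOD1_A4V_{pep}",
            "modelSeeds": [],  # assumed: an empty list lets the server choose
            "sequences": [
                {"proteinChain": {"sequence": target_seq, "count": 1}},
                {"proteinChain": {"sequence": pep, "count": 1}},
            ],
            "dialect": "alphafoldserver",  # assumed format identifier
            "version": 1,
        })
    return jobs


jobs = build_jobs(target, peptides)
print(json.dumps(jobs[0], indent=2))
```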

    Two mutation pathways were considered in this section. The A4V change resulted in a 0.1 ipTM drop. I can also comment on the mutant I engineered above, which introduces three residue changes (G71A/G72A + E76P). The predicted template modeling (pTM) score and the interface predicted template modeling (ipTM) score, both available in the AlphaFold Server visualization, are based on the template modeling (TM) score. The TM-score was originally proposed by Zhang & Skolnick, building on the Global Distance Test (GDT) and MaxSub. The scores are evaluated by statistical association, measured with a correlation coefficient, after adjusting for differences in protein size. An interesting observation: the methods of Abramson et al. (2024) do not report statistical tests of association, due to small sample sizes (small n). The paper describes pTM and ipTM as global ranking metrics that can increase the rate of disorder in the model. In addition, chain ranking can be performed with a variation of the pTM metric, and pLDDT can be averaged over the residues of interest.
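For intuition about what pTM and ipTM are predicting, here is a sketch of the underlying TM-score formula from Zhang & Skolnick: each aligned residue pair contributes 1 / (1 + (d / d0)^2), where d is the distance between the pair after superposition and d0 = 1.24 * (L - 15)^(1/3) - 1.8 is the length-dependent scale that adjusts for protein size. The full TM-score takes the maximum over superpositions; this simplified version assumes the distances for one fixed superposition are already given, and it is not how AlphaFold computes its predicted scores.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: number of residues in the target, used for normalization.
    """
    if target_length <= 18:
        # d0 is not positive for very short chains in this simple form.
        raise ValueError("this sketch needs target_length > 18")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect match over all 152 residues scores 1.0.
print(tm_score([0.0] * 152, 152))
# Residues that are far apart after superposition contribute almost nothing.
print(tm_score([0.0] * 76 + [50.0] * 76, 152))
```

Because the sum is divided by the target length and d0 grows with length, scores for proteins of different sizes remain comparable, which is the size adjustment mentioned above.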

    FirstChange_G71A FirstChange_G72A ThirdChange_E76P

    Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

    Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

    PeptiVerse

    Paste the peptide sequence.

    PeptiVersewt

    Paste the A4V mutant SOD1 sequence in the target field.

    Sequence Type: A4V Mutant
    Amino Acid Sequence: MATKVVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP

    PeptiVerseA4Vmutant

    Check the boxes:
    • Predicted binding affinity
    • Solubility
    • Hemolysis probability
    • Net charge (pH 7)
    • Molecular weight

    Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
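Two of the listed properties, net charge and molecular weight, can be sanity-checked by hand before trusting a predictor's output. The sketch below is a rough approximation and not PeptiVerse's method: net charge counts K and R as +1 and D and E as -1, ignoring histidine and the termini, and molecular weight sums average residue masses plus one water. The function names are illustrative.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def net_charge(peptide):
    """Approximate net charge at pH 7: +1 per K/R, -1 per D/E (His, termini ignored)."""
    positive = sum(peptide.count(aa) for aa in "KR")
    negative = sum(peptide.count(aa) for aa in "DE")
    return positive - negative


def molecular_weight(peptide):
    """Average molecular weight of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


peptides = [
    "WVVVLVAGVVGE", "LTLVVAVGEVGE", "SVTEEVEDVDPV",
    "LPTVVVEGVDPE", "FLYRWLPSRRGG",
]
for pep in peptides:
    print(f"{pep}\tcharge {net_charge(pep):+d}\tMW {molecular_weight(pep):.1f}")
```

On this rough count the PepMLM peptides above are all net negative, while the known binder FLYRWLPSRRGG carries three arginines and no acidic residues, a difference worth keeping in mind when comparing solubility and hemolysis predictions.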

    Choose one peptide you would advance and justify your decision briefly.

    The computational predictions can align. The AlphaFold Server ipTM ranking scores for the wild type and the mutant are both “significant” because they are larger than 0.80, but the wild-type ipTM is 0.01 larger than the mutant ipTM. As for the thermodynamics expressed in the PeptiVerse datasheets: solubility and penetrance increase in the WT as hydrophobicity declines, whereas the mutant has greater hydrophobicity and lower solubility and penetrance. The explanation for these thermodynamic differences is the exposure of hydrophobic residues to the solvent, which leads to more water cages forming around them in the mutant than in the wild type.

    Part 4: Generate Optimized Peptides with moPPIt

    Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

    Open the moPPIt Colab linked from the HuggingFace moPPIt model card. Make a copy and switch to a GPU runtime. In the notebook:

    moPPItA4V_mutant

    ~Make sure to switch De Novo to enter manual sequence

    • Paste your A4V mutant SOD1 sequence.
    • Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
    • Set peptide length to 12 amino acids.
    • Enable motif and affinity guidance (and solubility/hemolysis guidance if available).
    • Generate peptides.

    ~Set target for affinity to original WT: MATKAVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP

    After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

    The notebook doesn’t stay connected to the necessary server long enough to run.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

[] Task B - Boltz Document

https://docs.google.com/document/d/18Vd9TQL2FjpEU0QdlGCgHe1D0BDoMzcfPRiFEXQIAas/preview

Boltz Lab BRD4 Drug Discovery Platform Tutorial

Introduction

This exercise walks you through a real drug discovery workflow using the Boltz Lab platform - from predicting how known drugs bind a cancer target, all the way to running AI-generated molecule libraries and interpreting the results. You will work on BRD4 (Bromodomain-containing protein 4), an epigenetic reader protein and validated oncology target. BRD4 has been the subject of intense medicinal chemistry effort in recent years.

What you will learn
• How to use Boltz Lab to predict protein-ligand binding structures
• How to interpret Binding Confidence and Optimization Score metrics
• How to set up a virtual screening project in Boltz Lab
• How to compare known drugs and AI-generated molecules in a single workflow
• How to critically evaluate computational predictions from a drug discovery perspective

Background: BRD4 and BET Bromodomains

BRD4 is a member of the BET (Bromodomain and Extra-Terminal) family of epigenetic reader proteins. It recognises acetylated lysine residues on histone tails and recruits transcriptional machinery to gene promoters, driving expression of oncogenes including c-Myc. Dysregulated BRD4 activity is implicated in haematological malignancies, solid tumours, and inflammatory disease. This exercise inspects the example of JQ1 - the landmark BRD4 inhibitor reported by Filippakopoulos et al. in Nature 2010. The three compounds below capture a hit-to-candidate optimisation journey, including a deliberately instructive stereochemical twist.

Stage      Compound            SMILES
Hit        Stripped Back Core  CC1C2C(=C(SC=2NCCN=1)C)C
Lead       Triazole + Acid     O=C(C[C@@H]1N=C(C)C2C(=C(SC=2N2C1=NN=C2C)C)C)O
Candidate  (+)-JQ1             O=C(C[C@H]1C2=NN=C(N2C3=C(C(C4=CC=C(C=C4)Cl)=N1)C(C)=C(S3)C)C)OC(C)(C)C

Note: Reference: Filippakopoulos P. et al. Selective inhibition of BET bromodomains. Nature 468, 1067-1073 (2010). Crystal structure PDB: 3MXF (BRD4 BD1 complexed with (+)-JQ1).

3MXF source

Tutorial designed by Geoffrey Smith
Boltz Lab | BRD4 Platform Tutorial — MIT Guest Lecture

Part 0: Sign-up to Boltz Lab

Go to lab.boltz.bio, click “Request Access”, add your name and email while specifying as organization name “HTGAA”, and click “Submit request”.

boltzrequest

We will try to make sure to approve your request within a day or two, giving you credits for both the exercise as well as further exploration. If you plan to use Boltz Lab for your final project and need more credits, please reach out to me at gabriele@boltz.bio.

Part 1: Structural Predictions in the Sandbox

Start with three Boltz-2 predictions in the Sandbox to understand how the model scores protein-ligand interactions across a real drug discovery progression.

1.1 The Boltz-2 Metrics Explained

Before you run your first prediction, understand these three key outputs:

Metric | Range | What it means | When to trust it
Binding Confidence | 0 - 1 | How confidently Boltz-2 places the ligand in the binding site. Higher = predicted more likely to bind. | 0.7 considered reliable; > 0.8 high confidence
Optimization Score | 0 - 1 | A relative affinity for use in congeneric series, or between known binders. Higher = predicted to bind more tightly. | Use for relative ranking
Structure Confidence | 0 - 1 | Measures the confidence of the predicted structure. Higher = more likely the structure is predicted correctly. | 0.8 considered high confidence

You need all three to be high to trust a prediction.

1.2 Running Your Three Predictions

Navigate to the Boltz Sandbox at lab.boltz.bio and log in to your account.

  1. Go to Sandbox → New Prediction
  2. Name this BRD4 binder JQ1
  3. Select ‘Complex’, add ‘Sequence from RCSB’, and add 3MXF
  4. Continue through Constraints (not needed for this example), and select JQ1 as the Binder for an affinity prediction.
  5. Submit the prediction.
  6. Use the ‘Duplicate Prediction’ in the results review, and remove the small molecule.
  7. Add in the SMILES for the Hit and Lead.
  8. When predictions complete, record your results in the table below.

whichsmallmoleculeremoveandhow

Compound  Binding Confidence  Optimization Score  Structure Confidence
Hit
Lead
JQ1

Discussion Questions
• Does Binding Confidence increase as you move from hit to clinical candidate? What would you expect, and why might it deviate?
• Inspect the predicted binding pose for JQ1. Can you identify potential key binding interactions?
• Compare the Optimization Scores. How do the scores compare for JQ1 vs the Lead?

Part 2: Setting Up a BRD4 Design Project

Now you will create a small molecule Design Project - the Boltz Lab workflow for virtual screening and lead optimisation. We will set up BRD4 as a target using the clinical candidate as our structural reference.

2.1 Creating the Target

  1. From the dashboard, create a Design Project via ‘New Project’
  2. Name your project: ‘BRD4 Workshop’
  3. Select ‘Small Molecule’
  4. Click Add Target and add the protein structure as in the Sandbox using PDB code 3MXF
  5. Continue and let the apo structure complete. Continue if the structure looks good.
  6. Leave binding residue selection blank; the platform will auto-detect the pocket
  7. In the Molecular Probe field, paste the JQ1 SMILES.
  8. Predict Pocket Structure and complete the Target Set-Up

Note: Why no binding residue selection? Boltz Lab uses the probe SMILES to identify the relevant binding pocket automatically.

What the Probe Does

The probe compound defines the active site geometry for the target. Boltz-2 uses the cofolded probe structure as an internal reference when scoring your library compounds. This is equivalent to providing a crystallographic template in traditional docking - except the model generates the structure on the fly.

Part 3: Running Your Virtual Screen

BRD4 is a well-validated target, and therefore we will generate a small library of 1K small molecule binders. For typical exploratory targets, Boltz recommends 20K as a minimum number of binders.

3.1: Run a Generative Design Campaign

We will utilize the Boltz Lab small-molecule generative workflow. This generates novel molecules optimised for BRD4 binding using Boltz-2 as the scoring function.

  1. After creating the design project, Boltz Lab will prompt you to Generate binders with AI.
  2. Name your experiment, provide a relevant hypothesis, and Create the Experiment.
  3. The New Virtual Screen will be pre-configured with a Generative screen using the Enamine REAL space.
  4. Keep ‘Normal Filtering’ selected. This will ensure we only generate molecules acceptable to a medicinal chemist.
  5. Decide if you would like to apply any Molecule Filters. We recommend the ‘Drug-Like’ Preset.
  6. Select a custom number of Binders and enter 1K.
  7. Start the Virtual Screen.
  8. Allow binders to be generated, and View Results in Experiment

Note: 1k molecules is a very small screen; for real applications where you plan to synthesize the molecule (e.g. your final project) we would recommend running at least 10-20k molecules.

Part 4: Analysis and Discussion

As your experiment completes, use the ‘Quick Add Candidates’ on the experiment screen to add JQ1 as a benchmark for generated designs.

4.1 Interpreting Your Results

From your screen output, identify three categories of molecules:

Category | Criteria | Likely interpretation
High confidence binders | Binding Confidence > 0.80, Opt. Score > 0.40 | Strong predicted hits - inspect poses carefully
Moderate confidence | Binding Confidence 0.65–0.80, Opt. Score 0.25–0.40 | Plausible binders - additional validation needed
Low confidence / non-binders | Binding Confidence < 0.65, Opt. Score < 0.25 | Likely incorrect pose or non binding chemotype
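The three categories can be applied to an exported results table with a small helper. This is a sketch using only the thresholds stated in the tutorial. The tutorial does not say how to treat a molecule whose two metrics fall in different bands, so the rule used here (take the lower of the two bands) is my own assumption, as are the function names.

```python
LABELS = {
    2: "High confidence binder",
    1: "Moderate confidence",
    0: "Low confidence / non-binder",
}


def band(value, high, low):
    """Return 2 above `high`, 1 between `low` and `high` inclusive, 0 below `low`."""
    if value > high:
        return 2
    if value >= low:
        return 1
    return 0


def classify(binding_confidence, opt_score):
    """Categorize a molecule; mixed cases fall to the lower band (assumption)."""
    level = min(
        band(binding_confidence, high=0.80, low=0.65),
        band(opt_score, high=0.40, low=0.25),
    )
    return LABELS[level]


print(classify(0.85, 0.45))  # High confidence binder
print(classify(0.70, 0.30))  # Moderate confidence
print(classify(0.90, 0.10))  # Low confidence / non-binder
```

Taking the lower band is the conservative choice: a confidently placed ligand with a poor Optimization Score is not promoted to a strong hit on pose confidence alone.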

Discussion: As the virtual screen completes, assess the following:
• How does JQ1 score in the Design Project screen alongside the library? Does it score as the top compound?
• How do the top scoring binders compare in binding pose to JQ1?
• Try adding a second target to your project via the dropdown in the structure viewer, for example, BRD2 (PDB: 5UEN). Re-run the top scoring binders against BRD2 and compare which compounds score highly for BRD4 but not BRD2. This is a selectivity analysis - a key part of real BET inhibitor programs.

Resources and Further Reading

Resource | Link / Reference
Boltz Lab Platform | docs.boltz.bio
Key BRD4 Paper | Filippakopoulos P. et al. Nature 468, 1067–1073 (2010)
JQ1 PDB Structure | rcsb.org/structure/3MXF

Tutorial designed by Geoffrey Smith

Part C: Final Project: L-Protein Mutants

This homework requires computation that might take you a while to run, so please get started early.

Tools

See HTGAA Protein Engineering Tools spreadsheet