Week 5 HW: Protein Design Part ii

[] Homework — DUE BY START OF MAR 10 LECTURE

Part A: SOD1 Binder Peptide Design (From Pranam)

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mechanis

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

[] Task A

Your challenge:

Background: Design short peptides that bind mutant SOD1 and then decide which ones are worth advancing toward therapy. You will use three models developed in our lab:
PepMLM: target sequence-conditioned peptide generation via masked language modeling.
PeptiVerse: therapeutic property prediction.
moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

                  Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

🟢🟤🟡P00441

Here is the fully translated superoxide dismutase protein, P00441 in UniProt, with the initiator methionine included. We need to cleave that M off before applying the requested mutation so that we work with the mature enzyme numbering.

So not this… 1 2 3 4 M A T K

But this.. 1 2 3 4 A T K A

To create our A4V SOD (love the rhyme) mutant… 1 2 3 4 A T K V

The savvy student who fails to cleave the first methionine (M) can intuit the actual amino acid to change without thinking through any of the previous steps, but it’s nice to have a why in all things, since this is biology after all and we have evolution and ChatGPT. Please note that we do not want to use a protein sequence with any sort of truncation or line wrapping, so here are my sequences for PepMLM-650M.
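The numbering shift described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the PepMLM notebook: the helper name `apply_point_mutation` and the `has_initiator_met` flag are my own assumptions, and only the first 12 residues of the precursor are used.

```python
def apply_point_mutation(seq: str, mutation: str, has_initiator_met: bool = True) -> str:
    """Apply a mutation such as 'A4V' written in mature-protein numbering.

    If `seq` still carries the initiator methionine, mature position n
    sits at 0-based index n of the precursor (one place further along).
    """
    wt, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = pos if has_initiator_met else pos - 1  # 0-based index into seq
    if seq[index] != wt:
        raise ValueError(f"expected {wt} at mature position {pos}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


precursor_start = "MATKAVCVLKGD"  # first 12 residues, initiator Met included
print(apply_point_mutation(precursor_start, "A4V"))
```

Passing the precursor with `has_initiator_met=False` raises an error, because position 4 counted from the methionine is K, not A. That is the check that makes the "why" explicit.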

                  Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:

🤗pepmlm650mlink

                  Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.


Mutant A4V SOD for PepMLM-650M. There are two input options: the full protein sequence, and a 12-residue input, which I settled on in later runs.
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

colabcode

MATKVVCVLKGD

Within the PepMLM-650M Colab notebook, there are sliders and input fields to parameterize individual runs. However, the values I entered didn’t seem to take effect, so I finally hard-coded the changes, as shown below in a series of excerpts pulled from the code.

single_sequence = True #@param {type:"boolean"}
protein_seq = "MATKVVCVLKGD" #@param {type:"string"}

# Initial value for num_binders
num_binders = 4

# Initial values for top_k and peptide_length
top_k = 3
peptide_length = 12

code_constrained_step

Initial_4in1_SequenceSet

Binder	Pseudo_Perplexity_Score
WVVVLVAGVVGE	35.014933
LTLVVAVGEVGE	25.582245
SVTEEVEDVDPV	21.336863
LPTVVVEGVDPE	17.079494
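To read the scores above, it helps to recall how a pseudo-perplexity is usually defined: each peptide position is masked in turn, the model's log-probability for the true residue is collected, and the score is the exponential of the mean negative log-probability. The sketch below assumes that definition (the function name and its input list are hypothetical, and no model is called) and then ranks the four recorded binders, with lower scores meaning higher confidence.

```python
import math


def pseudo_perplexity(masked_log_probs):
    """exp(mean negative log-probability) over per-position predictions.

    Each entry is assumed to be the natural-log probability the model
    assigned to the true residue when that single position was masked.
    """
    return math.exp(-sum(masked_log_probs) / len(masked_log_probs))


# Scores copied from the table above; lower = more confident.
recorded = {
    "WVVVLVAGVVGE": 35.014933,
    "LTLVVAVGEVGE": 25.582245,
    "SVTEEVEDVDPV": 21.336863,
    "LPTVVVEGVDPE": 17.079494,
}

ranked = sorted(recorded, key=recorded.get)
for rank, peptide in enumerate(ranked, start=1):
    print(rank, peptide, recorded[peptide])
```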

To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

Scores above, I think. I feel like I’m in Gulliver’s Travels, returning to this exercise cold after 2 months, so I am going to try to piece my memory together using a peer’s homework. For the next phase I am going to find the amino acid sequence for my SOD1 target. I’m not copying information: I went to the UniProt site myself and searched for SOD1 on the splash page. Scrolling down the first page of results I found human SODC, but I skipped over that one and chose the SOD for sheep.

P09670 · SODC_SHEEP
MATKAVCVLKG
DGPVQGTIRFE
AKGDKVVVTGS
ITGLTEGDHGF
HVHQFGDNTQG
CTSAGPHFNPL
SKKHGGPKDEE
RHVGDLGNVKA
DKNGVAIVDIV
DPLISLSGEYS
IIGRTMVVHEK
PDDLGRGGNEE
STKTGNAGGRL
ACGVIGIAP

Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Given my confusion, the perplexity scores are likely very high (for pseudo-perplexity, lower values indicate higher PepMLM confidence). The structural model confidence shown on UniProt is spot on, though: it is very high (pLDDT > 90). pLDDT is a per-residue confidence score between 0 and 100 generated by AlphaFold. Now I am prepared to transition to Part 2.
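For reference, the pLDDT bands used in the AlphaFold Database colouring can be written as a small lookup. This is a sketch only: the band names follow the database legend as I recall it, and the handling of values that fall exactly on a boundary is my own assumption.

```python
def plddt_band(plddt: float) -> str:
    """Map a per-residue pLDDT (0-100) to a confidence band."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError("pLDDT must lie between 0 and 100")
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


for value in (95, 80, 60, 30):
    print(value, plddt_band(value))
```

Note that this score describes confidence in the predicted structure of the target, which is a separate question from how confident PepMLM is in a generated binder.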

What about my mutants, though? I do not want to disrupt folding at random; I need an appropriate target region to guide the mutation logic, so I will leverage the MobiDB website.

Enumerated amino acids by position, used to select sites for subsequent mutagenesis based on segment flexibility, disruptability, and functional consequences

Position	Amino Acid
1	M
2	A
3	T
4	K
5	A
6	V
7	C
8	V
9	L
10	K
11	G
12	D
13	G
14	P
15	V
16	Q
17	G
18	T
19	I
20	R
21	F
22	E
23	A
24	K
25	G
26	D
27	K
28	V
29	V
30	V
31	T
32	G
33	S
34	I
35	T
36	G
37	L
38	T
39	E
40	G
41	D
42	H
43	G
44	F
45	H
46	V
47	H
48	Q
49	F
50	G
51	D
52	N
53	T
54	Q
55	G
56	C
57	T
58	S
59	A
60	G
61	P
62	H
63	F
64	N
65	P
66	L
67	S
68	K
69	K
70	H
71	G
72	G
73	P
74	K
75	D
76	E
77	E
78	R
79	H
80	V
81	G
82	D
83	L
84	G
85	N
86	V
87	K
88	A
89	D
90	K
91	N
92	G
93	V
94	A
95	I
96	V
97	D
98	I
99	V
100	D
101	P
102	L
103	I
104	S
105	L
106	S
107	G
108	E
109	Y
110	S
111	I
112	I
113	G
114	R
115	T
116	M
117	V
118	V
119	H
120	E
121	K
122	P
123	D
124	D
125	L
126	G
127	R
128	G
129	G
130	N
131	E
132	E
133	S
134	T
135	K
136	T
137	G
138	N
139	A
140	G
141	G
142	R
143	L
144	A
145	C
146	G
147	V
148	I
149	G
150	I
151	A
152	P

Amino acids selected for mutation

Position	WT	Candidate	Reason
71	G	A	Reduce flexibility
72	G	A	Flexible glycine region
76	E	P	Potential structural disruption site

Final mutant sequence with three changes

Sequence Type	Amino Acid Sequence
G71A/G72A + E76P Mutant	MATKAVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHAAPKDPERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP
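A quick way to double-check a hand-edited mutant is to apply the substitutions programmatically and list every position that differs from the wild type. The sketch below uses the sheep sequence (P09670) as listed earlier and counts the initiator methionine as position 1, matching the position table above; the helper name `apply_mutations` is my own.

```python
import re

# Sheep SOD1 (P09670) as listed above, joined from its 11-residue chunks.
SHEEP_SOD1 = "".join([
    "MATKAVCVLKG", "DGPVQGTIRFE", "AKGDKVVVTGS", "ITGLTEGDHGF",
    "HVHQFGDNTQG", "CTSAGPHFNPL", "SKKHGGPKDEE", "RHVGDLGNVKA",
    "DKNGVAIVDIV", "DPLISLSGEYS", "IIGRTMVVHEK", "PDDLGRGGNEE",
    "STKTGNAGGRL", "ACGVIGIAP",
])


def apply_mutations(seq, mutations):
    """Apply substitutions like 'G71A' (1-based, Met counted as 1)."""
    residues = list(seq)
    for mutation in mutations:
        wt, pos, new = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation).groups()
        index = int(pos) - 1
        if residues[index] != wt:
            raise ValueError(f"{mutation}: found {residues[index]} at position {pos}")
        residues[index] = new
    return "".join(residues)


mutant = apply_mutations(SHEEP_SOD1, ["G71A", "G72A", "E76P"])
changed = [(i + 1, a, b) for i, (a, b) in enumerate(zip(SHEEP_SOD1, mutant)) if a != b]
print(changed)
```

The wild-type check inside the loop catches off-by-one numbering mistakes before they reach a structure prediction run.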

Part 2: Evaluate Binders with AlphaFold3

Navigate to the AlphaFold Server: [alphafoldserver](https://alphafoldserver.com/welcome). For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

Sequence Type	Amino Acid Sequence
A4V Mutant	MATKVVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP
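Submitting five complexes by hand is repetitive, so one option is to prepare the jobs as a JSON file and upload them in one go. The sketch below builds one two-chain job per peptide. The field names (`proteinChain`, `modelSeeds`, `dialect`, `version`) follow my reading of the AlphaFold Server JSON upload format and are an assumption to verify against the server documentation before use; the job names are made up for illustration.

```python
import json

# A4V mutant sequence as recorded above.
TARGET = (
    "MATKVVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPL"
    "SKKHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEE"
    "STKTGNAGGRLACGVIGIAP"
)

# Four PepMLM peptides plus the known binder for comparison.
PEPTIDES = ["WVVVLVAGVVGE", "LTLVVAVGEVGE", "SVTEEVEDVDPV", "LPTVVVEGVDPE", "FLYRWLPSRRGG"]


def build_job(target, peptide):
    """One job: the target and the peptide as separate protein chains."""
    return {
        "name": f"SOD1_A4V_{peptide}",  # hypothetical naming scheme
        "modelSeeds": [],               # empty list: let the server pick a seed
        "sequences": [
            {"proteinChain": {"sequence": target, "count": 1}},
            {"proteinChain": {"sequence": peptide, "count": 1}},
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }


jobs = [build_job(TARGET, peptide) for peptide in PEPTIDES]
print(json.dumps(jobs[0], indent=2))
```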

In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

Two mutation pathways were considered in this section. The A4V change resulted in a 0.1 drop in ipTM. I can also comment on the mutation I engineered above, which introduces a three-residue change (G71A/G72A + E76P). The predicted template modeling (pTM) score and the interface predicted template modeling (ipTM) score are both derived from the template modeling (TM) score, and all of these metrics are available in the AlphaFold Server visualization. The TM-score was originally proposed by Zhang and Skolnick, building on the Global Distance Test (GDT) and MaxSub. The scores are evaluated using a statistical association, measured by a correlation coefficient, after adjusting for differences in protein size. An interesting observation: in the methods of Abramson et al. (2024), the authors do not report statistical tests of association because of small sample sizes. The paper describes pTM and ipTM as global ranking variables that can increase rates of disorder in the model. In addition, chain ranking can be performed with a variation of the pTM metric, and pLDDT can be averaged over putative residues.
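Since pTM and ipTM are built on the TM-score, a small worked version of the underlying formula may help. The sketch below scores a fixed alignment using the Zhang and Skolnick length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. It is a simplification: the full TM-score maximises over superpositions, and pTM and ipTM are computed from predicted errors rather than measured distances. The function names and the lower bound of 0.5 on d0 are my own assumptions.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale (angstroms) used by the TM-score."""
    if target_length <= 15:
        raise ValueError("the d0 formula needs a target longer than 15 residues")
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score of one fixed alignment.

    distances[i] is the distance in angstroms between the i-th pair of
    aligned residues; unaligned target residues simply contribute zero.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


length = 152  # length of the SOD1 sequences used in this write-up
print(round(tm_d0(length), 2))
print(tm_score([0.0] * length, length))
```

Normalising by the target length is what makes scores comparable across proteins of different sizes: a perfect match scores 1, and every residue pair sitting exactly d0 apart scores 0.5.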

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

Paste the peptide sequence. Paste the A4V mutant SOD1 sequence in the target field.

Sequence Type	Amino Acid Sequence
A4V Mutant	MATKVVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP

Check the boxes:
Predicted binding affinity
Solubility
Hemolysis probability
Net charge (pH 7)
Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

There can be alignment between the computational prediction scores. The ipTM ranking scores from AlphaFold Server are both “significant” because they are larger than 0.80, but the wild-type ipTM score is 0.01 larger than the mutant ipTM. Regarding the thermodynamics expressed in the PeptiVerse datasheets: solubility and penetrance increase in the WT as hydrophobicity declines, whereas the mutant has greater hydrophobicity and lower solubility and penetrance. The explanation for the thermodynamic differences between the wild type and the mutants is the exposure of hydrophobic residues to the solution, which leads to more water cages forming around them in the mutant than in the wild type.
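Before trusting any predictor, a back-of-the-envelope check of charge and hydrophobicity can flag obvious outliers. The sketch below is deliberately crude and is not how PeptiVerse computes its values: net charge is estimated by counting K and R as +1 and D and E as -1 (histidine and the termini are ignored), and the set of residues treated as hydrophobic is my own assumption.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set, for illustration only


def crude_net_charge(peptide: str) -> int:
    """Rough net charge near pH 7: (K + R) minus (D + E)."""
    positive = sum(peptide.count(r) for r in "KR")
    negative = sum(peptide.count(r) for r in "DE")
    return positive - negative


def hydrophobic_fraction(peptide: str) -> float:
    """Fraction of residues that fall in the assumed hydrophobic set."""
    return sum(r in HYDROPHOBIC for r in peptide) / len(peptide)


peptides = ["WVVVLVAGVVGE", "LTLVVAVGEVGE", "SVTEEVEDVDPV", "LPTVVVEGVDPE", "FLYRWLPSRRGG"]
for peptide in peptides:
    print(peptide, crude_net_charge(peptide), round(hydrophobic_fraction(peptide), 2))
```

On this rough measure the four PepMLM peptides are all net negative while the known binder is net positive, which is worth keeping in mind when comparing solubility and hemolysis predictions.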

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

Open the moPPIt Colab linked from the HuggingFace moPPIt model card. Make a copy and switch to a GPU runtime. In the notebook:
~Make sure to switch De Novo to enter manual sequence

Paste your A4V mutant SOD1 sequence. Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch). Set peptide length to 12 amino acids. Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
~Set target for affinity to original WT:
MATKAVCVLKGDGPVQGTIRFEAKGDKVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSKKHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTGNAGGRLACGVIGIAP

After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

The notebook doesn’t stay connected to the necessary server long enough to run.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

[] Task B - Boltz Document

https://docs.google.com/document/d/18Vd9TQL2FjpEU0QdlGCgHe1D0BDoMzcfPRiFEXQIAas/preview

Boltz Lab BRD4 Drug Discovery Platform Tutorial

Introduction

This exercise walks you through a real drug discovery workflow using the Boltz Lab platform - from predicting how known drugs bind a cancer target, all the way to running AI-generated molecule libraries and interpreting the results. You will work on BRD4 (Bromodomain-containing protein 4), an epigenetic reader protein and validated oncology target. BRD4 has been the subject of intense medicinal chemistry effort in recent years.

What you will learn
• How to use Boltz Lab to predict protein-ligand binding structures
• How to interpret Binding Confidence and Optimization Score metrics
• How to set up a virtual screening project in Boltz Lab
• How to compare known drugs and AI-generated molecules in a single workflow
• How to critically evaluate computational predictions from a drug discovery perspective

Background: BRD4 and BET Bromodomains

BRD4 is a member of the BET (Bromodomain and Extra-Terminal) family of epigenetic reader proteins. It recognises acetylated lysine residues on histone tails and recruits transcriptional machinery to gene promoters, driving expression of oncogenes including c-Myc. Dysregulated BRD4 activity is implicated in haematological malignancies, solid tumours, and inflammatory disease. This exercise inspects the example of JQ1 - the landmark BRD4 inhibitor reported by Filippakopoulos et al. in Nature 2010. The three compounds below capture a hit-to-candidate optimisation journey, including a deliberately instructive stereochemical twist.

Stage	Compound	SMILES
Hit	Stripped Back Core	CC1C2C(=C(SC=2NCCN=1)C)C
Lead	Triazole + Acid	O=C(C[C@@H]1N=C(C)C2C(=C(SC=2N2C1=NN=C2C)C)C)O
Candidate	(+)-JQ1	O=C(C[C@H]1C2=NN=C(N2C3=C(C(C4=CC=C(C=C4)Cl)=N1)C(C)=C(S3)C)C)OC(C)(C)C

Note: Reference: Filippakopoulos P. et al. Selective inhibition of BET bromodomains. Nature 468, 1067-1073 (2010). Crystal structure PDB: 3MXF (BRD4 BD1 complexed with (+)-JQ1).

Tutorial designed by Geoffrey Smith. Boltz Lab | BRD4 Platform Tutorial - MIT Guest Lecture

Part 0: Sign-up to Boltz Lab

Go to lab.boltz.bio, click “Request Access”, add your name and email while specifying as organization name “HTGAA”, and click “Submit request”.
We will try to make sure to approve your request within a day or two, giving you credits for both the exercise as well as further exploration. If you plan to use Boltz Lab for your final project and need more credits, please reach out to me at gabriele@boltz.bio.

Part 1: Structural Predictions in the Sandbox

Start with three Boltz-2 predictions in the Sandbox to understand how the model scores protein-ligand interactions across a real drug discovery progression.

1.1 The Boltz-2 Metrics Explained

Before you run your first prediction, understand these three key outputs:

Metric	Range	What it means	When to trust it
Binding Confidence	0 - 1	How confidently Boltz-2 places the ligand in the binding site. Higher = predicted more likely to bind.	> 0.7 considered reliable; > 0.8 high confidence
Optimization Score	0 - 1	A relative affinity for use in congeneric series, or between known binders. Higher = predicted to bind more tightly.	Use for relative ranking
Structure Confidence	0 - 1	Measures the confidence of the predicted structure. Higher = more likely the structure is predicted correctly.	> 0.8 considered high confidence

You need all three to be high to trust a prediction.

1.2 Running Your Three Predictions

Navigate to the Boltz Sandbox at lab.boltz.bio and log in to your account.

1. Go to Sandbox → New Prediction
2. Name this BRD4 binder JQ1
3. Select ‘Complex’, add ‘Sequence from RCSB’, and add 3MXF
4. Continue through Constraints (not needed for this example), and select JQ1 as the Binder for an affinity prediction.
5. Submit the prediction.
6. Use the ‘Duplicate Prediction’ in the results review, and remove the small molecule.
7. Add in the SMILES for the Hit and Lead.
8. When predictions complete, record your results in the table below.

Compound	Binding Confidence	Optimization Score	Structure Confidence
Hit
Lead
JQ1

Discussion Questions
• Does Binding Confidence increase as you move from hit to clinical candidate? What would you expect, and why might it deviate?
• Inspect the predicted binding pose for JQ1. Can you identify potential key binding interactions?
• Compare the Optimization Scores. How do the scores compare for JQ1 vs the Lead?

Part 2: Setting Up a BRD4 Design Project

Now you will create a small molecule Design Project - the Boltz Lab workflow for virtual screening and lead optimisation. We will set up BRD4 as a target using the clinical candidate as our structural reference.

2.1 Creating the Target

1. From the dashboard, create a Design Project via ‘New Project’
2. Name your project: ‘BRD4 Workshop’
3. Select ‘Small Molecule’
4. Click Add Target and add the protein structure as in the Sandbox using PDB code 3MXF
5. Continue and let the apo structure complete. Continue if the structure looks good.
6. Leave binding residue selection blank; the platform will auto-detect the pocket
7. In the Molecular Probe field, paste the JQ1 SMILES.
8. Predict Pocket Structure and complete the Target Set-Up

Note: Why no binding residue selection? Boltz Lab uses the probe SMILES to identify the relevant binding pocket automatically.

What the Probe Does

The probe compound defines the active site geometry for the target. Boltz-2 uses the cofolded probe structure as an internal reference when scoring your library compounds. This is equivalent to providing a crystallographic template in traditional docking - except the model generates the structure on the fly.

Part 3: Running Your Virtual Screen

BRD4 is a well-validated target, and therefore we will generate a small library of 1K small molecule binders. For typical exploratory targets, Boltz recommends 20K as a minimum number of binders.

3.1: Run a Generative Design Campaign

We will utilize the Boltz Lab small-molecule generative workflow. This generates novel molecules optimised for BRD4 binding using Boltz-2 as the scoring function.

1. After creating the design project, Boltz Lab will prompt you to Generate binders with AI.
2. Name your experiment, provide a relevant hypothesis, and Create the Experiment.
3. The New Virtual Screen will be pre-configured with a Generative screen using the Enamine REAL space.
4. Keep ‘Normal Filtering’ selected. This will ensure we only generate molecules acceptable to a medicinal chemist.
5. Decide if you would like to apply any Molecule Filters. We recommend the ‘Drug-Like’ Preset.
6. Select a custom number of Binders and enter 1K.
7. Start the Virtual Screen.
8. Allow binders to be generated, and View Results in Experiment

Note: 1k molecules is a very small screen; for real applications where you plan to synthesize the molecule (e.g. your final project) we would recommend running at least 10-20k molecules.

Part 4: Analysis and Discussion

4.1 Interpreting Your Results

As your experiment completes, use the ‘Quick Add Candidates’ on the experiment screen to add JQ1 as a benchmark for generated designs. From your screen output, identify three categories of molecules:

Category	Criteria	Likely interpretation
High confidence binders	Binding Confidence > 0.80; Opt. Score > 0.40	Strong predicted hits - inspect poses carefully
Moderate confidence	Binding Confidence 0.65–0.80; Opt. Score 0.25–0.40	Plausible binders - additional validation needed
Low confidence / non-binders	Binding Confidence < 0.65; Opt. Score < 0.25	Likely incorrect pose or non-binding chemotype
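The three categories above can be applied to an exported results table with a small helper. This is a sketch under assumptions: the thresholds are taken from the criteria in the tutorial, both criteria are required to hold together, and the "mixed signals" fallback for molecules that fit no single category is my own addition rather than anything defined by Boltz Lab.

```python
def classify_binder(binding_confidence: float, optimization_score: float) -> str:
    """Bucket a molecule using the tutorial's category thresholds."""
    if binding_confidence > 0.80 and optimization_score > 0.40:
        return "high confidence"
    if 0.65 <= binding_confidence <= 0.80 and 0.25 <= optimization_score <= 0.40:
        return "moderate confidence"
    if binding_confidence < 0.65 and optimization_score < 0.25:
        return "low confidence / non-binder"
    # The two metrics disagree (for example, a confident pose with weak
    # predicted affinity); inspect these by hand.
    return "mixed signals"


# Illustrative inputs only, not results from a real screen.
for scores in [(0.85, 0.50), (0.70, 0.30), (0.50, 0.10), (0.90, 0.10)]:
    print(scores, classify_binder(*scores))
```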

Discussion: As the virtual screen completes, assess the following:
• How does JQ1 in the Design Project screen alongside the library? Does it score as the top compound?
• How do the top scoring binders compare in binding pose to JQ1?
• Try adding a second target to your project via the dropdown in the structure viewer, for example, BRD2 (PDB: 5UEN). Re-run the top scoring binders against BRD2 and compare which compounds score highly for BRD4 but not BRD2. This is a selectivity analysis - a key part of real BET inhibitor programs.

Resources and Further Reading

Resource	Link / Reference
Boltz Lab Platform	docs.boltz.bio
Key BRD4 Paper	Filippakopoulos P. et al. Nature 468, 1067–1073 (2010)
JQ1 PDB Structure	rcsb.org/structure/3MXF
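The selectivity comparison suggested in the discussion can be sketched as a simple filter over paired scores. Everything here is hypothetical: the compound names and numbers are placeholders rather than screen results, and reusing the 0.80 and 0.65 Binding Confidence cut-offs from the category table as "high" and "low" is my own assumption.

```python
def selective_for_brd4(scores, high=0.80, low=0.65):
    """Return compounds scoring above `high` on BRD4 and below `low` on BRD2.

    `scores` maps a compound name to a (BRD4, BRD2) pair of Binding
    Confidence values.
    """
    return [
        name
        for name, (brd4, brd2) in scores.items()
        if brd4 > high and brd2 < low
    ]


# Placeholder values for illustration only; replace with your own results.
example_scores = {
    "compound_A": (0.88, 0.52),
    "compound_B": (0.91, 0.86),
    "compound_C": (0.60, 0.40),
}
print(selective_for_brd4(example_scores))
```

A compound like the placeholder `compound_B`, which scores highly on both targets, would look like a pan-BET binder rather than a BRD4-selective one.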


Part C: Final Project: L-Protein Mutants

This homework requires computation that might take you a while to run, so please get started early.

Tools

See HTGAA Protein Engineering Tools spreadsheet