Week 5 HW: Protein Design Part 2

Homework — DUE BY START OF MAR 10 LECTURE

Part A: SOD1 Binder Peptide Design (From Pranam)

Introduction

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.
Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
Generate four 12-amino-acid peptides conditioned on the mutant SOD1 sequence.
Add the known SOD1-binding peptide, FLYRWLPSRRGG, to the generated list for comparison.
Record the perplexity scores, which indicate PepMLM’s confidence in each peptide (lower perplexity means higher confidence).
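
To make the recorded score easier to interpret, here is a minimal sketch of how a pseudo-perplexity can be computed from per-residue log-probabilities. It assumes the Colab reports the exponential of the mean negative log-likelihood over the peptide positions; the function name `pseudo_perplexity` and the toy inputs are illustrative and are not taken from the PepMLM code.

```python
import math

def pseudo_perplexity(token_log_probs):
    """Exponential of the mean negative log-probability over peptide positions.

    token_log_probs: natural-log probabilities the model assigned to each
    peptide residue (hypothetical input, one value per position).
    """
    if not token_log_probs:
        raise ValueError("need at least one position")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Toy check: a model that is 50% sure of every residue in a 12-mer.
toy = [math.log(0.5)] * 12
print(round(pseudo_perplexity(toy), 3))
```

Under this definition, a model that guesses uniformly among the 20 amino acids scores 20, so values well below 20 indicate that the model finds the peptide plausible for the target.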

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHE
FGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSI
EDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVI
GIAQ
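
The mutant sequence can be built from the record above with a small helper. This is a sketch, not part of the course materials: the helper name `substitute` is hypothetical. It also assumes that "A4V" follows the conventional SOD1 numbering, which omits the initiator methionine, so the alanine in question sits at position 5 of the UniProt sequence.

```python
# Wild-type human SOD1 (UniProt P00441), copied from the FASTA record above.
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHE"
    "FGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSI"
    "EDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVI"
    "GIAQ"
)

def substitute(seq, position, wt_aa, mut_aa):
    """Return seq with the residue at a 1-based position changed to mut_aa.

    Raises ValueError if the residue found there is not wt_aa, which catches
    off-by-one numbering mistakes.
    """
    idx = position - 1
    if seq[idx] != wt_aa:
        raise ValueError(f"expected {wt_aa} at {position}, found {seq[idx]}")
    return seq[:idx] + mut_aa + seq[idx + 1:]

# Assumption: Ala4 in Met-less numbering is position 5 in the UniProt sequence.
SOD1_A4V = substitute(SOD1_WT, 5, "A", "V")
print(SOD1_A4V[:10])  # MATKVVCVLK
```

The wild-type check is the reason for passing `wt_aa` explicitly: using position 4 on the UniProt sequence would hit a lysine and fail loudly instead of silently producing the wrong mutant.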

Part 2: Evaluate Binders with AlphaFold3

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Part 4: Generate Optimized Peptides with moPPIt

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

Part C: Final Project: L-Protein Mutants

The proposed mutations in the Bacteriophage MS2 lysis protein are generally biochemically consistent and plausible for improving stability without compromising function. The Q→E and S→T changes are conservative substitutions between polar residues that could favor electrostatic interactions or slightly increase local stability. The hydrophobic L→I and V→I mutations are particularly reasonable within the transmembrane domain, as they maintain the hydrophobicity necessary for membrane insertion and may improve helix packing. The F→Y substitution preserves aromaticity but introduces a hydroxyl group that could facilitate interactions at the membrane-water interface. Overall, these mutations do not significantly alter the length or overall hydrophobicity of the transmembrane segment. This matters because the L protein exerts its function by interfering with the lipid II flippase MurJ in Escherichia coli. The proposed changes are therefore compatible with maintaining activity while potentially improving folding or structural stability.

Variants with more “aggressive” mutations (increasing hydrophobicity or propensity for helix formation in the transmembrane region):

  1. S36L (S -> L; increased hydrophobicity within the TM domain)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSLTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
  2. T66L (T -> L; reinforces the hydrophobic character of the helix)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRLVTTLQQLLT
  3. A45L (A -> L; greater transmembrane helix stability)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
  4. L48F (L -> F; introduces an aromatic residue into the TM domain)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFFSKFTNQLLLSLLEAVIRTVTTLQQLLT
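
As a sanity check on the four sequences above, the substituted positions can be recovered without the wild-type sequence by taking a column-wise majority vote. Each variant carries a single change at a different site, so the consensus equals the wild type. This is a sketch; the helper names `consensus` and `substitutions` are illustrative.

```python
from collections import Counter

VARIANTS = [
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSLTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRLVTTLQQLLT",
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFFSKFTNQLLLSLLEAVIRTVTTLQQLLT",
]

def consensus(seqs):
    """Most common residue at each aligned position (sequences of equal length)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def substitutions(ref, seq):
    """List differences as e.g. 'S36L' using 1-based positions."""
    return [f"{a}{i}{b}" for i, (a, b) in enumerate(zip(ref, seq), start=1) if a != b]

reference = consensus(VARIANTS)
for seq in VARIANTS:
    print(substitutions(reference, seq))
# ['S36L'], ['T66L'], ['A45L'], ['L48F']
```

Running a check like this before submitting sequences to a structure predictor guards against accidental extra edits introduced by copy and paste.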

These variants introduce less conservative changes that increase hydrophobicity, or the propensity to form α-helices, within the transmembrane domain of the Bacteriophage MS2 L protein. Replacing polar or small residues with more hydrophobic amino acids such as leucine or phenylalanine may favor more stable insertion into the bacterial membrane and more efficient packing of the transmembrane helix. This could strengthen the protein’s interaction with the lipid II flippase MurJ in Escherichia coli, the functional target of this lysis protein, which in turn could intensify the inhibition of peptidoglycan precursor transport and accelerate cell lysis. Overall, mutations of this type aim to increase the structural stability and functional efficiency of the lysis mechanism.
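
One rough way to quantify the hydrophobicity argument is to compare residue hydropathy before and after each substitution. The sketch below assumes the Kyte-Doolittle scale, which is not named in the text above; the function name `hydropathy_shift` is illustrative.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_shift(wt_aa, mut_aa):
    """Change in Kyte-Doolittle hydropathy caused by a single substitution."""
    return KD[mut_aa] - KD[wt_aa]

for wt_aa, mut_aa in [("S", "L"), ("T", "L"), ("A", "L"), ("L", "F")]:
    print(f"{wt_aa}->{mut_aa}: {hydropathy_shift(wt_aa, mut_aa):+.1f}")
```

On this particular scale the three substitutions to leucine raise hydropathy, while L -> F lowers it slightly, so the fourth variant is better described as adding aromatic character than as increasing hydrophobicity. Other scales may rank phenylalanine differently.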

Computational Analysis of Lysis Protein Mutations Using ESM Protein Language Models

Introduction

Protein language models (PLMs) such as ESM-1b and ESM2 enable the extraction of contextual evolutionary information directly from amino acid sequences. In this study, multiple computational approaches were applied to analyze mutational tolerance within a small lysis-associated membrane protein containing a predicted transmembrane (TM) domain. The objective was to evaluate whether PLM-derived representations correlate with experimentally relevant structural and functional constraints.

Methods

Mutational effects were evaluated using two complementary strategies:

  1. Log-Likelihood Ratio (LLR) Analysis using EsmForMaskedLM, where each residue was masked and alternative amino acids were scored according to evolutionary plausibility.
  2. Embedding-Based Representation Analysis, where hidden-state embeddings from ESM-1b/ESM2 were extracted and averaged to evaluate global perturbations caused by single-site mutations.
    A heatmap representation of the embeddings was generated to visualize latent structural organization within the sequence.
Image 17: ESM2 embedding heatmap representation of the protein sequence.
Image 18: LLR ranking of top predicted mutations.
Image 19: Comparative mutational embedding analysis for positions 38–39.
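
The LLR step in the first strategy can be sketched as follows. The example assumes the model's logits at the masked position have already been extracted (for instance from EsmForMaskedLM) and restricted to amino acid tokens; the function names and the toy logits are illustrative and are not the values used in this analysis.

```python
import math

def log_softmax(logits):
    """Convert a dict of residue -> logit into residue -> log-probability."""
    m = max(logits.values())
    lse = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {aa: v - lse for aa, v in logits.items()}

def llr_scores(logits_at_masked_pos, wt_aa):
    """LLR of each alternative residue: log p(mutant) - log p(wild type)."""
    logp = log_softmax(logits_at_masked_pos)
    return {aa: logp[aa] - logp[wt_aa] for aa in logp if aa != wt_aa}

# Toy logits for one masked position (hypothetical numbers).
toy_logits = {"K": 1.0, "L": 3.0, "A": 0.0}
scores = llr_scores(toy_logits, wt_aa="K")
for aa, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(aa, round(score, 2))
```

A positive LLR means the model considers the substitution more likely than the wild-type residue in that sequence context; a negative value means it is less likely.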

Results

Table 1. Top Mutations Predicted by LLR Analysis

| Position | WT | Mutation | LLR Score | Structural Interpretation |
|---|---|---|---|---|
| 50 | K | L | 2.56 | Increased TM hydrophobicity |
| 39 | Y | L | 2.24 | TM helix stabilization |
| 52 | T | L | 1.81 | Reinforced membrane compatibility |
| 53 | N | L | 1.86 | Hydrophobic TM optimization |
| 45 | A | L | 1.54 | Helical stabilization |

Table 2. Rationally Proposed Mutations

| Mutation | Region | Expected Effect |
|---|---|---|
| S->L | TM | Increased membrane hydrophobicity |
| T->L | TM | Enhanced helix packing |
| A->L | TM | Improved TM stability |
| L->F | TM | Aromatic membrane interaction |

Embedding-based analyses at positions 38 and 39 produced highly similar effect_esm values (~0.00753–0.00759), indicating that single-point mutations do not substantially perturb the global latent representation of the protein. However, mutations preserving hydrophobic or aromatic character (Y->F, Y->L, Y->I, Y->W) appeared more compatible with the TM environment.
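
The narrow range of these values is expected when per-residue embeddings are averaged over the whole sequence, because a single changed position contributes only a small share of the mean. The sketch below illustrates one possible definition of such an effect score: the cosine distance between mean-pooled wild-type and mutant embeddings. This is an assumption, since the exact definition of effect_esm is not given here, and the function names and toy vectors are illustrative.

```python
import math

def mean_pool(hidden_states):
    """Average per-residue vectors (list of equal-length lists) into one vector."""
    n = len(hidden_states)
    return [sum(col) / n for col in zip(*hidden_states)]

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def embedding_effect(wt_hidden, mut_hidden):
    """Assumed effect score: distance between pooled WT and mutant embeddings."""
    return cosine_distance(mean_pool(wt_hidden), mean_pool(mut_hidden))

# Toy 2-residue, 2-dimensional example (hypothetical numbers).
wt = [[1.0, 0.0], [0.0, 1.0]]
mut = [[1.0, 0.0], [1.0, 0.0]]
print(round(embedding_effect(wt, mut), 4))
```

Comparing the embedding at the mutated position alone, rather than the pooled vector, is one way to obtain a more sensitive per-site signal.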

Discussion

The results suggest that protein language models capture structural and evolutionary constraints associated with transmembrane domains. High-scoring mutations consistently increased hydrophobicity, reinforcing the idea that PLMs encode membrane-associated biophysical principles. However, structural compatibility predicted by ESM does not necessarily imply preserved lytic activity. Previous experimental observations indicate that mutating some functionally essential residues, such as K50, abolishes activity even when the substitution appears structurally favorable.
Thus, PLMs appear to model sequence plausibility and structural coherence more strongly than specific biological function. Consequently, these methods are highly informative for identifying conserved structural regions and designing rational variants, but experimental validation remains essential for assessing functional activity.