Reprogramming T7 Bacteriophage Host Specificity: A Computational Redesign of the gp17 Receptor-Binding Domain for Targeted Pseudomonas aeruginosa Elimination

Abstract

Antibiotic-resistant bacterial infections cause 1.27 million deaths annually, with forecasts suggesting up to 40 million mortalities by 2050 (Centers for Disease Control and Prevention, 2025; Naddaf, 2024). Pseudomonas aeruginosa is a “Priority 1” critical pathogen in this crisis, often forming biofilms encased in extracellular polymeric substances (EPS) that promote high antibiotic resistance and virulence through quorum sensing. While bacteriophages offer an orthogonal treatment, their narrow host range limits clinical utility. The significance of this project lies in overcoming these restrictions through the precision engineering of viral attachment proteins to target specific resistant strains. The broad objective is to expand the T7 lytic phage host range by redesigning its tail fibers to target P. aeruginosa. In the initial adsorption stage, phages use receptor-binding proteins (RBPs) on their tail fibers to bind to cell wall components, eventually leading to bacterial lysis via holin and endolysin production.

The project tests the hypothesis that integrating evolutionary covariation data with generative deep learning can produce functional, high-affinity chimeric receptor-binding domains (RBDs) that maintain the structural integrity of the T7 scaffold while altering its specificity. Specific aims include the computational redesign of the C-terminal distal knob, the synthesis of these candidates into pET-28a(+) expression vectors, and the validation of binding through recombinant protein expression and adsorption assays. Technical methods involve the use of ESM3 for evolutionary mapping, AlphaFold3 for structural docking, and ProteinMPNN for sequence diversification. The expected outcome is a ranked library of chimeric fibers capable of neutralizing resistant P. aeruginosa.

[Figure: Project Aims]

Author: Tammy Sisodiya
Course: HTGAA 2026 Final Project


SECTION 1: ABSTRACT

Antibiotic-resistant bacterial infections represent one of the most urgent threats to global public health, causing an estimated 1.27 million deaths annually and projected to claim up to 40 million lives by 2050 (CDC, 2025; Naddaf, 2024). Pseudomonas aeruginosa is classified as a Priority 1 critical pathogen by the WHO, owing to its capacity to form EPS-encased biofilms, coordinate virulence through quorum sensing, and acquire resistance to last-resort antibiotics including carbapenems and colistin. Bacteriophages offer a mechanistically orthogonal therapeutic modality that is not subject to the same resistance mechanisms as small-molecule antibiotics, but their narrow host range — determined by receptor-binding protein specificity — severely limits clinical deployment against non-native hosts.

This project addresses that limitation through the precision computational redesign of the T7 bacteriophage tail fiber protein gp17. A 558 bp wild-type gp17 receptor-binding domain (RBD) was codon-optimised for E. coli expression using VectorBuilder, digitally assembled into pET-21a(+) with SacI/HindIII cloning sites and an in-frame N-terminal 6×His tag, and formatted as a 579 bp Twist Clonal Gene synthesis-ready blueprint deposited in Benchling. The central hypothesis is that the host specificity of the T7-gp17 RBD can be reprogrammed by generatively redesigning the distal tip loops (residues 466–553) to recognise the evolutionarily orthogonal P. aeruginosa OprF outer membrane porin (UniProt P13794). Success is defined as a predicted binding affinity of K_d < 100 nM, estimated from interface ΔG, and ≥90% relative discrimination from the native LPS-binding background, measured as the improvement in AF-Multimer ipTM over a WT baseline control.

Three specific aims structure the project. Aim 1 executes the full in-silico generative design pipeline: ESM3 evolutionary covariation mapping, ProteinMPNN loop sequence diversification, and AlphaFold-Multimer docking evaluation against OprF, culminating in a ranked library of redesigned RBD candidates and a Twist Bioscience pro forma order. Aim 2 advances the top five candidates through molecular dynamics interface stability simulation using OpenMM and independent Rosetta FastRelax and InterfaceAnalyzer energy scoring, cross-validating the static docking predictions with dynamic structural and physics-based evidence. Aim 3 envisions deployment of the engineered phage as a programmable, self-amplifying antimicrobial agent in clinical and environmental settings, forming the basis of a universal phage retargeting platform applicable to any gram-negative pathogen with a structurally characterised outer membrane protein. The entire project is fully in-silico, with all constructs designed for immediate synthesis via Twist Bioscience and remote execution through Ginkgo Cloud Laboratories upon wet-lab access.


SECTION 2: PROJECT AIMS

Aim 1 — Experimental Aim: Generative Computational Redesign and Static Docking Validation of the T7-gp17 RBD for OprF Binding

“The first aim of my final project is to computationally redesign the distal tip loops (residues 466–553) of the T7-gp17 receptor-binding domain for high-affinity recognition of P. aeruginosa OprF by utilising ESM3 evolutionary covariation mapping, ProteinMPNN constrained sequence diversification, AlphaFold-Multimer ipTM docking evaluation, and Benchling/Twist Bioscience construct documentation.”

This aim encompasses: structural input preparation (PDB 4A0T + UniProt P13794 AlphaFold2 model), ESM3 mutability mapping, ProteinMPNN 500-sequence library generation, pre-screening filters, AF-Multimer batch docking of 100 candidates, composite scoring to identify top five candidates, interface contact map analysis, homotrimer fidelity verification, and synthesis-ready Twist Clonal Gene pro forma ordering.

Relevant resources:

Success Criteria:

  • Predicted K_d < 100 nM via ΔG estimation from AF-Multimer interface scoring
  • ≥90% relative discrimination: redesign ipTM vs. OprF exceeds WT T7-gp17 ipTM vs. OprF by ≥90%
  • pLDDT > 70 for redesigned loop regions
  • Scaffold RMSD < 2.0 Å relative to WT 4A0T
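The first criterion links a dissociation constant to a binding free energy. As a rough worked example, assuming the standard relation ΔG = RT ln(K_d) with a 1 M reference state and the 310 K temperature used later for MD, a K_d of 100 nM corresponds to roughly −9.9 kcal/mol. The function names are illustrative only. AF-Multimer reports confidence scores rather than energies, so the ΔG fed into this conversion would have to come from a separate interface energy estimate.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_delta_g(kd_molar: float, temp_k: float = 310.0) -> float:
    """Binding free energy in kcal/mol from K_d, using dG = RT ln(K_d / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_g_to_kd(delta_g: float, temp_k: float = 310.0) -> float:
    """Inverse conversion: K_d in mol/L from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temp_k))


# Free energy that corresponds to the 100 nM success threshold at 310 K.
threshold_dg = kd_to_delta_g(100e-9)
print(f"dG at K_d = 100 nM: {threshold_dg:.2f} kcal/mol")
```

Any estimated ΔG more negative than this value would satisfy the K_d < 100 nM criterion under the stated assumptions.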

Aim 2 — Development Aim: Molecular Dynamics Simulation and Rosetta Energy Validation of Top Candidates

Following a successful Aim 1, the top five ProteinMPNN/AF-Multimer candidates will be subjected to 100 ns molecular dynamics simulations using OpenMM to assess interface stability under physiological conditions. Each candidate will also be scored independently using Rosetta FastRelax and InterfaceAnalyzer ΔΔG estimation. Together these analyses cross-validate the static docking predictions with dynamic structural and physics-based evidence and nominate a single lead candidate for Twist Bioscience synthesis and future wet-lab validation via Ginkgo Cloud Laboratories.

Success Criteria:

  • Interface RMSD < 3.0 Å over 100 ns MD simulation for ≥1 candidate
  • Rosetta ΔΔG < −10 REU for ≥1 candidate
  • Concordance between AF-Multimer rank and Rosetta rank: Spearman ρ > 0.7
  • Single lead candidate nominated with supporting evidence from all three scoring methods

Aim 3 — Visionary Aim: A Programmable, Self-Amplifying Antimicrobial Agent for Precision Infection Control

In the long-term vision, engineered T7 phages carrying validated chimeric gp17 fibers will be deployed as living, self-amplifying antimicrobials capable of seeking and destroying drug-resistant P. aeruginosa in complex polymicrobial environments, including chronic wound biofilms, ventilator-associated pneumonia, and contaminated water systems. By coupling the modular RBD engineering platform developed here with directed evolution and AI-guided sequence optimisation in partnership with Ginkgo Bioworks and Helix Nano, a universal phage retargeting toolkit will be established. The goal is to respond to emerging resistant strains faster than resistance can evolve, challenging the paradigm of static, single-target antimicrobial therapy.


SECTION 3: BACKGROUND

Background and Literature Context

The structural basis for T7 tail fiber host specificity has been characterised at atomic resolution. Garcia-Doval & van Raaij (2012) resolved the T7 gp17 C-terminal domain to 1.8 Å, revealing a homotrimeric β-propeller tip with six hypervariable loops (residues 466–553) that make direct contact with E. coli LPS core oligosaccharide. This structural map defines the engineering cassette used in the current project and establishes that loop plasticity is the primary determinant of host range. Separately, Yosef et al. (2017) demonstrated that rational substitution of tail fiber tip residues in T4 and related phages is sufficient to redirect adsorption to non-native gram-negative hosts, providing experimental precedent for the cross-species retargeting strategy proposed here. Together, these studies establish a clear knowledge gap: while loop-level structural data and retargeting proof-of-concept exist, no published study has applied a generative deep learning pipeline combining ProteinMPNN and AF-Multimer to systematically redesign T7-gp17 loops for an evolutionarily orthogonal outer membrane protein target such as OprF, nor has any study cross-validated such predictions with molecular dynamics and Rosetta energy analysis.

Innovation

To the author's knowledge, this project is the first to apply a three-tier computational validation pipeline specifically to the T7-gp17 RBD for retargeting to P. aeruginosa OprF. Candidate generation combines ESM3 evolutionary covariation with ProteinMPNN sequence diversification; validation then proceeds through three tiers: AF-Multimer structural docking, molecular dynamics interface stability assessment, and Rosetta ΔΔG estimation. Unlike rational mutagenesis approaches that substitute individual residues, ProteinMPNN explores a much larger sequence space while maintaining the structural integrity of the β-propeller scaffold, enabling discovery of non-intuitive loop sequences with potentially superior binding geometry. The three scoring methods (ipTM relative improvement, MD interface RMSD, and Rosetta ΔΔG) rest on different assumptions, so agreement among them reduces the risk of false positives and raises the probability that the nominated lead candidate will translate to wet-lab binding.

Significance

Drug-resistant P. aeruginosa infections are responsible for a disproportionate share of hospital-acquired mortality, particularly in immunocompromised patients and those with cystic fibrosis, and current antibiotic options for carbapenem-resistant strains are severely limited. The clinical pipeline for new antibiotics is insufficient to address the projected 2050 mortality burden, making mechanistically orthogonal therapeutic modalities an urgent priority. Engineered phages with programmable host specificity represent a fundamentally different approach that is not subject to the same resistance mechanisms as small-molecule antibiotics, because phage-bacteria co-evolution can be harnessed rather than fought. By establishing a generalizable computational pipeline for RBD retargeting, this project creates a platform technology applicable to any gram-negative pathogen with a structurally characterised outer membrane protein, including Klebsiella pneumoniae, Acinetobacter baumannii, and Enterobacter species. The synthesis-ready construct design — a 579 bp Twist Clonal Gene in pET-21a(+), executable remotely via Ginkgo Cloud Laboratories — ensures that the transition from computational prediction to wet-lab validation can occur rapidly, reducing the time-to-experiment for future iterations and enabling agile response to emerging clinical threats.

Bioethical Considerations

Ethics (Non-Maleficence and Responsibility): The engineering of bacteriophages with altered host range raises legitimate biosafety and ecological concerns governed by the principles of non-maleficence and responsibility. Redirecting a lytic phage to a new bacterial host could affect non-target bacterial populations in complex microbiomes, including the human gut microbiome, with unpredictable downstream consequences for host health. The current project applies the principle of non-maleficence by working exclusively in silico and by designing constructs that target OprF — a protein absent from commensal E. coli and most non-pathogenic gram-negatives — thereby building host specificity into the molecular architecture of the therapeutic agent rather than relying solely on containment. All DNA sequences will be screened through SecureDNA biosecurity protocols prior to synthesis, and no live phage engineering is proposed in the current phase, minimising dual-use risk.

Risk Mitigation and Responsible Implementation: The pro forma Twist Bioscience order is designed for BSL-1 E. coli expression of isolated protein domains, not for reconstitution of infectious phage particles. Any future transition to whole-phage engineering would require BSL-2 containment, IBC approval, and compliance with institutional biosafety protocols. Two key uncertainties are whether OprF-targeted phages could inadvertently infect OprF-expressing commensal organisms and whether engineered phages could acquire broader host range through recombination in complex environments. Lower-risk alternatives, such as non-replicating phage-derived protein therapeutics or CRISPR-phage hybrids with kill-switch circuits, should therefore be evaluated. Deployment in clinical or environmental settings would require regulatory review under FDA phage therapy frameworks and EMA compassionate use guidelines. The project team commits to open publication of all computational methods and sequence data to enable community scrutiny and prevent proprietary lock-in of a potentially life-saving technology.


SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

Note: This project is fully in-silico. Steps 1–10 cover Aim 1 (generative design and static docking). Steps 11–13 cover Aim 2 (MD simulation and Rosetta validation). Steps 14–15 cover construct documentation and Twist pro forma ordering. The Results & Quantitative Expectations section describes the Rosetta cross-validation as the chosen validation experiment.


Step 1 — Structural Input Preparation
Purpose: Obtain and prepare the T7-gp17 C-terminal domain structure and the OprF extracellular domain model as docking inputs.
Method: Download PDB entry 4A0T and extract the homotrimeric assembly. Isolate residues 371–553 as the design construct, comprising the fixed scaffold (371–465) and the engineering cassette (466–553). Generate the OprF extracellular loop model from UniProt P13794 (https://rest.uniprot.org/uniprotkb/P13794.fasta) using AlphaFold2 monomer prediction via ColabFold. Clean all PDB files by removing water molecules and non-essential heteroatoms using BioPython.
Tools: PyMOL, BioPython, Google Colab.
Expected Result: Clean, docking-ready PDB files for both the gp17 tip domain and the OprF extracellular domain.
Timeline: Day 1–2.
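The clean-up in Step 1 can be illustrated without any dependencies. The text names BioPython for this task; the sketch below is a plain-Python stand-in for the same filtering logic, not the project's script. It assumes the file uses gp17 residue numbering in the standard fixed-width PDB columns, and it drops every HETATM record, so modified residues stored as HETATM would also be removed. The function and helper names are hypothetical.

```python
def clean_pdb_lines(lines, first_res=371, last_res=553):
    """Keep protein ATOM records for residues first_res..last_res.

    HETATM records (waters, ligands, ions) and all non-coordinate
    records are dropped. Residue numbers are read from PDB columns 23-26.
    """
    kept = []
    for line in lines:
        if not line.startswith("ATOM  "):
            continue
        try:
            res_seq = int(line[22:26])
        except ValueError:
            continue  # malformed residue number; skip the record
        if first_res <= res_seq <= last_res:
            kept.append(line.rstrip("\n"))
    kept.append("END")
    return kept


def demo_record(record, serial, atom, res_name, chain, res_seq):
    """Build a minimal fixed-width PDB coordinate line for the demo below."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Synthetic records: one residue outside the range, one water, two in range.
demo_lines = [
    demo_record("ATOM", 1, "CA", "GLY", "A", 370),
    demo_record("ATOM", 2, "CA", "SER", "A", 371),
    demo_record("HETATM", 3, "O", "HOH", "A", 601),
    demo_record("ATOM", 4, "CA", "ALA", "A", 553),
]
cleaned = clean_pdb_lines(demo_lines)
print("\n".join(cleaned))
```

In practice the input would be the lines of the downloaded 4A0T file, and the output would be written back to disk before docking.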


Step 2 — ESM3 Evolutionary Covariation Mapping
Purpose: Define which residues in the gp17 tip loops (466–553) are evolutionarily mutable versus structurally constrained, to guide ProteinMPNN sampling.
Method: Retrieve homologous T7-family tail fiber sequences by running jackhmmer against an NCBI protein database and build an MSA. Extract per-position conservation scores and covariation matrices from the MSA. Because ESM3 takes individual sequences rather than alignments as input, run ESM3 on the WT gp17 sequence to obtain per-position residue likelihoods as a complementary mutability signal. Flag high-covariation residues as structurally constrained; flag low-conservation loop positions as freely mutable.
Tools: jackhmmer (HMMER); ESM3 API (https://github.com/evolutionaryscale/esm) or local GPU inference.
Expected Result: Per-residue mutability map for residues 466–553, with ≥15 positions identified as freely mutable loop residues.
Timeline: Day 2–4.
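A per-position conservation score of the kind Step 2 asks for can be approximated from an alignment with Shannon entropy. The sketch below is a simple stand-in for illustration and does not call ESM3. The 1.0-bit cut-off for "freely mutable", the toy alignment, and the function names are all assumptions.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mutability_map(msa, first_res=466, threshold_bits=1.0):
    """Return {residue_number: (entropy, is_mutable)} for equal-length aligned rows."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have the same length")
    result = {}
    for i in range(length):
        entropy = column_entropy("".join(row[i] for row in msa))
        result[first_res + i] = (entropy, entropy >= threshold_bits)
    return result


# Toy alignment: columns 1, 2 and 4 are invariant, column 3 is fully variable.
toy_msa = ["ACDG", "ACEG", "ACKG", "ACRG"]
profile = mutability_map(toy_msa)
mutable_positions = [res for res, (_, flag) in profile.items() if flag]
print(mutable_positions)
```

Low-entropy positions would be treated as constrained and passed to ProteinMPNN as fixed; covariation analysis would need a separate pairwise calculation that is not shown here.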


Step 3 — ProteinMPNN Loop Sequence Diversification
Purpose: Generate a library of redesigned gp17 tip sequences optimised for OprF surface complementarity.
Method: Input the gp17 tip structure (from Step 1) with OprF extracellular loops as the docking partner context into ProteinMPNN. Fix scaffold residues 371–465 as immutable. Allow sampling for mutable loop positions 466–553 identified in Step 2. Generate 500 candidate sequences with temperature sampling at T = 0.1, 0.2, and 0.5 to balance confidence and diversity.
Tools: ProteinMPNN GitHub pipeline on Google Colab Pro+ or local GPU.
Expected Result: 500 candidate loop sequences with ProteinMPNN log-likelihood scores.
Timeline: Day 4–7.
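Fixing the scaffold while sampling the loops requires translating gp17 residue numbers into positions within the extracted chain. The sketch below shows that bookkeeping under two assumptions: the model is numbered contiguously from residue 371 with no gaps, and the design tool expects 1-based positions per chain. The JSON layout and the name "gp17_tip" are hypothetical and should be checked against the ProteinMPNN repository's helper scripts before use.

```python
import json


def design_positions(first_res=371, scaffold_end=465, last_res=553):
    """Split a contiguously numbered chain into fixed and designable 1-based indices."""
    def to_index(res_number):
        return res_number - first_res + 1

    fixed = [to_index(r) for r in range(first_res, scaffold_end + 1)]
    designable = [to_index(r) for r in range(scaffold_end + 1, last_res + 1)]
    return fixed, designable


fixed, designable = design_positions()

# Hypothetical record: the same scaffold positions held fixed on all three chains.
fixed_record = {"gp17_tip": {chain: fixed for chain in "ABC"}}
print(len(fixed), len(designable))
print(json.dumps(fixed_record)[:60] + "...")
```

The split gives 95 scaffold positions and 88 loop positions per chain. Positions flagged as constrained in Step 2 would be added to the fixed list, and for a homotrimer the three chains would presumably also need to be tied so that every copy receives the same sequence.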


Step 4 — Pre-Screening Filter
Purpose: Reduce the 500-candidate library to the top 100 for computationally expensive AF-Multimer docking runs.
Method: Rank all 500 candidates by ProteinMPNN log-likelihood score. Apply hard filters: remove sequences with predicted internal prolines in β-strand positions; remove sequences with net charge outside −5 to +5 at pH 7.4; remove sequences with TANGO aggregation score > 20. Retain top 100 passing candidates.
Tools: Python/pandas filtering script; TANGO aggregation predictor (http://tango.crg.es).
Expected Result: 100 filtered candidates ready for AF-Multimer docking.
Timeline: Day 7–8.
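The three hard filters and the final ranking can be sketched in plain Python (the text names pandas, which would work equally well). This is a minimal illustration under stated assumptions: net charge is approximated by counting K/R as +1 and D/E as −1 with histidine treated as neutral, TANGO scores are supplied as precomputed numbers, β-strand positions are given as 0-based indices, and a higher log-likelihood is taken to be better. If the raw ProteinMPNN score field is used instead, its sign convention should be checked.

```python
def net_charge(seq: str) -> int:
    """Crude net charge near pH 7.4: K/R count as +1, D/E as -1, His as neutral."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def prescreen(candidates, strand_positions, max_keep=100,
              charge_limit=5, tango_limit=20.0):
    """Apply the hard filters, then keep the best max_keep by log-likelihood.

    candidates: list of dicts with keys 'seq', 'log_likelihood', 'tango'.
    strand_positions: 0-based sequence indices that lie in beta strands.
    """
    passing = []
    for cand in candidates:
        seq = cand["seq"]
        if any(seq[i] == "P" for i in strand_positions if i < len(seq)):
            continue  # proline inside a beta strand
        if abs(net_charge(seq)) > charge_limit:
            continue  # net charge outside -5..+5
        if cand["tango"] > tango_limit:
            continue  # predicted aggregation too high
        passing.append(cand)
    passing.sort(key=lambda cand: cand["log_likelihood"], reverse=True)
    return passing[:max_keep]


# Invented toy candidates, one failing each filter and two passing.
toy = [
    {"seq": "AKDEG", "log_likelihood": -1.2, "tango": 5.0},
    {"seq": "APDEG", "log_likelihood": -0.8, "tango": 5.0},
    {"seq": "AKKKKKKG", "log_likelihood": -0.5, "tango": 5.0},
    {"seq": "AGDEG", "log_likelihood": -0.9, "tango": 30.0},
    {"seq": "AGSEG", "log_likelihood": -0.7, "tango": 1.0},
]
kept = prescreen(toy, strand_positions={1, 2})
print([cand["seq"] for cand in kept])
```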


Step 5 — WT Baseline Docking Control
Purpose: Establish the WT T7-gp17/OprF ipTM score as the negative baseline for the 90% discrimination metric.
Method: Run AF-Multimer (ColabFold) on the unmodified T7-gp17 tip (residues 371–553, corresponding to the 558 bp WT RBD sequence deposited at https://benchling.com/s/seq-A6Po7dHj1Cl12EvGgDA1) docked against the OprF extracellular domain. Record ipTM, pTM, and interface ΔG estimate.
Tools: ColabFold on Google Colab Pro+ or local GPU cluster.
Expected Result: WT baseline ipTM of 0.20–0.35, confirming the evolutionarily orthogonal interface is not natively recognised.
Timeline: Day 8 (parallel with Step 4).


Step 6 — AF-Multimer Docking Screen
Purpose: Evaluate predicted binding geometry and confidence for each of the 100 filtered candidates against OprF.
Method: Run AF-Multimer in ColabFold batch mode for all 100 candidates docked against the OprF extracellular domain (UniProt P13794). Record ipTM, pTM, and per-residue pLDDT scores for each complex. Batch jobs are submitted overnight on a GPU cluster.
Tools: ColabFold batch pipeline.
Expected Result: ipTM score distribution for 100 candidates. To meet the ≥90% relative improvement criterion, a candidate's ipTM must be at least 1.9 × the WT baseline, which corresponds to 0.38–0.67 across the expected baseline range of 0.20–0.35. Top candidates are expected to show ipTM > 0.5, which satisfies the criterion only if the measured WT baseline is ≤ 0.26.
Timeline: Day 8–14.
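The ≥90% criterion depends on whichever WT baseline Step 5 returns, so it helps to work the arithmetic explicitly. The short sketch below assumes relative improvement is defined as (design − WT) / WT; the function names are illustrative.

```python
def relative_improvement(design_iptm: float, wt_iptm: float) -> float:
    """Fractional ipTM gain of a design over the WT baseline."""
    if wt_iptm <= 0:
        raise ValueError("WT baseline ipTM must be positive")
    return (design_iptm - wt_iptm) / wt_iptm


def required_iptm(wt_iptm: float, min_gain: float = 0.90) -> float:
    """Smallest design ipTM that reaches the requested relative gain."""
    return wt_iptm * (1.0 + min_gain)


# Thresholds at the two ends of the expected WT baseline range.
for baseline in (0.20, 0.35):
    print(f"WT ipTM {baseline:.2f} -> design needs ipTM >= {required_iptm(baseline):.3f}")

# A design at ipTM 0.5 compared against both ends of the range.
print(f"{relative_improvement(0.5, 0.20):.0%} vs {relative_improvement(0.5, 0.35):.0%}")
```

Under this definition a design at ipTM 0.5 clears the bar only when the baseline is below about 0.26, so the threshold is best fixed after the WT control has been run.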


Step 7 — Composite Candidate Scoring and Ranking
Purpose: Identify the top five candidates for downstream MD and Rosetta validation using a multi-criteria scoring function.
Method: Rank all 100 docked candidates by composite score: (ipTM × 0.5) + (pLDDT_loops × 0.3) + (ProteinMPNN log-likelihood × 0.2), with each term rescaled to a common 0–1 range before weighting so that pLDDT (0–100) and the negative log-likelihood values do not swamp ipTM (0–1). Apply final filters: pLDDT > 70 for loop regions; ≥90% relative ipTM improvement over the WT baseline; no predicted disulfide bonds incompatible with the reducing cytoplasm of E. coli. Select top five candidates.
Tools: Python/pandas scoring script; matplotlib for visualisation.
Expected Result: Five ranked candidates with composite scores, structural visualisations, and predicted interface contact maps exported as PNG figures.
Timeline: Day 14–16.
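The weighted sum in Step 7 combines quantities on different scales: ipTM lies in 0–1, pLDDT in 0–100, and log-likelihoods are negative. One way to make the 0.5/0.3/0.2 weights meaningful is to min-max rescale each term across the candidate set first. That normalisation is an assumption of this sketch rather than something the source specifies, and the candidate values are invented.

```python
def min_max(values):
    """Rescale values to 0-1; a constant column maps to 0.0 for every entry."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(candidates, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of rescaled ipTM, loop pLDDT and log-likelihood."""
    w_iptm, w_plddt, w_ll = weights
    iptm = min_max([c["iptm"] for c in candidates])
    plddt = min_max([c["plddt_loops"] for c in candidates])
    loglik = min_max([c["log_likelihood"] for c in candidates])
    return [w_iptm * a + w_plddt * b + w_ll * d
            for a, b, d in zip(iptm, plddt, loglik)]


demo = [
    {"name": "A", "iptm": 0.6, "plddt_loops": 80.0, "log_likelihood": -1.0},
    {"name": "B", "iptm": 0.4, "plddt_loops": 90.0, "log_likelihood": -2.0},
    {"name": "C", "iptm": 0.5, "plddt_loops": 70.0, "log_likelihood": -1.5},
]
scores = composite_scores(demo)
ranked = [c["name"] for _, c in
          sorted(zip(scores, demo), key=lambda pair: pair[0], reverse=True)]
print(ranked)
```

Because min-max scaling depends on the candidate set, scores from different batches are not directly comparable; fixed reference ranges would be an alternative.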


Step 8 — Interface Contact Map Analysis
Purpose: Verify that redesigned loops make predicted contacts with OprF extracellular loops, not with buried or transmembrane regions.
Method: For each top-five candidate, extract AF-Multimer predicted interface residues at distance < 5 Å. Map contacts onto the OprF topology diagram (extracellular loops L1–L8 of the 16-strand beta-barrel, UniProt P13794). Confirm ≥70% of predicted contacts fall within extracellular loop regions.
Tools: PyMOL contact analysis script.
Expected Result: Contact maps showing loop-to-loop interface geometry consistent with surface-accessible binding for all five candidates.
Timeline: Day 16–18.
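The contact criterion in Step 8 reduces to a distance cut-off followed by a membership test. The sketch below works on plain coordinate tuples rather than PyMOL objects. The coordinates and the OprF loop boundaries are invented placeholders, not real annotations, and the brute-force pair loop is only practical for small interfaces.

```python
import math


def interface_contacts(fiber_atoms, oprf_atoms, cutoff=5.0):
    """Return the set of (fiber_residue, oprf_residue) pairs closer than cutoff.

    Each atom is given as (residue_number, (x, y, z)) with coordinates in angstroms.
    """
    contacts = set()
    for fiber_res, fiber_xyz in fiber_atoms:
        for oprf_res, oprf_xyz in oprf_atoms:
            if math.dist(fiber_xyz, oprf_xyz) < cutoff:
                contacts.add((fiber_res, oprf_res))
    return contacts


def extracellular_fraction(contacts, loop_ranges):
    """Fraction of contacts whose OprF residue falls inside any listed loop range."""
    if not contacts:
        return 0.0
    in_loops = sum(
        1 for _, oprf_res in contacts
        if any(start <= oprf_res <= end for start, end in loop_ranges)
    )
    return in_loops / len(contacts)


# Invented coordinates and a placeholder loop range for demonstration only.
fiber = [(470, (0.0, 0.0, 0.0)), (500, (10.0, 0.0, 0.0))]
oprf = [(30, (3.0, 0.0, 0.0)), (31, (4.9, 0.0, 0.0)),
        (120, (12.0, 0.0, 0.0)), (200, (50.0, 0.0, 0.0))]
contacts = interface_contacts(fiber, oprf)
fraction = extracellular_fraction(contacts, loop_ranges=[(25, 40)])
print(sorted(contacts), round(fraction, 2))
```

A candidate would pass the Step 8 check when the returned fraction is at least 0.70.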


Step 9 — Homotrimer Structural Fidelity Verification
Purpose: Confirm that each redesigned domain retains the homotrimeric β-propeller geometry required for functional fiber assembly.
Method: Predict the homotrimeric assembly of each top-five candidate using AF-Multimer (3-chain input). Measure RMSD of scaffold residues 371–465 relative to WT 4A0T using PyMOL. Flag any candidate with RMSD > 2.0 Å as structurally compromised and replace it with the next-ranked candidate from Step 7.
Tools: ColabFold + PyMOL RMSD measurement.
Expected Result: All five candidates show scaffold RMSD < 2.0 Å, confirming loop redesign does not perturb the structural chassis.
Timeline: Day 18–20.


Step 10 — Aim 1 Summary Report and Candidate Dossier
Purpose: Document all Aim 1 results in a structured candidate dossier for handoff to Aim 2 analysis.
Method: Compile for each of the five candidates: composite score, ipTM, pLDDT_loops, scaffold RMSD, interface contact map, and AF-Multimer predicted structure PDB file. Generate a ranked summary table. Export all structural figures as publication-quality PNG files using PyMOL.
Tools: Python reporting script; PyMOL.
Expected Result: Five-candidate dossier with all supporting computational evidence, ready for Aim 2 MD simulation input.
Timeline: Day 20–21.


Step 11 — Molecular Dynamics Interface Stability Simulation (Aim 2)
Purpose: Assess whether the predicted binding interfaces from Aim 1 are dynamically stable under physiological conditions.
Method: For each of the five top candidates, prepare the AF-Multimer predicted complex for MD simulation using OpenMM on Google Colab Pro+. Solvate in an explicit TIP3P water box with 150 mM NaCl. Energy minimise, equilibrate for 1 ns in the NPT ensemble at 310 K and 1 atm, then run a 100 ns production simulation. Record interface RMSD, number of persistent hydrogen bonds, and buried surface area over time.
Tools: OpenMM on Google Colab Pro+ with GPU acceleration.
Expected Result: Interface RMSD < 3.0 Å over 100 ns for ≥1 candidate. Candidates with interface RMSD > 5.0 Å are eliminated.
Timeline: Day 21–30.
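Two pieces of bookkeeping follow from Step 11: how many integration steps a 100 ns run requires, and how a recorded interface RMSD trace maps onto the 3.0 Å and 5.0 Å thresholds. The sketch below assumes a 2 fs timestep and judges each trajectory by its largest RMSD value; both are assumptions, and a time-averaged RMSD would be an equally defensible reading. The source does not say how candidates between 3.0 and 5.0 Å are handled, so they are labelled "intermediate" here.

```python
def production_steps(length_ns: float = 100.0, timestep_fs: float = 2.0) -> int:
    """Number of integration steps needed for a run of the given length."""
    return round(length_ns * 1e6 / timestep_fs)


def classify_interface(rmsd_series, pass_cutoff=3.0, fail_cutoff=5.0):
    """Label a trajectory by its largest interface RMSD (angstroms)."""
    worst = max(rmsd_series)
    if worst < pass_cutoff:
        return "stable"
    if worst > fail_cutoff:
        return "eliminated"
    return "intermediate"


print(production_steps())
# Invented RMSD traces sampled along three hypothetical trajectories.
print(classify_interface([1.2, 1.8, 2.4, 2.1]))
print(classify_interface([1.5, 3.6, 4.2, 3.9]))
print(classify_interface([2.0, 4.8, 6.3, 7.1]))
```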


Step 12 — Rosetta FastRelax and InterfaceAnalyzer Scoring (Aim 2)
Purpose: Generate an independent physics-based ΔΔG estimate for each candidate to cross-validate AF-Multimer ipTM rankings.
Method: Input each AF-Multimer predicted complex into Rosetta FastRelax (50 cycles, REF2015 score function) via PyRosetta to relieve clashes and optimise side-chain rotamers. Run InterfaceAnalyzer on the relaxed complex to extract ΔΔG_binding, interface buried surface area, shape complementarity score (Sc), and number of interface hydrogen bonds. Run identical analysis on the WT T7-gp17/OprF complex as the energy baseline.
Tools: PyRosetta on Google Colab or local HPC.
Expected Result: Rosetta ΔΔG < −10 REU for ≥1 candidate. Shape complementarity Sc > 0.6 for top candidates.
Timeline: Day 25–32 (parallel with Step 11).


Step 13 — Multi-Method Concordance Analysis and Lead Candidate Nomination (Aim 2)
Purpose: Integrate AF-Multimer, MD, and Rosetta scores to nominate a single lead candidate with the strongest multi-evidence support.
Method: Construct a concordance matrix comparing the rank order of all five candidates across three scoring methods: AF-Multimer ipTM, MD interface RMSD stability, and Rosetta ΔΔG. Calculate Spearman rank correlation between all pairs of methods using Python/scipy. Nominate the candidate that ranks in the top two across all three methods as the lead candidate.
Tools: Python/scipy Spearman correlation; matplotlib heatmap visualisation.
Expected Result: Spearman ρ > 0.7 between at least two pairs of scoring methods. Single lead candidate identified.
Timeline: Day 32–34.
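The concordance check compares rank orders, so each metric first has to be ranked in its own "better" direction: higher for ipTM, lower for interface RMSD and Rosetta ΔΔG. The text names scipy for the correlation; the sketch below is a dependency-free equivalent using the closed-form Spearman formula, which is valid only when there are no tied values. The scores shown are invented.

```python
def ranks(values, higher_is_better):
    """Rank values from 1 (best) to n (worst); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(rank_a, rank_b):
    """Spearman correlation from two tie-free rank lists of equal length."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rank lists must have equal length of at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented scores for five hypothetical candidates.
iptm = [0.62, 0.58, 0.55, 0.51, 0.50]      # higher is better
ddg = [-14.0, -9.0, -12.0, -8.0, -7.5]     # more negative is better
iptm_ranks = ranks(iptm, higher_is_better=True)
ddg_ranks = ranks(ddg, higher_is_better=False)
rho = spearman_rho(iptm_ranks, ddg_ranks)
print(iptm_ranks, ddg_ranks, round(rho, 2))
```

With only five candidates, ρ can take only a coarse set of values, so the 0.7 threshold is best read as a loose indicator of agreement rather than a statistical test.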


Step 14 — Codon Optimisation and Construct Design
Purpose: Prepare synthesis-ready DNA sequences for the lead candidate, four runner-up candidates, and two controls (WT gp17 RBD and OprF construct).
Method: The WT gp17 RBD insert (558 bp, codon-optimised for E. coli using VectorBuilder) is already designed and deposited at https://benchling.com/s/seq-A6Po7dHj1Cl12EvGgDA1, with the full pET-21a(+) plasmid construct at https://benchling.com/s/seq-PRl0YGEYghHSg22FYLGo. For each redesigned candidate, reverse-translate the ProteinMPNN-optimised loop sequences (residues 466–553) using the E. coli K-12 codon usage table, splice into the fixed scaffold sequence (residues 371–465) from the WT construct, and verify the 558 bp final insert length is maintained. Confirm SacI (5’) and HindIII (3’) cloning sites are intact, 5’ ATG start codon is in-frame with the N-terminal 6×His tag, and 3’ triple stop codon (TAATAATAA) is present. Digitally assemble all seven constructs in Benchling.
Tools: Benchling, VectorBuilder codon optimisation tool.
Expected Result: Seven fully annotated Benchling plasmid maps, each verified for reading frame integrity and restriction site compatibility.
Timeline: Day 34–37.
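Several of the Step 14 checks are mechanical and can be scripted before sequences are entered into Benchling. The sketch below tests insert length, reading frame, the terminal TAATAATAA stop, premature stops, and internal SacI (GAGCTC) or HindIII (AAGCTT) sites that would interfere with directional cloning. It assumes the 558 bp figure includes the 9 bp triple stop; 183 residues account for 549 bp, which is consistent with that reading, but the source does not state it. The function name and test sequences are illustrative.

```python
SACI, HINDIII, STOP = "GAGCTC", "AAGCTT", "TAATAATAA"


def check_insert(cds: str, expected_len: int = 558):
    """Return a list of problems found in a coding insert (empty list if none)."""
    seq = cds.upper()
    problems = []
    if len(seq) != expected_len:
        problems.append(f"length {len(seq)} != {expected_len}")
    if len(seq) % 3:
        problems.append("length is not a multiple of 3")
    if not seq.endswith(STOP):
        problems.append("missing terminal TAATAATAA")
    # Both sites are palindromic, so scanning one strand is sufficient.
    for name, site in (("SacI", SACI), ("HindIII", HINDIII)):
        if site in seq:
            problems.append(f"internal {name} site")
    body = seq[:-len(STOP)] if seq.endswith(STOP) else seq
    codons = [body[i:i + 3] for i in range(0, len(body) - 2, 3)]
    if any(codon in ("TAA", "TAG", "TGA") for codon in codons):
        problems.append("premature stop codon")
    return problems


# Synthetic sequences for demonstration: 183 alanine codons plus the stop.
good = "GCT" * 183 + STOP
bad = "GCT" * 181 + SACI + STOP
print(check_insert(good))
print(check_insert(bad))
```

An empty list indicates that the insert passed these checks; start-codon and His-tag framing depend on the flanking vector sequence and still need to be confirmed on the assembled plasmid map.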


Step 15 — Biosecurity Screening and Twist Bioscience Pro Forma Order
Purpose: Verify all sequences are biosecurity-compliant and prepare the synthesis-ready order documentation for remote execution via Ginkgo Cloud Laboratories.
Method: Submit all seven insert sequences to SecureDNA screening portal (https://securedna.org). Upon clearance, format as Twist Bioscience Clonal Gene orders: whole plasmid synthesis in pET-21a(+) backbone, SacI/HindIII directional cloning sites, 100 ng lyophilised delivery, standard turnaround. Generate a Twist quote document as the pro forma order.
Tools: SecureDNA portal; Twist Bioscience online ordering platform (https://www.twistbioscience.com/products/genes).
Expected Result: Seven Twist Clonal Gene order line items, biosecurity-cleared, with estimated synthesis cost and turnaround time documented.
Timeline: Day 37–40.

Twist Bioscience Order Statement: Seven whole plasmids will be synthesised and ordered from Twist Bioscience as Clonal Gene products in pET-21a(+): the WT T7-gp17 RBD control (558 bp insert, codon-optimised, deposited at https://benchling.com/s/seq-A6Po7dHj1Cl12EvGgDA1), five redesigned RBD candidates with ProteinMPNN-optimised loop sequences, and the P. aeruginosa OprF surface-display construct (UniProt P13794). All constructs use SacI/HindIII directional cloning, in-frame N-terminal 6×His tag, and triple stop codon (TAATAATAA). All sequences are codon-optimised for E. coli K-12, biosecurity-screened via SecureDNA, and formatted as a synthesis-ready pro forma for remote execution via Ginkgo Cloud Laboratories.


Example Assay Plate Layout (Planned Future Wet-Lab Validation — 384-Well Format)

       Col 1–2           Col 3–4           Col 5–6           Col 7–8           Col 9–10          Col 11–12
Row A  Lead Candidate    Runner-Up C2      Runner-Up C3      Runner-Up C4      Runner-Up C5      WT gp17 (neg ctrl)
Row B  Lead Candidate    Runner-Up C2      Runner-Up C3      Runner-Up C4      Runner-Up C5      WT gp17 (neg ctrl)
Row C  [1 µM]            [1 µM]            [1 µM]            [1 µM]            [1 µM]            [1 µM]
Row D  [300 nM]          [300 nM]          [300 nM]          [300 nM]          [300 nM]          [300 nM]
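Row E  [100 nM]          [100 nM]          [100 nM]          [100 nM]          [100 nM]          [100 nM]
Row F  [30 nM]           [30 nM]           [30 nM]           [30 nM]           [30 nM]           [30 nM]
Row G  [10 nM]           [10 nM]           [10 nM]           [10 nM]           [10 nM]           [10 nM]
Row H  [3 nM]            [3 nM]            [3 nM]            [3 nM]            [3 nM]            [3 nM]
Row I  [1 nM]            [1 nM]            [1 nM]            [1 nM]            [1 nM]            [1 nM]
Row J  [0.3 nM]          [0.3 nM]          [0.3 nM]          [0.3 nM]          [0.3 nM]          [0.3 nM]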
Row K  Buffer only (blank)
Row L  OprF-only (background)
Row M  Anti-OprF Ab (positive ctrl)
Row N–P Triplicates

Plate: 384 Greiner black-well clear-bottom
Detection: Spark Plate Reader, fluorescence mode (Ex 488 nm / Em 520 nm for FITC-labelled fiber fragments)
Liquid handling: Echo525 for nanoliter-scale serial dilution transfers; Tempest for bulk buffer addition


DNA Construct Design

Construct A — WT T7-gp17 RBD Control (pET-21a(+)-His6-gp17-RBD-WT)

This construct encodes the wild-type T7-gp17 C-terminal receptor-binding domain (residues 371–553, 558 bp coding sequence), codon-optimised for E. coli K-12 expression using VectorBuilder, cloned into pET-21a(+) via SacI/HindIII sites with an in-frame N-terminal 6×His tag and a 3’ triple stop codon. The canonical sequence and annotated plasmid map are deposited in Benchling:

LOCUS       pET21a_His6_gp17_RBD_WT    ~6100 bp    DNA    circular
DEFINITION  Wild-type T7-gp17 RBD (residues 371-553, 558 bp CDS),
            codon-optimised for E. coli K-12 (VectorBuilder),
            N-terminal 6xHis tag, pET-21a(+) backbone.
            Negative control for OprF binding assays.
            Canonical sequence: https://benchling.com/s/seq-A6Po7dHj1Cl12EvGgDA1
ACCESSION   .
VERSION     .
KEYWORDS    T7 gp17; wild-type; receptor-binding domain; negative control;
            pET-21a; His-tag; Twist Clonal Gene.
SOURCE      Synthetic construct (Bacteriophage T7 origin)
  ORGANISM  Synthetic construct
FEATURES             Location/Qualifiers
     promoter        1..19
                     /label="T7 promoter"
                     /note="T7 RNA polymerase promoter for high-level expression"
     RBS             20..35
                     /label="RBS"
                     /note="Ribosome binding site"
     CDS             36..53
                     /label="6xHis tag"
                     /codon_start=1
                     /product="hexahistidine affinity tag"
                     /translation="MHHHHHH"
     CDS             54..611
                     /label="gp17-RBD-WT"
                     /codon_start=1
                     /note="T7 gp17 residues 371-553 (558 bp);
                     codon-optimised for E. coli K-12 using VectorBuilder;
                     scaffold residues 371-465 fixed;
                     hypervariable loops 466-553 unmodified (WT)"
                     /product="Wild-type T7-gp17 C-terminal RBD"
     misc_feature    54..332
                     /label="Fixed scaffold (371-465)"
                     /note="Structural chassis; held fixed in all redesigns"
     misc_feature    333..611
                     /label="Hypervariable loops (466-553)"
                     /note="Engineering cassette; redesigned in candidate constructs"
     terminator      612..660
                     /label="T7 terminator"
     rep_origin      complement(1000..1588)
                     /label="pMB1 ori"
     CDS             complement(1800..2660)
                     /label="AmpR"
                     /product="beta-lactamase"
                     /note="Ampicillin resistance"
ORIGIN
        1 taatacgact cactataGGG agaccacaac ggtttccctc tagaaataat tttgtttaac
       61 tttaagaagg agatatacca tATGCACCAC CACCACCACC AC
       [558 bp WT gp17 RBD CDS - codon-optimised sequence as deposited at
        https://benchling.com/s/seq-A6Po7dHj1Cl12EvGgDA1]
      TAATAATAA gcggccgc
//

Construct B — Lead Redesigned T7-gp17 RBD (pET-21a(+)-His6-gp17-RBD-OprF-Lead)

Identical backbone and expression architecture as Construct A. The scaffold sequence (residues 371–465) is identical to the WT construct. The hypervariable loop region (residues 466–553) is replaced with the ProteinMPNN-optimised sequence nominated as the lead candidate from Step 13. The final insert remains 558 bp. The exact sequence will be determined by the computational pipeline and deposited in Benchling prior to Twist ordering.

LOCUS       pET21a_His6_gp17_RBD_Lead    ~6100 bp    DNA    circular
DEFINITION  Redesigned T7-gp17 RBD, lead ProteinMPNN candidate.
            Scaffold residues 371-465 identical to WT.
            Loop residues 466-553 redesigned for OprF binding.
            N-terminal 6xHis tag, pET-21a(+) backbone.
ACCESSION   .
VERSION     .
KEYWORDS    T7 gp17; ProteinMPNN; OprF; receptor-binding domain redesign;
            pET-21a; His-tag; Twist Clonal Gene.
SOURCE      Synthetic construct
  ORGANISM  Synthetic construct
FEATURES             Location/Qualifiers
     promoter        1..19
                     /label="T7 promoter"
     RBS             20..35
                     /label="RBS"
     CDS             36..53
                     /label="6xHis tag"
                     /codon_start=1
                     /translation="MHHHHHH"
     CDS             54..611
                     /label="gp17-RBD-Lead"
                     /codon_start=1
                     /note="T7 gp17 residues 371-553 (558 bp);
                     scaffold 371-465: identical to WT (codon-optimised);
                     loops 466-553: ProteinMPNN-redesigned for OprF binding;
                     codon-optimised for E. coli K-12;
                     SacI/HindIII cloning sites flanking insert"
                     /product="Lead redesigned T7-gp17 RBD for OprF binding"
     misc_feature    54..332
                     /label="Fixed scaffold (371-465)"
                     /note="Identical to WT; held fixed in ProteinMPNN design"
     misc_feature    333..611
                     /label="Redesigned loops (466-553)"
                     /note="ProteinMPNN-optimised sequence for OprF surface complementarity;
                     sequence determined by computational pipeline Step 3;
                     to be deposited in Benchling prior to Twist ordering"
     terminator      612..660
                     /label="T7 terminator"
     rep_origin      complement(1000..1588)
                     /label="pMB1 ori"
     CDS             complement(1800..2660)
                     /label="AmpR"
ORIGIN
        1 taatacgact cactataGGG agaccacaac ggtttccctc tagaaataat tttgtttaac
       61 tttaagaagg agatatacca tATGCACCAC CACCACCACC AC
       [SCAFFOLD 371-465: identical to WT Benchling sequence]
       [LOOPS 466-553: ProteinMPNN-optimised — sequence inserted after Step 3]
      TAATAATAA gcggccgc
//

Construct C — OprF Surface-Display Construct (pET-21a(+)-OprF-eCPX)

Full-length P. aeruginosa OprF (UniProt P13794, 325 aa, 978 bp CDS) fused to the eCPX outer membrane display scaffold for surface presentation on E. coli cells. Source FASTA: https://rest.uniprot.org/uniprotkb/P13794.fasta. The CDS is codon-optimised for E. coli K-12. This construct serves as the target in planned future wet-lab binding assays.

LOCUS       pET21a_OprF_eCPX    ~7100 bp    DNA    circular
DEFINITION  P. aeruginosa OprF (UniProt P13794, 325 aa) fused to eCPX outer
            membrane display scaffold, codon-optimised for E. coli K-12,
            in pET-21a(+). Target construct for planned gp17-RBD binding assays.
ACCESSION   .
VERSION     .
KEYWORDS    OprF; eCPX; outer membrane display; P. aeruginosa; phage therapy;
            Twist Clonal Gene.
SOURCE      Synthetic construct
  ORGANISM  Synthetic construct
FEATURES             Location/Qualifiers
     promoter        complement(1..19)
                     /label="T7 promoter"
     RBS             20..35
                     /label="RBS"
     CDS             36..2009
                     /label="OprF-eCPX fusion"
                     /codon_start=1
                     /note="P. aeruginosa OprF (UniProt P13794, 325 aa, 978 bp);
                     fused C-terminally to eCPX outer membrane scaffold;
                     codon-optimised for E. coli K-12;
                     extracellular loops L1-L8 surface-exposed;
                     source FASTA: https://rest.uniprot.org/uniprotkb/P13794.fasta"
                     /product="OprF-eCPX surface display fusion"
     misc_feature    36..1013
                     /label="OprF (P13794)"
                     /note="Full-length OprF; extracellular loops L1-L8 displayed"
     misc_feature    1014..2009
                     /label="eCPX scaffold"
                     /note="Outer membrane display scaffold for E. coli surface presentation"
     terminator      2010..2058
                     /label="T7 terminator"
     rep_origin      complement(2200..2788)
                     /label="pMB1 ori"
     CDS             complement(3000..3860)
                     /label="AmpR"
ORIGIN
        1 taatacgact cactataGGG agaccacaac ggtttccctc tagaaataat tttgtttaac
       61 tttaagaagg agatatacca tATG
       [OprF P13794 codon-optimised CDS fused to eCPX scaffold — assembled in Benchling]
      TAATAATAA gcggccgc
//

Section 4: Techniques, Tools, and Technology

HTGAA Course Technique Checklist

| Technique | Relevant to This Project |
|---|---|
| Pipetting | 🔲 Planned future wet-lab only |
| Lab Safety | ✅ BSL-1 in silico phase; BSL-2 considerations documented |
| Bioethical Considerations | ✅ SecureDNA screening; OprF target specificity; open publication commitment |
| DNA Construct Design | ✅ Three constructs designed in Benchling; GenBank blocks generated |
| Databases (GenBank, NCBI, UniProt) | ✅ PDB 4A0T, UniProt P13794, NCBI jackhmmer MSA |
| Designing a Twist Order | ✅ Seven Twist Clonal Gene pro forma orders documented |
| Creating a plan to use the Autonomous Lab at Ginkgo Bioworks | ✅ Remote execution pathway via Ginkgo Cloud Laboratories documented |
| Protein Design | ✅ ProteinMPNN loop redesign; ESM3 covariation mapping |
| Use of Benchling | ✅ All three constructs designed and deposited in Benchling |
| Models and Notebooks | ✅ ColabFold, OpenMM, PyRosetta on Google Colab Pro+ |
| Databases (structural) | ✅ PDB 4A0T; UniProt P13794; Kazusa codon usage |
| Chassis Selection | E. coli K-12 (computational); BL21(DE3) planned for expression |
| Plasmid Preparation | ✅ pET-21a(+) construct design; Twist whole-plasmid synthesis |
| Bacterial Culturing | 🔲 Planned future wet-lab |
| Protein Purification | 🔲 Planned future wet-lab (Ni-NTA His-tag purification) |
| Other Cloning Methods (Restriction Enzyme Digestion) | ✅ SacI/HindIII directional cloning architecture designed |

Technique Deep-Dive 1: ProteinMPNN for Interface-Directed Sequence Design

ProteinMPNN is a message-passing graph neural network trained on structures from the PDB to predict amino acid sequences compatible with a given protein backbone. Unlike traditional sequence design methods that rely on physics-based energy functions alone, it captures statistical patterns of how residues co-occur in structurally similar contexts across thousands of solved protein structures, which lets it propose sequences that are both structurally plausible and chemically diverse. In this project, ProteinMPNN is applied in a constrained mode. The scaffold residues (371–465) are fixed to preserve the homotrimeric β-propeller geometry of the WT gp17 RBD as characterised in PDB 4A0T. The loop residues (466–553) are free to be redesigned in the context of the OprF docking partner structure, which biases the model toward sequences that complement the OprF extracellular surface. The sampling temperature (T = 0.1–0.5) controls the diversity-confidence tradeoff: low temperatures produce high-confidence sequences close to the training distribution, while higher temperatures introduce diversity that may sample novel binding geometries not accessible to conservative design approaches. Generating the 500-sequence library at multiple temperatures is intended to give broad coverage of the designable sequence space.
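The constrained-design setup described above can be sketched as a small bookkeeping script. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the chain IDs (A, B, C for the trimer), the structure name `gp17_rbd_oprf`, the assumption that each modelled chain starts at residue 371, and the even split of 500 sequences over five temperatures are all hypothetical. The per-chain position-list layout is modelled on ProteinMPNN's fixed-position input and should be checked against the version in use.

```python
import json

# Residue ranges taken from the proposal text.
SCAFFOLD_RESIDUES = range(371, 466)  # 371-465, held fixed
LOOP_RESIDUES = range(466, 554)      # 466-553, free to redesign
FIRST_RESIDUE = 371                  # ASSUMPTION: each modelled chain starts here


def to_chain_index(resnum, first=FIRST_RESIDUE):
    """Convert a gp17 residue number to a 1-based index within the chain."""
    return resnum - first + 1


def fixed_positions(structure_name, chains):
    """Build a {structure: {chain: [fixed indices]}} mapping for the scaffold."""
    fixed = [to_chain_index(r) for r in SCAFFOLD_RESIDUES]
    return {structure_name: {chain: list(fixed) for chain in chains}}


def split_library(total, temperatures):
    """Divide a sequence budget as evenly as possible across temperatures."""
    base, extra = divmod(total, len(temperatures))
    return {t: base + (1 if i < extra else 0) for i, t in enumerate(temperatures)}


# Hypothetical names: three gp17 chains labelled A-C in a structure file
# called "gp17_rbd_oprf"; the OprF chain would be excluded from design.
spec = fixed_positions("gp17_rbd_oprf", ["A", "B", "C"])
plan = split_library(500, [0.1, 0.2, 0.3, 0.4, 0.5])

print(len(spec["gp17_rbd_oprf"]["A"]), "fixed positions per chain")
print(len(LOOP_RESIDUES), "designable positions per chain")
print(json.dumps(plan))
```

Keeping the residue ranges in one place makes it easy to confirm that the fixed and designable sets cover residues 371–553 without overlap before any design run is launched.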

Technique Deep-Dive 2: AlphaFold-Multimer ipTM Scoring as a Quantitative Binding Assay

AlphaFold-Multimer extends the original AlphaFold2 architecture to predict the structures of protein complexes. It outputs both a predicted structure and a confidence metric called ipTM (interface predicted TM-score), which specifically reflects the model’s confidence in the predicted interface geometry. Unlike pLDDT, which measures per-residue structural confidence, ipTM is sensitive to whether the two chains are predicted to interact in a specific, well-defined orientation, making it a proxy for binding specificity rather than just structural plausibility. In this project, ipTM is used as the primary quantitative readout: the WT T7-gp17/OprF complex establishes a baseline ipTM reflecting the non-specific, evolutionarily orthogonal interface, and redesigned candidates are scored by their relative improvement over this baseline. The 90% relative improvement threshold is chosen so that the computational signal is large enough to be biologically meaningful and unlikely to reflect stochastic variation in AF-Multimer sampling. It should remain achievable, given that published benchmarks of ProteinMPNN-designed interfaces have reported ipTM improvements of 0.2–0.4 units over non-cognate baselines for successful redesigns.
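The threshold can be made concrete with a short calculation. The sketch below assumes that "relative improvement" means (candidate ipTM − WT ipTM) / WT ipTM, which is one natural reading of the text rather than a definition given in it. The ipTM values are the hypothetical WT and lead values from the Hypothetical Validation Data table, not measured results.

```python
def relative_improvement(candidate_iptm, wt_iptm):
    """Fractional ipTM gain over the WT baseline (ASSUMED definition)."""
    return (candidate_iptm - wt_iptm) / wt_iptm


def minimum_passing_iptm(wt_iptm, threshold=0.90):
    """Smallest candidate ipTM that meets the relative-improvement threshold."""
    return wt_iptm * (1.0 + threshold)


def passes_threshold(candidate_iptm, wt_iptm, threshold=0.90):
    """True when the candidate improves on WT by at least the threshold."""
    return relative_improvement(candidate_iptm, wt_iptm) >= threshold


WT_IPTM = 0.28    # hypothetical WT baseline from the proposal's data table
LEAD_IPTM = 0.71  # hypothetical lead candidate (C1) from the same table

print(f"relative improvement: {relative_improvement(LEAD_IPTM, WT_IPTM):.1%}")
print(f"minimum passing ipTM: {minimum_passing_iptm(WT_IPTM):.3f}")
print("lead passes:", passes_threshold(LEAD_IPTM, WT_IPTM))
```

Under this reading, a lower WT baseline makes the threshold easier to reach in absolute ipTM units, so the baseline should be fixed before candidates are scored.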

Industry Council Partners

| Partner | Role in This Project |
|---|---|
| Twist Bioscience | Whole-plasmid Clonal Gene synthesis for all seven constructs |
| Ginkgo Bioworks | Remote wet-lab execution via Ginkgo Cloud Laboratories; automation infrastructure (Echo525, Tempest, Spark) |
| SecureDNA | Biosecurity screening of all DNA sequences prior to synthesis |
| Benchling | Construct design, annotation, and sequence deposition |
| Helix Nano | Future partnership for phage therapy delivery platform (Aim 3) |
| Asimov (Kernel) | Potential future use for genetic circuit modelling of phage-host interaction dynamics |
| Basecamp Research | Potential future use for expanded phage tail fiber diversity mining from metagenomic databases |

Section 5: Results & Quantitative Expectations

Validation Choice

The chosen validation experiment is a Rosetta InterfaceAnalyzer ΔΔG cross-validation of the top five AF-Multimer candidates from Aim 1, executed as a fully in-silico protocol using PyRosetta on Google Colab Pro+. This experiment independently scores the same five candidate-OprF complexes using a physics-based energy function that is mechanistically orthogonal to the AF-Multimer neural network, directly testing whether the ipTM-based ranking from Aim 1 is corroborated by a second computational method. Agreement between the AF-Multimer ipTM rank order and the Rosetta ΔΔG rank order would provide independent support that the nominated lead candidate is a genuine computational hit rather than an artefact of AF-Multimer’s training distribution bias.


Step-by-Step Validation Protocol

  1. Export AF-Multimer Predicted Complexes. Export the top-ranked PDB structure (ranked_0.pdb) for each of the five candidate complexes from the AF-Multimer docking screen (Step 6). Strip water molecules and non-protein heteroatoms using BioPython. Confirm both chains (redesigned gp17 RBD and OprF extracellular domain) are present and correctly labelled.

  2. Rosetta FastRelax. Run Rosetta FastRelax on each of the five complex PDB files using PyRosetta (50 cycles, REF2015 score function) to relieve steric clashes and optimise side-chain rotamers at the interface. Generate 50 independent trajectories per complex; retain the lowest-energy relaxed structure from each set.

  3. WT Rosetta Baseline. Run identical FastRelax + InterfaceAnalyzer on the WT T7-gp17/OprF complex (from Step 5). Record interface ΔG (REU), buried surface area (Ų), shape complementarity score (Sc), and number of interface hydrogen bonds as the energy baseline.

  4. Rosetta InterfaceAnalyzer on All Candidates. Run InterfaceAnalyzer on each of the five relaxed candidate complexes. Record the same four metrics. Calculate ΔΔG for each candidate relative to the WT baseline (ΔΔG = candidate ΔG − WT ΔG; more negative = better binding than WT).

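  5. Rank Correlation Analysis. Compute the Spearman rank correlation between the AF-Multimer ipTM ranks (Step 6) and the Rosetta ΔΔG ranks (Step 4 above) across all five candidates using Python/scipy. Assign rank 1 to the best value in each metric (highest ipTM, most negative ΔΔG) so that agreement between the methods gives a positive coefficient; correlating the raw ipTM and ΔΔG values directly would give the same magnitude with a negative sign. A Spearman ρ > 0.8 indicates strong agreement between the two orthogonal scoring methods.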

  6. Lead Candidate Confirmation. Identify the candidate ranking first in both AF-Multimer ipTM and Rosetta ΔΔG. Designate this doubly-validated candidate as the lead for Twist Bioscience synthesis and future wet-lab validation. If top candidates differ between methods, select the candidate with the best average rank across both metrics and document the discrepancy as a hypothesis for future experimental resolution.
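Steps 4 and 6 of the protocol above reduce to simple arithmetic once the scores are in hand, and can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the two interface ΔG numbers in the first example are placeholders chosen to show the subtraction, and the candidate scores are the hypothetical values from the Hypothetical Validation Data table.

```python
def ddg_vs_wt(candidate_dg, wt_dg):
    """ΔΔG = candidate ΔG − WT ΔG; more negative means better than WT."""
    return candidate_dg - wt_dg


def rank_best_first(scores, higher_is_better):
    """Map each candidate name to its rank, with rank 1 as the best score."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position + 1 for position, name in enumerate(ordered)}


def pick_lead(iptm, ddg):
    """Return (lead, methods_agree, average_ranks) across both metrics."""
    iptm_rank = rank_best_first(iptm, higher_is_better=True)
    ddg_rank = rank_best_first(ddg, higher_is_better=False)
    average = {name: (iptm_rank[name] + ddg_rank[name]) / 2 for name in iptm}
    lead = min(average, key=average.get)
    methods_agree = iptm_rank[lead] == 1 and ddg_rank[lead] == 1
    return lead, methods_agree, average


# Placeholder interface energies (REU), for illustration of the formula only.
print(round(ddg_vs_wt(-30.0, -25.2), 1))

# Hypothetical scores from the proposal's data table.
iptm = {"C1": 0.71, "C2": 0.65, "C3": 0.61, "C4": 0.58, "C5": 0.52}
ddg = {"C1": -4.8, "C2": -3.9, "C3": -3.1, "C4": -3.4, "C5": -2.7}

lead, methods_agree, average = pick_lead(iptm, ddg)
print(lead, methods_agree, average)
```

When `methods_agree` is false, the average-rank fallback still names a lead, and the per-method ranks document the discrepancy for later experimental follow-up.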


Techniques Used in Validation

Rosetta FastRelax is a widely benchmarked protocol for relieving steric clashes in computationally predicted protein structures. It alternates side-chain repacking with minimisation of backbone and side-chain torsion angles against the REF2015 all-atom energy function, which includes Lennard-Jones van der Waals terms, hydrogen bonding, solvation, and electrostatics. InterfaceAnalyzer is a Rosetta application specifically designed to decompose the energy of a protein-protein interface into per-residue contributions, enabling quantitative comparison of binding energies across a series of designed variants and providing metrics such as shape complementarity and buried surface area that are independently predictive of experimental binding affinity. Spearman rank correlation is preferred over Pearson correlation here because ipTM scores and Rosetta energies are not on the same scale, so only the rank order is compared, and because Spearman correlation is robust to outliers and to monotonic but non-linear relationships between the two scoring functions. FastRelax and InterfaceAnalyzer have been used together in many published protein interface design studies as a standard computational validation step between generative design and wet-lab synthesis, which makes this an established approach that is directly executable in the current fully in-silico project phase.
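To show what the rank correlation measures, here is a dependency-free sketch of Spearman's ρ computed as the Pearson correlation of the two rank vectors. In practice `scipy.stats.spearmanr` would be used; this version exists only to make the calculation explicit. The inputs are the hypothetical ipTM and ΔΔG values from the Hypothetical Validation Data table, and ΔΔG is negated so that "better" points the same way in both metrics.

```python
import math


def average_ranks(values):
    """Ascending 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)


iptm = [0.71, 0.65, 0.61, 0.58, 0.52]   # hypothetical, candidates C1-C5
ddg = [-4.8, -3.9, -3.1, -3.4, -2.7]    # hypothetical, same order

rho = spearman_rho(iptm, [-value for value in ddg])
print(round(rho, 2))
```

Because only C3 and C4 swap places between the two rankings, the sum of squared rank differences is 2, which gives ρ = 1 − 6·2 / (5·24) = 0.90 for these five candidates.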


Hypothetical Validation Data

Hypothetical Data Table:

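| Candidate | AF-Multimer ipTM | AF-Multimer Rank | Rosetta ΔΔG (REU) | Rosetta Rank |
|---|---|---|---|---|
| Lead (C1) | 0.71 | 1 | −4.8 | 1 |
| Runner-Up C2 | 0.65 | 2 | −3.9 | 2 |
| Runner-Up C3 | 0.61 | 3 | −3.1 | 4 |
| Runner-Up C4 | 0.58 | 4 | −3.4 | 3 |
| Runner-Up C5 | 0.52 | 5 | −2.7 | 5 |
| WT gp17 (ctrl) | 0.28 | | −0.4 | |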

Graph Description: A scatter plot with AF-Multimer ipTM score on the x-axis (range 0.2–0.8) and Rosetta ΔΔG (REU) on an inverted y-axis (0 at the bottom to −6 REU at the top, so that more negative = better binding plots higher). Each of the five candidates is plotted as a filled blue circle; the WT control is plotted as an open red triangle. A linear regression line is overlaid, annotated with the Spearman ρ ≈ 0.90 computed from the two rank columns of the five candidates. The lead candidate (C1) appears in the top-right quadrant (high ipTM, most negative ΔΔG), clearly separated from the WT control in the bottom-left quadrant, illustrating the separation the redesign pipeline is expected to produce.


Troubleshooting and Potential Challenges

If Rosetta FastRelax produces high-energy structures with persistent clashes, this typically indicates that the AF-Multimer predicted backbone geometry contains steric conflicts that the REF2015 force field cannot resolve without large conformational changes. The solution is to increase the number of FastRelax trajectories from 50 to 200 and retain the lowest 10% of energy structures before running InterfaceAnalyzer.

If the Spearman rank correlation between AF-Multimer ipTM and Rosetta ΔΔG is below 0.5, this suggests that the two scoring functions are capturing fundamentally different aspects of the interface (AF-Multimer may be rewarding structural complementarity while Rosetta penalises electrostatic desolvation costs not captured by ipTM). The resolution is to add a third scoring method such as FoldX ΔΔG and use a consensus rank across all three methods.

If the WT T7-gp17/OprF Rosetta ΔΔG is unexpectedly favourable (more negative than −2.0 REU), this would indicate that the OprF model contains a surface groove that non-specifically accommodates the WT scaffold. In that case the OprF structure should be re-prepared using a higher-confidence AlphaFold2 model or an experimentally solved structure if one becomes available.

If all five candidates show Rosetta ΔΔG values within 0.5 REU of each other, the scoring resolution is insufficient to discriminate between them. The ProteinMPNN temperature should then be lowered to T = 0.05 to generate higher-confidence sequences, from which a lead candidate is chosen in a second design cycle.
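The numeric triggers in this troubleshooting plan can be expressed as small checks. The sketch below is illustrative only: the function names are hypothetical, the energies in the first example are synthetic placeholders, and "within 0.5 REU of each other" is interpreted as a total spread (max − min) of at most 0.5 REU.

```python
def lowest_energy_fraction(energies, fraction=0.10):
    """Keep the lowest-energy fraction of trajectories (at least one)."""
    keep = max(1, round(len(energies) * fraction))
    return sorted(energies)[:keep]


def ddg_spread_too_small(ddg_values, tolerance=0.5):
    """True when all ΔΔG values lie within `tolerance` REU of each other."""
    return max(ddg_values) - min(ddg_values) <= tolerance


def needs_third_scoring_method(rho, minimum_rho=0.5):
    """True when the two rankings agree too weakly to trust either alone."""
    return rho < minimum_rho


# Synthetic placeholder energies standing in for 200 FastRelax trajectories.
energies = [-500.0 + 0.5 * i for i in range(200)]
retained = lowest_energy_fraction(energies)
print(len(retained), retained[0], retained[-1])

# Hypothetical ΔΔG values from the proposal's data table.
candidate_ddg = [-4.8, -3.9, -3.1, -3.4, -2.7]
print(ddg_spread_too_small(candidate_ddg))
print(needs_third_scoring_method(0.90))
```

Encoding the thresholds as named defaults keeps them in one place, so they can be adjusted after the first real scoring run without touching the analysis code.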


Section 6: Additional Information

References


Supply List and Budget

| Item | Purpose | Estimated Cost | Supplier & Link |
|---|---|---|---|
| Twist Bioscience Clonal Gene synthesis ×7 (whole plasmid, pET-21a(+)) | WT gp17 RBD control + 5 redesigned candidates + OprF construct | $1,050 ($150/construct) | Twist Bioscience |
| Google Colab Pro+ subscription (3 months) | GPU compute for ColabFold AF-Multimer batch runs and OpenMM MD simulations | ~$60 | Google Colab |
| PyRosetta academic license | FastRelax + InterfaceAnalyzer for Aim 2 and Validation | Free (academic) | RosettaCommons |
| SecureDNA biosecurity screening | Sequence screening for all 7 constructs | Free (academic) | SecureDNA |
| Benchling academic account | Plasmid design, annotation, GenBank export | Free (academic) | Benchling |
| PyMOL educational license | Structural visualisation and contact map analysis | Free (open-source) or ~$120/yr (educational) | PyMOL |
| VectorBuilder codon optimisation | Codon optimisation for redesigned candidate inserts | Free (online tool) | VectorBuilder |
| [Planned future wet-lab] BL21(DE3) competent cells | Expression host for Twist-delivered constructs | ~$150 | NEB C2527 |
| [Planned future wet-lab] HisPur Ni-NTA resin | His-tag affinity purification of gp17 RBD fragments | ~$200 | Thermo Fisher 88221 |
| [Planned future wet-lab] FluoReporter FITC labelling kit | Fluorescent labelling of purified fiber fragments for binding assay | ~$180 | Thermo Fisher F6434 |
| [Planned future wet-lab] 384-well Greiner black clear-bottom plates ×5 | Fluorescence binding assay plates | ~$125 | Millipore Sigma Greiner 781091 |
| [Planned future wet-lab] LB broth powder 500 g | Bacterial culture media | ~$40 | Millipore Sigma L3022 |
| [Planned future wet-lab] Ampicillin sodium salt 5 g | Selection antibiotic for pET-21a(+) | ~$35 | Millipore Sigma A9518 |
| [Planned future wet-lab] IPTG 1 g | T7 promoter induction | ~$30 | NEB B7008 |
| Total (in-silico phase only) | | ~$1,110 | |
| Total (including all planned future wet-lab items) | | ~$1,870 | |

This proposal is saved to ./projects/TammySisodiya_T7gp17_OprF_redesign.md