Projects

Final projects

  • Individual Final Project

    Fitness-constrained design of a probiotic sentinel: a quantitative framework for circuit governability under evolutionary pressure.

  • Group Final Project

    HTGAA Group Project: Engineering the MS2 Bacteriophage L Protein

Subsections of Projects

Individual Final Project

ÌṢỌYoruba: to be well; to recover. A fitness-aware engineered probiotic designed to sense gut context, respond with targeted antimicrobials, and remain governable by design.


The problem

Childhood diarrhoeal disease kills roughly half a million children under five every year, and the majority of those deaths happen in sub-Saharan Africa. During clinical training in Osogbo, the treatment options were ORS, zinc, and empirical antibiotics. Effective, but blunt. The gap is not a shortage of therapeutics. It is a precision problem.

Core design question

How do we design microbial circuits that remain governable under the evolutionary and ecological pressures of a real gut environment?

The existing engineered probiotic literature optimises for peak performance under ideal conditions. ÌṢỌ maps design regimes: what works, what breaks, and what stays stable as conditions shift.

System architecture

ÌṢỌ is a four-module sense-respond-contain system built on E. coli Nissle 1917 (EcN):

ModuleComponentRole
BiosensorTtrS/TtrR two-component systemDetects tetrathionate, a pathogen-associated signal produced during gut inflammation by Salmonella and E. coli O157:H7
RegulatorThresholded Hill-function promoterGates activation; suppresses leaky expression and reduces fitness cost at homeostatic baseline
EffectorMicrocin H47 (MccH47)Narrow-spectrum antimicrobial; ATP synthase inhibition; active against Salmonella, Shigella, pathogenic E. coli; endogenous immunity protein MchI in EcN chassis
ContainmentdeltaDAPA auxotrophyDAP absent from mammalian gut; deletion is lethal without exogenous supply; escape frequency ~10^-8 per generation

Design stance

Optimise for stability, not just performance. The output is not a single optimal construct. It is a map of design regimes: parameter regions where the circuit functions, where it fails under burden, and where containment holds under selection pressure.

Scope — Spring 2026

Computational modelling only. Wet-lab validation, full microbiome simulation, and clinical deployment are explicitly out of scope for this phase.

Aim 1 (Experimental)

The first aim of my final project is to build and simulate a genome-scale metabolic and circuit-level ODE model of the four-module ÌṢỌ architecture by utilising Tellurium/libroadrunner for time-course simulation, SALib for global sensitivity analysis, and NumPy-based Moran process modelling for evolutionary containment stability, generating a Pareto-resolved fitness-efficacy landscape and a ranked parameter influence analysis as the primary computational output.

Aim 2 (Developmental)

Following a successful Aim 1, the top-ranked parameter regimes from the Pareto landscape will guide assembly and transformation of the sense-respond-contain circuit into EcN. The engineered sentinel will be tested in co-culture assays against Salmonella Typhimurium and E. coli O157:H7, validating both the tetrathionate-sensing threshold and MccH47-mediated kill kinetics experimentally. Discrepancies between model predictions and wet-lab data will feed back into model refinement.

Aim 3 (Visionary)

The long-term goal is a rugged, orally delivered live biotherapeutic that operates autonomously in the gut, activates only in the presence of pathogen-associated tetrathionate, kills narrow-spectrum without collateral microbiome disruption, and cannot persist outside the host. If the fitness-governability framework holds, ÌṢỌ becomes a design methodology applicable beyond this specific pathogen set, with direct relevance for AMR management, inflammatory bowel disease, and cancer immunotherapy in low-resource clinical settings.

Literature context

Palmer et al. (2017, ACS Infectious Diseases) demonstrated that EcN can be engineered to sense gut-luminal tetrathionate via the TtrS/TtrR two-component system and produce Microcin H47 in response, achieving measurable Salmonella inhibition in a mouse colonisation model. Critically for ÌṢỌ, this paper provides experimentally validated, ODE-parameterisable values for sensor activation kinetics, MccH47 production rates, and pathogen kill constants, making it the direct quantitative predecessor to this project rather than simply a conceptual reference.

Stritzker et al. (2007, International Journal of Medical Microbiology) characterised deltaDAPA auxotrophy in EcN in detail, reporting an escape frequency of approximately 10^-8 per generation under DAP-free conditions. That specific number is what makes the containment module computationally tractable: escape probability can be directly parameterised in the Moran process model rather than estimated from first principles.

What is novel

What ÌṢỌ does that neither paper does is treat fitness cost as a first-class design variable rather than a post-hoc observation. Every published EcN engineering study acknowledges metabolic burden; none model it explicitly as a design input alongside efficacy. ÌṢỌ builds a Pareto frontier that makes the tradeoff navigable rather than anecdotal. The containment module also moves from binary characterisation (auxotrophy is present or absent) to a dynamic system property, asking how quickly a loss-of-function mutant fixes in a finite population over evolutionary time.

Why this matters

Diarrhoeal disease causes approximately 1.6 million deaths per year globally, with the under-five burden concentrated in West and East Africa. Nigeria alone accounts for a disproportionate share of this mortality. Existing interventions reduce severity but do not prevent recurrence in high-transmission settings, and empirical antibiotic use is accelerating resistance emergence in the pathogens most responsible for paediatric deaths: Salmonella, enterotoxigenic E. coli, and Shigella. A sentinel probiotic that activates conditionally, kills narrow-spectrum, and cannot persist outside the host addresses this without adding to AMR pressure.

Beyond the immediate clinical problem, the fitness-governability framework ÌṢỌ develops has broader implications. Any engineered living therapeutic faces the same core question: will the circuit hold under the evolutionary pressure of a real biological environment? Current regulatory frameworks for live biotherapeutics have no standardised computational tool for answering this before a clinical trial. ÌṢỌ begins building one. Nigerian and broader West African epidemiological data (Egbewale 2022; Gayawan 2024) are used to parameterise disease burden and clinical context from the start, not as a framing afterthought.

Ethical implications

Two principles are directly engaged here: beneficence and justice. A precision antimicrobial that spares the commensal microbiome and cannot persist outside the host is strictly better than empirical broad-spectrum antibiotics for the patient, for the microbiome, and for the resistance landscape. Research that addresses paediatric mortality in West Africa while remaining computationally grounded in West African epidemiology represents a genuine departure from the default of developing interventions for high-income contexts and adapting them downstream.

The risks require honesty. A single deltaDAPA deletion is probably not sufficient for any real-world deployment. The current model assumes a closed population and does not account for horizontal gene transfer of the dapA gene from environmental bacteria. The Moran process also excludes commensal competition dynamics, so estimates of circuit persistence are optimistic. These are known limitations, explicitly scope-bounded to this computational phase. Non-maleficence requires that these caveats travel with any communication of the results. Open-source model release via GitHub (MIT licensed) is a deliberate act toward equitable access to the methodology.

Modelling pipeline

Circuit topology → ODE construction → parameter sweep → Pareto analysis → sensitivity analysis → evolutionary stability

Biosensor signal

Chosen: tetrathionate via TtrS/TtrR two-component system. Pathogen-specific: Salmonella and E. coli O157:H7 produce tetrathionate during gut inflammation via reactive oxygen species. Experimentally validated in EcN (Palmer et al. 2017). Signal is absent under homeostatic conditions, directly minimising leaky expression burden at baseline.

Microcin effector

Chosen: Microcin H47 (MccH47). Naturally produced by EcN. Narrow-spectrum: E. coli, Salmonella, Shigella. Mechanism is ATP synthase inhibition, a well-characterised mode of action enabling direct ODE kill-kinetics parameterisation. Immunity protein MchI is endogenous to the EcN chassis. Palmer 2017 provides benchmarked production and kill-rate values for exactly this design.

Containment

Chosen: deltaDAPA auxotrophy (diaminopimelic acid / DAP). DapA is essential for lysine and peptidoglycan synthesis. DAP is absent from the mammalian gut: no dietary source, no commensal production. Deletion is lethal without exogenous supply. Validated in EcN (Stritzker et al. 2007). Published escape frequency ~10^-8 per generation is directly parameterisable for the containment escape model.

ODE framework

Chosen: Tellurium + libroadrunner (SBML/Antimony). Purpose-built for systems biology ODE modelling. Antimony syntax maps directly onto circuit topology (promoter to mRNA to protein). libroadrunner’s stiff CVODE solver handles fast mRNA turnover and slow protein accumulation dynamics without manual configuration. SBML export makes every model citable and reproducible. SciPy solve_ivp (LSODA flag) runs in parallel for parameter sweeps and Pareto grid computation.

Sensitivity analysis

Primary: PRCC via SALib (Marino et al. 2008). Designed for nonlinear, monotonic systems, exactly what Hill-function gene circuits produce. 500 to 2000 Latin hypercube samples sufficient for 6 to 8 parameters.

Supplementary: Sobol total-order indices. Captures interaction effects (Hill coefficient n and KD interact in the sensor module). 5000 to 10000 samples, tractable on a laptop in minutes.

Evolutionary stability

Chosen: Moran process with fitness-weighted selection. Two competing types: functional circuit (fitness 1 minus delta) and loss-of-function mutant (fitness 1). Fixation probability computed analytically (Nowak 2006), then 1000 stochastic trajectories via numpy.random.choice() with fitness-weighted birth-death events. Directly answers: how long does the circuit remain functional under selection pressure?

ODE engine

Tellurium + libroadrunner — all four-module ODE construction and time-course simulation written in Antimony syntax. SBML export for reproducibility and citability.

Numerical / sweeps

SciPy solve_ivp (LSODA) — parameter sweeps and Pareto grid computation. LSODA auto-switches between stiff and non-stiff regimes.

Sensitivity analysis

SALib — PRCC for main figures, Sobol as supplementary. Canonical citation: Marino et al. 2008, J. Theor. Biol.

Evolutionary simulation

NumPy random.choice() — Moran process. Fitness-weighted birth-death events across 1000 independent trajectories. No additional dependencies.

Figures

Matplotlib + Seaborn (seaborn-whitegrid). Pareto scatter, PRCC tornado chart, containment escape semi-log, Moran fixation fan. Exported at 300 dpi PNG and SVG.

Reproducibility

GitHub + venv + requirements.txt (pinned exact versions). One command regenerates all figures. MIT licensed. CITATION.cff included.

Environment setup

python3.11 -m venv ~/.envs/iso-ecn && \
source ~/.envs/iso-ecn/bin/activate && \
pip install --upgrade pip && \
pip install tellurium scipy numpy \
    matplotlib seaborn SALib pandas && \
pip freeze > requirements.txt

Task 1: Environment setup and baseline biosensor model (weeks 1)

Install and configure the full modelling stack: Tellurium, libroadrunner, SALib, NumPy, Matplotlib, Seaborn, SciPy, all pinned in a venv with requirements.txt. Build the biosensor module as a two-ODE Hill-function model encoding the TtrS/TtrR tetrathionate-to-promoter activation pathway. Fit activation threshold KD and Hill coefficient n against Palmer 2017 time-course data.

Expected result: Simulated sensor activation curve matches digitised Palmer 2017 experimental data within 20% across the measured tetrathionate concentration range.

Actual findings

Task 2: Full four-module ODE construction (weeks 2)

Extend the biosensor ODE to include the regulator module (thresholded Hill-function promoter gating effector expression), the effector module (MccH47 production and pathogen kill kinetics), and the containment module (deltaDAPA escape probability). Write all models in Antimony syntax within Tellurium. Export validated models to SBML and commit to GitHub.

Expected result: Stable steady-state solutions for all four modules under both homeostatic and pathogen-present conditions. Leaky expression at baseline should approach zero.

Actual findings

Task 3: Pareto landscape and parameter sweep (weeks 3)

Use SciPy solve_ivp with LSODA flag to sweep burden parameter delta and effector output rate k_M across a 50 x 50 parameter grid. Record steady-state growth rate and pathogen suppression ratio for each grid point. Plot Pareto frontier, colour-coded by regulator variant (linear vs. thresholded).

Expected result: A visible Pareto frontier separating viable design space from over-burdened and under-effective regions. The thresholded regulator variant should dominate the frontier.

Actual findings

Task 4: Global sensitivity analysis (weeks 4)

Run PRCC analysis via SALib using Latin hypercube sampling across 6 to 8 parameters: Hill coefficient n, signal threshold KD, burden delta, MccH47 production rate k_M, pathogen kill rate k_kill, mRNA degradation rate, and protein dilution rate. Generate ranked tornado chart. Run supplementary Sobol total-order index analysis.

Expected result: n and KD rank as the top two PRCC drivers of sensor module output. Sobol indices confirm a significant interaction effect between the two.

Actual findings

Task 5: Evolutionary stability via Moran process (weeks 5)

Implement the Moran process in NumPy. Define two competing cell types (functional circuit, fitness 1 minus delta; loss-of-function mutant, fitness 1). Compute analytical fixation probability from Nowak 2006. Run 1000 stochastic trajectories. Vary delta across the Pareto-viable range; plot fixation probability across three population sizes with analytical solution overlaid.

Expected result: Fixation probability of the loss-of-function mutant increases sharply above delta = 0.1. This is the quantitative argument for why the thresholded regulator module is not optional.

Actual findings

Industry Council companies

CompanyRole
Asimov (Kernel)Validate Pareto landscape and containment circuit architecture; independent cross-check against Tellurium ODE results
SecureDNAScreen all DNA sequences (mccH47, deltaDAPA cassette) before synthesis
CultivariumEcN-specific transformation protocols and characterised parts for Aim 2
Twist BiosciencesCodon-optimised construct synthesis for Aim 2
OpentronsCo-culture assay automation for Aim 2 parallel screening
1 — Fitness-efficacy Pareto frontier

Burden parameter delta and effector output k_M swept across a 50 x 50 parameter grid. Each point represents steady-state growth rate and pathogen suppression ratio. Pareto frontier overlaid. Colour-coded by circuit variant (linear vs. thresholded regulator). This figure makes the design regime concept concrete: the viable parameter space, the over-burdened region, and the under-effective region visible in a single plot. No equivalent exists in the published EcN engineering literature.

2 — Sensitivity analysis (PRCC tornado)

PRCC bar chart ranked by absolute influence on steady-state pathogen suppression. Parameters: Hill coefficient n, signal threshold KD, burden delta, microcin production rate k_M, pathogen kill rate k_kill. Sobol indices shown as supplementary to capture n-KD interaction effects. k_M and k_kill dominate pathogen suppression output; n and K_D dominate TtrR* biosensor output. These are distinct design problems: effector kinetics governs therapeutic outcome at physiological signal concentrations, while sensor cooperativity governs specificity at sub-threshold concentrations.

3 — Containment escape probability

Semi-log plot of escape frequency vs. generations. Single deltaDAPA vs. dual deltaDAPA + deltaThyA auxotrophy compared. Analytical curve overlaid on stochastic simulation trajectories. Anchored to published escape frequency ~10^-8 per generation (Stritzker 2007).

4 — Evolutionary stability (Moran fixation)

Fixation probability of loss-of-function mutant as a function of burden delta, across three population sizes. 1000-trajectory stochastic fan with analytical Nowak 2006 solution overlaid. Demonstrates that the thresholded regulator (Module 2) extends functional circuit half-life under selection relative to constitutive expression: the quantitative argument for why the regulator module is not optional.

What was validated

The computational modeling pipeline constitutes the primary validation for this project. Specifically, the four-module ODE system was constructed and simulated in Tellurium/libroadrunner, producing a Pareto-resolved fitness-efficacy landscape and a ranked parameter sensitivity analysis via PRCC — directly fulfilling the rubric requirement of “developing a model or completing a computational analysis relevant to your project.” This approach was chosen because the project is explicitly model-first: the computational output is not a precursor to the real work but is the deliverable itself, generating design guidance that no single wet-lab experiment at this stage could produce.

Validation protocol

  1. Define circuit topology: four-module architecture (TtrS/TtrR biosensor → thresholded Hill-function regulator → MccH47 effector → ΔdapA containment) with biological parameters drawn from Palmer et al. 2017 and Stritzker et al. 2007
  2. Write Antimony-syntax ODE models for each module in Tellurium, export to SBML for reproducibility
  3. Implement growth-burden model: logistic growth base modified by scalar burden parameter δ, parameterised from Scott et al. 2010
  4. Implement biosensor Hill-function cascade: signal S → sensor protein R → promoter output P, two variants (linear and thresholded)
  5. Couple effector (MccH47) production and pathogen kill rate via mass-action kinetics
  6. Sweep burden δ and effector output M across a defined parameter grid using SciPy solve_ivp (LSODA solver)
  7. Compute steady-state growth rate and pathogen kill for each grid point; identify Pareto frontier
  8. Run PRCC sensitivity analysis via SALib: define parameter bounds, generate Latin hypercube sample (~1000 points), run model over sample, compute PRCC indices for Hill coefficient n, threshold K_D, burden δ, production rate k_M, kill rate k_kill
  9. Implement Moran process in NumPy: two competing types (functional circuit fitness 1−δ, loss-of-function mutant fitness 1), run 1000 independent stochastic trajectories, compute fixation probability analytically and compare
  10. Export all figures at 300 dpi PNG and SVG; commit all code and SBML files to GitHub (Jonahnki/iso-sentinel-ecn) with a one-command reproduce script

Synthetic biology techniques utilised

The primary technique is quantitative ODE-based modeling of a synthetic gene circuit, which is a standard systems-level synthetic biology method for predicting circuit behaviour before construction. Global sensitivity analysis via PRCC is used to identify which biological parameters most strongly influence circuit performance — a technique directly analogous to experimental design of variation (DOE) approaches used in wet-lab strain engineering. Evolutionary stability modeling via the Moran process applies population genetics theory to assess whether the engineered circuit remains stable under natural selection pressure, which is a synthetic biology governance question increasingly recognised as essential for live biotherapeutic design. Together these constitute a computational Design-Build-Test-Learn cycle: the model is the design, the simulation is the build, the Pareto and sensitivity outputs are the test, and the parameter refinement loop is the learn step.

Data and analysis

The primary data output is the Pareto frontier plot: a two-dimensional scatter of burden δ (fitness cost axis) vs. pathogen kill rate (antimicrobial function axis) across the swept parameter space, with the Pareto-optimal boundary overlaid. Each point on this plot represents a distinct circuit parameter combination, and the frontier identifies the set of designs where no further improvement in kill rate is achievable without increasing fitness cost. The PRCC tornado chart ranks parameter influence on steady-state pathogen suppression; k_M and k_kill dominate at ranks 1 and 2, while n and K_D rank as the top two drivers of TtrR* biosensor output specifically — a distinction that matters because sensor sensitivity and effector potency are both necessary but operate on different parts of the circuit. The containment escape probability semi-log plot shows that ΔdapA single auxotrophy maintains escape frequency below 10⁻⁸ per generation for at least 200 generations under modeled selection pressure.

Challenges, limitations, and alternative strategies

The primary limitation of a purely computational validation is parameter uncertainty: kinetic constants for MccH47 production and TtrR activation are drawn from Palmer et al. 2017, which used a different construct architecture and growth conditions than those modeled here. Transferring parameters across experimental contexts introduces uncertainty that cannot be resolved without wet-lab measurement, and the model outputs should be interpreted as design guidance rather than quantitative predictions. A second limitation is the absence of spatial heterogeneity — the ODE framework assumes a well-mixed population, whereas the gut is spatially structured, meaning colonisation dynamics and local tetrathionate gradients are not captured. A third challenge specific to the Moran process implementation is the fixed-population-size assumption, which does not account for the population bottlenecks and variable colonisation densities characteristic of EcN in a gut context; a variable-population birth-death process would be more realistic but requires additional parameterisation not available in the current literature. To address these limitations in a future phase, the ODE parameters would be updated with experimentally measured values from the cell-free expression and co-culture validation steps described in Aim 2, and the Moran model would be extended to a variable-population Wright-Fisher simulation with gut-realistic bottleneck sizes drawn from published EcN colonisation data.

Baseline design parameters

ParameterSymbolValueBoundsSource
Max promoter outputα_max12.0 rel. unitsfixedv6 calibration
Leaky expressionα_leak0.24 (2% α_max)1–5% α_maxv6 fix; >10% collapses FI
Tetrathionate EC50EC5020 µM5–50 µMPalmer 2017; right-shifted from 10 µM for gut conservatism
Hill cooperativity (biosensor)n21–3Two-component phosphorelay physiology
TtrR production ratek10.426 h⁻¹fixedJ23115 promoter + medium copy
Basal TtrR leakk1_leak0.002fixedReduced from 0.01 to prevent OFF-state floor inflation
TtrR degradationd_TtrR0.1 h⁻¹fixedStandard bacterial protein turnover
sfGFP degradationd_sfGFP0.05 h⁻¹fixedsfGFP stability in EcN
Hill constant (sensor→reporter)Km0.3≥0.3Lower bound enforced; Km=0.1 shifts sigmoid left of physiological window
Regulator gate thresholdthreshold_TtrR2.0 rel. unitsfixedTtrR_ss at EC50 (20 µM); 50% induction
Gate sharpnessn_reg3fixedNear-digital switching; suppresses sub-threshold leaky effector
MccH47 production ratek_M0.2 µM/h0.01–0.5 µM/h (Pareto sweep)Palmer 2017 mid-range
MccH47 degradationd_M0.05 h⁻¹fixedPeptide stability estimate
Pathogen kill ratek_kill0.3 µM⁻¹ h⁻¹fixedPalmer 2017 gut-corrected (original 0.05 underestimated suppression)
Pathogen growth ratek_growth0.5 h⁻¹fixedSalmonella Typhimurium in gut (~35 min doubling)
Plasmid copy numberCN20 copies/cell5–50pSC101 medium copy range
Burden per copyδ/copy0.0015fixedScott et al. 2010 ribosome allocation model
Circuit fitness costδ0.03 (3%)≤0.10Derived: CN × δ/copy
ΔdapA escape frequencyμ_escape10⁻⁸ per generationfixedStritzker 2007
EcN generation timet_gen0.5 hfixed~30 min doubling in gut

Confirmed simulation outputs

MetricValueTargetStatus
Fold induction (observed)41.5×≥10×
sfGFP steady state (ON, 50 µM)12.16 rel. units>10
sfGFP steady state (OFF, 0 µM)0.293 rel. units<1.0
t50 (sfGFP, ON state)15.3 h8–30 h
Pathogen suppression at 200 h (50 µM S)100%≥90%
MccH47 homeostatic steady state0.0000~0
Pathogen homeostatic steady state1.0000~1
Sub-threshold suppression (2 µM)0.0%<50%
Pareto frontier points (thresholded)42>0
Pareto frontier points (linear)39>0
Moran extinction rate (δ=0.03)95.1%>80%
Analytical vs stochastic RMSE0.0175<0.05
Containment escape t50 (single ΔdapA)3,956 years>1,000 years
Module 2 lifetime advantage2.0×>1×

Multivariable relationships and biological implications

RelationshipFindingBiological implication
α_leak → FIFI = (α_max + α_leak) / α_leak; at 10% leak FI collapsed to 5.5×; at 2% leak FI = 41.5×Leaky expression is the single most destabilising parameter for therapeutic gating. Even small absolute increases in basal expression dramatically erode the ON/OFF discrimination needed to prevent activation in healthy gut
EC50 → activation windowSigmoid sits within 10–100 µM physiological window at EC50 = 20 µM; shifts dangerously left at EC50 = 10 µMEC50 must be calibrated to gut tetrathionate levels during active infection, not laboratory buffer conditions. Under-estimating EC50 risks false activation in sub-pathological inflammatory states
n (Hill) → switch sharpnessn=2 gives cooperative switching; n=1 gives graded non-zero baseline output at low SHill cooperativity is what separates a sensor from a rheostat. n=2 is the minimum for therapeutically meaningful gating in a two-component phosphorelay system
Km → left-shift riskKm=0.1 activates below physiological window; Km ≥ 0.3 keeps sigmoid correctly anchoredKm sets the TtrR concentration at which sfGFP output is half-maximal. Reducing Km below 0.3 is equivalent to making the effector more sensitive than the sensor — a design inversion that breaks the gating logic
k_M + k_kill → Pareto positionk_M is top PRCC driver (0.733); k_kill is second (0.671); both dominate Sobol STEffector module output, not sensor sensitivity, is rate-limiting for therapeutic function. Investment in MccH47 production fidelity and secretion efficiency returns more suppression gain than further sensor optimisation
δ → Moran fixationP_fix rises from 0.03 at δ=0.03 to 0.10 at δ=0.10; circuit half-life halves from 3,956 to 1,978 yearsThere is a 3.3× safety margin between the design operating point (δ=0.03) and the critical threshold (δ=0.10) above which loss-of-function fixation becomes ecologically relevant. The thresholded regulator preserves this margin by suppressing expression at baseline
n_reg → gate fidelityn_reg=3 gives near-digital switching; sub-threshold MccH47 = 0.0001 µM (effectively zero)Regulator sharpness is what enforces the therapeutic safety contract. A graded regulator (n_reg=1) would permit low-level MccH47 production at all tetrathionate concentrations, including homeostatic
μ_escape + dual auxotrophySingle ΔdapA: t50 = 3,956 years; dual ΔdapA+ΔthyA: escape probability near zero across 10¹² generationsContainment robustness scales super-linearly with additional auxotrophies. The second deletion multiplies escape improbability rather than adding to it, because both reversions must occur independently
PRCC vs Sobol rank agreementTop-4 ranks identical across methods: k_M, k_kill, d_M, Km_regConsistency between PRCC and Sobol ST confirms the model is not artefactually sensitive to the choice of global sensitivity method. The ranking is a structural property of the circuit, not an analysis artefact
Signal-dependent sensitivityAt S=2 µM top driver is n_reg; at S≥50 µM top driver is k_MBelow threshold, gate architecture controls leakage; above threshold, effector kinetics controls outcome. These are two distinct design problems, not one

How the variables relate to each other

Biosensor layer

  • Tetrathionate (S) ↑ → TtrR* ↑ (direct, Hill-function, cooperative at n=2)
  • EC50 ↑ → activation threshold shifts right → less sensitive sensor (inverse: higher EC50 = harder to activate)
  • n ↑ → switch becomes sharper; more digital ON/OFF behaviour (direct: higher n = steeper sigmoid)
  • α_leak ↑ → OFF-state floor ↑ → fold induction ↓ (inverse: more leak = worse discrimination)
  • k1_leak ↑ → TtrR* at S=0 ↑ → sfGFP baseline ↑ → fold induction ↓ (inverse)

Regulator layer

  • TtrR* ↑ past threshold → regulator gate opens → MccH47 production enabled (threshold-gated: near-binary)
  • n_reg ↑ → gate switch sharper → sub-threshold leaky effector production ↓ (direct: higher n_reg = cleaner gate)
  • Km_reg ↑ → gate opens at higher TtrR* → effector activation delayed (inverse: higher Km = harder to open gate)

Effector layer

  • k_M ↑ → MccH47 steady state ↑ → pathogen suppression ↑ (direct: strongest PRCC driver, rank 1)
  • d_M ↑ → MccH47 steady state ↓ → pathogen suppression ↓ (inverse: rank 3 PRCC driver)
  • k_kill ↑ → pathogen kill rate ↑ → suppression ↑ (direct: rank 2 PRCC driver)
  • k_growth ↑ → pathogen recovers faster → suppression ↓ (inverse: rank 5 PRCC driver)
  • MccH47 and k_kill interact multiplicatively in the kill term (k_kill × MccH47 × Pathogen): neither alone is sufficient

Burden and copy number

  • Copy number ↑ → δ ↑ (direct, linear: δ = CN × 0.0015)
  • δ ↑ → circuit fitness cost ↑ → loss-of-function mutant selective advantage ↑ → Moran fixation probability ↑ (direct)
  • δ ↑ → circuit functional half-life ↓ (inverse: higher burden = shorter evolutionary lifetime)
  • Pareto frontier position: k_M ↑ and δ ↑ move in opposite directions on the frontier — you cannot increase effector output without also increasing burden unless you reduce copy number or tighten the regulator gate

Containment

  • μ_escape fixed at 10⁻⁸ per generation (Stritzker 2007): not a design variable, a biological floor
  • Adding a second auxotrophy (ΔdapA + ΔthyA): escape probability scales as μ² not 2μ — super-linear safety gain
  • Population size N ↑ → fixation probability per mutant ↓, but rate of mutant arrival ↑ (N × μ_escape): net escape rate is approximately constant across N for fixed δ

Cross-layer interactions confirmed by Sobol S2

  • n and EC50 interact (S2 = 0.023 for TtrR* output): cooperativity and threshold are not independent — a sharp sigmoid at the wrong EC50 is no better than a shallow one at the right EC50
  • k_M and k_kill interact (S2 > 0): production rate and kill rate are coupled through MccH47 concentration — saturating one without the other gives diminishing returns
  • Signal level S changes which layer dominates: at S < EC50 the gate architecture (n_reg, Km_reg) controls leakage; at S > EC50 the effector kinetics (k_M, k_kill) controls outcome

A fully interactive browser-based ODE simulation is deployed and accessible via the Cloudfare interface and linked below:

The simulator runs entirely client-side and implements a four-module Runge–Kutta (RK4) integration framework for solving a coupled dynamical system governing gut–pathogen interactions. No server infrastructure or external compute backend is required.

The interface exposes nine mechanistic parameters through real-time sliders, enabling continuous modulation of system state. Each user interaction triggers full re-evaluation of the ODE system and updates five derived biological observables:

  • Fold induction response amplitude
  • Half-response time (t₅₀)
  • Pathogen suppression efficiency
  • System burden parameter (δ)
  • Moran fixation probability

Together, these outputs provide a live mapping between parameter space and emergent ecological–immunological dynamics, allowing direct exploration of the nonlinear nature, stability transitions, and intervention sensitivity.

Unified governing relationships — Tasks 1 to 5

Task 1: Biosensor steady state

Task 2: Regulator gate and effector steady state

Task 3: Pareto position

Task 4: Sensitivity — PRCC and Sobol

Task 5: Moran process — log-space corrected

Unified cross-task relationship

The clinically relevant output — therapeutic circuit functional lifetime weighted by suppression efficacy:

What worked

The model-first architecture proved its value immediately. By fixing circuit topology and parameter ranges computationally before any construct design, every subsequent modelling decision had a clear biological anchor. The pre-Aim-2 checklist with nine pass/fail criteria was particularly useful: it made version transitions (v1 through v6) traceable and prevented premature progression on a flawed parameterisation.

The decision to ground leaky expression as an explicit parameter rather than a derived residual was the single most consequential fix across the entire project. Fold induction swung from 2×10⁸ (numerically meaningless) to 5.5× (biologically insufficient) to 41.5× (therapeutically viable) across six model iterations, solely because of how α_leak was handled. This taught a general lesson: in Hill-function gene circuit models, the denominator of the fold induction ratio is always the most sensitive and least constrained quantity, and it should be the first thing fixed, not the last.

PRCC and Sobol agreement on the top-four parameter ranking (k_M, k_kill, d_M, Km_reg) was a genuinely satisfying result. Convergence across two independent global sensitivity methods on 36,864 model evaluations gives confidence that the ranking reflects the biology rather than sampling noise.

What failed or required significant revision

The alpha_leak inflation problem (v1–v5). Setting α_leak as a percentage of α_max sounded correct but produced an OFF-state floor of 2.4 relative units when α_leak = 1.2 — because the term adds directly to basal sfGFP production regardless of TtrR. The FI calculation was not wrong; the model was not wrong; the parameterisation assumption was wrong. Six versions were required to isolate this. The fix (reduce to 2%, reduce k1_leak simultaneously) resolved it in one step once the root cause was identified.

k_kill underestimation (v1 Task 2). The initial k_kill = 0.05 µM⁻¹ h⁻¹ was taken from Palmer 2017 without accounting for the dilution and diffusion losses expected in a gut lumen environment. This produced suppression results that passed the checklist (≥90%) but for the wrong reason — the model was not reflecting realistic gut pharmacodynamics. Correcting to k_kill = 0.3 µM⁻¹ h⁻¹ produced 100% suppression at 200 h, which while still not validated, is more defensible as a design-space exploration.

The Moran overflow warning. The RuntimeWarning: overflow encountered in scalar power during fixation probability computation for large N values (N=10,000) reflects a numerical limitation of the analytical Nowak 2006 formula at high population sizes and low δ: (1/r)^N underflows to zero for large N, producing a 0/0 division. The stochastic trajectories are unaffected, but the analytical curve is unreliable for N>10,000 at δ<0.05. This was resolved in the final pipeline via a log-space rewrite of the analytical formula: the numerator simplifies exactly to δ, and the denominator is evaluated as 1 − exp(N × log(1−δ)) using np.log1p for precision near zero. No RuntimeWarning appears in the corrected implementation, and the eighth checklist item confirms numerical validity at N=10,000 explicitly.

SVG rendering on the HTGAA page. Multiple attempts to embed SVG figures failed across all approaches (bare filename, static path, Gitea raw URL, inline SVG blocks). The root cause is a combination of Hugo Relearn’s branch bundle scoping, the shared Hugo instance’s Goldmark unsafe rendering restriction, and CORS policy on cross-origin SVG in <img> tags. PNG via Gitea raw URL resolved the rendering problem entirely. The lesson is that SVG is the right format for archival and journal submission but not for Hugo-served portfolio pages on shared infrastructure.

The n–EC50 interaction expectation mismatch. The project page stated that n and EC50 would rank as the top two PRCC drivers of biosensor output — which is true when the output metric is TtrR* steady state. When the output metric is pathogen suppression (the more clinically relevant quantity), n and EC50 fall to ranks 7 and 6 respectively, because the effector module (k_M, k_kill) dominates end-to-end. This is not an error but a framing imprecision in the original expected results statement, and it reflects an important biological truth: sensor sensitivity and effector potency are both necessary but effector kinetics is the rate-limiting step for therapeutic outcome at physiological signal concentrations.

What this project does not yet answer

The model assumes a well-mixed, spatially homogeneous gut compartment. Real gut colonisation involves spatial gradients of tetrathionate, mucus diffusion barriers, and microbiome neighbourhood effects none of which are captured here. The Moran process also assumes a fixed population size and ignores the population bottlenecks that occur during gut transit and colonisation. Both simplifications make the containment and stability estimates optimistic. These are not failures of the current work — they are the next layer of modelling complexity that a preprint would need to address honestly in its limitations section.

References

  1. Palmer, J. D., Piattelli, E., McCormick, B. A., Silby, M. W., Brigham, C. J., & Bucci, V. (2017). Engineered probiotic for the inhibition of Salmonella via tetrathionate-induced production of microcin H47. ACS Infectious Diseases, 4(1), 39–45. https://doi.org/10.1021/acsinfecdis.7b00114
  2. Weibel, N., Curcio, M., Schreiber, A., et al. (2024). Engineering a novel probiotic toolkit in Escherichia coli Nissle 1917. ACS Synthetic Biology, 13(8), 2376–2390. https://doi.org/10.1021/acssynbio.4c00036
  3. Lynch, J. P., Goers, L., & Lesser, C. F. (2022). Emerging strategies for engineering E. coli Nissle 1917-based therapeutics. Trends in Pharmacological Sciences, 43(9). https://doi.org/10.1016/j.tips.2022.02.002
  4. Ba, F., Zhang, Y., Ji, X., Liu, W.-Q., Ling, S., & Li, J. (2023). Expanding the toolbox of probiotic E. coli Nissle 1917 for synthetic biology. bioRxiv. https://doi.org/10.1101/2023.06.05.543671
  5. Stritzker, J., Weibel, S., Hill, P. J., Oelschlaeger, T. A., Goebel, W., & Szalay, A. A. (2007). Tumor-specific colonization, tissue distribution, and gene induction by probiotic E. coli Nissle 1917 in live mice. International Journal of Medical Microbiology, 297(3), 151–162.
  6. Scott, M., Gunderson, C. W., Mateescu, E. M., Zhang, Z., & Hwa, T. (2010). Interdependence of cell growth and gene expression: origins and consequences. Science, 330(6007), 1099–1102.
  7. Marino, S., Hogue, I. B., Ray, C. J., & Kirschner, D. E. (2008). A methodology for performing global uncertainty and sensitivity analysis in systems biology. Journal of Theoretical Biology, 254(1), 178–196.
  8. Nowak, M. A. (2006). Evolutionary Dynamics: Exploring the Equations of Life. Harvard University Press.
  9. Moran, P. A. P. (1958). Random processes in genetics. Mathematical Proceedings of the Cambridge Philosophical Society, 54(1), 60–71.
  10. Egbewale, B. E., Karlsson, O., & Sudfeld, C. R. (2022). Childhood diarrhea prevalence and uptake of oral rehydration solution and zinc treatment in Nigeria. Children, 9(11), 1722.
  11. Gayawan, E., Cameron, E., Okitika, T., Egbon, O. A., & Gething, P. (2024). A situational assessment of treatments received for childhood diarrhea in Nigeria. PLOS ONE, 19(5), e0303963.
  12. World Health Organization. (2024). Diarrhoeal disease. https://www.who.int/news-room/fact-sheets/detail/diarrhoeal-disease

Supply list and budget

Computational (current phase — no cost)

  • Python 3.11 environment (free, open source)
  • Tellurium + libroadrunner (free, pip-installable)
  • SALib (free, pip-installable)
  • NumPy, SciPy, Matplotlib, Seaborn (free, pip-installable)
  • GitHub repository, MIT license (free)
  • Personal Linux desktop (existing hardware)

Wet-lab validation phase (Aim 2 — estimated)

  • Twist Bioscience gene synthesis, four-module construct ~3.5 kb: ~$350
  • EcN ΔdapA chassis strain (laboratory stock or DSMZ acquisition): ~$150
  • DAP supplemented LB media, selective plates: ~$80
  • 96-well plate reader access (institutional): $0–$200 per run
  • Tetrathionate sodium salt (Sigma-Aldrich, 5g): ~$60
  • Salmonella Typhimurium SL1344 (laboratory stock): $0
  • Co-culture assay consumables (plates, tips, tubes): ~$120
  • OD600/fluorescence plate reader cartridges: ~$50
  • LC-MS peptide mapping access for MccH47 confirmation (Waters BioAccord or institutional): ~$400–$800 per sample

Estimated total for Aim 2 wet-lab phase: ~$1,200–$1,800

Distinction from existing work

The engineered probiotic field asks: can we build a circuit that works?

ÌṢỌ asks: across what design regimes does a circuit remain both functional and governable under the pressures that will actually be present?

  • Fitness cost as a design variable — no published EcN paper produces a fitness-efficacy Pareto frontier. All existing work acknowledges burden; none model it as a first-class input.
  • Containment as a dynamic system property — the field treats auxotrophy as binary. ÌṢỌ models escape probability over evolutionary time via the Moran process.
  • Disease context — the dominant literature targets IBD and colorectal cancer. ÌṢỌ is framed around acute paediatric diarrhoeal disease in West African clinical settings; this changes which signals, effectors, and ecological assumptions are relevant.
  • Model-first methodology — existing EcN engineering papers build constructs first and measure them. ÌṢỌ maps the computational design landscape before any construct is built.
  • Geographic grounding — clinical inspiration and epidemiological parameters drawn from Nigerian data (Egbewale 2022; Gayawan 2024). African-origin disease burden as a scientific foundation, not a framing afterthought.

What ÌṢỌ builds on

  • Palmer et al. 2017 — direct experimental predecessor; tetrathionate/MccH47/EcN parameter source
  • Weibel et al. 2024 — modular architecture precedent; ÌṢỌ extends with containment module and ODE-level analysis
  • Ba et al. 2023 — EcN toolbox that any future wet-lab build would draw on
  • Stritzker et al. 2007 — deltaDAPA escape frequency; containment parameterisation source
  • Marino et al. 2008 — PRCC methodology; sensitivity analysis canonical citation
Preprint — target June 2026

The models, figures, and write-up constitute the core of a bioRxiv preprint. Abstract, introduction, and discussion sections bring it to a citable first-author computational biology paper.

Microbiome competition layer

Add a simplified Lotka-Volterra competition term for commensal species. Explores how microbiome density affects EcN colonisation stability and circuit persistence under realistic ecological conditions.

Week 5 synthesis — microcin analog design

Apply the PepMLM/moPPIt peptide generation pipeline (HTGAA Week 5) to propose microcin-analog sequences with improved target specificity. AlphaFold3 structural prediction of microcin-pathogen outer membrane protein complexes bridges the computational peptide design and engineered probiotic work.

West Africa AMR data integration

Parameterise the pathogen kill model with AMR prevalence data from Nigerian clinical isolates (WHONET/GLASS). Grounds the model in Sub-Saharan African epidemiology and connects to the planned AMR West Africa genomic data paper.

Repository

Jonahnki/iso-sentinel-ecn

All model code, SBML files, and figures. MIT licensed. CITATION.cff included.

Validated construct — pTtr-TtrSR-sfGFP_EcN_v1

The construct below reflects the finalised Benchling assembly: 8,323 bp, pSC101_KanR backbone, verified in both linear and circular representations. All part identifiers are confirmed iGEM Registry parts as assembled.

Construct status

Assembled and visualised in Benchling (pTtr-sfGFP_TtrSR_EcN_v1, 8,323 bp). Linear and circular maps shown below. Computational parameter regime from Tasks 1–5 defines the performance targets this construct must hit experimentally in Aim 2.

Construct maps

Linear construct map — 8,323 bp Linear construct map — 8,323 bpCircular construct map — 8,323 bp Circular construct map — 8,323 bp

Part-by-part annotation

Position (bp)PartIdentityRole
1–4,200pSC101_KanR_MCSpSC101 origin + KanR resistanceLow-copy backbone (5–20 copies/cell); kanamycin selection; MCS for cloning
~4,200BBa_J23115Constitutive promoter (medium-strong)Drives TtrR expression; matched to k1=0.426 in Task 1 ODE
~4,450BBa_B0031RBS (medium strength)Ribosome binding site for TtrR translation
~4,500–5,200ttrR_proteinTtrR response regulator (S. Typhimurium LT2)Receives phosphoryl group from TtrS; activates pTtr_LT2 promoter
~5,200BBa_B0015Double terminatorTerminates TtrR transcription; prevents read-through
~5,250BBa_J23106Constitutive promoter (medium)Drives TtrS expression
~5,300BBa_B0031RBS (medium strength)Ribosome binding site for TtrS translation
~5,350–6,700ttrS_proteinTtrS sensor histidine kinase (S. Typhimurium LT2)Senses tetrathionate; autophosphorylates; transfers phosphoryl to TtrR
~6,700BBa_B0015Double terminatorTerminates TtrS transcription
~6,750pTtr_Salmonella_LT2TtrR-activated inducible promoterOutput promoter; activated only when TtrR* exceeds threshold; EC50=20 µM tetrathionate
~6,950BBa_B0031RBS (medium strength)Ribosome binding site for sfGFP translation
~7,000–7,700BBa_I746916sfGFP (superfolder GFP)Fluorescence reporter; validates sensor-to-output pathway; proxy for MccH47 expression in Aim 2
~7,700BBa_B0015Double terminatorTerminates sfGFP transcription; end of expression cassette

Circuit logic

Tetrathionate (S) present in gut lumen → TtrS autophosphorylation (J23106 → TtrS constitutive) → TtrR phosphorylation (J23115 → TtrR constitutive) → TtrR* accumulates above Km_reg threshold (2.0 rel. units) → pTtr_Salmonella_LT2 activated → sfGFP expressed (reporter) → [Aim 2: MccH47 replaces/co-expresses with sfGFP under same promoter] At homeostasis (S = 0): → TtrR* below threshold → pTtr_LT2 silent → sfGFP OFF (leaky floor = 0.293 rel. units at α_leak = 2% α_max)

Aim 2 extension — MccH47 insertion point

The current construct expresses sfGFP as the reporter payload under pTtr_LT2. For Aim 2 wet-lab validation, MccH47 + MchI immunity cassette replaces or is inserted downstream of sfGFP using the existing MCS in the pSC101 backbone:

pTtr_LT2 → [BBa_B0031 → sfGFP → BBa_B0015 ↓ Aim 2 swap pTtr_LT2 → BBa_B0031 → MccH47 + MchI → BBa_B0034 → BBa_B0015

BBa_B0034 (strong RBS) replaces B0031 for the effector cassette to maximise k_M — confirmed as top Pareto driver (PRCC rank 1, Sobol ST rank 1) from Tasks 3 and 4.

Containment — chromosomal ΔdapA

ΔdapA auxotrophy is implemented as a chromosomal deletion in the EcN host, separate from the plasmid construct. DAP (diaminopimelic acid) is absent from mammalian gut, making survival contingent on exogenous DAP supply. Escape frequency: ~10⁻⁸ per generation (Stritzker 2007). This layer is not encoded on the plasmid and does not contribute to the 8,323 bp construct size.


Aim 2 experimental framework

Inputs from Aim 1

The following outputs from the computational pipeline feed directly into Aim 2 experimental design, defining what to measure, in what order, and with what precision:

Aim 1 outputHow it constrains Aim 2
Pareto-resolved parameter regimes (top-ranked δ, K_D, n, k_M combinations)Defines the experimental parameter space to sample first; avoids blind screening
Ranked PRCC sensitivity indicesk_M and k_kill are ranks 1 and 2 for suppression; these are measured before n or EC50 — constraining the highest-leverage parameters first
Predicted tetrathionate activation threshold (EC50=20 µM, n=2)Sets the concentration range for dose-response assay: 0, 2, 10, 20, 50, 100 µM
Predicted MccH47 kill kinetics curve (k_kill vs. effector concentration)Defines expected CFU reduction trajectory for co-culture kill assay; any deviation triggers ODE re-parameterisation
ΔdapA-confirmed EcN chassisHost strain confirmed auxotrophic before construct transformation; fluctuation assay baseline established
Synthesised four-module plasmid constructTwist-synthesised, SecureDNA-screened construct delivered to Ginkgo for CFPS expression and co-culture screening

Expected outputs

  • Transformed EcN colonies carrying the integrated sense-respond-contain circuit
  • Dose-response curve: fluorescence reporter activation (sfGFP) vs. tetrathionate concentration across 0–100 µM physiological range
  • Kill curve: CFU reduction of Salmonella Typhimurium SL1344 and E. coli O157:H7 over time in co-culture at defined tetrathionate concentrations
  • Growth curves comparing wild-type EcN, uninduced circuit EcN, and induced circuit EcN — quantifying burden δ experimentally
  • Refined ODE parameter set back-fitted from all experimental observations; committed to iso-sentinel-ecn/params/aim2_wetlab.yaml

Metrics produced

MetricMethodLinks back to
Hill coefficient n and activation threshold K_D for TtrS/TtrRFluorescence dose-response curve fittingTask 1 biosensor ODE — direct parameter validation
MccH47 production rate k_M and kill rate k_kill under defined inductionLC-MS quantification (Waters BioAccord) + CFU kill curveTask 2 effector ODE; Task 3 Pareto position
Burden δ: growth rate differential between circuit-on and wild-type EcNOD600 time-course comparisonTask 3 Pareto x-axis; Task 5 Moran input
ΔdapA escape frequencyFluctuation assay (Luria-Delbrück)Task 5 containment escape model — validates μ_escape=10⁻⁸ assumption
Model-to-experiment deviationppm or % error per parameter: (measured − predicted) / predicted × 10⁶All tasks; feeds ODE refinement loop
Circuit specificity: kill ratio against target pathogens vs. commensal E. coliCFU ratio on selective vs. non-selective mediaTask 2 effector module; confirms narrow-spectrum claim

Performance targets — Tasks 1–5 validated predictions

MeasurementODE predictionTarget rangeMethod
sfGFP at 50 µM tetrathionate12.16 rel. units>10Plate reader fluorescence
sfGFP at 0 µM (homeostatic)0.293 rel. units<1.0Plate reader fluorescence
Fold induction (ON/OFF)41.5×≥10×Fluorescence ratio
t50 (sfGFP ON state)15.3 h8–30 hTime-course plate reader
MccH47 secretion (Aim 2)3.45 µM/h at full inductionDetectable above baselineLC-MS (Waters BioAccord)
Pathogen suppression (Aim 2)100% at 200h≥90% at 48hCFU count, selective media
ΔdapA confirmationLethal without DAPNo growth in DAP-free LBOD600 growth curve

SecureDNA screening

All sequences — ttrS, ttrR, pTtr_LT2, sfGFP, and the Aim 2 MccH47+MchI cassette — require SecureDNA screening before Twist synthesis submission. This is a mandatory HTGAA Industry Council step.

Industry Council alignment

CompanySpecific role
Asimov KernelIndependent circuit architecture validation; cross-check Pareto landscape against Kernel simulation output
SecureDNAPre-synthesis screen of ttrS, ttrR, mccH47, mchI, ΔdapA cassette sequences
Twist BiosciencesCodon-optimised synthesis of full construct (~3.5 kb); delivery to Ginkgo
Ginkgo BioworksCFPS expression, plate reader fluorescence, potential his-tag purification
Waters CorporationLC-MS confirmation of MccH47 secretion; intact mass of sfGFP reporter
OpentronsCo-culture assay automation for parallel tetrathionate concentration screening

Group Final Project

Authored and reviewed by:

  • 2026a-john-adeyemo-adedeji
  • 2026a-eric-schneider
  • 2026a-albert-manrique
  • 2026a-tehseen-rubbab
  • 2026a-brie-taylor

Introduction

This document captures the full scope of our group work within the Genspace node focused on engineering the MS2 bacteriophage L protein. Group 2 formed around a shared interest in improving the toxicity, stability, and tunability of the L protein through computational design.

Our early brainstorming sessions centered on three broad goals:

  • Increased stability
  • Higher titers
  • Higher toxicity of the lysis protein

After several meetings and independent exploration, the group converged on two main computational directions. The first centered on systematic truncation and mutagenesis of the N-terminal regulatory domain. The second focused on point mutations within conserved regions that could alter electrostatic interactions while preserving structure.

Two major pipelines emerged from that work. John’s pipeline explored N-terminal truncations, DnaJ disruption, sequence redesign, codon optimization, and sequencing validation. Eric’s pipeline focused on charge-based mutations, conservation mapping, structural modeling, ORF overlap analysis, and cross-referencing with experimental lysis data.

Both approaches identified strong but distinct candidates for improving L protein function.


John’s Analysis and Pipeline

Summary

The MS2 lysis protein L is a 75 amino acid single-pass transmembrane protein whose N-terminal region acts as a regulatory brake on lysis. Rather than directly participating in membrane disruption, this region delays insertion and oligomerization of the transmembrane domain.

My pipeline focused on systematically removing portions of that inhibitory region while preserving the membrane-spanning lytic core. The central hypothesis was simple: if the N-terminal domain slows lysis, then partial removal should release that inhibition and produce earlier, stronger lytic activity.

The strongest candidate to emerge from the analysis was L_trunc30, which removes the first 30 amino acids while preserving the entire transmembrane domain.

Working Sequence

Confirmed L protein sequence:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Confirmed DNA sequence:

atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgag
gattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaa
tttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

Core Hypothesis

Three ideas guided the pipeline:

  1. Partial truncations of the N-terminal region should reduce inhibition and increase lysis efficiency.
  2. The regulatory function is probably localized to a smaller sub-region rather than spread evenly across the entire N-terminus.
  3. There is likely an optimal truncation point where toxicity increases without destabilizing the membrane-spanning domain.

Pipeline Overview

StageToolPurpose
1ESM2Mutational scanning across all 75 residues
2ESMFoldStructural prediction of truncation variants
3AlphaFold-MultimerModeling interaction with DnaJ
4GROMACSMolecular dynamics and RMSF analysis
5ProteinMPNNJunction redesign and charge reduction
6Codon optimizationPrepare E. coli expression constructs
7Synthetic construct designAssemble expression cassette
8Bowtie2 + BCFtoolsVariant calling and sequencing validation
9IGVManual inspection of called variants

Major Findings

ESM2 Mutagenesis Scan

The ESM2 scan identified position C29 as the dominant mutational hotspot in the N-terminal domain.

MutationLLRNotes
C29R3.64Top-ranked substitution
C29P3.17Strong helix-disrupting mutation
C29Q3.06Conservative but highly favored
F22R1.86Introduces basic charge
S9Q1.69Recovered independently in prior work

C29 accounted for 12 of the top 20 substitutions. That concentration strongly suggested that the wild-type residue at this site is not ideal for maximizing toxicity outside the native viral context.

Structural Findings

ESMFold predictions for all truncation variants suggested that the N-terminal domain is highly disordered in solution. Interdomain contact analysis returned essentially zero contacts across all variants, which fits with the known biology of the L protein.

The more useful signal came from molecular dynamics.

For L_trunc30:

  • Remaining N-terminal stub RMSF: ~1.87 nm
  • Transmembrane domain RMSF: ~0.27 nm

That sharp drop in flexibility confirmed that the transmembrane region remains stable even after removing 30 amino acids from the N-terminus.

Charge Analysis

The wild-type N-terminal region is strongly basic due to motifs like RRRPFK and RRQQR.

L_trunc30 reverses the overall charge profile:

VariantNet chargeInterpretation
Wild-type LApproximately +8Strong DnaJ interaction expected
L_trunc30-2Reduced DnaJ binding and earlier lysis expected

This was important mechanistically because DnaJ binding depends heavily on electrostatic interactions with the positively charged N-terminal region.

Codon Optimization and Construct Design

All major truncation variants were codon-optimized for E. coli K-12.

The lead construct, L_trunc30, preserved the essential LS motif and was assembled into a complete 230 bp expression cassette with:

  • Ptrc promoter
  • Optimized RBS
  • Lambda t0 terminator
  • rrnB T1 terminator
  • Gibson overhangs compatible with the mUAV backbone

Lead Candidate

CandidateKey FeatureReason
L_trunc30Removes aa 1-30Strongest balance of toxicity, structural stability, and DnaJ disruption

Secondary Candidates

CandidateReason for Inclusion
C29RHighest ESM2 score overall
F22RAdds positive charge in N-terminal region
S9QRecovered independently in previous scans
L_trunc40Most aggressive truncation, likely strongest toxicity

GDrive Folder Depo: https://drive.google.com/drive/folders/17TE8ES8jUfnYL5irekBBFF2hsXrgr9lT?usp=sharing

Eric’s Analysis and Pipeline

Eric approached the same problem from a different angle. Instead of removing large sections of the N-terminus, he focused on identifying individual amino acid substitutions that could improve toxicity while preserving the overall structure of the protein.

His strongest candidate was P13L, a single amino acid change in the N-terminal region.

Pipeline Overview

StageToolPurpose
1UniProt + BLASTSequence retrieval and homolog identification
2Clustal OmegaConservation mapping
3AlphaFold-MultimerOligomer modeling
4ESM2Mutation scoring
5ESMFoldStructural confidence and pTM analysis
6ChimeraXElectrostatic visualization
7BenchlingORF overlap analysis

Major Findings

Conservation Analysis

Eric identified a relatively unconstrained region between amino acids 16 and 28 that could tolerate mutation without damaging essential structure.

PositionWild-type residueInterpretation
18RFully conserved, avoid
21PFully conserved, avoid
23KFully conserved, avoid
26DVariable, strong candidate
13PWeakly conserved, potentially safe

Structural Modeling

P13L produced the strongest ESMFold result among all variants tested.

VariantpTMChange vs WT
Wild-type0.273Reference
D26R0.267Slight decrease
P13L0.420Strong increase

The jump from 0.273 to 0.420 made P13L the most structurally favorable point mutation in Eric’s pipeline.

Experimental Cross-Reference

Unlike my pipeline, Eric cross-referenced computational candidates with available lysis data.

MutationReplicate AReplicate BResult
P13L11Confirmed lytic
D26G10Mixed
K23E10Mixed
E25G10Mixed

P13L was the only candidate to remain consistently positive across both replicates.

ORF Overlap Analysis

One of the more interesting parts of Eric’s work was the DNA-level overlap analysis.

P13L falls within the overlap region between the coat protein and the L protein, which initially made it look risky. After codon-level analysis, though, the mutation turned out to be safe.

GeneWT codonMutant codonResult
L proteinCCGCTGPro → Leu
Coat proteinTCCTCTSer → Ser

That synonymous change in the coat protein meant the mutation could proceed without disrupting the overlapping reading frame.

Lead Candidate

CandidateKey FeatureReason
P13LSingle amino acid substitutionBest structural score and strongest experimental support

Secondary Candidates

CandidateStatus
D26RUntested but promising
D26GMixed experimental results
N17ROpen candidate
H24ROpen candidate

Albert’s Notes

Albert focused primarily on structural stability.

His workflow emphasized:

  1. Sequence retrieval from UniProt
  2. BLAST and Clustal Omega for conservation mapping
  3. ESM2 mutational scanning
  4. ESMFold structure prediction
  5. AlphaFold-Multimer confirmation of DnaJ interactions
  6. Wet lab validation of top-ranked variants

His key concern was preserving structure while introducing beneficial mutations.

He also pointed out an important limitation that kept showing up across the project: membrane proteins are underrepresented in both structural databases and protein language model training sets. That means even high-scoring mutations should still be interpreted cautiously.


Tehseen’s Notes

Tehseen’s approach aligned closely with my truncation-based strategy but focused more on identifying the smallest regulatory segment required for precise control over lysis timing.

The central idea was not simply to remove the N-terminal region, but to identify exactly which residues are responsible for slowing lysis.

That led to three closely related hypotheses:

  1. Partial truncations can increase lysis gradually rather than all at once.
  2. Regulatory effects are probably localized to a smaller sub-region.
  3. There is likely an optimal balance point between stronger toxicity and preserved protein stability.

Comparative Summary

AspectJohn’s PipelineEric’s Pipeline
Main strategyProgressive N-terminal truncationPoint mutation design
Lead candidateL_trunc30P13L
Core hypothesisRemove inhibitory domainIncrease local electrostatic effects
ESM2 scopeFull 1,425-substitution scanSingle-site targeted analysis
Structural analysisESMFold + GROMACS RMSFESMFold + ChimeraX
DnaJ interactionCentral to modelConsidered indirectly
Experimental validationNot yet completedP13L confirmed experimentally
Construct designFully assembledStill planned
Sequencing workflowFully designed with Bowtie2, BCFtools, IGVListed as future step

Final Interpretation

The project ended up producing two very different but complementary engineering directions.

L_trunc30 represents the stronger systems-level redesign. It removes the inhibitory N-terminal region, reduces DnaJ engagement, preserves the transmembrane core, and provides a fully buildable expression construct ready for synthesis and sequencing validation.

P13L represents the cleaner minimal-change strategy. It preserves the full-length protein, improves structural confidence, survives ORF overlap analysis, and already has positive experimental support.

If the goal is maximum disruption of the native regulatory system, L_trunc30 is the stronger candidate.

If the goal is a simpler mutation with lower engineering risk and existing wet lab support, P13L is the better starting point.

The most practical next step would be to synthesize and compare both side by side.