Individual Final Project

This is a draft.

Papeco: An In Silico Design Project for a Second-Generation Genetically Encoded Carbon Monoxide Biosensor

SECTION 1: ABSTRACT

Papeco is an in silico synthetic biology project focused on designing a second-generation genetically encoded biosensor for carbon monoxide (CO). The project addresses an important problem in chemical biology and biosensing: CO has real biological relevance in heme metabolism, stress signaling, and disease, but the available fluorescent tools are dominated by small-molecule probes rather than programmable protein-based systems. This matters because genetically encoded sensors can, in principle, be targeted to defined cells or compartments, redesigned by DNA sequence alone, and improved iteratively through structure-guided engineering. Natural heme-based CO sensor proteins such as CooA and RcoM provide a strong biological starting point because they already recognize CO through ligand-dependent conformational changes. The broad objective of Papeco is therefore to computationally design and prioritize new fluorescent CO biosensor architectures built from these natural sensing domains.

The central hypothesis is that the conformational response of a natural CO-sensing heme protein can be computationally coupled to a fluorescent output, and that a structure-guided design workflow can identify a small set of candidate constructs with a realistic chance of working before any wet-lab work begins. The project will test this hypothesis by comparing CooA and RcoM as scaffolds, identifying candidate insertion or fusion sites for fluorescent readouts, building DNA-level construct designs, modeling the resulting fusion proteins, and ranking designs with a reproducible scoring pipeline. Specific aims include selecting the best starting scaffold, generating a rational library of fluorescent biosensor constructs, and producing a prioritized shortlist of candidate DNA designs ready for future synthesis. Methods will include literature-guided design, structural analysis of experimental and predicted protein models, sequence alignment, insertion-site scoring, AlphaFold- or ESMFold-style structural prediction, Rosetta-style remodeling, and short molecular dynamics comparisons of top candidates. The expected outcome is not a validated biosensor yet, but a complete computational design package: a justified scaffold choice, a ranked construct library, finalized DNA sequences, and a clear roadmap for experimental follow-up.

SECTION 2: PROJECT AIMS

Aim 1: Experimental Aim (this project)

The first aim of my final project is to computationally design and rank a small panel of CooA- and RcoM-based fluorescent carbon monoxide biosensor constructs using literature-guided structural analysis, DNA sequence design, insertion-site prioritization, protein structure prediction, and in silico stability and conformational scoring.

Relevant methods/resources for Aim 1:

Published literature on CooA, RcoM, and the prior CooA-based CO biosensor COSer
Experimental CooA structures from the PDB and predicted models for less-characterized scaffolds
DNA and protein sequence design tools
AlphaFold DB / ESMFold / Boltz-style structure prediction where appropriate
Rosetta-style remodeling for linker and insertion-site modeling
Basic MD or conformational comparison workflows for top candidates
Final DNA construct maps and synthesis-ready sequences
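
As a minimal sketch of how one insertion-style construct could be assembled at the protein level, the snippet below splices a reporter sequence, flanked by linkers, into a scaffold at a chosen position. Everything named here is a hypothetical placeholder introduced for illustration: the `InsertionDesign` and `build_fusion` names, the toy sequences, the linkers, and the site index. None of it is a real CooA, RcoM, or fluorescent-protein sequence, and the site is not a recommended insertion point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsertionDesign:
    """One candidate way of inserting a reporter into a scaffold (illustrative only)."""
    name: str
    site: int        # number of scaffold residues kept before the insert
    n_linker: str    # linker between scaffold N-terminal half and reporter
    c_linker: str    # linker between reporter and scaffold C-terminal half


def build_fusion(scaffold: str, reporter: str, design: InsertionDesign) -> str:
    """Return scaffold[:site] + n_linker + reporter + c_linker + scaffold[site:]."""
    if not 0 < design.site < len(scaffold):
        raise ValueError("insertion site must fall strictly inside the scaffold")
    return (
        scaffold[:design.site]
        + design.n_linker
        + reporter
        + design.c_linker
        + scaffold[design.site:]
    )


# Toy placeholder sequences (assumptions, NOT real protein sequences).
toy_scaffold = "MAAAAKKKK"
toy_reporter = "VSKGE"
toy_design = InsertionDesign(name="toy_site5_GS", site=5, n_linker="GS", c_linker="SG")

fusion = build_fusion(toy_scaffold, toy_reporter, toy_design)
print(toy_design.name, fusion, len(fusion))
```

Keeping each design as a small immutable record makes it straightforward to enumerate a library (several sites times several linker pairs) and to carry the same identifiers through modeling and ranking.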

Aim 2: Development Aim

Top designs from Aim 1 will be prototyped as fusion proteins or simple genetic constructs and tested in vitro for CO responsiveness through fluorescence or visible color change. The goal is to validate signal specificity, sensitivity, and response time under controlled exposure conditions.

Aim 3: Visionary Aim

The long-term vision is to translate this biosensor into an accessible paper-based or wearable detector for homes, laboratories, and industrial settings, enabling early warning in low-resource environments and reducing preventable CO poisoning. A second long-term goal is to establish a general design framework for genetically encoded gas biosensors that could ultimately enable programmable imaging of carbon monoxide dynamics in living systems and serve as a model for building future protein-based sensors for other gasotransmitters.

SECTION 3: BACKGROUND

Background and Literature Context

Carbon monoxide is widely known as a toxic gas, but it is also a biologically meaningful signaling molecule produced during heme degradation. That makes it scientifically important but technically difficult to study, especially in a way that captures where and when CO appears inside living systems. Most current fluorescent CO detection strategies rely on small-molecule chemistry rather than genetically encoded proteins. Those tools can be useful, but they are not as easy to retarget, evolve, or encode directly through DNA. Papeco addresses this gap by starting from natural bacterial CO-sensing proteins and redesigning them computationally into fluorescent biosensors.

One important peer-reviewed paper is Inouye et al. (1997), which established CooA from Rhodospirillum rubrum as a bona fide CO-sensing heme protein. The study showed that CooA binds CO at its heme and switches to an active regulatory state capable of sequence-specific DNA binding. This is important because it shows that CooA already contains the core sensing mechanism Papeco needs: selective ligand recognition linked to a biologically meaningful conformational response. For an in silico design project, that makes CooA especially attractive because the project can focus on coupling an existing switch to fluorescence rather than inventing a new switch from nothing.

A second important peer-reviewed paper is Salman et al. (2019), which characterized the heme-based CO sensor RcoM-2 and showed that it binds CO with extremely high affinity and a very low dissociation rate. This paper is relevant because it identifies a second natural scaffold with sensing chemistry that may be superior to CooA in some respects, especially selectivity and oxygen tolerance. At the same time, RcoM is less structurally tractable for a short computational project because it lacks the depth of experimentally accessible structural precedent that is routinely used for CooA-centered design. Together, these two papers define the key design tension in Papeco: CooA is the better starting scaffold for a first computational campaign, while RcoM is the more ambitious scaffold for future improvement.

Novelty and Innovation

Papeco is innovative because it is not simply a literature review of CO sensing and not yet a conventional wet-lab biosensor project. Instead, it is a complete in silico design campaign that treats DNA design, structure prediction, and candidate ranking as the main deliverables. The project is also novel because the field of engineered protein-based fluorescent CO biosensors is still sparse, meaning there is real room for a rational second-generation design effort. By combining a natural heme-based gas sensor with computational fluorescent-protein fusion design, Papeco explores a synthetic biology design space that is still underdeveloped.

Why This Project Matters and What Impact It Could Have

This project matters because a major barrier in CO biology is not just detection, but the lack of programmable biological sensors that can be systematically improved. Carbon monoxide is relevant to oxidative stress, inflammation, heme oxygenase biology, and cellular signaling, yet many experiments still rely on indirect measurements or chemistry-based proxies. A successful computational design framework would make the next stage of experimental work much more efficient by reducing the number of biosensor constructs that need to be built blindly. That matters both scientifically and practically: it lowers cost, focuses experimental effort on the most plausible candidates, and creates a reusable workflow for future gas sensor engineering. Beyond Papeco itself, the project could help advance the broader synthetic biology goal of converting natural ligand-responsive proteins into modular genetically encoded reporters. If successful, the concepts developed here could eventually change how researchers approach the design of sensors for CO and other gasotransmitters by making structure-guided design a standard first step rather than an afterthought.

Ethical Implications

This project raises ethical questions mainly around research responsibility, scientific honesty, and downstream use rather than immediate physical risk, because it is fully computational. The principle of non-maleficence still applies because a poorly characterized biosensor design could mislead later experiments if it is presented as more specific or more mature than it really is. Beneficence is also relevant because the intended purpose of the project is to create a tool that could improve biological measurement and reduce uncertainty in CO research. Responsibility is especially important here because the literature already contains examples of apparent CO probes that respond to CO-releasing molecules or related chemistry rather than to CO itself. A computational project can still do harm if it produces overconfident designs, unrealistic claims, or weak validation logic that others adopt uncritically.

SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

Broad Workflow Figure

flowchart TD
    A[Literature review and benchmark survey] --> B[Choose scaffold: CooA-first, RcoM comparator]
    B --> C[Collect structures and sequences]
    C --> D[Map conformationally active and insertion-tolerant regions]
    D --> E[Design DNA-level biosensor constructs]
    E --> F[Model structures of fusion proteins]
    F --> G[Score stability, heme-pocket preservation, and motion coupling]
    G --> H[Rank candidates and eliminate weak designs]
    H --> I[Run short MD or state-comparison analyses on top designs]
    I --> J[Produce synthesis-ready DNA shortlist and final report]
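
One possible starting point for step D of the workflow above (mapping insertion-tolerant regions) is a per-column Shannon entropy profile over a multiple sequence alignment of scaffold homologs. The sketch below is only an illustration under stated assumptions: the function names and the toy alignment are hypothetical, and treating high-entropy (variable) columns as candidate insertion regions is a heuristic proxy, not evidence of tolerance. Any such site would still need to be checked against structure, especially proximity to the heme pocket.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # p * log2(1/p) summed over observed residues; 0.0 for a fully conserved column.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_profile(aligned: List[str]) -> List[float]:
    """Entropy for every column of an alignment given as equal-length strings."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*aligned)]


# Toy alignment (assumption, NOT real CooA/RcoM homologs).
toy_alignment = ["ACD", "ACE", "AGF", "A-G"]
profile = entropy_profile(toy_alignment)
for position, bits in enumerate(profile, start=1):
    print(position, round(bits, 3))
```

Low-entropy columns are the conserved positions that a design would normally avoid disturbing, while higher-entropy stretches in loops or domain boundaries are where insertion sites would be looked for first.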

Methods, Tools, and Technologies Included in the Plan

Peer-reviewed literature on CooA, RcoM, and fluorescent CO-sensing precedent
Protein Data Bank structures for CooA and sequence databases for scaffold retrieval
Sequence alignment tools for conservation analysis
DNA design software for synthesis-ready construct generation
AlphaFold DB, ESMFold, or similar structure-prediction tools for fusion-protein modeling
Rosetta-style local remodeling or relaxation tools for insertion-site refinement
Molecular dynamics or short structural relaxation workflows for comparative stability assessment
Scoring spreadsheets or scripting workflows for transparent candidate ranking
Visualization tools for domain maps, construct schematics, and structural overlays
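
To make the ranking step transparent, the scoring workflow could be as simple as a weighted sum over the three criteria named in the workflow figure: stability, heme-pocket preservation, and motion coupling. The sketch below assumes each metric has already been normalized to a 0 to 1 scale where higher is better. The weights, the `Candidate` and `composite_score` names, and the example values are illustrative assumptions, not results or settled design choices.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights (assumption); they sum to 1 so the composite stays in 0..1.
WEIGHTS: Dict[str, float] = {"stability": 0.4, "heme_pocket": 0.4, "coupling": 0.2}


@dataclass
class Candidate:
    name: str
    scores: Dict[str, float]  # each metric pre-normalized to 0..1, higher is better


def composite_score(candidate: Candidate, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the candidate's metric scores."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} is missing metrics: {sorted(missing)}")
    return sum(w * candidate.scores[metric] for metric, w in weights.items())


def rank(candidates: List[Candidate], weights: Dict[str, float] = WEIGHTS) -> List[Candidate]:
    """Return candidates sorted from highest to lowest composite score."""
    return sorted(candidates, key=lambda c: composite_score(c, weights), reverse=True)


# Made-up example values for illustration only.
panel = [
    Candidate("design_A", {"stability": 1.0, "heme_pocket": 0.5, "coupling": 0.0}),
    Candidate("design_B", {"stability": 0.5, "heme_pocket": 1.0, "coupling": 1.0}),
]
for c in rank(panel):
    print(c.name, round(composite_score(c), 3))
```

Keeping the weights in one named table means the ranking can be re-run under different weightings, which is a cheap way to see whether the shortlist is sensitive to those choices.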

Expected Results for the Main Analyses

Scaffold comparison: CooA will likely emerge as the best first-round design scaffold because it combines stronger structural precedent with an existing fluorescent engineering example.
Insertion-site analysis: only a small subset of loops or domain boundaries will likely be suitable for fluorescent coupling without disrupting heme sensing.
Fusion-modeling analysis: several candidate designs will probably fail due to steric or packing problems, which is a useful result because it narrows the experimental search space.
Ranking analysis: the top 2-3 constructs should be defensible as the most synthesis-worthy candidates even if none can be claimed to function yet.
DNA design output: the final product will include complete construct sequences and maps that can be ordered directly in a future experimental phase.
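
Before a construct sequence is called synthesis-ready, a few basic sanity checks could be automated. The sketch below is one possible version, assuming a single open reading frame given as a DNA string. The GC window of 0.35 to 0.65, the choice of EcoRI and BamHI as sites to avoid, and the `check_orf` name are illustrative assumptions; actual synthesis vendors and cloning strategies impose their own constraints.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Recognition sites to avoid (hypothetical choice). Both are palindromic,
# so scanning one strand is sufficient for these two enzymes.
FORBIDDEN_SITES: Dict[str, str] = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def gc_fraction(dna: str) -> float:
    """Fraction of G and C bases in a non-empty DNA string."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def check_orf(dna: str, gc_range: Tuple[float, float] = (0.35, 0.65)) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    dna = dna.upper()
    if not dna:
        return ["empty sequence"]
    problems: List[str] = []
    if set(dna) - set("ACGT"):
        problems.append("non-ACGT characters")
    if len(dna) % 3:
        problems.append("length not a multiple of 3")
    if not dna.startswith("ATG"):
        problems.append("missing ATG start codon")
    # Only complete codons are considered.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    gc = gc_fraction(dna)
    if not gc_range[0] <= gc <= gc_range[1]:
        problems.append(f"GC fraction {gc:.2f} outside {gc_range}")
    for enzyme, site in FORBIDDEN_SITES.items():
        if site in dna:
            problems.append(f"contains {enzyme} site")
    return problems


# Toy sequence for illustration (assumption, not a real construct).
print(check_orf("ATGGCCAAGTAA"))
```

Returning a list of problems rather than a single pass/fail flag keeps the check auditable, since every rejected construct carries the reasons it was flagged.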

Expected Overall Outcome

The most realistic outcome of this class project is a complete computational biosensor design package rather than a biologically validated final sensor. Success means identifying the most plausible scaffold, designing a small set of specific fluorescent biosensor constructs, modeling and ranking them transparently, and ending with synthesis-ready DNA sequences for future testing. Even if none of the models look ideal, the project will still be valuable because it will define which design routes are least promising and which features should guide the next engineering round. In that sense, Papeco works as a true in silico synthetic biology project: it converts a broad idea into a concrete, auditable, and experimentally actionable design plan.

Sources

Inouye S, et al. “CooA, a CO-sensing transcription factor from Rhodospirillum rubrum, is a CO-binding heme protein” (1997). https://pmc.ncbi.nlm.nih.gov/articles/PMC23420/
Salman M, et al. “Interaction of the Full-Length Heme-Based CO Sensor Protein RcoM-2 with Ligands” (2019). https://pubmed.ncbi.nlm.nih.gov/31502443/
Komori H, et al. “Crystal structure of CO-sensing transcription activator CooA bound to exogenous ligand imidazole” (2007). https://pubmed.ncbi.nlm.nih.gov/17292914/
Vos MH, et al. “Early processes in heme-based CO-sensing proteins” (2022). https://www.frontiersin.org/articles/10.3389/fmolb.2022.1046412/full
Wang X, et al. “A selective fluorescent probe for carbon monoxide imaging in living cells” (2012). https://pubmed.ncbi.nlm.nih.gov/22930547/
COSer biosensor entry. https://biosensordb.ucsd.edu/biosensorDB/bsView.php?IDin=863
Xie Y, et al. “A Review for In Vitro and In Vivo Detection and Imaging of Gaseous Signal Molecule Carbon Monoxide by Fluorescent Probes” (2022). https://www.mdpi.com/1420-3049/27/24/8842
Liu L, et al. “An Fe(III)-Based Fluorescent Probe for Carbon Monoxide only Senses the ‘CO Donor’ Used, CORM-3, but Not CO” (2025). https://pmc.ncbi.nlm.nih.gov/articles/PMC12547852/
Liu L, et al. “Sensing a CO-Releasing Molecule (CORM) Does Not Equate to Sensing CO: The Case of DPHP and CORM-3” (2023). https://pmc.ncbi.nlm.nih.gov/articles/PMC10267888/

This could be turned into…

a wearable device
a wall sticker