Individual Final Project

This is a draft.


Papeco: An In Silico Design Project for a Second-Generation Genetically Encoded Carbon Monoxide Biosensor

SECTION 1: ABSTRACT

Papeco is an in silico synthetic biology project focused on designing a second-generation genetically encoded biosensor for carbon monoxide (CO). The project addresses an important problem in chemical biology and biosensing: CO has real biological relevance in heme metabolism, stress signaling, and disease, but the available fluorescent tools are dominated by small-molecule probes rather than programmable protein-based systems. This matters because genetically encoded sensors can, in principle, be targeted to defined cells or compartments, redesigned by DNA sequence alone, and improved iteratively through structure-guided engineering. Natural heme-based CO sensor proteins such as CooA and RcoM provide a strong biological starting point because they already recognize CO through ligand-dependent conformational changes. The broad objective of Papeco is therefore to computationally design and prioritize new fluorescent CO biosensor architectures built from these natural sensing domains.

The central hypothesis is that the conformational response of a natural CO-sensing heme protein can be computationally coupled to a fluorescent output, and that a structure-guided design workflow can identify a small set of candidate constructs with a realistic chance of working before any wet-lab work begins. The project will test this hypothesis by comparing CooA and RcoM as scaffolds, identifying candidate insertion or fusion sites for fluorescent readouts, building DNA-level construct designs, modeling the resulting fusion proteins, and ranking designs with a reproducible scoring pipeline. Specific aims include selecting the best starting scaffold, generating a rational library of fluorescent biosensor constructs, and producing a prioritized shortlist of candidate DNA designs ready for future synthesis. Methods will include literature-guided design, structural analysis of experimental and predicted protein models, sequence alignment, insertion-site scoring, AlphaFold- or ESMFold-style structural prediction, Rosetta-style remodeling, and short molecular dynamics comparisons of top candidates. The expected outcome is not a validated biosensor yet, but a complete computational design package: a justified scaffold choice, a ranked construct library, finalized DNA sequences, and a clear roadmap for experimental follow-up.

SECTION 2: PROJECT AIMS

Aim 1: Experimental Aim (this project)

The first aim of my final project is to computationally design and rank a small panel of CooA- and RcoM-based fluorescent carbon monoxide biosensor constructs using literature-guided structural analysis, DNA sequence design, insertion-site prioritization, protein structure prediction, and in silico stability and conformational scoring.

Relevant methods/resources for Aim 1:

  • Published literature on CooA, RcoM, and the prior CooA-based CO biosensor COSer
  • Experimental CooA structures from the PDB and predicted models for less-characterized scaffolds
  • DNA and protein sequence design tools
  • AlphaFold DB / ESMFold / Boltz-style structure prediction where appropriate
  • Rosetta-style remodeling for linker and insertion-site modeling
  • Basic MD or conformational comparison workflows for top candidates
  • Final DNA construct maps and synthesis-ready sequences

Aim 2: Development Aim

Top designs from Aim 1 will be prototyped as fusion proteins or simple genetic constructs and tested in vitro for CO responsiveness through fluorescence or visible color change. The goal is to validate signal specificity, sensitivity, and response time under controlled exposure conditions.

Aim 3: Visionary Aim

The long-term vision is to translate this biosensor into an accessible paper-based or wearable detector for homes, laboratories, and industrial settings, enabling early warning in low-resource environments and reducing preventable CO poisoning. A second long-term goal is to establish a general design framework for genetically encoded gas biosensors that could ultimately enable programmable imaging of carbon monoxide dynamics in living systems and serve as a model for building future protein-based sensors for other gasotransmitters.

SECTION 3: BACKGROUND

Background and Literature Context

Carbon monoxide is widely known as a toxic gas, but it is also a biologically meaningful signaling molecule produced during heme degradation. That makes it scientifically important but technically difficult to study, especially in a way that captures where and when CO appears inside living systems. Most current fluorescent CO detection strategies rely on small-molecule chemistry rather than genetically encoded proteins. Those tools can be useful, but they are not as easy to retarget, evolve, or encode directly through DNA. Papeco addresses this gap by starting from natural bacterial CO-sensing proteins and redesigning them computationally into fluorescent biosensors.

One important peer-reviewed paper is Inouye et al. (1997), which established CooA from Rhodospirillum rubrum as a bona fide CO-sensing heme protein. The study showed that CooA binds CO at its heme and changes into an active regulatory state capable of sequence-specific DNA binding. This is important because it shows that CooA already contains the core sensing mechanism Papeco needs: selective ligand recognition linked to a biologically meaningful conformational response. For an in silico design project, that makes CooA especially attractive because the project can focus on coupling an existing switch to fluorescence rather than inventing a new switch from nothing.

A second important peer-reviewed paper is Salman et al. (2019), which characterized the heme-based CO sensor RcoM-2 and showed that it binds CO with extremely high affinity and a very slow dissociation rate. This paper is relevant because it identifies a second natural scaffold whose sensing chemistry may be superior to CooA's in some respects, especially selectivity and oxygen tolerance. At the same time, RcoM is less structurally tractable for a short computational project because it lacks the depth of experimental structural data that CooA-centered design routinely draws on. Together, these two papers define the key design tension in Papeco: CooA is the better starting scaffold for a first computational campaign, while RcoM is the more ambitious scaffold for future improvement.

Novelty and Innovation

Papeco is innovative because it is not simply a literature review of CO sensing and not yet a conventional wet-lab biosensor project. Instead, it is a complete in silico design campaign that treats DNA design, structure prediction, and candidate ranking as the main deliverables. The project is also novel because the field of engineered protein-based fluorescent CO biosensors is still sparse, meaning there is real room for a rational second-generation design effort. By combining a natural heme-based gas sensor with computational fluorescent-protein fusion design, Papeco explores a synthetic biology design space that is still underdeveloped.

Why This Project Matters and What Impact It Could Have

This project matters because a major barrier in CO biology is not just detection, but the lack of programmable biological sensors that can be systematically improved. Carbon monoxide is relevant to oxidative stress, inflammation, heme oxygenase biology, and cellular signaling, yet many experiments still rely on indirect measurements or chemistry-based proxies. A successful computational design framework would make the next stage of experimental work much more efficient by reducing the number of biosensor constructs that need to be built blindly. That matters both scientifically and practically: it lowers cost, focuses experimental effort on the most plausible candidates, and creates a reusable workflow for future gas sensor engineering. Beyond Papeco itself, the project could help advance the broader synthetic biology goal of converting natural ligand-responsive proteins into modular genetically encoded reporters. If successful, the concepts developed here could eventually change how researchers approach the design of sensors for CO and other gasotransmitters by making structure-guided design a standard first step rather than an afterthought.

Ethical Implications

This project raises ethical questions mainly around research responsibility, scientific honesty, and downstream use rather than immediate physical risk, because it is fully computational. The principle of non-maleficence still applies because a poorly characterized biosensor design could mislead later experiments if it is presented as more specific or more mature than it really is. Beneficence is also relevant because the intended purpose of the project is to create a tool that could improve biological measurement and reduce uncertainty in CO research. Responsibility is especially important here because the literature already contains examples of apparent CO probes that respond to CO-releasing molecules or related chemistry rather than to CO itself. A computational project can still do harm if it produces overconfident designs, unrealistic claims, or weak validation logic that others adopt uncritically.

To keep the project ethical, the design workflow should explicitly record its assumptions, uncertainties, and failure points. Candidate constructs should be ranked conservatively, and claims about specificity should be framed as design hypotheses rather than established facts until they are tested experimentally. The project should also include planned controls for future validation, such as distinguishing true CO sensing from oxygen sensitivity, redox artifacts, or donor-specific chemical effects. One potential unintended consequence is that computational scoring could be mistaken for proof of biosensor function, when in reality it is only a prioritization strategy. An important alternative is to treat the output not as “the best sensor” but as a shortlist for experimental falsification. Ethically, the most responsible approach is to use computation to reduce uncertainty while being transparent that real biological performance remains to be demonstrated.

SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

Broad Workflow Figure

flowchart TD
    A[Literature review and benchmark survey] --> B[Choose scaffold: CooA-first, RcoM comparator]
    B --> C[Collect structures and sequences]
    C --> D[Map conformationally active and insertion-tolerant regions]
    D --> E[Design DNA-level biosensor constructs]
    E --> F[Model structures of fusion proteins]
    F --> G[Score stability, heme-pocket preservation, and motion coupling]
    G --> H[Rank candidates and eliminate weak designs]
    H --> I[Run short MD or state-comparison analyses on top designs]
    I --> J[Produce synthesis-ready DNA shortlist and final report]

Detailed Experimental Plan

  1. Week 1, Day 1: Define the computational objective and scoring criteria. I will frame Papeco as a fully in silico design project and define what counts as success: a justified scaffold choice, a ranked list of candidate biosensors, and at least 3 synthesis-ready DNA constructs. Expected result: a realistic project scope with clear deliverables.

  2. Week 1, Day 1-2: Review the key literature on natural CO-sensing proteins and prior engineered CO biosensors. I will focus on CooA, RcoM, and the prior CooA-based fluorescent sensor COSer to identify known mechanisms, limitations, and useful design precedents. Expected result: a design memo that explains why CooA is the primary scaffold and RcoM is the secondary comparator.

  3. Week 1, Day 2: Retrieve experimental CooA structures and gather RcoM sequence information. I will collect relevant CooA PDB entries, including structures that represent off-state or activation-related geometries, and gather sequence records for RcoM and close homologs for modeling. Expected result: a curated structure and sequence dataset for both scaffold families.

  4. Week 1, Day 2-3: Build a sequence alignment and annotate conserved versus flexible regions. I will align CooA homologs and, separately, RcoM-related sequences to identify residues likely to be essential for heme coordination, allostery, dimerization, and structural integrity. Expected result: a residue-annotation map showing where mutation or insertion is likely to be risky versus plausible.

  5. Week 1, Day 3: Compare available structural states of CooA to identify moving regions. I will use structural overlays to locate loops, hinges, and domain boundaries that change position between off-like and activation-related states. Expected result: a shortlist of candidate windows where a fluorescent readout might best couple to CO-induced conformational change.

  6. Week 1, Day 3-4: Define the design architectures to test. I will compare at least three architectures: internal circularly permuted fluorescent protein insertion, terminal FRET fusion, and a hybrid design that combines minimal insertion with linker tuning. Expected result: a structured design matrix rather than a single ad hoc construct idea.

  7. Week 1, Day 4: Choose the primary architecture for the class project. Based on literature precedent and tractability, I will prioritize a CooA-based cpFP insertion strategy for the main design branch and keep one RcoM or FRET-based architecture as a lower-priority comparison. Expected result: one main design branch plus one stretch comparison branch.

  8. Week 1, Day 4-5: Generate candidate insertion or fusion sites. I will score solvent exposure, state-dependent displacement, conservation, heme-pocket proximity, and dimer-interface risk to rank possible insertion positions. Expected result: 5-10 candidate sites narrowed to the top 3-4 for actual construct design.

  9. Week 1, Day 5: Design linker variants and fluorescent fusion geometry. For each top site, I will generate short linker sets that vary flexibility and spacing while preserving reading frame and expected folding. Expected result: a small rational library of candidate proteins rather than a random combinatorial design.

  10. Week 2, Day 1: Produce full DNA sequences for each candidate construct. I will reverse-translate the protein designs into synthesis-ready DNA, optimize codons for the intended expression host, remove problematic restriction sites if needed, and confirm reading-frame continuity. Expected result: finalized DNA designs that satisfy the course requirement that the project include DNA design.

  11. Week 2, Day 1-2: Predict structures for each designed fusion protein. I will use precomputed AlphaFold DB models for the native scaffolds where they exist, and ESMFold or comparable prediction tools for the novel fusion constructs, which have no database entry, to evaluate whether the sensor domain, fluorescent domain, and linker arrangement remain physically plausible. Expected result: a first-pass model set for all candidate constructs.

  12. Week 2, Day 2-3: Remodel uncertain junctions using Rosetta-style local refinement. For designs in which linker geometry or domain packing looks ambiguous, I will use local remodeling or relaxation to test whether the fusion can accommodate the insertion without obviously breaking the scaffold. Expected result: refined structural hypotheses for the top candidates.

  13. Week 2, Day 3: Eliminate constructs that fail obvious structural filters. I will reject candidates that bury the fluorescent chromophore region improperly, disrupt known heme-binding geometry, create severe steric clashes, or collapse key dimer interfaces. Expected result: a smaller set of designs worth deeper analysis.

  14. Week 2, Day 3-4: Score each surviving candidate with a transparent ranking rubric. I will combine features such as heme-pocket preservation, predicted domain stability, state-coupling plausibility, insertion-site tolerance, and linker reasonableness into a documented heuristic score. Expected result: a ranked candidate table with explicit criteria rather than intuition alone.

  15. Week 2, Day 4-5: Run short molecular dynamics or comparative relaxation analyses on the top 2-3 constructs. The goal will not be to measure exact CO affinity, which generic MD cannot do well here, but to compare relative stability, flexibility, and preservation of the designed geometry. Expected result: evidence that the top-ranked constructs are at least structurally more plausible than the rejected ones.

  16. Week 3, Day 1: Cross-check the leading constructs against the natural-sensor literature. I will verify that no favored design disrupts residues strongly implicated in heme coordination, ligand exchange, or known allosteric transmission. Expected result: a literature-checked shortlist rather than a purely model-driven shortlist.

  17. Week 3, Day 1-2: Produce final construct maps and DNA documentation. For each top construct, I will create a plain-language design summary, protein-domain map, DNA sequence record, and rationale for why it ranked highly. Expected result: a synthesis-ready package for future lab implementation.

  18. Week 3, Day 2-3: Define future wet-lab validation controls, even though they are not part of this project. I will specify how the designs should later be tested against true CO, oxygen, nitric oxide (NO) where relevant, and controls for CO-donor artifacts. Expected result: a stronger and more responsible handoff from in silico design to future experimentation.

  19. Week 3, Day 3-4: Write the final computational report. I will compile the scaffold comparison, design logic, construct rankings, DNA designs, structural images, and uncertainty statements into a final project document. Expected result: a complete and coherent in silico final project.

  20. Week 3, Day 4-5: Select the best next-step construct set. I will choose 2-3 lead Papeco designs for future synthesis based on the combined structural and design evidence. Expected result: a clearly prioritized endpoint instead of an open-ended list of possibilities.
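Step 4's conservation annotation can be sketched with per-column Shannon entropy, one common conservation measure. This is a minimal illustration, not the project's chosen tool: it assumes an alignment already produced by a standard aligner and supplied as equal-length strings with "-" for gaps, and the entropy and gap thresholds are hypothetical values that would need tuning.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return float("nan")
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def annotate_columns(alignment, conserved_max=0.5, gap_max=0.5):
    """Label each column 'conserved', 'variable', or 'gappy' (thresholds are hypothetical)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    labels = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        if column.count("-") / len(column) > gap_max:
            labels.append("gappy")  # too many gaps to judge conservation
        elif column_entropy(column) <= conserved_max:
            labels.append("conserved")  # likely risky to mutate or insert here
        else:
            labels.append("variable")  # more plausible insertion territory
    return labels


# Toy alignment of hypothetical sequences, not real CooA homologs.
print(annotate_columns(["MKHC-", "MRHC-", "MEHCA"]))
# -> ['conserved', 'variable', 'conserved', 'conserved', 'gappy']
```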
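For Step 5, one superposition-free way to find moving regions is to compare the internal C-alpha distance matrices of two states: a residue whose distances to the rest of the protein change a lot is a candidate hinge or mobile loop. The sketch assumes C-alpha coordinates have already been extracted for the same residues, in the same order, from an off-like and an activation-related structure; the window size and toy coordinates are illustrative only, and returned window starts are 0-based list positions rather than residue numbers.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def per_residue_shift(state_a, state_b):
    """Mean absolute change in each residue's distances to all other residues.

    Internal distances do not depend on how the two structures are oriented,
    so no superposition step is needed.
    """
    if len(state_a) != len(state_b) or len(state_a) < 2:
        raise ValueError("need the same two or more residues in both states")
    da, db = distance_matrix(state_a), distance_matrix(state_b)
    n = len(state_a)
    return [
        sum(abs(da[i][j] - db[i][j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def top_windows(shifts, window=3, top=2):
    """0-based start indices of the contiguous windows with the largest mean shift."""
    means = [
        (sum(shifts[s:s + window]) / window, s)
        for s in range(len(shifts) - window + 1)
    ]
    return [s for _, s in sorted(means, reverse=True)[:top]]


# Toy four-residue example: the last residue moves 5 angstroms between states.
off_like = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
active_like = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 5.0, 0.0)]
shifts = per_residue_shift(off_like, active_like)
print(top_windows(shifts, window=2, top=1))  # -> [2]
```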
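Step 8's site ranking could be expressed as hard filters followed by a weighted score. Everything numeric here is a hypothetical placeholder: the weights, the 8 Å heme-distance cutoff, and the example sites are not derived from CooA, and the feature values (relative solvent accessibility, state-dependent displacement, column entropy) are assumed to come from upstream analyses such as Steps 4 and 5.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    position: int             # hypothetical residue number after which the readout is inserted
    rel_sasa: float           # relative solvent accessibility, 0 to 1
    displacement: float       # state-dependent shift in angstroms
    entropy: float            # alignment-column entropy in bits (higher = less conserved)
    heme_distance: float      # closest approach to the heme, in angstroms
    at_dimer_interface: bool


# Illustrative weights and cutoff; none of these values are derived from CooA data.
WEIGHTS = {"rel_sasa": 0.3, "displacement": 0.4, "entropy": 0.3}
MIN_HEME_DISTANCE = 8.0
MAX_ENTROPY = math.log2(20)  # upper bound for 20 amino acids


def passes_filters(site: Site) -> bool:
    """Hard filters: stay away from the heme pocket and the dimer interface."""
    return not site.at_dimer_interface and site.heme_distance >= MIN_HEME_DISTANCE


def score_site(site: Site, max_displacement: float) -> float:
    """Weighted 0-1 score; higher means more exposed, more mobile, and less conserved."""
    features = {
        "rel_sasa": min(site.rel_sasa, 1.0),
        "displacement": site.displacement / max_displacement if max_displacement > 0 else 0.0,
        "entropy": min(site.entropy / MAX_ENTROPY, 1.0),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def rank_sites(sites: List[Site], keep: int = 4) -> List[Tuple[float, int]]:
    """Filter, then score and sort surviving sites; returns (score, position) pairs."""
    passing = [s for s in sites if passes_filters(s)]
    if not passing:
        return []
    max_disp = max(s.displacement for s in passing)
    ranked = sorted(((score_site(s, max_disp), s.position) for s in passing), reverse=True)
    return ranked[:keep]


candidates = [
    Site(10, 0.8, 4.0, 2.0, 15.0, False),
    Site(20, 0.9, 5.0, 3.0, 5.0, False),   # rejected: too close to the heme
    Site(30, 0.2, 1.0, 0.5, 20.0, True),   # rejected: dimer interface
    Site(40, 0.5, 2.0, 1.0, 12.0, False),
]
print([pos for _, pos in rank_sites(candidates)])  # -> [10, 40]
```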
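Steps 9 and 10 begin by assembling each candidate at the protein level. The sketch below inserts a readout domain after a chosen residue, flanked by linker pairs drawn from two widely used motif families, flexible (GGGGS)n and rigid A(EAAAK)nA. The linker set is an illustrative choice, and the scaffold and insert strings in the usage line are hypothetical stand-ins, not CooA or cpFP sequences.

```python
from itertools import product

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Linker choices drawn from two common motif families; the exact set is illustrative.
LINKERS = {
    "flex1": "GGGGS",         # (GGGGS)1, flexible
    "flex2": "GGGGSGGGGS",    # (GGGGS)2, flexible and longer
    "rigid1": "AEAAAKA",      # A(EAAAK)1A, helical and stiffer
}


def insert_domain(scaffold, insert, position, n_linker, c_linker):
    """Insert a domain after residue `position` (1-based), flanked by linkers."""
    if not 1 <= position < len(scaffold):
        raise ValueError("insertion point must fall strictly inside the scaffold")
    return scaffold[:position] + n_linker + insert + c_linker + scaffold[position:]


def build_library(scaffold, insert, sites, linkers):
    """Enumerate every site x N-linker x C-linker combination as a named protein sequence."""
    library = {}
    for site, (n_name, c_name) in product(sites, product(linkers, repeat=2)):
        seq = insert_domain(scaffold, insert, site, linkers[n_name], linkers[c_name])
        if not set(seq) <= STANDARD_AA:
            raise ValueError(f"site {site}: non-standard residue in design")
        library[f"site{site}_{n_name}_{c_name}"] = seq
    return library


# Hypothetical placeholder sequences, not CooA or a real cpFP.
lib = build_library("MKTAYIAKQR", "WWWW", sites=[3, 6], linkers=LINKERS)
print(len(lib))  # -> 18
```

With two sites and three linkers this yields 2 × 3 × 3 = 18 named constructs, which keeps the library small and fully enumerable rather than randomly combinatorial.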
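The DNA checks in Step 10 can be made explicit in a few lines. This sketch reverse-translates with one fixed codon per amino acid (an assumption standing in for a real host-specific codon-optimization tool), translates the result back with the standard genetic code to confirm the reading frame and terminal stop, and scans for four common restriction sites chosen purely for illustration. All four recognition sequences are palindromic, so searching one strand is sufficient.

```python
# One fixed codon per amino acid (all valid in the standard code). This is an
# assumption standing in for a real host-specific codon-optimization tool.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"

# Standard genetic code (NCBI table 1), codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Example sites to avoid; all four recognition sequences are palindromic.
AVOID_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "XhoI": "CTCGAG", "NdeI": "CATATG"}


def reverse_translate(protein):
    """Protein -> DNA with the fixed codon table, plus a terminal stop codon."""
    return "".join(CODON_FOR[aa] for aa in protein) + STOP_CODON


def translate(dna):
    """DNA -> protein with the standard code; '*' marks stop codons."""
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def check_construct(protein, dna):
    """Return a list of problems; an empty list means these checks passed."""
    problems = [f"{name} site {site}" for name, site in AVOID_SITES.items() if site in dna]
    if len(dna) % 3:
        problems.append("length is not a multiple of 3 (frame error)")
    elif translate(dna) != protein + "*":
        problems.append("translation does not match the designed protein")
    return problems


print(check_construct("MKT", reverse_translate("MKT")))  # -> []
print(check_construct("MHM", reverse_translate("MHM")))  # -> ['NdeI site CATATG']
```

Removing a flagged site would mean swapping in a synonymous codon at the overlapping position and re-running the same checks.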
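Step 13's structural filters can be reduced to simple geometric tests on a predicted model. The sketch counts atom pairs between the sensor and readout domains that sit closer than a cutoff, and checks that the heme iron-to-ligand distance in the model stays near a reference value taken from the parent structure. The 3.0 Å clash cutoff, the 0.3 Å tolerance, the 2.1 Å reference, and the coordinates are all assumptions for illustration; a real filter would read atoms from the model files.

```python
import math
from itertools import product


def count_clashes(atoms_a, atoms_b, cutoff=3.0):
    """Number of inter-domain atom pairs closer than `cutoff` angstroms."""
    return sum(1 for a, b in product(atoms_a, atoms_b) if math.dist(a, b) < cutoff)


def structural_filter(sensor_atoms, readout_atoms, fe, ligand_atom,
                      ref_fe_ligand, max_clashes=0, tolerance=0.3):
    """Return reasons to reject a model; an empty list means it survives."""
    reasons = []
    clashes = count_clashes(sensor_atoms, readout_atoms)
    if clashes > max_clashes:
        reasons.append(f"{clashes} inter-domain clash(es)")
    drift = abs(math.dist(fe, ligand_atom) - ref_fe_ligand)
    if drift > tolerance:
        reasons.append(f"heme iron-ligand distance off by {drift:.2f} A")
    return reasons


# Toy coordinates (angstroms); the 2.1 A reference is a placeholder value.
print(structural_filter([(0, 0, 0)], [(1, 0, 0), (9, 0, 0)], (20, 0, 0), (22.1, 0, 0), 2.1))
# -> ['1 inter-domain clash(es)']
```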

Methods, Tools, and Technologies Included in the Plan

  • Peer-reviewed literature on CooA, RcoM, and fluorescent CO-sensing precedent
  • Protein Data Bank structures for CooA and sequence databases for scaffold retrieval
  • Sequence alignment tools for conservation analysis
  • DNA design software for synthesis-ready construct generation
  • AlphaFold DB, ESMFold, or similar structure-prediction tools for fusion-protein modeling
  • Rosetta-style local remodeling or relaxation tools for insertion-site refinement
  • Molecular dynamics or short structural relaxation workflows for comparative stability assessment
  • Scoring spreadsheets or scripting workflows for transparent candidate ranking
  • Visualization tools for domain maps, construct schematics, and structural overlays
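The transparent ranking rubric in Step 14 could be a normalized weighted sum over the five listed criteria, paired with a simple robustness check in the spirit of ranking conservatively: a lead is reported only if it stays first when any single criterion is dropped. Criterion scores on a 0 to 1 scale, equal weights, and the example candidates are all assumptions for illustration.

```python
CRITERIA = ("heme_pocket", "domain_stability", "state_coupling", "site_tolerance", "linker")


def rubric_score(scores, weights):
    """Weighted mean of 0-1 criterion scores; zero-weight criteria are ignored."""
    total = sum(weights[c] for c in CRITERIA)
    return sum(weights[c] * scores[c] for c in CRITERIA) / total


def rank(candidates, weights):
    """Candidate names, best first."""
    return sorted(candidates, key=lambda name: rubric_score(candidates[name], weights), reverse=True)


def robust_leader(candidates, weights):
    """The top candidate if it stays first when any one criterion is dropped, else None."""
    leader = rank(candidates, weights)[0]
    for dropped in CRITERIA:
        reduced = {c: (0.0 if c == dropped else w) for c, w in weights.items()}
        if rank(candidates, reduced)[0] != leader:
            return None
    return leader


equal = {c: 1.0 for c in CRITERIA}
# Hypothetical scores: "C" leads only because of its heme-pocket score.
demo = {
    "C": {"heme_pocket": 1.0, "domain_stability": 0.6, "state_coupling": 0.0,
          "site_tolerance": 0.0, "linker": 0.0},
    "D": {c: 0.3 for c in CRITERIA},
}
print(rank(demo, equal), robust_leader(demo, equal))  # -> ['C', 'D'] None
```

Returning None rather than a fragile winner makes the "shortlist, not the best sensor" framing explicit in the output.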
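For the comparative stability analysis in Step 15, two standard summaries are RMSD to a reference structure and per-atom RMSF across frames. The sketch assumes frames have already been superposed onto the reference by an MD analysis package, so it only does the arithmetic; as the plan notes, it compares relative stability and flexibility and says nothing about CO affinity. The trajectory and construct name are hypothetical.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between two equal-length, pre-superposed coordinate lists."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same atoms")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference)) / len(reference))


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation around each atom's mean position."""
    n_frames = len(trajectory)
    fluctuations = []
    for i in range(len(trajectory[0])):
        mean = tuple(sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3))
        msd = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


def summarize(name, trajectory, reference):
    """Compact per-construct summary for side-by-side comparison."""
    deviations = [rmsd(frame, reference) for frame in trajectory]
    return {
        "construct": name,
        "mean_rmsd": sum(deviations) / len(deviations),
        "max_rmsf": max(rmsf(trajectory)),
    }


# Toy two-frame, two-atom trajectory for a hypothetical construct.
ref = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
traj = [[(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]]
print(summarize("design_A", traj, ref))
```

Under this heuristic, a construct with lower mean RMSD and lower peak RMSF than its siblings would be judged structurally more plausible, not more sensitive to CO.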

Expected Results for the Main Analyses

  • Scaffold comparison: CooA will likely emerge as the best first-round design scaffold because it combines stronger structural precedent with an existing fluorescent engineering example.
  • Insertion-site analysis: only a small subset of loops or domain boundaries will likely be suitable for fluorescent coupling without disrupting heme sensing.
  • Fusion-modeling analysis: several candidate designs will probably fail due to steric or packing problems, which is a useful result because it narrows the experimental search space.
  • Ranking analysis: the top 2-3 constructs should be defensible as the most synthesis-worthy candidates even if none can be claimed to function yet.
  • DNA design output: the final product will include complete construct sequences and maps that can be ordered directly in a future experimental phase.

Expected Overall Outcome

The most realistic outcome of this class project is a complete computational biosensor design package rather than a biologically validated final sensor. Success means identifying the most plausible scaffold, designing a small set of specific fluorescent biosensor constructs, modeling and ranking them transparently, and ending with synthesis-ready DNA sequences for future testing. Even if none of the models look ideal, the project will still be valuable because it will define which design routes are least promising and which features should guide the next engineering round. In that sense, Papeco works as a true in silico synthetic biology project: it converts a broad idea into a concrete, auditable, and experimentally actionable design plan.

Sources

  1. Inouye S, et al. “CooA, a CO-sensing transcription factor from Rhodospirillum rubrum, is a CO-binding heme protein” (1997). https://pmc.ncbi.nlm.nih.gov/articles/PMC23420/
  2. Salman M, et al. “Interaction of the Full-Length Heme-Based CO Sensor Protein RcoM-2 with Ligands” (2019). https://pubmed.ncbi.nlm.nih.gov/31502443/
  3. Komori H, et al. “Crystal structure of CO-sensing transcription activator CooA bound to exogenous ligand imidazole” (2007). https://pubmed.ncbi.nlm.nih.gov/17292914/
  4. Vos MH, et al. “Early processes in heme-based CO-sensing proteins” (2022). https://www.frontiersin.org/articles/10.3389/fmolb.2022.1046412/full
  5. Wang X, et al. “A selective fluorescent probe for carbon monoxide imaging in living cells” (2012). https://experts.illinois.edu/en/publications/a-selective-fluorescent-probe-for-carbon-monoxide-imaging-in-livi
  6. COSer biosensor entry. https://biosensordb.ucsd.edu/biosensorDB/bsView.php?IDin=863
  7. Xie Y, et al. “A Review for In Vitro and In Vivo Detection and Imaging of Gaseous Signal Molecule Carbon Monoxide by Fluorescent Probes” (2022). https://www.mdpi.com/1420-3049/27/24/8842
  8. Liu L, et al. “An Fe(III)-Based Fluorescent Probe for Carbon Monoxide only Senses the ‘CO Donor’ Used, CORM-3, but Not CO” (2025). https://pmc.ncbi.nlm.nih.gov/articles/PMC12547852/
  9. Liu L, et al. “Sensing a CO-Releasing Molecule (CORM) Does Not Equate to Sensing CO: The Case of DPHP and CORM-3” (2023). https://pmc.ncbi.nlm.nih.gov/articles/PMC10267888/

This could be turned into:

  • a wearable device
  • a wall sticker