




This document presents the complete final project report, including the design strategy, construct engineering workflow, structural analyses, and in silico validation steps. For a more interactive and visually detailed presentation including animated rotating views of the predicted protein structures, enhanced figures, additional simulations, and direct access to all Benchling design files and cloning maps, please refer to the project documentation webpages associated with this final project.
These resources provide a more comprehensive visualization of the project beyond the static figures included in this PDF document.
Project Title: Engineering Houseplants for Atmospheric Carbon Monoxide Capture: Chloroplast-Targeted Expression of the Bacterial CODH Enzyme Complex in Nicotiana tabacum

The Problem This Project Addresses
Carbon monoxide (CO) is a colorless, odorless, tasteless toxic gas that cannot be detected by human senses. It is produced whenever something burns incompletely: gas heaters, stoves, car engines, fireplaces, and wood-burning appliances all release CO. Indoors, CO accumulates silently and can reach dangerous or fatal concentrations before anyone notices. The current standard of protection is a battery-powered electrochemical CO detector. These devices are excellent at detecting CO and sounding an alarm, but they cannot remove the gas from the air. Once the alarm sounds, the occupants must evacuate and ventilate the space manually. Furthermore, CO detectors require regular battery replacement and eventually need to be replaced entirely. In low-income households worldwide, detectors are frequently absent, have dead batteries, or are past their useful lifespan.
This project proposes a fundamentally different approach: instead of merely detecting CO, engineer a plant to remove it.
The Core Idea
Certain bacteria, particularly Oligotropha carboxidovorans, have evolved the ability to use CO as a food source. They do this using an enzyme called Carbon Monoxide Dehydrogenase (CODH), which converts CO into CO₂ according to this reaction:
CO + H₂O → CO₂ + 2 electrons + 2 protons
The CO₂ produced by this reaction is not harmful at the quantities involved and is expected to be reused by the plant’s own photosynthesis through the Calvin cycle.
This project proposes to take the bacterial CODH system out of the bacterium and introduce it into a plant, specifically targeting it to the chloroplast (the organelle where photosynthesis happens). By placing CODH inside the chloroplast, two elegant outcomes occur simultaneously:
The scientific foundation for this idea is already established in the literature. Duffus et al. (2018) demonstrated that the complete CODH complex can be functionally expressed in Escherichia coli, showing that heterologous expression is achievable. South et al. (2019) demonstrated in Science that bacterial enzymes introduced into tobacco chloroplasts, where they release CO₂ directly in the stroma, increased plant biomass by up to 40%, showing that chloroplast-produced CO₂ is efficiently captured by photosynthesis. This project extends this logic to a new substrate: atmospheric CO.
The Complete Genetic System Required
The CODH enzyme from O. carboxidovorans is not a single protein. It is a complex system requiring seven genes organized into two functional groups:
- coxL: the large catalytic subunit (~88 kDa) where CO is actually oxidized; contains the unique [CuSMoO₂] active site
- coxM: the medium subunit (~30 kDa) containing FAD, responsible for electron transfer
- coxS: the small subunit (~18 kDa) containing [2Fe-2S] iron-sulfur clusters, part of the electron relay chain
These three proteins assemble into a (CoxL·CoxM·CoxS)₂ heterohexamer — a complex of six protein subunits working together.
- coxD: an AAA+ ATPase chaperone that acts as a “maturation protein,” responsible for the post-translational insertion of copper and the essential bridging sulfur into the apo-enzyme, converting it to active holo-enzyme.
- coxE, coxF and coxG: carry out the “final processing” and “sulfur addition” steps of a complex maturation pathway. According to published research, coxF plays a role in copper acquisition/mobilization, and coxE and coxG are involved in the maturation pathway that leads to the properly sulfurated and copper-inserted active site. The exact individual functions of coxE and coxG are still being elucidated, though their role in the maturation complex is essential.
Overview of the Three Aims

AIM 1 — Computational Design and Validation of the Complete Genetic System
In simple terms: Design the complete genetic blueprint for the CO-capturing plant system on a computer, verify every element computationally, and produce a synthesis-ready design.
The seven bacterial genes cannot simply be pasted into a plant. They need to be comprehensively redesigned for plant expression:
All of this is done computationally using Benchling, a codon optimization tool, ChloroP 1.1, Boltz, and the Asimov Kernel, producing a complete, verified design ready for DNA synthesis through Twist Bioscience.
AIM 2 — Wet Lab Transformation and Functional Validation (The next step — beyond this course)
In simple terms: Actually build the constructs in the lab, put them into tobacco plants, and prove the enzyme works. Aim 2 begins where Aim 1 ends. The Twist-synthesized multicassette fragments are assembled into the pCAMBIA vectors using Gibson Assembly. The constructs are introduced into Nicotiana tabacum via Agrobacterium tumefaciens-mediated leaf disc transformation, the standard method for introducing genes into tobacco. Transgenic plants are selected on dual antibiotic medium (hygromycin + kanamycin, confirming that both constructs integrated).
The experimental progression follows strict logic — each step must succeed before the next begins:
For more details, please see Part I of the Week 10 homework.
AIM 3 — Optimization, Transfer to Houseplants, and Real-World Deployment (The long-term vision)
In simple terms: Assuming Aim 2 succeeds, optimize the system, transfer it to real houseplants, and develop it toward real-world deployment. If Aim 2 demonstrates functional CO oxidation in tobacco, Aim 3 pursues three parallel directions:
Direction 1 — Transfer to real houseplants: The validated genetic architecture from tobacco is adapted for transformation into Epipremnum aureum (Pothos) and Spathiphyllum wallisii (Peace Lily) — widely kept, hardy, aesthetically acceptable houseplants. Agrobacterium-mediated transformation protocols established for tobacco are adapted for these species.
Direction 2 — System optimization: Several improvements are pursued to increase CO removal efficiency and operational range:
- A CO-responsive inducible promoter system replaces constitutive promoters, activating CODH expression only when CO is present and saving plant energy otherwise
- Stomata are engineered to stay constitutively open, maintaining CO uptake during nighttime hours when CO poisoning risk is highest
- Expression levels are optimized based on the quantitative CO removal model to increase per-plant removal capacity
Direction 3 — Safety, containment, and deployment:
Genetic Use Restriction Technology (GURT): To prevent viable seed formation and uncontrolled environmental spread, I will implement GURT, so that engineered plants cannot reproduce outside controlled environments.
Additional containment strategy (chloroplast genome integration):
As an alternative or complement to GURT, I can integrate the transgenes into the chloroplast genome instead of the nuclear genome. Chloroplast DNA is maternally inherited in most flowering plants, including tobacco (Nicotiana tabacum). This means the transgenes are not transmitted via pollen, virtually eliminating the risk of gene flow to wild relatives. This is a well-established biosafety strategy for plant synthetic biology.
Regulatory pathway planning begins under the USDA APHIS (regulation of genetically engineered plants) and EPA (regulation of plants producing pesticidal substances, if applicable) frameworks.
The deployment target is refined based on the quantitative CO removal analysis: rather than acute emergency protection in homes (which requires too many plants), the primary application is chronic CO reduction in high-exposure industrial and semi-industrial environments like workshops, garages, underground parking facilities, and developing-world indoor cooking spaces where CO concentrations are higher and more sustained.
The ethical framework for commercial deployment, including informed consent, false assurance prevention, equity of access, and environmental risk, is fully developed and integrated into regulatory submissions.
Sources:
| Gene | Genomic Coordinates (NCBI) | Protein ID | Biological Role | Assigned Construct |
|---|---|---|---|---|
| coxL | CP002827.1 (30264–32693) | AEI08106.1 | Catalytic subunit responsible for CO oxidation | Construct 1 (Structural) |
| coxM | CP002827.1 (28882–29748) | AEI08104.1 | FAD-binding subunit involved in electron transfer | Construct 1 (Structural) |
| coxS | CP002827.1 (29767–30267) | AEI08105.1 | Fe-S cluster-containing subunit for electron relay | Construct 1 (Structural) |
| coxD | CP002827.1 (32748–33635) | AEI08107.1 | Molybdenum cofactor insertion and enzyme maturation | Construct 2 (Maturation) |
| coxE | CP002827.1 (33637–34836) | AEI08108.1 | Assists in Mo-cofactor biosynthesis and assembly | Construct 2 (Maturation) |
| coxF | CP002827.1 (34840–35682) | AEI08109.1 | Active site processing and enzyme activation | Construct 2 (Maturation) |
| coxG | CP002827.1 (35682–36299) | AEI08110.1 | Sulfur ligand incorporation into the active site | Construct 2 (Maturation) |

| Promoter | Origin | Relative Strength vs. CaMV 35S | Key Advantage / Note | Source |
|---|---|---|---|---|
| TobUbi.u4 | Nicotiana tabacum (polyubiquitin) | ~7× stronger | Native to tobacco; excellent stability for long-term expression | Genschik et al., 1994 (GenBank: X77456.1) |
| D100 | Synthetic (Dahlia mosaic virus) | ~2.2× stronger | One of the strongest synthetic promoters validated in tobacco | Khadanga et al., 2021; Sahoo et al., 2015 |
| MSD3 | Synthetic chimeric (MMV + SCBV) | ~1.15× stronger | Works in both monocots and dicots; stable in tobacco | Kumari et al., 2024; Dey & Maiti, 1999 |
| DaMVFLt4 | Dahlia mosaic virus | ~5× stronger | Very high activity in protoplasts and transgenic plants | Sahoo et al., 2014; GenBank: JX272320.1 |
| M24 | MMV-derived | ~10× stronger | Extremely strong promoter with enhanced duplicated domains | Sahoo et al., 2014 |
| S100 | Synthetic (Strawberry vein banding virus) | ~1.8× stronger | Strong synthetic alternative; slightly weaker than D100 | Khadanga et al., 2021; Pattanaik et al., 2004 |
| SM | Synthetic chimeric (SCBV + MMV) | ~2.1× stronger | Highly effective in dicots like tobacco | Kumari et al., 2024; Davies et al., 2014 |
| BM | Synthetic chimeric (BSV + MMV) | ~1.72× stronger | Good alternative synthetic promoter for dicots | Kumari et al., 2024; Remans et al., 2005 |
| FMV 34S | Figwort mosaic virus | ~2× stronger | Widely used constitutive promoter in dicots | Bhattacharyya et al., 2002 |
| CaMV 35S | Cauliflower mosaic virus | 1× (reference) | Gold standard promoter for plant expression | Odell et al., 1985; Shakhova et al., 2022 |
| PTSB1 | Arabidopsis thaliana (TSB1) | ~2.4× stronger | Very strong in mature leaves; tissue-dependent variation | Shirasawa-Seo et al., 2002 |
| PPHYB | Arabidopsis thaliana (PHYB) | ~1.5× stronger | Uniform expression across tissues | Shirasawa-Seo et al., 2002; Goosey et al., 1997 |
| PNCR | Soybean chlorotic mottle virus | ~5× (protoplasts), moderate in plants | Strong viral promoter distinct from CaMV and FMV | Conci et al., 1993; Shirasawa-Seo et al., 2002 |
| PClSV | Peanut chlorotic streak virus (FLt promoter) | ~2× stronger | Strong constitutive promoter comparable to FMV | Maiti & Shepherd, 1998 |
| dPClSV | PClSV with duplicated enhancer | ~6× stronger | Highly powerful promoter due to enhancer duplication | Maiti & Shepherd, 1998 |
| CPV1 | Cassava vein mosaic virus | ~0.5× of CPV2 | Moderate activity; tissue-specific expression | Verdaguer et al., 1996; Calvert et al., 1995 |
| CPV2 | Cassava vein mosaic virus | ~1× (similar to e35S) | Stronger version; high activity in vascular tissues | Verdaguer et al., 1998 |
| pFMV | Figwort mosaic virus | <1 (weaker than 35S) | Common alternative but weaker in this system | Shakhova et al., 2022 |
| AtUBQ10 (0.8) | Arabidopsis thaliana | <1 (similar to pFMV) | Stable expression across tissues | Shakhova et al., 2022 |
| AtAct2 | Arabidopsis thaliana | Moderate to low | Constitutive but weak in tobacco system | Shakhova et al., 2022 |
| P-Nos | Agrobacterium tumefaciens | Weak to moderate | Commonly used for selectable marker genes | GenBank: AF485783 |

| Terminator | Origin | Relative Performance | Key Characteristics | Sequence Source |
|---|---|---|---|---|
| tOCS | Agrobacterium tumefaciens (octopine synthase) | Highest (Top performer) | Most stable and strongest expression in Nicotiana systems; best overall choice | Shakhova et al., 2022 (supplementary Benchling file) |
| tHSP18.2 | Arabidopsis thaliana (heat shock protein 18.2) | Very high (slightly below tOCS) | Strong expression; highly efficient but slightly less than tOCS in tobacco | Shakhova et al., 2022 (supplementary Benchling file) |
| tATPase | Solanum lycopersicum (ATPase gene) | High | Robust and consistent performance; comparable to tHSP18.2 | Shakhova et al., 2022 (supplementary Benchling file) |
| tAtAct2 | Arabidopsis thaliana (actin 2) | Low | Weak expression in Nicotiana; not suitable for high-expression constructs | Shakhova et al., 2022 (supplementary Benchling file) |
| tRBCS3C | Solanum lycopersicum (Rubisco small subunit 3C) | Low | Limits transcription efficiency; weakest among tested terminators | Shakhova et al., 2022 (supplementary Benchling file) |
| T-35S | Cauliflower mosaic virus | Moderate to high | Widely used standard terminator; reliable polyadenylation signal | GenBank: GQ497234.1 (pEAQ-HT vector) |
| T-E9 | Pisum sativum (Rubisco small subunit) | High | Efficient transcription termination and mRNA stabilization in plants | GenBank: HM036220.1 (pKM24KH vector) |

| CTP | Source Protein | Organism | UniProt Accession | Length (aa) | Key Function |
|---|---|---|---|---|---|
| RbcS CTP | Ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit | Arabidopsis thaliana | P10795 | 57 | Targets proteins to chloroplast stroma (photosynthetic pathway) |
| Ferredoxin-2 CTP | Ferredoxin-2 (chloroplastic) | Arabidopsis thaliana | P16972 | 53 | Directs proteins to chloroplast electron transport system |
| RecA CTP | DNA repair protein RecA homolog 1 | Arabidopsis thaliana | Q39199 | 57 | Targets proteins to chloroplast nucleoids (DNA maintenance) |

Plant Expression Vectors: pCAMBIA2300 and pCAMBIA1300
For my plant transformation system, I selected two complementary binary vectors, pCAMBIA2300 and pCAMBIA1300, enabling the independent construction and co-expression of structural and maturation gene cassettes. Detailed technical specifications for both vectors can be found in their respective datasheets provided by Abcam for pCAMBIA1300 and pCAMBIA2300.

| Feature | pCAMBIA2300 | pCAMBIA1300 |
|---|---|---|
| Construct Use | Structural genes (coxL, coxM, coxS) | Maturation genes (coxD, coxE, coxF, coxG) |
| Approx. Size | ~8.7 kb | ~8.9 kb |
| Plant Selection Marker | Kanamycin (nptII) | Hygromycin (HygR) |
| Bacterial Selection | Kanamycin | Kanamycin |
| Reporter Gene | None (empty vector) | None (empty vector) |
| Cloning Site | pUC18-derived MCS | pUC18-derived MCS |
| Replication in Agrobacterium | pVS1 origin (high stability) | pVS1 origin (high stability) |
| Insert Capacity | Suitable for large multi-cassette inserts | Suitable for large multi-cassette inserts |
| Main Advantage | Compatible with kanamycin-based plant selection | Enables dual selection with hygromycin |

To improve protein production from the engineered CODH expression cassettes, the 5′ untranslated region (UTR) of Alfalfa Mosaic Virus (AMV) RNA4 was incorporated as a translational enhancer upstream of each coding sequence. The objective of this element was to increase translational efficiency and improve ribosome recruitment in Nicotiana tabacum cells.
To improve the structural stability and transcriptional insulation of the multi-cassette CODH constructs, neutral spacer sequences were introduced between adjacent expression cassettes. These spacers were designed to reduce promoter and terminator interference, minimize homologous recombination risks, and prevent unwanted interactions between neighboring transcriptional units during cloning and plant expression.
Objective:
Codon optimization is a fundamental step in synthetic biology when expressing genes across different organisms. Although the genetic code is universal, meaning that most organisms use the same codons to encode the same amino acids, the frequency at which specific codons are used varies between species. This phenomenon is known as codon usage bias.
Each organism has evolved to preferentially use certain codons over others, largely reflecting the abundance of corresponding transfer RNAs (tRNAs). As a result, a gene originating from one organism may be inefficiently translated when introduced into another if its codon usage does not match the host’s preferences.
In this project, the seven genes encoding the Carbon Monoxide Dehydrogenase (CODH) system originate from a bacterium and are being expressed in a plant (Nicotiana tabacum). Without codon optimization, several issues can arise:
Because the CODH system depends on the coordinated expression of multiple subunits and maturation proteins, balanced and efficient expression of each gene is essential. Even a single poorly expressed component could compromise the functionality of the entire enzyme complex.
Therefore, codon optimization is not just a technical adjustment but a critical requirement for functional expression. In this step, each gene sequence is redesigned to match the codon usage preferences of Nicotiana tabacum, while preserving the exact amino acid sequence of the encoded proteins. Additional considerations, such as avoiding mRNA secondary structures, eliminating cryptic splice sites, and maintaining appropriate GC content, are also taken into account.
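The core idea of synonymous recoding can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple "most frequent codon per amino acid" strategy; the tool actually used in this project is not specified here and may balance codon choice differently. The usage values in `TOY_USAGE` are hypothetical placeholders, not real Nicotiana tabacum frequencies, and the function names are illustrative.

```python
# Standard genetic code, built in the conventional TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence (length must be a multiple of 3)."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def preferred_codons(usage):
    """Pick the most frequent codon for each amino acid in a usage table."""
    best = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(dna, usage):
    """Replace each codon by the host-preferred synonym, keeping the protein."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    best = preferred_codons(usage)
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    # Codons whose amino acid has no entry in the usage table are kept as-is.
    return "".join(best.get(CODON_TABLE[c], c) for c in codons)


def gc_content(dna):
    """Fraction of G and C bases, useful as a post-optimization sanity check."""
    return (dna.count("G") + dna.count("C")) / len(dna)


# Hypothetical usage values for illustration only (NOT real N. tabacum data).
TOY_USAGE = {"GCT": 0.4, "GCC": 0.2, "GCA": 0.3, "GCG": 0.1, "AAG": 0.6, "AAA": 0.4}

original = "ATGGCGAAAGCGTAA"
recoded = optimize(original, TOY_USAGE)
```

The key invariant is that `translate(recoded)` equals `translate(original)`: only the DNA changes, never the protein. A real workflow would add the further checks mentioned above (secondary structure, cryptic splice sites, GC content) on top of this step.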
Objective
Subcellular targeting is a critical step in synthetic biology when expressing proteins in a new host organism. In plant cells, proteins must be directed to the correct organelle in order to function properly. This is especially important for metabolic pathways that depend on specific cellular environments.
In this project, the seven proteins forming the Carbon Monoxide Dehydrogenase (CODH) system originate from a bacterium. However, in plant cells, these proteins need to function inside the chloroplast, where photosynthesis occurs and where the produced CO₂ can be directly reused.
Bacterial proteins do not naturally contain signals that allow them to enter plant organelles. As a result, if they are expressed without modification, they will remain in the cytosol, where they may not fold correctly, may not interact properly with other subunits, and may fail to form a functional enzyme complex.
To solve this problem, each CODH protein must be fused to a chloroplast transit peptide (CTP). These short sequences are naturally found in plant proteins and act as targeting signals that guide newly synthesized proteins into the chloroplast. Once the protein reaches the chloroplast, the transit peptide is cleaved, releasing the mature protein in its functional form.
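The fusion and cleavage logic described above can be sketched at the amino acid level. This is a hedged illustration only: the sequences are made-up placeholders rather than the real CTPs or Cox proteins, the choice to drop the cargo's initiator methionine is an assumption rather than a confirmed design decision, and real cleavage positions are predicted by a tool such as ChloroP and need not coincide exactly with the end of the fused peptide.

```python
def fuse_ctp(ctp, cargo, drop_cargo_met=True):
    """Build an N-terminal CTP fusion: transit peptide first, then the cargo.

    drop_cargo_met is an assumed convention: the cargo's own start Met is
    removed so the fusion protein has a single initiator Met (from the CTP).
    """
    if not ctp.startswith("M"):
        raise ValueError("transit peptide should start with Met")
    if drop_cargo_met and cargo.startswith("M"):
        cargo = cargo[1:]
    return ctp + cargo


def mature_protein(fusion, cleavage_site):
    """Return the protein left after the first `cleavage_site` residues are removed."""
    if not 0 < cleavage_site < len(fusion):
        raise ValueError("cleavage site must fall inside the fusion protein")
    return fusion[cleavage_site:]


# Placeholder sequences for illustration only.
toy_ctp = "MAAAASSS"
toy_cargo = "MKLVD"

fusion = fuse_ctp(toy_ctp, toy_cargo)
released = mature_protein(fusion, cleavage_site=len(toy_ctp))
```

Comparing `released` with the intended mature sequence is a quick way to see whether a predicted cleavage site would leave extra residues on the enzyme or trim into it.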
Objective
After completing sequence collection, codon optimization, chloroplast transit peptide fusion, and cleavage site verification, the next objective was to design the regulatory architecture controlling expression of the seven CODH genes inside Nicotiana tabacum cells.
The CODH pathway is composed of multiple interacting structural and maturation proteins that must function together in a coordinated manner. Because of this, maintaining balanced expression between the genes is critical. Excessive or insufficient expression of specific subunits could negatively affect protein folding, complex assembly, chloroplast burden, and overall enzyme functionality.
Therefore, the main goal of this phase was to design a biologically balanced expression system by selecting suitable promoter–terminator combinations capable of driving efficient and coordinated expression of all seven CODH genes.
The initial plan for this phase was to:
The final promoter–terminator combinations were selected based on relative promoter strengths, functional compatibility between regulatory elements, and expected expression balance across the CODH pathway. Terminator efficiency values were taken from reported comparative plant expression data in Shakhova et al. (2022). The overall performance scores were predicted using an AI-based evaluation (Claude AI) integrating promoter strength, terminator efficiency, and expected transcriptional balance.

| Gene | Promoter | Strength | Terminator | Combined Performance |
|---|---|---|---|---|
| coxL | D100 | 2.2× | tOCS | ★★★★ |
| coxM | SM | 2.1× | tHSP18.2 | ★★★★ |
| coxS | FMV 34S | 2.0× | tATPase | ★★★ |
| coxD | D100 | 2.2× | tOCS | ★★★★ |
| coxE | SM | 2.1× | tHSP18.2 | ★★★★ |
| coxF | S100 | 1.8× | tATPase | ★★★ |
| coxG | FMV 34S | 2.0× | T-35S | ★★★ |

The objective of this step was to design each cox gene as an independent plant expression cassette containing all the required regulatory elements for efficient expression in Nicotiana tabacum. This included selecting appropriate promoters, terminators, chloroplast transit peptides (CTPs), translational enhancers, purification tags, and spacer sequences, while organizing the multicassette constructs in a modular format compatible with DNA synthesis and Gibson Assembly.
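As a rough sketch, the cassette design can be written down as data and checked automatically. The promoter and terminator assignments below are copied from the promoter–terminator table in this section; everything else is an assumption made for illustration, in particular the element order inside a cassette and the C-terminal position of the purification tag. The check flags any regulatory part used more than once within the same construct, since repeated sequences are a known risk for recombination and for synthesis.

```python
from collections import Counter

# (promoter, terminator) per gene, as listed in the promoter-terminator table.
CONSTRUCTS = {
    "Construct 1 (Structural)": {
        "coxL": ("D100", "tOCS"),
        "coxM": ("SM", "tHSP18.2"),
        "coxS": ("FMV 34S", "tATPase"),
    },
    "Construct 2 (Maturation)": {
        "coxD": ("D100", "tOCS"),
        "coxE": ("SM", "tHSP18.2"),
        "coxF": ("S100", "tATPase"),
        "coxG": ("FMV 34S", "T-35S"),
    },
}


def cassette_layout(gene, promoter, terminator):
    """Ordered parts of one expression cassette.

    The order (and the C-terminal tag) is an assumed convention for this
    sketch, not read from the project's Benchling files.
    """
    return [promoter, "AMV 5' UTR", "CTP", gene, "tag", terminator]


def reused_parts(construct):
    """Regulatory parts that appear more than once within one construct."""
    counts = Counter(part for pair in construct.values() for part in pair)
    return sorted(part for part, n in counts.items() if n > 1)


def construct_layout(construct):
    """Cassettes in order, separated by neutral spacers."""
    layout = []
    for gene, (promoter, terminator) in construct.items():
        if layout:
            layout.append("spacer")
        layout.extend(cassette_layout(gene, promoter, terminator))
    return layout


report = {name: reused_parts(genes) for name, genes in CONSTRUCTS.items()}
```

With the assignments from the table, `report` comes back empty for both constructs: each promoter and each terminator is used only once per plasmid, even though D100, SM, FMV 34S and several terminators are shared between the two plasmids.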
The objective of this step was to prepare the pCAMBIA2300 and pCAMBIA1300 backbones for Gibson Assembly by virtually linearizing the vectors at a selected restriction site and generating homologous overlap regions. These homology arms were designed to guide the precise insertion and seamless assembly of the multicassette fragments into the plasmid backbones.
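The homology-arm step can be illustrated with plain string operations. This is a simplified sketch, assuming a single cut position on the top strand and ignoring restriction-site overhangs; the 30 bp default arm length is an illustrative choice within the range commonly used for Gibson overlaps, not the value used in the actual design, and the function names are hypothetical.

```python
def linearize(vector, cut_index):
    """Open a circular sequence at cut_index and return it as a linear string."""
    if not 0 <= cut_index < len(vector):
        raise ValueError("cut index outside the vector")
    return vector[cut_index:] + vector[:cut_index]


def add_homology_arms(insert, linear_vector, arm_len=30):
    """Flank the insert with copies of the two backbone ends.

    The left arm matches the end of the linearized backbone and the right arm
    matches its start, so the insert closes the circle when assembled.
    """
    if arm_len > len(linear_vector):
        raise ValueError("arm longer than the backbone")
    left_arm = linear_vector[-arm_len:]
    right_arm = linear_vector[:arm_len]
    return left_arm + insert + right_arm


def assembled_circle(linear_vector, fragment, arm_len=30):
    """Expected circular product, written as a linear string starting at the cut."""
    if fragment[:arm_len] != linear_vector[-arm_len:]:
        raise ValueError("left arm does not match the backbone end")
    if fragment[-arm_len:] != linear_vector[:arm_len]:
        raise ValueError("right arm does not match the backbone start")
    return linear_vector + fragment[arm_len:-arm_len]


# Toy sequences for illustration only.
toy_vector = "AAAACCCCGGGGTTTT"
backbone = linearize(toy_vector, cut_index=8)
fragment = add_homology_arms("ATAT", backbone, arm_len=4)
product = assembled_circle(backbone, fragment, arm_len=4)
```

In this toy case the backbone opens between CCCC and GGGG, so the fragment becomes CCCC + insert + GGGG and the product is the backbone followed by the bare insert.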
The objective of this step was to adapt the designed multicassette constructs to the synthesis requirements of Twist Bioscience by identifying and resolving problematic repetitive regions, optimizing synthesis compatibility, and ensuring that all final fragments could be successfully synthesized and assembled through Gibson Assembly.
The objective of this phase was to simulate the commercial DNA synthesis workflow by exporting the finalized multicassette fragments from Benchling in FASTA format and evaluating their compatibility with the synthesis requirements of Twist Bioscience. This step aimed to verify sequence manufacturability, detect potential synthesis issues such as repetitive regions or sequence complexity, and confirm that all fragments were fully ready for commercial synthesis and downstream Gibson Assembly.
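A pre-submission screen of the exported FASTA files could look roughly like the sketch below. The window size, GC limits, and repeat length are illustrative parameters chosen for this example; they are assumptions and do not reproduce Twist Bioscience's actual acceptance rules, which are applied by the vendor's own submission checks.

```python
def parse_fasta(text):
    """Parse FASTA text into {record_name: sequence} (uppercase, no whitespace)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def repeated_kmers(seq, k=20):
    """Map every k-mer that occurs more than once to its start positions."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def gc_windows_out_of_range(seq, window=50, low=0.25, high=0.65):
    """List (start, gc) for non-overlapping windows whose GC fraction is outside [low, high]."""
    flagged = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        if not low <= gc <= high:
            flagged.append((start, round(gc, 2)))
    return flagged


# Toy record for illustration only.
toy_fasta = ">frag1 toy example\nACGTACGTAC\nGTTT\n"
records = parse_fasta(toy_fasta)
repeats = repeated_kmers(records["frag1"], k=4)
```

Running the same two checks on every fragment before upload makes it easier to see which regions (for example, promoters or terminators reused across cassettes) are likely to need recoding or re-splitting.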
The objective of this phase was to digitally assemble all synthesized DNA fragments into the final structural and maturation multicassette constructs using Gibson Assembly simulation in Benchling. This step allowed me to verify fragment compatibility, overlap integrity, correct orientation, and successful reconstruction of the complete plasmids before experimental cloning.
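What the assembly simulation verifies can be approximated with a short overlap check. This is a simplified sketch, not a description of how Benchling works internally: it assumes all fragments are given on the same strand and in the intended order, and the minimum overlap length is an illustrative parameter.

```python
def find_overlap(left, right, min_len=20):
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def assemble(fragments, min_len=20):
    """Join ordered fragments through their terminal overlaps."""
    product = fragments[0]
    for index, frag in enumerate(fragments[1:], start=2):
        n = find_overlap(product, frag, min_len)
        if n == 0:
            raise ValueError(f"fragment {index} has no overlap of >= {min_len} bp")
        product += frag[n:]  # the shared overlap is kept only once
    return product


# Toy fragments for illustration only.
toy_fragments = ["AAAACCCC", "CCCCGGGG", "GGGGTTTT"]
joined = assemble(toy_fragments, min_len=4)
```

A fragment in the wrong orientation or with a truncated overlap raises an error instead of silently producing a wrong map, which is the same kind of failure the in silico assembly is meant to catch before ordering DNA.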
The objective of this phase was to digitally assemble the fully reconstructed Structural and Maturation multicassette inserts into their corresponding binary plant expression vectors, pCAMBIA2300 and pCAMBIA1300, using in silico Gibson Assembly in Benchling. This step aimed to generate complete circular plant transformation plasmids, verify the integrity of all assembly junctions and vector backbone elements, and confirm that the final constructs were fully compatible with downstream cloning, bacterial propagation, and Agrobacterium-mediated plant transformation applications.
Objective:
The objective of this verification step was to evaluate whether the engineered fusion subunits retained their ability to correctly assemble into the complete functional enzyme complex after the addition of chloroplast transit peptides (CTPs) and purification tags. Instead of predicting only the CoxL–CoxM–CoxS trimer, I modeled the entire (LMS)₂ heterohexameric complex using AlphaFold 3 in order to perform a more realistic structural validation of the final engineered system.
This analysis aimed to verify that all modified subunits still formed stable inter-chain interactions comparable to the native enzyme architecture, while also confirming that the added CTP regions remained solvent-exposed and spatially separated from the subunit–subunit interaction interfaces. In addition, this step was used to assess whether the native assembly surfaces between CoxL, CoxM, and CoxS remained structurally accessible and unaffected by the engineered modifications, ensuring that the final enzyme complex could theoretically self-assemble correctly inside the chloroplast environment.
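One simple way to quantify "spatially separated from the interfaces" is a distance check between the CTP residues of one chain and the residues of a neighbouring chain. The sketch below is an assumption-laden illustration: it works on bare (x, y, z) coordinates, for example Cα positions extracted from a predicted model, uses an 8 Å cutoff as an arbitrary contact definition, and uses made-up toy coordinates rather than values from the actual prediction.

```python
import math


def min_distance(coords_a, coords_b):
    """Smallest distance between any point in coords_a and any point in coords_b."""
    return min(math.dist(a, b) for a in coords_a for b in coords_b)


def contacts(coords_a, coords_b, cutoff=8.0):
    """Index pairs (i, j) of points lying within `cutoff` of each other."""
    return [
        (i, j)
        for i, a in enumerate(coords_a)
        for j, b in enumerate(coords_b)
        if math.dist(a, b) <= cutoff
    ]


# Toy coordinates in angstroms, for illustration only.
ctp_residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
partner_chain = [(30.0, 0.0, 0.0), (33.8, 0.0, 0.0)]

closest = min_distance(ctp_residues, partner_chain)
ctp_contacts = contacts(ctp_residues, partner_chain)
```

An empty contact list for the CTP residues, together with an unchanged contact list for the native interface residues, would support the conclusion that the added peptides do not interfere with assembly in the predicted model.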
Objective
The objective of this step was to design three plant expression constructs that could be efficiently assembled into circular plasmids using the Golden Gate Assembly (GGA) method. All three constructs were designed with the same regulatory and reporter elements; only the chloroplast transit peptide (CTP) sequence was changed, in order to assess the correct localization of the three engineered CTP sequences using a GFP reporter and confocal microscopy. The final constructs were designed as follows:
Each construct was assembled using BsaI-mediated Golden Gate cloning with specifically designed 4 bp overhangs to ensure correct orientation and seamless ligation between adjacent fragments.
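A set of 4 bp overhangs can be sanity-checked before ordering. The sketch below tests the three conditions usually required for directional Golden Gate assembly: every overhang is unique, none is palindromic (a palindromic overhang can ligate to itself), and none is the reverse complement of another (which would allow inverted joins). The overhang sequences are hypothetical examples, not the ones designed for these constructs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_overhangs(overhangs):
    """Return a list of (overhang, problem) tuples; an empty list means the set is usable."""
    problems = []
    seen = set()
    for oh in overhangs:
        if len(oh) != 4:
            problems.append((oh, "not 4 bp"))
        if oh == revcomp(oh):
            problems.append((oh, "palindromic"))
        if oh in seen:
            problems.append((oh, "duplicate"))
        elif revcomp(oh) in seen:
            problems.append((oh, "reverse complement of another overhang"))
        seen.add(oh)
    return problems


# Hypothetical overhang set for illustration only.
toy_overhangs = ["GGAG", "TACT", "AATG", "GCTT"]
issues = check_overhangs(toy_overhangs)
```

This check covers only exact sequence clashes; near-matching overhangs can also mis-ligate at lower frequency, so an empty result is a necessary condition rather than a guarantee of assembly fidelity.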