Projects

Final projects:

  • Project Title: Engineering Houseplants for Atmospheric Carbon Monoxide Capture: Chloroplast-Targeted Expression of the Bacterial CODH Enzyme Complex in Nicotiana tabacum


Individual Final Project


Project Title: Engineering Houseplants for Atmospheric Carbon Monoxide Capture: Chloroplast-Targeted Expression of the Bacterial CODH Enzyme Complex in Nicotiana tabacum


The Problem This Project Addresses

Carbon monoxide (CO) is a colorless, odorless, tasteless toxic gas that cannot be detected by human senses. It is produced whenever something burns incompletely: gas heaters, stoves, car engines, fireplaces, and wood-burning appliances all release CO. Indoors, CO accumulates silently and can reach dangerous or fatal concentrations before anyone notices. The current standard of protection is a battery-powered electrochemical CO detector. These devices are excellent at detecting CO and sounding an alarm, but they cannot remove the gas from the air. Once the alarm sounds, the occupants must evacuate and ventilate the space manually. Furthermore, CO detectors require regular battery replacement and eventually need to be replaced entirely. In low-income households worldwide, detectors are frequently absent, have dead batteries, or are past their useful lifespan.

–> This project proposes a fundamentally different approach: instead of detecting CO, make the plant remove it.

The Core Idea

Certain bacteria, particularly Oligotropha carboxidovorans, have evolved the ability to use CO as a source of carbon and energy. They do this using an enzyme complex called carbon monoxide dehydrogenase (CODH), which converts CO into CO₂ according to this reaction:

CO + H₂O → CO₂ + 2 electrons + 2 protons

The CO₂ produced by this reaction is not harmful at the quantities involved and is expected to be reused by the plant's own photosynthesis through the Calvin cycle.

This project proposes to take the bacterial CODH system out of the bacterium and introduce it into a plant, specifically targeting it to the chloroplast (the organelle where photosynthesis happens). By placing CODH inside the chloroplast, two elegant outcomes occur simultaneously:

  1. The plant actively breaks down CO from the surrounding air
  2. The CO₂ produced by CODH is immediately captured by Rubisco and enters the Calvin cycle, making the plant slightly more productive

The scientific foundation for this idea is already established in the literature. Kaufmann et al. (2018) demonstrated that the complete CODH complex can be functionally expressed in Escherichia coli –> showing that heterologous expression is achievable. South et al. (2019) reported in Science that introduced enzyme pathways releasing CO₂ directly in the stroma of tobacco chloroplasts increased plant biomass by up to 40% –> showing that chloroplast-produced CO₂ can be efficiently captured by photosynthesis. This project extends this logic to a new substrate: atmospheric CO.

The Complete Genetic System Required

The CODH enzyme from O. carboxidovorans is not a single protein. It is a complex system requiring seven genes organized into two functional groups:

  1. Group 1 — Structural subunits (the enzyme itself):

  • coxL –> the large catalytic subunit (~88 kDa), where CO is actually oxidized; it contains the unique [CuSMoO₂] active site
  • coxM –> the medium subunit (~30 kDa), which contains FAD and is responsible for electron transfer
  • coxS –> the small subunit (~18 kDa), which contains [2Fe-2S] iron-sulfur clusters and forms part of the electron relay chain

These three proteins assemble into a (CoxL·CoxM·CoxS)₂ heterohexamer — a complex of six protein subunits working together.

  2. Group 2 — Maturation proteins (the assembly machinery):

  • coxD –> an AAA+ ATPase chaperone that acts as a "maturation protein", responsible for the post-translational insertion of copper and the essential bridging sulfur into the apo-enzyme, converting it into the active holo-enzyme.
  • coxE, coxF and coxG –> accessory proteins of the same maturation pathway, which carries out the final processing and sulfur addition. Published work suggests that coxF plays a role in copper acquisition/mobilization, while coxE and coxG are involved in the steps leading to a properly sulfurated, copper-loaded active site. Their exact individual functions are still being elucidated, but their role in the maturation complex is essential.

Overview of the Three Aims


AIM 1 — Computational Design and Validation of the Complete Genetic System

In simple terms: Design the complete genetic blueprint for the CO-capturing plant system on a computer, verify every element computationally, and produce a synthesis-ready design.

The seven bacterial genes cannot simply be pasted into a plant. They need to be comprehensively redesigned for plant expression:

  • Their DNA sequences must be rewritten in “plant language” through codon optimization
  • Each protein needs a molecular address label (chloroplast transit peptide) added to its beginning so it is directed to the correct location inside the plant cell
  • The address labels must be verified to ensure the plant’s processing machinery will correctly remove them after the protein arrives
  • Each gene needs its own promoter (an on-switch for gene expression) and terminator (an off-switch), carefully chosen to prevent the plant from silencing all the genes simultaneously
  • Translation enhancer sequences must be added to maximize protein production
  • Spacer sequences must be placed between genes to prevent one gene’s transcription from accidentally running into the next
  • The complete system must be distributed across two separate transformation vectors

All of this is done computationally using Benchling, a codon optimization tool, ChloroP 1.1, Boltz, and the Asimov Kernel –> producing a complete, verified design ready for DNA synthesis through Twist Bioscience.
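The codon optimization step can be illustrated with a deliberately simple sketch. It is not the algorithm of any particular optimization tool: it re-encodes each amino acid with its most frequent synonymous codon from a codon usage table supplied by the caller, and it always writes the first codon as ATG (the coxM sequence in Phase 1 begins with the bacterial alternative start codon GTG). The function names are hypothetical, and the `usage` numbers in the example are toy values, not real Nicotiana tabacum frequencies.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks stop codons."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def best_codons(usage: dict[str, float]) -> dict[str, str]:
    """For each amino acid, pick the highest-frequency codon listed in `usage`."""
    best: dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds: str, usage: dict[str, float]) -> str:
    """Naive 'one amino acid, one codon' re-encoding (hypothetical helper).

    Amino acids missing from `usage` keep their original codon. The first codon
    is always written as ATG, because eukaryotic translation initiates
    predominantly at AUG; a bacterial GTG start would not be read as Met.
    """
    cds = cds.upper()
    if not cds or len(cds) % 3:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    best = best_codons(usage)
    out = ["ATG"]
    for i in range(3, len(cds), 3):
        codon = cds[i:i + 3]
        out.append(best.get(CODON_TABLE[codon], codon))
    return "".join(out)


# Toy example: GTG-AAA-CTG-TAA re-encoded with made-up frequencies.
toy_usage = {"AAA": 0.4, "AAG": 0.6, "CTG": 0.1, "CTT": 0.3, "TGA": 0.5, "TAA": 0.2}
print(optimize("GTGAAACTGTAA", toy_usage))  # ATGAAGCTTTGA
```

Real optimization tools also balance GC content and remove unwanted motifs (for example restriction sites or cryptic splice sites), which this sketch does not attempt.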

AIM 2 — Wet Lab Transformation and Functional Validation (The next step — beyond this course)

In simple terms: Actually build the constructs in the lab, put them into tobacco plants, and prove the enzyme works. Aim 2 begins where Aim 1 ends. The Twist-synthesized multi-cassette fragments are assembled into the pCAMBIA vectors using Gibson Assembly. The constructs are introduced into Nicotiana tabacum via Agrobacterium tumefaciens-mediated leaf disc transformation, the standard method for introducing genes into tobacco. Transgenic plants are selected on dual-antibiotic medium (hygromycin + kanamycin); because each construct carries a different resistance marker, survival on both antibiotics indicates that both constructs have integrated.
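Gibson Assembly relies on homologous overlaps at the ends of adjacent fragments, so a quick pre-synthesis check is whether every junction (including the one that closes the plasmid) has an overlap of suitable length. The sketch below is a hypothetical helper, not part of any named tool; the 20 to 40 bp window is an assumed, commonly used default, and the right value depends on the assembly kit and the number of fragments. Real fragment and pCAMBIA backbone sequences are not given here.

```python
def overlap_len(left: str, right: str, max_len: int = 60) -> int:
    """Length of the longest suffix of `left` that is identical to a prefix of `right`."""
    left, right = left.upper(), right.upper()
    for n in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def check_junctions(fragments: list[str], min_overlap: int = 20,
                    max_overlap: int = 40, circular: bool = True) -> list[tuple[int, int]]:
    """Return (junction_index, overlap) for junctions outside the allowed overlap window.

    Junction i joins fragments[i] to fragments[i + 1]; with circular=True the last
    fragment is also checked against the first (closing the plasmid).
    """
    pairs = list(zip(fragments, fragments[1:]))
    if circular and len(fragments) > 1:
        pairs.append((fragments[-1], fragments[0]))
    problems = []
    for i, (a, b) in enumerate(pairs):
        n = overlap_len(a, b)
        if not min_overlap <= n <= max_overlap:
            problems.append((i, n))
    return problems


shared = "GCTAGCTTGACCGTAGCATG"  # toy 20 bp homology arm
print(check_junctions(["TTTT" + shared, shared + "CCCC"]))  # [(1, 0)]: no closing overlap
```

An empty list means every junction falls inside the chosen window; any reported junction needs its homology arm redesigned before ordering fragments.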

The experimental progression follows strict logic — each step must succeed before the next begins:

  • –> Step 1 — Chloroplast targeting validation
  • –> Step 2 — Gene integration and transcription
  • –> Step 3 — Protein expression and CTP cleavage
  • –> Step 4 — Complex assembly
  • –> Step 5 — CO oxidation activity
  • –> Step 6 — Plant health and photosynthesis

For more details, please see Part I of the Week 10 homework.

AIM 3 — Optimization, Transfer to Houseplants, and Real-World Deployment (The long-term vision)

In simple terms: Assuming Aim 2 succeeds, optimize the system, transfer it to real houseplants, and develop it toward real-world deployment. If Aim 2 demonstrates functional CO oxidation in tobacco, Aim 3 pursues three parallel directions:

Direction 1 — Transfer to real houseplants: The validated genetic architecture from tobacco is adapted for transformation into Epipremnum aureum (Pothos) and Spathiphyllum wallisii (Peace Lily) — widely kept, hardy, aesthetically acceptable houseplants. Agrobacterium-mediated transformation protocols established for tobacco are adapted for these species.

Direction 2 — System optimization: Several improvements are pursued to increase CO removal efficiency and operational range:

  • A CO-responsive inducible promoter system replaces the constitutive promoters, activating CODH expression only when CO is present and saving plant energy otherwise.
  • Constitutively open stomata are engineered to maintain CO uptake during nighttime hours, when CO poisoning risk is highest.
  • Expression levels are optimized based on the quantitative CO removal model to increase per-plant removal capacity.

Direction 3 — Safety, containment, and deployment:

Genetic Use Restriction Technology (GURT): To prevent the production of viable seed and uncontrolled environmental spread, I will implement Genetic Use Restriction Technology (GURT), which is intended to ensure that the engineered plants cannot reproduce outside controlled environments.

Additional containment strategy (chloroplast genome integration):

As an alternative or complement to GURT, I can integrate the transgenes into the chloroplast genome instead of the nuclear genome. Chloroplast DNA is maternally inherited in most flowering plants, including tobacco (Nicotiana tabacum), so the transgenes are largely not transmitted via pollen. This greatly reduces, though does not completely eliminate, the risk of gene flow to wild relatives, since rare paternal plastid transmission has been detected in tobacco at very low frequency. Plastid transformation is a well-established biosafety strategy for plant synthetic biology.

Regulatory pathway planning begins under the USDA APHIS framework (regulation of genetically engineered plants) and, if applicable, the EPA framework (regulation of plants producing pesticidal substances).

The deployment target is refined based on the quantitative CO removal analysis: rather than acute emergency protection in homes (which would require an impractically large number of plants), the primary application is chronic CO reduction in high-exposure industrial and semi-industrial environments, such as workshops, garages, underground parking facilities, and indoor cooking spaces in the developing world, where CO concentrations are higher and more sustained.
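The quantitative CO removal analysis itself is not reproduced in this section. As a hedged sketch of the kind of calculation involved, the code below assumes a single well-mixed room in which CO is emitted at a constant rate E (mg/h), removed by ventilation at flow Q (m³/h), and removed by each of N plants at a per-plant "clean air delivery rate" k (m³/h per plant). The value of k is a hypothetical parameter, not a measured rate for CODH-expressing plants. The mass balance V dC/dt = E − (Q + N·k)·C gives the steady state C = E/(Q + N·k) and, from it, the number of plants needed to reach a target concentration.

```python
import math

MW_CO = 28.01         # g/mol, molar mass of CO
MOLAR_VOLUME = 24.45  # L/mol, ideal gas at 25 °C and 1 atm


def ppm_to_mg_m3(ppm: float) -> float:
    """Convert a CO mixing ratio in ppm to mg/m3 at 25 °C and 1 atm."""
    return ppm * MW_CO / MOLAR_VOLUME


def steady_state(emission: float, ventilation: float, k_plant: float, n_plants: int) -> float:
    """Steady-state concentration (mg/m3); emission in mg/h, flows in m3/h."""
    return emission / (ventilation + k_plant * n_plants)


def concentration_at(t: float, c0: float, volume: float, emission: float,
                     ventilation: float, k_plant: float, n_plants: int) -> float:
    """Exact solution of V dC/dt = E - (Q + N k) C with C(0) = c0; t in hours."""
    loss = ventilation + k_plant * n_plants
    c_ss = emission / loss
    return c_ss + (c0 - c_ss) * math.exp(-loss * t / volume)


def plants_needed(emission: float, ventilation: float, k_plant: float, target: float) -> int:
    """Smallest N whose steady state is at or below `target` (0 if ventilation suffices)."""
    extra_flow = emission / target - ventilation
    return 0 if extra_flow <= 0 else math.ceil(extra_flow / k_plant)


# Illustrative numbers only: E = 100 mg/h, Q = 10 m3/h, k = 0.5 m3/h per plant, target 5 mg/m3.
print(plants_needed(100, 10, 0.5, 5))  # 20
```

Because N grows with E/C* − Q and shrinks only linearly as k increases, a small per-plant rate quickly makes the required plant count impractical for strong sources, which is the type of reasoning behind favoring chronic-exposure settings.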

The ethical framework for commercial deployment, including informed consent, prevention of false assurance, equity of access, and environmental risk, is fully developed and integrated into regulatory submissions.


Sources:

  • Bährle, R., Böhnke, S., Englhard, J., Bachmann, J., & Perner, M. (2023). Current status of carbon monoxide dehydrogenases (CODH) and their potential for electrochemical applications. Bioresources and Bioprocessing, 10(1), 84. https://doi.org/10.1186/s40643-023-00705-9
  • Dent, M. R., Weaver, B. R., Roberts, M. G., & Burstyn, J. N. (2023). Carbon Monoxide-Sensing Transcription Factors: Regulators of Microbial Carbon Monoxide Oxidation Pathway Gene Expression. Journal of Bacteriology, 205(5), e00332-22. https://doi.org/10.1128/jb.00332-22
  • Erb, T. J. (2024). Photosynthesis 2.0: Realizing New-to-Nature CO2-Fixation to Overcome the Limits of Natural Metabolism. Cold Spring Harbor Perspectives in Biology, 16(2), a041669. https://doi.org/10.1101/cshperspect.a041669
  • Kaufmann, P., Duffus, B. R., Teutloff, C., & Leimkühler, S. (2018). Functional Studies on Oligotropha carboxidovorans Molybdenum–Copper CO Dehydrogenase Produced in Escherichia coli. Biochemistry, 57(19), 2889–2901. https://doi.org/10.1021/acs.biochem.8b00128
  • Liu, C., Zhang, N., Sun, L., Gao, W., Zang, Q., & Wang, X. (2022). Potted plants and ventilation effectively remove pollutants from tobacco smoke. International Journal of Low-Carbon Technologies, 17, 1052–1060. https://doi.org/10.1093/ijlct/ctac081
  • Park, S., Mani, V., Kim, J. A., Lee, S. I., & Lee, K. (2022). Combinatorial transient gene expression strategies to enhance terpenoid production in plants. Frontiers in Plant Science, 13, 1034893. https://doi.org/10.3389/fpls.2022.1034893
  • Qin, S., Liu, Y., Yan, J., Lin, S., Zhang, W., & Wang, B. (2022). An Optimized Tobacco Hairy Root Induction System for Functional Analysis of Nicotine Biosynthesis-Related Genes. Agronomy, 12(2), 348. https://doi.org/10.3390/agronomy12020348
  • Schübel, U., Kraut, M., Mörsdorf, G., & Meyer, O. (1995). Molecular characterization of the gene cluster coxMSL encoding the molybdenum-containing carbon monoxide dehydrogenase of Oligotropha carboxidovorans. Journal of Bacteriology, 177(8), 2197–2203. https://doi.org/10.1128/jb.177.8.2197-2203.1995
  • Siebert, D., Busche, T., Metz, A. Y., Smaili, M., Queck, B. A. W., Kalinowski, J., & Eikmanns, B. J. (2020). Genetic Engineering of Oligotropha carboxidovorans Strain OM5—A Promising Candidate for the Aerobic Utilization of Synthesis Gas. ACS Synthetic Biology, 9(6), 1426–1440. https://doi.org/10.1021/acssynbio.0c00098
  • Tao, Y., Chiu, L.-W., Hoyle, J. W., Dewhirst, R. A., Richey, C., Rasmussen, K., Du, J., Mellor, P., Kuiper, J., Tucker, D., Crites, A., Orr, G. A., Heckert, M. J., Godinez-Vidal, D., Orozco-Cardenas, M. L., & Hall, M. E. (2023). Enhanced Photosynthetic Efficiency for Increased Carbon Assimilation and Woody Biomass Production in Engineered Hybrid Poplar. Forests, 14(4), 827. https://doi.org/10.3390/f14040827
  • Thagun, C., Odahara, M., Kodama, Y., & Numata, K. (2024). Identification of a highly efficient chloroplast-targeting peptide for plastid engineering. PLOS Biology, 22(9), e3002785. https://doi.org/10.1371/journal.pbio.3002785


PHASE 1: Sequence Collection

Structural and maturation gene sequences:

To obtain the gene sequences, I used GenBank accession CP002827.1, which corresponds to plasmid pHCG3 of Oligotropha carboxidovorans OM5, where the cox genes are located. I accessed this record through the National Center for Biotechnology Information (NCBI) platform.

Within the genome page, I used the graphical genome viewer to locate the genes of interest. I specifically identified the structural genes (coxL, coxM, coxS) and the maturation genes (coxD, coxE, coxF, coxG) involved in the CO dehydrogenase (CODH) system.

For each gene, I clicked on its corresponding feature in the graphical map, opened its detailed annotation page, and selected the FASTA format option, which allowed me to retrieve the nucleotide sequence of each gene individually. All sequences were downloaded separately in FASTA format and then compiled for further analysis and use in my project.
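Once the FASTA files are compiled, a few basic checks can catch retrieval mistakes (wrong coordinates, wrong strand, truncated copies) before any redesign work. The sketch below is a stdlib-only illustration with hypothetical function names: it parses multi-record FASTA text and flags a length that is not a multiple of 3, a missing terminal stop, internal stops, and non-ATG starts. The last check matters here because the coxM sequence below begins with GTG.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
START_CODONS = {"ATG", "GTG", "TTG"}  # start codons commonly used in bacteria


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word after '>') to its concatenated, uppercased sequence."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def check_cds(seq: str) -> list[str]:
    """Return the problems found in a putative coding sequence (empty list = looks fine)."""
    seq = seq.upper()
    issues = []
    if len(seq) % 3:
        issues.append("length not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        return issues + ["no complete codons"]
    if codons[0] not in START_CODONS:
        issues.append(f"unexpected start codon {codons[0]}")
    elif codons[0] != "ATG":
        issues.append(f"non-ATG start {codons[0]} (replace with ATG for plant expression)")
    if CODON_TABLE.get(codons[-1]) != "*":
        issues.append("no terminal stop codon")
    if any(CODON_TABLE.get(c) == "*" for c in codons[1:-1]):
        issues.append("internal stop codon")
    return issues


demo = ">coxA test\nATGAAA\nTAA\n>coxB\nGTGAAATGA\n"
for rec_id, rec_seq in parse_fasta(demo).items():
    print(rec_id, check_cds(rec_seq) or "ok")
```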

CoxL structural subunit sequence:

CP002827.1:30264-32693 Oligotropha carboxidovorans OM5 plasmid pHCG3, complete sequence

ATGAATATCCAGACCACCGTTGAACCGACGAGCGCGGAGCGTGCCGAAAAGTTGCAGGGTATGGGCTGCAAGCGCAAACGTGTCGAAGATATCCGCTTTACCCAGGGTAAGGGCAACTACGTCGATGATGTGAAATTACCGGGTATGTTGTTTGGTGATTTCGTTCGTTCGTCGCACGCCCATGCGCGCATTAAAAGTATCGATACCTCGAAGGCTAAGGCGCTTCCAGGTGTATTCGCTGTTTTAACGGCGGCCGACCTGAAGCCGCTGAATCTGCATTATATGCCGACGCTGCTGGCGATGTGCAGGCAGTGCTTGCAGACGAGAAGGTTCTTTTCCAGAATCAGGAGGTTGCCTTTGTAGTGGCGAAAGATCGTTACGTTGCGGCGGACGCGATCGAATTGGTCGAAGTCGATTATGAGCCGCTGCCGGTTCTAGTCGACCCATTCAAGGCAATGGAACCAGATGCACCTCTGCTACGTGAAGATATCAAAGACAAAATGACCGGTGCGCACGGTGCGCGCAAACATCACAACCATATCTTCCGTTGGGAAATAGGCGATAAGGAAGGCACCGATGCGACCTTCGCCAAAGCCGAAGTCGTGTCAAAAGATATGTTTACCTATCATCGGGTGCATCCGTCGCCGCTGGAAACGTGTCAGTGCGTTGCGTCGATGGACAAGATCAAGGGTGAACTGACGTTGTGGGGCACATTCCAGGCGCCGCATGTCATCCGTACCGTGGTGTCGCTGATCTCGGGTTTGCCGGAGCATAAAATCCACGTCATTGCACCGGACATCGGGGGCGGCTTTGGCAACAAGGTGGGCGCTTATTCCGGCTACGTCTGCGCGGTGGTTGCCTCCATCGTGCTGGGCGTGCCCGTGAAGTGGGTCGAAGACCGAATGGAGAACCTCTCCACGACATCATTTGCGCGCGACTATCATATGACGACAGAACTCGCAGCCACCAAGGACGGCAAGATTCTTGCGATGCGCTGTCACGTCCTGGCTGATCACGGAGCGTTCGACGCCTGTGCCGATCCATCGAAATGGCCGGCGGGCTTCATGAACATCTGTACCGGCTCCTATGACATGCCGGTGGCACATCTGGCCGTGGATGGTGTCTATACCAACAAAGCGTCCGGCGGCGTAGCCTATCGTTGCTCGTTCCGAGTGACGGAAGCGGTTTATGCCATTGAGCGCGCGATCGAGACGCTGGCGCAGCGGCTCGAGATGGACTCAGCCGATCTACGCATCAAGAACTTTATCCAGCCGGAGCAGTTCCCTTATATGGCGCCGCTGGGCTGGGAGTACGACAGCGGAAATTATCCACTCGCGATGAAGAAAGCGATGGATACGGTCGGTTATCATCAGCTTCGTGCTGAACAGAAAGCCAAACAGGAAGCCTTCAAGCGCGGCGAGACACGCGAGATTATGGGCATCGGTATCTCGTTTTTCACCGAGATTGTCGGCGCCGGGCCGTCGAAGAATTGCGATATTCTCGGCGTGTCGATGTTTGACTCGGCGGAAATCCGTATCCATCCAACCGGTTCAGTGATTGCCCGCATGGGCACCAAGAGCCAGGGCCAGGGGCACGAGACGACCTACGCTCAGATCATCGCCACCGAACTCGGTATTCCCGCTGACGACATCATGATCGAAGAAGGCAATACCGACACTGCCCCTTATGGCCTTGGCACTTACGGCTCGCGCTCGACGCCGACGGCTGGTGCGGCAACCGCTGTGGCCGCGCGCAAAATCAAAGCCAAGGCGCAGATGATTGCGGCGCACATGCTCGAAGTGCATGAGGGCGATTTGGAATGGGACGTGGACCGCTTCCGGGTGAAAGGCCTTCCGGAAAAATTCAAGACCATGAAGGAACTCGCCTGGGCGTCCTACAATAGTCCGCCGCCCAATCTCGAGCCTGGGCTCGAGGCTGTGAACTATTACGACCCTCCGAATATGACTTATCCGTTCGGTGCCTATTTCTGCATCATGGAT
ATCGATGTGGACACCGGCGTCGCCAAAACCCGGCGCTTCTATGCACTGGACGATTGCGGAACACGTATCAACCCGATGATCATCGAAGGGCAGGTGCATGGTGGTTTGACCGAGGCCTTCGCGGTCGCGATGGGGCAGGAGATCCGATACGACGAGCAAGGCAACGTGCTTGGAGCGTCGTTTATGGACTTCTTCCTGCCGACGGCCGTCGAAACGCCGAAGTGGGAGACCGACTACACAGTGACGCCGTCGCCACATCATCCGATCGGCGCCAAAGGCGTGGGTGAAAGTCCGCATGTCGGCGGTGTGCCGTGCTTCTCAAATGCGGTGAATGATGCTTACGCCTTTCTGAACGCCGGCCATATCCAAATGCCGCATGATGCCTGGCGGCTATGGAAGGTAGGCGAGCAACTTGGCCTGCACGTCTAA

CoxM structural subunit sequence:

CP002827.1:28882-29748 Oligotropha carboxidovorans OM5 plasmid pHCG3, complete sequence

GTGATACCTGGTTCATTTGATTATCACCGTCCAAAATCCATTGCAGACGCAGTCGCGCTTCTGACGAAGCTCGGTGAGGATGCTCGGCCCTTGGCCGGAGGCCACAGCCTAATTCCGATCATGAAGACCCGGCTGGCTACGCCGGAGCATCTGGTTGATCTCAGGGATATTGGAGATCTCGTCGGAATTCGAGAGGAGGGTACGGACGTCGTCATCGGGGCGATGACCACTCAGCATGCGCTGATAGGCTCAGATTTTCTCGCAGCAAAATTGCCGATCATTCGCGAGACATCGCTGCTGATCGCCGATCCGCAAATCCGCTACATGGGAACCATTGGCGGCAACGCCGCTAACGGCGATCCGGGCAACGATATGCCGGCCCTCATGCAGTGTCTCGGTGCGGCTTACGAACTCACCGGCCCTGAAGGTGCGCGCATAGTTGCTGCGCGAGATTACTATCAAGGTGCTTATTTCACGGCGATCGAGCCCGGTGAACTTCTTACAGCAATCCGAATTCCGGTGCCGCCCACCGGACACGGTTACGCTTACGAAAAACTGAAGCGGAAAATTGGCGACTATGCCACCGCCGCGGCGGCTGTCGTGCTGACGATGAGCGGCGGAAAATGTGTGACGGCATCGATCGGTCTCACCAATGTTGCGAACACACCGCTTTGGGCGGAAGAGGCCGGCAAGGTGCTGGTTGGCACGGCGCTCGACAAACCTGCGCTCGACAAGGCTGTAGCGCTGGCTGAGGCGATCACCGCTCCGGCGTCGGATGGCCGCGGGCCCGCAGAATATCGGACCAAGATGGCGGGTGTCATGCTGCGTCGTGCGGTCGAGCGGGCCAAGGCCCGCGCCAAGAATTAG

CoxS structural subunit sequence:

CP002827.1:29767-30267 Oligotropha carboxidovorans OM5 plasmid pHCG3, complete sequence

ATGGCGAAAGCCCATATCGAGTTGACGATCAACGGACATCCGGTGGAGGCACTGGTCGAACCGCGTACGCTGTTGATCCATTTCATTCGCGAGCAACAGAACCTTACCGGCGCACATATCGGCTGCGACACCAGCCACTGCGGCGCGTGTACTGTCGATCTCGATGGTATGTCGGTGAAGAGCTGCACAATGTTCGCTGTCCAGGCTAACGGGGCTTCAATCACCACGATTGAAGGCATGGCAGCACCGGATGGTACACTGAGTGCGCTGCAGGAAGGGTTCCGCATGATGCATGGTCTGCAATGCGGCTACTGCACTCCGGGGATGATCATGCGATCGCATCGCTTGCTGCAGGAGAATCCAAGCCCGACCGAAGCGGAAATACGCTTCGGCATCGGTGGAAATCTTTGCCGCTGCACCGGCTATCAGAACATTGTCAAAGCAATCCAGTATGCCGCCGCCAAGATCAATGGCGTACCTTTCGAGGAGGCCGCAGAATGA

CoxD maturation protein sequence:

CP002827.1:32748-33635 Oligotropha carboxidovorans OM5 plasmid pHCG3, complete sequence

ATGCGTCATCATGCTGAACGAGACAAGGTCGCCGAGAGGCTGGCCTATGCGGGCTATATCCCCGATCGCGATCTTGCGACCGCTGTTTGGCTGATGGAAAGCCTGTCGCGCCCGTTGTTGCTGGAAGGCGAAGCGGGTGTAGGCAAGACCGAGGTCGCGCTGACACTGGCGCAAGCGAACGGAGCAAGGCTCATTCGCTTGCAATGCTATGAGGGGCTCGATCAAAACGCGGCATTATACGAGTGGAACTACCAACGGCAGTTGCTGGCGATCAAAACACGGGAAAGTCGTGCGGACGCGGTAGATGTTATCGAGGATCATATTTTCTCGGAGAAGTTTCTGCTTGAGCGGCCGCTGTTGGCTGCAATACGTCAACCCAAATCGGCAGTGCTGCTAATTGATGAGGTTGACCGCGCCGACGAGGAGTTTGAGGCCTTTTTACTCGAACTGTTGTCGGATTATCAGGTTTCGATTCCCGAACTTGGCACAATCCATGCCACAACGATTCCACAGGTGATCCTGACATCCAATGGCACGCGTGAGTTATCAGATGCGTTGCGCCGGCGTTGTCTCTATCACTATGTCGACTATCCGGATGTTGAACGCGAGGCGCGTATCATCACCACACGGATGCCGAATATCGACGTTGCGCTGGCGTTGCAGATTGCCAGGATGATCGAGGGAATCCGAAAAGAGGATTTGCGCAAGAGTCCCGGCGTCGCGGAAACCCTCGACTGGGCGGCAGCATTGGCGGGGCTTGGCGTTGAGGATCTGCGCGCTGAACCCGAAGCTGTCTTTGAAACGATGATGTGCTTGATCAAGACAGTCGAAGATAAATCGCGCGTGACTCGCGAGGTTTCTGATCGGCTGCTGGGCAAGGTGGCATGA

CoxE maturation protein sequence:

CP002827.1:33637-34836 Oligotropha carboxidovorans OM5 plasmid pHCG3, complete sequence

ATGGTGGCAACTGCGGCCATTCATGAATCCAGCGCTGCTTCGGCAGGGGCTCGCCGCAAGCTTGGCGACTTTGTCCGAGTACTCCGGGACAATGGTTTCATTGTGGGGCTCGCGGAGGCTGGCGATGCGCTTACCGTGCTGAGCAGGCCTGCCTCTTTGACGCCGTCGCGTCTGCGACCGGCGCTCCGCGCATTGTTCTGCAGTAACAAGTCTGATTGGGAAAAGTTCGACGAGATTTTCGATGCGTTCTGGCTGGGGCGCGGCATGAAATCCGCAACGCGCATTTCGGGCGTGCTGCAGAAAAGTCCGCCCGGTATGGAGAGTTCAAGGAGTGGCGATCGGCCAGGTAATCCTGATGGGGCGCCAGATCATGTACAGCGGCGTATAGGCTTGGATCACGGCACCGATGAAAATAGTCCCGGCCTGCGGGAAGGTGCATCGCGCGCGGACTCGCTGGCCAAGGCTGATTTTCGTCATCTCACAAACCCGGACGATCTTGCTGCAGCTCATGCGGTAGCTGCAAGACTCGCAAAGGCGATGCGGGTGCGCTTAACCCGTCGCGAACAATCGCGCCGTACTGGCCGGCGTATCGACCTCCGCCGCACGATTCACAAAAATATTGCCCATGGAGGGATGCCGCTGGAGTTGGTCTGGCGACAACGCAAGCATAAACCATTACGGCTGGTCGTGCTGCTCGACGCGTCCGGATCTATGAGCATGTATTCGGCAGTATTCCTCCGGTTCATGCACGGGATTCTTGATAATTTTCGTGAGGCCGAGGCCTTCGTCTTCCATACGCGCCTCATTCATATTTCGCCCGCTTTGCGTGAGCGCGATGCGACACGTTCTGTGGAGCGTATGTCGCTGTTGGCGCAAGGCGTCGGTGGTGGCACCCGGATCGGTGAATCGCTTGCCACGTTCAATCGGTGGCATGCGAAGCGTGCAATTCATTCGCGCACTTGTGTGATGATCGTGTCCGACGGCTACGATACCGGGCCTGCCGAGCAACTGGAGCGAGAGATGTCGGCGCTGCGCCGTCGCTGTCGCCGTATCGCCTGGCTCAATCCGATGATCGGCTGGCGCGGCTATGCGCCAGAGGCAGCGGGGATGAAGGCGGCCCTGCCTCATGTCGACTTGTTTGCGCCCGCTCACAACCTCGAGAGCTTGCAAGCCATTGAGCCTTATCTGGCGAGGATTTGA

CoxF maturation protein sequence:

CP002827.1:34840-35682 Oligotropha carboxidovorans OM5 plasmid pHCG3, complete sequence

ATGACACCTACTCCTGACGTGCTCGATCTCGTCAACAATATGAAAGCCCGGGGTGAGCCGTTTGCCCTCGCAACGGTAGTGCGGACGGTATCACTCACCGCAGCCAAGGCAGGTGCAAAGGCTATTATTTTGAGCGACGGTACTATGACCGCCGGCTGGATCGGGGGCGGGTGTGCGCGGGCGAATGTGCTGAAGGCTGCGCGACAATCGCTTTCGGACGGCAAGCCGCGCCTGATTAGTGTACAGCCCAAGGACGTTCTTGAGGAACACGGTCTGACGGCAGGTGAGGCGCGAGAAGGTGTGCTCTATGCCAACAACATGTGCCCGAGCCATGGTACCATGGATATTTTTGTCGAGCCGATCTTGCCGCGTCCTCAGCTCTATATCTGTGGTGCATCGCCGGTTGCGGTGGCTATCGCGGCTATCGCACCGCGTATGGGATTTTTTGTGTCGGTATGCGCGCCCAAAGCAGATCACACGCTCTTTGGTGACACCGATAGGCTGATTGATGGTTATGAAATTCCCGCCGACAGCGGCACTAATCGTTATGTCGTTGTATCGACGCAGGGACGTGGCGATACTGCTGCGCTGAAATCCGCACTATCCACGCCATCCGTCTACGTGGCTTTCGTTGGCTCGCGTAAGAAAGCGTCGGTGTTGAGGGAAGAGCTTACCGTAGCAGGCATCGCGCCGTCGCTATTGGAAACATTGCACGCGCCTGCCGGCCTCGACCTCGGCGGTATCACGCCTGATGAAATCGCGCTCTCGATCGTAGCGGAGATGGTCGAGATACGTCGCCACGGGCAACGACAATCGGATAATCAGAAAGAAGGAACATCCTGA

CoxG maturation protein sequence:

CP002827.1:35682-36299 Oligotropha carboxidovorans OM5 plasmid pHCG3, complete sequence

ATGGATATGAACGCATCGCAGCGCATCGAAGCCTCGCGCGAAAAAGTCTACGCCGCGCTCAACGATGTTGAGGTGCTTAGGCCGTGCATTCCAGGCTGCGAGTCCATCGAAAAGATCTCTGATAGCGAGATGACTGCCAAGGTCACGTTGCGCATTGGCCCAGTGAAAGCATCTTTTACCGGCAAGGTGACCCTATCGGATCTCGATCCGCCAAACGGTTACACGATTGCAGGGGAGGGTACAGGCGGCATGGCGGGATTTGCCAAGGGCGGTGCTACGGTGAAACTCGAAGCGGATGGGACTGCGACGATTCTTCACTATACTGTTAAAGCTGATGTCGGCGGCAAACTGGCGCAGCTTGGTGGCCGGCTAATCGATGCGACCGCGACAAAACTTGCAGGAGAGTTTTTTGAAAAATTCGGCAATATTGTTGGGCCTGTCGTAGTCCAAGATGAAGAAGAGCCGGTTAAGAAGAAAGGCTGGCTCAAGAAGATCACTGGCGCTCTCAGTGTCCTTGTCTTTAGCATTTTATTAGGCGCGCACTGGTGTTGTATTGGCGGCCATGCTCACGCTCAGAACGATCCGCTGATGTTAGCGATCTGCTCGTCGCGAGTTTGA

| Gene | Genomic coordinates (NCBI) | Protein ID | Biological role | Assigned construct |
|------|----------------------------|------------|-----------------|--------------------|
| coxL | CP002827.1 (30264–32693) | AEI08106.1 | Catalytic subunit responsible for CO oxidation | Construct 1 (Structural) |
| coxM | CP002827.1 (28882–29748) | AEI08104.1 | FAD-binding subunit involved in electron transfer | Construct 1 (Structural) |
| coxS | CP002827.1 (29767–30267) | AEI08105.1 | Fe-S cluster-containing subunit for electron relay | Construct 1 (Structural) |
| coxD | CP002827.1 (32748–33635) | AEI08107.1 | Molybdenum cofactor insertion and enzyme maturation | Construct 2 (Maturation) |
| coxE | CP002827.1 (33637–34836) | AEI08108.1 | Assists in Mo-cofactor biosynthesis and assembly | Construct 2 (Maturation) |
| coxF | CP002827.1 (34840–35682) | AEI08109.1 | Active site processing and enzyme activation | Construct 2 (Maturation) |
| coxG | CP002827.1 (35682–36299) | AEI08110.1 | Sulfur ligand incorporation into the active site | Construct 2 (Maturation) |

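The coordinates in the table can be sanity-checked with simple arithmetic. This sketch assumes all seven genes are read on the plus strand of CP002827.1 with 1-based inclusive coordinates, and it uses the common rule-of-thumb average residue mass of about 110 Da, so the masses are rough estimates only. For coxL, 30264–32693 gives 2430 nt, 809 residues and roughly 89 kDa, close to the ~88 kDa quoted earlier; the listed coordinates also imply that coxS overlaps coxL by 4 nt and coxF overlaps coxG by 1 nt.

```python
# Coordinates copied from the table above (1-based, inclusive; plus strand assumed).
GENES = {
    "coxM": (28882, 29748), "coxS": (29767, 30267), "coxL": (30264, 32693),
    "coxD": (32748, 33635), "coxE": (33637, 34836), "coxF": (34840, 35682),
    "coxG": (35682, 36299),
}
AVG_RESIDUE_DA = 110  # rule-of-thumb average mass of an amino acid residue


def cds_stats(start: int, end: int) -> tuple[int, int, float]:
    """Return (nucleotides, amino acids excluding the stop codon, approximate kDa)."""
    nt = end - start + 1
    aa = nt // 3 - 1
    return nt, aa, aa * AVG_RESIDUE_DA / 1000


def overlaps(genes: dict[str, tuple[int, int]]) -> list[tuple[str, str, int]]:
    """Neighboring genes whose coordinate ranges overlap, with the overlap length in nt."""
    ordered = sorted(genes.items(), key=lambda item: item[1][0])
    found = []
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        if start_b <= end_a:
            found.append((name_a, name_b, end_a - start_b + 1))
    return found


for name, (start, end) in GENES.items():
    nt, aa, kda = cds_stats(start, end)
    print(f"{name}: {nt} nt, {aa} aa, ~{kda:.1f} kDa, in frame: {nt % 3 == 0}")
print(overlaps(GENES))
```

Overlapping genes are not a problem for this design, because each gene is placed in its own cassette, but it is worth confirming that each extracted CDS still ends with its own stop codon.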
Promoter sequences:

TobUbi.U4 proximal promoter:

The 263 bp proximal promoter region of the Ubi.U4 gene from Nicotiana tabacum was obtained based on the study by Genschik et al. (1994). This region spans −263 to −1 relative to the transcription start site (TSS) and contains key cis-regulatory elements involved in transcriptional regulation. Because the TSS (+1) is not annotated in the GenBank entry, it was determined from the promoter analysis in the original publication by Genschik et al. (1994), where the TSS was experimentally identified and illustrated in Figure 3. The nucleotide sequence was retrieved from the GenBank database (accession X77456.1) and corresponds to positions 575–837 of the N. tabacum Ubi.U4 gene.

> emb|X77456.1 :575-837 N.tabacum Ubi.U4 gene

ACTACGTTAGAGCGCTAACGAGAATACTTCATATACCGTATTTTTTACGATAATAATAATGTAATGTGAAATTGCTATCCAAAAGGCACCTAATTTTGTCCACCGTTCAAAGGAAAGGACAAGGAAGTAGTAGCGTGTAGGTTTGGTGCTGTACAAAATAAGCAAGACACGTGTTGCCTTATTATAGGATAATCCATAAGGCAATTTCGTCTTAAGTCGGCCATTGCACCTTTAAAAGGAGCCTCTTTGTTCCCAAAATCTTC

D100 chimeric promoter (Dahlia mosaic virus - DaMV):

The D100 promoter is a synthetic construct derived from the Dahlia mosaic virus (DaMV) genome, as described by Khadanga et al. (2021) based on the work of Sahoo et al. (2015). It is designed by combining an upstream activation sequence with a core promoter region to enhance transcriptional activity:

  • DaMV14UAS (−203 to −33): an upstream activation sequence acting as a transcriptional enhancer
  • A short linker sequence (CCCGAC)
  • DaMV4CP (−474 to +82): a core promoter region required for basal transcription

The source promoter region corresponds to a 706 bp fragment (6579–7280) of the DaMV genome (GenBank: JX272320.1), with the transcription start site (TSS, +1) located at position 7053 based on coordinate mapping.

The following sequences were extracted based on coordinate mapping:

DaMV14UAS (−203 to −33):

> gb|JX272320.1|:6850-7020 Dahlia mosaic virus clone pDaMV-p2, complete genome

TCTGCAAACAGCTCAAAAAGCTACTGGCCGACAATCATAATTGCTCGGCATGTGCAGGTGGGGCCTCCACTAGCAATAATACAAGCTTTACAGCTTGCAGTGACTCATCCTCCAATAATGAGGAAAAAGACGTCAGCAGTGACGAACAAGGGCCTGAAGACTTGCCTATAT

DaMV4CP (−474 to +82):

> gb|JX272320.1|:6579-7134 Dahlia mosaic virus clone pDaMV-p2, complete genome

GAATTCAATCCTCCTCAGGAAATGAAGGATTCAGGAGATCTTCTCTATCAACTTGCTCAAGTAAGGACAAACGGGTTCACCCGGATCCTCCAGAAGACCCAGTCTATCAACGGAGAAACAAAGATAAAAATCAATTACTCACATGAAAGAGTATTGATCACGAGTCACTATGGAGCGACAATCTCCAGACAGGATGTCAGCATCTTATCTTCCTTTGAAGAAAGCATCATCAATAACGATGTAATGGTGGGGACATCCACTAAGTTATTGCTCTGCAAACAGCTCAAAAAGCTACTGGCCGACAATCATAATTGCTCGGCATGTGCAGGTGGGGCCTCCACTAGCAATAATACAAGCTTTACAGCTTGCAGTGACTCATCCTCCAATAATGAGGAAAAAGACGTCAGCAGTGACGAACAAGGGCCTGAAGACTTGCCTATATAATGGCATTCACCCCTCAGTTGAAGAGCATCAGGAGTTTCAGCATAGAAACTTTCTCTTTAACAAATCTATCTTTTCTTTAAAGCATGTGTGAGTAGAAACCCATATAGGGTTA
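The coordinate mapping used for these extractions can be made explicit. This small sketch assumes the standard promoter convention that numbering runs from −1 directly to +1 (there is no position 0). With the TSS at 7053 it reproduces the 6850–7020 and 6579–7134 windows above, and with a TSS at 838 (implied by positions 575–837 for −263 to −1) it gives the Ubi.U4 window. The function names are illustrative only.

```python
def rel_to_abs(tss: int, rel: int) -> int:
    """Map a TSS-relative position to a 1-based absolute coordinate.

    Promoter numbering has no position 0: -1 is the base just before the TSS
    and +1 is the TSS itself.
    """
    if rel == 0:
        raise ValueError("TSS-relative numbering has no position 0")
    return tss + rel if rel < 0 else tss + rel - 1


def window(tss: int, rel_start: int, rel_end: int) -> tuple[int, int]:
    """Absolute (start, end) coordinates, both inclusive, for a relative window."""
    return rel_to_abs(tss, rel_start), rel_to_abs(tss, rel_end)


def extract(genome: str, tss: int, rel_start: int, rel_end: int) -> str:
    """Slice a 1-based inclusive window out of a plain sequence string."""
    start, end = window(tss, rel_start, rel_end)
    return genome[start - 1:end]


print(window(7053, -203, -33))  # (6850, 7020): DaMV14UAS
print(window(7053, -474, 82))   # (6579, 7134): DaMV4CP
```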

Initially, the promoter sequence was reconstructed from the GenBank coordinates. However, slight discrepancies were observed when it was compared with the promoter structure illustrated in the published figure. Therefore, the final D100 promoter sequence was generated with a Gemini AI tool from the figure in Khadanga et al. (2021), so that it reflects the reported experimental construct:

GCTCTGCAAACAGCTCAAAAAGCTACTGGCCGACAATCATAATTGCTCGGCATGTGCAGGTGGGGCCTCCACTAGCAATAATACAAGCTTTACAGCTTGCAGTGACTCATCCTCCAATAATGAGGAAAAAGACGTCAGCAGTGACGAACAAGGGCCTGAAGACTTGCCcccgacAATCCTCCTCAGGAAATGAAGGATTCAGGAGATCTTCTCTATCAACTTGCTCAAGTAAGGACAAACGGGTTCACCCGGATCCTCCAGAAGACCCAGTCTATCAACGGAGAAACAAAGATAAAAATCAATTACTCACATGAAAGAGTATTGATCACGAGTCACTATGGAGCGACAATCTCCAGACAGGATGTCAGCATCTTATCTTCCTTTGAAGAAAGCATCATCAATAACGATGTAATGGTGGGGACATCCACTAAGTTATTGCTCTGCAAACAGCTCAAAAAGCTACTGGCCGACAATCATAATTGCTCGGCATGTGCAGGTGGGGCCTCCACTAGCAATAATACAAGCTTTACAGCTTGCAGTGACTCATCCTCCAATAATGAGGAAAAAGACGTCAGCAGTGACGAACAAGGGCCTGAAGACTTGCCTATATAATGGCATTCACCCCTCAGTTGAAGAGCATCAGGAGTTTCAGCATAGAAACTTTCTCTTTAACAAATCTATCTTTTCTTTAAAGCATGTGTGAGTAGAAACCCATATAGGGTTATAATGT
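The "slight discrepancies" between the coordinate-derived D100 and the final sequence can be listed base by base, which makes each difference easy to check against the published figure. This sketch uses Python's standard `difflib`; it assumes the coordinate-derived version is built as DaMV14UAS + CCCGAC + DaMV4CP from the extracts above (for example `d100_from_coordinates = uas + "CCCGAC" + core`, hypothetical variable names). It only reports differences and does not decide which version is correct.

```python
from difflib import SequenceMatcher


def diff_report(reference: str, candidate: str) -> list[tuple[str, int, str, str]]:
    """List every non-identical block between two DNA sequences (case-insensitive).

    Each entry is (operation, 1-based position in reference, reference bases, candidate bases).
    """
    ref, cand = reference.upper(), candidate.upper()
    matcher = SequenceMatcher(None, ref, cand, autojunk=False)
    return [
        (tag, i1 + 1, ref[i1:i2], cand[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(diff_report("ACGTACGT", "ACGTTCGT"))  # [('replace', 5, 'A', 'T')]
```

`SequenceMatcher` is not a biological aligner; it is adequate for near-identical sequences with a handful of differences, while larger rearrangements are better examined with a dedicated pairwise alignment tool.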

S100 chimeric promoter (Strawberry vein banding virus, SVBV):

The S100 promoter is a synthetic chimeric construct derived from the Strawberry vein banding virus (SVBV), as described by Khadanga et al. (2021) based on Pattanaik et al. (2004). It is designed by combining an upstream activation sequence with a core promoter region to enhance transcriptional activity.

  1. SV10UAS (250 bp) (−352 to −102): the upstream activation sequence, which contains the major regulatory elements contributing to transcriptional enhancement.
  2. Linker (CCCGAC): a synthetic 6 bp linker inserted between the enhancer and the core promoter, similar to the design used in the D100 promoter.
  3. SV10CP (371 bp) (−352 to +19): the core promoter fragment (also referred to as SVBVFLt10), containing the TATA box (around −30) and the transcription start site (TSS, +1) required for transcription initiation.

The S100 promoter sequence was directly extracted from Figure 1 of Pattanaik et al. (2004), where the nucleotide sequence is explicitly provided in text format, and assembled in this order [SV10UAS] + [CCCGAC linker] + [SV10CP]:

GAAGCCCGCTTTACAAGTGGCCAGCTAGCTATCACTGAAAAGACAGCAAGACAATGGTGTCTCGATGCACCAGAACCACATCTTTGCAGCAGATGTGAAGCAGCCAGAGTGGTCCACAAGACGCACTCAGAAAAGGCATCTTCTACCGACACAGAAAAAGACAACCACAGCTCATCATCCAACATGTAGACTGTCGTTATGCGTCGGCTGAAGATAAGACTGACCCCAGGCCAGCACTAAAGAAGAAATAAcccgacGAAGCCCGCTTTACAAGTGGCCAGCTAGCTATCACTGAAAAGACAGCAAGACAATGGTGTCTCGATGCACCAGAACCACATCTTTGCAGCAGATGTGAAGCAGCCAGAGTGGTCCACAAGACGCACTCAGAAAAGGCATCTTCTACCGACACAGAAAAAGACAACCACAGCTCATCATCCAACATGTAGACTGTCGTTATGCGTCGGCTGAAGATAAGACTGACCCCAGGCCAGCACTAAAGAAGAAATAATGCAAGTGGTCCTAGCTCCACTTTAGCTTTAATAATTATGTTTCATTATTATTCTCTGCTTTTGCTCTCTATATAAAGAGCTTGTATTTTCATTTGAAGGCAGAGGCGAACACACACACA
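The [enhancer] + [linker] + [core] assembly used for S100 (and D100) can be written as a small helper that also records where each part starts and ends, which is convenient when annotating the construct in a sequence editor. This is a hypothetical sketch, not a function of any named tool; it writes the linker in lowercase as in the sequences above, and for a direct fusion such as the SM promoter one would pass `linker=""`.

```python
VALID_BASES = set("ACGT")


def assemble_chimeric(enhancer: str, core: str,
                      linker: str = "CCCGAC") -> tuple[str, list[tuple[int, int]]]:
    """Join enhancer + linker + core, writing the linker in lowercase.

    Returns the assembled sequence and the 1-based inclusive (start, end) span of
    each non-empty part, e.g. for feature annotation.
    """
    parts = [p for p in (enhancer.upper(), linker.lower(), core.upper()) if p]
    for part in parts:
        bad = set(part.upper()) - VALID_BASES
        if bad:
            raise ValueError(f"non-ACGT characters found: {sorted(bad)}")
    spans, pos = [], 1
    for part in parts:
        spans.append((pos, pos + len(part) - 1))
        pos += len(part)
    return "".join(parts), spans


sequence, spans = assemble_chimeric("aaaa", "GGGGG")
print(sequence, spans)  # AAAAcccgacGGGGG [(1, 4), (5, 10), (11, 15)]
```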

DaMVFLt4 promoter (556 bp):

The DaMV4CP fragment corresponds to a natural promoter region derived from the Dahlia mosaic virus (DaMV). It consists of a 556 bp sequence spanning positions −474 to +82 relative to the transcription start site (TSS), according to Sahoo et al. (2014).

This fragment was directly extracted from the DaMV genome available in the GenBank database (accession: JX272320.1), corresponding to genomic coordinates 6579–7134.

> gb|JX272320.1|:6579-7134 Dahlia mosaic virus clone pDaMV-p2, complete genome

GAATTCAATCCTCCTCAGGAAATGAAGGATTCAGGAGATCTTCTCTATCAACTTGCTCAAGTAAGGACAAACGGGTTCACCCGGATCCTCCAGAAGACCCAGTCTATCAACGGAGAAACAAAGATAAAAATCAATTACTCACATGAAAGAGTATTGATCACGAGTCACTATGGAGCGACAATCTCCAGACAGGATGTCAGCATCTTATCTTCCTTTGAAGAAAGCATCATCAATAACGATGTAATGGTGGGGACATCCACTAAGTTATTGCTCTGCAAACAGCTCAAAAAGCTACTGGCCGACAATCATAATTGCTCGGCATGTGCAGGTGGGGCCTCCACTAGCAATAATACAAGCTTTACAGCTTGCAGTGACTCATCCTCCAATAATGAGGAAAAAGACGTCAGCAGTGACGAACAAGGGCCTGAAGACTTGCCTATATAATGGCATTCACCCCTCAGTTGAAGAGCATCAGGAGTTTCAGCATAGAAACTTTCTCTTTAACAAATCTATCTTTTCTTTAAAGCATGTGTGAGTAGAAACCCATATAGGGTTA

SM chimeric hybrid promoter (SUAS + MUAS fusion):

The SM promoter is a synthetic chimeric hybrid promoter constructed by combining regulatory elements from two plant viruses, as described by Kumari et al. (2024). It integrates an upstream activation sequence from Sugarcane bacilliform virus with an enhancer domain from Mirabilis mosaic virus to enhance transcriptional activity.

  1. SUAS (SCBV Upstream Activation Sequence): This fragment corresponds to the upstream activation sequence (UAS) derived from Sugarcane bacilliform virus (SCBV), as described by Davies et al. (2014). The selected region spans −434 to −153 relative to the transcription start site (TSS), giving a 282 bp fragment that functions as a transcriptional enhancer.
  2. MUAS (MMV Upstream Activation Sequence): This fragment corresponds to the transcriptional enhancer domain of the full-length transcript (FLt) promoter of Mirabilis mosaic virus (MMV), as reported by Dey & Maiti (1999). The sequence spans −297 to −38 relative to the TSS, has a total length of 259 bp, and contributes strong enhancer activity.

To find the first fragment, SUAS, I first mapped both boundaries of the 839 bp SCBV promoter using the SCBV-F primer anchor (ATTGAATGG) and the complement of the SCBV-R primer (GAATTACACCTTTCCGCA) against the Sugarcane bacilliform virus (SCBV) Ireng Maleng isolate sequence (accession AJ277091). This confirmed the full span of the mother fragment, from relative coordinate −770 to +69.

Next, I identified the transcription start site (TSS) based on the underlined leader sequence reported in Figure 2 of Davies et al. (2014). This placed the TSS (+1) at the 7528th nucleotide of the SCBV Ireng Maleng isolate sequence:

7528 ATC GGTAGTTCAC CACATGAGTA TTTGAGTCAA 7560

To isolate the SUAS domain for the SM promoter, which the sources define as the segment from relative coordinates −434 to −153, I mapped these relative coordinates back from the TSS to absolute positions within the 839 bp mother fragment. This yielded the exact 282 bp enhancer sequence that is joined directly upstream of the MMV-derived MUAS fragment to build the chimeric SM promoter:

> emb|AJ277091.1|:7094-7375 Sugarcane bacilliform IM virus complete genome, isolate Ireng Maleng

GAACACCGTTCGAGTGTCATCGACAGGCCAAGGCCAACAGATGATCATTTCAGACCATGGGGGGATGTTACATACTGGCTGAATAAAGAAGCAGAAGAGTGCCACACAAGGGGCGACAACGTCGAAGGCGCAGAAGACGCAGTCGATCTCACTGACGTAAGCAATGACGACCAGTGGAGGAGATCGTAAGCAATGACGTATGGAGCGTGGAGGACCCATGAAAGCACTGAGAAGGCATCTCAACTTTCGGTGTGTGAGTGCGCATCCTATGCGATGCTTTGT
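The relative-to-absolute mapping above can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual promoter convention implied by the text: +1 is the TSS and there is no position 0, so −1 is the base immediately upstream of +1. The function names are hypothetical.

```python
def rel_to_abs(tss_abs: int, rel: int) -> int:
    """Convert a TSS-relative coordinate to an absolute genome position.

    Assumes no position 0: +1 is the TSS itself, -1 is the base just upstream.
    """
    if rel == 0:
        raise ValueError("relative coordinate 0 is undefined in this convention")
    return tss_abs + rel if rel < 0 else tss_abs + rel - 1


def span_length(rel_start: int, rel_end: int) -> int:
    """Inclusive length of a TSS-relative interval (no position 0)."""
    length = rel_end - rel_start + 1
    # An interval that crosses the TSS skips the non-existent position 0.
    if rel_start < 0 < rel_end:
        length -= 1
    return length


# SCBV (AJ277091): TSS (+1) at nucleotide 7528; SUAS spans -434..-153.
TSS = 7528
start, end = rel_to_abs(TSS, -434), rel_to_abs(TSS, -153)
print(start, end, span_length(-434, -153))  # 7094 7375 282
```

The same helper also reproduces the 839 bp mother fragment, because `span_length(-770, 69)` returns 839.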

To find the second fragment, MUAS, I used the Mirabilis mosaic virus (MMV) full-length transcript (FLt) promoter described by Dey and Maiti (1999) as the source. The original study provides the nucleotide sequence as a printed figure (Figure 1) rather than as a GenBank accession number. For that reason, I used a transcription of that figure, generated with the Gemini AI tool, as my primary reference.

I then set the transcription start site (TSS, +1) as the anchor point. The authors mapped it by primer extension to a guanine (G) residue located 24 nucleotides downstream of the TATATAA box. The MUAS fragment spans relative coordinates −297 to −38. To isolate it, I counted upstream from the TSS to the nucleotide at −297 and extracted the sequence through the nucleotide at −38. This yielded the MUAS enhancer domain (reported as 259 bp) used to construct the SM, BM, and MSD3 chimeric promoters:

TTCGTCCACAGACATCAACATCTTATCGTCCTTTGAAGATAAGATAATAATGTTGAAGATAAGAGTGGGAGCCACCACTAAAACATTGCTTTGTCAAAAGCTAAAAAAGATGATGCCCGACAGCCACTTGTGTGAAGCATGTGAAGCCGGTCCCTCCACTAAGAAAATTAGTGAAGCATCTTCCAGTGGTCCCTCCACTCACAGCTCAATCAGTGAGCAACAGGACGAAGGAAATGACGTAAGCCATGACGTCTAATCCC
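Counting upstream from the TSS in a printed sequence amounts to a string slice once the TSS index is known. The helper below is a hypothetical sketch. It assumes that `tss_index` is the 1-based position of the +1 nucleotide within the string and that there is no position 0.

Under that convention, −297 to −38 covers 297 − 38 + 1 = 260 positions. Comparing `len()` of the extracted string with the 259 bp figure is therefore a worthwhile check.

```python
def extract_relative(seq: str, tss_index: int, rel_start: int, rel_end: int) -> str:
    """Return the inclusive TSS-relative interval rel_start..rel_end from seq.

    tss_index is the 1-based position of the +1 nucleotide within seq;
    coordinates follow the no-zero convention (-1 is just upstream of +1).
    """
    if 0 in (rel_start, rel_end):
        raise ValueError("relative coordinate 0 is undefined in this convention")

    def to_index(rel: int) -> int:
        # 1-based index inside seq; negative coordinates sit upstream of the TSS.
        return tss_index + rel if rel < 0 else tss_index + rel - 1

    i, j = to_index(rel_start), to_index(rel_end)
    if i < 1 or j > len(seq) or i > j:
        raise ValueError("requested interval falls outside the sequence")
    return seq[i - 1 : j]  # 1-based inclusive -> Python slice


# Toy example: five upstream bases (-5..-1), then the TSS "G" at index 6.
toy = "ACGTAGCCC"
print(extract_relative(toy, tss_index=6, rel_start=-4, rel_end=-2))  # CGT
```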

The SM promoter was generated by fusing the SUAS fragment directly upstream of the MUAS enhancer sequence, as described by Kumari et al. (2024a). The MUAS portion uses the source sequence from Dey & Maiti (1999):

GAACACCGTTCGAGTGTCATCGACAGGCCAAGGCCAACAGATGATCATTTCAGACCATGGGGGGATGTTACATACTGGCTGAATAAAGAAGCAGAAGAGTGCCACACAAGGGGCGACAACGTCGAAGGCGCAGAAGACGCAGTCGATCTCACTGACGTAAGCAATGACGACCAGTGGAGGAGATCGTAAGCAATGACGTATGGAGCGTGGAGGACCCATGAAAGCACTGAGAAGGCATCTCAACTTTCGGTGTGTGAGTGCGCATCCTATGCGATGCTTTGTTTCGTCCACAGACATCAACATCTTATCGTCCTTTGAAGATAAGATAATAATGTTGAAGATAAGAGTGGGAGCCACCACTAAAACATTGCTTTGTCAAAAGCTAAAAAAGATGATGCCCGACAGCCACTTGTGTGAAGCATGTGAAGCCGGTCCCTCCACTAAGAAAATTAGTGAAGCATCTTCCAGTGGTCCCTCCACTCACAGCTCAATCAGTGAGCAACAGGACGAAGGAAATGACGTAAGCCATGACGTCTAATCCC
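Because the SM promoter is a direct concatenation, it can be assembled programmatically. This also records where each fragment starts and ends, which is handy for annotating the junction later. The sketch below is minimal: the names are hypothetical, and short toy strings stand in for the full SUAS and MUAS sequences.

```python
from typing import List, Tuple


def fuse(parts: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Concatenate named fragments 5'->3' and report 1-based inclusive coordinates."""
    seq_parts, features, pos = [], [], 1
    for name, frag in parts:
        frag = frag.upper()
        seq_parts.append(frag)
        features.append((name, pos, pos + len(frag) - 1))
        pos += len(frag)
    return "".join(seq_parts), features


# Toy stand-ins; substitute the full SUAS and MUAS strings above.
promoter, feats = fuse([("SUAS", "GAACACC"), ("MUAS", "TTCGTCC")])
print(promoter)  # GAACACCTTCGTCC
print(feats)     # [('SUAS', 1, 7), ('MUAS', 8, 14)]
```

The same call with the BUAS and MUAS strings gives the BM promoter together with its fragment boundaries.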

BM chimeric hybrid promoter (BUAS + MUAS fusion):

The BM promoter is a synthetic chimeric hybrid promoter constructed by fusing two regulatory elements, as described by Kumari et al. (2024a). It combines an upstream activation sequence from Banana streak virus with an enhancer domain from Mirabilis mosaic virus to enhance transcriptional efficiency.

  1. BUAS (BSV Upstream Activation Sequence): This fragment corresponds to the Upstream Activation Sequence (UAS) derived from Banana streak virus (BSV), as reported by Remans et al. (2005). The selected region spans −1150 bp to −33 bp relative to the transcription start site (TSS), giving an expected length of approximately 1117 bp. This region functions as a strong transcriptional enhancer.
  2. MUAS (MMV Upstream Activation Sequence): This sequence corresponds to the transcriptional enhancer domain derived from the full-length transcript (FLt) promoter of Mirabilis mosaic virus (MMV). It is identical to the MUAS element used in the SM promoter and contributes additional transcriptional activation capacity.

To find the first fragment BUAS, I first identified the source as the Banana streak virus (BSV) Cavendish isolate, which corresponds to GenBank accession AF215815. Although the current database entry for this accession may show a length of 1,287 bp, I noted that the sources utilize a 1,304 bp synthesized version of this isolate spanning from relative coordinates −1,150 to +154.

Next, I used the BSV-F primer anchor sequence (GGTTGCATGGAAGG) to locate the start of the promoter region within the GenBank record. This exact sequence occurs at the very start of the record, so nucleotide 1 of the GenBank entry corresponds to relative coordinate −1,150.

I then located the transcription start site (TSS, +1) by mapping the relative coordinates onto the absolute indices of the 1,304 bp sequence. Because 1,150 bases lie upstream of the start site, the TSS is nucleotide 1151. The sources define the BUAS domain as the segment from −1,150 to −33, so its end index is 1151 − 33 = 1118. Extracting nucleotides 1 to 1118 therefore gives a 1,118 bp enhancer fragment (reported as approximately 1,117 bp), which is used to construct the BM chimeric promoter:

> gb|AF215815.1|:1-1118 Banana streak virus ORF III polyprotein gene, partial cds

GGTTGCATGGAAGGTTGGGGAGGAGTTTGTAAATGGAAAGAACAATCAGGACAACCAAGATGGTCAGAGAAGATTTGTGCTTATGCGAGTGGAAAGTTTAATCCGATCAAGAGCACAATTGATGCAGAAATTCAAGCAGTCATCAACAGCTTGGATAAATTCAAGATATATTATCTTGATAAAAAGGAGTTGATCATCAGGACGGATAGTCAAGCGATAGTCAGTTTCTACAAGAAGAGTAGTGACCACAAACCCTCAAGGGTAAGATGGTTAGCTTTCACTGACTATATCACTGGAACAGGATTGGATGTGAAGTTTGAGCATATTGACGGCAAGGATAATGTGCTAGCAGACACTCTGTCAAGGCTAGTAAAAATCATATGCCACAAGGAGAAACATCCATCAGAAACAATATTGATCAACGTTGCAGAAGAAATACTTCAGAAAGGAAGTATTGGAGCAAAAAGAAAGTTGGGAGAAATGATAAGTGGATATGAAGCTTGGATGACAAGAATCCAAGAACACAAAATCAAGACACTAACACTTATCGAAAAACCAGTTTTTAAATGTGGTTGCAGGAAACCTGCTAGGCTTCACACGTCCAGGACATCAAGAAATCCGGGAAGAGAATTTTACTCATGTGAAAATAAAGCATGTTTCACTTGGGTATGGAAGGATCAGATTGATGAATACGTTCAAGAAGTGATGACGTGGAACGACCAAGTAAGCCAGTTGCCAGAAGAACCAGAAGGCTACAATGAAGGATGCACGATTGAAGACGCATTCGATCTGCTAGACGTCAGCAATGACGATCAATGGGCAAGGTCGTAAGCCATGACGTAGCGGAAGTGATGGACCCCATACCACTGGATGGCACTAACCAGTGTGACAAGGATACGAGATGCCAAGTGAGCTGGATAGCACTCACTTTATGTAAAGAGTGGTCTGCGTACCAACTCCACTATAGTCTGTCTGAGGTGCGATGCTGTGTCACGCACAAAGACTTTAGATTCCTTTGCGTGAGATGTACGCAAAGCAGTGTGTCCAGAGTGTGCTGTGACGCGTCCCTTGCATTATTGGTGGGTGCACCTAACGATGCGGGAAGCCGAACTCCCTCT
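The BUAS index arithmetic can be written out explicitly. This sketch assumes, as stated above, that nucleotide 1 of the record is −1,150 and that there is no position 0. The anchor check uses only the BSV-F sequence quoted in the text, and the helper names are hypothetical.

```python
UPSTREAM_BASES = 1150             # bases -1150..-1 precede the TSS
BSV_F_ANCHOR = "GGTTGCATGGAAGG"   # BSV-F primer anchor quoted in the text


def buas_bounds(upstream: int, rel_end: int) -> tuple:
    """Return 1-based (start, end, tss) indices for the BUAS slice."""
    tss = upstream + 1            # +1 follows the upstream bases
    end = tss + rel_end           # rel_end is negative, e.g. -33
    return 1, end, tss


def check_anchor(record: str) -> bool:
    """True if the record begins with the BSV-F anchor (i.e. nucleotide 1 = -1150)."""
    return record.upper().startswith(BSV_F_ANCHOR)


start, end, tss = buas_bounds(UPSTREAM_BASES, -33)
print(tss, end, end - start + 1)            # 1151 1118 1118
print(check_anchor("GGTTGCATGGAAGGTTGGGG"))  # True
```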

The BM promoter was generated by directly fusing the BUAS fragment upstream of the MUAS enhancer sequence, as described by Kumari et al., (2024):

GGTTGCATGGAAGGTTGGGGAGGAGTTTGTAAATGGAAAGAACAATCAGGACAACCAAGATGGTCAGAGAAGATTTGTGCTTATGCGAGTGGAAAGTTTAATCCGATCAAGAGCACAATTGATGCAGAAATTCAAGCAGTCATCAACAGCTTGGATAAATTCAAGATATATTATCTTGATAAAAAGGAGTTGATCATCAGGACGGATAGTCAAGCGATAGTCAGTTTCTACAAGAAGAGTAGTGACCACAAACCCTCAAGGGTAAGATGGTTAGCTTTCACTGACTATATCACTGGAACAGGATTGGATGTGAAGTTTGAGCATATTGACGGCAAGGATAATGTGCTAGCAGACACTCTGTCAAGGCTAGTAAAAATCATATGCCACAAGGAGAAACATCCATCAGAAACAATATTGATCAACGTTGCAGAAGAAATACTTCAGAAAGGAAGTATTGGAGCAAAAAGAAAGTTGGGAGAAATGATAAGTGGATATGAAGCTTGGATGACAAGAATCCAAGAACACAAAATCAAGACACTAACACTTATCGAAAAACCAGTTTTTAAATGTGGTTGCAGGAAACCTGCTAGGCTTCACACGTCCAGGACATCAAGAAATCCGGGAAGAGAATTTTACTCATGTGAAAATAAAGCATGTTTCACTTGGGTATGGAAGGATCAGATTGATGAATACGTTCAAGAAGTGATGACGTGGAACGACCAAGTAAGCCAGTTGCCAGAAGAACCAGAAGGCTACAATGAAGGATGCACGATTGAAGACGCATTCGATCTGCTAGACGTCAGCAATGACGATCAATGGGCAAGGTCGTAAGCCATGACGTAGCGGAAGTGATGGACCCCATACCACTGGATGGCACTAACCAGTGTGACAAGGATACGAGATGCCAAGTGAGCTGGATAGCACTCACTTTATGTAAAGAGTGGTCTGCGTACCAACTCCACTATAGTCTGTCTGAGGTGCGATGCTGTGTCACGCACAAAGACTTTAGATTCCTTTGCGTGAGATGTACGCAAAGCAGTGTGTCCAGAGTGTGCTGTGACGCGTCCCTTGCATTATTGGTGGGTGCACCTAACGATGCGGGAAGCCGAACTCCCTCTTTCGTCCACAGACATCAACATCTTATCGTCCTTTGAAGATAAGATAATAATGTTGAAGATAAGAGTGGGAGCCACCACTAAAACATTGCTTTGTCAAAAGCTAAAAAAGATGATGCCCGACAGCCACTTGTGTGAAGCATGTGAAGCCGGTCCCTCCACTAAGAAAATTAGTGAAGCATCTTCCAGTGGTCCCTCCACTCACAGCTCAATCAGTGAGCAACAGGACGAAGGAAATGACGTAAGCCATGACGTCTAATCCC

MSD3 chimeric deletion-hybrid promoter (MUAS + SD3):

The MSD3 promoter is a “deletion-hybrid” construct composed of the following two fragments joined directly together, as described by Kumari et al. (2024b):

  1. MUAS (MMV Upstream Activation Sequence): This is the same sequence of the transcriptional enhancer domain isolated from the Mirabilis mosaic virus (MMV) full-length transcript (FLt) promoter, as used in SM and BM promoters.
  2. SD3 (SCBV Deletion Fragment 3): This fragment is a truncated promoter region derived from Sugarcane bacilliform virus (SCBV), as described by Davies et al. (2014). The SD3 sequence corresponds to the region spanning −340 bp to +69 bp relative to the transcription start site, resulting in a fragment of 409 bp. This region retains the core promoter elements required for basal transcription. The SD3 fragment was extracted from the SCBV genome (GenBank accession: AJ277091.1, positions 7188–7597):

> emb|AJ277091.1|:7188-7597 Sugarcane bacilliform IM virus complete genome, isolate Ireng Maleng

AAGAGTGCCACACAAGGGGCGACAACGTCGAAGGCGCAGAAGACGCAGTCGATCTCACTGACGTAAGCAATGACGACCAGTGGAGGAGATCGTAAGCAATGACGTATGGAGCGTGGAGGACCCATGAAAGCACTGAGAAGGCATCTCAACTTTCGGTGTGTGAGTGCGCATCCTATGCGATGCTTTGTACCTTTGTTAGCTGTGTGTGTCCTTTTGGCATCTGTGCCACTTTACCTTTGTCGGCCACGTTGCCTTTGCTTAGCATCTACGCAAGCATAGCGCTCGGCTGGTGTGTGTTCCCTCTGCCTATATAAGGCATGGTTGTATGACTCTTACACTCATCGGTAGTTCACCACATGAGTATTTGAGTCAAGTTTGGCTTGAATAATAAGAATTACACCTTTCCGCAA
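The SD3 genomic coordinates follow from the same TSS (nucleotide 7528). The check below assumes the no-zero convention used throughout. Under it, −340 maps to 7188 and +69 maps to 7596, which gives 409 bp. The quoted interval 7188–7597 spans 410 positions, so the 3′ end of the extracted record is worth confirming.

```python
TSS = 7528  # SCBV TSS (+1) in AJ277091, as determined above


def rel_to_abs(rel: int) -> int:
    # No position 0: -1 is TSS - 1 and +1 is the TSS itself.
    return TSS + rel if rel < 0 else TSS + rel - 1


start, end = rel_to_abs(-340), rel_to_abs(69)
print(start, end, end - start + 1)  # 7188 7596 409
print(7597 - 7188 + 1)              # 410 positions in the quoted interval
```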

The final MSD3 promoter was obtained by direct assembly of the MUAS enhancer upstream of the SD3 core promoter fragment:

TTCGTCCACAGACATCAACATCTTATCGTCCTTTGAAGATAAGATAATAATGTTGAAGATAAGAGTGGGAGCCACCACTAAAACATTGCTTTGTCAAAAGCTAAAAAAGATGATGCCCGACAGCCACTTGTGTGAAGCATGTGAAGCCGGTCCCTCCACTAAGAAAATTAGTGAAGCATCTTCCAGTGGTCCCTCCACTCACAGCTCAATCAGTGAGCAACAGGACGAAGGAAATGACGTAAGCCATGACGTCTAATCCCAAGAGTGCCACACAAGGGGCGACAACGTCGAAGGCGCAGAAGACGCAGTCGATCTCACTGACGTAAGCAATGACGACCAGTGGAGGAGATCGTAAGCAATGACGTATGGAGCGTGGAGGACCCATGAAAGCACTGAGAAGGCATCTCAACTTTCGGTGTGTGAGTGCGCATCCTATGCGATGCTTTGTACCTTTGTTAGCTGTGTGTGTCCTTTTGGCATCTGTGCCACTTTACCTTTGTCGGCCACGTTGCCTTTGCTTAGCATCTACGCAAGCATAGCGCTCGGCTGGTGTGTGTTCCCTCTGCCTATATAAGGCATGGTTGTATGACTCTTACACTCATCGGTAGTTCACCACATGAGTATTTGAGTCAAGTTTGGCTTGAATAATAAGAATTACACCTTTCCGCAA

M24 synthetic promoter (MMV-derived):

The M24 promoter is a synthetic high-expression promoter derived from the Mirabilis mosaic virus (MMV), as described by Sahoo et al. (2014). It was engineered to enhance transcriptional activity in plant systems. Based on the full-length transcript (FLt) promoter of MMV, the promoter was strengthened by duplicating its upstream enhancer domains, which significantly increased transcriptional strength.

The M24 promoter sequence was retrieved from the binary vector pSiM24 available in GenBank (accession: KF032933.1). The promoter corresponds to the region spanning positions 235–860 of the vector sequence.

> KF032933.1:235-860 Binary vector pSiM24, complete sequence

TTCGTCCACAGACATCAACATCTTATCGTCCTTTGAAGATAAGATAATAATGTTGAAGATAAGAGTGGGAGCCACCACTAAAACATTGCTTTGTCAAAAGCTAAAAAAGATGATGCCCGACAGCCACTTGTGTGAAGCATGTGAAGCCGGTCCCTCCACTAAGAAAATTAGTGAAGCATCTTCCAGTGGTCCCTCCACTCACAGCTCAATCAGTGAGCAACAGGACGAAGGAAATGACGTAAGCCATGACGTCTAATCCCCCCAACTTCGTCCACAGACATCAACATCTTATCGTCCTTTGAAGATAAGATAATAATGTTGAAGATAAGAGTGGGAGCCACCACTAAAACATTGCTTTGTCAAAAGCTAAAAAAGATGATGCCCGACAGCCACTTGTGTGAAGCATGTGAAGCCGGTCCCTCCACTAAGAAAATTAGTGAAGCATCTTCCAGTGGTCCCTCCACTCACAGCTCAATCAGTGAGCAACAGGACGAAGGAAATGACGTAAGCCATGACGTCTAATCCCACAAGAATTTCCTTATATAAGGAACACAAATCAGAAGGAAGAGATCAATCGAAATCAAAATCGGAATCGAAATCAAAATCGGAATCGAAATCTCTCATCT
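M24 is described as carrying duplicated upstream enhancer domains. One quick way to inspect the retrieved sequence is therefore to search it for repeated copies of the MUAS element. The helper below is a hypothetical sketch that reports every start position of a query string, overlapping hits included.

```python
def find_all(haystack: str, needle: str) -> list:
    """Return 1-based start positions of every occurrence of needle (overlaps allowed)."""
    hits, i = [], haystack.find(needle)
    while i != -1:
        hits.append(i + 1)
        i = haystack.find(needle, i + 1)
    return hits


# Toy example: an element repeated twice with a short spacer between copies.
element = "TTCGTCC"
construct = element + "CCCAAC" + element + "ACAAG"
print(find_all(construct, element))  # [1, 14]
```

With the real strings, `find_all(m24_seq, muas_seq)` (hypothetical variable names) would show how many full MUAS copies the M24 record contains and where they start.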

PClSV FLt promoter (Peanut chlorotic streak caulimovirus):

The PClSV FLt promoter is a constitutive plant promoter derived from the Peanut chlorotic streak caulimovirus. It is composed of a basic full-length transcript (FLt) promoter region and upstream enhancer elements, which can be arranged in single or duplicated configurations to modulate transcriptional strength.

The promoter elements were identified from the PClSV genome (GenBank accession: U13988.1) as follows:

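  1. Basic FLt promoter (enhancer plus core region): Spans positions 5852–6101 (~250 bp) and contains the elements required for transcription initiation. Its first 178 bp (5852–6029) are the enhancer element listed below, and the remaining 72 bp (6030–6101) form the core promoter region.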
> gb|U13988.1|PCU13988:5852-6101 Peanut chlorotic streak caulimovirus, complete genome
GAGATCTTGAGCCAATCAAAGAGGAGTGATGTAGACCTAAAGCAATAATGGAGCCATGACGTAAGGGCTTACGCCATTACGAAATAATTAAAGGCTGATGTGACCTGTCGGTCTCTCAGAACCTTTACTTTTTATATTTGGCGTGTATTTTTAAATTTCCACGGCAATGACGATGTGACCTGTGCATCCGCTTTGCCTATAAATAAGTTTTAGTTTGTATTGATCGACACGATCGAGAAGACACGGCCAT
  2. Enhancer element: A 178 bp upstream regulatory sequence (5852–6029), corresponding to the 5′ portion of the basic FLt region, responsible for increasing transcriptional activity
> gb|U13988.1|PCU13988:5852-6029 Peanut chlorotic streak caulimovirus, complete genome
GAGATCTTGAGCCAATCAAAGAGGAGTGATGTAGACCTAAAGCAATAATGGAGCCATGACGTAAGGGCTTACGCCATTACGAAATAATTAAAGGCTGATGTGACCTGTCGGTCTCTCAGAACCTTTACTTTTTATATTTGGCGTGTATTTTTAAATTTCCACGGCAATGACGATGTGA

The assembled PClSV FLt promoter, [Enhancer] + [Basic FLt promoter] (~428 bp):

GAGATCTTGAGCCAATCAAAGAGGAGTGATGTAGACCTAAAGCAATAATGGAGCCATGACGTAAGGGCTTACGCCATTACGAAATAATTAAAGGCTGATGTGACCTGTCGGTCTCTCAGAACCTTTACTTTTTATATTTGGCGTGTATTTTTAAATTTCCACGGCAATGACGATGTGAGAGATCTTGAGCCAATCAAAGAGGAGTGATGTAGACCTAAAGCAATAATGGAGCCATGACGTAAGGGCTTACGCCATTACGAAATAATTAAAGGCTGATGTGACCTGTCGGTCTCTCAGAACCTTTACTTTTTATATTTGGCGTGTATTTTTAAATTTCCACGGCAATGACGATGTGACCTGTGCATCCGCTTTGCCTATAAATAAGTTTTAGTTTGTATTGATCGACACGATCGAGAAGACACGGCCAT

Double enhancer PClSV FLt promoter:

Based on Maiti & Shepherd (1998), the double enhancer configuration was constructed by duplicating the enhancer region upstream of the core promoter: [Enhancer] + [Enhancer] + [Core promoter] (~428 bp)

The PClSV FLt promoter sequence was reconstructed from GenBank (U13988.1) and assembled in a double enhancer configuration based on the design described by Maiti & Shepherd (1998):

GAGATCTTGAGCCAATCAAAGAGGAGTGATGTAGACCTAAAGCAATAATGGAGCCATGACGTAAGGGCTTACGCCATTACGAAATAATTAAAGGCTGATGTGACCTGTCGGTCTCTCAGAACCTTTACTTTTTATATTTGGCGTGTATTTTTAAATTTCCACGGCAATGACGATGTGAGAGATCTTGAGCCAATCAAAGAGGAGTGATGTAGACCTAAAGCAATAATGGAGCCATGACGTAAGGGCTTACGCCATTACGAAATAATTAAAGGCTGATGTGACCTGTCGGTCTCTCAGAACCTTTACTTTTTATATTTGGCGTGTATTTTTAAATTTCCACGGCAATGACGATGTGAGAGATCTTGAGCCAATCAAAGAGGAGTGATGTAGACCTAAAGCAATAATGGAGCCATGACGTAAGGGCTTACGCCATTACGAAATAATTAAAGGCTGATGTGACCTGTCGGTCTCTCAGAACCTTTACTTTTTATATTTGGCGTGTATTTTTAAATTTCCACGGCAATGACGATGTGACCTGTGCATCCGCTTTGCCTATAAATAAGTTTTAGTTTGTATTGATCGACACGATCGAGAAGACACGGCCAT
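The PClSV intervals can be cross-checked with simple arithmetic. The enhancer (5852–6029) is the 5′ part of the basic FLt region (5852–6101), so the remaining core is 6030–6101 (72 bp). It follows that [Enhancer] + [Enhancer] + [72 bp core] is the same 428 bp construct as a single extra enhancer placed upstream of the 250 bp basic region.

Adding yet another enhancer in front of [Enhancer] + [basic region] gives 606 bp with three enhancer copies. The sequence above appears to contain three copies, so counting enhancer copies is a useful check. In the minimal sketch below, short toy strings stand in for the real sequences.

```python
def interval_len(start: int, end: int) -> int:
    """Inclusive length of a GenBank-style interval."""
    return end - start + 1


def count_copies(seq: str, element: str) -> int:
    """Count non-overlapping occurrences of element in seq."""
    return seq.count(element)


basic = interval_len(5852, 6101)      # enhancer + core, as retrieved
enhancer = interval_len(5852, 6029)
core = interval_len(6030, 6101)       # what remains after the enhancer
print(basic, enhancer, core)          # 250 178 72
print(enhancer + basic)               # 428 (two enhancer copies in total)
print(2 * enhancer + basic)           # 606 (three enhancer copies in total)

# Toy check: "GAGATC" stands in for the enhancer, "CCTAT" for the core.
enh_toy, core_toy = "GAGATC", "CCTAT"
double = enh_toy + enh_toy + core_toy
print(count_copies(double, enh_toy), len(double))  # 2 17
```

For the intended double-enhancer design, `count_copies(promoter_seq, enhancer_seq)` on the real strings should return 2.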

The double enhancer configuration of the PClSV FLt promoter results in an approximately threefold increase in transcriptional activity compared to the single enhancer version. Overall, this promoter exhibits strong constitutive expression in transgenic plants, with activity levels reported to be comparable to the FLt promoter of the Figwort mosaic virus and functionally similar to the widely used CaMV 35S promoter, making it a robust alternative for high-level gene expression in plant systems.

CVP1 and CVP2 promoters (Cassava vein mosaic virus, CsVMV):

The CVP1 and CVP2 promoters are constitutive plant promoters derived from the Cassava vein mosaic virus (CsVMV), as described by Verdaguer et al., (1996) and Verdaguer et al., (1998) based on the reference genome reported by Calvert et al., (1995). These promoters correspond to two fragments of different lengths within the viral genome and differ in their regulatory strength.

  • CVP1 (short fragment): corresponds to a 388 bp fragment spanning nucleotides 7235 to 7623, which maps to the region −368 to +20 relative to the transcription start site (TSS).
  • CVP2 (long fragment): represents a longer 511 bp fragment extending from nucleotides 7160 to 7675, corresponding to positions −443 to +72 relative to the TSS.

Both fragments contain core promoter elements, including the TATA box and upstream regulatory motifs, with CVP2 retaining additional upstream sequences that enhance transcriptional activity.
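The GenBank intervals and the reported lengths can be compared directly. With inclusive coordinates, 7235–7623 spans 389 nt and 7160–7675 spans 516 nt. Under the no-zero convention, −368..+20 and −443..+72 give 388 and 515 positions. These small mismatches suggest that the exact boundaries (and the 511 bp figure) are worth confirming against Verdaguer et al. before synthesis. The quick arithmetic sketch below reproduces these numbers.

```python
def genbank_len(start: int, end: int) -> int:
    """Inclusive length of a GenBank interval such as 7235..7623."""
    return end - start + 1


def rel_len(rel_start: int, rel_end: int) -> int:
    """Length of a TSS-relative interval with no position 0."""
    n = rel_end - rel_start + 1
    return n - 1 if rel_start < 0 < rel_end else n


for name, (g0, g1), (r0, r1), stated in [
    ("CVP1", (7235, 7623), (-368, 20), 388),
    ("CVP2", (7160, 7675), (-443, 72), 511),
]:
    print(name, genbank_len(g0, g1), rel_len(r0, r1), stated)
# CVP1 389 388 388
# CVP2 516 515 511
```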

The sequences were directly retrieved from the CsVMV reference genome (GenBank accession: U20341.1) using the genomic coordinates reported in the original studies:

CVP1:

>gb|U20341.1|CVU20341:7235-7623 Cassava vein mosaic virus, complete genome

GCTCAGCAAGAAGCAGATCAATATGCGGCACATATGCAACCTATGTTCAAAAATGAAGAATGTACAGATACAAGATCCTATACTGCCAGAATACGAAGAAGAATACGTAGAAATTGAAAAAGAAGAACCAGGCGAAGAAAAGAATCTTGAAGACGTAAGCACTGACGACAACAATGAAAAGAAGAAGATAAGGTCGGTGATTGTGAAAGAGACATAGAGGACACATGTAAGGTGGAAAATGTAAGGGCGGAAAGTAACCTTATCACAAAGGAATCTTATCCCCCACTACTTATCCTTTTATATTTTTCCGTGTCATTTTTGCCCTTGAGTTTTCCTATATAAGGAACCAAGTTCGGCATTTGTGAAAACAAGAAAAAATTTGGTGTAAG

CVP2:

>gb|U20341.1|CVU20341:7160-7675 Cassava vein mosaic virus, complete genome

TCCAGAAGGTAATTATCCAAGATGTAGCATCAAGAATCCAATGTTTACGGGAAAAACTATGGAAGTATTATGTGAGCTCAGCAAGAAGCAGATCAATATGCGGCACATATGCAACCTATGTTCAAAAATGAAGAATGTACAGATACAAGATCCTATACTGCCAGAATACGAAGAAGAATACGTAGAAATTGAAAAAGAAGAACCAGGCGAAGAAAAGAATCTTGAAGACGTAAGCACTGACGACAACAATGAAAAGAAGAAGATAAGGTCGGTGATTGTGAAAGAGACATAGAGGACACATGTAAGGTGGAAAATGTAAGGGCGGAAAGTAACCTTATCACAAAGGAATCTTATCCCCCACTACTTATCCTTTTATATTTTTCCGTGTCATTTTTGCCCTTGAGTTTTCCTATATAAGGAACCAAGTTCGGCATTTGTGAAAACAAGAAAAAATTTGGTGTAAGCTATTTTCTTTGAAGTACTGAGGATACAACTTCAGAGAAATTTGTAAGTTTG

Functional analyses have demonstrated that CVP2 exhibits expression levels comparable to the enhanced CaMV 35S promoter (e35S), whereas CVP1 shows approximately half of this activity, indicating that CVP2 is about twofold more active than CVP1. These results highlight the importance of additional upstream regulatory sequences in driving stronger gene expression in plant systems.

FMV Sgt (34S) promoter (Figwort mosaic virus):

The Sgt (34S) promoter is a subgenomic promoter derived from the Figwort mosaic virus (FMV). It is located between ORF V and ORF VI and drives the expression of ORF VI via a subgenomic transcript. According to Bhattacharyya et al. (2002), a 301 bp fragment spanning −270 to +31 relative to the transcription start site (TSS) provides maximal promoter activity. The sequence was available only as a published figure, so I transcribed it using an AI tool (Gemini):

TTTACAGTAAGAACTGATAACAAAAATTTTACTTATTTCCTTAGAATTAATCTTAAAGGTGATAGTAAACAAGGACGATTAGTCCGTTGGCAAAATTGGTTCAGCAAGTATCAATTTGATGTCGAACATCTTGAAGGTGTAAAAAACGTTTTAGCAGATTGCCTCACGAGAGATTTTAATGCTTAAAAACGTAAGCGCTGACGTATGATTTCAAAAAACGCAGCTATAAAAGAAGCCCTCCAGCTTCAAAGTTTTCATCAACACAAATTCTAAAAACAAAATTTTTAGAGAGGGGGAGTG
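This sequence was transcribed from a figure image by an AI tool. A cheap sanity check is to confirm that it contains only A/C/G/T and that its length matches the expected 301 bp (−270 to +31, no position 0). The function below is a hypothetical helper. It catches only alphabet and length errors, not substitutions, so aligning the transcription against an FMV genome record remains the stronger check.

```python
def check_transcription(seq: str, expected_len: int) -> list:
    """Return a list of problems found in a manually or AI-transcribed DNA sequence."""
    problems = []
    cleaned = "".join(seq.split()).upper()  # drop spaces/newlines left by copying
    bad = sorted(set(cleaned) - set("ACGT"))
    if bad:
        problems.append(f"non-ACGT characters: {bad}")
    if len(cleaned) != expected_len:
        problems.append(f"length {len(cleaned)} != expected {expected_len}")
    return problems


print(check_transcription("ACGT ACGN", expected_len=8))
# ["non-ACGT characters: ['N']"]
```

An empty list from `check_transcription(sgt_seq, 301)` (hypothetical variable name) means the transcription passes both checks.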

PTSB1 promoter (Arabidopsis thaliana):

The PTSB1 promoter is a constitutive plant promoter derived from the Arabidopsis thaliana tryptophan synthase β-subunit gene (TSB1). I identified it as a powerful alternative to the CaMV 35S promoter for high-level gene expression in tobacco (Shirasawa-Seo et al. 2002).

I retrieved this promoter from GenBank accession M23872 as a 1.5 kb fragment. I defined the exact boundaries of this fragment by mapping the reported PCR primers directly onto the reference sequence (Shirasawa-Seo et al. 2002):

  • 5’ Border (Forward primer): GAATTCTTTCATATCTCCTGCAAAGT
  • 3’ Border (Reverse primer): TCAGAGAGAGATTCATTCAGTA (the reverse complement of the primer sequence TACTGAATGAATCTCTCTCTGA listed in the sources)

The resulting extracted sequence of the PTSB1 promoter:
GAATTCTTTCATATCTCCTGCAAAGTTCTTGATATCAATACTCCAGCAGTAACTAAGACTTAGACTCTTGAGCGTAGGAGAGTTTGATAACAAAGACTCGGCCTCTGTGAGCTTGATCCAACCAATAGAGAGCTTTCTAGGCAATCCCGAGTTTTTGAACTTGGAGGGATCAAGCCCACACGCGTAAATCTTTAGTGATTCGAGATTTGTGTTTAAAATCCGAATTAAAACCTAATCAAATTAAAACTAAACCAAACCAAATACAATCCAAAATTAAACTAATTTTGGTTGAGTTTGGTTATAGTTTTACTAAATCCAAATTAACAGAACATAACCAAACCCGAAGATTTTTAGAGTCTTTAGAATTTTAAGGTGATTTTAGATAAAAGAGATTAAACACAAATCTCGAAAACTAAAGAAAGAGTTTTTGAAAATTTTTAAGTGTTTTCATGTAAAGTGGATTTCTCTGTGTTTTCTGCATTCTGCGGATTATAACTCCTATGTTTTTTTTCTCCGTCAATTATATGTGTTTATTTTCTCTATTTTCTTTTATTTTTATTTTTATTCTCTATATTAGGGTTTAGTTTATGAAAACTTTTTGTTATCTATATAGGCTTGGGGGATGTATTTAAATTAGAATTTAAAGTGATTTGAGTTCTTTGAGTTTTTAAATAATTTTAACGATTTTAAAAAAGTTCGTATGATTTTTGTAAAATCTATTAAAATCTCACCTTAAATCATGGGATTTGGATTTCTGTATTTTGAACTAAGAAAATCCTCTCAAATCCTCCAAAATCATTAAAATTCAAATCCACAAATTGTTCTGAATAACAGTGAATTTTAAGGTGGATTTTGAAATAATTAGTTCAATAACACTGAATTTCATGAGATTTTTTAAAATACATGTTTGAATAACATATGATTTATAAATTCTACACAAATCTTTTAAAATTCTAATTTCAATACATTGTTTTTGAAAGTGTTATTGACTCTTGCCAATATAGTATCCCAATTCCCAACTTGTGTTTCATTTTTTCATCTATCTAATAAACAATTAGATGAACACAAAAAAATATTGGTAGGTGATGGCTCAATTGGATATGTTTTTGAAAACCATGTGTTAAAAACTTAAAATACTATCCAACTTACCCCAGTCCTACCAACTTTTTTTTTCTTCTCTTGGTCTGCTTACATGTGTCTGCTTATATCTCCAAAAGGAAATAGATATATAAAAATTCAAATTTAAATATTTGCGATTTGTTAAATTTTAATCAATATTTAATTTTTGTTTTTTTTTGTTTTTTTTTATGAAGACAACAAATAACCAAATTTATCAAATCTGATCAAAGCAGATTTAGGATTTTACAAATATATTTTTTTAATATGAATTTTGTGGTCAGATTTTGACCAATTCTCTTTGAAAAAAAAAAAAATCTATCTATAAAAACATGTGTTACTTTGAAAGGATATTTCAAGGAGAAGAATATATTTGACTCAGAGAGAGATTCATTCAGTA
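The primer-mapping step can be sketched as follows. The forward primer is searched as written, and the reverse primer is converted to its reverse complement before the same strand is searched. This is a simplified exact-match sketch that assumes no mismatches and a single hit, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_bounded(template: str, fwd: str, rev: str) -> str:
    """Return the region from fwd through the reverse complement of rev, inclusive."""
    start = template.find(fwd)
    rev_site = revcomp(rev)
    end = template.find(rev_site, start + len(fwd)) if start != -1 else -1
    if start == -1 or end == -1:
        raise ValueError("primer site not found on this strand")
    return template[start : end + len(rev_site)]


print(revcomp("TACTGAATGAATCTCTCTCTGA"))  # TCAGAGAGAGATTCATTCAGTA
print(primer_bounded("CCAAACGTGTCAGCCC", "AAAC", "CTGA"))  # AAACGTGTCAG
```

The same `revcomp` call converts the PPHYB reverse primer below into its 3′ border.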

This region contains the core promoter and upstream regulatory elements responsible for its strong constitutive activity. This promoter exhibited approximately 2.4-fold higher expression than the CaMV 35S promoter in mature tobacco leaves, with activity increasing in lower leaf positions (Shirasawa-Seo et al. 2002).

PPHYB promoter (Arabidopsis thaliana):

The PPHYB promoter is a constitutive promoter derived from the Arabidopsis thaliana phytochrome B (PHYB) gene (Goosey et al. 1997; Shirasawa-Seo et al. 2002).

I retrieved this sequence from GenBank accession L09262, which corresponds to a 2.3 kb fragment. The promoter boundaries were defined by mapping the experimentally reported primers onto the sequence (Shirasawa-Seo et al. 2002):

  • 5’ Border (Forward primer): GTCGACTTGTGCACCACCGTCT
  • 3’ Border (Reverse primer): CGGAGAAGAAGAACCGTCGTCA (the reverse complement of the primer sequence TGACGACGGTTCTTCTTCTCCG listed in the sources)

The resulting extracted sequence of the PPHYB promoter:
GTCGACTTGTGCACCACCGTCTAAGCTAACAAGTTGACCTAAACGCTCTATGGGATTAGGGTTTAGTAGATTGAGACTGAATAAAGAAACCCTAAAATCGAGCATCATCACAACATGAAACTCCTTACTCTGCTTCTTCTTTGCTTCTTCTTTATCGATGTGCTTCCTTGTAAAAGACATATCTTTGGATAAAGTGTTCAACTTTTTGCATGTGAATCGTACTCTTCTCAGAGATGTCACTGGAAACTTCGAGAGCACCTCCTCCGCCACATCCTTTGGAAGATCCGAGAGCATCGTCGTTGATTGTTTTTGCATATCGAAGAAATTTTACTTTACCTTTTACTCTGATTTCTTCAGAGATTATGAGAGAACGAACACTTCAGAAATGTTAGATGTTTCTAAATTGGGCTTGGGCTTTAAAGTATTACCCAAAGGCTATTAAAGTCGTTTTTTCCAATTTGGGCTCCTGATTTATTAGTATGGGAGGGCTTAGTTTTGGGCTTTAAAGTATGCCCCAATGCCTAATAATGTCTAGCTAGTTCTTCGTTATACTAAAGAACGAATTTTGGAAATTCTTGAATTACGATTGTACCCTTATATTAATTTCATCTTTTGTCTTATTCTTATTTATGCAAAAGTTATGCAAAAGTTTTAAGAAATTAGCAGCCAAGCCTAAAGAATCATTGAGAGTTTATAAGGGTGATTTGGTAATTGAGTAGTTTATTAGCTAATTTGATTTCAGTGGCACGTGGTAAATTACTGGTGGTTTAAAACTATTGTACGTGGACGATTCTTAGCCAACGAACTAGTACACTCTAGTGCGAACAGGTACATGATTAAATTCGTGGACATCCAATCATATCTCGTCCAAGATAAGACCAAAACATATGAGGTCATTACTCACTAATAAACATTTAAACTTTTGTTTTGTCAACGAATAGTGTGTTTTTCTTTTGTCATTCCAATTTTTTTCTGTTTTCTTTTCACTATTCACTTTTGGTCCATAATATTTTATGGGTATATAAGATAATCGTTTTTGTCTTCATACATGGTAACATGGATGTTTATATATGTAATAGTGTTAAAAAGAAAAAGTGGTCGGTTATACTTAACTTATTATGATAGAGCTTTGAAAACAAACAACACGAGATGGAGAAATTAGTCATTCAACAAAAGAAAAGGACGAACGCAGTGACTTAACATGAAACTGTGAGCGGCCCAAAATCATTTATGTAATGGACCCTTAACTTTTCATGCACACGATTTTTCTCATTTATATGTTTTTCTGCTCTCTTTTTTTCCTCTTTATCATTACTTTAATTTATTTTATGTTCTTTTTTCGAAGCACCATAATTGTATGCTTTCACCAAATAATCCAAATTTAGAATCATTAATATGTCAAAAAAGAATTGCATATATTCAATAAAACGTAATGCTAAGTAGTACAATGCATGTATTATACAAAATGTAATGATATAGATCCAACGTATATATCAAAGTGGACCAAAATATATCTTATGTATTAGACGAGTTTACTATGCAAAATTTATGATTCTATTCCGCATGGAGCGTGCTAATACTACTTCGAACCCCTTTGAGACCAATATGTGATTCTATATTCTATCTAGTACAAATTATGAGAAGTATATACGTACGATGAGAGTATAAAACATTTCAATATTTGTATAGAGAGGACACCACTTGGTTGACTTGACCCACGATAAGATATTGAAGAAACCAAACTTGTATAGTACGAATTCGAAATCGTAATTGATGATGCGATTCGACAAGTCCAGGGGCTCCCTCCCACGCGCAATGGGCCCAGCAACCACGTGTGGCCACTAGAGAGAATAAACCATTAGCCCACGTGATCTTGGGCCCAATCAATCTCTCCCTCACATTAAACGACAAAACAAAAGCTCTTCTGGGTTAAATTGATAAATATCAAAACTTTAAAGGTAATTTGCTAAAATCGCCACACAAAAAAAGTCGCAGAAAAT
ATATGAGGAAACAAAAAGCGAAGACGACAAAAAAAAAAAAAACTCTGATTTTTTTTTGTTATCTCTCTCTATCTGAGAGGCACACATTTTGCTTCGTCTTCTTCAATTTATTTTATTGGTTTCTCCACTTATCTCCGATCTCAATTCTCCCCATTTTCTTCTTCCTCAAGTTCAAAATTCTTGAGAATTTAGCTCTACCAGAATTCGTCTCCGATAACTAGTGGATGATGATTCACCCTAAATCCTTCCTTGTCTCGAGGTAATTCTGAGAAATTTCTCAAATTCAAAATCAAACGGCATGGTTTCCGGAGTCGGGGGTAGTGGCGGTGGCCGTGGCGGTGGCCGTGGCGGAGAAGAAGAACCGTCGTCA

This fragment includes the core promoter and regulatory regions required for stable expression. Functionally, PPHYB provides approximately 1.5-fold higher expression than the CaMV 35S promoter in mature tobacco leaves, with a more uniform expression pattern across leaf positions compared to PTSB1 (Shirasawa-Seo et al. 2002).

PNCR promoter (Soybean chlorotic mottle virus):

The PNCR promoter is a viral-derived constitutive promoter isolated from the large noncoding region of the Soybean chlorotic mottle virus (Conci et al. 1993). Based on the reported genome size (~8,175 bp), I identified the corresponding genomic sequence and retrieved it from GenBank accession X15828.2. I then defined the functional ~486 bp promoter fragment by mapping the reported PCR primers onto the genome (Conci et al. 1993):

  • 5’ Border (Forward primer): ATGTAGGACATGCCAGCTGTAA
  • 3’ Border (Reverse primer): CAAGCACAAGAGAAAAGAAAGG (the reverse complement of the primer sequence CCGGATCCTTTCTTTTCTCTTGTGCTTG provided in the source, after removing the restriction enzyme site)

The extracted sequence of the PNCR promoter:
ATGTAGGACATGCCAGCTGTAAAAGAAAGCTCACCTACTAATATGTGGTAGTGGACGCTTTACTTTATTAAAAGTGGTTGGTCAGTAATAATGTAAGACCCCACTTCTTTTCTTTTGCTTGCACGCGAAGGATGCCGCTCTACCCAGTTGTTAAGGCACCTATCGCATTATAAATAAGAGACCAAGGACTCTATTGTTCCTTGGAGTTTGATTGAGTAAGGAATATAGCCAATAGTGCCGTGTAAGGCCAAGTGCTTTTATCCATTTACACTCACTCCCAGTCGGTGGTTTAAAAACCTGGACCGGCAAAGTCGAGAGACTCTAAATTAGAAAAGGAGAAGTCCTTTATACTATCAAACAAGGAGAGATCCTAAATCTAAACACAAAATCCTTTATGAATAAGAAATTGTTCCAGCAACTACCAAGTCTTAAAAAGACCCAGGAAGCAAAAGCAAAGCAAGAACAAGCACAAGAGAAAAGAAAGG
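The 3′ border above can be reproduced from the published primer in two steps: remove the restriction site, then take the reverse complement. The primer tail contains GGATCC, the BamHI recognition site. This sketch removes only that six-base site and keeps the two leading C's, which is the interpretation that yields the 22 nt border quoted above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strip_site(primer: str, site: str = "GGATCC") -> str:
    """Remove the first occurrence of a restriction site (default: BamHI) from a primer."""
    if site not in primer:
        raise ValueError(f"{site} not found in primer")
    return primer.replace(site, "", 1)


published = "CCGGATCCTTTCTTTTCTCTTGTGCTTG"
print(strip_site(published))           # CCTTTCTTTTCTCTTGTGCTTG
print(revcomp(strip_site(published)))  # CAAGCACAAGAGAAAAGAAAGG
```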

This region contains key regulatory features including a TATA box, CAAT-like motifs, and multiple enhancer-related elements. Functionally, this promoter exhibits approximately five-fold higher expression than the CaMV 35S promoter in tobacco protoplasts (Conci et al. 1993), while showing moderate constitutive activity (~67% of P35S) in mature leaves (Shirasawa-Seo et al. 2002).

FMV promoter (Figwort mosaic virus):

The FMV promoter is a constitutive viral promoter derived from the Figwort mosaic virus genome. In this work, I used the promoter sequence obtained directly from the supplementary Benchling file provided by Shakhova et al. (2022):

tcatcaaaatatttagcagcattccagattgggttcaatcaacaaggtacgagccatatcactttattcaaattggtatcgccaaaaccaagaaggaactcccatcctcaaaggtttgtaaggaagaattctcagtccaaagcctcaacaaggtcagggtacagagtctccaaaccattagccAaaagctacaggagatcaatgaagaatcttcaatcaaagtaaactactgttccagcacatgcatcatggtcagtaagtttcagaaaaagacatccaccgaGgacttaaagttagtgggcatctttgaaagtaatcttgtcaacatcgagcagctggcttgtggggaccagacaaaaaaggaatggtgcagaattgttaggcgcacctaccaaaagcatctttgcctttattgcaaagataaagcagattcctctagtacaagtggggaacaaaataacgtggaaaagagctgtcctgacagcccactcactaatgcgtatgacgaacgcagtgacgaccacaaaagaattccctctatataagaaggcattcattcccatttgaaggatcatcagatactGaaccaatatttctc

To verify its genomic origin, I performed a BLASTn search at NCBI. It returned a 100% sequence match to coordinates 6358–6955 of the FMV reference genome (GenBank accession NC_003554.1), confirming where the promoter fragment sits within the FMV genome. According to Shakhova et al. (2022), the FMV promoter showed lower activity than the CaMV 35S promoter under their experimental conditions. It therefore remains a functional constitutive promoter, but it is weaker than p35S in this system.

p35S (CaMV 35S promoter):

The p35S promoter is a canonical constitutive promoter derived from the Cauliflower mosaic virus and is one of the most widely used regulatory elements in plant biotechnology.

In my study, I used the specific p35S sequence provided in the supplementary Benchling file of Shakhova et al. (2022):

tgagacttttcaacaaaggataatttcgggaaacctcctcggattccattgcccagctatctgtcacttcatcgaaaggacagtagaaaaggaaggtggctcctacaaatgccatcattgcgataaaggaaaggctatcattcaagatctctctgccgacagtggtcccaaagatggacccccacccacgaggagcatcgtggaaaaagaagaggttccaaccacgtctacaaagcaagtggattgatgtgacatctccactgacgtaagggatgacgcacaatcccactatccttcgcaagacccttcctctatataaggaagttcatttcatttggagaggaca

pAtUBQ10 promoter (Arabidopsis thaliana):

The pAtUBQ10 promoter (version 0.8) is a strong constitutive plant promoter derived from the Arabidopsis thaliana ubiquitin-10 gene (At4g05320). In this work, I used the exact ~800 bp upstream fragment characterized by Shakhova et al. (2022).

I obtained the sequence directly from the supplementary Benchling file provided in the study, ensuring that the construct corresponds precisely to the experimentally validated version used for expression analysis:

tgggacccacggttcaattattgccaattttcagctccaccgtatatttaaaaaataaaacgataatgctaaaaaaatataaatcgtaacgatcgttaaatctcaacggctggatcttatgacgaccgttagaaattgtggttgtcgacgagtcagtaataaacggcgtcaaagtggttgcagccggcacacacgagtcgtgtttatcaactcaaagcacaaatacttttcctcaacctaaaaataaggcaattagccaaaaacaactttgcgtgtaaacaacgctcaatacacgtgtcattttattattagctattgcttcaccgccttagctttctcgtgacctagtcgtcctcgtcttttcttcttcttcttctataaaacaatacccaaagagctcttcttcttcacaattcagatttcaatttctcaaaatcttaaaaactttctctcaattctctctaccgtgatcaaggtaaatttctgtgttccttattctctcaaaatcttcgattttgttttcgttcgatcccaatttcgtatatgttctttggtttagattctgttaatcttagatcgaagtcgattttctgggtttgatcgttagatatcatcttaattctcgattagggtttcatagatatcatccgatttgttcaaataatttgagttttgtcgaataattactcttcgatttgtgatttctatctagatctggtgttagtttctagtttgtgcgatcgaatttgtcgattaatctgagtttttctgattaaca

This fragment represents the regulatory region immediately upstream of the translation start site and includes key cis-regulatory elements responsible for its constitutive activity.

Functionally, in Nicotiana systems this promoter provides high and stable expression. It outperforms several endogenous plant promoters such as pAtAct2, pAtTCTP, and pAtPD7 (Shakhova et al., 2022). Its activity is lower than that of the Cauliflower mosaic virus 35S promoter. However, its expression strength is comparable to that of other viral promoters, such as those of Figwort mosaic virus (FMV) and Cestrum yellow leaf curling virus (CmYLCV). This makes it a reliable and predictable option for high-level gene expression in both Nicotiana benthamiana leaves and tobacco BY-2 cell packs.

pAtAct2 promoter (Arabidopsis thaliana):

The pAtAct2 promoter is a constitutive plant promoter derived from the Arabidopsis thaliana actin 2 gene (AT3G18780). In this work, I used the specific version characterized by Shakhova et al. (2022).

I obtained the sequence directly from the supplementary Benchling file provided in the study, ensuring that the construct corresponds exactly to the experimentally tested version. In this configuration, the native promoter was fused to the 5′UTR omega sequence of the Tobacco mosaic virus (TMV), a common modification used to enhance translation efficiency in Nicotiana expression systems:

tcgacaaaatttagaacgaacttaattatgatctcaaatacattgatacatatctcatctagatctaggttatcattatgtaagaaagttttgacgaatatggcacgacaaaatggctagactcgatgtaattggtatctcaactcaacattatacttataccaaacattagttagacaaaatttaaacaactattttttatgtatgcaagagtcagcatatgtataattgattcagaatcgttttgacgagttcggatgtagtagtagccattatttaatgtacatactaatcgtgaatagtgaatatgatgaaacattgtatcttattgtataaatatccataaacacatcatgaaagacactttctttcacggtctgaattaattatgatacaattctaatagaaaacgaattaaattacgttgaattgtatgaaatctaattgaacaagccaaccacgacgacgactaacgttgcctggattgactcggtttaagttaaccactaaaaaaacggagctgtcatgtaacacgcggatcgagcaggtcacagtcatgaagccatcaaagcaaaagaactaatccaagggctgagatgattaattagtttaaaaattagttaacacgagggaaaaggctgtctgacagccaggtcacgttatctttacctgtggtcgaaatgattcgtgtctgtcgattttaattatttttttgaaaggccgaaaataaagttgtaagagataaacccgcctatataaattcatatattttcctctccgctttgaatactgtatttttac

Functionally, pAtAct2 is historically described as a strong constitutive promoter in Arabidopsis. However, the results of Shakhova et al. (2022) show that it has relatively low activity in tobacco systems. That study used the 0.4 kb version of the Cauliflower mosaic virus 35S promoter (p35S) as its reference, and against it pAtAct2 ranks among the weakest promoters in the tested set. Despite its native strength in Arabidopsis, pAtAct2 therefore behaves as a moderate-to-low-strength promoter in Nicotiana, even after optimization via the TMV omega 5′UTR fusion.

NOS promoter (Agrobacterium tumefaciens nopaline synthase):

The NOS promoter is a constitutive plant promoter derived from the nopaline synthase (nos) gene of Agrobacterium tumefaciens, and is widely used in plant transformation vectors for moderate gene expression.

In this work, I retrieved the NOS promoter sequence from GenBank entry AF485783.1, corresponding to the binary vector pBI121, using the coordinates 2519 to 2825. This fragment represents the regulatory region upstream of the nos gene as commonly implemented in plant expression constructs.

The sequence was directly extracted from the annotated GenBank record, ensuring consistency with a well-established and experimentally validated vector backbone frequently used in plant biotechnology.

>AF485783.1:7727-7979 Binary vector pBI121, complete sequence

GATCGTTCAAACATTTGGCAATAAAGTTTCTTAAGATTGAATCCTGTTGCCGGTCTTGCGATGATTATCATATAATTTCTGTTGAATTACGTTAAGCATGTAATAATTAACATGTAATGCATGACGTTATTTATGAGATGGGTTTTTATGATTAGAGTCCCGCAATTATACATTTAATACGCGATAGAAAACAAAATATAGCGCGCAAACTAGGATAAATTATCGCGCGCGGTGTCATCTATGTTACTAGATC

Functionally, the NOS promoter is considered a moderate-to-low-strength constitutive promoter. It is typically weaker than strong viral promoters such as the Cauliflower mosaic virus 35S promoter, but it is valued for stable and reliable expression across different plant tissues.
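GenBank coordinates are 1-based with both ends inclusive, while Python string slices are 0-based and end-exclusive, so off-by-one errors are easy to introduce when copying a region out of a record such as AF485783.1. The plain-Python sketch below (function names are hypothetical; no Biopython is assumed) slices an interval and checks whether a pasted fragment has the length its coordinates imply. Applying the same length check by hand to this section flags something worth resolving: the NOS fragment shown above is 253 bp, which matches the 7727 to 7979 range in its FASTA header but not the 2519 to 2825 range quoted in the text (307 bp), and it is identical to the T-Nos terminator fragment listed further down. The coordinates and sequence are therefore worth re-checking against the GenBank record.

```python
def expected_length(start: int, end: int) -> int:
    """Length of a GenBank interval start..end (1-based, both ends inclusive)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {start}..{end}")
    return end - start + 1


def extract_region(record_seq: str, start: int, end: int) -> str:
    """Slice a GenBank-style interval out of a full record sequence."""
    expected_length(start, end)  # reuse the interval validation
    if end > len(record_seq):
        raise ValueError(
            f"{start}..{end} runs past the end of a {len(record_seq)} bp record"
        )
    # Python slices are 0-based and end-exclusive: shift the start, keep the end.
    return record_seq[start - 1:end]


def fragment_matches(fragment: str, start: int, end: int) -> bool:
    """True if a pasted fragment has exactly the length its coordinates imply."""
    return len(fragment) == expected_length(start, end)


if __name__ == "__main__":
    toy_record = "AAAACCCCGGGGTTTT"  # hypothetical 16 bp record, not real data
    print(extract_region(toy_record, 5, 8))  # CCCC
    print(expected_length(2519, 2825))       # 307
    print(expected_length(7727, 7979))       # 253
```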

| Promoter | Origin | Relative Strength vs. CaMV 35S | Key Advantage / Note | Source |
|---|---|---|---|---|
| TobUbi.u4 | Nicotiana tabacum (polyubiquitin) | ~7× stronger | Native to tobacco; excellent stability for long-term expression | Genschik et al., 1994 (GenBank: X77456.1) |
| D100 | Synthetic (Dahlia mosaic virus) | ~2.2× stronger | One of the strongest synthetic promoters validated in tobacco | Khadanga et al., 2021; Sahoo et al., 2015 |
| MSD3 | Synthetic chimeric (MMV + SCBV) | ~1.15× stronger | Works in both monocots and dicots; stable in tobacco | Kumari et al., 2024; Dey & Maiti, 1999 |
| DaMVFLt4 | Dahlia mosaic virus | ~5× stronger | Very high activity in protoplasts and transgenic plants | Sahoo et al., 2014; GenBank: JX272320.1 |
| M24 | MMV-derived | ~10× stronger | Extremely strong promoter with enhanced duplicated domains | Sahoo et al., 2014 |
| S100 | Synthetic (Strawberry vein banding virus) | ~1.8× stronger | Strong synthetic alternative; slightly weaker than D100 | Khadanga et al., 2021; Pattanaik et al., 2004 |
| SM | Synthetic chimeric (SCBV + MMV) | ~2.1× stronger | Highly effective in dicots such as tobacco | Kumari et al., 2024; Davies et al., 2014 |
| BM | Synthetic chimeric (BSV + MMV) | ~1.72× stronger | Good alternative synthetic promoter for dicots | Kumari et al., 2024; Remans et al., 2005 |
| FMV 34S | Figwort mosaic virus | ~2× stronger | Widely used constitutive promoter in dicots | Bhattacharyya et al., 2002 |
| CaMV 35S | Cauliflower mosaic virus | 1× (reference) | Gold-standard promoter for plant expression | Odell et al., 1985; Shakhova et al., 2022 |
| PTSB1 | Arabidopsis thaliana (TSB1) | ~2.4× stronger | Very strong in mature leaves; tissue-dependent variation | Shirasawa-Seo et al., 2002 |
| PPHYB | Arabidopsis thaliana (PHYB) | ~1.5× stronger | Uniform expression across tissues | Shirasawa-Seo et al., 2002; Goosey et al., 1997 |
| PNCR | Soybean chlorotic mottle virus | ~5× (protoplasts), moderate in plants | Strong viral promoter distinct from CaMV and FMV | Conci et al., 1993; Shirasawa-Seo et al., 2002 |
| PCisV | Peanut chlorotic streak virus (PClSV FLt promoter) | ~2× stronger | Strong constitutive promoter comparable to FMV | Maiti & Shepherd, 1998 |
| dPCisV | Double-enhancer PCisV | ~6× stronger | Highly powerful promoter due to enhancer duplication | Maiti & Shepherd, 1998 |
| CPV1 | Cassava vein mosaic virus | ~0.5× of CPV2 | Moderate activity; tissue-specific expression | Verdaguer et al., 1996; Calvert et al., 1995 |
| CPV2 | Cassava vein mosaic virus | ~1× (similar to e35S) | Stronger version; high activity in vascular tissues | Verdaguer et al., 1998 |
| pFMV | Figwort mosaic virus | <1× (weaker than 35S) | Common alternative, but weaker in this system | Shakhova et al., 2022 |
| AtUBQ10 (0.8) | Arabidopsis thaliana | <1× (similar to pFMV) | Stable expression across tissues | Shakhova et al., 2022 |
| AtAct2 | Arabidopsis thaliana | Moderate to low | Constitutive, but weak in tobacco systems | Shakhova et al., 2022 |
| P-Nos | Agrobacterium tumefaciens | Weak to moderate | Commonly used for selectable marker genes | GenBank: AF485783 |

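The fold values in the promoter table come from different studies, assay systems (protoplasts, transient expression, transgenic plants) and 35S reference variants, so they are not directly comparable. With that caveat, a small helper like the one below (a hypothetical sketch; the dictionary is simply transcribed from the table, leaving out rows without a single numeric estimate) can produce a rough shortlist of candidates above a chosen threshold.

```python
# Approximate fold activity relative to CaMV 35S, transcribed from the table.
# Rows without a single numeric estimate (PNCR, CPV1, CPV2, pFMV, AtUBQ10,
# AtAct2, P-Nos) are deliberately left out.
FOLD_VS_35S = {
    "TobUbi.u4": 7.0,
    "D100": 2.2,
    "MSD3": 1.15,
    "DaMVFLt4": 5.0,
    "M24": 10.0,
    "S100": 1.8,
    "SM": 2.1,
    "BM": 1.72,
    "FMV 34S": 2.0,
    "CaMV 35S": 1.0,
    "PTSB1": 2.4,
    "PPHYB": 1.5,
    "PCisV": 2.0,
    "dPCisV": 6.0,
}


def shortlist(fold_table: dict[str, float], min_fold: float) -> list[tuple[str, float]]:
    """Promoters reported at or above min_fold x 35S, strongest first.

    sorted() is stable (also with reverse=True), so ties keep table order.
    """
    hits = [(name, fold) for name, fold in fold_table.items() if fold >= min_fold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, fold in shortlist(FOLD_VS_35S, 5.0):
        print(f"{name}: ~{fold:g}x CaMV 35S")
```

Because the underlying numbers mix assay systems, a shortlist like this is best treated as a starting point for side-by-side testing rather than a ranking in its own right.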
Terminator sequences:

The sequences of the tOCS, tHSP18.2, tATPase, tAtAct2, and tRBCS3C terminators were retrieved from the supplementary Benchling file provided by Shakhova et al. (2022). Using this source ensures that each sequence matches the version experimentally validated in the study, keeping it consistent with the reported expression data.

tOCS terminator (Agrobacterium tumefaciens)

The tOCS terminator originates from the octopine synthase gene of Agrobacterium tumefaciens. In the comparative analysis reported by Shakhova et al. (2022), this terminator consistently showed the highest performance among all tested elements. It produced the strongest and most stable expression levels across both Nicotiana benthamiana leaves and tobacco BY-2 cell systems, making it the most reliable option when maximal transgene expression is required.

  • tOCS extracted sequence:
ctgctttaatgagatatgcgagaagcctatgatcgcatgatatttgctttcaattctgttgtgcacgttgtaaaaaacctgagcatgtgtagctcagatccttaccgccggtttcggttcattctaatgaatatatcacccgttactatcgtatttttatgaataatattctccgttcaatttactgattgtaccctactacttatatgtacaatattaaaatgaaaacaatatattgtgctgaataggtttatagcgacatctatgatagagcgccacaataacaaacaattgcgttttattattacaaatccaattttaaaaaaagcggcagaaccggtcaaacctaaaagactgattacataaatcttattcaaatttcaaaagtgccccaggggctagtatctacgacacaccgagcggcgaactaataacgctcactgaagggaactccggttccccgccggcgcgcatgggtgagattccttgaagttgagtattggccgtccgctctaccgaaagttacgggcaccattcaacccggtccagcacggcggccgggtaaccgacttgctgccccgagaattatgcagcatttttttggtgtatgtgggccccaaatgaagtgcaggtcaaaccttgacagtgacgacaaatcgttgggcgggtccagggcgaattttgcgacaacatgtcgaggctcagcag

tHSP18.2 terminator (Arabidopsis thaliana)

The tHSP18.2 terminator is derived from the heat shock protein 18.2 gene of Arabidopsis thaliana. According to Shakhova et al. (2022), it performs at a very high level, ranking just below tOCS in both experimental systems. Although previously considered optimal in Arabidopsis and rice, its activity in tobacco remains strong but slightly less efficient than tOCS.

  • tHSP18.2 extracted sequence:
TAGGTTAAatatgaagatgaagatgaaatatttggtgtgtcaaataaaaagcttgtgtgcttaagtttgtgtttttttcttggcttgttgtgttatgaatttgtggctttttctaatattaaatgaatgtaagatctcattataatgaataaacaaatgtttctataatccattgtgaatgttttgttggatctcttctgcagcatataactactgtatgtgctatggtatggactatggaatatgattaaagataag

tATPase terminator (Solanum lycopersicum)

The tATPase terminator, originating from a tomato (Solanum lycopersicum) ATPase gene, belongs to the group of high-performing terminators. Experimental data from Shakhova et al. (2022) indicate that it supports robust expression levels comparable to tHSP18.2 in Nicotiana systems. This makes it a solid alternative when strong but not necessarily maximal expression is sufficient.

  • tATPase extracted sequence:
accgcactgtgtgtggtttctcaagaccaagacagctaaagcctaaagtcagagatctaatatgtgtattgttattcatgacaccacagctgccacttttggtgttatgatctgtttgtagaagtaggaattcttttttttctacttaataatagcttaaagagctgtgcaatttggtctgtattttttgtgtattttgcactcattatttgtgaacagtttgagaactatttattttctaagatttgtgcacgtatgaaccacttttcatctatataccaccatgtttattctgcatctatgggattgagtttgaatattcgttgatcaacaaagttatatttggtggatactacttgaaggtgcatatactttgtgctcatatatttagttgatattctggattttgagctggacaaattgatcaaggtagtctaatctggtctggttactaataaaactcaagagatcact

tAtAct2 terminator (Arabidopsis thaliana)

The tAtAct2 terminator comes from the actin 2 gene of Arabidopsis thaliana. Despite the widespread use of actin-related regulatory elements, this terminator showed relatively weak performance in the tested tobacco systems. In Shakhova et al. (2022), it consistently resulted in low expression levels in both plant leaves and cell cultures, indicating limited efficiency for high-expression constructs.

  • tAtAct2 extracted sequence:
gctctcaagatcaaaggcttaaaaagctggggttttatgaatgggatcaaagtttctttttttcttttatatttgcttctccatttgtttgtttcatttccctttttgttttcgtttctatgatgcacttgtgtgtgacaaactctctgggtttttacttacgtctgcgtttcaaaaaaaaaaaccgctttcgttttgcgttttagtcccattgttttgtagctctgagtgatcgaattgatgcctctttattccttttgttccctataatttctttcaaaactcagaagaaaaaccttgaaactctttgcaatgttaatataagtattgtataagatttttattgatttggttattagtcttacttttgctacctccatcttcacttggaactgatattctgaatagttaaagcgttacatgtgttccattcacaaatgaacttaaactagcacaaagtcagatattttaagatcgcaccattt

tRBCS3C terminator (Solanum lycopersicum)

The tRBCS3C terminator is derived from the tomato Rubisco small subunit gene RBCS3C. Like tAtAct2, it gave low expression output in all experimental conditions described by Shakhova et al. (2022). The data suggest that this terminator can substantially limit overall expression, especially when paired with strong promoters.

  • tRBCS3C extracted sequence:
atatgtcaacagtgagaaactgttcgcattttccgttttgcttctttctttctattcaatgtatgttgttggattccagttgaatttattatgagaactaataataatagtaataatcatttgtttctttactaatttgcattttcacatatgatttctggtgcatatcataattttcattccaccaatattaatttcccccattcaagttacttatgaaatagaaatcctcttctccgactactttatttgtccgaaagtcttgtggctgctatataa

Important note: Shakhova et al. (2022) highlight that terminators do not act independently but interact strongly with the chosen promoter. With highly active promoters, switching between a strong terminator (such as tOCS) and a weak one (such as tRBCS3C) can change expression by more than 50-fold. The effect is less pronounced with weaker promoters, but terminator choice remains an important factor in construct design.
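The promoter-terminator interaction can be expressed as a simple ratio: for a fixed promoter, divide the reporter readout obtained with the best terminator by the readout obtained with the worst. The readouts below are invented placeholders, not values from Shakhova et al. (2022); they only illustrate how the terminator-driven fold range can exceed 50 with a strong promoter while staying much smaller with a weak one. The function name is hypothetical.

```python
import math

# Hypothetical reporter readouts in arbitrary units, invented for illustration;
# they are NOT data from Shakhova et al. (2022).
READOUTS = {
    "strong promoter": {"tOCS": 12000.0, "tRBCS3C": 200.0},
    "weak promoter": {"tOCS": 900.0, "tRBCS3C": 150.0},
}


def terminator_fold_range(readouts_by_terminator: dict[str, float]) -> float:
    """Best-to-worst terminator ratio for a single promoter."""
    values = list(readouts_by_terminator.values())
    if min(values) <= 0:
        raise ValueError("readouts must be positive to form a ratio")
    return max(values) / min(values)


if __name__ == "__main__":
    for promoter, readouts in READOUTS.items():
        fold = terminator_fold_range(readouts)
        # log2 keeps large and small effects on a comparable scale.
        print(f"{promoter}: {fold:.0f}-fold (log2 = {math.log2(fold):.2f})")
```

Reporting the log2 ratio alongside the raw fold change makes it easier to compare terminator effects across promoters that differ widely in baseline strength.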

T-35S (Cauliflower mosaic virus)

The T-35S terminator is a widely used viral transcriptional terminator derived from the Cauliflower mosaic virus (CaMV). For my construct, I retrieved its sequence from the binary vector pEAQ-HT available in GenBank under accession GQ497234.1. The fragment corresponds to the region spanning positions 2889 to 3588, which contains the full termination and polyadenylation signals commonly used in plant expression systems. This sequence was directly extracted from the annotated GenBank entry to ensure accuracy and consistency with experimentally validated vector designs.

> GQ497234.1:2889-3588 Binary vector pEAQ-HT, complete sequence

CTCGAATTCGCTGAAATCACCAGTCTCTCTCTACAAATCTATCTCTCTCTATTTTCTCCATAAATAATGTGTGAGTAGTTTCCCGATAAGGGAAATTAGGGTTCTTATAGGGTTTCGCTCATGTGTTGAGCATATAAGAAACCCTTAGTATGTATTTGTATTTGTAAAATACTTCTATCAATAAAATTTCTAATTCCTAAAACCAAAATCCAGTACTAAAATCCAGATCTCCTAAAGTCCCTATAGATCTTTGTCGTGAATATAAACCAGACACGAGACGACTAAACCTGGAGCCCAGACGCCGTTCGAAGCTAGAAGTACCGCTTAGGCAGGAGGCCGTTAGGGAAAAGATGCTAAGGCAGGGTTGGTTACGTTGACTCCCCCGTAGGTTTGGTTTAAATATGATGAAGTGGACGGAAGGAAGGAGGAAGACAAGGAAGGATAAGGTTGCAGGCCCTGTGCAAGGTAAGAAGATGGAAATTTGATAGAGGTACGCTACTATACTTATACTATACGCTAAGGGAATGCTTGTATTTATACCCTATACCCCCTAATAACCCCTTATCAATTTAAGAAATAATCCGCATAAGCCCCCGCTTAAAAATTGGTATCAGAGCCATGAATAGGTCTATGACCAAAACTCAAGAGGATAAAACCTCACCAAAATACGAAAGAGTTCTTAACTCTAAAGATAAAAGAT

T-E9 (Pea Rubisco small subunit)

The T-E9 terminator originates from the small subunit of the Rubisco gene (rbcS) in pea (Pisum sativum) and is known for its efficient transcription termination and mRNA stabilization in plant systems. I obtained this sequence from the binary vector pKM24KH, using the GenBank accession HM036220.1. The selected region corresponds to positions 10721 to 11366, as defined in the annotated sequence. This fragment was directly extracted from the GenBank record to ensure that the version used matches the one functionally validated in plant transformation vectors.

> HM036220.1:10721-11366 Binary vector pKM24KH, complete sequence

GCTTTCGTTCGTATCATCGGTTTCGACAACGTTCGTCAAGTTCAATGCATCAGTTTCATTGCGCACACACCAGAATCCTACTGAGTTTGAGTATTATGGCATTGGGAAAACTGTTTTTCTTGTACCATTTGTTGTGCTTGTAATTTACTGTGTTTTTTATTCGGTTTTCGCTATCGAACTGTGAAATGGAAATGGATGGAGAAGAGTTAATGAATGATATGGTCCTTTTGTTCATTCTCAAATTAATATTATTTGTTTTTTCTCTTATTTGTTGTGTGTTGAATTTGAAATTATAAGAGATATGCAAACATTTTGTTTTGAGTAAAAATGTGTCAAATCGTGGCCTCTAATGACCGAAGTTAATATGAGGAGTAAAACACTTGTAGTTGTACCATTATGCTTATTCACTAGGCAACAAATATATTTTCAGACCTAGAAAAGCTGCAAATGTTACTGAATACAAGTATGTCCTCTTGTGTTTTAGACATTTATGAACTTTCCTTTATGTAATTTTCCAGAATCCTTGTCAGATTCTAATCATTGCTTTATAATTATAGTTATACTCATGGATTTGTAGTTGAGTATGAAAATATTTTTTAATGCATTTTATGACTTGCCAATTGATTGACAACATGCATCAATCGAT

Additional terminators:

T-Nos (Nopaline Synthase)

> GQ497234.1:1596-1848 Binary vector pEAQ-HT, complete sequence

GATCGTTCAAACATTTGGCAATAAAGTTTCTTAAGATTGAATCCTGTTGCCGGTCTTGCGATGATTATCATATAATTTCTGTTGAATTACGTTAAGCATGTAATAATTAACATGTAATGCATGACGTTATTTATGAGATGGGTTTTTATGATTAGAGTCCCGCAATTATACATTTAATACGCGATAGAAAACAAAATATAGCGCGCAAACTAGGATAAATTATCGCGCGCGGTGTCATCTATGTTACTAGATC

T-PinII (Potato Proteinase Inhibitor II)

T-Mas (Mannopine Synthase)

| Terminator | Origin | Relative Performance | Key Characteristics | Sequence Source |
|---|---|---|---|---|
| tOCS | Agrobacterium tumefaciens (octopine synthase) | Highest (top performer) | Most stable and strongest expression in Nicotiana systems; best overall choice | Shakhova et al., 2022 (supplementary Benchling file) |
| tHSP18.2 | Arabidopsis thaliana (heat shock protein 18.2) | Very high (slightly below tOCS) | Strong expression; highly efficient but slightly less than tOCS in tobacco | Shakhova et al., 2022 (supplementary Benchling file) |
| tATPase | Solanum lycopersicum (ATPase gene) | High | Robust and consistent performance; comparable to tHSP18.2 | Shakhova et al., 2022 (supplementary Benchling file) |
| tAtAct2 | Arabidopsis thaliana (actin 2) | Low | Weak expression in Nicotiana; not suitable for high-expression constructs | Shakhova et al., 2022 (supplementary Benchling file) |
| tRBCS3C | Solanum lycopersicum (Rubisco small subunit 3C) | Low | Limits expression; weakest among the tested terminators | Shakhova et al., 2022 (supplementary Benchling file) |
| T-35S | Cauliflower mosaic virus | Moderate to high | Widely used standard terminator; reliable polyadenylation signal | GenBank: GQ497234.1 (pEAQ-HT vector) |
| T-E9 | Pisum sativum (Rubisco small subunit) | High | Efficient transcription termination and mRNA stabilization in plants | GenBank: HM036220.1 (pKM24KH vector) |

CTP (Chloroplast Transit Peptide) sequences:

The three chloroplast transit peptides (RbcS CTP, Ferredoxin-2 CTP, and RecA CTP) were identified from Arabidopsis thaliana proteins using the UniProt database. For each protein, I first retrieved the corresponding entry (accessions P10795, P16972, and Q39199), then examined the “Features” section, specifically under PTM/Processing, to locate the annotated transit peptide regions.

The CTP sequences were directly extracted from the annotated transit peptide segments, which correspond to the N-terminal targeting signals responsible for directing proteins to the chloroplast. This approach ensures that the selected sequences match experimentally curated annotations and represent functional chloroplast-targeting peptides.

The extracted sequences are:

RbcS CTP (P10795):

MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVN

Ferredoxin-2 CTP (P16972):

MASTALSSAIVGTSFIRRSPAPISLRSLPSANTQSLFGLKSGTARGGRVTAM

RecA CTP (Q39199):

MDSQLVLSLKLNPSFTPLSPLFPFTPCSSFSPSLRFSSCYSRRLYSPVTVYA

These sequences were selected to provide alternative chloroplast targeting signals with potentially different import efficiencies, enabling flexibility in construct design.

| CTP | Source Protein | Organism | UniProt Accession | Length (aa) | Key Function |
|---|---|---|---|---|---|
| RbcS CTP | Ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit | Arabidopsis thaliana | P10795 | 57 | Targets proteins to the chloroplast stroma (photosynthetic pathway) |
| Ferredoxin-2 CTP | Ferredoxin-2 (chloroplastic) | Arabidopsis thaliana | P16972 | 53 | Directs proteins to the chloroplast electron transport system |
| RecA CTP | DNA repair protein RecA homolog 1 | Arabidopsis thaliana | Q39199 | 57 | Targets proteins to chloroplast nucleoids (DNA maintenance) |

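Before fusing a transit peptide to a cox gene, it is worth confirming that each extracted sequence has the length annotated in UniProt. The sketch below (names are hypothetical) counts residues for the three CTPs exactly as listed above and compares them with the Length (aa) column. Counted this way, the listed sequences are 54, 52 and 52 residues long, whereas the table gives 57, 53 and 57 aa, so either the sequences or the table values should be reconciled with the UniProt PTM/Processing annotation. Because chloroplast transit peptides are commonly described as enriched in serine and threonine, the S/T fraction is included as a rough composition readout.

```python
# CTP sequences copied from this section; lengths copied from the CTP table.
CTPS = {
    "RbcS CTP (P10795)": "MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVN",
    "Ferredoxin-2 CTP (P16972)": "MASTALSSAIVGTSFIRRSPAPISLRSLPSANTQSLFGLKSGTARGGRVTAM",
    "RecA CTP (Q39199)": "MDSQLVLSLKLNPSFTPLSPLFPFTPCSSFSPSLRFSSCYSRRLYSPVTVYA",
}
TABLE_LENGTHS = {
    "RbcS CTP (P10795)": 57,
    "Ferredoxin-2 CTP (P16972)": 53,
    "RecA CTP (Q39199)": 57,
}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def ser_thr_fraction(peptide: str) -> float:
    """Fraction of residues that are serine or threonine."""
    return sum(peptide.count(aa) for aa in "ST") / len(peptide)


def length_mismatches(seqs: dict[str, str], table: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map name -> (counted length, table length) wherever the two disagree."""
    return {
        name: (len(seq), table[name])
        for name, seq in seqs.items()
        if len(seq) != table[name]
    }


if __name__ == "__main__":
    for name, seq in CTPS.items():
        assert set(seq) <= STANDARD_AA, f"non-standard residue in {name}"
        print(f"{name}: {len(seq)} aa, S/T fraction {ser_thr_fraction(seq):.2f}")
    print("Mismatches vs table:", length_mismatches(CTPS, TABLE_LENGTHS))
```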
Vector Backbones

pCAMBIA2300 (Construct 1: Structural genes – coxL, M, S)


The pCAMBIA2300 vector (GenBank accession AF234315.1) was used as the backbone for the structural gene construct. It is a binary plant expression vector with an approximate size of 8.7 kb, designed as an empty cloning system without any reporter gene, allowing full customization of inserted expression cassettes.

This vector carries the nptII gene, which confers kanamycin resistance in plants, making it suitable for selecting transformants expressing the structural genes (coxL, coxM, coxS). For bacterial propagation, it also includes a kanamycin resistance marker, enabling selection in E. coli prior to Agrobacterium transformation.

The cloning region consists of a pUC18-derived multiple cloning site (MCS) containing standard restriction sites. Additionally, the presence of the pVS1 origin of replication ensures high plasmid stability in Agrobacterium. This vector is well-suited for accommodating multi-cassette inserts, such as the structural gene assembly used in this project.

pCAMBIA1300 (Construct 2: Maturation genes – coxD, E, F, G)


The pCAMBIA1300 vector (GenBank accession AF234296.1) was selected as the backbone for the maturation gene construct. Similar to pCAMBIA2300, it is an empty binary vector (~8.9 kb) designed for flexible insertion of custom genetic elements.

Its key feature is the presence of a hygromycin resistance gene (HygR) for plant selection, which complements the kanamycin resistance used in pCAMBIA2300. This enables the implementation of a dual-selection strategy for identifying co-transformed plants carrying both constructs.

For bacterial selection, pCAMBIA1300 also carries a kanamycin resistance marker, allowing propagation in E. coli. The vector includes a standard pUC18-derived MCS, suitable for inserting large DNA fragments such as the multi-gene maturation cassette (coxD, coxE, coxF, coxG).

Dual-Vector Strategy and Considerations

The combined use of pCAMBIA2300 and pCAMBIA1300 allows efficient co-expression of multiple genes through independent constructs:

| Construct | Genes | Vector | Plant Selection |
|---|---|---|---|
| Structural | coxL, coxM, coxS | pCAMBIA2300 | Kanamycin |
| Maturation | coxD, coxE, coxF, coxG | pCAMBIA1300 | Hygromycin |

This dual-selection system enables reliable identification of plants carrying both constructs. An important technical consideration is that both vectors use kanamycin for bacterial selection, so kanamycin alone cannot confirm that a single bacterial culture carries both plasmids. Each construct must therefore be cloned and verified independently before being introduced into Agrobacterium. Because the two vectors also share the pVS1 replicon, they are generally kept in separate Agrobacterium cultures that are mixed for co-transformation rather than combined in a single strain. Transformed plants are then selected on both kanamycin and hygromycin.
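The dual-selection logic can be summarized as a toy model: each construct contributes one plant resistance marker, and tissue survives a medium only if it resists every antibiotic in it. The sketch below (class and function names are hypothetical) also shows why the bacterial markers cannot tell the two plasmids apart. It idealizes selection and ignores escapes, chimeric tissue, marker silencing, and antibiotic concentrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    name: str
    vector: str
    plant_marker: str      # antibiotic used to select transformed plant tissue
    bacterial_marker: str  # antibiotic used to select E. coli / Agrobacterium


STRUCTURAL = Construct("structural (coxL, coxM, coxS)", "pCAMBIA2300", "kanamycin", "kanamycin")
MATURATION = Construct("maturation (coxD, coxE, coxF, coxG)", "pCAMBIA1300", "hygromycin", "kanamycin")


def survives(carried: list[Construct], medium: set[str]) -> bool:
    """Idealized rule: tissue survives only if it resists every antibiotic in the medium."""
    resistances = {c.plant_marker for c in carried}
    return medium <= resistances


def markers_distinct(constructs: list[Construct], attribute: str) -> bool:
    """True if every construct can be selected on its own via the given marker attribute."""
    markers = [getattr(c, attribute) for c in constructs]
    return len(set(markers)) == len(markers)


if __name__ == "__main__":
    dual = {"kanamycin", "hygromycin"}
    cases = [
        ("both constructs", [STRUCTURAL, MATURATION]),
        ("structural only", [STRUCTURAL]),
        ("maturation only", [MATURATION]),
    ]
    for label, carried in cases:
        print(f"{label}: survives dual selection = {survives(carried, dual)}")
    print("plant markers distinct:", markers_distinct([STRUCTURAL, MATURATION], "plant_marker"))
    print("bacterial markers distinct:", markers_distinct([STRUCTURAL, MATURATION], "bacterial_marker"))
```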

Plant Expression Vectors: pCAMBIA2300 and pCAMBIA1300

For my plant transformation system, I selected two complementary binary vectors: pCAMBIA2300 and pCAMBIA1300, enabling the independent construction and co-expression of structural and maturation gene cassettes. Detailed technical specifications for both vectors can be found in their respective datasheets provided by Abcam for pCAMBIA1300 and pCAMBIA2300.

| Feature | pCAMBIA2300 | pCAMBIA1300 |
|---|---|---|
| Construct Use | Structural genes (coxL, coxM, coxS) | Maturation genes (coxD, coxE, coxF, coxG) |
| Approx. Size | ~8.7 kb | ~8.9 kb |
| Plant Selection Marker | Kanamycin (nptII) | Hygromycin (HygR) |
| Bacterial Selection | Kanamycin | Kanamycin |
| Reporter Gene | None (empty vector) | None (empty vector) |
| Cloning Site | pUC18-derived MCS | pUC18-derived MCS |
| Replication in Agrobacterium | pVS1 origin (high stability) | pVS1 origin (high stability) |
| Insert Capacity | Suitable for large multi-cassette inserts | Suitable for large multi-cassette inserts |
| Main Advantage | Compatible with kanamycin-based plant selection | Enables dual selection with hygromycin |

Sources:

  • Bhattacharyya, S., Dey, N., & Maiti, I. B. (2002). Analysis of cis-sequence of subgenomic transcript promoter from the Figwort mosaic virus and comparison of promoter activity with the cauliflower mosaic virus promoters in monocot and dicot cells. Virus Research, 90(1), 47–62. https://doi.org/10.1016/S0166-0934(02)00146-5
  • Calvert, L. A., Ospina, M. D., & Shepherd, R. J. (1995). Characterization of cassava vein mosaic virus: A distinct plant pararetrovirus. Journal of General Virology, 76(5), 1271–1278. https://doi.org/10.1099/0022-1317-76-5-1271
  • Conci, L. R., Nishizawa, Y., Saito, M., Date, T., Hasegawa, A., Miki, K., & Hibi, T. (1993). A strong promoter fragment from the large noncoding region of soybean chlorotic mottle virus DNA. Japanese Journal of Phytopathology, 59(4), 432–437.
  • Davies, J. P., Reddy, V., Liu, X. L., Reddy, A. S., Ainley, W. M., Thompson, M., Sastry-Dent, L., Cao, Z., Connell, J., Gonzalez, D. O., & Wagner, D. R. (2014). Identification and use of the sugarcane bacilliform virus enhancer in transgenic maize. BMC Plant Biology, 14(1), 359. https://doi.org/10.1186/s12870-014-0359-3
  • Dey, N., & Maiti, I. B. (1999). Structure and promoter/leader deletion analysis of mirabilis mosaic virus (MMV) full-length transcript promoter in transgenic plants. Plant Molecular Biology, 40(5), 771–782. https://doi.org/10.1023/A:1006285426523
  • Genschik, P., Marbach, J., Uze, M., Feuerman, M., Plesse, B., & Fleck, J. (1994). Structure and promoter activity of a stress and developmentally regulated polyubiquitin-encoding gene of Nicotiana tabacum. Gene, 148(2), 195–202. https://doi.org/10.1016/0378-1119(94)90689-0
  • Goosey, L., Palecanda, L., & Sharrock, R. A. (1997). Differential patterns of expression of the Arabidopsis PHYB, PHYD, and PHYE phytochrome genes. Plant Physiology, 115(3), 959–969. https://doi.org/10.1104/pp.115.3.959
  • Khadanga, B., Chanwala, J., Sandeep, I. S., & Dey, N. (2021). Synthetic Promoters from Strawberry Vein Banding Virus (SVBV) and Dahlia Mosaic Virus (DaMV). Molecular Biotechnology, 63(9), 792–806. https://doi.org/10.1007/s12033-021-00344-5
  • Kumari, K., Sherpa, T., & Dey, N. (2024). Analysis of plant pararetrovirus promoter sequence(s) for developing a useful synthetic promoter with enhanced activity in rice, pearl millet, and tobacco plants. Frontiers in Plant Science, 15. https://doi.org/10.3389/fpls.2024.1426479
  • Maiti, I. B., & Shepherd, R. J. (1998). Isolation and Expression Analysis of Peanut Chlorotic Streak Caulimovirus (PClSV) Full-Length Transcript (FLt) Promoter in Transgenic Plants. Biochemical and Biophysical Research Communications, 244(2), 440–444. https://doi.org/10.1006/bbrc.1998.8287
  • Norris, S. R., Meyer, S. E., & Callis, J. (1993). The intron of Arabidopsis thaliana polyubiquitin genes is conserved in location and is a quantitative determinant of chimeric gene expression. Plant Molecular Biology, 21(5), 895–906. https://doi.org/10.1007/BF00027120
  • Pattanaik, S., Dey, N., Bhattacharyya, S., & Maiti, I. B. (2004). Isolation of full-length transcript promoter from the Strawberry vein banding virus (SVBV) and expression analysis by protoplasts transient assays and in transgenic plants. Plant Science, 167(3), 427–438. https://doi.org/10.1016/j.plantsci.2004.04.011
  • Remans, T., Grof, C. P. L., Ebert, P. R., & Schenk, P. M. (2005). Identification of functional sequences in the pregenomic RNA promoter of the Banana streak virus Cavendish strain (BSV-Cav). Virus Research, 108(1), 177–186. https://doi.org/10.1016/j.virusres.2004.09.005
  • Sahoo, D. K., Dey, N., & Maiti, I. B. (2014). pSiM24 Is a Novel Versatile Gene Expression Vector for Transient Assays As Well As Stable Expression of Foreign Genes in Plants. PLOS ONE, 9(6), e98988. https://doi.org/10.1371/journal.pone.0098988
  • Sahoo, D. K., Sarkar, S., Raha, S., Das, N. C., Banerjee, J., Dey, N., & Maiti, I. B. (2015). Analysis of Dahlia Mosaic Virus Full-length Transcript Promoter-Driven Gene Expression in Transgenic Plants. Plant Molecular Biology Reporter, 33(2), 178–199. https://doi.org/10.1007/s11105-014-0738-9
  • Shakhova, E. S., Markina, N. M., Mitiouchkina, T., Bugaeva, E. N., Karataeva, T. A., Palkina, K. A., Fakhranurova, L. I., Yampolsky, I. V., Sarkisyan, K. S., & Mishin, A. S. (2022). Systematic Comparison of Plant Promoters in Nicotiana spp. Expression Systems. International Journal of Molecular Sciences, 23(23), 15441. https://doi.org/10.3390/ijms232315441
  • Shirasawa-Seo, N., Mitsuhara, I., Nakamura, S., Murakami, T., Iwai, T., Nishizawa, Y., … & Ohashi, Y. (2002). Constitutive promoters available for transgene expression instead of CaMV 35S RNA promoter: Arabidopsis promoters of tryptophan synthase protein β subunit and phytochrome B. Plant Biotechnology, 19(1), 19–26.
  • Verdaguer, B., de Kochko, A., Beachy, R. N., & Fauquet, C. (1996). Isolation and expression in transgenic tobacco and rice plants, of the cassava vein mosaic virus (CVMV) promoter. Plant Molecular Biology, 31(6), 1129–1139. https://doi.org/10.1007/BF00040830
  • Verdaguer, B., de Kochko, A., Fux, C. I., Beachy, R. N., & Fauquet, C. (1998). Functional organization of the cassava vein mosaic virus (CsVMV) promoter. Plant Molecular Biology, 37(6), 1055–1067. https://doi.org/10.1023/A:1006004819398

Phase 2: Codon Optimization

Phase 3: CTP Junction Design & SPP Cleavage Verification

Phase 4: Promoter Screening Simulation (Asimov Kernel)

Phase 5: Terminator Selection (pairing with promoters)

Phase 6: Level 0 Part Assembly

Phase 7: Level 1 Cassette Assembly (individual expression cassettes)

Phase 8: Level 2 Multicassette Assembly (structural + maturation inserts)

Phase 9: Full Construct Assembly (insert into pCAMBIA1300 and pCAMBIA2300)

Phase 10: GFP Reporter Construct Assembly

Phase 11: Protein Structure Prediction

Phase 12: Primer Design & Twist Bioscience Order Preparation

Group Final Project
