Codon Optimization and Sequence Adaptation processes:
1. Start Codon Verification and Correction
As an initial step, all seven CODH genes were carefully inspected to verify the presence of a valid translation initiation codon. A critical adjustment was required for the coxM gene, which was the only gene using an alternative bacterial start codon (GTG) instead of the canonical ATG.
Because eukaryotic translation machinery, including that of Nicotiana tabacum, initiates almost exclusively at ATG, the native GTG was manually changed to ATG during the optimization process. In bacteria, a GTG start codon is still decoded as the initiator methionine, so this substitution ensures proper translation initiation in the plant host while preserving the original amino acid sequence of the CoxM protein.
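The start codon check described above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used (the correction was made manually); the function name `fix_start_codon` and the toy sequence in the usage note are hypothetical, and only GTG and TTG are treated as alternative starts.

```python
# Alternative bacterial start codons handled by this sketch (assumption:
# only GTG and TTG are considered; both are read as Met at initiation).
ALT_STARTS = {"GTG", "TTG"}


def fix_start_codon(cds: str) -> tuple[str, bool]:
    """Return (sequence, changed), replacing an alternative start with ATG."""
    seq = cds.upper().replace("U", "T")
    first = seq[:3]
    if first == "ATG":
        return seq, False            # already canonical, nothing to do
    if first in ALT_STARTS:
        return "ATG" + seq[3:], True  # swap only the first codon
    raise ValueError(f"unexpected start codon: {first!r}")
```

For example, `fix_start_codon("GTGATACCTTAA")` returns the toy sequence with its first codon replaced by ATG and `True` as the change flag, while a sequence that already starts with ATG is returned unchanged.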
2. Codon Optimization Strategy
Codon optimization was performed using the Benchling Codon Optimization Tool, applying the "Match Codon Usage" algorithm. This approach was selected because it reproduces the natural codon distribution of the target organism rather than overusing only the most frequent codons, thereby improving mRNA stability and translation efficiency.

The optimization process was carried out under the following parameters:
- Target organism: Nicotiana tabacum
- Restriction site filtering: Removal of common restriction enzyme recognition sites (EcoRI, HindIII, BamHI, XbaI, PstI, and SpeI) to facilitate downstream cloning
- Golden Gate compatibility: Elimination of BsaI and Esp3I sites to ensure compatibility with Modular Cloning (MoClo) systems
- RNA stability optimization: Implementation of uridine depletion and avoidance of stable hairpin structures to reduce ribosomal stalling and improve translation efficiency
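The restriction site filters listed above can be double-checked independently of the optimization tool. The following is a minimal sketch that scans both strands for the standard recognition sequences of the eight enzymes; the helper names `revcomp` and `find_sites` are hypothetical, and the sketch only reports sites, it does not remove them.

```python
# Recognition sequences (5'->3') of the enzymes named in the parameter list.
SITES = {
    "EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC",
    "XbaI": "TCTAGA", "PstI": "CTGCAG", "SpeI": "ACTAGT",
    "BsaI": "GGTCTC", "Esp3I": "CGTCTC",  # Type IIS, non-palindromic
}
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map enzyme name -> sorted 0-based positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in SITES.items():
        # Non-palindromic sites must also be searched as reverse complement.
        patterns = {site, revcomp(site)}
        positions = sorted(
            i
            for p in patterns
            for i in range(len(seq) - len(site) + 1)
            if seq.startswith(p, i)
        )
        if positions:
            hits[name] = positions
    return hits
```

An empty dictionary returned for an optimized sequence indicates that none of the listed sites remain on either strand.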
3. Results and Validation
Following optimization, all sequences were evaluated using CAIcal to assess codon adaptation and overall sequence quality.

The analysis demonstrated consistently strong performance across all seven genes, as shown in the following table:
| Gene Name | Length (bp) | CAI Score | Total GC% | GC at 3rd Position | Nc Value | Expression Potential |
|---|---|---|---|---|---|---|
| CoxL | 2430 | 0.773 | 46.3% | 40.0% | 57.0 | Excellent |
| CoxE | 1200 | 0.762 | 49.8% | 40.5% | 61.0 | Very Good |
| CoxG | 618 | 0.760 | 46.9% | 40.8% | 61.0 | Very Good |
| CoxS | 501 | 0.759 | 47.7% | 43.1% | 61.0 | Very Good |
| CoxD | 888 | 0.756 | 46.8% | 40.5% | 61.0 | Very Good |
| CoxF | 843 | 0.748 | 49.5% | 39.9% | 61.0 | Very Good |
| CoxM | 867 | 0.747 | 49.5% | 39.1% | 61.0 | Very Good |
The Codon Adaptation Index (CAI) values ranged from 0.747 to 0.773, indicating a high level of similarity to codon usage patterns found in highly expressed genes of Nicotiana tabacum. This suggests that the optimized sequences are well-suited for efficient translation in the plant host.
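For reference, the standard definition of the CAI is the geometric mean of the relative adaptiveness of each codon in the sequence. The notation below is introduced here for illustration; the exact reference codon table and scoring conventions applied by CAIcal are not restated in this section.

$$
w_{c} = \frac{f_{c}}{\max_{c' \in \mathrm{syn}(c)} f_{c'}}, \qquad
\mathrm{CAI} = \left( \prod_{i=1}^{L} w_{c_i} \right)^{1/L}
$$

Here $f_c$ is the frequency of codon $c$ in the reference set, $\mathrm{syn}(c)$ is the set of codons encoding the same amino acid as $c$, and $L$ is the number of codons scored in the gene. A CAI of 1.0 would mean that every position uses the most frequent synonymous codon.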
The overall GC content was adjusted to a range of 46.3% to 49.8%, which aligns with the typical GC composition of plant genes. This is a substantial shift from the original bacterial sequences and should contribute to better transcript stability and compatibility with the host expression machinery.
The Effective Number of Codons (Nc) values ranged from 57.0 to 61.0. Nc runs from 20 (a single codon used per amino acid) to 61 (all synonymous codons used equally), so these values reflect balanced codon usage without over-reliance on a small set of codons. This indicates that the sequences maintain sufficient variability, which is important for avoiding issues such as tRNA depletion or translational bottlenecks.
Additionally, the GC content at the third codon position was maintained at approximately 40%, which is considered optimal for the "wobble" position. This balance supports efficient recognition by plant tRNAs and contributes to overall translation efficiency.
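The two GC metrics reported in the table can be recomputed directly from a coding sequence. This is a minimal sketch with hypothetical function names; it counts every codon, including the stop codon, so values may differ slightly from those of a tool that excludes certain codons.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc3_percent(cds: str) -> float:
    """Percentage of G and C at the third position of each codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at index 2
    return gc_percent(third_positions)
```

Running both functions on each optimized gene gives a quick cross-check of the "Total GC%" and "GC at 3rd Position" columns.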
To further validate the integrity of the optimization process, both the raw bacterial sequences and the codon-optimized sequences were translated into their corresponding amino acid sequences.

A pairwise comparison was then performed using BLASTp alignment to assess sequence similarity.
The results confirmed that all optimized proteins are identical to their native counterparts, with no changes in amino acid sequence.
This verification step ensures that codon optimization only affected synonymous codon usage without altering protein structure or function, preserving the biological activity of all seven CODH components.
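The same check can be reproduced without an alignment tool by translating both versions of a gene with the standard genetic code and comparing the results. The sketch below is an illustration under stated assumptions, not the BLASTp workflow itself: the function names are hypothetical, and a native GTG or TTG start codon is treated as methionine, as it is during bacterial initiation.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; '*' marks stop codons."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def same_protein(native: str, optimized: str) -> bool:
    """True if both sequences encode the same protein."""
    native_protein = translate(native)
    # Assumption: an alternative bacterial start codon encodes Met.
    if native[:3].upper().replace("U", "T") in {"GTG", "TTG"}:
        native_protein = "M" + native_protein[1:]
    return native_protein == translate(optimized)
```

Calling `same_protein(native_cds, optimized_cds)` for each of the seven genes should return `True` if only synonymous changes were introduced.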
The resulting codon-optimized cox gene sequences are as follows:
- coxD gene (codon optimized):
ATGAGACATCATGCTGAACGAGATAAGGTCGCCGAGAGGCTAGCCTATGCAGGTTATATTCCAGATCGTGATCTTGCTACCGCTGTTTGGCTGATGGAAAGCCTTTCCAGGCCCTTGTTGTTAGAAGGAGAAGCTGGTGTAGGTAAAACCGAGGTAGCTCTGACTCTTGCGCAAGCTAACGGAGCAAGGCTCATTCGCTTGCAATGCTATGAAGGGCTCGATCAAAACGCTGCATTATACGAGTGGAATTACCAACGGCAGTTGCTCGCTATCAAAACACGGGAAAGTCGTGCTGACGCAGTAGATGTTATCGAAGATCATATTTTCTCAGAGAAGTTTCTTCTTGAGCGACCTCTGTTGGCTGCAATACGTCAACCCAAATCAGCAGTGCTACTAATTGATGAGGTTGACAGGGCCGACGAGGAGTTCGAAGCCTTTTTACTCGAACTTCTAAGCGATTACCAGGTTTCTATTCCTGAACTTGGTACAATCCACGCAACAACGATTCCACAGGTGATATTAACTTCCAATGGCACGAGAGAGTTATCAGATGCCTTGAGGAGGAGATGTCTCTACCACTATGTCGACTATCCAGATGTTGAAAGAGAAGCGCGTATCATAACCACAAGAATGCCGAATATTGACGTTGCTCTGGCGTTGCAGATTGCCAGGATGATCGAGGGAATACGAAAAGAGGATTTACGCAAGAGTCCTGGAGTCGCAGAAACTCTCGACTGGGCAGCAGCATTGGCTGGGCTTGGCGTTGAGGATCTTAGAGCTGAACCAGAAGCTGTGTTTGAAACTATGATGTGCTTGATAAAGACAGTCGAAGATAAATCGAGAGTGACTAGAGAGGTTTCTGATAGACTGCTTGGAAAGGTGGCATAA
- coxE gene (codon optimized):
ATGGTTGCAACTGCTGCCATTCATGAATCCAGCGCTGCTTCAGCAGGAGCTAGACGCAAGCTGGGCGATTTTGTTCGAGTACTCCGGGACAATGGTTTTATTGTGGGGCTCGCGGAGGCTGGAGATGCTCTTACTGTTCTTAGCAGGCCTGCCTCTTTGACACCTAGCAGACTACGACCGGCTCTTCGTGCATTGTTCTGCTCAAACAAGTCTGATTGGGAAAAGTTTGACGAGATTTTCGATGCTTTCTGGCTTGGACGAGGAATGAAATCCGCAACGAGAATTTCCGGAGTGCTTCAAAAAAGTCCTCCCGGTATGGAAAGTTCAAGGAGTGGCGATAGACCAGGTAATCCTGATGGGGCACCAGATCATGTTCAGCGGCGTATAGGCTTGGATCACGGCACCGATGAAAATAGTCCAGGACTTCGGGAAGGTGCATCACGCGCTGACTCACTGGCCAAGGCTGATTTTAGACATCTCACAAACCCGGACGATCTTGCTGCCGCTCATGCTGTAGCTGCAAGACTCGCAAAGGCTATGAGGGTGCGCTTAACCCGACGTGAACAGTCTCGCAGAACTGGTAGGAGGATCGACCTTAGAAGGACTATTCACAAAAATATAGCCCATGGAGGAATGCCACTGGAATTGGTCTGGCGACAGAGGAAACACAAACCATTAAGACTGGTTGTTCTACTCGACGCTTCCGGATCTATGAGCATGTATAGTGCAGTATTCTTAAGATTCATGCACGGGATTCTTGATAATTTTAGGGAGGCCGAAGCATTTGTTTTCCATACAAGGCTAATTCATATATCTCCAGCTTTGAGAGAACGTGATGCGACACGTTCTGTGGAGAGAATGAGCCTATTGGCCCAAGGCGTCGGTGGTGGAACACGGATCGGTGAATCACTTGCCACGTTTAATAGATGGCATGCAAAGAGAGCAATTCATTCGAGGACTTGCGTTATGATCGTGTCAGATGGTTACGATACCGGACCTGCCGAGCAATTGGAGCGAGAAATGTCGGCTTTAAGGCGTCGTTGTAGAAGAATCGCATGGCTCAACCCAATGATCGGTTGGAGGGGGTATGCGCCAGAGGCAGCTGGGATGAAAGCTGCACTGCCTCACGTCGACTTGTTTGCTCCCGCTCACAACTTAGAGAGCTTGCAAGCAATTGAGCCTTACTTAGCGAGGATATAA
- coxF gene (codon optimized):
ATGACACCTACTCCTGACGTGTTAGATTTAGTCAACAATATGAAAGCCAGAGGAGAGCCATTCGCCCTTGCAACTGTAGTTCGGACGGTATCACTCACCGCAGCCAAGGCAGGTGCAAAGGCTATTATTTTGAGCGACGGTACTATGACAGCAGGATGGATTGGGGGCGGGTGTGCGAGAGCTAATGTGCTTAAGGCTGCTAGGCAAAGTCTTAGCGACGGAAAGCCGAGGCTGATTAGTGTTCAACCAAAGGATGTTCTTGAGGAACATGGTTTAACAGCAGGGGAAGCGCGAGAAGGAGTGCTATATGCCAACAACATGTGCCCAAGCCATGGTACCATGGATATTTTCGTTGAGCCAATATTGCCGCGACCTCAGCTCTATATCTGTGGAGCAAGCCCAGTTGCAGTGGCTATAGCTGCTATAGCACCTCGTATGGGATTTTTTGTGTCTGTTTGCGCTCCCAAAGCAGATCACACATTGTTTGGTGATACCGATAGGCTGATTGATGGTTATGAAATTCCCGCCGACAGCGGTACTAATCGGTACGTCGTTGTATCTACACAGGGACGTGGCGATACTGCTGCTCTGAAATCTGCACTATCCACGCCATCCGTCTACGTGGCTTTCGTTGGCAGTAGAAAGAAAGCCTCGGTTTTGAGGGAAGAGCTTACCGTAGCAGGAATTGCGCCATCACTATTGGAAACATTGCATGCTCCTGCCGGCCTCGACCTTGGCGGTATCACTCCTGATGAAATCGCTCTCTCAATCGTTGCTGAGATGGTCGAGATAAGACGCCACGGGCAAAGACAAAGCGATAATCAGAAAGAAGGAACATCATAA
- coxG gene (codon optimized):
ATGGATATGAACGCAAGCCAGAGAATTGAAGCCTCAAGGGAAAAAGTCTACGCCGCTCTCAATGATGTTGAGGTGCTTAGGCCTTGCATTCCAGGTTGCGAGTCCATCGAAAAGATCTCTGATAGCGAGATGACTGCCAAGGTAACATTGCGCATAGGACCAGTGAAAGCATCTTTTACCGGTAAGGTGACCCTAAGTGATCTCGATCCTCCAAATGGTTACACCATAGCAGGGGAGGGTACAGGAGGAATGGCAGGATTCGCAAAGGGCGGTGCTACTGTGAAACTCGAAGCTGACGGGACTGCCACGATTCTTCATTATACTGTTAAAGCTGACGTCGGAGGCAAACTGGCGCAGCTTGGTGGTAGACTAATCGATGCAACAGCTACAAAACTTGCAGGAGAGTTTTTTGAAAAATTCGGAAATATTGTTGGGCCTGTAGTAGTCCAAGACGAAGAAGAGCCGGTTAAGAAGAAAGGTTGGTTGAAGAAGATAACTGGCGCTTTAAGTGTTTTGGTTTTCTCAATTTTGTTAGGAGCTCACTGGTGTTGTATTGGGGGCCATGCTCACGCTCAAAACGATCCCCTGATGTTAGCGATCTGTTCATCGCGAGTTTAA
- coxL gene (codon optimized):
ATGAATATTCAGACAACAGTTGAACCAACTAGCGCTGAGAGAGCAGAAAAGTTGCAGGGTATGGGGTGCAAGAGGAAAAGAGTCGAAGATATTCGATTTACTCAGGGTAAGGGCAATTACGTCGATGATGTGAAATTACCGGGTATGTTGTTTGGTGATTTTGTTAGGAGTAGCCACGCTCATGCTAGGATTAAAAGTATTGATACCTCAAAAGCTAAGGCGCTTCCAGGTGTATTCGCTGTTTTAACAGCGGCAGATTTGAAGCCTCTGAATTTACATTATATGCCCACTCTGGCTGGAGATGTACAAGCAGTTCTTGCAGACGAGAAAGTTCTTTTCCAAAATCAAGAGGTTGCTTTTGTAGTGGCTAAAGATAGATACGTTGCGGCAGATGCGATCGAATTGGTAGAAGTAGATTATGAGCCATTACCAGTTCTAGTAGACCCATTCAAGGCAATGGAACCAGATGCACCTCTTCTAAGAGAAGATATTAAAGACAAAATGACTGGTGCACACGGTGCGAGGAAACATCACAACCATATATTCAGATGGGAAATAGGTGATAAGGAAGGAACTGATGCTACCTTCGCCAAAGCTGAAGTTGTGTCAAAAGATATGTTTACCTATCATCGGGTTCATCCGAGCCCACTGGAAACGTGTCAATGTGTTGCATCTATGGACAAGATCAAGGGTGAACTGACGTTGTGGGGCACATTTCAGGCTCCCCATGTCATTAGAACAGTAGTGTCATTGATCAGCGGTTTGCCAGAGCATAAAATCCACGTCATTGCACCTGACATAGGGGGAGGATTTGGAAACAAGGTGGGAGCTTATTCCGGGTACGTCTGTGCTGTGGTTGCCTCCATCGTGCTGGGAGTACCCGTTAAGTGGGTCGAAGATCGAATGGAGAACCTAAGCACTACATCATTTGCACGTGACTACCACATGACTACAGAACTCGCAGCTACAAAGGATGGAAAGATTCTTGCAATGCGCTGTCACGTCTTGGCTGATCACGGAGCTTTCGATGCCTGTGCTGATCCATCTAAATGGCCTGCTGGGTTTATGAACATATGTACAGGAAGCTATGACATGCCAGTTGCACATTTGGCCGTGGATGGTGTCTATACTAACAAAGCATCCGGCGGAGTAGCTTATAGGTGCTCATTCCGAGTTACAGAAGCTGTTTATGCCATTGAGAGGGCTATTGAGACTCTGGCTCAGCGGCTCGAGATGGATTCAGCTGATCTAAGAATAAAGAACTTTATACAACCTGAGCAGTTCCCTTATATGGCTCCTCTTGGCTGGGAGTACGACAGCGGAAATTATCCATTAGCGATGAAGAAAGCTATGGATACTGTTGGTTATCATCAACTTCGTGCTGAACAGAAAGCCAAACAAGAAGCATTTAAGCGGGGCGAGACACGCGAGATTATGGGAATTGGTATCTCGTTTTTCACCGAGATTGTTGGCGCCGGGCCGTCTAAGAATTGTGATATTCTCGGAGTTTCTATGTTTGATAGTGCAGAAATTCGTATTCATCCAACCGGTTCAGTGATTGCTAGAATGGGCACTAAGAGCCAGGGCCAGGGGCACGAGACTACTTACGCTCAAATCATAGCAACCGAACTCGGTATTCCCGCTGACGACATTATGATCGAAGAAGGGAATACCGATACTGCCCCTTATGGGCTTGGAACTTACGGAAGTCGCTCGACACCCACGGCTGGTGCTGCAACCGCTGTGGCCGCTCGTAAAATAAAAGCCAAGGCTCAAATGATTGCAGCACACATGCTCGAAGTGCATGAGGGAGATTTGGAATGGGACGTGGACAGATTTAGGGTTAAAGGTCTTCCGGAAAAATTCAAGACTATGAAGGAACTCGCATGGGCATCCTACAATAGTCCACCACCCAATCTTGAGCCTGGGCTCGAGGCTGTGAACTATTACGACCCTCCTAATATGACTTATCCTTTTGGTGCCTATTTTTGCATTATGGA
TATAGATGTGGATACTGGCGTCGCCAAAACCAGGAGGTTCTATGCATTAGACGATTGCGGAACAAGAATCAACCCGATGATTATAGAAGGGCAAGTTCATGGTGGTTTGACAGAGGCCTTCGCAGTAGCTATGGGGCAGGAGATCCGATACGACGAGCAAGGAAATGTGCTTGGAGCATCTTTTATGGACTTCTTCTTGCCAACGGCCGTCGAAACACCAAAGTGGGAGACAGATTACACAGTTACTCCATCTCCACATCATCCTATAGGAGCCAAAGGCGTTGGTGAAAGTCCTCATGTTGGCGGTGTGCCTTGCTTTTCAAATGCGGTTAATGATGCTTACGCATTTTTAAACGCAGGCCACATCCAAATGCCTCATGATGCATGGAGACTATGGAAGGTAGGAGAGCAACTTGGACTTCACGTCTAA
- coxM gene (codon optimized):
ATGATACCTGGATCATTTGATTATCATAGACCAAAATCCATTGCAGACGCAGTTGCTCTTCTTACGAAATTAGGGGAGGATGCTAGACCTTTGGCCGGAGGCCACAGCCTAATTCCTATTATGAAGACCAGATTAGCTACACCAGAACATTTGGTTGATCTCAGGGATATTGGAGATTTAGTCGGAATTAGGGAGGAGGGTACGGACGTCGTCATCGGGGCAATGACAACTCAGCATGCGCTTATAGGTTCAGATTTCTTGGCAGCAAAATTGCCAATTATTCGCGAGACAAGCCTGTTGATAGCAGATCCACAAATAAGGTACATGGGAACCATTGGCGGCAATGCCGCTAACGGAGATCCTGGAAACGATATGCCGGCCCTCATGCAGTGCTTGGGTGCGGCTTACGAACTCACTGGCCCTGAAGGTGCTCGTATAGTTGCTGCACGAGATTACTATCAAGGGGCTTATTTCACTGCTATTGAGCCCGGTGAACTTCTTACAGCAATCAGAATCCCCGTGCCACCCACTGGACACGGGTACGCTTACGAAAAACTGAAGCGGAAAATTGGCGACTATGCCACCGCCGCGGCAGCTGTAGTACTAACAATGAGTGGTGGAAAATGTGTGACTGCATCGATCGGTCTAACTAATGTTGCGAACACACCACTTTGGGCAGAAGAGGCCGGAAAGGTGTTGGTTGGTACTGCTCTCGACAAACCTGCTTTAGACAAGGCTGTAGCTCTGGCTGAGGCTATCACAGCTCCGGCATCTGATGGTCGCGGGCCAGCAGAATATCGAACCAAGATGGCTGGTGTTATGCTTCGTAGGGCAGTTGAAAGAGCAAAGGCCAGAGCCAAGAATTAA
- coxS gene (codon optimized):
ATGGCGAAAGCTCACATTGAACTCACGATCAACGGACATCCAGTGGAGGCATTGGTTGAACCTCGGACTTTACTAATTCACTTCATTAGAGAGCAACAGAACCTTACCGGCGCACATATCGGATGCGACACTTCACACTGCGGGGCTTGTACTGTTGATCTCGATGGTATGAGCGTGAAGAGCTGTACAATGTTTGCTGTCCAAGCTAATGGAGCTTCAATCACCACCATTGAAGGAATGGCAGCACCGGATGGTACACTGAGTGCTCTGCAAGAAGGGTTTAGGATGATGCATGGTTTGCAATGCGGTTACTGTACTCCAGGGATGATCATGCGATCCCATAGATTGCTTCAAGAGAATCCAAGCCCCACAGAAGCGGAAATAAGGTTCGGAATTGGTGGAAATCTTTGCCGCTGTACAGGCTACCAGAACATTGTTAAAGCAATACAGTATGCCGCCGCTAAGATAAATGGCGTACCTTTTGAGGAGGCCGCAGAATAA
Back-Translation and Codon Optimization of Engineered CTP Sequences
After designing and validating the engineered chloroplast transit peptides (CTPs) at the amino acid level, the next step was to convert these protein sequences into DNA sequences that are fully compatible with the plant expression system. This process ensures that the "targeting signals" (CTPs) are translated efficiently in Nicotiana tabacum, just like the CODH subunits.
Since these CTPs are fused directly to the N-terminus of the CODH proteins, it is essential that they follow the same genetic design rules as the rest of the system to guarantee consistent expression and proper chloroplast targeting.
Back-Translation Strategy
The engineered CTP amino acid sequences (RbcS, Fer2, and RecA), including the modified junction motifs (VNA-AM, VTA-AM, and TVY-AA), were back-translated into DNA sequences using the Benchling Codon Optimization tool.
This step converted the peptide sequences into nucleotide sequences optimized for expression in Nicotiana tabacum, ensuring compatibility with the plant's codon usage preferences and translation machinery.
Codon Optimization Consistency
The same optimization framework used for the seven CODH genes was applied to the CTP sequences to maintain full compatibility and expression uniformity across the entire multigene construct. This guarantees that all components of the system follow the same expression logic within the plant cell.
Key Adjustment: Hairpin Structure Control
A specific adjustment was introduced during this step due to the short length of CTP sequences. The standard secondary structure analysis settings were not optimal for short peptide-encoding regions, which can lead to inaccurate prediction of stable RNA hairpins near the translation start site.
To address this, the hairpin analysis window was reduced to 100 to improve sensitivity for short sequences and to ensure that no stable secondary structures form at the 5' region that could interfere with ribosome binding or early translation.
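As a rough illustration of what a windowed hairpin check looks for, the sketch below scans the first part of a sequence for perfect inverted repeats separated by a short loop. It is not the algorithm used by the optimization tool, which is not described here: the function name, the stem and loop length defaults, and the reading of the window value as a number of nucleotides are all assumptions, and real structure prediction also scores mismatches, G-U pairs, and folding energy.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_hairpins(seq, window=100, stem=6, min_loop=3, max_loop=8):
    """Return (left_start, right_start) pairs of perfect stems in the window.

    Only the first `window` bases are scanned (assumed to be nucleotides).
    """
    region = seq.upper().replace("U", "T")[:window]
    hits = []
    for i in range(len(region) - 2 * stem - min_loop + 1):
        # The right arm must equal the reverse complement of the left arm.
        left_rc = revcomp(region[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(region):
                break
            if region[j:j + stem] == left_rc:
                hits.append((i, j))
    return hits
```

An empty list for a CTP sequence means that no perfect stem of the chosen length was found near the 5' end under these simplified rules.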
The following are the final codon-optimized CTP sequences generated in this step:
- RbcS CTP Sequence (engineered and codon optimized):
ATGGCCTCATCAATGCTCAGTAGCGCCACAATGGTGGCAAGTCCTGCTCAAGCTACAATGGTCGCTCCCTTTAATGGTCTGAAGTCGTCCGCAGCATTCCCAGCAACTAGAAAAGCTAATAATGACATAACGAGCATTACCAGCAACGGAGGCAGGGTAAACGCTGCG
- Fer2 CTP Sequence (engineered and codon optimized):
ATGGCTAGCACCGCACTGAGCTCAGCCATTGTGGGAACTTCCTTCATCCGGAGAAGTCCTGCGCCCATATCTCTACGATCACTCCCATCGGCAAACACACAATCTCTTTTTGGGTTGAAGAGTGGAACGGCAAGGGGTGGCAGAGTCACAGCTGCT
- RecA CTP Sequence (engineered and codon optimized):
ATGGACTCTCAACTTGTATTAAGCCTGAAGTTGAACCCCTCTTTCACACCACTTAGTCCTTTGTTTCCGTTTACTCCATGTTCCAGTTTCTCCCCATCGCTAAGGTTTTCAAGCTGCTACTCACGAAGACTCTATTCACCTGTCACCGTGTACGCAGCT