Phase 2: Codon Optimization

Codon Optimization and Its Critical Role:

Codon Optimization and Sequence Adaptation Processes:

1. Start Codon Verification and Correction

As an initial step, all seven CODH genes were carefully inspected to verify the presence of a valid translation initiation codon. A critical adjustment was required for the coxM gene, which was the only gene using an alternative bacterial start codon (GTG) instead of the canonical ATG.

Because the translation machinery of Nicotiana tabacum relies on ATG as the initiation codon, the native GTG was manually corrected to ATG during the optimization process. In bacteria, an initiating GTG is still decoded as methionine, so this change ensures proper translation initiation in the plant host while preserving the original amino acid sequence of the CoxM protein.
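The start codon check described above can be sketched in Python. This is a minimal illustration, not the procedure actually used (the correction was made manually); the function name, the set of alternative start codons, and the toy sequence in the usage note are assumptions.

```python
from typing import Tuple

PLANT_START = "ATG"
# Assumption: alternative initiation codons commonly seen in bacterial genes.
BACTERIAL_ALT_STARTS = {"GTG", "TTG"}


def fix_start_codon(cds: str) -> Tuple[str, bool]:
    """Return (sequence, changed).

    If the coding sequence begins with an alternative bacterial start
    codon, replace it with ATG; otherwise return it unchanged.
    """
    seq = "".join(cds.split()).upper()  # drop whitespace, normalise case
    start = seq[:3]
    if start == PLANT_START:
        return seq, False
    if start in BACTERIAL_ALT_STARTS:
        return PLANT_START + seq[3:], True
    raise ValueError(f"unexpected start codon: {start!r}")
```

For a toy input such as "GTGAAATAA", the function returns the sequence with ATG as its first codon and flags that a change was made; a sequence that already starts with ATG is returned unchanged.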

2. Codon Optimization Strategy

Codon optimization was performed using the Benchling Codon Optimization Tool, applying the “Match Codon Usage” algorithm. This approach was selected because it reproduces the natural codon distribution of the target organism rather than overusing only the most frequent codons, thereby improving mRNA stability and translation efficiency.


The optimization process was carried out under the following parameters:

  • Target organism: Nicotiana tabacum
  • Restriction site filtering: Removal of common restriction enzyme recognition sites (EcoRI, HindIII, BamHI, XbaI, PstI, and SpeI) to facilitate downstream cloning
  • Golden Gate compatibility: Elimination of BsaI and Esp3I sites to ensure compatibility with Modular Cloning (MoClo) systems
  • RNA stability optimization: Implementation of uridine depletion and avoidance of stable hairpin structures to reduce ribosomal stalling and improve translation efficiency
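As an independent check on the restriction site filtering listed above, an optimized sequence can be scanned for the recognition sites on both strands. The sketch below is illustrative only: the function names are hypothetical, and it checks just the eight enzymes named in the parameter list.

```python
# Recognition sequences (5'->3') for the enzymes named in the parameters above.
SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "PstI": "CTGCAG",
    "SpeI": "ACTAGT",
    "BsaI": "GGTCTC",   # Type IIS, non-palindromic
    "Esp3I": "CGTCTC",  # Type IIS, non-palindromic
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str) -> dict:
    """Map enzyme name -> sorted 0-based positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in SITES.items():
        # A palindromic site equals its reverse complement, so the set
        # collapses to one pattern; Type IIS sites yield two patterns.
        patterns = {site, revcomp(site)}
        positions = sorted(
            i
            for p in patterns
            for i in range(len(seq) - len(p) + 1)
            if seq.startswith(p, i)
        )
        if positions:
            hits[name] = positions
    return hits
```

An empty dictionary means none of the listed sites is present. Checking the reverse complement matters for BsaI and Esp3I, whose sites can sit on the opposite strand and still interfere with Golden Gate assembly.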

3. Results and Validation

Following optimization, all sequences were evaluated using CAIcal to assess codon adaptation and overall sequence quality.

The analysis demonstrated consistently strong performance across all seven genes, as shown in the following table:

Gene Name  Length (bp)  CAI Score  Total GC%  GC at 3rd Position  Nc Value  Expression Potential
CoxL       2430         0.773      46.3%      40.0%               57.0      Excellent
CoxE       1200         0.762      49.8%      40.5%               61.0      Very Good
CoxG       618          0.760      46.9%      40.8%               61.0      Very Good
CoxS       501          0.759      47.7%      43.1%               61.0      Very Good
CoxD       888          0.756      46.8%      40.5%               61.0      Very Good
CoxF       843          0.748      49.5%      39.9%               61.0      Very Good
CoxM       867          0.747      49.5%      39.1%               61.0      Very Good

The Codon Adaptation Index (CAI) values ranged from 0.747 to 0.773, indicating a high level of similarity to codon usage patterns found in highly expressed genes of Nicotiana tabacum. This suggests that the optimized sequences are well-suited for efficient translation in the plant host.
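For reference, the CAI is commonly defined (following Sharp and Li) as the geometric mean of the relative adaptiveness of each codon in the gene. The notation below is introduced here for illustration: f_c is the frequency of codon c in the reference set of highly expressed host genes, S(c) is the set of codons synonymous with c, and L is the number of codons scored in the gene.

```latex
w_c = \frac{f_c}{\max_{c' \in S(c)} f_{c'}},
\qquad
\mathrm{CAI} = \left( \prod_{i=1}^{L} w_{c_i} \right)^{1/L}
             = \exp\!\left( \frac{1}{L} \sum_{i=1}^{L} \ln w_{c_i} \right)
```

A value of 1 would mean that every position uses the most frequent synonymous codon of the reference set. A codon-matching strategy such as the one described above deliberately avoids that extreme, so values below 1 are expected.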

The overall GC content was successfully adjusted to a range of 46.3% to 49.8%, aligning with the typical GC composition of plant genes. This represents a significant improvement compared to the original bacterial sequences and contributes to better transcriptional stability and compatibility with the host genome.

The Effective Number of Codons (Nc) values ranged from 57.0 to 61.0, on a scale that runs from 20 (a single codon used per amino acid) to 61 (all synonymous codons used equally), reflecting balanced codon usage without excessive repetition. This indicates that the sequences maintain sufficient variability, which is important for avoiding issues such as tRNA depletion or translational bottlenecks.

Additionally, the GC content at the third codon position was maintained at approximately 40%, which is considered optimal for the “wobble” position. This balance supports efficient recognition by plant tRNAs and contributes to overall translation efficiency.
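The two GC metrics reported in the table can be recomputed directly from a sequence. The sketch below is a minimal illustration with hypothetical function names. It counts every codon, including the stop codon, so its results may differ slightly from tools that exclude certain codons.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc3_percent(cds: str) -> float:
    """Percentage of G and C at the third position of each codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    third_positions = cds[2::3]  # bases at index 2, 5, 8, ...
    return gc_percent(third_positions)
```

For example, the toy sequence "ATGGCCTAA" has third-position bases G, C and A, so two of its three wobble positions are G or C.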

To further validate the integrity of the optimization process, both the raw bacterial sequences and the codon-optimized sequences were translated into their corresponding amino acid sequences.


A pairwise comparison was then performed using BLASTp alignment to assess sequence similarity. The results confirmed that all optimized proteins are identical to their native counterparts, with no changes in amino acid sequence. This verification step ensures that codon optimization only affected synonymous codon usage without altering protein structure or function, preserving the biological activity of all seven CODH components.
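The same identity check can be sketched without an alignment tool by translating both sequences and comparing the results. This is an illustration under stated assumptions, not the workflow used above: the function name and the toy sequences in the usage note are hypothetical, and the standard genetic code is assumed. One detail matters for coxM: a naive translation of the native GTG start yields valine, so the sketch treats an alternative start codon at the first position as methionine.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}
ALT_STARTS = {"GTG", "TTG"}  # assumption: bacterial alternative start codons


def translate(cds: str, alt_start_as_met: bool = True) -> str:
    """Translate a coding sequence; trailing stop codons are dropped."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    protein = [CODON_TABLE[c] for c in codons]
    # An initiating GTG or TTG is decoded as methionine in bacteria.
    if alt_start_as_met and codons and codons[0] in ALT_STARTS:
        protein[0] = "M"
    return "".join(protein).rstrip("*")
```

With toy inputs, `translate("GTGAAACGCTGA")` and `translate("ATGAAGAGATAA")` give the same protein, which is the outcome expected for a native gene and its synonymous recoding. Any internal `*` in the output would flag a premature stop codon.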

The resulting codon-optimized cox gene sequences are as follows:

  • coxD gene (codon optimized):
ATGAGACATCATGCTGAACGAGATAAGGTCGCCGAGAGGCTAGCCTATGCAGGTTATATTCCAGATCGTGATCTTGCTACCGCTGTTTGGCTGATGGAAAGCCTTTCCAGGCCCTTGTTGTTAGAAGGAGAAGCTGGTGTAGGTAAAACCGAGGTAGCTCTGACTCTTGCGCAAGCTAACGGAGCAAGGCTCATTCGCTTGCAATGCTATGAAGGGCTCGATCAAAACGCTGCATTATACGAGTGGAATTACCAACGGCAGTTGCTCGCTATCAAAACACGGGAAAGTCGTGCTGACGCAGTAGATGTTATCGAAGATCATATTTTCTCAGAGAAGTTTCTTCTTGAGCGACCTCTGTTGGCTGCAATACGTCAACCCAAATCAGCAGTGCTACTAATTGATGAGGTTGACAGGGCCGACGAGGAGTTCGAAGCCTTTTTACTCGAACTTCTAAGCGATTACCAGGTTTCTATTCCTGAACTTGGTACAATCCACGCAACAACGATTCCACAGGTGATATTAACTTCCAATGGCACGAGAGAGTTATCAGATGCCTTGAGGAGGAGATGTCTCTACCACTATGTCGACTATCCAGATGTTGAAAGAGAAGCGCGTATCATAACCACAAGAATGCCGAATATTGACGTTGCTCTGGCGTTGCAGATTGCCAGGATGATCGAGGGAATACGAAAAGAGGATTTACGCAAGAGTCCTGGAGTCGCAGAAACTCTCGACTGGGCAGCAGCATTGGCTGGGCTTGGCGTTGAGGATCTTAGAGCTGAACCAGAAGCTGTGTTTGAAACTATGATGTGCTTGATAAAGACAGTCGAAGATAAATCGAGAGTGACTAGAGAGGTTTCTGATAGACTGCTTGGAAAGGTGGCATAA
  • coxE gene (codon optimized):
ATGGTTGCAACTGCTGCCATTCATGAATCCAGCGCTGCTTCAGCAGGAGCTAGACGCAAGCTGGGCGATTTTGTTCGAGTACTCCGGGACAATGGTTTTATTGTGGGGCTCGCGGAGGCTGGAGATGCTCTTACTGTTCTTAGCAGGCCTGCCTCTTTGACACCTAGCAGACTACGACCGGCTCTTCGTGCATTGTTCTGCTCAAACAAGTCTGATTGGGAAAAGTTTGACGAGATTTTCGATGCTTTCTGGCTTGGACGAGGAATGAAATCCGCAACGAGAATTTCCGGAGTGCTTCAAAAAAGTCCTCCCGGTATGGAAAGTTCAAGGAGTGGCGATAGACCAGGTAATCCTGATGGGGCACCAGATCATGTTCAGCGGCGTATAGGCTTGGATCACGGCACCGATGAAAATAGTCCAGGACTTCGGGAAGGTGCATCACGCGCTGACTCACTGGCCAAGGCTGATTTTAGACATCTCACAAACCCGGACGATCTTGCTGCCGCTCATGCTGTAGCTGCAAGACTCGCAAAGGCTATGAGGGTGCGCTTAACCCGACGTGAACAGTCTCGCAGAACTGGTAGGAGGATCGACCTTAGAAGGACTATTCACAAAAATATAGCCCATGGAGGAATGCCACTGGAATTGGTCTGGCGACAGAGGAAACACAAACCATTAAGACTGGTTGTTCTACTCGACGCTTCCGGATCTATGAGCATGTATAGTGCAGTATTCTTAAGATTCATGCACGGGATTCTTGATAATTTTAGGGAGGCCGAAGCATTTGTTTTCCATACAAGGCTAATTCATATATCTCCAGCTTTGAGAGAACGTGATGCGACACGTTCTGTGGAGAGAATGAGCCTATTGGCCCAAGGCGTCGGTGGTGGAACACGGATCGGTGAATCACTTGCCACGTTTAATAGATGGCATGCAAAGAGAGCAATTCATTCGAGGACTTGCGTTATGATCGTGTCAGATGGTTACGATACCGGACCTGCCGAGCAATTGGAGCGAGAAATGTCGGCTTTAAGGCGTCGTTGTAGAAGAATCGCATGGCTCAACCCAATGATCGGTTGGAGGGGGTATGCGCCAGAGGCAGCTGGGATGAAAGCTGCACTGCCTCACGTCGACTTGTTTGCTCCCGCTCACAACTTAGAGAGCTTGCAAGCAATTGAGCCTTACTTAGCGAGGATATAA
  • coxF gene (codon optimized):
ATGACACCTACTCCTGACGTGTTAGATTTAGTCAACAATATGAAAGCCAGAGGAGAGCCATTCGCCCTTGCAACTGTAGTTCGGACGGTATCACTCACCGCAGCCAAGGCAGGTGCAAAGGCTATTATTTTGAGCGACGGTACTATGACAGCAGGATGGATTGGGGGCGGGTGTGCGAGAGCTAATGTGCTTAAGGCTGCTAGGCAAAGTCTTAGCGACGGAAAGCCGAGGCTGATTAGTGTTCAACCAAAGGATGTTCTTGAGGAACATGGTTTAACAGCAGGGGAAGCGCGAGAAGGAGTGCTATATGCCAACAACATGTGCCCAAGCCATGGTACCATGGATATTTTCGTTGAGCCAATATTGCCGCGACCTCAGCTCTATATCTGTGGAGCAAGCCCAGTTGCAGTGGCTATAGCTGCTATAGCACCTCGTATGGGATTTTTTGTGTCTGTTTGCGCTCCCAAAGCAGATCACACATTGTTTGGTGATACCGATAGGCTGATTGATGGTTATGAAATTCCCGCCGACAGCGGTACTAATCGGTACGTCGTTGTATCTACACAGGGACGTGGCGATACTGCTGCTCTGAAATCTGCACTATCCACGCCATCCGTCTACGTGGCTTTCGTTGGCAGTAGAAAGAAAGCCTCGGTTTTGAGGGAAGAGCTTACCGTAGCAGGAATTGCGCCATCACTATTGGAAACATTGCATGCTCCTGCCGGCCTCGACCTTGGCGGTATCACTCCTGATGAAATCGCTCTCTCAATCGTTGCTGAGATGGTCGAGATAAGACGCCACGGGCAAAGACAAAGCGATAATCAGAAAGAAGGAACATCATAA
  • coxG gene (codon optimized):
ATGGATATGAACGCAAGCCAGAGAATTGAAGCCTCAAGGGAAAAAGTCTACGCCGCTCTCAATGATGTTGAGGTGCTTAGGCCTTGCATTCCAGGTTGCGAGTCCATCGAAAAGATCTCTGATAGCGAGATGACTGCCAAGGTAACATTGCGCATAGGACCAGTGAAAGCATCTTTTACCGGTAAGGTGACCCTAAGTGATCTCGATCCTCCAAATGGTTACACCATAGCAGGGGAGGGTACAGGAGGAATGGCAGGATTCGCAAAGGGCGGTGCTACTGTGAAACTCGAAGCTGACGGGACTGCCACGATTCTTCATTATACTGTTAAAGCTGACGTCGGAGGCAAACTGGCGCAGCTTGGTGGTAGACTAATCGATGCAACAGCTACAAAACTTGCAGGAGAGTTTTTTGAAAAATTCGGAAATATTGTTGGGCCTGTAGTAGTCCAAGACGAAGAAGAGCCGGTTAAGAAGAAAGGTTGGTTGAAGAAGATAACTGGCGCTTTAAGTGTTTTGGTTTTCTCAATTTTGTTAGGAGCTCACTGGTGTTGTATTGGGGGCCATGCTCACGCTCAAAACGATCCCCTGATGTTAGCGATCTGTTCATCGCGAGTTTAA
  • coxL gene (codon optimized):
ATGAATATTCAGACAACAGTTGAACCAACTAGCGCTGAGAGAGCAGAAAAGTTGCAGGGTATGGGGTGCAAGAGGAAAAGAGTCGAAGATATTCGATTTACTCAGGGTAAGGGCAATTACGTCGATGATGTGAAATTACCGGGTATGTTGTTTGGTGATTTTGTTAGGAGTAGCCACGCTCATGCTAGGATTAAAAGTATTGATACCTCAAAAGCTAAGGCGCTTCCAGGTGTATTCGCTGTTTTAACAGCGGCAGATTTGAAGCCTCTGAATTTACATTATATGCCCACTCTGGCTGGAGATGTACAAGCAGTTCTTGCAGACGAGAAAGTTCTTTTCCAAAATCAAGAGGTTGCTTTTGTAGTGGCTAAAGATAGATACGTTGCGGCAGATGCGATCGAATTGGTAGAAGTAGATTATGAGCCATTACCAGTTCTAGTAGACCCATTCAAGGCAATGGAACCAGATGCACCTCTTCTAAGAGAAGATATTAAAGACAAAATGACTGGTGCACACGGTGCGAGGAAACATCACAACCATATATTCAGATGGGAAATAGGTGATAAGGAAGGAACTGATGCTACCTTCGCCAAAGCTGAAGTTGTGTCAAAAGATATGTTTACCTATCATCGGGTTCATCCGAGCCCACTGGAAACGTGTCAATGTGTTGCATCTATGGACAAGATCAAGGGTGAACTGACGTTGTGGGGCACATTTCAGGCTCCCCATGTCATTAGAACAGTAGTGTCATTGATCAGCGGTTTGCCAGAGCATAAAATCCACGTCATTGCACCTGACATAGGGGGAGGATTTGGAAACAAGGTGGGAGCTTATTCCGGGTACGTCTGTGCTGTGGTTGCCTCCATCGTGCTGGGAGTACCCGTTAAGTGGGTCGAAGATCGAATGGAGAACCTAAGCACTACATCATTTGCACGTGACTACCACATGACTACAGAACTCGCAGCTACAAAGGATGGAAAGATTCTTGCAATGCGCTGTCACGTCTTGGCTGATCACGGAGCTTTCGATGCCTGTGCTGATCCATCTAAATGGCCTGCTGGGTTTATGAACATATGTACAGGAAGCTATGACATGCCAGTTGCACATTTGGCCGTGGATGGTGTCTATACTAACAAAGCATCCGGCGGAGTAGCTTATAGGTGCTCATTCCGAGTTACAGAAGCTGTTTATGCCATTGAGAGGGCTATTGAGACTCTGGCTCAGCGGCTCGAGATGGATTCAGCTGATCTAAGAATAAAGAACTTTATACAACCTGAGCAGTTCCCTTATATGGCTCCTCTTGGCTGGGAGTACGACAGCGGAAATTATCCATTAGCGATGAAGAAAGCTATGGATACTGTTGGTTATCATCAACTTCGTGCTGAACAGAAAGCCAAACAAGAAGCATTTAAGCGGGGCGAGACACGCGAGATTATGGGAATTGGTATCTCGTTTTTCACCGAGATTGTTGGCGCCGGGCCGTCTAAGAATTGTGATATTCTCGGAGTTTCTATGTTTGATAGTGCAGAAATTCGTATTCATCCAACCGGTTCAGTGATTGCTAGAATGGGCACTAAGAGCCAGGGCCAGGGGCACGAGACTACTTACGCTCAAATCATAGCAACCGAACTCGGTATTCCCGCTGACGACATTATGATCGAAGAAGGGAATACCGATACTGCCCCTTATGGGCTTGGAACTTACGGAAGTCGCTCGACACCCACGGCTGGTGCTGCAACCGCTGTGGCCGCTCGTAAAATAAAAGCCAAGGCTCAAATGATTGCAGCACACATGCTCGAAGTGCATGAGGGAGATTTGGAATGGGACGTGGACAGATTTAGGGTTAAAGGTCTTCCGGAAAAATTCAAGACTATGAAGGAACTCGCATGGGCATCCTACAATAGTCCACCACCCAATCTTGAGCCTGGGCTCGAGGCTGTGAACTATTACGACCCTCCTAATATGACTTATCCTTTTGGTGCCTATTTTTGCATTATGGA
TATAGATGTGGATACTGGCGTCGCCAAAACCAGGAGGTTCTATGCATTAGACGATTGCGGAACAAGAATCAACCCGATGATTATAGAAGGGCAAGTTCATGGTGGTTTGACAGAGGCCTTCGCAGTAGCTATGGGGCAGGAGATCCGATACGACGAGCAAGGAAATGTGCTTGGAGCATCTTTTATGGACTTCTTCTTGCCAACGGCCGTCGAAACACCAAAGTGGGAGACAGATTACACAGTTACTCCATCTCCACATCATCCTATAGGAGCCAAAGGCGTTGGTGAAAGTCCTCATGTTGGCGGTGTGCCTTGCTTTTCAAATGCGGTTAATGATGCTTACGCATTTTTAAACGCAGGCCACATCCAAATGCCTCATGATGCATGGAGACTATGGAAGGTAGGAGAGCAACTTGGACTTCACGTCTAA
  • coxM gene (codon optimized):
ATGATACCTGGATCATTTGATTATCATAGACCAAAATCCATTGCAGACGCAGTTGCTCTTCTTACGAAATTAGGGGAGGATGCTAGACCTTTGGCCGGAGGCCACAGCCTAATTCCTATTATGAAGACCAGATTAGCTACACCAGAACATTTGGTTGATCTCAGGGATATTGGAGATTTAGTCGGAATTAGGGAGGAGGGTACGGACGTCGTCATCGGGGCAATGACAACTCAGCATGCGCTTATAGGTTCAGATTTCTTGGCAGCAAAATTGCCAATTATTCGCGAGACAAGCCTGTTGATAGCAGATCCACAAATAAGGTACATGGGAACCATTGGCGGCAATGCCGCTAACGGAGATCCTGGAAACGATATGCCGGCCCTCATGCAGTGCTTGGGTGCGGCTTACGAACTCACTGGCCCTGAAGGTGCTCGTATAGTTGCTGCACGAGATTACTATCAAGGGGCTTATTTCACTGCTATTGAGCCCGGTGAACTTCTTACAGCAATCAGAATCCCCGTGCCACCCACTGGACACGGGTACGCTTACGAAAAACTGAAGCGGAAAATTGGCGACTATGCCACCGCCGCGGCAGCTGTAGTACTAACAATGAGTGGTGGAAAATGTGTGACTGCATCGATCGGTCTAACTAATGTTGCGAACACACCACTTTGGGCAGAAGAGGCCGGAAAGGTGTTGGTTGGTACTGCTCTCGACAAACCTGCTTTAGACAAGGCTGTAGCTCTGGCTGAGGCTATCACAGCTCCGGCATCTGATGGTCGCGGGCCAGCAGAATATCGAACCAAGATGGCTGGTGTTATGCTTCGTAGGGCAGTTGAAAGAGCAAAGGCCAGAGCCAAGAATTAA
  • coxS gene (codon optimized):
ATGGCGAAAGCTCACATTGAACTCACGATCAACGGACATCCAGTGGAGGCATTGGTTGAACCTCGGACTTTACTAATTCACTTCATTAGAGAGCAACAGAACCTTACCGGCGCACATATCGGATGCGACACTTCACACTGCGGGGCTTGTACTGTTGATCTCGATGGTATGAGCGTGAAGAGCTGTACAATGTTTGCTGTCCAAGCTAATGGAGCTTCAATCACCACCATTGAAGGAATGGCAGCACCGGATGGTACACTGAGTGCTCTGCAAGAAGGGTTTAGGATGATGCATGGTTTGCAATGCGGTTACTGTACTCCAGGGATGATCATGCGATCCCATAGATTGCTTCAAGAGAATCCAAGCCCCACAGAAGCGGAAATAAGGTTCGGAATTGGTGGAAATCTTTGCCGCTGTACAGGCTACCAGAACATTGTTAAAGCAATACAGTATGCCGCCGCTAAGATAAATGGCGTACCTTTTGAGGAGGCCGCAGAATAA

Back-Translation and Codon Optimization of Engineered CTP Sequences

After designing and validating the engineered chloroplast transit peptides (CTPs) at the amino acid level, the next step was to convert these protein sequences into DNA sequences that are fully compatible with the plant expression system. This process ensures that the “targeting signals” (CTPs) are translated efficiently in Nicotiana tabacum, just like the CODH subunits.

Since these CTPs are fused directly to the N-terminus of the CODH proteins, it is essential that they follow the same genetic design rules as the rest of the system to guarantee consistent expression and proper chloroplast targeting.

Back-Translation Strategy

The engineered CTP amino acid sequences (RbcS, Fer2, and RecA), including the modified junction motifs (VNA–AM, VTA–AM, and TVY–AA), were back-translated into DNA sequences using the Benchling Codon Optimization tool.

This step converted the peptide sequences into nucleotide sequences optimized for expression in Nicotiana tabacum, ensuring compatibility with the plant’s codon usage preferences and translation machinery.

Codon Optimization Consistency

The same optimization framework used for the seven CODH genes was applied to the CTP sequences to maintain full compatibility and expression uniformity across the entire multigene construct. This guarantees that all components of the system follow the same expression logic within the plant cell.

Key Adjustment: Hairpin Structure Control

A specific adjustment was introduced during this step due to the short length of CTP sequences. The standard secondary structure analysis settings were not optimal for short peptide-encoding regions, which can lead to inaccurate prediction of stable RNA hairpins near the translation start site.

To address this, the hairpin analysis window was reduced to 100 to improve sensitivity for short sequences and to ensure that no stable secondary structures form at the 5’ region that could interfere with ribosome binding or early translation.
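As a rough illustration of what a windowed hairpin check does, the Python sketch below searches for perfect inverted repeats whose two arms fit within a given window. It is not Benchling's algorithm. The function names, the minimum stem and loop lengths, and the reading of the window as the maximum span of a hairpin are assumptions made for illustration, and a thermodynamic folding tool would be needed to estimate actual stability.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_perfect_hairpins(seq, window=100, min_stem=6, min_loop=3):
    """Return (left_start, right_start, stem_len) for candidate hairpins.

    A candidate is a stretch of `min_stem` bases followed, after a loop of
    at least `min_loop` bases, by its exact reverse complement, with the
    whole structure spanning no more than `window` bases.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * min_stem - min_loop + 1):
        target = revcomp(seq[i:i + min_stem])
        first_j = i + min_stem + min_loop        # leave room for the loop
        last_j = min(n, i + window) - min_stem   # right arm ends inside window
        for j in range(first_j, last_j + 1):
            if seq[j:j + min_stem] == target:
                hits.append((i, j, min_stem))
    return hits
```

In this sketch a smaller window restricts the search to local structures, which is the relevant case for a short transit peptide coding region near the translation start site.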

The following are the final codon-optimized CTP sequences generated in this step:

  • RbcS CTP Sequence (engineered and codon optimized):
ATGGCCTCATCAATGCTCAGTAGCGCCACAATGGTGGCAAGTCCTGCTCAAGCTACAATGGTCGCTCCCTTTAATGGTCTGAAGTCGTCCGCAGCATTCCCAGCAACTAGAAAAGCTAATAATGACATAACGAGCATTACCAGCAACGGAGGCAGGGTAAACGCTGCG
  • Fer2 CTP Sequence (engineered and codon optimized):
ATGGCTAGCACCGCACTGAGCTCAGCCATTGTGGGAACTTCCTTCATCCGGAGAAGTCCTGCGCCCATATCTCTACGATCACTCCCATCGGCAAACACACAATCTCTTTTTGGGTTGAAGAGTGGAACGGCAAGGGGTGGCAGAGTCACAGCTGCT
  • RecA CTP Sequence (engineered and codon optimized):
ATGGACTCTCAACTTGTATTAAGCCTGAAGTTGAACCCCTCTTTCACACCACTTAGTCCTTTGTTTCCGTTTACTCCATGTTCCAGTTTCTCCCCATCGCTAAGGTTTTCAAGCTGCTACTCACGAAGACTCTATTCACCTGTCACCGTGTACGCAGCT

Objective:

Codon optimization is a fundamental step in synthetic biology when expressing genes across different organisms. Although the genetic code is nearly universal, meaning that most organisms use the same codons to encode the same amino acids, the frequency at which specific codons are used varies between species. This phenomenon is known as codon usage bias.

Each organism has evolved to preferentially use certain codons over others, largely reflecting the abundance of corresponding transfer RNAs (tRNAs). As a result, a gene originating from one organism may be inefficiently translated when introduced into another if its codon usage does not match the host’s preferences.

In this project, the seven genes encoding the Carbon Monoxide Dehydrogenase (CODH) system originate from a bacterium and are being expressed in a plant (Nicotiana tabacum). Without codon optimization, several issues can arise:

  • Reduced translation efficiency due to rare codons
  • Ribosome stalling or premature termination
  • Lower protein yield or misfolding
  • Overall failure of the multi-subunit complex to assemble correctly

Because the CODH system depends on the coordinated expression of multiple subunits and maturation proteins, balanced and efficient expression of each gene is essential. Even a single poorly expressed component could compromise the functionality of the entire enzyme complex.

Therefore, codon optimization is not just a technical adjustment but a critical requirement for functional expression. In this step, each gene sequence is redesigned to match the codon usage preferences of Nicotiana tabacum, while preserving the exact amino acid sequence of the encoded proteins. Additional considerations, such as avoiding mRNA secondary structures, eliminating cryptic splice sites, and maintaining appropriate GC content, are also taken into account.


Sources: