DNA Design Luciferase


FIREFLY LUCIFERASE

Bioluminescent art is based on the production of visible light by living organisms. This light is produced through the oxidation of luciferin, a reaction catalyzed by an enzyme called luciferase. For this week’s assignment, we will focus on the protein-coding sequence for this enzyme, which was first identified in the firefly Photinus pyralis [1].

[1] De Wet J.R. et al. Firefly luciferase gene: structure and expression in mammalian cells. Mol Cell Biol (1987). https://pmc.ncbi.nlm.nih.gov/articles/PMC365129/

1. Protein Sequence

Sources

NCBI database search: https://www.ncbi.nlm.nih.gov/protein/BAF48396.1

UniProt database search: https://www.uniprot.org/uniprotkb/P08659/entry#sequences

550 amino acids

  1 medaknikkg papfypledg tageqlhkam kryalvpgti aftdahievn ityaeyfems
 61 vrlaeamkry glntnhrivv csenslqffm pvlgalfigv avapandiyn erellnsmni
121 sqptvvfvsk kglqkilnvq kklpiiqkii imdsktdyqg fqsmytfvts hlppgfneyd
181 fvpesfdrdk tialimnssg stglpkgval phrtacvrfs hardpifgnq iipdtailsv
241 vpfhhgfgmf ttlgylicgf rvvlmyrfee elflrslqdy kiqsallvpt lfsffakstl
301 idkydlsnlh eiasggapls kevgeavakr fhlpgirqgy gltettsail itpegddkpg
361 avgkvvpffe akvvdldtgk tlgvnqrgel cvrgpmimsg yvnnpeatna lidkdgwlhs
421 gdlaywdede hffivgrlks likykgyqva paelesillq hpnifdagva glpdddagel
481 paavvvlehg ktmtekeivd yvasqvttak klrggvvfvd evpkgltgkr darkireili
541 kakkggkskl
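The listing above keeps the GenBank layout, in which position numbers and spaces are mixed into the sequence. Before the sequence can be pasted into another tool, those have to be stripped. A minimal sketch, assuming the listing is available as plain text (the helper name `clean_genbank_sequence` is hypothetical, not part of any library):

```python
import re


def clean_genbank_sequence(text: str) -> str:
    """Strip position numbers and whitespace from a GenBank-style sequence listing."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


# First and last rows of the listing above, copied with their position numbers.
rows = """
  1 medaknikkg papfypledg tageqlhkam kryalvpgti aftdahievn ityaeyfems
541 kakkggkskl
"""

residues = clean_genbank_sequence(rows)
print(len(residues))   # 70 (60 residues from the first row + 10 from the last)
print(residues[:10])   # MEDAKNIKKG
```

Run on the complete listing, `len(...)` should return the 550 residues stated above.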

2. Reverse Translate

Source

NCBI database search for the P. pyralis (firefly) luciferase gene: https://www.ncbi.nlm.nih.gov/nuccore/M15077

Firefly Luciferase DNA sequence

   1 ctgcagaaat aactaggtac taagcccgtt tgtgaaaagt ggccaaaccc ataaatttgg
  61 caattacaat aaagaagcta aaattgtggt caaactcaca aacattttta ttatatacat
 121 tttagtagct gatgcttata aaagcaatat ttaaatcgta aacaacaaat aaaataaaat
 181 ttaaacgatg tgattaagag ccaaaggtcc tctagaaaaa ggtatttaag caacggaatt
 241 cctttgtgtt acattcttga atgtcgctcg cagtgacatt agcattccgg tactgttggt
 301 aaaatggaag acgccaaaaa cataaagaaa ggcccggcgc cattctatcc tctagaggat
 361 ggaaccgctg gagagcaact gcataaggct atgaagagat acgccctggt tcctggaaca
 421 attgcttttg tgagtatttc tgtctgattt ctttcgagtt aacgaaatgt tcttatgttt
 481 ctttagacag atgcacatat cgaggtgaac atcacgtacg cggaatactt cgaaatgtcc
 541 gttcggttgg cagaagctat gaaacgatat gggctgaata caaatcacag aatcgtcgta
 601 tgcagtgaaa actctcttca attctttatg ccggtgttgg gcgcgttatt tatcggagtt
 661 gcagttgcgc ccgcgaacga catttataat gaacgtaagc accctcgcca tcagaccaaa
 721 gggaatgacg tatttaattt ttaaggtgaa ttgctcaaca gtatgaacat ttcgcagcct
 781 accgtagtgt ttgtttccaa aaaggggttg caaaaaattt tgaacgtgca aaaaaaatta
 841 ccaataatcc agaaaattat tatcatggat tctaaaacgg attaccaggg atttcagtcg
 901 atgtacacgt tcgtcacatc tcatctacct cccggtttta atgaatacga ttttgtacca
 961 gagtcctttg atcgtgacaa aacaattgca ctgataatga attcctctgg atctactggg
1021 ttacctaagg gtgtggccct tccgcataga actgcctgcg tcagattctc gcatgccagg
1081 tatgtcgtat aacaagagat taagtaatgt tgctacacac attgtagaga tcctattttt
1141 ggcaatcaaa tcattccgga tactgcgatt ttaagtgttg ttccattcca tcacggtttt
1201 ggaatgttta ctacactcgg atatttgata tgtggatttc gagtcgtctt aatgtataga
1261 tttgaagaag agctgttttt acgatccctt caggattaca aaattcaaag tgcgttgcta
1321 gtaccaaccc tattttcatt cttcgccaaa agcactctga ttgacaaata cgatttatct
1381 aatttacacg aaattgcttc tgggggcgca cctctttcga aagaagtcgg ggaagcggtt
1441 gcaaaacggt gagttaagcg cattgctagt atttcaaggc tctaaaacgg cgcgtagctt
1501 ccatcttcca gggatacgac aaggatatgg gctcactgag actacatcag ctattctgat
1561 tacacccgag ggggatgata aaccgggcgc ggtcggtaaa gttgttccat tttttgaagc
1621 gaaggttgtg gatctggata ccgggaaaac gctgggcgtt aatcagagag gcgaattatg
1681 tgtcagagga cctatgatta tgtccggtta tgtaaacaat ccggaagcga ccaacgcctt
1741 gattgacaag gatggatggc tacattctgg agacatagct tactgggacg aagacgaaca
1801 cttcttcata gttgaccgct tgaagtcttt aattaaatac aaaggatatc aggtaatgaa
1861 gatttttaca tgcacacacg ctacaatacc tgtaggtggc ccccgctgaa ttggaatcga
1921 tattgttaca acaccccaac atcttcgacg cgggcgtggc aggtcttccc gacgatgacg
1981 ccggtgaact tcccgccgcc gttgttgttt tggagcacgg aaagacgatg acggaaaaag
2041 agatcgtgga ttacgtcgcc agtaaatgaa ttcgttttac gttactcgta ctacaattct
2101 tttcataggt caagtaacaa ccgcgaaaaa gttgcgcgga ggagttgtgt ttgtggacga
2161 agtaccgaaa ggtcttaccg gaaaactcga cgcaagaaaa atcagagaga tcctcataaa
2221 ggccaagaag ggcggaaagt ccaaattgta aaatgtaact gtattcagcg atgacgaaat
2281 tcttagctat tgtaatatta tatgcaaatt gatgaatggt aattttgtaa ttgtgggtca
2341 ctgtactatt ttaacgaata ataaaatcag gtataggtaa ctaaaaa

3. Codon Optimization

According to the genetic code, there are fewer amino acids (20) than possible codons (64) (see chart below, image credit cdn.prod.website-files.com). Thus, in theory, several synonymous codons can encode the same amino acid. In practice, however, spatial configuration and kinetic factors affect the translation process, so the choice of codon matters. For instance, organisms differ in which synonymous codons they use most often, and codons that are rare in the host organism are read by scarce tRNAs, which can slow down or stall translation. Codon optimization is therefore an important step when designing a nucleotide sequence for expression in a given host.
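The redundancy of the genetic code can be made concrete by building the standard codon table and grouping codons by the amino acid they encode. This is only an illustrative sketch: the table is written in the compact TCAG ordering used for the standard genetic code, and the variable names are arbitrary.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the 64 codons by the amino acid (or stop signal) they encode.
synonymous = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonymous[aa].append(codon)

print(len(CODON_TABLE))        # 64 codons
print(len(synonymous) - 1)     # 20 amino acids (the '*' entry holds the stop codons)
print(synonymous["L"])         # ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG']
print(synonymous["M"])         # ['ATG']
print(synonymous["*"])         # ['TAA', 'TAG', 'TGA']
```

Leucine can be written six different ways while methionine has a single codon, which is why an optimizer has many synonymous choices at most positions.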

Firefly Luciferase Optimized DNA sequence

Twist Bioscience add-ons:

Flank 5’: AGTACGCGTCTACGG

Flank 3’: TCCGATGACGTTAGC

ATGGAAGATGCAAAAAATATTAAAAAAGGCCCGGCGCCGTTTTATCCGCTGGAAGATGGCACAGCCGGTGAGCAGCTGCACAAAGCGATGAAGCGCTATGCGCTGGTTCCGGGCACCATTGCCTTCACCGATGCGCACATCGAAGTCAACATCACCTATGCTGAGTACTTTGAAATGTCTGTGCGTCTGGCGGAAGCGATGAAACGCTATGGTCTGAACACCAACCACCGTATTGTGGTCTGCTCTGAAAACAGCCTGCAGTTCTTCATGCCGGTACTGGGTGCGCTGTTTATCGGTGTTGCGGTAGCGCCGGCGAACGACATCTATAATGAGCGTGAACTGCTGAACTCCATGAACATCAGCCAGCCAACCGTTGTTTTTGTCAGCAAAAAAGGCCTGCAGAAAATCCTCAACGTTCAGAAAAAACTGCCGATCATTCAGAAAATCATCATCATGGACAGCAAAACCGATTATCAGGGTTTCCAGAGCATGTACACCTTTGTCACCAGCCACCTGCCGCCGGGTTTCAACGAATATGATTTTGTTCCGGAGAGCTTTGACCGTGATAAAACCATTGCGCTGATCATGAACAGCTCTGGCTCCACTGGTCTGCCGAAAGGTGTAGCGCTGCCGCACCGCACTGCCTGTGTGCGTTTCAGCCATGCGCGTGATCCGATTTTCGGTAACCAGATCATTCCGGACACCGCAATTCTGTCAGTGGTGCCGTTCCATCACGGTTTTGGTATGTTTACCACCCTGGGCTACCTGATCTGCGGTTTCCGCGTAGTGCTGATGTACCGCTTTGAAGAAGAGCTGTTCCTGCGCAGCCTGCAGGACTACAAAATCCAGTCTGCGCTGCTGGTACCGACCCTGTTCAGCTTCTTTGCCAAATCCACCCTGATCGATAAATATGACCTGAGTAACCTGCACGAGATTGCCTCTGGTGGTGCACCGCTGAGCAAAGAAGTTGGTGAAGCGGTGGCGAAACGTTTCCATCTGCCGGGTATCCGTCAGGGTTATGGTCTGACTGAAACCACCTCTGCGATTCTGATCACCCCGGAAGGTGATGACAAACCGGGTGCGGTGGGCAAAGTGGTACCGTTCTTCGAAGCGAAAGTGGTGGATCTCGACACCGGTAAAACGCTGGGTGTGAACCAGCGTGGTGAACTGTGTGTACGTGGCCCGATGATCATGTCTGGTTATGTCAACAACCCGGAAGCGACCAATGCGCTGATCGACAAAGATGGTTGGCTGCACAGCGGCGACATCGCCTATTGGGATGAAGATGAGCACTTCTTTATCGTTGACCGCCTGAAAAGCCTGATCAAATATAAAGGCTATCAGGTAGCACCGGCGGAACTGGAGTCGATCCTGCTGCAGCATCCGAACATCTTCGATGCCGGCGTGGCGGGTCTGCCGGATGATGATGCAGGTGAGCTGCCGGCAGCGGTGGTGGTGCTGGAGCACGGTAAAACCATGACCGAGAAAGAGATTGTTGATTATGTGGCCAGCCAGGTGACCACTGCGAAGAAACTGCGCGGTGGCGTGGTGTTTGTTGATGAAGTGCCGAAAGGTCTGACCGGTAAACTGGATGCGCGTAAAATCCGCGAGATTCTGATTAAAGCGAAAAAAGGCGGTAAAAGCAAACTG

Analysis of Twist’s optimizations by Claude: Out of 550 codons, 526 were changed (95.6%). Two restriction sites were removed: EcoRV (near codon 515) and XbaI (near codon 16; XbaI was not on the list of sites to remove). BsaI, MluI and AatII sites were not present. GC content increased from 42.8% to 52.5%: insect genomes tend to be AT-rich, whereas bacteria (E. coli) and mammalian cells prefer a slightly higher GC content. Rare codons were eliminated in favor of the codons most frequently used in E. coli.
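The kind of comparison summarized above (changed codons, GC content) can be reproduced with a few lines of code. The sketch below is an illustration only: the function names are hypothetical, and it is run on the first 19 codons of the native coding sequence (starting at the ATG in row 301 of the gene listing) and of the optimized sequence, not on the full genes.

```python
def split_codons(seq: str) -> list[str]:
    """Split a coding sequence into codons, ignoring any incomplete trailing codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def changed_codons(native: str, optimized: str) -> list[int]:
    """1-based positions at which the two coding sequences use different codons."""
    pairs = zip(split_codons(native), split_codons(optimized))
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


# First 19 codons (M E D A K N I K K G P A P F Y P L E D), spaced for readability.
NATIVE_START = "atg gaa gac gcc aaa aac ata aag aaa ggc ccg gcg cca ttc tat cct cta gag gat".replace(" ", "")
OPTIMIZED_START = "ATG GAA GAT GCA AAA AAT ATT AAA AAA GGC CCG GCG CCG TTT TAT CCG CTG GAA GAT".replace(" ", "")

changed = changed_codons(NATIVE_START, OPTIMIZED_START)
print(changed)        # [3, 4, 6, 7, 8, 13, 14, 16, 17, 18]
print(len(changed))   # 10 of the 19 codons differ in this fragment
print(round(gc_content(NATIVE_START), 1), round(gc_content(OPTIMIZED_START), 1))
```

Applied to the full native coding sequence (introns removed) and the full optimized sequence, the same functions give a way to re-check the figures reported above.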

4. From DNA Sequence to Firefly Luciferase

Firefly luciferase can be produced either by using living organisms (cell-dependent systems) or in a test tube (cell-free systems). In both cases, the production follows the two steps of the central dogma:

(1) Transcription of DNA into mRNA. RNA polymerase binds to a promoter and copies the DNA into mRNA, from the transcription start site to the terminator. In the mRNA, the nucleotide thymine (T) is replaced by uracil (U).

(2) Translation of mRNA into protein. Ribosomes bind to and read the mRNA codon by codon and, for each codon, incorporate the matching amino acid via a transfer RNA (tRNA). This forms a chain of amino acids bonded together (a polypeptide), which starts folding as the chain grows and is released when the ribosome reaches the stop codon. Depending on the protein, further maturation processes and/or association into a larger complex may occur afterwards.
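The two steps can be mimicked in a few lines. This is a deliberately simplified sketch: transcription is modeled as copying the coding strand with T replaced by U, translation reads from the first base without searching for a start codon, and the function names are illustrative. The test fragments are the first ten codons of the optimized sequence and the last five codons (including the stop codon) of the native gene listing.

```python
from itertools import product

# Standard genetic code in RNA letters, UCAG order ('*' = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching the coding strand: same sequence, T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


start = "ATG GAA GAT GCA AAA AAT ATT AAA AAA GGC".replace(" ", "")
end = "aag tcc aaa ttg taa".replace(" ", "")

print(transcribe(start)[:9])          # AUGGAAGAU
print(translate(transcribe(start)))   # MEDAKNIKKG
print(translate(transcribe(end)))     # KSKL
```

The outputs match the beginning and the end of the protein sequence in section 1.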

In cell-dependent systems, the gene of interest is first cloned into an expression vector, i.e. inserted into plasmids, which are then amplified before being transferred into host organisms (e.g. E. coli), which carry out protein synthesis. In cell-free systems, the protein is produced by adding the DNA (or mRNA) directly into a mixture containing the elements required for transcription and translation (ribosomes, enzymes, cofactors, etc.). Cell-free systems are usually used when a rapid check is needed (protein production within a few hours), whereas cell-dependent systems are preferred for higher-yield production (e.g. in the industrial sector).

References Recitation Week 02

Image source: What is the central dogma?

Image source: What is a plasmid?

Images source: Bacterial transformation & selection

5. From A Single DNA Sequence to Multiple Proteins

A. Alternative Splicing (in Eukaryotes)

A gene is made of exons (coding parts) and introns (non-coding parts). After transcription, several steps take place before translation, including the processing of the pre-mRNA. The cell’s machinery cuts out the introns and can join the exons in different combinations, for example by skipping some of them. This process, called alternative splicing, explains why a single gene can code for multiple proteins that have different shapes and functions (isoforms). This mechanism allows, for instance, the human body to create hundreds of thousands of different proteins using only around 20,000 genes.
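A toy model shows how quickly the number of possible isoforms grows. The sketch below assumes the simplest case, exon skipping, in which the first and last exons are always kept and each internal exon is either included or skipped. Real splicing is tightly regulated and only a subset of these combinations occurs; the exon names and the function are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[list[str]]:
    """All isoforms keeping the first and last exon plus any subset of internal exons, in order."""
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms


isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"])
for isoform in isoforms:
    print("-".join(isoform))
# E1-E4
# E1-E2-E4
# E1-E3-E4
# E1-E2-E3-E4
```

With n internal exons this simple model allows 2^n combinations, so a gene with 10 optional exons could in principle give 1,024 different transcripts.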

Image source and reference: All About Alternative Splicing

B. Polycistronic RNA

6. Documentation

Claude’s guidance for the optimization of the Firefly Luciferase gene (Photinus pyralis):

1. Understanding the Flank Sequences

Flanking sequences are short DNA segments added to the 5’ and 3’ ends of your optimized insert. They serve as:

  • Cloning handles for restriction enzyme digestion or Gibson Assembly
  • Reading frame aligners to ensure proper in-frame fusion with a vector
  • Spacers to protect restriction sites from incomplete digestion

Flank 5’ template “AGTACGCGTCTACGG” decoded: the buffer bases AGT protect the restriction site from incomplete digestion, ACGCGT is the MluI restriction site (cut A | CGCGT), and the linker CTACGG serves as a spacer before the ATG start codon.

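Flank 3’ template “TCCGATGACGTTAGC” decoded: the linker TCCG serves as a spacer after the stop codon, followed by an AatII restriction enzyme site (GACGT | C cut) and the buffer bases TAGC for protection. Note that the flank as written contains GACGTT, which differs by one base from the AatII recognition sequence GACGTC, and that the three parts listed add up to 14 bases for a 15-base flank, so this decoding should be double-checked before relying on AatII for cloning.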
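A decoding like the one above is easy to verify by searching each flank for the expected recognition sequence. The sketch below uses only the flank sequences and the two recognition sites quoted in this section; the helper name is illustrative.

```python
FLANK_5 = "AGTACGCGTCTACGG"
FLANK_3 = "TCCGATGACGTTAGC"

# Recognition sequences as quoted above (both are palindromic).
SITES = {"MluI": "ACGCGT", "AatII": "GACGTC"}


def site_position(seq: str, site: str) -> int:
    """0-based position of the first occurrence of a site, or -1 if it is absent."""
    return seq.upper().find(site)


print(site_position(FLANK_5, SITES["MluI"]))    # 3  (right after the AGT buffer bases)
print(site_position(FLANK_3, SITES["AatII"]))   # -1 (no exact GACGTC in this flank)
print(FLANK_3[6:12])                            # GACGTT, one base away from GACGTC
```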

2. Restriction Sites to Remove During Optimization

Internal occurrences of the EcoRV (GATATC) and BsaI (GGTCTC(N)1) restriction sites must be removed from within the luciferase coding sequence without changing the amino acid sequence, i.e. through synonymous codon substitutions.

(N)1 means that the restriction enzyme cuts outside its recognition sequence, in this case one nucleotide (of any identity) downstream.
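Because BsaI's recognition sequence is not palindromic, a scan for internal sites has to look at both strands. The sketch below is one possible way to do this, with hypothetical function names. It is run on the K-G-Y-Q codons of the native gene (which contain an EcoRV site) and on the corresponding codons of the optimized sequence, plus one made-up string that carries a BsaI site on the bottom strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions (top-strand coordinates) of a site on either strand."""
    seq = seq.upper()
    targets = {site, reverse_complement(site)}
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] in targets]


native = "aaa gga tat cag".replace(" ", "")      # K G Y Q, native codons
optimized = "AAA GGC TAT CAG".replace(" ", "")   # K G Y Q, optimized codons

print(find_sites(native, "GATATC"))         # [4]  EcoRV site spanning GGA TAT C
print(find_sites(optimized, "GATATC"))      # []   removed by the synonymous change GGA -> GGC
print(find_sites("AAGAGACCAA", "GGTCTC"))   # [2]  made-up example: BsaI site on the bottom strand
```

A single synonymous substitution (GGA to GGC, both glycine) is enough to destroy the EcoRV site while leaving the protein unchanged.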

3. DNA Regions Excluded from Optimization

Some regions should not be codon-optimized:

  • Known functional RNA elements (e.g., internal ribosome entry sites, regulatory motifs)
  • Regions with validated mutagenesis you want to preserve exactly
  • His-tags, linkers, or fusion sequences if already codon-optimized elsewhere