Part 3: DNA Design Challenge

3.1 Choose Your Protein

Protein chosen: Superfolder Green Fluorescent Protein (sfGFP)

Why: sfGFP is a robust, rapidly maturing fluorescent protein derived from Aequorea victoria GFP (Pédelacq et al., 2005). It is widely used in synthetic biology as a reporter: when expressed in cells, it fluoresces bright green under blue/UV light, enabling real-time visualization of gene expression, protein localization, and cell tracking. Its “superfolder” mutations improve folding efficiency in diverse hosts (including E. coli), making it well suited to expression experiments. It also connects directly to Part 4, where we build an expression cassette to make E. coli glow green.

Source: FPbase — Superfolder GFP | UniProt | GenBank: ASL68970

Protein sequence (amino acids):

MSKGEELFTGVVPILVELDGDVNGHKFSVRGEGEGDATNGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNFNSHNVYITADKQKNGIKANFKIRHNVEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSVLSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

(238 amino acids, ~26.8 kDa)

3.2 Reverse Translate: Protein → DNA

Running the Central Dogma in reverse: given a protein sequence, we infer one possible DNA sequence that could encode it. Because the genetic code is degenerate (most amino acids are encoded by more than one codon), many different DNA sequences produce the same protein. A simple reverse translation picks one valid codon per amino acid; here, a single commonly used E. coli codon is chosen for each residue.

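Tool used: Reverse translation with E. coli codon preferences (e.g., EMBOSS Backtranseq or the Sequence Manipulation Suite Reverse Translate tool; it can also be done manually with a codon usage table). Note that ExPASy Translate works in the forward direction (DNA → protein), so it is useful for checking the result rather than for producing it.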
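As a rough illustration of the reverse-translation step, here is a minimal Python sketch. The one-codon-per-amino-acid table is an assumption made for this example: it mirrors the codon choices that appear in the reverse-translated sequence in this section and is not an authoritative E. coli codon usage table. The function name `reverse_translate` is likewise hypothetical.

```python
# Illustrative one-codon-per-amino-acid table (an assumption for this sketch,
# not an authoritative E. coli codon usage table).
PREFERRED_CODON = {
    "A": "GCT", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGT", "H": "CAC", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "TCA", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def reverse_translate(protein: str) -> str:
    """Return one DNA sequence that encodes `protein` (no stop codon added)."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"Unknown amino acid: {err.args[0]!r}") from None


fragment = "MSKGEELFTG"  # first 10 residues of the sfGFP sequence
dna = reverse_translate(fragment)
print(dna, len(dna))  # 10 residues give 30 nt
```

Because the table holds exactly one codon per residue, the output is deterministic; swapping any entry for a synonymous codon gives a different DNA sequence that still encodes the same protein.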

Reverse-translated DNA sequence (one possible encoding):

ATGTCAAAAGGTGAAGAACTGTTTACCGGTGTGGTGCCGATTCTGGTGGAACTGGATGGTGATGTGAACGGTCACAAATTTTCAGTGCGTGGTGAAGGTGAAGGTGATGCTACCAACGGTAAACTGACCCTGAAATTTATTTGCACCACCGGTAAACTGCCGGTGCCGTGGCCGACCCTGGTGACCACCCTGACCTACGGTGTGCAGTGCTTTTCACGTTACCCGGATCACATGAAACGTCACGATTTTTTTAAATCAGCTATGCCGGAAGGTTACGTGCAGGAACGTACCATTTCATTTAAAGATGATGGTACCTACAAAACCCGTGCTGAAGTGAAATTTGAAGGTGATACCCTGGTGAACCGTATTGAACTGAAAGGTATTGATTTTAAAGAAGATGGTAACATTCTGGGTCACAAACTGGAATACAACTTTAACTCACACAACGTGTACATTACCGCTGATAAACAGAAAAACGGTATTAAAGCTAACTTTAAAATTCGTCACAACGTGGAAGATGGTTCAGTGCAGCTGGCTGATCACTACCAGCAGAACACCCCGATTGGTGATGGTCCGGTGCTGCTGCCGGATAACCACTACCTGTCAACCCAGTCAGTGCTGTCAAAAGATCCGAACGAAAAACGTGATCACATGGTGCTGCTGGAATTTGTGACCGCTGCTGGTATTACCCACGGTATGGATGAACTGTACAAA

(714 bp = 238 codons × 3 nt; no stop codon included)
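One way to sanity-check a reverse-translated sequence is to translate it forward again and compare it with the starting protein. The sketch below builds the standard genetic code table and translates a DNA string codon by codon; `translate` is a hypothetical helper written for this example, and only short fragments of the sequences are used to keep it readable.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acids ('*' = stop)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("Sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First 30 nt of the reverse-translated sequence.
print(translate("ATGTCAAAAGGTGAAGAACTGTTTACCGGT"))
# Expected coding length for a 238-residue protein, without a stop codon.
print(238 * 3)
```

Running `translate` over the full 714 bp sequence and comparing the result with the protein sequence in 3.1 would confirm that no residue was altered.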

3.3 Codon Optimization

Why optimize codon usage? Different organisms prefer different codons for the same amino acid, partly reflecting tRNA abundance. Runs of rare codons can slow translation, cause ribosome stalling, and reduce protein yield. Codon optimization replaces codons with synonymous ones that are used more frequently in the target organism, which typically improves expression levels and can also affect co-translational folding. It also lets us remove restriction enzyme recognition sites (e.g., BsaI, BsmBI, BbsI) that would interfere with Golden Gate or other assembly methods.
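The restriction-site constraint can be checked with a short script. The sketch below scans both strands for the recognition sequences of BsaI (GGTCTC), BsmBI (CGTCTC), and BbsI (GAAGAC); the helper names `reverse_complement` and `find_sites` are hypothetical, and this is a plain string search rather than a replacement for the checks a synthesis vendor runs.

```python
# Recognition sequences of the three Type IIS enzymes named in the text.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, sites: dict = TYPE_IIS_SITES) -> dict:
    """Map enzyme name -> sorted 0-based positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        # These sites are not palindromic, so the bottom strand is searched too.
        for motif in {site, reverse_complement(site)}:
            start = seq.find(motif)
            while start != -1:
                positions.append(start)
                start = seq.find(motif, start + 1)
        if positions:
            hits[name] = sorted(positions)
    return hits


print(find_sites("AAAGGTCTCAAA"))  # toy sequence containing a BsaI site
print(find_sites("ATGAGCAAAGGAGAAGAACTTTTCACTGGA"))  # first 30 nt of the optimized gene
```

An empty dictionary means none of the three sites was found in the sequence that was scanned.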

Organism chosen: Escherichia coli (K-12)

Why E. coli? It is the standard workhorse for recombinant protein expression: well-characterized genetics, fast growth, simple culture, and widely available vectors and protocols. The HTGAA Part 4 exercise uses E. coli for the sfGFP expression cassette, so optimizing for E. coli keeps the workflow consistent.

Tool used: Twist Bioscience Codon Optimization Tool, set to avoid the Type IIS sites BsaI, BsmBI, and BbsI as recommended.

Codon-optimized DNA sequence (for E. coli):

ATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

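(714 bp as shown, encoding 238 amino acids with no stop codon; 717 bp once a stop codon such as TAA is appended. Optimized for E. coli expression and free of BsaI, BsmBI, and BbsI sites, although other restriction sites such as XhoI (CTCGAG) remain. The same sequence is used in the Part 4 expression cassette.)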
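To see what optimization actually changed, the two encodings can be compared codon by codon. The sketch below is a minimal example using only the first 30 nt of each sequence; `split_codons`, `differing_codons`, and `gc_content` are hypothetical helper names introduced for illustration.

```python
def split_codons(dna: str) -> list:
    """Split a coding sequence into a list of 3-nt codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("Sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def differing_codons(seq_a: str, seq_b: str) -> list:
    """Return 1-based codon positions at which two encodings differ."""
    codons_a, codons_b = split_codons(seq_a), split_codons(seq_b)
    if len(codons_a) != len(codons_b):
        raise ValueError("Sequences must contain the same number of codons")
    return [i + 1 for i, (a, b) in enumerate(zip(codons_a, codons_b)) if a != b]


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in the sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


reverse_translated = "ATGTCAAAAGGTGAAGAACTGTTTACCGGT"  # first 30 nt, section 3.2
optimized = "ATGAGCAAAGGAGAAGAACTTTTCACTGGA"           # first 30 nt, section 3.3

print(differing_codons(reverse_translated, optimized))
print(gc_content(reverse_translated), gc_content(optimized))
```

Both fragments encode the same ten residues, so every position reported here is a synonymous codon swap.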

3.4 You Have a Sequence! Now What?

Technologies to produce sfGFP from this DNA:

Cell-dependent (recombinant expression in E. coli):
- Clone the codon-optimized gene into a vector (e.g., pTwist Amp High Copy) together with a promoter (e.g., the constitutive promoter BBa_J23106; an inducible promoter is an alternative), an RBS (e.g., BBa_B0034), and a terminator (e.g., BBa_B0015).
- Transform the plasmid into E. coli (e.g., DH5α, BL21).
- Grow cells; the host RNA polymerase transcribes the DNA into mRNA, and ribosomes translate the mRNA into sfGFP.
- The protein folds and forms its chromophore; cells fluoresce green under blue light (~488 nm excitation, ~510 nm emission).

Cell-free (in vitro transcription–translation):
- Use a cell-free system (e.g., E. coli lysate, PURE system) with the DNA template.
- Add NTPs, amino acids, and energy sources; the system transcribes and translates the gene without living cells.
- Useful for rapid prototyping, toxic proteins, or when cell growth is impractical.

DNA synthesis (Twist, IDT, etc.):
- Order the gene from a synthesis provider, either as a clonal gene (delivered in a plasmid) or as a linear fragment.
- Use it directly for cloning or cell-free expression, avoiding PCR or cloning from natural sources.

Flow: DNA → (RNA polymerase) → mRNA → (ribosomes + tRNAs + amino acids) → polypeptide → (folding + chromophore formation) → fluorescent sfGFP.
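The cloning step of the cell-dependent route boils down to placing the coding sequence between regulatory parts in the right order. The sketch below is only an illustration of that ordering: `assemble_cassette` and its `spacer` parameter are hypothetical, and the part sequences in the usage line are toy placeholders, not the actual BBa_J23106, BBa_B0034, or BBa_B0015 sequences. It also appends a stop codon when the coding sequence lacks one, which translation termination requires.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_cassette(promoter: str, rbs: str, cds: str, terminator: str,
                      spacer: str = "") -> str:
    """Concatenate promoter + RBS + spacer + CDS (+ stop codon) + terminator."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("CDS must start with the ATG start codon")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        cds += "TAA"  # add a stop codon if the CDS does not already end with one
    upstream = "".join(part.upper() for part in (promoter, rbs, spacer))
    return upstream + cds + terminator.upper()


# Toy placeholder parts, for illustration only.
cassette = assemble_cassette(promoter="TTGACA", rbs="AGGAGG",
                             cds="ATGAAA", terminator="GGGCCC")
print(cassette)
```

In a real design, the spacing between the RBS and the start codon and the exact part sequences would come from the parts registry or the vendor's documentation.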

3.5 [Optional] How Does It Work in Nature?

Alignment of DNA, RNA, and protein: In the Central Dogma, DNA is transcribed to RNA (T→U), and RNA is translated to protein (3 nt → 1 aa). Tools like Benchling or Ronan’s gel art site can visualize this alignment.
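A plain-text version of this alignment is easy to generate. The sketch below takes the coding strand, writes the mRNA by replacing T with U, and places each amino acid under the middle base of its codon; `align` is a hypothetical helper, and any trailing bases that do not fill a complete codon are ignored.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def align(dna: str) -> tuple:
    """Return (DNA, mRNA, protein) lines aligned codon by codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop any incomplete trailing codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    dna_line = " ".join(codons)
    rna_line = dna_line.replace("T", "U")  # mRNA matches the coding strand, U for T
    protein_line = " ".join(CODON_TABLE[c].center(3) for c in codons)
    return dna_line, rna_line, protein_line


for line in align("ATGTCAAAAGGTGAAGAACTG"):  # first 7 codons from section 3.2
    print(line)
```

Each column of three bases lines up with the single residue it encodes, which makes the 3 nt → 1 aa relationship visible at a glance.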

Single gene → multiple proteins: Alternative splicing (eukaryotes) or alternative start codons/ribosomal frameshifting can produce multiple proteins from one gene. sfGFP is a single open reading frame, but in general, one gene can yield multiple isoforms through these mechanisms.