Week 2 HW: DNA Read, Write & Edit

Xray of DNA

Part 1: Benchling & In-silico Gel Art

See this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis” for details. Overview:

  • Make a free account at benchling.com
  • Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes:
    • EcoRI
    • HindIII
    • BamHI
    • KpnI
    • EcoRV
    • SacI
    • SalI
Virtual digest sequence LAMCG
Restriction Enzyme Digestion made with Benchling
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
  • You might find Ronan’s website a helpful tool for quickly iterating on designs!
E=m*a2
EcoRV vs. EcoRI
Single Enzymes
Pyramid Enzymes
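
The virtual digest done in Benchling can also be approximated in a few lines of Python, which helps when iterating on gel designs. This is only a minimal sketch: it assumes a linear molecule (as Lambda DNA is in this exercise), scans one strand only (all seven recognition sites are palindromic), and uses a short made-up toy sequence rather than the real Lambda genome. The names `ENZYMES` and `digest` are illustrative and are not part of Benchling.

```python
# Recognition site and cut offset (index inside the site where the top strand is cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def digest(sequence, enzyme_names):
    """Return fragment lengths (in order along a linear molecule) after digestion."""
    seq = sequence.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)  # allow overlapping matches
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy sequence (hypothetical): one EcoRI site and one EcoRV site.
toy = "AAAA" + "GAATTC" + "CCCCC" + "GATATC" + "TT"
for lane in (["EcoRI"], ["EcoRV"], ["EcoRI", "EcoRV"]):
    # Sorting largest-first mimics reading a gel lane from top to bottom.
    print(lane, sorted(digest(toy, lane), reverse=True))
```

Each list of fragment lengths corresponds to one gel lane; swapping `toy` for the imported Lambda sequence would give the band sizes to compare with the Benchling simulation.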

Part 3: DNA Design Challenge

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

[Example from our group homework, you may notice the particular format — The example below came from UniProt]

>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRPRFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLSLLEAVIRTVTTLQQLLT

THYROGLOBULIN (CANIS LUPUS FAMILIARIS) / ACTIN (CANIS LUPUS FAMILIARIS) vs. ACTIN (FUNGUS: S. CEREVISIAE)

The world of proteins is so vast that choosing a single protein has been a difficult task.

To follow the same path as the week 1 HW, let's start with thyroglobulin, a very complex and specialized protein that is key to the generation of the T3 and T4 hormones; in other words, it is the precursor protein of the thyroid hormones. Because of its complexity and specificity, I consider it a modern protein. Some interesting facts about thyroglobulin: it is very large compared to most other proteins, it functions only in the thyroid gland, it is prone to attack by the cells of the immune system when something is not working well, and its process does not tolerate errors. By comparison, actin is a simpler protein with a general role, and it has been present in all eukaryotes since early in the history of life on Earth. Actin is in charge of forming the cytoskeleton and of the motility and shape of cells, among many other functions. The interesting fact about actin is that it can tolerate errors, in contrast to thyroglobulin, which is very precise.

In the exercise below I will develop thyroglobulin for Canis lupus familiaris and also compare the actin protein in dogs vs. the actin protein in a fungus (Saccharomyces cerevisiae).

THYROGLOBULIN - CANIS LUPUS FAMILIARIS

(The UniProt record retrieved below, P15881 / ITF2_CANLF, is annotated in its own header as Transcription factor 4, gene TCF4, rather than as thyroglobulin.)

>sp|P15881|ITF2_CANLF Transcription factor 4 OS=Canis lupus familiaris OX=9615 GN=TCF4 PE=2 SV=2
MFSPPVSSGKNGPTSLASGHFTGSNVEDRSSSGSWGNGGHPSPSRNYGDGTPYDHMTSRD
LGSHDNLSPPFVNSRIQSKTERGSYSSYGRESNLQGCHQSLLGGDMDMGTPGTLSPTKPG
SQYYQYSSNNPRRRPLHSSAMEVQTKKVRKVPPGLPSSVYAPSASTADYNRDSPGYPSSK
PAASTFPSSFFMQDGHHSSDPWSSSSGMNQPGYGGMLGSSSHIPQSSSYCSLHPHERLSY
PSHSSADINSSLPPMSTFHRSGTNHYSTSSCTPPANGTDSIMANRGSGAAGSSQTGDALG
KALASIYSPDHTNNSFSSNPSTPVGSPPSLSAGTAVWSRNGGQASSSPNYEGPLHSLQSR
IEDRLERLDDAIHVLRNHAVGPSTAMPGGHGDMHGIIGPSHNGAMGGLGSGYGTGLLSAN
RHSLMVGAHREDGVALRGSHSLVPNQVPVPQLPVQSATSPDLNPPQDPYRGMPPGLQGQS
VSSGSSEIKSDDEGDENLQDTKSSEDKKLDDDKKDIKSITSNNDDEDLTPEQKAEREKER
RMANNARERLRVRDINEAFKELGRMVQLHLKSDKPQTKLLILHQAVAVILSLEQQVRERN
LNPKAACLKRREEEKVSSEPPPLSLAGPHPGMGDASNHMGQM

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

[Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]

Lysis protein DNA sequence
atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa
THYROGLOBULIN - DNA sequence

atgttctccccccccgtgtcctccggcaagaacggccccacctccctggcctccggccac
ttcaccggctccaacgtggaggaccgctcctcctccggctcctggggcaacggcggccac
ccctccccctcccgcaactacggcgacggcaccccctacgaccacatgacctcccgcgac
ctgggctcccacgacaacctgtccccccccttcgtgaactcccgcatccagtccaagacc
gagcgcggctcctactcctcctacggccgcgagtccaacctgcagggctgccaccagtcc
ctgctgggcggcgacatggacatgggcacccccggcaccctgtcccccaccaagcccggc
tcccagtactaccagtactcctccaacaacccccgccgccgccccctgcactcctccgcc
atggaggtgcagaccaagaaggtgcgcaaggtgccccccggcctgccctcctccgtgtac
gccccctccgcctccaccgccgactacaaccgcgactcccccggctacccctcctccaag
cccgccgcctccaccttcccctcctccttcttcatgcaggacggccaccactcctccgac
ccctggtcctcctcctccggcatgaaccagcccggctacggcggcatgctgggctcctcc
tcccacatcccccagtcctcctcctactgctccctgcacccccacgagcgcctgtcctac
ccctcccactcctccgccgacatcaactcctccctgccccccatgtccaccttccaccgc
tccggcaccaaccactactccacctcctcctgcaccccccccgccaacggcaccgactcc
atcatggccaaccgcggctccggcgccgccggctcctcccagaccggcgacgccctgggc
aaggccctggcctccatctactcccccgaccacaccaacaactccttctcctccaacccc
tccacccccgtgggctcccccccctccctgtccgccggcaccgccgtgtggtcccgcaac
ggcggccaggcctcctcctcccccaactacgagggccccctgcactccctgcagtcccgc
atcgaggaccgcctggagcgcctggacgacgccatccacgtgctgcgcaaccacgccgtg
ggcccctccaccgccatgcccggcggccacggcgacatgcacggcatcatcggcccctcc
cacaacggcgccatgggcggcctgggctccggctacggcaccggcctgctgtccgccaac
cgccactccctgatggtgggcgcccaccgcgaggacggcgtggccctgcgcggctcccac
tccctggtgcccaaccaggtgcccgtgccccagctgcccgtgcagtccgccacctccccc
gacctgaaccccccccaggacccctaccgcggcatgccccccggcctgcagggccagtcc
gtgtcctccggctcctccgagatcaagtccgacgacgagggcgacgagaacctgcaggac
accaagtcctccgaggacaagaagctggacgacgacaagaaggacatcaagtccatcacc
tccaacaacgacgacgaggacctgacccccgagcagaaggccgagcgcgagaaggagcgc
cgcatggccaacaacgcccgcgagcgcctgcgcgtgcgcgacatcaacgaggccttcaag
gagctgggccgcatggtgcagctgcacctgaagtccgacaagccccagaccaagctgctg
atcctgcaccaggccgtggccgtgatcctgtccctggagcagcaggtgcgcgagcgcaac
ctgaaccccaaggccgcctgcctgaagcgccgcgaggaggagaaggtgtcctccgagccc
ccccccctgtccctggccggcccccaccccggcatgggcgacgcctccaaccacatgggc
cagatg
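
Reverse translation is not unique, because most amino acids are encoded by several codons, so a tool has to pick one codon per amino acid. The sketch below shows the idea with a one-codon-per-residue lookup. The table `PREFERRED_CODON` is an assumption: it simply mirrors the codons that appear in the reverse-translated sequence above, and it is not taken from any particular tool's documentation.

```python
# One codon per amino acid (assumed table, mirroring the codons seen in the sequence above).
PREFERRED_CODON = {
    "A": "gcc", "C": "tgc", "D": "gac", "E": "gag", "F": "ttc",
    "G": "ggc", "H": "cac", "I": "atc", "K": "aag", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccc", "Q": "cag", "R": "cgc",
    "S": "tcc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tac",
}


def reverse_translate(protein):
    """Map each amino acid to one codon; whitespace and line breaks are ignored."""
    residues = "".join(protein.split()).upper()
    unknown = sorted(set(residues) - set(PREFERRED_CODON))
    if unknown:
        raise ValueError(f"unsupported residue(s): {unknown}")
    return "".join(PREFERRED_CODON[aa] for aa in residues)


# First ten residues of the protein sequence chosen in 3.1.
print(reverse_translate("MFSPPVSSGK"))
```

A different table (for example one weighted by the codon usage of another organism) would give a different but equally valid DNA sequence for the same protein.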

3.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

[Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]

Lysis protein DNA sequence with Codon-Optimization
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA
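
The example above was optimized while avoiding the Type IIS sites BsaI, BsmBI, and BbsI. A quick way to double-check a sequence for those sites is sketched here. It assumes the usual recognition sequences (BsaI GGTCTC, BsmBI CGTCTC, BbsI GAAGAC) and scans both strands, since these sites are not palindromic. The function names are illustrative and are not part of the Twist tool.

```python
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_type_iis_sites(seq):
    """Return (enzyme, strand, 0-based start) for every site on either strand."""
    seq = seq.upper()
    hits = []
    for enzyme, site in TYPE_IIS_SITES.items():
        # A site on the bottom strand shows up as its reverse complement on the top strand.
        for pattern, strand in ((site, "+"), (reverse_complement(site), "-")):
            start = seq.find(pattern)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(pattern, start + 1)
    return hits


# First 36 bases of the codon-optimized lysis gene shown above.
optimized_prefix = "ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACC"
print(find_type_iis_sites(optimized_prefix))
```

An empty list means no listed site was found in the scanned region; the full-length sequence would be checked the same way.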

Codon optimization rewrites a gene using the codons preferred by the organism that will express it, without modifying the amino acid sequence. Different organisms use synonymous codons at different frequencies, so a sequence taken directly from the original host may be translated inefficiently in another one. For research, a supply of the protein is needed for analysis and testing, and it is not sustainable to always obtain it from the original host for many reasons: budget, quantity, ethics, etc.

In the case of canine thyroglobulin, the experiment is based on the question: how can canine thyroglobulin be produced for use as one component of a thyroid gland implant? The host that will produce the protein will be CHO (Chinese hamster ovary) cells, a mammalian cell line with the capacity to perform the complex processing needed to produce proteins as specialized as thyroglobulin.

THYROGLOBULIN - DNA sequence with Codon-Optimization

ATGTTCTCACCACCTGTGTCTTCTGGCAAGAATGGCCCCACCTCCCTGGCTTCTGGCCACTTCACCGGAAGCAACGTGGAGGACAGGTCCTCTTCCGGCTCCTGGGGCAATGGCGGCCACCCAAGTCCATCTCGAAACTACGGCGACGGGACCCCTTACGATCACATGACCTCCAGAGACCTGGGCTCTCATGACAATCTGTCTCCCCCATTTGTGAACTCCCGGATTCAGTCTAAGACTGAGCGGGGCTCATACAGCTCTTACGGACGCGAGAGCAACCTGCAGGGTTGTCACCAGTCCCTGCTGGGCGGAGACATGGACATGGGCACCCCCGGGACCCTCTCTCCTACTAAGCCTGGCTCTCAGTATTACCAGTACTCCTCCAATAACCCTCGAAGGCGGCCCCTGCACAGCAGTGCCATGGAGGTCCAGACAAAGAAAGTCAGGAAGGTGCCACCAGGCCTGCCCAGCTCCGTCTATGCCCCAAGCGCCTCCACCGCCGATTACAATCGAGATAGCCCCGGTTACCCCTCCTCTAAGCCAGCAGCCTCTACTTTCCCTAGCTCCTTCTTTATGCAGGACGGCCATCACTCAAGTGATCCTTGGTCCAGCAGCTCTGGCATGAACCAGCCAGGGTACGGTGGTATGCTGGGTTCTTCCAGTCACATCCCTCAGTCTTCTTCCTACTGTAGTCTGCATCCACATGAGCGCCTGTCATACCCCAGCCACTCCTCTGCCGACATCAATAGCTCCCTGCCACCCATGTCAACCTTCCATAGGAGCGGCACTAACCATTATTCCACATCCAGCTGCACTCCTCCCGCTAACGGTACTGACTCTATCATGGCTAACAGAGGCTCCGGCGCCGCTGGAAGCAGTCAGACCGGAGATGCTCTCGGCAAGGCCCTGGCTTCTATCTATTCTCCCGACCATACCAACAATTCTTTCAGTAGCAACCCTTCTACTCCCGTGGGCTCCCCTCCTTCCCTGTCCGCCGGAACCGCTGTGTGGTCTAGGAATGGCGGCCAGGCCAGCTCCAGCCCTAATTATGAGGGCCCCCTGCACAGCCTGCAGTCTCGTATTGAGGATAGGCTGGAGCGACTGGACGACGCTATTCACGTGCTGCGTAACCATGCTGTGGGCCCAAGCACCGCTATGCCCGGGGGACACGGAGACATGCACGGAATCATCGGCCCTTCTCACAACGGGGCTATGGGGGGTCTGGGCAGCGGCTACGGAACAGGCCTGCTGTCCGCCAACAGGCACTCTCTGATGGTGGGTGCCCACCGGGAAGACGGAGTGGCCCTGAGAGGGTCACATAGCCTGGTGCCTAACCAGGTGCCTGTGCCTCAGCTGCCCGTGCAGAGTGCTACTAGCCCCGATCTGAACCCTCCACAGGACCCTTACAGAGGCATGCCACCCGGTCTGCAGGGACAGTCTGTGTCCTCTGGCAGTAGCGAGATCAAGTCAGATGACGAGGGAGACGAGAACCTGCAGGATACAAAGAGCTCCGAGGATAAGAAATTGGACGACGACAAGAAGGACATCAAGTCCATCACCAGCAACAACGACGACGAGGACCTGACTCCTGAGCAGAAGGCCGAACGGGAAAAGGAAAGGCGGATGGCTAACAATGCAAGAGAACGCCTGAGGGTCAGGGATATCAATGAGGCTTTCAAGGAGCTGGGCAGGATGGTGCAGCTGCATCTTAAGTCTGACAAGCCACAGACAAAGCTGCTGATCCTGCACCAGGCTGTGGCTGTGATTCTGTCCCTGGAGCAGCAGGTGAGAGAGAGGAACCTGAACCCTAAGGCCGCTTGCCTGAAAAGACGGGAGGAGGAAAAAGTGAGCTCTGAGCCCCCACCCCTCTCCCTGGCCGGACCACACCCCGGCATGGGCGACGCTTCTAACCACATGGGCCAGATG

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

To produce this protein from my DNA, I would use the codon-optimized sequence so that the gene can be expressed efficiently by the host cells. In the cell, RNA polymerase transcribes the DNA into mRNA, and ribosomes then translate the mRNA codon by codon into the chain of amino acids. Although canine and human thyroglobulin, and their DNA, are not very different, the optimization step is still necessary so that the codons in the mRNA are read efficiently by the host's translation machinery while the amino acids stay in exactly the same order. The technologies to do this would be:

  • cell-dependent methods: CHO cells, which come from Chinese hamster ovaries, would produce the protein. As this protein comes from a mammal, the host cells should be mammalian as well; bacterial cells such as E. coli are not efficient here because the protein needs to fold in a specific way, and protein processing differs considerably between bacteria, mammals, and plants.

  • bioreactor: for scaling up production, avoiding contamination of the cell culture, and giving the cells a controlled environment in which to grow and produce correctly folded protein.
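
The transcription and translation steps can be illustrated in code. This is a minimal sketch, not a model of what happens inside a CHO cell: it assumes the input is the coding strand starting at the start codon, uses the standard genetic code, and ignores promoters, splicing, and post-translational processing. As a check, it is run on the first 12 codons of the native and the codon-optimized lysis gene examples from 3.2 and 3.3.

```python
from itertools import product

# Standard genetic code, built in UCAG order ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Coding-strand DNA to mRNA: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA codon by codon from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


native = "atggaaacccgattccctcagcaatcgcagcaaact"     # from the example in 3.2
optimized = "ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACC"  # from the example in 3.3
print(translate(transcribe(native)))
print(translate(transcribe(optimized)))
```

Both calls give the same peptide, which is the point of codon optimization: the DNA changes, the protein does not.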


Part 4: Prepare a Twist DNA Synthesis Order

This is a practice exercise, not necessarily your real Twist order!

- 4.1. Create a Twist account and a Benchling account

- 4.2. Build Your DNA Insert Sequence

- 4.3. On Twist, Select The “Genes” Option

- 4.4. Select “Clonal Genes” option

- 4.5. Import your sequence

- 4.6. Choose Your Vector