Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Optional for Committed Listeners

Part 3: DNA Design Challenge

3.1 Choose your protein

For this assignment, I chose Collagen Type I (alpha 1 chain). I selected this protein because collagen is the main structural protein found in bone, teeth, and connective tissues. In archaeology, collagen is extremely important because it can survive for thousands of years in skeletal remains and artifacts made from bone or leather. It is widely used in radiocarbon dating and paleoproteomics to identify species and study ancient diets. Since I am interested in archaeology, this protein connects molecular biology with archaeological research.

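Using UniProt, I obtained a human collagen protein sequence in FASTA format. Note that the entry retrieved, Q02388, is the collagen alpha-1(VII) chain (gene COL7A1); the type I alpha 1 chain described above is encoded by a different gene, COL1A1.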

sp|Q02388-1|CO7A1_HUMAN Isoform 1 of Collagen alpha-1(VII) chain OS=Homo sapiens OX=9606 GN=COL7A1 MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF LEGLVLPFSGAASAQGVRFATVQYSDDPRTEFGLDALGSGGDVIRAIRELSYKGGNTRTG AAILHVADHVFLPQLARPGVPKVCILITDGKSQDLVDTAAQRLKGQGVKLFAVGIKNADP EELKRVASQPTSDFFFFVNDFSILRTLLPLVSRRVCTTAGGVPVTRPPDDSTSAPRDLVL SEPSSQSLRVQWTAASGPVTGYKVQYTPLTGLGQPLPSERQEVNVPAGETSVRLRGLRPL TEYQVTVIALYANSIGEAVSGTARTTALEGPELTIQNTTAHSLLVAWRSVPGATGYRVTW RVLSGGPTQQQELGPGQGSVLLRDLEPGTDYEVTVSTLFGRSVGPATSLMARTDASVEQT LRPVILGPTSILLSWNLVPEARGYRLEWRRETGLEPPQKVVLPSDVTRYQLDGLQPGTEY RLTLYTLLEGHEVATPATVVPTGPELPVSPVTDLQATELPGQRVRVSWSPVPGATQYRII VRSTQGVERTLVLPGSQTAFDLDDVQAGLSYTVRVSARVGPREGSASVLTVRREPETPLA VPGLRVVVSDATRVRVAWGPVPGASGFRISWSTGSGPESSQTLPPDSTATDITGLQPGTT YQVAVSVLRGREEGPAAVIVARTDPLGPVRTVHVTQASSSSVTITWTRVPGATGYRVSWH SAHGPEKSQLVSGEATVAELDGLEPDTEYTVHVRAHVAGVDGPPASVVVRTAPEPVGRVS RLQILNASSDVLRITWVGVTGATAYRLAWGRSEGGPMRHQILPGNTDSAEIRGLEGGVSY SVRVTALVGDREGTPVSIVVTTPPEAPPALGTLHVVQRGEHSLRLRWEPVPRAQGFLLHW QPEGGQEQSRVLGPELSSYHLDGLEPATQYRVRLSVLGPAGEGPSAEVTARTESPRVPSI ELRVVDTSIDSVTLAWTPVSRASSYILSWRPLRGPGQEVPGSPQTLPGISSSQRVTGLEP GVSYIFSLTPVLDGVRGPEASVTQTPVCPRGLADVVFLPHATQDNAHRAEATRRVLERLV LALGPLGPQAVQVGLLSYSHRPSPLFPLNGSHDLGIILQRIRDMPYMDPSGNNLGTAVVT AHRYMLAPDAPGRRQHVPGVMVLLVDEPLRGDIFSPIREAQASGLNVVMLGMAGADPEQL RRLAPGMDSVQTFFAVDDGPSLDQAVSGLATALCQASFTTQPRPEPCPVYCPKGQKGEPG EMGLRGQVGPPGDPGLPGRTGAPGPQGPPGSATAKGERGFPGADGRPGSPGRAGNPGTPG APGLKGSPGLPGPRGDPGERGPRGPKGEPGAPGQVIGGEGPGLPGRKGDPGPSGPPGPRG PLGDPGPRGPPGLPGTAMKGDKGDRGERGPPGPGEGGIAPGEPGLPGLPGSPGPQGPVGP PGKKGEKGDSEDGAPGLPGQPGSPGEQGPRGPPGAIGPKGDRGFPGPLGEAGEKGERGPP GPAGSRGLPGVAGRPGAKGPEGPPGPTGRQGEKGEPGRPGDPAVVGPAVAGPKGEKGDVG PAGPRGATGVQGERGPPGLVLPGDPGPKGDPGDRGPIGLTGRAGPPGDSGPPGEKGDPGR PGPPGPVGPRGRDGEVGEKGDEGPPGDPGLPGKAGERGLRGAPGVRGPVGEKGDQGDPGE DGRNGSPGSSGPKGDRGEPGPPGPPGRLVDTGPGAREKGEPGDRGQEGPRGPKGDPGLPG APGERGIEGFRGPPGPQGDPGVRGPAGEKGDRGPPGLDGRSGLDGKPGAAGPSGPNGAAG KAGDPGRDGLPGLRGEQGLPGPSGPPGLPGKPGEDGKPGLNGKNGEPGDPGEDGRKGEKG 
DSGASGREGRDGPKGERGAPGILGPQGPPGLPGPVGPPGQGFPGVPGGTGPKGDRGETGS KGEQGLPGERGLRGEPGSVPNVDRLLETAGIKASALREIVETWDESSGSFLPVPERRRGP KGDSGEQGPPGKEGPIGFPGERGLKGDRGDPGPQGPPGLALGERGPPGPSGLAGEPGKPG IPGLPGRAGGVGEAGRPGERGERGEKGERGEQGRDGPPGLPGTPGPPGPPGPKVSVDEPG PGLSGEQGPPGLKGAKGEPGSNGDQGPKGDRGVPGIKGDRGEPGPRGQDGNPGLPGERGM AGPEGKPGLQGPRGPPGPVGGHGDPGPPGAPGLAGPAGPQGPSGLKGEPGETGPPGRGLT GPTGAVGLPGPPGPSGLVGPQGSPGLPGQVGETGKPGAPGRDGASGKDGDRGSPGVPGSP GLPGPVGPKGEPGPTGAPGQAVVGLPGAKGEKGAPGGLAGDLVGEPGAKGDRGLPGPRGE KGEAGRAGEPGDPGEDGQKGAPGPKGFKGDPGVGVPGSPGPPGPPGVKGDLGLPGLPGAP GVVGFPGQTGPRGEMGQPGPSGERGLAGPPGREGIPGPLGPPGPPGSVGPPGASGLKGDK GDPGVGLPGPRGERGEPGIRGEDGRPGQEGPRGLTGPPGSRGERGEKGDVGSAGLKGDKG DSAVILGPPGPRGAKGDMGERGPRGLDGDKGPRGDNGDPGDKGSKGEPGDKGSAGLPGLR GLLGPQGQPGAAGIPGDPGSPGKDGVPGIRGEKGDVGFMGPRGLKGERGVKGACGLDGEK GDKGEAGPPGRPGLAGHKGEMGEPGVPGQSGAPGKEGLIGPKGDRGFDGQPGPKGDQGEK GERGTPGIGGFPGPSGNDGSAGPPGPPGSVGPRGPEGLQGQKGERGPPGERVVGAPGVPG APGERGEQGRPGPAGPRGEKGEAALTEDDIRGFVRQEMSQHCACQGQFIASGSRPLPSYA ADTAGSQLHAVPVLRVSHAEEEERVPPEDDEYSEYSEYSVEEYQDPEAPWDSDDPCSLPL DEGSCTAYTLRWYHRAVTGSTEACHPFVYGGCGGNANRFGTREACERRCPPRVVQSQGTG TAQD

3.2 Reverse Translation

According to the Central Dogma, DNA is transcribed into RNA, which is then translated into protein. Since each amino acid is encoded by a three-nucleotide codon, we can work backwards from a protein sequence to determine a possible DNA sequence.

For the first 60 amino acids of the collagen sequence shown above:

  • Protein sequence (partial): MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF

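Using NCBI, I retrieved the nucleotide record below. It does not encode the N-terminal segment above: it begins with AAT rather than the ATG start codon, and its second line (gatgagggct cctgcactgc ...) translates to DEGSCTAYTL..., which lies near the C-terminal end of the protein. A DNA sequence for the segment above has to be built by reverse translating it codon by codon.

  • DNA sequence (as retrieved from NCBI):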

      1 aattcccaca aaccctgctg acttgacccc attggcccag acccctgttc cctgccactg
     61 gatgagggct cctgcactgc ctacaccctg cgctggtacc atcgggctgt gacaggcagc
    121 acagaggcct gtcacccttt tgtctatggt ggctgtggag ggaatgccaa ccgttttggg
    181 acccgtgagc ctgcgagcgc cgctgcccac cccgggtgtc cagagccagg ggacaggtac
    241 tgcccaggac tgaggcccag ataatgagct gagattcagc atcccctgga ggacgtcggg
    301 gtctcagcag aaccccactg tccctcccct tggtgctaga ggcttgtgtg cacgtgagcg
    361 tcggttgtgc agttcccgtt atttcagtga cttggtcccg tgggtctaac cttcccccct
    421 gtggacaaac ccccattgtg gctccn
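
One way to check which part of a protein a retrieved record covers is to translate it in each forward reading frame and search for known stretches of the protein. The sketch below does this for the first two lines of the record above. The helper names (clean_record, translate) are illustrative and not part of any NCBI tool, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def clean_record(text):
    """Drop position numbers and whitespace from a GenBank-style sequence block."""
    return "".join(ch for ch in text.upper() if ch in "ACGTN")


def translate(dna, frame=0):
    """Translate complete codons starting at the given frame offset (0, 1, or 2).

    Codons containing an ambiguous base such as N are shown as X.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First two lines of the record above, copied with their position numbers.
record = """
      1 aattcccaca aaccctgctg acttgacccc attggcccag acccctgttc cctgccactg
     61 gatgagggct cctgcactgc ctacaccctg cgctggtacc atcgggctgt gacaggcagc
"""

dna = clean_record(record)
frames = [translate(dna, frame) for frame in range(3)]

for frame, protein in enumerate(frames):
    print(frame, protein)

# Search for an N-terminal and a C-terminal stretch of the UniProt entry.
for stretch in ("MTLRLL", "DEGSCTAYTL"):
    hits = [frame for frame, protein in enumerate(frames) if stretch in protein]
    print(stretch, "found in frames:", hits)
```

For these two lines, frame 0 contains DEGSCTAYTL, a stretch from the C-terminal end of the UniProt entry, and no frame contains MTLRLL.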
    

Explanation (example codons from the standard genetic code):

ATG → Methionine (M)

GGT → Glycine (G)

CCT → Proline (P)

CGT → Arginine (R)

Because the genetic code is degenerate (multiple codons can encode the same amino acid), this is only one possible DNA sequence. Many other nucleotide sequences could produce the exact same collagen protein segment.
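
Reverse translation and the degeneracy described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: it assumes the standard genetic code, simply picks the first codon listed for each amino acid, and uses function names (reverse_translate, count_encodings) chosen for this example.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group the codons by the amino acid they encode.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def reverse_translate(protein):
    """Return one valid DNA sequence, using the first listed codon per residue."""
    return "".join(CODONS_FOR[aa][0] for aa in protein)


def translate(dna):
    """Translate a DNA string codon by codon, starting at the first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)


segment = "MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF"
dna = reverse_translate(segment)

# Round trip: the DNA must translate back to the same protein segment.
assert translate(dna) == segment

print(dna[:12])                  # ATGACTTTACGT
print(count_encodings("MTLR"))   # 1 * 4 * 6 * 6 = 144
print(count_encodings(segment))  # number of possible sequences for all 60 residues
```

Even the four residues MTLR have 144 possible encodings, which is why a second criterion such as codon usage is needed to choose among them.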

3.3 Codon Optimization

After obtaining a possible DNA sequence from reverse translation, the next step is codon optimization. Although multiple DNA sequences can encode the same protein, different organisms prefer certain codons over others. This is known as codon bias. If a gene contains many codons that are rarely used in the host organism, protein production may be slow or inefficient. For this assignment, I chose to optimize the collagen sequence for Escherichia coli because it is widely used in biotechnology. E. coli grows quickly, is inexpensive to culture, and is commonly used for recombinant protein production. Using an online codon optimization tool, the DNA sequence was adjusted to:

  • Use codons that are frequently used in E. coli
  • Improve translation efficiency
  • Avoid problematic sequences (such as strong secondary structures or unwanted restriction sites)

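Importantly, codon optimization does not change the amino acid sequence of the protein. It only changes the nucleotide sequence to improve expression in the chosen organism. By using codons that E. coli prefers, the collagen gene should be translated more efficiently, which typically leads to higher protein yield. A correctly optimized sequence must therefore still translate to MTLRLLVAAL... The sequence listed below does not pass this check: its codons encode only Ala, Thr, Cys, and Gly, followed by AAT TAA, which indicates that the tool read the letters a, t, c, g, and n of the NCBI nucleotide record as amino acid codes. The optimization should be rerun with the protein sequence as input.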

  • Improved ADN: GCCGCAACCACCTGCTGCTGCGCCTGCGCCGCGGCCTGCTGTTGCACCGGCTGCACCGGCGCGTGTACCACCGGCGCATGCTGCTGCTGCGCGACCACCGGTGGTTGCTGTTGTGCGGGTGCCTGCTGCTGTTGCACCGGCACCACCTGTTGCTGCACCGGCTGCTGCGCATGTACCGGCGGAGCCACCGGCGCGGGCGGTGGCTGCACCTGTTGCACCGGCTGCGCATGTACCGGCTGCTGCACCGCGTGCGCCTGCTGCTGCACCGGCTGCGGCTGCACTGGCGGTACTGCATGCTGCGCGACCTGCGGCGGCGGCTGCACCGGCACCGGTGCCTGTGCCGGCGGCTGCGCGGGCTGCGCGTGCGCGGGCGCCGGTGGCTGCTGCACCGGCACCTGCGCCTGTTGCTGCACCACCACCACCGGCACCTGTACAGCGACCGGCGGCACCGGCGGCTGCACCGGCACCGGCGGTGCGGGTGGCGGCGCGGCGACCGGTTGTTGTGCAGCGTGCTGCGGCACCACCACCACAGGTGGTGGTGCGTGTTGCTGCGGCACCGGCGCGGGCTGCTGCACCGGCTGTGGTGCGGGCTGCGGCTGTTGCGGCTGTACCGGCTGTTGCTGCGCGTGTTGCTGTTGTGGCGGTGGCACCGGTACCTGCTGCGCGGGCGCGGGTTGCTGTGCCGGCGGCGGCGGCGCCTGTGCGGGTGGCACCGCGTGCACCGGCTGCTGCTGCGCCGGAGGCGCGTGCACCGGCGCCGGTGGTTGCTGTTGCGCGGGTGCCACCGCGGCAACCGGCGCGGGCTGTACCGGCGCAGGCGCAACCACCTGCGCAGGTTGTGCGACGTGCTGCTGTTGTACAGGTGGCGCGGGCGGCGCCTGTGGCACGTGTGGCGGCGGCGGTACCTGCACCTGTGCCGGCTGCGCGGGCGCAGCGTGTTGTTGCTGCGCGTGCACGGGCACCTGCTGTTGCACCTGCTGTTGCTGTACCACCGGGGGCACCGGCTGCACCGCAGGTGCCGGCGGTTGTACCACCGGTACCGGTACCGGTTGTGCGTGCGGCACCGGCGCGGGCTGCGGCACCTGCGGCGGTACCACCGGTACCGGCTGCGCGGGTACCACCTGCTGCTGCGGCACCACCGCGACGACCACCTGCGCAGGTACCGGTGCGTGCACCACCGGTGGCACCTGCTGCTGTGGCACCGGCGGTGGTACCTGCACAGCGGCCTGCTGCACCACCTGCTGTTGCTGCTGTTGCACCGGCACCGGTGGTGCGTGTGCGGCGGCCTGCTGCTGCTGTTGCGCGACGACCGGCACCGGCGGATGTACCTGCTGCAATTAA
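
The simplest form of the optimization described above replaces every residue with a single preferred codon for the host and then checks that the result still translates to the same protein. The sketch below is an assumption-laden illustration, not the algorithm of the online tool that was used: the PREFERRED_ECOLI table is an illustrative choice of commonly used E. coli codons that should be checked against a measured codon usage table, and real tools also weigh GC content, secondary structure, and restriction sites, which are ignored here.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# ASSUMPTION: illustrative preferred codons for E. coli. Verify against a
# measured codon usage table before using this for a real design.
PREFERRED_ECOLI = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

# Every preferred codon must encode the amino acid it is listed under.
assert all(CODON_TABLE[codon] == aa for aa, codon in PREFERRED_ECOLI.items())


def optimize(protein, preferred=PREFERRED_ECOLI, stop="TAA"):
    """Encode the protein with one preferred codon per residue plus a stop codon."""
    return "".join(preferred[aa] for aa in protein) + stop


def translate(dna):
    """Translate a DNA string codon by codon; stop codons appear as '*'."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


segment = "MTLRLLVAALCAGILAEAPRVRAQHRERVTCTRLYAADIVFLLDGSSSIGRSNFREVRSF"
optimized = optimize(segment)

# The protein must be unchanged by optimization.
assert translate(optimized) == segment + "*"
print(optimized[:12])  # ATGACCCTGCGT

# The same check applied to the first 30 nucleotides of the sequence listed above.
listed_start = "GCCGCAACCACCTGCTGCTGCGCCTGCGCC"
print(translate(listed_start))  # AATTCCCACA, not MTLRLLVAAL
```

Translating the output back is a cheap check worth running on any optimized sequence before ordering it.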

3.4 You have a sequence! Now what?

Now that I have a codon-optimized DNA sequence for collagen, the next step is to produce the protein. One common method is a cell-dependent system. In this approach, the optimized DNA sequence is inserted into a plasmid (a small circular DNA molecule). The plasmid is then introduced into Escherichia coli cells through transformation. Once inside the cell:

  • The DNA is transcribed into mRNA.
  • The mRNA is translated by ribosomes into the collagen protein.
  • The bacterial cell essentially acts as a biological factory, producing the protein as it grows.

Another option is a cell-free system. In this method, instead of using living cells, the DNA is added to a solution containing the necessary molecular machinery (ribosomes, enzymes, nucleotides, amino acids). The transcription and translation processes occur in a test tube, producing the protein directly. This method is faster and more controlled, but usually more expensive. In both cases, the DNA sequence follows the Central Dogma: DNA → RNA → Protein, resulting in the production of the collagen protein.
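
The DNA → RNA → Protein flow that both systems rely on can be mimicked in code. The sketch below is only a string-level model under stated assumptions: the input is taken to be the coding (sense) strand starting at the ATG, the standard genetic code applies, and the function names are chosen for this example. It says nothing about promoters, ribosome binding sites, or folding.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand):
    """Model transcription: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate_mrna(mrna):
    """Model translation: read codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # look up in the DNA-letter table
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical short insert: codons for M, T, L, R followed by a stop codon.
insert = "ATGACCCTGCGTTAA"
mrna = transcribe(insert)
print(mrna)                  # AUGACCCUGCGUUAA
print(translate_mrna(mrna))  # MTLR
```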

Part 4: Prepare a Twist DNA Synthesis Order

This project uses E. coli and collagen to create reproducible patterns that simulate organic components of ancient artifacts, such as textiles or adhesives. Collagen acts as a structural scaffold to hold proteins in place, while engineered E. coli produce proteins that form visible patterns. Automation ensures precision and repeatability, allowing us to study how these materials might degrade or be preserved over time, providing insights into experimental archaeology and conservation.