Week 2 HW: Reading and Writing DNA Sequences
Start Your Homework!
Part 1: Benchling & In-silico Gel Art
Johnny Got His Gun
- The sequence has been rearranged into the shape of a pistol. I wanted to keep the number of columns restricted to the original seven enzymes for this composition, but I’m sure the image could be improved with additional columns. Question: how many columns can be added to the gel matrix?

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis
NA
Part 3: DNA Design: Pick a Protein… Any Protein?

The organism - Chroococcidiopsis - contains the orange carotenoid protein (OCP)
As seen in the page from my notes above, I’m interested in Chroococcidiopsis, a type of cyanobacteria, which is the focal point of my research with PIG. A protein, OCP, is found in the phycobilisome (PBS), which is a large light-harvesting complex attached to the thylakoid membranes within the cyanobacterial cell body. It is an Orange Carotenoid Protein (OCP) with 6483 genes. At this point, I have no idea what any of this means, and I’m not sure if this protein, which was one of over 41,000 results of my search for “proteins commonly found in cyanobacteria”, will be of any importance to my project. Apparently, its orange coloration lets it act as a sunscreen for the thylakoid membrane, which is important for regulating photosynthesis within the cell. I chose this protein for two main reasons: it’s a necessary part of a functional cyanobacterial cell, and it has been isolated as a plasmid, which seems to be important in the context of the work we will be doing within HTGAA.
Reverse Translation of OCP Protein sequence to DNA sequence.
The OCP sequence:
1 mpytiesars ifpdtqvasa vptivesfeq lsaedrlall wfaytemgvt itpaamqvan
61 mmfaektlaq ieqipaaeqt qvmcdlinht dtpicrtysy fgmnvklgfw yqlgewmkqg
121 ivapipegyk lsakasnvlq tirqleggqq ltvlrdivvn mghspttatq kveepvvppk
181 dlaprtkivi eginnstvls ymenmnafdf eaavalfaed galqppfeep ivgqesilaf
241 mreecyglkl ipergisepg ergftqikvm gkvqtpwagd svginlawrf linrqgkiff
301 vaidvlaspq ellnlglvk
Reverse Translate results for the 319-residue sequence “URD53675.1 orange carotenoid-binding protein (plasmid) [Chroococcidiopsis sp. CCNUC1]” starting “MPYTIESARS”:
reverse translation of URD53675.1 orange carotenoid-binding protein (plasmid) [Chroococcidiopsis sp. CCNUC1] to a 957 base sequence of most likely codons. atgccgtataccattgaaagcgcgcgcagcatttttccggatacccaggtggcgagcgcg gtgccgaccattgtggaaagctttgaacagctgagcgcggaagatcgcctggcgctgctg tggtttgcgtataccgaaatgggcgtgaccattaccccggcggcgatgcaggtggcgaac atgatgtttgcggaaaaaaccctggcgcagattgaacagattccggcggcggaacagacc caggtgatgtgcgatctgattaaccataccgataccccgatttgccgcacctatagctat tttggcatgaacgtgaaactgggcttttggtatcagctgggcgaatggatgaaacagggc attgtggcgccgattccggaaggctataaactgagcgcgaaagcgagcaacgtgctgcag accattcgccagctggaaggcggccagcagctgaccgtgctgcgcgatattgtggtgaac atgggccatagcccgaccaccgcgacccagaaagtggaagaaccggtggtgccgccgaaa gatctggcgccgcgcaccaaaattgtgattgaaggcattaacaacagcaccgtgctgagc tatatggaaaacatgaacgcgtttgattttgaagcggcggtggcgctgtttgcggaagat ggcgcgctgcagccgccgtttgaagaaccgattgtgggccaggaaagcattctggcgttt atgcgcgaagaatgctatggcctgaaactgattccggaacgcggcattagcgaaccgggc gaacgcggctttacccagattaaagtgatgggcaaagtgcagaccccgtgggcgggcgat agcgtgggcattaacctggcgtggcgctttctgattaaccgccagggcaaaatttttttt gtggcgattgatgtgctggcgagcccgcaggaactgctgaacctgggcctggtgaaa
reverse translation of URD53675.1 orange carotenoid-binding protein (plasmid) [Chroococcidiopsis sp. CCNUC1] to a 957 base sequence of consensus codons. atgccntayacnathgarwsngcnmgnwsnathttyccngayacncargtngcnwsngcn gtnccnacnathgtngarwsnttygarcarytnwsngcngargaymgnytngcnytnytn tggttygcntayacngaratgggngtnacnathacnccngcngcnatgcargtngcnaay atgatgttygcngaraaracnytngcncarathgarcarathccngcngcngarcaracn cargtnatgtgygayytnathaaycayacngayacnccnathtgymgnacntaywsntay ttyggnatgaaygtnaarytnggnttytggtaycarytnggngartggatgaarcarggn athgtngcnccnathccngarggntayaarytnwsngcnaargcnwsnaaygtnytncar acnathmgncarytngarggnggncarcarytnacngtnytnmgngayathgtngtnaay atgggncaywsnccnacnacngcnacncaraargtngargarccngtngtnccnccnaar gayytngcnccnmgnacnaarathgtnathgarggnathaayaaywsnacngtnytnwsn tayatggaraayatgaaygcnttygayttygargcngcngtngcnytnttygcngargay ggngcnytncarccnccnttygargarccnathgtnggncargarwsnathytngcntty atgmgngargartgytayggnytnaarytnathccngarmgnggnathwsngarccnggn garmgnggnttyacncarathaargtnatgggnaargtncaracnccntgggcnggngay wsngtnggnathaayytngcntggmgnttyytnathaaymgncarggnaarathttytty gtngcnathgaygtnytngcnwsnccncargarytnytnaayytnggnytngtnaar
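The “most likely codons” result above can be sketched as a simple lookup: each amino acid letter is swapped for one fixed codon. The table below is not taken from the tool itself; it is read off the output above (for example, every `m` became `atg` and every `p` became `ccg`), and the function name `reverse_translate` is my own. Treat it as a minimal sketch of the idea, not the tool’s actual code.

```python
# One codon per amino acid, read off the "most likely codons" output above.
# (Assumption: the tool uses a fixed table like this; names here are illustrative.)
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein):
    """Replace each amino acid letter with its single most likely codon."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


first_ten = "MPYTIESARS"           # first ten residues of the OCP sequence
dna = reverse_translate(first_ten)
print(dna)                         # atgccgtataccattgaaagcgcgcgcagc
print(len(dna))                    # 30 bases: three per residue
print(319 * 3)                     # 957, the length reported for the full protein
```

The printed 30 bases match the start of the most likely codons sequence above, and three bases per residue explains why 319 residues give a 957 base sequence.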
Question 1. The translation results above were produced using bioinformatics’ Sequence Analysis - Reverse Translate. In addition, it produced code for graphing the base probabilities. How is this code converted into a graph? With Python?
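Question 2. What is the difference between most likely codons and consensus codons?
Question 3. In addition to the nitrogenous bases A, C, T, and G, the consensus sequence also contains n, y, h, r, w, s, and m. Do these represent combinations of bases? Yes: they are IUPAC ambiguity codes, each standing for a set of possible bases at a single position (n = any base, y = C or T, r = A or G, h = A, C, or T, w = A or T, s = C or G, m = A or C). A codon written with these letters covers several synonymous codons at once.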
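To make the extra letters concrete, here is a small sketch that expands a consensus codon into the plain codons it stands for. The letter meanings are the standard IUPAC nucleotide ambiguity codes; the dictionary covers only the letters that appear in the consensus output above, and the function name `expand` is my own.

```python
from itertools import product

# IUPAC ambiguity codes (only the letters used in the consensus sequence above).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag",    # purine
    "y": "ct",    # pyrimidine
    "s": "cg",
    "w": "at",
    "m": "ac",
    "h": "act",   # not g
    "n": "acgt",  # any base
}


def expand(codon):
    """List every plain codon that a degenerate (consensus) codon can stand for."""
    choices = [IUPAC[letter] for letter in codon.lower()]
    return ["".join(bases) for bases in product(*choices)]


print(expand("tay"))       # ['tac', 'tat']  (both tyrosine codons)
print(expand("ath"))       # ['ata', 'atc', 'att']  (the three isoleucine codons)
print(len(expand("ccn")))  # 4  (all four proline codons)
```

So the most likely codons sequence picks one codon per amino acid, while the consensus sequence writes one degenerate codon that summarizes the possible choices.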
Codon Optimization
Clearly there are advantages as well as disadvantages to codon optimization, and after reviewing the basic concept it seems there are two divergent camps: pro-optimization versus the purists. Let’s compare.
Some Advantages:
- Optimization can increase protein production by replacing rarely used codons with synonymous codons (different codons for the same amino acid) that are translated more efficiently. This could help improve the efficiency of producing therapeutic drugs, and potentially reduce production cost.
- Certain codon combinations can potentially increase the expression of a gene, thereby making more effective treatments.
- Optimization helps to align a target gene’s codon usage with the preferred codons of the host organism.
Some Disadvantages:
- Synonymous codons leave the amino acid sequence unchanged, but a change in translation speed can potentially alter how the protein folds, which can affect protein function or performance.
- Potentially, there are deeper levels of code in specific codon combinations which could be related to rhythms associated with protein folding patterns during elongation, or functional aspects which have not yet been discovered.
- Optimization can also affect tRNA wobble pairing and potentially cause unwanted mutations or render proteins dysfunctional.
Based on this surface-level research, I can see reasons for and against codon optimization, and, as with any experimental research, we must carefully weigh the potential benefits of optimization relative to the application. For instance, if the optimization is for use in animal research, where unintended “side effects” could impact the wellbeing of an organism, then comprehensive multi-level testing should be conducted to reduce error within the application. If the optimization is purely lab-based, cannot modify DNA replication, and will improve the speed of data and research assimilation, then I can see a definite advantage in employing the technology.
Codon optimization: I used novoprolabs.com because the Twist link was broken. With the sequence type set to DNA/RNA and E. coli as the expression host, the most likely codons sequence produced the following result:
ATGCCGTATACCATCGAGTCTGCACGTAGCATCTTCCCAGACACCCAGGTTGCGTCTGCGGTTCCGACCATTGTGGAGTCTTTCGAGCAGCTGTCCGCGGAAGATCGCCTGGCGCTGCTGTGGTTCGCTTACACCGAAATGGGCGTCACCATCACCCCGGCGGCGATGCAGGTAGCAAACATGATGTTCGCGGAAAAGACCCTGGCTCAAATCGAACAGATTCCGGCAGCAGAACAGACTCAGGTAATGTGCGATCTGATCAACCATACCGATACGCCGATTTGCCGTACTTACTCCTACTTTGGCATGAACGTCAAACTGGGTTTCTGGTATCAACTGGGTGAATGGATGAAGCAGGGTATCGTTGCGCCGATCCCGGAAGGCTACAAACTGTCCGCCAAGGCCTCCAACGTGCTGCAGACTATCCGCCAGCTGGAGGGTGGCCAGCAGCTGACCGTTCTGCGTGACATCGTCGTTAACATGGGTCACTCCCCAACGACCGCTACTCAGAAAGTTGAAGAGCCGGTCGTGCCGCCGAAAGATCTGGCGCCGCGCACCAAAATCGTGATTGAAGGTATCAACAACAGCACCGTTCTGAGCTATATGGAGAACATGAACGCGTTCGATTTCGAGGCCGCAGTTGCCCTGTTCGCAGAAGACGGCGCTCTGCAGCCGCCGTTCGAAGAACCGATCGTTGGCCAGGAGTCCATCCTGGCCTTCATGCGTGAAGAATGCTATGGCCTGAAACTGATCCCGGAACGTGGCATTAGCGAACCGGGTGAGCGTGGTTTCACTCAGATCAAAGTGATGGGCAAAGTTCAAACCCCGTGGGCGGGCGACAGCGTAGGCATCAACCTGGCATGGCGTTTCCTGATCAACCGTCAAGGCAAGATTTTCTTCGTTGCCATCGACGTACTGGCGAGCCCGCAGGAGCTGCTGAATCTGGGTCTGGTAAAA
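One way to check that optimization only swapped synonymous codons is to translate both DNA versions back to protein and compare. The sketch below does this for the first 30 bases of each sequence, copied from the results above. The codon table is the standard genetic code written compactly; the function names `codons` and `translate` are my own, and a real workflow would run the full 957 bases.

```python
# Standard genetic code, built from the bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(dna):
    """Split a DNA string into triplets, ignoring any incomplete trailing bases."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Translate DNA (coding strand) to one-letter amino acids; '*' marks a stop."""
    return "".join(CODON_TABLE[codon] for codon in codons(dna))


most_likely = "atgccgtataccattgaaagcgcgcgcagc"  # first 30 bases, reverse translation
optimized = "ATGCCGTATACCATCGAGTCTGCACGTAGC"    # first 30 bases, optimized result

print(translate(most_likely))  # MPYTIESARS
print(translate(optimized))    # MPYTIESARS

changed = sum(a != b for a, b in zip(codons(most_likely), codons(optimized)))
print(changed)                 # 5 of the 10 codons differ, yet the protein is identical
```

Both versions give the same ten residues, MPYTIESARS, even though half of the codons were changed.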
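Results from the consensus codons, with sequence type set to protein and expression host E. coli. Because the type was set to protein, each of the 957 letters of the consensus sequence was read as an amino acid (a as alanine, t as threonine, and so on), which is why the output is 2871 bases long: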
1 GCAACCGGCT GCTGCAACAC CGCGTACGCG TGTAACGCTA CCCATGGTGC GCGTTGGTCC AACGGCTGTA 71 ACATGGGCAA CTGGTCCAAC GCTACGCATA CCACTTACTG CTGCAACGGC GCTTACGCGT GTAACTGCGC 141 ACGCGGCACC AACGGTTGCA ACTGGTCCAA CGGTTGTAAC GGTACTAATT GTTGCAACGC ATGCAACGCG 211 ACGCACGGTA CCAACGGTGC GCGTTGGTCT AACACCACTT ACGGCGCTCG TTGCGCGCGC TACACCAACT 281 GGTCTAATGG CTGCAACGGC GCGCGTGGCG CATACATGGG TAACTACACT AATGGCTGTA ACTACACCAA 351 CTACACCAAC ACCGGCGGTA CCACGTACGG TTGTAACACT GCATATGCAT GTAACGGCGC GCGTGCAACT 421 GGCGGTGGTA ATGGTACGAA CGCCTGTAAC GCGACTCACG CGTGCAACTG TTGCAACGGC TGCAACGGTT 491 GCAACGCAAC CGGCTGTGCT CGTGGCACTA ATGGCTGCAA CGCAGCTTAT GCAACCGGCG CTACCGGTAC 561 CACCTACGGT TGTAACGGTG CGCGTGCCGC TCGTGCCTGC AACTACACCA ATGGTTGTAA TTGCGCTCGC 631 GCGACCCACG GTGCCCGCTG CGCTCGTGCA ACCCATTGCT GCAACGGCTG TAATGGCTGT AATGGCGCGC 701 GCTGCGCGCG TGCATGTAAT TGCGCGCGCG GTACCAACGC GACTGGCACC GGTTACGGCG CCTATTACAC 771 CAACGCGACC CATGCAGCAT ACTGTGCCTA TGCGTGCAAC GGCGCATATG CATGCAATTG CTGCAACGCG 841 ACCCACACCG GTTATATGGG TAATGCGTGC AACACGGCTT ATTGGAGCAA TACGGCTTAC ACCACTTACG 911 GTGGTAACGC CACCGGTGCT GCGTATGGCA CGAACGCTGC GCGTTATACG AACGGTGGCA ACACCACCTA 981 CACTGGCGGC ACCGCGTATT GTGCGCGTTA CACCAACGGT GGTAACGGCG CTCGCACCGG TGGTGCCACG 1051 GGCGCGGCAC GTTGCGCACG TGGCGGTAAC GCTACCCACG GCACTAACGG TTGCAATTGT TGCAACGCGA 1121 CTCACTGCTG CAACGGTGCA CGTGGCGGTA ACACCGCATA TGCTGCGCGT TACACCAACT GGTCTAACGG 1191 CTGCAATGCG GCTCGCGGTT GCAACTGGAG CAACGCAGCG TACGGCACCA ACTACACTAA CTGCGCACGC 1261 GCTTGCAACG CAACCCATAT GGGCAACTGT GCGCGTTATA CGAACGGCGC GCGTGGTGGT AACGGCGGTA 1331 ACTGCGCTCG TTGTGCTCGT TACACCAACG CGTGTAATGG TACTAACTAC ACCAACATGG GTAACGGTGC 1401 ATACGCGACT CACGGTACCA ACGGTACGAA CGCTGCGTAC GCGACTGGCG GCGGCAACTG CGCTTACTGG 1471 TCTAATTGTT GCAACGCCTG CAATGCATGT AACGGCTGTA ACGCCTGCAA CTGCGCTCGT GCGGCACGTG 1541 GTACCAACGG CGCGCGTGGT GCTCGCTGCT GCAACGGTAC CAACGGTACC AACTGTTGCA ACTGTTGCAA 1611 CGCTGCGCGC GGTGCGTACT ACACCAACGG CTGCAATTGC TGTAACATGG GCAACGCTTG TAACGCAGCG 1681 CGTGCAACTC ACGGCACTAA CGCGACTCAC GGCGCTCGTG 
GTGGTAACGC GACCCACGCT GCTTACGCGG 1751 CATACTGGTC TAACGCATGC AACGGCACCA ATTACACCAA CTGGTCTAAC ACCGCCTACG CCACTGGCGG 1821 TGCGCGTGCG GCGTATGCAA CCGGCGCCGC GTATGGTTGT AACACTACCT ACGGTGCTTA CACCACGTAC 1891 GGCGCGCGCG GCTGCAACGG CTGTAACGGC ACCAATGGTT GCAACTACAC CAACACTACC TACGGTTGTA 1961 ATGGTGCTCG TGGCGCTTAC GGTGGCAACG GCTGTAACTA TACCAACTGC GCTCGCTGCT GCAACTGTTG 2031 TAACACCACT TACGGTGCAC GCGGTGCACG TTGTTGCAAC GCTACCCACG GTACCAATGG CGGTAATTGC 2101 GCTCGTGGTG CTCGTTGGTC TAACGCAACC CACTATACTA ACGGTTGCAA TACCACCTAC GCAACCGGTA 2171 TGGGCAATGG CGCCCGTGGT GCGCGTACCG GTTATACCGC CTACGGTGGT AACTACACCA ACGCAGCTCG 2241 TTACACCAAC GCCACCCACT GCTGTAACGG CGCTCGTATG GGTAATGGTG GCAATGCTAC CCACTGGTCC 2311 AACGGCGCAC GCTGCTGCAA TGGTGGTAAC GGTGCGCGTA TGGGTAACGG CGGCAACACC ACCTATGCCT 2381 GCAACTGCGC ACGTGCAACT CACGCCGCGC GTGGTACTAA CGCGACCGGC GGCGGTAATG CTGCTCGCGG 2451 CACGAACTGT GCACGTGCTT GCAACTGCTG CAATACTGGC GGTGGCTGTA ACGGTGGCAA CGGTGCATAC 2521 TGGAGCAACG GCACCAACGG CGGTAACGCG ACCCATGCAG CTTATTACAC TAACGGTTGT AACACCGGCG 2591 GTATGGGCAA CACCACCTAT TACACCAACG CGACCCACGC AGCTTACATG GGTAATTGTG CACGTGGCGG 2661 CAACGCGGCG CGCGCAACGC ACACGACCTA CACCACCTAC GGCACGAACG GTTGTAACGC GACTCACGGT 2731 GCCTACGGCA CCAACTATAC CAACGGCTGT AATTGGTCTA ACTGTTGTAA CTGCGCTCGT GGCGCCCGTT 2801 ACACGAACTA TACCAACGCG GCCTACTACA CCAATGGCGG CAACTACACT AACGGCACCA ACGCCGCACG 2871 T
Question 4. In the optimized sequence above, the 25th character is the T of TAC. Would this indicate the RNA start codon AUG?
Question 5. In the next line below, I found the sequence ACTTAC, which would indicate a stop codon with another start directly after it. Is everything between a start and a stop a particular gene in the chain? Is that the promoter of the sequence?
Question 6. If AUG is the start codon, and it contains uracil, do start and stop codons exist only in RNA?
You have a sequence! Now what?
I don’t know, this is where it falls apart for me… WAH Wah wah… GAME OVER… INSERT ANOTHER TOKEN!!! I’m failing to grasp the connection between the Orange Carotenoid Protein (OCP), which I chose to explore, and how it relates to the host organism, E. coli, in the codon optimization stage of the homework.
Question 7. Am I trying to cut and paste the OCP into the E. coli genome using the Benchling software? Or am I looking for the expression genes within the OCP sequence by identifying the starts and stops?
Question 8. How do I determine where start and stop codons are within the sequence? I know methionine (AUG) is the start and there are three stop codons: UAA, UAG, and UGA. Do I start reading in sets of three from the beginning of the optimized sequence looking for TAC, which would be on the DNA side of AUG?
Annotation of Sequence (I think I need a tutor!)

Prepare a Twist DNA Synthesis Order
I was able to create both Benchling and Twist accounts with no problem. I imported the optimized sequence above as a new sequence, which marked all the restriction enzyme sites, but I was not able to distinguish the promoter region, etc., for annotation.
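Question 9. How do I determine the promoter, RBS, etc.?
- RBS stands for Ribosome Binding Site. It is a short sequence of nucleotides (not codons) upstream of the start codon (AUG, methionine) in mRNA, and has 5-7 nucleotides rich in A and G.
- On the coding strand of DNA the start codon is ATG; TAC is its complement on the template strand. The reading frame that begins at ATG ends at the first in-frame stop codon: TAA, TAG, or TGA (ATT, ATC, or ACT on the template strand).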
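Reading in sets of three from a start codon can be sketched in a few lines. This assumes the sequence is the coding strand (the one that reads like the mRNA, with T in place of U), so the start is ATG and the stops are TAA, TAG, and TGA. The toy sequence and the function names `find_orf` and `complement` are made up for illustration; Benchling’s own ORF detection may work differently.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # coding-strand equivalents of UAA, UAG, UGA


def complement(dna):
    """Base-by-base complement, i.e. what sits opposite on the template strand."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))


def find_orf(dna):
    """Return (start, end) of the first ATG-to-stop reading frame, or None.

    start is the index of the A in ATG; end is just past the stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    # Step through the sequence three bases at a time, staying in frame with ATG.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


print(complement("ATG"))              # TAC: the template-strand partner of the start codon

toy = "GGAGGTATGCCGTATTAAGC"           # hypothetical: A/G-rich stretch, then ATG ... TAA
print(find_orf(toy))                  # (6, 18)
print(toy[6:18])                      # ATGCCGTATTAA

# First 30 bases of the optimized OCP sequence: starts with ATG, no stop in frame.
print(find_orf("ATGCCGTATACCATCGAGTCTGCACGTAGC"))  # None
```

The `None` for the optimized OCP fragment is expected: the tool returned codons only for the amino acids in the protein sequence, so a stop codon is not part of that output.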
PDF of my Benchling file (produced from the most likely codon sequence)
Question 10. Should I have fed the optimization results from the consensus codons into Benchling?
DNA Read
My primary goal with HTGAA is to build my understanding of cyanobacteria, not only as the building block of my artwork, but also as a cellular organism which forms the foundational base of the ecological food chain. I cannot even begin to think or dream of how, what, or why I would want to alter the cellular systems of cyanobacteria at this point, but as my knowledge of these organisms develops I know my perspective will change.
DNA Write
DNA Edit
References
- Image of Chroococcidiopsis: Villanueva C, Hasler P, Dvorak P, Poulíčková A, Casamatta D. Brasilonema lichenoides sp. nov. and Chroococcidiopsis lichenoides sp. nov. (Cyanobacteria): Two novel cyanobacterial constituents isolated from a tripartite lichen of headstones. Journal of Phycology. 2018;54. doi: 10.1111/jpy.12621.
- Sound JK, Bellamy-Carter J, Leney AC. The increasing role of structural proteomics in cyanobacteria. Essays Biochem. 2023 Mar 29;67(2):269-282. doi: 10.1042/EBC20220095. PMID: 36503929; PMCID: PMC10070481.