Week 2 HW: DNA, READ, WRITE AND EDIT!

Geeking out over protein structures and data banks, DNA storage in plants, clouds and decoding DNA into sound
I love that artist Antoine Bertin has decoded the RNA of SARS-CoV-2 into this track! Check it out.
This is the RNA of the Coronavirus translated into sound (viruses are made of RNA, not exactly DNA). Each nucleotide of the RNA (A, U, G or C) is transformed into a note so the virus sequence can be heard. The tempo of the track follows the rhythm at which the epidemic is growing (exponential curve) and how this curve flattens if we all stay home :) I wanted to create a track that can help with relaxation in times of isolation, and meditate on the fact all life on earth, including viruses, is made of the same material. We (humans, animals, trees, bacteria, viruses) are the continuation of the same common ancestor. Anyway, I hope this will help everyone explore in their own sonic way what we are going through! Here is an extract of the RNA sequence :)
Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, complete genome (NC_045512.2)
auuaaagguuuauaccuucccagguaacaaaccaaccaacuuucgaucucuuguagaucuguucucuaaacgaacuuua aaaucuguguggcugucacucggcugcaugcuuagugcacucacgcaguauaauuaauaacuaauuacugucguugaca ggacacgaguaacucgucuaucuucugcaggcugcuuacgguuucguccguguugcagccgaucaucagcacaucuagg uuucguccgggugugaccgaaagguaagauggagagccuugucccugguuucaacgagaaaacacacguccaacucagu uugccuguuuuacagguucgcgacgugcucguacguggcuuuggagacuccguggaggaggucuuaucagaggcacguc aacaucuuaaagauggcacuuguggcuuaguagaaguugaaaaaggcguuuugccucaacuugaacagcccuauguguu caucaaacguucggaugcucgaacugcaccucauggucauguuaugguugagcugguagcagaacucgaaggcauucag uacggucguaguggugagacacuugguguccuugucccucaugugggcgaaauaccaguggcuuaccgcaagguucuuc uucguaagaacgguaauaaaggagcugguggccauaguuacggcgccgaucuaaagucauuugacuuaggcgacgagcu uggcacugauccuuaugaagauuuucaagaaaacuggaacacuaaacauagcagugguguuacccgugaacucaugcgu gagcuuaacggaggggcauacacucgcuaugucgauaacaacuucuguggcccugauggcuacccucuugagugcauua aagaccuucuagcacgugcugguaaagcuucaugcacuuuguccgaacaacuggacuuuauugacacuaagaggggugu auacugcugccgugaacaugagcaugaaauugcuugguacacggaacguucugaaaagagcuaugaauugcagacaccu
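The idea of turning each nucleotide into a note can be sketched in a few lines of Python. This is only a toy illustration: the note chosen for each base (`NOTE_FOR_BASE`) is my own assumption, since the text above does not say which notes the artist actually used.

```python
# Hypothetical base-to-note mapping (an assumption, not the artist's real mapping).
NOTE_FOR_BASE = {"a": "A4", "u": "D4", "g": "G4", "c": "C4"}


def rna_to_notes(rna):
    """Turn an RNA string into a list of note names, skipping spaces and line breaks."""
    notes = []
    for base in rna.lower():
        if base.isspace():
            continue  # the pasted sequence contains spaces between blocks
        if base not in NOTE_FOR_BASE:
            raise ValueError(f"unexpected character in RNA: {base!r}")
        notes.append(NOTE_FOR_BASE[base])
    return notes


if __name__ == "__main__":
    # First ten bases of the extract above.
    print(rna_to_notes("auuaaagguu"))
```

The list of note names could then be handed to any music tool; tempo and rhythm are left out of this sketch.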
I wanna read, write and edit DNA!!!
I had Twisted Sister in my mind while I was saying this, particularly I WANNA ROCK.

The 2nd week has again been packed with new information, but I cannot wait to read, write and edit DNA as this is totally new to me. In the past year I have had some health issues and have been to every single doctor, and what is left is to get a DNA test to check the HLA genes on chromosome 6. The human leukocyte antigen (HLA) system is a complex of genes on chromosome 6 in humans that encode cell-surface proteins responsible for regulation of the immune system. Wish me luck!
Week 2- DNA Read, Write, & Edit HW
This week explores the read–write–edit toolkit: sequencing and synthesis workflows, restriction digests and gel electrophoresis, and early genome-editing frameworks.
Make sure to document every step of the in-silico and lab experiments. Make sketches, screenshots, notes, drawings… anything that helps you - and others - understand the experiment.
Part 0: Basics of Gel Electrophoresis
Gel electrophoresis separates DNA fragments based on size using:
- Negatively charged DNA backbone
- Electric field
- Agarose matrix
- Size-dependent migration
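To get a feeling for size-dependent migration, here is a minimal Python sketch. It assumes the common rule of thumb that migration distance falls off roughly linearly with the log of the fragment size. The gel length and the size limits (`gel_length_mm`, `min_bp`, `max_bp`) are made-up illustration values, and a real gel also depends on agarose percentage, voltage and run time.

```python
import math


def migration_mm(size_bp, gel_length_mm=80.0, min_bp=100, max_bp=50000):
    """Estimate how far a fragment travels from the well (toy log-linear model).

    Small fragments travel far, large fragments stay near the well.
    All default values are illustrative assumptions.
    """
    if size_bp <= 0:
        raise ValueError("fragment size must be positive")
    # Clamp to the range the toy model covers.
    clamped = min(max(size_bp, min_bp), max_bp)
    span = math.log10(max_bp) - math.log10(min_bp)
    fraction = (math.log10(max_bp) - math.log10(clamped)) / span
    return gel_length_mm * fraction


if __name__ == "__main__":
    for size in (500, 2000, 10000, 48502):
        print(f"{size:>6} bp -> {migration_mm(size):5.1f} mm")
```

The DNA moves away from the wells because its backbone is negatively charged, so the distance here is measured from the well toward the positive electrode.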
I attended or watched all the lecture and recitation videos. The only one I missed live was last Thursday's first meetup with the Tokyo Bioclub node: I was setting up an exhibition and, with the time difference, I did not see the email in time. I watched the recording though :)
How does gel electrophoresis work?!


…and what does it look like?

I have known for a while what it looks like, but I never really looked into it properly. I have been working with agar for a while now, making biomaterials for textiles and edible materials. I have also worked with other polymers, such as different kinds of alginate, gelatin and starch.
Part 1: Benchling & In-silico Gel Art
See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details.

•Overview:
- Make a free account at benchling.com
It was super easy! I logged in with my google account.
- Import the Lambda DNA

This is what the DNA sequence looks like in FASTA sequence format! I saved it as a plain file because that was the only option available on the neb.com website: I right-clicked and saved it. Let’s see if we can import it like this in Benchling!
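A FASTA file is just a header line starting with ">" followed by lines of sequence. Here is a small Python sketch of how such a file could be read. The function name `parse_fasta` and the tiny example record are my own illustration, not anything Benchling or NEB provide.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}.

    Header lines start with '>'; sequence lines are joined and uppercased.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = ">toy_record made-up example\ngggcggcg\nacctcgcg\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), "bp")
```

For a saved file, something like `parse_fasta(open("lambda.fasta").read())` would work, where the file name is whatever you saved it as.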

Importing the lambda DNA sequence in benchling
First I created a new project on benchling named ‘htgaa week 2 - MARISA SATSIA’.

Then I imported the DNA!


Then I clicked on open sequence and VOILA!

- Simulate Restriction Enzyme
You might wonder what a restriction enzyme is, right?!


- Simulate Restriction Enzyme Digestion with the following Enzymes:
EcoRI
HindIII
BamHI
KpnI
EcoRV
SacI
SalI
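The same kind of digest can be roughed out in Python. This is a minimal sketch, not how Benchling does it: it treats the DNA as linear (Lambda DNA is linear), looks only for exact recognition sites on the top strand (fine here because all seven sites are palindromic), and ignores methylation and star activity. The `ENZYMES` table holds each enzyme's recognition site and the position of its top-strand cut within the site.

```python
# Recognition site and top-strand cut offset within the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def cut_positions(seq, enzyme):
    """Return 0-based top-strand cut positions of one enzyme in seq."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzymes):
    """Return fragment lengths (in order along the molecule) for a linear digest."""
    cuts = sorted({p for name in enzymes for p in cut_positions(seq, name)})
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


if __name__ == "__main__":
    toy = "CCGGATCCAAGATATCGG"  # made-up sequence with one BamHI and one EcoRV site
    print(digest(toy, ["BamHI", "EcoRV"]))
```

Sorting the fragment lengths from large to small gives the order of the bands from the well downward in one lane.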
Here is the enzyme digest simulation with all the enzymes!


Here is the ladder simulation

- Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. You might find Ronan’s website a helpful tool for quickly iterating on designs!
I made this using Ronan’s website. I think it is pretty cool to simulate this whole process and have a visual because I do not know when I am actually gonna do the lab!

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis
Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.
Unfortunately I cannot do that here in Cyprus, but I am actively looking for a lab to let me practice a bit.

Part 3: DNA Design

Part 3.1. Choose your protein
In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
[Example from our group homework, you may notice the particular format — The example below came from UniProt]
>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL
EAVIRTVTTLQQLLT
I chose the HPV capsid proteins L1 (HPV16-L1) and L2 (HPV16-L2). The HPV genome is surrounded by an icosahedral capsid consisting of two structural proteins: the major capsid protein L1 and the minor capsid protein L2. The L1 proteins are highly conserved and assemble into 72 pentameric capsomers. The L2 protein binds viral DNA. Unfortunately there are multiple types of HPV and each affects us differently: some types cause cervical cancer and some cause warts. There is a vaccine based on L1 virus-like particles, which I got when it first came out in 2007 or 2008 or 2009, when I was 18 or 19; I do not remember exactly.
L1 Protein Lengths by HPV Type
The L1 gene encodes the major capsid protein of the Human Papillomavirus (HPV), which spontaneously self-assembles into virus-like particles (VLPs).
Because HPV has over 100 different genotypes, the exact sequence length varies slightly:
- HPV 16: 505 amino acids (UniProt ID: P03101)
- HPV 18: 568 amino acids (UniProt ID: T2A5K9)
- HPV 51: 504 amino acids (UniProt ID: P26536)
This is what AI mode in Google mentioned!

Below is the FASTA sequence for the L1 Major Capsid Protein of HPV Type 16, the strain responsible for approximately 50% of all cervical cancer cases worldwide. HPV 16 L1 Protein Sequence (UniProt P03101). This protein is 505 amino acids long and is the primary antigen used in HPV vaccines like Gardasil.
L1 SEQUENCE
>sp|P03101|VL1_HPV16 Major capsid protein L1 OS=Human papillomavirus type 16 OX=333760 GN=L1 PE=1 SV=1
MSLWLPSEATVYLPPVPVSKVVSTDEYVARTNIYYHAGTSRLLAVGHPYFPIKKPNNNKI LVPKVSGLQYRVFRIHLPDPNKFGFPDTSFYNPDTQRLVWACVGVEVGRGQPLGVGISGH PLLNKLDDTENASAYAANAGVDNRECISMDYKQTQLCLIGCKPPIGEHWGKGSPCTNVAV NPGDCPPLELINTVIQDGDMVHTGFGAMDFTTLQANKSEVPLDICTSICKYPDYIKMVSE PYGDSLFFYLRREQMFVRHLFNRAGAVGENVPDDLYIKGSGSTATLANNYYPTPSGSMVT SDAQIFNKPYWLQRAQGHNNGICWGNQLFVTVVDTTRSTNMSLCAAISTSETTYKNTNFK EYLRHGEEYDLQFIFQLCKITLTADVMTYIHSMNSTILEDWNFGLQPPPGGTLEDTYRFV TSQAIACQKHTPPAPKEDDPLKKYTFWEVNLKEKFSADLDQFPLGRKFLLQAGLKAKPKF TLGKRKATPTTSSTSTTAKRKKRKL
I also got this pentamer structure of the major capsid protein L1 of Human Papillomavirus type 11 from the 3D viewer on the RCSB PDB. I love the 3D visualisation tool and the fact that you can isolate things, make animations and download 3D models.

Part 3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence
The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
Example: Get to the original sequence of phage MS2 L-protein from its genome. The LYSIS protein DNA sequence below-
atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa
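Reverse translation can be sketched in Python too. This is a minimal sketch under one loud assumption: because several codons code for the same amino acid, the original DNA cannot be recovered from the protein alone, so `reverse_translate` simply picks the first codon listed for each amino acid in the standard genetic code. A real tool would pick codons based on the host organism instead. The `translate` function is there to check the round trip.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Arbitrary choice (an assumption): the first codon in table order per amino acid.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def reverse_translate(protein):
    """Back-translate a protein into one possible DNA sequence plus a stop codon."""
    return "".join(FIRST_CODON[aa] for aa in protein.upper()) + FIRST_CODON["*"]


if __name__ == "__main__":
    # Start of the MS2 lysis gene shown above.
    print(translate("atggaaacccgattccct"))
    print(reverse_translate("METRFP"))
```

Note that the back-translated DNA is different from the natural gene even though both translate to the same protein. That is exactly why the sequence taken from the genome is the "original" one.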
This is the FASTA sequence of phage MS2 DNA genome on the website:

For the HPV 16 L1 protein sequence (UniProt P03101), the reverse-translated sequence iiiiisssss:
NC_001526.4:5560-7077 Human papillomavirus type 16 (HPV16), L1 major capsid protein ATGAGCCTGTGGCTGCCCAGCGAGGCCACCGTGTACCTGCCTCCCGTGCCCGTGTCCAAG GTGGTGAGCACCGACGAGTACGTGGCCCGGACCAACATCTACTACCACGCCGGCACCAGC CGCCTGCTGGCCGTGGGCCACCCCTACTTCCCCATCAAGAAGCCCAACAACAACAAGATC CTGGTGCCCAAGGTGAGCGGCCTGCAGTACCGGGTGTTCCGGATCCACCTGCCCGACCCC AACAAGTTCGGCTTCCCCGACACCAGCTTCTACAACCCCGACACCCAGCGGCTGGTGTGG GCCTGCGTGGGCGTGGAGGTGGGCCGGGGCCAGCCCCTGGGCGTGGGCATCAGCGGCCAC CCCCTGCTGAACAAGCTGGACGACACCGAGAACGCCAGCGCCTACGCCGCCAACGCCGGC GTGGACAACCGGGAGTGCATCAGCATGGACTACAAGCAGACCCAGCTGTGCCTGATCGGC TGCAAGCCCCCCATCGGCGAGCACTGGGGCAAGGGCAGCCCCTGCACCAACGTGGCCGTG AACCCCGGCGACTGCCCCCCACTGGAGCTGATCAACACCGTGATCCAGGACGGCGACATG GTGCACACCGGCTTCGGCGCCATGGACTTCACCACCCTGCAGGCCAACAAGAGCGAGGTG CCCCTGGACATCTGCACCAGCATCTGCAAGTACCCCGACTACATCAAGATGGTGAGCGAG CCCTACGGCGACAGCCTGTTCTTCTACCTGCGGCGGGAGCAGATGTTCGTGCGGCACCTG TTCAACCGGGCCGGCGCCGTGGGCGAGAACGTGCCCGACGACCTGTACATCAAGGGCAGC GGCAGCACCGCCACCCTGGCCAACAACTACTACCCCACCCCCAGCGGCAGCATGGTGACC AGCGACGCCCAGATCTTCAACAAGCCCTACTGGCTGCAGCGGGCCCAGGGCCACAACAAC GGCATCTGCTGGGGCAACCAGCTGTTCGTGACCGTGGTGGACACCACCCGGAGCACCAAC ATGAGCCTGTGCGCCGCCATCAGCACCAGCGAGACCACCTACAAGAACACCAACTTCAAG GAGTACCTGCGGCACGGCGAGGAGTACGACCTGCAGTTCATCTTCCAGCTGTGCAAGATC ACCCTGACCGCCGACGTGATGACCTACATCCACAGCATGAACAGCACCATCCTGGAGGAC TGGAACTTCGGCCTGCAGCCCCCCCCCGGCGGCACCCTGGAGGACACCTACCGGTTCGTG ACCAGCCAGGCCATCGCCTGCCAGAAGCACACCCCCCCCGCCCCCAAGGAGGACGACCCC CTGAAGAAGTACACCTTCTGGGAGGTGAACCTGAAGGAGAAGTTCAGCGCCGACCTGGAC CAGTTCCCCCTGGGCCGGAAGTTCCTGCTGCAGGCCGGCCTGAAGGCCAAGCCCAAGTTC ACCCTGGGCAAGCGGAAGGCCACCCCCACCACCAGCAGCACCAGCACCACCGCCAAGCGG AAGAAGCGGAAGCTGTAA
Official Reference Information
Database: NCBI GenBank / RefSeq
Accession Number: NC_001526.4
Locus Tag: HPV16gp6 (L1)
Coordinates: 5560 to 7077 (1518 base pairs)
Function: Major capsid protein; self-assembles into virus-like particles (VLPs) used in vaccines.
Part 3.3. Codon optimization
Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?
Example from the Codon Optimization Tool | Twist Bioscience, avoiding the Type IIS enzyme recognition sites BsaI, BsmBI, and BbsI.
Lysis protein DNA sequence with Codon-Optimization
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA
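Whether a sequence really avoids those three Type IIS sites can be checked with a short Python sketch. It assumes the usual recognition sequences (BsaI GGTCTC, BsmBI CGTCTC, BbsI GAAGAC) and scans both strands, because these sites are not palindromic. The function names are my own and this is not the Twist tool's code.

```python
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_type_iis_sites(seq):
    """Return (enzyme, strand, 0-based start) for every site found on either strand."""
    seq = seq.upper()
    hits = []
    for name, site in TYPE_IIS_SITES.items():
        for pattern, strand in ((site, "+"), (reverse_complement(site), "-")):
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    print(find_type_iis_sites("AAGGTCTCAA"))  # made-up sequence with one BsaI site
```

An empty list means none of the three sites were found, which is what you want before sending a sequence off for cloning workflows that rely on these enzymes.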
For the HPV 16 L1 protein DNA sequence with codon-optimization
According to AI the preferred codon optimization tool for HPV16 and HPV18, particularly for designing vaccines, is the Java Codon Adaptation Tool (JCat). JCat is used to adapt the codon usage of the HPV genes to the host organism (e.g., E. coli or humans) to improve protein expression.
GC-Content of Homo sapiens: 40.892862223204
Translation: ATGAGCCTGTGGCTGCCCAGCGAGGCCACCGTGTACCTGCCTCCCGTGCC 50 CGTGTCCAAGGTGGTGAGCACCGACGAGTACGTGGCCCGGACCAACATCT 100 ACTACCACGCCGGCACCAGCCGCCTGCTGGCCGTGGGCCACCCCTACTTC 150 CCCATCAAGAAGCCCAACAACAACAAGATCCTGGTGCCCAAGGTGAGCGG 200 CCTGCAGTACCGGGTGTTCCGGATCCACCTGCCCGACCCCAACAAGTTCG 250 GCTTCCCCGACACCAGCTTCTACAACCCCGACACCCAGCGGCTGGTGTGG 300 GCCTGCGTGGGCGTGGAGGTGGGCCGGGGCCAGCCCCTGGGCGTGGGCAT 350 CAGCGGCCACCCCCTGCTGAACAAGCTGGACGACACCGAGAACGCCAGCG 400 CCTACGCCGCCAACGCCGGCGTGGACAACCGGGAGTGCATCAGCATGGAC 450 TACAAGCAGACCCAGCTGTGCCTGATCGGCTGCAAGCCCCCCATCGGCGA 500 GCACTGGGGCAAGGGCAGCCCCTGCACCAACGTGGCCGTGAACCCCGGCG 550 ACTGCCCCCCACTGGAGCTGATCAACACCGTGATCCAGGACGGCGACATG 600 GTGCACACCGGCTTCGGCGCCATGGACTTCACCACCCTGCAGGCCAACAA 650 GAGCGAGGTGCCCCTGGACATCTGCACCAGCATCTGCAAGTACCCCGACT 700 ACATCAAGATGGTGAGCGAGCCCTACGGCGACAGCCTGTTCTTCTACCTG 750 CGGCGGGAGCAGATGTTCGTGCGGCACCTGTTCAACCGGGCCGGCGCCGT 800 GGGCGAGAACGTGCCCGACGACCTGTACATCAAGGGCAGCGGCAGCACCG 850 CCACCCTGGCCAACAACTACTACCCCACCCCCAGCGGCAGCATGGTGACC 900 AGCGACGCCCAGATCTTCAACAAGCCCTACTGGCTGCAGCGGGCCCAGGG 950 CCACAACAACGGCATCTGCTGGGGCAACCAGCTGTTCGTGACCGTGGTGG 1000 ACACCACCCGGAGCACCAACATGAGCCTGTGCGCCGCCATCAGCACCAGC 1050 GAGACCACCTACAAGAACACCAACTTCAAGGAGTACCTGCGGCACGGCGA 1100 GGAGTACGACCTGCAGTTCATCTTCCAGCTGTGCAAGATCACCCTGACCG 1150 CCGACGTGATGACCTACATCCACAGCATGAACAGCACCATCCTGGAGGAC 1200 TGGAACTTCGGCCTGCAGCCCCCCCCCGGCGGCACCCTGGAGGACACCTA 1250 CCGGTTCGTGACCAGCCAGGCCATCGCCTGCCAGAAGCACACCCCCCCCG 1300 CCCCCAAGGAGGACGACCCCCTGAAGAAGTACACCTTCTGGGAGGTGAAC 1350 CTGAAGGAGAAGTTCAGCGCCGACCTGGACCAGTTCCCCCTGGGCCGGAA 1400 GTTCCTGCTGCAGGCCGGCCTGAAGGCCAAGCCCAAGTTCACCCTGGGCA 1450 AGCGGAAGGCCACCCCCACCACCAGCAGCACCAGCACCACCGCCAAGCGG 1500 AAGAAGCGGAAGCTGTAA
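The tool output above reports the GC content of the host. The GC content and codon usage of the optimized gene itself can be counted with a few lines of Python. This is a minimal sketch with my own function names; it drops anything that is not a base letter, so the position numbers in the pasted output are skipped automatically.

```python
from collections import Counter


def gc_content(seq):
    """Return GC content in percent, ignoring spaces, digits and other non-base characters."""
    bases = "".join(ch for ch in seq.upper() if ch in "ACGTU")
    if not bases:
        raise ValueError("no nucleotides found")
    return 100.0 * sum(bases.count(b) for b in "GC") / len(bases)


def codon_counts(seq):
    """Count codons in reading frame 1, ignoring non-base characters."""
    bases = "".join(ch for ch in seq.upper() if ch in "ACGT")
    usable = len(bases) - len(bases) % 3
    return Counter(bases[i:i + 3] for i in range(0, usable, 3))


if __name__ == "__main__":
    snippet = "ATGAGCCTGTGGCTGCCCAGCGAGGCCACC 30"  # start of the sequence above
    print(round(gc_content(snippet), 1), "% GC")
    print(codon_counts(snippet).most_common(3))
```

Comparing the gene's GC content with the host value gives a quick sense of how far the optimized sequence sits from the host average.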

In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?
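I optimized for humans (Homo sapiens), with vaccine development in mind. Different organisms prefer different synonymous codons, so rewriting the gene with the codons the host uses most helps the host's ribosomes translate it efficiently without changing the protein.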
Part 3.4. You have a sequence! Now what?
What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.
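I could order the gene through Twist DNA synthesis. Once I have the DNA, a host cell or a cell-free system transcribes it into mRNA, and ribosomes translate the mRNA into the L1 protein.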
Part 3.5. How does it work in nature/biological systems?
Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below. [Example shows the biomolecular flow in central dogma from DNA to RNA to Protein] Special note that all “T” were transcribed into “U” and that each 3-nt codon represents one amino acid.
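The DNA / RNA / protein alignment can be printed with a short Python sketch. It makes two simplifying assumptions: the input is the coding strand read in frame from its first base, and "transcription" is shown as simply swapping T for U (in the cell the polymerase actually reads the opposite template strand). The function names are mine.

```python
from itertools import product

# Standard genetic code written with RNA codons (UUU, UUC, UUA, UUG, UCU, ...).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Write the coding strand as mRNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def align(dna):
    """Return three text lines: DNA codons, RNA codons, and amino acids ('*' = stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    dna_codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    rna_codons = [transcribe(codon) for codon in dna_codons]
    amino = [RNA_CODON_TABLE[codon] for codon in rna_codons]
    return (
        "DNA:     " + " ".join(dna_codons),
        "RNA:     " + " ".join(rna_codons),
        "Protein: " + " ".join(aa.center(3) for aa in amino),
    )


if __name__ == "__main__":
    # First codons of the MS2 lysis gene from Part 3.2, plus its stop codon.
    for line in align("ATGGAAACCTAA"):
        print(line)
```

Each amino acid letter is centered under its codon, so the three rows line up column by column.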
Part 4: Prepare a Twist DNA Synthesis Order
I do need someone to check my HW and tell me if I did everything right, and I will finish this part asap!
Part 5: Read, write, edit!🔮
5.1 DNA Read
(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank)?
I would like to read HPV16 AND HPV18. It is important to me because of personal reasons.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
Several advanced technologies are used to analyze HPV DNA and RNA sequences, ranging from established clinical screening methods to cutting-edge research tools for detecting viral integration. The primary techniques include Next-Generation Sequencing (NGS), PCR-based methods, and molecular hybridization. Here is a breakdown of the technologies used for HPV DNA/RNA sequence analysis:
Next-Generation Sequencing (NGS)
NGS is used for high-throughput, comprehensive genomic analysis, including identifying multiple HPV subtypes, mutations, and integration sites.
- Nanopore Sequencing (Third-Generation): This technology is used for long-read sequencing, allowing for the characterization of complete HPV genomes and the identification of HPV integration into the host genome. It is particularly useful for identifying chimeric cellular–viral reads.
- Illumina Sequencing: Often combined with hybrid capture for high-accuracy sequencing of full HPV genomes.
- HPV-KITE: A specialized algorithm that uses k-mer data analysis for rapid HPV detection from NGS data.
Nucleic Acid Amplification & Detection (DNA/RNA)
- Real-Time PCR (qPCR): The most common method, using primers (e.g., L1, E6/E7) to amplify and quantify HPV DNA. Examples include Cobas HPV and BD Onclarity.
- RT-PCR (Reverse Transcription PCR): Used specifically for detecting mRNA expression of E6 and E7 oncoproteins.
- Transcription-Mediated Amplification (TMA): Used in the Aptima HPV Assay to detect E6/E7 mRNA for high-risk HPV.
- Isothermal Amplification (IATs): Methods like Loop-Mediated Isothermal Amplification (LAMP) and Nucleic Acid Sequence-Based Amplification (NASBA) are used for rapid, isothermal detection without a thermocycler.
- Droplet Digital PCR (ddPCR): Used for absolute quantification of HPV DNA/RNA with high sensitivity.
Signal Amplification & Hybridization
- Hybrid Capture (HC2): A signal amplification method that uses RNA probes to hybridize with HPV DNA, which is then captured and detected via chemiluminescence.
- Invader Technology: A signal amplification method (used in Cervista tests) that uses special enzymes to cleave DNA, creating a fluorescent signal.
- DNA Microarray/Chips: Technologies like Linear Array or PapilloCheck detect multiple HPV types by hybridizing amplified DNA to specific probes.
Summary of Technologies by Goal (Goal: Technology)
- Full Genome/Integration: Nanopore Sequencing, Illumina
- High-Risk DNA Screening: Real-Time PCR (Cobas, Abbott), Hybrid Capture 2 (HC2)
- Active Infection (RNA): RT-PCR, TMA (Aptima), NASBA
- Point-of-Care/Rapid: LAMP, RPA, CRISPR-Cas12a
Also answer the following questions:
- Is your method first-, second- or third-generation or other? How so?
- What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
- What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
- What is the output of your chosen sequencing technology?