Week 2 HW: DNA read, write and edit

DNA READ

What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would like to sequence the genomic DNA of some strains of Antarctic bacteria that I study in my PhD. I am interested in them for the characterization of new species, enzyme discovery and the study of extremozyme evolution.

In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

For the Antarctic strains I would use nanopore sequencing, as it generates long DNA reads, a must-have for accurate de novo assembly of the genomes of novel species. As a secondary technique I would choose Illumina or Ion Torrent sequencing, or nanopore sequencing of short DNA fragments, to collect short reads (preferably to fill the gaps in the long reads).

Also answer the following questions: 2a. Is your method first-, second- or third-generation or other? How so?

Nanopore sequencing is a third-generation sequencing technology. It was developed commercially relatively recently. It is regarded as third-generation because of its real-time, label-free (depending on the templates and applications used) and amplification-free sequencing approach.

2b. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

My initial input (template) is long, high-molecular-weight genomic DNA (HMW gDNA) isolated from a given bacterial strain, either with dedicated isolation kits for long gDNA fragments or by the phenol:chloroform method. Depending on the input quality and quantity, fragmentation of the DNA template might be required. Next, the adapters need to be ligated; without them the DNA molecule will not be guided to the sequencing pore. After adapter ligation, a clean-up step removes impurities and unattached adapters.

2c. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)

The essential steps of nanopore sequencing are 1) sample preparation (gDNA isolation and quantification), 2) library preparation (fragmentation, adapter ligation, clean-up), 3) sequencing (loading the flow cell, real-time data acquisition, basecalling) and 4) downstream data analysis. In nanopore sequencing, basecalling is the translation of raw electrical signals into a nucleotide sequence: as nucleotides pass through the nanopore, they cause distinctive changes in the ionic current.

2d. What is the output of your chosen sequencing technology?

In the case of Oxford Nanopore sequencing, the output is raw electrical signal data (POD5/FAST5 files), which is basecalled in real time into long-read DNA/RNA sequences, typically stored as FASTQ files.
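To illustrate what can be done with the FASTQ output, here is a minimal sketch that collects read lengths and computes the read N50, a common summary of long-read runs. It assumes plain four-line FASTQ records; the function names `read_lengths` and `n50` and the three example reads are hypothetical and only serve the illustration.

```python
import io


def read_lengths(handle):
    """Yield the sequence length of every record in a four-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header line starting with '@'")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string (ignored here)
        yield len(sequence)


def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads at all


# Made-up example data standing in for a real basecalled file.
EXAMPLE_FASTQ = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
    "@read3\nACGTAC\n+\nIIIIII\n"
)

lengths = list(read_lengths(io.StringIO(EXAMPLE_FASTQ)))
print("read lengths:", lengths)
print("read N50:", n50(lengths))
```

For a real run, the `io.StringIO` object would be replaced by an opened FASTQ file; multi-line FASTQ variants would need a proper parser.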

DNA WRITE

What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I would like to synthesize genes encoding various antifreeze proteins (AFPs) of the Antarctic yeast Glaciozyma martinii 186, fused with GFP-related proteins (various colors). The genome of G. martinii contains at least 9 antifreeze protein genes. Creating fusion proteins would allow me to answer some questions related to their synthesis, such as 1) are the fusion AFPs still functional, and 2) are the fusion proteins synthesized at different rates and in groups, or rather individually under specific conditions? (The sequences will be known soon.)

What technology or technologies would you use to perform this DNA synthesis and why?

The genes of GFP and GFP-related proteins have an average length of 717 to 730 bp, and an average AFP gene from G. martinii has a length of 750 to 1260 bp (250-420 amino acids). If the two are synthesized as a single fusion gene, the total length of the sequence would reach almost 2000 bp. As we know, standard solid-phase oligonucleotide synthesis would not be sufficient for a sequence of this length. Therefore I suggest using Twist gene synthesis.
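The length estimate can be checked with a few lines of arithmetic. This is only a sketch: the optional `linker_bp` parameter is an assumption added for illustration (no linker is specified in the design yet), and the function names are hypothetical.

```python
# Length ranges in base pairs, taken from the paragraph above.
GFP_BP = (717, 730)
AFP_AA = (250, 420)  # AFP length range in amino acids


def aa_to_bp(aa_count):
    """Coding length in bp for a given number of amino acids (3 bp per codon)."""
    return aa_count * 3


def fusion_length_range(first, second, linker_bp=0):
    """Return (min, max) length of two genes joined into one fusion gene."""
    shortest = first[0] + second[0] + linker_bp
    longest = first[1] + second[1] + linker_bp
    return shortest, longest


AFP_BP = (aa_to_bp(AFP_AA[0]), aa_to_bp(AFP_AA[1]))
print("AFP gene length range (bp):", AFP_BP)
print("fusion gene length range (bp):", fusion_length_range(GFP_BP, AFP_BP))
```

Without a linker, the fusion genes span 1467 to 1990 bp, which matches the "almost 2000 bp" estimate for the longest construct.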

2a. What are the essential steps of your chosen synthesis methods?

The essential steps are 1) synthesis of oligos on a silicon chip, 2) assembly/annealing: the synthesized oligos are pooled and annealed to form larger, double-stranded DNA fragments (up to 5 kb), 3) enzymatic error correction and 4) PCR amplification of the correct gene.

2b. What are the limitations of your synthesis method (if any) in terms of speed, accuracy, scalability?

The main limitations of the method concern sequence complexity: a high GC content of the desired sequence, long homopolymer regions and many repeats all make a sequence harder to synthesize.
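Before ordering, a candidate sequence could be screened for two of these features with a short script. The sketch below is only an illustration: the thresholds `max_gc` and `max_run` are placeholder values chosen for the example, not the vendor's actual acceptance criteria, and the function names are hypothetical. Repeats are not checked here.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    runs = re.findall(r"A+|C+|G+|T+", seq.upper())
    return max((len(run) for run in runs), default=0)


def synthesis_flags(seq, max_gc=0.65, max_run=10):
    """Return a list of warnings; the thresholds are illustrative assumptions."""
    flags = []
    gc = gc_fraction(seq)
    if gc > max_gc:
        flags.append(f"GC content {gc:.0%} is above {max_gc:.0%}")
    run = longest_homopolymer(seq)
    if run > max_run:
        flags.append(f"homopolymer run of {run} bp is longer than {max_run} bp")
    return flags


# Made-up test sequences, not real AFP or GFP genes.
print(synthesis_flags("ATGGCCAAGTTGACCTAA"))
print(synthesis_flags("ATG" + "G" * 12 + "CCGCGGCGCTAA"))
```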

DNA EDIT

1. What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I would like to edit the genome of the Antarctic yeast G. martinii 186 (IBMP TUL collection). The target sequences would be the 9 antifreeze protein genes present in this microorganism. I would like to fuse each of them with a gene encoding a GFP-related protein (a different color for each gene). This could result in cells that produce fluorescent antifreeze proteins, enabling scalable, easier study of potential horizontal gene transfer between species and of the expression conditions of the fusion AFPs.

What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

I would like to perform CRISPR-Cas9-based genome editing. Standard plasmid-based expression systems would be too burdensome for the host organism. For each fusion gene we would need to introduce a different plasmid, each present in many copies in the cell. This could result in the loss of the extra plasmids or the death of the host cells. Moreover, moving a gene encoded in genomic DNA onto a plasmid would make it impossible to study its native expression factors and conditions.

2a. How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 technology edits DNA by using a custom-designed guide RNA (gRNA) to direct the Cas9 enzyme to a specific genomic location. There, Cas9 acts as molecular scissors and creates a double-strand break. The cell then repairs this damage, which allows a new sequence (when a donor template is provided) to become part of the genomic DNA.

2b. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

First of all, I need to identify the specific sequence(s) of interest. Then the guide RNA is designed (a short, site-specific fragment that locates the target site). Next, a suitable vector (a plasmid carrying the gRNA sequence and the Cas9 gene) has to be constructed and delivered into the chosen cells. Delivery of a pre-complexed Cas9-gRNA system is also possible. Delivery (transfection or transformation) can be carried out with various techniques. Then the cell's own molecular machinery repairs the cut fragment. The last stage is validation and screening, which confirms the proper execution and functionality of the introduced changes.
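The guide design step can be sketched in code. The commonly used Cas9 from Streptococcus pyogenes needs an NGG PAM directly 3' of a 20-nt target site, so candidate sites can be listed by scanning both strands for that pattern. This is a simplified sketch that assumes SpCas9; the function names and the example sequence are hypothetical, and real guide design tools also score efficiency and specificity.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_targets(seq, spacer_len=20):
    """List (strand, start, protospacer, pam) for every site followed by an NGG PAM.

    For the '-' strand, start is a position in the reverse-complemented sequence.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # The lookahead makes overlapping candidate sites visible as well.
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Made-up sequence, not a real AFP gene.
example = "ATGCTGACCTTAGCATCGGATCAAGTCGGTTACGATCCAGTAGCTAGG"
for strand, start, protospacer, pam in find_targets(example):
    print(strand, start, protospacer, pam)
```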

2c. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The main problem with CRISPR-Cas9 editing is that Cas9 binding is not perfectly specific, which can introduce changes at unwanted (off-target) sites in the genome. Inserting large fragments of new DNA decreases editing efficiency and, in diploid organisms, can lead to loss of heterozygosity or major genomic rearrangements. If the CRISPR-Cas9 system is introduced on plasmids, it might be challenging to deliver multiple CRISPR plasmids into the cells successfully.
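A rough way to look for potential unwanted binding sites is to scan the genome for sequences that differ from the guide by only a few mismatches. The sketch below is a deliberately naive illustration: it checks the forward strand only, ignores the PAM and mismatch position, and uses a made-up guide and genome; `near_matches` and `max_mismatches` are hypothetical names.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def near_matches(guide, genome, max_mismatches=3):
    """Return (position, site, mismatches) for every window close to the guide."""
    guide = guide.upper()
    genome = genome.upper()
    size = len(guide)
    hits = []
    for start in range(len(genome) - size + 1):
        window = genome[start:start + size]
        distance = hamming(guide, window)
        if distance <= max_mismatches:
            hits.append((start, window, distance))
    return hits


# Toy data: a perfect site at position 2 and a one-mismatch site at position 12.
guide = "ACGTACGT"
genome = "TTACGTACGTTTACGAACGTTT"
for position, site, mismatches in near_matches(guide, genome, max_mismatches=1):
    print(position, site, mismatches)
```

A guide with many near matches elsewhere in the genome would be a poor choice; dedicated off-target prediction tools should be used for real designs.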