Week 1 HW: Principles and Practices

Questions from Professor Jacobson:
- Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
The standard polymerase error rate is about 1 in 1,000,000 (10^-6 per base). The human genome is around 3.2 Gbp = 3.2 x 10^9 bp. Therefore, copying the full human genome “at once” would yield around 3200 errors (mutations), some silent and insignificant, some causing serious developmental and health issues. To keep so many mistakes from passing through, the MutS repair system scans the DNA after replication, identifies mismatched base pairs, and triggers excision of the wrong sequence and re-copying of the fragment.
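The arithmetic behind the ~3200 figure can be written out as a short sketch. The two constants are the rounded values quoted in the answer above, and the function name is only illustrative.

```python
# Figures taken from the answer above; both are rounded textbook values.
GENOME_LENGTH_BP = 3_200_000_000   # human genome, ~3.2 Gbp
BASES_PER_ERROR = 1_000_000        # polymerase error rate of 1 in 10^6


def expected_errors(genome_length_bp: int, bases_per_error: int) -> float:
    """Expected number of miscopied bases in one full pass over the genome."""
    return genome_length_bp / bases_per_error


print(expected_errors(GENOME_LENGTH_BP, BASES_PER_ERROR))
```

This only counts raw polymerase errors; it does not model the mismatch repair step that removes most of them afterwards.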
- How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
The coding sequence of an average human protein is around 1036 bp. Each amino acid is coded by 3 nucleotides, and because the genetic code is degenerate (61 sense codons for 20 amino acids), each amino acid can be written with about 3 synonymous codons on average. So, if a 100 amino acid sequence gives roughly 3^100 options, a 1036 bp protein (around 345 amino acids) gives roughly 3^345 coding options. Not all of those sequences will yield a functional protein of interest, because some rare codons or their grouping can introduce more mistakes than others, act as unintentional stop signals, or slow down the synthesis process [1].
[1] https://medium.com/@anrizal05/one-protein-countless-codes-the-reverse-translation-challenge-computer-scientist-gentle-6e100e225ff8
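To get a feel for how large 3^345 is, the estimate can be sketched in a few lines. The value of 3 codons per amino acid is an assumed average (61 sense codons spread over 20 amino acids), not an exact per-residue count, and the function names are illustrative.

```python
import math

CODONS_PER_AA = 3  # assumption: average degeneracy, 61 sense codons / 20 amino acids


def n_amino_acids(cds_length_bp: int) -> int:
    """Number of whole codons (amino acids) in a coding sequence."""
    return cds_length_bp // 3


def log10_encodings(n_aa: int, codons_per_aa: float = CODONS_PER_AA) -> float:
    """Base-10 exponent of codons_per_aa ** n_aa, i.e. the order of magnitude."""
    return n_aa * math.log10(codons_per_aa)


n_aa = n_amino_acids(1036)
exponent = log10_encodings(n_aa)
print(f"{n_aa} amino acids, roughly 10^{exponent:.0f} possible DNA encodings")
```

A more careful count would multiply the actual number of synonymous codons for each residue (1 for methionine and tryptophan, up to 6 for leucine, serine and arginine), which requires the real protein sequence.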
Questions from Dr. LeProust:
- What’s the most commonly used method for oligo synthesis currently?
The most common method for oligo synthesis is currently solid-phase phosphoramidite chemistry, in which the first nucleoside is attached to a solid support and the chain is grown by adding nucleoside phosphoramidites in cycles of deprotection (taking off the protecting groups), coupling, capping, and oxidation.
- Why is it difficult to make oligos longer than 200nt via direct synthesis?
The difficulty of making oligonucleotides longer than 200 nt by direct synthesis comes from coupling efficiency, which limits the yield more and more as the desired product gets longer. For a 20-mer, a coupling efficiency of 98% yields about 68% full-length product (0.98^19), and the fraction keeps falling with every added base. Therefore, high coupling efficiency has to be preserved throughout the process. This is particularly challenging due to the presence of water, which can lower the effective concentrations of the needed reagents (nucleotide derivatives, activators, etc.). During the synthesis of longer fragments there is also a higher probability of errors and side reactions. Moreover, longer oligos are more difficult to purify [1].
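The compounding effect of coupling efficiency can be sketched as follows. The model assumes the same efficiency at every step and one coupling per added base (so n - 1 couplings for an n-mer), which is a simplification; the 99.5% value is just an illustrative comparison point.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains reaching full length after (length_nt - 1) couplings."""
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 100, 200):
    for efficiency in (0.98, 0.995):
        fraction = full_length_yield(length, efficiency)
        print(f"{length:>4} nt at {efficiency:.1%} coupling: {fraction:.1%} full length")
```

The 20-mer at 98% reproduces the roughly 68% figure quoted above, while a 200-mer at the same efficiency leaves only a few percent of full-length product.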
- Why can’t you make a 2000bp gene via direct oligo synthesis?
Direct oligo synthesis of a 2000 bp gene is not feasible for several reasons, including those listed in the answer to Question 2. The accumulation of errors, the decreasing coupling efficiency, and imperfect capping (termination of unreacted chains) prevent practical use of direct synthesis for such long oligos.
[1] https://www.glenresearch.com/reports/gr21-211
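One way to see why 2000 nt is out of reach is to invert the yield calculation and ask what per-step coupling efficiency a given full-length yield would require. This is a rough sketch under the same simplifying assumption of constant efficiency at every step; the 50% target is an arbitrary illustrative choice.

```python
def required_coupling_efficiency(length_nt: int, target_yield: float) -> float:
    """Per-step efficiency needed so that target_yield of chains reach full length."""
    return target_yield ** (1.0 / (length_nt - 1))


for length in (200, 2000):
    needed = required_coupling_efficiency(length, 0.5)
    print(f"{length} nt, 50% full length: needs {needed:.4%} per coupling")
```

For 2000 nt the required efficiency is above 99.96% per step, and this is before counting base errors within the chains that do reach full length.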
Question from George Church:
- What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The 10 essential amino acids (that animals can’t synthesize themselves) for all animals are phenylalanine, valine, threonine, tryptophan, isoleucine, methionine, histidine, arginine (though in some cases it can be synthesized by animals), leucine and lysine. The “Lysine Contingency” plan, introduced in the Jurassic Park movies as a way of ensuring that none of the dinosaurs would be able to survive outside of the park because they lack the ability to synthesize the amino acid lysine, was a completely flawed concept. Animals, including dinosaurs and humans, cannot synthesize lysine even without any molecular-biology-based intervention. We need to consume it in our diets (microbes, plants, other animals), and those sources can be found anywhere, not just on an island with monitoring and a lysine supplement facility.
Week 2 HW: DNA read, write and edit
DNA READ
- What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).
I would like to sequence the genomic DNA of some strains of Antarctic bacteria I study in my PhD. I’m interested in them for new species characterization, enzyme discovery and the study of extremozyme evolution.
- In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
For the Antarctic strains I would like to use nanopore sequencing, as it generates long DNA reads, a must-have for accurate de novo assembly of novel species. I would also choose Illumina, Ion Torrent or nanopore sequencing (with short DNA fragments) as a secondary sequencing technique for collecting short reads (preferably to fill the gaps in the long reads).
Also answer the following questions:
2a. Is your method first-, second- or third-generation or other? How so?
Nanopore sequencing is third-generation sequencing. It was commercialized relatively recently. It is regarded as third-generation because it is a real-time, label-free (depending on the templates and applications used) and amplification-free sequencing approach.
2b. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
My initial input (template) is long genomic DNA (HMW gDNA) isolated from a given bacterial strain (either with special isolation kits for long gDNA fragments or with the phenol:chloroform method). Depending on the input quality and quantity, fragmentation of the DNA template might be required. Next, we need to ligate the adapters (without which the DNA molecule will not be led to the sequencing pore). After adapter ligation, a clean-up stage gets rid of impurities and unattached adapters.
2c. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)
The essential steps of nanopore sequencing are 1) sample preparation (gDNA isolation and quantification), 2) library prep (fragmentation, ligation of adapters, clean-up), 3) sequencing (loading the flow cell, real-time data acquisition, basecalling) and 4) downstream data analysis. In nanopore sequencing, basecalling is the process of translating raw electrical signals into a nucleotide sequence; the signals arise because each nucleotide passing through the nanopore causes a distinctive change in ionic current.
2d. What is the output of your chosen sequencing technology?
In the case of Oxford Nanopore sequencing, the output is raw electrical signal data (POD5/FAST5) that is basecalled in real time into long-read DNA/RNA sequences, typically as FASTQ files.
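As a rough illustration of what working with that FASTQ output looks like, here is a minimal sketch that reads four-line FASTQ records and computes read lengths and the N50. It assumes well-formed, unwrapped records; the toy reads are invented for illustration (real nanopore reads are typically kilobases long), and in practice a dedicated library or QC tool would be the safer choice.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header[1:].split()[0], sequence, quality


def n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy records, invented for illustration only.
example = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
    "@read3\nACGTAC\n+\nIIIIII\n"
)
lengths = [len(sequence) for _, sequence, _ in read_fastq(example)]
print("read lengths:", lengths, "N50:", n50(lengths))
```

N50 is a common summary for long-read runs because a few very long reads matter more for de novo assembly than the average read length suggests.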
DNA WRITE
- What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)
I would like to synthesize various antifreeze proteins (AFPs) of the Antarctic yeast Glaciozyma martinii 186 fused with GFP-related proteins (various colors). The genome of G. martinii contains genes for at least 9 antifreeze proteins. Creating fusion proteins would allow me to answer some questions related to their synthesis, such as 1) are the fusion AFPs still functional, and 2) are the fusion proteins synthesized at various speeds and in groups, or rather as lone proteins for specific conditions? (The sequences will be known soon.)
- What technology or technologies would you use to perform this DNA synthesis and why?
The genes of GFP and GFP-related proteins have an average length of 717 bp to 730 bp, and an average AFP gene from G. martinii has a length of 750 to 1260 bp (250-420 amino acids). If they are synthesized as one gene, the total length of the sequence would reach almost 2000 bp. As we know, standard solid-phase synthesis would not be sufficient. Therefore I suggest using Twist gene synthesis.
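The length estimate for the fusion constructs can be checked with a small sketch. The ranges are the ones quoted above; the optional linker length is a hypothetical parameter (the answer does not specify a linker), and the sum ignores details such as removing one stop codon at the junction.

```python
from typing import Tuple

GFP_GENE_BP = (717, 730)    # range quoted above for GFP-related genes
AFP_GENE_BP = (750, 1260)   # range quoted above for G. martinii AFP genes


def fusion_length_range(
    first: Tuple[int, int], second: Tuple[int, int], linker_bp: int = 0
) -> Tuple[int, int]:
    """Shortest and longest fusion gene, given (min, max) lengths of both parts."""
    return (first[0] + second[0] + linker_bp, first[1] + second[1] + linker_bp)


shortest, longest = fusion_length_range(GFP_GENE_BP, AFP_GENE_BP)
print(f"fusion genes span {shortest} to {longest} bp")
```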
2a. What are the essential steps of your chosen sequencing methods?
The essential steps are 1) synthesis of oligos on a silicon chip, 2) assembly/annealing, in which the synthesized oligos are pooled and annealed to form larger, double-stranded DNA fragments (up to 5 kb), 3) enzymatic error correction and 4) amplification of the correct gene (PCR).
2b. What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?
The main limitations of the method are sequence-related: high GC content of the desired sequence, as well as the presence of long homopolymer regions and many repeats in the sequence.
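Before ordering, a candidate sequence can be screened for two of these features. The sketch below computes GC content and the longest homopolymer run; the thresholds are placeholder values chosen for illustration, not a vendor's actual acceptance limits, and repeat detection is not covered.

```python
import re
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    runs = re.finditer(r"(.)\1*", seq.upper())
    return max((len(match.group(0)) for match in runs), default=0)


# Placeholder thresholds for illustration only, not real vendor limits.
MAX_GC = 0.65
MAX_RUN = 10


def flag_sequence(seq: str) -> List[str]:
    """Return human-readable warnings for a candidate sequence."""
    warnings = []
    if gc_content(seq) > MAX_GC:
        warnings.append("high GC content")
    if longest_homopolymer(seq) > MAX_RUN:
        warnings.append("long homopolymer run")
    return warnings


# Toy sequence, invented for illustration.
print(flag_sequence("ATGC" * 10 + "A" * 12))
```

A real check would usually look at GC content in sliding windows as well, since a sequence with an acceptable overall GC fraction can still contain a local GC-rich stretch.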
DNA EDIT
- What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?
I would like to edit the genome of the Antarctic yeast G. martinii 186 (IBMP TUL collection). The target sequences would be the 9 antifreeze protein genes present in this microorganism. I would like to fuse them with GFP-related proteins (a different color for each gene). This could result in cells producing fluorescent antifreeze proteins, for scalable, easier study of potential horizontal gene transfer between species and of fusion AFP expression conditions.
- What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:
I would like to perform CRISPR-Cas9-based genome edits. Standard expression systems (using plasmids) would be too burdensome for the host organism. For each fusion gene we would need to introduce a different plasmid, present in many copies in the cell. This could result in the cells getting rid of the extra plasmids or in the death of the host cells. Moreover, moving a genome-encoded gene onto a plasmid would make it impossible to study its native expression factors and conditions.
2a. How does your technology of choice edit DNA? What are the essential steps?
The CRISPR-Cas9 technology edits DNA by using a custom-designed guide RNA (gRNA) to direct the Cas9 enzyme to a specific genomic location. There it acts as molecular scissors to create a double-strand break. Then the cell repairs this damage, which allows a new sequence to become part of the genomic DNA.
2b. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
First of all, I need to identify the specific sequence(s) of interest. Then we design the guide RNA (a small, site-specific fragment that locates the binding site). Next, the proper vector (a plasmid encoding the gRNA and the Cas9 protein) has to be constructed and delivered into the chosen cells. Delivering a pre-complexed Cas9-gRNA system is also possible. Delivery into the cells can be carried out with various techniques. Then the cell’s own molecular machinery repairs the edited fragment. The last stage is validation and screening, which confirm proper execution and functionality of the introduced changes.
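The guide design step can be illustrated with a minimal sketch that lists candidate target sites. It assumes the commonly used SpCas9 enzyme, which needs a 20 nt protospacer followed by an NGG PAM; the answer above does not name a Cas9 variant, so these parameters are assumptions, and the function names and toy sequence are invented for illustration.

```python
import re
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq: str, guide_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each guide_len-mer followed by NGG.

    Only the strand passed in is scanned; call again on the reverse
    complement to cover the other strand.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy target, invented for illustration.
target = "A" * 20 + "TGG" + "CC"
for strand, strand_seq in (("+", target), ("-", reverse_complement(target))):
    for start, protospacer, pam in find_protospacers(strand_seq):
        print(strand, start, protospacer, pam)
```

This only enumerates candidates. Choosing among them would still require scoring for off-target matches elsewhere in the genome and for proximity to the intended insertion point.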
2c. What are the limitations of your editing methods (if any) in terms of efficiency or precision?
The main problem with CRISPR-Cas9 editing is that Cas9 binding is not perfectly specific, which can introduce changes at unwanted places in the genome (off-target edits). Introducing large fragments of new DNA decreases the efficiency of gene editing and, in diploid organisms, can lead to loss of heterozygosity or major genomic rearrangements. If the CRISPR-Cas9 system is delivered on plasmids, it might be challenging to successfully deliver multiple CRISPR plasmids into the cells.