Week 2 HW: dna-read-write-and-edit

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The error rate of polymerase is about 1 in 10^6 nucleotides, and the human genome is about 3.2 Gbp (giga base pairs, = 3.2 × 10^9 bp). Dividing the genome length by one million gives roughly 3,200, which is the number of errors expected each time a person's genome is copied. Biology deals with this through polymerase proofreading: DNA polymerases can detect a mistake in the newly synthesized strand while they are still building it. When a mistake is detected, the polymerase pauses, the new strand is shifted to a different site, and exonuclease activity there cuts out the wrong base in the 3’ to 5’ direction. The polymerase then returns to its original site and continues synthesis. This correction happens during replication. Other corrections happen after replication (post-replication repair), including mismatch repair, in which dedicated proteins detect mismatched base pairs in the new strand, cut them out, and rebuild the correct sequence using the template strand.
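The arithmetic in this answer can be checked with a minimal sketch. The two constants are the values quoted above; the function name `expected_errors` is only an illustrative choice.

```python
# Rough estimate of replication errors per genome copy, before any repair.
# Both constants are the figures quoted in the answer above.
ERROR_RATE = 1e-6        # errors per nucleotide copied (1 in 10^6)
GENOME_LENGTH = 3.2e9    # human genome length in base pairs (3.2 Gbp)


def expected_errors(genome_length: float, error_rate: float) -> float:
    """Expected number of wrong bases in one copy of the genome."""
    return genome_length * error_rate


if __name__ == "__main__":
    errors = expected_errors(GENOME_LENGTH, ERROR_RATE)
    print(f"about {errors:.0f} errors per genome copy")
```

This only multiplies the two quoted numbers; it does not model proofreading or mismatch repair, which lower the final error count.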

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

The average human protein has 1036 amino acids, so there would be roughly 10^623 different DNA sequences that code for such a protein (about 4^1036, taking roughly four synonymous codons per amino acid). Although this number is huge, most of these sequences do not end up producing the protein in practice. One reason is codon bias: cells prefer some codons over others even when they code for the same amino acid, and they keep the matching tRNAs for the preferred codons at higher concentrations. An mRNA with many rare codons is therefore translated slowly, which can affect the function of the resulting protein. The opposite problem can also occur: if the mRNA has too many common codons, translation can be too fast, and the growing chain does not have enough time to fold properly, since the polypeptide starts folding while the ribosome is still translating. Finally, a sequence can contain a premature stop codon, which ends translation early so that no full-length protein is made.
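The size of this number can be reproduced with a short sketch. It rests on two assumptions made here rather than stated in the answer: that the 10^623 figure comes from roughly four synonymous codons per amino acid, and that the standard genetic code applies. The peptide "MKL" is a made-up example, not a real protein.

```python
import math

# Number of codons per amino acid in the standard genetic code,
# keyed by one-letter amino acid code. The 20 values sum to 61 sense codons.
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2,
    "K": 2, "D": 2, "E": 2, "C": 2,
}


def count_codings(protein: str) -> int:
    """Exact number of DNA sequences that encode the given protein."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein)


def log10_uniform_estimate(length_aa: int, codons_per_aa: float = 4.0) -> float:
    """log10 of the coding count if every residue had `codons_per_aa` codons."""
    return length_aa * math.log10(codons_per_aa)


if __name__ == "__main__":
    # Hypothetical 3-residue peptide: Met (1 codon) x Lys (2) x Leu (6).
    print(count_codings("MKL"))
    # Exponent of the estimate for a 1036-residue protein.
    print(f"10^{log10_uniform_estimate(1036):.1f}")
```

For a real sequence, `count_codings` gives the exact count, which depends on the amino acid composition; the uniform estimate only gives the order of magnitude.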

What’s the most commonly used method for oligo synthesis currently?

Solid-phase phosphoramidite synthesis is the most commonly used method for producing oligonucleotides. The growing chain is attached to a solid support, which makes it easy to wash away unwanted components and purify the product.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Making oligos longer than 200 nt is difficult because the yield of full-length product becomes very poor. Each synthesis cycle attaches the next nucleotide with less than 100% efficiency, and these losses multiply, so the fraction of chains that reach full length drops as the number of cycles increases.
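How quickly the yield falls can be sketched with a simple model. The 99% coupling efficiency used here is an assumed illustrative value, not a number from the answer, and the model assumes the first nucleotide is already attached to the support, so an n-nt oligo needs n - 1 couplings.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of chains that reach full length.

    Assumes every coupling succeeds independently with the same probability
    and that the first nucleotide is pre-attached to the solid support.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    # 0.99 is an assumed per-cycle efficiency, chosen only for illustration.
    for length in (20, 100, 200, 2000):
        print(f"{length:>5} nt: {full_length_yield(length):.2e}")
```

Under these assumptions, about 14% of chains reach 200 nt, and the fraction for 2000 nt is on the order of one in a billion.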

Why can’t you make a 2000bp gene via direct oligo synthesis?

A 2000 bp gene would require about 2000 coupling cycles, and each cycle carries a chance that the nucleotide does not attach. Almost no chains would reach full length, and it would be extremely difficult to separate the few correct molecules from the large excess of incompletely coupled products.

[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

Glutamate (negatively charged) and lysine (positively charged), paired through an electrostatic interaction.