Week 2 HW: DNA read, write and edit

Week 2 Lecture Prep

Jacobson’s Questions

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Error rate of polymerase: Natural DNA polymerase has an error rate of about 1 in 10⁶ bases.

Human genome length: The (haploid) human genome is around 3.2 billion base pairs.

At an error rate of 1 in 10⁶, polymerase would introduce roughly:

(3.2 × 10⁹ bp) / (1 × 10⁶ bp per error) = 3,200 errors per genome replication

How biology deals with it: the cell counters this error load with several layered mechanisms that maintain genome stability:

  • Proofreading by DNA polymerase during replication: its 3'→5' exonuclease activity removes a mis-incorporated base right after it is added.

  • Mismatch repair systems that fix errors after DNA synthesis.

  • Excision repair pathways (base excision and nucleotide excision repair) that remove damaged bases.

These systems reduce the final mutation rate to approximately 1 in 10⁹ bases, a roughly 1,000-fold improvement that leaves only about 3 errors per genome replication, making DNA replication remarkably accurate.
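The arithmetic in this answer can be checked with a short Python sketch. It assumes errors occur independently at a uniform per-base rate, which is a simplification of real polymerase behaviour; the constant names and the `expected_errors` helper are illustrative, not from the lecture.

```python
# Back-of-the-envelope check of the error counts in this answer.
# Assumption: errors occur independently at a uniform per-base rate.

HUMAN_GENOME_BP = 3.2e9   # haploid genome length used above
POLYMERASE_RATE = 1e-6    # errors per base, polymerase alone
FINAL_RATE = 1e-9         # errors per base after proofreading and repair


def expected_errors(genome_length_bp: float, errors_per_base: float) -> float:
    """Expected number of mis-incorporated bases in one genome replication."""
    return genome_length_bp * errors_per_base


raw = expected_errors(HUMAN_GENOME_BP, POLYMERASE_RATE)
final = expected_errors(HUMAN_GENOME_BP, FINAL_RATE)
print(f"polymerase alone: {raw:,.0f} errors per genome copy")
print(f"after repair:     {final:.1f} errors per genome copy")
print(f"improvement:      {POLYMERASE_RATE / FINAL_RATE:,.0f}-fold")
```

Running it prints 3,200 errors before repair, 3.2 after repair, and a 1,000-fold improvement, matching the numbers above.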

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Because the genetic code is degenerate (61 sense codons specify only 20 amino acids), a single protein sequence can be encoded by an astronomically large number of DNA sequences. The count is the product, over every residue, of the number of codons for that amino acid. For an average human protein of about 400 amino acids, with a little over three synonymous codons per residue on average, this comes to approximately 10²⁰⁰ different DNA sequences that encode the same protein.
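A short Python sketch makes the count concrete by multiplying the number of synonymous codons for each residue, using the codon counts of the standard genetic code. The helper names `count_encodings` and `log10_encodings` are hypothetical, and the stop codon is left out of the count.

```python
from math import log10, prod

# Number of sense codons per amino acid (one-letter codes) in the
# standard genetic code; together they account for all 61 sense codons.
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein: str) -> int:
    """Exact number of distinct coding sequences (stop codon excluded)."""
    return prod(CODONS_PER_AA[aa] for aa in protein.upper())


def log10_encodings(protein: str) -> float:
    """Base-10 exponent of the count, convenient for long proteins."""
    return sum(log10(CODONS_PER_AA[aa]) for aa in protein.upper())


print(count_encodings("MKL"))  # 1 * 2 * 6 = 12
```

For a full-length protein, `log10_encodings` gives the exponent directly, which is easier to read than the exact integer returned by `count_encodings`.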

Why they don’t all work:

  • Codon bias (organisms prefer certain codons over others).

  • mRNA stability and folding may affect translation.

  • Unintended regulatory elements may be created (e.g., hairpins that act as terminators, internal ribosome entry sites).

  • Problematic sequences may appear, such as repeats that are hard to synthesize and prone to recombination, or CpG motifs that can trigger immune responses.
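Several of these problems can be screened for with simple sequence checks. The Python sketch below flags GC content, long single-base runs, CpG dinucleotides and rare codons in a candidate coding sequence. The thresholds, the rare-codon set (codons often cited as rare in E. coli) and the function name `screen_cds` are illustrative assumptions, not values from the lecture. Hairpins and other structure-dependent elements need an RNA folding tool and are not checked here.

```python
import re

# Illustrative assumptions for this sketch, not standard values:
GC_RANGE = (0.40, 0.60)                     # hypothetical acceptable GC fraction
MAX_HOMOPOLYMER = 6                         # hypothetical longest allowed single-base run
RARE_CODONS = {"AGA", "AGG", "ATA", "CTA"}  # often cited as rare in E. coli


def screen_cds(cds: str) -> list[str]:
    """Return warnings for a candidate coding sequence (DNA, 5'->3')."""
    seq = cds.upper()
    if not seq:
        return ["empty sequence"]
    warnings = []

    # GC content outside the chosen window can hurt synthesis and expression.
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if not GC_RANGE[0] <= gc <= GC_RANGE[1]:
        warnings.append(f"GC content {gc:.0%} outside {GC_RANGE}")

    # Longest run of a single base (a simple kind of repeat).
    longest = max((len(m.group()) for m in re.finditer(r"A+|C+|G+|T+", seq)), default=0)
    if longest > MAX_HOMOPOLYMER:
        warnings.append(f"homopolymer run of {longest} bases")

    # CpG dinucleotides ("CG" cannot overlap itself, so str.count is exact).
    cpg = seq.count("CG")
    if cpg:
        warnings.append(f"{cpg} CpG dinucleotide(s)")

    # Codons read in frame from the first base; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    rare = [c for c in codons if c in RARE_CODONS]
    if rare:
        warnings.append(f"{len(rare)} rare codon(s): {sorted(set(rare))}")

    return warnings


print(screen_cds("ATGGCTGCAGAA"))     # no warnings
print(screen_cds("ATGAGAAAAAAAAGA"))  # low GC, 8-base A run, two AGA codons
```

Each check is deliberately simple; a real design workflow would tune the thresholds to the host organism and add a folding step for hairpins.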