Week 2: DNA read, write and edit

** Assignment (Week 2 Lecture Prep) — DUE BY START OF FEB 10 LECTURE** Homework Questions from Professor Jacobson Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

Answer: DNA polymerase typically makes 1 mistake per 10⁶ nucleotides (1:10⁶) during replication. The human genome is roughly 3.2 billion base pairs (3.2 × 10⁹ bp or 3.2 Gbp), which is much larger than the raw error rate would suggest, so without correction, thousands of errors could accumulate per replication. But biology is smart:

Many polymerases have a 3’→5’ exonuclease activity that can remove incorrectly added nucleotides immediately.
After replication, mismatch repair systems scan the DNA and fix remaining errors.
Together, these mechanisms reduce the effective error rate to ~1 mistake per 10⁹–10¹⁰ nucleotides, ensuring genome integrity.

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Answer:
Number of coding possibilities:

The genetic code is degenerate, meaning most amino acids are encoded by multiple codons.
An average human protein is roughly 400 amino acids long.
Since each amino acid can be coded by 1–6 codons, the total number of possible DNA sequences is huge (roughly ~10¹⁹⁰–10²⁰⁰ possible sequences for a 400-amino acid protein).
Why not all codes work in practice:
1. Codon bias: Some codons are preferred over others in human cells for efficient translation. Using rare codons can slow down protein production or cause misfolding. Thats why when you work with a certain microorganism you have to make sure you codon optimize the sequence for that particular organism.
2. mRNA structure: Certain sequences can form strong secondary structures (hairpins) that interfere with translation or stability. There are online tools that helps you calculate that probability.
3. Regulatory signals: Some sequences might accidentally create splice sites, ribosome binding problems, or cryptic start/stop codons.
4. Protein folding: Synonymous codons can affect the speed of translation, which in turn affects co-translational protein folding.
5. GC content and stability: Extremes in GC or AT content can make the mRNA unstable or harder to transcribe.

Homework Questions from Dr. LeProust: What’s the most commonly used method for oligo synthesis currently? Why is it difficult to make oligos longer than 200nt via direct synthesis? Why can’t you make a 2000bp gene via direct oligo synthesis?

Answer:
Chip-based oligonucleotide synthesis is widely used for high-throughput synthesis of many short DNA sequences (oligos) in parallel.

Why difficult to make oligos >200nt:
1. Stepwise synthesis errors: Each nucleotide is added one at a time, and the efficiency is less than 100% per step (~99–99.5%), so errors accumulate.
2. Purity drops with length: Longer oligos have more incomplete or incorrect sequences, making them unreliable above ~200 nucleotides.
Why you can’t make a 2000bp gene directly:
Direct synthesis of 2000 base pairs would require stitching together ~10 oligos of 200nt each (or many more if using shorter oligos).
- Error rate accumulation: The chance that a full-length sequence is error-free is low.
- Solution in practice: Genes of this length are assembled from shorter oligos using assembly methods such as PCR assembly, Gibson assembly, or Golden Gate assembly.

Homework Question from George Church: Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any. Question: Using Google & Prof. Church’s slide #4, what are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Answer:
Animals require certain amino acids from their diet because they cannot synthesize them internally. These essential amino acids include Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and in some cases Arginine. Lysine, being one of these essential amino acids, is already universally required for survival, so designing a “Lysine Contingency” is not particularly practical, as organisms cannot grow without it anyway. In contrast, Arginine is considered conditionally essential; most adults can produce it internally, but under stress, rapid growth, or other special conditions, dietary intake becomes necessary. Choosing Arginine as a contingency could therefore be more effective in a synthetic biology or containment context, because an engineered organism could be designed to depend on external Arginine, providing a controllable mechanism for survival. This comparison highlights how the essentiality of an amino acid determines whether it can realistically be used for biocontainment strategies.

Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details. Overview:

Make a free account at benchling.com
Import the Lambda DNA.
Simulate Restriction Enzyme Digestion with the following Enzymes:
    EcoRI
    HindIII
    BamHI
    KpnI
    EcoRV
    SacI
    SalI
Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
You might find Ronan’s website a helpful tool for quickly iterating on designs!

Answer:

Part 3: DNA Design Challenge 3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

Answer:

I chose the NorR transcription factor. It binds specifically to NO molecules. Nitric oxide is a classic sign of acute inflammation and oxidative stress in uterine tissue. So when NO is present, it binds to NorR, the protein changes shape and activates a specific promoter (called PnorV). The idea is that the latter activates the expression of some anti-inflammatory protein (for which I am still looking for ideas). It was difficult to find the sequence. Finally, I found a paper where they listed the nucleotides in the supplementary material. I copied the sequence into Benchling, selected it, and did a blast. I found it was in Escherichia coli strain MC1061 chromosome. And enter the protein ID: /protein_id=“YBO70461.1”

YBO70461.1 nitric oxide reductase transcriptional regulator NorR [Escherichia coli] MSFSVDVLANIAIELQRGIGHQDRFQRLITTLRQVLECDASALLRYDSRQFIPLAIDGLAKDVLGRRFAL EGHPRLEAIARAGDVVRFPADSELPDPYDGLIPGQESLKVHACVGLPLFAGQNLIGALTLDGMQPDQFDV FSDEELRLIAALAAGALSNALLIEQLESQNMLPGDATPFEAVKQTQMIGLSPGMTQLKKEIEIVAASDLN VLISGETGTGKELVAKAIHEASPRAVNPLVYLNCAALPESVAESELFGHVKGAFTGAISNRSGKFEMADN GTLFLDEIGELSLALQAKLLRVLQYGDIQRVGDDRCLRVDVRVLAATNRDLREEVLAGRFRADLFHRLSV FPLSVPPLRERGDDVILLAGYFCEQCRLRQGLSRVVLSAGARNLLQHYSFPGNVRELEHAIHRAVVLARA TRSGDEVILEAQHFAFPEVTLPTPEVAAVPVVKQNLREATEAFQRETIRQALAQNHHNWAACARMLETDV ANLHRLAKRLGLKD

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

Answer:

As previously said, I already had the DNA sequence, but I either way tried a tool to reverse translate the protein sequence to DNA. I’ve used: Reverse Translate Reverse Translate results Results for 504 residue sequence “YBO70461.1 nitric oxide reductase transcriptional regulator NorR [Escherichia coli]” starting “MSFSVDVLAN”

reverse translation of YBO70461.1 nitric oxide reductase transcriptional regulator NorR [Escherichia coli] to a 1512 base sequence of most likely codons. atgagctttagcgtggatgtgctggcgaacattgcgattgaactgcagcgcggcattggc catcaggatcgctttcagcgcctgattaccaccctgcgccaggtgctggaatgcgatgcg agcgcgctgctgcgctatgatagccgccagtttattccgctggcgattgatggcctggcg aaagatgtgctgggccgccgctttgcgctggaaggccatccgcgcctggaagcgattgcg cgcgcgggcgatgtggtgcgctttccggcggatagcgaactgccggatccgtatgatggc ctgattccgggccaggaaagcctgaaagtgcatgcgtgcgtgggcctgccgctgtttgcg ggccagaacctgattggcgcgctgaccctggatggcatgcagccggatcagtttgatgtg tttagcgatgaagaactgcgcctgattgcggcgctggcggcgggcgcgctgagcaacgcg ctgctgattgaacagctggaaagccagaacatgctgccgggcgatgcgaccccgtttgaa gcggtgaaacagacccagatgattggcctgagcccgggcatgacccagctgaaaaaagaa attgaaattgtggcggcgagcgatctgaacgtgctgattagcggcgaaaccggcaccggc aaagaactggtggcgaaagcgattcatgaagcgagcccgcgcgcggtgaacccgctggtg tatctgaactgcgcggcgctgccggaaagcgtggcggaaagcgaactgtttggccatgtg aaaggcgcgtttaccggcgcgattagcaaccgcagcggcaaatttgaaatggcggataac ggcaccctgtttctggatgaaattggcgaactgagcctggcgctgcaggcgaaactgctg cgcgtgctgcagtatggcgatattcagcgcgtgggcgatgatcgctgcctgcgcgtggat gtgcgcgtgctggcggcgaccaaccgcgatctgcgcgaagaagtgctggcgggccgcttt cgcgcggatctgtttcatcgcctgagcgtgtttccgctgagcgtgccgccgctgcgcgaa cgcggcgatgatgtgattctgctggcgggctatttttgcgaacagtgccgcctgcgccag ggcctgagccgcgtggtgctgagcgcgggcgcgcgcaacctgctgcagcattatagcttt ccgggcaacgtgcgcgaactggaacatgcgattcatcgcgcggtggtgctggcgcgcgcg acccgcagcggcgatgaagtgattctggaagcgcagcattttgcgtttccggaagtgacc ctgccgaccccggaagtggcggcggtgccggtggtgaaacagaacctgcgcgaagcgacc gaagcgtttcagcgcgaaaccattcgccaggcgctggcgcagaaccatcataactgggcg gcgtgcgcgcgcatgctggaaaccgatgtggcgaacctgcatcgcctggcgaaacgcctg ggcctgaaagat

3.3. Codon optimization.

Answer:

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize Google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why? Page said: 404 not found. But in my case i would like to use E.coli as my host organism, and the protein is already from E.coli. So no need for optimizing. In case I later decide to use other bacteria, this optimization will be necessary.

Part 4: Prepare a Twist DNA Synthesis Order

Answer:

NorR expression cassette: https://benchling.com/s/seq-Aq4BUhFSzAsDtlzDNKU7?m=slm-juuMEYDxnWK38MVGn1Cu NorR inside pTwist Amp High Copy

Part 5: DNA Read/Write/Edit 5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

Answer:

I want to sequence two types of DNA:

Synthetic Plasmid DNA: After assembling the genetic cassette i will sequencing to ensure that there are no frameshift mutations or errors in the codons. I would use Sanger Sequencing for this. Sanger is first generation (chain termination)
E. coli Nissle 1917 Genome (Post-editing): If I decide to integrate the circuit into the bacterial chromosome, I need to sequence the entire genome to confirm that the insertion occurred at the desired locus and that there were no off-target effects that compromise the safety of the probiotic. I would use Oxford Nanopore Technologies for whole genome verification.Nanopore is third generation (real-time single molecule).

From both, I will receive a FASTQ file containing the read sequences and their quality scores.

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I want to synthesize a genetic sensor-response circuit for the relief of menstrual pain. The goal is to create a “living therapeutic” that detects nitric oxide (NO) and secretes analgesic peptides.

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

Answer:

I want to edit the genome of the bacterium E. coli Nissle 1917. Reason: For greater stability and safety, I want to integrate the NO sensor circuit directly into the bacterium’s chromosome (a neutral site) rather than keeping it in a plasmid. This prevents the loss of the circuit and eliminates the need to use antibiotics to keep the plasmid inside the women.

(ii) What technology or technologies would you use to perform these DNA edits and why?

Answer:

I would use the Lambda Red homologous recombination system because it allows for the direct exchange of large DNA fragments through homology in a very efficient manner in microorganisms.