Week 2 HW: DNA READ, WRITE, & EDIT

DNA Design Challenge

Protein: GFP (Green Fluorescent Protein)
Reason: GFP is widely used as a biological marker because its green fluorescence lets researchers visualize cellular processes directly.

>sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL
VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV
NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD
HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence

GFP DNA >ATGTCCAAGGGTGAGGAGCTGTTTACCGGCGTGGTTCCGATTCTTGTGGAATTAGACGGCGATGTCAACGGCCACTTCTCCGTTTCT GGCGAGGGCGAGGGAGGCGACGCCACGTATGGCAAATTGACCCTGAAGTTTATTTGCACGACCGGAAAATTGCCTGTACCGTGGCCCACACTTTGGT CACTACCGTTATCAATGTTTCTCGCTATCCGGACCACATGAAGCAGCATGACTTCTTTAAAAGTGCAATGCCCGAGGGTTATGTTCAAGAGCGGACCA TCTTTTTTAAAAGACGACGGCAACTACAAGACGCGCGAGGTGAAGTTCGAGGGCGACACGCTGGTGAATCGGATTGAGTTAAAAGGAATTGACTTTAA AAGATGACGGCAACATCCTTGGACATAAGTTAGAGTACAATTATAATTCAAACCACGTGTACATCATGGCCGACAAACAAAAAAACGGCATCAAGGTA AACTTTAAAATTAGACATAATATCGAGGATGGCAGTGTTCAATTAGCCGACCATTACCAACAGAACACCGATAGGCGGACGGTCCTGTATTACCTGAC AACCATTACCTTAGCACGCAGTCTGCACTGTCCAAGGACCCAAATGAGAAACGAGGACCATATGGTGTTGCTAGAGTTCGTTACCGCAGCAGGAATAAC
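A quick way to sanity-check a reverse-translated sequence is to translate it back and compare the result with the original protein. The sketch below assumes the standard genetic code; `translate` and `first_mismatch` are illustrative helper names, not functions from any tool used in this homework.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 0, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_mismatch(expected, observed):
    """Return the 0-based index of the first difference, or None if identical."""
    for i, (a, b) in enumerate(zip(expected, observed)):
        if a != b:
            return i
    if len(expected) != len(observed):
        return min(len(expected), len(observed))
    return None


# Usage: compare the first nine codons of the DNA above with the protein.
dna_start = "ATGTCCAAGGGTGAGGAGCTGTTTACC"
print(translate(dna_start), first_mismatch("MSKGEELFT", translate(dna_start)))
```

Running the same comparison on the full sequences reports the first position, if any, where the DNA no longer encodes the intended amino acid.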

Codon optimization

Selected organism: Escherichia coli (E. coli)
Reason: E. coli is a common host for recombinant protein production because it grows quickly, is inexpensive to culture, and is easy to manipulate genetically.

ATGTCTGGTGGAGGTTCTGTTTACCGGCGTTGGTAGTGGTATTCTTGTGGAATTAGATGGCGATGTCAACGGCCACTTTTCCGTTTCAGGCGAGG GTGAGGGAGGCGACTACCGTGTCAAAATCGACACCTTGAAGTTTATTTGCACGACCGGAAAATTGCCTGTACCGTGGACCACACTTTGGTCACTACTG TTATCAACGTGTTCTCGCTATCCGGACCACATGAAGCAGCATGACTTTTAAAAGTGCAATGCCCGAGGGTTATGTTCAAGAGCGGACCATCTTTTTTA AAAGACGACGGCAACTACAAGACGCGCGAGGTGAAGTTCGAGGGCGAACGCGTGGTGAATCGGATTGAGCTAAAAGGAATTGACTTTAAAAGACGACG GCAACATTCTTGGACATAAGCTAGAGTACAATTATAATTCAAACCACGTGTACATCATGGCCGACAAACAAAAAAACGGCATCAAGGTAAACTTTAAA ATTAGACATAATATCGAGGATGGCAGTGTTCAATTAGCCGACCATTACCAACAGAACACCGATAGGCGGACGGTCCTGTATTACCTGACAACCATTAC CTTAGCACGCAGTCTGCACTGTCCAAGGACCCAAATGAGAAACGAGGACCATATGGTGTTGCTAGAGTTCGTTACCGCAGCAGGAATAAC
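
The simplest form of codon optimization replaces every amino acid with one codon that the host uses frequently. The sketch below does only that. The `PREFERRED_CODONS` table is an assumption: it lists codons commonly cited as high-frequency in E. coli K-12 and should be checked against a real codon usage table before use. Production tools also balance GC content and avoid repeats and unwanted restriction sites, which this sketch ignores.

```python
# Assumed "preferred" E. coli codon per amino acid (illustrative, verify
# against a published codon usage table for the strain actually used).
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def optimize_for_ecoli(protein, stop_codon="TAA"):
    """Back-translate a protein using one fixed codon per amino acid.

    Raises KeyError for characters that are not one of the 20 standard
    amino acids.
    """
    codons = [PREFERRED_CODONS[aa] for aa in protein.upper()]
    return "".join(codons) + stop_codon


def gc_content(dna):
    """Fraction of G and C bases in a DNA string."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


# Usage with the first ten residues of GFP.
optimized = optimize_for_ecoli("MSKGEELFTG")
print(optimized, round(gc_content(optimized), 2))
```

Because every amino acid maps to a single codon, the output always encodes the input protein exactly, which makes it a useful baseline to compare against the output of a real optimization tool.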
Technologies to Produce Protein

Cell-dependent method (here, E. coli):

Cloning: The optimized sequence is inserted into an expression plasmid.

Transformation: The plasmid is introduced into E. coli.

Expression: GFP production is induced with IPTG.

Purification: GFP is purified by affinity chromatography.

Cell-free method: An E. coli extract is used to produce GFP directly from the DNA template.

DNA Read/Write/Edit

DNA Read

(i) What DNA would you sequence and why?
The DNA to be sequenced would be the F9 gene locus extracted from the patient’s liver cells following CRISPR-Cas9 gene editing. The core purpose is verification: confirming that the correction worked as intended and that no unintended damage was introduced elsewhere in the genome. There are three specific sequencing goals.

First, confirming the edit: reading across the exact mutation site to verify that the correct nucleotide was restored by Homology-Directed Repair (HDR), and that no indels were introduced by the error-prone NHEJ pathway instead.

Second, off-target analysis: scanning the broader genome for any locations where Cas9 may have made unintended cuts, which is a critical safety check for the therapy.

Third, monitoring gene expression: sequencing RNA transcripts (RNA-seq) confirms that the corrected F9 gene is actually being transcribed rather than sitting silently in the genome. RNA-seq does not show translation, so production of functional Factor IX protein would need a separate protein-level assay.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
Nanopore Sequencing (Oxford Nanopore Technologies)
For verifying CRISPR edits in the F9 gene, Nanopore sequencing is a strong choice. Because it reads very long stretches of DNA in a single pass, one read can span the edit site together with its flanking sequence. This helps confirm whether the edit succeeded, reveals larger insertions, deletions, or rearrangements that short reads can miss, and supports detection of off-target cuts elsewhere in the genome.

Generation: Nanopore is a third-generation sequencing technology. Unlike first-generation Sanger sequencing (low throughput, one fragment per reaction) or second-generation Illumina sequencing (short reads, requires amplification), Nanopore reads single DNA molecules in real time without amplification and detects bases electrically rather than optically.
Input & Preparation: The input is high-molecular-weight genomic DNA extracted from the patient’s edited liver cells. After extraction, DNA quality is checked using NanoDrop and gel electrophoresis. The fragment ends are then repaired and dA-tailed, followed by ligation of Oxford Nanopore’s sequencing adapters, which carry the motor protein. Importantly, no PCR amplification is needed, which avoids amplification bias and preserves the DNA’s native state.
Sequencing & Base Calling: A protein nanopore is embedded in a membrane with an ionic current flowing through it. The motor protein feeds the DNA strand through the pore one base at a time, and the bases occupying the pore disrupt the current in a characteristic way. These electrical signals are recorded in real time and translated into a DNA sequence by a neural-network base caller such as Dorado.
Output: The output is a set of long reads, sometimes hundreds of thousands of base pairs each, stored in FASTQ format, which contains both the sequence and per-base quality scores. These reads are aligned to the reference genome to confirm the F9 correction, identify any NHEJ-caused indels, and flag potential off-target edits.
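
To make the FASTQ output concrete, the sketch below parses four-line FASTQ records and decodes the quality string, assuming the usual Phred+33 encoding (quality = ASCII code minus 33). The read name and sequence in the example are made up for illustration; real files are normally handled with an established parser.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, qualities) from four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        # Phred+33: each character encodes one per-base quality score.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


def mean_error_probability(scores):
    """Average per-base error probability, using P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


# Hypothetical single-read example.
example = """@read_1
ACGTAC
+
IIII55
"""

for read_id, sequence, scores in parse_fastq(example):
    print(read_id, sequence, scores, mean_error_probability(scores))
```

Averaging error probabilities rather than raw Q scores keeps a few low-quality bases from being hidden by many high-quality ones.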

DNA Write

(i) What DNA would you synthesize and why?
The target for synthesis would be a corrected copy of the F9 gene, which encodes Clotting Factor IX, the protein absent or dysfunctional in Hemophilia B patients. Rather than relying solely on CRISPR to repair the mutation in place, synthesizing a complete, healthy F9 sequence allows it to be used as the donor template in Homology-Directed Repair, or delivered independently as a gene therapy construct. The F9 gene spans approximately 34,000 base pairs including introns, and the synthesized version would carry the correct nucleotide at the mutation site, restoring the gene’s ability to produce functional Factor IX. This approach is compelling because it could offer a one-time, lasting fix rather than the lifelong Factor IX infusions patients currently depend on.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?
Because the F9 gene is too long to synthesize as a single error-free strand using standard chemical methods, a two-stage approach is used: oligonucleotide synthesis followed by assembly. Silicon-based oligonucleotide synthesis (as used by platforms like Twist Bioscience) prints thousands of short, overlapping DNA fragments in parallel using phosphoramidite chemistry, building each sequence one nucleotide at a time directly on a chip.

Once detached, these fragments are fed into Gibson Assembly, where an exonuclease exposes overlapping single-stranded ends, a polymerase fills any gaps, and a ligase seals the joins, stitching all the pieces into the complete, corrected F9 sequence in a single reaction.

The synthesized F9 construct would then be packaged into an AAV (Adeno-Associated Virus) vector such as AAV5 or AAV8, serotypes with a strong affinity for liver cells, where Factor IX is naturally produced, and delivered intravenously to the patient.

The primary limitations are error rates and cargo capacity. Longer constructs accumulate more synthesis errors, requiring thorough verification with Nanopore sequencing. The AAV’s cargo capacity of roughly 4.7 kb is far smaller than the full genomic F9 locus, so the construct would need to carry the intron-free coding sequence (about 1.4 kb) with a compact promoter and careful design to fit within that constraint.
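
The fragment-and-overlap idea behind the assembly step can be sketched in a few lines. The code below splits a sequence into fragments that share a fixed overlap with their neighbours, then rejoins them by matching those overlaps. It is a toy model only: the fragment length and overlap used here are illustrative, the element sizes passed to `fits_in_aav` are hypothetical, and the 4,700 bp limit is the approximate figure quoted above.

```python
AAV_CAPACITY_BP = 4700  # approximate cargo limit quoted in the text


def split_with_overlaps(sequence, fragment_len, overlap):
    """Split a sequence into fragments whose ends overlap their neighbours."""
    if not 0 < overlap < fragment_len:
        raise ValueError("overlap must be positive and shorter than fragment_len")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(sequence[start:start + fragment_len])
        if start + fragment_len >= len(sequence):
            break
        start += step
    return fragments


def reassemble(fragments, overlap):
    """Join fragments in order, checking that each shared overlap matches."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        if assembled[-overlap:] != fragment[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        assembled += fragment[overlap:]
    return assembled


def fits_in_aav(element_lengths_bp):
    """True if the summed element lengths fit within the assumed AAV limit."""
    return sum(element_lengths_bp) <= AAV_CAPACITY_BP


# Usage with a short made-up sequence.
sequence = "ATGCAGCGCGTGAACATGATCATGGCAGAATCACCAGGC"
pieces = split_with_overlaps(sequence, fragment_len=15, overlap=5)
print(pieces)
print(reassemble(pieces, overlap=5) == sequence)
```

In a real design the overlaps would also be screened for uniqueness, since a repeated overlap could let fragments join in the wrong order.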

DNA Edit

(i) What DNA would you want to edit and why?
The DNA to be edited is the F9 gene located on the X chromosome, which encodes Clotting Factor IX, a protein essential to the blood clotting cascade. Hemophilia B occurs when mutations in this gene prevent the body from producing functional Factor IX, meaning even minor injuries or internal bleeds can become life-threatening without immediate medical intervention. The most common severe mutations include point mutations and small deletions that either truncate the protein or render it completely nonfunctional.

(ii) What technology or technologies would you use to perform these DNA edits and why?
The technology of choice is CRISPR-Cas9, delivered to liver cells via an AAV8 viral vector. CRISPR-Cas9 is the most practical option because it is programmable, relatively affordable, and precise enough to target a single mutation within the F9 gene. AAV8 is paired with it because this serotype has a strong natural affinity for hepatocytes, the liver cells where Factor IX is produced, making intravenous delivery efficient without requiring cells to be removed from the patient.

How It Works: A custom guide RNA (gRNA) is designed to match the sequence surrounding the F9 mutation. Cas9 carries this gRNA and scans the genome until it finds the complementary sequence and binds. It then makes a double-strand break, cutting both strands of the DNA helix at the target site. A donor DNA template carrying the correct F9 sequence is provided alongside the CRISPR machinery, and the cell’s repair system uses this template as a blueprint to fix the break through Homology-Directed Repair (HDR), overwriting the mutation with the healthy sequence.

Preparation & Input: The editing package consists of four components: the Cas9 nuclease, the custom gRNA, the corrective donor DNA template, and the AAV8 vector that carries them into liver cells. The gRNA must be designed to be unique to the F9 locus to minimize the risk of cutting elsewhere. The components are packaged into AAV8 and delivered intravenously, where the vector naturally homes to the liver. Because AAV cargo capacity is limited to roughly 4.7 kb, the components would likely need to be split across more than one vector.

Limitations: The three main limitations are efficiency, off-target effects, and immune response. HDR is inherently inefficient in adult liver cells, as these are largely non-dividing, so many cells will default to the imprecise NHEJ repair pathway, potentially worsening the mutation rather than correcting it. Off-target cuts remain a safety concern: if Cas9 mistakes a similar genomic sequence for its intended target, it could disrupt a critical gene. Finally, some patients carry pre-existing antibodies against AAV vectors, which can neutralize the delivery system before it reaches the liver and limit the therapy’s effectiveness.
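
The first step of gRNA design, finding candidate target sites near the mutation, can be sketched as a simple pattern search. The code below assumes the commonly used SpCas9 enzyme, which needs an NGG PAM immediately after a 20-nucleotide target and cuts about 3 bp upstream of the PAM; the text above does not name a specific Cas9 variant, so this is an assumption. It searches the forward strand only and does no off-target scoring. The example sequence is made up.

```python
import re

# 20-nt protospacer followed by an NGG PAM (assumed SpCas9 rule).
# The lookahead lets overlapping candidate sites all be reported.
SITE_PATTERN = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def find_protospacers(sequence, target_index, window=10):
    """List forward-strand sites whose cut position lies near target_index.

    Each hit is (start, protospacer, pam, cut_index), all 0-based, where
    cut_index is the first base after the predicted cut.
    """
    sequence = sequence.upper()
    hits = []
    for match in SITE_PATTERN.finditer(sequence):
        start = match.start()
        cut_index = start + 17  # 3 bp upstream of the PAM
        if abs(cut_index - target_index) <= window:
            pam = sequence[start + 20:start + 23]
            hits.append((start, match.group(1), pam, cut_index))
    return hits


# Usage with a made-up sequence; position 22 stands in for the mutation.
demo = "TTTTT" + "GATTACAGATTACAGATTAC" + "AGG" + "TTTTT"
for hit in find_protospacers(demo, target_index=22):
    print(hit)
```

A fuller design would also scan the reverse strand and compare each candidate against the whole genome to rank it by off-target risk.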