Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

This is the Lambda Sequence

This is the Lambda sequence with the cuts

First, I used Ronan’s website to get a template, and then I recreated the design in Benchling.

Template:

Benchling:

3.1. Choose your protein.

I chose the kappa opioid receptor, a GPCR, because it is the target for the biofactory in my final project. It is encoded by the OPRK1 gene, and UniProt entry P41145 (OPRK_HUMAN) lists the canonical human kappa opioid receptor as 380 amino acids with this sequence:

MDSPIQIFRGEPGPTCAPSACLPPNSSAWFPGWAEPDSNGSAGSEDAQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNSWPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVREDVDVIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLLSGSREKDRNLRRITRLVLVVVAVFVVCWTPIHIFILVEALGSTSHSTAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPLKMRMERQSTSRVRNTVQDPAYLRDIDGMNKPV

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

A convenient “answer key” for the corresponding human CDS is provided in KEGG for hsa:4986 (OPRK1), which shows a 1143 nt coding region (380 codons × 3 nt, plus a 3 nt stop codon): https://www.genome.jp/dbget-bin/www_bget?hsa:4986
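As a rough illustration of what reverse translation involves, here is a minimal Python sketch. It builds the standard genetic code, then picks one codon per amino acid. The choice of "first codon in the table" is an arbitrary assumption made only for illustration, so the output will not match the native human CDS in KEGG or any codon-optimised sequence. The helper names `reverse_translate` and `cds_length` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: keep the first codon encountered for each amino acid.
# This is an arbitrary pick, not a host-specific preference.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def reverse_translate(protein: str, stop_codon: str = "TAA") -> str:
    """Return one possible DNA sequence encoding `protein`, plus a stop codon."""
    return "".join(FIRST_CODON[aa] for aa in protein.upper()) + stop_codon


def cds_length(n_residues: int) -> int:
    """Length in nt of a CDS with `n_residues` codons and one stop codon."""
    return 3 * n_residues + 3


print(reverse_translate("MDSPIQIF"))  # first 8 residues of the receptor
print(cds_length(380))                # 1143
```

Because most amino acids have several codons, many different DNA sequences encode the same protein; this is the freedom that codon optimisation exploits.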

3.3. Codon optimisation.

Codon optimisation adjusts the coding sequence to match the host's codon usage, which improves translation efficiency and protein yield. For this protein, I chose E. coli as the host for simplicity, so that I could practise the workflow described in the homework.

https://en.vectorbuilder.com/tool/codon-optimization/c1dd076b-99f7-469f-84a3-c205c444346c.html

ATGGATAGCCCGATTCAGATTTTTCGCGGCGAACCGGGCCCGACGTGCGCCCCGAGCGCGTGTCTGCCGCCGAACAGCAGCGCGTGGTTTCCGGGCTGGGCGGAACCGGATAGCAATGGCAGCGCGGGTAGCGAAGATGCGCAGCTGGAACCGGCGCATATTAGTCCGGCGATCCCGGTGATTATTACCGCCGTGTATAGCGTGGTCTTTGTGGTGGGTCTGGTGGGCAACAGCCTGGTGATGTTTGTTATTATTCGCTATACCAAAATGAAAACCGCAACCAACATCTACATCTTCAACCTGGCACTGGCGGATGCGCTGGTGACCACCACCATGCCGTTTCAGAGCACCGTGTATCTGATGAATAGCTGGCCGTTCGGCGACGTGCTGTGTAAAATTGTGATTAGCATCGATTACTATAATATGTTTACCAGCATTTTTACCCTCACCATGATGAGCGTGGATCGTTACATTGCCGTGTGCCATCCGGTGAAAGCGCTGGATTTTCGTACGCCGCTGAAAGCGAAAATTATTAATATTTGCATTTGGCTGCTGAGCAGCAGCGTGGGCATTAGCGCGATTGTGCTGGGCGGCACCAAAGTGCGTGAAGATGTGGATGTGATCGAATGCAGCCTGCAGTTTCCGGATGACGATTATTCATGGTGGGATCTGTTTATGAAAATCTGCGTATTTATTTTTGCCTTTGTGATCCCTGTGCTGATTATTATTGTGTGCTACACCCTGATGATTCTGCGTCTGAAATCTGTGCGCCTGCTGAGCGGCAGCCGCGAAAAAGATCGTAATCTGCGCCGCATTACCCGCCTGGTGCTGGTGGTGGTGGCCGTGTTTGTGGTGTGCTGGACCCCGATCCACATTTTTATCCTGGTGGAAGCGCTGGGCTCGACGTCACATAGCACCGCGGCGCTGAGCAGCTATTACTTTTGCATTGCCCTGGGCTATACCAACAGCAGCCTGAATCCGATTCTGTATGCCTTTCTGGACGAAAATTTTAAACGCTGCTTTCGCGATTTTTGTTTTCCGCTGAAAATGCGCATGGAACGCCAGAGTACCAGCCGCGTGCGCAACACCGTGCAGGATCCGGCGTACCTGCGCGACATTGATGGTATGAACAAACCGGTGTAA
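A useful sanity check after codon optimisation is to translate the optimised DNA back and confirm that it still encodes the intended protein. The sketch below is one way to do that in plain Python; the function names (`translate`, `gc_fraction`, `matches_protein`) are hypothetical, and only the first eight codons of the optimised sequence are used as the example input.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a CDS in frame, stopping at the first stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def matches_protein(cds: str, protein: str) -> bool:
    """True if the CDS encodes exactly the given protein."""
    return translate(cds) == protein.upper()


# First eight codons of the optimised sequence, plus a stop codon.
fragment = "ATGGATAGCCCGATTCAGATTTTT" + "TAA"
print(translate(fragment))                    # MDSPIQIF
print(matches_protein(fragment, "MDSPIQIF"))  # True
```

Pasting in the full optimised sequence and the full UniProt protein sequence would run the same check on the whole construct.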

3.4. You have a sequence! Now what?

Technology 1: Clone codon-optimized CDS into a bacterial plasmid (T7/lac promoter), transform into E. coli. Codon optimization is used to match host codon bias to improve expression.

Technology 2: Induce expression; protein production follows the same central dogma (DNA to RNA to protein), but membrane insertion and folding of 7TM proteins is a key challenge in bacteria.

Part 4: Prepare a Twist DNA Synthesis Order

This is my Benchling sequence link https://benchling.com/s/seq-pBSv19pNJ2bIdI5DcYzM?m=slm-17Elwj2xG3ryxk0DBJrw

This is the full OPRK1 Sequence:

Finally, this is my proposed vector:

5.1 DNA Read (i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I think it would be interesting to sequence small genetic regions related to caffeine metabolism and sensitivity: CYP1A2, AHR, and ADORA2A. This is relevant because people experience coffee and caffeine differently; some do not enjoy it as much as others, or experience more adverse side effects. By analysing variants in CYP1A2, which metabolises caffeine, and in the adenosine receptor gene ADORA2A, a personalised recommendation of caffeine intake, bean type, and coffee type could be worked out to give people the best experience with minimal side effects.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? I would use Sanger sequencing, which is a first-generation method. It reads DNA by generating fragments that end in chain-terminating nucleotides (ddNTPs) and separating them by capillary electrophoresis to infer the base order. It suits this project because only a few short regions containing known SNPs need to be read.
What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. The input would be genomic DNA from a cheek swab or saliva sample. The essential steps are: (1) extract DNA from the sample, (2) PCR-amplify the short region(s) containing the SNP(s), (3) purify the PCR product, and (4) set up the Sanger sequencing reaction.
What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? In Sanger sequencing, the template is copied many times, but copying stops whenever a special “terminator” base (a fluorescent ddNTP) is incorporated. This creates many DNA fragments of different lengths, each ending in a coloured base. The fragments are then separated by size using capillary electrophoresis, and a detector reads the colour signal as each fragment passes by. The sequencing software converts the colour peaks into the letters A, C, G, and T.
What is the output of your chosen sequencing technology? The output of Sanger sequencing is usually a chromatogram/trace file (often .ab1) showing coloured peaks, plus a text DNA sequence that the software called from those peaks.
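To make the base-calling idea concrete, here is a toy Python sketch. It assumes each peak has already been reduced to four dye intensities, one per base, and simply calls the strongest channel. When a second channel is nearly as strong, it reports the IUPAC ambiguity code, which is how a heterozygous SNP typically shows up in a trace. The intensity values, the `het_ratio` threshold, and the function names are all hypothetical; real base callers also handle peak detection, spacing, and quality scores.

```python
from typing import Dict, List

# IUPAC two-base ambiguity codes, used here for heterozygous positions.
IUPAC_HET = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(peak: Dict[str, float], het_ratio: float = 0.5) -> str:
    """Call one base from the four dye intensities at a peak.

    het_ratio is an assumed threshold: if the second-strongest channel
    reaches this fraction of the strongest, the position is reported as
    heterozygous.
    """
    ranked = sorted(peak, key=peak.get, reverse=True)
    top, second = ranked[0], ranked[1]
    if peak[top] <= 0:
        return "N"  # no usable signal
    if peak[second] >= het_ratio * peak[top]:
        return IUPAC_HET[frozenset((top, second))]
    return top


def call_sequence(peaks: List[Dict[str, float]]) -> str:
    """Call every peak in order and join the calls into a sequence."""
    return "".join(call_base(p) for p in peaks)


# Made-up intensities; the third peak mimics a heterozygous A/G site.
example_peaks = [
    {"A": 900, "C": 20, "G": 15, "T": 10},
    {"A": 30, "C": 850, "G": 25, "T": 40},
    {"A": 480, "C": 10, "G": 520, "T": 12},
    {"A": 15, "C": 20, "G": 30, "T": 800},
]
print(call_sequence(example_peaks))  # ACRT
```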

5.2 DNA Write (i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I want to synthesize plasmid DNA vectors that turn bacteria into biofactories for making human-useful therapeutics, specifically: (1) a therapeutic hormone for metabolic disease (example: human insulin), and (2) a small-molecule product relevant to cardiovascular/metabolic health (example: Coenzyme Q10 (CoQ10), a medically used antioxidant supplement with cardiovascular relevance).

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? The essential steps are: (1) computational design of the DNA construct (promoters/RBS/genes/terminators plus plasmid features), (2) chemical DNA writing using cyclic solid-phase phosphoramidite synthesis to generate oligos, (3) assembly of oligos into longer gene-length fragments when needed, (4) cloning/packaging into a plasmid backbone for bacterial expression, and (5) sequence verification/quality control so the final plasmid matches the intended design. These steps reflect the standard phosphoramidite cycle used for oligo construction and the gene/plasmid workflow offered by commercial synthesis providers.
What are the limitations of your writing method (if any) in terms of speed, accuracy, scalability? A key limitation is that errors accumulate as DNA length increases, because each chemical base-addition step is not perfectly efficient. Long constructs are therefore more likely to contain substitutions or deletions, and they often have to be assembled from shorter pieces and then verified. In addition, although high-throughput synthesis platforms scale very well for many sequences in parallel, overall cost and turnaround time can still be bottlenecks for very large libraries or long, complex constructs. Certain sequence patterns (like repeats or extreme GC content) can also reduce synthesis success and increase the need for troubleshooting or redesign.
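The way errors accumulate with length can be sketched with a simple model: if every coupling step succeeds with the same probability, the fraction of full-length product falls off exponentially with oligo length. The 99% per-step coupling efficiency used below is an illustrative assumption, not a figure for any particular provider, and the function name is hypothetical.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Expected fraction of molecules that reach full length.

    Simplified model: the first base is already attached to the solid
    support, so an n-nt oligo needs n - 1 successful coupling steps.
    """
    return coupling_efficiency ** (length_nt - 1)


# Compare a few oligo lengths under the assumed 99% coupling efficiency.
for n in (20, 60, 100, 200, 300):
    print(f"{n:>4} nt: {full_length_fraction(n):.1%} full length")
```

Under this assumption only about 13.5% of 200 nt molecules would be full length, which is why gene-length constructs are assembled from shorter oligos and then sequence-verified.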

5.3 DNA Edit (i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I would want to edit SNPs in human DNA that cause metabolic/cardiovascular disease, especially those that lead to very high LDL cholesterol and early heart disease risk, such as variants involved in familial hypercholesterolemia (FH). FH is commonly linked to harmful variants in LDLR (and sometimes related genes like APOB or PCSK9), and editing these could lower lifelong LDL exposure and reduce cardiovascular risk.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps? In prime editing, a Cas9 nickase fused to a reverse transcriptase is guided to a specific DNA site by a pegRNA that also contains the template for the desired change. The system nicks one DNA strand and then “writes” the corrected sequence, which cellular repair processes finalize into a stable edit. This avoids relying on double-strand-break repair, where competition between repair pathways often makes precise HDR edits difficult.
What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? The main preparation is selecting the exact SNP to fix (e.g., an LDLR pathogenic variant) and designing the appropriate guide/pegRNA to target it. The key inputs are the editor components (prime editor or base editor), the guide RNA(s), and the target human cells/tissue context (for cholesterol disorders this is often discussed in relation to the liver because it controls LDL metabolism).
What are the limitations of your editing methods (if any) in terms of efficiency or precision? Major limitations include variable editing efficiency, possible off-target changes or unintended byproducts, and the practical challenge of safe, effective delivery of the editing system to the correct tissue in humans.
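One of the first design steps for a pegRNA is finding a usable target site near the variant. The sketch below scans one strand for SpCas9 NGG PAMs and reports sites whose nick falls just upstream of the position to be edited. It relies on these assumptions: an SpCas9-based prime editor, a 20 nt spacer directly 5' of the PAM, a nick 3 nt upstream of the PAM, and an edit that lies within a chosen window downstream of the nick. The 30 nt window, the toy sequence, and the function name are hypothetical, and a real design would also scan the reverse strand and choose PBS and RT-template lengths.

```python
import re
from typing import Dict, List


def find_pam_sites(seq: str, edit_pos: int, window: int = 30) -> List[Dict]:
    """List candidate SpCas9 sites on this strand near `edit_pos` (0-based)."""
    seq = seq.upper()
    sites = []
    # Lookahead pattern so that overlapping NGG motifs are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start < 20:
            continue  # not enough sequence upstream for a 20 nt spacer
        nick_pos = pam_start - 3  # nick lies just 5' of this index
        edit_offset = edit_pos - nick_pos
        # The edit must sit at or downstream of the nick, within the window.
        if 0 <= edit_offset <= window:
            sites.append({
                "spacer": seq[pam_start - 20:pam_start],
                "pam": seq[pam_start:pam_start + 3],
                "nick_pos": nick_pos,
                "edit_offset": edit_offset,
            })
    return sites


# Made-up 33 nt sequence with a single AGG PAM starting at index 20.
toy_seq = "GATTACAGATTACAGATTAC" + "AGG" + "CTTCATCAGT"
for site in find_pam_sites(toy_seq, edit_pos=25):
    print(site)
```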