Week 2 HW: DNA READ, WRITE & EDIT

Part 1: Benchling & In-silico Gel Art

Simulated EcoRI Digestion

Simulated HindIII Digestion

Simulated BamHI Digestion

Simulated KpnI Digestion

Simulated EcoRV Digestion

Simulated SacI Digestion

Simulated SalI Digestion

Digital gel art. The enzymes used to digest the DNA are listed for each lane.
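As a rough illustration of what a simulated digest computes, the sketch below finds recognition sites in a linear sequence and reports fragment lengths. The recognition sites and cut positions are the standard ones for these seven enzymes. The template sequence `toy` is a made-up placeholder, since the DNA used for the gels is not named here, and `digest` is a hypothetical helper, not a Benchling function.

```python
# Recognition site and top-strand cut offset (bases from the start of the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def digest(seq, enzyme):
    """Return fragment lengths (longest first) for a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Placeholder template with two EcoRI sites (assumption, not the real template).
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
print(digest(toy, "EcoRI"))  # [26, 11, 10]
print(digest(toy, "BamHI"))  # [47] (no site, so the sequence stays uncut)
```

All seven sites are palindromic, so scanning one strand is enough to find every site. On a gel, the longest fragments stay closest to the wells.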

Part 3: DNA Design Challenge

3.1. Choice of Protein and Protein Sequence

Chosen Protein: Green Fluorescent Protein (GFP)

Why this protein?

I chose Green Fluorescent Protein (GFP) because it is a well-characterized and widely used reporter protein in molecular biology and synthetic biology. GFP emits green fluorescence when exposed to blue or UV light, making it an ideal tool for studying gene expression, protein localization, and cellular processes. Its extensive documentation and availability of sequence data make it suitable for computational analysis and reverse translation exercises.

Protein Sequence Source

The amino acid sequence was obtained from UniProt, a curated protein sequence database (https://www.uniprot.org/uniprotkb/P42212/entry).

Protein Sequence (UniProt format)

>sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1

MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

3.2. Reverse Translation: Protein to DNA

The Central Dogma of molecular biology describes how DNA is transcribed into RNA, which is then translated into protein. Because most amino acids are encoded by more than one codon, reverse translation yields one of many possible nucleotide sequences rather than a single definitive one.

Using a reverse translation tool from https://www.bioinformatics.org, the following nucleotide sequence was generated for the GFP protein.

Reverse-Translated DNA Sequence

atgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtggaactggatggc gatgtgaacggccataaatttagcgtgagcggcgaaggcgaaggcgatgcgacctatggc aaactgaccctgaaatttatttgcaccaccggcaaactgccggtgccgtggccgaccctg gtgaccacctttagctatggcgtgcagtgctttagccgctatccggatcatatgaaacag catgatttttttaaaagcgcgatgccggaaggctatgtgcaggaacgcaccatttttttt aaagatgatggcaactataaaacccgcgcggaagtgaaatttgaaggcgataccctggtg aaccgcattgaactgaaaggcattgattttaaagaagatggcaacattctgggccataaa ctggaatataactataacagccataacgtgtatattatggcggataaacagaaaaacggc attaaagtgaactttaaaattcgccataacattgaagatggcagcgtgcagctggcggat cattatcagcagaacaccccgattggcgatggcccggtgctgctgccggataaccattat ctgagcacccagagcgcgctgagcaaagatccgaacgaaaaacgcgatcatatggtgctg ctggaatttgtgaccgcggcgggcattacccatggcatggatgaactgtataaa
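The reverse translation can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: it picks one fixed codon per amino acid, and the codon choices in `CODON_CHOICE` are simply the ones that appear in the sequence above. The function name `reverse_translate` is hypothetical.

```python
# One codon per amino acid, read off the reverse-translated sequence above.
CODON_CHOICE = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein):
    """Map each residue to its chosen codon; whitespace is ignored."""
    residues = "".join(protein.split()).upper()
    return "".join(CODON_CHOICE[aa] for aa in residues)


# First ten residues of GFP (P42212).
print(reverse_translate("MSKGEELFTG"))
# atgagcaaaggcgaagaactgtttaccggc
```

Any other synonymous codon would give an equally valid DNA sequence for the same protein, which is why the result is only one of many possible answers.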

3.3. Codon Optimization

Why Codon Optimization Is Necessary

Although most organisms share the same genetic code, they differ in which synonymous codons they use most often for a given amino acid. Codon optimization rewrites the coding sequence to match the host organism's codon usage without changing the encoded protein, which can improve translation efficiency, mRNA stability, and overall protein yield.
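To make the idea of codon preference concrete, the sketch below counts codons in a coding sequence. The helper `codon_counts` is hypothetical, and the two inputs are just the first ten codons of the two GFP sequences in this write-up. Both encode the same residues (MSKGEELFTG) but use different glycine and serine codons.

```python
from collections import Counter


def codon_counts(dna):
    """Count in-frame codons, ignoring whitespace and case."""
    dna = "".join(dna.split()).upper()
    return Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))


reverse_translated = "atgagcaaaggcgaagaactgtttaccggc"
optimized = "ATGAGTAAAGGTGAAGAACTGTTTACCGGT"

# Glycine is GGC in the first sequence and GGT in the second.
print(codon_counts(reverse_translated)["GGC"], codon_counts(optimized)["GGC"])
print(codon_counts(reverse_translated)["GGT"], codon_counts(optimized)["GGT"])
```

Dividing these counts by the total for each amino acid would give a usage frequency that can be compared with a published codon usage table for the host.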

Chosen Expression Organism and Reason

Escherichia coli

E. coli is commonly used for recombinant protein production due to its fast growth, low cost, and well-understood genetics.

Codon-Optimized DNA Sequence (for E. coli) using GenSmart™ Codon Optimization tool (https://www.genscript.com/gensmart-free-gene-codon-optimization.html)

ATGAGTAAAGGTGAAGAACTGTTTACCGGTGTGGTTCCGATCCTGGTTGAACTGGATGGTGATGTTAACGGTCATAAATTTTCAGTTTCTGGTGAAGGTGAAGGTGATGCTACCTATGGCAAATTGACTCTGAAGTTTATCTGTACCACTGGCAAATTGCCGGTGCCATGGCCAACTCTGGTGACCACTTTCTCTTATGGTGTACAGTGCTTCTCCCGTTATCCTGATCATATGAAACAGCATGATTTTTTCAAATCTGCTATGCCAGAAGGTTATGTTCAGGAAAGGACTATTTTCTTCAAGGATGATGGTAATTATAAAACTAGAGCTGAAGTTAAATTTGAAGGTGATACCTTGGTCAATCGTATTGAACTGAAAGGTATTGATTTTAAAGAAGATGGTAATATTCTGGGTCATAAACTGGAATATAATTATAATTCTCATAATGTTTATATTATGGCTGATCAGAAAAATGGTATTAAAGTTAATTTTAAAATTAGACATAATATTGAAGATAGTGGTTCAGTTCTGGCTGATCATTATCAGCAGAATACCCCAATTGGTGATGGTCCTGTTCTGCTGCCAGATAATCATTATTTGTCTACACAGAGTGCTTTGTCTAAGGATCCTAATGAAGAAAGAGATCATATGGTTCTGTTGGAATTTGTTACCGCTGCTGGTATTACACATGGTATGGATGAACTGTATTAA
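Because optimization must not change the encoded protein, it is worth translating an optimized sequence back and comparing it with the original amino acid sequence. The sketch below does this with the standard genetic code. `translate` and `first_mismatch` are hypothetical helper names, and the example input is only the first ten codons of the optimized sequence.

```python
from itertools import product

# Standard genetic code, with codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate in frame from the first base, stopping at a stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_mismatch(dna, protein):
    """Return the 1-based position of the first differing residue, or None."""
    translated = translate(dna)
    expected = "".join(protein.split()).upper()
    for pos, (a, b) in enumerate(zip(translated, expected), start=1):
        if a != b:
            return pos
    if len(translated) != len(expected):
        return min(len(translated), len(expected)) + 1
    return None


print(translate("ATGAGTAAAGGTGAAGAACTGTTTACCGGT"))  # MSKGEELFTG
```

Running `first_mismatch` on a full optimized sequence and the UniProt protein sequence is a quick way to catch a residue that was dropped or changed during optimization or copy-paste.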

3.4. You Have a Sequence! Now What?

Once the codon-optimized DNA sequence is obtained, several technologies can be used to produce the protein.

Cell-Dependent Method

The DNA sequence is inserted into a plasmid vector with a promoter.
The plasmid is transformed into a host cell (e.g., E. coli).
The host cell transcribes the DNA into mRNA.
Ribosomes translate the mRNA into the GFP protein.
The protein can be purified, for example by affinity chromatography if the construct includes an affinity tag such as a His tag.

Cell-Free Method

The DNA is added to a cell-free transcription–translation system.
Transcription and translation occur in vitro.
The protein is synthesized without living cells.

Both methods rely on the Central Dogma: DNA → RNA → Protein.

Part 4: Prepare a Twist DNA Synthesis Order

4.2. Build Your DNA Insert Sequence

I selected GFP as my protein of interest because of its well-characterized fluorescence properties and robust expression in E. coli. The coding sequence used in this expression cassette is the one that was codon-optimized for E. coli in Part 3 of this assignment.

I constructed a linear GFP expression cassette with annotated regulatory and coding regions.

Benchling link: https://benchling.com/s/seq-0svwetPOR91RRgpKfRjz?m=slm-l7fH5hxSe2vTJEp6n9YU
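The way such a cassette is put together can be sketched as joining parts in order while recording where each one lands. Everything specific here is an assumption for illustration: the part order (promoter, RBS, CDS, terminator), the placeholder `N` sequences, the truncated stand-in CDS, and the helper name `build_cassette`. The actual parts are in the Benchling file linked above.

```python
def build_cassette(parts):
    """Join (name, sequence) parts in order.

    Returns the full sequence and a list of (name, start, end) annotations
    using 1-based inclusive coordinates.
    """
    sequence = ""
    annotations = []
    for name, part in parts:
        start = len(sequence) + 1
        sequence += part.upper()
        annotations.append((name, start, len(sequence)))
    return sequence, annotations


# Placeholder parts (assumptions): real parts would come from the design file.
parts = [
    ("promoter", "N" * 35),
    ("RBS", "N" * 12),
    ("CDS", "ATGAGTAAAGGTTAA"),  # truncated stand-in, not the full GFP CDS
    ("terminator", "N" * 40),
]

cassette, annotations = build_cassette(parts)
for name, start, end in annotations:
    print(f"{name}: {start}..{end}")
```

Keeping the annotations next to the sequence makes it easy to confirm that the coding region starts with ATG and ends with a stop codon before ordering.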

I uploaded the GFP expression cassette and selected the pTwist Amp High Copy vector for synthesis.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) I would sequence DNA used for DNA-based digital data storage. Since digital files are encoded into synthetic DNA strands, sequencing is necessary to verify that the information was written correctly and can be accurately recovered. Even small errors could corrupt stored data, so reading the DNA back is how the reliability of the stored information is checked, both right after synthesis and after long-term storage.
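As a toy illustration of encoding a file into DNA, the sketch below maps every two bits to one base and back. The mapping (00→A, 01→C, 10→G, 11→T) and the names `encode` and `decode` are assumptions chosen for simplicity. Practical storage schemes add error correction and avoid problematic sequences such as long runs of one base, which this sketch does not attempt.

```python
# Two bits per base (an assumed, simplest-possible mapping).
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a DNA string produced by encode() back into bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in dna.upper())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

A single misread base changes two bits of the recovered file, which is why the sequencing step matters for verification.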

(ii) I would use Illumina sequencing by synthesis, a second-generation sequencing technology. The input would be the synthetic DNA pool, prepared by adding adapters and amplifying it before sequencing. The system reads DNA by incorporating fluorescently labeled bases one at a time and imaging each cycle to identify the sequence. The output is a set of short reads with per-base quality scores that can be decoded back into digital information.
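The reads and quality scores are typically delivered as FASTQ records, where each quality character encodes a Phred score Q and the estimated error probability is 10^(-Q/10). The sketch below decodes one record, assuming the common Phred+33 encoding. The record itself is made up, and the helper names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record into (read id, sequence, scores)."""
    header, sequence, _plus, quality = (line.strip() for line in lines)
    return header[1:], sequence, phred_scores(quality)


# Made-up record for illustration.
record = ["@read1", "ACGT", "+", "II5!"]
read_id, sequence, scores = parse_fastq_record(record)
print(read_id, sequence, scores)  # read1 ACGT [40, 40, 20, 0]
```

For data storage, low-scoring bases could be down-weighted or discarded before the reads are decoded back into bits.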

5.2 DNA Write

(i) I would synthesize a simple environmental biosensor circuit that expresses Green Fluorescent Protein when exposed to a specific contaminant. This would allow visible detection of environmental signals and could be used for low-cost monitoring or education.

(ii) I would use solid-phase chemical DNA synthesis based on phosphoramidite chemistry, such as the method used by Twist Bioscience. DNA is built one nucleotide at a time, and the resulting short oligonucleotides are assembled into larger constructs. The main limitations are that errors accumulate with every added base, which restricts the length of a directly synthesized strand, and that longer genes therefore require assembly and verification steps.
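The length limitation follows from simple arithmetic: if each coupling step succeeds with probability p, an oligo of n nucleotides needs n - 1 successful couplings, so the expected full-length fraction is p^(n-1). The sketch below evaluates this. The 99% efficiency is an illustrative value, not a figure from any vendor, and `full_length_fraction` is a hypothetical helper.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Expected fraction of strands that reach full length."""
    return coupling_efficiency ** (length_nt - 1)


# Illustrative per-step efficiency (assumption).
for length in (20, 100, 300):
    fraction = full_length_fraction(length, 0.99)
    print(f"{length} nt: {fraction:.3f}")
```

The fraction falls off exponentially with length, which is why genes are assembled from shorter pieces rather than synthesized in one run.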

5.3 DNA Edit

(i) I would edit plant genes related to drought tolerance to improve crop resilience under climate stress. Enhancing stress-response pathways could support food security and more sustainable agriculture.

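(ii) I would use CRISPR-Cas9, developed for genome editing by scientists including Jennifer Doudna. A guide RNA directs the Cas9 enzyme to a matching DNA sequence next to a PAM motif, where it makes a double-strand break. The cell then repairs the break, either by error-prone end joining, which can disrupt a gene, or by using a supplied repair template, which can introduce a specific change. Limitations include possible off-target effects and variable editing efficiency.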
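Choosing a guide starts with finding candidate target sites. A minimal sketch, assuming the commonly used Cas9 from Streptococcus pyogenes, which needs a 20-nucleotide target followed by an NGG PAM: the code scans only the forward strand, the function name `find_spcas9_targets` is hypothetical, and the input sequence is made up. Real guide design would also scan the reverse strand and score off-target risk.

```python
import re

# 20-nt protospacer followed by an NGG PAM; the lookahead allows overlaps.
TARGET_PATTERN = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_spcas9_targets(seq):
    """Return (0-based start, protospacer, PAM) for forward-strand sites."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in TARGET_PATTERN.finditer(seq)]


# Made-up sequence with a single candidate site.
example = "A" * 20 + "TGG"
for start, protospacer, pam in find_spcas9_targets(example):
    print(start, protospacer, pam)
```

Each protospacer found this way is a candidate guide sequence for the gene of interest.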