Week 2 HW: DNA Read Write and Edit

DNA Design Challenge

Chosen Protein: I chose GFP because it serves as a robust reporter that could be used for my allergen biosensor. The goal of the device is to turn a biological detection (sensing peanut DNA) into a signal the user can see. GFP spontaneously fluoresces green when exposed to UV or blue light (like a simple black light LED). By designing the system so that GFP is activated only when the allergen is detected (or shut off in the presence of the allergen), I can create an intuitive user interface.

Amino Acid Sequence:

sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 PE=1 SV=1 MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.: atgagtaaag gagaagaact tttcactgga gttgtcccaa ttcttgttga attagatggt gatgttaatg ggcacaaatt ttctgtcagt ggagagggtg aaggtgatgc aacatacgga aaacttaccc ttaaatttat ttgcactact ggaaaactac ctgttccatg gccaacactt gtcactactt tctgttatgg tgttcaatgc ttttcaagat acccagatca tatgaaacag catgactttt tcaagagtgc catgcccgaa ggttatgtac aggaaagaac tatatttttc aaagatgacg ggaactacaa gacacgtgct gaagtcaagt ttgaaggtga tacccttgtt aatagaatcg agttaaaagg tattgatttt aaagaagatg gaaacattct tggacacaaa ttggaataca actataactc acacaatgta tacatcatgg cagacaaaca aaagaatgga atcaaagtta acttcaaaat tagacacaac attgaagatg gaagcgttca actagcagac cattatcaac aaaatactcc aattggcgat ggccctgtcc ttttaccaga caaccattac ctgtccacac aatctgccct ttccaaagat cccaacgaaa agagagatca catggtcctt cttgagtttg taacagctgc tgggattaca catggcatgg atgaactata caaa

Codon optimization: Codon optimization is necessary because the genetic code is redundant; while multiple codons can specify the same amino acid, different organisms utilize these codons with varying frequencies. By rewriting the sequence to use the host’s “preferred” codons, we ensure that the corresponding tRNA molecules are readily available, which prevents ribosomal stalling and significantly accelerates protein synthesis. Additionally, optimization allows for the removal of inhibitory mRNA secondary structures and “unfavorable” sequences that could lead to premature degradation of the genetic instructions, ultimately ensuring a more robust and rapid visual signal for the user

For this assignment, I have chosen to optimize the sequence for Escherichia coli. As the most well-characterized model organism in synthetic biology, E. coli provides a reliable and standard baseline for protein expression with highly optimized commercial algorithms available for sequence design.

While E. coli is the selected host due to the robust, pre-set optimization tools available on platforms like Twist Bioscience, it serves as a functional baseline for standardization and ease of integration into common laboratory workflows. However, the ideal biological choice for a timely sensor would be Vibrio natriegens (TaxID: 1219067), which possesses a doubling time of under 10 minutes and a significantly higher ribosomal density. Utilizing E. coli for this optimization ensures a high Codon Adaptation Index (CAI) and reliable synthesis, though a custom-optimized Vibrio sequence would remain the preferred engineering solution for achieving maximum metabolic speed in a real-world application.

Optimized using Twist’s Codon optimization tool for E. Coli, avoiding EcoRI, BamHI, XhoI, HindIII, BasI, and BbsI cut sites. ATGTCCAAAGGTGAAGAGTTGTTTACCGGCGTTGTTCCCATCTTAGTGGAGCTCGACGGAGATGTCAATGGTCACAAATTCAGTGTATCAGGTGAAGGGGAAGGCGACGCGACATACGGGAAACTTACCTTAAAATTTATATGCACCACCGGCAAATTGCCCGTACCATGGCCAACGTTAGTGACCACCTTTTCCTACGGTGTCCAGTGCTTTTCACGGTACCCGGATCACATGAAACAGCACGACTTCTTCAAGTCCGCGATGCCGGAAGGCTACGTTCAAGAGCGCACCATATTCTTCAAGGATGACGGAAACTACAAGACGCGAGCAGAGGTTAAGTTCGAGGGAGACACCTTGGTAAATCGAATTGAATTAAAAGGCATTGACTTCAAGGAAGATGGAAACATCCTGGGCCATAAGCTGGAGTACAACTACAATAGTCATAATGTTTACATCATGGCGGATAAACAAAAGAATGGTATCAAGGTCAACTTCAAGATACGACACAATATCGAAGATGGATCTGTCCAATTAGCGGACCACTACCAGCAAAATACCCCCATTGGTGATGGTCCAGTTCTGCTCCCGGACAACCACTATTTGAGTACACAGTCGGCCCTCTCTAAGGACCCTAACGAAAAGCGGGACCATATGGTGCTCCTGGAATTTGTAACGGCCGCCGGAATTACCCACGGCATGGACGAGCTGTACAAATGA

Now what?:

8The Transcription and Translation Process The process occurs in two stages. First, transcription happens when RNA Polymerase binds to the promoter and creates a messenger RNA (mRNA) strand. Next, translation begins as ribosomes dock onto the mRNA at the Ribosome Binding Site. The ribosome “reads” the mRNA codons and recruits tRNA molecules to assemble amino acids—starting at your M (Methionine) and ending at the TAA stop codon—folding the chain into a functional protein.

Cell-Dependent Production In cell-dependent systems, the DNA is inserted into a plasmid and transformed into a living host like E. coli or V. natriegens. The host’s metabolism provides the energy (ATP) and raw amino acids needed for synthesis. While highly scalable for mass production, this method requires significant time for cell growth and complex purification steps to separate your target protein from the host’s cellular components.

Cell-Free Production A cell-free (TX-TL) system mixes the linear DNA fragment directly with a lysate containing ribosomes and enzymes harvested from broken-open cells. This might be the ideal “timely” choice for a sensor because it eliminates the need to wait for living cultures to grow. It allows for immediate protein production and is more resilient to the lysis buffers used in food testing, though it is generally more expensive for large-scale use.

Prepare a Twist DNA Synthesis Order

Linear Map (Annotated)

Twist Cloning Vector

DNA Read/Write/Edit

Read

What DNA would you want to sequence (e.g., read) and why? I would sequence the metagenomic DNA from the human gut microbiome of patients with neurodegenerative diseases like Alzheimer’s. Research into the “gut-brain axis” suggests that microbial diversity and specific bacterial metabolites (such as short-chain fatty acids) directly influence neuroinflammation. By reading the entire microbial community, we can identify specific species that are neuroprotective or neurotoxic, providing a non-invasive way to discover early biomarkers for brain health.

In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? I would use Illumina Next-Seq because it offers the optimal balance of accuracy and throughput required for complex microbiome analysis. When sequencing a fecal sample to understand the gut-brain axis, we are looking at a soup of DNA from thousands of different bacterial species. To detect rare but potentially influential neuro-active microbes, we need deep sequencing, meaning we must read the sample millions of times. Illumina’s second-generation technology is the industry leader for this because its cost per base is significantly lower than third-generation methods (like Nanopore) and its base-calling accuracy is higher (99.9%), ensuring that we don’t misidentify a bacterial species due to a sequencing error.

This is a second-generation technology. It is defined by its “sequencing-by-synthesis” (SBS) method, which uses massive parallelism to read millions of short DNA fragments simultaneously with extremely high accuracy.
The input is high-quality genomic DNA extracted from fecal samples. The essential preparation steps are fragmentation, adapter ligation, and PCR amplification
The sequencer adds fluorescently labeled nucleotides one at a time. Each time a base is incorporated, it emits a specific color. A high-resolution camera records these flashes, and the software performs base calling by converting the sequence of colors into a digital string of A, C, G, and T.
The output is a FASTQ file containing millions of short, highly accurate “reads” which are then computationally assembled to map the microbial species present in the gut.

Write

What DNA would you want to synthesize (e.g., write) and why? I would synthesize a closed-loop genetic logic gate circuit for use in iPSC-derived human microglia. This circuit would be designed to sense high levels of pro-inflammatory cytokines (like IL-1β) or Amyloid-beta and respond by triggering the production of a therapeutic anti-inflammatory protein (like IL-10) or a fluorescent reporter like mCherry for real-time monitoring of disease state in the lab. This would be useful because current drugs for Alzheimer’s are always on, leading to side effects. A synthetic circuit in iPSC-derived microglia could trigger a therapeutic response (like releasing a clearing enzyme) only when it senses high levels of Amyloid-beta or inflammatory cytokines.

What technology or technologies would you use to perform this DNA synthesis and why? I would use Twist Bioscience’s silicon platform because synthetic genetic circuits require the writing of many different regulatory parts (promoters, insulators, and reporters) that must work with each other. Twist’s technology is superior for this because it uses a semiconductor-based approach to synthesize thousands of Gene Fragments in parallel on a single silicon chip. For a BME project, this allows me to multiplex. I can order 50 variations of my Alzheimer’s-sensing circuit with slightly different promoter strengths to see which one has the best signal-to-noise ratio in my iPSC-derived microglia.

Twist utilizes a semiconductor-based silicon platform to perform traditional phosphoramidite chemistry in miniature. The process involves a four-step cycle: De-protection (removing a blocking group), Coupling (adding the next nucleotide), Capping (preventing incorrect chains from growing), and Oxidation (stabilizing the bond). This is repeated for each base until the custom sequence is complete.
Limitations:

Speed: Synthesis and shipping typically take 5–10 days, which is slower than biological replication.
Accuracy: Chemical synthesis has a small error rate that compounds with length; therefore, sequences longer than 2kb must be built by stitching together smaller, verified fragments.
Scalability: While the silicon platform allows for synthesizing thousands of different genes at once, the cost per base remains higher than large-scale natural DNA replication.

Edit

What DNA would you want to edit and why? I would want to edit the human APOE gene to convert the high-risk APOE4 allele into the protective APOE2 allele. APOE4 is the strongest genetic risk factor for late-onset Alzheimer’s, while APOE2 is known to be neuroprotective. Making this switch in human neural stem cells could fundamentally change a person’s risk profile and slow the progression of neurodegeneration.

What technology or technologies would you use to perform these DNA edits and why? I would choose CRISPR Base Editing specifically because of its safety profile in post-mitotic or sensitive cells like those found in the brain. Standard CRISPR-Cas9 (first-generation editing) creates Double-Strand Breaks, which can trigger a p53-mediated toxicity response or cause large, unintended deletions that could be catastrophic in a neural environment. Base Editing is the superior choice for a therapeutic application in Alzheimer’s research because it performs a search and replace at the single-atom level. By chemically converting the target nucleotide without ever cutting the DNA backbone, we minimize the risk of genomic instability while achieving the precise C -> T flip needed to convert the APOE4 risk allele into the protective APOE2 variant

Base editing uses a “deactivated” Cas9 (dCas9) or a nickase (nCas9) fused to a deaminase enzyme. Unlike standard CRISPR, it does not cut the DNA backbone. The process begins with targeting, where a custom-designed guide RNA leads the Base Editor complex to the precise location of the APOE SNP within the genome. Once the target is reached, the Cas9 domain performs unzipping, pulling the DNA strands apart to create a localized window of single-stranded DNA. Finally, the chemical conversion occurs; the fused deaminase enzyme physically rearranges the atoms of a specific base—for instance, converting a Cytosine into a Uracil. The cell’s natural repair machinery then recognizes this change and converts it into a Thymine (C -> T), effectively flipping the genetic switch from a risk allele to a protective one without ever creating a double-strand break.
For preparation, we must design a guide RNA with a 20bp spacer that is unique to the APOE locus. The input is typically the Base Editor protein (or mRNA) and the synthetic gRNA, delivered via lipid nanoparticles or viral vectors.
Despite its precision, the method faces a significant challenge in efficiency, as not every target cell will successfully receive the editor or undergo the chemical conversion. This results in mosaicism, a state where only a fraction of the neural population is corrected while others remain in the high-risk state. Furthermore, the technology is limited by its precision regarding bystander editing. If other identical bases (such as multiple Cytosines) are located within the narrow activity window of the deaminase enzyme, the editor may unintentionally change those nearby bases as well, potentially leading to unintended genetic modifications.