Week 2 HW: DNA Read, Write, and Edit

Part 0: Designing your Gel Art

Step 1

Step 2

Step 3

Step 4

EcoRI

HindIII

BamHI

KpnI

EcoRV

SacI

SalI

All restriction enzymes

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Part 3: DNA Design Challenge

3.1. Choose your protein. In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose. [Example from our group homework, you may notice the particular format — The example below came from UniProt] >sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL EAVIRTVTTLQQLLT

I chose the Tau protein because it is a microtubule-associated protein that helps regulate neuronal microtubules, and it is strongly connected to neurodegenerative disease (tauopathies, including Alzheimer’s disease). Tau is also interesting computationally because it has multiple splice isoforms and large intrinsically disordered regions.

* UniProt Accession - P10636
* Entry name - TAU_HUMAN
* Protein - Microtubule-associated protein tau
* Gene - MAPT

>sp|P10636|TAU_HUMAN Microtubule-associated protein tau OS=Homo sapiens OX=9606 GN=MAPT PE=1 SV=5 MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPG SETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAG HVTQEPESGKVVQEGFLREPGPPGLSHQLMSGMPGAPLLPEGPREATRQPSGTGPEDTEG GRHAPELLKHQLLGDLHQEGPPLKGAGGKERPGSKEEVDEDRDVDESSPQDSPPSKASPA QDGRPPQTAAREATSIPGFPAEGAIPLPVDFLSKVSTEIPASEPDGPSVGRAKGQDAPLE FTFHVEITPNVQKEQAHSEEHLGRAAFPGAPGEGPEARGPSLGEDTKEADLPEPSEKQPA AAPRGKPVSRVPQLKARMVSKSKDGTGSDDKKAKTSTRSSAKTLKNRPCLSPKHPTPGSS DPLIQPSSPAVCPEPPSSPKYVSSVTSRTGSSGAKEMKLKGADGKTKIATPRGAAPPGQK GQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREP KKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLD LSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEK LDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDT SPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence. The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above. [Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]

Lysis protein DNA sequence atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

CDS (5’ - 3’ 1 atggctgagc cccgccagga gttcgaagtg atggaagatc acgctgggac gtacgggttg 61 ggggacagga aagatcaggg gggctacacc atgcaccaag accaagaggg tgacacggac 121 gctggcctga aagaatctcc cctgcagacc cccactgagg acggatctga ggaaccgggc 181 tctgaaacct ctgatgctaa gagcactcca acagcggaag ctgaagaagc aggcattgga 241 gacaccccca gcctggaaga cgaagctgct ggtcacgtga cccaagagga gttgagagtt 301 ccgggccggc agaggaaggc gcctgaaagg cccctggcca atgagattag cgcccacgtc 361 cagcctggac cctgcggaga ggcctctggg gtctctgggc cgtgcctcgg ggagaaagag 421 ccagaagctc ccgtcccgct gaccgcgagc cttcctcagc accgtcccgt ttgcccagcg 481 cctcctccaa caggaggccc tcaggagccc tccctggagt ggggacaaaa aggcggggac 541 tgggccgaga agggtccggc ctttccgaag cccgccacca ctgcgtatct ccacacagag 601 cctgaaagtg gtaaggtggt ccaggaaggc ttcctccgag agccaggccc cccaggtctg 661 agccaccagc tcatgtccgg catgcctggg gctcccctcc tgcctgaggg ccccagagag 721 gccacacgcc aaccttcggg gacaggacct gaggacacag agggcggccg ccacgcccct 781 gagctgctca agcaccagct tctaggagac ctgcaccagg aggggccgcc gctgaagggg 841 gcagggggca aagagaggcc ggggagcaag gaggaggtgg atgaagaccg cgacgtcgat 901 gagtcctccc cccaagactc ccctccctcc aaggcctccc cagcccaaga tgggcggcct 961 ccccagacag ccgccagaga agccaccagc atcccaggct tcccagcgga gggtgccatc 1021 cccctccctg tggatttcct ctccaaagtt tccacagaga tcccagcctc agagcccgac 1081 gggcccagtg tagggcgggc caaagggcag gatgcccccc tggagttcac gtttcacgtg 1141 gaaatcacac ccaacgtgca gaaggagcag gcgcactcgg aggagcattt gggaagggct 1201 gcatttccag gggcccctgg agaggggcca gaggcccggg gcccctcttt gggagaggac 1261 acaaaagagg ctgaccttcc agagccctct gaaaagcagc ctgctgctgc tccgcggggg 1321 aagcccgtca gccgggtccc tcaactcaaa gctcgcatgg tcagtaaaag caaagacggg 1381 actggaagcg atgacaaaaa agccaagaca tccacacgtt cctctgctaa aaccttgaaa 1441 aataggcctt gccttagccc caaacacccc actcctggta gctcagaccc tctgatccaa 1501 ccctccagcc ctgctgtgtg cccagagcca ccttcctctc ctaaatacgt ctcttctgtc 1561 acttcccgaa ctggcagttc tggagcaaag gagatgaaac tcaagggggc tgatggtaaa 1621 acgaagatcg ccacaccgcg gggagcagcc cctccaggcc agaagggcca ggccaacgcc 1681 accaggattc 
cagcaaaaac cccgcccgct ccaaagacac cacccagctc tggtgaacct 1741 ccaaaatcag gggatcgcag cggctacagc agccccggct ccccaggcac tcccggcagc 1801 cgctcccgca ccccgtccct tccaacccca cccacccggg agcccaagaa ggtggcagtg 1861 gtccgtactc cacccaagtc gccgtcttcc gccaagagcc gcctgcagac agcccccgtg 1921 cccatgccag acctgaagaa tgtcaagtcc aagatcggct ccactgagaa cctgaagcac 1981 cagccgggag gcgggaaggt gcagataatt aataagaagc tggatcttag caacgtccag 2041 tccaagtgtg gctcaaagga taatatcaaa cacgtcccgg gaggcggcag tgtgcaaata 2101 gtctacaaac cagttgacct gagcaaggtg acctccaagt gtggctcatt aggcaacatc 2161 catcataaac caggaggtgg ccaggtggaa gtaaaatctg agaagcttga cttcaaggac 2221 agagtccagt cgaagattgg gtccctggac aatatcaccc acgtccctgg cggaggaaat 2281 aaaaagattg aaacccacaa gctgaccttc cgcgagaacg ccaaagccaa gacagaccac 2341 ggggcggaga tcgtgtacaa gtcgccagtg gtgtctgggg acacgtctcc acggcatctc 2401 agcaatgtct cctccaccgg cagcatcgac atggtagact cgccccagct cgccacgcta 2461 gctgacgagg tgtctgcctc cctggccaag cagggtttgt ga
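Because most amino acids are encoded by several codons, reverse translation does not have a single answer: the NCBI record gives the native sequence, while a tool gives just one of many valid sequences. The sketch below illustrates this with the standard genetic code. The function names are hypothetical, and the codon choice in `reverse_translate` (first codon in table order) is an arbitrary assumption, not what any particular online tool does.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame coding sequence (5'->3') up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def reverse_translate(protein):
    """Pick one codon per residue (arbitrary: first in table order) and append a stop codon."""
    first_codon = {}
    for codon, aa in CODON_TABLE.items():
        first_codon.setdefault(aa, codon)
    return "".join(first_codon[aa] for aa in protein) + "TAA"


# First seven codons of the MS2 lysis gene quoted above.
print(translate("atggaaacccgattccctcag"))  # METRFPQ

# A different DNA sequence that still encodes the same peptide.
dna = reverse_translate("METRFPQ")
print(dna, translate(dna) == "METRFPQ")
```

Translating a candidate DNA sequence back and comparing it with the UniProt entry is a quick way to confirm that the nucleotide record and the protein record describe the same isoform.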

3.3. Codon optimization. Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why? [Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]

Lysis protein DNA sequence with Codon-Optimization ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

Even though different codons can encode the same amino acid, organisms prefer certain codons. If a gene from one organism is expressed in a different host, the host may:
* Translate it slowly because the needed tRNAs are rare.
* Have ribosome stalling, premature termination, or misfolding issues.
* Produce much lower protein yield.

I would optimize Tau (MAPT) for Escherichia coli because it is the most common, fast, and inexpensive expression host for recombinant proteins in teaching labs and basic cloning. It also has a strong codon bias relative to human genes, so optimization often makes a big difference.

The tool reports statistics for the original versus codon-optimized DNA sequence. The pasted sequence has a GC content of 61.19% and a CAI of 0.51, where CAI (Codon Adaptation Index) indicates how closely the codon usage matches the preferred codons of the chosen expression host. A value around 0.5 suggests only moderate adaptation and potentially less efficient translation. After optimization, the improved DNA keeps a very similar GC content (60.59%) but raises the CAI to 0.90, meaning the DNA was rewritten using synonymous codons that are much more favored by the host while still encoding the same protein. In addition, the avoid-cleavage-sites setting shows that the optimized sequence was designed to remove internal recognition sites for BsaI and Esp3I (an isoschizomer of BsmBI, recognizing the same site), which helps prevent unwanted cutting during Type IIS cloning (for example, Golden Gate assembly).
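To make the two reported statistics concrete, here is a minimal sketch of how GC content and CAI can be computed. CAI is taken here as the geometric mean of per-codon weights, where each weight is the codon's usage frequency divided by that of the most-used synonymous codon. The usage table is a made-up toy (not real E. coli frequencies), and the function names are hypothetical; the optimization tool's exact reference table and formula may differ.

```python
import math


def gc_content(dna):
    """Percentage of G and C bases in the sequence."""
    dna = dna.upper()
    return 100.0 * sum(dna.count(b) for b in "GC") / len(dna)


def relative_adaptiveness(usage):
    """w(codon) = frequency of codon / frequency of the most-used synonymous codon."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / best
    return weights


def cai(dna, weights):
    """Geometric mean of the weights of all in-frame codons present in the weight table."""
    dna = dna.upper()
    logs = [math.log(weights[dna[i:i + 3]])
            for i in range(0, len(dna) - 2, 3)
            if dna[i:i + 3] in weights]
    return math.exp(sum(logs) / len(logs))


# Hypothetical usage frequencies for two amino acids (NOT real E. coli values).
TOY_USAGE = {
    "Leu": {"CTG": 0.50, "CTT": 0.10, "TTA": 0.10},
    "Arg": {"CGT": 0.40, "AGA": 0.04},
}
W = relative_adaptiveness(TOY_USAGE)

print(round(gc_content("ATGCGC"), 2))  # 66.67
print(round(cai("CTGCGT", W), 2))      # 1.0  (only preferred codons)
print(round(cai("CTTAGA", W), 2))      # 0.14 (only rare codons)
```

Both toy sequences encode Leu-Arg, so the jump from 0.14 to 1.0 reflects codon choice alone, which is the same effect as the 0.51 to 0.90 change reported by the tool.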

3.4. You have a sequence! Now what? What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Cell-dependent (in vivo) expression in E. coli

Tau protein can be produced by recombinant expression in living E. coli. The optimized tau coding sequence is cloned into a bacterial expression plasmid containing an E. coli promoter, a Shine-Dalgarno ribosome binding site (RBS), and a transcription terminator, often with an affinity tag such as a His-tag to aid purification. After transformation into E. coli and induction (with IPTG in lac/T7 systems), RNA polymerase transcribes the tau gene into mRNA. Ribosomes bind the RBS and translate the mRNA from the start codon to the stop codon as tRNAs add amino acids according to each codon. Because the sequence is codon-optimized (high CAI) for E. coli, its codons better match abundant bacterial tRNAs, which can improve translation efficiency and increase protein yield. The expressed tau protein is then harvested from the cells and purified, commonly by affinity chromatography.

Cell-free (in vitro) expression

Tau protein can also be produced using cell-free expression, where the tau DNA template (plasmid or linear) is added directly to an E. coli lysate or a purified transcription/translation system containing RNA polymerase, ribosomes, tRNAs, amino acids, nucleotides, and an energy regeneration mixture. In this setup, the DNA is transcribed into mRNA and translated into tau protein in vitro, without maintaining living cells. Cell-free systems are typically faster to set up and allow precise control of reaction conditions (such as temperature and reaction composition), which can be useful for optimizing yield or expressing proteins that are difficult or toxic to produce in vivo.

3.5. [Optional] How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!!

In natural biological systems (especially eukaryotes), one gene can produce multiple protein isoforms because the primary transcript (pre-mRNA) can be processed in different ways before translation. The main transcription-level (and immediate RNA-processing) mechanisms are alternative promoter usage (different transcription start sites, giving different 5’ ends), alternative splicing (different combinations of exons retained or removed, giving different coding sequences), alternative polyadenylation (different 3’ ends that can change UTRs and sometimes coding regions), and, in some organisms, RNA editing (changing specific bases in the RNA, which can alter codons). Among these, alternative splicing is the most common explanation for “one gene -> many proteins”, because including or skipping specific exons changes the mRNA codons and therefore the amino acid sequence of the final protein.

For example:

GENE (coding strand DNA)
Exon 1:  ATG GCT
Exon 2:  GAA TTT
Exon 3:  CCT TAA

Pre‑mRNA (conceptually: exons + introns transcribed, then introns removed)
Mature mRNA Isoform A (Exon 1 + Exon 2 + Exon 3):
AUG GCU GAA UUU CCU UAA

Translation (codon → amino acid) Isoform A:
AUG  GCU  GAA  UUU  CCU  UAA
Met  Ala  Glu  Phe  Pro  Stop


Mature mRNA Isoform B (Exon 1 + Exon 3; exon 2 skipped):
AUG GCU CCU UAA

Translation Isoform B:
AUG  GCU  CCU  UAA
Met  Ala  Pro  Stop
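The same toy gene can be run through a short script to reproduce both isoforms. This is only a sketch: the exon sequences are the made-up ones from the example above, introns are ignored, and the helper names are hypothetical. Each toy exon is a multiple of three bases long, so skipping exon 2 keeps the reading frame; a skipped exon of any other length would shift the frame downstream.

```python
from itertools import product

# Standard genetic code written with RNA bases, codons ordered U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Toy exons from the example above (coding-strand DNA).
EXONS = {1: "ATGGCT", 2: "GAATTT", 3: "CCTTAA"}


def splice(exon_ids):
    """Join the selected exons in order (introns are not modelled)."""
    return "".join(EXONS[i] for i in exon_ids)


def transcribe(coding_dna):
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


for name, exon_ids in {"A": (1, 2, 3), "B": (1, 3)}.items():
    mrna = transcribe(splice(exon_ids))
    print(name, mrna, translate(mrna))
# A AUGGCUGAAUUUCCUUAA MAEFP
# B AUGGCUCCUUAA MAP
```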

Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account and a Benchling account

4.2. Build Your DNA Insert Sequence

For example, let’s make a sequence that will make E. coli glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein):

In Benchling, select New DNA/RNA sequence.

Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).

Go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with your codon optimized DNA sequence of interest!). Each time you add a new piece of the sequence, make sure to annotate by right clicking over the sequence and creating an annotation that describes what each piece (e.g., Promoter, RBS, etc.) is (see image below).

Once you’ve completed this, click on Linear Map to preview the entire sequence. If you intend to have a TA review a sequence in the future, this is a good way to verify that all sections are annotated!

Linear Map After Annotated

This insert sequence you built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). Go ahead and download the FASTA file for the sequence you made.
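The paste-and-annotate workflow can also be sketched in code, which makes it easy to check that every part ended up in the right place and in frame. Everything here is illustrative: the promoter, RBS, and terminator are `N` placeholders rather than the course's actual part sequences, the coding sequence is a toy, and `Part`, `assemble`, and `to_fasta` are hypothetical helpers, not Benchling functions. Only the start codon, the 6xHis tag codons, and the stop codon are real sequence.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    seq: str


def assemble(parts):
    """Concatenate parts and record 1-based, inclusive coordinates for each annotation."""
    pieces, annotations, pos = [], [], 1
    for part in parts:
        annotations.append((part.name, pos, pos + len(part.seq) - 1))
        pieces.append(part.seq)
        pos += len(part.seq)
    return "".join(pieces), annotations


def to_fasta(name, seq, width=60):
    """Format a sequence as FASTA with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


PARTS = [
    Part("Promoter", "N" * 35),             # placeholder, not a real promoter
    Part("RBS", "N" * 20),                  # placeholder
    Part("Start Codon", "ATG"),
    Part("Coding Sequence", "GCTGAATTTCCT"),  # toy CDS; use your codon-optimized sequence
    Part("His Tag", "CATCACCATCACCATCAC"),  # six histidine codons
    Part("Stop Codon", "TAA"),
    Part("Terminator", "N" * 40),           # placeholder
]

cassette, annotations = assemble(PARTS)
for name, start, end in annotations:
    print(f"{name:16s} {start:4d}..{end}")

# Start codon through stop codon must be a whole number of codons.
orf_len = annotations[5][2] - annotations[2][1] + 1
print("ORF in frame:", orf_len % 3 == 0)
print(to_fasta("my_expression_cassette", cassette))
```

The coordinate list plays the role of the Linear Map: if a part is missing or the ORF length is not divisible by three, it shows up before the sequence is uploaded anywhere.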

It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey your designs. Here’s an example of what you just annotated in Benchling:

4.3. On Twist, Select The “Genes” Option

4.4. Select “Clonal Genes” option

For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project.

Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly.

Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. An advantage is that, if designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.

4.5. Import your sequence

You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.

4.6. Choose Your Vector

Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle!

The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance.

Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later.
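Before flanking an insert with cut sites, it helps to confirm that the chosen enzymes do not also cut inside it, and that no Type IIS sites (BsaI, BsmBI, BbsI) remain. The sketch below scans both strands for a few well-known recognition sequences and then adds EcoRI and HindIII sites to the ends. The choice of EcoRI/HindIII and the helper names are assumptions for illustration; real designs often also add a few spare bases outside each site, which this sketch omits.

```python
# Recognition sequences (5'->3') for a few common enzymes.
SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
}


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based positions]} for sites found on either strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        # Non-palindromic (Type IIS) sites must be searched in both orientations.
        motifs = {site, reverse_complement(site)}
        positions = sorted(
            i + 1
            for motif in motifs
            for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif
        )
        if positions:
            hits[enzyme] = positions
    return hits


def flank(insert, left="EcoRI", right="HindIII"):
    """Add cut sites to both ends, refusing if either enzyme also cuts inside the insert."""
    internal = find_sites(insert, {e: SITES[e] for e in (left, right)})
    if internal:
        raise ValueError(f"internal sites found: {internal}")
    return SITES[left] + insert.upper() + SITES[right]


print(find_sites("AAAGAATTCAAA"))  # {'EcoRI': [4]}
print(find_sites("GAGACC"))        # {'BsaI': [1]}  (site on the reverse strand)
print(flank("ATGGCTTAA"))          # GAATTCATGGCTTAAAAGCTT
```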

For this demonstration, choose a Twist cloning vector such as pTwist Amp High Copy.

Click into your sequence and select download construct (GenBank) to get the full plasmid sequence:

Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded.

Link

This is the plasmid I just built with my expression cassette included. Congratulations on building my first plasmid, My Self :-)

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I have selected the MAPT gene (Microtubule-Associated Protein Tau), which encodes the Tau protein, as the DNA sequence I would prioritize for sequencing. Located on chromosome 17q21 in humans, the MAPT gene spans approximately 134 kilobases and consists of 16 exons. Tau protein plays a critical role in stabilizing microtubules within neurons, facilitating axonal transport and maintaining neuronal structure. Sequencing this gene offers profound insights into human health, particularly neurodegenerative diseases, while also holding potential for broader applications in biobanking and bioinformatics.

Rationale for Sequencing the MAPT Gene

1. Relevance to Human Health and Disease Research

Alzheimer’s Disease (AD) - Mutations in MAPT or hyperphosphorylation of Tau lead to neurofibrillary tangles, a hallmark of AD pathology. Sequencing MAPT could identify genetic variants (e.g., single nucleotide polymorphisms like rs8070723) that increase susceptibility, enabling risk stratification and early intervention.

Frontotemporal Dementia (FTD) and Other Tauopathies - Over 50 pathogenic mutations in MAPT have been linked to FTD, progressive supranuclear palsy, and corticobasal degeneration. High-throughput sequencing could reveal novel variants, informing precision medicine approaches such as gene therapy or small-molecule inhibitors targeting Tau misfolding.

Tau dysfunction is implicated in conditions like chronic traumatic encephalopathy (CTE) from repetitive brain injuries. Comparative sequencing across populations could elucidate gene-environment interactions, such as the role of lifestyle factors in disease onset.

By sequencing MAPT, researchers could generate data for genome-wide association studies (GWAS), accelerating drug discovery. For instance, CRISPR-based editing of faulty MAPT sequences has shown promise in preclinical models, potentially translating to therapeutic applications.

2. Applications in Environmental Monitoring and Biodiversity

Analyzing MAPT orthologs in model organisms (e.g., mice, zebrafish) or even ancient DNA from extinct species (e.g., Neanderthals via paleogenomics) could provide evolutionary insights into brain development and resilience. This ties into biodiversity monitoring by highlighting conserved genetic elements across species, aiding conservation efforts in ecosystems affected by climate change.

In scenarios like wastewater analysis (e.g., eDNA sequencing), detecting human-derived MAPT fragments could serve as biomarkers for population health surveillance, such as tracking neurodegenerative disease prevalence in urban sewage systems.

3. Beyond Traditional Applications: DNA Data Storage and Biobanks

Integrating MAPT sequences into large-scale biobanks (e.g., the UK Biobank or All of Us Research Program) would create comprehensive datasets for AI-driven analysis. This could facilitate machine learning models predicting disease trajectories based on genetic, epigenetic, and phenotypic data.

Emerging technologies use synthetic DNA for high-density, long-term data storage. Studying the stable, repetitive structure of MAPT could inspire bioengineered storage systems, where natural gene motifs encode digital information with error-correcting capabilities. This represents a futuristic intersection of biology and information technology, potentially revolutionizing data archiving in an era of big data.

Methodological Considerations

Sequencing Techniques - I recommend next-generation sequencing (NGS) platforms like Illumina NovaSeq for high accuracy, supplemented by long-read technologies (e.g., PacBio) to resolve MAPT’s repetitive regions. Bioinformatics tools such as GATK for variant calling and AlphaFold for protein structure prediction would enhance data interpretation.

Interpreting non-coding regions and epigenetic modifications (e.g., methylation) requires interdisciplinary expertise. Future work could involve single-cell sequencing to capture neuronal heterogeneity.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

For sequencing the MAPT gene (encoding Tau protein), I would employ a hybrid approach combining Illumina NovaSeq (a second-generation sequencing platform) as the primary technology, supplemented by PacBio Sequel (a third-generation platform) for targeted regions.

Why Illumina NovaSeq? It is a high-throughput, short-read next-generation sequencing (NGS) system ideal for accurate, cost-effective sequencing of targeted genes like MAPT. With read lengths of 100–300 base pairs (bp) and error rates below 0.1%, it excels in variant detection, including single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) relevant to tauopathies. Its scalability suits biobank-scale projects, and costs have dropped to ~$0.01 per million bases, making it accessible for human health research. However, it struggles with highly repetitive or GC-rich regions in MAPT, which is why supplementation is needed.

Why Supplement with PacBio Sequel? This long-read technology provides reads up to 20–30 kb, enabling resolution of MAPT’s complex repetitive exons (e.g., exon 10 repeats linked to splicing isoforms). It is particularly useful for phasing haplotypes and detecting structural variants, which short-read methods might miss.

Illumina NovaSeq - This is a second-generation sequencing technology. Second-generation methods, also known as NGS, involve massively parallel sequencing of amplified DNA fragments, producing short reads (typically <500 bp) with high throughput. Unlike first-generation (e.g., Sanger sequencing, which is chain-termination based and low-throughput) or third-generation (e.g., single-molecule, long-read methods), it relies on amplification and reversible terminator chemistry for base-by-base synthesis and detection.

PacBio Sequel - This is a third-generation technology. It sequences single DNA molecules in real time without amplification, generating long reads directly from native DNA. This distinguishes it from second-generation methods by avoiding PCR bias and enabling direct observation of epigenetic modifications.

The input for both technologies is high-quality genomic DNA extracted from samples (e.g., blood, tissue, or cell lines relevant to MAPT studies, such as neuronal cells from Alzheimer’s patients). Preparation ensures the DNA is suitable for library construction, focusing on purity (A260/A280 ratio ~1.8–2.0) and quantity (typically 50–1000 ng).

Steps for Input Preparation (Illumina NovaSeq-Focused, with PacBio Notes)

DNA Extraction - Isolate genomic DNA using kits like Qiagen DNeasy (for blood/tissue) to obtain intact, high-molecular-weight DNA.

Quantification and Quality Check - Measure concentration (e.g., Qubit fluorometer) and integrity (e.g., agarose gel electrophoresis or Agilent Bioanalyzer).

Fragmentation - Shear DNA into 200–500 bp fragments using sonication (e.g., Covaris) or enzymatic methods (e.g., Nextera tagmentation) for Illumina. (For PacBio, minimal fragmentation is needed; aim for >10 kb fragments using gentle pipetting or a Megaruptor.)

End Repair and A-Tailing - Blunt-end fragments and add adenine overhangs using enzymes like T4 DNA polymerase and Klenow fragment to facilitate adapter ligation.

Adapter Ligation - Attach platform-specific adapters (e.g., Illumina TruSeq adapters with barcodes for multiplexing) using T4 DNA ligase. These include sequencing primers and indices.

PCR Amplification - Enrich the library via limited-cycle PCR (8–12 cycles) to amplify adapter-ligated fragments, introducing necessary sequences for clustering. (PacBio skips or minimizes PCR to avoid bias, using ligation-based library prep like SMRTbell adapters.)

Size Selection and Purification - Use magnetic beads (e.g., AMPure XP) to select optimal fragment sizes and remove contaminants.

Quality Control - Validate library via qPCR or Bioanalyzer to ensure concentration and fragment distribution.
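On the question of output: short-read runs are commonly delivered as FASTQ files, in which every base carries a Phred quality score Q = -10·log10(p), where p is the estimated probability that the base call is wrong. The sketch below connects that score to the "error rates below 0.1%" figure quoted earlier and estimates how many reads a given depth would need. The 150 bp read length, 30x depth, and Phred+33 encoding are assumptions for illustration, and the function names are hypothetical.

```python
import math


def phred_from_error(p):
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)


def error_from_phred(q):
    """Per-base error probability for a Phred score q."""
    return 10 ** (-q / 10)


def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - offset for ch in qual]


def reads_for_coverage(region_bp, read_len, depth):
    """Reads needed so that depth * region_bp bases are sequenced (ideal, uniform coverage)."""
    return math.ceil(depth * region_bp / read_len)


print(round(phred_from_error(0.001)))  # 30 -> a 0.1% error rate corresponds to Q30
print(error_from_phred(20))            # 0.01
print(decode_quality("II?5"))          # [40, 40, 30, 20]

# ~134 kb MAPT locus, assumed 150 bp reads at an assumed 30x depth.
print(reads_for_coverage(134_000, 150, 30))  # 26800
```

In practice coverage is uneven and targeted enrichment is imperfect, so the read count is a lower bound rather than a plan.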

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I propose synthesizing a modular genetic circuit designed to sense and respond to abnormal Tau protein aggregation, a hallmark of neurodegenerative diseases like Alzheimer’s disease (AD). This circuit would be based on a synthetic promoter-responsive system integrated with a reporter gene, drawing inspiration from synthetic biology tools like those used in biosensors. Building on my previous selection of the MAPT gene for sequencing (which encodes Tau), this synthesis shifts to “writing” DNA for therapeutic and diagnostic applications. The circuit could be delivered as plasmid DNA or mRNA for expression in neuronal cell models, enabling real-time monitoring of Tau pathology.

Neurodegenerative diseases affect over 50 million people globally, with AD alone projected to triple in prevalence by 2050. Abnormal Tau aggregation leads to neurofibrillary tangles, disrupting neuronal function. Synthesizing this circuit would allow for the creation of a biosensor that detects Tau misfolding in living cells, triggering a fluorescent reporter (e.g., GFP) or therapeutic response (e.g., expression of a chaperone protein to dissolve aggregates). This aligns with mRNA-based therapies, similar to COVID-19 vaccines, where synthetic sequences are used for targeted protein expression. It could accelerate drug screening by providing a high-throughput assay for anti-Tau compounds.

The circuit acts as a genetic sensor for intracellular “environmental” stimuli, such as protein misfolding or inflammation associated with tauopathies. It could be engineered into organoids or animal models to monitor disease progression in real-time, responding to stimuli like oxidative stress.

Extended versions could incorporate Tau-inspired structural proteins for biomaterials (e.g., microtubule-like scaffolds for tissue engineering). In a creative twist, it could inspire DNA origami art, where Tau motifs form nanoscale structures mimicking neuronal tangles for educational visualizations.

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

I would use Twist Bioscience’s chip-based DNA synthesis platform as the primary technology. Twist’s method achieves error rates as low as 1:2,000 bases through massively parallel synthesis of oligonucleotides (oligos) up to 300 nucleotides long, followed by error correction and assembly into longer genes. This is crucial for my circuit, which requires precise regulatory elements (e.g., promoter motifs) to ensure functionality in sensing Tau aggregation without off-target effects.

It supports synthesizing thousands of custom sequences in parallel on a single chip, making it ideal for iterative designs (e.g., variants of the GFP reporter or adding therapeutic modules). For my ~1.2 kb construct, Twist can deliver it as a clonal gene fragment or plasmid insert, with turnaround times of 5–10 business days.
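The quoted 1:2,000 error rate and ~1.2 kb construct length can be combined into a rough estimate of how often a synthesized molecule is entirely correct. The sketch below assumes errors are independent and equally likely at every base, which is a simplification, and it treats 1:2,000 as the rate before any clonal selection or sequence verification. The function names are hypothetical.

```python
import math


def error_free_fraction(length_nt, error_rate):
    """Probability that every one of length_nt bases is correct (independent errors assumed)."""
    return (1 - error_rate) ** length_nt


def clones_to_screen(fraction_ok, confidence=0.95):
    """Smallest n such that at least one of n clones is correct with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - fraction_ok))


ok = error_free_fraction(1200, 1 / 2000)
print(round(ok, 2))          # roughly 0.55
print(clones_to_screen(ok))  # 4
```

Under these assumptions a little over half of the raw 1.2 kb molecules would be perfect, which is why clonal genes are sequence-verified and why longer constructs become harder to obtain without errors.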

Twist has experience with biotech tools, including circuits for drug discovery and biosensors. Their platform aligns with the assignment’s invitation to have Twist synthesize constructs, enabling real-world prototyping for Tau-related neurodegeneration research.

For ultra-long constructs (>10 kb, e.g., if expanding to a full synthetic genome fragment), enzymatic DNA synthesis (EDS) is a possible alternative. EDS uses template-independent DNA polymerases (such as terminal deoxynucleotidyl transferase) to build sequences de novo, avoiding chemical synthesis limitations like toxicity from phosphoramidite reagents. However, Twist’s method suffices for my modular circuit.

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

Tauopathies affect millions worldwide, with AD alone impacting over 50 million people and costing ~$1 trillion annually in healthcare. Mutations in MAPT (e.g., the P301L variant) disrupt Tau’s role in microtubule stabilization, leading to neurodegeneration, cognitive decline, and reduced lifespan. Editing these could prevent or reverse pathology, extending healthy lifespan by 5–10 years in at-risk individuals. This aligns with human augmentation goals, such as enhancing cognitive resilience against aging, without venturing into controversial “designer” traits.

MAPT is a high-impact target: It’s directly linked to preventable suffering, and editing it builds on existing research (e.g., preclinical CRISPR studies in mouse models showing reduced Tau aggregates). Unlike broader genome edits (e.g., for polygenic traits like intelligence), MAPT offers a focused, monogenic intervention with clearer paths to clinical trials. It also ties into my prior choices, creating a cohesive “read-write-edit” pipeline for Tau-related biotech.

Extend edits to model organisms like mice or zebrafish, engineering MAPT orthologs to study evolutionary conservation of brain health. This could inform de-extinction efforts (e.g., editing revived mammoth genomes for neural resilience in changing climates) or animal restoration (e.g., protecting endangered species from stress-induced neurodegeneration).

Inspired by MAPT’s role in cellular structure, edit plant genomes for analogous traits, such as enhancing microtubule-associated proteins in crops (e.g., editing maize genes for better drought resistance or nitrogen fixation efficiency). This could improve food security by creating resilient, high-yield plants without GMOs’ ethical baggage.

Editing MAPT prioritizes equity—targeting diseases disproportionately affecting aging populations in underserved communities. It avoids eugenics concerns by focusing on therapy, adhering to guidelines like those from the WHO on genome editing. Long-term, it could democratize longevity, but I’d advocate for global access to prevent exacerbating inequalities.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps? What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? What are the limitations of your editing methods (if any) in terms of efficiency or precision?

CRISPR-Cas9 enables targeted double-strand breaks (DSBs) at specific MAPT loci, while base editing allows single-base changes (e.g., C-to-T for correcting point mutations) without DSBs, reducing risks like indels. Prime editing offers “search-and-replace” functionality for precise insertions/deletions, ideal for MAPT’s repetitive regions linked to tauopathies.

CRISPR is cost-effective (roughly $100–$500 per experiment), widely available, and scalable for high-throughput applications (e.g., editing neuronal cell lines or crop genomes). Preclinical studies (e.g., in AD mouse models) have reported CRISPR-based reductions in Tau aggregates of 50–70%, which makes it a plausible fit for human health, longevity, and augmentation goals.

For human therapeutics, it’s somatic-cell focused to avoid germline ethics issues. In agriculture/conservation, multiplex CRISPR (editing multiple sites) can enhance traits like nitrogen fixation in plants or neural resilience in animals. It’s more efficient than older methods like ZFNs (zinc-finger nucleases) and has a proven track record in FDA-approved therapies (e.g., Casgevy for sickle cell disease).

For MAPT regions with high off-target potential (e.g., repetitive sequences), TALENs provide modular DNA-binding domains for greater specificity, though they’re more labor-intensive. This hybrid approach ensures robustness for complex edits.

CRISPR-Cas9 edits DNA by using a guide RNA (gRNA) to direct the Cas9 enzyme to a specific sequence, where it creates a DSB. Base editors fuse a catalytically impaired Cas9 (a nickase or dead Cas9) to a deaminase for single-base changes without DSBs, while prime editors fuse a Cas9 nickase to a reverse transcriptase for precise rewriting. Below, I focus on prime editing as the advanced method for MAPT (e.g., correcting mutations), with notes on standard Cas9.

Steps for Prime Editing (Primary Method)

Targeting - The prime editing guide RNA (pegRNA) hybridizes to the target MAPT DNA sequence, recruiting the prime editor (a nicking Cas9 fused to reverse transcriptase).
Nicking - The Cas9 domain creates a single-strand nick at the target site (e.g., exon 10 of MAPT), exposing the DNA for editing without a full DSB.
Reverse Transcription - The pegRNA's primer binding site anneals to the 3' end of the nicked strand, and reverse transcriptase extends that end using the pegRNA template, writing the desired edit (e.g., correcting P301L by replacing a single nucleotide) into a new DNA flap.
Flap Resolution and Ligation - Cellular machinery resolves the competing flaps via endogenous repair pathways (e.g., a flap endonuclease removes the unedited 5' flap, and ligase seals the edited strand in place).
Integration - The edit is permanently incorporated into the genome, with minimal disruption.

How It Edits DNA

Prime editing rewrites DNA like a word processor's search-and-replace: the pegRNA spacer finds the target by base-pairing, and the pegRNA template then specifies the bases to replace or insert. Reported efficiencies are roughly 20–50% for point edits, higher than HDR-based repair with standard Cas9 (~10–30%).
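The search-and-replace analogy can be sketched as a toy string model. This is only an illustration under simplifying assumptions: it works on one strand, assumes an NGG PAM directly after the spacer and a nick 3 nt upstream of the PAM, ignores repair of the opposite strand, and uses a made-up sequence rather than real MAPT. The function names (`prime_edit`, `peg_extension`) and the default lengths are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def prime_edit(dna: str, spacer: str, new_seq: str):
    """Toy search-and-replace on one strand.

    Finds spacer + NGG, places the nick 3 nt upstream of the PAM, and
    overwrites the bases just 3' of the nick with new_seq.
    Returns (edited_dna, nick_index).
    """
    i = dna.find(spacer)
    if i < 0:
        raise ValueError("spacer not found")
    pam = dna[i + len(spacer): i + len(spacer) + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("no NGG PAM after spacer")
    nick = i + len(spacer) - 3
    if nick + len(new_seq) > len(dna):
        raise ValueError("replacement runs past the end of the sequence")
    edited = dna[:nick] + new_seq + dna[nick + len(new_seq):]
    return edited, nick

def peg_extension(edited: str, nick: int, pbs_len: int = 13, rtt_len: int = 10) -> str:
    """pegRNA 3' extension written as DNA, 5'->3': RT template then PBS.

    It is the reverse complement of the edited strand spanning the
    primer-binding region (before the nick) and the new sequence (after it).
    """
    return revcomp(edited[nick - pbs_len: nick + rtt_len])

# Made-up sequence for illustration only; not taken from MAPT.
spacer = "GACCTGCATGGATCCAGCTA"
dna = "TT" + spacer + "TGG" + "ACCTTT"
edited, nick = prime_edit(dna, spacer, "CCA")   # single T->C change after the nick
extension = peg_extension(edited, nick)
changed = [k for k in range(len(dna)) if dna[k] != edited[k]]
```

Here `changed` lists the positions that differ between the original and edited strings, which makes it easy to confirm that only the intended base was rewritten.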

For standard CRISPR-Cas9 - It induces DSBs, repaired by non-homologous end joining (NHEJ, for knockouts) or homology-directed repair (HDR, for precise insertions using a donor template). Base editing deaminates bases (e.g., C to U, then T) without breaks.
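A short sketch of why base editing can change more than the intended base: a cytosine base editor acts on every C inside its activity window, not only the target C. The window used here (protospacer positions 4–8, counting from the PAM-distal end) is an assumption typical of early cytosine base editors, and the sequence is made up.

```python
def cytosine_base_edit(protospacer: str, window=(4, 8)) -> str:
    """Convert every C inside the editing window to T (C -> U -> T).

    Positions are 1-based, counted from the PAM-distal end of the
    protospacer. The window bounds are an assumed, editor-dependent value.
    """
    lo, hi = window
    return "".join(
        "T" if base == "C" and lo <= pos <= hi else base
        for pos, base in enumerate(protospacer, start=1)
    )

# Made-up 20-nt protospacer; it has cytosines at positions 4 and 7.
protospacer = "GACCTGCATGGATCCAGCTA"
edited_protospacer = cytosine_base_edit(protospacer)
n_changed = sum(a != b for a, b in zip(protospacer, edited_protospacer))
```

Both cytosines inside the window are converted, so if only one of them were the intended target, the other would be a bystander edit.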

What Preparation Do You Need to Do (e.g., Design Steps) and What Is the Input (e.g., DNA Template, Enzymes, Plasmids, Primers, Guides, Cells) for the Editing?

Design Steps (Preparation)

Target Selection - Analyze the MAPT sequence (e.g., via NCBI or Ensembl) to identify edit sites (e.g., rs63751273 for P301L). Use tools like CRISPOR or Benchling to predict gRNA efficacy and off-target scores.
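As a quick sanity check on why a proline-to-leucine change like P301L is a single-nucleotide target: every proline codon is CCN and every CTN codon is leucine, so a C-to-T change at the second codon position always gives Pro to Leu, and the reverse T-to-C change restores proline. The sketch below checks this from the standard genetic code; it does not assume which proline codon MAPT uses at residue 301, and the helper name is hypothetical.

```python
# Standard genetic code, restricted to the two amino acids of interest.
PROLINE = {"CCT", "CCC", "CCA", "CCG"}
LEUCINE = {"CTT", "CTC", "CTA", "CTG", "TTA", "TTG"}

def mutate_second_base(codon: str, new_base: str) -> str:
    """Return the codon with its middle base replaced."""
    return codon[0] + new_base + codon[2]

# Disease-causing direction: C -> T at codon position 2.
pro_to_leu = {c: mutate_second_base(c, "T") for c in sorted(PROLINE)}
all_leucine = all(m in LEUCINE for m in pro_to_leu.values())

# Corrective direction: T -> C at the same position restores proline.
restored = {m: mutate_second_base(m, "C") for m in pro_to_leu.values()}
all_proline = all(c in PROLINE for c in restored.values())
```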

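gRNA/pegRNA Design - Design 20-nt gRNA spacers that match the target sequence immediately 5' of a PAM site in the genomic DNA (NGG for SpCas9); the PAM itself is not part of the gRNA. For prime editing, extend the gRNA into a pegRNA by adding a primer binding site and an edit template (e.g., 10–20 nt replacement sequence). Optimize for GC content (40–60%) and specificity.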
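A minimal sketch of the spacer search described above: scan a sequence for 20-nt sites followed by an NGG PAM and keep those within the 40–60% GC range. It only scans the strand it is given (a fuller search would also scan the reverse complement), it does no efficacy or off-target scoring, and the input is a made-up sequence rather than MAPT. The function names are hypothetical.

```python
import re

def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in an uppercase DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_spacers(dna: str, gc_min: float = 0.40, gc_max: float = 0.60):
    """Yield (start, spacer, pam) for each 20-nt site followed by NGG.

    Only the given strand is scanned; GC bounds follow the 40-60% rule
    of thumb from the text.
    """
    dna = dna.upper()
    # A lookahead pattern lets overlapping candidate sites all be reported.
    for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", dna):
        spacer, pam = m.group(1), m.group(2)
        if gc_min <= gc_fraction(spacer) <= gc_max:
            yield m.start(), spacer, pam

# Made-up 30-nt sequence for illustration only; not taken from MAPT.
toy = "ATGCATGCATGCATGCATGCAGGTTTTTTT"
hits = list(find_spacers(toy))
```

In practice the candidates from a scan like this would still go through a scoring tool such as CRISPOR before any guide is ordered.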

Donor Template Design (if using HDR) - Create a donor carrying the desired edit flanked by homology arms: either a single-stranded DNA oligo with short arms (tens of nucleotides per side) or a plasmid with longer arms (~500 bp per side).
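The donor layout (left arm, edit, right arm) can be sketched for a single-base change. This is a simplified illustration: the 40-nt arm length, the function name, and the reference string are all assumptions, and a real design would also consider strand choice and blocking re-cutting of the edited site.

```python
def build_ssodn(ref: str, pos: int, new_base: str, arm: int = 40) -> str:
    """Return a donor oligo: left arm + new base + right arm.

    ref is the reference strand, pos the 0-based index of the base to
    change, and arm the homology arm length on each side (assumed value).
    """
    if not 0 <= pos < len(ref):
        raise ValueError("pos is outside the reference")
    if ref[pos] == new_base:
        raise ValueError("new_base matches the reference; nothing to edit")
    left = ref[max(0, pos - arm): pos]
    right = ref[pos + 1: pos + 1 + arm]
    return left + new_base + right

# Artificial reference for illustration only; not taken from MAPT.
ref = "A" * 50 + "C" + "G" * 50
donor = build_ssodn(ref, pos=50, new_base="T", arm=40)
```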

Off-Target Prediction - Run in silico analyses (e.g., Cas-OFFinder) to select low-risk designs; validate with GUIDE-seq.

Delivery System Design - Choose vectors (e.g., plasmids for transfection, AAV for in vivo) and cell-type optimizations (e.g., lipid nanoparticles for neurons).
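The core idea behind in silico off-target prediction is counting mismatches between the guide and every similar site in the genome. The sketch below is a deliberately naive version of that idea, not how Cas-OFFinder is implemented: it scans one strand with a sliding window, ignores the PAM and bulges, and uses a 10-nt made-up guide to keep the example short.

```python
def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def near_matches(guide: str, genome: str, max_mm: int = 3):
    """Return (position, site, mismatch_count) for each window of genome
    within max_mm mismatches of guide (forward strand, PAM ignored)."""
    n = len(guide)
    found = []
    for i in range(len(genome) - n + 1):
        site = genome[i:i + n]
        mm = mismatches(guide, site)
        if mm <= max_mm:
            found.append((i, site, mm))
    return found

# Made-up sequences for illustration only.
guide = "GATTACAGGC"
genome = "TTGATTACAGGCAAGATTTCAGGCTT"
sites = near_matches(guide, genome, max_mm=1)
```

The perfect match is the intended target; any other site returned with few mismatches is a candidate off-target that would favor choosing a different guide.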

Inputs for Editing

DNA Template/Donor - Single- or double-stranded DNA oligo (50–200 nt) or plasmid carrying the repair sequence (e.g., wild-type MAPT exon for correction).

Enzymes - Cas9 protein (or prime editor fusion), often as ribonucleoprotein (RNP) complexes for direct delivery.

Plasmids/Vectors - pSpCas9 plasmid expressing Cas9 and gRNA, or all-in-one vectors like Addgene’s PE2 for prime editing.

Primers/Guides - Synthetic gRNA/pegRNA (e.g., from IDT, 100–150 nt for pegRNA) and PCR primers for verification.

Cells/Organisms - Target cells (e.g., HEK293 or iPSC-derived neurons for human models; plant protoplasts for agriculture). For in vivo, animal models like MAPT-mutant mice.

Other - Transfection reagents (e.g., Lipofectamine), antibiotics for selection, and media for cell culture.

These inputs are assembled into an editing cocktail (e.g., RNP + donor DNA) and introduced via electroporation, transfection, or injection.

What Are the Limitations of Your Editing Methods (If Any) in Terms of Efficiency or Precision?

Efficiency

Editing rates vary (10–50% for prime editing, lower in primary cells like neurons due to delivery barriers). HDR efficiency is low (~1–10%) in non-dividing cells, which favor error-prone NHEJ and indels. For MAPT, repetitive regions can reduce targeting success.

Impact: May require multiple rounds or enrichment (e.g., FACS sorting), increasing time/cost. In agriculture, plant regeneration from edited cells can be inefficient (20–40% success).

Mitigation: Use prime/base editing (DSB-free, higher efficiency than HDR) or multiplex gRNAs; optimize delivery (e.g., AAV for 70–90% transduction in brain tissues).
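To make the cost of low efficiency concrete, here is a rough screening estimate. It assumes each clone is edited independently with the same probability p, which is a simplification; under that assumption the number of clones needed to see at least one edited clone with a given confidence is the smallest n with 1 - (1 - p)^n at or above that confidence. The function name and the 95% default are illustrative choices.

```python
import math

def clones_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one edited clone) >= confidence,
    assuming independent clones each edited with probability p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Efficiencies taken from the ranges quoted in the text.
hdr_low = clones_needed(0.01)     # HDR at 1%
hdr_high = clones_needed(0.10)    # HDR at 10%
prime_high = clones_needed(0.50)  # prime editing at 50%
```

Under these assumptions, moving from 1% to 10% efficiency cuts the screen from 299 clones to 29, and 50% efficiency needs only 5, which is the practical argument for enrichment steps and higher-efficiency editors.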

Precision

Off-target effects (1–5% of edits) can occur at similar sequences genome-wide, potentially causing unintended mutations or cancer risks in therapeutics. Base editing can introduce “bystander” edits at neighboring bases inside its editing window; prime editing is more precise but can still leave low-frequency indels at the target. MAPT’s homology to other genes (e.g., MAP2) heightens the off-target risk.

Impact: Reduces reliability for clinical use and raises ethical concerns for human augmentation or conservation (e.g., unintended ecological effects in edited species).

Mitigation: High-fidelity Cas9 variants (e.g., SpCas9-HF1) can cut off-target edits by about 90%; TALENs can serve as an independent second method for validation. Post-editing sequencing (e.g., whole-genome) confirms precision.