Week 2 HW: DNA Read, Write, and Edit

Software used:

  • Terminal
  • Git
  • Xcode
  • Hugo
  • Benchling
  • rcdonovan website (gel art tool)
  • Twist website

Objective:

This week explores the read–write–edit toolkit: sequencing and synthesis workflows, restriction digests and gel electrophoresis, and early genome-editing frameworks.

Background:

Lectures: DNA Read (George Church), Write (Joe Jacobson), and Edit (Emily Leproust), in addition to recitation and Tokyo Biohub node lab meetings.

Methods:

  • Start with touchpoint of Design stage of SynBio DBTL cycle with In-silico Gel Art
  • Build DNA fragments in Benchling with restriction digests for Testing with Gel Electrophoresis
  • Learn from Benchling work & In-silico Gel Art
  • Start to Design or
  • Gel Electrophoresis
  • Obtain protein sequences
  • Plasmid digestion with restriction enzymes,
  • Preparing Twist DNA Synthesis Order

Tasks:

  1. Documentation
  • Make sure to document every step of the in-silico and lab experiments. Make sketches, screenshots, notes, drawings… anything that helps you, and others, understand the experiment and the topic. Don’t be afraid to add things that don’t work. Show your failures, and how you overcame them. Your documentation should be a description of the amazing journey you are on!
  2. Part 0: Basics of Gel Electrophoresis
  • Attend or watch all the lectures and recitation videos. Optionally watch bootcamp.

Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details.

Overview:

  • The EcoRI RE is sourced from Escherichia coli and recognizes the palindrome 5’-GAATTC-3’ / 3’-CTTAAG-5’, cutting between G and A (G^AATTC) to leave a 5’ AATT sticky end.
  • The BamHI RE is sourced from Bacillus amyloliquefaciens and scans for 5’-GGATCC-3’ / 3’-CCTAGG-5’, cutting between G and G to leave a 5’ sticky end.
  • The HindIII RE is sourced from Haemophilus influenzae and scans for 5’-AAGCTT-3’ / 3’-TTCGAA-5’, leaving a 5’ sticky end.
  • The KpnI RE is sourced from Klebsiella pneumoniae and uses divalent metal ion cofactors (Mg²⁺ is standard; Ca²⁺ also supports cleavage with high fidelity). It scans for 5’-GGTACC-3’ / 3’-CCATGG-5’ and, unlike the 5’-overhang enzymes in this set, leaves a 3’ sticky end.
  • The EcoRV RE is also sourced from Escherichia coli and scans for 5’-GATATC-3’ / 3’-CTATAG-5’; it is the only enzyme in this set that leaves a blunt end.
  • The SacI RE is sourced from Streptomyces achromogenes and scans for 5’-GAGCTC-3’ / 3’-CTCGAG-5’, leaving a 3’ sticky end (like KpnI).
  • The SalI RE is sourced from Streptomyces albus and scans for 5’-GTCGAC-3’ / 3’-CAGCTG-5’, leaving a 5’ sticky end.
  • Source: Recognition sequences and cleavage patterns were verified using the REBASE database (Roberts et al., 2015).
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
  • Use Ronan’s website as a helpful tool for quickly iterating on designs! Here is the link [https://rcdonovan.com/gel-art].
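The end types listed above follow directly from where each enzyme cuts inside its site. The Python sketch below encodes the standard REBASE cut offsets (e.g., EcoRI G^AATTC, KpnI GGTAC^C) in a hypothetical `ENZYMES` table and derives the end type from them; it is an illustration, not a Benchling or REBASE API. It assumes palindromic sites, where the bottom strand is cut at the mirror image of the top-strand cut.

```python
# Minimal sketch: classify the ends left by palindromic type II enzymes.
# ENZYMES maps name -> (recognition site 5'->3', top-strand cut offset).
# Offsets are standard REBASE values; the table itself is illustrative.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def bottom_strand_3_to_5(site: str) -> str:
    """Complementary strand written 3'->5', aligned under the top strand."""
    return site.translate(COMPLEMENT)


def end_type(site: str, top_cut: int) -> tuple[str, str]:
    """Return (end type, single-stranded overhang) for a palindromic site."""
    if site != reverse_complement(site):
        raise ValueError("this sketch only handles palindromic sites")
    # For a palindrome the bottom strand is cut at the mirrored position,
    # expressed here in top-strand coordinates.
    bottom_cut = len(site) - top_cut
    if top_cut == bottom_cut:
        return ("blunt", "")
    if top_cut < bottom_cut:
        return ("5'", site[top_cut:bottom_cut])
    return ("3'", site[bottom_cut:top_cut])


if __name__ == "__main__":
    for name, (site, cut) in ENZYMES.items():
        kind, overhang = end_type(site, cut)
        print(f"{name:8} 5'-{site}-3' / 3'-{bottom_strand_3_to_5(site)}-5'  {kind} {overhang}")
```

Running it reports 5’ overhangs for EcoRI (AATT), BamHI (GATC), HindIII (AGCT), and SalI (TCGA), 3’ overhangs for KpnI (GTAC) and SacI (AGCT), and a blunt end for EcoRV.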
Figure: Benchling virtual digest report

HW2 is structured purposefully to make us think like synbio engineers. For example, we pair restriction digests with gel electrophoresis because very large DNA molecules, such as the ~48.5 kb lambda genome, do not resolve into distinct bands in a standard agarose gel matrix. We need smaller fragments of genetic material just to get a readable assay, which makes the restriction digest a necessary step toward our design objectives. Benchling plays a similar role in the HW2 module: it lets us apply restriction digests to our Lambda model in silico and compare the predicted fragments against a computational ladder before running a gel, and we keep using Benchling in later steps.
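A virtual digest boils down to finding recognition sites and measuring the distances between cuts. The sketch below is a simplified illustration, not Benchling’s implementation: it cuts only the top strand (ignoring overhangs), takes hypothetical `(site, top_cut)` pairs, and uses a made-up 30 bp toy sequence. Circular mode mimics a plasmid, where n cuts give n fragments.

```python
# Minimal sketch of a virtual digest: find recognition sites, cut the top
# strand, and report fragment lengths (sticky-end overhangs are ignored).


def cut_positions(seq: str, site: str, top_cut: int) -> list[int]:
    """0-based positions where the top strand is cut."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + top_cut)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, enzymes: list[tuple[str, int]], circular: bool = False) -> list[int]:
    """Fragment lengths in sequence order. In circular mode, sites that
    span the origin are not detected by this sketch."""
    cuts = sorted({p for site, cut in enzymes for p in cut_positions(seq, site, cut)})
    n = len(seq)
    if not cuts:
        return [n]
    if circular:
        # Fragments between consecutive cuts, plus the one spanning the origin.
        return [b - a for a, b in zip(cuts, cuts[1:])] + [n - cuts[-1] + cuts[0]]
    bounds = [0] + cuts + [n]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAAA" + "GAATTC" + "C" * 10 + "GGATCC" + "TTTT"  # hypothetical 30 bp
    ecori_bamhi = [("GAATTC", 1), ("GGATCC", 1)]
    print(digest(toy, ecori_bamhi))                 # linear
    print(digest(toy, ecori_bamhi, circular=True))  # as if it were a plasmid
    # Larger fragments run slower, so bands appear top to bottom in this order:
    print(sorted(digest(toy, ecori_bamhi), reverse=True))
```

For the toy sequence, the linear digest gives [5, 16, 9], the circular digest gives [16, 14], and the gel order from top to bottom is [16, 9, 5].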

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.

Now, if your mind works like mine, it might seem abrupt to leap from DNA moving through a gel matrix to proteins. It is less abrupt once you understand the Central Dogma, and even less so through the lens of the SynBio Design, Build, Test, Learn loop.

  • [https://doi.org/10.1371/journal.pbio.3002116]

I am also adding a bacterial chromosome and plasmid sequenced with Oxford Nanopore MinION, because I am annoyingly meticulous about discovery. In my HW2 discussion questions I am going to sing the praises of Nanopore, so it is better to be consistent in my DNA read inputs. I will download the chromosome and plasmid DNA and load them into Benchling. Note that the GenBank files do not import cleanly into Benchling, so I will need to switch to FASTA files.

  • Organism: Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775
  • Chromosome GenBank: https://www.ncbi.nlm.nih.gov/nuccore/CP033092.2/ (CP033092.2, chromosome, complete genome)
  • Plasmid GenBank: https://www.ncbi.nlm.nih.gov/nuccore/CP033091.2/ (CP033091.2, plasmid unnamed, complete sequence)
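If a GenBank file will not import, one workaround is to convert it to FASTA first. The sketch below handles a single-record GenBank flat file and keeps only the accession, the first DEFINITION line, and the ORIGIN bases. The function name is hypothetical; for real files, Biopython’s `SeqIO.convert(in_file, "genbank", out_file, "fasta")` is the more robust route.

```python
def genbank_to_fasta(genbank_text: str, width: int = 70) -> str:
    """Convert one GenBank flat-file record to FASTA text (minimal sketch).

    Uses the VERSION accession and the first DEFINITION line as the header;
    continuation lines of a multi-line DEFINITION are ignored.
    """
    accession, definition = "unknown", ""
    chunks: list[str] = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("//"):
            break  # end of record
        if in_origin:
            # ORIGIN lines look like "       61 ggagttgagg tttttgacga ...":
            # drop the leading position number, keep the bases.
            chunks.extend(line.split()[1:])
        elif line.startswith("VERSION"):
            accession = line.split()[1]
        elif line.startswith("DEFINITION"):
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    seq = "".join(chunks).upper()
    header = f">{accession} {definition}".rstrip()
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"
```

For example, `genbank_to_fasta(open("CP033092.2.gb").read())` would convert a downloaded record (the file name is hypothetical).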

Part 3: DNA Design Challenge

3.1. Choose your protein.

  • In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
  • [Example from our group homework, you may notice the particular format — The example below came from UniProt]
  • >sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Considering: RpoS in E. coli K-12. I will download the amino-acid sequence for the protein below and work backwards to the gene if I cannot find an online nucleotide reference that has not been deleted.
83333_0:000b85 {"organism":"Escherichia coli K-12","genome_id":"GCF_000974885.1","pub_prot_id":"WP_000081588.1","pub_gene_id":"SF31_RS18190","description":"RNA polymerase sigma factor RpoS"}
MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEQEPSDNDLAEEELLSQGATQRVLDATQLYLGEIGYSPLLTAEEEVYFARRALRGDVASRRRMIESNLRLVVKIARRYGNRGLALLDLIEEGNLGLIRAVEKFDPERGFRFSTYATWWIRQTIERAIMNQTRTIRLPIHIVKELNVYLRTARELSHKLDHEPSAEEIAEQLDKPVDDVSRMLRLNERITSVDTPLGGDSEKALLDILADEKENGPEDTTQDDDMKQSIVKWLFELNAKQREVLARRFGLLGYEAATLEDVGREIGLTRERVRQIQVEGLRRLREILQTQGLNIEALFRE

InterProScan result: https://www.ebi.ac.uk/interpro/result/InterProScan/iprscan5-R20260216-160122-0718-15835993-p1m/internal-1771257679016-348-1/
AlphaFold DB entry (P13445): https://alphafold.ebi.ac.uk/entry/P13445
NCBI Datasets genome (GCF_003697165.1): https://www.ncbi.nlm.nih.gov/datasets/gene/GCF_003697165.1/

Protein sequence of the RpoS gene from NZ_CP033092.1:4177924-4178988, Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome

MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEEEPSDNDLAEEELLSQGATQRVLDATQLYLGEIGYSPLLTAEEEVYFARRALRGDVASRRRMIESNLRLVVKIARRYGNRGLALLDLIEEGNLGLIRAVEKFDPERGFRFSTYATWWIRQTIERAIMNQTRTIRLPIHIVKELNVYLRTARELSHKLDHEPSAEEIAEQLDKPVDDVSRMLRLNERITSVDTPLGGDSEKALLDILADEKENGPEDTTQDDDMKQSIVKWLFELNAKQREVLARRFGLLGYEAATLEDVGREIGLTRERVRQIQVEGLRRLREILQTQGLNIEALFREEVSICQKGQSQARLAFFLLVHGTC*
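To see where the K-12 and DSM 30083 RpoS sequences differ, a position-by-position comparison is enough as long as neither sequence has insertions or deletions relative to the other. The sketch below assumes exactly that (no gaps); a true alignment tool such as BLAST would be needed otherwise. On the first 38 residues it finds position 33 (Q in K-12, E in DSM 30083); over the full sequences the lengths also differ, because the DSM 30083 version carries the extra C-terminal tail ending in …EVSICQKGQSQARLAFFLLVHGTC.

```python
def compare_proteins(a: str, b: str) -> tuple[list[tuple[int, str, str]], int, int]:
    """Position-by-position comparison, assuming no insertions/deletions.

    Returns (substitutions as 1-based (position, a_residue, b_residue),
    len(a), len(b)); a trailing '*' stop symbol is ignored."""
    a = "".join(a.split()).rstrip("*")
    b = "".join(b.split()).rstrip("*")
    substitutions = [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return substitutions, len(a), len(b)


if __name__ == "__main__":
    k12_head = "MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEQEPSDN"
    dsm_head = "MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEEEPSDN"
    print(compare_proteins(k12_head, dsm_head))
```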

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

  • The Central Dogma discussed in class and recitation describes the process by which a DNA sequence is transcribed into RNA and translated into protein. It also gives us the framework to work backwards from a given protein sequence to a DNA sequence that could encode it. Because the genetic code is degenerate (most amino acids have several codons), reverse translation alone yields one of many possible sequences; recovering the original gene requires looking it up in a database. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
  • [Example: get to the original sequence of the phage MS2 L-protein from its genome (phage MS2 genome, NCBI Nucleotide)]
  • Lysis protein DNA sequence atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa
  • Nucleotide sequence for my gene pick (rpoS, E. coli K-12 substr. MG1655):
>gnl|ECOLI|EG10510 rpoS RPOS-MONOMER (complement(2866559..2867551)) Escherichia coli K-12 substr. MG1655
atgAGTCAGA ATACGCTGAA AGTTCATGAT TTAAATGAAG ATGCGGAATT TGATGAGAAC
GGAGTTGAGG TTTTTGACGA AAAGGCCTTA GTAGAACAGG AACCCAGTGA TAACGATTTG
GCCGAAGAGG AACTGTTATC GCAGGGAGCC ACACAGCGTG TGTTGGACGC GACTCAGCTT
TACCTTGGTG AGATTGGTTA TTCACCACTG TTAACGGCCG AAGAAGAAGT TTATTTTGCG
CGTCGCGCAC TGCGTGGAGA TGTCGCCTCT CGCCGCCGGA TGATCGAGAG TAACTTGCGT
CTGGTGGTAA AAATTGCCCG CCGTTATGGC AATCGTGGTC TGGCGTTGCT GGACCTTATC
GAAGAGGGCA ACCTGGGGCT GATCCGCGCG GTAGAGAAGT TTGACCCGGA ACGTGGTTTC
CGCTTCTCAA CATACGCAAC CTGGTGGATT CGCCAGACGA TTGAACGGGC GATTATGAAC
CAAACCCGTA CTATTCGTTT GCCGATTCAC ATCGTAAAGG AGCTGAACGT TTACCTGCGA
ACCGCACGTG AGTTGTCCCA TAAGCTGGAC CATGAACCAA GTGCGGAAGA GATCGCAGAG
CAACTGGATA AGCCAGTTGA TGACGTCAGC CGTATGCTTC GTCTTAACGA GCGCATTACC
TCGGTAGACA CCCCGCTGGG TGGTGATTCC GAAAAAGCGT TGCTGGACAT CCTGGCCGAT
GAAAAAGAGA ACGGTCCGGA AGATACCACG CAAGATGACG ATATGAAGCA GAGCATCGTC
AAATGGCTGT TCGAGCTGAA CGCCAAACAG CGTGAAGTGC TGGCACGTCG ATTCGGTTTG
CTGGGGTACG AAGCGGCAAC ACTGGAAGAT GTAGGTCGTG AAATTGGCCT CACCCGTGAA
CGTGTTCGCC AGATTCAGGT TGAAGGCCTG CGCCGTTTGC GCGAAATCCT GCAAACGCAG
GGGCTGAATA TCGAAGCGCT GTTCCGCGAG taa
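Reverse translation can be sketched in a few lines. The version below is a hypothetical illustration: it picks one fixed, commonly used E. coli codon per amino acid (an assumption, not the algorithm of any particular reverse-translation tool), so its output is one of many DNA sequences encoding the same protein and will not match the natural rpoS sequence above.

```python
# One codon per amino acid: a hypothetical "common E. coli codon" choice made
# for illustration; it is NOT the algorithm of Twist or any specific tool.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop
}


def reverse_translate(protein: str) -> str:
    """Return one DNA sequence that encodes `protein` (single-letter codes)."""
    protein = "".join(protein.split()).upper()  # drop line breaks and spaces
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


if __name__ == "__main__":
    print(reverse_translate("METRFPQQSQ"))  # start of the MS2 lysis protein
```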

Nucleotide sequence of the RpoS gene from NZ_CP033092.1:4177924-4178988, Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome (coding strand, 5’ to 3’)

ATGAGTCAGAATACGCTGAAAGTTCATGATTTAAATGAAGATGCGGAATTTGATGAGAACGGAGTTGAGGTTTTTGACGAAAAGGCCTTAGTAGAAGAGGAACCCAGTGATAACGATTTGGCCGAAGAGGAACTGTTATCGCAGGGAGCCACACAGCGTGTGCTGGACGCGACTCAGCTTTACCTTGGTGAGATTGGTTATTCACCACTGTTAACGGCCGAAGAAGAAGTTTATTTTGCGCGTCGCGCACTGCGTGGAGATGTCGCCTCTCGCCGCCGGATGATCGAGAGTAACTTGCGTCTGGTGGTAAAAATTGCCCGCCGTTATGGCAATCGTGGTCTGGCGTTGCTGGACCTGATCGAAGAGGGCAACCTGGGGCTGATCCGCGCGGTAGAGAAGTTTGACCCGGAACGTGGTTTCCGCTTCTCAACATACGCAACCTGGTGGATTCGCCAGACGATCGAACGGGCGATTATGAACCAAACCCGTACTATTCGTTTGCCGATTCACATCGTAAAGGAGCTGAACGTTTACCTGCGAACCGCACGTGAGTTGTCCCATAAGCTGGACCACGAACCAAGTGCGGAAGAGATCGCAGAGCAACTGGATAAGCCAGTTGATGACGTCAGCCGTATGCTTCGTCTTAACGAGCGCATTACCTCGGTAGACACCCCGCTGGGTGGTGATTCCGAAAAAGCGTTGCTGGACATCCTGGCCGATGAAAAAGAGAATGGTCCGGAAGATACCACGCAAGATGACGATATGAAGCAGAGCATCGTCAAATGGCTGTTCGAGCTGAACGCCAAACAGCGTGAAGTACTGGCACGTCGATTCGGTTTGCTGGGGTACGAAGCGGCAACACTGGAAGATGTAGGTCGTGAAATTGGCCTCACCCGTGAACGTGTTCGCCAGATTCAGGTTGAAGGCCTGCGCCGTTTGCGCGAAATCCTGCAAACGCAGGGGCTGAATATCGAAGCGCTGTTCCGCGAAGAAGTAAGCATCTGTCAGAAAGGCCAGTCTCAAGCGAGGCTGGCTTTTTTTCTTTTGGTACATGGTACATGTTGA

Reverse complement of the RpoS gene from NZ_CP033092.1:4177924-4178988, Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome (template strand, written 5’ to 3’)

TCAACATGTACCATGTACCAAAAGAAAAAAAGCCAGCCTCGCTTGAGACTGGCCTTTCTGACAGATGCTTACTTCTTCGCGGAACAGCGCTTCGATATTCAGCCCCTGCGTTTGCAGGATTTCGCGCAAACGGCGCAGGCCTTCAACCTGAATCTGGCGAACACGTTCACGGGTGAGGCCAATTTCACGACCTACATCTTCCAGTGTTGCCGCTTCGTACCCCAGCAAACCGAATCGACGTGCCAGTACTTCACGCTGTTTGGCGTTCAGCTCGAACAGCCATTTGACGATGCTCTGCTTCATATCGTCATCTTGCGTGGTATCTTCCGGACCATTCTCTTTTTCATCGGCCAGGATGTCCAGCAACGCTTTTTCGGAATCACCACCCAGCGGGGTGTCTACCGAGGTAATGCGCTCGTTAAGACGAAGCATACGGCTGACGTCATCAACTGGCTTATCCAGTTGCTCTGCGATCTCTTCCGCACTTGGTTCGTGGTCCAGCTTATGGGACAACTCACGTGCGGTTCGCAGGTAAACGTTCAGCTCCTTTACGATGTGAATCGGCAAACGAATAGTACGGGTTTGGTTCATAATCGCCCGTTCGATCGTCTGGCGAATCCACCAGGTTGCGTATGTTGAGAAGCGGAAACCACGTTCCGGGTCAAACTTCTCTACCGCGCGGATCAGCCCCAGGTTGCCCTCTTCGATCAGGTCCAGCAACGCCAGACCACGATTGCCATAACGGCGGGCAATTTTTACCACCAGACGCAAGTTACTCTCGATCATCCGGCGGCGAGAGGCGACATCTCCACGCAGTGCGCGACGCGCAAAATAAACTTCTTCTTCGGCCGTTAACAGTGGTGAATAACCAATCTCACCAAGGTAAAGCTGAGTCGCGTCCAGCACACGCTGTGTGGCTCCCTGCGATAACAGTTCCTCTTCGGCCAAATCGTTATCACTGGGTTCCTCTTCTACTAAGGCCTTTTCGTCAAAAACCTCAACTCCGTTCTCATCAAATTCCGCATCTTCATTTAAATCATGAACTTTCAGCGTATTCTGACTCAT

mRNA sequence of the RpoS gene from NZ_CP033092.1:4177924-4178988, Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome (coding strand with U in place of T)

AUGAGUCAGAAUACGCUGAAAGUUCAUGAUUUAAAUGAAGAUGCGGAAUUUGAUGAGAACGGAGUUGAGGUUUUUGACGAAAAGGCCUUAGUAGAAGAGGAACCCAGUGAUAACGAUUUGGCCGAAGAGGAACUGUUAUCGCAGGGAGCCACACAGCGUGUGCUGGACGCGACUCAGCUUUACCUUGGUGAGAUUGGUUAUUCACCACUGUUAACGGCCGAAGAAGAAGUUUAUUUUGCGCGUCGCGCACUGCGUGGAGAUGUCGCCUCUCGCCGCCGGAUGAUCGAGAGUAACUUGCGUCUGGUGGUAAAAAUUGCCCGCCGUUAUGGCAAUCGUGGUCUGGCGUUGCUGGACCUGAUCGAAGAGGGCAACCUGGGGCUGAUCCGCGCGGUAGAGAAGUUUGACCCGGAACGUGGUUUCCGCUUCUCAACAUACGCAACCUGGUGGAUUCGCCAGACGAUCGAACGGGCGAUUAUGAACCAAACCCGUACUAUUCGUUUGCCGAUUCACAUCGUAAAGGAGCUGAACGUUUACCUGCGAACCGCACGUGAGUUGUCCCAUAAGCUGGACCACGAACCAAGUGCGGAAGAGAUCGCAGAGCAACUGGAUAAGCCAGUUGAUGACGUCAGCCGUAUGCUUCGUCUUAACGAGCGCAUUACCUCGGUAGACACCCCGCUGGGUGGUGAUUCCGAAAAAGCGUUGCUGGACAUCCUGGCCGAUGAAAAAGAGAAUGGUCCGGAAGAUACCACGCAAGAUGACGAUAUGAAGCAGAGCAUCGUCAAAUGGCUGUUCGAGCUGAACGCCAAACAGCGUGAAGUACUGGCACGUCGAUUCGGUUUGCUGGGGUACGAAGCGGCAACACUGGAAGAUGUAGGUCGUGAAAUUGGCCUCACCCGUGAACGUGUUCGCCAGAUUCAGGUUGAAGGCCUGCGCCGUUUGCGCGAAAUCCUGCAAACGCAGGGGCUGAAUAUCGAAGCGCUGUUCCGCGAAGAAGUAAGCAUCUGUCAGAAAGGCCAGUCUCAAGCGAGGCUGGCUUUUUUUCUUUUGGUACAUGGUACAUGUUGA

Source of the K-12 MG1655 rpoS sequence above: https://biocyc.org/ECOLI/sequence-rc?type=GENE&object=EG10510
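The three RpoS blocks above are related by two simple operations, sketched below in Python (function names are illustrative). The mRNA is the coding strand with U in place of T; the reverse complement is the other strand, read 5’ to 3’.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """The opposite (template) strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """mRNA: same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    start = "ATGAGTCAGAAT"  # first 12 bases of the forward RpoS block
    print(transcribe(start))
    print(reverse_complement(start))
```

For the first 12 bases of the forward sequence, ATGAGTCAGAAT, this gives AUGAGUCAGAAU as mRNA and ATTCTGACTCAT as the reverse complement, which is exactly how the reverse-complement block above ends.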

3.3. Codon optimization.

Lysis protein DNA sequence after codon optimization:
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA
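Codon optimization swaps codons for synonymous ones, so the optimized sequence should still encode the same protein. The sketch below translates DNA with the standard genetic code (NCBI table 1; table 11, cited in the MS2 GenBank excerpt later in this page, uses the same amino-acid assignments and differs only in allowed start codons) and computes GC content, one property optimizers often adjust. Function names are illustrative.

```python
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in frame from the first base; '*' marks stop codons."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def gc_content(dna: str) -> float:
    """Fraction of G + C bases."""
    dna = "".join(dna.split()).upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    original_start = "atggaaacccgattccct"   # natural MS2 lysis gene, first 6 codons
    optimized_start = "ATGGAAACCCGCTTTCCG"  # codon-optimized version, first 6 codons
    print(translate(original_start), translate(optimized_start))
    print(gc_content(original_start), gc_content(optimized_start))
```

Comparing `translate(original)` with `translate(optimized)` for the full lysis gene should give identical protein strings; the first six codons of both already translate to METRFP.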

3.4. You have a sequence! Now what?

3.5. [Optional] How does it work in nature/biological systems?

  1. Describe how a single gene codes for multiple proteins at the transcriptional level.
  2. Try aligning the DNA sequence, the transcribed RNA, and the resulting translated protein, like the example provided at [https://2026a.htgaa.org/2026a/course-pages/weeks/week-02/index.html]
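A codon-by-codon layout makes the DNA, RNA, and protein alignment easy to read. The sketch below (illustrative names; standard genetic code) prints one column per codon so each amino acid sits under the triplet that encodes it.

```python
# Standard genetic code (NCBI table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def align_triplets(dna: str) -> tuple[str, str, str]:
    """Return DNA, RNA and protein lines with one 3-character column per codon."""
    dna = "".join(dna.split()).upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    dna_line = " ".join(codons)
    rna_line = " ".join(codon.replace("T", "U") for codon in codons)
    protein_line = " ".join(f" {CODON_TABLE[codon]} " for codon in codons)
    return dna_line, rna_line, protein_line


if __name__ == "__main__":
    for line in align_triplets("ATGGAAACCCGATTCCCT"):  # start of the MS2 lysis gene
        print(line)
```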

Reading DNA

Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account and a Benchling account

  • create Twist and Benchling accounts
  • Pick our protein! I will pick a protein related to aging for my final project, but I am just trying to keep my head above water on HW2, so the protein I pick here is the example provided. See the record below: what sort of nucleotides are “M E T…”? Clearly those aren’t nucleotides; they are single-letter codes for amino acids, each of which is encoded by a codon of 3 nucleotides. Here we are given the protein, the end product of the Build, so we must run the Central Dogma in reverse: reverse-translate back to RNA, then reverse-transcribe back to DNA.
>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
                     /note="unnamed protein product; L-protein"
                     /codon_start=1
                     /transl_table=11
                     /protein_id="CAA23990.1"
                     /db_xref="GOA:P03609"
                     /db_xref="InterPro:IPR022599"
                     /db_xref="UniProtKB/Swiss-Prot:P03609"
                     /translation="METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFL
                     AIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
     CDS             1761..3398

Here is a crude example of what running backwards looks like. In this instance we go all the way back (1996) to the original sequence of the phage MS2 L-protein in its genome. The excerpt above comes from the “phage MS2 genome” GenBank record [https://www.ncbi.nlm.nih.gov/nuccore/V00642].

Note that this sequence does not come from the sequence section at the bottom of the full GenBank file; instead, the L-protein region must be selected and then further trimmed to match the sequence below from the HW2 blog. With the correct NCBI link we can confirm that the sequence from the blog came from this GenBank record [https://www.ncbi.nlm.nih.gov/nuccore/NC_001417.2?from=1678&to=1905&report=genbank]. I will also move this GenBank file into Benchling instead of the previous file.

          atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

A closer match to the genome nucleotides is obtained through another NCBI lookup [https://www.ncbi.nlm.nih.gov/nuccore/NC_001417.2?report=fasta&from=1643&to=1938], though even here the resulting gene fragment must be further trimmed: the L coding sequence (1678..1905) begins at the ATGGAAACC… in the second sequence line below.

>NC_001417.2:1643-1938 phage MS2 genome
GCTTATTGTTAAGGCA|
ATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAAC
TCCGGCATCTACTAATAGACGCCGGCCATTCAAACATGAGGATTACCCATGTCGAAGACAACAAAGAAGT
TCAACTCTTTATGTATTGATCTTCCTCGCGATCTTTCTCTCGAAATTTACCAATCAATTGCTTCTGTCGC
TACTGGAAGCGGTGATCCGCACAGTGACGACTTTACAGCAATTGCTTACTTAA|
GGGACGAATTGCTCACA
AAGCATCCGACCTTAG
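The trimming above can be done by coordinates instead of by eye. The sketch below, with hypothetical helper names, strips line breaks and the | markers, finds the lysis CDS inside the 1643..1938 window, and converts the offset to genome coordinates. Applied to the full window and the lysis sequence above, it should return (1678, 1905), matching the NCBI link for the L gene.

```python
def clean(seq: str) -> str:
    """Keep only A/C/G/T, dropping line breaks, '|' markers and case."""
    return "".join(ch for ch in seq.upper() if ch in "ACGT")


def locate(window: str, window_start: int, cds: str) -> tuple[int, int]:
    """1-based inclusive genome coordinates of `cds` within `window`.

    `window_start` is the genome coordinate of the window's first base
    (1643 for the NC_001417.2:1643-1938 FASTA above)."""
    window, cds = clean(window), clean(cds)
    offset = window.find(cds)
    if offset == -1:
        raise ValueError("CDS not found in window")
    start = window_start + offset
    return start, start + len(cds) - 1


if __name__ == "__main__":
    window_head = "GCTTATTGTTAAGGCA|\nATGCAAGGTCTCCTAAAAGATGGAAACCC"
    print(locate(window_head, 1643, "atggaaaccc"))  # start of the lysis CDS
```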

Reflecting: since we need the gene that codes for the LYS_BPMS2 Lysis protein of Escherichia phage MS2, we go back to a GenBank record from 1996, an era when RNA viruses were a main route for engineering molecular biology tools, such as tagging RNA strands with MS2 stem-loops. The record is based on the original RNA genome of MS2, stored in GenBank as DNA (T in place of U).

  • select Genes on the page with prompt “what can twist build for you?” for HW2
  • name the project “L protein” with “L” for “Lysis” for HW2.
  • select Clonal Genes order card and press “Order Now” when prompted to select gene type for HW2.
  • Avoid my mistake: this next page takes us to an “Excel-like” worksheet that we will use to develop our request. The old-school way was to download and upload meticulously formatted Excel spreadsheets; we are advanced humans capable of using web forms. Before entering the DNA we need into this order form, we have to work through the DNA we were given to read in HW2. After completing the codon-optimization process on the Twist website, we now have a codon-optimized Lysis protein DNA sequence:
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

Optional: if we were going to synthesize more of this protein, we now have a set of genetic instructions that can be read to produce it. However, there are different methods we can use to Build the protein, including cell-dependent and cell-free approaches. I will explain more about these when I pick my protein.

In preparation for next steps, remember how to convert my codon-optimized Lysis protein DNA sequence to RNA in Benchling: “Highlight the DNA sequence of interest.” “Right-click and select Copy Special.” “Choose the Reverse Complement option to get the anti-sense strand.” “Create a New DNA/RNA Sequence and paste the sequence, ensuring the type is set to “RNA”.” Note that the reverse complement is the antisense (template) strand; the mRNA itself matches the coding (sense) strand, with U in place of T.

4.2. Build Your DNA Insert Sequence

  • Let’s first organize our directories in Benchling for the assembly line.
  • Create a folder for the Registry of Standard Biological Parts [https://parts.igem.org/Part:BBa_J23106]. In that folder, create the following subfolders: A_Promoter, B_RBS, C_Start Codon, D_Coding Sequence, E_7x His Tag, F_Stop Codon, G_Terminator.

HW2 Objective of assembly: make a sequence that will make E. coli glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein).

In Benchling, select New DNA/RNA sequence

Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).

Go through each piece of the given DNA sequences listed below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with your codon-optimized DNA sequence of interest!). Each time you add a new piece of the sequence, annotate it by right-clicking over the sequence and creating an annotation that describes what that piece (e.g., Promoter, RBS) is.

Promoter (e.g. BBa_J23106) TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (e.g. BBa_B0034 with spacers for optimal expression) CATTAAAGAGGAGAAAGGTACC

Start Codon ATG

Coding Sequence (your codon optimized DNA for a protein of interest, sfGFP for example) AGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli) CATCACCATCACCATCATCAC

Stop Codon TAA

Terminator (e.g. BBa_B0015) CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
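Building the cassette in Benchling amounts to concatenating the parts in order and recording where each one starts and ends. The sketch below mimics that bookkeeping with 1-based inclusive coordinates (the convention GenBank uses); the part labels and function name are illustrative.

```python
def assemble(parts: list[tuple[str, str]]) -> tuple[str, list[tuple[str, int, int]]]:
    """Concatenate (label, sequence) parts in order.

    Returns the full sequence and one (label, start, end) annotation per
    part, using 1-based inclusive coordinates."""
    pieces: list[str] = []
    annotations: list[tuple[str, int, int]] = []
    position = 1
    for label, seq in parts:
        seq = "".join(seq.split()).upper()
        pieces.append(seq)
        annotations.append((label, position, position + len(seq) - 1))
        position += len(seq)
    return "".join(pieces), annotations


if __name__ == "__main__":
    parts = [
        ("Promoter (BBa_J23106)", "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"),
        ("RBS (BBa_B0034 + spacers)", "CATTAAAGAGGAGAAAGGTACC"),
        ("Start codon", "ATG"),
    ]
    cassette, annotations = assemble(parts)
    for label, start, end in annotations:
        print(f"{label}: {start}..{end}")
```

Appending the coding sequence, His tag, stop codon, and terminator in the same way gives the full cassette plus one annotation per part.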

Once you’ve completed this, click on Linear Map to preview the entire sequence. If you intend to have a TA review a sequence in the future, this is a good way to verify that all sections are annotated!

This is not required for this exercise, but to share your design with others, please ensure that link sharing is turned on! (Optional) Share your final sequence link with a TA for review!

This insert sequence you built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). Go ahead and download the FASTA file for the sequence you made.

It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey your designs.

Here is my practice assembled copy of the HW2 gene fragment that I will import into Twist. However, I will not submit an actual order to Twist, because this is just my demonstration Clonal Gene fragment. I will repeat these steps with my own functional gene for an official purchase order.

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

Here is my final Twist purchase order, though I will not actually purchase this either until an experiment can be developed.

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC
CATTAAAGAGGAGAAAGGTACC
ATG
ATGAGTCAGAATACGCTGAAAGTTCATGATTTAAATGAAGATGCGGAATTTGATGAGAACGGAGTTGAGGTTTTTGACGAAAAGGCCTTAGTAGAAGAGGAACCCAGTGATAACGATTTGGCCGAAGAGGAACTGTTATCGCAGGGAGCCACACAGCGTGTGCTGGACGCGACTCAGCTTTACCTTGGTGAGATTGGTTATTCACCACTGTTAACGGCCGAAGAAGAAGTTTATTTTGCGCGTCGCGCACTGCGTGGAGATGTCGCCTCTCGCCGCCGGATGATCGAGAGTAACTTGCGTCTGGTGGTAAAAATTGCCCGCCGTTATGGCAATCGTGGTCTGGCGTTGCTGGACCTGATCGAAGAGGGCAACCTGGGGCTGATCCGCGCGGTAGAGAAGTTTGACCCGGAACGTGGTTTCCGCTTCTCAACATACGCAACCTGGTGGATTCGCCAGACGATCGAACGGGCGATTATGAACCAAACCCGTACTATTCGTTTGCCGATTCACATCGTAAAGGAGCTGAACGTTTACCTGCGAACCGCACGTGAGTTGTCCCATAAGCTGGACCACGAACCAAGTGCGGAAGAGATCGCAGAGCAACTGGATAAGCCAGTTGATGACGTCAGCCGTATGCTTCGTCTTAACGAGCGCATTACCTCGGTAGACACCCCGCTGGGTGGTGATTCCGAAAAAGCGTTGCTGGACATCCTGGCCGATGAAAAAGAGAATGGTCCGGAAGATACCACGCAAGATGACGATATGAAGCAGAGCATCGTCAAATGGCTGTTCGAGCTGAACGCCAAACAGCGTGAAGTACTGGCACGTCGATTCGGTTTGCTGGGGTACGAAGCGGCAACACTGGAAGATGTAGGTCGTGAAATTGGCCTCACCCGTGAACGTGTTCGCCAGATTCAGGTTGAAGGCCTGCGCCGTTTGCGCGAAATCCTGCAAACGCAGGGGCTGAATATCGAAGCGCTGTTCCGCGAAGAAGTAAGCATCTGTCAGAAAGGCCAGTCTCAAGCGAGGCTGGCTTTTTTTCTTTTGGTACATGGTACATGTTGA
CATCACCATCACCATCATCAC
TAA
CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
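Before ordering, it is worth checking the reading frame from the start codon through the stop codon. The sketch below (hypothetical helper names) flags a frame length that is not a multiple of 3, a duplicated ATG, and stop codons that appear before the last codon. As listed, the order above would trip two of these checks: the separate ATG line is followed by a coding sequence that already starts with ATG, and the RpoS coding sequence ends in its own TGA stop, so translation would end before the 7×His tag.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(orf: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop, reading from base 0."""
    orf = "".join(orf.split()).upper()
    for index in range(len(orf) // 3):
        if orf[3 * index:3 * index + 3] in STOP_CODONS:
            return index
    return None


def check_orf(orf: str) -> list[str]:
    """Warnings for a start-codon-to-stop-codon reading frame (sketch)."""
    orf = "".join(orf.split()).upper()
    warnings = []
    if len(orf) % 3:
        warnings.append("length is not a multiple of 3")
    if not orf.startswith("ATG"):
        warnings.append("does not start with ATG")
    elif orf.startswith("ATGATG"):
        warnings.append("starts with ATGATG (duplicated start codon?)")
    stop = first_in_frame_stop(orf)
    last_codon = len(orf) // 3 - 1
    if stop is None:
        warnings.append("no in-frame stop codon")
    elif stop != last_codon:
        warnings.append(f"premature stop at codon {stop + 1} of {last_codon + 1}")
    return warnings
```

Run it on the start codon + coding sequence + His tag + stop codon lines joined together; an empty list means none of these checks fired.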

4.3. On Twist, Select the “Genes” Option

4.4. Select the “Clonal Genes” Option

For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project.

Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly.

Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. One advantage: if designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.

4.5. Import Your Sequence

You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.

4.6. Choose Your Vector

Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle!

The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance.

Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later.
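Before picking enzymes to flank the insert, check that they do not also cut inside it. The sketch below scans an insert for palindromic sites (illustrative names and a toy insert). For example, the BBa_B0034 RBS used in this cassette ends in GGTACC, a KpnI site, so KpnI would not be a good flanking choice here.

```python
def internal_sites(insert: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """0-based positions of each enzyme's site inside the insert.

    Scanning the top strand is enough here because every site checked is
    palindromic (it reads the same on both strands)."""
    insert = "".join(insert.split()).upper()
    hits: dict[str, list[int]] = {}
    for name, site in sites.items():
        positions = [i for i in range(len(insert) - len(site) + 1)
                     if insert[i:i + len(site)] == site]
        if positions:
            hits[name] = positions
    return hits


def usable_flanking_enzymes(insert: str, sites: dict[str, str]) -> list[str]:
    """Enzymes whose site does not occur inside the insert."""
    hits = internal_sites(insert, sites)
    return [name for name in sites if name not in hits]


if __name__ == "__main__":
    candidate_sites = {"EcoRI": "GAATTC", "KpnI": "GGTACC", "BamHI": "GGATCC"}
    toy_insert = "ATGGGTACCAAAGAATTCTAA"  # hypothetical insert
    print(internal_sites(toy_insert, candidate_sites))
    print(usable_flanking_enzymes(toy_insert, candidate_sites))
```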

For this demonstration, choose a Twist cloning vector such as pTwist Amp High Copy.

Click into your sequence and select Download Construct (GenBank) to get the full plasmid sequence.

Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded.

This is the plasmid you just built with your expression cassette included. Congratulations on building your first plasmid!

Important

For your final projects, remember to include:

  • Fully annotated Benchling insert fragment
  • Desired Twist cloning vector

Part 5: DNA Read/Write/Edit

Assignees for the following sections:
  • MIT/Harvard students: Required
  • Committed Listeners: Required

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would like to use directed evolution in outbred goats to select the DNA I sequence. My plan is therefore to stick with the HTGAA methods until I have a Nanopore sequencer and reagents for genomic surveillance of my herd. My argument for why is still developing, but essentially I have anecdotal observations that support a hypothesis. An example of the type of gene I would like to sequence is the second sequence I uploaded: the RpoS gene of E. coli DSM 30083 (ATCC 11775), whose genome was sequenced with an Oxford Nanopore sequencer.

DNA-based digital data storage technology. Source: Archives in DNA: Workshop Exploring Implications of an Emerging Bio-Digital Technology through Design Fiction - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/DNA-based-digital-data-storage-technology_fig1_353128454 [accessed 11 Feb 2025]

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

There is no substitute, in my opinion, for a Nanopore sequencer, with PacBio a distant second. Of course, Nanopore sequencers are far less popular than Illumina, and despite the fact that I am a big fan of Craig Venter, I still prefer the scientific opportunities available with Nanopore. In fact, a significant reason why I went back to school after my reproductive fitness reached zero is that when I graduated from college, the Human Genome Project still hadn’t been completed. I learned about nanopores during the COVID-19 pandemic, when I started one of the first wastewater surveillance programs in the U.S. I believe the accuracy, speed, and flexibility of nanopore sequencing, which reads single molecules base by base as they pass through many pores in parallel, fit my future research goals exactly, and I am on the cusp of becoming a Nanopore super user.

Also answer the following questions:

  • Is your method first-, second-, or third-generation, or other? How so?
  • What is your input? How do you prepare your input (e.g., fragmentation, adapter ligation, PCR)? List the essential steps.
  • What are the essential steps of your chosen sequencing technology, and how does it decode the bases of your DNA sample (base calling)?
  • What is the output of your chosen sequencing technology?

I am focused on deep time-series sequencing data that is broad enough to capture changes in microbiome diversity; host metatranscriptomic, epigenetic, and metabolomic signals; and metagenomic changes. I also want to develop pipelines that I can always contribute to but never want to own or primarily benefit from. I believe paywall science is an etiologic mechanism that favors contagion.

The first benefit of third- and fourth-generation sequencers, particularly the Nanopore machines, is that they do not require PCR amplification. Don't get me wrong, I love PCR as a flexible assay, but as an epidemiologist I have never been comfortable, on principle, with making more copies of pathogens. I realize this is a bit of a semantic argument and that plenty of biosafety measures are in place. At the same time, those same biosecurity measures drive inequality in access to applied molecular biology capabilities. What does it mean when the technology itself drives inequality in access to scientific techniques that everyone in a generation should have? I believe it means it's time to keep innovating.

In addition, I think there is wisdom in sequencing the actual shoddy molecules collected from the field, particularly for my applications. This is a biosecurity advantage and a better fit to the Epi Triangle anyway. I am not saying that higher-level biosecurity reference labs with PCR pipelines are unnecessary; there are scenarios where they are essential. I just think some sequencers should be managed and maintained by governments, while smaller, non-PCR-based Nanopores should be prioritized for individual field researchers, like I intend to be.

Now, there is an elephant in the room, though, and that's data storage. I have been wrestling with data storage my entire career, and I know my interest in Nanopore sequencing isn't going to make these challenges go away anytime soon. Therefore, I am all for DNA storage of genomic sequencing information about animals, ideally in plant DNA. If a safe method is already available, storage in animal subjects would be incredible as well. What DNA storage is maintained in goat horns or sheep's wool? I need to investigate the methodology further to see if this is even possible. I am ashamed to admit that until I read Dr. Church's Epilogue in Regenesis, I had never even thought about this before.
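The DNA-based data storage figure at the top of this section rests on one basic idea: binary data can be mapped onto the four bases. As a minimal sketch, assuming a naive 2-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T) chosen here purely for illustration, the block below round-trips bytes through DNA. Practical storage schemes add error correction, addressing indexes, and sequence constraints (for example, avoiding long homopolymer runs that are hard to synthesize and sequence), none of which is modeled here; the homopolymer check only shows why the naive mapping is risky.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Map each byte to 4 bases, 2 bits per base (naive, no error correction)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Invert encode(); the strand length must be a multiple of 4 bases."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in dna.upper())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def longest_homopolymer(dna: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    prev = ""
    for base in dna:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest

strand = encode(b"Hi")
print(strand, decode(strand), longest_homopolymer(encode(b"\x00\x00")))
```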

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I am dazzled by DNA origamis for synthetic materials, but the complexity of the methods needed to achieve static outputs is not a tradeoff I would invest time in right now. Genetic circuits are different, though; I am fully attentive to this revolution, particularly the examples provided by the Elowitz Lab [https://www.elowitz.caltech.edu/research#!computationandsyntheticcircuits].

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

I would like to learn more about the CHOMP (circuits of hacked orthogonal modular proteases) method, which integrates binary logic into functional programming modules in biological circuitry. The CHOMP method can be used to control regulatory cascades and, more exciting to me, binary logic gates. My research interests are nonlinear models of aging and cellular senescence that use elucidated insights to improve areas of stasis, with the potential for rejuvenation. These processes are waves of molecular decisions, each with an underlying binary logic that can be deciphered with Boolean logic. The engineering methods focus on viruses and bacteria. The programming motif they target is the incoherent feed-forward loop. The amino acid they interact with is the nitrogen end of tyrosine, which they expand into a four-protein circuit. They characterize their single-transcript adaptive pulse circuits with time-lapse imaging. The result of the engineering is a ratchet that controls the intrinsic nonlinearity between the inputs and outputs of biological systems. Scalability and accuracy are tunable to the application. Design is slow, but once implemented the circuits run as fast as biological circuits. Another method I would be interested in investigating further is the noninvasive biosensor application from Chacko et al. (2026), which uses live-cell diffusion-weighted imaging to investigate the effect of Gly-Ser spacers in transcription.

  • Source: Xiaojing J. Gao et al., Programmable protein circuits in living cells. Science 361, 1252-1258 (2018). DOI: 10.1126/science.aat5062
  • Asish N. Chacko et al., A programmable genetic platform for engineering noninvasive biosensors. Sci. Adv. 12, eaec1211 (2026). DOI: 10.1126/sciadv.aec1211
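The incoherent feed-forward loop mentioned in the CHOMP paragraph produces an adaptive pulse: an input switches on an output directly and, more slowly, switches on a repressor of that same output. As a minimal sketch, the block below integrates a generic type-1 incoherent feed-forward loop with forward Euler steps. The equations, parameter values, and function name are illustrative assumptions, not the protease model from Gao et al. (2018).

```python
def simulate_iffl(x=1.0, a=1.0, b=0.1, c=1.0, d=1.0, k=1.0, dt=0.01, t_end=100.0):
    """Integrate a generic incoherent feed-forward loop for a step input x at t = 0.

    dY/dt = a*x - b*Y              slow repressor, activated by the input
    dZ/dt = c*x / (1 + Y/k) - d*Z  fast output, activated by the input, repressed by Y
    Returns (times, repressor_values, output_values).
    """
    y = z = 0.0
    times, ys, zs = [0.0], [y], [z]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        dy = a * x - b * y
        dz = c * x / (1.0 + y / k) - d * z
        y += dy * dt
        z += dz * dt
        times.append(i * dt)
        ys.append(y)
        zs.append(z)
    return times, ys, zs

times, ys, zs = simulate_iffl()
peak = max(zs)
print(f"peak output {peak:.2f} at t = {times[zs.index(peak)]:.1f}; final output {zs[-1]:.3f}")
```

Because the repressor is slow (small b) and the output is fast (large d), the output first rises toward c*x/d and then relaxes toward c*x / (d * (1 + a*x/(b*k))), which is the pulse shape the paper's time-lapse imaging tracks.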

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I am for animals. I want to contribute to de-extinction and to the life extension of all the endangered and vulnerable organisms I can serve. I will contribute to human longevity as an afterthought to natural diversity and sustainability. I am not beholden to humans, though. I believe in the sanctity of all life.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions: How does your technology of choice edit DNA? What are the essential steps? What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The synthesis of DNA and OMIC ontologies starts with phylogeny, small molecules, and phenotypes like disease. Therefore, my career plan is to build the throughput for studying molecule-mediated bidirectional interactions between host physiology and the microbes in the microbiome and metagenome around the host. This is the big-tent vision. Now, how do I get there? Imagine if I laid it all out here step by step: how could I do what I aspire to do if 1,000 people did it before me? Still, I come to HTGAA a pleb at the stairs to the Temple of Zeus, with my goats. Not to sacrifice, though. I will not be a culling scientist. There is no scientific discovery in the text of life that is worth sacrificing a living thing. I am a builder by nature anyway. I want to observe life without intervention; well, some intervention is necessary, but not the way it's currently done. So what to do?

Here is what I can share at HW2. Everywhere an organism lives, say a goat, it leaves a wake of DNA and RNA material. Much of it is waste, residues from competing metabolic systems stacked and ready to be interpreted and transformed into data. With data come constraints, especially with OMIC data: it's endless, massive in scale, randomized, and chaotic. I like the idea of applied systems biology pipelines built around dead biological material. You can catalogue and reconstruct living systems from waste chemistry. Do you need an almond in its shell to understand the life history of that nut? Not really; a fragment of husk in a pile on the ground will tell you about the almond and the animal that consumed it.
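The almond-husk analogy describes, in effect, how metagenomic classifiers assign short, degraded fragments back to the organisms they came from by looking up shared k-mers (short subsequences of length k). As a minimal sketch, assuming tiny made-up reference sequences and hypothetical helper names, the block below assigns a fragment to whichever reference contains the largest fraction of its k-mers. Real classifiers such as Kraken use large indexed databases and taxonomy-aware rules rather than this brute-force comparison.

```python
def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_fragment(fragment, references, k=5):
    """Return (best_reference_name, containment) for a fragment.

    Containment = fraction of the fragment's k-mers found in the reference.
    """
    frag_kmers = kmers(fragment, k)
    if not frag_kmers:
        return None, 0.0
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = len(frag_kmers & kmers(ref, k)) / len(frag_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Made-up toy sequences, NOT real almond or goat DNA.
toy_refs = {
    "plant_toy": "ATGGCTTCCAAGGTTCTAGCAGGTACCGTT",
    "animal_toy": "ATGACCCGTTTAGGCAACGGGTTACGCATT",
}
fragment = "CCAAGGTTCTAGCAG"  # a short piece copied from plant_toy
print(assign_fragment(fragment, toy_refs))
```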

COVID-19 did not equivocate in its lessons for me as a front-line epidemiologist in the center of the maelstrom. First and foremost, public health apparatuses like mRNA vaccine research and deployment infrastructure are useful when they are available, accessible, and appropriately matched to the agent. The rest is wastewater. Especially where sewer infrastructure is insufficient to remove waste from a community quickly, wastewater can be used to trace outbreaks in near real time. Wastewater surveillance is harder where the water is plentiful, deep, and fast-flowing. The great news for wastewater epidemic surveillance is that the structural inequalities above the sewers exist within the sewers and drive disease transmission in outbreaks. This isn't a hunch; the data support it. This is why I will continue to be interested in wastewater surveillance when I enter the workforce.
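To make "trace outbreaks in near real time" concrete: wastewater programs commonly normalize viral target concentrations to a fecal-strength marker such as pepper mild mottle virus (PMMoV) and then track the trend of the normalized signal. The block below is a minimal sketch, assuming hypothetical sampling days and copy numbers and a least-squares fit of the log-normalized signal; it is loosely modeled on the percent-change metrics that public surveillance dashboards report and is not any specific program's method.

```python
import math

def percent_change(days, target_copies, pmmov_copies, span_days=15):
    """Fit ln(target / PMMoV) against sampling day; return the implied % change over span_days."""
    if len(set(days)) < 2:
        raise ValueError("need at least two distinct sampling days")
    y = [math.log(t / p) for t, p in zip(target_copies, pmmov_copies)]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(days, y))
    slope = sxy / sxx  # growth rate of the normalized signal, per day
    return (math.exp(slope * span_days) - 1) * 100

# Hypothetical sampling days and copies per liter, made up for illustration.
days = [0, 3, 7, 10, 14]
target = [1.2e4, 1.6e4, 2.5e4, 3.1e4, 4.9e4]
pmmov = [2.0e7, 2.2e7, 2.1e7, 1.9e7, 2.0e7]
print(f"{percent_change(days, target, pmmov):.0f}% change over 15 days")
```

Normalizing to PMMoV corrects for dilution (rain, industrial flow), so a rising ratio points to more shedding rather than simply less water.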

However, I will focus on much broader networks of waste than wastewater, which is what makes the intersection of gut microbiomes, microbes, and host physiology the biological nexus for me. There are many options for applications here, especially in agriculture. I like agriculture because soil is the ultimate biological pile of waste. I have watched animal waste turn into dirt for several years now, and from that waste, plants grow. The animals eat those plants and turn them into animal tissues using systems of heredity and variability that have nothing to do with anything I did. I just get the animals in front of a plant, and they complete their reproductive and maintenance programs. If I keep the animals' water clean and their housing dry, they do not get sick. These animals and the environment are an engine that I can run passively, and they make the world a better place.

At the same time, this natural experiment produces many opportunities to study molecule-mediated bidirectional relationships between animal hosts and the microbes in their microbiome and metagenome. Fortunately for my experimental milieu, my species is driving Earth toward the extreme boundary conditions for habitability, which certainly makes science more interesting, especially when local interventions can be developed to support sustainability, health, and longevity.

The last sentence is key for the edits I would dare to make. Never blindly, though. This is why I will structure my lab within evolution-directed synthesis.
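The edit question above asks what design work precedes an edit. For the SpCas9-based base editors listed under Resources (BE4 and ABE7.10, both requiring an NGG PAM), guide design comes down to the two constraints described there: a PAM next to the protospacer, and a target base inside a narrow deamination window. A minimal sketch, assuming a 20-nt protospacer, a forward-strand-only search, and an editing window of protospacer positions 4 to 8 (counting the PAM-distal end as position 1; a commonly cited approximation whose exact bounds vary by editor). The function name and test sequence are hypothetical.

```python
def base_editor_guides(seq, target_index, target_base="C", window=(4, 8), spacer_len=20):
    """List forward-strand SpCas9 guides that place seq[target_index] in the editing window.

    Each hit is (protospacer, PAM, 1-based position of the target within the protospacer).
    """
    seq = seq.upper()
    if seq[target_index] != target_base:
        raise ValueError(f"base at {target_index} is {seq[target_index]}, not {target_base}")
    hits = []
    for pam_start in range(spacer_len, len(seq) - 2):
        pam = seq[pam_start:pam_start + 3]
        if pam[1:] != "GG":  # SpCas9 NGG PAM
            continue
        spacer_start = pam_start - spacer_len
        position = target_index - spacer_start + 1  # PAM-distal end = position 1
        if window[0] <= position <= window[1]:
            hits.append((seq[spacer_start:pam_start], pam, position))
    return hits

# Hypothetical sequence: target C at index 4, followed 15 nt later by a TGG PAM.
demo = "TTTTC" + "T" * 15 + "TGG" + "TTT"
print(base_editor_guides(demo, 4))
```

The reverse strand can be searched the same way by reverse-complementing the sequence, and an adenine base editor would use target_base="A" with its own window.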

Resources

  • DNA Sequencing at 40: Past, Present, and Future (2017), Shendure, J., Balasubramanian, S., Church, G. et al. https://doi.org/10.1038/nature24286
  • DNA Synthesis Technologies to Close the Gene Writing Gap (2023), Hoose, A., Vellacott, R., Storch, M. et al. https://doi.org/10.1038/s41570-022-00456-9
  • Recombineering and MAGE (2021), Wannier, T. et al., Nat Rev Methods Primers. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9083505/
  • CRISPR Technology: A Decade of Genome Editing is Only the Beginning, Wang, Doudna, et al. https://www.science.org/doi/10.1126/science.add8643

Databases

  • GenBank overview: https://www.ncbi.nlm.nih.gov/genbank/
  • NCBI: https://www.ncbi.nlm.nih.gov/genome/
  • Ensembl: https://useast.ensembl.org/index.html
  • UCSC Genome Browser: https://genome.ucsc.edu/
  • Protective and Enhancing Alleles: https://arep.med.harvard.edu/gmc/protect.html

Editors and tutorials

CRISPR/Cas9
  • Short tutorial for designing gRNAs: https://blog.addgene.org/how-to-design-your-grna-for-crispr-genome-editing
  • Benchling-specific tutorial for designing gRNAs: https://www.benchling.com/blog/how-to-design-grnas-to-target-your-favorite-gene
  • List of Cas editors and their PAM sites: https://www.synthego.com/guide/how-to-use-crispr/pam-sequence

Base Editors

Base editors contain a nicking or dead Cas9 enzyme fused to a deaminase.
  a.) PAM requirement: because base editors are built on a Cas9 enzyme, designing your guide RNA for base editing carries a PAM requirement, just as in any Cas9 experiment.
  b.) Deamination window: an additional design constraint is that the sequence window in which deamination occurs is only a few base pairs long. You can find information on the deamination windows in the review below (although some newer editors are not included). BE4 and ABE7.10 are good starting points; both use SpCas9 with an NGG PAM requirement. Base editors with other PAM sites have been constructed too.
  • Review of base editors (2018), including a list of all base editors, their editing windows, and PAM requirements: https://www.nature.com/articles/s41576-018-0059-1?WT.feed_name=subjects_animal-biotechnology

Other editors
  • Prime editor: https://www.nature.com/articles/s41586-019-1711-4
  • Tutorials/tools: https://primeedit.nygenome.org/ https://www.nature.com/articles/s41551-020-00622-8 http://pegfinder.sidichenlab.org/

TALEN

For TALENs, you can assume no sequence restrictions. One of the technology's previous restrictions was a T starting base, but this has since been overcome. In contrast to the CRISPR/Cas technologies above, your DNA sequence is recognized through interactions between the DNA and the TALEN: each TAL in the array recognizes one base. (Note: to introduce a double-strand break, you will need to design two TALENs targeting the opposing strands.)
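The TALEN note above says each TAL in the array recognizes one base. That recognition is usually summarized by the repeat-variable diresidue (RVD) code, in which the common RVDs NI, HD, NG, and NN bind A, C, T, and G respectively (NN also tolerates A). A minimal sketch, assuming only these four RVDs and hypothetical helper names, translates a target region into the two repeat arrays of a TALEN pair: the left TALEN reads its half-site on the top strand, and the right TALEN reads the reverse complement of its half-site on the opposing strand. It ignores the starting-base preference, spacer-length rules, and half-repeat conventions that real design tools handle.

```python
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Bottom-strand sequence read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def rvd_array(half_site: str) -> list:
    """One RVD per base of the half-site, in binding order."""
    return [RVD_FOR_BASE[base] for base in half_site.upper()]

def talen_pair(region: str, left_len: int, spacer_len: int, right_len: int):
    """Split region into left half-site, spacer, right half-site; return both RVD arrays."""
    if len(region) != left_len + spacer_len + right_len:
        raise ValueError("region length must equal left + spacer + right")
    left_site = region[:left_len]
    right_site = region[left_len + spacer_len:]
    return rvd_array(left_site), rvd_array(reverse_complement(right_site))

# Hypothetical 13-nt region: 4-nt left site, 5-nt spacer, 4-nt right site.
left, right = talen_pair("TACG" + "GGGGG" + "AACG", 4, 5, 4)
print(left, right)
```

Real TALEN half-sites are much longer (typically well over a dozen repeats); the short region here only keeps the example readable.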
  • Short guide: https://www.addgene.org/talen/guide/
  • One of the available design resources: https://tale-nt.cac.cornell.edu/node/add/talen
  • Directed evolution for overcoming the starting-base restriction: https://academic.oup.com/nar/article/41/21/9779/1276340

Additional Resources:

  • Gel purification of DNA: after DNA gel electrophoresis, cutting a band of DNA out of the agarose gel allows isolation and purification of a specific DNA fragment: Addgene: Protocol - How to Purify DNA from an Agarose Gel
  • Overview of synthetic, unnatural organisms using recoding: Synthetic genomes with altered genetic codes (2020)
  • DNA recorders, Sense+Read+Write: Lineage tracing and analog recording in mammalian cells by single-site DNA writing (2021)
  • Molecular electronics, integrating single molecules into electronic chips: Molecular electronics sensors on a scalable semiconductor chip: A platform for single-molecule measurement of binding kinetics and enzyme activity (2022)
  • Review of genome editors (zinc finger nucleases, TALENs, CRISPR) at the time CRISPR was emerging as an editing technology: https://www.cell.com/trends/biotechnology/pdf/S0167-7799(13)00087-5.pdf
  • Clinical trials of genome-editing therapies: https://www.nature.com/articles/d41573-020-00096-y