Week 2 HW: DNA Read, Write, & Edit

This page tackles each of the week 2’s HomeWork Questions.

Part 0:

To understand the basics of Gel Electrophoresis (Gl. Ep.), I watched the following videos:

My understanding of Gl. Ep. is that it simply pulls DNA through a maze, which has channels and pores; each DNA fragment experiences the same force per unit length (so essentially the intended forward acceleration for all DNA fragments would have been the same if there was no friction), but the maze structure creates more resistance/friction to the longer fragments due to which they slow down. Thus we get different bands; unless the pore formation is deterministic and can be atomistically replicated, this is essentially a heuristic which works in real life (as pores in different lanes can also be different; I think there should be a metric to measure if the lanes should have the same weight; this can be achieved by puting the test DNA in the lanes, but also placing a much longer and a much shorter DNA-fragment in all the lane-wells; ideally there should be a straight line formed at the top and bottom (imagine the first and last step of the ladder stretched out across all lanes) and the metric should be how straight the line is! Straighter the line the more holistically equal all the DNA-racing lanes are!).

The answer to the question on this website (under “2. DNA Gel Ladders”) “Because DNA has the same charge per mass for any number of nucleotides, gel electrophoresis separates DNA purely based on length (can you think why?)” is:

~ because the force per unit mass is same, and the only differentiating factor becomes the DNA-travel/movement resistance due to the gel which is proportional to the length (more the length there are more contact points with the gel and longer DNA cant pass through those pores easily, essentially creating a length-specific bottleneck…

Part 1:

Gel Electrophoresis pattern sequence 1a — *1. a.*

Gel Electrophoresis pattern sequence 1b — *1. b.*

Image 1 (a, b) reminds us of the shifting temperatures across the globe due to global warming and motivates us to work to prevent climate change.

(Due to lack of time and the problem being a fathomable combinatorial problem, I chose to use my imagination and get the best of whatever sequence was generated; in case any1 is interested in getting hold of a mathematical formulation which can directly output the enzymes required for each lane of a specific design pattern, I can develop such an MILP formulation…)

Gel Electrophoresis pattern sequence 2a — *2. a.*

Gel Electrophoresis pattern sequence 2b — *2. b.*

Image 2 (a, b) reminds us of an anomalous intelligence drop in the Gen-Z population (the Reverse Flynn effect), which is possibly an effect of high smartphone/social media/internet usage (and more research may be necessary to understand the actual causes and develop policies to combat them).

Part 2: Not applicable for Global Committed Listeners without Lab access (therefore I am skipping this)

Part 3:

3.1. Choose your protein.

I have chosen the TRF2 (Telomeric Repeat-binding Factor 2); this protein seems to be a good choice given my individual project idea as this protein:

binds the T-loop/D-loop directly
stabilizes telomere structure
prevents end-to-end fusion and DNA damage signalling
seems an ideal candidate for telomere protection and/or controlled replication access

The Amino Acid sequence is:

MAAGAGTAGPASGPGVVRDPAASQPRKRPGREGGEGARRSDTMAGGGGSSDGSGRAAGRRASRSSGRARRGRHEPGLGGPAERGAGEARLEEAVNRWVLKFYFHEALRAFRGSRYGDFRQIRDIMQALLVRPLGKEHTVSRLLRVMQCLSRIEEGENLDCSFDMEAELTPLESAINVLEMIKTEFTLTEAVVESSRKLVKEAAVIICIKNKEFEKASKILKKHMSKDPTTQKLRNDLLNIIREKNLAHPVIQNFSYETFQQKMLRFLESHLDDAEPYLLTMAKKALKSESAASSTGKEDKQPAPGPVEKPPREPARQLRNPPTTIGMMTLKAAFKTLSGAQDSEAAFAKLDQKDLVLPTQALPASPALKNKRPRKDENESSAPADGEGGSELQPKNKRMTISRLVLEEDSQSTEPSAGLNSSQEAASAPPSKPTVLNQPLPGEKNPKVPKGKWNSSNGVEEKETWVEEDELFQVQAAPDEDSTTNITKKQKWTVEESEWVKAGVQKYGEGNWAAISKNYPFVNRTAVMIKDRWRTMKRLGMN

Other choices were:

POT1 → single-stranded telomere protection
TERT → telomerase activity (lengthening)
RTEL1 → T-loop unwinding / torsion relief

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

I obtained the mRNA sequence for TERF2 from NCBI (RefSeq: NM_005652.5).

3.3. Codon optimization.

The provided Twist Biosciences Codon Optimization link is defunct. I therefore used Vector Builder where I provided the protein sequence (DNA/RNA sequences can also be provided here) and optimized it against cleavage sites of:

two restriction enzymes AccIII and AhaIII

four restriction enzymes BbsI, BsaI, BsmAI, & BsmI

   ATGGCCGCAGGAGCCGGCACAGCTGGGCCCGCCTCCGGTCCCGGAGTGGTGAGGGATCCAGCTGCCTCCCAGCCCAGAAAGCGCCCCGGCAGAGAGGGCGGCGAGGGCGCCCGCCGAAGCGATACTATGGCCGGAGGCGGAGGCTCCTCCGATGGTTCAGGCAGAGCAGCAGGCCGCCGGGCCTCCAGATCCTCCGGCCGCGCCCGGCGCGGCAGACACGAACCTGGGCTTGGAGGGCCCGCCGAGAGGGGCGCCGGCGAGGCCAGACTGGAGGAGGCCGTGAACCGGTGGGTGCTGAAGTTCTATTTTCACGAGGCCCTGAGAGCCTTTAGGGGGAGCCGGTATGGCGATTTTAGACAGATCAGGGATATTATGCAGGCCCTGCTGGTGCGCCCTCTGGGAAAAGAGCACACCGTGAGCAGACTGCTGAGAGTGATGCAGTGCCTGTCCCGCATCGAGGAGGGCGAAAATCTCGATTGCAGCTTTGACATGGAAGCAGAGCTCACTCCCCTGGAAAGCGCCATCAATGTGCTGGAAATGATCAAGACCGAATTCACCCTGACCGAGGCCGTGGTGGAGTCCTCACGGAAACTGGTTAAGGAGGCTGCCGTGATCATTTGCATTAAGAATAAGGAGTTCGAGAAGGCTAGCAAGATTCTGAAGAAGCACATGTCTAAGGACCCAACAACACAGAAACTGAGGAACGACCTGCTGAACATTATCAGAGAGAAGAACCTGGCCCACCCTGTGATCCAGAATTTCAGCTACGAAACATTCCAGCAGAAAATGCTGAGGTTTCTGGAGTCACACCTGGACGATGCCGAGCCTTATCTGCTGACAATGGCCAAGAAGGCTCTTAAGAGCGAGAGCGCCGCCAGCTCTACCGGCAAGGAGGACAAGCAGCCCGCCCCTGGGCCTGTCGAGAAGCCTCCAAGAGAGCCCGCCCGGCAGCTGAGAAACCCTCCCACCACCATCGGGATGATGACACTGAAGGCTGCCTTCAAGACCCTGAGCGGCGCTCAGGACTCAGAGGCCGCTTTTGCCAAGCTGGATCAGAAGGACCTGGTGCTGCCAACCCAAGCCCTGCCTGCCAGCCCCGCCCTGAAAAATAAAAGGCCAAGGAAAGACGAGAATGAATCCAGCGCACCCGCCGATGGAGAGGGGGGCTCCGAGCTTCAGCCCAAGAACAAGCGGATGACTATTTCCAGACTGGTGCTGGAGGAAGATTCCCAGAGCACCGAGCCTTCCGCAGGCCTCAACAGCAGCCAGGAGGCCGCTTCAGCCCCACCCTCCAAGCCAACTGTCCTGAATCAGCCACTCCCCGGAGAGAAGAACCCCAAGGTGCCAAAGGGGAAATGGAATTCCAGCAATGGCGTGGAAGAGAAGGAAACCTGGGTGGAGGAGGATGAGCTGTTTCAGGTGCAGGCCGCCCCTGACGAGGACAGCACTACTAACATCACTAAGAAGCAGAAGTGGACTGTGGAGGAATCCGAGTGGGTGAAGGCCGGCGTGCAGAAATACGGGGAGGGCAATTGGGCTGCCATTTCCAAGAACTACCCCTTCGTGAATCGGACAGCCGTGATGATCAAAGACCGGTGGAGGACAATGAAGCGGCTGGGCATGAACTGA

The organism selected was human, mentioning this allows further optimization of the codon, for this case of the human nuclear protein.

Codon optimization is generally necessary for improving translation efficiency and protein yield (reducing ribosomal pausing and improving folding); the safeguard against specific restriction enzymes is to ensure that the DNA does not get cut in case any of the subsequent future workflows requires usage of such an enzyme…

3.4. You have a sequence! Now what?

To synthesize this protein from within my DNA (assuming that it is not already present), we can use some technique to insert the obtained (codon optimized) DNA sequence into a viral vector to insert it inside the human DNA. After successful DNA insertion, the Central Dogma will take care of the rest of the protein synthesis process…

3.5. How does it work in nature/biological systems?

A single gene can code for multiple proteins at the transcriptional level, as there can be multiple start sites, alternate splicing of exons, and the 3-nucleotide reading frame, which can essentially pack three times the information.

Part 4: Prepare a Twist DNA Synthesis Order

4.1. Twist account and a Benchling account created

4.2. Build Your DNA Insert Sequence

Final sequence link for TA to review: https://benchling.com/s/seq-brbMJ6xUPhkDudPrgjLR?m=slm-rCY7ZhowmvIW8w7LoEno.

4.3. to 4.6.| My first plasmid: https://benchling.com/s/seq-FhHywDsQ9IWd3ksYDXqa?m=slm-GexKX3FGA9sfzUUrgdLq.

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I am interested in developing a sample of mammalian circular DNA (mice, monkeys, etc.), and understand the complications during such DNA replication (to prevent cancers during cell division). Therefore, I am interested in developing a circular DNA (cDNA; clipping away the telomeres) and merging the two ends and then sequencing that entire cDNA. My motive behind this is to disprove my hunch that cDNA is a viable path to human lifespan/healthspan extension!?!

(ii) _{In lecture, a variety of sequencing technologies were mentioned.} What technology or technologies would you use to perform sequencing on your DNA and why?

I will use the latest Oxford Nanopore sequencing technique because:
- it can read entire circular DNA molecules without any need for fragmentation
- it can sequence the DNA using rolling-circle–amplified replication
- it uses natural motor enzymes

Is your method first-, second- or third-generation or other? How so?

The Oxford Nanopore third-generation sequencing performs single-molecule sequencing without DNA amplification, producing long reads, and preserves the circular DNA topology.

What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

The input will be the (purified) circular mammalian DNA; no fragmentation and no PCR is required.

What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

A motor enzyme feeds DNA through a biological nanopore; each nucleotide creates a change in ionic current, which can be decoded uniquely.

What is the output of your chosen sequencing technology?

From the raw electrical signal data (FAST5), long-read nucleotide sequences (FASTQ/FASTA) are derived, which preserve structural information like junctions, repeats, and circular continuity

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I wish to synthesise a mammalian cDNA and probe its self-replicating properties during cellular division (ensuring that the DNA copy generated does not exhibit features of cancer or other abnormalities).

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I will use oligonucleotide synthesis followed by enzymatic DNA assembly to construct the full mammalian cDNA, because this approach allows precise sequence control, modular assembly, and is compatible with commercial gene synthesis pipelines (e.g., Twist).

What are the essential steps of your chosen sequencing methods?

- the approx 1842 nt long DNA is split into 12 overlapping oligos (175 bp each)
- Chemical synthesis of each short DNA oligonucleotides
- Enzymatic assembly of oligos into the full-length cDNA

What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

- Possibly high error rates as sequence length nears the limit of 200 bp for oligonucleotide synthesis; this can be bypassed by splitting DNA further
- Whole-genome or very large circular constructs may require multi-step assembly

5.3 DNA Edit

(i) What DNA would you want to edit and why?

In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I am interested to edit DNA of most test aminals (ncluding but not limited to, C elegans, fruitflies, mice, monkeys, etc.) starting from the smaller organisms to large mammals. However this is not gene editing; simply clipping off the telomeres and joining the ends of each linear chromosome to make it circular. Next I wish to observe cell-division processes and how such cDNA fares in large animals (mapping out the Hayflick limit changes due to this alternation).
Additionally, I am interested in looking into telomere maintenance and end-protection, especially if their controlled modulation can extend cellular lifespan without inducing genomic instability or cancer.

(ii) What technology or technologies would you use to perform these DNA edits and why?

How does your technology of choice edit DNA? What are the essential steps? What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? What are the limitations of your editing methods (if any) in terms of efficiency or precision?

There seem to be enzymes that are already used to develop circular DNA from linear DNA by clipping telomeres and joining both ends:
- Restriction Endonucleases make staggered cuts in linear DNA
- DNA Ligase joins the ends of linear DNA fragments together
- Protelomerase specifically resolves telomeres, converting linear DNA into circular form

I am also interested in looking at CRISPR-based editors for targeted modifications in case it is necessary to arrest certain telomere(-ase) or related pathways.

The main challenges are in delivering edits uniformly across all chromosomes, and the high risk of genomic instability (especially during cell divisions). Therefore, all experiments will be restricted to somatic cells in model organisms, and will need extensive validation steps.

Additionally, smaller circular DNA exhibits torsion; in case this might also be a concern when large mammalian DNA is made circular, mechanisms to periodically release the torsion need to be developed (whether such a system may be necessary also needs to be ideated -- requires expert guidance and advice).