Week 2 HW: DNA Read, Write and Edit

cover image cover image

Part 1:

Firstly, I made the enzyme list on Benchling, containing the mentioned enzymes:

enzyme list enzyme list

Secondly, I tried playing around with the restriction enzymes to make some kind of resemblance to artwork. I finally made something satisfactory, and had to make the other symmetrical half.

first iteration first iteration

I tried representing a smiling face through the different restriction enzymes on benchling.

benchling art benchling art

Part 3:

3.1

My initial choice was Myosin. However, it was too long of a sequence according to Twist, nearing the end of Part 4. Hence, I have chosen the protein Somatotropin. This is because I find it interesting how such a small protein is responsible for growth in complex creatures like humans.

3.2

I used a Reverse Translate Tool to translate the protein sequence to a nucleotide sequence (based on the Central Dogma). The nucleotide sequence was:

ATGGCGACCGGCAGCCGCACCAGCCTGCTGCTGGCGTTTGGCCTGCTGTGCCTGCCGTGGCTGCAGGAAGGCAGCGCGTTTCCGACCATTCCGCTGAGCCGCCTGTTTGATAACGCGATGCTGCGCGCGCATCGCCTGCATCAGCTGGCGTTTGATACCTATCAGGAATTTGAAGAAGCGTATATTCCGAAAGAACAGAAATATAGCTTTCTGCAGAACCCGCAGACCAGCCTGTGCTTTAGCGAAAGCATTCCGACCCCGAGCAACCGCGAAGAAACCCAGCAGAAAAGCAACCTGGAACTGCTGCGCATTAGCCTGCTGCTGATTCAGAGCTGGCTGGAACCGGTGCAGTTTCTGCGCAGCGTGTTTGCGAACAGCCTGGTGTATGGCGCGAGCGATAGCAACGTGTATGATCTGCTGAAAGATCTGGAAGAAGGCATTCAGACCCTGATGGGCCGCCTGGAAGATGGCAGCCCGCGCACCGGCCAGATTTTTAAACAGACCTATAGCAAATTTGATACCAACAGCCATAACGATGATGCGCTGCTGAAAAACTATGGCCTGCTGTATTGCTTTCGCAAAGATATGGATAAAGTGGAAACCTTTCTGCGCATTGTGCAGTGCCGCAGCGTGGAAGGCAGCTGCGGCTTT

3.3

Codon optimization is necessary so that instead of the “rare” codons being used, the host organism can use the codons it possesses. This improves the yield of the protein and helps in boosting the production.

I used a Codon Optimization Tool to optimize the codons for Escherichia. I did this because it is an organism that has a symbiotic relationship with humans and can be used as a vector to produce the proteins.

ATGGCGACCGGCAGCCGCACCAGCCTGCTGCTGGCGTTTGGCCTGTTATGCCTGCCGTGGCTGCAGGAAGGCAGCGCCTTTCCGACCATCCCGCTGAGCCGTCTGTTCGACAACGCGATGCTGCGCGCGCATCGCCTGCATCAGCTGGCCTTCGATACCTACCAGGAATTTGAAGAAGCGTATATTCCGAAAGAACAGAAATATAGCTTCCTGCAGAATCCGCAGACCTCCCTGTGCTTTTCGGAAAGCATTCCGACCCCGAGCAACCGCGAAGAAACCCAGCAGAAAAGCAATCTGGAACTGCTGCGCATTAGCCTGCTGCTGATTCAGTCGTGGCTGGAACCGGTGCAGTTTCTGCGCAGCGTGTTTGCAAACAGCCTGGTGTATGGCGCGAGCGACAGCAACGTGTATGATCTGCTGAAAGATCTGGAAGAAGGCATTCAGACCCTGATGGGCCGCCTGGAAGATGGCAGCCCGCGTACCGGCCAGATTTTTAAACAGACGTACAGCAAATTTGATACCAACAGCCATAACGATGATGCCCTGCTGAAAAACTACGGCCTGCTGTATTGCTTTCGTAAAGACATGGATAAAGTGGAAACCTTCCTGCGCATTGTGCAGTGTCGCAGCGTGGAAGGCAGCTGTGGCTTT

3.4

  • Cell-Dependent: This would involve using living cells (such as E. coli) as host cells and vectors, and using them as sort of factories to produce the protein sequences. First, once the DNA is obtained and prepared (likely using PCR), it is inserted into the vector where transcription and translation take place. Post translation, the protein is produced.

  • Cell-Independent: This would involve using DNA templates and RNA polymerase to transcribe DNA to mRNA. Ribosomes transcribe mRNA to Protein and voila! This method is preferrable because it takes less time, and does not require cell maintenance (toxic proteins can also be synthesized).

Therefore, I think cell-independent systems might be more efficient but both types should work equally well.

Part 4:

After completing all the steps I built the Plasmid with my Expression Casette. The expression cassete was made on Benchling and can be found here.

Part 5:

5.1 Read

I. What DNA would you want to sequence (e.g., read) and why?

I would like to read DNAs from fossils and prehistoric artifacts. I choose this to help evolutionary research. I am also very invested in paleontology and dream to de-extinct the organisms driven to extinction because of human activities like hunting or poaching, or otherwise.

II. In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

I would use Second Generation Sequencing DNA Sequencing Techniques like Illumina. It would involve small DNA fragments. It has high accuracy. It can do massive parallel sequencing. Additionally, it has already been used for similar purposes in labs (for Ancient DNA).

The input here is the aDNA extracted from the fossils or remnants. It is already highly fragmented and does not require fragmenting further. It might also be damaged and requires certain steps to fix it. The process involves:

  • DNA Extraction: After powdering the sample, aDNA is carefully extracted without contamination (usually in highly equipped facilities).
  • Repair: The DNA fragments may have overhangs and damages. The ends are repaired in this process. Some damages are partially removed, as well.
  • Adapter Ligation: The DNA adapters are ligated to both ends of the fragment. These adapters have neccessary sequences that make the DNA compatible with sequencing platforms, allowing for Amplification.
  • PCR: Ancient DNA is scarce and limited. PCR amplifies and makes millions of copies of each fragment.
  • Flow Cell Binding: The DNA Library is sent to the sequencer (Illumina). The fragments are immobilized onto the surface of a sequencing flow cell.
  • Sequencing By Synthesis: Using base calling (signal from sequencer is converted to corresponding nucleotide, where Illumina uses light intensities), an output is created as a FASTQ file (FASTA file but also contains quality scores for each base).

5.2 Write

I. What DNA would you want to synthesize (e.g., write) and why?

I would like to synthesize materials like polymers through vectors. The polymers would be biodegradeable and utilised for 3-D Printing, Injection Molding and similar Manufacturing techniques. These could replace the petroleum-based polymers in various industries.

II. What technology or technologies would you use to perform this DNA synthesis and why?
  1. I would use Solid Phase Chemical DNA Synthesis, which is a method for creating custom DNA/RNA sequences by anchoring the growing chain to an insoluble support and adding the nucleotide bases. This technique has high yield, utilizes codon optimization and produces precise sequences. Following the oligo synthesis, Gibson Assembly could be used to combine all the DNA fragments in one reaction. It is also highly accurate and efficient.
  2. Then the genes would need to be inserted in the plasmid vectors (most likely E. coli). Plasmids are commonly used because theyproduce many copies and have a fast growth. This is likely the stage, where TWIST comes in.
  3. Some Metabollic Engineering needs to be done to make the vector more efficient, solve conflicting metabollic pathways and make it into a proper “Cell Factory.”
  4. Accumulation of the products, harvesting and purification of the same will be needed for scaling up the production of the polymers.

5.3 Edit

I. What DNA would you want to edit and why?

I would like to edit Human genes to resolve genetic disorders. When I had first come to know about CRISPR, I had wondered if editing genes could cure genetic diseases or even cancer. After CRISPR had been used to provide therapy for KJ Muldoon (first person to receive in vivo peronalized CRISPR therapy), I have only become more intrigued in the field of using Gene Editing to help treat these kinds of diseases.

II. What technology or technologies would you use to perform these DNA edits and why?

CRISPR-cas9 would be the major technology that would be used to perform these DNA edits. It is a precise, efficient, and customizable gene-editing technology adapted from a natural bacterial immune system. It already has great significance in the clinical industry. For example, Casgevy is the first FDA-approved CRISPR-cas9 gene editing therapy used to treat sickle-cell anaemia and beta-thalassemia.

CRISPR uses cas-9 enzyme (the enzyme that breaks down the molecules) and gRNA (Guide RNA that directs the cas-9 enzyme). The Design Steps:

  • Target Identification: Used to identify the mutation and figure out the exact target site.
  • gRNA Design: The gRNA is designed based on the information from the previous step.
  • Donor Template: This is used to correct mutations. It contains the correct sequence.

The input includes the sequence to be edited, gRNA, cas9 enzyme and the Donor Template.

Limitations of CRISPR-cas9:

  • The enzyme may cut the wrong sites and there is a risk of mutation.
  • Homology-Directed Repair (precise repairing) becomes inefficient.
  • Delivery Challenges: Might be difficult to deliver in certain tissues and vectors.
  • Immune Response: cas-9 being a bacterial enzyme may trigger immune reactions.
AI Citations
  • Tool name and version: Chat GPT, Version GPT-5.2
  • Time and Date: 14th February, ~7 P.M. IST
  • Prompts:
    1. what kind of dna sequencing tech would i use to sequence dna from fossils in order to revive the organims. what generation would the dna sequencing tech be
    2. how do you prepare your input ( fragmentation adapter ligation PCR) list the essential steps how does it decode the bases of your DNA sample (base calling) what is the output of your chosen sequencing technology
    3. What technology or technologies would you use to perform DNA synthesis where you hypothetically make vectors produce biodegradeable plastics and why?
    4. What technology or technologies would you use to perform DNA edits to the human gene to fix genetic diseases and why?
  • Responses:
    1. If you’re talking about sequencing DNA from fossils for something like “reviving” an organism (de-extinction), the type of sequencing technology matters a lot because ancient DNA (aDNA) is extremely degraded, fragmented, and chemically damaged. Here’s how it breaks down: What makes fossil DNA difficult? DNA from fossils (like mammoths or Neanderthals) is: Highly fragmented (often 30–100 base pairs long) Chemically modified (cytosine → uracil deamination) Contaminated with microbial and modern human DNA Present in very low amounts Because of this, you need sequencing technology that works well with short, damaged fragments. Best generation of sequencing technology ✅ Second-Generation Sequencing (Next-Generation Sequencing, NGS) This is the most widely used for ancient DNA. Example platform: Illumina Why Illumina-type sequencing? Works extremely well with short DNA fragments High accuracy Massive parallel sequencing (millions of fragments at once) Standard method used in ancient DNA labs Most famous ancient genome reconstructions (like Neanderthal and mammoth genomes) used this approach. So the answer is: Second-generation (NGS) sequencing is the primary technology used for fossil DNA. What about other generations? 🧬 First Generation (Sanger sequencing) Example: Applied Biosystems systems Too slow Requires longer intact DNA Not practical for ancient DNA ❌ Not suitable for fossil DNA. 🧬 Third Generation (Long-read sequencing) Examples: Oxford Nanopore Technologies Pacific Biosciences These produce very long reads (thousands of bases), but: Ancient DNA is already very short Higher error rates (especially problematic with damaged bases) They can be used supplementally, but they are not the primary tool for degraded fossils. Full Workflow for Fossil DNA If the goal is de-extinction, the process would involve: Extract DNA in a clean-room ancient DNA lab Build sequencing libraries from tiny fragments Sequence using Illumina (2nd generation) Bioinformatically reconstruct genome Compare to closest living relative Fill gaps using reference genome Edit living relative genome using CRISPR Clone via somatic cell nuclear transfer Example of proposed de-extinction target: Woolly Mammoth Closest living relative: Asian Elephant Important Reality Check “Reviving” a fossil organism is not: Extract DNA → grow dinosaur Instead, it’s: Reconstruct genome computationally → Edit living species to resemble extinct one. Even then: DNA rarely survives millions of years Dinosaur DNA is effectively impossible Only relatively recent extinct species (tens of thousands of years old) are feasible Final Answer To sequence DNA from fossils for possible revival: You would use Second-Generation (Next-Generation) sequencing technology, especially Illumina short-read platforms. It is currently the gold standard for ancient DNA research. If you want, I can also explain why Jurassic Park–style dinosaur revival is scientifically impossible from a DNA chemistry perspective.
    2. I’ll answer this assuming you’re using second-generation (NGS) short-read sequencing, specifically an Illumina platform — the standard for ancient/fossil DNA. What is the input? Input material: Extracted ancient DNA (aDNA) from fossil bone, tooth, or preserved tissue. DNA is already: Highly fragmented (30–100 bp typical) Chemically damaged Very low concentration So unlike modern DNA sequencing, you usually do NOT fragment it further — it’s already broken. How do you prepare the input? (Library Preparation) These are the essential steps: Step 1: DNA Extraction Powder fossil sample Use EDTA + proteinase digestion Silica-column or magnetic-bead purification Work done in clean-room aDNA facility Goal: isolate short DNA fragments while minimizing contamination. Step 2: End Repair Ancient DNA fragments: Have overhangs May contain damaged bases (like uracil from cytosine deamination) Enzymes are used to: Repair fragment ends Sometimes partially remove damage (optional UDG treatment) Step 3: Adapter Ligation Short synthetic DNA adapters are ligated to both ends. Adapters: Contain sequencing primer binding sites Include index/barcode sequences (to identify sample) Allow fragments to bind to flow cell Now your DNA becomes a sequencing library. Step 4: PCR Amplification Because ancient DNA is scarce PCR enriches fragments that have adapters Produces millions of copies of each fragment. After this step: You have a ready-to-sequence DNA library. Essential Steps of Illumina Sequencing Now the library goes onto the sequencer. Step 5: Flow Cell Binding Library fragments bind to complementary oligos on the flow cell surface. Step 6: Bridge Amplification (Cluster Generation) Each fragment bends and forms a “bridge.” Through repeated amplification: Forms a dense cluster of identical DNA molecule Each cluster = one original fragment This massively increases signal strength. Step 7: Sequencing by Synthesis (SBS) This is how base calling works. Illumina uses: Fluorescently labeled reversible terminator nucleotides (A, T, C, G) Each base has a different color Cycle process: Add all four labeled nucleotides Only ONE base incorporates (due to terminator block) Laser excites the flow cellCamera detects fluorescence color Software assigns base (A/T/C/G) Chemical cleavage removes terminator Repeat cycle Each cycle reads one base. After ~100–150 cycles → full read length obtained. How Base Calling Works Each cluster emits a fluorescent signal Intensity and wavelength correspond to base identity Imaging software converts signal to: Base (A/T/C/G) Quality score (Phred score) Base calling is essentially: Fluorescent color → digital base → quality score What is the Output? The output is:📁 FASTQ files Each read contains: Read ID DNA sequence (A/T/C/G) Quality scores per base For ancient DNA: Millions to billions of short reads (30–150 bp) Later assembled computationally Compared to reference genome (e.g., Asian elephant if reconstructing mammoth) Summary Table Stage What Happens Input Extracted fragmented ancient DNA Library Prep End repair → Adapter ligation → PCR Sequencing Bridge amplification → Sequencing by synthesis Base Calling Fluorescent detection per cycle Output FASTQ files with short reads + quality scores In One Sentence Using an Illumina platform: Your input is fragmented fossil DNA, prepared by adapter ligation and PCR into a sequencing library, decoded via fluorescent sequencing-by-synthesis, and output as millions of short reads in FASTQ format. If you’d like, I can also explain how those reads are assembled into a full extinct genome — which is actually the hardest part.
    3. If your goal is to engineer cells to produce biodegradable plastics (like PHAs), you’d combine DNA synthesis + molecular cloning + metabolic engineering. Here’s a clear, structured answer. 🧬 Goal Design and synthesize genes encoding plastic-producing enzymes, insert them into a vector, and express them in a microbial host. A common biodegradable plastic: Polyhydroxyalkanoates (PHAs) Naturally produced by bacteria under nutrient stress. 1️⃣ DNA Synthesis Technology ✅ Solid-Phase Chemical DNA Synthesis Used to synthesize: Custom genes (codon-optimized) Promoters Regulatory sequences Modern providers use phosphoramidite chemistry to synthesize oligonucleotides, then assemble them into full genes. Why? Precise sequence design Codon optimization for host (e.g., E. coli) Ability to modify promoters and RBS strength Gene Assembly Methods After oligo synthesis: 🔹 Gibson Assembly Efficient, seamless DNA assembly method. Invented by: Daniel Gibson Why use it? Allows assembly of multiple DNA fragments in one reaction No restriction enzyme scars Ideal for building synthetic metabolic pathways 2️⃣ Vector Construction Plasmid Vectors Circular DNA that replicates in bacteria. Inserted genes: phaA phaB phaC (enzymes required for PHA synthesis) Host organism:Escherichia coli (commonly engineered lab strain) Why plasmids? High copy number Easy transformation Fast growth 3️⃣ Genome Editing (Optional but Advanced) Instead of plasmids, you could integrate genes into the chromosome using: CRISPR-Cas9 Gene editing system. Why? Stable expression No plasmid maintenance burden Industrial-scale stability 4️⃣ Metabolic Engineering Tools To improve yield: Promoter engineering Ribosome binding site tuning Knockout competing metabolic pathways Flux balance modeling This turns the host into a “cell factory.” 5️⃣ Fermentation Technology After genetic engineering: Large-scale bioreactors Controlled nutrient limitation Induces PHA accumulation inside cells Cells accumulate plastic granules → harvested → purified. Why These Technologies? Technology Why Use? Solid-phase DNA synthesis Precisely design custom genes Codon optimization software Maximize expression Gibson Assembly Seamless multi-gene cloning Plasmid vectors Fast, flexible expression CRISPR Stable genomic integration Fermentation systems Scale production Example of Industrial Relevance Companies producing biodegradable plastics via engineered microbes include: Danimer Scientific NatureWorks Concise Exam-Style Answer I would use solid-phase chemical DNA synthesis to design and synthesize codon-optimized genes for PHA biosynthesis. These genes would be assembled into plasmid vectors using Gibson Assembly and transformed into E. coli. Optionally, CRISPR could be used for stable genome integration. The engineered microbes would then be grown in bioreactors to produce biodegradable plastics at scale. If you want, I can also explain which specific enzymes are required in the PHA pathway and how carbon flux is redirected toward plastic production.
    4. To fix genetic diseases at the DNA level, the leading technology today is CRISPR-based genome editing, particularly: CRISPR–Cas9 Base editing Prime editing CRISPR is already being used clinically (e.g., sickle cell disease therapy). 🧬 Technology of Choice: CRISPR–Cas9 CRISPR systems were adapted from bacterial immunity and developed for genome editing by researchers including: Jennifer Doudna Emmanuelle Charpentier A real clinical example: Casgevy – approved CRISPR therapy for sickle cell disease. 1️⃣ How Does CRISPR–Cas9 Edit DNA? Core Mechanism CRISPR–Cas9 uses: Cas9 enzyme → molecular scissors Guide RNA (gRNA) → directs Cas9 to specific DNA sequence Essential Steps of Editing Step 1: Target Recognition gRNA binds to complementary DNA sequence. Cas9 binds nearby PAM sequence (e.g., NGG). Step 2: DNA Cleavage Cas9 creates a double-strand break (DSB) at the target site. Step 3: DNA Repair (Cell’s Own Machinery) Two possible pathways: A) Non-Homologous End Joining (NHEJ) Error-prone Creates insertions/deletions (indels) Often used to knock out genes B) Homology-Directed Repair (HDR) Requires donor DNA template Allows precise correction Used to fix mutations 2️⃣ Preparation & Design Steps Before editing, careful design is required. 🔬 A) Target Identification Sequence patient DNA Identify mutation Choose exact target site 🧠 B) Guide RNA Design Design gRNA complementary to mutation region Ensure minimal off-target similarity Confirm PAM availability Software tools predict: Off-target risk Cutting efficiency 🧪 C) Donor Template (for precise correction) If correcting mutation: Provide synthetic DNA template Contains correct sequence Includes homology arms (~50–1000 bp) 3️⃣ What Is the Input? Depends on delivery method. Typical components: Component Purpose Cas9 protein or Cas9 mRNA DNA cutting Guide RNA Target specificity Donor DNA template (optional) Precise correction Delivery system Introduce into cells Delivery Methods Viral vectors (AAV, lentivirus) Lipid nanoparticles Electroporation (ex vivo editing) Example: For sickle cell: Stem cells are removed Edited ex vivo Reinfused into patient 4️⃣ Variants of CRISPR Editing 🧬 Base Editing No double-strand break. Converts single base (e.g., C → T) Higher precision Fewer indelsGood for: Point mutations 🧬 Prime Editing More flexible: Can insert, delete, or replace small DNA segments Uses reverse transcriptase More precise but technically complex 5️⃣ Limitations of CRISPR Editing ⚠️ 1. Off-Target Effects Cas9 may cut unintended sites Risk of mutations elsewhere Improved variants reduce this risk. ⚠️ 2. HDR Efficiency Is Low Precise repair (HDR) is inefficient Especially in non-dividing cells NHEJ often dominates. ⚠️ 3. Mosaicism Not all cells get edited Leads to mixed populations ⚠️ 4. Delivery Challenges Difficult to deliver into certain tissues Viral vectors have size limits ⚠️ 5. Immune Response Cas9 is bacterial protein Can trigger immune reactions 6️⃣ Efficiency & Precision Summary Method Precision Efficiency Best For CRISPR-Cas9 (NHEJ) Moderate High Gene knockouts CRISPR-Cas9 (HDR) High Low–Moderate Gene correction Base Editing Very high (point edits) Moderate–High Single base mutations Prime Editing Very high Lower Complex edits Concise Exam-Style Answer I would use CRISPR–Cas9 genome editing to correct disease-causing mutations. A guide RNA directs the Cas9 nuclease to a specific DNA sequence, where it induces a double-strand break. The cell repairs the break either by NHEJ (causing gene disruption) or HDR if a donor template is supplied for precise correction. Inputs include Cas9 protein or mRNA, guide RNA, and optionally a donor DNA template. Limitations include off-target effects, low HDR efficiency, delivery challenges, and possible immune responses. If you’d like, I can also explain when base editing or prime editing would be better than standard CRISPR for specific diseases.