Week 2 Homework: DNA Read, Write & Edit
3.1. Choose your protein
Question: In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
The protein I have chosen is Rubisco, the protein responsible for fixing carbon dioxide during photosynthesis. I have chosen this protein because one of my final project ideas was to create a system that is more efficient at fixing carbon.
Protein Sequence:
MASSILSSAVVASVNSASPAQASMVAPFTGLKSSAGFPITRKNNVDITTLASNGGKVQCMKVWPPLGLRKFETLSYLPDMSNEQLSKECDYLLRNGWVPCVEFDIGSGFVYRENHRSPGYYDGRYWTMWKLPMFGCTDSSQVIQEIEEAKKEYPDAFIRVIGFDNVRQVQCISFIAYKPPRFYSS
3.2. Reverse Translate
Question: The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools, determine the nucleotide sequence that corresponds to the protein sequence you chose above.
Nucleotide Sequence:
atggcgagcagcattctgagcagcgcggtggtggcgagcgtgaacagcgcgagcccggcg
caggcgagcatggtggcgccgtttaccggcctgaaaagcagcgcgggctttccgattacc
cgcaaaaacaacgtggatattaccaccctggcgagcaacggcggcaaagtgcagtgcatg
aaagtgtggccgccgctgggcctgcgcaaatttgaaaccctgagctatctgccggatatg
agcaacgaacagctgagcaaagaatgcgattatctgctgcgcaacggctgggtgccgtgc
gtggaatttgatattggcagcggctttgtgtatcgcgaaaaccatcgcagcccgggctat
tatgatggccgctattggaccatgtggaaactgccgatgtttggctgcaccgatagcagc
caggtgattcaggaaattgaagaagcgaaaaaagaatatccggatgcgtttattcgcgtg
attggctttgataacgtgcgccaggtgcagtgcattagctttattgcgtataaaccgccg
cgcttttatagcagc
3.3. Codon Optimization
Question: Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?
Different organisms recognize DNA strands in slightly different ways, so codon optimization is necessary so that motifs that are native to an organism’s machinery are recognized and the gene function is conserved.
3.4. You have a sequence! Now what?
Question: What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.
I could use a standard cloning process to produce this protein from DNA. This involves cutting a circular piece of DNA with restriction enzymes, inserting my gene of interest into the plasmid, and using a ligation reaction to create the genetic insertion I wish to express. Then I would insert the plasmid into an organism (E. coli) to produce my protein of interest.
[Image of molecular cloning workflow]
5.1 DNA Read
(i) What DNA would you want to sequence and why?
I would want to sequence and read the DNA from highly efficient plant species. Again I’m potentially pursuing a final project in which I create a biosensor for carbon dioxide production or an efficient fixation system and understanding the machinery responsible for these reactions is crucial.
(ii) What technology or technologies would you use to perform sequencing on your DNA and why?
To identify the specific genes responsible for efficient carbon fixation, I would utilize PacBio SMRT (Single Molecule Real-Time) Sequencing. Plant genomes are notoriously difficult to read because they are often polyploid and filled with repetitive DNA sections. Older technologies (like Illumina) chop DNA into tiny pieces, making it impossible to puzzle a complex plant genome back together. PacBio uses long reads to span these repetitive gaps, allowing me to fully reconstruct the gene clusters responsible for the plant’s efficiency.
Technology Choice: I would use PacBio SMRT Sequencing
- Why: This technology produces “long reads” (reading long stretches of DNA at once) with high accuracy. This is crucial for de novo assembly of plant genomes, which contain highly repetitive sequences that short-read technologies cannot resolve.
Generation: This is a Third-Generation sequencing technology.
- How so? Unlike second-generation methods (which require chopping DNA, amplifying it into clusters, and pausing to take photos), third-generation sequencing reads single molecules of DNA in real-time without the need for PCR amplification during the reading process.
Input & Preparation:
- Input: High Molecular Weight (HMW) genomic DNA (gDNA).
- Preparation:
- Extraction: Isolate DNA carefully to keep strands long (15,000+ base pairs).
- Fragmentation: Shear the DNA into uniform large sizes (e.g., 15kb).
- SMRTbell Construction: This is the essential step. Hairpin-shaped adapters are ligated (glued) to both ends of the double-stranded DNA. This turns the linear DNA into a circle.
- Primer Binding: A sequencing primer and DNA polymerase are bound to the adapters.
Essential Steps & Base Calling:
- Loading: The SMRTbell templates are loaded onto a chip containing millions of microscopic wells called ZMWs (Zero-Mode Waveguides).
- Sequencing: A single DNA polymerase enzyme is anchored at the bottom of the ZMW. As the DNA circle moves through the enzyme, the polymerase incorporates fluorescently labeled nucleotides (A, T, C, G).
- Base Calling: Each nucleotide emits a distinct color of light (pulse) as it is added. A camera records this movie of light flashes in real-time to determine the sequence.
- HiFi Mode: Because the DNA is a circle, the polymerase reads the same sequence over and over again. The software creates a “consensus” from these passes, eliminating random errors.
Output: The output is HiFi Reads (High Fidelity Long Reads). These are digital files (typically FASTQ or BAM) containing sequences that are exceedingly long (10,000 to 25,000 base pairs) with >99.9% accuracy.
5.2 DNA Write
(i) What DNA would you want to synthesize and why?
To create a sensor capable of detecting carbon fixation rates, I would synthesize a genetic circuit. This circuit would utilize a specific promoter (such as a cyanobacterial carbon-responsive promoter) upstream of a reporter gene (like GFP). By synthesizing variations of this promoter, I can tune the sensitivity of the sensor. To achieve this, I would use Silicon-based High-Throughput DNA Synthesis (the technology used by Twist!). This platform miniaturizes the chemical process onto a silicon chip, allowing me to “print” thousands of different sensor designs simultaneously to find the most efficient one.
(ii) What technology would you use?
Technology Choice: I would use Silicon-based Phosphoramidite Synthesis (Twist Bioscience Platform).
- Why: Unlike traditional synthesis (which uses plastic 96-well plates), this technology uses a silicon chip with millions of microscopic wells. This allows for massive scale; I can synthesize thousands of variations of my carbon sensor circuit to test different sensitivity levels in parallel.
Essential Steps of Synthesis:
- Oligonucleotide Printing: The process begins on a silicon chip. Using inkjet-like technology, specific nucleotides (A, C, T, G) are added layer-by-layer using phosphoramidite chemistry. This builds short, single-stranded pieces of DNA called “oligos” (usually <200 base pairs).
- Cleavage & Retrieval:** Once the oligos are built, they are chemically cut (cleaved) off the silicon chip and pooled together in a liquid mixture.
- Gene Assembly: Since oligos are short and genes are long, the oligos are designed to overlap. Using a reaction similar to PCR (called Polymerase Cycling Assembly), the overlapping oligos act as primers for each other, stitching together to form the full-length double-stranded DNA fragment (e.g., my 1,000bp sensor).
- Cloning & QC: The assembled DNA is inserted into an expression vector (plasmid) and usually sequenced (using NGS) to ensure there are no errors before being shipped.
Limitations (Speed, Accuracy, Scalability):
- Length (Scalability): We cannot synthesize a whole genome in one continuous strand. The chemical yield drops as the chain gets longer. We are generally limited to synthesizing fragments of 1.8kb to 3kb. Larger constructs (like whole genomes) must be stitched together from these smaller chunks.
- Accuracy: Chemical synthesis has an error rate (roughly 1 error every few hundred bases). While highly accurate, deletions or insertions can occur. This requires “error correction” steps or sequencing to find the perfect clones.
- Speed: While the “printing” is fast, the downstream assembly and shipping usually take weeks (10–20 business days), which is slower than simply ordering a primer.
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I would propose editing the human genome to modulate the expression of telemorase reverse transcriptase (TERT). Telomere shortening acts as a “molecular clock” that limits the number of times a cell can divide, eventually leading to cellular aging (senescence) and tissue failure. By utilizing gene editing to enhance telomerase activity in specific adult stem cell populations, we could theoretically delay the onset of age-related degenerative diseases, thereby extending human “healthspan” and longevity.
(ii) What technology would you use to perform these DNA edits and why?
I would use the CRISPR-Cas9 System (specifically utilizing Homology Directed Repair (HDR)). CRISPR is currently the most versatile, programmable, and accessible tool for mammalian genome editing. Unlike older zinc finger nucleases, CRISPR relies on a simple RNA guide, making it easy to retarget to the specific telomerase promoter regions I want to modify.
Essential Steps of Editing: Guide RNA (gRNA) Design:I would synthesize a specific string of RNA (approx. 20 bases) that matches the DNA sequence near the TERT gene. Cas9 Complex Formation: The gRNA binds to the Cas9 protein (the “molecular scissors”), creating a ribonucleoprotein complex. Targeting & Scanning: The complex enters the cell nucleus and scans the genome. It looks for a specific molecular anchor called the PAM sequence. Once found, it unzips the DNA to check if the gRNA matches the target. Cleavage (The Cut): If the sequence matches, Cas9 cuts both strands of the DNA, creating a Double Strand Break (DSB). Repair (The Edit): This is the critical step. To change the gene (rather than just breaking it), I would provide a Donor DNA Template. The cell uses this template to repair the cut via Homology Directed Repair (HDR), effectively pasting my desired sequence (e.g., a stronger promoter) into the genome.
Limitations: Off-Target Effects (Accuracy): The biggest risk is that the gRNA might accidentally match a similar sequence elsewhere in the genome, causing Cas9 to cut a gene I didn’t intend to touch which could have no consequence to potentially causing cancer. HDR Efficiency: Human cells prefer to fix cuts by simply gluing them back together, which is messy and error-prone. Getting the cell to actually use my “Donor Template” is difficult and often happens in only a small percentage of cells, so theres a low efficiency rate. Delivery: Getting the bulky Cas9 protein and the RNA into the nucleus of a living human (in vivo) is physically difficult. Other viral vectors are often used but have size limits and immune response risks.
Gemini AI was consulted for formatting