Week 2 — DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

Restriction enzyme digest simulation using Benchling

1.1 Enzymes used for the simulation

EcoRI
HindIII
BamHI
KpnI
EcoRV
SacI
SalI
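
As a rough illustration of what the virtual digest does with these seven enzymes, here is a minimal Python sketch. The recognition sites and cut offsets are the standard ones for each enzyme; the `toy` sequence and the function names are made up for this example, and the sketch treats the DNA as linear (Benchling's own implementation is not shown here).

```python
# Recognition site and top-strand cut offset for each enzyme in the list above.
ENZYMES = {
    "EcoRI":   ("GAATTC", 1),  # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI":   ("GGATCC", 1),  # G^GATCC
    "KpnI":    ("GGTACC", 5),  # GGTAC^C
    "EcoRV":   ("GATATC", 3),  # GAT^ATC (blunt cutter)
    "SacI":    ("GAGCTC", 5),  # GAGCT^C
    "SalI":    ("GTCGAC", 1),  # G^TCGAC
}


def cut_positions(seq, enzyme):
    """Return the top-strand cut coordinates of one enzyme in seq."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq, enzymes):
    """Return fragment lengths (in order) after cutting a linear molecule."""
    cuts = sorted({p for e in enzymes for p in cut_positions(seq, e)})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 40-bp toy sequence with one EcoRI site and one BamHI site.
toy = "ATGC" + "GAATTC" + "T" * 10 + "GGATCC" + "A" * 14
print(digest_linear(toy, ["EcoRI", "BamHI"]))  # [5, 16, 19]
```

All seven sites are palindromic, so scanning one strand is enough. The fragment lengths are what determine the band pattern on a simulated gel.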

Part 3: DNA Design Challenge

3.1 Choose Your Protein

Protein: Envelope small membrane protein (Gene: E)
Organism: Severe acute respiratory syndrome coronavirus 2 (2019-nCoV) (SARS-CoV-2)
Metadata:
- Length: 75 amino acids
- Mass: 8,365 Da

Amino Acid Sequence:

MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV

Reason for Selection:

I am interested in the SARS-CoV-2 Envelope (E) protein from a biosurveillance standpoint. This protein is essential for viral assembly and release; consequently, it cannot undergo significant mutations without halting these vital functions. Given this, primers for the E gene should tend to be more stable over time, unlike those for the Spike (S) protein, which constantly mutates to evade vaccines and immune responses. Therefore, the E protein can be considered a more reliable and “secure” target for viral detection and monitoring.

3.2 Reverse Translation

Conversion of Protein (amino acid) sequence to DNA (nucleotide) sequence.

Description: Reverse translation of sample sequence to a 225-base sequence of most likely codons (75 amino acids × 3 bases per codon).

DNA Sequence:

atgtatagctttgtgagcgaagaaaccggcaccctgattgtgaacagcgtgctgctgtttctggcgtttgtggtgtttctgctggtgaccctggcgattctgaccgcgctgcgcctgtgcgcgtattgctgcaacattgtgaacgtgagcctggtgaaaccgagcttttatgtgtatagccgcgtgaaaaacctgaacagcagccgcgtgccggatctgctggtg
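
The reverse translation step can be sketched in a few lines of Python. The codon choices below are simply read off the DNA sequence above (one codon per amino acid), so they are an assumption about the tool's "most likely codon" table rather than a copy of it. Amino acids that do not occur in the E protein (H, Q, W) are left out on purpose.

```python
# One codon per amino acid, inferred from the reverse-translated sequence above.
# This is an illustrative table, not the tool's official one.
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "I": "att", "K": "aaa", "L": "ctg", "M": "atg",
    "N": "aac", "P": "ccg", "R": "cgc", "S": "agc", "T": "acc",
    "V": "gtg", "Y": "tat",
}


def reverse_translate(protein):
    """Replace each amino acid with its single most likely codon."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


# First ten residues of the E protein.
print(reverse_translate("MYSFVSEETG"))  # atgtatagctttgtgagcgaagaaaccggc
```

Because every amino acid always maps to the same codon, this approach ignores codon context entirely, which is one reason a separate optimization step is useful.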

3.3 Codon Optimization

Optimized DNA Sequence:

ATGTATAGCTTTGTGAGCGAAGAAACGGGCACCCTGATTGTGAACAGCGTGCTGCTGTTCCTGGCATTTGTGGTGTTTCTGCTGGTGACCCTGGCGATTCTGACCGCGCTGCGCCTGTGCGCCTATTGTTGCAACATTGTGAACGTGAGCCTGGTGAAACCGAGCTTTTACGTTTATAGCCGCGTGAAAAATCTGAACAGCAGCCGCGTGCCGGATCTGCTGGTG

Codon optimization is basically a synthetic biology hack to get the most out of gene expression in a specific organism. Since the genetic code is degenerate, different species have their own “favorite” codons for the same amino acid. This technique rewrites the coding sequence with synonymous codons that match the tRNA availability of the host, without changing the encoded protein, which clears out bottlenecks during translation. By swapping “rare” codons for “frequent” ones, we keep the ribosome from stalling and get steady, efficient protein production.
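
A quick way to check that an optimized sequence still encodes the same protein is to translate both versions and list the codons that changed. The sketch below uses the standard genetic code and, to keep it short, only the first ten codons of the reverse-translated and optimized sequences shown above. The helper names are invented for this example.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(dna):
    """Split a DNA string into complete codons (upper case)."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna):
    return "".join(CODON_TABLE[c] for c in codons(dna))


def synonymous_changes(before, after):
    """List (codon_index, old, new) for every codon that differs."""
    if translate(before) != translate(after):
        raise ValueError("sequences encode different proteins")
    pairs = zip(codons(before), codons(after))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


original = "atgtatagctttgtgagcgaagaaaccggc"   # first 10 codons, reverse translation
optimized = "ATGTATAGCTTTGTGAGCGAAGAAACGGGC"  # first 10 codons, optimized
print(translate(optimized))                     # MYSFVSEETG
print(synonymous_changes(original, optimized))  # [(8, 'ACC', 'ACG')]
```

Both ACC and ACG code for threonine, so the protein is unchanged even though the DNA differs.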

3.4 You have a sequence! Now what?

Question: What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

To produce Protein E, we can use different technologies: for example, bacterial expression systems, mammalian cell cultures, or insect cell systems (infected with baculoviruses). Cell-free protein synthesis technologies also exist.

For Protein E, I have decided to use a bacterial expression system, as it is the fastest and lowest in cost. Although bacteria do not add most eukaryotic post-translational modifications (PTMs), this is not an issue for the objectives of this production.

Workflow for Protein E Production:

I will use E. coli for production, which requires two major components: 1) a plasmid and 2) the host cell.

The Plasmid: This is the vehicle that carries our gene of interest into an expression system. A recommended model is the pET plasmid system, which places the gene under a T7 promoter whose expression is induced by adding IPTG. I will insert my sequence into the plasmid either by restriction enzyme cloning or by Gibson Assembly. To allow purification, an affinity tag such as a His-tag is fused to the protein.

The Cell (E. coli): For my gene to be converted into Protein E within E. coli, it first needs to be transcribed into mRNA. T7 RNA polymerase, supplied by the host strain (for example, BL21(DE3)), reads the template strand and generates a complementary mRNA. Next is translation, which occurs on the ribosomes: the ribosome binds the mRNA and reads its codons, tRNAs (transfer RNAs) bring the corresponding amino acids, and the ribosome links them together to build the amino acid chain of Protein E. Codon optimization helps this step run efficiently because the codons match the tRNAs that are abundant in the host.

Culture and Purification: The bacteria are grown in nutrient-rich media such as LB, using shake flasks or bioreactors. Once they have produced the protein, it must be extracted: we sonicate to disrupt the cells and use detergents to extract the protein from the membrane and keep it stable. Finally, we purify it by affinity chromatography using the His-tag.
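
The transcription and translation steps described above can be sketched as follows. This is a toy model under stated assumptions: the open reading frame uses only the first six codons of the optimized gene, the 6xHis tag is placed at the C-terminus, and a TAA stop codon is appended (the optimized sequence itself carries no stop codon). In a real pET construct the tag and stop codon may instead come from the vector.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_dna):
    """mRNA matches the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first base until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


gene_start = "ATGTATAGCTTTGTGAGC"  # first 6 codons of the optimized gene
his_tag = "CAT" * 6                # assumed C-terminal 6xHis tag
stop = "TAA"                       # assumed stop codon
orf = gene_start + his_tag + stop

mrna = transcribe(orf)
print(mrna)             # AUGUAUAGCUUUGUGAGCCAUCAUCAUCAUCAUCAUUAA
print(translate(mrna))  # MYSFVSHHHHHH
```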

Part 4: Prepare a Twist DNA Synthesis Order

This part (from 4.1 to 4.6) was completed successfully, and the plasmid is shown below.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

I want to sequence the E gene in hospitalized patients for more reliable biosurveillance. Unlike the Spike (S) protein, which mutates rapidly to evade immunity, the E protein is structurally essential and highly conserved. Its lower mutation rate makes it a more stable anchor for detection, ensuring consistent monitoring and fewer false negatives over time.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

To monitor the E protein gene, I’ve chosen Illumina technology due to its high accuracy and ability to process multiple patient samples simultaneously. It would allow us not only to detect its presence but also to identify specific mutations, track variants, and confirm that our “anchor” (the E gene) remains stable.

Generation: This is a second-generation technology (next-generation sequencing, NGS), as it relies on massively parallel amplification of fragments on a solid surface and detection via fluorescence.

Input and Preparation:

The input is viral genomic RNA extracted from patient samples. Since SARS-CoV-2 is an RNA virus and Illumina instruments sequence DNA, we must first perform reverse transcription to convert the RNA into complementary DNA (cDNA). Regarding the samples, we use nasopharyngeal swabs (where the viral load is typically high) or blood samples from hospitalized patients to monitor systemic presence.
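
In sequence terms, reverse transcription produces a first cDNA strand that is the reverse complement of the RNA, and second-strand synthesis then restores the original sequence with T in place of U. A minimal sketch, using a made-up six-nucleotide RNA fragment:

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna):
    """Reverse complement of the RNA template, written 5'->3' as DNA."""
    return "".join(RNA_TO_DNA[b] for b in reversed(rna.upper()))


def second_strand(cdna):
    """Reverse complement of the first cDNA strand."""
    return "".join(DNA_TO_DNA[b] for b in reversed(cdna.upper()))


rna = "AUGUAC"  # hypothetical toy fragment
cdna = first_strand_cdna(rna)
print(cdna)                 # GTACAT
print(second_strand(cdna))  # ATGTAC
```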

Essential Preparation Steps:

Fragmentation: Cutting the DNA into small pieces.

Adapter Ligation: Attaching adapter sequences to the fragment ends so they can bind to the flow cell.

Indexing: Adding “barcodes” to identify each individual patient.

Enrichment PCR: Amplifying the library to ensure a detectable signal.

Essential Steps and Base Calling:

Bridge PCR (Amplification): Creates “clusters” of identical copies on the flow cell.

Sequencing by Synthesis: Fluorescently labeled reversible-terminator nucleotides are incorporated one at a time.

Base Calling: A camera captures the color of each cluster after every cycle; the software translates that color (e.g., Blue = C) into a base and assigns a quality score.

Output: The final results are FASTQ files, which contain the sequences (reads) and their respective quality values (Q-scores) for subsequent bioinformatic analysis.
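
To make the FASTQ output concrete, here is a small sketch that reads four-line FASTQ records and decodes the quality string into Phred scores. It assumes the common Phred+33 encoding. The record in `example` is invented for illustration, and the parser is deliberately minimal (no handling of wrapped lines or malformed files).

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        scores = [ord(ch) - 33 for ch in qual]  # Phred+33 encoding (assumed)
        yield header[1:], seq, scores


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


example = io.StringIO("@read1\nATGTAT\n+\nIIII5#\n")  # invented record
records = list(read_fastq(example))
for read_id, seq, scores in records:
    print(read_id, seq, scores)  # read1 ATGTAT [40, 40, 40, 40, 20, 2]
```

A score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000, which is why low-quality read ends are usually trimmed before variant calling.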

5.2 DNA Write

(i) What DNA would you want to synthesize and why?

I want to synthesize the codon-optimized gene for the SARS-CoV-2 Envelope (E) protein. While reading or sequencing allows us to monitor the virus, writing the DNA enables us to develop tools for detection.

Why synthesize the E gene? Synthesizing this specific sequence is critical for biosurveillance for three main reasons:

Safe Diagnostics: It allows for the creation of synthetic positive controls to validate PCR kits and biosensors without the need to handle the live, infectious virus.

Immune Response Monitoring: The synthesized protein serves as a stable antigen for serological assays, helping identify whether patients are developing antibodies against the most conserved regions of the virus.

Analysis of Structural Stability: Having the pure protein enables structural studies (such as cryo-EM) to predict how potential mutations might impact viral integrity.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

To produce the Protein E gene, I would use silicon microchip-based DNA synthesis, mainly because this technology allows for the simultaneous writing of thousands of genes with high precision.

Its essential steps are:

Digital Design: Fragmentation of the E gene sequence into shorter, overlapping oligonucleotides.

Chemical Printing: Deposition of phosphoramidites via inkjet printing onto the silicon chip.

Elongation Cycle: Base-by-base construction through repeated cycles of deprotection, coupling, capping, and oxidation.

Assembly: Joining short fragments using PCR or Gibson Assembly to form the full-length gene.

Verification: Final quality control using NGS to confirm that the assembled sequence matches the design.
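
The design and assembly steps above can be illustrated with a toy script that tiles a gene into overlapping oligos and then stitches them back together by their overlaps. The oligo length (60 nt), the overlap (20 nt), and the random placeholder gene are all assumptions chosen for the example. Real designs also alternate strands and balance melting temperatures, which this sketch ignores.

```python
import random


def split_into_oligos(gene, oligo_len=60, overlap=20):
    """Tile the gene with oligos that share `overlap` bases with their neighbor."""
    if overlap >= oligo_len:
        raise ValueError("overlap must be shorter than the oligo length")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(gene[start:start + oligo_len])
        if start + oligo_len >= len(gene):
            break
        start += step
    return oligos


def reassemble(oligos, overlap=20):
    """Join consecutive oligos, checking that each overlap matches exactly."""
    gene = oligos[0]
    for oligo in oligos[1:]:
        if gene[-overlap:] != oligo[:overlap]:
            raise ValueError("overlap mismatch between consecutive oligos")
        gene += oligo[overlap:]
    return gene


rng = random.Random(0)
gene = "".join(rng.choice("ACGT") for _ in range(75 * 3))  # placeholder, 75 codons
oligos = split_into_oligos(gene)
print([len(o) for o in oligos])    # [60, 60, 60, 60, 60, 25]
print(reassemble(oligos) == gene)  # True
```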

Limitations:

Speed: Delivery times can extend to weeks due to logistics and rigorous quality control steps.

Accuracy: The risk of errors increases with sequence length, because every added base is another chance for a failed coupling step, so longer constructs require costly error-correction processes.

Scalability: Excellent for massive projects, but less efficient in terms of cost and time if only a single, isolated gene is required.
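
The accuracy limitation can be put into numbers with a simple model: if each coupling step succeeds with a fixed probability, the fraction of full-length, failure-free chains falls off geometrically with length. The 99.5% coupling efficiency used below is a hypothetical value for illustration, not a vendor specification.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.995):
    """Fraction of chains that complete every coupling step.

    An oligo of n nucleotides needs n - 1 couplings after the first base.
    """
    return coupling_efficiency ** (length_nt - 1)


for n in (60, 120, 200, 300):
    print(n, round(full_length_fraction(n), 3))
```

Under this assumption, only a little over a third of 200-nt chains would be full length, which is why genes are built from shorter oligos and verified after assembly.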

5.3 DNA Edit

(i) What DNA would you want to edit and why?

I would like to edit the gene encoding the Cas13a protein to optimize its affinity for the guide RNA (gRNA), thereby creating a hypersensitive variant with enhanced detection capabilities.

gRNA Source: The gRNAs used in this system will be derived from a preliminary scan of suspicious sequences, specifically those exhibiting signatures of genetic engineering or a “human accent” (such as the use of artificially optimized codons). This allows the detection to be targeted toward synthetic threats created by malicious actors.

Why: In biosecurity, it is important to detect these sequences at extremely low concentrations. Editing Cas13a could improve its gRNA binding capacity and strengthen the collateral cleavage signal, enabling rapid detection upon target recognition.

Application: This would facilitate the development of a rapid diagnostic kit to identify modified pathogens or synthetic threats in the field, eliminating the need for complex laboratory infrastructure.

(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use Prime Editing to perform these DNA edits. This technology is often referred to as a “genetic word processor” because it allows for high-precision search-and-replace capabilities without requiring double-strand breaks.

How it works and Essential Steps:

Prime Editing uses a specialized protein complex and a unique guide RNA to rewrite the target sequence:

Targeting: The pegRNA (prime editing guide RNA) directs the nCas9-RT (a fusion of nickase Cas9 and Reverse Transcriptase) to the specific site in the Cas13a gene.

Nicking: The nCas9 nicks only one strand of the DNA.

Reverse Transcription: The Prime Editor uses the 3' extension ("tail") of the pegRNA as a template to synthesize new DNA (containing the desired mutation) directly at the nicked target site.

Flap Resolution: The cell’s natural repair machinery removes the old DNA strand and incorporates the new edited sequence, which permanently changes the Cas13a gene.
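
The targeting, nicking, and reverse transcription steps above can be sketched as a toy pegRNA design routine. It assumes an SpCas9-based editor (20-nt spacer, NGG PAM, nick 3 bases upstream of the PAM) and uses illustrative lengths of 13 nt for the primer binding site (PBS) and 12 nt for the RT template. The target sequence is invented and is not taken from a real Cas13a gene, and the function name is hypothetical.

```python
def revcomp(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def design_peg_extension(target, spacer, edit_pos, new_base, pbs_len=13, rtt_len=12):
    """Design the pegRNA 3' extension for a single-base substitution.

    target is the PAM-containing strand (5'->3'); edit_pos is a 0-based index.
    Returns (nick_index, pbs, rtt, extension, edited_target).
    """
    target = target.upper()
    start = target.find(spacer)
    if len(spacer) != 20 or start == -1:
        raise ValueError("need a 20-nt spacer that occurs in the target")
    if target[start + 21:start + 23] != "GG":
        raise ValueError("spacer is not followed by an NGG PAM")
    nick = start + 17  # nick falls 3 bases 5' of the PAM
    if not (nick <= edit_pos < nick + rtt_len) or nick < pbs_len:
        raise ValueError("edit or PBS falls outside the usable window")
    edited = target[:edit_pos] + new_base + target[edit_pos + 1:]
    pbs = revcomp(target[nick - pbs_len:nick])   # anneals to the nicked 3' end
    rtt = revcomp(edited[nick:nick + rtt_len])   # template carrying the edit
    return nick, pbs, rtt, rtt + pbs, edited


# Invented 40-nt target: 5 nt + 20-nt protospacer + TGG PAM + 12 nt.
spacer = "ACGTTGCAAGCTTACGATCA"
target = "GCATC" + spacer + "TGG" + "CTAGCTAGGATC"
nick, pbs, rtt, extension, edited = design_peg_extension(target, spacer, 23, "T")
print(nick, pbs, rtt)
print(edited)
```

The 3' extension is written 5'->3' as RT template followed by PBS, so its reverse complement reads as the unedited bases just upstream of the nick followed by the newly written, edited bases.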

Preparation and Input:

Design Steps: The process involves designing a specific pegRNA that contains both the targeting sequence and the search-and-replace template for the desired Cas13a mutations. A secondary nick-gRNA is also designed to increase editing efficiency.

Input:

    Plasmid/mRNA: Encoding the Prime Editor protein (nCas9-RT fusion).

    pegRNA & nick-gRNA: Synthesized RNA or DNA templates for their expression.

    Host Cells: For example, a production strain of E. coli or a yeast system where the modified Cas13a gene will be hosted and expressed.

Limitations:

Efficiency: Prime Editing can be less efficient (lower percentage of successfully edited cells) than traditional CRISPR-Cas9, particularly in certain cell types.

Complexity: Designing effective pegRNAs is more complex than designing standard gRNAs and often requires multiple rounds of optimization to achieve the desired hypersensitivity in the protein.