Week 2 HW: DNA Read, Write and Edit

Part 1: Benchling & In-silico Gel Art

It was definitely more difficult than I thought!

First Attempt: Letter A

Reference Image	In-silico Gel Result

Second Attempt: A Face

Reference Image	In-silico Gel Result

Part 3: DNA Design Challenge

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, Google), obtain the protein sequence for the protein you chose.

Protein: TP53 - tumor protein p53

The p53 protein, often called “the guardian of the genome,” is a transcription factor that maintains cellular integrity by regulating the cell cycle and genomic stability. In response to cellular stress such as DNA damage, hypoxia, or oncogene activation, p53 initiates a signaling cascade that can halt the cell cycle in the G1 phase, giving DNA repair mechanisms time to fix errors. If the damage is too extensive to repair, p53 triggers apoptosis (programmed cell death), eliminating potentially pre-cancerous cells and preventing the propagation of harmful mutations.

From a clinical and molecular perspective, TP53 is the most frequently mutated gene in human cancers, altered in roughly half of all tumors. When mutations occur, particularly in the DNA-binding domain, the protein loses its ability to act as a tumor suppressor, allowing cells with genomic instability to proliferate unchecked. This loss of control is a fundamental step in carcinogenesis and a primary focus of research into gene therapies and precision medicine that aim to restore p53 activity.

MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backward from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

Reverse translation of TP53 (tumor protein p53):

atggaagaaccgcagagcgatccgagcgtggaaccgccgctgagccaggaaacctttagc gatctgtggaaactgctgccggaaaacaacgtgctgagcccgctgccgagccaggcgatg gatgatctgatgctgagcccggatgatattgaacagtggtttaccgaagatccgggcccg gatgaagcgccgcgcatgccggaagcggcgccgccggtggcgccggcgccggcggcgccg accccggcggcgccggcgccggcgccgagctggccgctgagcagcagcgtgccgagccag aaaacctatcagggcagctatggctttcgcctgggctttctgcatagcggcaccgcgaaa agcgtgacctgcacctatagcccggcgctgaacaaaatgttttgccagctggcgaaaacc tgcccggtgcagctgtgggtggatagcaccccgccgccgggcacccgcgtgcgcgcgatg gcgatttataaacagagccagcatatgaccgaagtggtgcgccgctgcccgcatcatgaa cgctgcagcgatagcgatggcctggcgccgccgcagcatctgattcgcgtggaaggcaac ctgcgcgtggaatatctggatgatcgcaacacctttcgccatagcgtggtggtgccgtat gaaccgccggaagtgggcagcgattgcaccaccattcattataactatatgtgcaacagc agctgcatgggcggcatgaaccgccgcccgattctgaccattattaccctggaagatagc agcggcaacctgctgggccgcaacagctttgaagtgcgcgtgtgcgcgtgcccgggccgc gatcgccgcaccgaagaagaaaacctgcgcaaaaaaggcgaaccgcatcatgaactgccg ccgggcagcaccaaacgcgcgctgccgaacaacaccagcagcagcccgcagccgaaaaaa aaaccgctggatggcgaatattttaccctgcagattcgcggccgcgaacgctttgaaatg tttcgcgaactgaacgaagcgctggaactgaaagatgcgcaggcgggcaaagaaccgggc ggcagccgcgcgcatagcagccatctgaaaagcaaaaaaggccagagcaccagccgccat aaaaaactgatgtttaaaaccgaaggcccggatagcgat
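Reverse translation does not have a single answer, because most amino acids have several synonymous codons. The Python sketch below illustrates this on the first ten residues of p53. It is a minimal illustration, not the algorithm used by the online tool: the function names are my own, and `reverse_translate` simply takes the first codon listed for each residue in the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def reverse_translate(protein):
    """Return one possible coding sequence (first listed codon per residue)."""
    return "".join(SYNONYMS[aa][0] for aa in protein)


def translate(dna):
    """Translate a coding sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the protein."""
    total = 1
    for aa in protein:
        total *= len(SYNONYMS[aa])
    return total


peptide = "MEEPQSDPSV"  # first ten residues of the p53 sequence above
dna = reverse_translate(peptide)
print(dna)
print(translate(dna) == peptide)   # round trip recovers the peptide
print(count_encodings(peptide))    # how many DNA sequences encode these ten residues
```

Translating the first 30 bases of the tool output above with the same `translate` function gives the same peptide, even though the codons differ from the ones this sketch picks.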

3.3. Codon optimization.

Once the nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize Google for a “codon optimization tool”.

Tool: Codon Optimization IDT https://www.idtdna.com/CodonOpt

ATG GAG GAA CCA CAG AGT GAC CCC AGC GTG GAG CCA CCA CTG AGC CAG GAG ACC TTC AGC GAC CTG TGG AAG CTG CTG CCT GAG AAC AAC GTG CTG AGC CCC CTG CCC AGC CAG GCC ATG GAC GAC CTG ATG CTC TCC CCT GAT GAC ATC GAG CAG TGG TTC ACT GAG GAC CCT GGG CCC GAC GAG GCC CCC CGG ATG CCT GAA GCT GCA CCT CCT GTG GCC CCT GCC CCT GCA GCC CCC ACC CCA GCC GCC CCT GCC CCA GCT CCC TCA TGG CCA CTC TCC TCC TCT GTC CCC TCC CAG AAG ACC TAC CAG GGC TCC TAT GGC TTC CGC CTG GGC TTC CTG CAC TCA GGG ACT GCA AAA TCT GTC ACC TGC ACC TAC AGC CCA GCC CTG AAT AAG ATG TTC TGC CAG CTG GCC AAG ACC TGC CCT GTG CAG CTG TGG GTG GAC TCC ACA CCA CCA CCA GGG ACC AGA GTG CGG GCT ATG GCC ATT TAC AAG CAG AGC CAG CAC ATG ACC GAG GTG GTG CGG AGA TGC CCC CAT CAC GAG CGC TGC TCT GAC TCT GAT GGC CTG GCC CCT CCC CAG CAC CTC ATC CGT GTG GAG GGG AAC CTG AGG GTG GAG TAC CTG GAC GAC AGG AAC ACC TTC CGG CAC TCT GTG GTG GTG CCC TAT GAG CCT CCC GAG GTG GGC TCT GAC TGC ACC ACC ATC CAC TAC AAC TAC ATG TGC AAT TCC TCC TGT ATG GGG GGA ATG AAC CGG AGA CCC ATC CTG ACC ATC ATC ACC CTG GAG GAC TCC TCT GGA AAC CTG CTT GGG AGG AAC AGC TTT GAG GTG CGG GTG TGT GCC TGC CCT GGC CGG GAC AGG AGA ACT GAG GAG GAG AAC CTG AGG AAG AAG GGA GAG CCT CAC CAT GAG CTG CCT CCT GGA TCC ACC AAG CGG GCC CTG CCC AAC AAC ACC TCC TCC AGC CCT CAG CCC AAG AAG AAG CCC CTG GAT GGA GAG TAC TTC ACC CTG CAG ATC CGG GGG AGG GAG AGG TTC GAG ATG TTC CGG GAG CTG AAT GAG GCC CTG GAG CTG AAG GAC GCC CAG GCT GGG AAG GAG CCA GGG GGC AGC AGG GCC CAC TCC AGC CAC CTG AAA TCC AAG AAA GGG CAG TCC ACT TCC AGA CAC AAG AAA CTC ATG TTC AAG ACT GAA GGG CCA GAC TCT GAC

In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

Codon optimization is needed because the genetic code is degenerate: multiple codons can encode the same amino acid. Organisms (and, to some extent, different tissues) differ in which synonymous codons they prefer, a pattern called “codon bias” that tends to track the abundance of the matching tRNAs. Choosing codons that the host translates efficiently can speed up translation and raise the yield of the target protein. Without optimization, ribosomes may stall at codons whose tRNAs are scarce, which can lead to truncated or misfolded proteins.

For this project, I chose to optimize the codon sequence for human cells (Homo sapiens). My main reason is that the target protein is p53, the “guardian of the genome.” Given its central role in human cancer biology and its complex post-translational modifications, a human expression system is the best way to ensure that the protein retains its native conformation and biological function. Optimizing for a human host also allows a more accurate study of how p53 interacts with other endogenous tumor suppressors and provides a more clinically relevant model for developing gene therapies or other molecular interventions.
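To see what the optimizer actually changed, the Python sketch below compares the first ten codons of the reverse-translated sequence with the first ten codons of the IDT output. It only lists synonymous substitutions; it does not reproduce IDT's algorithm, and the helper name `codons` is my own.

```python
from collections import Counter


def codons(seq):
    """Split a coding sequence into codons, ignoring spaces and case."""
    seq = "".join(seq.split()).upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


# First ten codons of each sequence in this section (both encode MEEPQSDPSV).
reverse_translated = "atggaagaaccgcagagcgatccgagcgtg"
idt_optimized = "ATG GAG GAA CCA CAG AGT GAC CCC AGC GTG"

before = codons(reverse_translated)
after = codons(idt_optimized)

# List every position where the optimizer picked a different synonymous codon.
changes = [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]
for position, old, new in changes:
    print(f"codon {position + 1}: {old} -> {new}")

# Tally codon usage in the optimized excerpt.
print(Counter(after).most_common(3))
```

Running the same comparison on the full sequences would show how often each synonymous codon was swapped across the whole gene.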

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

To produce the p53 protein from an optimized DNA sequence, two main approaches can be used: cell-dependent systems and cell-free systems. In a cell-dependent approach, such as mammalian cell lines like HEK293T or CHO, the optimized DNA is inserted into a plasmid vector and introduced into the cells by transfection or viral transduction. This method suits p53 because living mammalian cells have the machinery required for its post-translational modifications, which helps the protein fold correctly into its functional tetrameric state.

Alternatively, cell-free protein synthesis (CFPS) uses a lysate (for example from rabbit reticulocytes, wheat germ, or E. coli) that contains ribosomes, tRNAs, and the other translation factors, without the constraints of a living cell. This is an advantage for p53 because high concentrations of the protein can be toxic to living hosts, whereas a cell-free system allows rapid, direct synthesis from a DNA template in a matter of hours.

In both cases, the DNA sequence becomes a functional protein through transcription and translation. During transcription, RNA polymerase binds to a promoter on the DNA template and “reads” the optimized sequence to synthesize a complementary strand of messenger RNA (mRNA), which serves as a portable copy of the genetic information. During translation, the ribosome reads the mRNA three nucleotides (one codon) at a time, while transfer RNA (tRNA) molecules deliver the corresponding amino acids. As the ribosome moves along the mRNA, it catalyzes the formation of peptide bonds between these amino acids, building a polypeptide chain that folds into the three-dimensional structure of the p53 protein.

3.5. Optional - How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level.

In nature, a single gene can diversify its output at the transcriptional level primarily through alternative splicing, in which different combinations of exons from the same pre-mRNA molecule are joined together. Transcription first produces a primary RNA transcript containing both exons (which carry the coding sequence) and introns (non-coding regions). The spliceosome then removes the introns and can vary which exons it retains. By including or skipping specific exons, the cell can generate multiple distinct mRNA variants from one DNA sequence, each of which is translated into a protein with different functional domains or properties.
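The exon-skipping idea can be sketched with a toy example in Python. Everything here is hypothetical: the exon names and sequences are invented for illustration and do not come from TP53 or any real gene. The sketch keeps two exons in every transcript and enumerates all ways of including or skipping the other two.

```python
from itertools import combinations

# Hypothetical toy gene: exon names and sequences are made up for illustration.
EXONS = {"E1": "ATGGCT", "E2": "GAAGAC", "E3": "CTGAAA", "E4": "TGGTAA"}


def splice(included):
    """Join the chosen exons in genomic order to form a mature transcript."""
    return "".join(seq for name, seq in EXONS.items() if name in included)


def isoforms(constitutive, optional):
    """Enumerate transcripts that keep constitutive exons and include or skip optional ones."""
    result = {}
    for r in range(len(optional) + 1):
        for kept in combinations(optional, r):
            included = set(constitutive) | set(kept)
            label = "-".join(name for name in EXONS if name in included)
            result[label] = splice(included)
    return result


variants = isoforms(constitutive=["E1", "E4"], optional=["E2", "E3"])
for label, transcript in variants.items():
    print(label, transcript)
```

With two optional exons the toy gene yields four transcripts. Real splicing is regulated by splice-site strength and splicing factors, which this sketch does not model.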

Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.
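One way to produce such an alignment is sketched below in Python, using the first ten codons of my codon-optimized sequence. This is my own illustration rather than the course's example. It treats the input as the coding strand, so the mRNA is the same sequence with U in place of T, and it uses the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def align(coding_dna):
    """Return three space-separated rows: DNA codons, mRNA codons, amino acids."""
    dna = "".join(coding_dna.split()).upper()
    dna_codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    # The mRNA matches the coding strand with U in place of T.
    rna_codons = [c.replace("T", "U") for c in dna_codons]
    # Pad each one-letter amino acid to codon width so the columns line up.
    residues = [CODON_TABLE[c].center(3) for c in dna_codons]
    return " ".join(dna_codons), " ".join(rna_codons), " ".join(residues)


dna_row, rna_row, protein_row = align("ATG GAG GAA CCA CAG AGT GAC CCC AGC GTG")
print("DNA    ", dna_row)
print("mRNA   ", rna_row)
print("Protein", protein_row)
```

Each amino acid is centered under its codon, so the three rows can be read column by column.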

Part 4: Twist DNA Synthesis Order

Linear map of my TP53 expression cassette (preview)

My final plasmid

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

To analyze the genetic diversity of the Andean bear (Tremarctos ornatus), I would sequence the mitochondrial COI (Cytochrome c Oxidase I) gene, as it serves as a robust molecular barcode for identifying population variations in endangered species.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

I would use Oxford Nanopore Technologies (ONT) sequencing, a third-generation technology, because it sequences single DNA molecules in real time without requiring PCR amplification. The input is genomic DNA purified from non-invasive samples (hair). Preparation involves ligating adapters that carry motor proteins, with optional barcoding for multiplexing. During sequencing, each DNA strand passes through a protein nanopore embedded in a polymer membrane, and base calling is performed by measuring the characteristic disruptions in ionic current as the nucleotides transit the pore. The output is raw signal data (FAST5 files) and basecalled long reads (FASTQ files).
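As a rough idea of what working with the FASTQ output looks like, here is a minimal Python sketch that reads records and computes a mean quality score per read. The two reads are made up for illustration, the parser assumes the simple four-lines-per-record layout, and the quality scores are assumed to use the common Phred+33 encoding. A real analysis would use an established library rather than this hand-written parser.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        yield header[1:].split()[0], sequence, quality


def mean_phred(quality):
    """Mean Phred score, assuming the common Phred+33 ASCII encoding."""
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


# Made-up two-read example; real nanopore reads are usually far longer.
example = io.StringIO(
    "@read1 hypothetical\n"
    "ATGGAGGAACCA\n"
    "+\n"
    "IIIIIIIIIIII\n"
    "@read2 hypothetical\n"
    "CAGAGTGAC\n"
    "+\n"
    "+++++++++\n"
)

reads = list(read_fastq(example))
for read_id, sequence, quality in reads:
    print(read_id, len(sequence), round(mean_phred(quality), 1))
```

Filtering reads by length and mean quality in this way is a typical first step before aligning COI reads to a reference.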

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I would synthesize a lactate-responsive genetic circuit (utilizing the lldPR operon) to monitor metabolic stress in wildlife. The sequence would include a lactate-sensitive promoter, a ribosome binding site (RBS), and a Green Fluorescent Protein (GFP) reporter gene.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use semiconductor-based DNA synthesis, which leverages silicon-based technology to write thousands of genes in parallel with high precision. The essential steps include in silico design, oligonucleotide synthesis on silicon microchips, PCR-based fragment assembly, and cloning into vectors. The primary limitations of this method include the high cost for extremely long sequences and technical difficulties in synthesizing regions with high GC content or complex repetitive elements, though its scalability is far superior to traditional column-based methods.
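Because GC-rich regions are one of the limitations mentioned above, a design can be screened before ordering. The Python sketch below slides a window along a sequence and flags windows with extreme GC content. The window size, step, and thresholds are illustrative assumptions of mine, not any vendor's actual synthesis rules. The GC-rich test sequence is an excerpt from my reverse-translated TP53 sequence (the proline- and alanine-rich region).

```python
def gc_fraction(seq):
    """Fraction of G and C bases in an uppercase sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_gc_windows(seq, window=20, step=5, high=0.65, low=0.25):
    """Return (start, gc) for windows whose GC fraction falls outside [low, high].

    The default window, step, and thresholds are hypothetical values for illustration.
    """
    seq = "".join(seq.split()).upper()
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc > high or gc < low:
            flagged.append((start, round(gc, 2)))
    return flagged


# Excerpt from the reverse-translated TP53 sequence above (proline/alanine-rich region).
gc_rich = "ccggaagcggcgccgccggtggcgccggcgccggcggcgccg"
balanced = "ATGC" * 10

print(flag_gc_windows(gc_rich))
print(flag_gc_windows(balanced))
```

Flagged windows are candidates for swapping in synonymous codons with lower GC content, which is one more reason codon choice matters beyond expression level.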

5.3 DNA Edit

(i) What DNA would you want to edit and why?

I would choose to edit the TP53 gene in mammalian cell lines to restore its tumor-suppressor function, given its critical role in cancer biology.

(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use CRISPR-Cas9 technology because of its specificity and versatility. It edits DNA using a Cas9 nuclease directed by a synthetic single guide RNA (sgRNA) to create a double-strand break at a precise genomic location. Preparation requires designing an sgRNA complementary to the target site, and the input consists of either the ribonucleoprotein (RNP) complex or plasmids encoding Cas9 and the guide. The essential steps are PAM recognition, guide-DNA hybridization, and enzymatic cleavage, followed by homology-directed repair (HDR) if a repair template is provided. The main limitations are off-target effects (unintended cuts) and the low, variable efficiency of HDR, particularly in non-dividing (post-mitotic) cells.
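The PAM-recognition step can be illustrated with a short Python sketch that scans a sequence for candidate target sites. It assumes the commonly used SpCas9 enzyme, whose PAM is NGG immediately 3' of a 20-nt protospacer, and it scans only the strand given. The function name is my own, and the input is the first 60 bases of my codon-optimized sequence, used purely as test input. Real guide design tools also scan the reverse strand and score off-target risk, which this sketch does not attempt.

```python
import re


def find_spcas9_targets(seq, guide_len=20):
    """Find protospacers followed by an NGG PAM on the given strand only."""
    seq = "".join(seq.split()).upper()
    targets = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough upstream sequence for a full protospacer
        targets.append({
            "protospacer": seq[pam_start - guide_len:pam_start],
            "pam": match.group(1),
            # SpCas9 typically cuts about 3 bp upstream of the PAM.
            "cut_index": pam_start - 3,
        })
    return targets


# First 20 codons of the codon-optimized sequence above, used only as example input.
example = (
    "ATG GAG GAA CCA CAG AGT GAC CCC AGC GTG "
    "GAG CCA CCA CTG AGC CAG GAG ACC TTC AGC"
)

targets = find_spcas9_targets(example)
for t in targets:
    print(t["protospacer"], t["pam"], t["cut_index"])
```

Each reported protospacer would be the starting point for an sgRNA design, to be checked afterwards for off-target matches elsewhere in the genome.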