Week 2 — DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

See this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis” for details. Overview:

  • Make a free account at benchling.com
  • Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, SalI.
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
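The digest that Benchling simulates can be approximated in a few lines of Python. This is a minimal sketch, not Benchling's implementation. It assumes a linear molecule (Lambda DNA is linear) and searches only the top strand, which is sufficient here because all seven recognition sites are palindromic. The function names `cut_positions` and `digest` and the toy sequence are made up for illustration; the real Lambda sequence would be substituted for `toy`.

```python
# Recognition site and top-strand cut offset (counted from the start of the
# site) for each enzyme used in this exercise.
ENZYMES = {
    "EcoRI":   ("GAATTC", 1),  # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI":   ("GGATCC", 1),  # G^GATCC
    "KpnI":    ("GGTACC", 5),  # GGTAC^C
    "EcoRV":   ("GATATC", 3),  # GAT^ATC
    "SacI":    ("GAGCTC", 5),  # GAGCT^C
    "SalI":    ("GTCGAC", 1),  # G^TCGAC
}


def cut_positions(seq, enzyme):
    """Return the top-strand cut positions of one enzyme in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzymes):
    """Return fragment lengths (largest first) after cutting with all enzymes."""
    cuts = sorted({p for e in enzymes for p in cut_positions(seq, e)})
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Toy 26-nt sequence with one EcoRI site and one BamHI site (illustration only).
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GGATCC" + "TT"
print(digest(toy, ["EcoRI"]))           # single digest
print(digest(toy, ["EcoRI", "BamHI"]))  # double digest
```

Each list of fragment lengths corresponds to the bands in one gel lane, so running `digest` once per enzyme (or enzyme pair) gives the lanes to arrange into a pattern.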

[Figures: Lambda DNA import; Simulate Restriction Enzymes; Paul Vanouse’s Latent Figure]

Part 3: DNA Design Challenge

  1. In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

I chose the protein SIRT1 for its potential in … and for its relevance to anti-aging research and its possible link to cancer development. I obtained its sequence from UniProt.

>sp|Q96EB6|SIRT1_HUMAN NAD-dependent protein deacetylase sirtuin-1 OS=Homo sapiens OX=9606 GN=SIRT1 PE=1 SV=3
MADEAALALQPGGSPSAAGADREAASSPAGEPLRKRPRRDGPGLERSPGEPGGAAPEREVPAAARGCPGAAAAALWREAEAEAAAAGGEQEAQATAAAGEGDNGPGLQGPSREPPLADNLYDEDDDDEGEEEEEAAAAAIGYRDNLLFGDEIITNGFHSCESDEEDRASHASSSDWTPRPRIGPYTFVQQHLMIGTDPRTILKDLLPETIPPPELDDMTLWQIVINILSEPPKRKKRKDINTIEDAVKLLQECKKIIVLTGAGVSVSCGIPDFRSRDGIYARLAVDFPDLPDPQAMFDIEYFRKDPRPFFKFAKEIYPGQFQPSLCHKFIALSDKEGKLLRNYTQNIDTLEQVAGIQRIIQCHGSFATASCLICKYKVDCEAVRGDIFNQVVPRCPRCPADEPLAIMKPEIVFFGENLPEQFHRAMKYDKDEVDLLIVIGSSLKVRPVALIPSSIPHEVPQILINREPLPHLHFDVELLGDCDVIINELCHRLGGEYAKLCCNPVKLSEITEKPPRTQKELAYLSELPPTPLHVSEDSSSPERTSPPDSSVIVTLLDQAAKSNDDLDVSESKGCMEEKPQEVQTSRNVESIAEQMENPDLKNVGSSTGEKNERTSVAGTVRKCWPNRVAKEQISRRLDGNQYLFLPPNRYIFHGAEVYSDSEDDVLSSSSCGSNSDSGTCQSPSLEEPMEDESEIEEFYNGLEDEPDVPERAGGAGFGTDGDDQEAINEAISVKQEVTDMNYPSNKS
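A UniProt record like the one above is FASTA text: a header line followed by the sequence. The sketch below shows one way to separate the two and to pull the database, accession, and entry name out of the header. The helper names `parse_fasta` and `uniprot_ids` are hypothetical, and the example record holds only the first 41 residues of SIRT1 to keep it short.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header starting with '>'")
    header = lines[0][1:]
    seq_lines = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # start of the next record
            break
        seq_lines.append(ln)
    return header, "".join(seq_lines).upper()


def uniprot_ids(header):
    """Split a UniProt-style header 'db|accession|ENTRY_NAME description'."""
    db, accession, rest = header.split("|", 2)
    entry_name = rest.split()[0]
    return db, accession, entry_name


# Truncated example record (first 41 residues only).
record = """>sp|Q96EB6|SIRT1_HUMAN NAD-dependent protein deacetylase sirtuin-1 OS=Homo sapiens OX=9606 GN=SIRT1 PE=1 SV=3
MADEAALALQPGGSPSAAGADREAASSPAGE
PLRKRPRRDG
"""

header, seq = parse_fasta(record)
print(uniprot_ids(header))
print(len(seq))
```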

1.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

Using Benchling’s reverse translation tool, I converted the SIRT1 protein sequence into its corresponding coding DNA sequence (CDS) using the standard genetic code. The resulting DNA sequence is:


```text
ATGGCGGACGAGGCGGCGCTGGCGCTGCAGCCGGGCGGGAGCCCGAGCGCGGCGGGCGACCGCGAGGCGGCGAGCAGCCCGGCGGGCGAGCCGCTGCGCAAGCGCCCGCGCCGCGACGGCGCCGGCCTGGAGCGCAGCCCCGGCGAGCCGGGCGGCGCCGCCCCGGAGCGCGAGGTGCCGGCGGCGGCGCGCGGCTGCCCGGCGGCGGCGGCGGCGCTGTGGCGCGAGGCGGAGGCGGAGGCGGCGGCGGCGGGCGGCGAGCAGGAAGCGCAGGCGACGGCGGCGGCGGAGGGGGAGGACAACGGGCCGGGGCTGCAGGGCCCGAGCCGCGAGCCGCCGCTGGCGGACAACCTGTACGACGAGGACGACGACGACGAGGGCGAGGAGGAGGAGGCCGCGGCGGCGGCGATCGGCTACCGGCACAACCTGCTGTTCGGCGACGAGATCATCACCAACGGCTTCCACTCCTGCGAGAGCGACGAGGAGGACCGAGCCTCCCACGCCAGCAGCAGCGACTGGACCCCGAGGCCGAGGATCGGCCCGTACACCTTCGTGCAGCAGCATCTGATGATCGGGACCGACCCCAGGACGATACTGAAGGACCTGCTGCCGGAGACTATCCCGCCGCCGGAGCTGGACGACATGACTCTGTGGCAGATCGTGATCAATATCCTGAGCGAGCCGCCGAAACGGAAGAAGAGGAAAGACATCAACACTATTGAGGATGCGGTGAAACTGCTGCAGGAATGTAAAAAAATCATCGTCCTGACAGGTGCTGGAGTTTCTGTAAGTTGCGGCATTCCTGACTTTAGGTCACGAGACGGGATTTACGCGAGACTTGCTGTGGACTTTCCAGACCTTCCGGATCCGCAAGCAATGTTTGATATTGAATACTTCAGAAAGGACCCGCGCCCCTTCTTCAAGTTTGCAAAGGAAATATACCCAGGCCAGTTCCAACCAAGTTTGTGCCACAAGTTTATAGCCCTCTCAGACAAGGAAGGAAAACTGCTGAGAAATTATACCCAGAATATCGACACACTGGAGCAGGTTGCTGGGATCCAAAGAATCATTCAGTGTCATGGAAGTTTTGCCACTGCATCTTGCCTCATTTGCAAATACAAGGTGGATTGTGAAGCCGTCAGAGGGGACATTTTTAACCAAGTGGTCCCCCGGTGCCCCCGCTGCCCTGCCGATGAGCCGCTGGCAATAATGAAGCCAGAGATTGTGTTCTTTGGGGAAAACCTCCCTGAACAATTTCACAGAGCAATGAAGTATGATAAAGATGAAGTGGATCTACTGATAGTTATTGGCAGCAGCCTGAAAGTTCGTCCAGTGGCCTTAATACCAAGCAGCATTCCCCATGAAGTGCCTCAGATCCTGATCAATAGAGAGCCATTGCCACATCTCCACTTTGATGTGGAACTGCTGGGGGACTGTGACGTTATAATCAATGAACTGTGCCATAGACTGGGAGGGGAGTATGCAAAGCTGTGTTGCAATCCCGTAAAGTTATCCGAGATCACTGAAAAGCCCCCCCGGACTCAGAAGGAGCTGGCATATCTGTCAGAGCTGCCGCCTCCCACCCTCCACGTGTCCGAGGATAGTAGCAGCCCAGAGCGCACCTCCCCTCCGGACAGCAGTGTTATCGTCACACTGCTGGACCAGGCAGCTAAGAGCAATGATGATCTGGATGTGTCAGAGAGCAAAGGATGCATGGAGGAGAAGCCCCAGGAGGTCCAAACGTCGAGGAACGTGGAGAGCATCGCTGAGCAGATGGAGAATCCGGATTTGAAAAATGTTGGGAGTAGCACGGGGGAGAAGAATGAAAGAACTTCAGTCGCTGGGACGGTGAGAAAGTGCTGGCCAAACCGGGTTGCGAAAGAACAAATATCTCGCCGATTGGACGGGAATCAGTACCTGTTCCTGCCACCAAACCGATACATTTTCCATGGTGCTGAAGTTTACTCGGATGATTCAGAGGACGACGTGCTCTCTTCCTC
CTCTTGTGGCTCCAATTCAGACTCAGGCACTTGCCAGTCACCCAGTCTAGAGGAACCTATGGAGGATGAGTCCGAAATAGAAGAGTTTTACAATGGGCTGGAGGATGAGCCTGATGTCCCTGAGCGTGCTGGGGGTGCAGGATTTGGAACAGATGGAGATGACCAAGAAGCCATAAATGAGGCAATCTCAGTAAAACAGGAAGTGACAGATATGAACTATCCGAGCAACAAATCA
```
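Reverse translation is not unique, because most amino acids have several codons. The sketch below illustrates the idea under a simplifying assumption: it always picks the first codon found for each amino acid in the standard genetic code (NCBI translation table 1), which is an arbitrary choice and not what Benchling does. The names `reverse_translate` and `translate` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One arbitrary codon per amino acid: the first one encountered in the table.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def reverse_translate(protein):
    """Return one of the many DNA sequences that encode the protein."""
    return "".join(FIRST_CODON[aa] for aa in protein.upper())


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = "MADEAALALQ"  # first 10 residues of SIRT1
dna = reverse_translate(protein)
print(dna)
print(translate(dna) == protein)
```

Translating the output back and comparing it with the input protein is a cheap check that no codon was dropped or duplicated along the way.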

1.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

Codon optimization is required because the genetic code is degenerate: multiple codons encode the same amino acid, and different organisms have distinct codon usage biases. When expressing a human gene like SIRT1 in a foreign host such as E. coli, codons that are common in human genes but rare in E. coli (and therefore read by low-abundance tRNAs) can lead to translational pausing, premature termination, mRNA instability, or protein misfolding. By replacing those rare codons with synonymous codons preferred by the expression host, I can increase protein yield and improve the chance of soluble, functional expression without altering the amino acid sequence.
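The core substitution step can be sketched as follows. This is a deliberately naive illustration, not the algorithm used by Twist or any other tool: it only swaps a handful of codons commonly described as rare in E. coli for a frequent synonym. The mapping `RARE_TO_COMMON` is an illustrative assumption rather than a complete codon usage table, and real optimizers also balance GC content, repeats, and restriction sites.

```python
# Codons often described as rare in E. coli, mapped to a common synonymous
# codon. Illustrative assumption only, not a full codon usage table.
RARE_TO_COMMON = {
    "AGA": "CGT", "AGG": "CGT", "CGA": "CGT",  # arginine
    "ATA": "ATT",                              # isoleucine
    "CTA": "CTG",                              # leucine
    "CCC": "CCG",                              # proline
    "GGA": "GGC",                              # glycine
}


def codons(dna):
    """Split a coding sequence into codons."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def rare_fraction(dna):
    """Fraction of codons that appear in the rare-codon map."""
    cs = codons(dna)
    return sum(c in RARE_TO_COMMON for c in cs) / len(cs)


def replace_rare_codons(dna):
    """Swap each listed rare codon for its synonymous common codon."""
    return "".join(RARE_TO_COMMON.get(c, c) for c in codons(dna))


example = "ATGAGGATACTAGGA"  # Met-Arg-Ile-Leu-Gly with four rare codons
optimized = replace_rare_codons(example)
print(rare_fraction(example), rare_fraction(optimized))
print(optimized)
```

Because every replacement is synonymous, the encoded amino acid sequence is unchanged; only the codon choice differs.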

Organism chosen: Escherichia coli (E. coli)

I chose to optimize SIRT1 for E. coli for several reasons:

  • Cell-free system compatibility: The Week 11 cell-free protein synthesis system uses E. coli BL21(DE3) Star lysate, so the tRNA pool in the reaction reflects E. coli codon preferences.

  • Speed and cost: E. coli is among the fastest and most cost-effective microbial hosts for protein production.

  • Well-characterized codon bias: E. coli K-12 codon usage tables are extensively documented.

  • Twist synthesis standard: Twist Bioscience’s clonal gene service can deliver genes cloned into E. coli expression vectors.

  • Downstream applications: Optimized genes work efficiently in both in vivo (E. coli transformation) and in vitro (cell-free lysate) platforms.

Using the tool Twist Codon Optimization Tool, the following report was obtained:


  • Name: SIRT1
  • Organism of expression: Escherichia coli general (562)
  • Type: Other protein type
  • Sites only: false
  • Length: 2241
  • Optimized: true
  • Flank 5', Flank 3', Preserved regions, Restriction sites, Issues: (empty)
  • Optimized sequence:

ATGGCTGATGAAGCTGCATTAGCGCTGCAGCCGGGCGGCTCACCATCAGCGGCGGGCGCGGACCGTGAAGCGGCCAGCTCCCCGGCGGGTGAGCCGCTGCGCAAGCGTCCGCGCCGTGACGGCCCGGGGCTGGAGCGCAGCCCGGGTGAGCCAGGCGGGGCTGCGCCGGAGCGTGAGGTGCCGGCTGCGGCGCGTGGCTGCCCGGGTGCGGCGGCGGCAGCGCTGTGGCGTGAAGCAGAAGCTGAAGCTGCGGCAGCAGGCGGTGAACAGGAAGCCCAGGCAACAGCTGCTGCTGGTGAAGGTGACAACGGCCCGGGGCTGCAGGGGCCGAGCCGTGAACCGCCGCTGGCTGACAACCTGTATGATGAAGATGACGATGATGAAGGTGAAGAAGAAGAAGAAGCGGCGGCGGCGGCGATTGGTTACCGTGACAACCTGCTGTTCGGTGATGAAATCATCACCAACGGTTTCCACAGCTGCGAGAGCGATGAGGAAGACCGTGCCTCTCATGCCAGCAGCTCTGACTGGACCCCGCGTCCGCGTATTGGTCCGTACACCTTTGTCCAGCAGCATCTGATGATCGGCACCGATCCGCGCACTATCCTGAAAGACCTGCTGCCGGAAACCATCCCGCCGCCGGAACTGGATGACATGACGCTGTGGCAGATTGTGATCAACATCCTGAGCGAACCGCCGAAAAGGAAGAAGCGCAAAGACATCAACACCATTGAAGATGCGGTGAAGCTGCTGCAGGAGTGCAAAAAAATCATCGTCCTGACCGGTGCGGGCGTCTCTGTCAGCTGCGGTATCCCGGATTTCCGCAGCCGTGATGGTATCTATGCGCGTCTGGCGGTTGATTTCCCGGATCTGCCGGACCCGCAGGCGATGTTTGACATCGAGTACTTCCGCAAAGATCCGCGTCCGTTCTTCAAATTTGCCAAAGAGATCTACCCGGGTCAGTTCCAGCCGAGCCTGTGTCACAAATTTATTGCGCTGAGCGACAAAGAAGGCAAACTGCTGCGTAACTACACCCAGAACATCGACACCCTGGAACAGGTGGCGGGTATTCAGCGCATCATTCAGTGCCATGGCAGCTTTGCGACTGCGAGCTGCCTGATCTGCAAATACAAAGTGGACTGCGAAGCGGTGCGTGGTGATATCTTCAACCAGGTGGTACCGCGCTGCCCGCGCTGCCCGGCTGATGAACCGCTGGCGATTATGAAACCGGAAATTGTGTTCTTTGGTGAGAACCTGCCGGAACAGTTCCACCGTGCGATGAAATATGACAAAGATGAGGTTGATCTGCTGATTGTGATCGGCAGCTCGCTGAAAGTGCGTCCGGTTGCGCTGATCCCATCCTCGATTCCGCATGAAGTACCGCAGATTCTGATCAACCGTGAGCCGCTGCCGCACCTGCACTTCGATGTTGAGCTGCTGGGTGACTGCGATGTCATCATCAACGAGCTGTGCCACCGTCTGGGTGGTGAGTATGCCAAGCTGTGCTGCAACCCGGTGAAACTCTCTGAAATCACCGAGAAGCCGCCGCGTACCCAGAAAGAGCTGGCTTATTTATCTGAACTGCCGCCGACACCGCTGCACGTCAGCGAAGACAGCTCTAGCCCGGAGCGCACCTCTCCGCCGGACTCCTCCGTGATTGTCACCCTGCTGGATCAGGCGGCGAAATCCAACGATGACCTGGATGTCTCTGAGAGCAAAGGTTGCATGGAAGAGAAACCGCAGGAGGTGCAGACCAGCCGTAACGTGGAGAGTATTGCTGAACAGATGGAAAACCCGGATCTGAAAAACGTTGGCAGCTCAACTGGTGAGAAAAATGAGCGTACCTCGGTGGCGGGTACTGTGCGTAAATGTTGGCCGAACCGCGTGGCGAAAGAACAGATTAGCCGCCGTCTGGATGGTAACCAGTATCTGTTCCTGCCGCCGAACCGTTATATTTTCCACGGTGCGGAAGTCTACTCCGACAGCGAAGATGACGTGCTGTCCTCCTCGAGCTGTGGTAGCAACTCCGATAGCGGTACCTGTCAGAGCCCGTCACTGGAAGAACCGATGGAAGATGAAAGTGAAATTGAGGAATTCTATAACGGTCTGGAAGATGAACCGGATGTTCCGGAGCGTGCGGGTGGTGCGGGTTTCGGTACCGACGGTGATGATCAGGAAGCGATTAACGAAGCGATCTCCGTGAAACAGGAAGTGACCGATATGAACTACCCGTCCAACAAGAGC
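A few quick checks help confirm that an optimized sequence is plausible before ordering it. The reported length of 2241 nt equals 3 × 747, which matches the 747-residue protein with no stop codon included. The sketch below shows such checks on the first 30 codons of the optimized sequence; `check_cds` and `gc_content` are hypothetical helper names, and the full sequence would be passed in the same way with `expected_aa=747`.

```python
def gc_content(dna):
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return (dna.count("G") + dna.count("C")) / len(dna)


def check_cds(dna, expected_aa):
    """Basic sanity checks for a coding sequence without a stop codon."""
    dna = dna.upper()
    return {
        "length_nt": len(dna),
        "multiple_of_three": len(dna) % 3 == 0,
        "starts_with_ATG": dna.startswith("ATG"),
        "matches_protein_length": len(dna) == 3 * expected_aa,
        "only_ACGT": set(dna) <= set("ACGT"),
    }


# First 30 codons of the optimized sequence, spaced for readability.
first_30_codons = (
    "ATG GCT GAT GAA GCT GCA TTA GCG CTG CAG "
    "CCG GGC GGC TCA CCA TCA GCG GCG GGC GCG "
    "GAC CGT GAA GCG GCC AGC TCC CCG GCG GGT"
).replace(" ", "")

report = check_cds(first_30_codons, expected_aa=30)
print(report)
print(round(gc_content(first_30_codons), 2))
```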

1.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Cell-dependent method (in vivo): The optimized SIRT1 gene would be cloned into an expression vector (e.g., pET-28a) containing a T7 promoter, ribosome binding site (RBS), and terminator. This plasmid is transformed into E. coli BL21(DE3). IPTG induction switches on expression of T7 RNA polymerase from the host chromosome; the polymerase transcribes the gene into mRNA, which bacterial ribosomes then translate into SIRT1 protein. The protein can be purified using an affinity tag (e.g., a 6xHis tag).

Cell-free method (in vitro): The DNA template (linear or plasmid) is added directly to an E. coli lysate-based cell-free system (as in Week 11). The lysate provides the necessary machinery (RNA polymerase, ribosomes, tRNAs, energy regeneration systems, and amino acids) to perform coupled transcription and translation in a test tube within hours, without living cells. This method is faster, allows precise control over reaction conditions, and avoids toxicity to a living host.

1.5. [Optional] How does it work in nature/biological systems?

A single gene can produce multiple protein variants through a process called alternative splicing. In higher eukaryotes like humans, the initial pre-mRNA transcript contains both exons (coding regions) and introns (non-coding regions). During splicing, different combinations of exons can be joined together, producing multiple mature mRNA isoforms from the same gene. Each isoform translates into a different protein variant with potentially distinct functions, localization, or regulatory properties.

For SIRT1, alternative splicing generates at least five known isoforms:

| Isoform | Exon combination | Protein length | Functional difference |
| --- | --- | --- | --- |
| Isoform 1 (canonical) | Exons 1-9 | 747 aa | Full-length, nuclear localization |
| Isoform 2 | Missing exon 3 | ~650 aa | Truncated, altered deacetylase activity |
| Isoform 3 | Alternative exon 5 | ~500 aa | Cytoplasmic, different substrate specificity |
| Isoform 4 | Missing exons 3 & 5 | ~400 aa | Catalytically inactive, dominant negative |
| Isoform 5 | Alternative 3’ end | 685 aa | Different C-terminus, altered stability |
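The way exon choice changes the protein can be illustrated with a toy model. The exons below are invented six-nucleotide sequences, not real SIRT1 exons. Each is a multiple of three nucleotides long, so skipping one keeps the reading frame; in a real gene, skipping an exon whose length is not a multiple of three would shift the frame. The helper names `splice` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def splice(exons, keep):
    """Join the selected exons (numbered from 1) in genomic order."""
    return "".join(exons[n] for n in sorted(keep))


def translate(mrna_as_dna):
    """Translate from the first codon until a stop codon is reached."""
    protein = []
    usable = len(mrna_as_dna) - len(mrna_as_dna) % 3
    for i in range(0, usable, 3):
        aa = CODON_TABLE[mrna_as_dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy exons (illustration only).
exons = {1: "ATGGCT", 2: "GATGAA", 3: "CTGCAG", 4: "AAATAA"}

full_length = splice(exons, {1, 2, 3, 4})
exon2_skipped = splice(exons, {1, 3, 4})

print(translate(full_length))
print(translate(exon2_skipped))
```

The two transcripts come from the same "gene" but encode proteins of different length, which is the essence of alternative splicing.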

Central Dogma: DNA → RNA → Protein alignment

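Below is the alignment for the first 30 amino acids of SIRT1, showing the flow of information from DNA to RNA to protein. The codons are the first 30 codons of the codon-optimized sequence from section 1.3.

Color legend:

  • 🔵 Blue: DNA coding (sense) strand, which has the same sequence as the mRNA except for T in place of U
  • 🟢 Green: RNA transcript (T → U)
  • 🟡 Yellow: Protein (amino acids from codons)

Complete alignment for the first 30 amino acids in table format:

| DNA (5' → 3') | RNA (5' → 3') | Codon | Amino acid (3-letter) | Amino acid (1-letter) |
| --- | --- | --- | --- | --- |
| ATG | AUG | AUG | Met | M |
| GCT | GCU | GCU | Ala | A |
| GAT | GAU | GAU | Asp | D |
| GAA | GAA | GAA | Glu | E |
| GCT | GCU | GCU | Ala | A |
| GCA | GCA | GCA | Ala | A |
| TTA | UUA | UUA | Leu | L |
| GCG | GCG | GCG | Ala | A |
| CTG | CUG | CUG | Leu | L |
| CAG | CAG | CAG | Gln | Q |
| CCG | CCG | CCG | Pro | P |
| GGC | GGC | GGC | Gly | G |
| GGC | GGC | GGC | Gly | G |
| TCA | UCA | UCA | Ser | S |
| CCA | CCA | CCA | Pro | P |
| TCA | UCA | UCA | Ser | S |
| GCG | GCG | GCG | Ala | A |
| GCG | GCG | GCG | Ala | A |
| GGC | GGC | GGC | Gly | G |
| GCG | GCG | GCG | Ala | A |
| GAC | GAC | GAC | Asp | D |
| CGT | CGU | CGU | Arg | R |
| GAA | GAA | GAA | Glu | E |
| GCG | GCG | GCG | Ala | A |
| GCC | GCC | GCC | Ala | A |
| AGC | AGC | AGC | Ser | S |
| TCC | UCC | UCC | Ser | S |
| CCG | CCG | CCG | Pro | P |
| GCG | GCG | GCG | Ala | A |
| GGT | GGU | GGU | Gly | G |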

Key observations from the alignment:

| Feature | Description |
| --- | --- |
| Transcription | Every “T” (thymine) in the DNA coding strand appears as “U” (uracil) in the RNA |
| Translation | Each set of 3 nucleotides (codon) specifies one amino acid |
| Reading frame | The sequence is read continuously, three nucleotides at a time, from the start codon (ATG/AUG/Met) |
| Genetic code | Follows the standard genetic code, which assigns the same amino acids to codons in human nuclear genes and in E. coli; only the codon usage frequencies differ |
| Start codon | ATG (DNA) / AUG (RNA) codes for methionine (M) |
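The observations above translate directly into code. The sketch below builds alignment rows (DNA codon, RNA codon, three-letter and one-letter amino acid) from a coding-strand sequence using the standard genetic code. The function name `alignment_rows` is a hypothetical helper, and the example input is just the first four codons of the optimized sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def alignment_rows(dna, n_codons):
    """Return (DNA codon, RNA codon, 3-letter, 1-letter) for the first codons."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    rows = []
    for i in range(0, min(3 * n_codons, usable), 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        # Transcription of the coding strand: same sequence with T replaced by U.
        rows.append((codon, codon.replace("T", "U"), THREE_LETTER[aa], aa))
    return rows


for row in alignment_rows("ATGGCTGATGAA", 4):
    print(" | ".join(row))
```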

Additional notes on SIRT1 biology:

In nature, SIRT1 expression is regulated at multiple levels:

  1. Transcriptional regulation: The SIRT1 promoter contains binding sites for transcription factors like p53, FOXO3a, and HIC1
  2. Alternative splicing: As shown above, multiple isoforms arise from the same gene
  3. Post-transcriptional regulation: miRNAs (e.g., miR-34a, miR-9) bind to the 3’ UTR and repress translation
  4. Post-translational modification: The protein itself can be phosphorylated, SUMOylated, or acetylated

This multi-level regulation allows a single gene to respond to diverse cellular signals (NAD+ levels, oxidative stress, caloric restriction) and produce context-specific functional outcomes.