Week 2 HW: DNA Read, Write, & Edit

Part 0: Basics of Gel Electrophoresis

I have attended the recitation.

Part 1: Benchling & In-silico Gel Art

I made the gel art below. It is “HT” for “How To grow almost anything”.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

I worked in group with Louisa, Jasmine, and Yutong. We tried to make the cat gel art designed by Louisa, but unfortunately it was not very successful. Photo below:

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose EGFR (Epidermal Growth Factor Receptor), because it is a protein that plays a critical role in cell growth and division, and it is often mutated in various cancers.

>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor OS=Homo sapiens OX=9606 GN=EGFR PE=1 SV=2
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA
VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF
QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC
TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV
VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK
NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF
ENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKL
FGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCN
LLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVM
GENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSIATGMVGALLLLLVV
ALGIGLFMRRRHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGS
GAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGI
CLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAA
RNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSY
GVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPK
FRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQ
QGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTED
SIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLN
TVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENAEYLRV
APQSSEFIGA

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

I used the reverse translation tool on the Twist Bioscience website. I added a stop codon (TAA) manually to the end of the DNA sequence. The DNA sequence is as follows:

>Egfr
ATGCGTCCGTCCGGCACTGCAGGGGCCGCACTGTTGGCTCTGCTCGCAGCCTTATGCCCCGCCAG
TCGAGCATTGGAGGAGAAGAAGGTGTGCCAGGGAACCTCTAACAAGCTGACACAACTCGGGACCT
TCGAGGACCATTTCCTGAGCCTCCAGAGAATGTTTAACAACTGCGAGGTTGTGTTGGGGAATCTG
GAGATTACCTACGTCCAGCGTAACTACGACCTCAGTTTCCTCAAGACAATACAGGAAGTGGCTGG
TTATGTTCTGATTGCACTGAATACCGTAGAGAGAATCCCCCTGGAAAACCTGCAAATTATTAGAG
GGAACATGTACTACGAAAACTCATATGCACTGGCCGTCCTGTCTAATTACGATGCCAACAAAACC
GGCCTGAAGGAGCTGCCCATGAGGAATCTCCAGGAAATTCTGCATGGGGCTGTAAGATTCAGCAA
TAACCCGGCTCTCTGTAACGTCGAGAGCATCCAGTGGCGCGACATAGTGAGCTCTGATTTCCTTT
CAAATATGTCCATGGATTTTCAAAACCATCTGGGGTCTTGTCAAAAGTGCGACCCATCTTGTCCC
AATGGTTCCTGTTGGGGCGCCGGAGAGGAGAATTGTCAGAAGCTTACCAAGATTATCTGTGCACA
GCAGTGTTCAGGGAGATGTAGGGGGAAGTCTCCAAGCGATTGCTGTCACAATCAATGCGCCGCCG
GGTGCACCGGCCCCAGAGAGAGTGACTGTTTAGTGTGCCGTAAATTCAGGGATGAGGCTACGTGC
AAAGACACATGCCCTCCACTCATGTTGTACAACCCAACAACTTATCAGATGGACGTGAACCCCGA
AGGGAAGTATAGCTTTGGGGCGACGTGCGTCAAGAAATGTCCACGCAACTATGTCGTTACGGACC
ACGGTTCCTGTGTGCGCGCTTGCGGAGCCGATTCATACGAAATGGAGGAAGACGGCGTCCGCAAG
TGTAAGAAGTGTGAAGGTCCTTGTCGCAAAGTCTGTAACGGTATCGGAATCGGCGAATTCAAGGA
CTCTCTCAGTATTAATGCAACCAATATCAAGCATTTTAAAAATTGTACCTCAATCTCAGGGGACT
TGCACATCCTGCCCGTTGCGTTTAGGGGGGACTCATTTACACACACCCCTCCACTGGATCCACAG
GAATTGGATATTCTGAAGACGGTGAAGGAAATCACCGGGTTCCTCCTGATTCAGGCCTGGCCAGA
GAACCGGACCGACCTTCACGCATTTGAGAACCTTGAAATTATTCGCGGGAGAACGAAACAACATG
GGCAATTCAGTCTCGCTGTCGTCAGCCTTAACATTACCTCCCTCGGATTGAGATCACTGAAGGAG
ATAAGCGACGGGGACGTGATAATCTCTGGGAACAAGAACTTGTGCTACGCCAACACTATTAACTG
GAAGAAGTTGTTCGGTACCTCAGGTCAGAAGACTAAAATCATCTCTAATCGAGGCGAAAATTCTT
GCAAAGCAACAGGGCAGGTGTGCCATGCCCTGTGCTCCCCTGAAGGGTGCTGGGGCCCCGAACCA
AGAGACTGCGTCTCTTGTCGGAATGTTTCCCGGGGGAGGGAATGTGTGGATAAGTGCAACCTGCT
TGAAGGAGAACCGAGAGAGTTTGTGGAAAATTCCGAATGTATCCAATGCCACCCTGAATGTTTAC
CTCAGGCCATGAACATTACATGCACAGGTAGGGGCCCCGACAATTGTATTCAGTGCGCCCATTAC
ATCGACGGACCCCACTGCGTAAAGACGTGCCCTGCCGGCGTTATGGGCGAGAATAATACCCTGGT
GTGGAAGTATGCCGACGCCGGGCATGTTTGCCATCTGTGCCACCCAAATTGTACATATGGCTGCA
CCGGCCCCGGGTTGGAGGGCTGCCCCACTAATGGACCTAAGATCCCCAGCATCGCCACCGGGATG
GTTGGCGCTCTGCTGCTGCTGTTGGTGGTCGCCTTGGGGATAGGGTTGTTCATGCGAAGGAGGCA
TATCGTCAGGAAGCGCACCCTGAGGAGGCTGCTCCAAGAGCGTGAGCTTGTCGAACCCCTGACCC
CCAGCGGCGAGGCCCCAAACCAGGCACTTTTGCGCATTCTGAAAGAGACCGAATTCAAAAAGATC
AAGGTGCTCGGCAGCGGGGCCTTCGGCACTGTGTACAAGGGTCTTTGGATCCCGGAGGGCGAAAA
AGTTAAAATACCCGTGGCAATAAAAGAGCTGAGGGAGGCTACAAGTCCAAAGGCAAACAAAGAAA
TCCTTGACGAAGCATACGTTATGGCATCCGTCGATAATCCACACGTGTGCAGGCTTCTCGGGATT
TGCCTGACGTCAACCGTGCAGCTGATTACGCAACTGATGCCATTTGGCTGTCTGCTCGATTATGT
GCGCGAGCACAAGGACAACATTGGATCTCAATACTTACTCAACTGGTGCGTGCAGATCGCGAAGG
GGATGAACTATCTGGAGGACCGCAGACTCGTCCACAGGGATCTTGCAGCCAGGAACGTACTCGTT
AAGACCCCGCAGCATGTGAAAATTACCGATTTCGGCCTTGCAAAACTGCTGGGTGCCGAAGAGAA
AGAATACCATGCAGAGGGTGGCAAAGTTCCTATCAAATGGATGGCGTTAGAGTCAATTCTGCATC
GGATCTATACCCATCAGAGCGACGTGTGGTCTTACGGTGTGACCGTTTGGGAGCTTATGACTTTT
GGGAGCAAACCGTACGACGGCATCCCGGCAAGCGAAATTTCCTCAATACTGGAAAAGGGGGAACG
TCTGCCCCAACCACCTATCTGCACTATAGACGTTTATATGATAATGGTGAAATGTTGGATGATCG
ACGCCGACAGTCGACCCAAGTTTCGAGAGTTAATCATCGAGTTCTCCAAGATGGCTCGGGATCCC
CAAAGGTACTTAGTCATCCAGGGCGATGAAAGAATGCACTTACCCTCACCCACAGATTCAAACTT
CTATCGAGCTTTGATGGATGAGGAAGACATGGATGATGTGGTAGACGCCGACGAGTACCTGATAC
CACAGCAGGGTTTTTTTTCTTCACCAAGCACATCTCGTACGCCTCTTCTGAGTAGCCTCAGCGCG
ACCTCCAACAACTCCACAGTGGCGTGCATCGACCGCAACGGACTTCAGTCCTGTCCAATTAAAGA
GGATTCCTTCCTGCAGAGGTATAGCAGCGACCCTACCGGAGCCCTTACCGAGGATAGTATTGATG
ATACATTCCTGCCTGTACCCGAATACATTAATCAGTCCGTGCCAAAACGCCCCGCAGGGAGTGTA
CAGAATCCAGTGTACCACAACCAGCCGCTGAACCCCGCACCCAGCCGAGACCCCCACTACCAGGA
CCCACATAGCACGGCCGTGGGAAATCCTGAATACCTGAACACCGTGCAGCCTACATGTGTGAATA
GCACTTTCGATAGCCCCGCACACTGGGCCCAGAAGGGCTCACACCAAATTAGCCTTGATAACCCT
GATTACCAGCAGGACTTCTTCCCCAAGGAGGCAAAGCCAAACGGTATCTTTAAGGGTAGCACGGC
CGAAAACGCAGAGTACTTGAGGGTTGCCCCTCAGTCCAGTGAGTTCATTGGCGCCTAA

3.3. Codon optimization.

I used the codon optimization tool on the Twist Bioscience website. The optimized DNA sequence is attached below.

Codon optimization is necessary because different organisms have different preferences for codons to encode the same amino acid. This can affect the efficiency of protein expression. I chose to optimize the codon sequence for E. coli, because it is a commonly used host organism for protein expression in the lab.

>Egfr
ATGAGACCCAGTGGAACCGCCGGTGCAGCTCTCCTTGCTTTGCTCGCTGCGCTCTGTCCAGCTTC
ACGGGCCCTTGAAGAAAAGAAAGTCTGTCAAGGTACAAGCAATAAACTCACGCAGTTGGGAACTT
TTGAAGATCACTTTCTGTCCCTGCAAAGGATGTTCAATAATTGTGAAGTAGTTCTGGGCAACCTC
GAAATCACATATGTACAGAGAAATTATGATTTATCCTTTCTGAAAACCATCCAAGAGGTAGCCGG
GTACGTCTTGATCGCTTTAAACACGGTTGAACGGATACCACTCGAGAATTTGCAGATAATCCGCG
GCAATATGTATTATGAGAATAGCTACGCCCTCGCGGTGCTCTCAAACTATGACGCGAATAAGACA
GGGTTAAAAGAATTACCAATGAGAAACCTGCAAGAGATACTCCACGGTGCAGTTAGGTTTAGTAA
CAATCCAGCCCTGTGCAATGTGGAATCTATTCAATGGCGAGATATCGTTAGTAGTGACTTTCTGT
CCAACATGAGTATGGACTTCCAGAATCACCTTGGCAGTTGCCAGAAATGTGATCCCAGCTGCCCA
AACGGGAGCTGCTGGGGAGCTGGGGAAGAAAACTGCCAGAAACTCACTAAAATCATATGCGCTCA
ACAATGCTCTGGCAGGTGCAGAGGCAAAAGCCCTTCCGACTGTTGCCATAACCAGTGTGCAGCTG
GATGTACTGGGCCGAGGGAAAGCGATTGCCTTGTCTGTAGAAAGTTTCGGGACGAAGCCACCTGT
AAGGATACTTGTCCACCCCTGATGCTCTATAATCCTACGACCTACCAAATGGATGTTAATCCGGA
AGGAAAATACTCCTTCGGCGCCACCTGTGTGAAGAAGTGCCCGCGGAATTACGTTGTGACAGATC
ATGGGTCTTGCGTCCGAGCCTGTGGTGCAGACTCTTATGAGATGGAAGAGGATGGGGTGAGGAAA
TGCAAGAAATGCGAGGGGCCATGCAGGAAGGTATGCAATGGAATTGGCATAGGTGAGTTTAAAGA
TTCACTGAGCATCAACGCGACAAACATTAAACACTTCAAGAACTGCACGTCCATATCTGGAGATC
TTCATATTCTTCCGGTGGCTTTCCGAGGAGATTCTTTCACCCATACACCACCCTTAGACCCTCAA
GAGCTGGACATATTGAAAACAGTTAAAGAGATTACAGGCTTTCTGCTTATCCAAGCTTGGCCTGA
AAATAGGACGGATCTCCATGCCTTCGAAAATCTGGAGATCATCAGAGGACGCACAAAGCAGCACG
GACAGTTTTCCCTGGCGGTGGTGTCTCTCAATATAACTTCACTTGGCTTACGCAGCCTCAAAGAA
ATTTCCGACGGAGATGTCATCATAAGTGGAAATAAGAATCTCTGTTACGCTAATACCATCAATTG
GAAGAAACTCTTCGGAACATCTGGACAAAAGACAAAGATCATTAGCAACCGCGGGGAGAACAGCT
GTAAGGCTACCGGACAAGTCTGTCACGCACTTTGTTCTCCAGAGGGATGTTGGGGCCCAGAGCCT
CGTGATTGTGTGTCCTGCAGGAACGTCAGCCGCGGCAGAGAGTGCGTTGACAAATGTAATCTCCT
CGAGGGCGAGCCTCGCGAATTCGTTGAGAACAGTGAGTGCATTCAGTGTCATCCAGAGTGCTTGC
CACAAGCTATGAATATCACCTGTACTGGACGCGGACCTGATAACTGCATCCAATGTGCTCACTAT
ATAGATGGGCCACATTGTGTGAAAACTTGTCCAGCTGGTGTAATGGGAGAAAACAACACATTAGT
TTGGAAATACGCAGATGCTGGTCACGTGTGTCACCTTTGTCATCCTAACTGCACCTACGGGTGTA
CTGGTCCAGGCCTCGAAGGTTGTCCGACCAACGGCCCAAAGATTCCTTCAATTGCAACAGGCATG
GTCGGGGCCTTGTTGTTGTTGCTGGTTGTGGCACTCGGTATTGGCCTGTTTATGAGACGGCGGCA
CATTGTGCGTAAAAGAACATTACGTCGCCTCTTACAGGAACGAGAACTGGTAGAGCCGCTCACAC
CTTCTGGGGAAGCACCGAATCAAGCCCTGCTCCGTATATTAAAGGAAACTGAGTTTAAGAAAATT
AAAGTACTGGGATCCGGCGCTTTTGGTACAGTTTATAAAGGGCTGTGGATACCTGAAGGGGAGAA
GGTGAAGATCCCTGTCGCCATCAAGGAATTGCGAGAAGCGACTTCCCCCAAAGCCAATAAAGAGA
TTCTGGATGAGGCCTATGTCATGGCTTCTGTGGACAACCCTCATGTTTGTCGCCTGTTAGGCATC
TGTCTTACTAGCACTGTCCAACTTATCACTCAGTTGATGCCGTTCGGGTGCCTTCTGGACTACGT
TAGAGAACATAAAGATAATATCGGTAGCCAGTATCTCCTGAATTGGTGTGTTCAAATAGCAAAAG
GCATGAATTACCTCGAAGATCGGCGGCTGGTTCATCGCGACCTGGCTGCTCGGAATGTCCTTGTC
AAAACACCCCAACACGTAAAGATAACAGACTTTGGGCTGGCTAAGCTTCTTGGCGCTGAGGAAAA
GGAGTATCACGCTGAAGGCGGAAAGGTCCCCATTAAGTGGATGGCTCTGGAATCCATCTTGCACA
GGATATACACTCACCAATCAGATGTCTGGTCCTATGGGGTAACAGTATGGGAACTGATGACCTTC
GGCTCCAAGCCATATGATGGAATACCTGCGAGTGAGATAAGCTCCATTTTGGAGAAAGGAGAGAG
GTTACCGCAGCCGCCCATATGTACAATTGATGTCTACATGATTATGGTCAAGTGCTGGATGATTG
ATGCAGATAGCCGGCCGAAATTCCGCGAATTGATTATTGAATTTAGCAAAATGGCCCGCGACCCA
CAGCGCTATTTGGTTATTCAAGGGGACGAGAGGATGCATCTGCCAAGCCCAACTGACAGCAACTT
TTACCGCGCCCTTATGGACGAAGAGGATATGGACGACGTTGTGGATGCTGATGAATATTTGATTC
CTCAACAAGGGTTCTTCAGTTCTCCCTCAACTTCCAGAACCCCACTGCTTTCAAGTTTATCCGCA
ACTAGTAATAATAGCACCGTTGCATGTATTGATCGGAATGGGCTGCAAAGCTGCCCCATCAAGGA
AGACTCATTCTTACAACGTTACTCATCTGATCCCACTGGGGCGCTGACTGAAGACTCAATCGACG
ACACCTTTCTTCCAGTCCCGGAGTATATCAACCAAAGTGTCCCCAAGAGACCTGCCGGATCCGTG
CAAAACCCTGTTTATCATAATCAACCTCTCAATCCAGCGCCGTCTAGGGATCCTCATTATCAAGA
TCCCCACTCTACTGCTGTAGGGAACCCAGAGTATCTCAATACAGTTCAACCCACTTGCGTCAACT
CTACCTTTGACAGTCCTGCCCATTGGGCGCAGAAAGGTTCCCATCAGATCTCCCTGGACAATCCA
GACTATCAACAAGATTTCTTTCCTAAAGAAGCCAAACCCAATGGGATATTCAAAGGATCTACCGC
AGAGAATGCGGAATATCTGCGCGTGGCACCCCAAAGCAGCGAATTTATCGGGGCTTAG

3.4. You have a sequence! Now what?

Cell-dependent and cell-free technologies could be used.

Cell-dependent method: First, insert the DNA sequence into a plasmid vector, and then transfer it into a host cell (e.g. E. coli). The host cell will transcribe the DNA into mRNA, which will then be translated into the protein.

Part 4: Prepare a Twist DNA Synthesis Order

I have done everything in Part 3 using the Twist Bioscience website, and followed the tutorial to finish the remaining steps.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

I would want to sequence the DNA of a cancer cell, because we can learn which mutations led to the cancer, and use corresponding drugs (if available) to treat the cancer.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Illumina sequencing.

Q1: Is your method first-, second- or third-generation or other? How so?

Second-generation, because it is based on sequencing by synthesis.

Q2: What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

The input is the DNA extracted from the cancer cell.

Fragmentation: The DNA is fragmented into smaller pieces.

Adapter ligation: Adapters are ligated to the ends of the DNA fragments.

PCR: The DNA fragments are amplified using PCR.

Q3: What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

First, DNA fragments are immobilized on a flow cell and amplified to form clusters. Then, fluorescently labeled nucleotides are added to the flow cell. Each time, a nucleotide is incorporated into the growing DNA strand by DNA polymerase, a fluorescent signal is emitted. The fluorescence is detected to determine which nucleotide was incorporated. Finally, the fluorescence signals are converted into base calls.

Q4: What is the output of your chosen sequencing technology?

Sequence reads with quality scores. Sequence reads can be assembled into a complete genome sequence.

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I would want to synthesize the DNA sequence of mutated EGFR in cancers. By synthesizing the DNA sequence of EGFR and translate it into protein, we can study its function and develop targeted drugs.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use phosphoramidite chemical synthesis for short DNA fragments (e.g. 200bp), and then Gibson assembly for assembling the short fragments into the full-length DNA sequence.

Q1: What are the essential steps of your chosen sequencing methods?

Phosphoramidite chemical synthesis: One DNA base is synthesizaed at a time. Each cycle consists of four steps: deprotection, coupling, capping, and oxidation.

Gibson assembly: DNA fragments with overlapping ends are mixed together. Enzymes (exonuclease, DNA polymerase, and DNA ligase) are added to the mixture and stitch the DNA fragments together.

5.2 DNA Edit

(i) What DNA would you want to edit and why?

I would want to edit the DNA sequence of a gene that is mutated in a genetic disease, such as cystic fibrosis. By editing the DNA sequence to correct the mutation, we can potentially cure the disease.

Q1: How does your technology of choice edit DNA? What are the essential steps? CRISPR-Cas9 is a commonly used technology for DNA editing.

First, a guide RNA (gRNA) is designed to target the specific DNA sequence to be edited. The gRNA is then complexed with the Cas9 protein to form a ribonucleoprotein (RNP) complex which will be delivered into the target cells. The RNP will bind to the target DNA sequence and creates a break. The cell’s repair mechanisms then repair the break according to a repair template.

Q2: What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

Preparation and input: gRNA sequence that corresponds to the edit site (find), repair template (replace).

Q3: What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The main limitation is the off-target effect, where the edits are applied to unintended sites.