Week 2 HW: DNA Read, Write, and Edit

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

I selected PIEZO1, a mechanosensitive ion channel that sits in the cell membrane and opens when the membrane is physically stretched, compressed, or deformed. In effect, it detects membrane tension.

Protein Sequence (2,521 aa)

>PIEZO1_Homo_sapiens | 2521 aa | Mechanosensitive ion channel
MEPHVLGAVLYWLLLPCALLAACLLRFSGLSLVYLLFLLLLPWFPGPTRCGLQGHTGRLL
RALLGLSLLFLVAHLALQICLHIVPRLDQLLGPSCSRWETLSRHIGVTRLDLKDIPNAIR
LVAPDLGILVVSSVCLGICGRLARNTRQSPHPRELDDDERDVDASPTAGLQEAATLAPTR
RSRLAARFRVTAHWLLVAAGRVLAVTLLALAGIAHPSALSSVYLLLFLALCTWWACHFPIS
TRGFSRLCVAVGCFGAGHLICLYCYQMPLAQALLPPAGIWARVLGLKDFVGPTNCSSPHA
LVLNTGLDWPVYASPGVLLLLCYATASLRKLRAYRPSGQRKEAAKGYEARELELAELDQW
PQERESDQHVVPTAPDTEADNCIVHELTGQSSVLRRPVRPKRAEPREASPLHSLGHLIM
DQSYVCALIAMMVWSITYHSWLTFVLLLWACLIWTVRSRHQLAMLCSPCILLYGMTLCCL
RYVWAMDLRPELPTTLGPVSLRQLGLEHTRYPCLDLGAMLLYTLTFWLLLRQFVKEKLLK
WAESPAALTEVTVADTEPTRTQTLLQSLGELVKGVYAKYWIYVCAGMFIVVSFAGRLVVY
KIVYMFLFLLCLTLFQVYYSLWRKLLKAFWWLVVAYTMLVLIAVYTFQFQDFPAYWRNLT
GFTDEQLGDLGLEQFSVSELFSSILVPGFFLLACILQLHYFHRPFMQLTDMEHVSLPGTR
LPRWAHRQDAVSGTPLLREEQQEHQQQQQEEEEEEEDSRDEGLGVATPHQATQVPEGAAK
WGLVAERLLELAAGFSDVLSRVQVFLRRLLELHVFKLVALYTVWVALKEVSVMNLLLVVL
WAFALPYPRFRPMASCLSTVWTCVIIVCKMLYQLKVVNPQEYSSNCTEPFPNSTNLLPTE
ISQSLLYRGPVDPANWFGVRKGFPNLGYIQNHLQVLLLLVFEAIVYRRQEHYRRQHQLA
PLPAQAVFASGTRQQLDQDLLGCLKYFINFFFYKFGLEICFLMAVNVIGQRMNFLVTLHG
CWLVAILTRRHRQAIARLWPNYCLFLALFLLYQYLLCLGMPPALCIDYPWRWSRAVPMNS
ALIKWLYLPDFFRAPNSTNLISDFLLLLCASQQWQVFSAERTEEWQRMAGVNTDRLEPLR
GEPNPVPNFIHCRSYLDMLKVAVFRYLFWLVLVVVFVTGATRISIFGLGYLLACFYLLLF
GTALLQRDTRARLVLWDCLILYNVTVIISKNMLSLLACVFVEQMQTGFCWVIQLFSLVCT
VKGYYDPKEMMDRDQDCLLPVEEAGIIWDSVCFFFLLLQRRVFLSHYYLHVRADLQATAL
LASRGFALYNAANLKSIDFHRRIEEKSLAQLKRQMERIRAKQEKHRQGRVDRSRPQDTLG
PKDPGLEPGPDSPGGSSPPRRQWWRPWLDHATVIHSGDYFLFESDSEEEEEAVPEDPRPS
AQSAFQLAYQAWVTNAQAVLRRRQQEQEQARQEQAGQLPTGGGPSQEVEPAEGPEEAAA
GRSHVVQRVLSTAQFLWMLGQALVDELTRWLQEFTRHHGTMSDVLRAERYLLTQELLQGG
EVHRGVLDQLYTSQAEATLPGPTEAPNAPSTVSSGLGAEEPLSSMTDDMGSPLSTGYHTR
SGSEEAVTDPGEREAGASLYQGLMRTASELLLDRRLRIPELEEAELFAEGQGRALRLLRAV
YQCVAAHSELLCYFIIILNHMVTASAGSLVLPVLVFLWAMLSIPRPSKRFWMTAIVFTE
IAVVVKYLFQFGFFPWNSHVVLRRYENKPYFPPRILGLEKTDGYIKYDLVQLMALFFHRS
QLLCYGLWDHEEDSPSKEHDKSGEEEQGAEEGPGVPAATTEDHIQVEARVGPTDGTPEPQ
VELRPRDTRRISLRFRRRKKEGPARKGAAAIEAEDREEEEGEEEKEAPTGREKRPSRSGGR
VRAAGRRLQGFCLSLAQGTYRPLRRFFHDILHTKYRAATDVYALMFLADVVDFIIIIFGFW
AFGKHSAATDITSSLSDDQVPEAFLVMLLIQFSTMVVDRALYLRKTVLGKLAFQVALVLA
IHLWMFFILPAVTERMFNQNVVAQLWYFVKCIYFALSAYQIRCGYPTRILGNFLTKKYNHL
NLFLFQGFRLVPFLVELRAVMDWVWTDTTLSLSSWMCVEDIYANIFIIKCSRETEKKYPQP
KGQKKKKIVKYGMGGLIILFLIAIIWFPLLFMSLVRSVVGVVNQPIDVTVTLKLGGYEPL
FTMSAQQPSIIPFTAQAYEELSRQFDPQPLAMQFISQYSPEDIVTAQIEGSSGALWRISPP
SRAQMKRELYNGTADITLRFTWNFQRDLAKGGTVEYANEKHMLALAPNSTARRQLASLLE
GTSDQSVVIPNLFPKYIRAPNGPEANPVKQLQPNEEADYLGVRIQLRREQGAGATGFLEW
WVIELQECRTDCNLLPMVIFSDKVSPPSLGFLAGYGIMGLYVSIVLVIGKFVRGFFSEIS
HSIMFEELPCVDRILKLCQDIFLVRETRELELEEELYAKLIFLYRSPETMIKWTREKE

Key features: 38 transmembrane helices/monomer · Trimeric propeller architecture · ~900 kDa functional complex


3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

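Native DNA Sequence (7,566 bp)

Note: the listing below appears to use a single codon for each amino acid (for example, ctg for Leu and gcg for Ala), so it is a reverse translation of the protein rather than the genomic human coding sequence. 2,521 codons correspond to 7,563 nt, so the 7,566 bp figure presumably counts a stop codon, which the listing (ending in gaa, Glu) does not show.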

>PIEZO1_CDS_native | 7566 bp | Homo sapiens
atggaaccgcatgtgctgggcgcggtgctgtattggctgctgctgccgtgcgcgctgctg
gcggcgtgcctgctgcgctttagcggcctgagcctggtgtatctgctgtttctgctgctg
ctgccgtggtttccgggcccgacccgctgcggcctgcagggccataccggccgcctgctg
cgcgcgctgctgggcctgagcctgctgtttctggtggcgcatctggcgctgcagatttgc
ctgcatattgtgccgcgcctggatcagctgctgggcccgagctgcagccgctgggaaacc
ctgagccgccatattggcgtgacccgcctggatctgaaagatattccgaacgcgattcgc
ctggtggcgccggatctgggcattctggtggtgagcagcgtgtgcctgggcatttgcggc
cgcctggcgcgcaacacccgccagagcccgcatccgcgcgaactggatgatgatgaacgc
gatgtggatgcgagcccgaccgcgggcctgcaggaagcggcgaccctggcgccgacccgc
cgcagccgcctggcggcgcgctttcgcgtgaccgcgcattggctgctggtggcggcgggc
cgcgtgctggcggtgaccctgctggcgctggcgggcattgcgcatccgagcgcgctgagc
agcgtgtatctgctgctgtttctggcgctgtgcacctggtgggcgtgccattttccgatt
agcacccgcggctttagccgcctgtgcgtggcggtgggctgctttggcgcgggccatctg
atttgcctgtattgctatcagatgccgctggcgcaggcgctgctgccgccggcgggcatt
tgggcgcgcgtgctgggcctgaaagattttgtgggcccgaccaactgcagcagcccgcat
gcgctggtgctgaacaccggcctggattggccggtgtatgcgagcccgggcgtgctgctg
ctgctgtgctatgcgaccgcgagcctgcgcaaactgcgcgcgtatcgcccgagcggccag
cgcaaagaagcggcgaaaggctatgaagcgcgcgaactggaactggcggaactggatcag
tggccgcaggaacgcgaaagcgatcagcatgtggtgccgaccgcgccggataccgaagcg
gataactgcattgtgcatgaactgaccggccagagcagcgtgctgcgccgcccggtgcgc
ccgaaacgcgcggaaccgcgcgaagcgagcccgctgcatagcctgggccatctgattatg
gatcagagctatgtgtgcgcgctgattgcgatgatggtgtggagcattacctatcatagc
tggctgacctttgtgctgctgctgtgggcgtgcctgatttggaccgtgcgcagccgccat
cagctggcgatgctgtgcagcccgtgcattctgctgtatggcatgaccctgtgctgcctg
cgctatgtgtgggcgatggatctgcgcccggaactgccgaccaccctgggcccggtgagc
ctgcgccagctgggcctggaacatacccgctatccgtgcctggatctgggcgcgatgctg
ctgtataccctgaccttttggctgctgctgcgccagtttgtgaaagaaaaactgctgaaa
tgggcggaaagcccggcggcgctgaccgaagtgaccgtggcggataccgaaccgacccgc
acccagaccctgctgcagagcctgggcgaactggtgaaaggcgtgtatgcgaaatattgg
atttatgtgtgcgcgggcatgtttattgtggtgagctttgcgggccgcctggtggtgtat
aaaattgtgtatatgtttctgtttctgctgtgcctgaccctgtttcaggtgtattatagc
ctgtggcgcaaactgctgaaagcgttttggtggctggtggtggcgtataccatgctggtg
ctgattgcggtgtatacctttcagtttcaggattttccggcgtattggcgcaacctgacc
ggctttaccgatgaacagctgggcgatctgggcctggaacagtttagcgtgagcgaactg
tttagcagcattctggtgccgggcttttttctgctggcgtgcattctgcagctgcattat
tttcatcgcccgtttatgcagctgaccgatatggaacatgtgagcctgccgggcacccgc
ctgccgcgctgggcgcatcgccaggatgcggtgagcggcaccccgctgctgcgcgaagaa
cagcaggaacatcagcagcagcagcaggaagaagaagaagaagaagaagatagccgcgat
gaaggcctgggcgtggcgaccccgcatcaggcgacccaggtgccggaaggcgcggcgaaa
tggggcctggtggcggaacgcctgctggaactggcggcgggctttagcgatgtgctgagc
cgcgtgcaggtgtttctgcgccgcctgctggaactgcatgtgtttaaactggtggcgctg
tataccgtgtgggtggcgctgaaagaagtgagcgtgatgaacctgctgctggtggtgctg
tgggcgtttgcgctgccgtatccgcgctttcgcccgatggcgagctgcctgagcaccgtg
tggacctgcgtgattattgtgtgcaaaatgctgtatcagctgaaagtggtgaacccgcag
gaatatagcagcaactgcaccgaaccgtttccgaacagcaccaacctgctgccgaccgaa
attagccagagcctgctgtatcgcggcccggtggatccggcgaactggtttggcgtgcgc
aaaggctttccgaacctgggctatattcagaaccatctgcaggtgctgctgctgctggtg
tttgaagcgattgtgtatcgccgccaggaacattatcgccgccagcatcagctggcgccg
ctgccggcgcaggcggtgtttgcgagcggcacccgccagcagctggatcaggatctgctg
ggctgcctgaaatattttattaacttttttttttataaatttggcctggaaatttgcttt
ctgatggcggtgaacgtgattggccagcgcatgaactttctggtgaccctgcatggctgc
tggctggtggcgattctgacccgccgccatcgccaggcgattgcgcgcctgtggccgaac
tattgcctgtttctggcgctgtttctgctgtatcagtatctgctgtgcctgggcatgccg
ccggcgctgtgcattgattatccgtggcgctggagccgcgcggtgccgatgaacagcgcg
ctgattaaatggctgtatctgccggatttttttcgcgcgccgaacagcaccaacctgatt
agcgattttctgctgctgctgtgcgcgagccagcagtggcaggtgtttagcgcggaacgc
accgaagaatggcagcgcatggcgggcgtgaacaccgatcgcctggaaccgctgcgcggc
gaaccgaacccggtgccgaactttattcattgccgcagctatctggatatgctgaaagtg
gcggtgtttcgctatctgttttggctggtgctggtggtggtgtttgtgaccggcgcgacc
cgcattagcatttttggcctgggctatctgctggcgtgcttttatctgctgctgtttggc
accgcgctgctgcagcgcgatacccgcgcgcgcctggtgctgtgggattgcctgattctg
tataacgtgaccgtgattattagcaaaaacatgctgagcctgctggcgtgcgtgtttgtg
gaacagatgcagaccggcttttgctgggtgattcagctgtttagcctggtgtgcaccgtg
aaaggctattatgatccgaaagaaatgatggatcgcgatcaggattgcctgctgccggtg
gaagaagcgggcattatttgggatagcgtgtgctttttttttctgctgctgcagcgccgc
gtgtttctgagccattattatctgcatgtgcgcgcggatctgcaggcgaccgcgctgctg
gcgagccgcggctttgcgctgtataacgcggcgaacctgaaaagcattgattttcatcgc
cgcattgaagaaaaaagcctggcgcagctgaaacgccagatggaacgcattcgcgcgaaa
caggaaaaacatcgccagggccgcgtggatcgcagccgcccgcaggataccctgggcccg
aaagatccgggcctggaaccgggcccggatagcccgggcggcagcagcccgccgcgccgc
cagtggtggcgcccgtggctggatcatgcgaccgtgattcatagcggcgattattttctg
tttgaaagcgatagcgaagaagaagaagaagcggtgccggaagatccgcgcccgagcgcg
cagagcgcgtttcagctggcgtatcaggcgtgggtgaccaacgcgcaggcggtgctgcgc
cgccgccagcaggaacaggaacaggcgcgccaggaacaggcgggccagctgccgaccggc
ggcggcccgagccaggaagtggaaccggcggaaggcccggaagaagcggcggcgggccgc
agccatgtggtgcagcgcgtgctgagcaccgcgcagtttctgtggatgctgggccaggcg
ctggtggatgaactgacccgctggctgcaggaatttacccgccatcatggcaccatgagc
gatgtgctgcgcgcggaacgctatctgctgacccaggaactgctgcagggcggcgaagtg
catcgcggcgtgctggatcagctgtataccagccaggcggaagcgaccctgccgggcccg
accgaagcgccgaacgcgccgagcaccgtgagcagcggcctgggcgcggaagaaccgctg
agcagcatgaccgatgatatgggcagcccgctgagcaccggctatcatacccgcagcggc
agcgaagaagcggtgaccgatccgggcgaacgcgaagcgggcgcgagcctgtatcagggc
ctgatgcgcaccgcgagcgaactgctgctggatcgccgcctgcgcattccggaactggaa
gaagcggaactgtttgcggaaggccagggccgcgcgctgcgcctgctgcgcgcggtgtat
cagtgcgtggcggcgcatagcgaactgctgtgctattttattattattctgaaccatatg
gtgaccgcgagcgcgggcagcctggtgctgccggtgctggtgtttctgtgggcgatgctg
agcattccgcgcccgagcaaacgcttttggatgaccgcgattgtgtttaccgaaattgcg
gtggtggtgaaatatctgtttcagtttggcttttttccgtggaacagccatgtggtgctg
cgccgctatgaaaacaaaccgtattttccgccgcgcattctgggcctggaaaaaaccgat
ggctatattaaatatgatctggtgcagctgatggcgctgttttttcatcgcagccagctg
ctgtgctatggcctgtgggatcatgaagaagatagcccgagcaaagaacatgataaaagc
ggcgaagaagaacagggcgcggaagaaggcccgggcgtgccggcggcgaccaccgaagat
catattcaggtggaagcgcgcgtgggcccgaccgatggcaccccggaaccgcaggtggaa
ctgcgcccgcgcgatacccgccgcattagcctgcgctttcgccgccgcaaaaaagaaggc
ccggcgcgcaaaggcgcggcggcgattgaagcggaagatcgcgaagaagaagaaggcgaa
gaagaaaaagaagcgccgaccggccgcgaaaaacgcccgagccgcagcggcggccgcgtg
cgcgcggcgggccgccgcctgcagggcttttgcctgagcctggcgcagggcacctatcgc
ccgctgcgccgcttttttcatgatattctgcataccaaatatcgcgcggcgaccgatgtg
tatgcgctgatgtttctggcggatgtggtggattttattattattatttttggcttttgg
gcgtttggcaaacatagcgcggcgaccgatattaccagcagcctgagcgatgatcaggtg
ccggaagcgtttctggtgatgctgctgattcagtttagcaccatggtggtggatcgcgcg
ctgtatctgcgcaaaaccgtgctgggcaaactggcgtttcaggtggcgctggtgctggcg
attcatctgtggatgttttttattctgccggcggtgaccgaacgcatgtttaaccagaac
gtggtggcgcagctgtggtattttgtgaaatgcatttattttgcgctgagcgcgtatcag
attcgctgcggctatccgacccgcattctgggcaactttctgaccaaaaaatataaccat
ctgaacctgtttctgtttcagggctttcgcctggtgccgtttctggtggaactgcgcgcg
gtgatggattgggtgtggaccgataccaccctgagcctgagcagctggatgtgcgtggaa
gatatttatgcgaacatttttattattaaatgcagccgcgaaaccgaaaaaaaatatccg
cagccgaaaggccagaaaaaaaaaaaaattgtgaaatatggcatgggcggcctgattatt
ctgtttctgattgcgattatttggtttccgctgctgtttatgagcctggtgcgcagcgtg
gtgggcgtggtgaaccagccgattgatgtgaccgtgaccctgaaactgggcggctatgaa
ccgctgtttaccatgagcgcgcagcagccgagcattattccgtttaccgcgcaggcgtat
gaagaactgagccgccagtttgatccgcagccgctggcgatgcagtttattagccagtat
agcccggaagatattgtgaccgcgcagattgaaggcagcagcggcgcgctgtggcgcatt
agcccgccgagccgcgcgcagatgaaacgcgaactgtataacggcaccgcggatattacc
ctgcgctttacctggaactttcagcgcgatctggcgaaaggcggcaccgtggaatatgcg
aacgaaaaacatatgctggcgctggcgccgaacagcaccgcgcgccgccagctggcgagc
ctgctggaaggcaccagcgatcagagcgtggtgattccgaacctgtttccgaaatatatt
cgcgcgccgaacggcccggaagcgaacccggtgaaacagctgcagccgaacgaagaagcg
gattatctgggcgtgcgcattcagctgcgccgcgaacagggcgcgggcgcgaccggcttt
ctggaatggtgggtgattgaactgcaggaatgccgcaccgattgcaacctgctgccgatg
gtgatttttagcgataaagtgagcccgccgagcctgggctttctggcgggctatggcatt
atgggcctgtatgtgagcattgtgctggtgattggcaaatttgtgcgcggcttttttagc
gaaattagccatagcattatgtttgaagaactgccgtgcgtggatcgcattctgaaactg
tgccaggatatttttctggtgcgcgaaacccgcgaactggaactggaagaagaactgtat
gcgaaactgatttttctgtatcgcagcccggaaaccatgattaaatggacccgcgaaaaa
gaa
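
The reverse translation above can be sketched in a few lines of Python. This is a minimal illustration, not the tool that produced the listing: the `PREFERRED_CODON` table and the `reverse_translate` function are hypothetical names, and the one-codon-per-residue choices are an assumption picked to match the codons visible in the listing. A real tool would load a full codon-usage table for the chosen organism.

```python
# Hypothetical one-codon-per-amino-acid table (an assumption for illustration).
# The codons were chosen to match those that appear in the listing above.
PREFERRED_CODON = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}


def reverse_translate(protein: str, add_stop: bool = False) -> str:
    """Map each residue to one fixed codon; optionally append a stop codon."""
    dna = "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    return dna + ("taa" if add_stop else "")


# First 20 residues of the PIEZO1 sequence from section 3.1.
first_20 = "MEPHVLGAVLYWLLLPCALL"
print(reverse_translate(first_20))
```

Because the genetic code is degenerate, this is only one of many DNA sequences that encode the same protein. With `add_stop=True`, a 2,521-residue protein yields 2,521 × 3 + 3 = 7,566 nt.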

3.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

E. coli Codon-Optimized DNA (7,566 bp)

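The sequence is optimized for expression in E. coli C43(DE3), a strain often used for overexpressing membrane proteins. Organisms use synonymous codons at different frequencies, and codons that are rare in the host are read by low-abundance tRNAs, which can stall ribosomes and reduce yield. Rare codons (AGG/AGA for Arg, CTA for Leu, ATA for Ile) are therefore replaced with E. coli-preferred synonymous codons.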
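
The rare-codon replacement described above can be illustrated with a short Python sketch. It only swaps the four rare codons named in the text for synonymous codons. The `RARE_TO_PREFERRED` mapping and the `replace_rare_codons` function are hypothetical names for illustration, and real optimization tools typically also weigh codon frequencies, GC content, and mRNA structure, which this sketch does not attempt.

```python
# Swaps for the rare codons named in the text; each target is synonymous
# with its source codon. The specific targets are an illustrative assumption.
RARE_TO_PREFERRED = {
    "AGG": "CGT",  # Arg
    "AGA": "CGC",  # Arg
    "CTA": "CTG",  # Leu
    "ATA": "ATT",  # Ile
}


def replace_rare_codons(cds: str) -> str:
    """Replace rare codons in-frame, leaving all other codons unchanged."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA notation
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(RARE_TO_PREFERRED.get(codon, codon) for codon in codons)


# Toy input: ATG AGG CTA ATA AGA TAA
print(replace_rare_codons("ATGAGGCTAATAAGATAA"))  # ATGCGTCTGATTCGCTAA
```

Splitting the sequence into codons first matters: a plain string replace could match "AGG" across a codon boundary and change the encoded protein.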

>PIEZO1_Ecoli_optimized | 7566 bp | Codons spaced for readability
ATG GAA CCG CAT GTT TTG GGG GCG GTG CTC TAT TGG CTG CTC TTA CCG TGC
GCG TTA TTG GCC GCT TGT CTT CTG CGC TTT AGC GGC CTG TCT CTC GTG TAC
CTG CTT TTT CTG CTG CTG CTT CCG TGG TTC CCG GGC CCT ACG CGT TGT GGT
TTG CAA GGT CAT ACG GGT CGC TTA TTG CGC GCG CTG CTT GGC CTG TCC TTA
TTA TTT CTT GTG GCC CAT TTA GCC CTG CAA ATT TGT CTG CAT ATC GTT CCG
CGC CTG GAT CAG TTG CTG GGC CCG TCC TGC TCA CGC TGG GAG ACA TTG AGC
CGC CAT ATT GGG GTC ACG CGT TTA GAT CTC AAA GAT ATT CCT AAC GCT ATC
CGT TTG GTG GCG CCA GAC TTA GGT ATT CTG GTG GTG TCG AGC GTT TGT CTG
GGT ATT TGC GGT CGT CTG GCA CGT AAC ACG CGG CAG TCA CCT CAT CCG CGT
GAG CTC GAT GAT GAT GAG CGC GAT GTG GAT GCG AGT CCT ACC GCC GGC CTC
CAG GAG GCT GCG ACG CTC GCC CCG ACA CGC CGC TCG CGC CTG GCC GCA CGC
TTT CGC GTT ACG GCC CAT TGG CTG CTC GTA GCA GCA GGT CGT GTC CTG GCA
GTG ACG CTC CTG GCC CTT GCC GGG ATT GCG CAC CCG TCA GCG CTG AGC AGC
GTG TAC CTG TTA CTG TTC CTG GCG CTT TGC ACC TGG TGG GCC TGC CAT TTT
CCG ATC AGC ACA CGT GGC TTC TCC CGC CTG TGC GTG GCT GTA GGC TGT TTT
GGC GCA GGG CAT CTT ATT TGT CTT TAT TGC TAT CAG ATG CCT CTG GCT CAG
GCT TTG CTG CCG CCA GCA GGC ATC TGG GCC CGC GTG CTG GGT CTT AAA GAC
TTT GTT GGT CCG ACC AAC TGC TCA AGC CCT CAT GCC CTG GTG TTA AAT ACC
GGT TTA GAT TGG CCG GTG TAT GCA AGT CCG GGT GTT CTC CTG CTC CTT TGT
TAC GCC ACC GCA TCC TTG CGC AAA CTC CGC GCC TAT CGT CCG TCC GGG CAG
CGT AAA GAA GCG GCG AAA GGC TAC GAA GCA CGC GAA TTA GAA TTG GCT GAG
CTG GAT CAA TGG CCG CAG GAA CGT GAG AGC GAT CAG CAC GTT GTG CCG ACA
GCG CCG GAT ACC GAA GCG GAT AAC TGT ATC GTA CAC GAA CTG ACT GGT CAG
TCC AGT GTG TTA CGT CGC CCG GTT CGC CCG AAG CGG GCA GAA CCG CGG GAA
GCT TCC CCG CTC CAT AGC TTG GGC CAT CTG ATC ATG GAT CAG TCT TAT GTA
TGC GCA CTG ATC GCG ATG ATG GTA TGG TCT ATC ACC TAC CAC TCT TGG CTT
ACT TTT GTG CTT TTG CTG TGG GCC TGT CTG ATC TGG ACC GTT CGC TCG CGC
CAT CAG TTA GCC ATG CTG TGC TCA CCG TGC ATC CTT CTG TAT GGC ATG ACC
TTA TGC TGC CTT CGC TAT GTA TGG GCG ATG GAT CTT CGT CCG GAG CTC CCA
ACG ACG CTG GGC CCG GTG AGT CTG CGC CAG TTG GGT TTA GAA CAC ACG CGC
TAC CCG TGC CTG GAT TTG GGG GCG ATG CTG TTG TAT ACG CTG ACA TTT TGG
TTA TTG TTG CGG CAG TTC GTT AAG GAG AAA CTG CTC AAA TGG GCG GAA TCT
CCG GCA GCC TTG ACC GAG GTG ACC GTC GCG GAT ACA GAG CCG ACG CGT ACA
CAG ACC CTG CTG CAG TCG TTG GGC GAA TTG GTG AAA GGG GTG TAT GCC AAG
TAC TGG ATC TAT GTT TGT GCG GGT ATG TTT ATC GTA GTG TCC TTC GCC GGG
CGT CTG GTG GTG TAT AAA ATT GTT TAT ATG TTT CTG TTC CTG CTT TGC CTG
ACT TTA TTC CAG GTC TAC TAT TCA CTT TGG CGT AAA TTG CTC AAG GCC TTT
TGG TGG CTT GTC GTT GCG TAT ACC ATG TTG GTC CTG ATC GCC GTG TAT ACC
TTT CAG TTT CAG GAT TTC CCG GCC TAT TGG CGT AAT CTG ACC GGT TTC ACC
GAT GAA CAG CTG GGT GAC CTG GGT CTG GAG CAA TTT TCC GTT AGC GAA CTG
TTC AGC AGT ATC CTC GTG CCG GGT TTT TTT TTA CTC GCG TGT ATT CTG CAG
CTC CAT TAC TTT CAT CGT CCG TTC ATG CAA TTA ACA GAC ATG GAA CAT GTA
AGC TTG CCG GGT ACG CGC CTG CCT CGC TGG GCC CAC CGG CAG GAT GCC GTC
TCA GGC ACA CCG TTG CTG CGT GAA GAA CAG CAG GAA CAC CAG CAG CAG CAA
CAA GAG GAG GAA GAA GAA GAA GAA GAT TCT CGC GAT GAA GGC CTT GGT GTC
GCC ACC CCT CAC CAG GCA ACC CAA GTC CCG GAG GGG GCC GCC AAA TGG GGT
CTG GTT GCC GAG CGG TTG CTT GAA TTG GCA GCA GGC TTT AGT GAC GTG CTC
TCG CGT GTC CAA GTT TTT CTT CGT CGT CTG TTA GAA CTG CAC GTG TTT AAG
TTA GTA GCG TTA TAT ACG GTA TGG GTC GCG TTG AAA GAG GTC TCT GTT ATG
AAT CTG CTG TTG GTT GTG TTG TGG GCG TTT GCG CTG CCG TAT CCA CGC TTT
CGG CCG ATG GCG TCA TGT CTT TCG ACA GTG TGG ACC TGT GTT ATC ATC GTG
TGT AAA ATG CTG TAT CAG TTG AAA GTG GTT AAT CCG CAA GAG TAT AGT TCC
AAC TGT ACG GAA CCG TTT CCG AAC TCG ACC AAT CTG CTC CCG ACC GAG ATC
TCT CAG TCT CTC CTG TAT CGT GGG CCA GTG GAC CCG GCG AAC TGG TTT GGT
GTG CGC AAA GGC TTT CCG AAT TTG GGC TAC ATT CAG AAC CAC CTG CAA GTC
CTC CTG CTG CTG GTG TTT GAA GCG ATT GTG TAT CGC CGT CAA GAA CAT TAT
CGT CGT CAA CAT CAG TTG GCG CCT CTG CCT GCG CAG GCT GTT TTC GCA TCC
GGT ACG CGT CAA CAA CTG GAT CAG GAC CTG CTG GGT TGC CTG AAA TAT TTT
ATC AAT TTT TTT TTT TAT AAA TTC GGC CTG GAA ATT TGT TTT TTG ATG GCG
GTT AAT GTA ATC GGT CAA CGC ATG AAC TTT TTA GTT ACT CTG CAC GGT TGC
TGG CTC GTG GCG ATT CTT ACC CGC CGT CAT CGC CAG GCG ATC GCC CGT CTG
TGG CCG AAT TAT TGC TTA TTC CTT GCT CTG TTT CTG CTG TAT CAG TAT CTC
CTG TGC CTG GGC ATG CCG CCG GCG TTG TGC ATT GAT TAT CCT TGG CGG TGG
AGC CGT GCC GTA CCG ATG AAC AGC GCG CTT ATT AAG TGG CTG TAC TTA CCT
GAT TTC TTC CGT GCA CCG AAT TCG ACG AAC TTG ATC TCC GAT TTC CTG TTA
CTG TTG TGC GCG TCG CAA CAG TGG CAG GTG TTC TCG GCG GAA CGC ACA GAG
GAG TGG CAG CGC ATG GCC GGT GTA AAT ACC GAT CGC CTG GAA CCG CTC CGT
GGC GAA CCG AAT CCG GTG CCG AAT TTT ATT CAT TGT CGC AGT TAT TTA GAC
ATG TTG AAA GTT GCA GTA TTC CGC TAC CTG TTC TGG CTG GTA CTC GTT GTT
GTA TTC GTT ACT GGC GCG ACT CGG ATT AGT ATT TTC GGC TTA GGC TAT CTG
TTA GCC TGT TTT TAT CTG CTG CTT TTC GGT ACC GCA CTG CTG CAG CGC GAC
ACG CGT GCG CGC CTG GTT CTG TGG GAT TGC CTC ATT CTC TAT AAC GTG ACT
GTG ATT ATC AGT AAA AAC ATG CTT AGT TTG CTG GCG TGC GTT TTC GTT GAA
CAG ATG CAG ACC GGT TTT TGC TGG GTA ATC CAA TTA TTC TCA TTA GTG TGC
ACT GTG AAA GGC TAT TAC GAT CCG AAA GAA ATG ATG GAT CGG GAT CAG GAT
TGT TTG CTC CCG GTG GAA GAA GCA GGT ATT ATC TGG GAT TCT GTC TGT TTT
TTT TTC CTT TTA CTG CAG CGT CGC GTT TTC CTG TCC CAC TAC TAT CTG CAC
GTT CGG GCT GAT CTG CAG GCA ACC GCC CTT CTG GCC TCG CGG GGG TTT GCC
TTA TAT AAC GCC GCC AAT CTG AAA TCC ATT GAT TTC CAC CGT CGC ATT GAA
GAA AAG TCT CTG GCT CAA CTG AAA CGT CAG ATG GAA CGC ATT CGT GCC AAA
CAG GAG AAA CAT CGT CAA GGC CGC GTT GAT CGG AGT CGG CCG CAG GAT ACA
TTG GGC CCA AAG GAT CCA GGG CTG GAA CCG GGT CCG GAC TCG CCG GGC GGT
TCG TCC CCG CCG CGT CGT CAG TGG TGG CGG CCA TGG CTC GAT CAC GCT ACC
GTT ATC CAT AGT GGC GAT TAT TTT TTA TTT GAG TCC GAT TCG GAA GAA GAA
GAA GAA GCA GTT CCG GAG GAT CCG CGC CCT AGT GCA CAG AGC GCG TTT CAA
CTT GCG TAT CAG GCG TGG GTG ACC AAT GCA CAA GCC GTT TTG CGC CGC CGC
CAG CAG GAA CAG GAA CAG GCG CGC CAA GAA CAA GCA GGT CAA CTG CCT ACG
GGC GGC GGC CCG TCA CAA GAA GTT GAA CCT GCC GAA GGT CCG GAG GAA GCT
GCG GCC GGG CGC AGC CAT GTG GTG CAG CGC GTT CTT AGC ACC GCG CAG TTT
CTG TGG ATG CTG GGC CAA GCC CTG GTA GAT GAA TTG ACA CGC TGG TTG CAA
GAA TTT ACG CGT CAT CAC GGC ACC ATG TCC GAC GTG CTG CGC GCC GAG CGT
TAC TTG CTG ACG CAG GAG CTG TTG CAA GGG GGC GAA GTA CAC CGT GGC GTA
CTG GAC CAG CTC TAC ACA TCG CAA GCA GAG GCG ACG CTT CCT GGC CCA ACC
GAG GCC CCG AAC GCG CCA AGC ACC GTC TCT AGC GGC CTG GGC GCG GAA GAA
CCT TTA TCC TCC ATG ACA GAC GAT ATG GGG TCA CCG CTG AGC ACC GGT TAC
CAT ACC CGT TCG GGG TCT GAA GAG GCA GTT ACG GAC CCG GGT GAA CGC GAA
GCT GGT GCC TCT CTC TAT CAG GGG CTG ATG CGC ACC GCT TCA GAG CTG CTG
CTG GAT CGC CGC CTG CGC ATC CCT GAA CTG GAA GAA GCC GAA TTA TTT GCA
GAA GGC CAG GGT CGT GCC TTG CGC CTG TTA CGT GCA GTA TAT CAG TGC GTC
GCG GCA CAT AGC GAA CTG CTG TGT TAC TTT ATC ATT ATC CTG AAT CAT ATG
GTG ACC GCG TCT GCA GGT AGT CTG GTA CTG CCG GTT CTG GTA TTC TTA TGG
GCC ATG CTT TCC ATC CCG CGT CCA AGT AAA CGG TTC TGG ATG ACG GCG ATT
GTG TTT ACC GAA ATT GCT GTA GTG GTA AAA TAT TTA TTT CAA TTT GGC TTC
TTC CCA TGG AAT TCC CAC GTG GTG CTG CGG CGC TAT GAG AAT AAA CCG TAC
TTC CCT CCG CGC ATT TTG GGC TTA GAA AAA ACC GAT GGC TAT ATC AAA TAC
GAT TTA GTG CAG CTG ATG GCG TTA TTT TTT CAT CGC AGT CAG CTG TTA TGT
TAT GGT CTG TGG GAT CAT GAA GAG GAC TCT CCT AGC AAG GAA CAC GAT AAA
TCG GGT GAA GAA GAA CAG GGT GCC GAA GAA GGC CCT GGT GTG CCT GCA GCT
ACC ACT GAG GAT CAC ATT CAG GTG GAA GCG CGC GTT GGC CCA ACC GAT GGT
ACA CCG GAA CCG CAG GTG GAG TTA CGT CCG CGC GAT ACG CGT CGC ATT TCA
CTG CGT TTC CGT CGC CGT AAA AAA GAA GGC CCA GCG CGG AAG GGT GCT GCG
GCG ATC GAG GCA GAG GAC CGT GAG GAG GAG GAA GGG GAG GAA GAA AAA GAA
GCG CCA ACG GGC CGT GAG AAA CGT CCG TCG CGG TCT GGT GGC CGC GTT CGC
GCA GCT GGC CGT CGC CTG CAG GGG TTT TGC CTG TCA CTG GCG CAG GGT ACC
TAT CGC CCG CTC CGT CGC TTT TTC CAC GAT ATT CTG CAC ACG AAA TAT CGT
GCC GCG ACA GAT GTG TAT GCG CTG ATG TTT TTA GCT GAT GTG GTG GAT TTT
ATT ATT ATT ATC TTT GGG TTT TGG GCA TTC GGG AAG CAC TCT GCA GCA ACT
GAT ATT ACC TCT AGT TTA AGT GAT GAT CAG GTC CCG GAA GCG TTC CTG GTG
ATG CTG TTG ATT CAG TTT TCG ACG ATG GTT GTG GAT CGT GCT CTG TAT CTG
CGT AAG ACT GTC CTG GGT AAA TTG GCA TTT CAA GTG GCC TTA GTA TTG GCC
ATC CAT CTG TGG ATG TTC TTT ATT TTA CCG GCG GTG ACT GAA CGT ATG TTT
AAT CAG AAT GTT GTG GCC CAG TTA TGG TAT TTT GTG AAA TGT ATT TAC TTC
GCG TTA AGC GCG TAC CAA ATC CGG TGT GGT TAT CCG ACA CGT ATT CTG GGC
AAT TTC TTG ACT AAA AAA TAT AAC CAC CTT AAT CTG TTC CTG TTC CAA GGC
TTC CGC CTC GTT CCG TTT CTG GTG GAG TTA CGC GCA GTT ATG GAT TGG GTA
TGG ACA GAT ACT ACG CTG TCA CTC TCC TCG TGG ATG TGC GTG GAA GAT ATT
TAT GCT AAT ATT TTC ATC ATT AAA TGC TCG CGC GAA ACC GAG AAA AAG TAC
CCG CAA CCG AAA GGG CAA AAG AAA AAA AAA ATC GTG AAG TAT GGC ATG GGT
GGG TTA ATC ATT CTG TTC CTG ATT GCC ATC ATT TGG TTT CCG CTG TTG TTT
ATG TCA CTG GTG CGC TCG GTG GTG GGC GTG GTC AAT CAG CCG ATT GAT GTG
ACC GTG ACT TTG AAA TTA GGT GGC TAT GAA CCA TTG TTC ACG ATG AGT GCG
CAG CAA CCG AGT ATT ATT CCG TTT ACT GCG CAG GCG TAT GAA GAG CTG TCT
CGC CAG TTT GAT CCG CAA CCA CTG GCT ATG CAG TTT ATT TCC CAA TAT TCC
CCA GAG GAC ATC GTA ACT GCC CAG ATC GAG GGC AGC AGC GGC GCG CTG TGG
CGT ATT TCT CCT CCG AGT CGC GCC CAA ATG AAA CGC GAA CTG TAT AAT GGC
ACT GCC GAT ATC ACT CTT CGC TTC ACA TGG AAC TTT CAG CGG GAT CTG GCG
AAA GGC GGG ACC GTG GAA TAT GCG AAC GAG AAA CAT ATG TTG GCG CTG GCG
CCG AAC AGT ACC GCG CGT CGG CAA TTG GCC TCC TTG TTA GAG GGG ACC AGC
GAC CAA AGC GTA GTT ATC CCA AAC CTG TTT CCT AAA TAC ATT CGT GCG CCG
AAT GGT CCA GAG GCC AAC CCA GTC AAA CAA TTG CAA CCG AAT GAG GAG GCG
GAC TAT CTG GGC GTA CGT ATC CAA CTG CGT CGC GAA CAG GGT GCC GGC GCC
ACC GGC TTT CTG GAA TGG TGG GTA ATT GAA CTG CAG GAA TGC CGT ACG GAT
TGT AAT CTG CTC CCG ATG GTA ATT TTT TCG GAC AAA GTG AGC CCG CCG TCG
TTA GGT TTC TTA GCT GGT TAT GGC ATC ATG GGT TTG TAT GTT AGC ATC GTG
CTG GTC ATC GGG AAA TTT GTG CGC GGG TTT TTC AGC GAG ATT AGC CAT AGC
ATC ATG TTC GAG GAA CTT CCG TGT GTG GAT CGC ATC CTG AAG CTG TGC CAG
GAT ATC TTC TTA GTT CGC GAG ACC CGT GAA CTG GAA CTT GAA GAG GAA CTG
TAT GCC AAG CTG ATT TTC CTC TAC CGC TCC CCA GAA ACG ATG ATC AAA TGG
ACC CGT GAA AAA GAA

Key differences from human-optimized version: Arginine codons AGG/AGA → CGT/CGC (abundant E. coli tRNAs) · Leucine CTA → CTG/CTT · Isoleucine ATA → ATT · Lower GC content (~52% vs ~69% in human-optimized)
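
GC percentages like the ones quoted above can be checked with a few lines of Python. This is a minimal sketch; `gc_content` is a hypothetical helper name, and it ignores spaces and line breaks so that codon-spaced listings can be pasted in directly.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # skips spaces, newlines
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


# First line of the E. coli-optimized listing (17 codons).
first_line = "ATG GAA CCG CAT GTT TTG GGG GCG GTG CTC TAT TGG CTG CTC TTA CCG TGC"
print(f"{gc_content(first_line):.1f}% GC")
```

Running the function on each full sequence gives a direct way to confirm the GC values reported for the different versions.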


Quick Comparison

| Property | Protein | Native DNA | E. coli-Optimized DNA |
|---|---|---|---|
| Length | 2,521 aa | 7,566 bp | 7,566 bp |
| GC content | - | ~58% | ~52% |
| Target host | - | H. sapiens | E. coli C43(DE3) |
| Rare codons | - | None (native) | Eliminated |
| Encoded protein | PIEZO1 | Identical | Identical |

Note: Both DNA sequences encode the exact same protein. Only the synonymous codon choices differ, optimized for the translational machinery of the target host organism.
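
The claim that both DNA sequences encode the same protein can be checked by translating each one and comparing the results. The sketch below builds the standard genetic code table and translates the first 20 codons of each listing. `CODON_TABLE` and `translate` are hypothetical names for illustration; a library such as Biopython could be used instead.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# First 20 codons of each listing in this document.
native_start = "atggaaccgcatgtgctgggcgcggtgctgtattggctgctgctgccgtgcgcgctgctg"
ecoli_start = ("ATG GAA CCG CAT GTT TTG GGG GCG GTG CTC TAT TGG CTG CTC "
               "TTA CCG TGC GCG TTA TTG")

print(translate(native_start))
print(translate(native_start) == translate(ecoli_start))
```

Applying `translate` to the two full sequences and comparing the output against the protein from section 3.1 extends the same check to all 2,521 residues.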