Week 2 HW: DNA read, write and edit

Part 1: Benchling & In-silico Gel Art

Pattern

Part 3: DNA Design Challenge

3.1. Choose your protein

My primary research project is related to Cystic Fibrosis (CF), so choosing the CFTR protein lets me connect the concepts from this class directly to my own work. Understanding the protein’s structure, its function as a chloride channel, and how mutations disrupt that function is foundational to my project.

Protein sequence from NCBI (reference transcript: NM_000492.4):

MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLHPAIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQVALLMGLIWELLQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYCWEEAMEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILRKIFTTISFCIVLRMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEVVMENVTAFWEEGFGELFEKAKQNNNNRKTSNGDDSLFFSNFSLLGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMVIMGELEPSEGKIKHSGRISFCSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQRARISLARAVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILHEGSSYFYGTFSELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTETKKQSFKQTGEFGEKRKNSILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVPDSEQGEAILPRISVISTGPTLQARRRQSVLNLMTHSVNQGQNIHRKTTASTRKVSLAPQANLTELDIYSRRLSQETGLEISEEINEEDLKECFFDDMESIPAVTTWNTYLRYITVHKSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHSRNNSYAVIITSTSSYYVFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTLKAGGILNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLRAYFLQTSQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTANWFLYLSTLRWFQMRIEMIFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWAVNSSIDVDSLMRSVSRVFKFIDMPTEGKPTKSTKPYKNGQLSKVMIIENSHVKKDDIWPSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRLLNTEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSDQEIWKVADEVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDPVTYQIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSLFRQAISPSDRVKLFPHRNSSKCKSKPQIAALKEETEEEVQDTRL

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence

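Reverse translation of the CFTR protein sequence (reference transcript: NM_000492.4, Homo sapiens CF transmembrane conductance regulator (CFTR), mRNA). This is a back-translated coding sequence, so its codons differ from those of the native human mRNA: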

ATGCAGCGCAGCCCGCTGGAAAAAGCGAGCGTGGTGAGCAAACTGTTTTTTAGCTGGACCCGCCCGATTCTGCGCAAAGGCTATCGCCAGCGCCTGGAACTGAGCGATATTTATCAGATTCCGAGCGTGGATAGCGCGGATAACCTGAGCGAAAAACTGGAACGCGAATGGGATCGCGAACTGGCGAGCAAAAAAAACCCGAAACTGATTAACGCGCTGCGCCGCTGCTTTTTTTGGCGCTTTATGTTTTATGGCATTTTTCTGTATCTGGGCGAAGTGACCAAAGCGGTGCAGCCGCTGCTGCTGGGCCGCATTATTGCGAGCTATGATCCGGATAACAAAGAAGAACGCAGCATTGCGATTTATCTGGGCATTGGCCTGTGCCTGCTGTTTATTGTGCGCACCCTGCTGCTGCATCCGGCGATTTTTGGCCTGCATCATATTGGCATGCAGATGCGCATTGCGATGTTTAGCCTGATTTATAAAAAAACCCTGAAACTGAGCAGCCGCGTGCTGGATAAAATTAGCATTGGCCAGCTGGTGAGCCTGCTGAGCAACAACCTGAACAAATTTGATGAAGGCCTGGCGCTGGCGCATTTTGTGTGGATTGCGCCGCTGCAGGTGGCGCTGCTGATGGGCCTGATTTGGGAACTGCTGCAGGCGAGCGCGTTTTGCGGCCTGGGCTTTCTGATTGTGCTGGCGCTGTTTCAGGCGGGCCTGGGCCGCATGATGATGAAATATCGCGATCAGCGCGCGGGCAAAATTAGCGAACGCCTGGTGATTACCAGCGAAATGATTGAAAACATTCAGAGCGTGAAAGCGTATTGCTGGGAAGAAGCGATGGAAAAAATGATTGAAAACCTGCGCCAGACCGAACTGAAACTGACCCGCAAAGCGGCGTATGTGCGCTATTTTAACAGCAGCGCGTTTTTTTTTAGCGGCTTTTTTGTGGTGTTTCTGAGCGTGCTGCCGTATGCGCTGATTAAAGGCATTATTCTGCGCAAAATTTTTACCACCATTAGCTTTTGCATTGTGCTGCGCATGGCGGTGACCCGCCAGTTTCCGTGGGCGGTGCAGACCTGGTATGATAGCCTGGGCGCGATTAACAAAATTCAGGATTTTCTGCAGAAACAGGAATATAAAACCCTGGAATATAACCTGACCACCACCGAAGTGGTGATGGAAAACGTGACCGCGTTTTGGGAAGAAGGCTTTGGCGAACTGTTTGAAAAAGCGAAACAGAACAACAACAACCGCAAAACCAGCAACGGCGATGATAGCCTGTTTTTTAGCAACTTTAGCCTGCTGGGCACCCCGGTGCTGAAAGATATTAACTTTAAAATTGAACGCGGCCAGCTGCTGGCGGTGGCGGGCAGCACCGGCGCGGGCAAAACCAGCCTGCTGATGGTGATTATGGGCGAACTGGAACCGAGCGAAGGCAAAATTAAACATAGCGGCCGCATTAGCTTTTGCAGCCAGTTTAGCTGGATTATGCCGGGCACCATTAAAGAAAACATTATTTTTGGCGTGAGCTATGATGAATATCGCTATCGCAGCGTGATTAAAGCGTGCCAGCTGGAAGAAGATATTAGCAAATTTGCGGAAAAAGATAACATTGTGCTGGGCGAAGGCGGCATTACCCTGAGCGGCGGCCAGCGCGCGCGCATTAGCCTGGCGCGCGCGGTGTATAAAGATGCGGATCTGTATCTGCTGGATAGCCCGTTTGGCTATCTGGATGTGCTGACCGAAAAAGAAATTTTTGAAAGCTGCGTGTGCAAACTGATGGCGAACAAAACCCGCATTCTGGTGACCAGCAAAATGGAACATCTGAAAAAAGCGGATAAAATTCTGATTCTGCATGAAGGCAGCAGCTATTTTTATGGCACCTTTAGCGAACTGCAGAACCTGCAGCCGGATTTTAGCAGCAAACTGATGGGCTGCGATAGCTTTGATCAGTTTAGCGCGGAACGCCGCAACAGCATTCTGACCGAAACCCTGCA
TCGCTTTAGCCTGGAAGGCGATGCGCCGGTGAGCTGGACCGAAACCAAAAAACAGAGCTTTAAACAGACCGGCGAATTTGGCGAAAAACGCAAAAACAGCATTCTGAACCCGATTAACAGCATTCGCAAATTTAGCATTGTGCAGAAAACCCCGCTGCAGATGAACGGCATTGAAGAAGATAGCGATGAACCGCTGGAACGCCGCCTGAGCCTGGTGCCGGATAGCGAACAGGGCGAAGCGATTCTGCCGCGCATTAGCGTGATTAGCACCGGCCCGACCCTGCAGGCGCGCCGCCGCCAGAGCGTGCTGAACCTGATGACCCATAGCGTGAACCAGGGCCAGAACATTCATCGCAAAACCACCGCGAGCACCCGCAAAGTGAGCCTGGCGCCGCAGGCGAACCTGACCGAACTGGATATTTATAGCCGCCGCCTGAGCCAGGAAACCGGCCTGGAAATTAGCGAAGAAATTAACGAAGAAGATCTGAAAGAATGCTTTTTTGATGATATGGAAAGCATTCCGGCGGTGACCACCTGGAACACCTATCTGCGCTATATTACCGTGCATAAAAGCCTGATTTTTGTGCTGATTTGGTGCCTGGTGATTTTTCTGGCGGAAGTGGCGGCGAGCCTGGTGGTGCTGTGGCTGCTGGGCAACACCCCGCTGCAGGATAAAGGCAACAGCACCCATAGCCGCAACAACAGCTATGCGGTGATTATTACCAGCACCAGCAGCTATTATGTGTTTTATATTTATGTGGGCGTGGCGGATACCCTGCTGGCGATGGGCTTTTTTCGCGGCCTGCCGCTGGTGCATACCCTGATTACCGTGAGCAAAATTCTGCATCATAAAATGCTGCATAGCGTGCTGCAGGCGCCGATGAGCACCCTGAACACCCTGAAAGCGGGCGGCATTCTGAACCGCTTTAGCAAAGATATTGCGATTCTGGATGATCTGCTGCCGCTGACCATTTTTGATTTTATTCAGCTGCTGCTGATTGTGATTGGCGCGATTGCGGTGGTGGCGGTGCTGCAGCCGTATATTTTTGTGGCGACCGTGCCGGTGATTGTGGCGTTTATTATGCTGCGCGCGTATTTTCTGCAGACCAGCCAGCAGCTGAAACAGCTGGAAAGCGAAGGCCGCAGCCCGATTTTTACCCATCTGGTGACCAGCCTGAAAGGCCTGTGGACCCTGCGCGCGTTTGGCCGCCAGCCGTATTTTGAAACCCTGTTTCATAAAGCGCTGAACCTGCATACCGCGAACTGGTTTCTGTATCTGAGCACCCTGCGCTGGTTTCAGATGCGCATTGAAATGATTTTTGTGATTTTTTTTATTGCGGTGACCTTTATTAGCATTCTGACCACCGGCGAAGGCGAAGGCCGCGTGGGCATTATTCTGACCCTGGCGATGAACATTATGAGCACCCTGCAGTGGGCGGTGAACAGCAGCATTGATGTGGATAGCCTGATGCGCAGCGTGAGCCGCGTGTTTAAATTTATTGATATGCCGACCGAAGGCAAACCGACCAAAAGCACCAAACCGTATAAAAACGGCCAGCTGAGCAAAGTGATGATTATTGAAAACAGCCATGTGAAAAAAGATGATATTTGGCCGAGCGGCGGCCAGATGACCGTGAAAGATCTGACCGCGAAATATACCGAAGGCGGCAACGCGATTCTGGAAAACATTAGCTTTAGCATTAGCCCGGGCCAGCGCGTGGGCCTGCTGGGCCGCACCGGCAGCGGCAAAAGCACCCTGCTGAGCGCGTTTCTGCGCCTGCTGAACACCGAAGGCGAAATTCAGATTGATGGCGTGAGCTGGGATAGCATTACCCTGCAGCAGTGGCGCAAAGCGTTTGGCGTGATTCCGCAGAAAGTGTTTATTTTTAGCGGCACCTTTCGCAAAAACCTGGATCCGTATGAACAGTGGAGCGATCAGGAAATTTGGAAAGTGGCGGATGAAGTGGGCCTGCGCAGCGTGATTGAACAGTTTCCGGGCA
AACTGGATTTTGTGCTGGTGGATGGCGGCTGCGTGCTGAGCCATGGCCATAAACAGCTGATGTGCCTGGCGCGCAGCGTGCTGAGCAAAGCGAAAATTCTGCTGCTGGATGAACCGAGCGCGCATCTGGATCCGGTGACCTATCAGATTATTCGCCGCACCCTGAAACAGGCGTTTGCGGATTGCACCGTGATTCTGTGCGAACATCGCATTGAAGCGATGCTGGAATGCCAGCAGTTTCTGGTGATTGAAGAAAACAAAGTGCGCCAGTATGATAGCATTCAGAAACTGCTGAACGAACGCAGCCTGTTTCGCCAGGCGATTAGCCCGAGCGATCGCGTGAAACTGTTTCCGCATCGCAACAGCAGCAAATGCAAAAGCAAACCGCAGATTGCGGCGCTGAAAGAAGAAACCGAAGAAGAAGTGCAGGATACCCGCCTG
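As a rough illustration of what a reverse-translation tool does, the sketch below maps each amino acid to a single codon. The codon choices in `PREFERRED_CODON` are an assumption inferred from the sequence above (one fixed codon per residue); they are not taken from any particular tool's documentation, and the function name `reverse_translate` is hypothetical.

```python
# Hypothetical one-codon-per-residue table, inferred from the back-translated
# sequence above. A real tool would use a full codon-usage table for the host.
PREFERRED_CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGC",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}


def reverse_translate(protein: str) -> str:
    """Return one possible DNA coding sequence for a protein string."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


if __name__ == "__main__":
    # First ten residues of the CFTR protein sequence shown in 3.1.
    print(reverse_translate("MQRSPLEKAS"))
    # ATGCAGCGCAGCCCGCTGGAAAAAGCGAGC
```

Because most amino acids have several synonymous codons, this is only one of many valid back-translations, which is what makes the codon optimization in the next step possible.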

3.3. Codon optimization

Codon optimization was performed with the NovoPro Codon Optimization Tool (ExpOptimizer). For the purposes of this homework exercise, I chose to codon optimize the CFTR sequence for expression in E. coli K12, excluding BamHI, EcoRI and PstI recognition sites. This decision was primarily practical, as I wanted to clearly visualize how the nucleotide sequence changes through optimization. When I performed a BLAST alignment comparing the original CFTR sequence to the E. coli-optimized version, I observed a clear difference, with approximately 83% identity at the nucleotide level. This provided a tangible demonstration of how codon optimization works in practice.
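The identity figure can be approximated without BLAST, because two coding sequences for the same protein have the same length and can be compared position by position. The sketch below is a simplification under that assumption (BLAST reports identity over a local alignment, so its number may differ slightly); the function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def changed_codons(seq_a: str, seq_b: str) -> int:
    """Count codon positions where the two coding sequences differ."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("need equal-length sequences made of whole codons")
    a, b = seq_a.upper(), seq_b.upper()
    return sum(a[i:i + 3] != b[i:i + 3] for i in range(0, len(a), 3))


if __name__ == "__main__":
    # First four codons of the two sequences in this section.
    before = "ATGCAGCGCAGC"
    after = "ATGCAGCGTAGT"
    print(f"{percent_identity(before, after):.1f}% identity")
    print(changed_codons(before, after), "of 4 codons changed")
```

Running both functions on the full sequences would show how many codons the optimizer actually touched, which is a more direct measure of the optimization than nucleotide identity alone.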

I also attempted the same optimization and BLAST comparison for Homo sapiens and Saccharomyces cerevisiae hosts. Interestingly, in both cases BLAST indicated no significant differences between the original and optimized sequences. For the human host this makes sense biologically: since the original CFTR sequence is already human, optimizing it for human expression should result in minimal changes. For yeast, one possible explanation is that its codon bias overlaps enough with the original sequence for the tool to leave most codons unchanged, although I have not verified this.

It is important to emphasize, however, that in actual experimental practice, particularly for a functional study of CFTR mutations, codon optimization should target a human host, with expression in human bronchial epithelial cells such as the CFBE cell line. These cells provide the native cellular environment, complete with the proper trafficking machinery, post-translational modification systems, and lipid membrane composition required for CFTR to function as a chloride channel. While E. coli works well for this homework demonstration, it would not be suitable for genuine functional studies of this protein.

image opt

ATGCAGCGTAGTCCGTTAGAAAAGGCGTCTGTCGTATCAAAATTGTTTTTTAGTTGGACTCGCCCCATTCTGCGTAAAGGTTATCGTCAACGCTTAGAACTCAGCGACATTTACCAGATCCCGAGTGTAGATTCGGCGGATAATCTCAGCGAAAAGTTGGAACGTGAGTGGGATCGCGAATTGGCATCCAAGAAAAATCCGAAACTGATTAATGCGTTACGCCGCTGTTTCTTTTGGCGTTTTATGTTTTATGGTATCTTTCTCTATTTGGGCGAAGTTACTAAAGCTGTGCAGCCATTGCTGCTCGGACGTATTATTGCGAGCTATGATCCGGACAATAAAGAAGAACGCTCAATTGCGATTTACCTGGGCATTGGCCTGTGTTTACTGTTTATTGTTCGTACCCTGCTGTTACACCCGGCGATTTTTGGACTCCATCACATTGGCATGCAGATGCGCATTGCCATGTTTTCCCTGATCTATAAGAAAACGCTGAAATTGTCCAGTCGCGTTCTGGACAAAATTTCCATCGGACAGCTGGTGAGCCTCCTGTCAAATAACCTGAACAAATTTGACGAGGGTCTCGCCCTGGCACATTTCGTATGGATTGCGCCGCTGCAAGTGGCGTTACTGATGGGCCTGATCTGGGAACTGCTGCAGGCTTCAGCATTTTGCGGCCTGGGCTTTCTGATTGTGCTGGCGCTTTTTCAGGCAGGACTGGGACGCATGATGATGAAATACCGCGACCAACGTGCGGGAAAAATCAGCGAACGTTTAGTCATTACCAGCGAAATGATCGAAAACATTCAATCAGTAAAAGCGTACTGCTGGGAAGAAGCCATGGAGAAAATGATTGAAAACCTTCGCCAGACCGAACTGAAACTGACGCGCAAAGCGGCGTACGTGCGCTATTTCAACAGCAGTGCCTTTTTTTTTAGCGGATTTTTTGTCGTCTTTCTGAGCGTGCTGCCGTATGCACTGATTAAGGGGATTATTCTGCGCAAAATTTTCACGACAATTTCCTTCTGTATTGTACTCCGTATGGCTGTTACCCGTCAGTTCCCGTGGGCGGTGCAGACGTGGTACGACTCGTTGGGGGCTATTAATAAAATTCAAGACTTCCTGCAGAAGCAGGAGTATAAAACGTTAGAGTATAACCTCACCACGACGGAAGTCGTTATGGAAAACGTAACCGCATTCTGGGAAGAAGGTTTTGGCGAACTTTTTGAAAAGGCCAAACAGAATAACAATAACCGCAAAACGAGTAATGGAGATGATAGCCTGTTTTTTAGCAACTTCTCACTGCTGGGCACCCCGGTGCTTAAAGATATTAACTTCAAGATCGAGCGTGGCCAACTGCTGGCCGTGGCGGGAAGCACAGGCGCCGGAAAGACCTCGTTGCTTATGGTGATTATGGGCGAATTAGAACCGTCAGAGGGCAAAATTAAGCATTCCGGCCGTATTTCGTTTTGTAGTCAGTTTAGCTGGATTATGCCCGGCACCATCAAAGAGAACATCATTTTCGGGGTGTCGTACGATGAATACCGTTATCGTTCAGTTATCAAAGCGTGTCAACTGGAAGAAGATATCTCTAAGTTTGCTGAAAAAGACAACATTGTGCTGGGGGAAGGCGGCATCACCTTGTCAGGCGGTCAGCGTGCACGCATTAGTCTGGCGCGTGCTGTGTATAAAGACGCGGATTTATATCTCCTTGATAGTCCCTTCGGATATCTGGATGTCTTGACCGAAAAAGAAATTTTTGAAAGCTGCGTGTGCAAACTGATGGCCAACAAAACTCGCATTCTCGTAACCTCAAAAATGGAACACTTGAAAAAAGCCGACAAGATTCTGATTCTGCACGAAGGTTCAAGCTATTTTTACGGTACCTTTTCTGAGTTACAGAATCTGCAGCCCGACTTTTCGTCCAAATTAATGGGCTGCGACAGCTTTGACCAGTTTAGCGCCGAACGCCGCAATTCAATCCTTACGGAAACATTACA
TCGCTTCTCTCTGGAAGGCGATGCCCCAGTAAGCTGGACAGAAACGAAAAAACAGAGCTTCAAACAAACCGGCGAATTCGGCGAAAAACGTAAGAACTCAATCCTTAATCCCATTAATTCCATTCGTAAGTTCAGCATCGTGCAGAAGACGCCGCTCCAGATGAATGGCATTGAGGAGGATTCGGATGAACCGCTTGAGCGCCGTCTGTCATTAGTGCCGGATTCGGAGCAAGGTGAAGCAATTTTACCCCGTATTTCAGTGATTTCTACCGGGCCGACGCTCCAGGCACGCCGTCGCCAGAGCGTGCTCAACCTTATGACTCATAGCGTGAATCAAGGACAAAATATTCATCGTAAAACGACAGCGAGCACACGCAAAGTGAGCTTGGCGCCTCAAGCAAATCTTACCGAACTGGACATCTATTCGCGCCGCCTCTCCCAGGAGACTGGCCTCGAAATTTCCGAGGAGATCAATGAAGAAGACCTGAAAGAATGCTTTTTTGACGATATGGAAAGCATCCCGGCGGTCACTACGTGGAATACGTATCTCCGCTATATTACGGTGCATAAAAGTCTGATCTTCGTATTGATCTGGTGTTTAGTGATCTTTCTGGCCGAAGTGGCCGCGTCCCTTGTGGTGCTGTGGCTGCTGGGTAACACCCCTTTGCAGGATAAGGGCAATAGCACTCATTCCCGCAATAACTCGTATGCCGTCATCATTACCTCCACCTCGAGTTATTACGTCTTTTATATCTACGTGGGCGTCGCTGATACCTTATTGGCAATGGGGTTCTTCCGCGGTCTGCCGTTAGTGCATACACTGATCACCGTCTCGAAAATTCTGCATCATAAAATGCTGCATAGTGTGCTCCAGGCACCGATGTCGACCCTGAACACACTGAAAGCAGGCGGCATTTTAAACCGCTTTTCTAAAGACATTGCCATTCTGGATGACCTGCTTCCCCTGACTATTTTTGATTTTATTCAGTTGCTGCTCATTGTAATTGGCGCTATTGCTGTGGTGGCGGTTCTGCAACCGTATATTTTTGTCGCGACCGTGCCGGTCATTGTCGCTTTCATTATGCTGCGCGCCTACTTTCTGCAGACGAGCCAACAGCTTAAACAGCTCGAATCTGAAGGACGTTCACCTATCTTTACTCACCTGGTTACGTCGCTGAAAGGCCTGTGGACGCTGCGTGCATTCGGCCGTCAGCCGTACTTCGAAACCCTGTTCCATAAAGCACTGAACCTGCATACCGCGAACTGGTTTCTGTATCTGTCGACTCTGCGCTGGTTTCAAATGCGTATTGAGATGATTTTCGTAATTTTTTTTATTGCCGTGACTTTCATCAGTATCTTGACCACGGGCGAGGGCGAAGGTCGTGTGGGTATTATCCTGACCTTGGCTATGAACATCATGAGTACACTGCAGTGGGCGGTGAATAGCAGCATCGATGTGGATTCTTTGATGCGCAGCGTGTCCCGCGTTTTTAAATTCATTGACATGCCTACCGAAGGTAAGCCCACGAAGAGTACAAAACCCTACAAAAACGGCCAACTGTCAAAGGTTATGATTATTGAAAATTCGCACGTGAAAAAAGACGACATTTGGCCGAGCGGCGGTCAAATGACAGTGAAGGATCTGACGGCGAAATACACAGAAGGAGGCAATGCCATTCTGGAAAACATTTCTTTCTCCATCAGCCCGGGCCAGCGCGTCGGGCTCCTGGGCCGTACGGGTAGCGGCAAATCCACTCTGCTTTCAGCATTTTTGCGCCTCTTAAACACGGAAGGAGAAATTCAGATTGATGGCGTCTCGTGGGATAGCATCACACTGCAACAGTGGCGTAAGGCATTCGGCGTCATTCCGCAGAAGGTGTTCATTTTTTCGGGCACCTTTCGCAAGAACCTGGACCCTTATGAACAGTGGAGCGACCAGGAGATTTGGAAGGTAGCGGACGAAGTGGGCCTGCGTTCGGTTATTGAACAGTTCCCCGGTA
AATTGGACTTCGTGCTGGTCGATGGTGGGTGCGTACTCTCTCATGGGCACAAACAACTTATGTGCCTGGCGCGTAGTGTGCTGAGCAAAGCCAAGATTCTCCTGCTGGACGAACCGTCAGCACATCTCGATCCTGTCACCTATCAGATCATTCGCCGCACCCTCAAACAGGCGTTCGCGGATTGCACGGTGATTCTGTGCGAACATCGCATCGAAGCGATGCTTGAGTGTCAGCAGTTTCTCGTGATCGAAGAAAACAAAGTCCGTCAATATGATAGCATCCAGAAGTTGCTGAATGAACGTTCATTATTTCGCCAGGCGATCAGCCCGAGCGACCGTGTGAAACTGTTCCCTCATCGTAATAGCTCGAAGTGTAAATCCAAACCTCAGATTGCTGCACTGAAGGAGGAAACGGAAGAAGAAGTGCAGGATACTCGTCTG
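Two quick checks are worth running on an optimized sequence like the one above: that the excluded restriction sites are really absent, and that the sequence still translates to the same protein. The sketch below is one way to do this in plain Python; the function names are hypothetical, the recognition sites are the standard ones for BamHI, EcoRI and PstI, and the codon table is the standard genetic code.

```python
from itertools import product

# Recognition sites of the three enzymes excluded during optimization.
# All three are palindromic, so scanning one strand is sufficient.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "PstI": "CTGCAG"}

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_sites(dna: str, sites: dict = SITES) -> dict:
    """Return 1-based start positions of each recognition site in dna."""
    dna = dna.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start + 1)
            start = dna.find(motif, start + 1)  # allow overlapping hits
        hits[name] = positions
    return hits


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    original = "ATGCAGCGCAGC"   # first four codons before optimization
    optimized = "ATGCAGCGTAGT"  # first four codons after optimization
    print(translate(original) == translate(optimized))  # same protein?
    print(find_sites(optimized))
```

Pasting the full optimized sequence into `find_sites` gives a direct confirmation of whether the exclusion settings took effect, and comparing `translate` output against the protein in 3.1 confirms the changes are all synonymous.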

ncbi img

3.4. You have a sequence! Now what?

Once I have my codon-optimized CFTR gene sequence designed for E. coli, several technologies exist to produce the actual protein. These methods fall into two main categories: cell-dependent expression systems and cell-free systems. The most common method involves inserting my optimized CFTR gene into a plasmid vector. This plasmid is designed with regulatory elements including a promoter to recruit RNA polymerase, a ribosome binding site to help recruit bacterial ribosomes for translation, and a selectable marker such as an antibiotic resistance gene to ensure only bacteria containing my plasmid survive. The process begins with transformation, where I introduce the plasmid into competent E. coli cells through heat shock or electroporation. I then culture the bacteria in liquid media until they reach optimal density, at which point I add an inducing chemical like IPTG to activate the promoter and trigger transcription of my CFTR gene. Once induced, the bacterial machinery transcribes DNA into mRNA, which ribosomes then translate into CFTR protein. For a challenging membrane protein like CFTR, special considerations are needed. I might use engineered E. coli strains that optimize membrane protein folding or grow cultures at lower temperatures to prevent aggregation.

Cell-free protein synthesis offers a compelling alternative, particularly for difficult-to-express proteins like CFTR. These methods use cellular extracts containing all the necessary transcription and translation machinery, but without living cells. To use this approach, I prepare an extract from E. coli, which contains ribosomes, tRNAs, and the other translation factors, and supplement it with amino acids and energy sources. I then add my purified plasmid DNA directly to this mixture, and transcription and translation occur in a test tube over several hours, after which the protein can be purified directly from the reaction mixture. This approach offers several advantages for CFTR production. Because it is an open system, I can add detergents or lipids during synthesis to help membrane proteins fold properly. I can also screen many mutant variants quickly without transforming cells each time. If CFTR proves toxic to E. coli, as overexpressed membrane proteins often are, cell-free synthesis bypasses this entirely. Additionally, I can easily incorporate modified amino acids for structural studies.

Part 4: Prepare a Twist DNA Synthesis Order

img annotations

img construct

https://benchling.com/s/seq-ceNcs06iT2dZog3TP27n?m=slm-XZIDkTXrI6R5WwFYIFl8

img twist order

img plasmid

https://benchling.com/s/seq-mc2yPDRoSr0FiL2RCqAB?m=slm-tjBqRHIwYT01GDHSirkw

Part 5: DNA Read/Write/Edit

5.1 DNA Read

If I could choose any DNA to sequence, I would want to sequence the CFTR gene from individuals diagnosed with or suspected of having Cystic Fibrosis. This choice is directly connected to my main project and interest in CF research. Sequencing the CFTR gene from patient samples would allow me to identify which specific mutation or mutations an individual carries. This is critically important because more than 2,000 different variants have been reported in the CFTR gene, and disease-causing mutations can affect the protein in different ways. Some mutations, like F508del, cause the protein to misfold and never reach the cell surface. Others, like G551D, allow the protein to reach the surface but prevent the channel from opening properly. Still other mutations result in premature stop codons or splicing errors that produce no functional protein at all.

By sequencing each patient’s CFTR gene, I could determine their specific genotype, which directly informs their prognosis and guides treatment decisions. This is particularly relevant today because of the development of mutation-specific modulator therapies. These drugs are not effective for every mutation, which is why knowing the exact DNA sequence is essential for personalized medicine. Beyond individual patient care, sequencing CFTR genes from large populations would improve our understanding of how different variants correlate with disease severity, how new variants arise, and which variants are specific to local populations, and could identify rare mutations that might respond to existing or future therapies. For someone working on CF, having this sequence information is the foundation upon which everything else is built.

For sequencing the CFTR gene, I would employ a combination approach using whole exome sequencing followed by targeted CFTR gene sequencing, both performed on the Illumina platform. This dual strategy is particularly valuable for clinical scenarios involving suspected Cystic Fibrosis or other disorders with overlapping symptoms. Illumina sequencing is classified as second-generation sequencing, also called next-generation sequencing or NGS. Unlike first-generation Sanger sequencing, which reads one DNA fragment at a time, Illumina’s massively parallel approach sequences millions of fragments simultaneously, making it highly efficient for clinical diagnostics. I have selected Illumina because it offers very low per-base error rates, making it ideal for detecting disease-causing mutations where precision is paramount. Its reads are short, but because each position is covered by many independent reads, variant calls reach the confidence needed for clinical decision-making. This is particularly important for CFTR sequencing, as knowing the exact mutation determines which targeted therapies a patient may receive.

The input for this sequencing approach would be genomic DNA isolated from a patient blood sample. Library preparation begins with DNA isolation and quality assessment, where I extract genomic DNA from the patient sample and verify its purity and integrity using UV spectrophotometry and fluorometric assays. The DNA is then fragmented into small pieces suitable for Illumina sequencing, and Illumina-specific adapters are ligated to both ends of the fragments; these adapters are complementary to the oligos on the flow cell and enable cluster generation. PCR amplification is then performed to increase the quantity of the library, which is especially important when working with limited starting material. Next, the library is enriched for the regions of interest: for whole exome sequencing, hybridization-based probes capture all exonic regions, while for targeted CFTR sequencing, a custom enrichment panel specifically captures the CFTR gene regions, including both exons and introns. Finally, the prepared libraries are quantified to ensure optimal loading concentration onto the sequencer.

The essential steps of Illumina sequencing technology begin with cluster generation through bridge amplification. The library is loaded onto a flow cell where fragments hybridize to complementary oligos on the surface. Through bridge amplification, each fragment forms a clonal cluster of identical molecules, which amplifies the fluorescent signal for detection. The core technology that decodes the bases is called sequencing by synthesis. This process uses fluorescently labeled nucleotides with reversible terminators. In each cycle, DNA polymerase adds a single complementary nucleotide to each growing strand. Unincorporated nucleotides are washed away, and a laser excites the fluorescent labels. A camera captures images of the flow cell, recording which base was incorporated in each cluster based on the emission wavelength. The fluorescent dye and terminator are then cleaved, allowing the next cycle to begin. This cyclic reversible termination process repeats for the desired number of cycles to achieve a specific read length. The instrument’s base-calling software converts the fluorescence intensities into base calls with associated quality scores, and downstream analysis such as alignment and variant calling is handled by Illumina’s DRAGEN Bio-IT Platform or similar software.

The output of this sequencing approach includes several file types and reports. Raw FASTQ files contain the sequence reads and quality scores for each base call. Aligned BAM files show the reads aligned to the human reference genome, allowing visualization of the CFTR region. Variant call format files, or VCF files, list all identified variants, including single nucleotide variants and small insertions and deletions. For CFTR specifically, this includes critical variants that affect splicing. Finally, a clinical report provides a curated interpretation of identified variants, classifying them as pathogenic, likely pathogenic, variants of uncertain significance, likely benign, or benign based on established guidelines. This comprehensive approach would enable accurate diagnosis of CFTR-related disorders, guide mutation-specific therapy decisions, and potentially identify novel variants for further study.
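To make the FASTQ quality scores concrete, the sketch below parses a four-line FASTQ record and converts each quality character into a Phred score and an error probability. It assumes the Phred+33 encoding used by current Illumina output; the read name and bases in the example are made up for illustration, and the function names are hypothetical.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def parse_fastq(text: str):
    """Yield (read_id, sequence, phred_scores) for each 4-line record."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) % 4:
        raise ValueError("FASTQ records must have exactly four lines each")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, phred_scores(qual)


if __name__ == "__main__":
    record = "@read1\nGATTACA\n+\nIIIII5!\n"  # made-up example read
    for read_id, seq, quals in parse_fastq(record):
        for base, q in zip(seq, quals):
            print(read_id, base, q, error_probability(q))
```

A score of 40 corresponds to a 1 in 10,000 chance of a wrong call and a score of 20 to 1 in 100, which is why variant callers weight high-quality bases more heavily.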

5.2 DNA Write

For DNA synthesis, I would like to synthesize the DNA template for an mRNA construct encoding a fully functional, wild-type CFTR protein, designed for use as an mRNA replacement therapy. This approach would operate independently of current modulator therapies and could benefit patients regardless of their specific variant type, including those with nonsense or frameshift mutations that do not respond to existing drugs. The concept involves delivering synthetic CFTR mRNA to airway epithelial cells, where it would be translated directly into functional CFTR protein by the cell’s own ribosomes. This bypasses the need to correct the patient’s mutated endogenous gene. However, this strategy might also require co-delivery of RNA interference molecules such as siRNA or shRNA to silence the expression of the endogenous mutated CFTR transcript. This would prevent potential dominant-negative interactions or production of toxic truncated protein fragments from the patient’s own gene.

I would use the silicon-based high-throughput DNA synthesis platform developed by Twist Bioscience. This technology is widely used for commercial gene synthesis and is well suited for producing the approximately 4.4 kb CFTR coding sequence I require (1,480 codons plus a stop codon).

Twist’s synthesis method is classified as second-generation high-throughput DNA synthesis and relies on phosphoramidite chemistry miniaturized and parallelized on a silicon microarray chip. Unlike conventional column-based synthesizers that produce one sequence at a time, Twist’s platform synthesizes thousands to millions of oligonucleotides simultaneously on a single silicon wafer. I have chosen this technology because it offers the optimal balance of throughput, accuracy, and cost-effectiveness for my CFTR project. Additionally, all synthesized products are sequence-verified by next-generation sequencing, ensuring the accuracy critical for downstream therapeutic applications.

The essential steps begin with sequence design and optimization, where my codon-optimized CFTR coding sequence is analyzed for complexity factors like high GC content and repeats. A silicon wafer is then functionalized with a chemical linker to enable oligonucleotide attachment. The core process is parallel oligonucleotide synthesis using phosphoramidite chemistry, where each synthesis cycle involves four reactions: deblocking, coupling, capping, and oxidation. These cycles run in parallel across thousands of features on the chip, and the stepwise coupling efficiency of each cycle determines how much full-length oligonucleotide is obtained. Once synthesis is complete, the oligonucleotides are chemically cleaved and eluted from the silicon surface. For my full-length CFTR gene, these oligonucleotides undergo amplification and assembly through PCR-based methods and Gibson Assembly to join fragments into the complete 4.4 kb coding sequence. Finally, rigorous quality control and sequence verification using next-generation sequencing confirms accuracy before delivery.
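The effect of stepwise coupling efficiency can be illustrated with a small calculation: if every coupling succeeds with the same probability, the fraction of strands that reach full length is that probability raised to the number of couplings. The efficiencies and oligo length below are illustrative assumptions, not Twist specifications, and the function name is hypothetical.

```python
def full_length_fraction(coupling_efficiency: float, couplings: int) -> float:
    """Fraction of strands that complete every coupling step."""
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling efficiency must be between 0 and 1")
    if couplings < 0:
        raise ValueError("number of couplings cannot be negative")
    return coupling_efficiency ** couplings


if __name__ == "__main__":
    # Assumed values for illustration only.
    for efficiency in (0.99, 0.995, 0.999):
        fraction = full_length_fraction(efficiency, couplings=200)
        print(f"efficiency {efficiency}: {fraction:.1%} full-length after 200 couplings")
```

The steep drop in yield with length is the reason long genes are built by assembling many shorter oligonucleotides rather than synthesizing them in one piece.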

Regarding limitations, speed remains a factor with standard turnaround of approximately 10 business days. Accuracy is excellent at the per-base level, but cumulative error rates for sequences of several thousand base pairs range from 0.3% to 1.4%, which is why NGS verification is essential. For scalability, the platform excels at producing many sequences in parallel but has inherent limits on individual sequence length. Twist’s Gene Fragments are offered up to 5 kb, which accommodates my CFTR coding sequence but would require additional assembly steps for larger constructs.
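A short calculation shows why sequence verification matters for a gene of this size. Assuming errors are independent and occur at a constant per-base rate, the chance that a molecule of length L is entirely correct is (1 - e)^L. The per-base error rates below are hypothetical placeholders, not measured Twist values; only the CFTR length (1,480 amino acids) comes from the sequence in 3.1.

```python
def cds_length_bp(protein_length: int, include_stop: bool = True) -> int:
    """Length of a coding sequence in base pairs for a given protein length."""
    return 3 * protein_length + (3 if include_stop else 0)


def error_free_fraction(per_base_error: float, length_bp: int) -> float:
    """Probability that every base is correct, assuming independent errors."""
    return (1.0 - per_base_error) ** length_bp


def expected_errors(per_base_error: float, length_bp: int) -> float:
    """Average number of errors per molecule."""
    return per_base_error * length_bp


if __name__ == "__main__":
    length = cds_length_bp(1480)  # CFTR: 1,480 residues plus a stop codon
    print("CDS length:", length, "bp; fits a 5 kb limit:", length <= 5000)
    for rate in (1 / 1000, 1 / 3000, 1 / 10000):  # hypothetical error rates
        print(f"rate {rate:.5f}: "
              f"{error_free_fraction(rate, length):.1%} error-free, "
              f"{expected_errors(rate, length):.2f} expected errors")
```

Even a low per-base error rate leaves a meaningful share of molecules with at least one error at this length, so picking and verifying a correct clone is part of the workflow rather than an optional extra.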

5.3 DNA Edit

For DNA editing, I would want to edit the CFTR gene directly in patient-derived cells to permanently correct disease-causing mutations at the source. Unlike my previously described mRNA replacement therapy, which provides a temporary workaround, gene editing offers the potential for a one-time, permanent cure by repairing the endogenous gene and restoring normal CFTR expression under its native regulatory control. This approach would be particularly valuable because it maintains the cell’s natural mechanisms for regulating CFTR expression, rather than relying on exogenous delivery.

The technology I would use to perform these edits is CRISPR-Cas9 for initial exploration, but for therapeutic application I would preferentially employ base editing or prime editing, depending on the specific mutation. These represent second-generation CRISPR technologies that offer greater precision and safety for clinical applications. CRISPR-Cas9 is classified as a programmable nuclease system derived from bacterial adaptive immune systems. The essential steps begin with designing a single guide RNA (sgRNA) complementary to the target region adjacent to a protospacer adjacent motif (PAM) sequence. The sgRNA directs Cas9 to the target site, where it makes a double-strand break that the cell then repairs, either imprecisely or by copying a supplied donor template.

To overcome the main limitation of CRISPR-Cas9, the errors introduced when the double-strand break is repaired, I would use base editing, a more precise technology that does not require this type of break. Base editors fuse a catalytically impaired Cas protein (nickase Cas9) to a deaminase enzyme. For example, to correct G551D, a point mutation in which glycine is replaced by aspartate, I would use an adenine base editor (ABE), which converts an A•T base pair to G•C. The essential steps involve designing an sgRNA that positions the Cas9 nickase so that the target base falls within the editing window, typically around positions 4-8 of the protospacer counted from the PAM-distal end. The deaminase chemically converts the target adenine to inosine, which is read as guanine during replication.
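The guide-placement constraint can be sketched as a small search: for a target adenine, list every 20-nt protospacer on the given strand that is followed by an NGG PAM and places the target within the editing window. This is a minimal sketch under stated assumptions (SpCas9-style NGG PAM, a window of positions 4-8 counted from the PAM-distal end, one strand only); the example sequence is a made-up toy, not the CFTR locus, and the function name is hypothetical.

```python
def candidate_guides(seq: str, target_idx: int, window=(4, 8), guide_len: int = 20):
    """List protospacers that put seq[target_idx] (an A) inside the editing window.

    Positions are counted from 1 at the PAM-distal (5') end of the protospacer.
    Only the strand given in seq is searched; pass the reverse complement to
    search the other strand.
    """
    seq = seq.upper()
    if seq[target_idx] != "A":
        raise ValueError("an adenine base editor needs an A at the target position")
    guides = []
    for position in range(window[0], window[1] + 1):
        start = target_idx - (position - 1)  # 0-based protospacer start
        pam_start = start + guide_len
        if start < 0 or pam_start + 3 > len(seq):
            continue
        pam = seq[pam_start:pam_start + 3]
        if pam[1:] != "GG":  # require an NGG PAM
            continue
        protospacer = seq[start:pam_start]
        in_window = protospacer[window[0] - 1:window[1]]
        guides.append({
            "protospacer": protospacer,
            "pam": pam,
            "position": position,
            # other adenines in the window that might also be edited
            "bystander_As": in_window.count("A") - 1,
        })
    return guides


if __name__ == "__main__":
    # Toy sequence (hypothetical): target A at index 10, TGG PAM downstream.
    toy = "CCCCC" + "CTCTCACTCTCTCTCTCTCT" + "TGG" + "CCC"
    for guide in candidate_guides(toy, target_idx=10):
        print(guide)
```

Counting other adenines in the window is a simple way to flag possible bystander edits; a real design would also score each guide for off-target sites elsewhere in the genome.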

In terms of preparation, the input components would include a base editor mRNA or protein and the corresponding sgRNA. These components would be delivered to patient-derived bronchial epithelial cells or induced pluripotent stem cells (iPSCs) for ex vivo editing, with the goal of eventually moving to in vivo delivery via lipid nanoparticles or AAV vectors optimized for lung targeting.

The limitations of these methods must be carefully considered. Base editing offers higher efficiency than repair-based Cas9 editing and avoids double-strand breaks, but it is limited to transition mutations (A→G or C→T) and cannot correct insertions or deletions such as F508del, for which prime editing would be the better fit. Off-target mutations, both elsewhere in the genome and at neighboring bases within the editing window, would also need to be assessed.