Homework

Weekly homework submissions:

Subsections of Homework

Week 1 HW: Principles and Practices


Biological Engineering Application: A Functional Expression Platform for Classifying Variants of Uncertain Significance (VUS) in Cystic Fibrosis

1. My proposal is to develop a standardized, accessible platform for the functional expression and characterization of variants in the CFTR gene, specifically focusing on Variants of Uncertain Significance (VUS) identified in underrepresented populations in Ecuador.

The core application integrates three components: rapid synthesis and cloning of CFTR gene constructs containing locally identified VUS; expression of these constructs in relevant, standardized human cell lines (e.g., immortalized bronchial epithelial cells); and a multiparametric, high-content functional assay suite.

This suite would automatically measure key parameters: subcellular localization via microscopy, protein processing and maturation via western blot, ion channel function via fluorescence-based halide assays, and response to CFTR modulator drugs.

The motivation for this tool is to address a critical equity gap in genomic medicine. Global genetic databases like ClinVar are heavily biased toward populations of European descent. Consequently, variants identified in mestizo, Indigenous, or Afro-descendant populations in Ecuador and similar regions are often classified as VUS due to a lack of functional and phenotypic data.

This has direct, harmful clinical consequences: it leads to ambiguous diagnoses for patients and families, restricts access to modern, highly effective modulator therapies that are approved for specific variants, and perpetuates health disparities.

This platform aims to functionalize genomics by converting raw genetic data into actionable clinical knowledge, deliberately prioritizing variants from historically underserved populations to ensure that precision medicine benefits extend globally.

2. Describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

The primary governance goal is to ensure that the VUS characterization platform maximizes clinical benefit and equity for underrepresented populations while ethically managing the risks inherent in generating new genomic knowledge. The first specific sub-goal is ethical and equitable variant prioritization. This involves establishing criteria and processes to guarantee that the variants selected for study are those that originate from and will first benefit populations with the greatest need and the least representation in global databases. The second sub-goal is responsible and accessible knowledge translation. This ensures that functional characterization results are effectively, rapidly, and freely translated into improvements in the clinical management of the very patients from whom the variants were sourced. The third sub-goal is protecting privacy and fostering community agency. This focuses on safeguarding patient genetic data and actively involving patient communities in decision-making processes regarding data use and study design, moving beyond mere consent to meaningful partnership.

3. Describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).

Action 1: A Community-Engaged Variant Prioritization Framework

The purpose of this action is to shift from the current common practice, where research variants are often selected for scientific convenience or commercial interest, to a proposed model where a multi-stakeholder committee prioritizes variants based on clinical need and equity. The design requires establishing a “VUS Ethical Prioritization Committee” for the project. This committee would include representatives from Ecuadorian CF patient associations, local clinical geneticists and pulmonologists, bioethicists, and the project scientists. The committee would employ a scoring system that combines factors like the VUS frequency in local undiagnosed cohorts, the clinical severity of the phenotype in carrier patients, the underrepresentation of the patient’s ethnic group in databases, and the variant’s potential for reclassification to grant access to existing therapies. Key actors involved are patient advocacy groups (providing the need perspective), public hospitals/health institutions (providing anonymized clinical data), and funders (such as SENESCYT or the NIH, which could require this framework as a grant condition). This action relies on several assumptions: that patient groups and local clinicians have the capacity and time for meaningful participation; that clinical data can be shared securely and ethically for prioritization purposes; and that “need” can be defined and quantified by consensus. A key risk of failure is that the committee becomes a bureaucratic formality without real power, with scientists ultimately choosing variants that are more “publishable” in high-impact journals, thereby ignoring the equity-driven prioritization. An unintended consequence of success could be that pressure to study only the most urgent variants leads to the neglect of rarer variants in patients with milder symptoms, creating a new form of bias. It could also generate conflict among communities or families competing to have their variant studied first.
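The scoring system described above could be prototyped as a simple weighted sum. The sketch below is only an illustration: the weights, the 0 to 1 scaling of each factor, and the names `VariantCase`, `priority_score` and `rank` are all hypothetical, and a real committee would set the weights by consensus.

```python
from dataclasses import dataclass

# Hypothetical weights (assumption): a real committee would agree on these.
WEIGHTS = {
    "local_frequency": 0.3,
    "clinical_severity": 0.3,
    "underrepresentation": 0.2,
    "therapy_access_potential": 0.2,
}


@dataclass
class VariantCase:
    """One candidate VUS; every factor is assumed pre-scaled to 0..1."""
    name: str
    local_frequency: float
    clinical_severity: float
    underrepresentation: float
    therapy_access_potential: float


def priority_score(case: VariantCase) -> float:
    """Weighted sum of the four factors; higher means study sooner."""
    return sum(weight * getattr(case, factor) for factor, weight in WEIGHTS.items())


def rank(cases):
    """Return the cases sorted from highest to lowest priority."""
    return sorted(cases, key=priority_score, reverse=True)
```

A transparent formula like this makes the committee's trade-offs visible, which also makes it easier to notice the bias risk mentioned above (rare, milder variants never reaching the top of the list).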

Action 2: A Legally Binding “Clinical Translation Commitment” in Collaboration Agreements

This action’s purpose is to move beyond the current common practice, where functional data from research projects often takes years to reach a patient’s medical record, if it ever does. It proposes a model with a clear, obligatory pathway for variant reclassification and communication. The design needs a contractual appendix in all collaboration agreements (between universities, hospitals, funders) and informed consent forms. This legal instrument would mandate three things: an obligation for the research team to formally submit conclusive functional data to clinical databases (ClinVar) and the local lab’s variant review committee; a pre-approved protocol for returning actionable results to the treating physician and, if agreed upon, the patient/family when a VUS is reclassified; and non-exclusive licensing of any findings to prevent patents from restricting diagnostic use in Ecuador’s public health system. Primary actors include university technology transfer lawyers who draft the clause, research ethics committees who mandate it for approval, and forward-thinking funders (like the Wellcome Trust) who promote open science and benefit-sharing policies. Critical assumptions here are that the functional data will be robust enough to justify clinical reclassification, that mechanisms and personnel exist within the local health system to receive and act on this information, and that researchers are willing to accept these potential constraints on commercialization. This action could fail if the clauses are vague and unenforceable, if the ClinVar submission process is slow and bureaucratic, or if there is no clear clinical partner to receive the information. An unintended risk of success is that the obligation to return results, especially for variants reclassified as benign, could generate unnecessary anxiety or create high logistical costs. It might also inadvertently deter future researchers or companies from collaborating with resource-limited settings if they perceive these clauses as overly restrictive.

Action 3: A Technical Standard for “Algorithm Auditing” of Pathogenicity Predictors

The purpose of this action is to mitigate the risk that the functional data generated by the platform could inadvertently reinforce existing biases in bioinformatic prediction algorithms (like PolyPhen-2, SIFT). These tools, trained predominantly on European genomic data, are known to be less accurate for other populations. The design involves creating an “Ethical Reference Dataset” from the platform’s results. Each functionally characterized variant (pathogenic, benign, residual function) would be used to “audit” standard prediction algorithms. The process would involve systematically publishing statements in open repositories: “Variant CFTR p.XYZ, common in the Andean mestizo population, was characterized as pathogenic with a processing defect. However, algorithm [PolyPhen-2] predicted it as ‘benign’ (score: 0.1). This suggests an underestimation bias for pathogenic variants from this population group.” Key actors are the project scientists who generate and publish the audit data, bioinformatics organizations (like the gnomAD consortium) that incorporate these findings to retrain or flag their tools’ limitations, and scientific journals that could require such bias analyses for publication of variant characterization studies. This action assumes that the functional assays are a reliable “gold standard,” that algorithm developers are open to critique and improving their tools, and that a sufficient number of variants will be characterized to perform statistically meaningful bias analysis. It risks failure if the audit data is ignored by the dominant bioinformatics community, rendering the effort inconsequential. A significant unintended consequence of success could be the generation of widespread, paralyzing distrust in all computational prediction tools, potentially delaying diagnosis in settings where functional assays are unavailable. Furthermore, the data on population-specific algorithmic bias could be misappropriated to support essentialist political arguments about biological differences between groups.
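As a rough illustration of what one audit step could compute, the sketch below tallies how often a predictor's call disagrees with the functional result. It is a simplification under stated assumptions: labels are reduced to just 'pathogenic' or 'benign' (the "residual function" class is left out), and the function name and record layout are hypothetical rather than part of any existing tool.

```python
def audit_predictor(records):
    """Compare predictor calls with functional assay labels.

    Each record is (variant, functional_label, predicted_label), where both
    labels are assumed to be either 'pathogenic' or 'benign'.
    Returns the agreement counts and the false-benign rate, i.e. the share of
    functionally pathogenic variants that the predictor called benign.
    """
    counts = {"agree": 0, "false_benign": 0, "false_pathogenic": 0}
    for _variant, functional, predicted in records:
        if functional == predicted:
            counts["agree"] += 1
        elif functional == "pathogenic":
            counts["false_benign"] += 1      # predictor underestimated the variant
        else:
            counts["false_pathogenic"] += 1  # predictor overestimated the variant

    pathogenic_total = sum(1 for _, functional, _ in records if functional == "pathogenic")
    false_benign_rate = (
        counts["false_benign"] / pathogenic_total if pathogenic_total else None
    )
    return counts, false_benign_rate
```

Reporting the false-benign rate per population group, with the number of variants behind it, would help show whether an apparent bias is statistically meaningful or just a small-sample effect.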

4.

| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 2 | 3 | 3 |
| • By helping respond | 3 | 2 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 3 | 2 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 3 | 3 | 3 |
| • By helping respond | 3 | 3 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility? | 1 | 2 | 2 |
| • Not impede research | 1 | 3 | 1 |
| • Promote constructive applications | 1 | 2 | 1 |

5. Based on the scoring matrix, I recommend a prioritized combination of Action 1 (Community-Engaged Prioritization Framework) and Action 3 (Algorithm Auditing Standard) for immediate adoption and funding by national research funding agencies, such as the U.S. National Institutes of Health and its counterparts in Ecuador and other underrepresented regions. This combination best advances the core ethical goals of equity, constructive application, and responsible knowledge generation with minimal burden to research progress. Action 1 directly ensures the research agenda addresses the most pressing clinical needs of underserved populations, scoring highest in feasibility, promoting constructive applications, and not impeding research. Action 3 complements this by systematically correcting the global bioinformatic biases that initially created the inequity in VUS classification, scoring best in minimizing burdens and also promoting constructive applications. The primary trade-off in prioritizing this combination is accepting a weaker direct link to traditional biosecurity and lab safety incident prevention, as these actions are focused on ethical oversight and data quality rather than physical containment or misuse deterrence. A critical assumption is that fostering a more equitable and accurate genomic research ecosystem is itself a foundational form of security, preventing harm caused by diagnostic errors and denied care. The major uncertainty lies in implementation: whether community committees can be resourced effectively for meaningful engagement (Action 1) and whether major algorithm developers will voluntarily adopt audit findings to retrain their models (Action 3). Action 2 (Legal Commitment), while crucial for ultimate clinical translation, scores poorly on burden and feasibility; it should be developed in parallel as a longer-term policy goal, informed by the pilot frameworks and trust built through Actions 1 and 3.

Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy? According to the slides, the error rate of a standard polymerase used in gene synthesis is cited as 1 in 10⁴ bases. The human genome is approximately 3.2 billion base pairs (3.2 Gbp). If uncorrected, such an error rate would introduce hundreds of thousands of errors per genome replication (3.2 × 10⁹ / 10⁴ ≈ 320,000), making stable heredity and cellular function impossible. Biology resolves this discrepancy not through polymerase perfection alone, but through a layered system of enzymatic correction. The presentation highlights one key component, the MutS repair system: a dedicated mismatch repair complex that scans for and fixes errors after replication. This biological strategy combines proofreading by the polymerase itself with dedicated repair pathways to achieve the net high fidelity required for life.
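The arithmetic behind this comparison can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the two figures quoted in the answer; the function names are illustrative.

```python
GENOME_LENGTH_BP = 3.2e9     # approximate human genome size used in the answer
ERROR_RATE_PER_BASE = 1e-4   # 1 error per 10^4 bases, the figure cited from the slides


def expected_errors(genome_length_bp: float, error_rate_per_base: float) -> float:
    """Expected number of errors in one uncorrected copy of the genome."""
    return genome_length_bp * error_rate_per_base


def required_error_rate(genome_length_bp: float, max_errors: float = 1.0) -> float:
    """Per-base error rate needed to stay at or below max_errors per copy."""
    return max_errors / genome_length_bp


uncorrected = expected_errors(GENOME_LENGTH_BP, ERROR_RATE_PER_BASE)
print(f"Expected uncorrected errors per replication: {uncorrected:,.0f}")
print(f"Rate needed for about one error per copy: {required_error_rate(GENOME_LENGTH_BP):.2e}")
```

The gap between the two rates (roughly six orders of magnitude) is what proofreading and mismatch repair together have to close.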

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest? The presentation gives the key figure: an average human protein is encoded by 1,036 base pairs, or roughly 345 codons. Because the genetic code is degenerate, the number of different DNA sequences that specify the identical amino acid chain is astronomically large; it is the product of the number of synonymous codons available at each position. However, the presentation also illustrates why, in practice, these synonymous codes are not functionally equivalent. The core reason is that the precise nucleotide sequence dictates the secondary structure of the messenger RNA (mRNA) through nucleic acid base-pairing (NA:NA) interactions. Successful gene design therefore requires selecting a nucleotide sequence that not only encodes the correct protein but also folds into an mRNA structure compatible with the cellular translation machinery.
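To make the "multiply the synonymous choices" idea concrete, here is a small sketch that counts the codings for any protein sequence. The degeneracy table follows the standard genetic code (stop codons excluded); the function name is just illustrative.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_synonymous_codings(protein: str) -> int:
    """Number of distinct DNA sequences that encode the given protein."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)


# First six residues of CFTR (M-Q-R-S-P-L): 1 * 2 * 6 * 6 * 4 * 6 codings.
print(count_synonymous_codings("MQRSPL"))
```

Even six residues already allow well over a thousand codings, so for a protein of a few hundred residues the count quickly exceeds anything that could be screened exhaustively.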

Homework Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently? The most commonly used method for oligo synthesis currently is solid-phase phosphoramidite synthesis on a functionalized silica (SiO₂) support.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis? It is difficult to make oligos longer than 200 nucleotides (nt) via direct synthesis primarily due to the cumulative effect of stepwise coupling inefficiencies. Each cycle in the phosphoramidite synthesis (Slide 3) has a yield of less than 100%. Even a very high per-step coupling efficiency of 99.5% results in a total yield of only about 37% for a 200-mer (0.995^200 ≈ 0.37). This means most of the material synthesized is truncated at various lengths. The primary technical challenges are maintaining exceptionally high coupling yields at every single step and minimizing side reactions over hundreds of cycles.
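The yield calculation above generalizes to any length and coupling efficiency. The sketch below follows the same simplification as the answer (full-length yield equals the per-step efficiency raised to the oligo length); the function name is illustrative.

```python
def full_length_yield(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of strands that are full length after length_nt coupling steps."""
    return coupling_efficiency ** length_nt


for length in (20, 100, 200, 2000):
    print(f"{length:>5} nt: {full_length_yield(0.995, length):.4%} full-length product")
```

Running it for a 2000-mer shows why direct synthesis of a whole gene is impractical: the full-length fraction drops to a tiny fraction of a percent.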

3. Why can’t you make a 2000bp gene via direct oligo synthesis? You cannot make a 2000bp gene via direct oligo synthesis because the fundamental length limitation of the core phosphoramidite chemistry, as explained above, makes synthesizing a single, continuous 2000nt strand impractical and prohibitively inefficient. Instead, the standard industrial practice is to use oligos as building blocks for assembly. Genes are constructed by synthesizing many shorter, high-quality oligos and then assembling them into longer double-stranded DNA fragments via enzymatic methods like polymerase cycling assembly (PCA) or ligation. This “synthesize then assemble” strategy overcomes the length constraint.
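The "synthesize then assemble" idea can be sketched as splitting a target sequence into short, overlapping pieces. This is a simplified illustration: the oligo length and overlap are assumed values, only the top strand is tiled, and real assembly designs alternate strands and tune overlaps for melting temperature.

```python
def tile_into_oligos(sequence: str, oligo_length: int = 60, overlap: int = 20):
    """Split a sequence into overlapping windows (top strand only).

    Consecutive oligos share `overlap` bases, which is what lets them anneal
    to each other during assembly. The last oligo may be shorter.
    """
    if not 0 < overlap < oligo_length:
        raise ValueError("overlap must be positive and smaller than oligo_length")
    step = oligo_length - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + oligo_length])
        if start + oligo_length >= len(sequence):
            break
        start += step
    return oligos


def reassemble(oligos, overlap: int = 20) -> str:
    """Rebuild the original sequence by dropping each repeated overlap."""
    return oligos[0] + "".join(oligo[overlap:] for oligo in oligos[1:])
```

Because each piece stays well under the length where coupling losses dominate, every oligo can be made at high quality before assembly.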

Homework Question from George Church

1. What code would you suggest for AA:AA interactions? AUG CGA -> M:R

Week 2 HW: DNA read, write and edit

Part 1: Benchling & In-silico Gel Art

Pattern

Part 3: DNA Design Challenge

3.1. Choose your protein

My primary research project is related to Cystic Fibrosis (CF). Therefore, choosing the CFTR protein allows me to directly connect the concepts from this class to my own work. Understanding the protein’s structure, its function as a chloride channel, and how mutations disrupt that function is foundational to my project.

Protein sequence from NCBI (Reference: NM_000492.4):

MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLHPAIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQVALLMGLIWELLQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYCWEEAMEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILRKIFTTISFCIVLRMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEVVMENVTAFWEEGFGELFEKAKQNNNNRKTSNGDDSLFFSNFSLLGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMVIMGELEPSEGKIKHSGRISFCSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQRARISLARAVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILHEGSSYFYGTFSELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTETKKQSFKQTGEFGEKRKNSILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVPDSEQGEAILPRISVISTGPTLQARRRQSVLNLMTHSVNQGQNIHRKTTASTRKVSLAPQANLTELDIYSRRLSQETGLEISEEINEEDLKECFFDDMESIPAVTTWNTYLRYITVHKSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHSRNNSYAVIITSTSSYYVFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTLKAGGILNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLRAYFLQTSQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTANWFLYLSTLRWFQMRIEMIFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWAVNSSIDVDSLMRSVSRVFKFIDMPTEGKPTKSTKPYKNGQLSKVMIIENSHVKKDDIWPSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRLLNTEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSDQEIWKVADEVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDPVTYQIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSLFRQAISPSDRVKLFPHRNSSKCKSKPQIAALKEETEEEVQDTRL

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence

NM_000492.4 Homo sapiens CF transmembrane conductance regulator (CFTR), mRNA

ATGCAGCGCAGCCCGCTGGAAAAAGCGAGCGTGGTGAGCAAACTGTTTTTTAGCTGGACCCGCCCGATTCTGCGCAAAGGCTATCGCCAGCGCCTGGAACTGAGCGATATTTATCAGATTCCGAGCGTGGATAGCGCGGATAACCTGAGCGAAAAACTGGAACGCGAATGGGATCGCGAACTGGCGAGCAAAAAAAACCCGAAACTGATTAACGCGCTGCGCCGCTGCTTTTTTTGGCGCTTTATGTTTTATGGCATTTTTCTGTATCTGGGCGAAGTGACCAAAGCGGTGCAGCCGCTGCTGCTGGGCCGCATTATTGCGAGCTATGATCCGGATAACAAAGAAGAACGCAGCATTGCGATTTATCTGGGCATTGGCCTGTGCCTGCTGTTTATTGTGCGCACCCTGCTGCTGCATCCGGCGATTTTTGGCCTGCATCATATTGGCATGCAGATGCGCATTGCGATGTTTAGCCTGATTTATAAAAAAACCCTGAAACTGAGCAGCCGCGTGCTGGATAAAATTAGCATTGGCCAGCTGGTGAGCCTGCTGAGCAACAACCTGAACAAATTTGATGAAGGCCTGGCGCTGGCGCATTTTGTGTGGATTGCGCCGCTGCAGGTGGCGCTGCTGATGGGCCTGATTTGGGAACTGCTGCAGGCGAGCGCGTTTTGCGGCCTGGGCTTTCTGATTGTGCTGGCGCTGTTTCAGGCGGGCCTGGGCCGCATGATGATGAAATATCGCGATCAGCGCGCGGGCAAAATTAGCGAACGCCTGGTGATTACCAGCGAAATGATTGAAAACATTCAGAGCGTGAAAGCGTATTGCTGGGAAGAAGCGATGGAAAAAATGATTGAAAACCTGCGCCAGACCGAACTGAAACTGACCCGCAAAGCGGCGTATGTGCGCTATTTTAACAGCAGCGCGTTTTTTTTTAGCGGCTTTTTTGTGGTGTTTCTGAGCGTGCTGCCGTATGCGCTGATTAAAGGCATTATTCTGCGCAAAATTTTTACCACCATTAGCTTTTGCATTGTGCTGCGCATGGCGGTGACCCGCCAGTTTCCGTGGGCGGTGCAGACCTGGTATGATAGCCTGGGCGCGATTAACAAAATTCAGGATTTTCTGCAGAAACAGGAATATAAAACCCTGGAATATAACCTGACCACCACCGAAGTGGTGATGGAAAACGTGACCGCGTTTTGGGAAGAAGGCTTTGGCGAACTGTTTGAAAAAGCGAAACAGAACAACAACAACCGCAAAACCAGCAACGGCGATGATAGCCTGTTTTTTAGCAACTTTAGCCTGCTGGGCACCCCGGTGCTGAAAGATATTAACTTTAAAATTGAACGCGGCCAGCTGCTGGCGGTGGCGGGCAGCACCGGCGCGGGCAAAACCAGCCTGCTGATGGTGATTATGGGCGAACTGGAACCGAGCGAAGGCAAAATTAAACATAGCGGCCGCATTAGCTTTTGCAGCCAGTTTAGCTGGATTATGCCGGGCACCATTAAAGAAAACATTATTTTTGGCGTGAGCTATGATGAATATCGCTATCGCAGCGTGATTAAAGCGTGCCAGCTGGAAGAAGATATTAGCAAATTTGCGGAAAAAGATAACATTGTGCTGGGCGAAGGCGGCATTACCCTGAGCGGCGGCCAGCGCGCGCGCATTAGCCTGGCGCGCGCGGTGTATAAAGATGCGGATCTGTATCTGCTGGATAGCCCGTTTGGCTATCTGGATGTGCTGACCGAAAAAGAAATTTTTGAAAGCTGCGTGTGCAAACTGATGGCGAACAAAACCCGCATTCTGGTGACCAGCAAAATGGAACATCTGAAAAAAGCGGATAAAATTCTGATTCTGCATGAAGGCAGCAGCTATTTTTATGGCACCTTTAGCGAACTGCAGAACCTGCAGCCGGATTTTAGCAGCAAACTGATGGGCTGCGATAGCTTTGATCAGTTTAGCGCGGAACGCCGCAACAGCATTCTGACCGAAACCCTGCA
TCGCTTTAGCCTGGAAGGCGATGCGCCGGTGAGCTGGACCGAAACCAAAAAACAGAGCTTTAAACAGACCGGCGAATTTGGCGAAAAACGCAAAAACAGCATTCTGAACCCGATTAACAGCATTCGCAAATTTAGCATTGTGCAGAAAACCCCGCTGCAGATGAACGGCATTGAAGAAGATAGCGATGAACCGCTGGAACGCCGCCTGAGCCTGGTGCCGGATAGCGAACAGGGCGAAGCGATTCTGCCGCGCATTAGCGTGATTAGCACCGGCCCGACCCTGCAGGCGCGCCGCCGCCAGAGCGTGCTGAACCTGATGACCCATAGCGTGAACCAGGGCCAGAACATTCATCGCAAAACCACCGCGAGCACCCGCAAAGTGAGCCTGGCGCCGCAGGCGAACCTGACCGAACTGGATATTTATAGCCGCCGCCTGAGCCAGGAAACCGGCCTGGAAATTAGCGAAGAAATTAACGAAGAAGATCTGAAAGAATGCTTTTTTGATGATATGGAAAGCATTCCGGCGGTGACCACCTGGAACACCTATCTGCGCTATATTACCGTGCATAAAAGCCTGATTTTTGTGCTGATTTGGTGCCTGGTGATTTTTCTGGCGGAAGTGGCGGCGAGCCTGGTGGTGCTGTGGCTGCTGGGCAACACCCCGCTGCAGGATAAAGGCAACAGCACCCATAGCCGCAACAACAGCTATGCGGTGATTATTACCAGCACCAGCAGCTATTATGTGTTTTATATTTATGTGGGCGTGGCGGATACCCTGCTGGCGATGGGCTTTTTTCGCGGCCTGCCGCTGGTGCATACCCTGATTACCGTGAGCAAAATTCTGCATCATAAAATGCTGCATAGCGTGCTGCAGGCGCCGATGAGCACCCTGAACACCCTGAAAGCGGGCGGCATTCTGAACCGCTTTAGCAAAGATATTGCGATTCTGGATGATCTGCTGCCGCTGACCATTTTTGATTTTATTCAGCTGCTGCTGATTGTGATTGGCGCGATTGCGGTGGTGGCGGTGCTGCAGCCGTATATTTTTGTGGCGACCGTGCCGGTGATTGTGGCGTTTATTATGCTGCGCGCGTATTTTCTGCAGACCAGCCAGCAGCTGAAACAGCTGGAAAGCGAAGGCCGCAGCCCGATTTTTACCCATCTGGTGACCAGCCTGAAAGGCCTGTGGACCCTGCGCGCGTTTGGCCGCCAGCCGTATTTTGAAACCCTGTTTCATAAAGCGCTGAACCTGCATACCGCGAACTGGTTTCTGTATCTGAGCACCCTGCGCTGGTTTCAGATGCGCATTGAAATGATTTTTGTGATTTTTTTTATTGCGGTGACCTTTATTAGCATTCTGACCACCGGCGAAGGCGAAGGCCGCGTGGGCATTATTCTGACCCTGGCGATGAACATTATGAGCACCCTGCAGTGGGCGGTGAACAGCAGCATTGATGTGGATAGCCTGATGCGCAGCGTGAGCCGCGTGTTTAAATTTATTGATATGCCGACCGAAGGCAAACCGACCAAAAGCACCAAACCGTATAAAAACGGCCAGCTGAGCAAAGTGATGATTATTGAAAACAGCCATGTGAAAAAAGATGATATTTGGCCGAGCGGCGGCCAGATGACCGTGAAAGATCTGACCGCGAAATATACCGAAGGCGGCAACGCGATTCTGGAAAACATTAGCTTTAGCATTAGCCCGGGCCAGCGCGTGGGCCTGCTGGGCCGCACCGGCAGCGGCAAAAGCACCCTGCTGAGCGCGTTTCTGCGCCTGCTGAACACCGAAGGCGAAATTCAGATTGATGGCGTGAGCTGGGATAGCATTACCCTGCAGCAGTGGCGCAAAGCGTTTGGCGTGATTCCGCAGAAAGTGTTTATTTTTAGCGGCACCTTTCGCAAAAACCTGGATCCGTATGAACAGTGGAGCGATCAGGAAATTTGGAAAGTGGCGGATGAAGTGGGCCTGCGCAGCGTGATTGAACAGTTTCCGGGCA
AACTGGATTTTGTGCTGGTGGATGGCGGCTGCGTGCTGAGCCATGGCCATAAACAGCTGATGTGCCTGGCGCGCAGCGTGCTGAGCAAAGCGAAAATTCTGCTGCTGGATGAACCGAGCGCGCATCTGGATCCGGTGACCTATCAGATTATTCGCCGCACCCTGAAACAGGCGTTTGCGGATTGCACCGTGATTCTGTGCGAACATCGCATTGAAGCGATGCTGGAATGCCAGCAGTTTCTGGTGATTGAAGAAAACAAAGTGCGCCAGTATGATAGCATTCAGAAACTGCTGAACGAACGCAGCCTGTTTCGCCAGGCGATTAGCCCGAGCGATCGCGTGAAACTGTTTCCGCATCGCAACAGCAGCAAATGCAAAAGCAAACCGCAGATTGCGGCGCTGAAAGAAGAAACCGAAGAAGAAGTGCAGGATACCCGCCTG
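For reference, reverse translation with one fixed codon per amino acid can be sketched in a few lines. The table below is an assumption inferred from the sequence shown above (it reproduces the opening codons), not a statement of which tool or codon table was actually used.

```python
# One codon per amino acid, inferred from the reverse-translated sequence above
# (assumption: the tool picked a single preferred codon for each residue).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGC", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def reverse_translate(protein: str) -> str:
    """Map each amino acid to its single preferred codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein)


# The first ten residues of CFTR give the first 30 nucleotides shown above.
print(reverse_translate("MQRSPLEKAS"))
```

Because every residue always maps to the same codon, this kind of output is only one of the many possible codings and is not the native human mRNA sequence.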

3.3. Codon optimization

Codon optimization in NOVOPRO Codon Optimization Tool (ExpOptimizer)

For the purposes of this homework exercise, I chose to codon optimize the CFTR sequence for expression in E. coli K12, excluding BamHI, EcoRI and PstI recognition sites. This decision was primarily practical, as I wanted to clearly visualize how the nucleotide sequence changes through optimization. When I performed a BLAST alignment comparing the original human CFTR sequence to the E. coli-optimized version, I observed a clear difference, with approximately 83% identity at the nucleotide level. This provided a tangible demonstration of how codon optimization works in practice.

I also ran the same optimization and BLAST comparison for Homo sapiens and Saccharomyces cerevisiae hosts. Interestingly, in both cases BLAST reported no significant differences between the original and optimized sequences. For the human host this makes sense biologically: the original CFTR sequence is already human, so optimizing it for human expression should change very little. The yeast result is harder to explain, because S. cerevisiae codon usage generally differs from human codon usage, so it may reflect the tool settings or the input sequence rather than shared codon bias and would be worth re-checking.

It is important to emphasize, however, that for actual experimental practice, particularly for a functional study of CFTR mutations, codon optimization should target expression in human cells such as bronchial epithelial (CFBE) cells. These cells provide the native cellular environment, complete with the proper trafficking machinery, post-translational modification systems, and lipid membrane composition required for CFTR to work as a functional chloride channel. While E. coli works well for this homework demonstration, it would not be suitable for genuine functional studies of this protein.

image opt

ATGCAGCGTAGTCCGTTAGAAAAGGCGTCTGTCGTATCAAAATTGTTTTTTAGTTGGACTCGCCCCATTCTGCGTAAAGGTTATCGTCAACGCTTAGAACTCAGCGACATTTACCAGATCCCGAGTGTAGATTCGGCGGATAATCTCAGCGAAAAGTTGGAACGTGAGTGGGATCGCGAATTGGCATCCAAGAAAAATCCGAAACTGATTAATGCGTTACGCCGCTGTTTCTTTTGGCGTTTTATGTTTTATGGTATCTTTCTCTATTTGGGCGAAGTTACTAAAGCTGTGCAGCCATTGCTGCTCGGACGTATTATTGCGAGCTATGATCCGGACAATAAAGAAGAACGCTCAATTGCGATTTACCTGGGCATTGGCCTGTGTTTACTGTTTATTGTTCGTACCCTGCTGTTACACCCGGCGATTTTTGGACTCCATCACATTGGCATGCAGATGCGCATTGCCATGTTTTCCCTGATCTATAAGAAAACGCTGAAATTGTCCAGTCGCGTTCTGGACAAAATTTCCATCGGACAGCTGGTGAGCCTCCTGTCAAATAACCTGAACAAATTTGACGAGGGTCTCGCCCTGGCACATTTCGTATGGATTGCGCCGCTGCAAGTGGCGTTACTGATGGGCCTGATCTGGGAACTGCTGCAGGCTTCAGCATTTTGCGGCCTGGGCTTTCTGATTGTGCTGGCGCTTTTTCAGGCAGGACTGGGACGCATGATGATGAAATACCGCGACCAACGTGCGGGAAAAATCAGCGAACGTTTAGTCATTACCAGCGAAATGATCGAAAACATTCAATCAGTAAAAGCGTACTGCTGGGAAGAAGCCATGGAGAAAATGATTGAAAACCTTCGCCAGACCGAACTGAAACTGACGCGCAAAGCGGCGTACGTGCGCTATTTCAACAGCAGTGCCTTTTTTTTTAGCGGATTTTTTGTCGTCTTTCTGAGCGTGCTGCCGTATGCACTGATTAAGGGGATTATTCTGCGCAAAATTTTCACGACAATTTCCTTCTGTATTGTACTCCGTATGGCTGTTACCCGTCAGTTCCCGTGGGCGGTGCAGACGTGGTACGACTCGTTGGGGGCTATTAATAAAATTCAAGACTTCCTGCAGAAGCAGGAGTATAAAACGTTAGAGTATAACCTCACCACGACGGAAGTCGTTATGGAAAACGTAACCGCATTCTGGGAAGAAGGTTTTGGCGAACTTTTTGAAAAGGCCAAACAGAATAACAATAACCGCAAAACGAGTAATGGAGATGATAGCCTGTTTTTTAGCAACTTCTCACTGCTGGGCACCCCGGTGCTTAAAGATATTAACTTCAAGATCGAGCGTGGCCAACTGCTGGCCGTGGCGGGAAGCACAGGCGCCGGAAAGACCTCGTTGCTTATGGTGATTATGGGCGAATTAGAACCGTCAGAGGGCAAAATTAAGCATTCCGGCCGTATTTCGTTTTGTAGTCAGTTTAGCTGGATTATGCCCGGCACCATCAAAGAGAACATCATTTTCGGGGTGTCGTACGATGAATACCGTTATCGTTCAGTTATCAAAGCGTGTCAACTGGAAGAAGATATCTCTAAGTTTGCTGAAAAAGACAACATTGTGCTGGGGGAAGGCGGCATCACCTTGTCAGGCGGTCAGCGTGCACGCATTAGTCTGGCGCGTGCTGTGTATAAAGACGCGGATTTATATCTCCTTGATAGTCCCTTCGGATATCTGGATGTCTTGACCGAAAAAGAAATTTTTGAAAGCTGCGTGTGCAAACTGATGGCCAACAAAACTCGCATTCTCGTAACCTCAAAAATGGAACACTTGAAAAAAGCCGACAAGATTCTGATTCTGCACGAAGGTTCAAGCTATTTTTACGGTACCTTTTCTGAGTTACAGAATCTGCAGCCCGACTTTTCGTCCAAATTAATGGGCTGCGACAGCTTTGACCAGTTTAGCGCCGAACGCCGCAATTCAATCCTTACGGAAACATTACA
TCGCTTCTCTCTGGAAGGCGATGCCCCAGTAAGCTGGACAGAAACGAAAAAACAGAGCTTCAAACAAACCGGCGAATTCGGCGAAAAACGTAAGAACTCAATCCTTAATCCCATTAATTCCATTCGTAAGTTCAGCATCGTGCAGAAGACGCCGCTCCAGATGAATGGCATTGAGGAGGATTCGGATGAACCGCTTGAGCGCCGTCTGTCATTAGTGCCGGATTCGGAGCAAGGTGAAGCAATTTTACCCCGTATTTCAGTGATTTCTACCGGGCCGACGCTCCAGGCACGCCGTCGCCAGAGCGTGCTCAACCTTATGACTCATAGCGTGAATCAAGGACAAAATATTCATCGTAAAACGACAGCGAGCACACGCAAAGTGAGCTTGGCGCCTCAAGCAAATCTTACCGAACTGGACATCTATTCGCGCCGCCTCTCCCAGGAGACTGGCCTCGAAATTTCCGAGGAGATCAATGAAGAAGACCTGAAAGAATGCTTTTTTGACGATATGGAAAGCATCCCGGCGGTCACTACGTGGAATACGTATCTCCGCTATATTACGGTGCATAAAAGTCTGATCTTCGTATTGATCTGGTGTTTAGTGATCTTTCTGGCCGAAGTGGCCGCGTCCCTTGTGGTGCTGTGGCTGCTGGGTAACACCCCTTTGCAGGATAAGGGCAATAGCACTCATTCCCGCAATAACTCGTATGCCGTCATCATTACCTCCACCTCGAGTTATTACGTCTTTTATATCTACGTGGGCGTCGCTGATACCTTATTGGCAATGGGGTTCTTCCGCGGTCTGCCGTTAGTGCATACACTGATCACCGTCTCGAAAATTCTGCATCATAAAATGCTGCATAGTGTGCTCCAGGCACCGATGTCGACCCTGAACACACTGAAAGCAGGCGGCATTTTAAACCGCTTTTCTAAAGACATTGCCATTCTGGATGACCTGCTTCCCCTGACTATTTTTGATTTTATTCAGTTGCTGCTCATTGTAATTGGCGCTATTGCTGTGGTGGCGGTTCTGCAACCGTATATTTTTGTCGCGACCGTGCCGGTCATTGTCGCTTTCATTATGCTGCGCGCCTACTTTCTGCAGACGAGCCAACAGCTTAAACAGCTCGAATCTGAAGGACGTTCACCTATCTTTACTCACCTGGTTACGTCGCTGAAAGGCCTGTGGACGCTGCGTGCATTCGGCCGTCAGCCGTACTTCGAAACCCTGTTCCATAAAGCACTGAACCTGCATACCGCGAACTGGTTTCTGTATCTGTCGACTCTGCGCTGGTTTCAAATGCGTATTGAGATGATTTTCGTAATTTTTTTTATTGCCGTGACTTTCATCAGTATCTTGACCACGGGCGAGGGCGAAGGTCGTGTGGGTATTATCCTGACCTTGGCTATGAACATCATGAGTACACTGCAGTGGGCGGTGAATAGCAGCATCGATGTGGATTCTTTGATGCGCAGCGTGTCCCGCGTTTTTAAATTCATTGACATGCCTACCGAAGGTAAGCCCACGAAGAGTACAAAACCCTACAAAAACGGCCAACTGTCAAAGGTTATGATTATTGAAAATTCGCACGTGAAAAAAGACGACATTTGGCCGAGCGGCGGTCAAATGACAGTGAAGGATCTGACGGCGAAATACACAGAAGGAGGCAATGCCATTCTGGAAAACATTTCTTTCTCCATCAGCCCGGGCCAGCGCGTCGGGCTCCTGGGCCGTACGGGTAGCGGCAAATCCACTCTGCTTTCAGCATTTTTGCGCCTCTTAAACACGGAAGGAGAAATTCAGATTGATGGCGTCTCGTGGGATAGCATCACACTGCAACAGTGGCGTAAGGCATTCGGCGTCATTCCGCAGAAGGTGTTCATTTTTTCGGGCACCTTTCGCAAGAACCTGGACCCTTATGAACAGTGGAGCGACCAGGAGATTTGGAAGGTAGCGGACGAAGTGGGCCTGCGTTCGGTTATTGAACAGTTCCCCGGTA
AATTGGACTTCGTGCTGGTCGATGGTGGGTGCGTACTCTCTCATGGGCACAAACAACTTATGTGCCTGGCGCGTAGTGTGCTGAGCAAAGCCAAGATTCTCCTGCTGGACGAACCGTCAGCACATCTCGATCCTGTCACCTATCAGATCATTCGCCGCACCCTCAAACAGGCGTTCGCGGATTGCACGGTGATTCTGTGCGAACATCGCATCGAAGCGATGCTTGAGTGTCAGCAGTTTCTCGTGATCGAAGAAAACAAAGTCCGTCAATATGATAGCATCCAGAAGTTGCTGAATGAACGTTCATTATTTCGCCAGGCGATCAGCCCGAGCGACCGTGTGAAACTGTTCCCTCATCGTAATAGCTCGAAGTGTAAATCCAAACCTCAGATTGCTGCACTGAAGGAGGAAACGGAAGAAGAAGTGCAGGATACTCGTCTG
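A quick way to double-check an optimized sequence is to scan it for the excluded restriction sites and to compute a position-by-position identity against the starting sequence. The sketch below assumes both sequences have the same length (true when they encode the same protein without insertions or deletions), so its identity value is only an approximation of an alignment-based BLAST figure; the function names are illustrative.

```python
# Recognition sites of the three excluded enzymes (all palindromic, so scanning
# one strand is enough).
EXCLUDED_SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "PstI": "CTGCAG"}


def find_sites(sequence: str, sites=EXCLUDED_SITES):
    """Return {enzyme: [0-based positions]} for every site found in the sequence."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions that match between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

Pasting the original and optimized sequences into `find_sites` and `percent_identity` gives an independent check on both the site-exclusion setting and the reported identity.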

ncbi img

3.4. You have a sequence! Now what?

Once I have my codon-optimized CFTR gene sequence designed for E. coli, several technologies exist to produce the actual protein. These methods fall into two main categories: cell-dependent expression systems and cell-free systems. The most common method involves inserting my optimized CFTR gene into a plasmid vector. This plasmid is designed with regulatory elements including a promoter to recruit RNA polymerase, a ribosome binding site to help recruit bacterial ribosomes for translation, and a selectable marker such as an antibiotic resistance gene to ensure only bacteria containing my plasmid survive. The process begins with transformation, where I introduce the plasmid into competent E. coli cells through heat shock or electroporation. I then culture the bacteria in liquid media until they reach optimal density, at which point I add an inducing chemical like IPTG to activate the promoter and trigger transcription of my CFTR gene. Once induced, the bacterial machinery transcribes DNA into mRNA, which ribosomes then translate into CFTR protein. For a challenging membrane protein like CFTR, special considerations are needed. I might use engineered E. coli strains that optimize membrane protein folding or grow cultures at lower temperatures to prevent aggregation.

Cell-free protein synthesis offers a compelling alternative, particularly for difficult-to-express proteins like CFTR. These methods use cellular extracts that contain all the necessary transcription and translation machinery but no living cells. To use this approach, I prepare an extract from E. coli, which contains ribosomes, tRNAs, amino acids, and energy sources. I then add my purified plasmid DNA directly to this extract, and transcription and translation occur in a test tube over several hours, after which the protein can be purified directly from the reaction mixture. This approach offers several advantages for CFTR production. Because it is an open system, I can add detergents or lipids during synthesis to help membrane proteins fold properly. I can also screen many mutant variants quickly without transforming cells each time. If CFTR proves toxic to E. coli, as membrane proteins often are, cell-free synthesis bypasses the problem entirely. Additionally, I can easily incorporate modified amino acids for structural studies.

Part 4: Prepare a Twist DNA Synthesis Order

img annotations

img construct

https://benchling.com/s/seq-ceNcs06iT2dZog3TP27n?m=slm-XZIDkTXrI6R5WwFYIFl8

img twist order

img plasmid

https://benchling.com/s/seq-mc2yPDRoSr0FiL2RCqAB?m=slm-tjBqRHIwYT01GDHSirkw

Part 5: DNA Read/Write/Edit

5.1 DNA Read

If I could choose any DNA to sequence, I would want to sequence the CFTR gene from individuals diagnosed with or suspected of having Cystic Fibrosis. This choice is directly connected to my main project and interest in CF research. Sequencing the CFTR gene from patient samples would allow me to identify which specific mutation or mutations an individual carries. This is critically important because more than 2,000 different variants have been described in the CFTR gene, and the disease-causing ones can affect the protein in different ways. Some mutations, like F508del, cause the protein to misfold and never reach the cell surface. Others, like G551D, allow the protein to reach the surface but prevent the channel from opening properly. Still other mutations result in premature stop codons or splicing errors that produce no functional protein at all. By sequencing each patient’s CFTR gene, I could determine their specific genotype, which directly informs their prognosis and guides treatment decisions. This is particularly relevant today because of the development of mutation-specific modulator therapies. These drugs are not effective for every mutation, which is why knowing the exact DNA sequence is essential for personalized medicine. Beyond individual patient care, sequencing CFTR genes from large populations would contribute to our understanding of how different variants correlate with disease severity, how new variants arise, and which variants are specific to local populations, and it could identify rare mutations that might respond to existing or future therapies. For someone working on CF, having this sequence information is the foundation upon which everything else is built.

For sequencing the CFTR gene, I would employ a combination approach using whole exome sequencing followed by targeted CFTR gene sequencing, both performed on the Illumina platform. This dual strategy is particularly valuable for clinical scenarios involving suspected Cystic Fibrosis or other disorders with overlapping symptoms. Illumina sequencing is classified as second-generation sequencing, also called next-generation sequencing or NGS. Unlike first-generation Sanger sequencing, which reads one DNA fragment at a time, Illumina’s massively parallel approach sequences millions of fragments simultaneously, making it highly efficient for clinical diagnostics. I have selected Illumina because it offers very low per-base error rates, making it ideal for detecting disease-causing mutations where precision is paramount. Its reads are short, typically a few hundred bases or fewer, but their high accuracy is what clinical decision-making requires. This is particularly important for CFTR sequencing, as knowing the exact mutation determines which targeted therapies a patient may receive. The input for this sequencing approach would be genomic DNA isolated from a patient blood sample. The essential steps for library preparation would begin with DNA isolation and quality assessment, where I extract genomic DNA from the patient sample and verify its purity and integrity using UV spectrophotometry and fluorometric assays. Next, the genomic DNA is fragmented into small pieces suitable for Illumina sequencing, and Illumina-specific adapters are ligated to both ends of the fragments; these adapters are complementary to the oligos on the flow cell and enable cluster generation. The adapter-ligated library is then enriched for the regions of interest: for whole exome sequencing, I would use hybridization-based probes to capture all exonic regions, while for targeted CFTR sequencing, a custom enrichment panel can specifically capture the CFTR gene regions including both exons and introns. PCR amplification is then performed to increase the quantity of the library, which is especially important when working with limited starting material. Finally, the prepared libraries are quantified to ensure optimal loading concentration onto the sequencer.

The essential steps of Illumina sequencing technology begin with cluster generation through bridge amplification. The library is loaded onto a flow cell where fragments hybridize to complementary oligos on the surface. Through bridge amplification, each fragment forms a clonal cluster of identical molecules, which amplifies the fluorescent signal for detection. The core technology that decodes the bases is called sequencing by synthesis. This process uses fluorescently labeled nucleotides with reversible terminators. In each cycle, DNA polymerase adds a single complementary nucleotide to each growing strand. Unincorporated nucleotides are washed away, and a laser excites the fluorescent labels. A camera captures images of the flow cell, recording which base was incorporated in each cluster based on the emission wavelength. The fluorescent dye and terminator are then cleaved, allowing the next cycle to begin. This cyclic reversible termination process repeats for the desired number of cycles to achieve a specific read length. The images are processed by Illumina’s on-instrument Real-Time Analysis (RTA) software, which converts fluorescence intensities into base calls with associated quality scores; secondary analysis tools such as the DRAGEN Bio-IT Platform or similar software then perform alignment and variant calling. The output of this sequencing approach includes several file types and reports. Raw FASTQ files contain the sequence reads and quality scores for each base call. Aligned BAM files show the reads aligned to the human reference genome, allowing visualization of the CFTR region. Variant call format files, or VCF files, list all identified variants, including single nucleotide variants and small insertions and deletions. For CFTR specifically, this includes critical variants that affect splicing. Finally, a clinical report provides a curated interpretation of identified variants, classifying them as pathogenic, likely pathogenic, variants of uncertain significance, or benign based on established guidelines. This comprehensive approach would enable accurate diagnosis of CFTR-related disorders, guide mutation-specific therapy decisions, and potentially identify novel variants for further study.
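To make the FASTQ output more concrete, here is a minimal sketch of how the per-base quality scores could be read. It assumes the common four-line FASTQ record layout and Phred+33 quality encoding; the two reads are made-up examples, not patient data, and the function names are my own.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical two-read FASTQ content used only for illustration.
example = StringIO(
    "@read1\nATCATCTTTGGT\n+\nIIIIIIIIIIII\n"
    "@read2\nATCATTGGTGTT\n+\nIIIIII5555##\n"
)
records = list(parse_fastq(example))

for read_id, seq, qual in records:
    scores = phred_scores(qual)
    mean_q = sum(scores) / len(scores)
    print(read_id, seq, f"mean Q = {mean_q:.1f}")
```

A score of Q30 corresponds to an estimated error probability of 1 in 1,000 for that base, which is why low-quality read ends are usually trimmed or down-weighted before variant calling.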

5.2 DNA Write

For DNA synthesis, I would like to synthesize an mRNA construct encoding a fully functional, wild-type CFTR protein, designed for use as an mRNA replacement therapy. This approach would operate independently of current modulator therapies and could benefit patients regardless of their specific variant type, including those with nonsense or frameshift mutations that do not respond to existing drugs. The concept involves delivering synthetic CFTR mRNA to airway epithelial cells, where it would be translated directly into functional CFTR protein by the cell’s own ribosomes. This bypasses the need to correct the patient’s mutated endogenous gene. However, this strategy would ideally require co-delivery of RNA interference molecules such as siRNA or shRNA to silence the expression of the endogenous mutated CFTR transcript. This would prevent potential dominant-negative interactions or production of toxic truncated protein fragments from the patient’s own gene.

I would use the silicon-based high-throughput DNA synthesis platform developed by Twist Bioscience. This technology is widely used for commercial gene synthesis and is well suited to producing the approximately 4.4 kb CFTR coding sequence I require.

Twist’s synthesis method is classified as second-generation high-throughput DNA synthesis and relies on phosphoramidite chemistry miniaturized and parallelized on a silicon microarray chip. Unlike conventional column-based synthesizers that produce one sequence at a time, Twist’s platform synthesizes thousands to millions of oligonucleotides simultaneously on a single silicon wafer. I have chosen this technology because it offers a good balance of throughput, accuracy, and cost-effectiveness for my CFTR project. Additionally, clonal gene products are sequence-verified by next-generation sequencing, ensuring the accuracy critical for downstream therapeutic applications.

The essential steps begin with sequence design and optimization, where my codon-optimized CFTR coding sequence is analyzed for complexity factors like high GC content and repeats. A silicon wafer is then functionalized with a chemical linker to enable oligonucleotide attachment. The core process is parallel oligonucleotide synthesis using phosphoramidite chemistry, where each synthesis cycle involves four reactions: deblocking, coupling, capping, and oxidation. These cycles occur across thousands of features on the chip, and high stepwise coupling efficiency is essential because coupling failures accumulate over the length of each oligonucleotide. Once synthesis is complete, the oligonucleotides are chemically cleaved and eluted from the silicon surface. For my full-length CFTR gene, these oligonucleotides undergo amplification and assembly through PCR-based methods and Gibson Assembly to join fragments into the complete 4.4 kb coding sequence. Finally, rigorous quality control and sequence verification using next-generation sequencing confirm accuracy before delivery.
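The effect of stepwise coupling efficiency on oligonucleotide length can be illustrated with a short calculation. This is a simplified sketch: it assumes the first nucleotide is already attached to the linker (so an n-mer needs n - 1 couplings) and that every coupling succeeds independently with the same probability. The efficiency values and lengths are illustrative assumptions, not Twist’s published figures.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of oligos expected to be full length after (length_nt - 1) coupling steps."""
    return coupling_efficiency ** (length_nt - 1)


# Illustrative (assumed) per-step efficiencies and oligo lengths.
for efficiency in (0.99, 0.995, 0.999):
    for length in (60, 120, 300):
        fraction = full_length_fraction(length, efficiency)
        print(f"efficiency {efficiency:.3f}, {length} nt: {fraction:.1%} full length")
```

Because the yield falls off exponentially with length, a multi-kilobase sequence such as CFTR is built by assembling many shorter oligonucleotides rather than by synthesizing it in one piece.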

Regarding limitations, speed remains a factor with standard turnaround of approximately 10 business days. Accuracy is excellent at the per-base level, but cumulative error rates for sequences of several thousand base pairs range from 0.3% to 1.4%, which is why NGS verification is essential. For scalability, the platform excels at producing many sequences in parallel but has inherent limits on individual sequence length. Twist’s Gene Fragments are offered up to 5 kb, which accommodates my CFTR coding sequence but would require additional assembly steps for larger constructs.

5.3 DNA Edit

For DNA editing, I would want to edit the CFTR gene directly in patient-derived cells to permanently correct disease-causing mutations at the source. Unlike my previously described mRNA replacement therapy, which provides a temporary workaround, gene editing offers the potential for a one-time, permanent cure by repairing the endogenous gene and restoring normal CFTR expression under its native regulatory control. This approach would be particularly valuable because it maintains the cell’s natural mechanisms for regulating CFTR expression, rather than relying on exogenous delivery.

The technology I would use to perform these edits is CRISPR-Cas9 for initial exploration, but for therapeutic application I would preferentially employ base editing or prime editing depending on the specific mutation. These represent second-generation CRISPR technologies that offer greater precision and safety for clinical applications. CRISPR-Cas9 is classified as a programmable nuclease system derived from bacterial adaptive immune systems. The essential steps begin with designing a single guide RNA (sgRNA) complementary to the target region adjacent to a protospacer adjacent motif (PAM) sequence. Cas9 then cuts both DNA strands at the target site, and the cell repairs the break either by error-prone end joining, which often introduces small insertions or deletions, or by homology-directed repair when a donor template is supplied. To avoid the errors caused by this double-strand break, I would use base editing, a more precise technology that does not require such breaks. Base editors fuse a catalytically impaired Cas protein (Cas9 nickase) to a deaminase enzyme. For example, for correcting G551D, a point mutation where glycine is replaced by aspartate, I would use an adenine base editor (ABE) that converts an A•T base pair to G•C. The essential steps involve designing an sgRNA that positions the Cas9 nickase so that the target base falls within the editing window, typically around protospacer positions 4-8. The deaminase chemically converts the target adenine to inosine, which is read as guanine during replication.
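The guide design step can be sketched as a simple search. This is only an illustration under several assumptions: a 20-nt protospacer, an NGG PAM (as for SpCas9), positions counted from the PAM-distal end, and a search on the given strand only. The sequence is a toy string, not the real CFTR locus, and the function name is hypothetical.

```python
def find_base_editing_guides(seq, target_index, window=(4, 8), guide_len=20):
    """List protospacers with an NGG PAM that place seq[target_index] inside the editing window.

    Positions are 1-indexed from the PAM-distal (5') end of the protospacer.
    Only the given strand is searched; a real design tool would also scan the
    reverse complement and score off-target sites.
    """
    seq = seq.upper()
    guides = []
    for start in range(len(seq) - guide_len - 3 + 1):
        pam = seq[start + guide_len:start + guide_len + 3]
        if pam[1:] != "GG":
            continue  # not an NGG PAM
        position = target_index - start + 1
        if not window[0] <= position <= window[1]:
            continue  # target base falls outside the editing window
        protospacer = seq[start:start + guide_len]
        in_window = protospacer[window[0] - 1:window[1]]
        # Other adenines in the window could be edited as unwanted bystanders.
        bystanders = in_window.count("A") - (1 if seq[target_index] == "A" else 0)
        guides.append({
            "protospacer": protospacer,
            "pam": pam,
            "target_position": position,
            "bystander_adenines": bystanders,
        })
    return guides


# Toy sequence: the target adenine sits at position 6 of a single candidate protospacer.
toy = "CCC" + "T" * 5 + "A" + "T" * 14 + "TGG" + "CCC"
guides = find_base_editing_guides(toy, target_index=8)
for guide in guides:
    print(guide)
```

Reporting bystander adenines alongside each candidate matters because an adenine base editor can convert any accessible adenine in the window, not only the intended one.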

In terms of preparation, the input components would include a base editor mRNA or protein and the corresponding sgRNA. These components would be delivered to patient-derived bronchial epithelial cells or induced pluripotent stem cells (iPSCs) for ex vivo editing, with the goal of eventually moving to in vivo delivery via lipid nanoparticles or AAV vectors optimized for lung targeting. The limitations of these methods must be carefully considered. Base editing offers higher efficiency and fewer double-strand breaks, but is limited to transition mutations (A→G or C→T) and cannot correct insertions or deletions such as F508del, a three-base deletion for which prime editing would be the better fit. Off-target edits, and unintended edits of neighboring bases within the editing window, would also need to be assessed.

Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork

https://colab.research.google.com/drive/1Gn1gslLg1UgNxQTzmBobHn-Zt3UJypa0#scrollTo=pczDLwsq64mk&line=7&uniqifier=1

https://opentrons-art.rcdonovan.com/?id=44nd2j8trb218x2

Post-Lab Questions

In this study, researchers at Colorado State University demonstrate a novel application of the Opentrons OT-2 liquid handling robot to automate the scale-up of protein crystallization. Protein crystallization is a critical bottleneck in structural biology and the development of protein-based biomaterials, yet traditional manual methods are time-consuming, labor-intensive, and prone to variability between researchers. The team sought to determine whether an affordable, general-purpose liquid handling system could reliably perform the complex steps required for sitting drop vapor diffusion experiments at the 24-well scale.

To achieve this, the researchers developed custom Python scripts to control the OT-2 with precision, enabling it to mix reservoir solutions with precise gradients of precipitants, buffers, and salts before combining small volumes of these mixtures with protein samples on crystallization pedestals. A significant technical hurdle was the incompatibility of the non-standard Hampton Research CrysChem 24-well plates with the OT-2’s deck. The team overcame this by designing and 3D-printing a custom adapter to securely hold the plates, demonstrating the flexibility and customizability of the Opentrons platform. All protocols, scripts, and design files were made openly available on GitHub.

The automated system was successfully validated using two different proteins: hen egg white lysozyme (HEWL), a model protein, and an engineered periplasmic protein from Campylobacter jejuni, which the lab uses as a biomaterial for nanotechnology applications. The OT-2 produced crystals for both proteins, with the HEWL trial yielding crystals in 18 out of 24 wells within 24 hours. When compared directly to manual plate setup by multiple researchers with varying experience levels, the OT-2 produced crystals more consistently, reducing well-to-well variability. While the robot took slightly longer to set up plates than an experienced scientist (approximately 30-40 minutes), it freed the researcher to focus on other tasks. Accuracy testing revealed that while pipetting errors increased with viscosity (reaching 13.5% for 1 μL of 100% glycerol), errors were minimal for more moderate viscosity solutions commonly used in crystallization screens.

Figure 4 shows the proof-of-concept sitting drop plates prepared with food coloring to visualize correct mixing. The OT-2 prepared a plate with a blue gradient increasing from left to right and a red gradient increasing from top to bottom, compared to an identical plate prepared manually. No differences were discernable between the robot-prepared and manually prepared plates, confirming the OT-2’s ability to accurately dispense and mix liquids according to the programmed gradients.

fig 4

Figure 6 presents the final results of the human versus OT-2 HEWL plate preparation comparison. The robot demonstrated more consistent crystal production across multiple plates compared to human researchers with varying levels of wet lab experience, highlighting the automation advantage in reducing person-to-person variability.

fig 6

DeRoo, J. B., Jones, A. A., Slaughter, C. K., Ahr, T. W., Stroup, S. M., Thompson, G. B., & Snow, C. D. (2025). Automation of protein crystallization scaleup via Opentrons-2 liquid handling. SLAS Technology, 32, 100268. https://doi.org/10.1016/j.slast.2025.100268

2.

I intend to develop an automated workflow to study microbial resistance patterns in pathogens isolated from cystic fibrosis (CF) patients in Ecuador. Cystic fibrosis patients are particularly susceptible to chronic respiratory infections, often involving multi-drug resistant organisms that require precise susceptibility testing to guide clinical treatment. My project will leverage the Opentrons OT-2 liquid handling robot to automate minimum inhibitory concentration (MIC) testing, enabling high-throughput screening of patient isolates against a panel of clinically relevant antibiotics.

This automated workflow will enable systematic resistance surveillance in a vulnerable patient population while demonstrating the power of open-source automation tools in resource-limited settings. By combining the affordability of the Opentrons platform with custom 3D-printed adapters and open-source Python scripts, this project provides a template for laboratories in Ecuador and similar settings to implement sophisticated antimicrobial susceptibility testing without prohibitive equipment costs. Importantly, by automating the most labor-intensive and error-prone steps of MIC testing, this approach frees skilled microbiologists to focus on interpretation and patient care rather than repetitive pipetting. Ultimately, this work aims to improve clinical outcomes for cystic fibrosis patients in Ecuador by providing faster, more reliable, and more comprehensive susceptibility data to guide antibiotic therapy decisions in an era of rising antimicrobial resistance.

Sample Processing Module: After patient sputum samples are cultured on selective media and individual colonies are isolated, the automated workflow begins. For each confirmed pathogen, a single colony is inoculated into Mueller-Hinton broth in a culture tube and placed in the shaking incubator at 35°C until logarithmic growth phase is reached.

MIC Testing Workflow: The antibiotic susceptibility testing will follow established high-throughput MIC protocols using 96-well plates. Using the Ginkgo Nebula platform for experimental design, I will create a randomized block design to test multiple patient isolates against a panel of 8-10 antibiotics commonly used in Ecuadorian clinical settings. The OT-2 will perform serial dilutions of antibiotic stocks directly in the 96-well plates, followed by inoculation with standardized bacterial suspensions normalized to a 0.5 McFarland standard. This automation reduces the manual pipetting errors that often plague MIC testing and ensures consistent dilution series across all samples.
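The serial dilution step can be planned in plain Python before it is translated into robot commands. The sketch below is only a planning aid and does not use the Opentrons API. Its layout is an assumption of mine: one antibiotic per row, ten two-fold dilutions in columns 1-10, a growth control in column 11, and a sterility control in column 12. The antibiotic names and the 64 µg/mL top concentration are placeholders.

```python
def twofold_dilution_series(top_conc_ug_ml, n_steps):
    """Concentrations for a two-fold serial dilution, highest first."""
    return [top_conc_ug_ml / (2 ** step) for step in range(n_steps)]


def plan_mic_plate(antibiotics, top_conc_ug_ml=64.0, n_dilutions=10):
    """Map 96-well plate wells to (antibiotic, concentration in ug/mL).

    Assumed layout: rows A-H hold one antibiotic each, columns 1..n_dilutions
    hold the dilution series, column 11 is a growth control (no antibiotic),
    and column 12 is a sterility control (no inoculum).
    """
    if len(antibiotics) > 8:
        raise ValueError("A 96-well plate has only 8 rows (A-H); split the panel across plates")
    if n_dilutions > 10:
        raise ValueError("Columns 11 and 12 are reserved for controls")
    series = twofold_dilution_series(top_conc_ug_ml, n_dilutions)
    layout = {}
    for row, drug in zip("ABCDEFGH", antibiotics):
        for column, conc in enumerate(series, start=1):
            layout[f"{row}{column}"] = (drug, conc)
        layout[f"{row}11"] = ("growth control", None)
        layout[f"{row}12"] = ("sterility control", None)
    return layout


# Placeholder antibiotic names; the real panel would come from the study design.
layout = plan_mic_plate(["antibiotic_1", "antibiotic_2"])
for well in ("A1", "A2", "A10", "A11", "A12"):
    print(well, layout[well])
```

Since one plate holds at most eight antibiotics in this layout, a panel of 8-10 antibiotics would need a second plate or a different arrangement, which is worth deciding before writing the robot protocol.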

Python Control Scripts: The OT-2 will run custom Python scripts incorporating deck layout definitions, tip tracking, and error handling. Techniques from the crystallization paper, such as tip touching for complete dispensing and slow aspirate speeds for foaming antibiotics, will be included. Scripts will include pause points for manual interventions like transferring plates to the incubator.

This automated workflow enables systematic resistance surveillance in a vulnerable population while demonstrating open-source automation in resource-limited settings. By combining the affordable Opentrons platform with existing lab infrastructure and custom 3D-printed adapters, this project provides a template for laboratories in Ecuador to implement sophisticated susceptibility testing without prohibitive costs. The system can process dozens of isolates simultaneously with minimal hands-on time, freeing microbiologists for interpretation and patient care. For Ecuador, where antimicrobial resistance is an emerging threat but surveillance resources are limited, this open-source approach democratizes access to high-quality testing.

Ultimately, this work aims to improve clinical outcomes for cystic fibrosis patients by providing faster, more reliable susceptibility data to guide antibiotic therapy. By establishing baseline resistance patterns and tracking changes over time, the project can inform empiric treatment guidelines and contribute to global antimicrobial resistance surveillance. All protocols, scripts, and design files will be openly available for adaptation by other laboratories facing similar challenges.