Week 1 HW: Principles and Practices
Biological Engineering Application: A Functional Expression Platform for Classifying Variants of Uncertain Significance (VUS) in Cystic Fibrosis
1. My proposal is to develop a standardized, accessible platform for the functional expression and characterization of variants in the CFTR gene, specifically focusing on Variants of Uncertain Significance (VUS) identified in underrepresented populations in Ecuador.
The core application integrates three components: rapid synthesis and cloning of CFTR gene constructs containing locally identified VUS; expression of these constructs in relevant, standardized human cell lines (e.g., immortalized bronchial epithelial cells); and a multiparametric, high-content functional assay suite.
This suite would automatically measure key parameters: subcellular localization via microscopy, protein processing and maturation via western blot, ion channel function via fluorescence-based halide assays, and response to CFTR modulator drugs.
The motivation for this tool is to address a critical equity gap in genomic medicine. Global genetic databases like ClinVar are heavily biased toward populations of European descent. Consequently, variants identified in mestizo, Indigenous, or Afro-descendant populations in Ecuador and similar regions are often classified as VUS due to a lack of functional and phenotypic data.
This has direct, harmful clinical consequences: it leads to ambiguous diagnoses for patients and families, restricts access to modern, highly effective modulator therapies that are approved for specific variants, and perpetuates health disparities.
This platform aims to functionalize genomics by converting raw genetic data into actionable clinical knowledge, deliberately prioritizing variants from historically underserved populations to ensure that precision medicine benefits extend globally.
2. Describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
The primary governance goal is to ensure that the VUS characterization platform maximizes clinical benefit and equity for underrepresented populations while ethically managing the risks inherent in generating new genomic knowledge. The first specific sub-goal is ethical and equitable variant prioritization. This involves establishing criteria and processes to guarantee that the variants selected for study are those that originate from and will first benefit populations with the greatest need and the least representation in global databases. The second sub-goal is responsible and accessible knowledge translation. This ensures that functional characterization results are effectively, rapidly, and freely translated into improvements in the clinical management of the very patients from whom the variants were sourced. The third sub-goal is protecting privacy and fostering community agency. This focuses on safeguarding patient genetic data and actively involving patient communities in decision-making processes regarding data use and study design, moving beyond mere consent to meaningful partnership.
3. Describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
Action 1: A Community-Engaged Variant Prioritization Framework
The purpose of this action is to shift from the current common practice, where research variants are often selected for scientific convenience or commercial interest, to a proposed model where a multi-stakeholder committee prioritizes variants based on clinical need and equity. The design requires establishing a “VUS Ethical Prioritization Committee” for the project. This committee would include representatives from Ecuadorian CF patient associations, local clinical geneticists and pulmonologists, bioethicists, and the project scientists. The committee would employ a scoring system that combines factors such as the VUS frequency in local undiagnosed cohorts, the clinical severity of the phenotype in carrier patients, the underrepresentation of the patient’s ethnic group in databases, and the variant’s potential for reclassification to grant access to existing therapies. Key actors involved are patient advocacy groups (providing the need perspective), public hospitals/health institutions (providing anonymized clinical data), and funders (e.g., SENESCYT and the NIH), who could require this framework as a grant condition. This action relies on several assumptions: that patient groups and local clinicians have the capacity and time for meaningful participation; that clinical data can be shared securely and ethically for prioritization purposes; and that “need” can be consensually defined and quantified. A key risk of failure is that the committee becomes a bureaucratic formality without real power, with scientists ultimately choosing variants that are more “publishable” in high-impact journals, thereby ignoring the equity-driven prioritization. An unintended consequence of success could be that pressure to study only the most urgent variants leads to the neglect of rarer variants in patients with milder symptoms, creating a new form of bias. It could also generate conflict among communities or families competing to have their variant studied first.
Action 2: A Legally Binding “Clinical Translation Commitment” in Collaboration Agreements
This action’s purpose is to move beyond the current common practice where functional data from research projects often takes years—if ever—to reach a patient’s medical record. It proposes a model with a clear, obligatory pathway for variant reclassification and communication. The design requires a contractual appendix in all collaboration agreements (between universities, hospitals, and funders) and informed consent forms. This legal instrument would mandate three things: an obligation for the research team to formally submit conclusive functional data to clinical databases such as ClinVar and the local lab’s variant review committee; a pre-approved protocol for returning actionable results to the treating physician and, if agreed upon, the patient/family when a VUS is reclassified; and non-exclusive licensing of any findings to prevent patents from restricting diagnostic use in Ecuador’s public health system. Primary actors include university technology transfer lawyers who draft the clause, research ethics committees who mandate it for approval, and forward-thinking funders (like the Wellcome Trust) who promote open science and benefit-sharing policies. Critical assumptions here are that the functional data will be robust enough to justify clinical reclassification, that mechanisms and personnel exist within the local health system to receive and act on this information, and that researchers are willing to accept these potential constraints on commercialization. This action could fail if the clauses are vague and unenforceable, if the ClinVar submission process is slow and bureaucratic, or if there is no clear clinical partner to receive the information. An unintended success risk is that the obligation to return results—especially for reclassified benign variants—could generate unnecessary anxiety or create high logistical costs.
It might also inadvertently deter future researchers or companies from collaborating with resource-limited settings if they perceive these clauses as overly restrictive.
Action 3: A Technical Standard for “Algorithm Auditing” of Pathogenicity Predictors
The purpose of this action is to mitigate the risk that the functional data generated by the platform could inadvertently reinforce existing biases in bioinformatic prediction algorithms (like PolyPhen-2, SIFT). These tools, trained predominantly on European genomic data, are known to be less accurate for other populations. The design involves creating an “Ethical Reference Dataset” from the platform’s results. Each functionally characterized variant (pathogenic, benign, residual function) would be used to “audit” standard prediction algorithms. The process would involve systematically publishing statements in open repositories: “Variant CFTR p.XYZ, common in the Andean mestizo population, was characterized as pathogenic with a processing defect. However, algorithm [PolyPhen-2] predicted it as ‘benign’ (score: 0.1). This suggests an underestimation bias for pathogenic variants from this population group.” Key actors are the project scientists who generate and publish the audit data, bioinformatics organizations (like the gnomAD consortium) that incorporate these findings to retrain or flag their tools’ limitations, and scientific journals that could require such bias analyses for publication of variant characterization studies. This action assumes that the functional assays are a reliable “gold standard,” that algorithm developers are open to critique and improving their tools, and that a sufficient number of variants will be characterized to perform statistically meaningful bias analysis. It risks failure if the audit data is ignored by the dominant bioinformatics community, rendering the effort inconsequential. A significant unintended consequence of success could be the generation of widespread, paralyzing distrust in all computational prediction tools, potentially delaying diagnosis in settings where functional assays are unavailable.
Furthermore, the data on population-specific algorithmic bias could be misappropriated to support essentialist political arguments about biological differences between groups.
4.
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 2 | 3 | 3 |
| • By helping respond | 3 | 2 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 3 | 2 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 3 | 3 | 3 |
| • By helping respond | 3 | 3 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility | 1 | 2 | 2 |
| • Not impede research | 1 | 3 | 1 |
| • Promote constructive applications | 1 | 2 | 1 |
5. Based on the scoring matrix, I recommend a prioritized combination of Action 1 (Community-Engaged Prioritization Framework) and Action 3 (Algorithm Auditing Standard) for immediate adoption and funding by national research funding agencies, such as the U.S. National Institutes of Health and its counterparts in Ecuador and other underrepresented regions. This combination best advances the core ethical goals of equity, constructive application, and responsible knowledge generation with minimal burden to research progress. Action 1 directly ensures the research agenda addresses the most pressing clinical needs of underserved populations, scoring highest in feasibility, promoting constructive applications, and not impeding research. Action 3 complements this by systematically correcting the global bioinformatic biases that initially created the inequity in VUS classification, scoring best in minimizing burdens and also promoting constructive applications. The primary trade-off in prioritizing this combination is accepting a weaker direct link to traditional biosecurity and lab safety incident prevention, as these actions are focused on ethical oversight and data quality rather than physical containment or misuse deterrence. A critical assumption is that fostering a more equitable and accurate genomic research ecosystem is itself a foundational form of security, preventing harm caused by diagnostic errors and denied care. The major uncertainty lies in implementation: whether community committees can be resourced effectively for meaningful engagement (Action 1) and whether major algorithm developers will voluntarily adopt audit findings to retrain their models (Action 3). Action 2 (Legal Commitment), while crucial for ultimate clinical translation, scores poorly on burden and feasibility; it should be developed in parallel as a longer-term policy goal, informed by the pilot frameworks and trust built through Actions 1 and 3.
Assignment (Week 2 Lecture Prep)
Homework Questions from Professor Jacobson
1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy? According to the slides, the error rate of a standard polymerase used in gene synthesis is cited as 1 in 10⁴ bases. The human genome is approximately 3.2 billion base pairs (3.2 Gbp). If uncorrected, such an error rate would introduce hundreds of thousands of mutations per genome replication, making stable heredity and cellular function impossible. Biology resolves this catastrophic discrepancy not through polymerase perfection alone, but by employing a layered system of enzymatic correction. The presentation highlights a key component, the MutS repair system: a dedicated mismatch repair complex that scans and fixes errors after replication. This biological strategy combines proofreading by the polymerase itself with dedicated repair pathways to achieve the net high fidelity required for life.
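The arithmetic behind that "hundreds of thousands of mutations" claim is a one-line calculation, sketched below with the figures cited above (the 1 in 10⁴ rate is the slide's value for an uncorrected polymerase; proofreading plus mismatch repair bring the net cellular rate many orders of magnitude lower):

```python
# Rough fidelity-gap arithmetic: expected uncorrected errors per
# replication of a human genome, using the figures cited in the text.
error_rate = 1e-4      # errors per base, uncorrected polymerase (slide value)
genome_size = 3.2e9    # human genome length in base pairs

errors_per_replication = error_rate * genome_size
print(f"Uncorrected errors per replication: {errors_per_replication:,.0f}")
# 3.2e9 * 1e-4 = 320,000 errors, i.e. hundreds of thousands per copy
```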
2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest? The presentation provides the key figure needed: an average human protein is encoded by 1,036 base pairs. Given the degeneracy of the genetic code, the number of different DNA sequences that can specify the identical amino acid chain is astronomically large, calculated by multiplying the number of synonymous codon choices at each position. However, the presentation powerfully illustrates why, in practice, these myriad synonymous codes are not functionally equivalent. The core reason is the formation of specific messenger RNA (mRNA) secondary structures, dictated by the precise nucleotide sequence through RNA:RNA interactions. Therefore, successful gene design requires selecting a nucleotide sequence that not only encodes the correct protein but also folds into an mRNA structure compatible with the cellular translation machinery.
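The product-of-synonymous-choices calculation can be sketched in Python. The degeneracy counts are those of the standard genetic code; the short example peptide and the ~3-codons-per-residue average are illustrative assumptions, not figures from the slides:

```python
# Number of distinct DNA sequences encoding a peptide = product of the
# synonymous-codon counts at each position (standard genetic code).
from math import prod

CODONS_PER_AA = {  # degeneracy of the standard genetic code
    'M': 1, 'W': 1, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'H': 2, 'K': 2,
    'N': 2, 'Q': 2, 'Y': 2, 'I': 3, 'A': 4, 'G': 4, 'P': 4, 'T': 4,
    'V': 4, 'L': 6, 'R': 6, 'S': 6,
}

def num_encodings(peptide: str) -> int:
    """Count the DNA sequences that encode this exact amino acid chain."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)

print(num_encodings("MSR"))  # 1 * 6 * 6 = 36 for a toy tripeptide
# For a ~345-residue protein (1,036 bp / 3 codons), with roughly 3
# synonymous codons per residue on average, the count is on the order
# of 3**345 -- astronomically large, as stated above.
```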
Homework Questions from Dr. LeProust
1. What’s the most commonly used method for oligo synthesis currently? The most commonly used method for oligo synthesis currently is solid-phase phosphoramidite synthesis on a functionalized silica (SiO₂) support.
2. Why is it difficult to make oligos longer than 200nt via direct synthesis? It is difficult to make oligos longer than 200 nucleotides (nt) via direct synthesis primarily due to the cumulative effect of stepwise coupling inefficiencies. Each cycle in the phosphoramidite synthesis (Slide 3) has a yield of less than 100%. Even a very high per-step coupling efficiency of 99.5% results in a total yield of only about 37% for a 200-mer (0.995^200 ≈ 0.37). This means most of the material synthesized is truncated at various lengths. The primary technical challenges are maintaining exceptionally high coupling yields at every single step and minimizing side reactions over hundreds of cycles.
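The compounding-yield argument above is easy to reproduce numerically; the 99.0% and 99.9% efficiencies below are added for comparison, while the 99.5% case matches the ~37% figure in the text:

```python
# Stepwise coupling-yield arithmetic behind the ~200 nt practical limit:
# full-length yield decays exponentially with oligo length.

def full_length_yield(efficiency: float, length: int) -> float:
    """Fraction of full-length product after `length` couplings."""
    return efficiency ** length

for eff in (0.990, 0.995, 0.999):
    print(f"coupling eff {eff:.1%}: 200-mer yield = "
          f"{full_length_yield(eff, 200):.1%}")
# 0.995**200 ≈ 0.37, i.e. only ~37% of strands reach full length
```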
3. Why can’t you make a 2000bp gene via direct oligo synthesis? You cannot make a 2000bp gene via direct oligo synthesis because the fundamental length limitation of the core phosphoramidite chemistry, as explained above, makes synthesizing a single, continuous 2000nt strand impractical and prohibitively inefficient. Instead, the standard industrial practice is to use oligos as building blocks for assembly. Genes are constructed by synthesizing many shorter, high-quality oligos and then assembling them into longer double-stranded DNA fragments via enzymatic methods like polymerase cycling assembly (PCA) or ligation. This “synthesize then assemble” strategy overcomes the length constraint.
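To give a feel for the "synthesize then assemble" strategy, the sketch below estimates how many short oligo tiles a 2,000 bp gene would require. The 60 nt tile length and 20 nt overlap are illustrative assumptions for a PCA-style design, not parameters from the lecture:

```python
# Rough tiling estimate for assembling a 2,000 bp gene from short
# overlapping oligos (tile length and overlap are assumed values).
import math

gene_len = 2000    # target gene length, bp
oligo_len = 60     # per-oligo synthesis length, nt (assumed)
overlap = 20       # overlap between adjacent tiles, nt (assumed)

step = oligo_len - overlap  # new sequence contributed by each extra tile
tiles_per_strand = math.ceil((gene_len - oligo_len) / step) + 1
print(f"Oligos per strand: {tiles_per_strand}; "
      f"both strands: {2 * tiles_per_strand}")
# (2000 - 60) / 40 = 48.5 -> 49 extra tiles + 1 = 50 oligos per strand
```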
Homework Question from George Church
1. What code would you suggest for AA:AA interactions? For AA:AA interactions, I would suggest the codon pair AUG CGA → M:R (methionine:arginine).