Week 1 HW: Principles and Practices
1. Biological Engineering Application
This project proposes the development of an intestinal spheroid culture platform derived from cell lines (e.g., Caco-2 spheroid or organoid-like cultures), combined with multi-omics profiling (transcriptomics, proteomics, and metabolomics) and computational modeling using systems biology and machine-learning approaches. The platform is intended to support research on drug absorption, inflammatory bowel disease (IBD) diagnostics, and predictive analysis of treatment outcomes. Initially, the system will be used to generate hypotheses from experimental data, with the long-term goal of becoming a predictive research tool.
Platform Workflow
- 3D Intestinal Model Generation: Establishment of Caco-2-derived 3D epithelial cultures to model intestinal barrier function.
- Experimental Perturbation: Exposure of cultures to inflammatory signals, drug compounds, or microbiota-related metabolites.
- Multi-Omics Acquisition: Collection of transcriptomic, proteomic, and metabolomic data to capture cellular responses.
- Data Processing and Integration: Quality control, normalization, and integration of omics datasets using reproducible bioinformatics pipelines.
- Computational Modeling: Application of systems biology and machine-learning approaches to identify patterns and generate hypotheses.
- Validation and Iteration: Experimental validation of model predictions through iterative testing.
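The data-integration step of this workflow can be sketched in Python. The z-score normalization and per-sample feature concatenation below are illustrative assumptions, not the project's actual pipeline, and the layer names and sample IDs are hypothetical:

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize a list of measurements to mean 0, stdev 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def integrate_omics(layers):
    """Concatenate z-scored features from each omics layer per sample.

    `layers` maps layer name -> {sample_id: [feature values]}.
    Returns {sample_id: combined feature vector}.
    """
    samples = {}
    for layer_name, data in layers.items():
        sample_ids = sorted(data)
        n_features = len(data[sample_ids[0]])
        # Normalize each feature across samples, then append per sample.
        for j in range(n_features):
            col = zscore([data[sid][j] for sid in sample_ids])
            for sid, v in zip(sample_ids, col):
                samples.setdefault(sid, []).append(v)
    return samples

# Toy example: two transcriptomic features, one proteomic feature.
layers = {
    "transcriptomics": {"s1": [10.0, 200.0], "s2": [12.0, 180.0], "s3": [8.0, 220.0]},
    "proteomics":      {"s1": [1.5],         "s2": [1.1],         "s3": [1.9]},
}
combined = integrate_omics(layers)
print(len(combined["s1"]))  # 3 features: 2 transcriptomic + 1 proteomic
```

In a real pipeline, each layer would first pass its own quality-control and normalization steps (as noted in the workflow) before integration.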
2. Governance Framework
Governance Objectives
The project integrates governance principles to ensure safe, transparent, and equitable use of the technology.
Scientific and Clinical Safety
- Implement staged validation protocols before diagnostic use.
- Establish quality-control standards for omics data.
- Limit early platform use to research contexts.
- Document uncertainty in predictive models.
Biological Data Protection
- Anonymize patient-derived data.
- Comply with research ethics and data protection regulations.
- Implement controlled access to datasets and software.
- Maintain traceability of samples and analyses.
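A minimal sketch of how anonymization and traceability could coexist, assuming a keyed hash (HMAC) for pseudonyms and an in-memory audit log; the key name, sample ID format, and log structure are all hypothetical:

```python
import hashlib
import hmac

# Hypothetical secret held by the institutional data steward; in practice it
# would never be stored alongside the dataset itself.
SECRET_KEY = b"institutional-secret"

def pseudonymize(sample_id: str) -> str:
    """Derive a stable, non-reversible pseudonym for a sample identifier.

    Keyed hashing (HMAC) resists dictionary attacks against guessable IDs,
    while determinism keeps the same sample traceable across analyses.
    """
    return hmac.new(SECRET_KEY, sample_id.encode(), hashlib.sha256).hexdigest()[:12]

audit_log = []  # traceability: who accessed which pseudonymized sample

def record_access(user: str, pseudonym: str) -> None:
    audit_log.append({"user": user, "sample": pseudonym})

pid = pseudonymize("PATIENT-0042")
record_access("analyst_a", pid)
print(pid, len(audit_log))
```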
Responsible Use of Predictive Models
- Design software as a research-support tool.
- Include confidence and uncertainty metrics in predictions.
- Validate models with independent datasets.
- Avoid automated decision-making without human supervision.
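One simple way to attach confidence and uncertainty metrics to predictions, and to route uncertain cases to a human, is a bootstrap ensemble. The toy mean-predictor and the review threshold below are illustrative assumptions standing in for whatever model the platform eventually uses:

```python
import random
from statistics import mean, stdev

def bootstrap_predictions(train_y, n_models=200, seed=0):
    """Toy bootstrap ensemble: each 'model' predicts the mean of a resample.

    The spread of the ensemble's predictions serves as a simple
    uncertainty estimate for the point prediction.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        resample = [rng.choice(train_y) for _ in train_y]
        preds.append(mean(resample))
    return preds

def predict_with_uncertainty(train_y, threshold=0.5):
    preds = bootstrap_predictions(train_y)
    estimate, spread = mean(preds), stdev(preds)
    needs_review = spread > threshold  # route uncertain cases to a human
    return estimate, spread, needs_review

est, unc, review = predict_with_uncertainty([2.1, 2.3, 1.9, 2.0, 2.2])
print(f"prediction = {est:.2f} +/- {unc:.2f}; human review: {review}")
```

The key design point is the last flag: predictions never leave the tool without an uncertainty estimate, and high-uncertainty cases are explicitly marked for human supervision rather than acted on automatically.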
Equity and Access
- Promote open-source computational tools.
- Design scalable experimental protocols.
- Encourage collaboration with public institutions.
- Document methodologies for technology transfer.
3. Governance Actions
- Stage-based validation requirement: Restrict initial platform use to research applications until validation standards are met. In the early stages, use cell lines as a working model (3D spheroids/organoids).
- Controlled access data management: Use public databases to triangulate working hypotheses. Implement anonymized datasets with institutional oversight and traceability.
- Transparent computational workflows: Share bioinformatics processes and documentation through reproducible research practices.
Prioritization Strategy
The project prioritizes a combination of staged validation protocols and open, reproducible computational standards. These actions balance scientific safety with research feasibility and transparency. Controlled-access data infrastructure will be implemented progressively when human biological samples are incorporated.
4. Rating of Governance Actions
The following table summarizes the evaluation of governance options according to course criteria.
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance biosecurity | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 1 | 2 |
| Foster lab safety | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 1 | 2 |
| Protect the environment | | | |
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 2 | 2 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility | 2 | 2 | 1 |
| • Not impeding research | 2 | 3 | 1 |
| • Promoting constructive applications | 1 | 1 | 1 |
Governance Actions: Comparison Summary
Stage-based validation requirement (Option 1)
- Biosecurity: 1 (prevention) / 2 (response)
- Lab safety: 1 (prevention) / 2 (response)
- Environment: 2 (prevention) / 2 (response)
- Feasibility: 2
Controlled access data management (Option 2)
- Biosecurity: 2 (prevention) / 1 (response)
- Lab safety: 2 (prevention) / 1 (response)
- Environment: 2 (prevention) / 2 (response)
- Feasibility: 2
Transparent computational workflows (Option 3)
- Biosecurity: 2 (prevention) / 2 (response)
- Lab safety: 2 (prevention) / 2 (response)
- Environment: 3 (prevention) / 3 (response)
- Feasibility: 1
5. Strategies for an Ethical Biological Future
Based on the scoring above, the priority would be a combination of Option 1 (stage-based validation requirement) and Option 3 (transparent, reproducible computational workflows). Together, these actions balance scientific safety with research feasibility. Validation protocols reduce the risk of incorrect interpretation or premature diagnostic use, while reproducible computational workflows promote transparency, collaboration, and constructive scientific applications without significantly increasing costs.

The project will follow validation-driven research practices, responsible data governance, and open computational workflows. Periodic ethical evaluation will accompany platform development to identify risks and support responsible translation into diagnostic or predictive applications.
Ethical Reflection and Protocol Standardization
To improve reproducibility and reliability, the project will:
- Implement standard operating procedures (SOPs).
- Validate and benchmark protocols across experiments.
- Use shared documentation and version control for methods.
Assignment (Week 2 Lecture Prep) - DUE BY START OF FEB 10 LECTURE
Professor Jacobson’s homework questions:
- Nature’s machinery for copying DNA is called polymerase. What is the polymerase error rate? How does this compare to the length of the human genome? How does biology address this discrepancy?

DNA polymerase with proofreading activity (3′-5′ exonuclease) has an approximate error rate of 1 in 10⁶ nucleotides incorporated.
The human genome has approximately 3.2 × 10⁹ base pairs, so if only polymerase fidelity existed, thousands of errors would be introduced per complete genome replication. Biology resolves this discrepancy through multiple levels of error correction, for example:
- polymerase proofreading
- DNA repair systems (e.g., mismatch repair such as MutS)
- redundancy and robustness of the biological system

Together, these mechanisms reduce the effective error rate to levels compatible with genome stability.
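The discrepancy can be made concrete with a quick calculation. The thousand-fold improvement from mismatch repair assumed below is a rough order of magnitude, not a measured figure:

```python
# Expected errors per genome replication at different fidelity stages.
genome_bp = 3.2e9
polymerase_only = 1e-6       # error rate with proofreading, per nucleotide
with_mismatch_repair = 1e-9  # assuming ~1000x further reduction from repair

errors_polymerase = genome_bp * polymerase_only
errors_repaired = genome_bp * with_mismatch_repair

print(f"polymerase alone: ~{errors_polymerase:.0f} errors per replication")
print(f"after mismatch repair: ~{errors_repaired:.1f} errors per replication")
```

So proofreading alone would still leave thousands of errors per replication; the downstream repair layers are what bring the count down to a handful.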
- How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons why all these different codes fail to encode the protein of interest?
The genetic code has 64 codons for 20 amino acids, which makes it degenerate. According to the slides, an average human protein is approximately 1036 bp, or about 345 amino acids. If each amino acid can be encoded by an average of 3 codons, the number of possible sequences is approximately 3^345 (on the order of 10¹⁶⁴), an extremely large number of possible DNA sequences that code for the same protein. Why don't all of these sequences work in practice? Many variants perform poorly due to biological and physical constraints, for example:
- codon bias and translation efficiency (different codons for the same amino acid)
- GC content and DNA stability
- DNA/RNA secondary structures
- unwanted regulatory signals
- repeats or sequences that are difficult to synthesize
- mRNA stability

In other words, even though the genetic code is redundant, not all equivalent sequences are functionally equivalent.
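The degeneracy estimate above can be checked directly from the standard genetic code table (61 sense codons for 20 amino acids):

```python
import math

# Degeneracy of the standard genetic code: sense codons per amino acid.
codons_per_aa = {
    "Leu": 6, "Ser": 6, "Arg": 6, "Ala": 4, "Gly": 4, "Pro": 4,
    "Thr": 4, "Val": 4, "Ile": 3, "Asn": 2, "Asp": 2, "Cys": 2,
    "Gln": 2, "Glu": 2, "His": 2, "Lys": 2, "Phe": 2, "Tyr": 2,
    "Met": 1, "Trp": 1,
}

avg_degeneracy = sum(codons_per_aa.values()) / len(codons_per_aa)  # 61/20

# Approximate count of DNA sequences encoding a 345-aa protein,
# assuming ~3 synonymous codons per position on average.
n_aa = 345
log10_count = n_aa * math.log10(3)
print(f"average degeneracy: {avg_degeneracy:.2f} codons per amino acid")
print(f"3^{n_aa} is roughly 10^{log10_count:.0f} possible coding sequences")
```

The exact count for a real protein is the product of the per-residue degeneracies, but the 3-codons-per-position average gives the right order of magnitude.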
Dr. LeProust’s Homework Questions:
- What is the most commonly used method for oligonucleotide synthesis?
- Why is it difficult to produce oligonucleotides longer than 200 nucleotides by direct synthesis?
- Why can’t a 2000 bp gene be created by direct oligonucleotide synthesis?
The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite synthesis. The process is cyclic, adding one nucleotide per cycle, and each coupling step has an efficiency slightly below 100%. Because these per-cycle losses compound, truncated and error-containing products accumulate with sequence length, making it difficult to produce oligonucleotides longer than approximately 200 nucleotides by direct synthesis. For the same reason, a 2000 base pair gene cannot be synthesized directly as a single oligonucleotide: the accumulation of errors and truncated products would be too high. In practice, long genes are constructed by assembling multiple shorter oligos using molecular assembly methods (e.g., PCR assembly or Gibson assembly).
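The geometric decay of full-length yield makes the ~200 nt limit concrete. The 99% per-cycle coupling efficiency assumed below is a typical textbook figure, not a measured value for any particular instrument:

```python
# Full-length yield of direct synthesis falls geometrically with length,
# assuming a fixed per-cycle coupling efficiency.
def full_length_yield(n_cycles: int, efficiency: float = 0.99) -> float:
    return efficiency ** n_cycles

for length in (50, 100, 200, 2000):
    print(f"{length:>5} nt: {full_length_yield(length):.2%} full-length product")
```

At 200 nt roughly an eighth of the material is full-length, which is still workable after purification; at 2000 nt the full-length fraction is effectively zero, which is why assembly from short oligos is required.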
George Church’s Homework Question:
Choose ONE of the following three questions to answer; and cite any AI prompts or paper citations used.
- [Using Google Slide #4 and Professor Church] What are the 10 essential amino acids in all animals? And how does this affect your view of the "Lysine Contingency"?
The ten essential amino acids in animals are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine (essential in many animals, though only conditionally essential in adult humans). These amino acids must be obtained from the diet because animals cannot synthesize them. In relation to the "Lysine Contingency," lysine becomes a critical control point in bioengineering because its metabolic availability can be used as a control mechanism or biological dependency in synthetic systems. This illustrates how natural metabolic constraints can be exploited as biocontainment or functional control strategies in synthetic biology.
- [Given slides #2 and 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?
A possible code for AA:AA interactions could be based on physicochemical complementarity (e.g., charge, hydrophobicity, and size), analogous to how NA:NA interactions rely on base pairing and AA:NA interactions rely on codon recognition. This is because amino acid-amino acid interactions are determined primarily by chemical and structural complementarity rather than by a fixed symbolic code, unlike nucleic acid base pairing.
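A toy version of such a physicochemical code can be sketched as a pair-scoring function. The scoring rule (opposite charges attract, similar hydrophobicity packs well) and its weighting are illustrative assumptions, not an established code; the hydrophobicity values are from the Kyte-Doolittle scale:

```python
# Illustrative (hypothetical) AA:AA pairing score based on physicochemical
# complementarity: opposite charges attract, similar hydrophobicity packs well.
charge = {"K": +1, "R": +1, "H": +1, "D": -1, "E": -1}
hydrophobicity = {  # Kyte-Doolittle values for a few residues
    "I": 4.5, "V": 4.2, "L": 3.8, "A": 1.8, "G": -0.4,
    "S": -0.8, "K": -3.9, "R": -4.5, "D": -3.5, "E": -3.5, "H": -3.2,
}

def pair_score(a: str, b: str) -> float:
    """Higher = more 'complementary' under this toy code."""
    electrostatic = -charge.get(a, 0) * charge.get(b, 0)  # +1 if opposite charges
    hydro_match = -abs(hydrophobicity.get(a, 0) - hydrophobicity.get(b, 0)) / 9.0
    return electrostatic + hydro_match

print(f"K-E (salt bridge):      {pair_score('K', 'E'):+.2f}")
print(f"I-L (hydrophobic core): {pair_score('I', 'L'):+.2f}")
print(f"K-R (like charges):     {pair_score('K', 'R'):+.2f}")
```

Under this sketch a salt bridge (K-E) scores highest and a like-charge pair (K-R) lowest, which matches the intuition that an AA:AA "code" would be continuous and energetic rather than symbolic like base pairing.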