Week 1 HW: Principles and Practices

THE CLASS ASSIGNMENT

Biological Engineering Application / Tool to Be Developed

The biological engineering application I aim to develop is an integrated CRISPR–bioinformatics platform for modeling and early-stage testing of therapies for neurodegenerative diseases, particularly Alzheimer’s disease, by utilizing genetically engineered neuronal cells carrying patient-specific genetic variations (SNPs).

This platform integrates:

CRISPR-Cas9 technology to perform gene editing in neuronal cells (for example, targeting genes associated with Alzheimer’s disease),
Molecular docking and ADME prediction to screen candidate drug compounds derived from natural products, and
Bioinformatics analysis to predict molecular effects and the signaling pathways involved.

The motivation for developing this tool is that current Alzheimer’s therapies remain limited and expensive. Genetically modified neuron-based in vitro models can improve the accuracy of drug efficacy predictions, while the integration of computational approaches with wet-lab biology can reduce reliance on animal testing and accelerate the drug discovery process.

Governance / Policy Goals for an Ethical Future

The primary governance goal is to ensure that this technology is safe, not misused, equitable, and responsibly applied.

Overarching Goal

To ensure the use of genetic engineering for human health without creating biological, social, or environmental risks.

Sub-Goals

Non-maleficence (Harm Prevention)
- Prevent the misuse of CRISPR technology for non-medical or harmful purposes.
- Avoid the release of genetically engineered cells into the environment.
Safety and Security
- Ensure high standards of laboratory safety.
- Reduce the risk of genetic data breaches involving patient information.
Equity and Access
- Ensure that the technology is not accessible only to elite institutions.
- Promote use for public benefit rather than purely commercial interests.

Three Governance Actions

Option 1: Layered Regulation for CRISPR Research in Academic Laboratories

Purpose: Currently, CRISPR use in many laboratories relies heavily on internal institutional regulations. I propose a layered approval system for CRISPR research involving human neuronal cells.

Design:

Actors: Universities, ethics committees, national regulators.
Each project must:
- Obtain approval from an ethics committee,
- Report gene targets and experimental objectives, and
- Undergo periodic safety audits.

Assumptions:

Institutions have sufficient administrative capacity.
Researchers are willing to be transparent.

Risks of Failure & “Success”:

Risk of failure: slowing down research progress.
Risk of “success”: excessive bureaucracy that hinders innovation.

Option 2: Mandatory Technical Standards for Biological and Genetic Data Security

Purpose: At present, there are no uniform technical standards for securing genetic data and biological materials produced through gene editing.

Design:

Actors: Government agencies, research institutions, digital infrastructure providers.
Implementation includes:
- Biological containment mechanisms (e.g., genetic kill-switches),
- Encryption of genetic data, and
- Role-based access controls.

Assumptions:

Security technologies can be widely implemented.
Researchers possess adequate data security literacy.

Risks of Failure & “Success”:

Risk of failure: high implementation costs.
Risk of “success”: overreliance on complex technical systems.

Option 3: Incentives for Open Research and Constructive Applications

Purpose: Much advanced biological research remains closed and commercially driven.

Design:

Actors: Governments, funding agencies, universities.
Incentives include:
- Dedicated grants for open research,
- Recognition for publications and responsible open data practices, and
- Cross-border research collaborations.

Assumptions:

Researchers are motivated by non-financial incentives.
Open data will not be misused.

Risks of Failure & “Success”:

Risk of failure: limited industry participation.
Risk of “success”: potential misuse of openly shared data by irresponsible actors.

The table

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	1	1	2
• By helping respond	2	1	2
Foster Lab Safety
• By preventing incidents	1	1	2
• By helping respond	2	1	2
Protect the environment
• By preventing incidents	2	1	2
• By helping respond	2	1	2
Other considerations
• Minimizing costs and burdens to stakeholders	2	3	1
• Feasibility?	2	2	1
• Not impede research	3	2	1
• Promote constructive applications	2	2	1

Based on the evaluation above, a combination of Option 2 and Option 3 represents the most balanced approach. Option 2 is essential for ensuring biological safety and data security, while Option 3 helps ensure that innovation continues to progress and delivers broad societal benefits. Option 1 remains necessary as a foundational ethical and legal safeguard, but it should be implemented proportionally to avoid unnecessarily constraining research activities.

HOMEWORK QUESTIONS FROM PROFESSOR JACOBSON:

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Here’s My Answer:

Polymerase Error Rate: Biological synthesis using an error-correcting (proofreading) DNA polymerase has an error rate of about 1:10⁶, that is, roughly one wrong base per million bases copied.
Comparison to Human Genome: The human genome is about 3.2 billion base pairs (3.2 Gbp) long, so an error rate of 10⁻⁶ would leave on the order of 3,000 errors every time the genome is copied. For scale, the average human protein-coding sequence is approximately 1,036 base pairs (bp) long, the longest can exceed 100 kbp, and GenBank Release 220.0 (as of 6.15.17) contains approximately 235 Gbp of sequence data.
Biological Mitigation: Biology closes this gap with layered error correction. Proofreading polymerases remove misincorporated bases during synthesis, and mismatch repair systems such as the MutS system identify and correct the mismatches that remain. In this system, the proteins MutS, MutL, and MutH work together with ATP and DNA polymerase III to recognize the error, remove the incorrect segment, and resynthesize the DNA correctly.
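
To make the comparison concrete, the arithmetic can be sketched as below. This is a minimal illustration, assuming a haploid human genome of about 3.2 × 10⁹ bp (a commonly cited approximate figure) together with the 1:10⁶ error rate and the 1,036 bp average coding length quoted in the answer. The function names are hypothetical, and errors are assumed to be independent at each base.

```python
import math

ERROR_RATE = 1e-6        # errors per base copied (proofreading polymerase, from the answer)
HUMAN_GENOME_BP = 3.2e9  # assumption: approximate haploid human genome size
AVERAGE_GENE_BP = 1036   # average human protein-coding length quoted in the answer


def expected_errors(error_rate: float, length_bp: float) -> float:
    """Average number of copying errors in a sequence of the given length."""
    return error_rate * length_bp


def prob_error_free(error_rate: float, length_bp: float) -> float:
    """Probability that every base is copied correctly (independent errors assumed)."""
    # log1p keeps precision when error_rate is very small
    return math.exp(length_bp * math.log1p(-error_rate))


print(f"Expected errors per genome copy: {expected_errors(ERROR_RATE, HUMAN_GENOME_BP):.0f}")
print(f"Chance an average gene is copied perfectly: {prob_error_free(ERROR_RATE, AVERAGE_GENE_BP):.4f}")
```

Under these assumptions a single gene is almost always copied correctly, while a whole-genome copy would still carry a few thousand errors, which is why additional repair systems are needed.
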
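The “complexity” of arranging N monomeric building blocks of Q different types is defined by the number of different ways to arrange them (W), which is $W = Q^N$ for a linear polymer. The same protein can be written in many ways because most amino acids have several synonymous codons (61 sense codons for 20 amino acids, about three per amino acid). An average human protein of about 345 amino acids (1,036 bp / 3) can therefore be encoded by roughly 3³⁴⁵ ≈ 10¹⁶⁴ different DNA sequences. For a given polymer length N, biology must balance this codon redundancy against diversity.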
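
The count of synonymous encodings can be sketched in a few lines. This is an illustrative calculation, not part of the lecture material: it assumes the standard genetic code (61 sense codons), ignores the stop codon, and uses a hypothetical four-residue peptide `MKWL` as the example. The average-protein estimate assumes about 345 residues (1,036 bp / 3) and about three codons per residue.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Exact number of DNA sequences that encode the given one-letter protein string."""
    total = 1
    for residue in protein:
        total *= CODONS_PER_AA[residue]
    return total


def log10_encodings_estimate(n_residues: int, codons_per_residue: float = 3.0) -> float:
    """log10 of the rough count for a protein with an average codon degeneracy."""
    return n_residues * math.log10(codons_per_residue)


print(count_encodings("MKWL"))  # hypothetical peptide: 1 * 2 * 1 * 6 encodings
print(f"Average protein: about 10^{log10_encodings_estimate(345):.0f} encodings")
```

The exact count depends on the amino acid composition: a protein rich in leucine, serine, or arginine (six codons each) has far more encodings than one rich in methionine or tryptophan (one codon each).
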

Reasons for Code Failure: In practice, even though many DNA sequences can theoretically code for the same protein (because the genetic code is redundant, with 61 sense codons for 20 amino acids), many of them will not work efficiently, for several reasons:

Secondary Structure Interference: The DNA or mRNA may fold into unfavorable secondary structures. Minimum Free Energy (MFE) calculations show that different sequences have different folding stabilities (free energies), and a very stable structure can interfere with translation.
RNA Cleavage: Specific sequences may match RNA cleavage rules within a cell (such as RNase III sites in E. coli), so the mRNA is degraded before it can be translated.
Synthesis Errors: In artificial gene synthesis, chemical synthesis has a much higher error rate (1:10²) than biological synthesis (1:10⁶). This produces a nonuniform, error-rich library in which many synthesized molecules do not actually carry the intended sequence.

HOMEWORK QUESTIONS FROM DR. LEPROUST:

What’s the most commonly used method for oligo synthesis currently?
Why is it difficult to make oligos longer than 200nt via direct synthesis?
Why can’t you make a 2000bp gene via direct oligo synthesis?

Below Is My Answer:

The Phosphoramidite method is the current industry standard for oligonucleotide synthesis. This chemical process follows a four-step cycle repeated for each nucleotide added:

Coupling: A DMT-protected phosphoramidite is added to the unprotected $5^{\prime}$ OH of the growing chain.
Capping: Unreacted $5^{\prime}$ OH sites are acetylated to prevent them from extending further in future cycles (this helps avoid single-base deletions).
Oxidation: The phosphite triester is oxidized to a stable phosphate.
Deblock (Deprotection): Acid-catalyzed removal of the DMT group allows the next base to be added.

The primary limitation is stepwise yield (efficiency). Even with extremely high efficiency for each chemical step (e.g., 99% or 99.5%), the final yield of the correct product decreases exponentially as the number of couplings ($N$) increases.

The Math: The yield of a full-length oligo is roughly $(1 - \text{error rate})^N$, where the error rate is the fraction of chains that fail at each coupling.
For a 200nt oligo at 99% efficiency per step, the final yield of full-length product is only about 13% (0.99²⁰⁰ ≈ 0.13). At shorter lengths this loss is manageable, but as the length increases toward 200nt and beyond, the amount of full-length, error-free material becomes small compared to the “truncated” or incorrect sequences.
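
The exponential drop-off is easy to tabulate. The sketch below is a minimal illustration of the formula above, using the two stepwise efficiencies mentioned in the text (99% and 99.5%). The function name and the list of lengths are chosen for illustration, and N is treated as the number of coupling cycles, as in the text.

```python
def full_length_yield(stepwise_efficiency: float, n_couplings: int) -> float:
    """Fraction of chains that complete every coupling: efficiency ** N."""
    return stepwise_efficiency ** n_couplings


# Compare full-length yield at the two per-step efficiencies from the text.
for length_nt in (20, 60, 100, 200, 300):
    y99 = full_length_yield(0.99, length_nt)
    y995 = full_length_yield(0.995, length_nt)
    print(f"{length_nt:>4} nt   99%: {y99:6.1%}   99.5%: {y995:6.1%}")
```

Even a half-percent improvement in stepwise efficiency makes a large difference at 200nt, which is why per-step chemistry quality dominates the practical length limit.
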

Direct synthesis is limited by both yield and cumulative error rates:

Yield: Based on the exponential decay mentioned above, the yield for a 2000bp direct synthesis would be effectively zero using current phosphoramidite chemistry.
Error Rates: Standard industry error rates for synthesized oligos range from 1:200 in some baseline processes to 1:3000 in highly optimized platforms. At 1:200, a 2000bp sequence synthesized as a single continuous piece would contain about 10 errors on average, and even at 1:3000 nearly half of the molecules would carry at least one error.
The Solution: Instead of direct synthesis, 2000bp genes are created through gene assembly. Shorter, high-quality oligos (typically 100-300nt) are synthesized first, purified or error-filtered, and then enzymatically assembled into full-length genes using methods like PCR Assembly (Stemmer method) or Gibson Assembly.
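
The two limits above can be checked numerically. This is a rough sketch that assumes errors occur independently at each position and uses the 1:200 and 1:3000 error rates and the 99% stepwise efficiency quoted in the text. The function names are hypothetical.

```python
GENE_LENGTH = 2000  # length of the gene in the question


def expected_errors(length_nt: int, error_rate: float) -> float:
    """Average number of errors in a molecule of the given length."""
    return length_nt * error_rate


def prob_error_free(length_nt: int, error_rate: float) -> float:
    """Probability that a molecule has no errors (independent errors assumed)."""
    return (1 - error_rate) ** length_nt


for label, rate in (("1:200", 1 / 200), ("1:3000", 1 / 3000)):
    errors = expected_errors(GENE_LENGTH, rate)
    perfect = prob_error_free(GENE_LENGTH, rate)
    print(f"{label:>7}: {errors:5.2f} expected errors, {perfect:.2e} error-free fraction")

# Full-length yield after 2000 couplings at 99% stepwise efficiency.
print(f"Full-length yield at 99% per step: {0.99 ** GENE_LENGTH:.2e}")
```

Shorter oligos avoid both problems at once: each piece has a usable yield and a reasonable chance of being error-free, and assembly plus error filtering recovers the full-length gene.
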

HOMEWORK QUESTIONS FROM GEORGE CHURCH:

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Here’s My Answer:

The 10 essential amino acids that most animals (including humans) cannot synthesize on their own and must obtain through their diet are:
- Phenylalanine
- Valine
- Threonine
- Tryptophan
- Isoleucine
- Methionine
- Histidine
- Arginine
- Leucine
- Lysine

The “Lysine Contingency” vs. Genomically Recoded Organisms (GROs)

The “Lysine Contingency” is a classic biocontainment concept (famously used in Jurassic Park) in which an organism is engineered to be unable to produce lysine, making it dependent on an external supply for survival. Because lysine is already one of the essential amino acids listed above, animals cannot synthesize it in the first place, so the engineered deficiency adds little real containment. Prof. Church’s research on Genomically Recoded Organisms (GROs), specifically the work by Mandell et al. (2015) mentioned in the slides, shifts the perspective on this contingency in several ways:

From Natural to Synthetic Auxotrophy: The traditional Lysine Contingency is often considered “leaky” or weak because lysine is ubiquitous in nature. If a “contained” animal escapes, it can simply find lysine in the wild. Church’s slides propose using Non-Standard Amino Acids (NSAA) instead.
Genetic and Metabolic Isolation: By recoding the genome (e.g., reassigning the UAG stop codon to a synthetic NSAA), scientists create “metabolic isolation”. The organism becomes dependent on a man-made chemical that does not exist in the natural environment.
Superior Biocontainment: Unlike the lysine contingency, which relies on a simple metabolic deficiency, a GRO cannot correctly translate its essential proteins without the specific synthetic monomer. This provides a much higher level of safety, because the organism is very unlikely to find its required “nutrient” in the wild.

In this view, the “Lysine Contingency” is an early, flawed attempt at biocontainment that has been superseded by Xenomicrobiology: the creation of life forms with an expanded or altered genetic code that are fundamentally incompatible with the natural world.

Prompt: How would you explain the answer to this question to someone who is still learning basic biology?