Week 1 HW: Principles and Practices

Biological Engineering Application

Proposed application: I want to develop a software tool that supports early-stage biological research by making data analysis both easier and more responsible. The tool would help researchers organize, validate, and interpret biological datasets (such as genomic or protein-related data) using bioinformatics and AI-assisted methods. It would also clearly flag where results are uncertain and where there is a risk of misuse.

Why this interests me: My interest stems from my academic exposure to bioinformatics and my curiosity about how software can meaningfully support biological research without lowering safety standards. As biological technologies become more accessible, I am particularly interested in how computational systems might help guide responsible use rather than accelerate harm or misuse.

Governance/Policy Goals for an Ethical Future

Ensure that AI- and software-assisted biological tools promote constructive scientific progress while minimizing risks related to misuse, safety failures, and inequitable access.

Sub-goals:

Prevent intentional or accidental misuse of biological data or tools (non-maleficence).
Reduce risks related to unsafe experimental design or misinterpretation of results.
Ensure tools do not disproportionately benefit only well-resourced institutions or regions (equity and access).
Encourage transparency, reproducibility, and responsible documentation.

Governance Actions

Option 1: Mandatory Safety & Ethics Training for Tool Access

Purpose: Currently, many computational biology tools are accessible with minimal oversight. This action proposes requiring basic ethics and safety training before granting access to advanced biological analysis features.

Design

Actor(s): Universities, research institutions, platform developers
Short certification modules embedded into the tool onboarding
Required before unlocking sensitive or high-risk functionalities

Assumptions

Users will engage honestly with the training
Training content is kept up to date
Institutions agree on baseline standards

Risks of Failure & “Success”

Risk: Training becomes a box-checking exercise
Unintended consequence of success: May exclude independent researchers or under-resourced users if not designed inclusively

Option 2: Built-in Technical Safeguards and Usage Monitoring

Purpose: Introduce technical constraints that limit high-risk outputs and flag potentially dangerous use cases.

Design

Actor(s): Software developers, private companies
Automated flags, rate limits, and warning prompts
Optional audit logs for institutional users
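
To make the design above more concrete, here is a minimal sketch of how flags, a rate limit, and an audit log could fit together. Everything in it is an assumption for illustration: the class name SafeguardGate, the placeholder flagged term, and the limits are hypothetical and not taken from any existing tool. Substring matching in particular is only a stand-in for whatever detection method a real platform would use.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SafeguardGate:
    """Hypothetical gate combining a per-user rate limit, term flags, and an audit log."""
    max_requests: int = 5              # assumed limit per window
    window_seconds: float = 60.0       # assumed sliding-window length
    flagged_terms: tuple = ("example-restricted-term",)  # placeholder, not a real list
    audit_log: list = field(default_factory=list)
    _recent: dict = field(default_factory=dict, repr=False)

    def check(self, user: str, request_text: str, now: float) -> str:
        """Return 'allowed', 'flagged', or 'rate_limited' and record the decision."""
        recent = self._recent.setdefault(user, deque())
        # Forget requests that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            decision = "rate_limited"
        else:
            recent.append(now)
            text = request_text.lower()
            flagged = any(term in text for term in self.flagged_terms)
            decision = "flagged" if flagged else "allowed"
        # The audit log keeps every decision so an institution could review it later.
        self.audit_log.append((now, user, decision))
        return decision


gate = SafeguardGate(max_requests=2, window_seconds=10.0)
for t, text in [(0.0, "align two sequences"),
                (1.0, "query about EXAMPLE-RESTRICTED-TERM"),
                (2.0, "another request")]:
    print(t, gate.check("user_a", text, now=t))
```

A flagged request is still counted against the rate limit here, and a rate-limited request is logged but not inspected further. Both are design choices of this sketch rather than requirements of the proposal.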

Assumptions

Risky behaviors can be meaningfully detected
Developers correctly anticipate misuse patterns

Risks of Failure & “Success”

Risk: Over-blocking legitimate research
Unintended consequence of success: Users may try to bypass safeguards using alternative tools

Option 3: Norms and Incentives for Transparent Documentation

Purpose: Encourage researchers to document both successes and failures to promote safer learning and reproducibility, much like chess players recording every move of a match.

Design

Actor(s): Journals, funding bodies, academic institutions
Incentives for publishing negative or null results
Standardized documentation templates

Assumptions

Researchers value incentives over speed or prestige
Documentation does not expose sensitive information

Risks of Failure & “Success”

Risk: Increased administrative burden
Unintended consequence of success: Over-disclosure of sensitive methods

Scoring Governance Actions

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	2	1	2
• By helping respond	2	2	1
Foster Lab Safety
• By preventing incidents	2	2	2
• By helping respond	2	2	1
Protect the environment
• By preventing incidents	2	2	2
• By helping respond	2	3	1
Other considerations
• Minimizing costs and burdens to stakeholders	2	3	1
• Feasibility?	1	2	1
• Not impede research	2	3	1
• Promote constructive applications	1	2	1
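
As a quick check on the table, the scores can be totalled per option. The sketch below copies the numbers from the table. It assumes that lower scores are better and that all criteria carry equal weight; the table does not state either, so treat the totals as a rough summary only.

```python
# Scores copied from the table above, in the order (Option 1, Option 2, Option 3).
scores = {
    "Biosecurity: preventing incidents": (2, 1, 2),
    "Biosecurity: helping respond": (2, 2, 1),
    "Lab safety: preventing incidents": (1, 2, 2),
    "Lab safety: helping respond": (2, 2, 1),
    "Environment: preventing incidents": (2, 2, 2),
    "Environment: helping respond": (2, 3, 1),
    "Minimizing costs and burdens": (2, 3, 1),
    "Feasibility": (1, 2, 1),
    "Not impede research": (2, 3, 1),
    "Promote constructive applications": (1, 2, 1),
}

# zip(*...) turns the per-criterion rows into per-option columns.
totals = [sum(column) for column in zip(*scores.values())]

for number, total in enumerate(totals, start=1):
    print(f"Option {number}: total score {total}")
```

Under the equal-weight, lower-is-better assumption, the totals are 17, 22, and 13 for Options 1, 2, and 3.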

Prioritization & Recommendation

Based on the scoring, I would prioritize a combination of Option 1 (training requirements) and Option 3 (documentation norms). Together, these approaches encourage responsible behavior without heavily restricting legitimate research. While technical safeguards (Option 2) are important, they should be applied cautiously to avoid impeding innovation.

Target audience: Academic institutions and platform developers, with encouragement from funding agencies.

Trade-offs & uncertainties:

Balancing accessibility with responsibility
Ensuring governance mechanisms evolve alongside technology
Risk that voluntary norms are unevenly adopted

Reflection: Ethical Concerns from This Week

This week highlighted how easily powerful biological tools can shift from beneficial to harmful depending on context, intent, and oversight. One ethical concern that stood out to me was the assumption that access alone equates to understanding or responsibility. I was also struck by how governance often lags behind technical capability.

Proposed additional governance action: Introduce interdisciplinary review processes that include not only scientists but also ethicists, policymakers, and community representatives when developing or deploying new biological tools.

Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson

Question 1: Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Answer: Nature’s machinery for copying DNA, called DNA polymerase, has an error rate of approximately 1 error per million base pairs (1:10⁶). The human genome consists of approximately 3 billion base pairs. At that rate, a single round of replication would introduce around 3,000 errors across the genome.

Biology has evolved mechanisms to address this discrepancy and maintain genomic integrity:

Proofreading by DNA Polymerase: DNA polymerase has a built-in proofreading ability. If it incorporates an incorrect nucleotide, it can detect the error, remove the incorrect base, and replace it with the correct one. This removes most misincorporations as they occur.
DNA Repair Mechanisms: Cells have additional repair systems, such as mismatch repair, which identify and correct errors that escape the proofreading process. Together with proofreading, these mechanisms bring the overall error rate down to roughly 1 error per billion base pairs (1:10⁹) and help maintain the accuracy of the genome.

These processes ensure that the human genome remains stable and functional despite its vast size and the inherent error rate of DNA replication.
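
The comparison between error rate and genome size is simple arithmetic, sketched below for a few per-base error rates in the range discussed above. It assumes a genome of 3 billion base pairs and treats errors as uniform and independent, which is a simplification. The function name expected_errors is mine.

```python
GENOME_BP = 3_000_000_000  # approximate human genome length used in the answer


def expected_errors(bases_per_error: int, genome_bp: int = GENOME_BP) -> float:
    """Expected number of errors when copying the genome once."""
    return genome_bp / bases_per_error


for exponent in (6, 7, 9):
    errors = expected_errors(10 ** exponent)
    print(f"1 error per 10^{exponent} bases: about {errors:,.0f} errors per genome copy")
```

The output shows why proofreading and repair matter: each factor of ten in fidelity removes a factor of ten in errors per copied genome.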

Question 2: How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Answer: To determine how many different ways DNA can code for an average human protein, we need to consider the following:

Average Length of a Human Protein: The document states that the coding sequence of an average human protein is about 1036 base pairs long. Since each amino acid is encoded by a codon (a sequence of three nucleotides), this corresponds to approximately 345 amino acids (1036 ÷ 3 ≈ 345).
Codon Redundancy: The genetic code is degenerate, meaning multiple codons can encode the same amino acid. For example, there are 64 possible codons, but only 20 amino acids, so many amino acids are encoded by more than one codon. The number of codons per amino acid varies (e.g., leucine has 6 codons, while methionine has only 1).
Number of Possible DNA Codes: If we assume an average of 3 codons per amino acid (a rough estimate based on the genetic code), the number of possible DNA sequences for an average human protein would be approximately 3³⁴⁵, which is on the order of 10¹⁶⁴, an astronomically large number.
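
The estimate above can be checked against the standard genetic code. The sketch below builds the codon table from the usual TCAG ordering, counts the codons for each amino acid, and evaluates the size of 3³⁴⁵. The helper count_encodings and the example peptide are my own illustration.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons per amino acid, ignoring the three stop codons.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    return math.prod(degeneracy[aa] for aa in peptide)


average = sum(degeneracy.values()) / len(degeneracy)
log10_estimate = 345 * math.log10(3)

print(f"Leucine codons: {degeneracy['L']}, methionine codons: {degeneracy['M']}")
print(f"Average codons per amino acid: {average:.2f}")
print(f"3^345 is about 10^{log10_estimate:.1f}")
print(f"Encodings of the peptide MEF: {count_encodings('MEF')}")
```

The average comes out at 61 sense codons over 20 amino acids, or about 3.05, which supports the rough figure of 3 used in the estimate.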

Reasons Why All These Codes Don’t Work in Practice:

Codon Bias: Different organisms have preferences for certain codons over others, known as codon bias. Codons that are rarely used in the host organism may lead to inefficient translation or reduced protein expression.
mRNA Secondary Structures: Some DNA sequences may produce mRNA with secondary structures (e.g., hairpins) that interfere with ribosome binding or translation, reducing the efficiency of protein synthesis.
Regulatory Elements: DNA sequences may inadvertently contain regulatory elements (e.g., promoters, enhancers, or silencers) that affect transcription or translation, leading to unintended consequences.
Protein Folding and Function: While the amino acid sequence may be correct, the codon choice can influence the speed of translation, which in turn affects protein folding. Improper folding can result in non-functional or misfolded proteins.
Post-Translational Modifications: Some DNA sequences may not allow for proper post-translational modifications, which are critical for the protein’s function.
Codon Context Effects: The sequence context around codons can influence translation efficiency and accuracy, meaning that not all codon combinations are equally effective.

In practice, researchers often optimize codon usage for the host organism to ensure efficient and accurate protein production.
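
One way to see why not every synonymous sequence is usable is to enumerate the options for a very short peptide and discard those containing an unwanted sequence element. The sketch below does this for the three-residue peptide MEF. As an assumption for illustration, it uses GAATTC (the EcoRI recognition site) as a stand-in for any motif one would want to avoid. The function names are mine, and a real design tool would screen for many more features.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def synonymous_sequences(peptide: str) -> list:
    """All DNA sequences that encode the peptide (only practical for short peptides)."""
    choices = [CODONS_FOR[aa] for aa in peptide]
    return ["".join(combo) for combo in product(*choices)]


def without_motif(sequences: list, motif: str) -> list:
    """Keep only the sequences that do not contain the unwanted motif."""
    return [seq for seq in sequences if motif not in seq]


candidates = synonymous_sequences("MEF")
acceptable = without_motif(candidates, "GAATTC")

print("All encodings:", candidates)
print("Without GAATTC:", acceptable)
```

Of the four encodings of MEF, ATGGAATTC contains the motif and is dropped, so even this tiny example loses a quarter of its options to a single constraint.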

Homework Questions from Dr. LeProust

Question 1: What’s the most commonly used method for oligo synthesis currently?

Answer: The most commonly used method for oligonucleotide synthesis currently is solid-phase phosphoramidite chemistry.

Question 2: Why is it difficult to make oligos longer than 200nt via direct synthesis?

Answer: Making oligonucleotides (oligos) longer than 200 nucleotides (nt) via direct chemical synthesis is difficult primarily because of the exponential decrease in yield caused by coupling efficiencies being less than 100%, and the cumulative increase in chemical errors over long synthesis cycles.

Question 3: Why can’t you make a 2000bp gene via direct oligo synthesis?

Answer: Direct, single-step chemical synthesis of a 2000 base pair (bp) DNA sequence is currently not possible using standard automated oligonucleotide synthesis, primarily due to the exponential decrease in efficiency, accumulation of errors, and the inability to purify such long single-stranded molecules. While 2000 bp genes are commonly created, they are assembled from smaller oligonucleotides (typically 40-200 bases) rather than synthesized in one direct step.
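
The yield argument in the last two answers can be made concrete with a short calculation. The sketch below assumes that every coupling step succeeds with the same probability and that an oligo of length n needs n − 1 couplings. The efficiencies of 99% and 99.5% are illustrative assumptions, not measured values, and the function name is mine.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of molecules that reach full length after length_nt - 1 couplings."""
    return coupling_efficiency ** (length_nt - 1)


for efficiency in (0.99, 0.995):       # assumed per-step coupling efficiencies
    for length in (20, 100, 200, 2000):
        fraction = full_length_yield(length, efficiency)
        print(f"efficiency {efficiency:.3f}, length {length:>4} nt: "
              f"full-length fraction {fraction:.3g}")
```

With an assumed 99% efficiency, only about 13% of molecules are full length at 200 nt, and at 2000 nt the fraction falls to the order of 10⁻⁹, which is why long genes are assembled from shorter oligos.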

Homework Question from George Church

Question 1: What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Answer: The 10 essential amino acids that must be acquired through the diet of most animals, including humans, are Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine. These are considered “essential” because animal bodies cannot synthesize these amino acids internally at a rate sufficient to meet metabolic needs and must instead obtain them from food.

How the Science Affects My View of the Contingency: The fact that lysine is an essential amino acid for all animals undermines the premise of the “Lysine Contingency” as a practical, reliable safety feature. No animal can synthesize its own lysine; all of them must obtain it from food. Engineering a dinosaur to be “lysine-deficient” is therefore redundant, because it would already depend on dietary lysine by definition. The contingency also fails in practice because lysine is easy to find in the environment: herbivores can eat soy, beans, and other lysine-rich plants, while carnivores can obtain it by eating those herbivores. Within the franchise, the failure of the contingency serves as a key plot point demonstrating human hubris and the inability to fully control nature.

AI Use Disclosure: I used ChatGPT (OpenAI) as a brainstorming and structuring aid while working on this assignment. Specifically, I used AI-generated prompts to help clarify the assignment requirements, organize my ideas, and explore example frameworks for governance and ethics analysis. All final interpretations, reflections, and written content were reviewed, adapted, and contextualized by me.