Week 1 Homework: Principles, Ethics & Practices
For the Future of Biological Engineering
All my life I have felt deep uncertainty about what lies beyond death, a question that many people confront at some point. It is from this questioning that my passion for science was born. Through scientific progress, humanity has managed to extend life expectancy significantly. However, one critical challenge remains: a longer life does not necessarily mean a longer healthy life.
A recent study analyzing multiple population- and health-indicator data sources reported a median healthspan-lifespan gap (the difference between the number of years lived and the number of years lived in good health) of 9.1 years, and forecast further widening over the next century. This disparity is not only a major healthcare challenge but also a significant economic burden.
![[Top: Data Visualization of Aging Trends]](lab.jpg)
![[Left: Neural Network Simulation]](neuron.jpg)
To address this challenge, my approach focuses on one of the most critical contributors to the healthspan-lifespan gap: neural function decline. I propose to engineer an aging biosensor by generating functional neural networks from induced pluripotent stem cells (iPSCs) and equipping them with synthetic genetic reporter circuits. Using CRISPR-based genome editing, these circuits can generate fluorescent signals reflecting aging hallmarks (e.g., cellular stress, DNA damage, and senescence).
Integrating the previously described biological system with a robotic self-driving laboratory would provide an autonomous experimental engine capable of executing high-throughput perturbation screens (e.g., of drug candidates).
| Prevent Biological Harm | Responsible AI Decision-Making | Ethical Standards (Human Materials) |
|---|---|---|
| Ensure all engineered circuits and neuronal models remain confined to controlled laboratory environments. | Prevent fully autonomous execution of high-risk experiments by establishing human-in-the-loop governance. | Ensure all iPSC lines are obtained under consent frameworks that clearly state genetic engineering and drug discovery use. |
| Implement mandatory validation checkpoints to detect off-target CRISPR edits or artifacts that could lead to misleading conclusions. | Use explainable AI techniques for the experimental design agent and continuously register model outputs, training data provenance, and decision rationales for external and regulatory review. | Guarantee donor privacy protection by ensuring there is no way to use any disclosed information to trace back to identifiable individuals. |
Purpose: Many labs already use automation to carry out large-scale biological experiments with minimal supervision, and some promising "closed-loop" systems can even decide which experiment to run next based on the results of previous ones. My proposal for balancing AI autonomy with human oversight is to add a human-approval step whenever the AI proposes something that crosses a risk threshold, such as genome editing, unusual combinations of compounds, or work with sensitive human-derived cells.
Design: To facilitate this, a formal tiered system of approvable experiments would need to be adopted by universities, research facilities, and biotech companies: lower-risk experiments could be performed autonomously, but higher-risk experiments must be reviewed by a responsible scientist and the institution's ethics review team. Funding agencies and journals could reinforce this by requiring researchers to justify their approval process when publishing or receiving funding. This would not delay science, but it would ensure that autonomy does not replace responsibility.
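The tiered gate described above can be sketched in a few lines. This is a minimal illustration, not a real governance system: the tier names, risk flags, and dispatch outcomes are all assumptions for the sake of example.

```python
# Minimal sketch of a tiered approval gate for autonomous experiments.
# The risk flags and tier logic below are illustrative assumptions.

HIGH_RISK_FLAGS = {"genome_editing", "human_derived_cells", "novel_compound_combination"}

def requires_human_approval(experiment_flags: set[str]) -> bool:
    """Return True if any flag places the experiment in the high-risk tier."""
    return bool(experiment_flags & HIGH_RISK_FLAGS)

def dispatch(experiment_flags: set[str]) -> str:
    """Route the experiment: autonomous run, or queue for human review."""
    if requires_human_approval(experiment_flags):
        return "queued_for_review"   # responsible scientist + ethics team sign off
    return "autonomous_run"          # low-risk tier executes without a checkpoint
```

The design choice worth noting is that the gate is a pure function of declared experiment properties, so it can be audited independently of the AI that proposed the experiment.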
Assumptions: This approach assumes that "high risk" can be defined in a practical and consistent way across institutions. It also relies on the reviewers of the AI's proposed experiments being expert and conscientious enough to detect problematic ideas, and on the process staying meaningful rather than turning into pure bureaucracy.
Risks of Failure & “Success”: The process would fail if review becomes a mere bureaucratic exercise, with reviewers clicking through and approving everything without serious scrutiny. It could also slow research enough that labs prefer workflows informal enough to escape oversight. And if it succeeds too well, the uneven burden might mean only large institutions with strong compliance teams can run advanced autonomous labs, leaving smaller labs behind.
Purpose: Today, experiments themselves can be documented, but AI workflows often lack standardized documentation of the rationale for a particular experiment, the model that made the prediction, and the specific versions of the software and data pipeline. The change I propose is a standardized requirement for “audit logging” in autonomous laboratory systems: a record of what the AI decided and why, how the experiment was executed, and what its outcomes were.
Design: Companies building lab automation software, as well as research institutions building closed-loop experiment platforms, would need to incorporate internal logging that records the AI version used, its inputs, the experiment-selection mechanism, the robotic execution settings, and the analysis pipeline. Over time this could become an expected component of published research and grant submissions, much as methods and protocol information are today.
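An audit record of the kind described above could be as simple as a structured, serializable dictionary. The field names here are my own assumptions about what such a log could contain; they are not a standard schema.

```python
# Illustrative audit-log record for one closed-loop experiment step.
# Field names are hypothetical, chosen to mirror the items listed in the text.
import datetime
import json

def make_audit_record(model_version, inputs, selection_rationale,
                      robot_settings, pipeline_version):
    """Bundle the AI decision context into one serializable record."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_version": model_version,              # exact AI model used
        "inputs": inputs,                            # data the model saw
        "selection_rationale": selection_rationale,  # why this experiment was chosen
        "robot_settings": robot_settings,            # robotic execution parameters
        "analysis_pipeline": pipeline_version,       # software/data pipeline version
    }

record = make_audit_record("agent-v1.2", {"plate": "P07"},
                           "highest expected information gain",
                           {"volume_ul": 50}, "pipeline-2024.3")
print(json.dumps(record, indent=2))
```

Keeping the record as plain JSON makes it easy to hand to external or regulatory reviewers without exposing the model itself.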
Assumptions: This assumes that increased transparency really does improve safety and reproducibility, and that individuals are motivated to store and share their logs. It also requires that sensitive information, such as private datasets or proprietary algorithms, can be protected while still providing sufficient accountability.
Risks of Failure & “Success”: This might fail if the logs are too voluminous and complex for a person to examine, leading institutions to do the bare minimum needed to pass an audit, or if companies refuse to disclose key information on intellectual-property grounds. If it succeeds, a side effect might be reduced freedom of investigation, with the logging system experienced as a form of surveillance.
Purpose: Many of the technologies needed for AI-based biology are being democratized, including software for designing experiments, designing genome edits, and running automated screens. Most of this use is responsible, but these tools could also enable reckless, rushed experimentation if misused. The change I propose is a controlled-access set of rules for especially dangerous software, such as tools that automatically design and perform genome edits or execute biological optimization experiments with minimal human involvement.
Design: This would require coordination between software developers, lab automation companies, cloud providers, and regulators. A practical version could work much like the licensing processes already used to regulate other high-risk activities: access to high-end functions would require user logins and applications associated with legitimate research institutions, and providers could impose strict limits on uses they consider inappropriate. The key is to limit the potential for misuse of high-end automation stacks, not to limit research.
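A sketch of the credential gate implied above. The institution registry and feature names are hypothetical placeholders, and a real system would involve cryptographic credentials rather than a set lookup.

```python
# Sketch of an institutional-license gate for restricted automation features.
# The registry contents and feature names below are hypothetical.

LICENSED_INSTITUTIONS = {"inst-001", "inst-002"}   # verified research institutions
RESTRICTED_FEATURES = {"auto_genome_edit", "unattended_optimization"}

def can_use(institution_id: str, feature: str) -> bool:
    """Restricted features require a verified institutional license;
    ordinary features stay open to everyone."""
    if feature in RESTRICTED_FEATURES:
        return institution_id in LICENSED_INSTITUTIONS
    return True
```

Note that the default is permissive: only the explicitly named high-risk features are gated, matching the goal of limiting misuse without limiting research.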
Assumptions: First, it assumes that most risky activity would depend on mainstream tools and services; second, it assumes that a restriction of access would meaningfully reduce harm. It also assumes that regulators can define what counts as “high-risk automation” without being too broad and accidentally limiting legitimate scientific work.
Risks of Failure & “Success”: It might fail because the constraints simply push malicious users toward open-source or underground options, or because the rules are so restrictive that they slow down legitimate biomedical research. If it succeeds, it is likely to have the unwanted side effect of concentrating power in a few large institutions and corporations at the cost of smaller labs and open science.
| Criteria | Human Check | Audit Log | Access Control |
|---|---|---|---|
| **Prevent Biological Harm** | | | |
| Ensure engineered models remain confined | 1 | 2 | 3 |
| Detect off-target CRISPR edits & artifacts | 3 | 3 | 1 |
| **Responsible AI Decision-Making** | | | |
| Prevent autonomous high-risk experiments | 3 | 2 | 1 |
| Require explainability and logging | 2 | 3 | 1 |
| **Ethical Standards (Human Materials)** | | | |
| Ensure iPSC consent compliance | 2 | 1 | 3 |
| Protect donor privacy | 1 | 1 | 2 |
Based on the scoring matrix, my top priority would be Governance Option 1: Human approval checkpoints for higher-risk autonomous experiments.
This option best serves the most critical of our governance objectives: minimizing harm while preserving the ability to innovate. It directly addresses the core risk of closed-loop experimentation, which is not the automation itself but the possibility that an AI system will rapidly explore increasingly complex, untested designs with no human accountability. It creates a stop signal for the experiments that require human review: genome editing, experimentation on human-derived neuronal models, and untested compound combinations.
The key trade-off is the friction introduced: it might slow discovery pipelines and become administratively heavy, particularly if the process is overly formalized or cautious. Still, compared to the other options, it remains realistic. Governance Option 2 (mandatory logging) improves transparency and reproducibility, but it does nothing to stop damage in real time; it only helps after the fact. Governance Option 3 (controlled access/licensing) improves containment and prevents misuse of AI tools, but it risks being excessively restrictive and concentrating research power in large organizations or companies.
However, this recommendation assumes that well-defined boundaries of "high-risk" experimentation can be established and that reviewers will have sufficient expertise to evaluate the experiments the AI proposes. It is also unclear whether such checkpoints will remain effective over time or degrade into a mere formality as more AI labs emerge. Even so, among the existing options, human-in-the-loop review represents the best balance of safety, practicality, and scientific progress.
1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
In the slides for week 2, it is mentioned that the error rate of polymerase is about 1 mistake per 10^6 nucleotides (1:10^6). The human genome consists of about 3.1 billion DNA base pairs (6.2 billion nucleotides in total). At a rate of 1 error per million copied nucleotides (6.2B / 1M), we would have approximately 6,200 potential errors every time the DNA is copied. However, cells have mechanisms to counteract these errors:
Proofreading: a correction mechanism that occurs immediately during replication. The polymerase recognizes that the 3'-OH group of a misplaced nucleotide is in the wrong position and replaces it, fixing about 99% of errors.
Mismatch repair: this process occurs after replication. Enzymes recognize the structural deformities caused by incorrectly paired nucleotides and replace them with the correct nucleotide.
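The arithmetic above can be sanity-checked in a few lines, using only the numbers already quoted (the 99% proofreading figure comes from the text; the residual-error line is just that figure applied to the raw count).

```python
# Back-of-the-envelope check of the replication error numbers above.
genome_bp = 3.1e9             # base pairs in the haploid human genome
copied_nt = 2 * genome_bp     # both strands: ~6.2e9 nucleotides per replication

raw_errors = copied_nt / 1_000_000        # 1 error per 10^6 nt -> ~6200 raw errors
after_proofreading = raw_errors * (1 - 0.99)   # proofreading fixes ~99% -> ~62 left

print(raw_errors, after_proofreading)
```

Mismatch repair then acts on the errors that survive proofreading, bringing the final rate down further.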
2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
An average protein-coding sequence has around 1,036 base pairs, which is equivalent to approximately 345 amino acids (1036/3). Each amino acid may be encoded by between 1 and 6 possible codons, and if we assume an average of 3 codons per amino acid, there are approximately 3^345 ways to code the protein.
In practice, many of these sequences do not work: even though they code for the same protein, they can contain a high percentage of cytosine-guanine pairs, whose bonds are more stable than A-T pairs, and they can create secondary structures that negatively affect strand separation and DNA synthesis.
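The size of this combinatorial space is easy to compute exactly. This sketch uses the same simplifying assumption as the text (a uniform 3 codons per amino acid); real counts would multiply each residue's actual degeneracy (1 to 6).

```python
# Count of synonymous DNA sequences for an "average" protein,
# under the text's simplifying assumption of 3 codons per amino acid.
n_amino_acids = 1036 // 3          # ~345 residues for a ~1036-nt coding region
codons_per_aa = 3                  # rough average (real degeneracy: 1 to 6)

n_encodings = codons_per_aa ** n_amino_acids
print(len(str(n_encodings)))       # number of decimal digits of 3^345
```

Python's arbitrary-precision integers make the exact value trivial to obtain: 3^345 has 165 digits, i.e. on the order of 10^164 possible encodings.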
1. What’s the most commonly used method for oligo synthesis currently?
The most common method is phosphoramidite-based solid-phase synthesis.
2. Why is it difficult to make oligos longer than 200nt via direct synthesis?
Because the per-step error rate compounds: at 200 nucleotides, a significant portion of the molecules are expected to contain errors.
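This compounding can be made concrete with a simple yield calculation. The 99% per-step coupling efficiency used here is an assumed ballpark figure for phosphoramidite chemistry, not a number from the source.

```python
# Full-length yield under an assumed ~99% per-step coupling efficiency.
coupling_efficiency = 0.99

def full_length_yield(n_steps: int) -> float:
    """Fraction of molecules with no failed coupling after n synthesis steps."""
    return coupling_efficiency ** n_steps

print(full_length_yield(200))   # roughly 0.13: only ~13% of 200-mers are full length
```

Even a 1% failure rate per step leaves only about an eighth of the molecules intact at 200 nt, which is why direct synthesis is rarely pushed past that length.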
3. Why can’t you make a 2000bp gene via direct oligo synthesis?
Because of that same compounding error rate, it is statistically unlikely to obtain a correct, full-length 2000 bp gene via direct oligo synthesis. It is better to synthesize short fragments and then assemble them, because that preserves accuracy.
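Extending the same assumed 99% per-step efficiency to 2000 steps shows why assembly wins. The numbers are illustrative, not measured values.

```python
# Why fragment assembly beats direct synthesis of a 2000-nt gene,
# under the same assumed 99% per-step coupling efficiency.
eff = 0.99

direct_2000 = eff ** 2000        # essentially zero full-length product
per_fragment_200 = eff ** 200    # ~13% full length: workable

print(direct_2000, per_fragment_200)
# Short fragments can be purified and sequence-verified before assembly,
# so the final 2000-bp construct is built from already-correct pieces.
```
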
1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The 10 essential amino acids in animals are arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine [1]. The lysine contingency is nothing more than a funny, clever narrative tool, because in the real world no animal has the ability to produce lysine (or any other essential amino acid) in the first place. Our need for lysine is no different from our need to consume vitamin C: we can't produce it, but we will not die if we don't consume it (at least not immediately); we develop a deficiency instead. Basically, in the Jurassic Park universe, scientists were engineering dinosaurs to lose an ability they never had...
[1] Lopez MJ, Mohiuddin SS. Biochemistry, Essential Amino Acids. [Updated 2024 Apr 30]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2025 Jan-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK557845/