
🧬 Week 1: Principles, Ethics, and Practices

HTGAA Spring 2026 · Constantin Convalexius · Vienna, Austria

1. The Application: AI-Powered Science Automation

I’m interested in building an AI platform that helps automate parts of the scientific process — things like scanning literature for gaps, designing experiments, running them through lab robots (like the Opentrons we’ll use in HTGAA), and helping write up results.

Why? Science is slow. Not because scientists are lazy, but because there are far more good questions than people to work on them. Many ideas never get tested because the person who had them didn't have the right lab skills or equipment. And honestly, a lot of published research can't even be reproduced because of human error in complicated protocols. Negative results often don't get published at all, so labs keep chasing the same dead ends without knowing that others have already hit them.

An AI platform could help with all of that. Not by replacing scientists, but by letting more people do better science faster: both positive and negative results feed back into the loop, so the system learns from more data, which can then be used to train the next "physics" model of the AI. Think of a student somewhere without access to a fancy lab — they could design a CRISPR experiment, have a robot run it remotely, and get solid results back. OpenAI recently did something very similar with Ginkgo Bioworks, read here: GPT-5 Lowers Protein Synthesis Cost.

The obvious problem: this is dual-use. The same tool that speeds up drug discovery could also speed up bioweapon development. Which is exactly why governance matters here.


2. Policy Goals

Two main goals, each broken into sub-goals:

Goal A — Safety & Security

  • A1: Prevent the platform from being used (or easily adapted) for weapons development
  • A2: Keep humans in the loop for any high-risk experiments — no fully autonomous dangerous stuff

Goal B — Equitable Access

  • B1: Make the tools accessible regardless of where you are or how much funding you have
  • B2: Prevent any single company or government from monopolizing AI-driven science

3. Three Governance Actions

Action 1: Open-Source Mandate

  • Purpose: Right now the best AI models are built behind closed doors. I’d require that publicly funded AI-science tools get released as open-source — similar to how the Human Genome Project made all genomic data public. Private platforms could get tax incentives for doing the same.
  • Design: Funding agencies (NIH, NSF, ERC) tie grants to open-source release, like the existing open-access publication mandates. Code goes on GitHub or Hugging Face. Philanthropic orgs like the Chan Zuckerberg Initiative could co-fund.
  • Assumptions: That open-source leads to faster improvement (usually true — see Linux, Python). That the community helps maintain quality. But also: open-source means bad actors get access too, which is a real problem.
  • Risks: Companies might only open-source outdated models while keeping the good stuff private. And if everything is truly open, you’re lowering barriers for misuse too — which directly conflicts with Goal A.

Action 2: Built-In Safety Guardrails

  • Purpose: Current AI content filters are pretty weak and easy to bypass. I’d build domain-specific safety layers into the platform — not just keyword blocking, but actual screening of what’s being designed. Similar to how DNA synthesis companies like Twist Bioscience already screen orders against pathogen databases.
  • Design: Multiple layers: (1) screen DNA sequence requests against pathogen databases, (2) flag suspicious query patterns, (3) require extra credentials for the riskiest capabilities, (4) regular red-teaming by security experts. Built by developers, advised by biosecurity people.
  • Assumptions: That AI can reliably tell the difference between legit research and misuse — this is honestly still an unsolved problem. And that filters won’t be so aggressive they block perfectly good research.
  • Risks: Too strict → researchers switch to unfiltered alternatives. Too weak → false sense of security. And determined bad actors can probably just train their own models from scratch anyway.
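The layered design above can be sketched in code. This is a purely illustrative toy under my own assumptions — the signature set, the capability names, and the `Request`/`screen` types are all invented for this sketch, not a real biosecurity API (real platforms screen against curated databases of sequences of concern):

```python
from dataclasses import dataclass

# Layer 1: toy stand-in for a pathogen sequence database.
PATHOGEN_SIGNATURES = {"ATGCGTAAAGGC"}  # placeholder 12-mer, not a real signature

# Layer 3: capabilities that require verified institutional credentials.
RESTRICTED_CAPABILITIES = {"novel_organism_design", "toxin_optimization"}

@dataclass
class Request:
    user_verified: bool
    capability: str
    sequence: str = ""

def screen(request: Request) -> tuple[bool, str]:
    """Run a request through the screening layers in order; deny fast."""
    # Layer 1: screen any requested sequence against known signatures.
    for sig in PATHOGEN_SIGNATURES:
        if sig in request.sequence:
            return False, "sequence matches pathogen signature"
    # Layer 3: credential gating for the riskiest capabilities.
    if request.capability in RESTRICTED_CAPABILITIES and not request.user_verified:
        return False, "capability requires institutional verification"
    return True, "allowed"
```

Layers 2 and 4 (pattern flagging, red-teaming) are process-level and don't reduce to a few lines, but the deny-fast ordering shown here is the key design choice: cheap checks run on every request, expensive human review only on what survives them.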

Action 3: International Regulatory Body

  • Purpose: There’s no international body governing AI systems that accelerate science. The Biological Weapons Convention wasn’t designed for this. I’d propose an International Commission on AI-Assisted Research (ICAIR), modeled on the IAEA — setting standards, certifying platforms, and coordinating responses to misuse.
  • Design: UN member states + AI companies + scientific organizations participate. ICAIR sets minimum safety standards, certifies compliant platforms, runs audits, and coordinates responses. Funded by member states plus a levy on commercial AI platforms.
  • Assumptions: That international cooperation on AI governance is achievable (big assumption given US-China tensions). That the body can move fast enough — historically, regulation always lags technology.
  • Risks: Major nations refuse to join, making it toothless. Or it becomes so bureaucratic it kills innovation. Worst case: incumbents capture the body and use it to block competition.

4. Scoring Matrix

Scale: 1 = most effective, 3 = least effective

Policy Goal                       Open-Source   Safety Guardrails   Int. Regulatory Body
Enhance Biosecurity
  • Preventing incidents               3                1                    2
  • Helping respond                    3                2                    1
Foster Lab Safety
  • Preventing incidents               2                1                    2
  • Helping respond                    3                1                    1
Protect Environment
  • Preventing incidents               3                1                    2
  • Helping respond                    3                2                    1
Other Considerations
  • Minimizing costs                   1                2                    3
  • Feasibility                        1                2                    3
  • Not impeding research              1                2                    3
  • Promoting constructive use         1                2                    2

Summary: Open-source wins on access and feasibility but loses badly on security. Guardrails are best at prevention but depend on unsolved AI safety problems. The international body is strongest for response but hardest to actually create.


5. Recommendation

Audience: MIT Leadership / MIT Media Lab

No single action works alone. I’d go with a layered approach:

  1. Open-source the platform — in the MIT tradition of OpenCourseWare, Creative Commons, and open-source software.
  2. Build guardrails in from day one, not as a retrofit.
  3. Gate the dangerous stuff: Basic capabilities stay open, advanced dual-use features (novel organism design) require institutional verification. Kind of like how some chemicals or drugs are freely available while others need a license or prescription.
  4. Push for international standards — we can’t create a regulatory body alone, but we could host working groups and publish frameworks that others adopt.

Main trade-off: Openness vs. security.

My resolution: Open-source for wide distribution, with guardrails gating the more capable, dual-use features.

Biggest uncertainty: Whether AI safety filters can actually keep pace with rapidly evolving capabilities. Nobody has a good answer to this yet.


6. Ethical Reflections

Going into this week I thought governance was something you deal with after a technology exists. The recitation changed that — the Jurassic Park meme sounds silly but captures it well: we're too much in "can we?" mode and not enough in "should we?" mode.

The openness question kept bugging me. My gut says make everything open, but then I think about what “everyone” includes and it gets uncomfortable. I now think openness with checkpoints makes more sense — open tools, but controls where designs become physical (synthesis, robot instructions).

AI-generated fraud was new to me. An AI could make up data that looks real, or accidentally lead someone to design something harmful. Provenance tracking for AI outputs seems necessary.

These discussions are also very US-centric. As a med student in Vienna, I'm reminded that AI doesn't stop at borders. Building safety into the platform architecture itself could raise the floor globally, similar to how iGEM runs safety reviews across all countries without needing international treaties.

Actions I’d propose: ethics review before new AI capabilities get released, provenance tracking as default, tying capability releases to safety milestones, and building risk education directly into the workflow so users can’t blindly automate dangerous stuff.


Week 2 Lecture Prep

Dr. LeProust’s Questions

1. What’s the most commonly used method for oligo synthesis currently?

The standard is the phosphoramidite method developed by Caruthers in 1981.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

The problem: each coupling step isn't 100% efficient — typically around 99%. At 99% coupling efficiency, a 200-mer yields only about 0.99^200 ≈ 13% full-length correct product. The rest is junk: truncated products that failed at some step along the way.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Building on the previous answer: if even getting to 200nt with decent yield is hard, 2000nt is hopeless. At 99% coupling efficiency, 0.99^2000 is essentially zero — you'd get virtually no full-length product. This is why long genes are assembled from shorter oligos rather than synthesized directly. (Note: Twist Bioscience has pushed these limits, demonstrating synthesis of oligos up to ~700nt — a major achievement.)
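The arithmetic behind both answers can be checked in a few lines. A minimal sketch: the `efficiency ** length` model assumes every coupling step succeeds independently with the same probability, ignoring side reactions such as depurination:

```python
# Fraction of strands that survive every coupling step, assuming each
# step succeeds independently with the same efficiency (a simplification).
def full_length_yield(coupling_efficiency: float, length_nt: int) -> float:
    return coupling_efficiency ** length_nt

print(f"200-mer at 99%:  {full_length_yield(0.99, 200):.1%}")   # 13.4%
print(f"2000-mer at 99%: {full_length_yield(0.99, 2000):.2e}")  # 1.86e-09
```

Even pushing efficiency to 99.5% only gets a 2000-mer to about 0.005% full-length — which is why assembly from shorter oligos wins.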

Professor Jacobson’s Questions

1. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

  • Error Rate: DNA polymerase has an error rate of approximately 1 in 10^6 (1 in a million)
  • Human Genome Size: approximately 3.2 Giga Base Pairs (Gbp) — more than 3 orders of magnitude larger than the error rate denominator
  • Implication: at that raw rate, roughly 3,200 errors would appear per single replication event
  • How Biology Deals With It: additional layers of error correction — proofreading by the polymerase itself during synthesis, and post-synthesis mismatch repair systems that catch and fix remaining errors — bring the overall fidelity down to roughly 1 error in 10^9–10^10 bases
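To make the discrepancy concrete, here is the back-of-the-envelope arithmetic using the figures above (the ~10^-9 post-repair fidelity is a commonly cited approximate textbook value, not derived here):

```python
GENOME_BP = 3.2e9         # human genome size, from the answer above
RAW_ERROR_RATE = 1e-6     # polymerase alone
CORRECTED_RATE = 1e-9     # after proofreading + mismatch repair (approximate)

print(f"Errors per replication, raw:       {GENOME_BP * RAW_ERROR_RATE:,.0f}")  # 3,200
print(f"Errors per replication, corrected: {GENOME_BP * CORRECTED_RATE:.1f}")   # 3.2
```

So proofreading and repair turn thousands of errors per replication into a handful — few enough for selection and further repair to cope with.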

2. How many different ways are there to code for an average human protein? Why don’t all of these codes work in practice?

  • Number of Ways: The redundancy of the genetic code (multiple codons per amino acid) combined with an average human coding sequence of ~1036 base pairs (~345 amino acids) means there is an astronomical number of different DNA sequences that could theoretically encode the same protein.
  • Why Not All Codes Work: Despite coding for the same amino acids, different DNA/RNA sequences are not functionally equivalent because:
    • Codon usage — cells carry different abundances of tRNAs, so rare synonymous codons slow or stall translation even though they encode the right amino acid
    • mRNA folding — different nucleotides have different hydrogen-bonding and stacking properties, so synonymous sequences fold into different secondary/tertiary structures that can block translation (the ribosome itself is an RNA machine that produces proteins!)
    • RNA Cleavage — some sequences contain cleavage sites, so the transcript is degraded before it can be fully translated
    • Loop Formation — hairpins and loops create different secondary structures
    • Complex Tertiary Structures — rings, 3D origami-like shapes, and other higher-order structures that change how the message is processed
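The combinatorics can be made explicit. The degeneracy table below is the standard genetic code (61 sense codons spread over 20 amino acids); the 345-aa length is simply the ~1036 bp figure above divided by three:

```python
import math

# Degeneracy of the standard genetic code: number of codons per amino acid.
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2,
    "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}

def synonymous_codings(protein: str) -> int:
    """Exact number of DNA sequences encoding a given protein (stop codon ignored)."""
    count = 1
    for aa in protein:
        count *= CODONS_PER_AA[aa]
    return count

print(synonymous_codings("MKV"))  # 1 * 2 * 4 = 8

# Rough scale for a ~345-aa protein, using the mean degeneracy 61/20 ≈ 3.05:
log10_count = 345 * math.log10(61 / 20)
print(f"~10^{log10_count:.0f} possible coding sequences")  # ~10^167
```

Around 10^167 synonymous sequences for one average protein — vastly more than atoms in the observable universe — yet, for the reasons listed above, only a small fraction of them express well in practice.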

Professor George Church’s Question

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 Essential Amino Acids

In animals (including humans — and the dinosaurs of Jurassic Park), these 10 amino acids cannot be synthesized de novo and must come from the diet:

Amino Acid      Amino Acid
Phenylalanine   Methionine
Valine          Histidine
Threonine       Arginine*
Tryptophan      Leucine
Isoleucine      Lysine

*Arginine is essential in many animals/birds; conditionally essential in humans.

The “Lysine Contingency” from Jurassic Park Wiki

The “Lysine Contingency” is a fictional biocontainment strategy from Jurassic Park where dinosaurs were genetically engineered to be unable to produce lysine. The intent was to ensure they would fall into a coma and die if they escaped, as they’d lack the supplements provided by park staff.

Impact on My View

This is a completely fictional contingency that in the real world would have never worked — because no animal can synthesize lysine anyway. It’s an essential amino acid that every animal has to eat (via plants or meat). So the “engineered dependency” is completely redundant — the dinosaurs already couldn’t make it!

A real biocontainment strategy would need to engineer dependency on a non-natural amino acid — something that doesn’t exist in any food source. This would create true “metabolic isolation” that cannot be bypassed by simply eating natural foods.


AI Disclosure

Claude (Anthropic) — Used to help structure and refine this assignment. The core ideas and positions are my own.

  • Prompt 1: “Help me structure my governance analysis for AI-powered science automation, with three governance actions and a scoring matrix.”
  • Prompt 2: “Nice I have done the homework draft now, please refine it so it has less spelling errors, correct my grammar and format it better. If you correct my wording, don’t write AI but write human like. Keep all the info unless it is obviously wrong.”

Cursor (AI-assisted IDE) — Used to build and deploy my HTGAA website.