Week 1 HW: Principles and Practices

Application: An AI Agent for Protein and Molecular Design

I’m developing an AI agent for protein and molecular design - an autonomous system that can take a high-level design brief (e.g. “design a protein that binds target X with nanomolar affinity”) and execute the full computational design pipeline: searching structure databases, running generative models, evaluating candidates, iterating on designs, and preparing sequences for synthesis. Unlike standalone models, an agent orchestrates multiple tools and makes decisions across the design cycle with minimal human intervention.

The promise is enormous: compressing weeks of expert computational work into hours, democratising access to protein engineering capabilities, and enabling rapid iteration on drug candidates, industrial enzymes, and biosensors. But agency amplifies dual-use risk. A standalone generative model requires a knowledgeable user to interpret and act on outputs. An agent that autonomously navigates the full design-to-synthesis pipeline lowers the expertise barrier dramatically. In 2022, Urbina et al. demonstrated a related concern — they inverted a drug discovery model’s objective function and generated ~40,000 molecules predicted to be more toxic than VX nerve agent, in under 6 hours. An agentic system could, in principle, not only generate such candidates but evaluate, optimise, and prepare them for ordering — all without the user needing deep domain knowledge.

Policy Goals

Primary Goal: Prevent misuse of generative biological AI while preserving its benefits

Sub-goals:

  1. Biosecurity — Prevent AI-designed biological agents (proteins, toxins, pathogens) from being created or used to cause harm
  2. Maintaining open science — Avoid governance structures so restrictive that they get in the way of legitimate research and fair access to these tools
  3. Accountability — Ensure clear responsibility chains so that when things go wrong, there are mechanisms for tracing where the failure occurred and who is responsible

Governance Actions

Action 1: Technical Screening Layer — Automated Hazard Flagging on Agent Outputs

Purpose: Currently, most generative bio-AI systems, including emerging agentic pipelines, have no built-in safety filters. A user can instruct an agent to design any sequence or molecule without any check on whether the output is potentially dangerous. Many foundation model providers have some guardrails in place, but these mostly police stated intent rather than the dangerousness of the molecules themselves. The problem is worse with agents than with standalone models because the agent may autonomously evaluate, refine, and prepare dangerous designs for synthesis without a human reviewing intermediate steps. I’m proposing a technical screening layer, analogous to content moderation in LLMs, that automatically flags outputs with high predicted toxicity, homology to known threat agents (select agents, toxins), or dual-use concern at multiple checkpoints in the agent’s pipeline.

Design: This requires:

  • A curated database of known threat sequences and molecular scaffolds, drawing from select agent lists and known toxin families
  • Lightweight classifier models trained to flag outputs above a risk threshold
  • Integration at the API level, so screening happens before results are returned
  • Model developers (companies like the one I work at, plus academic labs releasing open models) would need to implement this. Funding could come from existing biosecurity programmes such as UK AISI and US BARDA
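
To make the checkpoint idea concrete, here is a minimal sketch of what one screening call might look like. Everything in it is a hypothetical placeholder: the threat "database" is a toy set, the k-mer overlap check is a naive stand-in for real homology search (e.g. BLAST/HMMER), and a production system would also run trained hazard classifiers.

```python
# Illustrative sketch only. THREAT_DB, KMER, and the exact-k-mer check are
# hypothetical placeholders, not a real biosecurity screen.

THREAT_DB = {"MKWVTFISLLFLFSSAYS"}  # toy stand-in for a curated threat list
KMER = 8                            # window size for the naive homology check

def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shares_threat_kmer(candidate: str) -> bool:
    """Flag if the candidate shares any k-mer with a known threat sequence.

    A real system would use proper homology search plus trained toxicity
    classifiers; exact k-mer overlap is only a toy approximation.
    """
    cand = kmers(candidate, KMER)
    return any(cand & kmers(threat, KMER) for threat in THREAT_DB)

def screen_output(candidate: str) -> str:
    """The checkpoint an agent would call before returning a design."""
    if shares_threat_kmer(candidate):
        return "FLAGGED: homology to known threat; route to human review"
    return "PASS"
```

The key design point is that this runs at the API level, inside the agent loop, so flagged designs never reach the user or downstream synthesis steps without review.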

Assumptions:

  • That dangerous outputs are detectable computationally. This is partially true (homology to known agents is searchable) but novel threats with no known analogues would slip through
  • That model developers will adopt this voluntarily or can be incentivised to do so
  • That the databases of known threats are comprehensive and kept current

Risks of Failure & “Success”:

  • Failure: Screening is trivially bypassed, for example by users running open-source models locally without the filter. Creates a false sense of security
  • “Success”: Over-sensitive filters block legitimate research. Researchers designing novel antimicrobials might constantly trigger toxicity flags. Could push users toward unfiltered open-source alternatives, defeating the point of the policy

Action 2: Industry API-Gated Access with Tiered Permissions

Purpose: Currently, access to powerful generative bio-AI is relatively open, including agentic systems that can autonomously execute multi-step design pipelines. Many underlying models are available as downloadable weights or through APIs with minimal identity verification. An agent that chains these models together amplifies risk because it reduces the expertise needed to go from intent to synthesis-ready design. I’m proposing a tiered access system where the level of capability scales with the user’s credentials and intended use:

  • Tier 1 (Open): In silico exploration. Anyone can query models for general protein properties, structure prediction, and basic design
  • Tier 2 (Verified): Full generative capability. Requires institutional affiliation, identity verification, and a stated research purpose
  • Tier 3 (Screened): Synthesis-coupled design. When a user wants to order synthetic DNA or protein based on AI-generated designs, synthesis providers (Twist, IDT, etc.) run additional biosecurity screening on the sequences

Design: This requires:

  • Identity verification infrastructure, which could piggyback on existing systems like ORCID for academics or institutional credentials
  • Coordination between AI model providers and DNA synthesis companies. The International Gene Synthesis Consortium (IGSC) already screens orders, but integration with upstream AI tools is new
  • Industry buy-in from model providers to gate their APIs. Companies like Anthropic have shown this is viable for language models (Claude was initially waitlisted)

Assumptions:

  • That tiering is enforceable. If model weights are open-source, gating the API is moot
  • That institutional affiliation is a reasonable proxy for trustworthiness. It’s not perfect, as state-sponsored actors have institutional credentials
  • That synthesis providers are the right chokepoint. This only works if physical synthesis remains the bottleneck, which may not hold as benchtop synthesis becomes easier

Risks of Failure & “Success”:

  • Failure: Determined bad actors route around the system entirely. Tiering only inconveniences legitimate researchers
  • “Success”: Creates a two-tier research ecosystem where well-resourced institutions have full access and smaller labs or Global South researchers are locked out, exacerbating existing inequities in biotech

Action 3: Regulatory Mandatory Dual-Use Review for Generative Bio-AI Publications and Releases

Purpose: Currently, there is no systematic requirement to assess dual-use risk before publishing generative bio-AI models, agentic systems, or their underlying datasets. The Urbina paper was itself a demonstration of how easily a published model could be repurposed, and agentic systems that chain multiple models into autonomous pipelines compound this risk by making misuse more accessible. I’m proposing mandatory dual-use risk assessments, similar to Institutional Biosafety Committee (IBC) review for wet lab work, before any generative bio-AI model, agent framework, training dataset, or capability benchmark is publicly released.

Design: This requires:

  • Expanding the remit of existing biosafety/biosecurity review bodies (such as IBCs or the UK’s ACDP) to cover computational tools, not just physical experiments
  • Developing standardised dual-use risk assessment frameworks specific to AI-bio. The existing frameworks are designed for gain-of-function wet lab work and don’t map cleanly
  • Journals and preprint servers (Nature, bioRxiv) could require evidence of dual-use review as a condition of publication, similar to ethics approval for human subjects research
  • Government funding agencies (UKRI, NIH, DARPA) could mandate dual-use review as a grant condition

Assumptions:

  • That review bodies have the technical expertise to evaluate AI model capabilities. Currently most IBCs do not
  • That pre-publication review is fast enough not to fatally slow down a fast-moving field
  • That the definition of “dual-use” can be operationalised clearly enough for consistent review decisions

Risks of Failure & “Success”:

  • Failure: Review becomes a rubber stamp. Committees lack expertise, approve everything, and the process adds bureaucratic overhead without improving safety
  • “Success”: Slows the pace of open publication enough that research moves to private industry where there’s less oversight. Creates a perverse incentive to not publish, reducing the transparency that currently helps the security community track developments

Scoring

Does the option:Option 1 (Screening)Option 2 (Tiered API)Option 3 (Dual-Use Review)
Enhance Biosecurity
• By preventing incidents212
• By helping respond221
Foster Lab Safety
• By preventing incidentn/an/a2
• By helping respondn/an/an/a
Protect the environment
• By preventing incidents222
• By helping respond332
Other considerations
• Minimizing costs and burdens to stakeholders123
• Feasibility?123
• Not impede research232
• Promote constructive applications122

(1 = best, 3 = worst, n/a = not applicable)

Recommendation

I would recommend prioritising a combination of Actions 1 and 2, technical screening integrated with tiered API access, addressed to an organization like the AI Safety Institute who are in my opinion world leading!

Action 1 (automated screening) scores highest on feasibility and cost because it’s a technical solution that model developers can implement without legislative change. It’s the lowest-friction intervention. However, it’s insufficient alone because it’s bypassable with open-source models.

Action 2 (tiered access) addresses that gap by creating identity-linked accountability, and by integrating with the existing DNA synthesis screening infrastructure (IGSC). Together, these two actions create defence in depth: screening catches inadvertent misuse, and tiered access raises the bar for deliberate misuse.

Action 3 (mandatory dual-use review) scores well on response capability — a paper trail of risk assessments is valuable after an incident — but is the hardest to implement. The expertise gap in review bodies is real, and the risk of pushing research into less transparent private settings is significant. I’d recommend this as a medium-term goal, starting with voluntary frameworks that build capacity before mandating compliance.

Key trade-off: All three actions risk disadvantaging smaller labs and researchers who lack institutional infrastructure. Any implementation should include capacity-building provisions — for example, free verified access tiers for researchers from lower-income institutions.

Key uncertainty: The biggest unknown is how long DNA/protein synthesis remains the effective bottleneck, and also whether it even can be considered a bottlneck in 2026. If benchtop synthesis becomes cheap and accessible, Actions 1 and 2 lose much of their enforcement power, and the governance challenge shifts fundamentally toward the wet lab.

Week 1 Ethical Reflection

Halfpipe of doom was interesting, the observation that powerful technologies simultaneously promise to save and destroy the world. This isn’t new. Nuclear physics gave us both energy and bombs. Every transformative technology has this yin and yang.

This is definitely gogin to accelerate in biology right now, I think we are at the pivot point. The tools we’re learning in this course — DNA synthesis, CRISPR, protein design, autonomous AI agents that chain these together — are the biological equivalent of splitting the atom. The constructive applications are huge, but so is the potential for misuse. And unlike nuclear technology, where the materials and infrastructure required act as natural barriers, the barriers in biology are collapsing so fast. AI compresses the knowledge barrier, synthesis costs keep dropping, and the biological “materials” literally self-replicate.

This reinforces why governance can’t be an afterthought bolted on after the technology matures. It needs to be designed in parallel!

Week 2 Lecture Prep

Jacobson Questions

Q1: What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Polymerase with its built-in proofreading has an error rate of about 1 in 10⁶. The human genome is ~3.2 billion bp, so each replication would introduce ~3,200 errors. Biology fixes this with post-replication mismatch repair systems like MutS, which bring the effective error rate down to roughly 1 in 10⁹–10¹⁰.

Q2: How many different ways are there to code for an average human protein? Why don’t all of these work in practice?

An average human protein is ~345 amino acids. Most amino acids have ~3 synonymous codons, giving roughly 3³45 possible DNA sequences for the same protein. In practice most won’t work because of codon usage bias (organisms prefer codons matched to their tRNA abundance), mRNA secondary structure affecting translation, and RNA cleavage rules.

LeProust Questions

Q1: What’s the most commonly used method for oligo synthesis currently?

Phosphoramidite chemistry, developed by Caruthers in 1981. A four-step cycle (coupling, capping, oxidation, deblocking) repeated for each base. Used in both traditional column synthesisers and modern chip-based platforms like Twist’s silicon platform.

Q2: Why is it difficult to make oligos longer than 200nt via direct synthesis?

Coupling efficiency compounds over length. Even at ~99% per step, (0.99)²⁰⁰ ≈ 13% full-length product. Longer oligos are dominated by truncations and errors.

Q3: Why can’t you make a 2000bp gene via direct oligo synthesis?

At 2000 cycles the full-length yield is essentially zero, and with ~1:200 per-base error rate you’d average ~10 errors per molecule. Instead, genes are built by assembling shorter overlapping oligos (60–200nt) using methods like Gibson assembly, then error-corrected.

Church Question

Q1: What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids are: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine. Animals cannot synthesise these and must get them from diet.

The “Lysine Contingency” in Jurassic Park was a biocontainment strategy where dinosaurs were engineered to not synthesise lysine. The problem is that lysine is already essential in all animals. Plus lysine is abundant in normal food sources, making it useless as containment.