Week 1 HW: Principles and Practices

The application I want to build
I want to develop a closed-loop pipeline for peptide engineering that uses Feynman–Kac (FK) steering to control diffusion-based protein generation at inference time. The goal is to go beyond zero-shot prediction and instead build an automated engineering cycle that repeatedly:
- proposes peptide and mini-protein candidates,
- captures experimental readouts (binding, activity, stability, etc.),
- converts those measurements into reward signals,
- uses FK steering to bias the next round of generative sampling toward better candidates without retraining the underlying diffusion model.
The design is inspired by the FK-steering approach, which wraps a diffusion-based protein generator in a particle-based sampling scheme: partial trajectories are repeatedly reweighted and resampled during generation so that sampling favors user-defined rewards. In this project, the reward is derived from the experimental readout itself.
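The reweighting idea can be illustrated with a toy sketch. This is not the published FK-steering algorithm or any library's API. The particles are placeholder strings standing in for partial diffusion trajectories, and `reward_from_readout`, `fk_resample`, the min-max scaling, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math
import random


def reward_from_readout(readout, lo, hi):
    """Map a raw assay readout onto [0, 1] (hypothetical min-max scaling)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (readout - lo) / (hi - lo)))


def fk_resample(particles, rewards, temperature=1.0, rng=random):
    """One reweight-and-resample step over a population of particles.

    Each particle gets a weight proportional to exp(reward / temperature);
    the next population is drawn with replacement according to those weights.
    """
    if not particles or len(particles) != len(rewards):
        raise ValueError("particles and rewards must be non-empty and equal length")
    # Subtract the maximum reward before exponentiating for numerical stability.
    top = max(rewards)
    weights = [math.exp((r - top) / temperature) for r in rewards]
    return rng.choices(particles, weights=weights, k=len(particles))
```

Calling `fk_resample(candidates, rewards)` returns a population of the same size in which high-reward candidates tend to appear more often. A lower `temperature` makes the selection greedier, and a higher one keeps more diversity.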
Why peptides?
Peptides are a good choice for this project because they are fast to synthesize and test, which makes them compatible with iterative lab loops. Many peptide properties we care about (solubility, stability, expression, off-target behavior) are hard to optimize from prediction alone, so a wet-lab loop is attractive. Functionally, peptides can serve as binders, inhibitors, diagnostic reagents, or modular parts in synthetic biology pipelines.
Milestones
| Horizon | Goal |
|---|---|
| Class MVP | Learn the wet-lab steps for this pipeline and complete at least one full design–build–test cycle. |
| Medium term | Compare FK steering against simple fine-tuning and reinforcement-learning baselines. |
| Long term | Use this framework to discover therapeutic proteins. |
Governance and policy goals
Closed-loop design could be repurposed to create harmful biomolecules. Governance should reduce the probability of both deliberate misuse and the accidental creation of molecules with dangerous functions. The overarching goal is therefore misuse prevention, broken down into three sub-goals:
- Ensure the system does not optimize toward harmful or restricted targets and functions.
- Reduce the chance that hazardous sequences are synthesized without review.
- Maintain audit trails and responsible-use norms.
Three governance options
I propose three governance actions spanning institutional review, synthesis controls, and logging infrastructure.
Option 1: Institutional Review
| Aspect | Details |
|---|---|
| Purpose | Add structured risk assessment before synthesis, target changes, or new reward functions in academic protein design projects. |
| Design | A one-page checklist covering target protein class, reward function, synthesis plan, and screening. Projects that trigger high-risk criteria (such as regulated agents or virus optimization) require formal oversight. |
| Assumptions | Lightweight review gates and good record-keeping practices are sufficient for most academic work. |
| Risks | Review may push students to under-report their plans; if too strict, it could slow down R&D. |
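The checklist gate described in Option 1 could be sketched as a small data structure. This is a minimal illustration, not an existing institutional form: the class name, field names, flag strings, and the three decision labels are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical flag names mirroring the high-risk criteria named in the table.
HIGH_RISK_FLAGS = {"regulated_agent", "virus_optimization"}

REQUIRED_FIELDS = (
    "target_protein_class",
    "reward_function",
    "synthesis_plan",
    "screening",
)


@dataclass
class ReviewChecklist:
    target_protein_class: str
    reward_function: str
    synthesis_plan: str
    screening: str
    flags: set = field(default_factory=set)

    def missing_fields(self):
        """Names of required fields that were left blank."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name).strip()]

    def decision(self):
        """Return 'incomplete', 'formal_oversight', or 'proceed'."""
        if self.missing_fields():
            return "incomplete"
        if self.flags & HIGH_RISK_FLAGS:
            return "formal_oversight"
        return "proceed"
```

A project would refill the checklist whenever the target, the reward function, or the synthesis plan changes, so that the gate is re-evaluated at each of the trigger points listed under Purpose.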
Option 2: Synthesis Controls
| Aspect | Details |
|---|---|
| Purpose | Require synthesis vendors to perform functional or homology-based screening of orders. |
| Design | Institutions purchase only from vendors that screen orders and verify customer identity. |
| Assumptions | Sequence screening can be done well enough to meaningfully reduce risk. |
| Risks | Screening must be highly accurate to catch edge cases; missed cases could have severe consequences. |
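To make the idea of homology-based screening concrete, here is a toy sketch that flags an order when it shares many short subsequences (k-mers) with an entry on a watchlist. It is only an illustration of the concept: the function names, the value of `k`, the `threshold`, and the watchlist contents are hypothetical, and real vendor screening relies on curated databases and sequence alignment tools rather than this simple overlap measure.

```python
def kmers(seq, k=5):
    """Set of all length-k substrings of a sequence (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_order(order_seq, watchlist, k=5, threshold=0.5):
    """Return names of watchlist entries that share at least `threshold`
    of the order's k-mers. `watchlist` maps a name to a reference sequence."""
    order_kmers = kmers(order_seq, k)
    if not order_kmers:
        return []  # sequence too short to screen with this k
    hits = []
    for name, ref_seq in watchlist.items():
        shared = len(order_kmers & kmers(ref_seq, k)) / len(order_kmers)
        if shared >= threshold:
            hits.append(name)
    return hits
```

The sketch also shows why the Risks row matters: a fixed threshold on sequence similarity can miss a redesigned sequence that keeps a function while changing most of its residues.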
Option 3: Logging Infrastructure
| Aspect | Details |
|---|---|
| Purpose | Create a secure, shared database that tracks when AI tools generate protein designs. |
| Design | Built-in logging of AI tool usage with cross-referencing against synthesis orders. |
| Assumptions | Confidentiality and transparency can be balanced. |
| Risks | The database would be a target for hacking, and logging designs is in tension with protecting sensitive intellectual property. |
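One way to picture such a log is a hash-chained record that stores a digest of each sequence instead of the sequence itself. This is a sketch under stated assumptions, not a proposed standard: the `DesignLog` class, its field names, and the choice to store SHA-256 digests for confidentiality are all hypothetical.

```python
import hashlib
import json


def digest(text):
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DesignLog:
    """Append-only log in which each entry commits to the previous one."""

    def __init__(self):
        self.entries = []

    def record(self, tool, user, sequence):
        prev = self.entries[-1]["entry_hash"] if self.entries else ""
        entry = {"tool": tool, "user": user,
                 "seq_digest": digest(sequence), "prev": prev}
        # The entry hash covers every field above, including the link to `prev`.
        entry["entry_hash"] = digest(json.dumps(entry, sort_keys=True))
        self.entries.append(entry)
        return entry

    def verify(self):
        """True if no entry has been altered, removed, or reordered."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            recomputed = digest(json.dumps(body, sort_keys=True))
            if entry["prev"] != prev or entry["entry_hash"] != recomputed:
                return False
            prev = entry["entry_hash"]
        return True

    def matches_order(self, ordered_sequence):
        """Cross-reference a synthesis order against logged designs."""
        target = digest(ordered_sequence)
        return [e for e in self.entries if e["seq_digest"] == target]
```

Storing digests supports exact-match cross-referencing without exposing sequences, but it would not catch an order that differs from a logged design by even one residue, which is part of the confidentiality versus transparency tension noted above.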
Scoring
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | |||
| • By preventing incidents | 2 | 1 | 2 |
| • By helping respond | 1 | 2 | 1 |
| Foster Lab Safety | |||
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 1 | 2 | 1 |
| Protect the environment | |||
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | |||
| • Minimizing costs and burdens to stakeholders | 2 | 2 | 2 |
| • Feasibility | 1 | 2 | 3 |
| • Not impeding research | 1 | 2 | 1 |
| • Promoting constructive applications | 1 | 2 | 2 |
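The scores in the table can be totaled per option with a few lines of code. Equal weighting of all ten criteria is an assumption made here for illustration, and the table itself does not state the scale. Reading 1 as the best score is also an assumption, although it is consistent with Option 1 being ranked first in the prioritization.

```python
# Scores copied from the table, one list per option, in row order.
CRITERIA = [
    "Biosecurity: preventing incidents",
    "Biosecurity: helping respond",
    "Lab safety: preventing incidents",
    "Lab safety: helping respond",
    "Environment: preventing incidents",
    "Environment: helping respond",
    "Minimizing costs and burdens",
    "Feasibility",
    "Not impeding research",
    "Promoting constructive applications",
]

SCORES = {
    "Option 1": [2, 1, 1, 1, 2, 2, 2, 1, 1, 1],
    "Option 2": [1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "Option 3": [2, 1, 3, 1, 3, 1, 2, 3, 1, 2],
}

# Unweighted sum per option (equal weights are an assumption).
totals = {name: sum(values) for name, values in SCORES.items()}

for name, total in totals.items():
    print(f"{name}: {total}")
```

The unweighted totals are 14 for Option 1 and 19 each for Options 2 and 3. Under the lower-is-better reading, Option 1 leads, while the ordering of Options 2 and 3 depends on the qualitative arguments rather than on the totals.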
Prioritization and tradeoffs
In order of priority:
- Option 1 (Institutional Review). This option is the fastest to implement. MIT already has safety infrastructure to build on, including the Institutional Biosafety Committee (IBC) and the Environment, Health and Safety (EHS) office. As a leading institution in AI protein design, MIT can set standards that others follow, and a well-designed lightweight review process could become a widely adopted model.
- Option 2 (Synthesis Controls). The existing federal framework provides a strong template (vendor screening, customer verification, reporting requirements), but it depends on industry cooperation beyond MIT’s control. MIT can contribute by researching better screening algorithms and helping to shape government standards.
- Option 3 (Logging Infrastructure). If this pipeline becomes widely used, built-in logging would make it relatively easy to track who designed what. However, the system must be designed very carefully so that it is scalable, secure, and transparent while still keeping designs confidential.
Tradeoffs
- Speed vs. safety
- Open science vs. closed science
- Transparency vs. confidentiality
Key uncertainties
- How practical it is to gate research directions through manual review.
- How well screening actually works against deliberate misuse.
- How feasible it is to design a logging system everyone is satisfied with.
Reflection on this week
Unfortunately, I was ill this week and was unable to attend class.