Week 1 HW: Principles and Practices
Class Assignment
This assignment was done with the help of AI
Engineering Allosteric Proteins via Domain Insertion
I am interested in developing a computational tool to identify stable domain‐insertion sites for creating novel allosteric proteins. In this approach, a functional protein domain (e.g. a sensor or light‐sensing domain) is inserted into another protein, so that a change in the sensor domain (triggered by a ligand, light, etc.) allosterically controls the host protein’s activity. This strategy can produce “protein switches” with new-to-nature functions. For example, Wolf et al. (2025) demonstrated that domain insertion can generate potent optogenetic or chemogenetic switches: they built a machine‐learning pipeline (ProDomino) to predict insertion sites, and successfully created light‐ and ligand‐activated versions of proteins (including Cas9 variants) by inserting receptor domains. In general, computational protein design is rapidly improving, especially with AI methods, and promises to transform biotechnology and medicine. A dedicated tool for domain-insertion design would accelerate development of custom allosteric proteins (for biosensors, therapeutics, metabolic control, etc.) by predicting which residues can accept an insert without misfolding. Such a tool would integrate protein structure modeling, stability prediction, and sequence‐function data, guiding one-shot design of switchable proteins as described in recent literature.
Policy Goals for Safe, Ethical Development
The overarching policy goal is to ensure that allosteric protein engineering proceeds in a manner that prevents harm and promotes societal benefit. In particular, we seek an ethical future where engineered proteins do not create new biohazards. Drawing on synthetic biology frameworks, we can decompose this into sub-goals (cf. Garfinkel et al. 2007, WHO 2022):
Enhance biosafety. Ensure laboratory safety to prevent accidental release or exposure. Researchers must follow best practices (e.g. biosafety level protocols) and conduct rigorous risk assessments. For example, CDC/NIH’s BMBL guidelines emphasize protocol‐driven risk assessment as a core principle. A biosafety sub-goal is protecting workers, the community, and the environment from unintended exposure to novel proteins.
Enhance biosecurity. Prevent misuse or malicious creation of harmful proteins (dual-use prevention). Engineered proteins could theoretically be misused as toxins or pathogens, so a key goal is to block such outcomes. This aligns with the preventing harm aim. Measures include oversight of synthetic DNA and protein design processes. For example, Baker and Church (2024) warn that powerful computational protein design “is vulnerable to misuse and the production of dangerous biological agents”. They suggest creating secure databases of all synthetic gene sequences, only queried in emergencies, so that design remains “safe, secure, and trustworthy”. This biosecurity goal also implies the ability to trace or intercept dangerous designs.
Promote responsible innovation. Encourage beneficial uses of engineered proteins while maintaining public trust and ethical norms. This includes transparency, community engagement, and fairness. For instance, the U.S. Presidential Bioethics Commission (2014) urged enhanced coordination and transparency, ongoing risk analysis, public engagement, and stepped-up ethics education for synthetic biology. It is important that researchers disclose methods and risk assessments, and that the public is informed about potential impacts. A related sub-goal is ensuring equity and access: U.S. policy emphasizes growing the bioeconomy “safely, ethically, and equitably”, meaning that the benefits of engineered proteins (e.g. new medicines or sustainable chemicals) should be widely shared, without exacerbating disparities. Governance should therefore foster a culture of responsibility (norms) in addition to rules.
In sum, our policy goals include non-malfeasance through biosafety and biosecurity, along with constructive use and ethical stewardship of allosteric protein technologies. These echo broader frameworks: e.g., a JCVI report on synthetic genomics highlighted goals of enhancing biosecurity, fostering lab safety, and protecting communities, and WHO’s Global Guidance stresses that mitigating biorisks is a shared responsibility among regulators, researchers, funders, and the public.
Governance Actions
We can envision multiple governance actions – rules, incentives, and technical measures – involving different actors, to achieve these sub-goals. Below are four illustrative actions, each analyzed in terms of purpose, design, assumptions, and potential failures or unintended outcomes.
Mandatory Sequence Screening and Data Repositories (Regulatory Oversight).
Purpose: Strengthen biosecurity by ensuring that DNA orders (and protein designs) are screened for high-risk sequences and tracked. Currently, many DNA synthesis companies voluntarily screen orders against databases of pathogens. We propose a formal requirement: all gene synthesis orders must be checked against a curated list of dangerous motifs, with sequences stored in secure registries. This would detect attempts to synthesize known toxins or pathogenic proteins and provide traceability.
Design: This would be implemented by federal regulators (e.g. through HHS/NIH guidance or an FDA rule). Companies like Twist Bioscience, IDT, etc. would be required to maintain order logs and screening software. A central U.S. repository (or federated system) could store hashes of sequences (as Baker & Church suggest), accessible only to authorized agencies in emergencies. This leverages the law and architecture levers of governance. It could be built atop existing biosecurity programs: for example, current CDC Select Agent regulations and NIH Dual Use Research policies (like P3CO review) already require oversight of certain dangerous pathogens, and this action would extend oversight to synthetic sequences in the design phase. Funding and standards for the screening tools would come from agencies (e.g. NCBI or a new interagency lab) under an initiative like the U.S. Biosafety and Biosecurity Innovation Initiative (BBII). Law enforcement (FBI) might get emergency access to the registry if an incident occurs.
Assumptions: This assumes that screening algorithms can reliably flag high-risk sequences and that repositories can be secured. It also assumes companies will comply and that smaller providers exist. Uncertainty exists about unknown threats: a purely novel harmful protein might not match any list. There’s also an assumption that query access is strictly controlled (to avoid privacy or intellectual property leaks).
Risks of Failure & Success: If implementation fails (e.g. companies evade rules or hackers breach the database), malicious actors could slip dangerous gene orders through undetected. False negatives in screening or poor repository security are concerns. On the other hand, a “successful” outcome (a comprehensive, used registry) carries potential downsides: it could create a false sense of security (driving reliance on the system while new risks emerge), or raise controversies over surveillance and data privacy. Industry may resist added regulatory burdens or worry about proprietary sequences being logged. Additionally, this system might not catch illicit designs re-encoded by slight mutations or entirely new agents (highlighting the assumption risk).
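The hashed-registry idea in the Design paragraph above can be sketched in a few lines. This is a toy illustration only: the motif, the k-mer length, and the function names are hypothetical, and real screening systems rely on curated pathogen databases and far more sophisticated sequence matching than exact k-mer lookup.

```python
import hashlib

# Hypothetical blocklist of hazardous sequence motifs, stored as SHA-256
# hashes so the registry never holds plaintext sequences.
K = 12  # illustrative k-mer length, not a real screening standard
BLOCKLIST = {hashlib.sha256(motif.encode()).hexdigest()
             for motif in ["ATGGCCTTTAGG"]}  # stand-in "dangerous" motif

def screen_order(sequence: str) -> bool:
    """Return True if any k-mer of the order matches the hashed blocklist."""
    sequence = sequence.upper()
    return any(
        hashlib.sha256(sequence[i:i + K].encode()).hexdigest() in BLOCKLIST
        for i in range(len(sequence) - K + 1)
    )

def registry_record(sequence: str) -> str:
    """Hash the full order for a secure registry, queryable only in emergencies."""
    return hashlib.sha256(sequence.upper().encode()).hexdigest()

print(screen_order("GGGATGGCCTTTAGGCCC"))  # True: contains the flagged motif
print(screen_order("GGGAAACCCTTTGGGCCC"))  # False: no match
```

Storing only hashes mirrors the Baker & Church suggestion cited above: the registry can confirm whether a suspect sequence was ever ordered without exposing proprietary designs in plaintext.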
Strengthening Biosafety Training and Institutional Oversight (Norms).
Purpose: Enhance biosafety and ethical standards in research environments. The goal is to ensure that any lab engineer working with allosteric proteins has appropriate training and oversight so that accidents are minimized. Currently, many universities require Institutional Biosafety Committee (IBC) approval for recombinant DNA work, but domain-insertion designs might fall outside traditional GMO rules. We propose expanding such oversight: any project involving novel protein engineering must include a formal risk assessment and safety plan.
Design: Academic institutions, biotech companies, and community labs would be required (or strongly encouraged) to have certified biosafety officers and implement training in line with NIH/CDC guidelines. For example, the CDC BMBL (6th Edition) stresses risk assessment as protocol-driven practice. We could mandate biosafety training modules (online courses like NIH’s “Biosafety and Biosecurity for Researchers”) as prerequisites for projects. Grant proposals (NIH/NSF) could include a biosafety section (similar to how dual-use concerns are addressed under NIH’s DURC/P3CO frameworks). Professional societies (like ABSA International) and iGEM-style competitions already enforce safety reviews; they could partner to update best practices specifically for synthetic proteins. This action primarily uses norms and markets: it sets community standards and links them to funding.
Assumptions: We assume that education and oversight change behavior and that institutions will properly enforce rules. In reality, training alone may not catch every hazard or be uniformly effective. There’s uncertainty about how to assess risk for entirely new proteins. Some researchers might under-appreciate subtle risks (the Presidential Bioethics Commission noted uncertainty of risks in emerging tech).
Risks of Failure & Success: If oversight is lax or perfunctory, it fails to catch hazards (for instance, the “unknown unknowns” of a new protein’s toxicity). Conversely, if overdone, red tape could slow beneficial research or lead to check-the-box compliance without genuine safety benefits. A positive outcome would be a strong safety culture: researchers routinely consult biosafety experts and transparently document risks. However, even then, excessive caution could stifle innovation or push some work into less-regulated spaces (e.g. outside traditional institutions). Unintended consequences might include a two-tier system where well-funded labs meet all standards but under-resourced groups cannot afford oversight, raising equity issues.
Incentives and Technical Solutions (Market/Architecture).
Purpose: Promote “biosecurity by design” and make safety an integral part of protein engineering tools. This aligns with the White House’s call for biosafety innovation and the broader market lever: fund research that yields safer design methods. For example, just as ProDomino is a technical solution to domain insertion, we could develop companion tools that automatically flag potentially hazardous allosteric designs (e.g. if an inserted domain is similar to a toxin).
Design: Federal R&D programs (NIH, NSF, DOE) could issue grants or prizes for developing safety-enhancing technologies. One idea is an open database of “approved” insertion sites in model proteins (curated by NIH or a consortium), built from experimental data, so designers can avoid unexplored risky regions. Another is integrating safety checks into protein modeling software (an “architecture” lever): for example, adding modules to Rosetta or Alphafold pipelines that warn when a design might create unintended binding pockets or immunogenic sequences. This action spans both technical measures and incentives: the government (or industry consortiums) would fund proof-of-concept tools (like the BBII initiative mentioned in OSTP plans) and set standards for software. Public–private partnerships could emerge, similar to cybersecurity frameworks that integrate security-by-design, but here focusing on biotechnology.
Assumptions: This assumes that technical solutions (like AI risk-prediction) can meaningfully identify hazards in protein designs. It also assumes researchers will adopt these tools and that incentives (grants or prizes) will steer community efforts. A wrong assumption might be that technology can catch all risky cases; novel hazards may still evade detection if outside the training data. Another is that market forces will reward safety: e.g., companies might value a tool that prevents expensive failures, but if competition favors speed over caution, uptake may lag.
Risks of Failure & Success: If poorly executed, this could give a false assurance. For instance, if a screening tool is widely used, labs might neglect independent judgment, potentially amplifying a single point of failure. A failure mode is if funding programs end, leaving tools unsupported. On the upside, success means a robust suite of safe design practices (like how the tech industry uses static analysis). But even success carries risk: it could entrench certain standards that inhibit creativity (for example, overly conservative site restrictions), or create security dependence (if, say, a private firm owns a key safety algorithm and restricts access). Also, focusing on tech solutions alone ignores the need for socio-political measures (Norms and Law).
Community Norms and International Collaboration (Norms/Policy).
Purpose: Build a supportive ecosystem of shared norms and align with global governance. Scientific societies, DIY/“biohacker” spaces, and international bodies all have roles. The idea is that, alongside rules and tools, social norms and global consensus shape behavior. WHO’s framework stresses that biorisk governance must involve all relevant stakeholders, and indeed norms (like iGEM’s safety culture) can be very influential.
Design: Domestically, this could involve updating guidelines like the NIH Guidelines for Recombinant DNA and CDC/ABSA resources to explicitly cover novel protein engineering. Community lab networks (e.g. DIYBio, community biology spaces) can use their living handbook to include domain-insertion safety practices. Internationally, governments might adopt the WHO guidance or new multilateral agreements on synthetic biology, akin to arms control for biological technologies. This could take the form of a global registry of benign design practices or a shared platform (similar to how cybersecurity has international CERTs). The law lever would be lighter here – mostly adopting voluntary standards – but it is reinforced by norms and possibly market if international funding agencies make grants conditional on following global best practices. Educational outreach to the public and cross-disciplinary dialogues (technology foresight, as suggested by the Presidential Commission) would also fit here.
Assumptions: This assumes broad international cooperation and that soft norms will be heeded. However, biotech competition could hinder agreement. There is uncertainty in how to enforce norms across borders. We also assume community groups stay responsible; history shows some activists voluntarily consult experts (e.g. DIYBio Ask-an-Expert) but there is always a risk of rogue actors.
Risks of Failure & Success: A failure in norms-building could look like fragmentation: some countries or communities might not follow guidance, leaving gaps. If community labs do not buy in, they could inadvertently develop risks. Alternatively, success means a cohesive culture where best practices are widely shared and updated (an “adaptive governance” approach, as recommended by the NSC on Emerging Biotech). Unintended consequences of “success” might include complacency (assuming norms are followed without verification) or conflicts if ethical standards diverge globally. For instance, one country might push ahead with a risky application while others refrain, leading to distrust or tension.
In all these actions, a combination of approaches is needed. Law and regulation (Action 1) set the baseline; norms and education (Actions 2 and 4) cultivate a responsible culture; market incentives and technical solutions (Action 3) drive innovation in safety. Echoing Ethan Zuckerman’s adaptation of Lessig’s framework, effective governance will likely use all four levers – laws, norms, markets, and code – together. For example, legal rules might mandate screening, markets might fund safe-design tools (through government grants), norms might be embodied in professional codes and training, and “code” might be built into design software.
Each proposed action must be monitored and updated as the field evolves. The Synthetic Genomics report advises policymakers to “closely monitor progress” and remain willing to adapt options. Likewise, the White House bioeconomy strategy insists on continuous assessment of biosafety investments as biotechnology advances. Only by combining technical foresight with flexible governance can we ensure that the development of engineered allosteric proteins contributes positively without unintended harm.
Sources: The above discussion draws on multiple governance analyses and policy reports. For example, Baker and Church (2024) highlight the dual-use risks of AI-driven protein design; the JCVI Synthetic Genomics study (2007) outlines goals of biosecurity and lab safety; the WHO’s 2022 framework stresses shared responsibility in biorisk management; U.S. government initiatives (e.g. the Bold Goals report) emphasize growing the bioeconomy safely, ethically, and equitably; and the U.S. bioethics commission (2014) calls for transparency and ethics education in synthetic biology. These and other resources guide the above proposals.
| Criterion | Option 1: Training & Oversight | Option 2: Technical Incentives | Option 3: Norms & Int’l Collab |
|---|---|---|---|
| Enhance Biosecurity — By preventing incidents | 2 | 1 | 2 |
| Enhance Biosecurity — By helping respond | 2 | 2 | 3 |
| Foster Lab Safety — By preventing incidents | 1 | 1 | 2 |
| Foster Lab Safety — By helping respond | 1 | 2 | 3 |
| Protect the environment — By preventing incidents | 2 | 2 | 2 |
| Protect the environment — By helping respond | 2 | 2 | 3 |
| Minimizing costs & burdens to stakeholders | 2 | 1 | 1 |
| Feasibility | 1 | 2 | 3 |
| Not impede research | 1 | 1 | 1 |
| Promote constructive applications | 1 | 1 | 1 |
Rationales
Option 1 — Biosafety Training & Oversight
Strengths: Proven to reduce lab accidents; flexible, feasible, and widely adoptable.
Limitations: Inadequate alone for adversarial risks; uneven enforcement.
Scores: Strong for safety, low research burden; moderate biosecurity impact.
Option 2 — Biosecurity-by-Design Tools
Strengths: Builds safety into design workflows; reduces long-term risk and cost.
Limitations: Tool development takes time; potential false negatives or vendor lock-in.
Scores: High for prevention and feasibility; moderate risk of over-reliance.
Option 3 — Norms & International Collaboration
Strengths: Encourages global buy-in and low-cost adoption; fills policy gaps.
Limitations: Slow to form; weak enforcement and limited rapid response utility.
Scores: Low burden, high legitimacy; low immediacy.
Prioritize a layered governance strategy for safe domain-insertion allosteric protein engineering. First, combine enhanced biosafety training and oversight (Option 1) with funding for biosecurity-by-design tools (Option 2) to reduce accidental and intentional risks while preserving research freedom. Next, implement targeted sequence screening (the regulatory action described above) with safeguards for IP and privacy. Support international norms and community engagement (Option 3) to build global trust and equity. Fund pilot programs, FOAs for safety modules, and capacity-building grants for smaller labs. Require biosafety/dual-use risk plans in grant applications. This integrated approach balances safety, innovation, feasibility, and fairness in emerging protein design.
Assignment (Week 2 Lecture Prep)
This assignment was done with the help of AI
Homework Questions from Professor Jacobson
DNA Polymerase Error Rate
Baseline misincorporation rate (replicative polymerases with proofreading):
~10⁻⁷ per base per replication.
After mismatch repair (MMR):
~10⁻⁹ to 10⁻¹⁰ per base per cell division (final mutation rate).
Human genome size:
~3.2 × 10⁹ base pairs (haploid genome).
Quantitative comparison
At a 10⁻⁹ error rate: 3.2 × 10⁹ bases × 10⁻⁹ errors per base ≈ 3 mutations per cell division.
So despite copying billions of bases, only a handful of mutations persist per replication cycle.
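This back-of-the-envelope arithmetic is easy to verify directly:

```python
# Expected mutations per cell division, using the figures above.
GENOME_SIZE = 3.2e9   # haploid human genome, base pairs
ERROR_RATE = 1e-9     # per base per division, after mismatch repair

expected_mutations = GENOME_SIZE * ERROR_RATE
print(round(expected_mutations, 1))  # → 3.2
```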
How biology resolves the discrepancy
Biology uses a multi-layered fidelity architecture:
Base selectivity (polymerase active site geometry)
3′→5′ exonuclease proofreading
Mismatch repair (MMR)
Damage repair pathways (BER, NER, HR, NHEJ)
The system functions as a hierarchical error-correction cascade, reducing raw chemical error into evolutionarily tolerable mutation rates.
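As a rough sketch of how the layers multiply, one can combine order-of-magnitude estimates for each stage. The per-layer factors below are textbook approximations assumed for illustration, not values given in this assignment, and they vary by polymerase and organism:

```python
# Approximate per-layer fidelity contributions (order-of-magnitude estimates).
base_selectivity = 1e-5       # raw active-site discrimination, errors per base
proofreading_factor = 1e-2    # improvement from 3'->5' exonuclease proofreading
mmr_factor = 1e-2             # improvement from mismatch repair (~1e-2 to 1e-3)

final_rate = base_selectivity * proofreading_factor * mmr_factor
print(f"{final_rate:.0e}")    # → 1e-09 errors per base per division
```

Multiplying the layers reproduces the ~10⁻⁹ final mutation rate quoted above, which is the sense in which the system acts as a cascade: each stage removes most of the errors the previous stage let through.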
How Many DNA Sequences Code for an Average Human Protein?
An average human protein ≈ 400 amino acids.
The genetic code is degenerate:
61 codons encode 20 amino acids.
Average degeneracy ≈ 3 codons per amino acid (not uniform).
Rough combinatorial estimate
If average degeneracy ≈ 3:
3⁴⁰⁰ ≈ 10¹⁹⁰
So ~10¹⁹⁰ possible DNA sequences could encode the same protein sequence.
This number vastly exceeds the number of atoms in the observable universe (~10⁸⁰).
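The combinatorial estimate can be reproduced with logarithms, assuming the same average degeneracy of 3 codons per amino acid:

```python
import math

AVG_DEGENERACY = 3     # rough average codons per amino acid (61 codons / 20 aa)
PROTEIN_LENGTH = 400   # average human protein, in residues

# Number of synonymous DNA sequences ≈ 3^400; log10 gives a readable magnitude.
log10_count = PROTEIN_LENGTH * math.log10(AVG_DEGENERACY)
print(f"3^400 ≈ 10^{log10_count:.1f}")  # → 3^400 ≈ 10^190.8
```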
Why Most of These Sequences Don’t Work in Practice:
Although they encode the same amino acid sequence, many fail functionally due to: