I am a student researcher focused on peptide design and experimental validation, with strong interests in bioinformatics and computational biology. My work centers on the AI-assisted design of inhibitory peptides, followed by in vitro and in vivo experimental testing, combining computational approaches with molecular biology and biochemical validation.
I actively participate in scientific outreach initiatives and the organization of academic events, and I am involved in communities focused on omics sciences, bioinformatics, and science education. My goal is to develop computationally guided biomolecular tools with real experimental impact in biomedicine, microbiology, and biotechnology.
Subsections of Homework
Week 1 HW: Principles and Practices
Governance of AI-Driven Biological Design
1. Biological Engineering Application or Tool
Description
As the core idea of my project, I would like to develop a concept that has been under discussion in my laboratory for some time: a computational platform for the de novo design of peptide and protein ligands capable of inhibiting essential microbial processes, such as translation, with the goal of suppressing or controlling microbial growth.
This platform would integrate deep-learning-based structure prediction, generative sequence design, and molecular dynamics simulations to generate ligands that selectively disrupt key molecular interactions in microbial metabolism, including protein–protein and protein–nucleic acid interactions. Tools such as RFdiffusion (RoseTTAFold diffusion) and BindCraft have been reported to design binders with high affinity and specificity, which supports this approach as a promising strategy for the rational development of new antimicrobial agents.
2. Governance and Policy Goals
General Governance Objective
Ensure that computational platforms for antimicrobial ligand design are developed and implemented in ways that maximize public health benefits while minimizing risks of misuse, accidental harm, and unethical applications.
Objective 1: Prevent Malicious or Irresponsible Use of Molecular Design Platforms
Sub-goals:
Limit the use of ligand design tools to prevent the generation of molecules that could increase microbial virulence, toxicity, immune evasion, or environmental persistence.
Establish mechanisms to identify, evaluate, and manage cases of dual-use research of concern (DURC) arising from computational molecular engineering.
Implement educational mechanisms focused on responsible AI use, aimed at preventing unintentionally harmful applications.
Objective 2: Strengthen Biosecurity Throughout the Research Process
Sub-goals:
Ensure that computational design workflows incorporate early biological risk assessment, and promote rigorous experimental validation protocols that evaluate toxicity and potential ecological impact before advancing projects.
Objective 3: Promote Responsible Innovation and Transparency in AI-Driven Bioengineering
Sub-goals:
Encourage documentation, traceability, and auditability of molecular design decisions, supported by open scientific communication and the sharing of best practices related to risk prevention and mitigation.
3. Governance Actions
Action 1 — Integrated Biosecurity Filters in Molecular Design Platforms
Purpose:
Current molecular design tools optimize binding and stability without systematic safety analysis. The proposed change is to integrate mandatory biosecurity filters that detect or log potentially dangerous sequences or misuse, particularly those related to pathogens or viral components.
Design:
This action would involve collaboration with the laboratories and organizations responsible for developing these software platforms to implement built-in safety checks.
Assumptions:
It is assumed that harmful biological functions can be predicted and flagged computationally with sufficient accuracy.
Risks of Failure & Success:
Failures include false negatives, where harmful designs are not detected. Additionally, many platforms operate locally and offline, limiting centralized monitoring.
Action 2 — Institutional Oversight for AI-Driven Molecular Engineering
Purpose:
Establish specialized review processes to evaluate dual-use risks before experimental implementation.
Design:
Universities, ethics committees, funding agencies, and regulatory bodies would implement multidisciplinary review panels and mandatory risk assessments prior to project approval and funding.
Assumptions:
This approach assumes institutional capacity for technical risk evaluation and researcher compliance.
Risks of Failure & Success:
Excessive regulation may discourage exploratory research, particularly in low-resource environments.
Action 3 — Tiered Access and Licensing of Advanced Molecular Design Platforms
Purpose:
Implement user identification and credential-based access models to monitor and deter misuse.
Design:
Platform developers, regulatory agencies, and academic consortia would establish access levels, authentication systems, and activity monitoring.
Assumptions:
This approach assumes credential-based access control is enforceable and accepted by researchers.
Risks of Failure & Success:
Failures include excluding under-resourced researchers and the emergence of unregulated alternative tools.
4. Governance Option Evaluation Matrix
| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 1 |
| • By helping respond | 2 | 1 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | 1 | 2 |
| • By helping respond | 2 | 1 | 2 |
| Protect the environment | | | |
| • By preventing incidents | 2 | 1 | 2 |
| • By helping respond | 3 | 1 | 2 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 3 |
| • Feasibility? | 1 | 2 | 3 |
| • Not impede research | 2 | 3 | 1 |
| • Promote constructive applications | 1 | 2 | 2 |
5. Governance Prioritization and Recommendation
Based on the scoring and overall evaluation, I would prioritize institutional oversight mechanisms (Action 2) and tiered-access systems (Action 3) as the most effective governance options. The main reason is that the primary risk factor for misuse lies in broad open access to these software tools, combined with limited monitoring of their actual use. Because many AI-based biological design platforms are freely accessible, traceability, accountability, and early risk detection are currently limited, which increases the risk of accidental misuse or deliberate malicious exploitation.
The administrative burden, slower research workflows, and barriers for resource-limited institutions were considered key trade-offs. However, the significant improvements in risk mitigation, accountability, and governance transparency outweigh these disadvantages. Furthermore, these limitations can be reduced through careful system design, including fast-track approval procedures and international collaboration frameworks.
This recommendation assumes that institutions possess the organizational and technical capacity to implement oversight systems and that researchers will comply with access regulations. Major uncertainties remain regarding global regulatory harmonization, consistent enforcement, and the adaptability of governance frameworks in response to the rapid evolution of AI capabilities.
Target Audience:
This recommendation is primarily directed at major research institutions, international scientific organizations, and national regulatory agencies, aiming to establish coordinated governance structures that balance innovation, safety, and public protection.
Homework Questions & Answers
Homework Questions from Professor Jacobson
Questions
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Answers
DNA polymerase has an error rate of approximately 1 mistake per 10⁶ nucleotides incorporated. Given that the human genome is roughly 3 × 10⁹ base pairs long, this would result in about 3,000 errors per replication cycle if nothing further corrected them. Biology addresses this discrepancy through the proofreading activity of DNA polymerase and post-replicative mismatch repair systems, which dramatically reduce the final mutation rate.
Due to the degeneracy of the genetic code, there exists an astronomically large number of DNA sequences capable of encoding an average human protein. However, in practice, not all of these sequences are equally viable. Factors such as mRNA stability, codon usage bias, translational efficiency, secondary structure formation, and regulatory sequence constraints limit the set of functional coding sequences.
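The two answers above can be made concrete with a short calculation. This is only an illustrative sketch: the function names are hypothetical, the error rate and genome length are the figures quoted in the answer, and the 10-residue peptide is an arbitrary example rather than a real protein.

```python
from math import prod

# Number of codons that encode each amino acid in the standard genetic code
# (the counts add up to the 61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def expected_errors(genome_length_bp, bases_per_error):
    """Expected number of copying errors for one pass over the genome."""
    return genome_length_bp / bases_per_error


def count_synonymous_codings(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(CODONS_PER_AA[aa] for aa in protein.upper())


# Figures quoted in the answer: 3 x 10^9 bp genome, 1 error per 10^6 bases.
print(expected_errors(3_000_000_000, 1_000_000))

# Arbitrary 10-residue peptide, used only to show how fast the count grows.
print(count_synonymous_codings("MKTAYIAKQR"))
```

Because the count is a product over residues, it grows exponentially with protein length, which is why the number becomes astronomically large for a protein of several hundred amino acids.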
Homework Questions from Dr. LeProust
Questions
What’s the most commonly used method for oligo synthesis currently?
Why is it difficult to make oligos longer than 200 nt via direct synthesis?
Why can’t you make a 2000 bp gene via direct oligo synthesis?
Answers
The most commonly used method for oligonucleotide synthesis is solid-phase chemical synthesis using phosphoramidite chemistry.
It is difficult to synthesize oligonucleotides longer than ~200 nucleotides because the coupling efficiency at each synthesis cycle is not perfect, leading to the progressive accumulation of errors and truncated products as the length increases.
A 2000 bp gene cannot be synthesized directly because the cumulative error rate and product truncation become overwhelmingly high, preventing the recovery of a correct full-length sequence in sufficient yield and purity.
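The length limit in the answers above follows from compounding the per-cycle coupling efficiency. The sketch below assumes a 99% coupling efficiency purely for illustration (real values depend on the chemistry and instrument), and it models truncation only, not base-level errors. The function name is hypothetical.

```python
def full_length_yield(length_nt, coupling_efficiency):
    """Fraction of strands that reach full length.

    A strand of n nucleotides needs n - 1 successful coupling steps,
    so the yield is the per-step efficiency raised to the power n - 1.
    """
    return coupling_efficiency ** (length_nt - 1)


# Assumed coupling efficiency of 99% per cycle (illustrative value).
for length in (20, 200, 2000):
    print(length, f"{full_length_yield(length, 0.99):.2e}")
```

Under this assumption, a 200-mer already loses most strands to truncation, and for a 2000-mer the full-length fraction is negligible, which is why genes are assembled from shorter oligos.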
Homework Question from George Church
Question
What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?
Answer
The ten essential amino acids in animals are:
Histidine
Isoleucine
Leucine
Lysine
Methionine
Phenylalanine
Threonine
Tryptophan
Valine
Arginine
Week 2 — DNA Read, Write, & Edit
Part 1: Benchling & In-silico Gel Art
Simulate Restriction Enzyme Digestion with the following Enzymes:
Part 3: DNA Design Challenge
3.1. Choose your protein.
In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
The protein selected for this assignment is Translation Initiation Factor IF-2 (IF2) from Escherichia coli (strain K12).
I chose IF2 because it plays a central role in the initiation of protein synthesis, a critical and highly regulated step of gene expression. IF2 is responsible for promoting the binding of the initiator tRNA to the ribosomal P site and facilitating the correct assembly of the translation initiation complex. Due to its essential function, IF2 is a key factor in controlling translational efficiency and fidelity. Additionally, IF2 is highly conserved among bacteria, making it an important target for studies in molecular biology, ribosome dynamics, and antibiotic development. Its biological relevance and mechanistic complexity make it a particularly interesting protein to study.
The amino acid sequence of IF2 was obtained from UniProt.
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
Using the reverse translation tool available in Benchling, the amino acid sequence of Translation Initiation Factor IF-2 (IF2) from Escherichia coli (strain K12) was converted into a corresponding nucleotide sequence.
I have shortened the sequence shown here because the full sequence is very long.
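As a minimal sketch of what a reverse translation tool does, the code below maps each amino acid to a single codon. The codon chosen for each amino acid is just one valid option picked for illustration; it is not the table Benchling uses, and the names `ONE_CODON_PER_AA` and `reverse_translate` are hypothetical.

```python
# One valid codon per amino acid (illustrative choice, not a usage table).
ONE_CODON_PER_AA = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def reverse_translate(protein):
    """Return one of the many DNA sequences that encode the protein."""
    try:
        return "".join(ONE_CODON_PER_AA[aa] for aa in protein.upper())
    except KeyError as exc:
        raise ValueError(f"Unknown amino acid: {exc.args[0]}") from None


# Toy peptide, not the IF2 sequence.
print(reverse_translate("MKW"))
```

Because the genetic code is degenerate, the result is only one of many possible nucleotide sequences; it is not necessarily the native gene sequence.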
3.3. Codon optimization.
Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?
Codon optimization is necessary because, although multiple codons can encode the same amino acid, different organisms show preferences for specific codons. These preferences are related to the abundance of corresponding tRNAs and directly affect translation efficiency, protein yield, and overall expression levels.
The organism selected for codon optimization was Escherichia coli, because it is one of the most widely used hosts for recombinant protein expression. E. coli offers fast growth, low cost, well-established genetic tools, and high-level protein production.
I am not sure how best to present the result, so I am including the output from Benchling as evidence.
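One way to check a codon-optimized sequence is to confirm that it still encodes the same protein and to count how many codons changed. The sketch below uses the standard genetic code; the function names and the toy sequences are hypothetical and are not the IF2 sequences from Benchling.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("Sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_changes(original, optimized):
    """Count changed codons, after checking both encode the same protein."""
    if translate(original) != translate(optimized):
        raise ValueError("Sequences do not encode the same protein")
    original, optimized = original.upper(), optimized.upper()
    return sum(
        original[i:i + 3] != optimized[i:i + 3]
        for i in range(0, len(original), 3)
    )


# Toy example: the lysine codon AAA is swapped for its synonym AAG.
print(translate("ATGAAATGG"))
print(codon_changes("ATGAAATGG", "ATGAAGTGG"))
```

The check makes the key property of codon optimization explicit: the DNA changes, but the encoded amino acid sequence must not.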
3.4. You have a sequence! Now what?
What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.
In a cell-dependent system, the optimized DNA is cloned into an expression plasmid and introduced into E. coli. Inside the cell, the DNA is transcribed into mRNA and then translated by ribosomes into a protein, which folds into its functional form and can later be purified using techniques such as affinity chromatography.
From personal experience, I use cell-dependent expression. In both cell-dependent and cell-free systems, however, the underlying process follows the Central Dogma of Molecular Biology: DNA is transcribed into mRNA, which is then translated into protein.
Part 4: Prepare a Twist DNA Synthesis Order
4.1. and 4.2. Build Your DNA Insert Sequence
The final DNA construct includes the essential regulatory and coding elements required for protein expression: a promoter to initiate transcription, a ribosome binding site (RBS) to enable efficient translation, a start codon (ATG), the codon-optimized coding sequence of the target protein, a C-terminal 7×His tag to facilitate protein purification, a stop codon, and a transcription terminator to properly end transcription. Together, these components ensure efficient transcription, translation, and purification of the recombinant protein in E. coli.
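The order of the elements described above can be sketched as a simple concatenation with a few sanity checks. This is a hedged illustration, not the construct that was ordered: the function name is hypothetical, the promoter, RBS, and terminator strings in the example are toy placeholders rather than real part sequences, and the 7×His tag is written as seven CAT codons with TAA as the stop codon by assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS_TAG = "CAT" * 7  # 7xHis tag, assuming the CAT codon for histidine
STOP = "TAA"         # assumed stop codon


def build_insert(promoter, rbs, cds, terminator):
    """Assemble promoter + RBS + CDS + His tag + stop + terminator."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if not cds.startswith("ATG"):
        raise ValueError("CDS must begin with the ATG start codon")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Drop a trailing stop codon so the His tag is fused in frame.
    if codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("CDS contains an internal stop codon")
    return promoter + rbs + "".join(codons) + HIS_TAG + STOP + terminator


# Toy placeholder parts, NOT real promoter/RBS/terminator sequences.
insert = build_insert("AAAA", "GGGG", "ATGAAATGGTAA", "CCCC")
print(insert)
```

Removing the original stop codon before appending the tag matters: if it were left in place, translation would end before the His tag and the protein could not be purified through it.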
4.3. to 4.6.
The figure shows the map of the IF2_MOD plasmid, built in the pET target (AMP) vector, a recombinant expression vector intended for producing the IF2 protein in Escherichia coli. The plasmid’s T7 promoter and T7 ribosome binding site (RBS) drive strong, controlled transcription and translation of the inserted gene. The lac operator (lacO) and the lacI repressor regulate expression, so that it can be induced with IPTG. The plasmid also carries a high-copy-number origin of replication, which supports stable plasmid maintenance, and the ampicillin resistance gene (ampR), which is used to select transformed cells.
Part 5: DNA Read/Write/Edit
5.1 DNA Read
(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).
I would want to sequence genes related to antibiotic resistance in pathogenic bacteria, because antimicrobial resistance is a major global health problem. By sequencing these genes, we can identify resistance mechanisms, track how they spread among bacterial populations, and monitor the emergence of new resistant strains. This information is essential for improving disease surveillance, guiding treatment decisions, and developing new strategies to control infections.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
For example, I would use short-read sequencing-by-synthesis, as implemented on Illumina platforms. This second-generation sequencing technology is well suited to studying antibiotic resistance genes in pathogenic bacteria. The input is purified DNA extracted from bacterial or environmental samples, which is prepared by fragmentation, adapter ligation, and PCR amplification to generate a sequencing library. During sequencing-by-synthesis, fluorescently labeled nucleotides are incorporated and detected to decode each DNA base. The output consists of millions of short DNA sequence reads, which can be analyzed to identify resistance genes, detect mutations, and monitor their spread in bacterial populations.
5.2 DNA Write
(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)
I would like to synthesize genes with optimized codons that encode antimicrobial peptides and inhibitors of proteins involved in bacterial translation, such as peptides that interact with IF2, as these could be used for drug discovery and the development of new antibiotics. These synthetic DNA sequences could then be inserted into expression vectors to rapidly produce and test novel therapeutic proteins.
(ii) What technology or technologies would you use to perform this DNA synthesis and why?
I would use solid-phase phosphoramidite DNA synthesis, followed by HPLC or PAGE purification, because this method allows precise and reliable chemical synthesis of custom DNA sequences. Each synthesis cycle consists of deprotection (detritylation), nucleotide coupling, capping, and oxidation; after the final cycle, the strand is cleaved from the solid support and fully deprotected. The main limitations are sequence length (typically up to ~200 nt per oligonucleotide), synthesis errors that accumulate with length, and moderate scalability; as a result, longer constructs must be assembled from shorter fragments.
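The need to build long constructs from shorter pieces can be illustrated by splitting a sequence into overlapping fragments and joining them back together. This is a simplified sketch: the 200 nt limit comes from the text above, the 20 nt overlap is an assumed value, the function names are hypothetical, and real assembly methods also have to consider melting temperatures, repeats, and strand orientation.

```python
def split_into_fragments(seq, max_len=200, overlap=20):
    """Split seq into fragments of at most max_len that overlap by `overlap`."""
    if not 0 < overlap < max_len:
        raise ValueError("overlap must be positive and smaller than max_len")
    step = max_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + max_len])
        if start + max_len >= len(seq):
            break
        start += step
    return fragments


def reassemble(fragments, overlap=20):
    """Join fragments, checking that each overlap matches exactly."""
    seq = fragments[0]
    for fragment in fragments[1:]:
        if seq[-overlap:] != fragment[:overlap]:
            raise ValueError("Adjacent fragments do not overlap as expected")
        seq += fragment[overlap:]
    return seq


# Toy 2000 nt sequence, used only to exercise the functions.
gene = "ACGT" * 500
pieces = split_into_fragments(gene)
print(len(pieces), max(len(p) for p in pieces))
print(reassemble(pieces) == gene)
```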
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I want to edit bacterial genes involved in antibiotic resistance and translation initiation, such as infB (encoding IF2), to study their function and to develop new antimicrobial strategies. Editing this DNA would allow precise modification of key residues to understand their role in protein synthesis and to identify vulnerabilities that can be targeted for drug development.
(ii) What technology or technologies would you use to perform these DNA edits and why?
I would use CRISPR-Cas9 gene editing technology because it allows precise, efficient, and targeted modification of DNA sequences. Its high accuracy, simplicity, and adaptability make it ideal for editing bacterial genes such as infB to investigate protein function and antibiotic resistance mechanisms.
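A first step in planning a CRISPR-Cas9 edit is locating candidate target sites. The sketch below assumes the commonly used SpCas9 enzyme, which requires an NGG PAM immediately 3' of a 20 nt protospacer. It scans the forward strand only and does not assess off-target effects; the function name and the toy sequence are hypothetical and are not taken from infB.

```python
import re


def find_guides(seq, guide_len=20):
    """Return (position, protospacer) pairs followed by an NGG PAM.

    Only the forward strand is scanned; a lookahead is used so that
    overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    return [(match.start(), match.group(1)) for match in pattern.finditer(seq)]


# Toy sequence: a 20 nt stretch followed by the PAM "TGG".
toy = "ACGT" * 5 + "TGG"
for position, guide in find_guides(toy):
    print(position, guide)
```

A fuller search would also scan the reverse complement and rank candidates by position relative to the residues to be modified.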
Week 3 — Lab Automation
1. Article / case study: Automation at Adaptyv Bio and protein binder design competitions
A prominent example of the use of automation in biology is the work carried out by Adaptyv Bio, a company specialized in laboratory automation and the integration of artificial intelligence for protein design and validation. In particular, Adaptyv Bio organized international protein design competitions, such as the Protein Binder Competition, in which thousands of computationally generated designs were experimentally tested using fully automated workflows.
In these competitions, participants used AI models to design proteins capable of binding to specific therapeutic targets, such as the epidermal growth factor receptor (EGFR) and, in similar events, emerging viral proteins such as those from Nipah virus. Subsequently, the best designs were synthesized, expressed, and characterized using automated robotic pipelines, including cloning, protein expression, and affinity assays through Bio-Layer Interferometry (BLI). The entire experimental process was conducted in high-throughput robotic laboratories, enabling the rapid, reproducible, and standardized evaluation of hundreds of proteins.
This approach demonstrated how the integration of artificial intelligence with robotic automation can dramatically accelerate the design–build–test–learn (DBTL) cycle, reducing costs, human error, and experimental time, while simultaneously generating large volumes of reproducible data to improve predictive models.
2. Automation proposal for the final project
For my final project, I plan to implement an automated workflow for the design, expression, and functional evaluation of protein binders directed against essential bacterial targets, combining artificial intelligence, structural bioinformatics, and experimental automation.
General automated workflow
A. Computational design of binders
Use of generative AI models and structural prediction (ProteinMPNN, RFdiffusion, AlphaFold2).
Molecular docking assessment and molecular dynamics simulations.
Automatic prioritization of candidates with better affinity and stability.
B. Experimental automation with Opentrons
Automated cloning of binder genes into expression vectors.
Bacterial transformation, expression induction and culture preparation.
Automated bacterial growth assays and inhibition measurement.
C. Functional validation
Automated reading of OD600.
Analysis of growth curves.
Statistical comparison between controls and strains expressing binders.
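The functional validation steps above could be analyzed along the following lines. This is a minimal sketch under stated assumptions: the readings are taken during exponential growth, the growth rate is estimated as the least-squares slope of ln(OD600) against time, and the data are synthetic values generated for illustration, not measurements. The function names are hypothetical, and the statistical comparison across replicates is not included.

```python
import math


def growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) versus time (per hour)."""
    logs = [math.log(od) for od in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


def percent_inhibition(rate_control, rate_binder):
    """Reduction in growth rate relative to the control, in percent."""
    return 100.0 * (1.0 - rate_binder / rate_control)


# Synthetic exponential curves (illustrative values only).
times = [0, 1, 2, 3, 4]
control = [0.05 * math.exp(0.6 * t) for t in times]
binder = [0.05 * math.exp(0.3 * t) for t in times]

r_control = growth_rate(times, control)
r_binder = growth_rate(times, binder)
print(f"{r_control:.2f} {r_binder:.2f} {percent_inhibition(r_control, r_binder):.1f}")
```

With real plate-reader data, the fit should be restricted to the exponential phase and blank-corrected readings, since lag and stationary phases would bias the slope.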