Class Assignment Describe a biological engineering application or tool you want to develop and why. Aplication title: De novo design of proteins binders for neutralizing Bothrops venom toxins
Antivenoms are a mix of immunoglobulins produced traditionally by the hyperimmunization of large animals with crude venom obtained from clinically-relevant snakes (Ratanabangkoon, K., 2023). Novel alternatives have emerged to neutralize venom toxins without the use of animals. For example, Torres and collaborators (2025) designed proteins with high affinity for important regions of cytotoxins from the 3FTx family. These proteins showed great neutralizating capacity in vitro and great protective capacity in vivo .
Part 1: Benchling & In-silico Gel Art Lambda Sequence: Sequence from E.coli I cl857 S7 lambda bateriophage (Daniels, et al., 1983) available at New England Biolabs (N3011)
A digest simulation was performed using the lambda sequence and 7 different restriction enzyme (EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI). The range of fragments obtained from this simulation varies depending on the enzyme used.
Opentrons Artwork: Gel Designing Design: Snake Trimeresurus puniceus Inspired from a snake photo taken in the Oswaldo Meneses serpentarium, Lima, Peru. Art created Donovan’s Automation art interface
Python Script Design Opentrons script was created following the instructions and ideas offered by the HTGAA Opentrons Colab. To create the script first I created a pseudocode with the idea of how the robot will work
Part A: Conceptual Questions How many molecules of amino acids do you take with a 500 grams of meat? (on average an amino acid is ~100 Daltons) Assuming whole composition of meat is protein, the number of amino acids molecules in 500 grams is 3.011 x 1024 molecules.
Why do humans eat beef but do not become a cow, eat fish but do not become fish? This is because our digestion breaks down macromolecules into their monomers. Proteins are broken down into amino acids that later are used for the biosynthesis of proteins. The phenotypic characteristic of an organism is defined in its principally by its genome, and it’s not affected by the food they consume.
Part 1: SOD 1 Binder Peptide Design Superoxide dismutase 1 sequence was retrieved from Uniprot database (P00441), this protein has a length of 154 amino acids.
SOD1 Sequence: sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
The mutated version of the human SOD1 caused by an A4V mutation was retrieved from the PDB database that contains a structure obtained from an X-Ray Diffraction study with a resolution of 1.90 Å (Hough et al., 2004)
Part 1: DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion High-Fifelity PCR Master Mix offered by New England Biolabs is a product that contains a DNA polymerase with high fidelity useful for cloning and amplification of difficult amplicons. Phusion DNA polymerase contains proofreading activity (3’ -> 5’ exonuclease) and a higher fidelity 50X greater that Taq polymerase. Thermo Fisher Scientific Phusion High-Fidelity PCR Marter Mix is composed of a HF Buffer or GC Buffer; both buffers are used to reduce the error rate of the DNA polymerase. HF Buffer contains a lower error rate (4.4 x 107) than GC Buffer (9.5 x 107), however GC buffer can improve the performance of the polymerase on some difficult or long templates with high GC-rich templates or with secondary structures. The master mix is also provided with a optimized concentration of MgCl2 which is an essential cofactor that stabilizes the DNA double helix and facilitates primer annealing. High Fidelity PCR reaction can also include DMSO that is used to reduce the melting temperature through its association with Cytosine residues that changes the conformation of the DNA template. What are some factors that determine primer annealing temperature during PCR? There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning? How does the plasmid DNA enter the E. coli cells during transformation? Describe another assembly method in detail (such as Golden Gate Assembly) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online). Model this assembly method with Benchling or Asimov Kernel!
Homework: Final Project Figure 1 below presents some key aspects of my final project that require experimental testing and quantitative evaluation. These aspects refer to the expression of the protein binders generated using a Deep Learning Model and selected after in silico prediction of their therapeutic characteristics.
Figure 1: Experimental aspects of the final project: AI-driven Antivenom: A Generative Pipeline for De novo Neutralizing Peptides Against Snake Toxins. Image generated using: Copilot AI Aspect 1: Cell-Free Expression System (CFS) Cell-Free Expression System (CFS) enables rapid expression of multiple protein variants in parallel. Since this project aims to predict and express different protein binders, CFS provides a scalable and automatable platform that avoids cell culture and allows preliminary functional testing without full purification (Cui et al., 2022) To ensure reproducibility, the CFS workflow must be standarized and quantitatively monitored. Expression efficiency will be measures by using detection tags like biotinylated lysine or His-tag into the peptide of interest, enabling detection through SDS-PAGE followed by Western Blotting. A colorimetric readout using biotin-binding secondary antibody and chromogenic substrate will aloow quantification of expression levels (Hunt et al., 2024)
Part A: Artwork Canvas Contibution: I contribute with 52 dots making the “Love” Desing at the bottom left plate. I liked the collaborative working of the canvas and the constant organic evolution of the design that resembles very well with the development of synthetic biology. For a next collaborative art experiment I could propose an experiment that serves as a competence of different groups, or a collaborative project that shows thet location of the participant in the world.
Subsections of Homework
Week 1 HW: Principles and Practices
Class Assignment
Describe a biological engineering application or tool you want to develop and why.
Aplication title:De novo design of proteins binders for neutralizing Bothrops venom toxins
Antivenoms are a mix of immunoglobulins produced traditionally by the hyperimmunization of large animals with crude venom obtained from clinically-relevant snakes (Ratanabangkoon, K., 2023). Novel alternatives have emerged to neutralize venom toxins without the use of animals. For example, Torres and collaborators (2025) designed proteins with high affinity for important regions of cytotoxins from the 3FTx family. These proteins showed great neutralizating capacity in vitro and great protective capacity in vivo .
“Omics” strategies applied to snake venoms have been developed as Venomics, these strategies allows the characterization of whole venoms building protein profiles. Similar estrategies can also be used for studying antibody-toxin complexes as Antivenomics (Lomonte, 2017). The information obtained through venomics and antivenomics give us the ability to build databases with structural and functional information of snake toxins; this information is important for disigning de novo proteins
Bothrops genus is one of the most relevant in South America and has been studied broadly. Many venomics and antivenomics studies has been developed giving high amounts of information that can be used for the design of proteins with high affinity to their important regions.
The following proposal aims to create a system that involves venomics and antivenomics estudies of Bothrops venoms to create a database that can be used to identify key regions with high impact in their toxic activities. Through these regions I propose the design of proteins using artificial intelligence. These candidates could be useful to explore the possibility of designing synthetic antivenoms that don’t depend on the use of animals for their production (Figure 1)
Figure 1: Schematic representation of de novo desing workflow
Justification: Snakebites are classified as Neglected Tropical Diseases by the World Health Organization (WHO) affecting low-and middle-income countries from Africa, Asia, and South America (World Health Organization, 2019).
Antivenoms are the only approved treatment against snakebites. Antivenoms show several limitations in their efficacy and production. Snake venoms present variability in their composition. This could lead to antivenoms with different efficacies depending on the venom used for their production. Additionally, antibodies present in the antivenom can cause adverse reactions when administered to the patient.
Antivenom production is technologically complex with high costs, resulting in a limitation for low-income countries (Alangode et al., 2020). Antivenoms not only present but also numerous challenges in their production but also in their requisites at different levels to be used safely (Figure 2, Potet et al, 2021)
Figure 2: Access antivenom barriers at different levels from global to local. Figure obtained from Potet et al, 2021
The limitations observed in antivenoms produced traditionally supports the necessity of novel alternatives that can be produced safely and with low cost in their design. The use of artificial intelligence with the information provided by venomics and antivenomics opens the possibility of creating synthetic alternatives for the neutralization of venom toxins, and their design could also be optimized to a production at large scale increasing their availabilty and reducing their cost.
Analysis of Protein Binders for Governance Goals and Actions
The World Health Organization has established a programme to evaluate the safety and effectivenes of current antivenoms intended for their use in different countries. This programme led to the recruit of several world experts, forming the Working Group on Snakebite Envenoming. Through this group, the WHO has established a goal of reducing the mortality and disabiluty of snakebite envenomings by 2030. (World Health Organization, 2019)
To accomplish this goal, the working group has developed a road map with objetives at different scales (Figure 3, Williams et al, 2019)
Figure 3: WHO snakebite envenoming road map objectives, impact goals, and timeline phases. Image gathered from Williams et al, 2019
Designing protein binders De novo fits with the objective “Safe and Effective Treatment” from the WHO roadmap. This objective proposes 5 key activities:
Make safe, effective antivenoms available, and affordable to all
Better control and regulation of antivenoms
Prequalification of antivenoms
Integrated health worker training and education
Improving clinical decision making, treatment, recovery and rehabilitation
Investing in innovative research on new therapeutics
The implementation of these protein binders as an alternative to traditionally-produced antivenoms should meet with these 5 key activites. The image below analysis how portein binders could contributo to these key activities and proposes 4 potential governance actions according to the objetive and key activites proposes by the WHO (Figure 4)
Figure 4: Analysis of protein binders and their possible relationship with government goals and actions. Figure A: Representation of key characteristics that every potential antivenom candidate must follow.(Obtained from: Thumtecho et al., 2023) Table A: Potential Governance Actions related to the use of protein binders. Table B: Possible contribution of protein binders to the key activities proposed by the WHO. Table C: Impact score of the Governance Actions proposed for each key activitiy from the WHO "Safe and Effective Treatment"
One of the most important governance actions that I would prioritize is the development of reproducible protocols for the design and use of protein binders against snake venoms. Reproducible protocols require the participation of public and private research institutions and involves the development of clear and highly reproducible strategies for de novo prediction of these protein binders, recombinant production and scalation. This action may contribute to other actions like the creation of guidelines promoted by the WHO using these protocols.
In Peru, the National Health Institute is in charge of antivenom production, the development of reproducible protocols requires the association of research laboratories with this institute. A pilot program can also be created using different species of the genus Bothrops to design and test the efficacy of protein binders.
References
Alangode, A., Rajan, K. & Nair, B. G. (2020). Snake antivenom: Challenges and alternate approaches.. Biochemical Pharmacology, 181. https://doi.org/10.1016/J.BCP.2020.114135
Lomonte, B. and Calvete, J. J. (2017). Strategies in ‘snake venomics’ aiming at an integrative view of compositional, functional, and immunological characteristics of venoms. Journal of Venomous Animals and Toxins including Tropical Diseases, 23(1). https://doi.org/10.1186/S40409-017-0117-8
Potet, J., Beran, D., Ray, N., Alcoba, G., Habib, A. G., Iliyasu, G., Waldmann, B., Ralph, R., Faiz, M. A., Monteiro, W. M., Sachett, J. d. A. G., Di Fábio, J. L., Cortés, M. d. l. Á., Brown, N. & Williams, D. (2021). Access to antivenoms in the developing world: A multidisciplinary analysis.. Toxicon: X, 12. https://doi.org/10.1016/J.TOXCX.2021.100086
Ratanabanangkoon, K. (2023). Polyvalent Snake Antivenoms: Production Strategy and Their Therapeutic Benefits. Toxins, 15. https://doi.org/10.3390/TOXINS15090517
Thumtecho, S., Burlet, N. J., Ljungars, A. & Laustsen, A. H. (2023). Towards better antivenoms: navigating the road to new types of snakebite envenoming therapies. Journal of Venomous Animals and Toxins including Tropical Diseases, 29.
Torres, S. V., Valle, M. B., Mackessy, S., Menzies, S. K., Casewell, N. R., Ahmadi, S., Muratspahić, E., Sappington, I., Overath, M., Rivera-de-Torre, E., Ledergerber, J., Laustsen, A. H., Boddum, K., Bera, A. K., Kang, A., Brackenbrough, E., Cardoso, I. A., Crittenden, E., Edge, R. & Decarreau, J. (2025). De novo designed proteins neutralize lethal snake venom toxins.. Nature, 639. https://doi.org/10.1038/S41586-024-08393-X
Williams, D., Faiz, M. A., Abela-Ridder, B., Ainsworth, S., Bulfone, T. C., Nickerson, A., Habib, A. G., Junghanss, T., Wen, F. H., Turner, M. J., Harrison, R. A. & Warrell, D. A. (2019). Strategy for a globally coordinated response to a priority neglected tropical disease: Snakebite envenoming.. PLoS Neglected Tropical Diseases, 13. https://doi.org/10.1371/JOURNAL.PNTD.0007059
OpenAI (2026). CHATGPT(GTP-5-based-model). Used for conceptual discussion and feedback on project development. https://chat.openai.com/
Week 2 Lecture Prep
Professor Jacobson Questions:
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
During DNA replication our cells use DNA polymerases for DNA synthesis, these polymerases can have an error rate of 1 bp per every 100,00 nucleotides. As the human genome is composed of 6 billion bp per diploid cell, every time a cell divides DNA polymerases will make about 120,000 errors (Pray, 2008). While these errors may become mutations that could lead to new adaptations, it is important to correct these errors since they could lead to many dangerous effects on the organism’s life. To correct these errors some DNA polymerases come with an extra exonuclease 3’-5’ activity that serves as proofreading. For example, PolƐ is a DNA polymerase that is involved in the process of DNA replication of the leading strand. PolƐ is a holoenzyme compose of many subunits, when a mismatch is detected in the pol site of PolƐ the proteins arrest the pol activity and the protein moves away from the mismatched 3’end preventing additional base incorporation. Then, the proofreading region generates a change in the DNA conformation. This takes the mismatched base to the exo site of the polymerase generating the excision, after that the polymerase resumes its activity after correcting the mistake (Wang et al, 2025). Proofreading mechanisms help to reduce errors induced by the replication process, for that reason, polymerases with proofreading activity are highly important in different applications. To design complex synthetics systems, it is necessary to reduce the possibility of bp mismatches caused by the polymerase, for that reason, high fidelity polymerase with proofreading activity is available commercially
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Human genetic code is a set of three RNA bases called codon; every codon decodes a specific amino acid. The genetic code shows codon degeneracy, in other word codons that can be used to decode the same amino acid. In average most amino acids correspond to three codons, with some exceptions like Methionine and Tryptophan that only belong to a single codon. While codon degeneracy allows the use of different codons to produce an amino acid, different organisms have different preferences for the codons they use. Codon preference may occur for different reasons like metabolic pressures where some specific tRNAs are used instead of a wide variety of tRNAs for every codon available. Similarly, protein characteristics may influence the preference for some tRNAs than others (Ford, n.d).
Dr. LeProust Questions:
What’s the most used method for oligo synthesis currently?
The most used method for DNA synthesis is through Phosphoramidite chemistry. This technology consists of the use of Nucleoside Phosphoramidites, a type of modified nucleosides that allows the sequential addition of new bases in a cyclic manner. These modified nucleosides are protected in a way that chemists can control the reaction of oligonucleotide synthesis by exposing only the regions of the nucleotide they desire.
Why is it difficult to make oligos longer than 200nt via direct synthesis?
One of the reasons why synthesis of oligos longer than 200 nt is the increase of errors caused by the natural DNA polymerases error rate or fidelity of nucleoside phosphoramidite thecnology. Another reason could be the limitations of Quality Control, since oligos require MALDI spectrometry to test their quality, this method limits the length to 10-50 nucleotides.
Why can’t you make a 2000bp gene via direct oligo synthesis?
Production of long oligos faces the main challenge of accumulating errors in their formation making it difficult to obtain high yield of oligos with high quality (Yin et al, 2024)
George Church Questions:
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
While many organisms are capable to synthesize all these 20 amino acids, some groups like ours (Metazoa) have lost the capability to synthetize nine EAAs. These amino acids are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. Additionally, arginine can also be considered essential because of the incapacity of the body to synthetize it under special periods of growth (Lopez & Mohiuddin, 2024). One explanation for the loss of the synthetic capacity of these essential amino acids is because of energetic efficiency. Estimates suggest that essential amino acids have high energetic costs in their synthesis. Selective pressures towards energy efficiency may contribute to the loss of capacity to produce essential amino acids and relying on them by direct consumption (Kasalo et al., 2026).
Considering that in Jurassic Park movies the lysine contingency consists in limiting the expansion of dinosaurs by creating to them the incapacity of lysine production. This contingency now seems futile because animals have more essential amino acids than dinosaurs in that case. Animals have overcome this limitation through the ingest of these amino acids in their diets, so in consequence dinosaurs can also survive by consuming other living things that produce lysine either from animal or plant sources.
Kasalo, N., Domazet-Lošo, M., & Domazet-Lošo, T. (2026). Outsourcing of energetically costly amino acids at the origin of animals. Nature Communications. https://doi.org/10.1038/s41467-026-68724-6
Pray, L. (2008) DNA Replication and Causes of Mutation. Nature Education 1(1):214
Wang, F., He, Q., O’Donnell, M. E., & Li, H. (2025). The proofreading mechanism of the human leading-strand DNA polymerase ε holoenzyme. Proceedings of the National Academy of Sciences, 122(22), e2507232122. https://doi.org/10.1073/pnas.2507232122
Yin, Y., Arneson, R., Yuan, Y., & Fang, S. (2024). Long oligos: direct chemical synthesis of genes with up to 1728 nucleotides. Chemical Science, 16(4), 1966–1973. https://doi.org/10.1039/d4sc06958g
Week 2 HW: DNA Read, Write & Edit
Part 1: Benchling & In-silico Gel Art
Lambda Sequence: Sequence from E.coli I cl857 S7 lambda bateriophage (Daniels, et al., 1983) available at New England Biolabs (N3011)
A digest simulation was performed using the lambda sequence and 7 different restriction enzyme (EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI). The range of fragments obtained from this simulation varies depending on the enzyme used.
EcoRV, for example, has 21 restriction sites, giving 22 band in the simulation. On the other hand, KpnI, SacI, and SalI have a few restriction sites, showing only two bands in the simulation (Figure 2).
Since restriction enzymes cleave specific sequences in the genome, the difference between the number of sites for EcoRV compared to KpnI, SacI, and SalI raises the question: **Why does the lambda genome have more restriction sites for EcoRV than others?
Bacteriophages usually present fewer restriction sites as a response to this defense mechanism. This difference may change depending on the interaction between the bacteriophage and its host (Pleška & Guet, 2017)
Gel Art: Raimondi Stela
Using a simulation of the digestion of lambda genome with different restriction enzymes, I tried to portray the “god of staffs”. This is a deity found in the Raimondi Stela that belonged to the Chavin culture (Figure 3)
The gel created tried to be similar to the deity holding their staffs.
Part 3: DNA design challenge
Protein Chosen: Bothrops atrox snake venom nerve growth factor
Description: Nerve Growth Factor (NGF) is a member of the neurotrophin family that regulates the growth, differentiation, and survival of peripheral neurons during the development of the nervous system. This factor acts through two key receptors, tyrosine kinase A receptor (TrkA) and p75 neurotrophin receptor (p75NTR). The TrkA receptor activates signaling cascades that promote neuron differentiation and neurite growth.
Snake venom NGF (sNGF) is a protein that has been reported from the venom of elapid and viperid snakes. It is proposed that the presence of sNGF in the venom helps the envenomation process by causing the release of signaling chemicals that promote inflammatory reactions and increase vascular permeability, aiding the spread of other toxins and promote the apoptosis of cells (Sunagar et al., 2013).
Justification: NGFs have been proposed as promising options to treat neurodegenerative diseases and promote regenerative processes. sNGFs show high similarities to human NGFs and have been studied for many applications like chondrogenesis, neurite outgrowth, neuroprotection, tumor growth inhibition, etc. (Devi & Jayaraman, 2025).
Because sNGFs present special activities during envenomation, the study of sNGFs from other snake species may help to find new functions with a possible use in the study of regeneration and nervous development. These new functions may contribute to the design of synthetic alternatives with specific functions that can be applied for therapeutic purposes.
Protein Sequence: I have chosen the sequence of a sNGF from Bothrops atrox snake venom available at the UniProt database (ID: A0A1L8D608). The existence of this protein was proved through transcription level.
Selection of the expression system
To continue with the process of reverse translation and codon optimization, I investigated which expression system would be the most suitable to produce this protein. Schütz et al. (2023) offers a concise guide for expression system selection with a decision graph depending on the characteristics of the protein to be expressed (Figure 3).
Figure 3: Decision Scheme for gene expression system. This scheme is based on the protein characteristics. Figure taken from Schütz et al. (2023)
I gathered the following information of the protein based on four decision points proposed by Schütz:
The target is eukaryotic protein
Uniprot PTM/Processing section describes that the protein contains a signal region related to its secretion between the 1-18 amino acids and three disulfide bonds (Figure 4).
The resulting protein would have 241 amino acids in total when expressed and 233 amino acids when secreted with a molecular mass of 27.197 KDa.
Figure 4: PTM/Processing information of B. atrox NGF available at UniProt (ID: A0A1L8D608)
Uniprot information from other NFGs does not show that requires glycosylation when expressed
In the case of this design, it wouldn’t be necessary to have an expression system with higher yield
Based on this information the decision graph suggests using an expression system using a strain of E. coli that promotes disulfide bond formation. This decision is also supported by other studies that use E. coli to express human NFGs in vitro (Tilko et al., 2016; Dicou, 1992)
Reverse Translate
Before performing the reverse translation of the protein, I decided to eliminate the amino acids 1-18 because they are part of the signal region of the protein and this won’t be used for this design. I included an initiator methionine at the N-terminus to allow translation initiation. The sequence modification was realized using the Benchling software.
Using the same software I reversed translated the protein using Escherichia coli (K12) genetic code using the method Match codon usage. The result of this process is an optimized sequence of 672 bp. This sequence was used later to perform a Blastx analysis where was found that the resulting sequence matches with other NGF from snakes (Figure 5)
Figure 5: BlastX analysis of the translated NGF using the Benchling software, the sequence showed close simmilarities with other snake NGFs and a predicted NGF group.
Codon optimization
To simulate the creation of a clonal gene using the Twist Bioscience environment, the optimized sequence was uploaded in the software. A codon optimization was performed in the application. During the configuration of the optimization, I conserved the region 321-524 since it’s predicted as the NGF region by the Blastx result.
The resulting sequence was later labeled as optimized B.atrox NGF (BatroxNGFOptimized) and finally chosed as the sequence to be used for the creation of the expression cassette.
Expression Vector Selection
To select a suitable expression vector, it is necessary to consider that the protein requires a proper environment to develop three disulfide bonds. The formation of disulfide bonds can be achieved by expressing the protein in E. coli periplasm or in the cytoplasm of engineered E coli.
A study performed by Shamriz et al. 2016 uses the pET-32a expression vector that contains the Trx-tag for increasing the solubility of the protein and its expression in E. coli Origami (DE3) to promote the correct formation of disulfide bonds in the cytoplasm of E. coli. Another strategy aims to translocate the recombinant protein into the periplasm using a signal peptide that helps the formation of disulfide bonds and increases its stability (Pouresmaeil & Azizi-Dargahlou, 2023).
Based on this information I opted for a pET-29b(+) expression vector from Twist Bioscience because it contains an N-terminal S•Tag™ sequence and may help with the protein solubility and a C-terminal His•Tag® sequence for its easy purification.
To help with the sulfide formation I selected SHuffle® strain from New England Biolabs that is engineered for the formation of disulfide bond in the cytoplasm.
Another way to express this protein is by adding signal sequence to allow the translocation of the protein to the periplasm and this could be analyzed later if possible.
Part 4: Preparation of Twist DNA Synthesis Order
A simulation of DNA Synthesis order was generated using the optimized NGF sequence obtained from the previous part and inserted into the pET-29b (+) expression vector generating the plasmid as can be observed below (Figure 6)
Figure 6: pET-29b Expression Plasmid with optimized B.atrox NGF sequence. Annotations of relevant regions where performed using the Benchling software and using information of the vector pET-29b (+) from Twist Bioscience
Part 5: DNA Read/Write/Edit
DNA ReadSequencing Idea: Genome-Wide Association Studies of Genetic Elements Related with Peanut Allergy Diversity in PeruDescription: Allergies are misdirected immune reactions against a specific molecule (Allergens) to a previously exposed patient. These reactions are associated with an immune response mediated by a particular type of antibody called IgE. Allergies are diverse in nature and involve several ambiental and congenital factors, but also genetic factors. Several genes have been investigating for their involvement in allergic reactions, showing a complex heterogeneity that varies person to person (Falcon & Caoili, 2023)
Genetic factors associated with allergies may help to elucidate the mechanisms that promote allergies predisposition. For that purpose, Genome-wide association studies (GWASs) offer a good option to study the genetic elements associated with allergies.
GWAS are used to identify the association between genotypes with phenotypes. This is performed by selecting a group of individuals to obtain their phenotypic information. Using different GWAS arrays or sequencing strategies, genotypes of these individuals are obtained. Phenotypic and genotypic information is later used to conduct association tests to obtain relevant genetic elements that may be important for the phenotype studied (Uffelmann et al., 2021)
Technologies to perform GWAS genotyping
GWAS genotyping technologies are microarrays, Whole Exome Sequencing (WES) or Whole Genome Sequencing (WGS). To study the genetic component of peanut allergy in Perú we can use previously associated genes like HLA-DQ and HLA-DR o genes located in chromosome six (Allergies and Genetics | Health and Medicine | Research Starters | EBSCO Research, n.d.). The objective of genotyping these genes is to determine Single Nucleotide Polymorphisms (SNPs) that might have strong association with peanut allergy. Whole genome sequencing technologies can be applied for SNP genotyping and involves sequencing all regions of the entire genome. On the other hand, Whole exome sequencing is a method for sequencing only the exonic region of the human genome.
Is the method first-, second- or third- generation or other?
Whole and Exome genome sequencing are part of the Next Generation Sequencing (NGS) because they are based in the massively parallel sequencing process.
What is your input? How do you prepare your input?
For this study my input is genomic DNA extracted from a representative Peruvian sample of individuals diagnosed with peanut allergy.
What are the essential steps of your chosen technology, how does it decode the bases of your DNA sample?
For WGS studies, Illumina uses a sequencing technology by synthesis, where fluorescently labeled nucleotides to sequence millions of clusters on a cell surface in parallel.
What is the output of your sequencing technology?
Illumina sequencing data is obtained through the signal intensity measurement of the labeled nucleotides that serve a terminators
DNA WriteProject Idea: Snake venom NGF from B atrox
The following idea aims to express a snake venom NGF from B. atrox. sNGFs have been applied in numerous studies to test their potential effect on regenerative processes because of their similarity with the human NGF and because of novel properties that may appear because of its evolution in the snake venom.
For its production I propose the recombinant production of this protein, for that I realized used a sequence available at UniProt (ID: A0A1L8D608) a reverse translated to then propose it cloning using a vector in E. coli and expression in the same organism.
DNA EditProject Idea: Using genetic engineered cells in hydrogels for cartilage regeneration
Hydrogels are tridimentional networks polymers that can be used as scaffold for cartilage tissue engineering. A promising approach is to modify the genome of stem cells, creating specific gene circuits to promote cartilage regeneration. Trough gene edition, we could use steam cells to modify their proliferation capacity or control it using genetic circuits, a concept that may help with this idea is the concept of BioBricks that allows to create libraries that coul be used to modify the behavior of these stem cells (Elnaggar et al., 2025).
Daniels, D.L. et al. (1983). Appendix II: Complete Annotated Lambda Sequence. R.W. Hendrix, J.W. Roberts, F.W. Stahl and R. A. Weisberg(Ed.), Lambda-II. 519-676. New York: Cold Spring Harbor Laboratory Press.
Devi, S., & Jayaraman, G. (2025). Unraveling the molecular basis of snake venom nerve growth factor: human TrkA recognition through molecular dynamics simulation and comparison with human nerve growth factor. Frontiers in Bioinformatics, 5, 1674791. https://doi.org/10.3389/fbinf.2025.1674791
Elnaggar, K. S., Gamal, O., Hesham, N., Ayman, S., Mohamed, N., Moataz, A., Elzayat, E. M., & Hassan, N. (2025). A guide in synthetic biology: Designing genetic circuits and their applications in stem cells. SynBio, 3(3), 11. https://doi.org/10.3390/synbio3030011
Falcon, R. M. G., & Caoili, S. E. C. (2023). Immunologic, genetic, and ecological interplay of factors involved in allergic diseases. Frontiers in Allergy, 4, 1215616. https://doi.org/10.3389/falgy.2023.1215616
Pleška, M., & Guet, C. C. (2017). Effects of mutations in phage restriction sites during escape from restriction–modification. Biology Letters, 13(12). https://doi.org/10.1098/rsbl.2017.0646
Pouresmaeil, M., & Azizi-Dargahlou, S. (2023). Factors involved in heterologous expression of proteins in E. coli host. Archives of Microbiology, 205(5), 212. https://doi.org/10.1007/s00203-023-03541-9
Shamriz, S., Ofoghi, H., & Amini-Bayat, Z. (2016). Soluble Expression of Recombinant Nerve Growth Factor in Cytoplasm of Escherichia coli. Iranian Journal of Biotechnology, 14(1), 16–22. https://doi.org/10.15171/ijb.1331
Sunagar, K., Fry, B. G., Jackson, T. N. W., Casewell, N. R., Undheim, E. a. B., Vidal, N., Ali, S. A., King, G. F., Vasudevan, K., Vasconcelos, V., & Antunes, A. (2013). Molecular Evolution of Vertebrate Neurotrophins: Co-Option of the Highly Conserved Nerve Growth Factor Gene into the Advanced Snake Venom Arsenalf. PLoS ONE, 8(11), e81827. https://doi.org/10.1371/journal.pone.0081827
Uffelmann, E., Huang, Q. Q., Munung, N. S., De Vries, J., Okada, Y., Martin, A. R., Martin, H. C., Lappalainen, T., & Posthuma, D. (2021). Genome-wide association studies. Nature Reviews Methods Primers, 1(1). https://doi.org/10.1038/s43586-021-00056-9
Week 3 HW: Lab Automation
Opentrons Artwork: Gel Designing
Design: Snake Trimeresurus puniceus
Inspired from a snake photo taken in the Oswaldo Meneses serpentarium, Lima, Peru. Art created Donovan’s Automation art interface
Python Script Design
Opentrons script was created following the instructions and ideas offered by the HTGAA Opentrons Colab. To create the script first I created a pseudocode with the idea of how the robot will work
Pseudocode
Get the coordinates of the art from donovan’s page in the form of a dictionary
Create a function Coordinate_per_color:
Pick up a 20 ul tip
For each coordinate
Check if the tip is empty (20 ul volume)
Aspirate an amount depending on the number of coordinates to fill (20 or less)
Get the x and y coordinates
Move to the x and y coordinates
Dispense 1 ul to the coordinate
Remove the tip
Call the function Coordinate_per_color for each color present in the dictionary
Following the idea of the pseudocode I followed the script design from Dominika Wawrzyniak, 2021 student and adapted to the coordinates from Donovan’s Automation page. For this first draft I decided to copy and paste the coordinates and give them a dictionary structure, then I changed the color names using the names from the robot deck setup constants. The resulting script is the following :
# Set the initial coordinates take from the donovan's page (Converted into a dictionary)Coordinates={"Green":[],"Red":[],"Blue":[],"Yellow":[],"Cyan":[]}#To avoid using many tips the objective is to create a function that takes up the points and add the volume per colordefCoordinate_per_color(color_string):# Pick up a 20 ul tippipette_20ul.pick_up_tip()# For every coordinate per colorforiinrange(len(Coordinates[color_string])):# i shows the number of positionsifi%20==0:# Aspirate a volume 20 if the total of remaining coord to paint is more than 20pipette_20ul.aspirate(min(20,len(Coordinates[color_string])-i),location_of_color(color_string))# Get the x and y coordinatesx_coordinate=Coordinates[color_string][i][0]y_coordinate=Coordinates[color_string][i][1]# Move to the x and y coordinatesadjusted_location=center_location.move(types.Point(x_coordinate,y_coordinate))# Dispense 1 ul to the positionpipette_20ul.dispense(1,adjusted_location)hover_location=adjusted_location.move(types.Point(z=2))pipette_20ul.move_to(hover_location)# Finishing drop the tippipette_20ul.drop_tip()#Call the function Coordinate_per_color for every color in the dictionaryfornameinCoordinates.keys():print(name)Coordinate_per_color(name)
After executing the script, I simulated the visualization and got the Image I wanted to create
Second Design: Geometrical Green/Red Yin and Yang
After recieving the instructions from my node y change the design for a Yin-Yang inspired design. To create the design I comtemplated the idea of using mathematical formulas to desing the pattern.
AI assistance
ChatGPT was used to support conceptual understanding of the geometric contruction of the Yin-Yang symbol using circles and semicircles. All code implementation was independently developed by me, only using the artificial inteligence to offer some feedback.
First I created a code testing the mathematical approach which consisted in creating the design using circles and semicircles, first I integrated the mathematical code with the robot operation code resulting in a messy code
//FirstYin-YangCode# Start at the centercursor=center_location.move(types.Point(x=0,y=0))# Define de radius as 20 (To reduce the times to aspirate a volume)radius=20# Define the number of point (Default 40)points=40# Function to create semicirclesdefthetha(i):theta=np.pi*i/(points-1)returntheta# Fucntion aspiratedefaspirate(color):ifi%20==0:pipette_20ul.aspirate(min(20,points-i),location_of_color(color))# Function hoverdefhover():hover_location=adjusted_location.move(types.Point(x=0,y=0,z=2))pipette_20ul.move_to(hover_location)# Create a green semicirclepipette_20ul.pick_up_tip()foriinrange(points):aspirate("Green")theta=thetha(i)x=radius*np.sin(theta)y=(radius*np.cos(theta))adjusted_location=cursor.move(types.Point(x=x,y=y))pipette_20ul.dispense(1,adjusted_location)hover()# Create and S-divider in the circle (UpperSide)inner_radius=10foriinrange(points):aspirate("Green")theta=thetha(i)x=inner_radius*np.sin(theta)y=(inner_radius*np.cos(theta))+radius/2adjusted_location=cursor.move(types.Point(x=x,y=y))pipette_20ul.dispense(1,adjusted_location)hover()pipette_20ul.drop_tip()# Create a Red semicirclepipette_20ul.pick_up_tip()foriinrange(points):aspirate("Red")theta=thetha(i)x=radius*np.sin(theta)y=(radius*np.cos(theta))adjusted_location=cursor.move(types.Point(x=-x,y=y))pipette_20ul.dispense(1,adjusted_location)hover()# Create and S-divider in the circle (LowerSide)inner_radius=10foriinrange(points):aspirate("Red")theta=thetha(i)x=inner_radius*np.sin(theta)*-1y=(inner_radius*np.cos(theta))-radius/2adjusted_location=cursor.move(types.Point(x=x,y=y))pipette_20ul.dispense(1,adjusted_location)hover()
Second Attemp: To make the code more readable I created two custom functions called create_circle and create_semicircle. Also adapted the logic to get a list of coordinates so it can be used by the code example shared by the Node.
## Ying-Yang Code ## Create two functions that will give the coordinate for circles## Circle functiondefcreate_circle(x_center,y_center,radius,points):coordinates=[]foriinrange(points):angle=2*math.pi*i/pointsx=x_center+radius*math.cos(angle)y=y_center+radius*math.sin(angle)coordinates.append((x,y))returncoordinates## Semicircle functiondefcreate_semicircle(x_center,y_center,radius,points,direction="left"):"""
Four semicircle orientations:
- right
- left
- up
- down
"""coordinates=[]foriinrange(points):angle=math.pi*i/(points)# Change directionifdirection=="right":# Base x and y coordinates (Default = right)x=x_center+radius*math.sin(angle)y=y_center+radius*math.cos(angle)coordinates.append((x,y))elifdirection=="left":x=x_center+radius*math.sin(angle)y=y_center+radius*math.cos(angle)coordinates.append((-x,-y))elifdirection=="up":x=x_center+radius*math.cos(angle)y=y_center+radius*math.sin(angle)coordinates.append((x,y))elifdirection=="down":x=x_center+radius*math.cos(angle)y=y_center+radius*math.sin(angle)coordinates.append((-x,-y))else:raiseValueError("direction must be: right, left, top or bottom")returncoordinates# Ying-Yang Design: Using the robot script offered by the node# Green parts# Green middle circle boundary pipette_20ul.pick_up_tip()green_big_circle=create_semicircle(0,0,20,40,"right")forx,yingreen_big_circle:adjusted_location=center_location.move(types.Point(x=x,y=y))ifpipette_20ul.current_volume==0:pipette_20ul.aspirate(1,location_of_color("Green"))dispense_and_detach(pipette_20ul,1,adjusted_location)# Semicircle which center is the middle inferior part of the circlegreen_S_semicircle=create_semicircle(0,10,10,40,"left")forx,yingreen_S_semicircle:adjusted_location=center_location.move(types.Point(x=x,y=y))ifpipette_20ul.current_volume==0:pipette_20ul.aspirate(1,location_of_color("Green"))dispense_and_detach(pipette_20ul,1,adjusted_location)# Small circle whose center is at the center of the s_semicirclegreen_small_circle=create_circle(0,10,5,20)forx,yingreen_small_circle:adjusted_location=center_location.move(types.Point(x=x,y=y))ifpipette_20ul.current_volume==0:pipette_20ul.aspirate(1,location_of_color("Green"))dispense_and_detach(pipette_20ul,1,adjusted_location)pipette_20ul.drop_tip()# Red parts# Red middle circle boundary pipette_20ul.pick_up_tip()red_big_circle=create_semicircle(0,0,20,40,"left")forx,yinred_big_circle:adjusted_location=center_location.move(types.Point(x=x,y=y))ifpipette_20ul.current_volume==0:pipette_20ul.aspirate(1,location_of_color("Red"))dispense_and_detach(pipette_20ul,1,adjusted_location)# Semicircle which center is the middle inferior part of the circlered_S_semicircle=create_semicircle(0,10,10,40,"right")forx,yinred_S_semicircle:adjusted_location=center_location.move(types.Point(x=x,y=y))ifpipette_20ul.current_volume==0:pipette_20ul.aspirate(1,location_of_color("Red"))dispense_and_detach(pipette_20ul,1,adjusted_location)# Small circle whose center is at the center of the s_semicirclered_small_circle=create_circle(0,-10,5,20)forx,yinred_small_circle:adjusted_location=center_location.move(types.Point(x=x,y=y))ifpipette_20ul.current_volume==0:pipette_20ul.aspirate(1,location_of_color("Red"))dispense_and_detach(pipette_20ul,1,adjusted_location)pipette_20ul.drop_tip()
The final code allowed me to obtain the desired Yin-Yang Design
Post-Lab Questions
Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications
The article I found interesting is developed by Kverneland et al (2024). In this article an automated workflow is designed with the objective of preparing protein samples for LC-MS/MS Analysis. Using Opentons OT-2 robot Hela cells samples and plasma serum from patients were prepared with a shotgun approach to prepare the sample for the proteomic analysis. From this approach they analyzed 192 HeLa samples and consistently identified approximately 8000 protein groups and 130,000 peptide precursors.
The importance of this study relies on the necessity of identifying and analyzing protein profiles of many samples. Proteomics approaches offer valuable information that can be used to discover novel biomarkers and contribute to the development of personalized treatments. Also, this article provides a potential approach to creating databases containing proteomic information that could be used for novel synthetic technologies like, for example, de novo design of proteins.
Write a description about what you intend to do with outomation tools
Idea 1: An automated pipeline for De novo Design and Production of small neutralizing peptides against Bothrops atrox Venom Toxins
For this idea I designed a pipeline and identified the use of automation approaches in two key activities as shown below
The use of automated proteomic can help us to identify novel proteins with potential therapeutical applications and also identify protein families and relevant sequences, this sequences can be used for the desing of small neutralizing peptides that may be used against snake venoms. Since the design would produce a high amount of potential candidates it is necessary an automated process of protein synthesis. For this I identify the use of AI driven cell-free protein synthesis as promising aproach to produce and test possible candidates to neutralize snake venom toxins.
Final Project Ideas
I created three slides containing my three final project ideas using the lessons learned until now:
How many molecules of amino acids do you take with a 500 grams of meat? (on average an amino acid is ~100 Daltons)
Assuming whole composition of meat is protein, the number of amino acids molecules in 500 grams is 3.011 x 1024 molecules.
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
This is because our digestion breaks down macromolecules into their monomers. Proteins are broken down into amino acids that later are used for the biosynthesis of proteins. The phenotypic characteristic of an organism is defined in its principally by its genome, and it’s not affected by the food they consume.
Why are there only 20 natural amino acids?
Amino acids origin is a complex field of study of many theories indicating why living organisms have conserved 20 amino acids. Crick for example mention the “Frozen accident theory” which states that amino acids have been kept the same as living beings’ complexity cross a limit where other amino acids cannot compete with the gold standard. On the other hand, there is a close relationship between these 20 amino acids and living being’s metabolism, suggesting that these amino acids are the result of metabolism optimization (Kirschning, 2022).
Can you make other non-natural amino acids? Design some new amino acids.
Non-natural amino acids are modified amino acids that don’t belong to the traditional 20 amino acids, these amino acids can be produced by several organisms. Non-natural amino acids can be designed by modifying natural amino acids using techniques grouped in the field of Peptidomimetics.
Where did amino acids come from before enzymes that make them, and before life started?
Several hypotheses have been proposed for the origin of amino acids, for example the “ARN World” hypothesis proposes the existence of a precursor to RNA whit the capability of processing information and catalyze chemical reactions. Evidence of this hypothesis is the presence of Ribozymes, RNA molecules with catalytic activity (Higgs & Lehman, 2014). Other theories propose other molecules like coenzymes or the association of metabolic pathways and amino acids for their production and the possible apparition of amino acids from meteorites (Kirschning, 2022)
If you make a -helix using D-amino acids, what handedness (right or left) would you expect?
L- and D-Amino Acids are chiral molecules, meaning that L-Amino Acids are the reflection of D-Amino acids. Alpha-helix using L-amino Acids results in right handedness helix. D-Amino Acids Could result in left handedness in their alpha helix (Novotny & Kleywegt, 2005).
Why are most molecular helices right-handed?
Most helices are right-handed because they are formed by L-amino Acids which are more stable at that orientation.
Why do β-sheets tend to aggregate?
Aggregation may be promoted by the presence of Aggregation Prone Regions (APRs) that are 5-15 residue long stretches in proteins. These APRs regions are usually buried inside the hydrophobic core of the protein. A protein with more APRs may aggregate because this rearing can link together through hydrogen bones which is also promoted by their hydrophobicity forming a “steric zipper” (Aggregation Prone Regions (APRs), n.d.). a. What is the driving force for β-sheet aggregation?
Β-sheet aggregation is promoted by intermolecular forces like hydrogen bonds and hydrophobic interaction between the Aggregation Prone Regions
Can you use amyloid β-sheets as materials?
A study made by Cheng et al. (2012) proposes the use of amyloid β-sheets mimics (ABSMs) to antagonize the aggregation of amyloid proteins and reduce their toxic activity.
Cheng, P., Liu, C., Zhao, M., Eisenberg, D., & Nowick, J. S. (2012). Amyloid β-sheet mimics that antagonize protein aggregation and reduce amyloid toxicity. Nature Chemistry, 4(11), 927–933. https://doi.org/10.1038/nchem.1433
Higgs, P. G., & Lehman, N. (2014). The RNA World: molecular cooperation at the origins of life. Nature Reviews Genetics, 16(1), 7–17. https://doi.org/10.1038/nrg3841
Kirschning, A. (2022). On the Evolutionary History of the Twenty Encoded Amino Acids. Chemistry - a European Journal, 28(55), e202201419. https://doi.org/10.1002/chem.202201419
Novotny, M., & Kleywegt, G. J. (2005). A survey of left-handed helices in protein structures. Journal of Molecular Biology, 347(2), 231–241. https://doi.org/10.1016/j.jmb.2005.01.037
Protein Analysis and Visualization
Protein Description: Bothrops atrox snake venom nerve growth factor (NGF)
Nerve Growth Factor (NGF) is a protein involved in the process of neurite differentiation and growth through the activation of a receptor that promotes signaling cascades. Snake venoms possess NGFs that might contribute to the toxicity of the venom by activating pro-inflammatory signals that increase vascular permeability. Snake NGF has been proposed as an alternative for therapeutic strategies aimed at regenerating tissues.
Amino acid sequence description
The amino acid sequence was obtained from the Uniprot database (ID: A0A1L8D608) predicted at a transcriptomic level
Using Uniprot’s Blast tool I found that the protein has high homology with a broad number of NGFs from other species of snakes and also from other reptiles and fishes
Uniprot’s database informs that this protein belongs to NGF-beta family and other related families according to different databases
NFG structure page
The B. atrox NGF doesn’t have a crystallyzed protein so its structure haven’t been resolved. To answer the protein structure questions another NGF from Mus musculus (ID: pdb_0001btg)
This protein has been resolve with 2.50 A structure resolution. This makes the structure a good model to study
Mus musculus Domains are part of the Neurotrophin family according to the Scop Database, each chain is linked with the 1BET A: 10-116 Domain (SCOP ID: 8026411)
Protein Visualization
For the visualization of the B. atrox NGF a model was created using the SWISS-MODEL tool to generate an predicted structure base on homology (A0A1L8D608)
Protein Visualization was performed using Pymol Software
NGF model was visualized as ribbons, cartoon and sticks. This model is a homo-2-mer
Coloring the protein by secondary structure shows that the protein lacks of alpha helix structures and is mainly formed by b-sheets structures and loops.
Hydrophobic (orange) and Hydrophilic (skyblue) residues were colored in the model. The model contains a homogeneous distribution of these residues along the beta sheets structure while the loops are mainly hydrophobic.
Part C1: ML-Based Protein Design Tools
The protein selected is the crystal structure of beta nerve growth factor at 2.5 A resolution from Mus musculus (ID: pdb_00001btg).
Deep Mutational Scans:
Mutation Scan Heatmap results show low scores along the protein sequence as can be observed below. However the fifth residue (S) showed a high mutational score with the glycine amino acid. Low scores in the mutation heatmap may be related with their protein conservation. A study from Barker et al. (2020) states that Mammalian nerve growth factors (NGF) are conserved, this could be related with a low mutational capacity. To corroborate this hypothesis I suggest comparing mutational scores of NGF of mammals and other animal groups.
Latent Space Analysis
The figure below shows the latent space result after introducing the NGF sequence.
The resulting figure shows a homogeneous distribution of the proteins in the database with small dispersed regions. This could suggest a gradual evolution of the proteins used in this database. Dispersed regions are diverse in origin and also comprises automated matches.
Part C2: Protein Folding
NGF folding obtained using the ESMFold program was created using the sequence from the Mus musculus NGF (ID: pdb_00001btg). The sequence was used creating 3 copies since the original PDB sequence presents a homotrimeric structure. The resulting prediction (Yellow) was compared with the original structure (Multicolor) using the Pymol software
To study the quality of the protein folding the predicted region was aligned with the original structure
As showed in the image above, an aligment was performed using only two chains of both structures (B and C) resulting an RMSD = 0.487 indicating a close alignment between the predicted structure and it’s original model. Similar result can be observed comparing the resulting structures, observing similarities en their secondary structure. This result suggests the correct application of the ESMFold language and indicates that the predicted protein can be used for other structural studies.
Part C3: Reverse Folding
The Mus musculus NGF was used with the Protein MPNN tool to predic the protein sequence based on the structural information. Using the default parameters of the tool a new sequence was obtained:
The resulting sequence presents a recovery score of 0.3519 wich suggest a low recovery percentage of the original sequence. Similar result can be observed when aligning both original and predicted sequence using Clustal Omega and through the probability graph of every aminoacid:
However Blast analysis predicted a close relationship of the predicted sequence with other NGFs of different species and other neurotrophins which could be supported with their evolutionary conservation of this family of proteins
Part D. Group Brainstorm on Bacteriophage Engineering
Main Goals:
Goal 1: Increase the stability of the MS2 lysis protein by predicting mutations of residues near the C-terminal region and surrounding the LS motif
Goal 2: Improve the N-terminal region by modifying residues to contribute to its toxic activity or add new functional regions that may increase its toxicity.
Improving MS2 lysis protein by modifying regions not related with the Leu48 and Ser49 (LS motif) and surrounding to improve protein toxicity (Chamakura et al, 2017). Predict mutations that may improve its stability. We suggest that by increasing protein stability, the protein would not require the presence of the DnaJ for its action.
Another goal is to design new accessories to the N-terminal region to improve lysis toxicity. Berkhout et al, 1985 suggest C-terminal region is key for protein activity, so taking this in consideration we can try to modify the N-terminal region to improve protein stability or add a new characteristics that may improve the toxicity of the protein
Strategy
Given our two main goals, we propose different strategies to address each objective
For the first goal, we propose using a protein language model such as ESM-2 to perform in silico deep mutational scan that evaluates the plausibility of all possible single-point mutations in the MS2 L protein. Subsequently, we will employ ESMFold or AlphaFold2 to predict the resulting 3D structural variations.
For the second goal:
Step 1: Identify and Annotate key functional regions near the C-terminal motif and LS motif
Software: Blast (For conserved domains), PeSTO (Functional motifs)
Predict mutations near the N-terminal and C-terminal site that may improve protein stability
Software: Clustal Omega (To identify hotspots for mutations)
Generate different protein candidates with mutations and evaluate their stability
Software: Alpha-Fold Multimer, Boltz-1
We propose using Alpha-Fold with a specific training set for bacteriophages
Predict accessory peptide sequences to insert in their N-terminal region and improve its toxicity
Software: FoldSeek (To find remote sequences with similar folding), EvolvePro (To suggest optimized N-terminar sequences)
Test suitability of these protein candidates by performing docking essays with a bacterial membrane model, etc.
The model can predict structures that look stable and coherent, but it does not measure real folding energy, membrane insertion, or toxicity. “Looks good” in silico ≠ “works better” in vivo.
Selection of variants that appear structurally improved but do not increase stability or toxicity — or even reduce lytic activity.
Phage-specific training / limited viral datasets
If the model is trained or fine-tuned on a small, biased set of phage proteins, it may learn dataset-specific patterns instead of general biological rules (overfitting).
Mutations may seem optimized in the model but fail outside that narrow dataset. Reduced generalizability and misleading predictions.
References:
Chamakura, K. R., Edwards, G. B., & Young, R. (2017). Mutational analysis of the MS2 lysis protein L. Microbiology, 163(7), 961–969. https://doi.org/10.1099/mic.0.000485
Berkhout, B., De Smit, M., Spanjaard, R., Blom, T., & Van Duin, J. (1985). The amino terminal half of the MS2-coded lysis protein is dispensable for function: implications for our understanding of coding region overlaps. The EMBO Journal, 4(12), 3315–3320. https://doi.org/10.1002/j.1460-2075.1985.tb04082.x
Week 5 HW: Protein Design Part II
Part 1: SOD 1 Binder Peptide Design
Superoxide dismutase 1 sequence was retrieved from Uniprot database (P00441), this protein has a length of 154 amino acids.
The mutated version of the human SOD1 caused by an A4V mutation was retrieved from the PDB database that contains a structure obtained from an X-Ray Diffraction study with a resolution of 1.90 Å (Hough et al., 2004)
1UXM_1 Superoxide Dismutase Mutated from Homo sapiens:
An alignment between normal SOD 1 and mutated SOD 1 was performed using Clustal Omega to corroborate the mutation at position four, an initial methionine was included into the mutated SOD 1 to have a protein of the same length. (Figure 1)
Figure 1: Multiple sequence alignment Clustal between SOD1 sequence (POO441) retrieved by Uniprot database and mutated SOD1 sequence available at PDB database (1UXM_1). Alignment shows a single point mutation in the residue 4 A/V that has been reported in several studies a the cause of the amyotrophic lateral sclerosis (ALS) disease
Small protein binders were generated using the PepMLM model made by Chen et al (2025). Four peptides were generated with a length of 12 amino acids and a Top K value of 3.
index
Binder
Pseudo Perplexity
0
WRYYAVVVAHKX
12.802906286585648
1
WHYGVVALAHKX
7.909934706159041
2
WLSYPAALRHKX
11.125327842529979
3
WRSPAAAVRWKE
11.952399811426888
The four candidates have low pseudo perplexity values (< 20) indicating confidence from the model to the peptides designed (Chen et al. 2025). A fasta document was created including the four candidates with the mutated SOD sequence and SOD-1 binding peptide FLYRWLPSRRGG as a control. However Generated candidates contained an X amino acid coded that means an unknown residue.
These candidates were aligned with the original protein using Clustal Omega (Figure 2)
Figure 2: Multiple sequence alignment Clustal between three small binders candidates and the mutated SOD1 sequence, another peptide was used as control to compare the suitability of the generated binders. Results shows close similarities between the three candidates and the region 32-44 of the mutated SOD protein, while the control didn't show the same similarity with the candidates
Part 2: AlphaFold 3 Binders
Peptide candidates were modeled using the AlphaFold Server together with the mutated SOD 1 sequence. The control peptide was also modeled and showed a close integration into the SOD 1 structure. Candidates 1, 2, and 3 haven’t shown an integration into the internal structure of SOD 1 (Figure 3)
Figure 3: AlphaFold Generation of the interaction between the candidates and mutate SOD1 sequence. Candidates 1,2 or 3 don't show a possible insertion to a pocket region in the target sequence while the control seem to interact and insert well into the protein
Confidence metrics are presented in the table below where pTM and ipTM scores are shown for each Candidate and the control. These scores measure the accuracy of the structures generated. For all candidates and the control, the pTM scores are more than 0.5, suggesting some confidence that the structure is like its true structure. On the other hand, ipTM value suggests poor confidence in the relative position of the subunits within the complex
Peptide
ipTM
pTM
Control
0.26
0.78
Candidate 1
0.36
0.76
Candidate 2
0.45
0.83
Candidate 3
0.36
0.87
Part 3: PeptiVerse Evaluation
PeptiVerse was used to predict several characteristics that are required for proposing a binding peptide with therapeutical application.
Candidate
Solubility
Hemolysis
Binding Activity
pH
Length
Molecular Weight
Candidate 1
Soluble
Non-Hemolytic
Weak
9.70
12
1373.7 Da
Candidate 2
Soluble
Non-Hemolytic
Weak
9.99
12
1323.8 Da
Candidate 3
Soluble
Non-Hemolytic
Weak
10.84
12
1456.7 Da
Control
Soluble
Non-Hemolytic
Weak
11.71
12
1507.7 Da
Candidates 1, 2, and 3 showed high solubility and low hemolytic probability, indicating their possible expression and use. However, pHs obtained a highly basic making it difficult to keep their structure in blood. Predicted Binding activities suggest that the candidates would have a weak interaction with their target. This result is also supported by the ipTM values gotten indicating that these candidates could not be able of binding to the target.
Part 4: Optimized Peptides Generation with moPPIt
Peptide binders were produced using the moPPIt using the mutated SOD1 N-terminal as target region. I propose that these candidates would bind to the mutated region and prevent the aggregation by stabilization of the structure. Peptides were generated considering as objectives and weights their Hemolysis probability, Solubility, Affinity and Specificity.
A total of 4 candidates who were generated have low pseudo-perplexity values indicating low uncertainty for the model to the predicted sequence (OFS Pseudo-perplexity for Protein Fitness, n.d.)
Candidates
Sequence
Pseudo-Perplexity
Candidate 1
WRYYAVVVAHKX
12.80
Candidate 2
WHYGVVALAHKX
7.90
Candidate 3
WLSYPAALRHKX
11.12
Candidate 4
WRSPAAAVRWKE
11.95
A Clustal Omega alignment was performed for all the candidates generated by moPPIt and PEPMLM showing close similarities in their sequences (Figure 4)
Figure 4: Multiple alignment between PepmLM and moPPit generated peptides. Alignment shows close similarities with the peptides generated by both language models
moPPIt candidates were evaluated using the PeptiVerse programs to evaluate their main characteristics and therapeutical applicability.
Candidate
Solubility
Hemolysis
Binding Activity
pH
Length
Molecular Weight
Candidate 1
Soluble
Non-Hemolytic
Weak
9.70
12
1373.7 Da
Candidate 2
Soluble
Non-Hemolytic
Weak
8.61
12
1262.7 Da
Candidate 3
Soluble
Non-Hemolytic
Weak
9.99
12
1323.8 Da
Candidate 4
Soluble
Non-Hemolytic
Weak
10.84
12
1456.7 Da
All candidates were predicted with weak affinity and presented a pH superior to 7 making them difficult to use directly in a human.
References
Hough, M. A., Grossmann, J. G., Antonyuk, S. V., Strange, R. W., Doucette, P. A., Rodriguez, J. A., … & Hasnain, S. S. (2004). Dimer destabilization in superoxide dismutase may result in disease-causing properties: structures of motor neuron disease mutants. Proceedings of the National Academy of Sciences, 101(16), 5976-5981.
Chen, L. T., Quinn, Z., Dumas, M., Peng, C., Hong, L., Lopez-Gonzalez, M., … & Chatterjee, P. (2025). Target sequence-conditioned design of peptide binders using masked language modeling. Nature Biotechnology, 1-9.
Zhang, Y., Tang, S., Chen, T., Mahood, E., Vincoff, S., & Chatterjee, P. (2026). PeptiVerse: A Unified Platform for Therapeutic Peptide Property Prediction. bioRxiv, 2025-12.
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion High-Fifelity PCR Master Mix offered by New England Biolabs is a product that contains a DNA polymerase with high fidelity useful for cloning and amplification of difficult amplicons. Phusion DNA polymerase contains proofreading activity (3’ -> 5’ exonuclease) and a higher fidelity 50X greater that Taq polymerase. Thermo Fisher Scientific Phusion High-Fidelity PCR Marter Mix is composed of a HF Buffer or GC Buffer; both buffers are used to reduce the error rate of the DNA polymerase. HF Buffer contains a lower error rate (4.4 x 107) than GC Buffer (9.5 x 107), however GC buffer can improve the performance of the polymerase on some difficult or long templates with high GC-rich templates or with secondary structures. The master mix is also provided with a optimized concentration of MgCl2 which is an essential cofactor that stabilizes the DNA double helix and facilitates primer annealing. High Fidelity PCR reaction can also include DMSO that is used to reduce the melting temperature through its association with Cytosine residues that changes the conformation of the DNA template.
What are some factors that determine primer annealing temperature during PCR?
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
How does the plasmid DNA enter the E. coli cells during transformation?
Describe another assembly method in detail (such as Golden Gate Assembly)
Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
Model this assembly method with Benchling or Asimov Kernel!
Week 7 HW: Genetic Circuits Part II
Week 9 HW: Cell Free Systems
Week 10 HW: Imaging and Measurement
Homework: Final Project
Figure 1 below presents some key aspects of my final project that require experimental testing and quantitative evaluation. These aspects refer to the expression of the protein binders generated using a Deep Learning Model and selected after in silico prediction of their therapeutic characteristics.
Figure 1: Experimental aspects of the final project: AI-driven Antivenom: A Generative Pipeline for De novo Neutralizing Peptides Against Snake Toxins. Image generated using: Copilot AI
Aspect 1: Cell-Free Expression System (CFS)
Cell-Free Expression System (CFS) enables rapid expression of multiple protein variants in parallel. Since this project aims to predict and express different protein binders, CFS provides a scalable and automatable platform that avoids cell culture and allows preliminary functional testing without full purification (Cui et al., 2022)
To ensure reproducibility, the CFS workflow must be standarized and quantitatively monitored. Expression efficiency will be measures by using detection tags like biotinylated lysine or His-tag into the peptide of interest, enabling detection through SDS-PAGE followed by Western Blotting. A colorimetric readout using biotin-binding secondary antibody and chromogenic substrate will aloow quantification of expression levels (Hunt et al., 2024)
Quantitative outputs for this aspect
Relative Expression Intensity (Densitometry of Wester Blot bands)
Expression yield per reaction
Reproducibility across replicates (CV%)
Figure 2 illustrates the Promega FluoroTect and Trascend detection systems that portrays the methodology to evaluate the expression of protein from a CFS.
Figure 2: Promega detection system for their cell-free expression systems
Aspect 2: Peptide Purification
To evaluate neutralization capacity, the expressed peptides must be purified from the CFS reaction. PURExpress In vitro Protein Synthesis Kit (New England Biolabs) for example uses reverse purification, a method that uses magnetic affinity resing based on metals that can selective capture His-tagged peptides (Figure 3)
Figure 3: Reverse purification of His-tagged proteins, this method could be use to extract peptides and later eliminate the tag by the use of a Tag Removal by Enzymatic Cleavage system
This approach could enable rapid purification suitable for small peptides or mini-proteins and is compatible with downstream biochemical assays.
Quantitative outputs for this aspect
Purification yield
Purity (Assessed by SPS-PAGE)
Recovery efficiency to crude expression levels
The metrics allow can be use later to generated an automated system for protein binders expression and purification
Aspect 3: Reactivity and Neutralization
The functional evaluation of the binders focuses on their ability to interact and neutralize the target toxin.
In vitro Neutralization: Neutralization can be test in vitro through the determination of the enzymatic activity of many protein targets (e.g. PLA2 or SVMP activity) in the presence of each binder. This provides a measureable neutralization parameter such as % inhibition or IC50.
Another strategy is the use of proteomics technique that involve the formation of the toxin-binder complex and and analyzed the interaction using a liquid chromatography-ms thechnique, which enables:
Confirmation of the complex formation
Identification of bound toxin fragments
Comparizon of binding efficiency acroos binder variants
Proteomics thecnique are most suitable to analyze different binder-protein interaction in an automated process
Quantitative outputs for this aspect
% Reduction in toxin enzymatic activity
IC50 values for each binder
LC-MS peak intensities corresponding to toxin-binder complexes
References:
Cui, Y., Chen, X., Wang, Z., & Lu, Y. (2022). Cell-Free PURE System: evolution and achievements. BioDesign Research, 2022, 9847014. https://doi.org/10.34133/2022/9847014
Hunt, A. C., Rasor, B. J., Seki, K., Ekas, H. M., Warfel, K. F., Karim, A. S., & Jewett, M. C. (2024). Cell-Free gene Expression: methods and applications. Chemical Reviews, 125(1), 91–149. https://doi.org/10.1021/acs.chemrev.4c00116
Waters Part I: Molecular Weight
Theoretical Molecular Weight and Isoelectric Poin
The molecular weight (MW) of the eGFP protein was calcultad using the Expasy tool. Prior the calcultion, the His-tag and linker sequences were removed to obtain the MW corresponding to the native eGFP amino-acid sequence:
Native eGFP MW: 26.941 kDa
Isoelectric point: 5.58
The isoelectric point, being below than 7, indicated that the protein behaves as a weak acid, carrying a negative charge at a physiological pH
Expasy calculates MW using the average isotopic masses of amino acid and stimates the isoelectric point using the Bjellqvist pK set.
Adjacent Charge State Apprach for Experimental MW Determination
The molecular weight of eGFP was determined using LC-MS data provide in the homework figure. Two of the most intense adjacent charge-state peaks were selected for analysis (Figure 4)
Figure 4: Determination of the experimental MW of the eGFP protein through LC-MS analysis
Using this approach a charge stat of 31 was stimated and the experimental MW was estimated near to 28.05 kDa
Comparison with the theoretical MW of native eGFP revealed a difference of 1.108 kDa. This deviation cannot be explained by common post‑translational modifications (PTMs), which typically contribute mass shifts in the range of +14 to +80 Da. Furthermore, eGFP expressed in E. coli does not undergo glycosylation, eliminating the possibility of large PTM‑related mass additions.
The calculated accuracy of the experimental result relative to the native theoretical MW was 3.95%, which is significantly higher than the accuracy typically reported for intact‑protein mass determination using this methodology (Durbing et al., 2025). This discrepancy strongly suggested that the analyzed protein did not correspond to the native eGFP sequence.
Molecular Weight including His-Tag and Linker
It was propose that the theoretical MW include the his-tag and linker sequences so a new theoretical MW was determine including this sequences:
Molecular Weight (MW): 28.006 kDa
Isoelectric point (pI): 5.90
When compared to the experimental MW (28.05 kDa), the difference was only 0.044 kDa, corresponding to an accuracy of 0.09%. This level of agreement is consistent with the expected performance of intact‑mass LC‑MS analysis and suggests that the protein analyzed retains its purification tag and linker.
Part 3: Waters Part 3
The eGFP peptide was analyzed using the Benchling software that states the sequence contains 6 arginines and 20 lysines that compose 2.4% and 8.1% of the whole protein respectively (Figure 5)
Figure 5
Usin the Expasy Peptide mass tool it is predicted that the eGFP would have 19 fragments after the digestion with trypsin.
Figure 6
Week 11 HW: Bioproduction and Cloud Labs
Part A: Artwork Canvas
Contibution: I contribute with 52 dots making the “Love” Desing at the bottom left plate. I liked the collaborative working of the canvas and the constant organic evolution of the design that resembles very well with the development of synthetic biology. For a next collaborative art experiment I could propose an experiment that serves as a competence of different groups, or a collaborative project that shows thet location of the participant in the world.
Part B: Cell-Free Protein Synthesis
Components of the cell-free reaction:
E. coli lysate
BL21 (DE3) Star Lysate: Cell extract that contains all the components neccesary for transcription, translation, folding a nd energy regeneration of the cell-free system. E. coli BL21 strains is typically chosen for cell-free expresion because of its enhance mRNA stability and decrease protease activity (Hunt et al. 2024).
Salts/Buffer
Potassium and Magnesium Glutamate: These salts are used to have different uses. Potassium and Magnesium ions are use to balance the charge of negatively charge biomolecules like DNA (Hunt et al., 2024). The ions are also involved in mantaining the integrity of stability of ribosomes, for example lack of magnesium is related with the loss of ribosome integrity and lack of potassium is related with the loss of ribosome activity (Nierhaus, 2014)
Glutamate acts a anionic counterion but also is used as secondary energy source. Optimizing glutamate amount is commonly done to regulate the amount of NAD and CoA cofactors because these tend to increase the cost of the expression system (Cai et al., 2015)
HEPES-KOH and Potassium phosphate monobasic and dibasic: HEPES is traditionally used a buffer to control the pH level under acidic subproducts created from glucose metabolism (Hunt et al., 2024). Potassium phosphate can be also use as buffer and also to control ATP concentrations by supporting glycolisis and pyruvate metabolism.
Energy / Nucleotide system
Energy supplements are used for ATP regeneration through metabolic pathways. ATP is required as energy molecule for transcription and translation. Glucose is used a primary energy source for ATP generation through glycolisis but requires phosphate as source (Hunt et al., 2024)
NMPs (AMP, CMP, GMP, UMP): Are used a cost-effective supplement of nucleotides replacing NTPs. This monophosphates. Incorporation of NMPs is well stablished as the cytomin/glutamate-phosphate/NMP batch system because of its high yield and batch production (Jewett et al., 2008)
Translation Mix
17 amino acid mix: Aminoacids are neccesary for the translation process. Traditionally the 20 amino acid can be supplemented in excess to the reaction, however optimization studies have shown that several aminoacids like aspartate, lysine, and tyrosine can be produced in the cell-free reaction using the extract source materials (Jewett et al., 2008). Simmilarly other study has shown thet effect of halving every single aminoacid in the reaction showing that some aminoacids can be obtained through native metabolic process (Pedersen et al., 2010).
Cystein also plays a key role in protein synthesis, for example increasing the concentration of cystein can be related with and increase of glutathione S-transferase (Shingaki & Nimura, 2011), also cystein can cause inespecific disulfude bonds so its optimization is important.
Additives
Nicotinamide: Nicotinamide is a regulator of NAD metabolism and synthesis being an important regulate of the energy production.
Backfill
Nuclease-Free Water: Nuclease free water acts as the medium for all metabolic reactions and transcription-translation process. It is required to provide water free of nucleases and protease to avoid degradation of the desired products
Differences between 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix
Figure below shows master mix compositions for the cell free system
The principal difference that can be found is the source of energy and nucleotides used. For the 1 hour optimized system it can be observed that is base on Phosphoenol pyruvate (PEP), which is often used as energy soruce for the cell-free reaction. PEP is an expensive additive to the CFS reaction but initially showed better performance than glucose (Calhoun & Swartz, 2005). For the second master mix Ribose and Glucose are used a main source of energy, this could decrease the pH of the reaction and for that the addition of other buffers like Potassium phosphate are included.
Another main difference between both systems is the used of tri-phosphate nucleotides for the 1-hour master mix and monophosphate nucleotides for the 20 hours master mix, According to Jewett et al (2008), NMPs are cost-effective replacements of NTPs and can be used in batch systems with high yields.
Some additives are used in the 1 hour master mix to improve the reaction, for example Spermidine is usually used to increase transcription fidelity (Igarashi et al., 1982) and DMSO is commonly used to dissolve water-insoluble substances. In the case of the 20-hour master mix this additives can be avoided to decrease the cost of the reaction.
Part C: Planning the Global Experiment
Fluorescent Proteins
sfGFP: Superfolded GFP is a variant of GFP from Aequorea victoria that is superfolded to avoid the effect of GFP folding in oxidizing environments (Aronson et al., 2011). One characteristic that can affect the role of GFP is the formation of inespecific agrupation because the protein tends to form weak dimers that can produce inclusion bodies.
mRFP1: mRFP1 main disadvantages are the slow maturation and tendecy to form oligomers
mKO2: mKO2 shows moderated acid sensitivity with a pKa of (5.5) indicating the necessity to control the pH of the medium
References
Aronson, D. E., Costantini, L. M., & Snapp, E. L. (2011). Superfolder GFP is fluorescent in oxidizing environments when targeted via the SEC Translocon. Traffic, 12(5), 543–548. https://doi.org/10.1111/j.1600-0854.2011.01168.x
Calhoun, K. A., & Swartz, J. R. (2005). Energizing cell‐free protein synthesis with glucose metabolism. Biotechnology and Bioengineering, 90(5), 606–613. https://doi.org/10.1002/bit.20449
Cai, Q., Hanson, J. A., Steiner, A. R., Tran, C., Masikat, M. R., Chen, R., … & Yin, G. (2015). A simplified and robust protocol for immunoglobulin expression in E scherichia coli cell‐free protein synthesis systems. Biotechnology progress, 31(3), 823-831.
Hunt, A. C., Rasor, B. J., Seki, K., Ekas, H. M., Warfel, K. F., Karim, A. S., & Jewett, M. C. (2024). Cell-Free gene Expression: methods and applications. Chemical Reviews, 125(1), 91–149. https://doi.org/10.1021/acs.chemrev.4c00116
Igarashi, K., Hashimoto, S., Miyake, A., Kashiwagi, K., & Hirose, S. (1982). Increase of fidelity of polypeptide synthesis by spermidine in eukaryotic Cell‐Free systems. European Journal of Biochemistry, 128(2–3), 597–604. https://doi.org/10.1111/j.1432-1033.1982.tb07006.x
Jewett, M. C., Calhoun, K. A., Voloshin, A., Wuu, J. J., & Swartz, J. R. (2008). An integrated cell‐free metabolic platform for protein production and synthetic biology. Molecular Systems Biology, 4(1), 220. https://doi.org/10.1038/msb.2008.57
Pedersen, A., Hellberg, K., Enberg, J., & Karlsson, B. G. (2010). Rational improvement of cell-free protein synthesis. New Biotechnology, 28(3), 218–224. https://doi.org/10.1016/j.nbt.2010.06.015
Shingaki, T., & Nimura, N. (2011). Improvement of translation efficiency in an Escherichia coli cell-free protein system using cysteine. Protein Expression and Purification, 77(2), 193–197. https://doi.org/10.1016/j.pep.2011.01.017