A happy, curious soul who loves working with livestock 🐄, though if you dangle schistosomiasis, I might just cave 😆.I’m proudly biased toward animal health, deeply passionate about One Health.
On a typical day, you’ll find me doing science communication, policy advocacy for biotechnology, and figuring out how to better support One Health governance 😊 all while innovating, experimenting, and pushing the needle forward to make the world a little better ⭐ ⭐
I am a HTGAA Committed Listener, my responsibilities are:
• Watching class lectures and recitations
• Participating in node reviews
• Developing and documenting my homework
• Actively communicating with other students and TAs on the forum
• Allowing HTGAA and BioClub to share my work (with attribution)
• Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
• Following locally applicable health and safety guidance
• Promoting a respectful environment free of harassment and discrimination
• Signed by committing this file to my documentation page/repository,
(The Challenge) Mastitis is an inflammation of the mammary gland in dairy cows, often caused by bacterial pathogens. It is a common and costly issue in dairy farms, leading to significant economic losses and affecting overall milk quality(Damasceno et al., 2025). The condition can arise from various factors, including poor hygiene, stress, and injuries to the udder. Common bacterial pathogens responsible for mastitis include Staphylococcus aureus, Escherichia coli, and Streptococcus uberis, which can enter the udder through damaged skin or during milking. Mastitis presents in either clinical or subclinical form. Coliforms from Escherichia coli, Klebsiella spp., and Enterobacter spp. account for 40% of clinical mastitis cases.
Part 1: Benchling & In-silico Gel Art I began by signing into my Benchling account and creating a new folder called HTGAA-read, write, and edit I then added a new sequence that is the Lambda_NEB fasta sequence On the right-hand side, I used the icon that looks like scissors to enter the different restriction enzymes and then viewed the different digestion sites on the virtual digest tab. Below is the image from the virtual digest for the different enzymes.
Python Script for Opentrons Artwork I first designed the sunflower using the Opentrons-Art Website. For the design, I went with a sunflower design. https://opentrons-art.rcdonovan.com/?id=e3z1i8r73863y1k
I then downloaded the Excel file, and on HTGAA26 Opentrons Colab, I leveraged Gemini assistance to be able to write the Python script. I uploaded the Excel sheet and promted gemini to assist in developing a script that would give me the sunflower design. Link to the Colab:https://colab.research.google.com/drive/1arsozAVNQhs-4Ol0LMIRKZ4QGVld0Kgf?usp=sharing
Part A. Conceptual Questions How many molecules of amino acids in 500g of meat? A quick search shows that, for example, beef contains ~20–22g of protein per 100g of meat. Also, it would be good to mention that Raw meat contains more water, so the total weight of amino acids is lower per 100g compared to cooked meat, where water loss concentrates the nutrients (often increasing protein to 28–36g per 100g).
Part 1: Generate binders with PepMLM Step 1 — Get the mutant SOD1 sequence Original SOD1 sequence from Uniprot
sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
A4V mutant SOD1 sequence
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Step 2 — Run PepMLM Used google colab notebook for the step.
DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? What you add yourself (not in the master mix) Forward primer + Reverse primer — define what region to amplify Template DNA — the sequence to be copied Nuclease-free water — to bring the reaction to final volume
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Integrated Artificial Neural Networks (IANNs) offer superior advantages over traditional Boolean genetic circuits by enabling analog processing, high-dimensional pattern recognition, and robust adaptability. Unlike Boolean circuits (on/off), IANNs handle continuous input levels, allowing complex decision-making, such as classifying disease states based on weighted molecular signatures rather than simple threshold switches.
Homework Part A: General and Lecturer-Specific Questions General homework questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. In traditional in vivo (inside the cell) methods, the cell’s primary goal is its own survival. In CFPS, the goal is purely production.
Homework: Waters Part I — Molecular Weight We were analyzing an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).
Mastitis is an inflammation of the mammary gland in dairy cows, often caused by bacterial pathogens. It is a common and costly issue in dairy farms, leading to significant economic losses and affecting overall milk quality(Damasceno et al., 2025). The condition can arise from various factors, including poor hygiene, stress, and injuries to the udder. Common bacterial pathogens responsible for mastitis include Staphylococcus aureus, Escherichia coli, and Streptococcus uberis, which can enter the udder through damaged skin or during milking. Mastitis presents in either clinical or subclinical form. Coliforms from Escherichia coli, Klebsiella spp., and Enterobacter spp. account for 40% of clinical mastitis cases.
While antibiotics remain the primary treatment for clinical mastitis, improper use contributes to antimicrobial resistance (AMR), making infections harder to control. The overuse and misuse of antibiotics in farms contribute to the emergence of drug-resistant mastitis-causing pathogens, further complicating disease management. Mastitis is difficult to eradicate, but its prevalence can be significantly reduced through proper farm management and preventive measures to ensure profitable production in dairy farms. One of such methods is the use of teat dipping disinfectants before and after milking. Current post-milking disinfectants are designed to reduce bacterial exposure and control infections. However, their effectiveness can vary, and there are limitations to their use.
Commercial teat dips typically contain active ingredients such as iodine, chlorhexidine, and lactic acid, each chosen for their bactericidal properties. Iodine-based products have long been favored for their broad-spectrum efficacy; however, concerns about iodine residues in milk are prompting farmers to consider alternatives. Chlorhexidine is another common disinfectant known for its effectiveness against various pathogens, but its performance can vary significantly depending on the formulation. Lactic acid is gaining popularity as well, particularly when combined with hydrogen peroxide, which has shown promising results in reducing bacterial loads. Nevertheless, these traditional disinfectants are not without limitations. For example, their effectiveness might diminish in the presence of organic matter, which can inhibit their action on bacteria (Fitzpatrick et al., 2021).
Reliance on these products may lead to issues such as the development of resistant bacterial strains, raising questions about their long-term efficacy(Damasceno et al., 2025). Given these challenges, the dairy industry is increasingly looking for innovative solutions to enhance mastitis control. This exploration includes synthetic antimicrobial peptides, which could provide a more effective and safer alternative to conventional disinfectants, addressing both efficacy and the concerns associated with traditional products (Ózsvári & Ivanyos, 2022)
(The Intervention)
Synthetic antimicrobial peptides (AMPs) is a promising substitute for traditional post-teat dips in dairy farming. The peptides can be designed to control a broad range of bacteria, making them effective against the pathogens responsible for mastitis. The mechanism of action of AMPs can involve disrupting the bacterial cell membranes, which leads to cell death. This method of attack is different from that of many conventional disinfectants that often rely on chemical reactions.
(The Promise)
A safer, more effective preventative solution to control mastitis.
‘Week 1 HW: Governance Policy Goals’
‘Week 1 HW: ‘Current Status & Potential Governance Actions’
In Kenya, currently, there are no defined regulations for products developed through synthetic biology. Medicines and veterinary products are regulated mainly through the Pharmacy and Poisons Board and the Directorate of Veterinary Services. For products that may have foreign DNA they would be considered as GMOs and therefore would be regulated by the National Biosafety Authority.
There is a need to explore the development of a regulatory pathway within existing Kenyan institutions for synthetic biology-based products. This will ensure that the products meet all safety requirements and that there is public confidence in their safety. However, there is concern that the introduction of these regulations may over-extend approval timelines and regulatory processes, potentially delaying research progress and the deployment of synthetic biology–derived products. The following governance actions are meant to help balance building public trust and ensuring that the products from synthetic biology reach the end users.
Development of a regulatory pathway for the regulation of Syn-Bio products
Picking lessons from the development of gene editing regulatory products, the same aspect can be used to decide on how syn-bio products are regulated. It can be 2-phased, where level 1 is those that do not incorporate foreign DNA, and level 2 is those that involve incorporated foreign DNA. In terms of regulation, a clear pathway can be developed for the two levels, whereby the different government agencies can work together to develop factors such as setting the maximum residue limit in Milk & Milk products, as in the case of antimicrobial peptides.
For this to be achieved, there is a need for buy-in from the National Government, researchers, and regulatory bodies such as NBA-Biotech regulation, NEMA- environment regulation, and directorate of Veterinary Service, and the Public Health Institutes.
One of the main assumptions is that the various organizations have the technical capacity and know-how to regulate synthetic biology products. There is also an assumption that there is an existing risk assessment structure that can easily be adapted for antimicrobial peptides and even those for ranking syn-bio products as high, moderate, and low risk.
Develop and Implement a stewardship plan
Drawing from best practices from stewardship plans from biotech products, a detailed stewardship plan can be developed by regulators and researchers to ensure that farmers use the products as stipulated and that there is a way of ensuring that they are not abused. A quick example would be guidelines on how to ensure traceability and compliance, and use AMP-based dips as part of integrated mastitis management plans and not as the magic bullet that would solve all the problems.
The assumptions made are that the products can be easily traceable and that the farmers would be willing to keep proper records without tokenism involved. The expectation is that the farmers will follow the farm management and record-keeping plan provided, and that they will be honest and not falsify the records.
‘Week 1 HW: ‘Week 2 Lecture Prep’
Homework Questions from Professor Jacobson
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
Error Rate: 1:106
Biology deals with this through proofreading and replacing the mismatched base pairs.
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Homework Questions from Dr. LeProust
What’s the most commonly used method for oligo synthesis currently? Phosphoramidite synthesis
Why is it difficult to make oligos longer than 200nt via direct synthesis? There would be compounding inefficiencies, since the addition process is not perfect
Why can’t you make a 2000bp gene via direct oligo synthesis? There would be higher error rates
Homework Question from George Church
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
Lysine is an essential amino acid, meaning that we have to eat certain foods to obtain it. Based on the story from Jurassic Park, the scientist inserted a gene that created a single faulty enzyme in protein metabolism. The animals could therefore not manufacture the amino acid lysine. This technically does not matter because with or without the engineered enzyme, the dinosaurs would still be dependent on the food they consume to get the nutrients.
In the event they escaped, they would still eat other lysine-rich foods in nature and would get lysine and therefore survive without being in captivity.
Week 2 HW: Read, write and Edit
Part 1: Benchling & In-silico Gel Art
I began by signing into my Benchling account and creating a new folder called HTGAA-read, write, and edit
I then added a new sequence that is the Lambda_NEB fasta sequence
On the right-hand side, I used the icon that looks like scissors to enter the different restriction enzymes and then viewed the different digestion sites on the virtual digest tab. Below is the image from the virtual digest for the different enzymes.
Part 3: DNA Design Challenge
3.1. Choose your protein
Through a literature search using pubmed I searched for papers that have explored naturally occurring peptides that have antimicrobial activity towards both Gram-positive and Gram-negative bacteria. Staphylococcus aureus (Gram-positive), Escherichia coli (Gram-negative), and Streptococcus uberis (Gram-positive) are the most common mastitis-causing bacteria in dairy farming.
Pituitary adenylate cyclase-activating polypeptide (PACAP) is a naturally occurring cationic peptide known for its strong immunosuppressive and cell-protective effects. It belongs to the secretin, growth hormone-releasing hormone (GHRH), and vasoactive intestinal peptide (VIP) family, and exhibits significant anti-inflammatory and cytoprotective capabilities. While PACAP is most concentrated in the brain, it is also present in notable amounts in other tissues, such as the thymus, spleen, lymph nodes, and duodenal mucosa (Vaudry et al., 2009). Despite its therapeutic potential, the use of PACAP as a pharmaceutical agent is hindered by its extremely short half-life in the bloodstream after systemic delivery, largely due to rapid degradation—particularly by the enzyme DPP IV, which targets the peptide’s amino terminus through exopeptidase activity (Green et al., 2006). PACAP occurs in two bioactive forms, both amidated, consisting of either 38 or 27 amino acids.
Producing PACAP synthetically is crucial for scalability, uniformity, and ethical reasons(Starr et al., 2018). Since the peptide is primarily found in sensitive tissues like the brain and immune organs, extracting it from natural sources is neither feasible nor appropriate for widespread agricultural applications. Instead, recombinant methods using engineered microorganisms enable large-scale, cost-efficient production through fermentation, ensuring consistent quality and purity.
Green, B. D., Irwin, N., & Flatt, P. R. (2006). Pituitary adenylate cyclase-activating peptide (PACAP): Assessment of dipeptidyl peptidase IV degradation, insulin-releasing activity and antidiabetic potential. Peptides, 27(6), 1349–1358. https://doi.org/10.1016/j.peptides.2005.11.010
Starr, C. G., Maderdrut, J. L., He, J., Coy, D. H., & Wimley, W. C. (2018). Pituitary adenylate cyclase-activating polypeptide is a potent broad-spectrum antimicrobial peptide: Structure-activity relationships. Peptides, 104, 35–40. https://doi.org/10.1016/j.peptides.2018.04.006
Vaudry, D., Falluel-Morel, A., Bourgault, S., Basille, M., Burel, D., Wurtz, O., Fournier, A., Chow, B. K. C., Hashimoto, H., Galas, L., & Vaudry, H. (2009). Pituitary Adenylate Cyclase-Activating Polypeptide and Its Receptors: 20 Years after the Discovery. Pharmacological Reviews, 61(3), 283–357. https://doi.org/10.1124/pr.109.001370
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence
I used the “Reverse Translate” tool of the Sequence Manipulation Suite to get the nucleotide sequence. And confirmed that the sequence was accurate using the NCBI BlastX to confirm that it is accurate.
Screenshot of the BlastX results confirming accurate reverse translation
3.3. Codon optimization
Codon optimization is essential because, while several DNA codons can code for the same amino acid, different species show distinct preferences in codon usage. Expressing the bovine PACAP sequence directly in a microbial host without adjustment could lead to poor translation efficiency and, consequently, low protein output. To support economical, large-scale production, the gene sequence was adapted for use in Escherichia coli, a widely adopted system for recombinant protein expression. By aligning codon usage with the host’s natural preferences, translation becomes more efficient, protein yields improve, and overall production costs decrease, enabling viable industrial synthesis of the antimicrobial peptide.
Codon optimization using VectorBuilder
Improved sequences from VectorBuilder
The following is the improved DNA sequence post-codon optimization
ATGACCATGTGCAGCGGCGCCCGTCTGGCCCTGCTGGTGTATGGCATTCTGATGCACAGCAGCGTGTATGGCTCGCCGGCGGCGAGCGGCCTGCGATTCCCGGGTATTCGTCCGGAAAATGAAGTGTACGATGAAGATGGCAACCCGCAGCAGGATTTTTATGATAGCGAAAGCCTGGGCGTTGGCAGCCCGGCGAGCGCGCTGCGCGATGCGTATGCGCTGTATTATCCGGCAGAAGAACGTGATGTGGCGCACGGCATTCTGAATAAAGCGTATCGCAAAGTGCTGGATCAGCCGAGCGCGCGCCGCAGCCCGGCAGATGCGCATGGCCAGGGTCTGGGCTGGGATCCGGGTGGCAGCGCCGATGATGATAGCGAACCGCTGAGCAAACGCCATAGCGATGGCATTTTTACCGATAGCTATAGCCGCTATC
3.4. You have a sequence! Now what?
Once the codon-optimized DNA sequence for PACAP is available, it needs to be integrated into a biological system capable of transcribing the DNA into mRNA and subsequently translating that mRNA into the PACAP protein. Given the objective of producing PACAP as an affordable antimicrobial peptide for managing mastitis, a cell-based production system using E. coli fermentation is the most suitable choice. This approach offers high protein output, low manufacturing costs, scalability for industrial use, and practical potential for commercial application.
The PACAP gene would first be incorporated into an expression vector, typically a plasmid, which is then delivered into Escherichia coli cells. Within the bacterial host, RNA polymerase recognizes the promoter region and initiates transcription of the PACAP gene into mRNA. Ribosomes then attach to the mRNA and begin translation, synthesizing the PACAP peptide with the help of tRNAs that correspond to the optimized codons. As the bacteria grow and divide, they generate substantial quantities of the peptide, which can later be harvested and purified.
Part 4: Prepare a Twist DNA Synthesis Order
4.2. Build Your DNA Insert Sequence
Following the instructions from the homework, I manually inserted the following sequences sequentially into the bases section
Promoter (e.g. BBa_J23106): TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC
RBS (e.g. BBa_B0034 with spacers for optimal expression): CATTAAAGAGGAGAAAGGTACC
Coding Sequence (The codon optimized sequence already had a start codon so I did not put two start codons)
7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli): CATCACCATCACCATCATCAC
Stop Codon: TAA
Terminator (e.g. BBa_B0015): CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
Below are the screenshots from the various steps during the annotation process.
Annotation window
Linear map of the annotated sequence.
Twist Workflow
Construct- Twist
Plasmid with expression cassette - Benchling
Part 5: DNA Read/Write/Edit
5.1 DNA Read
(i) What DNA would you want to sequence (e.g., read) and why?
In the context of mastitis control and epidemiological surveillance, I would sequence metagenomic DNA extracted from milk, teat swabs, and environmental samples collected from dairy farms with and without biosecurity measures. Sequencing this DNA would allow identification of the dominant mastitis-causing bacteria circulating within these farms and enable detection of antibiotic resistance genes present in the microbial populations. This information would provide real-world insight into pathogen prevalence and resistance patterns, which would directly inform the rational design and optimisation of my PACAP-based antimicrobial peptide to ensure it is effective against the most relevant and resistant strains. In addition, I would sequence my final PACAP expression construct to confirm sequence accuracy and ensure that no mutations were introduced during cloning or synthesis.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
For metagenomic analysis of dairy farm samples, I would use shotgun next-generation sequencing (NGS), specifically a short-read platform such as Illumina sequencing. This method is classified as a second-generation sequencing technology.
The input for this method would be total DNA extracted from milk, teat swabs, or environmental samples collected from dairy farms. After extraction, the DNA must be prepared into a sequencing library. (Usually, there is a step by step protocal shared on how to prepare the libraries, including pulling and how to send them for sequencing)
The output of this sequencing technology is a large dataset of short DNA reads in FASTQ format, which includes both the nucleotide sequence and associated quality scores for each base. These reads can then be assembled or mapped against reference databases to identify bacterial species present in the samples and detect antibiotic resistance genes.
5.2 DNA Write
(i) What DNA would you want to synthesize (e.g., write) and why?
I would synthesise a codon-optimised version of the PACAP-27 gene designed for efficient expression in Escherichia coli. The sequence would be optimised to match the preferred codon usage of the host organism in order to maximise protein yield. Additionally, based on insights gained from metagenomic surveillance data, I may incorporate rational modifications to improve antimicrobial activity, stability, or resistance to proteolytic degradation. Synthesising the gene rather than amplifying it from a natural source allows full control over sequence design and ensures the construct is tailored specifically to the pathogens identified in dairy farm environments.
(ii) What technology or technologies would you use to perform this DNA synthesis and why?
I would use commercial chemical DNA synthesis technology, which relies on phosphoramidite chemistry to generate short oligonucleotides that are enzymatically assembled into full-length genes. This method allows precise sequence customisation, incorporation of optimised codons, and removal of unwanted restriction sites. It is reliable, accurate, and particularly suitable for small genes such as PACAP, making it an efficient approach for generating a ready-to-clone expression construct.
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I would edit the PACAP gene sequence to enhance its antimicrobial properties while maintaining safety and specificity. Edits may include amino acid substitutions that increase cationic charge, improve membrane disruption, or enhance activity against Gram-negative bacteria and biofilm-forming strains identified in the metagenomic analysis. These modifications would be guided by epidemiological data to ensure the peptide is effective against the most prevalent mastitis pathogens. I may also edit regulatory elements within the expression vector to optimise protein yield or secretion efficiency.
(ii) What technology or technologies would you use to perform these DNA edits and why?
To introduce precise amino acid substitutions into the PACAP gene, I would use site-directed mutagenesis, as it enables targeted and controlled sequence modifications without altering the rest of the construct. Site-directed mutagenesis would be sufficient to generate improved peptide variants informed by metagenomic surveillance data.
I then downloaded the Excel file, and on HTGAA26 Opentrons Colab, I leveraged Gemini assistance to be able to write the Python script.
I uploaded the Excel sheet and promted gemini to assist in developing a script that would give me the sunflower design.
Link to the Colab:https://colab.research.google.com/drive/1arsozAVNQhs-4Ol0LMIRKZ4QGVld0Kgf?usp=sharing
Publication Title: Automation of biochemical assays using an open-sourced, inexpensive robotic liquid handler
While going through the paper, I analyzed two things.
How the Opentrons OT-2 was used.
How was it making it easier?
The Opentrons OT-2 was being used to automate biochemical assays for protein and DNA measurement. Specifically, the researchers developed two assays: the Bradford assay for protein quantification and the PicoGreen assay for double-stranded DNA measurement. These assays are highly relevant to vaccine development, as they provide information on efficacy, potency, safety, and quality control.
Based on this article, the OT-2 simplifies laboratory automation in several ways:
Cost-effective: Unlike traditional commercial liquid handlers that cost over $250,000, this system costs under $10,000, making automation accessible to smaller labs.
User-friendly programming: It uses Python programming language with open-source flexibility, reducing training requirements.
Accurate and precise: The study found the pipettes exhibited excellent accuracy and precision, with relative inaccuracy of 1.30% for the P20 pipette and 0.53% for the P300 pipette.
Time-saving: The Bradford assay completed in 75 minutes and the PicoGreen assay in 41 minutes, reducing manual labor.
How would I leverage this for my work
Based on the work I want to do that is aimed at prpoducing Antimicrobial Peptide at industrial scaled there are several ways I can leverage it to my advantage.
The OT-2 could be adapted for this application in the following ways:
Quality control testing: The automated protein and DNA quantification assays demonstrated here could monitor protein expression levels and detect residual DNA contamination during antimicrobial peptide synthesis and purification.
Assay automation: The system’s ability to run customized Python protocols means I could develop automated workflows for peptide synthesis optimization, mixing reagents, and conducting potency assays.
High-throughput screening: The medium-throughput capability would allow testing multiple peptide variants or production conditions simultaneously to identify the most effective formulations.
Week 4 HW: protein-design-part-i
Part A. Conceptual Questions
How many molecules of amino acids in 500g of meat?
A quick search shows that, for example, beef contains ~20–22g of protein per 100g of meat.
Also, it would be good to mention that Raw meat contains more water, so the total weight of amino acids is lower per 100g compared to cooked meat, where water loss concentrates the nutrients (often increasing protein to 28–36g per 100g).
If:
100g = 20g of protein
500g = ?
(500*20)/100= 100g of protein
An average amino acid molecular weight of ~100 Da (100 g/mol):
Moles of amino acids = 100g ÷ 100 g/mol = 1 mol
Number of molecules = 1 mol × 6.022 × 10²³
≈ 6 × 10²³ amino acid molecules
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
The short answer is because of digestion.
Proteases in the gut break all dietary proteins down to their constituent free amino acids, meaning the informational content (sequence) of the cow’s proteins is destroyed. Ribosomes then reassemble amino acids in the order dictated by ones own mRNA, encoding human proteins.
Interesting fact I discovered Any intact foreign protein that did slip through is attacked by ones immune system as a non-self antigen which is the basis of food allergies (fragments sometimes escape digestion).
Why are there only 20 natural amino acids?
One of the main theories as to why there are only 20 amino acids in the world is attributed to evolution.
The “frozen accident” / historical contingency
The genetic code was fixed early in evolution (~3.5–4 billion years ago) and became locked in. Changing it would be catastrophic (every codon reassignment would corrupt thousands of proteins simultaneously).
Can you make other non-natural amino acids? Design some new amino acids.
Yes, numerous non-natural (or non-canonical) amino acids (ncAAs) can be designed and synthesized using advanced chemical methods, such as palladium-catalyzed C-H bond functionalization. By incorporating unnatural amino acids, scientists can extend the genetic code, creating proteins with unique structural and functional properties.
Where did amino acids come from before enzymes that make them, and before life started?
Some of the theories of the origin of amino acids include:
a) The Miller-Urey experiment (1953)
b) Meteorite delivery
c) Hydrothermal vents
d) HCN and formaldehyde chemistry
e) The “RNA world” scenario
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
It would be LEFT-HANDED.
Proteins built from D-amino acids form left-handed α-helices, which are mirror images of the right-handed helices formed by L-amino acids.
Can you discover additional helices in proteins?
Proteins can adopt several types of helical structures, each with distinct geometric and hydrogen-bonding characteristics. The α-helix is the most common helical structure found in proteins. Beyond the classic α-helix, several other helical structures exist.
There is:
3₁₀-helix is a tighter helix with 3.0 residues per turn. Its hydrogen bonds form between residue i and i + 3, resulting in a rise of about 2.0 Å per residue. This helix often appears at the termini of α-helices rather than as long, independent structures.
π-helix is a rare and wider helix, containing 4.4 residues per turn. Its hydrogen bonding occurs between residue i and i + 5, with a rise of approximately 1.1 Å per residue. π-helices are uncommon but are frequently found at functional sites in proteins.
Polyproline II (PPII) helix has 3.0 residues per turn and lacks intramolecular hydrogen bonds. Instead, it adopts an extended left-handed conformation with a rise of about 3.1 Å per residue. This structure plays an important role in signaling, particularly in binding interactions such as those involving SH3 domains.
Polyproline I (PPI) helix contains 3.3 residues per turn and also lacks intramolecular hydrogen bonds. It is right-handed and more compact, with a rise of about 1.9 Å per residue. This form is typically observed in organic solvents.
Collagen helix consists of approximately 3.3 residues per turn and is stabilized by interchain hydrogen bonds rather than intramolecular ones. Each chain has a rise of about 2.9 Å per residue, and three chains wind together to form a characteristic triple helix. This structure is distinguished by its repeating Gly–X–Y sequence pattern.
Why are most molecular helices right-handed?
Natural proteins are composed mainly of L-amino acids. The stereochemistry of the peptide backbone energetically favors right-handed α-helices.
Why do β-sheets tend to aggregate?
What is the driving force for β-sheet aggregation?
β-sheets hydrogen bond through their edge strands, which have free NH and C=O groups pointing outward, ready to hydrogen bond with another β-sheet edge. Unlike α-helices (where all H-bond donors/acceptors are internally satisfied), β-sheet edges are “sticky.”
Part B: Protein Analysis and Visualization
One of the final individual projects I want to do is the development of a biosensor for the diagnosis of Vibrio Cholerae. To help me learn more about V. Cholerae I will use this assignment to study one of the main toxins that is associated with toxicity.
Cholera toxin (CT) is defined as the major virulence factor of the bacterium Vibrio cholerae, comprising a single A subunit and five identical B subunits in an A-B5 architecture, which causes severe diarrheal symptoms in infected individuals. Cholera toxin is a member of the AB toxin family and is composed of a catalytically active heterodimeric A-subunit linked with a homopentameric B-subunit.
Briefly describe the protein you selected and why you selected it.
The protein I have selected is the Cholera enterotoxin subunit A. Cholera toxin subunit A (CTA) is the catalytically active component of the AB₅ cholera enterotoxin secreted by Vibrio cholerae, and it is the central molecular component responsible for the devastating secretory diarrhea that defines cholera disease.
Identify the amino acid sequence of your protein. (For this part I search the Cholera toxin on UniProt, after which i selected the Cholera enterotoxin subunit A)
The amino acid sequence
MVKIIFVFFIFLSSFSYANDDKLYRADSRPPDEIKQSGGLMPRGQSEYFDRGTQMNINLYDHARGTQTGFVRHDDGYVSTSISLRSAHLVGQTILSGHSTYYIYVIATAPNMFNVNDVLGAYSPHPDEQEVSALGGIPYSQIYGWYRVHFGVLDEQLHRNRGYRDRYYSNLDIAPAADGYGLAGFPPEHRAWREEPWIHHAPPGCGNAPRSSMSNTCDEKTQSLGVKFLDEYQSKVKRQIFSGYQSDIDTHNRIKDEL
Sequence Lenght 258 Amino Acids
Amino Acid Frequency Using the provided colab notebook I was able to get the amino acid count.
The output is as follows: S: 23, G: 22, D: 19, Y: 18, R: 17, I: 16, L: 16, A: 15, P: 14, V: 13, F: 12, Q: 12, N: 11, E: 11, H: 11, T: 10, K: 8, M: 5, W: 3, C: 2
The letters represent each of the amino acids: S = Serine, G = Glycine, D = Aspartate, Y = Tyrosine, R = Arginine, I = Isoleucine, L = Leucine, A = Alanine, P = Proline, V = Valine, F = Phenylalanine, Q = Glutamine, N = Asparagine, E = Glutamate, H = Histidine, T = Threonine, K = Lysine, M = Methionine, W = Tryptophan, C = Cysteine.
The most abundant amino acid is Serine (S, n=23), closely followed by Glycine (G, n=22) and Aspartate (D, n=19) The overall composition hints at a hydrophilic, surface-exposed protein given the prevalence of polar and charged residues (S, D, R, Y, Q, N, E, H).
Protein Sequence Homologs
Protein homologs are proteins that share a common evolutionary origin. They come from the same ancestral gene, even if their sequences or functions have changed over time. There are two main types:
Orthologs – Proteins in different species that evolved from a common ancestral gene after a speciation event. They often retain similar functions.
Paralogs – Proteins within the same species that arose by gene duplication. They may evolve new or specialized functions over time.
Through BlastKB there were 250 hits
A comprehensive search for cholera toxin homologs across biological databases yielded 250 results distributed across three domains. The vast majority of results (207 sequences, 83%) came from fungi, particularly Ascomycota species, with notable representations from entomopathogenic fungi in the families Ophiocordycipitaceae, Clavicipitaceae, and Cordycipitaceae, as well as the plant pathogen Colletotrichum (84 results). Bacterial homologs comprised 42 sequences (17%), dominated by Pseudomonadota, including pathogenic species such as Escherichia coli, Vibrio cholerae, and Paraburkholderia, alongside environmental bacteria like Bartonella and Leptospira. A single viral sequence (1 result) was identified as Vibrio phage CTXphi, which is notably the well-characterized prophage known to carry the cholera toxin genes themselves, validating the search methodology. The predominance of fungal and bacterial sequences suggests that toxin-like proteins with structural or functional homology to cholera toxin are widespread across microbial organisms, though the biological significance of these fungal matches warrants further investigation given that toxin production is not a typical trait of this kingdom.
Protein Family
Based on the results, the protein belongs to the Cholera enterotoxin subunit A family.
Identify the structure page of your protein in RCSB
When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)
It was solved on 2004-04-06. It has a resolution of 1.9 Å which was obtained through X-RAY DIFFRACTION. I would say it has good quality since the resolution is smaller than 2.70 Å.
Are there any other molecules in the solved structure apart from protein?
There are two unique ligands present in the structure, specifically: beta-D-galactopyranose and Sodium Ion.
Does your protein belong to any structure classification family?
Yes. It belongs to the ADP-ribosylating toxins. (NOT SURE ABOUT THIS ANSWER)
Open the structure of your protein in any 3D molecule visualization software:
Using PyMol I wanted to visualize my protein of interest.
Download the PDB file for the protein, then on PyMol, go to file then open, select the downloaded PDB file. (Chatgpt helped with the python code for the different visualizations)
Cartoon
I used the following commands
PyMOL>hide everything
PyMOL>show cartoon
PyMOL>color red
PyMOL>util.cbc (color by secondary structure)
Ribbon
I used the following commands
PyMOL>hide everything
PyMOL>show ribbon
Ball and Stick
I used the following command
yMOL>hide everything
PyMOL>show sticks
PyMOL>show spheres
To Improve structure
PyMOL>set sphere_scale, 0.25
PyMOL>set stick_radius, 0.15
To answer the following questions I used chapgpt to explain how to identify the different structures and also to understand the residues
Color the protein by secondary structure. Does it have more helices or sheets?
By default, PyMOL colors:
Red → α-helices
Yellow → β-sheets
Green → loops/coils
The protein has α-helices that appeared as a spiral / corkscrew shape, Long cylindrical coils and was red
It also has β-Sheets that appeared as Flat arrow-shaped strands, Multiple arrows next to each other and were yellowSpiral/corkscrew
It also had Loops that appeared as Thin connecting regions that were green
To tell whether I have more helices or sheets, I ran the following code
PyMOL>select helices, ss h
Selector: selection “helices” defined with 3854 atoms.
PyMOL>select sheets, ss s
Selector: selection “sheets” defined with 2744 atoms.
My protein has more Helices
Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
“hydrophobic” defined with 4211 atoms.
“hydrophilic” defined with 3626 atoms.
There are more hydrophobic atoms than hydrophilic atoms.
Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
Part C. Using ML-Based Protein Design Tools
C1. Protein Language Modeling
Deep Mutational Scans
Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
Can you explain any particular pattern? (choose a residue and a mutation that stands out)
The protein has several highly conserved positions (deep blue vertical stripes).
C2. Protein Folding
C3. Protein Generation
Part D. Group Brainstorm on Bacteriophage Engineering
(lower perplexity = the model is more confident the peptide is a good binder)
Part 2: AlphaFold3 structure prediction
For each result, I looked at the below:
The ipTM score (higher is better, closer to 1.0 means confident binding prediction)
Binder
Pseudo Perplexity
ipTM score
WLSYPVVLEWGE
16.4334209
0.21
WHYYVVAVRWKE
31.112519
0.35
WHSYAAAAALWE
12.9540785
0.27
WLYYAVGLAWKX
14.2779037
Could not be generated because it had a letter X
FLYRWLPSRRGG
XXXX
0.3
The ipTM score (interface predicted TM-score) measures how confidently AlphaFold3 predicts the two chains interact. It ranges from 0 to 1.
Below 0.4 = poor
0.4–0.6 = moderate
Above 0.6 = good
Where the peptide appears to dock (look at the 3D viewer — is it near position 4 at the N-terminus? At the dimer interface? Surface-exposed?)
I have a question on the fact that my second binder has a high Pseudo Perplexity value of 31 therefore the assumption is that the binding confidence level is very low. However through alphafold it has the best ipTM score suggesting that it had the best confident binding prediction
Part 3: PeptiVerse therapeutic properties
Binder
Solubility
Hemolysis
Binding Affinity
Length
Molecular Weight
Net Charge (pH 7)
Isoelectric Point
Hydrophobicity (GRAVY)
WLSYPVVLEWGE
Soluble - 1.00
Non-hemolytic 0.112
Weak binding - 5.952
12
1477.7
-2.23
4.24
0.26
WHYYVVAVRWKE
Soluble - 1.00
Non-hemolytic 0.069
Weak binding - 6.345
12
1635.9
0.85
8.5
-0.42
WHSYAAAAALWE
Soluble - 1.00
Non-hemolytic 0.035
Weak binding - 5.863
12
1375.5
-1.15
5.47
0.18
WLYYAVGLAWKX
Soluble - 1.00
Non-hemolytic 0.115
Medium binding - 7.101
12
1351.8
0.76
8.5
0.56
the best one balances strong binding + not hemolytic + good solubility.
Part 4: moPPIt optimized design
Week 6 HW: Genetic Circuits Part I: Assembly Technologies
DNA Assembly
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
What you add yourself (not in the master mix)
Forward primer + Reverse primer — define what region to amplify
Template DNA — the sequence to be copied
Nuclease-free water — to bring the reaction to final volume
The star component is Phusion polymerase itself — it has a fused processivity domain that makes it faster than standard Taq, and crucially it has 3′→5′ proofreading (exonuclease) activity, giving it an error rate ~50× lower than Taq. This is why it is preferred when accuracy matters, such as before Gibson assembly or cloning.
What are some factors that determine primer annealing temperature during PCR?
The four main factors are GC content, primer length, the calculated melting temperature (Tm), and the buffer’s salt concentration. With Phusion specifically, NEB recommends using their online Tm calculator rather than generic formulas, because the enzyme’s processivity domain affects optimal conditions. A practical starting point is to set annealing temperature to the Tm of your lower-Tm primer minus 5°C, then optimise from there.
Consequences of wrong annealing temperature
Ta too LOW → primers bind non-specifically → multiple bands, wrong products
Ta too HIGH → primers cannot bind stably → no product or very low yield
Rule of thumb for Phusion: use the NEB Tm Calculator and set Ta = Tm of lower primer − 5°C
Gradient PCR can be used to empirically optimise Ta when uncertain
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
Both methods produce linear DNA fragments, but through fundamentally different mechanisms. PCR synthesises new copies of a specific region exponentially from a template, using primers to define the exact endpoints — which means you can engineer the ends freely by adding sequences to your primers. A restriction enzyme digest cuts existing DNA at fixed recognition sequences, producing ends that are dictated entirely by where those sequences happen to sit in the molecule.
In terms of error risk, PCR introduces the possibility of polymerase errors (minimised by Phusion’s proofreading), while RE digests are faithful to the original sequence but can only cut where recognition sites exist. If those sites don’t exist in your insert, or if a cut site would fall in an undesirable location, PCR is the only option. Conversely, if you already have a plasmid carrying your insert flanked by known restriction sites, a digest is faster and simpler than designing and running PCR.
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
Gibson assembly works by joining fragments that share 15–40 bp of identical sequence at their junctions. The exonuclease in the Gibson mix chews back 5′ ends to create single-stranded overhangs, which then anneal and are ligated. For your fragments to be compatible, you must ensure overlapping homology at every junction.
For PCR fragments, the key step is primer design — your forward primer for each insert should carry a 5′ tail of 20–40 bp that matches the end of the upstream fragment (or linearised vector). Tools like Benchling let you design these overlaps visually and verify them in silico before ordering primers. For RE-digested fragments, you need to confirm that the sticky or blunt ends of adjacent fragments are complementary and that no sequence is lost or mismatched at the junction. After generating your fragments by either method, always run a gel to confirm band size, then column-purify or gel-extract to remove primers, enzymes, and salts that would interfere with the Gibson reaction.
Key things that can go wrong
Overlaps too short (<15 bp) → poor annealing, assembly fails
Overlaps contain repeats or secondary structure → mis-assembly or rearrangements
Impure fragments (primers still present) → compete with correct annealing
How does the plasmid DNA enter the E. coli cells during transformation?
The most common lab method is heat-shock transformation of chemically competent cells. Cells are treated with divalent cations (typically CaCl₂) during preparation, which partially neutralises the negative charge on both the cell membrane and the DNA, reducing electrostatic repulsion. When the cold cell-DNA mixture is briefly shifted to 42°C, transient pores open in the membrane through which DNA can enter. The cells are then recovered in rich medium at 37°C (called the recovery or outgrowth step) to allow antibiotic resistance genes on the plasmid to be expressed before plating on selective media.
What happens at the molecular level
CaCl₂ treatment: Ca²⁺ ions coat DNA and neutralise the membrane’s negative charge
Ice incubation: DNA-cell complexes form and associate with the outer membrane
Heat shock: membrane fluidity spike creates transient pores → DNA enters cytoplasm
Return to ice: pores reseal, DNA is trapped inside
Outgrowth: plasmid replicates, antibiotic resistance gene is expressed before selection
An alternative to chemical transformation is electroporation, where a brief electrical pulse physically creates pores in the membrane to allow DNA entry. Electroporation generally gives 10–100× higher efficiency and is preferred for large plasmids or when transformation efficiency is critical. However, it requires electrocompetent cells (washed to remove salts) and specialised equipment.
Describe another assembly method in detail (such as Golden Gate Assembly)
Golden Gate Assembly is an elegant cloning strategy that uses Type IIS restriction enzymes — enzymes that cut outside their recognition sequence to generate custom 4-bp overhangs. Because the enzyme cuts away from its own recognition site, the recognition sequence is eliminated from the final product, leaving only the overhangs you designed. This enables scarless, seamless assembly.
Golden Gate uses a Type IIS enzyme (most commonly BsaI or BsmBI) whose recognition sequence and cut site are spatially separated. Each DNA part is flanked by Type IIS recognition sequences oriented so that cutting releases the recognition site and leaves behind a unique 4-bp overhang. Because every junction has a distinct 4-bp overhang, the parts can only assemble in one correct order — the overhangs provide directionality and specificity simultaneously. The restriction enzyme and T4 DNA ligase are added to the same tube and thermocycled together: digestion and ligation alternate in repeated cycles, so misligated or incorrect assemblies are continuously re-cut and corrected while the correct assembly accumulates. The final assembled product lacks all Type IIS sites, so it can never be re-cut by the enzyme in the reaction. This makes Golden Gate highly efficient even for assemblies of 5–10 fragments in a single reaction, which makes it faster and more scalable than sequential cloning strategies.
ii. Model this assembly method with Benchling or Asimov Kernel!
Go to benchling.com and create a free account (or log into your course account)
Create a new DNA sequence — paste in your vector sequence
Add annotations for each BsaI recognition site (GGTCTC for BsaI) using the annotation tool
Create separate sequences for each insert part, each flanked by BsaI sites with the correct 4-bp overhangs
Use the Assembly Wizard (under Cloning → Assembly) — select “Golden Gate” as the assembly type, add your parts in order, and Benchling will simulate the digestion and show you the predicted assembled construct
Verify that the junctions are correct and the reading frame is maintained across each junction
As a comparison to Gibson: Golden Gate is preferable when you are assembling many parts (5+) in a defined order and want precise scarless junctions, while Gibson is simpler to set up for 2–3 fragments and does not require specific restriction sites to be absent from your inserts. Golden Gate requires that no internal BsaI sites exist within any of your parts (you may need to silently mutate them out), whereas Gibson has no such constraint.
Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Integrated Artificial Neural Networks (IANNs) offer superior advantages over traditional Boolean genetic circuits by enabling analog processing, high-dimensional pattern recognition, and robust adaptability. Unlike Boolean circuits (on/off), IANNs handle continuous input levels, allowing complex decision-making, such as classifying disease states based on weighted molecular signatures rather than simple threshold switches.
Advantages of IANNs
They can handle more complex signals (not just ON/OFF).
They can combine many inputs at once.
They give graded responses (not just yes/no).
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Use: Detect cancer cells inside the body
Inputs:
X1 = cancer marker 1
X2 = cancer marker 2
Output:
If both markers are high → produce drug (kill cell)
If not → do nothing
Limitations:
Hard to control inside real cells
Can be slow
May make mistakes (wrong output)
Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.
Assignment Part 2: Fungal Materials
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
Examples
Mycelium packaging
Used instead of plastic/foam
Mycelium leather
Used for clothes, bags
Building materials
Used for bricks/insulation
Advantages:
Biodegradable
Eco-friendly
Renewable
Disadvantages:
Not as strong as plastic/metal
Can be damaged by water
Slower to produce
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
What we could make fungi do:
Produce stronger materials
Glow (for lighting)
Clean pollution
Why fungi (vs bacteria):
Grow into solid shapes (good for materials)
Naturally make strong structures
Better for large-scale materials
Week 9 HW: Cell free systems
Homework Part A: General and Lecturer-Specific Questions
General homework questions
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
In traditional in vivo (inside the cell) methods, the cell’s primary goal is its own survival. In CFPS, the goal is purely production.
Experimental Control: You can directly manipulate the environment. You can add non-natural amino acids, adjust redox potential, or add chaperones at precise concentrations without the cell’s homeostatic mechanisms fighting back.
Flexibility: There is no “transformation” step. You can add linear DNA or mRNA directly to the mix, enabling rapid prototyping (Design-Build-Test cycles).
Two Cases where CFPS is superior:
Cytotoxic Proteins: Producing proteins that would kill a living host (e.g., antimicrobial peptides or certain toxins) is much easier in vitro because the system doesn’t need to stay “alive.”
Incorporation of Non-Canonical Amino Acids (ncAAs): CFPS allows for the site-specific insertion of synthetic amino acids to create proteins with new chemical properties, which is often difficult in cells due to transport issues or toxicity.
Describe the main components of a cell-free expression system and explain the role of each component.
Cell Extract (Crude Lysate): Provides the core machinery: Ribosomes, aminoacyl-tRNA synthetases, and initiation/elongation factors.
Cell-free protein synthesis template DNA or mRNA: Encodes the protein of interest.
Energy Source High-energy molecules (e.g., Phosphoenolpyruvate or Creatine Phosphate): to fuel the reaction.
Amino Acids: The raw building blocks for the protein chain.
Nucleotides (NTPs): Required for transcription (if starting from DNA) and energy transfer.
Salts and Buffers (e.g., Magnesium and Potassium): To maintain pH and stabilize the folding of the machinery.
Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.
Protein synthesis is energetically expensive. Every peptide bond requires the hydrolysis of multiple ATP and GTP molecules. If the energy supply is exhausted, translation stops, leading to low yields.
One common method is the Secondary Energy Solution using Creatine Phosphate and Creatine Kinase. The kinase enzyme transfers a phosphate group from creatine phosphate to ADP, constantly “recharging” the ATP pool within the tube.
Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
Prokaryotic (e.g., E. coli): High yield, fast, and inexpensive. However, it lacks post-translational modification (PTM) capabilities.
Protein to produce: Insulin. While complex, it is a small protein that can be efficiently produced and folded in optimized E. coli systems.
Eukaryotic (e.g., CHO or Wheat Germ): Lower yield but capable of complex folding and PTMs like glycosylation.
Protein to produce: Monoclonal Antibodies. These require complex disulfide bond formation and glycosylation to be functional, which prokaryotic systems cannot easily handle.
How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
Membrane proteins are notoriously difficult because they are hydrophobic and require a lipid environment to fold correctly; otherwise, they aggregate and become useless.
Challenges and Solutions:
Challenge: Lack of a lipid bilayer.
Solution: Supplement the CFPS reaction with nanodiscs or synthetic liposomes. These provide a “landing pad” for the protein to insert itself into as it is being synthesized.
Challenge: Detergent toxicity.
Solution: Use detergent-free CFPS or mild surfactants that stabilize the protein without denaturing the translation machinery.
Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
Template Degradation: Check the purity of your DNA/mRNA. Use RNase inhibitors to prevent the degradation of the blueprint during the reaction.
Energy Depletion: Analyze the duration of the reaction. If it stops prematurely, increase the concentration of the secondary energy source or implement a dialysis system to remove byproduct phosphate.
Poor Protein Folding: If the protein is being made but is insoluble (forming pellets), try lowering the incubation temperature or adding molecular chaperones (like DnaK/J) to the extract.
Homework question from Peter Nguyen
Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:
Write a one-sentence summary pitch sentence describing your concept.
A paper-based, low-cost diagnostic tool that utilizes freeze-dried cell-free systems and RNA toehold switches to provide a visible color-change signal in the presence of Vibrio cholerae within 60 minutes.
How will the idea work, in more detail? Write 3-4 sentences or more.
The device consists of a paper strip embedded with a freeze-dried cell-free transcription-translation (TX-TL) system and a synthetic “toehold switch” gene circuit. When a suspected water sample is dropped onto the paper, the water rehydrates the biological machinery, “waking it up.” If the specific RNA sequence of the Vibrio cholerae bacteria is present in the sample, it binds to the toehold switch, unfolding the RNA structure and allowing the ribosome to translate a reporter protein, such as LacZ or purple chromoprotein, which turns the paper from white to a distinct color.
What societal challenge or market need will this address?
Cholera remains a major global health threat in resource-limited settings where traditional lab-based testing (like bacterial culture or PCR) is too slow, expensive, and dependent on a “cold chain” for refrigerated transport. This kit addresses the need for decentralized, rapid outbreak detection, allowing local health workers to confirm cases in the field and immediately initiate life-saving interventions like water chlorination and rehydration therapy.
How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?
Activation: The limitation of needing water for activation is turned into a benefit here, as the liquid sample being tested serves as the rehydration agent for the freeze-dried reagents.
Stability: By incorporating lyoprotectants (such as trehalose) during the freeze-drying process, the enzymatic machinery is stabilized against high tropical temperatures, eliminating the need for refrigeration during shipping.
One-Time Use: To ensure the tool is economically viable despite being one-time use, the system is engineered onto cellulose paper, which is biodegradable and costs only cents per test, making it a scalable solution for mass surveillance during floods or humanitarian crises.
Week 10 HW: Imaging and Measurement
Homework: Waters Part I — Molecular Weight
We were analyzing an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).
1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight?
The chemical formula and average molecular weight.
Sequence Components:
Using an average isotopic mass calculator (like ExPASy Compute pI/Mw), the breakdown for this specific sequence is:
Calculated MW: 28,124.08 Da
2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:
Determine for each adjacent pair of peaks (n, n+1) using the formula provided
The relationship between two adjacent peaks, where $m_1$ is the lower $m/z$ peak (higher charge) and $m_2$ is the higher $m/z$ peak (lower charge).
Project Description I want to develop a synthetic biology-based point-of-care diagnostic platform that can rapidly detect cholera toxin and toxin co-regulated pilus antigens in stool and environmental samples within 15-30 minutes using engineered cell-free biosensors integrated into a paper-based microfluidic device.
The system employs single-domain antibodies recognition elements fused to split T7 RNA polymerase domains that reconstitute active enzyme only upon antigen binding, driving colorimetric reporter expression for visual result interpretation requiring no laboratory infrastructure or specialized training.
I want to develop a synthetic biology-based point-of-care diagnostic platform that can rapidly detect cholera toxin and toxin co-regulated pilus antigens in stool and environmental samples within 15-30 minutes using engineered cell-free biosensors integrated into a paper-based microfluidic device.
The system employs single-domain antibodies recognition elements fused to split T7 RNA polymerase domains that reconstitute active enzyme only upon antigen binding, driving colorimetric reporter expression for visual result interpretation requiring no laboratory infrastructure or specialized training.
Aim 1: Design and validate a dual-channel cell-free biosensor that detects cholera toxin and toxin co-regulated pilus, demonstrating reliable, rapid results without living cells or laboratory infrastructure.
Aim 2: Embed the biosensor into a paper-based diagnostic strip and test it on real clinical and environmental samples, confirming accuracy, stability in tropical climates, and readiness for field deployment.
Aim 3: Create an adaptable platform for rapid infectious disease detection anywhere in the world from cholera to pandemic pathogens enabling real-time outbreak surveillance and early containment in resource-limited settings.