Genes are like the story, and DNA is the language that the story is written in
About me
Hi, everyone! My name is Shirley Azurin. I am currently an undergraduate student in the field of Genetics and Biotechnology. In my free time, I enjoy volunteering, whether to meet new people, build my network, or simply have fun. I have volunteered at the herbarium of the Museum of Natural History in my country and at a sports competition called the Bolivarian Games, and I am currently volunteering at the SPBBC.
1) Biological engineering application tool I want to develop and why
Engineered probiotics for therapeutic compound delivery
Recently, I’ve become fascinated by probiotics after I started growing water kefir grains, a symbiotic culture of bacteria and yeast, mainly because of the benefits often attributed to them, such as helping balance the gut microbiota through the production and release of beneficial compounds. That experience made me wonder: what if we could intentionally design a probiotic strain to deliver a specific compound the body could benefit from? Building on that idea, these engineered probiotic bacteria would be ingested orally, travel through the digestive system, and proliferate in the gut, where they can produce and release defined “payloads” (e.g., vitamins, enzymes, anti-inflammatory molecules).
While searching through the literature, I found recent work reporting that engineered probiotics can modulate the intestinal microenvironment with higher precision than conventional drugs by enabling localized delivery of anti-inflammatory factors, scavenging of excess reactive oxygen species (ROS), restoration of barrier integrity, and regulation of microbial homeostasis (Duan et al., 2025). That said, this is not a completely new idea: research is already exploring engineered probiotics as potential approaches for addressing diseases and disorders of the gut and digestive system.
2) Governance / policy goals
Safe design to avoid potential risks (Non-malfeasance)
Evaluate whether the engineered probiotic could unintentionally alter the native microbiome in harmful ways
Prioritize designs that reduce the chance of the engineered strain persisting uncontrollably. To avoid long-term colonization, the strains need to be programmed to stop proliferating at some point.
Informed Consent and Public Engagement
Inform the general public about the benefits and potential risks of synthetic biology.
Explain to the general public what engineered probiotics can and cannot do, how they differ from conventional probiotics, and what the potential risks are.
Provide transparent information on the progress and methodology used in this project in accessible language to encourage public trust.
3) Governance actions
a. Establishment of Regulatory Guidelines for Products Derived from Synthetic Biology
Purpose: Current policies in Peru address GMOs only in general terms and may not clearly cover engineered probiotics intended for oral use. I propose creating specific guidelines for the safe development and testing of engineered probiotics.
Design: This would require coordination between government regulators, research institutions, and public health authorities.
Assumptions: The production of GMOs is currently banned in Peru. For that reason, anything related to the topic is viewed negatively by the population, so efforts to establish a policy regulating this new product could instead result in engineered probiotics being banned.
Risks of Failure & “Success”: Failure could mean either over-restriction or under-regulation, the latter of which would allow unsafe projects. Even “success” has risks, such as increased visibility and interest, including attention from actors with bad intentions.
b. Educational Programs related to Synthetic Biology
Purpose: Many people understand “probiotics” as inherently safe, but engineered probiotics are different. This action aims to inform the general public about what engineered probiotics can and cannot do and how they differ from conventional probiotics.
Design: This action also involves integrated collaboration among government, scientific researchers, universities, institutes, non-profit scientific organizations, and even media outlets to develop and implement a comprehensive public awareness campaign. Activities may include seminars, workshops, banners, social media campaigns, and public forums.
Assumptions: It is very important to take into account how receptive the public is to this new information, as well as the effectiveness of communication strategies that try to convey complex scientific concepts in an accessible and engaging manner. Poor communication could lead to negative opinions.
Risks of Failure & “Success”: Ineffective communication could increase misinformation, with misconceptions such as “engineered probiotics are dangerous” or “engineered probiotics are the cure for illness”. Success could promote public trust and support, but it could also unintentionally normalize the idea so much that some people attempt unregulated uses or press for premature deployment.
c. Promotion of Synthetic Biology Among Scholars
Purpose: Promoting synthetic biology in academia would increase knowledge, build local expertise, and encourage responsible innovation in areas like engineered probiotics, with safer designs in mind.
Design: Universities and scientific organizations could implement workshops, seminars, and curriculum modules focused on: genetic circuit design (e.g., in Benchling), reproducibility, risk assessment, and biosafety-by-design.
Assumptions: Thanks to their academic background, scholars have a foundational understanding of biological principles and are more inclined to learn and apply topics in emerging fields such as Synthetic Biology.
Risks of Failure & “Success”: Failure here may result in low adoption of this technology within academic circles, weak safety practices, or “copy-paste” projects carried out without understanding the risks. On the other hand, success could lead to increased interest, participation, and collaboration among scholars in Synthetic Biology. More scholars entering the field will lead to more research and discoveries that could benefit the general population.
4) Scoring governance actions
| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Protect human health | | | |
| • By preventing biological harms (side effects) | n/a | 3 | 1 |
| • By strengthening responsible research practices | n/a | 3 | 1 |
| Prevent unintended spread | | | |
| • By preventing uncontrolled persistence | n/a | n/a | 1 |
| • By helping respond | n/a | n/a | 1 |
| Inform the public | | | |
| • By preventing incidents | 2 | 2 | 2 |
| • By helping respond | 2 | 1 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 1 | 3 | 2 |
| • Feasibility? | 1 | 2 | 2 |
| • Not impede research | 1 | 3 | 1 |
| • Promote constructive applications | 1 | 3 | 1 |
5) Governance Prioritization and Recommendation
I would prioritize Options 2 and 3. Before trying to draft or push new laws, it’s important that people understand what engineered probiotics can and cannot do, and how they differ from conventional probiotics. In Peru, this matters even more because the country has a strong legal and cultural sensitivity around GMOs. For example, there is a moratorium on the entry and production of certain living modified organisms for environmental release, which was recently extended until December 31, 2035.
Week 2 HW: DNA Read, Write and Edit
Week 2 HW: Questions
Homework Questions from Professor Jacobson
Question 1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
DNA polymerase has an error rate on the order of 1 error per 10⁶ bases. Compared to the size of the human genome, about 3 × 10⁹ base pairs, this would imply roughly 3,000 errors per genome replication. Biology addresses this mismatch by adding layers of correction: polymerases can proofread during synthesis, and additional mismatch repair pathways correct many of the errors that still escape proofreading.
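The estimate above is a single multiplication, which can be sketched as follows. The two constants are the order-of-magnitude figures quoted in the answer; the function name is only illustrative.

```python
# Back-of-the-envelope estimate using the figures quoted in the answer.
ERROR_RATE = 1e-6   # errors per base copied (order of magnitude from the text)
GENOME_SIZE = 3e9   # human genome length in base pairs (from the text)


def expected_errors(error_rate: float, genome_size: float) -> float:
    """Expected number of copying errors in one pass over the genome."""
    return error_rate * genome_size


raw_errors = expected_errors(ERROR_RATE, GENOME_SIZE)
print(f"Expected errors per replication: {raw_errors:.0f}")
```

This counts errors at the quoted rate only; the correction layers described in the answer reduce the number that persist.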
Question 2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Because the genetic code is degenerate, an enormous number of different DNA sequences could encode the same average human protein sequence. However, these theoretical sequences are not equally functional in real cells. In practice, many factors can strongly affect expression and translation. So although many sequences are theoretically valid, only a smaller subset tends to work well in living systems.
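To make the scale of this degeneracy concrete, here is a minimal sketch that multiplies the number of synonymous codons at each position, using the standard genetic code. The peptide `MKWL` is a hypothetical toy example, not a real human protein, and stop codons are ignored.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that encode the given protein."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein)


toy_peptide = "MKWL"  # hypothetical 4-residue example
n_encodings = count_encodings(toy_peptide)
print(f"{toy_peptide}: {n_encodings} possible coding sequences")
```

Because the count is a product over positions, it grows exponentially with protein length, which is why the number for a full-length protein is astronomically large.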
Homework Questions from Dr. LeProust
Question 1. What’s the most commonly used method for oligo synthesis currently?
The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite chemistry. It is a stepwise chemical process where nucleotides are added one at a time on a solid support, which makes it highly automatable and scalable.
Question 2. Why is it difficult to make oligos longer than 200 nt via direct synthesis?
It is difficult to synthesize oligos longer than roughly 200 nucleotides because every synthesis cycle has a small chance of failure. Even if each cycle is “very good,” errors and truncations accumulate with length, so the fraction of full-length, correct oligos drops sharply past ~200 nt.
Question 3. Why can’t you make a 2000 bp gene via direct oligo synthesis?
A 2000 bp gene cannot be made reliably by direct chemical synthesis because the cumulative error and truncation rates would make correct full-length molecules extremely rare, and the process very inefficient. Instead, long genes are typically produced by assembling shorter oligos into larger fragments, followed by cloning and sequence verification to identify correct constructs.
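The length limit in the last two answers can be illustrated with a simple compounding model: if each coupling cycle succeeds independently with some efficiency, the full-length fraction is that efficiency raised to the number of cycles. The 99% per-cycle efficiency below is an assumed illustrative value, not a figure from the lecture, and the model ignores other error types such as substitutions or depurination.

```python
ASSUMED_COUPLING_EFFICIENCY = 0.99  # assumption for illustration only


def full_length_fraction(length_nt: int, efficiency: float) -> float:
    """Fraction of strands that reach full length.

    The first base is already attached to the solid support, so a strand of
    length_nt needs length_nt - 1 successful coupling cycles.
    """
    return efficiency ** (length_nt - 1)


for length in (20, 200, 2000):
    fraction = full_length_fraction(length, ASSUMED_COUPLING_EFFICIENCY)
    print(f"{length:>5} nt: full-length fraction = {fraction:.3g}")
```

Under this assumption a 200 nt oligo already loses most of its product, and a 2000 nt strand is essentially never completed, which is why long genes are assembled from shorter oligos.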
Homework Question from George Church
What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?
The ten essential amino acids in animals are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine.
Week 4 HW: Protein Design Part I
Due Date
Due by start of Mar 3 Lecture
A. Conceptual Questions
Answer any NINE of the following questions from Shuguang Zhang:
How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Why are there only 20 natural amino acids?
Can you make other non-natural amino acids? Design some new amino acids.
Where did amino acids come from before enzymes that make them, and before life started?
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Can you discover additional helices in proteins?
Why are most molecular helices right-handed?
Why do β-sheets tend to aggregate?
What is the driving force for β-sheet aggregation?
Why do many amyloid diseases form β-sheets?
Can you use amyloid β-sheets as materials?
Design a β-sheet motif that forms a well-ordered structure.
B. Protein Analysis and Visualization
In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:
Briefly describe the protein you selected and why you selected it.
Identify the amino acid sequence of your protein.
How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.
Does your protein belong to any protein family?
Identify the structure page of your protein in RCSB.
When was the structure solved? Is it a good-quality structure? A good-quality structure is one with good resolution; the smaller the value, the better (e.g., Resolution: 2.70 Å).
Are there any other molecules in the solved structure apart from protein?
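For the sequence length and amino acid frequency questions above, a minimal sketch of the counting step is shown here. It is not the course Colab notebook, and the sequence is a hypothetical placeholder to be replaced with the one-letter sequence of the chosen protein.

```python
from collections import Counter

# Hypothetical placeholder; paste the real one-letter sequence here.
sequence = "MKKLLAAK"

length = len(sequence)
counts = Counter(sequence)
most_frequent = counts.most_common(1)[0]  # (residue, count)

print(f"Length: {length} residues")
print(f"Most frequent amino acid: {most_frequent[0]} ({most_frequent[1]} times)")
for residue, count in counts.most_common():
    print(f"{residue}: {count} ({count / length:.1%})")
```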
We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:
Protein language modeling
Deep Mutational Scans
Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
Can you explain any particular pattern? (choose a residue and a mutation that stands out)
(Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
Latent Space Analysis
Use the provided sequence dataset to embed proteins in reduced dimensionality.
Analyze the different formed neighborhoods: do they approximate similar proteins?
Place your protein in the resulting map and explain its position and similarity to its neighbors.
Folding a protein
Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?
Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN
Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
Input this sequence into ESMFold and compare the predicted structure to your original.
D. Group Brainstorm on Bacteriophage Engineering
Main Goals
Goal 1: Increase the stability of the MS2 lysis protein by predicting mutations of residues near the C-terminal region and surrounding the LS motif.
Goal 2: Improve the N-terminal region, either by modifying residues so that they contribute to toxic activity or by adding new functional regions that may increase toxicity.
We aim to improve the MS2 lysis protein by modifying residues other than Leu48 and Ser49 (the LS motif), including those surrounding the motif, to increase protein toxicity (Chamakura et al., 2017), and to predict mutations that may improve its stability. We hypothesize that a more stable protein would no longer require DnaJ for its activity.
Another goal is to design new additions to the N-terminal region to improve lysis toxicity. Berkhout et al. (1985) suggest that the C-terminal region is key for protein activity. Taking this into consideration, we can try to modify the N-terminal region to improve protein stability or add new features that may increase the toxicity of the protein.
Strategy
(Which tools/approaches from recitation do you propose using, and why do you think those tools might help solve your chosen sub-problem?)
Given our two main goals, we propose a different strategy for each objective.
For the first goal, we propose using a protein language model such as ESM-2 to perform an in silico deep mutational scan that evaluates the plausibility of all possible single-point mutations in the MS2 L protein. Subsequently, we will employ ESMFold or AlphaFold2 to predict the resulting 3D structural variations.
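As a rough sketch of the search space for that scan, the snippet below enumerates every single-point mutant of a sequence. It only generates the candidates: scoring them with ESM-2 is not shown, and the input `MKL` is a hypothetical placeholder rather than the MS2 L protein sequence.

```python
from typing import Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(sequence: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, mutant_sequence) for every single-residue substitution.

    Labels use the usual wild-type/position/mutant form, e.g. "M1A",
    with 1-based positions.
    """
    for index, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant == wild_type:
                continue  # skip the unchanged residue
            label = f"{wild_type}{index + 1}{mutant}"
            yield label, sequence[:index] + mutant + sequence[index + 1:]


toy_sequence = "MKL"  # hypothetical placeholder sequence
mutants = list(single_point_mutants(toy_sequence))
print(f"{len(mutants)} single-point mutants for a {len(toy_sequence)}-residue sequence")
```

Each position contributes 19 substitutions, so the number of candidates to score grows linearly with sequence length.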
For the second goal:
Step 1: Identify and annotate key functional regions near the C-terminal motif and the LS motif.
The model can predict structures that look stable and coherent, but it does not measure real folding energy, membrane insertion, or toxicity. “Looks good” in silico ≠ “works better” in vivo.
This creates a risk of selecting variants that appear structurally improved but do not increase stability or toxicity, or that even reduce lytic activity.