SHIRLEY AZURIN — HTGAA Spring 2026

Genes are like the story, and DNA is the language that the story is written in

About me

Hi, everyone! My name is Shirley Azurin. I am currently an undergraduate student in Genetics and Biotechnology. In my free time, I enjoy volunteering, whether it’s to meet new people, build my network, or simply have fun. I have volunteered at the herbarium of the Museum of Natural History in my country and at a sports competition called the Bolivarian Games, and I am currently volunteering at the SPBBC.

Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices

1) Biological engineering application tool I want to develop and why

Engineered probiotics for therapeutic compound delivery

I recently became fascinated by probiotics after I started growing water kefir grains, a symbiotic culture of bacteria and yeast, mainly because of the benefits often attributed to them, such as helping balance the gut microbiota through the production and release of beneficial compounds. That experience made me wonder: what if we could intentionally design a probiotic strain to deliver a specific compound the body could benefit from? Building on that idea, the engineered probiotic bacteria would be ingested orally, travel through the digestive system, and proliferate in the gut, where they would produce and release defined “payloads” (e.g., vitamins, enzymes, or anti-inflammatory molecules).

While searching the literature, I found recent work reporting that engineered probiotics can modulate the intestinal microenvironment with higher precision than conventional drugs by enabling localized delivery of anti-inflammatory factors, scavenging of excess reactive oxygen species (ROS), restoration of barrier integrity, and regulation of microbial homeostasis (Duan et al., 2025). That said, this is not a completely new idea: ongoing research is already exploring engineered probiotics as potential approaches for addressing diseases and disorders of the gut and digestive system.

2) Governance / policy goals

Safe design to avoid potential risks (Non-maleficence)

Evaluate whether the engineered probiotic could unintentionally alter the native microbiome in harmful ways.
Prioritize designs that reduce the chance of the engineered strain persisting uncontrollably. To avoid long-term colonization, the strain needs to be programmed to stop growing at some point.

Informed Consent and Public Engagement

Inform the general public about the benefits and potential risks of synthetic biology.
Explain to the general public what engineered probiotics can and cannot do, how they differ from conventional probiotics, and what risks they may carry.
Provide transparent information on the progress and methodology used in this project in accessible language to encourage public trust.

3) Governance actions

a. Establishment of Regulatory Guidelines for Products Derived from Synthetic Biology

Purpose: Current policies in Peru address GMOs in general terms and may not clearly cover engineered probiotics intended for oral use. I propose creating specific guidelines for the safe development and testing of engineered probiotics.
Design: This would require coordination between government regulators, research institutions, and public health authorities.
Assumptions: Currently in Peru the production of GMOs is banned. For that reason, the population tends to view anything related to the topic negatively, so efforts to establish a policy regulating this new product could instead result in it being banned.
Risks of Failure & “Success”: Failure could mean either over-restriction or under-regulation, and the latter would allow unsafe projects. Even “success” has risks, such as increased visibility and interest, including attention from actors with bad intentions.

b. Educational Programs related to Synthetic Biology

Purpose: Many people understand “probiotics” as inherently safe, but engineered probiotics are different. This action aims to inform the general public about what engineered probiotics can and cannot do and how they differ from conventional probiotics.
Design: This action also involves close collaboration between government, researchers, universities, institutes, non-profit scientific organizations, and even media outlets to develop and implement a comprehensive public awareness campaign. Activities may include seminars, workshops, banners, social media campaigns, and public forums.
Assumptions: It is very important to take into account how receptive the public is to this new information, as well as the effectiveness of communication strategies that try to convey complex scientific concepts in an accessible and engaging manner. Poor communication could lead to negative opinions.
Risks of Failure & “Success”: Ineffective communication could increase misinformation, with misconceptions such as “engineered probiotics are dangerous” or “engineered probiotics are a cure for every illness”. Success could promote public trust and support, but it could also unintentionally normalize the idea so much that some people attempt unregulated uses or push for premature deployment.

c. Promotion of Synthetic Biology Among Scholars

Purpose: Promoting synthetic biology in academia would increase knowledge, build local expertise, and encourage responsible innovation in areas like engineered probiotics, with attention to safer designs.
Design: Universities and scientific organizations could implement workshops, seminars, and curriculum modules focused on genetic circuit design (e.g., in Benchling), reproducibility, risk assessment, and biosafety-by-design.
Assumptions: Thanks to their academic background, scholars have a foundational understanding of biological principles and are more inclined to learn and apply topics in emerging fields such as synthetic biology.
Risks of Failure & “Success”: Failure may result in low adoption of this technology within academic circles, weak safety practices, or “copy-paste” projects carried out without understanding the risks. Success, on the other hand, could lead to increased interest, participation, and collaboration among scholars in synthetic biology. More scholars in this field would lead to more research and discoveries that could benefit the general population.

4) Scoring governance actions

Does the option:	Option 1 (action a)	Option 2 (action b)	Option 3 (action c)
Protect human health
• By preventing biological harms (side effects)	n/a	3	1
• By strengthening responsible research practices	n/a	3	1
Prevent unintended spread
• By preventing uncontrolled persistence	n/a	n/a	1
• By helping respond	n/a	n/a	1
Inform the public
• By preventing incidents	2	2	2
• By helping respond	2	1	1
Other considerations
• Minimizing costs and burdens to stakeholders	1	3	2
• Feasibility?	1	2	2
• Not impede research	1	3	1
• Promote constructive applications	1	3	1
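
As a rough aid for comparing the three options, the scores in the table can be averaged per option. This is only a sketch: the table does not state whether a lower or a higher score is better, so the code just reports the mean of the rated criteria, skips the n/a entries, and weights every criterion equally (the equal weighting is an assumption, not part of the original scoring).

```python
# Scores transcribed row by row from the table above; None stands for "n/a".
scores = {
    "Option 1": [None, None, None, None, 2, 2, 1, 1, 1, 1],
    "Option 2": [3, 3, None, None, 2, 1, 3, 2, 3, 3],
    "Option 3": [1, 1, 1, 1, 2, 1, 2, 2, 1, 1],
}


def mean_score(values):
    """Average the numeric scores, skipping n/a (None) entries."""
    rated = [v for v in values if v is not None]
    return sum(rated) / len(rated)


# Equal weighting of all criteria is an assumption made for illustration.
summary = {option: round(mean_score(vals), 2) for option, vals in scores.items()}

for option, avg in summary.items():
    print(option, avg)
```

Because Option 1 has four unrated criteria, its mean rests on fewer rows than the others, so the averages are best read alongside the table rather than instead of it.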

5) Governance Prioritization and Recommendation

I would prioritize option 2 and option 3. Before trying to draft or push new laws, it’s important that people understand what engineered probiotics can and cannot do, and how they differ from conventional probiotics. In Peru, this matters even more because the country has a strong legal and cultural sensitivity around GMOs. For example, there is a moratorium on the entry and production of certain living modified organisms for environmental release that has recently been extended until December 31, 2035.

Week 2 HW: DNA Read, Write and Edit

Week 2 HW: Questions

Homework Questions from Professor Jacobson

Question 1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA polymerase has an error rate on the order of 1 error per 10^6 bases. Compared to the size of the human genome, about 3 × 10^9 base pairs, this would imply thousands of errors per genome replication. Biology addresses this mismatch by adding layers of correction: polymerases can proofread during synthesis, and additional mismatch repair pathways correct many of the errors that still escape proofreading.
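
The arithmetic behind "thousands of errors" can be checked in a couple of lines. This is a minimal sketch that uses only the two order-of-magnitude figures from the answer above; the variable names are illustrative.

```python
# Order-of-magnitude figures taken from the answer above.
error_rate = 1e-6   # errors per base copied (about 1 in 10^6)
genome_size = 3e9   # base pairs in the human genome

# Expected number of uncorrected errors in one full copy of the genome.
expected_errors = error_rate * genome_size

print(f"Expected raw errors per replication: {expected_errors:.0f}")
```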

Question 2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Because the genetic code is degenerate (most amino acids are encoded by more than one codon), an enormous number of different DNA sequences could encode the same average human protein sequence. However, most of these theoretical sequences are not equally functional in real cells. In practice, factors such as codon usage bias, GC content, and mRNA secondary structure can strongly affect expression and translation. So although many sequences are valid in theory, only a smaller subset tends to work well in living systems.
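
To make the degeneracy argument concrete, the number of synonymous DNA sequences for a protein is the product of the number of codons available for each residue. The sketch below builds the standard genetic code and counts encodings for a short made-up peptide; the peptide `MKWL` and the function name `count_encodings` are illustrative choices, and the stop codon is not counted.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (e.g. 6 for Leu, 1 for Met).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein):
    """Number of distinct coding sequences for a protein (stop codon excluded)."""
    return prod(DEGENERACY[aa] for aa in protein)


# Hypothetical 4-residue peptide: M (1 codon) x K (2) x W (1) x L (6).
print(count_encodings("MKWL"))
```

Since most residues have two to six codons, the count grows exponentially with protein length, which is why the number for a full-length human protein is astronomically large.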

Homework Questions from Dr. LeProust

Question 1. What’s the most commonly used method for oligo synthesis currently?

The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite chemistry. It is a stepwise chemical process where nucleotides are added one at a time on a solid support, which makes it highly automatable and scalable.

Question 2. Why is it difficult to make oligos longer than 200 nt via direct synthesis?

It is difficult to synthesize oligos longer than roughly 200 nucleotides because every synthesis cycle has a small chance of failing. Even if each cycle is “very good,” errors and truncations compound with length, so the fraction of full-length, correct oligos drops sharply past ~200 nt.

Question 3. Why can’t you make a 2000 bp gene via direct oligo synthesis?

A 2000 bp gene cannot be made reliably by direct chemical synthesis because the cumulative error and truncation rates would make correct full-length molecules extremely unlikely and inefficient. Instead, long genes are typically produced by assembling shorter oligos into larger fragments, followed by cloning and sequence verification to identify correct constructs.
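
The length dependence described in Questions 2 and 3 can be sketched numerically. If each coupling step succeeds with some fixed probability, the fraction of strands that reach full length falls off exponentially with the number of steps. The 99% per-step coupling efficiency used below is an illustrative assumption, not a figure from the lecture, and the model ignores other error types such as deletions or base damage.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of strands for which every coupling step succeeded.

    Assumes the first nucleotide is already attached to the solid support,
    so a strand of length n needs n - 1 successful couplings.
    """
    couplings = length_nt - 1
    return coupling_efficiency ** couplings


# Compare a short primer, a ~200 nt oligo, and a hypothetical 2000 nt strand.
for n in (20, 200, 2000):
    print(n, f"{full_length_fraction(n):.2e}")
```

Under this assumption only a minority of 200 nt strands are full length, and essentially none of the 2000 nt strands are, which is consistent with assembling long genes from shorter oligos.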

Homework Question from George Church

What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?

The ten essential amino acids in animals are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine.

Week 4 HW: Protein Design Part I

Due Date

Due by start of Mar 3 Lecture

A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang:

How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Why are there only 20 natural amino acids?
Can you make other non-natural amino acids? Design some new amino acids.
Where did amino acids come from before enzymes that make them, and before life started?
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Can you discover additional helices in proteins?
Why are most molecular helices right-handed?
Why do β-sheets tend to aggregate?
- What is the driving force for β-sheet aggregation?
Why do many amyloid diseases form β-sheets?
- Can you use amyloid β-sheets as materials?
Design a β-sheet motif that forms a well-ordered structure.

B. Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

Briefly describe the protein you selected and why you selected it.
Identify the amino acid sequence of your protein.
- How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
- How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.
- Does your protein belong to any protein family?
Identify the structure page of your protein in RCSB
- When was the structure solved? Is it a good quality structure? A good quality structure is one with good resolution; the smaller the value, the better (e.g., Resolution: 2.70 Å).
- Are there any other molecules in the solved structure apart from protein?
- Does your protein belong to any structure classification family?
Open the structure of your protein in any 3D molecule visualization software:
- PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
- Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
- Color the protein by secondary structure. Does it have more helices or sheets?
- Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
- Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
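
For the sequence questions above (length and most frequent amino acid), a minimal alternative to the Colab notebook could look like the sketch below. The sequence shown is a made-up placeholder, not a real protein, and `amino_acid_frequencies` is an illustrative helper name; paste the one-letter sequence of the chosen protein in its place.

```python
from collections import Counter


def amino_acid_frequencies(sequence):
    """Return the sequence length and residues ranked by how often they occur."""
    # Remove whitespace and line breaks that often come with copied FASTA text.
    cleaned = "".join(sequence.split()).upper()
    counts = Counter(cleaned)
    return len(cleaned), counts.most_common()


# Placeholder sequence for illustration only.
length, ranked = amino_acid_frequencies("MKTAYIAKQR QISFVKSHFS")

print("Length:", length)
for residue, count in ranked[:3]:
    print(residue, count, f"{100 * count / length:.1f}%")
```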

C. Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.
Choose your favorite protein from the PDB.
We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

Protein language modeling

Deep Mutational Scans
1. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
2. Can you explain any particular pattern? (choose a residue and a mutation that stands out)
3. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
Latent Space Analysis
1. Use the provided sequence dataset to embed proteins in reduced dimensionality.
2. Analyze the different formed neighborhoods: do they approximate similar proteins?
3. Place your protein in the resulting map and explain its position and similarity to its neighbors.

Folding a protein

Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
Input this sequence into ESMFold and compare the predicted structure to your original.
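
When comparing a ProteinMPNN-proposed sequence with the original one, a simple first measure is sequence recovery: the fraction of positions where the two sequences agree. The sketch below assumes both sequences have the same length, which holds when the design is made on a fixed backbone. The two short sequences are invented placeholders, and the function names are illustrative.

```python
def sequence_identity(original, designed):
    """Fraction of positions at which the two sequences carry the same residue."""
    if len(original) != len(designed):
        raise ValueError("Sequences must have the same length for a positional comparison.")
    matches = sum(a == b for a, b in zip(original, designed))
    return matches / len(original)


def list_differences(original, designed):
    """List (position, original residue, designed residue) for each mismatch (1-based)."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, designed), start=1)
        if a != b
    ]


# Placeholder sequences for illustration only.
native = "MKTAYIAK"
design = "MKSAYLAK"

print(f"Sequence recovery: {sequence_identity(native, design):.2f}")
print(list_differences(native, design))
```

The mismatch list is a convenient starting point for checking whether the changed positions fall in the core or on the surface of the structure.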

D. Group Brainstorm on Bacteriophage Engineering

Main Goals

Goal 1: Increase the stability of the MS2 lysis protein by predicting mutations of residues near the C-terminal region and surrounding the LS motif.
Goal 2: Improve the N-terminal region by modifying residues so that they contribute to its toxic activity, or by adding new functional regions that may increase its toxicity.

We aim to improve the MS2 lysis protein by modifying regions other than the Leu48-Ser49 (LS) motif and its surroundings in order to increase protein toxicity (Chamakura et al., 2017), and by predicting mutations that may improve its stability. We hypothesize that a more stable protein would not require the presence of DnaJ for its action.

Another goal is to design new accessories for the N-terminal region to improve lysis toxicity. Berkhout et al. (1985) suggest that the C-terminal region is key for protein activity, so, taking this into consideration, we can try to modify the N-terminal region to improve protein stability or add new features that may increase the toxicity of the protein.

Strategy

(Which tools/approaches from recitation do you propose using, and why do you think those tools might help solve your chosen sub-problem?)

Given our two main goals, we propose different strategies to address each objective.

For the first goal, we propose using a protein language model such as ESM-2 to perform an in silico deep mutational scan that evaluates the plausibility of all possible single-point mutations in the MS2 L protein. Subsequently, we will employ ESMFold or AlphaFold2 to predict the resulting 3D structural variations.
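
One common way to turn language-model outputs into a deep mutational scan is to score each substitution by the log-likelihood ratio between the mutant and wild-type residue at that position. The sketch below shows only that scoring step. The probabilities are invented placeholders, not ESM-2 output, and cover just three residues per position, whereas a real model returns a distribution over all 20 amino acids. The function name `mutation_scores` is also illustrative.

```python
import math


def mutation_scores(wild_type, position_probs):
    """Score each substitution as log p(mutant) - log p(wild type) at its position.

    position_probs[i] maps amino acid -> model probability at position i.
    Negative scores mean the model finds the mutant less plausible than wild type.
    """
    scores = {}
    for i, (wt, probs) in enumerate(zip(wild_type, position_probs), start=1):
        for aa, p in probs.items():
            if aa != wt:
                scores[f"{wt}{i}{aa}"] = math.log(p) - math.log(probs[wt])
    return scores


# Toy two-residue example with made-up probabilities (placeholders, not model output).
toy_sequence = "LS"
toy_probs = [
    {"L": 0.7, "I": 0.2, "P": 0.1},
    {"S": 0.6, "T": 0.3, "P": 0.1},
]

scores = mutation_scores(toy_sequence, toy_probs)
for mutation, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(mutation, f"{score:.2f}")
```

In an actual run, the probabilities would come from the model's predictions at each masked position, and the resulting scores would typically be displayed as a position-by-residue heatmap.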

For the second goal:

Step 1: Identify and annotate key functional regions near the C-terminal motif and the LS motif

Software: BLAST (for conserved domains), PeSTO (functional motifs)

Step 2: Predict mutations near the N-terminal and C-terminal sites that may improve protein stability

Software: Clustal Omega (to identify hotspots for mutations)

Step 3: Generate different protein candidates with mutations and evaluate their stability

Software: AlphaFold-Multimer, Boltz-1

We propose using AlphaFold with a specific training set for bacteriophages

Step 4: Predict accessory peptide sequences to insert in the N-terminal region to improve its toxicity

Software: FoldSeek (to find remote sequences with similar folds), EvolvePro (to suggest optimized N-terminal sequences)

Step 5: Test the suitability of these protein candidates by performing docking assays with a bacterial membrane model, etc.

Pitfalls

Strategy/Software	Core Limitation	Risks
Structural prediction & design (AlphaFold, FoldSeek, EvolvePro, Boltz-1)	The model can predict structures that look stable and coherent, but it does not measure real folding energy, membrane insertion, or toxicity. “Looks good” in silico ≠ “works better” in vivo.	Selection of variants that appear structurally improved but do not increase stability or toxicity — or even reduce lytic activity.
Phage-specific training / limited viral datasets
