Raphael Aca — HTGAA Spring 2026

About me

Greetings from Atlanta, Georgia! My name is Raphael Aca, and I am a Biology undergrad student at the Georgia Institute of Technology. I have had a passion for synthetic biology ever since I transformed my first bacteria to glow in my freshman year of high school. Today, I continue my passion for SynBio in my undergrad research, classes, clubs, etc. In the future, I hope to obtain a PhD and continue to work in SynBio in topics such as de-extinction, astrobiology, bioremediation, biomedical engineering, etc. When I’m not thinking about SynBio, I partake in hobbies such as weightlifting, video games, or gardening.

Contact info

Email Instagram LinkedIn

Homework

Labs

Projects

Subsections of Raphael Aca — HTGAA Spring 2026

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Class Assignment 1. First, describe a biological engineering application or tool you want to develop and why. I want to develop a membraneless organelle within plant cells that is able to detect breakage of the cell membrane by a foreign organism. This organelle, which is composed of intrinsically disordered proteins (IDPs), would trigger immune system responses upon detection. The purpose of this organelle is to detect and shut down fungal plant pathogens that infect by breaching cell membranes. This novel application would lower yield loss in rice plants (primarily Oryza sativa) from fungal diseases like rice blast (Magnaporthe grisea), which is responsible for 10%-30% yield losses in rice every year, enough grain to feed about 60 million people.

  • Week 2 HW: DNA Read, Write, & Edit

    Part 1: Benchling & In-silico Gel Art Info This is a picture of the gel art I designed on Benchling. The bands in the 1-6 ladders create the word “Hi” on completion. The restriction enzymes used on the Lambda DNA are listed above the diagram.

  • Week 3 HW: Lab Automation

    Assignment: Python Script for Opentrons Artwork This is a link to the code for my Opentrons Artwork. AI Contributions: I used AI to generate large portions of my code as I am largely unfamiliar with Python programming. I used Gemini AI and asked it to integrate the coordinates for my artwork into the code in Colab.

  • Week 4 HW: Protein Design Part I

    Part A. Conceptual Questions How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Assuming that meat contains about 20% protein, 500 grams of meat translates to ~100 grams of protein. If an average amino acid weighs 100 Daltons, one mole of amino acids weighs 100 grams. Therefore, 500 grams of meat contains about one mole of amino acids. Using Avogadro’s constant, this works out to about 6 x 10^23 molecules of amino acids in 500 grams of meat.

  • Week 5 HW: Protein Design Part II

    Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM This is the human SOD1 sequence containing the A4V mutation:

    MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

      | Binder       | Pseudo Perplexity
    0 | WHYYATGARWGE | 16.929015
    1 | WRYGAVALELKK | 12.714672
    2 | WRSPAAAARWWK | 9.155765
    3 | WRYPATAAALKX | 4.843841
    4 | FLYRWLPSRRGG | N/A

    Info The table generated by PepMLM detailing possible peptides to bind to mutant SOD1 along with their pseudo perplexity scores. Peptide 4 is an already known SOD1-binding peptide.

  • Week 6 HW: Genetic Circuits Part I: Assembly Technologies

    Assignment: DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? The Phusion High-Fidelity PCR Master Mix consists of Phusion DNA Polymerase, deoxynucleotides, and a reaction buffer (including MgCl2). Phusion DNA Polymerase is a high-fidelity enzyme that extends the 3’ end of a growing DNA strand by adding nucleotides complementary to the template. The deoxynucleotides are the building blocks that are incorporated into the newly synthesized strand. The reaction buffer supports enzymatic function and stabilizes the DNA polymerase, allowing PCR to proceed smoothly.

  • Week 7 HW: Genetic Circuits Part II

Subsections of Homework

Week 1 HW: Principles and Practices


Class Assignment

1. First, describe a biological engineering application or tool you want to develop and why.

I want to develop a membraneless organelle within plant cells that is able to detect breakage of the cell membrane by a foreign organism. This organelle, which is composed of intrinsically disordered proteins (IDPs), would trigger immune system responses upon detection. The purpose of this organelle is to detect and shut down fungal plant pathogens that infect by breaching cell membranes. This novel application would lower yield loss in rice plants (primarily Oryza sativa) from fungal diseases like rice blast (Magnaporthe grisea), which is responsible for 10%-30% yield losses in rice every year, enough grain to feed about 60 million people.

2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

To ensure an ethical future, it is imperative to use preventative measures against ecological harm or unintended spread of this tool. One sub-goal to support this goal is to pursue the genetic containment of any engineered constructs within the plant cells. This could be done by designing constructs that can only function within plant cells. Another sub-goal is ensuring reversibility of the tool and monitoring altered organisms over time. Molecular constructs that are able to disable the tool, along with monitoring programs, can secure reversibility in case of adverse effects on the environment. Another policy goal to pursue is promoting equitable and responsible agricultural use. Promoting equitable use can be done by ensuring affordable implementation for small farmers. Transparency with farmers and consumers about the mechanics of the engineered system, along with informed consent, is vital for responsible agricultural use.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).

Option 1. Currently, engineered crops are evaluated for environmental risk without any requirement for active containment mechanisms beyond general biosafety assessment. Federal regulators like USDA-APHIS should require that engineered disease-responsive crops include genetic containment features in their design to reduce unintended spread. This requirement should be a condition for any field trials or commercialization of genetically modified plants. The developers of the engineered crops would document the containment logic and submit that documentation to regulators within the federal agency for approval. This design assumes that genetic containment strategies will effectively reduce ecological risk and that regulators can consistently evaluate various containment designs across differing technologies. This option carries the risk of mutations or environmental variability bypassing containment in practice. In addition, this option could slow innovation and burden smaller developers.

Option 2. Novel pathogen resistance strategies are often evaluated only for efficacy and not for how they can shape pathogen evolution. Research institutions and academic funders (e.g., NSF, USDA-NIFA) can create new incentives for researchers to develop design strategies for pathogen resistance that take evolution and ecological pressure into consideration. Academic funders can reward researchers for designing damage-based or multi-signal immune systems over single-effector targets. This can be enforced by giving priority funding or recognition to projects that demonstrate reduced selective pressure. This option assumes that researchers can accurately anticipate evolutionary dynamics at the design stage. There is still a risk of pathogens adapting in an unforeseen way. It is also important to note that pathogen resistance strategies can lead to overly sensitive immune responses in crops, which could hinder crop yield.

Option 3. Many engineered crops today are owned and controlled by companies, which makes it more difficult for farmers to use these technologies. By using regulatory or financial incentives, we can encourage developers to commit to equitable licensing and deployment models to ensure more accessibility for small farmers. Government agencies can offer incentives such as faster regulatory review or public grant eligibility so that developers will agree to low-cost licensing and clear communication to farmers on benefits and limitations. This policy assumes that broader access to engineered crops will improve food security and that the incentives are enough to influence company behavior. A possible issue with this policy is that smaller crop developers may struggle to meet equity requirements, unintentionally favoring bigger companies.

Does the option:                                  | Option 1 | Option 2 | Option 3
Enhance Biosecurity                               |          |          |
• By preventing incidents                         | 2        | 1        | N/A
• By helping respond                              | 1        | 2        | N/A
Foster Lab Safety                                 |          |          |
• By preventing incidents                         | 2        | 2        | N/A
• By helping respond                              | 1        | 2        | N/A
Protect the environment                           |          |          |
• By preventing incidents                         | 2        | 1        | N/A
• By helping respond                              | 1        | 2        | N/A
Other considerations                              |          |          |
• Minimizing costs and burdens to stakeholders    | 3        | 2        | 1
• Feasibility?                                    | 2        | 2        | 2
• Promote Accessibility?                          | 3        | 3        | 1
• Not impede research                             | 3        | 2        | 2
• Promote constructive applications               | 2        | 1        | 1

4. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

The scoring demonstrates that the best course of action would be to combine Option 2 (incentives for evolution-resilient design) and Option 3 (accessible deployment and licensing). Option 2 would be prioritized as it performs the strongest in prevention-focused categories (biosecurity and environmental protection) while also minimizing burdens on researchers and scientific discovery. This option is vital in preventing detrimental harm to the environment.

Option 3 complements Option 2 by strengthening its gaps found outside of prevention-based categories. Option 3 allows for accessibility while also minimizing costs to stakeholders. By combining these two options, my bioengineering tool maximizes accessibility and impact while minimizing risks to the environment.

While Option 1 is strong in response to risks, it scored poorly on cost, accessibility, and research freedom. Prioritizing this option would ultimately slow innovation and burden smaller research teams. With our chosen options, we assume that developers would prioritize equitable deployment incentives and that early design decisions are effective at preventing ecological/evolutionary risk. However, we are uncertain as to whether incentives may be enough to convince profit-driven companies and if preventative design measures are able to prevent pathogens from evolving in unforeseen ways.

Week 2 Lecture Preparation

Homework Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

The error rate of polymerase is about 1 in 10^6 bases. Since the human genome is about 3.2 billion base pairs long, this error rate would create about 3,200 errors per replication cycle. This would be catastrophic for genetic stability. Biology deals with this discrepancy by using error-correcting (proofreading) polymerases and repair systems such as mismatch repair, which correct mistakes before they become permanent mutations in the DNA.
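
As a rough check of this arithmetic, here is a minimal Python sketch. The error rate (1 in 10^6) and the genome length (3.2 billion bp) are the figures used in the answer; the function name is only illustrative.

```python
def expected_errors(genome_length_bp: float, error_rate_per_base: float) -> float:
    """Expected number of misincorporated bases in one pass over the genome."""
    return genome_length_bp * error_rate_per_base


GENOME_LENGTH_BP = 3.2e9  # human genome length used in the answer
ERROR_RATE = 1e-6         # 1 error per 10^6 bases, as used in the answer

errors = expected_errors(GENOME_LENGTH_BP, ERROR_RATE)
print(f"Expected errors per replication without correction: {errors:,.0f}")
```

With these inputs the product is on the order of a few thousand errors per replication, which is why proofreading and repair are needed.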

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

The average human protein is encoded by about 1036 base pairs, or about 345 codons. Because most amino acids can be encoded by several synonymous codons, there can easily be 10^100 or more ways to code for the average human protein. One reason why not all of these codes work in practice is that different sequences can change the mRNA structure, which could affect translation initiation, elongation, or stability. Another reason is that some sequences can unintentionally create splice sites, promoter elements, or miRNA binding sites.
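
To make the counting concrete, here is a small Python sketch that multiplies the number of synonymous codons for each residue, using the degeneracy of the standard genetic code. The example peptide (the first eight residues of the SRY protein used later on this page) is only an illustration.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences encoding `protein` (stop codon ignored)."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)


def log10_encodings(protein: str) -> float:
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein)


example = "MQSYASAM"  # illustrative 8-residue peptide
print(count_encodings(example), round(log10_encodings(example), 2))
```

If we assume an average of about three synonymous codons per residue (an assumption, since the real average depends on composition), 345 codons give 3^345, roughly 10^164 encodings, which is consistent with "10^100 or more".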

Homework Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

The most commonly used method for oligo synthesis currently is phosphoramidite-based solid-phase synthesis.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

It is difficult to make oligos longer than 200 nt via direct synthesis for several reasons. To start, each cycle has a small failure rate, which compounds and dramatically drops the yield of full-length product after 200 cycles. Long syntheses also increase the risk of base damage, and full-length product is difficult to separate from failure sequences of a similar length.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

You can’t make a 2000bp gene via direct oligo synthesis as the small failure rate for each cycle would make the yield essentially zero. Too many errors would accumulate for a 2000-mer sequence.
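
The compounding failure rate can be sketched in a few lines of Python. The 99% per-cycle coupling efficiency below is an assumed illustrative value, not a number from the lecture, and the model simplifies to one independent coupling step per nucleotide.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that are still full length after `length_nt` cycles."""
    # Simplification: one coupling per nucleotide, each succeeding independently.
    return coupling_efficiency ** length_nt


ASSUMED_EFFICIENCY = 0.99  # hypothetical per-cycle coupling efficiency

for length in (20, 200, 2000):
    fraction = full_length_yield(length, ASSUMED_EFFICIENCY)
    print(f"{length:>5} nt: {fraction:.2e} of strands are full length")
```

Under this assumption a 200-mer still gives a usable fraction of full-length product, while a 2000-mer gives essentially none, which matches the reasoning in the two answers above.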

Homework Questions from George Church

1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in all animals are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine. This makes me feel as if the Lysine Contingency is a very flawed method. While the dinosaurs did require lysine to survive, they could have obtained lysine from any animal in their diet. A more robust contingency would have involved multiple essential nutrients or synthetic auxotrophy.

2. What code would you suggest for AA:AA interactions?

For AA:AA interactions, I would suggest a “Lock-and-Key” chemical match based on the charge, hydrophobicity, size, shape, and special pairs (like Cysteine forming strong bonds).

3. Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:

If our most advanced biological medicines were as easy to store and ship as aspirin, more people could receive life-saving gene and cell therapies. We would not have to rely on ultra-cold freezers, which would save hundreds of thousands of dollars. We would be able to reach people in remote areas who do not have the resources for ultra-cold storage. This would revolutionize medicine and allow for cheap distribution to every corner of the world.

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

[Image: gel art designed in Benchling]
Info

This is a picture of the gel art I designed on Benchling. The bands in the 1-6 ladders create the word “Hi” on completion. The restriction enzymes used on the Lambda DNA are listed above the diagram.

Part 3: DNA Design Challenge

3.1. Choose your protein.

The protein that I chose is the human testis-determining factor. I chose this protein because I find it interesting how one gene plays such a big role in sex differentiation and is the largest factor in deciding how a human embryo will grow. It is interesting to think about how one type of protein encoded by one gene in the human genome can spark significant change in the entire development process of humans. This sequence was taken from the NCBI:

>NP_003131.1 sex-determining region Y protein [Homo sapiens] MQSYASAMLSVFNSDDYSPAVQENIPALRRSSSFLCTESCNSKYQCETGENSKGNVQDRVKRPMNAFIVWSRDQRRKMALENPRMRNSEISKQLGYQWKMLTEAEKWPFFQEAQKLQAMHREKYPNYKYRPRRKAKMLPKNCSLLPADPASVLCSEVQLDNRLYRDDCTKATHSRMEHQLGHLPPINAASSPQQRDRYSHWTKL

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
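
A minimal Python sketch of what reverse translation with "most likely codons" does. The codon table is read off the 612-base result shown in this section (one codon per amino acid); it is an illustration, not the code of the tool that produced that result.

```python
# One codon per amino acid, as observed in the reverse-translated sequence
# in this section. Treat the table as illustrative, not authoritative.
MOST_LIKELY_CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGC",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}


def reverse_translate(protein: str) -> str:
    """Replace each amino acid with its single preferred codon."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


# First eight residues of the SRY protein from section 3.1.
print(reverse_translate("MQSYASAM"))
```

Because each residue maps to exactly three bases, the 204-residue protein gives a 612-base sequence.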

> reverse translation of NP_003131.1 sex-determining region Y protein [Homo sapiens] to a 612 base sequence of most likely codons. ATGCAGAGCTATGCGAGCGCGATGCTGAGCGTGTTTAACAGCGATGATTATAGCCCGGCGGTGCAGGAAAACATTCCGGCGCTGCGCCGCAGCAGCAGCTTTCTGTGCACCGAAAGCTGCAACAGCAAATATCAGTGCGAAACCGGCGAAAACAGCAAAGGCAACGTGCAGGATCGCGTGAAACGCCCGATGAACGCGTTTATTGTGTGGAGCCGCGATCAGCGCCGCAAAATGGCGCTGGAAAACCCGCGCATGCGCAACAGCGAAATTAGCAAACAGCTGGGCTATCAGTGGAAAATGCTGACCGAAGCGGAAAAATGGCCGTTTTTTCAGGAAGCGCAGAAACTGCAGGCGATGCATCGCGAAAAATATCCGAACTATAAATATCGCCCGCGCCGCAAAGCGAAAATGCTGCCGAAAAACTGCAGCCTGCTGCCGGCGGATCCGGCGAGCGTGCTGTGCAGCGAAGTGCAGCTGGATAACCGCCTGTATCGCGATGATTGCACCAAAGCGACCCATAGCCGCATGGAACATCAGCTGGGCCATCTGCCGCCGATTAACGCGGCGAGCAGCCCGCAGCAGCGCGATCGCTATAGCCATTGGACCAAACTG

3.3. Codon optimization.

Codon optimization can have various impacts. It replaces codons that are rarely used in a specific organism with more commonly used synonymous codons. An optimized sequence is translated more efficiently, which leads to higher levels of protein expression. In addition, optimization can improve the stability of the mRNA, since its codons are more likely to be recognized by abundant tRNAs. Overall, an optimized sequence is more likely to give increased protein expression.
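
Since codon optimization should change only the DNA and never the protein, one quick sanity check is to translate both versions and compare them. The sketch below builds the standard genetic code table and checks the first 24 bases of the reverse-translated and optimized sequences from this section; the helper name is illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1 ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


original_start = "ATGCAGAGCTATGCGAGCGCGATG"   # start of the 3.2 sequence
optimized_start = "ATGCAGTCCTATGCCTCCGCCATG"  # start of the optimized sequence
print(translate(original_start) == translate(optimized_start))
```

The same comparison can be run on the full 612-base sequences to confirm that they encode the same 204 residues.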

> Optimized codon sequence of NP_003131.1 sex-determining region Y protein to Humans (Homo sapiens). ATGCAGTCCTATGCCTCCGCCATGCTGAGCGTGTTTAACAGTGATGACTACTCCCCAGCCGTGCAGGAGAACATCCCAGCCCTGAGACGCAGCAGCTCATTCCTGTGTACCGAGTCTTGCAACTCCAAGTACCAGTGCGAGACCGGCGAGAACAGTAAGGGAAACGTGCAGGATCGCGTGAAAAGGCCCATGAACGCTTTCATCGTGTGGAGCCGCGATCAGAGGAGGAAGATGGCCCTGGAGAATCCCAGGATGCGGAACAGCGAAATCTCCAAGCAGCTGGGCTACCAGTGGAAGATGCTGACCGAGGCCGAGAAGTGGCCATTTTTCCAGGAGGCACAGAAGCTGCAGGCCATGCACAGAGAGAAGTACCCCAATTACAAGTACAGACCCAGAAGAAAGGCCAAAATGCTGCCTAAGAACTGTTCCCTGCTGCCCGCCGACCCAGCCTCCGTGCTGTGCTCTGAAGTCCAGCTGGACAACCGCCTGTACAGAGACGACTGTACCAAGGCCACCCACTCCCGCATGGAACACCAGCTGGGGCACCTGCCCCCCATTAATGCCGCATCCTCCCCCCAGCAGCGCGACCGGTACAGCCACTGGACAAAGCTG

3.4. You have a sequence! Now what?

To produce this protein from my DNA, we can use a multitude of both cell-dependent and cell-free methods.

We can use living cells as “factories” for our proteins. We can design a plasmid containing our TDF-encoding gene that also has a promoter, ribosome binding site, and antibiotic resistance marker. By transforming a host cell (like Escherichia coli) with this DNA, we can then induce expression of the TDF protein within the cells. After harvesting and purifying the protein, we have a batch of TDF protein.

If a cell-free system were preferable, we could combine ribosomes, tRNAs, polymerases, amino acids, nucleotides, and an energy system with our DNA to create our protein in a test tube. This method would be much faster.

Part 4: Prepare a Twist DNA Synthesis Order

[Image: sequence of the TDF expression cassette in Benchling]

Info

The above photo is the sequence of the expression cassette to express TDF proteins. This photo was taken in Benchling.

[Image: pTwist Amp High Copy plasmid backbone with the TDF-expressing insert]
Info

The above photo is a pTwist Amp High Copy plasmid backbone with the TDF-expressing insert (shown by “insert”). This photo was taken in Twist Bioscience.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

I would want to sequence the genome of a lactose intolerant person. I want to sequence this genome to better understand what genes are implicated in the reduced expression of lactase in lactose-intolerant patients.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

To sequence my DNA, I would use Illumina whole-genome sequencing. I would use this method because it gives a complete, highly accurate view of someone’s entire DNA sequence. This is a second-generation sequencing method that sequences millions of short fragments in parallel and relies on amplification. For our input, we must collect blood or saliva and purify the genomic DNA from the cells. Then, we fragment the DNA into ~200-500 bp pieces using enzymes. Once we ligate synthetic adapters to both ends of the fragments, we can PCR amplify the adapter-ligated fragments. The prepared DNA then hybridizes to complementary oligos on a flow cell, where each fragment is amplified into a cluster. Reading is done by sequencing-by-synthesis: fluorescently labeled nucleotides are added, and a camera records which color (and therefore which nucleotide) was incorporated at each cluster. After the label is chemically removed, the cycle repeats for the next base, and the recorded signals are converted into a raw FASTQ file.

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I would like to synthesize a sequence of DNA that can work in a cell-free system and detect certain molecules present in a disease. This circuit would detect specific molecules from a pathogen such as the malaria parasite. I want to create this biosensor to give impoverished areas a cheap way to detect disease.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I will perform this DNA synthesis using Gibson Assembly and molecular cloning. I can order the parts of the genetic circuit and assemble them using Gibson Assembly. After assembly, the plasmid can be amplified through transformation, cloning in a bacterial cell, and purification. The limitations of this method are that it can be slow due to colony screening and sequence verification, and that it can produce failed assemblies, wrong inserts, etc. It also requires DNA fragments that have already been synthesized.

5.3 DNA Edit

(i) What DNA would you want to edit and why?

One application for editing human DNA is to cure lactose intolerance. By editing the genome of a lactose-intolerant person so that they can produce lactase, we can cure their lactose intolerance and allow them to consume foods with lactose. This application of gene editing is just one example of how synthetic biology can be leveraged to solve common human disorders.


(ii) What technology or technologies would you use to perform these DNA edits and why?

Lactose intolerance is commonly caused by reduced expression of the lactase enzyme. To fix this, one could modify regulatory variants near the LCT gene in the human genome. One promising way to do this is base editing (a CRISPR-derived technology), which converts one base into another at a targeted location. In this process, a guide RNA first directs a Cas protein to a specific DNA sequence. The Cas protein binds to the target site, and a fused enzyme then chemically converts the target base into another base. After the cell’s repair mechanisms fix the opposite strand, the result is a single-letter DNA change.

To prepare to perform this DNA edit, you must first identify which regulatory variant is implicated in adult lactase expression. Then, you have to prepare a sequence-specific guide RNA, a base editor protein, and a delivery system (possibly viral). As a note, this system would target human intestinal stem cells in vivo. This method, however, carries various limitations. Since it is in vivo, delivery will be extremely difficult and not all cells will be edited (mosaic editing). In addition, this editing method may cause unintended edits in other areas of the genome.

Week 3 HW: Lab Automation


Assignment: Python Script for Opentrons Artwork

This is a link to the code for my Opentrons Artwork.

AI Contributions: I used AI to generate large portions of my code as I am largely unfamiliar with Python programming. I used Gemini AI and asked it to integrate the coordinates for my artwork into the code in Colab.

[Image: simulated output of the Opentrons artwork code]
Info

This image is a simulation of the product of my code when run on the Opentrons robot. It is pixel art of a lizard.
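
The linked notebook is not reproduced here, so the following is only a hedged sketch of one way pixel-art coordinates could be prepared before handing them to a robot protocol. It is plain Python with no Opentrons API calls; the function name, the grid format, the color key, and the 2 mm spacing are all hypothetical choices for illustration.

```python
def pixels_to_offsets(grid, spacing_mm=2.0):
    """Convert a pixel grid into (x, y, color) offsets in mm from the plate center.

    `grid` is a list of equal-length strings. '.' means "leave empty"; any
    other character is treated as a (hypothetical) color key.
    """
    n_rows = len(grid)
    n_cols = len(grid[0])
    points = []
    for row_index, row in enumerate(grid):
        for col_index, cell in enumerate(row):
            if cell == ".":
                continue
            # Center the drawing on (0, 0); the top row gets the largest y.
            x = (col_index - (n_cols - 1) / 2) * spacing_mm
            y = ((n_rows - 1) / 2 - row_index) * spacing_mm
            points.append((x, y, cell))
    return points


TINY_ART = [
    "G.G",
    ".G.",
    "G.G",
]

points = pixels_to_offsets(TINY_ART)
for x, y, color in points:
    print(f"dispense {color} at x={x:+.1f} mm, y={y:+.1f} mm")
```

In a real protocol, each offset would be applied relative to the center of the agar plate in the dispense step, grouped by color so that each color is drawn with a single tip.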

[Image: result of the Opentrons run]
Info

An image of the result of my code being run on the Opentrons robot at William and Mary.

Post-Lab Questions

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

The published paper that I found is titled “Semiautomated Production of Cell-Free Biosensors” (Brown et al., 2025). In this publication, the researchers develop and demonstrate a pipeline that uses the Opentrons OT-2 liquid handling robot to (mostly) automate and scale the manufacturing of cell-free biosensors. In their pipeline, they used the Opentrons to produce a full 384-well plate of fluoride-sensing biosensors. Their objective was to develop a method capable of high-throughput manufacturing, reduced variability in sensor performance, and accessibility across global labs.


When compared to manually assembled biosensors, the biosensors assembled by the robot had more consistent detection thresholds. This research suggests that facilities or field clinics could use Opentrons robots to assemble diagnostic tests on demand rather than importing them from outside sources. In addition, the success of this study shows how the cheaper OT-2 robot can be used in place of industrial-grade liquid handlers so that synthetic biology-based diagnostics can be produced in resource-limited settings.

Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

In my final project, I want to use automation tools to reliably produce biosensors and test them. I want to use the Opentrons robot to prepare multiple samples of protoplast cells edited with CRISPR/Cas9 to produce a certain biosensor within the cell. These cells would be incubated and grown. Then, in an experiment, the automation tools would expose these cells to pathogens to see if the biosensor is able to reliably detect the presence of the pathogen. The fluorescence produced by the biosensor can be measured with a PHERAstar plate reader.

References

Brown, D. M., Phillips, D. A., Garcia, D. C., Arce, A., Lucci, T., Davies Jr., J. P., Mangini, J. T., Rhea, K. A., Bernhards, C. B., Biondo, J. R., Blurn, S. M., Cole, S. D., Lee, J. A., Lee, M. S., McDonald, N. D., Wang, B., Perdue, D. L., Bower, X. S., Thavarajah, W., … Lucks, J. S. (2025). Semiautomated Production of Cell-Free Biosensors. ACS Synthetic Biology, 14(3), 979-986. https://doi.org/10.1021/acssynbio.4c00703.s001

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

    Assuming that meat contains about 20% protein, 500 grams of meat translates to ~100 grams of protein. If an average amino acid weighs 100 Daltons, one mole of amino acids weighs 100 grams. Therefore, 500 grams of meat contains about one mole of amino acids. Using Avogadro’s constant, this works out to about 6 x 10^23 molecules of amino acids in 500 grams of meat.

  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

    When you eat beef or fish, you eat cells that contain various macromolecules like DNA, proteins, lipids, and carbohydrates. Rather than acting on our cells intact, these macromolecules are digested and broken down into smaller units (proteins into amino acids, DNA into nucleotides, etc.). So, by the time they reach our bloodstream, there are no intact cow or fish genes or proteins. Our cells take up these building blocks and use them to make our own human proteins and DNA according to our own genome, not theirs.

  3. Why are there only 20 natural amino acids?

    There are only 20 natural amino acids because that number of different amino acids provides enough chemical diversity for protein function while also avoiding the drawbacks of having more amino acids. Adding any more amino acids would only supply new amino acids that would behave very similarly to one of the original 20 amino acids. In addition, since similar codons can encode chemically similar amino acids, adding an additional amino acid could increase the chance of a small genetic mutation causing a disruptive substitution in the protein structure.

  4. Where did amino acids come from before enzymes that make them, and before life started?

    Before life existed, amino acids were synthesized on Earth through ordinary chemistry. The Miller-Urey experiment demonstrated that early Earth’s atmosphere was capable of forming several types of amino acids without the help of cells or enzymes. Amino acids can form spontaneously through prebiotic chemical reactions that are driven by UV radiation, hydrothermal vents, lightning, etc.

  5. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

    If you made an α-helix using D-amino acids, you would form a left-handed α-helix. However, if you used L-amino acids, you would create a right-handed α-helix. This is due to the peptide backbone’s fixed bond angles which make a D-amino acid-based α-helix energetically favorable in a left-handed form.

  6. Can you discover additional helices in proteins?

    Using synthetic biology, it is possible to design new helices in proteins. Researchers can use D-amino acids and noncanonical amino acids to modify backbone chemistry and engineer altered hydrogen bonding patterns. A new helix is unlikely to emerge naturally since the chemistry of protein backbones is heavily constrained.

  7. Why are most molecular helices right-handed?

    Most molecular helices are right-handed because of how the chiral L-amino acids and D-sugars behave within the protein or DNA strand. Due to their stereochemistry, L-amino acids and D-sugars make right-handed helices more energetically favorable. Because life evolved to use L-amino acids and D-sugars, most molecular helices are right-handed.

  8. Why do β-sheets tend to aggregate?

    β-strands have backbone hydrogen bond donors (N-H) and hydrogen bond acceptors (C=O). To satisfy these unpaired hydrogen-bonding groups, the exposed edge strands of β-sheets form hydrogen bonds with other strands. Unlike α-helices, whose backbone hydrogen bonds are satisfied within the helix, β-sheets can depend on intermolecular hydrogen bonding. This makes them aggregation-prone.

    • What is the driving force for β-sheet aggregation?

      The main driving force for β-sheet aggregation is backbone hydrogen bonding. Each added strand forms another array of H-bonds, which increases stability. In addition, β-sheet-forming sequences are often amphipathic, with one face hydrophobic and the other polar. The hydrophobic effect then drives the hydrophobic faces to pack together and exclude water.

  9. Why do many amyloid diseases form β-sheets?

    Amyloid fibrils share a universal cross-β architecture. In this structure, β-strands run perpendicular to the fibril axis and the sheets stack along it. When proteins misfold, they expose sequences that are normally buried and that are capable of forming extended β-strands. Aggregation then becomes autocatalytic, as additional monomers attach to the growing fibril.

    • Can you use amyloid β-sheets as materials?

      Amyloid β-sheets are strong, highly ordered, self-assembling, and chemically programmable. This makes them highly capable of being used as a material. Researchers are exploring them for use in nanofibers, biomaterials, drug delivery scaffolds, etc. Since they are sequence-programmable, researchers can engineer peptides that can be mechanically tunable and form controlled fibers.
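
The estimate from question 1 above (amino acids in 500 grams of meat) can be reproduced with a short Python sketch. The 20% protein fraction and the 100 Da average amino acid mass are the assumptions stated in that answer; the function name is only illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g, protein_fraction=0.20, aa_mass_g_per_mol=100.0):
    """Approximate number of amino acid molecules in `meat_g` grams of meat."""
    protein_g = meat_g * protein_fraction   # 500 g * 0.20 -> ~100 g of protein
    moles = protein_g / aa_mass_g_per_mol   # ~100 g / (100 g/mol) -> ~1 mol
    return moles * AVOGADRO


print(f"{amino_acid_molecules(500):.1e} amino acid molecules")
```

Changing `protein_fraction` shows how sensitive the answer is to the protein content assumed for the meat.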


Part B: Protein Analysis and Visualization

cover image cover image

A 3D render of the structure of a homolog of lactase.

1. Briefly describe the protein you selected and why you selected it.

The protein that I chose is the lactase (LCT) enzyme (the structure of a homolog is shown above). I chose this protein because it is heavily implicated in lactose intolerance. I am very interested in using synthetic biology to cure lactose intolerance.

2. Identify the amino acid sequence of your protein.

This is the amino acid sequence of lactase:

>NP_002290.2 lactase/phlorizin hydrolase preproprotein [Homo sapiens]

MELSWHVVFIALLSFSCWGSDWESDRNFISTAGPLTNDLLHNLSGLLGDQSSNFVAGDKDMYVCHQPLPT FLPEYFSSLHASQITHYKVFLSWAQLLPAGSTQNPDEKTVQCYRRLLKALKTARLQPMVILHHQTLPAST LRRTEAFADLFADYATFAFHSFGDLVGIWFTFSDLEEVIKELPHQESRASQLQTLSDAHRKAYEIYHESY AFQGGKLSVVLRAEDIPELLLEPPISALAQDTVDFLSLDLSYECQNEASLRQKLSKLQTIEPKVKVFIFN LKLPDCPSTMKNPASLLFSLFEAINKDQVLTIGFDINEFLSCSSSSKKSMSCSLTGSLALQPDQQQDHET TDSSPASAYQRIWEAFANQSRAERDAFLQDTFPEGFLWGASTGAFNVEGGWAEGGRGVSIWDPRRPLNTT EGQATLEVASDSYHKVASDVALLCGLRAQVYKFSISWSRIFPMGHGSSPSLPGVAYYNKLIDRLQDAGIE PMATLFHWDLPQALQDHGGWQNESVVDAFLDYAAFCFSTFGDRVKLWVTFHEPWVMSYAGYGTGQHPPGI SDPGVASFKVAHLVLKAHARTWHHYNSHHRPQQQGHVGIVLNSDWAEPLSPERPEDLRASERFLHFMLGW FAHPVFVDGDYPATLRTQIQQMNRQCSHPVAQLPEFTEAEKQLLKGSADFLGLSHYTSRLISNAPQNTCI PSYDTIGGFSQHVNHVWPQTSSSWIRVVPWGIRRLLQFVSLEYTRGKVPIYLAGNGMPIGESENLFDDSL RVDYFNQYINEVLKAIKEDSVDVRSYIARSLIDGFEGPSGYSQRFGLHHVNFSDSSKSRTPRKSAYFFTS IIEKNGFLTKGAKRLLPPNTVNLPSKVRAFTFPSEVPSKAKVVWEKFSSQPKFERDLFYHGTFRDDFLWG VSSSAYQIEGAWDADGKGPSIWDNFTHTPGSNVKDNATGDIACDSYHQLDADLNMLRALKVKAYRFSISW SRIFPTGRNSSINSHGVDYYNRLINGLVASNIFPMVTLFHWDLPQALQDIGGWENPALIDLFDSYADFCF QTFGDRVKFWMTFNEPMYLAWLGYGSGEFPPGVKDPGWAPYRIAHAVIKAHARVYHTYDEKYRQEQKGVI SLSLSTHWAEPKSPGVPRDVEAADRMLQFSLGWFAHPIFRNGDYPDTMKWKVGNRSELQHLATSRLPSFT EEEKRFIRATADVFCLNTYYSRIVQHKTPRLNPPSYEDDQEMAEEEDPSWPSTAMNRAAPWGTRRLLNWI KEEYGDIPIYITENGVGLTNPNTEDTDRIFYHKTYINEALKAYRLDGIDLRGYVAWSLMDNFEWLNGYTV KFGLYHVDFNNTNRPRTARASARYYTEVITNNGMPLAREDEFLYGRFPEGFIWSAASAAYQIEGAWRADG KGLSIWDTFSHTPLRVENDAIGDVACDSYHKIAEDLVTLQNLGVSHYRFSISWSRILPDGTTRYINEAGL NYYVRLIDTLLAASIQPQVTIYHWDLPQTLQDVGGWENETIVQRFKEYADVLFQRLGDKVKFWITLNEPF VIAYQGYGYGTAAPGVSNRPGTAPYIVGHNLIKAHAEAWHLYNDVYRASQGGVISITISSDWAEPRDPSN QEDVEAARRYVQFMGGWFAHPIFKNGDYNEVMKTRIRDRSLAAGLNKSRLPEFTESEKRRINGTYDFFGF NHYTTVLAYNLNYATAISSFDADRGVASIADRSWPDSGSFWLKMTPFGFRRILNWLKEEYNDPPIYVTEN GVSQREETDLNDTARIYYLRTYINEALKAVQDKVDLRGYTVWSAMDNFEWATGFSERFGLHFVNYSDPSL PRIPKASAKFYASVVRCNGFPDPATGPHACLHQPDAGPTISPVRQEEVQFLGLMLGTTEAQTALYVLFSL VLLGVCGLAFLSYKYCKRSKQGKTQRSQQELSPVSSF

  • The length of the protein is: 1927 amino acids.
  • The most common amino acid is: L, which appears 170 times.
  • Using UniProt’s BLAST tool, I found 250 protein sequence homologs for LCT. The result of the BLAST tool can be found here.
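The length and most common residue reported above can be checked with a few lines of Python. This is only a minimal sketch: the function name `summarize_sequence` is made up for illustration, and the input here is just the first 30 residues of the sequence above rather than the full protein.

```python
from collections import Counter

def summarize_sequence(seq: str) -> tuple[int, str, int]:
    """Return (length, most common residue, its count) for a protein sequence."""
    # Remove whitespace so FASTA-style line breaks or spaces are not counted.
    cleaned = "".join(seq.split()).upper()
    residue, count = Counter(cleaned).most_common(1)[0]
    return len(cleaned), residue, count

# Toy input (for illustration only): the first 30 residues of the sequence above.
fragment = "MELSWHVVFIALLSFSCWGSDWESDRNFIS"
print(summarize_sequence(fragment))
```

Pasting the full sequence into `fragment` (spaces included) should give the length and residue count for the whole protein.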

3. Identify the structure page of your protein in RCSB.

I was not able to find this exact protein in RCSB. However, I found a close homolog in Escherichia coli. The PDB ID is pdb_00005e9a (5E9A).

  • This structure was solved on 2016-10-26. The resolution of the structure is 2.56 Å. This is an acceptable/good quality structure.
  • In addition to the protein, the structure contains water molecules and, as is typical, bound inhibitors or substrate analogs.
  • When put into SCOP, the PDB ID and UniProt ID return no results. I assume this is because the protein does not belong to any structure classification family.

4. Open the structure of your protein in any 3D molecule visualization software.

ball and stick ball and stick
Info

3D visualization of “ball-and-stick” model of 5E9A protein.

ribbon ribbon
Info

3D visualization of “ribbon” model of 5E9A protein.

cartoon cartoon
Info

3D visualization of “cartoon” model of 5E9A protein.

ss ss
Info

3D visualization of 5E9A protein colored by secondary structure. Cyan is helix, red is sheet, and magenta is loop.

The above image of 5E9A colored by its secondary structure suggests that the structure contains more helices than sheets.

residue residue
Info

3D visualization of 5E9A protein colored by residue type. Yellow is hydrophobic, green is polar (uncharged & hydrophilic), blue is positively charged (hydrophilic), red is negatively charged (hydrophilic), and light orange is Glycine (special case, both hydrophobic and hydrophilic).

The 3D visualization of the 5E9A protein colored by residue type shows that the protein is about half hydrophobic amino acids and half hydrophilic amino acids. The protein shows a somewhat alternating pattern between hydrophilic and hydrophobic residues. This pattern contributes to the helices, sheets, and loops shown in the model.

surface surface
Info

3D visualization of “surface” model of 5E9A protein. Colored rainbow by elem c.

The above model shows a “hole” in the enzyme which is its binding pocket. Not shown in the picture but on the other side of the protein there is another binding pocket. It is likely that these “holes” are where lactose binds.

Part C. Using ML-Based Protein Design Tools

For this part, I am continuing to use the lactase protein pdb_00005e9a.

C1. Protein Language Modeling

1. Deep Mutational Scans

scan image scan image
Info

A screenshot of the Mutation Scan Heatmap generated from the LCT protein sequence on ESM2.

The above mutation scan heatmap demonstrates stark patterns. For instance, substitutions to W, M, and C seem to have consistently negative values throughout the protein sequence. This means that replacing nearly any residue in the LCT protein with one of these amino acids would most likely reduce fitness/function.
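One way to turn the visual pattern in the heatmap into numbers is to average each substituted amino acid's score over all positions. The sketch below uses made-up scores, not the actual ESM2 output, and the function name `mean_score_by_substitution` is hypothetical. It assumes the scan can be exported as one list of scores per substituted amino acid.

```python
def mean_score_by_substitution(scores: dict[str, list[float]]) -> dict[str, float]:
    """Average each substituted amino acid's score over all positions."""
    return {aa: sum(vals) / len(vals) for aa, vals in scores.items() if vals}

# Hypothetical scores (one list per substituted amino acid, one value per position).
toy_scores = {
    "W": [-4.0, -3.0, -5.0],
    "C": [-3.0, -2.0, -4.0],
    "A": [-1.0, 0.5, -0.5],
}

means = mean_score_by_substitution(toy_scores)
# List substitutions from most to least damaging on average.
for aa in sorted(means, key=means.get):
    print(aa, round(means[aa], 2))
```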

2. Latent Space Analysis

cover image cover image

Info

Image showing a latent space analysis of various proteins. The red cross represents my chosen protein, LCT/5E9A.

The latent space analysis formed distinct neighborhoods that grouped proteins together mostly by similar function and species. For example, HMG1 (High mobility group protein 1) of Rattus norvegicus (Norway rat) and HMG1 of Cricetulus griseus (Chinese hamster) were present in a cluster together.

My protein, 5E9A, had a higher TSNE3 value (as shown above) and was grouped with various human (Homo sapiens) proteins such as somatotropin, interferon-beta, and other automated matches. It seems to be similar to human proteins involved in human development. This is somewhat expected, as LCT is a human protein.

C2. Protein Folding

folded folded
Info

A 3D representation of the truncated structure of 5E9A (560 residues) using ESMFold.

Because my protein sequence was too long, I truncated it to 560 residues so that ESMFold in Google Colab could run it successfully. The ESMFold prediction produced an average pLDDT score of 79.1, showing generally reliable structural confidence. The predicted TM-score (pTM = 0.583) shows moderate confidence in the overall global fold. These statistics imply that the individual domains are well predicted but their orientations may be less accurate. This is a plausible, structurally coherent fold.
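The average pLDDT quoted above can be recomputed from the predicted PDB file. The sketch below assumes that the per-residue pLDDT is stored in the B-factor column of each ATOM record, as ESMFold output usually does; some notebooks write it on a 0 to 1 scale instead of 0 to 100. The helper names and the three-residue toy structure are invented for illustration.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor column over C-alpha atoms of a PDB-format string."""
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)

def ca_line(serial: int, resseq: int, plddt: float) -> str:
    """Build a synthetic C-alpha ATOM record (coordinates are placeholders)."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")

# Toy structure with made-up confidence values, not the real prediction.
toy_pdb = "\n".join(ca_line(i, i, s) for i, s in enumerate([80.0, 70.0, 90.0], start=1))
print(round(mean_plddt(toy_pdb), 1))
```

Only C-alpha atoms are counted so that each residue contributes once to the mean.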

The predicted coordinates do not exactly match my original structure as I had to truncate my protein in order for it to work with ESMFold. However, if I used my entire sequence, I predict that it would have matched well given the statistics of my fold.

After mutating the sequence a few times, I saw that my structure was fairly resistant to mutations. After small mutations, the structure did not change significantly. However, mutating large segments did produce significant changes in the structure.

C3. Protein Generation

prob plot prob plot
Info

Plot showing the probabilities of each amino acid at each index of the sequence based on the 3D structure of 5E9A.

This is the original sequence: MNHKVHHHHHHIEGRHMELGTLEGSMTKFPLLSSKISGLLHGADYNPEQWLDHPDVLVRDVEMMKEARCNVMSVGIFSWSALEPEEGRYTFDWMDQVLNRLHENGISVFLATPSGARPAWMSQKYPQVLRVGRDRVPALHGGRHNHCMSSPVYREKVQLMNGQLAKRYAHHPAVIGWHISNEYGGECHCDTCQGQFRDWLKARYVTLDALNKAWWSTFWSHTYTDWSQLESPSPQGENGVHGLNLDWRRFNTDQVTRFCSEEIRPLKAENPALPATTNFMEYFNDYDYWKLAGVLDFISWDSYPMWHTRQDDIGLAAYTAMYHDLMRTLKQGKPFVLMESTPSFTNWQPTSKLKKPGMHILSSLQAVAHGADSVQYFQWRKSRGSCEKFHGAVVDHVGHIDTRVGREVAELGSILSALAPVAGSRVEAKVAIIFDWESRWAMDDAMGPRNAGLHYENTVADHYRALWAQGIAVDVINADCDLQGYDLVIAPMLYMVREGVGERISAFVQAGGRFVATYWSGIVNETDLCFLNGFPGPLRPVLGIWAEEIDSLTDEQHNSVAGVEGNALGLSGPYRASQLCEVIHLEGAAALATYGDDFYAGNPAVTVNLYGKGQAYYVASRNDQQFHADFFTALAKEMKLPRAINTPLPEGVTAARRTDGESEFIFLQNYNADNQTVALPQDYQDIVHGGNLPRKLTLPAFGCQILTRKITQ

This is the predicted sequence: MKFPRLSPKIDGLLLGAEYYPELFLDDPELIERLIELMKEAKINVVRLGTHAWEYLEPEPGVYNFAWLDKTLDLLEKNGIYVLLATPTGKLPRWVYEEHPEVLRTKLDGTKEKYGGSHNICLVSPYFRALATEMNTKLASRYAHHPAVIGWEIGNEFSGYCNCPLCQEAFRKWVKEKFGTLDAFNKAGNLEKNNKVVTDWSQIKLPSPNGENDVLTLINLFKEYNTQLVKDFLTQLIEPLKKYNPNLPVTSNFDWWQEDYDYTELATVLDFIAYDSYPPWGTGPDDVRLAAEVAMYHDYMRSLKNGKPFILSETYVDHVNWQATSTSLPPGRLKLWCLLHVACGAEAVLFHYLRRPRTGWDKNHGAVIDHTGNIDTPVGKEVKALGDELASLKDIAGTKIEAEVAIVYDWKSRVILEASKGPLDAGLDYVGNVNRWFRAFWSQGIAVDVIDADADLSPFKLVVAPELYMVPAGVGDRLAAFVAAGGTFVATVLSGVVDEYGREFTDGRPGPLRDVLGVLVRRVRSLSADQTATVTGVAGNELGLSGPYTVTRLAAVVELRGAEALAVYGSGPDAGQPAVTVNRHGKGRAYYVAGRVDDAFWAAFFGSLAEKLKLKRAIDTPLPPGVFARRRTDGENEYVFLFNFRDTPVTVTLPGTYTXXXXXGTVPATITLPPYGVKVLTRKI

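The predicted sequence is significantly distinct from the original sequence, although some short stretches (for example, RYAHHPAVIGW) are conserved. The predicted sequence also appears to be shorter than the original: it lacks the N-terminal His-tag region (MNHKVHHHHHH...) present in the original.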
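To put a number on how different two sequences are, a simple position-by-position percent identity can be computed. The sketch below is a simplification: it assumes the two inputs are already aligned and the same length, whereas a full comparison of the sequences above would need a proper alignment tool. The two fragments are matching 27-residue stretches copied from the original and predicted sequences; the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with the same residue (gaps are not handled)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences are empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Matching stretches copied from the original and predicted sequences above.
original_fragment = "RYAHHPAVIGWHISNEYGGECHCDTCQ"
designed_fragment = "RYAHHPAVIGWEIGNEFSGYCNCPLCQ"
print(f"{percent_identity(original_fragment, designed_fragment):.1f}% identity")
```

This stretch is one of the more conserved regions, so its identity is likely higher than the average over the whole protein.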

predicted structure predicted structure
Info

A proposed 3D structure generated by ESMFold based on the predicted sequence generated by ProteinMPNN. The predicted structure seems to be more clumped than the original structure generated by ESMFold.

Part D. Group Brainstorm on Bacteriophage Engineering

For this part of the homework, I joined a group of students consisting of Jason Ross, Jay Handfield, Nana Agyei Afrane-Asare, Xavier-Lewis Palmer, and myself. After going through the phage reading and reviewing the bacteriophage final project goals, we opted to increase the thermodynamic stability of the Lysis Protein.

Our group’s plan for engineering a bacteriophage includes multiple steps and various protein engineering tools. First, we will use BLAST and Clustal Omega to discover amino acid residues that are conserved between Lysis protein homologs. ESM2 will be used to score mutations for their evolutionary plausibility, and ESMFold/ProteinMPNN/Boltz-1 will be used to refine the folded protein. EvolvePro will be used to computationally guide directed evolution of the protein. Lastly, we can use computational stress-testing under varying environmental conditions to test for destabilization.

Our group’s full one page proposal is linked here.

Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

This is the human SOD1 sequence containing the A4V mutation: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

# | Binder       | Pseudo Perplexity
0 | WHYYATGARWGE | 16.929015
1 | WRYGAVALELKK | 12.714672
2 | WRSPAAAARWWK | 9.155765
3 | WRYPATAAALKX | 4.843841
4 | FLYRWLPSRRGG | N/A
Info

The table generated by PepMLM detailing possible peptides to bind to mutant SOD1 along with their pseudo perplexity scores. Peptide 4 is an already known SOD1-binding peptide.

Part 2: Evaluate Binders with AlphaFold3

# | Binder | ipTM | Peptide localizes near SOD1 N-terminus? | Engages β-barrel region? | Approaches dimer interface? | Surface-bound/partially buried peptide? | 3D Model
0 | WHYYATGARWGE | 0.32 | No | No | Yes | Yes | image0
1 | WRYGAVALELKK | 0.18 | No | No | Yes | Yes | image1
2 | WRSPAAAARWWK | 0.42 | No | Yes | No | Yes | image2
3 | WRYPATAAALKA | 0.46 | No | Yes | No | Yes | image3
4 | FLYRWLPSRRGG | 0.29 | No | No | Yes | Yes | image4
Info

Table containing results from AlphaFold3 generations

Peptides 0, 1, and 4 have ipTM scores lower than 0.4, suggesting no meaningful predicted binding between the peptide and the A4V mutated SOD1 protein. However, compared to the known binder (peptide 4, ipTM = 0.29), peptides 0, 2, and 3 have greater ipTM values. Peptides 2 and 3 have ipTM scores noticeably greater than the known binder, at 0.42 and 0.46, respectively.
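The comparison above can be reproduced with a short script. This is a small sketch using the ipTM values from the table; the 0.4 cutoff is the one used in this write-up, not a universal standard, and the variable names are chosen for illustration.

```python
# ipTM values copied from the table above; index 4 is the known binder.
iptm = {0: 0.32, 1: 0.18, 2: 0.42, 3: 0.46, 4: 0.29}
KNOWN_BINDER = 4
THRESHOLD = 0.4  # cutoff used in the text, not a universal standard

# Peptides at or above the cutoff.
above_threshold = sorted(p for p, s in iptm.items() if s >= THRESHOLD)
# Generated peptides that score higher than the known binder.
beats_known = sorted(p for p, s in iptm.items()
                     if p != KNOWN_BINDER and s > iptm[KNOWN_BINDER])

print("ipTM >= 0.4:", above_threshold)
print("higher ipTM than known binder:", beats_known)
```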

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

The PeptiVerse results for the 5 peptides were mostly consistent with the ipTM values from AlphaFold3. Peptides 0, 1, and 4 showed weak binding, which is consistent with their low ipTM scores in AlphaFold3. However, peptide 2 also showed weak binding in PeptiVerse, which was unexpected as it had an ipTM score of 0.42 in AlphaFold3. Peptide 3 showed medium binding, which is consistent with its ipTM score of 0.46. All five peptides were soluble and non-hemolytic, so they all had sufficient therapeutic properties. Only peptide 3 balanced predicted binding and therapeutic properties well.

Part 4: Generate Optimized Peptides with moPPIt

Peptide Sequence | Hemolysis ↓ | Solubility ↑ | Affinity ↑ | Motif Match ↑
GGKKEYYYSRYP     | 0.9586      | 0.9167       | 7.21       | 0.1572
EKQYTCDTSTKM     | 0.9675      | 0.9167       | 6.18       | 0.8416
KKTTGYGECSYN     | 0.9639      | 1.0000       | 5.85       | 0.8290
GTYTCETTYTQW     | 0.9728      | 0.9167       | 6.68       | 0.8369
Info

Table of results from the moPPIt-v3 Colab using the A4V mutated SOD1 protein.

The peptides generated by the moPPIt-v3 Colab showed stronger predicted binding and motif match results than the peptides generated by PepMLM. In addition, both groups of peptides had high solubility. However, the peptides generated by moPPIt-v3 had very high hemolysis scores, indicating a high risk of red blood cell damage. I would evaluate these peptides as not ready for clinical studies, since their hemolysis values are too high for safe therapeutic use.

Part C: Final Project: L-Protein Mutants

I worked with Jason Ross, Xavier-Lewis Palmer, and Nana Agyei Afrane-Asare to generate mutated proteins to improve the stability of the L-Protein. Our results can be found in this Google Doc.

Week 6 HW: Genetic Circuits Part I: Assembly Technologies

cover image cover image

Assignment: DNA Assembly

  1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The Phusion High-Fidelity PCR Master Mix consists of Phusion DNA Polymerase, deoxynucleotides (dNTPs), and a reaction buffer (including MgCl2). Phusion DNA Polymerase is a high-fidelity enzyme that extends a DNA strand by adding complementary nucleotides to its 3’ end. The deoxynucleotides are the building blocks that the polymerase incorporates into the newly synthesized strand. The reaction buffer maintains the conditions the polymerase needs and stabilizes it, allowing the PCR reaction to proceed smoothly; the Mg2+ from MgCl2 is a required cofactor for the polymerase.

  2. What are some factors that determine primer annealing temperature during PCR?

The annealing temperature for a PCR reaction depends on the primer base composition (proportions of A, T, G, and C), primer concentration, primer length, and the ionic environment of the reaction.
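The effect of base composition and length can be illustrated with the Wallace rule, a rough rule of thumb for short primers: Tm ≈ 2 °C per A/T plus 4 °C per G/C. The sketch below is only a first estimate. It ignores primer concentration and salt, so a nearest-neighbor Tm calculator should be used for real primer design. The primer sequence is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer Tm (deg C) with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be non-empty and contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

primer = "ATGGCTAGCAAAGGAGAAG"  # hypothetical primer, for illustration only
print(wallace_tm(primer), "deg C (rough estimate)")
```

Because G/C pairs count double, a GC-rich primer of the same length gets a higher estimated Tm, which is why base composition appears in the list of factors.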

  3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

    Category | PCR | Restriction Enzyme Digest
    Purpose | Create millions of copies of a DNA segment (amplify the DNA segment). | Cut DNA at a specific site.
    Active Enzymes | DNA Polymerase | Restriction Endonucleases
    Targeting | Defined by synthetic primers. | Defined by palindromic recognition sequences.
    Temperature | Alternates between denaturation, annealing, and extension temperatures (95°C, 55°C, 72°C, respectively). | Usually a steady incubation at 37°C.
    Result | High-concentration amplicons of a single size | Different sized fragments of DNA based on the number of sites
    Input DNA | Minimal amount of DNA template | High concentrations of DNA
  4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

We can ensure that the DNA fragments we have digested and PCR-ed are appropriate for Gibson cloning by preparing them to meet its requirements. Adjacent fragments must end in identical sequences so that their ends overlap with one another. This ensures that the exposed single strands anneal to each other during Gibson assembly. In addition, PCR products should be treated with DpnI to digest the original template DNA, and then cleaned up with a column purification kit to remove leftover salts and enzymes.
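The overlap requirement can be checked in silico before ordering primers. The sketch below finds the longest stretch shared between the end of one fragment and the start of the next. The fragment sequences are hypothetical, and the 20-nt minimum is an assumed value that should be set to whatever the assembly protocol recommends.

```python
def shared_overlap(upstream: str, downstream: str) -> str:
    """Longest suffix of `upstream` that is also a prefix of `downstream`."""
    max_len = min(len(upstream), len(downstream))
    for n in range(max_len, 0, -1):
        if upstream[-n:] == downstream[:n]:
            return upstream[-n:]
    return ""

MIN_OVERLAP = 20  # assumed minimum overlap length; adjust to your protocol

# Hypothetical fragments that share a 22-nt junction sequence.
junction = "GATTACAGATTACAGATTACAG"
fragment_a = "ATGACCATGG" + junction
fragment_b = junction + "TTGACGGCTAA"

overlap = shared_overlap(fragment_a, fragment_b)
status = "OK" if len(overlap) >= MIN_OVERLAP else "too short"
print(len(overlap), "nt overlap:", status)
```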

  5. How does the plasmid DNA enter the E. coli cells during transformation?

The most common method for introducing plasmid DNA into E. coli cells is chemical transformation. In this method, cells are treated with calcium chloride, which neutralizes the negative charges on the DNA and cell membrane so that they can come closer to one another. Then, the cells are kept cold and suddenly heated to 42°C (heat shock). This sudden change in temperature is thought to create a pressure difference between the inside and outside of the cell, forming temporary pores in the membrane which allow the plasmid DNA to enter the cytoplasm.

  6. Describe another assembly method in detail (such as Golden Gate Assembly)

Golden Gate Assembly (GGA) is a highly efficient molecular cloning method that allows for the simultaneous assembly of multiple DNA fragments into a single piece. GGA relies on Type IIS restriction enzymes. These enzymes recognize specific DNA sequences but cut at a defined distance outside of that sequence. This allows the user to design complementary overhangs with any 4-base sequence on different fragments so that they are forced to assemble in a specific order and orientation. Because the enzyme cuts outside its recognition site, fragments can also be designed so that the recognition site is cut off and discarded. With the restriction digest and ligation happening at the same time in one tube, correctly assembled DNA no longer contains the recognition site and is sealed in its final form without the possibility of being cut again. The result is a highly efficient, seamless, and programmable arrangement of multiple segments of DNA.

diagram diagram
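Since the order of assembly depends on the 4-base overhangs, it helps to screen a proposed overhang set before ordering parts. The sketch below applies three commonly cited design checks: no duplicates, no palindromic overhangs, and no overhang that is the reverse complement of another. The overhang sequences and function names are hypothetical, and this is not a complete fidelity analysis.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def check_overhangs(overhangs: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen = set()
    for oh in overhangs:
        rc = reverse_complement(oh)
        if oh == rc:
            problems.append(f"{oh} is palindromic (can ligate to itself)")
        if oh in seen:
            problems.append(f"{oh} is used more than once")
        elif rc in seen:
            problems.append(f"{oh} is the reverse complement of an earlier overhang")
        seen.add(oh)
    return problems

# Hypothetical overhang set for a three-junction assembly.
print(check_overhangs(["AATG", "GCTT", "TACG"]))
```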

Assignment: Asimov Kernel

Week 7 HW: Genetic Circuits Part II

Labs

Lab writeups:

Subsections of Labs

Week 1 Lab: Pipetting

cover image cover image

Week 2 Lab: Lab DNA Gel Art

Week 3 Lab: Opentrons Art

This is the link to my Opentrons artwork.

AI Contributions: I used AI to generate large portions of my code as I am largely unfamiliar with python programming. I used Gemini AI and asked it to integrate my coordinates for my artwork into the code in Colab.

cover image cover image
Info

This is a picture of the simulation of my code. It represents what will be printed onto an agar plate when run on the Opentrons robot.

Week 4 Lab: Protein Design Part I

Week 5 Lab: Protein Design Part II

Week 6 Lab: Gibson Assembly

Week 7 Lab: Neuromorphic Circuits

Projects

Final projects:

  • SECTION 1: ABSTRACT Seasonal allergic rhinitis affects hundreds of millions of people globally, heavily driven by major pollen allergens like Bet v 1 from birch trees. Current treatments rely on systemic pharmaceuticals, such as antihistamines or steroids, that manage human immune symptoms post-exposure but fail to address the environmental trigger itself. This project addresses this critical gap by shifting the paradigm from symptom management to active, localized bioremediation inside the human nasal cavity. The broad objective of this project is to engineer a “Living Bio-Shield”: a bacterial genetic circuit designed to operate within a nasal commensal that detects and neutralizes pollen proteins upon inhalation. We hypothesize that a chimeric two-component receptor system can be engineered to specifically bind Bet v 1, subsequently triggering a genetic circuit to secrete neutralizing nanobodies (VHH domains) via a Sec-dependent pathway. The specific aims involve designing the chimeric receptor in silico, assembling the genetic circuit plasmid, and validating the computational folding and binding affinity of the receptor-nanobody complex. This will be achieved using bioinformatics databases, AlphaFold for protein design, Benchling for DNA construct assembly, and simulated structural analysis. By neutralizing allergens before they interact with the mucosal epithelium, this project establishes a novel preventative biotherapeutic platform for respiratory health.

Subsections of Projects

Individual Final Project

cover image cover image

SECTION 1: ABSTRACT

Seasonal allergic rhinitis affects hundreds of millions of people globally, heavily driven by major pollen allergens like Bet v 1 from birch trees. Current treatments rely on systemic pharmaceuticals, such as antihistamines or steroids, that manage human immune symptoms post-exposure but fail to address the environmental trigger itself. This project addresses this critical gap by shifting the paradigm from symptom management to active, localized bioremediation inside the human nasal cavity. The broad objective of this project is to engineer a “Living Bio-Shield”: a bacterial genetic circuit designed to operate within a nasal commensal that detects and neutralizes pollen proteins upon inhalation. We hypothesize that a chimeric two-component receptor system can be engineered to specifically bind Bet v 1, subsequently triggering a genetic circuit to secrete neutralizing nanobodies (VHH domains) via a Sec-dependent pathway. The specific aims involve designing the chimeric receptor in silico, assembling the genetic circuit plasmid, and validating the computational folding and binding affinity of the receptor-nanobody complex. This will be achieved using bioinformatics databases, AlphaFold for protein design, Benchling for DNA construct assembly, and simulated structural analysis. By neutralizing allergens before they interact with the mucosal epithelium, this project establishes a novel preventative biotherapeutic platform for respiratory health.


SECTION 2: PROJECT AIMS

Aim 1: Experimental Aim (this project): The first aim of my final project is to design and computationally validate a chimeric Two-Component System (TCS) receptor and its associated genetic circuit by utilizing Benchling for DNA construct design and AlphaFold-Multimer for protein structure prediction. This aim focuses on the in silico fusion of a known Bet v 1 nanobody (such as Nb16) to the periplasmic domain of an EnvZ receptor, mapping the resulting operon alongside a Sec-tagged nanobody secretion gene.

Aim 2: Development Aim: The next step following a successful in silico design in Aim 1 would be to physically synthesize the genetic construct using Twist Bioscience, transform it into a lab-safe test chassis (like Bacillus subtilis or E. coli), and validate the active secretion and neutralization of Bet v 1 using in vitro ELISA and SDS-PAGE assays.

Aim 3: Visionary Aim: The long-term vision for this project is to deploy this genetic circuit into a human nasal commensal (such as Staphylococcus epidermidis) to create a commercially viable “Probiotic Nasal Spray” that establishes a persistent, localized defense against airborne allergens. This challenges the existing clinical paradigm of reactive immunotherapy, enabling a proactive capability to continuously filter and neutralize environmental toxins or viruses directly at the point of respiratory entry.


SECTION 3: BACKGROUND

Background and Literature Context

The current state of allergy mitigation heavily relies on either avoidance or post-exposure immune suppression, leaving a significant gap in technologies that neutralize allergens at the environmental-mucosal interface. Recent advancements in protein engineering have successfully identified high-affinity camelid nanobodies (VHH) that bind specifically to Bet v 1, effectively blocking human IgE recognition (Levin et al., Journal of Immunology, 2014). Concurrently, synthetic biology has demonstrated the feasibility of engineering live biotherapeutic products (LBPs), where commensal bacteria are modified to secrete therapeutic proteins directly into human microbiomes, such as the gut (Steidler et al., Science, 2000).

This project is highly innovative because it applies the concepts of environmental bioremediation directly to the human microbiome. Instead of relying on passive physical masks or systemic drugs, it utilizes a novel application of chimeric bacterial sensors to create a “smart” biological filter. By engineering a two-component system to respond to a plant protein rather than a bacterial signaling molecule, this work challenges the assumption that mucosal defense must be entirely managed by the human immune system. This expands the boundaries of synthetic biology by proposing a programmable, symbiotic relationship between humans and engineered commensals for respiratory protection.

This project addresses the pressing real-world problem of the escalating global burden of seasonal allergies, which are worsening due to climate-change-driven increases in pollen production. The current reliance on daily antihistamines presents a significant barrier to quality of life due to side effects like fatigue and mucosal drying. If fully realized, this project could benefit society by providing a “once-a-season” localized treatment that eliminates systemic side effects entirely. Furthermore, the outcomes of this project represent a modular platform technology; if the system works for Bet v 1, the nanobody cassette can be swapped to neutralize other airborne threats, including industrial pollutants or respiratory viruses. Ultimately, this field-level change could shift allergy treatment from pharmacology to engineered preventative ecology.

Ethical Implications

The primary ethical implications involved in this project center around the eventual deployment of genetically modified organisms (GMOs) into the human respiratory tract. This directly invokes the principle of non-maleficence (do no harm) and the responsibility to protect both the human host and the broader ecosystem. Introducing an engineered commensal carries the risk of unintended immune responses, potential dysbiosis of the natural nasal microbiome, or the horizontal gene transfer of the synthetic plasmid to pathogenic bacteria. Additionally, from an ecological standpoint, we must consider the risk of these engineered bacteria escaping the host via sneezing or exhalation and establishing themselves in the natural environment.

To ensure this project is conducted ethically, stringent biocontainment measures must be integrated into the fundamental design of the bacteria. I propose the implementation of a genetic “kill-switch,” specifically a strict auxotrophy for a synthetic amino acid not found in nature or the human diet. The bacteria would only survive if the user periodically applies a specialized nasal spray containing this nutrient; if discontinued, the engineered colony would rapidly die off. A potential unintended consequence of this action is genetic mutation or recombination that breaks the kill-switch, allowing the bacteria to survive independently. If we are wrong in our assumptions about the stability of the chimeric receptor, the bacteria might chronically secrete proteins, leading to localized tissue inflammation. An alternative to this live-commensal approach would be utilizing cell-free systems embedded in a physical, wearable bio-mask, which achieves the neutralization goal without the risks associated with human colonization.


SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

Experimental Plan and Timeline:

  1. Week 1 (Sequence Acquisition): Mine the Protein Data Bank (PDB) and UniProt to retrieve the exact amino acid sequences for the Bet v 1 allergen, a validated Bet v 1 nanobody (e.g., Nb16), and the standard E. coli EnvZ/OmpR two-component system.
  2. Week 2 (Receptor Design): Utilize AlphaFold via an available high-performance computing cluster or Colab notebook to design the chimeric receptor, fusing the Nb16 nanobody to the periplasmic domain of EnvZ.
  3. Week 3 (Structural Simulation): Run AlphaFold-Multimer to simulate the docking of the newly designed chimeric receptor with the Bet v 1 protein to verify that the binding affinity and conformational shift are preserved.
  4. Week 4 (Plasmid Architecture): Open Benchling and construct the full genetic circuit in silico.

Group Final Project

cover image cover image