Homeworks


Week 1 HW: Principles and Practices

Microbes

1. First, describe a biological engineering application or tool you want to develop and why.

Effective wastewater treatment is essential for clean waterways, environmental health, and safe drinking water. However, rapid urbanization and population growth have overloaded our capacity to manage aquatic waste, jeopardizing clean water access and biodiversity as pathogens, heavy metals, and algal bloom-inducing nutrients get flushed into waterways. Additionally, current water treatment strategies do not effectively remove a number of harmful compounds, including some drugs and dyes (Renganathan et al., 2025).

Engineered microbial consortia, which harness and enhance bacterial communities’ innate capacity to degrade harmful compounds, may offer a promising way to strengthen our current wastewater management approaches and expand the variety of pollutants that we can effectively remove and remediate. I am especially interested in using cyanobacterial species for this purpose. Over the summer, I worked briefly with Nostoc and Oscillatoria species, both of which have promising nitrogen-fixation and biodegradation capabilities (Atoku et al., 2021). I am also studying Acinetobacter baylyi (not a cyanobacterium), which has strong catabolic capabilities and can degrade harmful toxins and aromatic compounds (Baugh et al., 2025; Li et al., 2021). I’m curious about how these species might interact as a microbial community and how we could harness their interactions for better wastewater treatment.

A wastewater remediation project would both expand upon my research as a member of William & Mary’s 2025 iGEM team, which focused on developing design principles for applying SynBio to water-related problems, and be relevant to my local community in Williamsburg: surrounding cities in Virginia and the broader Chesapeake Bay Watershed suffer from frequent water quality and sewage issues.

Goal 1: Ensure feasibility and effectiveness of deploying engineered microbes in a wastewater context.

  • Ensure that engineered microbes function effectively under complex wastewater conditions in addition to laboratory conditions
  • Overcome cost and accessibility barriers to deployment
  • Promote public trust

Goal 2: Ensure effective containment of engineered microbes and prevent off-target ecological harm.

  • Ensure responsible testing and monitoring of engineered solutions
  • Develop targeted containment and safety strategies based on experimental results

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).

Proposed governance actions:

  1. Assess performance and safety of engineered solutions in simulated wastewater environments prior to deployment; adapt engineering approach based on findings.
  2. Facilitate discussion (e.g., scheduled meetings and/or town halls) between scientists, wastewater plants, and the public about current needs and potential solutions.
  3. Require that scientists and wastewater treatment plants develop a detailed risk-mitigation and containment plan prior to deployment, outlining procedures for using engineered microbes in a water treatment context and strategies to maintain and monitor bacterial containment.
  4. Require approval from relevant local governmental health and environmental agencies (e.g., the Virginia Department of Health and Virginia Department of Environmental Quality) prior to deployment.

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

Chart

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

Based on the above rubric, I would prioritize a governance framework that integrates rigorous scientific testing and safety assessment with discussion between stakeholders, developing regulations that take into account both scientific data and the perspectives of regulators and the public. This framework would create a network of accountability between groups, ensuring both that regulatory decisions are scientifically backed and that deployment of scientific solutions doesn’t outpace the development of mechanisms to ensure their safety and sustainability. The system involves a trade-off between testing/safety and the speed of development, but this trade-off is necessary to ensure that solutions are effective and that we do not deploy them in haste.

References

Atoku, D. I., Ojekunle, O. Z., Taiwo, A. M., & Shittu, O. B. (2021). Evaluating the efficiency of Nostoc commune, Oscillatoria limosa and Chlorella vulgaris in a phycoremediation of heavy metals contaminated industrial wastewater. Scientific African, 12, e00817. https://doi.org/10.1016/j.sciaf.2021.e00817

Baugh, A. C., Tumen-Velasquez, M. P., Zempel, I. R., Duscent-Maitland, C. V., Slarks, L. E., Defalco, J. B., Johnson, C. W., Beckham, G. T., & Neidle, E. L. (2025). Rewiring aromatic compound consumption: Chromosomal amplification and evolution of a foreign pathway in Acinetobacter baylyi ADP1. ACS Synthetic Biology, 14(9), 3543–3556. https://doi.org/10.1021/acssynbio.5c00341

Li, H., Yang, Y., Zhang, D., Li, Y., Zhang, H., Luo, J., & Jones, K. C. (2021). Evaluating the simulated toxicities of metal mixtures and hydrocarbons using the alkane degrading bioreporter Acinetobacter baylyi ADPWH_recA. Journal of Hazardous Materials, 419, 126471. https://doi.org/10.1016/j.jhazmat.2021.126471

Renganathan, P., Gaysina, L. A., García Gutiérrez, C., Rueda Puente, E. O., & Sainz-Hernández, J. C. (2025). Harnessing engineered microbial consortia for xenobiotic bioremediation: Integrating multi-omics and AI for next-generation wastewater treatment. Journal of Xenobiotics, 15(4), 133. https://doi.org/10.3390/jox15040133

—————————————————————————————————————————–

Week 2 Lecture Prep

Homework Questions from Professor Jacobson:

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

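DNA polymerase’s error rate is roughly 1 error per 10^6 bases copied. The human genome is about 3 billion bp, so copying it once at that rate would introduce on the order of 3,000 errors. Cells deal with this discrepancy through proofreading and DNA repair mechanisms that correct errors when they occur. For example, the protein MutS recognizes mismatched base pairs and, together with other proteins, initiates mismatch repair.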
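The comparison in the question comes down to one multiplication. A minimal Python sketch, treating the 1-in-10^6 per-base figure quoted above as the uncorrected error rate and assuming an approximate haploid genome size of 3x10^9 bp:

```python
# Back-of-the-envelope: expected copying errors = per-base error rate x bases copied.
ERROR_RATE = 1e-6   # errors per base, the figure quoted above (before repair)
GENOME_BP = 3e9     # approximate haploid human genome size in bp


def expected_errors(error_rate: float, length_bp: float) -> float:
    """Mean number of misincorporated bases when `length_bp` bases are copied once."""
    return error_rate * length_bp


print(f"~{expected_errors(ERROR_RATE, GENOME_BP):,.0f} errors per genome copy")
```

Thousands of errors per division is far too many to tolerate, which is why proofreading and mismatch repair are essential.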

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

There are 20 amino acids and, because of codon redundancy, between one and six codons for each of them (about three on average). If the average protein is about 300 amino acids long, there are roughly 3^300 different ways to encode it. In practice, not all of these encodings work equally well: synonymous codons are translated at different rates depending on the host’s tRNA abundance, and translation speed can affect protein folding. Some encodings also form stable mRNA secondary structures that hinder translation.
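The number of possible encodings is a product, over the residues, of each amino acid’s number of synonymous codons. The sketch below builds the standard genetic code (NCBI translation table 1) from its compact 64-letter string and computes that product exactly. The 21-residue PprA fragment is only an illustration. Because degeneracy ranges from 1 (Met, Trp) to 6 (Leu, Ser, Arg), the exact count depends on a protein’s composition, and 3^length is only a rough estimate.

```python
from itertools import product
from math import log10, prod

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons per amino acid ('*' = stop).
DEGENERACY: dict[str, int] = {}
for aa in CODON_TABLE.values():
    DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def n_encodings(protein: str) -> int:
    """Exact number of distinct DNA sequences encoding `protein` (stop codon excluded)."""
    return prod(DEGENERACY[aa] for aa in protein)


# Illustration: the first 21 residues of PprA.
n = n_encodings("MLPLAFLICSGHNKGSMARAK")
print(f"about 10^{log10(n):.1f} encodings for 21 residues")
```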

Homework Questions from Dr. LeProust:

1. What’s the most commonly used method for oligo synthesis currently?

The phosphoramidite DNA synthesis method.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

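It takes a long time, and the yield is low. Each coupling step is slightly less than 100% efficient, and strands that fail to couple are capped and end up as truncated products. Because the fraction of full-length product shrinks exponentially with the number of coupling steps (and side reactions such as depurination accumulate over the cycles), yields beyond ~200 nt become very low.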

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

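At ~2000 coupling steps, the exponential loss of full-length product means essentially no correct 2000-mer would be made, and the synthesis would take far too long. Instead, long genes are assembled from shorter synthesized oligos (e.g., via Gibson assembly).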
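The exponential loss of full-length product can be made concrete. The sketch assumes an illustrative 99% coupling efficiency per step and ignores other side reactions. An n-mer needs n - 1 couplings, and all of them must succeed:

```python
def full_length_fraction(coupling_efficiency: float, length_nt: int) -> float:
    """Expected fraction of strands that reach full length.

    An n-mer needs n - 1 coupling steps and every one must succeed,
    so the fraction decays geometrically with oligo length.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (60, 200, 2000):
    # 99% per-step efficiency is an illustrative assumption
    print(f"{length:>5} nt: {full_length_fraction(0.99, length):.2e} full length")
```

At 200 nt, roughly one strand in seven is full length. At 2000 nt, the fraction falls to around 10^-9, which is why genes are assembled rather than synthesized directly.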

Homework Question from George Church:

1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids are arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. Animals cannot synthesize these “essential” amino acids and must obtain them through their diet. The “Lysine Contingency” therefore does not make sense: it involves a scientist removing the dinosaurs’ ability to produce lysine, which, as animals, they could not produce in the first place. The dinosaurs could simply get lysine from their food, so the contingency would not contain them.

Week 2 HW: Read, Write, Edit DNA

16S_bands

DNA Design Challenge

3.1. Choose your protein.

I’ve chosen PprA, a protein that contributes to radiation resistance via DNA repair in the extremophile bacterium Deinococcus radiodurans, which can survive exposure to space conditions. PprA and other proteins involved in D. radiodurans’s space response could have space biotechnology applications, e.g., engineering space-tolerant food sources and terraforming Martian soil for agriculture. (Note that other research groups have successfully expressed PprA in E. coli before. I’m interested in eventually engineering PprA into a different chassis with direct relevance to space travel, or exploring alternative proteins that enhance tolerance to space conditions such as radiation, microgravity, and vacuum.)

>sp|O32504|PPRA_DEIRA DNA repair protein PprA OS=Deinococcus radiodurans (strain ATCC 13939 / DSM 20539 / JCM 16871 / CCUG 27074 / LMG 4051 / NBRC 15346 / NCIMB 9279 / VKM B-1422 / R1) OX=243230 GN=pprA PE=1 SV=2
MLPLAFLICSGHNKGSMARAKAKDQTDGIYAAFDTLMSTAGVDSQIAALAASEADAGTLD
AALTQSLQEAQGRWGLGLHHLRHEARLTDDGDIEILTDGRPSARVSEGFGALAQAYAPMQ
ALDERGLSQWAALGEGYRAPGDLPLAQLKVLIEHARDFETDWSAGRGETFQRVWRKGDTL
FVEVARPASAEAALSDAAWDVIASIKDRAFQRELMRRSEKDGMLGALLGARHAGAKANLA
QLPEAHFTVQAFVQTLSGAAARNAEEYRAALKTAAAALEEYQGVTTRQLSEVLRHGLRES

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

PprA DNA sequence, from the D. radiodurans RefSeq record NC_001264.1, below:

        1 tcagctctcg cgcaggccgt gccgcagcac ttcggacagt tggcgggtgg tcacgccctg
       61 gtattcctcc agcgcagcgg cggcggtttt cagggccgcg cggtactcct cggcgttgcg
      121 ggcggcggct ccgctgaggg tctgcacgaa cgcctgcacg gtgaagtgcg cttcgggcag
      181 ctgggcgagg ttggccttgg ccccggcgtg gcgagccccg agcagggcgc cgagcatccc
      241 gtccttctcg ctgcggcgca tcagctcacg ctggaaggcg cggtccttga tgctggcgat
      301 cacgtcccag gcagcgtcgg agagcgcggc ctcggcggac gcgggccggg ccacctcgac
      361 aaacagggtg tcgcccttgc gccacacgcg ctgaaaggtt tcgccgcgcc ccgccgacca
      421 gtcggtttcg aagtcgcggg cgtgctcgat cagcaccttg agctgcgcca acggcaagtc
      481 gccgggagcg cggtagccct cgccgagcgc cgcccactgg ctcaggccgc gttcgtcgag
      541 cgcctgcatg ggcgcgtagg cctgcgcgag tgctccgaag ccctcgctca cgcgggcgct
      601 ggggcggcca tcggtcagaa tttcgatgtc gccgtcgtcg gtcagccgcg cctcatggcg
      661 caggtggtgc agccccagcc cccagcgccc ctgcgcttct tgcaaggact gcgtgagcgc
      721 cgcgtccagc gtgcccgcgt cggcctcact cgcggcgagg gcggcgatct ggctgtccac
      781 gcccgccgtg ctcatcaagg tgtcgaaggc ggcgtagatg ccgtccgttt ggtcttttgc
      841 tttagccctt gccatactgc ctttattatg ccctgaacag attaaaaagg ccaggggtag
      901 cac

(http://ncbi.nlm.nih.gov/nuccore/NC_001264.1?from=381165&to=382067) (Note that the above nucleotide sequence is the forward strand, but the protein is encoded on the reverse (complementary) strand; see the amino acid sequence for the product it encodes. Note also that D. radiodurans sometimes uses non-standard start codons.)
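As a check on the strand and start-codon notes, the 3′ end of the forward-strand record can be reverse-complemented and translated. On the reverse strand, the last 63 nt (positions 841–903) begin with GTG rather than ATG. Reading that alternative start codon as methionine reproduces the first 21 residues of the UniProt sequence. The helper names are illustrative. Reading GTG/TTG as Met only at the first codon reflects bacterial initiation, where the initiator tRNA decodes these start codons.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
START_CODONS = {"ATG", "GTG", "TTG"}  # initiation codons used by bacteria


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase output)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_orf(seq: str) -> str:
    """Translate an in-frame ORF, reading an alternative first start codon as Met."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = [CODON_TABLE[c] for c in codons]
    if codons and codons[0] in START_CODONS:
        protein[0] = "M"
    return "".join(protein)


# Positions 841-903 of the forward-strand record above.
fwd_tail = ("tttagcccttgccatactgcctttattatgccctgaacag"
            "attaaaaaggccaggggtagcac")
print(translate_orf(reverse_complement(fwd_tail)))
```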

3.3. Codon optimization.

Codon optimization ensures that the codons in an engineered construct’s nucleotide sequence correspond to tRNAs that the host species commonly uses, maximizing the efficiency of translation without changing the resulting amino acid sequence. To account for the NCBI nucleotide sequence’s reverse-strand location and D. radiodurans’s non-standard start codons, I optimized directly from the amino acid sequence using the IDT codon optimization tool. (Note that I needed to add a stop codon (TAA) manually: stop codons have no corresponding amino acid, so the protein sequence does NOT specify one.)

I optimized the sequence for expression in E. coli, which could be engineered to exhibit extremophile characteristics to enhance its viability as a chassis for space applications. I also avoided BbsI, BsaI, and BsmBI recognition sites. Below is the optimized sequence:

ATGCTGCCACTGGCTTTCCTGATTTGCAGTGGCCATAACAAAGGTAGTATGGCGCGCGCCAAAGCAAAGGACCAGACAGATGGGATCTATGCAGCTTTTGATACTTTAATGTCGACGGCAGGCGTCGATAGCCAGATCGCCGCATTAGCGGCCTCAGAGGCCGATGCGGGCACACTGGACGCGGCCTTGACCCAAAGCCTGCAGGAGGCGCAAGGTCGTTGGGGTCTGGGTCTGCATCATCTGCGCCATGAAGCACGCCTGACTGACGATGGCGACATTGAAATTTTAACCGACGGTCGTCCGAGCGCCCGTGTCTCTGAGGGATTTGGAGCACTTGCTCAAGCGTATGCGCCTATGCAAGCCCTGGATGAGCGCGGCCTGTCCCAGTGGGCTGCGTTGGGTGAAGGATATCGCGCGCCGGGAGACTTGCCTCTGGCCCAGCTGAAAGTGCTGATCGAGCACGCGCGTGACTTCGAAACGGACTGGAGCGCTGGACGCGGCGAAACCTTTCAACGTGTGTGGCGTAAAGGTGATACTCTGTTCGTTGAAGTTGCGCGTCCAGCGAGCGCGGAAGCCGCCTTATCCGACGCAGCATGGGATGTGATTGCGAGCATCAAAGATCGCGCGTTTCAGCGTGAACTGATGCGCCGTAGTGAGAAAGATGGAATGCTGGGGGCACTGCTTGGCGCACGCCATGCTGGCGCCAAAGCTAATTTAGCGCAACTGCCGGAAGCACATTTTACGGTCCAAGCGTTTGTCCAGACTCTGTCTGGTGCCGCAGCGCGCAATGCGGAAGAATATCGCGCCGCGTTAAAAACGGCAGCGGCAGCCTTAGAGGAATACCAGGGTGTTACCACGCGTCAGCTGAGTGAAGTGTTGCGCCATGGCTTGCGTGAAAGTTAA
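The Type IIS exclusion can be double-checked locally. The sketch below scans both strands for the BsaI (GGTCTC), BsmBI (CGTCTC), and BbsI (GAAGAC) recognition sequences. Pass in the optimized sequence as `seq`; an empty result means no sites were found.

```python
# Recognition sequences (5'->3', top strand) of the Type IIS enzymes avoided above.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, sites: dict[str, str] = SITES) -> list[tuple[str, int, str]]:
    """Return (enzyme, 0-based position, strand) for every site on either strand."""
    seq = seq.upper()
    hits = []
    for enzyme, site in sites.items():
        # None of these sites is palindromic, so each strand is searched separately.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, start, strand))
                start = seq.find(motif, start + 1)
    return hits


print(find_sites("AAGGTCTCAA"))
```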

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? The most straightforward way to transcribe and translate this DNA sequence into protein would be to engineer it into a plasmid, or order it as part of a plasmid, and then transform it into a model bacterium such as E. coli, which could then produce the protein.

Prepare a Twist DNA Synthesis Order

I’ve included the BBa_J23106 promoter for constitutive expression in E. coli (J23106 is a mid-strength member of the Anderson promoter collection), the BBa_B0034 RBS, and the BBa_B0015 terminator. If I were to order this construct, I might look into E. coli promoters that are induced by UV, ionizing radiation, or DNA damage in general. This might reduce the burden of constitutive expression and ensure that the cell only diverts resources to producing the protein under conditions where it is useful.

dradio_construct

I selected the pTwist Amp High Copy vector on Twist.

dradio_plasmid

DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

I would like to perform metagenomic sequencing (or metatranscriptomics, though this would be RNAseq) on one of the harmful algal blooms (HABs) that frequently affect local lakes in Virginia. I studied HABs as part of the William & Mary iGEM project last year and am currently working with transcriptomic data from HAB microcosms. Local stakeholders told me that we still cannot fully predict when these blooms might happen and what factors contribute to their duration and severity. Understanding the Virginia blooms’ taxonomic composition (via metagenomics) and functional roles of the relevant microbes (via metatranscriptomics) would inform public safety measures and bloom mitigation strategies.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

1) Is your method first-, second- or third-generation or other? How so?

I would use PacBio HiFi sequencing, which is a third-generation (long-read) method. Longer reads make it easier to accurately identify the taxa present in the sample and to assemble metagenomes de novo, while shorter, fragmented reads provide less information and may be more likely to yield false matches.

2) What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

My input would be DNA extracted from the lake water (probably via an extraction kit designed for water samples) and prepared via the following steps:

  • DNA fragmentation (shearing) and size selection for long fragments
  • DNA damage repair and end repair
  • Adapter ligation: in this case, capping both ends of each DNA fragment with ligated hairpin adapters, which turn the fragment into a closed circular template (a “SMRTbell”) around which DNA polymerase can move repeatedly

3) What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

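In PacBio HiFi sequencing, 1) the scientist performs library prep (see above); 2) the circular SMRTbell templates, bound to DNA polymerase, are loaded into “zero-mode waveguides” (ZMWs), which are tiny wells on the sequencing chip (the “SMRT Cell”); and 3) the polymerase incorporates fluorescently labeled nucleotides into a strand complementary to each template. Each nucleotide type emits a distinct color of light as it is incorporated, and this signal is recorded in real time and decoded into the base sequence. Because the template is circular, the polymerase can read the same molecule multiple times. The consensus of these passes (circular consensus sequencing) is what makes HiFi reads highly accurate.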
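The accuracy gain from reading a circular template several times can be illustrated with a toy consensus. This sketch is a deliberate simplification: it assumes the passes are already aligned and differ only by substitutions. PacBio’s actual CCS algorithm also aligns subreads and models insertions and deletions.

```python
from collections import Counter


def circular_consensus(passes: list[str]) -> str:
    """Majority-vote consensus of pre-aligned, equal-length subread passes.

    Toy model only: substitution errors, no indels, no per-base qualities.
    """
    if not passes or len({len(p) for p in passes}) != 1:
        raise ValueError("need one or more passes of equal length")
    # zip(*passes) walks the passes column by column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Four noisy passes over the same hypothetical 8-nt template.
subreads = ["ACGTTAGC", "ACGATAGC", "ACGTTAGG", "TCGTTAGC"]
print(circular_consensus(subreads))
```

Every pass here contains one error, yet the majority vote recovers the template, because the errors fall at different positions.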

4) What is the output of your chosen sequencing technology?

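PacBio HiFi sequencing produces highly accurate long reads (HiFi reads, with a predicted accuracy of Q20, i.e., 99%, or better) as output. These are usually delivered as BAM files, which can be converted to FASTQ for downstream assembly and taxonomic classification.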
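Read accuracy is usually stated on the Phred scale. A quality score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that a base is wrong. The sketch below shows that conversion and decodes the Phred+33 quality characters used in FASTQ files:

```python
def phred_to_error(q: float) -> float:
    """Error probability encoded by Phred quality Q: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_fastq_quals(qual_line: str, offset: int = 33) -> list[int]:
    """Convert a Phred+33 FASTQ quality string into integer Q scores."""
    return [ord(ch) - offset for ch in qual_line]


print(phred_to_error(20))         # Q20: 1% error, i.e. 99% accuracy
print(decode_fastq_quals("5?I"))
```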

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

The cyanobacterium N. commune has a number of qualities that make it promising for space travel applications, including terraforming Martian soil and providing a sustainable food source. These include relative tolerance to vacuum conditions, radiation, dry environments, and extreme heat; anti-inflammatory properties; and the ability to promote plant growth. However, the species is extremely slow-growing, is difficult to engineer, and may produce toxins harmful to human health, so on its own it may be difficult to deploy or apply. I would like to either engineer Nostoc to be faster-growing and more manageable as a chassis, or synthesize relevant Nostoc genes and pathways (e.g., those involved in extracellular polymeric substance production, nutrient content, and survival under extreme conditions) and express them in model organisms. We could then more easily manipulate these organisms and use them as chassis for space, nutrition, agriculture, wastewater management, and more.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use clonal gene synthesis (e.g., from Twist) to order the relevant genes and/or circuits in plasmid form so that I could easily express them in a chassis of choice. Producing the plasmids would involve synthesizing the relevant fragments (e.g., with the phosphoramidite method) and then assembling them (e.g., via Gibson or Golden Gate assembly).
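For the assembly step, splitting a long sequence into synthesizable fragments with shared overlaps (as Gibson assembly requires) can be sketched in a few lines. The 300-nt maximum, the 30-nt overlap, and the random placeholder gene are illustrative assumptions. Real fragment design also tunes overlap melting temperatures and avoids repeats, which this sketch does not do.

```python
import random


def split_with_overlaps(seq: str, max_len: int = 300, overlap: int = 30) -> list[str]:
    """Cut `seq` into fragments of at most `max_len` nt; neighbours share `overlap` nt."""
    if not 0 < overlap < max_len:
        raise ValueError("need 0 < overlap < max_len")
    step = max_len - overlap
    fragments, start = [], 0
    while True:
        end = min(start + max_len, len(seq))
        fragments.append(seq[start:end])
        if end == len(seq):
            return fragments
        start += step


def assemble(fragments: list[str], overlap: int = 30) -> str:
    """In-silico check: join fragments by verifying and merging their shared ends."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        if assembled[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        assembled += frag[overlap:]
    return assembled


rng = random.Random(0)
gene = "".join(rng.choice("ACGT") for _ in range(903))  # hypothetical placeholder
frags = split_with_overlaps(gene)
print([len(f) for f in frags], assemble(frags) == gene)
```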

5.3 DNA Edit

(i) What DNA would you want to edit and why?

I would like to modify genes and pathways that regulate bacterial growth, particularly in the cyanobacterium N. commune. The bacterium’s slow growth rate is a major barrier to engineering, deployment as a chassis, and basic science. Modifying growth rate may be difficult because of potential off-target effects on bacterial performance (e.g., the species may need to grow slowly in order to produce an essential compound). It would likely require editing regulatory components of genes involved in metabolism and nutrient uptake, and could potentially involve knocking out non-essential genes that might slow down bacterial growth.

(ii) What technology or technologies would you use to perform these DNA edits and why?

I might use a combination of CRISPR and TALENs to potentially 1) replace sequences relevant to metabolic regulation (e.g., that promote nutrient uptake and processing) with more-active counterparts via homology-directed repair with template sequences of interest and 2) knock out sequences that are not essential to bacterial survival/performance and that may impose a burden on bacterial growth.

Week 3 HW: Automation

Python Script and Design

design

Above: My Opentrons design of the Washington, DC flag
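A design like this is essentially a pixel grid converted into dispense coordinates. The sketch below assumes a hypothetical 10 x 7 pixel-art version of the flag (three stars above two bars) and a 1.5 mm spacing. It returns (x, y) offsets in millimetres from the plate center; in an actual protocol, each offset would be added to the plate’s center position before the robot dispenses there. The grid, spacing, and function name are illustrative and are not taken from the original script.

```python
# Hypothetical pixel-art DC flag: 'R' = dispense red, '.' = leave blank.
DC_FLAG = [
    "..........",
    ".R..R..R..",   # three stars, one pixel each
    "..........",
    "RRRRRRRRRR",   # upper bar
    "..........",
    "RRRRRRRRRR",   # lower bar
    "..........",
]


def grid_to_offsets(pattern: list[str], spacing_mm: float = 1.5,
                    ink: str = "R") -> list[tuple[float, float]]:
    """Map each `ink` pixel to an (x, y) offset in mm from the grid center."""
    n_rows, n_cols = len(pattern), len(pattern[0])
    points = []
    for r, row in enumerate(pattern):
        for c, ch in enumerate(row):
            if ch == ink:
                x = (c - (n_cols - 1) / 2) * spacing_mm
                y = ((n_rows - 1) / 2 - r) * spacing_mm  # row 0 is the top edge
                points.append((x, y))
    return points


points = grid_to_offsets(DC_FLAG)
print(len(points), "dispense points")
```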

Post-Lab Questions

1) Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

This paper, https://pmc.ncbi.nlm.nih.gov/articles/PMC7886139/ (Lazaro-Perona et al., 2021), tests the efficacy of RNA extraction and PCR-based bulk Covid-19 testing procedures, including an Opentrons-based method. The authors did not find a significant difference between the Opentrons method and other protocols in their ability to detect Covid-19. However, they point out that the Opentrons method is cost-effective and less labor-intensive than its non-automated counterparts.

2) Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

I’m interested in engineering model bacteria to produce extracellular polymeric substances (EPS) similar to those of the cyanobacterium Nostoc. Nostoc’s EPS gives it natural metal-chelating abilities, high tolerance to stress, desiccation, and space conditions, and high nutritional content. However, Nostoc is extremely slow-growing and very difficult to engineer, so it is not an optimal chassis. Adding specific Nostoc-like properties to other species would make it possible to take advantage of these characteristics for engineering without dealing with Nostoc’s slow growth rate.

Automation would be most important in the testing phase of my project. To assess EPS formation in bulk, I might use an Opentrons-like system to screen a set of samples for EPS production and EPS composition (i.e., presence of carbohydrates vs. proteins) via chemical and colorimetric assays. These samples might contain different versions of my circuit, or be grown under various conditions, alongside controls that are uninduced or lack a circuit. The assays might involve using the Opentrons robot to add the relevant reagents to each sample, then using a plate reader or spectrophotometer for downstream absorbance and fluorescence measurements.
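Before running such a screen, the plate layout itself can be generated programmatically so that every circuit variant, condition, and replicate gets its own well. In this sketch, the variant and condition labels are hypothetical. Wells are filled column by column, as an 8-channel pipette would fill them:

```python
from itertools import product
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]  # A-H
COLS = range(1, 13)         # 1-12


def wells_column_major() -> list[str]:
    """Well names A1, B1, ..., H1, A2, ... (column by column)."""
    return [f"{r}{c}" for c in COLS for r in ROWS]


def plate_layout(variants: list[str], conditions: list[str],
                 replicates: int) -> dict[str, tuple[str, str, int]]:
    """Assign every (variant, condition, replicate) combination to its own well."""
    samples = list(product(variants, conditions, range(1, replicates + 1)))
    wells = wells_column_major()
    if len(samples) > len(wells):
        raise ValueError(f"{len(samples)} samples do not fit in a 96-well plate")
    return dict(zip(wells, samples))


# Hypothetical design: a no-circuit control plus two EPS circuit variants.
layout = plate_layout(["no_circuit", "eps_v1", "eps_v2"], ["uninduced", "induced"], 3)
for well, sample in list(layout.items())[:4]:
    print(well, sample)
```

The resulting well-to-sample mapping could drive both the robot’s reagent additions and the interpretation of the plate-reader output.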

Final Project Ideas

  1. Engineering synthetic microbial consortia, including A. baylyi, N. commune, and Oscillatoria sp., for wastewater treatment and bioremediation

  2. Engineering production of elements of Nostoc’s extracellular polymeric substance composition into fast-growing, genetically tractable bacteria for applications in space travel, bioremediation, nutrition, and agriculture.

  3. Developing a bioengineering toolkit for space travel using genes from extremophile species.

Week 4 HW: Protein Design Part I

design

A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang:

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

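There are about 20 grams of protein per 100 grams of chicken. If you eat 500 grams of chicken, you’ve eaten about 100 grams of protein. Looking only at the amino acids contained within proteins: since 1 dalton = 1 g/mol, 1 gram is 6.022x10^23 daltons, so 100 grams of protein is 6.022x10^25 daltons. Dividing by ~100 daltons per amino acid gives about 6.022x10^23 amino acids.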
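The same calculation as a short worked example, using the figures above (about 20 g of protein per 100 g of meat, ~100 Da per amino acid) and the fact that 1 g = 6.022x10^23 Da:

```python
AVOGADRO = 6.022e23            # daltons per gram (1 Da = 1 g/mol)
protein_g = 500 * 20 / 100     # 500 g meat x ~20 g protein per 100 g
avg_residue_da = 100           # average amino acid mass given in the question

total_da = protein_g * AVOGADRO          # total protein mass in daltons
n_amino_acids = total_da / avg_residue_da

print(f"{protein_g:.0f} g protein = {total_da:.3e} Da = {n_amino_acids:.3e} amino acids")
```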

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

We break down the proteins in other organisms into amino acids and repurpose them for our own biological needs. Our human genetic material instructs our cells to build human proteins from those amino acids, in contexts that are relevant for human development and survival, so we don’t become cows when we eat beef.

3. Why are there only 20 natural amino acids?

Twenty natural amino acids provide enough structural and chemical variation to produce a vast number of diverse proteins when combined in different arrangements.

4. Can you make other non-natural amino acids? Design some new amino acids.

Yes, this is possible and involves complex chemical synthesis pathways.

5. Where did amino acids come from before enzymes that make them, and before life started?

Amino acids can form from inorganic matter in reactions triggered by the irradiation of specific compounds. This process generates amino acids in space, and scientists have replicated it in the lab.

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

You would expect a left handed alpha-helix.

7. Can you discover additional helices in proteins?

Yes, AI and ML methods make it possible to uncover novel structural information (such as additional helices) about a given protein.

8. Why are most molecular helices right-handed?

Nearly all natural amino acids are L-amino acids (the “left-handed” form). For L-amino acids, a right-handed helix is the most chemically favorable arrangement because it avoids steric clashes between the side chains and the backbone.

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

Beta-sheets have exposed edge strands whose backbone groups are available for hydrogen bonding, as well as hydrophobic regions. As a result, beta-sheets can hydrogen-bond with each other at their edges, and their hydrophobic regions clump together to avoid water.

10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Misfolded proteins often naturally aggregate in a beta-sheet-like structure because of the structure’s stability and energetic favorability. Amyloid beta-sheets are very “sticky” and difficult to get rid of, and could be good building materials for this reason.

11. Design a β-sheet motif that forms a well-ordered structure.

A beta-sheet motif with alternating hydrophilic and hydrophobic amino acids will form a well-ordered structure because the hydrophobic sides will group strongly together. E.g., LTINLTIN
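Side chains in a β-strand alternate between the two faces of the sheet, so an alternating pattern places all the hydrophobic residues on one face. The small check below tests that property, using Kyte-Doolittle hydropathy values (positive = hydrophobic) as an assumed hydrophobicity scale:

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def is_amphipathic_strand(strand: str) -> bool:
    """True if one face of the strand is all hydrophobic and the other all hydrophilic.

    Residues i, i+2, i+4, ... point to one face of the sheet;
    residues i+1, i+3, ... point to the other.
    """
    face_a, face_b = strand[0::2], strand[1::2]
    a_phobic = all(KD[aa] > 0 for aa in face_a)
    b_phobic = all(KD[aa] > 0 for aa in face_b)
    a_philic = all(KD[aa] < 0 for aa in face_a)
    b_philic = all(KD[aa] < 0 for aa in face_b)
    return (a_phobic and b_philic) or (a_philic and b_phobic)


print(is_amphipathic_strand("LTINLTIN"))
```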

B. Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

1. Briefly describe the protein you selected and why you selected it.

I selected the NpDps4 protein from Nostoc punctiforme (Howe et al., 2019, https://pmc.ncbi.nlm.nih.gov/articles/PMC6675082/#abstract1). Nostoc is interesting because of its nutritional relevance, high tolerance to extreme conditions, and ability to take up toxic environmental compounds. A number of proteins (particularly those involved in pathways that synthesize Nostoc’s extracellular polymeric substances, or EPS) contribute to these qualities, but few have a structure available on PDB. The Dps proteins are a group of DNA-binding proteins involved in iron uptake and the oxidative stress response across bacteria. One of N. punctiforme’s Dps proteins is available on PDB and may contribute to Nostoc’s resilience in extreme conditions. Nostoc has a higher number of Dps proteins than other species, suggesting that it has a complex system for managing oxidative stress and iron homeostasis. NpDps4 has an unusual cyanobacteria-specific structure and some similarities to the equivalent protein in Deinococcus radiodurans, an extremophile bacterium that, like Nostoc, can survive space-like conditions (Howe et al., 2019).

2. Identify the amino acid sequence of your protein.

  • How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
  • How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.
  • Does your protein belong to any protein family?

PDB link: https://www.rcsb.org/structure/5HJH

QTLLRNFGNVYDNPVLLDRSVTAPVTEGFNVVLASFQALYLQYQKHHFVVEGSEFYSLHEFFNESYNQVQDHIHEIGERLDGLGGVPVATFSKLAELTCFEQESEGVYSSRQMVENDLAAEQAIIGVIRRQAAQAESLGDRGTRYLYEKILLKTEERAYHLSHFLAKDSLTLGFVQAA

NpDps4 is 178 amino acids long and its most frequent amino acid is Leucine (L). The protein belongs to the Dps family, a group of DNA-binding proteins involved in iron homeostasis and the oxidative stress response. A BLAST against UniProtKB yielded 27 results with 90% identity or higher and 250 results with 36.9% identity or higher.
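The length and most frequent residue can also be checked with Python’s standard library, as an alternative to the course notebook. The sequence is the one listed above from PDB 5HJH:

```python
from collections import Counter

# NpDps4 sequence as listed above (PDB 5HJH).
NPDPS4 = (
    "QTLLRNFGNVYDNPVLLDRSVTAPVTEGFNVVLASFQALYLQYQKHHFVVEGSEFYSLHEFFNESYNQVQ"
    "DHIHEIGERLDGLGGVPVATFSKLAELTCFEQESEGVYSSRQMVENDLAAEQAIIGVIRRQAAQAESLGD"
    "RGTRYLYEKILLKTEERAYHLSHFLAKDSLTLGFVQAA"
)

counts = Counter(NPDPS4)
print(len(NPDPS4), counts.most_common(3))
```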

3. Identify the structure page of your protein in RCSB

  • When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)
  • Are there any other molecules in the solved structure apart from protein?
  • Does your protein belong to any structure classification family?

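The structure was solved in 2017 at a resolution of 1.88 Å. This is better (smaller) than the 2.70 Å example, so it is a good-quality structure. Apart from the protein, the solved structure includes Fe3+ ions bound to the protein. The page also shows 4-(2-hydroxyethyl)-1-piperazine ethanesulfonic acid (HEPES) as a bound ligand. HEPES is a common buffer, so it likely comes from the crystallization conditions rather than being a physiological ligand. NpDps4 belongs to the Dps protein family (part of the ferritin-like superfamily), whose members share a similar structure.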

4. Open the structure of your protein in any 3D molecule visualization software:

  • PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
  • Color the protein by secondary structure. Does it have more helices or sheets?
  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Ball and stick visualization:

ball-stick

Secondary structure visualization:

secondary-structs

(Helices = Red, Sheets = Yellow, Loops/Coils = Green)

Residue hydrophilicity visualization:

secondary-structs

(Hydrophobic = Purple, Hydrophilic = Pink)

Surface visualization

surface

Analysis: Dps4 is mostly composed of helices. Most of its hydrophilic residues are on the outside of the protein, while the hydrophobic residues are on the interior. The protein does not appear to have any major “holes,” but has a concave center.
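The views above can be reproduced with a PyMOL command script (.pml), which PyMOL runs with `@filename.pml`. The sketch below only builds the command text in Python. The colors mirror the figures; the hydrophobic residue set is one common convention and is an assumption here.

```python
# Residues treated as hydrophobic for coloring (one common convention).
HYDROPHOBIC = "ALA+VAL+LEU+ILE+MET+PHE+TRP+PRO"

VIEWS = {
    "ball_and_stick": ["hide everything", "show sticks", "show spheres",
                       "set sphere_scale, 0.25", "set stick_radius, 0.15"],
    # helices red, sheets yellow, loops green
    "secondary_structure": ["hide everything", "show cartoon",
                            "color red, ss h", "color yellow, ss s",
                            "color green, ss l+''"],
    # hydrophobic purple, everything else pink
    "hydrophobicity": ["hide everything", "show cartoon", "color pink, all",
                       f"color purple, resn {HYDROPHOBIC}"],
    "surface": ["hide everything", "show surface"],
}


def pymol_script(pdb_id: str, view: str) -> str:
    """Return .pml text that fetches `pdb_id` and applies one named view."""
    return "\n".join([f"fetch {pdb_id}", *VIEWS[view]]) + "\n"


print(pymol_script("5hjh", "secondary_structure"))
```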

C. Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

  1. Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.
  2. Choose your favorite protein from the PDB.
  3. We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1) Protein Language Modeling

1. Deep Mutational Scans

  • Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
  • Can you explain any particular pattern? (choose a residue and a mutation that stands out)

Mutational scan heatmap surface

The protein seems to be highly tolerant to mutations at residue 97, which is leucine in the original sequence. This position is probably not critical for protein structure and function.
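A common way to turn ESM-style language-model outputs into mutation scores is the masked-marginal log-likelihood ratio. The position is masked, and the model’s log-probability of the mutant residue is compared with that of the wild-type residue. The course notebook’s exact scoring may differ. The probabilities below are made-up illustrative numbers, not ESM2 output.

```python
import math


def masked_marginal_score(probs: dict[str, float], wt: str, mt: str) -> float:
    """log p(mutant) - log p(wild type) at one masked position.

    `probs` is the model's distribution over amino acids at that position
    with the residue masked. Scores near 0 mean the substitution looks about
    as plausible as the wild type; strongly negative scores flag likely
    deleterious mutations.
    """
    return math.log(probs[mt]) - math.log(probs[wt])


# Hypothetical distribution at one position (illustrative numbers only).
toy_probs = {"L": 0.30, "I": 0.25, "V": 0.20, "M": 0.15, "P": 0.10}
for aa in ("I", "P"):
    print(aa, round(masked_marginal_score(toy_probs, "L", aa), 2))
```

A position like residue 97, where many substitutions score close to the wild type, shows up as a uniformly light column in the heatmap.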

2. Latent Space Analysis

  • Use the provided sequence dataset to embed proteins in reduced dimensionality.
  • Analyze the different formed neighborhoods: do they approximate similar proteins?
  • Place your protein in the resulting map and explain its position and similarity to its neighbors.

Latent space visualization

surface

(Red diamond = Dps4, Gray circles = Other proteins)

I found some groups of proteins in the map that were similar in function. For example, there was a cluster of proteins involved in respiration and another that appeared related to oxidative stress. Enzymes of similar types also grouped together; for example, I found a cluster of transferases.

My protein was surrounded by proteins of a variety of different functions and species of origin. One of the closer proteins was a DNA binding protein, which makes sense given that NpDps4 is a DNA-binding protein.

C2) Protein Folding

1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

ESMFold Prediction

surface

The ESMFold prediction looks significantly different from my original structure as visualized in PyMOL. The prediction may not match the experimentally determined structure because the experimental structure shows the protein interacting with iron ions and a ligand. Another possible reason is that Dps proteins typically assemble into dodecamers, whereas ESMFold predicted a single chain in isolation.

2. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

Changing just a few amino acids did not make a major difference, suggesting that the structure is relatively resilient to mutations. Randomly changing the amino acids in large segments of the sequence generated variants with new structural components (e.g., an alpha helix where there was previously an intrinsically disordered region), but mostly retained the same overall shape.

C3) Protein Generation

Inverse-folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN.

1. Analyze the predicted sequence probabilities and compare the predicted sequence vs. the original one.

Predicted sequence:

AVTRLPGEVYENDLGLPREVTAPVTAGLNRLLASYRALLADYTRFAQTVSGPRAAELRAFFKEAAAQIQKHIDKIEKRLRELGGVPVTDAASIAALTVYTPMAPGVASAEEMVRHALAAYEAIIKEIKAQQKLAEELGDEETAKLLKEILKETEAQAKQLKAFLAPESEDDAFTLPG

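ProteinMPNN assigned this sequence a score of 0.9236. (ProteinMPNN's score is an averaged negative log-probability, so lower values indicate sequences the model considers more likely for the backbone.)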

At a glance, this sequence looks significantly different from the original.
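The impression that the designed sequence looks significantly different could be quantified as percent identity, which for inverse folding is often called sequence recovery. This is a minimal sketch, assuming both sequences have the same length and are aligned position by position. That holds when the native sequence is taken from the same PDB backbone that ProteinMPNN designed on. The function name is illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length; align them first")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

For example, `percent_identity("ACDE", "ACDF")` returns 75.0. ProteinMPNN typically also reports a `seq_recovery` value in its output header, which measures the same thing.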

2. Input this sequence into ESMFold and compare the predicted structure to your original.

ESMFold prediction for the inverse-folded sequence

The result looks like it has the same domains as the original, but a slightly different arrangement.

D. Group Brainstorm on Bacteriophage Engineering

Brainstorm/Options for increasing MS2 titer and lysis protein toxicity:

Overall workflow idea: schematic

Option 1: Compile known lysis proteins from a variety of phages (or perhaps just E. coli phages, given phages' high host specificity) along with information on their lytic efficiency. Then use AI models (possibly similar to Evo 1 and Evo 2, as described in the King et al. preprint, though these are whole-genome models) to find structural or sequence patterns associated with enhanced vs. reduced lytic efficiency across all phages and across E. coli phages in particular. Synthesize the E. coli phage protein variants predicted to be more effective and/or modified MS2 L proteins based on them. Identify novel proteins that are effective, then engineer them into MS2.

  • Could compile a database of lysis proteins that are similar in structure and/or sequence to MS2 L (and therefore more likely to work in E. coli) using Foldseek, BLAST, and Clustal Omega
  • (Unsure to what extent lysis proteins are host-specific. They likely have less of an effect on host range than tail fibers, which affect the phage's ability to bind its host; however, if a lysis protein depends on binding a particular host protein, it would likely be highly host-specific.)
  • Train an AI model on the above database and generate new lysis proteins that retain enough similarity to MS2 L or other E. coli lysis proteins that they might still function in MS2 / E. coli (or train on all lysis proteins in general). This could involve or be based on a tool like EvolvePro or a combination of RFdiffusion and ProteinMPNN (RFdiffusion to generate novel scaffolds and ProteinMPNN to generate sequences for them)
  • Possibly, test the novel proteins’ ability to bind relevant host proteins (e.g., DnaJ) in silico using AlphaFold-Multimer or Boltz-1 (which could also guide efforts to strengthen those binding interactions)
  • Determine which residues are most likely essential to the function of MS2 L or other lysis proteins via ESM2 mutational scanning. Filter out novel sequence options that include likely detrimental mutations at these positions. (Alternatively, build this step into the AI algorithm and instruct the model to avoid mutations that ESM2 classifies as deleterious.)
  • Based on the above, select in silico-designed proteins that show the most promising interactions with host proteins and that have unusual sequences and/or structures (to increase the chances of enhancing lytic capabilities beyond those of existing proteins) but that retain fundamental similarity to MS2 L and other lysis proteins at the essential positions identified through ESM2.
  • Test the lytic ability of the most promising ones by synthesizing them into plasmids, transforming into E. coli, and inducing them on a plate to measure whether and how effectively they lyse host cells
  • Engineer the most effective ones into MS2
  • Limitations: there may be a high chance of developing novel proteins that are non-functional because we lack knowledge about the exact function that they need to perform and the interactions with host proteins that we would need to retain in order to preserve lytic abilities. Additionally, to have the highest possible chance of obtaining proteins that function in an E. coli phage, we may need to limit the database to E. coli phage lysis proteins, which may not be enough training data for an AI model.
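The ESM2 filtering and selection steps in the list above could start as simply as the following sketch. It assumes the candidate sequences are already aligned to MS2 L without gaps and that the essential positions are a set of 1-based residue numbers chosen from the mutational scan; the function names and toy example are hypothetical. Candidates that change any essential residue are dropped, and the rest are ranked by how many other positions differ, favoring the more unusual designs.

```python
def keeps_essential_residues(candidate, wild_type, essential_positions):
    """True if the candidate has the wild-type residue at every essential (1-based) position."""
    return all(candidate[pos - 1] == wild_type[pos - 1] for pos in essential_positions)

def shortlist(candidates, wild_type, essential_positions):
    """Keep same-length candidates that preserve essential residues, most-mutated first."""
    kept = [c for c in candidates
            if len(c) == len(wild_type)
            and keeps_essential_residues(c, wild_type, essential_positions)]

    def n_mutations(c):
        return sum(a != b for a, b in zip(c, wild_type))

    return sorted(kept, key=n_mutations, reverse=True)
```

For example, with a toy wild type "MKTAY" and essential positions {2, 4}, `shortlist(["MKTAY", "MRTAY", "MKSAF", "MKTAF"], "MKTAY", {2, 4})` drops "MRTAY" and returns the others from most to least mutated.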

Option 2: Alternatively, perform random mutagenesis of MS2 L until we obtain a higher-titer or more lytic version. That is, mutagenize MS2 phages at random and then run infection assays to identify mutants that reach higher titers or lyse the host faster than the wild type.

  • This is the “traditional” method and is time-consuming and likely limited in the variety of functional novel phages that it can generate

Week 5 HW: Protein Design Part II

A. SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

Mutant SOD1 sequence (A4V):

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Binder peptides generated with PepMLM, evaluated with AlphaFold:

Peptide | Perplexity | ipTM | Location
WRVPAAGAELGX | 7.415045 | 0.41 | Over disordered region
WHYYAAAVRWKX | 15.435134 | 0.37 | Across 2 rows in sheet
WLYPAAGLRHWX | 16.586151 | 0.26 | Over small helix
WRYYAAALALGX | 7.577719 | 0.38 | Over sheet region, peptide is a helix
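The perplexity column can be read with the standard language-model definition: the exponential of the mean negative log-likelihood per residue, so lower values mean the model finds the peptide more plausible for the target. This is a minimal sketch, assuming natural-log per-residue probabilities are already available. PepMLM computes its own (pseudo-)perplexity internally, and its exact masking scheme may differ from this plain formula.

```python
import math

def perplexity(log_probs):
    """exp of the mean negative log-likelihood (natural-log probabilities)."""
    if not log_probs:
        raise ValueError("need at least one residue")
    return math.exp(-sum(log_probs) / len(log_probs))

# Reference point: guessing uniformly among 20 amino acids gives a perplexity of 20.
uniform = [math.log(1 / 20)] * 12
print(round(perplexity(uniform), 6))
```

By that reference point, all four generated peptides (perplexities of about 7.4 to 16.6) score better than uniform guessing.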

For comparison, here is a known SOD1-binder peptide:

Peptide | Perplexity | ipTM | Location
FLYRWLPSRRGG | N/A | 0.31 | Across 2 rows in sheet

Part 2: Evaluate Binders with AlphaFold3

All but one of the new binder peptides were predicted to have an intrinsically disordered structure, and the binders varied in where they bound. All binders appeared to “float” above the SOD1 protein in AlphaFold’s visualization, suggesting that none would integrate directly into its structure. All but one of the new peptides had a higher ipTM value than the known binder. (See previous section for specific stats and binding locations.)

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

PeptiVerse results for all new peptide options (peptides 1 through 4).

All of the resulting peptides have similarly weak binding affinities according to PeptiVerse. All were also soluble and non-hemolytic. If I were to advance one of the peptides, I would choose the fourth one (WRYYAAALALGX): it had the strongest predicted binding affinity of the four (per PeptiVerse) and, according to AlphaFold, a more interesting structure, helical rather than disordered.

Part 4: Generate Optimized Peptides with moPPIt

I got CUDA out-of-memory errors even when using the recommended L4 GPU. I tried to debug the issue, but the program kept crashing.

B. BRD4 Drug Discovery Platform Tutorial

Part 1: Structural Predictions in the Sandbox

Compound | Binding Confidence | Optimization Score | Structure Confidence
Hit | 0.42 | 0.22 | 0.98
Lead | 0.75 | 0.27 | 0.98
JQ1 | 0.96 | 0.44 | 0.98

In Boltz, binding confidence rises as a ligand advances from hit (0.42) to lead (0.75) to JQ1 (0.96), while structure confidence stays at 0.98 for all three. It is therefore safer to advance ligands with favorable binding scores toward clinical candidacy.

The JQ1 ligand appears to bind between three alpha helices in BRD4. JQ1 has a higher optimization score than the Lead and Hit.

Part 2: Setting Up a BRD4 Design Project

Generated target region for BRD4 based on JQ1 binding site.

Part 3: Running Your Virtual Screen

confidence candidates

Part 4: Analysis and Discussion

JQ1 is the top compound, with a binding confidence of 0.96. The next-best compound, a new binder, scored 0.92, and the other top binders had binding confidences between 0.7 and 0.8.

I ran the top 8 BRD4 binders against BRD2. All had comparable or higher binding confidence scores with BRD2, suggesting that none of them is selective for BRD4 over BRD2.

C. Final Project: L-Protein Mutants

Heatmap scoring possible mutations

Positive log-likelihood ratios on the heatmap tend to correspond to mutations that experimental data show are tolerated. Negative log-likelihood ratios tend to correspond to deleterious mutations.
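For reference, heatmaps like this are usually built from a masked-marginal log-likelihood ratio, the score commonly used for zero-shot mutation-effect prediction with ESM-style protein language models. That this particular heatmap used exactly this variant is an assumption. Position $i$ is masked, and the model's probability for the mutant residue is compared with its probability for the wild-type residue:

```latex
s(i,\ \mathrm{wt} \to \mathrm{mt})
  = \log p\!\left(x_i = \mathrm{mt} \mid x_{\setminus i}\right)
  - \log p\!\left(x_i = \mathrm{wt} \mid x_{\setminus i}\right)
```

A positive score means the model prefers the mutant over the wild-type residue in that sequence context. A negative score means it prefers the wild type.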

L-protein sequence:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Mutation | Region | Benefit | Multimer (AF-2) | Comments
Y -> L, AA 39 | Transmembrane | Heatmap score is about 2, indicating that this mutation would be highly tolerable and potentially beneficial. Swapping an amino acid for leucine appears to be more tolerable than other swaps across the residues. | (image) | Retained pore structure
I -> F, AA 46 | Transmembrane | Experimental data showed that the lysis protein still functioned after this mutation. The heatmap shows a moderate log-likelihood ratio, indicating that this mutation might not arise evolutionarily but is probably not deleterious. | (image) | Also retained pore structure, though AlphaFold displayed it at a different angle; its second visualization gives a better comparison.
L -> P, AA 44 | Transmembrane | Experimentally validated, but interestingly has a relatively low log-likelihood ratio on the heatmap. | (image) | Retains pore structure, but the disordered portions of the proteins appear to associate more with each other in the complex and/or appear misshapen. It is unclear why changing the transmembrane domain would directly affect the disordered regions, so I suspect this is an artifact of AF-2’s chosen display angle.
P -> L, AA 13 | Soluble | Experimentally validated, and the heatmap gives a relatively high log-likelihood ratio. We know that the mutated protein works in practice and could arise naturally through evolution. | (image) | Retains pore structure. Note that the AF-2 visualization shows a side view.
S -> Q, AA 9 | Soluble | Unlikely to have a major deleterious effect, given that serine and glutamine are both polar and uncharged. The heatmap shows a high log-likelihood ratio. | (image) | Similar to above; retains pore structure. The AF-2 visualization again shows a side view.
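Off-by-one numbering errors are easy to make, so it can help to check each mutation against the sequence before folding the variants. This is a minimal sketch using standard one-letter notation with 1-based positions (e.g., Y39L for the first row); the helper name is illustrative. Running it on the five table entries confirms that each stated wild-type residue matches the L-protein sequence above.

```python
import re

L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"

def apply_mutation(seq, mutation):
    """Apply a mutation such as 'Y39L' (1-based), checking the wild-type residue first."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    wt, pos, mt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(seq) or seq[pos - 1] != wt:
        raise ValueError(f"{mutation}: expected {wt} at position {pos}")
    return seq[:pos - 1] + mt + seq[pos:]

TABLE_MUTATIONS = ["Y39L", "I46F", "L44P", "P13L", "S9Q"]
# Raises ValueError if any stated wild-type residue does not match the sequence.
variants = {m: apply_mutation(L_PROTEIN, m) for m in TABLE_MUTATIONS}
```

Each entry in `variants` is then a full-length sequence ready to paste into AlphaFold.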

Week 6 HW: Genetic Circuits Part I: Assembly Technologies

A. DNA Assembly

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The Phusion High-Fidelity PCR Master Mix contains Phusion DNA polymerase, a Pyrococcus-like enzyme fused to a processivity-enhancing domain, which synthesizes the second strand during the extension step and proofreads with its 3'→5' exonuclease activity; deoxynucleotides (dNTPs), which the polymerase incorporates during extension; and a reaction buffer containing MgCl2, which keeps the pH and other conditions optimal for PCR and supplies the magnesium ions the polymerase needs.

2. What are some factors that determine primer annealing temperature during PCR?

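GC content, primer length, and primer and salt concentrations all affect a primer's melting temperature (Tm), and the annealing temperature is chosen based on the Tm of the primer pair.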
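Two common rules of thumb show how length and GC content enter a primer's Tm. This is a minimal sketch: the Wallace rule (2 °C per A/T, 4 °C per G/C) for primers shorter than 14 nt, and the basic GC formula Tm = 64.9 + 41 × (G + C − 16.4) / N for longer ones. These simple formulas ignore primer and salt concentration, which nearest-neighbor calculators (such as polymerase vendors' online Tm tools) account for, so treat the results as rough estimates. The two example primers are made up.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def tm_estimate(primer):
    """Rough Tm in deg C: Wallace rule below 14 nt, basic GC formula otherwise."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    if len(p) < 14:
        return 2 * at + 4 * gc
    return 64.9 + 41 * (gc - 16.4) / len(p)

for primer in ["ATGCATGCATGCATGCATGC", "GCGCGCGCGCATGCATGCAT"]:  # hypothetical primers
    print(primer, f"GC={gc_fraction(primer):.0%}", f"Tm~{tm_estimate(primer):.1f} C")
```

Both primers are 20 nt long, so the higher-GC one has the higher estimated Tm.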

3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

Restriction enzyme digests cut DNA at specific recognition sequences using enzymes of your choice. You obtain whatever fragments result from cutting the existing DNA at those sites, and nothing is amplified. For PCR, you design oligonucleotide primers that flank a target sequence and then selectively amplify it through repeated cycles of denaturation, annealing, and extension. PCR is useful for quickly determining whether a specific sequence is present in your sample, and for extending DNA fragments (via primer tails) so that their ends overlap and can be joined in downstream reactions (e.g., Gibson Assembly, in which an exonuclease chews back the overlapping ends to create complementary sticky ends). Restriction digests are preferable when the DNA already contains convenient sites, since they let you cut DNA at sites of interest and stick fragments together the way you want without designing primers.

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Before finalizing your choice of target sequences and primer designs, use online design calculators to check that adjacent fragments share compatible overlapping ends, that the construct includes every sequence needed to express the gene(s) of interest (e.g., promoter, RBS), and that the sequences have neither too high nor too low a GC content. When setting up the reaction, use an insert:vector molar ratio of about 2:1.
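The molar-ratio requirement translates into DNA masses with a standard conversion. Both fragments are double-stranded DNA, so mass per mole scales with length, and insert mass = ratio × vector mass × (insert length / vector length). This is a minimal sketch; the 50 ng, 5,000 bp vector and 1,000 bp insert are illustrative numbers, not from this project.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio=2.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio."""
    return insert_to_vector_ratio * vector_ng * insert_bp / vector_bp

print(insert_mass_ng(vector_ng=50, vector_bp=5000, insert_bp=1000))  # 20.0
```

A shorter insert needs proportionally less mass to reach the same molar ratio, which is why the lengths enter the calculation.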

5. How does the plasmid DNA enter the E. coli cells during transformation?

During electroporation, the electrical shock opens transient pores in the cell membrane, and the plasmid DNA passes through these pores into the cell.

6. Describe another assembly method in detail (such as Golden Gate Assembly) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online). Model this assembly method with Benchling or Asimov Kernel!

Golden Gate Assembly uses Type IIS restriction enzymes, which cleave DNA outside their recognition sites, to create sticky overhangs on a destination vector and on the corresponding insert fragment. Usually, the destination vector contains an easily screenable placeholder sequence so that you can tell whether it has been removed and replaced with your fragment of interest. To prepare the insert, you design PCR primers that amplify the sequence of interest and extend it with Type IIS recognition sequences and cut sites designed to produce overhangs matching those in the vector. After PCR, the amplicon is compatible with the vector, and a single “one pot” reaction containing the insert, vector, Type IIS enzyme, and DNA ligase both creates the sticky ends and ligates the insert into the vector. Because the recognition sites are removed from the correctly assembled product, that product cannot be re-cut, so repeated rounds of cutting and ligation steadily enrich for it. At the end of the reaction, you add more Type IIS enzyme to cut any remaining vectors that still carry the placeholder, and you inactivate the enzymes by raising the reaction temperature.
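The defining property of Type IIS enzymes, cutting a fixed distance outside the recognition site, can be illustrated by predicting the 4-nt overhangs a site will leave. This is a minimal sketch assuming BsaI, one enzyme commonly used for Golden Gate, which recognizes GGTCTC and cuts 1 nt after the site on the top strand and 5 nt after it on the bottom strand. The text above does not name a specific enzyme, and the example sequences are made up.

```python
SITE = "GGTCTC"     # BsaI recognition site, top strand
SITE_RC = "GAGACC"  # the same site in reverse orientation

def bsai_overhangs(seq):
    """Return the 4-nt 5' overhangs (read on the top strand) left by BsaI cuts in seq."""
    seq = seq.upper()
    overhangs = []
    # Forward sites: cuts fall downstream, after N1 (top strand) and N5 (bottom strand).
    start = seq.find(SITE)
    while start != -1:
        cut = start + len(SITE) + 1
        if cut + 4 <= len(seq):
            overhangs.append(seq[cut:cut + 4])
        start = seq.find(SITE, start + 1)
    # Reverse sites: the enzyme reads the bottom strand, so the cuts fall upstream.
    start = seq.find(SITE_RC)
    while start != -1:
        if start - 5 >= 0:
            overhangs.append(seq[start - 5:start - 1])
        start = seq.find(SITE_RC, start + 1)
    return overhangs

print(bsai_overhangs("AAGGTCTCAAATGAAAA"))  # ['AATG']
print(bsai_overhangs("TTTTCATTTGAGACCTT"))  # ['CATT']
```

The two example sequences are reverse complements of each other, so they report the same sticky end (AATG) read from opposite strands. Designing each junction's overhang to be unique and non-palindromic is what lets multiple fragments ligate in a defined order.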

B. Asimov Kernel

Create a blank Notebook entry to document the homework and save it to that Repository. Explore the devices in the Bacterial Demos Repo to understand how the parts work together. Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository.

Circuit schematic:

Oscillating protein expression after addition of aTc inducer:
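For comparison with the simulator output, the classic dimensionless repressilator model of Elowitz and Leibler (2000) can be integrated in a few lines. This is a minimal sketch, not the model Asimov Kernel uses internally. It uses simple Euler steps, has no aTc term, and uses assumed illustrative parameters (α = 216, α0 = 0.216, β = 5, n = 2) for which the steady state is unstable, so the proteins oscillate.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=5.0, n=2.0,
                           dt=0.01, t_end=150.0):
    """Euler-integrate the dimensionless repressilator; return (times, protein-0 levels)."""
    m = [1.0, 0.0, 0.0]  # asymmetric start: perfectly symmetric states settle to steady state
    p = [2.0, 1.0, 3.0]
    times, p0 = [], []
    for step in range(int(round(t_end / dt))):
        # Protein (i - 1) % 3 represses transcription of gene i, closing the three-gene ring.
        dm = [-m[i] + alpha / (1.0 + p[(i - 1) % 3] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        times.append((step + 1) * dt)
        p0.append(p[0])
    return times, p0

times, p0 = simulate_repressilator()
late = p0[len(p0) // 2:]
print(f"protein 0 ranges from {min(late):.2f} to {max(late):.2f} in the second half")
```

A sustained range of protein levels late in the run indicates oscillation rather than decay to a steady state.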

Build three of your own Constructs using the parts in the Characterized Bacterial Parts Repo. Explain in the Notebook Entry how you think each of the Constructs should function. Run the simulator and share your results in the Notebook Entry. If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome.

Practice Construct 1: A system expressing Ag43 (which promotes biofilm formation) in E. coli. Ag43 expression can be repressed by adding an inducer, which triggers expression of LacI; LacI then represses the promoter driving Ag43.


Practice Construct 2: A simple constitutive expression system for PprA, a radiation-resistance protein found in D. radiodurans


Practice Construct 3: A system that expresses either the SARS-CoV-2 spike protein or the human ACE2 receptor protein. ACE2 is expressed in response to inducer addition and is coupled to expression of LacI, which represses the spike protein in this circuit.


Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits

Part 1: IANNs

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

IANNs are more complex and flexible than traditional genetic circuits because their inputs are not limited to digital (on/off) values. Specifically, IANNs respond to graded, non-digital input dosages, allowing finer spatial and temporal control of gene expression that better addresses the complex, systems-level nature of relevant deployment settings such as the human immune system.

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

IANNs' non-digital behavior could be useful for developing sensitive engineered systems that control microbial behavior (the outputs) at the population or community level in response to specific combinations of external/environmental factors (the inputs) at given levels. For example, specific combinations of quorum-sensing molecules at given levels are known to trigger certain behavioral responses, some of which are relevant to human health and the environment, e.g., toxin production and pathogenicity-associated responses. Using IANNs, we could create systems that detect (and respond to) those combinations with high sensitivity and specificity, with applications in biosensing (e.g., producing reporters that indicate dangerous conditions) and in developing methods to target and reduce pathogenicity.

3. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

Diagram below: The first layer regulates endoribonuclease production through a repression mechanism similar to the one the endoribonuclease itself uses, acting post-transcriptionally on the endoribonuclease transcript. When the repressor acting on the endoribonuclease is low enough, the endoribonuclease is expressed and turns off fluorescent protein expression (in the second layer) by binding and cleaving the fluorescent protein's transcripts before they can be translated. Conversely, enough expression of the endoribonuclease's repressor results in expression of the fluorescent protein.

diagram
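The two-layer behavior in the diagram can be checked with a steady-state sketch in which each layer is a repressive Hill function. The layer-1 repressor lowers endoribonuclease levels, and the endoribonuclease lowers fluorescent-protein output. The Hill form and all parameter values (maximum levels, thresholds k, Hill coefficient n) are illustrative assumptions, not measured parts.

```python
def hill_repression(x, maximum, k, n):
    """Steady-state output of a repressed node: high when the repressor x is well below k."""
    return maximum / (1.0 + (x / k) ** n)

def fluorescent_output(repressor, e_max=100.0, k1=10.0, k2=10.0, n=2.0, fp_max=1000.0):
    """Layer 1: repressor -> endoribonuclease. Layer 2: endoribonuclease -> fluorescent protein."""
    endoribonuclease = hill_repression(repressor, e_max, k1, n)
    return hill_repression(endoribonuclease, fp_max, k2, n)

for r in [0.0, 10.0, 100.0]:
    print(r, round(fluorescent_output(r), 1))
```

Because the two repressions cancel in sign, fluorescence rises as the layer-1 repressor increases, matching the description above. The graded response between the extremes is the analog behavior that distinguishes an IANN from a Boolean circuit.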

Part 2: Fungal Materials

1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Fungal materials are usually mycelium-based, meaning they are made from mycelium, the fungus's vegetative network of root-like filaments (hyphae). Mycelium-based materials have been proposed as a more sustainable and biodegradable alternative to traditional packaging components, building materials, and textiles. However, they tend to have less structural integrity than traditional alternatives, and we currently do not have large-scale mycelium material growth/production systems in place for manufacturers to use.

2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Fungi could be engineered to self-assemble into useful biomaterials by modifying the signaling systems and pathways involved in population-level pattern formation. These systems could be engineered to respond to specific, experimenter-determined inducers, which manufacturers could use for fine-tuned control of fungal manufacturing at a larger scale.

Because fungi are eukaryotes, they are more complex and can likely be engineered to produce a wider variety of compounds and perform a wider variety of functions than bacteria. Certain fungi, such as the budding yeast Saccharomyces cerevisiae, are good model organisms for studying eukaryotic cell behavior and good platforms for using SynBio in basic science research (e.g., understanding the effects of genetic variation and of protein-protein interactions of interest by observing the effects of specific engineered changes, potentially involving complex circuits).

Part 3: First DNA Twist Order

Project Title: Developing an Engineerable Photogranule System for Wastewater Treatment with Cyanobacterium Oscillatoria sp. and Chassis Bacterium Acinetobacter baylyi ADP1

Brief Project Description: Oxygenic photogranules, glob-like consortia of filamentous photosynthetic cyanobacteria and non-photosynthetic bacteria, naturally absorb and break down harmful chemicals and offer several sustainability advantages over current wastewater treatment techniques. My long-term goal is to develop an engineerable model photogranule system, using the filamentous cyanobacterium Oscillatoria and the heterotrophic chassis Acinetobacter baylyi, that could be easily modified to enhance versatility and resilience under diverse microbial conditions, to target specific chemical products and pollutants of interest for removal and/or detoxification, and to improve the suitability of the resulting bacterial sludge for renewable downstream applications.

Draft Aim 1: My first immediate experimental aim is to enhance A. baylyi’s biofilm formation and/or extracellular polysaccharide production so that it associates more effectively and reliably with Oscillatoria. Currently, I am considering engineering A. baylyi to express Ag43, a well-characterized outer membrane protein that, when expressed, promotes cellular aggregation and biofilm formation in E. coli. I hypothesize that, by self-aggregating, the engineered A. baylyi will be better suited than the wild type to form photogranules with Oscillatoria in a wastewater setting. My immediate experimental steps are to 1) design/order a plasmid that expresses Ag43 in A. baylyi, or order genetic parts that I could assemble via Gibson Assembly, 2) assess wild-type A. baylyi’s natural level of association with Oscillatoria in the lab, and 3) transform the Ag43 plasmid into A. baylyi (after any assembly steps, if necessary) and assess its effect on A. baylyi biofilm formation and association with Oscillatoria via visual observation and microscopy.

Week 8 HW: Cell-Free Systems

General HW Questions

1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free protein synthesis allows the expression of systems that might otherwise be toxic to, or impose a heavy burden on, a living cell. Because cell-free systems lack native cellular pathways (which often respond to a circuit's presence in unexpected ways and can compromise its function), the researcher need not worry about interference from host behavior, and can directly adjust experimental variables such as DNA, cofactor, and substrate concentrations. Cell-free methods are more beneficial than cell production for sensing/detection constructs such as riboswitches, since without a cell membrane the construct has direct access to compounds of interest in the environment. Cell-free methods are also advantageous for studying systems that would otherwise kill or severely impair the host cell, such as toxins or high-burden circuits.

2. Describe the main components of a cell-free expression system and explain the role of each component.

Cell free systems contain:

  • Ribosomes, tRNAs, and translation factors: To carry out translation
  • Amino acids: To serve as the building blocks of the protein
  • ATP (and GTP): To power transcription and translation
  • mRNA: To be translated –> Note that you can either supply existing mRNA or provide DNA and RNA polymerase to transcribe it
  • Any enzymes necessary for the pathway being expressed

3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

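Energy provision is critical in cell-free systems because transcription and translation consume ATP and GTP, and without a living cell's metabolism to replenish them, the reaction stalls once the initial supply is used up. A common way to ensure continuous ATP supply is to add a secondary energy source together with a kinase that uses it to regenerate ATP from ADP, such as creatine phosphate with creatine kinase or phosphoenolpyruvate with pyruvate kinase.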

4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic cell-free systems use prokaryotic cell lysates and are faster and cheaper than eukaryotic cell-free systems. Eukaryotic cell-free systems use eukaryotic cellular machinery, allowing the expression of eukaryotic proteins that require processing by eukaryote-specific machinery, such as the endoplasmic reticulum and Golgi apparatus. Some examples of proteins well suited to each system are: in prokaryotic lysates, specific bacteriocins, antimicrobial proteins that bacteria produce to kill competitor bacteria; in eukaryotic lysates, specific antibodies, which usually require folding/processing in the eukaryotic endomembrane system.

5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

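To assess and optimize a membrane protein’s expression, I would supplement the cell-free system with membrane mimics, such as liposomes, nanodiscs, or detergents, that supply relevant cell membrane components. The main challenge is that membrane proteins’ folding and function depend heavily on their structure and on the hydrophobicity and hydrophilicity of their domains; without a membrane to insert into, hydrophobic domains tend to aggregate. I would therefore also optimize the reaction environment (e.g., ion concentrations) for effective folding.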

6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

The low yield could be because:

  1. The circuit expressing the target protein was flawed
  • switch to a stronger promoter, or from an inducible to a constitutive promoter
  • validate the circuit sequence / identify issues via sequencing
  2. The system did not include enough of one or more required cellular components
  • test the system with an excess of amino acids, tRNAs, ribosomes, ATP, etc.
  3. The protein’s complete synthesis depends on other proteins that aren’t present in the current cell-free system
  • map out the complete pathway for the protein’s synthesis and add relevant cellular components as necessary

Questions from Kate Amadala

Design an example of a useful synthetic minimal cell as follows:

1. Pick a function and describe it. a. What would your synthetic cell do? What is the input and what is the output?

My synthetic minimal cell would express the mcy gene cluster, which encodes microcystin toxin production in harmful algal bloom-causing cyanobacteria. Because my minimal cell would presumably have a substantially reduced genome, whether and how well it produces the toxin would clarify the minimum genetic conditions required for toxin production.

b. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

While this function could potentially be realized with cell-free TX-TL alone, encapsulation better reflects a real cellular environment (providing at least its minimum conditions) and would yield more relevant, applicable information about how the gene cluster functions.

c. Could this function be realized by genetically modified natural cell?

Yes, this function could be realized through gene-knockout studies. However, cyanobacteria are difficult to genetically engineer, and gene knockout in real cyanobacterial cells might not be feasible to perform at a large enough scale for high throughput screening.

d. Describe the desired outcome of your synthetic cell operation.

My synthetic cell should provide a model that I can use for genetic screening to assess the effects of various genetic modifications (and other changes) on Mcy performance and expression – to ultimately identify protein targets for anti-microcystin and anti-HAB strategies.

2. Design all components that would need to be part of your synthetic cell.

a. What would be the membrane made of?

The membrane would replicate that of a Gram-negative bacterium, with an outer membrane (phospholipids in the inner leaflet, lipopolysaccharide in the outer leaflet), a peptidoglycan layer, and an inner membrane.

b. What would you encapsulate inside? Enzymes, small molecules.

A minimal or reduced genome, ribosomes, tRNAs, ATP.

c. Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)

Bacterial. Potentially, Microcystis aeruginosa.

d. How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

My synthetic cell will likely need to express membrane channels necessary for basic ion homeostasis.

3. Experimental details

a. List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)

  • Minimal M. aeruginosa genome
  • mcy genes (mcyA through mcyJ)

b. How will you measure the function of your system?

Lyse cells and use a toxin detection assay to measure microcystin production. (And potentially use other chemistry techniques to assess its chemical structure.)

Questions from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:

  • Write a one-sentence summary pitch sentence describing your concept.

I propose a face mask that performs real-time virus detection via a cell-free system embedded in the fabric.

  • How will the idea work, in more detail? Write 3-4 sentences or more.

The face mask will contain a cell-free system with several detection circuits for multiple pathogens. Potentially, the cell lysate could contain multiple RNA-based riboregulators (e.g., toehold switches) that detect pathogen genetic material and express a reporter whose color corresponds to the pathogen being detected. (Note that the cell-free system might require a built-in “disruption” step to release viral genetic material from the capsid, which may not be feasible.)

  • What societal challenge or market need will this address?

The product will make real-time at-home multi-pathogen testing more accessible and encourage users to seek medical treatment when necessary and avoid spreading infectious disease.

  • How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

This product would likely have to be one-time use only – or could potentially be used until a positive test. To compensate for the sustainability downsides of such a setup, the cell free component could be replaceable – embedded in a reusable cloth mask.

Questions from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

1. Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

How does the performance of well-characterized genetic circuits (e.g., the repressilator and toggle switch) differ under space conditions / aboard a space station as compared to a traditional Earth-based laboratory setting? Some space conditions for testing include: microgravity, radiation exposure, low-oxygen, and vacuum exposure. Understanding how circuits behave in these contexts–independently of a living cell/chassis, which contains complex machinery that would likely fail and compromise circuit performance under many of these conditions–would help us optimize engineering and circuit design strategies for space applications (developing a set of SynBio “design principles” for space).

2. Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

I would study existing well-characterized genetic circuits such as the Repressilator and Toggle Switch. I might also look at very simple systems expressing only a reporter, and potentially with varying control systems (constitutive expression vs. inducible expression, etc).

3. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

The above circuits would allow me to easily benchmark their performance under space conditions by comparison with their performance under typical laboratory conditions. I would use RNA sequencing and analyze reporter expression via fluorescence measurements (e.g., plate-reader fluorometry) to identify differences in performance and to characterize performance under different space-like conditions.

4. Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

My research goal is to characterize circuit performance in cell-free systems under a variety of space-like conditions, varying both the circuit design and the space conditions of interest. My results will inform future cell-free applications in space and will also generate foundational knowledge about how space affects circuit behavior independently of a bacterial chassis.

5. Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

I will recreate the repressilator and toggle switch experiments, adapted to use cell-free expression rather than bacterial cells, under four conditions: 1) a microgravity simulator, 2) some form of radiation exposure, 3) low oxygen, and 4) a normal Earth laboratory setup (the control). I will collect fluorescence measurements at multiple timepoints to assess whether circuit behavior under each condition differs from the control. I will also perform RNA sequencing to assess the timing of gene expression from the circuits within the cell-free systems and determine whether it varies across conditions.
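The timepoint-by-timepoint comparison against the Earth-lab control could be scripted roughly as below. This is a minimal sketch, assuming background-subtracted fluorescence readings taken at the same timepoints for every condition; the two-fold cutoff (log2 = 1) and all numbers in the example are hypothetical, and a real analysis would use replicates and a statistical test.

```python
import math


def log2_fold_change(condition, control):
    """Per-timepoint log2(condition / control) for two time courses sampled at
    the same timepoints (e.g., background-subtracted fluorescence in RFU)."""
    if len(condition) != len(control):
        raise ValueError("time courses must have the same number of timepoints")
    return [math.log2(c / k) for c, k in zip(condition, control)]


def flag_deviations(condition, control, threshold=1.0):
    """Indices of timepoints where the condition differs from the control by
    more than `threshold` in log2 units (1.0 means a two-fold change)."""
    return [i for i, lfc in enumerate(log2_fold_change(condition, control))
            if abs(lfc) > threshold]


if __name__ == "__main__":
    # HYPOTHETICAL readings at t = 0, 2, 4, 6 h; not real data.
    control = [100.0, 400.0, 900.0, 1200.0]
    microgravity = [100.0, 380.0, 300.0, 500.0]
    print(flag_deviations(microgravity, control))  # prints [2, 3]
```

Log ratios are used so that increases and decreases of the same magnitude are treated symmetrically.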

Week z10 HW: Advanced Imaging and Measurement

Homework: Final Project

Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

I plan to conduct several levels of measurement. First, I will validate the assembly of my genetic circuit by gel electrophoresis at each assembly step, and by measuring fluorescence from a reporter gene within the circuit, which indicates successful transformation and expression. Second, I will confirm whether my circuit knocked out the Oscillatoria mcyH gene by performing PCR on the target region and sequencing the resulting fragment. Third, I will assess whether the knockout inhibited microcystin production by using a standard microcystin toxin assay to quantify microcystin levels in a culture of engineered Oscillatoria and a culture of non-engineered Oscillatoria.

Homework: Waters Part I — Molecular Weight

1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

Theoretical pI / Mw: 5.90 / 28006.60 Da

2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

  • Determine z for each adjacent pair of peaks

Using the most intense peak and the adjacent charge-state peak to its left, I obtained:

z = 32.09 (≈ +32)

  • Determine the MW of the protein.

MW ≈ 27 kDa

  • Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1

Accuracy: |27 − 28| / 28 ≈ 3.57% error, i.e., the experimental result is about 3.57% off from the predicted value.
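The adjacent charge state arithmetic can be written out as a short script. This is a sketch, assuming the peak at higher m/z carries charge z and its left-hand neighbour carries z + 1, with a proton mass of 1.00728 Da. Because the intact-MS peak values are not reproduced here, the example simulates two peaks from the predicted 28006.60 Da mass instead of using real readings.

```python
PROTON_MASS = 1.00728  # Da


def charge_from_adjacent_peaks(mz_z, mz_z_plus_1):
    """Charge z of the peak at mz_z, given its neighbour at lower m/z
    (mz_z_plus_1), which carries one extra proton (charge z + 1).

    From M = z * (mz_z - H) = (z + 1) * (mz_z_plus_1 - H):
        z = (mz_z_plus_1 - H) / (mz_z - mz_z_plus_1)
    """
    return (mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1)


def neutral_mass(mz, z):
    """Neutral mass M (Da) of an ion observed at m/z with integer charge z."""
    return z * (mz - PROTON_MASS)


def percent_error(measured, predicted):
    """Absolute deviation of a measured value from the predicted one, in percent."""
    return abs(measured - predicted) / predicted * 100


if __name__ == "__main__":
    predicted = 28006.60  # ExPASy average mass from question 1 (Da)
    # Simulated peaks for charges 32 and 33, generated from the predicted mass.
    mz_32 = predicted / 32 + PROTON_MASS
    mz_33 = predicted / 33 + PROTON_MASS
    z = round(charge_from_adjacent_peaks(mz_32, mz_33))  # measured z is rounded
    mass = neutral_mass(mz_32, z)
    print(z, round(mass, 2), round(percent_error(mass, predicted), 4))
```

With measured peaks the raw z comes out non-integer (e.g., 32.09 above), which is why it is rounded before computing the mass; `percent_error(27, 28)` reproduces the ≈3.57% figure.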

  • Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

Yes: z ≈ +18.87, which rounds to +19.

Homework: Waters Part II — Secondary/Tertiary structure

1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

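Unfolded (denatured) proteins expose more surface area, including more basic residues that get protonated, so they carry many more charges. Native proteins stay compact, carry relatively little charge, and have that charge spread evenly across their surface. In the mass spectrum, the denatured protein therefore shows more peaks (a broader charge-state distribution) at lower m/z, corresponding to its greater number of charges, while the native protein shows fewer peaks at higher m/z.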

2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800? What is the charge state? How can you tell?

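z ≈ +4.62, which rounds to +5. Adjacent isotope peaks differ by ~1 Da in mass, so their spacing in m/z is ~1/z: z = 1 / (2799.6365 − 2799.4199) = 1 / 0.2166 ≈ 4.62.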
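The isotope-spacing rule can be checked with a few lines of code. This sketch uses 1.00335 Da (the ¹³C − ¹²C mass difference) as the isotope spacing rather than the rounded 1 Da used in the answer, so the raw value differs slightly (≈4.63 instead of ≈4.62); both round to +5.

```python
ISOTOPE_SPACING = 1.00335  # Da, mass difference between 13C and 12C


def charge_from_isotope_spacing(mz_a, mz_b):
    """Charge state estimated from two adjacent isotope peaks of the same ion.

    Adjacent isotopes differ by ~1.00335 Da, so they sit ~1.00335 / z apart
    on the m/z axis; dividing the mass spacing by the observed gap gives z.
    """
    return ISOTOPE_SPACING / abs(mz_b - mz_a)


if __name__ == "__main__":
    raw = charge_from_isotope_spacing(2799.4199, 2799.6365)  # peaks from Figure 3
    print(round(raw, 2), round(raw))  # prints 4.63 5
```

The same function applies to the zoomed-in intact-eGFP peak and the peptide in Figure 5b; only the two isotope m/z values change.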

Homework: Waters Part III — Peptide Mapping - primary structure

1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

Lysine: 20, Arginine: 6

2. How many peptides will be generated from tryptic digestion of eGFP?

19
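Counting tryptic peptides can be automated with the classic trypsin rule: cleave after K or R unless the next residue is P. This is a sketch assuming complete digestion (no missed cleavages); the count it returns depends on the exact construct sequence (e.g., any tags) and on whether very short fragments are counted, and the example uses toy sequences rather than eGFP.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence with the classic trypsin rule: cut C-terminal
    to K or R, except when the following residue is P. Assumes full digestion."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in ("K", "R") and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal fragment not ending in K/R
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    print(tryptic_peptides("MAKPGRSTKLLR"))  # prints ['MAKPGR', 'STK', 'LLR']
```

Running `len(tryptic_peptides(seq))` on the eGFP sequence from Part I gives the predicted peptide count under these assumptions.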

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

~22 peaks

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

Roughly: ~22 peaks is close to the 19 predicted peptides, but the chromatogram has slightly more peaks than predicted.

5. Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide based on its m/z and z.

Charge state: z = +2. MW = 1.049 kDa.
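Converting an observed m/z and charge into a mass is simple arithmetic. This sketch assumes a proton mass of 1.00728 Da and distinguishes the neutral mass M from the singly charged [M+H]+ form, which is one proton heavier; the m/z value in the example is hypothetical, not a reading from Figure 5b.

```python
PROTON_MASS = 1.00728  # Da


def neutral_mass(mz, z):
    """Neutral peptide mass M (Da): remove the z protons that carry the charge."""
    return z * (mz - PROTON_MASS)


def singly_charged_mass(mz, z):
    """m/z of the [M+H]+ ion, i.e. the same peptide carrying a single proton."""
    return neutral_mass(mz, z) + PROTON_MASS


if __name__ == "__main__":
    mz, z = 525.76, 2  # HYPOTHETICAL reading; substitute the m/z from Figure 5b
    print(round(neutral_mass(mz, z), 4), round(singly_charged_mass(mz, z), 4))
```

Keeping the two quantities separate matters when comparing against a tool's table, since the reported value may be either neutral or protonated.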

6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm

Potentially FEGDTLVNR. Error: 9.55 × 10^-4
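Mass error in ppm is the relative deviation scaled by 10^6. A minimal sketch follows; note that a relative error already expressed as a plain fraction converts to ppm by multiplying by 10^6. The example values are illustrative only.

```python
def ppm_error(measured, theoretical):
    """Signed mass error in parts per million (ppm):
    (measured - theoretical) / theoretical * 1e6."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Illustrative values only: 0.001 Da off at 1000 Da is about 1 ppm.
    print(round(ppm_error(1000.001, 1000.0), 3))
```

The sign shows whether the measured mass is heavier (positive) or lighter (negative) than the theoretical one.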

7. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

Result = 90.7% coverage
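Sequence coverage is commonly computed as the percentage of residues covered by at least one identified peptide. The sketch below assumes that definition (the software behind Figure 6 may compute it differently) and uses a toy sequence for illustration; overlapping peptides count each residue only once.

```python
def sequence_coverage(protein, peptides):
    """Percent of residues in `protein` covered by at least one peptide.
    Every occurrence of each peptide is marked; overlaps are counted once."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100 * sum(covered) / len(protein)


if __name__ == "__main__":
    # Toy example: 8 of 10 residues covered.
    print(sequence_coverage("MKTAYIAKQR", ["MK", "TAYIAK", "AYI"]))
```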

Homework: Waters Part IV — Oligomers

We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):

7FU Decamer: Intensity ~15 (at 3.4 MDa)

8FU Didecamer: Intensity ~170 (at 8 MDa)

8FU 3-Decamer: Intensity ~50 (at 12 MDa)

8FU 4-Decamer: Intensity close to 0 (at 16 MDa)
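Assigning CDMS peaks reduces to multiplying the subunit mass by the number of subunits (10 per decamer) and finding the closest expected mass. This sketch uses a round, hypothetical 400 kDa subunit mass (roughly what an ~8 MDa didecamer implies), not the Table 1 values, which should be substituted in practice; 7FU and 8FU subunits would each be run separately.

```python
SUBUNITS_PER_DECAMER = 10


def oligomer_masses(subunit_mass_da, decamer_counts=(1, 2, 3, 4)):
    """Expected mass (Da) for assemblies of 1, 2, 3, ... decamers."""
    return {n: n * SUBUNITS_PER_DECAMER * subunit_mass_da for n in decamer_counts}


def nearest_oligomer(observed_da, subunit_mass_da, decamer_counts=(1, 2, 3, 4)):
    """Number of decamers whose expected mass lies closest to an observed peak."""
    expected = oligomer_masses(subunit_mass_da, decamer_counts)
    return min(expected, key=lambda n: abs(expected[n] - observed_da))


if __name__ == "__main__":
    subunit = 400_000  # HYPOTHETICAL subunit mass (Da); use the Table 1 value instead
    for n, mass in oligomer_masses(subunit).items():
        print(f"{n} decamer(s): {mass / 1e6:.1f} MDa")
    print(nearest_oligomer(12.0e6, subunit))  # prints 3
```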

Homework: Waters Part V — Did I make GFP?

Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.

Our node did not have work done at Waters. Please see data screenshots in this document.

Week z11 HW: Bioproduction and Cloud Labs

Part A: Pixel Art

Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST. Make a note on your HTGAA webpages.

I liked that the project setup limited the number of changes each user could make at once (via the time buffer), which ensured contributions from many users and reduced the chaos of simultaneous changes. In terms of potential improvements, the canvas (i.e., the total number of plates) could be enlarged next year to increase the number and complexity of possible designs.

Part B: Cell-Free Synthesis

Referencing the cell-free protein synthesis reaction composition, provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

E. coli Lysate

  • BL21 Star Lysate w/ T7 Polymerase: E. coli cell extract that supplies the core transcription and translation machinery (e.g., ribosomes, tRNAs, and translation factors) for all the reactions taking place. It is often optimized for minimal mRNA and protein degradation (via reduced RNase and protease activity). The T7 RNA polymerase transcribes the DNA template into mRNA.

Salts Buffer

  • Potassium Glutamate: stabilizes RNA and aids in protein folding and interactions via charge
  • HEPES-KOH pH 7.5: acts as a buffer to maintain optimal pH for “cell” functioning
  • Magnesium Glutamate: provides a source of Mg2+, an essential cofactor for many enzymes involved in transcription and translation (e.g., RNA polymerases and ATPases)
  • Potassium phosphate monobasic: another buffer component that helps maintain optimal pH. It also provides phosphate ions essential for ATP regeneration and other cellular processes.
  • Potassium phosphate dibasic: similar role to potassium phosphate monobasic, the key difference being that the dibasic version is an alkaline buffering agent while monobasic is an acidic buffering agent. At the correct concentrations, the two together help create the optimal pH.

Energy / Nucleotide System

  • Ribose: The sugar component of ATP and of the other ribonucleotides that make up RNA, and the precursor of the deoxyribose found in DNA
  • Glucose: Energy source that regenerates ATP (via glycolysis)
  • AMP: A precursor to ATP (it can be rephosphorylated to ADP and then ATP) and a nucleotide building block for RNA
  • CMP: Precursor to CTP, one of the four building blocks needed for RNA synthesis
  • GMP: Precursor to GTP, an RNA building block that also fuels translation
  • UMP: Precursor to UTP, the RNA-specific building block
  • Guanine: Its derivatives are essential for RNA synthesis

Translation Mix (Amino Acids)

  • 17 Amino Acid Mix: Provides amino acids essential for translation/protein synth
  • Tyrosine: Not present in the standard mix; tyrosine is considered “non-essential” (in humans) because it can be synthesized from phenylalanine, but it is important to provide it to ensure a sufficient supply
  • Cysteine: Not present in the standard mix; cysteine is considered “non-essential” (in humans) because it can be synthesized from methionine and serine, but it is important to provide it to ensure a sufficient supply

Additives

  • Nicotinamide: Precursor to NAD+, essential for ATP synthesis

Backfill

  • Nuclease Free Water: Brings the reaction to its final volume so that every component is at its intended concentration, and provides the nuclease-free aqueous medium in which the reactions take place (cells are ~70% water)

Part C: Cell-Free Master Mix Design

  1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc.) (1-2 sentences each)

  • sfGFP: Has “superfolder” mutations that give it a sturdier beta barrel and enhanced folding that is resilient to sub-optimal conditions.
  • mRFP1: Unlike its tetrameric parent DsRed, mRFP1 is monomeric, so it can be easily fused to other proteins.
  • mKO2: Also monomeric, and engineered with mutations that enhance its folding speed and stability and reduce self-aggregation.
  • mTurquoise2: Also monomeric, with a tightly packed structure around the chromophore (due to a key amino acid substitution) that enhances its stability and its resistance to photobleaching.
  • mScarlet_I: Also monomeric, with an especially rigid chromophore structure that enhances its brightness.
  • Electra2: Also monomeric, with a tight beta barrel around its chromophore that strengthens its general stability and photostability.

  2. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

Hypothesis: Increasing the concentrations of Potassium Glutamate and Magnesium Glutamate will enhance the stability (and therefore the fluorescence, which depends on correct folding) of the beta barrel-based proteins above (e.g., sfGFP).

  1. The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here.

  2. The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (date TBD!). The reaction composition for each well will be as follows:

  • 6 μL of Lysate
  • 10 μL of 2X Optimized Master Mix from above
  • 2 μL of assigned fluorescent protein DNA template
  • 2 μL of your custom reagent supplements

Total: 20 μL reaction
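Scaling the per-well recipe to many wells is simple multiplication. This sketch uses the four volumes listed above; the 10% pipetting overage is a hypothetical choice, and since each well receives its own assigned DNA template, in practice that component would be added per well rather than pooled.

```python
# Per-well volumes (uL) from the reaction composition above.
PER_WELL_UL = {
    "lysate": 6.0,
    "2x_master_mix": 10.0,
    "fp_dna_template": 2.0,
    "custom_supplements": 2.0,
}


def scaled_volumes(n_wells, overage=0.10):
    """Total volume (uL) of each component for n_wells reactions, padded by a
    fractional overage (HYPOTHETICAL default of 10%) to cover pipetting losses."""
    factor = n_wells * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_WELL_UL.items()}


if __name__ == "__main__":
    print(f"per-well total: {sum(PER_WELL_UL.values())} uL")
    for name, vol in scaled_volumes(8).items():
        print(f"{name}: {vol} uL")
```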

Week z12 HW: Building Genomes

Extra Time to work on Final Project