Bioengineering postdoc researching the effects of genetically modified bacteria on soil microbiota. I love cyanobacteria! 💚 I’m interested in biomanufacturing with photosynthetic microbes.
First, describe a biological engineering application or tool you want to develop and why.
I want to engineer a genetic biocontainment system linked to PET degradation because this would allow controlled open-release applications of plastic-degrading bacteria. This biocontainment strategy should be designed to prevent both proliferation of the living bacteria outside the desired application zones and the spread of the synthetic genetic elements through horizontal gene transfer. PET plastic can be degraded with two enzymes into terephthalic acid, which can be used as a carbon source by some soil bacteria like Pseudomonas species. Using transcription factors, a genetic circuit can be built to link the expression of a kill-switch with the degradation of PET, so that the cells don’t spread beyond the polluted area. CRISPR-Cas, as a kill-switch mechanism, not only prevents proliferation of live cells but can also degrade the DNA encoding the enzymes, decreasing the likelihood of horizontal gene transfer.
Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
The products of synthetic biology, like engineered microbes, could be used to do amazing things that help society, but there are inherent risks in editing life. I believe a core governance goal should be to ensure we, as synthetic biologists, are designing our products with those risks in mind, and making choices to mitigate those risks. In some cases, such as open release of engineered bacteria (for example, for plastic pollution bioremediation in soil), we might not even know what all the risks might be, or how likely they are. Therefore, an important subgoal is controlled, small-scale testing under realistic deployment conditions for risk assessment. Once risks are identified, the probabilities of occurrence should be considered along with the potential harms, and risk mitigation should be designed appropriately. So, another subgoal is requiring risk mitigation strategies for the identified risks, as well as a demonstration that the chosen strategies do minimize those risks.
Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
In the US, bioremediation activity is usually regulated by the US Environmental Protection Agency (EPA), although depending on the specific application, the FDA or USDA might also have jurisdiction. For engineered bacteria for soil pollutant bioremediation, I believe the bacteria would need to be approved by the EPA. Therefore, the EPA could take governance actions by implementing specific policies around these products:
EPA-led or funded research on risk assessment of engineered bacteria in realistic open-release conditions.
Purpose: Current EPA policies mostly disallow engineered organisms for open-release unless the organism has no “trans-genes” (genes from a different species). This is largely considered to be outdated relative to the current level of technology and scientific knowledge. This action proposes new research to clearly identify possible risks and prioritize them by probability and potential harm, eventually opening a path to approval and safe implementation of engineered organism products.
Design: The EPA would need to do this research itself or fund other institutes to test engineered bacteria in conditions that reflect open-release applications. The EPA has multiple offices that do research, as well as some grants that give funding to external recipients. Ideally this would result in a list of possible risks and how to assess them.
Assumptions: Research outcomes can be broadly applicable to similar scenarios (this is a pretty huge assumption that I’m honestly unsure if I’m comfortable making); i.e. engineered bacteria in similar applications in similar environments might have similar risks.
Risks of Failure and Success: This could fail because there could be additional risks that are not identified. Especially when looking at something as potentially broad as an open-release application of a live organism, there are so many potential interactions that we can’t anticipate or test for in a controlled manner. For example, in a soil bioremediation or biofertilizer context, there are bench-scale microcosm and greenhouse-scale mesocosm experiments that can account for a lot of the soil/water/plant interactions. But what about things like weather and wildlife? A field study is needed, but if you control against those risks (such as with netting to keep out birds) to prevent escape during the risk assessment experiments, you still aren’t able to fully test those risks. So a risk of success is the harmful escape of an unsafe engineered bacterium during risk assessment experiments. With how connected environments are (e.g. through oceans), this could result in a global spread.
Policy to require specific risk mitigation and demonstration of effectiveness under realistic application conditions for engineered bacteria approval.
Purpose: Currently, new products that might affect the environment and public health need to be approved by the EPA for commercial use. This policy would enact specific requirements for approval of engineered bacteria. Additionally, many publications about genetic biocontainment discuss it as potential risk mitigation, but the effectiveness of the biocontainment is only demonstrated under specific laboratory conditions (e.g. axenic cultures, optimized media).
Design: This would be a change in current EPA standards and approval processes. The EPA would need to write and implement new policies, potentially train risk assessors and application managers, and develop testing procedures to ensure compliance. With the overturning of the Chevron doctrine, this sort of new policy would likely require either the buy-in of the companies trying to get their products approved or new legislation passed by the US Congress.
Assumptions: Companies and researchers abide by federal regulations regarding testing and approval. Risk assessment is done in good faith, rather than by companies prioritizing profit over safety. Risk assessment is done by trained ecological and biological risk assessors who know what to look for or be aware of.
Risks of Failure and Success: This could fail if the requirement is too stringent to allow any new products to be approved. This could also fail if the requirements are too lax, and not all risks are accounted for and mitigated. If experimental conditions do not properly reflect application conditions, what appeared to be effective mitigation in the lab might not be effective mitigation in application.
Researchers and inventors could also implement relevant and effective genetic biocontainment in any engineered bacteria used for open-release applications.
Purpose: For risks around the unintended spread of engineered bacteria or their synthetic genetic constructs, genetic biocontainment can mitigate these risks by preventing proliferation and/or degrading the relevant DNA. By tying the biocontainment system to the intended use of the bacterium, researchers manage risk in a relevant manner, keeping the bacterium specific to the intended application and minimizing spread, thereby reducing risks.
Design: Any developer of an engineered bacterium that could be released would need to research biocontainment and engineer a system into their bacteria. This would require a change in the current culture of the field, where the risks of engineered bacteria spreading and mitigation through biocontainment are sometimes discussed, but mostly considered somewhat niche. If it became common practice to consider application and risks thereof for the products of synthetic biology, I think the design of these sorts of safeguards would be more widespread. Any sort of research requires funding and incentive, so universities, grant funders, and biotech companies would need to start looking for these considerations in proposals to motivate it.
Assumptions: Genetic biocontainment is a good strategy to mitigate the potential ecological and public health risks of new synthetic biology products. These risks are limited to ones we think to test (i.e. microbial community shifts, horizontal gene transfer of antibiotic resistance genes or other functions, proliferation of engineered bacteria in unintended locations, local specific bacterial extinction event in the case of a particularly robust engineered bacterium).
Risks of Failure and Success: If we rely too heavily on genetic biocontainment, a failure of the genetic system could result in losing that protection against risk. It’s also possible risks would not be seriously considered because we too easily trust biocontainment to minimize the risk.
Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.
| Does the option: | Risk Assessment Research | Risk Mitigation for Approval | Biocontainment in Practice |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 2 | 1 | 1 |
| • By helping respond | 2 | 2 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | n/a | 1 |
| • By helping respond | n/a | n/a | n/a |
| Protect the environment | | | |
| • By preventing incidents | 2 | 1 | 1 |
| • By helping respond | 2 | 1 | 2 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 1 | 3 | 3 |
| • Feasibility? | 1 | 2 | 3 |
| • Not impede research | 1 | 3 | 2 |
| • Promote constructive applications | 1 | 1 | 2 |
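As a rough cross-check on the scores listed above, here is a minimal Python sketch that averages each option's scores while skipping the n/a entries. The unweighted mean is my own simplifying assumption (every criterion counts equally), and the names `SCORES` and `mean_score` are just illustrative.

```python
from statistics import mean

# Scores copied from the rubric above, in row order (1 = best).
# None stands for an "n/a" entry.
SCORES = {
    "Risk Assessment Research":     [2, 2, 2, None, 2, 2, 1, 1, 1, 1],
    "Risk Mitigation for Approval": [1, 2, None, None, 1, 1, 3, 2, 3, 1],
    "Biocontainment in Practice":   [1, 3, 1, None, 1, 2, 3, 3, 2, 2],
}

def mean_score(scores):
    """Unweighted mean of the rated criteria, ignoring n/a (None) entries."""
    rated = [s for s in scores if s is not None]
    return mean(rated)

for option, scores in SCORES.items():
    print(f"{option}: mean score {mean_score(scores):.2f} (lower is better)")
```

An unweighted mean is only one way to summarize the rubric. It treats feasibility and cost as being just as important as preventing incidents, so it should not be read as a replacement for the qualitative prioritization.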
Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
I would prioritize the requirement of risk assessment and mitigation strategies for EPA approval of engineered bacteria. I believe this would have the biggest impact in terms of allowing engineered bacteria to be used for public good (such as pollution bioremediation) while preventing potential harm (such as ecosystem destabilization by permanently altering the native microbiome). However, I don’t think such a policy would be possible without the prior research so the EPA regulators know what to look for - so the first strategy of risk assessment research would also have to be prioritized. The development of genetic biocontainment tools and implementation thereof becoming regular practice in the field of engineered microbes would be awesome, but I think would be harder to bring about and would take longer - although it might actually have more impact. So maybe instituting a course on risk for bioengineering or biotechnology students could help to bring about that sort of cultural change.
References:
Chemla, Y; Sweeney, CJ; Wozniak, CA; et al. Engineering Bacteria for Environmental Release: Regulatory Challenges and Design Strategies. Authorea. July 05, 2024. DOI: 10.22541/au.171933709.97462270/v2
George, DR; Danciu, M; Davenport, PW; et al. A bumpy road ahead for genetic biocontainment. Nature Communications, 15(650). January 20, 2024. DOI: 10.1038/s41467-023-44531-1
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
Polymerase error rate: $1 : 10^{6}$. The human genome is around 3.2 Gb, or $3.2 \times 10^{9}$ basepairs, so at that rate a single copy of the genome would pick up roughly $3.2 \times 10^{9} / 10^{6} \approx 3200$ errors. Biology deals with this through error correction: biological polymerases have proofreading mechanisms.
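To make the comparison concrete, here is a minimal Python sketch that multiplies the two figures quoted above. It assumes errors are independent and equally likely at every base, which is a simplification; the function name `expected_errors` is just illustrative.

```python
# Expected number of errors when copying the human genome once,
# using the two figures quoted in the answer above.
ERROR_RATE = 1e-6        # errors per base copied (1 in 10^6)
GENOME_LENGTH = 3.2e9    # base pairs in the human genome

def expected_errors(error_rate: float, length: float) -> float:
    """Expected error count = per-base error rate x number of bases copied."""
    return error_rate * length

errors_per_copy = expected_errors(ERROR_RATE, GENOME_LENGTH)
print(f"Expected errors per genome copy: {errors_per_copy:.0f}")
```

A few thousand errors per copy is why an uncorrected rate of this size would not be workable for a genome of this length.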
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
The average human protein is encoded within 1036bp. This might be answerable based on the last slide titled “Fabricational Complexity”, but I couldn’t quite figure out what these formulas are supposed to be calculating without explanation. So instead, we can do some back-of-the-napkin math together. 1036bp is $1036/3 \approx 345$ codons, or 344 amino acids (because of the stop codon at the end), assuming that the 1036bp figure doesn’t include introns. Most amino acids can be encoded by either 4 or 2 codons, although a few have more or fewer. We’ll average it out to approximately 3 codons per amino acid. I imagine that not all amino acids are used at the same frequency in human proteins, but I don’t know the actual frequencies off the top of my head, so we’re just going to go with what we have. The number of possible DNA sequences for an amino acid sequence is the product of the number of codons available at each position. So assuming an average human protein has 344 amino acids, and the average number of codons per amino acid is 3, there are $3^{344} \approx 1.3 \times 10^{164}$ different ways to code for an average human protein. In practice, not all tRNAs are synthesized at the same frequency, so it might take unreasonably long for certain codons to be recognized during chain extension; and during DNA replication, errors can be made, and some errors will be more tolerable than others due to codon wobble.
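The napkin math above can be checked with a short Python sketch. The degeneracy table is the standard genetic code (61 sense codons across 20 amino acids); the helper `log10_encodings` and the toy peptide `MKWLS` are hypothetical, made up here for illustration. Logarithms are used so the huge numbers stay readable.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}

def log10_encodings(protein: str) -> float:
    """log10 of the number of DNA sequences that encode `protein`.

    The count is the product of the per-residue codon counts,
    so its log is the sum of the per-residue logs.
    """
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)

# Napkin estimate from the text: 344 residues, about 3 codons each.
estimate = 344 * math.log10(3)
exponent = math.floor(estimate)
mantissa = 10 ** (estimate - exponent)
print(f"3^344 is about {mantissa:.1f} x 10^{exponent}")

# Sanity check on the "about 3 codons per amino acid" assumption.
mean_degeneracy = sum(DEGENERACY.values()) / len(DEGENERACY)
print(f"Mean codons per amino acid: {mean_degeneracy:.2f}")

# Exact count for a hypothetical toy peptide (not a real protein).
toy_count = round(10 ** log10_encodings("MKWLS"))
print(f"Encodings of MKWLS: {toy_count}")
```

For a real protein, the exact count depends on its composition: a sequence rich in leucine, serine, and arginine has far more encodings than one rich in methionine and tryptophan.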
LeProust:
What’s the most commonly used method for oligo synthesis currently?
Phosphoramidite synthesis.
Why is it difficult to make oligos longer than 200nt via direct synthesis?
There are side reactions that occur, causing the accumulation of errors (incorrect bases).
Why can’t you make a 2000bp gene via direct oligo synthesis?
I think this is because of the side reactions in Q2, right? Like, the accumulation of errors limits oligo synthesis to around 200 bases in practice. Also, oligos are single-stranded DNA; a 2000bp gene is double-stranded, and therefore you’d either need to synthesize both strands and ligate them together, or synthesize one strand and use it as a template for PCR or something.
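To get a feel for how quickly errors pile up with length, here is a minimal Python sketch of a per-step success model. It lumps all side reactions into a single per-cycle success probability, which is a simplification, and the 99% value is an assumed illustrative number rather than a figure from the lecture.

```python
def full_length_fraction(length_nt: int, step_success: float) -> float:
    """Fraction of strands for which every synthesis cycle succeeded.

    An oligo of n nucleotides needs n - 1 addition cycles after the
    first, support-bound base, so the fraction is step_success ** (n - 1).
    """
    return step_success ** (length_nt - 1)

STEP_SUCCESS = 0.99  # assumed per-cycle success rate, for illustration only

for n in (20, 200, 2000):
    frac = full_length_fraction(n, STEP_SUCCESS)
    print(f"{n:>5} nt: {frac:.2e} of strands are error-free and full length")
```

Because the correct fraction shrinks exponentially with length, raising the per-cycle success rate helps but does not remove the limit, which fits with long genes being assembled from shorter oligos rather than synthesized in one pass.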
Church:
Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own: BioStabilization Systems - ARPA-H
Biologic therapeutics are critically important for a number of diseases, but require careful and specific conditions at all points on the supply chain to maintain efficacy. Specifically, cell therapies and biologics require extreme cold to prevent degradation, thus making biologics inaccessible to people who don’t live near a specialized medical center. To solve this problem, we propose to express biologic therapeutics in extremophiles from abyssal marine sediment, which demonstrated little cell proliferation in low-oxygen environments but regained metabolic activity when incubated with oxygen. We predict that the faster cell turnover under warmer, oxygen-rich, high-nutrient conditions will allow us to engineer these bacteria to produce the biologic therapeutic molecules. Once production is achieved, we will seal the cells into low-oxygen capsules for transport, which we predict will slow their metabolic rate enough to preserve the target product until oxygen is provided again. If successful, this research could expand access to biologic therapeutics to anywhere that can aseptically incubate microbes at room temperature and purify the molecules therein.
References:
Morono, Y; Ito, M; Hoshino, T; et al. Aerobic microbial life persists in oxic marine sediment as old as 101.5 million years. Nat Commun 11, 3626. 2020. DOI: 10.1038/s41467-020-17330-1
Suzuki, Y; Webb, SJ; Kouduka, M; et al. Subsurface Microbial Colonization at Mineral-Filled Veins in 2-Billion-Year-Old Mafic Rock from the Bushveld Igneous Complex, South Africa. Microb Ecol 87, 116. 2024. DOI: 10.1007/s00248-024-02434-8
Personal notes/drafting
abstract formula:
1 sentence on the broad problem: Biologic therapeutics are critically important for a number of diseases, but require careful and specific conditions at all points on the supply chain to maintain efficacy.
1-2 sentences on the specific problem: How to transport cell therapies and biologics at room temperature, decentralizing medicine
1 sentence on the broad goal: We aim to express biologic compounds in extremophiles from the deep subsurface where energy and nutrients are limited.
2-3 sentences on methods: aerobic microbes from oxic abyssal marine sediment that proliferated at 10C with provision of nutrients and higher conc O2; might need to consider eukaryotic protein folding in prokaryotes; low O2 environment - maybe sealing the cells (post-therapeutic production, pre-shipping) into an airtight capsule would prevent metabolic activity including the breakdown of said therapeutics?
1 sentence on future work: maybe also try extremophiles found within old rock samples
1 sentence on conclusion/impact: expands access to biologics, especially to under-resourced communities
First, describe a biological engineering application or tool you want to develop and why.
I want to optimize a strain of cyanobacteria for biomanufacturing. Cyanobacteria can be engineered to produce many useful things from atmospheric carbon dioxide, from commodity chemicals to bioactive compounds for pharmaceuticals, but harvesting the products is often energy intensive and expensive, especially at an industrial scale. I am particularly interested in cyanobacterial bioplastics, such as polyhydroxyalkanoates, because this would be a closed-loop carbon cycle for biodegradable plastic.
Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
Goal: Prevent accidental release that could harm native ecosystems through microbial community shifts or production of commodity chemicals in the natural environment.
Subgoal: Include biocontainment systems in all commercially used industrial bioproduction strains.
Subgoal: Institute testing standards and protocols to notice any accidental release when it occurs.
Goal: Increase access to the genetic tools and strains used for cyanobacterial bioproduction to allow more chemicals to be manufactured in this carbon-neutral way.
Subgoal: Publish cyanobacterial genetic engineering research (such as new tools, etc.) in open access journals or make PDFs available on personal/lab websites.
Subgoal: Enable strain sharing.
Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
Policy to require specific risk mitigation and demonstration of effectiveness under realistic application conditions for engineered bacteria approval.
Purpose: Currently, engineered bacteria that might affect the environment and public health need to be approved by the EPA, FDA, or USDA for commercial use. This new policy would enact specific requirements for approval of engineered bacteria. Additionally, many publications about genetic biocontainment discuss it as potential risk mitigation, but the effectiveness of the biocontainment is only demonstrated under specific laboratory conditions (e.g. axenic cultures, optimized media).
Design: This would be a change in current federal standards and approval processes. The EPA, FDA, and USDA would need to write and implement new policies, potentially train risk assessors and application managers, and develop testing procedures to ensure compliance. With the overturning of the Chevron doctrine, this sort of new policy would likely require either the buy-in of the companies trying to get their products approved or new legislation passed by the US Congress.
Assumptions: Companies and researchers abide by federal regulations regarding testing and approval. Risk assessment is done in good faith, rather than by companies prioritizing profit over safety. Risk assessment is done by trained ecological and biological risk assessors who know what to look for or be aware of.
Risks of Failure and Success: This could fail if the requirement is too stringent to allow any new products to be approved. This could also fail if the requirements are too lax, and not all risks are accounted for and mitigated. If experimental conditions do not properly reflect application conditions, what appeared to be effective mitigation in the lab might not be effective mitigation in application.
Researchers and inventors could also implement relevant and effective genetic biocontainment in any engineered bacteria used for commercial biomanufacturing.
Purpose: For risks around the unintended spread of engineered bacteria or their synthetic genetic constructs, genetic biocontainment can mitigate these risks by preventing proliferation and/or degrading the relevant DNA. By tying the biocontainment system to the intended use of the bacterium, researchers manage risk in a relevant manner, keeping the bacterium specific to the intended application and minimizing spread, thereby reducing risks.
Design: Any developer of an engineered bacterium that could be intentionally or unintentionally released would need to research biocontainment and engineer a system into their bacteria. This would require a change in the current culture of the field, where the risks of engineered bacteria spreading and mitigation through biocontainment are sometimes discussed, but mostly considered somewhat niche. If it became common practice to consider application and risks thereof for the products of synthetic biology, I think the design of these sorts of safeguards would be more widespread. Any sort of research requires funding and incentive, so universities, grant funders, and biotech companies would need to start looking for these considerations in proposals to motivate it.
Assumptions: Genetic biocontainment is a good strategy to mitigate the potential ecological and public health risks of new synthetic biology products. These risks are limited to ones we think to test (i.e. microbial community shifts, horizontal gene transfer of antibiotic resistance genes or other functions, proliferation of engineered bacteria in unintended locations, local specific bacterial extinction event in the case of a particularly robust engineered bacterium).
Risks of Failure and Success: If we rely too heavily on genetic biocontainment, a failure of the genetic system could result in losing that protection against risk. It’s also possible risks would not be seriously considered because we too easily trust biocontainment to minimize the risk.
Establish a professional society for cyanobacteria-specific or general photosynthetic-organism research to promote research and tool sharing.
Purpose: Currently, microalgae research is generally lumped along with all other non-model microbes in synthetic biology. A professional association or conference specific to photobiocatalysis could be a gathering place to collect all relevant tools, protocols, and standards, as well as potentially institute a shared ethics or goal to include improving access to the research and its products.
Design: Perhaps a starting point would be to invite cyanobacteria, eukaryotic microalgae, macro-algae, and plant synthetic biologists to a conference on photobiocatalysis, along with industry representatives from companies using or creating engineered phototrophs. This might be best done under the banner of an existing synthetic biology or metabolic engineering professional association (such as the Society for Biological Engineering in the American Institute of Chemical Engineers). If there is enough interest at the conference, attendees could work together to establish a more specific sub-association, or just resolve to discuss access and research sharing at the conference itself.
Assumptions: This is a large enough field to host such a specific conference. It might be too niche, but I don’t think so, though the conference would probably be on the smaller side at first.
Risks of Failure and Success: It’s possible industry and start-ups might not want to popularly share their research as there is an economic disincentive.
Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.
| Does the option: | Risk Mitigation for Approval | Biocontainment in Practice | Photobiomanufacturing Professional Society |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 1 | 3 |
| • By helping respond | 2 | 3 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | n/a | 2 |
| • By helping respond | 2 | n/a | 2 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 1 | 2 |
| • By helping respond | 1 | 2 | 2 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 3 | 3 | 3 |
| • Feasibility? | 2 | 3 | 2 |
| • Not impede research | 3 | 2 | 1 |
| • Promote constructive applications | 1 | 2 | 1 |
Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
I would prioritize the requirement of risk assessment and mitigation strategies for federal approval of engineered bacteria. I believe this would have the biggest impact in terms of allowing engineered bacteria to be used for public good (such as biomanufacturing) while preventing potential harm (such as ecosystem destabilization by permanently altering the native microbiome in instances of escape). The development of genetic biocontainment tools and implementation thereof becoming regular practice in the field of engineered microbes would be awesome, but I think would be harder to bring about and would take longer - although it might actually have more impact. The establishment of a professional society could help institute such norms. Starting a new conference would probably be the easiest option to test for feasibility - proposing it to a handful of host organizations would rapidly identify whether this is currently worth pursuing or if it would need to be worked on for a while first.
References:
Chemla, Y; Sweeney, CJ; Wozniak, CA; et al. Engineering Bacteria for Environmental Release: Regulatory Challenges and Design Strategies. Authorea. July 05, 2024. DOI: 10.22541/au.171933709.97462270/v2
George, DR; Danciu, M; Davenport, PW; et al. A bumpy road ahead for genetic biocontainment. Nature Communications, 15(650). January 20, 2024. DOI: 10.1038/s41467-023-44531-1
Schmelling, NM; Bross, M. What is holding back cyanobacterial research and applications? A survey of the cyanobacterial research community. Nat Commun 15, 6758. August 8, 2024. DOI: 10.1038/s41467-024-50828-6
Week2 Lecture Prep
Jacobson:
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
Polymerase error rate: $1 : 10^{6}$. The human genome is around 3.2 Gb, or $3.2 \times 10^{9}$ basepairs, so at that rate a single copy of the genome would pick up roughly $3.2 \times 10^{9} / 10^{6} \approx 3200$ errors. Biology deals with this through error correction: biological polymerases have proofreading mechanisms. There are also mutation repair mechanisms.
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
The average human protein is encoded within 1036bp. This might be answerable based on the last slide titled “Fabricational Complexity”, but I couldn’t quite figure out what these formulas are supposed to be calculating without explanation. So instead, we can do some back-of-the-napkin math together. 1036bp is $1036/3 \approx 345$ codons, or 344 amino acids (because of the stop codon at the end), assuming that the 1036bp figure doesn’t include introns. Most amino acids can be encoded by either 4 or 2 codons, although a few have more or fewer. We’ll average it out to approximately 3 codons per amino acid. I imagine that not all amino acids are used at the same frequency in human proteins, but I don’t know the actual frequencies off the top of my head, so we’re just going to go with what we have. The number of possible DNA sequences for an amino acid sequence is the product of the number of codons available at each position. So assuming an average human protein has 344 amino acids, and the average number of codons per amino acid is 3, there are $3^{344} \approx 1.3 \times 10^{164}$ different ways to code for an average human protein. In practice, not all tRNAs are synthesized at the same frequency, so it might take unreasonably long for certain codons to be recognized during chain extension; and during DNA replication, errors can be made, and some errors will be more tolerable than others due to codon wobble.
LeProust:
What’s the most commonly used method for oligo synthesis currently?
Phosphoramidite synthesis.
Why is it difficult to make oligos longer than 200nt via direct synthesis?
There are side reactions that occur, causing the accumulation of errors (incorrect bases).
Why can’t you make a 2000bp gene via direct oligo synthesis?
I think this is because of the side reactions in Q2, right? Like, the accumulation of errors limits oligo synthesis to around 200 bases in practice. Also, oligos are single-stranded DNA; a 2000bp gene is double-stranded, and therefore you’d either need to synthesize both strands and ligate them together, or synthesize one strand and use it as a template for PCR or something.
Church:
Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own: BioStabilization Systems - ARPA-H
Biologic therapeutics are critically important for a number of diseases, but require careful and specific conditions at all points on the supply chain to maintain efficacy. Specifically, cell therapies and biologics require extreme cold to prevent degradation, thus making biologics inaccessible to people who don’t live near a specialized medical center. To solve this problem, we propose to express biologic therapeutics in extremophiles from abyssal marine sediment, which demonstrated little cell proliferation in low-oxygen environments but regained metabolic activity when incubated with oxygen. We predict that the faster cell turnover under warmer, oxygen-rich, high-nutrient conditions will allow us to engineer these bacteria to produce the biologic therapeutic molecules. Once production is achieved, we will seal the cells into low-oxygen capsules for transport, which we predict will slow their metabolic rate enough to preserve the target product until oxygen is provided again. If successful, this research could expand access to biologic therapeutics to anywhere that can aseptically incubate microbes at room temperature and purify the molecules therein.
References:
Morono, Y; Ito, M; Hoshino, T; et al. Aerobic microbial life persists in oxic marine sediment as old as 101.5 million years. Nat Commun 11, 3626. 2020. DOI: 10.1038/s41467-020-17330-1
Suzuki, Y; Webb, SJ; Kouduka, M; et al. Subsurface Microbial Colonization at Mineral-Filled Veins in 2-Billion-Year-Old Mafic Rock from the Bushveld Igneous Complex, South Africa. Microb Ecol 87, 116. 2024. DOI: 10.1007/s00248-024-02434-8
I couldn’t figure out how to use Ronan’s website other than the randomization button unfortunately. As a result, I went with a pretty simple smiley face design for my in-silico art.
Part 3: DNA Design Challenge
3.1 Protein
I chose PETase, a naturally occurring enzyme from Ideonella sakaiensis, which degrades poly(ethylene terephthalate) into monomers of mono-2-hydroxyethyl terephthalate.
To reverse translate in Benchling, it asks what codon optimization scheme you want to use. For this initial DNA sequence, I just used Escherichia coli K12 as my organism, matching codon usage to the frequency found in the E. coli genome.
To practice codon optimization, I stayed within the Benchling tool. This time, I chose my potential host organism: Pseudomonas putida, a soil bacterium that is a fairly common chassis. I also selected the option to use only the best (most frequently occurring) codons, to theoretically improve expression, and I avoided BsaI cut sites, in case I want to use Golden Gate cloning with this construct.
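As a quick sanity check on the BsaI avoidance step, the optimized sequence can be scanned for the BsaI recognition site (GGTCTC) on both strands. This is a minimal sketch in plain Python, independent of Benchling; `find_sites` and the toy sequence are illustrative names and data, not the real PETase gene.

```python
# Recognition sequence for BsaI (a Type IIS enzyme). The site is not
# palindromic, so the reverse complement has to be searched as well.
BSAI_SITE = "GGTCTC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = BSAI_SITE) -> list[tuple[int, str]]:
    """List (0-based position, strand) for every occurrence of `site` on either strand."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical toy sequence with one site on each strand.
toy_gene = "ATGAAAGGTCTCAAACCCGAGACCTTTTAA"
print(find_sites(toy_gene))
```

An empty list means the sequence should be safe to use with BsaI-based Golden Gate assembly, at least as far as internal sites go.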
I optimized the codons for P. putida, so I would choose to express this in P. putida. I would probably put the gene onto an expression plasmid first, under a strong constitutive promoter, just to ensure it works. After transforming P. putida with the plasmid, I would test expression by looking at protein production with a Western blot, and also culturing with a sample of PET plastic, to check for degradation. Ultimately, I would want to integrate this gene into the genome of the bacteria, possibly under an inducible promoter for use in open-release plastic pollution bioremediation.
Part 4: Prepare a Twist DNA Synthesis Order
There is actually a way to use Twist’s expression vectors, so I wouldn’t have to design the whole expression cassette. For example, if I wanted to express my E. coli codon-optimized PETase gene in E. coli, I could select one of Twist’s pET expression vectors; in this case, I chose pET-blank(Kan). It has a T7 promoter and RBS already included in it, and lacO for inducible expression. I believe the host strain would need to supply T7 RNA polymerase (for example, a DE3 lysogen such as BL21(DE3)), and then my PETase gene should be expressed in the presence of IPTG.
What DNA would you want to sequence (e.g., read) and why?
I would want to sequence metagenomic 16S sequences from soil samples. This gives me a baseline for bacterial community structure (prior to engineered strain addition).
In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
This would be Illumina sequencing probably, so second generation. This requires PCR amplification of the 16S variable region, adapter ligation, and then library pooling. The output would be many, many sequences that I would be able to compare to published 16S sequences to identify the bacterial taxa present (short 16S reads often resolve only to genus level rather than species). I’d choose this method because it multiplexes better than Sanger sequencing, and it doesn’t need to be long-read like nanopore.
5.2 Write
What DNA would you want to synthesize (e.g., write) and why?
I would want to synthesize the CRISPR cassette for my kill-switch because it’s somewhat difficult and time-intensive to stitch together out of oligos.
What technology or technologies would you use to perform this DNA synthesis and why?
I’d order it from Twist because it has multiple internal repeats, and they’re one of the few companies with the technology to accomplish that.
5.3 Edit
What DNA would you want to edit and why?
I’d like to edit the P. putida genome to include PETase and MHETase genes, as well as a killswitch circuit for biocontainment to prevent unintended ecological effects during application.
What technology or technologies would you use to perform these DNA edits and why?
I’d like to use CRISPR-Cas9 because it is the most flexible when it comes to genomic integration location. I could identify a few good neutral sites for integration and design sgRNAs to target these locations. Then I could insert repair templates (including homology arms) onto vectors and transform those sequentially with the CRISPR-Cas9 plasmid (repair template should be first). The repair templates could either be wholly synthesized, or assembled through overlap PCR or Gibson assembly (primer design for homology arms and overlaps for assembly). I might need an antibiotic, fluorescent, or other marker to screen for initial transformation and also genomic integration post-plasmid loss - in that case, I would also need to consider a step to remove the marker.
Part 2: Gel Art - Restriction Digests and Gel Electrophoresis
I’m interested in PhaC, a PHA synthase. This is an enzyme involved in the synthesis of polyhydroxyalkanoates (PHAs), a class of biopolymer that is considered a potential non-petroleum-derived thermoplastic. PHAs are also of interest for possible medical uses as biodegradable polymers. PhaC is the enzyme that catalyzes the polymerization step, adding on monomers to the chain.
I selected PhaC from Cupriavidus necator H16, whose primary product is poly(3-hydroxybutyrate). From UniProt, the accession number is P23608 · PHAC_CUPNH.
I used the Benchling back-translate tool set to match Escherichia coli K-12 naturally occurring codon usage because it didn’t have the native host C. necator as an option. They are in the same phylum (Pseudomonadota), so maybe it will be similar.
They are not that similar, it turns out, although that may have less to do with overall codon usage frequency and more to do with which synonymous codon the reverse-translate tool happened to pick at each position. Here’s the DNA sequence alignment comparing the genomic sequence from C. necator with the E. coli optimized reverse translation. This sequence alignment was performed in Benchling, using MAFFT with pre-set parameters.
Full alignment viewable here.
3.3 Codon optimize
I once again used the Benchling tool to codon optimize for E. coli K-12, but this time, I selected the Best Codon option in Benchling, and this was performed off the original C. necator phaC DNA sequence - although it should produce the same sequence if it was done as a reverse translate from the amino acid sequence too (since i confirmed that the phaC sequence does translate to the PhaC sequence with 100% identity).
This sequence could be used to express PhaC in E. coli. I would probably put the gene onto an expression plasmid, under a strong constitutive promoter, just to ensure it works. After transforming E. coli with the plasmid, I would test expression by looking at protein production with a Western blot, and looking at cells under a microscope to look for PHA granules. I need to do a little more literature searching on heterologous expression of PhaC in E. coli - I think maybe other enzymes are needed for PHB synthesis.
3.5 Optional - how does it work in natural biological systems?
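Describe how a single gene codes for multiple proteins at the transcriptional level. Different reading frames on the same stretch of DNA give different codons, offset by which base (1-3) starts the frame. In this way, genes for multiple proteins can overlap on the same sequence of DNA. (Strictly at the transcriptional level, alternative splicing or alternative transcription start sites can also produce different mRNAs, and so different proteins, from one gene.)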
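To make the reading-frame idea concrete, here is a small sketch that splits one sequence into codons for each of the three forward frames. The function name `codons_in_frame` and the nine-base toy sequence are made up for illustration.

```python
def codons_in_frame(seq: str, frame: int) -> list[str]:
    """Split `seq` into complete codons, starting at offset `frame` (0, 1 or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    usable = seq[frame:]
    # Drop any trailing bases that do not fill a whole codon.
    end = len(usable) - len(usable) % 3
    return [usable[i:i + 3] for i in range(0, end, 3)]


toy = "ATGGCATGA"
for frame in range(3):
    print(frame, codons_in_frame(toy, frame))
```

The same nine bases read as ATG-GCA-TGA in frame 0, TGG-CAT in frame 1, and GGC-ATG in frame 2, so each frame would encode a different peptide.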
Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein! I created the transcript by using Benchling to create a new RNA sequence off the reverse of my codon-optimized sequence. I kept the annotations, so the translation should still be visible. Then I made a new alignment in Benchling using MAFFT with the automatic parameters. Again, the sequences match perfectly - although it’s not reported as 100% identity because technically each T/U difference between DNA and RNA is counted as a mismatch, but we can see visually across the bottom of the screenshot that we don’t have any actual mismatches.
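The T/U point can be made concrete: if the DNA is transcribed in silico and U is treated as equivalent to T, the identity goes back to 100%. This is a small sketch with a made-up sequence (not my phaC sequence); `transcribe` and `identity` are illustrative helper names, not Benchling functions.

```python
def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def identity(a: str, b: str, tu_equivalent: bool = False) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (already aligned)")
    if tu_equivalent:
        # Treat U as T so DNA/RNA comparisons do not count T/U as mismatches.
        a, b = a.replace("U", "T"), b.replace("U", "T")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


toy_dna = "ATGGCTTAA"
toy_rna = transcribe(toy_dna)
print(toy_rna)
print(identity(toy_dna, toy_rna))
print(identity(toy_dna, toy_rna, tu_equivalent=True))
```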
Part 4: Prepare a Twist DNA Synthesis Order
Following the instructions in the Week2 Homework, I added the J23106 promoter and an RBS at the beginning of my codon-optimized phaC sequence. My coding sequence already had a start and stop codon, so I didn’t need to add those. I inserted the 7x-His tag just before the stop codon, and then I put the terminator after the stop codon at the end.
I then set up the Twist order, as if I was going to order this cassette to be synthesized. Again, following the instructions for upload, I chose cloning vector pTwist Amp High Copy to make a full plasmid. My sequence was high complexity, so I went through the Twist codon optimization process to improve the sequence for easier synthesis. I chose E. coli as my host strain again, and selected the ORF that matched my gene. I chose the promoter and RBS, and terminator regions as regions to preserve during the codon optimization process so that it kept the sequences for the genetic parts that I chose. The optimized sequence was no longer high complexity as the regions of high GC% and repeats were changed.
What DNA would you want to sequence (e.g., read) and why? I’d like to sequence the genomes of all cyanobacterial strains known to produce PHAs or specifically PHB (some already are sequenced, I think). I want to align all the known cyanobacterial PHA-synthases, and then align with the assembled genomes of the cyanobacterial strains known to produce PHAs that maybe aren’t annotated yet to try to find the PHA-synthases and add those to my comparisons.
In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? I would use third-generation sequencing on an Oxford Nanopore instrument. With long-read technology, I would get much longer contigs, which makes genome assembly easier.
5.2 Write
What DNA would you want to synthesize (e.g., write) and why? I’d like to get a CRISPR-Cas12a multiplexed gRNA cassette synthesized. This would allow multiple genomic edits to occur simultaneously, if the appropriate repair templates are included (one per gRNA target).
What technology or technologies would you use to perform this DNA synthesis and why? I would submit an order to Twist to get this synthesized because it has multiple internal repeats because of the CRISPR region, which means traditional DNA synthesis technologies would struggle with this sequence.
5.3 Edit
What DNA would you want to edit and why? I’d like to improve PHA-synthase expression in my cyanobacterial chassis strain of choice (specific strain yet to be determined). This could be accomplished through promoter replacement if we’re staying in the genome rather than adding a plasmid, but I’d also be interested in knocking out other biosynthetic pathways to improve carbon flux towards PHA synthesis. So I’d want to edit the genomic DNA of a cyanobacterial chassis.
What technology or technologies would you use to perform these DNA edits and why? I’d use a CRISPR-Cas12a vector because it allows for multiplexed targeting, so I could make multiple genomic edits. Cas12a both processes the CRISPR-gRNA cassette and makes the cuts, so it requires fewer components than Cas9. Additionally, there’s some evidence suggesting Cas12a shows fewer off-target effects than Cas9.
Note: This is due before the Victoria node does its Opentrons artwork lab, at a future date TBD. This homework assignment is still in-progress because the due date is not yet established.
Post-lab questions
Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications. A paper published this month in ACS Synthetic Biology details a new workflow for automating MoClo plasmid assembly and transformation, with a semi-automated colony PCR, on an Opentrons OT-2 and Opentrons Flex. These workflows are designed to be user-friendly: they generate the Opentrons protocol from user-supplied CSV files, and the provided README files describe how to prepare those files.
Alternatively, the authors also developed a graphical user interface that requires no coding ability. This is a novel application because it is only the second automation of MoClo/Golden Gate cloning for the Opentrons system (as opposed to advanced high-throughput liquid handling systems), and, unlike the previously published AssemblyTron workflow, this new workflow does not require Python ability.
These workflows were validated by assembling plasmids with the MoClo Yeast Toolkit and MoClo SubtiToolKit, and transforming these plasmids into Saccharomyces cerevisiae, and into Escherichia coli followed by Bacillus subtilis, respectively. With both toolkits, the automated procedure achieved efficiency comparable to the manual procedures (> 90% and 60%, respectively).
Figure 1: Schematic overview of the protocol design workflows developed for the Opentrons platform. Protocols can be generated using either the generator.py Python script via the command line or the online Slowpoke tool, which features a user-friendly GUI. Both tools run the workflow.py files in the backend. (A) Workflow for Golden Gate-based cloning, where users define genetic part layouts and assembly combinations. (B) Workflow for colony PCR, including colony selection, reagent layout, and reaction recipe input.
Malci, K; Meng, F; Galez, H; et al. Slowpoke: An Automated Golden Gate Cloning Workflow for Opentrons OT-2 and Flex. 2026. ACS Synthetic Biology, 15(2): 511-521. DOI: 10.1021/acssynbio.5c00629
Write a description about what you intend to do with automation tools for your final project. I’d want to utilize the Opentrons set-up in the Victoria node to enable the possible execution of my medium-term aim with as little scientist benchtime as possible. I don’t know the exact make and model of all modules that the Victoria Opentrons has, but below is a series of possible steps that might be automatable (best use of automation would be medium or high throughput, depending on the number of designs we are able to test):
Gibson Assembly or MoClo plasmid assembly
Transfer reaction components into wells
Heat block for digestion/ligation/PCR steps
Transformation of expression plasmid
Transfer plasmids and competent cells into wells
Heat block for heat shock
Transfer media into wells
Heated shaker for recovery
Incubator for overnight growth
Stamp onto new plate or pick into multiple liquid cultures for culturing
Incubator or heated shaker for overnight growth
Readout
Transfer cells (and reagents) into wells
Plate reader for fluorescent or colorimetric output
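For the transfer steps above, it helps to fix which construct goes in which well before writing any protocol. This is a minimal sketch, assuming a standard 96-well plate filled column by column; `assign_wells` and the variant names are hypothetical, and the real deck layout depends on the modules the Victoria node actually has.

```python
from itertools import product

ROWS = "ABCDEFGH"      # 8 rows of a standard 96-well plate
COLS = range(1, 13)    # 12 columns


def assign_wells(samples: list[str]) -> dict[str, str]:
    """Map each sample name to a well, filling down each column (A1, B1, ... H1, A2, ...)."""
    wells = [f"{row}{col}" for col, row in product(COLS, ROWS)]
    if len(samples) > len(wells):
        raise ValueError("more samples than wells on one 96-well plate")
    return dict(zip(samples, wells))


# Hypothetical library of nine designs to assemble and transform.
variants = [f"variant_{i}" for i in range(1, 10)]
layout = assign_wells(variants)
for name, well in layout.items():
    print(name, well)
```

The same mapping could then be reused for the assembly plate, the transformation plate, and the readout plate, so a plate-reader value can be traced back to its design.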
Final project ideas
Brainstorming:
Identification of PhaC analog in Cyanobacterium aponinum UTEX 3222 and overproducing or engineering for increased efficiency
BLAST/align with known PHA-synthases
Compare efficiency / mutations that improved turnover in other PhaC - test analogous mutations (aligned location, similar or different AAs). improved substrate specificity?
Site-specific saturation mutagenesis? Would be good use for automation
Quorum sensing based killswitch (i.e. cell dies if it escapes bioreactor)
Has to have some kind of inducible element or won’t grow after initial transformation
What’s good at quorum sensing already?
Something else??? Something in E coli that can be done on Opentrons
Because it’s more convenient for a final project to be executed in Victoria remotely
Cyanobacterial expression plasmid across multiple cyano species
needs to include E coli machinery for manipulation and production (and conjugation, for relevant species)
Ideas:
PhaC protein engineering
Short term aim: Design small library of PhaC variants with expected improvement
Medium term aim: Generate library and test in chassis strain
Long term aim: Develop PHB bio-manufacturing cyanobacterial strain for carbon-neutral/carbon-negative plastic (depending on biodegradation).
Quorum sensing based circuit for biocontainment
Short term aim: Design killswitch with genetic circuit to trigger based on quorum sensing.
Medium term aim: Build genetic circuit with expression based on quorum sensing with a measurable output; test circuit in E. coli.
Long term aim: Optimize circuit sensitivity and test with killswitch expression; integrate into bio-manufacturing chassis strains for population-linked biocontainment.
Broad cyanobacterial expression plasmid
Short term aim: Design plasmid backbone based off native cyanobacterial plasmids and established E. coli machinery.
Medium term aim: Test expression in multiple cyanobacterial strains (including some previously considered genetically intractable with classic broad-host-range vectors).
Long term aim: Establish protocol for domestication of newly prospected, wild-type cyanobacterial strains using the cyanobacterial plasmid.
Need to answer 9/11 questions; I skipped 7 and 11.
How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) $$ 500\ \text{g} \times \frac{1\ \text{mol AA}}{100\ \text{g}} = 5\ \text{mol AA} $$
$$ 5\ \text{mol} \times \frac{6.02 \times 10^{23}\ \text{molecules}}{1\ \text{mol}} = 3.01 \times 10^{24}\ \text{molecules} $$
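The same arithmetic as a short script, in case the inputs need to change. It assumes, as the calculation above does, that the whole 500 g counts as amino acids; `protein_fraction` is a parameter I added for illustration, and since real meat is only partly protein, lowering it gives a smaller count.

```python
AVOGADRO = 6.02e23        # molecules per mole (same rounding as above)
mass_g = 500.0            # mass of the piece of meat, in grams
avg_aa_mass = 100.0       # g/mol, i.e. ~100 Da per amino acid
protein_fraction = 1.0    # assumption: treat the whole mass as amino acids

moles = mass_g * protein_fraction / avg_aa_mass
molecules = moles * AVOGADRO
print(f"{moles:.1f} mol of amino acids -> {molecules:.2e} molecules")
```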
Why do humans eat beef but do not become a cow, eat fish but do not become fish? We break down the proteins during digestion to the constituent amino acids. These amino acids are then used in our cells to build human proteins.
Why are there only 20 natural amino acids? It’s been hypothesized that the 20 naturally occurring amino acids fairly effectively cover the “chemical space”, which would indicate that more complex or diverse amino acids are not needed for increasing function. This includes variation in chemical properties like molecular size, hydrophobicity, and charge, but also rotational conformations. These twenty sufficiently cover the space for effective function while also being relatively low in energy (easy to synthesize). Another paper hypothesizes that all twenty natural amino acids predate the RNA world, and in fact were naturally synthesized prebiotically with mineral catalysts - thus suggesting that the development of the three-base 64-codon alphabet actually was because a two-base 16-codon alphabet would restrict to sixteen instead of the existing 20 amino acids.
Doig, AJ. Frozen, but no accident – why the 20 standard amino acids were selected. 2017. FEBS J, 284: 1296-1305. doi: 10.1111/febs.13982
Bywater RP. Why twenty amino acid residue types suffice(d) to support all living systems. 2018. PLoS One, 13(10):e0204883. doi: 10.1371/journal.pone.0204883
Can you make other non-natural amino acids? Design some new amino acids. There are a number of non-canonical amino acids that people have designed and used, made by swapping the natural side chain for an unnatural one.
Where did amino acids come from before enzymes that make them, and before life started?
In 2018, Bywater suggested that amino acids were synthesized prebiotically, with the simpler structures occurring through aqueous reactions, and more complex structures requiring mineral catalysts. Many amino acids have been identified on meteorites, suggesting that amino acids could have originated in outer space, but more likely that the conditions to synthesize the “simpler” amino acids exist in multiple places. Other researchers have suggested that the “complex” amino acids must have been biosynthesized by early proteins made up of “simple” amino acids, and in particular, that histidine, phenylalanine, cysteine, methionine, tryptophan and tyrosine had to come after molecular oxygen because they have redox functionality.
Doig, AJ. Frozen, but no accident – why the 20 standard amino acids were selected. 2017. FEBS J, 284: 1296-1305. doi: 10.1111/febs.13982
Bywater RP. Why twenty amino acid residue types suffice(d) to support all living systems. 2018. PLoS One, 13(10):e0204883. doi: 10.1371/journal.pone.0204883
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect? I would expect D-amino acids would form a left-handed helix because L-amino acids form right-handed helices.
Can you discover additional helices in proteins?
Why are most molecular helices right-handed? In general, naturally occurring amino acids are L-enantiomers, which leads to right-handed helices because of steric hindrance requiring the side chains to point outwards.
Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation? Because beta sheets are flat, they can stack, and the large surface area means that the side-chains can have interactions (especially hydrophobic side-chains) between the sheets.
Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials? Amyloids are ordered protein aggregates consisting of a repeating beta-sheet motif. Proteins that have an alternative, beta-sheet-rich fold become amyloids when they self-assemble into fibrils, and that alternative conformation is energetically stable. Each amyloid disease usually involves a single amyloid-forming protein. Because of their tendency to self-assemble, I think you could use amyloid beta sheets as materials for DNA origami.
Riek R. The Three-Dimensional Structures of Amyloids. 2017. Cold Spring Harb Perspect Biol;9(2):a023572. doi: 10.1101/cshperspect.a023572.
Ow SY, Dunstan DE. A brief overview of amyloids and Alzheimer’s disease. 2014. Protein Sci;23(10):1315-31. doi: 10.1002/pro.2524.
Design a β-sheet motif that forms a well-ordered structure.
Part B: Protein Analysis and Visualization
Briefly describe the protein you selected and why you selected it. I chose PhaC from Cupriavidus necator. PhaC is a polyhydroxyalkanoate-synthase, used in biopolymer production. I selected it because engineering PhaC is one of my potential final projects. The C-terminal domain is believed to be the catalytic domain, and it has a solved crystal structure. The N-terminal domain does not have a solved crystal structure, and is believed to potentially be involved in substrate specificity.
Identify the amino acid sequence of your protein.
How long is it? What is the most frequent amino acid? 390 amino acids (when i removed the His-tag at the end). Most frequent amino acid is A (alanine).
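The length and most frequent residue can be double-checked with a few lines of Python. This is a sketch on a made-up sequence, not the real PhaC entry; `strip_c_terminal_his` simply trims trailing histidines, which would also remove a native C-terminal His if the protein happened to end in one.

```python
from collections import Counter


def strip_c_terminal_his(seq: str) -> str:
    """Remove a run of histidines from the C-terminus (assumed to be the His-tag)."""
    return seq.rstrip("H")


def summarize(seq: str) -> tuple[int, str, int]:
    """Return (length, most frequent residue, its count) for a protein sequence."""
    seq = "".join(seq.split()).upper()   # drop whitespace and line breaks from FASTA text
    residue, count = Counter(seq).most_common(1)[0]
    return len(seq), residue, count


toy_protein = "MAAAGKLLAVHHHHHH"   # hypothetical sequence with a 6x His-tag
print(summarize(strip_c_terminal_his(toy_protein)))
```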
How many protein sequence homologs are there for your protein? BLAST found 250 sequence homologs - mostly belonging to other bacteria that biosynthesize PHAs.
Does your protein belong to any protein family? It’s classified as a transferase.
Identify the structure page of your protein in RCSB C. necator PhaC (C-terminal domain) has been uploaded to RCSB PDB here.
When was the structure solved? Is it a good quality structure? The structure was solved in 2016 by two different and unrelated groups, which is a good sign for repeatability (PDB 5HZ2 and 5T6O). It has a resolution of 1.8 Å, which indicates a good quality structure.
Are there any other molecules in the solved structure apart from protein? Yes, there is a sulfate ion and a glycerol molecule.
Does your protein belong to any structure classification family? Nothing that I could find on SCOP.
Open the structure of your protein in any 3D molecule visualization software: I used the structure viewer on the PDB website because I wasn’t able to download PyMol on my laptop (not enough memory space).
Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
Color the protein by secondary structure. Does it have more helices or sheets? I think it looks like it has more helices.
Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues? I colored by hydrophobicity of residue in the PDB structure viewer, because it was all one color when I selected color by residue molecule type. Not sure what was up with that, but I figured hydrophobicity would let me look at the hydrophobic vs hydrophilic residues. The hydrophobic residues are more clustered towards the insides of the structure.
Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)? Yes, you can kind of see the indentation in the center of the screenshot below.
Part C: Using ML-based Protein Design Tools
I’m continuing with the C-terminal domain of PhaC, 5HZ2 in PDB. Colab notebook.
C1. Protein Language Modeling
Deep Mutational Scans
Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods. I copied the FASTA protein sequence from PDB into the first line of cell3 of the Colab notebook replacing the string labeled “protein_sequence”.
Can you explain any particular pattern? (choose a residue and a mutation that stands out) Position 277 seems important - Aspartic acid is the only yellow/high score. Everything else is mostly dark blue, so very negative, which I think means not likely to be able to mutate. So likely, this is either important structurally or catalytically. Asp is one of the few charged amino acids, so that makes me think it might be catalytic.
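One common way to turn language-model likelihoods into a mutational-scan score is the log-ratio between the probability of the mutant residue and the wild-type residue at a position. I haven’t checked that the notebook computes its heatmap exactly this way, so treat this as a sketch of the idea with made-up probabilities (not ESM2 output); `llr_scores` is a hypothetical helper.

```python
import math


def llr_scores(position_probs: dict[str, float], wild_type: str) -> dict[str, float]:
    """Log-likelihood ratio of each residue versus the wild-type residue at one position."""
    wt_p = position_probs[wild_type]
    return {aa: math.log(p / wt_p) for aa, p in position_probs.items()}


# Made-up probabilities for a strongly conserved position where Asp is wild type.
toy_probs = {"D": 0.90, "E": 0.05, "N": 0.03, "A": 0.02}
scores = llr_scores(toy_probs, "D")
for aa, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(aa, round(score, 2))
```

At a position like this, the wild-type residue scores 0 and every substitution is strongly negative, which is the pattern described above for position 277.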
Latent Space Analysis
Use the provided sequence dataset to embed proteins in reduced dimensionality.
Analyze the different formed neighborhoods: do they approximate similar proteins? I think they probably mostly do, but it’s kind of hard to tell, because there are so many proteins that it’s hard to visually see which are clustered vs overlapping clusters, and also many of the proteins are just labeled “automated matches” which isn’t really helpful for identification.
Place your protein in the resulting map and explain its position and similarity to its neighbors. It’s nearest to a lipase, a few esterases/thioesterases, and some acetyl-transferases. These are all also from bacteria. I think this makes sense, because these are all kind of involved in biosynthesis of (sometimes long) carbon-containing molecules.
Note: PhaC is the partially covered black dot surrounded by orange-yellow dots.
Code for visualization:
New cell after cell53 of the Colab. i wrote the following code based off existing Python knowledge, and mostly looking at the prior couple cells.
# add my protein sequence to the sequences list
# Biopython classes needed to build the record (imported here in case an earlier cell has not done so)
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# make a SeqRecord that matches the format of the first item in sequences that was printed above
record = SeqRecord(seq=Seq(protein_sequence), id='5hz2', name='PhaC', description='PhaC - polyhydroxyalkanoate synthase (Cupriavidus necator)', dbxrefs=[])
# print the original length of the sequences list to compare
print(len(sequences))
# append my new entry to the sequences list
sequences.append(record)
# print the new length of the sequences list (should be one greater here)
print(len(sequences))
# show the final item of the sequences list (should be my new one)
sequences[-1]
Then ran former cell 54 (currently cell 55 since i added a new one) as usual. Separated out the visualization generation code into a separate cell. Ran the initial dataframe creation. Made a new cell to confirm what my sequence descriptor was:
protein_sequence_annotations[15177]
Then visualized with the following code in a single cell. The chunk that was added is after the fig_3d.update_layout and before fig_3d.show(). This chunk was adapted from the bit that was posted by Noureldin Rihan on the Discourse forum.
# Visualize with Plotly 3D scatter plot, coloring by TSNE3
fig_3d = px.scatter_3d(
    tsne_df_3d,
    x='TSNE1',
    y='TSNE2',
    z='TSNE3',
    color='TSNE3',  # Color points based on the third t-SNE component
    title='3D t-SNE Visualization of Protein Sequence Embeddings (Color by TSNE3)',
    hover_name=protein_sequence_annotations[:len(embeddings_array)]  # You can replace this with sequence IDs if available
)
fig_3d.update_layout(
    height=800  # Increase the height of the plot
)
# change color and size of my protein so it is easier to find in the huge latent space
# code adapted from Noureldin Rihan on Discourse forum https://forum.htgaa.org/t/issues-with-latent-space-analysis/382
# get the protein's index
my_point = tsne_df_3d.iloc[protein_sequence_annotations.index("PhaC - polyhydroxyalkanoate synthase (Cupriavidus necator)")]
# color it differently
fig_3d.add_scatter3d(
    x=[my_point["TSNE1"]],
    y=[my_point["TSNE2"]],
    z=[my_point["TSNE3"]],
    marker=dict(
        size=10,  # Choose the dot size
        color="Black"  # Choose a color
    ),
    text=["PhaC - polyhydroxyalkanoate synthase (Cupriavidus necator)"],
    hovertemplate="<b>%{text}</b><br>TSNE1: %{x:.2f}<br>TSNE2: %{y:.2f}<br>TSNE3: %{z:.2f}<extra></extra>"
)
fig_3d.show()
C2. Protein Folding
Fold your protein with ESMFold. Do the predicted coordinates match your original structure? This looks like a smaller and less intricate structure than the solved structure. I’m not sure what’s up with that.
Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations? I replaced all the Es with Ds and removed the His-tag at the end of the sequence. This yielded the following structure:
I think it looks similar. So at least with the small mutations it’s resilient. larger mutations probably not.
C3. Protein Generation
Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one. The output from the third cell after the Inverse Folding with MPN heading:
Based on the heatmap, it has far less flexibility in sequence than the original.
Then after the heatmap cell, there was the last cell that gave a different output that also looked like a predicted sequence, so I’m unclear which one we should look at:
Input this sequence into ESMFold and compare the predicted structure to your original. Replacing the original 5HZ2 protein sequence with the new sequence from the last cell into ESMFold (cell 54) gives us this predicted structure below. Which I guess looks kind of similar to the original predicted structure, but still to me does not look like the PDB structure. The new sequence doesn’t have a His-tag at the end, but it does kind of look like it has a linear tail like a His-tag, which is neat.
Part D: Group Brainstorm on Bacteriophage Engineering
What do we know:
E. coli DnaJ binds to denatured proteins to prevent/disassemble aggregates (native function in heat-shock).
DnaJ binds to the hydrophilic tail of MS2-L protein.
point mutation of highly conserved proline in DnaJ results in no lysis (so maybe no more binding of MS2-L tail?)
removal of MS2-L tail recovers lysis function (meaning DnaJ is only necessary when tail exists)
suggests hydrophilic tail aggregates in some way that prevents lysis except in presence of DnaJ to stop aggregation
so stability should be improved if we can figure out how the tail is interacting with the tail of other MS2-L molecules, and then mutating that away so there is no aggregation and dependence on DnaJ
graph TB;
A[sequence and structure of MS2-L] -->|if geometry and chemical interactions are known| B[view interactions between MS2-L copies]
A -->|if geometry and interactions are not known| C[model interactions with AlphaFold or something that can do protein interactions]
B -->|visual analysis and mutation modeling| D[Identify important residues in MS2-L tail interactions]
C -->|visual analysis and mutation modeling| D[Identify important residues in MS2-L tail interactions]
D -->|use knowledge of hydrophobicity/charge/etc. OR use ESM2 mutational scan and select ones that it finds unlikely| E[Select dissimilar AAs to substitute in interacting residues]
E -->|AlphaFold or similar| F[model protein folding in new AA sequence with selected mutations]
F -->|something that can model protein interactions| G[model interactions between mutant MS2-L copies]
G -->|select mutations that have similar hydrophilicity as original tail but less interaction with each other and maybe also with DnaJ| H[test mutations in lab]
Potential problems:
don’t know what can model protein-protein interactions
we might have covered this in class but i don’t remember. i can rewatch the lectures
what if modeling doesn’t show interactions between the tails? we know there probably has to be one…
might have to simplify by only modeling the tail section, but that is probably known already (will have to model folding and interactions with full protein sequence in later steps probably)
could start with DnaJ, what in MS2-L binds with the essential proline in DnaJ, and assume that it’s spatially close to that. then test various mutations of nearby residues
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion DNA polymerase - a high fidelity DNA polymerase, which means that it is an enzyme that adds single nucleotides to extend a DNA chain along a template with some sort of proof-reading ability. It is used for PCR, which means it has to be thermostable.
dNTPs - deoxynucleoside triphosphates (dATP, dCTP, dGTP, dTTP), the individual nucleotide building blocks that the polymerase adds to make DNA
buffer - buffer is used primarily for controlling the pH of the PCR reaction, but it also includes MgCl2 which is a required co-factor for the DNA polymerase.
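What are some factors that determine primer annealing temperature during PCR? Primer annealing temperature is set relative to the melting temperature of the primers, which is determined primarily by the length of the primer and its GC content (longer primers and higher GC content both raise it).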
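To make the length and GC dependence concrete, here is a minimal sketch using the Wallace rule (2 °C per A/T, 4 °C per G/C). This is only a rough rule of thumb for short oligos, not what Phusion's own calculator uses, and the example primer sequences are made up for illustration.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G or C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace rule estimate: 2 degC per A/T plus 4 degC per G/C (rough, short oligos only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical primers: same length, different GC content.
    for name, seq in {"low_GC": "ATATATATGCATATATAT", "high_GC": "GCGCGCATGCGCGCATGC"}.items():
        print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

For real primer design I would still use the polymerase supplier's Tm calculator, since salt and primer concentration also matter.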
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other. PCR is a method to produce many copies of a DNA sequence for which you already have a template. It requires a thermocycler and PCR mix (thermostable DNA polymerase, dNTPs, appropriate buffer). To use it, you need to have template DNA and primers designed to bookend the sequence of interest. Restriction digests can linearize circular DNA or trim DNA sequences. They require a heat block or incubator, the relevant restriction enzymes, and appropriate buffer. To use them, you need DNA (typically at a medium or high concentration) that contains your sequence of interest already bookended by restriction enzyme cut sites. Restriction digests can produce sticky ends or blunt ends; PCR with a proofreading polymerase like Phusion produces blunt ends (non-proofreading polymerases like Taq typically leave a single A overhang). Both methods will typically require some sort of purification step before further use (DNA cleaning and concentrating; gel extraction). PCR is useful when you need more of a particular sequence of DNA, when you want to make point mutations within a sequence (multi-step process), or when you want to add short sequences to the ends of the DNA sequence (such as restriction enzyme cut sites, adaptors, or overlaps). Restriction digestion is useful when you need to remove an insert from a plasmid backbone, to linearize a vector for electrophoresis or other analysis, and for restriction-digest cloning (including ensuring insert and vector have appropriate sticky ends for directional insertion).
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning? Ideally you would design and test in silico to ensure overlaps are appropriate. My first couple times trying Gibson assembly, I wrote it out by hand to convince myself I had done it correctly, but many molecular biology software options can now assist with this as well. You can exactly confirm your purified DNA fragments prior to Gibson assembly by sequencing them, but you can also just get a good idea of their size (which would at least tell you if you PCR’d a very different or non-specific product) by running them on a gel.
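The "write it out by hand" check can also be sketched in a few lines: for each junction, confirm that the end of one fragment matches the start of the next, including the last-to-first junction that closes the plasmid. The 20 bp minimum overlap below is an assumption for illustration, and the function names are hypothetical.

```python
def shared_overlap(upstream: str, downstream: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `upstream` that is also a prefix of `downstream`.

    Returns 0 if the longest match is shorter than `min_len`.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    for size in range(min(len(upstream), len(downstream)), min_len - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return size
    return 0


def check_circular_assembly(fragments: list[str], min_len: int = 20) -> list[int]:
    """Overlap length at every junction, including last -> first (circular plasmid)."""
    n = len(fragments)
    return [shared_overlap(fragments[i], fragments[(i + 1) % n], min_len) for i in range(n)]
```

A 0 in the returned list flags a junction with no usable overlap. This only checks exact end-to-end matches on the top strand, so it is a sanity check rather than a replacement for the assembly software.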
How does the plasmid DNA enter the E. coli cells during transformation? Before a heat shock transformation, the E. coli cells are treated with CaCl2, which neutralizes the negative charge of the DNA (and the cell surface) so that the DNA can sit close to the membrane. Then you shock the cells with an abrupt temperature change from on ice at 0°C (or sometimes room temperature around 20°C) to 42°C. This is thought to transiently open pores within the cell membrane that allow the DNA to enter the cells.
Describe another assembly method in detail (such as Golden Gate Assembly).
Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
Golden Gate Assembly can be conceptualized as a cross between restriction digest cloning and Gibson Assembly. Like restriction digest cloning, restriction enzymes are used to digest both the insert and the vector to create compatible sticky ends for directional insertion. However, it uses Type IIS restriction enzymes (such as AarI) that cut outside their recognition site. Therefore with correct design, the recognition sites are removed in assembly. This allows for plasmid construction similar to Gibson assembly: design your insertion fragments and vector backbone to have compatible overhangs/overlaps with the adjacent sequences (often added during primer design in PCR), then add all fragments to the reaction mix which includes both a nuclease and a ligase for assembly. In Golden Gate assembly, the Type IIS restriction enzyme(s) find their recognition sites, cut nearby (at a pre-identified base), resulting in the designed 4-base overhangs. These overhangs can connect with matching overhangs from either the original construct or the intended adjacent fragment, which will be ligated into a closed dsDNA molecule (if the original construct is re-ligated, then the Type IIS enzyme again finds the recognition site and cuts again, thereby improving the efficiency).
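As a small illustration of "cutting outside the recognition site", the sketch below finds a Type IIS site and reports the 4-base overhang left downstream of it. The AarI geometry used here, CACCTGC(4/8), is quoted from memory of enzyme catalogs and should be checked against the supplier's documentation; the function only looks at top-strand sites.

```python
# Assumed cut geometry for AarI: CACCTGC(4/8), i.e. the top strand is cut 4 nt
# downstream of the site and the bottom strand 8 nt downstream.
AARI_SITE = "CACCTGC"
TOP_OFFSET, BOTTOM_OFFSET = 4, 8


def overhangs(seq: str, site: str = AARI_SITE) -> list[str]:
    """Return the 4-nt 5' overhang (top-strand sequence) at each top-strand site."""
    seq = seq.upper()
    found = []
    start = seq.find(site)
    while start != -1:
        site_end = start + len(site)
        overhang = seq[site_end + TOP_OFFSET : site_end + BOTTOM_OFFSET]
        # Skip sites too close to the end of the sequence to give a full overhang.
        if len(overhang) == BOTTOM_OFFSET - TOP_OFFSET:
            found.append(overhang)
        start = seq.find(site, start + 1)
    return found
```

Because the overhang sits outside the recognition site, its four bases can be chosen freely during primer design, which is what makes the scarless, ordered assembly possible.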
Figure from Addgene’s Golden Gate Cloning page.
Model this assembly method with Benchling or Asimov Kernel! To compare assembly methods, I used Benchling’s Assembly Wizard tool to simulate the same plasmid construction using restriction digest, Gibson assembly, and Golden Gate assembly. My target plasmid is called “pGFP”, with a pET28a(+) backbone and an insert containing the gene for green fluorescent protein (GFP) under constitutive promoter P_LacIQ from plasmid pZE27GFP. I started by importing both pET28a(+) and pZE27GFP into Benchling from Addgene. I used Benchling’s auto-annotation tool on pET28a(+) for annotations. pZE27GFP was already annotated, but was missing the annotation for P_LacIQ, so I added an annotation for that by downloading the Genbank file from the Addgene site and using CTRL-F on the sequence to identify it in the original file. I wanted these annotations so that I knew the locations of the relevant sequences in my files for easier visual identification during the cloning simulation. Note that the GFP translation annotation in the pZE27GFP file didn’t include the stop codon, even though the stop codon is present in the sequence. I was too lazy to fix this, so I just remembered that my sequence of interest included the three bases past the end of the translation annotation.
Restriction Digest
Opening the pZE27GFP file to the plasmid map view, I selected the Digests tool to show all single cutters on the map, and identified ones that were near the ends of goal insertion sequence (outside P_LacIQ and GFP): XhoI and HindIII.
Then I opened the pET28a(+) file to the plasmid map view, and selected the Digest option to only show the selected enzymes, and found these two enzymes cut in the insertion locus on the plasmid (between the T7 promoter and the His-tag).
Since both enzymes were present on both starting constructs, I used the Assembly Wizard tool for Restriction Digest cloning, and selected the backbone and insert by highlighting the above sequences with the selected enzymes.
This resulted in a final assembly of pGFP_RDassembly. Note that both the XhoI and HindIII recognition sites are preserved in the final construct. While sticky-ended enzymes allow for directional insertion, this insert does not require directional insertion because it contains both the promoter and the gene. This is important because technically the insert is backwards for the vector as intended (for the T7 promoter and His-Tag on the backbone).
Gibson Assembly
For the Gibson assembly method, I started by opening the Assembly Wizard, and selecting the Gibson option, and opting to try the new combinatorial assembly tool instead. I retained all the default options.
This resulted in a final assembly of pGFP_GibsonAssembly. The primers were auto-generated by the tool, and are visible in the Benchling files for pET28a(+) and pZE27GFP with the naming convention following “pET28a-GA_forward”. The PCR products used in the final assembly are here (insert) and here (backbone).
Golden Gate Assembly
For the Golden Gate assembly method, I similarly started by opening the Assembly Wizard and used the new combinatorial assembly tool for Golden Gate. I retained all the default options. I selected “Use a primer pair” as the option under “Fragment production method”, and then retained the default options that auto-populated. Upon selecting my insert and backbone sequences, the tool threw a warning for a recognition site for the Type IIS enzyme within one of those sequences, so I went into the tool settings to instead select AarI as my enzyme. AarI was chosen somewhat arbitrarily because I’ve used it before; if it had also thrown an error, I would have simply gone down the list until I found a compatible enzyme that wouldn’t cut inside my sequences.
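The "go down the list until an enzyme doesn't cut inside my sequences" step can be sketched as a scan of both strands for each candidate recognition site. The three enzymes and recognition sequences below are the ones I remember from supplier catalogs (verify before use), and this is not how Benchling implements its warning; it also treats each fragment as linear.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences as commonly listed by enzyme suppliers (verify before use).
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "AarI": "CACCTGC"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of `site` on both strands (overlapping matches included)."""
    seq = seq.upper()
    total = 0
    # A set avoids double counting if a site is its own reverse complement.
    for pattern in {site, reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:
            total += 1
            start = seq.find(pattern, start + 1)
    return total


def compatible_enzymes(fragments: list[str]) -> list[str]:
    """Enzymes with no recognition site in any fragment, in dictionary order."""
    return [
        name
        for name, site in TYPE_IIS_SITES.items()
        if all(count_sites(frag, site) == 0 for frag in fragments)
    ]
```

The scan should be run on the insert and backbone sequences before the primers add the intended flanking sites, otherwise every enzyme will look incompatible.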
This resulted in a final assembly of GFP_GGassembly. The primers were auto-generated by the tool, and are visible in the Benchling files for pET28a(+) and pZE27GFP with the naming convention following “pET28a-GG_forward”. The PCR products used in the final assembly are here (insert) and here (backbone). Note that both fragments contain AarI recognition sites, but the final construct does not.
Asimov Kernel
See repository JKS_hw6 in Asimov Kernel.
Repressilator:
Repressilator reconstruction:
My initial attempt looks like
The terminator chosen (L3S2P24 Bacterial Terminator) is the only one available in the Characterized Bacterial Parts repo. The H1 RBS was chosen arbitrarily as the shortest RBS; I just wanted the same RBS for each promoter-gene combo.
I wanted to add a backbone, but there’s no backbone available in the Characterized Bacterial Parts repo. Because the homework instructions said to use only the parts in this repo, I figured I’d try this first without the backbone.
Unfortunately, this didn’t work. The outcome of my first simulation (E coli, 24h, 30min, no ligands) is below. Notice the lack of oscillations in the transcript and protein concentrations over time.
I have two potential solutions for this that I can think of before I check the pre-made Repressilator: first, I don’t have a backbone, which I do think I need, but I did still get a simulation without it, so maybe I don’t. Second, I don’t have a reporter protein. My recollection of the Repressilator paper includes a fluorescent output, so I’ll try adding a reporter gene next.
Second attempt:
pTet was chosen arbitrarily - it could have been any of the three promoters used prior. H1 RBS was used again for consistency. LitR was chosen arbitrarily as a reporter gene because I couldn’t find a fluorescent protein within the Characterized Bacterial Parts repo.
Unfortunately, this gave more or less the same kind of output with no oscillations. I’ll try adding in a backbone from outside the Characterized Bacterial Parts repo, but if that doesn’t work then I’ll have to go back and reference the demonstration repressilator. I added the pUC-SpecR-v1 backbone, but it didn’t change the output.
Checking the repressilator in the Bacterial Demos repo, I’m honestly not totally sure why mine didn’t work. It looks really similar:
The terminator and backbone used are the same ones as I used. It has LacI/LambdaCI swapped from my original construct, but it should still work. Oh! I see - I accidentally grabbed pTet not pTetR originally. I went back and removed my pTet-LitR section, to return to my original construct, and then I replaced the pTet with pTetR.
This worked! Here’s my new output:
And here are the oscillations that I wanted to see. Awesome!
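For intuition about why a closed ring of three repressors oscillates, here is a minimal sketch of the textbook protein-only repressilator model in dimensionless units. It is not the model that Kernel simulates; the parameter values (alpha = 216, Hill coefficient 3) and the starting concentrations are assumptions picked only so that the ring oscillates.

```python
def simulate_repressilator(alpha=216.0, hill=3.0, dt=0.01, t_end=100.0, start=(1.0, 1.5, 2.0)):
    """Euler integration of dp_i/dt = alpha / (1 + p_prev**hill) - p_i for a 3-repressor ring.

    Returns a list of (time, p0, p1, p2) tuples in dimensionless units.
    """
    p = list(start)
    trace = [(0.0, *p)]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        # Each repressor is produced from a promoter repressed by the previous one
        # (index -1 wraps around, which closes the ring).
        rates = [alpha / (1.0 + p[i - 1] ** hill) - p[i] for i in range(3)]
        p = [p[i] + dt * rates[i] for i in range(3)]
        trace.append((step * dt, *p))
    return trace


def late_amplitude(trace, column=1):
    """Peak-to-trough range of one column of the trace over the second half of the run."""
    late = [row[column] for row in trace[len(trace) // 2 :]]
    return max(late) - min(late)
```

With weak cooperativity (for example `hill=1.0`) the same ring settles to a steady state instead, which is one reason a repressilator can look "right" on paper and still give flat traces.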
Construct1: OR gate
Initial construct
pTet is activated by aTc, pTac is activated by IPTG. BBa_E0040 is from the iGEM registry; encodes for GFP.
If aTc or IPTG is present, then GFP will be expressed.
I’m a little surprised that there was as much of a difference between aTc and IPTG alone, but considering we are just looking at expression or not (rather than how much expression), i think this still worked. I am curious if I flip the order of pTet and pTac if that changes it at all. Kept the ligand amounts and times the same.
Just about the same. This makes me think that maybe setting the aTc concentration to 0 at time 12hr is maybe not working well, or maybe pTac is just that much stronger of a promoter than pTet.
Construct2: NOR gate
Initial construction:
pTet is induced by aTc, pTac is induced by IPTG. BBa_E0040 encodes GFP.
If neither aTc nor IPTG is present, then GFP will be expressed.
Expected output:
| aTc | IPTG | Output |
| --- | --- | --- |
| 0 | 0 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 1 | 0 |
Simulation:
0-6hrs: aTc => no output
6-12hrs: no ligands => GFP
12-18hrs: IPTG => no output
18-24hrs: aTc+IPTG => no output
Expected outcome achieved.
Construct3: XOR gate
I wanted to try to see if i could independently come up with a XOR gate without directly copying the one in the Bacterial Demos repo. Looking at my OR gate and NOR gate, I thought I’d be able to, but when I started to try to sketch it out, I kept getting stuck. Originally, I was thinking an OR gate minus an AND gate, and I had designs for both of those.
OR gate
Expected output:
| aTc | IPTG | Output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 1 |
AND gate
Expected output:
| aTc | IPTG | Output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 1 | 1 |
However, I couldn’t figure out how to combine these in a way that made sense. After drawing out probably a couple dozen circuits, I ended up consulting the XOR gate in the Bacterial Demos repo. Looking over it briefly (but not trying to track out the outcomes directly), I figured out a tiered method to design the circuit.
Line1: start with the output: GFP, under a repressible promoter.
Line2: then below that draw that promoter’s transcription factor. add in a repressible promoter (but leave room for more if needed).
Line3: then below that, draw the new promoter’s transcription factor. add in one of the two inducible promoters (leave room for more promoters if needed).
But we have two inputs, so we need two inducible promoters. They can’t be on the same protein, because that wouldn’t give an OR gate. So add another promoter on line2.
Line2: add another repressible promoter to the transcription factor for the GFP promoter.
Line3: Below that, draw in the new promoter’s transcription factor, under the control of the other inducible promoter (leave room for more promoters if needed).
But the inducible promoters need to be able to cancel each other out.
Line3: So add the same repressible promoter to each transcription factor on this line.
Line4: Below that, draw in that new promoter’s transcription factor, under the control of BOTH inducible promoters.
This yields the following circuit:
Expected outcome:
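| aTc | IPTG | SrpR | AmtR | QacR | LitR | Output |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 |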
This is the opposite of an XOR gate (yielding output at Neither input or Both inputs, rather than yielding output at Either of only one input), so i just need to add one more layer of repressible promoter to get what I’m hoping for I think. Or I can replace the LitR with GFP and remove the section with GFP under pLitR.
New circuit for XOR gate:
Expected outcome:
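| aTc | IPTG | SrpR | AmtR | QacR | Output |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 0 |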
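The expected-outcome values for both versions of the circuit can be double-checked with a small steady-state Boolean model. This is only a sketch under two assumptions: a repressible promoter is on whenever its repressor is absent, and two promoters in front of the same gene behave like an OR. It says nothing about timing or expression levels, which is what the Kernel simulation adds.

```python
from itertools import product


def xor_circuit(atc: bool, iptg: bool) -> dict:
    """Steady-state Boolean model of the three-layer circuit (tandem promoters = OR)."""
    srpr = atc or iptg              # SrpR under both inducible promoters
    amtr = atc or not srpr          # AmtR under the aTc-inducible and SrpR-repressed promoters
    qacr = iptg or not srpr         # QacR under the IPTG-inducible and SrpR-repressed promoters
    gfp = (not amtr) or (not qacr)  # GFP under the AmtR-repressed and QacR-repressed promoters
    return {"SrpR": srpr, "AmtR": amtr, "QacR": qacr, "GFP": gfp}


def xnor_circuit(atc: bool, iptg: bool) -> dict:
    """Earlier version: LitR sits where GFP is above, and GFP is repressed by LitR."""
    state = xor_circuit(atc, iptg)
    litr = state.pop("GFP")
    state["LitR"] = litr
    state["GFP"] = not litr
    return state


if __name__ == "__main__":
    for atc, iptg in product([False, True], repeat=2):
        row = xor_circuit(atc, iptg)
        print(int(atc), int(iptg), {name: int(value) for name, value in row.items()})
```

The XOR output also matches the original "OR gate minus an AND gate" idea: GFP is on exactly when (aTc OR IPTG) is true and (aTc AND IPTG) is false.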
Simulation:
0-6 hrs: aTc only
6-12 hrs: nothing
12-18 hrs: IPTG only
18-24 hrs: aTc and IPTG
This did not give the expected outcome. GFP doesn’t fall again at the end like it should.
I think there was just something with the simulation; either i didn’t set up the ligands properly, or it wasn’t enough time to equilibrate or something. Because when I run the different ligand combinations individually, or just one change over 24 hours it works like expected.
Here is the outcome for aTc high the entire time, and adding high IPTG at 12 hours. So it does work as expected.
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? IANNs do analog computing instead of digital, so functions are additive (positive or negative) rather than just present/absent. This means that they can respond to whether an input is over or under a certain threshold, instead of just whether the input is present or not, so the response depends on dosage. IANNs can also stack multiple layers to handle multiple inputs.
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal. IANNs can be used to identify cell types, such as cancer cells by differentiating them from the surrounding healthy cells. The cancer cells might not have a single unique signal to use as an identifier, but it might have a few different metabolites (or other signals) present in different amounts from the healthy cells. So an IANN can be used to recognize multiple inputs, and how much of those inputs are present (is it more/less than the baseline amount present in the healthy cells). The output might be fluorescence to tag tumor locations for a surgeon to excise, or maybe the output could be a medication for specific targeted release.
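A toy version of this input/output behavior is a single perceptron: a weighted sum of analog input levels with a threshold-like activation. This is a generic perceptron sketch, not the specific IANN implementation from lecture, and the marker names, weights, and levels in the usage note are all hypothetical.

```python
def perceptron(inputs: dict, weights: dict, bias: float) -> float:
    """Weighted sum of analog input levels, clipped at zero (ReLU-like activation)."""
    total = bias + sum(weights[name] * level for name, level in inputs.items())
    return max(0.0, total)


# Hypothetical weights: marker_A is elevated in the target cells, marker_B is depleted.
WEIGHTS = {"marker_A": 1.0, "marker_B": -1.0}
BIAS = -0.5

if __name__ == "__main__":
    healthy = {"marker_A": 1.0, "marker_B": 1.0}
    tumor = {"marker_A": 2.0, "marker_B": 0.4}
    print("healthy output:", perceptron(healthy, WEIGHTS, BIAS))
    print("tumor output:", perceptron(tumor, WEIGHTS, BIAS))
```

Both markers are present in both cell types here, so a present/absent Boolean circuit could not tell them apart, while the weighted sum gives no output for the healthy profile and a positive output for the tumor profile.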
Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
Fungal Materials
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts? Some existing fungal materials include fungal leather and fabrics for clothing, primarily of mycelium or cellulose; biocement, which uses bacteria or fungi to produce calcium carbonate around gravel; and fungal composite materials, which uses a fungal mycelium around an organic or agricultural substrate. Fungal composite materials can be leather-like fabrics, packaging, acoustic insulation, thermal insulation, and hard particle board or brick-like building materials for furniture or architecture.
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria? Engineered fungi might form mycelium materials that can produce different colors; contain biosensors; have different material properties like hardness/flexibility; or be able to actively bioremediate the location that the mycelium-made object is placed in. Fungi are eukaryotic instead of prokaryotic like bacteria, which means there is more diversity both within the cell (organelles) and on a cell-to-cell level (cell differentiation). This complexity both increases the difficulty of synthetic biology in fungi over bacteria, but also allows for engineering that complexity (such as only having the bioremediation turned on in the fruiting bodies).
First DNA Twist Order
Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website. My first sequence is the wild-type Cupriavidus necator PhaC. For cell-free synthesis, it will be transcribed by T7 polymerase, so it needs to have those components. I designed it in Kernel, using the parts from the iGEM repository and the PhaC_Cnecator gene from the Uniprot repository.
The promoter, Bba_Z0251, is the T7 promoter with the consensus sequence. The RBS, Bba_Z0261, is a wild-type T7 RBS that has been characterized as a strong RBS by an iGEM team. The terminator, Bba_K731721, is a wild-type T7 terminator that has been characterized by an iGEM team. The Uniprot PhaC_Cnecator part has no DNA sequence in Kernel, so I remade this circuit in Benchling, using the PhaC_Cnecator sequence that I had previously codon-optimized for E. coli expression in homework2, and copied the regulatory elements from Kernel.
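Since the cassette was rebuilt by hand in Benchling, a quick sanity check on the coding sequence before ordering seemed worthwhile. The sketch below is one way to do that: it checks the start codon, reading frame, and stop codons, then joins the parts in promoter-RBS-CDS-terminator order. The function names are hypothetical and the sequences in the test calls are placeholders, not the real part sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problems(cds: str) -> list[str]:
    """Return a list of basic problems with a coding sequence (empty list = none found)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons, ignoring any trailing 1-2 bases.
    codons = [cds[i : i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


def build_cassette(promoter: str, rbs: str, cds: str, terminator: str) -> str:
    """Concatenate parts in promoter-RBS-CDS-terminator order."""
    return (promoter + rbs + cds + terminator).upper()
```

This would also have caught the missing stop codon in the GFP translation annotation from the earlier cloning homework if I had copied only the annotated region.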
This will be synthesized into a Twist cloning vector. Ronan suggested a chloramphenicol marker for constructs at the Ginkgo Nebula facility, so I’ll use pTwist-Chlor-HighCopy.
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Cell-free protein synthesis avoids the requirements of a cold chain for shipping or storage, and it also can simplify complex living systems by instead adding in specific and known amounts of reagents (enzymes, nucleotides, amino acids, etc.). It is more beneficial than cell production in situations like biosensing in remote environments (infectious disease detection in remote or under-resourced locations) and biomanufacturing of toxic products (like some pharmaceuticals) because production won’t stop due to cell death.
Describe the main components of a cell-free expression system and explain the role of each component.
template nucleic acid: DNA or RNA encoding the gene of interest for protein
cell lysate (collection of active components, including the following - or the purified components could be added individually)
tRNA: recognizes RNA codons and adds new amino acids onto a protein chain during translation
polymerase: makes nucleic acids (DNA or RNA)
nucleotides: used by polymerase to make nucleic acids
buffer: maintains reaction pH to optimal level for enzyme function
other enzymes and cofactors, depending on the goal of the system (sometimes these are included through a cell lysate)
amino acids and ribosomes, if protein production is the goal
Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment. Cell-free systems are essentially a series of chemical reactions (biological in nature, but still chemistry), which means that activation energy is required for some reactions. Energy provision regeneration is critical to ensure that the reactions continue to happen instead of stalling out early. Specifically, this is important in protein expression because translation is energetically expensive (requires ATP to attach amino acids to tRNAs). Cells generate ATP through a collection of metabolic processes; a cell-free system needs to be designed to ensure it has a way to generate ATP. One potential method is adding NAD and CoA to generate ATP from pyruvate without needing any additional enzymes.
Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why. Prokaryotic systems are simpler than eukaryotic systems. Eukaryotic systems might have more components, especially for production of functional proteins, for chaperones or post-translational modifications. A prokaryotic system might be good to produce antimicrobial peptides because you don’t need to worry about the product killing the host. A eukaryotic system might be better at producing functional antibodies because antibodies are eukaryotic proteins and therefore might be more functional in a eukaryotic system.
How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup. A membrane protein is difficult to produce in a cell-free system because it likely has a hydrophobic area and a hydrophilic area, since it is natively located within a membrane. This means that it is unlikely to fold into the correct structure in an aqueous reaction without a hydrophobic environment for the hydrophobic region of the protein. To optimize the expression of a membrane protein in a cell-free experiment, you would need to stabilize it, for example, by providing liposomes or membrane vesicles in which the membrane proteins could localize for correct folding.
Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each. Three possible reasons for a low protein yield in a cell-free system are insufficient transcription, insufficient translation, or a badly designed DNA template. Insufficient transcription could be due to not adding enough nucleotides into the reaction. This could be tested by adding an mRNA template into the reaction to see if this solves it. Insufficient translation could be due to inactive tRNA, inactive ribosomes, or not enough amino acids. This could be tested by spiking more of those individual, purified components (or fresh cell lysate) into the reaction - it’s possible one of those has been degraded. A badly designed DNA template might have a promoter that isn’t recognized by the polymerase provided in the cell-free system; this could be tested with a control reaction that includes a DNA template known to work in this established system.
Reference
Hunt, AC; Rasor, BJ; Seki, K; et al. Cell-Free Gene Expression: Methods and Applications. 2024. ACS Chemical Reviews 125(1): 91-149. DOI: 10.1021/acs.chemrev.4c00116
Homework questions from Kate Adamala
Design an example of a useful synthetic minimal cell as follows:
Pick a function and describe it.
What would your synthetic cell do? What is the input and what is the output? The SMC would produce PHB (bioplastic) using atmospheric carbon dioxide as a carbon source (effectively, photosynthesis producing PHB as the carbon storage molecule). The input is CO2 and sunlight. The output is PHB (and oxygen).
Could this function be realized by cell-free Tx/Tl alone, without encapsulation? Maybe, it would likely be at a low yield. The value of encapsulation here is to keep the intermediates in close spatial proximity to the biosynthetic enzymes for efficient biosynthesis of the final product. I’m also unsure if a thylakoid could exist without encapsulation. I’m not sure why it wouldn’t be able to; I just don’t think I’ve ever read of a cell-free thylakoid.
Could this function be realized by genetically modified natural cell? No. A cell, even a genetically modified one, would have to devote some carbon flux towards biomass and cell replication. Ideally, the synthetic cell wouldn’t have to, and all the carbon (consumed from atmospheric carbon dioxide) would go exclusively towards PHB production.
Describe the desired outcome of your synthetic cell operation. The synthetic cell would produce PHB from atmospheric carbon dioxide, with all carbon flux going towards PHB.
Design all components that would need to be part of your synthetic cell.
What would be the membrane made of? The membrane would be made up of lipids and cholesterol for flexibility. It also needs to include a thylakoid for light-harvesting.
What would you encapsulate inside? Enzymes, small molecules. Inside the SMC, I’d want the enzymes for PHB synthesis. This includes PhaC (the PHA synthase), and also all the enzymes required to build the precursor monomers. We’d maybe need a couple of Calvin Cycle enzymes, but it’s hard to say without drawing out all possible pathways of carbon flux - the idea would be for PHB to be the “energy storage” product. This might be easiest by using a cyanobacterial cell lysate, but ideally, we’d want to get to something simpler than that.
Which organism will your Tx/Tl system come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promoters, like Tet-ON, you need mammalian) It would be bacterial. Especially at first, it would have to come from a cyanobacterium, likely Synechocystis sp. PCC 6803 because it’s well-studied. It would be ideal to understand the system to the extent that we could use any bacterial system (such as E. coli), and simply include whatever cyanobacteria-specific proteins or metabolites are needed.
How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?) The SMC would have to export the PHB. So some kind of membrane channel would need to be included.
Experimental details
List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
Membrane: lipid, cholesterol, thylakoid membrane, chlorophyll, membrane channel
Enzymes: bacterial Tx/Tl, PHB biosynthetic enzymes
How will you measure the function of your system? The system’s function would be measured by the PHB output, which could be BODIPY staining if PHB is not exported, or mass spectrometry if PHB is exported.
Homework questions from Peter Nguyen
Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:
Write a one-sentence summary pitch sentence describing your concept. A chlorophyll-based paint for self-healing concrete can improve air quality in buildings suffering degradation.
How will the idea work, in more detail? Write 3-4 sentences or more. Self-healing bioconcrete is either live cells, or a cell-free system, integrated into concrete that produces calcium carbonate from atmospheric CO2 when cracks are exposed to water (which then fills in the cracks). My idea is to freeze-dry a cell-free system expressing chlorophyll and turn it into a paint to go on the outside of this building material. The chlorophyll provides an energy regeneration capacity for the calcium-carbonate cell-free system, while also producing oxygen, thereby improving the local air quality; effectively, photosynthesis that generates calcium carbonate from CO2 and light instead of generating glucose. This would mean that any cracks or chips seen on the inside of the building could be sprayed with water and lit with a plant light, and the combination of the two cell-free systems would repair the crack. The chlorophyll paint would have the further benefit of being visibly green when activated, so the repair process could be visually tracked.
What societal challenge or market need will this address? How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)? This improves the self-healing concrete concept, which addresses the high CO2 emissions cost of traditional concrete manufacturing, as well as decreasing the amount of human work needed to repair broken concrete. The biggest limitation here is that it is one time use, but i think that making the chlorophyll into a paint addresses this because once the repair is completed, the new concrete could be painted over again.
Reference
Smirnova, M; Nething, C; Stolz, A; et al. High strength bio-concrete for the production of building components. 2023. NPJ Materials Sustainability, 1(4): s44296-023-00004-6. DOI: 10.1038/s44296-023-00004-6
Homework questions from Ally Huang
Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. Ionizing radiation is a safety and health concern for space exploration because of how damaging it is to living organisms. Ionizing radiation is more harmful than non-ionizing radiation because it is higher energy and can pass through more materials (thereby making it harder to shield from). In low Earth orbit, where the ISS is, Earth’s magnetic field protects against most of the radiation, but the astronauts aboard the ISS still experience more radiation than people on Earth. Any space exploration beyond low Earth orbit has to deal with higher amounts of ionizing radiation.
Name the molecular or genetic target that you propose to study. Melanin from Cryptococcus neoformans, biosynthesized by Lac1 with phenolic substrate such as dopamine; and control pigment chlorophyll, biosynthesized by ChlP with substrate geranylgeranyl-chlorophyll a
Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. C. neoformans is a fungus that utilizes the energy in radiation via radiosynthesis, analogous to plants utilizing the energy in sunlight via photosynthesis. A similarly radiotrophic fungus was grown on the ISS to investigate its potential as a shielding mechanism against the ionizing radiation in space. It’s known that the pigment melanin provides some protective effect against radiation, and it’s hypothesized that melanin plays an analogous role as chlorophyll in radiosynthesis and photosynthesis, respectively.
Clearly state your hypothesis or research goal and explain the reasoning behind it. Hypothesis: melanin will provide a greater protective effect against the radiation in space than chlorophyll a. The DNA in tubes with lac1 will have a lower mutation or fragmentation rate than the DNA in tubes with chlP. The tubes containing lac1 will have a higher number of control (mRFP1) transcripts than tubes containing chlP. This difference in transcript counts might be attributable either to the higher DNA integrity due to melanin’s protection or to increased energy availability from radiosynthesis over photosynthesis (more radiation than sunlight in the test conditions).
Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. All tubes contain BioBits cell-free expression system, the control gene for red fluorescent protein (mRFP1), and the substrates for both Lac1 and ChlP (dopamine and geranylgeranyl-chlorophyll a).
Negative control: no additional DNA
Condition 1: DNA encoding lac1 gene
Condition 2: DNA encoding chlP gene
While these tubes could be visualized with the Molecular Fluorescence Viewer for red fluorescence, I believe visual analysis would be hampered by the pigment production. Better data would be obtained from purified nucleic acids. The DNA should be sequenced with long reads to identify any fragmentation. The RNA should be used in RT-qPCR to quantify transcript counts.
Casadevall, A; Cordero, RJB; Bryan, R; et al. Melanin, radiation, and energy transduction in fungi. 2017. ASM Microbiology Spectrum, 5(2): 10.1128/microbiolspec.funk-0037-2016. DOI: 10.1128/microbiolspec.funk-0037-2016
Averesch, NJH; Shunk, GK; Kern, C. Cultivation of the dematiaceous fungus Cladosporium sphaerospermum aboard the International Space Station and effects of ionizing radiation. 2022. Frontiers in Microbiology, 13: 877625. DOI: 10.3389/fmicb.2022.877625
Williamson, PR; Wakamatsu, K; Ito, S. Melanin biosynthesis in Cryptococcus neoformans. 1998. ASM Journal of Bacteriology, 180(6): 1570-1572. DOI: 10.1128/jb.180.6.1570-1572.1998
Chen, GE; Canniffe, DP; Barnett, SFH; et al. Complete enzyme set for chlorophyll biosynthesis in Escherichia coli. 2018. Science Advances, 4(1): eaaq1407. DOI: 10.1126/sciadv.aaq1407
Based on the predicted amino acid sequence of eGFP and any known modifications, what is the calculated molecular weight? You can use an online calculator. Using the online calculator: 28006.60 Da. However, GFP’s self-cyclization into the active fluorophore results in a loss of around 20 Da, according to this week’s lab. So the better theoretical molecular weight should be 28006.60-20 = 27986.60
Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and: m/z: Charge state n is 903.7148; charge state n+1 is 875.4421
Determine z for each adjacent pair of peaks.
$$ z = \frac{\frac{m}{z_{n+1}}}{\frac{m}{z_n} - \frac{m}{z_{n+1}}} $$
$$ z = \frac{875.4421}{903.7148 - 875.4421} = \frac{875.4421}{28.2727} $$
$$ z = 30.9642 \approx 31 $$
Determine the MW of the protein.
$$ MW = z*\frac{m}{z_n}-z = z(\frac{m}{z_n}-1) $$
$$ MW = 31*(903.7148-1) = 31*902.7148 $$
$$ MW = 27,984.1588 $$
Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1.
$$ accuracy = \frac{|MW_{experiment} - MW_{theory}|}{MW_{theory}} $$
$$ accuracy = \frac{|27,984.1588 - 27986.60|}{27986.60} = \frac{2.4412}{27986.60} = 8.7227e-5 $$
$$ accuracy * 1,000,000 = 87.2275 ppm $$
This is >50ppm but it’s close, so this might be the right protein.
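The adjacent charge state arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not the recitation's own method: the function names are made up for illustration, and like the formulas above it approximates the proton mass as 1 Da by default (1.00727 Da would be the more exact value).

```python
def charge_from_adjacent_peaks(mz_n, mz_n_plus_1):
    """Estimate the charge of the peak at mz_n from the adjacent peak with one more charge."""
    return mz_n_plus_1 / (mz_n - mz_n_plus_1)


def neutral_mass(mz, z, proton=1.0):
    """Neutral mass from m/z and charge; proton=1.0 mirrors the approximation in the text."""
    return z * (mz - proton)


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return abs(measured - theoretical) / theoretical * 1_000_000


z_raw = charge_from_adjacent_peaks(903.7148, 875.4421)
z = round(z_raw)
mw = neutral_mass(903.7148, z)
error = ppm_error(mw, 27986.60)
print(f"z = {z_raw:.4f} -> {z}, MW = {mw:.4f} Da, error = {error:.1f} ppm")
```

Passing `proton=1.00727` to `neutral_mass` lowers the computed mass slightly, so the choice of proton mass matters when the error is this close to a 50 ppm cutoff.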
Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not? The picture is pretty blurry, so honestly i am having a hard time reading the numbers. But i think we can see isotope peaks labeled: 1473.7429, 1473.7950, [unreadable], 1474.0045, 1474.0481, 1474.1006. These all yield spacings around 0.05. This would indicate a charge state around 20.
Waters Part II: Secondary/Tertiary Structure
Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)? Native protein conformation is the shape the protein folds into when it is made by the cell; this is usually the active state for enzymes. Denatured protein conformation is when the protein is unfolded and is essentially a linear amino acid chain. In mass spectrometry, the denatured state exposes all possible sites for adding a charge, giving a clean series of z+1 peaks, whereas in the native conformation the number of charges that can be added is more limited (and frequently unknown). Because the more linear/unfolded proteins pick up more charges, their m/z peaks tend to be lower than those of a native protein, whose peaks sit further to the right.
Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 m/z? What is the charge state? How can you tell? Once again, the low resolution of the screenshot is making it hard to read the numbers. A stretch that i’m decently confident about reads peaks at: 2545.1304, 2545.2222, 2545.3140, 2545.4058, 2545.4973. These all yield a spacing around 0.09. This would indicate a charge state around 11.
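The isotope spacing estimate used here can be written out as a short Python sketch. It assumes the peaks passed in are consecutive isotope peaks and that neighbouring isotopes differ by about 1 Da, so the spacing in m/z is roughly 1/z; the function name is hypothetical.

```python
def charge_from_isotope_spacing(peaks, isotope_gap=1.0):
    """Estimate charge from consecutive isotope peaks: spacing is about isotope_gap / z."""
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return isotope_gap / mean_spacing


# Peaks read off the native spectrum screenshot, as listed above.
native_peaks = [2545.1304, 2545.2222, 2545.3140, 2545.4058, 2545.4973]
z_estimate = charge_from_isotope_spacing(native_peaks)
print(f"estimated charge: {z_estimate:.1f} (rounds to {round(z_estimate)})")
```

Only runs of consecutive readable peaks should be passed in, so a list with an unreadable peak in the middle needs to be split into separate runs first.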
Waters Part III: Peptide Mapping - primary structure
How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above.
How many peptides will be generated from tryptic digestion of eGFP? 26, by my hand count. Using the online tool, 19. i think the difference is in not counting the very short peptides (< 5 amino acids, plus a couple of 4 AA peptides, likely because they have heavier side chains since it has a 500 Da cutoff).
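A hand count like this can be cross-checked with a small in-silico digest. The sketch below uses the common simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), which may not match every exception the online tool applies. The sequence is a made-up toy string, not the eGFP sequence, and the 5-residue length filter stands in for the tool's mass cutoff.

```python
import re


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (simple trypsin rule, no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


# Hypothetical toy sequence for illustration only; this is NOT the eGFP sequence.
toy_sequence = "MSKGEELFTGRPVLKDGK"
peptides = tryptic_digest(toy_sequence)
long_enough = [p for p in peptides if len(p) >= 5]
print(peptides)
print(f"{len(peptides)} peptides in total, {len(long_enough)} with at least 5 residues")
```

Counting both the full list and the filtered list shows how a cutoff on short peptides can account for a gap between a hand count and a tool's count.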
Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance. I saw 23 peaks, but only 21 are labeled, so I’m guessing maybe only the labeled ones are >10% abundance?
Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer? No, there are more peaks in the chromatogram.
Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ([M+H]+) based on its m/z and z. The m/z of the peptide at the most abundant charge state is 525.76712. The z of the most abundant charge state is 2 (because the highest peak has isotope peaks that are 0.5 m/z apart).
$$ \frac{MW+2H}{2} = 525.76712 $$
$$ MW + 2(1.00727) = 1051.53424 $$
$$ MW + 2.01454 = 1051.53424 $$
$$ MW = 1049.5197 $$
$$ [M+H]+ = MW+1H = 1049.5197 + 1.00727 $$
$$ [M+H]+ = 1050.52697 $$
Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm. The peptide is FEGDTLVNR.
$$ accuracy = \frac{|MW_{experiment} - MW_{theory}|}{MW_{theory}} $$
$$ accuracy = \frac{|1050.52697 - 1050.5214|}{1050.5214} = \frac{0.00557}{1050.5214} $$
$$ accuracy = 5.30212 e-6 * 1,000,000 = 5.3 ppm $$
This is <10 ppm, so it is probably the correct peptide.
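The same peptide calculation as a minimal Python sketch, using the proton mass of 1.00727 Da from the equations above. The function names are illustrative and not from any particular mass spectrometry library.

```python
PROTON = 1.00727  # proton mass in Da, as used in the calculation above


def singly_charged_mass(mz, z, proton=PROTON):
    """Convert an observed m/z at charge z to the [M+H]+ mass."""
    neutral = z * (mz - proton)
    return neutral + proton


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return abs(measured - theoretical) / theoretical * 1_000_000


mh = singly_charged_mass(525.76712, 2)
error = ppm_error(mh, 1050.5214)
print(f"[M+H]+ = {mh:.5f}, error = {error:.1f} ppm")
```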
What is the percentage of the sequence that is confirmed by peptide mapping? 88%
Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? FEGDTLVNR; mono; +1; B, Y. Mostly matches up. D peak at 717 is very small and unlabeled, but it looks like there’s a peak approximately there. There’s no N peak at 289, nor an R peak at 175. Also the three smallest peaks don’t match up with anything in the in-silico fragmentation (56, 122, 214).
Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Mostly - the peptides that are not covered in the peptide mapping are either too large (>20 AA) or too small (<5 AA) for confident identification according to the information provided in the lab.
Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.
| | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error |
| --- | --- | --- | --- |
| Molecular weight (Da) | 27,986.60 | 27,984.16 | 87.2 |
This error is close to 50 ppm, so it might be GFP. Especially with the pretty good peptide mapping, I think this is likely GFP, though I am not as confident as I would like to be.
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork
I made most of the big bulls-eye target in the upper right quadrant that occurred fairly early on in the editing time period. I made it largely during the recitation just by having it open in another window from the lecture and clicking another pixel whenever my timer ran out. I didn’t really contribute after that, but it was fun to see how people incorporated the target into one of the scissors handles, and then it ultimately disappeared. For future iterations, I’d really recommend publishing the viewable history link somewhere because I lost it after like a day and so then I wasn’t able to keep watching the changes and how they compared to previous versions.
Part B: Cell-Free Protein Synthesis | Cell-Free Reagents
Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.
E. coli Lysate
BL21 (DE3) Star Lysate (includes T7 RNA Polymerase): Lysate includes enzymes, nutrients, and cofactors; specifically, it includes the T7 RNA polymerase for rapid transcription of genes under T7 promoter.
Salts/Buffer
Potassium Glutamate: Potassium glutamate is a potassium source; potassium is an essential enzyme co-factor.
HEPES-KOH pH 7.5: HEPES buffer maintains the pH at the optimum for enzyme efficiency; transcription and translation usually occur at neutral pH, like the inside of a cell. KOH is the hydroxide used to adjust the buffer to pH 7.5, chosen because potassium phosphate is also used.
Magnesium Glutamate: Magnesium glutamate is a magnesium source; magnesium is an essential enzyme co-factor.
Potassium phosphate monobasic: Potassium phosphate is both a phosphate (energy) source and a potassium (enzyme co-factor) source; I’m unsure why both the monobasic and dibasic are included here. Monobasic is mildly acidic in comparison.
Potassium phosphate dibasic: Potassium phosphate is both a phosphate (energy) source and a potassium (enzyme co-factor) source; I’m unsure why both the monobasic and dibasic are included here. Dibasic is mildly basic in comparison.
Energy / Nucleotide System
Ribose: Ribose is a sugar molecule that is an essential component of nucleic acids and ATP. It’s used in nucleotide production (GMP from guanine) and possibly also energy regeneration.
Glucose: Glucose is a sugar molecule that is used in ATP regeneration.
AMP: Adenosine monophosphate is a nucleotide used in transcription. It gets additional phosphate groups to become ATP, which is essential for energy, and so it is probably also used in ATP regeneration.
CMP: Cytidine monophosphate is a nucleotide used in transcription.
GMP: Guanosine monophosphate is a nucleotide used in transcription.
UMP: Uridine monophosphate is a nucleotide used in transcription.
Guanine: Guanine is the nucleobase of GMP; it can probably be used to make GMP with ribose and phosphate.
Translation Mix (Amino Acids)
17 Amino Acid Mix: Amino acids are needed for translation because they are what proteins are made up of. I’m not sure why this is only 17 instead of 20.
Tyrosine: This is an amino acid needed for translation. I’m unsure why additional tyrosine would need to be added beyond the mix, maybe it’s not one of the 17?
Cysteine: This is an amino acid needed for translation. I’m unsure why additional cysteine would need to be added beyond the mix, maybe it’s not one of the 17?
Additives
Nicotinamide: Nicotinamide is part of NAD+/NADP+, and so is needed for energy regeneration in redox reactions.
Backfill
Nuclease Free Water: It’s an aqueous solution, so water fills out the rest of the reaction volume; nuclease-free water doesn’t contain active nucleases that would cleave DNA or RNA.
Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. The main difference appears to be in energy regeneration and cheaper components: the 20-hour NMP-Ribose-Glucose mix uses nucleotide monophosphates instead of triphosphates, and generates energy from nucleotides and sugars through enzymatic pathways like glycolysis. The 1-hour mix instead uses PEP-Mono (phosphoenolpyruvate, monosodium salt) for energy. PEP-Mono is a high energy phosphate-containing compound that can easily transfer phosphate groups for energy.
Bonus question: How can transcription occur if GMP is not included but Guanine is? GMP is produced with guanine, ribose, and phosphate that is provided separately in the cell-free mixture. not sure which specific enzyme(s) are involved
Part C: Planning the Global Experiment | Cell-Free Master Mix Design
Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. FPbase.org is not currently working, so i’m doing the best i can based off of skimming papers because i don’t have time to do a close reading of a whole bunch of them unfortunately. honestly, i’ll probably just try again later.
sfGFP: sfGFP was developed to be faster at folding into the active, fluorescent shape than wild-type GFP, resulting in a more robust and stable fluorescent protein.
mRFP1: Needs to bind to calcium?
mKO2
mTurquoise2
mScarlet_I
Electra2
Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.
The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24).
Dilution practice 1
The stock concentration of a mystery substance (MS) is 5 M. Calculate how to dilute to 100 µM (0.1 mM).
For the first step, I would use 1 µL of the stock solution diluted into 499 µL of water to make 500 µL of a 10,000 µM solution. Then for the next step, I would use 1 µL of the 10,000 µM dilution, diluted into 99 µL of water to make 100 µL of a 100 µM solution.
Dilution practice 2
The stock concentration of a mystery substance (MS) is 5 M. If the molar mass of MS is 532 g/mol, what’s the concentration of the stock concentration in g/mL?
$$ 5 M = 5 \frac{mol}{L} $$
$$ 5 \frac{mol}{L} * 532 \frac{g}{mol} = 2,660 \frac{g}{L} $$
$$ 2,660 \frac{g}{L}* \frac{1L}{1,000 mL} = 2.66 \frac{g}{mL} $$
You will perform a serial dilution to get 100 uM of MS. Devise a plan to dilute a 5 M MS solution to 100 uM. How many dilution steps will we need? Which tubes should we use? Which pipettes? We will need two empty microtubes. For the first step, we’ll use a P20 for the stock solution, and a P1000 for the water. For the second step, we’ll use a P20 for dilution 1, and a P200 for the water.
graph LR;
A[stock solution 5M] -->|1µL stock into 499µL water| B[dilution 1: 10,000µM]
B -->|1µL dilution1 into 99µL water| C[final dilution: 100 µM]
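The two-step plan in the graph can be checked numerically. This is a minimal sketch with a hypothetical helper name; each step is given as (µL carried over, µL water), and concentrations are kept in µM throughout.

```python
def serial_dilution(stock_conc, steps):
    """Return the concentration after each (aliquot_ul, diluent_ul) step."""
    concentrations = []
    conc = stock_conc
    for aliquot_ul, diluent_ul in steps:
        conc = conc * aliquot_ul / (aliquot_ul + diluent_ul)
        concentrations.append(conc)
    return concentrations


stock_um = 5 * 1_000_000  # 5 M expressed in µM
plan = [(1, 499), (1, 99)]  # (µL carried over, µL water) per step
results = serial_dilution(stock_um, plan)
for step, conc in enumerate(results, start=1):
    print(f"dilution {step}: {conc:g} µM")
```

Changing the tuples in `plan` makes it easy to compare alternatives, for example three gentler steps that avoid pipetting 1 µL.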
Fill out the following chart to prepare a final reaction with 60 uL reaction volume. Why did we make 100 uM MS if we actually need 40 uM MS? Why not prepare 40 uM in serial dilutions?
If we had 40 µM MS, then when we added the loading dye, it would be diluted below 40 µM. So we need to have a high enough concentration of MS, that we can add loading dye to 1X concentration and still reach a final MS concentration of 40 µM.
Lab
Part 1: Mixing Color
I made my stock color solutions by adding dye to approximately 5 ml water in three different 12 ml test tubes: 3 drops of yellow dye, 1 drop of blue dye, 2 drops of red dye, and then vortexing to mix.
Following the protocol, I obtained 6 colors. Step 4 was done with P20 and P200 in steps as described; steps 5 and 6 were done in single steps with the P1000 and P200 respectively.
I made an additional 4 colors as follows:
Lime: 300 ul yellow, 50 ul blue
Teal: 25 ul yellow, 600 ul blue
Coral: 300 ul red, 50 ul yellow, 25 ul blue, 300 ul water
Slate: 100 ul red, 300 ul blue, 300 ul water
My step 7 artwork is below and also the above cover image.
Part 2: Performing Serial Dilution
I don’t know what the Mystery Substance (MS) is supposed to be. I used some purified pUC19 plasmid, at a concentration of 197 ng/ul because that’s something I had available. It’s a double-stranded DNA, so the molecular weight would be around 660 g/mol per base pair, or a total of $660 \frac{g/mol}{bp} * 2.7 kb = 1,800 kg/mol$ approximately. Therefore, my stock concentration is $ 0.197 \frac{g}{L} * \frac{mol}{1,800,000 g} = 1.094E-7 mol/L = 0.11 uM = 110 nM$.
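The unit conversion above can be sketched as a small Python helper. It uses the same approximation of 660 g/mol per base pair and the rounded 2.7 kb plasmid length from the text; the function name is made up for illustration.

```python
AVG_BP_MASS = 660  # g/mol per base pair of double-stranded DNA (the approximation used above)


def dsdna_nanomolar(ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/µL) to a molar concentration in nM."""
    grams_per_liter = ng_per_ul * 1e-3  # 1 ng/µL equals 1 mg/L
    molar_mass = AVG_BP_MASS * length_bp  # g/mol
    return grams_per_liter / molar_mass * 1e9


puc19_nm = dsdna_nanomolar(197, 2700)  # 2.7 kb, rounded as in the text
print(f"{puc19_nm:.1f} nM")
```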
To get a 10 nM (0.01 uM) working stock, for an arbitrarily chosen final concentration of 1 nM, I did the following dilution:
graph LR;
A[0.11 uM stock solution ] -->|4.5 uL stock into 45.5 uL water| B[dilution: 0.01 uM]
Then I made the final solution according to the table. Again, the MS desired concentration was chosen arbitrarily.
| Reagent | Stock concentration | Desired concentration | Volume |
| --- | --- | --- | --- |
| Loading dye | 6X | 1X | 10 µL |
| MS | 10 nM | 1 nM | 6 µL |
| dH2O | n/a | n/a | 44 µL |
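The volumes in the table follow from C1·V1 = C2·V2 for a 60 µL reaction. A minimal sketch, with a hypothetical helper name, that reproduces them:

```python
def volume_needed(stock, desired, total_ul):
    """C1*V1 = C2*V2, solved for V1; stock and desired must share units."""
    return desired * total_ul / stock


total_ul = 60
dye_ul = volume_needed(6, 1, total_ul)   # loading dye, 6X -> 1X
ms_ul = volume_needed(10, 1, total_ul)   # MS, 10 nM -> 1 nM
water_ul = total_ul - dye_ul - ms_ul     # water fills the remainder
print(f"dye {dye_ul:g} µL, MS {ms_ul:g} µL, water {water_ul:g} µL")
```

The same helper covers the practice question: `volume_needed(100, 40, 60)` gives 24 µL of the 100 µM stock, which leaves room for the 10 µL of loading dye.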
I added 20 ul of the final solution to an agarose gel (1% w/v). I made the agarose gel by measuring out 0.5 g of agarose, and adding it to 50 ml of 1x TAE buffer, then microwaving until melted. I poured it into a gel mold with a well comb and let set fully before putting into the electrophoresis set-up to practice loading into a well.
Planning Notes:
i don’t have lambda DNA, but i do have Escherichia coli BL21 genomic DNA and a small collection of various plasmids and PCR products of varying sizes.
we also have a handful of restriction enzymes but not a lot, and mostly not common ones.
i think my strategy is going to be:
sketch out a design
run a restriction digest on the E. coli genomic DNA to get a bunch of different-sized fragments. doesn’t particularly matter which one i think.
run the digest on a gel, and purify out the fragments of the size i want with a Qiagen or NEB kit; note: i am going to have to elute with pretty small volumes to keep them concentrated enough to show up in subsequent gels.
run a new gel with the purified fragments based on the design (possibly augmenting with PCR products if desired for brightness/intensity).
take photo to show off
for whatever reason, neither uploading Genbank files nor downloading accession files for the E. coli genomic assembly in Benchling is working for me. i suspect it probably has to do with the size of the files and speed (or lack thereof) of my internet. so i can’t do much in-silico planning and testing. but i think my plan will work without it. it just means i’ll have to do more testing during instead of thinking/planning prior.
Lab Prep:
Sketch out a design. I found a photo of the Portland skyline with Mt. Hood in the background from the City of Portland’s Instagram. Photo credit: @james.is.jumbled. I traced the lines of primary visual components to get a line art style drawing, and then split it into a grid of 16 columns, for the 16 wells for the largest gel comb I have available. I recreated the gridded line art, scaled to a printout of the 1kb+ gel ladder, to approximate the size DNA fragments I would need in each column.
Restriction digest E. coli gDNA.
10 ul E. coli BL21 gDNA (125 ng/ul)
5 ul rCutSmart buffer
1 ul MspI (2018)
1 ul SpeI-HF (2015)
1 ul XbaI (2015)
2 ul NdeI (2009)
34 ul ultra-pure water
Combined the above components in a microtube (50 ul total reaction volume) and vortexed to mix. Incubated at 37C for an hour. Note that all enzymes are from NEB and are all past their expiration dates, but have been stored in a -20C freezer the whole time.
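A quick tally of the recipe is a useful sanity check before pipetting. The sketch below only adds up the volumes exactly as listed above. As listed they come to 54 ul, not the stated 50 ul, so either one listed volume or the stated total is slightly off.

```python
# Volumes in ul, copied from the recipe as listed.
digest_ul = {
    "E. coli BL21 gDNA (125 ng/ul)": 10,
    "rCutSmart buffer": 5,
    "MspI": 1,
    "SpeI-HF": 1,
    "XbaI": 1,
    "NdeI": 2,
    "ultra-pure water": 34,
}

total_ul = sum(digest_ul.values())
dna_ng = digest_ul["E. coli BL21 gDNA (125 ng/ul)"] * 125
enzyme_ul = sum(digest_ul[name] for name in ("MspI", "SpeI-HF", "XbaI", "NdeI"))

print(f"total volume: {total_ul} ul")
print(f"DNA in reaction: {dna_ng} ng")
print(f"enzyme fraction: {enzyme_ul / total_ul:.0%}")
```

The enzyme fraction is printed because a common rule of thumb keeps total enzyme volume at or below about 10% of the reaction, to limit glycerol carried over from the enzyme storage buffer.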
Gel purification of DNA fragments. I re-used an old gel for this first run. I combined 3ul of ladder with 2ul SYBR Green I dilution (diluted 1:50) and around 0.5ul loading dye on a scrap of parafilm, and loaded this mixture into well 5. I added 6ul of SYBR Green I dilution into the restriction digest along with 10ul of loading dye. I loaded around 33ul each into 2 lanes. I ran this electrophoresis for 40 minutes at 180 V.
Lanes:
1kb+ ladder (NEB)
Multi-enzyme digested E. coli gDNA
Multi-enzyme digested E. coli gDNA
PCR product
PCR product
PCR product
It was just smears, which I suppose isn’t too surprising, considering that I started with gDNA and all my enzymes were expired. From this gel, I cut out smears from the multi-enzyme digests at the following ranges: 1-0.1kb, 0.7-0.1kb. Using a Qiagen Qiaquick gel purification kit, I purified these smears individually. All purifications were eluted with 30 ul of elution buffer. I added these to tubes of PCR products for my gel art palette.
To another gel, I loaded the following into the wells, mixing each with 1ul of loading dye on parafilm prior to loading:
ladder
10ul A
2ul A
1ul A
linearized plasmid
15ul B
2ul B
1ul B
5ul A
These are not super clear, but I cut out additional smears from lanes 1, 6, and 9 at the following ranges: 0.5-0.1kb, 0.2-0.1kb. Eluted these with 25ul of elution buffer. Added these to my palette above: tubes .
This left me with the following palette (all sizes and size ranges are approximate):
A. PCR product: 6kb
B. PCR product: 5kb
C. PCR product: 4kb
D. PCR product: 3kb
E. PCR product: 1kb
F. PCR product: 700bp
G. PCR product: 650bp
H. PCR product: 500bp
J. PCR product: 200bp
K. PCR product: 100bp
L. smear from 100bp-1kb
M. smear from 100bp-700bp
N. smear from 100bp-500bp
O. smear from 100bp-200bp
Note that J and K are low concentration, and the smears didn’t show up well on the test plate, so I’m going to use larger volumes of those than I am for the rest.
I re-drew my gridded lineart with the PCR products that I know I have.
Gel Art lab
I cast a new electrophoresis gel by dissolving 1.3g agarose in 130ml 1x TAE, and pouring into a larger gel mold. This fit a comb with 16 wells. I allowed this to set before transferring into an electrophoresis set-up filled with 1x TAE. I loaded the following combinations into the wells, mixing each on parafilm with both 2ul of SYBR Green I (50x dil) and appropriate volumes of loading dye, prior to loading. PCR products were 2ul each, except J and K which were 4ul each. Smears (L, M, N, O) were all in the range of 4-10ul per well.
ladder
empty well
E, L, I
J, O
H, N
D, I, O
C, F, M, H, I
B, F, M
A, J, O
B, I, O
C, J, O
D, F, M
J, O
G, N, I
H, N, J
empty well
Ran gel at 200V for around a half hour.
Not all the bands are the same brightness, which I can probably attribute to the variable DNA concentration of my various PCR products. It also looks like I must’ve mixed up the 4kb and 5kb tubes. None of the smears showed up at all, which was a little disappointing. Overall though, the art turned out pretty well, I think, even if it was more trial and error than in-silico design and then execution.
We started off our node’s discussion of the neuromorphic circuits based off of a couple example circuits developed by TA Steven with help from ClaudeAI. Because I feel like I don’t really understand the analog vs binary computing, I was most interested in the design that explored that aspect.
Option C: Competing Inhibitors
Concept: One dominant ERN (CasE, high dose) controls the whole network. It kills both Csy4 and the green output directly.
CasE (strong) ──kills──▶ Csy4 (weak, dies) ──kills──▶ mNeonGreen (green, OFF)
Csy4 (dead) ──can’t kill──▶ PgU (survives, but has nothing to do)
| Group | Plasmid | Amount | Role |
| --- | --- | --- | --- |
| X1 | CasE | 200 ng | Dominant enzyme |
| X1 | eBFP2 | 50 ng | Blue control light |
| X2 | Csy4_rec_CasE | 100 ng | Csy4, killed by CasE |
| X2 | mMaroon1 | 50 ng | Maroon control light |
| X3 | PgU_rec_Csy4 | 100 ng | PgU, freed because Csy4 is dead |
| Bias | CasE_rec_mNeonGreen | 150 ng | Green, killed by CasE |
Expected result: Blue ON, Maroon ON, Green OFF
Why it’s interesting: Shows that dosage (ng amounts) determines who wins. You could run a second experiment with CasE reduced to 50 ng to see if the outcome changes — demonstrating the analog nature of the circuit.
Worth noting that there was significant confusion over the way that Claude worded the “Roles” of the Enzyme_rec_output constructs. The correct interpretation is as follows: CasE_rec_mNeonGreen means that the plasmid encodes for mNeonGreen with a recognition site for CasE (therefore, it is amount of mNeonGreen minus amount of CasE to determine if fluorescent green is present).
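To keep the naming straight, the interpretation above can be written as a tiny parser. This is only a sketch of my reading of the names: the last element is the encoded product and everything before it is an enzyme whose recognition site is on that transcript. Treating a double name like CasE_rec_Csy4_rec_mKO2 as mKO2 with two recognition sites is an assumption, and the function name is hypothetical.

```python
def parse_construct(name):
    """Split an 'ERN_rec_output' style name into (recognized ERNs, encoded product)."""
    *recognized_by, product = name.split("_rec_")
    return recognized_by, product


for name in ["CasE_rec_mNeonGreen", "PgU_rec_Csy4", "CasE_rec_Csy4_rec_mKO2", "eBFP2"]:
    erns, product = parse_construct(name)
    if erns:
        print(f"{name}: encodes {product}, cut by {' and '.join(erns)}")
    else:
        print(f"{name}: encodes {product}, no recognition site")
```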
I was interested in the analog/dosing aspect of this circuit, but I thought it might be more interesting to include a different color, so we would see either Green or a different fluorescence, depending on which enzyme there was more of.
My first attempt looked like this:
| Circuit Name | Transfection Group | Contents | Concentration (ng/ul) | DNA wanted (ng) |
| --- | --- | --- | --- | --- |
| JKScircuit-1 | X1 | PgU | 50 | 200 |
| JKScircuit-1 | X1 | eBFP2 | 50 | 100 |
| JKScircuit-1 | X2 | PgU_rec_Csy4 | 50 | 50 |
| JKScircuit-1 | X2 | mMaroon1 | 50 | 100 |
| JKScircuit-1 | Bias | PgU_rec_mNeonGreen | 50 | 100 |
| JKScircuit-1 | Bias | CasE_rec_Csy4_rec_mKO2 | 50 | 100 |
I figured that if there was more PgU than Csy4, then it would output orange. But if there was more Csy4 than PgU then it would output green. This is because PgU subtracts from mNeonGreen output, Csy4 subtracts from mKO2 output, and PgU subtracts from Csy4 output. So if there is high PgU, then mNeonGreen is not expressed; there is not enough Csy4 to compete with the PgU, and since there is no Csy4, there is nothing to inhibit the mKO2 output. If there is high Csy4, then there is not enough PgU to inhibit mNeonGreen (because it is mostly used up in competing with Csy4), and the remaining Csy4 inhibits mKO2. I figured that since I don’t have any CasE expressed in my system, it doesn’t matter that CasE could also inhibit mKO2. eBFP2 and mMaroon1 are controls to check for transfection efficiency. I had the unbalanced DNA amounts because that tests the analog computing that I was interested in.
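My reasoning can be written down as a toy subtraction model in Python. This is only a sketch of the logic in the paragraph above, not a prediction: it treats ng of plasmid as if it were enzyme activity and assumes each enzyme removes an equal amount of its target, which real cells almost certainly don't do. The function name and default amounts are made up for illustration.

```python
def toy_circuit(pgu_ng, csy4_ng, green_ng=100, mko2_ng=100):
    """Very rough subtractive model: each enzyme removes an equal 'amount' of its target."""
    csy4_active = max(0, csy4_ng - pgu_ng)   # PgU cuts the Csy4 transcript
    pgu_free = max(0, pgu_ng - csy4_ng)      # PgU not tied up competing with Csy4
    green = max(0, green_ng - pgu_free)      # free PgU cuts mNeonGreen
    orange = max(0, mko2_ng - csy4_active)   # surviving Csy4 cuts mKO2
    return {"mNeonGreen": green, "mKO2": orange}


high_pgu = toy_circuit(pgu_ng=200, csy4_ng=50)   # the dosing in my first attempt
high_csy4 = toy_circuit(pgu_ng=50, csy4_ng=200)  # the reversed dosing
print(high_pgu)
print(high_csy4)
```

In this toy model, high PgU leaves only the orange output and high Csy4 leaves only the green output, which is the switch I was hoping to see experimentally.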
Unfortunately, this first attempt gave an error when I tried to put it into the Neuromorphic Wizard tool. Looking through our forum discussion post, I was able to somewhat troubleshoot off of this explanation TA Steven got from ClaudeAI when he was trying to troubleshoot Jessica Wu’s circuit:
Why it failed:
The Predict API only accepts X1 + X2 groups (no OR, NOT, Bias etc.)
Each group must have exactly: marker + ERN + ERN_rec_output
Based off this, I changed the Transfection Group for the last two lines from Bias to X1 or X2; because even though that’s what was in the example given in the lab, this suggested that the Neuromorphic Wizard API could only accept X1 and X2. So my second attempt looked like this:
| Circuit Name | Transfection Group | Contents | Concentration (ng/ul) | DNA wanted (ng) |
| --- | --- | --- | --- | --- |
| JKScircuit-2 | X1 | PgU | 50 | 200 |
| JKScircuit-2 | X1 | eBFP2 | 50 | 100 |
| JKScircuit-2 | X2 | PgU_rec_Csy4 | 50 | 50 |
| JKScircuit-2 | X2 | mMaroon1 | 50 | 100 |
| JKScircuit-2 | X2 | PgU_rec_mNeonGreen | 50 | 100 |
| JKScircuit-2 | X1 | CasE_rec_Csy4_rec_mKO2 | 50 | 100 |
Unfortunately, this still gave an error. I suspect that the PgU_rec_Csy4 and PgU_rec_mNeonGreen being in the same transfection group is outside the bounds of what the API could accept, but biologically, I don’t think this would be a problem. Essentially, I am using PgU_rec_Csy4 as an ERN instead of as an ERN_rec_output. The proper solution is probably to put PgU_rec_mNeonGreen into an X3 group rather than in X2 with the PgU_rec_Csy4, but we didn’t have enough colors to use as transfection controls since I wanted to use two colors as outputs.
TA Steven put my circuit into ClaudeAI to troubleshoot as well, and gave me this feedback:
Fixed. Here’s what changed:
| Original | Fixed | Why |
| --- | --- | --- |
| X2: PgU_rec_Csy4 (output, no ERN) | X2: Csy4 (ERN) | X2 needs a standalone ERN |
| X2: mMaroon1 | X2: mKO2 | API only recognizes eBFP2/mKO2 as markers |
| X1: CasE_rec_Csy4_rec_mKO2 | X1: Csy4_rec_mNeonGreen | No CasE source existed; output should be mNeonGreen |
This is an asymmetric cross-inhibition — PgU is dosed 4x heavier than Csy4, so X1 should dominate the competition. The heatmap should show stronger suppression along the X1 axis.
Total DNA: 600ng (under limit).
This is frustrating because I’m pretty sure this is a limitation of the Neuromorphic Wizard API, and my circuit is biologically sound. The first one is what I suggested originally, that I am using PgU_rec_Csy4 as an ERN instead of as an ERN_rec_output. I’m not sure why X2 needs a standalone ERN. The second one is just that it doesn’t accept mMaroon1 as a marker, which is odd because it’s listed on the Parts list as an option. The third one is what I explained earlier; that since I don’t have any CasE expressed in my system, it doesn’t matter that CasE could also inhibit mKO2.
While I would have liked to see experimentally what would happen with my design, since I do think it’s valid biologically, we wanted to submit validated circuits only, since each node could only submit two circuits. So TA Steven submitted my circuit that was fixed by Claude because it was able to give a valid output on the Predict tab of the Neuromorphic Wizard.
Results:
I’m honestly not sure if this shows what I’d expect. I’m unclear on what these heatmaps are actually showing. Like I know that each dot is a cell that was transfected with the same things, but I’m not sure what exactly that means in regards to my circuit. I think I probably need to rewatch the lecture for clarity.
Week 9 Lab: Cell-Free Systems
Unfortunately, this lab is not available for remote participation.
Week 10 Lab: Mass Spectrometry
Unfortunately, this lab was not available for remote participation. See this week’s homework for data analysis.
Notebook
This page logs the notes, thoughts, brainstorming, and planning associated with my individual final project.
Feb 24, 2026
Brainstorming:
Identification of PhaC analog in Cyanobacterium aponinum UTEX 3222 and overproducing or engineering for increased efficiency
BLAST/align with known PHA-synthases
Compare efficiency / mutations that improved turnover in other PhaC - test analogous mutations (aligned location, similar or different AAs). improved substrate specificity?
Site-specific saturation mutagenesis? Would be good use for automation
Quorum sensing based killswitch (i.e. cell dies if it escapes bioreactor)
Has to have some kind of inducible element or won’t grow after initial transformation
What’s good at quorum sensing already?
Something else??? Something in E coli that can be done on Opentrons
Because it’s more convenient for a final project to be executed in Victoria remotely
Cyanobacterial expression plasmid across multiple cyano species
needs to include E coli machinery for manipulation and production (and conjugation, for relevant species)
Ideas:
PhaC protein engineering
Short term aim: Design small library of PhaC variants with expected improvement
Medium term aim: Generate library and test in chassis strain
Long term aim: Develop PHB bio-manufacturing cyanobacterial strain for carbon-neutral/carbon-negative plastic (depending on biodegradation).
Quorum sensing based circuit for biocontainment
Short term aim: Design killswitch with genetic circuit to trigger based on quorum sensing.
Medium term aim: Build genetic circuit with expression based on quorum sensing with a measurable output; test circuit in E. coli.
Long term aim: Optimize circuit sensitivity and test with killswitch expression; integrate into bio-manufacturing chassis strains for population-linked biocontainment.
Broad cyanobacterial expression plasmid
Short term aim: Design plasmid backbone based off native cyanobacterial plasmids and established E. coli machinery.
Medium term aim: Test expression in multiple cyanobacterial strains (including some previously considered genetically intractable with classic broad-host-range vectors).
Long term aim: Establish protocol for domestication of newly prospected, wild-type cyanobacterial strains using the cyanobacterial plasmid.
Mar 31, 2026
Leaning towards quorum sensing killswitch because it’s more aligned with my prior experience and knowledge, so i think it will take less research on my behalf. since i’m already falling behind on homeworks, i’m worried about how much time it would take to optimize a protein since i have no prior machine learning experience.
Quorum sensing notes:
auto-inducer: triggers expression
keep in mind phenolic compounds and other naturally occurring quorum quenching
also keep in mind the potential for auto-inducer production from other related bacteria; like if biomanufacturing strain escapes bioreactor but lands in soil with existing microbiome - we still want the escaped cells to die
maybe we could make a synthetic quorum sensing system for orthogonality: would require biosynthetic pathway for auto-inducer, auto-inducer recognition (inducible promoter, transcription factor, riboswitch, etc.), auto-inducer export pathway to preferentially or rapidly diffuse out of the cell
for killswitch activation at low population (cells escaped from bioreactor), maybe consider secondary/back-up activation from an environmental signal
potentially test circuit with fluorescence or colorimetric output first before killswitch/toxin-antitoxin genes
References
Miguel, CMTS; Santos, CA; Lima, EMF; et al. Quorum Sensing in Bacteria: From Mechanisms to Applications in Foods. 2026. Current Opinion in Food Science: 101394. DOI: 10.1016/j.cofs.2026.101394.
maybe autoinducer can also trigger expression of toxin repressor
killswitch trigger for population low (like Paula’s idea for targeted drug delivery):
low constitutive expression of antitoxin
autoinducer-triggered expression of toxin
maybe autoinducer can also trigger expression of antitoxin repressor
Apr 2, 2026
In Victoria node recitation last night, Derek suggested testing the system cell-free so that i could use the Ginkgo cloud lab instead. Originally i had figured that because i want it to be a killswitch system, it needs to be in a living cell. Also because traditional quorum sensing systems depend on autoinducer concentration inside vs outside the cell membrane. But Derek suggested considering instead how to design a quorum sensing system that would be cell-free, like on a paper biosensor: one that triggers at a minimum concentration, rather than triggering at the first cell it sees.
Phrasing it that way made me think of the analog computing of the neuromorphic circuits: where inputs are additive (positive or negative). So to reach a minimum concentration rather than the very first thing present, there has to be a counter-actor to the sensed thing present; something that degrades or binds to the autoinducer or signalling metabolite that is expressed constitutively at a low level. So when the signalling molecule concentration is low (low population), it will be all used up by the counter-actor before it can trigger expression of the QS-controlled genes. When the signalling molecule concentration is high (high population), it will outnumber the counter-actor, so it can still trigger the QS-controlled genes.
For example, the sensor could be a riboswitch that recognizes a small-molecule metabolite. The metabolite is produced and exported by cells, and when present, activates the gene of interest (in my case, a killswitch or fluorescent protein). We’d also want to express the riboswitch aptamer on its own, unconnected to the gene of interest, so that it binds up the metabolite at low concentrations. Only once the metabolite reaches a high enough concentration to outnumber the loose aptamer can it trigger the riboswitch and activate gene expression.
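The thresholding idea above (a constitutive decoy soaking up the signal until the signal outnumbers it) can be sketched with a simple 1:1 equilibrium binding model. Everything here is an assumption for illustration: the function names, the arbitrary concentration units, and the simple saturating activation by free signal are not measured values or part of any existing tool.

```python
import math


def free_signal(total_signal, total_decoy, kd):
    """Free signal left after 1:1 equilibrium binding to a decoy.

    All three arguments share the same (arbitrary) concentration units.
    Solves the standard quadratic for the signal-decoy complex.
    """
    b = total_signal + total_decoy + kd
    bound = (b - math.sqrt(b * b - 4.0 * total_signal * total_decoy)) / 2.0
    return total_signal - bound


def output_fraction(total_signal, total_decoy, kd, k_act):
    """Fraction of maximal output, assuming saturating activation by free signal."""
    free = free_signal(total_signal, total_decoy, kd)
    return free / (k_act + free)


# Hypothetical numbers: decoy fixed at 10 units, tight binding (kd = 0.01).
for signal in (1, 5, 9, 11, 15, 20):
    frac = output_fraction(signal, total_decoy=10, kd=0.01, k_act=1.0)
    print(f"signal={signal:>2}  output fraction={frac:.3f}")
```

With tight binding, the output stays near zero while the signal is below the decoy level and rises once it exceeds it, so the decoy expression level sets the population threshold. Weaker binding (larger `kd`) would soften that threshold.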
Apr 3, 2026
Brainstorming and design
Drafts
Title:
Population-dependent killswitch to prevent bioreactor escape
Short description:
Biocontainment is essential for safe biomanufacturing, but most strains with biocontainment have bespoke systems designed for that particular strain. A population-dependent killswitch would kill any cells that escape the bioreactor where they are being cultivated or harvested. My initial idea is a toxin-antitoxin system expressed under control of a quorum-sensing circuit. Future considerations: safeguards against biocontainment escape through mutation, multiple levels of regulation.
Aims:
Design a genetic circuit that controls expression dependent upon cell population density in E. coli. The circuit will be designed with the intent for a final use with a killswitch, but fluorescent or colorimetric outputs might be used for initial design and validation. Validate the circuit with a simulation in Asimov Kernel.
Test circuit in E. coli with a measurable output (such as fluorescence).
Test circuit with killswitch; integrate into a biomanufacturing chassis strain for population-linked biocontainment.
Companies:
Asimov - I plan on using Kernel to design and simulate my genetic circuit.
Basecamp Research - Maybe their AI can help me design overlapping genes to prevent killswitch escape via toxin gene mutation.
Cultivarium - If successful, quorum-based biocontainment could be a useful genetic tool to port to new potential chassis microbes.
Project idea slide
References:
Leonard SP; Halvorsen TM; Lim B; et al. Synthetic overlapping genes stabilize genetic systems. 2026. mBio, 17(3):e0272525. DOI: 10.1128/mbio.02725-25.
Blazejewski, T; Ho, H-I; Wang, HH. Synthetic sequence entanglement augments stability and containment of genetic information in cells. 2019. Science, 365(6453): 595-598. DOI: 10.1126/science.aav5477
Last night in the Victoria node recitation, Derek was really talking about how cool the Kernel-Twist-Nebula-Waters pipeline is, and he mentioned that he was a little disappointed that not many projects seemed like they were fully utilizing it. Especially since he still isn’t back in Victoria, it seems like the only way i’ll be able to get any actual lab data unfortunately, so that’s different from my original plan with the quorum sensing. i ended up messaging Derek on Discourse to ask if it was too late to change my mind, and he said that while technically yes, since no one had signed up on my slide as a mentor yet, i could change it. So i went ahead and came up with a new slide and replaced my original response.
Description:
Companies:
References
These are for using mass spectrometry for PHA analysis, for the Waters step of the pipeline.
Khang, TU; Kim, M-J; Yoo, JI; et al. Rapid analysis of polyhydroxyalkanoate contents and its monomer compositions by pyrolysis-gas chromatography combined with mass spectrometry (Py-GC/MS). 2021. International Journal of Biological Macromolecules, 174: 449-456. DOI: 10.1016/j.ijbiomac.2021.01.108
Johnston, B; Radecka, I; Chiellini, E; et al. Mass spectrometry reveals molecular structure of polyhydroxyalkanoates attained by bioconversion of oxidized polypropylene waste fragments. 2019. Polymers, 11(10):1580. DOI: 10.3390/polym11101580
Conners, EM; Bose, A. State-of-the-art methods for quantifying microbial polyhydroxyalkanoates. 2025. ASM Applied and Environmental Microbiology, 91(9):e00274-25. DOI: 10.1128/aem.00274-25
These are for machine learning for enzyme engineering.
Satoh, Y; Tajima, K; Tannai, H; et al. Enzyme-catalyzed poly(3-hydroxybutyrate) synthesis from acetate with CoA recycling and NADPH regeneration in vitro. 2003. Journal of Bioscience and Bioengineering, 95(4): 335-341. DOI: 10.1016/S1389-1723(03)80064-6
Apr 9, 2026
Talking to Derek about the timeline in our Victoria node recitation last night, he suggested that everything for lab work will probably need to be ordered in the next two weeks to get data in time for final project presentations. He also said that if we want to run anything on the Ginkgo Nebula cloud lab, we need to talk with Ronan and see if he has the capacity for it. Given this timeline, i am almost definitely not going to be able to figure out any ML-guided protein engineering before the final ordering. What i’m thinking instead is to design initial constructs of PhaC from C. necator, PhaC from UTEX 3222, and a rational design for a UTEX 3222 PhaC mutant, all designed for cell-free expression. The reaction will probably include the monomer, since it’s simpler than using the full 5-enzyme cell free system from Satoh et al (reference 8 above) that used acetate as the feedstock, but I will need to double check the energy and CoA regeneration.
Apr 10, 2026
Enzyme sequence choices
PhaC enzyme from Cupriavidus necator was chosen as my wild-type. I used the amino acid sequence from Uniprot and codon-optimized it in Benchling for Escherichia coli. This was from homework 2 i think, and i just used that one.
For the mutant: I found a review paper (Ref1) that identified Ala510 in PhaC_Cnecator as having a role in substrate specificity. A510M, A510Q, and A510C all increased promiscuity (M and C are both sulfur-containing residues; Q and C are both polar; M and Q are both larger than A). In a related PhaC from Chromobacterium sp. USM2, changing the analogous A to M, W, or V (all non-polar residues larger than A) also increased promiscuity, while changing it to S (similar size, but polar) increased substrate specificity towards short-chain-length PHAs, like PHB. It’s surprising that A->S had the opposite effect from A->C in these two PhaC enzymes from different bacteria. But since I didn’t have time to read a lot more, I figured A510S was a good construct to test against the PhaC_Cnecator wild type to start with.
I tried to identify the PhaC sequence from UTEX 3222 to test as well, but so far I have been unable to. While the paper in which UTEX 3222 was prospected said the authors identified the genes encoding the PHA biosynthesis enzymes (Ref2), the genes weren’t annotated on the full genome sequence assembly, and I got no results BLASTing with C. necator PhaC, with cyanobacterial PhaC from Synechocystis sp. PCC 6803, or with PhaC from the more closely related Microcystis aeruginosa sp. PCC 7608SL. I also tried BLASTing PhaE (another PHA biosynthetic enzyme) from a few different cyanobacterial strains, still with no results. All BLAST searches were tBLASTn, which searches the nucleotide genome assembly using the amino acid sequences of the known PhaC and PhaE proteins as queries. The genes were also not listed among the biosynthetic genes in the Supplementary Information of the UTEX 3222 paper.
I was out of ideas at this point, and on a time constraint, so to get constructs added to the order list today, I decided to move forward with just PhaC_Cnwt and PhaC_CnA510S for now. If/when I get my ML design program working, maybe I can email George Church (or whoever on the author list did the genome annotation) to try to run it through, but I can definitely start with the C. necator one.
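For context on what a tBLASTn search does under the hood: the nucleotide subject is translated in all six reading frames, and the protein query is compared against each translation. The sketch below illustrates only that six-frame translation step, using the standard genetic code and a toy sequence. It uses exact substring matching, which is an illustrative simplification; real tBLASTn uses scored local alignment, so it can find divergent homologs that this would miss. All function names are made up for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Translations of all six reading frames, keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_peptide(dna, peptide):
    """Names of the frames whose translation contains the exact peptide."""
    return [name for name, protein in six_frames(dna).items() if peptide in protein]


# Toy example: a short peptide encoded on the reverse strand.
toy_genome = "GGGGAGGCCATA"
print(six_frames(toy_genome))
print(find_peptide(toy_genome, "MAS"))
```

Because the search covers both strands and all offsets, a missing annotation in the assembly should not by itself hide a gene from this kind of search; a lack of hits points more towards sequence divergence, search settings, or the gene not being in the assembly searched.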
Construct design
Derek sent me a message asking me to order constructs today. So I went into Kernel to design PhaC_Cnecator with a T7 promoter, RBS, and terminator for cell-free expression, because the E. coli based cell-free expression kits I found online from NEB and Thermo Fisher both use T7 polymerase. In Kernel, I used a T7 promoter, T7 RBS, and T7 terminator from the iGEM repository. I chose promoter BBa_Z0251 from the many options because it had a lot of documentation on its iGEM registry page and matched the full consensus sequence (from the T7 promoters iGEM page). I chose RBS BBa_Z0261 because it was analyzed by the same iGEM team as the promoter I used. I chose T7 terminator BBa_K731721 because it most closely matched the T7 terminator sequence from a quick Google search. While Kernel did have a genetic part for PhaC_Cnecator in the Uniprot repository, it didn’t have a nucleotide sequence associated with it. So i copied the promoter, RBS, and terminator sequences from Kernel into Benchling, where I used the previously codon-optimized PhaC_Cnecator sequence from homework 2. I used Benchling’s translation tool to identify the Ala at position 510, and changed a single nucleotide to mutate A510 to S (Ala: GCC; Ser: TCC) for the mutant.
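The single-nucleotide Ala-to-Ser change described above can be checked programmatically. This is a minimal sketch using the standard genetic code and a toy coding sequence (not the real PhaC sequence); the function names are hypothetical and this is not how Benchling does it internally.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_swaps(codon, target_aa):
    """All codons one nucleotide away from `codon` that encode `target_aa`."""
    hits = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                candidate = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[candidate] == target_aa:
                    hits.append(candidate)
    return hits


def mutate_residue(cds, position, expected_aa, target_aa):
    """Change one nucleotide so residue `position` (1-based) becomes `target_aa`."""
    start = 3 * (position - 1)
    codon = cds[start:start + 3]
    if CODON_TABLE[codon] != expected_aa:
        raise ValueError(f"residue {position} is {CODON_TABLE[codon]}, not {expected_aa}")
    options = single_base_swaps(codon, target_aa)
    if not options:
        raise ValueError(f"no single-nucleotide change turns {codon} into {target_aa}")
    return cds[:start] + options[0] + cds[start + 3:]


# Toy CDS encoding M-A-K-stop; mutate residue 2 from Ala to Ser.
toy_cds = "ATGGCCAAATAA"
print(single_base_swaps("GCC", "S"))
print(mutate_residue(toy_cds, 2, "A", "S"))
```

Checking the expected residue before editing guards against an off-by-one in the position numbering, for example if the sequence includes or omits the start codon relative to the numbering used in the paper.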
Then I exported the FASTA files for both constructs and uploaded them into the Twist portal for clonal genes. I couldn’t remember if it mattered whether I used linear gene fragments or clonal genes within a plasmid, but I went with clonal plasmids because, in my previous experience, linear fragment orders from Twist arrived at pretty low concentrations. I remembered from lecture that Ronan preferred us to use chloramphenicol for an antibiotic marker if needed, so I decided to use the pTwist-Chlor-HighCopy cloning vector. Twist’s interface found both genes to be complex, so I used its internal codon optimization to fix this issue: I identified the organism as E. coli, did not omit any restriction enzyme recognition sites, and selected the promoter, RBS, and terminator regions as sequences that should not be changed. To my surprise, the two optimized sequences differed at more than just the one point mutation; they were optimized differently, but I suppose it doesn’t really matter. Then I exported the GenBank files for the full constructs (including plasmid) from Twist and re-uploaded them into Benchling to generate the links to add to the spreadsheet.
After meeting with Derek to explain, he suggested using linear gene fragments instead of clonal, so I re-did the Twist ordering bit to generate prices and optimized fragment GenBank files. I elected to leave the adaptors on because I assume those will give long enough arms, but I’m not really sure. Derek said he’d check with Ronan.
Experimental design
I checked that Millipore Sigma does in fact carry my substrate, I think. The substrate is the PHB monomer: 3-hydroxybutyryl-CoA. However, it’s very expensive, so I looked into also ordering the DNA and cheaper substrates for the 5-enzyme biosynthetic pathway with CoA recycling that was in one of the papers I found (Ref3). After comparing the cost of even just the substrates I’d still need for the full pathway, it’s cheaper to order the original substrate (since I’d still need at least a little bit of CoA, which is expensive on its own) than to also get a bunch of additional DNA. Derek mentioned that I need to figure out what kind of purification is needed for Waters to analyze my PHB product at the end of the reaction.
Reference
Chek, MF; Hiroe, A; Hakoshima, T; et al. PHA synthase (PhaC): interpreting the functions of bioplastic-producing enzyme from a structural perspective. 2018. Applied Microbiology and Biotechnology, 103: 1131-1141. DOI: 10.1007/s00253-018-9538-8
Schubert, MG; Tang, T-C; Goodchild-Michelman, IM; et al. Cyanobacteria newly isolated from marine volcanic seeps display rapid sinking and robust, high-density growth. 2024. ASM Applied and Environmental Microbiology, 90: e00841-24. DOI: 10.1128/aem.00841-24
Satoh, Y; Tajima, K; Tannai, H; et al. Enzyme-catalyzed poly(3-hydroxybutyrate) synthesis from acetate with CoA recycling and NADPH regeneration in vitro. 2003. Journal of Bioscience and Bioengineering, 95(4): 335-341. DOI: 10.1016/S1389-1723(03)80064-6
Apr 13, 2026
After Derek talked to Ronan, he suggested going back to clonal genes. I used my original PhaC_Cn construct, but then I decided to just have the point mutation for the mutant and have the rest of the sequence be identical. So I copied the PhaC_Cn-pTwist construct into a new DNA sequence in Benchling, and made the point mutation (GCA->TCA), and then verified quickly in Twist that this sequence is still simple and the same price.
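To confirm that the wild-type and mutant constructs really do differ at only the intended position, the two sequences can be compared base by base. A minimal sketch with toy sequences follows; the helper names are made up for illustration, and it assumes both sequences are the same length (true for a substitution, not for an insertion or deletion).

```python
def mismatches(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length for a position-wise comparison")
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [(i + 1, a, b) for i, (a, b) in enumerate(pairs) if a != b]


def is_single_point_mutant(wild_type, mutant):
    """True if the two sequences differ at exactly one nucleotide."""
    return len(mismatches(wild_type, mutant)) == 1


# Toy sequences standing in for the wild-type and mutant constructs.
wild_type = "ATGGCAAAATAA"
mutant = "ATGTCAAAATAA"
print(mismatches(wild_type, mutant))
print(is_single_point_mutant(wild_type, mutant))
```

Running the same comparison on the two separately optimized sequences from the earlier Twist order would list every position where the optimizer made different choices, which is how the unexpected extra differences would show up.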