Jason Ross — HTGAA Spring 2026

About me

Passionate about space-based biotechnology applications and strengthening biosecurity. Excited to grow my technical acumen through this course!

Contact info

Please feel free to reach out via the HTGAA Discourse forum at 2026a-jason-ross!

Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
A glorious space phage (with artistic license lol) Week 1 Biological Engineering Application Governance Exercise I’m interested in developing phage chassis capable of targeting bacteria that commonly cause infections during spaceflight. I’m interested in developing these phage chassis because:
Week 2 HW: DNA Read, Write, and Edit
Part 0: Basics of Gel Electrophoresis Per instructions for this part, I attended the 02/10 lecture and 02/11. Additionally I attended all 3 Bootcamp sessions. Part 1: Benchling & In-silico Gel Art Make a free account at benchling.com Benchling Account Creation Confirmation
Week 3 HW: Lab Automation
The Power of Lab Automation Assignment: Python Script for Opentrons Artwork 0: Attended this week’s recitation and reviewed the lab information on programming Opentrons 1: Generated an artistic design using Ronan’s Opentrons GUI 1 2: Artistic Design Python Script: See script in URL below: https://colab.research.google.com/drive/1-pgSJt_aF9MydtG0szxz2YKoogNRLRhH#scrollTo=PsOgJ2DndZzt 3: Listing my sfgfp point coordinates from Ronan’s Opentrons GUI below (the shape is a rightward-facing green arrow): [(6.6,11), (8.8,11), (11,11), (8.8,8.8), (11,8.8), (13.2,8.8), (11,6.6), (13.2,6.6), (15.4,6.6), (13.2,4.4), (15.4,4.4), (17.6,4.4), (15.4,2.2), (17.6,2.2), (19.8,2.2), (17.6,0), (19.8,0), (22,0), (-22,-2.2), (-19.8,-2.2), (-17.6,-2.2), (-15.4,-2.2), (-13.2,-2.2), (-11,-2.2), (-8.8,-2.2), (-6.6,-2.2), (-4.4,-2.2), (-2.2,-2.2), (0,-2.2), (2.2,-2.2), (4.4,-2.2), (6.6,-2.2), (8.8,-2.2), (11,-2.2), (13.2,-2.2), (15.4,-2.2), (17.6,-2.2), (19.8,-2.2), (22,-2.2), (24.2,-2.2), (-22,-4.4), (-19.8,-4.4), (-17.6,-4.4), (-15.4,-4.4), (-13.2,-4.4), (-11,-4.4), (-8.8,-4.4), (-6.6,-4.4), (-4.4,-4.4), (-2.2,-4.4), (0,-4.4), (2.2,-4.4), (4.4,-4.4), (6.6,-4.4), (8.8,-4.4), (11,-4.4), (13.2,-4.4), (15.4,-4.4), (17.6,-4.4), (19.8,-4.4), (22,-4.4), (24.2,-4.4), (26.4,-4.4), (-22,-6.6), (-19.8,-6.6), (-17.6,-6.6), (-15.4,-6.6), (-13.2,-6.6), (-11,-6.6), (-8.8,-6.6), (-6.6,-6.6), (-4.4,-6.6), (-2.2,-6.6), (0,-6.6), (2.2,-6.6), (4.4,-6.6), (6.6,-6.6), (8.8,-6.6), (11,-6.6), (13.2,-6.6), (15.4,-6.6), (17.6,-6.6), (19.8,-6.6), (22,-6.6), (24.2,-6.6), (17.6,-8.8), (19.8,-8.8), (22,-8.8), (15.4,-11), (17.6,-11), (19.8,-11), (13.2,-13.2), (15.4,-13.2), (17.6,-13.2), (11,-15.4), (13.2,-15.4), (15.4,-15.4), (8.8,-17.6), (11,-17.6), (13.2,-17.6), (6.6,-19.8), (8.8,-19.8), (11,-19.8)]
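A quick way to sanity-check a coordinate list like the one above is to snap it to the grid and draw it as text. The sketch below is only illustrative: it uses the first three rows of the list (not the full design), assumes the 2.2 spacing seen between neighbouring points is the grid step, and the helper names are hypothetical rather than part of the Opentrons GUI or API.

```python
GRID_MM = 2.2  # assumed grid step: the spacing between neighbouring points above

# Subset only: the first three rows of the sfGFP point list
points = [
    (6.6, 11), (8.8, 11), (11, 11),
    (8.8, 8.8), (11, 8.8), (13.2, 8.8),
    (11, 6.6), (13.2, 6.6), (15.4, 6.6),
]

def to_grid(points, step=GRID_MM):
    """Convert (x, y) offsets to integer (column, row) grid indices."""
    return {(round(x / step), round(y / step)) for x, y in points}

def off_grid(points, step=GRID_MM, tol=1e-6):
    """Return any points that do not sit on a multiple of the grid step."""
    return [(x, y) for x, y in points
            if abs(x / step - round(x / step)) > tol
            or abs(y / step - round(y / step)) > tol]

def render(points, step=GRID_MM):
    """Draw the points as text, highest y first, '#' for a filled cell."""
    cells = to_grid(points, step)
    cols = [c for c, _ in cells]
    rows = [r for _, r in cells]
    lines = []
    for r in range(max(rows), min(rows) - 1, -1):
        lines.append("".join("#" if (c, r) in cells else "."
                             for c in range(min(cols), max(cols) + 1)))
    return "\n".join(lines)

print(off_grid(points))  # an empty list means every point is on the grid
print(render(points))
```

Pasting the full list into `points` should show the whole arrow, which makes stray or duplicated coordinates easy to spot before running the robot.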
Week 4 HW: Protein Design Part 1
South American Rattlesnakes (Crotalus durissus terrificus) with Crotamine protein
Part A: Conceptual Questions
How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
I take in approx. 5 × 10^23 amino acid molecules when ingesting 500 grams of meat. This is based on results indicating I ingest approx. 10^21 amino acid molecules per gram of meat.
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Humans eat beef but don’t become cattle, and eat fish but don’t become fish, because genetic information from the lifeform being ingested isn’t transferred wholesale. Much of the genetic material being eaten is broken down during digestion, and more importantly a human being’s cells follow instructions derived from their own DNA. Human cells use amino acids from the lifeform being ingested, but they use them according to their own specific genetic instructions. The lifeform being ingested and its amino acids are the raw materials the cell uses for various ends.
Why are there only 20 natural amino acids?
There are several broad reasons why there are only 20 standard natural amino acids. The first reason is that early in the history of evolution, this group of amino acids became more or less ‘locked in’, meaning that once the basic relationship between three-letter codons and these 20 standard natural amino acids became widely distributed across the tree of life, it became too risky from an evolutionary standpoint to alter this core set. Another reason is that the group of 20 gives enough range in structure and chemistry to build a large share of what evolution or directed evolution might require. The other reasons seem to amount to various types of evolutionary trade-offs. Adding more than 20 amino acids to this standard set would add potentially unwanted complexity, while decreasing the number of amino acids in the set might leave too few distinct side chains, which would in turn limit the ability of proteins to do things like fold precisely.
Can you make other non-natural amino acids? Design some new amino acids.
Yes you can. My attempts to design some new amino acids using SwissSideChain and the Cryo-EM structure of Receptor Tyrosine Kinase ROS1 PDB file in PyMOL (open-source) are shown below:
Attempt at creating a non-natural amino acid residue mutation of Tyrosine Kinase ROS1 using cyclohexanecarboxylic acid
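Returning to the first conceptual question, the amino acid estimate can be checked with a few lines of arithmetic. This is a rough sketch: the ~100 Da average comes from the question itself, while the 20% protein fraction of meat is an assumption added here for illustration, as are the function and parameter names.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def amino_acid_molecules(meat_g, protein_fraction=0.20, avg_residue_da=100.0):
    """Estimate the number of amino acid molecules in a piece of meat.

    protein_fraction is an assumed value (meat is very roughly 20% protein
    by mass); avg_residue_da is the ~100 Da average given in the question.
    """
    protein_g = meat_g * protein_fraction
    moles = protein_g / avg_residue_da  # 1 Da corresponds to 1 g/mol
    return moles * AVOGADRO

print(f"{amino_acid_molecules(1):.1e} molecules per gram of meat")
print(f"{amino_acid_molecules(500):.1e} molecules per 500 g of meat")
```

Under these assumptions the result lands at roughly 10^21 molecules per gram and on the order of 10^23 to 10^24 molecules for 500 g, the same order of magnitude as the estimate above. Changing `protein_fraction` shifts the answer proportionally.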
Week 5 HW: Protein Design Part 2
Using AlphaFold for Protein Optimization Part A: SOD1 Binder Peptide Design Part 1: Generate Binders with PepMLM Retrieved human SOD1 sequence via UniProt (see photo below). Introduced A4V mutation via Gemini prompt (see sequence below). Human SOD1 sequence (A4V mutation not added) Human SOD1 sequence (A4V mutation added)
Week 6 HW: Genetic Circuits Part 1
Robot Crafting Genetic Circuit (Stylized)
DNA Assembly
What are some components in the Phusion High-Fidelity (HF) PCR Master Mix and what is their purpose?
- HF DNA Polymerase: This is the enzyme responsible for copying DNA, synthesizing the new strand in the 5’ to 3’ direction
- Deoxynucleotide triphosphates (dNTPs): These are the molecular building blocks of DNA, consisting of Adenine (A), Thymine (T), Cytosine (C), and Guanine (G) variants
- HF Buffer: This contains magnesium chloride, a salt added to the reaction. It matters because it dissociates to release Mg²⁺, a cofactor that helps the polymerase join nucleotides during the reaction
What are some factors that determine primer annealing temperature during PCR?
Some factors that determine primer annealing temperature during PCR include:
- Primer lengths
- Primer melting temperatures
- GC content/sequence content
- Buffer components
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
PCR: PCR creates new linear DNA fragments via enzymatic amplification of a given region over n cycles. The PCR protocol essentially consists of setting up reaction mixes, denaturing the DNA into single strands, annealing primers to their specific complementary sequences, extension so the polymerase can synthesize a new strand, and then repeating this cycle as many times as necessary. This method might be more useful when there is a specific fragment of DNA one wants to amplify for further use.
Restriction Enzyme Digests: Restriction enzyme digests create new linear DNA fragments by cutting DNA at specific recognition sites. The restriction enzyme digest protocol consists of setting up a reaction mix, incubation, and then stopping the reaction. This method might be more useful when there is a specific fragment of DNA one wants to isolate for further analysis.
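The annealing-temperature factors listed above (primer length, melting temperature, GC content) can be tied together with a small calculation. The sketch below uses the Wallace rule, a rough textbook approximation for short primers, and a common rule of thumb of annealing about 5 °C below the lower primer Tm. Both are assumptions for illustration and not the method used in this class; polymerase vendors provide their own calculators that also account for buffer composition. The primer sequences are hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer must contain only A, C, G, T")
    return 2 * at + 4 * gc

def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def suggested_annealing(fwd, rev, offset=5):
    """Rule of thumb (assumption): anneal `offset` deg C below the lower Tm."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset

# Hypothetical primers, for illustration only
fwd = "ATGCGTACGTTAGCCTAG"
rev = "CGGATCCTTAGCAGTCAA"
print(wallace_tm(fwd), wallace_tm(rev), gc_content(fwd))
print(suggested_annealing(fwd, rev))
```

Longer or more GC-rich primers raise the estimated Tm, which is why length and GC content appear in the list of factors.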
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
You can ensure the DNA sequences have the appropriate 5’ –> 3’ orientation with corresponding overlaps. Fragments also need to cover the relevant region for cloning, and need to be added at the appropriate molar ratio relative to the plasmid backbone (vector). This is usually a 2:1 ratio.
How does the plasmid DNA enter the E. coli cells during transformation?
The plasmid DNA enters the E. coli either via heat shock (temperature change) or electroporation (high electrical voltage). Both methods shock the E. coli cell, transiently opening its cell membrane so the plasmid DNA can enter.
Describe another assembly method in detail (such as Golden Gate Assembly)
DNA topoisomerase I (TOPO) Cloning: TOPO cloning is traditionally used because it’s a fast, reliable method for cloning PCR products for later sequencing, etc. The first step in TOPO cloning is generating an insert with Taq polymerase via PCR. This creates inserts with an A-overhang, which sets up the second step. The second step is to combine this PCR product with the TOPO vector, usually for a couple of minutes. The insert’s 5’ OH/hydroxyl interacts with the TOPO-bound DNA at the vector’s end, and as part of this process A and T base pairing occurs between the insert and the vector. Then the TOPO religates the strands and dissociates, creating a closed circular plasmid with the given insert. See diagrams below:
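The molar-ratio point in the Gibson answer above can be made concrete with a short calculation. This is a sketch under stated assumptions: it uses the common approximation of about 650 Da per base pair of double-stranded DNA, and the function names and example sizes (a 5,000 bp vector and a 1,000 bp insert) are hypothetical.

```python
AVG_BP_DA = 650.0  # approximate average mass of one dsDNA base pair, g/mol

def ng_to_pmol(ng, length_bp):
    """Convert a mass of dsDNA in ng to picomoles, using ~650 Da per bp."""
    return ng * 1000.0 / (length_bp * AVG_BP_DA)

def insert_ng_for_ratio(vector_ng, vector_bp, insert_bp, ratio=2.0):
    """Mass of insert (ng) giving `ratio` insert molecules per vector molecule."""
    return vector_ng * (insert_bp / vector_bp) * ratio

vector_ng, vector_bp, insert_bp = 50.0, 5000, 1000  # hypothetical example
insert_ng = insert_ng_for_ratio(vector_ng, vector_bp, insert_bp, ratio=2.0)
print(insert_ng)                              # ng of insert to add
print(ng_to_pmol(vector_ng, vector_bp))       # pmol of vector
print(ng_to_pmol(insert_ng, insert_bp))       # pmol of insert (twice the vector)
```

Because shorter fragments contain more molecules per nanogram, a 2:1 molar ratio usually means adding less insert than vector by mass.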
Week 7 HW: Genetic Circuits Part 2
Genetic Circuits Part 2 Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) Unlike traditional genetic circuits, IANNs are analog, and as such correspond more closely to the nature of biological systems (i.e., we’re not always looking for strict 0/1 binary logic; sometimes we’re looking to establish control across a range of values or across space/time). This analog nature means they are more responsive, efficient, and biocompatible.
Week 9 HW: Cell-Free Systems
Homework Part A: General and Lecturer-Specific Questions
Cell-Free Systems General Homework Questions
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Cell-free systems allow for a broader range of potential chemistries than those given to us by natural biology, expanding flexibility. Cell-free protein synthesis also allows for greater control over experimental variables because the entire protein expression construct is designed from scratch (i.e., we have the opportunity to bypass a lot of the complexity of natural cells). Cell-free expression is more beneficial than cell production if you want to rapidly prototype gene pathways and if you want an expression mechanism that’s more amenable to consistent, predictable modeling and analysis.
Describe the main components of a cell-free expression system and explain the role of each component.
The main components of a cell-free expression system are (based on elements described in this hyperlink 1):
- DNA template: Genetic code to begin the Tx/Tl process
- Ribosomes: Assembling amino acids into polypeptides
- Enzymes: Catalyzing important chemical reactions necessary for the appropriate functioning of the cell-free expression system (ex. transcription and translation, energy generation)
- Amino Acids: The core chemical building blocks of the proteins the cell-free expression system will express
- Polymerases: Synthesizing DNA and RNA
Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment
Energy regeneration is critical in cell-free systems because they lack the intact cellular metabolism that would otherwise keep replenishing energy. They also need external energy sources to remove waste products. A workaround might be to include analogous enzymatic reactions (possibly based on shared common charges) within the cell-free system to produce energy
Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
Prokaryotic cell-free expression systems allow transcription and translation to take place together in the same reaction. This might work well for proteins that need to be produced at high volume, like an industrial protease. Eukaryotic cell-free expression systems allow more complex proteins to be built because they retain eukaryotic folding and post-translational modification machinery. This might work well for the production of more advanced/technically complex proteins, like rabbit serum albumin.
How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
In a manner similar to Shuguang Zhang’s ‘molecular glove’ experiment, I’d try to coat and/or surround the membrane protein with hydrophilic proteins to attract and/or absorb water in the cell-free environment, so the membrane protein can incorporate into the liposome 2. Challenges might include finding appropriate hydrophilic protein concentrations (which might be determined via calculations or trial and error) or bonding between the hydrophilic proteins and the membrane protein. These might be mitigated, and the amount of trial and error reduced, through the use of computational modeling and simulation tools like AlphaFold
Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
- Suboptimal Ribosome Function: Examine how ribosomes are translating the mRNA and modify as necessary
- Suboptimal tRNA Function: Examine tRNAs for decoding errors/misreads or inappropriate expression levels and modify as necessary
- Suboptimal External Communication (i.e., yields cannot properly exit the system at desired levels): Examine and modify membrane channel functionality as necessary
Supporting prompts for this section listed below:
Week 10 HW: Advanced Imaging & Measurement Technology
Waters Corporation Mass Spectrometer
Homework: Final Project
For your final project: Please identify at least one (ideally many) aspect(s) of your project that you will measure.
- Lysis Rate
- Efficiency of Plating
Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
- Lysis Rate: This measures the rate at which the mutated M. smegmatis mycobacteriophage lyses or destroys bacteria. This would be measured in a wet lab setting by comparing percentages of bacteria across a control plate and another plate that has been exposed to a mutated form of the M. smegmatis mycobacteriophage
- Efficiency of Plating: This measures how readily the mutated M. smegmatis mycobacteriophage can initiate a host infection. I believe this would also be measured in a wet lab setting by comparing percentages of bacteria across a control plate and another plate that has been exposed to a mutated form of the M. smegmatis mycobacteriophage
What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.
- Lysis Rate: I’d likely use a microplate reader as part of a wet lab extension of the final project
- Efficiency of Plating: I’d use a plaque assay as part of a wet lab extension of the final project
Supporting prompts for this section listed below:
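To show how a plaque assay might turn into an Efficiency of Plating number, here is a minimal sketch. It assumes the conventional definitions (titer in plaque-forming units per mL from one countable plate, and efficiency of plating as the titer under the test condition divided by the titer under a reference condition). The counts, dilutions, and function names are hypothetical and not results from this project.

```python
def titer_pfu_per_ml(plaques, dilution, volume_ml):
    """Plaque-forming units per mL from one countable plate.

    dilution is the fraction of the original stock on the plate (e.g. 1e-6);
    volume_ml is the volume of diluted phage that was plated.
    """
    if plaques < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    return plaques / (dilution * volume_ml)

def efficiency_of_plating(test_titer, reference_titer):
    """Ratio of the titer under the test condition to the reference titer."""
    return test_titer / reference_titer

# Hypothetical counts: reference phage vs. mutated phage on the same host
reference = titer_pfu_per_ml(plaques=42, dilution=1e-6, volume_ml=0.1)
mutant = titer_pfu_per_ml(plaques=21, dilution=1e-6, volume_ml=0.1)
print(f"{reference:.1e} PFU/mL, {mutant:.1e} PFU/mL")
print(efficiency_of_plating(mutant, reference))
```

A value near 1 would suggest the mutation does not change how readily the phage forms plaques, while a value well below 1 would suggest it plates less efficiently than the reference.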
Week 11 HW: Bioproduction and Cloud Labs
Part 1: Global Pixel Artwork Cloud Lab Contribution Made the following contributions to the Global Pixel Artwork Cloud Lab (see screenshots below) Global Pixel Artwork Contributions (see above). Edited 4 pixels in the upper right hand corner of the image (changed them to sfGFP)
Week 12 HW: Bioproduction & Cloud Labs Part 2
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST. Contributed 4 pixels to the global artwork experiment on Saturday 4/18 Make a note on your HTGAA webpages including:
Week 13 HW: Scaling Health Innovation
Master Mix Concentrations See test Master Mix Concentrations below: Week 13 Test Master Mix Concentrations Per Week 12 hypothesis (see Week 12 HW)1, attempted to increase ribose and glucose concentration levels to increase selected well sfGFP expression levels
Week 14 HW: Bio Design and Bio Fabrication
Space Phage Supreme work (with artistic license lol) Final Project Work Worked on Finalizing Final Project: https://pages.htgaa.org/2026a/jason-ross/projects/individual-final-project/index.html
Week 15 HW: Survey Responses
Future Reflections: Kepler-16b Biopark Survey Responses, Pluses and Deltas, and Reflection

Week 1 HW: Principles and Practices

A glorious space phage (with artistic license lol)

Week 1 Biological Engineering Application Governance Exercise

I’m interested in developing phage chassis capable of targeting bacteria that commonly cause infections during spaceflight. I’m interested in developing these phage chassis because:
- They would give astronauts and future space travelers more autonomy in countering ad hoc novel infections that may occur during long-duration missions
- They would help contribute to greater personalization of medical care for broader ranges/more diverse spacefaring populations, who will likely travel, work, and live in space for extended periods of time/longer than traditional space missions to date
- This development may help counter novel terrestrial infections for medically underserved populations
Governance Goal 1 (G1): Preventing/Mitigating Malicious Dual-Use (i.e., ensuring)
- Appropriate biosecurity (including cyberbiosecurity) controls
- Safe deployment conditions
Governance Goal 2 (G2): Empowering Autonomous and Equitable Use
- ‘Plug-and-play’ functionality (i.e., any mission or crew should be able to intuitively use solution – advanced technical acumen not required)
- Solution costs and distribution should develop with access across wide range(s) of demographics and socio-economic strata in mind
Governance Action 1: Phage Safety Refusal (PSR)
Purpose:
- Current State: There are no proactive mechanisms preventing malicious dual-use phage development (i.e., developing a phage specifically designed to degrade host mRNA in healthy cells).
- Proposed Changes: Easily implementable phage production host ‘kill switches’ that can be deployed if a threshold (n^th) concentration of healthy cells is accidentally or deliberately targeted by phage therapy
Design:
- The existing global space health biotechnology base (consisting of private firms, academia, and government-affiliated research centers) needs to opt in and actively invest in space-based phage therapies at levels adequate to ensure this notion of a phage defensive ‘kill switch’ can be validated and deployed, or invalidated and no longer pursued
- Diverse space health regulators (consisting of a variety of agencies across nation-state governments) would need to approve said ‘kill switches’ (ex. Food and Drug Administration in the United States)
- Patients participating in clinical trials of the ‘kill switch’ would need to feel comfortable enough with the solution to participate
- Investors and members of private industry must see enough potential in the ‘kill switch’ that they see it as worthy of their time and investment
- Astronauts and future spacefarers must be comfortable enough with the ‘kill switches’ to consent to their use
Assumptions
- The underlying notion of malicious dual-use phage applications, particularly in a space biotechnology context. Assumptions underneath this assumption include:
  - Viable malicious dual-use phage applications within a space health context and
  - Malicious actors would see and act on the benefits of phage interventions for targeted action against healthy cells in adversarial/antagonistic persons
- That defensive anti-phage ‘kill switches’ even make sense to pursue technically
Risks of Failure & “Success”
- No viable malicious dual-use phage application concern (terrestrially or in a space context)
- No interest in developing anti-phage ‘kill switches’
- Developing anti-phage ‘kill switches’ is too technically intensive/not scalable to address the diverse needs of potential future users
- Not enough funding to develop necessary research body of knowledge to make anti-phage ‘kill switches’ viable and safe to use in a space health context
- Anti-phage ‘kill switches’ successfully developed and deployed and become another chapter in the never-ending story of phages v. bacteria (i.e., phages adapt to the ‘kill switches’ over time, rendering them ineffective)
Governance Action 2: Space Medicine Access Consortia (SMAC)
Purpose
- Current State: The field of space medicine is still in its relative nascency. While space health consortia exist, there are no consortia explicitly aimed at making space medicine advancements as broadly accessible as possible to both spacefaring and terrestrial populations.
- Proposed Changes: Charter and develop a consortia explicitly aimed at making space medicine advancements as broadly accessible as possible to both spacefaring and terrestrial populations. Ideally this consortia should comprise members from private industry, academia, government-affiliated research centers, and allied partners
Design:
- Potential and future committed consortia partners from across private industry, academia, government-affiliated research centers, and allied partners need to see the value of the consortia’s mission (i.e., they need to see and align with how the consortia’s explicit charter to make space medicine advancements as broadly accessible as possible to both spacefaring and terrestrial populations would benefit their organization)
- Consortia partners need to agree on:
  - Rules of the road (this will likely need to be contractual)
  - Research scope(s) for their organizations
- Concrete milestones indicating consortia success, stagnation, or failure in achieving its goals
- Adequate funding to see consortia from inception through the point where most of its charter has been fulfilled (a somewhat analogous example might be the group of institutions the Information Processing Technology Office (IPTO) at the Defense Advanced Research Projects Agency (DARPA) brought together when creating the ARPANET, which later evolved to become the modern Internet, and the shifts in IPTO’s mission over time as a result of these developments)
Assumptions
- Critical mass of stakeholders:
  - cares about making space medicine advancements accessible to spacefaring and terrestrial populations
  - can see the value of a dedicated consortia to make space medicine advancements more accessible for spacefaring and terrestrial populations
  - can agree to the rules of the road necessary to make consortia viable
  - see the value of making phage chassis capable of targeting bacteria that commonly cause infections during spaceflight ‘plug-and-play’ and cost-effective for end-users (i.e., they wouldn’t prioritize some other space medicine intervention)
- Consortia can achieve its charter, with greater phage chassis accessibility as a step on its path to success
Risks of Failure & “Success”
- No or not enough interest in:
  - making space medicine advancements more accessible for spacefaring and terrestrial populations
  - developing consortia to make space medicine advancements more accessible for spacefaring and terrestrial populations
- SMAC not viable due to:
  - Too much burden/not enough return on investment for potential consortia members (i.e., they don’t see the value)
  - Disagreements over rules of engagement
  - Disagreement over research scope for participating members (i.e. who does what)
  - Inadequate implementation pathways (i.e., most of SMAC’s work languishes in the ‘valley of death’)
- SMAC prioritizes other space medicine interventions over making phage chassis capable of targeting bacteria that commonly cause infections during spaceflight ‘plug-and-play’ and cost-effective for end-users
- SMAC might ‘succeed’ too much, and as a result, the loudest or wealthiest SMAC members might eventually see value in exercising disproportionate control over the entity, diluting its charter
Governance Action 3: Space Applied Biomedicine Repository (SABR)
Purpose
- Current State: Numerous space medicine guides have been developed by NASA, including the NASA Astronaut Medical Operations Handbook, Advanced Diagnostic Ultrasound in Microgravity (ADUM) Protocols, and OCHMO-STD-100.1A, otherwise known as the NASA Medical Standard. Commercial space missions largely lean on these guides for their missions
- Proposed Changes: While NASA’s precedent is useful and organizations like the Translational Research Institute for Space Health (TRISH) have proposed tailored guidance for commercial space travelers, there is no existing repository explicitly focused on lessons learned and best practices for applied biomedicine in a space context. As opposed to a static guide, this work would be more like a git-based repo that could be updated with lessons learned, remaining customizable for large numbers of future spacefarers and their medical needs
Design
- User base:
  - willing to contribute to and derive value from SABR
  - with enough technical know-how to contribute to and derive value from SABR
- Stakeholder(s) willing to pay subscription costs to run (likely git-based) SABR technical back-end
- Easy to understand guidance and lessons learned that can be easily ingested in text, video, or audio form on (likely) simple, non-bandwidth intensive network connected devices
Assumptions:
- A (likely git-based) repository is an optimal vehicle for distributing recursively improving or community-based applied biomedicine lessons-learned for the space medicine community and future missions
- This repository would contain accumulated, useful insights regarding how phage chassis capable of targeting bacteria that commonly cause infections during spaceflight can be deployed or administered:
  - safely (i.e. without malicious dual-use) across a wide variety of space missions
  - in a ‘plug-and-play’ or affordable manner
- Enough of a user base:
  - willing to contribute to and derive value from SABR
  - with enough technical know-how to contribute to and derive value from SABR
- Stakeholder(s) willing to pay subscription costs to run (likely git-based) SABR technical back-end
- Easy to understand guidance and lessons learned that can be easily ingested in text, video, or audio form on (likely) simple, non-bandwidth intensive network connected devices (i.e., SABR can adequately work even in very bandwidth constrained conditions)
Risks of Failure & “Success”
- Not enough interest in SABR (potential users or contributors don’t see its value)
- SABR is:
  - too:
    - early/the timing’s not right
    - abstract and remote in its guidance (i.e., the actual users or contributors who might derive value from its content see its content as too over their heads, technical, jargon-filled, or not applicable for their specific use cases)
  - a useful application, just not for the safe, easy-to-use, or cost-effective deployment of phage chassis capable of targeting bacteria that commonly cause infections during spaceflight
  - bandwidth-constrained to such a degree that it’s untenable for all intents and purposes
- Can’t find stakeholder(s) willing to pay subscription costs to run (likely git-based) SABR technical back-end
- SABR works so well and grows to such an extent that discerning actual mission-relevant content of value from the repository becomes a challenge for uninitiated users (i.e., filtering is a challenge – hard to separate filler from useful content)
Biological Engineering Application Governance Scoring Rubric
Biological Engineering Application: Phage chassis capable of targeting bacteria that commonly cause infections during spaceflight
The rubric below works as follows: Policy goals and sub-goals are listed vertically, while each of the governance actions is listed next to the respective column header titled ‘Option’. Governance actions are scored from 1-3 based on how well they fulfill each policy goal and sub-goal. A score of 1 indicates a governance action does a poor job at fulfilling a policy goal, a 2 indicates a governance action does an OK job at fulfilling a policy goal, and a 3 indicates a governance action is the best at fulfilling a policy goal.

Does the option:	Option 1: Phage Safety Refusal (PSR)	Option 2: Space Medicine Access Consortia (SMAC)	Option 3: Space Applied Biomedicine Repository (SABR)
Preventing/Mitigating Malicious Dual-Use	3	1.5	1
• By implementing appropriate biosecurity (including cyberbiosecurity) controls	2	1	2
• By promoting safe deployment conditions	3	2	2
Empowering Autonomous and Equitable Use	1	3	3
• By encouraging ‘plug-and-play’ functionality	1	2	3
• By promoting cost and distribution accessibility	1	3	2

Based on the scoring above, I’d probably prioritize SABR. Given the difficulties around space-based governance or governance in remote conditions, I think an organization like TRISH or the Organization for Space Medicine, Engineering, and Design (OSMED) might be a good starting point, promoter, or convener to begin making something like SABR a reality. The more I think about it, the varied jurisdictions and varied, unclear regulatory regimes at the nation-state level make the idea of a more open-source/tribal knowledge-based self-governing solution appealing (or at bare minimum it shows a gap that could potentially be filled). I also think that if the timing were right (i.e., enough useful information could be added by a community of contributors) this could actually help fulfill some of the policy goals associated with the phage chassis project, perhaps not through sanctioned formal policy-making, but through community contribution, input, and/or agreed-upon best practices. That said, I can see and understand the trade-offs between formal policy-making at the nation-state level and more grassroots normative development of best practices as a result of doing this exercise. The glaring uncertainties that remain are whether or not the repository’s timing is right and/or if a dedicated user and contributor base could coalesce around it
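One way to check the prioritization against the rubric is to tally the scores per option. The sketch below assumes a simple unweighted sum over all six rows (goals and sub-goals alike), which is an aggregation choice made here for illustration and not one stated in the rubric. The scores themselves are copied from the table.

```python
# Scores per option, in the row order of the rubric above
scores = {
    "PSR":  [3, 2, 3, 1, 1, 1],
    "SMAC": [1.5, 1, 2, 3, 2, 3],
    "SABR": [1, 2, 2, 3, 3, 2],
}

# Assumption: every row counts equally (unweighted sum)
totals = {option: sum(values) for option, values in scores.items()}
best = max(totals, key=totals.get)

for option, total in totals.items():
    print(option, total)
print("Highest total:", best)
```

With equal weights the totals come out to 11 for PSR, 12.5 for SMAC, and 13 for SABR, which matches the choice to prioritize SABR. A different weighting (for example, counting only the two top-level goals) could change the ranking, so the margin should be read as narrow.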
Ethical Considerations: Given the mechanics of phages, specifically their ability to hijack host cell tRNAs, ribosomes, and amino acids, I was somewhat surprised that there wasn’t more mention of potential deliberate malicious dual-use of phages. Maybe it’s my relatively nascent understanding of the life sciences, the limitations of my research on this topic, or maybe the topic itself is either not researched or not considered extensively. If this is true and the notion of potential deliberate malicious dual-use of phages is a little out of left field, not well understood, or not well defined, perhaps convening working groups might be a sensible governance action, as these groups can often help map areas of concern for emerging dual-use technologies. Distributing outputs from these working groups (i.e., white papers) to relevant academic journals, technical standards bodies, or policymakers might also be a worthwhile governance action.

All supporting prompts for the governance exercise above listed below:

Supporting Prompt	Source
Take a look at the following quote from the URL below: “Strain-specific phage chassis to target bacteria that commonly cause infections during space flight.” What is the difference between a phage and a phage chassis? In general? In a biotechnological context? Do NOT hallucinate when answering these questions https://roadmap.ebrc.org/engineering-biology-for-space-health/	Perplexity
Take a look at the following quote from the URL below: “Capability to produce novel phages on space missions for rapid control of evolved biofouling microbes. What are ’evolved biofouling microbes’? What is biofouling? I assume biofouling indicates something bad/undesirable, but I don’t know what the term actually means beyond my assumption. Do NOT hallucinate when answering these questions https://roadmap.ebrc.org/engineering-biology-for-space-health/	Perplexity
I understand how a phage can insert itself into a cell. Not exactly understanding if or how phages’ abilities contribute at all to personalized medicine developments (i.e., is there something about phage properties that make them particularly good candidates for personalized medical interventions)? Do NOT hallucinate when answering this question	Perplexity
How do governance mechanisms or standards of good or socially harmonious/beneficial behavior work (or work effectively) in remote regions (think polar research stations, etc.)?	Perplexity
What is the phenomenon called in artificial intelligence when a large language model (LLM) refuses to reply to user input due to safety concerns? What is it called and how does it work? Do NOT hallucinate when answering these questions	Perplexity
What are the technical subcomponents of a biotechnology intervention or treatment using phage chassis? What do the supply chains look like, if any? Do NOT hallucinate when answering these questions. If you don’t know the answers to these questions, say so	Perplexity
“phage chassis synthetic biology manufacturing pipeline” search results	Perplexity
Are there any existing ways a biotechnology solution (let’s say a custom developed chassis) can proactively prevent itself from malicious dual-use? Analogous to large language model (LLM) safety refusal, are there any mechanisms that can be pre-built into a biotechnology solution to proactively prevent malicious dual-use? Do NOT hallucinate when answering these questions	Perplexity
How exactly do phages interact with genetic code information within a given cell? How do cell-based bacteria defend against unwanted phages?	Perplexity
I’m high-level aware that there are certain ’no-go’/‘do not edit’ pieces of genetic code. How are phages traditionally prevented from editing these ’no-go’/‘do not edit’ pieces of genetic code? Is that a thing? If I’m off in any way/if my conceptual underpinnings seem shaky, let me know Do NOT hallucinate when answering these questions	Perplexity
Tell me about about engineered synthetic biology kill switches	Perplexity
Have any engineered synthetic biology kill switches been implemented as part of phage therapies? Do NOT hallucinate when answering this question	Perplexity
If I’m making a novel phage-related therapy for astronauts, and I live in the United States, the Food and Drug Administration (FDA) would need to approve this therapy, correct? My assumption is yes. How does approval of a drug used outside of Earth’s atmosphere work from a regulatory perspective? Do NOT hallucinate when answering these questions	Perplexity
Are there any space health-related consortia specifically or explicitly aimed at making space medicine advancements as broadly accessible as possible to both spacefaring and terrestrial populations? If so, share information regarding said consortia Do NOT hallucinate. If you don’t know the answer to this question, say so	Perplexity
In medicine, what do we usually mean by ‘point of care’? What do we mean when we say that?	Perplexity
Do space medicine point of care guides exist? If so, are there any for commercial space tourists, astronauts, or future groups of spacefarers, including workers, etc.?	Perplexity
What is applied biomedicine?	Perplexity
What does trish stand for in a space health context	Google AI Mode
Tell me about the space health point of care guide TRISH is either developing or has developed	Google AI Mode
What is the TRISH POCUS training referred to in the answer to the last prompt? What does POCUS refer to?	Perplexity
How are most git-base repositories run? What is the underlying technical back-end powering them and how is this infrastructure paid for?	Perplexity

Week 1 Homework Questions

Professor Jacobson Questions

Two widely used polymerases are Taq (from Thermus aquaticus) and Pfu (from Pyrococcus furiosus) ¹. Taq has error rates ranging from 1 x 10^-5 to 2 x 10^-4 errors per base pair per doubling, while Pfu has an error rate of about 1.3 x 10^-6 ² ³. Compared to the length of the human genome, 3 x 10⁹ base pairs, these rates come out to approximately 3.3 x 10^-13%, 6.7 x 10^-12%, and 4.3 x 10^-14%, respectively ⁴. Biology deals with this discrepancy during DNA replication through proofreading. When the polymerase detects that an incorrect base has been added, the polymerase enzyme makes a cut in the chemical bond, releasing the incorrect nucleotide ⁵. If errors remain after replication, mismatch repair is initiated. This is where enzymes recognize incorrectly added nucleotides and dispose of them. Nucleotide excision repair is another way nature corrects these errors. This occurs when enzymes remove and replace incorrect bases via cuts on the 3′ and 5′ sides of the incorrect base ⁶.
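To make the scale of the discrepancy more concrete, here is a minimal Python sketch that multiplies each error rate by the genome length to estimate how many errors one full copy of the genome would carry. This is a different framing from the percentages above. It assumes the Pfu figure is 1.3 x 10^-6 errors per base pair per doubling, and it treats errors as independent and evenly spread, which is a simplification.

```python
GENOME_BP = 3e9  # approximate human genome length in base pairs

# Error rates in errors per base pair per doubling, taken from the text above.
ERROR_RATES = {
    "Taq (low estimate)": 1e-5,
    "Taq (high estimate)": 2e-4,
    "Pfu": 1.3e-6,  # assumed value: 1.3 x 10^-6
}


def expected_errors(rate_per_bp, length_bp=GENOME_BP):
    """Expected number of errors when copying length_bp bases once."""
    return rate_per_bp * length_bp


for name, rate in ERROR_RATES.items():
    print(f"{name}: ~{expected_errors(rate):,.0f} expected errors per genome copy")
```

Even the more accurate polymerase would leave thousands of errors per copy under these assumptions, which is why proofreading and repair matter so much.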
An average human protein is approximately 375 amino acids long ⁷. As each amino acid is encoded by roughly 3 synonymous codons on average (61 sense codons for 20 amino acids), rough math indicates there are approximately 3³⁷⁵ potential coding sequences, an extremely large number of combinations ⁸. Some of the reasons not all of these sequences work in practice to produce the protein of interest are:
- Codon Bias: Some codons are used far more often than others in a given organism, typically because the matching transfer RNAs (tRNAs) are more abundant, which supports higher levels of expression ⁹ ¹⁰.
- mRNA Structure: Codon choice can change the structure of the mRNA (e.g., make it less stable), leaving it more susceptible to degradation ¹¹.
- Translation Accuracy Issues: Non-optimal codons decrease protein translation efficiency, because ribosomes (the molecular machines in the cell that carry out protein production) stall and crowd together on the mRNA ¹².
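As a quick check on how large that number is, the short Python sketch below evaluates 3³⁷⁵ exactly using Python's arbitrary-precision integers. The average of 3 codons per amino acid is a rough assumption (the real count ranges from 1 to 6 depending on the amino acid), so this is only an order-of-magnitude estimate.

```python
import math

PROTEIN_LENGTH = 375       # average human protein length from the text
CODONS_PER_AMINO_ACID = 3  # rough average; an assumption, not an exact figure

# Python integers have arbitrary precision, so this is computed exactly.
n_sequences = CODONS_PER_AMINO_ACID ** PROTEIN_LENGTH

print(f"Number of decimal digits: {len(str(n_sequences))}")
print(f"log10 of the count: {PROTEIN_LENGTH * math.log10(CODONS_PER_AMINO_ACID):.2f}")
```

The result has 179 digits, so it is on the order of 10^179 possible coding sequences.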

Dr. LeProust Questions

Solid-phase phosphoramidite synthesis is the most common currently used oligo synthesis method ¹³.
It’s difficult to make oligos longer than 200nt via direct synthesis because coupling errors/inefficiencies compound to the point where one ends up with lots of short, incomplete fragments ¹⁴.
A 2000bp gene has 4000 nucleotides ¹⁵. Based on the answer to the previous question, creating a 2000bp gene via direct synthesis is not currently feasible due to the accumulation of coupling errors/inefficiencies, which is why genes of this length are instead built by stitching together smaller oligos or, more recently, by using novel enzymatic methods ¹⁶ ¹⁷.
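The compounding effect described above can be sketched in a few lines of Python. The per-step coupling efficiencies used here (99% and 99.5%) are illustrative assumptions rather than figures from the lecture, and the model simply assumes an oligo of n nucleotides needs n - 1 successful coupling steps.

```python
def full_length_yield(length_nt, coupling_efficiency):
    """Fraction of strands that reach full length after every coupling step succeeds."""
    return coupling_efficiency ** (length_nt - 1)


# Illustrative (assumed) per-step efficiencies and oligo lengths.
for efficiency in (0.99, 0.995):
    for length in (20, 200, 2000):
        fraction = full_length_yield(length, efficiency)
        print(f"efficiency={efficiency:.3f}, length={length:>4} nt: "
              f"{fraction:.2e} of strands are full length")
```

Under these assumptions, a 200nt oligo at 99% efficiency yields roughly 13.5% full-length product, while a 2000nt strand yields on the order of one in a billion.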

George Church Question

Question 3 Response: ARPA-H Biostabilization System (BoSS) Grant Response
All supporting prompts are listed below the answer:

Innovative Solutions Opening	ARPA-H-SOL-26-136
Solution Summary Title	Pyrococcus furiosus-Inspired Molecular Staples (PFIMS)
Team Lead Organization	Federally Funded Research and Development Center (FFRDC)
Type of Organization	See above
Technical Point of Contact	Name: Jason Ross
Administrative Point of Contact	Name: Jason Ross
Total Basis of Estimate	$2,600,000
Places of Performance	McLean, VA
Other Team Members	MITRE Biotechnology Department (L271) Interns

Concept Summary

The team behind Pyrococcus furiosus-Inspired Molecular Staples (PFIMS) seeks to develop small organic molecules capable of binding to the ‘grooves’ of DNA and proteins across a variety of temperatures. If successfully developed, PFIMS would allow heat-proofing a biologic across a variety of temperatures without the need for dehydration. By locking protein folds through ionic pull, we can stabilize biologics for longer periods of time (TA1). Our work will extend this system to scalable cell processing across an array of temperatures and use cases (TA2).

Innovation and Impact

PFIMS is inspired by extremophile biology. Pyrococcus furiosus can survive at temperatures of 100 degrees Celsius in ocean vents. By mimicking Pyrococcus furiosus’ molecular heat shield, we can keep cells alive and functioning for extended periods of time at refrigerator or room temperature. Unlike modern cryopreservation methods, which employ various forms of freezing that can harm cells during thawing, our solution stabilizes proteins using compounds that cells naturally produce. Not only are cold temperatures avoided entirely, but once scaled this solution will slash costs for biologics shipping. Most importantly, PFIMS can work for any cell type, as it builds on fundamental biological features, such as protein folding and membrane strength.

Proposed Work

We plan to develop a novel, functioning bench-top bioprocessing system inspired by Pyrococcus furiosus. We will create a polyamine-based stabilization medium that will power this bioprocessing system for a standard biologic (e.g., cell therapy or antibodies). PFIMS’ approach is grounded in existing literature on high-temperature DNA/protein stabilization via polyamines, small organic molecules with multiple amino groups (Bae, 2018) (Despotović, 2020) (Oshima, 2007). These polyamines act as a form of ‘molecular staples’ in preliminary modeling efforts (Vieille, 2001).

Key Milestones and Deliverables

Phase 1: Synthesize the branched polyamine formulation. Deliverable: Optimized medium prototype based on a standard biologic.

Phase 2: Integrate the stabilization medium into bioprocessing hardware, likely a single-use bioreactor for the initial prototype, followed by larger-scale testing and deployment within a media/buffer prep mixer. Deliverable: Prototyped biostabilization device.

Phase 3: Validate stability metrics for a model biologic using PFIMS at room temperature over time. Deliverable: Validation report and delivery of a biostabilization system capable of scaled biostabilization across multiple biologics.

Technical Risks and Mitigations

Risk: Polyamines may exhibit toxicity at necessary concentrations. Mitigation: Screening polyamines for reduced toxicity levels; introducing wash and resuspension steps into bioprocessing.

Risk: The stabilization mechanism may not transfer easily from archaeal single-celled microbes like Pyrococcus furiosus to eukaryotic cells (cells containing nuclei and organelles). Mitigation: Tests across multiple cell types.

Use Case

PFIMS enables rapid delivery of life-saving biologics and therapeutics in low-resourced or contested conditions. This allows for dramatically cheaper shipment of biologics to locations such as rural communities without robust public health infrastructure, remote or relatively isolated geographies, or active conflict zones.

Sources

Bae DH, Lane DJR, Jansson PJ, Richardson DR. The old and new biochemistry of polyamines. Biochim Biophys Acta Gen Subj. 2018 Sep;1862(9):2053-2068. doi: 10.1016/j.bbagen.2018.06.004. Epub 2018 Jun 8. PMID: 29890242.
Despotović D, Longo L, et al. Polyamines mediate folding of primordial hyperacidic helical peptides into stable amyloid-like fibrils. Biochemistry, 60(4), 257–267.
Oshima T. Unique polyamines produced by an extreme thermophile, Thermus thermophilus. Amino Acids. 2007 Aug;33(2):367-72. doi: 10.1007/s00726-007-0526-z. Epub 2007 Apr 12. PMID: 17429571.
Vieille C, Zeikus GJ. Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol Mol Biol Rev. 2001 Mar;65(1):1-43. doi: 10.1128/MMBR.65.1.1-43.2001. PMID: 11238984; PMCID: PMC99017.

All supporting prompts listed below:

Supporting Prompt	Source
Tell me about how very temperature resilient trees or plants achieve homeostasis despite large temperature fluctuations. If the Dsup protein in tardigrades could hypothetically be used to confer radiation resistance to humans traveling in space, how could an analogous protein or feature found in trees or plants be used to store biologics across a wider array of temperatures? Essentially I’m asking for analogous applicability if that makes sense	Google AI Mode
As a potential next step, do NOT look into synthetic biology startups that are already attempting to synthesize LEA proteins for “cold-chain-free” vaccine storage. Give me other, non-plant or tree examples across the kingdom of life where homeostasis is achieve despite large temperature fluctuations. I want to look where synthetic biology startups are NOT looking	Google AI Mode
Give me academic sources for each of the 3 non-plant examples and any biotechnology research in academia around these properties. Then tell me the basics of how the properties of each organism would transfer to a medium capable of biological stabilization for an array of biologics	Google AI Mode
What is Pyrococcus furiosus? Can you show me a picture?	Google AI Mode
“The Transfer Logic To create a stabilization medium, you would synthesize branched-chain polyamines (like thermo-spermine). Binding: These molecules are positively charged and naturally “wrap” around negatively charged biologics (like DNA/RNA or specific protein folds). The Medium: The medium would be a liquid concentrate of these polyamines. Instead of refrigeration, the “staples” provide enough ionic pull to prevent the biologic from “unzipping” when the temperature rises.” Based on the ‘Proposed Work’ section in the attached document, create 2 simple examples of final deliverables, and create some broad brushstrokes/bullet points I can use to create a ‘Proposed Work’ section based on the specifications provided in the attached document.	Google AI Mode
Based on the previous information provided, estimates of how long these processes would take, and the salary rates for an average employee at the MITRE Corporation, give me some Total Basis of Estimate numbers (i.e., ranges for how much this would cost)?	Google AI Mode
Looking to develop a Pyrococcus furiosus -inspired stabilization medium for cell preservation. Based on existing Pyrococcus furiosus literature, give me 5 bullet points describing why this approach would be novel and game-changing. Don;’t use a lot of jargon. One of the bullets should explain how this stabilization medium would differ from the current state of the art	Perplexity
What is a polyamine?	Perplexity
If I wanted to integrate a Pyrococcus furiosus -inspired stabilization medium into bioprocessing hardware, what type of hardware would we be talking about? What’s commonly used?	Perplexity
What does the term ‘cytotoxic’ mean? I assume it’s a form of toxicity, but I don’t know what it specifically refers to	Perplexity
What are archaea and how are they different from eukaryotic cells?	Perplexity

Week 2 HW: DNA Read, Write, and Edit

Part 0: Basics of Gel Electrophoresis

Per instructions for this part, I attended the 02/10 lecture and 02/11. Additionally I attended all 3 Bootcamp sessions.

Part 1: Benchling & In-silico Gel Art

Make a free account at benchling.com

Benchling Account Creation Confirmation

Import the Lambda DNA

Benchling Phage Lambda DNA Import Confirmation_02.12.26

Simulate Restriction Enzyme Digestion with the following Enzymes:
- EcoRI
Benchling EcoRI Enzyme Digest Confirmation
- HindIII
Benchling HindIII Enzyme Digest Confirmation
- BamHI
Benchling BamHI Enzyme Digest Confirmation
- KpnI
Benchling KpnI Enzyme Digest Confirmation
- EcoRV
Benchling EcoRV Enzyme Digest Confirmation
- SacI
Benchling SacI Enzyme Digest Confirmation
- SalI
Benchling SalI Enzyme Digest Confirmation
Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks

Mind the Gap (Or a Most Wondrous Cave)
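For intuition about what the virtual digests in this part are doing, here is a minimal Python sketch that finds recognition sites for the seven enzymes used above and reports the fragment lengths for a linear molecule. The toy sequence is made up for illustration (it is not Lambda DNA), and the sketch ignores details such as methylation sensitivity and circular templates.

```python
# Recognition site and cut offset (bases from the start of the site, top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def cut_positions(seq, site, offset):
    """Return every cut position for one palindromic recognition site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme_names):
    """Fragment lengths (largest first) after cutting a linear sequence."""
    cuts = sorted({p for name in enzyme_names
                   for p in cut_positions(seq, *ENZYMES[name])})
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Hypothetical 120 bp toy sequence with one EcoRI site and one BamHI site.
toy = "ATATAT" * 5 + "GAATTC" + "CGCGCG" * 10 + "GGATCC" + "TTAATT" * 3

print(digest(toy, ["EcoRI"]))
print(digest(toy, ["EcoRI", "BamHI"]))
```

Each fragment length corresponds to one band on the gel, which is why choosing enzymes lane by lane lets you place bands where the artwork needs them.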

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Part 3: DNA Design Challenge

3.1 Choose your protein

I chose the Mantis Fibroin 1 protein because for some reason when I received this assignment, my mind flipped to an insect protein, and then from there, a praying mantis. Upon further research, I was pleased with where my intuition led me. The Mantis Fibroin 1 protein helps form the mantis’ ootheca, otherwise known as its egg casing. What’s fascinating about these proteins is that they create a coiled yet flexible foam-like structure around the mantis’ eggs. This protein piqued my interest, as it might have biomimetic potential. The Mantis Fibroin 1 protein is listed below ¹²:

tr|I3PM87|I3PM87_9NEOP Mantis fibroin 1 OS=Pseudomantis albofimbriata OX=627833 GN=MF1 PE=2 SV=1
MDSKMLCVSLLLAVFCLWYTEASPLEEKYGEKYGDMEEYQRGTEDSRAVINDHTAKVASQ
SARGMVNKAKTTEAAARSNEQLSKDRQYYYREYLKKADYHKKKALEYEQLSAAENAKIAY
HESKQKDWETKARESDVQCRDAEAKYEQSYTRSRELKRESIIAYVQAAMHHAEASGDHMK
ADRAKDIARDMMRKAESLRGDASNHYQRSEEDKNKARSEKVKAHQNADNSQRHHTACRAY
DQEGLKTRLSSKANMMRQIHSSLLAERSHSLAREDGLAADLSHKLAEELARMSEESGAIS
KINSGEERGYSNKVRQDEVKAHELAVSKRMMGAEVADNSEMISLAQAKDGSLDEGENYKL
STFYADDSTKNMLPDSRGQMSYGDE

3.2 Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

A reverse-translated Mantis Fibroin 1 nucleotide sequence built from the most likely codons is below, along with evidence showing how I entered the Mantis Fibroin 1 UniProt information into the reverse translation tool.

atggatagcaaaatgctgtgcgtgagcctgctgctggcggtgttttgcctgtggtatacc gaagcgagcccgctggaagaaaaatatggcgaaaaatatggcgatatggaagaatatcag cgcggcaccgaagatagccgcgcggtgattaacgatcataccgcgaaagtggcgagccag agcgcgcgcggcatggtgaacaaagcgaaaaccaccgaagcggcggcgcgcagcaacgaa cagctgagcaaagatcgccagtattattatcgcgaatatctgaaaaaagcggattatcat aaaaaaaaagcgctggaatatgaacagctgagcgcggcggaaaacgcgaaaattgcgtat catgaaagcaaacagaaagattgggaaaccaaagcgcgcgaaagcgatgtgcagtgccgc gatgcggaagcgaaatatgaacagagctatacccgcagccgcgaactgaaacgcgaaagc attattgcgtatgtgcaggcggcgatgcatcatgcggaagcgagcggcgatcatatgaaa gcggatcgcgcgaaagatattgcgcgcgatatgatgcgcaaagcggaaagcctgcgcggc gatgcgagcaaccattatcagcgcagcgaagaagataaaaacaaagcgcgcagcgaaaaa gtgaaagcgcatcagaacgcggataacagccagcgccatcataccgcgtgccgcgcgtat gatcaggaaggcctgaaaacccgcctgagcagcaaagcgaacatgatgcgccagattcat agcagcctgctggcggaacgcagccatagcctggcgcgcgaagatggcctggcggcggat ctgagccataaactggcggaagaactggcgcgcatgagcgaagaaagcggcgcgattagc aaaattaacagcggcgaagaacgcggctatagcaacaaagtgcgccaggatgaagtgaaa gcgcatgaactggcggtgagcaaacgcatgatgggcgcggaagtggcggataacagcgaa atgattagcctggcgcaggcgaaagatggcagcctggatgaaggcgaaaactataaactg agcaccttttatgcggatgatagcaccaaaaacatgctgccggatagccgcggccagatg agctatggcgatgaa

Mantis Fibroin 1 UniProt Information Inserted into Reverse Translation Tool
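The reverse translation step can be sketched in Python as a simple lookup from each amino acid to a single codon. The codon table below is inferred from the tool output shown above (one codon per amino acid). It is not the tool's actual implementation, and a real tool would draw on a full codon usage table.

```python
# One "most likely" codon per amino acid, inferred from the output above.
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein):
    """Map each amino acid to its single most likely codon."""
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein.upper())


# First 20 residues of the Mantis Fibroin 1 sequence from section 3.1.
first_20 = "MDSKMLCVSLLLAVFCLWYT"
print(reverse_translate(first_20))
```

The printed string matches the first 60 bases of the reverse-translated sequence above.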

3.3 Codon Optimization

We need to optimize codon usage so a particular sequence can be expressed with greater fidelity, reliability, and efficiency in a host organism. I’ve chosen to optimize the codon sequence for Saccharomyces cerevisiae (baker’s yeast), because it:
- is commonly used as a host in biotechnology applications
- folds proteins in a manner closer to insect protein folding
- is apparently easier to work with than mammalian cells

A codon-optimized Mantis Fibroin 1 nucleotide sequence is shown below, along with evidence showing how I entered the Mantis Fibroin 1 nucleotide sequence into the codon optimization tool.

ATGGACAGTAAGATGTTATGTGTCTCCTTATTGTTGGCTGTTTTTTGTTTATGGTATACTGAAGCTTCCCCATTAGAAGAAAAGTATGGTGAAAAGTACGGTGACATGGAAGAGTACCAAAGAGGTACTGAAGATTCAAGAGCAGTTATTAACGATCATACTGCTAAAGTTGCTTCCCAATCCGCCAGAGGTATGGTTAATAAGGCTAAGACTACAGAAGCTGCTGCTAGAAGTAATGAACAATTATCTAAAGATAGACAATACTATTACAGAGAATATTTGAAAAAGGCTGATTATCATAAGAAGAAAGCTTTGGAATATGAACAGCTTTCAGCTGCTGAAAATGCAAAAATTGCTTATCATGAATCTAAACAAAAAGACTGGGAAACGAAAGCCAGAGAATCCGATGTTCAATGTCGTGATGCTGAAGCAAAATATGAACAATCTTACACAAGGTCCAGAGAACTGAAAAGGGAATCTATTATTGCTTATGTTCAAGCTGCTATGCATCATGCTGAAGCTAGCGGTGATCACATGAAAGCTGATAGAGCTAAAGATATCGCTAGAGATATGATGAGAAAGGCAGAATCCTTAAGGGGTGACGCTAGCAACCATTATCAGAGATCCGAAGAAGATAAGAATAAGGCCAGATCTGAAAAGGTTAAAGCTCATCAAAACGCTGATAATTCTCAAAGACATCATACTGCATGCAGAGCGTATGACCAAGAAGGTTTAAAGACGAGATTGAGCTCAAAAGCCAACATGATGAGACAAATTCACTCCTCACTACTGGCTGAAAGATCTCATTCATTAGCAAGAGAAGACGGTCTTGCGGCCGATTTATCACATAAGTTGGCTGAAGAATTAGCTAGAATGTCCGAAGAATCAGGTGCTATATCTAAAATAAACTCAGGTGAAGAAAGAGGCTATTCGAATAAAGTGAGACAAGATGAGGTTAAAGCACATGAATTGGCTGTTAGCAAAAGAATGATGGGTGCTGAAGTTGCTGATAATTCGGAGATGATTAGTTTGGCACAAGCTAAAGACGGTTCTTTAGATGAAGGTGAGAACTATAAATTATCCACTTTTTATGCAGACGATTCTACAAAAAATATGCTACCAGATTCTAGGGGTCAAATGTCTTACGGTGATGAA

Mantis Fibroin 1 nucleotide sequence codon optimized for Saccharomyces cerevisiae (baker’s yeast)
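One sanity check worth running after codon optimization is confirming that the optimized DNA still encodes the same protein. The Python sketch below translates DNA with the standard genetic code and compares the opening codons of the reverse-translated and yeast-optimized sequences from this section. Only the first 24 bases of each are used here to keep the example short; the full sequences could be pasted in the same way.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA sequence codon by codon ('*' marks a stop codon)."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


reverse_translated_start = "atggatagcaaaatgctgtgcgtg"  # from section 3.2
yeast_optimized_start = "ATGGACAGTAAGATGTTATGTGTC"     # from section 3.3

print(translate(reverse_translated_start))
print(translate(yeast_optimized_start))
```

Both print MDSKMLCV, the first eight residues of Mantis Fibroin 1, even though the two DNA strings differ at most codons.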

3.4: You have a sequence! Now what?

Several cell-dependent technologies could be used to produce the codon-optimized Mantis Fibroin 1 protein. One such technology, a yeast system, has already been pursued in the previous steps of this section, since the sequence was optimized for Saccharomyces cerevisiae (baker’s yeast). Bacterial systems, such as E. coli, could also be used to produce the protein in host cell culture, although this would require different codon optimization. Other cell-dependent options include insect or mammalian systems, although I’m not sure of the value of expressing an insect-associated protein in another insect-based host (this may be a failure of imagination on my part), and mammalian cells are apparently more difficult to work with than bacterial or yeast-based hosts. Cell-free methods for producing the codon-optimized Mantis Fibroin 1 protein would involve breaking open cells, extracting the relevant ribosomes, enzymes, tRNAs, etc., and then combining these contents with a DNA template (in this case our Mantis Fibroin 1 nucleotide sequence), energy sources, relevant amino acids, and a reaction buffer. Cell-free methods are generally faster than cell-dependent methods of protein expression.

3.5 Optional: How does it work in nature/biological systems?

From my research in answering this question, I think the answer is that a post-transcriptional process called alternative splicing occurs, where non-coding regions of the pre-mRNA (introns) are cut out and removed, while coding regions (exons) remain and can be joined in different combinations ³. It’s pretty fascinating because this splicing can create several different types of mRNA molecules, and therefore different proteins, from a single gene. This increases the efficiency with which different proteins can be expressed within a particular organism.
See below

Attempted Mantis Fibroin 1 Alignment
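The alternative splicing idea from 3.5 can be illustrated with a toy Python sketch that enumerates the mRNA isoforms produced when some exons may be skipped. The exon names and sequences are made up for illustration, and real splicing is regulated rather than a free combination of exons.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """Yield (exon names, mRNA) for every way of keeping or skipping optional exons."""
    choices = [(True, False) if name in optional else (True,) for name, _ in exons]
    for keep in product(*choices):
        names = [name for (name, _), kept in zip(exons, keep) if kept]
        mrna = "".join(seq for (_, seq), kept in zip(exons, keep) if kept)
        yield names, mrna


# Hypothetical gene with four exons; E2 and E3 can each be skipped.
toy_exons = [("E1", "AUGGCU"), ("E2", "GAAUUC"), ("E3", "CCGAAA"), ("E4", "UGGUAA")]
isoforms = list(splice_isoforms(toy_exons, optional={"E2", "E3"}))

for names, mrna in isoforms:
    print("+".join(names), mrna)
```

With two skippable exons, this single toy gene produces four distinct mRNA isoforms.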

All Section 3 Prompts Listed Below

Supporting Prompt	Source
I want to make DNA gel art in the style of Paul Venouse’s gel electrophoresis works. I want to use the website below to help me:https://rcdonovan.com/gel-art How should I go about making this art virtually? I have a Benchling account, an Internet connection, and no access to a wet lab	Perplexity
Help me design a specific lane‑by‑lane “band plan” for your first virtual gel artwork. My knowledge of Benchling is very minimal and my desired art is a right-facing arrow	Perplexity
Not understanding the 3rd Step under Step 3. How do I run any digest that gives me 2 fragments? How do I know my target size for each copy of the Arrow_Base sequence?	Perplexity
I see a scissors icon. Write out a literal “click this, then this” sequence for one lane so I can duplicate it for the others	Perplexity
Ok I’ve found and made 5 digests of the requested bp sizes. When I’m seeing the combined virtual digest I’m not seeing an arrow. What are next steps I should take?	Perplexity
Ok – let’s take a step back. I’ve tried what you’ve said and am not seeing a right facing arrow. Think I need to learn how to do this from the ground up with some more tweaking. I know how to select the Digests option in Benching, I know how to save a Digest, and I know how to click on ‘Virtual Digest’ and select different kinds of ladders. What type of ‘Cut sites visible on maps’ should I select to begin compiling the right facing arrow? Under ‘Show enzymes that cut’ is there anything in particular I should select? Should I select ‘Highlight enzymes with compatible sticky ends’? How do I actually learn how to make the right facing arrow?	Perplexity
This seems like a good workflow. How do I create a new DNA with a 5000bp length? I assume I have to import a sequence of that length from an NCBI accession correct?	Perplexity
When I type in HindIII I see 0 cuts. I can’t make a ~4000bp band based on that, right?	Perplexity
Everything is coming up a 0 when I try to put in an enzyme. I think there’s hallucinating going on or there is something wrong when you started having me make the 5kb sequence	Perplexity
When I select an enzyme for a digest, do I then need to select a cut site to make the bio art or no?	Perplexity
Dumb question: How do I see the length of a digest for a given enzyme in a sequence?	Perplexity
Tell me how to search NCBI for an accession	Perplexity
When I look inside the Digest feature in Benchling, how do I find an enzyme that can give me a cut of a certain length? I see Name in one column, followed by Cuts, but am not exactly understanding what I’m seeing here	Perplexity
wondering if there are any special proteins found in the praying mantis insect and what exactly about these proteins make them special	Google AI Mode
Are there any proteins in nature that have unbelievable economy of space that would make them particularly useful for data storage?	Google AI Mode
i want to make a box where I can put some text in in markdown hugo relearn theme. I don’t want to create a table. What should i create?	Google AI Mode
When one does codon optimization and the sequence in question comes from a protein traditionally associated with a given species (let’s say an insect), does one traditionally optimize the codon sequence for that same species or its genus/family of species? How does this work given standard codon optimization practices?	Perplexity
So in essence, we perform codon optimization so our sequence in question can be expressed with greater fidelity or reliability in the host? Or so the host can receive or incorporate the sequence as efficiently as possible? Let me know if my thinking or terminology is off here	Perplexity
I have a nucleotide sequence for the Mantis Fibroin 1 protein. Have learned about some of the Mantis Fibroin 1 protein’s interesting properties, namely how it helps create a coiled yet flexible casing around Praying Mantis eggs. At this point, I think this protein might have some biomimetic potential, but am not sure what organism I should optimize the sequence for. Traditionally I know E. coli and Baker’s yeast are used a lot in synthetic biology applications, and I know mammalian cells are apparently more challenging to work with. For a use case like this one, where I have an insect-associated protein that may have biomimetic properties, let me know some traditional host organisms in biotechnology that are used for codon optimization in cases like this one. Do NOT hallucinate. Use existing sources. If you cannot provide anything whatsoever, say so	Perplexity
When we have a protein we find in the wild, and then we codon optimize its nucleotide sequence for expression in a host organism, what do cell-dependent or cell-free methods to produce this codon optimized protein from the sequence mean in a biotechnological context? What exactly are we talking about?	Perplexity
Based on the answer to the previous prompt, what is a promoter? What is a lysate mix? Do NOT hallucinate when answering these questions	Perplexity
What are some cell-dependent methods of producing proteins from DNA in biotechnology? Are there multiple types of cell-dependent methods? Do NOT hallucinate when answering these questions	Perplexity
Are cell-dependent methods for producing proteins from DNA in biotechnology distinct from cell-dependent technologies for producing proteins from DNA or are the terms ‘methods’ and ’technologies’ essentially interchangeable in a biotechnology context? If they’re not, describe some cell-dependent technologies for producing proteins from DNA. Do NOT hallucinate when addressing this query. If you cannot answer this question, say so	Perplexity
explain to me how cell-free expression of a protein works	Google AI Mode
Does the histone code have anything to do with the ability for a single gene in nature to code for multiple proteins at the transcriptional level? How does the histone code relate to the transcriptome? Are they one and the same?. Do NOT hallucinate when answering these questions. If you don’t know the answers to these questions, say so	Perplexity
Based on the answer to the last prompt, then what does allow for a single gene in nature to code for multiple proteins at the transcriptional level? In plain terms, what does “…different exon combinations produce distinct mRNA isoforms from one gene, which then translate into varied proteins” mean? What are exons again, and how do they produce different combinations? What is an mRNA isoform? Do NOT hallucinate when answering this query	Perplexity
Thank you for the answer to the last query. What exactly is being sliced (the gene itself or something else), and why is it being spliced? Why would nature/evolution create this ability? Do NOT hallucinate when answering this query. Go off existing literature	Perplexity
Does alternative splicing operate at the transcriptional level? From what I can see in the link below, it operates on the translational level. https://www.yourgenome.org/theme/what-is-rna-splicing/. If there was hallucination, or an error in answering the previous prompts, say so. Or, if I’m misreading or misunderstanding things, say so. Just wondering what at the transcriptional level allows for a single gene to code for multiple proteins	Perplexity
how does a single gene in nature code for multiple proteins at the transcriptional level?	Google AI Mode
I want to align a DNA sequence, its transcribed RNA, and a resulting translated protein. I believe I can capture the separate pieces (the DNA sequence, the transcribed RNA, and the resulting translated protein) in Benchling. If this is true, how can I go about doing this?	Perplexity
I have a codon optimized sequence for the Mantis Fibroin 1 protein in a Saccharomyces cerevisiae (baker’s yeast) host. I want to produce the RNA sequence and the final translated protein. What services online can I use to do this?	Perplexity
What does forward and reverse translation of a DNA sequence in Benchling mean?	Perplexity
I have a codon optimized nucleotide sequence (DNA). How can I find what the RNA sequence and translated protein look like for this sequence? What services can I use to see these items?	Google AI Mode

Part 4: Prepare a Twist DNA Synthesis Order

4.1: Create a Twist account, and Benchling account

Twist Account Creation

Benchling Account Creation Confirmation

4.2: Build Your DNA Insert Sequence

Original Sequence Insertion (NOTE: I think I may have started off inserting the wrong sequence. This may have been fixed when I inserted an sfGFP sequence from NCBI.)

Codon Optimization (NOTE: I think I may have started off inserting the wrong sequence. This may have been fixed when I inserted an sfGFP sequence from NCBI.)

Corrected NCBI Sequence Insertion

Corrected NCBI Sequence Codon Optimization

Start Codon Annotation

Stop Codon Annotation

Promoter BBa_J23106 Insertion

RBS Insertion

Coding Sequence Insertion

7x His Tag Insertion

Terminator BBa_B0015 Insertion

Sequence Linear View

Downloaded Sequence FASTA file (via Mac OS TextEdit). See below:
HQ873313 (codon optimized) TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCatgAGCAAAGGAGAAGAACTTT TCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGA AGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTT GTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTG CCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAA GTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACAC AAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCA AAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCC TGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATG GTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAAcgtaaaggcgaggagctgt tcactggtgtcgtccctattctggtggaactggatggtgatgtcaacggtcataagttttccgtgcgtggcgagggtga aggtgacgcaactaatggtaaactgacgctgaagttcatctgtactactggtaaactgccggtaccttggccgactctg gtaacgacgctgacttatggtgttcagtgctttgctcgttatccggaccatatgaagcagcatgacttcttcaagtccg ccatgccggaaggctatgtgcaggaacgcacgatttcctttaaggatgacggcacgtacaaaacgcgtgcggaagtgaa atttgaaggcgataccctggtaaaccgcattgagctgaaaggcattgactttaaagaagacggcaatatcctgggccat aagctggaatacaattttaacagccacaatgtttacatcaccgccgataaacaaaaaaatggcattaaagcgaatttta aaattcgccacaacgtggaggatggcagcgtgcagctggctgatcactaccagcaaaacactccaatcggtgatggtcc tgttctgctgccagacaatcactatctgagcacgcaaagcgttctgtctaaagatccgaacgagaaacgcgatcatatg gttctgctggagttcgtaaccgcagcgggcatcacgcatggtatggatgaactgtacaaatgaCATCACCATCACCATC ATCACtaaCCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTG AACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
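The part-by-part build in 4.2 can be sketched in Python as a simple concatenation that records where each part lands. The promoter, RBS, His tag, and terminator strings are copied from the FASTA above. The coding sequence is a hypothetical 21-base stand-in (the first seven codons of the sfGFP sequence), and the part order (promoter, RBS, coding sequence, 7x His tag, stop codon, terminator) is an assumption based on the steps listed above.

```python
PARTS = [
    ("promoter BBa_J23106", "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"),
    ("RBS", "CATTAAAGAGGAGAAAGGTACC"),
    ("coding sequence (toy stand-in)", "ATGAGCAAAGGAGAAGAACTT"),  # hypothetical
    ("7x His tag", "CATCACCATCACCATCATCAC"),
    ("stop codon", "TAA"),
    ("terminator BBa_B0015",
     "CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTG"
     "AACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA"),
]


def assemble(parts):
    """Concatenate parts in order and return (construct, 1-based annotations)."""
    construct = ""
    annotations = []
    for name, seq in parts:
        start = len(construct) + 1
        construct += seq.upper()
        annotations.append((name, start, len(construct)))
    return construct, annotations


construct, annotations = assemble(PARTS)
for name, start, end in annotations:
    print(f"{start:>4}-{end:<4} {name}")

# The open reading frame (start codon through stop codon) should be a multiple of 3.
orf_start = annotations[2][1]
orf_end = annotations[4][2]
print("ORF length divisible by 3:", (orf_end - orf_start + 1) % 3 == 0)
```

Printing the annotations this way mirrors the start/stop positions shown in a linear sequence view, which makes it easier to spot a part that landed in the wrong place.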

4.3: On Twist, Select the Genes Option

Selected Genes

4.4: Select the ‘Clonal Genes’ Option

Clonal Genes Selected (see results in subsections 4.5 and 4.6 below)

4.5: Import your sequence

Imported Sequence (Step 1)

Imported Sequence (Step 2)

4.6: Choose Your Vector

Chose Vector

Downloaded Construct

Imported sequence into Benchling and viewed resulting plasmid

All Section 4 Prompts Listed Below

Supporting Prompt	Source
Tell me how to add a Promoter to a sequence in Benchling	Perplexity
Found this information from the Registry of Standard Biological Parts: BBa_J23106 Can you break down what this naming convention means and how I can find the relevant Promoter information in a sequence based on this naming convention? Do NOT hallucinate. If you don’t know the answer, say so	Perplexity
What is an alignment in Benchling? In Benchling, how do I put a codon optimized sequence under or next to a sequence I originally imported? Do NOT hallucinate when answering this question	Perplexity
How do I replace a sequence in Benchling with a codon-optimized sequence?	Perplexity
Bit confused regarding how to find a Promoter in a sequence in Benchling. I tried Auto-Annotate and it doesn’t seem to be working. Where should I go from here?	Perplexity
What is an RBS in Benchling?	Perplexity
What is a 7x His Tag? What is a Terminator? How do I find these in Benchling? Where are these traditionally inserted into a sequence in Benchling?	Perplexity
How do I paste sequences into a Benchling file?	Perplexity
How do I know where to insert a Promoter into a given sequence in Benchling?	Perplexity
Not totally understanding. If the start codon (the ATG) represents the start of the sequence, how do I insert something before that in Benchling?	Perplexity
What is an RBS? Where are they traditionally inserted into a sequence?	Perplexity
What do spacers look like in Benchling? Is it literally just empty space with no letters/codons? Something tells?	Perplexity
Where is a coding sequence traditionally inserted in a codon optimized sequence in Benchling? If there’s something off in what I’m saying, let me know	Perplexity
Where is a C-terminus in a protein in Benchling?	Perplexity
How do I find an amino acid view for a sequence in Benchling?	Perplexity
In Benchling, if I’m inserting a 7x His Tag and a Terminator, and I have a stop codon in my sequence, what is the traditional sequence? Is it 7x His Tag, stop codon, Terminator? Something else?	Perplexity
Any way I can add a Schema to a sequence after the fact in Benchling?	Perplexity

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?
I want to sequence the DNA of a bdelloid rotifer from the genus Adineta, a microscopic invertebrate that looks a bit like a worm. I’m fascinated by its ability to sustain cryptobiosis for thousands of years (in the case of a bdelloid rotifer thawed out in Russia in 2015, more than 24,000!!). Transgenesis of the bdelloid rotifer’s cryptobiotic abilities into mammalian organisms could have profound impacts on the future of our species, specifically the ability of groups of Homo sapiens or other future sapiens forms to engage in interstellar travel over large spans of time and space. More information on the bdelloid rotifer genus Adineta and its cryptobiotic abilities can be found in the footnote at the end of this sentence ⁴.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
- I’d use Next-Generation Sequencing (NGS) on my DNA because it’s well-suited for transgenesis. It’s fast, high-resolution, and allows for massively parallel sequencing (if desired). More importantly, it’s highly precise, meaning it can pinpoint transgene locations within a host genome.
Is your method first-, second- or third-generation or other? How so?
- It’s a second-generation sequencing method, as it emerged in the 2000s, after the advent of Sanger sequencing in the 1970s and before the advent of single-molecule sequencing in the 2010s.
What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
- If the gene exists in an organism, the initial input is extracted genomic DNA from that particular organism. Otherwise, the initial input can be a plasmid (if the DNA’s already cloned) or complementary DNA (cDNA) if only mRNA is available. Essential preparation steps listed below (assuming gene exists in an organism):
  - Isolate or extract the gene; lyse cells or tissues from donor, remove proteins and contaminants
  - Fragment the isolated or extracted gene into fragments of approx. 200-600 bp
  - Convert ragged ends into blunt or sticky ends
  - Attach adapter sequences to ends of each fragment
  - Enrich fragments of the intended size (i.e., the size you want), removing any remaining small artifacts and excess adapters
  - Review fragment size distribution
  - Convert double-stranded library into single strands if necessary
  - Load into sequencing instrument
What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
- The essential NGS steps are listed below:
  - Preparation (see above)
  - Amplification: Many copies of each DNA fragment are created on a flow cell. One end of each fragment sticks to a primer, gets copied into a complementary strand, and bends over to stick to a neighboring primer. This bridging repeats many times, forming amplified (hence the name) clusters of identical grouped fragments
  - Sequencing: A polymerase enzyme adds one colored fluorescent nucleotide to each strand in each cluster of grouped fragments. A camera takes a picture recording the color for the given nucleotide (revealing its A, T, G, or C base), then chemicals are used to wash away any free-floating nucleotides that aren’t part of a given cluster.
  - Base Decoding/Base Calling: A computer analyzes these colored image clusters, assigning each cluster a sequence of bases based on its colors. This analysis then becomes a text string of DNA code, with confidence ratings per base sequence based on the resolution of the read ⁵
What is the output of your chosen sequencing technology?
- See above. NGS outputs a text string of DNA code, with confidence ratings per base sequence
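The per-base confidence ratings mentioned above are commonly stored as Phred quality scores in a FASTQ file. Below is a minimal sketch of how those scores can be decoded, assuming the widely used Phred+33 ASCII encoding. The read in the example is made up for illustration and is not from a rotifer dataset.

```python
def decode_quality(qual_string, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Hypothetical 4-line FASTQ record: header, bases, separator, qualities
record = ["@example_read", "GATT", "+", "I5+!"]
bases, qualities = record[1], decode_quality(record[3])

for base, q in zip(bases, qualities):
    # A higher Q means a lower chance that the base call is wrong
    print(base, q, phred_to_error_prob(q))
```

A score of 40 corresponds to a 1-in-10,000 chance that the base call is wrong, while a score of 10 corresponds to a 1-in-10 chance.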

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?
- See answer to 5.1 (i). I’m fascinated by the potential of transgenesis of the bdelloid rotifer Adineta’s cryptobiotic abilities for interstellar travel, and therefore, am very interested in reading and editing its sequence.
NOTE: I found a bdelloid rotifer Adineta sequence in the link in the footnote. However, the raw FASTA information is so long that upon insertion into this webpage, it seemingly broke the webpage, or caused it to freeze up (pardon the unintentional cryptobiosis pun) ⁶
(ii) What technology or technologies would you use to perform this DNA synthesis and why?
- I’d use PCR amplification, chemical synthesis, and restriction enzymes and ligation to perform this DNA synthesis. I’d use PCR so I can make many copies of the original DNA. I’d use chemical synthesis so I can clone the DNA into a given plasmid, packaging the DNA in an appropriate vehicle for the desired level of expression. I’d use restriction enzymes and ligation for precise assembly.
- What are the essential steps of your chosen sequencing methods?
  - PCR amplification:
    - Denaturation: Heat DNA so it breaks into single strands
    - Annealing: Cool the DNA, allowing primers to bind to sites in target gene
    - Extension: DNA polymerase adds nucleotides from the primer starting point, allowing each strand to fully copy
  - Chemical synthesis:
    - Deprotection: Remove protecting DMT group
    - Base Coupling: Add protected phosphoramidite nucleotide for phosphate linkage
    - Capping: Cap the chain to prevent errors
    - Oxidation: Stabilize phosphate triester bonds
  - Restriction enzymes and ligation:
    - Stitch oligos from prior chemical synthesis step into a complete gene for insertion into plasmid
    - Add flanking restriction enzyme sites at ends of oligos as needed
    - Clean the isolated DNA via gel extraction
    - Incubate the gene insert and its plasmid vector with matching restriction enzymes. The enzymes recognize specific sequences in the insert and vector and cut there, creating blunt or sticky ends
    - Use DNA ligase enzyme to join compatible ends
- What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?
  - A scalability challenge is that it’s difficult to synthesize a sequence of more than 200nt via direct synthesis methods because at a certain point, too many errors accumulate in the synthesized sequence
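The length limit on direct synthesis can be sketched with a simple model: if each coupling step succeeds with some fixed probability, the fraction of full-length strands shrinks exponentially with length. The 99% coupling efficiency below is an illustrative assumption of mine, not a figure from the lecture or from a specific vendor.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of strands that reach full length.

    A strand of length_nt needs (length_nt - 1) successful couplings after
    the first base. coupling_efficiency is an assumed per-step success rate.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 60, 200):
    fraction = full_length_fraction(length)
    print(f"{length} nt: {fraction:.1%} full-length product")
```

Under this assumption, only around an eighth of 200 nt strands come out full length, which is why longer genes are usually stitched together from shorter oligos.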

5.3 DNA Edit

(i) What DNA would you want to edit and why?
- See answer to 5.1 (i). I’m fascinated by the potential of transgenesis of the bdelloid rotifer Adineta’s cryptobiotic abilities for interstellar travel, and therefore, am very interested in reading and editing its sequence.
(ii) What technology or technologies would you use to perform these DNA edits and why?
- I’d use CRISPR-Cas9 to perform these DNA edits because when performing transgenesis from an invertebrate to a mammalian vertebrate, it allows for non-random, precise insertion of large amounts of genetic data with reduced risk of unintended or off-target effects.
- How does your technology of choice edit DNA? What are the essential steps?
  - CRISPR-Cas9 edits DNA through a multi-stage mechanism. This mechanism is broken down below:
    - Recognition: A single guide RNA (sgRNA) pairs with a Cas9 protein
    - This pair scans the genome for a 20bp DNA sequence that matches the guide and sits next to a protospacer adjacent motif (PAM)
    - If there is a PAM next to the desired 20bp DNA sequence, the Cas9 makes a double-stranded break (DSB)
    - The DSB then triggers repair mechanisms (either non-homologous end joining [NHEJ] or homology-directed repair [HDR]). These repair mechanisms allow the desired edited DNA to be incorporated into the sequence
- What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
  - Preparation:
    - Select target site(s)
    - Design and synthesize gRNA
    - Build DNA donor templates
    - Create mixture of recombinant Cas9 protein and purified gRNA
    - Prepare cells/embryos
  - Inputs to this process are gRNA, Cas9 protein, donor DNA (usually linear templates), and a delivery vehicle (usually an injection buffer)
- What are the limitations of your editing methods (if any) in terms of efficiency or precision?
  - HDR can have low efficiency in a transgenesis context
  - Off-target or unintended consequences can still occur
  - Need PAMs near target sequences for precise DSBs
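The PAM requirement above can be illustrated with a small search for candidate target sites. This is a minimal sketch that assumes the commonly used SpCas9 PAM (NGG) and scans only the forward strand. The toy sequence is made up, and a real design tool would also check the reverse strand and score off-target risk.

```python
import re


def find_ngg_targets(dna, protospacer_len=20):
    """Return (start, protospacer, pam) for each 20 nt site followed by NGG.

    Forward strand only. A lookahead is used so overlapping sites are found.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Hypothetical toy sequence: a 20 nt stretch, then an AGG PAM, then filler
toy_sequence = "ACGT" * 5 + "AGG" + "TTTT"

for start, protospacer, pam in find_ngg_targets(toy_sequence):
    print(start, protospacer, pam)
```

If the function returns an empty list for a region of interest, there is no usable NGG PAM on that strand, which is the limitation noted in the last bullet above.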

All Section 5 Prompts Listed Below

Supporting Prompt	Source
Remind me what a digest is in a biotechnological context. I know it has something to do with subdividing DNA sequences into fragments based on enzymes, but there’s some additional information I know I’m missing. Do NOT hallucinate when answering this question	Perplexity
What is horizontal gene transfer? A separate question (perhaps): What is the technical term in biotechnology for transferring the abilities of one organism to another (ex. if I wanted to actually give a lizard the ability to fly like a bird by importing genetic properties that allow for the creation of wings for example)?	Perplexity
What is it called in biotechnology when traits from one organism are transferred or conferred to another via an engineered process or processes?	Google AI Mode
If I want to perform transgenesis in a biotechnological context (i.e., introduce a foreign gene into a new organism to confer a desired trait), and I want to start this process by sequencing the original foreign gene, what is considered the best practice in modern biotechnology for sequencing this original foreign gene? Is this sequencing method first, second, or third generation in the history of biotechnology? From some other period? What essential steps does it involve and how does it decode the bases of the original foreign gene? What is its output? Do NOT hallucinate when answering these questions. If you don’t know the answer to any of these questions, say so	Perplexity
How is Next-Generation Sequencing (NGS) considered second generation? Do NOT hallucinate when answering this question	Perplexity
If I want to perform transgenesis in a biotechnological context (i.e., introduce a foreign gene into a new organism to confer a desired trait), and I want to start this process by sequencing the original foreign gene via Next-Generation Sequencing (NGS), what is my input at the very beginning of the sequencing process? How is that input prepared for sequencing? Do NOT hallucinate when answering these questions. If you don’t know the answer to any of these questions, say so	Perplexity
What is cDNA?	Perplexity
I have the following information describing and Illumina Next-Generation Sequencing (NGS workflow: –Library Prep: Extract/isolate the gene (PCR amplify if needed), fragment to ~200-500 bp, add adapters/barcodes.–Amplification: Bridge amplification on flow cell creates clusters of identical fragments.–Sequencing: DNA polymerase incorporates fluorescent reversible terminator nucleotides (A/T/G/C); camera captures color/emission per base, cleaving terminator for next cycle.–Imaging/Analysis: Base calling from images, alignment to reference or de novo assembly. Explain the amplification through imaging/analysis steps to me as if I was a reasonably educated 16 year old without an advanced biotechnology background. Tell me what the terms in the amplification through imaging/analysis steps mean. Do NOT hallucinate when addressing this query	Perplexity
What is a polymerase enzyme? Explain this to me as if I was a reasonably educated 16 year old without an advanced biotechnology background	Perplexity
Found the following on NCBI: “Uncultured bdelloid rotifer isolate Undet.AN.1.3 cytochrome oxidase subunit I gene, partial cds; mitochondrial"Is that the same as the full bdelloid rotifer genome? Bit confused and am unsure whether or not it isDo NOT hallucinate. If you don’t know the answer to this question, say so	Google AI Mode
Yes, I’m looking for a complete nuclear genome of a bdelloid rotifer, although I’m not sure which species of bdelloid rotifer I’m looking for. I thought bdelloid rotifer was its own species. Open to having any misconceptions cleared up. If you can provide me links where I can find a complete bdelloid rotifer nuclear genome, that would be greatly appreciated	Google AI Mode
Am aware that there was a bdelloid rotifer that came out of long-term cryptobiosis in Russia back in 2015. Any chance we know/you can find the specific species of this rotifer that had this cryptobiotic ability? If so, I’d like the complete nuclear genome for that rotifer species	Google AI Mode
In the answer two prompts ago, there was a mention of “chemicals wash away extras” in the ‘Sequencing by synthesis’ step. What are ’extras’ in this context? Do NOT hallucinate when answering this question	Perplexity
Is the workflow sequence described in the answer to the prompt 2 prompts ago Polymerase Chain Reaction (PCR)? Believe so, but am not sure	Perplexity
If I want to perform transgenesis and I start by Next-Generation Sequencing (NGS) to read the DNA, what comes after (i.e., what technologies are traditionally used for writing and then editing the DNA or the original organism that has the abilities I want to confer in a host)? Do NOT hallucinate when answering this question	Perplexity
In the answer to the last prompt, can you give me the ‘So What?’ or ‘So What’? behind the ‘Writing the DNA (Gene Preparation & Cloning)’ section from a transgenesis perspective? What do gene synthesis steps like “clone it into a plasmid vector with promoter, terminator, and selection marker.” allow one to do in a transgenesis context? What does restriction enzyme digestion and ligation allow one to do? Do NOT hallucinate. If you don’t know the answers to these questions, say so	Perplexity
I have a DNA writing workflow consisting of the following steps:–PCR amplification: Use the sequenced gene info to design primers and amplify the gene from the original source DNA.–Gene synthesis: Chemically synthesize the DNA sequence (especially if optimized for the host), then clone it into a plasmid vector with promoter, terminator, and selection marker.–Restriction enzyme digestion & ligation: Cut the vector and gene insert with enzymes, then join (ligate) them using DNA ligase to create a recombinant plasmidCan you give me the ‘So What?’ or ‘So What’? behind this workflow from the perspective of someone who wants to perform transgenesis from one organism to another? What do gene synthesis steps like “clone it into a plasmid vector with promoter, terminator, and selection marker.” allow one to do in a transgenesis context? What does restriction enzyme digestion and ligation allow one to do?Do NOT hallucinate. If you don’t know the answers to these questions, say so	Google AI Mode
How does PCR amplification and gene synthesis actually work? What are the essential steps?	Perplexity
How does the content in the last prompt related to the Phosphoramidite DNA Synthesis Cycle? Do NOT hallucinate when answering this question	Perplexity
What is a dNTP in genomics?	Perplexity
What is a ‘phosphate triester’? Explain this to me in simple terms. Do NOT hallucinate	Perplexity
Understanding the phosphoramidite DNA synthesis cycle. How does this transition to the essential steps of restriction enzyme digestion and ligation? Do NOT hallucinate when answering this question	Perplexity
If I want to perform transgenesis from one organism to another in a biotechnological context, particularly if I want to confer a trait from an invertebrate organism to a vertebrate, mammalian organism, what is/are the recommended DNA technology or technologies for accomplishing this task? What are the benefits and drawbacks, and respective workflows of each ot these technologies? Where does CRISPR fit into the mix?	Perplexity
Can you elaborate on the CRISPR-Cas9 workflow from the answer to the last prompt, specifically describing how it edits DNA? Can you make a workflow just focused on that? Explain this workflow to me as if I was a reasonably educated person with some (not extensive) biology and biotechnology knowledge. Do NOT hallucinate when answering this query	Perplexity

https://www.uniprot.org/uniprotkb/I3PM87/entry#sequences ↩︎
https://rest.uniprot.org/uniprotkb/I3PM87.fasta ↩︎
https://www.yourgenome.org/theme/what-is-rna-splicing/ ↩︎
Excerpt from “The Great Siberian Thaw” (New Yorker Magazine; 2022-01-17): “Permafrost thaw has brought to the surface all sorts of mysteries from millennia past. In 2015, scientists from a Russian biology institute in Pushchino, a Soviet-era research cluster outside Moscow, extracted a sample of yedoma from a borehole in Yakutia. Back at their lab, they placed the piece of frozen sediment in a sterilized culture box. A month later, a microscopic, wormlike invertebrate known as a bdelloid rotifer was crawling around inside. Radiocarbon dating revealed the rotifer to be twenty-four thousand years old. In August, I drove out to Pushchino, where I was met by Stas Malavin, a researcher at the laboratory. “It’s one thing for a simple bacterium to come back to life after being buried in the permafrost,” he said. “But this creature has intestines, a brain, nervous cells, reproductive organs. We’re clearly dealing with a higher order.” The rotifer had survived the intervening years in a state of “cryptobiosis,” Malavin explained, “a kind of hidden life, where metabolism effectively slows down to zero.” The animal emerged from this geological “time machine,” as he put it, not just alive but able to reproduce. A rotifer lives for only a few weeks, but replicates itself multiple times through parthenogenesis, a type of asexual reproduction. Malavin removed from the lab fridge a direct descendant of the rotifer that had crawled out of the permafrost and placed it under a microscope. An oval-shaped plankton squirmed around; I imagined this blob, two-tenths of a millimetre in size, as a nervous explorer who awoke to find itself in a strange and unexpected future. “Why be modest?” Malavin asked. Unlocking the secret of how an animal with a complex anatomy was able to shut down for tens of thousands of years and then turn itself back on might, for example, offer hints for using cryogenic conditions to store organs for donation. Neuroscientists at M.I.T. have been in touch.
“I’m obviously not saying our findings will lead to people being put into long-term cryogenic slumber tomorrow,” Malavin said. “But it’s a step in that direction.”” ↩︎
I think this might mean the resolution of the image of the cluster ↩︎
https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_021613535.1/ ↩︎

Week 3 HW: Lab Automation

The Power of Lab Automation

Assignment: Python Script for Opentrons Artwork

0: Attended this week’s recitation and reviewed the lab information on programming Opentrons
1: Generated an artistic design using Ronan’s Opentrons GUI ¹
2: Artistic Design Python Script: See script in URL below:
- https://colab.research.google.com/drive/1-pgSJt_aF9MydtG0szxz2YKoogNRLRhH#scrollTo=PsOgJ2DndZzt
3: Listing my sfgfp point coordinates from Ronan’s Opentrons GUI below (the shape is a rightward-facing green arrow):
- [(6.6,11), (8.8,11), (11,11), (8.8,8.8), (11,8.8), (13.2,8.8), (11,6.6), (13.2,6.6), (15.4,6.6), (13.2,4.4), (15.4,4.4), (17.6,4.4), (15.4,2.2), (17.6,2.2), (19.8,2.2), (17.6,0), (19.8,0), (22,0), (-22,-2.2), (-19.8,-2.2), (-17.6,-2.2), (-15.4,-2.2), (-13.2,-2.2), (-11,-2.2), (-8.8,-2.2), (-6.6,-2.2), (-4.4,-2.2), (-2.2,-2.2), (0,-2.2), (2.2,-2.2), (4.4,-2.2), (6.6,-2.2), (8.8,-2.2), (11,-2.2), (13.2,-2.2), (15.4,-2.2), (17.6,-2.2), (19.8,-2.2), (22,-2.2), (24.2,-2.2), (-22,-4.4), (-19.8,-4.4), (-17.6,-4.4), (-15.4,-4.4), (-13.2,-4.4), (-11,-4.4), (-8.8,-4.4), (-6.6,-4.4), (-4.4,-4.4), (-2.2,-4.4), (0,-4.4), (2.2,-4.4), (4.4,-4.4), (6.6,-4.4), (8.8,-4.4), (11,-4.4), (13.2,-4.4), (15.4,-4.4), (17.6,-4.4), (19.8,-4.4), (22,-4.4), (24.2,-4.4), (26.4,-4.4), (-22,-6.6), (-19.8,-6.6), (-17.6,-6.6), (-15.4,-6.6), (-13.2,-6.6), (-11,-6.6), (-8.8,-6.6), (-6.6,-6.6), (-4.4,-6.6), (-2.2,-6.6), (0,-6.6), (2.2,-6.6), (4.4,-6.6), (6.6,-6.6), (8.8,-6.6), (11,-6.6), (13.2,-6.6), (15.4,-6.6), (17.6,-6.6), (19.8,-6.6), (22,-6.6), (24.2,-6.6), (17.6,-8.8), (19.8,-8.8), (22,-8.8), (15.4,-11), (17.6,-11), (19.8,-11), (13.2,-13.2), (15.4,-13.2), (17.6,-13.2), (11,-15.4), (13.2,-15.4), (15.4,-15.4), (8.8,-17.6), (11,-17.6), (13.2,-17.6), (6.6,-19.8), (8.8,-19.8), (11,-19.8)]
4: Used Gemini 2.5 Flash (built into Google Colab) to assist with completing the coding portion of the homework. I have some rough Python knowledge via a Codecademy course, which helped get things started (i.e., I did do some of the coding for this assignment).

All Gemini 2.5 Flash prompts are listed below:

Supporting Prompt	Model
I want to create some code similar to the code in Examples 1-7. What are the core elements of the code I need to create?	Gemini 2.5 Flash
Been working on some code in the ‘Your Code’ module. Have made a single green dot so far. Looking to create a rightward-facing green arrow based on these coordinates in the attached .py file Tell me how the code in the ‘Your Code’ module under the #Aspirate subsection needs to be edited to output the rightward facing green arrow in the attached .py file	Gemini 2.5 Flash
Wondering if you could help explain something. Not seeing an actual visualization of a green arrow below. Where is it? Can you give me a picture output of the code similar to the picture output located in the examples in this URL below? https://colab.research.google.com/drive/1VoouRH0nqlk09g50rHxOElaLD-SVknYY#scrollTo=PsOgJ2DndZzt	Gemini 2.5 Flash
Ensure the ‘run’ function executes all `arrow_points`	Gemini 2.5 Flash
Still getting a single green dot when I run the simulator. Have inputted the coordinates for the rightward-facing green arrow based on the attached file and am aware this will need to likely need a for loop to aspirate the colors.	Gemini 2.5 Flash
Still getting a single green dot when I run the simulator. Have inputted the coordinates for the rightward-facing green arrow based on the attached file and am aware this will need to likely need a for loop to aspirate the colors. Recommendations on how to proceed?	Gemini 2.5 Flash
Explain why only 1uL was dispensed instead of 20uL	Gemini 2.5 Flash
Looking at the code and see that I inputted the points for the rightward-facing arrow under arrow_points. Not understanding what parts of the code need to be changed (if any) so that when the simulation runs, a rightward-facing green arrow is outputted	Gemini 2.5 Flash
Thank you. Can you please tell me where cell “pczDLwsq64mk” is located, so I can have an idea of the code used to create the rightward-facing arrow in the simulation?	Gemini 2.5 Flash

5: Coordinating robot time slot with William & Mary node
6: Submitted Python file via assignment form (see screenshot below):

Python Form Submission Confirmation Screenshot
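When the simulator only shows a single dot, one quick sanity check is to draw the coordinate list as text before running the robot code. The sketch below is not the Opentrons API and is not the script in the Colab link. It is plain Python that assumes the points sit on a 2.2-unit grid (inferred from the coordinates in item 3) and uses only the first two rows of the arrow to stay short.

```python
def render_points(points, step=2.2):
    """Draw (x, y) points as rows of '#' and '.', top row first.

    step is the assumed grid spacing. Rounding absorbs floating-point noise
    such as 6.6 / 2.2 not being exactly 3.
    """
    cols = [round(x / step) for x, _ in points]
    rows = [round(y / step) for _, y in points]
    filled = set(zip(cols, rows))
    lines = []
    for r in range(max(rows), min(rows) - 1, -1):
        line = "".join(
            "#" if (c, r) in filled else "."
            for c in range(min(cols), max(cols) + 1)
        )
        lines.append(line)
    return lines


# First two rows of the arrow from item 3 (subset only)
arrow_top = [(6.6, 11), (8.8, 11), (11, 11), (8.8, 8.8), (11, 8.8), (13.2, 8.8)]
print("\n".join(render_points(arrow_top)))
```

Passing the full coordinate list instead of `arrow_top` should print the whole arrow, which makes it easy to confirm the loop sees every point and not just the first one.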

Post-Lab Questions

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
- The paper An Automated Versatile Diagnostic Workflow for Infectious Disease Detection in Low-Resource Settings was published in the journal Micromachines in 2024 and describes using simple Commercial Off-the-Shelf (COTS) reagents and lab equipment, along with an Opentrons robot, to create an automated workflow for detecting diseases in low-resource settings ². What’s interesting about the paper is that the workflow the researchers designed reduced the time for detecting a pathogen (in this case meningitis) by approx. 18% (total time of 118 min.) with an almost 5.8x reduction in cost for sample processing ³. The total cost to run 8 samples for meningitis detection was approx. $126 USD, a cost savings that matters in low-resource environments. The findings and extension of this paper’s workflow show promise for decentralized disease detection in low-resource settings during future health security incidents. Except for opening and closing of tube lids, the following four steps of the automated workflow were completed by the Opentrons robot:
  - DNA isolation with Dynabeads
    - Sample incubation
    - Washing step
    - DNA resuspension
  - DNA amplification
    - Recombinase Polymerase Amplification (RPA) mix preparation
    - RPA amplification
  - DNA digestion
    - Exonuclease digestion
  - DNA detection
    - Preparation of Vertical Flow Microarray (VFM) solutions
    - Addition of samples to VFM
    - Signal enhancement
Associated AI prompts to address this question included below:

Supporting Prompt	Model
Can you find me biotechnology related papers from the past 5 years (ideally from Western sources) that incorporate Opentrons or another lab automation tool to create or research a novel biosecurity application	Google Scholar Labs
Give me a rundown of what primers are in genomics in simple terms (don’t get overly technical, if possible). Explain this to me as if I was a reasonably educated 15 year-old. Tell me if primers are found in nature or if they’re an artificial construct. Tell me how primers are used in biotechnology Do NOT hallucinate when answering this query	Perplexity
In a gene amplification context, what does vortexing mean?	Perplexity
What are amplicons?	Perplexity
What is a ctrA gene?	Perplexity
What is a reagent in a diagnostic or biotechnology context?	Perplexity

Write a description about what you intend to do with automation tools for your final project.
- Phage isolation experiments generally require enriching bacterial strains, filtering out bacterial particles, pouring the phage-containing mixture in with fresh bacteria on agar, plating and re-plating phage plaques (areas of phage propagation and bacterial destruction on agar), and characterizing the resulting phage plaques on the agar. My working thought is that I might be able to create a somewhat automated workflow, à la the paper referenced in the previous question, to help with the filtration through characterization steps of a phage isolation experiment. This might involve using an Opentrons robot and COTS equipment to:
  - Handle bacterial liquids
  - (Maybe) use an Opentrons module to shake bacterial liquids and reagents
  - Operate centrifuge for spinning bacteria
  - Pour agar with phage plaques
  - Operate micropipettes for phage testing
  - Operate Thermocycler module for amplifying specific lysed areas
  - Rapidly run software to characterize novel phage DNA

Note: The material above is not set in stone. It’s an outline of potential automation options for a potential final project (Space (LEO and beyond) or Suborbital Phage Isolation)

Associated AI prompts to address this question included below:

Supporting Prompt	Model
What do phage isolation experiments usually entail? Are any elements of phage isolation experiments dangerous to human health and safety, and if so, why?	Perplexity
Take the steps in the “What phage isolation usually involves” section of the last prompt and break down the tools traditionally used for each step. Do NOT hallucinate when answering this question	Perplexity
What is supernatant?	Perplexity
What are plaques in a phage isolation experiment context?	Perplexity
What does it mean to ‘pellet’ bacteria?	Perplexity

Final Project Ideas

Submitted 3 Final Project ideas in my node’s section of the slide deck (see screenshot below):

All supporting prompts for Final Project Ideas Slide listed below

Supporting Prompt	Model
What is the microbiome? It’s the gut right? If I’m oversimplifying with the second question, let me know Do NOT hallucinate when answering these questions	Perplexity
Does a biotechnological equivalent of ‘Build Your Own Phage’ a-la Build a Bear or Lego exist in the real world? If so, does this capability exist in a personalized medicine context, or is it operable in remote environments? Do NOT hallucinate when answering this question	Perplexity
Tell me about the adenita vaga bdelloid rotifer that was able to maintain cryptobiosis for 24,000 years	Google AI Mode
Tell me about the adenita vaga bdelloid rotifer that went into cryptobiosis for 24,000 years	Google AI Mode
Thinking about creating some sort of real-time personalized medicine biological monitoring capability allowing phages to be identified, created, and disseminated for nascent infections. Create some type of logo image for this concept (the kind you might find on a sticker) that combines the elements of a phage and a time-keeping watch. Let’s not have it be too cartoony or effused with excessive color	Gemini
Nice. Change the writing on the top to PhageWatch as opposed to PhageGuard	Gemini

https://opentrons-art.rcdonovan.com ↩︎
https://www.mdpi.com/2072-666X/15/6/708#Introduction ↩︎
This is for running samples through the workflow, not for equipment like the Opentrons OT-One-Hood ↩︎

Week 4 HW: Protein Design Part 1

South American Rattlesnakes (Crotalus durissus terrificus) with Crotamine protein

Part A: Conceptual Questions

How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

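I take in approx. 3 * 10²⁴ amino acid molecules when ingesting 500 grams of meat, treating the meat as if it were made entirely of amino acids. This is based on 1 gram being approx. 6 * 10²³ Daltons: at ~100 Daltons per amino acid, 1 gram of meat holds approx. 6 * 10²¹ amino acid molecules, and 500 grams holds 500 times that.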
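The size of this estimate depends on how much of the 500 grams is counted as amino acids. Here is a rough sketch of the arithmetic, using the fact that 1 gram is about 6.022 * 10²³ Daltons. The `protein_fraction` parameter and the 20% value in the second call are my own illustrative assumptions, not part of the question.

```python
DALTONS_PER_GRAM = 6.022e23  # Avogadro's number: 1 g is about 6.022e23 Da


def amino_acid_molecules(meat_grams, protein_fraction=1.0, daltons_per_aa=100.0):
    """Estimate the number of amino acid molecules in a piece of meat.

    protein_fraction is the assumed share of the mass that is amino acids.
    daltons_per_aa is the average amino acid mass given in the question.
    """
    amino_acid_daltons = meat_grams * protein_fraction * DALTONS_PER_GRAM
    return amino_acid_daltons / daltons_per_aa


# Upper bound: treat all 500 g as amino acids
print(f"{amino_acid_molecules(500):.2e}")
# Assumed 20% protein content (illustrative value only)
print(f"{amino_acid_molecules(500, protein_fraction=0.2):.2e}")
```

Either way the answer is on the order of 10²³ to 10²⁴ molecules, and the unit is molecules, not Daltons.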

Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Humans eat beef but don’t become cattle, and eat fish but don’t become fish, because genetic information from the lifeform being ingested isn’t transferred wholesale. Much of the genetic material being eaten is broken down during digestion and, more importantly, a human being’s cells follow instructions derived from their own DNA. Human cells utilize amino acids from the lifeform being ingested, but do so according to their own specific genetic instructions. The lifeform being ingested and its amino acids are the raw materials the cell uses for various means.

Why are there only 20 natural amino acids?

There are several broad reasons why there are only 20 standard natural amino acids. The first reason is that early on in the history of evolution, this group of amino acids became more or less ’locked in’, meaning that once the basic relationship between three-letter codons and these 20 standard natural amino acids became widely distributed across life, it became too risky/dangerous from an evolutionary standpoint to alter this core set. Another reason is that the group of 20 gives enough range in structure and chemistry to build a large chunk of what evolution or directed evolution might desire. The other reasons seem to amount to various types of evolutionary trade-offs. Adding more than 20 amino acids to this standard set would add additional, potentially unwanted complexity, while decreasing the number of amino acids in the set might leave too few distinct side chains, which would in turn limit the ability of proteins to do things like fold precisely.

Can you make other non-natural amino acids? Design some new amino acids.

Yes you can. My attempts to design some new amino acids using SwissSideChain and the Cryo-EM structure of Receptor Tyrosine Kinase ROS1 PDB file in PyMol (open-source) are shown below:

Attempt at creating a non-natural amino acid residue mutation of Tyrosine Kinase ROS1 using cyclohexanecarboxylic acid

Attempt at creating a non-natural amino acid residue mutation of Tyrosine Kinase ROS1 using cyclopropanecarboxylic acid

Where did amino acids come from before enzymes that make them, and before life started?

In living cells, amino acids come from metabolic molecules within the cell. They are built from a carbon backbone and inorganic nitrogen through enzyme-facilitated chemical reactions. Before life as we know it started, amino acids originated from abiotic (not from living organisms) chemical reactions. These reactions occurred in the atmosphere and at hydrothermal and oceanic vents on the early Earth, and amino acids were also delivered by meteorites and comets (i.e., extraterrestrial delivery).

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

If I made an α-helix using D-amino acids, I’d expect left-handedness. D-amino acids are the mirror images of L-amino acids, and the majority of alpha helices are built from L-amino acids, which exhibit right-handedness.

Can you discover additional helices in proteins?

Yes, additional helices can be discovered in proteins. These include non-alpha helices such as π-helices. They can be discovered via X-ray crystallography, structure databases, or with the aid of AI prediction tools.

Why are most molecular helices right-handed?

Most molecular helices are right-handed because L-amino acids force the helix backbone to conform to right-handedness. Selection pressure also favors this handedness for hydrogen bonding and overall functionality.

Why do β-sheets tend to aggregate?

β-sheets tend to aggregate because the edges of β-sheets have exposed N-H groups (hydrogen bond donors) and C=O groups (hydrogen bond acceptors). When sheets come into close proximity, these groups snap together, almost like a zipper.
- What is the driving force for β-sheet aggregation?
  - The driving force for β-sheet aggregation is hydrogen bonding between these donor and acceptor groups
Supporting prompts for this section listed below:

Supporting Prompt	Model
What is a Dalton as a unit of measurement?	Perplexity
How many molecules of amino acids are located within a single gram of edible meat, on average? Do NOT hallucinate when answering this question	Perplexity
What is Avogadro’s number and why is it relevant within the context of the answer to the previous prompt? What does mol mean in the context of g/mol? Do NOT hallucinate when answering these questions	Perplexity
10^{21} * 500 equals how much?	Perplexity
From a biological and genetic standpoint why is it that when an organism ingests food, they don’t become the organism they’re ingesting (i.e., if a human being eats salmon, why don’t they become a salmon)? Where do amino acids fit into this explanation? Do NOT hallucinate when answering these questions	Perplexity
Why are there only 20 amino acids found in nature? Do NOT hallucinate when answering this question	Perplexity
To elaborate on a point made in the 5th part of the answer to the previous prompt: what is a side chain in the context of amino acids? Do NOT hallucinate when answering this question	Perplexity
Share me resources for designing non-natural amino acids rather simply/with relatively little friction or extensive technical knowledge	Google AI
How can I quickly design a non-natural amino acids? What internet resources exist to help me do this?	Google AI
Having issues downloading SwissSidechain into PyMol. I’m on a MacOS. Do I need Python downloaded on my local machine for successful downloading of SwissSidechain? Do NOT hallucinate when answering these questions	ChatGPT
It’s telling me it’s installed but that Swiss Sidechain is Not Loaded. I downloaded the SwissSidechain package. Please tell me how to proceed	ChatGPT
I’m using ‘3.1.6.1’, 3.0, 3000000, 1749441553, ‘963e47f43382e009c2cd391f0747a8c20ef108e7’, 0’ of PyMol	ChatGPT
Getting ModuleNotFoundError: No module named ‘swisssidechain’	ChatGPT
When I go to select the SwissSidechain folder from my Downloads, I only can open the file. I can’t select the entire thing. Is that an issue?	ChatGPT
Yeah when I try to just select the entire folder from PyMol I can’t do. How should I proceed with the install?	ChatGPT
Ok. Now I have only a .zip and am able to select it. Getting the following error: “Plugin “PySwissSidechain2” has been installed but initialization failed. Tell me how to proceed	ChatGPT
It says ‘Unable to find license file’. Not understanding what’s going on here	ChatGPT
No I don’t have a PyMol license	ChatGPT
How do I install Conda on Mac? Is Miniconda downloadable on the web?	ChatGPT
Should I download the graphical or command Miniconda install?	ChatGPT
Yes	ChatGPT
Getting “zsh: command not found: conda”	ChatGPT
For this code, how do I find my username? Users//miniconda3	ChatGPT
Getting “zsh: no such file or directory: /Users/j_d_r/miniconda3”	ChatGPT
Yes	ChatGPT
Getting “Do you accept the Terms of Service (ToS) for https://repo.anaconda.com/pkgs/main? [(a)ccept/(r)eject/(v)iew]:” What should I write?	ChatGPT
Getting “PackagesNotFoundError: The following packages are not available from current channels: - pymol”	ChatGPT
Ok	ChatGPT
Ok. I’ve opened up the Open-Source version of PyMol and installed SwissSideChain. Now tell me how to mutate some residues from some Protein Data Bank (PDB) structures based on one of the non-natural L- or D-sidechains of the SwissSidechain database. Basically I want to create some graphical representations of non-natural amino acids using SwissSideChain in the Open-Source version of PyMol and I would like you to provide me a step by step process for how to get these visuals (when the visuals are created I also want to be able to know their names and other relevant information)	ChatGPT
What does a PDB structure mean within the context of proteins?	Perplexity
What’s a common protein downloaded from RCSB PDB?	Perplexity
Not totally understanding how SwissSidechain works. Do I just install random .pdb online, open that in PyMol, and then run SwissSidechain against it?	ChatGPT
Ok. At RCSB PDB and am looking to download a PyMol compatible file extension. What file extension should I go for?	ChatGPT
What is the name of a .pdb file? PDBx/mmCIF?	ChatGPT
Whenever I click ‘PySwissSidechain’ from Plugin –> Legacy Plugins my Open-Source PyMol window fails and I see this in Terminal: “libc++abi: terminating due to uncaught exception of type NSException”. What’s going on? How do I proceed with running the SwissSidechain extension?	ChatGPT
Yes do this	ChatGPT
Seeing this command on SwissSidechain: “Command line Use the command: Mutate Object//Chain/ResNumber/, Newres For instance, to mutate residue number ‘82’ on chain ‘E’ in object ‘complex’ into Homoleucine (HLEU), write: Mutate complex//E/82/, HLEU” What does each piece of this command mean in biological speak? How do I find residues and non-natural amino acids to choose from?	ChatGPT
In SwissSideChain, what is the short code for the non-natural amino acid ‘Cha’ and ‘Ca’?	Perplexity
Amino acids are produced by enzymes, correct? DO NOT hallucinate when answering this question	Perplexity
So based on the answer to the last prompt, where exactly do amino acids come from? Where exactly do they originate? DO NOT hallucinate when answering this question	Perplexity
When we say ’enzyme‑catalyzed pathway’ we mean a chemical reaction that an enzyme speeds up, correct? Do NOT hallucinate when answering this question	Perplexity
What is an enzyme pathway in the context converting a precursor into an amino acid? Is it simply a chemical reaction? Something else? Do NOT hallucinate when answering this question	Perplexity
In the answer 3 prompts ago, there was mention of “Carbon Skeleton” and “Intermediates of glycolysis, the citric acid (TCA) cycle, and the pentose phosphate pathway” when describing them. What does this mean in simple terms? Are we saying these provide the chemical structure of an amino acid to keep it fundamentally sound? Do NOT hallucinate when answering these questions	Perplexity
Ok. Based on available literature and if relevant/necessary, the information shared in response to the previous prompts, how did amino acids originate before the emergence of life as we know it on this planet? Was it via chemical process in cyanobacteria or another form of bacteria or archaea? Do NOT hallucinate when answering these questions	Perplexity
An a-helix within the context of genomics and synthetic biology is an alpha helix correct? How do D-amino acids relate to alpha helices? Do NOT hallucinate when answering these questions	Perplexity
Not understanding. In the answer to the last prompt you stated, " In genomics and synthetic biology, “α-helix” or “a-helix” refers to the standard protein alpha helix: a right‑handed secondary structure formed by regular backbone hydrogen bonding, almost always built from L‑amino acids in natural proteins.” Then you said, “A chain made entirely of L‑amino acids naturally forms a right‑handed α‑helix” Did hallucination occur here? If so, where and how?	Perplexity
If this is the case, how can an α-helix using D-amino acids exhibit right-handedness? Not seeing how this would work? Do NOT hallucinate when answering this question	Perplexity
Explain the following: –The difference between how protein is described in a nutritional context and what proteins are/how they are described in a biological or biotechnological context –My hunch is additional helices in naturally-occurring proteins can be found, although I’m not sure how. Confirm or deny this hunch. If my hunch is correct, tell me why additional helices for proteins found in nature can be found and some methods for discovering these additional helices. Do NOT hallucinate when explaining these items. If you there’s risk of exaggeration or outputting something not confirmed or based on sources, don’t output it	Perplexity
What are molecular helices in a biological context? How do they differ from other types of helices (if at all)? Do NOT hallucinate when answering these questions	Perplexity
What’s the handedness of most molecular helices? What explains their handedness? Do NOT hallucinate when answering these questions	Perplexity
What does this mean in the answer to the last prompt? “L-amino acids dictate right-handed protein helices (opposite would clash sterically).” What are L-amino acids again? Does the L stand for lysate? Explain all this to me as if I was a reasonably educated 14 year-old. Do NOT hallucinate when answering these questionss	Perplexity
What are β-sheets in a biological context? Do we call them beta-sheets? What does the β stand for? What exactly are they? Do NOT hallucinate when answering these questions	Perplexity
What causes beta-sheets to come together as a larger aggregate? Is it molecular forces from difference ionic charges on atoms comprising the beta-sheets? Something else? Do NOT hallucinate when answering these questions	Perplexity
From the answer to the previous prompt, “H-bond donors (N-H) and acceptors (C=O)”" means hydrogen bond donors and carbon acceptors, correct? Do NOT hallucinate when answering this question. What does the C stand for? Do NOT hallucinate when answering these questions	Perplexity

Part B: Protein Analysis and Visualization

I selected the crotamine protein found in the South American rattlesnake. The protein’s ability to penetrate cells as a toxin allows it to serve as a template for antivenom and potentially targeted therapies.
The amino acid sequence of the crotamine protein is below ¹
- AAF34911.1 crotamine [Crotalus durissus terrificus] MKILYLLFAFLFLAFLSEPGNAYKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK
- This protein is 65 amino acids long. Its most frequent amino acid is lysine (K), which appears 11 times
- There are 250 protein sequence homologs for the crotamine protein after running it in UniProt BLAST (ID: Q9PWF3)
- Yes, this protein belongs to the crotamine-myotoxin family.
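
The length and most frequent residue reported above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the sequence is the one listed above.

```python
from collections import Counter

# Crotamine precursor sequence (GenBank AAF34911.1), copied from the entry above.
CROTAMINE = "MKILYLLFAFLFLAFLSEPGNAYKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK"

length = len(CROTAMINE)
composition = Counter(CROTAMINE)   # one-letter residue code -> number of occurrences
top_residue, top_count = composition.most_common(1)[0]

print(f"Length: {length}")
print(f"Most frequent residue: {top_residue} ({top_count} times)")
```
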
Structural answers to this question are listed below:
- The protein structure was solved in 2005 ². Its structural quality appears to be a mixed bag based on the NMR Structure Validation Report below: Crotamine protein RCSB NMR Structure Validation Report
- No, there are no other molecules in the solved structure apart from the protein
- It’s a defensin-like protein of the myotoxin family
3D visualization software (NGL Outputs)
- “Cartoon”, “Ribbon”, and “Ball and Stick” combo
- Protein coloring by secondary structure. I think it might have more sheets than helices, although I’m not sure
- Protein by residue type. It seems like it has a good mix of hydrophobic vs hydrophilic residues
- Protein surface visualization. It doesn’t seem like it has a lot of holes, although I’m not sure

All supporting prompts listed below:

Supporting Prompt	Model
I have a FASTA file and a GenBank identifier for a protein I found on NCBI.gov. How do I then find the amino acid sequence for that given protein? What will it look like? Do NOT hallucinate when answering this question	Perplexity
Confirming that the beta-keratin 2 [Gekko gecko] protein has a 3D structure, correct?	Perplexity
Does the beta-keratin 2 [Gekko gecko] protein have a 3D structure?	Google AI
Understood. Are there any slug mucus proteins that have a 3d structure?	Google AI
Gotcha. Tell me if there are any serpent-derived proteins that have a 3D structure	Google AI
What is the K amino acid? Is it potassium?	Perplexity
What are main protein sequence homologs in the context of genomics and biotechnology? What do they look like?	Perplexity
Show me the format of protein sequence homologs in UniProt database. Show me what they usually look like. Do NOT hallucinate when producing this output	Google AI
Find me resolution info depicting the quality of the structure described in this tab	Gemini
So I’ve heard a good quality structure has higher resolution. How can I obtain information on the quality o the structure from this tab? Is it possible?	Gemini
Based on the scores in the 2nd page of this tab, what is the quality of the structure? Where exactly does the black dot reside here?	Gemini
Based on this tab, tell me if there are any other molecules in the structure of the Crotamine protein beyond the protein itself? If there are, what are they? Do NOT hallucinate when answering this question	Gemini
Where can I find the structure classification family for this protein in this tab? Show me where I can find this	Gemini
Tell me what I’m looking at and how it helps determine the structure classification family of the crotamine protein	Gemini
Tell me if and/or how I can view this protein as a “cartoon” “ribbon” and “ball and stick” using this NGL viewer tool	Gemini
See the plus icon but can’t click on it	Gemini
I want to color the protein by secondary structure to determine whether or not it has more helices or shets. How can I do this in this viewer?	Gemini
Can you see the successful secondary structure output here? If not, say so	Gemini
I now want to color the protein by residue type to determine whether or not it has more helices or shets. How can I do this in this viewer?	Gemini
I no longer want to see secondary structures or helices or sheets. I want to see the protein’s residue types	Gemini
Right now my chain-related Color Scheme options are ChainID, ChainIndex, and ChainName. Which one should I choose?	Gemini
Yes it seems to be recognizing the entire structure as Chain A. Does this mean that most of the residue type is of 1 type? And does this mean that the residues are mostly hydrophobic or hydrophilic?	Gemini
Ok – now how would I visualize the surface of the protein. How can I see if it has holes or binding pockets?	Gemini

Part C: Using ML-Based Protein Design Tools

C1. Protein Language Modeling

Deep Mutational Scans
- See picture below: Crotamine Protein Deep Mutational Scan
- Several really interesting columns appear as one moves rightward past position 20 in the heatmap. Each of these columns has a single, almost uniform color from top to bottom, which according to Gemini indicates a special sensitivity to mutation. These positions apparently correspond to the cysteines that form disulfide bridges, which are quite important for holding the protein together (these are referred to as ‘anchors’)
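
One quick way to cross-check the heatmap observation is to list where the cysteines sit in the input sequence. This is a small sketch with illustrative variable names; it uses 1-based positions, and whether the heatmap axis is 0-based or 1-based is an assumption that should be checked against the figure.

```python
# Crotamine precursor sequence used as input to the mutational scan.
CROTAMINE = "MKILYLLFAFLFLAFLSEPGNAYKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK"

# 1-based positions of every cysteine (C) in the sequence.
cysteine_positions = [
    position
    for position, residue in enumerate(CROTAMINE, start=1)
    if residue == "C"
]

print(cysteine_positions)
print("All cysteines past position 20:", all(p > 20 for p in cysteine_positions))
```

If the uniform columns in the heatmap line up with the printed positions, that supports the reading that they are the cysteine positions.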
Latent Space Analysis
- See picture below Non-Crotamine tagged point cloud showing embedded proteins
- Yes they do–there are approximations of similar proteins found throughout the cloud. Crotamine tagged point cloud showing embedded proteins and Crotamine’s location among embeddings
- There are other proteins at the edge of the cloud that belong to viruses, to other organisms that might be found in similar locations/habitats to the South American rattlesnake (mice, chickens, humans), or to other types of toxins or species carrying toxins. See picture below: Crotamine tagged point cloud showing embedded proteins and Crotamine’s location relative to its neighborhood

C2. Protein Folding

Approach 1: The predicted coordinates don’t appear to match the original structure in the PDB much at all (see pictures below). I think this might be because the original PDB structure for this protein was a bit of a mixed bag according to its NMR Structure Validation Report results (see previous section). Crotamine Protein PDB < > ESM Side-by-Side Structural Comparison (1)
Crotamine Protein PDB < > ESM Side-by-Side Structural Comparison (2)
Approach 2: I asked Gemini about this discrepancy, and it recommended inputting the mature sequence containing the last 42 residues. At first glance the result still seems to show some visual discrepancies compared with the original PDB structure, but far fewer than the outputs under Approach 1 (see photo below) Crotamine Protein PDB < > ESM Side-by-Side Structural Comparison (3) – Post-Mature Sequence Input
I tried the following mutations:
- Cysteine Break: Per Gemini’s suggestion, replaced all the C (cysteine) residues with A (alanine). This notably decreased the predicted Local Distance Difference Test (pLDDT) score of the mutated sequence’s output, and the visual also seemed to indicate less structural integrity, implying the protein is less resilient to this mutation (see photo below) Crotamine Cysteine Break Mutation Results
- Charge Swap: Per Gemini’s suggestion, replaced all the K (lysine) and R (arginine) residues with D (aspartic acid). This decreased the pLDDT score of the mutated sequence’s output a bit, but less than the Cysteine Break did, indicating the protein was comparatively more resilient to this mutation (see photo below) Crotamine Charge Swap Mutation Results
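
The two mutations described above can be generated programmatically before pasting the result into the ESMFold cell. This is a minimal sketch: the `substitute` helper is hypothetical (it is not part of the course notebook), and applying it to the mature sequence quoted in the prompt log is an assumption, since the same edits could equally be made on the full precursor.

```python
# Mature crotamine sequence as quoted in the prompt log for this section.
# ASSUMPTION: this is the sequence the mutations were applied to.
MATURE = "YKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK"


def substitute(sequence: str, targets: str, replacement: str) -> str:
    """Replace every residue listed in `targets` with the single residue `replacement`."""
    table = str.maketrans({residue: replacement for residue in targets})
    return sequence.translate(table)


cysteine_break = substitute(MATURE, "C", "A")   # every Cys -> Ala
charge_swap = substitute(MATURE, "KR", "D")     # every Lys and Arg -> Asp

print(cysteine_break)
print(charge_swap)
```

Both outputs keep the original length, so pLDDT values can be compared position by position with the unmutated prediction.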

C3. Protein Generation

Based on the heatmap below, there seems to be a difference between the predicted sequence probabilities here and the original heatmap generated in the ‘Deep Mutational Scans’ subsection. 48 of the 65 positions were changed (17 retained), and the sequence recovery rate was around 0.26, which didn’t seem all that promising. Crotamine ProteinMPNN Results (1)
Visually, the ESMFold output for this re-inserted sequence also looks structurally different from the original protein structure (see photo below) Crotamine ProteinMPNN Results (2)
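
Sequence recovery is the fraction of positions where the designed sequence matches the original. The sketch below shows that calculation; the function name is illustrative, and the toy sequences are hypothetical placeholders rather than the actual ProteinMPNN output. The last lines check that the two reported numbers (48 of 65 positions changed, recovery around 0.26) agree with each other.

```python
def sequence_recovery(original: str, designed: str) -> float:
    """Fraction of aligned positions where `designed` keeps the residue in `original`."""
    if len(original) != len(designed):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(original, designed))
    return matches / len(original)


# Toy example with hypothetical sequences: 3 of 6 positions match.
toy_recovery = sequence_recovery("MKILYL", "MKAAAL")

# Consistency check on the reported numbers: 48 of 65 positions changed.
reported_recovery = (65 - 48) / 65

print(round(toy_recovery, 2))
print(round(reported_recovery, 2))
```
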

All supporting prompts for this section listed below:

Supporting Prompt	Model
Want to choose a GPU to run this. Believe the ability to select GPUs should be in the bottom right but am not seeing this. Direct me to where I should go on this page for GPU selection	Gemini
Looking over the heatmap under the ‘Mutation Scans’ section of the code, what stands out regarding the inputted protein sequence (MKILYLLFAFLFLAFLSEPGNAYKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK)? Are there any particular mutations or patterns that stand out and why? Do NOT hallucinate when answering this question	Gemini
I want to make sure I perform the Latent Space Analysis located under ‘Latent Space Analysis’ correctly, as in I want to make sure I perform Latent Space Analysis on the specific protein sequence I inputted under the previous section ‘Mutation Scans’. Is there anything I need to do or input? Does the code in ‘Latent Space Analysis’ just take the specific protein sequence I inputted under the previous section ‘Mutation Scans’ and run with it? Do NOT hallucinate when answering this question	Gemini
I want to make sure I perform the Latent Space Analysis located under ‘Latent Space Analysis’ correctly, as in I want to make sure I perform Latent Space Analysis on the specific protein sequence I inputted under the previous section ‘Mutation Scans’. It looks like I just inputted some incorrect code in the cell under ‘Latent Space Analysis’ that starts with ‘Latent Space Analysis’ Fix the errors in the code and make sure to input the appropriate information so the protein sequence from the previous ‘Mutation Scans’ cell can be analyzed here. Do NOT hallucinate when addressing this query	Gemini
Ok, so it looks like there’s output in the ‘3D T-SNE visualization of Protein Sequence Embeddings’ subsection of the ‘Latent Space Analysis’ section. Help me understand where my initially inputted Crotamine protein sequence (MKILYLLFAFLFLAFLSEPGNAYKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK) is located within the 3D plot. Do NOT hallucinate when addressing this query	Gemini
Ok, so it looks like there’s output in the ‘3D T-SNE visualization of Protein Sequence Embeddings’ subsection of the ‘Latent Space Analysis’ section. Help me understand where my initially inputted Crotamine protein sequence (MKILYLLFAFLFLAFLSEPGNAYKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK) is located within the 3D plot. It just seems like there are a lot of protein sequences in this plot and while I can drag over and find things manually, I’m not sure where to start or how to efficiently find the area in this plot where the inputted sequence is located. Any assistance would be useful. Do NOT hallucinate when addressing this query	Gemini
Yes let’s do that. Let’s give it a color that stands out.	Gemini
Now that the protein visualization has been generated in the ‘Run ESMFold’ section, I’m instructed to do the following: “Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?” There’s a couple things here. The first is that the outputted protein structure in ESMFold doesn’t really seem to match up all that well with the original protein structure in the Protein Data Bank (PDB). This might be because the original protein structure in the PDB was apparently a bit of a mixed bag from a quality standpoint. Not sure how to proceed, and that hesitation is twofold: 1) Is there anything I can do regarding the discrepancy between the outputted protein structure in ESMFold and the original protein structure in the PDB? If this is most likely out of my control/if there’s nothing that can be changed from an input or coding perspective, say so 2) To accomplish the instructions listed above in the quotation marks, should I just go to the subsection that says ‘sequence’ in the first cell under the ‘Run ESMFold’ section and just start randomly changing letters? Does that make sense? Regardless of whether or not this is the best approach for fulfilling the instructions, how would I know if the outputted structure(s) are resilient (or not) to mutations? Would it just be discerned from degree of change in outputted structures (i.e., if the structures change a lot, it likely isn’t resilient and vice versa)? Do NOT hallucinate when addressing these questions	Gemini
What does the pLDDT acryonym stand for?	Gemini
My instructions for the ‘Inverse Folding with ProteinMPNN’ section are to ‘Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.’ I assume there’s some customization to be done here, either through coding or through inputting sequence information. What would be the most likely sensible plan to proceed? Success seems to be that the compared protein sequence and the original one look visually rather similar. Any next steps on how to begin working this workflow would be appreciated. Do NOT hallucinate when addressing this query	Gemini
My instructions for the ‘Inverse Folding with ProteinMPNN’ section are to ‘Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.’ From there I need to put results back into ESMFold to see if the visuals match up. I assume there’s some customization to be done here, either through coding or through inputting sequence information. What would be the most likely sensible plan to proceed? Any next steps on how to begin working this workflow would be appreciated. Do NOT hallucinate when addressing this query	Gemini
My instructions for the ‘Inverse Folding with ProteinMPNN’ section are to ‘Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.’ From there I need to put results back into ESMFold to see if the visuals match up. I assume there’s some customization to be done here, either through coding or through inputting sequence information. What would be the most likely sensible plan to proceed? Any next steps on how to begin working this workflow would be appreciated. Do NOT hallucinate when addressing this query	Gemini
Do these steps outputted from the answer to the previous prompt take into account earlier sequence information inputted into previous cells outside the ‘Inverse Folding with ProteinMPNN’ section, specifically information on the protein sequence in question we want to visually compare across the ‘Inverse Folding with Protein MPNN’ and ‘Run ESMFold’ sections (YKQCHKKGGHCFPKEKICLPPSSDFGKMDCRWRWKCCKKGSGK)? If not, how can this be addressed? Do NOT hallucinate when addressing these questions	Gemini
For the last step, just executed, I want a more 1-to-1 comparison between the results in the diagram under ‘Visualize Amino Acid Probabilities’ an the results located in the cell under ‘Mutation Scans’ earlier in the notebook (https://colab.research.google.com/drive/10EnA1imLYVVtWQYR-CsQIWU7tA-LliMj#scrollTo=09FwbZ6v1AUs&line=2&uniqifier=1) Can we make the X and Y axis of the heatmap just executed match this original heatmap with ‘Position in Protein Sequence’ with numbers left to right in the X axis and ‘Amino Acid Mutations’ in the Y axis? Also want ‘Model Scores’ scored from 2 (more yellow-ish) to -6 (more purple-ish) a-la this original heatmap If there are parts of this instruction you don’t understand or that are not possible to execute, say so. After this is resolved, we’ll return to the original workflow from the answer to the previous prompt	Gemini
Address the error found when running the ‘Run ESMFold’ cell	Gemini
Everything under ‘Configure ProteinMPNN with Crotamine Structure’ doesn’t appear to be working. What’s going on?	Gemini

Part D: Group Brainstorm on Bacteriophage Engineering

My William & Mary Node Bacteriophage Engineering Group Brainstorm Google Doc can be found here ³.

Week 5 HW: Protein Design Part 2

Using AlphaFold for Protein Optimization

Part A: SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

Retrieved human SOD1 sequence via UniProt (see photo below). Introduced A4V mutation via Gemini prompt (see sequence below). Human SOD1 sequence (A4V mutation not added)

Human SOD1 sequence (A4V mutation added)

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
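
The A4V substitution can also be applied in code instead of by prompt. This is a hedged sketch: `apply_point_mutation` is a hypothetical helper, and the `offset=1` argument reflects the assumption that the "4" in A4V is counted without the initiator methionine present in the UniProt sequence, which is consistent with the mutated sequence shown above.

```python
import re

# Wild-type human SOD1 sequence as retrieved from UniProt (includes the initiator Met).
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGG"
    "PKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_point_mutation(sequence: str, mutation: str, offset: int = 0) -> str:
    """Apply a mutation written as <from><position><to>, e.g. 'A4V'.

    `offset` shifts the 1-based position, e.g. offset=1 when the numbering
    skips an initiator methionine that is present in `sequence`.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    old, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1 + offset   # convert to a 0-based string index
    if sequence[index] != old:
        raise ValueError(f"expected {old} at position {position}, found {sequence[index]}")
    return sequence[:index] + new + sequence[index + 1:]


SOD1_A4V = apply_point_mutation(SOD1_WT, "A4V", offset=1)
print(SOD1_A4V[:10])
```

The check inside the helper raises an error if the expected residue is not found at the target position, which guards against off-by-one numbering mistakes.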

See results in later questions (answers and photos below)
See photos below
Generated four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence
See photo below Added known SOD1-binding peptide FLYRWLPSRRGG for comparison
Perplexity scores listed below

Binder	Perplexity
WRYGAAALAHKE	8.976454
WRYYAAAVELGE	12.659931
WRYGPAVLALGK	9.556429
WLYYAVALALGE	15.294134
FLYRWLPSRRGG	20.635226

NOTE: PepMLM Colab used to generate results above can be found here ¹

Supporting prompts for this section listed below:

Supporting Prompt	Model
Why is protein design and models like AlphaFold important in the context of drug discovery and improvements in human health? If I were to describe its importance to a reasonably educated person on the street who doesn’t know much about the subject what would I say? Would something like “Since most diseases are caused by protein-related issues, and because proteins comprise an essential role in human health and physiology, knowing how proteins function and fold can help us design therapeutics with greater precision and efficacy”? What am I missing there and where am I off? Do NOT hallucinate when answering this question	Perplexity
In the context of proteins and/or the Superoxide dismutase (SOD1) protein found in Homo sapiens, what is the A4V mutation? What does it entail? What does A4V stand for? Do NOT hallucinate when answering this prompt	Perplexity
I have the following Superoxide dismutase (SOD1) protein sequence found in Homo sapiens: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ. I want to introduce the A4V mutation in this sequence so I can eventually generate relevant peptides to bind to the mutated sequence. How does the sequence need to change to accurately represent this mutation? Do NOT hallucinate when addressing this prompt	Gemini
Based on the contents in the ‘Inputs and Parameters’ cell, if I want to generate 4 peptides that each have a 12 amino acid length, other than the ‘Peptide Length’ variable, the other variable I should alter is the ‘Number of Binders’ variable, correct? Or is it the ‘Top K Value’ variable? Not sure which variable I need to alter. Do NOT change any cell content as part of addressing this prompt and do NOT hallucinate	Gemini
Now that I’ve generated 4 peptides, each of which are 12 amino acids long, I now want to add the known SOD1-binding peptide FLYRWLPSRRGG for comparison. Without changing any code in any of the cells in this workbook, how can I go about doing this? Do NOT hallucinate when addressing this query	Gemini
Question regarding the third bullet point under 3. for Method 1. How would the benchmark (FLYRWLPSRRGG)’s properties be known for comparison by performing the actions listed under Method 1 in response to the last prompt? Do NOT hallucinate when answering this question	Gemini
Let’s create a new cell under the ‘Download Results’ tab where the following Superoxide dismutase (SOD1)-binding peptide FLYRWLPSRRGG will be analyzed in the exact same way the peptides generated from the ‘Inputs and Parameters’ cell were analyzed in the ‘Generate Peptides’ cell. Do not alter any of the underlying fundamental logic from code in prior cells. Just extend it so the FLYRWLPSRRGG can be analyzed with a Perplexity score in the same way the results from the ‘Inputs and Parameters’ cell were analyzed in the ‘Generate Peptides’ cell. Do NOT hallucinate when performing this task	Gemini

Part 2: Evaluate Binders with AlphaFold3

Navigated to AlphaFold Server (see below):
See peptide results (ipTM scores and binding information) below in 3.
See ipTM and binding information results below:

WRYGAAALAHKE Peptide:
- ipTM: 0.31. The peptide appears to bind near the dimer interface and appears surface-bound, although the ipTM score is notably low, so this placement should be treated as low-confidence. See photo below: WRYGAAALAHKE peptide AlphaFold Visualization Results
WRYYAAAVELGE Peptide:
- ipTM: 0.24. Again, the peptide appears to bind near the dimer interface and appears surface-bound, although the ipTM score is again notably low, so this placement should be treated as low-confidence. See photo below: WRYYAAAVELGE peptide AlphaFold Visualization Results
WRYGPAVLALGK Peptide:
- ipTM: 0.33. Again, the peptide appears to bind near the dimer interface and appears surface-bound, although the confidence of this assessment is not high based on the ipTM. See photo below: WRYGPAVLALGK peptide AlphaFold Visualization Results
WLYYAVALALGE Peptide:
- ipTM: 0.40. Again, the peptide appears to bind near the dimer interface and appears surface-bound, although the confidence of this assessment is not high based on the ipTM. See photo below: WLYYAVALALGE peptide AlphaFold Visualization Results
FLYRWLPSRRGG Peptide:
- ipTM: 0.29. The peptide appears to engage somewhat with the β-barrel region and appears surface-bound. Again, the confidence of this assessment is not high based on the ipTM. See photo below: FLYRWLPSRRGG peptide AlphaFold Visualization Results

All of the ipTM values were low, meaning AlphaFold expressed notable uncertainty regarding peptide placement. It’s interesting to note that three of the four PepMLM-generated peptides exceeded the 0.29 ipTM of the known binder FLYRWLPSRRGG. I’m not sure what that means about the techniques used to ascertain the relationship between FLYRWLPSRRGG and the sequence, although it may hint at PepMLM’s power. With scores this low across the board, the differences should be read cautiously.
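The comparison against the known binder can be tallied directly from the scores listed above. This is a minimal sketch that only uses the ipTM values reported in this section; the variable names are illustrative.

```python
# ipTM scores as reported above from the AlphaFold Server runs.
iptm = {
    "WRYGAAALAHKE": 0.31,
    "WRYYAAAVELGE": 0.24,
    "WRYGPAVLALGK": 0.33,
    "WLYYAVALALGE": 0.40,
    "FLYRWLPSRRGG": 0.29,  # known SOD1-binding peptide, used as the benchmark
}
BENCHMARK = "FLYRWLPSRRGG"

# Separate the PepMLM-generated peptides from the benchmark peptide.
generated = {pep: score for pep, score in iptm.items() if pep != BENCHMARK}
above_benchmark = sorted(
    pep for pep, score in generated.items() if score > iptm[BENCHMARK]
)
best = max(generated, key=generated.get)

print(f"{len(above_benchmark)} of {len(generated)} generated peptides exceed the benchmark")
print("Highest ipTM among generated peptides:", best, generated[best])
```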

Supporting prompts for this section listed below:

Supporting Prompt	Model
If I want to model a protein-peptide complex using this service, how should I proceed? I understand I’ll need to input a protein sequence, but not sure how to input a relevant peptide? What entity type would a peptide fall under? Do NOT hallucinate when outputting this result	Gemini
I already have a protein sequence that should be formatted appropriately. I do have peptides, and it would be great to see if there is any modification that needs to be made in their formats to make sure they’re being inputted according to the correct FASTA format. Here’s the first peptide sequence: WRYGAAALAHKE. If any modification need to be made in their format to make sure they’re being inputted according to the correct FASTA format, tell me what changes need to be made, why, and then make the changes. Otherwise, don’t change anything if everything already checks out. Do NOT hallucinate when addressing this query	Gemini
What does piDDT mean on this page? What do ipTM and pTM mean?	Gemini
Need to understand where the WRYGAAALAHKE peptide binds to the A4V mutated SOD 1 homo sapiens protein sequence (MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ). Not understanding how to interpret the 3D visual I’m seeing on this page. What is the N-terminus and where is it located? What is the β-barrel region or the dimer interface and where are they located here? When we talk about peptide binding sites and we say they are either ‘surface-bound’ or ‘partially-buried’, which of these make sense for this peptide and how can we visually discern this from the 3D graphic? Do NOT hallucinate when replying to this prompt	Gemini
How do I find Residue 1 in the 3d graphic?	Gemini
Ok – so is the WRYGAAALAHKE peptide displayed in orange and yellow in the 3D graphic? Do NOT hallucinate when answering this question	Gemini
So based on what I’m seeing here, it looks like the WRYGAAALAHKE peptide might (very tentatively) bind near/around the N-terminus, and appears to be surface-bound correct? If this is wrong, correct this tentative peptide location and binding type information and explain why. Do NOT hallucinate when addressing this prompt	Gemini
Ok. So does the peptide engage the β-barrel region or approach the dimer interface? Where exactly does the protein appear to bind, generally speaking? Do NOT hallucinate when answering this question	Gemini
So based on what I’m seeing here, it looks like the WRYYAAAVELGE peptide might (very tentatively) bind near/around the N-terminus, and appears to be surface-bound correct? If this is wrong, correct this tentative peptide location and binding type information and explain why. If it approaches the dimer interface, explain why. Do NOT hallucinate when addressing this prompt	Gemini
How do I read what I’m seeing in the 3D graphic? Understand the β-barrel region can be visually eyeballed because it looks like an actual barrel. Other areas like the N-terminus or the dimer interface are harder to visually discern. Essentially I’m asking how to read this visual map of the protein < > peptide interaction located in the graphic. Do NOT hallucinate when replying to this prompt	Gemini
Yeah when I hover over residues in Chrome I just get a cursor. Nothing is highlighting. How should I proceed with reading the structural “landmarks” of the SOD1 protein.	Gemini
The WRYGPAVLALGK peptide appears surface-bound and NOT partially buried, correct? This makes sense because it doesn’t interact with the β-barrel region much at all, right? Do NOT hallucinate when answering this prompt	Gemini
Believe the level of confidence indicated by the 0.4 ipTM is still not quite high, correct? Would it be considered failing? What is the threshold for failing here? Do NOT hallucinate when answering this question	Gemini
The WLYYAVALALGE peptide appears surface-bound and NOT partially buried, correct? Believe so. Do NOT hallucinate when answering this quesiton	Gemini
Can you explain what it means that most of the 3D graphic is colored dark blue? What is this color indicating exactly? Do NOT hallucinate when answering this question	Gemini
Wondering whether or not it would be fair to say that the FLYRWLPSRRGG binds near the dimer interface and appears surface bound and NOT partially buried. Do NOT hallucinate when addressing this prompt	Gemini
Would we say that the peptide engages the β-barrel region or approaches the N-terminus? Believe it doesn’t approach the N-terminus from my high-level understanding. Do NOT hallucinate when addressing this prompt	Gemini

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

See pasted peptide sequences, A4V mutant SOD1 sequences in target fields, and checked boxes results below
See results below:

WRYGAAALAHKE Peptide:

This peptide has weak binding affinity, is soluble, non-hemolytic, with a slightly positive net charge and a molecular weight of 1372.5. See results below: WRYGAAALAHKE Peptide PeptiVerse Results

WRYYAAAVELGE Peptide:

This peptide has weak binding affinity, is soluble, non-hemolytic, with a slightly negative net charge and a molecular weight of 1427.6. See results below: WRYYAAAVELGE Peptide PeptiVerse Results

WRYGPAVLALGK Peptide:

This peptide has weak binding affinity, is soluble, non-hemolytic, with a positive net charge and a molecular weight of 1330.6. See results below: WRYGPAVLALGK Peptide PeptiVerse Results

WLYYAVALALGE Peptide:

This peptide has weak binding affinity, is soluble, hemolytic, with a negative net charge and a molecular weight of 1368.6. See results below: WLYYAVALALGE Peptide PeptiVerse Results

FLYRWLPSRRGG Peptide:

This peptide has weak binding affinity, is soluble, non-hemolytic, with a positive net charge and a molecular weight of 1507.7. See results below: FLYRWLPSRRGG Peptide PeptiVerse Results
There seems to be some relationship between higher ipTM scores and stronger predicted affinity across the PepMLM-generated peptides, but it is neither 1-to-1 nor strong enough to suggest a reliable correlation, let alone causation. In fact, the WRYYAAAVELGE peptide had the lowest ipTM score (0.24), and yet it has the 2nd highest predicted affinity of the group (6.07). While none of the PepMLM-generated peptides appear to bind strongly in the general sense, the two with the highest ipTM scores, WLYYAVALALGE (0.40) and WRYGPAVLALGK (0.33), are predicted to be soluble and hemolytic, and soluble and non-hemolytic, respectively.

Based on its Predicted Binding Affinity (5.77 pKd/pKi), Solubility (1.00), Hemolysis (Non-Hemolytic; 0.036), Net charge (pH 7) (1.76), and Molecular Weight (1330.6 Da), the WRYGPAVLALGK peptide appears to best balance predicted binding and therapeutic properties, and therefore should be advanced relative to the other PepMLM-generated peptides.
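As a sanity check on the molecular weights reported in this part, they can be recomputed from sequence alone. This is a minimal sketch that assumes unmodified linear peptides and uses standard average residue masses. PeptiVerse’s exact mass table is not known here, so small differences in the last decimal place are possible.

```python
# Average residue masses in Da (amino acid minus one water), standard values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(peptide: str) -> float:
    """Average molecular weight of an unmodified linear peptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Molecular weights as reported in the PeptiVerse results above.
reported = {
    "WRYGAAALAHKE": 1372.5,
    "WRYYAAAVELGE": 1427.6,
    "WRYGPAVLALGK": 1330.6,
    "WLYYAVALALGE": 1368.6,
    "FLYRWLPSRRGG": 1507.7,
}

for peptide, reported_mw in reported.items():
    print(peptide, round(molecular_weight(peptide), 1), "reported:", reported_mw)
```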

Supporting prompts for this section listed below:

Supporting Prompt	Model
Need to place the following A4V mutant SOD1 sequence “in the target field”: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ. I’ve already inserted the peptide sequence in the ‘Peptide Sequence(s) / SMILES’ field. Where do I place the A4V mutant SOD1 sequence? Where is the ’target field’? Do NOT hallucinate when answering this question	Gemini
Need to check a box off for ‘molecular weight’. Which unchecked box would that be and why? Do NOT hallucinate when answering this question	Gemini
If I have a ‘Net Charge (pH 7) value of 0.85, what does that mean in plain terms? Is it good or bad from a therapeutic perspective? Likewise, if I have an ‘Isoelectric Point’ value of 8.60, what does that mean in plain terms? Is it good or bad from a therapeutic perspective? And if I have a ‘Hydrophobicity (GRAVY)’ score of -0.56, what does that mean in plain terms? Is it good or bad from a therapeutic perspective? Do NOT hallucinate when answering these questions	Gemini
If I have a ‘Molecular Weight’ value of 1372.5, what does that mean in plain terms? Is it good or bad from a therapeutic perspective?	Gemini
Would it be fair to say that this peptide has a slightly positive ‘Net charge (pH 7)’ score or that is has a positive ‘Net charge (pH 7)’ score? What is the distinction? Do NOT hallucinate when answering this prompt	Gemini
What does ‘ug/m’ mean again? Also it’s definitely fair to say that any results showing a hemolytic peptide indicate the peptide is NOT safe for advancement into further therapeutic trials, correct (given the risk of red blood cell damage)? Do NOT hallucinate when answering this prompt	Gemini
If the WRYGPAVLALGK peptide has the following properties, would we say it has a decent or nice balance of predicted binding and therapeutic properties? See properties below:–Soluble: 1.00–Hemolysis: Non-Hemolytic (0.036) –Binding Affinity: Weak Binding Affinity (5.77)–Net charge (pH 7): 1.76–Molecular weight: 1330.6. Do NOT hallucinate when answering this prompt	Gemini

Part 4: Generate Optimized Peptides with moPPIt

Opened moPPIt Colab
Made a copy and switched to a GPU runtime (see below) Switched to GPU runtime
Notebook results:
- Pasted A4V mutant SOD1 sequence (see below) Pasting A4V mutant SOD1 sequence
- Chose specific residue indices on SOD1 for peptides to bind (see below) Set specific residue indices on SOD1 for peptide binding
- Set peptide length to 12 amino acids. Generated peptides (see below) Generated peptides
First off, these peptides are predicted to have stronger and more specific binding than the previous PepMLM peptides. They also appear to achieve this while simultaneously remaining non-hemolytic and soluble. It does appear that there was a slight dip in non-fouling, however. I would evaluate these peptides against the previous set, and also against the safety standards for the intended route of administration (oral, intravenous, etc.).
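The residue selection step above (positions 1-10, 49-54, 111, and 148-153, as recorded in the supporting prompts for this part) can be expanded and bounds-checked before it goes into the notebook. This is a minimal sketch: `parse_residue_ranges` is a hypothetical helper, and both the comma-separated range format and the 1-based numbering are assumptions rather than the moPPIt notebook’s documented input format.

```python
def parse_residue_ranges(spec: str) -> list[int]:
    """Expand a string like '1-10, 49-54, 111' into a sorted list of positions."""
    positions = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.update(range(start, end + 1))  # ranges are inclusive
        else:
            positions.add(int(part))
    return sorted(positions)


TARGET_LENGTH = 154  # length of the A4V SOD1 sequence quoted in this section

motif = parse_residue_ranges("1-10, 49-54, 111, 148-153")
out_of_range = [p for p in motif if not 1 <= p <= TARGET_LENGTH]

print(len(motif), "target residues; out of range:", out_of_range)
```

A check like this catches an index that falls outside the target sequence before a GPU run is spent on it.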

NOTE: The moPPIt Colab used to generate the results above can be found here ²

Supporting prompts for this section listed below:

Supporting Prompt	Model
If I want to choose specific residues indices (places on the ‘Target_Protein’ variable located under cell ‘3.1 Inputs and Parameters’) where I want to want peptides to bind, what variables in cells 3.1 or 3.2 should I be focusing on and why? Do NOT hallucinate when answering this prompt	Gemini
I’m dealing with a A4V mutated Superoxide dismutase (SOD1) protein sequence found in homo sapiens (MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ). Most of the peptides I generated on another tool appeared to bind rather weakly, potentially near the dimer interface of the protein. Based on this information on the protein sequence and the previously-generated binders, I’m not exactly sure where (i.e., what Motif Positions) and how strong (Specificity) my binders should be that I create in cell 3.2. I’m aware I want to likely increase binding strength/have stronger bindings, but again, not sure exactly what placement(s) make sense given the nature of the A4V mutuation. Open to any thoughts you may have. Do NOT hallucinate when addressing this prompt	Gemini
Ok. Help me understand the results that were just produced from the ‘4. Binder Generation’ cell, and tell me how I can get a .csv file of the results	Gemini
I’ve created some peptide binders that are meant to bind to the a A4V mutated Superoxide dismutase (SOD1) protein sequence found in homo sapiens (MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ). These peptides are meant to bind to the 1-10, 49-54, 111, 148-153 sites of the mutated SOD1 sequence, and are meant to do so with greater binding strength than the previously generated peptide binders in the attached screenshots	Gemini

Part B: BRD4 Drug Discovery Platform Tutorial (Optional)

Did not complete Part B

Part C: Final Project: L-Protein Mutants

Part C Homework and supporting prompts can be found at the hyperlink below ³
Part C Colab Notebooks can be found at the hyperlinks located below ⁴ ⁵

Week 6 HW: Genetic Circuits Part 1

Robot Crafting Genetic Circuit (Stylized)

DNA Assembly

What are some components in the Phusion High-Fidelity (HF) PCR Master Mix and what is their purpose?
- HF DNA Polymerase: This is the enzyme responsible for copying DNA. It reads the template strand and synthesizes the new strand in the 5’ to 3’ direction
- Deoxynucleotide triphosphates (dNTPs): These are the DNA molecular building blocks, one for each base: Adenine (dATP), Thymine (dTTP), Cytosine (dCTP), and Guanine (dGTP)
- HF Buffer: This maintains the reaction conditions and contains magnesium chloride (MgCl₂), a salt. It matters because it dissociates into Mg²⁺, a cofactor the polymerase needs to join nucleotides onto the growing strand
What are some factors that determine primer annealing temperature during PCR?
- Some factors that determine primer annealing temperature during PCR include:
  - Primer lengths
  - Primer melting temperatures
  - GC content/sequence content
  - Buffer components
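The way primer length and GC content feed into annealing temperature can be illustrated with a rough calculation. This is a minimal sketch under stated assumptions: it uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a coarse estimate for short oligos, and the common rule of thumb of annealing about 5 °C below the lower primer Tm. The two primer sequences are hypothetical. Vendors of high-fidelity polymerases typically recommend their own Tm calculators, so this should not be used to set real cycling conditions.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) via the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def suggested_annealing(fwd: str, rev: str, delta: float = 5.0) -> float:
    """Rule-of-thumb annealing temperature: `delta` below the lower Tm."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - delta


# Hypothetical 20-nt primers, for illustration only.
fwd_primer = "AGCTGACCTGAAGTTCATCT"
rev_primer = "TCGTCCATGCCGAGAGTGAT"

for name, primer in [("forward", fwd_primer), ("reverse", rev_primer)]:
    print(name, len(primer), "nt", f"GC={gc_content(primer):.2f}", f"Tm~{wallace_tm(primer)} C")
print("suggested annealing:", suggested_annealing(fwd_primer, rev_primer), "C")
```

Longer primers and higher GC content both raise the estimate, which is why those factors appear in the list above. Buffer components (salt, Mg²⁺) are not captured by this simple rule.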
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
- PCR: PCR creates new linear DNA fragments via enzymatic amplification of a given region over repeated cycles. The PCR protocol essentially consists of setting up reaction mixes, denaturing the DNA into single strands, annealing so primers can bind to specific complementary sequences, extension so the polymerase can synthesize a new strand, and then repeating this cycle as many times as necessary. This method might be more useful when there is a specific fragment of DNA one wants to amplify for further use.
- Restriction Enzyme Digests: Restriction enzyme digests create linear DNA fragments by cutting existing DNA at specific recognition sites, without making new copies. The restriction enzyme digest protocol consists of setting up a reaction mix, incubation, and then stopping the reaction. This method might be more useful when there is a specific fragment of DNA one wants to isolate for further analysis.
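The contrast between the two methods compared above (PCR makes new copies, a digest only cuts what is already there) can be sketched as follows. This is a minimal sketch under stated assumptions: the template sequence is hypothetical, the digest only scans the top strand of a linear molecule, and the PCR function assumes ideal doubling every cycle, which real reactions do not sustain. EcoRI’s recognition site (GAATTC, cut after the first G) is used as the example enzyme.

```python
def digest_linear(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every site.

    `cut_offset` is the cut position measured from the start of the site.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def ideal_pcr_copies(initial_copies: int, cycles: int) -> int:
    """Copies after PCR, assuming perfect doubling each cycle (an idealization)."""
    return initial_copies * 2 ** cycles


# Hypothetical 26-bp template containing two EcoRI sites (G^AATTC).
template = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"

print("digest fragments (bp):", digest_linear(template, "GAATTC", cut_offset=1))
print("copies after 30 ideal PCR cycles:", ideal_pcr_copies(1, 30))
```

The digest returns the same total number of bases split into pieces, while the PCR count grows exponentially from a single template.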
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
- You can ensure the DNA fragments are in the appropriate 5’ to 3’ orientation with corresponding overlaps at their ends. Fragments also need to cover the relevant region for cloning, and also need to be added at the appropriate molar ratio relative to the plasmid backbone (vector). This is usually a 2:1 insert:vector ratio.
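The 2:1 molar ratio mentioned above is a ratio of molecule counts, not of mass, so a shorter insert needs far less mass than the vector. This is a minimal sketch of the arithmetic, assuming the usual approximation of about 650 g/mol per base pair of double-stranded DNA. The fragment sizes and the 50 ng vector amount are hypothetical.

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of double-stranded DNA


def ng_to_pmol(ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    return ng * 1000.0 / (length_bp * AVG_BP_MASS)


def insert_ng_for_ratio(vector_ng: float, vector_bp: int, insert_bp: int,
                        ratio: float = 2.0) -> float:
    """Mass of insert (ng) giving `ratio` insert molecules per vector molecule."""
    return vector_ng * (insert_bp / vector_bp) * ratio


# Hypothetical example: 50 ng of a 5000 bp vector and a 1000 bp insert.
vector_ng, vector_bp, insert_bp = 50.0, 5000, 1000
insert_ng = insert_ng_for_ratio(vector_ng, vector_bp, insert_bp, ratio=2.0)

print(f"insert needed: {insert_ng:.1f} ng")
print(f"vector: {ng_to_pmol(vector_ng, vector_bp):.4f} pmol, "
      f"insert: {ng_to_pmol(insert_ng, insert_bp):.4f} pmol")
```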
How does the plasmid DNA enter the E. coli cells during transformation?
- The plasmid DNA enters the E. coli either via heat shock (temperature change) or electroporation (high electrical voltage). Both methods shock the E. coli cell, transiently making its cell membrane permeable so the plasmid DNA can enter.
Describe another assembly method in detail (such as Golden Gate Assembly)
- DNA topoisomerase I (TOPO) Cloning: TOPO cloning is traditionally used because it’s a fast, reliable method for cloning PCR products for later sequencing, etc. The first step in TOPO cloning is generating an insert with Taq polymerase via PCR. This creates inserts with a single A-overhang at each 3’ end, which sets up the second step. The second step is to combine this PCR product with the TOPO vector, which is supplied linearized with T-overhangs and with topoisomerase I covalently bound at its ends. This is usually done for a couple of minutes. The insert’s 5’ OH/hydroxyl attacks the bond between the TOPO enzyme and the vector DNA at its end, and as part of this process A and T base pairing occurs between the insert and the vector. The TOPO then religates the strands and dissociates, creating a closed circular plasmid with the given insert. See diagrams below:

See results below:

Supporting prompts for this section listed below:

Supporting Prompt	Model
Within the context of Gibson Assembly (biotechnology DNA assembly method), why exactly are molar ratios (apparently they need to be 2:1, insert:vector) important? What are molar ratios? Do NOT hallucinate when replying to this prompt	Perplexity
What exactly is the insert and what exactly is the vector within the context of the Gibson Assembly DNA Assembly method? Do NOT hallucinate when replying to this prompt	Perplexity
In the context of biotechnology and synthetic biology, what exactly is a plasmid backbone? Explain this to me as if I were a reasonably educated 16-year old. Do NOT hallucinate when addressing this prompt	Perplexity
Tell me about the Phusion High-Fidelity (HF) Polymerase Chain Reaction (PCR) Master Mix. What is it? What are its subcomponents and what do they do? Do NOT hallucinate when addressing this prompt	Perplexity
The 4 dNTPs referenced in the answer to the last prompt are essentially chemical mixtures, correct? If that is incorrect, what are they? Do NOT hallucinate when answering this prompt	Perplexity
In the answer to the prompt before the previous prompt, there was reference to MgCl₂, and Mg²⁺. What are MgCl₂, and Mg²⁺ respectively? Are they chemicals? Something else? Why do they matter? Do NOT hallucinate when addressing this prompt	Perplexity
Within the context of a Polymerase Chain Reaction (PCR), I believe primers are the pieces of DNA that get copied nth number of times, correct? If I’m mistaken, indicate as such, and the error in the initial reasoning. Do NOT hallucinate when addressing this prompt	Perplexity
So based on the answer to the last prompt: –Primers essentially define the space in the DNA sequence that will be copied? –What is a free 3′‑OH end? Explain this to me as if I were a relatively educated 16-year old Do NOT hallucinate when answering this prompt	Perplexity
Do primer pairs always need to have a temperature difference of 5°C from each other? If so, why? Do primer pairs always need to at a temperature of between 52–58°C before annealing? If so, why? What factors determine ideal primer annealing temperatures, and why? Do NOT hallucinate when addressing these prompts	Perplexity
Both Polymerase Chain Reaction (PCR) and Restriction Enzyme Digests create linear DNA fragments. PCR creates these linear DNA fragments via enzymatic amplification and Restriction Enzyme Digests create these by essentially cutting the DNA, correct? What do the basic steps of each look like (in some simple broken down steps)? Do NOT hallucinate when addressing this prompt	Perplexity
Other than PCR, Restriction Enzyme Digest, and Golden Gate Assembly, what other DNA assembly methods exist? Do NOT hallucinate when addressing this prompt	Perplexity
Which of the following results from the answer to the previous prompt is easiest to model in Benching? EXCLUDE Gibson Assembly from the selection and do NOT hallucinate when addressing this prompt	Perplexity
Ok. Based on the content in the answer to the prompt before the last prompt, explain to me what TOPO cloning is. What is a TOPO? What does it consist of/what are the basic steps? Do NOT hallucinate when addressing this prompt	Perplexity
Based on the answer to the last prompt, what exactly is Taq polymerase again? When we say TOPO cloning is ’ligase-free’, what do we mean when we say that? Why is TOPO cloning traditionally used?	Perplexity
In the context of the answer to the last prompt, why would one clone a PCR product using TOPO cloning? An insert is just the piece/fragment of DNA being inserted into the vector (usually a bacterial plasmid), correct? Do NOT hallucinate when addressing this prompt	Perplexity
Break down the basic steps of how I would model basic (i.e., not complicated TOPO Cloning) in Benchling. Do NOT hallucinate when addressing this prompt	Perplexity
Where can I find a PCR insert to insert into Benchling for TOPO Cloning? Can I basically choose anything from GenBank or UniProt? What should I be looking for in the context of TOPO Cloning? Do NOT hallucinate when addressing this prompt	Perplexity
Thanks. Based on the answer to the last prompt, EGFP is a green fluorescent protein, correct? Do NOT hallucinate when addressing this prompt	Perplexity
Is TOPO Cloning a form of homologous, homology-based cloning? Do NOT hallucinate when addressing this prompt	Perplexity
Would Benchling’s ‘Concantenate sequences’ feature work/be suitable if one was trying to model TOPO Cloning within Benchling? Why or why not? Do NOT hallucinate when addressing this prompt	Perplexity
Take a look at this tab. I want to model DNA topoisomerase I (TOPO) Cloning and am not sure I’m doing the right things. I know I have a Polymerase Chain Reaction (PCR) insert, but am not sure where to go from here regarding the other sequence I’ve imported (sequence-416748). It has a ‘pCR 2.1-TOPO Fzd6HA’ Sequence Label, but I’m not sure if it’s right, or if it is, where to go from here to appropriately model TOPO Cloning in Benchling. Do NOT hallucinate when replying to this prompt	Gemini
Ok. For Step A, can you help me find the insertion sites or the Fzd6 gene?	Gemini
Ok. I have this EGFP sequence and want to use it as part of TOPO Cloning with the previous pCR 2.1-TOPO sequence. What do I do? Do NOT hallucinate when replying to this prompt?	Gemini
Ok. So to complete the 3rd bullet under Step 1 from the answer to the last prompt, do I literally just add an A to the first base and an A to the last base in the sequence? Do NOT hallucinate when replying to this prompt	Gemini
When I go to Asembly Wizard, I can only do ‘Golden Gate’, ‘Gibson’ or ‘Homology’. How should I proceed?	Gemini
When we say ‘select the part of the vector excluding the Fzd6HA gene’, that can mean just copy the entire sequence EXCEPT for the Fzd6HA gene in a new sequence and then including that or no?	Gemini
Ok. Take a look at the construct I made. Does this depict TOPO Cloning in a reasonable way based on the insert and the vector? Do NOT hallucinate when replying to this prompt	Gemini
Getting an error that one of my constructs is invalid. Show me how to fix this	Gemini
Confused. Want to do this manually WITHOUT the Assembly Wizard. Not seeing where to add the EGFP PCR Product. Also not seeing how I can see the plasmid	Gemini

Asimov Kernel

See Repository below
- Created Asimov Repository
See blank Notebook entry below
- Created Asimov Notebook
See results below
Explored Bacterial Repos Devices and ran Simulator on various examples (see above)
See Construct creation results below
- Question 1-3 Results: Recreated Repressilator in empty Construct using Characterized Bacterial Parts repository parts, searched and selected parts using the Search function, and dragged and dropped parts into Construct (see photos below)
- Question 4 Results: The Repressilator wasn’t running as expected, so I re-made it and ensured I directly copied and pasted everything. Then I re-ran the Simulation and the Repressilator operated as expected (see photos below)
- Question 5 Results: Documented results in Notebook (see below)
See 3 Construct creation results below
- Created 3 Constructs, explaining in each Notebook entry how I thought the Constructs should operate and why (see photos below)
  - Construct 1 Results
  - Construct 2 Results
  - Construct 3 Results
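For reference on what the Repressilator recreated above is expected to do when simulated, the behavior to look for is sustained oscillation of the three repressor proteins. This is a minimal sketch of the textbook dimensionless three-gene ring model, integrated with a simple Euler step. It is not the Asimov Kernel simulator’s model, and the parameter values and initial conditions are illustrative assumptions.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=5.0, n=2.0,
                           dt=0.01, t_end=200.0):
    """Return the time course of protein 0 in a three-gene repression ring."""
    m = [1.0, 2.0, 3.0]  # mRNA levels; unequal start values break the symmetry
    p = [0.0, 0.0, 0.0]  # protein levels
    trace = []
    for _ in range(int(t_end / dt)):
        # Gene i is repressed by the protein of gene i-1 (index -1 wraps around).
        dm = [-m[i] + alpha / (1.0 + p[i - 1] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(p[0])
    return trace


trace = simulate_repressilator()
late = trace[len(trace) // 2:]  # skip the initial transient

# Count local maxima in the second half of the run as a crude oscillation check.
peaks = sum(1 for a, b, c in zip(late, late[1:], late[2:]) if b > a and b > c)
print("peaks in second half:", peaks)
print("protein range in second half:", round(min(late), 2), "to", round(max(late), 2))
```

If a rebuilt construct settles to flat protein levels instead of cycling, that is a sign a part or connection differs from the reference design.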

Supporting prompts for this section listed below:

Supporting Prompt	Model
Remind me what XOR means in classical computation (including, but not limited to its instantiation on digital computers). Do NOT hallucinate when addressing this prompt	Perplexity
Share any links that break down the basic symbology/legend of Synthetic Biology Open Language (SBOL). Do NOT hallucinate when replying to this prompt	Perplexity
Little bit confused here. Copied all the Repressilator parts from the ‘Bacterial Demos’ repo, but am getting different results when running the Simulation. What could be going on?	Kernel AI

Week 7 HW: Genetic Circuits Part 2

Genetic Circuits Part 2

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

Unlike traditional genetic circuits, IANNs are analog, and as such correspond more closely to the nature of biological systems (i.e., we’re not always looking for strict 0/1 binary logic; sometimes we’re looking to establish control across a range of values or space/time). This analog nature means they can be more responsive, efficient, and biocompatible.
Zoonotic Reservoir IANNs (ZRIANNs)
- This is applying IANNs to proactively identify and eliminate zoonotic pathogens in host organisms (ex. cattle and other forms of livestock, bats). By establishing a prior understanding of organism homeostasis, ZRIANNs can co-evolve their understanding of nominal organism cellular activity. Across ZRIANNs, this baseline might act as one form of input. When there are deviations from said baseline due to the emergence of novel pathogens triggering events like elevated inflammation levels or unexpected apoptosis, this would act as another form of input, triggering the ZRIANNs to begin characterizing novel pathogen behavior for elimination, which would be a form of output. Limitations might include the novelty of deploying IANNs across diverse non-human hosts, difficulties in IANNs achieving homeostatic understanding and insights, and difficulties in IANNs detecting novel, essentially zero-day equivalent pathogens.
See my attempt at a diagram below:

Intracellular multilayer perceptron attempt
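The multilayer perceptron idea above can be made concrete with a tiny forward pass. This is a minimal sketch, not a model of any real circuit: the two inputs stand in for normalized inflammation and apoptosis signal levels, the sigmoid stands in for a saturating biological response, and every weight and bias is a hypothetical value chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Smooth, saturating activation (an analog response, not a 0/1 switch)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer followed by a single output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


# Hypothetical parameters, for illustration only.
HIDDEN_W = [[4.0, 0.0], [0.0, 4.0]]  # one hidden node per input signal
HIDDEN_B = [-2.0, -2.0]
OUT_W = [3.0, 3.0]
OUT_B = -3.0

# Inputs: (inflammation level, apoptosis level), each scaled to 0..1.
baseline = forward([0.1, 0.1], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
deviation = forward([0.9, 0.8], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)

print(f"response at baseline: {baseline:.2f}")
print(f"response with elevated signals: {deviation:.2f}")
```

The output rises gradually as the inputs rise, which is the analog behavior that distinguishes this kind of circuit from strict binary logic.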

Supporting prompts for this section listed below:

Supporting Prompt	Model
In the context of artificial neural networks, what is a bandpass circuit? In simple term, what does a bandpass circuit do and what does it look like? Do NOT hallucinate when answering this prompt	Gemini
In the context of artificial neural networks, what is a bandpass circuit? In simple term, what does a bandpass circuit do and what does it look like? Do NOT hallucinate when answering this prompt	Google AI Mode
Can you show me a simple image/graphic of what a multilayer perceptron looks like?Do NOT hallucinate when addressing this prompt	Google AI Mode

Assignment Part 2: Fungal Materials

Existing fungal materials include mycelium-based composites (MBCs), flexible fungal materials (FFMs), and pure mycelium materials. They’re used in applications such as packaging, fashion, construction, and health and beauty products. Compared to their traditional counterparts, their advantages include greater environmental sustainability (including less use of petroleum and less harm to animals in their construction and use), high levels of customization, and low density. Their disadvantages seem to be lower load-bearing strength in some applications and more variability, since they’re biologically grown.
I’d personally be interested to see if fungi could be genetically engineered to improve air or water quality and filtration in remote environments where air or water filters may not always be readily attainable. My reasoning is rather simple: it would be convenient and environmentally sustainable if air and water filters could be biologically grown as opposed to manufactured and shipped through traditional processes. The advantages of doing synthetic biology in fungi as opposed to bacteria include a greater ability to fold complex proteins and a greater ability to form larger, more complex macroscopic structures.

Supporting Prompt	Model
Share me some links for articles about fungal materials. Want to understand the types of fungal materials and what they’re used for	Gemini
What are the advantages and disadvantages of fungal materials compared to more traditional materials (including petroleum-based materials) EXCLUDING environmental impact/sustainability? Do NOT hallucinate when addressing this prompt	Google AI Mode
What are some of the advantages of using fungal materials as opposed to bacteria in synthetic biology	Gemini

Assignment Part 3: First DNA Twist Order

Reviewed Final Project documentation guidelines
Submitted Google Form
Created Benchling for insert sequence input (https://benchling.com/bio_star_39/f_/MMV1lxUm3Y-htgaa-final-project-working-folder/)

Week 9 HW: Cell-Free Systems

Homework Part A: General and Lecturer-Specific Questions

Cell-Free Systems

General Homework Questions

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
- Cell-free systems allow for a broader range of potential chemistries than those given to us from natural biology, expanding flexibility. Cell-free protein synthesis also allows for greater control over experimental variables because the entire protein expression construct is designed from scratch (i.e., we have the opportunity to bypass a lot of the complexity of natural cells). Cell-free expression is more beneficial than cell production if you want to rapidly prototype gene pathways and if you want an expression mechanism that’s more amenable to consistent, predictable modeling and analysis.
Describe the main components of a cell-free expression system and explain the role of each component.
- The main components of a cell-free expression system are (based on elements described in this hyperlink ¹):
  - DNA template: Genetic code to begin Tx/Tl process
  - Ribosomes: Assembling amino acids into polypeptides
  - Enzymes: Catalyzing certain important chemical reactions necessary for the appropriate functioning of that cell-free expression system (ex. transcription and translation, energy generation)
  - Amino Acids: The core chemical building blocks of the proteins the cell-free expression system will express
  - Polymerases: Synthesizing DNA and RNA
Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment
- Energy regeneration is critical in cell-free systems because there is no living metabolism to replenish the ATP and GTP that transcription and translation consume, so the reaction stalls once the initial supply runs out. Byproducts such as inorganic phosphate also accumulate because nothing removes them. One way to maintain ATP supply is to add a secondary energy substrate together with the enzyme that uses it to re-phosphorylate ADP, for example creatine phosphate with creatine kinase
Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
- Prokaryotic cell-free expression systems allow for the colocation of transcription and translation. This might work well for proteins that need to be produced at high volume, like an industrial protease protein. Eukaryotic cell-free expression systems are derived from eukaryotic cells, so they carry the folding and post-translational modification machinery that supports more complex proteins (the extract itself does not contain intact nuclei). This might work well for the production of more advanced/technically complex proteins, like rabbit serum albumin.
How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
- In a manner similar to Shuguang Zhang’s ‘molecular glove’ experiment, I’d try to essentially coat and/or surround the membrane protein with hydrophilic proteins to attract and/or absorb water in the cell-free environment, so the membrane protein can incorporate into the liposome ². Challenges might include appropriate hydrophilic concentrations (which might be discerned via calculations or trial and error) or bonding between the hydrophilic proteins and the membrane proteins. This might be mitigated and/or the amount of error reduced through the use of computational modeling and simulation tools like AlphaFold
Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
- Suboptimal Ribosome Function: Examine ribosome mRNA translation processes (including tRNAs for misreads or inappropriate levels) and modify as necessary
- Suboptimal Transcription: Examine the DNA template and polymerase for sequence errors or inappropriate expression levels and modify as necessary
- Suboptimal External Communication (i.e., yields cannot properly exit system at desired levels): Examine and modify membrane channel functionality as necessary

Supporting prompts for this section listed below:

Supporting Prompt	Model
When we’re saying ‘cell-free expression system’, we’re not saying the same thing as a ‘synthetic minimal cell’ correct? Do NOT hallucinate/make things up when replying to this query	Gemini
Believe a ribosome has a waste removal function within cells, but I could be wrong in this. Clarfy or confirm this is the case, and do NOT hallucinate/make things up when replying to this prompt	Gemini
Believe a polymerase’s job is to essentially make copies of certain things, but the specifics beyond this are evading me at the moment. What is the specific role of a polymerase generally speaking? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Do cell-free systems contain polypeptides? Why or why not? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
In essence, an enzyme’s function within a cell-free expression system is to catalyze certain important chemical reactions necessary for the appropriate functioning of that cell-free expression system, correct? If I’m mistaken here, or if some element of my thinking is off, say so. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
In essence, amino acids’ function within a cell-free expression system is to serve as the core chemical building blocks of the proteins the cell-free expression system will eventually express, correct? If I’m mistaken here, or if some element of my thinking is off, say so. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Within the context of cell, I know ADP and ATP are phosphates (I believe an adenine-type phosphate), although I’m not sure. Tell me if this is correct, and tell me in simple terms how ADP and ATP work within the context of cells to generate appropriate energy levels for cell functionality. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Unlike a prokaryote, eukaryotes have nuclei, correct? What are the advantages and disadvantages of nuclei within the context of protein production? Is it simply that the complexity of eukaryotic cells allows for the production of more sophisticated, technically complex proteins, or are there more reasons? Answer this prompt in a relatively succinct fashion and do NOT hallucinate/make things up when doing so	Gemini
Tell me about what types of proteins don’t require excessive quality control, that require a large volume to be produced, and benefit from a prokaryotic setup (i.e., a cell where transcription and translation occur in the same location)? Do NOT hallucinate/make things up when replying to this prompt	Gemini
What is the name of the common rabbit protein traditionally used in biotechnology experimentation with mammalian cell culture? Blanking on the name. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode

Homework question from Kate Adamala

Based on Iulianna, T., Kuldeep, N. & Eric, F. The Achilles’ heel of cancer: targeting tumors via lysosome-induced immunogenic cell death. Cell Death Dis 13, 509 (2022). ³

Design an example of a useful synthetic minimal cell as follows:

Pick a function and describe it

What would your synthetic cell do? What is the input and what is the output?
- Increase apoptosis in mammalian cells with defective lysosomes. Input: Protein kinase R-like endoplasmic reticulum kinase (PERK). Output: Phosphorylated eukaryotic initiation factor 2 α-subunit (eIF2α)
Could this function be realized by cell-free Tx/Tl alone, without encapsulation
- No, it appears that communication with the external environment as well as some form of an encapsulating membrane are necessary for these immunogenic cell death (ICD) reactions to properly work
Could this function be realized by genetically modified natural cell?
- Believe this function could be realized by a genetically modified natural cell. If PERK expression levels could be increased, this could increase eIF2α phosphorylation
Describe the desired outcome of your synthetic cell operation.
- Increased PERK expression levels lead to increased eIF2α phosphorylation

Design all components that would need to be part of your synthetic cell

What would be the membrane made of?
- Mostly phospholipids, with a relative minority percentage of cholesterol
What would you encapsulate inside? Enzymes, small molecules.
- PERK, eIF2α, Adenosine Triphosphate (ATP), GTP, Creatine Phosphate, Reporter (likely GFP)
Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)
- Believe a mammalian system would be needed as this is meant to mimic a Homo sapiens-based eukaryotic system
How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)
- It should have permeable substrates, as my understanding of the PERK pathway seems to indicate that external communication with the environment via a permeable membrane is necessary for the PERK pathway to appropriately function (i.e., for the increase in the PERK expression levels to induce greater eIF2α expression)

Experimental details
- List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)
  - Lipids: POPC, Cholesterol
  - Enzymes and cofactors: Binding immunoglobulin protein (BiP), PERK, ATP, eIF2B, Growth arrest and DNA damage-inducible 34 (GADD34)
  - Genes: HSPA5 (also known as GRP78; for BiP expression), EIF2AK3 (encodes PERK protein), EIF2S1 (encodes eIF2α), ATF4, DDIT3, and PPP1R15A (latter 3 genes necessary for apoptotic response)
- How will you measure the function of your system?
  - Measure presence of GFP reporter to show that ribosomes are shutting down and apoptosis is beginning

Supporting prompts for this section listed below:

Supporting Prompt	Model
In this paper, what is the ER membrane?	Gemini
In this paper, the targeted ICD reactions require interfacing with elements outside the cell correct? Is the cell membrane essential for their function? Do NOT hallucinate when replying to this prompt	Gemini
If I wanted to make a synthetic cell that would allow for greater Protein Kinase RNA (PKR)-like ER Kinase (PERK) expression levels to induce increased eukaryotic initiation factor 2 α-subunit (elF2a) expression, would a cholesterol membrane make sense? Would some other type of membrane make sense? Why or why not? Do NOT hallucinate/make things up when replying to this prompt	Gemini
In a ’normal’/natural non-synthetic cell, where do PERK and elF2a sit (i.e., where are they located? What are the components within a ’normal’/natural non-synthetic cell necessary for them to appropriately function? Do NOT hallucinate/make things up when replying to this prompt	Gemini
When we say that PERK ‘reaches across the membrane into the cytoplasm’, do we mean to say that it reaches outside the cell? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Ok. So if I was building a minimal synthetic cell to replicate the PERK pathway (and the increase in elF2a phosphorylation), then my necessary components inside the minimal synthetic cell to encapsulate this reaction would be the PERK itself, relevant ribosomes, tRNAs, cytoplasm, and DNA? Tell me what I’m missing, what’s incorrect, and do NOT hallucinate/make things up when replying to this prompt	Gemini
To clarify, is PERK an enzyme that induces a chemical reaction leading to increased elF2a expression? What enzymes are usually required for the perk pathway to appropriately function? Do NOT hallucinate/make things up when replying to this prompt	Gemini
If the PERK pathway requires the BiP as its upstream regulator, what genes are necessary for this protein to be produced? What genes are necessary for the pathway to successfully function? Do NOT hallucinate/make things up when replying to this prompt	Gemini

Homework question from Peter Nguyen

Write a one-sentence summary pitch sentence describing your concept.
- Robotics Use Case: Thinking about using cell-free systems delivered/facilitated by drones to collect metagenomic samples from remote environments, for the purpose of expanding biosurveillance beyond traditional wastewater sampling
How will the idea work, in more detail? Write 3-4 sentences or more.
- A drone (or potentially a larger drone ship like an eVTOL) would deliver robots and kits with cell-free reactions. These robots might be similar to Mars rovers or bomb-defusing robots. The kit would ideally auto-unload once the drone has reached its given destination. The robot would then have to complete a set (i.e., limited and discrete) number of specific steps to collect and store the metagenomic sample. If analysis of the metagenomic sample could be done in real time or within a short time, that would be beneficial. If this could not be done, there would essentially be a ‘packing’ step before the robot and the now-utilized kits return to their origin site for sample processing/analysis
What societal challenge or market need will this address?
- The need to expand biosurveillance beyond the purely human environment into more remote locales and animal populations
How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?
- Would need secure storage to maintain stability. Activation with water challenges would require an appropriate water disposal mechanism(s) either within/near the sample kit or facilitated by the robot without harming the robot (or the robot might have some form(s) of waterproof protection). One-time use isn’t an issue for this use case because there are ample examples from the world of biosurveillance where one-time sample collection is the aim

Homework question from Ally Huang

Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting.
- As humans travel farther out into space, particularly on remote, potentially skeleton-crew missions, they may not be able to bring blood supplies for adverse events like necessary transfusions. Moreover, blood banks play an important terrestrial role that might need replication. The basic idea is to engineer liver cells to create blood proteins or blood-like fluid on demand a-la the high-level idea initially proposed in the hyperlinked Engineering Biology Research Consortium (EBRC) Roadmap document ⁴
Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches.
- I’d like to study the albumin plasma protein
Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses.
- Albumin has a vital role in the liver’s production of blood-related proteins. Therefore, it would seem rather improbable to have engineered liver cells to create blood proteins without some sort of working albumin configuration or some analog
Clearly state your hypothesis or research goal and explain the reasoning behind it.
- I’d like to study how to modulate or fine-tune albumin expression levels in microgravity, as it appears microgravity exposure can cause albumin levels to increase ⁵.
Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc.
- I’d include purified albumin plasma protein as a sample in my experiment, as well as some form of small interfering RNA (siRNA) for lowering albumin expression, and GFP for measuring expression level change. I’d use BioBits and the P51 Molecular Fluorescence Viewer to measure the impact of siRNA modulation on albumin expression. I’d collect information on oncotic pressure modulations across microgravity and terrestrial experimental configurations. The terrestrial configuration would serve as a control to indicate comparative rates of siRNA modulation efficacy. GFP expression would indicate certain levels of albumin expression post-siRNA modulation

Supporting Prompt	Model
What exactly is the role of the human liver in creating blood or blood-relate proteins? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Show me research papers from the past 5-10 years on the impacts of microgravity on the production of albumin plasma protein. Do NOT hallucinate/make things up when replying to this prompt	Google Scholar Labs
When biotechnologists typically study albumin terrestrially, how is this done? How do they typically measure albumin’s impact on oncoctic pressure? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode

Homework Part B: Individual Final Project

Put your chosen final project slide in the appropriate slide deck following the instructions on slide 1

Inserted slide in Committed Listener Deck

Submit this Final Project selection form if you have not already.

Submitted Final Project selection form (see screenshots below)

Begin planning how you will write your final project documentation based on these guidelines

Began writing final project documentation based on hyperlinked guidelines

Prepare your first DNA order and put it in the “Twist (MIT)” or “Twist (Nodes)” tab of the 2026 HTGAA Ordering: DNA, Reagents, Consumables spreadsheet, as appropriate.

Prepared Twist order near end of Final Project completion

Week 10 HW: Advanced Imaging & Measurement Technology

Waters Corporation Mass Spectrometer

Homework: Final Project

For your final project:

Please identify at least one (ideally many) aspect(s) of your project that you will measure.
- Lysis Rate
- Efficiency of Plating
Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
- Lysis Rate: This measures the rate at which the mutated M. smegmatis mycobacteriophage lyses or destroys bacteria. This would be measured in a wet lab setting by comparing percentages of bacteria across a control and another plate that has been exposed to a mutated form of the M. smegmatis mycobacteriophage
- Efficiency of Plating: This measures the rate at which the mutated M. smegmatis mycobacteriophage can begin initiating a host infection. Believe this would also be measured in a wet lab setting by comparing percentages of bacteria across a control and another plate that has been exposed to a mutated form of the M. smegmatis mycobacteriophage
What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.
- Lysis Rate: I’d likely use a microplate reader as part of a wet lab extension of the final project
- Efficiency of Plating: I’d use a plaque assay as part of a wet lab extension of the final project
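
As a rough illustration of how plaque assay counts could be turned into numbers, the sketch below computes a phage titer in PFU/mL and an efficiency of plating (EOP) ratio. It assumes the common convention that EOP is the titer on a test condition divided by the titer on a reference condition; which comparison applies to this project (for example mutant versus unmutated phage on the same host) is an assumption, and all plaque counts, dilutions, and volumes are hypothetical placeholders rather than measured data.

```python
def titer_pfu_per_ml(plaque_count, dilution_factor, volume_plated_ml):
    """Phage titer from one countable plate.

    dilution_factor is the fraction of the original stock on the plate,
    e.g. 1e-6 for a 10^-6 serial dilution.
    """
    return plaque_count / (dilution_factor * volume_plated_ml)


def efficiency_of_plating(test_titer, reference_titer):
    """EOP as the ratio of test titer to reference titer (assumed convention)."""
    return test_titer / reference_titer


# Hypothetical counts, for illustration only.
test_titer = titer_pfu_per_ml(plaque_count=42, dilution_factor=1e-6, volume_plated_ml=0.1)
reference_titer = titer_pfu_per_ml(plaque_count=84, dilution_factor=1e-6, volume_plated_ml=0.1)
eop = efficiency_of_plating(test_titer, reference_titer)

print(f"Test titer: {test_titer:.2e} PFU/mL")
print(f"Reference titer: {reference_titer:.2e} PFU/mL")
print(f"EOP: {eop:.2f}")
```

An EOP near 1 would suggest the test condition plates about as well as the reference, while values well below 1 would suggest reduced plating.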

Supporting prompts for this section listed below:

Supporting Prompt	Model
Does Efficiency of Plating (EOP) mean the same thing as rates of lysing? Believe so. Do NOT hallucinate/make things up when replying to this prompt	Gemini
Explain how mutations are measured in this paper. Do NOT hallucinate/make things up when replying to this prompt	Gemini
In experiments like the one referenced in this paper, how are metrics like Lysis Rate and Efficiency of Plating traditionally measured? What tools are used? Do NOT hallucinate/make things up when replying to this prompt	Gemini

Homework: Waters Part I — Molecular Weight

Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight?

The calculated molecular weight is 28,006.60 Da

Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data and:
Determine z for each adjacent pair of peaks (n, n+1) using:
- Chose 933.7349 and 965.9684 from the Figure 1 chart. Based on the formula z = ~28.96, which rounds to a charge of 29 for the 965.9684 peak (and 30 for the 933.7349 peak)
Determine the MW of the protein using the relationship between m/z, MW, and z
- MW = 27,983.85 Daltons
Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1
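- The result of the measurement I got was a fractional error of -0.0008, which is about -812 ppm once multiplied by 10^6: (27,983.85 - 28,006.60) / 28,006.60 × 10^6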
Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

Believe the answer’s no because the zoomed-in peak from my understanding is not uniform. Instead it constitutes a variety of different charged states that cannot be discerned as a singular discrete value. Absolutely open to being wrong on this
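
The adjacent charge state calculation from question 2 can be sketched in a few lines of Python. This is a minimal sketch, not the method from the recitation: the function names are hypothetical, and the proton mass of 1.00728 Da is an assumed constant (using a rounded value such as 1.008 shifts the results slightly, so small differences from the hand-calculated values are expected).

```python
PROTON = 1.00728  # proton mass in Da (assumed constant)


def charge_from_adjacent_peaks(mz_low, mz_high):
    """Charge z of the higher-m/z peak, given two adjacent charge states.

    The lower-m/z peak carries charge z + 1 and the higher-m/z peak carries z.
    """
    return (mz_low - PROTON) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral molecular weight from an m/z value and its charge."""
    return z * (mz - PROTON)


def ppm_error(observed, theoretical):
    """Mass error in parts per million (fractional error times 10^6)."""
    return (observed - theoretical) / theoretical * 1e6


# Peaks chosen in the answer above, and the theoretical MW from question 1.
z_raw = charge_from_adjacent_peaks(933.7349, 965.9684)
z = round(z_raw)
mw = neutral_mass(965.9684, z)
err = ppm_error(mw, 28006.60)

print(f"raw z = {z_raw:.2f}, rounded z = {z}")
print(f"MW = {mw:.2f} Da, error = {err:.0f} ppm")
```

Rounding z to the nearest integer before computing the molecular weight matters, because a charge state can only be a whole number.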

Supporting prompts for this section listed below:

Supporting Prompt	Model
In Section 1, question 2, what does the variable z stand for again? Do NOT hallucinate/make things up when addressing this prompt	Gemini
In the formula in the answer to the last prompt, what does the numerator represent? How does it correlate with the spikes in this image? Am aware that each spike represents an m over z ratio but am unsure where/how to begin. Do NOT hallucinate/make things up when replying to this prompt	Gemini
Not understanding how a spike with an 800 value on the left hand side can have a higher number of charges than a right hand spike with a value of 1000. Explain this to me, and if there was any hallucination(s) that have any implications for the results of the answer to the last prompt, say so. Do NOT hallucinate/make things up when replying to this prompt	Gemini
So to be clear in the hypothetical calculation in the answer to the last prompt, to get z you divided 848.97 (the numerator) by 875.44-848.97 (the denominator). Do NOT hallucinate/make things up when replying to this prompt	Gemini
Where is the predicted weight of the protein in 2.1 that is referenced in Section 1, question 3? Not exactly following. Do NOT hallucinate/make things up when replying to this prompt	Gemini
Think I’m doing something wrong. Got a theoretical MW of 28006.60 and an experimental MW of 26,409,038. I chose the 966.0037 and 966.0390 peaks. Apparently the results from the last equation should be in the 30-50 range and I got approx. 941 when I ran the equation. What did I do wrong/what am I missing? Do NOT hallucinate/make things up when replying to this prompt	Gemini

Homework: Waters Part II — Secondary/Tertiary Structure

Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses
- Believe native proteins are not manipulated in any way (i.e., their folded structure is not altered via heat or other impacts), while denatured proteins are proteins whose folded structure has been disrupted via direct alteration, usually by applying something like heat or acidity. This is determined in a mass spectrometer via the distribution of charge states across the respective proteins. In the denatured protein in Figure 2, there appears to be a somewhat more Gaussian charge distribution, whereas the native protein below has a more spread out charge distribution
Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800? What is the charge state? How can you tell?
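- Observing a +10 charge state. Dividing the intact mass from Part I (~27,984 Da) by the peak position (~2,799.5 m/z) gives z ≈ 10. The two labeled peaks near ~2800 (2799.4199 and 2799.6365) have a 0.2166 difference between them, and 1/0.2166 ≈ 4.6, so that spacing alone does not give +10; these two peaks may not be adjacent isotope peaks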
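
Two quick ways to estimate the charge state of the ~2800 peak are sketched below: from the spacing between two peaks (assuming they are adjacent isotopes roughly 1 Da apart in mass), and from the intact mass divided by the peak position. The function names are hypothetical and the 1 Da isotope spacing and 1.00728 Da proton mass are approximations. If the two estimates disagree, one possible explanation is that the labeled peaks are not adjacent isotope peaks.

```python
def charge_from_isotope_spacing(mz_a, mz_b):
    """Estimate z as 1 / spacing, assuming adjacent isotopes ~1 Da apart."""
    return 1.0 / abs(mz_b - mz_a)


def charge_from_mass(neutral_mass, mz, proton=1.00728):
    """Estimate z from a known neutral mass and an observed m/z."""
    return neutral_mass / (mz - proton)


# Peak values quoted in the answer above; intact MW from Waters Part I.
spacing_z = charge_from_isotope_spacing(2799.4199, 2799.6365)
mass_z = charge_from_mass(27983.85, 2799.4199)

print(f"z from peak spacing: {spacing_z:.2f}")
print(f"z from intact mass:  {mass_z:.2f}")
```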

Supporting prompts for this section listed below:

Supporting Prompt	Model
Looking over Section 2 question 2 and am pretty sure the answer is that there isn’t a single answer to the charge state of the peak at ~2800, based on the inset in the Figure 3 and its somewhat parabolic-looking curve. Feel free to tell me if/where or how my thinking is off and do NOT hallucinate/make things up when replying to this prompt	Gemini

Homework: Waters Part III - Peptide Mapping - Primary Structure

How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above.

There are 18 Lysines (K) and 6 Arginines (R) in eGFP. (See screenshot below)

How many peptides will be generated from tryptic digestion of eGFP?
- Believe 25 peptides will be generated

Navigate to https://web.expasy.org/peptide_mass/
Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides
Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.
Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.
- Confirmed 25 generated peptides (see screenshot below)
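
The link between the K/R count and the peptide count can be checked with a small in-silico digest. The sketch below is a simplified stand-in for the PeptideMass tool, not a reproduction of it: it cuts after every lysine (K) or arginine (R), with an optional rule that skips the cut when the next residue is proline (P). Whether that proline rule applies depends on the enzyme option chosen in the tool. The function name and the short toy sequence are hypothetical; the eGFP sequence itself is not reproduced here.

```python
import re


def tryptic_peptides(sequence, cut_before_proline=False):
    """Split a protein sequence after K or R.

    By default, no cut is made when the next residue is P.
    """
    pattern = r"(?<=[KR])" if cut_before_proline else r"(?<=[KR])(?!P)"
    # Zero-width splits can leave an empty string at the end; drop it.
    return [peptide for peptide in re.split(pattern, sequence) if peptide]


toy_sequence = "MAKTRPLKGGR"  # hypothetical toy sequence, not eGFP
cleavage_residues = toy_sequence.count("K") + toy_sequence.count("R")
peptides = tryptic_peptides(toy_sequence)

print(f"K + R residues: {cleavage_residues}")
print(f"Peptides: {peptides}")
```

When the last residue of the sequence is not K or R and no cut is blocked by proline, the number of peptides is the number of K and R residues plus one, which is how 18 lysines and 6 arginines lead to 25 peptides.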

Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.
- I count 25 peaks
Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?
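- Yes. The 25 peaks I counted match the 25 peptides predicted by the PeptideMass tool, so there are neither more nor fewer peaks in the chromatogram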
Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide based on its (m/z) and z.
- The isotope spacing gives z ≈ 2.0323, i.e., a charge of +2 for the most abundant charge state of the peptide. The mass of the singly charged form of the peptide is 1,067.4760
Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm.
- Think the peptide’s in position 115-123 (peptide sequence FEGDTLVNR). The mass accuracy of measurement’s ~0.0161 as a fraction, which is roughly 16,100 ppm once multiplied by 10^6
What is the percentage of the sequence that is confirmed by peptide mapping?
- According to the PeptideMass tool, 90.7% of the sequence is confirmed by peptide mapping
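
The peptide calculations in this part (charge from isotope spacing, singly charged mass, and ppm error) can be sketched as follows. This is a minimal sketch with hypothetical function names; the proton mass of 1.00728 Da and the ~1 Da isotope spacing are assumed approximations, and the m/z values and expected mass are made-up placeholders, not readings from Figure 5b or the PeptideMass output.

```python
PROTON = 1.00728  # proton mass in Da (assumed constant)


def charge_from_isotopes(mz_first, mz_second):
    """Integer charge from the spacing of two adjacent isotope peaks."""
    return round(1.0 / abs(mz_second - mz_first))


def singly_charged_mass(mz, z):
    """[M+H]+ mass implied by an ion observed at m/z with charge z."""
    return z * mz - (z - 1) * PROTON


def ppm_error(observed, expected):
    """Mass error in parts per million (fractional error times 10^6)."""
    return (observed - expected) / expected * 1e6


# Hypothetical values, for illustration only.
z = charge_from_isotopes(500.2500, 500.7515)
mh = singly_charged_mass(500.2500, z)
err = ppm_error(mh, 999.4900)

print(f"z = {z}, [M+H]+ = {mh:.4f}, error = {err:.1f} ppm")
```

Multiplying the fractional error by 10^6 is the step that converts it to ppm.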

Supporting prompts for this section listed below:

Supporting Prompt	Model
What is the relationship between a ’tryptic digestion of eGFP’ and the ’trypsin’ enzyme identified in the screenshot in Section III? Do NOT hallucinate/make things up when replying to this prompt	Gemini
So if they’re 18 Lysines (K) and 6 Arginines (R), then that would mean that there would be 24 cuts made and 24 peptides generated correct? Do NOT hallcuinate/make things up when replying to this prompt	Gemini
Not sure where to begin in terms of breaking down/starting to work on Question 5 in Section III. Any thoughts on where or how to begin? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Believe you’re looking at the wrong question, specifically question 4 in Section III as opposed to question 5. Do not hallucinate and provide guidance on how to begin tackling question 5 in Section III	Gemini

Homework: Waters Part IV - Oligomers

7FU Decamer
- Sits directly to the left of the 4.013 peak between 0 and 5 MDa axis in Figure 7
8FU Didecamer
- Very tall 8.33 MDa peak sitting between 5 and 10 MDa on the MDa axis in Figure 7
8FU 3-Decamer
- The 12.67 MDa peak sitting between 10 and 15 MDa on the MDa axis in Figure 7
8FU 4-Decamer
- The tiny peak sitting between 15 and 20 MDa on the MDa axis in Figure 7
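
Since the subunit masses are given in kilodaltons and the Figure 7 axis is in megadaltons, a short sketch of the unit conversion and peak matching may help. It assumes each decamer contains 10 subunits and that 1 MDa = 1,000 kDa; the 400 kDa subunit mass is a hypothetical placeholder rather than the value from the assignment, and the function names are illustrative.

```python
KDA_PER_MDA = 1000.0


def oligomer_mass_mda(subunit_kda, n_decamers):
    """Expected mass in MDa of an assembly of n decamers (10 subunits each)."""
    return subunit_kda * 10 * n_decamers / KDA_PER_MDA


def closest_peak(expected_mda, peaks_mda):
    """Observed peak nearest to the expected mass."""
    return min(peaks_mda, key=lambda peak: abs(peak - expected_mda))


observed_peaks = [4.013, 8.33, 12.67]  # MDa, peak labels quoted above
subunit_kda = 400.0  # hypothetical subunit mass, for illustration only

assignments = {
    n: closest_peak(oligomer_mass_mda(subunit_kda, n), observed_peaks)
    for n in (1, 2, 3)
}
print(assignments)
```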

Supporting Prompt	Model
Looking over Section IV of this page, it appears as if the subunit masses are in kilo-Daltons and the Mass Spectrum readouts in Figure 7 are in Mega-Daltons, correct? Blanking on the relationship between kilo and Mega units of measurement. Do NOT hallucinate/make things up when replying to this prompt	Gemini

Homework: Waters Part V - Did I Make GFP?

See screenshots below:

	Theoretical	Observed/measured on the Intact LC-MS	PPM
Molecular weight (Da)	28,006.60	27,983.85	≈ -812 ppm

Week 11 HW: Bioproduction and Cloud Labs

Part 1: Global Pixel Artwork Cloud Lab Contribution

Made the following contributions to the Global Pixel Artwork Cloud Lab (see screenshots below)

Global Pixel Artwork Contributions (see above). Edited 4 pixels in the upper right hand corner of the image (changed them to sfGFP)

Week 12 HW: Bioproduction & Cloud Labs Part 2

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST.
- Contributed 4 pixels to the global artwork experiment on Saturday 4/18
Make a note on your HTGAA webpages including:
- what you contributed to the community bioart project
  - I contributed 4 pixels to the community bioart project. I changed 4 pixels in the upper right plate to sfGFP (see screenshots below)
    
    Community bioart project contributions – changed 4 upper right plate pixels to sfGFP (see above)
- what you liked about the project
  - The project was a nice opportunity to contribute to a larger HTGAA effort. It was nice to see the creativity of the community at play! I also appreciated Ronan’s page for contributing pixels – very intuitive and easy to understand
- what about this collaborative art experiment could be made better for next year
  - I’d probably say a bit more advance notice might have been useful. Perhaps a bit more clarity on ground rules. These are relatively minor nitpicks in the grand scheme of things

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.
- E. coli Lysate
  - BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)
    - I think BL21 (DE3) Star Lysate’s role is to provide the E. coli cellular machinery (ribosomes, tRNAs, translation factors, and metabolic enzymes, plus T7 RNA Polymerase) needed to transcribe the DNA template and translate it into fluorescent proteins. Basically it serves as the core starting ingredient of the reaction

Salts/Buffer
- Potassium Glutamate
  - Supplies potassium ions and helps aid the elongation portion of the RNA –> protein translation process. In this context, elongation refers to the ribosome adding amino acids one at a time to the growing protein chain
- HEPES-KOH pH 7.5
  - Important pH buffer. It holds the reaction near pH 7.5 and provides extra buffering capacity if the reaction runs for a prolonged/longer than normal time period
- Magnesium Glutamate
  - Helps stabilize ribosome construction. It also helps neutralize mRNA and DNA backbone negative charge
- Potassium phosphate monobasic
  - Potassium source and buffer for the reaction, helping stabilize pH levels/keep them nominal during the reaction. Monobasic means one of phosphoric acid’s three acidic hydrogens is replaced by potassium (KH2PO4)
- Potassium phosphate dibasic
  - Another potassium source and buffer for the reaction, helping stabilize pH levels/keep them nominal during the reaction. Dibasic means two of phosphoric acid’s three acidic hydrogens are replaced by potassium (K2HPO4)
Energy / Nucleotide System
- Ribose
  - Key cell energy source. It’s crucial for creating adenosine triphosphate (ATP) the primary form of energy within cells
- Glucose
  - Another essential energy source for cell processes. Also helps produce ATP
- AMP
  - Adenosine monophosphate (AMP) is a nucleotide precursor. In this mix it is phosphorylated by enzymes in the lysate into ATP, which both powers the reaction and is incorporated into RNA
- CMP
  - Cytidine monophosphate (CMP) assists with RNA synthesis. It is phosphorylated into CTP, one of the four nucleotides polymerized into RNA
- GMP
  - Guanosine monophosphate (GMP) is key for RNA synthesis and regulates cellular signaling. Helps polymerize RNA
- UMP
  - Uridine monophosphate (UMP) is a pyrimidine compound. It also helps polymerize RNA.
- Guanine
  - Nucleic acid base that pairs with cytosine in double-stranded DNA. It’s used to build RNA during the transcription process.
Translation Mix (Amino Acids)
- 17 Amino Acid Mix
  - The mix provides a group of compatible amino acids for translation by ribosomes. These are the materials ribosomes work with for translation into proteins
- Tyrosine
  - An amino acid building block for protein synthesis. Its side chain can also carry phosphate group post-translational modifications (PTMs).
- Cysteine
  - An amino acid used by ribosomes to build protein chains during the translation process. Helps with protein folding and stability.
Additives
- Nicotinamide
  - Helps manage the process of cellular nutrients converting themselves to ATP and vice versa. Think this means it also might help with reaction energetic stability
Backfill
- Nuclease Free Water
  - Ensures appropriate reaction concentration by bringing the mix to its final volume. Being nuclease-free, it also ensures no contaminating nucleases degrade the DNA template or mRNA

Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above.

Believe the main difference between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix is that the 1-hour optimized PEP-NTP master mix optimizes for the fast production of fluorescent proteins, while the 20-hour NMP-Ribose-Glucose master mix optimizes for fluorescent protein production across a longer timespan. While this might seem obvious based on the slide content, my understanding is that the PEP-NTP master mix is more energy intensive (i.e., it essentially consumes more energy faster to create fluorescent protein output) while the NMP-Ribose-Glucose master mix is comparatively less energy intensive (i.e., it either consumes less energy, more slowly, across its 20-hour reaction timespan to create fluorescent protein output, or it consumes the same amount of energy across that timespan in a less energy intensive fashion). So, in essence, I think the main difference between these reactions comes down to their respective energy consumption levels

Supporting prompts for this section listed below:

Supporting Prompt	Model
Under the ‘Salts/Buffer’ subsection under the 1st question in Part B, I think glutamates help with the creation of a given chemical (in this case potassium or magnesium). Not sure how/why these salts/buffers are relevant in a cell-free protein synthesis reaction. Any insights you might have into the roles of the glutamates here, as well as all the various types of potassium would be useful. Do NOT hallucinate/make things up when replying to this prompt	Gemini
When we say something is a buffer for a chemical reaction, what exactly do we mean when we say that? Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (4-5 sentences max)	Gemini
Looking at the ‘Energy/Nucleotide’ subsection under the 1st question in Part B, and given a passing understanding of genomics, I understand that guanine (G) pairs with Cytosine (C). In the context of this subsection, does this mean that a cell-free Guanine mix translates or outputs Cysteine in some may? Is that a relationship between these two things? If so, what’s the relationship? Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (5-6 sentences max)	Gemini
Looking at the ‘Backfill’ subsection under the 1st question in Part B, my understanding of nuclease free water’s function in a cell-free protein synthesis reaction is to basically provide a clean backdrop for the reaction to occur, or to determine what’s what post-reaction. Are either of those high-level explanations correct or sensible? Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (4-5 sentences max)	Gemini
Doing a sanity check: ribosomes turn amino acids into proteins as part of the translation process, correct? Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (2-3 sentences max)	Gemini

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)
- sfGFP
  - It has a relatively quick maturation time (13.6 min.). This means researchers can find out fairly quickly whether the cell-free reaction succeeded if they were solely measuring this protein’s fluorescence
- mRFP1
  - It has a comparatively longer maturation time (60 min.). This means researchers might need to wait a bit to determine whether the cell-free reaction succeeded if they were solely measuring this protein’s fluorescence
- mKO2
  - Its acid sensitivity (pKa) is 5.5. This means that if the pH drops from the typical 7.5 of a common cell-free reaction toward 5.5, its fluorescence will dim (at a pH equal to the pKa, roughly half of the fluorescence is lost)
- mTurquoise2
  - Its acid sensitivity (pKa) is 5.5 and its maturation time is also relatively quick (33.5 min.). This means it’s relatively resistant to moderate drops in pH (i.e., a fluorescence readout will still occur) and a researcher can discern relatively quickly whether a successful reaction occurred if they were solely measuring this protein’s fluorescence
- mScarlet_1
  - Its acid sensitivity (pKa) is 5.3 and its maturation time is comparatively long (174 min.). A lower pKa means fluorescence persists at lower pH, so this protein tolerates pH drops slightly better than mKO2 and mTurquoise2. However, it will take several hours for a researcher to discern whether a successful reaction occurred if they were solely measuring this protein’s fluorescence
- Electra2
  - It’s the second brightest of all the proteins in this list. It has a 61.48 brightness readout (the brightest protein is mScarlet_1 with a 70.0 brightness readout)
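To get a rough feel for how the pKa and maturation values above play out, here is a minimal Python sketch. It rests on two simplifying assumptions that are mine, not the course's: fluorescence follows a single-site titration curve (half the signal is lost when pH equals pKa), and the listed maturation time is treated as the half-time of a first-order process. The function names are hypothetical.

```python
# pKa and maturation values are the ones quoted in the list above.
PKA = {"mKO2": 5.5, "mTurquoise2": 5.5, "mScarlet_1": 5.3}
MATURATION_MIN = {"sfGFP": 13.6, "mRFP1": 60.0, "mTurquoise2": 33.5, "mScarlet_1": 174.0}


def fraction_fluorescent(ph: float, pka: float) -> float:
    """Assumed single-site titration: fraction of protein still fluorescent at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_matured(minutes: float, half_time_min: float) -> float:
    """Assumed first-order maturation, treating the listed time as a half-time."""
    return 1.0 - 0.5 ** (minutes / half_time_min)


if __name__ == "__main__":
    # Fraction of signal retained as the reaction acidifies from pH 7.5 to 5.5.
    for name, pka in PKA.items():
        print(name, [round(fraction_fluorescent(ph, pka), 2) for ph in (7.5, 6.5, 5.5)])
    # Fraction of protein matured one hour after it is made.
    for name, half_time in MATURATION_MIN.items():
        print(name, round(fraction_matured(60.0, half_time), 2))
```

Under these assumptions, all three pKa values leave nearly full signal at pH 7.5, and the gap between proteins only shows up as the pH approaches 5.5. Real fluorescent proteins can deviate from both simplifications.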
Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.
- I hypothesize that if I increase ribose and/or glucose reagent concentrations in the cell-free mastermix, it will increase sfGFP brightness over a 36-hour incubation period relative to its nominal brightness rate (54.15)
The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here ¹
- Test reagent master mix compositions based on the hypothesis above shown in the screenshots below
Test master mix composition (increase ribose)
Test master mix composition (increase glucose)
The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (date TBD!). The reaction composition for each well will be as follows:
- 6 μL of Lysate
- 10 μL of 2X Optimized Master Mix from above
- 2 μL of assigned fluorescent protein DNA template
- 2 μL of your custom reagent supplements
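As a sanity check on the well composition above, here is a small Python sketch that totals the component volumes and works out final concentrations after mixing. The volumes come from the list above. The 100 mM supplement stock is a hypothetical value chosen only for illustration, and the function names are my own.

```python
# Component volumes (in microliters) per well, as listed above.
WELL_UL = {
    "lysate": 6.0,
    "2X optimized master mix": 10.0,
    "fluorescent protein DNA template": 2.0,
    "custom reagent supplements": 2.0,
}


def total_volume_ul(components: dict) -> float:
    """Sum of all component volumes in the well."""
    return sum(components.values())


def final_concentration(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock_conc * added_ul / total_ul


if __name__ == "__main__":
    total = total_volume_ul(WELL_UL)
    print("total volume (uL):", total)
    # The 2X master mix ends up at 1X in the well.
    print("master mix strength:", final_concentration(2.0, 10.0, total), "X")
    # Hypothetical 100 mM supplement stock: 2 uL in the well gives a 10-fold dilution.
    print("supplement (mM):", final_concentration(100.0, 2.0, total))
```

The practical point is that anything added through the 2 μL supplement slot is diluted 10-fold, so a supplement stock needs to be 10 times the intended final increase.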

Supporting prompts for this section listed below:

Supporting Prompt	Model
In the ‘Attributes’ table in this tab, what does the ‘Maturation (min.)’ value mean? Why does it matter in practical terms? Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (3-4 sentences max)	Gemini
On this tab, when we say mK02 has moderate acid sensitivity, what does that actually mean? What acids is it sensitive to? Any type of acid? Why does acid sensitivity matter in practical terms? Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (5-6 sentences max)	Gemini
So based on mK02’s moderate acid sensitivity readout (5.5) would we say that that readout is suboptimal/undesirable relative to normal cell-free reaction pH? Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (2-3 sentences max)	Gemini
Looking over the subsections in Part B, does every single subcategory consist of reagents, or do only some subcategories consist of reagents? Do Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (3-4 sentences max)	Gemini
This might be a dumb question. If I give a cell-free reaction more power in the form of higher concentrations of some of the ATP/cellular energy-associated reagents under the ‘Energy/Nucleotide System’ subsection, could I expect a decreased maturation time for fluorescent protein readout/indication of fluorescence? Do Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (5-6 sentences max)	Gemini
In the response to the last prompt, the energy reagents listed were AMP, CMP, GMP, or UMP. What if ribose or glucose levels were increased? Could I expect a decreased maturation time for fluorescent protein readout/indication of fluorescence for sfGPF? Do Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (5-6 sentences max)	Gemini
Understood. What about the impact of increased energy reagents on sfGFP brightness? If I added increased concentrations of some of the ATP/cellular energy-associated reagents under the ‘Energy/Nucleotide System’ subsection, such as Ribose or Glucose, in principle could I expect to get brighter/more fluorescent sfGFP as an output? Do Do NOT hallucinate/make things up when replying to this prompt and keep the answer relatively succinct and plainspoken (5-6 sentences max)	Gemini

https://rcdonovan.com/cfps ↩︎

Week 13 HW: Scaling Health Innovation

Master Mix Concentrations

See test Master Mix Concentrations below:

Week 13 Test Master Mix Concentrations

Per my Week 12 hypothesis (see Week 12 HW)¹, I attempted to raise ribose and glucose concentrations in order to increase sfGFP expression in my selected wells

https://pages.htgaa.org/2026a/jason-ross/homework/week-12-bioproduction-and-cloud-labs2/index.html ↩︎

Week 14 HW: Bio Design and Bio Fabrication

Space Phage Supreme work (with artistic license lol)

Final Project Work

Worked on Finalizing Final Project ¹

https://pages.htgaa.org/2026a/jason-ross/projects/individual-final-project/index.html ↩︎

Week 15 HW: Survey Responses

Future Reflections: Kepler-16b Biopark

Survey Responses, Pluses and Deltas, and Reflection

Labs

Lab writeups:

Week 1 Lab: Introduction to Pipetting and Dilutions
Week 2: DNA Gel Art
Week 3 Lab: Opentrons Art
Week 4 Lab: Protein Design Part I
Week 5 Lab: Protein Design Part II
Week 6 Lab: Gibson Assembly
Week 7 Lab: Neuromorphic Circuits
Week 9 Lab: Cell-Free Systems
Week 10 Lab: Mass Spectrometry
Week 11 Lab: Introduction to Cloud Laboratories
See Week 11 and Week 12 Homework assignments: https://pages.htgaa.org/2026a/jason-ross/homework/week-11-hw-bioproduction-and-cloud-labs/index.html and https://pages.htgaa.org/2026a/jason-ross/homework/week-12-bioproduction-and-cloud-labs2/index.html
Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene
Week 13 No Lab
Week 14 Lab: No Lab

Week 1 Lab: Introduction to Pipetting and Dilutions

Overview

- Date(s): 03/02/26 – 03/03/26
- Notes: Reviewed lab materials outlined in ‘Overview’ protocol section (pipette types and tips, tubes, tube holders, and stock reagents) and concentration basics with Kate Carline (William & Mary Node TA). Discussed lab material functions and reviewed the basics of dilution math and pipetting technique.
- Supporting Picture(s):

Part 1: Mixing Color

Prepared tubes with red, yellow, and blue food coloring solution
Marked 6 tubes with red, yellow, blue, red/yellow, blue/yellow, and red/blue combinations
Added 500 uL to each red, yellow, blue, red/yellow, blue/yellow, and red/blue combination solution tube
See above – made combinations by mixing colors
See above
See above
Dispersed concentrations onto wax paper to make design in lieu of petri plate
- Supporting Part 1 and Part 2 photos below
  
  Practiced basic pipetting, mixing colors, and performing serial dilution

Part 2: Performing Serial Dilution

Performed serial dilutions on MS/food coloring
Made a final serial dilution reaction based on the information in the pre-lab
- See pictures above
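The dilution math reviewed for this lab can be sketched in a few lines of Python. The exact dilution factors from the pre-lab are not recorded here, so the stock concentration, 10-fold factor, and 500 uL final volume below are hypothetical values, and the function names are my own.

```python
def serial_dilution(stock_conc: float, dilution_factor: float, steps: int) -> list:
    """Concentration in each tube when every step dilutes the previous tube by the same factor."""
    concentrations = []
    conc = stock_conc
    for _ in range(steps):
        conc = conc / dilution_factor
        concentrations.append(conc)
    return concentrations


def step_volumes_ul(final_volume_ul: float, dilution_factor: float) -> tuple:
    """Volume to transfer from the previous tube and volume of diluent for one step."""
    transfer = final_volume_ul / dilution_factor
    return transfer, final_volume_ul - transfer


if __name__ == "__main__":
    # Hypothetical example: three 10-fold dilutions of a stock at relative concentration 1.0.
    print(serial_dilution(1.0, 10.0, 3))
    # Hypothetical example: 500 uL final volume per tube at a 10-fold dilution.
    print(step_volumes_ul(500.0, 10.0))
```

For the hypothetical 10-fold case, each step moves 50 uL from the previous tube into 450 uL of diluent.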

Week 2: DNA Gel Art

DNA Gel Art

Protocol Part 0: Designing My Gel Art / Expected Results and Walkthrough

Created a virtual digest in Benchling as a basis for DNA Gel Art (see below)

Benchling Virtual Digest (A Hidden Hello)

Protocol Part 1a: Preparing a 1% Agarose Electrophoresis Gel

Prepared a 1% Agarose Electrophoresis Gel (see below)

Protocol Part 1a: Restriction Digest

Ran Restriction Digest (see images below)

Protocol Part 2: Gel Run

Performed Gel Run (see mp4s below)

Protocol Part 3: Imaging My Results With a Transilluminator

Took gel and prepared to image results (see below)

Final Results

Final result (see below)

Benchling Protocol Notes (sourced from William & Mary Node TA, Kate Carline)

- NotI-HF: rCutSmart, incubates at 37C, 20,000 U/ml = 10 U/ul
- KpnI (Promega): Buffer J, incubates at 37C, 12,000 U/ml = 12 U/ul
- SalI (Promega): Buffer D, incubates at 37C, 10,000 U/ml = 20 U/ul

1.5 ug DNA
- 324 ng/ul of Kampy B: 4.62 ul DNA for N and K
- 141.4 ng/ul Kampy C (Nanodrop after running out of Kampy B): 10.61 ul for S

15 units of enzyme
- 1.5 ul NotI-HF
- 1.25 ul KpnI
- 0.75 ul SalI

2 ul of each 10X Buffer

Remaining to 20 ul NFW
- 11.88 ul NotI
- 12.13 ul KpnI
- 6.64 ul SalI

Spin down briefly in picofuge

Incubated for 30 min at 37C

1% agarose gel, 2 ul dye with 10 ul reaction, 40 min, 185 mA, 150V
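The volumes in the notes above follow from simple arithmetic, sketched below in Python. The DNA concentrations, the 15 units of enzyme, the 2 ul of buffer, the 20 ul total, and the per-ul enzyme activities are taken as written in the notes. The function and variable names are my own, and the results match the notes to within rounding.

```python
TOTAL_UL = 20.0       # final reaction volume
BUFFER_UL = 2.0       # 10X buffer per reaction
DNA_NG = 1500.0       # 1.5 ug of DNA per digest
ENZYME_UNITS = 15.0   # units of enzyme per digest

# DNA stock concentration (ng/ul) and enzyme activity (U/ul) as written in the notes.
DIGESTS = {
    "NotI-HF": {"dna_ng_per_ul": 324.0, "units_per_ul": 10.0},
    "KpnI": {"dna_ng_per_ul": 324.0, "units_per_ul": 12.0},
    "SalI": {"dna_ng_per_ul": 141.4, "units_per_ul": 20.0},
}


def digest_volumes(dna_ng_per_ul: float, units_per_ul: float) -> dict:
    """Volumes (ul) of DNA, enzyme, buffer, and nuclease-free water for one digest."""
    dna_ul = DNA_NG / dna_ng_per_ul
    enzyme_ul = ENZYME_UNITS / units_per_ul
    water_ul = TOTAL_UL - BUFFER_UL - dna_ul - enzyme_ul
    return {"dna_ul": dna_ul, "enzyme_ul": enzyme_ul, "buffer_ul": BUFFER_UL, "water_ul": water_ul}


if __name__ == "__main__":
    for enzyme, params in DIGESTS.items():
        volumes = digest_volumes(**params)
        print(enzyme, {name: round(value, 2) for name, value in volumes.items()})
```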

Week 3 Lab: Opentrons Art

Opentrons Art Lab

Part 1: Fluorescent Bacteria & Black Agar Script

See the Fluorescent Bacteria & Black Agar Colab Notebook script here ¹ ²

Part 2: Submission and Running Your Protocol

Traveled to William & Mary Node to complete this lab, as well as the Pipetting and DNA Gel Art Labs. During my time working this lab, I:

Selected plates for the Opentrons robot
Operated the Opentrons robot with the help of William & Mary students
Ran my Opentrons code
Dispensed Opentrons tips

Protocol photos and mp4 video loops shown below:

Part 3: Final Result

Here’s the final result, showing my Opentrons Art!

https://colab.research.google.com/drive/1-pgSJt_aF9MydtG0szxz2YKoogNRLRhH?usp=sharing ↩︎
Gemini was used to help code a good chunk of the Opentrons code. At William & Mary, we did need to re-configure the code slightly to make it work on the Opentrons ↩︎

Week 4 Lab: Protein Design Part I

Lab Information

Lab work can be found within the Week 4 HW Assignment in the hyperlink below ¹

https://pages.htgaa.org/2026a/jason-ross/homework/week-04-hw-protein-design-part-i/index.html ↩︎

Week 5 Lab: Protein Design Part II

Lab Information

Lab work can be found within the Week 5 HW Assignment in the hyperlink below ¹

https://pages.htgaa.org/2026a/jason-ross/homework/week-05-hw-protein-design-part-ii/index.html ↩︎

Week 6 Lab: Gibson Assembly

Gibson Assembly Lab

Pre-Lab: Primer and PCR (Part 1 of 3)

Read this section and scanned the NuPack software hyperlink

Pre-Lab: Gibson Assembly (Part 2 of 3)

Read this section

Pre-Lab: DpnI

Read this section

Pre-Lab: Plasmid Transformation

Read this section

Part 1: Polymerase Chain Reaction (PCR)

Prepared PCR (see photos below)

Part 1a: DpnI Digest

Completed DpnI Digest (see photos below)

Part 1b: DNA Purification and Quantification

Purified and quantified DNA. It seems that I did something wrong in one of the preceding protocol stages with my non-Blue chosen color, so instead of proceeding with both colors, I only proceeded with Blue, as the other color did not have an adequate concentration. See photos below for more documentation of this protocol step

Part 2a: Gibson Assembly

Completed Gibson Assembly. Incubated reaction per protocol (see photo below)

Part 2b: Transformation

Completed Transformation protocol step

Final Results

LOREM

Supporting prompts for analyzing the lab protocol listed below for reference

Supporting Prompt	Model
There’s a part of this page that says “After PCR, we treat each reaction with DpnI to eliminate carryover of the original mUAV plasmid.”. That’s Dpnl, not DpnI right? Is it a lower-case l or a capital I?	Gemini 2.5 Flash
Remind me what an oligonucleotide is again in simple terms. Keep the response to this prompt short and do NOT hallucinate/make anything up	Gemini 2.5 Flash
Within the context of this lab, what is a ‘HiFi assembly method’? What is an ‘overhang’? Keep the response to this prompt short and do NOT hallucinate/make anything up	Gemini 2.5 Flash
How does the ’exonuclease “chews back” one strand of the double-stranded DNA.	Gemini 2.5 Flash
I guess I want to understand how this exonuclease works at the chemical level in relatively simple terms. Can you explain that for me? Do NOT hallucinate/make anything up when replying to this prompt	Gemini 2.5 Flash
When the lab refers to ‘Secondary Structures’, what does that mean? Explain in relatively simple terms and do NOT hallucinate/make things up	Gemini 2.5 Flash
Within the context of this lab, tell me what the ‘pUC19 backbone’ is? What does ‘pUC19’ stand for? Answer this prompt in relatively simple terms, keep the response relatively short, and do NOT hallucinate/make things up	Gemini 2.5 Flash
Believe when this lab refers to ‘molar ratios’ it means the ratio of molecules to one another. Is this correct or am I mistaken? Do NOT hallucinate when replying to this prompt and answer this prompt in relatively simple terms	Gemini 2.5 Flash
When the lab states, ‘DpnI recognizes the sequence GATC only when it is methylated’, the ‘GATC’ refers to a Guanine-Adenine-Thyme-Cytosine combination/piece of DNA, and methylation is a chemical process related to said combination/piece of DNA, correct? Do we refer to a 4-letter combination/piece of DNA as a codon, or am I mistaken? Explain in simple terms what methylation is, the actually technical name of this 4-letter combination/piece of DNA, and if there was anything about my thinking/statements so far that’s off. Do NOT hallucinate/make things up when replying to this prompt and keep things relatively simple and short whenever possible	Gemini 2.5 Flash
What does the ‘SOC’ in ‘SOC growth media’ stand for? Do NOT hallucinate/make things up when replying to this prompt and answer this prompt in relatively simple terms	Gemini 2.5 Flash
In the context of this lab, what does ‘uM’ stand for/mean? What does ‘uL’ stand for/mean? Which represents a larger volume? Explain this in relatively simple terms, show me where these terms fit on an overall scale of liqiuid volumes, and do NOT hallucinate/make things up when answering this prompt	Gemini 2.5 Flash
What is ‘silica adsorption’? Do NOT hallucinate/make things up when answering this prompt and keep the answer relatively concise	Gemini 2.5 Flash
When the protocol says to ‘gel at ~ 100 mV for 15 min.’, the ‘mV’ being referred to is some type of voltage, correct? Do NOT hallucinate/make things up when answering this prompt	Gemini 2.5 Flash
How does Chloramphenicol relate to amilCP? Are they the same? Clarify this relationship and do so in relatively simple terms and do NOT hallucinate/make things up when doing so	Gemini 2.5 Flash

Week 7 Lab: Neuromorphic Circuits

Neuromorphic Circuits Lab

Downloaded Neuromorphic Wizard

Completed Circuit Design and Simulation in Neuromorphic Wizard

See Neuromorphic Wizard Result Screenshots Below
- ![Neuromorphic_Wizard_and_Simulation_Output.png]

Completed Google Sheet Template

Week 9 Lab: Cell-Free Systems

Cell-Free Systems Lab

Was unable to perform this Lab protocol at the William & Mary Node Wet Lab, as this protocol was not performed at the Lab as part of William & Mary’s HTGAA engagement this semester. However, I did review the related ‘Cell-Free Systems Laboratory’ protocol. Answers to the questions found in the ‘Homework questions’ section of the protocol can be found in my Week 9 Homework in the ‘Part A: General Homework Questions’ section ¹

https://pages.htgaa.org/2026a/jason-ross/homework/week-09-hw-cell-free-systems/index.html ↩︎

Week 10 Lab: Mass Spectrometry

Mass Spectrometry Lab

Was unable to perform this Lab protocol at the William & Mary Node Wet Lab, as this protocol was not performed at the Lab as part of William & Mary’s HTGAA engagement this semester. However, I did review the related ‘Mass Spectrometry’ protocol.

Week 11 Lab: Introduction to Cloud Laboratories

Introduction to Cloud Laboratories

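See Week 11 and Week 12 Homework assignments for answers and documentation regarding all related questions ¹ ²

https://pages.htgaa.org/2026a/jason-ross/homework/week-11-hw-bioproduction-and-cloud-labs/index.html ↩︎
https://pages.htgaa.org/2026a/jason-ross/homework/week-12-bioproduction-and-cloud-labs2/index.html ↩︎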

Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene

Bioproduction of Beta-Carotene and Lycopene Lab

Was unable to perform this Lab protocol at the William & Mary Node Wet Lab, as this protocol was not performed at the Lab as part of William & Mary’s HTGAA engagement this semester. However, I did review the related ‘Bioproduction of Beta-Carotene and Lycopene Lab’ protocol. Answers to the questions found in the ‘Post Lab Questions’ section of the protocol can be found below:
- Which genes when transferred into E. coli will induce the production of lycopene and beta-carotene, respectively?
  - The Erwinia herbicola crtE, crtB, and crtI genes induce lycopene production. Adding crtY, which encodes lycopene cyclase, converts the lycopene into beta-carotene
- Why do the plasmids that are transferred into the E. coli need to contain an antibiotic resistance gene?
  - Antibiotic resistance genes help researchers identify which E. coli bacteria successfully took the plasmid. This is necessary because a lot of the time E. coli bacteria do not successfully take plasmids
- What outcomes might we expect to see when we vary the media, presence of fructose, and temperature conditions of the overnight cultures?
  - We might expect different levels of lycopene and beta-carotene production (i.e., different levels of biosynthesis, different absorption, and/or potentially different shades of produced pigments) by varying overnight culture media, presence of fructose, and temperature conditions
- Generally describe what “OD600” measures and how it can be interpreted in this experiment
  - OD600 measures the optical density of a culture at 600 nm, which serves as a proxy for cell concentration after incubation. In this experiment it can be interpreted as how densely each culture grew, which helps put pigment measurements in context. Separately, to determine whether we actually produced the desired forms of lycopene and beta-carotene with the appropriate pigmentation, we can compare the absorption results from our cultures with previously established lycopene and beta-carotene absorption results in the literature
- What are other experimental setups where we may be able to use acetone to separate cellular matter from a compound we intend to measure?
  - It would appear that acetone would be useful for experimental setups like separating out certain proteins from a larger mixture
- Why might we want to engineer E. coli to produce lycopene and beta-carotene pigments when Erwinia herbicola naturally produces them?
  - Likely because we want more control over production of said pigments than Erwinia herbicola naturally provides. Control in this case might extend to pigment shade, concentration, or production time

All supporting prompts for this section listed below

Supporting Prompt	Model
acetone	Google AI Mode
When we say that acetone acts as a solvent for chemical reactions in its role as a laboratory reagent, what exactly do we mean? How is it useful for doing things like separating cellular material from a compound one intends to measure? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Why are antibiotic resistance genes necessary when transferring a plasmid into E. coli? Is it just because the plasmids will die/be attacked by the E. coli without them?Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode

Week 13 No Lab

No Lab for Week 13

Week 14 Lab: No Lab

No Lab for Week 14

Projects

Final projects:

Individual Final Project
Group Final Project
Bacteriophage Engineering Group Project Inputs_William & Mary Node Group 1 2 Group Project_Protein Design 1 Selected Goal: Increased stability (easiest) Brainstorm Session Questions: Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”) We’ll attempt to run multi-environment/conditional modeling and simulation to down-select lysis stability approaches that show the greatest resilience across environments/conditions. The team has selected a project focused on enhancing the stability of the Lysis Protein, a decision influenced by the group’s current experience level. The primary objective is to improve thermodynamic stability while concurrently preserving the native protein fold and maintaining functional integrity. The proposed methodology involves utilizing BLAST for identifying homologous sequences, followed by Clustal Omega to ascertain conserved residues susceptible to mutation intolerance. Subsequently, ESM2 will be employed to score candidate substitutions based on evolutionary plausibility. This will be succeeded by the application of ESM-Fold to predict and refine the integrity of the protein fold, as well as to optimize existing backbones. The results may then be further subjected to EvolvePro for accelerated directed evolution. Tools like Boltz-1 and ProteinMPNN offer a capability for redesigning solvent-exposed residues and optimizing the core packing of the protein. We can cross their performance for comparison. All selected variant candidates are slated for computational stress-testing under a range of environmental conditions that could potentially induce destabilization. Selected variant candidates that pass the stress test are prioritized for downstream experimental validation. Why do you think those tools might help solve your chosen sub-problem? 
The previous bullet point addresses tool functionality in our workflow, explaining why and how various tools will assist us in accomplishing our goal Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”). One potential pitfall is that we may have insufficient in vitro quality and quantity of data to test the environmental constraints of interest. Thus wet-lab work would be needed to back-up the findings, in addition to follow-up There are open questions regarding the validity of the stated research approach (i.e., if the approach makes sense relative to the larger goal of increased stability) Include a schematic of your pipeline See workflow schematic below Group Project_Protein Design 2 3 See results below

Individual Final Project

The original glorious space phage (still with artistic license lol)

Individual Final Project

Idea: Space Phage Supreme

Section 1: Abstract

Phage therapy’s potential to treat novel bacterial infections has generated increased attention in recent years, both terrestrially and in space health research. Recent research from the University of Wisconsin-Madison demonstrated the unique impacts of microgravity on Escherichia coli and T7 bacteriophage interactions, particularly on the distribution of genetic mutations across the T7 bacteriophage genome ¹. Understanding microgravity-derived insights on bacteriophage mutations and bacteriophage-bacteria interactions could yield phage therapy insights terrestrially and for future space travelers. Accordingly, this research aims to extend the University of Wisconsin-Madison research by validating if/how the microgravity-derived T7 bacteriophage Gene 17 mutated Variants demonstrate increased fitness across different E. coli strains (CFT073, UTI89, MG1655). The working hypothesis of this research is that the Variants will exhibit similarly increased Plaque-Forming Units/milliliter (PFU/mL) (i.e., they will exhibit increased fitness and lysing).

To implement this research, the plan is to:

Perform plaque-forming assay bacteriophage fitness protocol, with both dry and wet lab components
Test Variants 1 and 2 from the University of Wisconsin-Madison paper against CFT073, UTI89, and MG1655 E. coli strains

The methods for achieving the specific aims referenced above include:

Dry Lab Gibson Assembly Construct Creation
Wet Lab PCR
Wet Lab DNA Purification and Quantification
Wet Lab Gibson Assembly
Wet Lab Phage Reboot
Wet Lab Plaque Assay
Dry Lab PFU/mL Calculation

Section 2: Project Aims:

Aim 1 (Experimental Aim): Validate if/how E. coli BL21 strain microgravity bacteriophage Gene 17 mutated Variants demonstrate increased fitness (PFU/mL) across different E. coli strains (CFT073, UTI89, MG1655)
Aim 2 (Developmental Aim): Follow-on experiments showcasing comparable mutations across additional E. coli strains, with the aim of discerning which microgravity bacteriophage mutations can instigate positive terrestrial human health outcomes (specifically improved PFU/mL)
Aim 3 (Visionary Aim): ‘Plug and play’ (ideally bidirectional [terrestrial and space-based]) catalog of microgravity-derived high-fitness bacteriophages for use against nth forms of bacterial infection

Section 3: Background:

Briefly summarize two peer-reviewed research citations relevant to your research
- In ‘Microgravity reshapes bacteriophage–host coevolution aboard the International Space Station’, University of Wisconsin–Madison researchers reported on the dynamics between T7 bacteriophage and E. coli after microgravity exposure aboard the International Space Station (ISS) ². Their results indicated delayed phage activity, but ultimately also the emergence of several novel mutations across the bacteriophage genome which, when replicated terrestrially, improved lysing. In ‘Impact of simulated microgravity in short-term evolution of an RNA bacteriophage’, researchers from the Centro de Astrobiología and Universidad Autónoma de Madrid observed similarly delayed phage activity when an RNA bacteriophage was exposed to a terrestrial simulated microgravity environment ³. Both studies indicate novel phage activity due to microgravity exposure.
Explain how your project is novel or innovative
- My project seeks to extend learnings on bacteriophage microgravity exposure by determining how microgravity-derived phage mutations can be proactively applied to improve terrestrial phage fitness (including but not limited to lysing). If successful, this project will help demonstrate the utility of microgravity in terrestrial phage therapy development. It might also help create a bidirectional virtuous cycle between microgravity-derived insights, terrestrial phage therapy, and non-terrestrial phage therapies for long-duration space missions.
Explain why your project matters and what impact it could have
- This project attempts to solve the problem of proactively improving phage fitness. Improving phage fitness matters because it’s crucial to making phage therapies a viable alternative to traditional antibiotics, particularly in remote, resource-constrained environments like a long-term space exploration mission. Creating the bidirectional virtuous cycle described in the previous answer could advance public health and wellbeing in several ways. It could combat antimicrobial-resistant (AMR) bacteria while giving humanity a means of dealing with space-based infections when antibiotics or standard pharmaceuticals may be in short supply or logistically unfeasible to transport in the needed volumes. In helping create this bidirectional virtuous cycle, this research will advance our knowledge of customizing terrestrial bacteriophage for improved fitness based on microgravity-derived insights.
Describe the ethical implications associated with your project and identify relevant ethical principles (i.e., non-maleficence, beneficence, justice, or responsibility)
- Improving phage fitness might have unintended consequences, as highly fit phage could lyse bacteria that are important to the function of human microbiomes. Therefore, this project intends to follow the principle of non-maleficence. In practice, this means our research will be conducted in low-biosafety-level (BSL) environments on M. smegmatis and will focus on improving phage fitness to combat AMR bacteria. This research will also uphold the principle of beneficence by making its results publicly available. The measures taken to ensure this project aligns with ethical principles, mentioned above, are elaborated upon here. The research will be conducted in low-BSL settings, and its results will be made publicly available. All researchers associated with this project will comply with all applicable statutes in maintaining lab safety at all times. While there could be unintended consequences of publicly sharing this research, all researchers associated with this project will share only what is strictly necessary within the scope of this project’s research aims. No discussion of using bacteriophages to deliberately alter human microbiomes for adverse health outcomes will occur.

Section 4: Experimental Design, Techniques, Tools, and Technology

Create a detailed experimental plan for your final project. Include a timeline for each part of your experimental plan (i.e., how long you expect each step in your final project to take)

The following protocol describes the complete experimental workflow as a planned dry and wet lab procedure grounded in the computational designs completed in Benchling. The project is currently dry lab; steps below represent the intended comprehensive execution plan.

This is a planned protocol. Steps 1 and 2 (computational) are completed; Steps 3 onward describe the intended wet-lab execution.

Step 1: (Computational) Mutation Design

Field	Detail
Method	Mutations modeled in Benchling by editing the gp17 CDS within pET-28a; codon optimization for BL21 verified in silico
Automation	Not applicable (computational step)
Plate	Not applicable
Expected Result	Two annotated plasmid maps (pET-28a-gp17-V1 and pET-28a-gp17-V2) with confirmed reading frames and no premature stop codons
Timeline	Completed prior to proposal submission
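
The "confirmed reading frames and no premature stop codons" check in Step 1 can be sketched in a few lines of Python. This is only an illustration: `check_cds` is a hypothetical helper name, the sequences are toy examples rather than the real gp17 CDS, and it is not a Benchling feature.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def check_cds(seq: str) -> dict:
    """Report basic reading-frame facts for a coding sequence (sense strand, 5'->3')."""
    seq = seq.upper().replace("\n", "").replace(" ", "")
    # Split into complete codons, ignoring any trailing 1-2 leftover bases
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # 0-based codon indices of stop codons that occur before the final codon
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return {
        "length_multiple_of_3": len(seq) % 3 == 0,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
        "premature_stop_codons": premature,
    }

# Toy sequences for illustration only (NOT the real gp17 CDS)
ok = check_cds("ATGGCTGAAACCTAA")
bad = check_cds("ATGGCTTAAACCTAA")  # third codon is a premature TAA
print(ok)
print(bad)
```

In practice the exported V1 and V2 coding sequences would replace the toy strings, and an empty `premature_stop_codons` list would be the expected outcome.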

Step 2: (Computational) Gibson Assembly

Field	Detail
Method	Use Benchling Gibson Assembly feature to create primers and constructs for V1 and V2 against an E. coli backbone (Accession: V01146.1)
Automation	Not applicable (computational step)
Plate	Not applicable
Expected Result	Two annotated V1 and V2 Gibson Assembly constructs and primers (includes adequate bp and overhang) for amplification
Timeline	Completed prior to proposal submission
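
One way to sanity-check the "adequate bp and overhang" expectation outside Benchling is to measure how many bases the end of one fragment shares with the start of the next. The sketch below assumes a 20 bp minimum overlap purely for illustration; `shared_overlap`, the threshold, and the toy sequences are all hypothetical and do not come from the actual V1 or V2 designs.

```python
def shared_overlap(upstream: str, downstream: str, max_len: int = 60) -> int:
    """Length of the longest suffix of `upstream` that is also a prefix of `downstream`."""
    upstream, downstream = upstream.upper(), downstream.upper()
    for n in range(min(max_len, len(upstream), len(downstream)), 0, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0

MIN_OVERLAP_BP = 20  # assumed threshold for this sketch only

# Toy fragments (not real construct sequences): the last 20 bases of the
# backbone end are repeated at the start of the insert
homology = "ACGTACGTACGTACGTAGCT"
backbone_end = "AAACCCGGGTTT" + homology
insert_start = homology + "ATGGCTGAA"

overlap = shared_overlap(backbone_end, insert_start)
print(f"overlap = {overlap} bp, adequate = {overlap >= MIN_OVERLAP_BP}")
```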

Step 3: Order DNA from Twist

Field	Detail
Method	Export Benchling plasmid sequences as GenBank files; submit “Gibson Assembly_05.02.26 [Variant 1 Fragment]” and “Gibson Assembly_05.02.26 [Variant 2 Fragment]” constructs (these constructs include primers) to Twist Bioscience as whole plasmid synthesis orders; screen all sequences through SecureDNA prior to submission
Automation	Not applicable (vendor synthesis)
Plate	DNA delivered in standard Twist 96-well format
Expected Result	Sequence-verified plasmid DNA confirmed by Twist Sanger sequencing report; >20 ng/µL
Timeline	10–14 business days after order submission

Step 4: Sequence Verification by Sanger Sequencing

Field	Detail
Method	Design primers flanking gp17 insert (T7 promoter and terminator primers); submit to Azenta/Genewiz; align to Benchling reference via SnapGene or NCBI BLAST
Automation	Echo 525 for primer and template dispensing
Plate	96-Armadillo-PCR-AB2396X
Expected Result	100% sequence identity to designed constructs at all mutated positions
Timeline	2–3 days

Step 5: PCR

Field	Detail
Method	Pick 4–8 colonies per construct; run PCR (complete denaturation, annealing, and extension steps) with the V1 and V2 primers; resolve products on a 1% agarose gel
Automation	ATC Thermal Cycler for PCR
Plate	96-Armadillo-PCR-AB2396X
Expected Result	Correct band size in 3 of 4 colonies per construct
Timeline	4–5 hours

Step 6: DpnI Digest

Field	Detail
Method	DpnI cuts DNA when GATC in original template sequence is methylated; reaction should occur at approx. 37 °C
Automation	ATC Thermal Cycler
Plate	96-Armadillo-PCR-AB2396X
Expected Result	Original template digested by DpnI, with amplified V1 and V2 PCR fragments preserved
Timeline	30–60 minutes
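
Since DpnI recognizes GATC, a quick way to see where the methylated template could be cut is to list the GATC positions in the template sequence. The sketch below uses a toy sequence and a hypothetical helper name (`gatc_sites`). Sequence alone cannot show methylation status: the assumption is that the original template was prepared from a methylating host, while the PCR-amplified V1 and V2 fragments are unmethylated and therefore preserved.

```python
def gatc_sites(seq: str) -> list:
    """0-based start positions of every GATC motif (the DpnI recognition sequence)."""
    seq = seq.upper()
    sites = []
    start = seq.find("GATC")
    while start != -1:
        sites.append(start)
        start = seq.find("GATC", start + 1)
    return sites

# Toy template for illustration only
toy_template = "AAGATCTTGATCGG"
print(gatc_sites(toy_template))
```

Because GATC is palindromic, scanning one strand is enough to find every site.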

Step 7: DNA Purification and Quantification

Method:

Per PCR reaction:

Add PCR product and DNA Binding Buffer to microcentrifuge tube. Mix by vortexing.
Transfer 300 µL of each mixture into a separate spin column with collection tube. Centrifuge for 1 min.
Discard the flow-through and add DNA Wash Buffer to the column. Centrifuge for 1 min. Repeat this wash step once more.
Discard the flow-through and collection tube, then transfer the column to a new tube.
Add 6 µL of DNase/RNase-free water or Elution Buffer directly to the column.
Let sit at room temperature for 2 min, then centrifuge for 1 min.

Field	Detail
Automation	Zymo-Spin Column
Plate	96-Armadillo-PCR-AB2396X
Expected Result	Purified solution ready for analysis in Step 8
Timeline	4–10 min. per PCR reaction

Step 8: DNA Purification and Quantification Results Analysis

Field	Detail
Method	Measure DNA concentrations from Step 7 using a spectrophotometer (e.g., NanoDrop). Clean the stage, add elution buffer, dry the stage, then add 2 µL of each DNA sample and record its concentration
Automation	Not applicable
Plate	Not applicable
Expected Result	~70 µg/mL concentrations; 0.5–1.0 pmoles of DNA for next step
Timeline	10–15 min.
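
To connect the measured concentration to the 0.5–1.0 pmol target, a mass-to-moles conversion is needed. The sketch below uses the common approximation of about 650 g/mol per base pair of double-stranded DNA. The 1,000 bp fragment length is a hypothetical placeholder, not the actual V1 or V2 fragment size, and the function names are illustrative.

```python
AVG_DA_PER_BP = 650  # approximate average mass of one dsDNA base pair (g/mol)

def pmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/µL to pmol/µL."""
    return conc_ng_per_ul * 1000 / (length_bp * AVG_DA_PER_BP)

def volume_for_pmol(target_pmol: float, conc_ng_per_ul: float, length_bp: int) -> float:
    """Volume (µL) that contains the requested number of pmol."""
    return target_pmol / pmol_per_ul(conc_ng_per_ul, length_bp)

conc = 70.0     # ng/µL, equivalent to the ~70 µg/mL expected result
length = 1000   # bp, HYPOTHETICAL fragment length for illustration

per_ul = pmol_per_ul(conc, length)
vol = volume_for_pmol(0.5, conc, length)
print(f"{per_ul:.4f} pmol/µL; {vol:.2f} µL needed for 0.5 pmol")
```

Substituting the real fragment length shows quickly whether the eluted volume can supply the amount needed for the next step.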

Step 9: (Wet Lab) Gibson Assembly

Field	Detail
Method	Start Gibson Assembly reaction at appropriate concentrations⁴ on ice. Incubate reaction at 50 °C on heat block. Post-incubation, add nuclease-free water for dilution
Automation	Liquid-handling robot (optional)
Plate	96-Armadillo-PCR-AB2396X
Expected Result	Constructed V1 and V2 mutations introduced into standard expression backbone; 0.5–1.0 pmoles
Timeline	15 min. setup, 30–60 min. incubation
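
The "appropriate concentrations" for the assembly reaction are usually expressed as an insert:vector molar ratio. As a rough sketch, the mass of insert needed for a given ratio scales with the ratio of fragment lengths. The 2:1 ratio, the fragment sizes, and the function name below are assumptions for illustration and should be replaced with the values from the kit protocol cited above.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector_ratio: float = 2.0) -> float:
    """Mass of insert (ng) that gives the requested insert:vector molar ratio."""
    return insert_to_vector_ratio * vector_ng * insert_bp / vector_bp

# Hypothetical inputs: 50 ng of a 5,000 bp backbone and a 1,000 bp insert
needed = insert_mass_ng(vector_ng=50, vector_bp=5000, insert_bp=1000)
print(f"{needed:.1f} ng of insert for a 2:1 insert:vector ratio")
```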

Step 10: Cell-Free Phage Reboot

Field	Detail
Method	Use transcription-translation (TX/TL) cell-free mix (e.g., Arbor Biosciences myTXTL) to induce reaction at appropriate concentrations in 1.5–2 mL tube⁵. Incubate resulting mixture overnight at 29 °C
Automation	Liquid-handling robot (optional)
Plate	Standard agar plates
Expected Result	Phage lysates with V1 and V2 variants
Timeline	1 day total

Step 11: Plaque Assay Setup

Field	Detail
Method	Grow 3 batches of each tested E. coli strain (CFT073, UTI89, MG1655) to log phase in lysogeny broth (LB). Take portion of V1 and V2 phage lysates, along with wild-type T7 bacteriophage, and distribute onto each E. coli strain. There should be 3 V1 and V2 phage lysates and 3 wild-type T7 bacteriophage lysates distributed upon completion of this step. Make ten-fold serial dilutions of resulting phage stock
Automation	Echo 525 for serial dilution dispense
Plate	Standard agar plates
Expected Result	Plaques formed on CFT073, UTI89, MG1655 E. coli strains; ~10⁷ PFU
Timeline	18–24 hours
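
Before plating, it can help to estimate how many plaques each ten-fold dilution should produce. The sketch below assumes a stock of 10⁷ PFU/mL (reading the ~10⁷ figure above as a titer), 0.1 mL plated per plate, and a countable window of 30–300 plaques. All three values are assumptions for illustration, as is the function name.

```python
def expected_plaques(stock_pfu_per_ml: float, dilution_exponent: int,
                     volume_plated_ml: float) -> float:
    """Expected plaques on one plate for a 10^-k dilution of the stock."""
    return stock_pfu_per_ml / 10 ** dilution_exponent * volume_plated_ml

STOCK_PFU_PER_ML = 1e7   # assumed stock titer
VOLUME_PLATED_ML = 0.1   # assumed volume plated per plate
COUNTABLE = (30, 300)    # assumed countable range of plaques per plate

plan = {k: expected_plaques(STOCK_PFU_PER_ML, k, VOLUME_PLATED_ML) for k in range(1, 9)}
countable = [k for k, n in plan.items() if COUNTABLE[0] <= n <= COUNTABLE[1]]

for k, n in plan.items():
    print(f"10^-{k}: ~{n:g} plaques")
print("dilution exponents expected to be countable:", countable)
```

Under these assumptions only one dilution lands in the countable window, so the real stock titer and plated volume determine which dilutions are worth plating.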

Step 12: Data Analysis — Calculate PFU/mL and Graph Results

Field	Detail
Method	Choose plates with countable number of plaques. Count plaques present on each plate⁶. Photograph plates to document plaque morphology. Run Analysis of Variance (ANOVA)⁷ to determine statistical significance between variant and wild-type results
Automation	Software like OnePetri or ViralPlaque can be used (optional)
Plate	Standard agar plates
Expected Result	PFU/mL readouts per tested strain showing lysing across V1, V2, and wild-type T7 bacteriophage. This readout will be a bar graph with three groups on the X-axis (one per E. coli strain), each group containing three bars (WT, V1, V2), with PFU/mL on the Y-axis and error bars showing variability across replicates; 10⁻⁴ to 10⁻⁸ serial dilutions
Timeline	1–2 days (includes all later steps)
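
The PFU/mL calculation in this step can be written out as a short sketch: the titer is the plaque count divided by the product of the dilution factor and the volume plated. The plaque counts, the 10⁻⁶ dilution, and the 0.1 mL plated volume below are hypothetical placeholders, not experimental data.

```python
from statistics import mean, stdev

def pfu_per_ml(plaque_count: int, dilution_exponent: int, volume_plated_ml: float) -> float:
    """PFU/mL = plaques / (dilution factor x volume plated in mL), for a 10^-k dilution."""
    return plaque_count * 10 ** dilution_exponent / volume_plated_ml

# Hypothetical counts from three replicate plates at the 10^-6 dilution, 0.1 mL plated
replicate_counts = [42, 38, 47]
titers = [pfu_per_ml(count, 6, 0.1) for count in replicate_counts]

# Mean and standard deviation feed the bar heights and error bars of the planned graph
print(f"mean = {mean(titers):.2e} PFU/mL, SD = {stdev(titers):.2e} PFU/mL")
```

Repeating this for each phage (WT, V1, V2) on each strain yields the nine bars described in the expected result.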

Step 13: Plaque Assay Control Cross-Check

Control	What It Tests	Expected Result
WT gp17 + MG1655	Positive control — confirms functional assay	Plaques present
No phage (any tested E. coli strain)	Negative control — confirms no false positives	No plaques
Empty vector (no gp17) + any strain	Confirms gp17 specifically drives infectivity	No plaques

Step 14: Validation Against Benchmark

Field	Detail
Method	Measure PFU/mL for MG1655 strain
Automation	Software like OnePetri or ViralPlaque can be used (optional)
Plate	Standard agar plates
Expected Result	Ideally, we will see V1 and V2 values similar to MG1655, with gains for CFT073 and UTI89 strains
Timeline	See Step 12

We discussed and practiced various techniques related to synthetic biology throughout the semester. Place a check next to the techniques relevant to your project.
- Pipetting
  - Pipetting
  - Lab Safety
  - Bioethical Considerations
- DNA Gel Art
  - DNA Sequencing
  - DNA Editing
  - DNA Construct Design
  - Databases (e.g., GenBank, NCBI, Ensembl, and UCSC Genome Browser)
- Lab Automation
  - Using Liquid Handling Robots (e.g., Opentrons) [Optional]
  - Designing a Twist Order
  - Creating a plan to use the Autonomous lab at Ginkgo Bioworks [Discussed–Tentative]
- Protein Design
  - Use of Benchling
  - Models and Notebooks
  - Databases
- Bioproduction
  - Plasmid Preparation
  - Bacterial Culturing
  - Quality Control/Analysis
  - Bacterial Processing (e.g., Centrifugation, Lysis, DNA Purification)
- Cell-Free Systems
  - Cell Free Reactions
- Gibson Assembly
  - Primer Design or Selection
  - PCR Reactions
  - Gibson Assembly
Expand upon two techniques you checked in the previous question by describing how you would utilize those techniques in your final project.
- I utilized DNA Construct Design when designing two Gibson Assembly constructs in Benchling. I re-created the Gene 17 mutations for Variants 1 and 2 from the University of Wisconsin–Madison paper and then placed them within a T7 E. coli backbone (Accession: V01146.1) in Benchling using the Gibson Assembly feature. This informs the later Primer Design or Selection technique listed above, because the Variant 1 and 2 primers designed as part of this dry lab Gibson Assembly construct inform the wet lab Gibson Assembly step of the protocol above. They do this by defining how the Gene 17 coding sequence (CDS) should ideally be expressed against the T7 backbone. This in turn helps set the stage for rebooting the E. coli bacteriophages and measuring lysing (PFU/mL) at the end of the protocol (this ties into the Bacterial Processing (e.g., Centrifugation, Lysis, DNA Purification) technique listed above).
Identify any How to Grow (Almost) Anything Industry Council companies which are associated with your final project
- Cultivarium (conceptual alignment), Ginkgo Bioworks (tentative), Opentrons (optional), Twist Bioscience

Section 5: Results & Quantitative Expectations

You are required to validate at least one aspect of your final project aims. This is to ensure that you are able to successfully apply a relevant synthetic biology technique to your project. Include figures if you have them – accuracy is critical in figures, tables, and graphs

What aspect of your final project did you choose to validate?
- I chose to validate PFU/mL results

Write down a detailed protocol of how you validated this aspect of your final project

See detailed protocol steps below (copied from protocol above):

Step 12: Data Analysis — Calculate PFU/mL and Graph Results

Field	Detail
Method	Choose plates with countable number of plaques. Count plaques present on each plate⁶. Photograph plates to document plaque morphology. Run Analysis of Variance (ANOVA)⁷ to determine statistical significance between variant and wild-type results
Automation	Software like OnePetri or ViralPlaque can be used (optional)
Plate	Standard agar plates
Expected Result	PFU/mL readouts per tested strain showing lysing across V1, V2, and wild-type T7 bacteriophage. This readout will be a bar graph with three groups on the X-axis (one per E. coli strain), each group containing three bars (WT, V1, V2), with PFU/mL on the Y-axis and error bars showing variability across replicates; 10⁻⁴ to 10⁻⁸ serial dilutions
Timeline	1–2 days (includes all later steps)

Step 13: Plaque Assay Control Cross-Check

Control	What It Tests	Expected Result
WT gp17 + MG1655	Positive control — confirms functional assay	Plaques present
No phage (any tested E. coli strain)	Negative control — confirms no false positives	No plaques
Empty vector (no gp17) + any strain	Confirms gp17 specifically drives infectivity	No plaques

Step 14: Validation Against Benchmark

Field	Detail
Method	Measure PFU/mL for MG1655 strain
Automation	Software like OnePetri or ViralPlaque can be used (optional)
Plate	Standard agar plates
Expected Result	Ideally, we will see V1 and V2 values similar to MG1655, with gains for CFT073 and UTI89 strains
Timeline	See Step 12

What synthetic biology techniques did you utilize in validating this aspect of your final project?
- This validation step is predicated upon Bacterial Processing (e.g., Centrifugation, Lysis, DNA Purification), followed by Quality Control/Analysis. Bacterial processing is necessary for experimental validation because without appropriately rebooting the phage and performing a plaque assay against the E. coli strains, we cannot measure PFU/mL. From there, we are performing a form of quality control/analysis inasmuch as we are cross-checking our experimental PFU/mL results against our stated hypothesis. In this project’s protocol, we cross-check our experimental PFU/mL results by choosing plates with a countable number of plaques, counting the plaques present on each plate, photographing plates to document plaque morphology, and running Analysis of Variance (ANOVA) to determine statistical significance between variant and wild-type results. The hope and expectation is that our Variants will outperform the wild-type against the uropathogenic strains (i.e., demonstrate increased PFU/mL relative to the wild-type against strains CFT073 and UTI89).
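
The ANOVA comparison described above can be illustrated with a small pure-Python calculation of the one-way F statistic. This is a minimal sketch: the log10 PFU/mL values are hypothetical, the function name is illustrative, and turning F into a p-value would need an F-distribution routine (for example, SciPy’s `scipy.stats.f_oneway` returns both F and p).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual replicates around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical log10(PFU/mL) replicates on one strain, for illustration only
wild_type = [5.0, 5.2, 5.1]
variant_1 = [7.0, 7.1, 6.9]
variant_2 = [7.4, 7.6, 7.5]

f_stat, df_b, df_w = one_way_anova_f([wild_type, variant_1, variant_2])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F relative to the critical value for the given degrees of freedom would indicate that at least one phage differs from the others on that strain.
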
You must present data as part of your final project and include some analysis of that data. The data may be collected experimentally in the lab or generated as simulated data (e.g., using Asimov Kernel or another simulation method)
- The chart above shows desired hypothetical PFU/mL results. The wild-type T7 (the control) is expected to perform well against MG1655 (K-12), relative to the microgravity-derived Variants, due to native adaptation. However, its performance (i.e., its PFU/mL) plummets against the CFT073 and UTI89 uropathogenic strains, while the Variants exhibit higher comparative fitness (PFU/mL) due to greater hydrophobic optimization and a greater ability to pierce the bacterial O-antigen shield via multiple microgravity-derived amino acid mutations.

Did you encounter any unexpected challenge(s) when performing your validation? If so, describe the challenge(s) and strategies to overcome it. If not, discuss potential problems, difficulties, limitations, and/or alternative strategies to overcome challenges in your final project
- As the project is currently in dry lab form, I have not encountered any challenges when performing validation. However, one potential difficulty in validating the comprehensive version of the project (including wet lab components) is having uncountable plaques. This would be a problem because I need a countable number of plaques to successfully calculate PFU/mL readouts from the plaque assay. There are a couple of strategies I’d employ to deal with this potential challenge. I would first check whether enough serial dilutions were performed. Then I would verify that no plating errors took place either before or during the plaque assay portion of the wet lab protocol.

Section 6: Additional Information

List all references cited in this assignment
Create a supply list and budget for your project
- Twist plasmid synthesis (V1 + V2, list-rate estimate): ~$1,260.00
- Genewiz Sanger sequencing (listed starting price): ~$75.00
- IDT sequencing primers (2 × approx. $5.40): ~$10.80
- IDT PCR primers (4 × approx. $5.40): ~$21.60
- NEB DpnI (R0176S, 1,000 U): $72.00
- Zymo DNA Binding Buffer (D4004-1-L): $68.00
- Zymo DNA Wash Buffer (D4003-2-24): $40.00
- Zymo DNA Elution Buffer (D3004-4-10): $17.00
- Zymo-Spin I-96 plate (C2004): $171.00
- Zymo Collection plate (C2002): $26.00
- Zymo Elution plate (C2003): $23.00
- NEB Gibson Assembly Master Mix (E2611S, 10 rxn): $186.12
- Arbor Biosciences myTXTL Pro Kit (540300, 300 µL): $399.00
- E. coli MG1655 (ATCC 700926): $486.00
- Sigma-Aldrich LB Broth (Miller) powder, 500 g (110285): $72.30
- Addgene pET-28a backbone (#26094): $89.00
- Benchling (academic): $0.00
- SecureDNA screening: $0.00
Total Supply Cost: ~$3,016.82 USD
NOTE: Information listed above is based on Perplexity AI Computer feature prompt, which excluded capital equipment and quote-required items by design to avoid hallucination/pulling unverifiable data
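
As a quick arithmetic cross-check of the total, the line items above can be summed directly. The sketch below copies the listed prices as given (several are estimates) and uses `Decimal` to avoid floating-point rounding; the shortened item labels are just dictionary keys for illustration.

```python
from decimal import Decimal

# Prices copied from the supply list above (USD); several are estimates
budget = {
    "Twist plasmid synthesis (V1 + V2)": "1260.00",
    "Genewiz Sanger sequencing": "75.00",
    "IDT sequencing primers": "10.80",
    "IDT PCR primers": "21.60",
    "NEB DpnI": "72.00",
    "Zymo DNA Binding Buffer": "68.00",
    "Zymo DNA Wash Buffer": "40.00",
    "Zymo DNA Elution Buffer": "17.00",
    "Zymo-Spin I-96 plate": "171.00",
    "Zymo Collection plate": "26.00",
    "Zymo Elution plate": "23.00",
    "NEB Gibson Assembly Master Mix": "186.12",
    "Arbor Biosciences myTXTL Pro Kit": "399.00",
    "E. coli MG1655": "486.00",
    "Sigma-Aldrich LB Broth (Miller)": "72.30",
    "Addgene pET-28a backbone": "89.00",
    "Benchling (academic)": "0.00",
    "SecureDNA screening": "0.00",
}

total = sum(Decimal(price) for price in budget.values())
print(f"Total Supply Cost: ~${total:,.2f} USD")
```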

All Final Project supporting prompts listed below:

Supporting Prompt	Model
Please follow the skill instructions below and begin Phase 1:	Claude
Q: What biological problem are you most interested in solving? A: I’m interested in bacteriophage engineering, specifically taking microgravity-derived phage fitness insights from an International Space Station (ISS) experiment and proactively applying these insights to a non-Escherichia coli terrestrial bacteriophage Q: What system or organism would you prefer to work in? A: Thinking a Mycobacterium smegmatis (M. smegma) , but I’m not sure Q: Which synthetic biology approaches interest you most? (Select all that apply) A: ORACLE (Optimized Recombination, Accumulation, and Library Expression), In-cell recombination, and Combinatorial Library Synthesis (although I’m not sure and am open other synthetic biology approaches for this research)	Claude
Q: The ISS microgravity experiments you’re referencing — do you have a specific phage or dataset in mind? A: Yes — a specific phage/study (I’ll describe below) Q: For your M. smegmatis host system, which mycobacteriophage are you thinking of engineering? A: Open to suggestions based on what’s best for the project. Please note that I’m working in a low-biosafety level (BSL) lab, so that’s the only major consideration that comes to mind here Q: What is the core hypothesis — what ‘fitness’ trait from microgravity are you trying to engineer in? A: The desired fitness traits would most likely be ‘Broader host range’ and ‘Enhanced infectivity / adsorption rate’ in that order of priority	Claude
Q: What type of phage genes do you expect carry the microgravity fitness signal? (This shapes your DNA construct design) A: Tail fiber / receptor binding protein (RBP) mutations Q: For measuring success, which primary assay makes most sense to you? A: Combination of the above Q: You mentioned ORACLE / in-cell recombination / combinatorial library synthesis — are you set on using all three, or open to a focused recommendation? A: I’d like your recommendation given the BSL constraint and M. smeg system	Claude
Microgravity reshapes bacteriophage–host coevolution aboard the International Space Station (title); Phil Huss, Chutikarn Chitboonthavisuk, Anthony Meger, Kyle Nishikawa, R. P. Oates, Heath Mills4 Olivia Holzhaus, Srivatsan Raman (authors); 2026 (year)	Claude
Q: Does D29 with focus on the gp80 tail spike RBP sound right for your project? A: I don’t know enough to know enough so will defer to these findings Q: What host range panel would you like to test your engineered D29 variants against? A: Multiple M. smeg strains only (keep it BSL-1) Q: The paper used cell-free approaches for DMS. Do you want a cell-free component in your project? A: Do whatever will likely be logistically easier given the timeframes to complete the research (essentially 4 to maybe 6 weeks max.)	Claude
“generate proposal”	Claude
Take a look at the following quote from the URL below: “Strain-specific phage chassis to target bacteria that commonly cause infections during space flight.” What is the difference between a phage and a phage chassis? In general? In a biotechnological context? Do NOT hallucinate when answering these questions https://roadmap.ebrc.org/engineering-biology-for-space-health/	Perplexity
Take a look at the following quote from the URL below: “Capability to produce novel phages on space missions for rapid control of evolved biofouling microbes. What are ’evolved biofouling microbes’? What is biofouling? I assume biofouling indicates something bad/undesirable, but I don’t know what the term actually means beyond my assumption Do NOT hallucinate when answering these questions https://roadmap.ebrc.org/engineering-biology-for-space-health/	Perplexity
I understand how a phage can insert itself into a cell. Not exactly understanding if or how phages’ abilities contribute at all to personalized medicine developments (i.e., is there something about phage properties that make them particularly good candidates for personalized medical interventions)? Do NOT hallucinate when answering this question	Perplexity
What are the technical subcomponents of a biotechnology intervention or treatment using phage chassis? What do the supply chains look like, if any? Do NOT hallucinate when answering these questions. If you don’t know the answers to these questions, say so	Perplexity
“phage chassis synthetic biology manufacturing pipeline” search results	Perplexity
Are there any existing ways a biotechnology solution (let’s say a custom developed chassis) can proactively prevent itself from malicious dual-use? Analogous to large language model (LLM) safety refusal, are there any mechanisms that can be pre-built into a biotechnology solution to proactively prevent malicious dual-use? Do NOT hallucinate when answering these questions	Perplexity
How exactly do phages interact with genetic code information within a given cell? How do cell-based bacteria defend against unwanted phages?	Perplexity
I’m high-level aware that there are certain ’no-go’/‘do not edit’ pieces of genetic code. How are phages traditionally prevented from editing these ’no-go’/‘do not edit’ pieces of genetic code? Is that a thing? If I’m off in any way/if my conceptual underpinnings seem shaky, let me know Do NOT hallucinate when answering these questions	Perplexity
Tell me about about engineered synthetic biology kill switches	Perplexity
Have any engineered synthetic biology kill switches been implemented as part of phage therapies? Do NOT hallucinate when answering this question	Perplexity
If I’m making a novel phage-related therapy for astronauts, and I live in the United States, the Food and Drug Administration (FDA) would need to approve this therapy, correct? My assumption is yes. How does approval of a drug used outside of Earth’s atmosphere work from a regulatory perspective? Do NOT hallucinate when answering these questions	Perplexity
Are there any space health-related consortia specifically or explicitly aimed at making space medicine advancements as broadly accessible as possible to both spacefaring and terrestrial populations? If so, share information regarding said consortia Do NOT hallucinate. If you don’t know the answer to this question, say so	Perplexity
In medicine, what do we usually mean by ‘point of care’? What do we mean when we say that?	Perplexity
Do space medicine point of care guides exist? If so, are there any for commercial space tourists, astronauts, or future groups of spacefarers, including workers, etc.?	Perplexity
What is applied biomedicine?	Perplexity
What is the TRISH POCUS training referred to in the answer to the last prompt? What does POCUS refer to?	Perplexity
Tell me how to add a Promoter to a sequence in Benchling	Perplexity
Found this information from the Registry of Standard Biological Parts: BBa_J23106 Can you break down what this naming convention means and how I can find the relevant Promoter information in a sequence based on this naming convention? Do NOT hallucinate. If you don’t know the answer, say so	Perplexity
What is an alignment in Benchling? In Benchling, how do I put a codon optimized sequence under or next to a sequence I originally imported? Do NOT hallucinate when answering this question	Perplexity
How do I replace a sequence in Benchling with a codon-optimized sequence?	Perplexity
Bit confused regarding how to find a Promoter in a sequence in Benchling. I tried Auto-Annotate and it doesn’t seem to be working. Where should I go from here?	Perplexity
What is an RBS in Benchling?	Perplexity
What is a 7x His Tag? What is a Terminator? How do I find these in Benchling? Where are these traditionally inserted into a sequence in Benchling?	Perplexity
How do I paste sequences into a Benchling file?	Perplexity
How do I know where to insert a Promoter into a given sequence in Benchling?	Perplexity
Not totally understanding. If the start codon (the ATG) represents the start of the sequence, how do I insert something before that in Benchling?	Perplexity
What is an RBS? Where are they traditionally inserted into a sequence?	Perplexity
What do spacers look like in Benchling? Is it literally just empty space with no letters/codons? Something tells?	Perplexity
Where is a coding sequence traditionally inserted in a codon optimized sequence in Benchling? If there’s something off in what I’m saying, let me know	Perplexity
Where is a C-terminus in a protein in Benchling?	Perplexity
How do I find an amino acid view for a sequence in Benchling?	Perplexity
In Benchling, if I’m inserting a 7x His Tag and a Terminator, and I have a stop codon in my sequence, what is the traditional sequence? Is it 7x His Tag, stop codon, Terminator? Something else?	Perplexity
Any way I can add a Schema to a sequence after the fact in Benchling?	Perplexity
What is horizontal gene transfer? A separate question (perhaps): What is the technical term in biotechnology for transferring the abilities of one organism to another (ex. if I wanted to actually give a lizard the ability to fly like a bird by importing genetic properties that allow for the creation of wings for example)?	Perplexity
If I want to perform transgenesis in a biotechnological context (i.e., introduce a foreign gene into a new organism to confer a desired trait), and I want to start this process by sequencing the original foreign gene, what is considered the best practice in modern biotechnology for sequencing this original foreign gene? Is this sequencing method first, second, or third generation in the history of biotechnology? From some other period? What essential steps does it involve and how does it decode the bases of the original foreign gene? What is its output? Do NOT hallucinate when answering these questions. If you don’t know the answer to any of these questions, say so	Perplexity
How is Next-Generation Sequencing (NGS) considered second generation? Do NOT hallucinate when answering this question	Perplexity
If I want to perform transgenesis in a biotechnological context (i.e., introduce a foreign gene into a new organism to confer a desired trait), and I want to start this process by sequencing the original foreign gene via Next-Generation Sequencing (NGS), what is my input at the very beginning of the sequencing process? How is that input prepared for sequencing? Do NOT hallucinate when answering these questions. If you don’t know the answer to any of these questions, say so	Perplexity
What do phage isolation experiments usually entail? Are any elements of phage isolation experiments dangerous to human health and safety, and if so, why?	Perplexity
Take the steps in the “What phage isolation usually involves” section of the last prompt and break down the tools traditionally used for each step Do NOT hallucinate when answering this question	Perplexity
What is supernatant?	Perplexity
What are plaques in a phage isolation experiment context?	Perplexity
What does it mean to ‘pellet’ bacteria?	Perplexity
Do phages contain or have DNA?	Perplexity
Within the context of Gibson Assembly (biotechnology DNA assembly method), why exactly are molar ratios (apparently they need to be 2:1, insert:vector) important? What are molar ratios? Do NOT hallucinate when replying to this prompt	Perplexity
What exactly is the insert and what exactly is the vector within the context of the Gibson Assembly DNA Assembly method? Do NOT hallucinate when replying to this prompt	Perplexity
In the context of biotechnology and synthetic biology, what exactly is a plasmid backbone? Explain this to me as if I were a reasonably educated 16-year old Do NOT hallucinate when addressing this prompt	Perplexity
Tell me about the Phusion High-Fidelity (HF) Polymerase Chain Reaction (PCR) Master Mix. What is it? What are its subcomponents and what do they do? Do NOT hallucinate when addressing this prompt	Perplexity
Within the context of a Polymerase Chain Reaction (PCR), I believe primers are the pieces of DNA that get copied nth number of times, correct? If I’m mistaken, indicate as such, and the error in the initial reasoning. Do NOT hallucinate when addressing this prompt	Perplexity
So based on the answer to the last prompt: –Primers essentially define the space in the DNA sequence that will be copied? –What is a free 3′‑OH end? Explain this to me as if I were a relatively educated 16-year old Do NOT hallucinate when answering this prompt	Perplexity
Do primer pairs always need to have a temperature difference of 5°C from each other? If so, why? Do primer pairs always need to be at a temperature of between 52–58°C before annealing? If so, why? What factors determine ideal primer annealing temperatures, and why? Do NOT hallucinate when addressing these prompts	Perplexity
What does phage therapy look like clinically?	Perplexity
In the answer to the last prompt, what exactly does the creation of the phage cocktail entail? Do NOT hallucinate when replying to this prompt	Perplexity
Take the attached .pdf file and explain to me the components to the plasmid and why they were chosen/why they make sense given the Aims and Experimental Goals of the research project outlined in the attached .md file Do NOT hallucinate/make things up when replying to this prompt. If things don’t make sense to you, or you don’t have additional context regarding the respective .pdf and .md file content, say so	Perplexity
I already went in and changed the amino acids myself in each of the Variants, so I’m actually not tremendously concerned about being able to pinpoint which amino acid residues changed across Variants. I care more about the 4th Caveat mentioned (DE3 lysogen status of CFT073, UTI89, MG1655). How would I verify this? Also, my Variant 1 and 2 plasmids are 2,429 kb respectively, and they’re pET-28a constructs (believe they might be meant to be expression cassettes). If there are any issues you see with this, let me know Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Want to understand the utility of Gibson Assembly in genomics. If I have a mutated fragment of E. coli bacteriophage, and I want to see how that mutated fragment interacts with (specifically how it lyses) multiple strains of E. coli. (ex. CFT073, UTI89), how can Gibson Assembly help me accomplish this goal? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Precision. It allows me to insert the precise mutated bacteriophage gene fragment into nth novel backbones to see their results, as opposed to passing phage on to new hosts	Perplexity
I’ve actually re-created the original Gene 17 Variant 1 and Variant 2, with corresponding amino acid mutations. Wondering how to appropriately create the primers for Gibson Assembly. How do I make sure they have the appropriate 5’ –> 3’ orientation? Also wondering if this is something I should go to Perplexity Computer mode to assist with Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Ok, so help me understand this. To test the gp17 Variants against CFT073, UTI89, and MG1655 E. coli. strains, I know I need an expression vector. I know I need primers for my Variants to allow the mutated gp17 to express itself. Taking pET-28a off the table, not understanding the connective tissue between the Variant primers and testing the primers for lysing (PFU/mL) against CFT073, UTI89, and MG1655 E. coli. strains. In terms of an experimental protocol, am I supposed to make backbones for each of the strains, then essentially create ‘cuts’ in the backbones so the primers can be inserted? Also not understanding how all this dovetails into the idea of ‘rebooting a phage’ to test phage fitness Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Path B	Perplexity
So based on the results in the last couple prompts, it would be fair to say that my fragments are the mutated gp17 Variants and the phage backbone is a ’normal’ T7 phage genome (like BL21) correct? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Apologies if this is a dumb question. In the context of bacteriophage lab protocol workflows, when someone says they are going to ‘reboot the phage’, what exactly does that mean? Is it the same as Gibson Assembly? If it’s not, what comes first (pretty sure it goes Gibson Assembly –> phage reboot but not sure)? What does this look like in terms of a lab protocol, in simple terms? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
It seems like 2A is the equivalent of a transformation step from a lab protocol perspective yes? If not, or if it’s different somehow, why? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
What do Gibson Assembly protocols typically entail? What is needed to get the reaction properly set-up? What types of wet lab plates (if any) and what type of automation is used as part of standard Gibson Assembly protocols? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Want to understand if or how wet lab automation equipment is ever used in the context of phage rebooting. Also, what types of wet lab plates, if any, are used? Including incubation, how long does it usually take to execute a cell-free phage reboot? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
How long does it traditionally take to see plaque assays in a bacteriophage wet lab experiment? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
How is PFU/mL traditionally counted in standard bacteriophage wet lab protocols? Is this done manually in the wet lab or by some form of automated lab equipment? A bit of both? Also, what is an ANOVA statistical test? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
If I want to say a bacteriophage demonstrated greater fitness/lysing, what’s the appropriate terminology/acronym to use? Is it Multiplicity of Infection (MOI), Efficiency of Plating (EOP), or some other term? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Taking a look at this Draft Protocol, I want to understand the following: –Beginning with the end in mind, how much Variant 1 and Variant 2 (V1 and V2) DNA do I need to have by the time I reach Step 9 in order to successfully complete Steps 12 and 13? What should my quantities look like (wet lab measurements across Steps 4 - 12)? –Should Step 5 be cut? I think the answer’s yes –Is a cell-free approach traditionally used for Phage Reboot (see Step 11)? Reference the attached document and the open literature on bacteriophage plaque assay/lysing to address this prompt. Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Ok, looking over the first part of the answer to the last prompt, just confirming the ‘Wet Lab Measurement Targets’ table indicates measurements to implement at each relevant Wet Lab protocol step so there’s enough V1 and V2 DNA concentrations to get appropriate PFU/mL readouts at the Plaque Assay/final read-out portion of the protocol, correct? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Apologies if this is a dumb question. Given the open academic literature on bacteriophage plaque assay/lysing, does the Step 9 concentration amount (~30 µg/mL) give us enough to work with or is it optimal relative to the expected PFU/mL output later in the protocol? Does it set things up well? Is it optimal, suboptimal, or nominal? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Wonderful – would you mind please re-doing the table that was outputted 2 prompts ago accordingly? I want all the Wet Lab Measurements in the protocol in adequate/sensible ranges based on the variance identified in the last prompt. In short, I want wet lab measurements corrected if needed across all wet lab steps in the protocol If you don’t know the prompt that’s being referenced from 2 prompts earlier in this thread, say so. Otherwise, please redo the table accordingly Do NOT hallucinate/make things up when replying to this prompt. If you don’t know something, say so	Perplexity
Based on this protocol, create a hypothetical PFU/mL graph showing results for Variant 1, Variant 2, and wild-type T7 bacteriophage against the intended tested E. coli. strains (CFT073, UTI89, MG1655) Explain the hypothetical results, why they are what they are, and do NOT hallucinate/make things up that aren’t sensible based on what exists in the literature on bacteriophage lysing research	Perplexity
Based on this protocol, create a hypothetical PFU/mL graph showing results for Variant 1, Variant 2, and wild-type T7 bacteriophage against the intended tested E. coli. strains (CFT073, UTI89, MG1655) Explain the hypothetical results, why they are what they are, and do NOT hallucinate/make things up that aren’t sensible based on what exists in the literature on bacteriophage lysing research	Perplexity
I made the following Variant Gibson Assembly Constructs in a standard E. coli. T7 bacteriophage backbone (NCBI Accession: V01146) based on the Variants created in the attached PDF. Want to understand the essential components of each Gibson Assembly Construct in 2nd/3rd-level detail (i.e., I want to explain the important pieces of each construct and why they’re there), but do not know exactly where to begin. I know I have things in the Construct like primers, a Coding Design Sequence (CDS) for the mutant variant genetic fragment of interest (Gene 17), as well as T7 promoters and terminators, but not exactly sure how to piece all of this together into a cohesive narrative regarding what I constructed (the important pieces), their intended functions, and why they’re there How can we go about getting to this goal? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Let me answer your questions to the best of my ability in order: Question 1 Attempted Answer: If you look at the attached PDF, you’ll see a variety (I believe approx. 5) of amino acid substitutions for each Variant. When you look at Variant 1 and Variant 2 constructs, what you don’t see purely based on the screenshots is the fact that the respective CDS inserts per each Variant were crafted by manually changing the relevant Gene 17 amino acids per instructions in the attached PDF (i.e., reading what amino acid mutations took place and then copying them in Benchling). Then those Gene 17 fragments with the amino acid mutations were turned into their own respective fragments separate from their larger original sequence for testing within Gibson a-la the attached pdf (I think – my memory’s a bit fuzzy on the last part). Hopefully that helps answer the question Question 2 Attempted Answer: Gibson Assembly helps express or amplify a specific genetic mutation within a novel construct. We insert primers into an expression backbone and then genetic circuit components like promoters and terminators help express said mutation. That’s my high-level understanding Feel free to ask further questions or elaborate in response to any answers. DO NOT hallucinate/make things up when replying to this prompt	Perplexity
Here are the answers to your questions: –Yes, the rephrasing matches how I see my constructs –Variant 1 Fragment gp 17 CDS Amino Acid List/Sequence: MANVIKTVLTYQLDGSNRDFNIPFEYLARKFVVVTLIGVDRKVLTINTDYRFATRTTISLTKAWGPADGYTTIELRRVTSTTDRLVDFTDGSILRAYDLNVAQIQTMHVAEEARDLTTDTIGVNNDGHLDARGRRIVNLANAVDDRDAVPFGQLKTMNQNSWQARNEALQFRNEAETFRNQAEGFKNESSTNATNTKQWRDETKGFRDEAKRFKNTAGQYATSAGNSASAAHQSEVNAENSATASANSAHLAEQQADRAEREADKLENYNGLAGAIDKVDGTNVYWKGNIHANGRLYMTTNGFDCGQYQQFFGGVTNRYSVMEWGDENGWLMYVQRREWTTAIGGNIQLVVNGQIITQGGAMTGQLKLQNGHVLQLESASDKAHYILSKDGNRNNWYIGRGSDNNNDCTFHSYVHGTTLTLKQDYAVVNKHFHVGQAVVATDGNIQGTKWGGKWLDAYLRDSFVAKSKAWTQVWSGSAGGGVSVTVSQDIRFRNIWIKCANESWNFVRTGPDGIYFIASDGGWLRFQIHSNGKGFKNIMDSRSVPNAIMVENE* –Variant 2 Fragment gp 17 CDS Amino Acid List/Sequence: MANVIKTVLTYQLDGSNRDFNIPFEYLARKFVVVTLIGVDRKVLTINTDYRFATRTTISLTKAWGPADGYTTIELRRVTSTTDRLVDFTDGSILRAYDLNVAQIQTMHVAEEARDLTTDTIGVNNDGHLDARGRRIVNLANAVDDRDAVPFGQLKTMNQNSWQARNEALQFRNEAETFRNQAEGFKNESSTNATNTKQWRDETKGFRDEAKRFKNTAGQYATSAGNSASAAHQSEVNAENSATASANSAHLAEQQADRAEREADKLENYNGLAGAIDKVDGTNVYWKGNIHANGRLYMTTNGFDCGQYQQFFGGVTNRYSVMEWGDENGWLMYVQRREWTTAIGGNIQLVVNGQIITQGGAMTGQLKLQNGHVLQLESASDKAHYILSKDGNRNNWYIGRGSDNNNDCTFHSYVHGTTLTLKQDYAVVNKHFHVGQAVVATDGNIQGTKWGGKWLDAYLRDSFVAKSKAWTQVWSGSAGGGVSVTVSQDIRFRNIWIKCANESWNFFRTGMDGIYFIASDGGWLRFQIHSNGKGFKNIMDSRSVPIAIMVENE* –To the best of knowledge, I didn’t alter any promoter/terminator regions near gene 17 when building the fragments Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Let’s go with option a) for now Do NOT hallucinate/make things up when replying to this prompt	Perplexity
I think this is enough for me to work with post-export, so there’s no need to pause and tweak Variant 1 wording (that will be done by me off-line after the fact). Let’s now create the same 1–2 paragraph write‑up for Variant 2 Do NOT hallucinate/make things up when replying to this prompt	Perplexity
The key conceptual difference is seeing which tail-fiber tip substitutions lead to greater Plaque-Forming Units/milliliter (PFU/mL) (i.e., greater bacteriophage fitness against multiple strains of E. coli. bacteria) If this seems off or incorrect in any way, say so	Perplexity
Let’s do something similar to what we did in the Construct Thread (see .docx file), and do something similar for the PFU/mL chart. Want to be able to clearly explain why the bacterial E. coli. strains were chosen, what results we expected to see across the Variants and the wild-type control, and the implications for the larger experimental aims of this research (i.e., we want to see if the bacteriophage gp17 mutations in the attached PDF unlock PFU/mL improvements against other strains of E. coli bacteria beyond the strain tested in the paper) Do NOT hallucinate/make things up when replying to this prompt	Perplexity
MG1655 is the baseline, CFT073 for V1, and UTI89 for V2	Perplexity
We would lean more towards ’narrow but strong’, it would seem, based on the fact that only one variant improved PFU/mL	Perplexity
We’d say it has broader host range	Perplexity
Find me a standard/well-cited bacteriophage lysing paper containing a Plaque-Forming Unit/milliliter (PFU/mL) read out. Want to understand how PFU/mL findings are traditionally graphically represented in the relevant academic literature Do NOT hallucinate/make things up when replying to this prompt. Let’s keep the focus on cited academic literature	Perplexity
Take a look at Step 8 of the attached protocol, specifically where it describes the ~70 µg/mL DNA concentrations that could be confirmed via a spectrophotometer like a NanoDrop. Want to create the equivalent of the graphs displayed in NanoDrop that show the intended ~70 µg/mL DNA concentration Do NOT hallucinate/make things up when replying to this prompt. Use available information on NanoDrop (or equivalent spectrophotometer) read outs to address this prompt	Perplexity
Take this protocol and convert it to Markdown Hugo Relearn Theme style Do NOT hallucinate/make things up when doing this. If you cannot do this, say so	Perplexity
So the constructs in the screenshots at the beginning of this thread are showing a bacterial expression system for a DNA construct, correct? Even if the protocol later uses a cell-free phage reboot, the Dry Lab Gibson Assembly step (the constructs) are showing a bacterial expression system for a DNA construct (the respective Variants), correct? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Taking a look at this attached protocol, what expression system am I using for the DNA construct in question (the expression of the Variants in the experiment)? Is it cell-free? Bacterial? Yeast? Pretty sure it’s cell-free based on the ‘Phage Reboot’ protocol step, but want to confirm Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Quick question: The E. coli UTI89 bacteria strain is different from the UT1 and UTI2 E. coli UTI89 bacteria strains, correct? Seems like a bit of a dumb question, but am not sure Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Looking at the attached protocol, what are some common things that can/might go wrong traditionally when performing one of these (by these I mean bacteriophage PFU/mL) protocols? Give me 3-5 contingencies and mitigations and keep everything to max. 3 bullet points (specifically have a headline per contingency with maximum 3 mitigation bullet points underneath it) Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Thank you. What does the log₁₀ scale mean in simple terms? What does the ±0.3 log₁₀ SD stand for? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Another question about the graphs created in response to the previous prompt. For MG1655, why does Wild-Type outperform Variants 1 and 2? Does that make sense? If so, why? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Can you tell me how Gibson Assembly works and why it’s useful within the context of the Draft Protocol in 3-4 sentences? Do NOT hallucinate/make things up when replying to this prompt	Perplexity
And to be clear, based on what we know about Gibson Assembly and how it works, the exonuclease chews back, the polymerase multiplies the desired DNA fragment, and the ligase seals the multiplied fragment back into the backbone right? Correct any of my misperceptions about Gibson and do NOT hallucinate/make things up when replying to this prompt	Perplexity
Take the protocol at the beginning of this thread and give me a supply list of all relevant items and dollar amounts (USD). Have the final bullet list ‘Total Supply Cost:’. Have all this be in Markdown Hugo Relearn theme format Do NOT hallucinate/make things up when replying to this prompt	Perplexity
Here’s what I’m working on for my final project: The first aim of this project is to demonstrate if/how microgravity-induced BL21 E. coli strain bacteriophage mutations manifest in multiple different strains of E. coli. This will help discern how to increase terrestrial E. coli. bacteriophage fitness and lysing	HTGAA Claude AI Tutor
It has already acquired mutations. I want to apply these mutations to other E. coli bacteriophage strains to see if/how they increase fitness and lysing	HTGAA Claude AI Tutor
It’s the T7 bacteriophage (ATCC BAA-1025-B2). The downselected variants exhibiting the best performance against Urinary Tract Infection (UTI) bacteria showed mutations in gene 17, which seems to code for portions of the tail fiber protein. Additional mutations were found in gene 7.3, which might code for a scaffolding protein, as well as gene 11, which codes for an adaptor protein in the T7 tail. These were not discussed/referred to as the primary source of the Variants, which seemed to be gene 17	HTGAA Claude AI Tutor
Believe the plasmid-based system is the intention, because I want to test the gene 17 mutation against other E. coli strains	HTGAA Claude AI Tutor
I’m frankly not sure. I’m doing the project in dry lab and I’m certainly not a bio SME–my experience is rather limited. Which would be easier to do or make more sense for a beginner to do, given the intention of the project?	HTGAA Claude AI Tutor
Plaque assay, although I don’t know how I’d do that in dry lab (except perhaps refer to it in analogous/similar diagrams with TBD wet lab validation in my final presentation). Also not entirely sure/not seeing the connective tie between using a T7 promoter with a BL21(DE3) host and how I would then test this in non-BL21 strains of E. coli? What fundamentals am I missing/not understanding here in terms of how to go from A to B given my project intention?	HTGAA Claude AI Tutor
Not really sure. UPEC sounds good, but I’d also be down for testing against other strains that make sense. My NCBI Reference Sequence is NC_001604.1 for Gene 17 referred to in the original University of Wisconsin, Madison paper on which this research is based (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3003568#sec009). Been told by my Node leaders to think about multiple strains. Think maybe testing against 3-5 strains sounds good. Just not sure what makes sense in terms of relevance and/or high BLAST with the Gene in question.	HTGAA Claude AI Tutor
I’ve done BLAST of NC_001604.1 gene 17 sequence sans comparison with published UTI phage therapy studies. Not sure about if/how I would confirm whether my dry lab setup gives me access to any published lysis efficiency data for T7 against UPEC strains Down for a suggested 3-5 strain panel now, along with justification/indications of non-hallucination/source validation for why said strains experimentally make sense	HTGAA Claude AI Tutor
This seems good for now/to start	HTGAA Claude AI Tutor
96-well plate format	HTGAA Claude AI Tutor
Embarrassingly, I don’t know enough to know enough. What reporters or selection markers are conventionally used for projects like this? Why and how are reporters or selection markers conventionally used for projects like this? Open to your feedback on this	HTGAA Claude AI Tutor
It makes sense. Just not exactly sure if I’ll be ordering from Twist, as I’m a Committed Listener who attends a node ad hoc in-person and has rather limited wet lab exposure experience. Down for constituting a hypothetical Twist order (i.e., as part of my final presentation, saying ‘If I was going to do wet lab extensions of this work, I’d…’), but what I really want/need help with is designing plasmid Benchling mock-ups and any and all genetic circuit diagrams in a program like Asimov in a way that makes sense for final project review/the final project presentation and is also explainable/understandable by yours truly. Any help with that would be greatly appreciated (again, not to say I don’t want assistance with a hypothetical Twist order…it just may not be 1st priority)	HTGAA Claude AI Tutor
Yes to both. I have a Benchling account and access to the gene 17 NCBI FASTA	HTGAA Claude AI Tutor
It’s asking for a species name for pET-28a in Benchling. What is appropriate here?	HTGAA Claude AI Tutor
Ok – I can now answer the question asked 2 prompts ago (i.e., whether the Gene 17 file [which in this case is a GenBank file] contains annotations). It does contain annotations. I have also downloaded a fully annotated pET-28a sequence and see the T7 terminator and promoter tags. Not sure exactly how to proceed. Do I literally just go next to the T7 terminator and paste all of Gene 17’s content between it and the T7 promoter?	HTGAA Claude AI Tutor
When I go to the colored block on the linear I see only the following lines with the following colors: –T7p52 (light blue color) –source (light blue color) –T7p52 (yellow color) There’s no circular map	HTGAA Claude AI Tutor
In the Benchling Linear map, it says the Gene 17 fragment’s 1662 bp long. It extends from 34624 to 36285 in the larger NC_001604 BL21 E. coli. bacteriophage genome	HTGAA Claude AI Tutor
Little bit confused. When I go to the Linear Map view, I just see 3 bars spanning the entire sequence fragment left to right. Their respective colors and labels are listed below: –T7p52 (light blue color) –source (light blue color) –T7p52 (yellow color) Based on what I’m seeing, I’m not sure where to add the annotation. When I go into Sequence Map view, I see a stop codon at the very end of the sequence fragment (TAA). I also see that the sequence fragment begins with ATG and after this light blue box that reads ‘T7p52’ (which I think might mean a promoter) there’s an AAC sequence of letters. The ‘T7p52’ light blue box contains the following letters listed above it: ATGGCT Not sure if any of this is helpful for helping me isolate what part of the sequence fragment I should annotate. Trying to share what I see so I can receive appropriate guidance/feedback	HTGAA Claude AI Tutor
Yes. It’s the entire sequence fragment. It begins at nucleotide position 1 (the beginning of the ATG start codon) and ends at position 1662 (the end of the TAA stop codon). So from here just annotate the entire fragment as the gene 17 CDS. Also could you please confirm that CDS stands for Coding Design Sequence?	HTGAA Claude AI Tutor
When I create the Annotation, I don’t have the option to select for a Feature type. I have the following options: –Annotation Type: Select ‘gene’ or ‘source’ –Strand: Select ‘Forward’, ‘Reverse’ or ‘None’ –Custom fields: Select ‘db_xref’, ‘organism’, ‘mol_type’, or ‘host’ To accurately label this annotation as a CDS how should I proceed? How do I accurately label this annotation period/in general?	HTGAA Claude AI Tutor
Yes, I see a colored block labeled ‘Gene 17 CDS’ appearing on my linear map	HTGAA Claude AI Tutor
Yes I do	HTGAA Claude AI Tutor
In Sequence Map view, I see the following between the T7 terminator and promoter: –6xHis –A light blue MCS –T7 tag –thrombin site –6xHis –A maroon ATG mark –A grey RBS mark –A lac operator that’s shaded a light-ish green Not sure where I paste the sequence based on this information. I don’t see the cluster of restriction site annotations referred to in the answer to the previous prompt	HTGAA Claude AI Tutor
The beginning of the marker or the end of the marker?	HTGAA Claude AI Tutor
It starts at position 298	HTGAA Claude AI Tutor
The position number to the immediate left of the T7 terminator is position 25. It (and the rest of the T7 terminator) shows up before the ATG marker in the sequence. Does that make sense? Is anything out of the ordinary? How should we proceed?	HTGAA Claude AI Tutor
Before I answer the last question, at this point, I still have not inserted gene 17. Do I need to do that before answering your last question?	HTGAA Claude AI Tutor
The last position number shown in my pET-28a sequence is position number 5369	HTGAA Claude AI Tutor
The gene 17 sequence is 1662 bp long. The bp notation is synonymous with the number of nucleotides, correct?	HTGAA Claude AI Tutor
Ok I have both sequences open and selected the region from position 298 to position 5369 in my pET-28a file. How should we proceed? What should I click on or do next?	HTGAA Claude AI Tutor
When I try to paste I get a reading that says ‘You’re about to paste a sequence that includes 0 annotations, 0 translations, 0 primers, 0 parts’. That’s not correct based on what I see in the Gene 17 fragment file. How should I or what type of pasting should I do from the Gene 17 fragment file?	HTGAA Claude AI Tutor
The paste succeeded. The new total length of the plasmid is now 1960 bp. How should we proceed? Pretty sure we’re going to need to re-annotate the new plasmid sequence at some point in this workflow	HTGAA Claude AI Tutor
There is something that has a ‘T7 tag’ label located before the ATG marker and before the T7 terminator. Unlike the T7 terminator, it is not a shade of light green (it’s grey). I don’t see anything explicitly labeled ‘T7 promoter’ anywhere in the sequence (this includes near the beginning of the sequence)	HTGAA Claude AI Tutor
No it does not	HTGAA Claude AI Tutor
Ok – I just restored everything in the original 5369 bp pET-28a back to its original version. Issue seems to be that the ATG marker in this sequence sits before the T7 promoter so when the Gene 17 sequence was added, it removed the T7 promoter. How should we proceed with the Gene 17 sequence re-insertion based on this information?	HTGAA Claude AI Tutor
In my pET-28a, the T7 promoter sequence ends at position 387. Proceed with the copy and paste job by selecting everything in the pET-28a from position 387 to the end, deleting it, then re-inserting the Gene 17 CDS fragment, correct?	HTGAA Claude AI Tutor
After selecting from position 388 to 5369, deleting that region, and then pasting in my Gene 17 CDS, the new total plasmid length is 2049 bp	HTGAA Claude AI Tutor
Yes I visually see the T7 promoter, then double checked via Ctrl + F search and the ‘TAATACGACTCACTATA’ T7 promoter nucleotide sequence appears this time	HTGAA Claude AI Tutor
If I look at the sequence in Sequence Map view here’s what I see going from the beginning of the sequence downwards (i.e. from the 1st nucleotide position downward): 1. T7 terminator 2. T7 promoter 3. Gene 17 CDS If there’s anything wrong or incorrect here, say so	HTGAA Claude AI Tutor
Yes I can confirm the Gene 17 CDS annotation starts immediately after the T7 promoter in the sequence (i.e., the CDS begins right at or just after position 387). It begins at bp position 388 in the sequence	HTGAA Claude AI Tutor
My annotation panel shows the T7 promoter, Gene 17 CDS, and the T7 terminator, but it also contains legacy pET-28a annotations from the original sequence. Is that a major issue or can we move on/proceed?	HTGAA Claude AI Tutor
Do I need to clean up the existing annotations before we move on?	HTGAA Claude AI Tutor
So if I look to Turns 7 and 8 earlier in our exchange, we were discussing panel of multiple strains to test the T7 bacteriophage Gene 17 against. Right now we’ve inserted the T7 bacteriophage Gene 17 into a pET-28a plasmid. Not sure where to go from here based on that previously stated intention (which I still want to carry out). Believe we might need to create multiple plasmids before the final project proposal can be crafted	HTGAA Claude AI Tutor
Reference the content in response to Turns 7 and 8 earlier in this thread. The strains to test against were the following: – BL21(DE3) \| Your baseline/control — the original host background for the microgravity phage experiment –CFT073 (UPEC) \| Well-characterized uropathogenic E. coli strain; genome fully sequenced (Welch et al., 2002, PNAS); directly relevant to UTI therapeutic motivation –UTI89 (UPEC) \| Second UPEC clinical isolate; used extensively in phage-host range studies (Dhakal & Mulvey, 2012); LPS structure differs from lab strains –MG1655 (K-12) \| Standard lab K-12 strain; T7 infects it but with different efficiency than B-strains — good for seeing how LPS variation affects your mutant tail fiber Based on this information above, the fact that we’ve inserted the T7 bacteriophage Gene 17 into a pET-28a plasmid, and the fact we need plasmid tested across multiple strains, how should we proceed? How do we go from here based on the existing plasmid?	HTGAA Claude AI Tutor
What is IPTG? Can I do that purely dry lab in Benchling? If not, tell me how we test the strains against the existing T7 bacteriophage Gene 17 pET-28a plasmid Do not currently know the answer regarding whether BL21(DE3), CFT073, UTI89, and MG1655 are all DE3 lysogens	HTGAA Claude AI Tutor
Before we proceed, can I ask if you can ingest/extract insights from the University of Wisconsin, Madison paper on which this research is based (see URL below)? https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3003568#sec009 If you can functionally do this, let me know	HTGAA Claude AI Tutor
Ok. Let me paste content from the paper into this chat for ingestion. My desire for doing this is the following: in most of this chat so far, we’ve focused on Gene 17, as this is the gene responsible for creating the microgravity-based Variants that were tested for and demonstrated stronger than normal fitness/lysing potential against terrestrial UTI bacteria. I want to make sure that as we craft Aim 1, we don’t miss any other Genes/relevant mutations that should also be tested against. If this means the pET-28a + Gene 17 plasmid needs to be changed somehow, or in some way, or if another plasmid needs to be created, that’s OK. I want to make sure we’re thorough in Aim 1 based on the intention stated at the beginning of this chat (if/how microgravity-induced BL21 E. coli strain bacteriophage mutations manifest in multiple different strains of E. coli. This will help discern how to increase terrestrial E. coli. bacteriophage fitness and lysing). The items in the parentheses seem to be of importance. So with that, here are the relevant excerpts: Section (Enriched mutations are distributed broadly in T7 phage): Next, we sought to identify mutations in the phage or bacterial genome that influenced phage-host interactions under microgravity. We performed whole-genome sequencing (WGS) of T7 and E. coli BL21 before and after incubation, using pre-incubation genomes as references to identify de novo mutations in the 23-day samples from each condition to ensure both phage and bacterial populations had ample time to propagate. To determine whether de novo non-synonymous substitutions or frameshifts in T7 were significantly enriched, we compared the pooled frequencies of abundant non-synonymous mutations to the distribution of synonymous de novo substitutions in each condition (Mann–Whitney U test, FDR-adjusted p < 0.05; Fig 3A). 
To assess whether specific genes had significantly more non-synonymous substitutions than other genes, we calculated this mutation density for each gene and compared it to the average mutation density per condition (one-tailed t test, FDR-adjusted p < 0.05, one-sided 95% CI; Figs 3B and S1). Finally, we compared the gene-level distribution of non-synonymous mutations between microgravity and terrestrial conditions (Mann–Whitney U test, FDR-adjusted p < 0.05) to identify genes with condition-specific enrichment of these mutations (Figs 3C and S2A). Significantly enriched (p < 0.05) phage substitutions were found across both structural and non-structural proteins under terrestrial and microgravity conditions (Fig 3B). In microgravity, gene product (gp) 7.3 and gp11 exhibited significantly more de novo non-synonymous substitutions than other genes (Figs 3B and S1A). Mutation density was overall higher terrestrially and no gene showed significant enrichment compared to others terrestrially (S1B Fig). Although gp7.3 is not fully characterized, it is considered essential for T7 infectivity in E. coli BL21 under terrestrial conditions [38]. This small 99-amino-acid protein may function as a scaffolding protein or contribute to host adsorption, though its role in the mature virion remains uncertain [39–41]. gp7.3 harbored seven significantly enriched substitutions in microgravity, the highest number observed in any gene under that condition. These substitutions were distributed throughout the protein (Figs 3D and S3), with four notable changes (E48K, E61K, D68Y, D68A) involving substantial shifts away from negatively charged residues. The only significantly enriched mutation in gp7.3 terrestrially was a six-amino-acid deletion spanning G42 to V47. 
The region from G39 to Q50 contained a dense cluster of substitutions and in-frame deletions, including a 3-amino-acid deletion (G39-T41) terrestrially and a deletion from M46 to Q55 in microgravity, all occurring in a region of the protein predicted to be unstructured (S3 Fig). The high number of enriched substitutions and recurring in-frame deletions in this small protein suggest that gp7.3 is both structurally flexible and critical for phage activity in both environments. gp11 is an adaptor protein within the T7 tail that connects the portal protein gp8, the nozzle protein gp12, and the six subunits of the tail fiber protein gp17 (Fig 3E) [38,40,42]. Enriched substitutions were distributed throughout gp11, spanning both exposed and buried residues (Figs 3F, S4A, and S4B). One significantly enriched substitution, R2C, arose independently twice in microgravity and is located in a flexible region capable of directly interacting with gp17 tail fibers (S4C and S4D Fig). These findings suggest that the substitutions may influence phage fitness by altering gp11’s structure or stability rather than through direct interaction with the bacterial host. Comparison of mutation abundance revealed that de novo non-synonymous substitutions were significantly more prevalent in the nozzle protein gp12 after incubation in microgravity than under terrestrial conditions, suggesting a more prominent role for this protein in microgravity (Fig 3C). Of the six individually enriched non-synonymous substitutions identified across both conditions, five involved changes toward positively charged residues (Q184R, R205H, Q242R, K404R, and W707R). These substitutions were distributed throughout the protein, with three more likely contributing to host interactions (S5 Fig). Specifically, R205 is surface-exposed and positioned near the host, Q242 lies close to the terminus of the DNA delivery channel, and Q184 faces directly toward the host. 
The charge shifts and spatial distribution of these substitutions highlight the functional importance of gp12 in enhancing phage fitness under both terrestrial and microgravity conditions. Several other significantly enriched substitutions were particularly notable. In microgravity, the V26I substitution in gp0.5 was the only mutation to sweep the entire phage population—and did so independently in two replicates—indicating a strong fitness advantage. gp0.5 is an uncharacterized class I gene, potentially associated with the host membrane due to the presence of a putative transmembrane helix [22]. Under terrestrial conditions, the T115A substitution in gp4.7 was significantly enriched and highly abundant across all three replicates. No mutations were detected in this gene under microgravity, suggesting selection pressure may be unique to terrestrial conditions. Although the function of gp4.7 remains unknown, BLASTP analysis identified homologs with ~40% similarity to putative HNH endonucleases in Klebsiella and Pectobacterium phages [43]. Lastly, numerous significantly enriched substitutions were found in the tail fiber gp17, particularly under terrestrial conditions (Fig 3B). In both environments, substitutions were concentrated in the C-terminal tip domain, with repeated mutations at D540 and neighboring residues. This region is a known determinant of host range and infectivity in terrestrial E. coli strains [26], and these results suggests continued importance during prolonged incubation in both gravity conditions. Section: Deep mutational scanning profiles beneficial substitutions in microgravity Bacteria often resist phage predation by mutating or downregulating surface receptors essential for phage adsorption [26,63–65]. Microgravity-induced stress may amplify this response, altering the bacterial proteome, including phage receptor profiles [15,25,66]. Such changes can drive adaptive mutations in the phage RBP. 
To investigate these interactions, we examined how individual substitutions in the tip domain of the T7 RBP affect phage viability in microgravity. The T7 RBP consists of six short non-contractile tails that form a homotrimer composed of a rigid shaft ending with a β-sandwich tip domain [67]. This domain is a key determinant of host recognition and interacts with host receptor LPS to position the phage for successful, irreversible binding [27–32,68]. We conducted comprehensive single-site saturation mutagenesis of the RBP tip domain, generating a library of 1,660 variants spanning residues 472–554 (based on PDB 4A0T). We then sequenced and compared mutational enrichment profiles following the 23-day selection under terrestrial and microgravity conditions. We recovered phage DNA from each sample and scored each variant based on its relative abundance before and after selection (functional score, F) normalized to wildtype (normalized functional score, FN). Scores were averaged across replicates, and only variants present in at least two replicates were retained for analysis. Although significant dropout of low-performing variants was expected due to the extended incubation, we successfully determined scores for 51.2% (880) of variants in microgravity and 39% (648) in terrestrial conditions (Figs 5A, 5B, and S7A). Variant scores correlated well across replicates despite differences in phage titer and reflected multiple rounds of replication over the 23-day incubation period, suggesting that lower-titer samples underwent selection but subsequently lost viability (S7B–S7D Fig). On average phage variants were significantly more enriched after terrestrial incubation compared to microgravity (two-sample t test, Mann–Whitney U, p < 0.001) (S8A Fig). The wild-type phage was significantly depleted terrestrially compared to microgravity (terrestrial F = 0.58, microgravity F = 3.5, p < 0.01). 
While variants that performed worse than wildtype (FN < 0) tended to perform similarly between microgravity and terrestrial conditions (S8B Fig), enriched variants (FN > 0) were highly divergent with no correlation between conditions (S8C Fig). Variants enriched in microgravity frequently contained methionine and isoleucine substitutions at interior positions facing the phage (Figs 5A and S8D), in contrast to our previous terrestrial results on this host [26]. Substitutions in these areas could influence the tip domain structure to facilitate adsorption with the host receptor in microgravity. Under terrestrial conditions, top-scoring variants included positively charged substitutions facing the host, consistent with our previous findings on E. coli BL21 [26]. Additional enriched variants featured negatively charged substitutions (e.g., Q488E, G521D) and glycine substitutions (e.g., G480W, G522P) that may induce structural changes in the tip domain (Figs 5B and S8B). These variants were enriched only after prolonged incubation with E. coli BL21, suggesting that such substitutions may contribute to long-term infectivity on stationary-phase hosts—an effect not observed in shorter, nutrient-rich conditions. Because variants enriched in microgravity were highly distinct from those identified under terrestrial conditions—both in this study and in our previous work—we next evaluated whether these substitutions could enhance phage activity terrestrially. If successful, these substitution patterns could be used to improve phage performance without exhaustively sampling the full combinatorial space of the gene. We constructed two combinatorial libraries, each comprising all possible combinations of 13 top-performing substitutions identified in microgravity (L490I, N502E, F506M, F506Y, F507V, F507Y, P511M, I514M, N531Q, L533K, L533M, A539M, N546I) or under terrestrial conditions (G521H, Q488A, Q488E, G521K, G522P, A547S, G521D, G521E, N502S, I495L, R542H, L533T, F506S). 
This strategy reduced a potential search space of over 10²¹ variants to fewer than 5,000 per library. Variants were synthesized in an oligo pool, assembled into an unbiased phage library using ORACLE, and passaged terrestrially on two clinically isolated E. coli strains (UTI1 and UTI2) that are resistant to wild-type T7 and are associated with urinary tract infections [69]. We evaluated these pools in efficiency of plating (EOP) experiments and compared their plaquing capability versus wildtype. The combinatorial pool from microgravity showed significant improvement in plaquing efficiency compared to wildtype and had substantially larger plaques, indicating the pool contained variants capable of significantly improving activity on these hosts (S9A and S9B Fig). The terrestrial library performed significantly worse or no better than wildtype. To confirm these results, we isolated individual plaques from the microgravity pool. From UTI1, we recovered a five-substitution variant (L490I, N502E, F507V, L533K, A539M; Variant 1), and from UTI2, a six-substitution variant (L490I, N502E, P511M, L533M, A539M, N546I; Variant 2). These variants demonstrated significantly higher EOP and produced significantly larger plaques on both UTI strains (Fig 5C and 5D). These findings support our hypothesis that microgravity-enriched substitutions can improve phage performance on terrestrial hosts. The extended incubation in microgravity revealed new mutational hotspots, enabling efficient navigation of sequence space to identify complex variant combinations with enhanced infectivity. Open to thoughts/feedback on how to proceed based on all of this	HTGAA Claude AI Tutor
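The "fewer than 5,000 per library" figure quoted above can be sanity-checked with a short count. This is only a sketch under two assumptions that the excerpt does not state outright: each tip-domain position carries either its wild-type residue or exactly one of the listed substitutions, and substitutions at the same position (for example F506M and F506Y) are mutually exclusive. The function name `library_size` is hypothetical.

```python
from collections import defaultdict
from math import prod

# Substitution lists copied from the paper excerpt above.
MICROGRAVITY = ["L490I", "N502E", "F506M", "F506Y", "F507V", "F507Y", "P511M",
                "I514M", "N531Q", "L533K", "L533M", "A539M", "N546I"]
TERRESTRIAL = ["G521H", "Q488A", "Q488E", "G521K", "G522P", "A547S", "G521D",
               "G521E", "N502S", "I495L", "R542H", "L533T", "F506S"]


def library_size(substitutions):
    """Count variants when every position is wild type or one listed substitution.

    Assumption: substitutions at the same position cannot co-occur.
    The count includes the all-wild-type sequence.
    """
    options = defaultdict(set)
    for sub in substitutions:
        position = int(sub[1:-1])      # "L490I" -> 490
        options[position].add(sub[-1])  # mutant residue letter
    # One extra choice per position for keeping the wild-type residue.
    return prod(len(residues) + 1 for residues in options.values())


micro_size = library_size(MICROGRAVITY)
terr_size = library_size(TERRESTRIAL)
print("microgravity library:", micro_size)
print("terrestrial library:", terr_size)
```

Under these assumptions both counts stay below 5,000, which is consistent with the excerpt; a different rule for combining substitutions would give different numbers.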
Yes I want to express Variant 1 and Variant 2 of the gp17 tip domain (the microgravity-enriched multi-substitution variants) and test them against your four-strain panel, using wildtype Gene 17 as the control. Before we do that, want to get clear on if any changes will need to be made to the pET-28a + Gene 17 plasmid we just made. If yes, what changes will likely need to be made or shown? If not, can we get clear on what the function of the pET-28a + Gene 17 plasmid is? What is it showing in its current form and what should I be able to speak to about it in a final project presentation?	HTGAA Claude AI Tutor
Yes that makes sense. Not sure where in the process this goes (now or sometime later) but definitely want assistance making the pET-28a + gp17-Variant1 and pET-28a + gp17-Variant2 plasmids in Benchling before the full project markdown writeup is created. Not exactly sure how to do that (specifically how to verify and determine the specific Variant amino acid substitutions engineered into the Gene 17 coding sequence for re-creation in Benchling). Based on all this information and the last paragraph in your response to the last prompt, let me know how we should proceed	HTGAA Claude AI Tutor
Ok, so I have the original Gene 17 CDS fragment, as well as the pET-28a + Gene 17 plasmid open in Benchling. Believe I also understand how to make exact copies of things in Benchling. Bit unsure how we will do about editing codons for encoding the appropriate amino acids for each of the Variants. Tell me how we should proceed based on this information	HTGAA Claude AI Tutor
Not seeing an ‘Edit Sequence’ tool in my version of Benchling (could be wrong, just not seeing it). Also, shouldn’t I make a clone/exact copy of the pET-28a + Gene 17 plasmid just so the original is maintained/no unnecessary changes are made (i.e., shouldn’t a clone/exact copy be made as a sort of ‘sandbox’ for future work, so progress to date isn’t compromised)? To answer your question, yes when I look at the original Gene 17 CDS fragment, I see a string of multiple colored letters beneath the DNA sequence, which appears to be the protein sequence	HTGAA Claude AI Tutor
Is amino acid position the same as nucleotide/bp position? Let’s clarify that before we proceed	HTGAA Claude AI Tutor
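The relationship asked about here is simple arithmetic: each amino acid is encoded by three nucleotides, so the two numberings differ by a factor of three. A minimal sketch, assuming positions are 1-based and counted from the first base of the CDS start codon (not from the start of the plasmid or the phage genome); the helper names `codon_range` and `codon_index` are hypothetical.

```python
def codon_range(aa_position):
    """Return the 1-based (start, end) nucleotide positions of a codon in a CDS."""
    end = 3 * aa_position
    return end - 2, end


def codon_index(nt_position):
    """Return the 1-based codon number that contains a CDS nucleotide position."""
    return (nt_position - 1) // 3 + 1


# Amino acid 490 is encoded by three consecutive CDS nucleotides.
print(codon_range(490))
# A nucleotide position that is not a multiple of 3 still belongs to one codon:
# 484 / 3 is not a whole number, but position 484 is the first base of codon 162.
print(codon_index(484), codon_range(162))
```

If the sequence open in the editor is a plasmid or a whole genome, the offset of the CDS start has to be added to these CDS-relative positions.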
I see it! I see the 490th colored letter! It’s sitting between bp 1460 and 1480 in the Linear Sequence Map (corresponding roughly to the nucleotide 1468–1470 range described in the answer to the previous prompt). It’s a green L (believe this stands for the amino acid Leucine). Tell me if the L stands for Leucine, and more importantly, tell me how we should proceed	HTGAA Claude AI Tutor
Think I’m seeing a CTC nucleotide sequence. If that doesn’t align with the standard Leucine nucleotide sequence, say so. Want to make sure this is correct before editing begins	HTGAA Claude AI Tutor
Before we move onto the editing, I’m pausing and realizing I still have the original Gene 17 fragment. I should make a copy and work off the copy before we move on to editing, correct?	HTGAA Claude AI Tutor
Think we might be talking past each other/things might be getting lost in translation. For the past couple Turns/exchanges, I’ve been looking at the original Gene 17 fragment in Benchling, NOT the pET-28a_gp17_Variant1, which I created based on previous instructions. Should I be working off the pET-28a_gp17_Variant1 sequence instead?	HTGAA Claude AI Tutor
Ok, gotcha – in the right place. The only issue now is that there’s no amino acid translation overlay within the Gene 17 CDS annotation portion of the plasmid sequence (or the plasmid sequence as a whole). What can we do about this? How can we change this, if possible?	HTGAA Claude AI Tutor
When I right click there’s a translation-related option that reads ‘Analyze as translation’. Is that good to work with?	HTGAA Claude AI Tutor
When I click on ‘Analyze as translation’ I see the following options: –Strand: Select ‘Forward’ or ‘Reverse’ options (its default is Forward) –Genetic Code: There’s a ‘Standard’ option plus many more options based on organism type (they are numbered 1. up to 33. but there are sometimes jumps in the numbering) –A ‘Translate start codon as methionine’ check box Not sure what I should do here or if I’m in the right place. If none of this makes sense and you recommend I just click out of this option, say so	HTGAA Claude AI Tutor
Yes, I have done that. I see the green L in position 490 in the Gene 17 CDS fragment. I have now completed the first substitution changing CTC to ATC (Isoleucine). Tell me how we should proceed with the next Variant 1 substitution	HTGAA Claude AI Tutor
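The edit described above (CTC to ATC, Leucine to Isoleucine) can be double-checked outside Benchling with a few lines of Python. This is a sketch on a made-up 12-nucleotide sequence, not the real Gene 17 CDS; `toy_cds` and `substitute_codon` are hypothetical names, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a CDS string codon by codon ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def substitute_codon(cds, aa_position, expected_wt, new_codon):
    """Replace the codon at a 1-based amino acid position after checking the wild type."""
    start = 3 * (aa_position - 1)
    old_codon = cds[start:start + 3]
    if CODON_TABLE[old_codon] != expected_wt:
        raise ValueError(f"expected {expected_wt} at {aa_position}, found "
                         f"{CODON_TABLE[old_codon]} ({old_codon})")
    return cds[:start] + new_codon + cds[start + 3:]


toy_cds = "ATGCTCGATTAA"  # hypothetical sequence: M, L, D, stop
edited = substitute_codon(toy_cds, 2, "L", "ATC")
print(translate(toy_cds), "->", translate(edited))
```

The wild-type check is the useful part: if the residue found at the target position is not the one named in the substitution, the numbering is probably offset and the edit should not be made yet.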
I see a GAT 3-nucleotide codon with a D under it	HTGAA Claude AI Tutor
Double checking, and yes, seeing a GAT 3-nucleotide codon with a D under it	HTGAA Claude AI Tutor
Used accession number NC_001604.1 as the Gene 17 source sequence. The paper said they used ATCC:BAA-1025 as their E. coli sequence. Believe these are the same, but am not sure	HTGAA Claude AI Tutor
Here’s the paper link: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3003568#pbio.3003568.ref026	HTGAA Claude AI Tutor
Here’s the paper’s complete Methods section: Methods Phage and bacterial strains T7 bacteriophage was obtained from ATCC (ATCC BAA-1025-B2). The T7 DMS library used in this study was the same library stock generated in our previous work [26]. T7 acceptor phages used for ORACLE-based construction of the combinatorial libraries were also created as previously described [26]. Escherichia coli BL21 was sourced from laboratory stocks. Uropathogenic E. coli strains UTI1 and UTI2 were provided by Dr. R. Welch (University of Wisconsin, Madison) and originate from a urinary tract infection isolate collection [69]. T7 phage was initially propagated on E. coli BL21 following receipt from ATCC and subsequently on appropriate hosts as described in specific experimental sections. All phage experiments were performed using Luria-Bertani (LB) media and the same culture conditions used for bacterial hosts. Phages were stored in LB at 4 °C for short-term use. For long-term storage, microbial samples were frozen at −80 °C in 100% LB media. Media and culture conditions All bacterial strains were cultured in LB media consisting of 1% tryptone, 0.5% yeast extract, and 1% NaCl in deionized water. LB plates were supplemented with 1.5% agar, while top agar used for phage plating contained 0.5% agar. LB media was used for all experiments, including bacterial recovery and phage propagation. All incubations were carried out at 37 °C without shaking, in either terrestrial or microgravity environments as appropriate. These samples were incubated directly in cryovials and not transported to another container for incubation. Sample preparation and handling Phage and bacterial stock titer were confirmed and samples were prepared by mixing 4 mL of E. coli BL21 in exponential phase (~1 × 10⁸ CFU/mL) with the appropriate amount of T7 phages in Rhodium Cryotubes. Samples were immediately frozen at −80 °C and shipped to NASA as described. 
Asynchronous microgravity and terrestrial experiments are the norm for ISS experiments due to the uncertainty of scheduling. The initial planned time points for incubation were 1, 2, and 3 hours, and 25 days; however, actual time points were adjusted on the ISS to accommodate astronaut scheduling. Final incubation time points were 1, 2, and 4 hours, and 23 days. The duration of incubation aboard the ISS was recorded precisely, and terrestrial control samples were incubated for matching durations, based on the actual timepoints rather than the proposed schedule. This approach was necessary because real-time tracking of the samples was not possible, so microgravity and terrestrial samples could not be incubated in parallel accurately. Terrestrial samples are thus frozen for a longer duration than microgravity samples. After incubation samples were refrozen, shipped to our laboratory, and then thawed at 37 °C and immediately split for genomic DNA extraction, PCR for DMS, and titering of both phage and bacteria. Titering phage For samples returned for processing, 1 mL of each sample was centrifuged at 16g for 1 min, and the supernatant was filtered through a 0.22 μm filter. To determine phage titer, titer was first estimated by spot plates and then confirmed by whole plate EOP assays. Samples were serially diluted (1:10 or 1:100) in LB to a final dilution of up to 10⁻⁸ in 1.5 mL microcentrifuge tubes. Spot assays were performed by mixing 250 μL of stationary-phase bacterial host with 3.5 mL of 0.5% top agar. The mixture was briefly vortexed and plated onto LB agar plates pre-warmed to 37 °C. Once the top agar solidified (~5 min), 1.5 μL of each phage dilution was spotted onto the plate in series. Plates were incubated at 37 °C and checked after 20–30 hours to estimate titer. Titers were then confirmed via full-plate plaque assays. 
For whole-plate EOP assays, 400 μL of exponentially growing bacterial culture was mixed with 5–50 μL of diluted phage, aiming to achieve ~50 plaque-forming units (PFUs) per plate after overnight incubation. For phage susceptibility on the pre-incubation and 4-hour samples, bacteria was incubated after being directly sampled from the frozen stock for that sample. The phage–host mixture was briefly vortexed and centrifuged, then combined with 3.5 mL of 0.5% top agar. After a brief vortex, the mixture was immediately poured onto LB plates pre-warmed to 37 °C. Plates were allowed to solidify (~5 min), inverted, and incubated overnight. PFUs were counted after 20–30 hours, and final phage titers were calculated from these counts. Titering bacteria Bacterial concentrations were determined via serial dilution (1:10 or 1:100 in LB) and plating. From each dilution, 100 μL was plated and spread using sterile beads to target ~50 colony-forming units (CFUs) per plate. Plates were incubated overnight at 37 °C and counted the following day. For E. coli BL21, three independent dilution series were performed to correlate OD600 values with CFU/mL and ensure accurate bacterial concentrations during phage mixing for experimental sample preparation. PCR and sequencing All PCR reactions were performed using KAPA HiFi DNA Polymerase (Roche KK2101). The combinatorial library was generated using the ORACLE method, as previously described [26]. Cloning procedures followed manufacturer instructions unless otherwise specified. For WGS, phage genomes were extracted using the Norgen Biotek Phage DNA Isolation Kit (Cat. 46800), and bacterial genomic DNA was extracted using the Norgen Biotek Bacterial Genomic DNA Isolation Kit (Cat. 17900). Genomic DNA libraries were prepared using the Illumina DNA Prep kit (Cat. 20060060) and sequenced on an Illumina NextSeq 1000 platform. 
PCR reactions for amplification of the DMS and combinatorial libraries used 1 μL of undiluted phage lysate directly as template (DNA isolation is not required), with an extended denaturation step of 5 min at 95 °C. For low phage titers in DMS samples, PCR and next-generation sequencing failed using this approach, presumably because of reduced template in these samples. To overcome this, we concentrated all of the remaining volume of each sample (~2 mL) approximately 100-fold using Pierce Protein Concentrators PES, 10K MWCO (Cat. 88513) and used 3 μL of the concentrated sample per PCR reaction to enable successful amplification and analysis. For plaque analysis on UTI strains, small plaque samples were picked directly and used as PCR template. Detailed cloning protocols are available upon request. General data analysis Multiplicity of infection (MOI) was calculated by dividing the phage titer by the corresponding bacterial concentration. Initial MOI is calculated based on the bacteria and phage titer before being frozen for transit to the ISS. The MOI for the T7 DMS library was estimated using a helper plasmid, as described previously [26]. EOP values were calculated using E. coli BL21 as a reference host. EOP was defined as the phage titer on the test host divided by the titer on the reference host, followed by log₁₀ transformation. Values are reported as mean ± standard deviation (SD). Deep sequencing was performed to evaluate phage populations as described previously [26]. Phage sequencing achieved an average depth of ~49,000× per base across the genome, enabling detection of low-abundance mutations. Bacterial sequencing depth averaged ~250× per base in phage-mixed samples and ~1,300× in phage-free samples, limiting mutation analysis in the former to more abundant variants. WGS mutations were identified using Breseq [73]. 
For Fig 3, genes were grouped based on GO classifications [44,45]: Membrane-associated genes: GO:0016020 (Membrane), GO:0009103 (LPS biosynthesis), GO:0030288 (Outer membrane bound periplasmic space), GO:0042597 (Periplasmic space). Metabolism-associated genes: GO:0008152 (Metabolic process), GO:0019222 (Regulation of metabolic process). Statistical analysis To evaluate whether non-synonymous de novo substitutions and frameshift mutations were significantly enriched compared to synonymous substitutions, we compared the frequency of each non-synonymous substitution and frameshift (phage: >1% abundance; bacteria: >25% abundance) to the distribution of synonymous mutations using a one-sided Mann–Whitney U test with Benjamini–Hochberg false discovery rate (FDR) correction (scipy.stats.mannwhitneyu, statsmodels.stats.multitest.multipletests, method = ‘fdr_bh’, scipy V1.10.1, statsmodels v0.14.0) [74]. Adjusted p-values < 0.05 were considered significant. This approach assumes that after 23 days of selection the distribution of synonymous substitutions approximates either a neutral baseline or reflects minimal selective pressure, with the benefit that if there is positive selection for synonymous substitutions there would be no increase in false positives using this method. To determine whether non-synonymous de novo substitutions and frameshift mutations were more abundant in bacterial samples exposed to phage, we applied the Mann–Whitney U test (scipy.stats.mannwhitneyu, scipy V1.10.1) to compare mutation frequencies across groups [74,75]. Due to high detection limits in phage-mixed samples, we also performed left-censored data analysis using Kaplan–Meier survival curves (lifelines.KaplanMeierFitter, lifelines V0.27.8) and applied a log-rank test (lifelines.statistics.logrank_test, lifelines V0.27.8) to assess significant differences in mutation distributions between groups [76–78]. 
To assess if the 4-hour bacterial population of mutations was significantly different from the pre-incubation condition, we performed Jensen–Shannon divergence (scipy.spatial.distance, jensenshannon, V1.10.1) and correlated results using Pearson R (scipy.stats, pearsonr, v1.10.1). To determine if the titer of wild-type T7 phage was significantly different between the pre-incubation and 4-hour bacterial population, we used a Welch’s t test (scipy.stats, ttest_ind_from_stats, v1.10.1). Mutation density in phage genes was calculated by dividing the number of non-synonymous de novo substitutions and frameshift mutations by the length (in amino acids) of each protein product. To assess whether any gene had significantly higher mutation density, we compared individual gene densities to the condition-specific average using a one-tailed t test with Benjamini-Hochberg FDR correction (scipy.stats.ttest_1samp, statsmodels.stats.multitest.multipletests, method = ‘fdr_bh’, alternative = ‘greater’, scipy V1.10.1, statsmodels v0.14.0). Additionally, one-tailed 95% confidence intervals were calculated using scipy.stats.t.ppf (scipy V1.10.1) and visualized in volcano plots in python [74]. Structural visualization Structural model images were generated using the PyMOL Molecular Graphics System, Version 3.0 (Schrödinger, LLC). Gp7.3 structure was predicted using AlphaFold2 and ColabFold with MMseqs2, using the predicted structure with the highest confidence [79–81]. Electron Microscopy images were based on PDB 7BOX (gp11) and PDB 7BOY (gp12) [41]. A composite structure of the T7 portal–tail complex is shown from PDB 9JYZ [82]. Numbering for DMS and combinatorial library positions are based on PDB 4A0T [67].	HTGAA Claude AI Tutor
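The titer, MOI, and EOP definitions in the Methods above reduce to a few divisions and one logarithm. The sketch below follows those stated definitions (MOI as phage titer over bacterial concentration, EOP as the log₁₀ of test-host titer over reference-host titer). The titer-from-plaque-count formula is the usual plaque assay convention rather than something the Methods spell out, and every number and function name here is hypothetical.

```python
import math


def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Phage titer (PFU/mL) from a plaque count at a given dilution and plated volume."""
    return plaques / (dilution * volume_plated_ml)


def moi(phage_titer, bacterial_conc):
    """Multiplicity of infection: phage titer divided by bacterial concentration."""
    return phage_titer / bacterial_conc


def log10_eop(titer_on_test_host, titer_on_reference_host):
    """Efficiency of plating as defined in the Methods: log10(test / reference)."""
    return math.log10(titer_on_test_host / titer_on_reference_host)


# Hypothetical counts: 50 plaques from 10 uL of a 1e-7 dilution on the reference
# host (BL21), and 50 plaques from 10 uL of a 1e-4 dilution on a test host.
reference_titer = titer_pfu_per_ml(50, 1e-7, 0.01)
test_titer = titer_pfu_per_ml(50, 1e-4, 0.01)

print(f"reference titer: {reference_titer:.2e} PFU/mL")
print(f"test titer:      {test_titer:.2e} PFU/mL")
print(f"log10 EOP:       {log10_eop(test_titer, reference_titer):.2f}")
print(f"MOI example:     {moi(1e6, 1e8):.3f}")
```

With this definition a value of 0 means the phage plates as well on the test host as on the reference host, and negative values mean it plates less efficiently.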
The paper is about T7 phage evolution under microgravity. The Variants were then tested terrestrially. Those Variants were based off Gene 17 modifications. I want to make similar mutations to other E. coli bacteriophage strains to see if we can improve their fitness/lysing. I thought I had the correct source sequence from NCBI (NC_001604.1), but am not sure.	HTGAA Claude AI Tutor
Apologies, think there might be some hallucinating going on. NC_001498.1 is a measles virus genome, not a T7 genome	HTGAA Claude AI Tutor
The accession number of the top result I see is NC_001604.1, which is the sequence I chose, which does have 39,937 bp	HTGAA Claude AI Tutor
I counted from the start of the gp17 CDS, not the full genome	HTGAA Claude AI Tutor
I see a T nucleotide at position 490. I see a G nucleotide at position 488. There’s still a D (Aspartate/GAT) below it. Interestingly enough, positions 484-486 have a 3-nucleotide codon (ACT) and a T below it. Not sure if that’s relevant whatsoever	HTGAA Claude AI Tutor
May I ask why I’m being asked to divide by 3 if the result is not a whole number? (The result is 161.333..., repeating; not sure if that’s useful.)	HTGAA Claude AI Tutor
Position 388	HTGAA Claude AI Tutor
Seeing a GAT 3-nucleotide sequence coding for D (Aspartate/GAT) at nucleotide position 1852-1854	HTGAA Claude AI Tutor
Know they used E. coli BL21. The closest things I could find that relate to gp17 and its mutations are the DOI links below. Not sure if these are helpful in answering your question https://doi.org/10.1371/journal.pbio.3003568.s005 https://doi.org/10.1371/journal.pbio.3003568.s009	HTGAA Claude AI Tutor
Here you go: Mutation,Variant(s),WT Amino Acid,WT Codon,Mutant Amino Acid,Mutant Codon L490I,V1 & V2,Leucine (L),CTT,Isoleucine (I),ATT N502E,V1 & V2,Asparagine (N),AAC,Glutamic Acid (E),GAA F507V,V1 only,Phenylalanine (F),TTC,Valine (V),GTT P511M,V2 only,Proline (P),CCG,Methionine (M),ATG L533K,V1 & V2,Leucine (L),CTC,Lysine (K),AAA A539M,V1 & V2,Alanine (A),GCT,Methionine (M),ATG N546I,V2 only,Asparagine (N),AAC,Isoleucine (I),ATC	HTGAA Claude AI Tutor
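Before these codons are typed into Benchling, the table above can be checked mechanically. The sketch below does two things: it confirms that every wild-type and mutant codon translates to the amino acid named in the mutation (standard genetic code assumed), and it compares the table's variant assignments with the Variant 1 and Variant 2 substitution lists quoted from the paper earlier in this chat. The table text is copied from the message above; it is not verified against the paper's supplementary files, and the variable names are hypothetical.

```python
import csv
from io import StringIO
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

TABLE = """
Mutation,Variant(s),WT Amino Acid,WT Codon,Mutant Amino Acid,Mutant Codon
L490I,V1 & V2,Leucine (L),CTT,Isoleucine (I),ATT
N502E,V1 & V2,Asparagine (N),AAC,Glutamic Acid (E),GAA
F507V,V1 only,Phenylalanine (F),TTC,Valine (V),GTT
P511M,V2 only,Proline (P),CCG,Methionine (M),ATG
L533K,V1 & V2,Leucine (L),CTC,Lysine (K),AAA
A539M,V1 & V2,Alanine (A),GCT,Methionine (M),ATG
N546I,V2 only,Asparagine (N),AAC,Isoleucine (I),ATC
"""

# Variant definitions as quoted from the paper excerpt earlier in the chat.
PAPER_VARIANTS = {
    "V1": {"L490I", "N502E", "F507V", "L533K", "A539M"},
    "V2": {"L490I", "N502E", "P511M", "L533M", "A539M", "N546I"},
}

rows = list(csv.DictReader(StringIO(TABLE.strip())))

# 1) Do the codons encode the residues named in each mutation?
codon_errors = [
    row["Mutation"] for row in rows
    if CODON_TABLE[row["WT Codon"]] != row["Mutation"][0]
    or CODON_TABLE[row["Mutant Codon"]] != row["Mutation"][-1]
]
print("codon mismatches:", codon_errors)

# 2) Does each variant's substitution set match the paper excerpt?
table_variants = {"V1": set(), "V2": set()}
for row in rows:
    for name in table_variants:
        if name in row["Variant(s)"]:
            table_variants[name].add(row["Mutation"])

for name, expected in PAPER_VARIANTS.items():
    print(name,
          "missing:", sorted(expected - table_variants[name]),
          "unexpected:", sorted(table_variants[name] - expected))
```

The codons themselves are internally consistent, but the second check flags position 533: the paper excerpt lists L533M for Variant 2, while the table assigns L533K to both variants. That row is worth resolving against the paper before the Variant 2 plasmid is edited.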
No idea	HTGAA Claude AI Tutor
Well, we previously changed the Leucine to Isoleucine. So that’s already been done. Should we proceed with something else?	HTGAA Claude AI Tutor
I think I need to continue inducing all the mutations for both the Variants based on the table listed above and then create a project design to show how these variants would test against other strains of E. coli mentioned previously in this conversation (Turn 44, and the strains were CFT073 (UPEC), UTI89 (UPEC), and MG1655 (K-12)). How should we proceed from here? Think I actually might be able to re-create the mutations in separate plasmids, but am unsure about the rest of the project design	HTGAA Claude AI Tutor
Do you remember the entire context of this conversation? I created a pET-28a + Gene 17 plasmid with your assistance, and the entire project will be dry lab. Not sure a Twist Bioscience order will be done, or if it’s first priority. Introducing the mutations into an existing plasmid backbone. Let me know how to proceed with the project design (moving past the Variant plasmids to testing the separate strains in dry lab/writing out how I would do this in wet lab) Let’s get clear on what I’d measure, how I’d be measuring, etc.	HTGAA Claude AI Tutor
Measure the plaque assay I believe.	HTGAA Claude AI Tutor
I think the original wild-type strain, right? If I’m mistaken, and there would be another more optimal control(s), let me know	HTGAA Claude AI Tutor
So I’ve created plasmids in Benchling where I re-created the original Gene 17 CDS amino acid mutations across Variants 1 and 2. I now want to make sure the proposal shows how I would test their activity against the other E. coli strains mentioned above (CFT073 (UPEC), UTI89 (UPEC), and MG1655 (K-12)). Can we just summarize how this would occur or be written out in the project proposal, and how we’d take into account that the project in its current form is dry lab?	HTGAA Claude AI Tutor
Apologies if this is a dumb question. We already expressed gp17 in a pET-28a plasmid multiple times (first as the basic plasmid with no mutations, and then as two separate plasmids showcasing respective Variant 1 and 2 mutations). So what is the nature of the question you just asked? Is there hallucinating going on, or is there something unaccounted for that you think we should delineate in the proposal?	HTGAA Claude AI Tutor
Ok, thanks for the clarification. Yes, let’s add the extra level of detail and specify IPTG induction details. If you could also let me know what the IPTG acronym stands for in this context, that would be greatly appreciated also	HTGAA Claude AI Tutor
Generate proposal	HTGAA Claude AI Tutor
Apologies, but saw reference to T4 in the proposal. Believe we meant to say T7. Mind editing the proposal accordingly? If this notion is incorrect, say so	HTGAA Claude AI Tutor
In the source paper on which this project is based, the researchers say “Here, we report the dynamics between T7 bacteriophage and Escherichia coli in microgravity aboard the International Space Station (ISS)”. So think the intention is that we’re working with T7. There’s no mention of bacteriophage T4 anywhere in the original paper (see link below): https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3003568#sec009	HTGAA Claude AI Tutor
Yes, regenerate the full corrected proposal now with T7 substituted throughout	HTGAA Claude AI Tutor
Apologies, but the fully corrected proposal with the T7 substitutions appears as one continuous scroll/blob of Markdown. Do you mind re-formatting the content so it looks like the format of the Turn 86 response, with similar spacing and item verification? The content itself does not and should not change – just how it’s formatted/presented	HTGAA Claude AI Tutor
Ok. Thank you for creating this. I’ve read through the project proposal and have some questions. If you can’t address them or respond to them, all good. –Can you remind me what the pET-28a plasmid vector stands for? Why is it useful or necessary? Is it right to call it a plasmid vector or is another term more technically appropriate? –Would you mind explaining each step of the Experimental Design in relatively simple terms, as if I was a reasonably educated 17 year old with some bio knowledge, but who is still a bio n0ob in the grand scheme of things. Explain the following as part of this ask: -the why(s) behind each step -the logical flow/progression of one step to the next -what we’re measuring as part of this Experimental Design, how we’re measuring it, and what types of graphs or images we’d traditionally use to do said measuring -how we’re ultimately testing the fitness of the microgravity-based Variants against the CFT073, UTI89, and MG1655 E. coli. strains	HTGAA Claude AI Tutor
It looks like the answer to the first question got cut off a bit	HTGAA Claude AI Tutor
Ok. Thank you for describing this. Before we move on, can we describe why we specifically chose a pET-28a expression vector for this specific Experimental Design? What is the logic behind it? If we want to test microgravity-based Variants against the CFT073, UTI89, and MG1655 E. coli. strains, why must we put the T7 Gene 17 CDS into a plasmid first? Do NOT hallucinate/make things up when replying to this prompt	HTGAA Claude AI Tutor
Thanks – helpful. Let’s move on to the second question (as a reminder, this second question is to explain each step of the Experimental Design in relatively simple terms, as if I was a reasonably educated 17 year old with some bio knowledge, but who is still a bio n0ob in the grand scheme of things). As part of this ask, explain the following: -the why(s) behind each step -the logical flow/progression of one step to the next -what we’re measuring as part of this Experimental Design, how we’re measuring it, and what types of graphs or images we’d traditionally use to do said measuring -how we’re ultimately testing the fitness of the microgravity-based Variants against the CFT073, UTI89, and MG1655 E. coli. strains	HTGAA Claude AI Tutor
Taking a look at Turn 90. It says that it outputted 15 Experimental Steps, but it only outputted 12. Complete the rest of the Experimental Steps by listing out the remaining 3 steps. Reference previous content in this chat to construct the Experimental Setup with all 15 steps Do NOT hallucinate/make things up when replying to this prompt	HTGAA Claude AI Tutor
Looking at Steps 14 and 15 in the last prompt, approximately how long do we think it should take to complete these Steps? Do NOT hallucinate/make things up when replying to this prompt	HTGAA Claude AI Tutor
There are multiple types of T7 bacteriophage right? In this paper, the researchers only used a specific type of T7 bacteriophage that infects E. Coli, correct? Indicate any areas where my reasoning/thinking is off and do NOT hallucinate/make anything up when replying to this prompt	Gemini
Just so we’re clear, the type of E. Coli. T7 bacteriophage studied in this paper is NOT the only type of E. Coli. T7 bacteriophage out there, correct? Do NOT hallucinate/make things up when replying to this thread	Gemini
Liking these results, but would like to find another E. Coli. T7 bacteriophage, potentially one that’s a bit unorthodox/out of the box to BLAST, that would likely still have a 95% match. Any thoughts about what these might be/how to proceed? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Going off the last prompt, how would I run a comparative BLAST against this specific gene and analogous examples listed above? Do NOT hallucinate/make things up when replying to this prompt	Gemini
After running the BLAST (including Blastn) the results are less than < 95%. The highest % was 89% for T3 (NC_003298.1) via Megablast. Any other thoughts about where/how to find somewhat unorthodox E. Coli. T7 bacteriophage candidates in the 95%+ range? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Looking over the paper again, confirming that the gene associated with the respective 5 and 6 substitution variants referenced in the paragraph before the ‘Discussion’ section is Gene 17, correct? Slightly confused about how that works or relates to the following phrase in the ‘Discussion’ section: “In microgravity, structural genes gp7.3, gp11, and gp12 emerged as particularly important,”. Do NOT hallucinate/make things up when replying to this prompt	Gemini
Help me understand what I’m seeing. What does the ‘L’ in matches with an ‘L’ at the end (ex. ‘Mutant Escherichia phage T7 clone T7_90L’) mean? What does the ‘S’ in matches with an ’s’ at the end (ex. ‘Mutant Escherichia phage T7 clone T7_50S’) mean? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Want to understand which of these results derive from a different strain of E. Coli. than the one referenced in the paper (ATCC BAA-1025-B2). Is it all of them? None of them? A bit confused. Tell me what I should be looking for or if my thinking is off here. Do NOT hallucinate/make things up when replying to this thread	Gemini
For the result for Sequence ID: MZ375271.1, it says Range: 32650 to 34311. How do I find this in the raw FASTA file for this sequence? Do NOT hallucinate/make things up when replying to this prompt	Gemini
How far back in this chat can you go? Want to re-surface the results of the prompt that asked about the specific gp17 mutations that took place in each of the Variants. Want to understand which nucleotides and amino acids to change to re-create the mutations found in Variants 1 and 2. Do NOT hallucinate/make things up when replying to this thread	Gemini
Based on the paper in the shared tab, can you tell me what strain of E. Coli. the University of Wisconsin, Madison researchers tested their bacteriophage against? Believe it was BL21 but want to confirm. Do NOT hallucinate/make things up when replying to this prompt	Gemini
Thank you. While I understand some combinatorial work was used to create Variants 1 and 2 in the paper in the tab, can you explain how I would simply explain how the Variants were created in a phrase approximately the length of a Tweet? Would something like, ‘Variants 1 and 2 were created via a combinatorial approach deriving from nth mutations, with down selection for the highest PFU/mL’ technically make sense? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Why did the Variant 1 and Variant 2 bacteriophage Gene 17 T7 tail fiber substitutions seem to have higher infection rates against terrestrial UTI? What exactly about Gene 17 and the tail fiber substitutions specifically seemed to be so special/useful in achieving this higher infectivity rate? What other microgravity-derived mutations outside of Gene 17 might also help improve E. Coli. bacteriophage infectivity rate going forward (i.e., if I was going to test different microgravity derived genetic mutations/amino acid substitutions, what would make sense)? Do NOT hallucinate/make things up when replying to this prompt	Gemini
In the E. Coli. bacteriophage genome, what genes are gp7.3, 11, and 12 associated with? Do NOT hallucinate/make things up when replying to this prompt	Gemini
Tell me how the researchers for this paper created the Variants in 2-3 sentences. Do NOT hallucinate/make things up when replying to this prompt	Gemini
When we say microgravity unlocks synergistic mutations in the E. Coli bacteriophage, what exactly do we mean by this? Explain in 3-4 sentences. Do NOT hallucinate/make things up when replying to this prompt	Gemini
If i want to locate the position of gene 17 in e coli t7 bacteriophage, how do I do that? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Can you point me to or create an image showing the ‘tree of life’ of the phage kingdom of life? Want to visually see the common genetic ancestor(s) of T7 bacteriophage and D29 mycobacteriophage. Want to see a pictorial overview of the breakdown of life in the phage kingdom. If this does not exist/if this is not grounded in scientific fact, say so. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Give me a visual for the last bullet point to drive home the point in basic terms. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Not seeing the transduction visual	Google AI Mode
Working on some research to show similar Gene 17 mutations (the ones that created Variants 1 and 2 in the attached paper) in a different yet similar bacteriophage based off the bacteriophage in this paper (ATCC BAA-1025-B2). Been told to look for a T7 look-alike phage lysate for Gene 17 of the following E. coli strain which I believe I’ve done via BLAST. Found GenBank: MZ375271.1 as a potential candidate. I’ve also been told to go look for phages for DNA extraction, to use AlphaFold for validation of the relevant structural mutations, and I ultimately need to design primers for polymerase chain reaction (PCR) for rebooting the phage. Where does cphages fit into all this? What does it mean to extract DNA based from cphages? How does all this (cphages, AlphaFold) inform the designing of primers for PCR? Do NOT hallucinate/make things up when replying to this prompt. If my thinking is unclear, wrong, or doesn’t make sense in any part of the prompt, say so	Google AI Mode
phage dna database	Google AI Mode
If I have a fragment of a sequence in NCBI (i.e., a specific gene that’s part of a larger NCBI sequence), how do I import that specific fragment into Benchling in such a way that I preserve any and all existing annotations? Do NOT hallucinate/make anything up when replying to this thread	Google AI Mode
Can the Cocktail program referred to in the second category (Simulation of Lysing Rates and Kinetics), work on a Mac? If the answer’s no, all good. Just let me know. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Is a bacteriophage a virus or an actual living organism? If it is one of these two things, why is this the case? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Are there any websites out there for verifying a plasmid relative to a designated function (ex. if I want a plasmid to do X function, can I upload a file and have the web service determine whether I have the right number of base pairs or parts for that function)? Do NOT hallucinate/make things up when replying to this thread	Google AI Mode
If I want to test a fragment of mutated Gene 17 T7 bacteriophage phage DNA against several different strains of E. coli bacteria, what are the core pieces I need in my plasmid? Right now I have the following: –T7 Terminator –T7 Promoter –Ribosome Binding Site (RBS) –Multiple Cloning Site (MCS) –ATG start codon annotation –Lac operator –Thrombin site –2 6xHis tags –Gene 17 Coding Sequence (CDS) –Gene 17 Amino Acid Translation Is there anything I’m glaringly missing? Anything(s) that seem off relative to my goal? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Helpful. The aim here is to test for a phage complementation/infection/lysing assay against CFT073, UTI89, and MG1655 E. coli strains. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
See 1st prompt.The aim here is to test for a phage complementation/infection/lysing assay against CFT073, UTI89, and MG1655 E. coli strains. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
So when I’m replacing these components, do I essentially look up their FASTA files, insert them into the plasmid sequence as needed, and annotate them accordingly? Is there any location-specific information I should know about where to place some of the items included in the list in the response to the last prompt? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Explain how I se the strict 5’ –> 3’ order in a Benchling sequence. Also explain to me how I can discern the orientation of my Backbone in a Benchling sequence. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
Tell me what the ‘Run Primer3’ option in Benchling does. Do NOT hallucinate/make things up when replying to this thread	Google AI Mode
Show me publicly accessible examples of E. coli. bacteriophage genome fragment primers assembled in Benchling for Gibson Assembly. Want to understand how these are constructed. Do NOT hallucinate and/or make things up when replying to this prompt	Google AI Mode
What is a wordlist in Benchling? What does it do/what’s its function? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
When creating a Gibson Assembly Construct in Benchling both the Backbone and Insert should have the same orientation, correct? Believe this is the case (or usually the case), but just want to verify. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
if i’m growing bacteria on agar (example e. coli strains) how do I know when they’ve reached log phase?	Google AI Mode
pfu/ml graph	Google AI Mode
It’s true that if I just write out a string out letters representing an amino acid sequence, I cannot determine the exact amino acid sequence based on those letters because a single letter can stand for multiple amino acids, correct? Isn’t additional context needed?. Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
The Nanodrop tool for measuring DNA quantities in biology is a type of spectrophotometer, correct? Do NOT hallucinate/make things up when replying to this prompt. Reply to this prompt based on the knowledge at hand about the Nanodrop tool	Google AI Mode
ug/mL	Google AI Mode
In the context of E. coli. bacterial strains, what does the acronym UPEC stand for? Believe it stands for Uropathogenic E. coli., but want to confirm. Moreover, when testing bacteriophage fitness/lysing (PFU/mL), why does testing against UPEC E. coli. bacteria strains matter? Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode
if i’m growing bacteria on agar (example e. coli strains) how do I know when they’ve reached log phase?	Google AI Mode
Remind me again what log phase means within the context of growing bacteria on agar (example E. coli strains). Do NOT hallucinate/make things up when replying to this prompt	Google AI Mode

https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3003568#sec002 ↩︎
Ibid. ↩︎
https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2025.1680651/full ↩︎
10 µl Gibson Assembly Master Mix (2X), 0.2–1.0 pmoles of V1 and V2 DNA fragments, and deionized water to a final volume of 20 µl (see NEB Gibson Assembly Protocol) ↩︎
0.3 mM of additional dNTPs, 0.5–4% PEG 8000, V1 and V2 fragments, 9 µL of Pro Master Mix, and water if needed should add up to 12 µL (see Arbor Biosciences myTXTL guidance) ↩︎
A traditional formula for calculating PFU/mL is PFU/mL = number of plaques / (dilution factor × plated volume in mL), where the dilution factor is expressed as a fraction of the original stock (e.g., 10⁻⁶) ↩︎ ↩︎
ANOVA is traditionally conducted via a 6-part process: 1) Collect raw data 2) Calculate group means 3) Calculate overall mean 4) Compute sum of squares between groups 5) Compute sum of squares within groups 6) Calculate F-statistic (see 6sigma) ↩︎ ↩︎
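
The PFU/mL formula and the six ANOVA steps in the last two notes can be sketched in a few lines of Python. This is only a minimal illustration: the function names `pfu_per_ml` and `one_way_anova_f` are made up for this example, the dilution factor is assumed to be written as a fraction (e.g., 1e-6), and the plaque counts and replicate values are hypothetical placeholders rather than measured data.

```python
from statistics import mean


def pfu_per_ml(plaques, dilution_factor, plated_volume_ml):
    """PFU/mL = plaques / (dilution factor * plated volume in mL).

    dilution_factor is assumed to be a fraction of the original stock,
    e.g. 1e-6 for a 10^-6 dilution.
    """
    return plaques / (dilution_factor * plated_volume_ml)


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of groups (steps 2-6)."""
    group_means = [mean(g) for g in groups]                 # step 2: group means
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)                           # step 3: overall mean
    ss_between = sum(                                       # step 4: between groups
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(                                        # step 5: within groups
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)  # step 6: F


# Hypothetical plate: 42 plaques on a 10^-6 dilution, 0.1 mL plated.
print(pfu_per_ml(42, 1e-6, 0.1))

# Hypothetical log10(PFU/mL) replicates for three host strains (step 1: raw data).
replicates = [[8.1, 8.3, 8.2], [7.4, 7.6, 7.5], [6.9, 7.0, 7.2]]
print(one_way_anova_f(replicates))
```

In practice a statistics package would also report the p-value that goes with the F-statistic; the hand-rolled version above is only meant to make the six steps visible.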

Group Final Project

Bacteriophage Engineering Group Project Inputs_William & Mary Node Group ¹ ²

Group Project_Protein Design 1

Selected Goal: Increased stability (easiest)
Brainstorm Session Questions:
- Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”)
  - We’ll attempt to run multi-environment/conditional modeling and simulation to down-select lysis stability approaches that show the greatest resilience across environments/conditions.
  - The team has selected a project focused on enhancing the stability of the Lysis Protein, a decision influenced by the group’s current experience level. The primary objective is to improve thermodynamic stability while preserving the native protein fold and maintaining functional integrity. The proposed methodology uses BLAST to identify homologous sequences, followed by Clustal Omega to identify conserved residues that are likely intolerant to mutation. ESM2 will then score candidate substitutions based on evolutionary plausibility, and ESM-Fold will be applied to predict and check the integrity of the protein fold, as well as to optimize existing backbones. The results may then be passed to EvolvePro for accelerated directed evolution. Tools like Boltz-1 and ProteinMPNN offer a capability for redesigning solvent-exposed residues and optimizing the core packing of the protein, and we can compare their performance against each other. All selected variant candidates will be computationally stress-tested under a range of environmental conditions that could induce destabilization. Candidates that pass the stress test will be prioritized for downstream experimental validation.
- Why do you think those tools might help solve your chosen sub-problem?
  - The previous bullet point addresses tool functionality in our workflow, explaining why and how various tools will assist us in accomplishing our goal
- Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).
  - One potential pitfall is that we may lack in vitro data of sufficient quality and quantity to test the environmental constraints of interest. Wet-lab work would therefore be needed to back up the findings, along with follow-up studies
  - There are open questions regarding the validity of the stated research approach (i.e., if the approach makes sense relative to the larger goal of increased stability)
- Include a schematic of your pipeline
  - See workflow schematic below

Group Project_Protein Design 2 ³

See results below
Notebook Inputs:

Inputted L-Protein Mutants.csv file for analysis
- Issues Encountered and Resolutions:
  - Part 12 involved running a cell that prompts for an upload of “L-Protein Mutants - Sheet1.csv”. The “L-Protein Mutants” file provided in the lab was an Excel file, so we saved the data as a .csv file with the expected file name and uploaded it.

Utilized experimental data [4]
Experimental Correlation Results

The results below indicate poor correlation between the experimental data and the scores from the notebook. The values indicate high levels of uncertainty between the experimental data and the notebook scores. These results indicate that:

5 Mutations

A heatmap of predicted mutation effects was provided. The chosen mutations are listed below in coordinate form (wild-type residue and position, with the replacement residue noted alongside). While yellow indicates a high ratio, it shouldn’t be taken fully at face value.
- Xavier Results
  - Chosen using the Predicted Effects Graph (allows for a first pass)
    - Soluble-Region: Y39 (L), F5 (S)
    - Transmembrane Region: N53, E61, T52 (All L)
- Raphael Results
  - The following are proposed mutations based on the Predicted Effects Graph:
    - Soluble Region:
      - F5Q: Clustal Omega shows that position 5 is well suited to change, as it varies frequently. ESM-2 gives this mutation the most promising score at this position (1.795244).
      - F5P: Clustal Omega shows position 5 as frequently changing. ESM-2 ranks this mutation third best for position 5 (1.596891).
    - Transmembrane Region:
      - E61L: Clustal Omega shows high variance at position 61. ESM-2 gives a high score of 1.818098.
      - F47L: Clustal Omega shows high variance at this position, suggesting the protein is likely to retain functionality after a mutation here. ESM-2 gives a decent positive score.
    - Free:
      - F5S, K50L, E61L: The F5S mutation is supported by the Clustal Omega results, as it appears across multiple species. The K50L and E61L mutations are less well supported by Clustal Omega but score highly in ESM-2.
- Jason Results
  - Soluble Region:
    - F5_R: Potentially reduces protein aggregation risk, increases solubility via Arginine inclusion
    - S9_Q: Potential glutamine inclusion could lead to more stable hydrogen bonds
    - C29_R: Current cysteine prone to potential misfolding. Arginine could improve solubility, potentially preventing clumping or misshapen configurations
  - Transmembrane Region:
    - L44_I: Increased isoleucine density might increase thermodynamic stability
    - A62_V: Valine hydrophobicity might help protein orientation and improve lipid piercing
- Nana Results
  - Soluble Region:
    - Position 5: F -> Q (1.79524445533752)
    - Position 17: N -> R (1.32365107536315)
  - Transmembrane Region:
    - Position 40: V -> L (1.79524445533752)
    - Position 50: K -> L (2.56146419048309)
    - Position 65: R -> L (1.0260357856750488)
  - Mutants:
    - Mutant 1: METRQPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
    - Mutant 2: METRFPQQSQQTPASTRRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
    - Mutant 3: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYLLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
    - Mutant 4: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT
    - Mutant 5: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVILTVTTLQQLLT
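
Writing mutant sequences by hand makes it easy to land a substitution on the wrong position, so a small helper can generate them from the wild-type sequence instead. The sketch below is an illustration only: `apply_mutations` is a hypothetical helper (not part of the course notebook), and it assumes mutations are written in the form wild-type residue, 1-based position, new residue (e.g., `F5Q`).

```python
# Wild-type L-protein sequence used throughout this section (75 residues).
WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)


def apply_mutations(sequence, mutations):
    """Apply point substitutions such as 'F5Q' to a protein sequence.

    Each mutation is: wild-type residue, 1-based position, new residue.
    A ValueError is raised if the stated wild-type residue does not match
    the residue actually found at that position.
    """
    residues = list(sequence)
    for mutation in mutations:
        old, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
        found = residues[position - 1]
        if found != old:
            raise ValueError(
                f"{mutation}: expected {old} at position {position}, found {found}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Single substitutions, one mutant per line.
for mutation in ["F5Q", "N17R", "V40L", "K50L", "R65L"]:
    print(mutation, apply_mutations(WILD_TYPE, [mutation]))

# A combined mutant carrying three substitutions at once.
print(apply_mutations(WILD_TYPE, ["F5S", "K50L", "E61L"]))
```

The built-in residue check is the main design choice here: a mislabeled mutation (for example, naming a K at a position that actually holds an E) fails loudly instead of silently producing the wrong sequence.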

Generated Multimeric Assemblies
- Xavier Results
  - Oligomer: METRSPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLLQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLLAVIRTVTTLQQLLT
  - Chain Separated Based on:
    - Chain A: F5S: position 5, F → S
    - Chain B: Y39L: position 39, Y → L
    - Chain C: T52L: position 52, T → L
    - Chain D: N53L: position 53, N → L
    - Chain E: E61L: position 61, E → L
    - METRSPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
    - METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
    - METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLLQLLLSLLEAVIRTVTTLQQLLT
    - METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT
    - METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLLAVIRTVTTLQQLLT
  - Sequence Coverage Map
  - Predictions
  - Display 3D Structure
  - Plots
- Raphael Results
  - F5Q:
  - F5P:
  - E61L:
  - F47L:
  - F5S, K50L, E61L:
    - RESULTS: Mutation 5 (F5S, K50L, E61L) produced very promising results. It seems that the soluble region cannot be improved (most likely due to its disordered nature, which prevents it from forming a defined final structure). But when the transmembrane substitutions (K50L and E61L) are introduced, the transmembrane region improves greatly. The stats for this rank are the following: pLDDT=62.4 pTM=0.565 ipTM=0.558.
        Multimeric Assembly:
        METRSPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLLAVIRTVTTLQQLLT:METRSPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLLAVIRTVTTLQQLLT:METRSPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLLAVIRTVTTLQQLLT:METRSPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLLAVIRTVTTLQQLLT:METRSPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLLAVIRTVTTLQQLLT
- Jason Results
  - Original Wild-Type Sequence 3D visualization and Sequence Coverage Map
  - Multimeric Assembly based on mutations listed in the “5 Mutations” section.
    - METRRPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFIAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLVEVIRTVTTLQQLLT
      - Multimeric Assembly Sequence 3D visualization and Sequence Coverage Map
        
        Unfortunately, based on the results above, it appears that my introduced mutations did not improve the lysis ability of the L protein.
- Nana Results
  - Multimeric Assembly: METRQPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTRRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYLLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT:METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVILTVTTLQQLLT
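
Because each multimeric assembly is a colon-separated string of 75-residue chains, it can be checked against the wild-type sequence before it is submitted for structure prediction. The following is a minimal sketch with hypothetical helper names (`substitutions`, `describe_assembly`); it assumes every chain has the same length as the wild type (substitutions only, no insertions or deletions) and uses the last assembly listed above as its input.

```python
WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)

# The five-chain assembly from the Nana Results entry, one chain per line.
ASSEMBLY = ":".join([
    "METRQPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "METRFPQQSQQTPASTRRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYLLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT",
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVILTVTTLQQLLT",
])


def substitutions(wild_type, chain):
    """List substitutions (e.g. 'F5Q') that turn wild_type into chain."""
    if len(chain) != len(wild_type):
        raise ValueError("chain length differs from wild type; not a pure substitution")
    return [
        f"{w}{position}{c}"
        for position, (w, c) in enumerate(zip(wild_type, chain), start=1)
        if w != c
    ]


def describe_assembly(wild_type, assembly):
    """Return the substitutions found in each colon-separated chain."""
    return [substitutions(wild_type, chain) for chain in assembly.split(":")]


for label, found in zip("ABCDE", describe_assembly(WILD_TYPE, ASSEMBLY)):
    print(f"Chain {label}: {', '.join(found) or 'wild type'}")
```

Running the same check on the other assemblies in this section is a quick way to confirm that each chain carries only the substitution it is labeled with.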

All of Jason’s supporting prompts for this work are listed below

Supporting Prompt	Model
In the final 2 cells, what does the ’effect_esm’ variable mean? What do these numbers indicate from a scoring standpoint? How do they fit into a larger context regarding mutated amino acid efficacy? Do NOT hallucinate when answering this prompt	Gemini
Tell me where (i.e., in which cell) the 2.56 mutation value is located. Do NOT hallucinate when answering this prompt	Gemini
I want you to go back to the following item from the response from 2 prompts ago: “In your notebook, you have a cell uploading ‘L-Protein Mutants - Sheet1.csv’, which contains experimental data like “Lysis” and “Protein Levels.” The goal of calculating effect_esm is often to see if the high model scores correlate with high experimental activity (efficacy) in the lab” Based on the effect_esm scores located in the final 2 cells, help interpret if or how the high model scores correlate with high experimental activity (efficacy) in the lab. Explain why there is (or isn’t) correlation based on these scores, and the underlying rationale(s) behind any/all correlation or lack thereof Do NOT hallucinate when answering this prompt	Gemini
In the cell containing ‘interactive_heatmap(protein_sequence)’ there’s a graph titled ‘Predicted Effects of Mutations on Protein Sequence (LLR)’. How is this graph supposed to be read? How does one find the transmembrane and soluble regions from this graph (if that’s something one can find on this graph)? If coordinates with a single letter and 2 digits (ex. K32) can be discerned from this graph, how can they be discerned? Do NOT hallucinate when answering this prompt	Gemini
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT Take the protein sequence above and confirm that the first 40 positions (1-40) are the soluble region and positions 41-75 are the transmembrane region. If this is correct, explain why, and if this is incorrect, explain why Do NOT hallucinate when answering this prompt	Gemini
Ok. So if I see a vertical selection in the ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph located in the cell containing ‘interactive_heatmap(protein_sequence)’ that seems to have a mostly yellow-ish or light green hue to it, what does that mean? What does it imply for nth mutations that could occur at that specific position in the sequence? Do NOT hallucinate when addressing this prompt	Gemini
If the first position in the following sequence shows up as a dark blue/almost purple color in the ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph located in the cell containing ‘interactive_heatmap(protein_sequence)’, does that indicate a conserved region, or a region that should NOT be subject to amino acid mutations? Do NOT hallucinate when answering this prompt	Gemini
Ok. So based on that logic, where/how do I find other conserved regions/essentially no go areas for amino acid mutations based on the information located in the ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph located in the cell containing ‘interactive_heatmap(protein_sequence)’? Are these indicated by blue-ish or purple horizontal bands emanating from particular amino acid mutations located on the Y axis of the graph? Clarify these questions Do NOT hallucinate when answering this prompt	Gemini
Yes, identify the coordinates of these solid dark blue vertical columns	Gemini
Ok. I’d like you to take all the logic leading up to the creation of the ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph located in the cell containing ‘interactive_heatmap(protein_sequence)’ and extend that to a new cell at the bottom of this notebook containing an almost 100% identical ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph compared with the ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph located in the cell containing ‘interactive_heatmap(protein_sequence)’. The only difference is that I want visual indicators (maybe vertical red lines) indicating the beginnings and ends of the respective soluble and transmembrane regions of the sequence. This would make it easier to distinguish things in the graph. Leave all other code in the notebook untouched. Just extend the code behind the creation of the ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph located in the cell containing ‘interactive_heatmap(protein_sequence)’ and its underlying logic to a new cell at the bottom of this notebook. Do NOT hallucinate when creating this code	Gemini
Ok. So taking a look at the ‘Predicted Effects of Mutations on Protein Sequence (LLR)’ graph in the penultimate cell with the soluble and transmembrane regions, it looks like the following mutations are promising/favored: –Soluble Region (First F): Amino Acid Mutation Q –Soluble Region (First S): Amino Acid Mutation Q –Transmembrane Region (First C): Amino Acid Mutation L –Transmembrane Region (First C): Amino Acid Mutation R If you can give specific graph coordinates for each of the mutations listed above, do so. If there isn’t enough granularity in what was written above, indicate as such. Do NOT hallucinate when addressing this prompt	Gemini
Apologies, not understanding the following: –How is it that C29 sits in the Soluble N-terminal region if the positions on the graph indicate it is within the Transmembrane region? Is there something wrong with the graph? How can these points be reconciled? –Not seeing a second C at position 65 within the wild-type sequence (METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT) Do NOT hallucinate when replying to this prompt. If you are at risk of hallucinating, indicate as such	Gemini
Ok. So not exactly following the logic behind the placement of the dashed lines indicating the positional start and the ends of the soluble and transmembrane regions based on the X axis information. By that logic it visually looks like the first C located within the wild-type sequence (METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT) falls right inside the transmembrane region. Clarify what’s going on Do NOT hallucinate when answering this prompt	Gemini
Yeah that’s the issue. Taking a look at this excerpt from the answer to the last prompt and seeing the essence of the issue: “You will find that the C is the 29th letter. If the red line is to the right of that C, then the C is correctly in the Soluble region.” The red line is to the left of the C in question. Clarify what’s going on here Do NOT hallucinate when answering this prompt	Gemini
Yes, please provide this code block	Gemini
Please explain this error:	Gemini
For the mutations that are favored/more yellow colored in the ‘Corrected Protein Regions: Soluble v. Transmembrane’ graph located in cell ‘0ZrSuXEGIzsR’, what is the biological logic/rationale behind why the model is giving these mutations yellow colors? Is it binding ability counterbalanced against potential damage to the protein structurally or negative therapeutic effects (toxicity, etc.)? Help clarify this (or these) biological logic/rationales Do NOT hallucinate when addressing this prompt	Gemini
To clarify, when we say the term ‘residue’ in the context of proteins, are we referring to amino acid positions located within the wild-type sequence (METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT)? Do NOT hallucinate when addressing this prompt	Gemini
Ok. Tell me about the following residues (including their formal names) and why mutating them would or might make sense or would be favorable within the context of the wild-type sequence (METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT): –Soluble Region: F5, S9, C29 –Transmembrane Region: K50 Do NOT hallucinate when addressing this prompt	Gemini
Might there be a similar K50L-style trade-off for E61L? Why or why not? Do NOT hallucinate when addressing this query	Gemini
So according to the ‘functional defect’ logic described during the course of the last 2 prompts, which 2 alternate yellow-ish colored mutations located in the Transmembrane region would be ideal for mutation, that would NOT hinder lysing abilities (i.e. ability to kill desired cells)? Do NOT hallucinate when addressing this prompt	Gemini
The residue in location 52 is labeled T. Is that Alanine? Did you mean residue 62 when describing Alanine in your reply to the last prompt?	Gemini
My only concern with these chosen mutations in the transmembrane region is that they’re not exactly yellow. They seem to be shades of lighter green, which isn’t outright blue, but still not exactly screamingly positive Do NOT hallucinate when addressing this prompt	Gemini
Think the following mutations might be worth pursuing based on their color (see ‘Corrected Protein Regions: Soluble v. Transmembrane’ graph in penultimate cell in workbook as reference): Reference on how to read what I wrote regarding the ‘Soluble Region’ and ‘Transmembrane Region’ below: LETTER[NUMBER] (X Axis)_Letter (Y Axis) Soluble Region: –F5_R –S9_Q –C29_R Transmembrane Region: –L44_I –A62_V This is all based on the following wild-type sequence (METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT) Want to go beyond color justification to talk about the benefits of each mutation from a structural and/or functional standpoint Do NOT hallucinate when addressing this prompt	Gemini
What is a thiol group in simple terms? Why would swapping out the Cysteine for Arginine in C29 –> R be useful due to Arginine’s positive charge? Explain in simple terms	Gemini
What is the lipid bilayer? How does it relate to A62_V?	Gemini
How do I make a multimeric assembly based on mutations to the wild-type sequence located under the ‘Input protein sequence(s), then hit Runtime -> Run all’ cell? Do I just add each mutated sequence next to each other separated by colons and go from there? Do NOT hallucinate when addressing this query	Gemini
Ok. Here’s my wild-type sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT Output 5 individual/separate mutated sequences based on the wild sequence above on the following modifications per individual/separated sequence: –Change the ‘F’ in the 5th position left of the 1st ‘M’ to an ‘R’ –Change the ‘S’ in the 9th position left of the 1st ‘M’ to a ‘Q’ –Change the ‘C’ in the 29th position left of the 1st ‘M’ to an ‘R’ –Change the ‘L’ in the 44th position left of the 1st ‘M’ to an ‘I’ –Change the ‘A’ in the 62nd position left of the 1st ‘M’ to a ‘V’ Do NOT hallucinate when addressing this prompt	Gemini
Thank you. What does the ‘Sequence coverage’ graphic under the ‘Run Prediction’ cell mean/display exactly? Do NOT hallucinate when addressing this prompt	Gemini
What do the results located under the ‘Run Prediction’ and ‘Display 3D structure’ cells indicate about the mutated sequences inputted? Are they strong or weak in combination relative to the goal of improving the stability and autofolding of the Lysis protein in the MS2 bacteriophage (from which the original wild-type sequence originated). I believe the results seem to be suboptimal based on the shape and color outputted in the ‘Display 3D structure’ cell but I’m not sure if that’s the case, and if it is, why that’s the case Do NOT hallucinate when addressing this prompt	Gemini
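
Several of the prompts above turn on position bookkeeping: whether C29 really is the 29th residue, whether residue 52 is T, and what the five single-point mutants look like. A minimal Python sketch of how those checks and substitutions could be done programmatically is below. It is not the code Gemini produced. The names `apply_point_mutation`, `MUTATIONS`, and `multimer_input` are hypothetical, positions are counted from 1 starting at the initial M, and joining chains with colons is an assumption about the folding notebook's input format that should be confirmed against its input cell.

```python
import re

# Wild-type sequence quoted in the prompts above.
WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)

# Mutation notation used here: wild-type residue, 1-based position, new residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(sequence, mutation):
    """Return a copy of `sequence` with one substitution such as 'C29R' applied.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the residue at that position is not the one the notation names.
    """
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_residue, position_text, new_residue = match.groups()
    position = int(position_text)  # 1-based, counted from the first residue
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    found = sequence[position - 1]  # convert to Python's 0-based indexing
    if found != wild_residue:
        raise ValueError(
            f"{mutation}: expected {wild_residue} at position {position}, found {found}"
        )
    return sequence[: position - 1] + new_residue + sequence[position:]


# The five substitutions listed in the prompt log (hypothetical variable name).
MUTATIONS = ["F5R", "S9Q", "C29R", "L44I", "A62V"]

# One separate mutant per substitution, each derived from the wild type.
mutants = {name: apply_point_mutation(WILD_TYPE, name) for name in MUTATIONS}

# Assumption: the folding notebook accepts multiple chains separated by colons.
multimer_input = ":".join(mutants.values())

if __name__ == "__main__":
    for name, mutant in mutants.items():
        print(name, mutant)
    print(multimer_input)
```

Because the function checks the wild-type residue before substituting, a mislabeled position (for example, asking for a C at position 65) fails loudly instead of silently producing the wrong mutant.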

This Bacteriophage Engineering Group Project was completed by a select group of William & Mary Node students in March 2026 (Xavier Lewis-Palmer, Raphael Aca, Nana Agyei Afrane-Asare, and Jason Ross) ↩︎
This is a Markdown version of the “HTGAA_Bacteriophage Engineering Group Project Brainstorming Doc.__William & Mary Node” Google Doc. (https://docs.google.com/document/d/1676c1tgFUlGaP-Bwp9_vDexbk3VsOJeuQeylNfvz76o/edit?usp=sharing) ↩︎
Based on Opt. 1: Mutagenesis workflow in Protein Design II - Phage HW Sheet (https://docs.google.com/document/d/e/2PACX-1vSKF3Q5PY_T-McPiQoCVr6A9HpUxHedSAPmqikf9pHeoMM1Xt_EDAKOuUR0WlNMP-TZAMErUPbARhGh/pub) ↩︎

Subsections of Jason Ross — HTGAA Spring 2026

Homework

Weekly homework submissions:

Subsections of Homework

Week 1 HW: Principles and Practices

Week 1 Biological Engineering Application Governance Exercise

Week 1 Homework Questions

Professor Jacobson Questions

Dr. LeProust Questions

George Church Question

Week 2 HW: DNA Read, Write, and Edit

Part 0: Basics of Gel Electrophoresis

Part 1: Benchling & In-silico Gel Art

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Part 3: DNA Design Challenge

3.1 Choose your protein

3.2 Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

3.3 Codon Optimization

3.4: You have a sequence! Now what?

3.5 Optional: How does it work in nature/biological systems?

Part 4: Prepare a Twist DNA Synthesis Order

4.1: Create a Twist account, and Benchling account

4.2: Build Your DNA Insert Sequence

4.3: On Twist, Select the Genes Option

4.4: Select the ‘Clonal Genes’ Option

4.5: Import your sequence

4.6: Choose Your Vector

Part 5: DNA Read/Write/Edit

5.1 DNA Read

5.2 DNA Write

5.3 DNA Edit

Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork

Post-Lab Questions

Final Project Ideas

Week 4 HW: Protein Design Part 1

Part A: Conceptual Questions

Part B: Protein Analysis and Visualization

Part C: Using ML-Based Protein Design Tools

C1. Protein Language Modeling

C2. Protein Folding

C3. Protein Generation

Part D: Group Brainstorm on Bacteriophage Engineering

Week 5 HW: Protein Design Part 2

Part A: SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

Part 2: Evaluate Binders with AlphaFold3

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Part 4: Generate Optimized Peptides with moPPIt

Part B: BRD4 Drug Discovery Platform Tutorial (Optional)

Part C: Final Project: L-Protein Mutants

Week 6 HW: Genetic Circuits Part 1

DNA Assembly

Asimov Kernel

Week 7 HW: Genetic Circuits Part 2

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

Assignment Part 2: Fungal Materials

Assignment Part 3: First DNA Twist Order

Week 9 HW: Cell-Free Systems

Homework Part A: General and Lecturer-Specific Questions

General Homework Questions

Homework question from Kate Adamala

Homework question from Peter Nguyen

Homework question from Ally Huang

Homework Part B: Individual Final Project

Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

Homework: Waters Part I — Molecular Weight

Homework: Waters Part II — Secondary/Tertiary Structure

Homework: Waters Part III - Peptide Mapping - Primary Structure

Homework: Waters Part IV - Oligomers

Homework: Waters Part V - Did I Make GFP?

Week 11 HW: Bioproduction and Cloud Labs

Part 1: Global Pixel Artwork Cloud Lab Contribution