Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Class Assignment — DUE BY START OF FEB 10 LECTURE Question 01 First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about. According to the well-regarded popular science writer Matthew Cobb (2022), since the Asilomar conference in 1975, molecular biologists have been the vanguard in self-regulating when playing God. This means we refrain from conducting our research irresponsibly by deploying unnecessarily hazardous experimental methods. Alas, this also means that some of the most exciting genetic engineering is no longer done. Consider Dr. Oswald Avery’s transforming principle experiment. Blindly take a population of virulent pneumonia bacteria and feed them harmless kin until they lose their aggressive function and magically adapt into weak and indifferent pneumonia. Since Asilomar, this is indeed one kind of experiment that trustworthy principal investigators must abstain from. I get it, and still I contemplate. Wasn’t Avery the best of us, though? Between Schrodinger and Watson, Crick, and Franklin – Dr. Avery intuited DNA into existence with his transforming principle and used it effectively. Surely I didn’t name my oldest son after this man for nothing?

  • Week 2 HW: DNA Read Write and Edit

    Table of contents Software used: Terminal, git, xcode, hugo, benchling, rcdonovan website, twist website. Objective: This week explores the read–write–edit toolkit: sequencing and synthesis workflows, restriction digests and gel electrophoresis, and early genome-editing frameworks.

  • Week 3 HW: Lab Automation

    Homework for HTGAA 2026 (Week 03): Lab Automation Table of contents Software used: Terminal, git, Opentrons, rcdonovan website, Google Colab. Objective: This week we get hands-on (or at least code-on) with pipetting robots.

  • Week 4 HW: Protein Design Part I

    Homework: Protein Design I Assignment Objective: Learn basic concepts: amino acid structure, 3D protein visualization, and the variety of ML-based design tools. Brainstorm as a group how to apply these tools to engineer a better bacteriophage (setting the stage for the final project).

Subsections of Homework

Week 1 HW: Principles and Practices

cover image cover image

Class Assignment — DUE BY START OF FEB 10 LECTURE

Question 01

First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

According to the well-regarded popular science writer Matthew Cobb (2022), since the Asilomar conference in 1975, molecular biologists have been the vanguard in self-regulating when playing God. This means we refrain from conducting our research irresponsibly by deploying unnecessarily hazardous experimental methods. Alas, this also means that some of the most exciting genetic engineering is no longer done. Consider Dr. Oswald Avery’s transforming principle experiment. Blindly take a population of virulent pneumonia bacteria and feed them harmless kin until they lose their aggressive function and magically adapt into weak and indifferent pneumonia. Since Asilomar, this is indeed one kind of experiment that trustworthy principal investigators must abstain from. I get it, and still I contemplate. Wasn’t Avery the best of us, though? Between Schrodinger and Watson, Crick, and Franklin – Dr. Avery intuited DNA into existence with his transforming principle and used it effectively. Surely I didn’t name my oldest son after this man for nothing?

Unlike Dr. Avery, I am fortunate to be proposing my HTGAA 2026 project after the discovery of DNA and the Asilomar conference and the Once-in-a-Century Pandemic when mRNA Vaccines and CRISPR gene editing approaches were available. However, like Avery, with molecular biology, we can still take a population perspective to current wicked health problems. My first professional mentor, Dr. Paul Farmer, would often credit his milieu caring for the poorest of the poor as the center of his mission. Though I once aspired to that also, I now reflect that I am fortunate to just be an aspiring Molecular Biologist. In addition, when I marvel over everything that has been achieved since Avery, especially since the Human Genome Project and the advances of systems biology to synthetic biology, I see there are now viable alternatives in biological practice to help others and the living world.

This brings me to the project. I agree with Dr. Aubrey De Grey that biological aging is a vexing, immutable inequality in public health that must be solved. In fact, I am engaged in this research with my excellent Biology PHD mentors at North Carolina Agricultural and Technical State University (NCATSU), and one of them was on the team that first postponed senescence in Drosophila back when Star Wars movies were worthy of the hype.

Like Dr. De Grey, I believe exit velocity will be achieved in our lifetimes by engineering negligible senescence. The difference is that his model species are cohorts of robust, rejuvenated rodents centralized in a single laboratory, and I propose we develop many sites and open science approaches using goats instead. I also think that we will need to develop applied computational systems biology simulators (synthetic biology simulators too if they exist) and at the center of the approach needs to be the host-microbiome.

Why goats? They’re not even a monogastric species. Please hear me out.

According to ChatGPT, the oldest recorded goat in the Guinness World Records is McGinty (22 years, 5 months). The buck was a Brition, he was male, and from a Pygmy breed. My understanding is that Pygmy goats were originally bred to feed large cats. This record was set in 2003 and I assume it hasn’t been challenged since. Although I never had the pleasure of meeting McGinty, the general indifference evident by his nefarious name and the dusting of a few social media posts and overall absence of life-history information makes plain that likely society gave up on even understanding goat longevity decades ago. This means that despite living among Homo sapiens for more than 10,000 years and sustaining us in every challenging environment on Earth, we still know more about goats’ genomic diversity than life history. That’s not a bad thing, though, because goats’ genomes and immune systems are as infinitely fascinating as our own.

In addition to not being popular, goats live a life preoccupied by parasites, predators, and food insecurity that is only moderately improved by domestication, let’s be honest. I often reflect on the goats I met in the Galapagos Islands – the first example of extreme biological environments. Goats are not indigenous to the Galapagos. They are migrants. They didn’t migrate there on their own volition, though – instead they brought in rafts and boats a Century ago, and still to this few re-wilded stragglers refuse to go extinct. In fact it’s hard to find an island in the Galapagos that doesn’t have a pile of goat skulls on it. I understand the issue is complicated but either way you land on the issue, it’s hard to deny that goats are specialists in acclimating in extreme environments. Ironically, it’s Charles Darwin’s theory of Natural Selection that I would like to structure the computational systems biology goat longevity simulator around, particularly using Neo-Darwinian genetics and postponement of senescence work by Rose, Muller, Luckinbill, and Graves.

I propose a Long Term Experimental Evolution (LTEE) study that leverages synthetic biology and local animal husbandry to study the role of gut microbiomes on cellular senescence in goats. I hypothesize that understanding diversity and abundance in genetic circuitry constituting biological signaling pathways between adaptive, senescence-resistant microbes and Metazoan somatic tissues will yield the putative attractor switches we need to cure cellular senescence and put apoptosis on a toggle switch. Theoretically, though I certainly don’t plan to achieve this in 10 weeks or morally at all. The point is that once you understand that one contingent evolutionary endosymbiotic event transformed an alpha-proteobacteria into the power center for every Metazoan cell that came after, and then the effects of the mitochondria on oxidative stress accumulation and stabilization. Inevitably, we can trust that the solution to aging in somatic cells will never again be an if question.

Endpoints I will be investigating are biologically and statistically significant variation in “aging” host and microbe genes identified through differential gene expression. The study will be a multigenerational LTEE for Synthetic Biology 101, targeting the bidirectional interactions between living goat genes and pathways and the microbiota in their gut. My stakeholders are the American Milk Goat Breeding Association and Nanopore, and every isolated mountain village or homesteader that is still alive because of their goat herd.


Question 2

GPG01: Explore synthetic biology for goat life history for a putative Mitocarta or SASP gene and phenotypic pathways that may be useful in future studies to bioengineer negligible senescence in goats.

GPG02: Integrate aim 1 gene with OMICs data using computational model to explore molecule mediated bidirectional interactions between somatic host cells and microbes in goat microbiomes.

GPG03: Consider systems-level synthetic biology interventions for extreme environments that support goat metabolism and gut microbiome health.


Question 3

Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design,

Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.). Purpose: What is done now and what changes are you proposing? Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc) Assumptions: What could you have wrong (incorrect assumptions, uncertainties)? Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

CN Answer Purpose: Humans have been artifically selecting phenotypes in goats to consume for our benefit for more than 10,000 years. I understand goat meat, milk, and fiber are necessary for human differential reproductive success and maintenance. I actualy have a working goat farm. My purpose is not adjudicate my species, or rescue another, my intention is to use this incredible opportunity to make ammends to another population of Metazoans by helping my peers use synthetic biology to help goats live longer, higher quality lives. I do not pursue this idea to make a more profitable goat commodity either. In summation my reasoning at HW 1 is: because I know postponing senescence in Metazoans is possible and I care about the welfare of all goats, I want to help others and myself advance negligible senesence in goat somatic cells safely and humanly too. The changes I am proposing for my HTGAA 2026 project though are only to expand fair, accurate, timely, accessible open science data about goat life history and genetics, so through synthetic biology we can help goats live longer, healthier lives.

Design: Based on what I know about Synthetic Biology today, which is far less than I care to admit without embarassment on a public website. What is needed to make it “work” is Dr. Aubrey De Grey brillance, vision, ability and a sincere heart for animal welfare. Eventually a sustainable research enterprise plan will be useful to achieve endpoints, quality benchmarks, and safety standards. Let me make a clear point first though, all I am proposing at this juncture is cast out net for data, reel it in and evaluate what I find. This review will require oversite from experts – can anyone put me in contact with Dr. George Church or Dr. Aubrey De Grey?

Assumptions: I love this question because I am a scientist and I think no other discipline is more pragmatic than us when it comes to how we manage uncertainty. This is the crux about assumptions. Uncertainty is dangerous. Case and point, because I care about goat welfare and recognize I do not understand enough about Synthetic Biology interventions to expect what I don’t understand about goat life-history, physiology, and genetics I would never do anything to disrupt in vivo what I am learning to explain and make predictions about – that being the bioengineering of negligible senesence in goats.

Risks of Failure & “Success”: Here risk communication and management are key. I was an Epidemiologist for 20 years before going back to school. In Epidemiology although all Pandemics are orphans, a breech in prevention is always the root cause. I say this to explain why I am so proactive about preventing failure, especially when it comes to public health. Another example, is part of my PhD training was working in a Molecular Microbiology lab on a LTEE for NASA. Here a significant portion of the job is monitor and improve protocols and practice to minimize contamination, especially on a 100 day LTEE study.


Stakeholders: Registry of Standard Biological Parts (RSBP), SAB Biotherapeutics (SABBio), World Health Organization (WHO), Rocky Hill Farm in WV (RHFWV)

Rating Scale: ♛ Most Effective ♞ Moderately Effective ♟ Minimally Effective

Does the option:RSBPSABBioWHO
Explore synthetic biology for goat life history for a putative MitoCarta or SASP gene and phenotypic pathways that may be useful in future studies to bioengineer negligible senescence in goats.
• By reducing uncertainty about the life history of goats.
• By reducing uncertainty about synthetic biology interventions for negligible senescence in goats.
Integrate aim 1 gene with OMICs data using computational model to explore molecule mediated bidirectional interactions between somatic host cells and microbes in goat microbiomes.
• By mapping major biological signaling pathways where communication goes from goat somatic cell to -> GIT microbiome
• By mapping major biological signaling pathways where communication goes from GIT microbe in GIT microbiome to goat somatic cell or system
Consider systems-level synthetic biology interventions for extreme environments that support goat metabolism and gut microbiome health.
• By cataloging goat metabolites and microbiota and their interactions
• By modeling seed to goat food webs for diverse local environments.
• By writing an aspirational study protocol.
Other considerations
• Minimizing costs and burdens to stakeholders
• Feasibility?
• Not impede research
• Promote constructive applications

Assignment (Week 2 Lecture Prep) — DUE BY START OF FEB 10 LECTURE

In preparation for Week 2’s lecture on “DNA Read, Write, and Edit," please review these materials: Lecture 2 slides. The associated papers that are referenced in those slides. In addition, answer these questions in each faculty member’s section:

Homework Questions from Professor Jacobson:

Question 1

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

Based on the deck the error rate of polymerase is 1:10^6 or one error for every 1,000,000 base pairs. The size of the human genome according to the Molecular Biology of a Gene by Watson et al. (2007) the human genome is 3200 Mega base pairs in length which converts to 3,200,000,000 base pairs. Biology deals with the discrepancy through redundancy and replication forks moving from many different insertion sites at the same time. This way the redundancy offsets the discrepancy in the error rate. However errors still occur.


Question 2

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

There are two ways that I am aware of, Mass Spec and Edman Degredation. Both of these techniques identify amino residues that are synthesized as triple codons for a varity of lengths and structures. The total number of probable combinations is 3 codons multipled by 20 possible amino acids.


Homework Questions from Dr. LeProust: [Lecture 2 slides]

Question 1

What’s the most commonly used method for oligo synthesis currently?

Amplicon-Based Assays


Question 2

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Turn-around time on results due to added complexity from higher Chimera rate, drop out rate, and uniformity constraints above 100nt


Question 3

Why can’t you make a 2000bp gene via direct oligo synthesis?

I couldn’t find an exact answer in the deck, but an article by Yin et al (2024) cited below, which is relatively up-to-date, reports that the current length record for direct oligo synthesis is between 800 mer - 1728 mer. This alone is an accomplishment since authors explain that the rate of errors increases significantly above 100nt. The article also discusses the original 1000nt ceiling due to the steric hindrance of the substrate macromolecule. Please forgive my answer being a little choppy; I am still learning how to converse in this language.

Yin, Y., Arneson, R., Yuan, Y., and Fang, S. (2025). Long oligos: Direct chemical synthesis of genes with up to 1728 nucleotides. Chemical Science, 16(4), 1966–1973. https://doi.org/10.1039/D4SC06958G


Homework Question from George Church: [Lecture 2 slides]

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

Question 1

[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?


Question 2

[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

Here I think you are asking me to provide an alternative code for the foundation of life? The irony is that is why I am here, I want you all to teach me how to rewrite code for AA:AA interactions. I know this because in preparing my answer I ran AA:AA interactions by AI for a second oppinion. My prompt was “what do you think Dr. George Church means by code for AA:AA interactions”. AI tells me that this code may need to demonstrate how sequences of amino acids influence physical interaction rules, which I interpret to be first order principles. Therefore what code would I suggest to influence AA:AA interactions that are able (if AI’s tip is correct) to perturb first order principles. My code as Dr. Nick Lane would say, should be efficient at turning the feedback loops of matter into biophysical waves of energy. In further prepartion I would turn to my greatest advisors in Natural Selection adaptation, bacteria and their metabolic motifs in Metazoans. I would need a coding system that respects phylogeny and doesn’t immagine I could ever devise a coding system more ingenious than the Krebs cycle or the molecularly machinery behind the deprotonation of hydrogen by Complex 5 in the Electron Transfer Chain. Still it’s a fascinating thought experiment if nothing else. To this end, I used your Acevodo-Rocha et al. (2016) paper and AI to find the Poliseno et al. (2024) paper and though I am not up-to-date on this team I do suspect there is an Epidemiologist among them, because their example of concordant and discordant pairing of coding and noncoding functions is what my coding system would be based on to optimize around the canonical rigidity of present AA:AA interaction.

Acevedo‐Rocha, C. G., & Budisa, N. (2016). Xenomicrobiology: A roadmap for genetic code engineering. Microbial Biotechnology, 9(5), 666–676. https://doi.org/10.1111/1751-7915.12398

Poliseno, L., Lanza, M., & Pandolfi, P. P. (2024). Coding, or non-coding, that is the question. Cell Research, 34(9), 609–629. https://doi.org/10.1038/s41422-024-00975-8

[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or

devise one of your own:

https://arpa-h.gov/explore-funding/programs/boss https://www.darpa.mil/research/programs/smart-rbc https://www.darpa.mil/research/programs/go


Assignment (Your HTGAA Website) — DUE BY START OF FEB 10 LECTURE

Begin personalizing your HTGAA website in https://edit.htgaa.org/, starting with your homepage — fill in the template with

information about yourself, or remove what’s there and make it your own. Be creative! As with all assignments in HTGAA, be sure to

write up every part of this Homework on your HTGAA website in order to receive credit.


Important

For this week only, once your homework is complete and written up on your HTGAA website (and you’ve checked your published website at pages.htgaa.org and are happy with it), fill out the Homework 1 Completion form which David emailed out just after Lecture 1. This Google form expresses your interest in continuing with the course; without it you will not be accepted in HTGAA!

Week 2 HW: DNA Read Write and Edit

Table of contents

Software used:

  • Terminal,
  • git,
  • xcode,
  • hugo,
  • benchling,
  • rcdonovan website,
  • twist website.

Objective:

This week explores the read–write–edit toolkit: sequencing and synthesis workflows, restriction digests and gel electrophoresis, and early genome-editing frameworks.

Background:

DNA Read (George Church), Write (Joe Jacobson), & Edit (Emily Leproust). In addition to recitation and Tokyo Biohub node lab meetings

Methods:

  • Start with touchpoint of Design stage of SynBio DBTL cycle with In-silico Gel Art
  • Build DNA fragments in Benchling with restriction digests for Testing with Gel Electrophoresis
  • Learn from Benchling work & In-silico Gel Art
  • Start to Design or
  • Gel Electrophoresis
  • Obtain protein sequences
  • Plasmid digestion with restriction enzymes,
  • Preparing Twist DNA Synthesis Order

Tasks:

  1. Documentation
  • Make sure to document every step of the in-silico and lab experiments. Make sketches, screenshots, notes, drawings… anything that helps you - and others - understand the experiment. Your documentation should help you - and others - to understand the topic. Don’t be afraid to add things that don’t work. Show your failures - and how you overcame them. Your Documentation should be a description of the amazing journey you are on!
  1. Part 0: Basics of Gel Electrophoresis
  • Attend or watch all the lectures and recitation videos. Optionally watch bootcamp.

Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details.

Overview:

  • The EcoRI RE is sourced from Escherichia coli> with palladrome cut at AATT 5’-GAATTC-3’ 3’-CTTAAG-5’ leaving a 5’ sticky end.
  • The BamHI RE is sourced from Bacillus amyloliquefaciens and scans for 5’-GGATCC-3’ 3’-CCTAGG-5’ to cut between G and G leaving a 5’ sticky end.
  • The HindIII RE is sourced from Haemophilus influenzae and scans for 5’-AAGCTT-3’ 3’-TTCGAA-5’ leaving a 5’ sticky end.
  • The KpnI RE is sourced from Klebsiella pneumoniae, it requires small molecule cofactors including Mg and Ca ions to complete cut with fidelity; uses 5’-GGTACC-3’ 3’-CCATGG-5’ and rather uniquely for this experimental RE set leaves a 3’ sticky end.
  • The EcoRV RE is sourced from Escherichia coli also and scans for 5’-GATATC-3’ 3’-GTATAG-5’ and leaves the blunt end for this RE set.
  • The SacI RE is sourced from Streptomyces achromogenes and scans for 5’-GAGCTC-3’ 3’-GTCGAG-5’ leaving a 5’ sticky end.
  • The SaII RE is sourced from Streptomyces albus and scans for 5’-GTCGAC-3’ 3’-CAGCTG-5’ leaving a 5’ sticky end.
  • Source: Recognition sequences and cleavage patterns were verified using the REBASE database (Roberts et al., 2015).
  • Create a pattern/image w/style of Paul Vanouse’s Latent Figure Protocol artworks.
  • Use Ronan’s website as a helpful tool for quickly iterating on designs! Here is the link [https://rcdonovan.com/gel-art].
Benchling_Virtual_Digest_Report Benchling_Virtual_Digest_Report

HW2 is structured purposefully to make us think like synbio engineers. For example, the reason we transition from Gel Electrophoresis to Restriction Digests is because we cannot move large strands of DNA and RNA through the GE matrix. We need small enough pieces of readable genetic material just to accomplish the lab assay. This makes RD a function necessary to achieve our design objectives. Benchling is a similar addition to the HW2 learning module, we need to see the restriction digests applied on our Lamba model and the computational ladder for converting the pieces of plasmid DNA in our GE matrix, it then helps that we can use Benchling in subsequent steps also.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.

Now if your mind works like mine it might seem abrupt to leap from the movement of DNA through GE matrix to proteins but not if you understand the Central Dogma, sure, but even more the SynBio Design, Build, Test, Learn loop.

  • [https://doi.org/10.1371/journal.pbio.3002116]

Add a Bacterial chromosome and plasmid sequenced with Oxford Nanopore MiniON because I am annoyingly meticulous with discovery. In my HW2 discussion questions I am going to sing praises to Nanopore so also better to be consistent in DNA read inputs. I will download chromosome and plasmid DNA and load into Benchling. Please note the Genbank files do not play nicely with Benchling, so I will need to shift to FASTAs. Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 > Chromosome GenBank: https://www.ncbi.nlm.nih.gov/nuccore/CP033092.2/ > CP033092.2 Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome > Plasmid GenBank: https://www.ncbi.nlm.nih.gov/nuccore/CP033091.2/ > CP033091.2 Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 plasmid unnamed, complete sequence

Part 3: DNA Design Challenge

3.1. Choose your protein.

  • In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
  • [Example from our group homework, you may notice the particular format — The example below came from UniProt]
  • sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL EAVIRTVTTLQQLLT Considering: RpoS in E. coli K-12 will download Amino Acids for protein below and convert backwards to genome if I do not find an online reference that isn’t deleted.
83333_0:000b85 {"organism":"Escherichia coli K-12","genome_id":"GCF_000974885.1","pub_prot_id":"WP_000081588.1","pub_gene_id":"SF31_RS18190","description":"RNA polymerase sigma factor RpoS"}
MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEQEPSDNDLAEEELLSQGATQRVLDATQLYLGEIGYSPLLTAEEEVYFARRALRGDVASRRRMIESNLRLVVKIARRYGNRGLALLDLIEEGNLGLIRAVEKFDPERGFRFSTYATWWIRQTIERAIMNQTRTIRLPIHIVKELNVYLRTARELSHKLDHEPSAEEIAEQLDKPVDDVSRMLRLNERITSVDTPLGGDSEKALLDILADEKENGPEDTTQDDDMKQSIVKWLFELNAKQREVLARRFGLLGYEAATLEDVGREIGLTRERVRQIQVEGLRRLREILQTQGLNIEALFRE

https://www.ebi.ac.uk/interpro/result/InterProScan/iprscan5-R20260216-160122-0718-15835993-p1m/internal-1771257679016-348-1/ https://alphafold.ebi.ac.uk/entry/P13445 https://www.ncbi.nlm.nih.gov/datasets/gene/GCF_003697165.1/

Protien Code for RpoS Gene from NZ_CP033092.1:4177924-4178988 Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome

MSQNTLKVHDLNEDAEFDENGVEVFDEKALVEEEPSDNDLAEEELLSQGATQRVLDATQLYLGEIGYSPLLTAEEEVYFARRALRGDVASRRRMIESNLRLVVKIARRYGNRGLALLDLIEEGNLGLIRAVEKFDPERGFRFSTYATWWIRQTIERAIMNQTRTIRLPIHIVKELNVYLRTARELSHKLDHEPSAEEIAEQLDKPVDDVSRMLRLNERITSVDTPLGGDSEKALLDILADEKENGPEDTTQDDDMKQSIVKWLFELNAKQREVLARRFGLLGYEAATLEDVGREIGLTRERVRQIQVEGLRRLREILQTQGLNIEALFREEVSICQKGQSQARLAFFLLVHGTC*

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

  • The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
  • [Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]
  • Lysis protein DNA sequence atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa
  • Nucleotide Sequence for my gene pick
>gnl|ECOLI|EG10510 rpoS RPOS-MONOMER (complement(2866559..2867551)) Escherichia coli K-12 substr. MG1655
atgAGTCAGA ATACGCTGAA AGTTCATGAT TTAAATGAAG ATGCGGAATT TGATGAGAAC
GGAGTTGAGG TTTTTGACGA AAAGGCCTTA GTAGAACAGG AACCCAGTGA TAACGATTTG
GCCGAAGAGG AACTGTTATC GCAGGGAGCC ACACAGCGTG TGTTGGACGC GACTCAGCTT
TACCTTGGTG AGATTGGTTA TTCACCACTG TTAACGGCCG AAGAAGAAGT TTATTTTGCG
CGTCGCGCAC TGCGTGGAGA TGTCGCCTCT CGCCGCCGGA TGATCGAGAG TAACTTGCGT
CTGGTGGTAA AAATTGCCCG CCGTTATGGC AATCGTGGTC TGGCGTTGCT GGACCTTATC
GAAGAGGGCA ACCTGGGGCT GATCCGCGCG GTAGAGAAGT TTGACCCGGA ACGTGGTTTC
CGCTTCTCAA CATACGCAAC CTGGTGGATT CGCCAGACGA TTGAACGGGC GATTATGAAC
CAAACCCGTA CTATTCGTTT GCCGATTCAC ATCGTAAAGG AGCTGAACGT TTACCTGCGA
ACCGCACGTG AGTTGTCCCA TAAGCTGGAC CATGAACCAA GTGCGGAAGA GATCGCAGAG
CAACTGGATA AGCCAGTTGA TGACGTCAGC CGTATGCTTC GTCTTAACGA GCGCATTACC
TCGGTAGACA CCCCGCTGGG TGGTGATTCC GAAAAAGCGT TGCTGGACAT CCTGGCCGAT
GAAAAAGAGA ACGGTCCGGA AGATACCACG CAAGATGACG ATATGAAGCA GAGCATCGTC
AAATGGCTGT TCGAGCTGAA CGCCAAACAG CGTGAAGTGC TGGCACGTCG ATTCGGTTTG
CTGGGGTACG AAGCGGCAAC ACTGGAAGAT GTAGGTCGTG AAATTGGCCT CACCCGTGAA
CGTGTTCGCC AGATTCAGGT TGAAGGCCTG CGCCGTTTGC GCGAAATCCT GCAAACGCAG
GGGCTGAATA TCGAAGCGCT GTTCCGCGAG taa

Nucleotide Code for RpoS Gene from NZ_CP033092.1:4177924-4178988 Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome (Forward 5’ to 3')

ATGAGTCAGAATACGCTGAAAGTTCATGATTTAAATGAAGATGCGGAATTTGATGAGAACGGAGTTGAGGTTTTTGACGAAAAGGCCTTAGTAGAAGAGGAACCCAGTGATAACGATTTGGCCGAAGAGGAACTGTTATCGCAGGGAGCCACACAGCGTGTGCTGGACGCGACTCAGCTTTACCTTGGTGAGATTGGTTATTCACCACTGTTAACGGCCGAAGAAGAAGTTTATTTTGCGCGTCGCGCACTGCGTGGAGATGTCGCCTCTCGCCGCCGGATGATCGAGAGTAACTTGCGTCTGGTGGTAAAAATTGCCCGCCGTTATGGCAATCGTGGTCTGGCGTTGCTGGACCTGATCGAAGAGGGCAACCTGGGGCTGATCCGCGCGGTAGAGAAGTTTGACCCGGAACGTGGTTTCCGCTTCTCAACATACGCAACCTGGTGGATTCGCCAGACGATCGAACGGGCGATTATGAACCAAACCCGTACTATTCGTTTGCCGATTCACATCGTAAAGGAGCTGAACGTTTACCTGCGAACCGCACGTGAGTTGTCCCATAAGCTGGACCACGAACCAAGTGCGGAAGAGATCGCAGAGCAACTGGATAAGCCAGTTGATGACGTCAGCCGTATGCTTCGTCTTAACGAGCGCATTACCTCGGTAGACACCCCGCTGGGTGGTGATTCCGAAAAAGCGTTGCTGGACATCCTGGCCGATGAAAAAGAGAATGGTCCGGAAGATACCACGCAAGATGACGATATGAAGCAGAGCATCGTCAAATGGCTGTTCGAGCTGAACGCCAAACAGCGTGAAGTACTGGCACGTCGATTCGGTTTGCTGGGGTACGAAGCGGCAACACTGGAAGATGTAGGTCGTGAAATTGGCCTCACCCGTGAACGTGTTCGCCAGATTCAGGTTGAAGGCCTGCGCCGTTTGCGCGAAATCCTGCAAACGCAGGGGCTGAATATCGAAGCGCTGTTCCGCGAAGAAGTAAGCATCTGTCAGAAAGGCCAGTCTCAAGCGAGGCTGGCTTTTTTTCTTTTGGTACATGGTACATGTTGA

Reverse Compliment Nucleotide Code for RpoS Gene from NZ_CP033092.1:4177924-4178988 Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome (Reverse 3’ to 5')

TCAACATGTACCATGTACCAAAAGAAAAAAAGCCAGCCTCGCTTGAGACTGGCCTTTCTGACAGATGCTTACTTCTTCGCGGAACAGCGCTTCGATATTCAGCCCCTGCGTTTGCAGGATTTCGCGCAAACGGCGCAGGCCTTCAACCTGAATCTGGCGAACACGTTCACGGGTGAGGCCAATTTCACGACCTACATCTTCCAGTGTTGCCGCTTCGTACCCCAGCAAACCGAATCGACGTGCCAGTACTTCACGCTGTTTGGCGTTCAGCTCGAACAGCCATTTGACGATGCTCTGCTTCATATCGTCATCTTGCGTGGTATCTTCCGGACCATTCTCTTTTTCATCGGCCAGGATGTCCAGCAACGCTTTTTCGGAATCACCACCCAGCGGGGTGTCTACCGAGGTAATGCGCTCGTTAAGACGAAGCATACGGCTGACGTCATCAACTGGCTTATCCAGTTGCTCTGCGATCTCTTCCGCACTTGGTTCGTGGTCCAGCTTATGGGACAACTCACGTGCGGTTCGCAGGTAAACGTTCAGCTCCTTTACGATGTGAATCGGCAAACGAATAGTACGGGTTTGGTTCATAATCGCCCGTTCGATCGTCTGGCGAATCCACCAGGTTGCGTATGTTGAGAAGCGGAAACCACGTTCCGGGTCAAACTTCTCTACCGCGCGGATCAGCCCCAGGTTGCCCTCTTCGATCAGGTCCAGCAACGCCAGACCACGATTGCCATAACGGCGGGCAATTTTTACCACCAGACGCAAGTTACTCTCGATCATCCGGCGGCGAGAGGCGACATCTCCACGCAGTGCGCGACGCGCAAAATAAACTTCTTCTTCGGCCGTTAACAGTGGTGAATAACCAATCTCACCAAGGTAAAGCTGAGTCGCGTCCAGCACACGCTGTGTGGCTCCCTGCGATAACAGTTCCTCTTCGGCCAAATCGTTATCACTGGGTTCCTCTTCTACTAAGGCCTTTTCGTCAAAAACCTCAACTCCGTTCTCATCAAATTCCGCATCTTCATTTAAATCATGAACTTTCAGCGTATTCTGACTCAT

RNA Nucleotide Code for RpoS Gene from NZ_CP033092.1:4177924-4178988 Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 chromosome, complete genome

AUGAGUCAGAAUACGCUGAAAGUUCAUGAUUUAAAUGAAGAUGCGGAAUUUGAUGAGAACGGAGUUGAGGUUUUUGACGAAAAGGCCUUAGUAGAAGAGGAACCCAGUGAUAACGAUUUGGCCGAAGAGGAACUGUUAUCGCAGGGAGCCACACAGCGUGUGCUGGACGCGACUCAGCUUUACCUUGGUGAGAUUGGUUAUUCACCACUGUUAACGGCCGAAGAAGAAGUUUAUUUUGCGCGUCGCGCACUGCGUGGAGAUGUCGCCUCUCGCCGCCGGAUGAUCGAGAGUAACUUGCGUCUGGUGGUAAAAAUUGCCCGCCGUUAUGGCAAUCGUGGUCUGGCGUUGCUGGACCUGAUCGAAGAGGGCAACCUGGGGCUGAUCCGCGCGGUAGAGAAGUUUGACCCGGAACGUGGUUUCCGCUUCUCAACAUACGCAACCUGGUGGAUUCGCCAGACGAUCGAACGGGCGAUUAUGAACCAAACCCGUACUAUUCGUUUGCCGAUUCACAUCGUAAAGGAGCUGAACGUUUACCUGCGAACCGCACGUGAGUUGUCCCAUAAGCUGGACCACGAACCAAGUGCGGAAGAGAUCGCAGAGCAACUGGAUAAGCCAGUUGAUGACGUCAGCCGUAUGCUUCGUCUUAACGAGCGCAUUACCUCGGUAGACACCCCGCUGGGUGGUGAUUCCGAAAAAGCGUUGCUGGACAUCCUGGCCGAUGAAAAAGAGAAUGGUCCGGAAGAUACCACGCAAGAUGACGAUAUGAAGCAGAGCAUCGUCAAAUGGCUGUUCGAGCUGAACGCCAAACAGCGUGAAGUACUGGCACGUCGAUUCGGUUUGCUGGGGUACGAAGCGGCAACACUGGAAGAUGUAGGUCGUGAAAUUGGCCUCACCCGUGAACGUGUUCGCCAGAUUCAGGUUGAAGGCCUGCGCCGUUUGCGCGAAAUCCUGCAAACGCAGGGGCUGAAUAUCGAAGCGCUGUUCCGCGAAGAAGUAAGCAUCUGUCAGAAAGGCCAGUCUCAAGCGAGGCUGGCUUUUUUUCUUUUGGUACAUGGUACAUGUUGA

Source: https://biocyc.org/ECOLI/sequence-rc?type=GENE&object=EG10510

3.3. Codon optimization.

Lysis protein DNA sequence with Codon-Optimization ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

3.4. You have a sequence! Now what?

3.5. [Optional] How does it work in nature/biological systems?

  1. Describe how a single gene codes for multiple proteins at the transcriptional level.
  2. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein like a provided example at [https://2026a.htgaa.org/2026a/course-pages/weeks/week-02/index.html]

Reading DNA

Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account, and Benchling account…

  • create Twist and Benchling accounts
  • Pick our protein! I will pick a protein related to aging for final project, I am just trying to keep my head above water on HW2 so the protein I pick is the example provided. See below in codeblock but what sort of nucleotides are “M E T…”? Clearly those aren’t nucleotides they are single letter representatives of amino acids, known as codons, constructed from 3 nucleotides. Here we are given in a top-down Build of a protein, which we must run the Central Dogma in reverse to translate back to RNA and then transcribe back to DNA.
>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL EAVIRTVTTLQQLLT
                     /note="unnamed protein product; L-protein"
                     /codon_start=1
                     /transl_table=11
                     /protein_id="CAA23990.1"
                     /db_xref="GOA:P03609"
                     /db_xref="InterPro:IPR022599"
                     /db_xref="UniProtKB/Swiss-Prot:P03609"
                     /translation="METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFL
                     AIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
     CDS             1761..3398

Here is an example of what running backwards looks like crudely. In this instance we go all the way back (1996) to the original sequence of phage MS2 L-protein from its genome. This is an excerpt from the GenBank file: representing a “phage MS2 genome” GenBank record [https://www.ncbi.nlm.nih.gov/nuccore/V00642].

Please note this sequence doesn’t come from the bottom of the GenBank file instead the selected region is required which must be further trimmed to match the code provided below from the HW2 blog. With correct NCBI links we can now confirm this code from the blog actually came from this GenBank record [ https://www.ncbi.nlm.nih.gov/nuccore/NC_001417.2?from=1678&to=1905&report=genbank]. I will also move this GenBank file into Benchling instead of previous file.

          atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

A closer match of genome nucleotides is obtainted through another NCBI lookup [https://www.ncbi.nlm.nih.gov/nuccore/NC_001417.2?report=fasta&from=1643&to=1938] though even here the resulting gene fragment must be further trimmed

>NC_001417.2:1643-1938 phage MS2 genome
GCTTATTGTTAAGGCA|
ATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAAC
TCCGGCATCTACTAATAGACGCCGGCCATTCAAACATGAGGATTACCCATGTCGAAGACAACAAAGAAGT
TCAACTCTTTATGTATTGATCTTCCTCGCGATCTTTCTCTCGAAATTTACCAATCAATTGCTTCTGTCGC
TACTGGAAGCGGTGATCCGCACAGTGACGACTTTACAGCAATTGCTTACTTAA|
GGGACGAATTGCTCACA
AAGCATCCGACCTTAG

Reflecting, since we need the gene that codes for the LYS_BPMS2 Lysis protein in the Escherichia phage MS2 we go back to a GenBank file from 1996 when virology was the approach in molecular biology for engineering tag segments of RNA strand with stems looped in the translation phase of the Central Dogma of molecular biology. Based on the orignal RNA virus from which MS2 was derived.

  • select Genes on the page with prompt “what can twist build for you?” for HW2
  • name the project “L protein” with “L” for “Lysis” for HW2.
  • select Clonal Genes order card and press “Order Now” when prompted to select gene type for HW2.
  • avoid my mistake, this next page is going to take us to an “Excel Like” worksheet that we will develop our request with. The old school way was to download and upload meticulously formatted Excel spreadsheets; we are advanced humans capable of using web forms. Before we enter the DNA we require into this order form we have to work through the DNA we were given to read in HW2 Completing the optimization process on Twist Website we now have a Codon-Optimized Lysis protein DNA sequence.
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

Optional: If we were going to synthesize more of this protein we now have a set of genetic instructions to read to grow those proteins. However there are different methods from which we can Build those proteins. We can consider cell-dependent or cell-free approaches. Explain more about these when I pick my protein.

In preparation for next steps remember that my Codon-Optimized Lysis protein DNA sequence In Benchling instructions to transcribe gene to RNA “Highlight the DNA sequence of interest.” “Right-click and select Copy Special.” “Choose the Reverse Complement option to get the anti-sense strand (RNA equivalent).” “Create a New DNA/RNA Sequence and paste the sequence, ensuring the type is set to “RNA”.”

4.2. Build Your DNA Insert Sequence

  • Let’s first organize our directories in Benchling for the assembly line
  • Create folder for Registry of Standard Biological Parts [https://parts.igem.org/Part:BBa_J23106] In that folder create the following folders: > A_Promoter > B_RBS > C_Start Codon > D_Coding Sequence > E_7x His Tag > F_Stop Codon > G_Terminator

HW2 Objective of assembly: make a sequence that will make E. coli glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein).

In Benchling, select New DNA/RNA sequence

Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).

Go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with your codon optimized DNA sequence of interest!). Each time you add a new piece of the sequence, make sure to annotate by right clicking over the sequence and creating an annotation that describes what each piece (e.g., Promoter, RBS, etc.) is (see image below).

Promoter (e.g. BBa_J23106) TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (e.g. BBa_B0034 with spacers for optimal expression) CATTAAAGAGGAGAAAGGTACC

Start Codon ATG

Coding Sequence (your codon optimized DNA for a protein of interest, sfGFP for example) AGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli) CATCACCATCACCATCATCAC

Stop Codon TAA

Terminator (e.g. BBa_B0015) CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

Once you’ve completed this, click on Linear Map to preview the entire sequence. If you intend to have a TA review a sequence in the future, this is a good way to verify that all sections are annotated!

This is not required for this exercise, but to share your design with others, please ensure that link sharing is turned on! (Optional) Share your final sequence link with a TA for review!

This insert sequence you built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). Go ahead and download the FASTA file for the sequence you made.

It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey your designs.

Here is my practice assembled copy of the HW2 gene fragment I will import in Twist. However, I will not submit an actual order to Twist because this is just my demonstration Clonal Gene fragment copy. I will repeat these steps with my own functional gene for official purchase order.

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

Here is my final Twist purchase order, though I will not actually purchase this either until an experiment can be developed.

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCC
ATTAAAGAGGAGAAAGGTACC
ATG
ATGAGTCAGAATACGCTGAAAGTTCATGATTTAAATGAAGATGCGGAATTTGATGAGAACGGAGTTGAGGTTTTTGACGAAAAGGCCTTAGTAGAAGAGGAACCCAGTGATAACGATTTGGCCGAAGAGGAACTGTTATCGCAGGGAGCCACACAGCGTGTGCTGGACGCGACTCAGCTTTACCTTGGTGAGATTGGTTATTCACCACTGTTAACGGCCGAAGAAGAAGTTTATTTTGCGCGTCGCGCACTGCGTGGAGATGTCGCCTCTCGCCGCCGGATGATCGAGAGTAACTTGCGTCTGGTGGTAAAAATTGCCCGCCGTTATGGCAATCGTGGTCTGGCGTTGCTGGACCTGATCGAAGAGGGCAACCTGGGGCTGATCCGCGCGGTAGAGAAGTTTGACCCGGAACGTGGTTTCCGCTTCTCAACATACGCAACCTGGTGGATTCGCCAGACGATCGAACGGGCGATTATGAACCAAACCCGTACTATTCGTTTGCCGATTCACATCGTAAAGGAGCTGAACGTTTACCTGCGAACCGCACGTGAGTTGTCCCATAAGCTGGACCACGAACCAAGTGCGGAAGAGATCGCAGAGCAACTGGATAAGCCAGTTGATGACGTCAGCCGTATGCTTCGTCTTAACGAGCGCATTACCTCGGTAGACACCCCGCTGGGTGGTGATTCCGAAAAAGCGTTGCTGGACATCCTGGCCGATGAAAAAGAGAATGGTCCGGAAGATACCACGCAAGATGACGATATGAAGCAGAGCATCGTCAAATGGCTGTTCGAGCTGAACGCCAAACAGCGTGAAGTACTGGCACGTCGATTCGGTTTGCTGGGGTACGAAGCGGCAACACTGGAAGATGTAGGTCGTGAAATTGGCCTCACCCGTGAACGTGTTCGCCAGATTCAGGTTGAAGGCCTGCGCCGTTTGCGCGAAATCCTGCAAACGCAGGGGCTGAATATCGAAGCGCTGTTCCGCGAAGAAGTAAGCATCTGTCAGAAAGGCCAGTCTCAAGCGAGGCTGGCTTTTTTTCTTTTGGTACATGGTACATGTTGA
CATCACCATCACCATCATCAC
TAA
CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

4.2. On Twist, Select The “Genes” Option

4.3. Select “Clonal Genes” option

For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project.

Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly.

Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. An advantage is If designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.

4.4. Import your sequence

You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.

4.5. Choose Your Vector

Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle!

The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance.

Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later.

For this demonstration, choose a Twist cloning vectors like pTwist Amp High Copy.

Click into your sequence and select download construct (GenBank) to get the full plasmid sequence:

Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded.

This is the plasmid you just built with your expression cassette included. Congratulations on building your first plasmid!

Important

For your final projects, remember to include:

Fully annotated Benchling insert fragment Desired Twist cloning vector

Part 5: DNA Read/Write/Edit

It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey your designs. Here’s an example of what you just annotated in Benchling:

4.3. On Twist, Select The “Genes” Option

4.4. Select “Clonal Genes” option

For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project.

Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly.

Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. An advantage is If designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.

4.5. Import your sequence

You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.

4.6. Choose Your Vector

Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle!

The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance.

Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later.

For this demonstration, choose a Twist cloning vectors like pTwist Amp High Copy.

Click into your sequence and select download construct (GenBank) to get the full plasmid sequence:

Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded.

This is the plasmid you just built with your expression cassette included. Congratulations on building your first plasmid!

Important

For your final projects, remember to include:

Fully annotated Benchling insert fragment Desired Twist cloning vector

Part 5: DNA Read/Write/Edit

Assignees for the following sections
MIT/Harvard studentsRequired
Committed ListenersRequired

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would like to use directed evolution in outbred goats to select my DNA to sequence. Therefore my plan is to stick with the HTGAA method until I have a Nanopore sequencer and reagents for genomic surveillance of my herd. My argument for why is still developing but essentially I have anecdotal observations to support a hypothesis. An example of the type of genes I would like to sequence is the second vector I uploaded – the RpoS gene in the K-12 strain of E. coli. The gene was sequenced with a Nanopore sequencer.

DNA-based digital data storage technology. Source: Archives in DNA: Workshop Exploring Implications of an Emerging Bio-Digital Technology through Design Fiction - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/DNA-based-digital-data-storage-technology_fig1_353128454 [accessed 11 Feb 2025]

DNA-based digital data storage technology. Source: Archives in DNA: Workshop Exploring Implications of an Emerging Bio-Digital Technology through Design Fiction - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/DNA-based-digital-data-storage-technology_fig1_353128454 [accessed 11 Feb 2025]

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

There is no substitute in my opinion for a Nanopore sequencer, with a distant second being a PacBio. Ofcourse, Nanopore sequencers are far less popular than Illumina, and despite the fact that I am a big fan of Craig Venter, I still prefer the scientific opportunities available with Nanopore. In fact a significant reason why I went back to school post reproductive fitness equals zero, is because when I graduated from college they still hadn’t completed the Human Genome Project. I learned about Nanopores during the COVID-19 Pandemic when I started one of the first wastewater surveillance programs in the U.S. I believe the accuracy, speed, and flexibility of pore facilitated single base sequencing reads in parallel multithreaded readings fits my future research goals exactly and I am on the cusp of becoming a Nanopore super user.

Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

I am focused on deep time series sequencing data that is broad enough to include changes in diversity of microbiome and host metatranscriptomic, epigenetic, and metabolomic signals as well as metagenomic changes. I also want to develop pipelines that I always have the opportunity to contribute to but never want to own or primarily benefit from. I believe paywall science is an etiologic mechanism that favors contagion.

The first benefit of 3rd and 4th generation sequencers, particularly the Nanopore machines, is that they do not even require PCR amplification. Don’t get me wrong, I love PCR as a flexible assay, but as an Epidemiologist I have never been comfortable with making more copies of pathogens on principle. I realize this a bit of a semantic argument and there are plenty of bio safety measures in place. At the same time the same biosecurity measures are drivers of inequality in applied Molecular Biology capabilities. What does it mean when the technology itself becomes a driver of inequality to scientific techniques everyone in a generation should have access to? I believe it means it’s time to keep innovating.

In addition I think there is wisdom in sequencing the actual shoddy molecules collected from the field, particularly for my applications. This is a Biosecurity advantage and better fit to the Epi Triangle anyway. However I am not saying there are not scenarios where higher level Biosecurity reference labs with PCR pipelines are not necessary. I just think some sequencers should be managed and maintained by governments and smaller non-PCR-based Nanopores should be prioritized by individual field researchers, like I intend to be.

Now there is an elephant in the room, thoug,h and that’s data storage. I have been wrestling with data storage my entire career, and I know my interest in Nanopore sequencing isn’t going to make these challenges go away anytime soon. Therefore, I am all for DNA storage of genomic sequencing information about animals in plant DNA ideally. If a safe method is already available, storage in animal subjects would be incredible as well. What DNA storage is maintained in goat horns or sheep’s wool? I need to investigate the methodology further to see if this is even possible. I am ashamed to admit that until I read Dr. Church’s Epilogue in Regeneration, I had never even thought about this before.

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I am dazzled by DNA origamis for synthetic materials, but the complexity of the methods to achieve static outputs is not necessarily a tradeoff I would invest time in right now. Genetic circuits are different though, I am fully attentive to this revolution. Particularily like we see in the examples provided by the Elowitz Lab [https://www.elowitz.caltech.edu/research#!computationandsyntheticcircuits].

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

I would like to learn more about the CHOMP (circuits of hacked orthogonal modular proteases) method to integrate binary logic into functional programming modules in biological circuitry. The CHOMP method can then be used to control regulatory cascades and even more exciting to me binary logic gates. My research interests are nonlinear models in aging and cellular senescence that utiilizes elucidated insights to improve areas of stasis with the potentional for rejuvination. Waves of molecular decisions all of which with decipherable underlying binary logic gates based on Boolean logic. The engineering methods focus on viruses and bacteria. The programming motif they target is incoherent feed-forward loops. The amino acid they interact with is the Nitrogen end of Tyrosine which they expand to a four protein circuit. They image their single-transcript adaptive pulse circuits using time-lapse images. The result of the engineering is a rachet to control intrinsic nonlierarity of input and output biological systems. The scalabity and accuracy are tunable by the application. The speed is slow to design and as fast as biological circuits once implemented. Another method I would be interested in investigating further is the Asish et al. (2026) lab’s noninvasive biosensor application using live-cell diffusion-weighted imaging to investigate the effect of Gly-Ser spacers in transcription.

  • Source: Xiaojing J. Gao et al. ,Programmable protein circuits in living cells.Science361,1252-1258(2018).DOI:10.1126/science.aat5062
  • Asish N. Chacko et al. ,A programmable genetic platform for engineering noninvasive biosensors.Sci. Adv.12,eaec1211(2026).DOI:10.1126/sciadv.aec1211

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I am for animals. I want to contribute to dextinction and life extinction of all endangered and vulnerable organisms I can serve. I will contribute to human longevity as an afterthought to Natural diversity and sustainability. I am not beholden to humans though. I believe in the sanctity of all life.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions: How does your technology of choice edit DNA? What are the essential steps? What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The synthesis of DNA and OMIC ontologies starts with phylogeny, small molecules, and phenotypes like disease. Therefore my career plan is to build the throughput for molecule mediate bidrectional interactions between Host physiology and the microbes in the microbiome and metagenome around the host. This is the big tent vision. Now how do get there. Imagine if I lay it all out here step-by-step. How could I do what I aspire to do if 1000 people do it first before me. Still, I come to HTGAA a pleb at the stairs to the Temple of Zeus, with my goats. Not to sacrifice, though. I will not be a culling scientist. There is no scientific discovery in the text of life that is worth sacrificing a living thing. I am a builder by nature anyway, I want to observe life without intervention, well some intervention is necessary, but not like it’s currently done. Therefore what to do?

Here is what I can share at HW2. Everywhere an organism lives, say a goat, is a DNA and RNA wake of material. Much of it is waste material, residues from competing metabolic systems stacked and ready to be interpreted and transformed into data. With data comes constraints, especially OMIC data, it’s endless and massive in scale, randomized and chaotic. I like the idea of applied systems biology pipelines built around dead biological material. You can catalogue and reconstruct living systems from waste chemistry. Do you need an Almond in it’s shell to understand the life history of that nut, not really, a fragment of husk in a pile on the ground will tell you about the almond and the animal that consumed it.

COVID-19, as a front-line Epidemiologist, in the center of the maelstrom did not equivocate in its lessons. First and foremost, public health apparatuses like mRNA vaccination research and deployment infrastructure is useful when it’s available, accessible, and appropriately matched to the agent. The rest is wastewater. Especially, where the infrastructure of sewers is insufficient to remove waste from a community fast enough, can be used to trace outbreaks in near real time. Wastewater surveillance is harder where the water is plentiful, deep, and fast-flowing. The great news for wastewater epidemic surveillance is that the structural inequalities above the sewers, exist within the sewers, and drive disease transmission in Outbreaks. This isn’t a hunch; the data support it. This is why I will continue to be interested in wastewater surveillance also when I enter the workforce.

However, I will focus on much broader networks of waste than wastewater, which is what makes the intersection of gut microbiomes, microbes, and host physiology the biological nexus for me. Thus, applications, many options here – especially in agriculture. I like agriculture because soil is the ultimate biological pile of waste. I have watched animal waste turn into dirt for several years now, and from that waste, plants grow. The animals eat those plants and turn it into animal tissues using systems of heredity and variability that have nothing to do with anything I did. I just get the animal in front of a plant and they complete their reproductive and maintenance programs. If I keep the animals water clean and their housing dry they do not get sick. These animals and the environment are an engine that I can run passively – they make the world a better place.

At the same time, though, this natural experiment produces a lot of opportunities to study molecule-mediated bidirectional relationships between animal hosts and the microbes in their microbiome and metagenome. Fortunately, for my experimental milieu, my species is driving Earth to its extreme of the boundary conditions for habitability, which certainly makes science more interesting – especially when local interventions can be developed to support sustainability, health, and longevity.

The last sentence is key for the edits I would dare to make. Never blindly though. This is why I will structure my lab within evolution directed sythesis.

Resources

DNA Sequencing at 40: Past, Present, and Future (2017) Shendure, J., Balasubramanian, S., Church, G. et al. https://doi.org/10.1038/nature24286 DNA Synthesis Technologies to Close the Gene Writing Gap (2023), Hoose, A., Vellacott, R., Storch, M. et al. https://doi.org/10.1038/s41570-022-00456-9 Recombineering and MAGE (2021), Wannier T, et al. Nat Rev Methods Primers, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9083505/ CRISPR Technology: A Decade of Genome Editing is Only the Beginning, Wang, Doudna, et al., https://www.science.org/doi/10.1126/science.add8643 Databases

GenBank overview: https://www.ncbi.nlm.nih.gov/genbank/ NCBI: https://www.ncbi.nlm.nih.gov/genome/ Ensembl: https://useast.ensembl.org/index.html UCSC Genome Browser: https://genome.ucsc.edu/ Protective and Enhancing Alleles: https://arep.med.harvard.edu/gmc/protect.html Editors and tutorials

CRISPR/Cas9 Short tutorial for designing gRNAs: https://blog.addgene.org/how-to-design-your-grna-for-crispr-genome-editing Benchling specific tutorial for designing gRNAs: https://www.benchling.com/blog/how-to-design-grnas-to-target-your-favorite-gene List of Cas editors and their PAM sites: https://www.synthego.com/guide/how-to-use-crispr/pam-sequence Base Editors Base editors contain a nicking or dead Cas9 enzyme fused to a deaminase. a.) PAM requirement: Base editors contain a nicking or dead Cas9 enzyme fused to a deaminase. For designing your guide RNA for base editing you will therefore have a PAM requirement like you would have for any Cas9 experiment. b.) Deamination window: An additional design constraint is that the sequence window in which deamination occurs is only a few base pairs long. You can find information on the deamination windows in the review below (even though some new editors are not included). BE4 and ABE7.10 are good starting points and both use SpCas9 with NGG Pam requirement. Base editors with other PAM sites have been constructed too. Review of base editors (2018) including a list of all base editors, their editing window and PAM requirement: https://www.nature.com/articles/s41576-018-0059-1?WT.feed_name=subjects_animal-biotechnology Other editors: Prime editor https://www.nature.com/articles/s41586-019-1711-4 Tutorials/tools: https://primeedit.nygenome.org/ https://www.nature.com/articles/s41551-020-00622-8 http://pegfinder.sidichenlab.org/ TALEN For TALENs, you can assume no sequence restrictions – One of the technology’s previous restrictions was a T starting base, but this has since been overcome. In contrast to the CRISPR/Cas technologies above, your DNA sequence is recognized through interactions between the DNA and the TALEN: each TAL in the array recognizes one base. (Note: In order to introduce a double strand break, you will need to design to TALENs targeting the opposing strands.) Short guide: https://www.addgene.org/talen/guide/ One of the available design resources: https://tale-nt.cac.cornell.edu/node/add/talen Directed evolution for overcoming starting base restriction:https://academic.oup.com/nar/article/41/21/9779/1276340 Additional Resources:

Gel Purification of DNA: after DNA gel electrophoresis, cutting a band of DNA out of the agarose gel allows isolation and purification of a specific DNA fragment: Addgene: Protocol - How to Purify DNA from an Agarose Gel Overview of synthetic, unnatural organisms using recoding: Synthetic genomes with altered genetic codes (2020) DNA recorders, Sense+Read+Write: Lineage tracing and analog recording in mammalian cells by single-site DNA writing (2021) Molecular electronics, integrating single molecules into electronic chips: Molecular electronics sensors on a scalable semiconductor chip: A platform for single-molecule measurement of binding kinetics and enzyme activity (2022) Review of genome editors (zinc finger nucleases, TALENs, CRISPR) at the time CRISPR was emerging as editing technology: https://www.cell.com/trends/biotechnology/pdf/S0167-7799(13)00087-5.pdf Clinical trials of genome-editing therapies: https://www.nature.com/articles/d41573-020-00096-y

Week 3 HW: Lab Automation

Homework for HTGAA 2026 (Week 03): Lab Automation

NIH_Bioart_Bacterium NIH_Bioart_Bacterium

Table of contents

Software used:

Objective:

This week we get hands-on (or at least code-on) with pipetting robots.

Background:

No lecture. Recitation and Tokyo Biohub node lab meetings. Submit three slides with ideas to our node by 24Feb2026.

Ideas for Tokyo Biohub Deck

  • GPG01: Identify transcription indicators in post reproductive goat life history indicative of alterations to NAD(H), ROS signaling, tissue specific oxidative stress and inflammation.
  • GPG02: Explore application of G-protein coupled receptors (GPCRs) in goats a method Chen et al. (2019) proposes more broadly to monitor bioactive microbial metabolites with associations to physiology.
  • GPG03: Consider systems-level synthetic biology agricultural interventions to improve yield of metabolite specific food-stuffs to support molecule mediated bidirectional interactions between goat hosts and microbiota.

Questions:

For this week, we’d like for you to do the following:

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details. While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.

  • Child HT, Wierzbicki L, Joslin GR, Roper K, Haxhiraj Q, Tennant RK. Automated environmental metagenomics using Oxford nanopore sequencing. BMC Genomics. 2025 Sep 26;26(1):835. doi: 10.1186/s12864-025-11989-w. PMID: 41013192; PMCID: PMC12465296.

I propose using a cloud laboratory and automation tools to process environmental metagenomics samples with and by Oxford nanopore sequencers. Here is the problem I am attempting to address. I am only one person. My time is always constrained profoundly. This means I always am burning and undercooking items with all my pans in the fire. Still, personally and professionally I am comitted red tooth and claw to environmental protection of biodiversity and abundance of natural ecologies and agricultural middle corridors. In addition I am personally offended by inequality, especially when it comes to the allocation of scientific discovery capacity and supply lines. The most diverse places on Earth are the most imperiled and at the same time least equipped with tooling to achieve the scientific advacements they need to protect their habitats and communities. Allow me to also preface that HTGAA is a small example that the bottlenecks of which I speak are not in human capacity, it’s techology, energy, infrastructure, brick and mortar. I believe that cloud computing and automation tools are a stopgap measure urgently needed to fill the breech and provide platforms to the facilitate the synergies of natural and unnatural selection required to advance sustainability and biodiversity. However engineering these partnerships are going to be just as important as the technological capabilities. The great thing about HTGAA is that we are doing this work from the bottom-up by participating in these cohorts.

Aside from HTGAA, my work with goats actually comes from the same engineering aspiration. I never saw a goat in the U.S. until I became a community health worker and started working outside of my country. Once I left the U.S. goats were much more plentiful, especially in rural mountainous regions. I am raising goats now to learn animal husbandry of these critical animals so that I can better understand how to help raise goats anywhere, in any locale, with any resource constraint. Goats in my opinion are the first automation tool that humans partnered with to survive in extreme environments. Through this partnership goats and humans expanded their gene and environment match with the physical constraints they were encountering in their struggle to ensure their families thrived. Goats and humans share many strengths and weaknesses, mainly their dedication to their families and security of FDR’s essential freedom from uninhabitable temperature and violence, hunger, thirst.

Now to the assignment, but from this perspective, the paper I reviewed from Childs et al. (2025) compared manual and automated metagenomic workflows using Oxford nanopore sequencers and found minimal differences in outcomes assessed. The first reason I chose this paper is because it starts with a fundamental truth, long-read sequencing has transformed our understanding of the microbiome. In fact, metagenomic and microbiome catalogues were not even attempted reliably until these machines entered the OMICs revolution. Enter the pipetter. I can attest, this is monastic work. The challenge is not the tool, it’s the lab space, and the sheer magnitude of the wells that the pipetter must span. Experimental protocols require percise allocation of minute quanties of fluids over and over again. From a personal vantage I quite enjoy the process, for there are few activities more zen in my day but then I am also hyper-privleged. Again inequality rears its head into the hallways of science. Who enters the cloister of the dwindling lab spaces in the world to the shelter of the bench and how many minuites do they have to spend to achieve their objectives. Here too is another ineqality though, because let’s be honest, not their objectives but the objectives of their research supervisors–because labs are also part of the caste system.

How do we untangle all of these knots to do the do the critical work. Could it be automation is answer? This will depend on who has access to automation. Are we talking about robotic workflows that are accessible to anyone with curiosity about microbiomes and metagenomes. Likely not anytime soon. I guess it will be more about the workflows done by students with professors. This is where the revolution of OMICs and Next-Generation Sequencers must be fought. What about private start-ups, I don’t know enough to speculate here. I can ponder the task of expanding the paradigm so any student with want of bench exposure using sequencers can have it. Honestly, I think HTGAA is pursuing this admirably. The cause is certainly just. If students and professors with and without wetlab spaces can both access cloud platforms and automation labs then we can realize the type of contingent niche environment that theoretically at least could be scaled-up and that is far better than not having a foothold at all. The Childs lab (2026) certainly seems to understand this charge when they explain that automation is a game changer fit to improve throughput, reproducibility, and accuracy. What is less clear is if the solution is the automated workflow or the Oxford nanopore sequencers that true read the sample one base pair at a time very quickly and then write that information into a cloud library for template recognition against other long-read sequences with annotation.

I didn’t really leave myself enough time to do this properly, ironically because this is lambing season, but Child’s et al. (2025) do make some very interesting observations in their side-by-side comparison of manual and automated workflows. I will apply these to my project now as well.

Childs et al. (2025) explain that many of the current studies they reviewed for their article only contain high throughput amplicon from the COVID-19 Pandemic. I do not see this as a challenge at all. Instead, when I think about the COVID-19 Pandemic as front-line warrior for Metazoans I see the good we accomplished when political will was aligned with scientific aspirations, and trust that the only reason naysayers have any leeway now to gripe about the deluge of SARS-CoV-2 data and genetic contamination, is because they are alive because of mRNA vaccines and wastewater surveillance, which Oxford nanopore significantly supported.

The liquid handling robot arm of the Childs et al. (2025) study was a Bravo Automated Liquid Handing Platform. I want one. Is it worth the cost though. Apparently, the findings are not sufficient to justify a purchase, based on read length alone. In the study the manual and Bravo study arms both analyzed the same 24 samples from a range of environments across a 96-well plate. Except for read length, which was on average longer in the manual arm than the automated. We can assume, if we have ever pipetted, that the automated arm would be more consistent in the allocation of microfluidics but confounding from variation in diverse soil samples appears to have made this distinction difficult to show. Meanwhile, the manual arm included eludication of DNA samples that the automata didn’t replicate, that doesn’t seem fair to me. However, if the automated workflow literally is not able to do all of the workflow steps than that is a strong point for manual over automated arms until the landscape is level.

Here’s the big takeaway though for my project. Childs et al. (2025) did find that improved automated libraries reduced PCR artefacts and increased sensitivity provide a more accurate snapshot of the ecological taxa of the microbiota – in other words more families, species, sub-species in the samples of less abundant organisms. This is what I want to hear, because if this process was applied to five studies instead of one then we would have 5x’s the power in detection of rare organisms that contribute to the diversity of the soil ecosystems, which is what I aspire most to understand and preserve.

  • Final Project Ideas — DUE BY START OF FEB 24 LECTURE

Methods:

  • Cloud Computing

Tasks:

Assignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME!

  • Your task this week is to Create a Python file to run on an Opentrons liquid handling robot.
  • Review this week’s recitation and this week’s lab for details on the Opentrons and programming it.
  • Generate an artistic design using the GUI at opentrons-art.rcdonovan.com. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons.
  • You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept. If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead. [!warning] Ask for help early! If you are having any trouble with scripting, contact your TAs as soon as possible for help. Do not wait until your scheduled robot time slot or you may not be able to complete this assignment! If the Python component is proving too problematic even with AI and human assistance, download the full Python script from the GUI website and submit that:
    Use the download icon pointed to by the red arrow in this diagram.

    Use the download icon pointed to by the red arrow in this diagram.

    If you use AI to help complete this homework or lab, document how you used AI and which models made contributions. Sign up for a robot time slot if you are at MIT/Harvard/Wellesley or at a Node offering Opentrons automation. The Python script you created will be run on the robot to produce your work of art! At MIT/Harvard? Lab times are on Thursday Feb.19 between 10AM and 6PM. At other Nodes? Please coordinate with your Node. Submit your Python file via this form. Post-Lab Questions — DUE BY START OF FEB 24 LECTURE

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

For this week, we’d like for you to do the following:

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details. While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.

Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory.

Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell-free protein synthesis.

Echo transfer biosensor constructs and any required cofactors into specified wells. Bravo stamp in CPFS reagent master mix into all wells of a 96-well / 384-well plate. Multiflo dispense the CFPS lysate to all wells to start protein expression. PlateLoc seal the plate. Inheco incubate the plate at 37°C while the biosensor proteins are synthesized. XPeel remove the seal. PHERAstar measure fluorescence to compare biosensor responses. Final Project Ideas — DUE BY START OF FEB 24 LECTURE

Assignees for the following sections
MIT/Harvard studentsRequired
Committed ListenersRequired

As explained in this week’s recitation, add a slide in your Node’s section of this slide deck with an idea you have for an Individual Final Project. Be sure to put your name on your slide!

Reading & Resources (click to expand)

Opentrons API Documentation: https://docs.opentrons.com/python-api/ Opentrons Artwork GUI Website: http://opentrons-art.rcdonovan.com/ Opentrons Artwork Colab: HTGAA26 Opentrons Colab Automation Equipment: HTGAA 2026 Recitation: Lab Automation, Opentrons Art, Intro to Cloud Laboratories

HTGAA 2026: Opentrons Artwork Lab

By Eyal Perry, Laura Maria Gonzalez, Dominika Wawrzyniak, Alex Hadik, Suvin Sundararajan, Ronan Donovan

This notebook contains a few examples that demonstrates how the Opentrons OT-2 can be used to draw arbitrary patterns using the the Python Opentrons API. These examples can and should be used as your template as you try to pattern your own colorful, synthetically engineered bacteria.

To use this, make your own copy of this Colab, and in that copy you can run and edit the last section (and your work will be saved in your copy!).

Note: After learning about how to program designs using colab and python, you may choose to print more designs with automated tools like Opentrons Art Interface.

Each example consists of two blocks of code:

  1. The first code block is where the pattern is drawn using .aspirate(), .move_to(), and dispense_and_detach() (as a wrapper around .dispense()) commands (similar to G-code). This block will typically generate no output, as it’s just loading the code (but doesn’t run it yet). This block of code can later be copied as-is and saved as a .py file to be executed on a real Opentrons machine.
  2. The second code block runs a simple simulation that visualizes the pipetted pattern by executing your code in the simulator in this colab. This block will draw the state of the plate after running the robot code.

At the end is a section for you to code your design in, with the same two code blocks. Make your own copy of this Colab notebook and work in your copy. When ready, upload the link to your first block in this section to the linked google form a day before your lab date! Don’t edit the second block in this section, as only the first block will be run on the robot.

Several important notes:

  • All units are in mm
  • Never go beyond a radius of 40mm from (0,0). If you do, you might hit the walls of the petri dish and all hell breaks loose, or you might dispense onto the wall of or even outside the petri dish. (Some common “90mm” or “100mm” petri dishes only have an inner diameter of 84mm in the bottom plate, and the tip occupies a radius of a couple mm.)
  • For the Black Agar Plates, dispense 1 uL drops by default. (If you are trying for a particular effect, going slightly higher in some places may be acceptable.) While that may sound like a small quantity, the E.coli will still be visible (especially after growing) and small “pixel” sizes can produce more detailed patterns.
  • Be careful of dispensing samples too close to each other! They will move around slightly depending on the size of the drop. 1uL drops 2mm apart may sometimes run together or may stay mostly distinct even after incubation; 1uL drops 5mm apart will almost always stay distinct, but give you less than half the “resolution” for your art. Midway between those - 3.5mm separation - may be a happy medium. (See past year photos here and in the Lab Protocol and count dots along one axis; these of course show the ones which were lucky enough to mostly not run together…)
  • On the robot if you dispense and immediately move the tip 1cm to the left it will create a streak of bio-ink (shaped according to the viscosity of the liquid). The simulator accounts for this basic effect, and you will see spurious lines between your dots or to random locations in the visualization. We have provided a routine dispense_and_detach() that dispenses and moves the tip slightly up & down to fully clear the droplet; you can use this in your code both for the simulator and the robot to avoid streaking.
  • We have defined standard configuration for the robot deck for this lab, and our template code follows it. We plan to have Red-, Yellow-, Green-, Cyan-, and Blue-fluorescing bacteria (but no others) at all sites in the robot for your use, and have provided a routine location_of_color() you can use to retrieve our standardized configuration’s location of a named color (which you can pass to an aspirate() call).
  • Pay attention to any text output from the simulator (typically just above the plate image) - it can give useful diagnostics and statistics. Don’t get so focused on your beautiful drawing that you forget to check this every once and a while.
  • Remember not to waste any resources (here tips & reagents, as explained in the Lab Protocol – you can confirm via the “Volume Totals by Color” and “Tip Count” summaries shown after every successful run – but don’t cross-contaminate your color wells.
  • The visualization is not 100% accurate. We don’t model any flud dynamics, so any streaking if you don’t use dispense_and_detach(), any effects of dispensing from z>0, and even the droplet sizes in all cases are not physically realistic; and the simulator doesn’t have an awareness of the 3D positions of labware. (Feel free to contribute improvements to the simulator!)
  • The simulation is not even close to a 100% complete reimplementation of the Opetrons API. Some commands will work on the OT-2 but will cause errors in the simulation (Feel free to contribute!).

After your code is done, to submit it to be run on a robot:

  1. Make sure your code is accessible to us: in your colab click the “Share” icon in the upper right, set “General access” so that “Anyone with the link” can be a “Viewer”.
  2. Copy to the clipboard a link to your code: right-click in the first code block (which has the metadata = {...} section near the top and your code at the end) and choose “Copy link to cell”
  3. Paste this URL into the Google form for submitting to the OT-2 and submit at least a day before your robot time slot.
  4. Review the Instructions given in the Lab Protocol.

Prerequisite Code

The following block of code contains required installations and the simulation/visualization code. It only needs to be run once per runtime.

When run, it will output errors declaring “ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed.” and list some package incompatibilities; that is expected, and is a result of the Opentrons API requiring an old version of some libraries. (No other errors are expected.)

This block can be re-run in a runtime without ill effect (but will show the same errors every time).

Run this block once per runtime to set up your environment

#@title Run this block once per runtime to set up your environment

The colab now comes with too new a version of numpy; opentrons still needs an older one.

So set up venv-like isolation of my pip installs (separated from colab packages) for all subsequent cells.

(Without doing this, colab would require restarting the runtime right after installing a different numpy version.)

import sys, os py = f"{sys.version_info.major}.{sys.version_info.minor}" PKG = f"/content/venv/lib/python{py}/site-packages" os.makedirs(PKG, exist_ok=True) if PKG not in sys.path: sys.path.insert(0, PKG) os.environ[“PIP_TARGET”] = PKG # routes !pip / %pip installs into the venv os.environ[“PYTHONNOUSERSITE”] = “1”

Install opentrons into the venv (and all its dependencies!) BEFORE any import numpy etc.

%pip install -q –upgrade –target “$PKG” opentrons

Now opentrons has been cleanly installed in its own venv-like environment with

versions of packages it likes; proceed to use it “normally” from here.

from opentrons import types import matplotlib.pyplot as plt plt.rcParams[“figure.figsize”] = (10,10)

Petri dish size constants

PETRI_INNER_DIAMETER = 84 # 84mm is hopefully a tight lower bound on inner diameter of “90mm” & “100mm” petri dishes MAX_DRAW_RADIUS = PETRI_INNER_DIAMETER/2 - 2 # leave 2mm margin for the tip size, drop size, miscalibration, etc.

Define some classes for our custom HTGAA Opentrons simulator/visualizer

nullLocation = types.Location(types.Point(x=250, y=250, z=250), None)

def same2DLocation(loc1, loc2): # ignores z (=> tests x, y, and labware) return loc1.point.x == loc2.point.x and loc1.point.y == loc2.point.y and loc1.labware == loc2.labware

def mock_print(str): #print("…\n" + str) pass

each PipetteSim instance tracks what it’s dispensed; if you have multiple, need to call visualize() on each.

(can’t unify multiple by making the instance variables into class variables; note this colab has at least

one instance per example, and we don’t want those sharing dispense states.)

class PipetteSim: # modeled after InstrumentContext in the opentrons api def init(self, instrument_official_name, mount_LR, tip_rack_list, well_colors): if instrument_official_name != “p20_single_gen2”: raise ValueError(f"Unsupported pipette {instrument_official_name} – should be p20_single_gen2") self.max_volume = 20 self.instrument_official_name = instrument_official_name

if mount_LR != "right":
  raise ValueError(f"Unsupported pipette mount {mount_LR} -- should be right")
self.mount_LR = mount_LR

if tip_rack_list[0].labware_official_name != "opentrons_96_tiprack_20ul":
  raise ValueError(f"Unsupported tip rack {tip_rack_list[0].labware_official_name} -- should be opentrons_96_tiprack_20ul")
self.tip_rack_list = tip_rack_list

self.well_colors = well_colors
self.droplets_x = []
self.droplets_y = []
self.droplets_size = []
self.droplets_color = []
self.smears = []                # list of 3-tuples: (xlist, ylist, color)
self.location = nullLocation    # used by dispense_and_detach()
self.justDispensedAt = None
self.current_volume = 0
self.aspirated_loc = None
self.totalAspirated = {}        # 'color' : total
self.totalDispensed = {}        # 'color' : total
self.curr_color = 'orange'
self.has_tip = False            # (in the opentrons api!)
self.tip_count = 0

def del(self): if self.has_tip: raise Exception("### ERROR: Run completed without dropping the tip!") # python prints but ignores exceptions in destructors

used by our dispense_and_detach() routine

def _get_last_location_by_api_version(self): # (in the opentrons api!) return self.location

use the well id to make up a location on the petri dish diagram:

D6 in the center, A1 lower left, H12 upper right (assuming 96-well, but will work for any)

def petriLocOfWell(self, well): # (NOT in opentrons api) assert(isinstance(well, WellMock)) x,y = well.get_row_col() return well.top().move(types.Point(x=(x-ord(‘D’)) * MAX_DRAW_RADIUS/4, y=(y-6) * MAX_DRAW_RADIUS/6, z=0))

but no smear if it’s just a z-move

def smearIfJustDispensed(self, loc): # (NOT in opentrons api) assert(isinstance(loc, (types.Location, WellMock))) if self.justDispensedAt is not None: newloc = loc if isinstance(loc, types.Location) else self.petriLocOfWell(loc) if not same2DLocation(self.justDispensedAt, newloc): line_end = self.justDispensedAt.move(0.5 * (newloc.point - self.justDispensedAt.point)) self.smears.append(([self.justDispensedAt.point.x, line_end.point.x], [self.justDispensedAt.point.y, line_end.point.y], self.curr_color)) self.justDispensedAt = None

def dispense(self, volume, location): # (in opentrons api) assert(isinstance(location, types.Location)) # not allowing dispensing into well or trashbin/wastechute for this lab – petri only! assert(isinstance(volume, (int, float))) if (location.point.x2 + location.point.y2 > MAX_DRAW_RADIUS**2): raise ValueError(f’Dispensing outside “safe” area: Point ({location.point.x}, {location.point.y}) is more than’ + f" {MAX_DRAW_RADIUS}mm away from the petri dish’s center.") if not self.has_tip: raise RuntimeError(“dispense() called when no tip was being held”) if self.current_volume < volume: raise ValueError(f"You dispensed {volume}uL, which is more than was in the pipette ({self.current_volume}uL).") if volume <= 0: raise ValueError(f"Dispensing {volume}uL – you should dispense a positive amount.") if location.point.z < 0: raise ValueError(f"dispense() passed a location with z={location.point.z} – do not go below z=0!") if location.point.z >= 10: print(f"Dispensing from a location with z={location.point.z} – do you really want to dispense from that high?") self.smearIfJustDispensed(location) self.current_volume -= volume self.droplets_x.append(location.point.x) self.droplets_y.append(location.point.y) self.droplets_size.append(volume * 100) # unprincipled scale factor (1uL->100 sq.pt), but it works self.droplets_color.append(’lime’ if self.curr_color.lower()==‘green’ else self.curr_color) # map green -> lime (looks more like GFP) self.totalDispensed.setdefault(self.curr_color, 0) self.totalDispensed[self.curr_color] += volume self.location = location self.justDispensedAt = location

def aspirate(self, volume, location): # (in opentrons api) assert(isinstance(volume, (int, float))) assert(isinstance(location, (types.Location, WellMock))) if not self.has_tip: raise RuntimeError(“aspirate() called when no tip was being held”) if volume + self.current_volume > self.max_volume: raise ValueError(f"Aspirating {volume}uL + {self.current_volume}uL already in pipette = {volume + self.current_volume}uL," f" which is more than the pipette can hold ({self.max_volume}uL).") if volume <= 0: raise ValueError(f"Aspirating {volume}uL – you should aspirate a positive amount.") if self.aspirated_loc is not None and self.aspirated_loc != location: raise RuntimeError(f"Cross-contaminating wells {self.aspirated_loc} and {location} with a single pipette") self.aspirated_loc = location self.smearIfJustDispensed(location) self.current_volume += volume if isinstance(location, WellMock): if location.well_id.upper() not in (id.upper() for id in self.well_colors.keys()): raise ValueError(f"aspirate() was passed well location {location} which hasn’t been configured to have a color.") color = location.color() newloc = location else: # legal for aspirate() but we should probably treat this as an error for this lab? right now marking it white… print(f"WARNING – aspirate() passed a Location rather than a well – are you sure you know what you’re doing?") if location.point.z < 0: raise ValueError(f"aspirate() passed a location with z={location.point.z} – do not go below z=0!") color = ‘white’ # we don’t know where they’re asiprateing from… use an unusual color to mark it. newloc = self.petriLocOfWell(location) self.curr_color = color self.totalAspirated.setdefault(color, 0) self.totalAspirated[color] += volume self.location = newloc

def pick_up_tip(self): # (in opentrons api) loc = types.Location(types.Point(x=-MAX_DRAW_RADIUS, y=MAX_DRAW_RADIUS, z=0), ‘Pickup Tip’) self.smearIfJustDispensed(loc) if self.has_tip: raise RuntimeError(“pick_up_tip() called when already holding a tip”) self.has_tip = True assert(self.aspirated_loc is None) self.tip_count += 1 self.current_volume = 0 self.location = loc

def drop_tip(self): # (in opentrons api) loc = types.Location(types.Point(x=MAX_DRAW_RADIUS, y=MAX_DRAW_RADIUS, z=0), ‘Drop Tip’) self.smearIfJustDispensed(loc) if not self.has_tip: raise RuntimeError(“drop_tip() called when no tip was being held”) self.has_tip = False self.aspirated_loc = None self.current_volume = 0 self.location = loc

def move_to(self, location): # (in opentrons api) if location.point.z < 0: raise ValueError(f"move_to() passed a location with z={location.point.z} – do not go below z=0!") self.smearIfJustDispensed(location) self.location = location

def visualize(self): # (NOT in opentrons api) print("\n=== VOLUME TOTALS BY COLOR ===") for color in self.totalAspirated.keys() | self.totalDispensed.keys(): comment = ’’ if self.totalAspirated.setdefault(color, 0) != self.totalDispensed.setdefault(color, 0): comment = “\t\t##### WASTING BIO-INK : more aspirated than dispensed!” print(f"\t{color}:\t\t aspirated {self.totalAspirated[color]}\t dispensed {self.totalDispensed[color]}{comment}") print(f"\t[all colors]:\t[aspirated {sum(self.totalAspirated.values())}]\t[dispensed {sum(self.totalDispensed.values())}]") print(f"\n=== TIP COUNT ===\n\t Used {self.tip_count} tip(s) (ideally exactly one per unique color)") print("\n") # plus prints its own newline

## uncomment (only) one of these corresponding to the background medium you're printing on
plt.gca().add_patch(plt.Circle((0, 0), radius=PETRI_INNER_DIAMETER/2, color='#000000', fill=True)) # petri dish - 84mm inner diam, black agar plate
#plt.gca().add_patch(plt.Circle((0, 0), radius=PETRI_INNER_DIAMETER/2, color='#000000', fill=False)) # petri dish - 84mm inner diam, paper insert
#plt.gca().add_patch(plt.Circle((0, 0), radius=PETRI_INNER_DIAMETER/2, color='#d7ca95', fill=True)) # petri dish - 84mm inner diam, agar plate

plt.scatter(self.droplets_x, self.droplets_y, self.droplets_size, c=self.droplets_color)

for xlist,ylist,color in self.smears:
    plt.gca().plot(xlist, ylist, color=color, linewidth=4, solid_capstyle='round')

plt.xlim((-(PETRI_INNER_DIAMETER/2 + 0.5), PETRI_INNER_DIAMETER/2 + 0.5))
plt.ylim((-(PETRI_INNER_DIAMETER/2 + 0.5), PETRI_INNER_DIAMETER/2 + 0.5))
plt.show()

class WellMock: def init(self, well_id, well_color, labware_official_name): self.well_id = well_id self.labware_official_name = labware_official_name self.well_color = well_color if well_color else ‘purple’

def get_row_col(self):          # (NOT in opentrons api)
    row = ord(self.well_id[0].upper())
    col = int(self.well_id[1:])
    return (row, col)

def set_row_col(self, row, col):# (NOT in opentrons api)
    self.well_id = chr(row) + str(col)

def color(self):                # (NOT in opentrons api)
    return self.well_color

def bottom(self, z):            # (in opentrons api)
    assert z >= 0
    return self

def center(self):               # (in opentrons api)
    return self

def top(self, z=0):             # (in opentrons api)
    assert(isinstance(z, (int, float)))
    return types.Location(types.Point(x=0, y=0, z=z), 'Well')
    # return self

def move(self, location):       # (NOT in opentrons api) -- why do we have this here? what do we think it should do, move a well?
    assert(isinstance(location, types.Location))
    return self

def __eq__(self, other):
    return self.__class__ == other.__class__ and self.__dict__ == other.__dict__

def __repr__(self):
    return self.well_id

class LabwareMock: def init(self, labware_official_name, deck_slot, display_name, well_colors): self.labware_official_name = labware_official_name self.deck_slot = deck_slot self.display_name = display_name self.well_colors = well_colors

# the opentrons api names these arguments: self, idx
def well(self, well_id):        # (in opentrons api, but deprecated -- use wells(int) or wells_by_name(str) instead)
    return WellMock(well_id, self.well_colors.get(well_id, ''), self)

def __getitem__(self, well_id):
    return WellMock(well_id, self.well_colors.get(well_id, ''), self)

def __repr__(self):
    return "Deck Slot %s - %s" % (str(self.deck_slot), self.display_name)

class ModuleMock: def init(self, module_official_name, deck_slot, well_colors): self.module_official_name = module_official_name self.deck_slot = deck_slot self.well_colors = well_colors

# the opentrons api names these arguments: self, name, label
def load_labware(self, labware_official_name, display_name):    # (in opentrons api)
    mock_print("Module " + str(self.module_official_name) + " loaded " + str(labware_official_name))
    return LabwareMock(labware_official_name, self.deck_slot, display_name, well_colors)

def set_temperature(self, celsius):     # (in opentrons api)
    assert(isinstance(celsius, int))
    assert(celsius >= 4 and celsius <= 110)
    mock_print("Setting temperature to " + str(celsius) + "C")

def open_lid(self):                     # (in opentrons api)
    mock_print("Opening lid")

def close_lid(self):                    # (in opentrons api)
    mock_print("Closing lid")

def set_lid_temperature(self, temperature):     # (in opentrons api, but only for Thermocycler)
    assert(isinstance(temperature, int))
    assert(temperature >= 4 and temperature <= 110)
    mock_print("Setting lid temperature to " + str(temperature) + "C")

def deactivate_lid(self):               # (in opentrons api, but only for Thermocycler)
    mock_print("Deactivate lid")

                                        # (in opentrons api, but only for Thermocycler)
def set_block_temperature(self, temperature, hold_time_minutes=0, hold_time_seconds=0, ramp_rate=0, block_max_volume=25):
    assert(isinstance(temperature, int))
    assert(temperature >= 4 and temperature <= 110)
    assert(isinstance(hold_time_minutes, int))
    assert(isinstance(block_max_volume, int))
    mock_print("Setting block temperature to " + str(temperature) + "C")
    if (hold_time_minutes > 0):
        mock_print("Holding for " + str(hold_time_minutes) + " minutes...")
    if (hold_time_seconds > 0):
        mock_print("Holding for " + str(hold_time_seconds) + " seconds...")

def execute_profile(self, steps, repetitions, block_max_volume):    # (in opentrons api, but only for Thermocycler)
    assert(isinstance(repetitions, int))
    assert(isinstance(block_max_volume, int))

    mock_print("Executing following protocol for " + str(repetitions) + " cycles")

    for step in steps:
        assert(isinstance(step, dict))
        assert(isinstance(step['temperature'], int))
        assert(isinstance(step['hold_time_seconds'], int))

        mock_print("Temperature: " + str(step['temperature']) + "C, Time: " + str(step['hold_time_seconds']) + " seconds")

hmm, this appears to be unused…

class InstrumentMock: def init(self, instrument_official_name, mount_LR, tip_rack_list): self.instrument_official_name = instrument_official_name self.mount_LR = mount_LR starting_tip = None

    if "p20" in instrument_official_name:
        self.display_name = "P20"
        self.vol_range = (1, 20)
    elif "p300" in instrument_official_name:
        self.display_name = "P300"
        self.vol_range = (20, 300)
    elif "p1000" in instrument_official_name:
        self.display_name = "P1000"
        self.vol_range = (100, 1000)
    else:
        mock_print("WARNING: UNSUPPORTED PIPETTE")
        assert false

def advance_tip(self):
    row, col = self.starting_tip.get_row_col()

    row += 1
    if row > ord('H'):
        row = ord('A')
        col += 1

    if col > 12:
        mock_print("WARNING: OUT OF TIPS!!!")
        assert false

    self.starting_tip.set_row_col(row, col)

def pick_up_tip(self):
    row, col = self.starting_tip.get_row_col()
    assert(row >= ord('A') and row <= ord('H'))
    assert(col >= 1 and col <= 12)
    mock_print(self.display_name + " is picking up a tip from " + str(self.starting_tip))
    self.advance_tip()

def drop_tip(self):
    mock_print(self.display_name + " is dropping a tip");

def aspirate(self, volume, well):
    assert(isinstance(volume, (int, float)))
    assert(isinstance(well, WellMock))
    assert volume >= self.vol_range[0] and volume <= self.vol_range[1]
    mock_print("##### " + str(well.labware_official_name) + " [" + str(well.well_id) + "] ---> (" + str(volume) + "uL)")

def dispense(self, volume, well):
    assert(isinstance(volume, (int, float)))
    assert(isinstance(well, WellMock))
    assert volume >= self.vol_range[0] and volume <= self.vol_range[1]
    mock_print("##### " + str(well.labware_official_name) + " [" + str(well.well_id) + "] <--- (" + str(volume) + "uL)")

def blow_out(self):
    mock_print(self.display_name + " blow out")

def mix(self, repetitions, volume, well):
    assert(isinstance(repetitions, int))
    assert(isinstance(volume, (int, float)))
    assert(isinstance(well, WellMock))
    assert volume >= self.vol_range[0] and volume <= self.vol_range[1]
    mock_print("##### " + str(well.labware_official_name) + " [" + str(well.well_id) + "] - Mixing - " + str(repetitions) + " times, volume " + str(volume) + "uL")

def move_to(self, location, force_direct=False):
    assert(isinstance(force_direct, bool))
    assert(isinstance(location, WellMock))
    mock_print(self.display_name + " is moving");

class OpentronsMock: def init(self, well_colors): self.well_colors = well_colors self.pipette = None #self.location_cache = None # unimplemented: opentrons api’s more canonical way to get last_location, but these protocols don’t need it

def home(self):
    mock_print("Going home!")

# the opentrons api names these arguments: self, load_name, location, label
def load_labware(self, labware_official_name, deck_slot, display_name):
    mock_print("Loaded " + str(labware_official_name) + " in deck slot " + str(deck_slot))
    return LabwareMock(labware_official_name, deck_slot, display_name, self.well_colors)

# the opentrons api names these arguments: self, module_name, location
def load_module(self, module_official_name, deck_slot=0):
    mock_print("Loaded module " + str(module_official_name) + " in deck slot " + str(deck_slot))
    return ModuleMock(module_official_name, deck_slot, self.well_colors)

# the opentrons api names these arguments: self, instrument_name, mount, tip_racks
def load_instrument(self, instrument_official_name, mount_LR, tip_rack_list):
    self.pipette = PipetteSim(instrument_official_name, mount_LR, tip_rack_list, self.well_colors)
    return self.pipette

def pause(self):
    mock_print("Robot pause")

def visualize(self):
    self.pipette.visualize()

Put your name in the ‘author’ field of the metadata near the top of the first block, give your protocol a ‘protocolName’ there, and fill in the ‘description’ of what the protocol will do Write code to create your design at the very end of the first block

DEVELOPMENT TIP: Write your code in short runnable chunks, and after you’ve written each one run both of your clode blocks (running the first one loads your code, running the second one executes it on the simulator) to see that it’s doing what you expect. Simulate often!

My Code

    from opentrons import types import math

    metadata = { “author”: “Charley Naney”, “protocolName”: “HTGAA Opentrons Lab”, “description”: “HW3 multi-color agar patterning (safe preview + batched dispense)”, “source”: “HTGAA 2026 Opentrons Lab”, “apiLevel”: “2.20”, }

    —————————-

    DECK CONSTANTS

    —————————-

    TIP_RACK_DECK_SLOT = 9 COLORS_DECK_SLOT = 6 AGAR_DECK_SLOT = 5 PIPETTE_STARTING_TIP_WELL = “A1”

    —————————-

    COLOR SOURCE WELL -> PATTERN NAME (for logging)

    —————————-

    What the TA mock/visualizer uses for plotting colors (must be matplotlib-valid)

    well_colors = { “A1”: “#FFBF00”, # mko2 “B1”: “#FF4500”, # mrfp1 “C1”: “#FF2400”, # mscarlet “D1”: “#32CD32”, # sfgfp “E1”: “#7DF9FF”, # electra2 “F1”: “#4A4B43”, # mjuniper

    }

    What YOU use for printing nice names in the log

    well_color_names = { “A1”: “mko2”, “B1”: “mrfp1”, “C1”: “mscarlet”, “D1”: “sfgfp”, “E1”: “electra2”, “F1”: “mjuniper”, }

    mko2_points = [(7, 29),(11, 29),(13, 29),(17, 29),(21, 29),(7, 27),(9, 27),(11, 27),(15, 27),(17, 27),(19, 27),(21, 27),(25, 27),(1, 25),(3, 25),(5, 25),(7, 25),(9, 25),(11, 25),(13, 25),(17, 25),(19, 25),(23, 25),(25, 25),(27, 25),(-1, 23),(1, 23),(5, 23),(7, 23),(9, 23),(11, 23),(13, 23),(21, 23),(23, 23),(27, 23),(29, 23),(-9, 21),(-5, 21),(-3, 21),(-1, 21),(1, 21),(3, 21),(5, 21),(23, 21),(27, 21),(29, 21),(-13, 19),(-9, 19),(-7, 19),(-3, 19),(-1, 19),(23, 19),(25, 19),(27, 19),(29, 19),(-17, 17),(-13, 17),(-11, 17),(-7, 17),(-5, 17),(-3, 17),(25, 17),(-21, 15),(-17, 15),(-15, 15),(-13, 15),(-11, 15),(-9, 15),(-7, 15),(25, 15),(27, 15),(29, 15),(31, 15),(-25, 13),(-23, 13),(-21, 13),(-19, 13),(-17, 13),(-15, 13),(-13, 13),(25, 13),(29, 13),(-27, 11),(-25, 11),(-23, 11),(-21, 11),(-19, 11),(-17, 11),(25, 11),(27, 11),(31, 11),(-27, 9),(-23, 9),(-21, 9),(25, 9),(27, 9),(-27, 7),(-25, 7),(-23, 7),(25, 7),(27, 7),(29, 7),(-29, 5),(-27, 5),(-25, 5),(-23, 5),(23, 5),(27, 5),(-29, 3),(-27, 3),(-25, 3),(-23, 3),(23, 3),(25, 3),(27, 3),(-29, 1),(-27, 1),(-25, 1),(-21, 1),(21, 1),(23, 1),(-27, -1),(-25, -1),(-23, -1),(-21, -1),(-19, -1),(-15, -1),(17, -1),(19, -1),(23, -1),(-23, -3),(-21, -3),(-19, -3),(-3, -3),(5, -3),(7, -3),(11, -3),(13, -3),(15, -3),(19, -3),(-21, -5),(-19, -5),(-15, -5),(-13, -5),(-11, -5),(-9, -5),(-7, -5),(-5, -5),(-3, -5),(-1, -5),(1, -5),(3, -5),(7, -5),(11, -5)] mrfp1_points = [(19, 29),(1, 27),(23, 27),(-3, 25),(3, 23),(7, 21),(25, 21),(31, 21),(-17, 19),(5, 19),(-9, 17),(27, 17),(31, 17),(-11, 13),(-7, 13),(31, 13),(29, 11),(-25, 9),(-19, 9),(29, 9),(31, 9),(33, 9),(-33, 7),(-31, 7),(-21, 7),(31, 7),(29, 5),(-23, 1),(25, 1),(-31, -1),(-17, -1),(15, -1),(21, -1),(-27, -3),(-25, -3),(1, -3),(3, -3),(9, -3),(17, -3),(-17, -5),(5, -5),(13, -5),(-23, -7),(-19, -7),(-17, -7),(-15, -7),(-7, -7),(3, -7),(5, -7),(-3, -9),(-17, -11),(-15, -11),(-3, -11),(3, -13),(-21, -15),(-21, -19)] mscarlet_i_points = [(-21, 5),(19, 5),(-33, 3),(-31, 3),(-21, 3),(-19, 3),(-35, 1),(-37, -3),(-37, -5),(-37, -7),(-35, -9),(-33, -11),(-27, -13),(-25, -13),(-23, -15),(-21, -17),(-19, -23),(-17, -27),(-13, -29),(-11, -29)] sfgfp_points = [(13, 17),(15, 17),(1, 15),(9, 15),(11, 15),(13, 15),(15, 15),(3, 13),(11, 13),(13, 13),(15, 13),(17, 13),(-5, 11),(-3, 11),(1, 11),(3, 11),(7, 11),(11, 11),(13, 11),(15, 11),(-9, 9),(-7, 9),(-5, 9),(-1, 9),(1, 9),(7, 9),(9, 9),(13, 9),(15, 9),(-11, 7),(-7, 7),(-3, 7),(-1, 7),(1, 7),(3, 7),(7, 7),(9, 7),(13, 7),(15, 7),(19, 7),(-7, 5),(-5, 5),(-3, 5),(-1, 5),(3, 5),(5, 5),(9, 5),(-15, 3),(-13, 3),(-7, 3),(-11, 1),(3, 1)] electra2_points = [(17, 21),(19, 19),(7, 17),(17, 17),(21, 15),(-3, 13),(9, 13),(9, 11),(17, 11),(-15, 9),(-13, 9),(5, 9),(21, 9),(-5, 7),(17, 7),(-15, 5),(-13, 5),(-11, 5),(11, 5),(-5, 3),(3, 3),(13, 3),(17, 3),(-13, 1),(9, 1),(11, 1)] mjuniper_points = [(17, 19),(9, 17),(19, 17),(17, 15),(5, 11),(-3, 9),(3, 9),(11, 9),(17, 9),(19, 9),(-13, 7),(-9, 7),(5, 7),(-9, 5),(1, 5),(13, 5),(-11, 3),(9, 3),(11, 3)]

    colors_to_points = { “A1”: mko2_points, “B1”: mrfp1_points, “C1”: mscarlet_i_points, “D1”: sfgfp_points, “E1”: electra2_points, “F1”: mjuniper_points, }

    def run(protocol): def safe_comment(msg): if hasattr(protocol, “comment”): protocol.comment(str(msg)) else: print(str(msg))

    def safe_pause(msg):
        if hasattr(protocol, "pause"):
            try:
                protocol.pause(str(msg))
            except TypeError:
                protocol.pause()
                print(f"[PAUSE] {msg}")
        else:
            print(f"[PAUSE] {msg}")
    
    def reset_mock_aspirated_loc(pipette):
        if hasattr(pipette, "aspirated_loc"):
            pipette.aspirated_loc = None
    
    tips_20ul = protocol.load_labware(
        "opentrons_96_tiprack_20ul",
        TIP_RACK_DECK_SLOT,
        "Opentrons 20uL Tips",
    )
    
    try:
        pipette_20ul = protocol.load_instrument(
            "p20_single_gen2",
            "right",
            tip_racks=[tips_20ul],
        )
    except TypeError:
        pipette_20ul = protocol.load_instrument(
            "p20_single_gen2",
            "right",
            [tips_20ul],
        )
    
    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)
    
    temperature_module = protocol.load_module(
        "temperature module gen2",
        COLORS_DECK_SLOT,
    )
    
    temperature_plate = temperature_module.load_labware(
        "opentrons_96_aluminumblock_generic_pcr_strip_200ul",
        "Cold Plate",
    )
    
    color_plate = temperature_plate
    
    agar_plate = protocol.load_labware(
        "htgaa_agar_plate",
        AGAR_DECK_SLOT,
        "Agar Plate",
    )
    
    center_location = agar_plate["A1"].top()
    
    DRY_RUN_ONLY = False
    DOT_UL = 1.0
    PIP_MAX_UL = 20.0
    HEADROOM_UL = 1.0
    PREVIEW_Z_MM = 10.0
    DETACH_Z_MM = 5.0
    
    X_MAX = 40.0
    Y_MAX = 40.0
    
    origin = center_location  
    
    max_dots_per_asp = int(math.floor((PIP_MAX_UL - HEADROOM_UL) / DOT_UL))
    if max_dots_per_asp < 1:
        raise ValueError("Invalid aspiration volume configuration.")
    
    def optimize_path_scanline(points):
        buckets = {}
        for (x, y) in points:
            buckets.setdefault(float(y), []).append(float(x))
        ys = sorted(buckets.keys(), reverse=True)
        out = []
        flip = False
        for y in ys:
            xs = sorted(buckets[y], reverse=flip)
            out.extend([(x, y) for x in xs])
            flip = not flip
        return out
    
    def bounding_box_check(mapping):
        for src, pts in mapping.items():
            for (x, y) in pts:
                if (float(x) < -X_MAX or float(x) > X_MAX or
                    float(y) < -Y_MAX or float(y) > Y_MAX):
                    raise RuntimeError(f"Point outside safety bounds for {src}: ({x}, {y})")
    
    def preview_move(pipette, location):
        high = location.move(types.Point(z=location.point.z + PREVIEW_Z_MM))
        pipette.move_to(high)
    
    def dispense_and_detach(pipette, volume, location):
        above = location.move(types.Point(z=location.point.z + DETACH_Z_MM))
        pipette.move_to(above)
        pipette.dispense(volume, location)
        pipette.move_to(above)
    
    bounding_box_check(colors_to_points)
    
    safe_comment("Starting dry run preview.")
    pipette_20ul.pick_up_tip()
    
    for src_well in sorted(colors_to_points.keys()):
        pts = optimize_path_scanline(colors_to_points[src_well])
        safe_comment(
            f"Preview {src_well} ({well_color_names.get(src_well, src_well)}): {len(pts)} dots"
        )
        for (x, y) in pts:
            target = origin.move(types.Point(x=float(x), y=float(y)))
            preview_move(pipette_20ul, target)
    
    pipette_20ul.drop_tip()
    
    if DRY_RUN_ONLY:
        safe_comment("Preview only mode complete.")
        return
    
    safe_pause("Dry run complete. Resume to dispense.")
    
    safe_comment("Starting dispense.")
    
    for src_well in sorted(colors_to_points.keys()):
        source = color_plate[src_well]
        pts = optimize_path_scanline(colors_to_points[src_well])
    
        safe_comment(
            f"Pattern: {well_color_names.get(src_well, src_well)} | Source well: {src_well} | Dots: {len(pts)}"
        )
    
        pipette_20ul.pick_up_tip()
        reset_mock_aspirated_loc(pipette_20ul)
    
        remaining = list(pts)
        while remaining:
            batch = remaining[:max_dots_per_asp]
            remaining = remaining[max_dots_per_asp:]
    
            volume_needed = len(batch) * DOT_UL
            pipette_20ul.aspirate(volume_needed, source)
    
            for (x, y) in batch:
                target = origin.move(types.Point(x=float(x), y=float(y)))
                dispense_and_detach(pipette_20ul, DOT_UL, target)
    
        pipette_20ul.drop_tip()
        reset_mock_aspirated_loc(pipette_20ul)
    
    safe_comment("Run complete.")
    
    Execute Simulation / Visualization – don’t change this code block

    protocol = OpentronsMock(well_colors) run(protocol) protocol.visualize()

Week 4 HW: Protein Design Part I

Homework: Protein Design I

Assignment


Objective:

Learn basic concepts: amino acid structure, 3D protein visualization, and the variety of ML-based design tools. Brainstorm as a group how to apply these tools to engineer a better bacteriophage (setting the stage for the final project).

HTGAA Protein Engineering Tools, HTGAA Protein Engineering Feedback​


Part A. Conceptual Questions

Answer any of the following questions by Shuguang Zhang:
  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
  2. Why humans eat beef but do not become a cow, eat fish but do not become fish?
    • Although the saying goes we are what we eat, our genomes disagree – they are selfish like that. However, if a human eats a cow with a prion disease, the line between them fades as each returns to the Earth.
  3. Why there are only 20 natural amino acids?
  4. Can you make other non-natural amino acids? Design some new amino acids.
  5. Where did amino acids come from before enzymes that make them, and before life started?
  6. If you make an alpha-helix using D-amino acids, what handedness (right or left) would you expect?
    • The 20 primary amino acids are all L-amino acids, as are most protein building blocks of cells. Alpha-helices here will be the B-DNA, favoring right-handedness. Thus by the power of deduction that leaves D-amino acids, the exceptions, to the way of left-handedness.
  7. Can you discover additional helices in proteins?
    • yes
  8. Why most molecular helices are right-handed?
  • Most life on Earth is evolutionary rooted in B-DNA helices with a right-handed confirmation due to origin in saltwater oceans, passed on to self-replicating cells synthesized from macromolecules shaped by complementarities in form dominated by non-covalent weak interactions. source,
  1. Why do beta-sheets tend to aggregate?
  • ionic bonding, need to confirm.
  1. What is the driving force for b-sheet aggregation?
  2. Why many amyloid diseases form b-sheet?
  3. Can you use amyloid b-sheets as materials?
  4. Design a b-sheet motif that forms a well-ordered structure.

Part B: Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.
Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions.
  1. Briefly describe the protein you selected and why you selected it.
  2. Identify the amino acid sequence of your protein.
  3. How long is it? What is the most frequent amino acid? You can use this notebook to count most frequent amino acid - https://colab.research.google.com/drive/1vlAU_Y84lb04e4Nnaf1axU8nQA6_QBP1?usp=sharing
  4. How many protein sequence homologs are there for your protein? Hint: Use the pBLAST tool to search for homologs and ClustalOmega to align and visualize them. Tutorial Here
  5. Does your protein belong to any protein family?
  6. Identify the structure page of your protein in RCSB
  7. When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)
  8. Are there any other molecules in the solved structure apart from protein?
  9. Does your protein belong to any structure classification family?
  10. Open the structure of your protein in any 3D molecule visualization software:
  • PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
    • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
    • Color the protein by secondary structure. Does it have more helices or sheets?
    • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
    • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Part C. Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein. Copy the notebook below and set up a colab instance with GPU for this section: HTGAA_ProteinDesign2026.ipynb Choose your favorite protein from the PDB. We will now try multiple things, report each of those results in your homework page: Protein Language Models: Deep Mutational Scans Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods. Can you explain any particular pattern? (choose a residue and a mutation that stands out) (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment Latent Space Analysis Use the provided sequence dataset to embed proteins in reduced dimensionality Analyze the different formed neighborhoods: do they approximate similar proteins? Place your protein in the resulting map and explain its position and similarity to its neighbors Attention Maps Analyze the attention maps of ESM2. Investigate if its layers correlate to the 2D map of residue distances of your protein Protein Folding: Folding a protein Fold your protein with ESMFold. Do the predicted coordinates match your original structure? Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations? Protein Generation: Inverse-Folding a protein Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one Input this sequence into ESMFold and compare the predicted structure to your original Part D. Group Brainstorm on Bacteriophage Engineering

Find a group of ~3–4 students Review the Bacteriophage Final Project Goals: Increased stability (easiest) Higher titers (medium) Higher toxicity of lysis protein (hard) Brainstorm Session Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”). Write a 1-page proposal (bullet points or short paragraphs) describing: Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”). Why you think those tools might help solve your chosen sub-problem. One or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”). Include a schematic of your pipeline This resource may be useful: HTGAA Protein Engineering Tools Individually put your plan on your website page Each group’s short plan for engineering a bacteriophage Schedule time ( HTGAA Protein Engineering Feedback) to get feedback/discuss your ideas, and put the feedback on your website [Optional] Part E. Find a drug for an oncology target