Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    This homework analyzes a synthetic biology idea and evaluates governance options to support ethical, safe, and responsible innovation.

  • Week 2 HW: DNA Read, Write, & Edit

    This homework goes through DNA Sequencing and Editing.

  • Week 3 HW: Lab Automation

    A robot-assisted synthetic biology platform that uses automated, plate-based assays to test how ABO-like glycan contexts influence inflammatory and microbiome-related responses relevant to gastrointestinal disease risk.

Subsections of Homework

Week 1 HW: Principles and Practices

[Cover image]

🧠 Question 1

First, describe a biological engineering application or tool you want to develop and why.
This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about.

✍️ Answer

One biological engineering tool I’m curious about developing is a synthetic biology–based system to explore whether blood group types, especially blood type A, are actually linked to higher gastrointestinal disease risk at a biological level. I’ve read in multiple papers that people with blood type A may have a higher risk for certain gastrointestinal problems (1). However, when I looked into it more, most of the evidence seems to come from population statistics rather than experimental or mechanistic studies. There doesn’t seem to be a clear biological explanation, and there also aren’t many tools that can directly test this relationship in a controlled way. That gap is what makes me interested in this idea. From a synthetic biology perspective, I find it interesting that ABO blood groups are defined by differences in glycan structures, which are known to play roles in cell–cell interactions, immune responses, and host–microbiome relationships (2). This makes me wonder whether these glycan differences could influence how the gut environment responds to inflammation or pathogens and whether that could partially explain the observed disease risk. A possible approach could be to use engineered cells or microbial biosensors with simple genetic circuits that respond to blood-group-related glycan patterns and gastrointestinal inflammation markers. The goal wouldn’t be to create a finished diagnostic tool right away, but rather a research platform that helps test whether these associations are biologically meaningful instead of just statistical.


🧠 Question 2

Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

✍️ Answer

Because this tool links blood group type with disease risk, it raises important ethical and governance concerns. A key goal is preventing harm, especially avoiding discrimination or overinterpretation of results, since blood type alone does not determine gastrointestinal disease risk. Governance should also ensure biological safety and scientific responsibility, particularly if engineered cells or genetic circuits are used, by requiring proper containment and validation before findings are shared beyond research settings. In addition, protecting individual autonomy and privacy is essential, as combining blood group information with biosensor data creates sensitive health information that should only be used with informed consent. Finally, equity should be considered to ensure that the tool does not disproportionately benefit or disadvantage specific populations.


🧠 Question 3

Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g., a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g., academic researchers, companies, federal regulators, law enforcement, etc.). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g., 3D printing, drones, financial systems, etc.).

✍️ Answer

Action 1: Regulatory Oversight and Ethical Review

Purpose: Currently, early-stage synthetic biology research often proceeds with minimal oversight, especially in academic labs. I propose requiring that any research using engineered cells or biosensors targeting blood group data undergo formal ethical review and regulatory approval before publication or broader use.

Design: National or EU-level regulators (e.g., EMA) and university ethics boards would evaluate safety, privacy protections, and non-discrimination measures. Researchers would submit risk assessments and validation plans.

Assumptions: This assumes regulators and review boards have enough expertise in synthetic biology to assess risk accurately and that labs comply with these requirements.

Risks of Failure & “Success”: Failure could occur if the review is too slow or inconsistent, slowing research unnecessarily. Success could unintentionally create overconfidence in safety, leading others to assume the tool is risk-free.

Action 2: Privacy and Data Governance Framework

Purpose: Right now, blood group and biosensor data could be collected without strong protections. I propose treating this information as sensitive health data, requiring secure storage, anonymisation, and informed consent for research or secondary use.

Design: Universities, hospitals, and biotech companies would implement encrypted databases and adopt privacy-by-design models, such as federated learning, where data stays local but insights can still be shared.

Assumptions: This assumes that technical infrastructure is available and that participants understand consent procedures.

Risks of Failure & “Success”: Data leaks could lead to discrimination or misuse. Overly restrictive rules could hinder collaboration and slow scientific progress.

Action 3: Incentives for Equitable and Responsible Innovation

Purpose: Often, SynBio innovations are developed for wealthy populations or commercial markets. I propose funding programs and grants that encourage open-source development of biosensor tools and ensure accessibility to diverse populations.

Design: Government research agencies (e.g., DFG, Horizon Europe) could tie grants to equity and open-science requirements. NGOs and academic labs could partner to distribute tools widely and safely.

Assumptions: This assumes that companies and researchers are motivated by incentives and will participate voluntarily.

Risks of Failure & “Success”: Companies may avoid participation, limiting innovation. Open designs could also be misused if security oversight is insufficient.


🧠 Question 4

Next, score (from 1 to 3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework, but feel free to make your own:

✍️ Answer

Does the option:                                      Option 1   Option 2   Option 3
Enhance Biosecurity
• By preventing incidents                                 1          2          3
• By helping respond                                      1          2          3
Foster Lab Safety
• By preventing incidents                                 1          2          3
• By helping respond                                      1          2          3
Protect the environment
• By preventing incidents                                 1          2          3
• By helping respond                                      1          2          3
Other considerations
• Minimizing costs and burdens to stakeholders            3          2          1
• Feasibility?                                            2          1          3
• Not impede research                                     3          2          1
• Promote constructive applications                       1          2          3
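
To make the comparison easier to follow, the scores above can be added up per option. This is only a rough sketch: it assumes every criterion carries equal weight (the rubric does not say so), and the names `SCORES` and `option_totals` are made up for this illustration. Lower totals are better because 1 is the best score.

```python
# Scores per criterion as (Option 1, Option 2, Option 3); 1 = best, 3 = worst.
SCORES = {
    "Biosecurity: preventing incidents": (1, 2, 3),
    "Biosecurity: helping respond": (1, 2, 3),
    "Lab safety: preventing incidents": (1, 2, 3),
    "Lab safety: helping respond": (1, 2, 3),
    "Environment: preventing incidents": (1, 2, 3),
    "Environment: helping respond": (1, 2, 3),
    "Minimizing costs and burdens": (3, 2, 1),
    "Feasibility": (2, 1, 3),
    "Not impede research": (3, 2, 1),
    "Promote constructive applications": (1, 2, 3),
}


def option_totals(scores):
    """Sum the scores column by column (equal weighting is an assumption)."""
    return tuple(sum(column) for column in zip(*scores.values()))


if __name__ == "__main__":
    for number, total in enumerate(option_totals(SCORES), start=1):
        print(f"Option {number}: total score {total}")
```

With equal weights, Option 1 has the lowest total and Option 3 the highest, although Option 3 scores best on cost and on not impeding research.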

🧠 Question 5

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritise, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g., to MIT leadership or the Cambridge Mayoral Office) to the national (e.g., to President Biden or the head of a federal agency) to the international (e.g., to the United Nations Office of the Secretary-General or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

✍️ Answer

Based on the scoring of the three governance options, I would prioritise a combination of Option 1 (Regulatory Oversight & Ethical Review) and Option 2 (Privacy & Data Governance Framework), while also incorporating elements of Option 3 (Equity & Incentives) where possible. Regulatory oversight is the most important because it directly enhances biosecurity, lab safety, and environmental protection, which are essential when working with engineered cells or biosensors that interact with human biological data. Privacy and data governance complement this by protecting sensitive blood group and biosensor information, ensuring that individuals’ autonomy is respected and minimising the risk of misuse or discrimination.

Option 3, focusing on equitable access and open-science incentives, is valuable for promoting constructive applications and broad societal benefit, but it has less impact on immediate safety and biosecurity concerns. The main trade-off is that prioritising regulatory oversight and privacy measures may increase costs and slow research progress, while emphasising equity and open access could increase the risk of misuse if technical safeguards are insufficient.

I would recommend this combined approach to national- or EU-level regulators and research oversight bodies, such as the EMA or national bioethics committees, because they are in a position to implement formal policies and standards that balance safety, privacy, and societal benefit. The key assumptions are that regulators have sufficient expertise in synthetic biology and that institutions will comply with these rules. Uncertainties include the potential for unforeseen technical risks in engineered biosensors and how effectively privacy protections can prevent indirect discrimination.


This week’s class made me realise that even curiosity-driven synthetic biology work can raise ethical concerns, especially when human biological data is involved. One issue that was new to me was how combining traits like blood group type with disease risk can lead to harm if results are overinterpreted or misused, even without malicious intent. To address this, early ethical review, clear data privacy rules, and careful communication of uncertainty seem important governance actions.


Assignment (Week 2 Lecture Prep)- Professor Jacobson

🧠 Question 1

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

✍️ Answer

  1. Error rate ≈ 1 in 10⁶ bases (10⁻⁶)

  2. Human genome size ≈ 3.2 Gb (3.2 × 10⁹ bp)

    • With an error rate of 10⁻⁶, naïvely you’d expect: ~3,200 errors per replication
  3. Biology deals with the discrepancy between the finite error rate of DNA polymerase and the very large size of the human genome by using closed-loop, error-correcting replication rather than relying on single-pass accuracy. Replicative DNA polymerases contain a 3′→5′ proofreading exonuclease that removes misincorporated nucleotides during synthesis, improving fidelity by several orders of magnitude. Errors that escape proofreading are further corrected by post-replication mismatch repair systems such as the MutS pathway, which detect and repair base-pair mismatches. Together, these layered correction mechanisms reduce the effective error rate sufficiently to allow replication of gigabase-scale genomes, enabling biological DNA synthesis to scale far beyond what would be possible with open-loop chemical synthesis.
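
The arithmetic behind this answer can be checked with a few lines of code. This is a minimal sketch: the error rate and genome size are the figures quoted above, while the fold-improvement values standing in for proofreading and mismatch repair are hypothetical placeholders for "several orders of magnitude", not measured numbers.

```python
GENOME_BP = 3.2e9              # human genome size quoted above
POLYMERASE_ERROR_RATE = 1e-6   # errors per base quoted above


def expected_errors(genome_bp, error_rate):
    """Expected number of errors in one full copy of the genome."""
    return genome_bp * error_rate


if __name__ == "__main__":
    raw = expected_errors(GENOME_BP, POLYMERASE_ERROR_RATE)
    print(f"Without correction: ~{raw:,.0f} errors per replication")

    # Hypothetical fold improvements from layered error correction.
    for fold in (1e2, 1e3, 1e4):
        corrected = expected_errors(GENOME_BP, POLYMERASE_ERROR_RATE / fold)
        print(f"With {fold:,.0f}x better fidelity: ~{corrected:,.1f} errors")
```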


🧠 Question 2

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

✍️ Answer

  1. Average human protein-coding sequence ≈ 1036 bp (~345 aa)

    • Combined with: Degenerate genetic code (multiple codons per amino acid, about 3 on average: 61 sense codons for 20 amino acids)
    • This implies: on the order of 3³⁴⁵ (roughly 10¹⁶⁵) possible DNA sequences for one average human protein (an order-of-magnitude estimate, not an exact count)
  2. a. GC content & secondary structure

    • GC = 10%, 50%, 90% → radically different folding energies
    • Strong secondary structures block transcription/translation

    b. Repeats & homopolymers

    • Shown as problematic for synthesis and stability
    • Cause deletions and recombination

    c. Physical DNA behavior matters

    • DNA is not just information — it is matter with thermodynamics

    So many valid coding sequences fail in practice because they fold incorrectly, are unstable, are hard to synthesize, or disrupt regulatory behavior.
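
The combinatorial estimate in point 1 can be computed exactly for any given protein by multiplying the number of synonymous codons for each residue. The sketch below uses the standard genetic code; the function names and the five-residue example peptide are made up for illustration, and the stop codon is ignored.

```python
import math

# Number of codons encoding each amino acid in the standard genetic code.
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein):
    """Exact number of DNA sequences that encode the given protein."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein.upper())


def log10_synonymous_sequences(protein):
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein.upper())


if __name__ == "__main__":
    peptide = "MKAFG"  # toy example, not a real protein
    print(peptide, "can be encoded in", count_synonymous_sequences(peptide), "ways")
```

For a real protein, passing the full amino acid sequence to `log10_synonymous_sequences` gives the exponent directly, which avoids printing a number with over a hundred digits.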


Assignment (Week 2 Lecture Prep)- Dr. LeProust

🧠 Question 1

What’s the most commonly used method for oligosynthesis currently?

✍️ Answer

The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite chemistry, originally developed by Caruthers. In this method, DNA is synthesised on a solid support (such as controlled pore glass or silicon) through repetitive cycles of nucleotide coupling, capping of unreacted sites, oxidation, and deprotection. The lecture highlights that this chemistry is highly automatable and forms the basis of modern high-throughput oligo synthesis platforms, including array-based and silicon-based synthesis systems.


🧠 Question 2

Why is it difficult to make oligos longer than 200 nt via direct synthesis?

✍️ Answer

Direct chemical synthesis of oligos becomes inefficient beyond ~200 nucleotides because each synthesis cycle has a coupling efficiency slightly below 100%. These small inefficiencies accumulate over many cycles, leading to a rapid decrease in the fraction of full-length products and a buildup of truncated sequences. As oligo length increases, synthesis errors and truncation products dominate the pool, making purification of the correct full-length oligo increasingly difficult. Additionally, longer sequences are more prone to secondary structure formation, which further reduces synthesis efficiency, as mentioned in the lecture.


🧠 Question 3

Why can’t you make a 2000 bp gene via direct oligo synthesis?

✍️ Answer

Synthesising a 2000 bp gene directly using phosphoramidite chemistry is not feasible because the cumulative effect of coupling inefficiencies and error rates makes the yield of full-length, error-free molecules vanishingly small. Over roughly two thousand synthesis cycles, the probability of obtaining a correct full-length product approaches zero, while the majority of molecules are truncated or contain multiple errors. For this reason, the lecture emphasizes that modern gene synthesis relies on assembling shorter, chemically synthesized oligos into longer gene fragments using enzymatic assembly methods, followed by sequence verification, rather than attempting direct synthesis of long genes.
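
The yield argument in the last two answers can be made concrete with a rough calculation. This is a minimal sketch under stated assumptions: the 99.5% coupling efficiency is an illustrative value rather than a figure from the lecture, the function name is made up, the first nucleotide is assumed to be pre-attached to the support, and only failed couplings are modelled (no substitutions or deletions).

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands reaching full length if every coupling must succeed."""
    # An n-mer needs n - 1 coupling steps when the first nucleotide is
    # already attached to the solid support (simplifying assumption).
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    ASSUMED_EFFICIENCY = 0.995  # illustrative per-step coupling efficiency
    for length in (20, 200, 2000):
        fraction = full_length_fraction(length, ASSUMED_EFFICIENCY)
        print(f"{length:>5} nt: {fraction:.2e} of strands are full length")
```

Under this assumption about a third of the strands are full length at 200 nt, while at 2000 nt the fraction falls below one in ten thousand, which is why long genes are assembled from shorter oligos.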


Assignment (Week 2 Lecture Prep)- George Church

🧠 Question 1

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

✍️ Answer

Animals require ten essential amino acids: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine (during growth) because they cannot synthesise them on their own. This limitation is metabolic rather than genetic, meaning the ribosome can translate these amino acids, but the organism must obtain them from the environment, as emphasised in Church’s slides on amino acid constraints.

The lysine contingency is especially important because animals completely lack a lysine biosynthesis pathway. This makes lysine a reliable metabolic bottleneck that can be exploited for biocontainment. An engineered organism that depends on lysine, or a lysine analogue, cannot survive without external supplementation, reducing the risk of escape or uncontrolled spread. Lysine is also central to protein function due to its positive charge and role in protein–protein interactions and post-translational modifications. Because lysine is essential at metabolic, structural, and regulatory levels, the lysine contingency provides a robust and evolution-resistant control strategy in synthetic biology.


Assignment (Your HTGAA Website) — DUE BY START OF FEB 10 LECTURE

Begin personalising your HTGAA website at https://edit.htgaa.org/, starting with your homepage: fill in the template with information about yourself, or remove what’s there and make it your own. Be creative! - Done

As with all assignments in HTGAA, be sure to write up every part of this homework on your HTGAA website in order to receive credit. - Done


References

(1) J. Y. Huang, R. Wang, Y.-T. Gao, and J.-M. Yuan, “ABO blood type and the risk of cancer – Findings from the Shanghai Cohort Study,” PLoS ONE, vol. 12, no. 9, p. e0184295, Sep. 2017, doi: 10.1371/journal.pone.0184295.

(2) G. Misevic, “ABO blood group system,” Blood and Genomics, vol. 2, no. 2, pp. 71–84, Jan. 2018, doi: 10.46701/apjbg.2018022018113.

Note: The cover image and the rephrasing of some lines were done with AI.

Week 2 HW: DNA Read, Write, & Edit

Part 0: Basics of Gel Electrophoresis

  • Attend or watch all lecture and recitation videos. Optionally watch bootcamp.

Part 1: Benchling & In-silico Gel Art

  • Make a free account at benchling.com
  • Import the Lambda DNA.

  • Simulate Restriction Enzyme Digestion with the following Enzymes:
    • EcoRI
    • HindIII
    • BamHI
    • KpnI
    • EcoRV
    • SacI
    • SalI

  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.

  • You might find Ronan’s website a helpful tool for quickly iterating on designs!
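
The digest simulation that Benchling performs can be approximated in a few lines. The sketch below is only an illustration: the recognition sites and cut offsets are the standard ones for the seven enzymes listed above, but the function names and the short toy sequence are made up, and the real Lambda DNA sequence would need to be loaded separately. It treats the molecule as linear, as Lambda DNA is.

```python
# Recognition site and top-strand cut offset (bases from the start of the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def cut_positions(seq, enzyme):
    """Top-strand cut positions of one enzyme in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzymes):
    """Fragment lengths, in order along the molecule, after cutting with all enzymes."""
    cuts = sorted({pos for name in enzymes for pos in cut_positions(seq, name)})
    edges = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(edges, edges[1:]) if end > begin]


if __name__ == "__main__":
    # Toy sequence with one EcoRI site and one BamHI site (not Lambda DNA).
    toy = "A" * 10 + "GAATTC" + "C" * 20 + "GGATCC" + "T" * 5
    for lane in (["EcoRI"], ["BamHI"], ["EcoRI", "BamHI"]):
        print("+".join(lane), "->", sorted(digest(toy, lane), reverse=True))
```

Sorting the fragment lengths in descending order mirrors how bands appear on a gel, with the largest fragments nearest the wells, which is the basis for planning a gel-art pattern lane by lane.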

Part 2: Gel Art – Restriction Digests and Gel Electrophoresis

I did not have lab access to perform this experiment.


Part 3: DNA Design Challenge

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen, and why? Using one of the tools described in recitation (NCBI, UniProt, Google), obtain the protein sequence for the protein you chose.

(Example from our group homework, you may notice the particular format — The example below came from UniProt)

>sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL
EAVIRTVTTLQQLLT

Answer

For this homework, I chose the ABO glycosyltransferase protein because it directly determines human blood group type by modifying cell-surface glycans. Since my broader project idea focuses on whether blood type A may influence gastrointestinal disease risk, this protein is central to that question. The ABO glycosyltransferase is responsible for adding specific sugar residues that create the A or B antigen. These glycan differences may influence host–microbe interactions, immune responses, or inflammation in the gut. I chose this protein because it represents the molecular basis of blood group identity, making it a logical starting point for exploring any mechanistic relationship between blood type and disease risk.

Here is the human ABO glycosyltransferase sequence (UniProt entry for human ABO):

>sp|P16442|BGAT_HUMAN Histo-blood group ABO system transferase OS=Homo sapiens OX=9606 GN=ABO PE=1 SV=2
MAEVLRTLAGKPKCHALRPMILFLIMLVLVLFGYGVLSPRSLMPGSLERGFCMAVREPDH
LQRVSLPRMVYPQPKVLTPCRKDVLVVTPWLAPIVWEGTFNIDILNEQFRLQNTTIGLTV
FAIKKYVAFLKLFLETAEKHFMVGHRVHYYVFTDQPAAVPRVTLGTGRQLSVLEVRAYKR
WQDVSMRRMEMISDFCERRFLSEVDYLVCVDVDMEFRDHVGVEILTPLFGTLHPGFYGSS
REAFTYERRPQSQAYIPKDEGDFYYLGGFFGGSVQEVQRLTRACHQAMMVDQANGIEAVW
HDESHLNKYLLRHKPTKVLSPEYLWDQQLLGWPAVLRKLRFTAVPKNHQAVRNP

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which a DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (Google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

(Example: Get to the original sequence of phage MS2 L-protein from its genome – phage MS2 genome - Nucleotide - NCBI)

Lysis protein DNA sequence

atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

Answer

>NG_006669.2:5026-5053,18047-18116,18841-18897,20349-20396,22083-22118,22673-22807,23860-24550 Homo sapiens ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase (ABO), RefSeqGene (LRG_792) on chromosome 9
ATGGCCGAGGTGTTGCGGACGCTGGCCGGAAAACCAAAATGCCACGCACTTCGACCTATGATCCTTTTCC
TAATAATGCTTGTCTTGGTCTTGTTTGGTTACGGGGTCCTAAGCCCCAGAAGTCTAATGCCAGGAAGCCT
GGAACGGGGGTTCTGCATGGCTGTTAGGGAACCTGACCATCTGCAGCGCGTCTCGTTGCCAAGGATGGTC
TACCCCCAGCCAAAGGTGCTGACACCGTGTAGGAAGGATGTCCTCGTGGTGACCCCTTGGCTGGCTCCCA
TTGTCTGGGAGGGCACATTCAACATCGACATCCTCAACGAGCAGTTCAGGCTCCAGAACACCACCATTGG
GTTAACTGTGTTTGCCATCAAGAAATACGTGGCTTTCCTGAAGCTGTTCCTGGAGACGGCGGAGAAGCAC
TTCATGGTGGGCCACCGTGTCCACTACTATGTCTTCACCGACCAGCCGGCCGCGGTGCCCCGCGTGACGC
TGGGGACCGGTCGGCAGCTGTCAGTGCTGGAGGTGCGCGCCTACAAGCGCTGGCAGGACGTGTCCATGCG
CCGCATGGAGATGATCAGTGACTTCTGCGAGCGGCGCTTCCTCAGCGAGGTGGATTACCTGGTGTGCGTG
GACGTGGACATGGAGTTCCGCGACCACGTGGGCGTGGAGATCCTGACTCCGCTGTTCGGCACCCTGCACC
CCGGCTTCTACGGAAGCAGCCGGGAGGCCTTCACCTACGAGCGCCGGCCCCAGTCCCAGGCCTACATCCC
CAAGGACGAGGGCGATTTCTACTACCTGGGGGGGTTCTTCGGGGGGTCGGTGCAAGAGGTGCAGCGGCTC
ACCAGGGCCTGCCACCAGGCCATGATGGTCGACCAGGCCAACGGCATCGAGGCCGTGTGGCACGACGAGA
GCCACCTGAACAAGTACCTGCTGCGCCACAAACCCACCAAGGTGCTCTCCCCCGAGTACTTGTGGGACCA
GCAGCTGCTGGGCTGGCCCGCCGTCCTGAGGAAGCTGAGGTTCACTGCGGTGCCCAAGAACCACCAGGCG
GTCCGGAACCCGTGA

3.3. Codon optimisation.

Once a nucleotide sequence of your protein is determined, you need to codon optimise your sequence. You may, once again, utilise Google for a “codon optimisation tool”. In your own words, describe why you need to optimise codon usage. Which organism have you chosen to optimise the codon sequence for, and why?

(Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI)

Lysis protein DNA sequence with codon optimisation

ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

Answer

Once the nucleotide sequence of the protein is determined, codon optimisation is necessary because different organisms prefer different codons to encode the same amino acid. Although multiple codons can code for one amino acid, the frequency with which each codon is used varies between species. If a gene contains many codons that are rare in the host organism, translation can be inefficient, leading to low protein yield or incorrect folding. Codon optimisation adjusts the DNA sequence to better match the codon usage bias of the chosen expression host, without changing the amino acid sequence of the protein.

For this project, I chose to optimise the codon sequence for Escherichia coli, since it is one of the most commonly used organisms for recombinant protein expression. E. coli grows quickly, is inexpensive to culture, and has well-established cloning and expression systems. Optimising the ABO glycosyltransferase gene for E. coli would increase the likelihood of efficient transcription and translation, improving protein yield for experimental studies. Additionally, codon optimisation tools can help avoid problematic sequences such as strong secondary structures, rare codons, or unwanted restriction enzyme recognition sites.

Codon-optimised sequence:

ATGGCGGAAGTGCTGCGTACCCTGGCAGGTAAACCGAAGTGCCATGCCCTGCGTCCGATGATTCTGTTCCTGATTATGCTGGTGCTGGTGCTGTTCGGTTATGGCGTGCTGAGCCCGCGTAGCCTGATGCCGGGCTCTCTGGAACGTGGTTTCTGCATGGCGGTGCGCGAACCGGACCATCTGCAGCGTGTGAGCCTGCCGCGCATGGTGTATCCGCAGCCGAAAGTTCTGACCCCGTGCCGCAAAGATGTGCTGGTGGTGACGCCGTGGCTGGCGCCGATTGTGTGGGAAGGCACCTTTAATATTGATATTCTGAATGAACAGTTTCGCCTGCAGAATACCACCATTGGCCTGACCGTGTTTGCGATTAAAAAATACGTGGCGTTTCTGAAACTGTTTCTGGAAACGGCGGAAAAACATTTCATGGTGGGCCATCGCGTGCACTACTACGTCTTCACCGATCAGCCGGCGGCGGTGCCGCGCGTTACCCTGGGCACGGGCCGCCAGCTGAGCGTGCTGGAAGTGCGCGCGTATAAACGTTGGCAGGATGTTAGCATGCGCCGCATGGAAATGATTAGCGATTTTTGCGAACGTCGCTTTCTGAGCGAAGTGGATTATCTGGTGTGCGTGGATGTGGATATGGAATTTCGCGATCATGTGGGCGTGGAAATTCTGACCCCGCTGTTTGGCACCCTGCATCCGGGCTTCTATGGCAGCAGCCGCGAAGCATTCACCTACGAACGCCGCCCGCAGAGCCAGGCCTACATTCCGAAAGATGAAGGCGATTTCTATTATCTGGGCGGCTTCTTTGGCGGCTCAGTTCAGGAAGTGCAGCGTCTGACCCGCGCCTGCCATCAGGCGATGATGGTGGACCAGGCGAACGGCATTGAAGCCGTTTGGCATGATGAAAGCCATCTGAACAAATACCTGCTGCGTCATAAACCGACCAAAGTTCTGTCGCCGGAATATCTGTGGGATCAGCAGCTGCTGGGCTGGCCGGCGGTGCTGCGTAAACTGCGCTTTACCGCGGTGCCGAAAAACCATCAGGCGGTACGTAATCCGTAA

After codon optimisation using the VectorBuilder tool, the sequence showed a GC content of 56.53% and a Codon Adaptation Index (CAI) of 0.94. The GC content falls within the preferred range for E. coli expression (typically ~30–70%), suggesting the sequence should be stable and efficiently transcribed. The CAI value is close to 1.0, which indicates that the codons used in the optimised gene closely match the codon usage bias of the host organism. A high CAI generally correlates with improved translation efficiency because the host has abundant tRNAs for these codons.
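
Two of the checks discussed in this section, GC content and unwanted Type IIS restriction sites (BsaI, BsmBI, and BbsI, as in the Twist example), are easy to sketch in code. This is an illustrative sketch, not the method used by VectorBuilder or Twist: the function names are made up, and it is not expected to reproduce the tool's reported values exactly, since tools may treat the stop codon or flanking sequence differently. CAI is not computed here because it needs a host codon-usage table.

```python
# Recognition sequences of the Type IIS enzymes named in the example above.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}


def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_type_iis_sites(seq):
    """Count occurrences of each Type IIS site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in TYPE_IIS_SITES.items():
        count = seq.count(site) + seq.count(reverse_complement(site))
        if count:
            hits[name] = count
    return hits


if __name__ == "__main__":
    # First 18 nt of the codon-optimised lysis protein example above.
    fragment = "ATGGAAACCCGCTTTCCG"
    print(f"GC content: {gc_content(fragment):.2f}%")
    print("Type IIS sites:", find_type_iis_sites(fragment) or "none found")
```

Running the same two functions on the full optimised ABO sequence gives a quick sanity check before submitting it for synthesis.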

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Answer

To produce the protein from the DNA sequence, the optimised gene would first be cloned into an expression vector containing a promoter, ribosome binding site, and terminator. The plasmid would then be introduced into a host such as E. coli through transformation. Inside the cell, RNA polymerase binds to the promoter and transcribes the DNA into messenger RNA (mRNA). The ribosome then binds to the mRNA and reads the codons, while tRNAs deliver the corresponding amino acids to build the polypeptide chain. The growing chain folds into the functional ABO glycosyltransferase protein after translation.

An alternative method is a cell-free expression system, where purified transcription and translation machinery are mixed with the DNA template in vitro. In this system, RNA is synthesised from the DNA and immediately translated into protein without living cells. Cell-free expression is faster and easier to control, while cell-based expression generally produces larger quantities of protein.

In both approaches, the central dogma applies: DNA is transcribed into RNA, and RNA is translated into the protein.

3.5. How does it work in nature/biological systems?

Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated protein!!!

Answer

A single gene can produce multiple proteins at the transcriptional level, mainly through alternative splicing. During transcription, the DNA sequence is copied into a pre-mRNA that contains both exons (coding regions) and introns (non-coding regions). The cell’s splicing machinery can remove introns in different patterns and join different combinations of exons together. As a result, multiple mature mRNA transcripts can be produced from the same gene, and each mRNA can be translated into a slightly different protein with different structure or function. This allows one gene to increase protein diversity without changing the DNA sequence.

Below is a small illustrative alignment showing how DNA becomes RNA and then protein. Notice that T becomes U during transcription, and each group of 3 nucleotides (a codon) specifies one amino acid during translation:

DNA: ATG AAA GCT TTT GGA TAA

RNA: AUG AAA GCU UUU GGA UAA

Protein: Met Lys Ala Phe Gly Stop

If the exon carrying GCT TTT is skipped during splicing, the DNA stays the same but the mature mRNA is shorter:

DNA: ATG AAA GCT TTT GGA TAA

RNA (after splicing): AUG AAA GGA UAA

Protein: Met Lys Gly Stop

Even though the gene is the same, different mRNA transcripts lead to different proteins. This is one of the main ways cells generate protein diversity from a limited number of genes.
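
The alignment above can be reproduced with a short sketch of transcription, splicing, and translation. The codon table is the standard genetic code; the function names and the way the toy sequence is divided into three "exons" are assumptions made purely for illustration, since real exons are far longer and are separated by introns.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def splice(exons, skip=()):
    """Join exons in order, leaving out the indices listed in skip."""
    return "".join(exon for i, exon in enumerate(exons) if i not in skip)


def translate(mrna):
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    exons = ["ATGAAA", "GCTTTT", "GGATAA"]  # hypothetical exon boundaries
    full = transcribe(splice(exons))
    skipped = transcribe(splice(exons, skip={1}))
    print(full, "->", translate(full))
    print(skipped, "->", translate(skipped))
```

The output uses one-letter amino acid codes, so Met Lys Ala Phe Gly appears as MKAFG and the exon-skipped product Met Lys Gly as MKG.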


Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account and a Benchling account

4.2. Build Your DNA Insert Sequence

Click here to get the final sequence

FASTA file for the above sequence

constitutive_sfGFP_his_tag TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

4.3. On Twist, Select The “Genes” Option

4.4. Select “Clonal Genes” option

4.5. Import your sequence

4.6. Choose Your Vector

When I first uploaded the DNA sequence, it gave an error due to high GC content. I then used Twist’s built-in codon optimization tool for E. coli to optimize the sequence, and the new sequence is provided below:

ATGGCAGAAG TTCTTCGCAC TTTAGCAGGC AAGCCCAAAT GTCACGCATT ACGGCCAATG ATATTATTTC TCATCATGCT CGTTTTGGTA CTCTTTGGCT ACGGTGTACT CAGTCCTCGC TCTTTGATGC CTGGTAGTTT AGAGAGAGGG TTTTGTATGG CCGTCCGGGA GCCAGATCAC CTGCAAAGAG TATCATTGCC TCGGATGGTT TACCCCCAAC CTAAGGTGTT AACTCCTTGT CGAAAGGACG TTCTTGTAGT AACTCCTTGG CTTGCCCCTA TCGTATGGGA AGGTACATTC AACATCGACA TCCTTAACGA GCAATTCCGG TTGCAAAACA CGACTATAGG TCTTACAGTT TTCGCAATAA AGAAGTATGT TGCCTTCCTC AAGTTATTCC TCGAGACAGC TGAGAAGCAC TTTATGGTCG GTCACCGGGT TCATTATTAT GTGTTTACTG ACCAACCAGC AGCCGTTCCT CGTGTCACTT TAGGTACTGG TCGTCAATTA TCCGTTCTCG AGGTCCGGGC CTACAAGCGC TGGCAAGACG TATCTATGCG TCGAATGGAG ATGATCAGTG ACTTCTGTGA GCGGAGATTC CTTTCAGAGG TTGACTACTT GGTCTGTGTA GACGTTGACA TGGAGTTCCG GGACCACGTA GGTGTTGAGA TCTTAACGCC ATTATTCGGA ACTCTTCACC CCGGTTTCTA CGGGAGTTCG CGCGAGGCTT TTACATATGA GCGTAGACCT CAATCCCAAG CATATATACC TAAGGACGAG GGTGACTTTT ACTACTTAGG TGGATTCTTC GGTGGGTCCG TACAAGAGGT TCAACGCTTA ACTCGGGCAT GTCACCAAGC AATGATGGTC GATCAAGCAA ATGGGATCGA GGCAGTCTGG CACGACGAGT CTCACTTAAA TAAGTATTTG CTTCGGCACA AGCCAACAAA GGTGCTTAGT CCCGAGTACT TGTGGGACCA ACAATTACTC GGATGGCCTG CAGTCCTTAG AAAGCTCCGT TTCACGGCAG TTCCCAAGAA TCACCAAGCT GTTCGGAACC CATGA

After downloading the construct from Twist, I uploaded it to Benchling, and the plasmid map obtained is shown below.


Part 5: DNA Read/Write/Edit

5.1. DNA Read

  1. What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).
  • For my project, I would want to sequence gut microbiome DNA from people with different ABO blood groups, especially comparing blood type A with non-A individuals. My research question is whether the reported association between blood type A and gastrointestinal disease risk has an actual biological mechanism rather than being only a population-level correlation. ABO blood groups are defined by differences in glycan structures, and these glycans are not only present on red blood cells but also on intestinal mucosal surfaces. Many gut microbes interact directly with host glycans by binding to them or metabolizing them as nutrients. Because of this, I suspect that different blood group glycans could shape the microbial community in the gut. Sequencing microbiome DNA would allow me to determine whether certain bacteria, especially glycan-binding or inflammation-associated species, are enriched in individuals with blood type A. In addition, metagenomic sequencing would reveal functional genes such as glycan-degrading enzymes or virulence factors that might trigger inflammatory responses. This information would help identify biological markers that could be used as inputs for a synthetic biology sensing system designed to test the mechanism experimentally.
  2. In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Also answer the following questions:

(i) Is your method first-, second- or third-generation or other? How so?

(ii) What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

(iii) What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

(iv) What is the output of your chosen sequencing technology?

  • For my project, I would use Illumina sequencing (sequencing by synthesis) to sequence gut microbiome DNA from individuals with different ABO blood groups. Since my goal is to compare microbial communities and identify possible functional genes linked to host glycan interactions, I need a method that can accurately sequence many DNA fragments from a mixed sample at high depth. Illumina sequencing is widely used for metagenomics because it provides high accuracy, strong statistical power, and the ability to detect small differences in microbial composition between groups.

  • Illumina sequencing is a second-generation (next-generation sequencing, NGS) technology. It is considered second-generation because it performs massively parallel sequencing of millions of DNA fragments at the same time and requires clonal amplification before sequencing. The technology uses bridge amplification to generate clusters and reversible terminator nucleotides to read one base at a time. Unlike first-generation Sanger sequencing, which reads single fragments individually, Illumina reads many short fragments simultaneously, making it suitable for complex microbiome samples.

  • The input for this method would be total DNA extracted from stool samples containing gut microbiome material. First, DNA is isolated from the sample and then fragmented into short pieces of approximately 200–500 base pairs. The fragment ends are repaired and modified by adding an A-tail, followed by ligation of Illumina-specific adapters to both ends. The adapter-ligated fragments are PCR amplified to enrich correctly prepared molecules and create the sequencing library. After quality control and quantification, the library is loaded onto the flow cell for sequencing.

  • The sequencing process begins with cluster generation on the flow cell. DNA fragments bind to complementary oligonucleotides attached to the surface and undergo bridge amplification, forming clonal clusters of identical DNA molecules. During sequencing by synthesis, fluorescently labeled nucleotides with reversible terminators are added. Only one nucleotide can be incorporated in each cycle. After incorporation, a camera records the fluorescent signal, which corresponds to a specific base (A, T, C, or G). The fluorescent label and terminator are then chemically removed, allowing the next cycle to occur. By repeating this process, the machine determines the sequence base by base through detection of fluorescence signals, a process known as base calling.

  • The output of Illumina sequencing is a large collection of short DNA sequence reads stored in FASTQ files. Each read contains the nucleotide sequence along with a quality score indicating confidence in each base call. These reads can then be analyzed bioinformatically to identify microbial species, compare microbiome composition between blood groups, and detect functional genes such as glycan-degrading enzymes or inflammation-associated factors. This information helps evaluate whether differences in microbiome behavior could explain the observed association between blood type A and gastrointestinal disease risk.
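As a rough illustration of what that FASTQ output looks like in practice, the sketch below parses four-line records and converts each quality string into per-base Phred scores. It assumes the common Phred+33 encoding; the function names (`parse_fastq`, `mean_quality`) and the two example reads are hypothetical and only show the data shape.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from a FASTQ text stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        # Assumption: Phred+33 encoding (quality = ASCII code - 33)
        scores = [ord(char) - 33 for char in qual]
        yield header[1:], seq, scores


def mean_quality(scores: List[int]) -> float:
    """Average Phred score of one read."""
    return sum(scores) / len(scores)


# Two made-up reads, used only to exercise the parser
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"
records = list(parse_fastq(io.StringIO(example)))
for read_id, seq, scores in records:
    print(read_id, seq, mean_quality(scores))
```

In a real analysis, low-quality reads would usually be filtered or trimmed before species or gene identification.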

5.2. DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize!

  • For my project, I would synthesize a bacterial genetic sensing circuit that detects blood-group-related glycans and activates a measurable reporter when inflammatory conditions are present. The goal is not to diagnose disease yet, but to experimentally test whether molecules associated with blood type A environments change microbial behavior in a biologically meaningful way.

  • ABO blood groups differ in the terminal sugar structures on host glycans. The blood type A antigen carries N-acetylgalactosamine (GalNAc) as its terminal sugar. Many gut bacteria recognize or metabolize host glycans, so my idea is to engineer a bacterium (for example a lab strain of E. coli) with a circuit that turns on a fluorescent signal only when two conditions occur: detection of A-associated glycans and detection of inflammation-related signals (such as nitrate or reactive oxygen species). This would function as a controllable research platform to experimentally connect host glycans to microbial inflammatory responses.

  • The DNA I would synthesize is therefore a two-input AND-gate genetic circuit consisting of a glycan-responsive promoter (activated by a GalNAc metabolism regulator), an inflammation-responsive promoter (stress- or nitrate-inducible), a transcriptional logic gate (a split activator system), and a GFP reporter gene.

  • If fluorescence appears only when both signals are present, it would support the hypothesis that specific host glycan environments influence microbial inflammatory behavior.

  • Example construct design:

    1. Part 1 – Constitutive regulator expression: Promoter → regulator protein sensing GalNAc
    2. Part 2 – Inflammation promoter controlling activator half: Stress promoter → Activator fragment A
    3. Part 3 – Glycan promoter controlling activator half: GalNAc promoter → Activator fragment B
    4. Part 4 – Output reporter: AND gate → GFP expression
  • Below is a simplified example of a reporter cassette that could realistically be synthesized (promoter + RBS + GFP + terminator):

TTGACATGATAAGTAAGGAGGTTTAAACATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAACTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCAAGATACCCAGATCATATGAAACAGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAATTTTTTGCTAGC

(This represents a GFP reporter module; regulatory promoters would be placed upstream depending on the sensing design.)

  • Synthesizing this circuit allows experimental testing of the hypothesis: Blood-type-specific glycans influence microbial inflammatory behavior.

  • Instead of relying on epidemiological correlations, the engineered system creates a controllable biological readout. If activation differs in A-glycan conditions compared to others, it would provide mechanistic evidence that host glycan composition can shape disease-related microbial responses.
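The intended logic of the circuit can be written down as an idealized truth table. This is only a sketch of the design goal, assuming perfectly digital (on/off) promoters with no leakiness; the function name `and_gate_reporter` is hypothetical, and a real circuit would show graded, noisy responses.

```python
from itertools import product


def and_gate_reporter(glycan_signal: bool, inflammation_signal: bool) -> bool:
    """Idealized AND gate: GFP is on only when both activator halves are made."""
    activator_half_a = inflammation_signal  # Part 2: stress promoter -> fragment A
    activator_half_b = glycan_signal        # Part 3: GalNAc promoter -> fragment B
    return activator_half_a and activator_half_b


# Enumerate all four input combinations
truth_table = {
    (glycan, inflammation): and_gate_reporter(glycan, inflammation)
    for glycan, inflammation in product([False, True], repeat=2)
}

for (glycan, inflammation), gfp in truth_table.items():
    state = "ON" if gfp else "off"
    print(f"GalNAc={glycan!s:5} inflammation={inflammation!s:5} -> GFP {state}")
```

Only the row with both inputs present should fluoresce; signal in any other row would indicate a leaky promoter or crosstalk between the two sensing arms.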

(ii) What technology or technologies would you use to perform this DNA synthesis, and why? Also answer the following questions:

  1. What are the essential steps of your chosen sequencing methods?
  2. What are the limitations of your sequencing method (if any) in terms of speed, accuracy, and scalability?
  • To synthesize my designed genetic circuit, I would use array-based phosphoramidite DNA synthesis followed by fragment assembly (such as Gibson Assembly). Because my construct is a designed sequence rather than naturally occurring DNA, it must be chemically built from short oligonucleotides and then assembled into a complete gene cassette. This approach allows precise control over regulatory elements such as promoters, ribosome binding sites, and reporter genes, which is necessary for constructing a synthetic sensing circuit. The process begins with chemical synthesis of short oligonucleotides (about 60–200 nt) using phosphoramidite chemistry, in which nucleotides are added one base at a time to a growing DNA strand attached to a solid surface. The oligos are designed so that neighboring pieces share overlapping end sequences. After deprotection and cleavage from the surface, the oligos are PCR amplified. The resulting fragments are then assembled into the full construct using Gibson Assembly, in which an exonuclease chews back the 5′ ends to expose complementary single-stranded overhangs, a polymerase fills the gaps once the overhangs anneal, and a ligase seals the remaining nicks. The assembled plasmid is transformed into bacteria, and colonies are picked for sequence verification.
  1. To read and verify the synthesized DNA, I would use Illumina sequencing (sequencing-by-synthesis). The plasmid DNA would first be extracted and fragmented, adapters would be ligated, and a sequencing library would be prepared. The fragments bind to a flow cell and undergo bridge amplification to form clusters. During sequencing, fluorescent reversible terminator nucleotides are incorporated one at a time, and each cycle is imaged to identify the added base. The fluorescent signal detected at each cycle is converted into nucleotide identity through base calling, generating short sequence reads that can be aligned to the designed construct to confirm its correctness.
  2. The main limitations of Illumina sequencing relate to read length and assembly rather than accuracy. Although it provides very high accuracy and throughput, it produces short reads, so reconstruction of long repetitive regions can be difficult. For my application this is manageable because plasmids are small and have a known reference sequence. In terms of speed, library preparation and sequencing runs take several hours to days, which is slower than simple PCR validation but provides much more reliable confirmation. Scalability is excellent since many constructs can be sequenced simultaneously, but the cost per sample is high when only a few samples are sequenced.
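The overlap-based assembly step described above can be mimicked in silico to check that a set of designed fragments joins back into the intended construct. The sketch below is a simplification: it only handles fragments given in order on the same strand and ignores the enzymatic details. The function name `assemble_by_overlap`, the `min_overlap` parameter, and the toy 60-nt design are all hypothetical.

```python
from typing import List


def assemble_by_overlap(fragments: List[str], min_overlap: int = 20) -> str:
    """Join ordered fragments by merging the end sequence shared by neighbors."""
    assembled = fragments[0]
    for nxt in fragments[1:]:
        longest = min(len(assembled), len(nxt))
        # Try the longest possible shared end first, then shrink
        for k in range(longest, min_overlap - 1, -1):
            if assembled[-k:] == nxt[:k]:
                assembled += nxt[k:]
                break
        else:
            raise ValueError("no overlap of sufficient length between fragments")
    return assembled


# Toy design split into three fragments with 10-nt overlaps (illustrative only)
design = "ATGGCTAGCAAGGAGGTTTAAACATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCC"
fragments = [design[:30], design[20:45], design[35:]]
rebuilt = assemble_by_overlap(fragments, min_overlap=10)
print(rebuilt == design)
```

A check like this catches fragment designs whose overlaps are missing or too short before any DNA is ordered.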

5.3. DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

  • For my project, I would want to edit bacterial DNA rather than human DNA, specifically genes involved in glycan recognition and inflammatory sensing in a model gut bacterium such as E. coli. The goal of my work is to test whether ABO blood-group glycans, especially the type A terminal sugar N-acetylgalactosamine (GalNAc), can influence microbial behavior linked to gastrointestinal disease. Instead of modifying patients, I would engineer a controllable microbial system that mimics how gut bacteria might respond inside the intestine. The main edits I would introduce are regulatory and sensing modifications. First, I would insert a glycan-responsive sensing module so the bacterium can detect A-type glycans. This could involve adding or modifying carbohydrate-binding proteins or transport/metabolism regulators that activate transcription when GalNAc is present. Second, I would add an inflammation-response module that detects gut stress signals such as nitrate or oxidative stress, which are commonly elevated during intestinal inflammation. Finally, I would connect both inputs to a reporter output (for example fluorescence), forming a logical AND gate so the cell responds only when both host glycan signals and inflammatory conditions occur together. These edits would allow the bacterium to act as a biological probe of the gut environment. If the engineered cells activate differently in A-type glycan conditions compared to others, it would suggest a mechanistic relationship between blood group chemistry and microbial inflammatory behavior. This approach avoids the ethical concerns of editing human genomes and instead creates a reversible experimental model that helps transform epidemiological correlations into testable biological mechanisms.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

  1. How does your technology of choice edit DNA? What are the essential steps?
  2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
  3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?
  • To introduce the edits in my engineered gut bacterium, I would use CRISPR-Cas9 genome editing combined with homologous recombination. This approach is widely used in bacteria because it allows precise insertion of synthetic genetic circuits at defined genomic locations rather than relying only on plasmids. For my project, stable integration is useful so the sensing system behaves consistently across experiments and does not get lost during cell growth.
  1. CRISPR-Cas9 edits DNA by creating a targeted double-strand break at a specific sequence determined by a guide RNA. The cell then attempts to repair this break. If a repair template containing designed DNA is provided, the bacterium uses homologous recombination to copy that template into its genome. In my case, the repair template would contain the glycan-sensing promoter, inflammation-response module, and reporter gene arranged as a logic circuit. The essential steps are: designing a guide RNA targeting a safe insertion site, delivering Cas9 and the guide into the bacteria, introducing a donor DNA template with homologous flanking regions, cleavage of the genome at the target site, and repair using the donor DNA to integrate the synthetic construct.
  2. Preparation involves several design stages. First, I would computationally select a genomic locus that does not disrupt essential genes. Then I would design the single guide RNA (sgRNA) sequence that uniquely matches that region. Next, I would synthesize a donor DNA template containing my circuit flanked by homology arms (~500–1000 bp) matching the insertion site. The experimental inputs therefore include: a plasmid expressing Cas9, a plasmid or cassette encoding the sgRNA, the donor DNA template, competent bacterial cells, and standard transformation reagents. After transformation, edited cells would be selected and verified by sequencing.
  3. The main limitations of this editing method are efficiency and off-target activity. Not all cells successfully incorporate the donor DNA after cutting, so screening is required to isolate correct clones. Homologous recombination efficiency in bacteria can also vary depending on strain and insert size, making larger constructs harder to integrate. Although CRISPR is precise, imperfect guide design can cause unintended cuts at similar sequences, potentially damaging the genome. Finally, multiplex editing (editing many sites at once) becomes less reliable because each additional edit lowers overall success probability. Despite these limitations, CRISPR-Cas9 provides sufficient precision and flexibility for constructing a stable synthetic sensing platform.
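The guide-design step can be sketched as a simple scan for candidate target sites. This example assumes the commonly used SpCas9 enzyme, which needs an NGG PAM immediately 3′ of a 20-nt protospacer; the answer above only says "CRISPR-Cas9", so the enzyme choice is my assumption. The function name `find_guides` and the test sequence are hypothetical, and the scan covers only the strand given (the reverse complement would need a second pass).

```python
import re
from typing import List, Tuple


def find_guides(seq: str, guide_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each NGG PAM site on the given strand."""
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up locus containing exactly one 20-nt site followed by a TGG PAM
locus = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAA"
guides = find_guides(locus)
for start, protospacer, pam in guides:
    print(start, protospacer, pam)
```

Each candidate would still need to be compared against the rest of the genome to rule out near-identical sites, which is the off-target concern noted in the limitations.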

Week 3 HW: Lab Automation

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

For this week, we’d like for you to do the following:

  1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
  2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details. While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.

Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory.

Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell-free protein synthesis.

  • Echo transfer biosensor constructs and any required cofactors into specified wells.

  • Bravo stamp in CFPS reagent master mix into all wells of a 96-well / 384-well plate.

  • Multiflo dispense the CFPS lysate to all wells to start protein expression.

  • PlateLoc seal the plate.

  • Inheco incubate the plate at 37°C while the biosensor proteins are synthesized.

  • XPeel remove the seal.

  • PHERAstar measure fluorescence to compare biosensor responses.


1. Paper using Opentrons for novel biology

One example is AssemblyTron: flexible automation of DNA assembly with Opentrons OT‑2 lab robots by Bryant et al. (Synthetic Biology, 2023). The authors developed AssemblyTron, an open‑source Python package that takes DNA assembly designs (from the j5 design software) and converts them into executable protocols for the Opentrons OT‑2 liquid‑handling robot. The biological focus is on accelerating the Build step of the Design–Build–Test–Learn cycle in synthetic biology by fully automating PCR setup and multi‑part DNA assemblies such as Golden Gate and in vivo assembly (IVA). (https://academic.oup.com/synbio/article/8/1/ysac032/6956284)

2. What I intend to automate for my final project

Core biological idea

I want to build a small, automated platform to probe whether ABO blood‑group–like glycan patterns (especially type A–like structures) influence biological responses relevant to gastrointestinal disease risk. The concept is to combine:

  • Engineered mammalian or microbial cells that express defined ABO‑like glycan patterns (e.g., via glycosyltransferase expression or synthetic glycan coatings).
  • Simple reporter circuits or biosensors that respond to inflammatory cues (e.g., NF‑κB activation, cytokine mimics) or pathogen‑associated signals.
  • An automated liquid‑handling workflow that sets up and runs multi‑factor experiments varying glycan background, inflammatory stimulus, and microbial or ligand exposure.

The aim for this course project is not a full mechanistic explanation, but a robot‑friendly experimental scaffold that could, in principle, be scaled to test whether “type A‑like” contexts behave systematically differently from “type O/B‑like” contexts.

2.1. What I will automate

For the scope of the class, I would focus on plate‑based assays with three main automated modules:

  1. Automated plate setup (Opentrons OT‑2)

    • Distribute different “glycan conditions” across a 96‑well plate:
      • Rows = glycan backgrounds (e.g., mock, A‑like, B‑like, O‑like mimics or different lectin/glycopolymer coatings).
      • Columns = inflammatory or microbial stimuli (e.g., LPS analog, TNFα mimic, conditioned media).
    • Prepare master mixes for:
      • Reporter cells (or cell‑free biosensor system).
      • Media plus defined concentrations of stimuli.
    • Dispense appropriate combinations into each well according to a CSV design file (similar spirit to AssemblyTron linking design → pipetting plan).
  2. Automated time‑course perturbations

    • Use the robot to:
      • Add secondary stimuli at defined timepoints (e.g., addition of microbial supernatant after pre‑conditioning in inflammatory cues).
      • Perform serial dilutions of stimuli across the plate to get dose–response curves.
  3. Automated sampling / readout prep

    • For fluorescent reporters: set up plates with consistent volumes and controls so they can be read on a plate reader.
    • For secreted markers (e.g., simulated “cytokines” using fluorescent reporters or colorimetric substrates): aliquot supernatant into a separate plate for endpoint assays.

This pipeline mirrors the “DBTL” idea in the AssemblyTron paper: design a matrix of conditions, automatically build the experiment on the robot, then test by measuring reporter outputs.
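A design file for the plate setup could be generated rather than typed by hand. The sketch below builds a 96-well map with glycan backgrounds on rows and stimuli on columns and writes it as CSV text. The two-rows-per-glycan layout, the column names, and the stimulus labels are hypothetical choices for illustration, not a fixed format.

```python
import csv
import io
from typing import Dict, List

ROWS = "ABCDEFGH"
# Hypothetical layout: two replicate rows per glycan background
GLYCAN_BY_ROW = {
    "A": "A_mimic", "B": "A_mimic",
    "C": "B_mimic", "D": "B_mimic",
    "E": "O_mimic", "F": "O_mimic",
    "G": "mock", "H": "mock",
}


def build_plate_map(stimuli: List[str]) -> List[Dict[str, str]]:
    """One dict per well: rows set the glycan background, columns the stimulus."""
    if len(stimuli) != 12:
        raise ValueError("expected one stimulus label per plate column (12)")
    return [
        {"well": f"{row}{col}", "glycan": GLYCAN_BY_ROW[row], "stimulus": stimuli[col - 1]}
        for row in ROWS
        for col in range(1, 13)
    ]


# Placeholder stimulus labels: one untreated column plus eleven doses
stimuli = ["none"] + [f"LPS_dose{i}" for i in range(1, 12)]
plate_map = build_plate_map(stimuli)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["well", "glycan", "stimulus"])
writer.writeheader()
writer.writerows(plate_map)
print(buffer.getvalue().splitlines()[:3])
```

Keeping the map in a flat well-by-well table means the same file can drive both the pipetting plan and the later data analysis.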

2.2. Example automation workflow (high‑level steps)

Here is a concrete plan for a 96‑well plate experiment:

  1. Design phase

    • Create a CSV “experiment map” specifying:
      • Factor A: Glycan context (e.g., 4 levels: mock, A‑mimic, B‑mimic, O‑mimic).
      • Factor B: Inflammatory stimulus (e.g., 6 concentrations of LPS analog or TNFα mimic).
      • Factor C: Microbial cue (e.g., presence/absence of microbial supernatant or defined ligand).
    • Encode which wells are controls: no cells, no stimulus, glycan only, stimulus only.
  2. Robot setup

    • Deck layout:
      • Slot 1: 96‑well assay plate (flat‑bottom).
      • Slot 2: Reservoir with media and reporter cell suspension (or CFPS mix if using a cell‑free biosensor).
      • Slot 3: 96‑well “stimulus source” plate with concentrated stocks of inflammatory agents and microbial components.
      • Slots 4–5: Tip racks for P20 and P300 single/multi‑channel pipettes.
      • Optional: Temperature module holding cells at 37 °C or 30 °C depending on chassis.
  3. Automated protocol

    • Step 1: Seed reporter cells
      • Robot mixes cell suspension and dispenses a fixed volume (e.g., 50–100 µL) into each experimental well.
    • Step 2: Apply glycan context
      • Option A (simple): Pre‑coat wells manually with glycopolymers or lectins; robot only has to track which wells are which.
      • Option B (more advanced): Robot dispenses defined concentrations of soluble glycoconjugates or lectins to appropriate wells.
    • Step 3: Add inflammatory stimuli
      • Robot performs serial dilutions from stimulus stock plate into media to generate a gradient.
      • Dispenses the correct volume to each well according to the design map.
    • Step 4: Incubation
      • Plate incubated off‑deck (incubator).
    • Step 5: Secondary perturbation (if included)
      • Plate returned to deck; robot adds microbial supernatant or additional ligands to specified wells.
    • Step 6: Sampling / preparation for readout
      • For fluorescence: robot mixes wells, optionally transfers aliquots to a clean plate for reading, and adds stop buffer if needed.
      • For colorimetric assays: robot dispenses substrate and halts reactions after defined times.
  4. Readout

    • Plate reader measures fluorescence or absorbance corresponding to biosensor activation (e.g., NF‑κB reporter, general stress reporter).
    • Data analysis (offline): compare response curves between glycan backgrounds to see whether “A‑like” context shifts sensitivity or maximum response to inflammatory/microbial cues.
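For the offline analysis, one simple first pass is to average replicate wells and express each glycan background as a fold change over the mock condition at the same dose. The sketch below assumes readings arrive as (glycan, dose, signal) tuples; the function name `fold_change_vs_mock` is hypothetical, and the numbers are invented placeholders to show the data shape, not experimental results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def fold_change_vs_mock(
    readings: Iterable[Tuple[str, float, float]]
) -> Dict[str, Dict[float, float]]:
    """Return {glycan: {dose: mean signal / mean mock signal at that dose}}."""
    grouped = defaultdict(list)
    for glycan, dose, signal in readings:
        grouped[(glycan, dose)].append(signal)
    means = {key: mean(values) for key, values in grouped.items()}

    result: Dict[str, Dict[float, float]] = defaultdict(dict)
    for (glycan, dose), value in means.items():
        if glycan == "mock":
            continue  # mock is the reference, not a test condition
        result[glycan][dose] = value / means[("mock", dose)]
    return dict(result)


# Invented placeholder values (arbitrary fluorescence units)
readings = [
    ("mock", 1.0, 100.0), ("mock", 1.0, 120.0),
    ("A_mimic", 1.0, 210.0), ("A_mimic", 1.0, 230.0),
    ("O_mimic", 1.0, 110.0), ("O_mimic", 1.0, 110.0),
]
fold_changes = fold_change_vs_mock(readings)
print(fold_changes)
```

A fuller analysis would fit dose–response curves and add statistics across replicates, but a fold-change table is enough to see whether any background stands out.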

2.3. Example pseudocode / Python sketch (Opentrons‑style)

This is illustrative pseudocode in a Python‑like style for an Opentrons OT‑2 protocol:

from opentrons import protocol_api

metadata = {
    "protocolName": "ABO glycan–inflammation screen",
    "author": "Your Name",
    "apiLevel": "2.15"
}

def run(protocol: protocol_api.ProtocolContext):
    # Load labware
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", "1")
    stimulus_plate = protocol.load_labware("nest_96_wellplate_200ul_flat", "3")
    reservoir = protocol.load_labware("nest_12_reservoir_15ml", "2")
    tiprack_p300 = protocol.load_labware("opentrons_96_tiprack_300ul", "4")
    tiprack_p20 = protocol.load_labware("opentrons_96_tiprack_20ul", "5")

    # Load instruments
    # GEN2 8-channel pipettes (the P20 is only available as a GEN2 model)
    p300 = protocol.load_instrument("p300_multi_gen2", "left", tip_racks=[tiprack_p300])
    p20 = protocol.load_instrument("p20_multi_gen2", "right", tip_racks=[tiprack_p20])

    # Reagents in reservoir
    cells = reservoir.wells()[0]      # reporter cell suspension
    media = reservoir.wells()[1]      # base media

    # Simple map for glycan contexts and stimulus columns
    glycan_rows = {
        "A_mimic": ["A", "B"],
        "B_mimic": ["C", "D"],
        "O_mimic": ["E", "F"],
        "mock":    ["G", "H"]
    }

    # Step 1: seed cells in all wells
    # An 8-channel pipette fills a whole column when aimed at its row-A well.
    p300.pick_up_tip()
    for column in plate.columns():  # columns 1–12
        p300.transfer(80, cells, column[0], new_tip="never", mix_after=(3, 50))
    p300.drop_tip()

    # Step 2: add stimuli (example: gradient across the columns of stimulus_plate)
    stimulus_source_row = stimulus_plate.rows_by_name()["A"]
    for idx, column in enumerate(plate.columns()):
        p20.pick_up_tip()
        # transfer all 8 rows from the matching source column in the stimulus plate
        p20.transfer(20, stimulus_source_row[idx], column[0], new_tip="never")
        p20.drop_tip()

    # (Optional) Step 3: add secondary microbial cue at later time point
    # protocol.pause("Incubate plate, then return to deck to resume.")
    # ...additions go here...

    # End: user moves plate to reader for fluorescence measurement

In a more complete version, the layout and volumes would be read from a CSV (like AssemblyTron reads design files), allowing you to change the entire experimental design without rewriting the protocol.
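As a sketch of that CSV-driven approach, the helper below turns a design file into a list of transfers that the protocol could loop over. It uses only the standard library so it can be checked without a robot; the column names (`source_well`, `dest_well`, `volume_ul`) and the function name `load_transfers` are hypothetical, and well-by-well addressing like this assumes a single-channel pipette.

```python
import csv
import io
from typing import List, Tuple


def load_transfers(csv_text: str) -> List[Tuple[str, str, float]]:
    """Parse design rows into (source_well, dest_well, volume_ul) tuples."""
    transfers = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        volume = float(row["volume_ul"])
        if volume <= 0:
            raise ValueError(f"non-positive volume for well {row['dest_well']}")
        transfers.append((row["source_well"], row["dest_well"], volume))
    return transfers


# Hypothetical three-line design file
design_csv = """source_well,dest_well,volume_ul
A1,A1,20
A2,A2,20
A3,B1,10
"""
transfers = load_transfers(design_csv)
print(transfers)

# Inside run(), with a single-channel pipette loaded, the list could be used as:
#     for source, dest, volume in transfers:
#         pipette.transfer(volume, stimulus_plate[source], plate[dest])
```

Validating volumes while parsing means a bad design file fails before the robot starts moving rather than partway through a plate.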

2.4. Possible 3D‑printed holders / hardware

To better mimic gastrointestinal contexts and make the automation physically robust, I could incorporate simple 3D‑printed pieces:

  • Custom plate lid / insert that:
    • Holds gas‑permeable membranes or films coated with different glycan patterns above cell layers.
    • Keeps multiple inserts aligned so the OT‑2 can still accurately access wells.
  • Fabricated “gut chip” carriers that fit into a standard plate footprint:
    • Thin channels or membranes printed into a carrier that snaps into a 96‑well frame, allowing the robot to seed cells on one side and add glycan/microbial stimuli on the other.

These holders would be designed to preserve compatibility with standard SBS plate dimensions, so the robot’s calibration remains valid.
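When drafting such a holder in CAD, the well centers it must line up with can be computed from the standard 96-well grid. The sketch below uses values I am taking from the ANSI/SLAS microplate standards (9 mm well pitch, A1 center 14.38 mm from the left edge and 11.24 mm from the top edge); these should be checked against the definition of the actual labware before printing. The function name `well_center_mm` is hypothetical.

```python
from typing import Tuple

# Assumed ANSI/SLAS 96-well geometry, measured from the plate's top-left corner
A1_X_MM = 14.38   # left edge to column 1 centers
A1_Y_MM = 11.24   # top edge to row A centers
PITCH_MM = 9.0    # center-to-center well spacing


def well_center_mm(well: str) -> Tuple[float, float]:
    """Return the (x, y) center of a 96-well position such as 'A1' or 'H12'."""
    row = "ABCDEFGH".index(well[0].upper())
    col = int(well[1:]) - 1
    if not 0 <= col < 12:
        raise ValueError(f"column out of range in {well!r}")
    x = round(A1_X_MM + col * PITCH_MM, 2)
    y = round(A1_Y_MM + row * PITCH_MM, 2)
    return x, y


for name in ("A1", "A12", "H1", "H12"):
    print(name, well_center_mm(name))
```

Any insert openings placed at these coordinates should stay aligned with the positions the OT‑2 expects for a standard plate.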

2.5. Possible use of a cloud lab (e.g., Ginkgo Nebula)

If access to a cloud automation platform such as Ginkgo Nebula is available, an extended version of the project could:

  • Use the local Opentrons workflow to prototype the condition matrix and reporter constructs (small panel).
  • Upload the best‑performing biosensor designs and condition matrix to the cloud system to:
    • Scale the screen to many more glycan contexts and pathogen‑related ligands.
    • Incorporate robotics like:
      • Acoustic droplet transfer to miniaturize reaction volumes.
      • Automated incubation and kinetic plate reading.
  • Use the returned data to refine hypotheses about how “A‑like” glycans modulate inflammatory or infection‑related responses.

For this class assignment, the concrete deliverable is the Opentrons protocol plus experimental design, but the architecture is chosen so it could later be ported to a higher‑throughput cloud system.

2.6. What is “novel” about this automation

  • It uses automation not just to speed up a routine assay, but to systematically explore a multi‑factor space: glycan background × inflammatory state × microbial cues.
  • It is explicitly designed to test whether statistical associations between blood type and GI disease risk have plausible biological correlates in controlled, engineered systems.
  • The workflow is modular: swapping in different glycan mimics or reporter circuits does not require changing the overall automated structure, only the design file and a few reagent definitions.