Serin Joby Parekkadan — HTGAA Spring 2026

About me
Master’s student in Synthetic Biology | Exploring neuroscience & cutting-edge biotech | Curious mind, always decoding biology 🧬✨

Week 1 HW: Principles and Practices
This homework analyzes a synthetic biology idea and evaluates governance options to support ethical, safe, and responsible innovation.
Week 2 HW: DNA Read, Write, & Edit
This homework goes through DNA Sequencing and Editing.
A robot-assisted synthetic biology platform that uses automated, plate-based assays to test how ABO-like glycan contexts influence inflammatory and microbiome-related responses relevant to gastrointestinal disease risk.

First, describe a biological engineering application or tool you want to develop and why.
This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about.
One biological engineering tool I’m curious about developing is a synthetic biology–based system to explore whether blood group types, especially blood type A, are actually linked to higher gastrointestinal disease risk at a biological level. I’ve read in multiple papers that people with blood type A may have a higher risk for certain gastrointestinal problems (1). However, when I looked into it more, most of the evidence seems to come from population statistics rather than experimental or mechanistic studies. There doesn’t seem to be a clear biological explanation, and there also aren’t many tools that can directly test this relationship in a controlled way. That gap is what makes me interested in this idea. From a synthetic biology perspective, I find it interesting that ABO blood groups are defined by differences in glycan structures, which are known to play roles in cell–cell interactions, immune responses, and host–microbiome relationships (2). This makes me wonder whether these glycan differences could influence how the gut environment responds to inflammation or pathogens and whether that could partially explain the observed disease risk. A possible approach could be to use engineered cells or microbial biosensors with simple genetic circuits that respond to blood-group-related glycan patterns and gastrointestinal inflammation markers. The goal wouldn’t be to create a finished diagnostic tool right away, but rather a research platform that helps test whether these associations are biologically meaningful instead of just statistical.
Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.
Because this tool links blood group type with disease risk, it raises important ethical and governance concerns. A key goal is preventing harm, especially avoiding discrimination or overinterpretation of results, since blood type alone does not determine gastrointestinal disease risk. Governance should also ensure biological safety and scientific responsibility, particularly if engineered cells or genetic circuits are used, by requiring proper containment and validation before findings are shared beyond research settings. In addition, protecting individual autonomy and privacy is essential, as combining blood group information with biosensor data creates sensitive health information that should only be used with informed consent. Finally, equity should be considered to ensure that the tool does not disproportionately benefit or disadvantage specific populations.
Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g., a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g., academic researchers, companies, federal regulators, law enforcement, etc.). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g., 3D printing, drones, financial systems, etc.).
Action 1: Regulatory Oversight and Ethical Review
Purpose: Currently, early-stage synthetic biology research often proceeds with minimal oversight, especially in academic labs. I propose requiring that any research using engineered cells or biosensors targeting blood group data undergo formal ethical review and regulatory approval before publication or broader use.
Design: Regulators (e.g., the EMA at the EU level) and university ethics boards would evaluate safety, privacy protections, and non-discrimination measures. Researchers would submit risk assessments and validation plans.
Assumptions: This assumes regulators and review boards have enough expertise in synthetic biology to assess risk accurately and that labs comply with these requirements.
Risks of Failure & “Success”: Failure could occur if the review is too slow or inconsistent, slowing research unnecessarily. Success could unintentionally create overconfidence in safety, leading others to assume the tool is risk-free.
Action 2: Privacy and Data Governance Framework
Purpose: Right now, blood group and biosensor data could be collected without strong protections. I propose treating this information as sensitive health data, requiring secure storage, anonymisation, and informed consent for research or secondary use.
Design: Universities, hospitals, and biotech companies would implement encrypted databases and adopt privacy-by-design models, such as federated learning, where data stays local but insights can still be shared.
Assumptions: Assumes technical infrastructure is available and participants understand consent procedures.
Risks of Failure & “Success”: Data leaks could lead to discrimination or misuse. Overly restrictive rules could hinder collaboration and slow scientific progress.
Action 3: Incentives for Equitable and Responsible Innovation
Purpose: Often, SynBio innovations are developed for wealthy populations or commercial markets. I propose funding programs and grants that encourage open-source development of biosensor tools and ensure accessibility to diverse populations.
Design: Government research agencies (e.g., DFG, Horizon Europe) could tie grants to equity and open-science requirements. NGOs and academic labs could partner to distribute tools widely and safely.
Assumptions: Assumes companies and researchers are motivated by incentives and will participate voluntarily.
Risks of Failure & “Success”: Companies may avoid participation, limiting innovation. Open designs could also be misused if security oversight is insufficient.
Next, score (from 1 to 3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework, but feel free to make your own:
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 1 | 2 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 1 | 2 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 1 | 2 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 3 | 2 | 1 |
| • Feasibility? | 2 | 1 | 3 |
| • Not impede research | 3 | 2 | 1 |
| • Promote constructive applications | 1 | 2 | 3 |
Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritise, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g., to MIT leadership or the Cambridge Mayoral Office) to the national (e.g., to President Biden or the head of a federal agency) to the international (e.g., to the United Nations Office of the Secretary-General or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.
Based on the scoring of the three governance options, I would prioritise a combination of Option 1 (Regulatory Oversight & Ethical Review) and Option 2 (Privacy & Data Governance Framework), while also incorporating elements of Option 3 (Equity & Incentives) where possible. Regulatory oversight is the most important because it directly enhances biosecurity, lab safety, and environmental protection, which are essential when working with engineered cells or biosensors that interact with human biological data. Privacy and data governance complement this by protecting sensitive blood group and biosensor information, ensuring that individuals’ autonomy is respected and minimising the risk of misuse or discrimination.
Option 3, focusing on equitable access and open-science incentives, is valuable for promoting constructive applications and broad societal benefit, but it has less impact on immediate safety and biosecurity concerns. The main trade-off is that prioritising regulatory oversight and privacy measures may increase costs and slow research progress, while emphasising equity and open access could increase the risk of misuse if technical safeguards are insufficient.
I would recommend this combined approach to regulators and research oversight bodies at the national and EU level, such as the EMA or national bioethics committees, because they are in a position to implement formal policies and standards that balance safety, privacy, and societal benefit. The key assumptions are that regulators have sufficient expertise in synthetic biology and that institutions will comply with these rules. Uncertainties include the potential for unforeseen technical risks in engineered biosensors and how effectively privacy protections can prevent indirect discrimination.
This week’s class made me realise that even curiosity-driven synthetic biology work can raise ethical concerns, especially when human biological data is involved. One issue that was new to me was how combining traits like blood group type with disease risk can lead to harm if results are overinterpreted or misused, even without malicious intent. To address this, early ethical review, clear data privacy rules, and careful communication of uncertainty seem important governance actions.
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
Error rate ≈ 1 in 10⁶ bases (10⁻⁶)
Human genome size ≈ 3.2 Gb (3.2 × 10⁹ bp), so a single pass at this error rate would introduce roughly 3,200 errors per genome copy.
Biology deals with the discrepancy between the finite error rate of DNA polymerase and the very large size of the human genome by using closed-loop, error-correcting replication rather than relying on single-pass accuracy. Replicative DNA polymerases contain a 3′→5′ proofreading exonuclease that removes misincorporated nucleotides during synthesis, improving fidelity by several orders of magnitude. Errors that escape proofreading are further corrected by post-replication mismatch repair systems such as the MutS pathway, which detect and repair base-pair mismatches. Together, these layered correction mechanisms reduce the effective error rate sufficiently to allow replication of gigabase-scale genomes, enabling biological DNA synthesis to scale far beyond what would be possible with open-loop chemical synthesis.
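The arithmetic behind this comparison can be sketched in a few lines. This is a minimal illustration using the error rate and genome size quoted above; the function names and the 1000-fold improvement factor are assumptions made for the example, not measured values.

```python
# Expected number of replication errors per genome copy.
RAW_ERROR_RATE = 1e-6    # errors per base, as quoted above
GENOME_SIZE_BP = 3.2e9   # human genome size in base pairs, as quoted above


def expected_errors(error_rate: float, genome_size_bp: float) -> float:
    """Expected errors in one full copy, assuming independent errors per base."""
    return error_rate * genome_size_bp


def corrected_rate(error_rate: float, fold_improvement: float) -> float:
    """Effective error rate after a correction layer that improves fidelity
    by the given fold factor (a hypothetical value supplied by the caller)."""
    return error_rate / fold_improvement


if __name__ == "__main__":
    print("Errors per copy, single pass:",
          expected_errors(RAW_ERROR_RATE, GENOME_SIZE_BP))
    # Illustrative assumption only: one extra correction layer that is 1000-fold better.
    print("Errors per copy, with one extra layer:",
          expected_errors(corrected_rate(RAW_ERROR_RATE, 1000), GENOME_SIZE_BP))
```

The point of the sketch is that each additional correction layer divides the expected error count, which is why stacking proofreading and mismatch repair matters at gigabase scale.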
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
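Average human protein-coding sequence ≈ 1036 bp (~345 aa). With about 3 synonymous codons per amino acid on average (61 sense codons for 20 amino acids), there are roughly 3³⁴⁵ ≈ 4 × 10¹⁶⁴ possible DNA sequences for one average protein.
In practice, many of these sequences do not work because of:
a. GC content & secondary structure
b. Repeats & homopolymers
c. Physical behaviour of the DNA itself
So many valid coding sequences fail because they fold incorrectly, are unstable, are hard to synthesize, or disrupt regulatory behavior.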
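To make the size of this sequence space concrete, the number of synonymous DNA encodings of a protein can be counted directly from the standard genetic code. The sketch below is only an illustration: the function names are made up for this example, the stop codon is ignored, and the test protein is the MS2 lysis protein used as the example later in this homework.

```python
from collections import Counter
from itertools import product
from math import log10, prod

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons encode each amino acid ('*' marks the stop codons).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein: str) -> int:
    """Exact number of DNA sequences that encode the given protein
    (one-letter amino acid codes, stop codon not counted)."""
    return prod(DEGENERACY[aa] for aa in protein)


def log10_encodings(protein: str) -> float:
    """Same count as a base-10 logarithm, easier to read for long proteins."""
    return sum(log10(DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    ms2_lysis = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL"
                 "EAVIRTVTTLQQLLT")
    print(f"log10(number of encodings) = {log10_encodings(ms2_lysis):.1f}")
```

Every one of these sequences translates to the same protein, so the filters listed above (GC content, repeats, secondary structure) are what narrow the space to sequences that can actually be synthesized and expressed.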
What’s the most commonly used method for oligosynthesis currently?
The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite chemistry, originally developed by Caruthers. In this method, DNA is synthesised on a solid support (such as controlled pore glass or silicon) through repetitive cycles of nucleotide coupling, capping of unreacted sites, oxidation, and deprotection. The lecture highlights that this chemistry is highly automatable and forms the basis of modern high-throughput oligo synthesis platforms, including array-based and silicon-based synthesis systems.
Why is it difficult to make oligos longer than 200 nt via direct synthesis?
Direct chemical synthesis of oligos becomes inefficient beyond ~200 nucleotides because each synthesis cycle has a coupling efficiency slightly below 100%. These small inefficiencies compound over many cycles, so the fraction of full-length product drops rapidly while truncated sequences build up. As oligo length increases, synthesis errors and truncation products dominate the pool, making purification of the correct full-length oligo increasingly difficult. Longer sequences are also more prone to secondary structure formation, which further reduces synthesis efficiency.
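The compounding effect can be illustrated with a short calculation. This is a rough sketch, assuming that every coupling step succeeds independently with the same efficiency and that an n-nucleotide oligo needs n − 1 couplings; the 99% and 99.5% efficiencies are illustrative assumptions, not values from the lecture.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of molecules that reach full length, assuming each of the
    (length_nt - 1) coupling steps succeeds independently."""
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for efficiency in (0.99, 0.995):      # assumed per-step efficiencies
        for length in (20, 100, 200, 2000):
            fraction = full_length_fraction(length, efficiency)
            print(f"efficiency={efficiency}, length={length} nt: "
                  f"full-length fraction = {fraction:.3g}")
```

Under these assumptions the full-length fraction falls geometrically with length, which is why a 200 nt oligo is already difficult and a 2000 nt product is effectively unobtainable by direct synthesis.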
Why can’t you make a 2000 bp gene via direct oligo synthesis?
Synthesising a 2000 bp gene directly using phosphoramidite chemistry is not feasible because the cumulative effect of coupling inefficiencies and error rates makes the yield of full-length, error-free molecules vanishingly small. Over thousands of synthesis cycles, the probability of obtaining a correct full-length product approaches zero, while the majority of molecules are truncated or contain multiple errors. For this reason, the lecture emphasizes that modern gene synthesis relies on assembling shorter, chemically synthesized oligos into longer gene fragments using enzymatic assembly methods, followed by sequence verification, rather than attempting direct synthesis of long genes.
What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
Animals require ten essential amino acids (histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine during growth) because they cannot synthesise them on their own. This limitation is metabolic rather than genetic, meaning the ribosome can translate these amino acids, but the organism must obtain them from the environment, as emphasised in Church’s slides on amino acid constraints.
The lysine contingency is especially important because animals completely lack a lysine biosynthesis pathway. This makes lysine a reliable metabolic bottleneck that can be exploited for biocontainment. An engineered organism that depends on lysine, or a lysine analogue, cannot survive without external supplementation, reducing the risk of escape or uncontrolled spread. Lysine is also central to protein function due to its positive charge and role in protein–protein interactions and post-translational modifications. Because lysine is essential at metabolic, structural, and regulatory levels, the lysine contingency provides a robust and evolution-resistant control strategy in synthetic biology.
Begin personalising your HTGAA website in https://edit.htgaa.org/, starting with your homepage: fill in the template with information about yourself, or remove what’s there and make it your own. Be creative! - Done
As with all assignments in HTGAA, be sure to write up every part of this homework on your HTGAA website in order to receive credit. - Done
(1) J. Y. Huang, R. Wang, Y.-T. Gao, and J.-M. Yuan, “ABO blood type and the risk of cancer – Findings from the Shanghai Cohort Study,” PLoS ONE, vol. 12, no. 9, p. e0184295, Sep. 2017, doi: 10.1371/journal.pone.0184295.
(2) G. Misevic, “ABO blood group system,” Blood and Genomics, vol. 2, no. 2, pp. 71–84, Jan. 2018, doi: 10.46701/apjbg.2018022018113.
**The cover page and the rephrasing of some lines were done by AI.


I did not have lab access to perform the above experiment.
In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen, and why? Using one of the tools described in recitation (NCBI, UniProt, Google), obtain the protein sequence for the protein you chose.
(Example from our group homework, you may notice the particular format — The example below came from UniProt)
sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL
EAVIRTVTTLQQLLT
For this homework, I chose the ABO glycosyltransferase protein because it directly determines human blood group type by modifying cell-surface glycans. Since my broader project idea focuses on whether blood type A may influence gastrointestinal disease risk, this protein is central to that question. The ABO glycosyltransferase is responsible for adding specific sugar residues that create the A or B antigen. These glycan differences may influence host–microbe interactions, immune responses, or inflammation in the gut. I chose this protein because it represents the molecular basis of blood group identity, making it a logical starting point for exploring any mechanistic relationship between blood type and disease risk.
Here is the human ABO glycosyltransferase sequence (UniProt entry for human ABO):
sp|P16442|BGAT_HUMAN Histo-blood group ABO system transferase OS=Homo sapiens OX=9606 GN=ABO PE=1 SV=2
MAEVLRTLAGKPKCHALRPMILFLIMLVLVLFGYGVLSPRSLMPGSLERGFCMAVREPDH
LQRVSLPRMVYPQPKVLTPCRKDVLVVTPWLAPIVWEGTFNIDILNEQFRLQNTTIGLTV
FAIKKYVAFLKLFLETAEKHFMVGHRVHYYVFTDQPAAVPRVTLGTGRQLSVLEVRAYKR
WQDVSMRRMEMISDFCERRFLSEVDYLVCVDVDMEFRDHVGVEILTPLFGTLHPGFYGSS
REAFTYERRPQSQAYIPKDEGDFYYLGGFFGGSVQEVQRLTRACHQAMMVDQANGIEAVW
HDESHLNKYLLRHKPTKVLSPEYLWDQQLLGWPAVLRKLRFTAVPKNHQAVRNP
The Central Dogma discussed in class and recitation describes the process in which a DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (Google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
(Example: Get to the original sequence of phage MS2 L-protein from its genome – phage MS2 genome - Nucleotide - NCBI)
Lysis protein DNA sequence
atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa




NG_006669.2:5026-5053,18047-18116,18841-18897,20349-20396,22083-22118,22673-22807,23860-24550 Homo sapiens ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase (ABO), RefSeqGene (LRG_792) on chromosome 9 ATGGCCGAGGTGTTGCGGACGCTGGCCGGAAAACCAAAATGCCACGCACTTCGACCTATGATCCTTTTCC TAATAATGCTTGTCTTGGTCTTGTTTGGTTACGGGGTCCTAAGCCCCAGAAGTCTAATGCCAGGAAGCCT GGAACGGGGGTTCTGCATGGCTGTTAGGGAACCTGACCATCTGCAGCGCGTCTCGTTGCCAAGGATGGTC TACCCCCAGCCAAAGGTGCTGACACCGTGTAGGAAGGATGTCCTCGTGGTGACCCCTTGGCTGGCTCCCA TTGTCTGGGAGGGCACATTCAACATCGACATCCTCAACGAGCAGTTCAGGCTCCAGAACACCACCATTGG GTTAACTGTGTTTGCCATCAAGAAATACGTGGCTTTCCTGAAGCTGTTCCTGGAGACGGCGGAGAAGCAC TTCATGGTGGGCCACCGTGTCCACTACTATGTCTTCACCGACCAGCCGGCCGCGGTGCCCCGCGTGACGC TGGGGACCGGTCGGCAGCTGTCAGTGCTGGAGGTGCGCGCCTACAAGCGCTGGCAGGACGTGTCCATGCG CCGCATGGAGATGATCAGTGACTTCTGCGAGCGGCGCTTCCTCAGCGAGGTGGATTACCTGGTGTGCGTG GACGTGGACATGGAGTTCCGCGACCACGTGGGCGTGGAGATCCTGACTCCGCTGTTCGGCACCCTGCACC CCGGCTTCTACGGAAGCAGCCGGGAGGCCTTCACCTACGAGCGCCGGCCCCAGTCCCAGGCCTACATCCC CAAGGACGAGGGCGATTTCTACTACCTGGGGGGGTTCTTCGGGGGGTCGGTGCAAGAGGTGCAGCGGCTC ACCAGGGCCTGCCACCAGGCCATGATGGTCGACCAGGCCAACGGCATCGAGGCCGTGTGGCACGACGAGA GCCACCTGAACAAGTACCTGCTGCGCCACAAACCCACCAAGGTGCTCTCCCCCGAGTACTTGTGGGACCA GCAGCTGCTGGGCTGGCCCGCCGTCCTGAGGAAGCTGAGGTTCACTGCGGTGCCCAAGAACCACCAGGCG GTCCGGAACCCGTGA
Once a nucleotide sequence of your protein is determined, you need to codon optimise your sequence. You may, once again, utilise Google for a “codon optimisation tool”. In your own words, describe why you need to optimise codon usage. Which organism have you chosen to optimise the codon sequence for, and why?
(Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI)
Lysis protein DNA sequence with codon optimisation
ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA
Once the nucleotide sequence of the protein is determined, codon optimisation is necessary because different organisms prefer different codons to encode the same amino acid. Although multiple codons can code for one amino acid, the frequency with which each codon is used varies between species. If a gene contains many codons that are rare in the host organism, translation can be inefficient, leading to low protein yield or incorrect folding. Codon optimisation adjusts the DNA sequence to better match the codon usage bias of the chosen expression host, without changing the amino acid sequence of the protein.
For this project, I chose to optimise the codon sequence for Escherichia coli, since it is one of the most commonly used organisms for recombinant protein expression. E. coli grows quickly, is inexpensive to culture, and has well-established cloning and expression systems. Optimising the ABO glycosyltransferase gene for E. coli would increase the likelihood of efficient transcription and translation, improving protein yield for experimental studies. Additionally, codon optimisation tools can help avoid problematic sequences such as strong secondary structures, rare codons, or unwanted restriction enzyme recognition sites.
Codon-optimised sequence:
ATGGCGGAAGTGCTGCGTACCCTGGCAGGTAAACCGAAGTGCCATGCCCTGCGTCCGATGATTCTGTTCCTGATTATGCTGGTGCTGGTGCTGTTCGGTTATGGCGTGCTGAGCCCGCGTAGCCTGATGCCGGGCTCTCTGGAACGTGGTTTCTGCATGGCGGTGCGCGAACCGGACCATCTGCAGCGTGTGAGCCTGCCGCGCATGGTGTATCCGCAGCCGAAAGTTCTGACCCCGTGCCGCAAAGATGTGCTGGTGGTGACGCCGTGGCTGGCGCCGATTGTGTGGGAAGGCACCTTTAATATTGATATTCTGAATGAACAGTTTCGCCTGCAGAATACCACCATTGGCCTGACCGTGTTTGCGATTAAAAAATACGTGGCGTTTCTGAAACTGTTTCTGGAAACGGCGGAAAAACATTTCATGGTGGGCCATCGCGTGCACTACTACGTCTTCACCGATCAGCCGGCGGCGGTGCCGCGCGTTACCCTGGGCACGGGCCGCCAGCTGAGCGTGCTGGAAGTGCGCGCGTATAAACGTTGGCAGGATGTTAGCATGCGCCGCATGGAAATGATTAGCGATTTTTGCGAACGTCGCTTTCTGAGCGAAGTGGATTATCTGGTGTGCGTGGATGTGGATATGGAATTTCGCGATCATGTGGGCGTGGAAATTCTGACCCCGCTGTTTGGCACCCTGCATCCGGGCTTCTATGGCAGCAGCCGCGAAGCATTCACCTACGAACGCCGCCCGCAGAGCCAGGCCTACATTCCGAAAGATGAAGGCGATTTCTATTATCTGGGCGGCTTCTTTGGCGGCTCAGTTCAGGAAGTGCAGCGTCTGACCCGCGCCTGCCATCAGGCGATGATGGTGGACCAGGCGAACGGCATTGAAGCCGTTTGGCATGATGAAAGCCATCTGAACAAATACCTGCTGCGTCATAAACCGACCAAAGTTCTGTCGCCGGAATATCTGTGGGATCAGCAGCTGCTGGGCTGGCCGGCGGTGCTGCGTAAACTGCGCTTTACCGCGGTGCCGAAAAACCATCAGGCGGTACGTAATCCGTAA
After codon optimisation using the VectorBuilder tool, the sequence showed a GC content of 56.53% and a Codon Adaptation Index (CAI) of 0.94. The GC content falls within the preferred range for E. coli expression (typically ~30–70%), suggesting the sequence should be stable and efficiently transcribed. The CAI value is close to 1.0, which indicates that the codons used in the optimised gene closely match the codon usage bias of the host organism. A high CAI generally correlates with improved translation efficiency because the host has abundant tRNAs for these codons.
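The GC content reported by a tool can be cross-checked by hand. The following is a small sketch, not the VectorBuilder calculation itself; the function name and the handling of whitespace are my own assumptions.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a DNA sequence.
    Whitespace is ignored, so sequences pasted in blocks of 10 also work."""
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    gc_bases = sum(1 for base in cleaned if base in "GC")
    return 100.0 * gc_bases / len(cleaned)


if __name__ == "__main__":
    # Toy input; paste the optimised sequence above to compare with the tool's value.
    print(f"{gc_content('ATGGCGGAAGTGCTG'):.2f}% GC")
```

Running this on the full optimised sequence is a quick way to confirm that the value quoted by the optimisation tool refers to the same sequence that is being ordered.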
What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.
To produce the protein from the DNA sequence, the optimised gene would first be cloned into an expression vector containing a promoter, ribosome binding site, and terminator. The plasmid would then be introduced into a host such as E. coli through transformation. Inside the cell, RNA polymerase binds to the promoter and transcribes the DNA into messenger RNA (mRNA). The ribosome then binds to the mRNA and reads the codons, while tRNAs deliver the corresponding amino acids to build the polypeptide chain. The growing chain folds into the functional ABO glycosyltransferase protein after translation.
An alternative method is a cell-free expression system, where purified transcription and translation machinery are mixed with the DNA template in vitro. In this system, RNA is synthesised from the DNA and immediately translated into protein without living cells. Cell-free expression is faster and easier to control, while cell-based expression generally produces larger quantities of protein.
In both approaches, the central dogma applies: DNA is transcribed into RNA, and RNA is translated into the protein.
Describe how a single gene codes for multiple proteins at the transcriptional level. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated protein!!!
A single gene can produce multiple proteins at the transcriptional level, mainly through alternative splicing. During transcription, the DNA sequence is copied into a pre-mRNA that contains both exons (coding regions) and introns (non-coding regions). The cell’s splicing machinery can remove introns in different patterns and join different combinations of exons together. As a result, multiple mature mRNA transcripts can be produced from the same gene, and each mRNA can be translated into a slightly different protein with different structure or function. This allows one gene to increase protein diversity without changing the DNA sequence.
Below is a small illustrative alignment showing how DNA becomes RNA and then protein. Notice that T becomes U during transcription, and each group of 3 nucleotides (a codon) encodes one amino acid during translation:
DNA: ATG AAA GCT TTT GGA TAA
RNA: AUG AAA GCU UUU GGA UAA
Protein: Met Lys Ala Phe Gly Stop
If an exon is skipped during splicing, the RNA sequence changes:
DNA: ATG AAA GGA TAA
RNA: AUG AAA GGA UAA
Protein: Met Lys Gly Stop
Even though the gene is the same, different mRNA transcripts lead to different proteins. This is one of the main ways cells generate protein diversity from a limited number of genes.
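The toy alignment above can be reproduced programmatically. This is a simplified sketch: the exon boundaries, function names, and one-letter amino acid output are choices made for illustration, and real splicing removes introns from pre-mRNA rather than joining pre-cut DNA strings.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate mRNA codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def splice(exons: list, keep: list) -> str:
    """Join only the exons whose indices are listed in `keep`."""
    return "".join(exons[i] for i in keep)


if __name__ == "__main__":
    exons = ["ATGAAA", "GCTTTT", "GGATAA"]   # hypothetical exon boundaries
    for keep in ([0, 1, 2], [0, 2]):         # full transcript vs. exon 2 skipped
        rna = transcribe(splice(exons, keep))
        print(keep, rna, translate(rna))
```

Keeping all three exons gives Met-Lys-Ala-Phe-Gly (MKAFG in one-letter code), while skipping the middle exon gives Met-Lys-Gly (MKG), matching the two alignments shown above.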



Click here to get the final sequence
FASTA file for the above sequence
constitutive_sfGFP_his_tag TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

When I first uploaded the DNA sequence, it gave an error due to high GC content. I then used Twist’s built-in codon optimization tool for E. coli to optimize the sequence, and the new sequence is provided below:
ATGGCAGAAG TTCTTCGCAC TTTAGCAGGC AAGCCCAAAT GTCACGCATT ACGGCCAATG ATATTATTTC TCATCATGCT CGTTTTGGTA CTCTTTGGCT ACGGTGTACT CAGTCCTCGC TCTTTGATGC CTGGTAGTTT AGAGAGAGGG TTTTGTATGG CCGTCCGGGA GCCAGATCAC CTGCAAAGAG TATCATTGCC TCGGATGGTT TACCCCCAAC CTAAGGTGTT AACTCCTTGT CGAAAGGACG TTCTTGTAGT AACTCCTTGG CTTGCCCCTA TCGTATGGGA AGGTACATTC AACATCGACA TCCTTAACGA GCAATTCCGG TTGCAAAACA CGACTATAGG TCTTACAGTT TTCGCAATAA AGAAGTATGT TGCCTTCCTC AAGTTATTCC TCGAGACAGC TGAGAAGCAC TTTATGGTCG GTCACCGGGT TCATTATTAT GTGTTTACTG ACCAACCAGC AGCCGTTCCT CGTGTCACTT TAGGTACTGG TCGTCAATTA TCCGTTCTCG AGGTCCGGGC CTACAAGCGC TGGCAAGACG TATCTATGCG TCGAATGGAG ATGATCAGTG ACTTCTGTGA GCGGAGATTC CTTTCAGAGG TTGACTACTT GGTCTGTGTA GACGTTGACA TGGAGTTCCG GGACCACGTA GGTGTTGAGA TCTTAACGCC ATTATTCGGA ACTCTTCACC CCGGTTTCTA CGGGAGTTCG CGCGAGGCTT TTACATATGA GCGTAGACCT CAATCCCAAG CATATATACC TAAGGACGAG GGTGACTTTT ACTACTTAGG TGGATTCTTC GGTGGGTCCG TACAAGAGGT TCAACGCTTA ACTCGGGCAT GTCACCAAGC AATGATGGTC GATCAAGCAA ATGGGATCGA GGCAGTCTGG CACGACGAGT CTCACTTAAA TAAGTATTTG CTTCGGCACA AGCCAACAAA GGTGCTTAGT CCCGAGTACT TGTGGGACCA ACAATTACTC GGATGGCCTG CAGTCCTTAG AAAGCTCCGT TTCACGGCAG TTCCCAAGAA TCACCAAGCT GTTCGGAACC CATGA

After downloading the construct from Twist, I uploaded it to Benchling, and the plasmid map obtained is shown below.

Also answer the following questions:
(i) Is your method first-, second- or third-generation or other? How so?
(ii) What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
(iii) What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
(iv) What is the output of your chosen sequencing technology?
For my project, I would use Illumina sequencing (sequencing by synthesis) to sequence gut microbiome DNA from individuals with different ABO blood groups. Since my goal is to compare microbial communities and identify possible functional genes linked to host glycan interactions, I need a method that can accurately sequence many DNA fragments from a mixed sample at high depth. Illumina sequencing is widely used for metagenomics because it provides high accuracy, strong statistical power, and the ability to detect small differences in microbial composition between groups.
Illumina sequencing is a second-generation (next-generation sequencing, NGS) technology. It is considered second-generation because it performs massively parallel sequencing of millions of DNA fragments at the same time and requires clonal amplification before sequencing. The technology uses bridge amplification to generate clusters and reversible terminator nucleotides to read one base at a time. Unlike first-generation Sanger sequencing, which reads single fragments individually, Illumina reads many short fragments simultaneously, making it suitable for complex microbiome samples.
The input for this method would be total DNA extracted from stool samples containing gut microbiome material. First, DNA is isolated from the sample and then fragmented into short pieces of approximately 200–500 base pairs. The fragment ends are repaired and modified by adding an A-tail, followed by ligation of Illumina-specific adapters to both ends. The adapter-ligated fragments are PCR amplified to enrich correctly prepared molecules and create the sequencing library. After quality control and quantification, the library is loaded onto the flow cell for sequencing.
The sequencing process begins with cluster generation on the flow cell. DNA fragments bind to complementary oligonucleotides attached to the surface and undergo bridge amplification, forming clonal clusters of identical DNA molecules. During sequencing by synthesis, fluorescently labeled nucleotides with reversible terminators are added. Only one nucleotide can be incorporated in each cycle. After incorporation, a camera records the fluorescent signal, which corresponds to a specific base (A, T, C, or G). The fluorescent label and terminator are then chemically removed, allowing the next cycle to occur. By repeating this process, the machine determines the sequence base by base through detection of fluorescence signals, a process known as base calling.
The output of Illumina sequencing is a large collection of short DNA sequence reads stored in FASTQ files. Each read contains the nucleotide sequence along with a quality score indicating confidence in each base call. These reads can then be analyzed bioinformatically to identify microbial species, compare microbiome composition between blood groups, and detect functional genes such as glycan-degrading enzymes or inflammation-associated factors. This information helps evaluate whether differences in microbiome behavior could explain the observed association between blood type A and gastrointestinal disease risk.
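To make the FASTQ output more concrete, here is a minimal sketch of reading records and converting the quality string into per-base Phred scores. It assumes the common four-line record layout and Phred+33 encoding; the two reads and the `parse_fastq` and `mean_quality` helpers are made up for illustration.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        # Phred+33 encoding: score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


def mean_quality(scores):
    """Average Phred score of one read."""
    return sum(scores) / len(scores)


# Two invented reads: 'I' encodes Q40 and '!' encodes Q0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
records = list(parse_fastq(example))
for read_id, seq, scores in records:
    print(read_id, seq, mean_quality(scores))
```

A filter such as `mean_quality(scores) >= 30` could then drop low-confidence reads before taxonomic or functional analysis; the threshold of 30 is only an example value.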
(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize!
For my project, I would synthesize a bacterial genetic sensing circuit that detects blood-group-related glycans and activates a measurable reporter when inflammatory conditions are present. The goal is not to diagnose disease yet, but to experimentally test whether molecules associated with blood type A environments change microbial behavior in a biologically meaningful way.
ABO blood groups differ in the terminal sugar structures on host glycans. The A antigen carries N-acetylgalactosamine (GalNAc) as its terminal sugar. Many gut bacteria recognize or metabolize host glycans, so my idea is to engineer a bacterium (for example, a lab strain of E. coli) with a circuit that turns on a fluorescent signal only when two conditions occur together: detection of A-associated glycans and detection of inflammation-related signals (such as nitrate or reactive oxygen species). This would function as a controllable research platform to experimentally connect host glycans to microbial inflammatory responses.
The DNA I would synthesize is therefore a two-input AND-gate genetic circuit consisting of a glycan-responsive promoter (activated by a regulator of GalNAc metabolism), an inflammation-responsive promoter (stress- or nitrate-inducible), a transcriptional logic gate (a split activator system), and a GFP reporter gene.
If fluorescence appears only when both signals are present, it would support the hypothesis that specific host glycan environments influence microbial inflammatory behavior.
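The expected AND-gate behavior can be sketched with a very simple model in which each promoter follows a Hill function and the reporter output is the product of the two activities. This is only an assumed toy model: the half-maximal constants, the Hill coefficient, and the input units are placeholders, not measured values for any real promoter.

```python
def hill(signal, k_half, n=2.0):
    """Fraction of maximal promoter activity for an activating input (Hill function)."""
    return signal**n / (k_half**n + signal**n)


def and_gate_output(galnac, nitrate, k_galnac=1.0, k_nitrate=1.0):
    """Relative GFP output, modeled as the product of the two promoter activities.

    The product stands in for a split activator: both halves must be
    expressed for the reporter to turn on.
    """
    return hill(galnac, k_galnac) * hill(nitrate, k_nitrate)


# Truth-table style scan over absent (0) and saturating (10) inputs, arbitrary units.
for galnac in (0.0, 10.0):
    for nitrate in (0.0, 10.0):
        out = and_gate_output(galnac, nitrate)
        print(f"GalNAc={galnac:>4} nitrate={nitrate:>4} -> relative GFP {out:.2f}")
```

In this model only the condition with both inputs present gives an output close to 1, which is the pattern the fluorescence experiment would look for.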
Example construct design:
Below is a simplified example of a reporter cassette that could realistically be synthesized (promoter + RBS + GFP + terminator):
TTGACATGATAAGTAAGGAGGTTTAAACATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAACTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCAAGATACCCAGATCATATGAAACAGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAATTTTTTGCTAGC
(This represents a GFP reporter module; regulatory promoters would be placed upstream depending on the sensing design.)
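Before ordering a construct like this, it helps to run basic sequence checks. The sketch below computes GC content and finds the longest forward-strand open reading frame. It runs on a short toy cassette (an RBS-like stretch plus eight codons and a stop codon added for the example), not on the full sequence above, and the helper names are my own.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf(seq):
    """Return (start_index, protein) for the longest ATG-to-stop ORF on the forward strand."""
    seq = seq.upper()
    best = (-1, "")
    start = seq.find("ATG")
    while start != -1:
        protein = []
        for i in range(start, len(seq) - 2, 3):
            aa = CODON_TABLE[seq[i:i + 3]]
            if aa == "*":  # stop codon closes the frame
                if len(protein) > len(best[1]):
                    best = (start, "".join(protein))
                break
            protein.append(aa)
        start = seq.find("ATG", start + 1)
    return best


# Toy cassette for illustration only: RBS-like region, 8 codons, then a TAA stop.
toy = "AAGGAGGTTTAAAC" + "ATGAGTAAAGGAGAAGAACTTTTCTAA"
print(f"GC content: {gc_content(toy):.2f}")
print(longest_orf(toy))  # (14, 'MSKGEELF')
```

The same functions could be run on the full cassette to confirm that the intended reading frame is intact and to flag GC extremes that a synthesis provider might reject.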
Synthesizing this circuit allows experimental testing of the hypothesis that blood-type-specific glycans influence microbial inflammatory behavior.
Instead of relying on epidemiological correlations, the engineered system creates a controllable biological readout. If activation differs in A-glycan conditions compared to others, it would provide mechanistic evidence that host glycan composition can shape disease-related microbial responses.
(ii) What technology or technologies would you use to perform this DNA synthesis, and why? Also answer the following questions:
(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?
(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:
One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.
For this week, we’d like for you to do the following:
Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory.
Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell-free protein synthesis.
Echo transfer biosensor constructs and any required cofactors into specified wells.
Bravo stamp in CFPS reagent master mix into all wells of a 96-well / 384-well plate.
Multiflo dispense the CFPS lysate to all wells to start protein expression.
PlateLoc seal the plate.
Inheco incubate the plate at 37°C while the biosensor proteins are synthesized.
XPeel remove the seal.
PHERAstar measure fluorescence to compare biosensor responses.
One example is AssemblyTron: flexible automation of DNA assembly with Opentrons OT‑2 lab robots by Bryant et al. (Synthetic Biology, 2023). The authors developed AssemblyTron, an open‑source Python package that takes DNA assembly designs (from the j5 design software) and converts them into executable protocols for the Opentrons OT‑2 liquid‑handling robot. The biological focus is on accelerating the Build step of the Design–Build–Test–Learn cycle in synthetic biology by fully automating PCR setup and multi‑part DNA assemblies such as Golden Gate and in vivo assembly (IVA).
I want to build a small, automated platform to probe whether ABO blood‑group–like glycan patterns (especially type A–like structures) influence biological responses relevant to gastrointestinal disease risk. The concept is to combine:
The aim for this course project is not a full mechanistic explanation, but a robot‑friendly experimental scaffold that could, in principle, be scaled to test whether “type A‑like” contexts behave systematically differently from “type O/B‑like” contexts.
For the scope of the class, I would focus on plate‑based assays with three main automated modules:
Automated plate setup (Opentrons OT‑2)
Automated time‑course perturbations
Automated sampling / readout prep
This pipeline mirrors the “DBTL” idea in the AssemblyTron paper: design a matrix of conditions, automatically build the experiment on the robot, then test by measuring reporter outputs.
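The "design a matrix of conditions" step could look like the sketch below, which crosses glycan contexts with stimulus levels and assigns one condition per plate column. The condition labels, the number of replicates, and the `build_layout` helper are assumptions chosen so that the design fills a 96-well plate exactly.

```python
from itertools import product
from string import ascii_uppercase

# Hypothetical design factors; real conditions would come from the experimental plan.
GLYCAN_CONTEXTS = ["A-like", "B-like", "O-like", "no-glycan"]
STIMULUS_LEVELS = ["none", "low", "high"]
REPLICATES = 8


def build_layout(glycans, stimuli, replicates, rows=8, cols=12):
    """Assign each (glycan, stimulus, replicate) to a well, one condition per column."""
    conditions = list(product(glycans, stimuli))
    if len(conditions) > cols or replicates > rows:
        raise ValueError("design does not fit on the plate")
    layout = {}
    for col, (glycan, stimulus) in enumerate(conditions, start=1):
        for row in range(replicates):
            well = f"{ascii_uppercase[row]}{col}"
            layout[well] = {"glycan": glycan, "stimulus": stimulus, "replicate": row + 1}
    return layout


layout = build_layout(GLYCAN_CONTEXTS, STIMULUS_LEVELS, REPLICATES)
print(len(layout), "wells assigned")
print("A1 :", layout["A1"])
print("H12:", layout["H12"])
```

Placing one condition per column keeps the robot moves simple, but it leaves edge columns confounded with specific conditions; a randomized layout would be a reasonable refinement.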
Here is a concrete plan for a 96‑well plate experiment:
Design phase
Robot setup
Automated protocol
Readout
This is illustrative pseudocode in a Python‑like style for an Opentrons OT‑2 protocol:
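A minimal sketch of such a protocol, written against the Opentrons Python Protocol API v2, might look like this. The labware choices, deck slots, reservoir positions, volumes, and the `plan_columns` helper are all assumptions for illustration and would need to match the actual deck setup and be checked in simulation before running on a robot.

```python
# Sketch of an OT-2 plate setup protocol. All labware, slots, wells, and
# volumes below are illustrative assumptions, not a validated method.
metadata = {
    "protocolName": "ABO glycan context plate setup (sketch)",
    "apiLevel": "2.13",
}

# Hypothetical reservoir channels holding medium for each glycan context.
GLYCAN_SOURCES = {"A-like": "A1", "B-like": "A2", "O-like": "A3"}
CELLS_SOURCE = "A12"  # hypothetical channel holding the reporter cells
MEDIUM_VOLUME_UL = 180
CELL_VOLUME_UL = 20


def plan_columns(glycan_sources, columns_per_condition=4):
    """Map each glycan context to the plate columns (1-12) that will receive it."""
    plan = {}
    next_col = 1
    for name in glycan_sources:
        plan[name] = list(range(next_col, next_col + columns_per_condition))
        next_col += columns_per_condition
    return plan


def run(protocol):
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
    reservoir = protocol.load_labware("nest_12_reservoir_15ml", 2)
    tip_racks = [protocol.load_labware("opentrons_96_tiprack_300ul", slot) for slot in (3, 4)]
    pipette = protocol.load_instrument("p300_single_gen2", "right", tip_racks=tip_racks)

    # Step 1: dispense each glycan-context medium into its block of columns.
    for name, columns in plan_columns(GLYCAN_SOURCES).items():
        wells = [well for c in columns for well in plate.columns()[c - 1]]
        protocol.comment(f"Adding {name} medium to columns {columns}")
        pipette.transfer(MEDIUM_VOLUME_UL, reservoir[GLYCAN_SOURCES[name]], wells, new_tip="once")

    # Step 2: add reporter cells to every well, changing tips to avoid carry-over.
    pipette.transfer(CELL_VOLUME_UL, reservoir[CELLS_SOURCE], plate.wells(), new_tip="always")
```

Keeping the column plan in a plain function separates the experimental design from the robot commands, so the layout can be checked on a laptop without a robot attached.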
In a more complete version, the layout and volumes would be read from a CSV (like AssemblyTron reads design files), allowing you to change the entire experimental design without rewriting the protocol.
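One way to sketch the CSV-driven version is shown below. The column names (`source_well`, `dest_well`, `volume_ul`), the example rows, and the `load_transfers` helper are hypothetical; the CSV is embedded as text here so the example runs on its own.

```python
import csv
import io


def load_transfers(csv_text):
    """Parse source_well, dest_well, volume_ul rows into a list of transfer steps."""
    steps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        volume = float(row["volume_ul"])
        if volume <= 0:
            raise ValueError(f"non-positive volume for well {row['dest_well']}")
        steps.append((row["source_well"], row["dest_well"], volume))
    return steps


# Hypothetical design file contents: reservoir source, plate destination, volume.
design_csv = """source_well,dest_well,volume_ul
A1,A1,180
A1,B1,180
A2,A5,180
"""

steps = load_transfers(design_csv)
for source, dest, volume in steps:
    print(f"{volume:g} uL from reservoir {source} to plate {dest}")
```

Inside a protocol, each step could then drive a call such as `pipette.transfer(volume, reservoir[source], plate[dest])`, so changing the design only means editing the CSV.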
To better mimic gastrointestinal contexts and make the automation physically robust, I could incorporate simple 3D‑printed pieces:
These holders would be designed to preserve compatibility with standard SBS plate dimensions, so the robot’s calibration remains valid.
If access to a cloud automation platform such as Ginkgo Nebula is available, an extended version of the project could:
For this class assignment, the concrete deliverable is the Opentrons protocol plus experimental design, but the architecture is chosen so it could later be ported to a higher‑throughput cloud system.