Hadjer Begga — HTGAA Spring 2026

cover image cover image

About me

I am a recent graduate molecular biology engineer with a strong interest in genetic and protein engineering, synthetic biology, and pharmacogenomics. Through my participation in HTGAA Spring 2026, I aim to strengthen my understanding of DNA manipulation technologies and explore how they can be developed and applied responsibly, with careful consideration of ethical, safety, and governance frameworks.

Contact info

Homework

Labs

Projects

Subsections of Hadjer Begga — HTGAA Spring 2026

Subsections of Homework

Week 1 HW: Principles and Practices

cover image cover image
1. Project Concept: In-Silico Design of a Lactase-Releasing Probiotic for Lactose Intolerance
  1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

I am interested in developing an engineered probiotic system designed to release the lactase enzyme on demand in the human gut for individuals with lactose intolerance. This project is entirely in silico, combining concepts from synthetic biology, microbiome modeling, and systems biology without any wet-lab implementation.

The system would simulate a probiotic chassis such as Lactobacillus or Bifidobacterium, equipped with virtual genetic circuits inspired by lactose metabolism. These circuits would model regulatory control of lactase expression based on local lactose concentration, using logic-gate–like behavior and feedback mechanisms. Enzyme production would increase when lactose is present and decrease once lactose is depleted, allowing adaptive and resource-efficient regulation.

Why Is This Idea Relevant?

In-silico modeling is a recognized and safe approach in synthetic biology that allows the exploration of engineered biological systems and gut microbiome interactions without experimental, ethical, or biosafety risks. Such computational frameworks enable hypothesis generation, system-level understanding, and educational visualization of complex biological behaviors before any real-world implementation.

Note

Lactose intolerance is one of the most common digestive disorders globally, caused by reduced or absent lactase activity in adulthood. It affects a large proportion of the world’s population, particularly in Africa, Asia, and South America, leading to gastrointestinal discomfort and dietary restrictions. Addressing this condition highlights a real, widespread health challenge that benefits from innovative and accessible solutions. (Lactose Intolerance - NIDDK, 2024); image reference Bluepic Bluepic

2. Governance / Policy Goals

2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.


Because this project represents an early, in-silico design phase, its governance goals focus on the responsible framing, communication, and interpretation of computational results rather than regulation of a finalized biological product.

1. Ensuring Ethical Transparency

In silico models can appear highly convincing, even though they rely on simplifying assumptions. Without transparency, such simulations may be mistakenly interpreted as real biological proof, reused incorrectly by others, or generate unjustified confidence in safety or effectiveness. To prevent these risks, the project emphasizes:

  • Clear documentation of all modeling assumptions, including chosen parameters (e.g., lactose concentration thresholds, promoter sensitivity), simulation boundaries, and known limitations.
  • Explicit disclosure of the speculative nature of the work, clarifying potential real-world implications while emphasizing that the model does not represent a validated or deployable probiotic system.
2. Maintaining Scientific Integrity

Although the conceptual model may function optimally in simulation, real biological systems often behave unpredictably due to environmental variability and biological complexity. To maintain scientific integrity, it is essential to:

  • Avoid overstating the effectiveness or safety of real-world probiotics based solely on computational results, and clearly distinguish between theoretical design and experimentally validated outcomes.
3. Considering Public Health and Safety

Since biological behavior cannot be predicted with complete accuracy, the project addresses public health and safety by:

  • Highlighting potential risks of physical implementation, such as disruption of gut microbiome balance or unintended metabolic effects.
  • Including scenario-based analyses to explore possible unexpected consequences for gut microbiome health under different simulated conditions.
3. Potential Governance Actions

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

  1. Purpose: What is done now and what changes are you proposing?
  2. Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
  3. Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  4. Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

PURPOSEDESIGNASSUMPTIONSRISKS OF FAILURE & “SUCCESS”
Providing mandatory transparency and documentation standards for in-silico biological models (by academic researchers, journals, funding bodies)Require structured documentation sections describing modeling assumptions, parameter choices, simulation constraints, and known limitations of the modelClear and standardized documentation reduces misuse, misinterpretation, and overconfidence in simulation resultsDocumentation may be superficial, misunderstood, or ignored by users
Providing ethical claim-limitation guidelines for computational synthetic biology projects (by bioethics committees, academic institutions)Encourage explicit labeling of projects as Conceptual, Exploratory, or Pre-experimental, and require clear statements that simulation outcomes do not constitute clinical or biological proofClear framing of claims improves scientific integrity, responsible communication, and public trust in synthetic biology researchGuidelines may be ignored outside formal academic or publishing contexts; excessive caution may slow translation of promising concepts into experimental research
Recommending scenario-based risk modeling as a design requirement (by researchers, synthetic biology educators)Integrate scenario analysis into in-silico projects, exploring possible unintended outcomes such as microbiome imbalance, excessive enzyme expression, or metabolic side effects if the system were physically implementedEarly anticipation of risks improves downstream design decisions and promotes responsible innovationScenario analysis may oversimplify complex biological interactions
4. Scoring Governance Actions Against Policy Goals

4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:


Action / Policy GoalEnsuring Ethical TransparencyMaintaining Scientific IntegrityConsidering Public Health and Safety
Providing Mandatory Transparency & Documentation Standards for In-Silico Biological Models123
Providing Ethical Claim-Limitation Guidelines for Computational Synthetic Biology Projects212
Recommending Scenario-Based Risk Modeling as a Design Requirement321
5. Prioritization of Governance Options and Strategic Recommendations

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.


From my perspective, scenario-based risk modeling can be prioritized over the other governance options, because all three approaches address public health and safety either directly or indirectly. Scenario-based analysis explicitly explores what could go wrong if an in-silico model were physically implemented, making it the most direct mechanism for anticipating risks to gut microbiome balance or unintended metabolic effects. However, maintaining scientific integrity also plays a critical indirect role in protecting public health: by avoiding overclaiming the safety or effectiveness of a purely conceptual model, the transition from simulation to real-world application becomes more cautious, accurate, and oriented toward appropriate experimental validation, thereby reducing the likelihood of harmful misinterpretations. Similarly, ensuring ethical transparency through clear and accurate documentation of modeling assumptions, parameters, and limitations improves how the model is interpreted and reused by others, helping prevent incorrect applications that could ultimately pose health risks.


Sources:


Assignment (Week 2 Lecture Prep):

Homework Questions from Professor Jacobson:

  1. Error rate and genome context • From the slide N°= 8 , DNA polymerase has an error rate of ~1 in 10⁶ bases. • With the human genome of ~3 × 10⁹ bp, this would result in ~3,000 errors per replication without repair. • Biology reduces this discrepancy with proofreading activity of DNA polymerase (3′→5′ exonuclease) and post-replication mismatch repair like MutS, NER, BER…, which collectively reduce the final error rate to ~1 in 10⁹–10¹⁰.
  2. Human protein: ~1036 bp (~345 amino acids), With ~3 codons per amino acid on average, the number of possible DNA sequences for an average human protein is ~3³⁴⁵ (~10¹⁶⁴ possible sequences). Not all sequences work in practice because of Mutations: Insertions, deletions, transitions, and transversions that can introduce frameshifts or premature stop codons, making the protein non-functional. Also, there are some mechanism of regulations that make some Sequences creating unwanted secondary structures in mRNA, affect splicing, or introduce cryptic signals that disrupt translation.

Homework Questions from Dr. LeProust:

  1. Most commonly used method for oligo synthesis Today, almost all synthetic DNA is made using phosphoramidite solid-phase synthesis. This method adds one nucleotide at a time on a solid support and is reliable, efficient, and easy to automate, which is why it became the standard for modern DNA synthesizers. https://biolabmix.ru/en/info/detail/oligonucleotide-synthesis/#:~:text=The%20most%20common%20approach%20to,for%20example%2C%20by%20attaching%20fluorophores.

  2. Why it’s hard to make oligos longer than ~200 nt Each step in chemical DNA synthesis is very efficient but not perfect, so small errors happen every time a base is added. As the oligo gets longer, these errors pile up, and beyond about 200 nucleotides it becomes very difficult to get a clean, full-length sequence. https://pubs.rsc.org/en/content/articlepdf/2025/sc/d4sc06958g

  3. Why you can’t directly synthesize a 2000 bp gene Making a 2000-base gene in one piece would accumulate too many chemical errors and damaged bases to be useful. Instead, companies synthesize short oligos and then assemble them enzymatically, followed by cloning and sequence checking to make sure the gene is correct. https://www.pnas.org/doi/10.1073/pnas.2237126100#:~:text=The%20broader%20implications%20of%20the,without%20multiple%20repair/selection%20steps.


Homework Question from George Church:

All animals require the same 10 essential amino acids because they cannot synthesize them and must obtain them from their diet. These are: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine (arginine is essential for all animals and conditionally essential in adult humans). The “lysine contingency” refers to the idea that lysine is often the limiting essential amino acid in plant-based diets, especially those dominated by cereals like wheat, rice, or maize. Since animals cannot make lysine, their growth and health are directly constrained by how much lysine is available in their food. So knowing that all animals share the same essential amino acid requirements makes lysine’s importance stand out even more. It shows that lysine is not just nutritionally important but evolutionarily critical.

https://www.kemin.com/ap/en/blog/animal/amino-acids-for-animal-health#:~:text=Essential%20amino%20acids:%20These%20are,essential)%2C%20leucine%20and%20lysine

Week 2 HW: DNA Read, Write, and Edit

cover image cover image
Part 0: Basics of Gel Electrophoresis

Attend or watch all lecture and recitation videos. Optionally watch bootcamp


Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details. Overview: Make a free account at benchling.com Import the Lambda DNA. Simulate Restriction Enzyme Digestion with the following Enzymes:

  • EcoRI
  • HindIII
  • BamHI
  • KpnI
  • EcoRV
  • SacI
  • SalI Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. You might find Ronan’s website a helpful tool for quickly iterating on designs!
  • In this part, I imported The complete 48,502 bp linear genome of bacteriophage lambda from NCBI GenBank into Benchling. This sequence corresponds to the Lambda DNA sold by NEB (N3011) and will be used for in-silico restriction digestion.
Image ImageImage ImageImage ImageImage ImageImage ImageImage ImageImage Image
  • Then simulated restriction enzyme digestion using EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. By running in-silico gel electrophoresis . The resulting virtual gel shows discrete bands corresponding to these fragments, which demostrates how sequence information maps to physical separation in gel electrophoresis. Virtual digest gel Virtual digest gel

  • To create a pattern in the style of Paul Vanouse’s work, I experimented with different combinations of restriction enzymes to control the gel band patterns. By adjusting the number and length of the resulting DNA fragments, I explored how these parameters influence the final visual outcome. Through this process, I ultimately obtained a gel pattern resembling a butterfly shape. Image Image

Gel pattern in style of Paul Vanouse’s work Gel pattern in style of Paul Vanouse’s work
  • This helped me understand how restriction digests and gels work before doing any real lab experiment. I treated this as both a technical exercise and a creative exploration, inspired by DNA gel art concepts.
Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Assignees for the following sections MIT/Harvard students Required Committed Listeners Optional (for those with Lab access) Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.


Part 3: DNA Design Challenge

Assignees for the following sections MIT/Harvard students Required Committed Listeners Required

3.1. Choose your protein. In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose. [Example from our group homework, you may notice the particular format — The example below came from UniProt] sp|P03609|LYS_BPMS2 Lysis protein OS=Escherichia phage MS2 OX=12022 PE=2 SV=1 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLL EAVIRTVTTLQQLLT

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence. The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above. [Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI] Lysis protein DNA sequence atggaaacccgattccctcagcaatcgcagcaaactccggcatctactaatagacgccggccattcaaacatgaggattacccatgtcgaagacaacaaagaagttcaactctttatgtattgatcttcctcgcgatctttctctcgaaatttaccaatcaattgcttctgtcgctactggaagcggtgatccgcacagtgacgactttacagcaattgcttacttaa

3.3. Codon optimization. Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why? [Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI] Lysis protein DNA sequence with Codon-Optimization ATGGAAACCCGCTTTCCGCAGCAGAGCCAGCAGACCCCGGCGAGCACCAACCGCCGCCGCCCGTTCAAACATGAAGATTATCCGTGCCGTCGTCAGCAGCGCAGCAGCACCCTGTATGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTCACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACAGTGACGACCCTGCAGCAGCTGCTGACCTAA

3.4. You have a sequence! Now what? What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

3.5. [Optional] How does it work in nature/biological systems?

  1. Describe how a single gene codes for multiple proteins at the transcriptional level.
  2. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below. [Example shows the biomolecular flow in central dogma from DNA to RNA to Protein] Special note that all “T” were transcribed into “U” and that the 3-nt codon represents 1-AA. Rearranged snapshot of MS2 L-protein information flow from DNA to RNA to Protein. Captured from Ice’s Benchling and stitched together in a ppt

  • For the DNA design challenge, I chose a protein related to my project interest in engineered probiotics and conditional enzyme release in the gut.The enzyme β-galactosidase is well-characterized and commonly expressed in Escherichia coli, making it an ideal candidate for computational DNA design and expression modeling.
  • I first searched online database UniProt to obtain the amino acid sequence of the protein.
Image ImageImage ImageImage Image
  • the amino acid equence was as follow:
>sp|P00722|BGAL_ECOLI Beta-galactosidase OS=Escherichia coli (strain K12) OX=83333 GN=lacZ PE=1 SV=2
MTMITDSLAVVLQRRDWENPGVTQLNRLAAHPPFASWRNSEEARTDRPSQQLRSLNGEWR
FAWFPAPEAVPESWLECDLPEADTVVVPSNWQMHGYDAPIYTNVTYPITVNPPFVPTENP
TGCYSLTFNVDESWLQEGQTRIIFDGVNSAFHLWCNGRWVGYGQDSRLPSEFDLSAFLRA
GENRLAVMVLRWSDGSYLEDQDMWRMSGIFRDVSLLHKPTTQISDFHVATRFNDDFSRAV
LEAEVQMCGELRDYLRVTVSLWQGETQVASGTAPFGGEIIDERGGYADRVTLRLNVENPK
LWSAEIPNLYRAVVELHTADGTLIEAEACDVGFREVRIENGLLLLNGKPLLIRGVNRHEH
HPLHGQVMDEQTMVQDILLMKQNNFNAVRCSHYPNHPLWYTLCDRYGLYVVDEANIETHG
MVPMNRLTDDPRWLPAMSERVTRMVQRDRNHPSVIIWSLGNESGHGANHDALYRWIKSVD
PSRPVQYEGGGADTTATDIICPMYARVDEDQPFPAVPKWSIKKWLSLPGETRPLILCEYA
HAMGNSLGGFAKYWQAFRQYPRLQGGFVWDWVDQSLIKYDENGNPWSAYGGDFGDTPNDR
QFCMNGLVFADRTPHPALTEAKHQQQFFQFRLSGQTIEVTSEYLFRHSDNELLHWMVALD
GKPLASGEVPLDVAPQGKQLIELPELPQPESAGQLWLTVRVVQPNATAWSEAGHISAWQQ
WRLAENLSVTLPAASHAIPHLTTSEMDFCIELGNKRWQFNRQSGFLSQMWIGDKKQLLTP
LRDQFTRAPLDNDIGVSEATRIDPNAWVERWKAAGHYQAEAALLQCTADTLADAVLITTA
HAWQHQGKTLFISRKTYRIDGSGQMAITVDVEVASDTPHPARIGLNCQLAQVAERVNWLG
LGPQENYPDRLTAACFDRWDLPLSDMYTPYVFPSENGLRCGTRELNYGPHQWRGDFQFNI
SRYSQQQLMETSHRHLLHAEEGTWLNIDGFHMGIGGDDSWSPSVSAEFQLSAGRYHYQLV
WCQK
  • After selecting the protein, I converted the amino acid sequence of β-galactosidase (1024 residues) into the corresponding DNA sequence using the Sequence Manipulation Suite Reverse Translate tool. Because the genetic code is degenerate, multiple codons can encode the same amino acid. The resulting 3072 bp DNA sequence represents one valid nucleotide sequence capable of encoding the β-galactosidase protein.
Image Image
  • the resulted DNA sequence was as follow:
>reverse translation of sp|P00722|BGAL_ECOLI Beta-galactosidase OS=Escherichia coli (strain K12) OX=83333 GN=lacZ PE=1 SV=2 to a 3072 base sequence of most likely codons.
atgaccatgattaccgatagcctggcggtggtgctgcagcgccgcgattgggaaaacccg
ggcgtgacccagctgaaccgcctggcggcgcatccgccgtttgcgagctggcgcaacagc
gaagaagcgcgcaccgatcgcccgagccagcagctgcgcagcctgaacggcgaatggcgc
tttgcgtggtttccggcgccggaagcggtgccggaaagctggctggaatgcgatctgccg
gaagcggataccgtggtggtgccgagcaactggcagatgcatggctatgatgcgccgatt
tataccaacgtgacctatccgattaccgtgaacccgccgtttgtgccgaccgaaaacccg
accggctgctatagcctgacctttaacgtggatgaaagctggctgcaggaaggccagacc
cgcattatttttgatggcgtgaacagcgcgtttcatctgtggtgcaacggccgctgggtg
ggctatggccaggatagccgcctgccgagcgaatttgatctgagcgcgtttctgcgcgcg
ggcgaaaaccgcctggcggtgatggtgctgcgctggagcgatggcagctatctggaagat
caggatatgtggcgcatgagcggcatttttcgcgatgtgagcctgctgcataaaccgacc
acccagattagcgattttcatgtggcgacccgctttaacgatgattttagccgcgcggtg
ctggaagcggaagtgcagatgtgcggcgaactgcgcgattatctgcgcgtgaccgtgagc
ctgtggcagggcgaaacccaggtggcgagcggcaccgcgccgtttggcggcgaaattatt
gatgaacgcggcggctatgcggatcgcgtgaccctgcgcctgaacgtggaaaacccgaaa
ctgtggagcgcggaaattccgaacctgtatcgcgcggtggtggaactgcataccgcggat
ggcaccctgattgaagcggaagcgtgcgatgtgggctttcgcgaagtgcgcattgaaaac
ggcctgctgctgctgaacggcaaaccgctgctgattcgcggcgtgaaccgccatgaacat
catccgctgcatggccaggtgatggatgaacagaccatggtgcaggatattctgctgatg
aaacagaacaactttaacgcggtgcgctgcagccattatccgaaccatccgctgtggtat
accctgtgcgatcgctatggcctgtatgtggtggatgaagcgaacattgaaacccatggc
atggtgccgatgaaccgcctgaccgatgatccgcgctggctgccggcgatgagcgaacgc
gtgacccgcatggtgcagcgcgatcgcaaccatccgagcgtgattatttggagcctgggc
aacgaaagcggccatggcgcgaaccatgatgcgctgtatcgctggattaaaagcgtggat
ccgagccgcccggtgcagtatgaaggcggcggcgcggataccaccgcgaccgatattatt
tgcccgatgtatgcgcgcgtggatgaagatcagccgtttccggcggtgccgaaatggagc
attaaaaaatggctgagcctgccgggcgaaacccgcccgctgattctgtgcgaatatgcg
catgcgatgggcaacagcctgggcggctttgcgaaatattggcaggcgtttcgccagtat
ccgcgcctgcagggcggctttgtgtgggattgggtggatcagagcctgattaaatatgat
gaaaacggcaacccgtggagcgcgtatggcggcgattttggcgataccccgaacgatcgc
cagttttgcatgaacggcctggtgtttgcggatcgcaccccgcatccggcgctgaccgaa
gcgaaacatcagcagcagttttttcagtttcgcctgagcggccagaccattgaagtgacc
agcgaatatctgtttcgccatagcgataacgaactgctgcattggatggtggcgctggat
ggcaaaccgctggcgagcggcgaagtgccgctggatgtggcgccgcagggcaaacagctg
attgaactgccggaactgccgcagccggaaagcgcgggccagctgtggctgaccgtgcgc
gtggtgcagccgaacgcgaccgcgtggagcgaagcgggccatattagcgcgtggcagcag
tggcgcctggcggaaaacctgagcgtgaccctgccggcggcgagccatgcgattccgcat
ctgaccaccagcgaaatggatttttgcattgaactgggcaacaaacgctggcagtttaac
cgccagagcggctttctgagccagatgtggattggcgataaaaaacagctgctgaccccg
ctgcgcgatcagtttacccgcgcgccgctggataacgatattggcgtgagcgaagcgacc
cgcattgatccgaacgcgtgggtggaacgctggaaagcggcgggccattatcaggcggaa
gcggcgctgctgcagtgcaccgcggataccctggcggatgcggtgctgattaccaccgcg
catgcgtggcagcatcagggcaaaaccctgtttattagccgcaaaacctatcgcattgat
ggcagcggccagatggcgattaccgtggatgtggaagtggcgagcgataccccgcatccg
gcgcgcattggcctgaactgccagctggcgcaggtggcggaacgcgtgaactggctgggc
ctgggcccgcaggaaaactatccggatcgcctgaccgcggcgtgctttgatcgctgggat
ctgccgctgagcgatatgtataccccgtatgtgtttccgagcgaaaacggcctgcgctgc
ggcacccgcgaactgaactatggcccgcatcagtggcgcggcgattttcagtttaacatt
agccgctatagccagcagcagctgatggaaaccagccatcgccatctgctgcatgcggaa
gaaggcacctggctgaacattgatggctttcatatgggcattggcggcgatgatagctgg
agcccgagcgtgagcgcggaatttcagctgagcgcgggccgctatcattatcagctggtg
tggtgccagaaa
  • After reverse translation, I verified the identity of the resulting nucleotide sequence by performing a BLASTn search against the reference lacZ gene from Escherichia coli K-12. The alignment showed 100% query coverage with an E-value of 0.0, confirming a highly significant match. The percent identity was ~84%, which is expected because reverse translation produces a synonymous DNA sequence that differs at the codon level while still encoding the same β-galactosidase protein. This result confirmed that the reverse-translated sequence correctly corresponds to the lacZ gene.
Image ImageImage ImageImage Image

Next, I performed codon optimization of the sequence originates from E. coli K-12 to improve expression efficiency in a Lactobacillus probiotic strain (delbrueckii subsp. Bulgaricus), as this organism is the intended chassis for conditional lactase expression in the human gut, to ensure efficient translation in the final probiotic host organism. Codon optimization was performed using a host-specific algorithm using the Vector Builder codon orimisation tool that adjusts synonymous codon usage to match the preferred codons of L. delbrueckii while preserving the original amino acid sequence.

Why codon optimization is necessary?

Codon optimization is required because different organisms preferentially use different synonymous codons. Optimizing the DNA sequence for the codon usage of the target host improves ribosome efficiency, protein yield, and reduces translational stalling.

Image ImageImage Image
  • the resulted optimised sequence is as following:

Host organism: Lactobacillus delbruekii susbsp. Bulgaricus ATCC 11842 = JCM 1002 Original Sequence: GC=59.80%, CAI=0.72 Optimized Sequence: GC=60.16%, CAI=0.89

Improved DNA[1]: GC=60.16%, CAI=0.89
ATGACTATGATCACCGACAGCCTGGCAGTTGTTTTGCAACGGCGGGACTGGGAAAACCCGGGCGTCACTCAGTTGAACCGGCTGGCCGCCCACCCACCATTTGCCAGCTGGCGCAACTCCGAAGAAGCCCGGACCGACCGGCCGAGCCAGCAACTGAGAAGCTTGAACGGCGAATGGCGTTTCGCCTGGTTTCCGGCCCCGGAAGCCGTCCCAGAAAGCTGGTTGGAATGCGACCTCCCGGAAGCCGATACCGTCGTGGTGCCGAGCAACTGGCAAATGCACGGCTATGACGCCCCCATCTACACCAATGTTACCTACCCAATTACCGTCAACCCGCCATTTGTCCCGACCGAAAACCCGACTGGTTGCTATAGCTTGACCTTCAACGTTGACGAAAGCTGGCTGCAAGAAGGCCAGACCCGCATTATTTTTGACGGCGTTAACAGCGCCTTCCACTTGTGGTGCAACGGCCGCTGGGTCGGCTACGGCCAGGACAGCCGCTTGCCATCCGAATTTGACCTGAGTGCTTTCTTGCGGGCCGGCGAAAACCGTCTGGCCGTCATGGTCCTGCGCTGGAGCGACGGCAGCTACCTGGAAGACCAAGACATGTGGCGGATGTCCGGCATTTTCCGGGACGTCAGCCTGCTGCACAAGCCGACCACCCAGATTTCCGACTTTCACGTTGCAACCCGGTTCAACGACGACTTCTCTCGGGCTGTGCTGGAAGCTGAAGTCCAGATGTGCGGCGAATTGCGGGACTACCTGCGGGTTACTGTTTCATTGTGGCAGGGCGAAACCCAGGTTGCCTCAGGCACCGCCCCGTTTGGCGGTGAAATTATCGACGAACGCGGCGGGTACGCCGACCGGGTTACCTTGAGACTGAACGTGGAAAACCCGAAGTTGTGGAGCGCCGAAATCCCAAATCTGTACCGCGCCGTCGTCGAATTGCACACCGCTGACGGCACCCTGATCGAAGCCGAAGCCTGCGACGTTGGCTTCCGGGAAGTCCGCATCGAAAACGGCTTGCTGCTCCTGAACGGCAAGCCACTGCTGATCCGGGGCGTTAACCGGCACGAACACCACCCATTGCACGGCCAAGTCATGGACGAACAGACTATGGTCCAGGACATCCTGCTGATGAAGCAGAACAACTTCAACGCTGTTCGTTGCTCACACTATCCAAACCATCCACTGTGGTACACTCTGTGCGACCGGTACGGCCTGTACGTTGTGGACGAAGCCAACATCGAAACTCACGGCATGGTTCCGATGAACCGGCTGACCGACGACCCGAGATGGCTGCCAGCCATGAGCGAACGGGTTACTCGCATGGTTCAACGCGACCGGAACCACCCATCCGTTATTATCTGGAGCCTGGGGAACGAAAGCGGCCACGGCGCCAATCACGACGCTCTGTACCGGTGGATCAAGTCCGTCGACCCATCCCGCCCTGTTCAGTACGAAGGCGGCGGCGCCGATACGACCGCCACCGACATCATCTGCCCAATGTACGCCCGGGTTGATGAAGACCAGCCGTTTCCGGCTGTCCCAAAGTGGAGCATCAAGAAGTGGCTGAGCCTGCCAGGCGAAACTCGGCCGCTGATCCTGTGCGAATACGCCCACGCCATGGGCAACTCCCTGGGCGGCTTTGCCAAGTACTGGCAGGCTTTTCGCCAGTATCCACGGTTGCAGGGCGGCTTTGTTTGGGACTGGGTCGACCAAAGCCTGATCAAGTACGACGAAAACGGCAACCCGTGGAGCGCCTACGGCGGCGACTTTGGCGACACCCCGAACGACCGCCAGTTTTGCATGAACGGTCTGGTTTTCGCTGACCGGACGCCACACCCGGCCCTGACCGAAGCCAAGCACCAGCAGCAGTTCTTCCAGTTCCGGCTGTCAGGCCAGACCATCGAAGTGACTAGCGAATACCTGTTTCGCCACTCCGACAACGAATTGTTGCACTGGATGGTCGCCCTGGACGGCAAGCCACTGGCCAGCGGCGAAGTTCCGCTGGACGTTGCCCCACAGGGCAAGCAGCTGATCGAATTGCCGGAACTGCCGCAGCCGGAAAGCGCCGGCCAACTGTGGCTGACTGTTCGGGTCGTTCAGCCGAACGCCACTGCCTGGTCTGAAGCCGGGCACATCTCAGCCTGGCAGCAGTGGCGCCTGGCCGAAAACTTGAGCGTTACGCTGCCGGCCGCCAGCCACGCCATCCCACACCTGACTACTAGCGAAATGGACTTTTGCATCGAATTGGGCAACAAGCGGTGGCAATTCAACCGGCAGAGCGGCTTTCTGAGCCAGATGTGGATCGGCGACAAGAAGCAGTTGCTGACCCCACTGCGGGATCAGTTCACCCGGGCCCCGCTGGACAACGACATCGGCGTCAGCGAAGCCACTCGGATCGACCCAAACGCCTGGGTCGAACGCTGGAAGGCCGCCGGCCACTACCAGGCCGAAGCCGCTCTGCTGCAATGTACCGCTGATACGCTGGCTGACGCCGTCTTGATTACTACCGCTCACGCCTGGCAACACCAGGGCAAGACTTTGTTTATCAGCCGGAAGACCTACCGGATTGACGGCAGCGGTCAGATGGCCATCACAGTCGATGTCGAAGTTGCCAGCGACACCCCGCACCCGGCACGGATCGGCCTGAACTGCCAGCTGGCCCAGGTTGCCGAACGGGTTAACTGGCTGGGCCTGGGCCCTCAGGAAAACTACCCAGACCGTTTGACGGCTGCCTGCTTTGACCGGTGGGACTTACCGTTGAGCGATATGTACACTCCATACGTCTTTCCGTCCGAAAACGGCCTGCGGTGCGGCACCAGAGAACTGAACTATGGCCCGCACCAGTGGCGCGGTGACTTTCAATTCAACATCAGCCGGTACTCCCAGCAGCAGTTGATGGAAACCAGCCACCGCCACCTGCTGCACGCCGAAGAAGGGACGTGGTTGAACATCGACGGCTTTCACATGGGCATCGGCGGCGACGACTCATGGAGCCCGAGCGTTAGCGCTGAATTCCAGTTGAGCGCCGGCCGGTACCACTACCAGTTGGTTTGGTGCCAGAAG
  • To produce the protein from this DNA sequence, I would use a cell-dependent expression system based on bacterial transformation and expression. In this approach, This gene is then placed into an expression cassette with the necessary regulatory elements so it can be used by a biological system.
  • To produce the protein, I would use a cell-dependent expression system through bacterial cloning. The designed DNA sequence is inserted into a plasmid and introduced into a bacterial host by transformation. Inside the cell, the gene is transcribed into mRNA under the control of the selected promoter. The mRNA is then translated by ribosomes, which read the codons starting at the start codon and assemble the corresponding amino acids into the lactase protein. This approach follows the natural flow of genetic information (DNA to RNA to protein) and allows controlled production of the enzyme in living cells.
Part 4: Prepare a Twist DNA Synthesis Order

Assignees for the following sections MIT/Harvard students Required Committed Listeners Required This is a practice exercise, not necessarily your real Twist order!

4.1. Create a Twist account, and Benchling account

4.2. Build Your DNA Insert Sequence For example, let’s make a sequence that will make E. coli glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein): In Benchling, select New DNA/RNA sequence Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing). Go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with your codon optimized DNA sequence of interest!). Each time you add a new piece of the sequence, make sure to annotate by right clicking over the sequence and creating an annotation that describes what each piece (e.g., Promoter, RBS, etc.) is (see image below). Promoter (e.g. BBa_J23106) TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC RBS (e.g. BBa_B0034 with spacers for optimal expression) CATTAAAGAGGAGAAAGGTACC Start Codon ATG Coding Sequence (your codon optimized DNA for a protein of interest, sfGFP for example) AGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA 7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli) CATCACCATCACCATCATCAC Stop Codon TAA Terminator (e.g. BBa_B0015) CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA Once you’ve completed this, click on Linear Map to preview the entire sequence. If you intend to have a TA review a sequence in the future, this is a good way to verify that all sections are annotated! (Optional) Share your final sequence link with a TA for review! This insert sequence you built is commonly referred to as an expression cassette in molecular biology (a sequence you can drop into any vector and it’ll perform its function). Go ahead and download the FASTA file for the sequence you made. It’s helpful to visualize DNA designs using SBOL Canvas (Synthetic Biology Open Language) to convey your designs. Here’s an example of what you just annotated in Benchling:

4.2. On Twist, Select The “Genes” Option

4.3. Select “Clonal Genes” option For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project. Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly. Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. An advantage is If designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.

4.4. Import your sequence You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.

4.5. Choose Your Vector Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle! The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance. Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later. For this demonstration, choose a Twist cloning vectors like pTwist Amp High Copy. Click into your sequence and select download construct (GenBank) to get the full plasmid sequence: Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded. This is the plasmid you just built with your expression cassette included. Congratulations on building your first plasmid! Important For your final projects, remember to include:

  1. Fully annotated Benchling insert fragment
  2. Desired Twist cloning vector

  • A lactose-inducible promoter was selected to enable conditional expression of lactase in response to lactose availability in the gut. The PlacA promoter region was extracted from the Lactococcus lactis lac operon upstream of the native ribosome binding site, with preserving lactose-responsive regulation.
AATCGTCGTTTTTTGTTCATATGAAGACTTTCTTTCATAAAGTAATTTTTTTCCAAAGATAATTCTCTTT
TAATTGTATCATAAAAGATAATATTTTCAAGGTAAAACAAACAATTTCAAACAAAAACAAACGTTAGATG
ATGAAATAAGAACAGAGGATTGACGTATATTAGCTTAGGTCAGATTTTGTATAAGACGAAAATAAAGTAG
GACCTCTTAATCAGTAAGTTATAGAAAGTAAAAGACTTTTGTAATACCTGAATAGATATTTCACGTCCAT
TTTGTGATGGATTAAATGAACAAAAATGAACAATAATTTAACGGTGTTATCTATTTTTTAAAAAAACAAA
TAAAAAAAAACAAAAAATTAACAAAAATAGTTGCGTTTTGTTTGAATGTTTGATATCATATAAACAAAGA
AATGATGAAAACGTTATCTTGAACATTTTGCAAAATATTTTCTACTTCTACGTAGCATTTCTTTTTAAAA
TTTAGGAGGTAGTCCAA

Image Image Image Image

  • For the RBS, I chose to keep the native Lactococcus lactis ribosome binding site (RBS) derived from the lacA operon which is the region immediately upstream of the coding sequence (CDS) and preserved its original spacer length to ensure efficient translation initiation in the probiotic host. Maintaining native RBS spacing is critical in Gram-positive bacteria, as ribosome binding and translation efficiency are highly sensitive to the distance between the Shine–Dalgarno sequence and the start codon.

  • the RBS sequence is as follow:

AGGAGGTAGTCCAA
  • I selected the transcription terminator from the tpi gene of Lactococcus lactis, a highly expressed native housekeeping gene, to ensure efficient and reliable transcription termination in the probiotic host. While two related annotations are present in GenBank for this region, both correspond to the same rho-independent transcription terminator. Therefore, I chose the complete annotated terminator region (positions 958–988), which includes both the inverted repeat and the downstream poly-T tract, to ensure proper formation of the termination hairpin and robust termination of transcription.

Image Image Image Image

  • A transcription terminator was included downstream of the lactase coding sequence to ensure proper termination of transcription. This prevents transcriptional read-through into adjacent sequences and improves the stability and predictability of gene expression, independent of promoter regulation.

  • ATG used as start codon and AAG as stop codon

  • From the selected elements, I built a linear expression cassette in Benchling containing a lactose-regulated promoter, native LAB ribosome binding site, codon-optimized lacZ, and a native transcription terminator. I exported this sequence as a FASTA file. Cassette_link_to_Benchling

Image ImageImage Image
  • When I first uploaded my expression cassette FASTA file to Twist Bioscience, I encountered an initial error related to the FASTA header name. The header exceeded the maximum allowed length (32 characters), which caused the sequence to be rejected. I fixed this issue by shortening the header name and re-uploading the file. After this correction, the sequence was accepted for further analysis.
Image Image

However, after re-uploading the corrected file, additional synthesis warnings appeared. These warnings were related to large GC content variation, repetitive regions, and overall sequence complexity. These issues are mainly due to the codon-optimized lacZ gene and the presence of multiple regulatory elements such as the ribosome binding site and transcription terminator. Twist flagged these features as potential manufacturability risks. Unfortunately, I was not able to resolve these additional issues at this stage. Fixing them would have required re-optimizing the enzyme sequence, possibly changing the host organism for codon optimization, and redesigning the regulatory architecture of the cassette. Due to time constraints and because this assignment focuses on learning the design and ordering workflow rather than producing a synthesis-ready construct, I chose not to redesign the sequence further.

Image Image

For this exercise, I proceeded by selecting a Twist clonal vector (pTwist Amp High Copy) to complete the plasmid design. Although the insert sequence still contained manufacturability warnings. However, In a real DNA synthesis order, additional sequence optimization would be required to reduce GC content extremes and repetitive regions to meet synthesis constraints.

Image ImageImage Image
Part 5: DNA Read/Write/Edit

Assignees for the following sections MIT/Harvard students Required Committed Listeners Required

5.1 DNA Read (i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank). DNA-based digital data storage technology. Source: Archives in DNA: Workshop Exploring Implications of an Emerging Bio-Digital Technology through Design Fiction - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/DNA-based-digital-data-storage-technology_fig1_353128454 [accessed 11 Feb 2025]. (ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

  1. Is your method first-, second- or third-generation or other? How so?
  2. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
  3. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)
  4. What is the output of your chosen sequencing technology?

5.2 DNA Write (i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :) (ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

  1. What are the essential steps of your chosen sequencing methods?
  2. What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

5.3 DNA Edit (i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why? (ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

  1. How does your technology of choice edit DNA? What are the essential steps?
  2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
  3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

DNA read:

  • I would want to sequence DNA used for digital data storage. In my knowledge, this technology enables the storage of digital information such as text, images, or files by encoding them into DNA sequences instead of being stored on hard drives. DNA is extremely stable and can store a huge amount of information in a very small space, which makes it interesting for long-term data storage. Reading this DNA by sequencing is necessary to retrieve the stored information and check that the data has not been damaged or changed over time.
  • For this porpose, I would use Illumina sequencing because it is very accurate and well suited for reading short DNA fragments, which is how DNA data storage is usually organized. this strategy can be performed following 4 crusial steps: Image_adress
Image Image
  1. Generation This method is a second-generation sequencing technology. It sequences millions of short DNA fragments in parallel, which makes it fast and reliable, but it cannot read very long DNA molecules in one piece.

  2. Input and preparation The input is DNA that contains the encoded digital data. To prepare it: The DNA is fragmented into short pieces, Adapters are added to both ends of the fragments, The fragments are amplified using PCR, The prepared DNA is loaded onto a flow cell

  3. How the technology reads DNA (base calling) Each DNA fragment is copied one base at a time using fluorescently labeled nucleotides. A camera records the color added at each step, and the machine translates these signals into DNA letters (A, T, C, G).

  4. Output The output is a large number of short DNA sequence reads saved as digital files. These reads are then assembled and decoded to recover the original stored data.

DNA write:

  • I am particularly interested in the genes in human genomic DNA related to pharmacogenomics and pharmacogenetics. These fields study how genetic variation affects how people respond to drugs. So, I would want to synthesize genes encoding drug-metabolizing enzymes, like human cytochrome P450 enzymes. Since, these genes are central to pharmacogenetics as variations in them strongly influence how drugs are processed in the body. Synthesizing these genes allows them to be studied, expressed, and tested in controlled systems.

  • So in order to synthetizing them , I would use chemical DNA synthesis combined with gene assembly, which is the standard approach used by commercial DNA synthesis companies.

  1. Essential steps
  • DNA synthesis starts with the digital design of the DNA sequence. This is followed by the chemical synthesis of short oligonucleotides, which are then assembled into full-length genes (for example, using Gibson Assembly). The synthesized genes are cloned into plasmids and finally sequence-verified to confirm their accuracy before use.

  • This DNA synthesis method is easy to use and works well for many projects.However, it can sometimes make mistakes during the process. Parts of DNA that have lots of G and C letters or repeated sequences are harder to make. Very long DNA pieces also need to be built from many shorter fragments, which can be tricky and may cause errors.

DNA Edit:

  • I would want to edit DNA in human cell lines used for drug testing, focusing on genes that affect how drugs work. Changing these genes helps researchers see how different genetic variants influence drug effects and side effects, which is useful in pharmacogenomics.

  • The modification can be realised by CRISPR for editing because it allows precise and programmable changes to DNA. this stratigy works by using a guide RNA to find a specific DNA sequence. The Cas enzyme then makes a cut or nick, and the cell repairs it, introducing the change we want.

  • To use CRISPR, you need to design guide RNAs, prepare the CRISPR components (DNA, RNA, or protein), deliver them into cells, and then check which cells were correctly edited.

  • However, there are some limitations, like different editing efficiencies depending on cell type, and ethical or regulatory concerns when working with human cells.


Sources:

Week 3 HW: Lab Automation

cover image cover image
Assignment: Python Script for Opentrons Artwork

Assignees for this section MIT/Harvard students Required Committed Listeners Required Your task this week is to Create a Python file to run on an Opentrons liquid handling robot.

  1. Review this week’s recitation and this week’s lab for details on the Opentrons and programming it.
  2. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com.
  3. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept.
  • If you’re a proficient programmer and you’d rather code something mathematical or algorithmic instead of using your GUI coordinates, you may do that instead.
  • Ask for help early!
  • If you are having any trouble with scripting, contact your TAs as soon as possible for help. Do not wait until your scheduled robot time slot or you may not be able to complete this assignment!
  1. If the Python component is proving too problematic even with AI and human assistance, download the full Python script from the GUI website and submit that: Use the download icon pointed to by the red arrow in this diagram.
  2. If you use AI to help complete this homework or lab, document how you used AI and which models made contributions.
  3. Sign up for a robot time slot if you are at MIT/Harvard/Wellesley or at a Node offering Opentrons automation. The Python script you created will be run on the robot to produce your work of art!
  • At MIT/Harvard? Lab times are on Thursday Feb.19 between 10AM and 6PM.
  • At other Nodes? Please coordinate with your Node.
  1. Submit your Python file via this form.

I created two different agar art designs using two Arabic calligraphy styles. For the first design, I used a simple calligraphy style and created it directly using Python scripting in a Google Colab notebook. For the second design, I used the Opentrons Automation Art interface to design the calligraphy and obtain the coordinates.

cover image cover image

I used the Google Gemini AI tool in Colab to understand the logic of the example Opentrons scripts provided in the lab. It helped me understand how coordinates, loops, and pipetting commands work. I also used Gemini AI to help identify and correct mistakes in my Python script, such as indentation errors. I reviewed the suggestions and edited the final code myself.

Post-Lab Questions

Assignees for this section MIT/Harvard students Required Committed Listeners Required One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely. For this week, we’d like for you to do the following:

  1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
  2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details. While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet. Example 1: You are creating a custom fabric, and want to deposit art onto specific parts that need to be intertwined in odd ways. You can design a 3D printed holder to attach this fabric to it, and be able to deposit bio art on top. Check out the Opentrons 3D Printing Directory. Example 2: You are using the cloud laboratory to screen an array of biosensor constructs that you design, synthesize, and express using cell free protein synthesis.
  3. Echo transfer biosensor constructs and any required cofactors into specified wells.
  4. Bravo stamp in CPFS reagent master mix into all wells of a 96-well / 384-well plate.
  5. Multiflo dispense the CFPS lysate to all wells to start protein expression.
  6. PlateLoc seal the plate.
  7. Inheco incubate the plate at 37°C while the biosensor proteins are synthesized.
  8. XPeel remove the seal.
  9. PHERAstar measure fluorescence to compare biosensor responses.

  1. Featured Article: Automated Assembly of Programmable RNA-Based Sensors
cover image cover image

The research aimed to solve the challenge of rapidly designing and building large libraries of RNA sensors that can “sense” specific viral RNA signatures. These sensors are crucial for diagnostic applications and understanding RNA-protein interactions. The authors focused on the biological validation of these sensors in both in vivo (bacteria) and cell-free systems. cover image cover image

They used the following lab automation:

  • Hardware: Hamilton Microlab STAR liquid-handling workstation.
  • Software: Custom Python scripts integrated with the liquid handler’s control software to manage complex plate layouts and reaction conditions.

The researchers used the automated system as a tool to facilitate:

  • High-Throughput Plasmid Assembly: The authors needed to construct 144 unique plasmids encoding different riboregulator designs. Doing this manually would be prone to pipetting errors and extremely time-consuming.
  • Library Preparation: Automation was used to prepare DNA libraries and reaction mixes for cell-free protein synthesis assays, ensuring consistent reagent volumes across hundreds of samples.
  • Normalization and Dilution: The Hamilton system handled the precise normalization of DNA concentrations across plates, which is critical for accurate comparative screening of sensor performance.

The study successfully identified several high-performing RNA sensors capable of detecting viral targets. The use of automation allowed the team to scale their construction phase by nearly 10-fold compared to manual workflows, enabling them to test a much wider range of biological designs than previously possible. For understanding the content of this artical and which type of Lab automation the authors used in their research , i used the AI tool “SCISPACE”.

  1. Final project Lab Automation:

My final project focuses on developing an in silico model of a lactose-responsive probiotic that produces lactase only when lactose is present. The physical implementation of this model would allow laboratory automation to verify its predicted results through experimental tests. A liquid-handling robot such as Opentrons could be used to prepare a multi-well plate containing a gradient of lactose concentrations. The robot would then inoculate each well with the engineered probiotic strain, and perform timed sampling to measure lactase activity or reporter output. The automated workflow enables scientists to perform systematic and repeatable tests on lactose responses of the genetic circuit. This helps them match their experimental results with their computer-based model. The project currently exists as a computational project which will use automation as a future extension of the project which does not require automation for its current research activities.

Final Project Ideas

Assignees for this section MIT/Harvard students Required Committed Listeners Required As explained in this week’s recitation, add 1-3 slides in your Node’s section of this slide deck with 3 ideas you have for an Individual Final Project. Be sure to put your name, city, and country on your slide!


1st Idea: In-Silico Model of an Engineered Probiotic Producing Lactase in Response to Lactose

  • Problematic:

Many people cannot digest lactose because they lack enough lactase in their intestine. A possible solution is to use engineered probiotics that produce lactase only when lactose is present. Before building such probiotics in the laboratory, it is important to understand how the genetic system would behave. So, without computational modeling, designing these systems requires trial-and-error experiments that are slow and expensive. There is a need for a simple computational model that can predict how a lactose-responsive genetic circuit would control lactase production over time. Image_ref
Lactose_intolerance_distribution_map Lactose_intolerance_distribution_map

  • Objectives:

The project is based on a lactose-responsive genetic cassette, who’s dynamic behavior is modeled as a genetic circuit in silico. The objectives of this project are:

  1. –> To build an in-silico model of a lactose-responsive genetic circuit.
  2. –> To simulate how lactose stimulate lactase production.
  3. –> To study how changing key parameters affects lactose degradation.
  4. –> To explore system behavior completely in silico.
  • Project Description:

The project develops a purely computational model of an engineered probiotic strain. The model is based on a lactose-responsive genetic cassette, whose dynamic behavior is represented in silico as a genetic circuit: – a lacA promoter, operator, and native RBS from Lactococcus lactis for lactose sensing and regulation, – the lacZ gene from Escherichia coli K-12, encoding β-galactosidase (lactase).

I used an AI tool (ChatGPT) to guide me about the repression mechanism I should use, and its response was as follows: To ensure realistic behavior in the model, the lacA promoter includes a native operator, normally repressed by a LacR-like protein in Lactococcus lactis. In the simulation, a repression term is included to prevent unnecessary accumulation of lacZ (lactase) when lactose is absent. Repression Repression The model simulates how the presence of lactose activates the promoter, leading to lactase production, and how this enzyme then degrades lactose over time. No DNA construction or wet-lab experiments are performed. All behavior is represented mathematically and simulated using a computer.

  • Steps to Achieve the Project:
  1. Define simplified biological assumptions (single strain, constant environment).
  2. Represent lactose as the input signal.
  3. Model promoter activation based on lactose concentration.
  4. Model lactase production and degradation over time.
  5. Model lactose degradation by lactase.
  6. Run simulations to observe system behavior.
  7. Change parameters to study different scenarios.
  • Limitations:
  1. The model does not include other gut microbes.
  2. The gut environment is assumed constant.
  3. Results are predictive, not experimentally validated.

2nd Idea:Engineering an E. coli Reporter Strain to Monitor Protein Aging During Heterologous Expression Using a Fluorescent Timer Protein

  • Problematic:

Escherichia coli BL21(DE3) is one of the most widely used hosts for heterologous protein expression in research and biotechnology. Although protein expression levels can be easily measured, there are very limited tools to determine how long the expressed protein molecules have persisted inside the cell. During prolonged induction, proteins may accumulate, age, misfold, or lose functionality, even when expression appears successful. Most current methods detect protein quality only after purification, making optimization of expression conditions slow and inefficient. So, there is a need for a genetically encoded reporter system that can estimate protein aging in living cells during expression. Image_ref

Automation Automation
  • Objectives:

This project is based on a fluorescent timer protein–based reporter system integrated into a heterologous protein expression strain. The objectives are:

  1. –> To engineer a reporter strain capable of estimating protein age in vivo.
  2. –> To use a fluorescent timer protein to distinguish newly synthesized and older proteins.
  3. –> To monitor protein aging during prolonged heterologous expression.
  4. –> To provide a practical tool for optimizing protein expression conditions.
  • Project Description:

The project focuses on the genetic engineering of a protein expression strain of E. coli BL21(DE3). The reporter system is based on a genetic fusion between:

  • a protein of interest (POI) expressed under the T7 promoter, and
  • a fluorescent timer protein whose emission spectrum changes over time after synthesis. The genetic construct consists of:
  • a T7 promoter and ribosome binding site,
  • the gene encoding the protein of interest,
  • a flexible linker sequence,
  • the fluorescent timer protein gene,
  • a transcriptional terminator. Image_ref
Automation Automation

After induction, newly synthesized POI–timer fusion proteins initially emit one fluorescent signal. As time progresses, the timer protein matures and shifts to a second fluorescent signal. The ratio of the two fluorescence signals provides an estimate of the age distribution of the expressed protein population.

I used AI tool (ChatGPT) version to refine questions related to the necessary genetic elements required for T7-based heterologous expression in Escherichia coli BL21(DE3) and to determine the appropriate placement of a fluorescent timer gene for monitoring the age of the expressed protein. Automation Automation

  • Steps to Achieve the Project:
  1. Select a heterologous protein suitable for expression in E. coli.
  2. Design a genetic fusion between the protein of interest and a fluorescent timer protein.
  3. Clone the fusion construct under a T7 promoter into an expression plasmid.
  4. Transform the plasmid into E. coli BL21(DE3).
  5. Induce protein expression using IPTG.
  6. Monitor fluorescence signals over time using appropriate excitation/emission settings.
  7. Calculate fluorescence signal ratios to estimate protein aging.
  8. Compare protein aging under different induction times and expression conditions.
  • Limitations:
  1. Fusion of the timer protein may affect protein folding or function.
  2. Protein damage mechanisms are not directly measured.

3rd Idea:Engineering Houseplants for Atmospheric Carbon Monoxide (CO) Capture

  • Problematic:

Carbon monoxide (CO) is a toxic gas produced by cars, heaters, and incomplete combustion. It is dangerous for humans, especially in indoor environments. Current solutions such as CO detectors can detect the gas but cannot remove it. Some bacteria naturally use CO as an energy source and convert it into carbon dioxide (CO₂). However, common houseplants cannot metabolize CO. If plants could be engineered to convert CO into CO₂, they could act as natural biological air filters. Image_ref Image_ref Image_ref

  • Objectives:

The objectives of this project are:

  1. –> To engineer a houseplant capable of converting carbon monoxide into carbon dioxide.
  2. –> To use microbial genes that naturally perform CO oxidation.
  3. –> To ensure the system works safely in oxygen-rich (indoor) environments.
  4. –> To allow the produced CO₂ to be reused by the plant’s normal photosynthesis.
  5. –> To design a genetically stable and safe indoor plant system.
  • Project Description:

This project engineers a plant to express a bacterial enzyme called carbon monoxide dehydrogenase (CODH). This enzyme converts carbon monoxide (CO) into carbon dioxide (CO₂). The CO₂ produced by this reaction is not wasted. Instead, it enters the plant’s natural photosynthetic pathway (Calvin cycle), where it can be fixed into sugars. The plant therefore detoxifies CO while continuing its normal metabolism. The system is designed to work only when CO is present, to avoid unnecessary energy use.

  • Genetic Elements for construct design:
  1. CO Oxidation Enzymes

The core of the system is the carbon monoxide dehydrogenase (CODH) enzyme, which is responsible for converting carbon monoxide (CO) into carbon dioxide (CO₂). This enzyme is composed of three subunits encoded by the genes coxL, coxM, and coxS. The coxL gene encodes the large catalytic subunit, coxM encodes a subunit involved in electron transfer, and coxS encodes a structural subunit that stabilizes the enzyme complex. These genes originate from Oligotropha carboxidovorans, a bacterium that can oxidize CO in the presence of oxygen, making it suitable for expression in plant cells.

  1. Promoter (Gene Expression Control)

To drive the expression of the CODH genes in plant cells, the CaMV 35S promoter is used. This promoter originates from the Cauliflower mosaic virus and is one of the most widely used promoters in plant biotechnology. It enables strong and constitutive gene expression across many plant tissues and is well characterized, making it a reliable choice for this project.

  1. Subcellular Targeting Signal

A chloroplast transit peptide is included to ensure that the CODH proteins are transported into the chloroplast after synthesis. This targeting signal is derived from the small subunit of the plant enzyme Rubisco, which naturally localizes to the chloroplast. By directing the CODH enzymes to the chloroplast, the CO₂ produced from CO oxidation is generated close to the photosynthetic machinery, allowing it to be efficiently reused by the plant during photosynthesis.

  1. Transcription Terminator

The NOS terminator is used to ensure proper termination of transcription and stable gene expression. This terminator originates from Agrobacterium tumefaciens and is commonly used in plant genetic constructs. Its function is to signal the end of transcription, improving mRNA stability and ensuring reliable expression of the introduced genes.

  • Steps to Achieve the Project:
  1. Select CO-oxidation genes from aerobic bacteria.
  2. Adapt bacterial gene sequences for plant expression (codon optimization).
  3. Design a plant expression construct containing:
  • Plant promoter
  • CODH genes
  • Chloroplast targeting signal
  • Transcription terminator Image_ref Automation Automation
  1. Introduce the construct into the plant genome.
  2. Confirm expression of CODH proteins in plant cells.
  3. Evaluate CO removal and plant health in controlled conditions.
  4. Assess whether produced CO₂ supports normal photosynthesis.
  • Limitations:
  1. Plant genetic engineering is slow and complex.
  2. CO uptake by plants may be limited.
  3. CO metabolism efficiency may be low.

Sources:

week 04 hw: protein design-part-I

cover image cover image
Part A. Conceptual Questions

Assignees for this section MIT/Harvard students Required Committed Listeners Required Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
  3. Why are there only 20 natural amino acids?
  4. Can you make other non-natural amino acids? Design some new amino acids.
  5. Where did amino acids come from before enzymes that make them, and before life started?
  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
  7. Can you discover additional helices in proteins?
  8. Why are most molecular helices right-handed?
  9. Why do β-sheets tend to aggregate?
  • What is the driving force for β-sheet aggregation?
  1. Why do many amyloid diseases form β-sheets?
  • Can you use amyloid β-sheets as materials?
  1. Design a β-sheet motif that forms a well-ordered structure.

  1. Amino Acid Count in 500g Meat: Meat is roughly 20% protein by mass. (Human Nutrition - Protein, Vitamins, Minerals | Britannica, n.d.)
    • 500g meat x 0.20 = 100g protein.
    • Using an average mass of 100 Daltons (Da) per amino acid: 100g / 100 Daltons (or g/mol) = 1 moles of amino acids
    • 1x 6.022 x 1023 = 6.022 x 1023 molecules /1 mole.
  2. Why we don’t become cows: When we eat protein, our digestive system breaks it down into individual amino acids. Our body then uses its own DNA information to reassemble those amino acids into human proteins. The information which is coded by the sequence of AA is destroyed, but the building blocks or AA are reused.
  3. Why only 20 amino acids: In nature, the use of 20 amino acids is often explained as a “frozen accident” that originated in the early RNA World. This set worked well very early in Earth’s history and then became fixed. These 20 amino acids were good enough to build strong and functional proteins. Even though many other amino acids exist, this small group provides enough variety to perform many functions while remaining simple, stable, and efficient for cells to use. (Doig, 2017)
  4. Non-natural amino acids: Yes, scientists can make non-natural (unnatural) amino acids. They do this using chemical methods and special genetic tools that allow new amino acids to be added to proteins. These new amino acids can give proteins new properties that natural amino acids do not have. (Young & Schultz, 2010) For example, A new amino acid could be made by taking a normal amino acid, like alanine, and adding a fluorine atom to its side chain. This fluorinated amino acid would make proteins more stable and less likely to break down, which is useful for drug design. (Adhikari et al., n.d.)
  5. Pre-life origins of amino acids: According to Gutiérrez-Preciado, Romero, and Peimbert (2010) Before enzymes and living organisms existed, amino acids were probably formed naturally on early Earth. Energy from lightning, UV light, and volcanic heat helped simple gases react to make amino acids. Some amino acids were also brought to Earth by meteorites and comets. Together, these processes created a “primordial soup” of basic organic molecules. (Amino Acids, Evolution | Learn Science at Scitable, n.d.)
  6. D-amino acid α-helix: In nature, L-amino acids form right-handed helices. If you used only D-amino acids, the stereochemistry would be mirrored, resulting in a left-handed $\alpha$-helix. (Zotti et al., n.d.)
  7. Additional helices: Yes, additional helical structures besides the standard α-helix can be found in proteins. Studies show that other types of helices occur in many proteins, but they are often overlooked or mistaken for small distortions in α-helices. These helices are especially common in membrane proteins and are found in a significant number of known protein structures.(Vieira-Pires & Morais-Cabral, 2010)
  8. Why right-handed helices: because this shape is the most stable for the natural building blocks of life. L-amino acids and D-sugars fit together best in a right-handed twist, which allows strong hydrogen bonds and reduces crowding between atoms. Left-handed helices are usually less stable or hard to form. (Right-Handed Alpha-Helix - an Overview | ScienceDirect Topics, n.d.)
  9. β -sheet aggregation: β-sheets tend to aggregate because their edges have exposed hydrogen-bonding groups that easily stick to other β-strands. The main driving forces are hydrogen bonding between strands and the hydrophobic effect, which together make the stacked β-sheet structure very stable and allow fibrils to form.(Gsponer & Vendruscolo, 2006)
Part B: Protein Analysis and Visualization

Assignees for this section MIT/Harvard students Required Committed Listeners Required In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:

  1. Briefly describe the protein you selected and why you selected it.
  2. Identify the amino acid sequence of your protein.
    • How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
    • How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.
    • Does your protein belong to any protein family?
  3. Identify the structure page of your protein in RCSB
    • When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)
    • Are there any other molecules in the solved structure apart from protein?
    • Does your protein belong to any structure classification family?
  4. Open the structure of your protein in any 3D molecule visualization software:
    • PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
    • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
    • Color the protein by secondary structure. Does it have more helices or sheets?
    • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
    • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Part C. Using ML-Based Protein Design Tools

Assignees for this section MIT/Harvard students Required Committed Listeners Required In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

  1. Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.
  2. Choose your favorite protein from the PDB.
  3. We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

  1. Deep Mutational Scans
  • a. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
  • b. Can you explain any particular pattern? (choose a residue and a mutation that stands out)
  • c. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
  1. Latent Space Analysis
  • a. Use the provided sequence dataset to embed proteins in reduced dimensionality.
  • b. Analyze the different formed neighborhoods: do they approximate similar proteins?
  • c. Place your protein in the resulting map and explain its position and similarity to its neighbors.

C2. Protein Folding

  1. Folding a protein
  2. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
  3. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

C3. Protein Generation

  • Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN
  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
  2. Input this sequence into ESMFold and compare the predicted structure to your original.

Part D. Group Brainstorm on Bacteriophage Engineering

Assignees for this section MIT/Harvard students Required Committed Listeners Required

  1. Find a group of ~3–4 students
  2. Read through the Phage Reading material listed under “Reading & Resources” below
  3. Review the Bacteriophage Final Project Goals for engineering the L Protein:
  • Increased stability (easiest)
  • Higher titers (medium)
  • Higher toxicity of lysis protein (hard)
  1. Brainstorm Session
  • Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).
  • Write a 1-page proposal (bullet points or short paragraphs) describing:
    • Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).
    • Why do you think those tools might help solve your chosen sub-problem?
    • Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).
    • Include a schematic of your pipeline.
  • This resource may be useful: HTGAA Protein Engineering Tools
  1. Each individually put your plan on your HTGAA website
  • Include your group’s short plan for engineering a bacteriophage

Subsections of Labs

Week 1 Lab: Pipetting

cover image cover image

Subsections of Projects

Individual Final Project

cover image cover image

Group Final Project

cover image cover image