Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    At the core of the project is the development of an improved bioprinter designed for two-color bacterial printing using two strains of Escherichia coli: a non-pathogenic, non-modified strain and a genetically transformed strain carrying a plasmid encoding the expression of a color marker. This approach enables the creation of visually distinguishable bacterial images and expands both the artistic and research potential of bioprinting.

  • Week 2 HW: DNA read, write and edit

    Part 1: Benchling & In-silico Gel Art. Simulation of restriction enzyme digestion of the Lambda genome in Benchling. For the gel art pattern I used KpnI, SacI, and SalI restriction enzyme digests. The result is a wave pattern.

  • Week 3 HW: Lab Automation

    Python Script for Opentrons Artwork. My Python script draws my design using the Opentrons. I used AI to add my color parameters. I made a drawing of Totoro, the titular forest spirit from the 1988 Studio Ghibli animated film My Neighbor Totoro, written and directed by Hayao Miyazaki. On Ronan’s website:

  • Week 4 HW: Protein design part I

    Part A. Conceptual Questions 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Assumptions: Lean meat (e.g., beef, chicken, fish) contains approximately 20% protein by weight (the remainder is water, fat, minerals, and glycogen). 500 g meat × 0.20 = 100 g of protein. Proteins are digested into individual amino acid monomers. Average molar mass of an amino acid residue ≈ 100 g/mol (as stated). Approximately 6 × 10²³ amino acid molecules are obtained from 500 g of meat.

  • Week 5 HW: Protein design part II

    Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM the human SOD1 sequence (P00441): MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ the A4V mutant SOD1 sequence: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence with the known SOD1-binding peptide FLYRWLPSRRGG for comparison:

  • Week 6 HW: Genetic circuits part I

    Assignment: DNA Assembly 1. Components of Phusion High-Fidelity PCR Master Mix and their functions The Phusion High-Fidelity PCR Master Mix typically contains: Phusion High-Fidelity DNA Polymerase A thermostable DNA polymerase with 3’→5’ exonuclease proofreading activity, which significantly reduces the error rate during DNA amplification.

  • Week 7 HW: Genetic circuits part II

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) 1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Intracellular Artificial Neural Networks (IANNs) provide several important advantages over traditional genetic circuits that operate using Boolean logic gates such as AND, OR, and NOT. Traditional genetic circuits typically generate binary outputs, where genes are either “ON” or “OFF.” In contrast, IANNs can process information in a continuous and weighted manner, similar to artificial neural networks used in computational machine learning.

  • Week 9 HW: Cell-free-systems

    Homework Part A: General and Lecturer-Specific Questions General homework questions 1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Flexibility:

  • Week 10 HW: Imaging and measurement

    Homework: Final Project 1. Conjugation Frequency (Quantitative HGT Efficiency) What is measured: The ratio of transconjugants to total recipient cells — the primary numerical output of the experiment. After overnight incubation on LB agar, cells are re-printed onto: Plate A (no antibiotic) → counts all recipient colonies (pink) + donor (blue) Plate B (+ ampicillin 100 µg/mL) → only AmpR transconjugants survive in recipient zones Conjugation frequency = N(blue AmpR in recipient zone) / N(total recipient on Plate A)

  • Week 11 HW: Bioproduction & Cloud Labs

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork I contributed one pixel on the Q3 - H1 plate with the mKO2 fluorescent protein, but it was later overlapped by other contributions. My involvement in the artwork was limited to placing a single pixel, which I used primarily to familiarize myself with the interface. I also initially assumed that the canvas had a limited number of available pixels relative to the large number of course participants. Nevertheless, I found the concept of a collaborative artwork compelling, and its implementation through an interactive website was thoughtfully designed and inspiring. I also found the timelapse feature particularly valuable, as it effectively illustrated both the temporal evolution of the image and the conceptual development of the artwork over time.

Subsections of Homework

Week 1 HW: Principles and Practices


At the core of the project is the development of an improved bioprinter designed for two-color bacterial printing using two strains of Escherichia coli: a non-pathogenic, non-modified strain and a genetically transformed strain carrying a plasmid encoding the expression of a color marker. This approach enables the creation of visually distinguishable bacterial images and expands both the artistic and research potential of bioprinting.

The printing process is organized according to a principle conceptually similar to offset printing, where different “layers” or channels correspond to different bacterial suspensions. This makes it possible to consider the bioprinter both as a bioengineering tool and as an experimental platform for rethinking printing technologies. An important artistic dimension lies in the reinterpretation of bacteriological photography. What if the documentation of bacterial growth were actively constructed as a time-based composition, where biological processes function as both medium and subject? The project is grounded in research into authorship, temporality, and the limits of working with living matter.

The project will be integrated into the educational framework of the University and will be oriented toward an open and interdisciplinary format, as the developed technology, DIY solutions, and methodological approaches are intended to be used by students, researchers, artists, and participants in citizen science.

The project’s primary goal is to foster a constructive user community, uniting students, researchers, artists, and citizen science participants through an arts & science approach. The focus is on democratizing and improving bioprinting, providing accessible tools and documentation, and integrating the project into educational courses and workshops, alongside artistic contexts. This approach promotes education and engagement among a broad audience, fosters a culture of responsible use of living and genetically modified objects, and supports interdisciplinary interactions between science and art.

Purpose: In the local context, bioprinting training courses are limited to single-color cultures, and access to the technology for a wide range of participants is limited. This project proposes two-color bacterial printing using two E. coli strains (non-pathogenic and genetically modified with a colored marker), organized according to principles similar to offset printing. This will improve bioprinting technology and create visually distinguishable images. This allows for reproducible experiments and enables artistic interpretation of bacterial photography as a research phenomenon. The goal is to expand the educational, scientific, and artistic possibilities of bioprinting and make the technology accessible to students, artists, and citizen science participants.

Design:

  • DIY bioprinter working with two types of suspensions
  • Genetically modified and non-pathogenic E. coli strains, safe for use at the BSL-1 educational level
  • Safety and disposal protocols for working with live cultures
  • Software and digital data recording to control printing steps, seeding coordinates, and subsequent image analysis
  • Stakeholders: Faculty, laboratories, students, artists, and citizen science participants who must agree to and adhere to safety and ethical standards
  • Funding for materials, assembly of DIY devices, and organization of courses/workshops.

Assumptions: Participants are expected to follow safety protocols and not to modify strains outside of an educational context or without properly established sterile conditions. The artistic aspect (for example, bacterial photography) may sometimes be perceived as science visualization, but the aim is to integrate science, technology, and art through educational and research tools. Creating a DIY device, working with equipment and living objects/subjects, and running educational courses also requires funding, which can be a considerable challenge.

Risks of Failure & “Success”: Biological variability in strains can make two-color printing less reproducible. There’s also the possibility of contamination by other bacteria, as well as changes within the bacteria themselves. Technical failures in a DIY bioprinter or software can compromise the accuracy and repeatability of experiments. At the same time, natural biological variability is interesting from both a scientific and artistic perspective. Research into how to improve a DIY bioprinter or learn more about a living subject offers educational and creative value.

table table

The most desirable actions are those aimed at broad audience engagement and democratization of technology. These activities have the greatest impact on the project’s success while simultaneously supporting the goals of citizen science, education, art, and science.


Homework Questions from Professor Jacobson:

  1. 1:10⁶. The error rate decreases from 1:10⁶ to 1:10⁹ once errors are corrected. The human genome is about 3 × 10⁹ bp long, so at 1:10⁶ there would be about 3 thousand errors per genome copy. Uncorrected errors lead to irreversible mutations (base substitutions, insertions, or deletions). This affects the stability of the genome, causing hereditary diseases, cancer (oncogenic potential), and cellular aging. There are DNA repair systems: MutS, MutH, and MutL in prokaryotes, MSH and MLH in eukaryotes.
  2. For the average human protein, consisting of approximately 300-400 amino acids, there is a colossal number of possible nucleotide (DNA) coding variants. Due to the degeneracy of the genetic code (64 codons for 20 amino acids), a single amino acid sequence can be encoded by 10⁵⁰–10¹⁰⁰ or more different DNA sequences. In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest? Some variants are transcribed, spliced, or translated incorrectly, so the protein is altered and its function is lost. Certain sequence features are conserved because they provide mRNA and/or folding stability, carry functional patterns, mark exon/intron boundaries, or define the start and termination of translation. So some sequences will not give a chemically stable, functional, or correctly translated protein.
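
The two estimates above can be checked with a short calculation. This is a minimal sketch: the genome size of 3.2 × 10⁹ bp, the example peptide, and the function names are assumptions added for illustration, while the codon counts follow the standard genetic code.

```python
import math

# Number of codons encoding each amino acid in the standard genetic code
# (61 sense codons in total; stop codons are not included).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of copying errors per genome replication."""
    return genome_bp * error_rate


def log10_synonymous_variants(protein: str) -> float:
    """log10 of the number of DNA sequences that encode the same protein."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein.upper())


if __name__ == "__main__":
    genome_bp = 3.2e9  # assumed approximate size of the human genome
    print("errors per copy at 1:10^6:", expected_errors(genome_bp, 1e-6))
    print("errors per copy at 1:10^9:", expected_errors(genome_bp, 1e-9))
    # Hypothetical short peptide, used only to show the call.
    print("log10(variants):", log10_synonymous_variants("MADDRKVALDAALKK"))
```

Working in log10 avoids huge integers and makes the result directly comparable to the 10⁵⁰–10¹⁰⁰ range quoted in the answer.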

Homework Questions from Dr. LeProust:

  1. Next Generation (Chip Based) Oligo Nucleotide Synthesis.
  2. Yield decreases with each additional synthesis step, fidelity drops as errors accumulate, and hairpins, dimers, and clogs can form.
  3. Direct oligo synthesis adds bases to the chain one step at a time. With this technology, the yield of the full-length product decreases exponentially with each added base. Even if an exact 2000 bp oligo were synthesized, it would be hard to purify it from, for instance, a 1990 bp oligo by gel electrophoresis.
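
The exponential drop in full-length yield can be illustrated with a simple model in which every coupling step succeeds with the same probability. This is a sketch under that assumption; the 99.5% coupling efficiency is a hypothetical value chosen for illustration, not a figure from the lecture.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains that reach full length after stepwise synthesis.

    A chain of length_nt bases needs (length_nt - 1) coupling steps,
    each assumed to succeed independently with the same probability.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    efficiency = 0.995  # hypothetical per-step coupling efficiency
    for n in (20, 200, 2000):
        print(f"{n} nt: {full_length_yield(n, efficiency):.2e}")
```

Under this model a 200-mer still gives a usable fraction of full-length product, while a 2000-mer gives almost none, which is why long constructs are assembled from shorter fragments.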

Homework Question from George Church:

  1. The essential amino acids are 10 organic compounds (valine, leucine, isoleucine, lysine, methionine, threonine, tryptophan, phenylalanine, histidine, arginine) that animals cannot synthesize in sufficient amounts and must obtain from food for muscle growth, immunity, and metabolism (in humans, arginine is only conditionally essential). There is also pyrrolysine, which occurs only in some organisms. The Lysine Contingency in the movie “Jurassic Park” was presented as an “engineered” inability of the dinosaurs to produce lysine, aimed at tying them to the park territory, where they could get the needed supplements.

Week 2 HW: DNA read, write and edit

Part 1: Benchling & In-silico Gel Art

The process of simulating restriction enzyme digestion of the Lambda genome in Benchling:

Final result:

For the gel art pattern I used KpnI, SacI, and SalI restriction enzyme digests. The result is a wave pattern.

Part 3: DNA Design Challenge

I chose the RecA protein from Enterococcus faecalis. I chose this protein because I am interested in horizontal gene transfer in bacteria. RecA is one of the key proteins involved in homologous recombination and DNA repair, which are important processes during horizontal gene transfer and bacterial adaptation. Studying RecA helps understand how bacteria exchange genetic material and acquire new traits such as antibiotic resistance.

3.1. Choose your protein

The protein sequence was obtained from UniProt. Protein: RecA from Enterococcus faecalis

sp|P42444|RECA_ENTFA Protein RecA OS=Enterococcus faecalis (strain ATCC 700802 / V583) OX=226185 GN=recA PE=3 SV=2

MADDRKVALDAALKKIEKNFGKGSIMKLGEKADQKISTIPSGSLALDVALGVGGYPRGRI IEVYGPESSGKTTVSLHAIAEVQRNGGTAAFIDAEHALDPQYAEKLGVNIDELLLSQPDT GEQGLEIADALVSSGAIDIVVIDSVAALVPRAEIDGEMGASHVGLQARLMSQALRKLSGS INKTKTIAIFINQIREKVGVMFGNPETTPGGRALKFYATVRLEVRRAEQLKQGTDIVGNR TKIKVVKNKVAPPFKVAEVDIMYGQGISQEGELLDMAVEKDLISKSGAWYGYKEERIGQG RENAKQYMADHPEMMAEVSKLVRDAYGIGDGSTITEEAEGQEELPLDE

3.2. Reverse Translation: Protein Sequence to DNA Sequence

To obtain the nucleotide sequence corresponding to the RecA protein, I used the reverse translation tool from Sequence Manipulation Suite. Reverse translation converts an amino acid sequence into a possible DNA sequence based on the genetic code.

RecA protein DNA sequence

atggcggatgatcgcaaagtggcgctggatgcggcgctgaaaaaaattgaaaaaaactttggcaaaggcagcattatgaaactgggcgaaaaagcggatcagaaaattagcaccattccgagcggcagcctggcgctggatgtggcgctgggcgtgggcggctatccgcgcggccgcattattgaagtgtatggcccggaaagcagcggcaaaaccaccgtgagcctgcatgcgattgcggaagtgcagcgcaacggcggcaccgcggcgtttattgatgcggaacatgcgctggatccgcagtatgcggaaaaactgggcgtgaacattgatgaactgctgctgagccagccggataccggcgaacagggcctggaaattgcggatgcgctggtgagcagcggcgcgattgatattgtggtgattgatagcgtggcggcgctggtgccgcgcgcggaaattgatggcgaaatgggcgcgagccatgtgggcctgcaggcgcgcctgatgagccaggcgctgcgcaaactgagcggcagcattaacaaaaccaaaaccattgcgatttttattaaccagattcgcgaaaaagtgggcgtgatgtttggcaacccggaaaccaccccgggcggccgcgcgctgaaattttatgcgaccgtgcgcctggaagtgcgccgcgcggaacagctgaaacagggcaccgatattgtgggcaaccgcaccaaaattaaagtggtgaaaaacaaagtggcgccgccgtttaaagtggcggaagtggatattatgtatggccagggcattagccaggaaggcgaactgctggatatggcggtggaaaaagatctgattagcaaaagcggcgcgtggtatggctataaagaagaacgcattggccagggccgcgaaaacgcgaaacagtatatggcggatcatccggaaatgatggcggaagtgagcaaactggtgcgcgatgcgtatggcattggcgatggcagcaccattaccgaagaagcggaaggccaggaagaactgccgctggatgaa
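
Reverse translation with a single fixed codon per amino acid can be sketched as follows. The codon table is an assumption made for illustration (one commonly used E. coli codon per amino acid), and the function name is hypothetical; the Sequence Manipulation Suite may choose codons differently.

```python
# One codon per amino acid. Assumption for illustration: each entry is a
# codon that is frequently used in E. coli; real tools may pick others.
CODON_FOR_AA = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}


def reverse_translate(protein: str) -> str:
    """Convert an amino acid sequence into one possible DNA sequence."""
    # Remove spaces and line breaks that appear in FASTA-style text.
    cleaned = "".join(protein.split()).upper()
    return "".join(CODON_FOR_AA[aa] for aa in cleaned)


if __name__ == "__main__":
    # First residues of the RecA sequence shown above.
    print(reverse_translate("MADDRKVALD AALKK"))
```

Because many amino acids have several codons, this is only one of the possible DNA sequences; the result always has three nucleotides per residue.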

3.3. Codon Optimization

Codon optimization is important because different organisms prefer different codons to encode the same amino acid. Although the genetic code is nearly universal, some codons are used more frequently in certain species. Using preferred codons improves translation efficiency, increases protein expression, and can improve protein stability.

I chose to optimize the RecA gene for Escherichia coli because E. coli is one of the most commonly used organisms for recombinant protein production. It grows quickly, is inexpensive to culture, and has many well-established molecular biology tools available. Since RecA is a bacterial protein and is not highly toxic, expression in E. coli is relatively straightforward.

For protein expression, I would use the plasmid pET-28a(+). To insert the optimized RecA gene into the plasmid, restriction enzymes can be used. A suitable pair of restriction enzymes would be NdeI and XhoI. These enzymes are commonly used with pET-28a(+) because their recognition sites are present in the plasmid’s multiple cloning site and allow directional cloning of the gene insert. Directional cloning ensures that the gene is inserted in the correct orientation for expression. I used the Codon Optimization Tool from VectorBuilder.
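
Directional cloning with NdeI and XhoI only works cleanly if neither recognition site occurs inside the insert itself. A quick check might look like the sketch below; the function name and the example sequence are hypothetical, and only the two recognition sequences (CATATG and CTCGAG) are taken as given.

```python
# Recognition sequences of the two enzymes. Both are palindromic,
# so searching one strand is enough.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return 0-based start positions of each recognition site in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # allow overlapping matches
        hits[name] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence containing one site for each enzyme.
    print(find_sites("AAACATATGCCCCTCGAGTTT"))
```

An empty list for both enzymes means the coding sequence can be flanked with these sites without being cut internally.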

Improved DNA: GC=54.79%, CAI=0.94

ATGGCGGATGATCGCAAAGTGGCCCTTGATGCGGCGCTGAAAAAAATCGAAAAAAATTTTGGCAAAGGCAGCATTATGAAACTGGGCGAAAAAGCGGATCAGAAAATTAGCACCATTCCGTCGGGCAGCCTGGCCCTGGATGTGGCGCTGGGCGTGGGTGGCTATCCGCGTGGCCGCATTATTGAAGTGTATGGCCCGGAAAGCAGCGGCAAAACCACCGTGAGCCTGCATGCAATTGCGGAAGTCCAGCGCAACGGCGGCACCGCGGCCTTTATTGATGCGGAACATGCGCTGGATCCGCAGTATGCGGAAAAACTGGGTGTGAACATTGATGAACTGCTGCTGAGCCAGCCGGATACCGGCGAACAGGGCCTGGAAATTGCGGATGCCCTGGTGAGCTCAGGCGCGATTGATATTGTTGTTATTGACAGTGTGGCGGCCCTGGTGCCGCGCGCCGAGATTGATGGCGAAATGGGCGCAAGCCACGTGGGCCTGCAGGCGCGCCTGATGAGCCAGGCGCTGCGCAAACTGTCAGGCTCGATTAACAAAACCAAAACCATCGCAATTTTTATTAACCAGATTCGTGAAAAAGTGGGCGTGATGTTTGGCAATCCGGAAACCACCCCGGGCGGCCGCGCCCTGAAATTTTATGCGACCGTGCGTCTGGAAGTGCGCCGTGCGGAACAGCTGAAACAGGGCACCGATATTGTGGGCAATCGCACCAAAATTAAAGTGGTGAAAAACAAAGTTGCCCCGCCGTTTAAAGTGGCGGAAGTGGATATTATGTATGGCCAGGGCATTTCGCAGGAAGGCGAACTGCTGGATATGGCGGTGGAAAAAGATCTGATTTCGAAAAGCGGCGCGTGGTATGGCTACAAAGAAGAACGTATTGGCCAGGGCCGTGAAAACGCGAAACAGTACATGGCCGATCATCCGGAAATGATGGCCGAAGTGAGCAAACTGGTTCGTGATGCCTACGGTATCGGCGATGGCAGCACCATCACCGAAGAAGCGGAAGGCCAGGAAGAACTGCCGCTGGATGAA
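
The GC value reported by the optimization tool can be cross-checked by counting bases directly. This is a minimal sketch with a hypothetical function name; CAI is not computed here because it requires a reference codon usage table.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = "".join(seq.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    # First 24 bases of the optimized sequence above, used only as an example.
    print(f"{gc_content('ATGGCGGATGATCGCAAAGTGGCC'):.2f}%")
```

Pasting the full optimized sequence into `gc_content` gives a value to compare with the GC figure reported by the tool.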

3.4. You have a sequence! Now what?

After codon optimization, the DNA sequence encoding RecA can be synthesized and inserted into the pET-28a(+) plasmid. The recombinant plasmid is then introduced into Escherichia coli cells through transformation. Because the pET-28a(+) vector can add a His-tag to the protein, the expressed RecA protein could later be purified using nickel affinity chromatography.

Expression in E. coli is the most practical approach because bacterial systems are simple, cost-effective, and highly optimized for this type of protein production, and because RecA is a non-toxic bacterial protein.

Part 4: Prepare a Twist DNA Synthesis Order

4.2. Build Your DNA Insert Sequence

In Benchling I uploaded the RecA gene sequence annotated with a promoter, RBS, start codon, coding sequence, His tag, stop codon, and terminator.

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATATGGCGGATGATCGCAAAGTGGCCCTTGATGCGGCGCTGAAAAAAATCGAAAAAAATTTTGGCAAAGGCAGCATTATGAAACTGGGCGAAAAAGCGGATCAGAAAATTAGCACCATTCCGTCGGGCAGCCTGGCCCTGGATGTGGCGCTGGGCGTGGGTGGCTATCCGCGTGGCCGCATTATTGAAGTGTATGGCCCGGAAAGCAGCGGCAAAACCACCGTGAGCCTGCATGCAATTGCGGAAGTCCAGCGCAACGGCGGCACCGCGGCCTTTATTGATGCGGAACATGCGCTGGATCCGCAGTATGCGGAAAAACTGGGTGTGAACATTGATGAACTGCTGCTGAGCCAGCCGGATACCGGCGAACAGGGCCTGGAAATTGCGGATGCCCTGGTGAGCTCAGGCGCGATTGATATTGTTGTTATTGACAGTGTGGCGGCCCTGGTGCCGCGCGCCGAGATTGATGGCGAAATGGGCGCAAGCCACGTGGGCCTGCAGGCGCGCCTGATGAGCCAGGCGCTGCGCAAACTGTCAGGCTCGATTAACAAAACCAAAACCATCGCAATTTTTATTAACCAGATTCGTGAAAAAGTGGGCGTGATGTTTGGCAATCCGGAAACCACCCCGGGCGGCCGCGCCCTGAAATTTTATGCGACCGTGCGTCTGGAAGTGCGCCGTGCGGAACAGCTGAAACAGGGCACCGATATTGTGGGCAATCGCACCAAAATTAAAGTGGTGAAAAACAAAGTTGCCCCGCCGTTTAAAGTGGCGGAAGTGGATATTATGTATGGCCAGGGCATTTCGCAGGAAGGCGAACTGCTGGATATGGCGGTGGAAAAAGATCTGATTTCGAAAAGCGGCGCGTGGTATGGCTACAAAGAAGAACGTATTGGCCAGGGCCGTGAAAACGCGAAACAGTACATGGCCGATCATCCGGAAATGATGGCCGAAGTGAGCAAACTGGTTCGTGATGCCTACGGTATCGGCGATGGCAGCACCATCACCGAAGAAGCGGAAGGCCAGGAAGAACTGCCGCTGGATGAACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
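
Before ordering, it can help to confirm that the open reading frame (start codon through stop codon) is still in frame after the His tag is appended. The sketch below is one way to do that. The function name is hypothetical and the coding sequence is truncated to a few codons for illustration; the His tag and stop codon are the ones that appear at the end of the insert above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_orf(orf: str) -> list[str]:
    """Return a list of problems found in an open reading frame."""
    orf = orf.upper()
    problems = []
    if len(orf) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not orf.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems


if __name__ == "__main__":
    cds = "ATGGCGGATGATCGCAAA"         # truncated placeholder for the RecA CDS
    his_tag = "CATCACCATCACCATCATCAC"  # seven histidine codons
    stop = "TAA"
    print(check_orf(cds + his_tag + stop) or "ORF looks consistent")
```

An empty list means the frame is intact; any reported problem points to a part that was pasted out of frame.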


4.3.-4.6. Twist and Benchling

I used pTwist Amp High Copy as a Twist cloning vector because I require constant expression of the RecA gene for my experiment.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

In the context of my project idea, I would sequence DNA from putative transconjugant E. coli colonies that appear blue in the recipient-printed zones on chromogenic agar (Plate B + ampicillin).

Why:

  • Confirm true horizontal gene transfer (conjugation): verify that recipient cells acquired the mobilizable plasmid carrying oriT_RK2–Ptrc–RBS–bglA, rather than the blue phenotype being caused by donor contamination, spontaneous mutations, or unexpected chromogenic substrate metabolism.
  • Verify construct integrity after transfer: check whether bglA, Ptrc, BBa_B0034, and oriT_RK2 are intact (no deletions/rearrangements), since plasmids can mutate or recombine.
  • Map spatial HGT patterns: sequence colonies sampled from different printed coordinates (e.g., overlap boundary vs center) to see whether transfer frequency or plasmid variants correlate with spatial proximity and incubation conditions.

(ii) What technology or technologies would you use to perform sequencing on your DNA and why?

I would use a tiered approach:

  1. Sanger sequencing (targeted validation)

    • Fast and cost-effective for confirming a limited number of colonies and key regions (bglA + junctions).
    • High per-base accuracy and straightforward interpretation.
  2. Illumina short-read sequencing (scaled, high-throughput)

    • Appropriate when sequencing many colonies across multiple printed locations.
    • High accuracy and depth for detecting low-frequency variants.
  3. Oxford Nanopore long-read sequencing

    • Useful for reading the entire plasmid in one/few long reads and detecting structural rearrangements.

Is your method first-, second- or third-generation or other? How so?

  • Sanger sequencing: first-generation (chain-termination; low throughput; ~700–900 bp reads with high accuracy).
  • Illumina: second-generation (massively parallel sequencing-by-synthesis with cluster amplification; short reads).
  • Oxford Nanopore: third-generation (single-molecule, real-time, long reads; PCR not mandatory).

What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

Input:

  • Colonies from chromogenic plates:
    • Putative transconjugants: blue colonies in recipient zones on Plate B (+ ampicillin)
    • Controls: donor strain, original recipient strain, and ideally a “no-helper” negative control mating

Essential preparation steps (by method):

A) Sanger (PCR + sequencing)

  1. Pick colony → short liquid culture (or colony PCR).
  2. DNA extraction (plasmid miniprep or crude lysate).
  3. PCR amplify:
    • bglA internal region (presence/identity)
    • junctions (e.g., oriT→Ptrc, Ptrc→RBS, RBS→bglA) to confirm correct order/integrity
  4. PCR cleanup.
  5. Sanger sequencing.

B) Illumina (amplicon-seq or plasmid-enriched sequencing)

  1. DNA extraction (preferably plasmid-enriched if focusing on the construct).
  2. Either:
    • PCR amplify target regions (amplicon sequencing), or
    • Fragment DNA for whole-plasmid/whole-genome sequencing.
  3. End repair / A-tailing (kit-dependent).
  4. Adapter ligation + indexing (barcodes).
  5. PCR enrichment (often used).
  6. Sequencing.

C) Nanopore (whole plasmid)

  1. Extract high-quality DNA with minimal shearing.
  2. End repair (kit-dependent).
  3. Ligate nanopore adapters with motor protein.
  4. Load onto flow cell.

What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

  • Sanger: ddNTP incorporation terminates extension; fragments are separated by capillary electrophoresis; fluorescence color determines the terminal base.
  • Illumina: sequencing-by-synthesis with fluorescent reversible terminators; imaging each cycle; base calling from image intensities.
  • Nanopore: DNA passes through a pore and modulates ionic current; neural-network base calling converts current traces (“squiggles”) into bases.

What is the output of your chosen sequencing technology?

  • Sanger: chromatograms (.ab1) + base-called sequences for each amplicon.
  • Illumina: FASTQ short reads (e.g., 2×150 bp) with Phred quality scores.
  • Nanopore: FASTQ long reads (plus raw signal files such as POD5/FAST5, workflow-dependent).
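
The Phred quality scores in a FASTQ file are stored as ASCII characters. A minimal sketch of decoding them is shown below, assuming the Phred+33 encoding used by current Illumina and Nanopore FASTQ output; the function names and the example quality string are hypothetical.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    quality_line = "IIII5555!!"  # hypothetical quality string
    for q in phred_scores(quality_line):
        print(q, error_probability(q))
```

A score of 40 corresponds to one expected error in 10,000 base calls, and a score of 0 carries no confidence at all.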

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I would synthesize the mobilizable bglA expression cassette that generates the blue phenotype after conjugation:

  • oriT_RK2 (mobilization origin; enables plasmid transfer using RK2 machinery supplied by pRK2013 in trans)
  • Ptrc promoter (strong E. coli-compatible promoter for robust expression)
  • BioBrick RBS BBa_B0034 (efficient translation initiation)
  • bglA CDS from Enterococcus faecalis (β-glucosidase intended to produce a blue chromogenic phenotype)

Why: this DNA encodes the key “sensor output” of the system: recipient cells become blue only after acquiring bglA via HGT, enabling spatial visualization of conjugation.


(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use:

  1. Phosphoramidite-based chemical DNA synthesis (commercial gene synthesis / oligo pools)
  2. Assembly methods such as Gibson Assembly or Golden Gate Assembly

Why:

  • Phosphoramidite chemistry is the standard for accurate, scalable oligo/gene-fragment synthesis.
  • Gibson/Golden Gate support modular assembly of oriT, promoter, RBS, and CDS, and enable rapid variant construction.

1) What are the essential steps of your chosen synthesis methods?

  1. In silico design (Benchling):
    • verify ORF, start/stop, and reading frame of bglA
    • codon-optimize for E. coli if needed
    • check synthesis constraints (repeats, extreme GC, hairpins)
    • add overlaps or Type IIS sites for assembly
  2. Oligo/gene fragment synthesis (phosphoramidite cycles).
  3. Assembly (Gibson/Golden Gate) into the plasmid backbone.
  4. Transformation into E. coli and clone picking.
  5. Sequence verification (Sanger/Illumina) of bglA and junctions.
  6. Functional validation on chromogenic agar.

2) What are the limitations of your synthesis method (if any) in terms of speed, accuracy, scalability?

  • Length limits: single oligos are short; longer constructs require multi-fragment assembly.
  • Error accumulation: synthesis errors accumulate with length; verification and clone screening are required.
  • Sequence-dependent difficulty: repeats, strong secondary structures, and extreme GC can reduce synthesis/assembly success.
  • Speed: outsourcing is fast but still usually days–weeks including cloning and verification.
  • Scalability bottleneck: generating variants is feasible; the limiting step becomes screening + validation (though the color phenotype helps throughput).

5.3 DNA Edit

(i) What DNA would you want to edit and why?

To improve the robustness and interpretability of the HGT readout, I would edit DNA in the recipient strain and/or the plasmid system.

Examples of useful edits:

  • Recipient genome edits to reduce background coloration or off-pathway metabolism on chromogenic media, making blue strictly dependent on bglA acquisition and expression.
  • Regulatory tuning edits (promoter/RBS fine-tuning) to calibrate the onset and intensity of the blue signal and improve spatial resolution.
  • Plasmid stability edits (e.g., removing recombinogenic sequences or adding stability features) to reduce plasmid loss and signal variability.

Why: the project’s readout is phenotypic (color), so improving specificity and stability directly strengthens conclusions about HGT frequency and spatial dynamics.


(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use CRISPR-based editing in E. coli:

  • CRISPR-Cas9 + recombineering (λ-Red) + donor DNA for precise insertions/replacements, and/or
  • Base editors for targeted point mutations in promoters/RBS without introducing double-strand breaks.

Why:

  • Cas9 + recombineering is well established for precise bacterial genome engineering.
  • Base editing is ideal for fine-tuning regulatory elements controlling bglA expression.

How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 + donor DNA (with recombineering):

  1. Design gRNA targeting the locus of interest (adjacent to a PAM).
  2. Deliver Cas9 + gRNA (typically via plasmid expression).
  3. Provide a donor DNA template with desired edits flanked by homology arms.
  4. Cas9 creates a double-strand break at the target site.
  5. λ-Red recombination integrates the donor sequence → precise edit.
  6. Cure editing plasmids if needed and verify.
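
Step 1 (finding a target next to a PAM) can be sketched as a simple sequence scan. The example assumes the SpCas9 PAM "NGG" and a 20-nt protospacer, neither of which is specified above, and it scans only the forward strand; the function name and test sequence are hypothetical.

```python
def find_protospacers(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """List (start, protospacer, PAM) for every NGG PAM on the forward strand."""
    seq = seq.upper()
    hits = []
    # pam_start is the index of the "N" in the NGG motif.
    for pam_start in range(spacer_len, len(seq) - 2):
        pam = seq[pam_start:pam_start + 3]
        if pam[1:] == "GG":
            start = pam_start - spacer_len
            hits.append((start, seq[start:pam_start], pam))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: 20 nt followed by a TGG PAM.
    for hit in find_protospacers("ACGTACGTACGTACGTACGT" + "TGG" + "CCC"):
        print(hit)
```

A real design would also scan the reverse complement and score off-targets against the recipient genome, as listed in the preparation steps.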

Base editing:

  1. Design gRNA near the base(s) to be changed.
  2. Express nCas9/dCas9 fused to a deaminase.
  3. Targeted binding induces base conversion (C→T or A→G) within an editing window.
  4. Screen and sequence-verify edited clones.

What preparation do you need to do (e.g. design steps) and what is the input for the editing?

Preparation/design:

  • Choose target loci (recipient genes affecting background; promoter/RBS tuning; plasmid stability elements).
  • Design gRNAs and check off-targets against the recipient genome.
  • Design donor templates (if using Cas9 + recombination) or plan the editing window (base editing).
  • Define screening/selection strategy (markers, counterselection, phenotype).

Input:

  • gRNA(s)
  • Cas9 or base editor construct
  • Donor DNA template (ssDNA/dsDNA) for precise edits
  • Editing plasmids and recombineering functions (λ-Red)
  • Competent E. coli cells (recipient/donor as appropriate)

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

  • Variable efficiency depending on locus, strain, and delivery; requires optimization.
  • DSB toxicity/lethality if repair/recombination is inefficient.
  • Off-target edits are possible and require sequencing verification.
  • Base-editing constraints: depends on PAM availability and the editor’s window/context; not all bases are targetable.
  • Plasmid/genome context issues: copy number, recombination, and selection dynamics can complicate stable outcomes.

Week 3 HW: Lab Automation

Python Script for Opentrons Artwork

My Python script draws my design using the Opentrons. I used AI to add my color parameters. I made a drawing of Totoro, the titular forest spirit from the 1988 Studio Ghibli animated film My Neighbor Totoro, written and directed by Hayao Miyazaki.

On Ronan’s website:

In Google Colab, Execute Simulation:

Totoro printed by the Opentrons liquid handling robot in the Designer Cells Lab Node:

Not everything printed successfully due to the size of the points, but the general pattern is still readable.

Post-Lab Questions

1. del Olmo Lianes et al. (2023) developed COPICK, an upgrade to the Opentrons OT-2 that automates bacterial colony screening directly on Petri dishes. By mounting a camera on the robot deck, the system captures images of agar plates and selects colonies based on size, colour, and fluorescence criteria, achieving a picking rate of ~240 colonies/hour with 82% performance on E. coli. This paper is directly relevant to my final project, which also relies on spatial bacterial patterning and colour-based readout of gene transfer events on chromogenic agar plates using the Opentrons OT-2. Reference: del Olmo Lianes I. et al. (2023) Front. Bioeng. Biotechnol. 11:1202836.

2. For my final project, the Opentrons OT-2 serves as a precision bioprinter, depositing 10 µL spots of three bacterial cell suspensions (donor, helper, and recipient, each at OD₆₀₀ = 0.6) in defined spatial patterns directly onto agar plates. The robot runs three sequential deposition steps using a single Python script with fixed XY coordinates:

  1. Deposit donor, helper, and recipient suspensions in defined zones onto plain LB agar (no antibiotics) → overnight incubation at 37°C for triparental mating
  2. Reprint the identical pattern onto CHROMagar without antibiotic (Plate A: bioart image, pink + blue simultaneously)
  3. Reprint the identical pattern onto CHROMagar + ampicillin 100 µg/mL (Plate B: scientific confirmation, blue in recipient zones = confirmed HGT)

The key advantage of Opentrons over manual pipetting is spatial reproducibility: the same XY coordinates are used across all three plates, ensuring that colour patterns can be directly compared between the bioart plate and the confirmation plate. This spatial precision is essential for the “photocamera” concept — the printed image must remain consistent across experimental replicates.
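The idea of reusing one coordinate set across all three plates can be sketched in plain Python, without the Opentrons API. The offsets, plate labels, and the `build_steps` helper below are hypothetical placeholders for illustration, not the actual protocol script.

```python
# Hypothetical spot pattern: (x_mm, y_mm, suspension) offsets from the plate centre.
PATTERN = [
    (-10.0, 0.0, "donor"),
    (0.0, 0.0, "helper"),
    (10.0, 0.0, "recipient"),
]

# Plate labels follow the three deposition steps described above.
PLATES = [
    "LB agar (mating)",
    "CHROMagar (Plate A)",
    "CHROMagar + ampicillin (Plate B)",
]

SPOT_VOLUME_UL = 10  # spot volume quoted in the text


def build_steps(pattern, plates, volume_ul):
    """Expand one coordinate pattern into a list of dispense steps for every plate."""
    steps = []
    for plate in plates:
        for x_mm, y_mm, label in pattern:
            steps.append(
                {"plate": plate, "x_mm": x_mm, "y_mm": y_mm,
                 "label": label, "volume_ul": volume_ul}
            )
    return steps


def coordinates_for(steps, plate):
    """Return the (x, y) positions used on one plate, in dispense order."""
    return [(s["x_mm"], s["y_mm"]) for s in steps if s["plate"] == plate]


steps = build_steps(PATTERN, PLATES, SPOT_VOLUME_UL)
for plate in PLATES:
    print(plate, coordinates_for(steps, plate))
```

Because every plate is generated from the same `PATTERN` list, the printed positions are identical by construction, which is the property the plate-to-plate comparison relies on.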

Final Project Ideas

[Figures: final project ideas 1, 2, and 3]

Week 4 HW: Protein design part I

Part A. Conceptual Questions

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Assumptions:

  • Lean meat (e.g., beef, chicken, fish) contains approximately 20% protein by weight (the remainder is water, fat, minerals, and glycogen).
  • 500 g meat × 0.20 = 100 g of protein.
  • Proteins are digested into individual amino acid monomers.
  • Average molar mass of an amino acid residue ≈ 100 g/mol (as stated).

100 g of protein ÷ 100 g/mol = 1 mol of amino acids, so approximately 6 × 10²³ amino acid molecules (one mole, Avogadro’s number) are obtained from 500 g of meat.
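The arithmetic can be checked with a few lines of Python. This is a minimal sketch using the assumptions listed above (20% protein by weight, 100 g/mol per residue); the function name is only illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def amino_acid_count(meat_g, protein_fraction=0.20, residue_molar_mass=100.0):
    """Estimate the number of amino acid molecules released from a piece of meat."""
    protein_g = meat_g * protein_fraction      # grams of protein
    moles = protein_g / residue_molar_mass     # moles of amino acid residues
    return moles * AVOGADRO


n = amino_acid_count(500)
print(f"{n:.2e} amino acid molecules")
```

With these inputs the estimate is one mole of amino acids; changing the protein fraction or residue mass scales the result linearly.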


2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

This is explained by the central dogma of molecular biology, digestion, and genetic isolation:

Digestion breaks down macromolecules:

  • Dietary proteins are hydrolyzed by proteases (pepsin, trypsin, chymotrypsin, peptidases) into individual amino acids and small peptides.
  • Nucleic acids (DNA/RNA) are degraded by nucleases into nucleotides and then further into nucleosides and bases.
  • Intact macromolecular information does not survive digestion.

Absorption yields building blocks, not genetic information:

  • Absorbed amino acids, sugars, nucleotides, and fatty acids enter a common metabolic pool.
  • These are reassembled into human proteins according to instructions encoded in the human genome, transcribed into human mRNA, and translated by human ribosomes.

No horizontal gene transfer from diet:

  • Eukaryotic cells lack mechanisms to incorporate foreign dietary DNA into the nuclear genome.
  • Even if intact DNA fragments reached intestinal cells, they would be degraded by DNases or excluded by chromatin and membrane barriers.

3. Why are there only 20 natural amino acids?

The canonical set of 20 proteinogenic amino acids (plus selenocysteine and pyrrolysine in some organisms) reflects an evolutionary and biochemical optimum:

Chemical diversity and functional sufficiency:

  • The 20 amino acids cover a wide range of chemical properties: hydrophobic, polar, charged (acidic/basic), aromatic, and conformationally constrained (proline).
  • This diversity is sufficient to fold proteins into stable structures and catalyze essentially all cellular reactions.

Genetic code constraints:

  • The genetic code maps 64 codons to amino acids and stop signals. Expanding the amino acid repertoire requires:
    • New aminoacyl-tRNA synthetases
    • Orthogonal tRNAs
    • Unambiguous codon assignments
  • Adding more amino acids increases translational errors and slows ribosomal elongation.

Historical contingency and error minimization:

  • Early life likely used a smaller subset (~10 amino acids).
  • The code expanded gradually and became “frozen” because the existing arrangement minimizes the phenotypic impact of point mutations (error minimization principle).
  • Further expansion offers diminishing returns relative to the evolutionary cost of maintaining additional biosynthetic pathways and quality control.

4. Can you make other non-natural amino acids? Design some new amino acids.

Three Rationally Designed Non-Natural Amino Acids:

Name | Structure/Modification | Functional Rationale
--- | --- | ---
p-Azobenzene-phenylalanine | Phenyl ring replaced with azobenzene moiety | Photoswitchable: UV/visible light induces trans–cis isomerization, enabling light-controlled conformational changes in engineered proteins (optogenetics, dynamic materials).
2,2′-Bipyridyl-alanine | Chelating bipyridyl group on β-carbon | Metal coordination: High-affinity binding to Fe²⁺, Cu²⁺, Ru²⁺. Useful for installing synthetic metallocofactors, creating redox-active sites, or building metalloproteins with tunable properties.
ε-Azido-lysine | Lysine with the ε-amino group replaced by an azide (–N₃) | Bioorthogonal chemistry: Azide is inert to biological functional groups but reacts rapidly via click chemistry (e.g., strain-promoted azide-alkyne cycloaddition), enabling site-specific labeling, crosslinking, and drug conjugation.

Non-canonical amino acids (ncAAs) such as these expand protein functionality for drug design, biosensors, biomaterials, mechanistic enzymology, and synthetic biology.


5. Where did amino acids come from before enzymes that make them, and before life started?

1. Atmospheric/oceanic synthesis (Miller–Urey type experiments):

  • Reducing atmospheres (CH₄, NH₃, H₂, H₂O) subjected to energy sources (UV radiation, lightning, volcanic heat) produce amino acids.
  • Modern models favor a weakly reducing early atmosphere (CO₂, N₂, H₂O, with some H₂), with more reducing conditions locally, for example near hydrothermal vents; these settings can still yield amino acids via Strecker-like reactions.

2. Strecker synthesis (aqueous chemistry):

  • Aldehydes or ketones react with HCN and NH₃ to form α-aminonitriles, which hydrolyze to amino acids.
  • Strecker-type chemistry has been proposed for alkaline hydrothermal vent systems and for amino acid formation in interstellar ice analogs.

3. Extraterrestrial delivery:

  • Amino acids (>80 types, including non-biological ones) have been detected in carbonaceous chondrites (e.g., Murchison meteorite).
  • Some samples show L-enantiomer excess, suggesting chiral symmetry-breaking in space.
  • Comets and interstellar ices likely contributed to Earth’s prebiotic inventory.

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

A polypeptide composed entirely of D-amino acids would form a left-handed α-helix. L-amino acids favor backbone dihedral angles φ ≈ –60°, ψ ≈ –45° (right-handed helical geometry). D-amino acids are mirror images (enantiomers) of L-amino acids, so their favorable φ/ψ angles are sign-inverted: φ ≈ +60°, ψ ≈ +45°. These angles produce a left-handed helix with the same hydrogen-bonding pattern (i → i+4) but opposite helical sense.
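The sign inversion follows from geometry: reflecting any four atoms through a mirror plane negates the dihedral angle they define. The sketch below illustrates this with arbitrary example coordinates (not real backbone atoms), using the standard atan2 formula for a signed dihedral.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    u1, u2, u3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, _cross(u2, u3))
    x = _dot(_cross(u1, u2), _cross(u2, u3))
    return math.degrees(math.atan2(y, x))


def mirror(points):
    """Reflect points through the yz-plane (x -> -x), i.e. build the enantiomer."""
    return [(-x, y, z) for x, y, z in points]


# Arbitrary illustrative coordinates, not taken from a real structure.
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.3, 0.9, 1.4)]
angle = dihedral(*points)
mirrored_angle = dihedral(*mirror(points))
print(f"original: {angle:.1f} deg, mirrored: {mirrored_angle:.1f} deg")
```

The two printed angles have equal magnitude and opposite sign, which is why every φ and ψ of an all-D chain is the negative of its all-L counterpart.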


7. Can you discover additional helices in proteins?

Yes. Beyond the canonical α-helix, 3₁₀-helix, and π-helix, new and rare helical motifs continue to be identified.

Known non-canonical helices:

  • Polyproline II helix (PPII): left-handed, extended; common in collagen and disordered regions.
  • γ-helix: rare; tighter than α-helix.
  • α_L-helix: left-handed α-helix (rare in L-peptides; seen in certain contexts or D-residues).
  • Hybrid/transitional helices: local 3₁₀/α blends or distorted segments.

Tools for discovery:

  • High-resolution X-ray crystallography and cryo-EM: reveal short, rare, or distorted helices.
  • NMR spectroscopy: detects transient helical states (e.g., via residual dipolar couplings, relaxation dispersion).
  • AlphaFold2/3 and molecular dynamics (MD): predict metastable helical conformations and sample rare states via enhanced sampling (replica exchange, metadynamics).

Why new helices appear:

  • Local sequence context, post-translational modifications, ligand binding, membrane environments, and pH shift φ/ψ distributions.
  • Helices exist on a continuum; subtle energy differences stabilize non-canonical H-bond patterns.

8. Why are most molecular helices right-handed?

In biological polymers, right-handedness arises from monomer chirality and steric optimization.

For proteins (L-amino acids):

  • L-amino acids favor φ/ψ angles (≈ –60°, –45°) that produce right-handed α-helices with minimal side-chain clashes and optimal backbone hydrogen bonding.
  • Left-handed helices require φ/ψ values (the α_L region of the Ramachandran plot) that are sterically less favorable for L-residues other than glycine, because the side chain clashes with the backbone.

General principle:

  • Chirality of monomers dictates handedness.
  • If you swap L ↔ D (e.g., D-amino acids), you get the mirror-image helix (left-handed).

In synthetic/achiral polymers:

  • Handedness is determined by packing efficiency, torsional strain, and solvent interactions.
  • Without chirality, helices may form as racemic mixtures or adopt the handedness that minimizes steric/electronic repulsion.

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

β-sheets are intrinsically prone to intermolecular association due to their geometry and chemical properties.

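  • The edge strands of a β-sheet expose unsatisfied backbone N–H and C=O groups, so a sheet can extend by forming hydrogen bonds with strands from another molecule.
  • β-strands often alternate polar and nonpolar residues on opposite faces. When β-sheets associate, hydrophobic side chains pack against each other, displacing high-energy ordered water molecules → favorable entropy gain.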
  • Aggregation is often initiated by partial unfolding, cleavage, or mutation, exposing β-strand-prone sequences.
  • Some organisms use cross-β aggregates as structural materials (e.g., spider silk, curli fibers in biofilms).

Part B: Protein Analysis and Visualization

I selected the BglA (β-glucosidase) protein because my project focuses on horizontal gene transfer (HGT) in Escherichia coli using a bioprinting system. In this system, donor and recipient bacterial strains are printed onto agar plates in spatially defined layers. Successful conjugation events are visualized by a color change from pink to teal after recipient cells acquire the bglA gene from Enterococcus faecalis and begin producing β-glucosidase activity on chromogenic media.

There is limited structural information available for the E. faecalis BglA protein, so I used the structurally characterized homolog from E. coli. Although E. coli and E. faecalis are not from the same bacterial family, both proteins belong to the β-glucosidase enzyme group and perform similar functions.

The protein sequence for E. coli BglA was obtained from UniProt.

MIVKKLTLPKDFLWGGAVAAHQVEGGWNKGGKGPSICDVLTGGAHGVPREITKEVLPGKY YPNHEAVDFYGHYKEDIKLFAEMGFKCFRTSIAWTRIFPKGDEAQPNEEGLKFYDDMFDE LLKYNIEPVITLSHFEMPLHLVQQYGSWTNRKVVDFFVRFAEVVFERYKHKVKYWMTFNE INNQRNWRAPLFGYCCSGVVYTEHENPEETMYQVLHHQFVASALAVKAARRINPEMKVGC MLAMVPLYPYSCNPDDVMFAQESMRERYVFTDVQLRGYYPSYVLNEWERRGFNIKMEDGD LDVLREGTCDYLGFSYYMTNAVKAEGGTGDAISGFEGSVPNPYVKASDWGWQIDPVGLRY ALCELYERYQRPLFIVENGFGAYDKVEEDGSINDDYRIDYLRAHIEEMKKAVTYDGVDLM GYTPWGCIDCVSFTTGQYSKRYGFIYVNKHDDGTGDMSRSRKKSFNWYKEVIASNGEKL

The most frequent amino acid in the sequence is glycine (G), which appears 42 times.

The BglA protein from E. coli contains 479 amino acids.
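Length and composition can be recomputed from any one-letter sequence with a short script. This is a sketch, not the Colab notebook used above; the demo input is only the first 60 residues of the BglA sequence, so its counts differ from the full-length values.

```python
from collections import Counter


def residue_composition(sequence):
    """Count residues in a one-letter protein sequence, ignoring whitespace."""
    cleaned = "".join(sequence.split()).upper()
    return Counter(cleaned)


# First 60 residues of the sequence above, used here only as a short demo input.
demo = "MIVKKLTLPKDFLWGGAVAAHQVEGGWNKGGKGPSICDVLTGGAHGVPREITKEVLPGKY"

counts = residue_composition(demo)
length = sum(counts.values())
residue, n = counts.most_common(1)[0]
print(f"length={length}, most frequent={residue} ({n}, {100 * n / length:.1f}%)")
```

Pasting the full sequence (spaces between the 60-residue blocks are ignored) in place of `demo` gives the full-length composition.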

Using the UniProt BLAST tool, 250 results were identified as potential homologs of the BglA protein (UniProt ID: Q46829) with sequence identities ranging from 74.7% to 100%. Most homologs were found in bacteria (249 results), which is expected because β-glucosidases are common enzymes involved in bacterial carbohydrate metabolism. One homologous sequence was also identified in the eukaryotic organism Trichuris trichiura (1 result).

BglA belongs to the glycoside hydrolase family 1 (GH1) of β-glucosidases. These enzymes hydrolyze glycosidic bonds in carbohydrates and are involved in sugar metabolism.

The structure I used is Crystal Structure of E.coli BglA (2XHY).

The structure was solved using X-ray crystallography and released in 2011. The resolution of the structure is 2.30 Å, which is considered good quality for a protein crystal structure because lower resolution values indicate more precise atomic positions.

In addition to the protein, the structure contains sulfate ions (SO₄²⁻) and bromide ions. These molecules are commonly present from crystallization conditions and may stabilize the structure during crystal formation.

The protein belongs to the glycoside hydrolase structural family, which typically contains catalytic domains rich in α/β folds involved in carbohydrate hydrolysis.

  • Protein visualized in PyMOL, cartoon representation: [Figure: BglA, cartoon]

  • Protein visualized in PyMOL, ribbon representation: [Figure: BglA, ribbon]

  • Protein visualized in PyMOL, ball-and-stick representation: [Figure: BglA, ball-and-stick]


Secondary structure

After coloring the protein by secondary structure in PyMOL, the structure appeared to contain substantially more α-helices than β-sheets. Loops and turns were also distributed throughout the structure. The predominance of α-helices is consistent with the overall fold of many glycoside hydrolase enzymes.


Residue type distribution

After coloring the protein by residue type in PyMOL, hydrophobic residues (yellow) appeared to cluster mainly in the internal regions of the protein structure, while hydrophilic residues (cyan) were more exposed on the protein surface. This organization is characteristic of soluble enzymes and contributes to protein stability and interactions with the aqueous environment.


Surface visualization

The surface representation was colored according to B-factor values using a blue-white-red spectrum. Regions with higher B-factors (red/pink) indicate increased atomic displacement or structural flexibility, while lower B-factor regions (blue) correspond to more rigid parts of the structure. The protein surface exhibits several grooves and cavities, consistent with the typical topology of glycoside hydrolases, which often possess an elongated substrate-binding cleft.

Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

  1. Deep Mutational Scans - Crystal Structure of E.coli BglA protein.

Columns with more dark cells (the wild-type amino acid is strongly preferred) indicate more conserved residues, which are likely important for the structure and function of the protein.

  2. Latent Space Analysis

Neighboring proteins usually share the same SCOPe structural class and superfamily number, which indicates that the latent space captures structural and functional similarities between proteins.

C2. Protein Folding

Folding a protein

The ESMFold prediction for BglA produced a highly confident structure with a pTM score of 0.936 and an average pLDDT of 91.16. Most regions were colored blue or green, indicating high local confidence, while only a few loop regions showed lower confidence (yellow/red). The predicted fold closely matched the experimentally determined crystal structure, suggesting that ESMFold accurately reproduced the overall topology of the protein.


The E164A point mutation had minimal effect on the predicted structure of BglA. The pTM score remained nearly unchanged (0.935 vs. 0.936 for the wild type), and the average pLDDT stayed very high (91.05), indicating that the overall fold and structural confidence were preserved. This suggests that the predicted BglA fold is insensitive to this single substitution, although ESMFold confidence scores do not measure thermodynamic stability or catalytic activity.

Point mutation (E164A)

MIVKKLTLPKDFLWGGAVAAHQVEGGWNKGGKGPSICDVLTGGAHGVPREITKEVLPGKYYPNHEAVDFYGHYKEDIKLFAEMGFKCFRTSIAWTRIFPKGDEAQPNEEGLKFYDDMFDELLKYNIEPVITLSHFAMPLHLVQQYGSWTNRKVVDFFVRFAEVVFERYKHKVKYWMTFNEINNQRNWRAPLFGYCCSGVVYTEHENPEETMYQVLHHQFVASALAVKAARRINPEMKVGCMLAMVPLYPYSCNPDDVMFAQESMRERYVFTDVQLRGYYPSYVLNEWERRGFNIKMEDGDLDVLREGTCDYLGFSYYMTNAVKAEGGTGDAISGFEGSVPNPYVKASDWGWQIDPVGLRYALCELYERYQRPLFIVENGFGAYDKVEEDGSINDDYRIDYLRAHIEEMKKAVTYDGVDLMGYTPWGCIDCVSFTTGQYSKRYGFIYVNKHDDGTGDMSRSRKKSFNWYKEVIASNGEKL


Deletion of a larger sequence segment resulted in a moderate decrease in prediction confidence (pLDDT 89.1 vs. 91.2 in the wild type), while the overall fold remained largely preserved. The predicted structure retained a similar topology and confidence-color distribution, suggesting that the deleted region is not essential for maintaining the global structural framework of BglA.

Large mutation (removed segment GYCCSGVVYTEHENPEETMY)

MIVKKLTLPKDFLWGGAVAAHQVEGGWNKGGKGPSICDVLTGGAHGVPREITKEVLPGKYYPNHEAVDFYGHYKEDIKLFAEMGFKCFRTSIAWTRIFPKGDEAQPNEEGLKFYDDMFDELLKYNIEPVITLSHFEMPLHLVQQYGSWTNRKVVDFFVRFAEVVFERYKHKVKYWMTFNEINNQRNWRAPLFQVLHHQFVASALAVKAARRINPEMKVGCMLAMVPLYPYSCNPDDVMFAQESMRERYVFTDVQLRGYYPSYVLNEWERRGFNIKMEDGDLDVLREGTCDYLGFSYYMTNAVKAEGGTGDAISGFEGSVPNPYVKASDWGWQIDPVGLRYALCELYERYQRPLFIVENGFGAYDKVEEDGSINDDYRIDYLRAHIEEMKKAVTYDGVDLMGYTPWGCIDCVSFTTGQYSKRYGFIYVNKHDDGTGDMSRSRKKSFNWYKEVIASNGEKL
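Before folding, it is worth confirming that each edited sequence differs from the wild type only where intended, and that the residue number in the mutation label matches the 1-based position in the sequence actually submitted. A minimal sketch follows; `describe_change` is a hypothetical helper that assumes a single contiguous edit, and the inputs are short fragments copied from the sequences above, so the reported positions are relative to the fragment.

```python
def describe_change(wild_type, variant):
    """Locate one contiguous edit by trimming the shared prefix and suffix.

    Returns (position, wt_segment, variant_segment), with position 1-based
    in the wild-type string. Assumes a single substitution, insertion, or deletion.
    """
    limit = min(len(wild_type), len(variant))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and wild_type[prefix] == variant[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and wild_type[len(wild_type) - 1 - suffix] == variant[len(variant) - 1 - suffix]):
        suffix += 1

    return (prefix + 1,
            wild_type[prefix:len(wild_type) - suffix],
            variant[prefix:len(variant) - suffix])


# Fragments around the point mutation and around the deleted segment.
point = describe_change("TLSHFEMPLHL", "TLSHFAMPLHL")
deletion = describe_change("RAPLFGYCCSGVVYTEHENPEETMYQVLHH", "RAPLFQVLHH")
print(point)
print(deletion)
```

Running the same function on the full-length wild-type and edited sequences returns the position in full-sequence numbering, which can then be compared with the mutation label.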


Conclusion

ESMFold accurately reproduced the experimentally determined fold of BglA with high confidence scores (pTM 0.936; pLDDT 91.16). A single-point mutation (E164A) had almost no effect on the predicted structure, indicating high resilience to minor sequence changes. Deletion of a larger sequence segment caused a moderate reduction in confidence scores but did not significantly disrupt the overall fold, suggesting that BglA maintains a robust global structural architecture even after substantial sequence perturbation.

C3. Protein Generation

Inverse-Folding a protein

  1. Predicted sequence probabilities

I used ProteinMPNN (model v_48_020, sampling temperature 0.1) to perform inverse folding on the backbone structure of BglA (PDB: 2XHY). The designed sequence achieved a sequence recovery of 45.5%, indicating that nearly half of the original amino acids were reproduced by the model. This suggests that the BglA backbone imposes substantial structural constraints on sequence selection, particularly within the conserved catalytic core and structurally important regions. Probability maps showed several highly confident residue positions, while other regions tolerated greater sequence variability, most likely in surface-exposed loops.
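Sequence recovery is the fraction of positions where the designed residue equals the native one. The sketch below shows that calculation on two made-up 10-residue strings; it assumes equal-length, already aligned sequences and is not ProteinMPNN's own code.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Hypothetical toy sequences for illustration only.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(f"recovery = {100 * sequence_recovery(native, designed):.1f}%")
```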

[Figure: ProteinMPNN amino acid probabilities at each position.] Bright spots indicate positions where the model is highly confident about which residue should occupy that position.

  2. Folding the designed sequence with ESMFold

I folded the ProteinMPNN-designed sequence of BglA with ESMFold and compared it to the original crystal structure.

Structure | pTM | pLDDT
--- | --- | ---
Original BglA | 0.936 | 91.16
ProteinMPNN-designed sequence | 0.923 | 87.95

The ProteinMPNN-designed sequence retained a highly similar overall fold compared to the original BglA structure. Confidence coloring was also largely unchanged, with most regions remaining blue and green, indicating that the designed sequence still folds into a stable and well-defined globular structure.

Although the designed sequence recovered only ~45.5% of the original amino acids, the predicted structure remained highly similar to the native fold. This demonstrates that multiple different amino acid sequences can support the same overall protein architecture. In BglA, the structural constraints imposed by the conserved catalytic core and the α/β-fold appear to strongly determine the global topology even when substantial sequence variation is introduced.

BglA is a structured enzymatic protein with a robust and evolutionarily conserved fold. As a result, ProteinMPNN was able to design an alternative sequence that still folds with high confidence into a structure closely resembling the original enzyme.

Part D. Group Brainstorm on Bacteriophage Engineering

Main goal

Increased stability

We will focus on stabilizing the L protein computationally. By increasing its thermodynamic stability, we aim to reduce its dependence on the bacterial chaperone DnaJ for proper folding and membrane insertion. A more stable L protein should maintain function even if E. coli mutates DnaJ, thereby helping the phage overcome host resistance mechanisms. This also indirectly supports higher toxicity by ensuring more functional lysis protein reaches the membrane.

Project Objective
We propose to computationally design stabilizing mutations in the MS2 L lysis protein to make it less reliant on the host chaperone DnaJ while preserving (or enhancing) its ability to form oligomeric pores in the bacterial membrane. The long-term goal is to create L protein variants that increase phage lysis efficiency and reduce the evolutionary window in which E. coli can develop resistance during phage therapy.

Proposed Tools and Computational Approaches
We will use a combination of modern protein design and structure prediction tools introduced in recitation:

  • Protein Language Models (ESM-2 / ESMFold) for initial in silico mutagenesis and variant generation.
  • AlphaFold3 (or AlphaFold-Multimer) to predict monomer structures and protein–protein/membrane complexes.
  • FoldX and/or Rosetta (ddG protocol) to quantitatively evaluate mutational effects on folding stability (ΔΔG).
  • ProteinMPNN for sequence redesign of surface residues to improve packing and reduce aggregation propensity.
  • Molecular Dynamics (MD) simulations (short runs using GROMACS or OpenMM with a coarse-grained membrane model) to assess membrane insertion and oligomerization of top candidates.

Why These Tools Are Likely to Help

ESM-2 and ProteinMPNN
These models are well suited to proposing evolutionarily plausible and potentially stabilizing mutations without requiring an experimentally solved structure: ESM-2 works from sequence alone, and ProteinMPNN can be run on a predicted backbone. Since the MS2 L protein is small, membrane-associated, and structurally under-characterized, these generative models provide a practical starting point. They allow us to explore sequence space while preserving foldability constraints learned from large sets of natural proteins.

AlphaFold3 / AlphaFold-Multimer
AlphaFold enables structure prediction of the L protein monomer and potential oligomeric states. Using Multimer mode, the L–DnaJ interaction can be modeled to assess whether specific mutations preserve structural integrity while weakening dependence on the host chaperone. Recent AlphaFold versions can give usable models for small peptides and membrane proteins, although confidence is often lower for these targets and should be checked per residue.

FoldX / Rosetta (ddG calculations)
These physics-based tools estimate the change in folding free energy upon mutation (ΔΔG, written ddG here). Variants predicted to have ddG < -1.5 kcal/mol are likely more thermodynamically stable than wild type. This allows us to prioritize mutations that may improve folding robustness and reduce chaperone dependence.
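The filtering step can be sketched as a simple threshold on predicted ddG values. The variant names and numbers below are hypothetical placeholders, not FoldX or Rosetta output, and the function name is only illustrative.

```python
DDG_CUTOFF = -1.5  # kcal/mol, the threshold quoted in the text

# Hypothetical predictions for illustration only; not real FoldX/Rosetta results.
predicted_ddg = {
    "variant_A": -2.1,
    "variant_B": -0.4,
    "variant_C": 0.9,
    "variant_D": -1.8,
}


def stabilizing_variants(ddg_by_variant, cutoff=DDG_CUTOFF):
    """Return (name, ddG) pairs with ddG below the cutoff, most stabilizing first."""
    hits = [(name, ddg) for name, ddg in ddg_by_variant.items() if ddg < cutoff]
    return sorted(hits, key=lambda item: item[1])


for name, ddg in stabilizing_variants(predicted_ddg):
    print(f"{name}: ddG = {ddg:+.1f} kcal/mol")
```

In practice the same filter could be applied separately to each tool's predictions and the intersection kept, in line with the consensus-scoring idea in the mitigation strategy.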

Molecular Dynamics (MD) Simulations
MD simulations provide dynamic validation. They allow us to evaluate whether stabilized variants:

  • Maintain structural integrity over time
  • Insert more efficiently into lipid bilayers
  • Form stable oligomeric pore-like assemblies

This connects computational stability predictions to functional outcomes such as faster lysis and potentially higher phage titers.

Overall Strategy
The overall strategy integrates:

  • Evolutionary information (protein language models)
  • Deep learning-based structure prediction (AlphaFold)
  • Physics-based stability estimation (Rosetta/FoldX)
  • Membrane-context dynamic simulations (MD)

Potential Pitfalls

1. Limited Structural Confidence

The L protein is short (~75 amino acids), partially disordered, and membrane-associated. AlphaFold confidence scores (pLDDT) may be low in flexible or transmembrane regions. This uncertainty can propagate into inaccurate ddG predictions and false positives.

Mitigation strategy:

  • Use consensus scoring (ESM + Rosetta + FoldX)
  • Focus on mutations in higher-confidence structural regions
  • Experimentally validate multiple top-ranked variants in parallel

2. Limited Training Data on Phage–Host Interactions

Most protein language models and structural predictors are trained primarily on soluble globular proteins and well-characterized complexes. The L–DnaJ interaction has limited structural and mutational data, which may reduce predictive accuracy for engineering chaperone independence.

Mitigation strategy:

  • Prioritize global stability improvements before fine-tuning interface mutations
  • Maintain diversity among selected candidates
  • Use rapid experimental feedback (Stages 4–5) to iteratively refine computational predictions

Expected Outcome and Next Steps

This computational stage (Stage 1) will generate a prioritized list of 5–8 L protein variants predicted to exhibit improved thermodynamic stability and reduced reliance on host chaperones.

Selected variants will then:

  1. Be synthesized via Twist Bioscience
  2. Be cloned using Gibson Assembly
  3. Be structurally evaluated using the Nuclera system
  4. Be tested in E. coli for lysis efficiency and phage propagation

By focusing on stability, we aim to engineer a more robust L protein that enhances MS2’s ability to lyse E. coli efficiently. This reduces the evolutionary window for bacterial resistance and strengthens the therapeutic potential of MS2-based phage therapy.

Pipeline Schematic

Phase | Methods | Purpose
--- | --- | ---
Sequence Design | ESM-2, ProteinMPNN | Generate stabilized variants
Structure Modeling | AlphaFold3, Multimer | Preserve fold, weaken DnaJ dependence
Stability Filtering | FoldX / Rosetta | Select low ddG mutants
Functional Validation | MD simulations | Evaluate membrane behavior
Experimental Transfer | Gene synthesis | Move to wet lab testing

Week 5 HW: Protein design part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

The human SOD1 sequence (UniProt P00441): MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

The A4V mutant SOD1 sequence: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
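Note that A4V follows the conventional SOD1 numbering, which does not count the initiator methionine, so the substituted alanine is the fifth character of the sequence as listed. A small sketch of applying the substitution with that offset is shown below; the helper name and its `numbering_offset` parameter are illustrative assumptions.

```python
def apply_substitution(sequence, mutation, numbering_offset=0):
    """Apply a point substitution such as 'A4V' to a one-letter sequence.

    numbering_offset is the number of residues in `sequence` that precede
    residue 1 of the numbering scheme (1 when the initiator Met is not counted).
    """
    wild_type, position, replacement = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1 + numbering_offset
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# N-terminal residues of the wild-type sequence above.
wild_type_start = "MATKAVCVLKGDGPVQ"
mutant_start = apply_substitution(wild_type_start, "A4V", numbering_offset=1)
print(mutant_start)
```

The check inside the function raises an error if the expected wild-type residue is not found, which catches numbering mismatches early.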

Four 12-residue peptides were generated conditioned on the mutant SOD1 sequence; the known SOD1-binding peptide FLYRWLPSRRGG is included for comparison:

Index | Binder | Pseudo Perplexity
--- | --- | ---
1 | WRYGPAAAAHWK | 9.318684
2 | WHYPAVVLRWKX | 16.435002
3 | WLYYPAAVRLWK | 16.527933
4 | WLYYVAVVALGE | 22.958134
5 | FLYRWLPSRRGG | 20.63523127283615

Conclusion

The model assigned the lowest perplexity (9.32) to peptide WRYGPAAAAHWK, indicating the highest sequence plausibility according to the language model.

The experimentally validated SOD1-binding peptide FLYRWLPSRRGG showed one of the higher perplexities (20.63), suggesting that the language model does not necessarily rank experimentally verified binders as the most probable sequences.

One generated peptide (WHYPAVVLRWKX) contains the residue “X”, which denotes an unknown or unspecified amino acid. This likely reflects a tokenization or sampling artifact of the language model. Because “X” does not correspond to a defined amino acid, this peptide is likely invalid and should be treated with caution when evaluating potential binding candidates.
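For context, pseudo-perplexity for a masked language model is usually defined as the exponential of the negative mean log-probability assigned to each true residue when it is masked in turn. The sketch below shows only that final formula on made-up log-probabilities; it assumes natural logarithms and does not reproduce PepMLM's own scoring code.

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp(-mean log-probability) over the residues of one peptide (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical case: a 12-mer where every residue gets probability 1/20,
# i.e. the model is no better than a uniform guess over 20 amino acids.
uniform_log_probs = [math.log(1 / 20)] * 12
print(f"{pseudo_perplexity(uniform_log_probs):.2f}")
```

Under this definition a value near 20 corresponds to roughly uniform uncertainty over the 20 amino acids, and lower values mean the model finds the peptide more predictable.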

Part 2: Evaluate Binders with AlphaFold3

Each peptide was modeled in a separate AlphaFold 3 run using a two-chain setup. Chain A consisted of the SOD1 A4V mutant, while Chain B contained a single 12-residue peptide. All peptides were evaluated individually to compare ipTM scores and binding poses. Peptide 2 (WHYPAVVLRWKX) containing X was excluded from structural analysis because X denotes an unknown amino acid.

Peptide 1: WRYGPAAAAHWK (ipTM = 0.40)

Peptide 2: WHYPAVVLRWKX skipped (invalid X residue)

Peptide 3: WLYYPAAVRLWK (ipTM = 0.37)

Peptide 4: WLYYVAVVALGE (ipTM = 0.31)

Known Binder: FLYRWLPSRRGG (ipTM = 0.29)

Peptide | ipTM | pTM | Binding Interpretation
--- | --- | --- | ---
WRYGPAAAAHWK | 0.40 | 0.74 | Highest interface confidence among generated peptides, suggesting the most stable predicted interaction, although overall binding confidence remains low and primarily surface-associated.
WLYYPAAVRLWK | 0.37 | 0.76 | Weak and uncertain interaction with limited localized confidence; likely transient surface binding without a clearly defined interface.
WLYYVAVVALGE | 0.31 | 0.75 | Weakest predicted interface among generated peptides; interaction appears diffuse and poorly stabilized.
FLYRWLPSRRGG (known binder) | 0.29 | 0.83 | Despite being an experimentally known SOD1-binding peptide, AlphaFold3 predicted low interface confidence, suggesting that transient or flexible peptide interactions may not be captured reliably by structural prediction alone.

Conclusion

AlphaFold3 analysis revealed generally low interface confidence across all peptide–SOD1 complexes, with all ipTM values remaining below 0.5. Among the PepMLM-generated peptides, WRYGPAAAAHWK achieved the highest ipTM score (0.40), indicating the strongest predicted interaction with the A4V mutant SOD1 structure, although the interaction still appeared weak and primarily surface-associated. WLYYPAAVRLWK and WLYYVAVVALGE showed progressively lower ipTM values (0.37 and 0.31, respectively), suggesting less stable peptide binding interfaces. None of the generated peptides localized clearly to the N-terminal region containing the A4V mutation, and no strongly buried or highly ordered binding mode was observed. Interestingly, the experimentally known SOD1-binding peptide FLYRWLPSRRGG produced the lowest ipTM score (0.29) despite its validated biological interaction, highlighting a limitation of AlphaFold3 in modeling transient or flexible peptide-mediated interactions. Overall, the results suggest that while some PepMLM-generated peptides may form weak surface interactions with mutant SOD1, none demonstrated highly confident or stable binding according to AlphaFold3 predictions.

Part 3: Evaluate Properties with PeptiVerse

Each peptide was evaluated against the A4V mutant SOD1 sequence using PeptiVerse for predicted binding affinity, solubility, hemolysis probability, net charge, and molecular weight.

Peptide 1: WRYGPAAAAHWK [Figure: PeptiVerse output]

Peptide 3: WLYYPAAVRLWK [Figure: PeptiVerse output]

Peptide 4: WLYYVAVVALGE [Figure: PeptiVerse output]

Known Binder: FLYRWLPSRRGG [Figure: PeptiVerse output]

Peptide | Predicted Binding Affinity | Solubility | Hemolysis Probability | Net Charge (pH 7) | Molecular Weight (Da) | Interpretation
--- | --- | --- | --- | --- | --- | ---
FLYRWLPSRRGG | Weak binding (6.361) | Soluble (0.608) | Non-hemolytic (0.047) | +2.76 | 1507.7 | Known SOD1-binding reference peptide with moderate solubility, low toxicity, and acceptable predicted affinity.
WRYGPAAAAHWK | Weak binding (6.332) | Soluble (0.999) | Non-hemolytic (0.010) | +1.85 | 1413.6 | Best overall candidate with the strongest predicted affinity among generated peptides, excellent solubility, and minimal predicted toxicity.
WLYYPAAVRLWK | Weak binding (6.889) | Soluble (0.624) | Non-hemolytic (0.097) | +1.76 | 1565.9 | Intermediate candidate with acceptable solubility and low hemolysis risk, but weaker predicted affinity and structural confidence.
WLYYVAVVALGE | Weak binding (6.889) | Soluble (0.444) | Hemolytic (0.310) | -1.23 | 1382.6 | Least favorable candidate due to lower solubility, predicted hemolytic activity, and weak structural interaction confidence.

Conclusion

Among the PepMLM-generated peptides, WRYGPAAAAHWK showed the strongest overall profile, combining the highest AlphaFold3 ipTM score (0.40) with the best predicted binding affinity (6.332), excellent solubility (0.999), and extremely low hemolysis probability (0.010). In contrast, WLYYVAVVALGE demonstrated weaker structural confidence (ipTM 0.31), lower solubility, and a higher predicted hemolysis probability (0.310), making it a less favorable therapeutic candidate despite similar predicted affinity values. WLYYPAAVRLWK showed intermediate behavior with moderate solubility and low hemolysis risk but weaker predicted binding and interface confidence than WRYGPAAAAHWK. Overall, peptides with higher ipTM values tended to display slightly stronger predicted binding affinity and more favorable therapeutic properties. Compared with the known binder FLYRWLPSRRGG, WRYGPAAAAHWK achieved a comparable affinity prediction while exhibiting superior solubility and lower predicted toxicity.

Among the evaluated candidates, I would select WRYGPAAAAHWK for further development. Although its predicted binding remains modest, it achieved the highest ipTM score among the generated peptides and demonstrated the most favorable therapeutic profile, including excellent solubility and minimal predicted hemolytic activity. These properties suggest a better balance between structural interaction potential and developability compared with the other candidates.

Part 4: Generate Optimized Peptides with moPPIt

moPPIt was run targeting motif positions 2–8 of A4V SOD1, with objectives set to Hemolysis Probability, Solubility, Predicted Affinity, and Motif (all weights = 1). Five 12-mer peptides were generated.

Peptide | Hemolysis Probability | Solubility | Predicted Affinity | Motif | Interpretation
CTSGENVGAGVS | 0.0666 | 0.9999 | 6.0910 | 0.6461 | Highly soluble and low-risk peptide with moderate predicted affinity and acceptable motif targeting near the selected SOD1 region.
ANAPWPPAFSFH | 0.0155 | 1.0000 | 6.0548 | 0.6574 | Very low predicted hemolysis and excellent solubility, with moderate affinity and balanced therapeutic properties.
PSEKQCVKFHTT | 0.0481 | 1.0000 | 5.8624 | 0.8456 | Strong motif guidance score with improved predicted affinity and excellent solubility, making it one of the most balanced candidates.
MYAGIFEKNKQT | 0.0307 | 0.9999 | 5.6291 | 0.7912 | Best predicted affinity among generated peptides with low hemolysis probability and excellent solubility, representing the strongest overall candidate.
QPTCGSGQFNWF | 0.0334 | 1.0000 | 6.3863 | 0.8244 | Excellent solubility and favorable motif score, although predicted affinity is weaker compared with the other moPPIt peptides.
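The table above can be loaded into a small data structure to compare candidates on individual objectives. This is only a sketch: the `Candidate` class and variable names are hypothetical, and the affinity column is carried as reported without being re-ranked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    peptide: str
    hemolysis: float   # predicted hemolysis probability
    solubility: float  # predicted solubility score
    affinity: float    # predicted affinity score, as reported by the tool
    motif: float       # motif guidance score


# Values copied from the moPPIt results table.
CANDIDATES = [
    Candidate("CTSGENVGAGVS", 0.0666, 0.9999, 6.0910, 0.6461),
    Candidate("ANAPWPPAFSFH", 0.0155, 1.0000, 6.0548, 0.6574),
    Candidate("PSEKQCVKFHTT", 0.0481, 1.0000, 5.8624, 0.8456),
    Candidate("MYAGIFEKNKQT", 0.0307, 0.9999, 5.6291, 0.7912),
    Candidate("QPTCGSGQFNWF", 0.0334, 1.0000, 6.3863, 0.8244),
]

best_motif = max(CANDIDATES, key=lambda c: c.motif)
lowest_hemolysis = min(CANDIDATES, key=lambda c: c.hemolysis)

print("Highest motif score:", best_motif.peptide, best_motif.motif)
print("Lowest hemolysis:", lowest_hemolysis.peptide, lowest_hemolysis.hemolysis)
```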

Conclusion

moPPIt-generated peptides demonstrated consistently favorable therapeutic property predictions compared with the earlier PepMLM-generated candidates. All peptides showed near-perfect predicted solubility (~1.0) and very low hemolysis probabilities, indicating improved safety and developability profiles. In contrast to PepMLM, which generated peptides with mixed structural and therapeutic characteristics, moPPIt produced candidates optimized simultaneously for affinity, motif targeting, and physicochemical properties. Among the generated peptides, MYAGIFEKNKQT achieved the strongest predicted affinity score (5.6291) while maintaining excellent solubility and low predicted toxicity, making it the most promising overall candidate. PSEKQCVKFHTT also showed strong performance due to its high motif score (0.8456), suggesting improved targeting of the selected SOD1 A4V-adjacent region. Overall, the moPPIt peptides appeared more balanced and therapeutically favorable than the original PepMLM-generated peptides, highlighting the advantages of guided multi-objective peptide optimization.

Before advancing these peptides toward clinical development, additional validation would be required, including molecular docking, molecular dynamics simulations, biochemical binding assays, aggregation inhibition studies, cytotoxicity testing, and evaluation in cellular or animal ALS models to confirm both efficacy and safety.

Part C: Final Project: L-Protein Mutants

The goal of this project is to reduce the interaction between the MS2 lysis protein (L-protein) and the bacterial chaperone DnaJ. Since DnaJ is important for proper folding and processing of the L-protein, weakening this interaction may help the phage remain functional even if bacteria modify DnaJ. To study this interaction, co-folding predictions were performed using AlphaFold2 Multimer with both proteins entered together: the two sequences go into a single input field, separated by a colon (:).

Mutations were introduced only in the soluble N-terminal domain of the L-protein and not in the transmembrane region.

Mutant 1

  • R18A
  • R19A
  • R20A
  • R31A
  • R34A

Full Mutant L-Protein Sequence

METRFPQQSQQTPASTNAAAPFKHEDYPCRAQQASSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
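Mutation labels such as R18A (wild-type residue, 1-based position, new residue) can be applied to a sequence programmatically, which also catches labels that do not match the wild type. This is a minimal sketch with hypothetical helper names; the short sequences are toy placeholders, not the L-protein or DnaJ.

```python
import re


def apply_point_mutations(wild_type: str, mutations: list[str]) -> str:
    """Apply substitutions written like 'R18A' to a 1-indexed protein sequence."""
    seq = list(wild_type)
    for label in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"Unrecognized mutation label: {label}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"{label}: position outside the sequence")
        if wild_type[pos - 1] != old:
            raise ValueError(f"{label}: wild type has {wild_type[pos - 1]} at {pos}")
        seq[pos - 1] = new
    return "".join(seq)


def multimer_query(*chains: str) -> str:
    """Join chains with ':' as used for the single-field multimer input."""
    return ":".join(chains)


TOY_WILD_TYPE = "MKRRQLA"  # hypothetical 7-residue sequence for illustration only
mutant = apply_point_mutations(TOY_WILD_TYPE, ["R3A", "Q5L"])
print(mutant)
print(multimer_query(mutant, "MAKQ"))  # "MAKQ" stands in for the partner chain
```

Checking the wild-type residue before substituting is a deliberate choice: an off-by-one position raises an error instead of silently producing a different mutant.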

AlphaFold2 Multimer Results

Among the generated AlphaFold2 multimer models, the top-ranked model (rank_001, model_4) was selected for analysis because it showed the highest predicted interaction confidence with an ipTM score of 0.165.

Metric | Value
pLDDT | 74.8
pTM | 0.528
ipTM | 0.165
[Figure: 3D structure of top-ranked model (rank_001, model_4)]

Explanation

This mutant was designed to weaken the interaction between the L-protein and DnaJ. The soluble N-terminal domain contains several positively charged arginine residues that may participate in electrostatic interactions with DnaJ. Replacing arginine with alanine removes positive charges and may reduce binding affinity to the chaperone.

Alanine substitutions were chosen because alanine is small and structurally non-disruptive, allowing the protein to maintain overall folding while reducing interaction surfaces.

The transmembrane domain was left unchanged because it is important for lysis activity.

Conclusion

Mutant 1 appears to retain its predicted fold after the alanine substitutions in the soluble domain. The pLDDT score (74.8) indicates moderate confidence in the prediction.

The ipTM score was low (0.165), meaning AlphaFold2 has little confidence in any specific interface between DnaJ and the mutant L-protein. This is consistent with the arginine-to-alanine substitutions reducing DnaJ binding while preserving overall protein structure, but it does not prove it; comparison with the wild-type L-protein under the same settings is needed to attribute the effect to the mutations.

The structural model also showed limited contact between the two proteins, supporting the hypothesis that these mutations weaken the DnaJ–L-protein interaction.

Mutant 2

  • Q7L
  • Q10L
  • Q11L
  • Q32A
  • Q33A

Full Mutant L-Protein Sequence

METRFPQLLSQLLTPASTNRRRPFKHEDYPCRRAAARSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

AlphaFold2 Multimer Results

Among the generated AlphaFold2 multimer models, the top-ranked model (rank_001, model_4) was selected for analysis because it showed the highest predicted interaction confidence with an ipTM score of 0.149.

Metric | Value
pLDDT | 74.4
pTM | 0.523
ipTM | 0.149
[Figure: 3D structure of top-ranked model (rank_001, model_4)]

Explanation

This mutant was designed to improve autonomous folding of the L-protein and reduce dependence on DnaJ. The soluble domain contains multiple glutamine residues that may contribute to structural flexibility and chaperone dependence. Replacing some glutamines with leucine increases hydrophobic stabilization, while replacing others with alanine reduces polar interactions. These mutations may stabilize local folding and decrease the need for DnaJ-assisted folding.

The transmembrane region was not modified to preserve lysis function.

Conclusion

Mutant 2 appears to retain its predicted fold after the glutamine-to-leucine and glutamine-to-alanine substitutions in the soluble domain. The low ipTM score (0.149) means AlphaFold2 has little confidence in any specific interface with DnaJ. These results are consistent with the mutations reducing DnaJ dependence while maintaining overall folding of the L-protein, although a wild-type comparison under the same settings is needed to confirm this.

Week 6 HW: Genetic circuits part I

Assignment: DNA Assembly

1. Components of Phusion High-Fidelity PCR Master Mix and their functions

The Phusion High-Fidelity PCR Master Mix typically contains:

  • Phusion High-Fidelity DNA Polymerase
    A thermostable DNA polymerase with 3’→5’ exonuclease proofreading activity, which significantly reduces the error rate during DNA amplification.

  • dNTPs (dATP, dTTP, dCTP, dGTP)
    The nucleotide building blocks used for DNA strand synthesis.

  • Optimized reaction buffer (HF or GC buffer)
    Maintains optimal pH, salt concentration, and ionic conditions required for high-fidelity enzyme activity.

  • Mg²⁺ ions (MgCl₂)
    Essential cofactor for DNA polymerase activity; influences yield and specificity.

  • Stabilizing agents
    Improve enzyme stability and reaction robustness under thermal cycling conditions.


2. Factors affecting primer annealing temperature in PCR

The annealing temperature depends on:

  • Primer melting temperature (Tm) – the primary determinant
  • GC content – higher GC content increases Tm because G·C pairs form three hydrogen bonds (versus two for A·T) and stack more strongly
  • Primer length – longer primers generally have higher Tm
  • Sequence composition – base distribution affects duplex stability
  • Salt concentration (especially Mg²⁺) – stabilizes primer-template hybridization and increases effective Tm
  • Mismatches (e.g., intentional mutations) – reduce binding stability and lower effective annealing temperature
  • 3’-end stability – a GC clamp increases binding specificity and efficiency
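The dependence of Tm on length and GC content can be illustrated with two common rule-of-thumb formulas. This is a rough sketch only: the Wallace rule and the basic GC formula below ignore salt, Mg²⁺, and mismatches, and they are not the nearest-neighbor method used by vendor Tm calculators. Function names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Wallace rule, intended for short oligos: Tm = 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def basic_tm(primer: str) -> float:
    """Basic GC formula for longer oligos: Tm = 64.9 + 41*(G+C-16.4)/N."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


short_primer = "ATGCATGCATGC"          # hypothetical 12-mer, 50% GC
long_primer = "ATGCATGCATGCATGCATGC"   # hypothetical 20-mer, 50% GC
print(gc_fraction(long_primer), wallace_tm(short_primer), round(basic_tm(long_primer), 2))
```

Both formulas show the same trends as the list above: adding bases or swapping A/T for G/C raises the estimate.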

3. PCR vs Restriction Enzyme Digestion

PCR is a method used to amplify specific DNA fragments using sequence-specific primers.

Advantages:

  • Enables site-directed mutagenesis
  • Does not require restriction sites
  • Highly flexible for cloning design
  • Compatible with Gibson/HiFi assembly workflows

Limitations:

  • Risk of polymerase-induced mutations (even with high-fidelity enzymes)
  • Amplification efficiency drops for long fragments (practical limit of roughly 10 kb)

Restriction enzymes cut DNA at specific recognition sequences.

Advantages:

  • Highly predictable and precise cleavage
  • Produces defined sticky or blunt ends
  • Efficient for classical cloning workflows

Limitations:

  • Requires suitable restriction sites in the DNA
  • Can leave sequence “scars”
  • Less flexible for seamless cloning or mutagenesis

When to use each method

  • PCR is preferred when:

    • Introducing mutations (e.g., chromophore engineering in amilCP)
    • No suitable restriction sites are available
    • Using Gibson or HiFi assembly
  • Restriction digestion is preferred when:

    • Standard cloning vectors are used
    • Appropriate restriction sites are present
    • Performing traditional ligation-based cloning

4. Ensuring DNA compatibility for Gibson Assembly

To ensure successful Gibson assembly:

  • Design overlapping regions (20–40 bp) between adjacent fragments
  • Ensure overlaps are:
    • perfectly complementary
    • free of strong secondary structures
    • balanced in GC content
  • Maintain correct:
    • reading frame (for coding sequences)
    • fragment orientation
  • Verify:
    • no unintended mutations in coding regions
    • correct fragment order (backbone → insert → backbone logic)
  • Ensure overlaps are stable enough to anneal at the assembly temperature (overlap Tm of roughly 50–60°C; Gibson assembly is typically run at 50°C)
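The overlap checks listed above can be automated before ordering primers. The sketch below assumes fragments are given 5'→3' on the same strand and in assembly order; the function names, the 20–40 bp window defaults, and the example sequences are illustrative assumptions.

```python
def shared_overlap(upstream: str, downstream: str,
                   min_len: int = 20, max_len: int = 40) -> str:
    """Longest end of `upstream` that equals the start of `downstream`.

    Returns an empty string if no overlap of min_len..max_len bases exists.
    """
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return upstream[-n:]
    return ""


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical fragments sharing a 20 bp homology region.
HOMOLOGY = "ACGTTGCAAGCTTGACCGTA"
fragment_a = "TTTTTTTTTT" + HOMOLOGY
fragment_b = HOMOLOGY + "GGGGGGGGGG"

overlap = shared_overlap(fragment_a, fragment_b)
print(len(overlap), gc_fraction(overlap))
```

For a circular product, the same check would also be run between the last fragment and the first.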

5. How plasmid DNA enters E. coli during transformation

Plasmid DNA enters E. coli through temporary membrane permeabilization:

Heat shock transformation

  • CaCl₂ treatment neutralizes negative charges on DNA and membrane
  • Rapid temperature shift (0°C → 42°C) creates transient pores
  • DNA enters the cell by diffusion and electrochemical forces

Electroporation

  • High-voltage pulse creates temporary nanopores in the bacterial membrane
  • DNA passes directly through these pores into the cytoplasm

After transformation:

  • Cells recover in SOC medium
  • Express antibiotic resistance genes
  • Only successfully transformed cells survive on selective media

6. Alternative assembly method: Golden Gate Assembly

Golden Gate Assembly uses Type IIS restriction enzymes (e.g., AarI), which cut outside their recognition sites to generate custom overhangs. This enables directional assembly of multiple DNA fragments in a single reaction. The reaction cycles between 37°C (digestion) and 16°C (ligation), allowing simultaneous cutting and ligation steps. Because the recognition sites are removed during assembly, the final construct is seamless (“scarless”). This method is highly efficient for multi-part DNA assembly and library construction.

Schematic diagram

Fragment A          Fragment B          Fragment C
   AarI                AarI                AarI
    ↓                   ↓                   ↓

Cuts generate compatible sticky ends:

A ---------> B ---------> C

Ligation step (T4 DNA ligase):

Final construct:
A-B-C (scarless Golden Gate assembly)

Simulation of the Golden Gate Assembly method in Benchling

The pUC19 backbone and the final project insert (oriT-Ptrc-RBS-bglA) were added. The Type IIS enzyme AarI was chosen because it has no recognition sites within my selected fragments. The Golden Gate Assembly was created successfully.
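Whether a Type IIS enzyme cuts inside a fragment can be checked with a simple motif scan on both strands. This sketch assumes the AarI recognition sequence CACCTGC and treats the input as linear, so a site spanning the origin of a circular plasmid would be missed; the function names are hypothetical.

```python
AARI_SITE = "CACCTGC"  # AarI recognition sequence (the cut falls outside the site)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = AARI_SITE) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every occurrence of the site."""
    seq = seq.upper()
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


print(find_sites("AAACACCTGCAAA"))   # site on the forward strand
print(find_sites("AAAGCAGGTGAAA"))   # site on the reverse strand
print(find_sites("ATGCATGCATGC"))    # no internal site: compatible with AarI
```

An empty result for every part is the condition described above for choosing the enzyme.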


Assignment: Asimov Kernel

Repressilator Construct

I recreated the Repressilator by using parts from the Characterized Bacterial Parts repository.


The repressilator circuit was simulated in Asimov Kernel under standard E. coli conditions (24 hours, 10-minute timestep, transient transfection, no ligands). The RNA and protein concentration plots show sustained oscillatory behavior of LacI, LambdaCI, and TetR over the full simulation period. The three proteins cycle out of phase, with each repressor periodically suppressing the next promoter in the loop, generating stable temporal oscillations. The RNA profiles mirror the protein oscillations with expected phase relationships and slight amplitude differences due to transcription–translation dynamics.

These results are consistent with the Repressilator construct in the Bacterial Demos repository, which also exhibits sustained, out-of-phase oscillations of the three repressors. The qualitative agreement in oscillation pattern, phase shift, and long-term stability confirms that the reconstructed circuit functions as expected.
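The out-of-phase oscillations described above can be reproduced qualitatively with a textbook model. This is a minimal sketch, assuming a dimensionless, protein-only ring of three identical repressors integrated with forward Euler; it is not the Asimov Kernel model, and `alpha`, `n`, and the initial values are illustrative choices.

```python
def simulate_repressilator(alpha: float = 50.0, n: float = 3.0,
                           dt: float = 0.01, t_end: float = 100.0,
                           p0: tuple = (1.0, 2.0, 3.0)) -> list:
    """Integrate dp_i/dt = alpha / (1 + p_prev**n) - p_i for a 3-gene ring."""
    p = list(p0)
    trace = [tuple(p)]
    for _ in range(int(t_end / dt)):
        # Protein i is repressed by protein i-1; index -1 wraps around the ring.
        dp = [alpha / (1.0 + p[i - 1] ** n) - p[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(tuple(p))
    return trace


trace = simulate_repressilator()
late = trace[len(trace) // 2:]  # discard the initial transient
amplitudes = [max(s[i] for s in late) - min(s[i] for s in late) for i in range(3)]
print([round(a, 2) for a in amplitudes])
```

A nonzero peak-to-trough amplitude in the second half of the run is the signature of sustained oscillation rather than relaxation to a steady state.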


My Constructs

1. Simple Repression (TetR represses GFP)

This construct consists of constitutive TetR expression (J23100 → B0034 → TetR → Terminator) and a GFP reporter controlled by the TetR-repressible promoter pTetR. TetR is expected to repress pTetR, resulting in low GFP expression at steady state.


Simulation conditions: Organism: E. coli, Transfection: Transient, Ligands: None, Duration: 24 h, Timestep: 30 min

The simulation shows that the system reaches a stable steady state without oscillations. However, GFP repression is less pronounced than theoretically expected. This outcome is likely due to the balance between the strength of the constitutive TetR promoter, the basal (leaky) activity of pTetR, and the protein degradation parameters defined in the Kernel model. Overall, the circuit functions as a steady repression system, but repression efficiency is parameter-dependent.


2. Double Negative Cascade

This construct implements a repression cascade: LacI is constitutively expressed and represses TetR via pLacI. TetR in turn represses GFP via pTetR. Because LacI suppresses TetR, repression of GFP should be relieved, leading to GFP expression at steady state.


Simulation conditions: Organism: E. coli, Transfection: Transient, Ligands: None, Duration: 24 h, Timestep: 30 min

The simulation matches the expected behavior of a linear repressive cascade: high LacI levels suppress TetR, resulting in activation of GFP expression. The system reaches a stable steady state without oscillations, consistent with cascade architecture. Minor deviations in expression levels can be explained by the balance of promoter strengths, RBS efficiencies, and degradation parameters specified in the Kernel model.


3. Toggle Switch (Mutual Repression)

This construct implements mutual repression between LacI and TetR. Each protein represses the promoter driving expression of the other, forming a bistable genetic toggle switch. The system is expected to stabilize in one of two possible states: LacI high/TetR low or TetR high/LacI low.


Simulation conditions: Organism: E. coli, Transfection: Transient, Ligands: None, Duration: 24 h, Timestep: 10 min

The simulation confirms mutual repression and is consistent with bistable behavior. The system rapidly selects one dominant state (TetR high, LacI low) and maintains it throughout the simulation. This matches the theoretical toggle switch model, where the final state is determined by the balance of kinetic parameters, promoter strengths, and initial conditions defined in the Kernel simulation. Demonstrating both stable states would require a second run with initial conditions or an inducer favoring LacI.
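The role of initial conditions in a toggle switch can be illustrated with the classic two-variable mutual-repression model. This is a sketch under stated assumptions: a symmetric, dimensionless model integrated with forward Euler, where `u` and `v` stand in for the two repressors; the parameters are illustrative and are not those of the Kernel simulation.

```python
def simulate_toggle(u0: float, v0: float, alpha: float = 10.0, n: float = 2.0,
                    dt: float = 0.01, t_end: float = 50.0) -> tuple:
    """Integrate du/dt = alpha/(1+v**n) - u and dv/dt = alpha/(1+u**n) - v."""
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du = alpha / (1.0 + v ** n) - u
        dv = alpha / (1.0 + u ** n) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


# Same circuit, two different starting points.
state_a = simulate_toggle(u0=3.0, v0=1.0)  # starts with more u
state_b = simulate_toggle(u0=1.0, v0=3.0)  # starts with more v
print([round(x, 2) for x in state_a], [round(x, 2) for x in state_b])
```

The two runs settle into opposite states (one repressor high, the other low), which is what bistability means in practice: the circuit remembers which repressor had the initial advantage.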


Week 7 HW: Genetic circuits part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Intracellular Artificial Neural Networks (IANNs) provide several important advantages over traditional genetic circuits that operate using Boolean logic gates such as AND, OR, and NOT. Traditional genetic circuits typically generate binary outputs, where genes are either “ON” or “OFF.” In contrast, IANNs can process information in a continuous and weighted manner, similar to artificial neural networks used in computational machine learning.

One major advantage of IANNs is their ability to perform analog signal processing. Biological environments are inherently noisy and variable; therefore, binary logic often oversimplifies intracellular dynamics. IANNs can integrate multiple molecular inputs with different weights and thresholds, enabling graded responses rather than discrete binary outcomes. This allows cells to make more nuanced decisions based on complex environmental conditions.

Another advantage is scalability and computational complexity. Boolean genetic circuits become increasingly difficult to engineer as the number of inputs grows because the number of required regulatory interactions expands rapidly. IANN architectures are more modular and compact, enabling implementation of higher-order decision-making processes using fewer genetic components.

IANNs also possess superior pattern-recognition capabilities. Since neural-network-like systems can classify multidimensional input patterns, they are better suited for applications such as disease-state detection, metabolic-state monitoring, or adaptive therapeutic responses. Traditional Boolean circuits struggle with ambiguous or overlapping biological signals because they rely on rigid thresholds.

IANNs may support adaptive and learning-like behaviors when combined with feedback regulation and dynamic tuning mechanisms. Although current synthetic biology implementations remain relatively simple, the neural-network paradigm offers a conceptual framework for constructing programmable living systems capable of sophisticated information processing.

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

A highly promising application of an IANN is intelligent cancer-cell detection and targeted therapeutic activation in mammalian cells.

In this system, the IANN would receive multiple intracellular molecular inputs associated with cancer progression. For example:

  • X1: concentration of oncogenic microRNA (e.g., miR-21),
  • X2: hypoxia-associated transcription factors,
  • X3: abnormal metabolic markers such as elevated lactate levels,
  • X4: DNA damage-response proteins.

Each input would regulate synthetic genetic components corresponding to weighted neuronal connections. The network would integrate these signals through regulatory interactions mediated by transcription factors, CRISPR regulators, or endoribonucleases such as Csy4.

The output layer would produce a therapeutic response only when the overall molecular profile strongly matches a cancerous state. Possible outputs include:

  • expression of apoptosis-inducing proteins,
  • activation of immune-signaling molecules,
  • release of fluorescent reporters for diagnostics,
  • controlled secretion of anticancer drugs.

Unlike Boolean circuits, which require exact combinations of signals, the IANN could classify partially overlapping or noisy molecular patterns. For example, moderate levels of hypoxia combined with high miR-21 expression might still trigger therapy even if another marker remains weak. This resembles probabilistic classification in artificial intelligence systems.

The input/output behavior would therefore be continuous and weighted:

  • low cumulative activation → no response,
  • intermediate activation → weak reporter signal,
  • high activation → strong therapeutic gene expression.
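The weighted, graded input/output behavior described above can be sketched as a single artificial neuron. All numbers here are hypothetical: the weights, bias, response cutoffs, and the normalization of each marker to a 0–1 scale are illustrative assumptions, not measured biological parameters.

```python
import math

# Hypothetical weights for inputs X1..X4 (each input normalized to 0..1).
WEIGHTS = {"miR21": 2.0, "hypoxia": 1.5, "lactate": 1.0, "dna_damage": 1.0}
BIAS = -2.5  # hypothetical threshold term


def activation(inputs: dict) -> float:
    """Sigmoid of the weighted input sum, giving a graded value in (0, 1)."""
    z = BIAS + sum(w * inputs.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def response(inputs: dict) -> str:
    """Map the graded activation onto the three response levels."""
    a = activation(inputs)
    if a < 0.3:
        return "no response"
    if a < 0.7:
        return "weak reporter signal"
    return "strong therapeutic expression"


def boolean_and(inputs: dict, cutoff: float = 0.5) -> bool:
    """Strict Boolean AND over all four markers, for comparison."""
    return all(inputs.get(name, 0.0) > cutoff for name in WEIGHTS)


healthy = {"miR21": 0.1, "hypoxia": 0.1, "lactate": 0.1, "dna_damage": 0.1}
partial = {"miR21": 1.0, "hypoxia": 0.6, "lactate": 0.1, "dna_damage": 0.9}
print(response(healthy), "|", response(partial), "|", boolean_and(partial))
```

In this toy setting the partial profile still produces a strong response even though one marker is weak, whereas the strict AND gate rejects it.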

However, several limitations remain:

  • Biological noise presents a major challenge. Gene-expression variability may distort weighted signal integration and reduce classification accuracy.
  • Scalability is limited by cellular resource competition. Large synthetic networks consume transcriptional and translational machinery, potentially impairing normal cellular function.
  • Precise tuning of regulatory weights and thresholds is difficult in living systems because promoter strengths, RNA degradation rates, and protein interactions vary across cells and environmental conditions.
  • Signal crosstalk between synthetic and endogenous pathways may generate unintended outputs or toxicity.
  • Evolutionary instability may occur over long timescales, as cells can mutate or silence synthetic constructs to reduce metabolic burden.

3

[Figure: A diagram for an intracellular multilayer perceptron]

Assignment Part 2: Fungal Materials

1.1 Mycelium-Based Composites

Applications:

  • Packaging materials (replacement for polystyrene foam)
  • Thermal and acoustic insulation panels
  • Construction biomaterials
  • Furniture and interior design elements

Principle:
Mycelium (the vegetative body of fungi) grows through a lignocellulosic substrate (e.g., sawdust, agricultural waste), binding particles into a cohesive composite. Growth is terminated by heat treatment.

Advantages:

  • Biodegradable
  • Utilize agricultural waste as feedstock
  • Lower energy consumption compared to plastics and cement production
  • Reduced carbon footprint

Disadvantages:

  • Lower mechanical strength compared to synthetic polymers and concrete
  • Moisture sensitivity (if untreated)
  • Limited standardization of material properties

1.2 Mycelium Leather

Applications:

  • Footwear
  • Bags and accessories
  • Clothing
  • Automotive interiors

Advantages:

  • Alternative to animal leather without livestock production
  • Potentially lower environmental impact
  • Tunable texture and thickness

Disadvantages:

  • Mechanical durability may be inferior to high-quality natural leather
  • Often requires coatings or treatments, which may reduce biodegradability

1.3 Chitin and Chitosan

Applications:

  • Biomedical materials (wound dressings, drug delivery systems)
  • Biodegradable films
  • Water purification (adsorbents)

Advantages:

  • Biocompatible
  • Antimicrobial properties (especially chitosan)
  • Biodegradable

Disadvantages:

  • Higher cost compared to conventional synthetic polymers
  • Limited mechanical strength unless combined into composites

2.1 Enhancement of Mechanical Properties

  • Increased synthesis of chitin or β-glucans
  • Modification of cell wall architecture

Goal: to create stronger, more durable structural biomaterials.

2.2 Control of Morphogenesis

  • Regulation of hyphal branching
  • Control over mycelial density

Goal: to standardize material properties and control porosity and mechanical performance.

2.3 Bioremediation Enhancement

  • Upregulation of oxidoreductases (e.g., laccases, peroxidases)

Goal: to improve degradation of xenobiotics, plastics, and petroleum-derived pollutants.

The advantages of doing synthetic biology in fungi as opposed to bacteria

  1. Fungi are eukaryotes and are therefore capable of post-translational modifications, proper folding of complex multidomain proteins, and secretion of functional eukaryotic proteins.

  2. Filamentous fungi naturally secrete large quantities of enzymes into the extracellular environment, simplifying downstream processing in industrial applications.

  3. Mycelium forms natural three-dimensional networks, making fungi uniquely suited for growing structural materials without requiring an external scaffold. Bacteria typically require additional matrices to achieve similar structures.

Assignment Part 3: First DNA Twist Order

1. Final Project Title: Bioprinting horizontal gene transfer: a bacterial “photocamera”

Short Final Project Description: This project visualises horizontal gene transfer (HGT) in E. coli using a bioprinting system that functions as a bacterial “photocamera”. Donor and recipient strains are printed directly in spatially defined layers onto chromogenic agar; successful conjugation events are revealed by a pink-to-teal colour transition as recipients acquire the β-glucosidase gene (bglA) from Enterococcus faecalis. The project integrates synthetic biology, triparental mating, and bioart to create living images where gene transfer becomes literally visible as evolving colour.

Aim 1: Experimental Aim. The first aim of my final project is to visualise plasmid-mediated horizontal gene transfer between spatially organised E. coli populations by utilising a triparental mating system combined with direct chromogenic bioprinting on selective chromogenic agar.

This aim will be achieved through:

  • Plasmid design: cloning the bglA β-glucosidase gene from Enterococcus faecalis into a mobilizable vector backbone (e.g., pBBR1MCS series with oriT) carrying an appropriate antibiotic resistance marker, using Benchling for sequence design and Twist/IDT for gene synthesis
  • Strain construction: transforming the mobilizable bglA plasmid into donor E. coli; using pRK2013 (Addgene #37636) in a helper strain to provide conjugation machinery; maintaining an unmarked recipient strain with pink chromogenic phenotype
  • Bioprinting: spatially depositing donor, helper, and recipient cell suspensions in defined geometric patterns directly onto chromogenic agar using the lab bioprinter, creating layered arrangements that maximise cell-to-cell contact
  • HGT visualisation: incubating printed plates and observing colour transition from pink (recipient only) to teal (successful HGT of bglA) as the spatial visual readout of gene transfer; transconjugants confirmed by replating on selective agar with the corresponding antibiotic

Methods / Resources: triparental mating protocol, chromogenic agar, selective agar with antibiotic (to be confirmed based on vector choice), bioprinter, Benchling (plasmid design), Twist Biosciences (gene synthesis), pRK2013 (Addgene #37636), standard E. coli transformation workflow.

I selected Addgene, Twist Biosciences, New England Biolabs, and Opentrons because together they cover both the scientific reproducibility and the artistic visualization goals of my bacterial “photocamera” project, from plasmid sourcing and gene synthesis to cloning reagents and bioprinting automation. Addgene provides the essential pRK2013 helper plasmid for triparental mating, Twist enables rapid bglA gene synthesis from design to DNA, and NEB supports the molecular cloning workflow. Opentrons brings automation expertise relevant to spatial bioprinting of bacterial suspensions.

2. Construct design in Benchling.


Week 9 HW: Cell-free-systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Flexibility:

  • All components (DNA template, ribosomes, substrates, cofactors, chaperones, detergents) can be added, removed, or tuned in real time without the constraints of cellular membranes or homeostasis.
  • No need for cloning, transformation, culture optimization, or induction. Protein production begins within minutes to hours.
  • Proteins that are lethal to cells, misfolded in vivo, or subject to proteolytic degradation can be synthesized without harming host viability.

Control over experimental variables:

  • Reaction conditions can be precisely set and maintained.
  • Non-canonical amino acid (ncAA) incorporation is easier than metabolic incorporation in vivo: simply add the ncAA and an orthogonal tRNA/synthetase pair.
  • ATP/GTP levels and redox balance (NAD⁺/NADH) can be independently controlled.

Two cases where cell-free is more beneficial:

1: Production of toxic or membrane-disruptive proteins. These proteins lyse or kill host cells, preventing scalable in vivo production. Cell-free systems bypass cell viability entirely.

2: Incorporation of multiple non-canonical amino acids for structural/biophysical studies. Metabolic incorporation in vivo is limited by cellular uptake, toxicity, and competition with natural amino acids. Cell-free systems allow direct supplementation of ncAAs at high concentrations with orthogonal suppressor tRNAs, achieving near-complete incorporation at desired positions.


2. Describe the main components of a cell-free expression system and explain the role of each component.

A typical prokaryotic cell-free system (e.g., E. coli S30 or PURE system) contains:

Component | Role
Cell lysate or reconstituted translation machinery | Provides ribosomes (30S + 50S subunits), translation factors (IF, EF-Tu, EF-G, RF), aminoacyl-tRNA synthetases, and chaperones. In PURE systems, each component is individually purified and defined.
DNA template or mRNA | Encodes the target protein. Linear DNA (PCR product) or plasmid with a T7 (or SP6) promoter is commonly used. mRNA can be added directly if pre-synthesized.
T7 RNA polymerase (if using DNA template) | Transcribes DNA into mRNA in situ.
Amino acids (20 canonical + optional ncAAs) | Building blocks for protein synthesis.
tRNAs (charged or uncharged + synthetases) | Deliver amino acids to ribosomes; aminoacyl-tRNA synthetases charge tRNAs.
Energy substrates | ATP, GTP, CTP, UTP for transcription and translation; phosphoenolpyruvate (PEP) or creatine phosphate for ATP regeneration.
Energy regeneration enzymes | Pyruvate kinase (uses PEP → pyruvate, regenerates ATP) or creatine kinase (uses creatine phosphate).
Salts and buffer | Mg²⁺ (critical for ribosome activity and nucleotide binding), K⁺, NH₄⁺, pH ~7–8 (Tris or HEPES buffer).
Redox system | NAD⁺/NADH, reduced glutathione, or DTT to maintain reducing conditions.

3. Why is energy provision and regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

  • ATP and GTP are rapidly consumed during transcription (NTP incorporation), translation (aminoacyl-tRNA charging, peptide bond formation, ribosome translocation), and chaperone activity.
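  • Unlike living cells, cell-free reactions have no self-sustaining metabolism: reconstituted systems such as PURE lack glycolysis and oxidative phosphorylation entirely, and crude extracts retain only partial pathway activity, so ATP must be regenerated from an added energy source.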
  • Without regeneration, ATP is depleted within minutes, halting protein synthesis prematurely and drastically reducing yield.

Method to ensure continuous ATP supply:

Phosphoenolpyruvate (PEP) / pyruvate kinase system:

  • Mechanism:
    $\mathrm{PEP} + \mathrm{ADP} \xrightarrow{\text{pyruvate kinase}} \mathrm{ATP} + \mathrm{pyruvate}$
  • Advantages:
    • PEP has a very high phosphoryl transfer potential (ΔG°′ ≈ –62 kJ/mol), driving ATP regeneration efficiently.
    • Pyruvate is an end product and does not inhibit the reaction.
    • Simple to implement; widely used in commercial kits.
  • Procedure:
    1. Add PEP (e.g., 20–50 mM) and pyruvate kinase (e.g., 0.1–0.5 U/μL) to the reaction mix.
    2. Monitor reaction time; PEP-based systems can sustain synthesis for 2–6 hours or longer.
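The amount of PEP sets a rough ceiling on how much protein the reaction can make. The sketch below is a back-of-the-envelope estimate, assuming one ATP regenerated per PEP and about four ATP equivalents per amino acid added (two for tRNA charging, two GTP for elongation); it ignores transcription, initiation, and side reactions, so real yields are lower. The function name and example numbers are hypothetical.

```python
def max_protein_uM(pep_mM: float, length_aa: int,
                   atp_equiv_per_residue: float = 4.0) -> float:
    """Upper bound on protein concentration (µM) supported by a PEP pool.

    Assumes each PEP regenerates one ATP and that translation alone
    consumes `atp_equiv_per_residue` ATP equivalents per amino acid.
    """
    atp_pool_uM = pep_mM * 1000.0
    return atp_pool_uM / (atp_equiv_per_residue * length_aa)


# Hypothetical example: 20 mM PEP and a 250-residue protein.
print(max_protein_uM(pep_mM=20.0, length_aa=250))
```

Doubling the protein length halves the ceiling, which is one reason longer targets benefit from higher energy-substrate concentrations or repeated feeding.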

4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Feature | Prokaryotic (e.g., E. coli) | Eukaryotic (e.g., wheat germ, rabbit reticulocyte, insect cell, CHO)
Ribosome type | 70S (30S + 50S) | 80S (40S + 60S)
Post-translational modifications (PTMs) | Minimal (no glycosylation, limited phosphorylation) | Glycosylation, phosphorylation, acetylation, methylation
Protein folding complexity | Simpler; suitable for bacterial proteins or simple eukaryotic domains | Better for complex eukaryotic proteins requiring chaperones and PTMs
Disulfide bond formation | Limited; reducing environment unless oxidizing conditions added | More robust; endogenous PDI and chaperones present
Cost and ease | Cheaper, faster, high-throughput friendly | More expensive, slower, but necessary for authenticity
Yield | Generally higher (mg/mL range possible) | Lower (μg/mL range typical)

Protein choice for prokaryotic system:

Protein: Cas9 endonuclease (e.g., Streptococcus pyogenes Cas9)

  • Cas9 is a bacterial protein with no requirement for glycosylation or complex eukaryotic PTMs.
  • High yield is desirable for genome editing applications and structural studies.
  • E. coli cell-free systems (especially PURE or S30 extracts) produce functional Cas9 rapidly and cost-effectively.
  • Can incorporate ncAAs for labeling or crosslinking studies without eukaryotic metabolism interference.

Protein choice for eukaryotic system:

Protein: Erythropoietin (EPO) – a therapeutic glycoprotein hormone

  • EPO requires N-linked glycosylation for stability, serum half-life, and biological activity.
  • Prokaryotic systems cannot perform glycosylation; EPO produced in E. coli is inactive or rapidly degraded in vivo.
  • Eukaryotic cell-free systems (e.g., CHO lysate, insect cell extract, or wheat germ supplemented with microsomes) can glycosylate nascent chains co-translationally or post-translationally.
  • Cell-free allows rapid prototyping of glycosylation site mutants without lengthy cell-line development.

5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Challenges of membrane protein expression in cell-free systems:

  1. Hydrophobicity → aggregation: Membrane proteins have large hydrophobic transmembrane domains that aggregate and precipitate in aqueous solution.
  2. Loss of native structure and function: Without a lipid bilayer, transmembrane proteins misfold.
  3. Low solubility and yield.

Experimental design to optimize membrane protein expression:

1. Choice of membrane-mimetic environment:

Detergent micelles

  • Add nonionic detergents (e.g., Brij-35, digitonin, DDM) or zwitterionic detergents (Fos-Choline) directly to the reaction.
  • Detergents solubilize hydrophobic domains co-translationally, preventing aggregation.
  • Optimization: titrate detergent concentration (typically 0.5–2% w/v) to balance solubilization vs ribosome inhibition.

2. Use of chaperones and lipid-transfer machinery:

  • Supplement with YidC (bacterial insertase), Sec translocon components, or eukaryotic SRP (signal recognition particle) if using eukaryotic lysates.
  • Enhance folding with molecular chaperones (e.g., DnaK, GroEL/ES).

3. Template design:

  • Use constructs with solubility tags (e.g., MBP, SUMO) fused to membrane protein to aid initial solubility, followed by protease cleavage.
  • Optimize the 5′ UTR and RBS/Kozak sequence for efficient translation initiation.

4. Optimization strategy:

  • Small-scale screening: vary detergent type/concentration, liposome composition, Mg²⁺, K⁺, temperature (often lower temps ~25–30°C improve folding).
  • Functional assay readout: ligand binding, fluorescence (if

6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Reason 1: Inefficient transcription or translation initiation

Troubleshooting strategy:

  • Check DNA template integrity: run agarose gel to confirm template is intact and at expected size.
  • Verify transcription: isolate RNA from a small aliquot of the reaction and run on denaturing gel or use RT-qPCR to quantify mRNA levels. Low mRNA → transcription problem.
  • Optimize RBS/5′ UTR: test different spacing between RBS and start codon (optimal is ~5–9 nt for E. coli); use strong synthetic RBS sequences (e.g., BBa_B0034).
  • Add more T7 RNA polymerase or fresh enzyme if transcription is limiting.

Reason 2: Protein aggregation or misfolding

Troubleshooting strategy:

  • Add molecular chaperones: supplement with DnaK/DnaJ/GrpE (Hsp70 system), GroEL/ES, or trigger factor.
  • Adjust redox conditions: for proteins requiring disulfide bonds, add oxidized glutathione (GSSG) or protein disulfide isomerase (PDI). For reducing environment, add DTT or β-mercaptoethanol.
  • Lower reaction temperature: reduce from 37°C to 25–30°C to slow synthesis and allow more time for folding.
  • Check solubility: centrifuge reaction and analyze supernatant vs pellet by SDS-PAGE. If protein is in pellet, it aggregated → try detergents or co-expression of solubility tags (MBP, SUMO).

Reason 3: Rapid degradation of the target protein

Troubleshooting strategy:

  • Add protease inhibitors: include a cocktail (e.g., PMSF, leupeptin, pepstatin, aprotinin, or EDTA-free complete protease inhibitor tablets).
  • Use nuclease-treated lysates: commercial kits often pre-treat extracts to remove nucleases and reduce protease activity.
  • Time-course experiment: sample the reaction at multiple time points (e.g., 30 min, 1 h, 2 h, 4 h) and run Western blot or SDS-PAGE. If protein appears early then disappears → degradation. If it never appears → synthesis problem.
  • Stabilize protein: add ligands, substrates, or cofactors that stabilize the native fold.
  • Check sequence: ensure no internal stop codons, frameshifts, or rare codons that stall ribosomes (leading to abortive peptides).

Homework question from Kate Adamala

1. Pick a function and describe it

a1. What would your synthetic cell do?

The synthetic minimal cell (SMC) will function as a heavy metal biosensor and detoxification system. It will detect toxic mercury ions (Hg²⁺) in water and respond by producing a fluorescent signal and binding the mercury ions to reduce toxicity.

a2. What is the input and output?

  • Input: Mercury ions (Hg²⁺)
  • Output:
    1. Green fluorescent protein (GFP) signal
    2. Mercury-binding protein (MerP)

b. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

Partially, yes. A cell-free Tx/Tl system could detect Hg²⁺ and produce GFP in solution. However, encapsulation is important because:

  • it protects the reaction components,
  • increases system stability,
  • prevents diffusion of proteins,
  • allows controlled interaction with the environment.

c. Could this function be realized by genetically modified natural cells?

Yes. Genetically modified E. coli could perform the same task using the mercury resistance operon. However, synthetic minimal cells are safer because:

  • they are non-living,
  • they cannot replicate,
  • they reduce biosafety concerns,
  • they are easier to control experimentally.

d. Describe the desired outcome of your synthetic cell operation.

When mercury ions are present:

  1. Hg²⁺ enters the synthetic cell.
  2. A mercury-responsive genetic circuit activates.
  3. GFP fluorescence is produced.
  4. MerP binds mercury ions.
  5. The contaminated sample becomes fluorescent and partially detoxified.

2. Design all components that would need to be part of your synthetic cell

a. What would the membrane be made of?

The membrane will consist of:

  • POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine)
  • Cholesterol
  • DSPE-PEG2000 for additional membrane stability

Example membrane composition:

  • 70% POPC
  • 25% cholesterol
  • 5% DSPE-PEG2000
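As a rough sketch of how this composition could be turned into weighing amounts, the snippet below converts the percentages into micrograms of each lipid for a chosen total lipid amount. It assumes the percentages are mol% and uses approximate molar masses (the DSPE-PEG2000 value in particular is an average for a polydisperse product); both are assumptions, not values from the course material.

```python
# Approximate molar masses in g/mol (assumed values; check the supplier's datasheet).
MOLAR_MASS = {"POPC": 760.08, "cholesterol": 386.65, "DSPE-PEG2000": 2805.5}

# Composition from the text, interpreted here as mole fractions (an assumption).
MOL_FRACTION = {"POPC": 0.70, "cholesterol": 0.25, "DSPE-PEG2000": 0.05}


def lipid_masses_ug(total_lipid_umol: float) -> dict:
    """Return micrograms of each lipid for a given total lipid amount in micromoles."""
    if abs(sum(MOL_FRACTION.values()) - 1.0) > 1e-9:
        raise ValueError("mole fractions must sum to 1")
    # umol * g/mol = ug
    return {
        name: total_lipid_umol * fraction * MOLAR_MASS[name]
        for name, fraction in MOL_FRACTION.items()
    }


# Example: 1 umol of total lipid (hypothetical batch size).
for name, mass in lipid_masses_ug(1.0).items():
    print(f"{name}: {mass:.1f} ug")
```

Note that 5 mol% of the PEG-lipid contributes a comparatively large share of the mass because of its high molar mass.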

b. What would you encapsulate inside?

Cell-free Tx/Tl system

A bacterial transcription/translation system containing:

  • ribosomes,
  • tRNAs,
  • amino acids,
  • RNA polymerase,
  • ATP regeneration system,
  • nucleotides,
  • cofactors.

Small molecules

  • ATP
  • GTP
  • amino acids
  • Mg²⁺ ions
  • K⁺ buffer
  • phosphoenolpyruvate (energy source)

DNA plasmids

Plasmids encoding:

  1. Mercury-responsive regulator (merR)
  2. Green fluorescent protein (gfp)
  3. Mercury-binding protein (merP)
  4. Membrane pore protein (hla)

c. Which organism will the Tx/Tl system come from?

The Tx/Tl system will come from bacterial E. coli extract. A bacterial system is sufficient because:

  • mercury-responsive promoters naturally exist in bacteria,
  • bacterial Tx/Tl systems are inexpensive and robust,
  • no mammalian-specific regulation is required.

d. How will your synthetic cell communicate with the environment?

Hg²⁺ is small, but as a charged ion it crosses lipid bilayers only slowly. To improve transport efficiency, the membrane will contain pores formed by α-hemolysin.

Communication mechanism

  1. Hg²⁺ enters through α-hemolysin pores.
  2. Internal genetic circuit activates.
  3. GFP accumulates inside the vesicle.
  4. Fluorescence can be measured externally.

3. Experimental detail

a. List all lipids and genes

Lipids: POPC, cholesterol, DSPE-PEG2000

Genes: merR, merP, gfp, hla

b. How will you measure the function of your system?

  • GFP fluorescence: fluorescence microscopy, microplate reader, or flow cytometry.
  • Mercury removal: atomic absorption spectroscopy (AAS) or ICP-MS (inductively coupled plasma mass spectrometry).


Homework question from Peter Nguyen

One-Sentence Pitch

Freeze-dried cell-free biosensor systems embedded in façade panels enable buildings to detect airborne pollutants and trigger visible color changes or catalytic neutralization in response to poor air quality.

How the Idea Works

The concept integrates freeze-dried, cell-free synthetic biology systems directly into porous architectural materials such as façade tiles, wall panels, or concrete coatings. These systems contain DNA-encoded biosensors that respond to specific airborne pollutants (e.g., NOx, SO₂, formaldehyde, or particulate-associated toxins).

Upon activation by environmental moisture (rain, humidity, or condensation), the cell-free reaction initiates transcription–translation processes that produce either:

  1. A visible pigment (e.g., chromoprotein) that changes the color of the surface to signal pollution levels, and/or
  2. An inducible enzyme (e.g., oxidoreductases or peroxidases) capable of catalytically degrading certain pollutants at the material surface.

The biological components are compartmentalized in microcapsules distributed throughout the material matrix. When exposed to specific chemical triggers, the capsules activate locally, enabling spatially resolved sensing across the building envelope.

Societal Challenge / Market Need

Urban air pollution is a major public health issue linked to respiratory and cardiovascular diseases. However, pollution monitoring infrastructure is sparse, expensive, and often spatially limited. Buildings themselves represent large, underutilized surface areas in cities.

This system transforms passive architectural surfaces into distributed environmental sensors and potentially active remediation platforms. It addresses the need for:

  • Hyperlocal, real-time pollution monitoring
  • Increased public awareness of environmental conditions
  • Integration of sustainability and smart functionality into construction materials

Addressing Limitations of Cell-Free Systems

1. Water Activation

Freeze-dried systems are naturally stable in dry conditions and activate upon hydration. In architecture, this can be leveraged by designing materials that respond specifically to rain events or high humidity, which often correlate with pollutant deposition.

2. Stability

The cell-free components can be stabilized using:

  • Trehalose or other lyoprotectants
  • Polymer encapsulation
  • UV-protective coatings

Encapsulation in hydrogel microdomains within the material can further enhance thermal and environmental stability.

3. One-Time Use Limitation

  • The material can contain distributed micro-reservoirs that activate sequentially over time.
  • Replaceable surface coatings or modular façade tiles can be designed for periodic renewal.
  • Alternatively, the system can be designed as an event-reporting sensor (e.g., long-term color change after threshold exposure), functioning as a cumulative environmental record rather than a continuously active device.

Homework question from Ally Huang

1. Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting.

Long-duration spaceflight exposes astronauts to elevated levels of ionizing radiation, increasing the risk of DNA damage, cancer, and degenerative diseases. Current monitoring systems primarily measure physical radiation dose rather than biological impact. A lightweight, low-resource method to assess functional DNA damage in space would improve astronaut health monitoring and countermeasure development. Cell-free systems are particularly suitable for space missions due to their stability, low mass, and independence from living cells. Developing a molecular assay that links radiation exposure to functional gene expression capacity addresses a critical challenge for deep-space exploration missions to the Moon and Mars.

2. Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches.

Radiation-induced DNA double-strand breaks in a GFP reporter plasmid, quantified through reduction of functional protein expression in the BioBits® cell-free system.

3. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses.

Ionizing radiation induces DNA double-strand breaks (DSBs), among the most severe forms of genomic damage. DSBs disrupt gene integrity and impair transcription and translation. A plasmid encoding GFP can serve as a functional reporter: radiation-induced strand breaks decrease or abolish GFP expression in a cell-free system. Therefore, fluorescence intensity directly reflects template integrity. This strategy measures biologically meaningful damage rather than only absorbed radiation dose. By linking molecular lesions to protein synthesis capacity, the assay provides insight into how radiation may compromise essential genetic processes during spaceflight.

4. Clearly state your hypothesis or research goal and explain the reasoning behind it.

Space radiation exposure reduces functional GFP expression from a reporter plasmid in the BioBits® cell-free system in proportion to accumulated DNA damage.

Ionizing radiation generates strand breaks and base modifications that interfere with transcription and translation. If a GFP-encoding plasmid is exposed to spaceflight conditions and subsequently used in a cell-free protein expression reaction, DNA lesions should decrease protein yield. By comparing fluorescence output between spaceflight-exposed and Earth-based control plasmids, we can quantify functional DNA integrity. This approach provides a biologically relevant readout of radiation damage using minimal hardware. The research goal is to validate a compact, rapid assay capable of assessing molecular radiation damage during long-duration space missions.

5. Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc.

Identical GFP plasmid samples will be divided into: (1) spaceflight-exposed samples and (2) ground-based controls. After exposure, plasmids may be amplified using the miniPCR® (to assess amplifiability) and then expressed using the BioBits® cell-free protein expression system. GFP fluorescence will be measured with the P51 Molecular Fluorescence Viewer. Controls include no-DNA reactions (negative control) and non-exposed plasmid reactions (positive control). Fluorescence intensity will be quantified and normalized to DNA input. Reduced fluorescence in spaceflight samples relative to controls will indicate functional radiation-induced DNA damage.
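The normalization described in this plan could be sketched as follows. The function names and all fluorescence readings are hypothetical placeholders; the only logic taken from the text is subtracting the no-DNA background, dividing by DNA input, and comparing the exposed sample with the non-exposed control.

```python
def signal_per_ng(fluorescence: float, background: float, dna_ng: float) -> float:
    """Background-subtracted fluorescence per nanogram of template DNA."""
    if dna_ng <= 0:
        raise ValueError("dna_ng must be positive")
    return (fluorescence - background) / dna_ng


def relative_expression(sample: float, control: float, background: float,
                        sample_ng: float, control_ng: float) -> float:
    """Expression of the exposed plasmid as a fraction of the unexposed control."""
    control_signal = signal_per_ng(control, background, control_ng)
    if control_signal <= 0:
        raise ValueError("positive control is not above background")
    sample_signal = signal_per_ng(sample, background, sample_ng)
    # Clamp at zero so noise below background is not reported as negative expression.
    return max(0.0, sample_signal / control_signal)


# Hypothetical readings in arbitrary fluorescence units, for illustration only.
ratio = relative_expression(sample=620.0, control=1100.0, background=100.0,
                            sample_ng=10.0, control_ng=10.0)
print(f"Relative GFP expression: {ratio:.2f}")
```

A ratio below 1 would be read as reduced functional template integrity in the exposed sample, provided replicate variation is smaller than the difference.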


Homework Part B: Individual Final Project


Week 10 HW: Imaging and measurement

Homework: Final Project

1. Conjugation Frequency (Quantitative HGT Efficiency)

What is measured: The ratio of transconjugants to total recipient cells — the primary numerical output of the experiment. After overnight incubation on LB agar, cells are re-printed onto:

  • Plate A (no antibiotic) → counts all recipient colonies (pink) + donor (blue)
  • Plate B (+ ampicillin 100 µg/mL) → only AmpR transconjugants survive in recipient zones

Conjugation frequency = N(blue AmpR in recipient zone) / N(total recipient on Plate A)
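A minimal sketch of this ratio in Python is shown below. The colony counts are hypothetical, and the function assumes both counts come from the same printed area and dilution.

```python
def conjugation_frequency(transconjugants: int, total_recipients: int) -> float:
    """Transconjugants (Plate B) divided by total recipients (Plate A)."""
    if total_recipients <= 0:
        raise ValueError("total_recipients must be positive")
    if transconjugants < 0:
        raise ValueError("transconjugants cannot be negative")
    return transconjugants / total_recipients


# Hypothetical colony counts, for illustration only.
freq = conjugation_frequency(transconjugants=12, total_recipients=480)
print(f"Conjugation frequency: {freq:.3e}")
```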

Technologies: Automated colony counting via high-resolution flatbed scanning + ImageJ / OpenCFU software

2. Chromogenic Phenotype — Colorimetric Measurement (pink → blue transition)

What is measured: The spatial and quantitative shift in colony colour as a direct proxy for bglA expression and β-glucosidase enzymatic activity in transconjugants.

  • Flatbed scan of agar plates at high resolution (≥600 dpi)
  • Extract RGB pixel values per colony zone using ImageJ or Python and convert them to hue (H in the HSV colour model)
  • Pink colonies: H ≈ 330–350°; blue transconjugants: H ≈ 170–190°
  • Quantify the fractional area of blue signal within recipient-zone boundaries

Technologies: Digital image analysis (ImageJ, ColourDeconvolution plugin)
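The hue-based classification could be sketched in Python with the standard-library `colorsys` module, as below. The hue windows are the ones listed above; the pixel values are hypothetical, and real scans would also need a saturation or brightness cut-off to exclude agar background, which this sketch omits.

```python
import colorsys

PINK_HUE = (330.0, 350.0)  # degrees, from the ranges listed above
BLUE_HUE = (170.0, 190.0)  # degrees, from the ranges listed above


def hue_degrees(r: int, g: int, b: int) -> float:
    """Convert an 8-bit RGB pixel to HSV hue in degrees (0-360)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


def classify_pixel(r: int, g: int, b: int) -> str:
    """Label a pixel as 'pink', 'blue', or 'other' by its hue."""
    h = hue_degrees(r, g, b)
    if PINK_HUE[0] <= h <= PINK_HUE[1]:
        return "pink"
    if BLUE_HUE[0] <= h <= BLUE_HUE[1]:
        return "blue"
    return "other"


def blue_fraction(pixels) -> float:
    """Fraction of colony pixels (pink or blue) that are blue."""
    labels = [classify_pixel(*p) for p in pixels]
    colony = [label for label in labels if label != "other"]
    return colony.count("blue") / len(colony) if colony else 0.0


# Hypothetical pixels: one pink, two blue-green, one white background pixel.
zone = [(255, 0, 85), (0, 255, 255), (0, 255, 255), (255, 255, 255)]
print(f"Blue fraction in zone: {blue_fraction(zone):.2f}")
```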

3. Plasmid Presence in Transconjugants — Colony PCR

What is measured: Physical confirmation that the mobilizable plasmid (carrying oriT_RK2–Ptrc–RBS–bglA) has transferred into recipient cells.

  • Pick 12–24 blue AmpR colonies from Plate B recipient zones
  • Colony PCR with primers flanking bglA CDS (~950 bp amplicon) and a second primer pair spanning the oriT–Ptrc junction
  • Controls: donor lysate (positive), untransformed recipient (negative), water (NTC)

Technologies:

  • ATC Thermal Cycler using 96-Armadillo PCR plates
  • Reaction setup: Echo525 acoustic liquid handler for master mix dispensing into 384-well format (high-throughput screening of many colonies)
  • 1% agarose gel electrophoresis with ethidium bromide / SYBR Safe staining, UV transilluminator; band at ~950 bp confirms transfer

4. β-Glucosidase Enzymatic Activity — pNPG Colorimetric Assay

What is measured: Direct enzymatic activity of the bglA gene product (aryl-phospho-β-D-glucosidase from E. faecalis) in cell lysates — the most quantitative functional readout.

  • Grow 3–5 blue transconjugant colonies overnight → pellet → lyse by bead-beating or freeze-thaw
  • Add pNPG (para-nitrophenyl-β-D-glucopyranoside) substrate to lysate in 96-well microplate
  • β-Glucosidase cleaves pNPG → releases yellow para-nitrophenol
  • Read absorbance at 405 nm over time (kinetic assay, 30 min)
  • Compare: transconjugants vs. untransformed recipient vs. donor vs. blank

Technologies:

  • Spark Plate Reader or PHERAstar FSX — kinetic absorbance at 405 nm in 96-round-axygen half-deep plates
  • Reagent dispensing: Multiflo automated microplate dispenser for substrate addition
  • Data: calculate specific activity in nmol pNP / min / mg total protein (protein concentration by Bradford assay at 595 nm on same reader)
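The specific-activity calculation could be sketched as below, using the Beer–Lambert law to turn the absorbance slope into nmol of para-nitrophenol per minute. The extinction coefficient (18,000 M⁻¹ cm⁻¹), the 0.5 cm path length, the well volume, and all readings are assumptions for illustration; in practice a pNP standard curve measured in the same plate and buffer would replace the first two.

```python
def slope_per_min(times_min, absorbances) -> float:
    """Least-squares slope of absorbance versus time (A405 per minute)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbances))
    return sxy / sxx


def specific_activity(slope_sample: float, slope_blank: float,
                      epsilon_per_m_cm: float, path_cm: float,
                      well_volume_ul: float, protein_mg: float) -> float:
    """Specific activity in nmol pNP / min / mg protein."""
    net_slope = slope_sample - slope_blank                   # A405 per min
    molar_per_min = net_slope / (epsilon_per_m_cm * path_cm)  # mol/L per min
    nmol_per_min = molar_per_min * (well_volume_ul * 1e-6) * 1e9
    return nmol_per_min / protein_mg


# Hypothetical kinetic read, for illustration only.
times = [0, 10, 20, 30]
a405 = [0.10, 0.28, 0.46, 0.64]
activity = specific_activity(
    slope_sample=slope_per_min(times, a405),
    slope_blank=0.0,
    epsilon_per_m_cm=18000.0,  # assumed value for ionized pNP at 405 nm
    path_cm=0.5,               # assumed path length for the well volume
    well_volume_ul=100.0,
    protein_mg=0.01,
)
print(f"Specific activity: {activity:.1f} nmol pNP/min/mg")
```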

5. Spatial Resolution of Bioprinting — Image Analysis

What is measured: The geometric precision of the Opentrons OT-2 deposition — do donor and recipient zones remain spatially segregated, and how sharp is the boundary?

  • Scan plates before and after incubation
  • In ImageJ: draw intensity line profiles across zone boundaries → measure zone width, boundary sharpness (µm), and bleed-through area
  • Vary deposited volume (0.5 µL, 1 µL, 2 µL) and cell density (OD600 0.1–1.0) to optimize printing parameters

Technologies:

  • Flatbed scanner (600–1200 dpi)
  • ImageJ macro for automated line-profile extraction
  • Python + matplotlib for plotting bleed-through as a function of print volume
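One way to put a number on boundary sharpness is the 10–90% edge width of an intensity line profile, sketched below. This metric is a suggestion rather than something specified in the plan; the profile values are hypothetical, and the function assumes a single rising edge. The pixel size follows from the scan resolution (25,400 µm per inch divided by dpi).

```python
def pixel_size_um(dpi: float) -> float:
    """Pixel size in micrometres for a given scan resolution."""
    return 25400.0 / dpi


def edge_width_10_90(profile, pixel_um: float) -> float:
    """Distance over which a rising profile goes from 10% to 90% of its range."""
    lo, hi = min(profile), max(profile)
    span = hi - lo
    if span == 0:
        raise ValueError("profile is flat; no edge to measure")

    def crossing(level: float) -> float:
        # Linear interpolation of the first upward crossing of the target level.
        target = lo + level * span
        for i in range(1, len(profile)):
            a, b = profile[i - 1], profile[i]
            if a < target <= b:
                return (i - 1) + (target - a) / (b - a)
        raise ValueError("profile does not cross the requested level")

    return (crossing(0.9) - crossing(0.1)) * pixel_um


# Hypothetical line profile across a zone boundary (grey values).
profile = [10, 10, 10, 30, 50, 70, 90, 90, 90]
width = edge_width_10_90(profile, pixel_size_um(600))
print(f"10-90% edge width: {width:.0f} um")
```

Comparing this width across the print volumes and cell densities listed above would give one curve per condition to plot.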

Homework: Waters Part I — Molecular Weight

1.

By using the molecular weight calculator provided by the ExPASy portal, the molecular weight of the N-terminally tagged eGFP presented above was calculated to be MWth = 28,006.60 Da.

2.1.

From Figure 10.1 I chose two adjacent charge state peaks:

  • m/zₙ = 903.7148
  • m/zₙ₊₁ = 875.4421

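For adjacent peaks with charges z and z + 1, z = (m/zₙ₊₁ − 1.0073) / (m/zₙ − m/zₙ₊₁) = 874.4348 / 28.2727 ≈ 30.9, which rounds to 31. Therefore: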

  • 903.7148 → z = 31
  • 875.4421 → z = 32

2.2.

Based on the calculations of the previous segment, the experimentally measured MW of eGFP is MWexp = z × (m/z − 1.0073) = 31 × (903.7148 − 1.0073) ≈ 27,984 Da.

2.3.

By applying the mathematical formula, |MWth − MWexp| / MWth, the accuracy of the measurement is approximately 8.1 × 10⁻⁴ (about 807 ppm).
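The charge-state assignment, mass calculation, and accuracy estimate above can be reproduced with a short sketch. It uses the two m/z values and the theoretical mass quoted in this section and a proton mass of 1.0073 Da; the relative-error formula is assumed to be |MWth − MWexp| / MWth.

```python
PROTON = 1.0073  # Da


def charge_of_higher_mz_peak(mz_high: float, mz_low: float) -> float:
    """Charge z of the higher-m/z peak when the adjacent lower-m/z peak carries z + 1."""
    return (mz_low - PROTON) / (mz_high - mz_low)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M from a peak at m/z with charge z: M = z * (m/z - proton)."""
    return z * (mz - PROTON)


z_raw = charge_of_higher_mz_peak(903.7148, 875.4421)
z = round(z_raw)
mw_exp = neutral_mass(903.7148, z)
mw_th = 28006.60
rel_err = abs(mw_th - mw_exp) / mw_th

print(f"z (unrounded) = {z_raw:.2f}, assigned z = {z}")
print(f"MWexp = {mw_exp:.1f} Da")
print(f"relative error = {rel_err:.2e} ({rel_err * 1e6:.0f} ppm)")
```

Using the unrounded mass from the z = 31 peak gives a ppm value a few units away from one computed with the rounded 27,984 Da; both round to 8.1 × 10⁻⁴.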

3.

The peaks in the zoomed-in region are not clearly separated, likely due to limited instrument resolution at this m/z range. Because the isotopic peaks are only partially resolved and somewhat noisy, it is difficult to directly determine the charge state from isotope spacing alone.

However, by counting the charge states from previously assigned peaks (moving toward higher m/z and decreasing the charge by one for each peak), the zoomed-in peak at m/z ≈ 1474 corresponds to approximately the 19+ charge state (27,984 / 19 + 1.0073 ≈ 1473.8).
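As a cross-check on the peak counting, the expected m/z for each candidate charge state can be computed from the measured mass and compared with the observed peak. This is a sketch; the candidate range 15–24 is an arbitrary choice for illustration.

```python
PROTON = 1.0073   # Da
MW_EXP = 27984.0  # Da, experimental mass from section 2.2


def mz_for_charge(mw: float, z: int) -> float:
    """Expected m/z of the [M + zH]z+ ion."""
    return mw / z + PROTON


def nearest_charge(mw: float, mz_observed: float, candidates) -> int:
    """Candidate charge whose expected m/z lies closest to the observed peak."""
    return min(candidates, key=lambda z: abs(mz_for_charge(mw, z) - mz_observed))


best_z = nearest_charge(MW_EXP, 1474.0, range(15, 25))
print(f"Best charge state: {best_z}+ (expected m/z {mz_for_charge(MW_EXP, best_z):.1f})")
```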

Homework: Waters Part III — Peptide Mapping - primary structure

1.

Lysine (K): 20 residues. Arginine (R): 6 residues. Total K + R: 26.

2.

Using the ExPASy PeptideMass tool with the parameters in Figure 4: 19 peptides.
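The residue count and the in-silico digest can be approximated with a few lines of Python. The sketch uses the common trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages or mass filter, so its peptide count will not necessarily match PeptideMass with the Figure 4 settings. The sequence below is a short hypothetical stand-in, not the tagged eGFP sequence.

```python
import re


def count_k_r(sequence: str):
    """Return the number of lysine (K) and arginine (R) residues."""
    return sequence.count("K"), sequence.count("R")


def tryptic_peptides(sequence: str):
    """Split after K or R, except when the following residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


# Hypothetical toy sequence, for illustration only.
toy = "MSKGEELFTGRPVKAA"
n_k, n_r = count_k_r(toy)
peptides = tryptic_peptides(toy)
print(f"K = {n_k}, R = {n_r}, total = {n_k + n_r}")
print(peptides)
```

In the toy sequence the R is followed by P, so no cleavage occurs there and the digest yields three peptides rather than four.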

3.

The eGFP peptide map (Figure 5a) contains 21 chromatographic peaks between 0.5 and 6 minutes with >10% relative abundance. These are the labeled peaks at retention times: 0.61, 0.79, 1.20, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06, and 5.43 min. All exceed 10% relative abundance (threshold: ~1.2e6 counts, based on the tallest peak at 4.87 min with ~1.2e7 counts).

4.

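No, the numbers do not match exactly: there are more peaks (21) than predicted peptides (19). Extra peaks most plausibly come from missed cleavages, chemically modified peptides (e.g., oxidation or deamidation), non-specific cleavage products, or trypsin autolysis fragments. Conversely, incomplete ionization, co-eluting peptides, very small peptides that are poorly retained on the column, low-abundance peptides, and peptides eluting outside the 0.5–6 min window would each reduce the observed count, so the two effects partly offset each other.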

5.

The most abundant (base) peak in the spectrum is at m/z 525.76712. The charge state of the most abundant form of the peptide is 2+.

To calculate the singly charged mass, I use: [M+H]+ = (m/z × z) − (z − 1) × 1.0073

where:

  • m/z = 525.76712
  • z = 2
  • 1.0073 Da = mass of a proton

525.76712 × 2 = 1051.53424
1051.53424 − 1.0073 = 1050.52694 Da

Final result:

[M+H]+ ≈ 1050.53 Da

6.

By comparing the experimentally determined singly charged mass ([M+H]+ ≈ 1050.53 Da) with the theoretical tryptic peptide masses from the ExPASy PeptideMass tool, the peptide eluting at 2.78 min most likely corresponds to FEGDTLVNR. The theoretical monoisotopic mass ([M+H]+) is 1050.5214 Da.

Mass error: (1050.52694 − 1050.5214) / 1050.5214 × 10⁶ ≈ 5.3 ppm. The error is below 10 ppm, which is typically considered the upper confidence threshold for peptide identification.
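The two calculations in questions 5 and 6 can be written as a short sketch, using only the values quoted above and a proton mass of 1.0073 Da. The ppm result is sensitive to how many decimals of [M+H]+ are carried into the comparison, so the unrounded value is used here.

```python
PROTON = 1.0073  # Da


def singly_charged_mass(mz: float, z: int) -> float:
    """[M+H]+ from an ion observed at m/z with charge z."""
    return mz * z - (z - 1) * PROTON


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


mh_observed = singly_charged_mass(525.76712, 2)
mh_theoretical = 1050.5214  # FEGDTLVNR, monoisotopic [M+H]+
error = ppm_error(mh_observed, mh_theoretical)

print(f"[M+H]+ observed = {mh_observed:.5f} Da")
print(f"mass error = {error:.1f} ppm")
```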

7.

According to Figure 6, the sequence coverage for eGFP is 88%.

Homework: Waters Part IV — Oligomers

Oligomer | Theoretical Mass (MDa) | Observed Peak (MDa)
7FU Decamer | 3.4 | ~3.4
8FU Didecamer | 8.0 | ~8.3
8FU 3-Decamer | 12.0 | ~12.67
8FU 4-Decamer | 16.0 | ~16

Homework: Waters Part V — Did I make GFP?

Quantity | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error
Molecular weight (kDa) | 28.0066 kDa | 27.9840 kDa | 807 ppm

Yes, the data confirms the production of eGFP. The observed mass is in close agreement with the theoretical calculation (~28.0 kDa), with a relative mass error of 0.081% (807 ppm).

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

I contributed one pixel to the Q3 - H1 plate using the mKO2 fluorescent protein, but it was later overwritten by other contributions.

My involvement in the artwork was limited to placing a single pixel, which I used primarily to familiarize myself with the interface. I had also initially assumed that the canvas had few available pixels relative to the large number of course participants. Nevertheless, I found the concept of a collaborative artwork compelling, and its implementation as an interactive website was thoughtfully designed and inspiring. I also found the timelapse feature particularly valuable, as it illustrated both the temporal evolution of the image and the conceptual development of the artwork.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

1.

E. coli Lysate

Provides the core transcription–translation machinery, including ribosomes, tRNAs, aminoacyl-tRNA synthetases, metabolic enzymes, and cofactors. The BL21 (DE3) Star lysate specifically contains T7 RNA polymerase, enabling transcription from T7 promoter-driven DNA templates.

Salts / Buffer

  • Potassium Glutamate
    Maintains intracellular-like ionic strength and supports ribosomal stability and enzymatic activity.

  • HEPES-KOH (pH 7.5)
    Buffers the reaction environment to maintain optimal pH for transcription and translation.

  • Magnesium Glutamate
    Supplies Mg²⁺ ions, essential cofactors for ribosomes, RNA polymerase, ATP-dependent enzymes, and stabilization of nucleotides.

  • Potassium Phosphate Monobasic / Dibasic
    Contribute to buffering capacity and phosphate balance, supporting nucleotide metabolism and energy regeneration pathways.

Energy / Nucleotide System

  • Ribose
    Serves as a precursor for nucleotide biosynthesis, supporting sustained transcription.

  • Glucose
    Provides a carbon and energy source for endogenous metabolic enzymes in the lysate to regenerate ATP.

  • AMP, CMP, GMP, UMP
    Nucleotide monophosphates that are enzymatically converted into nucleotide triphosphates (ATP, GTP, CTP, UTP) required for RNA synthesis and energy transfer.

  • Guanine
    Precursor for GMP/GTP synthesis, supporting transcription and translation processes.

Translation Mix (Amino Acids)

  • 17 Amino Acid Mix
    Supplies the majority of standard amino acids required for protein synthesis.

  • Tyrosine
    Added separately due to solubility limitations; required as a substrate for protein synthesis.

  • Cysteine
    Added separately due to oxidation sensitivity; essential for protein synthesis and disulfide bond formation.

Additives

  • Nicotinamide
    Precursor for NAD⁺ biosynthesis, supporting redox balance and metabolic reactions required for sustained energy regeneration.

Backfill

  • Nuclease-Free Water
    Adjusts the final reaction volume while preventing degradation of DNA or RNA templates.

2.

The 1-hour PEP-NTP system directly supplies phosphoenolpyruvate (PEP) and nucleotide triphosphates (NTPs), enabling rapid, high-level protein production but limiting reaction duration due to fast energy depletion. In contrast, the 20-hour NMP-Ribose-Glucose system relies on nucleotide monophosphates and simple carbon sources that are enzymatically converted into active nucleotides and ATP, allowing slower yet sustained protein expression through metabolic energy regeneration.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

1.

sfGFP (superfolder GFP) is engineered for highly efficient folding and rapid maturation, making it particularly robust in cell-free systems. Its fast chromophore formation allows fluorescence to appear quickly, which improves signal reliability during short incubations.

mRFP1 has a relatively slower maturation time compared to GFP variants, which can delay fluorescence readout in cell-free expression systems. In addition, red fluorescent proteins often require more efficient folding conditions to achieve maximal fluorescence.

mKO2 is an orange fluorescent protein with relatively fast maturation among orange/red fluorophores, which improves early fluorescence detection. However, it is somewhat sensitive to acidic conditions, meaning pH fluctuations in the reaction can reduce fluorescence intensity.

mTurquoise2 is a cyan fluorescent protein with very high quantum yield and brightness, producing strong fluorescence even at moderate expression levels. Proper folding is important for maintaining its efficient chromophore structure and high signal output.

mScarlet-I is a bright red fluorescent protein with improved folding efficiency and photostability, but it still requires longer maturation times than GFP-like proteins. In cell-free systems, prolonged energy availability is important to allow complete chromophore maturation and maximal fluorescence.

Electra2 is a fluorescent protein optimized for specific spectral or functional properties, but its fluorescence output can depend strongly on oxygen availability because chromophore maturation requires oxidation reactions. Limited oxygen diffusion in dense cell-free reactions may therefore reduce fluorescence efficiency.

2.-4.

Cell-Free protein expression optimization strategy

This experiment aims to optimize fluorescence output in a cell-free expression system over a 36-hour incubation by systematically targeting limiting biochemical factors. I selected two fluorescent proteins with distinct maturation kinetics:

  • sfGFP (fast-folding, rapid maturation)
  • mScarlet-I (slow maturation, higher dependence on sustained expression)

To maintain interpretability, a controlled experimental design is used, where each protein is optimized along a single dominant axis:

  • translation efficiency (Mg²⁺) for sfGFP
  • energy availability for mScarlet-I

1) sfGFP Optimization (Translation-Limited Regime)

Fluorescence output of sfGFP is primarily limited by translation efficiency. Increasing magnesium glutamate concentration will enhance ribosome stability and catalytic activity, resulting in increased protein yield and fluorescence. Because sfGFP folds efficiently and matures rapidly, fluorescence output is expected to scale with total protein synthesis rather than maturation time.

Experimental Design (4 wells)

Well | Mg²⁺ concentration (mM)
1 | 7 (baseline)
2 | 9
3 | 11
4 | 13

Expected Outcome

  • enhance translation rate
  • increase total sfGFP yield
  • produce higher fluorescence intensity

2) mScarlet-I Optimization (Energy-Limited Regime)

Fluorescence output of mScarlet-I is limited by energy availability over extended incubation. Increasing glucose and ribose concentrations will prolong ATP regeneration, enabling sustained protein synthesis and improved chromophore maturation. mScarlet-I requires extended translation time and post-translational chromophore maturation. Thus, system longevity is the dominant constraint.

Experimental Design (4 wells)

Well | Glucose (g/L) | Ribose (g/L)
5 | 1.25 (baseline) | 11.6 (baseline)
6 | 2.5 | 11.6
7 | 1.25 | 16
8 | 2.5 | 16
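Wells 5–8 form a 2 × 2 factorial design, which can be generated and analyzed with a short sketch like the one below. The concentrations are the ones in the table; the fluorescence readings and the main-effect helper are hypothetical and only illustrate how the four wells could be compared.

```python
from itertools import product

GLUCOSE_G_PER_L = [1.25, 2.5]
RIBOSE_G_PER_L = [11.6, 16.0]

# Ribose is the outer factor so the well order matches the table (wells 5-8).
layout = {
    well: {"glucose": glucose, "ribose": ribose}
    for well, (ribose, glucose) in zip(range(5, 9), product(RIBOSE_G_PER_L, GLUCOSE_G_PER_L))
}


def main_effect(readout: dict, factor: str, low: float, high: float) -> float:
    """Mean readout at the high level minus mean readout at the low level."""
    high_vals = [readout[w] for w, c in layout.items() if c[factor] == high]
    low_vals = [readout[w] for w, c in layout.items() if c[factor] == low]
    return sum(high_vals) / len(high_vals) - sum(low_vals) / len(low_vals)


# Hypothetical endpoint fluorescence per well (arbitrary units).
fluorescence = {5: 100.0, 6: 140.0, 7: 120.0, 8: 180.0}

for well, conditions in layout.items():
    print(well, conditions)
print("Glucose main effect:", main_effect(fluorescence, "glucose", 1.25, 2.5))
print("Ribose main effect:", main_effect(fluorescence, "ribose", 11.6, 16.0))
```

With single wells per condition the effects cannot be separated from well-to-well noise, so replicates would be needed before drawing conclusions.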

Expected Outcome

  • extend reaction lifetime
  • increase cumulative protein production
  • allow more complete chromophore maturation

Figure: Assigned wells and adjusted reagents