Francisco Martínez

Francisco M.C

About me

I’m Francisco. I hold an MSc in Biological Sciences from CINV, UV (Chile), with wet-lab expertise in microbiology and energy metabolism, specifically studying C. elegans behavioral aversion to P. aeruginosa PA14. As an Electronic Engineer from UTFSM (Chile), I complement this by designing open-source hardware and microfluidics to bridge biological research with advanced instrumentation.

Contact info

LinkedIn GitHub YouTube WhatsApp


Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Oncological Bacteriotherapy: Iron Sequestration in the TME via Controlled Release of Stealth Siderophores under NAND Logic Gates and Biocontainment Circuits.

  • Week 2 HW: DNA-read-write-and-edit

    Nature’s Machinery for Copying DNA - oligo synthesis

  • Week 03 HW: Lab-Automation

    1. Python Script for Opentrons Artwork This artwork was generated using the HTGAA26 Opentrons Colab environment. opentron_code The design was implemented programmatically using geometric constructions and multi-color pipetting logic. To properly render Devanagari text (e.g., “चित्”) using PIL in Google Colab, system-level fonts must be installed before executing the Opentrons script. The Noto Sans Devanagari font was installed using the following commands in a separate Colab cell:
  • Week 04 HW: Protein Design Part I

    Part A. Conceptual Questions from Shuguang Zhang 1) How many molecules of amino acids are in 500 g of meat? Assume meat is ~20–25% protein: 500 g meat → ~100–125 g protein. Using ~100 Da per amino acid (given): 100 g / (100 g/mol) = 1.0 mol amino acids → ~6.0×10^23 molecules; 125 g / (100 g/mol) = 1.25 mol amino acids → ~7.5×10^23 molecules. Answer: ~10^23

  • Week 05 HW: Protein Design Part II

    Part A: SOD1 Binder Peptide Design Part 1: Generate Binders with PepMLM Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation. Original sequence sp|P00441|SODC_HUMAN Superoxide dismutase MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Variant: A4V MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence. Using the mutant SOD1 sequence as input, PepMLM Colab generated four 12-residue candidate binders

  • Week 06 HW: Genetic Circuits Part I

    DNA Assembly Assignment 1. Components of the Phusion High-Fidelity PCR Master Mix The Phusion High-Fidelity PCR Master Mix contains the main components required for accurate DNA amplification. One key component is the Phusion DNA polymerase, which synthesizes new DNA strands with high fidelity. The mix also includes dNTPs, which serve as the nucleotide building blocks for DNA synthesis. In addition, it contains a reaction buffer that maintains the proper pH and salt conditions for enzyme activity, as well as magnesium ions, which are essential cofactors for polymerase function. Together, these components support efficient and accurate PCR amplification.

  • Week 07 HW: Genetic Circuits Part II

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) 1. Advantages of IANNs over traditional Boolean genetic circuits IANNs can generate graded, weighted, and more flexible input/output responses instead of only ON/OFF logic. This makes them better suited for integrating multiple noisy biological signals and for approximating complex decision boundaries.

  • Week 09 HW: Cell-Free Systems

    Homework Part A: General and Lecturer-Specific Questions Advantages: Cell-free systems allow direct control of reaction conditions without maintaining cell viability. They are especially useful for toxic proteins and membrane proteins. Main components: Cell extract or Tx/Tl machinery, DNA template, amino acids, nucleotides, energy source, salts, and buffer. Together, these support transcription, translation, and reaction stability. Energy regeneration: ATP and GTP are rapidly consumed during transcription and translation. Continuous supply can be maintained with phosphoenolpyruvate- or maltodextrin-based regeneration systems.

  • Week 10 HW: Imaging and Measurement

    Waters Part I — Molecular Weight 1) what is the calculated molecular weight? Using the amino acid sequence provided in the assignment, I calculated the theoretical molecular weight of the construct with the ExPASy Compute pI/Mw tool. The calculator (https://web.expasy.org/compute_pi/) returned the following values:

  • Week 11 HW: Building Genomes

    Part B: Cell-Free Protein Synthesis | Cell-Free Reagents Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction. 1.1 E. coli Lysate * BL21 (DE3) Star Lysate: Provides the essential molecular machinery, including ribosomes and translation factors, while the T7 RNA Polymerase drives the transcription of DNA into mRNA.


Week 1 HW: Principles and Practices

“Oncological Bacteriotherapy: Engineered Siderophore Secretion and Safety Kill-Switch via NAND Logic Gates”

My project focuses on engineering a bacterial strain capable of sensing the tumor microenvironment and responding through the synthesis of siderophores (Salmochelin), integrated with a robust safety mechanism to prevent off-target effects.

1. General Objective

To induce the death of tumor cells through iron sequestration (Salmochelin), integrating spatial control logic circuits (NAND gate) so that the bacteria act exclusively under hypoxia and ultrasound conditions, and a biocontainment system (Kill-Switch) to guarantee host safety.

(Image: Abstract Draw)

2. Experimental Design

2.1.- The effector agent: E. coli Nissle 1917 (Locus iroBCDE, iroN)

  • Siderophore Selection: Release Salmochelin (glycosylated) instead of the siderophore Enterobactin (ENT), which is known to be neutralized by host Lipocalin-2 (Lcn2) (Saha et al. 2019). The use of Pyoverdine is discarded after technical analysis, as its complex biosynthetic pathway represents excessive metabolic stress for the bacteria.
  • Competitive Advantage: Based on the findings of Raffatellu et al. 2010, salmochelin is a siderophore that can survive in an environment with high levels of Lcn2.
  • Cytotoxic Effect (hypothetical): By depleting the labile iron pool (LIP), ferritinophagy and HIF-1α are promoted, but given the high affinity of the siderophore, mitochondrial function collapses, triggering apoptosis of cancer cells.

2.2.- Spatial Control Circuit (NAND Gate)

To avoid systemic toxicity (Pita-Grisanti et al., 2022), Salmochelin production is subject to a double de-repression. In the normal state, two repressors (LacI and TetR) block the operon. Synthesis occurs when the following conditions are met:

  • Condition A (Hypoxia): The P_vgb promoter turns off → The bacteria stop producing LacI-LVA → The LacI lock degrades.
  • Condition B (Ultrasound): The stimulus inactivates the P_tlpA repression system → The bacteria no longer have functional TetR-LVA → The TetR lock degrades.
  • Result: The Salmochelin genes (iroBCDE, iroN) must be under a promoter with binding sites for LacI and TetR. Only when there is NO LacI (Hypoxia) AND NO TetR (Ultrasound) can the polymerase pass and produce Salmochelin and its receptor. Basal expression (leakiness) in healthy tissues is eliminated, protecting the patient’s iron homeostasis. The bacteria only release siderophores inside the tumor and when the physician decides.
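
The de-repression logic above can be sketched as a small truth table. This is a minimal Boolean sketch rather than a kinetic model: it takes the promoter behavior exactly as stated in this section, assumes ideal (non-leaky) repressors, and uses a hypothetical function name, `salmochelin_on`. As written, the operon is ON only when both stimuli are present, which is the same as saying that neither repressor is present.

```python
from itertools import product


def salmochelin_on(hypoxia: bool, ultrasound: bool) -> bool:
    """Idealized logic of section 2.2 (assumes no leakiness)."""
    laci_present = not hypoxia      # per the text: hypoxia stops LacI-LVA production
    tetr_present = not ultrasound   # per the text: ultrasound removes functional TetR-LVA
    # The iro promoter is transcribed only when both repressors are absent.
    return (not laci_present) and (not tetr_present)


for hypoxia, ultrasound in product([False, True], repeat=2):
    state = salmochelin_on(hypoxia, ultrasound)
    print(f"hypoxia={hypoxia!s:5} ultrasound={ultrasound!s:5} -> iro operon ON: {state}")
```

Only the last row (hypoxia and ultrasound both present) switches the operon on in this idealized picture; real repressor leakiness would have to be measured experimentally.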

2.3.- Biocontainment System (Biosafety Kill-Switch)

Implementation of a cascade activation mechanism. This design minimizes metabolic stress by not producing the drug sensor while the bacteria are colonizing the tumor.

  • Oxygen Sensor (Normoxia): P_cyo promoter → araC gene. In the tumor (Hypoxia), the P_cyo promoter is turned off, so the AraC protein is not produced. In healthy tissue or the bloodstream (Normoxia), the P_cyo promoter is activated and synthesizes the AraC protein, which acts as the “key” for the drug.
  • Drug Sensor: P_bad promoter → ccdB gene (Toxin). The P_bad promoter has an AND activation logic: it is only activated if the AraC protein is present (only in normoxia) AND Arabinose (the drug) is administered.
  • Result: As long as the bacteria remain in the hypoxic TME, they are immune to the drug, as they lack the AraC protein. If the bacteria escape to oxygenated tissues or if the tumor is significantly reduced, the presence of oxygen allows the synthesis of AraC, making the bacteria sensitive to the drug for their total elimination.
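
The cascade can be summarized the same way. The sketch below is an idealized Boolean reading of the two sensors described above; the function name `ccdb_expressed` and the scenario labels are hypothetical, and promoter leakiness and protein half-lives are ignored.

```python
def ccdb_expressed(normoxia: bool, arabinose: bool) -> bool:
    """Idealized kill-switch cascade from section 2.3."""
    arac_present = normoxia            # P_cyo drives araC only when oxygen is present
    return arac_present and arabinose  # P_bad requires AraC AND arabinose


SCENARIOS = {
    "hypoxic tumor, no drug": (False, False),
    "hypoxic tumor, drug given": (False, True),
    "oxygenated tissue, no drug": (True, False),
    "oxygenated tissue, drug given": (True, True),
}

for label, (normoxia, arabinose) in SCENARIOS.items():
    fate = "killed by CcdB" if ccdb_expressed(normoxia, arabinose) else "survives"
    print(f"{label:30} -> {fate}")
```

In this reading, only bacteria in oxygenated tissue that are also exposed to arabinose express the toxin.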

2.4.- Note

Lipocalin-2 (Lcn2) can deplete intracellular iron in macrophages, generating an inflammatory response and promoting an attack on bacteria (Nairz et al., 2015); however, in cancer, Tumor-Associated Macrophages (TAMs) have an altered transcriptional program and are instead immunosuppressed. When the tumor size is significantly reduced, macrophages may reprogram toward an M1 phenotype, assisting in the elimination of the bacteria alongside the kill-switch.

3. Governance and Policy Goals

To ensure that this bacteriotherapy contributes to an ethical and safe future, I have defined the following goals:

  • Goal 1: Ensure Safety & Efficacy
    • Sub-goal 1a: Implement multi-layered biocontainment (NAND gate + Kill-switch) to prevent systemic iron depletion in the host.
    • Sub-goal 1b: Prevent bacterial environmental persistence through strictly controlled clinical waste protocols (analogous to radioguided surgery) to ensure no bacterial escape into public sewage.
  • Goal 2: Promote Constructive Applications & Equity
    • Sub-goal 2a: Develop the platform using probiotic E. coli Nissle 1917 to keep production costs low and accessible for developing regions.

4. Governance Actions Matrix

To manage the specific risks of genetic drift and metabolic stress associated with the AraC/CcdB cascade design (Kill-Switch only), I propose the following interconnected actions:

| Aspect | Action 1: Technical (Biocontainment) | Action 2: Academic (Transparency) | Action 3: Regulatory (Standards) |
| --- | --- | --- | --- |
| Purpose | Implementation of a genetic circuit that prevents activation of the kill-switch when the synthetic bacteria are administered and, at the same time, minimizes metabolic stress in the hypoxic tumor microenvironment (TME). | Establishment of a shared database focusing on the leakiness and metabolic burden of hypoxia-responsive sensors like P_cyo. | Development of standardized certification to ensure clinical reliability. |
| Design | Implementation of a cascade activation mechanism, where the P_cyo promoter acts as a gatekeeper for AraC synthesis. P_bad then requires both AraC AND Arabinose to express the CcdB toxin. | Peer-reviewed publication of the “Stress-Safety Curve” of the AraC/CcdB cascade to define at what point mutation frequency increases. | Technical standards (e.g., ISO/TC 276) that define the Mean Time To Failure (MTTF) of the cascade before plasmid loss. |
| Assumptions | Assumes the araC and ccdB genes remain functional. Risk of plasmid loss is acknowledged. Assumes the P_cyo promoter remains tightly repressed in hypoxia to prevent metabolic burden. | Assumes that labs will transparently share data when the P_cyo sensor leaks and kills the bacteria prematurely. | Assumes regulatory bodies (like ISP/FDA) have the expertise to audit compliance with established rules. |
| Risks | Genetic Drift: Loss-of-function mutations in araC or ccdB generate bacteria that are immune to the drug. Overexpression of araC due to promoter mutations generates Arabinose-sensitive bacteria even within the tumor. | Dual-Use: Detailed performance maps of the P_cyo sensor could be exploited to design oxygen-evading pathogens. | Innovation Lag: Excessive bureaucracy in certification may delay new targeted gene therapies. |

Note on Scope: While my policy goals include the NAND gate for spatial control, the Governance Matrix above focuses specifically on the Kill-switch (AraC/CcdB cascade). I have prioritized this component because it represents the highest risk for environmental escape and is the “weakest link” in terms of biocontainment due to potential genetic drift.

5. Scoring and Prioritization

I have scored my proposed governance actions against my specific Policy Goals (1=Best, 3=Worst):

| Policy Goal | Action 1: Technical (Cascade Stability) | Action 2: Academic (Transparency) | Action 3: Regulatory (Waste & Standards) |
| --- | --- | --- | --- |
| Goal 1: Safety (Non-maleficence) | 2 | 2 | 2 |
| Goal 2: Equity (Low-cost Access) | 2 | 1 | 3 |
| Feasibility (Implementation) | 1 | 2 | 2 |

Technical Note: In this assessment, Action 1 is scored based on the current plasmid-based design. However, to minimize the probability of genetic drift, I propose that the final implementation should transition to genomic integration of the AraC/CcdB cascade. This would ensure that the safety circuits are permanently embedded in the bacterial DNA, significantly reducing the risk of mutants compared to episomal (plasmid) expression.

6. Final Recommendation and Prioritization

Based on the scoring, I prioritize a combination of Action 1 (Technical) and Action 3 (Regulatory).

  • Priority and Audience: My recommendation is directed to the authorities responsible for verifying the safety and efficacy of new gene therapies. The technical design alone is insufficient without a clear regulatory framework.

  • Trade-offs: I have chosen to prioritize these over Action 2 (Academic Transparency) to mitigate the Dual-Use risk. Sharing detailed performance data of the P_cyo and P_bad sensors would promote global equity, but that information could be exploited to design pathogens that evade oxygen-based immune barriers. I consider limiting its release a necessary trade-off for public safety.

  • Assumptions and Uncertainties: A key assumption is that the safety and efficacy criteria defined by the respective authorities are sound; in practice, they could carry undetected safety biases due to a lack of evidence.

7. Ethical Reflection

The most significant ethical concern that arose for me is the Dual-Use Dilemma in the context of biocontainment. I realized that the very mechanisms I am designing to ensure a therapy is safe (like high-precision oxygen sensors) are the same tools that could be used to engineer biological threats that are harder to detect or neutralize.

8. References

  1. Saha P., et al. (2019). “Enterobactin, an iron chelating bacterial siderophore, arrests cancer cell proliferation” Biochemical Pharmacology.
  2. Raffatellu M., et al. (2010). “Lipocalin-2 resistance of Salmonella enterica serotype Typhimurium confers an advantage during life in the inflamed intestine” Cell Host & Microbe.
  3. Pita-Grisanti V., et al. (2022). “Understanding the Potential and Risk of Bacterial Siderophores in Cancer” Frontiers in Oncology.
  4. Nairz M., et al. (2015). “Lipocalin-2 ensures host defense against Salmonella Typhimurium by controlling macrophage iron homeostasis and immune response” European Journal of Immunology.

9. AI Prompts

In compliance with HTGAA 2026 guidelines, I certify that this homework was developed with the assistance of Gemini (Google AI).

  • Image Generation: “Used Nano Banana to generate image “Abstract_Draw.png” from the detailed description of my project.”
  • Assessment: “Create a table in markdown format that allows me to compare the design of my logic circuit with a standard design.”
  • Troubleshooting: “Technical troubleshooting for personal profile configuration in the repository and helps to transfer the project in markdown format”

Week 2 HW: DNA-read-write-and-edit

1. Benchling & In-silico Gel Art

1.1 Extraction of Restriction Site Data

Restriction site positions for 10 enzymes were extracted from Benchling for the λ phage genome.
Based on these cut positions, fragment sizes were calculated for each individual enzyme digestion.

(Image 1: Benchling screenshot showing restriction sites)


1.2. Construction of the Combinatorial Space

All possible enzyme combinations were generated from the 10 enzymes
(2¹⁰ − 1 = 1023 combinations).

For each combination:

  • Fragment sizes were computed.
  • A discrete size axis was built from all unique fragment lengths.
  • A binary matrix (combination × fragment sizes) was constructed, indicating presence/absence of each fragment.

This forms the complete “puzzle space” of available molecular weight distributions.
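
A minimal sketch of how such a puzzle space could be built is shown below. The enzyme names and cut positions are placeholders for illustration only (the real positions come from Benchling), and the helper names `fragment_sizes` and `all_digests` are hypothetical; only the λ genome length (48,502 bp, linear) is a fixed fact.

```python
from itertools import combinations

GENOME_LENGTH = 48502  # lambda phage genome length in bp (linear molecule)

# Placeholder cut positions for illustration; real values come from Benchling.
CUT_SITES = {
    "EnzA": [10000, 30000],
    "EnzB": [20000],
    "EnzC": [5000, 45000],
}


def fragment_sizes(cuts, length=GENOME_LENGTH):
    """Fragment lengths of a linear molecule cut at the given positions."""
    edges = [0] + sorted(set(cuts)) + [length]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


def all_digests(cut_sites):
    """Map every non-empty enzyme combination to its set of fragment sizes."""
    names = sorted(cut_sites)
    digests = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            cuts = [pos for name in combo for pos in cut_sites[name]]
            digests[combo] = set(fragment_sizes(cuts))
    return digests


digests = all_digests(CUT_SITES)
# Discrete size axis: every unique fragment length seen in any digest.
size_axis = sorted({size for sizes in digests.values() for size in sizes})
# Binary presence/absence matrix: one row per combination, one column per size.
matrix = {combo: [int(size in sizes) for size in size_axis]
          for combo, sizes in digests.items()}

print(len(digests))  # 7 combinations for three enzymes (2**3 - 1)
```

With ten enzymes instead of three, the same loop yields the 2¹⁰ − 1 = 1023 combinations described above.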


1.3. Definition of the Target Pattern

A desired visual pattern was manually designed using an interactive executable that allows activation/deactivation of bands on the real fragment size axis.

The output is a binary target matrix (lanes × fragment sizes).

(Image 2: manually designed target pattern)

The notebook used to manually design the target band pattern can be accessed here:

➡️ Manual Pattern Generator Notebook


1.4. Similarity Comparison and Optimization

For each lane of the target pattern:

  • The lane was compared against all 1023 possible enzyme combinations.
  • A contrast-based similarity metric was computed (rewarding matches and penalizing undesired bands).
  • A score was assigned.
  • The highest-scoring combination was selected.

The resulting set of enzyme combinations per lane represents the optimal reconstruction of the desired pattern.
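
The selection step can be sketched as follows. The exact metric lives in the linked notebook; the scoring function here (matched bands minus a penalty for each undesired band) is only an assumed stand-in that rewards matches and penalizes extras, and the names `contrast_score` and `best_combination` are hypothetical.

```python
def contrast_score(target, candidate, penalty=1.0):
    """Reward bands present in both rows; penalize bands the target lacks."""
    hits = sum(1 for t, c in zip(target, candidate) if t and c)
    extras = sum(1 for t, c in zip(target, candidate) if c and not t)
    return hits - penalty * extras


def best_combination(target, candidates):
    """Return the candidate name whose band pattern scores highest."""
    return max(candidates, key=lambda name: contrast_score(target, candidates[name]))


# Toy example on a 4-position size axis (1 = band present, 0 = absent).
target_lane = [1, 0, 1, 0]
candidates = {
    "A": [1, 0, 1, 0],  # exact match
    "B": [1, 1, 1, 1],  # matches both bands but adds two undesired ones
    "C": [1, 0, 0, 0],  # matches only one band
}
print(best_combination(target_lane, candidates))  # A
```

Raising `penalty` makes the search favor clean lanes over lanes that hit every target band but add clutter.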

(Image 3: algorithm output + selected enzyme combinations)

The notebook implementing the combinatorial search and similarity scoring can be accessed here:

➡️ Optimization Algorithm Notebook


1.5 Result

After optimization and manual refinement, the following enzyme combinations were selected per lane:

Manually Forced Lanes

  • Lane 4 → (‘XhoI’, ‘KpnI’)
  • Lane 7 → (‘PvuII’, ‘XhoI’)

Best-Scoring Lanes (Algorithm Output)

  • Lane 3 → (‘EcoRI’, ‘KpnI’)
  • Lane 6 → (‘EcoRI’, ‘KpnI’)
  • Lane 8 → (‘EcoRI’, ‘KpnI’)
  • Lane 10 → (‘EcoRI’, ‘KpnI’)
  • Lane 12 → (‘BamHI’,)
  • Lane 13 → (‘EcoRI’, ‘KpnI’)
  • Lane 14 → (‘BamHI’,)
  • Lane 16 → (‘EcoRI’, ‘KpnI’)
  • Lane 17 → (‘BamHI’,)
  • Lane 19 → (‘EcoRI’, ‘KpnI’)
  • Lane 20 → (‘BamHI’,)
  • Lane 21 → (‘EcoRI’, ‘KpnI’)

The final configuration, combining algorithm-selected and manually adjusted lanes, produces a readable macroscopic pattern that spells:

CHITRA

This demonstrates that complex visual structures can be reconstructed using only physically valid restriction enzyme digestion combinations drawn from the complete 1023-piece combinatorial space.

3. DNA Design Challenge

3.1. Gene Selection: E. coli IroB

For my final project, I am designing a therapeutic bacterium for solid tumor treatment using E. coli Nissle 1917 as the chassis. My goal is to synthesize the iroBCDEN operon, which is responsible for Salmochelin production, allowing the bacteria to scavenge iron more efficiently in the tumor microenvironment.

For this sequence design exercise, I am focusing on the key enzyme: IroB (C-glycosyltransferase). To ensure optimal gene expression, proper protein folding, and to minimize metabolic burden, I selected the native IroB sequence from Escherichia coli (Accession: WP_016242764.1). Using a sequence native to the chassis species is expected to reduce the risk of expression and folding problems compared with importing foreign variants.

3.2. Protein Sequence Input

By starting directly with the pure amino acid sequence, I utilized a Reverse Translation approach to build a pristine DNA sequence tailored for my chassis.

Protein Sequence (WP_016242764.1):

MRILFVGPPLYGLLYPVLSLAQAFRVNGHEVLIASGGQFAQKAAEAGLVVFDAAPGLDSEAGYRHHEAQR KKSNIGTQMGNFSFFSEEMADHLVEFAGHWRPDLIIYPPLGVIGPLIAAKYDIPVVMQTVGFGHTPWHIK GVTRSLTDAYRRHNVGTTPRDMAWIDVTPPSMSILENDGEPIIPMQYVPYNGGAVWEPWWERRPERKRLL VSLGTVKPMVDGLDLIAWVMDSASEVDAEIILHISANARSDLRSLPSNVRLVDWIPMGVFLNGADGFIHH GGAGNTLTALHAGIPQIVFGQGADRPVNARVVAERGCGIIPGDVGLSSNMINAFLNNRSLRKASEEVAAE MAAQPCPGEVAKSLITMVQKG

3.3. Codon Optimization

Using the Twist Bioscience Expression Optimization tool, I reverse-translated the E. coli protein sequence into an optimized DNA sequence. (https://www.idtdna.com/CodonOpt)

Optimization Parameters & Constraints:

  • Chassis: Escherichia coli.
  • Genetic Logic Compatibility: I explicitly removed internal restriction sites (GGTCTC for BsaI and GAAGAC for BbsI). This is a critical engineering step to ensure the synthesized gene is fully compatible with the Golden Gate Assembly method, which is required for assembling my NAND logic gate circuit.
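
A check like the one described above can be sketched in a few lines. The recognition sequences are the standard ones for BsaI (GGTCTC) and BbsI (GAAGAC); the function names are hypothetical. Both strands are scanned, since a Type IIS site on the reverse strand would also be cut.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC"}


def find_sites(seq, sites=TYPE_IIS_SITES):
    """Return {enzyme: [0-based start positions]} for sites on either strand."""
    seq = "".join(seq.upper().split())  # drop spaces between codons
    hits = {}
    for enzyme, site in sites.items():
        for motif in {site, reverse_complement(site)}:
            start = seq.find(motif)
            while start != -1:
                hits.setdefault(enzyme, []).append(start)
                start = seq.find(motif, start + 1)
    return {enzyme: sorted(positions) for enzyme, positions in hits.items()}


print(find_sites("AAA GGT CTC AAA"))  # forward-strand BsaI site
print(find_sites("TTGTCTTCTT"))       # BbsI site on the reverse strand
print(find_sites("ATG AGA ATT"))      # no sites
```

An empty dictionary means the sequence is free of both sites on both strands.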

Optimized IroB DNA Sequence:

ATG AGA ATT TTA TTT GTT GGA CCG CCG CTC TAC GGC CTG CTG TAT CCG GTG CTG AGC CTG GCG CAG GCG TTC CGC GTC AAC GGC CAC GAG GTG CTG ATT GCC TCC GGC GGG CAG TTT GCG CAG AAA GCG GCG GAA GCC GGT CTG GTG GTG TTT GAT GCC GCG CCG GGC CTG GAC TCT GAA GCG GGT TAC CGC CAT CAC GAA GCG CAG CGC AAA AAA AGC AAC ATT GGC ACC CAG ATG GGT AAC TTC AGC TTC TTC TCT GAA GAA ATG GCC GAT CAC CTG GTT GAG TTT GCC GGT CAC TGG CGT CCG GAC CTG ATT ATC TAT CCG CCG CTG GGT GTG ATT GGT CCG CTG ATT GCG GCA AAA TAT GAC ATC CCG GTG GTT ATG CAG ACC GTC GGC TTT GGT CAC ACG CCG TGG CAC ATC AAA GGC GTG ACC CGC AGC CTG ACC GAT GCC TAT CGC CGT CAC AAC GTT GGC ACC ACA CCG CGT GAT ATG GCG TGG ATC GAC GTC ACA CCG CCA AGC ATG AGC ATC CTG GAA AAC GAC GGT GAG CCG ATC ATT CCG ATG CAG TAT GTG CCG TAC AAC GGT GGT GCG GTG TGG GAG CCG TGG TGG GAG CGT CGT CCG GAG CGC AAG CGC CTG CTG GTG AGC CTG GGT ACG GTG AAA CCG ATG GTG GAC GGT CTG GAT CTG ATT GCC TGG GTG ATG GAC AGC GCC AGC GAA GTT GAT GCG GAG ATC ATC CTG CAC ATC TCT GCC AAC GCG CGC AGC GAC CTG CGC TCG CTG CCG AGC AAC GTG CGC CTG GTT GAT TGG ATT CCG ATG GGT GTG TTC CTG AAC GGT GCG GAC GGC TTT ATC CAC CAC GGT GGT GCG GGT AAC ACC CTG ACT GCG CTG CAT GCC GGT ATT CCG CAG ATT GTC TTT GGT CAG GGT GCT GAC CGC CCG GTT AAT GCG CGT GTG GTG GCG GAG CGT GGC TGT GGG ATC ATC CCG GGT GAT GTC GGC CTG TCC AGC AAC ATG ATC AAC GCC TTC CTG AAC AAC CGC TCG CTG CGT AAA GCC TCT GAA GAG GTT GCG GCA GAA ATG GCG GCG CAG CCG TGC CCG GGT GAG GTG GCC AAA TCG CTG ATC ACC ATG GTT CAG AAA GGG
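
One way to confirm that a reverse-translated sequence still encodes the intended protein is to translate it back and compare. The sketch below builds the standard genetic code table and translates a codon-spaced DNA string; the function name `translate` is hypothetical, and any character other than A, C, G, or T is ignored so that spacing does not matter.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (coding strand, 5'->3') to protein."""
    dna = "".join(ch for ch in dna.upper() if ch in "ACGT")
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First eight codons of the optimized sequence above.
print(translate("ATG AGA ATT TTA TTT GTT GGA CCG"))  # MRILFVGP
```

Passing the full optimized sequence to `translate` and comparing the result with the protein sequence in section 3.2 would verify the design end to end.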

3.4. Protein Production Technologies

To produce the IroB protein from the newly designed and optimized DNA sequence, two main technological approaches can be employed: cell-dependent (in vivo) and cell-free (in vitro) systems.

1. Cell-Dependent Method (In vivo expression)

This is the traditional recombinant protein production method. The optimized iroB DNA sequence would be cloned into an expression plasmid containing a strong promoter and a Ribosome Binding Site (RBS). This plasmid is then transformed into a bacterial host, such as Escherichia coli BL21(DE3) for high-yield lab-scale production, or directly into our therapeutic chassis, E. coli Nissle 1917. The living bacteria act as bio-factories, using their native cellular machinery to express the protein during their growth phase.

2. Cell-Free Protein Synthesis (CFPS) (In vitro expression)

Alternatively, a cell-free system (such as the PURE system or an E. coli cell extract) can be used. This technology strips away the living cell and uses only the essential biological machinery (RNA polymerases, ribosomes, tRNAs, amino acids, and energy molecules) mixed in a tube. By adding our linear or plasmid DNA directly into this mixture, the IroB protein can be synthesized in a few hours. This method is highly advantageous for rapid prototyping and testing of genetic circuits, as it bypasses the need for cell transformation and culturing.

DNA sequence to Protein

  1. Transcription: Under the control of a hypoxia-sensitive promoter (part of my NAND gate logic), the bacterial RNA Polymerase enzyme recognizes and binds to the specific promoter sequence located just upstream of our iroB gene. The enzyme unwinds the double-stranded DNA and uses the template strand to synthesize a single-stranded messenger RNA (mRNA). It reads through our optimized sequence, creating an exact RNA copy, and stops when it reaches a terminator sequence.
  2. Translation: Once the mRNA is transcribed, the bacterial ribosome binds to the Ribosome Binding Site (RBS) on the mRNA, which positions it at the start codon (AUG in the mRNA, ATG in the DNA). Because the sequence was codon-optimized for E. coli, transfer RNAs (tRNAs) carrying specific amino acids are expected to recognize the mRNA codons (three-nucleotide sequences) efficiently, reducing the chance of ribosome stalling. The ribosome links these amino acids together through peptide bonds, moving along the mRNA until it reaches a stop codon (UAA, UAG, or UGA). At this point, the newly synthesized IroB C-glycosyltransferase is released and folds into its 3D structure to perform its catalytic function.
  3. Function: Once folded, IroB will begin glycosylating enterobactin within the cytoplasm to produce the therapeutic Salmochelin.

3.5 How does it work in nature/biological systems?

Historically, the rule in biology was “one gene, one protein.” However, we now know that a single gene can produce multiple different protein variants (isoforms) through mechanisms that alter the mRNA transcript before it is translated. At the transcriptional (and early post-transcriptional) level, there are two primary mechanisms for this:

1. Alternative Splicing

In eukaryotic cells, genes are composed of coding regions (exons) and non-coding regions (introns). When RNA polymerase transcribes the gene, it creates a precursor mRNA (pre-mRNA) that contains both. During a process called alternative splicing, a cellular complex called the spliceosome removes the introns and joins the exons together. However, the spliceosome can choose to include or skip certain exons. Depending on which combination of exons is spliced together to form the mature mRNA, the ribosome will translate completely different protein isoforms, each potentially having different structural domains or functions, all originating from the exact same DNA sequence.

2. Alternative Promoters (Alternative Transcription Start Sites)

A single gene can possess multiple promoters (the DNA sequence where RNA polymerase binds to initiate transcription). Depending on which promoter the cell activates—often influenced by environmental signals or tissue type—transcription will start at different points along the gene. If transcription starts at a downstream promoter, the resulting mRNA will be shorter and will lack the initial genetic instructions. When translated, this produces a truncated version of the protein, often missing specific signaling sequences or regulatory domains present in the full-length version.


4. Plasmid Construction and In Silico Validation (Phase I: iroB)

This section documents the construction of the initial expression vector in Benchling, starting from the optimized iroB gene and culminating in a verified plasmid assembly ready for future expansion.

4.1. Genetic Cassette Design and Optimization

The expression cassette was built systematically in Benchling, starting with the codon-optimized sequence for the iroB gene. To ensure functionality, modularity, and future purifiability, specific genetic parts were integrated:

  • Components: The cassette includes a Promoter (Pro), Ribosome Binding Site (RBS), the iroB CDS, and a strong transcriptional Terminator (Ter).
  • C-Terminal tag: A 7xHis-tag was added immediately upstream of the STOP codon to allow for future protein purification via affinity chromatography.
  • Modular Restriction Sites: To create a standardized “BioBrick-like” part, the entire cassette was flanked with unique restriction enzyme sites: NotI (GCGGCCGC) at the 5’ end and XbaI (TCTAGA) at the 3’ end.

Figure 1: SBOL Diagram of the iroB Expression Cassette. The symbols denote (from left to right): NotI RERS, Promoter, RBS, iroB CDS, 7xHis-Tag, Terminator, and XbaI RERS.

Technical Note: The XbaI site was placed immediately following the terminator to encompass the entire modular cassette. Because the selected terminator sequence ends in ‘TA’ (and not ‘GA’), the junction does not create an overlapping Dam methylation site (GATC, as would occur in GATCTAGA). This serendipitous sequence alignment means the enzyme will not be blocked by Dam methylation at this junction, allowing for efficient cleavage during laboratory procedures.
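
The junction check described in this note can be automated. The sketch below flags any XbaI site (TCTAGA) that overlaps a Dam motif (GATC) on either side, i.e. GATCTAGA upstream or TCTAGATC downstream; the function name and the test strings are hypothetical examples, not the actual construct sequence.

```python
XBAI = "TCTAGA"
DAM = "GATC"


def dam_blocked_xbai_sites(seq: str):
    """Return 0-based starts of XbaI sites that overlap a GATC motif."""
    seq = seq.upper()
    blocked = []
    start = seq.find(XBAI)
    while start != -1:
        # Two flanking bases on each side are enough to form an overlapping GATC.
        window = seq[max(0, start - 2): start + len(XBAI) + 2]
        if DAM in window:
            blocked.append(start)
        start = seq.find(XBAI, start + 1)
    return blocked


print(dam_blocked_xbai_sites("AAGATCTAGAAA"))  # upstream 'GA' creates GATC
print(dam_blocked_xbai_sites("AATATCTAGAAA"))  # upstream 'TA' does not
print(dam_blocked_xbai_sites("CCTCTAGATCCC"))  # downstream 'TC' creates GATC
```

Running it on the assembled plasmid sequence would also cover the downstream side of the site, which the note above does not discuss.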


4.2. Vector Selection and In Silico Assembly

The commercial vector pTwist Amp High Copy (2221 bp) was selected as the backbone for this phase. (https://www.twistbioscience.com/products/genes/vectors?tab=catalog-vectors)

Benchling Assembly Process:

  1. Backbone Preparation: The pTwist map was imported, identifying the Multiple Cloning Site (MCS; the region between coordinates ~73 and ~245) as the optimal insertion point. Specifically, coordinate 200 was selected to ensure that critical elements like the origin of replication (ori) and the ampicillin resistance gene (AmpR) remained undisturbed.
  2. Insert Preparation: The 1326 bp iroB modular cassette was defined using the flanking NotI (5’ end) and XbaI (3’ end) recognition sites. This allows any researcher to infer the intended Forward direction of the gene by identifying the positions of these specific landmarks on the plasmid map. Modularity is also ensured, allowing the entire expression cassette to be excised and transferred to different vectors in future iterations of the project.
  3. Assembly Simulation: Using Benchling’s molecular biology tools, a Gibson Assembly was simulated to insert the designed iroB modular cassette into the pTwist MCS, resulting in a final circular plasmid of exactly 3561 bp.

Figure 2: Circular map of the assembled pTwist-iroB-cassette plasmid (3561 bp).


4.3. Results and Validation: Virtual Digest

To validate the structural integrity of the design, a Virtual Enzymatic Digest was performed using NotI and XbaI. The simulation results account for the redistribution of nucleotides at the restriction sites following the cleavage:

  • Fragment 1 (Vector Backbone - pTwist): 2228 bp. This represents the original 2221 bp pTwist sequence plus 7 bp of restriction-site sequence that stays with the backbone after cleavage (2 bp from NotI, GC^GGCCGC, and 5 bp from XbaI, T^CTAGA, counted on the top strand).
  • Fragment 2 (iroB Expression Cassette): 1333 bp. This comprises the 1326 bp optimized cassette plus the remaining 7 bp of restriction-site sequence (6 bp from NotI and 1 bp from XbaI).

The sum of these fragments confirms a total plasmid length of 3561 bp.
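
The fragment arithmetic can be reproduced with a short calculation. This sketch assumes that the 1326 bp cassette sits directly between an 8 bp NotI site and a 6 bp XbaI site, and that the NotI site begins at a hypothetical 0-based index of 200; fragment sizes depend only on the spacing between cuts, not on that starting index. Cut offsets are the standard top-strand positions (GC^GGCCGC and T^CTAGA).

```python
BACKBONE = 2221                       # pTwist Amp High Copy, bp
CASSETTE = 1326                       # iroB cassette without flanking sites, bp
NOTI_SITE, NOTI_CUT = "GCGGCCGC", 2   # NotI cuts GC^GGCCGC
XBAI_SITE, XBAI_CUT = "TCTAGA", 1     # XbaI cuts T^CTAGA


def circular_fragments(length, cuts):
    """Top-strand fragment sizes of a circular molecule, largest first."""
    cuts = sorted(set(c % length for c in cuts))
    sizes = []
    for i, cut in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        sizes.append((nxt - cut) % length or length)  # single cut -> full length
    return sorted(sizes, reverse=True)


insert_start = 200  # hypothetical 0-based index of the NotI site
plasmid_len = BACKBONE + len(NOTI_SITE) + CASSETTE + len(XBAI_SITE)
noti_cut = insert_start + NOTI_CUT
xbai_cut = insert_start + len(NOTI_SITE) + CASSETTE + XBAI_CUT

print(plasmid_len)                                            # 3561
print(circular_fragments(plasmid_len, [noti_cut, xbai_cut]))  # [2228, 1333]
```

Under these assumptions the calculation reproduces the 3561 bp total and the 2228 bp and 1333 bp bands reported above.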

Figure 3: Virtual agarose gel electrophoresis (1% agarose). Lane 1: DNA Ladder. Lane 2: pTwist-iroB digested with NotI/XbaI, yielding two distinct and sharp bands at 2228 bp and 1333 bp. This result confirms successful in silico assembly and validates that the iroB optimized sequence is free of internal restriction sites for the selected enzymes.


4.4. Future Work: Iterative Design

This validated plasmid serves as the foundational “chassis” for the project. The next engineering phases involve:

  • Promoter Re-engineering: Replacing the current constitutive promoter with a Boolean Logic (e.g., NAND gate) promoter designed to respond to hypoxia and ultrasound-linked stimuli.

  • Operon Completion: Sequentially assembling the remaining genes (iroC, iroD, iroE, and iroN) into the cassette to generate a single polycistronic iroBCDEN operon.

  • Clinical-Grade Vector Redesign: The current backbone includes an antibiotic-resistance marker, which is not ideal for therapeutic applications due to biosafety and regulatory concerns. Future versions of the construct should transition to a non-antibiotic plasmid maintenance system appropriate for clinical use.

  • Biocontainment Strategy (Kill Switch Evaluation): A toxin–antitoxin-based containment module is a candidate approach. In such systems, continuous expression of an antitoxin neutralizes a stable toxin; loss or inhibition of the antitoxin can result in growth arrest or cell death. The stability, leakiness, and escape frequency of this strategy must be experimentally evaluated.

  • Expression Burden Mitigation: Full expression of the iroBCDEN complex may impose significant metabolic and translational burden. Strategies such as orthogonal translation systems (e.g., orthogonal ribosomes) or alternative burden-mitigation approaches should be assessed to improve stability and performance.

5. Theoretical Questions: DNA Read, Write, & Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

  • Target: I would sequence the genome of the Panther Chameleon (Furcifer pardalis) and the Common Octopus (Octopus vulgaris), focusing on the gene families of reflectins and the regulatory proteins of iridophores/chromatophores.
  • Why: Current biological reporters, such as Fluorescent Proteins (FPs), have a fundamental limitation: high stability leads to poor temporal resolution. Once an FP is expressed or activated, it remains fluorescent for hours, “smearing” the signal and masking real-time dynamics. This makes it impossible to observe rapid “on/off” pulses in neural circuits (like serotonergic vs. octopaminergic crosstalk) or the precise timing of a synthetic logic gate.

By sequencing these organisms, I aim to discover the genetic basis of reversible structural color. Unlike fluorescence, which requires excitation light that can cause phototoxicity and photobleaching, reflectins change their optical properties through rapid conformational shifts. In the context of my cancer-targeting project, these proteins could serve as “dynamic reporters” for my NAND logic gate. They would allow me to observe, in vitro, the moment the bacteria detect ultrasound or hypoxia and, crucially, see the signal fade soon after the stimulus stops. This would provide a level of kinetic resolution and biophysical feedback that is hard to reach with standard fluorescent proteins, enabling the study of fast enzymatic transitions and synaptic-like communications without “staining” the entire experimental field.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

  • Selected Technology: PacBio HiFi Sequencing combined with Iso-Seq.
  • Rationale: Since the goal is to find functional proteins with specific structural kinetics, I need to resolve not just the genome, but the full-length isoforms of the proteins being expressed in the skin cells. PacBio HiFi provides the extreme accuracy (99.9%) and long reads necessary to assemble these complex, repetitive protein domains without the errors of short-read platforms.

Detailed Technical Questions:

  • Is your method first-, second- or third-generation or other? How so?
    • It is a third-generation technology (Single Molecule, Real-Time). It sequences individual DNA molecules as they are synthesized by a polymerase in a Zero-Mode Waveguide (ZMW), allowing for real-time observation of base incorporation.
  • What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
    • Input: High Molecular Weight (HMW) genomic DNA and full-length mRNA (for Iso-Seq) from dermal tissue.
    • Preparation steps: 1. Extraction: Specialized lysis to maintain long-strand integrity. 2. SMRTbell Library Prep: Ligation of hairpin adapters to create circular DNA templates. 3. Size Selection: Ensuring only long fragments (>10kb) are loaded to maximize information per read.
  • What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
    • Process: A polymerase at the bottom of a ZMW incorporates fluorescently labeled nucleotides. As each base is added, it emits a light pulse.
    • Base Calling: The system records the color and duration of these pulses. Because the template is circular, the polymerase reads it multiple times (Circular Consensus Sequencing), which allows the software to “correct” any random errors and produce a HiFi read of extremely high quality.
  • What is your output of your chosen sequencing technology?
    • The output is a BAM file containing highly accurate, long-read sequences, ready for de novo assembly of the structural color gene clusters.
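
The error-correcting effect of Circular Consensus Sequencing described above can be illustrated with a toy majority vote. This is a deliberately simplified sketch: it assumes the passes are already aligned and of equal length, whereas the real CCS software aligns the passes and uses a probabilistic model that also handles insertions and deletions. The function name and the example reads are hypothetical.

```python
from collections import Counter

def consensus(passes):
    """Majority-vote base at each position across pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )

# Five hypothetical passes over the same circular template.
passes = [
    "ACGTTGCA",
    "ACGTAGCA",  # random error at position 5
    "ACCTTGCA",  # random error at position 3
    "ACGTTGCA",
    "ACGTTGGA",  # random error at position 7
]
print(consensus(passes))
```

Because each pass carries errors at different positions, every error is outvoted by the other passes, which is why more passes give a more accurate HiFi read.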

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

  • Target: I want to synthesize a modular genetic circuit containing a NAND logic gate that integrates two environmental sensors (ultrasound-responsive promoters and hypoxia-inducible factors) to drive the expression of the iroB cluster for salmochelin production, including a safety kill-switch.
  • Why: This construct is the core of my final project: a targeted cancer therapy. The goal is to engineer bacteria that only produce potent iron-sequestering siderophores (salmochelins) within the tumor microenvironment (hypoxia) and under external activation (ultrasound). This ensures the therapy is localized, minimizing systemic toxicity. Synthesizing this specific construct via Twist would allow me to perform the first in vitro validations of the logic gate’s precision.
  • Sequence: I don’t have the complete cassette sequence yet; I need to redesign the promoter, add the Kill-Switch and add the iroBCDEN complex. Preliminary sequence: TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGATGAGAATTTTATTTGTTGGACCGCCGCTCTACGGCCTGCTGTATCCGGTGCTGAGCCTGGCGCAGGCGTTCCGCGTCAACGGCCACGAGGTGCTGATTGCCTCCGGCGGGCAGTTTGCGCAGAAAGCGGCGGAAGCCGGTCTGGTGGTGTTTGATGCCGCGCCGGGCCTGGACTCTGAAGCGGGTTACCGCCATCACGAAGCGCAGCGCAAAAAAAGCAACATTGGCACCCAGATGGGTAACTTCAGCTTCTTCTCTGAAGAAATGGCCGATCACCTGGTTGAGTTTGCCGGTCACTGGCGTCCGGACCTGATTATCTATCCGCCGCTGGGTGTGATTGGTCCGCTGATTGCGGCAAAATATGACATCCCGGTGGTTATGCAGACCGTCGGCTTTGGTCACACGCCGTGGCACATCAAAGGCGTGACCCGCAGCCTGACCGATGCCTATCGCCGTCACAACGTTGGCACCACACCGCGTGATATGGCGTGGATCGACGTCACACCGCCAAGCATGAGCATCCTGGAAAACGACGGTGAGCCGATCATTCCGATGCAGTATGTGCCGTACAACGGTGGTGCGGTGTGGGAGCCGTGGTGGGAGCGTCGTCCGGAGCGCAAGCGCCTGCTGGTGAGCCTGGGTACGGTGAAACCGATGGTGGACGGTCTGGATCTGATTGCCTGGGTGATGGACAGCGCCAGCGAAGTTGATGCGGAGATCATCCTGCACATCTCTGCCAACGCGCGCAGCGACCTGCGCTCGCTGCCGAGCAACGTGCGCCTGGTTGATTGGATTCCGATGGGTGTGTTCCTGAACGGTGCGGACGGCTTTATCCACCACGGTGGTGCGGGTAACACCCTGACTGCGCTGCATGCCGGTATTCCGCAGATTGTCTTTGGTCAGGGTGCTGACCGCCCGGTTAATGCGCGTGTGGTGGCGGAGCGTGGCTGTGGGATCATCCCGGGTGATGTCGGCCTGTCCAGCAACATGATCAACGCCTTCCTGAACAACCGCTCGCTGCGTAAAGCCTCTGAAGAGGTTGCGGCAGAAATGGCGGCGCAGCCGTGCCCGGGTGAGGTGGCCAAATCGCTGATCACCATGGTTCAGAAAGGGCATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

  • Selected Technology: Silicon-based high-throughput chemical synthesis (Twist Bioscience platform).
  • Rationale: For a complex circuit involving multiple promoters, a CDS (iroB), and a kill-switch, I need extreme precision and the ability to synthesize large quantities of different variations of the circuit. Twist’s technology uses a silicon platform that miniaturizes the traditional phosphoramidite chemistry, allowing for the synthesis of thousands of genes simultaneously with high fidelity and low cost, which is ideal for prototyping complex logic gates like mine.

Detailed Technical Questions:

  • What are the essential steps of your chosen synthesis method?
    • Phosphoramidite Cycle: The DNA is built base-by-base (A, T, C, G) on a silicon chip. Each addition follows a cycle of 4 steps: De-protection (preparing the strand), Coupling (adding the base), Capping (preventing errors), and Oxidation (stabilizing the bond).
    • Assembly: Since the chemical process can only print short pieces (oligos), these pieces are harvested from the chip and assembled into the full 2.5 kb circuit.
    • Error Correction: The assembled DNA is treated with error-correction enzymes and then sequence-verified, so that the delivered construct matches the designed sequence.
  • What are the limitations of your synthesis method (if any) in terms of speed, accuracy, scalability?
    • Speed: The chemical synthesis is fast, but the complete process (assembly, quality control, and shipping) takes about 2 to 3 weeks.
    • Accuracy: As the DNA strand gets longer, the chance of errors increases, which is why we must assemble smaller, verified fragments to build a large circuit.
    • Scalability: It is highly scalable (thousands of genes at once), but it still depends on traditional chemicals, unlike newer enzymatic methods.
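
The accuracy limitation above follows from how per-cycle losses compound: if each coupling step succeeds with efficiency e, the fraction of strands that reach full length n is roughly e^(n-1). The sketch below illustrates this; the 99.5% coupling efficiency is an assumed illustrative value, not a Twist specification, and `full_length_fraction` is a hypothetical helper.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.995):
    """Fraction of strands with no failed coupling: efficiency^(n - 1)."""
    return coupling_efficiency ** (length_nt - 1)

for length in (20, 100, 200, 300):
    fraction = full_length_fraction(length)
    print(f"{length:>4} nt: {fraction:.1%} full-length")
```

The steep drop with length is the reason long constructs are assembled from short oligos instead of being synthesized in one pass.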

5.3 DNA Edit

(i) What DNA would you want to edit and why?

  • Target DNA: The genome of a therapeutic strain of Escherichia coli (like E. coli Nissle 1917).
  • Why: I need to integrate my synthetic NAND-iroB circuit into the bacterial chromosome. If the circuit stays on a plasmid, it could be lost or vary in copy number. By editing the bacterial genome to insert the circuit into a “safe harbor” locus, I keep the circuit at a single stable copy, which makes the therapy and the kill-switch behave more consistently.

(ii) What technology or technologies would you use to perform these DNA edits and why?

  • Selected Technology: CRISPR-Cas9 mediated Recombineering.
    • Rationale: CRISPR-Cas9 is extremely precise at “cutting” the DNA at a specific site in the bacterial genome, and Recombineering allows us to “paste” my large 2.5 kb synthetic circuit into that gap with high efficiency.

Detailed Technical Questions:

  • How does your technology of choice edit DNA? What are the essential steps?
    • Targeting: A Guide RNA (gRNA) leads the Cas9 protein to the exact spot in the bacterial genome.
    • Cutting: Cas9 creates a double-strand break (DSB) in that spot.
    • Insertion: Using homologous recombination (promoted by the λ Red proteins in recombineering) and a donor template I provide, the synthetic NAND-iroB circuit is “pasted” at the cut site, becoming a permanent part of the bacteria’s DNA.
  • What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
    • Design steps: I need to design the gRNA (to target the genome) and the Homology Arms (flanking sequences that match the insertion site).
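    • Input: The Cas9 enzyme, the gRNA, the λ Red recombination functions (usually supplied on a helper plasmid), competent cells of the target strain, and my synthetic DNA construct (the circuit synthesized by Twist) flanked by the homology arms.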
  • What are the limitations of your editing methods (if any) in terms of efficiency or precision?
    • Efficiency: In bacteria, the rate of successful “pasting” (HDR) can be low, often requiring selection markers (like antibiotic resistance) to find the edited cells.
    • Precision: There is a small risk of off-target effects, where Cas9 might cut in the wrong place, though this is rare in simple bacterial genomes.
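
The gRNA design step can be sketched as a search for 20-nt protospacers that sit immediately 5' of an NGG PAM, which is the requirement for the commonly used SpCas9. This is a minimal illustration under stated assumptions: it scans the top strand only, does no off-target scoring, and the `locus` sequence is a hypothetical stand-in for the real insertion site.

```python
import re

def find_protospacers(seq, spacer_len=20):
    """Yield (start, spacer, pam) for each spacer followed by an NGG PAM.

    Only the top strand is scanned; a real design tool would also scan
    the reverse complement and score off-target matches.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1), match.group(2)

# Hypothetical 27 nt stretch of the insertion locus.
locus = "TTACGATCGATCGTACGTAC" + "TGG" + "ATAT"
for start, spacer, pam in find_protospacers(locus):
    print(start, spacer, pam)
```

The homology arms would then be taken from the genomic sequence on either side of the predicted cut, which lies 3 bp upstream of the PAM.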

Week 03 HW: Lab-Automation

1. Python Script for Opentrons Artwork

This artwork was generated using the HTGAA26 Opentrons Colab environment (code: opentron_code). The design was implemented programmatically using geometric constructions and multi-color pipetting logic.

Figure: opentron_design

To properly render Devanagari text (e.g., “चित्”) using PIL in Google Colab, system-level fonts must be installed before executing the Opentrons script. The Noto Sans Devanagari font was installed using the following commands in a separate Colab cell:

    !sudo apt-get update -qq
    !sudo apt-get install -y -qq fonts-noto-core fonts-noto-unhinted
    !fc-list | grep -i "Devanagari" | head -n 20

2. Post-Lab Questions

2.1 Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

A relevant example of automation enabling novel biological applications is Nielsen et al. (2016), “Genetic circuit design automation,” Science.

In this study, the authors developed an automated design–build–test workflow to systematically engineer genetic logic circuits in living cells. Rather than constructing gene circuits through manual trial-and-error, they integrated computational modeling, standardized genetic parts, and high-throughput experimental validation to design functional Boolean logic gates such as AND, OR, NOR, and more complex multi-layer circuits.

The key innovation was the automation of circuit design and screening. Computational tools were used to predict circuit behavior based on promoter strength, repressor activity, and regulatory architecture. These predictions were then experimentally validated using high-throughput plate-based assays, allowing many circuit variants to be constructed and tested in parallel. Automated measurement of reporter outputs (e.g., fluorescence) enabled quantitative evaluation of logic performance, signal thresholds, and leakiness.

Automation significantly reduced the combinatorial complexity inherent in multi-input genetic circuit design. Instead of manually constructing and testing a few variants, the workflow enabled systematic exploration of many possible architectures, improving robustness and reproducibility. This approach demonstrated that genetic logic circuits can be engineered in a scalable and programmable manner, similar to electronic circuit design.

This paper shows how automation transforms synthetic biology from artisanal genetic assembly into an engineering discipline with predictive modeling and systematic validation.

2.2 Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

For my final project, I intend to engineer a bacterial therapeutic system for oncology applications. The core design involves a plasmid encoding a NAND logic gate integrating multiple tumor-associated inputs, including hypoxia, elevated lactate levels, and ultrasound stimulation. Only when specific tumor microenvironment conditions are satisfied would the circuit activate expression of a therapeutic cassette (e.g., the iroBCDEN complex).

Automation would be essential to systematically design and validate this multi-input logic system. I propose implementing an automated design–build–test workflow focused on high-throughput screening of circuit variants.

  1. Combinatorial plasmid assembly Promoter variants responsive to hypoxia, lactate, and ultrasound would be modularly assembled using combinatorial DNA assembly methods (e.g., Golden Gate). An acoustic liquid handler (e.g., Echo) could transfer defined promoter and RBS fragments into specified wells to systematically generate circuit variants.

  2. RBS strength tuning Ribosome binding site variants of defined translation strengths would be introduced to modulate repressor expression levels. This step allows fine control of expression thresholds and minimization of basal leakiness, which is critical for achieving accurate NAND logic behavior.

  3. High-throughput culture setup Following transformation and colony selection, bacterial variants would be distributed into 96-well plates using liquid handling robotics. This enables parallel testing of multiple architectures under standardized growth conditions.

  4. Controlled environmental testing Each well would be exposed to defined combinations of normoxia/hypoxia conditions, graded lactate concentrations, and ultrasound stimulation (applied externally). This systematic input matrix allows evaluation of all Boolean input states.

  5. Automated reporter quantification A fluorescent reporter would be used during circuit prototyping prior to therapeutic deployment. Fluorescence measurements using a plate reader (e.g., PHERAstar) would quantify output across all input combinations, enabling assessment of Boolean fidelity, dynamic range, activation thresholds, and leakiness.
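
The input matrix from step 4 can be enumerated programmatically before it is handed to a liquid handler. The sketch below is only an illustration: it treats each input as on/off (graded lactate concentrations would add more levels), and the replicate count, well ordering, and the `plate_layout` helper are all assumptions, not part of an existing protocol.

```python
from itertools import product
from string import ascii_uppercase

INPUTS = ("hypoxia", "lactate", "ultrasound")

def plate_layout(inputs=INPUTS, replicates=3, columns=12):
    """Map every Boolean input combination (with replicates) to a well."""
    conditions = [
        state
        for state in product((False, True), repeat=len(inputs))
        for _ in range(replicates)
    ]
    layout = {}
    for i, state in enumerate(conditions):
        # Fill row by row: A1..A12, then B1..B12, and so on.
        well = f"{ascii_uppercase[i // columns]}{i % columns + 1}"
        layout[well] = dict(zip(inputs, state))
    return layout

layout = plate_layout()
print(len(layout), "wells used")
print("A1:", layout["A1"])
print("B12:", layout["B12"])
```

With three inputs and three replicates, the eight Boolean states occupy 24 wells, leaving room on a 96-well plate for several circuit variants or for blanks and controls.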

Because multi-input genetic logic circuits require careful balancing of transcriptional and translational parameters, manual testing would be slow and prone to variability. Automation enables parallelized combinatorial screening and quantitative validation before integrating the therapeutic cassette.

By integrating automation into the circuit development pipeline, this approach would accelerate optimization of tumor-specific logic and improve safety and precision in engineered bacterial cancer therapies.

3. Final Project Ideas

  1. Gated Siderophore Bacteriotherapy: My first project is a programmable bacterial therapy that targets tumors by expressing a salmochelin siderophore cassette only under highly controlled conditions. The control logic is a two-input gate: hypoxia provides spatial specificity, and ultrasound provides clinician timing. Mechanistically, both inputs are implemented through DNA-binding repressors that toggle promoter accessibility. This week I analyzed the lambda cI DNA-binding domain (PDB 1LMB) as a structural model for repressor–operator control, which maps directly onto the TlpA39–P_tlpA thermal switch used for ultrasound activation. Next, I’m integrating the dual-repressor logic into a single promoter architecture and validating it with sequence/structure design tools.

  2. Neuroengineering - Metabolic Calcium Control: My second project is a closed-loop neuroengineering circuit to keep neuronal activity in a safe range. The input is lactate, a simple metabolic signal that rises in stressed tissue. I use a lactate-responsive promoter to drive a nanobody-based controller that tunes calcium entry when activity becomes too strong. I’ll test it in C. elegans touch neurons using the mec-4d degeneration model, where calcium dynamics can be imaged in vivo. The goal is a genetic feedback system that links metabolism to stable neural signaling.

  3. Ultrasound-Triggered Genetic Switches: My third project is the enabling technology behind my tumor-targeting bacteria: using ultrasound as a non-invasive control signal. The core idea is to build biological transducers—such as gas vesicles and mechanosensitive channels—that convert focused ultrasound into a reliable genetic switch. That switch becomes an external “ON command” you can combine with internal signals like hypoxia to build multi-input logic in living cells. So this project turns ultrasound into a general remote-control layer, and my bacteriotherapy project is the first concrete use case.

Figure: High-level architecture of the genetic control system. Hypoxia, ultrasound, and lactate sensing modules feed into a promoter-logic layer that gates expression of the therapeutic payload (either a siderophore cassette or a nanobody), enabling multi-input control over when and where the output is produced.

Week 04 HW: Protein Design Part I

Part A. Conceptual Questions from Shuguang Zhang

1) How many molecules of amino acids are in 500 g of meat?

Assume meat is ~20–25% protein: 500 g meat → ~100–125 g protein.
Using ~100 Da per amino acid (given):

  • 100 g / (100 g/mol) = 1.0 mol amino acids → ~6.0×10^23 molecules
  • 125 g / (100 g/mol) = 1.25 mol amino acids → ~7.5×10^23 molecules

Answer: ~6–7.5×10^23 molecules (roughly one mole, on the order of 10^23–10^24).
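
The same estimate as a short calculation, so the assumptions are easy to vary. This is a minimal sketch: the protein fractions and the ~100 g/mol per amino acid come from the text above, while the function name is only illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole

def amino_acid_molecules(meat_g, protein_fraction, residue_mass_g_per_mol=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction
    moles = protein_g / residue_mass_g_per_mol
    return moles * AVOGADRO

low = amino_acid_molecules(500, 0.20)
high = amino_acid_molecules(500, 0.25)
print(f"{low:.1e} to {high:.1e} amino acid molecules")
```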

2) Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Food proteins are digested into amino acids/peptides, then your cells rebuild human proteins according to your genome and regulation. You recycle building blocks; you do not copy the animal’s body plan.

3) Why are there only 20 natural amino acids?

The translation system (genetic code + tRNAs + synthetases + ribosome) is optimized around a standard set that provides a broad, efficient chemical toolkit. Expanding it is costly because it would require coordinated changes across the whole decoding machinery (and most proteins). (Note: nature also uses rare genetically encoded additions like selenocysteine/pyrrolysine in some lineages.)

4) Can you make other non-natural amino acids? Design some new amino acids.

Yes—chemistry and engineered translation can incorporate noncanonical amino acids. Examples:

  • Azido-alanine (Ala–N3): bioorthogonal “click” handle for labeling.
  • p-benzoyl-phenylalanine: UV-activated crosslinker to trap interactions.
  • Bipyridyl-alanine: metal-chelating side chain for catalysis/materials.
  • Fluoroleucine: tunes hydrophobicity/stability and provides a 19F NMR probe.

5) Where did amino acids come from before enzymes and before life started?

Abiotic synthesis from simple precursors (e.g., atmospheric/energy-driven reactions), mineral-catalyzed chemistry (e.g., hydrothermal settings), and extraterrestrial delivery (meteorites). Prebiotic chemistry can generate amino acids without enzymes.

6) If you make an α-helix using D-amino acids, what handedness would you expect?

A helix built from D-amino acids is the mirror of the L-form helix.
Answer: D-amino-acid α-helices are expected to be left-handed (L-amino-acid α-helices are typically right-handed).

7) Can you discover additional helices in proteins?

Yes. Beyond the canonical α-helix, proteins can contain 3₁₀ helices, π helices, and short helical turns. They can be identified by backbone hydrogen-bond patterns and secondary-structure assignment algorithms (e.g., DSSP/STRIDE) and validated by structural data (X-ray/cryo-EM/NMR).

8) Why are most molecular helices right-handed?

Because proteins use L-amino acids, and for L-residues the right-handed α-helix minimizes steric clashes and optimizes backbone H-bond geometry and side-chain packing. Left-handed helices are generally less favorable for L-residues.

9) Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

β-strands have backbone H-bond donors/acceptors; exposed “sheet edges” can form intermolecular H-bonds, effectively zipping molecules together.
Driving forces: backbone hydrogen bonding + hydrophobic packing (and release of ordered water), often producing very stable “stacked” β-structures.


Part B: Protein Analysis and Visualization

1) Briefly describe the protein and why I selected it

I selected the lambda repressor (cI) (PDB 1LMB) because it is a well-resolved DNA-binding transcriptional repressor and provides a direct structural model for the TlpA/TlpA39–P_tlpA thermal switch used in ultrasound-triggered bacterial circuits. In both systems, a repressor binds an operator/promoter to block transcription, and regulation occurs by changing the repressor’s ability to bind DNA; therefore, cI is an ideal, structurally validated example to analyze DNA binding, secondary structure, and regulatory interfaces using 3D visualization tools. (rcsb.org)

Bacteriophage lambda repressor (cI), N-terminal DNA-binding domain bound to operator DNA
RCSB PDB: 1LMB (X-ray, 1.80 Å)


2) Identify the amino acid sequence of your protein.

How long is it?

  • Sequence length: 92 amino acids

What is the most frequent amino acid?

  • Most frequent amino acid: A (Alanine) — 11 occurrences
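
The length and composition can be checked directly from the chain sequence. A minimal sketch, using the wild-type sequence exactly as it is listed in Part C3; the variable names are illustrative.

```python
from collections import Counter

# 1LMB DNA-binding domain sequence, as listed in Part C3.
CI_NTD = ("STKKKPLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGAL"
          "FNGINALNAYNAALLAKILKVSVEEFSPSIAREIYEMYEAVS")

counts = Counter(CI_NTD)
print("length:", len(CI_NTD))
print("most frequent:", counts.most_common(3))
```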

How many protein sequence homologs are there for your protein?

Running UniProt BLAST with the 1LMB protein sequence returned 231 homologous sequences (hits) in UniProtKB. The top matches are annotated as phage repressors / HTH-type transcriptional regulators (lambda/lambdoid-like repressors).

Protein family

cI is a helix-turn-helix (HTH) DNA-binding transcriptional repressor, part of the lambda/lambdoid phage repressor family that controls the lysis–lysogeny switch in temperate bacteriophages.


Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

  1. Deep Mutational Scans (ESM2)

I used ESM2 to generate an unsupervised deep mutational scan of my protein based on language-model likelihood scores (heatmap of all single substitutions).

Pattern / standout example:
Because the heatmap is plotted as heatmap[:, 2:], the x-axis starts at residue 3 (1-based).

  • Most deleterious mutation: D37→P (heatmap: x=34, y=P, z≈-7.89). Proline is strongly disfavored here, consistent with disruption of local secondary structure (proline is a common helix breaker).
  • Most favorable mutation: K23→L (heatmap: x=20, y=L, z≈+3.95), suggesting this substitution is well tolerated in the local sequence context according to the model.

  2. Latent Space Analysis (protein embeddings + 3D t-SNE)

I embedded the provided dataset (n=15,177 proteins) using protein language model embeddings (320D) and reduced dimensionality with 3D t-SNE. The resulting map forms local neighborhoods where nearby points represent proteins with similar sequence features (t-SNE is most reliable for local similarity).

Do neighborhoods approximate similar proteins?
Yes. Proteins in the same neighborhood tend to share related sequence motifs and often similar functions/families.

Placing my protein in the map (via nearest neighbors):
My exact sequence is not present as a point in the provided dataset, so I embedded my sequence with the same model and located it by its closest neighbors in embedding space. Nearest neighbors (top 5):

index | distance | TSNE1 | TSNE2 | TSNE3  | annotation (short)
1124  | 0.607    | 1.152 | 1.019 | -6.889 | lambda cI repressor, DNA-binding domain
1152  | 1.080    | 1.113 | 1.027 | -6.865 | HTH-like match (Nostoc punctiforme)
1149  | 1.168    | 1.099 | 1.009 | -6.878 | HTH-like match (E. coli)
1153  | 1.224    | 1.065 | 0.997 | -6.873 | HTH-like match (P. aeruginosa)
1128  | 1.257    | 1.105 | 1.005 | -6.878 | HigA antitoxin (HTH regulator)

These neighbors are consistent with my protein being a helix-turn-helix (HTH) DNA-binding regulator, and indicate that my sequence lies in an HTH/transcription-factor neighborhood. Colab notebook (Latent space section)
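
The nearest-neighbor lookup used to place the protein can be sketched as follows. This is an illustration under assumptions: Euclidean distance is assumed as the metric, the three toy vectors are hypothetical 3-D stand-ins for the 320-D embeddings, and `nearest_neighbors` is not a function from the course notebook.

```python
import math

def nearest_neighbors(query, embeddings, k=5):
    """Return the k (distance, label) pairs closest to `query`."""
    distances = [
        (math.dist(query, vector), label)
        for label, vector in embeddings.items()
    ]
    return sorted(distances)[:k]

# Hypothetical toy embeddings; the real ones have 320 dimensions.
embeddings = {
    "protein_a": (1.0, 1.0, -7.0),
    "protein_b": (2.0, 1.0, -6.5),
    "protein_c": (5.0, -2.0, 3.0),
}
query = (1.0, 1.0, -6.5)
for distance, label in nearest_neighbors(query, embeddings, k=2):
    print(f"{label}: {distance:.3f}")
```

Searching in the full embedding space, not in the t-SNE coordinates, avoids the distortion that t-SNE introduces for anything beyond local neighborhoods.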

C2. Protein Folding

  1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

I folded the 92 aa lambda repressor DNA-binding domain using ESMFold and compared it to the experimental structure (PDB 1LMB). The prediction shows a mainly alpha-helical fold consistent with an HTH-like DNA-binding domain, so it matches the expected overall topology. Minor differences are expected because 1LMB is solved in a protein–DNA complex, while ESMFold predicts the protein without DNA.

  3. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

I tested ESMFold predictions for (i) a point mutation predicted as favorable by the ESM2 scan (K23L), (ii) a strongly disfavored point mutation (D37P), and (iii) a large-segment replacement (10-aa alanine stretch: positions 31–40 → AAAAAAAAAA).

Across all three variants, the predicted structures remain predominantly alpha-helical and preserve the same overall fold/topology by qualitative visual comparison. Differences, if any, appear mainly local (subtle shifts in helix/loop geometry), rather than a global collapse or refolding. Conclusion: For this small HTH-like domain, the overall fold appears resilient to these mutations and to the tested segment-level replacement (at least at the level of ESMFold-predicted coordinates).

Note: ESM2 mutation scores reflect sequence plausibility, not a direct folding energy. In my tests, even a strongly disfavored mutation (e.g., D37→P) did not collapse the global fold in ESMFold, suggesting the overall topology is robust. The mutation scan is more informative for identifying specific constrained positions (likely functional/structural hotspots) than for predicting global unfolding from a single “worst-score” mutation.

C3. Protein Generation

  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

I used the backbone coordinates from PDB 1LMB (protein chain 4) as input to ProteinMPNN to generate sequence candidates compatible with the same fold.

Original (WT):
STKKKPLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALLAKILKVSVEEFSPSIAREIYEMYEAVS

ProteinMPNN (sample 0):
GPGRKPLTEEELEAAKKLKAIYEERKEELNLSQAKVAELLGVSQSTVSALFNGERAFNLEIAKKLAEILKIEVSEFSPELAKKIAEEEKKIE

Sequence recovery: 0.5109 (~51% of positions match the original).
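
Sequence recovery is the fraction of positions where the designed residue equals the wild-type residue. A minimal sketch that recomputes it from the two sequences listed above; the function name is illustrative and is not part of ProteinMPNN.

```python
WT = ("STKKKPLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGAL"
      "FNGINALNAYNAALLAKILKVSVEEFSPSIAREIYEMYEAVS")
DESIGN = ("GPGRKPLTEEELEAAKKLKAIYEERKEELNLSQAKVAELLGVSQSTVSAL"
          "FNGERAFNLEIAKKLAEILKIEVSEFSPELAKKIAEEEKKIE")

def sequence_recovery(reference, design):
    """Return (identical positions, fraction identical) for equal-length sequences."""
    if len(reference) != len(design):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(reference, design))
    return matches, matches / len(reference)

matches, recovery = sequence_recovery(WT, DESIGN)
print(f"{matches}/{len(WT)} identical positions, recovery = {recovery:.4f}")
```

For these two sequences the count is 47 of 92 positions, which matches the reported recovery of 0.5109.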

Comparison (predicted vs original): Predicted sequence probabilities:
The per-position amino-acid probability heatmap shows a mix of:

  • high-confidence positions (bright cells), where ProteinMPNN strongly prefers a specific residue given the backbone geometry, and
  • low-confidence positions (diffuse/darker columns), where multiple residues are plausible (more sequence flexibility).

Overall, ProteinMPNN preserves the general biochemical character expected for this fold (many helix-compatible and charged residues) while allowing substantial substitutions at less constrained positions.

  2. Input this sequence into ESMFold and compare the predicted structure to your original.

I folded the ProteinMPNN-designed sequence with ESMFold and compared it to the original fold (left). The designed sequence (right) produces a very similar, predominantly alpha-helical topology, consistent with the same HTH-like backbone. Differences are mainly local (helix lengths/orientations and terminal regions), rather than a complete refolding. Conclusion: ProteinMPNN proposes a sequence that is compatible with the original backbone: despite ~50% sequence recovery, the predicted structure remains close to the original fold at the qualitative/topology level.

Part D. Group Brainstorm on Bacteriophage Engineering (Engineering MS2 Lysis Protein L via N-Terminal Modulation of DnaJ Dependence)

Selected Goals

Primary goal – Increased stability (functional robustness)
Identify sequence variants of MS2 lysis protein L that maintain structural plausibility and membrane-competent architecture.

Secondary goal – Higher titers (mechanism-linked)
Modulate the dependence of L on the host chaperone DnaJ by engineering the N-terminal regulatory segment that controls activation of the lysis protein.


Biological Motivation

Previous studies show that MS2 lysis protein L requires the host chaperone DnaJ for lytic activity, and that the N-terminal region plays a regulatory role in this dependence. However, no work has systematically explored how sequence variation in this region shapes the conformational constraints underlying host-assisted activation.

We hypothesize that DnaJ dependence emerges from sequence-encoded constraints within the N-terminal regulatory segment. By mapping mutational tolerance in this region, we aim to identify variants that alter host dependence while preserving the membrane-associated lytic function of L.


Computational Approach

Protein Language Models (ESM2 / ESM-3)

Perform an in silico mutational scan of the N-terminal region to identify sequence-plausible mutations. Language model likelihood scores provide a proxy for evolutionary constraints and help prioritize mutations that are unlikely to disrupt protein viability.

Structure Prediction (ESMFold or Boltz-1)

Predict structures for candidate variants and filter out mutations predicted to cause major structural disruption. These predictions act as a structural plausibility check rather than definitive structural validation.

Interaction Proxy (AlphaFold-Multimer)

Model complexes between MS2 L variants and the host chaperone DnaJ. While chaperone interactions are dynamic, these predictions provide a relative signal to compare potential effects of mutations on host interaction.

Sequence Conservation (BLAST + Clustal Omega)

Identify conserved residues to avoid mutating positions likely critical for function.


Potential Pitfalls

  • Membrane proteins are challenging for structure predictors.
  • Chaperone interactions may not be accurately captured by AlphaFold-Multimer.
  • Variants that alter lysis timing could negatively affect phage burst size.

Pipeline

WT MS2 L sequence

BLAST / Clustal → identify conserved N-terminal residues

ESM2 mutational scan → generate candidate variants

ESMFold / Boltz → remove structurally implausible variants

AlphaFold-Multimer → compare predicted interaction with DnaJ

Shortlist variants for experimental testing of lysis timing and phage titers

Week 05 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

  1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

Original sequence
sp|P00441|SODC_HUMAN Superoxide dismutase

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Variant: A4V

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
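The substitution can also be applied programmatically. The sketch below assumes a hypothetical helper, `apply_mutation`, and illustrates one detail worth checking: "A4V" uses the numbering of the mature protein, which omits the initiator methionine, so in the UniProt sequence shown above the substituted alanine is the fifth residue. The `offset` argument accounts for that shift.

```python
import re

WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(seq: str, mutation: str, offset: int = 0) -> str:
    """Apply a substitution such as 'A4V' to seq.

    offset is added to the position in the label to obtain the 1-based
    index in seq (use 1 when the label ignores the initiator Met).
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation label: {mutation}")
    wt, pos, new = match.group(1), int(match.group(2)) + offset, match.group(3)
    if seq[pos - 1] != wt:
        raise ValueError(f"Expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


A4V_SOD1 = apply_mutation(WT_SOD1, "A4V", offset=1)
```

The wild-type check inside the helper catches numbering mistakes early: calling it with `offset=0` raises an error because residue 4 of the UniProt sequence is lysine, not alanine.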
  1. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

Using the mutant SOD1 sequence as input, PepMLM Colab generated four 12-residue candidate binders:

| Index | Binder | Pseudo-perplexity |
|---|---|---|
| 0 | WRYPVAGARHWE | 18.89836973999799 |
| 1 | KLYYPVVVAWWK | 17.203301905376957 |
| 2 | HRYPVVVAALKE | 11.315016775827807 |
| 3 | WLYGAAVLRHGE | 15.526728984710877 |
  1. Record the perplexity scores that indicate PepMLM’s confidence in the binders.

PepMLM’s pseudo-perplexity scores indicate the model’s confidence in the generated binders, with lower values corresponding to higher confidence. Among the four generated peptides, HRYPVVVAALKE showed the highest confidence (lowest pseudo-perplexity, 11.315), whereas WRYPVAGARHWE showed the lowest confidence (highest pseudo-perplexity, 18.898). The reference peptide FLYRWLPSRRGG was included for comparison, but no pseudo-perplexity score for it was provided in the displayed output.
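As a rough illustration of why lower values mean higher confidence, the sketch below uses the usual masked-language-model definition of pseudo-perplexity: the exponential of the negative mean log-probability assigned to each residue when it is masked in turn. Whether the PepMLM Colab computes it in exactly this way is an assumption. The second half simply ranks the four reported values.

```python
import math


def pseudo_perplexity(masked_log_probs):
    """exp(-mean log p(residue_i | all other residues)).

    masked_log_probs holds one natural-log probability per position,
    obtained by masking that position and scoring the true residue.
    """
    return math.exp(-sum(masked_log_probs) / len(masked_log_probs))


# Values copied from the PepMLM output table above.
reported = {
    "WRYPVAGARHWE": 18.89836973999799,
    "KLYYPVVVAWWK": 17.203301905376957,
    "HRYPVVVAALKE": 11.315016775827807,
    "WLYGAAVLRHGE": 15.526728984710877,
}

# Lower pseudo-perplexity first (most confident first).
ranked = sorted(reported.items(), key=lambda item: item[1])
best, worst = ranked[0][0], ranked[-1][0]
```

Under this definition, a model that assigns probability 0.5 to every true residue yields a pseudo-perplexity of 2, so values of 11 to 19 correspond to fairly diffuse predictions.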

Part 2: Evaluate Binders with AlphaFold3

  1. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
    [Figure panels: SOD1 A4V–KLVYPVVVAWWK complex (ipTM = 0.59); SOD1 A4V–FLYRWLPSRRGG complex (ipTM = 0.33)]

Figure 1. AlphaFold-predicted SOD1 A4V complexes shown side by side for comparison. Left: complex with the PepMLM-generated peptide KLVYPVVVAWWK. Right: complex with the known SOD1-binding peptide FLYRWLPSRRGG.

  1. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
  • WRYPVAGARHWE: ipTM = 0.35. The peptide appears loosely surface-bound on the SOD1 surface, with no clear evidence of a well-defined buried binding mode. It does not convincingly localize near the N-terminus/A4V region and instead appears to contact the β-barrel surface in a weakly resolved manner.

  • KLVYPVVVAWWK: ipTM = 0.59. This peptide showed the strongest predicted interface among the tested candidates. It appears mainly surface-bound and extended along the β-barrel region, rather than deeply buried in a pocket. It does not clearly localize near the N-terminus where A4V sits, and no strong interaction with the dimer-interface region is evident.

  • HRYPVVVAALKE: ipTM = 0.48. The peptide appears surface-associated with low-to-moderate interface confidence. It does not seem to bind near the N-terminal A4V region and instead contacts an exposed outer region of SOD1, consistent with a surface-bound interaction rather than a partially buried one.

  • WLYGAAVLRHGE: ipTM = 0.31. This peptide shows a very weak predicted interface and appears largely extended and surface-associated, without a defined binding pocket. It does not localize near the A4V-containing N-terminus, nor does it show a clear approach to the dimer interface. The interaction appears predominantly surface-bound.

  • FLYRWLPSRRGG: ipTM = 0.33. The known binder also showed a low-confidence interface in this AlphaFold prediction. The peptide appears loosely surface-bound rather than buried, with no strong evidence of localization near the N-terminus/A4V site or a clearly resolved interaction at the dimer-interface region.

  1. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

Overall, the predicted SOD1 A4V–peptide complexes showed low-to-moderate ipTM values, indicating that none of the modeled interfaces was predicted with high confidence. Among the PepMLM-generated candidates, KLVYPVVVAWWK produced the highest ipTM (0.59), followed by HRYPVVVAALKE (0.48), whereas WRYPVAGARHWE (0.35) and WLYGAAVLRHGE (0.31) showed weaker predicted interfaces. The known SOD1-binding peptide FLYRWLPSRRGG gave an ipTM of 0.33. Therefore, the best PepMLM-generated peptide, KLVYPVVVAWWK, exceeded the known binder in this AlphaFold-based comparison, and HRYPVVVAALKE and WRYPVAGARHWE also matched or surpassed it, whereas WLYGAAVLRHGE did not.
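The comparison against the known binder can be written out explicitly. This is a small sketch using only the ipTM values reported above. The variable names are illustrative.

```python
REFERENCE = "FLYRWLPSRRGG"  # known SOD1-binding peptide

iptm = {
    "WRYPVAGARHWE": 0.35,
    "KLVYPVVVAWWK": 0.59,
    "HRYPVVVAALKE": 0.48,
    "WLYGAAVLRHGE": 0.31,
    "FLYRWLPSRRGG": 0.33,
}

reference_score = iptm[REFERENCE]

# Generated peptides sorted from highest to lowest ipTM.
generated = sorted(
    ((name, score) for name, score in iptm.items() if name != REFERENCE),
    key=lambda item: item[1],
    reverse=True,
)

at_or_above = [name for name, score in generated if score >= reference_score]
below = [name for name, score in generated if score < reference_score]
```

Because the margins for the weaker peptides are only a few hundredths of an ipTM unit, differences of that size are best read as ties rather than as a real ranking.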

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  1. Results
[Figures: PeptiVerse property predictions for FLYRWLPSRRGG, HRYPVVVAALKE, KLVYPVVVAWWK, WLYGAAVLRHGE, and WRYPVAGARHWE]
  1. Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Compared with the AlphaFold3 models, the sequence-based peptide property predictions show only partial agreement with the structural results. The peptide with the highest structural confidence, KLVYPVVVAWWK (ipTM = 0.59), also has the strongest predicted binding affinity among the candidates (pKd/pKi = 6.760), so in that case the two methods are consistent. However, this trend does not hold across all peptides: for example, WRYPVAGARHWE has a low AlphaFold3 interface score (ipTM = 0.35) but still a moderately favorable predicted affinity (6.213), while the known binder FLYRWLPSRRGG showed both low structural confidence (ipTM = 0.33) and a comparatively weak predicted affinity (5.968). Importantly, all peptides were predicted to be soluble and non-hemolytic, so none of the better binders appears disqualified by poor solubility or overt hemolysis risk. Among them, KLVYPVVVAWWK appears to best balance predicted binding and therapeutic properties, since it combines the highest ipTM, the strongest predicted affinity, predicted solubility, and a non-hemolytic prediction, although its hemolysis probability (0.178) is somewhat higher than that of the other candidates and would still merit attention in follow-up validation.

  1. Choose one peptide you would advance and justify your decision briefly.

I would advance KLVYPVVVAWWK. Among the tested peptides, it showed the highest AlphaFold3 interface confidence (ipTM = 0.59) and the strongest predicted binding affinity (pKd/pKi = 6.760), making it the most consistent top candidate across both structural and sequence-based evaluations. It was also predicted to be soluble and non-hemolytic, which supports its therapeutic potential. Although its hemolysis probability was somewhat higher than that of the other candidates, it remained below the threshold for a hemolytic prediction, so overall it provided the best balance between predicted binding performance and developability.

Part 4: Generate Optimized Peptides with moPPIt

After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

| Peptide | Method | Target motif | Affinity score | ipTM | Interpretation |
|---|---|---|---|---|---|
| KLVYPVVVAWWK | PepMLM | None | 6.760 | 0.59 | Best overall PepMLM candidate |
| HRYPVVVAALKE | PepMLM | None | 5.563 | 0.48 | Intermediate |
| WRYPVAGARHWE | PepMLM | None | 6.213 | 0.35 | Weak interface |
| WLYGAAVLRHGE | PepMLM | None | 6.203 | 0.31 | Weak interface |
| FLYRWLPSRRGG | Known binder | None | 5.968 | 0.33 | Reference; weaker than best PepMLM candidate |
| RYTDIQQYCGKW | moPPIt | 29–35 | 6.423 | 0.42 | Moderate but not strong |
| GQSDYCTRQGKI | moPPIt | 29–35 | 5.933 | 0.52 | Best moPPIt structural result so far |
| KRGKTCLECYQY | moPPIt | 29–35 | 7.286 | 0.28 | Strong property scores but poor structural support |
| GCGYSRSYTKYE | moPPIt | 107–115 | 7.286 | 0.44 | Good score profile, but only moderate structural support |
| GDRSEYCSQKKQ | moPPIt | 107–115 | 6.418 | 0.53 | Best moPPIt structural result; moderate interface confidence |
| EQSRYGHKQDER | moPPIt | 107–115 | 5.221 | 0.36 | High motif score but weak structural support |

Note: Residues 29–35 and 107–115 were selected as hypothesis-driven target motifs based on their apparent mutational sensitivity in the ESM2 deep mutational scan. This choice does not demonstrate that either region is surface-exposed or experimentally validated as a peptide-binding site, but it highlights segments that may be structurally or functionally important and therefore worth testing in a controlled design setting.
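One way to look at the two metrics together is to keep only the peptides that no other peptide beats on both affinity score and ipTM (a Pareto front). The sketch below uses the values from the comparison table and assumes, without evidence from the tools themselves, that the affinity scores are comparable across PepMLM and moPPIt peptides.

```python
# (affinity score, ipTM) per peptide, copied from the comparison table.
candidates = {
    "KLVYPVVVAWWK": (6.760, 0.59),
    "HRYPVVVAALKE": (5.563, 0.48),
    "WRYPVAGARHWE": (6.213, 0.35),
    "WLYGAAVLRHGE": (6.203, 0.31),
    "FLYRWLPSRRGG": (5.968, 0.33),
    "RYTDIQQYCGKW": (6.423, 0.42),
    "GQSDYCTRQGKI": (5.933, 0.52),
    "KRGKTCLECYQY": (7.286, 0.28),
    "GCGYSRSYTKYE": (7.286, 0.44),
    "GDRSEYCSQKKQ": (6.418, 0.53),
    "EQSRYGHKQDER": (5.221, 0.36),
}


def pareto_front(points):
    """Return names that are not dominated on (affinity, ipTM).

    A peptide is dominated when another one is at least as good on both
    metrics and strictly better on at least one.
    """
    front = []
    for name, (aff, iptm) in points.items():
        dominated = any(
            a >= aff and i >= iptm and (a > aff or i > iptm)
            for other, (a, i) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


front = pareto_front(candidates)
```

With these numbers the front contains KLVYPVVVAWWK and GCGYSRSYTKYE, which matches the qualitative reading of the table: one candidate leads on interface confidence and the other on affinity score.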

Before any clinical development, these peptides would require stepwise preclinical evaluation. First, their intended mechanism of action would need to be clarified: whether they are meant to bind mutant SOD1 merely as recognition molecules, to block a pathogenic interaction surface, to interfere with dimerization, or to reduce misfolding or aggregation. They would then need to be tested experimentally in biochemical and cellular assays to confirm real binding to mutant SOD1, measure affinity and selectivity relative to wild-type SOD1, and determine whether binding produces a meaningful functional effect. This should be followed by preclinical studies addressing stability, protease susceptibility, uptake or delivery, toxicity, hemolysis, immunogenicity risk, pharmacokinetics, and efficacy in relevant animal models of SOD1-associated disease before considering any first-in-human study. In other words, these computational results could justify preclinical follow-up, but they are far from sufficient to support direct clinical advancement.

Part C: Final Project: L-Protein Mutants

Week 06 HW: Genetic Circuits Part I

DNA Assembly Assignment

1. Components of the Phusion High-Fidelity PCR Master Mix

The Phusion High-Fidelity PCR Master Mix contains the main components required for accurate DNA amplification. One key component is the Phusion DNA polymerase, which synthesizes new DNA strands with high fidelity. The mix also includes dNTPs, which serve as the nucleotide building blocks for DNA synthesis. In addition, it contains a reaction buffer that maintains the proper pH and salt conditions for enzyme activity, as well as magnesium ions, which are essential cofactors for polymerase function. Together, these components support efficient and accurate PCR amplification.

2. Factors that determine primer annealing temperature during PCR

Primer annealing temperature depends mainly on the melting temperature of the primers. This is influenced by primer length, GC content, sequence composition, and the degree of complementarity between the primer and the template DNA. Primers with higher GC content usually have higher melting temperatures because GC base pairs form three hydrogen bonds, whereas AT base pairs form two. The ionic conditions of the reaction can also affect annealing behavior. In practice, the annealing temperature is usually chosen a few degrees below the primer melting temperature to promote specific binding.
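The dependence on length and GC content can be illustrated with the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This is only a rough estimate for short oligonucleotides. Real designs normally rely on nearest-neighbor calculators, and the 5 °C margin below is an assumed placeholder, not a Phusion-specific recommendation.

```python
def wallace_tm(primer: str) -> float:
    """Rough melting temperature in degrees C using the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("Primer contains characters other than A, C, G, T")
    return 2.0 * at + 4.0 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def annealing_temp(forward: str, reverse: str, margin: float = 5.0) -> float:
    """Annealing temperature set a few degrees below the lower primer Tm."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - margin


# Hypothetical primers, for illustration only.
fwd = "ATGCGTACGTTAGCCTAG"
rev = "GGCCGCTTAAGGCCTTAA"
ta = annealing_temp(fwd, rev)
```

Using the lower of the two primer melting temperatures keeps the annealing step permissive enough for both primers to bind.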

3. PCR and restriction enzyme digests as methods to create linear DNA fragments

PCR and restriction enzyme digestion can both be used to generate linear DNA fragments, but they do so in different ways. PCR creates a linear fragment by amplifying a specific region of DNA using primers and a DNA polymerase. This makes PCR highly flexible, since the user can define the exact fragment boundaries and can also introduce overlaps, mutations, or additional sequences through primer design. By contrast, restriction enzyme digestion produces linear DNA fragments by cutting DNA at specific recognition sites. This method is simpler when the correct restriction sites are already present, but it is less flexible because it depends on the natural or engineered location of those sites. PCR is preferable when custom fragment design is needed, whereas restriction digestion is useful for straightforward excision or plasmid linearization.

4. Ensuring that digested and PCR-generated DNA fragments are appropriate for Gibson cloning

To ensure that PCR-generated and digested DNA fragments are appropriate for Gibson cloning, the fragments must contain overlapping homologous ends. These overlaps are typically around 20 to 40 base pairs long and must match the adjacent fragment exactly so that they can anneal during the Gibson reaction. It is also important to verify that the fragments have the expected size and correct orientation. This can be checked by sequence design in Benchling and by confirming the fragment sizes experimentally, for example by gel electrophoresis. In addition, the DNA fragments should be clean and well purified to improve assembly efficiency.
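The overlap requirement can be checked in a few lines before ordering primers. This is a minimal sketch with hypothetical function names and made-up sequences. It only compares the 3' end of one fragment with the 5' end of the next on the same strand, and it reports only overlaps within the chosen length range.

```python
def shared_overlap(upstream: str, downstream: str,
                   min_len: int = 20, max_len: int = 40) -> int:
    """Length of the longest suffix of upstream that equals a prefix of
    downstream, searched within [min_len, max_len]. Returns 0 if none."""
    limit = min(max_len, len(upstream), len(downstream))
    for n in range(limit, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


def join_fragments(upstream: str, downstream: str,
                   min_len: int = 20, max_len: int = 40) -> str:
    """Return the joined sequence, keeping the shared overlap once."""
    n = shared_overlap(upstream, downstream, min_len, max_len)
    if n == 0:
        raise ValueError("No exact overlap in the requested length range")
    return upstream + downstream[n:]


# Made-up example: a 25 bp overlap shared by two fragments.
overlap = "ACGTTGCAAGCTAGGATCCGATATC"
left = "TTTT" * 5 + overlap
right = overlap + "CCCC" * 5

overlap_len = shared_overlap(left, right)
product = join_fragments(left, right)
```

For a circular plasmid, the same check would be repeated for every junction, including the one that closes the last fragment back onto the first.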

5. How plasmid DNA enters E. coli cells during transformation

Plasmid DNA enters E. coli cells only after the cells have been made competent. In chemical transformation, the cells are treated to make their membranes more permeable, and a brief heat shock helps the plasmid DNA cross the membrane and enter the cell. In electroporation, a short electrical pulse creates temporary pores in the membrane that allow DNA uptake. After the DNA enters the cell, the bacteria recover in rich medium and begin expressing the antibiotic resistance marker carried by the plasmid. This makes it possible to select transformed cells on antibiotic-containing plates.

Another Assembly Method

6. Gibson Assembly

Gibson Assembly is a DNA assembly method that joins DNA fragments that share overlapping ends. The reaction contains three main enzymatic activities: an exonuclease, a DNA polymerase, and a DNA ligase. First, the exonuclease chews back the 5’ ends of the DNA fragments, exposing complementary single-stranded overlaps. These overlapping regions anneal to one another if they were designed correctly. Then the DNA polymerase fills in any missing nucleotides, and the ligase seals the remaining nicks in the DNA backbone. Because this method does not depend on restriction sites at the junctions, it allows seamless assembly of multiple DNA fragments in a single reaction.

7. Gibson Assembly explained in 5-7 sentences plus diagram

Gibson Assembly works by joining DNA fragments that share overlapping homologous ends. An exonuclease first creates single-stranded overhangs by chewing back the 5’ ends of each fragment. These exposed complementary regions then anneal to each other. After annealing, a DNA polymerase fills in the missing nucleotides. Finally, a DNA ligase seals the remaining nicks in the sugar-phosphate backbone. This method is efficient because it allows seamless joining of fragments without requiring restriction sites at the junctions. It is especially useful when assembling plasmids from PCR products and linearized backbones.

Figure 1. Overview of Gibson Assembly. (1) A linearized vector backbone and a DNA insert are designed with homologous overlap regions at opposite ends. (2) Exonuclease activity resects the 5’ ends of the DNA fragments, exposing complementary 3’ single-stranded overhangs. (3) These complementary 3’ overhangs anneal, DNA polymerase fills in the remaining gaps, and DNA ligase seals the remaining nicks, generating the final assembled plasmid.

8. Modeling this assembly method in Benchling

I modeled this assembly strategy in Benchling using a Gibson Assembly design from my project. In this design, I organized the vector backbone and the insert as separate DNA fragments and verified that they contained the appropriate overlap-compatible ends required for Gibson cloning. Benchling was useful for checking fragment orientation, overlap design, and the expected structure of the final construct before assembly. I then used the platform to inspect the final plasmid map and confirm the architecture of the assembled design.

Figure 2. Benchling setup for Gibson Assembly design. The vector backbone and the iroB insert were organized as separate fragments in Benchling and prepared for Gibson Assembly by defining overlap-compatible ends. This setup was used to verify fragment identity, orientation, and the expected construct before generating the final plasmid design.

Figure 3. Final Benchling construct after Gibson Assembly design. Circular map of the assembled pTwist-iroB-cassette plasmid (3561 bp), showing the final construct architecture after insertion of the iroB modular cassette into the vector backbone. This map was used to confirm the final plasmid structure and annotation.

Asimov Kernel Homework

1. Repository and notebook setup

I created a dedicated repository for this homework in Asimov Kernel (Francisco_MC_HW6) and added a notebook entry to document the work.

2. Exploring the Bacterial Demos repository

I explored the Bacterial Demos repository and chose to inspect the NAND construct as an initial example (Figure 4), since NAND logic is directly relevant to the design of genetic circuits and to my broader interest in programmable circuit behavior.

From this construct, I observed that Kernel represents the design both as a linear arrangement of functional genetic parts and as a circular DNA map. This made it easier to identify how promoters, ribosome binding sites, coding sequences, and terminators are organized within the circuit.

3. Initial observations

The NAND example helped me understand how Kernel connects circuit logic with DNA architecture. In the linear view, the construct is represented as an ordered set of functional parts. In the circular map, the same construct can be interpreted as a plasmid-level design. This dual representation is useful for relating circuit structure to the physical organization of the DNA sequence.

Figure 4. NAND construct from the Bacterial Demos repository in Asimov Kernel. The figure shows both the linear circuit architecture and the circular map of the construct. This view was useful for identifying the arrangement of promoters, ribosome binding sites, coding sequences, and terminators, and for relating circuit logic to the underlying DNA organization.

4.1 Recreate the Repressilator

I simulated the reconstructed repressilator in E. coli for 72 simulated hours using a 10-minute time step and no added ligand. The resulting RNA and protein concentration plots showed an initial transient phase followed by sustained oscillatory behavior over time. This indicates that the reconstructed circuit preserves the expected dynamic logic of the repressilator rather than converging to a single stable state. Both transcript and protein levels fluctuate over time, which is consistent with a cyclic repression network.
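For intuition about why a three-node repression ring oscillates instead of settling, the sketch below integrates the classic dimensionless repressilator equations (mRNA and protein for each of three repressors) with a simple Euler step. It is not the model used by Kernel: the parameter values and initial conditions are illustrative assumptions, and time is in units of the mRNA lifetime, not hours.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=5.0, n=2.0,
                           t_end=100.0, dt=0.01):
    """Euler integration of a three-gene repression ring.

    dm_i/dt = -m_i + alpha / (1 + p_j**n) + alpha0
    dp_i/dt = -beta * (p_i - m_i)
    where gene i is repressed by the protein of the previous gene j.
    Returns the list of protein triples over time.
    """
    m = [1.0, 0.0, 0.0]   # asymmetric start so the ring is not balanced
    p = [2.0, 1.0, 3.0]
    trace = []
    for _ in range(int(t_end / dt)):
        dm = [-m[i] + alpha / (1.0 + p[(i - 1) % 3] ** n) + alpha0
              for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(tuple(p))
    return trace


trace = simulate_repressilator()

# Look only at the second half of the run, after the initial transient.
late_p1 = [row[0] for row in trace[len(trace) // 2:]]
amplitude = max(late_p1) - min(late_p1)
```

A persistent non-zero amplitude in the late window is the numerical counterpart of the sustained oscillation described above. A circuit converging to a single stable state would show the amplitude shrinking toward zero.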

Figure 5. Comparison of repressilator architectures. The upper construct shows the original repressilator architecture, in which the promoter–repressor pairings follow the reference design from the Bacterial Demos repository. The lower construct shows a variant with altered promoter order while keeping the same general set of parts. This comparison was used to test how rewiring promoter arrangement affects the behavior of the oscillator.

4.2 Comparison of repressilator variants

I compared four repressilator configurations: the original reference architecture, a variant with a modified middle RBS, a variant with altered promoter order, and a combined variant containing both changes. In all four cases, the circuit still showed sustained oscillatory behavior in both RNA and protein concentration plots. This indicates that, in Kernel, the repressilator is qualitatively robust to these perturbations.

However, the variants did not behave identically. The main effect of both the RBS change and the promoter reordering was a redistribution of the quantitative balance among the three nodes. In particular, different variants shifted which transcriptional unit became most dominant in the final RNAP and ribosome flux plots. This suggests that these design changes affect the oscillatory regime quantitatively, even when they do not abolish oscillation altogether.

| Construct | Promoter order | Middle RBS | Oscillation observed? | Main observation |
|---|---|---|---|---|
| Reference repressilator | Original | Original | Yes | Baseline oscillatory regime |
| Variant 1 | Original | Altered | Yes | Oscillation preserved, but the middle module becomes more dominant |
| Variant 2 | Altered | Original | Yes | Oscillation preserved, with a shifted quantitative balance across nodes |
| Variant 3 | Altered | Altered | Yes | Oscillation preserved, with combined changes in node dominance and final flux distribution |

Figure 6. Oscillatory response under two modified repressilator configurations. The left panels show the RNA and protein time-course simulations for Variant 1 in the comparison table (altered middle RBS). The right panels show Variant 3, in which both the promoter order and the middle RBS were altered. In both cases, the circuit retained sustained oscillatory behavior after an initial transient phase. However, the combined perturbation produced a more uneven quantitative distribution across the three nodes, especially at the protein level, indicating that these modifications affect the balance of the oscillatory regime even when oscillation is preserved.

5. Build three of your own Constructs using the parts in the Characterized Bacterial Parts Repo

5.1 First construct: transcriptional NAND gate

The first construct implements a transcriptional NAND gate using two inducible inputs, IPTG and aTc. These inputs activate pTac and pTet, leading to expression of hrpR and hrpS. Only the simultaneous presence of both regulators activates pHrpL, which produces the repressor AmtR. AmtR then represses the GFP output module driven by pAmtR, so fluorescence is lost only in the double-input condition.

Figure 7. Synthetic NAND gate assembled from characterized bacterial parts. The construct uses pTac and pTet as two inducible input promoters controlling hrpR and hrpS. These two regulators jointly activate pHrpL (BBa_K4226003), which drives expression of the repressor AmtR. AmtR represses the output promoter pAmtR, thereby controlling the GFP reporter (BBa_E0040). As a result, GFP is expressed in all conditions except when both inputs are simultaneously present, consistent with NAND behavior.

Piece-by-piece interpretation

| Part | Role in the circuit | Functional meaning |
|---|---|---|
| pTac | Input 1 promoter | Induced by IPTG |
| pTet | Input 2 promoter | Induced by aTc |
| BBa_K2561008 | hrpR | First regulator of the AND layer |
| BBa_K2561009 | hrpS | Second regulator of the AND layer |
| BBa_K4226003 | pHrpL | Promoter activated only when both HrpR and HrpS are present |
| AmtR | Final repressor | Represses the output promoter |
| pAmtR | Repressible output promoter | Drives GFP unless repressed by AmtR |
| BBa_B0034 | RBS | Enables translation of the reporter |
| BBa_E0040 | GFP | Fluorescent output reporter |

Biological interpretation

The circuit detects two external inducers, IPTG and aTc. IPTG induces pTac and aTc induces pTet, producing HrpR and HrpS, respectively. Only when both are present is pHrpL activated, which drives expression of AmtR. AmtR then represses pAmtR, shutting off GFP expression. Therefore, the fluorescent output is turned OFF only when both inputs are present, which implements NAND logic.

Truth table

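| IPTG | aTc | pTac | pTet | HrpR | HrpS | pHrpL | AmtR | GFP | Output |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |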
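The truth table can be reproduced by propagating the two inputs through the layers of the circuit. This is an idealized Boolean sketch: it ignores leakiness, expression delays, and intermediate expression levels, and the function and key names are illustrative.

```python
def nand_circuit(iptg: int, atc: int) -> dict:
    """Idealized ON/OFF state of each node for inputs given as 0 or 1."""
    p_tac = iptg                 # pTac is induced by IPTG
    p_tet = atc                  # pTet is induced by aTc
    hrp_r = p_tac                # hrpR is expressed from pTac
    hrp_s = p_tet                # hrpS is expressed from pTet
    p_hrpl = hrp_r & hrp_s       # pHrpL needs both HrpR and HrpS (AND)
    amt_r = p_hrpl               # AmtR is expressed from pHrpL
    gfp = 1 - amt_r              # pAmtR drives GFP unless AmtR is present
    return {
        "IPTG": iptg, "aTc": atc, "pTac": p_tac, "pTet": p_tet,
        "HrpR": hrp_r, "HrpS": hrp_s, "pHrpL": p_hrpl,
        "AmtR": amt_r, "GFP": gfp,
    }


input_combinations = [(0, 0), (1, 0), (0, 1), (1, 1)]
rows = [nand_circuit(iptg, atc) for iptg, atc in input_combinations]
gfp_outputs = [row["GFP"] for row in rows]
```

Writing the gate as an AND layer followed by an inverter makes the NAND behavior explicit: GFP is OFF only for the input pair (1, 1).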

5.2 Oscillator-coupled NAND circuit for cyclic protein expression

This design couples a transcriptional NAND gate to a repressilator oscillator in order to generate cyclic protein expression. The NAND module determines whether output expression is permitted based on two external inputs, while the repressilator imposes a temporal oscillatory pattern on the system. As a result, the output protein is not produced continuously, but instead in periodic pulses and only under the logical conditions defined by the NAND gate.

Biological interpretation

Biologically, this construct can be understood as a programmable pulsatile expression system. The two external inputs first determine whether the cell is allowed to express the target protein, and the oscillator then determines when that expression occurs. This creates rhythmic bursts of protein production rather than constitutive accumulation, which could be useful for periodic secretion, controlled therapeutic delivery, or reducing metabolic burden during prolonged expression.

Figure 8. Oscillator-coupled NAND gate for pulsatile protein expression.
The NAND module determines whether output expression is permitted, while the repressilator provides a cyclic temporal signal. Together, these modules generate rhythmic protein expression only when the required logical input conditions are satisfied.

Limitations

This design is conceptually valid, but its experimental implementation may face several limitations. Reuse of repeated regulatory parts could increase the risk of recombination or construct instability. In addition, the large size and multilayered structure of the circuit may impose a significant metabolic burden on the host cell. Coupling the NAND module and the oscillator on the same plasmid could also introduce unintended interactions between components, and promoter leakiness may reduce the sharpness of both the logic response and the pulsatile output.

Week 07 HW: Genetic Circuits Part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. Advantages of IANNs over traditional Boolean genetic circuits

IANNs can generate graded, weighted, and more flexible input/output responses instead of only ON/OFF logic. This makes them better suited for integrating multiple noisy biological signals and for approximating complex decision boundaries.

2. Useful application of an IANN

A useful application would be tumor microenvironment sensing. Multiple inputs such as hypoxia, lactate, and acidity could be integrated to produce a selective therapeutic output only when the combined signal pattern matches a tumor-like state. A limitation is that biological components may have leakiness, limited dynamic range, crosstalk, and high variability, which can reduce classification accuracy.

3. Multilayer perceptron concept

In an intracellular multilayer perceptron, the first layer processes the input signals and produces an intermediate regulator, such as an endoribonuclease. That regulator then controls expression of the layer 2 output, for example a fluorescent protein, allowing hierarchical signal processing across layers.

Intracellular multilayer perceptron

Input layer:
  X1 ---> [Tx/Tl] ---\
                      \
                       > Layer 1: endoribonuclease R1
                      /
  X2 ---> [Tx/Tl] ---/

Layer 1 output:
  R1 --| regulates --> Layer 2: fluorescent protein mRNA --> [Tl] --> Y
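The diagram can be turned into a small numerical sketch. It assumes steady-state Hill functions for each layer, reads the `--|` arrow as repression of the layer 2 output by R1, and uses hypothetical weights and thresholds. None of these values come from a characterized circuit.

```python
def hill_activation(x: float, k: float = 1.0, n: float = 2.0) -> float:
    """Saturating response between 0 and 1 that increases with x."""
    return x ** n / (k ** n + x ** n)


def hill_repression(x: float, k: float = 0.5, n: float = 2.0) -> float:
    """Response between 0 and 1 that decreases as the repressor x rises."""
    return k ** n / (k ** n + x ** n)


def two_layer_output(x1: float, x2: float,
                     w1: float = 1.0, w2: float = 1.0):
    """Layer 1: weighted inputs set the level of endoribonuclease R1.
    Layer 2: R1 reduces expression of the fluorescent output Y."""
    r1 = hill_activation(w1 * x1 + w2 * x2)
    y = hill_repression(r1)
    return r1, y


# Output level for a few input combinations (arbitrary units).
responses = {
    (x1, x2): two_layer_output(x1, x2)[1]
    for x1, x2 in [(0.0, 0.0), (0.5, 0.5), (2.0, 2.0)]
}
```

Unlike a Boolean gate, the output here changes gradually with the weighted sum of the inputs, which is the graded behavior described in the answer to question 1.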

Assignment Part 2: Fungal Materials

1. Examples of fungal materials

Existing fungal materials include mycelium-based packaging, leather-like textiles, insulation panels, and construction composites. They are used as sustainable alternatives to plastic foams, animal leather, and some building materials. Their main advantages are biodegradability, low-energy production, and renewable growth. Their disadvantages include lower durability, moisture sensitivity, and less standardized performance compared with conventional materials.

2. Why genetically engineer fungi?

Fungi could be engineered to produce stronger, more water-resistant, or more functional materials, or to sense and respond to environmental signals. Synthetic biology in fungi is attractive because fungi naturally grow as structured biomaterials, secrete enzymes efficiently, and can process complex substrates such as agricultural waste. Compared with bacteria, fungi are often better suited for making large, fibrous, and mechanically useful living materials.

Week 09 HW: Cell-Free Systems

Homework Part A: General and Lecturer-Specific Questions

  1. Advantages: Cell-free systems allow direct control of reaction conditions without maintaining cell viability. They are especially useful for toxic proteins and membrane proteins.

  2. Main components: Cell extract or Tx/Tl machinery, DNA template, amino acids, nucleotides, energy source, salts, and buffer. Together, these support transcription, translation, and reaction stability.

  3. Energy regeneration: ATP and GTP are rapidly consumed during transcription and translation. Continuous supply can be maintained with phosphoenolpyruvate- or maltodextrin-based regeneration systems.

  4. Prokaryotic vs eukaryotic cell-free systems: Prokaryotic cell-free systems are faster, simpler, and cheaper, making them useful for proteins such as GFP or bacterial enzymes. Eukaryotic systems are better for proteins that require more complex folding or post-translational modifications, such as membrane receptors or glycosylated proteins.

  5. Optimizing membrane protein expression: To optimize membrane protein expression, I would test the protein in a cell-free system supplemented with liposomes or nanodiscs to support folding and insertion. The main challenge is aggregation caused by hydrophobic regions, so providing a membrane-like environment and testing different reaction conditions would help improve yield and functionality.

  6. Low yield, possible causes and troubleshooting: One possible cause is poor DNA template quality, which can be addressed by improving template purity or concentration. A second cause is insufficient energy supply, which can be improved by optimizing the ATP regeneration system. A third cause is protein misfolding or aggregation, which can be addressed by changing temperature, adding chaperones, or using liposomes or nanodiscs.

Homework question from Kate Adamala

  • Function: A synthetic minimal cell that senses extracellular lactate and triggers reporter release.
  • Input / Output: Input = extracellular lactate. Output = fluorescence outside the vesicle after pore activation.
  • Cell-free alone?: Only partially. Encapsulation is needed for boundary formation and controlled release.
  • Natural engineered cell?: Yes, but a minimal cell is simpler and more controllable.
  • Desired outcome: OFF without lactate, ON with lactate.
  • Membrane: POPC + cholesterol.
  • Inside: Bacterial Tx/Tl system, amino acids, nucleotides, ATP regeneration system, DNA circuit.
  • Tx/Tl source: Bacterial.
  • Communication: Lactate enters by permeability or transport; output exits through a pore.
  • Lipids / genes: POPC, cholesterol; lactate-responsive module, reporter gene, alpha-hemolysin.
  • Readout: Measure fluorescence outside the vesicle with and without lactate.

Homework question from Peter Nguyen

  • A replaceable underarm textile patch with freeze-dried cell-free sensing chemistry that detects breast-cancer-associated sweat VOC patterns as an exploratory early-risk screening tool, producing a visible color change.

  • How it works: The patch would be embedded into the axillary region of a shirt and activated by sweat moisture during wear. The sensing layer would respond to a selected VOC proxy or small panel of sweat-associated metabolites linked to breast-cancer volatilomic signatures, generating a colorimetric readout visible on the fabric. This concept is exploratory rather than diagnostic, since current evidence supports altered sweat/VOC patterns in breast cancer but also shows major variability and the need for standardization and larger validation studies.

  • Need addressed: The idea targets the need for low-cost, noninvasive, wearable screening technologies that could complement, not replace, conventional breast-cancer detection. Current VOC research suggests promise for noninvasive screening, but not yet clinical readiness.

  • How to address cell-free limitations: The sensing module would be freeze-dried for storage stability, packaged as a replaceable one-time-use cartridge, and designed to activate only when hydrated by sweat. To reduce false positives, the patch would be framed as a research or screening-support device rather than a diagnostic tool, and future versions would likely need multiplexed sensing instead of relying on a single metabolite.

Relevant DOIs

  • 10.3390/cancers15112939
  • 10.1177/11772719221100709
  • 10.1016/j.physbeh.2023.114307

Homework question from Ally Huang

  • Background: Long-term spaceflight causes bone loss because microgravity disrupts the balance between bone formation and bone resorption. This is significant for human health in space and for future long-duration missions. It is also scientifically interesting because bone loss reflects broader cellular stress and tissue adaptation under microgravity.

  • Target: CDKN1A (p21) nucleic acid signature associated with microgravity-related osteogenic stress.

  • Relation to the challenge: CDKN1A/p21 has been reported to increase in bone under microgravity-associated conditions and is linked to osteogenic cell-cycle arrest. Detecting this target would provide a molecular readout connected to space-induced bone loss.

  • Hypothesis / goal: I hypothesize that a BioBits-based fluorescence assay can be used to detect a bone-loss-associated molecular marker relevant to microgravity stress. The goal is to create a simple proof-of-concept workflow in which a selected target sequence is amplified with miniPCR and linked to a fluorescent readout using BioBits and the P51 viewer. This would model how portable molecular tools could support astronaut health monitoring in resource-limited environments.

  • Experimental plan: I would test synthetic DNA samples representing negative, low, and high levels of the CDKN1A target, plus a no-template control. The target would be amplified with miniPCR, coupled to a BioBits fluorescence readout, and measured with the P51 viewer. The main data would be fluorescence intensity and signal-to-background differences across conditions.

Relevant DOIs

  • 10.1371/journal.pone.0061372
  • 10.1038/s41526-022-00194-8

Homework Part B: Individual Final Project

Week 10 HW: Imaging and Measurement

Waters Part I — Molecular Weight

1) What is the calculated molecular weight?

Using the amino acid sequence provided in the assignment, I calculated the theoretical molecular weight of the construct with the ExPASy Compute pI/Mw tool.

The calculator (https://web.expasy.org/compute_pi/) returned the following values:

  • Theoretical pI: 5.90
  • Theoretical molecular weight: 28006.60 Da

2) Molecular weight from adjacent charge states

Two adjacent peaks from Figure 1 were selected for the adjacent charge-state analysis:

  • m/z = 1000.4302
  • m/z = 966.0037

2.1 Determine the charge state

Using the adjacent charge-state equation,

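z = (m/z)_(n+1) / [(m/z)_n - (m/z)_(n+1)]

where (m/z)_n = 1000.4302 is the higher-m/z peak and (m/z)_(n+1) = 966.0037 is the adjacent peak carrying one more charge, the charge state of the 1000.4302 peak was determined to be z = 28.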
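The charge-state step can be checked numerically. The sketch below is illustrative only: it uses the two m/z values quoted above, and the function names are my own. It evaluates both the simplified ratio used in the text and the form that also subtracts the proton mass; under these assumptions both round to the same integer.

```python
PROTON = 1.0073  # Da, the value of H used in this section

def charge_simplified(mz_n, mz_n1):
    """Simplified adjacent-peak formula from the text (proton mass ignored)."""
    return mz_n1 / (mz_n - mz_n1)

def charge_with_proton(mz_n, mz_n1, proton=PROTON):
    """Same idea, but derived from (M + zH)/z = mz_n and (M + (z+1)H)/(z+1) = mz_n1."""
    return (mz_n1 - proton) / (mz_n - mz_n1)

mz_n, mz_n1 = 1000.4302, 966.0037  # higher-m/z peak (charge z), adjacent peak (charge z + 1)

z_simple = charge_simplified(mz_n, mz_n1)
z_exact = charge_with_proton(mz_n, mz_n1)
z = round(z_exact)

print(f"simplified: {z_simple:.2f}, with proton term: {z_exact:.2f}, assigned z = {z}")
```

The non-integer remainder is a quick sanity check: if the two peaks were not truly adjacent charge states, the result would land far from an integer.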

2.2 Determine the molecular weight

Using the relationship

MW = z(m/z) - zH

where H = 1.0073 Da, the experimental molecular weight of eGFP was calculated as 27,983.84 Da.

2.3 Calculate the accuracy

Using

Accuracy = |MW_experiment - MW_theory| / MW_theory

and comparing the experimental value with the theoretical molecular weight from Question 1 (28,006.60 Da), the measurement error was 0.0813%.
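As a cross-check of steps 2.2 and 2.3, the following minimal sketch recomputes the neutral mass and the relative error from the values quoted above (peak at m/z 1000.4302 with z = 28, H = 1.0073 Da, theoretical mass 28006.60 Da). The helper names are assumptions for illustration.

```python
PROTON = 1.0073  # Da

def neutral_mass(mz, z, proton=PROTON):
    """MW = z(m/z) - zH, as written in step 2.2."""
    return z * mz - z * proton

def relative_error(experiment, theory):
    """|MW_experiment - MW_theory| / MW_theory, as written in step 2.3."""
    return abs(experiment - theory) / theory

mw_theory = 28006.60                    # Da, from Question 1
mw_exp = neutral_mass(1000.4302, 28)    # Da
err = relative_error(mw_exp, mw_theory)

print(f"MW_experiment = {mw_exp:.2f} Da")
print(f"error = {err * 100:.4f} % = {err * 1e6:.0f} ppm")
```

Expressing the same error in ppm makes it directly comparable with mass errors reported for peptide-level measurements.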

3) Charge state of the zoomed-in peak

No, the charge state of the zoomed-in peak cannot be determined directly from the zoomed-in signal shown in Figure 1. To assign a charge state from a single zoomed peak, the isotopic peak spacing would need to be clearly resolved, because the spacing is approximately equal to 1/z. In this spectrum, the zoomed peak appears as a broad unresolved signal rather than a set of distinct isotopic peaks, so the spacing cannot be measured reliably. Therefore, the charge state must be inferred from adjacent charge-state peaks in the full envelope rather than from the zoomed-in peak itself.

Waters Part II — Secondary/Tertiary structure

1) Native vs denatured eGFP conformations

Native and denatured protein conformations differ in their degree of folding. In the native state, eGFP maintains a compact and folded three-dimensional structure. In the denatured state, the protein unfolds, exposing more of its amino acid side chains to the solvent. As a result, more protonatable sites become accessible during electrospray ionization, so the denatured protein acquires more charges than the native protein.

This difference can be detected by mass spectrometry through the charge-state distribution. A folded protein usually shows lower charge states because its compact structure limits protonation. In contrast, an unfolded protein shows higher charge states because more basic sites are exposed and can be protonated.

This pattern is visible in Figure 2. The denatured eGFP spectrum (top) displays a broad charge-state envelope with many peaks at lower m/z values, consistent with a highly charged, unfolded protein population. By contrast, the native eGFP spectrum (bottom) shows a much narrower distribution with fewer peaks at higher m/z values, indicating lower charge states and a more compact folded structure.

Overall, the main spectral difference is that denatured eGFP appears with higher charge states and a broader distribution, whereas native eGFP appears with lower charge states and a narrower distribution, consistent with retention of its folded conformation.

2) Charge state of the native peak near 2800 m/z

Yes, the charge state of the peak near 2800 m/z can be assigned as +10.

This can be determined because native eGFP has an intact mass of about 28 kDa, so a peak near 2800 m/z is consistent with a species carrying about 10 charges:

charge state ≈ MW / (m/z)
charge state ≈ 28000 / 2800
charge state ≈ 10

This assignment is also consistent with the neighboring native peaks, which form a low-charge distribution typical of a compact, folded protein. In native MS, folded proteins usually retain fewer charges, so peaks appear at higher m/z values than in the denatured spectrum.
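The estimate above can be made slightly more explicit by predicting the m/z of each candidate charge state and picking the one closest to the observed peak. This is a sketch under the assumption that the native ion has the theoretical mass from Part I (28006.60 Da) plus z protons; adducts or retained solvent in a real native spectrum would shift the values somewhat.

```python
PROTON = 1.0073   # Da
MW = 28006.60     # Da, theoretical mass from Part I (assumed for the native ion)
OBSERVED = 2800   # approximate m/z of the native peak

def mz_for_charge(mw, z, proton=PROTON):
    """Predicted m/z of an [M + zH]z+ ion."""
    return (mw + z * proton) / z

candidates = range(8, 13)
for z in candidates:
    print(f"z = {z:2d}  predicted m/z = {mz_for_charge(MW, z):.1f}")

best_z = min(candidates, key=lambda z: abs(mz_for_charge(MW, z) - OBSERVED))
print("closest charge state:", best_z)
```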

Waters Part III — Peptide Mapping - primary structure

1) Lysines and arginines in eGFP

The eGFP sequence contains:

  • 20 Lysines (K)
  • 6 Arginines (R)

These residues are important because trypsin cleaves on the C-terminal side of K and R, so they define the expected peptide fragments in a tryptic digest.

2) Number of peptides generated by tryptic digestion of eGFP

Using trypsin as the protease, the eGFP sequence is predicted to generate 27 peptides.

This is consistent with trypsin cleaving after lysine (K) and arginine (R) residues in the sequence.
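The counting logic can be sketched as a small in-silico digest. The sequence below is a made-up toy string, not the eGFP sequence from the assignment, and the optional proline rule (no cleavage when the next residue is P) is an extra that the answer above does not apply.

```python
import re

def tryptic_digest(sequence, proline_rule=False):
    """Split a protein sequence on the C-terminal side of every K or R."""
    pattern = r"(?<=[KR])(?!P)" if proline_rule else r"(?<=[KR])"
    # Drop the empty string that appears when the sequence itself ends in K or R.
    return [peptide for peptide in re.split(pattern, sequence) if peptide]

toy = "AAKGGRTTKPLLRWW"  # hypothetical sequence for illustration only

print(tryptic_digest(toy))                     # cleaves after every K/R
print(tryptic_digest(toy, proline_rule=True))  # skips the K-P bond

# With the counts reported above and a C-terminus that is not K or R,
# the number of peptides is the number of cleavage sites plus one.
n_lys, n_arg = 20, 6
expected_peptides = n_lys + n_arg + 1
print(expected_peptides)
```

Running the same function on the real eGFP sequence would reproduce the peptide list that tools such as PeptideMass report for a digest without missed cleavages.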

3) Number of chromatographic peaks in the eGFP peptide map

Between 0.5 and 6.0 minutes, I observe approximately 16 chromatographic peaks with greater than 10% relative abundance in the eGFP peptide map.

Because this is based on visual inspection of the TIC, the exact count may vary slightly depending on how partially resolved shoulders are interpreted.

4) Comparison between observed and predicted peptide counts

No, the number of chromatographic peaks does not match the number of peptides predicted from Question 2. The chromatogram shows fewer peaks than the 27 peptides predicted from the tryptic digest.

This difference is expected because not every predicted peptide is necessarily detected as a clear chromatographic peak. Some peptides may be too small, too low in abundance, poorly ionized, or may co-elute with other peptides.

5) Peptide m/z, charge state, and singly charged mass

The most abundant peptide peak is observed at m/z 525.7671.

The isotope spacing is approximately 0.5 m/z (for example, from 525.7671 to 526.2592), which indicates a charge state of z = 2, since isotopic spacing is approximately 1/z.

Using this charge state, the singly charged form of the peptide was calculated as:

[M+H]+ = 1050.53

This is also consistent with the peak observed near m/z 1050.5244 in the spectrum.
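The two steps above (charge from isotope spacing, then conversion to the singly charged ion) can be written out as a short sketch. It uses only the m/z values quoted in this answer and a proton mass of 1.0073 Da; the function names are illustrative.

```python
PROTON = 1.0073  # Da

def charge_from_isotope_spacing(mz_first, mz_second):
    """Isotope peaks are ~1 Da apart in mass, so their m/z spacing is ~1/z."""
    return round(1.0 / abs(mz_second - mz_first))

def singly_charged_mz(mz, z, proton=PROTON):
    """Convert an [M + zH]z+ ion to the corresponding [M+H]+ value."""
    neutral = z * (mz - proton)
    return neutral + proton

z = charge_from_isotope_spacing(525.7671, 526.2592)
mh = singly_charged_mz(525.7671, z)

print(f"z = {z}, [M+H]+ = {mh:.2f}")
```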

6) Peptide identification and mass accuracy

Based on comparison with the expected tryptic peptide masses in PeptideMass, the peptide is FEGDTLVNR.

The experimental singly charged mass was m/z 1050.5244 and the theoretical mass for [M+H]+ of FEGDTLVNR is 1050.5214.

Accuracy = |MW_experiment - MW_theory| / MW_theory

Accuracy = |1050.5244 - 1050.5214| / 1050.5214

Accuracy = 0.00000286

Therefore, the measurement error is approximately 2.9 ppm.
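The theoretical [M+H]+ value can be reproduced independently of PeptideMass by summing standard monoisotopic residue masses. The sketch below includes only the residues needed for FEGDTLVNR; the mass table and helper names are my own additions, and the result is an error of a few ppm, in line with the calculation above.

```python
# Standard monoisotopic residue masses (Da) for the residues in this peptide.
RESIDUE_MASS = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056   # Da, added once for the free N- and C-termini
PROTON = 1.00728   # Da

def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6

theory = mh_plus("FEGDTLVNR")
error = ppm_error(1050.5244, theory)

print(f"theoretical [M+H]+ = {theory:.4f}")
print(f"mass error = {error:.1f} ppm")
```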

7) Sequence coverage confirmed by peptide mapping

The peptide mapping data confirm 88% of the eGFP sequence.

Waters Part IV — Oligomers

1) KLH oligomer assignments

Using the subunit masses from Table 1:

  • 7FU = 340 kDa
  • 8FU = 400 kDa

the expected oligomer masses are:

  • 7FU decamer = 10 × 340 kDa = 3.4 MDa
  • 8FU didecamer = 20 × 400 kDa = 8.0 MDa
  • 8FU 3-decamer = 30 × 400 kDa = 12.0 MDa
  • 8FU 4-decamer = 40 × 400 kDa = 16.0 MDa
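The oligomer arithmetic is simply the subunit mass multiplied by the number of subunits. A minimal sketch, using the Table 1 masses quoted above (the data structure is my own choice):

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}  # subunit masses from Table 1

# (label, subunit type, number of subunits in the assembly)
OLIGOMERS = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]

masses_mda = {}
for label, subunit, copies in OLIGOMERS:
    masses_mda[label] = copies * SUBUNIT_KDA[subunit] / 1000  # kDa -> MDa
    print(f"{label}: {copies} x {SUBUNIT_KDA[subunit]} kDa = {masses_mda[label]} MDa")
```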

Waters Part V — Did I make GFP?

| | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error |
|---|---|---|---|
| Molecular weight (kDa) | 28.0066 | 27.9838 | 812.7 ppm |

Week 11 HW: Building Genomes

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

  1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

1.1 E. coli Lysate

  • BL21 (DE3) Star Lysate: Provides the essential molecular machinery, including ribosomes and translation factors, while T7 RNA polymerase drives transcription of the DNA template into mRNA.

1.2 Salts/Buffer

  • Potassium & Magnesium Glutamate: Essential for maintaining ionic balance; magnesium specifically acts as a cofactor for ribosome stability and enzymatic functions.
  • HEPES-KOH (pH 7.5) & Potassium Phosphates: Act as buffering agents to stabilize the pH, ensuring an optimal environment for biochemical reactions.

1.3 Energy / Nucleotide System

  • Ribose & Glucose: Serve as carbon sources and secondary energy substrates to power the regeneration of ATP within the system.
  • AMP, CMP, GMP, UMP & Guanine: Nucleotide precursors for mRNA synthesis. The lysate phosphorylates the nucleoside monophosphates to the NTPs used during transcription, and the free base guanine can be salvaged into GMP.

1.4 Translation Mix (Amino Acids)

  • 17 Amino Acid Mix, Tyrosine & Cysteine: These are the monomers, or “building blocks”, that are polymerized to form the specific protein sequence.

1.5 Additives

  • Nicotinamide: Helps maintain metabolic flux and prevents the degradation of key energy cofactors like NAD+.

1.6 Backfill

  • Nuclease-Free Water: Used to adjust the reaction to its final volume while ensuring the absence of enzymes that could degrade DNA or RNA templates.

  2. Main Differences (PEP-NTP vs. NMP-Ribose-Glucose)

    • The 1-hour PEP-NTP mix uses high-energy PEP and direct NTPs for immediate, rapid protein synthesis, whereas the 20-hour mix relies on slower, sustainable energy regeneration from Ribose and Glucose using NMPs. This shift from “ready-to-use” fuels to “precursor-based” metabolism allows the 20-hour system to maintain reaction stability for a significantly longer duration.
  3. Bonus Question: Transcription without GMP

    • Transcription can still occur because the cell-free lysate contains endogenous salvage enzymes (such as phosphoribosyltransferases) that attach free Guanine to a ribose-phosphate donor (PRPP). This pathway converts the free Guanine base into GMP, which is then phosphorylated to the GTP required by the RNA polymerase to build the mRNA strand.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

  1. Properties of Fluorescent Proteins in Cell-Free Systems

1.1 sfGFP (superfolder GFP): Known for its exceptionally fast folding and high stability, which allows for rapid detection and high expression levels even in robust cell-free environments.

1.2 mRFP1: This protein has a relatively slow maturation time and lower photostability compared to newer variants, which can lead to a delayed or weaker signal readout during the reaction.

1.3 mKO2: It is highly bright and has fast maturation, but like many orange/red proteins, its fluorescence can be sensitive to the pH levels maintained by the buffer system.

1.4 mTurquoise2: Characterized by its high quantum yield and superior photostability, providing a very bright and consistent signal that is ideal for precise quantitative readouts.

1.5 mScarlet-I: One of the brightest red fluorescent proteins available, it features a fast maturation rate that is specifically optimized for efficient folding in various expression systems.

1.6 Electra2: Designed for high photostability and rapid maturation, making it particularly effective for real-time monitoring of protein synthesis in long-duration cell-free reactions.

  2. Hypothesis for Improving Long-Term Fluorescence
  • Protein: Electra2
  • Reagent(s) Adjustment: Increase the initial concentration of Glucose and Nicotinamide in the master mix.
  • Expected Effect: By increasing these reagents, we enhance the secondary energy regeneration pathway, ensuring a steady ATP supply over the full 36-hour period. Combined with Electra2’s inherent rapid maturation and high photostability, this sustained energy flux will maximize the number of correctly folded fluorescent molecules throughout the entire reaction duration, maintaining a strong and consistent fluorescence signal without the reaction running out of fuel prematurely.
  3. Final Phase Reaction Protocol: The final reaction will have a total volume of 20 µL, of which 2 µL are custom reagent supplements. I will use these 2 µL to deliver the adjusted Glucose and Nicotinamide concentrations proposed in my hypothesis, so that ATP remains available throughout the full 36-hour period, supporting Electra2’s rapid maturation and a consistent fluorescence signal over the entire reaction duration.
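Because the 2 µL supplement is diluted into a 20 µL reaction, any extra reagent it carries ends up ten times more dilute than in the supplement stock. The sketch below works out the stock concentration needed for a chosen increase in final concentration. The target values are hypothetical placeholders, not concentrations taken from the course protocol.

```python
TOTAL_UL = 20.0       # final reaction volume
SUPPLEMENT_UL = 2.0   # volume available for custom supplements

def stock_for_added_final(added_final_mM, total_ul=TOTAL_UL, supplement_ul=SUPPLEMENT_UL):
    """Stock concentration needed so the supplement adds `added_final_mM` to the reaction."""
    return added_final_mM * total_ul / supplement_ul

dilution_factor = TOTAL_UL / SUPPLEMENT_UL

# Hypothetical increases in final concentration, for illustration only.
added_final = {"Glucose": 30.0, "Nicotinamide": 5.0}  # mM
stocks = {name: stock_for_added_final(c) for name, c in added_final.items()}

print(f"dilution factor: {dilution_factor:.0f}x")
for name, stock in stocks.items():
    print(f"{name}: supplement stock of {stock:.0f} mM adds {added_final[name]:.0f} mM")
```

If both reagents share the same 2 µL, a single mixed stock at these concentrations would be needed, which puts a practical upper limit on how far either one can be raised.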

Part D: Build-A-Cloud-Lab

CELL-FREE SYNTHESIS OPTIMIZATION & LAB AUTOMATION DESIGN

    1. Reagent Roles & Hypothesis: My project focuses on extending the protein synthesis reaction from 20 to 36 hours.
  • RAC-15 (Preparation): Used for the high-precision addition of the 2 µL custom supplements (Glucose/Nicotinamide) into the master mix.

  • RAC-16 (Incubator): Dedicated to maintaining a stable thermal environment for the entire 36-hour duration.

  • Lunatic (Analysis): Integrated into the loop to perform real-time fluorescence measurements without interrupting the experiment.

Workflow Logic: The circular Magnum Motion track allows for an automated, iterative cycle. Samples move from incubation (RAC-16) to measurement (Lunatic) and back to incubation without manual intervention or thermal shock. This ‘Infinite Loop’ design is critical for documenting the full 36-hour fluorescence curve and proving that the increased ATP availability from the adjusted reagents maximizes Electra2’s fluorescence output throughout the entire reaction duration.

Figure: Final Lab Design

Conclusion and Scalability: The primary advantage of this automated circular design is its capacity for High-Throughput Screening (HTS). Beyond simple monitoring, the RAC-15 can be programmed to dispense a gradient of different Glucose and Nicotinamide concentrations across a single 96-well plate. By cycling this plate through the Lunatic every hour, the system will simultaneously characterize multiple experimental conditions. This allows us to identify the precise “sweet spot” for protein yield and maturation over the 36-hour window in a single, autonomous run.
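One way such a screen could be laid out is a two-dimensional gradient, with one reagent varied down the rows and the other across the columns, read once per hour. The sketch below is only a planning aid: the fold-change range (1x to 2x of the baseline recipe) is a hypothetical choice, and nothing here drives the RAC-15 or Lunatic instruments.

```python
from itertools import product
import string

ROWS = string.ascii_uppercase[:8]   # A to H
COLS = range(1, 13)                 # 1 to 12

def linspace(low, high, n):
    """Evenly spaced values from low to high, inclusive."""
    return [low + (high - low) * i / (n - 1) for i in range(n)]

# Hypothetical fold-changes relative to the baseline master mix.
glucose_fold = linspace(1.0, 2.0, len(ROWS))
nicotinamide_fold = linspace(1.0, 2.0, len(COLS))

# Map each well (e.g. "A1") to its (glucose, nicotinamide) fold-change.
plate = {
    f"{row}{col}": (g, n)
    for (row, g), (col, n) in product(zip(ROWS, glucose_fold), zip(COLS, nicotinamide_fold))
}

# Hourly reads over the 36-hour run, including the time-zero read.
read_times_h = list(range(0, 37))

print(len(plate), "wells,", len(read_times_h), "reads per well")
print("A1:", plate["A1"], " H12:", plate["H12"])
```

In practice some wells would be reserved for replicates and no-DNA controls, which reduces the number of distinct conditions per plate.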

Subsections of Labs

Week 1 Lab: Pipetting


Subsections of Projects

Individual Final Project


Group Final Project
