<Kaleab Berhanu> — HTGAA Spring 2026


About Me

My name is Kaleab Berhanu, and I am a senior student double-majoring in Life Science and Biomedical Engineering at Hanyang University. My academic journey has taken me across the intersection of biology, engineering, and computation, and I am deeply motivated by the question of how biological systems can be understood, redesigned, and deployed to solve real-world problems in medicine and human health.

Research Experience

My research background began at the Neural Intelligence and Systems Laboratory at Hanyang University, where I worked as a research intern investigating the role of EPG neurons in the visual reflex behavior of Drosophila melanogaster. Using the UAS-GAL4 system to genetically manipulate specific neuronal populations, I maintained genetic lines through controlled inbreeding and analyzed behavioral and neural datasets using MATLAB. This experience gave me a deep appreciation for how much remains unknown about the nervous system, and how rigorous experimental design and patient analysis are the foundation of meaningful scientific discovery.

HTGAA 2026

This spring, I participated in How to Grow Almost Anything (HTGAA) 2026, MIT Media Lab’s intensive synthetic biology program, with weekly lab sessions held at Yonsei University. Across the course, I worked on DNA reading, writing, and editing, genetic circuit design, protein engineering, cell-free protein synthesis, lab automation, bioproduction, imaging and measurement, and building genomes. I completed computational projects using tools including Benchling, Asimov Kernel, PepMLM, PeptiVerse, and AlphaFold3, and gained hands-on experience with molecular biology techniques across wet and dry lab contexts.

Final Project

My HTGAA final project is titled A Hydrogel-Embedded Multiple Input-Output (MIMO) Genetic Circuit for IL-6 and Hypoxia Detection. The project addresses a critical gap in clinical monitoring: the inability to detect localized inflammation and hypoxia in real time at the tissue level without repeated blood draws or imaging. I designed a cell-free genetic circuit that senses two simultaneous disease signals — elevated IL-6 (an inflammatory cytokine) and low oxygen tension (hypoxia) — and produces two outputs: sfGFP fluorescence as a diagnostic reporter and a therapeutic peptide as a functional biological output. The circuit is encapsulated within an agarose hydrogel bioink matrix, creating a self-contained sense-and-respond biomaterial platform with potential applications in wound care, oncology, and critical care medicine globally.

Beyond the Lab

Outside of research, I have worked as a Global Ambassador for Hanyang University and produced content across English and Korean channels for ESAK. I participated in the Korea–Cambodia ICT Exchange Program organized by the Korea Productivity Center and KOICA, where I led the field research component of a smart irrigation project, conducting farmer interviews and designing a prototype that addressed real agricultural challenges; the project won third place at a local competition in Cambodia. I also completed the LINC 3.0 Industry Collaboration Program, developing a global market strategy for an AI-based environmental analysis startup, and built data analysis and digital skills through Python-based coursework.

Future Aspirations

My long-term goal is to become a biotechnologist working at the frontier of synthetic biology and therapeutic engineering. I am particularly drawn to the design of programmable biological systems — circuits, materials, and platforms that can autonomously sense disease and respond with a therapeutic output, without requiring living organisms or complex clinical infrastructure. I am strongly considering pursuing graduate school in biomedical engineering or synthetic biology, where I hope to develop the depth of expertise needed to contribute independently to the fields of neural engineering, cell-free systems, and implantable biomaterials. I believe that the future of medicine lies not in reactive treatment but in proactive, localized biological intelligence embedded directly at the site of disease — and I intend to spend my career helping build it.


Homework

Weekly homework submissions:

  • week-01-hw-principles-and-practices

    Biological Engineering Application or Tool The proposed application is an AI-guided protein therapeutic discovery and bioproduction platform. The system uses machine learning–based protein design models to generate novel therapeutic protein candidates, such as antimicrobial proteins, enzymes, or biologics optimized for stability and activity. These candidates are then evaluated for manufacturability and functional performance using controlled bioproduction workflows, including microbial expression or cell-free systems. This application reflects an emerging paradigm in biopharmaceutical development, where AI accelerates early-stage discovery while scalable bioproduction determines clinical and commercial feasibility. However, as AI enables rapid de novo protein design, many generated sequences may lack homology to known natural proteins, introducing novel biosecurity and safety risks if not properly governed.

  • week-02-hw-dna-read-write-and-edit

    This week, we used different tools to virtually read, write, and visualize DNA, using samples such as lambda DNA (from bacteriophage λ, which infects Escherichia coli) and the human tumor suppressor gene TP53. Part 1: Introduction and DNA digest. Gel electrophoresis (gel = material, electro = electric, phoresis = to transport) is a method for moving charged molecules through a gel (a semi-solid matrix) using an electric field. Digested fragments of lambda DNA

  • week-03-hw-lab-automation

    Bioart Using Opentrons. Goals of this lesson: use programmed robots to automate lab work, and design, coordinate, code, and print one’s own design using an Opentrons robot.

  • week-04-hw-protein-design-part-i

    A. Conceptual Questions. How many amino acid molecules do you take in with a 500 g piece of meat? (On average, an amino acid is ~100 Daltons.) Answer: 1 Dalton ≈ 1 g/mol, so an average amino acid ≈ 100 g/mol. If you eat 500 g of (pure) amino acids: number of moles = mass / molar mass = 500 g ÷ 100 g/mol

  • week-05-hw-protein-design-part-ii

    Protein Design Part 2. SOD1 A4V mutant sequence used: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ. Part 1: PepMLM-Generated Peptides Summary. The four PepMLM-generated peptides were conditioned on the SOD1 A4V mutant sequence with a target length of 12 amino acids; the exception is Peptide 2, which came out at 15 residues. Perplexity scores reflect the model’s confidence in each binder: a lower score indicates higher confidence. Peptide 1 (WLYGAAGVRWGX) has the lowest perplexity, 13.06, making it the model’s most confident prediction, though its final position is an X, an unresolved or masked residue that should be addressed before the peptide is advanced further. Peptides 2, 3, and 4 all cluster between 17 and 20, reflecting moderate confidence. The known binder FLYRWLPSRRGG is included as a structural and therapeutic benchmark; it has no perplexity score because it was not generated by PepMLM.

  • week-06-hw-genetic-circuits-part-ii

    Part 1. What are some components of the Phusion High-Fidelity PCR Master Mix, and what is their purpose? Phusion High-Fidelity PCR Master Mix, sold by Thermo Fisher Scientific, contains a high-fidelity DNA polymerase with proofreading ability, a reaction buffer that maintains optimal conditions, Mg²⁺ ions as a cofactor, dNTPs as building blocks, and stabilizing additives. Together, these components enable accurate and efficient DNA amplification with a low error rate. What are some factors that determine primer annealing temperature during PCR?

  • week-07-hw-genetic-circuits-part-ii

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Intracellular artificial neural networks provide more flexible and nuanced behavior than traditional Boolean genetic circuits because they can process inputs in a graded, continuous manner rather than simple on or off states. This allows cells to integrate multiple signals and produce proportional responses, making them better suited for complex decision making and pattern recognition inside biological systems. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

  • week-09-hw-cell-free-systems

    Homework question from Kate Adamala. Design an example of a useful synthetic minimal cell as follows: Pick a function and describe it. a. What would your synthetic cell do? What is the input, and what is the output? The cell-free genetic circuit that I plan to build for the final project aims to detect different biological signals and produce a measurable output. The input will be one of two environmental signals, IL-6 or low O₂, and the output will be a green fluorescence signal or a therapeutic peptide. Could this function be realized by cell-free Tx/Tl alone, without encapsulation? The system I have in mind needs to be encapsulated inside a hydrogel. Could this function be realized by genetically modified natural cells? Cells do have mechanisms to respond to real signals in the body, but producing therapeutic peptides or fluorescent reporter signals in response to a given signal requires the cell to be preprogrammed, with a genetic circuit assembled to detect that signal and respond accordingly. Describe the desired outcome of your synthetic cell operation. Output will be a green fluorescence signal. Design all components that would need to be part of your synthetic cell. What would the membrane be made of?

  • week-10-hw-imaging-and-measurement

    Protein Characterization: eGFP and KLH. Homework: Final Project. Project Title: A Hydrogel-Embedded Multiple Input-Output (MIMO) Genetic Circuit for IL-6 and Hypoxia Detection. What I Will Measure: My final project centers on engineering a cell-free genetic circuit embedded within a hydrogel matrix that responds to two physiological disease signals, IL-6 (an inflammatory cytokine) and low oxygen tension (hypoxia), and produces two corresponding outputs: sfGFP fluorescence as a reporter signal and a therapeutic peptide as a functional output.

  • week-11-hw-building-genomes

    Bioproduction & Cloud Labs. Part A: The 1,536 Pixel Artwork Canvas (Collective Artwork). My Contribution: I contributed pixels forming part of the DNA helix structure in the lower-left quadrant of the collective canvas, using a blue-green palette consistent with the biological theme of the artwork.


week-01-hw-principles-and-practices

  1. Biological Engineering Application or Tool

The proposed application is an AI-guided protein therapeutic discovery and bioproduction platform. The system uses machine learning–based protein design models to generate novel therapeutic protein candidates, such as antimicrobial proteins, enzymes, or biologics optimized for stability and activity. These candidates are then evaluated for manufacturability and functional performance using controlled bioproduction workflows, including microbial expression or cell-free systems.

This application reflects an emerging paradigm in biopharmaceutical development, where AI accelerates early-stage discovery while scalable bioproduction determines clinical and commercial feasibility. However, as AI enables rapid de novo protein design, many generated sequences may lack homology to known natural proteins, introducing novel biosecurity and safety risks if not properly governed.

  2. Governance / Policy Goals

The overarching governance goal is to ensure that AI-enabled protein drug discovery and bioproduction contribute to a safe, ethical, and socially beneficial future, while preventing misuse or unintended harm. This goal can be divided into the following sub-goals:

2.1. Non-malfeasance and biosecurity

  Prevent the accidental or intentional creation of harmful, toxic, or dual-use proteins enabled by AI-assisted design.

2.2. Responsible scale-up and traceability
Ensure that the transition from digital protein design to physical bioproduction is secure, auditable, and accountable.

2.3. Preservation of constructive innovation
Maintain open scientific collaboration and efficient therapeutic development without imposing unnecessary regulatory burdens that would slow innovation.

These goals align with arguments advanced by Baker and Church, who emphasize that enhanced biosecurity should be embedded into protein design and DNA synthesis infrastructure without undermining transparency or information sharing.

  3. Governance Actions (Purpose, Design, Assumptions, Risks)

3.1 Governance Action 1: Integrated Safety Screening and Secure Sequence Logging

Purpose

Currently, AI protein design pipelines primarily optimize for functional performance, and existing biosecurity measures rely heavily on sequence homology screening at the DNA synthesis stage. As Baker and Church note, this approach is increasingly insufficient for de novo designed proteins. This project proposes an integrated governance mechanism that embeds mandatory AI-based safety screening and secure sequence logging directly into the protein design and bioproduction pipeline.

Design

This governance approach would be implemented through collaboration among AI tool developers, biopharmaceutical companies, and DNA synthesis or bioproduction providers. All AI-generated protein sequences would undergo computational screening for toxicity, virulence, and dual-use potential before synthesis approval. Once synthesized, sequences would be logged in encrypted repositories tied to production systems, with access restricted to exceptional circumstances such as public health investigations. This design enables traceability and accountability while protecting intellectual property and minimizing interference with normal research workflows.
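The screen-then-log step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposed implementation: `risk_score` is a hypothetical placeholder for a validated toxicity, virulence, and dual-use predictor, the 0.5 threshold is arbitrary, and instead of an encrypted repository the log uses SHA-256 hash chaining so that any later edit to a record is detectable. Encryption and access control are left out.

```python
import hashlib
import json
import time


def risk_score(sequence):
    """Hypothetical placeholder for a validated toxicity / virulence / dual-use model.

    Always returns 0.0 so the pipeline runs; a real deployment would call a screening model here.
    """
    return 0.0


class AuditLog:
    """Append-only, hash-chained log: each entry's hash covers the previous hash,
    so altering or deleting an earlier record breaks verification."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(record, prev_hash):
        body = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"record": record, "prev": prev_hash, "hash": self._digest(record, prev_hash)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(entry["record"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


def approve_for_synthesis(sequence, designer_id, log, threshold=0.5, scorer=risk_score):
    """Screen a designed protein sequence before synthesis and log the decision either way."""
    score = scorer(sequence)
    approved = score < threshold
    log.append({
        "designer": designer_id,
        # Only a digest is stored, so the log does not reveal the sequence itself
        "sequence_sha256": hashlib.sha256(sequence.encode()).hexdigest(),
        "risk_score": score,
        "approved": approved,
        "timestamp": time.time(),
    })
    return approved


# Usage with placeholder inputs
log = AuditLog()
approve_for_synthesis("MKTAYIAKQR", "lab-1", log)                        # passes the stub scorer
approve_for_synthesis("MKTAYIAKQR", "lab-1", log, scorer=lambda s: 0.9)  # blocked
print(log.verify())  # True
```

Storing only a digest of each sequence lets an auditor confirm that a specific sequence was screened, without the log itself exposing proprietary designs.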

Assumptions

This approach assumes that predictive models for protein toxicity and risk are sufficiently accurate to identify high-risk candidates and that industry actors are willing to adopt shared security standards. It also assumes that secure logging can be implemented in a way that does not expose proprietary information or discourage legitimate research.

Risks of Failure and “Success”

Potential failure modes include false negatives that allow harmful proteins to proceed or false positives that block legitimate therapeutic candidates. Additionally, if logging systems are unevenly implemented, malicious actors may bypass regulated platforms. A potential risk of “success” is increased centralization of bioproduction infrastructure, which could disadvantage smaller labs or researchers in low-resource settings if access is not equitably managed.

3.2 Governance Action 2: Tiered Access and Credentialing for Advanced Protein Design Models

Purpose

Many AI protein design tools are becoming increasingly accessible, with little differentiation between low-risk exploratory use and high-risk de novo protein generation. This action proposes a tiered access system in which more powerful generative protein design capabilities require additional credentials, training, or institutional affiliation.

Design

AI tool providers and research institutions would implement access tiers based on user role, training completion, and intended application. Basic design and analysis features would remain widely accessible, while advanced generative functions (e.g., unrestricted de novo protein design) would require completion of biosecurity and ethics training, institutional oversight, or project-level approval. This mirrors governance models used in high-performance computing, clinical data access, and human-subjects research.
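At its core, a tiered scheme like this is a mapping from capabilities to required credentials. The sketch below uses hypothetical tier names and credential labels (`basic`, `advanced`, `de_novo_generation`, `biosecurity_training`, and so on); a real provider would define its own tiers and verify credentials against institutional records. The text above allows alternatives ("or"), whereas this sketch requires all four credentials for de novo generation, which is one stricter reading.

```python
from dataclasses import dataclass, field

# Hypothetical tiers: each capability lists the credentials it requires.
TIER_REQUIREMENTS = {
    "basic": set(),  # analysis and basic design features stay open to everyone
    "advanced": {"biosecurity_training"},
    "de_novo_generation": {"biosecurity_training", "ethics_training",
                           "institutional_oversight", "project_approval"},
}


@dataclass
class User:
    name: str
    credentials: set = field(default_factory=set)


def allowed_tiers(user):
    """Return every tier whose required credentials are a subset of the user's credentials."""
    return {tier for tier, needed in TIER_REQUIREMENTS.items() if needed <= user.credentials}


def can_use(user, tier):
    return tier in allowed_tiers(user)


student = User("student", {"biosecurity_training"})
print(sorted(allowed_tiers(student)))  # ['advanced', 'basic']
```

Keeping the requirements in one table makes the policy easy to audit and adjust without touching the access-check logic.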

Assumptions

This approach assumes that access restrictions can meaningfully reduce misuse without pushing users toward unregulated alternatives. It also assumes institutions are capable of fairly and consistently evaluating access requests.

Risks of Failure and “Success”

If too restrictive, tiered access could slow innovation or disadvantage independent researchers and low-resource institutions. If too permissive, it may fail to deter misuse. A risk of “success” is the normalization of credential-based gatekeeping that could reinforce existing inequities in global research participation.

3.3 Governance Action 3: Safety-by-Design Standards Linked to Incentives and Recognition

Purpose

While safety measures are often framed as compliance requirements, this action reframes governance as an incentive-based system that rewards early integration of biosecurity and safety considerations into AI-driven protein design and bioproduction.

Design

Funding agencies, journals, and investors would establish safety-by-design criteria as part of grant evaluation, publication standards, and due diligence. Projects that demonstrate integrated risk assessment, secure production workflows, and ethical reflection would receive preferential funding, expedited review, or public recognition. This approach aligns governance with existing academic and commercial reward structures rather than relying solely on enforcement.

Assumptions

This approach assumes that researchers and companies respond strongly to funding, publication, and reputational incentives. It also assumes evaluators have sufficient expertise to assess safety claims without turning the process into box-checking.

Risks of Failure and “Success”

If poorly designed, incentives may encourage superficial compliance rather than genuine risk mitigation. A risk of “success” is that safety standards become rigid or outdated, unintentionally discouraging novel approaches that do not fit existing evaluation frameworks.

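  4. Does the option:

| Policy goal | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance biosecurity: by preventing incidents | 1 | 2 | 2 |
| Enhance biosecurity: by helping respond | 1 | 3 | 3 |
| Foster lab safety: by preventing incidents | 2 | 2 | 1 |
| Foster lab safety: by helping respond | 1 | 3 | 2 |
| Protect the environment: by preventing incidents | 2 | 3 | 2 |
| Protect the environment: by helping respond | 1 | 3 | 2 |
| Other considerations: minimizing costs and burdens to stakeholders | 2 | 1 | 1 |
| Other considerations: feasibility | 1 | 2 | 2 |
| Other considerations: not impeding research | 2 | 1 | 1 |
| Other considerations: promoting constructive applications | 1 | 2 | 1 |

  5. Evaluation and Prioritization of Governance Approaches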

Overall, the integrated screening-and-logging approach (Option 1) performs well across the major policy goals of biosecurity, lab safety, and responsible innovation. By focusing on prevention at the design stage and accountability at the production stage, it strengthens biosecurity while remaining feasible and compatible with existing biopharmaceutical workflows. Although the approach introduces some additional cost and procedural overhead, it does not fundamentally impede research and instead helps reduce downstream failures and regulatory risk.
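The scoring table does not state its scale. Assuming the common convention that 1 is best and 3 is worst (consistent with Option 1 being recommended), an unweighted total gives a quick cross-check of the ranking. Equal weighting of every criterion is itself an assumption; weighting the biosecurity rows more heavily would be an equally defensible choice.

```python
# Scores transcribed from the scoring table above, one entry per row (top to bottom).
SCORES = {
    "Option 1": [1, 1, 2, 1, 2, 1, 2, 1, 2, 1],
    "Option 2": [2, 3, 2, 3, 3, 3, 1, 2, 1, 2],
    "Option 3": [2, 3, 1, 2, 2, 2, 1, 2, 1, 1],
}


def rank_options(scores):
    """Sum each option's scores and sort ascending (assumes 1 = best, 3 = worst)."""
    totals = {option: sum(values) for option, values in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1])


print(rank_options(SCORES))  # [('Option 1', 14), ('Option 3', 17), ('Option 2', 22)]
```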

  6. Final Recommendation and Trade-offs

Based on this evaluation, the integrated safety screening and secure sequence logging approach should be prioritized as the primary governance mechanism for AI-enabled protein drug discovery and bioproduction. This strategy addresses the highest-risk stages—design and scale-up—while remaining technically feasible and aligned with existing biopharmaceutical practices. The key trade-off involves balancing innovation speed with safety and accountability. While additional screening and logging may introduce modest overhead, these costs are outweighed by reduced downstream failures, increased regulatory confidence, and improved public trust.

This recommendation is directed toward biopharmaceutical R&D leadership and regulatory agencies, where early alignment between AI-driven discovery and governance expectations can ensure that emerging therapeutic technologies are both innovative and trustworthy.


week-02-hw-dna-read-write-and-edit

This week, we used different tools to virtually read, write, and visualize DNA, using samples such as lambda DNA (from bacteriophage λ, which infects Escherichia coli) and the human tumor suppressor gene TP53.

Part 1 - Introduction and DNA digest.

Gel Electrophoresis

  • Gel - material
  • Electro - Electric
  • Phoresis - to transport
  • It is a method for moving charged molecules, such as DNA fragments, through a gel (a semi-solid matrix) using an electric field; smaller fragments travel farther in a given time.
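Over a useful range, the distance a DNA fragment migrates is roughly linear in the logarithm of its length, so band sizes are usually read off a ladder by fitting a semi-log standard curve. A minimal sketch, assuming ladder distances measured in millimetres from the well; the function names and the example distances are made up for illustration and are not measurements from this lab.

```python
import math


def fit_standard_curve(ladder):
    """Least-squares fit of log10(size_bp) = slope * distance_mm + intercept.

    ladder: list of (distance_mm, size_bp) pairs for the marker bands.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm, slope, intercept):
    """Estimate a band's size (bp) from how far it migrated."""
    return 10 ** (slope * distance_mm + intercept)


ladder = [(10.0, 10000), (30.0, 1000), (50.0, 100)]  # made-up (distance_mm, size_bp) pairs
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_size(20.0, slope, intercept)))  # 3162
```

The fit is only trustworthy between the largest and smallest ladder bands; extrapolating beyond them is unreliable.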

Digested fragments of Lambda DNA

Image: Design created
Image: Restriction enzyme digest of lambda DNA on Benchling
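A virtual digest like the one Benchling performs comes down to finding each enzyme's recognition site and cutting at a fixed offset. The sketch below assumes a linear molecule, scans only the top strand (valid here because the EcoRI, BamHI, and HindIII sites are palindromic), and ignores sticky-end overhangs; the enzyme table and `digest_linear` are illustrative, not Benchling's API. Run on the full lambda sequence, it would give the fragment sizes expected on the gel.

```python
import re

# Recognition site and cut position within the site (top strand), e.g. EcoRI cuts G^AATTC.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}


def digest_linear(seq, enzyme_names):
    """Return fragment lengths (bp), in order along the molecule, for a linear DNA digest."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        # Zero-width lookahead so overlapping occurrences are all found
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAAA" + "GAATTC" + "TTTTTTTT" + "GGATCC" + "CC"  # 26 bp toy sequence
print(digest_linear(toy, ["EcoRI", "BamHI"]))  # [5, 14, 7]
```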

Part 2

For this assignment, I chose the human tumor suppressor protein p53 (TP53), because I had previously done a comparative analysis with its mouse counterpart, Trp53.

3.1 The full amino acid sequence of the p53 protein in FASTA format

>AAH03596.1 Tumor protein p53 [Homo sapiens]
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA
PRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKT
CPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRN
TFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACAGR
DRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALEL
KDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD

3.2 Reverse Translated Sequence

atggaagaaccgcagagcgatccgagcgtggaaccgccgctgagccaggaaacctttagc gatctgtggaaactgctgccggaaaacaacgtgctgagcccgctgccgagccaggcgatg gatgatctgatgctgagcccggatgatattgaacagtggtttaccgaagatccgggcccg gatgaagcgccgcgcatgccggaagcggcgccgcgcgtggcgccggcgccggcggcgccg accccggcggcgccggcgccggcgccgagctggccgctgagcagcagcgtgccgagccag aaaacctatcagggcagctatggctttcgcctgggctttctgcatagcggcaccgcgaaa agcgtgacctgcacctatagcccggcgctgaacaaaatgttttgccagctggcgaaaacc tgcccggtgcagctgtgggtggatagcaccccgccgccgggcacccgcgtgcgcgcgatg gcgatttataaacagagccagcatatgaccgaagtggtgcgccgctgcccgcatcatgaa cgctgcagcgatagcgatggcctggcgccgccgcagcatctgattcgcgtggaaggcaac ctgcgcgtggaatatctggatgatcgcaacacctttcgccatagcgtggtggtgccgtat gaaccgccggaagtgggcagcgattgcaccaccattcattataactatatgtgcaacagc agctgcatgggcggcatgaaccgccgcccgattctgaccattattaccctggaagatagc agcggcaacctgctgggccgcaacagctttgaagtgcgcgtgtgcgcgtgcgcgggccgc gatcgccgcaccgaagaagaaaacctgcgcaaaaaaggcgaaccgcatcatgaactgccg ccgggcagcaccaaacgcgcgctgccgaacaacaccagcagcagcccgcagccgaaaaaa aaaccgctggatggcgaatattttaccctgcagattcgcggccgcgaacgctttgaaatg tttcgcgaactgaacgaagcgctggaactgaaagatgcgcaggcgggcaaagaaccgggc ggcagccgcgcgcatagcagccatctgaaaagcaaaaaaggccagagcaccagccgccat aaaaaactgatgtttaaaaccgaaggcccggatagcgat
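The reverse-translated sequence above appears to use one fixed codon per amino acid (for example, every proline is `ccg` and every serine is `agc`). Assuming that is how the tool worked, the back-translation can be reproduced with a simple lookup; the table below was read off the sequence itself rather than taken from the tool's documentation, and `reverse_translate` is a hypothetical helper.

```python
# One codon per amino acid, inferred from the reverse-translated p53 sequence above.
CODON_FOR = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}


def reverse_translate(protein):
    """Back-translate a protein using a single fixed codon per residue (no stop codon added)."""
    return "".join(CODON_FOR[aa] for aa in protein.upper())


print(reverse_translate("MEEPQ"))  # atggaagaaccgcag
```

Real reverse-translation tools may instead sample codons by usage frequency or append a stop codon; this sketch does neither, and neither does the sequence above.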

3.3 Codon-optimized sequence

ATGGAAGAACCACAAAGTGACCCCAGCGTTGAACCGCCGCTGAGCCAGGAAACCTTCAGTGATCTGTGGAAACTGCTGCCGGAAAACAACGTGCTGAGCCCGCTGCCGAGCCAGGCGATGGATGATCTGATGCTGTCTCCGGATGACATTGAGCAGTGGTTCACCGAAGACCCCGGACCGGATGAAGCGCCGCGTATGCCGGAAGCAGCACCGCGCGTAGCACCGGCACCGGCAGCACCGACCCCGGCTGCACCTGCACCGGCACCCTCATGGCCGCTCAGCAGCTCAGTGCCCAGCCAGAAAACCTATCAGGGCAGCTATGGCTTCCGCCTGGGCTTCCTGCACAGCGGCACGGCAAAATCGGTGACCTGCACCTACAGCCCTGCGCTGAACAAGATGTTCTGCCAGCTGGCGAAAACCTGCCCGGTGCAGCTGTGGGTTGACTCCACACCGCCGCCAGGCACCCGTGTGCGTGCGATGGCGATCTATAAACAGAGCCAGCACATGACCGAAGTGGTGCGTCGCTGCCCGCACCATGAGCGCTGCTCTGACAGCGACGGTCTGGCACCGCCGCAGCATCTGATCCGCGTTGAAGGTAACCTGCGTGTGGAGTATCTGGATGACCGCAACACCTTCCGCCACAGCGTGGTGGTGCCGTATGAACCGCCGGAAGTGGGCAGCGACTGCACCACCATCCACTACAACTACATGTGCAACTCCTCCTGCATGGGCGGTATGAACCGCCGTCCGATTCTGACCATTATCACCCTGGAAGACTCCAGCGGTAACCTGCTGGGCCGTAACAGCTTTGAAGTGCGTGTGTGTGCCTGTGCCGGCCGCGATCGCCGCACGGAAGAAGAAAACCTGCGCAAGAAAGGTGAACCGCACCACGAACTGCCGCCGGGCAGCACCAAGCGTGCGCTGCCGAACAACACCTCCTCCAGCCCGCAGCCGAAGAAGAAACCGCTGGATGGCGAGTACTTCACCCTGCAGATCCGTGGGCGTGAACGTTTTGAAATGTTCCGTGAGCTGAACGAAGCGCTGGAGCTGAAAGATGCGCAGGCGGGTAAAGAGCCGGGTGGCTCACGTGCGCACAGCAGCCACCTGAAATCCAAAAAAGGTCAGAGCACCAGCCGTCACAAAAAACTGATGTTTAAAACTGAAGGGCCGGACAGCGAT
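A basic sanity check on any codon-optimized sequence is that it still encodes the original protein. The sketch below translates DNA with the standard genetic code and reports GC content, one of the properties optimization tools commonly adjust; `translate` and `gc_content` are illustrative names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def gc_content(dna):
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


optimized_start = "ATGGAAGAACCACAAAGTGACCCCAGCGTTGAACCGCCGCTGAGCCAGGAAACCTTCAGT"
print(translate(optimized_start))  # MEEPQSDPSVEPPLSQETFS
print(round(gc_content(optimized_start), 2))
```

For example, the first 60 nucleotides of both the reverse-translated and the optimized sequences translate to MEEPQSDPSVEPPLSQETFS, the first 20 residues of p53, even though their codons differ.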

3.4 Expression Method

Cell-dependent

  • Transform the plasmid into E. coli
  • Grow the cells (the plasmid replicates along with them)
  • Induce expression (e.g., with IPTG)
  • Lyse the cells and purify the protein

Cell-free

  • Add DNA/RNA to cell extract
  • Extract contains ribosomes + factors
  • Protein made in a test tube
  • Faster; good for toxic proteins

3.5 Protein Alignment

The same gene can produce different protein isoforms mainly because of:

  • Alternative Splicing
  • Alternative transcriptional and translational initiation.

Image: Benchling protein alignment of our protein
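
Pairwise protein alignments like the one above are typically built by dynamic programming. As an illustration only (Benchling's own aligners and scoring matrices differ), here is a minimal Needleman-Wunsch global alignment with a flat match/mismatch score and a linear gap penalty; `global_align` and its default scores are assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


print(global_align("MEEPQSDPSV", "MEEPSDPSV"))  # ('MEEPQSDPSV', 'MEEP-SDPSV', 7)
```

The score of 7 is nine matches minus one gap penalty of 2; an isoform missing a stretch of residues would show up the same way, as a run of gaps.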

4. Preparing a Twist DNA Synthesis Order

In this part, I designed an expression cassette that can be inserted into a vector plasmid and used in a cell-free or cell-based system to express a desired protein. To practice the full workflow of building a construct and obtaining a customized plasmid vector, I used Benchling and Twist. I took the sGFP gene sequence from NCBI, annotated its promoter, ribosome-binding site, codon-optimized coding region, and terminator in Benchling, and then placed the cassette into a pTwist Amp High Copy vector downloaded from Twist.

Image: Finalized DNA construct
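
Before ordering, it helps to confirm that the coding region is a clean open reading frame and to record where each part sits in the cassette, much as the Benchling annotations do. A minimal sketch, assuming the parts are given in order as (name, sequence) pairs; `check_cds` and `assemble_cassette` are hypothetical helpers, and the part sequences in the usage example are placeholders, not real promoter, RBS, or terminator sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Return a list of problems in a coding sequence (an empty list means a clean ORF)."""
    cds = cds.upper()
    if not cds:
        return ["empty sequence"]
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    problems = []
    if len(cds) % 3:
        problems.append("length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("missing ATG start codon")
    if codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon at the end")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


def assemble_cassette(parts):
    """Concatenate (name, sequence) parts in order and record 1-based, inclusive feature coordinates."""
    sequence, features, position = "", [], 1
    for name, part in parts:
        features.append((name, position, position + len(part) - 1))
        sequence += part.upper()
        position += len(part)
    return sequence, features


parts = [
    ("promoter", "GGGGAAAA"),  # placeholder sequences, not functional parts
    ("RBS", "AGGAGG"),
    ("CDS", "ATGGCTAAATAA"),
    ("terminator", "CCCCTTTT"),
]
cassette, features = assemble_cassette(parts)
print(check_cds("ATGGCTAAATAA"))  # []
print(features)  # [('promoter', 1, 8), ('RBS', 9, 14), ('CDS', 15, 26), ('terminator', 27, 34)]
```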
5. Tools and Techniques to Read, Write, and Edit DNA.

DNA Read

  • Next-generation sequencing (NGS / Illumina)
  • Nanopore sequencing (Oxford Nanopore)
  • Sanger sequencing

DNA Write

  • Phosphoramidite synthesis (column synthesis)
  • Array-based synthesis
  • Enzymatic DNA synthesis

DNA Edit

  • CRISPR-Cas9
  • Base editing
  • Prime editing

week-03-hw-lab-automation

Bioart Using Opentrons

Goals of this lesson and the Opentrons automation exercise:

  • Use programmed robots to automate different kinds of lab work.
  • Design, coordinate, code, and print one’s own design using an Opentrons robot.

Image: Design

Code


from opentrons import types

metadata = { ‘author’: ‘Kaleab’, ‘protocolName’: ‘ADWA’, ‘description’: ‘ADWA in Amharic and English’, ‘source’: ‘HTGAA 2026 Opentrons Lab’, ‘apiLevel’: ‘2.20’ }

Robot deck setup constants

TIP_RACK_DECK_SLOT = 9 COLORS_DECK_SLOT = 6 AGAR_DECK_SLOT = 5 PIPETTE_STARTING_TIP_WELL = ‘A1’

well_colors = { ‘A1’: ‘Red’, # mRFP1 bacteria ‘B1’: ‘Green’, # sfGFP bacteria }

Coordinates for each bacterial color

mRFP1 (Red) points - in mm offset from agar plate center mrfp1_points = [ (-26.4, 17.6), (-28.6, 15.4), (-24.2, 15.4), (30.8, 15.4), (-30.8, 13.2), (-22, 13.2), (-13.2, 13.2), (-11, 13.2), (-8.8, 13.2), (-6.6, 13.2), (-4.4, 13.2), (-2.2, 13.2), (6.6, 13.2), (19.8, 13.2), (28.6, 13.2), (33, 13.2), (-33, 11), (-19.8, 11), (-13.2, 11), (0, 11), (6.6, 11), (13.2, 11), (19.8, 11), (26.4, 11), (35.2, 11), (-35.2, 8.8), (-17.6, 8.8), (-13.2, 8.8), (2.2, 8.8), (6.6, 8.8), (13.2, 8.8), (19.8, 8.8), (24.2, 8.8), (37.4, 8.8), (-35.2, 6.6), (-17.6, 6.6), (-13.2, 6.6), (2.2, 6.6), (6.6, 6.6), (13.2, 6.6), (19.8, 6.6), (24.2, 6.6), (37.4, 6.6), (-35.2, 4.4), (-17.6, 4.4), (-13.2, 4.4), (2.2, 4.4), (6.6, 4.4), (13.2, 4.4), (19.8, 4.4), (24.2, 4.4), (37.4, 4.4), (-35.2, 2.2), (-17.6, 2.2), (-13.2, 2.2), (2.2, 2.2), (6.6, 2.2), (13.2, 2.2), (19.8, 2.2), (24.2, 2.2), (26.4, 2.2), (28.6, 2.2), (30.8, 2.2), (33, 2.2), (35.2, 2.2), (37.4, 2.2), (-35.2, 0), (-33, 0), (-30.8, 0), (-28.6, 0), (-26.4, 0), (-24.2, 0), (-22, 0), (-19.8, 0), (-17.6, 0), (-13.2, 0), (2.2, 0), (6.6, 0), (13.2, 0), (19.8, 0), (24.2, 0), (37.4, 0), (-35.2, -2.2), (-17.6, -2.2), (-13.2, -2.2), (2.2, -2.2), (6.6, -2.2), (13.2, -2.2), (19.8, -2.2), (24.2, -2.2), (37.4, -2.2), (-35.2, -4.4), (-17.6, -4.4), (-13.2, -4.4), (2.2, -4.4), (6.6, -4.4), (13.2, -4.4), (19.8, -4.4), (24.2, -4.4), (37.4, -4.4), (-35.2, -6.6), (-17.6, -6.6), (-13.2, -6.6), (2.2, -6.6), (8.8, -6.6), (13.2, -6.6), (19.8, -6.6), (24.2, -6.6), (37.4, -6.6), (-35.2, -8.8), (-17.6, -8.8), (-13.2, -8.8), (0, -8.8), (11, -8.8), (13.2, -8.8), (15.4, -8.8), (17.6, -8.8), (24.2, -8.8), (37.4, -8.8), (-35.2, -11), (-17.6, -11), (-11, -11), (-8.8, -11), (-6.6, -11), (-4.4, -11), (-2.2, -11) ]

sfGFP (Green) points - in mm offset from agar plate center sfgfp_points = [ (-15.4, 35.2), (-2.2, 35.2), (0, 35.2), (8.8, 35.2), (11, 35.2), (13.2, 35.2), (-17.6, 33), (-2.2, 33), (0, 33), (2.2, 33), (6.6, 33), (11, 33), (15.4, 33), (-17.6, 30.8), (-15.4, 30.8), (-13.2, 30.8), (-11, 30.8), (-8.8, 30.8), (-2.2, 30.8), (0, 30.8), (6.6, 30.8), (11, 30.8), (15.4, 30.8), (-11, 28.6), (-4.4, 28.6), (-2.2, 28.6), (0, 28.6), (2.2, 28.6), (6.6, 28.6), (11, 28.6), (15.4, 28.6), (-13.2, 26.4), (-4.4, 26.4), (2.2, 26.4), (8.8, 26.4), (11, 26.4), (13.2, 26.4), (-15.4, 24.2), (-13.2, 24.2), (-11, 24.2), (-8.8, 24.2), (-4.4, 24.2), (2.2, 24.2), (11, 24.2), (-15.4, 22), (-8.8, 22), (-4.4, 22), (2.2, 22), (11, 22), (-15.4, 19.8), (-8.8, 19.8), (-4.4, 19.8), (2.2, 19.8), (11, 19.8), (-15.4, 17.6), (-8.8, 17.6), (-4.4, 17.6), (2.2, 17.6), (11, 17.6), (-17.6, -15.4), (-15.4, -15.4), (-6.6, -15.4), (-4.4, -15.4), (-2.2, -15.4), (0, -15.4), (8.8, -15.4), (11, -15.4), (13.2, -15.4), (-19.8, -17.6), (-15.4, -17.6), (2.2, -17.6), (6.6, -17.6), (15.4, -17.6), (-15.4, -19.8), (2.2, -19.8), (6.6, -19.8), (15.4, -19.8), (-15.4, -22), (2.2, -22), (6.6, -22), (15.4, -22), (-15.4, -24.2), (-6.6, -24.2), (-4.4, -24.2), (-2.2, -24.2), (0, -24.2), (6.6, -24.2), (15.4, -24.2), (-15.4, -26.4), (2.2, -26.4), (6.6, -26.4), (15.4, -26.4), (-15.4, -28.6), (2.2, -28.6), (6.6, -28.6), (15.4, -28.6), (-19.8, -30.8), (-17.6, -30.8), (-15.4, -30.8), (-13.2, -30.8), (-11, -30.8), (-6.6, -30.8), (-4.4, -30.8), (-2.2, -30.8), (0, -30.8), (2.2, -30.8), (8.8, -30.8), (11, -30.8), (13.2, -30.8) ]

Volume to dispense per dot (uL) DISPENSE_VOLUME = 1

def run(protocol):
    # Load labware, modules, and pipettes

    # Tips
    tips_20ul = protocol.load_labware(
        'opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips'
    )

    # Pipettes
    pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

    # Modules
    temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)

    # Temperature Module Plate
    temperature_plate = temperature_module.load_labware(
        'opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate'
    )

    # Choose where to take the colors from
    color_plate = temperature_plate

    # Agar Plate
    agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')

    # Get the top-center of the plate (calibration reference point)
    center_location = agar_plate['A1'].top()

    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

    ##########################################################################
    ###   Helper functions
    ##########################################################################

    def location_of_color(color_string):
        """Return the well location for a given color name."""
        for well, color in well_colors.items():
            if color.lower() == color_string.lower():
                return color_plate[well]
        raise ValueError(f"No well found with color {color_string}")

    def dispense_and_detach(pipette, volume, location):
        """
        Move laterally 5 mm above the plate (to avoid smearing a drop); then drop
        down to the plate, dispense, move back up 5 mm to detach the drop, and stay
        high to be ready for the next lateral move.
        """
        assert isinstance(volume, (int, float))
        above_location = location.move(types.Point(z=5))
        pipette.move_to(above_location)
        pipette.dispense(volume, location)
        pipette.move_to(above_location)

    def stamp_color(color_string, points):
        """
        Aspirate a batch of liquid, then visit each coordinate and dispense a
        small dot. Re-aspirate when the tip runs low.

        Args:
            color_string: 'Red' or 'Green'
            points: list of (x, y) tuples in mm offset from plate center
        """
        source_well = location_of_color(color_string)
        max_volume = 18  # stay below the 20 uL maximum to leave headroom
        remaining = 0

        pipette_20ul.pick_up_tip()

        for x_offset, y_offset in points:
            # Aspirate a fresh batch if the tip is about to run dry
            if remaining < DISPENSE_VOLUME:
                # How many dots fit in one pickup?
                dots_per_fill = int(max_volume // DISPENSE_VOLUME)
                aspirate_vol = min(dots_per_fill * DISPENSE_VOLUME, max_volume)
                pipette_20ul.aspirate(aspirate_vol, source_well)
                remaining = aspirate_vol

            # Build the target location: offset (x, y) from center, at plate surface
            target_location = center_location.move(types.Point(x=x_offset, y=y_offset, z=0))
            dispense_and_detach(pipette_20ul, DISPENSE_VOLUME, target_location)
            remaining -= DISPENSE_VOLUME

        # Drop tip when done with this color
        pipette_20ul.drop_tip()

    ##########################################################################
    ###   Patterning: stamp each bacterial strain
    ##########################################################################

    # 1. mRFP1 (Red): Amharic characters
    stamp_color('Red', mrfp1_points)

    # 2. sfGFP (Green): English / additional characters
    stamp_color('Green', sfgfp_points)
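The refill logic in `stamp_color` can be sanity-checked before the protocol goes on the robot. The sketch below is a hypothetical pure-Python helper. `plan_aspirations` is my own name and is not an Opentrons API. It applies the same rule as the protocol: whenever the tip holds less than one dot's worth of liquid, aspirate as many whole dots as fit in `max_volume`. It returns the list of aspirated volumes, so you can see how many refills a pattern needs and how much liquid is wasted.

```python
def plan_aspirations(n_points, dispense_volume=1, max_volume=18):
    """Return the volumes stamp_color would aspirate for n_points dots.

    Mirrors the refill rule in stamp_color (hypothetical dry-run helper).
    """
    aspirations = []
    remaining = 0
    for _ in range(n_points):
        if remaining < dispense_volume:
            # Same arithmetic as the protocol: fill with whole dots only
            dots_per_fill = int(max_volume // dispense_volume)
            aspirate_vol = min(dots_per_fill * dispense_volume, max_volume)
            aspirations.append(aspirate_vol)
            remaining = aspirate_vol  # stamp_color also resets the counter here
        remaining -= dispense_volume
    return aspirations


# 20 dots of 1 uL need two 18 uL fills; 16 uL is left in the tip and discarded
print(plan_aspirations(20))
```

In the protocol file itself, calling `plan_aspirations(len(sfgfp_points))` gives the refill count for the green pattern without moving any hardware.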

Printed Design

week-04-hw-protein-design-part-i

A. Conceptual Questions

  1. How many amino acid molecules do you consume when you eat a 500-gram piece of meat?

(On average, an amino acid is ~100 Daltons)

Answer

1 Dalton ≈ 1 g/mol

Average amino acid ≈ 100 g/mol

If you eat 500 g of (pure) amino acids:

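Number of moles: $n = \dfrac{m}{M} = \dfrac{500\ \text{g}}{100\ \text{g/mol}} = 5\ \text{mol}$

Using Avogadro's number: $N = n \times N_A = 5 \times 6.022 \times 10^{23} \approx 3.0 \times 10^{24}$ molecules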

So you consume roughly 3 septillion amino acid molecules.
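Here is the same arithmetic as a quick Python check. It keeps the answer's simplifying assumption that all 500 g is amino acid mass with an average molar mass of 100 g/mol. Real meat also contains water and fat, so the true count would be lower.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(mass_g, avg_molar_mass_g_per_mol=100.0):
    """Molecules in mass_g grams of pure amino acids (moles x Avogadro's number)."""
    moles = mass_g / avg_molar_mass_g_per_mol
    return moles * AVOGADRO


print(f"{amino_acid_molecules(500):.2e}")  # 3.01e+24
```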

2. Why do humans eat beef but do not become cows, eat fish but do not become fish?
Answer

Proteins are digested into individual amino acids in the stomach and small intestine.

Your body:

  • Breaks proteins down.
  • Absorbs amino acids.
  • Reassembles them into human proteins according to your DNA.
3. Why are there only 20 natural amino acids?
Answer

Because they were created that way by intelligent design.

4. Can you make other non-natural amino acids? Design some new amino acids.
Answer

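Yes. Non-natural amino acids can be made by chemical synthesis and incorporated into proteins through genetic code expansion, which uses an orthogonal tRNA/aminoacyl-tRNA synthetase pair to assign a codon (often the amber stop codon) to the new amino acid.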

Examples of designs:

  • A fluorescent amino acid (attach a fluorophore to the side chain)
  • A metal-binding amino acid (add a bipyridine group)
  • A photo-switchable amino acid (add an azobenzene group)
  • A redox-active amino acid

These can:

  • Expand protein function
  • Create new biomaterials
  • Enable bioelectronics
5. Where did amino acids come from before enzymes that make them, and before life started?
Answer
  • Everything was created by the almighty God, who is an intelligent being.
6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Answer
  • Natural proteins use L-amino acids and form right-handed α-helices.

  • If you use D-amino acids, you would expect a left-handed α-helix.

The handedness flips because each D-amino acid is the mirror image of its L-counterpart, so the entire helix is mirrored.

7. Can you discover additional helices in proteins?
Answer

Yes.

Beyond the α-helix, proteins contain:

  • 3₁₀ helix

  • π-helix

  • Collagen triple helix

Structural biology and protein design can reveal or engineer new helix types.

Helices
8. Why are most molecular helices right-handed?
Answer

Because biological systems predominantly use L-amino acids.

Their stereochemistry naturally favors right-handed packing for minimal steric clash and optimal hydrogen bonding.

9. Why do β-sheets tend to aggregate?

What is the driving force for β-sheet aggregation? Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials? Design a β-sheet motif that forms a well-ordered structure.

Answer

Why β-sheets aggregate: β-strands expose backbone hydrogen bonding groups. They stack via intermolecular hydrogen bonds.

Driving force:

  • Hydrogen bonding

  • Hydrophobic interactions

  • π–π stacking (aromatic residues)

Amyloid diseases: Proteins misfold and form stable β-sheet fibrils.

Examples include:

  • Alzheimer’s disease

  • Parkinson’s disease

Amyloid β-peptides form cross-β sheet structures.

Materials applications: Yes. Amyloid fibrils can be used as:

  • Nanowires

  • Hydrogels

  • Biocompatible scaffolds

Design idea: create a repeating sequence such as:

  • Val–Ile–Val–Ile–Tyr–Val–Ile–Val

The β-branched hydrophobic residues (Val, Ile) favor extended β-strand conformations, and the central Tyr adds aromatic π–π stacking; together, these promote stacking into an ordered β-sheet assembly.

B. Protein Analysis

I have chosen Herceptin (trastuzumab) for this section. Herceptin is a humanized monoclonal antibody used to treat HER2-positive cancers. It binds specifically to the extracellular region of the HER2 receptor on cancer cells and blocks signaling pathways that promote tumor growth. I selected this protein because it is an important example of a therapeutic antibody widely used in breast cancer treatment.

Amino Acid Sequence of the Target Receptor, HER2 (UniProt P04626-1)

MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLPDLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV

Total length: 1255 amino acids. Most common amino acid: leucine (L).
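The length and most common residue can be recomputed directly from the sequence above. In this minimal sketch, `composition_summary` is my own helper and not a library function. It strips whitespace left over from copying a FASTA record and counts residues with `collections.Counter`. For brevity, the example uses only the first 18 residues of the sequence; paste in the full sequence to check the figures above.

```python
from collections import Counter


def composition_summary(sequence):
    """Return (length, most common residue, its count) for a protein sequence."""
    # Remove spaces and line breaks left over from copying a FASTA record
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    residue, count = Counter(seq).most_common(1)[0]
    return len(seq), residue, count


# First 18 residues of P04626-1 as a short stand-in for the full sequence
print(composition_summary("MELAALCRWGLLLALLPP"))  # (18, 'L', 7)
```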

  • Trastuzumab belongs to the immunoglobulin G1 (IgG1) subclass within the immunoglobulin superfamily, and it is part of the L-domain family (immunoglobulin light-chain domain).

  • Resolution: 4.36 Å. This is relatively low, so atomic details in the model should be interpreted with caution.

  • The crystal structure of trastuzumab bound to HER2 was solved in 2004.

Blast Analysis

Blast Analysis
  • The BLAST search identified homologous ERBB2 (HER2) protein sequences in several primates, including chimpanzee, bonobo, gorilla, and orangutan. These sequences show very high similarity (98–99% identity) with the query sequence, indicating that the HER2 receptor is highly conserved among primates.

PYMOL Analysis of Trastuzumab

Ribbon Representation

Ribbon View

Ball and Stick

Ball and Stick

Protein Surface

Surface

Hydrophobic Region

Hydrophobic Region

Secondary structures

Alpha & Beta structure

C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

Deep Mutational Scans

  1. Heatmap-1
  2. Heatmap-2
  3. Heatmap-3

Latent Space Analysis
The latent space plot shows a 3D representation of different proteins embedded by ESM2. It is effectively a map of protein similarity: proteins close together are similar in sequence, function, or structure; the dense center contains common proteins; and the scattered edges contain unusual ones. The color encodes an additional property (likely functional or structural) layered on top of the spatial layout.

Explanation

Shape

One large continuous cloud with no hard, separate clusters. This reflects that protein sequence space is smooth and gradual rather than divided into distinct categories.

The Dense Purple Core

This is where most proteins sit. These are common, well-represented protein families that ESM2 has seen many times.

The Scattered Orange/Yellow Periphery

These are outlier proteins that are unusual or specialized. They score higher on whatever the colorbar measures (likely a biological property or cluster score ranging from -7 to +7).

The Elongated Arms

These are streaks radiating outward from the core. They represent protein subfamilies that share a common origin but have diverged over evolution.
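The idea that nearby points are similar proteins can be made concrete by measuring cosine similarity between embedding vectors. This is a minimal sketch with hypothetical 3-dimensional toy vectors. Real ESM2 embeddings have hundreds of dimensions or more. I have not confirmed whether the plot above used cosine distance or a projection such as PCA or UMAP.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbors(query, embeddings, k=2):
    """Return the k (name, similarity) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings: A and B point the same way, C points elsewhere
toy = {
    "protein_A": [1.0, 0.1, 0.0],
    "protein_B": [0.9, 0.2, 0.1],
    "protein_C": [0.0, 0.1, 1.0],
}
print(nearest_neighbors([1.0, 0.0, 0.0], toy, k=2))
```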

ESMFold Prediction

  • N.B. For this section, I selected insulin because it is much smaller than HER2, whose structure prediction kept crashing.

Predicted Structure

Real Structure

  • ESMFold captured the major secondary structure elements of insulin, consistent with the experimental RCSB structure. However, the predicted structure is notably more extended and loosely packed, with larger irregular loops than the compact experimental structure. This discrepancy most likely reflects insulin's three disulfide bonds (two linking chain A to chain B and one within chain A), which ESMFold does not explicitly model. These bonds anchor the loops and hold the molecule in the tight globular shape seen experimentally. TM-score and RMSD would quantify this difference precisely. Visually, the overall fold class is correct, while the fine-grained packing is not.
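RMSD itself is simple to compute once the two structures are superimposed and their atoms are paired. The sketch below assumes both conditions hold, for example after aligning the predicted and experimental models in PyMOL and extracting matching C-alpha coordinates. It does not perform any fitting itself. TM-score needs a dedicated tool such as TM-align.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes the structures are already superimposed and coords_a[i] and
    coords_b[i] belong to the same residue; no superposition is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 0.0
```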

Inverse folding using ProteinMPNN

For this part, I used the PDB file of the HER2 protein. After uploading the PDB file, I ran inverse folding, which generated 20 candidate sequences for the same backbone. I manually screened the results, identified the candidate with the lowest score, and folded it with ESMFold. The designed sequence and the folded structure are shown below.

Designed Sequence

ALTPEQAALLAAAWAPVFADREANARAFVLDLFRAYPSLADLFPEFKGKTLEQIAASPALGPYAGAFADRLAQFVASSDNAAKMATFWENYANEHIRRGITASHFEQVRAVFPGFVASVAEPPPGAAAAWDQFWGGIIDALKKAGG

T=0.5, sample=0, score=0.9440, seq_recovery=0.4932

T = 0.5 (Temperature)

Controls how diverse the designed sequences are. A value of 0.5 is moderate, balancing staying close to the original sequence against exploring new ones: lower values (e.g., 0.1) are more conservative, and higher values (e.g., 1.0) are more exploratory.

sample = 0

This is the first designed sequence (counting starts from 0). If 10 sequences were generated, they would be labeled sample=0 through sample=9. Each sample is an independent design attempt for the same backbone.

score = 0.9440

The score is a negative log-likelihood that measures model confidence, so lower is better. A score of 0.9440 is below 1.0 and lower than my insulin results (1.06 and 1.08), indicating that the model is comparatively confident that this sequence fits the backbone.

seq_recovery = 0.4932

49.32% of positions match the original protein sequence exactly, so roughly 1 in 2 residues is identical to the original. This is my best recovery so far, slightly higher than insulin's ~46%.
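Both numbers can be reproduced with a short sketch. It assumes `score` is a mean per-residue negative log-likelihood in natural-log units. On that assumption, `exp(score)` gives a perplexity of about 2.57 for 0.9440, so the model is roughly as uncertain as choosing among 2.6 equally likely residues per position. It also assumes `seq_recovery` is the fraction of identical positions between the designed and native sequences on the same backbone. The function names are my own and are not ProteinMPNN's API.

```python
import math


def mpnn_perplexity(score):
    """Perplexity from a mean per-residue negative log-likelihood (natural log)."""
    return math.exp(score)


def sequence_recovery(designed, native):
    """Fraction of backbone positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must cover the same backbone positions")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


print(round(mpnn_perplexity(0.9440), 2))
print(sequence_recovery("ACDE", "ACDF"))  # 0.75
```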

Refolding of the sequence

week-05-hw-protein-design-part-ii

Protein Design Part 2

SOD1 A4V mutant sequence used:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Part 1: PepMLM Generated Peptides

Summary

The four PepMLM-generated peptides were conditioned on the SOD1 A4V mutant sequence with a target length of 12 amino acids, with the exception of Peptide 2, which came out at 15 residues. Perplexity scores reflect the model’s confidence in each binder, where a lower score indicates higher confidence. Peptide 1 (WLYGAAGVRWGX) has the lowest perplexity at 13.06, making it the model’s most confident prediction, though it contains an X residue at the final position, which represents an unresolved or masked amino acid and should be noted as a potential issue before advancing it further. Peptides 2, 3, and 4 all cluster between 17 and 20, reflecting moderate confidence. The known binder FLYRWLPSRRGG is included as a structural and therapeutic benchmark and does not carry a perplexity score since it was not generated by PepMLM.

Results

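| Peptide | Sequence | Length | Pseudo-perplexity | Source |
|---|---|---|---|---|
| 1 | WLYGAAGVRWGX | 12 | 13.06 | PepMLM |
| 2 | SRYDEYVVVVKAAKK | 15 | 17.72 | PepMLM |
| 3 | HRVYAVVVAWKK | 12 | 19.82 | PepMLM |
| 4 | WLYYAVALAWKE | 12 | 17.93 | PepMLM |
| 5 | FLYRWLPSRRGG | 12 | N/A | Known binder |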

Part 2: AlphaFold3 Structural Evaluation

SOD1_A4V_Peptide

AlphaFold3 Result

Scores

| Metric | Value | Interpretation |
|---|---|---|
| ipTM | 0.4 | Low-confidence interface prediction |
| pTM | 0.74 | Reasonable overall fold quality |

Structural Observations

The SOD1 A4V beta-barrel is predicted with high confidence, appearing in blue and cyan in the structure viewer, consistent with the well-characterized immunoglobulin-like fold of SOD1. The peptide chain is rendered in yellow and orange, indicating low to very low pLDDT confidence in its predicted conformation. The peptide appears to associate loosely near the helical region at the top of the SOD1 structure rather than engaging the N-terminus directly, where the A4V mutation at residue 4 is located. It does not appear to be buried at the dimer interface and is largely surface-associated.

The Predicted Aligned Error matrix supports this interpretation. The large dark green block spanning SOD1 residues 1 to approximately 160 confirms strong internal positional confidence within the protein. The bottom right region corresponding to the peptide chain shows notably lighter green, indicating higher positional uncertainty. The off-diagonal inter-chain block between SOD1 and the peptide is also light green, reflecting weak confidence in the relative positioning of the two chains and consistent with the low ipTM score.

Assessment

An ipTM of 0.4 is well below the values AlphaFold3 associates with a reliable interface. AlphaFold3 guidance treats ipTM above 0.8 as confident and below about 0.6 as a likely failed interface prediction. This suggests that this peptide does not form a confidently predicted stable complex with SOD1 A4V. The low pLDDT of the peptide further indicates that its conformation is poorly defined in this predicted complex. This result needs to be weighed against the therapeutic property predictions from PeptiVerse before making any advancement decision.
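For screening several complexes, the ipTM bands can be encoded in a small helper. The cutoffs in this sketch are my reading of the AlphaFold Server output guidance: above 0.8 confident, 0.6 to 0.8 a grey zone, and below 0.6 a likely failed interface prediction. They come from that guidance, not from this project, and should be checked against the current documentation before use.

```python
def classify_iptm(iptm):
    """Bin an ipTM score using AlphaFold3-style guidance bands (assumed cutoffs)."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM must lie between 0 and 1")
    if iptm > 0.8:
        return "confident"
    if iptm >= 0.6:
        return "grey zone"
    return "likely incorrect interface"


print(classify_iptm(0.4))  # likely incorrect interface
```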

Part 3: PeptiVerse Therapeutic Property Evaluation

Results Table

| Peptide | Sequence | Binding Affinity (pKd/pKi) | Solubility | Hemolysis | Net Charge (pH 7) | MW (Da) | GRAVY |
|---|---|---|---|---|---|---|---|
| 1 | WLYGAAGVRWGK | 6.641 (Weak) | 0.867 (Soluble) | 0.055 (Non-hemolytic) | +1.76 | 1363.6 | -0.09 |
| 2 | SRYDEYVVVVKAAKK | 6.453 (Weak) | 1.000 (Soluble) | 0.062 (Non-hemolytic) | +1.46 | 1755.0 | -0.41 |
| 3 | HRVYAVVVAWKK | 6.689 (Weak) | 0.970 (Soluble) | 0.093 (Non-hemolytic) | +2.84 | 1455.7 | +0.23 |
| 4 | WLYYAVALAWKE | 6.661 (Weak) | 0.760 (Soluble) | 0.110 (Non-hemolytic) | -0.23 | 1512.7 | +0.45 |
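The GRAVY column is consistent with the standard Kyte-Doolittle definition: the mean hydropathy value over all residues. As a minimal check, assuming that definition, the sums give -1.1/12 ≈ -0.09 for Peptide 1, -6.2/15 ≈ -0.41 for Peptide 2, 2.7/12 = 0.225 for Peptide 3, and 5.4/12 = 0.45 for Peptide 4. These match the table. `gravy` is my own helper and is not PeptiVerse's API.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        # e.g. the 'X' placeholder in PepMLM output has no hydropathy value
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


for pep in ["WLYGAAGVRWGK", "SRYDEYVVVVKAAKK", "HRVYAVVVAWKK", "WLYYAVALAWKE"]:
    print(pep, f"{gravy(pep):+.2f}")
```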

Summary

All four PepMLM-generated peptides passed the two most critical early-stage therapeutic thresholds: all are predicted to be soluble and non-hemolytic. This is an encouraging baseline, as poor solubility and hemolytic activity are among the most common reasons peptide candidates fail during preclinical screening. However, all four peptides were classified as weak binders against SOD1 A4V, with predicted binding affinities clustering narrowly between 6.45 and 6.69 pKd/pKi, indicating that none demonstrate strong predicted affinity for the target under current conditions.

Comparing these results to the AlphaFold3 structural data, the low ipTM score of 0.4 observed in Part 2 is broadly consistent with the weak binding predictions from PeptiVerse. This suggests that neither structural nor property-based evaluation strongly endorses any single peptide as a high-confidence binder at this stage. Separately, no peptide was predicted to be hemolytic, so the weak binders do not also carry an obvious red-blood-cell toxicity liability.

Among the four, Peptide 3 (HRVYAVVVAWKK) presents the strongest overall profile. It has the highest predicted binding affinity (6.689 pKd/pKi), the second-highest solubility (0.970), a low hemolysis probability (0.093), and the highest positive net charge at pH 7 (+2.84). That charge may support favorable electrostatic interactions with SOD1 and aid membrane permeability. Its modest GRAVY (hydropathy) score of +0.23 suggests a reasonable balance between aqueous solubility and the hydrophobic contacts often required for stable protein binding. Peptide 2 shows perfect predicted solubility (1.000) and the lowest hydrophobicity, making it the safest from an aggregation standpoint. However, its binding affinity is the weakest of the four at 6.453 pKd/pKi, and its length of 15 residues may introduce additional pharmacokinetic challenges.

Peptide Selected for Advancement

Peptide 3: HRVYAVVVAWKK

Peptide Result

Peptide 3 is selected for advancement based on its combination of the highest predicted binding affinity among the four candidates, high solubility, low hemolytic risk, and a positively charged character at physiological pH that is consistent with favorable interactions at the SOD1 surface. While all peptides in this set are classified as weak binders and further optimization would be required, Peptide 3 represents the strongest starting point for motif-guided redesign using moPPIt in Part 4. Its valine-rich hydrophobic core may also provide a useful scaffold for introducing targeted contacts at the A4V mutation site or dimer interface.

week-06-hw-genetic-circuits-part-ii

Part - 1

  1. What are some components in the Phusion High-Fidelity PCR Master Mix, and what is their purpose?

    • Phusion High-Fidelity PCR Master Mix (sold by suppliers such as Thermo Fisher Scientific and New England Biolabs) contains a high-fidelity DNA polymerase with 3'→5' exonuclease proofreading activity, a reaction buffer that maintains optimal conditions, Mg²⁺ ions as a polymerase cofactor, dNTPs as building blocks, and stabilizing additives. Together, these components enable accurate and efficient DNA amplification with a low error rate.
  2. What are some factors that determine primer annealing temperature during PCR?

    • Primer annealing temperature in PCR is mainly determined by the melting temperature of the primers, which depends on their length and GC content. Higher GC content and longer primers increase the melting temperature, leading to a higher annealing temperature, while mismatches and low salt conditions can reduce it.
  3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

    • PCR and restriction enzyme digestion both generate linear DNA fragments but differ fundamentally in approach. PCR amplifies DNA from a template using a polymerase and primers, making it ideal when starting material is limited or when sequence modifications are needed, while restriction digestion cuts existing DNA at specific sequences using enzymes, making it preferable when precise, predefined sites are available and no amplification is required.
  4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

    • To ensure DNA fragments are suitable for Gibson Assembly, the sequences must be designed with overlapping ends of about 20 to 40 base pairs that are complementary between adjacent fragments. These overlaps must have appropriate melting temperatures and correct sequence alignment so that the fragments can anneal properly and be joined seamlessly.
  5. How does the plasmid DNA enter the E. coli cells during transformation?

    • In chemical transformation, E. coli cells are made competent with divalent cations such as Ca²⁺, which shield the negative charges on the DNA backbone and the cell surface so the plasmid can associate with the cells. A brief heat shock (typically around 42 °C) then helps the DNA cross the membrane into the cell. In electroporation, a short high-voltage pulse instead opens transient pores in the membrane through which the plasmid enters.
  6. Describe another assembly method in detail (such as Golden Gate Assembly). Explain the other method in 5-7 sentences plus diagrams (either handmade or online).

    • Golden Gate Assembly works by repeatedly cycling between digestion and ligation in one reaction mixture containing the DNA fragments, a Type IIS restriction enzyme (such as BsaI), and DNA ligase. Type IIS enzymes cut outside their recognition sequence, so they generate user-defined short overhangs. Fragments then anneal according to their complementary ends, and ligase seals them together. The recognition sites are positioned so that they are removed from correctly ligated products, which therefore can no longer be cut, whereas re-ligated starting junctions are cut again. As a result, correctly assembled products accumulate over time. This enables efficient and accurate multi-fragment assembly without leaving extra sequences between parts. The method is widely used in synthetic biology for building complex constructs.
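Two small sketches relate to the answers above, and both rest on stated assumptions. For Question 2, `wallace_tm` applies the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C. It only roughly illustrates why longer, more GC-rich primers melt at higher temperatures; real Phusion annealing temperatures should come from a nearest-neighbor Tm calculator. For Question 6, `check_overhangs` is a hypothetical helper that flags 4-nt Golden Gate fusion sites that are palindromic or that could pair with another site's reverse complement. Both problems can cause mis-assembly. Both function names are my own.

```python
def wallace_tm(primer):
    """Rough Tm estimate in deg C: 2 * (A + T) + 4 * (G + C).

    A rule of thumb for short oligos only; it ignores salt, primer
    concentration, and nearest-neighbor stacking effects.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def check_overhangs(overhangs):
    """Return a list of problems found in a set of Golden Gate fusion sites."""
    sites = [oh.upper() for oh in overhangs]
    problems = []
    for i, site in enumerate(sites):
        rc = reverse_complement(site)
        if site == rc:
            problems.append(f"{site} is palindromic and can ligate to itself")
        for other in sites[i + 1:]:
            # Identical or reverse-complementary sites allow cross-ligation
            if other == site or other == rc:
                problems.append(f"{site} and {other} can pair with each other")
    return problems


print(wallace_tm("AAAAATTTTT"), wallace_tm("GGGGGCCCCC"))  # same length, more GC
print(check_overhangs(["AATG", "GCTT", "CGCT"]))
print(check_overhangs(["AATG", "CATT", "GATC"]))
```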

Part - 2 Asimov Kernel

Homework Documentation: Genetic Circuit Design in Asimov Kernel


Overview

This notebook documents my work exploring genetic circuit design using Asimov Kernel, a cloud-based computer-aided design platform for synthetic biology. The assignment involved exploring existing bacterial circuit demos, recreating the Repressilator circuit, and designing three original genetic constructs using characterized bacterial parts.


Part 1: Exploring the Bacterial Demos Repository

I began by navigating to the Bacterial Demos repository within Asimov Kernel to understand how genetic parts work together in a functional circuit. I opened several example constructs and ran the simulator on each one to observe how different arrangements of promoters, repressors, and reporter genes produce different behaviors over time. I read the Info panel for each example to understand the design logic behind each circuit.

Key Observations:

  • Promoters control when and how strongly a gene is expressed
  • Repressors suppress gene expression when they bind to a promoter
  • The simulator outputs protein concentration over time, allowing visualization of whether a circuit oscillates, stays stable, or switches between states

Part 2: Recreating the Repressilator

What the Repressilator Is

The Repressilator is one of the first synthetic genetic circuits ever engineered, originally designed by Michael Elowitz and Stanislas Leibler in 2000. It consists of three repressor genes arranged in a loop, where each gene produces a protein that suppresses the next gene in the sequence. This creates an oscillating pattern of gene expression, similar to a biological clock.

Parts Used

The construct was assembled in the following order using parts from the Characterized Bacterial Parts repository:

| Order | Part | Function |
|---|---|---|
| 1 | pTetR | Promoter suppressed by TetR protein |
| 2 | A1 RBS | Ribosome binding site enabling translation |
| 3 | LacI | Repressor gene, produces LacI protein |
| 4 | L3S2P24 Bacterial Terminator | Ends transcription of this unit |
| 5 | pLacI | Promoter suppressed by LacI protein |
| 6 | A1 RBS | Ribosome binding site |
| 7 | LambdaCI | Repressor gene, produces LambdaCI protein |
| 8 | L3S2P24 Bacterial Terminator | Ends transcription of this unit |
| 9 | pLambdaCI | Promoter suppressed by LambdaCI protein |
| 10 | A1 RBS | Ribosome binding site |
| 11 | TetR | Repressor gene, produces TetR protein |
| 12 | L3S2P24 Bacterial Terminator | Ends transcription of this unit |
| 13 | pUC-SpecR v1 backbone | Plasmid backbone |

Construct 1 Schematic

How the Circuit Works

The three repressors suppress each other in a cycle:

  • TetR suppresses LacI production
  • LacI suppresses LambdaCI production
  • LambdaCI suppresses TetR production

Because each repressor takes time to build up and degrade, the genes take turns being active, producing a rhythmic oscillating wave pattern in protein concentration over time.

Simulator Settings

  • Simulation duration: 168 hours
  • Time step: 0.1 hours

Results

Repressilator Simulation Result

Interpreting the Results

RNA Concentrations Over Time (top right graph)

The graph shows the transcript levels for all three repressor genes over 168 hours:

  • Light blue: Transcript for LacI
  • Orange: Transcript for LambdaCI
  • Dark blue: Transcript for TetR

All three transcripts begin with sharp fluctuations in the first 0 to 20 hours as the system initializes, before settling into stable steady state concentrations. LambdaCI transcript stabilizes at the highest level at approximately 2.25 relative units, TetR stabilizes at approximately 1.6 relative units, and LacI stabilizes at approximately 1.0 relative units.

Protein Concentrations Over Time (bottom right graph)

The protein concentration graph mirrors the RNA behavior. LambdaCI protein reaches the highest steady state concentration at approximately 3.1 relative units, TetR stabilizes at approximately 1.3 relative units, and LacI stabilizes at approximately 0.75 relative units.

RNA Polymerase Flux (top left bar chart)

The bar chart shows the RNA polymerase flux at the last time point across all parts in the construct. LambdaCI shows the highest flux at approximately 2.75 relative units, reflecting its strong expression in this circuit.

Ribosome Flux (bottom left bar chart)

The ribosome flux chart confirms translation activity across all three coding sequences, with LambdaCI again showing the highest ribosome engagement.

Why the Results Do Not Show Classic Oscillation

The simulator output shows the system reaching a steady state rather than producing the expected oscillating wave pattern. This can occur because:

  • The relative strengths of the three promoters are not perfectly balanced, causing one repressor to dominate and suppress the others into a fixed state
  • The degradation rates of the repressor proteins may be too low relative to their production rates, preventing the cycling behavior from emerging
  • The specific parts used may have expression levels that push the circuit into a stable equilibrium rather than a dynamic oscillation

To recover oscillatory behavior, simulator adjustments such as increasing protein degradation rates or balancing promoter strengths could be explored in future runs.
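The steady-state-versus-oscillation behavior can be explored outside the simulator with the classic dimensionless Elowitz-Leibler equations. This is a minimal sketch under stated assumptions. The parameters are illustrative textbook values (alpha = 216, alpha0 = 0.216, beta = 5, Hill coefficient n = 2), not Asimov Kernel's internal parameters, which I do not know. Here `beta` is the ratio of protein to mRNA decay rates, and `n` sets how cooperative the repression is. In this model, oscillation is favored by strong promoters, cooperative repression, and comparable protein and mRNA lifetimes. Comparing n = 2 with n = 1 shows the same qualitative split as above: weak repression settles to a steady state, and cooperative repression keeps cycling.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=5.0, n=2.0,
                           t_end=200.0, dt=0.01):
    """Euler integration of the dimensionless Elowitz-Leibler repressilator.

    Index 0 = LacI (repressed by TetR), 1 = LambdaCI (repressed by LacI),
    2 = TetR (repressed by LambdaCI). Returns (times, protein_traces).
    """
    m = [1.0, 0.0, 0.0]  # mRNA levels
    p = [2.0, 1.0, 3.0]  # protein levels; asymmetric so the ring can leave equilibrium
    times, traces = [], []
    for step in range(int(round(t_end / dt)) + 1):
        times.append(step * dt)
        traces.append(tuple(p))
        # Gene i is repressed by protein i - 1 (index -1 wraps around to TetR)
        dm = [-m[i] + alpha / (1.0 + p[i - 1] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
    return times, traces


def late_amplitude(times, traces, gene=0, t_from=100.0):
    """Peak-to-trough range of one protein after the initial transient."""
    values = [tr[gene] for t, tr in zip(times, traces) if t >= t_from]
    return max(values) - min(values)


times, traces = simulate_repressilator()       # cooperative repression, n = 2
print("n = 2 amplitude:", late_amplitude(times, traces))
times, traces = simulate_repressilator(n=1.0)  # non-cooperative repression
print("n = 1 amplitude:", late_amplitude(times, traces))
```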


Part 3: Original Constructs


Construct 1: Constitutive GFP Expression Under pTetR

Construct 1 Schematic

Parts Used

| Order | Part | Function |
|---|---|---|
| 1 | pTetR | Promoter suppressed by TetR protein |
| 2 | A1 RBS | Ribosome binding site |
| 3 | gfp | Green fluorescent reporter gene |
| 4 | L3S2P24 Bacterial Terminator | Ends transcription |
| 5 | pUC-SpecR v1 backbone | Plasmid backbone (3105 bp total) |

Design Rationale

This is the simplest possible circuit, consisting of a single gene expression unit. GFP is placed under the control of the pTetR promoter. Since no TetR protein is present anywhere in this construct to suppress the promoter, GFP expression should proceed freely. This circuit serves as a baseline to understand what unregulated reporter expression looks like before introducing any repressor logic.

Expected Behavior

With no repressor present to suppress pTetR, GFP should be continuously produced, resulting in a steadily rising concentration that plateaus at a stable high level once production and degradation reach equilibrium.
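The rise-then-plateau shape follows from a simple production/loss model, dP/dt = k - γP. With P(0) = 0, this has the closed-form solution P(t) = (k/γ)(1 - e^(-γt)): it plateaus at k/γ and reaches half of that at t = ln 2/γ. The values of `k` and `gamma` below are hypothetical, not Asimov Kernel parameters. This is a sketch of the expected shape, not a reproduction of the simulator.

```python
import math


def constitutive_protein(t_hours, k=1.0, gamma=0.4):
    """Protein level for dP/dt = k - gamma * P with P(0) = 0.

    k: production rate (hypothetical units per hour)
    gamma: first-order loss rate from degradation and dilution (per hour)
    """
    return (k / gamma) * (1.0 - math.exp(-gamma * t_hours))


plateau = 1.0 / 0.4              # k / gamma for the default parameters
half_time = math.log(2) / 0.4    # hours to reach half the plateau
for t in (0, 2, 5, 10, 50):
    print(t, round(constitutive_protein(t), 3))
```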

Simulator Results

Construct 1 Simulation RNA and RNAP Flux

Construct 1 Simulation Protein and Ribosome Flux

Discussion

The results matched expectations precisely. The RNAP flux chart confirmed high and uniform transcriptional activity across pTetR, A1 RBS, and gfp, all registering at approximately 3.8 relative units. The RNA concentration graph showed the gfp transcript rising sharply within the first few hours and stabilizing at approximately 3.0 relative units, where it remained constant across the full 168-hour simulation. The protein concentration graph mirrored this behavior, with GFP accumulating rapidly in the first 5 to 10 hours before plateauing at approximately 2.4 relative units. This confirms that in the absence of any repressor, the pTetR promoter drives strong and stable constitutive expression. This is consistent with its design as a promoter that is active by default and switched off only when TetR is present.


Construct 2: Single Repressor Switch with GFP Reporter

Construct 2 Schematic

Parts Used

| Order | Part | Function |
|---|---|---|
| 1 | pLacI | Promoter suppressed by LacI protein |
| 2 | A1 RBS | Ribosome binding site |
| 3 | gfp | Green fluorescent reporter gene |
| 4 | L3S2P24 Bacterial Terminator | Ends transcription |
| 5 | pTetR | Promoter suppressed by TetR protein |
| 6 | A1 RBS | Ribosome binding site |
| 7 | LacI | Repressor gene, produces LacI protein |
| 8 | L3S2P24 Bacterial Terminator | Ends transcription |
| 9 | pUC-SpecR v1 backbone | Plasmid backbone (4362 bp total) |

Design Rationale

This circuit introduces a single layer of repression to control GFP output. LacI is freely produced from the second unit under pTetR control since no TetR is present to suppress it. That LacI protein then binds to pLacI in the first unit, suppressing GFP expression. This creates a simple inverter where high LacI drives low GFP output.

Expected Behavior

LacI should accumulate at a high stable concentration while GFP remains suppressed at a low level, demonstrating how one gene can directly switch another off.
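The inverter logic can be sketched with a Hill-type repression term. LacI is made constitutively, because no TetR is present to block pTetR. Its steady-state level then sets how much GFP pLacI allows. This minimal Python sketch assumes both proteins are lost by first-order degradation at a shared rate. The dissociation constant k_d, the Hill coefficient n and all rates are hypothetical illustration values. The sketch shows the shape of the response rather than reproducing the simulator.

```python
def repressed_rate(max_rate: float, repressor: float, k_d: float, n: float) -> float:
    """Hill-type repression: promoter output falls as repressor concentration rises."""
    return max_rate / (1.0 + (repressor / k_d) ** n)


def inverter_steady_state(k_laci: float, k_gfp: float, gamma: float,
                          k_d: float = 0.5, n: float = 2.0) -> tuple[float, float]:
    """Steady state of the two-unit inverter.

    LacI is produced without repression, so LacI* = k_laci / gamma.
    GFP is produced from pLacI, repressed by LacI, so GFP* = repressed_rate(...) / gamma.
    """
    laci = k_laci / gamma
    gfp = repressed_rate(k_gfp, laci, k_d, n) / gamma
    return laci, gfp


if __name__ == "__main__":
    for k_laci in (0.0, 0.1, 0.3, 1.2):  # hypothetical LacI production rates
        laci, gfp = inverter_steady_state(k_laci, k_gfp=1.2, gamma=0.5)
        print(f"LacI* = {laci:.2f}  GFP* = {gfp:.3f}")
```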

Simulator Results

Figures: Construct 2 Simulation Protein and Ribosome Flux; Construct 2 Simulation RNA and RNAP Flux

Discussion

The results confirmed the expected inverter behavior. The ribosome flux chart showed high translational activity at the LacI coding sequence, at approximately 3.0 relative units, while gfp ribosome flux was nearly absent at approximately 0.2 relative units. The protein concentration graph showed LacI rising rapidly and stabilizing at approximately 2.4 relative units within the first 5 hours. GFP remained suppressed at approximately 0.15 relative units across the 72-hour simulation. The RNAP flux chart showed very low transcriptional activity at pLacI, approximately 0.3 relative units, due to LacI repression. pTetR showed high flux at approximately 4.2 relative units, driving strong LacI production. This circuit demonstrated that a single repressor is sufficient to silence a reporter gene when the repressor is expressed from a constitutively active promoter.


Construct 3: Two-Gene Toggle Switch

Construct 3 Schematic

Parts Used

Order | Part | Function
1 | pLacI | Promoter suppressed by LacI protein
2 | A1 RBS | Ribosome binding site
3 | gfp | Green fluorescent reporter gene
4 | L3S2P24 Bacterial Terminator | Ends transcription
5 | pTetR | Promoter suppressed by TetR protein
6 | A1 RBS | Ribosome binding site
7 | LacI | Repressor gene, produces LacI protein
8 | L3S2P24 Bacterial Terminator | Ends transcription
9 | pUC-SpecR v1 backbone | Plasmid backbone (4362 bp total)

Design Rationale

This construct uses the same part configuration as Construct 2 and was run as a parallel simulation to test reproducibility and to observe whether different simulation durations or initial conditions produce consistent results. Running the same circuit under slightly different conditions is a standard way to assess the robustness of a circuit’s behavior.

Expected Behavior

Results should closely mirror those of Construct 2, with LacI dominating and GFP remaining suppressed. Any deviation in the output would indicate sensitivity to initial conditions or simulation parameters.

Simulator Results

Figures: Construct 3 Simulation Protein and Ribosome Flux; Construct 3 Simulation RNA and RNAP Flux

Discussion

The simulation results were broadly consistent with Construct 2, confirming that the circuit behavior is reproducible. The ribosome flux chart again showed high translational activity at TetR, at approximately 3.4 relative units, while LacI ribosome flux remained low at approximately 0.25 relative units. The protein concentration graph showed TetR stabilizing at approximately 2.7 relative units, while LacI settled at approximately 0.15 relative units across the 168-hour simulation. The RNA concentration graph showed high LacI transcript levels, at approximately 3.0 relative units, and low gfp transcript, at approximately 0.25 relative units, over the 72-hour window. The consistency between the Construct 2 and Construct 3 results suggests that the circuit dynamics are stable and not highly sensitive to initial simulation conditions.

week-07-hw-genetic-circuits-part-ii

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

  1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

    • Intracellular artificial neural networks provide more flexible and nuanced behavior than traditional Boolean genetic circuits. They can process inputs in a graded, continuous manner rather than as simple on/off states. This allows cells to integrate multiple signals and produce proportional responses, making them better suited for complex decision-making and pattern recognition inside biological systems.
  2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

    • A useful application of an intracellular artificial neural network would be in disease sensing, such as detecting cancer-specific molecular signatures. Inputs could be multiple biomarkers like microRNAs or metabolites, and the output could be the expression of a therapeutic protein only when a specific combination and threshold of signals is reached. This enables precise targeting and reduces off-target effects, although limitations include noise in gene expression, slow response times, and difficulty in tuning weights accurately inside living cells.
  3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.

    • In this perceptron, each DNA input sets a gene expression level. X1 produces the Csy4 endoribonuclease, and X2 produces the mRNA of a fluorescent protein that carries a Csy4 recognition site. Transcription and translation convert the DNA inputs into proteins. Because Csy4 cleaves the reporter mRNA and lowers fluorescent protein output, X1 acts as a negative (inhibitory) weight, while X2 contributes positively. The fluorescence output therefore reflects a weighted combination of the two inputs, similar to a neural network node.
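Abstractly, a single-layer perceptron computes y = f(w1·x1 + w2·x2 + b). The Python sketch below maps the Csy4 design onto that form under stated assumptions:

- x1 stands for Csy4 DNA dose, with a negative weight.
- x2 stands for reporter DNA dose, with a positive weight.
- A sigmoid stands in for the saturating expression response.

The weights, bias and activation function are hypothetical illustration values, not parameters fitted to any real circuit.

```python
import math


def perceptron(x1: float, x2: float, w1: float = -2.0, w2: float = 1.5, bias: float = -0.5) -> float:
    """Single-layer perceptron output sigmoid(w1*x1 + w2*x2 + bias).

    x1: Csy4 DNA dose (assumed inhibitory, hence negative w1)
    x2: reporter DNA dose (assumed activating, hence positive w2)
    """
    z = w1 * x1 + w2 * x2 + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for x1 in (0.0, 1.0, 2.0):
        row = [f"{perceptron(x1, x2):.2f}" for x2 in (0.0, 1.0, 2.0)]
        print(f"Csy4 dose {x1:.0f}: reporter output {row}")
```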

Assignment Part 2: Fungal Materials

  1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

    • Fungal materials include mycelium-based packaging, leather alternatives and construction materials, often developed by companies such as Ecovative. These materials are biodegradable and sustainable, and they require less energy to produce than plastics or animal-based materials. However, they can have limitations in durability, scalability and consistency compared to traditional materials.
  2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

    • Genetically engineering fungi could allow them to produce specialized biomaterials, degrade environmental pollutants, or synthesize valuable compounds such as pharmaceuticals. Fungi are advantageous over bacteria because they naturally secrete large amounts of proteins, can grow into structured materials like mycelium networks, and are better suited for producing complex molecules, although they are generally slower growing and harder to genetically manipulate.

Part 3: First DNA Twist Order Design

Plasmid Design

week-09-hw-cell-free-systems

Homework question from Kate Adamala.

Design an example of a useful synthetic minimal cell as follows:

  1. Pick a function and describe it.

a. What would your synthetic cell do? What is the input, and what is the output?

  • The cell-free genetic circuit I plan to make for the final project will detect biological signals and produce a measurable output. The input will be one of two environmental signals, IL-6 or low O₂. The output will be a green fluorescence signal or a therapeutic peptide.
b. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?
  • No. The system I have in mind needs to be encapsulated inside a hydrogel.
c. Could this function be realized by genetically modified natural cells?
  • Natural cells do have mechanisms to respond to real signals in the body. However, producing a therapeutic peptide or a fluorescent output in response to a specific signal requires the cell to be preprogrammed with a genetic circuit that detects that signal and responds accordingly.
d. Describe the desired outcome of your synthetic cell operation.
  • The output will be a green fluorescence signal.

Design all components that would need to be part of your synthetic cell.

  1. What would the membrane be made of?

  2. What would you encapsulate inside? Enzymes, small molecules.

    • The cell-free Tx/Tl machinery and the DNA template will be encapsulated within a hydrogel.
  3. Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (Hint: for example, if you want to use small molecule modulated promoters, like Tet-ON, you need mammalian)

  4. How will your synthetic cell communicate with the environment? (Hint: are substrates permeable? Or do you need to express the membrane channel?)

Experimental details

List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel,” pick the actual gene.) How will you measure the function of your system?

Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material.

Answer each of these key questions for your proposal pitch:

  1. Write a one-sentence summary pitch sentence describing your concept.

    • Synthetic fermenters that can be used in the preparation of household fermented drinks and foods.
  2. How will the idea work, in more detail? Write 3-4 sentences or more.

    • A yeast-inspired fermenter that can be used year-round to make different fermented foods and drinks.
  3. What societal challenge or market need will this address?

    • It would reduce dependence on natural yeast, which may not always be available and may not perform as reliably as needed.
  4. How do you envision addressing the limitations of cell-free reactions (e.g., activation with water, stability, one-time use)?

    • Freeze-drying (lyophilization), which keeps the reactions shelf-stable until they are activated with water.

Homework question from Ally Huang

Genes in Space Proposal: On-Demand Drug Synthesis Using Freeze-Dried Cell-Free Systems

  1. Background & Significance.

  • During long-duration spaceflight, astronauts face unique medical challenges, including immune dysregulation, increased susceptibility to infection and limited access to pharmaceuticals. Resupply missions from Earth take months, and stored drugs can degrade under the radiation and temperature fluctuations aboard the ISS. This creates a critical gap in astronaut healthcare. Freeze-dried (lyophilized) cell-free protein synthesis (CFPS) systems offer a promising solution: they produce therapeutic proteins on demand without live cells or complex equipment. This capability could also enable point-of-care drug manufacturing in remote locations on Earth, while advancing our understanding of biochemical reactions in microgravity.
  • Target: the human granulocyte-colony stimulating factor (hG-CSF) gene. It encodes a therapeutic protein that stimulates white blood cell production, which is critical for combating spaceflight-induced immune suppression.

  2. Relationship to the Space Biology Challenge.

Spaceflight significantly suppresses immune function, leaving astronauts vulnerable to opportunistic infections with no immediate access to Earth-based medical support. hG-CSF is a clinically proven immunostimulatory cytokine used on Earth to treat neutropenia (low white blood cell count). By encoding the hG-CSF gene in a DNA template compatible with a freeze-dried CFPS system, we can synthesize this therapeutic protein directly aboard the ISS when needed. This eliminates reliance on pre-packaged drugs that degrade over time and demonstrates that biologically active therapeutics can be manufactured in a microgravity environment using minimal, shelf-stable reagents.

  3. Hypothesis & Reasoning.

Hypothesis: A freeze-dried cell-free protein synthesis system can successfully express biologically active hG-CSF protein aboard the ISS, with yields and activity comparable to Earth-based controls. We reason that CFPS systems, which contain all necessary transcription and translation machinery extracted from bacterial cells and lyophilized for stability, should retain functionality in microgravity, as the core biochemical reactions are molecular in nature and do not inherently depend on gravity. However, microgravity may alter fluid dynamics, molecular diffusion, and reaction kinetics in ways that affect protein folding and yield. By comparing ISS-synthesized hG-CSF to ground controls using the same freeze-dried BioBits platform, we can directly quantify any performance differences. If successful, this establishes a proof-of-concept for in-space pharmaceutical biomanufacturing, paving the way for astronauts on deep-space missions to synthesize a broad library of therapeutics from compact, stable DNA templates.

  4. Experimental Plan.

Samples: Freeze-dried BioBits CFPS extract rehydrated with a plasmid encoding hG-CSF, tested aboard the ISS and in a matched ground control. Controls:

  • Negative control: CFPS extract rehydrated without plasmid DNA
  • Positive control: Ground-based hG-CSF expression under identical conditions

Measurements:

  • Protein yield quantified via a fluorescent reporter tag (GFP-fused hG-CSF) using the Genes in Space Fluorescence Viewer
  • Reaction kinetics tracked at 30-minute intervals over 4 hours
  • Biological activity assessed post-flight via a cell proliferation assay

Data will be compared between ISS and ground samples to evaluate the effect of microgravity on CFPS efficiency and protein functionality.

week-10-hw-imaging-and-measurement

Protein Characterization: eGFP and KLH


Homework: Final Project

Project Title: A Hydrogel-Embedded Multiple Input-Output (MIMO) Genetic Circuit for IL-6 and Hypoxia Detection

What I Will Measure

My final project centers on engineering a cell-free genetic circuit embedded within a hydrogel matrix that responds to two physiological disease signals, IL-6 (an inflammatory cytokine) and low oxygen tension (hypoxia), and produces two corresponding outputs: sfGFP fluorescence as a reporter signal and a therapeutic peptide as a functional output.

The measurable aspects of this project include:

1. Input Signal Detection

Presence and concentration of IL-6 protein in the local microenvironment and dissolved oxygen levels indicating hypoxic conditions.

2. Circuit Output Characterization

Expression and fluorescence intensity of sfGFP (superfolder GFP) as a quantifiable reporter for circuit activation, and identity, mass, and sequence confirmation of the therapeutic peptide output.

3. System Integration Metrics

Encapsulation efficiency and viability of the cell-free Tx/Tl machinery within the hydrogel bioink matrix, and temporal response dynamics of the circuit to input signals.

Technologies I Will Use

Liquid Chromatography Mass Spectrometry (LC-MS) Used to determine the intact molecular weight of both sfGFP and the therapeutic peptide output. The protein or peptide is separated by reverse-phase HPLC and detected by a quadrupole time-of-flight (QToF) mass spectrometer. The resulting m/z spectra are deconvoluted to give the neutral molecular weight. This is compared against the theoretical value predicted from the DNA sequence to confirm correct translation. Reverse-phase separation typically denatures the protein, so folding is assessed separately by native MS and fluorescence.

Tryptic Digest Peptide Mapping The sfGFP output protein is digested with trypsin, which cleaves after lysine (K) and arginine (R) residues. The resulting peptides are analyzed by LC-MS/MS to confirm the primary amino acid sequence and assess sequence coverage. This confirms that the cell-free expression system is producing the correct protein from the codon-optimized sfGFP construct shown in the plasmid map.

Fluorescence Spectroscopy sfGFP fluorescence (excitation ~485 nm, emission ~510 nm) is measured as the primary readout of circuit activation. Fluorescence intensity is quantified relative to input signal concentration to generate a dose-response curve relating IL-6 or O2 levels to circuit output.
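A dose-response curve of this kind is commonly summarized by a Hill (logistic) function and its EC50, the input level that gives half-maximal fluorescence. The minimal Python sketch below generates a hypothetical curve and estimates EC50 by interpolating on a log-dose axis. The doses, Hill parameters and pg/mL units are illustrative assumptions. A real analysis would more likely fit the curve by nonlinear least squares.

```python
import math


def hill(x: float, bottom: float, top: float, ec50: float, n: float) -> float:
    """Four-parameter Hill (logistic) dose-response curve, for x > 0."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** n)


def estimate_ec50(doses: list[float], signals: list[float]) -> float:
    """Interpolate, on a log10 dose axis, the dose giving the half-maximal signal.

    Assumes positive, ascending doses and a signal that rises monotonically.
    """
    half = (min(signals) + max(signals)) / 2.0
    points = list(zip(doses, signals))
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if s0 <= half <= s1:
            frac = (half - s0) / (s1 - s0)
            return 10 ** (math.log10(d0) + frac * (math.log10(d1) - math.log10(d0)))
    raise ValueError("half-maximal signal is not bracketed by the data")


if __name__ == "__main__":
    doses = [1.0, 10.0, 100.0, 1000.0, 10000.0]  # hypothetical IL-6 doses, pg/mL
    signals = [hill(d, bottom=0.0, top=1.0, ec50=100.0, n=1.0) for d in doses]
    print(f"estimated EC50 = {estimate_ec50(doses, signals):.1f} pg/mL")
```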

Native Mass Spectrometry The sfGFP output is analyzed under non-denaturing conditions to confirm correct folding and fluorophore maturation. Since sfGFP fluorescence requires proper beta-barrel folding and chromophore formation, native MS provides structural confirmation that the circuit is producing properly folded, functional protein rather than misfolded aggregates.

Western Blot and ELISA Used to detect IL-6 input signal concentration in test samples and to confirm therapeutic peptide production levels. Western blotting is semi-quantitative. ELISA is quantitative and typically sensitive into the picogram per milliliter range, which covers the IL-6 concentrations relevant to inflammatory disease.

Rheology The mechanical properties of the hydrogel encapsulation matrix are characterized by oscillatory shear rheology, which measures the storage modulus (G′) and loss modulus (G″). This confirms that the gel has appropriate stiffness and network structure to maintain cell-free machinery activity and to let input signals diffuse into the hydrogel interior.

System Architecture

The MIMO genetic circuit operates as follows:

INPUT 1: IL-6 Signal ──────────┐
                               │
INPUT 2: Low O₂ (Hypoxia) ─────┤
                               ▼
                    CELL-FREE GENETIC CIRCUIT
                    (Tx/Tl Machinery + DNA Template)
                    Encapsulated in Hydrogel/Bioink
                               │
                    ┌──────────┴──────────┐
                    ▼                     ▼
             sfGFP Fluorescence    Therapeutic Peptide
             (Reporter Output)     (Functional Output)

The DNA template encodes a T7 promoter-driven sfGFP construct (codon-optimized for cell-free expression) alongside regulatory elements responsive to IL-6 and hypoxia sensing domains. The hydrogel bioink serves as the encapsulation matrix. It protects the cell-free machinery while allowing input signals and output molecules to diffuse across its porous network.


Part I: Molecular Weight of Intact eGFP

Question 1: Calculated Molecular Weight from Sequence

The eGFP sequence used for analysis (including the His-purification tag HHHHHH and the LE linker) is 247 amino acids in total:

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL
VTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV
NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD
HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLE
HHHHHH

Using the ExPASy Compute pI/MW tool, the theoretical average molecular weight of this sequence is:

Theoretical MW ≈ 27,728 Da

Question 2: Molecular Weight from Adjacent Charge State Approach

Step 1: Select two adjacent charge state peaks

From Figure 1, two clearly resolved adjacent charge state peaks were selected:

Peak | m/z Value
Peak A (charge z) | 903.7148
Peak B (charge z+1) | 875.4421

Step 2: Determine charge z for each peak

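Using the adjacent charge state formula, where:

  • M1 is the m/z of Peak A (charge z)
  • M2 is the m/z of Peak B (charge z+1)
  • 1.00728 Da is the mass of a proton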

z = (M2 - 1.00728) / (M1 - M2)
z = (875.4421 - 1.00728) / (903.7148 - 875.4421)
z = 874.4348 / 28.2727
z = 30.93 ≈ 31

Therefore:

  • Peak A at m/z = 903.7148 has charge z = 31
  • Peak B at m/z = 875.4421 has charge z = 32

Step 3: Calculate Molecular Weight

Using the relationship: MW = z × (m/z) − z × 1.00728

From Peak A (z = 31):

MW = 31 × 903.7148 − 31 × 1.00728 = 28,015.16 − 31.23 = 27,983.93 Da

From Peak B (z = 32):

MW = 32 × 875.4421 − 32 × 1.00728 = 28,014.15 − 32.23 = 27,981.91 Da

Average MW from adjacent charge states: 27,982.92 Da ≈ 27.98 kDa
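The adjacent-charge-state arithmetic in Steps 2 and 3 can be reproduced with a short Python sketch. It uses only the two m/z values from Figure 1 and the proton mass of 1.00728 Da. The function names are illustrative. Deconvolution software applies the same relationship across the whole charge envelope rather than to a single pair of peaks.

```python
PROTON = 1.00728  # proton mass (Da), the value used in the hand calculation


def charge_from_adjacent(mz_a: float, mz_b: float) -> int:
    """Charge z of peak A (higher m/z), given neighbouring peak B at charge z + 1."""
    return round((mz_b - PROTON) / (mz_a - mz_b))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M = z * (m/z) - z * proton."""
    return z * mz - z * PROTON


if __name__ == "__main__":
    peak_a, peak_b = 903.7148, 875.4421  # Figure 1 values quoted above
    z_a = charge_from_adjacent(peak_a, peak_b)
    masses = [neutral_mass(peak_a, z_a), neutral_mass(peak_b, z_a + 1)]
    print(z_a, [round(m, 2) for m in masses], round(sum(masses) / len(masses), 2))
    # prints: 31 [27983.93, 27981.91] 27982.92
```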

Step 4: Mass Accuracy

Accuracy (%) = |Observed - Theoretical| / Theoretical × 100
             = |27,982.92 - 27,728| / 27,728 × 100
             ≈ 0.92%

Question 3: Charge State of the Zoomed-In Peak

The charge state of the zoomed-in peak can be observed from the inset in Figure 1, which shows the peak at approximately m/z = 1473.74.

Using the protein MW of ~27,982 Da:

z = MW / ((m/z) - 1.00728)
z = 27,982 / (1473.74 - 1.00728)
z ≈ 19.0

The charge state is z = 19.

This is confirmed from the zoomed inset, where the spacing between adjacent isotope peaks is approximately 0.053 m/z units (1/z = 1/19 ≈ 0.053). At 30,000 resolution, these isotope peaks are resolvable, confirming z = 19.
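Run in reverse, the same relationship gives the charge of any peak once the mass is known, and the isotope spacing provides an independent check. In this Python sketch, 1.00335 Da is the approximate mass difference between 13C and 12C. It sets the isotope spacing at about 1.00335/z, which the text rounds to 1/z. The required resolving power is taken as m/z divided by that spacing. Both are standard approximations rather than instrument-specific values.

```python
PROTON = 1.00728        # proton mass, Da
ISOTOPE_STEP = 1.00335  # approximate 13C - 12C mass difference, Da


def charge_from_mass(mass: float, mz: float) -> float:
    """z = M / (m/z - proton); round the result to the nearest integer."""
    return mass / (mz - PROTON)


def isotope_spacing(z: int) -> float:
    """Expected m/z gap between neighbouring isotope peaks at charge z."""
    return ISOTOPE_STEP / z


def required_resolution(mz: float, z: int) -> float:
    """Resolving power (m / delta m) needed to separate adjacent isotope peaks."""
    return mz / isotope_spacing(z)


if __name__ == "__main__":
    mz = 1473.74
    z = round(charge_from_mass(27_982.0, mz))
    print(z, round(isotope_spacing(z), 4), round(required_resolution(mz, z)))
```

The required resolving power comes out at roughly 28,000. That is consistent with the z = 19 isotope peaks being resolvable at the 30,000 resolution quoted above.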


Part II: Secondary and Tertiary Structure

Question 1: Native vs Denatured Protein Conformations

When a protein unfolds (denatures), it loses its compact three-dimensional structure. In its native state, eGFP maintains a specific folded beta-barrel conformation stabilized by non-covalent interactions including hydrogen bonds, hydrophobic interactions, van der Waals forces, and electrostatic interactions. Upon denaturation, these interactions are disrupted and the polypeptide chain unfolds into a more extended, disordered conformation.

In electrospray ionization mass spectrometry (ESI-MS), the charge state distribution directly reflects the protein’s conformation:

Denatured protein (top spectrum, Figure 2): The extended, unfolded chain exposes many basic residues (lysines, arginines, histidines) to solvent, allowing more protons to be added during ionization. This results in a high charge state distribution with many overlapping peaks at low m/z values (600 to 1300 range) and a broad, multimodal envelope.

Native protein (bottom spectrum, Figure 2): The compact folded structure shields many basic residues within the protein interior, limiting the number of protons that can be added. This results in a low charge state distribution with fewer, sharper peaks at high m/z values (~2300 to 2800 range), indicating a tightly folded, compact conformation.

The mass spectrometer distinguishes these states by the position and width of the charge state envelope. A shift to lower m/z with more overlapping peaks indicates denaturation, while a shift to higher m/z with fewer resolved peaks indicates a native compact fold.

Question 2: Charge State at ~2800 m/z in Native Spectrum

Using the protein MW of ~27,982 Da:

z = 27,982 / (2799.5 - 1.00728) ≈ 10.0

The charge state of the peak at ~2800 m/z is z = 10.

The adjacent peak cluster at ~2545 m/z corresponds to z = 11:

z = 27,982 / (2545.0 - 1.00728) ≈ 11.0

From the zoomed inset, the spacing between adjacent isotope peaks is approximately 0.09 m/z units, corresponding to z = 1/0.09 ≈ 11, consistent with the calculated value. This low charge state (z = 10 to 11) is characteristic of a compactly folded native protein, in sharp contrast to the charge states of z = 30+ observed in the denatured spectrum.
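Working in the other direction, the expected native charge-state ladder can be predicted from the intact mass using m/z = (M + z × 1.00728) / z. This short Python sketch lists the predicted positions for z = 10 to 12. It gives a quick check of which cluster in the native spectrum belongs to which charge state. The sketch assumes protonation is the only source of charge, with no sodium or other adducts.

```python
PROTON = 1.00728  # proton mass, Da


def charge_ladder(mass: float, charges: range) -> dict[int, float]:
    """Predicted m/z of [M + zH]z+ for each charge state z."""
    return {z: (mass + z * PROTON) / z for z in charges}


if __name__ == "__main__":
    for z, mz in charge_ladder(27_982.0, range(10, 13)).items():
        print(f"z = {z:2d}  predicted m/z = {mz:8.2f}")
```

The predicted values for z = 10 and z = 11 land close to the ~2800 and ~2545 clusters discussed above.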


Part III: Peptide Mapping — Primary Structure

Question 1: Lysines and Arginines in eGFP

Lysine (K) residues — 20 total at positions: 4, 27, 42, 46, 53, 80, 86, 102, 108, 114, 127, 132, 141, 157, 159, 163, 167, 210, 215, 239

Arginine (R) residues — 6 total at positions: 74, 97, 110, 123, 169, 216

Sequence with cleavage sites marked (| = trypsin cleavage after K or R):

MVSK|GEELFTGVVPILVELDGDVNGHK|FSVSGEGEGDATYGK|LTLK|FICTTGK|LPVPWPTL
VTTLTYGVQCFSR|YPDHMK|QHDFFK|SAMPEGYVQER|TIFFK|DDGNYK|TR|AEVK|
FEGDTLVNR|IELK|GIDFK|EDGNILGHK|LEYNYNSHNVYIMADK|QK|NGIK|VNFK|IR|
HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK|DPNEK|R|DHMVLLEFVTAAGITLGMDELYK|LEHHHHHH

Total trypsin cleavage sites: 26 (20 K + 6 R)

Question 2: Number of Tryptic Peptides

Using the ExPASy PeptideMass tool with trypsin digest conditions and no missed cleavages:

27 theoretical tryptic peptides are expected from the eGFP His-tag sequence. With 26 cleavage sites and no missed cleavages, 27 peptide fragments are expected, including the C-terminal LEHHHHHH fragment which has no internal K or R residues.
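The cleavage-site count and the peptide total can be checked in silico. This Python sketch applies a simplified trypsin rule: cut after K or R, but not before P, with no missed cleavages. Real digests can show missed cleavages, which this sketch ignores. The sequence is copied from Part I.

```python
import re

# eGFP with LE linker and His tag, copied from the sequence above (247 residues).
EGFP_HIS = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL"
    "VTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLE"
    "HHHHHH"
)


def residue_positions(sequence: str, residue: str) -> list[int]:
    """1-based positions of a residue in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence) if aa == residue]


def tryptic_digest(sequence: str) -> list[str]:
    """Cut after every K or R unless the next residue is P; no missed cleavages."""
    # Zero-width split: lookbehind for K/R, negative lookahead for P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


if __name__ == "__main__":
    peptides = tryptic_digest(EGFP_HIS)
    print(len(EGFP_HIS), len(residue_positions(EGFP_HIS, "K")), len(residue_positions(EGFP_HIS, "R")))
    # expected: 247 20 6
    print(len(peptides), peptides[13], peptides[-1])
    # expected: 27 FEGDTLVNR LEHHHHHH
```

No K or R in this sequence is followed by P, so the proline exception does not change the count here.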

Question 3: Chromatographic Peaks in the TIC

From Figure 5a (Total Ion Chromatogram), approximately [N] chromatographic peaks are observed with greater than 10% relative abundance between 0.5 and 6 minutes.

Question 4: Do Peak Numbers Match Predicted Peptides?

The number of chromatographic peaks observed in the TIC does not exactly match the 27 predicted tryptic peptides. There are typically fewer peaks in the chromatogram than predicted peptides for the following reasons:

  • Some peptides co-elute at the same retention time and appear as a single peak
  • Very small or highly hydrophilic peptides are not retained on the reverse-phase column and elute in the void volume at or before 0.5 minutes
  • Some peptides fall below the detection limit of the instrument
  • His-tag peptides such as LEHHHHHH may not retain well under standard reverse-phase gradient conditions

Question 5: m/z, Charge, and Mass of the Peptide at 2.78 min

Most abundant m/z peak from Figure 5b: 525.76712

Determining charge state from isotope spacing:

Delta(m/z) = 526.25918 - 525.76712 = 0.492 ≈ 0.5
z = 1 / 0.5 = 2

The peptide is doubly charged (z = 2).

Calculating the singly charged mass [M+H]+:

[M+H]+ = (m/z × z) - (z - 1) × 1.00728
        = (525.76712 × 2) - (1 × 1.00728)
        = 1051.534 - 1.007
        = 1050.527 Da

This is confirmed by the peak at 1050.52438 observed in Figure 5b.
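The charge assignment and the [M+H]+ conversion above reduce to two one-line formulas. Here is a minimal Python sketch using the m/z values from Figure 5b. It assumes the two peaks are neighbouring isotopes of the same ion and, as the text does, rounds 1/Δ(m/z) to the nearest integer charge.

```python
PROTON = 1.00728  # proton mass, Da


def charge_from_isotopes(mz_first: float, mz_next: float) -> int:
    """Charge from isotope spacing: z is approximately 1 / delta(m/z)."""
    return round(1.0 / (mz_next - mz_first))


def singly_protonated(mz: float, z: int) -> float:
    """Convert an [M + zH]z+ ion to [M + H]+ = z * (m/z) - (z - 1) * proton."""
    return z * mz - (z - 1) * PROTON


if __name__ == "__main__":
    z = charge_from_isotopes(525.76712, 526.25918)
    print(z, round(singly_protonated(525.76712, z), 4))
```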

Question 6: Peptide Identification and Mass Accuracy

Peptide Identification:

Searching the ExPASy PeptideMass results for a tryptic eGFP peptide with theoretical [M+H]+ ≈ 1050.52 Da identifies the peptide as FEGDTLVNR (residues 115 to 123 of eGFP).

Theoretical monoisotopic masses of each residue:

Residue | Mass (Da)
F (Phe) | 147.0684
E (Glu) | 129.0426
G (Gly) | 57.0215
D (Asp) | 115.0269
T (Thr) | 101.0477
L (Leu) | 113.0841
V (Val) | 99.0684
N (Asn) | 114.0429
R (Arg) | 156.1011
H₂O | 18.0106
Total MW | 1049.5142 Da
[M+H]+ | 1050.5215 Da

y-ion series confirmation from Figure 5c:

Ion | Sequence | Theoretical (Da) | Observed (Da) | Match
y3 | VNR | 388.2303 | 388.21957 | YES
y4 | LVNR | 501.3144 | 501.30846 | YES
y5 | TLVNR | 602.3621 | 602.34777 | YES
y7 | GDTLVNR | 774.4105 | 774.41334 | YES
y8 | EGDTLVNR | 903.4531 | 903.44365 | YES
[M+H]+ | FEGDTLVNR | 1050.5215 | 1050.52438 | YES

The y-ion series is fully consistent with the sequence FEGDTLVNR.

Mass Accuracy:

Using the unrounded theoretical [M+H]+ (1049.5142 + 1.00728 = 1050.52148 Da):

Error (ppm) = |Observed - Theoretical| / Theoretical × 10^6
            = |1050.52438 - 1050.52148| / 1050.52148 × 10^6
            = 0.00290 / 1050.52148 × 10^6
            ≈ 2.76 ppm

This excellent mass accuracy of less than 3 ppm is consistent with the high-resolution Waters BioAccord QToF mass spectrometer used for the analysis.
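The table values, the y-ion ladder and the ppm error can be recomputed together. This Python sketch uses the four-decimal monoisotopic residue masses from the table and includes only the nine residue types in FEGDTLVNR. Each singly charged y-ion is treated as the C-terminal residues plus water plus a proton. It is a check on the arithmetic, not a full fragment-matching tool.

```python
# Monoisotopic residue masses (Da) from the table above; only residues in FEGDTLVNR.
MONO = {
    "F": 147.0684, "E": 129.0426, "G": 57.0215, "D": 115.0269, "T": 101.0477,
    "L": 113.0841, "V": 99.0684, "N": 114.0429, "R": 156.1011,
}
WATER = 18.0106
PROTON = 1.00728


def mh_plus(peptide: str) -> float:
    """Singly protonated mass [M+H]+ = residue masses + water + proton."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def y_ions(peptide: str) -> dict[int, float]:
    """Singly charged y_n ions: the last n residues + water + proton."""
    return {n: mh_plus(peptide[-n:]) for n in range(1, len(peptide))}


def ppm_error(observed: float, theoretical: float) -> float:
    """Absolute mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    peptide = "FEGDTLVNR"
    theoretical = mh_plus(peptide)
    print(f"[M+H]+ = {theoretical:.4f}")
    for n, mz in y_ions(peptide).items():
        print(f"y{n} {peptide[-n:]:>9} {mz:.4f}")
    print(f"error = {ppm_error(1050.52438, theoretical):.2f} ppm")
```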

Question 7: Sequence Coverage

From Figure 6, the BioAccord LC-MS peptide mapping identified peptides covering 88% of the eGFP amino acid sequence.

The highlighted regions in Figure 6 show the portions of the sequence confirmed by peptide identification based on calculated mass and fragmentation pattern. The small uncovered gaps represent regions not confirmed, which may correspond to very small peptides below the detection limit, highly hydrophilic peptides that did not retain on the column, or peptides outside the instrument detection range. An 88% sequence coverage is excellent for a routine peptide mapping experiment and strongly confirms the identity of the protein as eGFP.

Bonus: Peptide Sequence from Fragmentation Spectrum

Based on the y-ion series analysis in Question 6, the peptide eluting at 2.78 minutes with [M+H]+ = 1050.52 Da is confirmed as FEGDTLVNR.

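The y-ions observed in Figure 5c (y3, y4, y5, y7 and y8) account for the C-terminal sequence EGDTLVNR. y6 is not listed in the table above, so the order of G and D rests on the y5 to y7 mass step together with the predicted tryptic sequence. The remaining N-terminal F is confirmed by the difference between the precursor [M+H]+ (1050.52 Da) and y8 (903.45 Da). That difference, 147.07 Da, matches a phenylalanine residue. This peptide maps to residues 115 to 123 of the eGFP sequence, flanked by the trypsin cleavage sites after K114 and R123.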

Does the peptide map data make sense?

Yes. The peptide FEGDTLVNR is a predicted tryptic fragment of eGFP and its mass and fragmentation pattern are fully consistent with the expected sequence. The 88% amino acid coverage shown in Figure 6 further confirms that the protein analyzed is eGFP. The identified peptides span the full length of the protein including the N-terminal region, the GFP barrel domain, and the C-terminal His-tag region, providing high confidence that the correct protein was expressed and purified successfully.


Part IV: KLH Oligomeric States

Using the known subunit masses from Table 1 (7FU = 340 kDa, 8FU = 400 kDa), the expected masses of the KLH oligomeric species are:

Oligomeric Species | Number of Subunits | Subunit Mass | Total Mass
7FU Decamer | 10 × 7FU | 340 kDa | 3,400 kDa (3.4 MDa)
8FU Didecamer | 20 × 8FU | 400 kDa | 8,000 kDa (8.0 MDa)
8FU 3-Decamer | 30 × 8FU | 400 kDa | 12,000 kDa (12.0 MDa)
8FU 4-Decamer | 40 × 8FU | 400 kDa | 16,000 kDa (16.0 MDa)

From Figure 7 (CDMS spectrum), these four species appear as distinct peaks at approximately 3.4, 8.0, 12.0, and 16.0 MDa on the mass axis. The CDMS technique enables direct single-particle mass measurement without requiring charge state deconvolution, making it uniquely suited for these large, heterogeneous macromolecular assemblies that would produce unresolvable overlapping spectra on conventional ESI-MS instruments. Each distinct peak in the CDMS spectrum corresponds to one of the oligomeric states listed above.
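In CDMS the charge of each individual ion is measured directly. Each ion's mass therefore follows from M = z × (m/z − 1.00728), without deconvolving a charge envelope. The Python sketch below shows that step and assigns each mass to the nearest expected KLH species from the table. The two example ions are hypothetical values chosen for illustration, not readings from Figure 7.

```python
PROTON = 1.00728  # proton mass, Da

# Expected KLH species masses (MDa) from the table above.
EXPECTED_MDA = {
    "7FU decamer": 3.4,
    "8FU didecamer": 8.0,
    "8FU 3-decamer": 12.0,
    "8FU 4-decamer": 16.0,
}


def ion_mass_mda(mz: float, z: int) -> float:
    """Mass of a single ion whose charge z was measured directly, in MDa."""
    return z * (mz - PROTON) / 1e6


def assign_species(mass_mda: float) -> str:
    """Nearest expected oligomeric state by absolute mass difference."""
    return min(EXPECTED_MDA, key=lambda name: abs(EXPECTED_MDA[name] - mass_mda))


if __name__ == "__main__":
    for mz, z in [(22_000.0, 155), (25_000.0, 320)]:  # hypothetical single-ion readings
        mass = ion_mass_mda(mz, z)
        print(f"m/z {mz:,.0f}, z = {z}: {mass:.2f} MDa -> {assign_species(mass)}")
```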


Part V: Did I Make GFP?

Property | Theoretical | Observed (Intact LC-MS) | PPM Mass Error
Molecular Weight (kDa) | 27.728 kDa | 27.983 kDa | ~9,200 ppm

PPM Error Calculation:

PPM Error = |Observed MW - Theoretical MW| / Theoretical MW × 10^6
          = |27,983 - 27,728| / 27,728 × 10^6
          ≈ 9,196 ppm

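The PPM error for the intact protein is far larger than the sub-3 ppm error observed in peptide mapping. Part of any such gap reflects a difference in method. Intact proteins are measured as average mass because their broad isotope envelope is not resolved, whereas small peptides are measured as monoisotopic mass from resolved isotope peaks. Even so, a difference of about 255 Da is well beyond the accuracy normally expected for intact-mass measurement on a high-resolution instrument. Summing average residue masses over the 247-residue sequence gives approximately 28,007 Da rather than 27,728 Da. Subtracting the roughly 20 Da lost during GFP chromophore maturation (cyclization releases water and oxidation removes two hydrogens) gives approximately 27,987 Da, within about 4 Da of the observed 27,983 Da. The theoretical value used in this comparison is therefore worth rechecking before the difference is attributed to the instrument.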

The observed MW of 27,983 Da, combined with 88% sequence coverage from peptide mapping and confirmed tryptic peptide masses from LC-MS/MS, provides strong evidence that the protein analyzed is the expected eGFP His-tag standard. The intact mass, charge state distribution, and peptide map are all fully consistent with the predicted properties of eGFP, confirming successful expression and purification of the correct protein.

week-11-hw-building-genomes

Bioproduction & Cloud Labs


Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

My Contribution

I contributed pixels forming part of the DNA helix structure on the lower left quadrant of the collective canvas, using a blue-green palette consistent with the biological theme of the artwork.

What I Liked

The most compelling aspect of this project was how it translated the logic of collaborative biology into a visual format. Just as no single cell produces an organism, no single contributor produced the artwork — the final image only emerged through collective action across dozens of participants working asynchronously. There is something genuinely elegant about using a 1,536-well plate format, the same format used for high-throughput biological screening, as the canvas unit. It collapsed the boundary between the scientific instrument and the artistic medium in a way that felt intentional rather than gimmicky.

What Could Be Improved

The main friction point was the personalized URL system. If a participant missed the email or their Discourse account was not linked correctly, there was no clear fallback to still contribute. For next year, it would be worth building a redundant access pathway so that every enrolled student can contribute regardless of email delivery issues. Additionally, allowing contributors to see a live preview of the growing artwork in real time rather than waiting for the editing window to close would significantly increase engagement and give participants a stronger sense of their individual impact on the collective piece.


Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Component Roles in the Cell-Free Reaction

E. coli Lysate (BL21 (DE3) Star lysate, includes T7 RNA polymerase): The lysate provides the complete molecular machinery required for gene expression, including ribosomes, translation factors, RNA polymerase (including T7 RNAP, which drives expression from T7 promoters), chaperones, and metabolic enzymes. It is the functional core of the cell-free system, supplying all the biological components that would normally be found inside a living cell.

Potassium Glutamate: Potassium glutamate is the primary salt setting the reaction's ionic strength, providing potassium ions that are essential for ribosome stability and translation fidelity. It is preferred over potassium chloride because the glutamate anion is less inhibitory to transcription and translation enzymes than chloride.

HEPES-KOH pH 7.5: HEPES is a buffering agent that maintains the reaction pH at 7.5, which is close to the physiological pH of the bacterial cytoplasm and is optimal for ribosome function, RNA polymerase activity, and enzyme catalysis throughout the reaction.

Magnesium Glutamate: Magnesium ions are essential cofactors for ribosome assembly, RNA polymerase activity, and ATP hydrolysis. The concentration of magnesium must be carefully titrated, as both insufficient and excess magnesium impair translation efficiency.

Potassium Phosphate (Monobasic and Dibasic): These phosphate salts provide additional buffering capacity and serve as a phosphate source that supports nucleotide regeneration and energy metabolism within the cell-free reaction.

Ribose: Ribose is a pentose sugar that serves as a carbon and energy source, feeding metabolic pathways within the lysate that supply the energy to convert nucleoside monophosphates (NMPs) into the nucleoside triphosphates (NTPs) needed for ongoing transcription.

Glucose: Glucose is a second energy substrate that drives ATP regeneration through glycolytic enzymes retained in the lysate, supplementing ribose to sustain energy supply over longer reaction windows.

AMP, CMP, GMP, UMP: These nucleoside monophosphates are the precursors of the four ribonucleotides incorporated into RNA. In NMP-based systems, kinases present in the lysate phosphorylate them to their triphosphate forms, providing the building blocks for mRNA synthesis by T7 RNA polymerase.

Guanine: Guanine is a free nucleobase that can be converted to GMP by purine salvage pathway enzymes retained in the lysate. It provides an alternative route for replenishing the guanosine nucleotide pool without requiring exogenous GMP.

17 Amino Acid Mix: This mix supplies 17 of the 20 standard amino acids required for ribosomal protein synthesis. Two of the three omitted amino acids, tyrosine and cysteine, are supplied separately because of tyrosine's limited solubility and cysteine's chemical instability under standard preparation conditions.

Tyrosine: Tyrosine is supplied separately at pH 12 to keep it in solution, as it is poorly soluble at neutral pH. It is required for translation of any tyrosine-containing protein and is critical in GFP-family proteins, where a tyrosine residue forms part of the chromophore.

Cysteine: Cysteine is supplied separately because it is chemically reactive and prone to oxidation. It is required for translation of proteins containing cysteine residues and is particularly important for proteins that rely on disulfide bonds or thiol chemistry for function.

Nicotinamide: Nicotinamide is a precursor to NAD+ and NADH, supporting redox reactions within the lysate. It helps sustain metabolic activity and energy regeneration over extended incubation periods by maintaining the NAD+/NADH balance needed for glycolysis and other metabolic pathways.

Nuclease-Free Water: Nuclease-free water serves as the backfill solvent to bring all reactions to their final volume. Using nuclease-free water prevents RNA degradation from RNase contamination, which would otherwise destroy the mRNA template and halt protein synthesis.


Main Differences Between the 1-Hour PEP-NTP and 20-Hour NMP-Ribose-Glucose Systems

The 1-hour PEP/NTP system is optimized for rapid, high-yield protein production. It supplies pre-formed nucleoside triphosphates (ATP, GTP, CTP, UTP) and phosphoenolpyruvate (PEP-Mono) as an immediate energy source, enabling fast transcription and translation without requiring the lysate to regenerate nucleotides from simpler precursors. In contrast, the 20-hour NMP-Ribose-Glucose system relies on nucleoside monophosphates (AMP, CMP, and UMP, with GMP set to 0 uM and replaced by free guanine) combined with ribose and glucose as simple carbon and energy sources, which metabolic enzymes in the lysate process to regenerate NTPs sustainably over a much longer window.

The key functional trade-off is longevity versus immediacy: the PEP/NTP system consumes its energy substrates quickly, making it optimal for short-burst expression but poorly suited to reactions that require sustained protein production beyond one to two hours. The NMP-Ribose-Glucose system sacrifices initial speed for metabolic sustainability, supporting transcription and translation for up to 20 hours by continuously regenerating the nucleotide pool from these simpler upstream precursors. The 20-hour system also uses a simplified additive profile, replacing the spermidine, DMSO, cAMP, NAD, and folinic acid additives of the 1-hour system with nicotinamide alone, reflecting its different strategy for sustaining energy balance over time.
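The longevity-versus-immediacy trade-off can be illustrated with a deliberately simple toy model. It assumes, hypothetically, that the protein synthesis rate is proportional to an energy supply that decays exponentially: quickly for the PEP/NTP system and slowly for the NMP-Ribose-Glucose system. The amplitudes (`e0`) and time constants (`tau_h`) are illustrative placeholders, not measured values for either mix.

```python
import math


def cumulative_yield(t_h: float, e0: float, tau_h: float) -> float:
    """Protein made by time t_h (arbitrary units) when the synthesis rate is
    e0 * exp(-t / tau_h). Integrating gives e0 * tau_h * (1 - exp(-t_h / tau_h))."""
    return e0 * tau_h * (1.0 - math.exp(-t_h / tau_h))


def time_to_fraction(fraction: float, tau_h: float) -> float:
    """Time (h) at which a given fraction of the final yield is reached.
    Independent of e0 for a single exponential."""
    return -tau_h * math.log(1.0 - fraction)


# Hypothetical parameters: a strong but short-lived burst versus a weaker,
# sustained supply. Not fitted to any real cell-free data.
SYSTEMS = {
    "PEP/NTP (1 h)": {"e0": 1.0, "tau_h": 1.0},
    "NMP-Ribose-Glucose (20 h)": {"e0": 0.15, "tau_h": 10.0},
}

if __name__ == "__main__":
    for name, p in SYSTEMS.items():
        y1 = cumulative_yield(1.0, **p)
        y20 = cumulative_yield(20.0, **p)
        t90 = time_to_fraction(0.9, p["tau_h"])
        print(f"{name}: yield@1h={y1:.2f}, yield@20h={y20:.2f}, t90={t90:.1f} h")
```

With these placeholder numbers the burst system leads at 1 hour while the sustained system overtakes it by 20 hours; where the crossover falls depends entirely on the assumed parameters.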


Bonus: How Can Transcription Occur if GMP is Not Included But Guanine Is?

Although GMP is listed at 0.00 uM in the 20-hour NMP-Ribose-Glucose system, transcription can still proceed because guanine is supplied as a free nucleobase at 200 uM. The E. coli lysate retains active purine salvage enzymes with guanine phosphoribosyltransferase activity (in E. coli, chiefly the xanthine-guanine phosphoribosyltransferase Gpt), which convert guanine into GMP using phosphoribosyl pyrophosphate (PRPP) as the phosphoribose donor. Kinases then phosphorylate GMP to GDP and then to GTP, the form that T7 RNA polymerase incorporates into mRNA. This indirect route lets the system maintain a functional GTP pool without exogenous GMP, and feeding the salvage pathway at the nucleobase level rather than the monophosphate level may also allow tighter control over guanosine nucleotide concentrations.
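The salvage route can be sketched as a linear chain of first-order conversions, guanine → GMP → GDP → GTP, integrated with a simple Euler step. This is a minimal sketch under stated assumptions: the rate constants are hypothetical, PRPP and phosphate donors are assumed non-limiting, and GTP consumption by transcription is ignored, so the output only shows that a GTP pool can build up from guanine alone.

```python
def simulate_salvage(guanine_um: float = 200.0, k_pr: float = 0.5,
                     k_gmk: float = 2.0, k_ndk: float = 5.0,
                     hours: float = 6.0, dt: float = 0.001) -> dict:
    """Euler integration of guanine -> GMP -> GDP -> GTP (all pools in uM).

    k_pr:  guanine + PRPP -> GMP (phosphoribosyltransferase step), per hour
    k_gmk: GMP -> GDP (kinase step), per hour
    k_ndk: GDP -> GTP (kinase step), per hour
    All rate constants are hypothetical placeholders, not measured values.
    """
    g, gmp, gdp, gtp = guanine_um, 0.0, 0.0, 0.0
    for _ in range(int(round(hours / dt))):
        v1 = k_pr * g      # flux into GMP
        v2 = k_gmk * gmp   # flux into GDP
        v3 = k_ndk * gdp   # flux into GTP
        g -= v1 * dt
        gmp += (v1 - v2) * dt
        gdp += (v2 - v3) * dt
        gtp += v3 * dt
    return {"guanine": g, "GMP": gmp, "GDP": gdp, "GTP": gtp}


if __name__ == "__main__":
    pools = simulate_salvage()
    print({name: round(value, 1) for name, value in pools.items()})
```

Because each flux leaves one pool and enters the next, the total guanosine content stays at the starting 200 uM; only its distribution across the four forms changes over time.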


Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Biophysical and Functional Properties of Each Fluorescent Protein

sfGFP (Superfolder Green Fluorescent Protein): sfGFP was engineered with six stabilizing mutations that dramatically improve its folding robustness, allowing it to fold correctly even when expressed as a fusion partner or under suboptimal conditions. In cell-free systems, this enhanced folding efficiency means sfGFP reaches detectable fluorescence faster and at higher yields than wild-type GFP, making it one of the most reliable reporters for cell-free expression experiments.

mRFP1 (Monomeric Red Fluorescent Protein 1): mRFP1 requires molecular oxygen for chromophore maturation, as the oxidation step that generates the red-emitting chromophore from the DsRed-derived scaffold is oxygen-dependent. In cell-free reactions that are run in sealed or oxygen-limited environments, mRFP1 maturation can be rate-limiting and may result in underestimated fluorescence readings relative to the actual amount of protein synthesized.

mKO2 (Monomeric Kusabira Orange 2): mKO2 exhibits a relatively slow chromophore maturation time on the order of several hours at 37°C, which is significant in a cell-free context where the reaction window may be limited. As a result, mKO2 fluorescence will continue to increase for hours after protein synthesis has peaked, so it is important to allow sufficient post-synthesis incubation time before measuring fluorescence endpoints.

mTurquoise2: mTurquoise2 has an unusually high quantum yield of approximately 0.93, the highest reported among cyan fluorescent proteins, along with a long fluorescence lifetime and good photostability; its high quantum yield makes it exceptionally bright per molecule. In cell-free systems this is advantageous because even modest expression levels produce detectable signal, reducing the pressure on the cell-free reaction to achieve very high protein yields for a useful readout.

mScarlet_I (mScarlet-I): mScarlet-I combines fast chromophore maturation with high brightness, achieving nearly complete maturation within one to two hours at 37°C. Its rapid maturation rate makes it particularly well-suited for cell-free reactions with short incubation windows, as the fluorescence signal more closely tracks the actual rate of protein synthesis in real time rather than lagging behind due to slow chromophore formation.

Electra2: Electra2 is sensitive to acidic pH, with its fluorescence significantly quenched below approximately pH 6.5. In cell-free reactions, metabolic activity can acidify the reaction environment over time as energy substrates are consumed and organic acids accumulate. This pH sensitivity can therefore cause fluorescence to decrease even if the protein itself remains intact and properly folded, making endpoint measurements at long incubation times unreliable without pH monitoring or buffering correction.


Hypothesis for Improving Fluorescence Over 36-Hour Incubation

Target protein: mKO2

Rationale: mKO2 has a slow chromophore maturation time, meaning that a large fraction of the protein synthesized in the early hours of the reaction will not yet be fluorescent when measured at intermediate time points. Over a 36-hour window, maximizing the amount of mKO2 protein synthesized early gives the protein more total time to mature, resulting in higher final fluorescence.
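The maturation argument can be made concrete with a first-order maturation model: protein is synthesized at a constant rate for a fixed window, and each molecule becomes fluorescent with maturation half-time `t_half_h`. The synthesis window and half-times used here are hypothetical placeholders chosen to contrast a fast- and a slow-maturing protein; they are not measured values for mKO2 or any other protein in the panel.

```python
import math


def mature_fraction(t_h: float, synth_window_h: float, t_half_h: float,
                    steps: int = 2000) -> float:
    """Fraction of all protein made so far that is fluorescent at time t_h.

    Assumes constant synthesis from 0 to synth_window_h and first-order
    maturation with rate k = ln(2) / t_half_h. A molecule made at time u is
    mature at time t_h with probability 1 - exp(-k * (t_h - u)).
    """
    k = math.log(2) / t_half_h
    end = min(t_h, synth_window_h)
    if end <= 0:
        return 0.0
    du = end / steps
    total = 0.0
    # Midpoint rule over synthesis times u in [0, end], averaging the
    # maturation probability across all molecules made so far.
    for i in range(steps):
        u = (i + 0.5) * du
        total += 1.0 - math.exp(-k * (t_h - u))
    return total / steps


if __name__ == "__main__":
    for t_half in (0.5, 4.0):  # hypothetical fast vs. slow maturation half-times (h)
        f12 = mature_fraction(12.0, 12.0, t_half)
        f36 = mature_fraction(36.0, 12.0, t_half)
        print(f"t_half={t_half} h: mature at 12 h = {f12:.2f}, at 36 h = {f36:.2f}")
```

Under these assumptions, the slow maturer has only about 58% of its protein fluorescent at the end of a 12-hour synthesis window but more than 99% by 36 hours, which is the mechanism the hypothesis relies on.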

Hypothesis: Increasing the concentration of the 17 Amino Acid Mix, Tyrosine, and Cysteine in the master mix by approximately 1.5-fold above the base 20-hour NMP-Ribose-Glucose concentrations (from 4.10 mM to approximately 6.0 mM for the amino acid mix and tyrosine, and from 4.00 mM to 6.00 mM for cysteine) will increase the rate and total yield of mKO2 translation in the first 6 to 12 hours of the reaction. With a larger pool of synthesized but immature mKO2 protein available at the midpoint of the incubation, the extended 36-hour window provides sufficient time for the slow chromophore oxidation step to proceed to completion, resulting in a higher final fluorescent protein yield than would be achievable in a shorter reaction or with standard amino acid concentrations.

Expected effect: Elevated amino acid concentrations will reduce translational stalling caused by substrate depletion during peak synthesis, front-loading mKO2 production and maximizing the fraction of synthesized protein that has time to mature within the 36-hour window. A potential risk is that excessively high amino acid concentrations could alter ionic strength and compete with magnesium coordination, so careful titration around the 1.5-fold increase range is warranted.


Reaction Composition Reference

Each well reaction will be composed as follows:

Component | Volume
Lysate | 6 μL
2X Optimized Master Mix | 10 μL
Fluorescent Protein DNA Template | 2 μL
Custom Reagent Supplements | 2 μL
Total | 20 μL
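Scaling the per-well composition to a full plate is simple arithmetic, and a small helper makes it less error-prone. The function name and the 10% pipetting overage default are my own assumptions, not part of the protocol. In practice the DNA template (and possibly the supplement) would likely be dispensed well by well, so only lysate and master mix would be pooled.

```python
PER_WELL_UL = {
    "Lysate": 6.0,
    "2X Optimized Master Mix": 10.0,
    "Fluorescent Protein DNA Template": 2.0,
    "Custom Reagent Supplements": 2.0,
}


def scale_reaction(n_wells: int, overage: float = 0.10) -> dict:
    """Total volume (uL) of each component for n_wells, plus fractional overage."""
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    factor = n_wells * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_WELL_UL.items()}


if __name__ == "__main__":
    for name, vol in scale_reaction(24).items():
        print(f"{name}: {vol} uL")
```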


Week 1 Lab: Pipetting


Projects


Individual Final Project


HTGAA 2026: Individual Final Project Documentation

Project Title: A Hydrogel-Embedded Multiple Input-Output (MIMO) Genetic Circuit for IL-6 and Hypoxia Detection


Section 1: Abstract

Chronic inflammatory diseases and solid tumors share a pathophysiological signature characterized by elevated interleukin-6 (IL-6) and reduced local oxygen tension (hypoxia). Current clinical monitoring cannot detect these signals simultaneously at the tissue level in real time, requiring repeated blood draws or imaging procedures that are slow, lab-dependent, and inaccessible to over 3.5 billion people globally who lack basic diagnostic services. This project addresses that gap by engineering a hydrogel-embedded, cell-free genetic circuit capable of sensing both IL-6 and hypoxia as dual inputs and producing two corresponding outputs: sfGFP fluorescence as a quantifiable reporter signal and a therapeutic peptide as a functional biological output.

The system integrates a Multi-Input Multi-Output (MIMO) genetic circuit encoded on a plasmid containing a T7 promoter-driven sfGFP construct, encapsulated within an agarose hydrogel bioink matrix alongside E. coli cell-free extract (CFE). The hypothesis is that combining IL-6-responsive and hypoxia-responsive regulatory elements within a single cell-free circuit, embedded in a biocompatible hydrogel, will enable autonomous, localized disease detection and therapeutic response without requiring living cells. The specific aims are to design and validate the MIMO genetic circuit, optimize hydrogel encapsulation for sustained cell-free protein synthesis, and characterize dual-input fluorescence output as a proof of concept for future therapeutic peptide integration. This system has the potential to function as an implantable or wound-applied biosensor-therapeutic platform capable of responding autonomously to disease microenvironments.


Section 2: Project Aims

Aim 1: Experimental Aim

The first aim of my final project is to design, construct, and validate a Multi-Input Multi-Output (MIMO) genetic circuit that produces sfGFP fluorescence output in response to IL-6 and hypoxia signals in a cell-free expression system, by utilizing T7 promoter-driven DNA construct design in Benchling, cell-free transcription/translation reactions using E. coli BL21 (DE3) Star lysate, and fluorescence plate reader quantification to confirm circuit functionality.

Relevant methods and resources:

  • Plasmid design in Benchling with codon-optimized sfGFP insert under T7 promoter control
  • Twist Biosciences DNA synthesis order for the MIMO circuit construct
  • Cell-free reaction setup using NMP-Ribose-Glucose 20-hour master mix
  • IL-6 protein spike-in and cobalt chloride (CoCl2) as a chemical hypoxia mimic
  • Fluorescence plate reader (excitation 485 nm / emission 510 nm) for sfGFP quantification
  • Agarose hydrogel encapsulation protocol based on Sánchez-Costa et al. (2024)

Aim 2: Development Aim

Following successful validation of IL-6 and hypoxia-triggered sfGFP output in Aim 1, the next step would be to replace the sfGFP reporter with a therapeutic peptide output cassette, encoding an anti-inflammatory peptide such as an IL-6 receptor antagonist peptide, and to validate dosage-controlled release kinetics from the agarose hydrogel matrix using LC-MS quantification of peptide concentration in the hydrogel supernatant over a 24 to 72-hour window. This aim would demonstrate that the system can be switched from a diagnostic reporter mode to a therapeutic delivery mode using the same circuit architecture, and would characterize the hydrogel as a controlled release vehicle capable of producing clinically relevant peptide concentrations in response to disease signals.

Aim 3: Visionary Aim

The long-term vision for this project is to develop a fully implantable or wound-applied Engineered Living Material (ELM) that autonomously monitors the inflammatory and hypoxic microenvironment of chronic wounds, diabetic ulcers, or solid tumors, and responds in real time by producing localized therapeutic peptides at the site of disease. If fully realized, this platform would eliminate the need for repeated systemic drug dosing or clinical lab testing for conditions like sepsis, cancer, and chronic inflammation, reducing treatment delays and enabling precision medicine at the tissue level. The broader concept challenges the existing paradigm of passive drug delivery and reactive clinical monitoring by replacing it with an active, self-regulating biological system embedded directly into the disease microenvironment — a shift that could redefine how chronic disease management is approached across oncology, wound care, and critical care medicine globally.


Section 3: Background

Background and Literature Context

Citation 1: Sánchez-Costa et al. (2024) — In-Hydrogel Cell-Free Protein Expression System as Biocompatible and Implantable Biomaterial

This study demonstrated that E. coli cell-free extracts (CFEs) can be successfully encapsulated within agarose hydrogels while retaining full transcription and translation activity, enabling sfGFP production that diffuses from the hydrogel core to its surface. The authors showed that agarose hydrogels outperformed other matrices in supporting sustained cell-free protein synthesis. Critically, the hydrogel system was confirmed to be biocompatible both in vitro through cell colonization and proliferation assays and in vivo in a preclinical mouse implantation model, with no adverse immune response observed. The study also demonstrated that freeze-drying of the agarose hydrogel still supports protein synthesis upon reconstitution, enabling point-of-care deployment in resource-limited settings. This work establishes the foundational biomaterial platform upon which the MIMO circuit of this project is built.

Citation 2: Pardee et al. (2016) — Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components

Pardee and colleagues demonstrated that cell-free gene expression systems can be freeze-dried onto paper substrates and reactivated with minimal equipment to detect specific nucleic acid sequences from pathogens including Zika virus, producing colorimetric outputs readable by eye. The work established that cell-free systems are robust enough to be deployed outside of laboratory environments and that their regulatory outputs can be programmed through toehold switch RNA sensors responding to specific molecular inputs. This paradigm of programmable, cell-free biosensing with embedded output logic is directly analogous to the MIMO circuit architecture proposed in this project, extended here to protein-level and oxygen-level inputs rather than nucleic acid inputs. Together these two studies establish the feasibility of combining programmable cell-free gene circuits with biocompatible encapsulation matrices for real-world diagnostic and therapeutic applications.

Novelty and Innovation

This project is novel in three distinct ways. First, while cell-free protein synthesis in hydrogels has been demonstrated as a proof of concept for single-output expression systems, I am not aware of published work demonstrating a dual-input MIMO genetic circuit operating within an encapsulated hydrogel format; the combination of multi-signal logic gating with hydrogel encapsulation would represent a new system architecture. Second, the use of IL-6 as a transcriptional input in a cell-free circuit is an underexplored application: most cell-free biosensing work has focused on nucleic acid inputs or small-molecule inducers, and expanding the input modality to a clinically relevant cytokine opens a new class of programmable biosensors for inflammatory disease. Third, integrating a therapeutic peptide output within the same circuit that performs the sensing creates a self-contained sense-and-respond system that requires no external intervention between detection and therapy. This is a conceptual departure from existing diagnostic-therapeutic platforms, which treat sensing and treatment as separate, sequential steps.

Why This Project Matters

The problem this project addresses is the fundamental mismatch between the speed of disease progression at the tissue level and the speed of clinical monitoring available to detect it. For conditions like sepsis, where 56% of deaths are preventable with faster diagnosis (Lancet, 2020), or diabetic wound infections where delayed detection leads to amputation, the 2 to 6-hour typical lab turnaround time for inflammatory biomarker tests is clinically unacceptable when biosensor-based detection can in principle operate in minutes. Over 3.5 billion people globally lack access to basic diagnostic services (WHO, 2022), and the 422 million people living with diabetes (WHO, 2023) represent a population for whom continuous localized monitoring of wound inflammation would have immediate, life-altering benefit.

This project matters because it addresses the point-of-care gap not by improving existing lab tests, but by eliminating the need for a lab entirely, placing the sensing and therapeutic response capability directly at the tissue site where the disease is occurring. If successful, the outputs of this project could reframe how implantable biomaterials are designed, shifting them from passive structural supports or drug reservoirs to active participants in disease management that read biological signals and respond accordingly. At the field level, this work contributes to the emerging discipline of Engineered Living Materials by demonstrating that cell-free systems can serve as the functional biological layer within implantable hydrogels without the biosafety concerns associated with living genetically modified organisms. The potential clinical impact extends across wound care, oncology, and critical care, where real-time localized biosensing and autonomous therapeutic release could meaningfully change patient outcomes.

Ethical Implications

This project involves the design of a programmable biological system intended for implantation or wound application in human patients, which raises significant ethical considerations across multiple principles. The principle of non-maleficence is directly relevant: the encapsulation of E. coli cell-free extracts within an implantable material introduces the risk of immune response, even in the absence of living organisms, as bacterial proteins and lipopolysaccharides (LPS) retained in the lysate could trigger inflammatory cascades in immunocompromised patients for whom this device is most likely to be used. The principle of justice is also implicated: while the motivation of the project is explicitly to address diagnostic inequity in low-resource settings, there is a risk that the technology, if commercialized, would be priced in ways that replicate existing access barriers rather than reducing them. The principle of beneficence requires that the therapeutic peptide output be characterized not only for efficacy but for potential off-target effects, particularly since a self-regulating release system cannot be externally switched off once implanted.

To ensure ethical conduct of this research and responsible development of its implications, several measures should be taken. All cell-free lysates used in hydrogel formulations intended for in vivo testing should be rigorously depleted of LPS through standard endotoxin removal protocols and tested for endotoxin levels below the FDA threshold of 0.5 EU/mL before any animal studies proceed. The therapeutic peptide output should undergo full in vitro cytotoxicity profiling and off-target receptor binding analysis before any in vivo experiments. Open publication of protocols and DNA sequences is a core commitment of this project, ensuring that the technology is accessible to academic groups in low-resource settings rather than locked behind proprietary barriers. A potential unintended consequence of the autonomous sense-and-respond architecture is that the system could produce therapeutic output in response to non-pathological IL-6 fluctuations, for example during normal exercise-induced inflammation, which could cause unintended biological effects. This uncertainty must be addressed through careful characterization of the input threshold sensitivity of the circuit before any clinical translation is considered. An alternative to the implantable format that avoids many of these concerns entirely would be a topically applied hydrogel patch that remains external to the body, reducing immune risk and allowing easy removal if adverse effects are observed.


Section 4: Experimental Design, Techniques, Tools, and Technology

Detailed Experimental Plan

Step 1: DNA Construct Design (Week 1, estimated 3 days) Design the MIMO genetic circuit construct in Benchling, incorporating a T7 promoter, ribosome binding site (RBS), codon-optimized sfGFP insert, and T7 terminator as the baseline expression cassette. Annotate restriction enzyme sites (EcoRI, NdeI, BstYI confirmed from the plasmid map) for downstream cloning verification. Expected result: a fully annotated expression cassette of roughly 800 bp (T7 promoter through T7 terminator), ready for synthesis and cloning into a plasmid backbone.

Step 2: Twist Biosciences DNA Synthesis Order (Week 1, submission; Week 2, delivery) Submit the finalized sfGFP construct sequence to Twist Biosciences for gene synthesis. Specify the T7 promoter context and confirm codon optimization for E. coli expression. Expected result: arrival of lyophilized dsDNA within 5 to 7 business days.

Step 3: Plasmid Preparation and Transformation (Week 2, estimated 2 days) Transform the synthesized construct into DH5alpha competent cells using a heat-shock protocol, plate on ampicillin LB agar, and grow overnight at 37°C. Pick 4 to 6 colonies and perform miniprep DNA extraction. Verify the construct by Sanger sequencing using the T7 promoter primer. Expected result: a sequence-verified plasmid at a concentration above 100 ng/uL.
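Because template dosing in cell-free reactions is usually reasoned about in molar terms, it helps to convert the miniprep yield from ng/uL to nM. This sketch uses the common approximation of about 650 g/mol per base pair of dsDNA. The 3,000 bp plasmid length in the example is a hypothetical value, since the backbone is not specified here. It also applies the 2 uL into 20 uL template addition from Step 5.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # common approximation for one base pair of dsDNA


def dsdna_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert dsDNA concentration from ng/uL to nM.
    ng/uL equals mg/L; dividing by the molar mass (g/mol) and scaling by 1e6 gives nM."""
    mw = length_bp * AVG_BP_MASS_G_PER_MOL
    return conc_ng_per_ul * 1e6 / mw


def final_in_reaction(stock_nM: float, added_ul: float = 2.0,
                      total_ul: float = 20.0) -> float:
    """Concentration after adding added_ul of stock to a total_ul reaction."""
    return stock_nM * added_ul / total_ul


if __name__ == "__main__":
    stock = dsdna_nM(100.0, 3000)  # hypothetical 3,000 bp plasmid at 100 ng/uL
    print(f"stock {stock:.1f} nM -> {final_in_reaction(stock):.2f} nM in the reaction")
```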

Step 4: Cell-Free Master Mix Preparation (Week 2, estimated 1 day) Prepare the NMP-Ribose-Glucose 20-hour master mix at 2X concentration according to the HTGAA Cell-Free Master Mix Composition protocol, including potassium glutamate, HEPES-KOH pH 7.5, magnesium glutamate, potassium phosphate salts, ribose, glucose, AMP, CMP, UMP, guanine (in place of GMP), amino acid mix, tyrosine, cysteine, and nicotinamide. Expected result: complete 2X master mix aliquoted and stored at -80°C.

Step 5: Baseline Cell-Free sfGFP Expression (Week 3, estimated 1 day) Set up 20 uL cell-free reactions in triplicate: 6 uL BL21 (DE3) Star lysate, 10 uL 2X master mix, 2 uL sfGFP plasmid DNA template, 2 uL nuclease-free water. Incubate at 30°C for 20 hours. Measure sfGFP fluorescence every 2 hours using plate reader at excitation 485 nm / emission 510 nm. Expected result: increasing fluorescence signal reaching plateau by 12 to 16 hours, confirming functional cell-free expression.

Step 6: IL-6 Input Titration (Week 3, estimated 2 days) Replace the 2 uL water backfill in Step 5 with recombinant human IL-6 protein at 0, 1, 10, 100, and 1000 ng/mL. Because the 2 uL addition is diluted 10-fold in the 20 uL reaction, stocks must be prepared at 10 times these values if they are intended as final in-reaction concentrations. Assess whether the circuit produces IL-6 concentration-dependent changes in sfGFP output. Expected result: a measurable dose-response relationship between IL-6 concentration and fluorescence output if the IL-6-responsive regulatory element is functional.
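The titration series can be planned with two small conversions: the stock needed for each target final concentration, given the 2 uL into 20 uL addition, and the molar equivalent of each ng/mL value. The approximate 21 kDa molar mass for recombinant human IL-6 is an assumption and should be replaced with the value on the supplier's datasheet.

```python
IL6_MW_G_PER_MOL = 21_000.0  # assumed approximate mass of recombinant human IL-6

ADDED_UL = 2.0      # supplement volume added per well
REACTION_UL = 20.0  # final reaction volume


def stock_for_final(final_ng_per_ml: float) -> float:
    """Stock concentration (ng/mL) so that ADDED_UL into REACTION_UL gives the final value."""
    return final_ng_per_ml * REACTION_UL / ADDED_UL


def ng_per_ml_to_nM(conc_ng_per_ml: float, mw: float = IL6_MW_G_PER_MOL) -> float:
    """ng/mL equals 1e-6 g/L; divide by molar mass for mol/L, then scale to nM."""
    return conc_ng_per_ml * 1e3 / mw


if __name__ == "__main__":
    for final in (0, 1, 10, 100, 1000):
        print(f"final {final} ng/mL ({ng_per_ml_to_nM(final):.3f} nM): "
              f"prepare stock at {stock_for_final(final):.0f} ng/mL")
```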

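Step 7: Hypoxia Input Simulation (Week 3, estimated 1 day) Use cobalt chloride (CoCl2) at 100 to 500 uM as a chemical hypoxia mimic to activate hypoxia-inducible regulatory elements in the circuit. Alternatively, run reactions in a sealed hypoxia chamber maintained at 1% O2. Because chromophore maturation in GFP-family proteins, including sfGFP, requires molecular oxygen, low-oxygen samples should be exposed to air before reading so that reduced maturation is not mistaken for reduced circuit output. Expected result: elevated sfGFP output under hypoxic conditions compared to the normoxic control.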

Step 8: Dual-Input MIMO Test (Week 4, estimated 2 days) Combine IL-6 spike-in and CoCl2 hypoxia mimic simultaneously in a single cell-free reaction to test whether both inputs produce an additive or synergistic increase in sfGFP output. Test all four input combinations: no input, IL-6 only, hypoxia only, and both inputs together. Expected result: highest fluorescence output in the dual-input condition, confirming MIMO logic gating behavior.
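One simple way to decide between "additive" and "synergistic" from the four conditions is to compare the background-subtracted dual-input signal with the sum of the two single-input increases. The interaction score and the 0.8/1.2 ratio cut-offs below are my own hypothetical conventions, not a course standard. A formal analysis would test the interaction term of a two-way ANOVA on the replicate data.

```python
def interaction(none: float, il6: float, hyp: float, both: float) -> dict:
    """2x2 analysis of mean fluorescence (a.u.) for the four input conditions."""
    d_il6 = il6 - none   # increase from IL-6 alone
    d_hyp = hyp - none   # increase from hypoxia alone
    d_both = both - none # increase from both inputs together
    additive = d_il6 + d_hyp
    score = d_both - additive  # > 0 suggests synergy, < 0 sub-additivity
    if additive <= 0:
        return {"additive_expectation": additive, "interaction": score,
                "ratio": float("nan"), "label": "no net single-input response"}
    ratio = d_both / additive
    if ratio >= 1.2:
        label = "synergistic"
    elif ratio >= 0.8:
        label = "approximately additive"
    else:
        label = "sub-additive"
    return {"additive_expectation": additive, "interaction": score,
            "ratio": ratio, "label": label}


if __name__ == "__main__":
    # Made-up example means, not experimental data.
    print(interaction(none=100.0, il6=300.0, hyp=250.0, both=700.0))
```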

Step 9: Agarose Hydrogel Encapsulation (Week 4, estimated 2 days) Prepare 2% w/v low-melting-point agarose in nuclease-free water, autoclave, and cool to 42°C before mixing with cell-free extract and plasmid DNA at a 1:1 ratio (giving approximately 1% w/v agarose in the final gel). Cast into 6-well plates and allow to solidify at room temperature. Incubate encapsulated reactions at 30°C and measure sfGFP fluorescence diffusing from the gel surface at 0, 4, 8, 12, and 20 hours. Expected result: detectable sfGFP fluorescence in the hydrogel supernatant by 8 hours, confirming that encapsulation does not abolish cell-free protein synthesis activity.
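Two quick estimates help set expectations for this step. First, mixing 1:1 halves the concentration of everything in each part, including the agarose and the extract. Second, a rough one-dimensional diffusion time follows from t ≈ L²/(2D). The diffusion coefficient (about 87 µm²/s, a commonly cited value for GFP in dilute aqueous solution) and the 2 mm gel thickness are assumptions. Diffusion through agarose will be slower, so the result is best read as a lower bound.

```python
def after_mixing(conc: float, ratio_a: float = 1.0, ratio_b: float = 1.0) -> float:
    """Concentration of a component of part A after mixing A:B at ratio_a:ratio_b."""
    return conc * ratio_a / (ratio_a + ratio_b)


def diffusion_time_h(distance_um: float, d_um2_per_s: float) -> float:
    """Characteristic 1D diffusion time t = L^2 / (2 D), returned in hours."""
    return distance_um ** 2 / (2.0 * d_um2_per_s) / 3600.0


if __name__ == "__main__":
    print(f"final agarose: {after_mixing(2.0):.1f}% w/v")
    # Assumed: 2 mm gel thickness, free-solution GFP diffusion coefficient.
    print(f"diffusion time across 2 mm: {diffusion_time_h(2000.0, 87.0):.1f} h")
```

If the free-solution estimate alone is already several hours, the 8-hour expectation for detectable supernatant signal leaves little margin, which argues for thinner gels or for imaging the gel directly as well.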

Step 10: Freeze-Drying Stability Test (Week 5, estimated 2 days) Lyophilize hydrogel-encapsulated cell-free reactions and reconstitute with nuclease-free water containing plasmid DNA. Incubate and measure sfGFP output as above. Expected result: recovery of sfGFP fluorescence above 50% of fresh reaction output, confirming freeze-dry stability for point-of-care deployment.

Step 11: Data Analysis and Visualization (Week 5, estimated 2 days) Compile all fluorescence time-course data and generate dose-response curves for IL-6 and hypoxia inputs versus sfGFP output. Calculate signal-to-noise ratio for each input condition. Plot hydrogel versus liquid-phase reaction fluorescence kinetics. Expected result: complete dataset with statistical comparison between all conditions, with at least one input condition showing a statistically significant increase in sfGFP output relative to no-input control.
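For each input condition compared with the no-input control, the summary statistics can be computed from the replicate readings. "Signal-to-noise" has several conventions; the sketch below assumes (mean signal − mean control) / SD of control. It reports Welch's t statistic and Welch-Satterthwaite degrees of freedom, because triplicates need not share a variance. A p-value would then come from the t distribution, for example via SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def summarize(signal: Sequence[float], control: Sequence[float]) -> dict:
    """Fold change, SNR, and Welch's t for one condition versus the no-input control."""
    m_s, m_c = mean(signal), mean(control)
    sd_s, sd_c = stdev(signal), stdev(control)
    n_s, n_c = len(signal), len(control)
    se2_s, se2_c = sd_s ** 2 / n_s, sd_c ** 2 / n_c  # squared standard errors
    t = (m_s - m_c) / math.sqrt(se2_s + se2_c)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_s + se2_c) ** 2 / (se2_s ** 2 / (n_s - 1) + se2_c ** 2 / (n_c - 1))
    return {"fold_change": m_s / m_c, "snr": (m_s - m_c) / sd_c,
            "welch_t": t, "df": df}


if __name__ == "__main__":
    # Made-up triplicate readings (a.u.), not experimental data.
    print(summarize(signal=[300.0, 320.0, 280.0], control=[100.0, 110.0, 90.0]))
```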

Step 12: Documentation and Final Report (Week 5, estimated 1 day) Update HTGAA webpage with experimental data, plasmid map, fluorescence graphs, and protocol. Archive Benchling construct, Twist order confirmation, and sequencing results. Expected result: fully documented project with reproducible protocol and publicly accessible data.

Workflow Diagram

DESIGN                    WET LAB                    ANALYSIS
  │                          │                           │
  ▼                          ▼                           ▼
Benchling          Cell-Free Master Mix          Plate Reader
Construct     →    Preparation + Lysate    →    Fluorescence
Design             Aliquoting                   Quantification
  │                          │                           │
  ▼                          ▼                           ▼
Twist Order        IL-6 + Hypoxia           Dose-Response
(DNA Synthesis) →  Input Titration     →    Curve Analysis
  │                          │                           │
  ▼                          ▼                           ▼
Sequence           Agarose Hydrogel         Freeze-Dry
Verification  →    Encapsulation       →    Stability Data

Techniques Checked

  • Pipetting
  • Bioethical Considerations
  • DNA Construct Design
  • Databases (GenBank, NCBI, Benchling)
  • Designing a Twist Order
  • Use of Benchling
  • Cell-Free Reactions
  • Freeze-Dried Cell-Free Systems
  • Chassis Selection (DH5alpha for plasmid propagation)
  • Plasmid Preparation
  • Bacterial Culturing
  • Quality Control / Analysis
  • Gibson Assembly
  • Primer Design or Selection
  • PCR Reactions
  • Protein Design
  • Use of Boltz or PepMLM (for therapeutic peptide design in Aim 2)

Expanded Technique Descriptions

Cell-Free Reactions: The core experimental platform of this project is a cell-free transcription-translation (TX/TL) system using BL21 (DE3) Star E. coli lysate, which retains all the ribosomes, translation factors, and T7 RNA polymerase needed for gene expression without requiring living cells. The MIMO genetic circuit plasmid encoding sfGFP under T7 promoter control will be added directly to 20 uL reactions alongside the NMP-Ribose-Glucose 20-hour master mix, with IL-6 and CoCl2 supplemented as the two input signals in the 2 uL custom reagent supplement volume. Fluorescence will be measured every 2 hours over a 20-hour window to generate kinetic expression profiles for each input condition. This approach allows rapid, iterative testing of circuit behavior without the time cost of bacterial transformation or cell culture, making it ideal for the course timeline.

Agarose Hydrogel Encapsulation

Based on the protocol established by Sánchez-Costa et al. (2024), 2% w/v low-melting-point agarose will be dissolved in nuclease-free water, autoclaved for sterility, and held at 42°C so that it remains liquid before mixing with the cell-free extract and plasmid DNA. The mixture will be cast into molds and allowed to solidify at room temperature, after which the hydrogels will be transferred to a 30°C incubator for cell-free protein synthesis. sfGFP fluorescence will be measured both in the hydrogel itself, using a macro fluorescence imager, and in the surrounding supernatant, to quantify how much sfGFP diffuses out of the gel. This technique is central to demonstrating that the MIMO circuit can function not only in liquid-phase cell-free reactions but also within the biocompatible solid matrix that any future implantable application would require.
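
Since the 2% agarose is mixed with extract and DNA before casting, the final gel concentration is below 2%, and the supernatant reading is most informative as a fraction of the total signal. A minimal sketch of both calculations follows. The mixing volumes and readings in the example are hypothetical, and the release fraction assumes the imager and supernatant signals have been converted to common units (for example with an sfGFP standard curve), which the protocol does not yet specify.

```python
def final_agarose_percent(agarose_ul: float, other_ul: float, stock_percent: float = 2.0) -> float:
    """Final w/v agarose after diluting stock_percent agarose with other_ul of extract + DNA."""
    return stock_percent * agarose_ul / (agarose_ul + other_ul)


def release_fraction(gel_signal: float, supernatant_signal: float) -> float:
    """Fraction of total sfGFP signal found in the supernatant (0 = all retained in the gel).

    Assumes both signals are already in the same units; raw imager and
    plate-reader units are not directly comparable.
    """
    total = gel_signal + supernatant_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return supernatant_signal / total


# Hypothetical example: 10 uL of 2% agarose mixed with 10 uL of extract + plasmid.
print(final_agarose_percent(10.0, 10.0))  # 1.0
print(release_fraction(30.0, 10.0))       # 0.25
```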

Industry Council Companies

  • Twist Bioscience — DNA synthesis of the MIMO circuit construct
  • Ginkgo Bioworks — potential automation of cell-free reaction screening using Nebula cloud laboratory
  • Nuclera — cell-free protein synthesis platform relevance
  • Opentrons — liquid handling automation for master mix preparation and reaction setup
  • Waters Corporation — LC-MS analysis for therapeutic peptide output characterization in Aim 2
  • New England Biolabs — restriction enzymes and Gibson Assembly reagents

Section 5: Results and Quantitative Expectations

Validation Aspect

The aspect of the final project chosen for validation is the design and in silico verification of the MIMO genetic circuit construct encoding codon-optimized sfGFP under T7 promoter control. This includes restriction enzyme site mapping and sequence annotation in Benchling, followed by simulation of sfGFP expression with the Asimov Kernel genetic simulator. This checks that the DNA design is structurally sound and that the predicted expression dynamics are consistent with the fluorescence kinetics anticipated in the wet lab experiments.

Validation Protocol

  1. Retrieve the canonical sfGFP amino acid sequence from UniProt and back-translate to a codon-optimized nucleotide sequence for E. coli expression using the Benchling codon optimization tool.
  2. Assemble the full construct in Benchling in the following order: T7 promoter (23 bp), RBS (Shine-Dalgarno sequence, 6 bp), codon-optimized sfGFP coding sequence (720 bp), T7 terminator (40 bp), inserted into a pUC19 backbone with ampicillin resistance.
  3. Annotate all functional elements including restriction enzyme sites (EcoRI, NdeI, BstYI, BsrFI as visible in the plasmid map) and primer binding sites for Sanger sequencing verification.
  4. Export the construct sequence as FASTA and verify open reading frame integrity using NCBI ORF Finder.
  5. Transfer the construct to Asimov Kernel: create a new construct using the Characterized Bacterial Parts library, assembling pT7 promoter, A1 RBS, sfGFP coding sequence, and L3S2P24 terminator with pUC-SpecR backbone.
  6. Run the Asimov Kernel simulator with a 20-hour duration and 0.1-hour time step.
  7. Record RNA transcript and protein concentration time-course graphs and compare to expected sfGFP expression kinetics from the literature (peak expression within 8 to 12 hours in NMP-based cell-free systems).
  8. Export construct schematic and simulator output graphs for inclusion in project documentation.
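
Steps 2 to 4 can be partly scripted so the same checks run every time the construct is edited. The sketch below uses plain Python (no Benchling or NCBI API) to sum the part lengths from step 2, check that a coding sequence is a single clean ORF, and locate the four enzymes from step 3 using their recognition sequences with IUPAC degenerate bases. The short `demo_cds` is a hypothetical toy sequence, not the sfGFP design; the exported FASTA sequence would go in its place.

```python
import re

# Part lengths from protocol step 2 (bp).
PARTS_BP = {"T7 promoter": 23, "RBS": 6, "sfGFP CDS": 720, "T7 terminator": 40}

# Recognition sequences for the enzymes annotated in step 3 (IUPAC: R = A/G, Y = C/T).
SITES = {"EcoRI": "GAATTC", "NdeI": "CATATG", "BstYI": "RGATCY", "BsrFI": "RCCGGY"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
STOPS = {"TAA", "TAG", "TGA"}


def site_regex(site: str) -> str:
    # Zero-width lookahead so that overlapping matches are all reported.
    return "(?=" + "".join(IUPAC[base] for base in site) + ")"


def find_sites(seq: str) -> dict:
    """0-based start positions of each recognition site, scanning the sequence as linear DNA."""
    seq = seq.upper()
    return {name: [m.start() for m in re.finditer(site_regex(site), seq)]
            for name, site in SITES.items()}


def check_orf(cds: str) -> list:
    """Return a list of problems; an empty list means one clean ORF (ATG ... stop)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOPS:
        problems.append("does not end with a stop codon")
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOPS]
    if internal:
        problems.append(f"internal stop at codon index(es) {internal}")
    return problems


if __name__ == "__main__":
    print("insert length:", sum(PARTS_BP.values()), "bp")  # 789 bp
    demo_cds = "ATGCATATGGAATTCTAA"  # hypothetical toy CDS, not sfGFP
    print(check_orf(demo_cds))       # []
    print(find_sites(demo_cds))
```

All four recognition sequences are palindromic, so scanning one strand finds every site. The scan treats the sequence as linear, so a site spanning the origin of the circular plasmid would be missed.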

Techniques Utilized

The primary technique used in this validation is DNA construct design in Benchling, used to assemble, annotate, and verify the MIMO circuit plasmid map. This was complemented by the Asimov Kernel simulation environment, a computational model of genetic circuit behavior in a bacterial expression context, which predicts sfGFP RNA and protein production kinetics before any wet lab experiment is conducted. Database tools, UniProt for sequence retrieval and NCBI ORF Finder for reading frame verification, were used to check sequence accuracy and correct translation frame alignment. Together, these in silico techniques act as a digital twin of the sfGFP reporter portion of the planned wet lab construct, providing a quantitative baseline against which experimental fluorescence data can be evaluated.

Data and Analysis

The Asimov Kernel simulation of the sfGFP construct under T7 promoter control produced a time course in which sfGFP RNA concentration rose sharply during the first 2 to 4 simulated hours and stabilized at approximately 3.0 relative units by 8 hours, while protein concentration reached a plateau of approximately 2.4 relative units by 10 hours. This is consistent with the expression kinetics expected from a T7-driven cell-free system using the NMP-Ribose-Glucose master mix over a 20-hour window. The Benchling plasmid map confirmed correct ORF orientation (5’ to 3’), the T7 promoter correctly positioned relative to the +1 transcription start site, and all three primer pairs within the codon-optimized sfGFP sequence (FWD 1, 2, 3 and REV 1, 2, 3) as annotated in the construct. ORF Finder analysis identified no frameshift mutations or unintended stop codons.
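
Plateau times such as "by 8 hours" and "by 10 hours" depend on how a plateau is defined. One reproducible criterion is the first time a trace reaches 90% of its end-of-run level. A minimal sketch, assuming the time course has been exported as parallel lists of times and values (the export format is an assumption); the demo curve is a hypothetical saturating function, not the simulation output reported above.

```python
import math


def time_to_fraction(times, values, frac=0.9, tail=5):
    """First time a trace reaches `frac` of its plateau; returns (time, plateau).

    The plateau is taken as the mean of the last `tail` points, which assumes
    the trace has levelled off by the end of the run.
    """
    if len(times) != len(values) or len(values) < tail:
        raise ValueError("need matching time/value lists with at least `tail` points")
    plateau = sum(values[-tail:]) / tail
    target = frac * plateau
    for t, v in zip(times, values):
        if v >= target:
            return t, plateau
    return None, plateau


# Hypothetical saturating trace sampled every 0.1 h for 20 h (not Kernel output).
times = [round(0.1 * i, 1) for i in range(201)]
values = [1 - math.exp(-t / 3.0) for t in times]
t90, plateau = time_to_fraction(times, values)
print(t90, round(plateau, 3))
```

Applying the same criterion to both the RNA and protein traces makes their plateau times directly comparable.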

Challenges and Limitations

One unexpected challenge in the Asimov Kernel simulation was that the simulator did not show the gradual rise-then-plateau curve typical of batch cell-free expression, but instead reached steady state within the first few simulated hours. This is likely because the characterized parts in the Kernel library are parameterized for in vivo bacterial expression, where production is balanced by degradation and dilution, rather than for cell-free in vitro conditions, where depletion of energy substrates eventually halts expression. To account for this, the simulation results were interpreted as indicators of relative expression level and peak timing rather than absolute concentrations, with the understanding that wet lab data will be needed to establish actual fluorescence units. A key limitation of computational validation alone is that it cannot capture the effects of the IL-6 and hypoxia regulatory elements on circuit output, since these elements respond to biochemical inputs that the current Kernel simulator does not model. Alternative validation strategies that would strengthen this section include running a pilot liquid-phase cell-free reaction with the sfGFP construct before the full MIMO circuit is ready, or PCR-amplifying the construct with the annotated primer sites to confirm primer binding and template integrity prior to Twist synthesis submission.
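
The contrast described above, an in vivo-style steady state versus resource-limited batch expression, can be illustrated with a minimal two-stage transcription/translation model integrated by forward Euler at the same 0.1 h step and 20 h duration used in the Kernel run. This is not the Asimov Kernel's model: every rate constant is hypothetical, and the single resource pool `e` is a crude stand-in for energy-substrate and nucleotide depletion in the NMP-Ribose-Glucose mix.

```python
def simulate(depletion: bool, hours: float = 20.0, dt: float = 0.1,
             k_tx: float = 1.0, d_m: float = 0.5, k_tl: float = 1.0,
             d_p: float = 0.3, burn: float = 0.2):
    """Forward-Euler two-stage TX/TL model; returns (times, mrna, protein).

    depletion=False: resources fixed (in vivo-like); protein lost by dilution d_p.
    depletion=True: batch cell-free-like; a shared resource pool is consumed by
    transcription and translation, and protein is assumed stable (no dilution).
    All parameters are hypothetical and in arbitrary units.
    """
    steps = int(round(hours / dt))
    m, p, e = 0.0, 0.0, 1.0          # mRNA, protein, resource pool (fraction remaining)
    loss_p = 0.0 if depletion else d_p
    times, mrna, protein = [0.0], [m], [p]
    for i in range(1, steps + 1):
        tx = k_tx * e                # transcription rate, scaled by remaining resources
        tl = k_tl * m * e            # translation rate, scaled by mRNA and resources
        m_next = m + dt * (tx - d_m * m)
        p_next = p + dt * (tl - loss_p * p)
        if depletion:
            e = max(0.0, e - dt * burn * (tx + tl))
        m, p = m_next, p_next
        times.append(round(i * dt, 1))
        mrna.append(m)
        protein.append(p)
    return times, mrna, protein


if __name__ == "__main__":
    for mode in (False, True):
        t, m, p = simulate(depletion=mode)
        print(f"depletion={mode}: final mRNA {m[-1]:.3f}, peak mRNA {max(m):.3f}, final protein {p[-1]:.3f}")
```

With `depletion=False`, mRNA and protein settle at levels set by the production/loss balance, resembling the Kernel trace; with `depletion=True`, mRNA peaks and then decays as `e` runs out, and protein levels off once translation stops, which is the shape this toy model predicts for a batch cell-free reaction.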


Section 6: Additional Information

References

  • Sánchez-Costa, M., Urigoitia, A., Comino, N., Arnaiz, B., Khatami, N., Ruiz-Hernandez, R., Diamanti, E., Abarrategi, A., & López-Gallego, F. (2024). In-Hydrogel Cell-Free Protein Expression System as Biocompatible and Implantable Biomaterial. Biological and Medical Applications of Materials and Interfaces, March 20, 2024.
  • Pardee, K., et al. (2016). Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell, 165(5), 1255–1266.
  • WHO. (2023). Diabetes fact sheet. World Health Organization.
  • WHO. (2022). Primary health care and essential diagnostics. World Health Organization.
  • The Lancet. (2020). Sepsis mortality and early diagnosis statistics.
  • Jewett, M. C., & Swartz, J. R. (2004). Mimicking the Escherichia coli cytoplasmic environment activates long-lived and efficient cell-free protein synthesis. Biotechnology and Bioengineering, 86(1), 19–26.
  • Norholm, M. H. H. (2010). A mutant Pfu DNA polymerase designed for advanced uracil-excision DNA engineering. BMC Biotechnology.

Supply List and Budget

DNA and Molecular Biology

  • Twist Bioscience gene synthesis for MIMO circuit construct (~800 bp): ~$80 to $120
  • pUC19 plasmid backbone (Addgene): ~$75 (institutional access)
  • DH5alpha competent cells (NEB): ~$120 per 96-reaction kit
  • LB broth and LB agar plates with ampicillin: ~$30
  • Miniprep kit (Qiagen or equivalent): ~$80 for 50 preps
  • Sanger sequencing (Genewiz or equivalent, 4 to 6 reactions): ~$30 to $50

Cell-Free System Components

  • BL21 Star (DE3) E. coli cell-free lysate (prepared in-house or purchased): ~$100 to $200 per mL
  • NMP master mix components (AMP, CMP, GMP, UMP, ribose, glucose, amino acids, HEPES, potassium glutamate, magnesium glutamate, potassium phosphate, nicotinamide): ~$200 to $400 for reagent-grade chemicals
  • Recombinant human IL-6 protein (R&D Systems): ~$200 for 10 µg
  • Cobalt chloride (CoCl2, hypoxia mimic, Sigma-Aldrich): ~$30 per 5 g
  • Nuclease-free water (Thermo Fisher): ~$25 per 500 mL

Hydrogel Materials

  • Low-melting-point agarose (NEB or Sigma): ~$80 per 25 g
  • 6-well and 96-well plate format for casting: ~$30 to $50

Equipment (assumed institutional access)

  • Fluorescence plate reader (excitation 485 nm / emission 510 nm)
  • Lyophilizer / freeze-dryer for freeze-drying stability test
  • Incubator set to 30°C
  • PCR thermocycler
  • Gel electrophoresis system

Computational Tools (free or institutional access)

  • Benchling (free academic account)
  • Asimov Kernel (free academic access)
  • NCBI BLAST and ORF Finder (free)
  • UniProt (free)

Estimated Total Wet Lab Budget: approximately $1,080 to $1,460 (sum of the itemized ranges above, assuming 1 mL of lysate)
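
The headline estimate can be cross-checked by summing the itemized low and high prices directly. A minimal sketch, assuming exactly 1 mL of lysate is purchased, single-price items count once at that price, and equipment and computational tools add no cost (they are listed as institutional or free access). Under those assumptions the items sum to $1,080 to $1,460.

```python
# Itemized (low, high) estimates in USD from the supply list.
# Assumption: 1 mL of lysate; single-price items are entered as (x, x).
ITEMS = {
    "Twist gene synthesis": (80, 120),
    "pUC19 backbone": (75, 75),
    "DH5alpha competent cells": (120, 120),
    "LB broth and ampicillin plates": (30, 30),
    "Miniprep kit": (80, 80),
    "Sanger sequencing": (30, 50),
    "BL21 Star (DE3) lysate, 1 mL": (100, 200),
    "NMP master mix components": (200, 400),
    "Recombinant human IL-6": (200, 200),
    "CoCl2": (30, 30),
    "Nuclease-free water": (25, 25),
    "Low-melting-point agarose": (80, 80),
    "Casting plates": (30, 50),
}

low = sum(lo for lo, _ in ITEMS.values())
high = sum(hi for _, hi in ITEMS.values())
print(f"${low:,} to ${high:,}")  # $1,080 to $1,460
```

Keeping the list in one place like this means the total updates automatically if, for example, the lysate is prepared in-house instead of purchased.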

The cell-free genetic circuit I plan to build for the final project is designed to detect biological signals and produce a measurable output. The inputs will be two environmental signals, elevated IL-6 and low O₂, and the outputs will be a green fluorescence signal and a therapeutic peptide.

Group Final Project
