Hotaku Komatsu — HTGAA 2026

🌱

Hotaku Komatsu

Welcome to my How To Grow Almost Anything (HTGAA) Spring 2026 workspace. This is where I document my journey through synthetic biology.


About me

I am a student in the HTGAA 2026 course, exploring the intersection of technology and biology.

Info

Contact info


Homework

  • Week 1: Principles and Practices Class Assignment 1. Application or Tool to Develop & Why Application: “The Bio-Puzzle” – A hardware-agnostic DNA assembly toolkit using split-protein reporters (e.g., Split-GFP) as physical “checksums” for long-sequence construction.
  • Week 3: Lab Automation Lab Automation assignment This week, we explored lab automation and its applications in synthetic biology.
  • Week 4: Protein Design I This week, we explored the fundamentals of protein structure, visualization, and the latest ML-based tools for design and analysis.
  • Week 5: Protein Design II This week focuses on designing and evaluating therapeutic peptides for SOD1 mutant A4V, a key player in familial Amyotrophic Lateral Sclerosis (ALS).
  • Week 6: DNA Assembly This week explores the experimental design and in-silico simulation of constructing genetic circuits. The first section details standard molecular cloning techniques like PCR and Assembly, whilst the second section documents my modeling results in Asimov Kernel.
  • Week 7: IANNs & Fungal Materials This week covers two advanced synthetic biology paradigms: Intracellular Artificial Neural Networks (IANNs) for complex logical decision making, and the engineering of macro-scale Fungal Materials.

Labs

Projects


Homework

Class Assignments:

  • Week 1 HW: Principles and Practices

    Week 1: Principles and Practices Class Assignment 1. Application or Tool to Develop & Why Application: “The Bio-Puzzle” – A hardware-agnostic DNA assembly toolkit using split-protein reporters (e.g., Split-GFP) as physical “checksums” for long-sequence construction.

  • Week 2 HW: DNA Read, Write, and Edit

    Week 2: DNA Read, Write, and Edit Part 1 & 2: Gel Art Describe your process of creating Gel Art using Benchling and restriction digests.

  • Week 3 HW: Lab Automation

    Week 3: Lab Automation Lab Automation assignment This week, we explored lab automation and its applications in synthetic biology.

  • Week 4 HW: Protein Design I

    Week 4: Protein Design I This week, we explored the fundamentals of protein structure, visualization, and the latest ML-based tools for design and analysis.

  • Week 5 HW: Protein Design II

    Week 5: Protein Design II This week focuses on designing and evaluating therapeutic peptides for SOD1 mutant A4V, a key player in familial Amyotrophic Lateral Sclerosis (ALS).

  • Week 6 HW: DNA Assembly

    Week 6: DNA Assembly This week explores the experimental design and in-silico simulation of constructing genetic circuits. The first section details standard molecular cloning techniques like PCR and Assembly, whilst the second section documents my modeling results in Asimov Kernel.

  • Week 7 HW: IANNs & Fungal Materials

    Week 7: IANNs & Fungal Materials This week covers two advanced synthetic biology paradigms: Intracellular Artificial Neural Networks (IANNs) for complex logical decision making, and the engineering of macro-scale Fungal Materials.


Week 1 HW: Principles and Practices


Week 1: Principles and Practices

Class Assignment

1. Application or Tool to Develop & Why

Application: “The Bio-Puzzle” – A hardware-agnostic DNA assembly toolkit using split-protein reporters (e.g., Split-GFP) as physical “checksums” for long-sequence construction.

Why: For students and DIY biologists, the most immediate “biosecurity” threat is unintentional human error. Assembling long DNA (2,000+ bp) from short, affordable oligo pools is difficult and error-prone, and verifying these assemblies currently requires slow, expensive sequencing. The “Bio-Puzzle” transforms biosecurity from a restrictive policy into a helpful engineering tool. By engineering DNA fragments to produce a visual signal only when they are assembled in the correct order, it provides a real-time, “at-the-bench” verification system.

2. Governance/Policy Goals

Goal: To foster a culture of “Integrity by Design” in decentralized research environments.

  • Sub-goal A: Minimize human error in DNA synthesis by providing immediate physical feedback.
  • Sub-goal B: Establish a community norm where long-sequence assembly is inherently linked to transparency and verification.

3. Potential Governance Actions

  • Option 1: The Split-Reporter Standard (Technical): Standardize “Verification Tags” at DNA junctions.
  • Option 2: Orthogonal Overhang Library (Policy): Create an open-source library of “puzzle teeth” (orthogonal overhangs) that connect fragments only in the intended order.
  • Option 3: The Bio-Cookbook Ledger (Social): A peer-verified platform for sharing assembly outcomes.

4. Scoring Governance Actions

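| Does the option:                                | Option 1 | Option 2 | Option 3 |
|-------------------------------------------------|----------|----------|----------|
| Enhance biosecurity by preventing incidents     | 1        | 1        | 2        |
| Enhance biosecurity by helping respond          | 3        | 2        | 1        |
| Foster lab safety by preventing incidents       | 1        | 1        | 2        |
| Protect the environment by preventing incidents | 1        | 1        | 2        |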

(1 = Best / 3 = Worst)

5. Prioritization, Trade-offs, and Ethics

Prioritization: I prioritize Option 1 (Split-Reporter) because it addresses the core technical challenge.

Ethical Reflection: This project shifts the focus from “top-down censorship” to “bottom-up empowerment.”


Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson

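    • The error rate of polymerase is $10^{-6}$ (about one error per million bases copied).
    • The length of the human genome is $3.2$ Gbp, so an uncorrected copy would carry about $3.2 \times 10^{9} \times 10^{-6} = 3{,}200$ errors.
    • Biology uses proofreading and mismatch repair to resolve this discrepancy.
    • The average human protein is encoded by about 1,036 bp, or roughly 345 codons.
    • Total possible DNA sequences: $3^{345}$ (assuming about 3 synonymous codons per amino acid).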
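The arithmetic behind these answers can be checked in a few lines of Python. This sketch uses the numbers stated above. The `1036 // 3` codon count and the flat 3 codons per amino acid are the same simplifications used in the list; the true average degeneracy is 61/20 ≈ 3.05.

```python
import math

ERROR_RATE = 1e-6   # polymerase errors per base, from the lecture
GENOME_BP = 3.2e9   # human genome length in bp
CDS_BP = 1036       # average human protein-coding sequence length in bp

# Expected uncorrected errors each time the whole genome is copied
errors_per_copy = ERROR_RATE * GENOME_BP

# Number of codons, and encodings assuming ~3 synonymous codons per amino acid
codons = CDS_BP // 3
encodings = 3 ** codons

print(f"Uncorrected errors per genome copy: {errors_per_copy:,.0f}")
print(f"Encodings: 3^{codons} = about 10^{math.log10(encodings):.1f}")
```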

Homework Questions from Dr. LeProust

  1. Phosphoramidite method: Deprotection → Coupling → Capping → Oxidation
  2. Limit: Once it exceeds 200 nt, it becomes difficult to separate and purify.
  3. Yield: Decreases exponentially with length, because every coupling step is slightly less than 100% efficient.
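The exponential decline follows from the fact that each coupling step succeeds with some probability below 1. A strand of n nucleotides needs n − 1 successful couplings. The sketch below assumes a 99% coupling efficiency for illustration; actual values vary by chemistry and vendor.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of strands that reach full length after (length_nt - 1) coupling steps."""
    return coupling_efficiency ** (length_nt - 1)

for n in (20, 60, 100, 200, 300):
    print(f"{n:>4} nt: {full_length_fraction(n):6.1%} full-length")
```

Under this assumption, a 200-mer comes out at roughly 13.5% full-length product. This is consistent with the ~200 nt practical limit above.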

Homework Question from George Church

  • Essential amino acids: Lysine, Histidine, Isoleucine, Leucine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, Arginine.
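  • Lysine constraint: Lysine can be obtained externally (animals routinely get it from their diet, since it is essential for them), so the constraint isn’t absolute. The engineered dependence could also be lost through reversion mutations.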

Week 2 HW: DNA Read, Write, and Edit


Week 2: DNA Read, Write, and Edit

Part 1 & 2: Gel Art

Describe your process of creating Gel Art using Benchling and restriction digests.

1.1 In-silico Design (Benchling)

  • Enzymes used: HindIII, SacI, KpnI

  • Design Concept: For this project, I designed a “Gel Art” pattern inspired by a European circular arch bridge. I wanted to use the different migration speeds of DNA fragments to recreate the structural elegance of an ancient stone bridge.

  • Inspiration: [Image: Stone Bridge]

  • Simulated Gel Pattern:

[Image: Gel Art Bridge]
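The simulated lanes come from predicting where each enzyme cuts. Below is a minimal sketch of that step for a linear template. It finds each recognition site and returns the fragment lengths, which determine band positions on the gel. The template is a made-up toy sequence, not the DNA used in the actual design. All three sites are palindromic, so searching one strand is enough.

```python
# Recognition site and top-strand cut position (bases into the site) for each enzyme
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "KpnI": ("GGTACC", 5),     # GGTAC^C
}

def digest(seq, enzyme_names):
    """Fragment lengths (bp), in order along a linear template, after cutting with the enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        pos = seq.find(site)
        while pos != -1:
            cuts.add(pos + offset)
            pos = seq.find(site, pos + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

toy = "AAAA" + "AAGCTT" + "C" * 10 + "GGTACC" + "TTTT"  # hypothetical 30 bp template
print(sorted(digest(toy, ["HindIII", "KpnI"]), reverse=True))  # [20, 5, 5]
```

Sorting largest-first mirrors how bands appear from the top of a lane: larger fragments migrate more slowly.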

1.2 Lab Execution

  • Protocol followed: “Gel Art: Restriction Digests and Gel Electrophoresis”
  • Results: (Awaiting wet-lab execution results)

Part 3: DNA Design Challenge

3.1 Protein Choice

  • Chosen Protein: LuxR (Aliivibrio fischeri)
  • Why: I chose LuxR because it is a fundamental component for building genetic circuits. The LuxR/LuxI system is one of the most well-characterized quorum sensing systems. It is used to create sophisticated biological logic gates and population-controlled behaviors in synthetic biology.
  • Mechanism (LuxR/LuxI Quorum Sensing):

[Image: LuxR Quorum Sensing] Image source: MDPI - International Journal of Molecular Sciences, 2020

  • Protein Sequence:
    >sp|P12746|LUXR_ALIF1 Transcriptional activator protein luxR OS=Aliivibrio fischeri (strain ATCC 7744 / MJ11) OX=312301 GN=luxR PE=1 SV=1
    MKNNIKNYAFLLFLFIIFINPKNNSAKLDKIKAYNTIVEKVEGNEFDLALFAYIHLALLL
    NKINNKLLIKGDKISLVGFPCVDNGLCSTGIIFSHVNDLVVNDYIFNIDNKENESIKLID
    LFEKSVEEVKAIYNYYKKINEKNYLILDSKISFYKLHDSYKKLYKLSLNIIPLSFEKKEL
    CILKKLIHETLSKFKIEKSYVNLDKLIDKNIQLIKIEQNDFNDSIYSYKKLISIILLPLT
    YFE

3.2 Reverse Translation

  • Reverse Translation Process: Starting from the LuxR protein sequence, I used the Gene Corner Reverse Translate Tool to obtain a DNA sequence encoding it. The genetic code is degenerate (most amino acids have several codons), so reverse translation cannot recover the exact native luxR gene. Instead, it yields one of many possible coding sequences, which provides a baseline for synthesis.
  • Reverse-translated DNA Sequence (luxR):
    atgaaaaacaatattaaaaattatgcgtttcttttgttatttttcatcatatttattaat
    ccgaaaaataatagcgcaaaattagataaaatcaaagcgtacaatacaattgtagagaaa
    gtagaaggtaatgaatttgatttggcgctatttgcatatattcatttggccttactttta
    aataaaatcaataataagttatttattaaaggtgataaaatcagtttagttggtttcccg
    tgtgtagataacggattatgttcaactggaattattttttctcatgttaatgatttagtt
    gttaatgattatattttttacattgataataaagaaaatgaatcaattaaattgattgat
    ttatttgaaaagagtgtagaagaggtaaaagcgatttataattattataaaaaaattaat
    gagaaaaattatctaattttagattcaaaaatcagtttttataaattacatgatagttat
    aaaaaattatatatattgagtttaaatattatccctttaagttttgaaaaaaaagaactt
    tgtattttaaaaaaactaattcatgagacattaagtaaattcaaaattgagaagagttat
    gttaatttagataaattaattgataaaaatattcaattaattaaaattgagcaaaatgat
    tttaatgattcgatttatagttacaaaaaattaattagtattattctattaccactaact
    tattttgaataa

3.3 Codon Optimization

  • Optimized for Organism: Escherichia coli (K-12)

  • Optimization Tool: I utilized the IDT Codon Optimization Tool to adapt the sequence for high expression in E. coli.

  • Why Optimize? Codon optimization is crucial because different organisms prefer different “synonymous” codons to represent the same amino acid. This is known as codon usage bias. In E. coli, rare codons (like those found in A. fischeri) can lead to:

    1. Low Expression: Ribosomes stalling at rare codons, reducing protein yield.
    2. Truncated Proteins: Ribosomes falling off the mRNA before finishing.
    3. Misfolding: Translation speed affects how the protein folds as it emerges from the ribosome.
    Additionally, optimization helps remove internal restriction sites (like BsaI, BbsI) and strong secondary structures that might interfere with DNA synthesis or translation.
  • Codon-Optimized DNA Sequence for E. coli:

    ATG AAG AAC AAT ATT AAA AAC TAC GCA TTT CTG CTG CTG TTT TTT ATC ATT
    TTC ATC AAC CCG AAA AAT AAC TCA GCC AAG CTG GAT AAA ATT AAA GCG TAT
    AAT ACA ATT GTC GAA AAA GTG GAG GGC AAC GAA TTT GAT TTG GCG CTT TTT
    GCC TAC ATC CAC TTG GCG CTG TTG CTG AAT AAA ATT AAT AAT AAA TTG TTT
    ATT AAA GGC GAC AAG ATT TCG CTG GTC GGT TTC CCG TGC GTG GAC AAC GGC
    CTG TGC TCA ACT GGT ATT ATC TTT TCA CAT GTC AAT GAT CTT GTA GTG AAT
    GAT TAT ATC TTT TAT ATT GAC AAT AAA GAA AAT GAG AGT ATT AAG CTG ATT
    GAC CTT TTC GAG AAG TCC GTA GAG GAA GTG AAG GCC ATT TAT AAT TAT TAC
    AAG AAA ATC AAC GAA AAG AAT TAT CTG ATT TTG GAC TCA AAA ATC TCG TTC
    TAT AAA TTA CAC GAT TCT TAT AAA AAG CTT TAC ATC CTG TCG CTG AAC ATC
    ATC CCG TTG TCT TTT GAA AAA AAG GAA CTG TGT ATT CTG AAA AAA CTG ATC
    CAC GAA ACC CTG AGT AAA TTT AAA ATT GAG AAG TCT TAC GTT AAC CTG GAT
    AAA TTA ATT GAT AAA AAC ATT CAG TTG ATC AAG ATT GAA CAA AAC GAT TTC
    AAC GAT AGC ATT TAT TCG TAC AAA AAG TTA ATT AGC ATC ATT CTG CTG CCC
    TTG ACA TAT TTT GAA TAA
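Both sequences should encode the same protein, so translating them is a quick sanity check on reverse translation and codon optimization. Below is a minimal sketch using the standard genetic code (NCBI translation table 1), applied to the first 20 codons of each sequence above.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons in TCAG order, '*' = stop
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate a DNA coding sequence, ignoring whitespace and letter case."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# First 20 codons of the reverse-translated and codon-optimized sequences above
wild_type = "atgaaaaacaatattaaaaattatgcgtttcttttgttatttttcatcatatttattaat"
optimized = ("ATG AAG AAC AAT ATT AAA AAC TAC GCA TTT CTG CTG CTG TTT TTT ATC ATT"
             " TTC ATC AAC")
assert translate(wild_type) == translate(optimized)
print(translate(optimized))  # MKNNIKNYAFLLLFFIIFIN
```

Running the same `translate()` call on the complete sequences and comparing the result with the protein in 3.1 would catch any codon that changed an amino acid.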

3.4 Production Technology

  • Process (Cell-Free Protein Synthesis - CFPS): I am particularly interested in producing this protein using cell-free methods, such as the PURE (Protein synthesis Using Recombinant Elements) system or TX-TL (Transcription-Translation) cell extracts (e.g., from E. coli).

  • How it works: In a cell-free system, instead of transforming the DNA into a living cell, we simply mix the DNA template with a cocktail of “biological machinery” in a test tube. This cocktail includes:

    1. RNA Polymerase: To transcribe the DNA into mRNA. In the PURE system this is T7 RNA polymerase, so the template needs a T7 promoter.
    2. Ribosomes: To translate the mRNA into a protein.
    3. tRNAs & Amino Acids: The building blocks and adapters for protein assembly.
    4. Energy Sources: ATP and regeneration systems to fuel the process.
  • Why Cell-Free?

    • Speed: Protein can be produced in hours rather than days, as there is no need for cell growth or transformation.
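    • Direct Prototyping: We can use linear DNA directly without cloning it into a plasmid. One example is a PCR product of the expression cassette in Part 4, which adds the promoter and RBS that the codon-optimized coding sequence alone lacks.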
    • Safety & Control: Since there are no living cells, it is easier to study proteins that might be toxic to a host cell, and we have precise control over the reaction environment.
  • Dual-Method Compatibility: Ultimately, this design is highly versatile: it is possible to use both cell-free (in vitro) and cell-dependent (in vivo) methods. While cell-free systems offer rapid prototyping and a controlled environment, the same optimized sequence can be cloned into a plasmid and transformed into E. coli for stable, large-scale production. This flexibility allows us to choose the most appropriate method depending on the experimental goals.

3.5 [Optional] Central Dogma in Nature

  • Transcription and Translation: In nature, one stretch of DNA can sometimes give rise to multiple proteins. Mechanisms include alternative splicing (in eukaryotes), overlapping open reading frames (ORFs) (common in viruses), and polycistronic operons (in bacteria). The lux system is an example of an operon: the lux genes for the bioluminescence machinery are co-transcribed from a single promoter, which allows their coordinated expression.

Part 4: Twist DNA Synthesis Order

4.1 DNA Insert Construction

  • Name: LuxR_Cassette_v1

  • Full Insert Sequence (FASTA):

    >LuxR_Cassette_v1
    TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATG
    ATGAAGAACAATATTAAAAACTACGCATTTCTGCTGCTGTTTTTTATCATTTTCATCAAC
    CCGAAAAATAACTCAGCCAAGCTGGATAAAATTAAAGCGTATAATACAATTGTCGAAAAA
    GTGGAGGGCAACGAATTTGATTTGGCGCTTTTTGCCTACATCCACTTGGCGCTGTTGCTG
    AATAAAATTAATAATAAATTGTTTATTAAAGGCGACAAGATTTCGCTGGTCGGTTTCCCG
    TGCGTGGACAACGGCCTGTGCTCAACTGGTATTATCTTTTCACATGTCAATGATCTTGTA
    GTGAATGATTATATCTTTTATATTGACAATAAAGAAAATGAGAGTATTAAGCTGATTGAC
    CTTTTCGAGAAGTCCGTAGAGGAAGTGAAGGCCATTTATAATTATTACAAGAAAATCAAC
    GAAAAGAATTATCTGATTTTGGACTCAAAAATCTCGTTCTATAAATTACACGATTCTTAT
    AAAAAGCTTTACATCCTGTCGCTGAACATCATCCCGTTGTCTTTTGAAAAAAAGGAACTG
    TGTATTCTGAAAAAACTGATCCACGAAACCCTGAGTAAATTTAAAATTGAGAAGTCTTAC
    GTTAACCTGGATAAATTAATTGATAAAAACATTCAGTTGATCAAGATTGAACAAAACGAT
    TTCAACGATAGCATTTATTCGTACAAAAAGTTAATTAGCATCATTCTGCTGCCCTTGACA
    TATTTTGAACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTC
    AGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGA
    GTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
  • Components Breakdown:

    • Promoter (J23106): TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC (Constitutive promoter)
    • RBS (B0034): CATTAAAGAGGAGAAAGGTACC (Strong RBS with spacers)
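    • Start Codon: ATG (the optimized luxR coding sequence also begins with ATG, so the cassette reads ATG ATG and encodes one extra N-terminal Met)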
    • Coding Sequence: Optimized luxR (Stop codon removed for C-terminal tagging)
    • 7x His Tag: CATCACCATCACCATCATCAC (For protein purification)
    • Stop Codon: TAA
    • Terminator (B0015): CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
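Section 3.3 notes that internal BsaI and BbsI sites should be removed, and the assembled cassette can be scanned for them before ordering. Unlike HindIII or KpnI, these Type IIS sites are not palindromic, so both strands must be checked. Below is a minimal sketch; the site list covers only the two enzymes named in 3.3.

```python
# Type IIS recognition sites, written 5'->3' on the top strand
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC"}

def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq, sites=TYPE_IIS_SITES):
    """Return (enzyme, strand, position) for every site found on either strand."""
    seq = "".join(seq.split()).upper()
    hits = []
    for name, motif in sites.items():
        # Non-palindromic sites: search the motif and its reverse complement
        for strand, pattern in (("+", motif), ("-", revcomp(motif))):
            pos = seq.find(pattern)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = seq.find(pattern, pos + 1)
    return sorted(hits, key=lambda hit: hit[2])
```

If you pass the full LuxR_Cassette_v1 sequence to `find_sites`, it returns an empty list when no such sites remain.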

4.2 Vector Selection

  • Cloning Vector: pTwist Amp High Copy

  • Design Preview (Benchling Linear Map): [Image: LuxR Benchling]

  • Strategy: This vector was chosen for its high copy number in E. coli, ensuring high yields of the plasmid for downstream applications. The cassette is flanked by cloning sites for easy extraction if needed. You can view the full annotated sequence on Benchling here.

  • Final Plasmid Map: [Image: LuxR Plasmid Map] Visual representation of the LuxR_Expression_Cassette inserted into the pTwist Amp High Copy vector.


Part 5: DNA Read/Write/Edit

5.1 DNA Read (Sequencing)

  • (i) Target: I want to sequence environmental DNA and synthesized constructs in real-time at the “bench” or in the field. This is crucial for early detection of unintended mutations or environmental contamination.
  • (ii) Technology: Oxford Nanopore Technologies (ONT)
    • Generation: 3rd Generation (Single-molecule sequencing).
    • Input & Preparation: Long DNA molecules (e.g., genomic DNA or plasmids). Minimal preparation is key: with ONT’s Rapid Sequencing Kits, a transposase fragments the DNA and attaches sequencing adapters in about 10 minutes.
    • Essential Steps: DNA molecules pass through a protein nanopore embedded in a membrane. As they pass, they disrupt the electrical current.
    • Base Calling: The changes in current are decoded using neural networks (base-callers like Guppy or Dorado) into a sequence of A, T, C, and G.
    • Output: FASTQ files of “long reads,” allowing me to see entire genetic circuits in a single piece.
    • Why ONT? It is highly portable (MinION) and provides data in real-time, which is essential for the “Biosecurity by Design” concept below.
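The FASTQ output mentioned above stores each read as four lines: an @-prefixed header, the bases, a + separator, and one quality character per base (Phred score + 33). Below is a minimal sketch that reads such records and reports read length and mean quality. The two example reads are made up, and averaging Phred scores directly is a simplification.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records (Phred+33 qualities)."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        read_id = header[1:].split()[0]  # drop '@' and any metadata after the ID
        yield read_id, seq, [ord(ch) - 33 for ch in qual]

def summarize(text):
    """Per-read (id, length, mean Phred quality), averaging per-base scores directly."""
    return [(rid, len(seq), sum(q) / len(q)) for rid, seq, q in parse_fastq(text)]

example = "@read1 runid=abc\nACGT\n+\nIIII\n@read2\nAC\n+\n!5\n"  # hypothetical reads
for rid, length, mean_q in summarize(example):
    print(f"{rid}\tlength={length}\tmean_Q={mean_q:.1f}")
```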

5.2 DNA Write (Synthesis)

  • (i) Target: Synthetic genetic circuits based on quorum sensing regulators like LuxR (as designed in Part 3 and 4). These would be distributed as part of “safe-to-use” biological toolkits for decentralized research.
  • (ii) Technology: Phosphoramidite Synthesis (Modern Array-based)
    • Essential Steps: Repeated cycles of deprotection, coupling, capping, and oxidation add one phosphoramidite nucleotide (A, T, C, or G) at a time to oligos growing on a solid substrate (for Twist Bioscience, a silicon chip).
    • Limitations: While accurate and scalable for fragments, constructing multi-kb circuits still requires hierarchical assembly (Gibson or Golden Gate). Errors increase with length, requiring the biosecurity measures described in Part 5.3.

5.3 DNA Edit (Biosecurity Kill Switch)

  • (i) Target: I want to edit the Host Genome (E. coli or cell-free chassis) to include an autonomously triggered “Biosecurity Kill Switch.”
  • (ii) Technology: CRISPR-Cas9 System
    • The Concept: If the ONT sequencer (Read) detects a specific error, unintended mutation, or the presence of a hazardous sequence, it triggers the expression of a specialized CRISPR-Cas system.

    • Mechanism: I would design a system in which a guide RNA (gRNA) targets essential genomic sequences or the synthetic circuit itself. Once triggered, Cas9 creates double-strand breaks in the genome, effectively “self-destructing” the cell or the DNA pool to prevent the spread of a dangerous or dysfunctional agent. [Image: CRISPR Cas9 Mechanism]

    • Preparation & Input: Requires a plasmid carrying the Cas9 gene under a conditional promoter and specialized gRNAs designed for high precision.

    • Limitations: The primary limitation is “escaper” mutants—cells that survive by mutating the CRISPR target site. To mitigate this, multiple essential sites must be targeted simultaneously (multiplexing).
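Choosing gRNAs, including the multiple sites needed for multiplexing, starts with finding candidate protospacers. For SpCas9, a target is a 20-nt sequence immediately followed by an NGG PAM. The sketch below scans the forward strand only. A full design tool would also scan the reverse complement and score off-target matches. The test inputs are placeholders, not essential genes.

```python
import re

def find_spcas9_targets(seq, spacer_len=20):
    """Forward-strand protospacers of spacer_len nt that are immediately followed by an NGG PAM."""
    seq = "".join(seq.split()).upper()
    # Zero-width lookahead lets overlapping candidate sites all be reported
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    return [(m.group(1), m.start()) for m in pattern.finditer(seq)]
```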


Week 3 HW: Lab Automation


Week 3: Lab Automation

Lab Automation assignment

This week, we explored lab automation and its applications in synthetic biology.

1. Assignment: Python Script for Opentrons Artwork

I designed a “Pac-Man” themed piece of Bio-Art for the Opentrons OT-2 robot. The design features Pac-Man and several ghosts, mapped onto a circular canvas representing the output of the lab’s liquid handling.

Google Colab Link: HTGAA Week 3 - Pac-Man Opentrons Art

Art Preview: [Image: Pac-Man Opentrons Art]

Python Script Logic: The script iterates through a matrix of coordinates, assigning specific colored liquids (Yellow for Pac-Man, Red/Blue/Cyan/Orange for ghosts) to designated wells. I used the Opentrons Python API (v2) to handle the aspirate and dispense operations with the P300 single-channel pipette.
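The matrix-to-coordinates step described above can be sketched in plain Python. This is a hypothetical illustration, not the script in the Colab; the 7×7 sprite, the 2 mm pixel spacing, and the 40 mm canvas radius are all assumptions. Each colored pixel becomes an (x, y) offset in millimetres from the canvas center. Offsets are grouped by color so that each color can be dispensed in one pass.

```python
from collections import defaultdict

# Hypothetical sprite: 'Y' = Pac-Man yellow, '.' = no dispense
SPRITE = [
    "..YYY..",
    ".YYYYY.",
    "YYYY...",
    "YYY....",
    "YYYY...",
    ".YYYYY.",
    "..YYY..",
]

def sprite_to_offsets(sprite, spacing_mm=2.0, radius_mm=40.0):
    """Map each colored pixel to an (x, y) offset in mm from the canvas center, grouped by color."""
    rows, cols = len(sprite), len(sprite[0])
    points = defaultdict(list)
    for r, line in enumerate(sprite):
        for c, ch in enumerate(line):
            if ch == ".":
                continue
            x = (c - (cols - 1) / 2) * spacing_mm
            y = ((rows - 1) / 2 - r) * spacing_mm  # row 0 is the top of the image
            if x * x + y * y <= radius_mm ** 2:    # keep points inside the circular canvas
                points[ch].append((x, y))
    return dict(points)
```

In an Opentrons protocol, each offset could then become a dispense location relative to the canvas center. For example, `center.move(Point(x, y, 0))` with `opentrons.types.Point`, where `center` is a hypothetical `Location` at the middle of the plate.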


2. Post-Lab Questions

1. Published Paper Review

Paper: Implementation of an open-source robotic platform for SARS-CoV-2 testing by real-time RT-PCR (Amen et al., Scientific Reports, 2021)

Summary: This paper details the implementation of a high-throughput diagnostic workflow for SARS-CoV-2 using Opentrons OT-2 liquid-handling robots. The authors developed a system that automates the transition from sample preparation to RT-PCR setup, allowing for up to 2,400 tests per day with a small footprint.

Novelty: The primary novelty lies in the use of open-source hardware to solve a critical capacity crisis during a global pandemic. Traditionally, such high-throughput testing required million-dollar proprietary systems. By utilizing the OT-2, the researchers created a “reproducible and flexible” diagnostic workstation that is affordable and can be rapidly deployed worldwide. This highlights how automation tools like Opentrons can democratize advanced molecular biology applications that were previously limited to elite institutions.

2. Automation for Final Project

For my final project, I am developing Next-Generation LuxR Biosensors with shifted specificities. I intend to use lab automation to bridge the gap between “Design” and “Test” in my DBTL cycle.

The Plan:

  • Automated Mutant Screening: I will use the Opentrons OT-2 to set up 96-well plates containing my library of LuxR variants.
  • Dynamic Range Characterization: The robot will perform automated serial dilutions of various AHL inducers (acyl-homoserine lactones) across 8 orders of magnitude (from 1 pM to 100 μM). This level of precision is difficult to achieve manually without significant error.
  • Combinatorial Analysis: To study crosstalk, I will automate the mixing of multiple different AHLs and variants in a single plate to see how the sensors behave in “noisy” chemical environments.

Automation Workflow (Inspired by Example 2):

  1. OT-2: Transfer LuxR variant expression constructs into designated wells of a 96-well plate.
  2. OT-2: Perform serial dilutions of 5 different AHL analogs and dispense into the variant-containing wells.
  3. Plate Reader: (Manual or remote integration) Measure fluorescence over a 12-hour period to generate real-time induction curves.
  4. Data Analysis: Use Python to automatically calculate the EC50 (half-maximal effective concentration) for each variant-ligand pair.
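Step 4 can be approximated without a curve-fitting library. The idea is to interpolate, in log concentration, where the response crosses halfway between its minimum and maximum. This is simpler than fitting a full Hill model, for which something like `scipy.optimize.curve_fit` would be typical. The example data below are synthetic, generated from a Hill curve with an assumed EC50 of 10 nM.

```python
import math

def hill(conc, bottom, top, ec50, n):
    """Hill dose-response curve (used here only to generate example data)."""
    return bottom + (top - bottom) / (1 + (ec50 / conc) ** n)

def estimate_ec50(concs, responses):
    """Concentration at the half-maximal response, by linear interpolation in log10(conc)."""
    pairs = sorted(zip(concs, responses))
    half = (min(responses) + max(responses)) / 2
    for (c1, r1), (c2, r2) in zip(pairs, pairs[1:]):
        if r1 != r2 and (r1 - half) * (r2 - half) <= 0:  # response crosses half-max here
            frac = (half - r1) / (r2 - r1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None  # no crossing, e.g. a flat (non-responsive) variant

# Hypothetical variant sampled at 1 pM ... 100 µM in 10-fold steps
concs = [10.0 ** k for k in range(-12, -3)]
responses = [hill(c, 0.0, 1.0, 1e-8, 1.0) for c in concs]
print(f"Estimated EC50: {estimate_ec50(concs, responses):.2e} M")
```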

Pseudocode Idea:

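# Automation protocol for variant screening (Opentrons Python API v2)
from opentrons import protocol_api

metadata = {"apiLevel": "2.13"}

def run(protocol: protocol_api.ProtocolContext):
    # Deck layout: labware load names and slots are assumptions
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
    reservoir = protocol.load_labware("nest_12_reservoir_15ml", 2)
    tips = [protocol.load_labware("opentrons_96_tiprack_300ul", slot) for slot in (3, 4)]
    p300 = protocol.load_instrument("p300_single_gen2", "right", tip_racks=tips)
    buffer, ahl_stock = reservoir["A1"], reservoir["A2"]  # AHL stock assumed at 100 µM
    variant_sources = reservoir.wells()[2:10]  # one LuxR variant per plate row (A3-A10)

    # Step 1: Serial dilution of AHL, 1:10 per column (column 1 = 100 µM ... column 9 = 1 pM)
    for row in plate.rows():
        p300.transfer(180, buffer, row[1:9])
        p300.transfer(200, ahl_stock, row[0])
        p300.transfer(20, row[:8], row[1:9], mix_after=(3, 100))

    # Step 2: Adding variants to each row's dilution wells
    for row, source in zip(plate.rows(), variant_sources):
        p300.transfer(50, source, row[:9], new_tip="always")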

3. Final Project Ideas

As part of the assignment, I have proposed the following three directions for my individual final project. These ideas explore different aspects of synthetic biology, from allosteric control to molecular transport and stimuli-responsive pores.

Idea 01

AI-Designed Allosteric Switches for Synthetic Cell Control

Designing molecular switches that sense specific small molecules (inducers) and trigger gene expression or enzymatic activity within an artificial cell.

Project Idea 01
Idea 02

Customizing Nano-Walkers for Targeted Intracellular Transport

Designing high-efficiency artificial motor proteins (Kinesin-like) that transport specific "cargo" (enzymes, DNA, or metabolites) to precise locations within a synthetic cell.

Project Idea 02
Idea 03

Stimuli-Responsive Nanopores for Selective Communication

Designing gated transmembrane proteins (nanopores) that open/close in response to light or voltage, controlling the flux of ions and molecules in milliseconds.

Project Idea 03

Week 4 HW: Protein Design I


Week 4: Protein Design I

This week, we explored the fundamentals of protein structure, visualization, and the latest ML-based tools for design and analysis.


Part A. Conceptual Questions

I’ve selected 9 questions from Shuguang Zhang’s list to answer below:

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (Average AA ~100 Daltons)
Meat is typically ~20% protein by weight, so 500 g of meat contains ~100 g of protein. Given an average amino acid mass of ~100 Daltons (100 g/mol), 100 g corresponds to about 1 mole of amino acids. Using Avogadro's number, that is about **6.022 × 10²³ molecules**.
2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Because through digestion, we break down these complex proteins into their individual building blocks—the 20 standard amino acids. Our cells then use these generic "bricks" to build human-specific proteins based on the "blueprints" stored in our own DNA.
3. Why are there only 20 natural amino acids?
Evolutionarily, these 20 provide enough chemical diversity (size, charge, hydrophobicity) to create almost any fold. Additionally, the genetic code (64 codons) is saturated by these 20, plus stop signals. It's a "frozen accident" that was optimized early in the origin of life.
4. Can you make other non-natural amino acids? Design some new amino acids.
Yes, using expanded genetic codes (like amber codon suppression). One could design a "para-azidophenylalanine" for **click chemistry** (allowing site-specific labeling) or a "coumarin-tagged alanine" for **intrinsic fluorescence** tracking within a cell.
5. Where did amino acids come from before enzymes that make them, and before life started?
They were synthesized prebiotically. Key sources include the **Miller-Urey experiment** (lightning in a reducing atmosphere), hydrothermal vents on the ocean floor, and even extraterrestrial delivery via carbonaceous chondrite meteorites (like the **Murchison meteorite**).
6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
A D-amino acid α-helix would be **left-handed**. This is the mirror image of the standard right-handed α-helix formed by natural L-amino acids.
7. Can you discover additional helices in proteins?
Yes, besides the α-helix, proteins exhibit **3₁₀ helices** (tighter, 3 residues per turn) and **π-helices** (wider, 4.4 residues per turn), though they are less common and often occur at the ends of standard helices.
8. Why are most molecular helices right-handed?
For L-amino acids, the right-handed α-helix is thermodynamically more stable because the side chains (R-groups) point away from the helix core, minimizing steric clashes. In a left-handed helix, L-amino acid side chains would clash with the backbone.
9. Why do β-sheets tend to aggregate? What is the driving force?
β-sheets aggregate through complementary hydrogen bonding between the backbones of neighboring strands. The primary driving force is the **hydrophobic effect**: the "sticky" hydrophobic side chains on the faces of the sheets pack together to exclude water. The backbone hydrogen bonds formed between strands add further enthalpic stabilization.

Part B: Protein Analysis and Visualization

I selected the LuxR Regulator (PDB ID: 7AMT) from Vibrio alginolyticus for this analysis. This protein is a master transcription factor in quorum sensing, highly relevant to my research on synthetic biosensors.

1. Protein Description

LuxR is the primary regulator for quorum sensing in Vibrio species. It acts as a DNA-binding protein that senses the bacterial population density (via autoinducers) and subsequently activates or represses hundreds of target genes. Understanding its structural interaction with DNA is critical for engineering fine-tuned genetic circuits.

2. Sequence Analysis

  • Sequence (Chain A): MKNIKLFVSSYPLNQEELKQLIASTGYHVIKATSQNLNVEQSEIEMAIGKNIKGKITKKEAEILFKQEVEAAVRAILRNAKLEVIYDSLDAVRTASLINFIFQLGDAGIARYVNSLRMLQQKRWDETAVNKAKSRWYNQTPNRAKRIITTFRTGTWDAYKNL
  • Length: 162 Amino Acids.
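  • Most Frequent Amino Acids: Lysine (K) and Alanine (A), tied at 16 residues each, followed by Isoleucine (I, 15) and Leucine (L, 14). Alanine and Leucine are strong helix formers, consistent with a predominantly α-helical protein. The abundant Lysines can contact the negatively charged DNA backbone.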
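The length and residue composition can be recomputed directly from the sequence, which is a quick way to verify a most-frequent-residue claim. This sketch uses Python's `collections.Counter` on the chain A sequence exactly as listed above.

```python
from collections import Counter

# Chain A sequence of 7AMT, as listed above
SEQ = ("MKNIKLFVSSYPLNQEELKQLIASTGYHVIKATSQNLNVEQSEIEMAIGKNIKGKITKKE"
       "AEILFKQEVEAAVRAILRNAKLEVIYDSLDAVRTASLINFIFQLGDAGIARYVNSLRMLQ"
       "QKRWDETAVNKAKSRWYNQTPNRAKRIITTFRTGTWDAYKNL")

counts = Counter(SEQ)
print("Length:", len(SEQ))
for aa, n in counts.most_common(4):
    print(f"{aa}: {n} ({n / len(SEQ):.1%})")
```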

3. Homologs and Family

  • Homologs: Searching UniProt with BLAST reveals thousands of homologs across the Vibrionaceae family, many with >90% identity.
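  • Family: It belongs to the TetR family of transcriptional regulators. Despite the shared name, this Vibrio LuxR is unrelated to the AHL-binding LuxR of Aliivibrio fischeri from Week 2.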

4. Structure Details (RCSB)

  • Structure Page: RCSB PDB 7AMT
  • Title: Structure of LuxR with DNA (activation)
  • Organism: Vibrio alginolyticus
  • Expression System: Escherichia coli
  • Deposited: 2020-10-09 | Released: 2021-03-31
  • Authors: Liu, B., Reverter, D.
  • Quality: 2.60 Å resolution. This meets the criteria for a “good quality structure” (< 2.70 Å).
  • Other Molecules: The solved structure is a complex containing the protein dimer bound to Double-Stranded DNA (the binding site).

5. 3D Visualization

I analyzed the crystal structure using the RCSB Mol* viewer to observe the modular architecture of the LuxR dimer.

[Image: LuxR Actual Structure] Figure 1: Crystal structure of LuxR dimer (Orange/Green) bound to activation DNA (Pink/Purple) from PDB 7AMT.

Visual Analysis:

  • Visualization Mode: The protein is shown in Cartoon representation, while DNA is shown in a Ribbon format to highlight the backbone topology.
  • Secondary Structure: The protein is almost entirely composed of α-helices (Spiral shapes). Observations show a significant dominance of helices over β-sheets, which is characteristic of the TetR/LuxR superfamily.
  • Residue Distribution: (Based on hydrophobicity analysis) Hydrophobic residues are primarily buried within the helical bundles to form a stable core. The exterior surfaces, especially those contacting the DNA phosphate backbone, are enriched with polar and positively charged residues (Arg, Lys).
  • Surface & Binding Pockets: The visualization clearly shows the DNA-binding cleft. The helix-turn-helix (HTH) motifs of the dimer insert into the major grooves of the DNA, while the N-terminal extension reaches into the minor groove, providing a precise “lock and key” recognition mechanism.

Part C. Using ML-Based Protein Design Tools

Using ESM models and ProteinMPNN, I analyzed the designability of the LuxR protein, starting from the sequence and structure examined in Part B.

C1. Protein Language Modeling

ESM-2

I performed an unsupervised deep mutational scan of the LuxR (7AMT) sequence using the ESM-2 language model.

[Image: LuxR ESM-2 Heatmap] Figure 2: Deep Mutational Scan Heatmap for LuxR (7AMT) predicted by ESM-2.

Analysis of Patterns:

  • Amino Acid Specificity: There is a clear trend where mutations to Proline (P) and Cysteine (C) are consistently predicted as highly unfavorable (dark purple columns). This is likely because Proline introduces structural kinks that disrupt the α-helical stability, and Cysteine can introduce unintended disulfide bonds.
  • Conserved Residues: Several vertical “dark stripes” are visible across the sequence, particularly in the DNA-binding (HTH) region. These represent highly conserved residues where any mutation is predicted to be deleterious, indicating their critical role in folding or DNA interaction.
  • Design Potential: The lighter areas (Yellow/Green) identify potential regions where the protein may be more tolerant to mutations, providing a roadmap for engineering sensors with shifted ligand or DNA specificity.
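Heatmaps like Figure 2 are commonly built from log-odds scores. For each position, the score is the log-probability the model assigns to the mutant residue minus that of the wild-type residue. A score of 0 means neutral, and negative values mean predicted deleterious. The sketch below assumes the per-position probabilities have already been extracted from ESM-2. The numbers are made up, and the notebook used here may score mutations differently.

```python
import math

def log_odds_matrix(probs, wild_type):
    """Score every substitution as log p(mutant) - log p(wild type) at each position.

    probs[i] maps amino acid -> model probability at position i (for ESM-2, the softmax
    over amino-acid tokens, computed with or without masking position i).
    """
    matrix = []
    for i, wt in enumerate(wild_type):
        log_p_wt = math.log(probs[i][wt])
        matrix.append({aa: math.log(p) - log_p_wt for aa, p in probs[i].items()})
    return matrix

# Toy two-residue example with made-up probabilities (not real ESM-2 output)
toy_probs = [
    {"A": 0.50, "L": 0.45, "P": 0.05},
    {"A": 0.20, "L": 0.70, "P": 0.10},
]
for pos, row in enumerate(log_odds_matrix(toy_probs, "AL"), start=1):
    print(pos, {aa: round(score, 2) for aa, score in row.items()})
```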

C2. Protein Folding

ESMFold

I used ESMFold to predict the 3D structure of the LuxR (7AMT) monomer directly from its amino acid sequence.

[Image: LuxR ESMFold Prediction] Figure 3: ESMFold structural prediction for the LuxR monomer, colored by rainbow (N-to-C).

Observations:

  • Structural Integrity: The predicted structure shows a high degree of organization, characterized by multiple α-helices bundled together. This matches the experimental findings from Part B (7AMT), where the protein was found to be predominantly helical.
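  • Model Confidence: The compact, well-packed fold is consistent with a confident prediction (I would expect pLDDT > 90). ESMFold writes per-residue pLDDT into the B-factor column of its output PDB, so this expectation can be checked directly rather than inferred from the fold's appearance.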
  • Comparison to PDB: While the actual crystal structure is a homodimer, ESMFold’s monomeric prediction accurately captures the fold of an individual subunit. The specific arrangement of the DNA-binding domain (HTH motif) is clearly visible, demonstrating that the language model has “learned” the structural rules governing these transcription factors.
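ESMFold stores its per-residue confidence (pLDDT) in the B-factor field of the PDB file it writes, so the model confidence can be read back from the file. Below is a minimal sketch that averages the value over C-alpha atoms, using the fixed PDB column layout (B-factor in columns 61-66). Depending on how ESMFold was run, the values may be on a 0-1 or 0-100 scale, so check the scale before comparing against a threshold such as 90.

```python
def mean_ca_plddt(pdb_text):
    """Mean pLDDT over C-alpha atoms, read from the B-factor column (PDB columns 61-66)."""
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return sum(scores) / len(scores) if scores else None

# Usage (file name is hypothetical):
# with open("luxr_esmfold.pdb") as fh:
#     print(mean_ca_plddt(fh.read()))
```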

C3. Protein Generation

ProteinMPNN

I used ProteinMPNN to perform inverse folding on the LuxR dimer backbone (7AMT), generating a novel sequence that fits the same 3D coordinates.

Figure 4: Amino acid probability map generated by ProteinMPNN for the LuxR backbone.

Sequence Generation Results:

  • Generated Sequence: EVLPREELRARILEAAFEVFAEKGLENANFSDIAERLNIPRSTVRYHFPSREELKTTVLKAVIEKMKKFFEENIDPEASLRENLLRLFRAFLEKVKAKEPWLTIYMEASKDDSPEIKPLYEKLSKEILGLVRGLFERAKERGEIPADLDPEELAKRFFELLRELYEEGKKLXXXEELEKRIEELLEKYP
  • Sequence Recovery: 30.11% (seq_recovery=0.3011). About 70% of positions differ from the native LuxR sequence, while the design remains conditioned on the same backbone.
  • Score: 1.0294 (the average negative log-likelihood of the designed residues; lower indicates higher model confidence in the design).
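Both numbers can be reproduced from their definitions: recovery is the fraction of aligned positions where the designed residue equals the native one, and the score is the mean negative log-probability of the chosen residues. The sketch below uses toy sequences and probabilities, not the LuxR run, and does not model how ProteinMPNN handles fixed or masked positions.

```python
import math


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned to the same backbone length")
    same = sum(1 for a, b in zip(native, designed) if a == b)
    return same / len(native)


def mean_nll(residue_probs: list[float]) -> float:
    """Average negative log-likelihood of the chosen residues (lower = more confident)."""
    return -sum(math.log(p) for p in residue_probs) / len(residue_probs)


# Toy example with made-up values (not the LuxR design).
print(sequence_recovery("LEELLK", "LEKLAK"))  # 4 of 6 positions match
print(round(mean_nll([0.5, 0.25, 0.5]), 3))
```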

Analysis:

  • Probabilistic Mapping: The heatmap shows high-probability “spikes” for certain residues, particularly hydrophobic residues (Leucine, Isoleucine) at buried core positions, which are essential for maintaining the helical architecture.
  • Sequence Diversity: The 30% recovery rate suggests that a large region of sequence space is compatible with this fold. In principle, the LuxR sequence could therefore be changed substantially to improve stability or other properties while keeping its overall “shape” intact, although function (ligand and DNA binding) would still need to be verified experimentally.

Part D. Group Brainstorm on Bacteriophage Engineering

Group Goal: Higher Toxicity of Lysis Protein (L Protein)

Strategy Name: Hapi: Final Version

1. Proposed Strategy

Enhancing Lytic Toxicity via Membrane-Targeting Optimization and DnaJ-Independence.

To address the goal of “Higher toxicity of lysis protein,” I propose focusing on the physicochemical interaction between the L protein and the E. coli inner membrane, and on removing the L protein’s dependence on the host DnaJ chaperone, which it normally requires for lysis.

2. Tools & Approaches

  • Molecular Dynamics (MD) Simulations: Complementing AlphaFold, I propose using MD simulations to model the L protein’s insertion into a simulated E. coli lipid bilayer. This would let us follow, at atomic resolution, how the N-terminal basic residues interact with the membrane, which is critical for lysis efficiency.
  • Charge Distribution Analysis: We will re-engineer the L protein’s basic residues to maintain membrane affinity even when the host triggers the PmrA/PmrB system, which alters membrane charge as a defense mechanism.

3. Rationale: Why these tools help?

The L protein’s “toxicity” is a race against host defenses. MD simulations can help us design a protein that inserts into the membrane more “promiscuously” and rapidly. By making the protein less sensitive to the host’s physiological state (e.g., changes in membrane charge), we aim to make its lytic activity more robust.

4. Potential Pitfalls

  • Membrane Over-toxicity: A too-efficient L protein might aggregate prematurely or be targeted by host proteases like DegP before reaching the membrane.
  • Simulation vs. Reality: MD simulations may not fully capture the crowded environment of the E. coli periplasm, leading to a discrepancy between in silico and in vivo results.

5. References

6. AI-Assistance Documentation

  • Google Gemini (2026, March 8): Prompted for perspectives on engineering lytic toxicity focusing on membrane lipid composition and host defense systems (PmrA/PmrB).
  • Google Gemini (2026, March 8): Analyzed how MD simulations complement AlphaFold for membrane insertion studies and identified protease pitfalls (DegP).
  • Google Gemini (2026, March 8): Verified academic references and URLs for MS2 L protein mechanisms.

Week 5 HW: Protein Design II


Week 5: Protein Design II

This week focuses on designing and evaluating therapeutic peptides for SOD1 mutant A4V, a key player in familial Amyotrophic Lateral Sclerosis (ALS).


Part A: SOD1 Binder Peptide Design

1. Preparation: Mutant SOD1 Sequence

I retrieved the human SOD1 sequence (P00441) and introduced the A4V mutation (Alanine to Valine at residue 4, relative to the processed chain).

Original Sequence (P00441):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Mutant Sequence (A4V):

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
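The only subtlety in this step is the numbering offset: the sequence string starts with the initiator Met, but the A4V name counts from the processed chain, so residue 4 sits at 0-based index 4 of the string. The helper below is a hypothetical sketch (the name `apply_mutation` and its `met_offset` parameter are illustrative) that checks the wild-type residue before substituting.

```python
# Wild-type sequence as listed above (includes the initiator Met).
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(seq: str, mutation: str, met_offset: int = 1) -> str:
    """Apply a substitution like 'A4V' written in processed-chain numbering.

    met_offset=1 accounts for the initiator Met that is present in the string
    but not counted in the conventional SOD1 numbering.
    """
    wt, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1 + met_offset  # 0-based index into the full sequence string
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at index {idx}, found {seq[idx]}")
    return seq[:idx] + new + seq[idx + 1:]


mutant = apply_mutation(SOD1_WT, "A4V")
print(mutant[:10])  # MATKVVCVLK
```

The wild-type check guards against off-by-one errors: calling it with `met_offset=0` would raise, because index 3 holds K rather than A.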

Part 1: Generate Binders with PepMLM

PepMLM-650M

The first step is to generate candidate binders using target-conditioned masked language modeling. I used the PepMLM-650M model to sample 12-residue peptides conditioned on the A4V mutant SOD1 sequence.

| Peptide ID | Sequence (12 AA) | Perplexity Score |
|---|---|---|
| Known Binder | FLYRWLPSRRGG | (Reference) |
| PepMLM-0 | WRSYVVAVRHKA | 13.12 |
| PepMLM-1 | WRSPVTAAALKK | 8.76 |
| PepMLM-2 | WLYGAVGARHKE | 12.66 |
| PepMLM-3 | WRYYVAVVRHKE | 26.45 |

Observations:

  • Amino Acid Substitution: The model generated an undefined amino acid “X” at the C-terminus of PepMLM-0. To enable structural prediction in AlphaFold3, I replaced it with Alanine (A); the sequence in the table already includes this substitution.
  • PepMLM-1 achieved the lowest perplexity score (8.76), meaning the model considers this sequence the most likely binder for the mutant SOD1 target. Perplexity measures model confidence, not binding affinity directly.
  • Most generated sequences show a high frequency of positively charged residues (Lysine, Arginine) or hydrophobic residues (Valine, Alanine), which may be important for interacting with the destabilized N-terminus of SOD1.
  • These candidates will now be validated structurally using AlphaFold3.
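Perplexity is the exponential of the average negative log-likelihood the model assigns to the peptide's residues, so a perplexity of 8.76 means the model was, on average, about as uncertain as a uniform choice among roughly 9 amino acids. The sketch below shows only that calculation; PepMLM's exact masking and scoring procedure is not reproduced, and the probabilities are placeholders.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """exp(mean negative log-likelihood) over the generated peptide positions."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Made-up per-residue probabilities for a 12-residue peptide (not PepMLM output).
confident = [0.5] * 12
uncertain = [0.1] * 12
print(perplexity(confident), perplexity(uncertain))  # approximately 2 and 10
```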

Part 2: Evaluate Binders with AlphaFold3

AlphaFold3 Server

I modeled the candidate peptides with the mutant SOD1 (A4V) using the AlphaFold3 Server to evaluate structural confidence and binding sites.

Comparison Result: PepMLM-0 (WRSYVVAVRHKA)

Figure 2: AlphaFold3 prediction of PepMLM-0 (Yellow/Orange).

| Metric | Value |
|---|---|
| ipTM Score | 0.39 |

Key Result: PepMLM-1 (WRSPVTAAALKK)

Figure 3: AlphaFold3 prediction of PepMLM-1 docking to SOD1 A4V (Blue).

| Metric | Value |
|---|---|
| ipTM Score | 0.56 |

Comparison Result: PepMLM-2 (WLYGAVGARHKE)

Figure 4: AlphaFold3 prediction of PepMLM-2.

| Metric | Value |
|---|---|
| ipTM Score | 0.38 |

Comparison Result: PepMLM-3 (WRYYVAVVRHKE)

Figure 5: AlphaFold3 prediction of PepMLM-3.

| Metric | Value |
|---|---|
| ipTM Score | 0.30 |

Reference: Known Binder (FLYRWLPSRRGG)

Figure 6: AlphaFold3 prediction of the known SOD1-binding peptide.

| Metric | Value |
|---|---|
| ipTM Score | 0.34 |

Analysis & Comparison:

  • PepMLM-1 vs. Known Binder: PepMLM-1 (ipTM 0.56) scores clearly higher than the known binder (ipTM 0.34) in AlphaFold3 interface confidence. This suggests that target-conditioned generation with PepMLM can yield candidates at least as plausible as previously identified sequences. Note that ipTM reflects confidence in the predicted interface rather than affinity, and all values here fall below 0.6, the range AlphaFold Server describes as a likely failed prediction, so these differences should be interpreted cautiously.
  • Correlation with Perplexity: Perplexity and ipTM rank the four PepMLM designs in nearly the same order. PepMLM-1 (perplexity 8.76) is the top design, while the other candidates (perplexity 12.66–26.45) and the known binder all achieved lower ipTM scores; only PepMLM-0 and PepMLM-2 swap places between the two rankings. With four data points, this trend is suggestive rather than conclusive.
  • Common Binding Motifs: Both the PepMLM peptides and the known binder tend to localize on the exposed surface loops or β-sheet edges of the SOD1 β-barrel. This suggests a general preference for the protein’s “sticky” solvent-exposed patches.
  • Site Localization: None of the peptides, including the known binder, deeply targeted the N-terminal A4V mutation pocket in these simulations. So while these candidates are plausible surface binders, specific “pocket-filling” designs may require the site-specific guidance of models like moPPIt.
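The perplexity/ipTM agreement can be quantified with a Spearman rank correlation on the four PepMLM designs, using the values from the tables above. This is a minimal stdlib sketch that assumes no tied values (true for this data); a negative coefficient means lower perplexity goes with higher ipTM.

```python
def ranks(values: list[float]) -> list[int]:
    """1-based ranks; assumes no ties, which holds for the values below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# PepMLM-0..3, taken from the perplexity and ipTM tables above.
perplexity = [13.12, 8.76, 12.66, 26.45]
iptm = [0.39, 0.56, 0.38, 0.30]
print(spearman(perplexity, iptm))  # about -0.8
```

A coefficient of about -0.8 on n = 4 is strong in direction but far from statistically robust, which is why the trend is described as suggestive.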

Part 3: Evaluate Properties in PeptiVerse

PeptiVerse

Beyond structural prediction, we must evaluate the pharmacological and therapeutic properties of the designed peptides. I used PeptiVerse to predict how these candidates would behave in a biological environment.

| Peptide Index | Sequence | Affinity | Solubility | Hemolysis | Net Charge | AF3 ipTM |
|---|---|---|---|---|---|---|
| Reference | FLYRWLPSRRGG | [Pending] | [Pending] | Non-hemolytic (0.047) | +2.76 | 0.34 |
| 0 (X→A) | WRSYVVAVRHKA | [Pending] | [Pending] | Non-hemolytic (0.031) | +2.85 | 0.39 |
| 1 | WRSPVTAAALKK | [Pending] | [Pending] | Non-hemolytic (0.020) | +2.76 | 0.56 |
| 2 | WLYGAVGARHKE | [Pending] | [Pending] | Non-hemolytic (0.035) | +0.85 | 0.38 |
| 3 | WRYYVAVVRHKE | [Pending] | [Pending] | Non-hemolytic (0.057) | +1.85 | 0.30 |

Observations:

  • AI-Designed vs. Known Binder: The lead AI-designed candidate, PepMLM-1, has higher structural confidence (ipTM 0.56) than the known binder (ipTM 0.34).
  • Safety Profile: PepMLM-1 also shows a lower predicted hemolysis probability (0.020) than the reference sequence (0.047). For this candidate, higher predicted binding confidence did not come at the cost of predicted safety, even though PepMLM does not explicitly optimize for hemolysis.
  • Biochemical Consistency: The best-scoring candidates (PepMLM-0 and PepMLM-1) and the known binder share a high positive net charge (+2.76 to +2.85) at physiological pH, which may facilitate the initial attraction to the target protein’s surface.
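Net charge values like these come from the Henderson–Hasselbalch equation summed over ionizable groups. The sketch below uses one commonly quoted set of approximate pKa values (an assumption; PeptiVerse and other tools use their own scales), so its numbers land near, but not exactly on, the table's +2.76 and +0.85.

```python
# Approximate pKa values (one common set; other tools use slightly different scales).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float = 7.4) -> float:
    """Henderson-Hasselbalch estimate of a linear peptide's net charge at a given pH."""
    pos_groups = ["Nterm"] + [aa for aa in seq if aa in PKA_POS]
    neg_groups = ["Cterm"] + [aa for aa in seq if aa in PKA_NEG]
    # Fraction protonated (positive) or deprotonated (negative) for each group.
    positive = sum(1 / (1 + 10 ** (ph - PKA_POS[g])) for g in pos_groups)
    negative = sum(1 / (1 + 10 ** (PKA_NEG[g] - ph)) for g in neg_groups)
    return positive - negative


for name, seq in [("PepMLM-1", "WRSPVTAAALKK"), ("PepMLM-2", "WLYGAVGARHKE")]:
    print(name, round(net_charge(seq), 2))  # roughly 2.94 and 1.05 with these pKa values
```

Histidine contributes only about +0.1 at pH 7.4 with pKa 6.5, which is why sequences such as PepMLM-2 end up with fractional charges.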

Recommendation: Based on the integrated analysis of structural confidence and predicted safety, I recommend advancing PepMLM-1 (WRSPVTAAALKK) to experimental validation. It offers the best overall in silico profile:

  1. Strongest Predicted Binding: Highest ipTM score (0.56), clearly above the known binder (0.34).
  2. Best Predicted Safety: Lowest predicted hemolysis probability (0.020) among all tested sequences.
  3. Physicochemical Favorability: Net positive charge (+2.76) at physiological pH, the same as the known SOD1-binding peptide.

Part 4: Optimized Design with moPPIt

moPPIt (MOG-DFM)

While PepMLM provides plausible binders based on sequence context, moPPIt (Multi-Objective Guided Discrete Flow Matching) allows for controlled design. I used moPPIt to steer peptide generation toward specific surface patches on SOD1 and optimize for multiple objective functions simultaneously (Affinity, Solubility, and Hemolysis).

moPPIt Generated Candidates:

| Sequence | Motif Score | Binding Metric | Solubility Score | Hemolysis Score |
|---|---|---|---|---|
| NKKSGEWFQKPG | 0.75 | 5.75 | 0.68 | 0.58 |
| KQTKIERPCCVQ | 0.75 | 6.62 | 0.67 | 0.57 |
| QACGTGVVGTTF | 0.67 | 6.88 | 0.67 | 0.63 |

Analysis: moPPIt vs. PepMLM

  • Targeted Binding: Unlike the PepMLM leads, which tended to bind general surface loops, the moPPIt-generated sequences such as NKKSGEWFQKPG show a distinct motif structure. Because I specified target residue indices near position 4, moPPIt searched for sequences intended to complement the destabilized N-terminal environment; whether they actually bind there would still need to be confirmed structurally (e.g., with AlphaFold3).
  • Complexity of Design: The moPPIt candidates exhibit a more diverse range of chemical functionalities, including specific motifs (e.g., the Proline-Glycine “turn” in ...QKPG) that may help them fit the target surface while maintaining high predicted solubility.
  • Evaluation for Clinical Use: Before advancing these moPPIt designs, I would validate them using specialized assays:
    1. Biolayer Interferometry (BLI): To measure the actual $k_{on}$ and $k_{off}$ rates of the synthetic peptides against the recombinant A4V SOD1 protein.
    2. Aggregation Inhibition Assay: Since A4V causes aggressive aggregation, the ultimate test is whether these peptides prevent the mutant SOD1 from forming toxic fibrils in vitro.
    3. Cell-based Toxicity Rescue: Testing whether the peptides can rescue motor neuron-like cells (e.g., NSC-34) expressing the A4V mutant from SOD1-mediated proteotoxicity.
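For the BLI step, a simple 1:1 binding model links the fitted rate constants to the equilibrium dissociation constant, K_D = k_off / k_on, and the fraction of target occupied at peptide concentration [L] is [L] / ([L] + K_D). The rate constants below are hypothetical placeholders, not measurements.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D = k_off / k_on (k_on in 1/(M*s), k_off in 1/s, K_D in M)."""
    return k_off / k_on


def fraction_bound(ligand_conc: float, k_d: float) -> float:
    """Equilibrium occupancy of a simple 1:1 binding site: [L] / ([L] + K_D)."""
    return ligand_conc / (ligand_conc + k_d)


# Hypothetical BLI fit values (not measured data).
kd = dissociation_constant(k_on=1e5, k_off=1e-3)
print(kd)                      # about 1e-08 M, i.e. roughly 10 nM
print(fraction_bound(kd, kd))  # 0.5: half the sites are occupied when [L] = K_D
```

Separating k_on and k_off matters for design: two peptides with the same K_D can differ in residence time (1/k_off), which may affect how well they suppress aggregation.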

Part C: Final Project - L-Protein Mutants

Objective: Improve the stability and auto-folding of the lysis protein of the MS2 phage.

Current Progress:

  • [Task 1: Retrieve L-protein wild-type sequence]
  • [Task 2: Identify potential destabilizing regions]
  • [Task 3: Plan ML-guided mutagenesis]

Week 6 HW: DNA Assembly


Week 6: DNA Assembly

This week explores the experimental design and in-silico simulation of constructing genetic circuits. The first section details standard molecular cloning techniques like PCR and Assembly, whilst the second section documents my modeling results in Asimov Kernel.


Assignment: DNA Protocol Questions

1. Phusion High-Fidelity PCR Master Mix

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

  • Phusion DNA Polymerase: The core enzyme. It possesses 5’→3’ polymerase activity for synthesis and 3’→5’ exonuclease activity for proofreading. This minimizes errors during amplification.
  • Deoxynucleotide triphosphates (dNTPs): The basic building blocks (dATP, dCTP, dGTP, dTTP) used by the polymerase to synthesize the new matching DNA strand.
  • Reaction Buffer (with MgCl₂): Provides the optimal pH and salt conditions. Magnesium ions (Mg²⁺) are a crucial cofactor that the polymerase requires to function properly.

2. Primer Annealing Temperature

What are some factors that determine primer annealing temperature during PCR?

  • Primer Target Sequence (GC Content): Guanine and Cytosine form 3 hydrogen bonds compared to the 2 between Adenine and Thymine. Primers with a higher GC percentage have a higher melting temperature ($T_m$) and therefore require a higher annealing temperature.
  • Primer Length: Longer primers have more complementary bases to pair with the template, increasing the sum of hydrogen bonding forces, thereby raising the $T_m$.
  • Salt Concentration: The concentration of monovalent cations (like K⁺ or Na⁺) in the buffer stabilizes the DNA duplex, affecting the required annealing temperature.
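The GC-content and length effects can be seen in two textbook Tm approximations: the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, for short primers, and the basic length-corrected formula, Tm = 64.9 + 41(G+C − 16.4)/N, for longer ones. Neither models salt or primer concentration, so for Phusion NEB recommends using its own Tm calculator instead; the primer below is hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def tm_wallace(primer: str) -> float:
    """Wallace rule, Tm = 2(A+T) + 4(G+C) in deg C; a rough guide for short primers."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def tm_basic(primer: str) -> float:
    """Basic length-corrected formula, Tm = 64.9 + 41*(G+C-16.4)/N, for longer primers."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(primer)


PRIMER = "ATGCGTACGTTAGCCTAGCA"  # hypothetical 20-mer with 50% GC
print(gc_fraction(PRIMER), tm_wallace(PRIMER), round(tm_basic(PRIMER), 2))  # 0.5 60 51.78
```

The gap between the two estimates for the same primer is a reminder that these formulas are rough; empirical gradient PCR or a buffer-specific calculator is more reliable.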

3. PCR vs Restriction Enzyme Digests

Compare and contrast these two methods of creating linear DNA fragments.

Polymerase Chain Reaction (PCR)

Protocol: Utilizes thermal cycling (denaturation, annealing, extension) along with synthetic primers, dNTPs, and a polymerase enzyme to exponentially copy a specific DNA sequence.

Pros: Does not require large starting amounts of DNA. Allows you to "add" sequences (like homology overlaps, restriction sites, or barcodes) to the ends of the fragment via the primers.

Use Case: Ideal for amplifying a specific gene from a genomic sample or preparing fragments for homology-based assembly (like Gibson Assembly).

Restriction Enzyme Digest

Protocol: Involves incubating the target plasmid or DNA snippet at a steady temperature (typically 37°C) with endonucleases that cut at specific recognition motifs.

Pros: Introduces no sequence errors, since no new DNA is synthesized (PCR, even with a proofreading polymerase, carries a small risk of mutations). Inexpensive and generates predefined overhangs for standard ligation.

Use Case: Ideal for moving existing fragments between standardized plasmids (like BioBrick standards), checking plasmid structure (diagnostic digest), or when high-fidelity amplification is too risky for a very large fragment.

4. Preparation for Gibson Assembly

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Gibson Assembly requires adjacent segments of DNA to share common sequences. When designing the primers for PCR, you must add a homology overhang (typically 15–40 base pairs) to the 5’ end of each primer. This overhang must exactly match the end sequence of the adjacent piece of DNA it is meant to join. Without this overlap, T5 exonuclease still chews back the 5’ ends, but the exposed single-stranded 3’ overhangs are not complementary and cannot anneal to each other.
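The primer rule above can be sketched as string operations: the forward primer carries the last bases of the upstream part as a 5' tail, and the reverse primer is the reverse complement of the insert's end plus the first bases of the downstream part. The function name and the 20-nt defaults are hypothetical, and real designs should also check Tm, hairpins, and primer dimers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gibson_primers(upstream: str, insert: str, downstream: str,
                   overlap: int = 20, anneal: int = 20) -> tuple[str, str]:
    """Primers that amplify `insert` with 5' tails matching its neighbours.

    Forward: last `overlap` bases of the upstream part + first `anneal` insert bases.
    Reverse: reverse complement of (last `anneal` insert bases + first `overlap`
    downstream bases).
    """
    fwd = upstream[-overlap:] + insert[:anneal]
    rev = reverse_complement(insert[-anneal:] + downstream[:overlap])
    return fwd, rev


# Toy sequences with 4-nt overlaps so the result is easy to check by eye.
print(gibson_primers("AAAACCCC", "GGGGTTTT", "ACGTACGT", overlap=4, anneal=4))
```

The PCR product then begins with the upstream part's last bases and ends with the downstream part's first bases, which is exactly the shared sequence Gibson needs at each junction.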

5. Plasmid Transformation into E. coli

How does the plasmid DNA enter the E. coli cells during transformation?

Generally, there are two common approaches:

  • Chemical Transformation (Heat Shock): Cells are washed in a calcium chloride ($CaCl_2$) solution; the Ca²⁺ ions shield the negative charges on the DNA backbone and the cell surface so that the DNA can adsorb to the cells. A brief “heat shock” (e.g., 42°C for 45 s) is then thought to create a thermal imbalance that helps the DNA cross the membrane through transient pores.
  • Electroporation: The bacteria and DNA are subjected to a very brief, high-voltage electrical pulse. This localized electric field disrupts the phospholipid bilayer, opening temporary holes for the DNA to enter.

Detail of Another Assembly Method

Golden Gate Assembly

Golden Gate Assembly is a sophisticated and highly efficient method for seamlessly joining multiple DNA fragments together in a single reaction tube.

At the heart of the method are Type IIS restriction enzymes (like BsaI or BsmBI). Unlike traditional restriction enzymes, which cut within their recognition site, Type IIS enzymes cut a fixed distance away from it. This lets you place the recognition site at the outer edge of each DNA part, oriented so that the cut falls inside the part. Once the enzyme cuts, the recognition site is removed from the target fragment, leaving behind a 4-base overhang whose sequence you can fully customize.

By designing each consecutive DNA fragment with matching, distinct overhangs, the fragments self-assemble in the correct order. The reaction mixture contains both the Type IIS restriction enzyme and DNA ligase, and it is run in a thermocycler that alternates between the optimal temperature for the cutter (e.g., 37°C) and the ligase (e.g., 16°C). Because the correctly assembled DNA no longer contains any recognition sites, the reaction is effectively unidirectional: once the fragments are joined properly, they cannot be re-cut by the enzyme. This makes it possible to assemble many parts, even dozens, efficiently in a single reaction.
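Because the method relies on the final construct being free of BsaI sites, parts are usually scanned for internal sites on both strands before ordering. A minimal scanner is sketched below; BsaI recognizes GGTCTC (reverse complement GAGACC) and cuts GGTCTC(N1)/(N5), which produces the 4-nt overhang. The function name is illustrative.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence (not palindromic)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, site: str = BSAI_SITE) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for every occurrence of `site` on either strand."""
    seq = seq.upper()
    rc_site = reverse_complement(site)  # GAGACC for BsaI
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        if window == rc_site:
            hits.append((i, "-"))
    return hits


print(find_sites("AAGGTCTCAAAA"))  # [(2, '+')]
print(find_sites("TTTGAGACCTT"))   # [(3, '-')]
```

A site found inside a coding sequence would typically be removed with a silent mutation ("domestication") before the part is used in a Golden Gate reaction.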

Simplified Model of Golden Gate Assembly:

graph TD
    A[Part 1: Promoter] -->|BsaI Cut| F1[Overhang: AATG]
    B[Part 2: Gene] -->|BsaI Cut| F2[Overhang: AATG / TCGG]
    C[Part 3: Terminator] -->|BsaI Cut| F3[Overhang: TCGG]
    
    F1 -. Ligation .-> F2
    F2 -. Ligation .-> F3
    
    F3 --> Final[Seamless Final Construct<br/>No BsaI sites remain!]
    style Final fill:#d4edda,stroke:#28a745,stroke-width:2px;

Assignment: Asimov Kernel Exploration

💡
I used Asimov Kernel to model basic gene regulatory networks and dynamic behaviors using characterized bacterial parts.

1. Recreating the Repressilator

Hypothesis: The Repressilator is a classic synthetic biology oscillator built from a ring of three repressor genes (tetR, cI, and lacI). Each repressor inhibits the transcription of the next one in the cycle. Because the ring has an odd number of nodes, it has no stable steady state when repression is strong and cooperative enough, so it should produce continuous oscillations, like a biological clock.

Results: When simulating the construct, I observed wave-like oscillations in the concentration of the reporter protein over time. The expression peak of each repressor lagged behind that of the previous one in the cycle, consistent with the behavior of the Demo repository construct.
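Asimov Kernel's internal model is not shown here. As an independent sketch, the dimensionless Elowitz–Leibler repressilator equations can be integrated with a simple Euler loop: dm_i/dt = −m_i + α/(1 + p_{i−1}^n) + α0 and dp_i/dt = −β(p_i − m_i). The parameter values are illustrative choices often used with this model, not the Kernel's parameters.

```python
def simulate_repressilator(alpha: float = 216.0, alpha0: float = 0.216,
                           beta: float = 5.0, n: float = 2.0,
                           dt: float = 0.01, t_end: float = 200.0) -> list[tuple[float, ...]]:
    """Euler integration of the dimensionless repressilator; returns protein levels per step."""
    m = [1.0, 0.0, 0.0]  # asymmetric start so the ring leaves its unstable fixed point
    p = [2.0, 1.0, 3.0]
    trace = []
    for _ in range(int(t_end / dt)):
        # p[i - 1] wraps to p[2] for i = 0, closing the three-gene ring
        dm = [-m[i] + alpha / (1.0 + p[i - 1] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(tuple(p))
    return trace


def count_peaks(xs: list[float]) -> int:
    """Number of strict local maxima in a sampled time series."""
    return sum(1 for k in range(1, len(xs) - 1) if xs[k - 1] < xs[k] > xs[k + 1])


trace = simulate_repressilator()
tail = [pt[0] for pt in trace[len(trace) // 2:]]  # discard the first half as transient
print("peaks in second half:", count_peaks(tail))
print("protein 1 range:", round(min(tail), 2), "to", round(max(tail), 2))
```

Plotting the three columns of `trace` shows the same staggered peaks described above, with each protein rising as the previous one falls.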

2. My Custom Construct 1: Constitutive GFP Generator

Design: Constitutive Promoter -> RBS -> GFP -> Terminator

Hypothesis: This is the simplest possible circuit. Because the promoter is constitutive (always “on”) and has no repressor binding sites, GFP levels should rise steadily and then plateau once the production rate matches the combined rate of degradation and dilution by cell growth.

Results: As expected, the simulated GFP concentration rose rapidly at first and then leveled off at a high steady-state value.
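The rise-then-plateau shape follows from the simplest production/loss model, dG/dt = k − γG, whose solution from G(0) = 0 is G(t) = (k/γ)(1 − e^{−γt}): the plateau is k/γ and half of it is reached at t = ln 2 / γ. The rates below are hypothetical and are not Asimov Kernel's parameters.

```python
import math


def gfp_level(t: float, k_prod: float, gamma: float) -> float:
    """Solution of dG/dt = k_prod - gamma*G with G(0) = 0.

    gamma lumps degradation and dilution by growth; the plateau is k_prod / gamma.
    """
    return (k_prod / gamma) * (1 - math.exp(-gamma * t))


# Hypothetical rates: production 10 a.u./min, combined loss 0.02 /min (~35 min half-time).
k, g = 10.0, 0.02
for t in (0, 30, 60, 120, 300):
    print(t, round(gfp_level(t, k, g), 1))
print("plateau:", k / g)
```

Doubling the promoter strength doubles the plateau but leaves the time to reach it unchanged, since that depends only on γ.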

3. My Custom Construct 2: Inducible Toggle Switch

Design: I linked two repressor systems (Repressor A inhibits Promoter B, and Repressor B inhibits Promoter A). A reporter gene is placed downstream of Promoter A.

Hypothesis: This setup creates a bistable system. It can remain indefinitely in State A or State B, but not both. Adding a chemical inducer that neutralizes Repressor B midway through the simulation should switch the system into the other state.

Results: The simulation started in one state (low reporter). Upon the simulated addition of the inducer, I observed an immediate “flip”: the reporter concentration jumped to high and stayed there even as the inducer degraded, demonstrating cellular memory.
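The memory effect can be reproduced with a Gardner–Collins-style mutual-repression model, which is a stand-in for whatever Asimov Kernel uses internally. The parameters are made-up dimensionless values, the inducer is modeled as simply dividing the active pool of Repressor B, and it is removed abruptly at t = 40 rather than degrading.

```python
def simulate_toggle(alpha: float = 10.0, n: float = 2.0, dt: float = 0.01,
                    t_end: float = 80.0, pulse: tuple[float, float] = (20.0, 40.0),
                    inducer_strength: float = 100.0) -> list[tuple[float, float, float]]:
    """Euler integration of a two-repressor toggle; returns (t, A, B) at each step.

    A: Repressor A, expressed from Promoter A together with the reporter.
    B: Repressor B; the inducer inactivates B during the pulse window.
    """
    a, b = 0.1, 10.0  # start in the low-reporter state (B high, A low)
    trace = []
    for k in range(int(t_end / dt)):
        t = k * dt
        inducer = inducer_strength if pulse[0] <= t < pulse[1] else 0.0
        b_active = b / (1.0 + inducer)          # inducer-bound B cannot repress
        da = alpha / (1.0 + b_active ** n) - a  # Promoter A, repressed by active B
        db = alpha / (1.0 + a ** n) - b         # Promoter B, repressed by A
        a += dt * da
        b += dt * db
        trace.append((t, a, b))
    return trace


trace = simulate_toggle()
print("A just before the pulse:", round(trace[1999][1], 2))
print("A at the end (inducer removed):", round(trace[-1][1], 2))
```

The state persists after the inducer is gone because, once A is high, it keeps B repressed, and low B in turn keeps A high.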

4. My Custom Construct 3: Feed-Forward Loop (FFL)

Design: A network where Protein X activates Protein Y, and both Protein X and Protein Y are required to activate the final Reporter Protein Z.

Hypothesis: This is a coherent type-1 feed-forward loop with AND logic. Functionally, it acts as a “sign-sensitive delay” that filters out brief, noisy signals: if the input to X is only brief, Y won’t build up enough to turn on Z, so only a sustained input will turn on Z.

Results: When I simulated short input pulses versus a continuous input, Reporter Z only activated when the input lasted longer than the time Protein Y needs to reach its activation threshold.
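The pulse-filtering behavior can be sketched with the textbook coherent type-1 FFL model: Y is made at rate β while X is on and decays at rate α, and Z is made only while X is on AND Y exceeds a threshold K_Y. The ON-delay is the time for Y to reach K_Y, t = ln(1 / (1 − K_Y·α/β)) / α. All rates here are hypothetical unit values.

```python
import math


def simulate_c1_ffl(pulse_length: float, beta: float = 1.0, alpha: float = 1.0,
                    k_y: float = 0.5, dt: float = 0.001, t_end: float = 10.0) -> float:
    """Coherent type-1 FFL with AND logic; returns the maximum Z reached.

    X is on for `pulse_length` time units. Y is made at rate beta while X is on and
    decays at rate alpha. Z is made only while X is on AND Y exceeds threshold k_y.
    """
    y = z = z_max = 0.0
    for k in range(int(t_end / dt)):
        x_on = k * dt < pulse_length
        y += dt * ((beta if x_on else 0.0) - alpha * y)
        z += dt * ((1.0 if (x_on and y > k_y) else 0.0) - alpha * z)
        z_max = max(z_max, z)
    return z_max


# Analytical ON-delay for these parameters: ln(1 / (1 - 0.5)) = ln 2.
delay = math.log(1 / (1 - 0.5 * 1.0 / 1.0)) / 1.0
print(round(delay, 3))           # 0.693
print(simulate_c1_ffl(0.3))      # 0.0: pulse shorter than the delay, Z never turns on
print(simulate_c1_ffl(3.0) > 0)  # True: a sustained input turns Z on
```

Raising K_Y or lowering β lengthens the delay, so these are the knobs that set how long an input must last before Z responds.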

Week 7 HW: IANNs & Fungal Materials


Week 7: IANNs & Fungal Materials

This week covers two advanced synthetic biology paradigms: Intracellular Artificial Neural Networks (IANNs) for complex logical decision making, and the engineering of macro-scale Fungal Materials.


Part 1: Intracellular Artificial Neural Networks (IANNs)

1. Advantages over Traditional Boolean Genetic Circuits

While traditional genetic circuits rely on rigid, binary Boolean logic (AND, OR, NOT), Intracellular Artificial Neural Networks (IANNs) can process analog (continuous) signals. They are fundamentally capable of:

  • Weight Tuning: By adjusting ribosome binding site (RBS) strengths or promoter affinities, users can “weight” different inputs.
  • Complex Pattern Recognition: IANNs can classify complex metabolic states combining multiple molecular markers that might otherwise be too noisy for sharp Boolean switches to handle effectively.
  • Non-linear computation: Using cooperative binding or enzymatic thresholds, they can implement fuzzy (graded) logic and tolerate biological noise better than sharp Boolean switches.

2. Application for an IANN

Application Target: A therapeutic “Cancer Cell Classifier” circuit.

  • Inputs ($X_1, X_2, X_3$): Intracellular concentrations of three different oncogenic microRNAs (e.g., miR-21, miR-155) or cancer-specific transcription factors.
  • Output ($Y$): Production of a pro-apoptotic protein (such as Bax or Caspase-9).
  • Behavior: The IANN receives the concentrations of the input markers. Instead of firing if only one crosses a threshold (which might cause a false positive in a healthy cell), the network computes a weighted sum of the markers. Only if the integrated score crosses the hidden layer’s activation threshold does it trigger the output, executing cell death selectively.
  • Limitations: Maintaining multiple distinct plasmids or large gene cassettes places a substantial metabolic burden on the cell. Furthermore, crosstalk between similar biological components in different layers can cause short-circuits.
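The classifier's core computation, a weighted sum of marker levels passed through a cooperative threshold, can be sketched as a single perceptron node. The weights stand in for RBS or promoter strengths, the Hill function stands in for cooperative activation, and all numbers (normalized marker levels, weights of 0.5, K = 1, n = 4) are hypothetical.

```python
def hill_activation(u: float, k: float = 1.0, n: float = 4.0) -> float:
    """Cooperative (Hill-type) activation, a smooth stand-in for a genetic threshold."""
    return u ** n / (k ** n + u ** n)


def classifier_output(markers: list[float], weights: list[float],
                      k: float = 1.0, n: float = 4.0) -> float:
    """Single perceptron node: weighted sum of marker levels through a Hill activation."""
    s = sum(w * x for w, x in zip(weights, markers))
    return hill_activation(max(s, 0.0), k, n)


# Hypothetical normalized marker levels (0 = healthy baseline, 1 = strongly elevated);
# with weights of 0.5, no single marker can push the sum past k = 1 on its own.
weights = [0.5, 0.5, 0.5]
one_high = [1.0, 0.1, 0.1]
all_high = [1.0, 1.0, 1.0]
print(round(classifier_output(one_high, weights), 3))  # low: single marker elevated
print(round(classifier_output(all_high, weights), 3))  # high: combined cancer signature
```

Adjusting one weight (for example, a stronger RBS for a highly specific marker) shifts how much that input counts toward the decision without rewiring the circuit.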

3. Diagram of a Multilayer Perceptron IANN

Here is a conceptual diagram of a multilayer perceptron where Layer 1 outputs an endoribonuclease (like Csy4), which in turn regulates (cleaves) the mRNA of a fluorescent protein in Layer 2.

graph TD
    classDef input fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px;
    classDef layer1 fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px;
    classDef layer2 fill:#e8f5e9,stroke:#4caf50,stroke-width:2px;
    classDef output fill:#fff3e0,stroke:#ff9800,stroke-width:2px;

    Input1[Input Signal X1<br/>e.g., Inducer]:::input
    Input2[Input Signal X2<br/>e.g., Inducer]:::input

    subgraph Layer 1: Hidden Layer
    L1Tx[Transcription Factor]:::layer1
    L1TF[Csy4 Endoribonuclease]:::layer1
    end

    Input1 --> L1Tx
    Input2 --> L1Tx
    L1Tx --> L1TF

    subgraph Layer 2: Output Layer
    L2mRNA[Target mRNA<br/>with Csy4 cut site]:::layer2
    L2Fluor[Fluorescent Protein]:::layer2
    end

    L1TF -- Cleaves Repressive Site --> L2mRNA
    L2mRNA -- Translation --> L2Fluor:::output

Part 2: Fungal Materials

1. Existing Fungal Materials

Fungal materials are primarily made by growing mycelium (the root-like structure of mushrooms) on an agricultural substrate.

  • Mycelium Packaging (e.g., Ecovative Mushroom Packaging): Used to replace polystyrene foam (styrofoam) in shipping.
  • Mycelium Leather (e.g., Mylo, Reishi): A vegan, biodegradable alternative to animal leather used in fashion and upholstery.
  • Acoustic / Structural Panels: Mycelium mixed with hemp or wood chips, baked into bricks for construction.
  • Advantages: Fully biodegradable, carbon-negative or neutral, upcycles agricultural waste (like corn stalks), and requires very little water/energy to grow.
  • Disadvantages: Lower tensile strength compared to synthetic petroleum plastics, water sensitivity (can degrade or mold if not properly sealed or baked), and difficulties in scaling consistent macro-structures.

2. Genetic Engineering of Fungi

Application: Genetically engineer Aspergillus niger or Trichoderma reesei to secrete enzymes capable of degrading microplastics (e.g., PETase for PET plastics) or “forever chemicals” (PFAS). As the mycelium network grows through contaminated soil or water, it acts as a living, expansive bio-filter.

Advantages over Bacteria:

  1. Macroscopic Networks: Fungi naturally form massive physical networks (hyphae) that can penetrate deep into solid substrates, which bacteria cannot do as effectively.
  2. Eukaryotic Machinery: Fungi are eukaryotes. They possess the endoplasmic reticulum and Golgi apparatus necessary for complex post-translational modifications (like glycosylation), which are often required for large, active enzymes to fold properly.
  3. Protein Secretion Capacity: Filamentous fungi naturally secrete large amounts of digestive enzymes into their extracellular environment, which makes them well suited to industrial protein production; E. coli, by contrast, generally does not secrete recombinant proteins efficiently into the medium.

Part 3: First DNA Twist Order

🧬
**Project Action Item:** I have submitted the required Google Form with Aim 1 and shared the folder for the DNA designs.

DNA Design Challenge

Building on my final project regarding MS2 Phage L-Protein Mutants (from Week 5).

  • Insert Sequence: Synthetically designed, codon-optimized MS2 L-protein sequence with stabilizing mutations at the N-terminus.
  • Backbone Vector: pET-28a(+)
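  • Purpose: The insert is flanked by restriction sites (e.g., NcoI and XhoI) for cloning into the pET-28a(+) backbone, allowing IPTG-inducible expression of the mutated L-protein in E. coli BL21(DE3) cells for subsequent stability assays. Note that an NcoI/XhoI insert replaces the vector’s N-terminal His-tag; to keep that tag for IMAC purification, the insert would be cloned with NdeI and XhoI instead (alternatively, designing the insert in frame without a stop codon uses the C-terminal His-tag after XhoI).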

Labs

Lab writeups:

Subsections of Labs

Week 1 Lab: Pipetting


Week 3 Lab: Opentrons Art


Week 3 Lab: Opentrons Art

Opentrons Art Lab

In this lab, we used the Opentrons OT-2 liquid handling robot to create “art” by pipetting colored liquids into a 96-well plate.

Goal

The goal of this lab was to familiarize ourselves with the Opentrons platform, Python API, and the basics of liquid handling automation.

Protocol

  1. Design the Art: Use the GUI at opentrons-art.rcdonovan.com to map colors to specific wells.
  2. Write the Python Script: Utilize the Opentrons API (v2) to define labware (96-well plate, tip racks, reservoir) and instruments (P300 pipette).
  3. Setup the Robot: Load the OT-2 with the required labware. In my case, I used a reservoir for the source colors and a 96-well flat-bottom plate for the canvas.
  4. Run the Protocol: Execute the script via the Opentrons App or Jupyter Notebook.
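The actual script is on the homework page and is not reproduced here. As a hedged illustration of steps 2–4, a minimal Opentrons Python API v2 protocol might look like the sketch below. The labware load names, deck slots, pipette mount, volume, and the `DESIGN` well map are all assumptions, not the Pac-Man layout.

```python
# Hypothetical pixel map: reservoir well holding each colour -> plate wells to fill.
# These wells are placeholders, not the actual Pac-Man design.
DESIGN = {
    "A1": ["C3", "C4", "C5", "D2", "D3"],  # e.g. yellow
    "A2": ["B9", "C8", "C9", "C10"],       # e.g. red
}
VOLUME_UL = 50  # assumed volume per "pixel"

metadata = {"protocolName": "Opentrons art sketch", "apiLevel": "2.13"}


def run(protocol):
    """Entry point that the Opentrons App calls with a ProtocolContext."""
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", 2)
    reservoir = protocol.load_labware("nest_12_reservoir_15ml", 3)
    p300 = protocol.load_instrument("p300_single_gen2", "right", tip_racks=[tips])

    for source, wells in DESIGN.items():
        # One tip per colour: distribute() aspirates as much as the tip holds and
        # dispenses it across the target wells, refilling as needed.
        p300.distribute(VOLUME_UL, reservoir[source], [plate[w] for w in wells],
                        new_tip="once")
```

Using one tip per colour keeps colours from cross-contaminating while avoiding a tip change for every pixel.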

Python Script

The script used to generate the art can be found in my Homework Assignment page.

Results

Below is the resulting “bio-art” created by the robot. Each well represents a pixel of the design, filled with precisely metered colored liquid to form Pac-Man and his ghosts.

Pac-Man Opentrons Art (Resulting design generated via the Python script)

Reflection

The Opentrons OT-2 proved to be a user-friendly platform. The transition from a GUI-generated design to a functional Python script was seamless. This exercise highlighted the importance of clear labware definitions and the power of open-source automation in scaling creative biological workflows.

Subsections of Projects

Individual Final Project


Group Final Project


MoU

BioClub Committed Listener MoU

HTGAA Committed Listener (CL) Agreement

I am an HTGAA Committed Listener, and my responsibilities are:

  • Watching class lectures and recitations
  • Participating in node reviews
  • Developing and documenting my homework
  • Actively communicating with other students and TAs on the forum
  • Allowing HTGAA and BioClub to share my work (with attribution)
  • Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
  • Following locally applicable health and safety guidance
  • Promoting a respectful environment free of harassment and discrimination

Signed by committing this file to my documentation page/repository,

Hotaku Komatsu

March 9, 2026