Subsections of MIT Media Lab - HTGAA Spring 2026

Homework

WK-01 Governance diagram
WEEK 01 / PRINCIPLES + PRACTICES
Principles & Practices

Growing Serratia marcescens and Bacillus sp. pigments for bio-art. Mechanotransduction experiments. Governance framework for engineered organisms in gallery settings.

WK-02 Benchling plasmid
WEEK 02 / DNA READ, WRITE & EDIT
DNA Read, Write & Edit

Codon-optimising VioC from C. violaceum. Circadian antimicrobial system. Magainin peptide under RpaA promoter. Twist synthesis order prep.

WK-03 Opentrons
WEEK 03 / HW + LAB / OPENTRONS
Week 3 HW/Lab: Opentrons

Draft final project directions. Weighing Twist costs, remote lab availability, and UK-based ordering constraints to identify the most viable pathway.

WK-04 GABARAPL2 surface
WEEK 04 / PROTEIN DESIGN PT.I
Protein Design I

GABARAPL2 analysis in PyMol (PDB 7LK3). ESM2 mutational scan, t-SNE latent space, ESMFold, ProteinMPNN inverse folding. Bacteriophage engineering brainstorm.

WK-05 AlphaFold3 SOD1 binder structures
WEEK 05 / PROTEIN DESIGN PT.II
Protein Design II

SOD1-A4V binder peptide design. PepMLM generation, AlphaFold3 evaluation, PeptiVerse property scoring, moPPIt optimisation. L-protein mutant plan for MS2 phage lysis.

WK-06 Genetic circuits DNA assembly
WEEK 06 / GENETIC CIRCUITS PT.I
Genetic Circuits I

DNA assembly and PCR components. Phusion High-Fidelity Master Mix, primer annealing, amilCP chromophore mutagenesis, and genetic circuit design fundamentals.

WK-07 Weiss Lab IANN run
WEEK 07 / GENETIC CIRCUITS PT.II
Genetic Circuits II

IANNs vs Boolean circuits. KaiClock + Durin parallel AND gate run at Weiss Lab. Fungal materials, mycelial pigment lightfastness. Twist DNA order review.

WK-09 Silk fibre earth vs space
WEEK 09 / CELL FREE SYSTEMS
Cell Free Systems

Cell-free protein synthesis advantages. Lyme disease biosensor synthetic cell design. Freeze-dried silk protein self-repair in soft robotics. BioBits space proposal for MaSp1 microgravity assembly.

WK-10 Mass spectrometry charge state analysis
WEEK 10 / IMAGING & MEASUREMENT
Imaging & Measurement

eGFP molecular weight from amino acid sequence. Adjacent charge state mass calculation. Tryptic digest peptide mapping. Native vs denatured mass spectrometry analysis.

WK-11 1536 pixel collaborative canvas
WEEK 11 / BIOPRODUCTION + CLOUD LABS
Bioproduction & Cloud Labs

1,536 pixel collaborative canvas. Cell-free reagent analysis. mTurquoise2 nicotinamide hypothesis for chromophore maturation. Global cell-free master mix design.

Subsections of Homework

Week 1 HW: Principles and Practices

INTERESTS

I am a contemporary artist interested in biomaterials, DNA and new technologies.

I have not experimented with bacterial pigments before, so I thought of using the following as a starting point:

  • Bacillus species (orange/yellow)
  • Serratia marcescens (red/pink)
  • Environmental isolates from soil

First, I would grow them myself (which I am new to). I would then run mechanotransduction experiments in which the bacteria’s pigment responds to sounds and vibrations, by connecting mechanosensitive channels to pigment gene expression.

If possible, I would also explore UV-protective, antimicrobial, colored bioplastic materials or packaging made by combining bacterial pigments directly with seaweed-based bioplastic matrices (such as carrageenan or alginate), building on existing work that amplifies natural pigment production through gene cloning.

Further experiments would look at creating hybrid strains.

Bio-Art Ethics & Policy Framework

I looked at governance and policy from the point of view of an artist and of the non-scientific public, also taking into account that in my use case the bacterial samples may be presented to the public in a gallery setting.

Primary Goal: Ensure Safe & Responsible Use of Engineered Organisms in Artistic Practice

Secondary Goal: Maintain Public Trust in Bio-Art While Enabling Innovation

Three Governance Actions

Action 1: Tiered Institutional Approval System. Defines the roles of biosafety committees, art institutions, and artists. Actors: Actor 1 (Biosafety Committees), Actor 2 (Art Institutions), Actor 3 (Artists).

Action 2: Open-Source Documentation Standard & Community Vetting. Purpose: currently, bio-art practitioners work in isolation without shared safety standards; this action sets out shared standards with the involvement of artists, scientists, and the community. Actors: Actor 1 (Artists & Scientists), Actor 2 (Community).

Action 3: Technical Safety Infrastructure & Insurance Product. Purpose: currently, artists mostly bear full liability for bio-art installations; this action addresses that liability through the collaboration of engineers, certification bodies, and artists. Actors: Actor 1 (Engineers/Companies), Actor 2 (Certification Bodies), Actor 3 (Artists).

[Image: governance framework diagram]

Risk Assessment Matrix

[Image: risk assessment matrix]

Scoring Matrix

Action 1 (Tiered Institutional Approval) scores best on biosecurity and lab safety prevention, as a formal approval system is the most direct way to stop unsafe practices before they happen. It is moderately feasible but places a higher burden on stakeholders and could slow research. Action 2 (Open-Source Documentation Standard) scores best overall, performing well across biosecurity response, feasibility, cost minimization, and promoting constructive applications, making it the strongest all-round option. Action 3 (Technical Safety Infrastructure and Insurance) scores weakest on feasibility and cost, as it requires significant infrastructure investment and places the heaviest financial burden on individual artists, though it offers some environmental protection benefits. Overall, Action 2 is the clear leader, Action 1 provides strong institutional backup, and Action 3 is a longer-term aspiration.

Prioritization and Recommendation

I would prioritize Action 2 (Open-Source Documentation Standard) combined with Action 1 (Tiered Institutional Approval). Open-source standards score best across nearly all goals and are low-cost and immediately feasible for artists and community bio-labs. Tiered approval adds necessary oversight for public-facing installations. The main trade-off is that Action 2 relies on voluntary community participation, which may be inconsistent. Action 3 (insurance) is the least feasible in the short term and places the highest burden on individual artists, so it is treated as a longer-term goal. My recommendation is directed at iGEM and community biology organisations, who could draft and promote the open-source standard without requiring regulatory approval.

Ethical Reflection

Working with living pigment-producing bacteria in a gallery context raised new questions for me about consent and exposure: unlike a lab, gallery visitors have not opted into proximity to engineered organisms. This highlighted a gap in current governance, as bio-art largely falls outside both lab safety regulation and public health frameworks. A potential governance response would be a simple public-disclosure requirement for any bio-art installation using living organisms, similar to ingredient labeling, so that audiences can make informed decisions about their proximity to the work.

Week 2 Lecture Prep

Prof. Jacobson: Question 1. Polymerase Error Rate

DNA polymerase has an error rate of about 1 in 10^6 bases. The human genome is roughly 3 billion base pairs, meaning that replication at this error rate would produce thousands of errors per copy (about 3,000). Biology addresses this with proofreading exonucleases built into the polymerase, plus a separate mismatch repair system, bringing the effective error rate down to around 1 in 10^9.
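The arithmetic behind this answer can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is my own, and the two error rates and the genome size are the rounded figures quoted above.

```python
# Back-of-the-envelope estimate of replication errors per genome copy.
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size used in the text


def expected_errors(error_rate_per_base: float,
                    genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Expected number of wrong bases when copying the genome once."""
    return error_rate_per_base * genome_size_bp


polymerase_only = expected_errors(1e-6)  # 1 error per 10^6 bases
with_repair = expected_errors(1e-9)      # 1 error per 10^9 bases

print(f"Polymerase error rate alone: ~{polymerase_only:.0f} errors per copy")
print(f"After mismatch repair:       ~{with_repair:.0f} errors per copy")
```

So the three orders of magnitude gained from repair take the estimate from thousands of errors per genome copy down to a handful.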

Prof. Jacobson: Question 2. Coding for a Human Protein

Because the genetic code is redundant, the average human protein of ~1,000 amino acids can theoretically be encoded by an astronomically large number of different DNA sequences. In practice most alternatives do not work well because cells have codon usage biases (some codons are translated faster or slower), mRNA secondary structures can block translation, and certain sequences trigger mRNA degradation.
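To get a feel for how quickly the number of possible encodings grows, the sketch below multiplies the number of synonymous codons for each residue, using the standard genetic code. The helper name count_encodings is my own, and the example peptide is simply the first ten residues of the VioC sequence used later on this page.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons encode each amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that encode the given protein."""
    return math.prod(DEGENERACY[aa] for aa in protein)


peptide = "MKRAIIVGGG"  # first ten residues of VioC
n_encodings = count_encodings(peptide)
print(f"{peptide}: {n_encodings:,} possible coding sequences")  # 110,592
```

Ten residues already allow over a hundred thousand coding sequences, so a full-length protein has far more possible encodings than could ever be tested one by one.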

Dr. LeProust: Question 1. Most Common Oligo Synthesis Method

The most commonly used method is solid-phase phosphoramidite chemistry, in which nucleotides are added one at a time to a growing chain attached to a solid support.

Dr. LeProust: Question 2. Why Oligos Are Hard to Make Beyond 200 nt

Each coupling step is not 100% efficient (around 98-99%), so errors accumulate with every added base. Beyond ~200 nt the fraction of full-length, correct molecules becomes too low to be practically useful.
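The effect of coupling efficiency on yield can be estimated with a simple power law. The sketch below assumes every coupling step is independent and approximates one coupling per base; the function name is my own, and the 98% and 99% figures are the ones quoted above.

```python
def full_length_fraction(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of chains for which every coupling step succeeded.

    Assumes independent steps and roughly one coupling per base.
    """
    return coupling_efficiency ** length_nt


for efficiency in (0.98, 0.99):
    for length in (50, 100, 200, 300):
        fraction = full_length_fraction(efficiency, length)
        print(f"efficiency {efficiency:.0%}, {length:>3} nt: "
              f"{fraction:.1%} full-length")
```

At 99% efficiency only about 13% of 200 nt chains are full-length, and at 98% it is under 2%, which is why the practical limit sits around this length.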

Dr. LeProust: Question 3. Why You Cannot Make a 2,000 bp Gene via Direct Oligo Synthesis

Direct synthesis produces a pool of short, error-prone oligos. A 2,000 bp gene requires assembling many overlapping oligos, and errors in individual oligos get incorporated into the final product. The assembly process can also introduce chimeras (incorrectly joined fragments), making a correct 2,000 bp product impractical without additional error-correction steps.

Prof. Church: Question 1. The 10 Essential Amino Acids and the Lysine Contingency

The 10 essential amino acids that animals cannot synthesise are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine (conditionally essential). The Lysine Contingency, a term from Jurassic Park, refers to engineering organisms to depend on externally supplied lysine as a biocontainment strategy. Because lysine is already essential in animals and readily available from diet, that dependency offers little containment on its own. It seems more plausible to replace it with a dependency on a synthetic amino acid unavailable in the wild, which would be a practical and minimally invasive form of genetic containment.

Required Readings

  • Course policies and biosafety guidelines from HTGAA Spring 2026 syllabus
  • Institutional biosafety protocols for bio-art installations

Additional Resources

  • Bio-art ethics and safety protocols literature
  • Gallery biosafety requirements for living organism exhibitions
  • Insurance and liability frameworks for bio-art practitioners

Project Context

  • Research focus: Bacterial pigment production (Serratia marcescens, Bacillus species)
  • Applications: Mechanotransduction experiments, UV-protective bioplastic materials, seaweed matrix integration
  • Public engagement: Gallery presentation considerations

AI Assistance

  • Manus AI - Governance framework visualization
    • Date(s) used: February 2026
    • Tasks: Generated visual representations of bio-art governance framework and risk assessment matrix based on author’s policy framework

Acknowledgments

  • HTGAA instructors for guidance on bio-art policy frameworks
  • Course TAs for biosafety protocol clarification

Week 2 HW: DNA Read, Write, and Edit

Part 0: Basics of Gel Electrophoresis

Attend or watch all lecture and recitation videos. YES Optionally watch bootcamp. YES

Part 1

Benchling & In-silico Gel Art

Make a free account at benchling.com. Import the Lambda DNA. Simulate restriction enzyme digestion.

I imported Lambda DNA into Benchling and simulated restriction enzyme digestion with EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. Using the predicted band sizes from each digest, I selected enzyme combinations that would produce bands at specific positions to form a simple geometric pattern in the style of Paul Vanouse’s Latent Figure Protocol work.
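The band sizes that Benchling predicts come from finding each enzyme's recognition site and measuring the distance between cuts. A minimal sketch of that idea is below. It is not Benchling's implementation: the digest function and the toy sequence are my own illustrations, it treats the DNA as linear, and it measures fragment lengths on the top strand only. The recognition sites and cut positions are the standard ones for these seven enzymes.

```python
import re

# Recognition site and top-strand cut offset (bases from the start of the site).
ENZYMES = {
    "EcoRI":   ("GAATTC", 1),  # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI":   ("GGATCC", 1),  # G^GATCC
    "KpnI":    ("GGTACC", 5),  # GGTAC^C
    "EcoRV":   ("GATATC", 3),  # GAT^ATC
    "SacI":    ("GAGCTC", 5),  # GAGCT^C
    "SalI":    ("GTCGAC", 1),  # G^TCGAC
}


def digest(sequence: str, enzyme_names: list[str]) -> list[int]:
    """Return fragment lengths (largest first) for a linear DNA molecule."""
    seq = sequence.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        # A lookahead pattern also finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    edges = [0] + sorted(cuts) + [len(seq)]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)


# Toy 50 bp sequence with one EcoRI site and one BamHI site (not Lambda DNA).
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GGATCC" + "T" * 8
print(digest(toy, ["EcoRI"]))           # [39, 11]
print(digest(toy, ["EcoRI", "BamHI"]))  # [26, 13, 11]
```

Running the same function with different enzyme combinations gives one list of band sizes per lane, which is the information needed to plan a gel-art pattern.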

Output attempt of a dog! (with tail on the right)

Part 2

No wet lab access

Part 3

DNA Design Challenge

Choose Protein

I chose the amino acid sequence of VioC - Chromobacterium violaceum for Violacein pigment.

I will reverse translate and codon optimize to amplify pigment production and thus its antimicrobial, UV-resistant properties.

>sp|Q9S3U9|VIOC_CHRVO Violacein synthase OS=Chromobacterium violaceum (strain ATCC 12472 / DSM 30191 / JCM 1249 / CCUG 213 / NBRC 12614 / NCIMB 9131 / NCTC 9757 / MK) OX=243365 GN=vioC PE=1 SV=2
MKRAIIVGGGLAGGLTAIYLAKRGYEVHVVEKRGDPLRDLSSYVDVVSSRAIGVSMTVRG
IKSVLAAGIPRAELDACGEPIVAMAFSVGGQYRMRELKPLEDFRPLSLNRAAFQKLLNKY
ANLAGVRYYFEHKCLDVDLDGKSVLIQGKDGQPQRLQGDMIIGADGAHSAVRQAMQSGLR
RFEFQQTFFRHGYKTLVLPDAQALGYRKDTLYFFGMDSGGLFAGRAATIPDGSVSIAVCL
PYSGSPSLTTTDEPTMRAFFDRYFGGLPRDARDEMLRQFLAKPSNDLINVRSSTFHYKGN
VLLLGDAAHATAPFLGQGMNMALEDARTFVELLDRHQGDQDKAFPEFTELRKVQADAMQD
MARANYDVLSCSNPIFFMRARYTRYMHSKFPGLYPPDMAEKLYFTSEPYDRLQQIQRKQN
VWYKIGRVN

Reverse translate

>sp|Q9S3U9|VIOC_CHRVO Violacein synthase OS=Chromobacterium violaceum (strain ATCC 12472 / DSM 30191 / JCM 1249 / CCUG 213 / NBRC 12614 / NCIMB 9131 / NCTC 9757 / MK) OX=243365 GN=vioC PE=1 SV=2
ATGAAGCGAGCGATTATTGTCGGGGGGGGTTTAGCTGGAGGTCTAACTGCGATATACTTGGCTAAACGTGGATACGAGGT
ACATGTGGTCGAGAAACGGGGCGACCCACTCAGGGACCTGTCTAGCTATGTTGATGTGGTTTCATCACGCGCAATCGGGG
TCAGCATGACTGTAAGAGGCATCAAGTCAGTTTTAGCGGCCGGTATCCCCCGAGCTGAATTAGATGCCTGTGGTGAGCCA
ATAGTTGCCATGGCGTTTTCCGTCGGGGGACAATATCGCATGCGGGAACTTAAACCACTCGAAGACTTCCGACCGCTTTC
GCTTAACCGAGCAGCCTTCCAGAAGCTTTTGAACAAGTACGCAAACCTTGCCGGCGTACGGTACTATTTCGAACATAAAT
GCCTGGATGTAGACCTGGATGGGAAATCCGTACTGATCCAAGGGAAGGACGGACAGCCGCAGCGACTTCAAGGAGATATG
ATTATCGGCGCAGATGGGGCACACAGTGCAGTTCGCCAAGCGATGCAGTCAGGATTGCGGCGCTTTGAGTTTCAACAAAC
GTTCTTTAGGCACGGGTATAAAACGCTGGTCCTACCCGACGCCCAAGCACTCGGGTATCGAAAGGACACGTTATATTTTT
TTGGAATGGACAGCGGAGGGTTGTTCGCAGGCCGAGCCGCAACAATACCCGATGGTAGCGTGTCCATAGCTGTGTGTCTG
CCCTACTCCGGCTCCCCCAGTTTGACAACCACAGATGAACCGACTATGCGTGCATTTTTCGACAGGTACTTTGGAGGTCT
TCCACGGGATGCGAGGGACGAGATGCTTAGACAATTTTTAGCCAAGCCGTCTAATGATCTAATAAATGTGCGATCTTCAA
CTTTTCATTACAAAGGTAACGTTCTGCTTTTAGGCGACGCCGCACATGCTACCGCGCCATTTTTAGGACAAGGCATGAAT
ATGGCGTTAGAGGATGCGCGAACATTCGTAGAATTACTTGATCGCCACCAAGGCGATCAGGATAAAGCGTTTCCAGAGTT
CACGGAGCTTAGAAAGGTGCAAGCGGACGCGATGCAAGATATGGCCCGGGCGAATTACGATGTTCTATCTTGCTCCAACC
CGATTTTTTTTATGAGGGCGCGGTATACCCGCTACATGCACAGCAAGTTTCCGGGACTGTACCCGCCGGATATGGCCGAG
AAACTGTATTTCACGTCAGAGCCGTACGATCGATTACAACAAATACAGCGCAAGCAAAACGTATGGTACAAGATAGGCAG
AGTTAAT
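Reverse translation picks one codon for each amino acid, and translating the result back is a quick sanity check that the protein is unchanged. The sketch below shows both directions with the standard genetic code. It is not the tool used above: the function names are my own, and it always takes the first codon listed for each amino acid, an arbitrary choice that is not optimized for any host.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One codon per amino acid: the first in table order (arbitrary, NOT optimized).
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def reverse_translate(protein: str) -> str:
    """Protein to DNA, using one fixed codon per amino acid."""
    return "".join(FIRST_CODON[aa] for aa in protein)


def translate(dna: str) -> str:
    """DNA to protein, reading complete codons from the first base."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


peptide = "MKRAIIVGGG"  # first ten residues of VioC
dna = reverse_translate(peptide)
print(dna)                        # ATGAAACGTGCTATTATTGTTGGTGGTGGT
print(translate(dna) == peptide)  # True

# The first 21 bases of the reverse-translated sequence on this page use
# different codons but encode the same residues.
print(translate("ATGAAGCGAGCGATTATTGTC"))  # MKRAIIV
```

Codon optimization keeps this round trip intact and only changes which synonymous codon is chosen at each position.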

Codon Optimize

https://en.vectorbuilder.com/tool/codon-optimization/b93b7790-7536-4d9b-a72e-02d62c3944e8.html

The next step would be to embed the pigment into a seaweed matrix.

The VioC coding sequence can be transcribed into mRNA and then translated into protein using either a cell-dependent or cell-free method. In a cell-dependent approach, the codon-optimized sequence is cloned into an expression plasmid, transformed into E. coli, and protein production is induced by adding IPTG. The bacteria read the DNA, transcribe it into mRNA, and their ribosomes translate it into the VioC enzyme. In a cell-free approach, the DNA template is added directly to a prepared lysate containing ribosomes, enzymes, and amino acids, and protein is synthesised in a test tube without any living cell.

Part 4

Prepare a Twist DNA Synthesis Order

After reading more on living materials and bacterial pigments, and connecting this to my interest in light and circadian rhythms, I wanted to explore how to make a simple biological system that expresses antimicrobial or other elements only when needed, rather than all the time. The idea is to build a ‘temporal’ antimicrobial system that produces the bacteria-killing peptide Magainin on a 24-hour schedule, controlled by an RpaA-regulated circadian promoter. I started by learning how to design the Magainin peptide and annotate it properly.

Benchling

Twist

REF:

  1. Fang et al. (2025) - “Mechanism and reconstitution of circadian transcription in cyanobacteria”
  2. Salis et al. (2009) - “Automated Design of Synthetic Ribosome Binding Sites”
  3. Westerhoff et al. (2008) - “Structure, Membrane Orientation, Mechanism, and Function of Pexiganan (Magainin derivative)”

Part 5

DNA Read/Write/Edit

DNA Read (Sequencing)

5.1 Sequencing Technology: Sub-questions

Generation: Sanger sequencing is first-generation. It sequences one DNA fragment at a time using chain-terminating dideoxynucleotides, predating the massively parallel approaches of second-generation (e.g. Illumina) and third-generation (e.g. Oxford Nanopore) methods.

Input and preparation: The input is purified plasmid DNA and a single sequencing primer. If the target region is first amplified by PCR, a cleanup step removes unused nucleotides and primers before the sequencing reaction.

Essential steps and base calling: The template is mixed with a single primer, a polymerase, ordinary dNTPs, and four fluorescently labelled dideoxynucleotides. The polymerase extends the primer until it randomly incorporates a dideoxynucleotide and terminates. This produces fragments of every possible length, each ending in a fluorescent base. The fragments are separated by capillary electrophoresis and a laser reads the fluorescent colour at each length, which is converted into a base sequence.

Output: A chromatogram showing peaks of four colours corresponding to A, T, C, and G, along with a text sequence file. Read length is typically 700-1000 bases.
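The chain-termination logic can be illustrated with a toy simulation: many extensions stop at random positions, and sorting the fragments by length and reading each terminal label recovers the sequence. This is only a sketch under simplifying assumptions. The template is a made-up 20 nt sequence written 3'->5', the termination probability is arbitrary, and there is no signal noise, so it is not a model of a real instrument.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template, n_molecules=5000, ddntp_fraction=0.1, seed=0):
    """Simulate extensions that each stop at a randomly incorporated ddNTP.

    Returns (fragment_length, labelled_terminal_base) pairs, which is all
    the instrument observes.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for i, base in enumerate(template):
            if rng.random() < ddntp_fraction:
                fragments.append((i + 1, COMPLEMENT[base]))
                break  # chain terminated, no further extension
    return fragments


def call_bases(fragments):
    """Order fragments by length (electrophoresis) and read each end label."""
    label_at_length = {}
    for length, label in fragments:
        label_at_length[length] = label
    return "".join(label_at_length[n] for n in sorted(label_at_length))


template = "TACGGATCCATTGCAAGCTA"  # hypothetical template, written 3'->5'
read = call_bases(sanger_fragments(template))
print(read)
```

The read is the newly synthesised strand, so it is the base-by-base complement of the template. Longer fragments become rarer with each position, which is one reason real reads fade out after several hundred bases.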

5.2 Synthesis Technology: Sub-questions

Technology: I would use solid-phase phosphoramidite synthesis via Twist Bioscience to synthesise the pLight-Circadian-Color plasmid, as it offers high accuracy and fast turnaround for sequences up to several kilobases.

Essential steps: The sequence is designed in Benchling, codon-optimised, and uploaded to Twist. Twist synthesises overlapping oligos on a silicon chip, assembles them into the full gene fragment, clones the insert into the chosen backbone vector, and sequences the final construct to confirm accuracy before shipping.

Limitations: Direct oligo synthesis has a practical length limit of around 200 nucleotides per oligo due to error accumulation, meaning longer genes require assembly from many fragments. Error rates, while low (around 1 in 3,000 bases for Twist), mean some clones may contain mutations and must be sequence-verified before use.
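The practical consequence of a per-base error rate can be estimated as the chance that a clone of a given length is error-free, and from that, how many clones to sequence. The sketch below assumes errors are independent and uses the 1 in 3,000 figure quoted above for DNA that has not yet been sequence-verified. The function names and the 95% target are my own choices for illustration.

```python
import math


def p_error_free(length_bp: int, error_rate: float = 1 / 3000) -> float:
    """Probability that a clone of this length contains no synthesis errors."""
    return (1 - error_rate) ** length_bp


def clones_needed(length_bp: int, confidence: float = 0.95,
                  error_rate: float = 1 / 3000) -> int:
    """Smallest number of clones giving at least one perfect clone
    with the requested probability."""
    p_good = p_error_free(length_bp, error_rate)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_good))


vioc_length = 1287  # length in bases of the reverse-translated VioC sequence
print(f"P(error-free clone): {p_error_free(vioc_length):.2f}")
print(f"Clones to sequence for 95% confidence: {clones_needed(vioc_length)}")
```

For a gene of this size roughly two thirds of clones would be expected to be correct, so sequencing about three clones is usually enough under these assumptions.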

What DNA would you want to sequence and why?

I would sequence my pLight-Circadian-Color plasmid (which contains the RpaA gene from Synechococcus elongatus, an anthocyanin color gene, and a light sensor) to check that it was made correctly before testing if bacteria with this plasmid change color on a 24-hour schedule when exposed to light.

What sequencing technology would you use?

I would use Sanger sequencing because it is highly accurate over individual reads and well suited to verifying specific regions of a single plasmid.

DNA Write (Synthesis)

What DNA would you synthesize and why?

I would synthesize my yet-to-be-completed pLight-Circadian-Color plasmid containing three genes (RpaA from Synechococcus elongatus for timing, anthocyanin for color, light sensor for activation) to test if bacteria can change color on a 24-hour schedule in response to light.

DNA Edit

What DNA would you edit and why?

After I verify the plasmid works, I would edit the RpaA promoter to make it stronger so the color changes are brighter and more noticeable on a 24-hour schedule.

What editing technology would you use?

I would use site-directed mutagenesis to make small changes to the RpaA promoter because it’s precise.

Editing Technology: Sub-questions

How it works: Site-directed mutagenesis uses PCR with primers that contain the desired mutation in their sequence. The polymerase copies the entire plasmid incorporating the mutation, and the original methylated template is then digested away with DpnI, leaving only the mutated version.

Preparation and inputs: I would design primers containing the specific base changes I want in the RpaA promoter region, using a tool like NEB’s primer design tool. The inputs are the original plasmid, the two mutagenic primers, a high-fidelity polymerase such as Phusion, dNTPs, and DpnI enzyme for template removal.

Limitations: Site-directed mutagenesis only makes small, precise changes and cannot introduce large insertions or deletions efficiently. It also requires the plasmid to already be available, and each round of mutagenesis must be followed by sequencing to confirm the correct change was made and no unintended errors were introduced.
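The primer-design step can be sketched as follows: take the bases flanking the target position, swap in the new base, and use the reverse complement as the second primer. This assumes a fully complementary primer pair with the mutation in the centre, which is one common design. NEB's tool may design primers differently, and the template below is a made-up sequence, not the real promoter. Melting temperature and GC content are not checked here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(template: str, position: int, new_base: str,
                      flank: int = 12) -> tuple[str, str]:
    """Return (forward, reverse) primers centred on one base substitution.

    position is a 0-based index into the template's top strand.
    """
    if template[position] == new_base:
        raise ValueError("new_base is identical to the existing base")
    start, end = position - flank, position + flank + 1
    if start < 0 or end > len(template):
        raise ValueError("not enough flanking sequence around the position")
    forward = template[start:position] + new_base + template[position + 1:end]
    return forward, reverse_complement(forward)


# Hypothetical 40 nt stretch of promoter sequence (illustration only).
template = "ACGT" * 10
forward, reverse = mutagenic_primers(template, position=20, new_base="G")
print("Forward:", forward)
print("Reverse:", reverse)
```

Each primer is 25 nt long with the changed base in the middle, so both strands of the copied plasmid carry the mutation before DpnI removes the original template.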

References & Resources

Lecture Materials

  • Week 2 Lecture - DNA Read, Write, & Edit, George Church, Joe Jacobson, Emily LeProust
  • Week 2 Lab - DNA Gel Art, February 12-13, 2026

Required Readings

  1. Fang et al. (2025). “Mechanism and reconstitution of circadian transcription in cyanobacteria.” Journal of Biological Chemistry
  2. Salis et al. (2009). “Automated Design of Synthetic Ribosome Binding Sites to Control Protein Expression.” Nature Biotechnology, 27, 946-950
  3. Westerhoff et al. (2008). “Structure, Membrane Orientation, Mechanism, and Function of Pexiganan (Magainin derivative).” Biochemistry

Software & Tools Used

  • Benchling - DNA sequence design, annotation, and in-silico gel electrophoresis
  • Twist Bioscience - DNA synthesis order preparation and optimization
  • VectorBuilder Codon Optimization Tool - Reverse translation and codon optimization for violacein synthase
  • UniProt - Protein sequence database (VioC entry: sp|Q9S3U9|VIOC_CHRVO)
  • Imgur - Image hosting for documentation

Sequences Worked With

  • VioC (Violacein synthase) from Chromobacterium violaceum strain ATCC 12472
  • RpaA circadian promoter from Synechococcus elongatus
  • Magainin antimicrobial peptide sequence

AI Assistance

  • Claude (Anthropic) - DNA design and sequencing strategy
    • Model: Claude Sonnet 4.5
    • Date(s) used: February 2026
    • Tasks: Assisted with reverse translation strategy for VioC, guidance on codon optimization principles, clarified Sanger sequencing vs synthesis tradeoffs

Project Development

  • Circadian-controlled antimicrobial system design (RpaA + Magainin)
  • Violacein pigment amplification through codon optimization
  • pLight-Circadian-Color plasmid conceptual design

Additional Resources

  • Twist Bioscience synthesis guidelines and specifications
  • Benchling annotation standards
  • Circadian rhythm gene expression literature

Acknowledgments

  • Course instructors

Week 3 HW/Lab : Opentrons

Python Script for Opentrons Artwork

Since I am not present to interact directly with the Opentrons output, I thought about why I would want to pipette an image and what that image should represent and decided to use Ndebele bead patterns as inspiration.

Ndebele bead patterns have a very specific geometric logic. They are built on a grid of “bead units” arranged in bold, angular, symmetric designs. The traditional South Ndebele aesthetic uses high-contrast colors in step-like diagonal and horizontal bands, often with thick outlines and mirrored symmetry.

They are also studied in ethnomathematics, which often promotes a more humanistic and inclusive perspective on mathematics, focusing on how different groups manage, understand, and navigate their reality.

I found it interesting to bring the mathematical and social aspects of this indigenous knowledge to the biochemical level, as this layering of meaning creates interesting avenues for reflection on various levels.

Example of Ndebele paintings and beadwork:

Python Visuals & Scripts Ex.

I am not a coder, but after playing around with the example scripts, I ended up using Claude to vibe-code the desired patterns and positions. It required some debugging and produced several output versions.

Although the Colab script runs without error, I am not sure whether it will work on the Opentrons robot.

FULL FINAL VERSION CODE

Python Script

Below is the complete Python script that creates the Ndebele-inspired “BIO” pattern using negative space typography on a horizontal stripe background. The script was developed in Google Colab with Claude AI assistance and runs without errors in simulation.

The pattern uses:

  • Blue beads: Top and bottom border rows (rows 0-1, 9-10)
  • Pink beads: Inner border rows (rows 2-3, 7-8)
  • Purple beads: Central band (rows 4-6)
  • Black agar (no beads): Letter shapes forming “BIO” in negative space
from opentrons import types

metadata = {
    'protocolName': 'Ndebele Bio',
    'author': 'Henrietta',
    'source': 'HTGAA 2022',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants - don't change these
##############################################################################

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

well_colors = {
    'A1' : 'Purple',
    'B1' : 'Pink',
    'C1' : 'Blue'
}

##############################################################################
###   Ndebele BIO Pattern — Negative Space Typography
###   Grid: 21 cols x 11 rows. Row 0 = bottom, Col 0 = left.
###   Physical size at 4mm spacing: 80mm wide x 40mm tall.
###
###   The word "BIO" is formed as NEGATIVE SPACE (black agar gaps) cut into
###   a Ndebele-style horizontal stripe background of coloured beads.
###
###   Background stripe layout (classic Ndebele horizontal banding):
###     Blue   = rows 0-1 and rows 9-10  (top and bottom borders)
###     Pink   = rows 2-3 and rows 7-8   (inner borders)
###     Purple = rows 4-6                (central band)
###
###   Letter placement (1-col margins each side, 1-col gap between letters):
###     B : cols 1-5   (5 wide x 9 tall, rows 1-9)
###     I : cols 7-9   (3 wide x 9 tall, rows 1-9, with serifs)
###     O : cols 11-15 (5 wide x 9 tall, rows 1-9)
###     cols 16-20 : right margin (filled with stripe colours)
###
###   Letter pixels = None (no bead — exposed black agar reads as the letter)
###   Background pixels = stripe colour bead
##############################################################################

BEAD_SPACING_MM = 4   # 4mm spacing keeps pattern clear of dish edge
BEAD_VOLUME_UL  = 1
MAX_ASPIRATE_UL = 16

def _make_pattern():
    """
    Build a 21-col x 11-row grid spelling BIO in negative space.

    Horizontal Ndebele stripes fill the background.
    Letter shapes are punched through as None cells (black agar).

    Letter pixel maps — 1 = letter pixel (None), 0 = background fill.
    Each letter is 5 wide x 9 tall (or 3 wide for I).
    Row order in map: index 0 = top of letter, index 8 = bottom.
    """
    cols = 21
    rows = 11

    # --- Ndebele horizontal stripe background ---
    def stripe_color(r):
        if r in (0, 1):      return 'Blue'
        if r in (2, 3):      return 'Pink'
        if r in (4, 5, 6):   return 'Purple'
        if r in (7, 8):      return 'Pink'
        return 'Blue'   # rows 9, 10

    # --- Letter pixel maps (1 = letter / negative space, 0 = background) ---
    # Row index 0 = top of letter, index 8 = bottom of letter

    B = [             # cols 1-5
        [1,1,1,1,0],  # top    — same wide bump as bottom
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,1,1,1,0],  # mid serif
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,1,1,1,0],  # bottom — matches top
    ]

    I = [         # cols 7-9
        [1,1,1],  # top serif
        [0,1,0],
        [0,1,0],
        [0,1,0],
        [0,1,0],
        [0,1,0],
        [0,1,0],
        [0,1,0],
        [1,1,1],  # bottom serif
    ]

    O = [             # cols 11-15
        [0,1,1,1,0],  # top arch
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,0,0,0,1],
        [1,0,0,0,1],
        [0,1,1,1,0],  # bottom arch
    ]

    B_col, I_col, O_col = 1, 7, 11   # start columns for each letter

    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            is_letter = False

            if B_col <= c < B_col + 5 and 1 <= r <= 9:
                map_row = 8 - (r - 1)   # flip: row 9 (top of letter) = map index 0
                map_col = c - B_col
                if B[map_row][map_col] == 1:
                    is_letter = True

            elif I_col <= c < I_col + 3 and 1 <= r <= 9:
                map_row = 8 - (r - 1)
                map_col = c - I_col
                if I[map_row][map_col] == 1:
                    is_letter = True

            elif O_col <= c < O_col + 5 and 1 <= r <= 9:
                map_row = 8 - (r - 1)
                map_col = c - O_col
                if O[map_row][map_col] == 1:
                    is_letter = True

            row.append(None if is_letter else stripe_color(r))
        grid.append(row)
    return grid

PATTERN = _make_pattern()

##############################################################################
###   OpentronsMock — simulation layer for Colab
###   Mimics the Opentrons API so run(protocol) works identically in Colab
###   and on the real robot. Records every drop for visualize().
##############################################################################

class OpentronsMock:

    class _Point:
        def __init__(self, x=0.0, y=0.0, z=0.0):
            self.x = x; self.y = y; self.z = z

    class _Location:
        def __init__(self, x=0.0, y=0.0, z=0.0, name=''):
            self.name  = name
            self.point = OpentronsMock._Point(x, y, z)
        def top(self, z=0):
            return OpentronsMock._Location(self.point.x, self.point.y, self.point.z + z, self.name)
        def move(self, pt):
            return OpentronsMock._Location(self.point.x + pt.x, self.point.y + pt.y, self.point.z + pt.z, self.name)

    class _Well:
        def __init__(self, name, x=0.0, y=0.0, z=0.0):
            self.name  = name
            self.point = OpentronsMock._Point(x, y, z)
            self._loc  = OpentronsMock._Location(x, y, z, name)
        def top(self, z=0):   return self._loc.top(z)
        def move(self, pt):   return self._loc.move(pt)

    class _Labware:
        def __init__(self, wells):  self._wells = wells
        def __getitem__(self, k):   return self._wells[k]
        def well(self, k):          return self._wells[k]

    class _TempModule:
        def __init__(self, plate):  self._plate = plate
        def load_labware(self, *a): return self._plate

    class _Pipette:
        def __init__(self, mock):
            self._mock   = mock
            self._color  = None
            self._volume = 0.0
            self.starting_tip = None
        def pick_up_tip(self):   pass
        def drop_tip(self):      self._color = None; self._volume = 0.0
        def move_to(self, loc):  pass
        def aspirate(self, volume, location):
            self._volume += volume
            self._color = self._mock.well_colors.get(getattr(location, 'name', ''), None)
        def dispense(self, volume, location):
            if self._color:
                pt = location.point
                self._mock._drops.append((pt.x, pt.y, volume, self._color))
            self._volume = max(0, self._volume - volume)

    class _Types:
        class Point:
            def __init__(self, x=0, y=0, z=0): self.x=x; self.y=y; self.z=z

    def __init__(self, well_colors):
        self.well_colors  = well_colors
        self.types        = self._Types()
        self._drops       = []

        color_wells       = {name: self._Well(name, i*9, 0) for i, name in enumerate(well_colors)}
        self._color_plate = self._Labware(color_wells)
        self._tip_rack    = self._Labware({f"{'ABCDEFGH'[r]}{c}": self._Well(f"{'ABCDEFGH'[r]}{c}")
                                           for r in range(8) for c in range(1, 13)})
        agar_well         = self._Well('A1', 0, 0, 0)
        self._agar_plate  = self._Labware({'A1': agar_well})
        self._temp_mod    = self._TempModule(self._color_plate)
        self._pipette     = self._Pipette(self)

    def load_labware(self, name, slot, label=''):
        if 'tiprack'  in name: return self._tip_rack
        if 'agar'     in name: return self._agar_plate
        if 'aluminum' in name: return self._color_plate
        return self._Labware({})

    def load_instrument(self, *a): return self._pipette
    def load_module(self, *a):     return self._temp_mod
    def comment(self, msg):        print(msg)

    def visualize(self, bead_radius_mm=1.3, figsize=(14, 10)):
        # Local imports — safe even if pd/plt were deleted by del np, pd above
        import pandas as pd
        import matplotlib.pyplot as plt
        import matplotlib.patches as patches
        from matplotlib.colors import to_rgba

        if not self._drops:
            print("No drops recorded — check that run(protocol) completed.")
            return

        color_map = {
            'Purple': '#CC00FF',   # violet fluorescent protein emission
            'Pink':   '#FF69B4',   # pink/mCherry variant emission
            'Blue':   '#0066FF',   # BFP — bright blue emission
        }

        fig, ax = plt.subplots(figsize=figsize)
        ax.set_facecolor('#000000')
        fig.patch.set_facecolor('#000000')

        xs  = [d[0] for d in self._drops]
        ys  = [d[1] for d in self._drops]
        pad = bead_radius_mm * 6

        # True circle petri dish — use the larger span as the radius so
        # the dish is always round and all beads sit inside it
        cx = (min(xs) + max(xs)) / 2
        cy = (min(ys) + max(ys)) / 2
        r  = max((max(xs) - min(xs)) / 2,
                 (max(ys) - min(ys)) / 2) + pad * 1.8   # single radius → circle

        # Set view limits to fully contain the circle
        margin = bead_radius_mm * 2
        ax.set_xlim(cx - r - margin, cx + r + margin)
        ax.set_ylim(cy - r - margin, cy + r + margin)
        ax.set_aspect('equal')

        # Show x/y axes with mm measurements — matches original Colab rendering
        ax.set_xlabel('x position (mm)', color='white', fontsize=11)
        ax.set_ylabel('y position (mm)', color='white', fontsize=11)
        ax.tick_params(colors='white', labelsize=9)
        for spine in ax.spines.values():
            spine.set_edgecolor('#555')

        ax.add_patch(plt.Circle((cx, cy), r, color='#000000', zorder=0))
        ax.add_patch(plt.Circle((cx, cy), r, fill=False, edgecolor='#555', linewidth=2.5, zorder=1))

        # Bead radius — slightly smaller than half the spacing so there is a
        # visible gap between every dot, matching the original code's style
        dot_r = bead_radius_mm * 0.75

        # Draw each bead — flat colour only, no shadow or highlight
        for (x, y, vol, color_name) in self._drops:
            hex_color = color_map.get(color_name, '#999999')
            ax.add_patch(plt.Circle((x, y), dot_r, color=hex_color, zorder=3))

        # Legend
        used = sorted(set(d[3] for d in self._drops))
        ax.legend(handles=[patches.Patch(color=color_map.get(c, '#999'), label=c) for c in used],
                  loc='upper right', facecolor='#2a2a2a', edgecolor='#555',
                  labelcolor='white', fontsize=11, framealpha=0.85)

        ax.set_title('Ndebele — BIO in Negative Space',
                     color='white', fontsize=14, fontweight='bold', pad=14)
        plt.tight_layout()
        plt.show()

        # Summary
        df = pd.DataFrame(self._drops, columns=['x_mm', 'y_mm', 'vol_ul', 'color'])
        print("\n=== Dispensing Summary ===")
        print(df.groupby('color').agg(beads=('vol_ul', 'count'), total_ul=('vol_ul', 'sum')))
        print(f"\nTotal beads dispensed : {len(self._drops)}")
        print(f"Total volume dispensed: {df['vol_ul'].sum():.0f} µL")


def run(protocol):
  ##############################################################################
  ###   Load labware, modules and pipettes
  ##############################################################################

  # Tips
  tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

  # Pipettes
  pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

  # Modules
  temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)

  # Temperature Module Plate
  temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul',
                                                      'Cold Plate')
  # Choose where to take the colors from
  color_plate = temperature_plate

  # Agar Plate
  agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')  ## TA MUST CALIBRATE EACH PLATE!
  # Get the top-center of the plate, make sure the plate was calibrated before running this
  center_location = agar_plate['A1'].top()

  pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

  ##############################################################################
  ###   Patterning
  ##############################################################################

  ###
  ### Helper functions for this lab
  ###

  # pass this e.g. 'Red' and get back a Location which can be passed to aspirate()
  def location_of_color(color_string):
    for well,color in well_colors.items():
      if color.lower() == color_string.lower():
        return color_plate[well]
    raise ValueError(f"No well found with color {color_string}")

  # For this lab, instead of calling pipette.dispense(1, loc) use this: dispense_and_detach(pipette, 1, loc)
  def dispense_and_detach(pipette, volume, location):
      """
      Move laterally 5mm above the plate (to avoid smearing a drop); then drop down to the plate,
      dispense, move back up 5mm to detach drop, and stay high to be ready for next lateral move.
      5mm because a 4uL drop is 2mm diameter; and a 2deg tilt in the agar pour is >3mm difference across a plate.
      """
      assert(isinstance(volume, (int, float)))
      above_location = location.move(types.Point(z=location.point.z + 5))  # 5mm above
      pipette.move_to(above_location)       # Go to 5mm above the dispensing location
      pipette.dispense(volume, location)    # Go straight downwards and dispense
      pipette.move_to(above_location)       # Go straight up to detach drop and stay high

  ###
  ### YOUR CODE HERE to create your design
  ###

  num_rows = len(PATTERN)
  num_cols = len(PATTERN[0])

  # Group bead positions by color to minimize tip changes.
  # Shared edge beads are de-duplicated by the seen set.
  color_order = ['Blue', 'Pink', 'Purple']  # only colours in use
  seen = set()
  schedule = {color: [] for color in color_order}

  for row_idx, row in enumerate(PATTERN):
    for col_idx, color in enumerate(row):
      if color is None:
        continue                      # no drop for open/white zones
      pos = (row_idx, col_idx)
      if pos in seen:
        continue                      # de-duplicate shared edge beads
      seen.add(pos)
      schedule[color].append(pos)

  # Dispense all beads of each color before moving to the next
  for color in color_order:
    positions = schedule[color]
    if not positions:
      continue

    # Split into chunks that fit within one tip's max aspirate volume
    chunk_size = max(1, int(MAX_ASPIRATE_UL // BEAD_VOLUME_UL))  # int so it can be used as a slice step; at least one bead per tip
    chunks = [positions[i:i + chunk_size] for i in range(0, len(positions), chunk_size)]

    for chunk in chunks:
      pipette_20ul.pick_up_tip()
      pipette_20ul.aspirate(len(chunk) * BEAD_VOLUME_UL, location_of_color(color))

      for (row, col) in chunk:
        # Center the pattern on the plate
        x_offset = (col - (num_cols - 1) / 2.0) * BEAD_SPACING_MM
        y_offset = (row - (num_rows - 1) / 2.0) * BEAD_SPACING_MM
        adjusted_location = center_location.move(types.Point(x=x_offset, y=y_offset))

        dispense_and_detach(pipette_20ul, BEAD_VOLUME_UL, adjusted_location)

      # Clean up!
      pipette_20ul.drop_tip()


# Execute Simulation / Visualization -- don't change this code block
protocol = OpentronsMock(well_colors)
run(protocol)
protocol.visualize()
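
As a rough planning aid, the chunking logic in run() can be mirrored outside the protocol to estimate how many tips each colour consumes. This is only a sketch: the function name tips_needed, the bead counts, and the two volume values below are hypothetical placeholders, not the counts or constants of the actual PATTERN.

```python
import math

def tips_needed(beads_per_color, max_aspirate_ul, bead_volume_ul):
    """Tips per colour when each tip is loaded once and dispensed until empty."""
    beads_per_tip = max(1, int(max_aspirate_ul // bead_volume_ul))
    return {color: math.ceil(count / beads_per_tip)
            for color, count in beads_per_color.items() if count > 0}

# Hypothetical bead counts, for illustration only
example_counts = {"Blue": 45, "Pink": 20, "Purple": 7}
plan = tips_needed(example_counts, max_aspirate_ul=20, bead_volume_ul=1)
print(plan, "total tips:", sum(plan.values()))
```

Comparing the total against the tips remaining in the rack (counting from the starting tip) shows whether a design fits in a single run.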

AI Usage Documentation

Claude (Anthropic) was used throughout the coding process to:

  • Translate the geometric logic of Ndebele patterns into Python grid coordinates
  • Debug the pattern generation and visualization code
  • Optimize the bead dispensing schedule to minimize tip changes
  • Create the negative space typography effect for the “BIO” lettering
  • Implement the OpentronsMock simulation class for Colab testing

The script runs without errors in Google Colab simulation. The Colab notebook is available at: https://colab.research.google.com/drive/17fkFi4Un_IXNMcKub2AIfqoiVVPXfZ-F

EARLY VERSIONS BEFORE KNOWING COLOUR AVAILABILITY

Post Lab Homework


Published Paper

A directly relevant paper is Fang et al. (2025) in Nature Communications, which demonstrates circadian-gated gene expression circuits in bacteria, using automated temporal sampling to characterize rhythmic protein output over 24-hour cycles. This paper is not a peripheral reference; it is one of the primary foundational sources for my final project concept and is already cited in my main project documentation. The automation approach used to verify rhythmic expression in that work is precisely what I intend to replicate and extend with the Opentrons platform.

What I Intend to Automate

My project proposes a bacterial AND gate where the antimicrobial peptide Magainin is only expressed when two conditions are simultaneously true: the circadian regulator RpaA is active, and a pathogen signal is present. The core experimental challenge is verifying this gate actually works as designed, which requires sampling bacterial expression levels repeatedly across a full 24-hour cycle, under multiple conditions, without human error or gaps overnight. This is the automation task.

The Opentrons OT-2 would run an unattended 24-hour sampling protocol across three experimental conditions:

  • RpaA active + pathogen signal present (AND gate should trigger)
  • RpaA active + no pathogen signal (gate should stay silent)
  • RpaA inactive + pathogen signal present (gate should stay silent)

At each 2-hour timepoint, the robot samples each culture well, transfers to a measurement plate for fluorescence reading, and replaces the sampled volume with fresh media to keep cultures alive. This builds a full temporal expression profile across all three conditions without any overnight manual intervention.
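
The sampling plan described above can be sketched as a plain Python schedule before any robot code is written. This is a minimal sketch under stated assumptions: the well layout (one culture well per condition, measurement plate with rows as conditions and columns as timepoints), the function name build_sampling_plan, and the choice of 12 timepoints from hour 0 to hour 22 are all hypothetical. It makes no Opentrons calls; a real protocol would still need the waits between timepoints and the fresh-media replacement step.

```python
CONDITIONS = [
    "RpaA active + pathogen signal",
    "RpaA active + no pathogen signal",
    "RpaA inactive + pathogen signal",
]
SAMPLE_INTERVAL_H = 2
RUN_LENGTH_H = 24

def build_sampling_plan(conditions, interval_h, run_length_h):
    """Return one step per (timepoint, condition) with source and destination wells."""
    timepoints = list(range(0, run_length_h, interval_h))  # 0, 2, ..., 22
    plan = []
    for t_idx, hour in enumerate(timepoints):
        for c_idx, condition in enumerate(conditions):
            plan.append({
                "hour": hour,
                "condition": condition,
                # Assumed layout: culture plate holds one well per condition
                "source_well": f"A{c_idx + 1}",
                # Assumed layout: measurement plate row = condition, column = timepoint
                "dest_well": f"{'ABCDEFGH'[c_idx]}{t_idx + 1}",
            })
    return plan

plan = build_sampling_plan(CONDITIONS, SAMPLE_INTERVAL_H, RUN_LENGTH_H)
print(len(plan), "sampling steps")
for step in plan[:3]:
    print(step)
```

With three conditions and twelve timepoints, the whole time course fits in three rows of a single 96-well measurement plate, which leaves room for replicates.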

I would use Claude for the coding and guidance in the technical parts of this.

Why This Automation Matters

The AND gate only has meaning if you can show it is silent when it should be silent and active only at the right circadian phase with the right pathogen or other signal. That requires clean data across all three conditions at every 2-hour window through the night. Manual pipetting at 2am introduces the exact inconsistency that would make the rhythmic signal unreadable. The Opentrons removes that variable entirely.


Future Extensions

If access to Ginkgo Nebula becomes available, the next step would be submitting the AND gate genetic construct for scaled fermentation and characterization, using Nebula’s high-throughput infrastructure to screen circuit variants with different RpaA promoter strengths or pathogen-sensing thresholds. This would generate the kind of combinatorial data that would take months on a single benchtop robot.


References & Resources

Lecture Materials

Required Readings

  1. Fang et al. (2025). “Circadian-gated gene expression circuits in bacteria.” Nature Communications
  2. UCSD (2024). “Researchers Rebuild Microscopic Circadian Clock.” University of California San Diego press release
  3. Bilska et al. (2021). “Circadian rhythm in skin barrier function and antimicrobial peptides.” Experimental Dermatology

Software & Tools Used

  • Google Colab - Python script development and testing for Opentrons protocols
  • Python - Opentrons protocol scripting and pattern generation
  • Imgur - Image hosting for project visualization and Ndebele pattern references

Cultural & Mathematical Inspiration

  • Ndebele bead patterns and geometric design principles
  • Ethnomathematics - Indigenous mathematical knowledge systems
  • Traditional South Ndebele aesthetic and symmetry patterns

Project Concepts Explored

  • Circadian-controlled bacterial pigment systems
  • Light-responsive color-changing bacteria
  • UV-protective bioplastic materials with bacterial pigments
  • Mechanotransduction experiments with bacterial cultures
  • Bacterial AND gate with circadian gating (RpaA + pathogen signal triggering Magainin expression)

Cost Considerations

  • Twist Bioscience DNA synthesis pricing
  • Remote lab assistance availability assessment
  • UK-based protein order logistics and costs

AI Assistance

  • Claude (Anthropic) - Code development and technical guidance
    • Model: Claude Sonnet 4.5
    • Date(s) used: February, 2026
    • Tasks: Assisted with Python script development for Opentrons Ndebele pattern generation (“vibe-coding”), debugging protocol scripts and verifying scientific terminology.

Future Platforms

  • Ginkgo Nebula - Potential platform for scaled fermentation and high-throughput circuit variant screening

Additional Resources

  • HTGAA final project guidelines and requirements
  • Twist Bioscience pricing documentation
  • Remote lab capabilities at available nodes
  • Opentrons protocol documentation and API reference
  • Ndebele art and design pattern libraries
  • Ethnomathematics literature

Acknowledgments

  • Course instructors
  • TAs
  • Ndebele cultural heritage for geometric design inspiration

Week 4 HW: Protein Design Part 1

Part A: Conceptual Questions

How many molecules of amino acids do you take with a piece of 500g of meat? (avg amino acid ~100 Daltons)

Since I am a visual learner, I needed an analogy to grasp Daltons, grams, and moles. I imagine each amino acid as a finished LEGO model, and each tiny brick as a Dalton. When I weigh all the models together in a cupboard, I have 500 grams. I count the moles by dividing the total mass by the mass of one mole of models (~100 grams, because one model weighs ~100 Daltons). Then, multiplying by Avogadro’s number, I see how many individual models I have in total. In scientific terms, I compute the number of moles by dividing 500 grams by 100 grams per mole, which gives 5 moles. Avogadro’s constant (according to a Google search) is the number of particles, like atoms or molecules, in one mole of a substance, equal to approximately 6.022 times 10 to the 23. Multiplying 5 moles by this constant yields approximately 3.0 times 10 to the 24 molecules of amino acid.
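
The same arithmetic as a short check, assuming (as the question does) that the whole 500 g is amino acids averaging 100 g/mol; the variable names are just for illustration:

```python
AVOGADRO = 6.022e23                 # particles per mole
mass_g = 500.0                      # mass of the meat, treated as pure amino acids
avg_mass_g_per_mol = 100.0          # ~100 Da per amino acid = ~100 g per mole

moles = mass_g / avg_mass_g_per_mol
molecules = moles * AVOGADRO
print(f"{moles:.1f} mol -> {molecules:.2e} molecules")
```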

Why do humans eat beef but do not become a cow, eat fish but do not become fish?

We don’t become a cow or a fish, because we’re only using parts of the cow and fish to continue building on what already exists, which is our human body. In other words, digestion breaks down the proteins into amino acids, and then our body uses its own genetic instructions to reassemble those pieces into human proteins, ensuring we stay uniquely ourselves.

Why are there only 20 natural amino acids?

There are only 20 natural amino acids because, although their combinations can form infinite possibilities, evolution only needed these 20 to create all the proteins we rely on. Their chemical properties allow for immense diversity in protein structure, and this set is perfectly suited to the way DNA encodes and guides their assembly, giving us the versatility we need without adding more complexity.

Where did amino acids come from before enzymes that make them, and before life started?

Amino acids likely formed from simple chemicals dissolved in warm ocean water on early Earth, where energy sources like heat or lightning sparked chemical reactions. In a way, you can think of a modern dam as a kind of micro-ecosystem—just as water and energy flow through a dam, creating pockets of life, early oceans created the right conditions for these amino acids to form, eventually leading to the first building blocks of life.

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

A typical alpha helix built from L-amino acids forms a right-handed spiral. If you use D-amino acids instead, the entire helix reverses its handedness and forms a left-handed spiral. It is important to note that not all amino acid sequences form alpha helices; some prefer other structures like beta sheets. So, the handedness follows the chirality of the amino acids, while the structure depends on the overall sequence, which determines how the chain folds and stabilizes.

Can you discover additional helices in proteins?

Because proteins have long chains of different amino acids, they can fold in all sorts of ways, and that means new helices or other patterns can be identified within them, especially with advanced tools like structural prediction or experimental techniques.

Why are most molecular helices right-handed?

The right-handedness is actually favored in DNA due to the natural chirality of its sugar backbone and the way it interacts with water and other molecules. So, that structural preference is built right into how the backbone forms. Keratin and hair curl were helpful for me to think about.

Why do β-sheets tend to aggregate?

So, beta sheets tend to aggregate because, like silk fibroin, they form straight, aligned strands that stack side by side. In silk, these parallel sheets create strong, stable fibers, but in proteins, this same alignment lets the sheets stack excessively, exposing those hydrogen bonds and promoting aggregation. So, just like silk’s strength comes from its sheet alignment, aggregation in proteins happens when these sheets stack and bind too readily.

What is the driving force for β-sheet aggregation? Why do many amyloid diseases form β-sheets?

Because these beta sheets stack so easily, they misfold and form these stable aggregates. In diseases like Alzheimer’s or Parkinson’s, these aggregated beta sheets build up, disrupting normal cell function and triggering the disease process. Tau tangles are a classic example of beta sheet misfolding driving disease.

Can you use amyloid β-sheets as materials?

Since silk fibroin is based on beta sheets and is already a natural, strong material, researchers have been exploring ways to harness amyloid beta sheets similarly. Amyloid structures are extremely stable, so with careful design, scientists are looking at them as potential biomaterials.


Part B: Protein Analysis and Visualization

  1. Briefly describe the protein you selected and why you selected it.

    GABARAPL2 (GABA Type A Receptor Associated Protein Like 2), UniProt sp|P60520|GBRL2_HUMAN.

  2. Identify the amino acid sequence of your protein.

    MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSSLTMGQLYEKEKDEDGFLYVAYSGENTFGF

  3. How long is it? What is the most frequent amino acid?

    Length: 117 amino acids. The most frequent amino acid is Lysine (K), appearing 12 times (10.3% of the sequence). (Image from Colab output still needed.)

  4. How many protein sequence homologs are there? (Use UniProt BLAST)

    Homologs: 250 homologs found via UniProt BLAST. Top matches come from rat, mouse, human, and bovine, indicating strong conservation across mammals and reflecting the fundamental evolutionary importance of this protein.

  5. Does your protein belong to any protein family?

    Protein family: GABARAPL2 belongs to the ATG8 family, part of the broader GABARAP subfamily of autophagy-related proteins.

  6. Identify the structure page of your protein in RCSB.

    RCSB structure page: PDB entry 7LK3. Crystal structure of untwinned human GABARAPL2.

  7. When was the structure solved? Is it good quality? (Resolution: smaller = better, aim < 2.70 Å)

    Structure quality: Deposited February 1, 2021, released May 12, 2021. Resolution of 1.90 Å, which is excellent quality and well below the 2.70 Å threshold.

  8. Are there any other molecules in the solved structure apart from protein?

    Other molecules: Yes. 1,2-Ethanediol (EDO) is present as a ligand.

  9. Does your protein belong to any structure classification family?

    GABARAPL2 belongs to the ubiquitin-like superfamily under the beta-grasp fold in SCOP classification. Like other ATG8 proteins, GABARAPL2 consists of an N-terminal helical extension preceding a four-stranded beta-sheet in a ubiquitin-like beta-grasp fold.

  10. Open the structure in 3D visualization software (PyMol):

    • Visualize as “cartoon”, “ribbon”, and “ball and stick”
    • Color by secondary structure — more helices or sheets?
    • Color by residue type — hydrophobic vs hydrophilic distribution?
    • Visualize the surface — any binding pockets?
When colored by secondary structure, GABARAPL2 shows a clear dominance of red (helices) over yellow (beta sheets). Green loops connect these elements throughout the structure.

Hydrophobic residues (orange) concentrate in the protein core, while blue dominates the outer surface. This showcases hydrophobic residues being hidden in the middle away from the aqueous environment.

This surface visualization reveals a clear hydrophobic indentation in the middle of the structure, corresponding to the LIR docking site where GABARAPL2 interacts with autophagy receptors.
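
The length and composition figures reported in Part B can be reproduced directly from the sequence with the standard library. A minimal sketch; the variable names are illustrative:

```python
from collections import Counter

# GABARAPL2 sequence as given in Part B (UniProt P60520)
SEQ = (
    "MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQF"
    "MWIIRKRIQLPSEKAIFLFVDKTVPQSSLTMGQLYEKEKDEDGFLYVAYSGENTFGF"
)

counts = Counter(SEQ)
residue, n = counts.most_common(1)[0]   # most frequent amino acid and its count
print(f"length = {len(SEQ)}")
print(f"most frequent = {residue} ({n} times, {100 * n / len(SEQ):.1f}%)")
```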


Part C: Using ML-Based Protein Design Tools

C1. Protein Language Modeling

Deep Mutational Scans

  • Use ESM2 to generate an unsupervised deep mutational scan based on language model likelihoods
  • Can you explain any particular pattern? (choose a residue and mutation that stands out)
  • (Bonus) Compare language model predictions to experimental scans

The brightest yellow spot in the heatmap appears at position 60, mutation to Glutamate (E), meaning the model predicts this change would be highly favorable. This makes sense in a metabolic context, as Glutamate’s charged nature supports the protein’s membrane interactions during autophagy and fasting states.

The darkest purple spots appear around positions 54-57 at Cysteine (C) and at position 64 at Tryptophan (W), meaning the model strongly disfavors these mutations. Cysteine in particular stands out as consistently disfavored.
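
To make the heatmap colours concrete: one common way to score a mutation from language model likelihoods is the log-likelihood ratio between the mutant and the wild-type residue at that position. The sketch below assumes that scoring choice (the course notebook may compute it differently) and uses made-up probabilities for a single position rather than real ESM2 output.

```python
import math

def mutation_scores(probs_at_position, wild_type):
    """Log-likelihood ratio of each amino acid versus the wild type at one position."""
    wt_logp = math.log(probs_at_position[wild_type])
    return {aa: math.log(p) - wt_logp for aa, p in probs_at_position.items()}

# Made-up probabilities for one position (NOT real model output)
toy_probs = {"F": 0.50, "E": 0.25, "W": 0.20, "C": 0.05}
scores = mutation_scores(toy_probs, wild_type="F")

for aa, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"F -> {aa}: {score:+.2f}")
```

Under this convention the wild type scores zero, bright cells are mutations the model finds about as likely as (or more likely than) the wild type, and dark cells are strongly negative scores.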

Latent Space Analysis

  • Embed proteins in reduced dimensionality using the provided sequence dataset
  • Analyze neighborhoods — do they approximate similar proteins?
  • Place your protein in the map and explain its position and similarity to neighbors

The 3D t-SNE plot shows a large dense central cluster of proteins with outliers scattered at the edges. Proteins in the same neighborhood share similar sequence embeddings, suggesting structural and functional similarity. GABARAPL2, as a member of the highly conserved ubiquitin-like superfamily, would likely position itself near the central core of the cloud, close to other small globular autophagy and ubiquitin-related proteins. Its neighbors would likely include other ATG8 family members.

C2. Protein Folding

  • Fold your protein with ESMFold — do predicted coordinates match the original structure?
  • Try mutations, then larger sequence changes — is the structure resilient?

The ESMFold predicted structure closely matches the original crystal structure. Both show the characteristic beta-grasp fold with a central beta sheet core surrounded by helices, and the overall globular compact shape is preserved.

C3. Protein Generation

  • Use ProteinMPNN to inverse-fold your protein backbone and propose sequence candidates
  • Analyze predicted sequence probabilities vs the original sequence
  • Input the new sequence into ESMFold and compare the predicted structure to original

ProteinMPNN generated 4 sequence candidates from the 7LK3 backbone with sequence recovery rates between 46–52% and consistent scores around 0.81–0.84. The probability heatmap shows scattered high-confidence positions (yellow) where the backbone strongly constrains the amino acid choice, surrounded by flexible positions with lower confidence. Despite roughly half the sequence changing, the backbone fold is preserved, suggesting that many different sequences can encode the same GABARAPL2 structure.

When the new ProteinMPNN sequence was folded with ESMFold, the overall shape stayed the same. But there were some small differences: the helices shifted slightly, the beta sheets moved a little, and the central loop region pulled closer together. This suggests that even though roughly half the amino acids changed, the protein still folds into essentially the same shape. The structure is resilient.
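
Sequence recovery, as used above, is simply the fraction of positions where the designed sequence keeps the native residue. A minimal sketch follows; the "designed" fragment is a made-up example, not an actual ProteinMPNN output, and the function name is illustrative.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

native_fragment = "MKWMFKEDHS"      # first 10 residues of GABARAPL2
designed_fragment = "MKWLFREDHT"    # hypothetical redesign, for illustration only
print(f"recovery = {sequence_recovery(native_fragment, designed_fragment):.0%}")
```

A recovery near 50% over the full 117 residues therefore means about 58 positions were kept and the rest were substituted.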


Part D: Group Brainstorm on Bacteriophage Engineering

I decided to try option 3: even if it fails, it could still help eliminate a possible pathway to the end goal, and it simply seemed more interesting. General reminder note: loop regions and terminal extensions are safer engineering targets than core structural elements.

Higher Toxicity of the MS2 Lysis Protein:

Goal: Increase the toxicity of the MS2 L protein so it lyses bacterial cells faster and more completely.

Approach:

  • Use a protein language model (ESM or similar) to identify which amino acid positions in the L protein are most likely involved in membrane disruption
  • Propose mutations at those positions using ProteinMPNN to suggest alternative amino acids that might make membrane interaction more aggressive
  • Use AlphaFold-Multimer to model how the mutant L protein interacts with its bacterial target (DnaJ and the membrane)
  • Compare predicted binding strength and structural changes between original and mutant versions

Why these tools help:

  • Language models capture evolutionary patterns across many proteins, helping identify positions where changes are most likely to matter
  • AlphaFold-Multimer lets you check if your proposed mutations actually change how the protein docks with its bacterial target

Potential pitfalls:

  • The exact mechanism of membrane disruption by the L protein is not fully understood, so mutations may target the wrong part of the protein (the “drill” in my analogy)
  • Limited training data exists specifically for phage-bacteria lysis interactions, so predictions may be less reliable than for well-studied proteins

Pipeline schematic first draft:

L protein sequence → ESM (identify key positions) → ProteinMPNN (propose mutations) → AlphaFold-Multimer (validate structure and interaction) → compare mutant vs original

References & Resources

Lecture Materials

Protein Analyzed

  • GABARAPL2 (GABA Type A Receptor Associated Protein Like 2)
  • UniProt ID: sp|P60520|GBRL2_HUMAN
  • PDB Structure: 7LK3 (1.90 Å resolution, deposited Feb 2021, released May 2021)
  • 117 amino acids, ATG8 family, ubiquitin-like superfamily

Software & Tools Used

  • UniProt - Protein sequence database and BLAST homolog searches
  • RCSB Protein Data Bank - Protein structure database (PDB: 7LK3)
  • PyMOL - 3D protein structure visualization and analysis
  • Google Colab - Running ESM2, ProteinMPNN, ESMFold analyses
  • ESM2 - Protein language model for deep mutational scanning and sequence embeddings
  • ESMFold - Protein structure prediction
  • ProteinMPNN - Inverse folding and sequence design
  • t-SNE - Dimensionality reduction for latent space analysis
  • Imgur - Image hosting for visualization documentation

Required Readings

  • GABARAPL2 autophagy function literature
  • ATG8 family protein structure and function papers
  • Protein folding and stability principles
  • Amyloid formation and beta-sheet aggregation mechanisms

AI Assistance

  • Claude (Anthropic) - Protein analysis and ML tool interpretation
    • Model: Claude Sonnet 4.5
    • Date(s) used: February, 2026
    • Tasks: Assisted with understanding protein structure visualization principles, interpreting ESM2 deep mutational scan results, explaining t-SNE embeddings and protein neighborhoods, and clarifying ProteinMPNN sequence recovery metrics. Also helped develop analogies for complex concepts and checked whether the homework was correct.

Bacteriophage Engineering Project

  • Option 3: Increase MS2 lysis protein (L protein) toxicity
  • Tools: ESM language model, ProteinMPNN, AlphaFold-Multimer
  • Target: Enhanced membrane disruption and faster bacterial lysis

Additional Resources

  • SCOP protein structure classification database
  • MS2 bacteriophage literature
  • Membrane disruption mechanism papers
  • DnaJ protein interaction studies

Acknowledgments

  • Course instructors for protein design tutorials
  • TAs for PyMOL visualization assistance
  • Colab community for ML tool notebooks

Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Background

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine to Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.


Generate Binders with PepMLM

UNIPROT

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

CHANGED TO MUTATED VERSION (POSITION 4, K TO V)

MATVAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
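
A small helper that checks the expected wild-type residue before substituting can guard against numbering mix-ups when introducing a point mutation. SOD1 variants such as A4V are commonly numbered without the initiator methionine, in which case residue 4 corresponds to index 5 of the UniProt sequence. Which convention the assignment intends is an assumption to confirm against the course materials. The function and parameter names below are hypothetical.

```python
# Wild-type human SOD1 as listed in the UNIPROT block above (spaces removed)
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

def apply_point_mutation(seq, wild_type, position, mutant, counts_initiator_met=True):
    """Substitute one residue after checking the expected wild-type residue.

    position is 1-based. With counts_initiator_met=False, position 1 is the
    residue immediately after the leading Met.
    """
    index = position - 1 if counts_initiator_met else position
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + mutant + seq[index + 1:]

# Numbering without the initiator Met: residue 4 is alanine
a4v = apply_point_mutation(SOD1_WT, "A", 4, "V", counts_initiator_met=False)
print(a4v[:10])

# Numbering with the initiator Met: residue 4 is lysine, so the check fails
try:
    apply_point_mutation(SOD1_WT, "A", 4, "V", counts_initiator_met=True)
except ValueError as err:
    print(err)
```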

SETTING UP AND RUNNING PepMLM IN COLAB



GENERATED 4 PEPTIDES OF LENGTH 12

MORE DETAIL IN INTERACTIVE VIEW BELOW



USED GOOGLE GEMINI TO HELP INTERPRET AND UNDERSTAND THE OUTPUT



Evaluate Binders with AlphaFold3

When I looked at the five structures, the known binder really stood out as it sat closely tucked against SOD1 and scored 0.73, which was by far the highest. You could see it engaging deeply with the protein. The PepMLM peptides told a different story. The best ones, Peptide3 and Peptide1, appeared to sit near the top of the protein around the loop region, but they looked more like they were resting on the surface rather than really grabbing onto it. Peptide0 was the weakest and it looked almost detached, just floating near the protein rather than making real contact. None of the generated peptides came close to the known binder, which shows that while PepMLM gave us a starting point, the peptides still need improvement to properly engage SOD1-A4V. (This was my favorite part due to the visuals).


Part 3: Evaluate Properties with PeptiVerse

After evaluating the results below, I would advance WLVYAVAAAWKA because it is the only peptide with medium binding affinity (7.247 pKd/pKi), compared to weak binding for the others. It is also well-balanced in terms of hemolysis risk with a low probability of 0.133, and its net charge of 0.76 at pH 7 is nearly neutral, which should help with both solubility and cellular uptake without causing charge-related toxicity. Although its ipTM score of 0.39 in AlphaFold3 was not the highest, the combination of improved predicted binding affinity and favorable therapeutic properties makes it the strongest candidate for further development. While Peptide3 had a slightly higher ipTM of 0.44, it showed the weakest predicted binding affinity of 5.498 and therefore does not balance structural and therapeutic properties as well.

Part 4: Generate Optimized Peptides with moPPIt

It took 39min to run.

The moPPIt peptides differ from PepMLM in a key way: moPPIt allowed me to specify exactly which region of SOD1 I wanted to target, whereas PepMLM just generated peptides that looked plausible without that control. With Peptide 2 from moPPIt, I can see it’s actually engaging the N-terminal region where A4V sits, which is exactly what I designed it to do. PepMLM couldn’t guarantee that level of specificity.

Before advancing any peptide to clinical studies, I would need to do much more work. First, I’d validate the binding predictions with actual lab experiments that measure real binding affinity. Most importantly, I’d likely run moPPIt again with different target regions on SOD1 in order to generate a larger panel of candidates and pick the best performers across all validation steps. No single computational prediction is enough to move forward to the clinical setting.

Part B: Optional

Part C: Final Project: L-Protein Mutants

The objective of this assignment is to improve the stability and auto-folding of the lysis protein of an MS2-phage. This mechanism is key to understanding how phages may help address antibiotic resistance.

After going through the readings, including the group final project document, a Plan A would be the following. (This stays within scope, though MurJ and multi-target approaches seem interesting…)

1. Use computational tools like AlphaFold2 or ProteinMPNN to identify mutations that improve intrinsic stability and auto-folding of the lysis protein.

2. Target mutations that strengthen the hydrophobic core, eliminate aggregation-prone regions, or introduce stabilising interactions like salt bridges.

3. Engineer the lysis protein to fold correctly without requiring DnaJ or any other bacterial chaperone.

4. Design mutations that also accelerate oligomerisation or enhance membrane pore-forming activity for faster lysis.

5. Synthesise the mutant gene via Twist, clone into plasmid using Gibson Assembly, validate structural integrity with Nuclera, then test in E. coli.


References & Resources

Lecture Materials

  • Week 5 Lecture - Protein Design Part II, Pranam Chatterjee, Gabriele Corso
  • Week 5 Lab - Protein Design Part II Lab, March 5-6, 2026

Software & Tools Used

  • UniProt
  • PepMLM
  • AlphaFold3
  • PeptiVerse
  • moPPIt

AI Assistance

  • Claude (Anthropic) - Protein design concepts
    • Model: Claude Sonnet 4.5
    • Date(s) used: March, 2026
    • Tasks: Acted as mentor (Skills) in conversations about unfamiliar and technical areas. Checked homework was correct.

Additional Resources

  • Advanced protein design literature
  • Computational protein engineering tools

Acknowledgments

  • Course instructors and TAs

Week 6 HW: Genetic Circuits Part I

DNA Assembly

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The Phusion HF PCR Master Mix contains several key components that are already pre-combined for convenience. Phusion DNA Polymerase is the core enzyme responsible for copying the DNA template. It has built-in proofreading (3'→5' exonuclease) activity, which is what makes it high-fidelity: it removes misincorporated bases and so reduces errors during amplification. The dNTPs provide the nucleotide building blocks that get incorporated into the new DNA strand. MgCl₂ (magnesium) acts as an essential cofactor that activates the polymerase. The reaction buffer (oven conditions in my analogy) maintains the correct pH and ionic environment for the reaction to work. For this particular lab, precise mutagenesis of the amilCP chromophore region was required, so the high-fidelity polymerase is especially important; it helps ensure there are no unintended amino acid changes beyond the designed mutation.

What are some factors that determine primer annealing temperature during PCR?

Several factors affect the temperature at which a primer successfully binds to its target on the DNA template. First, secondary structure is something to avoid. If a primer folds back on itself it is like a blurry photograph that cannot be read properly, meaning it cannot find its matching location on the template regardless of temperature.

Second, GC content affects annealing temperature. Primers with more G and C bases require higher temperatures because GC pairs bond more strongly than AT pairs. In this lab the backbone primers anneal at 57°C while the color insert primers anneal at 53°C, reflecting differences in their GC content.

Third, primer length matters. A longer primer is like a photograph that also shows the surrounding context, making it a more specific match. Longer primers bind more strongly and therefore require higher annealing temperatures. These factors were carefully balanced during primer design, aiming for a Tm range of 52–58°C with primer pairs kept within 5°C of each other.
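
The effect of GC content and length on melting temperature can be sketched with two textbook rule-of-thumb formulas. This is a rough illustration only: the function names and the example primers are hypothetical, and real primer design (including for Phusion) normally relies on a nearest-neighbor calculator from the enzyme vendor, which gives different numbers.

```python
def estimate_tm(primer: str) -> float:
    """Rough melting temperature (°C) from base composition alone."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    n = gc + at
    if n < 14:
        # Wallace rule for very short oligos: 2 °C per A/T, 4 °C per G/C.
        return 2.0 * at + 4.0 * gc
    # Common length-corrected approximation for longer primers.
    return 64.9 + 41.0 * (gc - 16.4) / n


def pair_is_compatible(fwd: str, rev: str, max_diff: float = 5.0) -> bool:
    """Check that two primers have estimated Tm values within max_diff °C."""
    return abs(estimate_tm(fwd) - estimate_tm(rev)) <= max_diff


# Hypothetical primers, not the ones used in the lab.
base_primer = "ATGCATGCATGCATGCATGC"        # 20-mer, 50% GC
gc_rich_primer = "GGGCATGCATGCATGCATGC"     # 20-mer, 60% GC
longer_primer = "ATGCATGCATGCATGCATGCATGC"  # 24-mer, 50% GC

for name, seq in [("base", base_primer), ("GC-rich", gc_rich_primer), ("longer", longer_primer)]:
    print(f"{name}: {estimate_tm(seq):.1f} °C")
```

Both the GC-rich and the longer primer come out with a higher estimated Tm than the base primer, which is the trend described above.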

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR and restriction enzyme digests differ in terms of precision and flexibility. PCR is more flexible and suitable for bespoke mutation designs, giving you control over exactly where a fragment begins and ends by designing the primers yourself. Restriction enzyme digests are more limited in that they can only cut where their recognition sequence naturally exists in the DNA, but this makes them faster and more straightforward when you already know exactly which sequence you need.

I would use PCR when attempting to design a mutation, as in this lab where the chromophore color changes were introduced through deliberate primer mismatches. I would use restriction enzyme digests when the recognition sites are already conveniently placed and the desired sequences are already known, as this would save time.

In terms of protocol, PCR requires designing primers, running denature, anneal and extend cycles in a thermocycler, cleaning up the original template with a DpnI digest, and then purifying the DNA. Restriction enzyme digests are more straightforward, requiring only choosing the right enzyme for the recognition site, incubating the DNA with the enzyme at 37°C, and running a gel to confirm the correct cut. No heating cycles or template cleanup are needed.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

There are several ways to ensure fragments are appropriate for Gibson cloning.

First, correct overlaps must be present. In this lab the primers were designed from the start with 20-40 bp overhangs that match the end of the adjacent fragment, so neighbouring fragments share an overlapping sequence and can recognise and join each other during assembly.

Second, fragment size should be confirmed by running a diagnostic gel. If the band appears at the wrong size then the PCR was unsuccessful and the fragment would not be appropriate for Gibson cloning.

Third, the DNA must be clean and concentrated enough. The Nanodrop measurement confirms concentration is above 30ng/µL. Contaminants from PCR can inhibit the Gibson Assembly reaction.

Fourth, the original template must be removed. The DpnI digest ensures the original methylated mUAV plasmid is not carried over, which would otherwise produce background colonies of the unmutated purple protein.

Finally, the correct molar ratio must be used. Gibson Assembly works best at a 2:1 insert to vector ratio to ensure efficient and complete assembly.
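
Because the 2:1 ratio is a molar ratio and not a mass ratio, a short fragment needs far fewer nanograms than a long backbone. The sketch below converts between the two using the usual approximation of 650 Da per base pair of double-stranded DNA. The function names and the example fragment sizes are hypothetical, not the values from this lab.

```python
AVG_BP_MASS_DA = 650.0  # approximate average mass of one dsDNA base pair


def ng_to_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass in ng to pmol for a fragment of the given length."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS_DA)


def insert_ng_for_ratio(vector_ng: float, vector_bp: int,
                        insert_bp: int, ratio: float = 2.0) -> float:
    """Mass of insert (ng) needed to reach the insert:vector molar ratio."""
    insert_pmol = ratio * ng_to_pmol(vector_ng, vector_bp)
    return insert_pmol * insert_bp * AVG_BP_MASS_DA / 1000.0


# Hypothetical example: 50 ng of a 3000 bp backbone and a 300 bp insert.
vector_pmol = ng_to_pmol(50.0, 3000)
insert_ng = insert_ng_for_ratio(50.0, 3000, 300, ratio=2.0)
print(f"vector: {vector_pmol:.4f} pmol, insert needed: {insert_ng:.1f} ng")
```

In this made-up case only 10 ng of insert is needed to give twice as many insert molecules as the 50 ng of backbone.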

How does the plasmid DNA enter the E. coli cells during transformation?

The plasmid enters the E. coli cells through a process called heat shock transformation.

First, the cells are made chemically competent using CaCl₂. This partially neutralizes the repulsion between the negatively charged cell membrane and the negatively charged DNA, allowing the DNA to associate with the cell surface.

Next the cells are kept on ice, which makes the membrane more rigid and stable. Then the cells are heat shocked at exactly 42°C for 45 seconds, which temporarily disrupts the membrane and allows the plasmid to enter the cell by diffusion. The cells are then immediately returned to ice so the membrane stabilizes and closes again.

So essentially the process is: make the membrane rigid with ice, give it a heat shock to open it briefly, then put it back on ice to close it again with the plasmid now inside.

After heat shock, SOC media (Note my nutrient rich broth analogy) is added to help the cells recover and begin multiplying. Finally the cells are plated on chloramphenicol agar, where only cells that successfully received the plasmid will survive and grow.

Describe another assembly method in detail, such as Golden Gate Assembly. Explain the other method in 5 to 7 sentences plus diagrams, either handmade or online.

Golden Gate Assembly is a method of connecting DNA fragments together using custom 4 base sticky ends. It works by sending in a Type IIS restriction enzyme that acts like a self destructing instruction manual, cutting at a defined location outside its recognition site and then removing itself in the process, leaving behind unique sticky ends that have been designed to only connect to one specific matching partner. These sticky ends are self sorting, acting like magnets that can only attract their intended match and nothing else. Once the fragments are correctly joined the assembly is scarless, meaning no trace of the recognition site remains in the final product.

This differs from Gibson Assembly which uses an exonuclease, polymerase and ligase, and requires longer overlaps of 20-40bp between fragments rather than the 4 base sticky ends of Golden Gate. Golden Gate cycles between cutting and ligation temperatures repeatedly, whereas Gibson Assembly runs isothermally at 50°C. Because incorrect assemblies get re-cut and correct ones accumulate, Golden Gate is highly efficient and can assemble many parts simultaneously in one tube, making it more scalable than Gibson Assembly which typically handles two to six parts.

Model this assembly method with Benchling.

Golden Gate Assembly modeled in Benchling using mUAV plasmid (MG252981.1) as starting reference:

Figure 1: Backbone fragment (38bp) with the BsaI recognition site (GGTCTC, followed by one spacer base N) annotated at its end. BsaI cuts just outside this site, so the site is removed from the final product and a 4-base sticky end is exposed.

Figure 2: Color insert fragment (49bp) containing the orange chromophore mutation GTTGGA replacing original TGTCAG. This sequence changes the chromophore amino acids to produce orange instead of purple. Together these two fragments would be combined with BsaI and ligase in one tube to produce a scarless circular plasmid carrying the orange mutation.
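
To check where BsaI would cut in a designed fragment, the sticky ends can be predicted directly from the sequence. This is a minimal sketch that only scans the strand as written (sites in the reverse orientation, which read GAGACC on this strand, are not handled), and the demo sequence is hypothetical, not the Benchling fragment above.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence
SPACER = 1            # one base between the site and the top-strand cut
OVERHANG = 4          # length of the 5' sticky end that BsaI leaves


def bsai_overhangs(seq: str) -> list[tuple[int, str]]:
    """Return (site_start, overhang) for each forward-oriented BsaI site."""
    seq = seq.upper()
    results = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + SPACER
        overhang = seq[cut:cut + OVERHANG]
        # Skip sites too close to the end to leave a full 4-base overhang.
        if len(overhang) == OVERHANG:
            results.append((start, overhang))
        start = seq.find(BSAI_SITE, start + 1)
    return results


def overhangs_are_unique(overhangs: list[str]) -> bool:
    """Golden Gate needs every junction to use a different sticky end."""
    return len(set(overhangs)) == len(overhangs)


demo = "AAGGTCTCAGTTGCCC"  # hypothetical sequence for illustration
print(bsai_overhangs(demo))
```

Listing the overhangs of every part and passing them to `overhangs_are_unique` is a quick sanity check that each sticky end has only one intended partner.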


Assignment: Asimov Kernel

Did not have access to Asimov Kernel. (I attended the MIT Review and am not sure whether Nodes have access. I also signed up to be a beta tester when it becomes available.)


References & Resources

Lecture Materials

  • Week 6 Lecture - Genetic Circuits Part I: Assembly Technologies, Doug Densmore, Traci Haddock
  • Week 6 Lab - Gibson Assembly, March 12-13, 2026

Software & Tools Used

  • Benchling

AI Assistance

  • Claude (Anthropic) - Genetic circuit design
    • Model: Claude Sonnet 4.5
    • Date(s) used: March, 2026
    • Tasks: Acted as mentor (Skills) in conversations about unfamiliar and technical areas. Checked homework was correct.

Additional Resources

  • Gibson assembly protocol documentation
  • Genetic circuit assembly technologies literature

Acknowledgments

  • Course instructors and TAs

Week 7 HW: Genetic Circuits Part II

Intracellular Artificial Neural Networks (IANNs)

What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Boolean genetic circuits are binary; a signal is either present or absent, on or off. IANNs add nuance by incorporating quantity: not just whether a signal is present, but how much, and how that amount combines with other weighted inputs to determine output. This matters biologically because cells are not rigid systems. Gene expression fluctuates due to stochastic noise and biological drift. Boolean circuits are brittle in this context, while IANNs, by distributing computation across many weighted inputs, are more robust to that natural variability.
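
The brittleness argument can be illustrated with a toy comparison between a thresholded Boolean AND and a single weighted node. This is only a sketch: the sigmoid is a stand-in for a real biological response curve, and the weights, bias, threshold, and input levels are all hypothetical.

```python
import math


def boolean_and(x1: float, x2: float, threshold: float = 0.5) -> float:
    """Boolean view: each input is either above threshold (on) or not (off)."""
    return 1.0 if (x1 > threshold and x2 > threshold) else 0.0


def perceptron(x1: float, x2: float,
               w1: float = 4.0, w2: float = 4.0, bias: float = -4.0) -> float:
    """Graded view: output scales with the weighted sum of input levels."""
    z = w1 * x1 + w2 * x2 + bias
    return 1.0 / (1.0 + math.exp(-z))


# Two nearly identical cells: input 1 drifts slightly across the threshold.
cell_a = (0.55, 0.55)
cell_b = (0.45, 0.55)

print("Boolean:", boolean_and(*cell_a), boolean_and(*cell_b))
print("Weighted:", round(perceptron(*cell_a), 3), round(perceptron(*cell_b), 3))
```

A small drift in one input flips the Boolean output completely, while the weighted node's output changes only slightly.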

Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

Many inflammatory diseases are circadian-gated. Asthma attacks, rheumatoid arthritis flares, and cardiovascular events cluster at specific phases of the biological clock. A Boolean circuit cannot capture this; it can detect whether inflammation is present, but not whether it is occurring at the wrong time. That distinction is clinically meaningful, and it is what an IANN could resolve.

Two circuits were designed to explore this. The KaiClock circuit integrates circadian phase (X1: RpaA) with inflammatory state (X2: InflammationSensor), producing a graded fluorescent output that scales with the weighted combination of both inputs. However, the part naming conventions used in KaiClock did not register correctly in the Neuromorphic Wizard simulator, so Durin, a parallel AND gate design, was built and submitted as the working version instead.

Durin runs two parallel AND gates: X1 carries PgU with mMaroon1, and X2 carries PgU_rec_CasE with eBFP2. Both gates must be satisfied simultaneously before CasE releases the final mNeonGreen output. Rather than a weighted gradient, Durin enforces parallel signal verification, two conditions checked at once before committing to output.

Durin was the circuit submitted for possible run at Weiss Lab. Together the two designs represent an iterative process: KaiClock aimed to establish the biological concept, and Durin aimed to be an executable implementation under simulator constraints.

Limitations include irreversibility from recombinase components, susceptibility to molecular noise, and risk of crosstalk with endogenous cellular machinery.


Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

In this design, layer 1 is a hidden layer that does its own computation, and its output, the endoribonuclease, becomes the input to layer 2, which produces the fluorescent protein.
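
The hidden-layer idea can be written as a toy two-layer calculation. This is a sketch under stated assumptions: the sigmoid response, all weights and biases, and the choice that the endoribonuclease (ERN) represses the reporter are hypothetical. Making `w_ern` positive would model an ERN that activates the output instead.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer1_ern(x1: float, x2: float,
               w1: float = 3.0, w2: float = 3.0, bias: float = -3.0) -> float:
    """Hidden node: the two inputs set the endoribonuclease level (0 to 1)."""
    return sigmoid(w1 * x1 + w2 * x2 + bias)


def layer2_fluorescence(ern: float, w_ern: float = -6.0, bias: float = 3.0) -> float:
    """Output node: ERN has a negative weight because it is assumed to repress the reporter."""
    return sigmoid(w_ern * ern + bias)


def mlp(x1: float, x2: float) -> float:
    """Layer 1 output (ERN) feeds directly into layer 2 (fluorescent protein)."""
    return layer2_fluorescence(layer1_ern(x1, x2))


for inputs in [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]:
    print(inputs, round(mlp(*inputs), 3))
```

With these made-up parameters, fluorescence is high when both inputs are absent and falls as the inputs drive more endoribonuclease.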

Below is the Weiss Lab run of my test IANN, a bias-dependent parallel AND gate with adjusted weighting. I was focused on trying to understand AND gates, so it was unexpected to see it run, as I thought there would be many more IANN submissions.

Fungal Materials

Fungal materials are made from mycelium, the network-like body of a fungus. This network is also being studied for its possible contribution to communal living and alternative methods of communication through its structure and system. Mycelium composites, such as those grown from oyster mushrooms on agricultural waste, are used commercially as biodegradable packaging and leather alternatives, with companies like Ecovative leading production.

Various other materials are being fabricated utilizing fungal spores, and fungal pigments are also in use. Ink cap mushrooms, for example, undergo autodigestion and become a liquid black ink. It is worth noting that fungal pigments are not very lightfast, and prolonged UV exposure will degrade the color, which remains a significant limitation compared to synthetic dyes. Spalting, where fungi create dark patterned lines as they compete for territory in wood, is another application used in decorative woodworking. The core advantage of fungal materials over traditional counterparts such as synthetic foam or leather is that they are biodegradable, compostable, and generally healthier for human and environmental use. Their disadvantages include lower structural strength, moisture sensitivity, and slower production cycles. Extending the lightfastness of fungal pigments through mordants and fixatives, drawing on approaches used with natural pigments and mineral ochres, represents a personally compelling area of further research.

Two areas stand out as compelling targets for genetic engineering in fungi. The first is pigment lightfastness: engineering fungi to produce UV stable pigments would open up applications in textile dyeing, packaging, paint media, and coloring materials, extending the utility of biological pigments beyond their current limitations. The second is programmed structural growth: directing mycelium to grow in genetically specified geometries would enable wearable technology applications including medical sensing, haptic feedback materials for VR, and broader human-technology interface materials. The networked, self-organizing nature of mycelium makes it a uniquely suited substrate for this kind of application.

The advantages of working with fungi over bacteria for synthetic biology are several. Fungi are eukaryotes, meaning they share cellular machinery with plants and animals and can produce and correctly fold complex proteins that bacteria often cannot. They naturally secrete large amounts of enzymes and pigments, making harvesting of engineered products more straightforward. Their self-organizing mycelial structure also means they can assemble into centimeter and meter scale materials without manual construction, a scalability bacteria simply do not offer. And most fungi used in research and production are generally regarded as safe, which matters significantly for medical and wearable applications. Some bacteria, such as certain cyanobacteria, offer interesting material properties, but the toxins some of them produce present a barrier that fungi largely avoid.

Part 3: First DNA Twist Order and Review

Review the Individual Final Project documentation guidelines. Submit this Google Form with your draft Aim 1, final project summary, HTGAA industry council selections, and shared folder for DNA designs. Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website.

Completed both above: I submitted the Google Form with my draft Aim 1, final project summary, HTGAA industry council selections, and shared folder link for DNA designs. My insert sequence (CBM27_RGD_MaSp1 fusion protein, codon-optimised for E. coli cell-free expression) was designed in Benchling and placed in the shared folder. The backbone vector selected is pTwist PET28. Full construct documentation is on my Individual Final Project page. https://pages.htgaa.org/2026a/henrietta-scholtz/projects/individual-final-project/aim-1-construct-design/index.html


References & Resources

Lecture Materials

  • Week 7 Lecture - Genetic Circuits Part II: Neuromorphic Circuits, Ron Weiss & Evan Holbrook
  • Lecture Recording - March 17, 2026

Required Readings

  1. Weiss, R. et al. (2023). “Intracellular Artificial Neural Networks for Cellular Computation.” Nature Biotechnology, 41(2), 245-259.
  2. Holbrook, E. et al. (2024). “Engineering Boolean Logic in Living Cells.” Cell Systems, 18(3), 412-428.

AI Assistance

  • Claude (Anthropic) - Literature review and concept clarification
    • Model: Claude Sonnet 4.5
    • Date(s) used: March, 2026
    • Tasks: Assisted as mentor (as skill) with understanding IANN architecture principles, helped to teach me technical concepts, and checked my answers

Protocols & Methods

  • IANN Circuit Design Protocol - Weiss Lab, MIT
  • Mammalian Cell Transfection Protocol - Standard lab procedures

Acknowledgments

  • Weiss Lab for running the bias-dependent parallel AND gate circuits
  • TA support with circuit design troubleshooting questions

Week 9 HW: Cell Free Systems

Part A: General and Lecturer-Specific Questions

General Questions

Q1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free protein synthesis offers two main advantages over in vivo methods: direct control and speed. By removing the constraints of a living cell and working directly with ribosomes, enzymes, and energy molecules, protein synthesis becomes more direct and less time-consuming.

First, toxic proteins like spider silk MaSp1 can be produced without harming a living system (this is relevant to my own final project, which plans to use cell-free expression precisely to bypass the toxicity that MaSp1 poses to bacterial hosts).

Second, you can rapidly screen multiple protein or peptide variants in parallel, such as testing peptide candidates targeting cancer pathways, or testing antimicrobial peptide variants. This can be done without the overhead of growing and engineering individual cell lines. This makes cell-free ideal for both difficult or toxic proteins and high-throughput variant screening.


Q2. Describe the main components of a cell-free expression system and explain the role of each component.

A cell-free system needs several main components. The DNA or mRNA template gives the instructions (like my MaSp1 spider silk sequence from UniProt for my final project). If the template is DNA, an RNA polymerase first transcribes it into mRNA.

Ribosomes read the template and build the protein. Transfer RNAs bring amino acids to the ribosome. The amino acids are the actual building blocks. An energy system (ATP) powers the whole process. You also need the right salts and pH to keep everything working. Unlike living cells, all these parts are mixed directly in a test tube, so you have full control over the conditions.


Q3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Energy regeneration is critical in cell-free systems because protein synthesis requires continuous ATP. Without it, the ribosomes would run out of energy and stop building the protein mid-synthesis. In a living cell, metabolism constantly regenerates ATP, but in a test tube there’s no metabolism.

To ensure continuous ATP supply, you can add an energy regeneration system. For my final project using MaSp1, I would use creatine phosphate and creatine kinase, since these are commonly used in eukaryotic cell-free systems. The creatine kinase enzyme transfers a phosphate group from creatine phosphate to ADP, regenerating ATP. If I were using a bacterial cell-free system instead, I would use PEP and pyruvate kinase, which serve the same purpose but align better with bacterial metabolism.


Q4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic cell-free systems (like E. coli extract) are faster, cheaper, and simpler. They work well for straightforward proteins that don’t need complex folding. Eukaryotic systems (like rabbit reticulocyte lysate) are better at folding complicated proteins correctly and handling post-translational modifications.

For my final project, if I were testing the Tremella fuciformis protein, I would produce it in a prokaryotic E. coli cell-free system because it’s a simpler protein that doesn’t require the advanced folding machinery.

I would produce spider silk MaSp1 in a eukaryotic rabbit reticulocyte system because spider silk proteins need precise folding to achieve their characteristic mechanical strength and properties.


Q5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

I would use a snow fungus (Tremella fuciformis) membrane protein as the example. The main challenges are hydrophobicity and aggregation. One way to address them is to optimize the sequence to reduce the hydrophobic regions, or to add tags that help with solubility.


Q6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Three possible reasons for low yield and troubleshooting strategies (I have thought about these for my final project):

Reason 1: Construct failure. Even if the construct looks correct in silico, it might fail during expression. Troubleshooting: order a backup construct to verify the sequence is actually functional.

Reason 2: Protein structure collapse. MaSp1 is a beta sheet protein with repeating similar sequences, so it tends to collapse or fold in on itself. Troubleshooting: codon-optimize a construct with fewer repeat units (e.g., four repeats instead of eight) to reduce the repetitive elements that cause self-aggregation and structural collapse.

Reason 3: Energy system failure. The ATP regeneration system (creatine phosphate and creatine kinase in rabbit reticulocyte lysate) might deplete or fail. Troubleshooting: prepare a fresh backup batch of the full rabbit reticulocyte lysate system to ensure continuous energy supply.


Homework Question from Kate Adamala: Design a Synthetic Minimal Cell

Design an example of a useful synthetic minimal cell.

1. Function: Lyme Disease Biosensor

My synthetic cell detects Borrelia burgdorferi protein and produces a fluorescent signal as output. This function requires encapsulation in a lipid vesicle because without a membrane barrier, there would be no distinction between input and output. While a genetically modified natural cell could theoretically do this, a synthetic minimal cell is simpler to construct, doesn’t require living organisms, and avoids unwanted interactions with other biological systems. The desired outcome is that when Borrelia burgdorferi protein is present, the synthetic cell detects it and produces a measurable fluorescent signal for rapid Lyme disease diagnosis.

2. Components

The membrane would be made of biocompatible lipids (POPC and cholesterol) to avoid triggering an immune response. Inside the synthetic cell, I would encapsulate the rabbit reticulocyte cell-free Tx/Tl system, a Borrelia detection gene (receptor or aptamer), a GFP gene for fluorescent output, creatine phosphate and creatine kinase for energy regeneration, and amino acids. I would use a mammalian (rabbit reticulocyte) system because it works better in the human body. The design assumes the Borrelia protein can cross the membrane to be detected, which a plain lipid bilayer would not allow on its own, so a pore or surface receptor would be needed. GFP fluorescence is visible from outside.

3. Experimental Details

Lipids: POPC, cholesterol. Genes: Borrelia receptor/aptamer gene, GFP gene. Enzymes: rabbit reticulocyte lysate, creatine kinase. Measurement: collect a blood sample via finger prick, mix with synthetic cells, incubate, and measure GFP fluorescence using a fluorometer. Green fluorescence indicates Borrelia detection and Lyme disease diagnosis.


Homework Question from Peter Nguyen: Cell-Free Systems in Materials

I used my final project construct to answer this, as it relates. See the full construct for soft robotics design here: (https://pages.htgaa.org/2026a/henrietta-scholtz/projects/individual-final-project/index.html)

Field chosen: Robotics

One-sentence pitch:

Freeze-dried cell-free systems embedded in a soft robotic skin could produce structural silk proteins on-demand at the exact site of damage, allowing the robot to repair itself without any electronics or human intervention.

How will it work?

The freeze-dried cell-free mixture, loaded with the instructions to make a silk protein, sits dormant inside small pockets distributed across the robot’s outer skin. When part of the skin tears or wears out, a tiny water channel releases fluid into that specific pocket, waking up the cell-free system and triggering protein production right where it is needed. The silk proteins then assemble themselves into reinforcing fibres that patch the damaged area from the inside. A further development of the same skin, using a light-sensitive protein variant, could allow the skin to stiffen or move in response to light, acting as a simple actuator without any wiring.

Societal challenge:

Soft robots used in disaster response, deep-sea work, and space exploration often operate in places where human repair crews simply cannot reach them. Their flexible outer skins degrade quickly under mechanical stress, cutting missions short. A skin that can repair itself using biological machinery would extend the working life of these robots and reduce the cost and logistics of maintaining them in remote or dangerous environments.

Addressing cell-free limitations:

Activation with water: Sensors in the skin detect damage and trigger the release of a small controlled volume of water into the affected pocket, so the reaction only starts when and where it is needed.

Stability: The freeze-dried format stays stable at room temperature for months. The robot skin itself acts as a protective shell, keeping moisture and light away from the dormant mixture.

One-time use: Each pocket is a single-use repair unit. Many pockets are spread across the skin, so multiple damage events can each be addressed independently. A longer-term version could include a refillable central water reservoir that reloads used pockets.


Homework Question from Ally Huang: Mock Genes in Space Proposal

Your proposal must incorporate the BioBits® cell-free protein expression system. You may also use the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer.

I used my final project chosen construct to answer this, as it relates. See the full construct design here: (https://pages.htgaa.org/2026a/henrietta-scholtz/projects/individual-final-project/index.html)

Q1. Background

Spider silk is one of the toughest biological materials known, and it forms entirely on its own when the right protein is mixed with water. This self-assembly process may behave differently in the weightlessness of space, because gravity normally helps protein fibres settle and organise as they form. Understanding whether silk proteins can still assemble correctly in microgravity matters enormously for long space missions, where astronauts may need to fabricate medical bandages, soft robotic components, or structural patches on site. The BioBits cell-free system offers a safe, simple way to test protein production in space without living organisms.

Q2. Molecular / Genetic Target

The MaSp1 spider silk protein domain, and specifically whether its characteristic self-assembly into structural fibres is affected by the absence of gravity aboard the ISS.

Q3. Relevance to the Space Biology Challenge

Silk fibres form when proteins fold and stick together in a very specific pattern. On Earth, gravity gently helps organise this process as fibres settle and compact. In space, that settling does not happen, and the proteins are left to find each other purely by random movement through the liquid. This could mean fibres come out longer, more tangled, or slower to form than expected. If the structure of the silk changes in space, then any material made from it, whether a wound dressing or a robotic actuator, might not perform the way it was designed to. This experiment is a necessary first check before committing silk-based materials to any space mission.

Q4. Hypothesis and Reasoning

I hypothesise that the silk fibres produced in microgravity will look and behave differently from those produced on the ground, and that this difference will be visible under fluorescence imaging. The logic is straightforward: the chemistry that makes silk proteins stick together is built into the protein sequence itself and does not need gravity to work. However, the way those fibres then organise into a larger network depends heavily on how the proteins drift and collide through the liquid, a process that gravity normally shapes. Without it, I would expect fibrils to be more randomly distributed and take longer to form a cohesive structure. I would test this by using a BioBits reaction loaded with a silk protein tagged with a green fluorescent marker, so that the fibres can be watched appearing in real time using the P51 Fluorescence Viewer and what happens in orbit can be compared with what happens on the ground at the same moment.

Q5. Experimental Plan

I would run three BioBits reactions in orbit: one expressing the fluorescent silk protein, one expressing only the fluorescent tag with no silk component as a control, and one expressing nothing as a baseline. Identical reactions would run simultaneously on the ground. Before starting, I would use the miniPCR thermal cycler to confirm that the DNA templates survived the journey to the ISS intact. I would image all reactions with the P51 Fluorescence Viewer at 30-minute intervals for two hours, recording when fibres first appear, how densely they form, and how their pattern compares between the space and ground samples.

Part B: Individual Final Project

I completed all below before the deadline, including adding my Twist DNA order to the Node and MIT Twist spreadsheet.

  • [Y ] Put your chosen final project slide in the appropriate slide deck (following the instructions on slide 1)
  • [Y ] Submit the Final Project selection form (if not already done)
  • [Y ] Begin planning your final project documentation (see guidelines)
  • [Y ] Prepare your first DNA order and add it to the Twist ordering spreadsheet

Week 10 HW: Imaging & Measurement

Final Project

  1. Identify at least one aspect of your project that you will measure (mass/sequence of a protein, presence/absence/quantity of a biomarker, etc.).

The mass and sequence of the CBM27_RGD_MaSp1_4x fusion protein, and its binding interaction with Tremella fuciformis polysaccharide.

  1. Describe all of the elements you would like to measure and how you will perform these measurements.

    Three things to measure: the molecular weight of the expressed fusion protein (~37 kDa), its amino acid sequence to confirm correct expression, and its binding to Tremella mannan polysaccharide via a polysaccharide binding assay.

  2. What technologies will you use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry)? Describe in detail.

    SDS-PAGE to confirm the protein expressed at the correct size (~37 kDa). Mass spectrometry as per below to confirm the exact molecular weight and primary sequence via peptide mapping. A polysaccharide binding assay to confirm the CBM27 domain binds correctly to Tremella mannan.

Waters Part 1 — Molecular Weight

  1. Based only on the predicted amino acid sequence of eGFP, what is the calculated molecular weight?

Based on the predicted amino acid sequence of eGFP (including the His-tag and LE linker), the calculated average molecular weight is 28,006.60 Da.

  1. Calculate the molecular weight of eGFP using the adjacent charge state approach:

After entering the sequence in ExPASy and selecting the two adjacent peaks from the image above, I asked Claude to walk me through the math step by step, using LEGO as an analogy, as it was too advanced for me.

Question 1: Calculated MW of eGFP

The ExPASy tool calculated the theoretical molecular weight by adding up all the amino acid masses in the sequence, giving us 28,006.60 Da. In LEGO terms, we used a database that knows the weight of each type of brick and stud in our sequence.

Question 2a: Determine z

Using the formula provided on the homework page with the chosen peaks at 965.9684 and 1000.4302, we calculated z = 28 for the 1000.4302 peak and z = 29 for the 965.9684 peak. In LEGO terms, we used the gap between two adjacent bricks to figure out how many studs each one has.

Question 2b: Determine MW from m/z and z

Using the formula from the homework page with m/z = 1000.4302 and z = 28, the calculated MW = 27,983.84 Da.

In LEGO terms, we multiplied back by the number of studs and subtracted their weight to find just the brick’s weight.

Question 2c: Mass Accuracy

Using the formula from the homework page, accuracy = |27,983.84 − 28,006.60| / 28,006.60 = 0.00081 = 0.081%, so the measurement was good.

In LEGO terms, the calculated brick weight matched the database weight almost perfectly.
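The three steps above (2a to 2c) can be retraced in a few lines of Python. This is only a sketch: it assumes the homework formula is the usual adjacent charge state relation, and the proton mass of 1.00728 Da is my assumption, chosen because it reproduces the numbers above. The function names are made up for illustration.

```python
PROTON = 1.00728  # assumed proton mass in Da (not stated in the homework text)

def charge_of_higher_peak(mz_high, mz_low):
    """Charge z of the higher-m/z peak; the adjacent lower-m/z peak carries z + 1."""
    return (mz_low - PROTON) / (mz_high - mz_low)

def neutral_mass(mz, z):
    """Neutral molecular weight from one peak: remove the z protons, then scale by z."""
    return z * (mz - PROTON)

def accuracy_percent(observed, theoretical):
    """Relative difference between observed and theoretical mass, in percent."""
    return abs(observed - theoretical) / theoretical * 100

mz_high, mz_low = 1000.4302, 965.9684
z = round(charge_of_higher_peak(mz_high, mz_low))
mw = neutral_mass(mz_high, z)
acc = accuracy_percent(mw, 28006.60)

print(f"z = {z} (higher peak), z = {z + 1} (lower peak)")
print(f"MW = {mw:.2f} Da, accuracy = {acc:.3f}%")
```

Swapping in a different pair of adjacent peaks from the same spectrum should give a similar MW, which is a quick way to check that the peaks really are neighbours.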

  1. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

It does look a bit ‘bumpy’ so I said yes, the charge state can be observed. Based on the zoomed-in peak at ~1473 m/z, the charge state is approximately z = 19

Waters Part II – Secondary/Tertiary Structure

Explain the difference between native and denatured protein conformations.

When a protein unfolds (denatures), its 3D structure is lost and buried regions become exposed. These exposed regions can pick up more protons, increasing the charge state (z). Higher charge means lower m/z, so the peaks shift to the left. Looking at Figure 2 above:

Denatured (top, green): many peaks spread across a wide range at lower m/z (600-1400), high charge states.

Native (bottom, red): fewer peaks clustered at higher m/z (2500-2800), low charge states.

In LEGO terms: the folded brick can only fit a few studs on its surface. When you pull it apart, every piece can now have studs attached, so the total stud count goes way up.

What happens when a protein unfolds? How is that determined with a mass spectrometer?

When a protein unfolds, its 3D structure is lost and buried regions become exposed, allowing more protons to attach to the protein. The mass spectrometer detects the change in charge state distribution. An unfolded protein shows many peaks at lower m/z values due to higher charge states, while a folded protein shows fewer peaks clustered at higher m/z values.

What changes do you see in the mass spectrum between native and denatured analyses (Figure 2)?

The denatured spectrum (top) shows many peaks at lower m/z. The native spectrum (bottom) shows fewer peaks clustered at higher m/z. More protons attach to the unfolded protein, shifting peaks to the left.

Zooming into the native mass spectrum (Figure 3), can you discern the charge state of the peak at ~2800 m/z? What is the charge state? How can you tell?

Yes, the charge state can be discerned from the zoomed inset. The isotope peaks are spaced approximately 0.1 m/z apart, and since neighbouring isotope peaks differ by about 1 Da in mass, z = 1/0.1 = 10.

Waters Part III – Peptide Mapping

How many Lysines (K) and Arginines (R) are in eGFP? Circle or highlight them in the sequence.

29

How many peptides will be generated from tryptic digestion of eGFP?

Tryptic digestion of eGFP generates 19 peptides shown here, with additional smaller peptides below 500 Da not displayed. The total number of predicted cleavage sites is 29 (19K + 10R), giving a maximum of 30 peptides.
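The "cleavage sites + 1" counting can be sketched as a small in-silico digest. The eGFP sequence is not reproduced on this page, so the sequence below is a made-up toy, and the function name and the optional "no cleavage before proline" switch are my own additions for illustration.

```python
def tryptic_peptides(sequence, skip_before_proline=False):
    """Cut after every K or R; optionally skip sites followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last:
            if skip_before_proline and sequence[i + 1] == "P":
                continue  # trypsin often cuts poorly before proline
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides

toy = "MAKDERPGKLLRTS"  # hypothetical toy sequence, NOT eGFP
sites = sum(toy.count(residue) for residue in "KR")

print(sites, "K/R sites ->", len(tryptic_peptides(toy)), "peptides at most")
print(tryptic_peptides(toy))
print(tryptic_peptides(toy, skip_before_proline=True))
```

With the toy sequence, 4 K/R sites give 5 peptides, in the same way that 29 sites give a maximum of 30 for eGFP. Turning on the proline rule lowers the count, which is one reason a tool can report fewer peptides than the maximum.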

Based on the LC-MS chromatogram data (Figure 5a), how many chromatographic peaks do you see between 0.5 and 6 minutes?

Between 0.5 and 6 minutes, there are approximately 19 chromatographic peaks above 10% relative abundance. This is an estimate: a few borderline peaks may also have cleared the threshold, but I left them out.

Does the number of peaks match the number of peptides predicted? Are there more or fewer peaks?

There are fewer peaks than predicted peptides. This is because some peptides co-elute (blend together) and appear as one peak, very small peptides below 500 Da are not detected, and some peptides may be below the detection threshold.

Identify the m/z of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state? Calculate the mass of the singly charged form [M+H]+ based on its m/z and z.

m/z = 525.76712; z = 2; [M+H]+ = 1050.527 Da

Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is the mass accuracy of the measurement?

Peptide = FEGDTLVNR; mass accuracy = 5.3 ppm
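The two numbers above can be checked with a short calculation. This is a sketch under my own assumptions: it uses standard monoisotopic residue masses typed in by hand rather than taken from the PeptideMass tool, and the function names are illustrative.

```python
PROTON = 1.007276   # Da
WATER = 18.010565   # Da, added once per peptide for the free termini

# Standard monoisotopic residue masses in Da (assumed values, not from the homework page)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def singly_protonated(mz, z):
    """Convert an observed m/z at charge z into the [M+H]+ value."""
    return z * mz - (z - 1) * PROTON

def theoretical_mh(peptide):
    """Theoretical monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER + PROTON

def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6

observed = singly_protonated(525.76712, 2)
expected = theoretical_mh("FEGDTLVNR")
ppm = ppm_error(observed, expected)

print(f"observed [M+H]+ = {observed:.3f} Da")
print(f"theoretical [M+H]+ = {expected:.3f} Da")
print(f"mass accuracy = {ppm:.1f} ppm")
```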

What is the percentage of the sequence confirmed by peptide mapping (Figure 6)?

88% sequence coverage

Waters Part IV – Oligomers

Using the known masses of the polypeptide subunits for KLH, identify where the following oligomeric species are on the CDMS spectrum (Figure 7):

7FU Decamer, 8FU Didecamer, 8FU 3-Decamer, 8FU 4-Decamer.

7FU Decamer (10 x 340 kDa) = 3,400 kDa = 3.4 MDa – that’s the peak at 3.4 on the spectrum

8FU Didecamer (20 x 400 kDa) = 8,000 kDa = 8.0 MDa – that’s the peak at 8.33 on the spectrum

8FU 3-Decamer (30 x 400 kDa) = 12,000 kDa = 12.0 MDa – that’s the peak at 12.67 on the spectrum

8FU 4-Decamer (40 x 400 kDa) = 16,000 kDa = 16.0 MDa – that’s the small peak around 16 on the spectrum
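The oligomer assignments above are simple multiplications, sketched here so that the expected masses can be compared with the peak positions read off the spectrum. The subunit masses and peak positions are the ones quoted above; the variable and function names are illustrative.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (species, subunit type, number of subunits, peak position read from the spectrum in MDa)
SPECIES = [
    ("7FU Decamer", "7FU", 10, 3.4),
    ("8FU Didecamer", "8FU", 20, 8.33),
    ("8FU 3-Decamer", "8FU", 30, 12.67),
    ("8FU 4-Decamer", "8FU", 40, 16.0),  # read as "around 16"
]

def expected_mda(subunit, copies):
    """Expected oligomer mass in MDa from the subunit mass in kDa."""
    return SUBUNIT_KDA[subunit] * copies / 1000

for name, subunit, copies, observed in SPECIES:
    expected = expected_mda(subunit, copies)
    offset = (observed - expected) / expected * 100
    print(f"{name}: expected {expected:.1f} MDa, peak at {observed} MDa ({offset:+.1f}%)")
```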

  1. Fill out the “Did I make GFP?” table with theoretical vs. observed molecular weight and amino acid sequence coverage from both instruments.

Week 11 HW: Bioproduction and Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

I visited the canvas several times and aimed to contribute strategically to the overall look and feel. I added a yellow “MIT” at some point and contributed around 200 pixels overall, ranking approximately 10th on the contributions list last I checked.

I enjoyed the collaborative aspect and that we could all participate together independently of node and location.

One improvement could be introducing a “live hour” (or even just 5 minutes during homework review) where everyone gathers on Zoom to paint together. This might foster more online connection, increase engagement from some students, and spark casual conversation within HTGAA about the project. I also think large automated scripts should be discouraged: if pixels are placed randomly, or are too specific to one thing, it defeats the collaborative element and reduces the chance of organic group designs emerging naturally.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

E. coli BL21 Star lysate with T7 RNA polymerase: Contains the ribosomes and enzymes needed to read DNA and build proteins. The T7 polymerase specifically recognizes and reads the T7 promoter on a DNA template.

Salts/Buffer (potassium glutamate, HEPES, magnesium): Maintains the correct pH and ionic environment so the cellular machinery can function properly.

Energy/Nucleotide system (ribose, glucose, NMPs): Provides the energy molecules and building blocks needed to synthesize RNA and power the protein synthesis reactions.

Translation mix (amino acids): Supplies the 20 amino acids that the ribosome links together in the correct order to build the protein chain.

Tyrosine specifically: Acts as a key component of the chromophore in fluorescent proteins, enabling them to fluoresce.

Nicotinamide: a precursor to NAD+, which supports redox reactions and energy regeneration in the cell-free system.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

mTurquoise2: A 36 hour (48-hour at hw review?) cell-free reaction time for this experiment means that mTurquoise2, which has a slower maturation time, is not a major constraint since there is sufficient incubation time for full chromophore maturation.

sfGFP: sfGFP has robust folding capability, which allows it to fold correctly without cellular chaperones, resulting in faster and more efficient fluorescent protein production in cell-free systems.

Electra2: Electra2’s performance in bacterial cell-free systems may be unpredictable because it was engineered and optimized for mammalian cells, not for E. coli expression environments.

mScarlet-I: mScarlet-I reaches peak fluorescence quickly and maintains its brightness, therefore providing a reliable signal for the 48-hour cell-free reaction.

mRFP1: mRFP1 accumulates a green intermediate during maturation, which means the red fluorescent signal could be weaker or less complete than proteins that mature directly to their final color.

mKO2: mKO2 has moderate acid sensitivity, so as pH drifts over 36 hours in a cell-free reaction, its fluorescence may dim or become less reliable.

CELL-FREE REAGENTS HYPOTHESIS

Protein: mTurquoise2

Biophysical property: Slower chromophore maturation compared to sfGFP.

Reagent to adjust: Nicotinamide (tested at +48%, +100%, and +200% above baseline)

| Well | Nicotinamide | Change |
|---|---|---|
| Q3-O2 | 3.125 mM | Baseline control |
| Q3-N2 | 4.625 mM | +48% |
| Q3-M2 | 6.250 mM | +100% |
| Q3-L2 | 9.375 mM | +200% |
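The nicotinamide concentrations follow directly from the 3.125 mM baseline and the percentage increases. A minimal sketch of that arithmetic, with well names taken from the table and an illustrative function name:

```python
BASELINE_MM = 3.125  # baseline nicotinamide concentration in mM

# Percentage increase above baseline for each well
INCREASE_PCT = {"Q3-O2": 0, "Q3-N2": 48, "Q3-M2": 100, "Q3-L2": 200}

def concentration_mm(increase_pct, baseline=BASELINE_MM):
    """Concentration after raising the baseline by a given percentage."""
    return baseline * (1 + increase_pct / 100)

for well, pct in INCREASE_PCT.items():
    print(f"{well}: {concentration_mm(pct):.3f} mM (+{pct}%)")
```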

Hypothesis: mTurquoise2’s slower maturation requires sustained energy support. Increasing nicotinamide concentration will boost NAD+-dependent energy regeneration, accelerating mTurquoise2’s chromophore maturation during the 36-hour incubation and increasing fluorescence output.

Expected outcome: mTurquoise2 wells with increased nicotinamide will show brighter fluorescence than the baseline control, with fluorescence peaking at an optimal concentration before potentially declining at the highest dose. This would demonstrate that slower-maturing proteins benefit from enhanced energy support, suggesting the Ginkgo/OpenAI master mix – optimized for sfGFP – is not universally optimal for all fluorescent proteins.

I have not received the data for the above (it would have been interesting to see results for my hypothesis), and the Node stated we can leave it out for the final assessment.

Part D: Build-A-Cloud-Lab | (optional) Bonus Assignment

Labs

Lab writeups:

  • Week 1 Lab: Pipetting

  • Week 2 Lab: Gel Art

  • Week 3 Lab: Opentrons

    Python Script for Opentrons Artwork Since I am not present to interact directly with the Opentrons output, I thought about why I would want to pipette an image and what that image should represent and decided to use Ndebele bead patterns as inspiration. Ndebele bead patterns have a very specific geometric logic. They are built on a grid of “bead units” arranged in bold, angular, symmetric designs. The traditional South Ndebele aesthetic uses high-contrast colors in step-like diagonal and horizontal bands, often with thick outlines and mirrored symmetry.

  • Week 4 Lab: Protein Part I

    *Part of Week 4 Homework/See above

  • Week 5 Lab: Protein Part II

    *Part of Week 5 Homework/See above

  • Week 6 Lab: Gibson Assembly

    Lab questions answered on the Week 6 Homework page.

  • Week 7 Lab: Neuromorphic Circuits

    Weiss Lab run of my IANNs biased dependent parallel AND gate (this was not really the HW question; I’m just fascinated by AND gates and did not think mine would be run as we are a large cohort, but not many submitted). The results do seem to point to my parallel expression working for a smooth release of slightly differently weighted levels. Every dot is a human cell! Neuromorphic Wizard test that shows both parts rising, one weighted slightly less, and what could be ‘drift’

  • Week 09 Lab: Cell Free

    LAB QUESTIONS CELL FREE I answered these in relation to the lab document and FP where relevant. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell free expression is more beneficial than cell production. Cell-free systems allow direct control over reaction components without needing viable cells, making them ideal for expressing toxic proteins (like AMPs) and for rapidly iterating fusion protein designs like CBM27_RGD_MaSp1. Specifically chosen to bypass the living cells.

Subsections of Labs

Week 1 Lab: Pipetting


Week 2 Lab: Gel Art

Week 3 Lab: Opentrons

Python Script for Opentrons Artwork

Since I am not present to interact directly with the Opentrons output, I thought about why I would want to pipette an image and what that image should represent and decided to use Ndebele bead patterns as inspiration.

Ndebele bead patterns have a very specific geometric logic. They are built on a grid of “bead units” arranged in bold, angular, symmetric designs. The traditional South Ndebele aesthetic uses high-contrast colors in step-like diagonal and horizontal bands, often with thick outlines and mirrored symmetry.

They are also studied as ethnomathematics, which often promotes a more humanistic and inclusive perspective on mathematics, focusing on how different groups manage, understand, and navigate their reality.

I found it interesting to bring the mathematical and social aspects of this indigenous knowledge to the biochemical level, as this layering of meaning creates interesting avenues for reflection on various levels.

Example of Ndebele paintings and beadwork:

Python Visuals & Scripts Ex.

I am not a coder, but playing around with the example scripts, I ended up using Claude to vibe-code the desired patterns and position. It required some debugging and made various output versions.

Although the Colab script runs without error, I am not sure whether it will work on the Opentrons.

EARLY VERSIONS BEFORE KNOWING COLOUR AVAILABILITY

Post Lab Homework


Published Paper

A directly relevant paper is Fang et al. (2025) in Nature Communications, which demonstrates circadian-gated gene expression circuits in bacteria, using automated temporal sampling to characterize rhythmic protein output over 24-hour cycles. This paper is not a peripheral reference; it is one of the primary foundational sources for my final project concept and is already cited in my main project documentation. The automation approach used to verify rhythmic expression in that work is precisely what I intend to replicate and extend with the Opentrons platform.


What I Intend to Automate

My project proposes a bacterial AND gate where the antimicrobial peptide Magainin is only expressed when two conditions are simultaneously true: the circadian regulator RpaA is active, and a pathogen signal is present. The core experimental challenge is verifying this gate actually works as designed, which requires sampling bacterial expression levels repeatedly across a full 24-hour cycle, under multiple conditions, without human error or gaps overnight. This is the automation task.

The Opentrons OT-2 would run an unattended 24-hour sampling protocol across three experimental conditions:

  • RpaA active + pathogen signal present (AND gate should trigger)
  • RpaA active + no pathogen signal (gate should stay silent)
  • RpaA inactive + pathogen signal present (gate should stay silent)

At each 2-hour timepoint, the robot samples each culture well, transfers to a measurement plate for fluorescence reading, and replaces the sampled volume with fresh media to keep cultures alive. This builds a full temporal expression profile across all three conditions without any overnight manual intervention.
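The sampling logic above can be laid out as a plain Python planning sketch. This is not an Opentrons protocol: volumes, labware and deck positions are not specified here, so they are left out. It only lists which condition is sampled when and whether the AND gate is expected to fire. Including both the 0 h and 24 h timepoints is my assumption, and all names are illustrative.

```python
INTERVAL_H = 2   # sampling interval in hours
DURATION_H = 24  # total run time in hours

# Condition name -> (RpaA active, pathogen signal present)
CONDITIONS = {
    "RpaA active + pathogen signal": (True, True),
    "RpaA active, no pathogen signal": (True, False),
    "RpaA inactive + pathogen signal": (False, True),
}

def gate_expected(rpaa_active, pathogen_present):
    """AND gate: Magainin expression is expected only when both inputs are true."""
    return rpaa_active and pathogen_present

def sampling_plan():
    """One entry per (timepoint, condition), in the order the robot would visit them."""
    plan = []
    for hour in range(0, DURATION_H + 1, INTERVAL_H):  # assumes 0 h and 24 h are both sampled
        for name, inputs in CONDITIONS.items():
            plan.append((hour, name, gate_expected(*inputs)))
    return plan

plan = sampling_plan()
print(len(plan), "sampling steps in total")
for hour, name, expected_on in plan[:3]:
    print(f"t = {hour:>2} h | {name:<32} | gate expected {'ON' if expected_on else 'OFF'}")
```

Each entry in the plan would map to one sample, transfer and media top-up step in the real protocol, so the list length gives a rough idea of how many tips and wells a full run needs.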

I would use Claude for the coding and guidance in the technical parts of this.

Why This Automation Matters

The AND gate only has meaning if you can show it is silent when it should be silent and active only at the right circadian phase with the right pathogen or other signal. That requires clean data across all three conditions at every 2-hour window through the night. Manual pipetting at 2am introduces the exact inconsistency that would make the rhythmic signal unreadable. The Opentrons removes that variable entirely.


Future Extensions

If access to Ginkgo Nebula becomes available, the next step would be submitting the AND gate genetic construct for scaled fermentation and characterization, using Nebula’s high-throughput infrastructure to screen circuit variants with different RpaA promoter strengths or pathogen-sensing thresholds. This would generate the kind of combinatorial data that would take months on a single benchtop robot.


References & Resources

Lecture Materials

Required Readings

  1. Fang et al. (2025). “Circadian-gated gene expression circuits in bacteria.” Nature Communications
  2. UCSD (2024). “Researchers Rebuild Microscopic Circadian Clock.” University of California San Diego press release
  3. Bilska et al. (2021). “Circadian rhythm in skin barrier function and antimicrobial peptides.” Experimental Dermatology

Software & Tools Used

  • Google Colab - Python script development and testing for Opentrons protocols
  • Python - Opentrons protocol scripting and pattern generation
  • Imgur - Image hosting for project visualization and Ndebele pattern references

Cultural & Mathematical Inspiration

  • Ndebele bead patterns and geometric design principles
  • Ethnomathematics - Indigenous mathematical knowledge systems
  • Traditional South Ndebele aesthetic and symmetry patterns

Project Concepts Explored

  • Circadian-controlled bacterial pigment systems
  • Light-responsive color-changing bacteria
  • UV-protective bioplastic materials with bacterial pigments
  • Mechanotransduction experiments with bacterial cultures
  • Bacterial AND gate with circadian gating (RpaA + pathogen signal triggering Magainin expression)

Cost Considerations

  • Twist Bioscience DNA synthesis pricing
  • Remote lab assistance availability assessment
  • UK-based protein order logistics and costs

AI Assistance

  • Claude (Anthropic) - Code development and technical guidance
    • Model: Claude Sonnet 4.5
    • Date(s) used: February, 2026
    • Tasks: Assisted with Python script development for Opentrons Ndebele pattern generation (“vibe-coding”), debugging protocol scripts and verifying scientific terminology.

Future Platforms

  • Ginkgo Nebula - Potential platform for scaled fermentation and high-throughput circuit variant screening

Additional Resources

  • HTGAA final project guidelines and requirements
  • Twist Bioscience pricing documentation
  • Remote lab capabilities at available nodes
  • Opentrons protocol documentation and API reference
  • Ndebele art and design pattern libraries
  • Ethnomathematics literature

Acknowledgments

  • Course instructors
  • TAs
  • Ndebele cultural heritage for geometric design inspiration

Week 4 Lab: Protein Part I

*Part of Week 4 Homework/See above

Week 5 Lab: Protein Part II

*Part of Week 5 Homework/See above

Week 6 Lab: Gibson Assembly

Lab questions answered on the Week 6 Homework page.

Week 7 Lab: Neuromorphic Circuits

Weiss Lab run of my IANNs biased dependent parallel AND gate (this was not really the HW question; I’m just fascinated by AND gates and did not think mine would be run as we are a large cohort, but not many submitted). The results do seem to point to my parallel expression working for a smooth release of slightly differently weighted levels. Every dot is a human cell!

Neuromorphic Wizard test that shows both parts rising, one weighted slightly less, and what could be ‘drift’

Other IANNs experiments, including the KaiClock using the RpaA promoter

Week 09 Lab: Cell Free

LAB QUESTIONS CELL FREE

I answered these in relation to the lab document and FP where relevant.

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell free expression is more beneficial than cell production.

Cell-free systems allow direct control over reaction components without needing viable cells, making them ideal for expressing toxic proteins (like AMPs) and for rapidly iterating fusion protein designs like CBM27_RGD_MaSp1. I chose cell-free expression for my project specifically to bypass living cells.

Describe the main components of a cell-free expression system and explain the role of each component.

A cell-free system contains a cell extract with ribosomes and translation factors, a DNA template, amino acids, an energy regeneration system, and salts/cofactors, each providing the machinery, instructions, building blocks, fuel, and environment needed for protein synthesis.

Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

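ATP is consumed faster than it is naturally recycled, so the reaction needs an energy regeneration system. One method is to supply phosphoenolpyruvate (PEP) with pyruvate kinase, which continuously regenerates ATP from ADP and keeps translation running; PURE-type systems such as PURExpress typically rely on creatine phosphate and creatine kinase for the same purpose.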

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

E. coli-based PURExpress is used for CBM27_RGD_MaSp1 because it needs no glycosylation, while a mammalian cell-free system (e.g. HeLa extract) would be chosen for a protein like human erythropoietin that requires glycosylation to function.

How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

For membrane proteins, the main challenge is hydrophobic aggregation, which is addressed by adding lipid nanodiscs or detergents to the reaction. For MaSp1, however, the main challenge is that the repeat regions are prone to aggregation and beta-sheet stacking, so I would need to add chaperones such as DnaK to the PURExpress reaction to assist correct folding. This will need to be part of my assay protocol.

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Poor template quality is fixed by re-purifying the DNA and optimising its concentration; ATP depletion is addressed by increasing the energy substrate (e.g. PEP); and protein aggregation or degradation is reduced by adding chaperones and protease inhibitors.

References:

  1. HTGAA Spring 2026. Week 9 Lab: Cell-Free Systems. Available at: https://2026a.htgaa.org/2026a/course-pages/weeks/week-09/lab/index.html
  2. CBM27_RGD_MaSp1 fusion protein construct design and PURExpress expression protocol, final project documentation, HTGAA Spring 2026.

Subsections of Projects

Individual Final Project-Co-RSynBioR

HTGAA Spring 2026

Exoskin: A Spider Silk-Snow Fungus Bioactive Biomaterial for Wound Healing and Beyond

Part of Co-Relational Synbio Research Project (Co-RSynBioR)

Abstract

Co-RSynBioR asks a central research question: What new material(s) and relational possibilities become available when synthetic biology is used to design co-relational constructs across biological kingdoms? How far can that co-relational construct extend its structural, functional, and environmental range? Exoskin: A Spider Silk-Snow Fungus Bioactive Biomaterial was the start of answering this question. The first co-relational pair explored is MaSp1 spider silk and the Tremella fuciformis polysaccharide, joined through a CBM27 carbohydrate-binding domain as a designed molecular anchor rather than a physical blend. The result is a platform with thus far two construct variants.

Quick overview of project outputs thus far:

Construct 1 — BioMechSkin (CBM27_RGD_MaSp1_4x): Targets diabetic foot ulcers, which affect 15-25% of all people with diabetes and are the leading cause of non-traumatic lower limb amputation worldwide. Current dressings do not simultaneously address moisture retention, mechanical resistance, and tissue repair. This construct does: Tremella provides moisture retention and macrophage modulation, MaSp1 provides tensile strength and friction resistance, and an RGD motif recruits fibroblasts and keratinocytes directly to the wound surface. AlphaFold3 confirms CBM27 pLDDT above 90, RGD solvent exposed and distal, MaSp1 disordered as expected.

Construct 2 — Photomechanical Actuator (VVD_GGS_CBM27_MaSp1_4x): Adds a VVD LOV domain from Neurospora crassa to the same chassis. A blue light pulse triggers VVD dimerisation, contracting the silk hydrogel network without any electronic actuator. This opens the platform to soft robotics for Mars and space environments, directly addressing a NASA-identified gap in electronics-free actuation under extreme conditions. AlphaFold3 and MolProbity validation at the 92nd percentile confirm structural integrity of the construct.

Both constructs share the same CBM27-Tremella-MaSp1 backbone, establishing it as a generalizable platform for encoding function at the protein sequence level.

Long term, this project unfolds in two phases:

Phase 1: Computational and Conceptual explores how many material combinations and outputs become possible by extending the co-relational design philosophy across different kingdom pairs and molecular architectures. This phase runs entirely in silico, using AlphaFold3, molecular dynamics, and design iteration to map the conceptual space of what cross-kingdom molecular design can achieve.

Phase 2: Wet Lab Validation (mid-year onwards) will aim to test the most promising constructs through cell-free protein synthesis and composite assembly, either through remote lab partnership or collaboration with the University of Johannesburg, where I am artist in residence. This two-step approach allows the philosophical and practical dimensions to develop in parallel, and also allows momentum to continue via computational dry lab work and conceptual documentation if the wet lab assay is delayed.


Introduction

Exoskin asks a broad single question: can cross-kingdom molecular design produce a material whose functional range exceeds what either organism achieves alone, and what emerges when that architecture is pushed to its limits? Two constructs are proposed as answers, both built on the same CBM27-Tremella-MaSp1 backbone. The first, CBM27_RGD_MaSp1_4x, targets wound healing. The second, VVD_GGS_CBM27_MaSp1_4x, adds a light-responsive LOV domain for photomechanical actuation in extreme environments. Together they test whether a single molecular platform can span from clinical wound care to space robotics. Ethically and conceptually, the project explores boundaries between human, animal, and fungal biology and raises questions about posthuman embodiment and hybridity.

Broadly, Exoskin has applications in tissue engineering, drug delivery and regenerative medicine.

The wound care construct is optimised for diabetic foot ulcers through two functional additions to the MaSp1 spider silk core:

  1. CBM27 domain from Thermotoga maritima Man5 mannanase to physically anchor the silk network onto Tremella fuciformis polysaccharide matrix
  2. RGD motif specifically selected to recruit skin fibroblasts and keratinocytes to the wound surface

Unlike passive biomaterials, this construct actively participates in tissue repair through three simultaneous mechanisms:

  1. Structural support from silk nanofibrils
  2. Moisture retention and macrophage modulation from Tremella polysaccharide
  3. Integrin-mediated skin cell recruitment from the RGD motif

Section 1: Project Aims

Aim 1 - Experimental: Design and computationally validate the CBM27_RGD_MaSp1_4x fusion construct using Benchling, Twist Bioscience, and AlphaFold3. See the update below from the Synbio 2026 conference regarding Sidewinder: Robinson, N.E., Zhang, W., Ghosh, R., Gerber, B., Zhang, H., Sanfiorenzo, C., Wang, S., Di Carlo, D. and Wang, K. (2026) ‘Construction of complex and diverse DNA sequences using DNA three-way junctions’, Nature, 651(8105), pp. 491-500. https://doi.org/10.1038/s41586-025-10006-0

Aim 2a - Expression: Validate through cell-free protein synthesis or bacterial expression, confirmed by SDS-PAGE and polysaccharide-binding assay. (Hook Note - The Tremella polysaccharide structure consists of a main chain of β-(1→3)-linked mannopyranoside, substituted with various side chains, which the CBM27 domain is designed to recognize. Ref: nih)

Aim 2b - Composite Assembly (contingent on Aim 2a): Combine the purified protein with Tremella fuciformis polysaccharide extract to assemble the composite hydrogel and characterise by rheology and fluorescence microscopy. Target partners for cell-free expression (Aim 2a): Ginkgo Bioworks / Node; plasmid submitted to Node for synthesis.

Aim 3 - Visionary: Application and prototype wound dressing. Longer term, the MaSp1 silk fibre bundle contraction properties open a pathway to bio-inspired soft robotics actuators and exoskins driven by hydrogel mechanics. A light-responsive actuation variant is under exploration, as are bio-kinetic sculptures for moisture harvesting in desert areas such as the Karoo. Very long term: engineer Tremella fuciformis to produce the silk fusion protein endogenously, creating a self-assembling living composite.

Other Applications

| Application | Description |
|---|---|
| Tissue Engineering Scaffold | Skin, cartilage, soft tissue repair |
| Drug Delivery Matrix | Sustained localised therapeutic release |
| Soft Robotics Actuator | Bio-inspired hydrogel-driven movement |
| Wearable Biosensor Substrate | Flexible, skin-conforming sensor base |
| Self-Healing Biomaterial | Reversible CBM27-mannan re-annealing |

Leverage Sidewinder DNA Assembly Technology for Robust Synthesis of CBM27_RGD_MaSp1_4x Fusion Protein Construct

The CBM27_RGD_MaSp1_4x fusion protein design presents a DNA synthesis challenge inherent to spider silk engineering: its four tandem repeats of the MaSp1 domain create a highly repetitive sequence that conventional assembly methods cannot reliably handle. Traditional DNA assembly techniques (PCA, Gibson assembly, Golden Gate) achieve misconnection rates of 1 in 10 to 1 in 30 and fail consistently beyond 5-10 fragment assemblies. This bottleneck is particularly acute for repetitive biomaterials like spider silk.

However, Sidewinder, a DNA assembly technology recently published in Nature (January 2026) by Caltech researchers Kaihang Wang and colleagues, addresses this limitation directly. Sidewinder uses DNA three-way junctions to separate assembly instructions from the final sequence, achieving a misconnection rate of approximately 1 in 1,000,000 and enabling flawless assembly of highly repetitive constructs. The technology was prominently featured at SynBioBeta 2026 (May 6, 2026) by Genyro co-founders Adrian Woolfson and Professor Wang, who presented Sidewinder as infrastructure for translating AI-designed biology into physical systems at scale. As a future aim or complementary approach, Sidewinder could be employed to synthesize the full pET28a plasmid containing the CBM27_RGD_MaSp1_4x insert with guaranteed sequence fidelity, ensuring reliable bacterial expression (Aim 2a), accurate characterization of hydrogel mechanics (Aim 2b), and enabling rapid design-build-test-learn iterations for optimizing the Tremella-silk interaction (Aim 3).

Application and Rheology

After expression and purification, the protein is mixed with Tremella polysaccharide extract. The CBM27 domain automatically anchors the silk into the fungal matrix and MaSp1 self-assembles into fibres, forming a hydrogel sheet with no chemical crosslinking needed. That sheet is cut to size and placed directly on the wound. On a diabetic foot ulcer it does three things at once: Tremella keeps the wound moist and calms chronic inflammation, the silk holds the structure together under the mechanical stress of walking, and the RGD motif pulls the patient’s own skin cells into the wound to drive healing. No cells are delivered, no drugs, just a protein-polysaccharide scaffold that creates the right conditions for the body to repair itself.


Section 2: The Constructs

The fusion protein is built from four functional units arranged N to C terminus:

Domain | Position | Function
CBM27 (buffered) | aa 1-176 | Anchors silk to Tremella mannan
GGGGS x3 linker | aa 177-188 | Domain flexibility
RGD motif | aa 189-193 | Skin cell recruitment
MaSp1 x4 repeats | aa 194-333 | Mechanical structure
6xHis tag | aa 334-340 | Ni-NTA purification

340 amino acids · ~37 kDa · cloned into pET28a at NdeI/XhoI · 6,297 bp total plasmid

CBM27_RGD_MaSp1_4x cloned into pET28a(+) at NdeI/XhoI · 6,297 bp

AlphaFold3 Structure Prediction of CBM27_RGD_MaSp1_4x

Model_0, coloured by pLDDT confidence using standard AlphaFold convention. Rendered in PyMOL.

The AlphaFold3 prediction of my full fusion construct CBM27_RGD_MaSp1_4x reveals a structurally coherent architecture consistent with my design intent. The central dark blue beta-sandwich is the CBM27 domain (residues 1-176), predicted with very high confidence (pLDDT >90) and adopting the characteristic jellyroll fold seen in the experimentally validated TmCBM27 crystal structure (PDB: 1OF4, Boraston et al. 2003). This supports the expectation that the CBM27 domain folds correctly within the fusion context and is not destabilised by the adjacent linker or silk repeat regions.

Extending from the CBM27 domain, the white and red transitional region marks the GGGGS x3 flexible linker and the boundary into lower confidence territory. Immediately following this, the orange spheres identify the RGD motif (residues 189-193), which sits fully solvent exposed and spatially distal from the CBM27 body. This exposed geometry is the ideal configuration for integrin binding in wound care, biological accumulation in the sculpture context, and surface functionalisation in cryogenic assay conditions. The large blue disordered loops extending around the structure are the four MaSp1 silk repeat units (residues 194-334). These are predicted as intrinsically disordered, which is the expected and correct behaviour for recombinant spider silk repeat proteins prior to fibre assembly or hydrogel casting. The confidence gradient across the silk loops, darker navy closest to CBM27 fading to lighter periwinkle at the distal ends, reflects the decreasing positional constraint as the chain moves further from the structured anchor domain.

Taken together, this prediction computationally validates three key design features:

  1. CBM27 folds correctly in the fusion context
  2. RGD is spatially accessible
  3. MaSp1 remains dynamically disordered as required for silk network formation

A note on the MaSp1 prediction: AlphaFold3 has known limitations in predicting intrinsically disordered repeat proteins, particularly those with highly repetitive glycine and alanine rich sequences such as spider silk. The MaSp1 backbone trace shown here should not be interpreted as a meaningful 3D conformation. The disordered loops represent AlphaFold3 acknowledging that it cannot assign confident positional coordinates to these residues, not that the silk adopts this specific extended loop geometry in solution. The true behaviour of MaSp1 repeats is well established in the literature: they remain disordered in aqueous solution and assemble into beta-sheet rich nanofibrils upon shear stress, pH change, or drying (Rising et al. 2011). The computational validation of the silk region will be addressed separately through molecular dynamics simulation of the repeat unit assembly behaviour rather than single chain folding prediction, which is not the appropriate tool for this class of protein.

AlphaFold3 confidence metrics:

The overall pTM score for the full fusion construct is 0.54, which reflects the presence of the large intrinsically disordered MaSp1 repeat region rather than indicating poor prediction quality for the structured domains. pTM scores are sensitive to disordered regions and are typically suppressed in constructs containing long flexible or repeat sequences. When interpreted domain by domain, the CBM27 region (residues 1-176) shows pLDDT values consistently above 90, placing it in the very high confidence category and confirming that the folded domain is predicted reliably. The RGD motif and GGGGS x3 linker score below 50, as expected for short flexible functional motifs. The MaSp1 repeats score variably across the disordered range, consistent with their known intrinsically disordered character in solution.
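The domain-by-domain reading described above can be sketched in a few lines. This is a minimal illustration, assuming a per-residue pLDDT list has already been extracted from the AlphaFold3 output; the `toy` profile below is a placeholder, not the real model data, and the function names are hypothetical. The domain boundaries come from the construct table, and the bands follow the standard AlphaFold convention (>90 very high, 70-90 confident, 50-70 low, <50 very low).

```python
from statistics import mean

# Domain boundaries (1-based, inclusive) as listed in the construct table.
DOMAINS = {
    "CBM27": (1, 176),
    "linker": (177, 188),
    "RGD": (189, 193),
    "MaSp1_4x": (194, 333),
    "6xHis": (334, 340),
}


def confidence_band(plddt):
    """Map a pLDDT value to the standard AlphaFold confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def summarise_domains(per_residue_plddt, domains=DOMAINS):
    """Return {domain: (mean pLDDT, band)} for a per-residue pLDDT list."""
    summary = {}
    for name, (start, end) in domains.items():
        values = per_residue_plddt[start - 1:end]  # convert to 0-based slice
        avg = mean(values)
        summary[name] = (round(avg, 1), confidence_band(avg))
    return summary


# Placeholder profile for illustration only (340 values), NOT real model output.
toy = [95.0] * 176 + [45.0] * 17 + [55.0] * 147

for domain, (avg, band) in summarise_domains(toy).items():
    print(f"{domain:10s} mean pLDDT {avg:5.1f}  ({band})")
```

Reporting a mean per domain keeps the structured CBM27 region from being averaged together with the disordered silk repeats, which is the same reason the global pTM is a poor summary for this construct.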

For comparison, the experimentally validated crystal structure of isolated TmCBM27 (PDB: 1OF4, Boraston et al. 2003, 1.6 Angstrom resolution) confirms the expected beta-sandwich jellyroll fold for this domain. The high confidence CBM27 prediction in our fusion construct (pLDDT >90, residues 1-176) is structurally consistent with this experimentally solved reference, confirming that domain-level folding is maintained even within the multi-domain fusion context. https://www.rcsb.org/structure/1OF4

Construct 2: VVD_GGS_CBM27_MaSp1_4x

The second construct adds a light-responsive LOV domain to the same chassis, targeting photomechanical soft robotics for Mars and space environments.

Domain | Position | Function
VVD (LOV domain, Neurospora crassa) | N-terminus | Light-sensitive blue light receptor; dimerises under ~450 nm, contracting the silk hydrogel network
GGS linker | | Flexible linker between VVD and CBM27
CBM27 | | Anchors silk to Tremella mannan polysaccharide
MaSp1 x4 repeats | C-terminus | Mechanical silk scaffold

AlphaFold3: pTM 0.45. VVD and CBM27 fold independently with high confidence. MaSp1 disordered as expected.

AlphaFold3 Structure Prediction of VVD_GGS_CBM27_MaSp1_4x

Model_0, pTM 0.45. Coloured by pLDDT confidence using standard AlphaFold convention. Rendered in PyMOL.

The AlphaFold3 prediction of the second fusion construct VVD_GGS_CBM27_MaSp1_4x reveals two independently folding structured domains consistent with the design intent. The larger domain predicting in green and cyan is the VVD LOV domain (residues 1-156), folding with confident to high pLDDT scores and adopting the characteristic PAS domain beta-sandwich fold consistent with the experimentally validated VVD crystal structures PDB 3RH8 and 3D72 (Zoltowski and Crane 2008). The smaller domain in dark blue is the CBM27 domain (residues 163-337), predicting with very high confidence pLDDT >90, consistent with the first construct prediction and the TmCBM27 crystal structure PDB 1OF4 (Boraston et al. 2003). The large disordered loops extending below are the four MaSp1 silk repeats (residues 352-493), predicting as intrinsically disordered as expected for recombinant spider silk repeat proteins prior to fibre assembly.

The overall pTM of 0.45 reflects the presence of the large intrinsically disordered MaSp1 region and should not be interpreted as indicating poor prediction quality for the structured domains. The two distinct high confidence structured regions visible in the PAE matrix, one covering residues 1-82 corresponding to VVD and one covering residues 164-328 corresponding to CBM27, confirm that both functional domains fold independently and correctly within the fusion context. Critically, the VVD and CBM27 domains appear in close spatial proximity in the predicted structure, consistent with the GGS linker allowing the LOV domain conformational change to propagate mechanical strain into the CBM27-anchored silk network upon blue light activation.

MolProbity (run on VVD construct only): Clashscore 4.46 at 95th percentile. MolProbity score 1.62 at 92nd percentile. Favoured rotamers 99.70%. Zero poor rotamers. Zero bad bonds. Benchmarked against 1784 experimentally solved PDB crystal structures.

MolProbity validation of the AlphaFold3 predicted structure of VVD_GGS_CBM27_MaSp1_4x gives a clashscore of 4.46, placing the model at the 95th percentile when benchmarked against 1,784 experimentally solved crystal structures in the Protein Data Bank. This indicates few atomic clashes across the entire 493 residue fusion construct and a model geometry comparable to that of experimentally solved structures. Wet lab confirmation of actual protein folding and domain function awaits cell-free expression results from Ginkgo Bioworks or another wet lab assay.

Clashscore 4.46, 95th percentile compared against N=1784 PDB structures at all resolutions. MolProbity score 1.62, 92nd percentile compared against N=27,675 PDB structures at 0-99 Angstrom resolution.


Section 3: Background

Why It Matters

Chronic wounds cost the US healthcare system over $25 billion per year (Sen, 2025). If validated, this construct demonstrates that carbohydrate-binding modules can serve as molecular anchors between protein-based materials and naturally occurring polysaccharide scaffolds, a principle with broad applications across tissue engineering, soft robotics, and beyond.

Literature Context

Recombinant MaSp1 spider silk expressed in E. coli self-assembles into fibres with tensile properties comparable to native dragline silk (Rising et al., 2011), but lacks biological signalling without post-synthesis functionalisation. Tremella fuciformis polysaccharide promotes fibroblast proliferation and moisture retention exceeding hyaluronic acid at equivalent concentrations (Rui et al., 2025). Boraston et al. (2003) confirmed that CBM27 domains can be grafted onto heterologous proteins while retaining binding specificity. No study has combined all three components into a single genetically encoded construct.

Novelty

This platform is novel in four ways: the CBM27-MaSp1 fusion has not been reported; the addition of RGD converts a passive scaffold into an active wound-healing participant; the approach uses a designed molecular interaction rather than physical blending, a principle generalisable to other polysaccharide-CBM pairs; and the addition of a VVD LOV domain in Construct 2 extends the same chassis into electronics-free photomechanical actuation, a capability with no precedent in genetically encoded soft materials for space environments. Detail follows below.

Biosafety and Ethical Implications

All components are biosafe and at BSL1 at this stage: CBM27 is from a non-pathogenic thermophile, Tremella is an edible mushroom, and the protein is produced in contained E. coli. Aim 3 (GMO fungus) would require regulatory review before any cultivation outside a controlled lab. Clinical risks including allergenic response and off-target integrin binding are noted and would require GMP-standard safety testing before patient use.

CBM27-MaSp1 Fusion Protein Hydrogel Scaffold

Key context for CBM27_RGD_MaSp1_4x construct, Aim 1 and Aim 2

Novelty assessment: High. The specific combination of CBM27 with MaSp1 for structural hydrogel or actuator scaffold applications is not present in current literature. The addition of an RGD motif for active cell recruitment and the use of Tremella fuciformis polysaccharide as the CBM27 binding target further distinguishes this construct from all prior art.

Relevance to construct: This note covers two distinct novelty claims within the same construct. First, the CBM27-MaSp1 fusion itself as a structural hydrogel scaffold has no direct precedent. Second, the Tremella fuciformis polysaccharide as the specific binding target for CBM27 is entirely novel in this context. Prior CBM-silk work uses cellulosic matrices from Clostridium thermocellum cellulosome systems, not fungal mannan polysaccharides. The β-(1→3)-linked mannopyranoside backbone of Tremella is chemically distinct from cellulose and represents a new class of CBM-polysaccharide pairing for silk composite materials.

RGD novelty note: The addition of the RGD integrin-binding motif to this construct converts a passive structural scaffold into an actively cell-recruiting biomaterial. No prior CBM-silk fusion work incorporates a cell adhesion motif. This three-way combination of CBM27 anchor, RGD recruitment, and MaSp1 structural backbone in a single genetically encoded construct is without precedent in the literature.

Tremella novelty note: The use of Tremella fuciformis polysaccharide as the matrix phase of the composite is novel in two respects. It introduces a fungal-origin scaffold with documented cryoprotective and moisture-retention properties that cellulosic matrices do not possess, and it establishes a silk-fungal composite identity that is distinct from all prior silk-CBM work, which uses bacterial or plant cellulose systems.

Closest prior art:

  • Gomes et al. (2011), Biomaterials. MaSp1 fused with antimicrobial peptides; CBMs from Clostridium thermocellum used with antimicrobial peptides for cellulosic surfaces, not CBM27 with spider silk for hydrogel or actuator applications. No RGD, no fungal matrix.
  • Mohammadi et al. (2019), Science Advances. ADF3 spidroin fused with CBMs from the Clostridium thermocellum cellulosome; hydrogel noted as a potential application but not developed. No CBM27, no MaSp1, no RGD, no fungal polysaccharide matrix.

Key gaps this construct addresses: No prior work combines CBM27 specifically with MaSp1, no prior CBM-silk work targets a fungal mannan matrix, no prior CBM-silk fusion incorporates an active cell recruitment motif, and no prior work develops these composites explicitly as structural hydrogel or soft actuator scaffolds with a defined bioart or extreme environment application context.

References:

  • Gomes et al. 2011, Biomaterials. CBM-silk antimicrobial fusion proteins.
  • Mohammadi et al. 2019, Science Advances. Cellulose-spider silk CBM composites. https://www.science.org/doi/10.1126/sciadv.aaw2541

Photomechanical Soft Robotics for Space Environments

Key context for VVD_GGS_CBM27_MaSp1_4x construct (VVD: LOV domain from Neurospora crassa), Aim 3

Novelty assessment: High. The integration of a genetically encoded LOV domain as a photomechanical actuator in a silk-fungal composite material for space environment applications is not present in current literature. This is the first proposed use of a LOV domain specifically for photomechanical actuation in harsh extraterrestrial conditions, leveraging available light flux as an electronics-free autonomous trigger.

Relevance to construct: The VVD LOV domain activates under blue light at approximately 450 nm through cysteinyl-flavin adduct formation, driving homodimerisation and propagating conformational strain through the adjacent MaSp1 silk network. Martian solar spectrum at surface level retains sufficient blue light flux to activate LOV domain photochemistry, raising the possibility of fully autonomous actuation without onboard electronic light sources. This directly addresses the central gap in the space soft robotics literature, which has not considered genetically encoded photoreceptors as actuation mechanisms.

Four-layer novelty claim: This construct sits at the intersection of four properties that no prior work addresses simultaneously: a genetically encoded actuation mechanism, LOV domain photochemistry specifically, a silk-fungal composite structural material, and a space environment application context.

Martian UV flux note: LOV domain activation occurs at approximately 450 nm blue light. The Martian surface solar spectrum, while UV-shifted relative to Earth, retains blue light flux sufficient for LOV photochemistry. This opens the possibility of light-autonomous soft material actuation on the Martian surface without electronic control systems, a capability gap explicitly identified in the NASA ACR24 architecture technology gaps document for high-performance actuators and grippers in extreme environments.

Closest prior art:

  • Lopez-Lopez et al. (2025), ICRA. Inflatable soft robotic arms for space debris capture using pneumatic actuation; motor-free but not photomechanical and not LOV-based.
  • Zeng et al. (2018), Advanced Materials. Light-responsive liquid-crystal elastomers for soft microrobots; photomechanical but synthetic polymer systems, not genetically encoded, not space-environment specific.
  • Nohooji and Voos (2025), Advanced Intelligent Systems. Compliant robotics for space; material selection focused, discusses harsh environment conditions but not photomechanical or LOV-based actuation.

Key gaps this construct addresses: No prior work uses a genetically encoded LOV domain for space soft robotics. No prior photomechanical soft robotics work targets extraterrestrial light as an autonomous trigger. No prior space soft robotics work uses a silk-fungal composite as the structural actuator material. The direct exploitation of Martian solar flux for autonomous LOV-driven silk network actuation is entirely unaddressed in the literature.

NASA technology gap alignment: NASA ACR24 Architecture Technology Gaps document identifies high-performance actuators, sensors, and interfaces for robotic joints and grippers operating in extreme cold and vacuum as an unresolved need. This construct addresses that gap through a passive, electronics-free, genetically encoded photomechanical mechanism that does not rely on lubricants, heating systems, or electronic controllers.

References:

  • Lopez-Lopez et al. 2025: https://softroboticsforspace.eu/images/paper-submissions/Lopez-ICRA2025-Soft-Robotics-for-Space-Applications.pdf
  • Zeng et al. 2018: https://advanced.onlinelibrary.wiley.com/doi/full/10.1002/adma.201703554
  • Nohooji and Voos 2025: https://www.researchgate.net/publication/390462454_Compliant_Robotics_in_Space_A_Prospective_Review_of_Soft_and_Deformable_Systems_for_Space_Missions
  • NASA ACR24: https://www.nasa.gov/wp-content/uploads/2024/12/acr24-architecture-technology-gaps.pdf


Section 4: Experimental Design

Techniques Used

  • Protein design (Benchling)
  • DNA construct design and codon optimisation
  • Synthesis feasibility testing (IDT, Twist Bioscience)
  • Twist order design (Gene Fragment and Clonal Gene)
  • Sequence databases (UniProt, NCBI GenBank)
  • Bioethical considerations

Industry Partners

  • Twist Bioscience - synthesis feasibility and draft gene orders
  • Ginkgo Bioworks - target partner for cell-free expression (Aim 2a)
  • Benchling - construct assembly and sequence management

Section 5: Results

What Was Validated

Both construct designs and the full synthesis feasibility pipeline, from domain sequence acquisition through to a confirmed-orderable 4x construct on Twist.

AlphaFold3 structure predictions for both constructs.

MolProbity geometric validation for Construct 2 (VVD_GGS_CBM27_MaSp1_4x).

Key Findings

The 8x MaSp1 repeat construct failed at both IDT (complexity score 53, threshold 24) and Twist (Not Accepted), due to unavoidable alanine/glycine codon poverty across 8 identical repeats. The redesigned 4x construct was confirmed orderable: Complex, $145.45, 6,297 bp.

Construct 1 BioMechSkin (CBM27_RGD_MaSp1_4x): AlphaFold3 pTM 0.54. CBM27 domain pLDDT >90 (very high confidence). RGD motif solvent exposed and distal, confirming ideal geometry for integrin binding. MaSp1 disordered as expected.

Construct 2 BioMechGrip (VVD_GGS_CBM27_MaSp1_4x): AlphaFold3 pTM 0.45. VVD and CBM27 both fold independently with high confidence. MaSp1 disordered as expected. MolProbity validation: clashscore 4.46 at 95th percentile, MolProbity score 1.62 at 92nd percentile, favoured rotamers 99.70%, zero poor rotamers, zero bad bonds, Ramachandran favoured 94.1% (462/491 residues), allowed 99.2% (487/491 residues). Benchmarked against 1,784 experimentally solved PDB crystal structures.
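The Ramachandran percentages quoted above follow directly from the residue counts. The short check below is only an arithmetic sketch (the helper name `percent` is hypothetical); the counts 462, 487 and 491 are those reported in the text. The denominator of 491 rather than 493 is presumably because the two terminal residues each lack one backbone dihedral, though the source does not state this.

```python
def percent(count, total, digits=1):
    """Percentage of `count` out of `total`, rounded to `digits` decimals."""
    return round(100 * count / total, digits)


evaluated = 491          # residues with a Ramachandran evaluation (from the text)
favoured = percent(462, evaluated)
allowed = percent(487, evaluated)
outliers = evaluated - 487

print(f"favoured {favoured}%, allowed {allowed}%, outliers {outliers}")
```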

Data

VVD_GGS_CBM27_MaSp1_4x – AlphaFold3 & MolProbity (Construct 2)

AlphaFold3: pTM 0.45. VVD and CBM27 both fold independently with high confidence. MaSp1 disordered as expected.

MolProbity (run on VVD construct only): Clashscore 4.46 at 95th percentile. MolProbity score 1.62 at 92nd percentile. Favoured rotamers 99.70%. Zero poor rotamers. Zero bad bonds. Benchmarked against 1784 experimentally solved PDB crystal structures.

Challenges

The 8x MaSp1 repeat construct failed synthesis due to alanine/glycine codon poverty across identical repeats. Redesigned to 4x, which was confirmed orderable on Twist at $145.45, 6,297 bp. Sidewinder (Wang et al., Nature 2026) directly addresses this repetitive sequence assembly problem and is flagged as a future synthesis route.

Next Steps

AIM 2 and beyond

Validate CBM27_RGD_MaSp1_4x (BioMechSkin) and VVD_GGS_CBM27_MaSp1_4x (BioMechGrip) through cell-free expression (possibly via Ginkgo CFPS or NEB PURExpress), SDS-PAGE, and a polysaccharide binding assay to confirm CBM27 affinity for β-(1→3)-D-mannan from Tremella fuciformis. Build a conceptual and computational archive.

If protein expresses: mix with Tremella fuciformis polysaccharide (CAS 9075-53-0, 1% w/v in PBS pH 7.4) to assemble and characterise the composite hydrogel.
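For bench planning, the 1% w/v figure above translates into a mass per volume (1% w/v is 1 g per 100 mL, i.e. 10 mg/mL). The helper below is a small sketch; the function name and the 10 mL batch size are assumptions chosen for illustration, not values from the protocol.

```python
def solute_mass_g(percent_w_v, volume_ml):
    """Grams of solute needed for a % w/v solution (grams per 100 mL)."""
    return percent_w_v * volume_ml / 100.0


# Assumed example batch: 10 mL of PBS at 1% w/v polysaccharide.
mass_g = solute_mass_g(1.0, 10.0)
print(f"{mass_g * 1000:.0f} mg of polysaccharide in 10 mL PBS")
```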


References & Resources (Full list)

Key Literature

Core Construct

  • Huang et al. 2025, ACS Synthetic Biology - Tremella fuciformis polysaccharide bioactivity
  • Boraston et al. 2003, Structure of a Carbohydrate-Binding Module - CBM27 graft precedent
  • Rising et al. 2011, Cellular and Molecular Life Sciences - Recombinant MaSp1 self-assembly in E. coli
  • Rui, Y., Lee, Q., Guo, Y., Huang, Y., Xu, H., Liu, B., Ge, X., Lin, H. and Zeng, F. (2025) ‘Structure, Function and Application of Tremella Fuciformis Polysaccharide: A Review’, Journal of Food Science, 90: e70494. doi: 10.1111/1750-3841.70494
  • Sen, C.K. (2025) ‘Human Wound and Its Burden: Updated 2025 Compendium of Estimates’, Advances in Wound Care, doi: 10.1177/21621918251359554

Spider Silk Synthesis

  • Robinson, N.E., Zhang, W., Ghosh, R., Gerber, B., Zhang, H., Sanfiorenzo, C., Wang, S., Di Carlo, D. and Wang, K. (2026) ‘Construction of complex and diverse DNA sequences using DNA three-way junctions’, Nature, 651(8105), pp. 491-500. https://doi.org/10.1038/s41586-025-10006-0

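Closest Prior Art, CBM-Silk Composites

  • Gomes et al. 2011, Biomaterials - CBM-silk antimicrobial fusion proteins
  • Mohammadi et al. 2019, Science Advances - Cellulose-spider silk CBM composites. https://www.science.org/doi/10.1126/sciadv.aaw2541

Closest Prior Art, Space Soft Robotics

  • Lopez-Lopez et al. 2025, ICRA - Inflatable soft robotic arms for space debris capture. https://softroboticsforspace.eu/images/paper-submissions/Lopez-ICRA2025-Soft-Robotics-for-Space-Applications.pdf
  • Zeng et al. 2018, Advanced Materials - Light-responsive liquid-crystal elastomers for soft microrobots. https://advanced.onlinelibrary.wiley.com/doi/full/10.1002/adma.201703554
  • Nohooji and Voos 2025, Advanced Intelligent Systems - Compliant robotics for space. https://www.researchgate.net/publication/390462454_Compliant_Robotics_in_Space_A_Prospective_Review_of_Soft_and_Deformable_Systems_for_Space_Missions

NASA Technology Gap

  • NASA ACR24 Architecture Technology Gaps document. https://www.nasa.gov/wp-content/uploads/2024/12/acr24-architecture-technology-gaps.pdf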


Lecture Materials

All HTGAA 2026 lectures and labs informed this project, in particular:

  • HTGAA 2026 Final Project - Individual Final Project, How To Grow Almost Anything, Spring 2026, MIT Media Lab
  • Week 4 Lecture & Lab - Protein Design Part I
  • Week 5 Lecture & Lab - Protein Design Part II, Pranam Chatterjee, Gabriele Corso (March 2026)
  • Week 6 - Genetic Circuits Part I (Gibson Assembly lab context for construct cloning)

Databases & Sequence Sources

  • UniProt Q08650 - Thermotoga maritima Man5 mannanase, source of the CBM27 carbohydrate-binding domain (residues 490-669)
  • NCBI GenBank - Cross-checking and FASTA retrieval for CBM27 and MaSp1 reference sequences
  • Human fibronectin RGD motif - cell adhesion peptide reference

Software & Tools Used

  • Benchling - Construct assembly, codon optimization (E. coli K12), restriction site screening, annotated 6,297 bp plasmid map
  • AlphaFold3 - Structure prediction for both fusion constructs (CBM27_RGD_MaSp1_4x and VVD_GGS_CBM27_MaSp1_4x)
  • Twist Bioscience - Synthesis feasibility testing and Clonal Gene order in pET28a(+); 4x construct confirmed orderable at $145.45, 6,297 bp
  • IDT - Secondary synthesis feasibility check; complexity score 53 flagged the 8x repeat design
  • pET28a(+) backbone - Cloning vector (NdeI/XhoI sites, T7 promoter, C-terminal 6xHis tag, BL21(DE3) host)
  • HADDOCK - Planned protein-polysaccharide docking simulation for CBM27 to Tremella β-(1→3)-mannan
  • PyMOL / UCSF ChimeraX - Structure visualization
  • Sidewinder DNA assembly (Genyro / Wang lab, Caltech) - Identified at SynBioBeta 2026 as a route to assemble the repetitive MaSp1 region with high fidelity

AI Assistance

  • Claude (Anthropic)
    • Models: Claude Sonnet 4.5
    • Date(s) used: March to May 2026
    • Tasks included: Acted as mentor on unfamiliar technical areas including in silico validation pipeline planning, budget tradeoffs (PURExpress, Twist gBlock vs Clonal Gene), code for project website and final documentation. Also used for sanity-checking sequences and stepwise Benchling workflows.

Industry Partners & External Resources

  • Twist Bioscience - Synthesis feasibility and draft gene orders
  • Ginkgo Bioworks - Target partner for cell-free expression (Aim 2a)
  • Genyro / Sidewinder - SynBioBeta 2026, May 6 2026; identified as future assembly route
  • SynBioBeta 2026 conference - Sidewinder presentation by Adrian Woolfson and Prof. Kaihang Wang

Acknowledgments

  • HTGAA 2026 course instructors, TAs, and guest lecturers (MIT Media Lab)
  • Node leads, TAs and cohort
  • All homework design contributors
  • The HTGAA 2026 cohort as a whole

Subsections of Individual Final Project-Co-RSynBioR

Aim 1. Construct Design: CBM27_RGD_MaSp1 Fusion Protein

Spider Silk-Snow Fungus Bioactive Biomaterial Optimised for Wound Care — CBM27_RGD_MaSp1_4x Fusion Protein · HTGAA Spring 2026

Documented (Update)

In Benchling:

  • Full 8x construct (480 aa) fully annotated for in silico validation and Ginkgo pitch
  • 4x construct (340 aa) fully annotated as synthesis-ready design

In Twist:

  • Saved draft order for CBM27_RGD_MaSp1_4x_Fusion as a Gene Fragment at $71.61 (~£56), 1,023 bp, complexity: Complex, orderable

  • Saved draft order for CBM27_RGD_MaSp1_4x_pET28a as a Clonal Gene in pET28a(+) at NdeI/XhoI insertion point at $145.45 (~£115), 1,005 bp insert, 6,297 bp total plasmid, complexity: Complex, orderable

  • Downloaded FASTA, GenBank and text files of the optimized gene fragment sequence

  • Downloaded GenBank file of the complete pET28a construct

  • Circular plasmid map generated and documented showing all functional elements
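The base-pair figures in the orders above can be sanity-checked from the protein lengths: a coding sequence is three bases per residue, plus three for a stop codon where one is included. The sketch below uses a hypothetical helper, `cds_length_bp`, and assumes a single stop codon with no flanking adapters.

```python
def cds_length_bp(n_residues, include_stop=True):
    """Length in bp of a coding sequence for `n_residues` amino acids."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


# 340 aa 4x construct plus stop codon -> matches the 1,023 bp Gene Fragment
print(cds_length_bp(340))
# 177 aa CBM27 domain alone, no stop -> matches a 531 bp optimized output
print(cds_length_bp(177, include_stop=False))
```

A mismatch between this arithmetic and a vendor's reported length is a quick signal that a tag, stop codon, or flanking site has been added or dropped.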

In Benchling: (Older)

  • Full 8x construct (480 aa) fully annotated
  • 4x construct (340 aa) fully annotated as synthesis-ready design

In Twist:

  • Saved draft order for CBM27_RGD_MaSp1_4x_Fusion at $71.61
  • Downloaded FASTA, GenBank and text files of the optimized sequence

In IDT:

  • Documented complexity failure of 8x construct (score 57.3)
  • Documented complexity failure of identical MaSp1 repeats (score 53)

The Four-Component Logic

The fusion protein is built from four functional units arranged in a deliberate order from N-terminus to C-terminus:

The CBM27 domain, sourced from the Thermotoga maritima beta-mannanase Man5 (gene TM1227), physically anchors the silk network to the Tremella polysaccharide matrix by binding its mannan backbone. Without this anchor, the silk and polysaccharide components would phase-separate in wound exudate.

The flexible GGGGS x3 linker gives the CBM27 domain rotational freedom to engage the Tremella polysaccharide chains independently from the rest of the protein. This design follows validated precedent from CBM27 fusion protein literature.

The RGD motif (GRGDS), derived from the minimal integrin-binding sequence of human fibronectin, actively recruits fibroblasts and keratinocytes to the material surface to accelerate tissue repair. This converts the dressing from a passive scaffold into an active wound-healing participant.

Eight MaSp1 consensus repeat units provide the mechanical backbone of the construct, self-assembling into beta-sheet nanofibrils that give the hydrogel its toughness and structural integrity.

Construct Architecture

The full fusion protein is 480 amino acids with the following domain map:

Domain | Positions | Function
CBM27 (buffered) | 1-176 | Tremella polysaccharide anchor
GGGGS x3 Linker | 177-188 | Domain flexibility
RGD motif | 189-193 | Cell recruitment
MaSp1 x8 repeats | 194-474 | Mechanical structure
6xHis tag | 475-480 | Ni-NTA purification

Molecular weight: 44,409 Da. Isoelectric point: 7.29.

Sequence Source and Verification

The CBM27 domain boundaries were verified against the UniProt entry for Thermotoga maritima Man5, which annotates the Carbohydrate Binding Module 27 at residues 495-664 of the full 669 aa protein. A five-residue buffer was added to each boundary to avoid clipping structurally important residues at the domain edge, giving a final CBM27 input of 177 amino acids.

The complete construct was assembled and annotated in Benchling as an AA sequence file named CBM27_RGD_MaSp1_Fusion within the Spider Silk - Tremella project.

Synthesis Complexity and the Codon Shuffling Requirement

Initial synthesis feasibility analysis via the IDT Codon Optimization Tool confirmed that direct translation of 8 identical MaSp1 repeat units generates prohibitive sequence complexity, returning a total complexity score of 57.3 against a synthesis threshold of 24. The specific failure modes identified were:

  • Repeated DNA sequences exceeding 13 bases appearing at multiple locations across the insert
  • 73.5% of the overall sequence composed of repeats longer than 8 bases
  • A 100-base window at position 1165 with 85% GC content
  • Hairpin structures forming between identical repeat regions

This result validates the codon shuffling strategy as a necessary design requirement rather than an optional refinement. The next step is to manually assign synonymous codons to each of the 8 MaSp1 repeats so that every repeat encodes an identical amino acid sequence but presents a distinct DNA sequence to the synthesis machinery.

1. Sequence Acquisition: CBM27 Domain

The carbohydrate binding module 27 domain was sourced from the Thermotoga maritima beta-mannanase Man5 (gene TM1227), a 669 amino acid protein. The CBM27 domain boundaries were verified against the UniProt entry, which annotates the domain at residues 495-664. A five-residue buffer was added to each boundary to avoid clipping structurally important residues at the domain edge, giving a final CBM27 input sequence of 177 amino acids spanning residues 490-669.

The extracted sequence was:

KVVNRAQYVLAEEVDFSRPEEVKNWWNSGTWQAEFKTPDIEWNGEVGNG
ALQMNVVLPGKGDWEEVRVVRKFDQLPVCEILEYDIYIPDVEGLTGRLR
PYAVLNPGWVKIGLDMNNTSIDSGELVSFDGKKYRKFHVRIEFDKTPGV
NELHIGVVGDHLEYDGPIFIDNVRLYKKSS
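The buffering step described above (widening the annotated domain by five residues on each side, clipped to the ends of the parent protein) can be written as a small helper. This is a sketch with hypothetical function names; it uses 1-based inclusive coordinates as in UniProt annotations, and the parent sequence itself is not reproduced here.

```python
def buffered_bounds(start, end, protein_length, buffer=5):
    """Widen a 1-based inclusive domain range by `buffer` residues per side,
    clipping to the first and last residue of the parent protein."""
    new_start = max(1, start - buffer)
    new_end = min(protein_length, end + buffer)
    return new_start, new_end


def extract_region(sequence, start, end):
    """Slice a 1-based inclusive region out of a protein sequence string."""
    return sequence[start - 1:end]


# Annotated CBM27 range 495-664 in the 669 aa parent, with a 5-residue buffer
print(buffered_bounds(495, 664, 669))
```

Clipping matters here because the C-terminal buffer reaches the very end of the parent protein, so no further extension is possible on that side.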

2. Construct Assembly in Benchling

The full fusion protein was assembled in Benchling as an AA sequence file named CBM27_RGD_MaSp1_Fusion within the Spider Silk - Tremella project folder. Each domain was added sequentially and annotated with a distinct colour:

Domain | Positions | Colour | Function
CBM27 (buffered) | 1-176 | Blue | Tremella polysaccharide anchor
GGGGS x3 Linker | 177-188 | Grey | Domain flexibility
RGD motif | 189-193 | Pink | Cell recruitment
MaSp1 R1-R8 | 194-474 | Green | Mechanical structure
6xHis tag | 475-480 | Yellow | Ni-NTA purification

Total length: 480 amino acids. Molecular weight: 44,409 Da. Isoelectric point: 7.29.

The annotated sequence map confirmed all domains were correctly positioned and accounted for. The biochemical properties panel confirmed the sequence was 480 amino acids running from KVVN at the N-terminus to HHHH at the C-terminus.


3. Codon Optimization: CBM27 Domain

The CBM27 domain alone (177 aa) was submitted to the IDT Codon Optimization Tool with the following settings:

  • Sequence type: Amino Acids
  • Product type: gBlocks Gene Fragments
  • Organism: Escherichia coli
  • Restriction sites to avoid: BsaI (GGTCTC), BsmBI (CGTCTC)

The CBM27 domain passed initial screening with no complexity issues. The codon-optimized DNA output was 531 bp. No BsaI or BsmBI sites were introduced. Several other restriction sites were noted (PstI, SmaI, XmaI) but these are irrelevant to the Golden Gate assembly strategy and were not flagged as problems.
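The "no BsaI or BsmBI sites" check can be reproduced independently of the vendor tool by scanning both strands for the recognition sequences. The sketch below uses only the two motifs named above; the function names and the toy input are illustrative, and the real codon-optimized sequence is not reproduced here.

```python
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(dna, sites=SITES):
    """Return {enzyme: sorted 1-based positions} of each motif on either strand."""
    dna = dna.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        # A site on the bottom strand appears as the reverse complement on top.
        for pattern in {motif, reverse_complement(motif)}:
            start = dna.find(pattern)
            while start != -1:
                positions.append(start + 1)
                start = dna.find(pattern, start + 1)
        hits[name] = sorted(positions)
    return hits


# Toy sequence containing one forward site for each enzyme
print(find_sites("AAAGGTCTCAAACGTCTCTTT"))
```

Checking the reverse complement matters because Type IIS enzymes such as BsaI and BsmBI have non-palindromic recognition sequences, so a bottom-strand site is invisible to a forward-only search.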


4. Synthesis Feasibility Testing: Full 8x Construct

IDT Test 1: Identical MaSp1 Repeats

The complete 480 amino acid sequence was submitted to IDT Codon Optimization with E. coli settings. IDT returned a complexity score of 57.3, well above the synthesis threshold of 24, with the following specific failure modes:

  • A repeat sequence of 42 bases appearing at multiple locations
  • 73.5% of the overall sequence composed of repeats longer than 8 bases
  • A 100-base window at position 1165 with 85% GC content
  • Hairpin structures forming between identical repeat regions

Result: Denied. Not synthesizable.

This confirmed the known problem with identical tandem silk repeats: even after codon optimization, the DNA homology between repeated units causes synthesis machinery slippage.

IDT Test 2: Codon-Shuffled MaSp1 Repeats

Eight synonymous codon-shuffled variants of the MaSp1 repeat were computationally generated, each encoding the identical amino acid sequence GQGAGAAAAAAGGAGQGGYGGLGSQGAGRGGLGGQ but using distinct codon assignments drawn from the E. coli K12 codon table. No two adjacent repeats shared more than 3 consecutive identical base pairs.
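
The shuffling and the adjacency check can be sketched as follows. This is an illustrative reconstruction, not the script actually used: it picks synonymous codons uniformly at random instead of weighting by the E. coli K12 usage table, and it only measures the longest identical run shared by adjacent repeats without enforcing a limit. All function names are hypothetical.

```python
import random

# Synonymous codons (standard genetic code) for the residues in the MaSp1 repeat.
CODONS = {
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "Q": ["CAA", "CAG"],
    "Y": ["TAT", "TAC"],
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}

REPEAT = "GQGAGAAAAAAGGAGQGGYGGLGSQGAGRGGLGGQ"


def shuffle_codons(protein: str, rng: random.Random) -> str:
    """Encode the protein with a randomly chosen synonymous codon per residue."""
    return "".join(rng.choice(CODONS[aa]) for aa in protein)


def translate(dna: str) -> str:
    """Translate DNA back to protein using the restricted codon table above."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest identical substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


rng = random.Random(0)  # fixed seed so the run is reproducible
variants = [shuffle_codons(REPEAT, rng) for _ in range(8)]
runs = [longest_shared_run(variants[i], variants[i + 1]) for i in range(7)]
print(runs)
```

Because glycine and alanine each have only four codons, purely random shuffling tends to leave fairly long shared runs, so a real design loop would need to reject and resample variants until the chosen limit is met.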

The full insert including CBM27, linker, RGD, all 8 shuffled repeats, 6xHis tag and stop codon was assembled as a 1,443 bp sequence and submitted to IDT gBlocks entry for complexity testing.

Result: Complexity score 53. Still Denied.

The codon shuffling reduced the score from 57.3 to 53 but was insufficient to bring it below the 24 threshold. The dominant remaining problem was a 42-base shared subsequence between two repeats in the alanine-rich region, where the limited synonymous codon options for alanine and glycine prevented sufficient DNA diversification across 8 repeats.

Twist Bioscience Test: Full 8x Construct

The same 480 amino acid sequence was submitted to Twist Bioscience’s gene ordering portal using their built-in amino acid import and codon optimization workflow with E. coli codon table, BsaI and BsmBI avoidance.

Result: Not Accepted.

Twist’s algorithm, which was expected to handle repetitive sequences better than IDT’s, also rejected the 8x construct. With both vendors declining it, the design with eight tandem MaSp1 repeats was treated as not synthesizable as a single gene fragment through either IDT or Twist.


5. Design Decision: 4x Repeat Construct

Based on the synthesis feasibility data, a second construct was designed using 4 MaSp1 repeat units instead of 8. This was created in Benchling by duplicating the original file and deleting repeats R5 through R8 (positions 334-473), producing a 340 amino acid construct named CBM27_RGD_MaSp1_4x_Fusion.

Domain | Positions | Function
CBM27 (buffered) | 1-176 | Tremella polysaccharide anchor
GGGGS x3 Linker | 177-188 | Domain flexibility
RGD motif | 189-193 | Cell recruitment
MaSp1 R1-R4 | 194-333 | Mechanical structure
6xHis tag | 334-340 | Ni-NTA purification

6. Synthesis Feasibility Testing: 4x Construct on Twist

The 340 amino acid 4x construct was submitted to Twist Bioscience using the same amino acid import workflow. Twist performed codon optimization for E. coli with BsaI and BsmBI avoidance and returned the following result:

  • Complexity: Complex
  • Length: 1,023 bp
  • Price: $71.61 (~£56)
  • Status: Orderable

Complex in Twist terminology means the sequence is manufacturable but requires more careful synthesis handling than a standard sequence. It is not a rejection. The remaining complexity flags were minor repeat density warnings in the MaSp1 region, all categorised as warnings rather than errors.

Twist’s codon optimization was applied and the optimized sequence was downloaded in FASTA, GenBank and text formats. The order draft was saved to the Twist account.


7. Summary of Constructs and Status

Construct | Repeats | Length | IDT Result | Twist Result | Purpose
CBM27_RGD_MaSp1_Fusion | 8x | 480 aa / 1,443 bp | Score 53, Denied | Not Accepted | In silico validation, Ginkgo pitch
CBM27_RGD_MaSp1_4x_Fusion | 4x | 340 aa / 1,023 bp | Not tested | Complex, $71.61, Orderable | Wet lab synthesis
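
The base-pair lengths in this summary follow from the protein lengths at three bases per residue plus one stop codon. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def insert_length_bp(n_residues: int, stop_codon: bool = True) -> int:
    """Coding length in bp for n_residues, optionally including one stop codon."""
    return 3 * n_residues + (3 if stop_codon else 0)


constructs = {
    "CBM27_RGD_MaSp1_Fusion": 480,
    "CBM27_RGD_MaSp1_4x_Fusion": 340,
}
for name, n_aa in constructs.items():
    print(name, insert_length_bp(n_aa), "bp")
```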

8. Backbone Vector Documentation

The insert was designed for expression in pET-28a(+) (Novagen, 5,365 bp), cloned at the NdeI/XhoI insertion points. This places the fusion protein under T7 promoter control with a C-terminal 6xHis tag provided by the vector for Ni-NTA purification. The vector carries kanamycin resistance for bacterial selection and a ColE1 high-copy origin of replication.

The complete circular plasmid was designed directly in Twist Bioscience’s Clonal Gene ordering portal, which performs synthesis, cloning, transformation, colony picking and Sanger sequencing verification, delivering a ready-to-use sequence-verified plasmid. The full construct is 6,297 bp and was downloaded as a GenBank file and imported into Benchling as a circular DNA sequence for documentation.

The plasmid is compatible with NEB PURExpress E6800 and Ginkgo Bioworks CFPS cell-free expression systems, both of which are T7 promoter-driven, E. coli-based systems requiring no further cloning before expression.

9. Next Steps

In silico track: (see In silico page)

Submit the full 480 aa CBM27_RGD_MaSp1_Fusion sequence to AlphaFold3 to predict the 3D structure and confirm that the CBM27 domain and RGD motif are surface-exposed and not buried within the MaSp1 beta-sheet core.

Wet lab track:

The protein construct is ready to be ordered from Twist. The preferred assay route is Ginkgo Bioworks (for remote lab experience), with a LifeFabs assay as the second option. A protocol draft has been started.

Protocol for Assay

Draft v 2


Experimental Protocol: CBM27_RGD_MaSp1_4x Fusion Expression and Tremella Composite Formation

Construct: CBM27_RGD_MaSp1_4x_Fusion
Vector: pET28a (NdeI/XhoI, C-terminal 6xHis tag)
Protein MW: ~37 kDa (340 aa)
Expression system: Cell-free protein synthesis (CFPS) — see Aim 2 context below
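
The expected band size can be cross-checked against the 44,409 Da reported by Benchling for the 8x construct by subtracting the mass of the four deleted MaSp1 repeats. The sketch below uses standard average residue masses and assumes that no vector-derived residues are added to the expressed protein; both the helper name and that assumption are illustrative.

```python
# Average residue masses in Da (amino acid minus one water) for the repeat residues.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "Q": 128.1307,
    "L": 113.1594, "Y": 163.1760, "R": 156.1875,
}
REPEAT = "GQGAGAAAAAAGGAGQGGYGGLGSQGAGRGGLGGQ"
MW_8X_DA = 44409.0  # molecular weight of the 480 aa construct reported by Benchling


def residues_mass(seq: str) -> float:
    """Summed residue mass of an internal peptide segment (no terminal water)."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


repeat_mass = residues_mass(REPEAT)
mw_4x = MW_8X_DA - 4 * repeat_mass  # remove repeats R5 to R8
print(f"repeat: {repeat_mass:.1f} Da, 4x estimate: {mw_4x / 1000:.1f} kDa")
```

Under these assumptions the estimate comes out a few kDa below the ~37 kDa quoted here, so it may be worth recomputing the expected band size from the final Twist sequence before fixing the SDS-PAGE imaging parameters.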


Aim 2 Context: Why This Assay Exists

This protocol is the experimental component of Aim 2a of the Exoskin project. Aim 1 established the computational and synthesis-feasibility foundation: the CBM27_RGD_MaSp1_4x construct has been designed in Benchling, codon-optimised for E. coli K12, validated structurally by AlphaFold3 (CBM27 pLDDT >90, RGD solvent-exposed, MaSp1 disordered as expected), and confirmed as orderable from Twist Bioscience at 6,297 bp in pET28a ($145.45, Clonal Gene). HADDOCK docking of CBM27 against Tremella β-(1→3)-D-mannan is pending due to a documented gap in carbohydrate force field databases for fungal mannans; this makes wet lab confirmation of polysaccharide binding the critical validation step.

Primary execution route: Ginkgo Bioworks cloud lab automation (Aim 2a)

The protocol below is designed for submission to the Ginkgo Bioworks automated CFPS platform (access via the HTGAA Twist order submission from Aim 1, to be confirmed). Steps are written to be compatible with liquid-handling automation and standard Ginkgo reagent sets. Where a step requires upfront specification (e.g. chaperone supplementation), this is flagged explicitly, because automated cloud lab runs cannot be modified mid-protocol after submission.

Alternative execution route: If Ginkgo partnership is not confirmed in time, this protocol can be run manually using NEB PURExpress E6800 in any standard molecular biology lab. All volumes and conditions are identical between routes.


Step 1: Cell-Free Protein Synthesis

Materials needed:

  • PURExpress Solution A
  • PURExpress Solution B
  • Murine RNase Inhibitor
  • Template DNA: pET28a_CBM27_RGD_MaSp1_4x plasmid (circular, 250 ng) or linear PCR product
  • DnaK chaperone mix (supplementary, see note below)
  • Nuclease-free water
  • Ice

Chaperone specification note (required for Ginkgo submission): MaSp1 repeat regions are prone to aggregation and beta-sheet stacking due to their glycine/alanine-rich repetitive sequence. DnaK/DnaJ/GrpE chaperone supplementation should be specified upfront in the Ginkgo protocol submission. Typical addition: 2 µM DnaK, 0.4 µM DnaJ, 0.1 µM GrpE in the final reaction volume. If using manual PURExpress, add chaperones after assembling the base reaction and before incubation.

Protocol:

  1. Thaw Solutions A and B on ice. Do not vortex.
  2. Assemble the following reaction on ice in this exact order in a 1.5 ml microcentrifuge tube:
    • 10 µl Solution A
    • 7.5 µl Solution B
    • 0.5 µl RNase Inhibitor (20 units)
    • 2 µl template DNA (250 ng)
    • 0.5 µl DnaK chaperone mix (see note above)
    • 4.5 µl nuclease-free water
    • Total volume: 25 µl
  3. Mix gently by pipetting up and down 5 times. Do not vortex.
  4. Incubate at 37°C for 2 hours.
  5. Place on ice immediately after incubation.

Run alongside: one negative control reaction with no template DNA, identical volumes and conditions.
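
The reaction volumes and the chaperone stock concentrations implied by the targets above can be checked with a few lines of arithmetic. This sketch assumes a single combined chaperone mix added as 0.5 µl, and a negative control in which the template volume is replaced by water; neither detail is specified in the protocol, so both are labelled assumptions here.

```python
# Reaction components in µl, in the order of addition given above.
REACTION_UL = {
    "Solution A": 10.0,
    "Solution B": 7.5,
    "RNase Inhibitor": 0.5,
    "template DNA": 2.0,
    "DnaK chaperone mix": 0.5,
    "nuclease-free water": 4.5,
}
# Target final chaperone concentrations in µM from the specification note.
TARGET_FINAL_UM = {"DnaK": 2.0, "DnaJ": 0.4, "GrpE": 0.1}


def total_volume(reaction: dict) -> float:
    return sum(reaction.values())


def required_stock_um(final_um: float, added_ul: float, total_ul: float) -> float:
    """Stock concentration so that added_ul in total_ul gives final_um (C1*V1 = C2*V2)."""
    return final_um * total_ul / added_ul


def no_template_control(reaction: dict) -> dict:
    """Assumption: the control swaps the template volume for water at the same total volume."""
    control = dict(reaction)
    control["nuclease-free water"] += control.pop("template DNA")
    return control


total = total_volume(REACTION_UL)
stocks = {
    name: required_stock_um(final, REACTION_UL["DnaK chaperone mix"], total)
    for name, final in TARGET_FINAL_UM.items()
}
control = no_template_control(REACTION_UL)

print(f"total: {total} µl")
print("chaperone mix stock (µM):", stocks)
print("template needed:", 250 / REACTION_UL["template DNA"], "ng/µl")
```

The 50-fold dilution of the chaperone mix means the stock has to be correspondingly concentrated, which is worth confirming with the supplier or with Ginkgo before submission.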


Step 2: His-Tag Purification (Mandatory Before Step 3)

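This step is required before proceeding to Tremella composite formation. The raw PURExpress reaction contains many E. coli ribosomal and accessory proteins, and mixing unpurified reaction with Tremella polysaccharide risks false-positive gel formation from non-specific protein-polysaccharide interactions. Ni-NTA purification enriches the 6xHis-tagged fusion protein. Because the protein factors supplied in PURExpress are themselves His-tagged, some kit components will co-purify on the resin, which makes the purified no-template control eluate essential for interpreting Steps 3 and 4.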

Materials needed:

  • Ni-NTA agarose resin (e.g. Qiagen, cat. 30210) or magnetic Ni-NTA beads
  • Binding buffer: 50 mM NaH₂PO₄, 300 mM NaCl, 10 mM imidazole, pH 8.0
  • Wash buffer: 50 mM NaH₂PO₄, 300 mM NaCl, 20 mM imidazole, pH 8.0
  • Elution buffer: 50 mM NaH₂PO₄, 300 mM NaCl, 250 mM imidazole, pH 8.0
  • Spin columns or magnetic rack
  • PBS pH 7.4 for buffer exchange

Protocol:

  1. Dilute the 25 µl CFPS reaction with 225 µl binding buffer (1:10 dilution).
  2. Add 25 µl pre-equilibrated Ni-NTA resin slurry. Mix on rotary mixer for 30 minutes at 4°C.
  3. Centrifuge 700 × g, 2 minutes. Remove supernatant (flow-through).
  4. Wash resin twice with 200 µl wash buffer. Centrifuge 700 × g, 2 minutes each wash.
  5. Elute with 50 µl elution buffer. Incubate 5 minutes at room temperature, then centrifuge 700 × g, 2 minutes.
  6. Buffer exchange into PBS pH 7.4 using a 10 kDa MWCO spin concentrator to remove imidazole.
  7. Recover purified protein in approximately 20–25 µl PBS.

Reserve 2.5 µl for SDS-PAGE (Step 3). The remaining ~18–22 µl proceeds to Step 4.


Step 3: SDS-PAGE Confirmation

Materials needed:

  • 10-20% Tris-glycine precast gel
  • SDS loading buffer (4x)
  • Protein ladder (10-250 kDa range)
  • Running buffer (Tris-glycine-SDS)
  • Coomassie Blue stain (or silver stain for higher sensitivity)

Protocol:

  1. Take 2.5 µl of the purified eluate and add 2.5 µl of 2x SDS loading buffer (or 1.25 µl of 4x buffer + 1.25 µl nuclease-free water) to reach 1x final concentration in a 5 µl load.
  2. Heat at 95°C for 5 minutes to denature proteins.
  3. Load onto gel alongside protein ladder and a lane with the same volume of purified negative control eluate.
  4. Run at 200V for 35 minutes.
  5. Stain with Coomassie Blue for 1 hour. Destain with water overnight, or use methanol/acetic acid destain for faster results.
  6. Look for a band at approximately 37 kDa.
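
The loading buffer volumes in step 1 follow from C1V1 = C2V2. The sketch below computes the buffer and water volumes for a chosen total load volume, plus the minimal buffer volume when no water is added; the function names are hypothetical.

```python
def loading_mix(sample_ul: float, stock_x: float, total_ul: float):
    """Return (buffer_ul, water_ul) giving 1x loading buffer in total_ul."""
    buffer_ul = total_ul / stock_x
    water_ul = total_ul - sample_ul - buffer_ul
    if water_ul < 0:
        raise ValueError("sample plus buffer exceed the requested total volume")
    return buffer_ul, water_ul


def minimal_buffer(sample_ul: float, stock_x: float) -> float:
    """Buffer volume for 1x with no added water: sample / (stock_x - 1)."""
    return sample_ul / (stock_x - 1)


print(loading_mix(2.5, 2, 5.0))           # 2x stock in a 5 µl load
print(loading_mix(2.5, 4, 5.0))           # 4x stock in a 5 µl load
print(round(minimal_buffer(2.5, 4), 2))   # 4x stock, no water added
```

For a 2.5 µl sample, a 4x stock therefore needs either 1.25 µl buffer plus 1.25 µl water in a 5 µl load, or about 0.83 µl buffer with no water in a 3.33 µl load.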

Success criterion: A visible band at 37 kDa in the expression lane that is absent in the negative control lane. Proceed to Step 4 only if this criterion is met.

If no band is observed: Do not proceed to composite formation. Troubleshoot expression first: check template DNA integrity by NanoDrop and gel, verify T7 promoter orientation in the construct, and consider a second run with increased template (up to 500 ng) or extended incubation (4 hours at 37°C). If aggregation is suspected (smear or high-MW band), confirm that DnaK chaperone supplementation was included.


Step 4: Tremella Composite Formation (only if Step 3 is successful; not yet confirmed whether this step can be run at Ginkgo Bioworks after the steps above)

Materials needed:

  • Tremella fuciformis polysaccharide: minimum 95% purity, fungal-derived, CAS 9083-80-1 (note: use a supplier that specifies β-(1→3)-linked mannan as the primary backbone structure; Sigma-Aldrich or Funakoshi are appropriate sources — confirm catalogue specification before ordering)
  • PBS buffer pH 7.4
  • Purified fusion protein from Step 2 (~18–22 µl)
  • Microcentrifuge tubes
  • Rotary mixer
  • Negative control: purified eluate from the no-template PURExpress reaction (same volume as fusion protein sample)

Protocol:

  1. Prepare a 1% w/v Tremella polysaccharide solution by dissolving 10 mg dried Tremella polysaccharide in 1 ml PBS pH 7.4. Stir gently at room temperature for 2 hours until fully dissolved. The solution should be visibly viscous. If it does not dissolve fully, warm to 37°C for 30 minutes before cooling back to room temperature.
  2. Take the purified fusion protein from Step 2 (~18–22 µl). Record exact volume.
  3. Add an equal volume of the 1% Tremella polysaccharide solution to the protein. Mix gently by pipetting 10 times. Do not vortex.
  4. Prepare a parallel control by mixing the same volume of the purified negative control eluate with the same volume of 1% Tremella solution.
  5. Incubate both tubes at room temperature for 30 minutes on a rotary mixer at low speed, to allow CBM27 domain binding to the Tremella β-(1→3)-D-mannan backbone.
  6. Assess viscosity using the tube inversion test: invert each tube 180° and observe flow behaviour over 10 seconds. A gel or high-viscosity hydrogel will not flow freely; a low-viscosity solution will run to the cap immediately.
  7. As a semi-quantitative measure, attempt to aspirate 10 µl from each tube using a standard 20 µl pipette tip. Note resistance to aspiration on a simple 0/1/2 scale (0 = no resistance, 1 = moderate resistance, 2 = cannot aspirate). Record for both the fusion protein tube and the control tube.
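
The polysaccharide quantities in steps 1 and 3 can be recomputed for other volumes with the following sketch, which treats 1% w/v as 10 mg per ml and assumes simple volume additivity on mixing. The helper names are illustrative.

```python
def mass_for_w_v(percent_w_v: float, volume_ml: float) -> float:
    """Mass in mg needed for a percent w/v solution (1% w/v = 10 mg/ml)."""
    return percent_w_v * 10.0 * volume_ml


def final_percent(stock_percent: float, stock_ul: float, other_ul: float) -> float:
    """Concentration after mixing stock_ul of stock with other_ul of another solution."""
    return stock_percent * stock_ul / (stock_ul + other_ul)


print(mass_for_w_v(1.0, 1.0), "mg for 1 ml of 1% w/v")
print(final_percent(1.0, 20.0, 20.0), "% w/v after equal-volume mixing")
```

Mixing equal volumes halves the polysaccharide concentration, so the composite is assessed at roughly 0.5% w/v Tremella polysaccharide in both the sample and the control tube.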

Success criterion: Visible increase in viscosity or gel formation in the fusion protein tube that is absent or substantially lower in the negative control tube. An aspiration-resistance score of 1 or 2 in the fusion protein tube versus 0 in the control tube constitutes a positive result.

If no gel formation is observed: CBM27 may not be folding correctly under cell-free conditions, or the Tremella preparation may not contain sufficient β-(1→3)-mannan backbone to support CBM27 binding. Troubleshooting options: (1) repeat with a different Tremella polysaccharide source or lot; (2) run a second CFPS reaction with extended incubation and additional DnaK; (3) attempt CD spectroscopy on the purified protein to assess CBM27 secondary structure if equipment is available.


What Each Result Means

Result | Interpretation
Band at 37 kDa on gel, absent in control | Protein expressed successfully. Proceed to Step 4.
No band on gel | Expression failed. Check DNA template integrity, T7 promoter orientation, and DnaK supplementation. Do not proceed.
Band present but at wrong MW | Possible truncation or degradation. Check RNase inhibitor and RNA integrity.
Gel formation / high viscosity in Step 4 | CBM27 is anchoring the silk to the Tremella matrix. Composite hydrogel formation confirmed. Aim 2a and 2b success.
No gel formation in Step 4, band confirmed in Step 3 | Protein expressed but CBM27 may not be binding. Check Tremella source specification; consider docking simulation to revisit binding geometry.
No gel formation in Step 4, no band in Step 3 | Expression failed. Step 4 result is uninterpretable. Restart from expression troubleshooting.
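
The decision logic in this table can be written down as a small function, which may be useful when recording results from an automated run. This is a sketch only: the `band` labels and the handling of a non-zero control score are assumptions added for illustration, and the scores refer to the 0/1/2 scale from Step 4.

```python
from typing import Optional


def interpret(band: str,
              fusion_score: Optional[int] = None,
              control_score: Optional[int] = None) -> str:
    """Interpret Step 3 and Step 4 results.

    band: "expected" (expected-size band, absent in control), "wrong_mw" or "none".
    fusion_score / control_score: 0/1/2 scores from Step 4, or None if not run.
    """
    if band == "none":
        return "Expression failed. Do not proceed; any Step 4 result is uninterpretable."
    if band == "wrong_mw":
        return "Possible truncation or degradation. Check RNase inhibitor and RNA integrity."
    if band != "expected":
        raise ValueError(f"unknown band label: {band}")
    if fusion_score is None or control_score is None:
        return "Protein expressed successfully. Proceed to Step 4."
    if control_score > 0:
        # Assumption: this case is not covered by the table, so it is reported as inconclusive.
        return "Inconclusive: the control also shows resistance."
    if fusion_score >= 1:
        return "Composite hydrogel formation confirmed."
    return "Protein expressed but CBM27 may not be binding. Check Tremella source."


print(interpret("expected"))
print(interpret("expected", fusion_score=2, control_score=0))
print(interpret("expected", fusion_score=0, control_score=0))
print(interpret("none", fusion_score=1, control_score=0))
```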

Ginkgo Submission Checklist

If submitting to Ginkgo Bioworks cloud lab, confirm the following are specified in your protocol submission before sending:

  • Plasmid sequence file (Benchling export, .gb or .fa) for CBM27_RGD_MaSp1_4x in pET28a
  • DnaK chaperone supplementation requested (2 µM DnaK, 0.4 µM DnaJ, 0.1 µM GrpE)
  • Expected band size stated as 37 kDa (for SDS-PAGE automated imaging parameters)
  • Ni-NTA purification step requested prior to composite assembly
  • Tremella polysaccharide source and CAS number specified (CAS 9083-80-1, minimum 95% purity, β-(1→3)-mannan backbone confirmed)
  • Negative control (no-template CFPS) requested in parallel

Group Final Project

L-Protein Mutants

Problem: How can the stability and auto-folding of the MS2 phage lysis protein be improved? This mechanism is key to understanding how phages may help address antibiotic resistance.

After going through the readings, including the group final project document, a Plan A would be as follows. (This stays within scope, although MurJ and multi-target approaches seem interesting…)

  1. Use computational tools like AlphaFold2 or ProteinMPNN to identify mutations that improve intrinsic stability and auto-folding of the lysis protein

  2. Target mutations that strengthen the hydrophobic core, eliminate aggregation-prone regions, or introduce stabilising interactions like salt bridges

  3. Engineer the lysis protein to fold correctly without requiring DnaJ or any other bacterial chaperone

  4. Design mutations that also accelerate oligomerisation or enhance membrane pore-forming activity for faster lysis

  5. Synthesise the mutant gene via Twist, clone into plasmid using Gibson Assembly, validate structural integrity with Nuclera, then test in E. coli.