Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Living lab TerraPods, Lebanon The halfpipe of Doom- How to grow good? For the first week’s lecture we had an introduction to the fundamental principles of synthetic biology and the HTGAA program. The lecture focused on the governance and ethics of synthetic biology. David S. Kong discussed the balance between decentralized and centralized synBio development and the importance of trust (something we are lacking these days). As a global community we have largely agreed to certain rules (e.g. the 1975 bioweapons treaty); however, emerging synBio technologies also allow a much broader audience to participate in development (e.g. community labs and biohackers), and these participants might not always align with large governmental policies. He drew a parallel to how the early governance of the internet allowed a decentralized scaling that contributed to increased “computer literacy”. A comparable literacy might allow us to make better (although not perfect) personal decisions about how to use this new technology. Coming from a background in community-focused biolab practice, I found this an interesting topic, and it made me think about the importance of global bio-literacy. It also got me thinking about the importance of applying these principles in a way that is simple enough not to stifle participation.

  • Week 2 HW: dna read write and edit

    Part 1: Benchling & In-silico Gel Art My original idea was to make a circle, but after some trial and error I realized it would be a bit too complicated—so I settled on an arch (bridge). 1a) I imported the sequence for lambda DNA. 1b) In Benchling, I ran all 7 restriction enzymes we had available to see which ones gave:

  • Week 03 — Opentrons: Automation Art + Post-Lab Questions

    Part 1 — Automation Art (OT-2 “printing” a design) This week I designed a microscope icon as “automation art” and converted it into a grid of XY dot coordinates that can be dispensed by the Opentrons OT-2 onto an agar plate.

  1. Design → coordinate map I started from the course Automation Art Interface, which makes it easy to draw a dot pattern on a circular “canvas.”
  1. How many amino acid molecules are in 500 g of meat? If 500 g of meat is about 20% protein, that gives about 100 g protein. Since one amino acid is about 100 g/mol, that is about 1 mole, or ~6 × 10^23 molecules.
  • Week 5 HW: Protein design part 2

    Part 1: Generate Binders with PepMLM For this exercise, I used the human SOD1 target protein and introduced the A4V mutation. I then used PepMLM to generate four candidate 12-amino-acid peptide binders against the mutant target sequence. As requested in the assignment, I also included the known binder peptide FLYRWLPSRRGG for comparison. What is a A4V mutation:

  • week-06-hw-genetic-circuits-part-i

    Assignment: DNA Assembly 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? The Phusion High-Fidelity PCR Master Mix contains at least three key components: Phusion DNA polymerase, deoxynucleotides (dNTPs), and an optimized reaction buffer that includes MgCl₂. The polymerase is the enzyme that synthesizes new DNA strands during PCR, the dNTPs are the nucleotide building blocks incorporated into the new DNA, and the buffer/MgCl₂ provide the chemical environment and cofactor needed for efficient polymerase activity. According to the New England Biolabs website (https://www.neb.com/en/products/m0531-phusion-high-fidelity-pcr-master-mix-with-hf-buffer?srsltid=AfmBOorWPUiBMtKsQJJH0VLGPzLYHtMYELtt0wf7AQB0YZYF4nrTfFsz), the main benefits of the master mix are high fidelity (about 50× that of Taq) and fast extension times.

  • Week 07 HW: Genetic circuits part ii

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Intracellular artificial neural networks (IANNs) have a major advantage over traditional Boolean genetic circuits because they can process graded, continuous signals rather than only treating inputs as ON/OFF states. In biological systems, many relevant signals, such as metabolite concentration, RNA abundance, and stress level, are not naturally binary. Neural-network-like circuits are better suited to integrate these analog inputs and make decisions based on their combined strength (Rizik et al., 2022).

  • Week 09 HW: cell free systems

    Advantages of cell-free systems Cell-free protein synthesis (CFPS) offers a highly flexible and controllable environment compared to in vivo expression systems. Because there are no living cells, experimental conditions such as pH, ionic strength, redox environment, DNA concentration, cofactors, and additives can be directly tuned without affecting cell viability. This enables rapid optimization and prototyping of genetic constructs. Additionally, CFPS is significantly faster, allowing protein production within hours instead of requiring cell growth, transformation, and induction steps.

  • week-10-hw-imaging and measurement

    ##Final Project ?? Waters Part I — Molecular Weight The predicted molecular weight of the full eGFP construct, including the LE linker and His6-tag, is approximately 28,006.6 Da based on the amino acid sequence. Mature eGFP forms an internal chromophore, which results in a mass loss of approximately 20 Da. Therefore, the expected molecular weight of mature eGFP is approximately 27,986.6 Da.

Subsections of Homework

Week 1 HW: Principles and Practices

Cover image: Living lab TerraPods, Lebanon

The halfpipe of Doom- How to grow good?

For the first week’s lecture we had an introduction to the fundamental principles of synthetic biology and the HTGAA program. The lecture focused on the governance and ethics of synthetic biology. David S. Kong discussed the balance between decentralized and centralized synBio development and the importance of trust (something we are lacking these days). As a global community we have largely agreed to certain rules (e.g. the 1975 bioweapons treaty); however, emerging synBio technologies also allow a much broader audience to participate in development (e.g. community labs and biohackers), and these participants might not always align with large governmental policies. He drew a parallel to how the early governance of the internet allowed a decentralized scaling that contributed to increased “computer literacy”. A comparable literacy might allow us to make better (although not perfect) personal decisions about how to use this new technology. Coming from a background in community-focused biolab practice, I found this an interesting topic, and it made me think about the importance of global bio-literacy. It also got me thinking about the importance of applying these principles in a way that is simple enough not to stifle participation.

Questions that I tried to include in my homework:

1. Describe a biological engineering application

Programmable colors for bacterial cellulose production

The textile dyeing industry is a major source of chemical pollution and water use. Coloration of bacterial cellulose (BC) can also be technically challenging because pigments often diffuse slowly into the material’s dense nanofibrillar network, making post-growth dyeing difficult and time consuming. This project proposes a bioengineering approach to generate color in situ during BC growth, eliminating conventional dyeing steps.

Figure: Dyed BC I developed at TerraPods Lebanon

Prior work demonstrates the feasibility of embedding pigmentation into BC production. Walker et al. (2025) [1] engineered the cellulose-producing bacterium Komagataeibacter rhaeticus to generate melanin during BC growth, producing pigmented material. Zhou et al. (2025) [2] demonstrated a “one-pot” co-culture strategy coupling BC production by Komagataeibacter xylinus with pigments synthesised in engineered E. coli, enabling a broader palette by combining violacein derivatives (green/blue/navy/purple) and carotenoids (red/orange/yellow).

Figure: Zhou et al. (2025)

Building on these studies, the core concept here is light-patterned control of pigment production during BC formation. A cellulose-forming culture generates the sheet while a pigment-producing bacterium is engineered to be light-responsive, so that pigmentation occurs in illuminated regions. Patterned illumination via projection enables spatial control of coloration. Furthermore, varying the projected patterns across growth phases could yield multi-layer visual effects (e.g. moiré-like effects).

Figure: Walker et al. (2025)

Drawing on my previous experience working in various community biolabs, the project is framed as a distributed biofabrication platform for community labs, which raises governance questions around biosafety practice in decentralized settings. Considering the relatively complex technique, for this exercise I imagined a centralized organization providing the framework and digital infrastructure for the community labs to safely experiment with the protocol. Although consumer products are less ethically complicated than, for example, medicine or bioweapons, important questions came up concerning consumer/skin-contact safety, environmental release and waste handling, and norms for responsible dissemination of methods and bacterial strains.

2. Governance/policy goals

                CENTRAL PLATFORM / ORG
     (protocol repo + training + registry + reporting)
            |           |               |
     SOP minimums   pigment safety   open hardware stack
     (Option 1)      (Option 2)          (Option 3)
            |           |               |
            +-----------+---------------+
                        |
        -----------------------------------------
        |                  |                   |
   Community Lab A     Community Lab B     Community Lab C...
 (local biosafety)   (local biosafety)   (local biosafety)
   - containment        - containment       - containment
   - waste handling     - waste handling    - waste handling
   - incident reports   - incident reports  - incident reports
   - minimal tests + labeling (skin-contact, leaching, etc.)
                        |
  Local authorities / partners / funders
  (disposal rules, validation support, incentives)
  • Actors: Community labs and networks, open-hardware designers, academic partners, funders, and (optionally) insurers.

A. Biosecurity

  • A1: Reduce risk of malicious repurposing of organisms, materials, or protocols.
  • A2: Improve traceability and incident reporting to support response.

B. Lab safety

  • B1: Standardize safe practices (training, containment, waste handling) across labs.
  • B2: Establish clear response procedures for spills, exposures, and contamination.

C. Environmental protection

  • C1: Prevent release of organisms or harmful pigments/byproducts.
  • C2: Enable remediation and corrective action after incidents.

D. User/consumer protection and social trust

  • D1: Ensure skin-contact safety (low leaching, low irritation risk, stability).
  • D2: Maintain low barrier access; avoid governance that excludes low-resource labs.
  • D3: Require transparency and avoid misleading sustainability claims.

E. Feasibility and innovation

  • E1: Keep requirements simple for community labs.
  • E2: Avoid unnecessary friction to legitimate research and education.

3. Governance actions (three options)

➡️ Option 1 — Network baseline: certification + SOP minimums

Purpose: Reduce variability in biosafety practice across distributed labs.

Design: A lightweight participation standard for labs using the platform, including a training checklist; standard operating procedure (SOP) templates for handling, contamination response, and waste logs; and periodic documentation checks.

Assumptions: Labs will opt in if benefits are tangible and the extra administrative work is not too burdensome.

Risks: Uneven enforcement; exclusion of under-resourced labs if standards become too complex.

➡️ Option 2 — Pigment/material safety standard: whitelist + minimal testing + labeling

Purpose: Address the most important downstream risk for the product: skin-contact, pigment safety and environmental implications.

Design: Shared “allowable pigment classes” (whitelist) plus minimum evidence requirements for testing (basic leach, washfastness, disposal guidance, documentation of lab status). Standard labeling for intended use and safety-relevant claims.

Assumptions: Low-cost testing tools or institutional partners are available; the whitelist stays current and is not too restrictive.

Risks: The process becomes too complex and hinders community engagement; weak tests give unreliable results; innovation slows if the whitelist narrows too far.

➡️ Option 3 — Open-source hardware standards for safe, distributed BC biofabrication

Purpose: Reduce reliance on expensive proprietary equipment while lowering barriers to participation without lowering safety. The goal is to make safe practice easier by default through standardized, well-documented hardware and workflows suitable for community labs.

Design: an open-source “reference stack” that includes:

  • Validated hardware designs for core needs (e.g., enclosed growth modules with spill containment, filtered airflow concepts, light/projection enclosures to reduce eye/UV exposure, basic sensing/logging for temperature/pH proxies where appropriate).
  • A documentation package: build BOMs with substitutions, maintenance/calibration checklists, cleaning/decon compatibility notes, and safety labels.
  • Inter-lab benchmarking: common test artifacts and reporting templates so labs can compare performance and identify failure modes early.

Assumptions:

  • Standardizing equipment and documentation will reduce accidents and variability more effectively than rules alone.
  • Community labs have enough fabrication capacity (or partner access) to build/maintain hardware.
  • A shared reference design can remain adaptable across different local constraints.

Risks:

  • Hardware reliability varies; incomplete documentation leads to unsafe modifications; lack of maintenance causes drift in performance.
  • Lowered barriers increase scale of adoption faster than training capacity; designs are copied without safety context; fragmentation into many forks undermines standardization.

4. Score

Does the option:                                  Option 1   Option 2   Option 3
Enhance biosecurity
  • By preventing incidents                          1          2          2
  • By helping respond                               1          2          2
Foster lab safety
  • By preventing incidents                          1          2          1
  • By helping respond                               1          2          1
Protect the environment
  • By preventing incidents                          2          1          2
  • By helping respond                               2          2          2
Other considerations
  • Minimizing costs and burdens to stakeholders     2          3          1
  • Feasibility in community labs                    1          2          1
  • Not impede research                              2          2          1
  • Promote constructive applications                1          1          1

5. Prioritization and recommendation

I would prioritize Option 1 + Option 2 as the baseline governance package, with Option 3 as a longer-term technical pathway. Option 1 provides a uniform safety culture and response capacity across labs; Option 2 directly governs consumer-contact risks and environmental externalities specific to pigment-enabled textiles. Option 3 is desirable for uniform implementation of Options 1 and 2 in a community lab setting.

Primary audiences: community lab networks and lab leads (implementation), funders/partners, and local safety/environment authorities (alignment on waste and disposal practices).

ChatGPT 5.2 was used for brainstorming bioengineering ideas for BC production in a community-based setting:

Prompt1

I have this homework for my new How to grow almost anything: To start with I need to come up with a bioengineering project that suits this class. I am thinking about different ways that I can use my current work maybe on bacterial cellulose production for material production would it be possible to use syn bio to improve material production for fabric development in fashion. and decentrialised manufacturing and design. could we start with coming up with 10 ideas that could be interesting for this homework focus on BC but could also be other materials. after that is finished we can think about the legal framework. here is the class: + the homework guidlines!

As well as for searching for academic literature:

Prompt2

do you have any good academic articles for referencing this project around the topics: engineering bacteria to produce pigment when exposed to light, insitu pigmentation of BC, community lab governance structure?!

and for correcting spelling errors and double-checking that I understood the research correctly:

Prompt3

check this improved text and restructure, improve when needed also mark out if their is something in the text that I missunderstod from the research articles. Highlight any changes that you make to the text!

and for making the code for the governance chart:

Prompt4

can you draw a map of this governance structure: Drawing from my previous experiences on working in various community biolab the project is framed as a distributed biofabrication platform for community labs, which creates governance questions around biosafety practice in a decentralized settings, concider the relative complex technique I was for this excersice imagining a centralized organization providing the framework and digital infrastructure for the community labs to safetly experiment with the protocol. Although consumer product are less ethically complicated then for example medicine or bioweapon their came up important questions concerning consumer/skin-contact safety, environmental release and waste handling, and norms for responsible dissemination of methods and bacteria strains. this is the full text: https://pages.htgaa.org/2026a/alve-lagercrantz/homework/week-01-hw-principles-and-practices/index.html

It was also used for debugging some of the problems that I had with the website build; I am not including those prompts here.

Homework Questions from Professor Jacobson


Error rate of (proofreading) DNA polymerase: about 1 error per 10⁶ bases added (≈10⁻⁶). Human genome length (diploid not specified on the slide; genome size shown): about 3.2 Gbp ≈ 3.2×10⁹ base pairs. At that error rate, you’d expect roughly 3.2×10⁹ / 10⁶ ≈ 3.2×10³ ≈ 3,200 misincorporations per genome copy.
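The estimate above is a single multiplication. Here is a minimal sketch of it, using only the two numbers quoted from the slide; the function name is just illustrative.

```python
def expected_misincorporations(error_rate_per_base: float, genome_length_bp: float) -> float:
    """Expected number of wrong bases per genome copy, before any repair."""
    return error_rate_per_base * genome_length_bp


# Values quoted in the text: ~1 error per 10^6 bases, ~3.2 x 10^9 bp genome.
errors = expected_misincorporations(1e-6, 3.2e9)
print(f"Expected misincorporations per genome copy: {errors:,.0f}")
```

This only counts raw polymerase errors; it does not model the repair pathways that remove most of them afterwards.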

Cells reduce this error burden in several ways: proofreading built into the polymerase via a 3′→5′ exonuclease that removes misincorporated bases; post-replication mismatch repair systems (the slides show the MutS/MutL/MutH pathway) that find mismatches and replace the wrong stretch; and, beyond that (general biology context), other DNA repair pathways and cellular checkpoints that reduce which errors persist as heritable mutations.

The genetic code is triplet-based (codons like AUG/GUU/GGA encode amino acids). The slide gives an average human protein coding length of ≈ 1036 bp, which is about 1036/3 ≈ 345 codons (≈345 amino acids, ignoring start/stop details). Because most amino acids have multiple synonymous codons, the number of distinct DNA sequences that can encode the same protein is roughly:

  • Rule of thumb, ~3 codons per amino acid on average: ~3³⁴⁵ ≈ 4×10¹⁶⁴ possible coding sequences.
  • Using 61 sense codons / 20 amino acids ≈ 3.05 average degeneracy: ~(3.05)³⁴⁵ ≈ 1×10¹⁶⁷.

So on the order of 10¹⁶⁵–10¹⁶⁷ different DNA sequences could encode an “average” human protein sequence.

Why don’t all those synonymous options work in real cells? The nucleotide sequence affects behavior even when the amino-acid sequence is unchanged:

  • mRNA secondary structure / folding changes with GC% and sequence, affecting translation and stability.
  • RNA cleavage / degradation sensitivity depends on sequence and structure (RNase III cleavage rules shown on the slides).

And in practice (common synthetic biology reasons, consistent with the above):

  • Codon-usage bias and tRNA availability in the host: “rare” codons can slow or stall translation, reduce yield, or increase misfolding.
  • Unwanted sequence motifs: accidental promoters/terminators, cryptic splice sites (eukaryotes), repeats/homopolymers, and extreme GC or AT stretches that break synthesis/PCR or trigger regulation.
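The two coding-sequence counts above are too large to read off directly, so it is easier to check them in log space. This is a rough sketch that assumes every position has the same average number of synonymous codons (the same simplification as in the text); the helper names are made up for illustration.

```python
import math

N_CODONS = 345  # ~1036 bp / 3, as in the text


def log10_sequence_count(avg_synonymous_codons: float, n_codons: int = N_CODONS) -> float:
    """log10 of (average degeneracy) ** (number of codons)."""
    return n_codons * math.log10(avg_synonymous_codons)


def as_scientific(log10_value: float) -> tuple[float, int]:
    """Split a log10 value into (mantissa, exponent) for display."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


for avg in (3.0, 61 / 20):
    mantissa, exponent = as_scientific(log10_sequence_count(avg))
    print(f"average degeneracy {avg:.2f}: ~{mantissa:.1f} x 10^{exponent} coding sequences")
```

The real number for a specific protein depends on its amino-acid composition (Met and Trp have one codon each, Leu/Ser/Arg have six), so these figures are order-of-magnitude estimates only.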

Homework Questions from Dr. LeProust:


  1. Solid-phase phosphoramidite chemical synthesis (automated DNA synthesizers running repeated deprotection/coupling/capping/oxidation-type cycles).
  2. Because chemical synthesis is “open loop” (no proofreading), errors and incomplete coupling accumulate with every base-addition cycle. The slide gives a chemical synthesis error rate of ~1:10² per base addition. That means the fraction of perfect molecules drops roughly exponentially with length: at ~1% error per step, the chance of an error-free 200-mer is about (0.99)²⁰⁰ ≈ 0.13, so most product is wrong or truncated, and purification becomes dominated by a complex mixture.
  3. A 2000 bp strand would require ~2000 sequential chemical addition cycles, so with ~1% error per base (from the slide’s 1:10² figure), the probability of getting a full-length error-free molecule is ~(0.99)²⁰⁰⁰ ≈ 2×10⁻⁹, essentially none, and you’d mostly produce a huge smear of incorrect/truncated products. Instead, genes are made by assembling shorter oligos/fragments (the slides point to assembly approaches like Gibson assembly and whole-genome assembly from synthetic oligos).
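The exponential drop in error-free product can be checked with a few lines. This sketch assumes each base addition fails independently with the same ~1% probability taken from the slide; the function name is illustrative.

```python
def fraction_error_free(length_nt: int, per_step_error: float = 0.01) -> float:
    """Probability that all `length_nt` additions succeed, assuming independent steps."""
    return (1.0 - per_step_error) ** length_nt


for length in (20, 200, 2000):
    print(f"{length:>5}-mer: {fraction_error_free(length):.2e} error-free")
```

Under this simple model, short oligos come out mostly correct while a directly synthesized 2000-mer essentially never does, which is the motivation for assembling genes from shorter pieces.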

Homework Question from George Church:


the protein analog of A–T / G–C complementarity in NA:NA.


  1. Walker, K. T., Li, I. S., Keane, J., Goosens, V. J., Song, W., Lee, K.-Y., & Ellis, T. (2025). Nature Biotechnology, 43, 345–354. https://doi.org/10.1038/s41587-024-02194-3

  2. Zhou, H., Lin, P., Jeong, K. J., & Lee, S. Y. (2025). Trends in Biotechnology. https://doi.org/10.1016/j.tibtech.2025.09.019

Week 2 HW: dna read write and edit

Part 1: Benchling & In-silico Gel Art

My original idea was to make a circle, but after some trial and error I realized it would be a bit too complicated—so I settled on an arch (bridge).

1a) I imported the sequence for lambda DNA.

1b) In Benchling, I ran all 7 restriction enzymes we had available to see which ones gave:

  • a busy lane (many bands) → use as the “background” in most lanes
  • a cleaner lane (fewer bands) → use to “carve out” the interior of the arch

Figure: In-silico Gel Art 1
Figure: In-silico Gel Art 2

Note:

  • Lane = the vertical track DNA runs down from a single well
  • Bands = the horizontal lines within a lane (different fragment sizes)

Based on the results above, I rearranged the enzymes to create the pattern:

Figure: Benchling gel layout

Although it’s not the most beautiful arch, this was a great exercise for understanding the basics of in-silico digests and gel band patterns.

This tool is also great for quickly iterating on gel-art layouts: https://rcdonovan.com/gel-art

3.1. Choose your protein

In recitation, we discussed picking a protein for the homework that you personally find interesting. I chose CBM3.

Why CBM3?
CBM3 is interesting because it works like a modular “cellulose anchor”: you can fuse it to other proteins so they reliably stick to cellulose (including bacterial cellulose). Beyond simple labeling, CBM fusions are used as fluorescent probes to visualize cellulose organization and dynamics, as affinity tags for low-cost purification on cellulose, and as anchoring domains to immobilize enzymes on cellulose scaffolds—turning cellulose into a reusable biocatalyst support or functional capture material.

Simply put: it’s short, often expresses well, and it sticks to cellulose.
Reference: CBM3 (example paper)

In UniProt, I searched for “carbohydrate-binding module CBM cellulose-binding protein” and got many hits. A good way to narrow the options is to pick something that is:

  1. Reviewed (Swiss-Prot) (more reliable annotation)
  2. Short / manageable (ideally ~80–250 aa)
  3. Clearly annotated as a CBM domain (cellulose-binding)

The UniProt entry I used was Q06851. The full protein is long, but UniProt makes it possible to extract only the domain/region relevant to the application:

  1. Open the UniProt entry
  2. Scroll to Family & Domains
  3. Find the feature you are interested in (domain boundaries)

I chose the CBM3 (carbohydrate-binding module family 3) from the cellulosome scaffoldin CipA, because CBM3 specifically binds cellulose and is relevant for bacterial cellulose materials.

Figure: UniProt domain selection

3.2. Reverse translate: Protein (amino acid) → DNA (nucleotide)

To extract only the CBM3 region, I downloaded the sequence and used the Gao Lab WebLab tool:
WebLab – range_extract_protein

I entered the range 365–523, which returned:

>CBM3_CipA_Q06851_res365-523
GAYAITKDGVFAKIRATVKSSAPGYITFDEVGGFADNDLVEQKVSFIDGGVNVGNATPTKGATPTNTATPTKSATATPTRPSVPTNTPTNTPANTPVSGNLKVEFYNSNPSDTTNSINPQFKVTNTGSSAIDLSKLTLRYYYTVDGQKDQTFWCDHAAI
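The range extraction itself is simple to reproduce. This is a minimal sketch, assuming the same 1-based, inclusive coordinates that UniProt uses for domain boundaries; the function name and the toy sequence are hypothetical and not taken from the WebLab tool.

```python
def extract_range(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive coordinates."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("range falls outside the sequence")
    return sequence[start - 1:end]


toy_protein = "MKTAYIAKQR"  # hypothetical 10-residue sequence, for illustration only
print(extract_range(toy_protein, 3, 6))
```

With these coordinates, the range 365–523 spans 523 − 365 + 1 = 159 residues.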

Next, I pasted the CBM3 amino-acid sequence into the Sequence Manipulation Suite reverse-translation tool: bioinformatic – Reverse Translate

Finally, I double-checked the result in Benchling by pasting the reverse-translated DNA into a new sequence and using Benchling’s Translate feature to confirm it produced the same amino-acid sequence.
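The same round-trip check (protein → DNA → protein) can be sketched in a few lines. This assumes the standard genetic code and simply picks the first listed codon for each amino acid, so it is not codon-optimized and is not how the Sequence Manipulation Suite or Benchling choose codons; all names here are illustrative.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One arbitrary codon per amino acid: the first one encountered in the table.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def reverse_translate(protein: str) -> str:
    """Protein -> one possible DNA sequence (no optimization)."""
    return "".join(FIRST_CODON[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """DNA -> protein, reading complete codons from position 0."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


fragment = "GAYAITKDGV"  # first ten residues of the extracted region above
dna = reverse_translate(fragment)
print(dna)
print("round trip OK:", translate(dna) == fragment)
```

The round trip only confirms that the DNA encodes the intended amino acids; it says nothing about how well a given codon choice will express in a host.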

Figure: Benchling

3.3. Codon optimization

I decided to codon-optimize for E. coli because it’s a common protein-expression host with well-established tools. Codon optimization matters because organisms differ in codon bias and tRNA abundance, and matching the host’s preferred codons often improves translation efficiency and protein yield and reduces stalling during expression. To do this, I used Twist’s codon-optimization workflow and selected Host: Escherichia coli. The optimization completed successfully (“Optimization was successful”) and the sequence scored Standard, indicating it is considered synthesizable under Twist’s constraints. I then selected Use the optimized sequence and (as a sanity check) confirmed that the translated amino-acid sequence remained unchanged: only synonymous codons were swapped.

Figure: Twist

The purified CBM3 can then be applied to bacterial cellulose, where it binds.

3.4. You have a sequence! Now what?

Now that I have a DNA sequence encoding CBM3, the next step is to express the protein. In a typical cell-dependent (in vivo) workflow, the codon-optimized CBM3 coding sequence is cloned into an E. coli expression plasmid under a promoter (e.g., T7/lac).

  • An expression plasmid is designed to make lots of protein.
  • A promoter is a DNA “on-switch” that tells the cell when to start making RNA from your gene.
  • T7/lac is a common strong promoter system used to tightly control expression.

After transforming the plasmid into an expression strain, the cells are grown and expression is induced (often with IPTG).

IPTG releases repression in the lac system so the promoter becomes active, and the cells start producing CBM3.

Inside the cell, the DNA is transcribed by RNA polymerase into mRNA, and the mRNA is then translated by ribosomes into the CBM3 protein as tRNAs deliver amino acids according to the codons. The protein can then be purified (for example via an affinity tag such as His-tag) and used to bind/functionalize bacterial cellulose.

  • A His-tag lets you purify CBM3 using a matching resin (Ni-NTA), washing away everything else.

Alternatively, CBM3 could be produced using a cell-free expression system (TX-TL), where the DNA template (plasmid or linear) is added directly to a lysate containing RNA polymerase, ribosomes, and all required cofactors.

Required components:

  • RNA polymerase
  • ribosomes
  • tRNAs and amino acids
  • energy source and cofactors

In this setup the same steps—transcription to mRNA and translation to protein—happen in a test tube rather than inside living cells, which can be faster and easier for prototyping, though often at smaller scale.

Why do cell-free?

  • Often faster for prototyping (no transformations, no growing cells).
  • Convenient when testing multiple designs quickly.
  • Downsides: usually more expensive per mg and often smaller scale/yield than growing E. coli.

Ethical and regulatory difference: Cell-free systems are generally considered safer because they are non-living reactions that cannot usually replicate or spread in the environment. They stop once substrates, energy, or cofactors are depleted. In contrast, in-cell genetic engineering uses living organisms, which can continue growing and may pose risks if accidentally released, such as persistence in the environment or transfer of engineered DNA to other organisms.

Part 4 — Build an E. coli expression cassette (Benchling → Twist-ready)

For this step I designed a complete E. coli expression DNA insert in Benchling by assembling the required genetic parts in the correct order:

  1. Promoter (BBa_J23106)
  2. RBS (BBa_B0034 + spacer)
  3. Start codon (ATG)
  4. Coding sequence: replaced the template CDS with my codon-optimized gene (from Part 3)
  5. C-terminal His-tag (7×His)
  6. Stop codon (TAA)
  7. Terminator (BBa_B0015)

After pasting each piece, I annotated every region (promoter, RBS, start, CDS, His-tag, stop, terminator) directly on the Benchling sequence.

Figure: Benchling linear map of the insert

I also used Benchling’s Analyze/Translate to confirm that the ORF (open reading frame) is in frame from the ATG (start codon) and that the sequence ends with the His-tag followed by a stop codon.
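The frame and tag check done in Benchling can also be written as a small script. This is a sketch under stated assumptions: it only looks at the ORF (start codon through stop codon), uses a hypothetical three-codon CDS in place of the real CBM3 sequence, and the function name is made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS_CODONS = {"CAT", "CAC"}


def check_orf(orf: str, his_count: int = 7) -> list[str]:
    """Return a list of problems found in an ORF; an empty list means it looks consistent."""
    orf = orf.upper()
    problems = []
    if len(orf) % 3 != 0:
        problems.append("length is not a multiple of 3 (frameshift)")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature stop codon inside the ORF")
    tag = codons[-(his_count + 1):-1]  # the codons directly before the final codon
    if len(tag) != his_count or any(codon not in HIS_CODONS for codon in tag):
        problems.append(f"no {his_count}xHis tag directly before the stop codon")
    return problems


toy_cds = "GGTGCTTAT"  # hypothetical placeholder CDS (Gly-Ala-Tyr), not the real gene
orf = "ATG" + toy_cds + "CATCAC" * 3 + "CAT" + "TAA"  # start + CDS + 7xHis + stop
print(check_orf(orf) or "ORF looks consistent")
```

To use it on the real design, paste the sequence from the ATG through the TAA in place of `orf`; the promoter, RBS, and terminator sit outside the ORF and are not checked here.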

Benchling link: https://benchling.com/s/seq-YgFm33VIxUzvPdZpyOKk?m=slm-lYfXGHAomlD9Go7bgPWh

Figure: Benchling translation / stop-codon check

The plasmid backbone is the original vector framework containing essential elements such as the antibiotic resistance marker and origin of replication. The insert is the DNA fragment cloned into that backbone. The source annotation usually refers to the origin or overall sequence record and is not typically a functional genetic element itself.

In conclusion

  • E. coli = the factory
  • plasmid backbone = the delivery vehicle / operating template inside the factory
  • insert = the custom cargo you added
Figure: E. coli

Part 5 — DNA Read / Write / Edit (pigment-colored SCOBY / bacterial cellulose sheets)

This builds directly on my Week 1 project idea (“Programmable colors for bacterial cellulose production”):
https://pages.htgaa.org/2026a/alve-lagercrantz/homework/week-01-hw-principles-and-practices/index.html

5.1 DNA Read (sequencing)

(ii) What sequencing technology would you use and why?
Because a SCOBY contains DNA from a mix of organisms (bacteria, yeast, etc.), I would use Oxford Nanopore long-read sequencing on shotgun metagenomic DNA from the SCOBY. One run can tell me who is present (community composition) and help reconstruct full plasmids/inserts, which matters for checking stability during long fermentations.

Figure: Oxford Nanopore
  • Generation: Third-generation (single-molecule, long-read sequencing).
  • Input: Total genomic DNA extracted from the SCOBY (mixed community DNA).
  • Essential prep steps: Extract DNA carefully (aim for high molecular weight) → optionally size-select / gently shear if needed → ligate Nanopore adapters (or use rapid prep) → load on flow cell.
  • How bases are decoded (base calling): DNA passing through a nanopore changes the ionic current; a basecaller converts the signal into A/C/G/T sequences.
  • Output: FASTQ (reads + quality scores) (often plus raw signal files) → downstream: taxonomic profiling + assembly to recover plasmids/contigs and verify constructs.
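To make the FASTQ output concrete, here is a minimal reader sketch. It assumes plain four-line records (header, sequence, `+`, quality) with Phred+33 quality encoding; the record in the example is made up, and real runs would normally be handled with dedicated tools rather than a hand-written parser.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def mean_phred(quality: str) -> float:
    """Average Phred score of a read, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


example = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n")  # made-up record
for read_id, sequence, quality in read_fastq(example):
    print(read_id, len(sequence), mean_phred(quality))
```

Read length and per-read quality are the usual first things to look at before taxonomic profiling or assembly; averaging Phred scores directly is a simplification, since they are on a log scale.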

5.2 DNA Write (synthesis)

The Part 4 cassette I built is an E. coli expression-style design (promoter/RBS/terminator suited for E. coli). To make color, I can keep the same cassette architecture but swap the coding sequence to a pigment gene (or pathway). For SCOBY/BC specifically, there are two realistic “write” directions:

  1. In-situ pigmentation inside the cellulose producer
    Engineer a cellulose-producing Komagataeibacter strain to biosynthesize pigment while it grows the pellicle. A strong example is melanin via tyrosinase expression, which yields dark, robust coloration in BC.1

  2. Co-culture / division-of-labor pigmentation
    Keep the cellulose producer focused on making BC, and pair it with a second microbe engineered to produce pigments (broad palette). A published example uses E. coli strains producing violacein derivatives and carotenoids alongside Komagataeibacter xylinus to generate multiple BC colors.2

Important design note: If the target host is Komagataeibacter (not E. coli), the regulatory parts (promoters/RBS/terminators, plasmid backbone) must be chosen for that host; otherwise the pigment genes may not express even if the coding sequence is correct.

Material/safety note (relevant for textiles/skin contact):

  • Some pigments (e.g., violacein) are bioactive, so “write” decisions should also consider leaching, irritation risk, and safe handling/disposal pathways. 3

5.3 DNA Edit (genome editing)

For stable, repeatable colored BC (especially over long growth periods), genome editing can be attractive because it can:

  • reduce dependence on plasmid maintenance,
  • improve stability across generations,
  • enable more predictable performance in a mixed or semi-open fermentation context.

Conceptually, “edit” could mean integrating a pigment function into the cellulose-producer genome, or tuning regulatory control (e.g., linking pigment production to growth phase or light-patterning concepts used in engineered living materials).

Bonus — a bacterial-cellulose (BC) face mask that changes color via cell-free pigment expression

BC is already a compelling cosmetic substrate because it holds a lot of water, conforms well to skin, and has been tested as a moisturizing sheet mask material. In one evaluation, a single application of a bacterial-cellulose mask increased facial skin moisture more than a moist towel control.4

[Figure: face mask concept, generated by ChatGPT]

Instead of putting living engineered cells on the face, a safer “synthetic biology” route is to embed freeze-dried cell-free gene expression (TX-TL) into the BC sheet as small patterned “sensor dots.” These cell-free circuits stay inactive when dry, then turn on when the mask hydrates during wear; outputs can be colorimetric (visible) or optical.5

Because freeze-dried cell-free circuits activate upon rehydration, a conventional pre-hydrated sheet mask would trigger prematurely during storage. A practical design might be a dry-stored BC mask (or a separate paper sensor tab) that is activated only at time of use by releasing fluid.

How it could work:

  • Input (skin/sweat biomarker): pH (skin barrier/irritation proxy), lactate (sweat/metabolic proxy).
  • Sensing layer (cell-free circuit): a biomarker-responsive regulatory element controls whether a reporter is expressed.7
  • Output (visible color): express a chromoprotein (strong color under normal light) so the mask visibly shifts color in specific zones without any instrument; chromoproteins are attractive for “naked-eye” readouts.6

Why this is interesting for BC masks:

  • The mask provides hydration + intimate contact, which can reactivate freeze-dried cell-free systems.
  • Patterning multiple “dots” enables a simple visual map (e.g., pH zones at cheeks vs T-zone), turning the mask into a wearable readout rather than just a carrier.



References (footnotes)


  1. Walker, K. T. et al. Self-pigmenting textiles grown from cellulose-producing bacteria with engineered tyrosinase expression. Nature Biotechnology (2025, published online 2024). https://doi.org/10.1038/s41587-024-02194-3 ↩︎

  2. Zhou, H. et al. One-pot production of colored bacterial cellulose. Trends in Biotechnology (2025). https://doi.org/10.1016/j.tibtech.2025.09.019 ↩︎

  3. WEEK 1 HW: PRINCIPLES AND PRACTICES https://pages.htgaa.org/2026a/alve-lagercrantz/homework/week-01-hw-principles-and-practices/index.html ↩︎

  4. Amnuaikit, T. et al. (2011). Effects of a cellulose mask synthesized by a bacterium on facial skin characteristics and user satisfaction. https://pmc.ncbi.nlm.nih.gov/articles/PMC3417877/ ↩︎

  5. Nguyen, P.Q. et al. (2021). Wearable materials with embedded synthetic biology sensors for biomolecule detection. Nature Biotechnology. https://www.nature.com/articles/s41587-021-00950-3 ↩︎

  6. Ba, F. et al. Chromoproteins: visible tools for advancing synthetic biology. https://pubmed.ncbi.nlm.nih.gov/41309430/ ↩︎

  7. Pardee, K. et al. (2014). Paper-Based Synthetic Gene Networks. Cell. https://pubmed.ncbi.nlm.nih.gov/25417167/ ↩︎

Week 03 — Opentrons: Automation Art + Post-Lab Questions

Part 1 — Automation Art (OT-2 “printing” a design)

This week I designed a microscope icon as “automation art” and converted it into a grid of XY dot coordinates that can be dispensed by the Opentrons OT-2 onto an agar plate.

1) Design → coordinate map

I started from the course Automation Art Interface, which makes it easy to draw a dot pattern on a circular “canvas.”

[Figure: Automation Art Interface screenshot]

2) Convert the pattern into points + sanity-check in Python

To avoid trial-and-error on the robot, I used a Colab notebook to:

  • convert pixels/dots → (x, y) coordinate lists
  • preview the design as a scatter plot
  • separate two colors (main shape vs highlights)

Colab notebook:
https://colab.research.google.com/drive/1tLENS2Rs0mxdN-pJp5QNfm1K6dfg9xsS?usp=sharing

The preview below shows the final point-map I used:

  • Green = main “microscope” body
  • Red = highlight/accent points (mScarlet)
[Figure: Coordinate preview from Colab]
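
The conversion step can be sketched as follows. This is not the notebook's actual code, just a minimal illustration of the idea: a small character grid stands in for the drawn design, each non-empty cell becomes an (x, y) point centered on the plate, and points are grouped by symbol so the two colors stay separate. The function name, the grid symbols, and the toy design are all assumptions.

```python
def grid_to_points(rows, spacing=1.0):
    """Convert a character grid into {symbol: [(x, y), ...]} centered on (0, 0).

    Row 0 is the top of the drawing, so y is flipped to make "up" positive.
    """
    height = len(rows)
    width = max(len(r) for r in rows)
    cx = (width - 1) / 2
    cy = (height - 1) / 2
    points = {}
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch in ". ":
                continue  # empty cell, no dot
            x = (c - cx) * spacing
            y = (cy - r) * spacing
            points.setdefault(ch, []).append((x, y))
    return points

# Toy 3x3 design: G = main body color, R = highlight color.
toy_design = [
    ".G.",
    "GRG",
    ".G.",
]
points = grid_to_points(toy_design)
print(points["G"])
print(points["R"])
```

Centering the coordinates on (0, 0) matters because the protocol later adds each point as an offset from the plate center.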

3) Implement in an OT-2 protocol

In my OT-2 protocol, the key idea is:

  • store the design as coordinate lists (e.g., electra2_points, mscarlet_i_points)
  • aspirate enough volume for a “chunk” of dots (so we don’t aspirate for every single point)
  • dispense each dot using a small helper that moves down to dispense and back up to detach the droplet cleanly

Snippet (from my protocol):

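from opentrons import types  # provides types.Point for XY offsets

# --- parameters ---
DOT_UL = 0.8      # volume per dot (uL)
GRID_MM = 1.0     # coordinate units → mm

# location_of_color, center_location and dispense_and_detach are defined
# elsewhere in the protocol.
designs = [
    ("Green", electra2_points),
    ("Red",   mscarlet_i_points),
]

for color_label, pts in designs:
    source = location_of_color(color_label)
    pipette.pick_up_tip()  # one tip per color

    # how many dots fit in one pipette fill
    dots_per_chunk = int(pipette.max_volume // DOT_UL)

    i = 0
    while i < len(pts):
        chunk = pts[i:i + dots_per_chunk]
        vol = DOT_UL * len(chunk)

        # aspirate once for the whole chunk, then dispense dot by dot
        pipette.aspirate(vol, source)

        for (x, y) in chunk:
            # offset from the plate center, in mm
            dest = center_location.move(types.Point(x=x * GRID_MM, y=y * GRID_MM, z=0))
            dispense_and_detach(pipette, DOT_UL, dest)

        i += len(chunk)

    pipette.drop_tip()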
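
The chunking arithmetic can be checked off the robot with a small standalone sketch. It mirrors the loop above but uses plain numbers instead of a pipette object; `plan_chunks` and the example values (60 dots, a 20 µL pipette) are hypothetical and only meant to show how many aspirations a design needs.

```python
def plan_chunks(n_dots, dot_ul, max_volume_ul):
    """Return (dots, aspirate_volume_ul) for each chunk of a design.

    Each chunk holds as many dots as fit in one pipette fill.
    """
    dots_per_chunk = int(max_volume_ul // dot_ul)
    if dots_per_chunk < 1:
        raise ValueError("dot volume exceeds pipette capacity")
    chunks = []
    remaining = n_dots
    while remaining > 0:
        dots = min(dots_per_chunk, remaining)
        chunks.append((dots, round(dots * dot_ul, 3)))
        remaining -= dots
    return chunks

# Hypothetical example: 60 dots of 0.8 uL with a 20 uL pipette.
chunks = plan_chunks(60, 0.8, 20.0)
for dots, volume in chunks:
    print(dots, "dots ->", volume, "uL")
```

Running this before a print gives the number of aspirate steps per color, which is a quick way to estimate run time and to confirm that no chunk exceeds the pipette capacity.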

Part 2 — Post-Lab Questions (Opentrons paper + how it connects to my final project)

2.1 A published paper using Opentrons for a novel bio application

I chose Brown et al. (2025), “Semiautomated Production of Cell-Free Biosensors” (ACS Synthetic Biology) because it shows the OT-2 being used not just for “routine liquid handling,” but as a manufacturing platform for synthetic biology diagnostics.

In the paper, the authors use an Opentrons OT-2 to assemble large batches of cell-free biosensor reactions, then process them through a deployment-style pipeline: assemble → (optionally) lyophilize → rehydrate → measure output. They compare manual vs automated preparation and demonstrate reliable, scaled production (including a full 384-well plate format), which is exactly the kind of reproducibility you want when moving from “cool demo” to “repeatable product”.

2.2 How Opentrons could be “perfect” for producing a BC skincare sheet mask (pouch mask)

For my final project direction, I’m thinking of a skincare sheet mask, using bacterial cellulose (BC) as the carrier material. The OT-2 is a great fit because it turns a “handmade one-off” into a repeatable, batchable fabrication workflow.

Where OT-2 helps most

  • Standardized loading of serum / actives: dispense precise volumes of humectants (e.g., glycerol), buffers, preservatives (if used), fragrance-free additives, etc. into pouches or soaking trays so every mask gets the same dose.
  • Patterned deposition (“pixel printing”) onto BC: print micro-spots or zones of different formulations (e.g., soothing zone vs brightening zone) or a visible “QC pattern” to confirm even loading.
  • Built-in controls + QC: include calibration spots or a reference color patch on each sheet (so each mask is self-verifiable in documentation/photos).

How this connects to the Brown et al. OT-2 paper

Brown et al. use the OT-2 as a manufacturing platform for cell-free biosensor reactions (assemble → process → rehydrate → readout). My mask workflow is conceptually similar, just with a different substrate:

  • assemble formulations (or cell-free mixes for R&D prototypes)
  • deposit onto/into BC in a controlled way
  • package / dry / store
  • rehydrate on use (when the sheet mask is applied)

What I would document as “automation value”

  • Repeatability across a batch (mass gain of BC after dosing, or volume dispensed per pouch)
  • Uniformity (image-based check of a printed pattern across masks)
  • Optional: a simple visual indicator that activates upon rehydration (e.g., a time/usage indicator patch for R&D proof-of-concept)

This makes the OT-2 useful not only for lab experiments, but for building a small-scale manufacturing pipeline for BC skincare sheet masks.

[Figure: Brown et al. (2025), workflow schematic + readout example]

Reference

  • Brown, D. M. et al. (2025). Semiautomated Production of Cell-Free Biosensors. ACS Synthetic Biology. DOI: 10.1021/acssynbio.4c00703

Links (for citation / screenshots):

PubMed: https://pubmed.ncbi.nlm.nih.gov/40073441/
ACS (journal page): https://pubs.acs.org/doi/10.1021/acssynbio.4c00703
PDF: https://jewettlab.org/wp-content/uploads/2025/06/brown-et-al-2025-semiautomated-production-of-cell-free-biosensors.pdf

Final Project Ideas

Idea 1 — OT-2 “manufactured” BC skincare sheet masks (pouch masks)

Concept: Use the Opentrons OT-2 as a small-scale manufacturing tool to reproducibly load / pattern skincare formulations onto bacterial cellulose (BC) sheet masks that come in a sealed pouch and sit on skin for ~1–2 hours.

  • Problem: BC has excellent water-holding capacity; however, handmade BC sheet masks are hard to standardize (dose, uniformity, repeatability across a batch).

  • Hypothesis: Automation + coordinate-based dispensing can turn BC sheet masks into a consistent, documented “biofabrication pipeline.” In addition, bacteria can be engineered to “read” skin health and express it in simple color cues.

  • Embed a cell-free color indicator patch as a “time / health / hydration” indicator.

  • Approach (R&D workflow):

    • Grow/harvest BC sheets → press to target thickness → load into a deck jig/holder.
    • OT-2 dispenses exact volumes of serum/actives into:
      • (A) the pouch (soak method), and/or
      • (B) directly onto the BC in patterns/zones (“forehead zone”, “cheek zone”, etc.).
  • MVP demo: 6–12 masks with identical dosing; photo + mass-gain and uniformity checks.

  • What to measure: repeatability (dispensed volume, BC mass gain), uniformity (image analysis), user-facing consistency (feel, tack, wetness over time).


Idea 2 — Water-resistant BC “leather” via in-growth synbio

Concept: Reduce BC water uptake during growth by programming the system to deposit a cellulose-bound amphiphilic layer (e.g., a hydrophobin–cellulose binding domain fusion) that self-assembles on/within the BC network.

  • Problem: When BC is used as a leather substitute (material production), one of the main problems is that it absorbs a lot of water and swells. Traditionally the solution has been post-coating with oils or waxes; however, these coatings tend not to be very long-lasting.

  • Hypothesis: A cellulose-binding, self-assembling protein layer produced during growth period can reduce wetting and wicking without heavy post-treatment.

  • Approach:

    • Engineer a production strain or a modular functionalization step to present hydrophobin–CBD/CBM at the BC interface.
    • Compare conditions:
      1. control BC
      2. BC + in-process hydrophobin–CBD functionalization
      3. BC + conventional post-coat (baseline comparison)
  • MVP demo: small “bag panel” swatch set + simple rain/soak tests.

  • What to measure: water uptake %, wicking height, thickness change after wetting, flex/crack after dry–wet cycles.

  • Stretch goal: combine with in-growth pigment or optogenetic patterning for functional + aesthetic “self-finished” BC.


Idea 3 — Light-input → color-output BC bio-print for moiré effects (BC + engineered E. coli)

This project builds on my Week 1 homework.

Concept: A co-culture “living printer”: Komagataeibacter grows the BC sheet while engineered E. coli produces pigments under light control, enabling projected patterns. Two patterned layers with slightly different line frequencies create moiré interference when stacked.

  • Problem: Dyeing BC is slow/uneven; patterning usually requires post-processing.
  • Hypothesis: Optogenetics enables spatial control: light patterns → localized gene expression → localized color on/within a growing material.
  • Approach (research plan):
    • Build/borrow a light-gated expression system in E. coli (red/green/blue input).
    • Drive a visible output (pigment pathway or chromoprotein).
    • Pattern with projector/photomask onto a co-culture or onto E. coli deposited on BC.
    • Grow/prepare two sheets with slightly offset gratings → overlay for moiré visuals.
  • MVP demo: one light-patterned colored sheet + photo documentation of resolution/contrast.
  • What to measure: pattern sharpness (edge blur), color contrast, stability after drying, moiré strength with layer overlay.
  • Stretch goal: multi-color “logic-like” prints (different wavelengths → different pigments).

References, project 1:

  1. Brown, D. M. et al. Semiautomated Production of Cell-Free Biosensors. ACS Synthetic Biology (2025). https://pubmed.ncbi.nlm.nih.gov/40073441/
  2. Amnuaikit, T. et al. (2011). Effects of a cellulose mask synthesized by a bacterium on facial skin characteristics and user satisfaction. https://pmc.ncbi.nlm.nih.gov/articles/PMC3417877/
  3. Nguyen, P.Q. et al. (2021). Wearable materials with embedded synthetic biology sensors for biomolecule detection. Nature Biotechnology. https://www.nature.com/articles/s41587-021-00950-3
  4. Pardee, K. et al. (2014). Paper-Based Synthetic Gene Networks. Cell. https://pubmed.ncbi.nlm.nih.gov/25417167/
  5. Ba, F. et al. Chromoproteins: visible tools for advancing synthetic biology. https://pubmed.ncbi.nlm.nih.gov/41309430/

References, project 2:

  1. Puspitasari, N. Class I hydrophobin fusion with cellulose binding domain… (PDF thesis/report, 2021). https://repositori.ukwms.ac.id/id/eprint/31910/1/1-Class_I_hydrophobin_fusion_with_%28Nathania%29.pdf

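References, project 3:

  1. Walker, K. T., Li, I. S., Keane, J., Goosens, V. J., Song, W., Lee, K.-Y., & Ellis, T. (2025). Self-pigmenting textiles grown from cellulose-producing bacteria with engineered tyrosinase expression. Nature Biotechnology, 43, 345–354. https://doi.org/10.1038/s41587-024-02194-3
  2. Zhou, H., Lin, P., Jeong, K. J., & Lee, S. Y. (2025). One-pot production of colored bacterial cellulose. Trends in Biotechnology. https://doi.org/10.1016/j.tibtech.2025.09.019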
  3. Levskaya, A. et al. Synthetic biology: engineering Escherichia coli to see light. Nature (2005). https://pubmed.ncbi.nlm.nih.gov/16306980/

Week 04 HW: Protein design part 1

Shuguang Zhang — 9 Short Answers

(Skipped #4 and #11)

1. How many amino acid molecules are in 500 g of meat?

If 500 g of meat is about 20% protein, that gives about 100 g protein.
Since one amino acid is about 100 g/mol, that is about 1 mole, or ~6 × 10^23 molecules.
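
The same estimate written out as a short calculation. The 20% protein fraction and the 100 g/mol average are the rough assumptions from the answer above, not measured values.

```python
AVOGADRO = 6.022e23          # molecules per mole

meat_g = 500.0
protein_fraction = 0.20      # rough assumption from the answer above
aa_molar_mass = 100.0        # g/mol, rough average per amino acid

protein_g = meat_g * protein_fraction
moles = protein_g / aa_molar_mass
molecules = moles * AVOGADRO
print(f"{protein_g:.0f} g protein -> {moles:.1f} mol -> {molecules:.1e} amino acids")
```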

2. Why do we eat beef but do not become a cow?

Because our body digests food proteins into amino acids and then uses them to build human proteins.

3. Why are there only 20 natural amino acids?

Because evolution selected a set of 20 that gives enough chemical variety while still being efficient for life to use.

5. Where did amino acids come from before life started?

They likely formed through prebiotic chemistry, such as lightning, UV radiation, hydrothermal activity, or from meteorites.

6. What handedness would an α-helix made of D-amino acids have?

It would most likely form a left-handed helix.

7. Can there be additional helices in proteins?

Yes. Besides the α-helix, proteins can also have 3₁₀ helices and π-helices, and new ones can be designed.

8. Why are most molecular helices right-handed?

Because natural proteins are made from L-amino acids, which usually favor right-handed helices.

9. Why do β-sheets tend to aggregate?

Because β-strands can easily line up and make hydrogen bonds with each other.
The main driving force is backbone hydrogen bonding plus hydrophobic interactions.

10. Why do many amyloid diseases form β-sheets? Can amyloid β-sheets be used as materials?

Amyloid proteins often misfold into very stable β-sheet fibrils, which can build up in disease.
Yes, in controlled settings they can also be used as useful biomaterials.

Before diving into the homework, here are some highlights from the lecture with Cale and Ahmed, which gave foundational knowledge about protein design.

What do proteins do?

[Figure: protein functions]

When we look at protein design, it is important to consider which level of abstraction we are working at:

[Figure: levels of abstraction]

Proteins are built from the 20 amino acids. Each has a unique chemical structure, charge, and physical properties that determine protein structure and function:

[Figure: protein chain]

This is an overview of the most important functions of proteins:

[Figure: protein functions]

Proteins are classified using the CATH hierarchy:

[Figure: CATH classification]

This is a great website where you can easily browse the different classes: CATH

Distance maps are a technique for calculating the distances between different parts of a protein structure in space and visualizing them in 2D (scale: ångström):

[Figure: distance map]
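
A distance map is a matrix of pairwise distances, usually between Cα atoms. The sketch below shows the calculation on three invented coordinates; in practice the coordinates would be read from a PDB file, which is not shown here.

```python
import math

def distance_map(coords):
    """Return an N x N matrix of pairwise Euclidean distances (same units as input)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Toy C-alpha coordinates in angstroms (made up for illustration).
toy_ca = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dmap = distance_map(toy_ca)
for row in dmap:
    print(["%.1f" % d for d in row])
```

The matrix is symmetric with zeros on the diagonal, which is why distance-map images look mirrored across the diagonal.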

Three scoring categories of the quality of predicted structure:

  1. Physical / Energetic
    • Stability, folding energy
    • Examples: Rosetta energy, ΔG folding
  2. Structural
    • RMSD, packing, clashes
  3. Model confidence
    • AlphaFold pLDDT / PAE

pLDDT usage:

  • Assessing confidence within a domain (but not between domains)
  • Identifying domains
  • Identifying possible disordered regions

PAE usage:

  • To assess relative domain positions
  • A value below 30 in the interface region is a good predictor of binding

full PDF

video slides

google docs

Part B — Protein Analysis and Visualization (BcsZ / PDB: 3QXF)



1. Protein choice

I chose BcsZ (bacterial cellulose synthase subunit Z) from Escherichia coli K-12 (PDB: 3QXF) because it is part of the bacterial cellulose (BC) synthase system. BcsZ is annotated as a periplasmic endo-β-1,4-glucanase in glycoside hydrolase family 8 (GH8), meaning it can cut β-1,4 linked glucan chains (cellulose-like polymers) and is associated with efficient cellulose biosynthesis/translocation.

What “periplasmic endo-β-1,4-glucanase (GH8)” means

  • Periplasmic: located in the periplasm, the space between inner and outer membranes in Gram-negative bacteria (like E. coli).
  • Glucan: a chain of glucose units (cellulose is a glucan).
  • β-1,4: the bond type between glucose units in cellulose.
  • Endo-: cuts inside the chain (not only from the ends).
  • GH8: a carbohydrate-enzyme family classification (shared fold + mechanism among related enzymes).

Why a cellulose-producing bacterium has a “cellulose cutter”

Producing and exporting a long polymer is mechanically challenging. A periplasmic endoglucanase can help by:

  • clearing jams / trimming chains that clog export
  • processing cellulose during extrusion (helps proper fiber/network formation)
  • helping polymer movement through the periplasm toward the export channel

2. Amino acid sequence + basic analysis

Sequence source: RCSB PDB sequence for 3QXF, chain A www.rcsb.org.
Sequence length: 355 aa (chains A–D are the same sequence).


FASTA (chain A)

>3QXF_A BcsZ (E. coli K-12) length=355
ACTWPAWEQFKKDYISQEGRVIDPSDARKITTSEGQSYGMFSALAANDRAAFDNILDWTQNNLAQGSLKERLPAWLWGKKENSKWEVLDSNSASDGDVWMAWSLLEAGRLWKEQRYTDIGSALLKRIAREEVVTVPGLGSMLLPGKVGFAEDNSWRFNPSYLPPTLAQYFTRFGAPWTTLRETNQRLLLETAPKGFSPDWVRYEKDKGWQLKAEKTLISSYDAIRVYMWVGMMPDSDPQKARMLNRFKPMATFTEKNGYPPEKVDVATGKAQGKGPVGFSAAMLPFLQNRDAQAVQRQRVADNFPGSDAYYNYVLTLFGQGWDQHRFRFSTKGELLPDWGQECANSHLEHHHHHH

Amino-acid frequency (from the Week 4 Colab)

I used the Week 4 Colab notebook to compute amino-acid frequencies from the FASTA sequence.

Most frequent amino acids (top 5):

  • A (Alanine): 32
  • L (Leucine): 31
  • G (Glycine): 26
  • S (Serine): 23
  • K (Lysine): 22 (tied with D = 22)

I used ChatGPT to generate this code for finding the most frequent amino acids:

from collections import Counter

# protein_sequence holds the FASTA sequence (header line removed)
cleaned_sequence = protein_sequence.replace(" ", "").replace("\n", "").strip()
amino_acid_count = Counter(cleaned_sequence)

print("Length:", len(cleaned_sequence))
print("Top 10:", amino_acid_count.most_common(10))

3. Homologs (UniProt BLAST)

I ran UniProt BLAST (https://www.uniprot.org/blast) using the FASTA sequence above (default settings).

  • Homologs found (displayed): 250 results in UniProtKB
  • E-value range shown: from 0.0 (strongest) to about 4.1 × 10⁻¹²⁸ (least significant shown)
  • Identity range shown: approximately 50.9% – 100%
  • Example top hit (from Text Output): 99% identity (338/339), Expect = 0.0

Conclusion: With the displayed results, all 250 hits are >30% identity, and all are extremely significant by E-value.

Footnote:

  • Homologs are proteins in other organisms (or strains) that are related by evolution—they come from a common ancestral gene.
  • The E-value (expect value) is a BLAST statistic that answers: “If I searched a database this big with a random (unrelated) sequence, how many hits with this score would I expect to see just by chance?”
  • Rule-of-thumb:
  • E < 1e-3: usually meaningful similarity
  • E < 1e-10: very strong
  • E ~ 0.0 (BLAST rounds extremely tiny values to 0): essentially “as strong as it gets”
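
The rule of thumb above follows from the standard relation between bit score and E-value, E = m × n × 2^(−S'), where m is the query length, n the database size, and S' the bit score. The sketch below uses that relation with hypothetical numbers; real BLAST output uses effective lengths and other corrections, so the values will not match a UniProt run exactly.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate BLAST E-value: E = m * n * 2**(-S'), with S' the bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)

# Hypothetical numbers: a 355-residue query against 1e8 database residues.
for bits in (30, 50, 100, 700):
    print(bits, expect_value(bits, 355, 1e8))
```

Each extra bit of score halves the E-value, so hits with bit scores in the hundreds end up with values small enough to be displayed as 0.0.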

4. Protein family / domain classification

Does it belong to a protein family? Yes.

But first, what are CATH, SCOP2, and ECOD?

CATH, SCOP2, and ECOD are all systems for classifying protein domains based on their three-dimensional structure and evolutionary relationships, but they organize proteins in slightly different ways. CATH uses a clear hierarchical scheme based on Class, Architecture, Topology, and Homologous superfamily, making it useful for describing both structural shape and evolutionary grouping. SCOP2 is an updated version of SCOP that also classifies proteins by structure and ancestry, but it uses a more flexible framework rather than a strictly rigid hierarchy. ECOD (Evolutionary Classification of Protein Domains) places particularly strong emphasis on evolutionary relationships and homology, aiming to group protein domains by shared ancestry. In summary, all three classify protein structure, but CATH is often seen as a geometry-based hierarchical system, SCOP2 as a flexible structure-and-evolution system, and ECOD as especially focused on evolutionary history.

  • GH8 (Glycoside Hydrolase family 8): indicates BcsZ belongs to a known family of carbohydrate-active enzymes that hydrolyze glycosidic bonds (fits its endoglucanase/cellulase-like role).
  • Six-hairpin glycosidase(-like) superfamily: describes the shared fold architecture (a helix-rich α/α toroid / alpha–alpha barrel-like fold) found in related carbohydrate enzymes, even when sequences vary.

5. Structure page + structure quality

From the RCSB structure page www.rcsb.org:

  • PDB entry: 3QXF
  • Method: X-ray diffraction
  • Resolution: 1.85 Å (high quality; smaller Å = sharper structure)
  • Released: 2011-03-30 (deposited 2011-03-01)
  • Other molecules present: no ligands/cofactors (HET atoms = 0), but the crystal includes waters (solvent); the protein was expressed with selenomethionine (MSE) residues.


6. Structure classification (SCOP2 / CATH / ECOD)

These classifications all point to a helix-rich α/α architecture typical of GH8-like glycosidases.

SCOP2

  • SCOP2B Superfamily: Six-hairpin glycosidases

CATH

  • Class: Mainly Alpha
  • Architecture: Alpha/alpha barrel

ECOD

  • Architecture: alpha superhelices
  • Topology: alpha/alpha toroid
  • Family name: Glyco_hydro_8

7. 3D visualization in PyMOL

I used PyMOL to visualize 3QXF (focusing on chain A for clarity).

7.1 Visualize as cartoon, ribbon, and ball-and-stick

  • Ribbon
  • Cartoon
  • Ball-and-stick: full-protein ball-and-stick is visually dense but shows atomic detail.

PyMOL commands (cartoon view of chain A):
fetch 3qxf, async=0
remove solvent
select prot, 3qxf and chain A
hide everything
show cartoon, prot
zoom prot

Why use these three ways of visualizing the protein structure?

  • Cartoon/ribbon answers: What is the big structural arrangement?
  • Ball-and-stick answers: What is happening at the residue/atom level?

7.2 Color by secondary structure. Does it have more helices or sheets?

After coloring by secondary structure:

  • Helices dominate (in red)
  • There are fewer β-sheets (in yellow)
  • Remaining regions are loops/turns
[Figure: helices colored by secondary structure]
dss
color red,    prot and ss h
color yellow, prot and ss s
color gray70, prot and ss l+""

Conclusion: BcsZ is helix-rich (more helices than β-sheets), consistent with GH8 / α/α fold classifications.

7.3 Color by residue type. Hydrophobic vs hydrophilic distribution

[Figure: residues colored by type]
select hydrophob, prot and resn ALA+VAL+ILE+LEU+MET+PHE+TRP+TYR+PRO+CYS
select polar,     prot and resn SER+THR+ASN+GLN+GLY
select charged,   prot and resn ASP+GLU+LYS+ARG+HIS

color orange, hydrophob
color cyan,   polar
color blue,   charged

After coloring residues by type:

  • Hydrophobic residues (orange) cluster mostly in the protein core (stabilizing the fold).
  • Polar (cyan) and charged (blue) residues are enriched on the protein surface, consistent with a soluble enzyme.
  • The putative substrate-binding cleft shows a mix of polar/aromatic residues typical for carbohydrate-binding enzymes.

Note: The small pink dots are likely selenium-containing atoms from selenomethionine (MSE) residues present in the crystal structure. Since MSE was not included in the custom residue-type selections, those atoms remained in the default viewer coloring.

7.4 Visualize the surface. Does it have any “holes” (binding pockets)?

hide everything
show surface, prot
set transparency, 0.25

When visualized as a surface, BcsZ shows a prominent groove/cleft rather than a deep enclosed cavity.

Conclusion: BcsZ has a clear binding pocket / cleft consistent with an enzyme that acts on polymeric substrates (cellulose-like chains), which often bind along an open channel rather than a small closed pocket.

  • A small closed pocket is good for binding a small molecule.
  • An open groove or cleft is better for binding a long chain, like cellulose.

To make the substrate-binding cleft clearer, I compared the apo BcsZ structure (3QXF) with the cellopentaose-bound BcsZ structure (3QXQ), which shows how a glucan chain can sit along the open cleft.

[Figure: Surface + ligand (ligand-bound cleft)]

C1. Protein Language Modeling — Unsupervised Deep Mutational Scan (ESM2)

For my chosen protein (PDB: 3QXF), I used ESM2 to generate an unsupervised deep mutational scan by scoring every possible single amino-acid substitution at each position (language-model likelihood scores, mode="RELATIVE"). In the heatmap, each column is a residue position in the sequence and each row is a mutation-to amino acid. Brighter colors indicate mutations the model considers more plausible in context; darker colors indicate mutations that are strongly disfavored.
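
One common way to turn language-model probabilities into such scores is a log-likelihood ratio: log p(mutant) minus log p(wild type) at the same position. I am assuming this is roughly what mode="RELATIVE" does in the notebook; that is not confirmed, and the probabilities below are invented for illustration.

```python
import math

def relative_scores(probs, wild_type):
    """Score each substitution as log p(mutant) - log p(wild type) at one position."""
    log_wt = math.log(probs[wild_type])
    return {aa: math.log(p) - log_wt for aa, p in probs.items() if aa != wild_type}

# Made-up model probabilities at one position whose wild-type residue is V.
toy_probs = {"V": 0.70, "I": 0.20, "L": 0.09, "R": 0.01}
scores = relative_scores(toy_probs, "V")
for aa, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"V->{aa}: {s:.2f}")
```

Under this definition a score of 0 means the mutant is as likely as the wild type, and strongly negative scores mark substitutions the model finds implausible in context.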

Overall pattern (what the heatmap shows)

Most positions show modest tolerance (many mutations cluster around neutral-ish scores), but there are clear vertical bands of strongly negative scores where almost any substitution is unlikely. These “dark stripes” suggest highly constrained positions, often linked to structural packing or important local geometry.

[Figure: Mutation Scan Heatmap (ESM2) for my 3QXF sequence]

Finding standout mutations (min/max scores)

Because N- and C-termini can show edge effects in language-model scoring (and my sequence ends with a short His-tag tail), I selected a standout mutation after excluding:

  • the first 5 residues (N-terminus edge effects)
  • the last 7 residues (His-tag tail)

I used the code below to convert the heatmap matrix into a mutation table and extract the most damaging/tolerated substitutions:

import pandas as pd
import numpy as np

arr = np.array(heatmap)
aas = list("ACDEFGHIKLMNPQRSTVWY")
L = len(protein_sequence)

score_mat = arr[:20, :L]  # 20 amino acids x L positions

rows = []
for i in range(L):
    wt = protein_sequence[i]
    for aa_i, mut in enumerate(aas):
        if mut == wt:
            continue
        rows.append((i+1, wt, mut, float(score_mat[aa_i, i])))

df = pd.DataFrame(rows, columns=["pos","wt","mut","score"])

# exclude N-terminus edge effects + C-terminal His-tag tail
core = df[(df["pos"] >= 6) & (df["pos"] <= (L-7))]

print("Most damaging:")
print(core.sort_values("score").head(1).to_string(index=False))

print("Most tolerated:")
print(core.sort_values("score", ascending=False).head(1).to_string(index=False))

Standout example (a strongly constrained position)

Most damaging internal mutation: V98 → R, score −11.600975

This mutation replaces a small hydrophobic residue (Val) with a bulky, positively charged residue (Arg). That kind of change is typically unfavorable if the position is in a packed protein interior (it disrupts hydrophobic packing and can introduce an unsatisfied charge). The fact that multiple substitutions at the same site are also strongly negative suggests position 98 is broadly mutation-intolerant, consistent with it being structurally important.

Top 10 most damaging (excluding first 5 residues + His-tag tail)

Rank  Position  WT → Mut  Score
1     98        V → R     -11.600975
2     109       R → I     -11.381086
3     107       A → P     -10.845333
4     41        F → D     -10.764390
5     109       R → L     -10.727297
6     41        F → K     -10.649606
7     98        V → C     -10.633169
8     98        V → W     -10.569185
9     98        V → K     -10.555022
10    102       W → K     -10.527938

Extra pattern note: several top hits are “structurally disruptive” mutation types (e.g., A→P can break secondary structure; aromatic/hydrophobic → charged can disrupt packing or interfaces), which matches the intuition that the darkest vertical bands in the heatmap correspond to constrained, structure-critical sites.

C1. Protein Language Modeling — Latent Space Analysis (ESM2 embeddings + 3D t-SNE)

To explore how a protein language model organizes sequence space, I embedded a provided dataset of ~15k protein sequences using ESM2 and then reduced the embeddings to 3 dimensions with t-SNE. Each point in the plot corresponds to one protein from the dataset; proteins that are close together are similar in ESM2 embedding space (i.e., the model considers them “sequence-context similar”).

Note: t-SNE axes (TSNE1/TSNE2/TSNE3) are arbitrary visualization coordinates (they don’t correspond to a specific physical property). The meaningful signal is local proximity / neighborhoods, not absolute axis values.

Dataset embedding + neighborhood structure

After generating mean-pooled ESM2 embeddings for the dataset, I visualized the results using a 3D t-SNE scatter plot. The dataset forms several dense regions and smaller “islands”, suggesting the embeddings capture recurring sequence/fold patterns and cluster related proteins into neighborhoods.
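
Mean pooling means averaging the per-residue vectors into one fixed-length vector per protein, so proteins of different lengths become comparable. A minimal sketch with invented numbers (real ESM2 embeddings have hundreds of dimensions and are produced by the model, which is not shown here):

```python
def mean_pool(residue_vectors):
    """Average per-residue vectors into one fixed-length protein vector."""
    n = len(residue_vectors)
    dim = len(residue_vectors[0])
    return [sum(vec[d] for vec in residue_vectors) / n for d in range(dim)]

# Toy per-residue embeddings: 3 residues x 4 dimensions (made up).
toy = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
pooled = mean_pool(toy)
print(pooled)
```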

3D t-SNE visualization of the dataset embeddings

Alternate view / rotation of the same 3D t-SNE embedding space (my protein in red)

Placing my protein (3QXF) on the map

I then computed an embedding for my chosen protein (3QXF) using the same ESM2 embedding pipeline, appended it to the dataset, and re-ran t-SNE so that my protein appears on the same map as a highlighted point.

3D t-SNE of ESM2 embeddings (my protein highlighted), zoomed view

3D t-SNE of ESM2 embeddings (my protein highlighted), full dataset view

Nearest neighbors to 3QXF (cosine similarity in embedding space)

To make the neighborhood interpretation concrete, I computed cosine similarity between my protein’s embedding and every dataset embedding and extracted the top nearest neighbors. The similarities are very high (~0.97–0.99), indicating that 3QXF lands inside a tight neighborhood of closely related embeddings.

From the dataset annotations, the closest neighbors include multiple polysaccharide-active enzymes (e.g., alginate lyase, chondroitinase, and probable endoglucanase). Even though these enzymes may act on different substrates, they share common sequence/fold features typical of carbohydrate-active proteins, which likely explains why the language-model embeddings place them near each other.

Top nearest neighbors to 3QXF in embedding space (cosine similarity)

Interpretation:
My 3QXF protein sits in a neighborhood enriched for carbohydrate/polysaccharide-processing enzymes, suggesting ESM2 embeddings capture higher-level similarities (shared fold/domain patterns and conserved sequence motifs) beyond exact function labels. This supports the idea that local neighborhoods in embedding space approximate “similar proteins” in terms of structure/function family.


Code snippet

  • Generate mean-pooled ESM2 embeddings for the dataset sequences
  • Compute my protein embedding and append it
  • Run 3D t-SNE and plot
  • Compute cosine similarity to retrieve nearest neighbors
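The pooling and nearest-neighbor steps listed above can be illustrated without downloading a model. This is a minimal sketch, assuming the per-residue embeddings are already available as lists of floats. The helper names (`mean_pool`, `cosine_similarity`, `nearest_neighbors`) and the toy vectors are made up for illustration and are not real ESM2 outputs.

```python
import math

def mean_pool(token_embeddings):
    """Average per-residue vectors into one fixed-length protein vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(query, dataset, k=3):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in dataset.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy per-residue embeddings for a 2-residue "protein" (3 dimensions)
toy_tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
query = mean_pool(toy_tokens)

# Toy dataset of already-pooled embeddings
dataset = {
    "protein_a": [2.0, 0.1, 1.0],
    "protein_b": [0.0, 1.0, 0.0],
    "protein_c": [1.0, 0.0, 1.0],
}
top = nearest_neighbors(query, dataset, k=2)
for name, similarity in top:
    print(name, round(similarity, 3))
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice for comparing pooled embeddings.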

C3. Protein Generation (Inverse Folding)

Picture Source:

  1. Post from Sergey Ovchinnikov
  2. Roney, Ovchinnikov et al. (2022). State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101.

Goal

Use a fixed backbone from my chosen PDB (3QXF) to generate new sequence candidates with ProteinMPNN (inverse folding), then validate one designed sequence by folding it with ESMFold and comparing it to the native baseline.


1) ProteinMPNN: backbone → sequence candidates

I ran ProteinMPNN on PDB 3QXF, designing chain A while keeping chains B/C/D fixed in the scoring context. ProteinMPNN produced 16 candidate sequences at sampling temperature T = 0.1.

Important note about sequence length: ProteinMPNN designs only residues that exist in the PDB ATOM coordinates (i.e., modeled residues). That’s why the “native” chain segment used here is 337 aa, not the full-length annotated FASTA (which can include missing terminal residues and expression tags).
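To see which residues are actually modeled, the chain sequence can be read from the ATOM records instead of the FASTA entry. The sketch below assumes the standard fixed-column PDB layout and counts one residue per Cα atom. `modeled_sequence` and `atom_line` are illustrative helpers, and the toy records stand in for the real 3QXF file.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def modeled_sequence(pdb_text, chain_id="A"):
    """One-letter sequence of residues that have a CA atom in ATOM records."""
    seen = set()
    letters = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        residue_key = (line[22:26], line[26])  # residue number + insertion code
        if residue_key in seen:  # skip alternate locations of the same residue
            continue
        seen.add(residue_key)
        letters.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(letters)

def atom_line(serial, atom, resname, chain, resseq):
    """Build a fixed-width ATOM record for the toy example (coordinates are zero)."""
    return "ATOM  %5d %-4s %3s %s%4d    %8.3f%8.3f%8.3f" % (
        serial, atom, resname, chain, resseq, 0.0, 0.0, 0.0)

# Toy structure: residue 3 of chain A has no coordinates, so it is not "modeled"
toy_pdb = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1),
    atom_line(2, " CA ", "MET", "A", 1),
    atom_line(3, " CA ", "ALA", "A", 2),
    atom_line(4, " CA ", "LYS", "A", 4),
    atom_line(5, " CA ", "GLY", "B", 1),
])
chain_a = modeled_sequence(toy_pdb, "A")
print(chain_a, len(chain_a))
```

A missing residue simply drops out of the returned string, which is how a modeled segment can end up shorter than the annotated full-length sequence.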

ProteinMPNN reports seq_recovery ≈ 0.51 for sample 1, meaning the designed sequence is ~51% identical to the modeled native chain segment while still being compatible with the same backbone.


2) Predicted sequence probabilities (ProteinMPNN)

ProteinMPNN also saves per-position amino-acid probabilities (distribution over 20 AAs per residue position) in:

  • /content/mpnn_out/probs/3QXF.npz

These probabilities can be summarized as:

  • max probability per position (how confident the model is at each residue)
  • entropy per position (how uncertain the model is / how many choices are plausible)

These plots can be generated with the code snippet at the end of this section.


3) ESMFold validation (sequence → structure)

Native baseline (PDB-modeled chain A)

I first folded the native modeled chain-A segment (same residue range ProteinMPNN used) using ESMFold.

  • Length: 337 aa
  • pTM: 0.940
  • Mean pLDDT: 92.291
  • Output PDB: native_chainA_6b6cf/ptm0.940_r3_default.pdb

ProteinMPNN-designed sequence (T=0.1, sample=1)

Next, I folded the ProteinMPNN-designed candidate (sample 1) with ESMFold:

  • Length: 337 aa
  • pTM: 0.948
  • Mean pLDDT: 92.546
  • Output PDB: mpnn_sample1_bf1be/ptm0.948_r3_default.pdb

Interpretation: Both native and designed sequences have very high pTM and pLDDT, and visually they form the same compact globular fold. This suggests ProteinMPNN successfully proposed a new sequence that remains compatible with the original backbone fold.


Figures

Saved ESMFold output PDBs (native vs designed):
ESMFold output PDB filenames

ESMFold predicted structure, native (modeled chain A, rainbow coloring):
Native chain A structure (ESMFold)

ESMFold predicted structure, ProteinMPNN sample 1 (rainbow coloring):
ProteinMPNN sample 1 structure (ESMFold)

Alternate view (same prediction, different camera angle):
Alternate view


Code to generate ProteinMPNN probability plots

Use this to create the two plots (max probability + entropy).

import numpy as np
import matplotlib.pyplot as plt

data = np.load("/content/mpnn_out/probs/3QXF.npz")
print("Keys:", data.files)

# Find an array shaped like (..., 21) where 21 = 20 amino acids + 1 special token
probs = None
for k in data.files:
    arr = data[k]
    if arr.ndim in (2, 3) and arr.shape[-1] == 21:
        probs = arr
        print("Using key:", k, "shape:", arr.shape)
        break

assert probs is not None, "Could not find a probability array with last dimension = 21"

# If multiple samples exist, take sample 0
if probs.ndim == 3:
    probs_used = probs[0]
else:
    probs_used = probs

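# Convert to probabilities only if needed: rows that are non-negative and already
# sum to 1 are left untouched (a softmax would distort them); anything else is
# treated as logits/log-probs and passed through a numerically stable softmax
row_sums = probs_used.sum(axis=-1)
if probs_used.min() < 0 or not np.allclose(row_sums, 1.0, atol=1e-3):
    probs_used = np.exp(probs_used - probs_used.max(axis=-1, keepdims=True))
    probs_used = probs_used / probs_used.sum(axis=-1, keepdims=True)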

max_prob = probs_used.max(axis=-1)
entropy = -(probs_used * np.log(probs_used + 1e-9)).sum(axis=-1)

plt.figure(figsize=(10,3))
plt.plot(max_prob)
plt.title("ProteinMPNN: max amino-acid probability per position")
plt.xlabel("Residue index")
plt.ylabel("Max probability")
plt.show()

plt.figure(figsize=(10,3))
plt.plot(entropy)
plt.title("ProteinMPNN: entropy per position (uncertainty)")
plt.xlabel("Residue index")
plt.ylabel("Entropy")
plt.show()

Inverse Folding with ProteinMPNN

For this part, I used the backbone of PDB: 3QXF and performed inverse folding with ProteinMPNN. I set the model to design chain A while keeping chains B, C, and D fixed.

ProteinMPNN generated a new sequence candidate for chain A based on the original backbone geometry. The native chain A sequence and the designed sequence were both 337 amino acids long. When I compared them, the designed sequence matched the native sequence at 175 out of 337 positions, giving a sequence identity of 51.93%. This means the model changed almost half of the residues while still proposing a sequence compatible with the same backbone fold.
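The identity figure is just the fraction of aligned positions that match. A minimal sketch follows, assuming two equal-length sequences with no gaps. The short sequences are toy examples rather than the real chains, and `sequence_identity` is an illustrative helper name.

```python
def sequence_identity(seq_a, seq_b):
    """Return (matching positions, fraction identical) for equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, matches / len(seq_a)

# Toy example: 6 of 8 positions match
matches, identity = sequence_identity("MKTAYIAK", "MKSAYLAK")
print(matches, identity)

# The numbers reported in the text: 175 matching positions out of 337
reported_identity = 100 * 175 / 337
print(round(reported_identity, 2))
```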

The model also assigned a better score to the designed sequence than to the native one. The native score was 1.3309, while the sampled designed sequence had a score of 0.7779. Since this score reflects the model’s negative log-likelihood, the lower score suggests that ProteinMPNN considers the designed sequence highly compatible with the input backbone.
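One way to read these scores is to convert them back into probabilities. The sketch below assumes the score is a mean per-residue negative log-likelihood in natural-log units, in which case `exp(-score)` is the geometric-mean probability the model assigns to each residue. `mean_residue_probability` is an illustrative name, not part of ProteinMPNN.

```python
import math

def mean_residue_probability(score):
    """Geometric-mean per-residue probability implied by a mean NLL score (nats)."""
    return math.exp(-score)

native_p = mean_residue_probability(1.3309)    # native score from the text
designed_p = mean_residue_probability(0.7779)  # designed score from the text
print(round(native_p, 3), round(designed_p, 3))
```

Under that assumption the designed sequence gets roughly 0.46 probability per residue on average, against roughly 0.26 for the native sequence.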

To further test the design, I folded the ProteinMPNN-generated sequence using ESMFold. The resulting predicted structure was then compared to the original 3QXF chain A structure. The comparison showed a Cα RMSD of 0.652 Å, which indicates that the predicted structure is extremely close to the original backbone. This suggests that the redesigned sequence preserves the same overall fold very well.
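For reference, Cα RMSD is the root of the mean squared distance between matched Cα atoms. The sketch below assumes the two coordinate sets are already superposed and listed in the same residue order. It does not perform the alignment itself, and it is not necessarily how the 0.652 Å value was computed. `ca_rmsd` and the coordinates are illustrative.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: every atom is displaced by exactly 1 Å along z
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(ca_rmsd(reference, predicted))
```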

The confidence of the ESMFold prediction was also high. The output gave a mean pLDDT of 0.92 on a 0–1 scale (minimum 0.57, maximum 0.97), i.e. about 92 on a 0–100 scale, indicating that most of the structure was predicted with strong confidence.

Structural Overlay

Overlay of original and redesigned structures

Figure 1. Overlay of the original 3QXF chain A structure and the ESMFold-predicted structure for the ProteinMPNN-designed sequence. The two structures align very closely, with only minor deviations in a few flexible regions.

Side-by-Side Comparison

Side-by-side comparison of original and redesigned structures

Figure 2. Side-by-side cartoon view of the original 3QXF chain A structure (left) and the ESMFold prediction of the redesigned sequence (right). The global fold is preserved, showing that the redesigned sequence remains compatible with the original backbone.

Amino Acid Probability Heatmap

ProteinMPNN amino acid probability heatmap

Figure 3. Amino-acid probability heatmap from ProteinMPNN showing the predicted residue probabilities at each sequence position. Bright, high-probability peaks indicate strongly constrained positions, while darker regions suggest positions that can tolerate more sequence variation.

Overall, this inverse-folding experiment shows that ProteinMPNN can generate a substantially different sequence while still preserving the original fold. Even with only about 52% sequence identity, the redesigned sequence folds back into a structure that is nearly identical to the starting backbone, demonstrating the robustness of structure-guided protein design.

Part D — Bacteriophage Engineering Proposal

Selected Goal

I propose to focus on:

  • Primary goal: Increasing stability of the phage lysis (L) protein
  • Secondary goal: Modulating interaction with host machinery (e.g., E. coli DnaJ)

This direction is computationally tractable and aligns with available protein design tools while still connecting to functional outcomes (lysis efficiency and phage fitness).

Rationale

The L protein is responsible for host cell lysis and is therefore a key determinant of bacteriophage replication efficiency. Improving its structural stability could:

  • Increase protein lifetime inside the host
  • Improve folding efficiency
  • Potentially increase effective lysis activity

Additionally, modifying interactions with host proteins (e.g., DnaJ chaperone system) could alter:

  • Protein degradation pathways
  • Folding dynamics
  • Toxicity and timing of lysis

These properties make the L protein a suitable target for computational protein engineering concepts.

Proposed Computational Approach

1. Sequence Analysis & Baseline Characterization

  • Use UniProt / BLAST to identify homologs
  • Generate multiple sequence alignment (MSA)
  • Identify conserved vs variable regions

Goal: Identify mutation-tolerant regions

2. Structure Prediction

  • Predict structure using ESMFold or AlphaFold2

Goal: Obtain structural model for downstream design

3. In Silico Mutagenesis (Protein Language Models)

  • Use ESM-2 to perform:
    • Deep mutational scanning (in silico)
    • Likelihood scoring of mutations

Goal: To identify mutations likely to improve stability without disrupting function

4. Sequence Optimization

  • Use ProteinMPNN:
    • Redesign selected regions (not the full protein, to preserve function)
    • Generate candidate sequences

Goal: Improve packing, stability, and foldability

5. Structural Validation

  • Re-run ESMFold / AlphaFold on designed variants
  • Compare:
    • pLDDT (confidence)
    • Structural deviations

Goal: Filter unstable designs

6. Interaction Modeling

  • Use AlphaFold-Multimer:
    • Model interaction with host proteins (e.g., DnaJ)

Goal: Evaluate whether mutations alter interaction in the host organism

Pipeline Schematic

 
Input: L protein sequence
        ↓
Homology search (BLAST / MSA)
        ↓
Structure prediction (ESMFold / AlphaFold)
        ↓
In silico mutagenesis (ESM-2)
        ↓
Sequence redesign (ProteinMPNN)
        ↓
Structure validation (AlphaFold)
        ↓
(Optional) Complex modeling (AlphaFold-Multimer)
        ↓
Output: Candidate stabilized L protein variants

Why These Tools

  • Protein Language Models (ESM-2):
    Capture evolutionary constraints → useful for predicting tolerated mutations

  • ProteinMPNN:
    Enables structure-based redesign → improves stability via better packing

  • AlphaFold / ESMFold:
    Provide fast structural validation → essential for screening designs

  • AlphaFold-Multimer:
    Allows hypothesis testing of host–phage interactions

Together, these tools enable a pipeline from sequence to function hypothesis.

Potential Pitfalls

  • Lack of experimental validation
    • Computational predictions may not correlate with real folding or function
  • Limited training data for phage proteins
    • Models are biased toward well-studied proteins
    • Phage-specific interactions may be poorly captured
  • Over-optimization risk
    • Increasing stability may reduce functional dynamics needed for lysis

Conclusion

This approach focuses on stability engineering as an accessible entry point into bacteriophage design. By combining protein language models, structure prediction, and sequence redesign, it is possible to generate testable hypotheses for improved phage function, while staying within the scope of computational tools introduced in HTGAA.


Week 5 HW: Protein design part 2

Part 1: Generate Binders with PepMLM

For this exercise, I used the human SOD1 target protein and introduced the A4V mutation. I then used PepMLM to generate four candidate 12-amino-acid peptide binders against the mutant target sequence. As requested in the assignment, I also included the known binder peptide FLYRWLPSRRGG for comparison.

What is an A4V mutation?

  • A = alanine
  • 4 = position 4
  • V = valine
  • So A4V means that the alanine at position 4 is replaced by valine. In SOD1, A4V is a well-known mutation, often described as one of the more aggressive SOD1-linked variants.
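The notation can be applied to a sequence programmatically. This is a minimal sketch with two assumptions: the wild-type human SOD1 sequence starts with `MATKAVCVLKG`, and the A4V numbering does not count the initiator methionine, hence `offset=1`. That choice is consistent with the mutant sequence under "Target sequence used" starting with MATKVV. `apply_point_mutation` is an illustrative helper.

```python
def apply_point_mutation(sequence, mutation, offset=0):
    """Apply a mutation written as <wild-type><position><mutant>, e.g. 'A4V'.

    `offset` shifts the 1-based position, e.g. offset=1 when the sequence
    keeps an initiator Met that the mutation numbering does not count.
    """
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1 + offset
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]

# Assumed first residues of wild-type human SOD1, including the initiator Met
wild_type_start = "MATKAVCVLKG"
mutant_start = apply_point_mutation(wild_type_start, "A4V", offset=1)
print(mutant_start)
```

Checking the wild-type residue before substituting catches off-by-one numbering mistakes, which are easy to make when a sequence does or does not include the leading Met.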

What is SOD1?

  • SOD1 stands for superoxide dismutase 1. It is the gene/protein for an enzyme that helps protect cells from oxidative damage by breaking down superoxide radicals, which are harmful oxygen byproducts of normal metabolism. Human SOD1 is the well-known copper/zinc superoxide dismutase found in the cytoplasm.

Target sequence used

A4V mutant SOD1 sequence:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

| Peptide ID   | Sequence       | Length | Perplexity |
| ------------ | -------------- | -----: | ---------: |
| PepMLM-1     | `WSDDAVVDAVHA` |     12 |     2.8559 |
| PepMLM-2     | `WDWDSAAAAAAK` |     12 |     1.8068 |
| PepMLM-3     | `WHSGPGAAAAAK` |     12 |     2.1700 |
| PepMLM-4     | `HHSGSGGAAGKH` |     12 |     2.9582 |
| Known binder | `FLYRWLPSRRGG` |     12 |        N/A |

Interpretation

  • PepMLM produced four short peptide candidates for the mutant SOD1 target. Based on the perplexity values, PepMLM-2 (WDWDSAAAAAAK) is the most promising candidate, because it has the lowest perplexity, which indicates the highest model confidence among the generated sequences. PepMLM-3 ranked second, while PepMLM-1 and PepMLM-4 had higher perplexity and are therefore less favored by the model.
  • It is also interesting that the generated peptides are quite different in composition from the known binder FLYRWLPSRRGG. The PepMLM outputs are enriched in small, polar, and acidic residues such as A, G, D, H, and S, while the known binder contains more hydrophobic and basic residues such as F, L, W, R, and Y. This suggests that the model explored a different part of sequence space while still proposing candidate binders for the same target.
  • Overall, the strongest candidate from this step is PepMLM-2, which I would prioritize for the next stage of structural evaluation.
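For intuition about the perplexity column, the generic definition is the exponential of the mean negative log-probability per token. The sketch below uses that generic definition. PepMLM's exact scoring procedure may differ, and `perplexity` is an illustrative helper.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# If a model gave every residue of a 12-mer probability 0.25, perplexity would be 4
uniform_quarter = [math.log(0.25)] * 12
print(perplexity(uniform_quarter))

# Reading a reported value backwards: perplexity 1.8068 corresponds to a
# geometric-mean per-residue probability of 1 / 1.8068
implied_probability = 1 / 1.8068
print(round(implied_probability, 3))
```

Read this way, a perplexity near 1.8 means the model was, on average, choosing between fewer than two plausible residues per position.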

Part 2: Evaluate Binders with AlphaFold3

I evaluated each peptide by submitting the A4V mutant SOD1 sequence together with each peptide as separate chains in AlphaFold Server. For each prediction, I recorded the ipTM score and visually inspected where the peptide appeared to bind on SOD1. The goal was to see whether the peptide localized near the N-terminus/A4V region, the β-barrel surface, or the dimer interface. AlphaFold Server reports ipTM as a confidence measure for predicted interfaces in complexes, so higher values suggest a more confident protein–peptide interaction.

What do these terms mean?

  • ipTM stands for interface predicted TM-score. It is a confidence score for the relative positioning of the chains: basically, how believable the predicted interaction interface is between the protein and the peptide. Higher is better. A commonly used rough interpretation is: above 0.8 = strong confidence, below 0.6 = likely weak or failed prediction, and 0.6–0.8 = gray zone where the pose may or may not be right.

  • N-terminus / A4V region: the N-terminus is the beginning of the protein chain. In SOD1, the A4V mutation sits right near that beginning region: alanine is replaced by valine close to the N-terminal end. In the A4V mutant, the overall SOD1 structure is mostly preserved, but studies report increased disorder around the N-terminus and a shift in how the two SOD1 subunits sit together. (Ref)

  • β-barrel: a protein fold made from multiple β-strands that wrap around into a barrel-like shape. SOD1’s monomer is built around an eight-stranded antiparallel β-barrel, and SOD1 is a dimer of two such β-barrels. The β-barrel surface means the outside, exposed face of that folded barrel.

  • Dimer interface: SOD1 normally functions as a homodimer, meaning two identical SOD1 subunits bind together. The dimer interface is the set of surfaces and contacts where those two subunits touch each other. (Ref)

AlphaFold results

| Peptide ID   | Sequence       | Top ipTM | Interpretation of binding pose |
| ------------ | -------------- | -------: | ------------------------------ |
| PepMLM-1     | `WSDDAVVDAVHA` |     0.52 | Weak-to-moderate interface. The peptide sits near the protein surface, but the pose is not tightly packed and looks only loosely associated. |
| PepMLM-2     | `WDWDSAAAAAAK` |     0.49 | Weak interface. The peptide appears offset from the SOD1 surface and does not form a convincing bound complex. |
| PepMLM-3     | `WHSGPGAAAAAK` |     0.64 | Strongest of the five tested peptides. The peptide lies across the surface of SOD1 in a more continuous contact pose than the others. |
| PepMLM-4     | `HHSGSGGAAGKH` |     0.39 | Weak interface. The peptide touches one side of the protein but remains extended and low-confidence. |
| Known binder | `FLYRWLPSRRGG` |     0.33 | Weakest result in this AlphaFold screen. The peptide remains mostly detached and does not form a convincing bound pose in the top-ranked model. |
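The rough ipTM interpretation described earlier (above 0.8, 0.6–0.8, below 0.6) can be applied to these scores directly. This is a small sketch using those rule-of-thumb cutoffs. `iptm_band` and the band labels are illustrative, not an official AlphaFold classification.

```python
def iptm_band(iptm):
    """Map an ipTM value to the rough confidence bands used in this write-up."""
    if iptm > 0.8:
        return "strong confidence"
    if iptm >= 0.6:
        return "gray zone"
    return "likely weak or failed"

# Top ipTM values from the results table
iptm_scores = {
    "PepMLM-1": 0.52,
    "PepMLM-2": 0.49,
    "PepMLM-3": 0.64,
    "PepMLM-4": 0.39,
    "Known binder": 0.33,
}
bands = {name: iptm_band(score) for name, score in iptm_scores.items()}
best = max(iptm_scores, key=iptm_scores.get)
for name, band in bands.items():
    print(name, iptm_scores[name], band)
print("best:", best)
```

By these cutoffs only PepMLM-3 reaches the gray zone, and no peptide reaches the strong-confidence band.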

Structural observations

PepMLM-1

The top-ranked model for PepMLM-1 gave an ipTM of 0.52, which was moderate but not especially convincing. In the chain-colored view, the peptide is close to SOD1 but still looks somewhat detached rather than tightly docked. I interpreted this as a weak or ambiguous interaction, not a strongly defined binding mode.

PepMLM-1 bound to A4V SOD1

PepMLM-2

Although PepMLM-2 had the best PepMLM perplexity score in Part 1, the AlphaFold result was less convincing. Its top-ranked model had an ipTM of 0.49, and the peptide appears offset from the protein surface rather than packed into a clear binding site. This suggests that sequence plausibility from PepMLM did not translate into the strongest structural interface.

PepMLM-2 bound to A4V SOD1

PepMLM-3

PepMLM-3 performed best in the AlphaFold comparison, with a top-ranked ipTM of 0.64. Visually, this peptide follows the SOD1 surface much more closely than the others and appears to form a broader, more continuous contact region. Even though this is still not an extremely high-confidence interface, it is the most convincing binding pose among the five peptides tested.

PepMLM-3 bound to A4V SOD1

PepMLM-4

For PepMLM-4, the top-ranked model had an ipTM of 0.39. The peptide touches the protein surface, but the interaction looks elongated and weak, without a compact docking geometry. I therefore considered this a poor candidate relative to PepMLM-1 and especially PepMLM-3.

PepMLM-4 bound to A4V SOD1

Known binder

The known binder surprisingly gave the weakest structural result in this AlphaFold screen, with a top-ranked ipTM of 0.33. In the chain-colored view, the peptide remains mostly separate from the protein and does not adopt a clear bound conformation. This does not necessarily mean it cannot bind experimentally, but in this prediction set it was less convincing than the best PepMLM-generated candidate.

Known binder bound to A4V SOD1

Interpretation

Overall, PepMLM-3 (WHSGPGAAAAAK) was the most promising peptide in the AlphaFold evaluation because it had the highest ipTM (0.64) and the most convincing surface-bound pose. PepMLM-1 was intermediate, while PepMLM-2, PepMLM-4, and the known binder all looked weaker in the structural screen.

An interesting result is that the peptide with the lowest PepMLM perplexity was PepMLM-2, but the peptide with the best AlphaFold complex prediction was PepMLM-3. This shows that sequence-level model confidence and structure-level interface confidence are related but not identical. In this case, I would prioritize PepMLM-3 for follow-up testing.

Another important observation is that none of the peptides clearly docked directly at the extreme N-terminal A4V mutation site itself. Instead, the predicted interactions were mostly distributed over broader exposed surfaces of SOD1. So the best candidate here appears to behave more like a surface-binding peptide than a mutation-site-specific binder.

Final ranking from Part 2

  1. PepMLM-3 — best overall AlphaFold interface
  2. PepMLM-1 — moderate but weaker than PepMLM-3
  3. PepMLM-2 — weaker structural support despite best PepMLM perplexity
  4. PepMLM-4 — poor interface
  5. Known binder — weakest in this AlphaFold screen

Part 3: Evaluate Properties of Generated Peptides in PeptiVerse

This part answers the question: even if a peptide looks like the best binder, is it also a realistic peptide to pursue?

To further compare the PepMLM-generated peptides, I evaluated each one in PeptiVerse using the A4V mutant SOD1 sequence as the protein target. I recorded the required outputs from the homework prompt: predicted binding affinity, solubility, hemolysis probability, net charge (pH 7), and molecular weight.

Why are predicted binding affinity, solubility, hemolysis probability, net charge (pH 7), and molecular weight important metrics, and what do they actually mean?

  • Binding affinity: a stronger binder usually means the peptide is more likely to stay attached long enough to have an effect. If binding is very weak, the peptide may just drift away and not do much.

  • Solubility: this is very important because most biological experiments happen in aqueous environments. A poorly soluble peptide is difficult to handle and test under those conditions.

  • Hemolysis probability: hemolysis means breaking open red blood cells, so hemolysis probability is a prediction of whether the peptide might damage cell membranes strongly enough to lyse red blood cells. This matters because a peptide might bind a target but still be too toxic or membrane-disruptive to be a good therapeutic lead. Low hemolysis probability = safer-looking peptide; high hemolysis probability = warning sign for toxicity.

  • Net charge at pH 7: this is the peptide’s overall electrical charge around neutral pH. Some amino acids are positively charged, some negatively charged, and some neutral. When you add them up, you get the peptide’s net charge. This matters because charge affects: a) solubility, b) how the peptide interacts with proteins, c) how it interacts with membranes, and d) whether it tends to stick nonspecifically to other molecules.

  • Molecular weight: how heavy the peptide is. For a peptide, this is closely related to how many amino acids it contains and what those amino acids are.
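Two of these metrics can be estimated directly from the sequence. The sketch below sums standard average residue masses plus one water for molecular weight, and uses the Henderson–Hasselbalch equation for net charge. The pKa values are textbook-style approximations chosen for illustration, so the charges will not exactly match PeptiVerse, whose parameters are not known here.

```python
# Average residue masses in Da (residue = amino acid minus one water)
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528

# Assumed approximate pKa values (illustrative, not PeptiVerse's parameters)
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0

def molecular_weight(peptide):
    """Average molecular weight of a linear, unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS

def net_charge(peptide, ph=7.0):
    """Henderson-Hasselbalch net charge with free N- and C-termini."""
    charge = 1 / (1 + 10 ** (ph - N_TERM_PKA))    # protonated N-terminus
    charge -= 1 / (1 + 10 ** (C_TERM_PKA - ph))   # deprotonated C-terminus
    for aa in peptide:
        if aa in POSITIVE_PKA:
            charge += 1 / (1 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1 / (1 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge

for peptide in ["WSDDAVVDAVHA", "WDWDSAAAAAAK", "WHSGPGAAAAAK", "HHSGSGGAAGKH"]:
    print(peptide, round(molecular_weight(peptide), 1), round(net_charge(peptide), 2))
```

The aspartate-rich peptides come out negative and the lysine- or histidine-containing ones slightly positive, which shows how composition drives the charge column.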

Why do all of these matter? A good candidate peptide should:

  • bind reasonably well
  • be soluble enough to test
  • not be obviously toxic
  • have a reasonable charge
  • have a manageable size

PeptiVerse results

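| Peptide ID | Sequence       | Binding affinity (pKd/pKi) | Solubility | Hemolysis probability | Net charge (pH 7) | Molecular weight (Da) |
| ---------- | -------------- | -------------------------: | ---------: | --------------------: | ----------------: | --------------------: |
| PepMLM-1   | `WSDDAVVDAVHA` |                      5.632 |      1.000 |                 0.065 |             -3.15 |                1284.3 |
| PepMLM-2   | `WDWDSAAAAAAK` |                      5.027 |      1.000 |                 0.033 |             -1.24 |                1262.3 |
| PepMLM-3   | `WHSGPGAAAAAK` |                      4.698 |      1.000 |                 0.016 |              0.85 |                1123.2 |
| PepMLM-4   | `HHSGSGGAAGKH` |                      4.201 |      1.000 |                 0.016 |              1.02 |                1102.1 |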

Individual PeptiVerse outputs

PepMLM-1

PeptiVerse predicted that PepMLM-1 is fully soluble and non-hemolytic, but it had the highest hemolysis probability of the four peptides and was also the most negatively charged. It showed the highest predicted binding affinity in PeptiVerse, although it was still classified as weak binding overall.

PeptiVerse results for PepMLM-1

PepMLM-2

PepMLM-2 was also predicted to be fully soluble and non-hemolytic. Compared with PepMLM-1 it had a slightly lower predicted binding affinity, lower hemolysis probability, and a less negative charge. This makes it somewhat more balanced than PepMLM-1 from a developability perspective.

PeptiVerse results for PepMLM-2

PepMLM-3

PepMLM-3 had full predicted solubility, very low hemolysis probability, and a slightly positive net charge, which could be favorable for interaction with exposed protein surfaces. Its predicted binding affinity was lower than PepMLM-1 and PepMLM-2 in PeptiVerse, but it still looked attractive overall because of its better safety/developability profile.

PeptiVerse results for PepMLM-3

PepMLM-4

PepMLM-4 had the lowest predicted binding affinity of the four peptides, but it was also fully soluble, very low in hemolysis probability, and the lightest peptide by molecular weight. It looked like a safe and soluble candidate, but less promising from a binding perspective.

PeptiVerse results for PepMLM-4

Interpretation

A clear pattern from PeptiVerse is that all four peptides were predicted to be soluble, and all four had low hemolysis probabilities, so none of them looked immediately problematic from a basic safety/solubility perspective. The differences were mainly in relative binding affinity, charge, and molecular weight.

If I rank the peptides by PeptiVerse predicted binding affinity alone, the order is:

  1. PepMLM-1 — 5.632
  2. PepMLM-2 — 5.027
  3. PepMLM-3 — 4.698
  4. PepMLM-4 — 4.201

However, PeptiVerse and AlphaFold did not rank the peptides in the same way. In Part 2, PepMLM-3 gave the best AlphaFold complex result with the highest ipTM and the most convincing surface-bound pose, while PepMLM-1 only showed a weaker and more ambiguous interface. This means that the peptide with the highest predicted affinity in PeptiVerse was not the same peptide that gave the strongest structural complex prediction.
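The disagreement between the three tools is easy to see when the peptides are ranked by each metric side by side. This sketch uses only the values reported in Parts 1–3. `rank_by` is an illustrative helper, and treating each metric as a simple sort key is a simplification.

```python
# Values reported in Parts 1-3 for the four PepMLM peptides
peptides = {
    "PepMLM-1": {"perplexity": 2.8559, "iptm": 0.52, "affinity": 5.632},
    "PepMLM-2": {"perplexity": 1.8068, "iptm": 0.49, "affinity": 5.027},
    "PepMLM-3": {"perplexity": 2.1700, "iptm": 0.64, "affinity": 4.698},
    "PepMLM-4": {"perplexity": 2.9582, "iptm": 0.39, "affinity": 4.201},
}

def rank_by(metric, higher_is_better=True):
    """Peptide names ordered from best to worst for one metric."""
    return sorted(peptides, key=lambda name: peptides[name][metric],
                  reverse=higher_is_better)

by_perplexity = rank_by("perplexity", higher_is_better=False)  # lower is better
by_iptm = rank_by("iptm")
by_affinity = rank_by("affinity")

print("PepMLM perplexity:  ", by_perplexity)
print("AlphaFold ipTM:     ", by_iptm)
print("PeptiVerse affinity:", by_affinity)
```

Each metric puts a different peptide first. The three tools agree only on PepMLM-4 being last.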

Final decision

Based on the combined results from PepMLM, AlphaFold, and PeptiVerse, I would advance PepMLM-3 (WHSGPGAAAAAK).

My reasoning is:

  • it had the strongest AlphaFold result from Part 2,
  • it remained fully soluble in PeptiVerse,
  • it had a very low hemolysis probability (0.016),
  • it had a relatively low molecular weight (1123.2 Da),
  • and its slightly positive net charge (0.85) may be more favorable than the strongly negative charge of PepMLM-1.

So even though PepMLM-1 had the highest PeptiVerse binding score, PepMLM-3 appears to offer the best overall balance between predicted binding geometry and peptide properties. For that reason, PepMLM-3 would be my lead candidate for follow-up testing.

Part 4: Generate Optimized Peptides with moPPIt

For the final design step, I used moPPIt to generate peptides that were explicitly guided toward a selected region of the target protein, rather than only sampling general binders from sequence context as in PepMLM. I used the A4V mutant SOD1 sequence as the target and chose a motif around the N-terminal region (residues 1–8) in order to bias the model toward the area surrounding the disease-associated A4V mutation.

Input settings used

  • Target protein: A4V mutant SOD1
  • Targeted motif / residue region: 1–8
  • Peptide length: 12 aa
  • Guidance enabled: Affinity + Motif
  • Number of samples requested: 3

moPPIt-generated peptides

| Peptide ID | Sequence       | Targeted motif | Notes |
| ---------- | -------------- | -------------- | ----- |
| moPPIt-1   | `RSKTKLCGEKQV` | 1–8            | Positively charged / mixed-polar sequence, quite different from the PepMLM peptides |
| moPPIt-2   | `GCGDLFTYYYYG` | 1–8            | More aromatic and hydrophobic, with several tyrosines |
| moPPIt-3   | Not completed  | 1–8            | Colab GPU limit interrupted the run before the third peptide finished |

Interpretation

Compared with the PepMLM peptides, the moPPIt peptides look quite different in sequence composition. The earlier PepMLM candidates were enriched in small and simple residues such as A, G, S, and D, while the moPPIt peptides contain more clearly designed features, including charged residues in moPPIt-1 and aromatic residues in moPPIt-2. This makes sense, because moPPIt was run with an explicit motif-targeting objective rather than only sequence-conditioned peptide generation.

The most important difference is conceptual:

  • PepMLM generated peptides that behaved mostly like general surface binders
  • moPPIt was used here to bias peptide design toward the N-terminal A4V-adjacent region

So even though I have not yet structurally validated these new peptides, they are more directly aligned with the biological goal of targeting the mutation-associated region of SOD1.

Limitation of this run

The moPPIt run was interrupted by Colab GPU usage limits before the third sample completed, so I only obtained two finished peptides in this session. I therefore treat this as a partial design round rather than a complete final screen.

Comparison to PepMLM peptides

In Parts 1–3, the best overall PepMLM candidate was PepMLM-3 (WHSGPGAAAAAK), because it showed the strongest AlphaFold interface while also maintaining good PeptiVerse properties. However, those PepMLM peptides did not clearly dock at the extreme A4V/N-terminal site. The moPPIt design step was therefore useful because it shifted the strategy from simply finding plausible binders to generating peptides that are more likely to engage the chosen mutation-adjacent motif.

How I would evaluate the moPPIt peptides before advancing them

Before considering these peptides as therapeutic leads, I would next:

  1. predict their complexes with AlphaFold to check whether they actually bind near residues 1–8 of SOD1,
  2. evaluate their binding affinity, solubility, hemolysis, charge, and molecular weight in PeptiVerse,
  3. compare them directly against PepMLM-3, which was the strongest candidate from the previous steps,
  4. test whether they show better site specificity for the mutant N-terminal region rather than general surface sticking.

After computational screening, the next stage would be experimental validation, including peptide synthesis, in vitro binding assays, comparison between wild-type and A4V mutant SOD1, and functional assays related to aggregation or stabilization.

Conclusion

Even with only two completed outputs, moPPIt was useful because it produced a new set of peptides specifically optimized toward the A4V-adjacent N-terminal motif of SOD1. The two peptides generated in this run were:

  • RSKTKLCGEKQV
  • GCGDLFTYYYYG

These would be the next candidates I would test computationally against PepMLM-3 to see whether motif-guided design can produce a more mutation-focused binder than the original PepMLM approach.

Part C — Mutation Analysis with ESM

To explore how mutations may affect the stability and plausibility of my protein sequence, I used the ESM protein language model to perform a single-site mutational scan across the entire sequence. This analysis calculates a log-likelihood ratio (LLR) score for substituting each amino acid at each position in the protein.

The LLR score estimates how likely a mutation is according to the learned statistical patterns of natural proteins.

  • Positive LLR values indicate that the mutation is plausible or tolerated.
  • Negative LLR values suggest that the mutation may destabilize the protein or be less compatible with natural sequence patterns.

This approach allows us to identify positions that are mutation-tolerant and potentially useful for protein design.
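
The score itself is simple to compute once the model's per-position amino-acid probabilities are available. The following is a minimal sketch that uses a made-up probability table in place of real ESM output (only three of the twenty amino acids are shown, and the numbers are invented for illustration); the function name is hypothetical.

```python
import math


def llr(probs_at_position, wild_type, mutant):
    """Log-likelihood ratio: log p(mutant) - log p(wild type) at one position."""
    return math.log(probs_at_position[mutant]) - math.log(probs_at_position[wild_type])


# Hypothetical model probabilities at a single position (not real ESM output).
toy_probs = {"S": 0.40, "Q": 0.45, "W": 0.01}

print(llr(toy_probs, "S", "Q"))  # positive: Q is scored as more likely than S
print(llr(toy_probs, "S", "W"))  # negative: W is scored as much less likely
```

A full scan repeats this for every position and every substitution, which is what produces the position-by-amino-acid heatmap.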


Global Mutation Landscape

The heatmap below shows the predicted effects of all possible amino acid substitutions across the protein sequence.

  • X-axis: position in the protein sequence
  • Y-axis: substituted amino acid
  • Color: predicted mutation effect (LLR score)

Brighter yellow regions represent mutations predicted to be more favorable, while darker blue/purple regions represent unfavorable substitutions.

Predicted mutation effects heatmap

From this visualization we can see that:

  • Some positions are highly constrained (mostly negative scores), suggesting that mutations there would likely disrupt the protein.
  • Other positions show several neutral or positive substitutions, indicating that these sites may tolerate mutation.
  • A few positions show strong positive signals for specific amino acids, suggesting potential candidates for protein engineering.

Detailed View of Mutation Effects

The following heatmap provides another view of the mutation landscape, confirming the overall pattern of mutation tolerance across the sequence.

Predicted mutation effects heatmap 2

In both visualizations, several residues show clusters of positive LLR values for specific substitutions, suggesting that these positions may accommodate changes without disrupting the protein fold.


Protein Representation Learned by ESM

The ESM model also generates a high-dimensional representation (embedding) of the protein sequence. These embeddings capture patterns such as evolutionary constraints and structural signals.

The visualization below shows the representation dimensions learned by the model across the sequence.

Protein representation visualization

Although the representation values appear relatively uniform across most positions, subtle variations encode contextual information about each residue within the protein sequence.


Candidate Mutations

Based on the LLR mutation analysis, I selected several candidate mutations with relatively favorable scores. These mutations occur at positions where the model predicts that substitutions may be tolerated.

Example candidate mutations include:

  • S9Q
  • C29R
  • Y39L
  • K50L
  • N53L

These mutations were chosen because they showed relatively high LLR scores compared to other substitutions at the same positions, suggesting that the protein language model considers them compatible with natural protein sequence patterns.

Residue 39 appeared particularly permissive to mutation, with multiple substitutions showing similar scores. This suggests that this region may tolerate amino-acid changes without strongly disrupting the protein structure.
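
Mutation labels such as S9Q encode the wild-type residue, the 1-based position, and the substituted residue. A small helper like the one below can apply them to a sequence and catch numbering mistakes by checking the wild-type residue first. This is an illustrative sketch: the sequence is a made-up placeholder (not my actual protein), and the function names are hypothetical.

```python
import re


def parse_mutation(label):
    """Split a label like 'S9Q' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"Unrecognized mutation label: {label}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_mutation(seq, label):
    """Return seq with the substitution applied, after checking the wild type."""
    wild_type, position, mutant = parse_mutation(label)
    if not 1 <= position <= len(seq):
        raise ValueError(f"Position {position} is outside the sequence")
    if seq[position - 1] != wild_type:
        raise ValueError(
            f"{label}: expected {wild_type} at position {position}, "
            f"found {seq[position - 1]}"
        )
    return seq[: position - 1] + mutant + seq[position:]


toy_seq = "MKTAYIAKS"  # placeholder sequence; residue 9 is S
print(apply_mutation(toy_seq, "S9Q"))
```

The wild-type check matters because an off-by-one error in numbering would otherwise silently mutate the wrong residue.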


Interpretation

The ESM mutational scan provides a data-driven way to identify potentially tolerable mutations in a protein sequence. While these predictions do not guarantee functional improvements, they highlight mutations that are consistent with evolutionary patterns learned by the model.

In protein engineering workflows, such predictions can be used to:

  • prioritize mutations for experimental testing
  • explore sequence space while maintaining structural plausibility
  • identify flexible regions of the protein

Overall, the analysis suggests that several positions in this protein may tolerate mutation and could serve as starting points for further design or optimization.

Google Colab

Week 06 HW: Genetic circuits part i

Assignment: DNA Assembly

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The Phusion High-Fidelity PCR Master Mix contains at least three key components: Phusion DNA polymerase, deoxynucleotides (dNTPs), and an optimized reaction buffer that includes MgCl₂. The polymerase is the enzyme that synthesizes new DNA strands during PCR, the dNTPs are the nucleotide building blocks incorporated into the new DNA, and the buffer and MgCl₂ provide the chemical environment and cofactor needed for efficient polymerase activity. According to the New England Biolabs product page (https://www.neb.com/en/products/m0531-phusion-high-fidelity-pcr-master-mix-with-hf-buffer?srsltid=AfmBOorWPUiBMtKsQJJH0VLGPzLYHtMYELtt0wf7AQB0YZYF4nrTfFsz), the main benefits of the master mix are high fidelity (roughly 50× that of Taq polymerase) and fast extension times.

Image: ChatGPT

2. What are some factors that determine primer annealing temperature during PCR?

The main factor is the melting temperature (Tm) of the primers. Tm depends on the primer's length and base composition.

  • Base composition: GC-rich primers generally bind more strongly than AT-rich ones (a G–C pair forms three hydrogen bonds, an A–T pair two) and therefore have a higher Tm.
  • Sequence length: Longer primers tend to bind more stably because there are more base pairs holding the primer and template together.

Good primer pairs should usually have Tms that are close to each other. The lab notes suggest a binding-region Tm around 52–58°C and within about 5°C of the partner primer, and annealing is chosen about 2–5°C below the lower primer Tm. Reaction conditions also matter; for example, additives such as DMSO can lower primer Tm, so the annealing temperature may need to be reduced. In our lab protocol the backbone PCR and insert PCR use different annealing temperatures (57°C vs 53°C) because the primer sets differ.
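
To make these rules of thumb concrete, here is a rough sketch that estimates Tm with the Wallace rule (2 °C per A/T and 4 °C per G/C), which is only a coarse approximation for short oligos and not the calculator used in the lab, and then picks an annealing temperature a few degrees below the lower Tm. The primer sequences are made-up placeholders, and the function names are my own.

```python
def wallace_tm(primer):
    """Wallace-rule Tm estimate in °C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def annealing_temp(fwd, rev, offset=3):
    """Annealing temperature: the lower primer Tm minus an offset (2-5 °C)."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset


def tms_compatible(fwd, rev, max_diff=5):
    """True if the two primer Tms are within max_diff °C of each other."""
    return abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_diff


fwd = "ATGCGTACCGTTAGCTAG"  # placeholder forward primer (50% GC)
rev = "TTAGGCATCGATCCGTAA"  # placeholder reverse primer (44% GC)

print(wallace_tm(fwd), wallace_tm(rev))  # 54 52
print(tms_compatible(fwd, rev))          # True
print(annealing_temp(fwd, rev))          # 49
```

The two placeholder primers have the same length, so the 2 °C difference in Tm comes entirely from the one extra G/C in the forward primer.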

3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR creates a linear DNA fragment by using primers and a DNA polymerase to amplify a chosen region through repeated cycles of denaturation, annealing, and extension. Its biggest advantage is flexibility: it can amplify almost any desired region and can also add useful sequence features through the primers, such as mutations, overlaps for Gibson assembly, or restriction sites. That makes PCR preferable when a fragment must be engineered, when no convenient restriction sites exist, or when only a small defined region should be copied.

A restriction digest, by contrast, creates linear DNA by cutting at specific recognition sequences with restriction enzymes. This is often simpler and very reliable when the needed sites are already present in the plasmid or multiple cloning site, and it is especially useful for subcloning, plasmid linearization, or diagnostic digests. Its limitation is that it depends on sequence context: the enzyme sites must be present where you need them and absent where you do not want cuts. So in practice, restriction digestion is often preferable when the construct already has good enzyme sites, while PCR is preferable when you need more freedom in fragment boundaries or sequence design.
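
The sequence dependence of a restriction digest is easy to see in silico. The sketch below finds EcoRI sites (GAATTC, cut after the first G) in a linear sequence and reports the resulting fragment lengths. The input sequence is a made-up placeholder, and the function is an illustration rather than a replacement for Benchling's digest tool.

```python
def digest_linear(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the site that stay on the left
    fragment (1 for EcoRI, which cuts G^AATTC on the top strand).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"  # placeholder, 26 bp
print(digest_linear(toy))      # [5, 14, 7]
print(digest_linear("ACGT"))   # [4] (no site, so the sequence stays uncut)
```

If the site is missing where it is needed, or present inside the region that should stay intact, the fragment list changes accordingly, which is exactly the limitation PCR avoids.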

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

For Gibson assembly, the most important requirement is that adjacent DNA fragments share matching homologous overlaps at the ends where they will be joined. In the lab, the primer design guidelines specify overlaps of about 20–22 bp.

Beyond design, you should verify the fragments experimentally. In this protocol, that means using DpnI (a restriction enzyme that cuts methylated DNA at the sequence GATC) to remove the methylated parental plasmid template after PCR, purifying the PCR products, checking DNA concentration, and running a diagnostic gel to confirm that the backbone and insert have the expected sizes. For the assembly itself, the lab recommends an approximately 2:1 insert:vector molar ratio, which also improves the chance of successful Gibson cloning. It is also possible to confirm the whole assembly in silico in Benchling before the wet-lab step, to make sure the overlap sequences are exact and nothing is missing.
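
The 2:1 molar ratio translates into DNA mass once the fragment lengths are known, because the number of molecules scales with mass divided by length. Here is a rough sketch, assuming about 650 g/mol per base pair of double-stranded DNA (a common approximation). The fragment sizes and amounts are placeholders and not values from our protocol.

```python
def ng_to_pmol(ng, length_bp, g_per_mol_per_bp=650.0):
    """Convert a dsDNA mass in ng to pmol, given the fragment length in bp."""
    return ng * 1000.0 / (length_bp * g_per_mol_per_bp)


def insert_ng_for_ratio(vector_ng, vector_bp, insert_bp, ratio=2.0):
    """Insert mass (ng) that gives the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * ratio


# Placeholder example: 50 ng of a 3000 bp backbone and a 750 bp insert.
vector_ng, vector_bp, insert_bp = 50.0, 3000, 750
insert_ng = insert_ng_for_ratio(vector_ng, vector_bp, insert_bp)

print(f"Insert needed: {insert_ng:.1f} ng")
print(f"Vector: {ng_to_pmol(vector_ng, vector_bp):.4f} pmol")
print(f"Insert: {ng_to_pmol(insert_ng, insert_bp):.4f} pmol")
```

Because the insert is four times shorter than the backbone in this example, only half the backbone's mass is needed to reach twice its molar amount.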

About Gibson assembly: https://www.youtube.com/watch?v=tlVbf5fXhp4

5. How does the plasmid DNA enter the E. coli cells during transformation?

Plasmid DNA enters the E. coli cells by heat-shock transformation of chemically competent cells. The cells are kept on ice with the DNA, then briefly exposed to 42°C, which causes the membrane to become transiently permeable. The lab handout explains this as the membrane “opening up,” after which the plasmid enters the cells by diffusion. The cells are then allowed to recover in SOC medium for about an hour so they can repair their membranes and begin expressing the antibiotic-resistance marker before they are plated on selective agar.

SOC medium is a nutrient-rich growth medium for bacteria.

6. Describe another assembly method in detail: Golden Gate Assembly

Golden Gate Assembly is a DNA assembly method that uses a Type IIS restriction enzyme such as BsaI or BsmBI together with T4 DNA ligase in a single reaction. Unlike standard restriction enzymes, Type IIS enzymes cut outside their recognition sequence, so the researcher can design custom overhangs that determine exactly which fragments join to each other. Because the recognition sites are placed so they are removed during assembly, the final product is usually scarless and cannot be re-cut in the same way, which allows digestion and ligation to happen in the same tube. This makes Golden Gate especially useful for assembling multiple fragments in a defined order, such as promoter–RBS–CDS–terminator constructs in synthetic biology. A major design requirement is that the parts must not contain unwanted internal sites for the Type IIS enzyme being used; if they do, the sequence must be “domesticated” first. Compared with Gibson, Golden Gate is excellent for modular, repeatable multi-part assembly, while Gibson is often more convenient when overlaps are easier to design than restriction-site architecture.
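
The domestication requirement can be checked with a simple scan for internal recognition sites on both strands. The sketch below looks for BsaI sites (GGTCTC and its reverse complement GAGACC). The part sequence is a made-up placeholder, and the function names are my own.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def internal_sites(seq, site="GGTCTC"):
    """0-based start positions of the site on either strand of seq."""
    seq = seq.upper()
    hits = []
    for pattern in {site, reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:
            hits.append(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Placeholder part with one forward and one reverse BsaI site.
toy_part = "ATG" + "GGTCTC" + "AAATTT" + "GAGACC" + "TAA"
print(internal_sites(toy_part))        # [3, 15]
print(internal_sites("ATGAAATAA"))     # [] (nothing to domesticate)
```

Passing site="CGTCTC" would run the same check for BsmBI. Any hit inside a part means a silent mutation is needed before the part can be used with that enzyme.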

Simple diagram of Golden Gate Assembly:

Golden Gate Assembly diagram

Week 07 HW: Genetic circuits part ii

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Intracellular artificial neural networks (IANNs) have a major advantage over traditional Boolean genetic circuits because they can process graded, continuous signals rather than only treating inputs as ON/OFF states. In biological systems, many relevant signals, such as metabolite concentration, RNA abundance, and stress level, are not naturally binary. Neural-network-like circuits are better suited to integrate these analog inputs and make decisions based on their combined strength (Rizik et al., 2022).

A second advantage is that IANNs can implement more flexible and complex computations such as classification, soft majority decisions, analog-to-digital conversion, and multistage signal processing. Rizik et al. (2022) show multilayer “perceptgene” circuits that compute a soft majority function, perform analog-to-digital conversion, and implement a ternary switch, and they argue that neuro-inspired circuit design can be more reliable, resource-efficient, and reconfigurable for different tasks.

A third advantage is better compatibility with biological noise and nonlinearity. The same paper reports that logarithmic-domain neuromorphic computing is more suitable than a linear-domain perceptron for their gene circuits, and that it is more robust to noise at low signal concentrations. This is important because intracellular environments are noisy and variable from cell to cell. In that sense, IANNs are often better matched to real biological computation than rigid Boolean logic alone.

Overall, Boolean circuits are useful when a strict yes/no rule is enough, but IANNs are more powerful when the task requires integrating multiple imperfect signals, weighting them differently, and producing a graded or thresholded response.

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

One useful application for an IANN would be a cell-state classifier for targeted cancer detection or therapy. The idea would be to engineer a mammalian cell circuit that reads several intracellular biomarkers at once, such as microRNA levels, stress-response signals, or hypoxia-associated signals, and then decides whether the overall profile matches a diseased cell state. Synthetic biology has already been used to build multi-input circuits for identifying specific cancer cells (Xie et al., 2011), and broader synthetic signal-processing systems are being developed for diagnostics and therapies.

In this application, the inputs would be several intracellular markers, for example: high miR-21, high miR-155, low activity from a tumor-suppressor-associated pathway, and a hypoxia-related signal. Instead of applying a strict Boolean rule such as “all markers must be present,” the IANN would assign different effective weights to each input. In the first layer, each biomarker would influence production of a regulatory RNA or protein. In a hidden layer, those intermediate signals would be combined into a weighted internal score. In the output layer, if the total score crosses a threshold, the circuit could activate an output such as GFP for detection, a therapeutic protein, or a kill-switch effector. This makes the system more tolerant of noisy or partially matching disease signatures.
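
The weighted decision described above can be written as a tiny two-layer perceptron and compared with a strict Boolean rule. This is a conceptual sketch only: the weights, biases, threshold, and normalized marker levels are invented for illustration and are not measured values or the circuits from the cited papers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(markers, threshold=0.5):
    """Two-layer weighted classifier on marker levels normalized to [0, 1]."""
    # Hidden node 1: combined evidence from the two oncogenic miRNAs.
    h1 = sigmoid(4 * markers["miR21"] + 4 * markers["miR155"] - 4)
    # Hidden node 2: low tumor-suppressor activity plus hypoxia signal.
    h2 = sigmoid(4 * (1 - markers["suppressor"]) + 4 * markers["hypoxia"] - 4)
    # Output node: weighted score compared against a threshold.
    score = sigmoid(3 * h1 + 3 * h2 - 3)
    return score, score > threshold


def boolean_and(markers, cutoff=0.5):
    """Strict rule: every marker must individually look diseased."""
    return (markers["miR21"] > cutoff and markers["miR155"] > cutoff
            and markers["suppressor"] < cutoff and markers["hypoxia"] > cutoff)


partial_match = {"miR21": 0.9, "miR155": 0.4, "suppressor": 0.2, "hypoxia": 0.8}
healthy = {"miR21": 0.1, "miR155": 0.1, "suppressor": 0.9, "hypoxia": 0.1}

print(classify(partial_match), boolean_and(partial_match))
print(classify(healthy), boolean_and(healthy))
```

With these made-up numbers, the partially matching profile (one miRNA just below the cutoff) is rejected by the strict AND rule but accepted by the weighted classifier, while the healthy profile is rejected by both.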

The main limitation is that real cells have limited shared resources for transcription and translation. Synthetic genes compete for these resources, which can make otherwise separate modules interfere with one another and cause the actual circuit behavior to differ from the intended design. This is a serious issue for multilayer circuits because each additional node increases the load on the cell.

A second limitation is orthogonality and crosstalk. Endoribonuclease-based platforms are powerful because they are modular and composable, but not every regulator is perfectly orthogonal with every other one. The PERSIST platform (DiAndreth et al., 2022) showed that most endoRNases were orthogonal, but some pairs showed cross-reactivity and should be avoided. That means a practical IANN needs careful part selection and calibration.

A third limitation is that large intracellular neural circuits are still difficult to scale. The neuromorphic computing paper notes that these systems support only a limited number of distinct inputs, and multilayer gene circuits can also face issues such as slow dynamics, variability between cells, and tuning difficulties. So while the concept is powerful, achieving a reliable therapeutic IANN would require careful optimization and validation.

4. Diagram for an intracellular multilayer perceptron

  • Left / Layer 1: X1 = DNA encoding endoRNase 1
  • Middle / Hidden layer: DNA encoding endoRNase 2, with mRNA 2 containing a target site for endoRNase 1
  • Right / Output layer: DNA encoding a fluorescent protein, with mRNA 3 containing a target site for endoRNase 2

image

  • X1 = DNA encoding endoribonuclease
  • Layer 1 output = endoribonuclease protein
  • X2 = DNA encoding fluorescent protein
  • Layer 2 = reporter transcript/protein regulated by the endoribonuclease from layer 1
  • Y = fluorescence
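
The three-layer version described above (Layer 1, hidden layer, output layer) is a double-repression cascade: endoRNase 1 cleaves mRNA 2, which lowers endoRNase 2, which in turn relieves cleavage of the reporter mRNA. The following is a toy steady-state sketch using Hill-type repression. All parameter values are arbitrary assumptions chosen for illustration and are not measured PERSIST data.

```python
def repressed_level(max_level, repressor, k=1.0, n=2.0):
    """Steady-state level of a transcript product under Hill-type repression."""
    return max_level / (1.0 + (repressor / k) ** n)


def cascade_output(endo1_level):
    """Fluorescence for a given endoRNase 1 level in the three-layer cascade."""
    # Hidden layer: endoRNase 2 is produced from mRNA 2, which endoRNase 1 cleaves.
    endo2_level = repressed_level(10.0, endo1_level)
    # Output layer: reporter is produced from mRNA 3, which endoRNase 2 cleaves.
    return repressed_level(100.0, endo2_level)


for endo1 in (0.0, 1.0, 10.0):
    print(f"endoRNase 1 = {endo1:>4}: fluorescence = {cascade_output(endo1):.1f}")
```

In this toy model, raising the Layer 1 input increases fluorescence, because two repression steps in series act like an activation.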

References

  1. HTGAA Spring 2026, Week 7 — Genetic Circuits Part II: Neuromorphic Circuits. Course assignment page listing the Part 1 questions and the multilayer perceptron drawing task.
    https://2026a.htgaa.org/2026a/course-pages/weeks/week-07/index.html

  2. Rizik, L. et al. (2022). Synthetic neuromorphic computing in living cells. Nature Communications, 13, 5602.
    https://www.nature.com/articles/s41467-022-33288-8

  3. DiAndreth, B. et al. (2022). The PERSIST platform provides programmable RNA regulation using CRISPR endoRNases. Nature Communications, 13, 2582.
    https://www.nature.com/articles/s41467-022-30172-3

  4. Gao, Y., Wang, L., and Wang, B. (2023). Customizing cellular signal processing by synthetic multi-level regulatory circuits. Nature Communications, 14, 8415.
    https://www.nature.com/articles/s41467-023-44256-1

  5. Frei, T. et al. (2020). Characterization and mitigation of gene expression burden in mammalian cells. Nature Communications, 11, 4641.
    https://www.nature.com/articles/s41467-020-18392-x

  6. Xie, Z. et al. (2011). Multi-input RNAi-based logic circuit for identification of specific cancer cells. Science, 333(6047), 1307–1311.
    https://www.science.org/doi/10.1126/scisignal.4189ec246

Assignment Part 2: Fungal Materials

Fungi are eukaryotic organisms, meaning they belong to the same broad domain as animals and plants, but they form their own biological kingdom. This group includes yeasts, molds, and mushrooms. Unlike bacteria and archaea, fungi have complex cells with a nucleus. Their unique growth behavior, especially through filamentous networks called mycelium, has made them highly interesting for biomaterial research.

Fungi within Eukaryota

Mycology is the branch of biology concerned with the study of fungi and their many roles and applications, including:

  • pathogenic activity
  • drug discovery
  • ecology
  • bioremediation
  • biomaterials

In the context of material design, the most important part of the fungus is often the mycelium, the root-like vegetative network that grows through a substrate. In recent years, mycelium has been widely explored as a biomaterial for packaging, construction, insulation, acoustic panels, and leather-like alternatives for fashion.

A major reason for this interest is that fungal materials can be grown on cheap and abundant feedstocks, such as sawdust, straw, wood chips, or other agricultural waste. They are also attractive because they are generally lightweight, biodegradable, and relatively fast to cultivate compared with many conventional manufacturing processes.

The material chart below suggests that mycelium composites often behave more like foams or lightweight natural materials than like dense polymers, ceramics, or metals. This makes them especially promising where low weight, cushioning, insulation, or biodegradability are more important than very high structural strength.

Material property comparison

The red dots mark mycelium-based materials.

Examples of existing fungal materials

1. Mycelium leather-like materials

One of the best-known applications of fungal materials is in the fashion industry, where mycelium is used to create leather-like sheets and surfaces. Companies such as Bolt Threads and their material Mylo helped popularize this category by presenting fungal alternatives for bags, shoes, and accessories.

Fungal fashion products (Mylo, Bolt Threads)

Leather-like fungal material (Mylo, Bolt Threads)

These materials are interesting because they can be developed either from liquid-grown fungal biomass or from solid-substrate growth systems, depending on the intended texture and manufacturing process.

Advantages over traditional leather:

  • animal-free
  • potentially lower environmental impact
  • can be grown rather than fully extracted from animals
  • texture, thickness, and surface finish can be tuned
  • can fit circular and bio-based design strategies

Disadvantages:

  • often still require coatings or backing layers for durability
  • may not yet match the longevity of high-quality animal leather
  • industrial scaling and consistency are still developing
  • some products are expensive compared with conventional synthetic leather or mass-market leather

Compared with synthetic “vegan leather” made from plastics, fungal leather alternatives may also offer a more bio-based route, although in practice some current products still include polymer coatings, so they are not always fully biodegradable.

2. Mycelium packaging

Another important example is mycelium packaging, especially developed by companies such as Ecovative. In this case, mycelium is grown through agricultural waste to form protective packaging shapes that can replace expanded polystyrene or other petrochemical foams.

Mycelium packaging (Ecovative)

Uses:

  • protective packaging for bottles, electronics, and fragile goods
  • molded cushioning forms
  • compostable alternatives to foam packaging

Advantages over conventional foam packaging:

  • biodegradable and compostable
  • grown from low-cost waste streams
  • lower dependence on fossil-based plastics
  • good shock absorption and lightweight performance

Disadvantages:

  • more sensitive to moisture than plastic foams
  • less suitable for very long-term wet storage
  • can be bulkier or less standardized than industrial plastic packaging
  • production speed and storage conditions may be more demanding than mass-produced plastic

3. Acoustic and interior panels

Mycelium is also being used for acoustic panels, tiles, and interior surfaces. Companies such as Mogu have developed products that use fungal composites for sound absorption and architectural finishes.

Mycelium acoustic panels (Mogu)

These materials work well because their internal porous structure can help absorb sound, while their low density can also contribute to thermal insulation.

Advantages over conventional acoustic materials:

  • bio-based and renewable
  • visually distinctive and suitable for interior design
  • lightweight
  • can provide acoustic and thermal benefits at the same time

Disadvantages:

  • usually better suited to indoor than exposed outdoor use
  • may require treatment for moisture resistance and durability
  • performance can vary depending on substrate, density, and fabrication process
  • still less common and less standardized than mineral wool, foam, or gypsum-based systems

4. Architectural and construction experiments

Mycelium has also been used in architecture, especially in experimental pavilions and temporary installations. One famous example is the Hy-Fi pavilion, which demonstrated the potential of mycelium-grown bricks for lightweight, low-carbon construction.

Hy-Fi pavilion (MoMA)

We have also seen exhibition pavilions such as MY-CO SPACE, which use mycelium-based building elements in semi-protected environments.

MY-CO SPACE pavilion (MY-CO SPACE)

These projects show that fungal materials can be used not only for products, but also for spatial design and architectural expression.

Advantages over traditional building materials:

  • low weight
  • grown from renewable waste-based feedstocks
  • low embodied energy compared with many fired or petrochemical materials
  • biodegradable and visually unique
  • suitable for temporary structures, exhibitions, and circular design experiments

Disadvantages:

  • limited durability in outdoor conditions without protection
  • vulnerable to moisture, weathering, and biological degradation
  • lower mechanical strength than brick, concrete, or many engineered panels
  • building regulations and long-term structural reliability remain challenges

Conclusion

Fungal materials are a rapidly growing area of biomaterial research and design. Existing examples already include packaging, leather-like materials, acoustic panels, and architectural installations. Their main advantages are that they are lightweight, bio-based, biodegradable, and can be grown on cheap waste substrates. However, compared with traditional materials, they still face important limitations in durability, water resistance, standardization, and structural performance.

For these reasons, fungal materials are especially promising in applications where low weight, sustainability, compostability, and material experimentation are more important than maximum strength or long-term outdoor durability. Rather than replacing all conventional materials, they are currently most valuable as specialized alternatives in design, packaging, interiors, and temporary architecture.

What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Compared with bacteria and yeasts, the synthetic biology infrastructure for filamentous fungi is still less mature. One recent review states that, relative to bacteria and yeasts, synthetic biology in filamentous fungi is “rather underdeveloped,” especially in mushroom-forming species, and links this to factors such as slower growth, lower-throughput transformation, unwanted enzyme secretion, and limited plasmid tools [1]. At the same time, this gap is beginning to close. Recent work has developed a modular synthetic biology toolkit for filamentous fungi that includes natural and synthetic promoters, terminators, fluorescent reporters, selection markers, transcriptional regulatory domains, and components for CRISPR-based technologies [2]. This means fungi are no longer only interesting as natural material producers, but are increasingly becoming engineerable biological chassis.

Fungi also offer some major advantages over bacteria for biomaterial-based synthetic biology. Unlike most bacteria, filamentous fungi naturally grow as multicellular, spatially distributed mycelial networks that branch and intermesh across large areas [1]. These networks are well suited for the development of macroscopic living materials that can sense, respond, and potentially compute across space. In addition, filamentous fungi secrete enzymes that degrade lignocellulosic biomass, allowing them to grow on cheap and abundant waste feedstocks [1]. This makes them especially attractive for biomaterials and biomanufacturing, because the fungus can function both as the material itself and as the engineered sensing or production chassis.

Image: Adamatzky

An especially interesting direction is the idea of fungi as living sensory-computational materials. Research by Adamatzky and others suggests that mycelial networks behave as electrically active distributed systems. Fungal colonies generate measurable extracellular voltage spikes, and these spike trains vary in duration, amplitude, and temporal patterning [3]. In related work, Adamatzky and colleagues argue that mycelium exhibits neuron-like spiking behaviour and a wide range of non-linear electrical properties, and they show that electrical signals in Aspergillus niger colonies can in principle be used to implement logical gates and circuits [4]. In that study, they also used an A. niger strain expressing green fluorescent protein (GFP) from the glucoamylase (glaA) promoter [4]. Although this line of research is not always synthetic biology in the strict sense, it provides a compelling conceptual basis for future engineered fungal systems.

Image: Adamatzky

It would therefore not be far-fetched to imagine genetically engineering fungi to detect vibration, touch, humidity changes, or electrical activity, and to convert these signals into readable outputs such as fluorescence, color change, altered growth patterns, or production of a specific metabolite. Such systems could be useful for self-monitoring building materials, environmental sensing, smart packaging, or living interfaces. This idea is strengthened by recent evidence that fungi may also respond to sound: Robinson et al. found that acoustic stimulation increased fungal biomass and enhanced Trichoderma harzianum conidia activity [5]. Synthetic biology could extend these native electrical and environmental response behaviours into programmable sensing and response systems.

Overall, bacteria remain easier and faster to engineer in many contexts, but fungi offer a different set of advantages. Their value lies not mainly in engineering simplicity, but in their eukaryotic biology, secretion capacity, growth on low-cost substrates, and ability to form large living material networks [1]. For applications in which the organism itself is meant to become part of a responsive, structural, or computational material, fungi may offer possibilities that bacteria cannot provide as easily.

References

[1] Jo, C., Zhang, J., Tam, J. M., Church, G. M., Khalil, A. S., Segrè, D., & Tang, T.-C. (2023). Unlocking the magic in mycelium: Using synthetic biology to optimize filamentous fungi for biomanufacturing and sustainability. Materials Today Bio, 19, 100560. https://pmc.ncbi.nlm.nih.gov/articles/PMC9900623/

[2] Mózsik, L., Pohl, C., Meyer, V., Bovenberg, R. A. L., Nygård, Y., & Driessen, A. J. M. (2021). Modular Synthetic Biology Toolkit for Filamentous Fungi. ACS Synthetic Biology, 10(11). https://pubs.acs.org/doi/10.1021/acssynbio.1c00260

[3] Adamatzky, A. (2022). Language of fungi derived from their electrical spiking activity. Royal Society Open Science, 9(4), 211926. https://doi.org/10.1098/rsos.211926

[4] Adamatzky, A., Ayres, P., Beasley, A. E., Roberts, N., & Wösten, H. A. B. (2022). Logics in Fungal Mycelium Networks. Logica Universalis, 16(4), 655–669. https://doi.org/10.1007/s11787-022-00318-4

[5] Robinson, J. M., Annells, A., Cando-Dumancela, C., & Breed, M. F. (2024). Sonic restoration: Acoustic stimulation enhances plant growth-promoting fungi activity. Biology Letters, 20(10), 20240295. https://doi.org/10.1098/rsbl.2024.0295

Final Project

I know this part of the homework is not really required for our node, but I will use part of the template to evaluate some of the ideas that I have. Idea 1 is now cancelled, and I have narrowed my research down to two ideas.

IDEA 1 — BC Face Mask as a cell-free biosensing textile

IDEA 2 — “Water-resistant BC leather” via in-growth synbio functionalization

IDEA 3 — Light-input → color-output BC “bio-print” for moiré effects (E. coli + BC co-culture)

1. Your abstract should briefly address the following elements:

The significance: the two projects address separate problems with bacterial cellulose use cases in the textile industry, but both lead to a clear path towards a more sustainable fashion industry, and both have clear industrial importance considering the environmental impact of fashion.

The broad objective for both projects is to find sustainable ways to produce textiles using bacterial cellulose.

SECTION 3: BACKGROUND

Background and Literature Context

Provide background research that explains the current state of knowledge and identifies the gap in knowledge or capability that your project addresses.

IDEA 2 — “Water-resistant BC leather” via in-growth synbio functionalization

These two papers are useful for my final project because they address different parts of the same material problem: how to reduce the strong water absorption of bacterial cellulose by attaching a hydrophobic function to the cellulose surface. The first paper provides a practical fusion-protein strategy. It shows that a class I hydrophobin, HGFI, can be fused to a cellulose-binding domain (CBD), which improves its soluble expression in E. coli and allows the fusion protein to bind directly to bacterial cellulose. This is important for my project because it demonstrates that a CBM/CBD–hydrophobin fusion is a realistic way to deliver a hydrophobic function onto a cellulose material. [1]

pic

The second paper is useful in a different way. Rather than focusing on hydrophobin production, it identifies a new cellulose-binding module, CBM104, which binds very selectively to native crystalline cellulose I and does so with much higher adsorption efficiency than the more common fungal CBM1. The authors also suggest that CBM104 binds to hydrophilic regions of cellulose microfibrils, while CBM1 recognizes hydrophobic surfaces. This matters for my project because it suggests that the cellulose-binding part of the fusion is not just a generic anchor: choosing a different CBM could change how strongly and where the hydrophobic protein attaches to bacterial cellulose. [2]

One could speculate that CBM104 might serve as a targeted “glue” that attaches to the specific part of the cellulose (the hydrophilic regions) that is of interest for my project.

Together, these papers suggest a clear strategy for addressing the BC water-absorption problem. The first paper offers a practical method for building and expressing a hydrophobin–CBM fusion, while the second paper suggests a way to improve that strategy by selecting a more specific cellulose-binding domain. For my project, this means I could design a hydrophobin-based bio-finish for bacterial cellulose and compare a standard CBD/CBM with CBM104 to test whether more selective binding to native crystalline cellulose improves water resistance and overall material performance. The research gap is that the first paper does not test water-resistant BC finishing directly, and the second paper does not test a hydrophobin fusion at all, so my project would combine these two ideas into a new BC-finishing approach. [2]

Next research steps:

Paper 1 — Hydrophobin–CBM fusion (HGFI–CBD)

  • Design fusion construct (Hydrophobin + linker + CBM)
  • Order DNA (e.g. Twist)
  • Clone into expression vector (e.g. pET28a)
  • Transform into E. coli (Top10 → BL21(DE3))
  • Protein expression (IPTG induction)
  • Cell lysis (sonication)
  • Collect cell-free extract (CFE) or purify protein
  • Grow BC pellicles (Komagataeibacter in HS)
  • Ex situ: coat BC with fusion protein
  • or In situ: add CFE during BC growth
  • Dry / process pellicle
  • Test hydrophobicity (contact angle)

Paper 2 — CBM104 discovery / application direction

  • Select CBM (CBM104 vs CBM1)
  • Design fusion construct (Hydrophobin + linker + CBM104)
  • Order DNA (Twist)
  • Clone into expression vector
  • Transform into E. coli
  • Protein expression
  • Cell lysis → collect CFE / purify
  • Grow BC pellicles
  • Apply fusion protein to BC (ex situ or in situ)
  • Compare binding / performance vs CBM1
  • Test material properties (hydrophobicity, absorption)

Future research needed:

  1. Hydrophobic protein alternatives: which other hydrophobins could be interesting?
  • Class I vs Class II hydrophobins
  • BslA (bacterial hydrophobin-like protein) ← already proven for BC
  • Amphiphilic peptides (shorter, easier expression)
  • Elastin-like polypeptides (ELPs) (tunable hydrophobicity)
  • Lipid-binding proteins / oleosins
  • Designed peptides (ProteinMPNN / simple repeats)
  1. CBM selection strategy

Compare:

  • Standard CBM1 / CBD
  • CBM104 (paper 2)
  • Possibly bacterial CBMs vs fungal CBMs

What to research:

  • Binding strength
  • Binding location (hydrophilic vs hydrophobic cellulose faces)
  1. Expression system: E. coli vs Komagataeibacter (VERY IMPORTANT)

Option A — E. coli (current paper approach)

Pro:

  • Easy
  • High expression
  • Fast

Con:

  • Not integrated into material
  • Post-processing step

Option B — Komagataeibacter (KIK / KTK system)

I haven't found any research on this, but it may be possible to use the KTK (Komagataeibacter Tool Kit) cloning system to clone the construct straight into Komagataeibacter.

  1. In situ vs ex situ functionalization

Compare:

  • Ex situ coating (CFE / purified protein)
  • In situ addition (add protein during growth)
  • Fully engineered BC producer (genetic insertion)

Research:

  • diffusion into pellicle
  • stability of protein during growth
  • whether proteins get trapped vs surface-localized
  1. Material performance metrics
  • Water contact angle (WCA)
  • Water uptake % (swelling ratio)
  • Mechanical properties (tensile strength, flexibility)
  • Durability after drying
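
The water uptake metric in the list above can be computed from two weighings of the same sample. This is a minimal sketch, assuming the common gravimetric definition (wet mass minus dry mass, divided by dry mass). The function name and the example masses are hypothetical placeholders, not measured data.

```python
def water_uptake_percent(dry_mass_g, wet_mass_g):
    """Gravimetric water uptake in percent: (wet - dry) / dry * 100."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    return (wet_mass_g - dry_mass_g) / dry_mass_g * 100.0


# Hypothetical example masses in grams (placeholders, not measurements).
untreated_uptake = water_uptake_percent(dry_mass_g=0.10, wet_mass_g=1.10)
coated_uptake = water_uptake_percent(dry_mass_g=0.10, wet_mass_g=0.40)

print(f"untreated BC: {untreated_uptake:.0f} % uptake")
print(f"coated BC:    {coated_uptake:.0f} % uptake")
```

Comparing this number for coated and uncoated pellicles cut from the same batch would complement the contact angle, which only describes the surface.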

References

[1] Puspitasari, N., & Lee, C.-K. (2021). Class I hydrophobin fusion with cellulose binding domain for its soluble expression and facile purification. International Journal of Biological Macromolecules, 193, 38–43.
article

[2] Kojima, Y. et al. (2025). A cellulose-binding domain specific for native crystalline cellulose in lytic polysaccharide monooxygenase from the brown-rot fungus Gloeophyllum trabeum. Carbohydrate Polymers, 347, 122651.
article

[3] Gilmour, K. et al. (2025). Environmentally conscious hydrophobic spray coatings on bacterial cellulose for sustainable and reusable textiles. [article](https://www.sciencedirect.com/science/article/pii/S0959652625011254#bib15)

IDEA 3 — Light-input → color-output BC “bio-print” for moiré effects (E. coli + BC co-culture)

The most important paper is the 2025 Nature Biotechnology study on self-pigmenting bacterial cellulose. It shows that Komagataeibacter rhaeticus can be engineered to produce black bacterial cellulose through tyrosinase expression, and that this pigmentation can be combined with optogenetic control to pattern gene expression in the growing pellicle. This is directly relevant to my project because it proves that BC can be colored from within the growth process itself, rather than only by post-dyeing, and that light can be used as a programmable input for spatial patterning. At the same time, the paper also shows the current limitation: patterned eumelanin still has high background pigmentation and limited contrast, so accurate visual patterning remains a research gap [1].

A second reference is the paper by Levskaya et al., which is one of the foundational demonstrations of bacterial optogenetics. Although it was done in E. coli rather than Komagataeibacter, it established the key idea that a projected light pattern can be converted into a two-dimensional biological image. For my project, this paper is useful as conceptual background: it shows that light can function as a precise design input for pattern formation, which supports the idea of using projected light to “bio-print” patterns into a growing cellulose material [2].

To make this feasible in Komagataeibacter, the genetic toolkit papers are also important. The KTK paper shows that K. rhaeticus can be engineered using a modular Golden Gate cloning system for multigene constructs, while the expanded Acetobacteraceae toolkit provides characterized promoters, RBSs, terminators, and reporter systems for fine control of gene expression in cellulose-producing bacteria. Together these papers show that Komagataeibacter is not only a BC producer, but also a realistic synthetic biology chassis for building more complex circuits such as light-responsive melanin production [3][4].

The more recent Trends in Biotechnology paper by Zhou et al. is useful mainly as a future-direction reference. It shows that colored BC can also be produced through a co-culture strategy using pigment-producing E. coli and BC-producing K. xylinus, achieving seven different colors. This paper might be less relevant as the immediate experimental route, because it is more complex and requires co-culture with E. coli. However, it is valuable because it shows that melanin-based black BC is only one starting point, and that in the future a light-programmed BC system could potentially be extended toward a broader color palette [5].

Together, these papers suggest a direction for my final project. The Nature Biotechnology paper provides the direct experimental basis for light-programmed melanin patterning in bacterial cellulose, Levskaya provides the conceptual foundation for using projected light as a spatial control system, and the KTK / Acetobacteraceae toolkit papers show that Komagataeibacter can realistically be engineered as the host. The research gap is not simply whether BC can be colored, because that has already been shown, but whether higher-fidelity, lower-background, spatially programmable patterning can be achieved in BC, and whether such patterned pellicles can be used to create multilayer optical effects such as moiré.

Potential process

  • Design output system (mCherry for prototyping, tyr1 for melanin as final output)
  • Design optogenetic switch construct (light-control system: Opto-T7RNAP)
  • Order DNA parts (Twist)
  • Assemble constructs with KTK / Golden Gate where compatible; use another cloning strategy if needed for the optogenetic parts
  • Transform into E. coli for plasmid build/propagation
  • Transform engineered plasmids into K. rhaeticus
  • Validate reporter expression in liquid culture
  • Test and optimize light response with mCherry first
  • Grow thin BC pellicles with engineered K. rhaeticus
  • Project patterned light during pellicle growth
  • Image / quantify pattern quality with mCherry
  • Swap reporter to PT7-tyr1
  • Grow BC pellicles under patterned light
  • Transfer pellicles to melanin development buffer and develop visible eumelanin pattern
  • Compare pattern quality
  • Grow two separately patterned thin pellicles, overlay them to test moiré effects
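
To get a feel for the final moiré step, the fringe spacing can be estimated before growing anything. This is a rough sketch using the standard textbook relations for two overlaid line gratings. It assumes idealized, perfectly periodic patterns, which real pellicles will only approximate, and the function names and example pitches are hypothetical.

```python
import math


def moire_period_rotated(pitch_mm, angle_deg):
    """Fringe period for two identical line gratings rotated by angle_deg."""
    theta = math.radians(angle_deg)
    if math.sin(theta / 2.0) == 0:
        raise ValueError("gratings are aligned: no rotational moiré fringes")
    return pitch_mm / (2.0 * abs(math.sin(theta / 2.0)))


def moire_period_parallel(pitch1_mm, pitch2_mm):
    """Fringe period for two parallel gratings with different pitches."""
    if pitch1_mm == pitch2_mm:
        raise ValueError("identical pitches: no beat pattern")
    return pitch1_mm * pitch2_mm / abs(pitch1_mm - pitch2_mm)


# Hypothetical design values: 1 mm stripes, layers rotated by 5 degrees.
print(f"rotated layers:  {moire_period_rotated(1.0, 5.0):.2f} mm fringes")
# Hypothetical design values: 1.0 mm vs 1.1 mm stripes, no rotation.
print(f"parallel layers: {moire_period_parallel(1.0, 1.1):.2f} mm fringes")
```

Because the fringes are much wider than the stripes themselves, a moiré readout could tolerate some blur in the individual patterns, as long as the stripe period survives pellicle growth.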

Next research steps:

  • which Opto-T7RNAP variant is most suitable
  • blue-light requirements
  • dynamic range

Output choice: mCherry vs tyr1

  1. Pattern fidelity in BC
  • diffusion / blur during growth
  • how pellicle thickness affects resolution
  • how long you can expose before patterns spread
  • whether thin pellicles give better contrast
  1. Reactor / growth geometry
  • whether to grow each layer separately
  1. Komagataeibacter toolkit options
  • KTK for modular multigene assembly in K. rhaeticus
  • promoter / RBS / terminator choices from the expanded Acetobacteraceae toolkit
  • whether you need one plasmid or two
  • antibiotic markers and compatibility
  1. Development chemistry for melanin
  • melanin development buffer composition
  • effect of pH
  • effect of tyrosine and copper
  • whether development can be made faster or cleaner
  1. Future color expansion
  • whether melanin should stay the final target
  • whether the Zhou co-culture platform is better as a future direction for broader color range
  • whether one-color high-fidelity patterning is stronger than many colors with weak control

References

[1] Walker, K. T. et al. (2025). Self-pigmenting textiles grown from cellulose-producing bacteria.
Article

[2] Levskaya, A. et al. (2005). Synthetic biology: engineering Escherichia coli to see light.
Article

[3] Goosens, V. J. et al. (2021). Komagataeibacter Tool Kit (KTK): A Modular Cloning System for Multigene Constructs and Programmed Protein Secretion from Cellulose Producing Bacteria.
PDF

[4] Teh, M. Y. et al. (2019). An Expanded Synthetic Biology Toolkit for Gene Expression Control in Acetobacteraceae.
Article

[5] Zhou, H., Lin, P., Jeong, K. J., & Lee, S. Y. (2026). One-pot production of colored bacterial cellulose.
Article

Week 09 HW: cell free systems

  1. Advantages of cell-free systems

Cell-free protein synthesis (CFPS) offers a highly flexible and controllable environment compared to in vivo expression systems. Because there are no living cells, experimental conditions such as pH, ionic strength, redox environment, DNA concentration, cofactors, and additives can be directly tuned without affecting cell viability. This enables rapid optimization and prototyping of genetic constructs.

Additionally, CFPS is significantly faster, allowing protein production within hours instead of requiring cell growth, transformation, and induction steps.

Cell-free systems are particularly advantageous in cases such as:

  • Toxic proteins: proteins that would inhibit or kill host cells can be produced safely
  • Membrane proteins: can be expressed with detergents, liposomes, or nanodiscs to improve folding and functionality
  1. Components of a cell-free system

A typical cell-free expression system includes:

  • Cell extract / TX-TL machinery - Provides ribosomes, tRNAs, enzymes, and factors required for transcription and translation
  • DNA or mRNA template - Encodes the protein of interest
  • Amino acids - Building blocks for protein synthesis
  • Nucleotides (ATP, GTP, CTP, UTP) - Required for transcription and energy transfer
  • Energy regeneration system - Maintains ATP/GTP supply during the reaction
  • Buffer + cofactors (Mg²⁺, K⁺, etc.) - Maintain optimal biochemical conditions
  • Optional additives (chaperones, lipids, detergents) - Help folding or membrane protein insertion
  1. Why energy regeneration is critical

ATP and GTP are consumed during:

  • transcription
  • tRNA charging
  • ribosomal translation

Without regeneration, the reaction stops quickly.

Solution: Use an energy regeneration system such as: phosphoenolpyruvate (PEP) + pyruvate kinase or creatine phosphate + creatine kinase. These systems continuously regenerate ATP, allowing sustained protein production.

  1. Prokaryotic vs eukaryotic systems

| Feature | Prokaryotic CFPS | Eukaryotic CFPS |
| --- | --- | --- |
| Speed | Fast | Slower |
| Yield | High | Lower |
| Complexity | Simple | Complex |
| PTMs | Limited | Full (glycosylation, etc.) |

  1. Designing a membrane protein experiment

Challenges:

  • Poor solubility
  • Misfolding
  • Aggregation

Approach:

  • Add detergents or liposomes to mimic membranes
  • Include chaperones
  • Optimize Mg²⁺, temperature, and energy system

Homework question from Kate Adamala

Input: external signal (e.g. chemical or mechanical inducer)

Output: cellulose-related components such as:

  • cellulose synthase subunits
  • UDP-glucose
  • regulatory signals controlling cellulose production

b. Could this function be realized by cell-free Tx/Tl alone?

Partially. Cell-free TXTL systems can produce proteins such as cellulose synthase subunits or regulatory molecules. However, full cellulose biosynthesis requires:

  • membrane localization
  • metabolic regeneration
  • long-term energy supply

TXTL alone is insufficient for complete cellulose production, but suitable for prototyping and partial functionality.

c. Could this function be realized by genetically modified natural cells?

Yes, and this is currently the most realistic approach. Organisms such as Komagataeibacter rhaeticus naturally produce bacterial cellulose and can be genetically engineered to control production using synthetic circuits (e.g. optogenetic systems). However, synthetic cells offer advantages in:

  • controllability
  • modularity
  • reduced biological complexity

d. Desired outcome The goal is to create a programmable material production system where synthetic cells can spatially or temporally control cellulose formation, enabling structured biomaterials.

🧪 2. Design of the synthetic cell

a. Membrane The synthetic cell membrane would consist of:

  • phospholipid bilayer vesicles (liposomes) e.g. POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine)

b. Encapsulated components Inside the synthetic cell:

  • TXTL system (e.g. E. coli extract)
  • DNA encoding:
  • cellulose synthase components (bcsA, bcsB)
  • regulatory proteins
  • nucleotides (ATP, GTP, etc.)
  • amino acids
  • energy regeneration system (e.g. PEP)
  • cofactors (Mg²⁺, K⁺)

c. Source of TXTL system

A bacterial TXTL system (E. coli-based) is sufficient because:

  • fast and high-yield
  • compatible with most synthetic biology parts
  • no need for eukaryotic post-translational modifications

d. Communication with environment The synthetic cell would interact with its environment through:

  • passive diffusion (small molecules like glucose)
  • membrane pores or channels, such as α-hemolysin (forms pores in lipid membranes)

This allows uptake of substrates and release of products.

🧬 3. Experimental details

a. Example components (genes + lipids)

Lipids:

  • POPC
  • cholesterol

Genes:

  • bcsA (cellulose synthase catalytic subunit)
  • bcsB (periplasmic subunit)
  • optional regulators of cellulose synthesis
  • α-hemolysin (for membrane permeability)

b. Measurement of function Function could be evaluated by:

  • detecting protein expression (e.g. GFP fusion)
  • measuring cellulose production using:
  • Calcofluor staining
  • dry weight measurement
  • SEM microscopy to observe fiber formation

Homework question from Peter Nguyen

Based on idea 1 for my final project, I would develop a bacterial cellulose cosmetic skin mask that senses the “health” of the customer's skin. Face masks are a popular single-use product; however, they are “dumb”, providing a single batch of substances without telling you anything about what your skin actually needs.

BC is already a compelling cosmetic substrate because it holds a lot of water, conforms well to skin, and has been tested as a moisturizing sheet mask material. Instead of putting living engineered cells on the face, a safer “synthetic biology” route is to embed freeze-dried cell-free gene expression (TX-TL) into the BC sheet as small patterned “sensor dots.” These cell-free circuits stay inactive when dry, then turn on when the mask hydrates during wear; outputs can be colorimetric (visible) or optical.

Because freeze-dried cell-free circuits activate upon rehydration, a conventional pre-hydrated sheet mask would trigger prematurely during storage. A practical design might be a dry-stored BC mask (or a separate paper sensor tab) that is activated only at time of use by releasing fluid.

How it could work:

  • Input (skin/sweat biomarker): pH (skin barrier/irritation proxy), lactate (sweat/metabolic proxy).
  • Sensing layer (cell-free circuit): a biomarker-responsive regulatory element controls whether a reporter is expressed.
  • Output (visible color): express a chromoprotein (strong color under normal light) so the mask visibly shifts color in specific zones without any instrument; chromoproteins are attractive for “naked-eye” readouts.

The advantage of this concept is that face masks are already considered single-use products, so the one-time-use limitation of freeze-dried systems becomes a desirable feature.

Homework question from Ally Huang

  1. Background

My proposal is to develop a freeze-dried BioBits paper-based diagnostic for astronaut urine monitoring. The system would function as a “smart toilet paper” that rehydrates on contact with urine and produces a visible or fluorescent signal when a molecular marker of infection is present. This approach addresses the need for low-resource, non-invasive health monitoring in space, where medical infrastructure is limited. Urinary tract infections (UTIs) are a relevant risk due to immune changes in microgravity. This project is scientifically interesting because it combines synthetic biology, paper-based diagnostics, and cell-free systems for autonomous health monitoring.

  1. Molecular / genetic target Bacterial 16S rRNA sequence specific to Escherichia coli as a biomarker for urinary tract infection.

  2. Relation to space biology challenge Astronauts experience immune dysregulation and altered microbial behavior in microgravity, increasing susceptibility to infections. Urinary tract infections are particularly relevant due to hygiene constraints and closed environments during long-duration missions. Detecting bacterial 16S rRNA from Escherichia coli, a common UTI-causing organism, provides a direct molecular indicator of infection. A paper-based, cell-free diagnostic allows rapid, on-site detection without the need for complex laboratory equipment. This enables early intervention and reduces health risks, making it highly relevant for maintaining crew health during extended space travel.

  3. Hypothesis / research goal I hypothesize that a freeze-dried BioBits cell-free system embedded in paper can detect bacterial RNA from Escherichia coli in urine and produce a measurable colorimetric or fluorescent output upon rehydration. The system would be designed with a DNA construct that responds to the presence of a target RNA sequence, triggering expression of a reporter protein such as GFP. The reasoning is that cell-free systems are stable when freeze-dried and can be activated by simple hydration, making them ideal for space applications. By integrating this system into a paper substrate, it becomes a lightweight, disposable diagnostic tool. The goal is to demonstrate that molecular detection and signal generation can occur reliably in a minimal, equipment-free format suitable for use in microgravity environments.

  4. Experimental plan Urine samples spiked with Escherichia coli RNA will be applied to freeze-dried BioBits paper assays. Controls include: (1) urine without bacterial RNA (negative control) and (2) samples with known RNA concentration (positive control). The assay contains a DNA construct that produces a reporter signal in response to the target sequence. Upon rehydration, the reaction will be incubated and analyzed for color change or fluorescence using the P51 Molecular Fluorescence Viewer. Data collected will include signal intensity over time and detection sensitivity. This will assess the feasibility of rapid, paper-based molecular diagnostics in space.

PART B — Final Project Integration

Cell-free systems could be highly valuable for prototyping the optogenetic circuit before implementing it in Komagataeibacter rhaeticus. Instead of directly assembling and testing the full system in vivo (which is slow and complex), a cell-free system could be used to:

  • Rapidly test Opto-T7RNAP activation dynamics
  • Measure leakage in dark vs light conditions
  • Optimize sRNA expression strength
  • Tune arabinose induction levels
  • Characterize response curves to projected light patterns

Because CFPS allows direct control over DNA concentration and reaction conditions, it would enable systematic testing of circuit parameters such as:

  • promoter strength
  • sRNA efficiency
  • degradation rates
  • transcriptional leakage

This would significantly reduce uncertainty before moving to in vivo experiments, where additional complexity (metabolism, diffusion, growth) makes debugging more difficult. In particular, cell-free systems could serve as a pre-validation layer for Aim 1, allowing partial validation of circuit logic even if full cellulose production cannot be reproduced in vitro.

Week 10 HW: imaging and measurement

Final Project

??

Waters Part I — Molecular Weight

The predicted molecular weight of the full eGFP construct, including the LE linker and His6-tag, is approximately 28,006.6 Da based on the amino acid sequence. Mature eGFP forms an internal chromophore, which results in a mass loss of approximately 20 Da. Therefore, the expected molecular weight of mature eGFP is approximately 27,986.6 Da.

To calculate the molecular weight from the LC-MS data, I selected two adjacent charge-state peaks from Figure 1:

m/z = 1000.4302
m/z = 1037.4423

The lower m/z peak corresponds to the higher charge state. Using the adjacent charge state equation:

z = (1000.4302 - 1.0073) / (1037.4423 - 1000.4302)

z ≈ 27

Therefore, the peak at m/z 1037.4423 corresponds to the 27+ charge state, and the peak at m/z 1000.4302 corresponds to the 28+ charge state.

Using the relationship between m/z, charge state, and molecular weight, the calculated experimental molecular weight is approximately:

MW ≈ 27,986.4 Da

This is very close to the predicted mature eGFP molecular weight of 27,986.6 Da.

Accuracy = |27,986.4 - 27,986.6| / 27,986.6

Accuracy ≈ 0.0007% (about 7 ppm)
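
The arithmetic in this part can be checked with a few lines of Python. This is a minimal sketch: the function names are my own, and the proton mass of 1.0073 Da is the value used in the equations above.

```python
PROTON = 1.0073  # proton mass in Da, as used in the equations above


def charge_of_higher_mz_peak(mz_low, mz_high):
    """Charge z of the peak at mz_high, assuming the peak at mz_low carries z + 1."""
    return (mz_low - PROTON) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral mass M of an [M + zH]z+ ion: M = z * (m/z - proton)."""
    return z * (mz - PROTON)


def ppm_error(observed, theoretical):
    """Signed relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


z_estimate = charge_of_higher_mz_peak(1000.4302, 1037.4423)
z = round(z_estimate)
mass_from_27 = neutral_mass(1037.4423, z)
mass_from_28 = neutral_mass(1000.4302, z + 1)
reported_error_ppm = ppm_error(27986.4, 27986.6)

print(f"z estimate: {z_estimate:.3f} -> {z}+")
print(f"mass from {z}+ peak: {mass_from_27:.1f} Da")
print(f"mass from {z + 1}+ peak: {mass_from_28:.1f} Da")
print(f"error of reported MW vs predicted: {reported_error_ppm:.1f} ppm")
```

Run on the two m/z readings quoted above, the per-peak estimate comes out near 27,983.8 Da rather than 27,986.4 Da. The reported value may therefore rest on more precise peak positions or on a deconvoluted spectrum; that is an assumption, since Figure 1 is not reproduced here and the readings are worth double-checking against it.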

For the zoomed-in peak around m/z 1474, the charge state can be estimated from the molecular weight:

z = 27986.6 / (1474 - 1.0073)

z ≈ 19

Therefore, the zoomed-in peak corresponds approximately to the 19+ charge state. The isotope spacing should be about 1/19 = 0.053 m/z, which is close to what is observed in the zoomed-in spectrum.
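
The same check works in the other direction, going from the known mass to a charge state and the expected isotope spacing. This is a small sketch with hypothetical function names. It uses the 1/z approximation from the text for the spacing, which treats neighbouring isotopologues as 1 Da apart.

```python
PROTON = 1.0073  # proton mass in Da, as used in the text


def charge_from_mass(neutral_mass_da, mz):
    """Estimate charge z from a known neutral mass and an observed m/z."""
    return neutral_mass_da / (mz - PROTON)


def isotope_spacing(z):
    """Approximate m/z spacing between isotope peaks for charge z."""
    return 1.0 / z


z_zoom = round(charge_from_mass(27986.6, 1474.0))
print(f"charge state: {z_zoom}+")
print(f"expected isotope spacing: {isotope_spacing(z_zoom):.4f} m/z")
```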

Waters Part III — Peptide Mapping

The eGFP sequence contains 20 lysines (K) and 6 arginines (R), giving 26 possible trypsin cleavage residues.

Using trypsin with 0 missed cleavages, the eGFP sequence generates 27 theoretical tryptic fragments in total. With the PeptideMass settings shown in the assignment, where only peptides larger than 500 Da are displayed, 19 peptides are reported.
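
The fragment count follows from simple bookkeeping: n cleavage sites inside a chain give n + 1 fragments. This is a minimal in-silico digest sketch, assuming cleavage after every K or R with no missed cleavages and no proline exception, which matches the counting above. The toy sequence is hypothetical and is not the eGFP construct.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (0 missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the chain does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


toy_sequence = "MAKGEERLFTGKHHH"  # hypothetical, for illustration only
fragments = tryptic_digest(toy_sequence)
sites = sum(toy_sequence.count(aa) for aa in "KR")
print(sites, "cleavage sites ->", len(fragments), "fragments:", fragments)
```

Real tools such as PeptideMass can apply extra rules (for example, suppressing cleavage before proline), so counts on the real sequence may differ from this simplified version.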

From the peptide map TIC in Figure 5a, I count approximately 19 chromatographic peaks between 0.5 and 6 minutes that are above ~10% relative abundance. This approximately matches the number of predicted tryptic peptides above 500 Da. However, the match is not expected to be exact because some peptides may co-elute, some may ionize poorly, and some peptides may appear in multiple charge states or modified forms.

For the chromatographic peak at 2.78 minutes, the most abundant ion in Figure 5b has an m/z of 525.76712. The isotope spacing is approximately 0.492 m/z, indicating a 2+ charge state.

The neutral peptide mass was calculated as:

M = z(m/z) - zH

M = 2(525.76712) - 2(1.0073)

M ≈ 1049.5197 Da

The singly protonated mass is therefore approximately:

[M+H]+ = 1050.5270

Comparing this mass to the predicted tryptic peptide masses from PeptideMass, the best matching peptide is:

FEGDTLVNR

The theoretical monoisotopic neutral mass of FEGDTLVNR is approximately 1049.5142 Da. The mass error is:

ppm = ((1049.5197 - 1049.5142) / 1049.5142) × 10^6

ppm ≈ 5.2
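
The peptide assignment can be reproduced from first principles. This sketch sums standard monoisotopic residue masses plus one water for the theoretical mass, then compares it to the neutral mass derived from the observed 2+ ion. The function names are my own, and the proton mass is the 1.0073 Da used above.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O in Da
PROTON = 1.0073   # proton mass in Da, as used in the text


def peptide_mono_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def neutral_from_mz(mz, z):
    """Neutral mass from an observed m/z and charge: M = z*(m/z) - z*H."""
    return z * mz - z * PROTON


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


theoretical = peptide_mono_mass("FEGDTLVNR")
observed = neutral_from_mz(525.76712, 2)
error_ppm = ppm_error(observed, theoretical)
print(f"theoretical: {theoretical:.4f} Da")
print(f"observed:    {observed:.4f} Da")
print(f"mass error:  {error_ppm:.1f} ppm")
```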

According to the amino acid coverage map in Figure 6, 88% of the eGFP sequence was confirmed by peptide mapping.

Overall, the peptide map data supports that the sample is the eGFP standard because the detected peptide masses and fragmentation data match the expected tryptic peptides from eGFP, and the sequence coverage is high at 88%.

Bonus:

The peptide sequence that best matches the fragmentation spectrum in Figure 5c is FEGDTLVNR. This assignment is supported by the measured precursor m/z of 525.76712 with charge state 2+, giving a neutral mass of approximately 1049.5197 Da. This closely matches the theoretical monoisotopic mass of the tryptic peptide FEGDTLVNR.

The peptide map data makes sense and supports identification of the sample as eGFP. The LC-MS peptide map identifies peptides distributed across most of the eGFP sequence, giving 88% amino acid coverage. The combination of accurate peptide mass and fragmentation pattern confirmation indicates that the analyzed protein is consistent with the eGFP standard.

Waters Part IV — Oligomers

Based on the known subunit masses, the expected oligomeric states are:

  • 7FU decamer: 10 × 340 kDa = 3,400 kDa = 3.4 MDa
  • 8FU didecamer: 20 × 400 kDa = 8,000 kDa = 8.0 MDa
  • 8FU 3-decamer: 30 × 400 kDa = 12,000 kDa = 12.0 MDa
  • 8FU 4-decamer: 40 × 400 kDa = 16,000 kDa = 16.0 MDa

In the CDMS spectrum, the 7FU decamer corresponds to the peak near 3.4 MDa. The 8FU didecamer corresponds to the large peak near 8.33 MDa. The 8FU 3-decamer corresponds to the peak near 12.67 MDa. The 8FU 4-decamer is expected near 16 MDa and appears, if present, only as a weak/broad signal in the 16–17 MDa region.
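
The peak assignment above can be tabulated by computing each expected mass and comparing it to the nearest observed CDMS peak. This is a small sketch: the subunit masses and peak positions are the ones quoted in this section, and the variable and function names are my own.

```python
def oligomer_mass_mda(n_subunits, subunit_kda):
    """Expected oligomer mass in MDa from subunit count and subunit mass in kDa."""
    return n_subunits * subunit_kda / 1000.0


# (number of subunits, subunit mass in kDa), as listed above.
oligomers = {
    "7FU decamer": (10, 340),
    "8FU didecamer": (20, 400),
    "8FU 3-decamer": (30, 400),
    "8FU 4-decamer": (40, 400),
}
# Observed peak positions in MDa quoted from the CDMS spectrum.
observed_mda = {
    "7FU decamer": 3.4,
    "8FU didecamer": 8.33,
    "8FU 3-decamer": 12.67,
}

deviation_percent = {}
for name, (n, kda) in oligomers.items():
    expected = oligomer_mass_mda(n, kda)
    if name in observed_mda:
        deviation_percent[name] = (observed_mda[name] - expected) / expected * 100.0
        print(f"{name}: expected {expected:.1f} MDa, observed {observed_mda[name]} MDa "
              f"({deviation_percent[name]:+.1f} %)")
    else:
        print(f"{name}: expected {expected:.1f} MDa, no clear peak assigned")
```

The 8FU peaks sit roughly 4 to 6 % above the values computed from the nominal 400 kDa subunit mass, which suggests that the nominal subunit mass is a rounded figure.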

The theoretical molecular weight of mature eGFP, including the LE linker and His6-tag, is 27.9866 kDa. The observed intact LC-MS molecular weight calculated from the adjacent charge states was approximately 27.9864 kDa. This gives a mass error of approximately -7 ppm. The close agreement between the theoretical and observed molecular weights supports that the measured protein is consistent with GFP/eGFP.