Zarias Sacrati Aelius (Arman Saadatkhah) — HTGAA MIT/Harvard Spring 2026


About me

To serve society by researching and manufacturing resilient systems that secure essential infrastructure and enlighten the next generation through standards, education, and responsible innovation.

Contact info

LinkedIn

Homework

Labs

Projects

Apr 8, 2026

Subsections of Zarias Sacrati Aelius (Arman Saadatkhah) — HTGAA MIT/Harvard Spring 2026

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Assignment: Post-Lecture 1 First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

  • Week 2 HW: DNA Read, Write, and Edit

    Assignment Pre-Lecture 2 Professor Jacobson Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy? The error rate for polymerase is 1:10^6. The human genome is ~3.2 billion base pairs (3.2 × 10^9). Biology deals with this discrepancy by bridging the gap using a multi-step quality control system to lower the error rate to 1:10^9. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

  • Week 3 HW: Lab Automation

    Assignment: Python Script for Opentrons Artwork Artistic Concept — “ELM Habitat Cross-Section” My design uses the 96-well plate as a canvas to depict a cross-sectional schematic of the Multi-Trophic Myco-Foundry — the engineered living material (ELM) habitat I proposed in Week 1. Three food-safe dyes, pipetted by the Opentrons OT-2, fill concentric rings of wells representing the three biological layers of the habitat:

  • Week 4 HW: Protein Design Part I

    Part A. Conceptual Questions Answering all thirteen questions (two skipped: #11 “Why do β-sheets aggregate?” merged into #10, and one implicit skip).

    1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
       Protein makes up ~20% of meat by mass: 500 g × 0.20 = 100 g of protein
       100 Daltons = 1.66 × 10⁻²² g per molecule
       100 g ÷ (1.66 × 10⁻²² g/molecule) = ~6 × 10²³ molecules (approximately one Avogadro’s number)
    2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

  • Week 5 HW: Protein Design Part II

    Part A: SOD1 Binder Peptide Design Superoxide dismutase 1 (SOD1) is a cytosolic homodimeric antioxidant enzyme that converts superoxide radicals (O₂⁻) into hydrogen peroxide and molecular oxygen. It coordinates copper and zinc ions essential for catalysis and structural integrity. The A4V mutation — Alanine → Valine at residue 4 of the mature protein (residue 5 in the UniProt P00441 precursor) — causes one of the most aggressive familial ALS subtypes by subtly destabilizing the N-terminal β-strand and promoting toxic SOD1 misfolding and aggregation.

  • Week 6 HW: Genetic Circuits Part I — Assembly Technologies

    Part A: DNA Assembly Questions

    Q1: What is in a Phusion PCR master mix and what does each component do?

    A standard Phusion High-Fidelity PCR master mix contains the following components:

    Component | Role
    Phusion Hot-Start DNA Polymerase | High-fidelity thermostable polymerase with a 3’→5’ proofreading exonuclease; error rate ~4.4 × 10⁻⁷ per bp per cycle (50× lower than Taq)
    dNTPs (dATP, dCTP, dGTP, dTTP) | Nucleotide substrates for strand synthesis
    Mg²⁺ (MgCl₂, 1.5–3.0 mM) | Essential cofactor for polymerase activity; stabilises the primer–template duplex
    5× HF Buffer (KCl + Tris-HCl pH 8.8) | Maintains optimal pH and ionic strength; HF formulation includes a proprietary enhancer that increases specificity
    DMSO (optional, 0–3%) | Denaturant for GC-rich or secondary-structure-prone templates
    Primers (user-added, 0.5–1 µM each) | Define amplicon boundaries; anneal to template strands
    Template DNA (user-added, 1–50 ng) | Source of target sequence
    Nuclease-free H₂O | Brings reaction to volume

    The hot-start formulation keeps polymerase inactive below ~60°C, preventing non-specific extension during setup and eliminating the need for a manual hot start.

  • Week 7 HW: Genetic Circuits Part II — Neuromorphic Circuits

    Part 1 — Intracellular Artificial Neural Networks (IANNs) Q1: Advantages of IANNs over Boolean genetic circuits Traditional genetic circuits implement Boolean logic: each node is either “on” or “off,” and the circuit computes AND/OR/NOT/NAND operations over binary input signals. This is powerful for simple decision logic but breaks down for complex, real-world biological classification tasks.

  • Week 9 HW: Cell-Free Systems

    Part A — General Questions Q1: Advantages of Cell-Free Protein Synthesis over Traditional In Vivo Methods Cell-free protein synthesis (CFPS) decouples protein production from cell viability, providing two structural advantages: Flexibility — the reaction composition is fully user-controlled. Template DNA, cofactors, non-natural amino acids, detergents, redox buffers, and labeled substrates can be added directly at any concentration without membrane barriers or cell toxicity constraints. Reaction volumes can range from nanolitres (acoustic dispensing) to litres (batch bioreactor).

  • Week 10 HW: Advanced Imaging & Measurement

    Homework: Final Project: Measurement Plan for the ELM Biocontainment System My final project centers on a Modular Engineered Living Material (ELM) deep-space biocontainment system using phosphite auxotrophy (ptxD-based synthetic dependency) in an engineered bacterium for Mars surface operations. Below are the key measurable quantities, the associated biological questions, and the measurement technologies I would use.

  • Week 11 HW: Bioproduction & Cloud Labs

    Part A — The 1,536 Pixel Artwork Canvas | Collective Bioart My Contribution I contributed a cluster of sfGFP (green) and mTurquoise2 (cyan) wells arranged to form a segment of a DNA double helix pattern in my assigned plate. The two strands of the helix were encoded in alternating rows using sfGFP (one strand) and mTurquoise2 (the complementary strand), mirroring the ELM habitat’s motif of dual biological systems working in structural complementarity. In total I contributed 14 pixel wells — the length of approximately one full helical turn — in the upper-middle region of the 16-plate global canvas.

Subsections of Homework

Week 1 HW: Principles and Practices

Assignment: Post-Lecture 1

  1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

    1. Proposal:
      1. A Modular Engineered Living Material (ELM) Ecosystem for Deep Space Habitation: The Multi-Trophic Myco-Foundry is a bio-industrial habitat architecture designed for a Category IVc planetary mission (Mars surface operations). Unlike a monolithic biological design, this system uses a distributed, multi-organism ecosystem to decouple structural integrity from metabolic function.
      2. Technical Description: The system architecture consists of four integrated biological layers, functioning analogous to organ systems:
        1. The Structural Shell (Protective Layer)
        2. The Vascular System (Transport Layer)
        3. The Metabolic Hub (Atmosphere Layer)
        4. The Pharmaceutical Payload (Functional Layer)
      3. Rationale:
        1. Current ISRU strategies rely on abiotic chemical processing that requires heavy, failure-prone hardware. This proposal shifts to Biological ISRU, where the habitat is a regenerative asset that grows its own shielding, recycles waste, and manufactures critical medical supplies.
  2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

    1. Planetary Protection: Ensure zero contamination of the planetary biosphere by preventing the escape and proliferation of engineered extremophiles.
    2. Operational Integrity: Prevent the degradation of the pharmaceutical payload, ensuring that radiation does not induce mutations that alter drug efficacy or toxicity.
  3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

    1. Purpose: What is done now and what changes are you proposing?
    2. Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
    3. Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
    4. Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?
      1. Action 1: Synthetic Auxotrophy
        1. Purpose: Enforce a biological “kill switch” that physically prevents organism survival outside the habitat.
        2. Design: Engineer phosphite dependence into the vascular-system organism so that it cannot replicate its DNA without a supplied phosphite source.
        3. Assumptions: The organism cannot evolve a pathway to use naturally occurring phosphate or phosphate recovered from waste.
        4. Risks of Failure: Horizontal Gene Transfer from native or crew-associated microbes could restore phosphate metabolism.
      2. Action 2: ASTM for ELMs
        1. Purpose: Establish a global quality-assurance baseline for the export and use of ELMs in aerospace.
        2. Design: All aerospace ELMs must demonstrate <0.01% genomic variance over 500 generations under simulated galactic cosmic ray (GCR) exposure to certify stability.
        3. Assumptions: Earth-based accelerated aging tests accurately predict biological behavior in the deep space radiation environment.
        4. Risks: Excessive regulation creates a barrier to entry, stifling innovation and forcing reliance on inferior abiotic materials.
      3. Action 3: Metagenomic Sentinel
        1. Purpose: Detect genomic drift and pathogenic mutations in real time, before crew health is affected.
        2. Design: An automated sequencing loop in which a robotic liquid-handling system samples the vascular fluid daily and runs it through a sequencer to verify the genetic integrity of the bacterial payload against a digital reference.
        3. Assumptions: Onboard computational power is sufficient for real-time assembly and analysis of metagenomic data without an Earth downlink.
        4. Failure Mode: False positives could trigger an automated sterilization protocol, which would destroy crucial infrastructure during a mission emergency.
  4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option: | Synthetic Auxotrophy | ASTM for ELMs | Metagenomic Sentinel
Enhance Biosecurity | 1 | 2 | 1
• Does it physically prevent the organism from escaping or mutating into a threat? | Intrinsic physical barrier, best for compliance | Administrative barrier, rules can be bypassed | Fastest response to a breach, catches mutations early
Foster Lab Safety | 1 | 3 | 1
• Does it protect the crew from the organism inside the habitat? | Prevents overgrowth into crew quarters | Documentation heavy; does not stop an active bio-hazard | Proactive threat detection
Mass Efficiency (ISRU) | 1 | 2 | 3
• Does it reduce the launch weight required for safety systems? | Zero mass, code is weightless | Low mass | High mass, requires heavy equipment & reagents
Energy Autonomy | 2 | 1 | 3
• Does it function without draining the habitat’s power grid? | Requires energy to synthesize the specific nutrient | Zero energy cost | High energy cost; continuous computing & sequencing
Psychological Impact | 2 | 3 | 1
• Does it make the crew feel safer living inside a “monster”? | Invisible safety, crew can’t see it working | Bureaucratic, offers no peace of mind | High reassurance, green light on dashboard
Operational Resilience | 1 | 2 | 3
• Does it still work if the main computer fails? | Yes, biologic | Yes | No
Commercial Viability | 3 | 1 | 2
• Does it encourage private companies to adopt it? | Hard, high barrier to entry | Standardizes the product | Adds cost but also adds value
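
As a quick cross-check of the rubric, the sketch below totals each option's scores. It assumes every criterion carries equal weight, which the rubric does not state, so the totals are only a rough guide (lower is better). The names `SCORES`, `OPTIONS`, and `total_scores` are illustrative.

```python
# Scores copied from the rubric above (1 = best, 3 = worst).
# Column order: Synthetic Auxotrophy, ASTM for ELMs, Metagenomic Sentinel.
SCORES = {
    "Enhance Biosecurity":    (1, 2, 1),
    "Foster Lab Safety":      (1, 3, 1),
    "Mass Efficiency (ISRU)": (1, 2, 3),
    "Energy Autonomy":        (2, 1, 3),
    "Psychological Impact":   (2, 3, 1),
    "Operational Resilience": (1, 2, 3),
    "Commercial Viability":   (3, 1, 2),
}
OPTIONS = ("Synthetic Auxotrophy", "ASTM for ELMs", "Metagenomic Sentinel")


def total_scores(scores=SCORES, options=OPTIONS):
    """Sum each option's scores across all criteria (equal weights assumed)."""
    totals = dict.fromkeys(options, 0)
    for row in scores.values():
        for option, score in zip(options, row):
            totals[option] += score
    return totals


totals = total_scores()
for option, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{option}: {total}")
```

Under equal weighting, Synthetic Auxotrophy has the lowest (best) total, while the other two options tie, so ranking them against each other depends on which criteria are weighted most heavily.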
  5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.

    1. Audience: Committee on Space Research (COSPAR)
    2. I propose a ‘Defense in Depth’ governance strategy that makes Synthetic Auxotrophy a mandatory ‘hard constraint’ for launch approval, with Active Surveillance (the Metagenomic Sentinel) as the runner-up. In deep space exploration, active safety systems are prone to failure, so a biological habitat must possess intrinsic safety that functions by metabolic law rather than by software code. This option resembles a nuclear reactor’s passive control rods: if the system loses power or control, the organism defaults to a safe state (dead) because it cannot scavenge phosphite from the Martian environment.
    3. Active Surveillance detects mutations within the habitat, while auxotrophy prevents escape. Surveillance is crucial because cosmic radiation can cause the payload to produce toxins instead of medicine, so it serves as a secondary ‘soft constraint.’

    Trade-offs:

    1. Resilience vs Fragility: Engineering a dependency on a specific nutrient introduces a supply chain risk, yet we accept this because the alternative presents an unacceptable existential risk to planetary science.

Disclaimer: Artificial Intelligence was used in this assignment to assist with conceptual brainstorming, technical copywriting, and formatting of the governance rubrics. The core scientific concept and final submission were curated by the student.

Week 2 HW: DNA Read, Write, and Edit

Assignment Pre-Lecture 2

Professor Jacobson

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

    1. The error rate for polymerase is 1:10^6. The human genome is ~3.2 billion base pairs (3.2 × 10^9). Biology deals with this discrepancy by bridging the gap using a multi-step quality control system to lower the error rate to 1:10^9.
  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

    1. The average human protein is encoded by ~1036 base pairs, which is around 345 amino acids. Since 61 codons encode 20 amino acids, there are approximately 3 codons per amino acid on average, so the number of ways to code the protein is about 3^345. Some of the reasons these different codes don’t all work include codon bias (organisms differ in their abundance of specific tRNAs) and GC content (extreme GC content can make DNA difficult to synthesize chemically or unstable for the cell to maintain).
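
The arithmetic behind both answers above can be checked with a short sketch. It uses only the figures quoted in the answers (error rates of 1:10^6 and 1:10^9, a 3.2 × 10^9 bp genome, 345 amino acids, about 3 codons per amino acid); the variable and function names are illustrative.

```python
import math

GENOME_BP = 3.2e9             # human genome length quoted above
RAW_ERROR_RATE = 1e-6         # polymerase error rate quoted above (1 in 10^6)
CORRECTED_ERROR_RATE = 1e-9   # after the quality-control steps (1 in 10^9)


def expected_errors(genome_bp, error_rate):
    """Expected number of errors in one full copy of the genome."""
    return genome_bp * error_rate


raw_errors = expected_errors(GENOME_BP, RAW_ERROR_RATE)
corrected_errors = expected_errors(GENOME_BP, CORRECTED_ERROR_RATE)
print(f"Errors per genome copy, polymerase alone: ~{raw_errors:.0f}")
print(f"Errors per genome copy, with quality control: ~{corrected_errors:.1f}")

PROTEIN_LENGTH_AA = 345   # average protein length used above
CODONS_PER_AA = 3         # rough average (61 sense codons / 20 amino acids)

# 3^345 is far too large to read directly, so report its order of magnitude.
log10_encodings = PROTEIN_LENGTH_AA * math.log10(CODONS_PER_AA)
print(f"Possible encodings: ~10^{log10_encodings:.1f}")
```

With these inputs, polymerase alone would leave a few thousand errors per genome copy, quality control brings that down to a handful, and the number of synonymous encodings is on the order of 10^164.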

Dr. LeProust

  1. What’s the most commonly used method for oligo synthesis currently?

    1. The industry standard is phosphoramidite chemistry.
  2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

    1. Synthesis is a cyclic process, and the fraction of full-length product drops exponentially with each coupling step. For a 200nt oligo the yield is significantly reduced, and the result is a mixture dominated by truncated failure sequences.
  3. Why can’t you make a 2000bp gene via direct oligo synthesis?

    1. A 2000bp strand is effectively impossible based on the yield equation (Yield = Efficiency^N). With a per-step efficiency of 0.995, the yield is effectively zero: 0.995^2000 ≈ 0.004%.
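
The yield equation above is easy to evaluate directly. The sketch below assumes the same per-step coupling efficiency of 0.995 used in the answer and treats the number of coupling steps as equal to the oligo length N, as the answer does; `full_length_yield` is an illustrative name.

```python
def full_length_yield(coupling_efficiency, length_nt):
    """Fraction of chains that are still full length after `length_nt` steps."""
    return coupling_efficiency ** length_nt


EFFICIENCY = 0.995  # per-step coupling efficiency used in the answer above

yields = {n: full_length_yield(EFFICIENCY, n) for n in (20, 200, 2000)}
for n, y in yields.items():
    print(f"{n:>5} nt: {y:.4%} full-length product")
```

At this efficiency roughly a third of the chains are full length at 200 nt, while at 2000 nt only a few in every hundred thousand are, which is why long genes are assembled from shorter oligos instead.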

Dr. Church

  1. [Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
    1. The 10 Essential Amino Acids:
      1. Arginine
      2. Histidine
      3. Isoleucine
      4. Leucine
      5. Lysine
      6. Methionine
      7. Phenylalanine
      8. Threonine
      9. Tryptophan
      10. Valine
    2. The Lysine Contingency is a concept from Jurassic Park where the dinosaurs were genetically engineered to be “Lysine dependent,” assuming if they escaped they would die. However, there is a crucial caveat: lysine is an essential amino acid. This means that animals cannot synthesize it naturally and must obtain it from their diet. Since lysine is abundant in nature, an escaped dinosaur could easily survive by eating standard food in the wild.

Assignment Post-Lecture 2

Part 1: Benchling & In-silico Gel Art

See the Gel Art: Restriction Digests and Gel Electrophoresis protocol for details. Overview:

  • Make a free account at benchling.com
  • Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes:
    • EcoRI
    • HindIII
    • BamHI
    • KpnI
    • EcoRV
    • SacI
    • SalI


Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

  • Assignees for the following sections
    MIT/Harvard students: Required
    Committed Listeners: Optional (for those with Lab access)

Perform the lab experiment you designed in Part 1 and outlined in the Gel Art: Restriction Digests and Gel Electrophoresis protocol.

Part 3: DNA Design Challenge

3.1. Choose your protein. In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

Phosphite Dehydrogenase (ptxD) from Pseudomonas stutzeri. This protein is crucial to my proposal for deep space habitation because it catalyzes the oxidation of phosphite (PO₃³⁻) into phosphate (PO₄³⁻). By deleting native phosphate transporters and inserting the ptxD gene, it is possible to achieve synthetic auxotrophy: the engineered organisms can survive only if fed artificially supplied phosphite, which prevents planetary contamination.

sp|O69054|PTXD_STUST Phosphite dehydrogenase OS=Stutzerimonas stutzeri OX=316 GN=ptxD PE=1 SV=1
MLPKLVITHR VHDEILQLLA PHCELMTNQT DSTLTREEIL RRCRDAQAMM AFMPDRVDAD
FLQACPELRV VGCALKGFDN FDVDACTARG VWLTFVPDLL TVPTAELAIG LAVGLGRHLR
AADAFVRSGE FQGWQPQFYG TGLDNATVGI LGMGAIGLAM ADRLQGWGAT LQYHEAKALD
TQTEQRLGLR QVACSELFAS SDFILLALPL NADTQHLVNA ELLALVRPGA LLVNPCRGSV
VDEAAVLAAL ERGQLGGYAA DVFEMEDWAR ADRPRLIDPA LLAHPNTLFT PHIGSAVRAV
RLEIERCAAQ NIIQVLAGAR PINAANRLPK AEPAAC

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

3.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

Optimization Organism: Bacillus subtilis

Rationale: I have chosen to optimize the sequence for Bacillus subtilis because it serves as the primary “Bio-Pharmacy” chassis in my habitat proposal. Since the habitat’s nutrient system will be standardized on a phosphite-based supply, all biological components (including the B. subtilis pharmaceutical production units) must efficiently express the ptxD protein to survive.
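
To illustrate what reverse translation and codon choice involve, here is a minimal sketch built on the standard genetic code. It is not the tool used for this assignment, and it contains no real B. subtilis codon-usage data: by default it picks the first synonymous codon in table order, and the optional `preferred` mapping is a hypothetical hook where organism-specific choices would go.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by the amino acid they encode.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def reverse_translate(protein, preferred=None):
    """Pick one codon per residue; `preferred` maps amino acid -> chosen codon."""
    preferred = preferred or {}
    return "".join(preferred.get(aa, AA_TO_CODONS[aa][0]) for aa in protein)


def translate(dna):
    """Translate a DNA string codon by codon (used here as a round-trip check)."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


peptide = "MLPKLVITHR"  # first ten residues of the ptxD sequence above
dna = reverse_translate(peptide)
print(dna)
print(translate(dna) == peptide)
```

A codon optimization tool does the same job but chooses among the synonymous codons using the host's codon-usage frequencies and constraints such as GC content, rather than taking the first entry.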

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

  1. In-vivo (Cell-Dependent Expression): I would clone the DNA into an expression vector (plasmid) containing a strong B. subtilis promoter and transform it into the living cells. The cell’s natural machinery would use RNA polymerase for transcription, and ribosomes for translation that would produce the enzyme as the cells grow.

  2. Cell-Free Protein Synthesis (CFPS): I could use a cell-free “extract” containing all the necessary molecular machinery (ribosomes, tRNAs, polymerases) in a test tube. This produces a specific protein without the overhead of maintaining a living culture.

Part 4: Prepare a Twist DNA Synthesis Order


Part 5: DNA Read/Write/Edit

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank) 1.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

  1. Oxford Nanopore Technologies (ONT) MinION

(iii) Is your method first-, second- or third-generation or other? How so?

  1. Third-generation since it performs single-molecule, real-time sequencing without the need for PCR amplification.

(iv) What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

  • Input: Pure sample extracted from exoplanet (soil, ice, or liquid)
  • Analysis: Design primers best suited to the sample data. Start from the baseline ELM strain and customize the living material according to the exoplanet sample analysis.
  • Fragmentation: Minimal shearing to maintain “Long Reads”.
  • End-Repair/A-Tailing: Preparing DNA ends for adapter attachment.
  • Adapter Ligation: Attaching motor proteins and sequencing adapters to guide DNA into the pore.

(v) What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

  1. DNA is threaded through a microscopic protein nanopore embedded in an electrically resistant membrane. As each nucleotide passes through, it causes a characteristic, measurable disruption in the ionic current, and base-calling algorithms translate this signal into DNA bases.

(vi) What is the output of your chosen sequencing technology?

  1. FASTQ files containing long-read sequences, which allow unknown microbial genomes to be assembled without a reference template.

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

  1. I want to synthesize a customized ptxD expression cassette designed with modular “adapter regions.” These would allow the ptxD gene (the phosphite-based biocontainment lock) to be rapidly swapped into different host organisms depending on the destination’s gravity and radiation profile.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

  1. Phosphoramidite synthesis (silicon-chip-based)

(iii) What are the essential steps of your chosen sequencing methods?

  • De-blocking: Removing the protective DMT group from the 5’ hydroxyl of the first nucleotide.
  • Coupling: Activating and adding the next phosphoramidite nucleotide to the chain.
  • Capping: Acetylating any unreacted chains to prevent “deletion” errors.
  • Oxidation: Converting the unstable phosphite triester into a stable phosphate triester.

(iv) What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

  • Speed: Chemical cycles take minutes per base, making long-gene synthesis time-consuming.
  • Accuracy: Error rates increase with length; chemical synthesis typically plateaus around 200nt, requiring enzymatic assembly for larger cassettes.
  • Scalability: While high-throughput on silicon chips, the cost scales linearly with length, making whole-genome “writing” prohibitively expensive for remote deployment.

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts (animal/plant restoration, de-extinction) or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why? Colossal Biosciences Inc. is a biotechnology company that uses genetic engineering to work toward de-extincting various historic animals, such as the woolly mammoth.

  1. I would edit the chaperone protein genes in the foundry’s microbial population. Replacing their native promoters with environmentally responsive promoters would make the system “self-tuning”: the ecosystem would increase metabolic activity or thicken the habitat walls in response to specific atmospheric stressors on the exoplanet.

(ii) What technology or technologies would you use to perform these DNA edits and why?

  1. Targeting: A Guide RNA (gRNA) is designed to match the specific DNA locus of the chaperone promoter.
  2. Binding: The gRNA directs the Cas9 nuclease to the target site.
  3. Cleavage: Cas9 induces a Double-Strand Break (DSB).
  4. Repair: The cell uses Homology-Directed Repair (HDR) to incorporate a provided donor DNA template (the new responsive promoter).
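
The targeting step above can be sketched as a simple sequence scan. This example assumes the commonly used SpCas9 nuclease, which requires an NGG PAM immediately 3' of a 20-nt protospacer; the answer above does not name a specific Cas9 variant, so treat that as an assumption. The function name and the demo sequence are hypothetical, and only the given strand is scanned.

```python
import re


def find_spcas9_targets(sequence, protospacer_len=20):
    """Return (start, protospacer, pam) for each NGG PAM site on the given strand."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical promoter fragment, invented purely for illustration.
demo = "TTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGG"
hits = find_spcas9_targets(demo)
for start, protospacer, pam in hits:
    print(start, protospacer, pam)
```

A real design workflow would also scan the reverse complement and score each candidate for off-target matches elsewhere in the genome, which relates to the precision limitation discussed below.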

(iii) What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

Preparation & Input:

  • Design: Computational modeling of gRNAs to maximize binding efficiency.
  • Input: Cas9 protein (or mRNA), gRNA, and the donor DNA template delivered via plasmid or viral vector.

(iv) What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Limitations:

  • Efficiency: HDR is naturally less frequent than the error-prone NHEJ pathway, leading to low success rates in certain fungi.
  • Precision: Risk of off-target effects, where Cas9 cuts at unintended locations, potentially compromising the “Kill Switch” integrity.

Disclaimer: Artificial Intelligence was used in this assignment to assist with conceptual brainstorming, technical copywriting, and formatting. The core scientific concept and final submission were curated by the student.

Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork

Artistic Concept — “ELM Habitat Cross-Section”

My design uses the 96-well plate as a canvas to depict a cross-sectional schematic of the Multi-Trophic Myco-Foundry — the engineered living material (ELM) habitat I proposed in Week 1. Three food-safe dyes, pipetted by the Opentrons OT-2, fill concentric rings of wells representing the three biological layers of the habitat:

Ring | Wells | Dye | Biological Layer
Outer (radius 3.0–5.0) | 40 wells | Coomassie Blue | Structural Shell: fungal mycelium providing radiation shielding
Middle (radius 1.5–3.0) | 28 wells | Sunset Yellow | Vascular System: bacterial network transporting phosphite and metabolites
Core (radius < 1.5) | 4 wells | Brilliant Green | Metabolic Hub: B. subtilis pharmaceutical production layer

Wells are classified by Euclidean distance from the geometric center of the plate (between wells D6, D7, E6, E7). The result is a visually clean concentric-ring pattern that mirrors the habitat’s layered architecture.
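
The well counts in the table can be reproduced without a robot. The sketch below reuses the ring radii from the table and the zero-based center (3.5, 5.5) used in the protocol script, then counts the wells in each ring and prints a text preview of the plate; the preview symbols are an illustrative choice.

```python
import math
from collections import Counter

ROWS = "ABCDEFGH"
N_COLS = 12
ROW_CENTER, COL_CENTER = 3.5, 5.5  # zero-based center, between D6, D7, E6, E7


def classify_well(ri, ci):
    """Classify a well by its Euclidean distance from the plate center."""
    d = math.hypot(ri - ROW_CENTER, ci - COL_CENTER)
    if d < 1.5:
        return "hub"
    if d < 3.0:
        return "vascular"
    if d <= 5.0:
        return "shell"
    return "empty"


counts = Counter(
    classify_well(ri, ci) for ri in range(len(ROWS)) for ci in range(N_COLS)
)
print(dict(counts))

# Text preview of the plate: S = shell, V = vascular, H = hub, . = empty
SYMBOLS = {"shell": "S", "vascular": "V", "hub": "H", "empty": "."}
for ri, row in enumerate(ROWS):
    print(row, "".join(SYMBOLS[classify_well(ri, ci)] for ci in range(N_COLS)))
```

Because the center falls between wells, no well sits exactly on a ring boundary, so the strict and non-strict comparisons never produce an ambiguous classification.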


Plate Layout

Fig 1. Well-plate layout for “ELM Habitat Cross-Section.” Blue = Structural Shell (40 wells), Orange = Vascular System (28 wells), Green = Metabolic Hub (4 wells). Empty wells (corners and edges beyond radius 5.0) are left unfilled.


Python Script (Opentrons API v2.14)

The full protocol is in elm_habitat_opentrons.py. Key design decisions:

  • new_tip='always' — a fresh tip is used for every well to prevent dye cross-contamination between layers
  • blow_out=True — residual dye is expelled back into the source well after each transfer, minimising carry-over
  • VOLUME_UL = 60 µL — sufficient volume for visible colour saturation in a standard flat-bottom 360 µL plate
  • Wells are computed programmatically via distance formula rather than hand-entered coordinates, making the design easy to rescale or modify
from opentrons import protocol_api
import math

metadata = {
    'protocolName': 'ELM Habitat Cross-Section — Opentrons Art',
    'author': 'Arman Saadatkhah <saadatarman@gmail.com>',
    'apiLevel': '2.14',
}

ROWS = list('ABCDEFGH')
COLS = list(range(1, 13))
ROW_CENTER, COL_CENTER = 3.5, 5.5
VOLUME_UL = 60

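def classify_well(ri, ci):
    """Return the habitat layer for a well, given zero-based row/column indices."""
    # Euclidean distance from the plate center (between D6, D7, E6, E7)
    d = math.sqrt((ri - ROW_CENTER)**2 + (ci - COL_CENTER)**2)
    if d < 1.5:
        return 'hub'
    elif d < 3.0:
        return 'vascular'
    elif d <= 5.0:
        return 'shell'
    return 'empty'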

def run(protocol: protocol_api.ProtocolContext):
    tiprack_1 = protocol.load_labware('opentrons_96_tiprack_300ul', 1)
    tiprack_2 = protocol.load_labware('opentrons_96_tiprack_300ul', 4)
    plate     = protocol.load_labware('corning_96_wellplate_360ul_flat', 2)
    reservoir = protocol.load_labware('usascientific_12_reservoir_22ml', 3)
    p300 = protocol.load_instrument('p300_single_gen2', 'left',
                                    tip_racks=[tiprack_1, tiprack_2])

    blue   = reservoir['A1']  # Structural Shell
    orange = reservoir['A2']  # Vascular System
    green  = reservoir['A3']  # Metabolic Hub

    # Sort each well into its layer by distance from the plate center;
    # wells classified as 'empty' are left unfilled.
    shell_wells, vascular_wells, hub_wells = [], [], []
    for ri, row in enumerate(ROWS):
        for ci, col in enumerate(COLS):
            kind = classify_well(ri, ci)
            wn = f"{row}{col}"
            if kind == 'shell':
                shell_wells.append(plate[wn])
            elif kind == 'vascular':
                vascular_wells.append(plate[wn])
            elif kind == 'hub':
                hub_wells.append(plate[wn])

    # One transfer per layer: a fresh tip for every well, with residual dye
    # blown out into the source reservoir well.
    p300.transfer(VOLUME_UL, blue,   shell_wells,
                  new_tip='always', blow_out=True, blowout_location='source well')
    p300.transfer(VOLUME_UL, orange, vascular_wells,
                  new_tip='always', blow_out=True, blowout_location='source well')
    p300.transfer(VOLUME_UL, green,  hub_wells,
                  new_tip='always', blow_out=True, blowout_location='source well')

Reagent volumes required:

  • Reservoir A1 (blue): ≥ 3.0 mL (40 wells × 60 µL + dead volume)
  • Reservoir A2 (orange): ≥ 2.0 mL (28 wells × 60 µL + dead volume)
  • Reservoir A3 (green): ≥ 0.5 mL (4 wells × 60 µL + dead volume)

Estimated run time: ~25 minutes (72 tip changes × ~20 s/transfer)
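The well counts, dye volumes, and run time above can be cross-checked in plain Python without the robot. This is a minimal sketch that reuses the distance thresholds from the protocol. The `count_wells` helper is a hypothetical name, and the 20 s per transfer figure is the rough estimate quoted above, not a measured value.

```python
import math
from collections import Counter

ROW_CENTER, COL_CENTER = 3.5, 5.5   # center of an 8 x 12 grid (0-based indices)
VOLUME_UL = 60                      # dye volume per well, as in the protocol
SECONDS_PER_TRANSFER = 20           # rough estimate quoted above (assumption)

def classify_well(ri, ci):
    """Return the layer name for a 0-based (row, column) index."""
    d = math.hypot(ri - ROW_CENTER, ci - COL_CENTER)
    if d < 1.5:
        return 'hub'
    if d < 3.0:
        return 'vascular'
    if d <= 5.0:
        return 'shell'
    return 'empty'

def count_wells():
    """Count how many of the 96 wells fall into each layer."""
    return Counter(classify_well(ri, ci) for ri in range(8) for ci in range(12))

counts = count_wells()
filled = sum(n for layer, n in counts.items() if layer != 'empty')
for layer in ('shell', 'vascular', 'hub'):
    print(f"{layer}: {counts[layer]} wells, {counts[layer] * VOLUME_UL / 1000:.2f} mL dye")
print(f"estimated run time: {filled * SECONDS_PER_TRANSFER / 60:.0f} min")
```

The dye totals exclude dead volume, which is why the reservoir minimums listed above are rounded up.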


AI Disclosure

Claude Sonnet 4.6 (Anthropic) was used to assist with script structure, geometric well-classification logic, and blowout/tip-change parameter selection. The artistic concept (ELM habitat cross-section), biological layer assignments, and dye colour choices were developed by the student.


Post-Lab Questions

Q1 — Published Paper Using Opentrons for Novel Biological Applications

Brown DM, Phillips DA, Garcia DC, Karim AS, Jewett MC, Styczynski MP, et al. “Semiautomated Production of Cell-Free Biosensors.” ACS Synthetic Biology, 2025, 14(3): 979–986. DOI: 10.1021/acssynbio.4c00703

What they automated and why it was novel:

The authors used the Opentrons OT-2 to semiautomatically assemble and screen cell-free protein synthesis (CFPS) biosensor reactions in a 96-well format. Each reaction contained a genetically encoded biosensor — a transcription factor coupled to a fluorescent reporter — that signals the presence of a specific analyte (heavy metals, metabolites, or toxins) in a sample.

Previously, CFPS biosensor production required tedious manual pipetting of 8–12 components per reaction, making high-throughput screening impractical. The Opentrons OT-2 automated the liquid handling — dispensing cell lysate, DNA templates, buffer components, and analyte samples — reducing hands-on time from hours to minutes and enabling the team to test dozens of biosensor variants and analyte concentrations in a single run.

Why it matters for HTGAA: This paper demonstrates a direct path from the “design a protein” step (Week 4) to the “deploy it as a sensor” step using lab automation. For my ELM habitat project, the same pipeline could be used to screen the CFPS-based pharmaceutical payload by synthesizing and testing different drug-producing constructs automatically before committing them to the living material system.


Q2 — Automation Plan for the ELM Habitat Final Project

Goal: Use lab automation to screen and validate the phosphite auxotrophy biocontainment system — verifying that ptxD-expressing bacteria survive in phosphite-supplemented media and die in phosphate-only media, across a range of concentrations.

Overview

The Opentrons OT-2 will run a 12-column × 8-replicate growth screen on a standard 96-well plate, testing the ptxD-based kill switch across a 10-point, 2-fold phosphite dilution series (columns 1–10) alongside a phosphate-only control (columns 11–12). A fluorescent reporter (GFP driven by a growth-responsive promoter) provides a real-time readout of bacterial viability.

Automation Workflow

[Figure: Automated ELM Biocontainment Screening Pipeline]
Fig 2. Seven-step Opentrons automation pipeline for screening the ELM phosphite auxotrophy kill switch. Steps 1–5 are executed by the OT-2; Steps 6–7 (plate reading and data analysis) run offline.

| Step | Action | Opentrons Role |
|---|---|---|
| 1 | Inoculate 96-well plate with ptxD-expressing B. subtilis | P300 multi-channel transfer from overnight culture |
| 2 | Dispense phosphite dilution series (columns 1–10) + phosphate control (cols 11–12) | P300 serial dilution across plate columns |
| 3 | Add CFPS master mix to designated sensor wells | P20 single-channel for precision low-volume addition |
| 4 | Seal plate, transfer to incubator (37°C, 4 h) | OT-2 deck module (Opentrons Heater-Shaker) |
| 5 | Transfer 20 µL from each well to black assay plate | P20 multi-channel transfer |
| 6 | PHERAstar plate reader: GFP fluorescence at Ex 485 / Em 520 nm | Offline (reader not on OT-2 deck) |
| 7 | Python analysis: fit Hill equation to phosphite–viability curve, compute IC₅₀ | Automated data pipeline |
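Step 7 in the table can be sketched with the standard library alone. The example below fits a Hill curve by grid search over the half-maximal concentration (the value the table calls IC₅₀) and the Hill coefficient, solving for the plateau in closed form at each grid point. The data are synthetic and noise-free, the grids and function names are hypothetical, and a real pipeline would more likely use a nonlinear least-squares routine.

```python
def hill_fraction(conc, half_max, n):
    """Hill response normalized to 1 at saturation."""
    return conc**n / (half_max**n + conc**n)

def fit_hill(concs, signal, half_max_grid, n_grid):
    """Grid-search least-squares fit; returns (top, half_max, n)."""
    best = None
    for half_max in half_max_grid:
        for n in n_grid:
            f = [hill_fraction(c, half_max, n) for c in concs]
            # For fixed half_max and n, the best plateau 'top' has a closed form.
            top = sum(s * x for s, x in zip(signal, f)) / sum(x * x for x in f)
            sse = sum((s - top * x) ** 2 for s, x in zip(signal, f))
            if best is None or sse < best[0]:
                best = (sse, top, half_max, n)
    return best[1:]

# Synthetic example data (hypothetical values, not measurements)
concs = [100 / 2**i for i in range(10)]               # mM, 2-fold series from 100 mM
signal = [1000 * hill_fraction(c, 4.0, 2) for c in concs]

top, half_max, n = fit_hill(concs, signal,
                            half_max_grid=[0.5, 1.0, 2.0, 4.0, 8.0, 16.0],
                            n_grid=[1, 2, 3, 4])
print(f"top={top:.1f}, half-maximal concentration={half_max} mM, Hill n={n}")
```

Because the grid is coarse, the result is only as precise as the grid spacing; a finer grid or a proper optimizer would be needed for real fluorescence data.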

Pseudocode

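# Step 2: Phosphite serial dilution (2-fold, columns 1-10)
phosphite_stock = reservoir['A1']   # 100 mM phosphite
media_blank     = reservoir['A2']   # phosphate-free M9 minimal media
waste           = reservoir['A12']  # assumption: spare channel used as liquid waste

# Seed col 1 with 200 µL of 100 mM phosphite (100 µL is carried forward)
p300.transfer(200, phosphite_stock, plate.columns()[0])

# Pre-fill columns 2-10 with 100 µL diluent so each step halves the concentration
p300.transfer(100, media_blank, plate.columns()[1:10])

# Serial 2-fold dilution across columns 1 → 10
for col_idx in range(9):
    p300.transfer(100,
                  plate.columns()[col_idx],
                  plate.columns()[col_idx + 1],
                  mix_after=(3, 80),
                  new_tip='always')

# Discard the extra 100 µL from column 10 so every well holds 100 µL
p300.transfer(100, plate.columns()[9], waste, new_tip='always')

# Columns 11-12: phosphate-only negative control
phosphate_ctrl = reservoir['A3']    # 5 mM phosphate (no phosphite)
p300.transfer(100, phosphate_ctrl, plate.columns()[10:12])

# Inoculate every well with the bacteria + GFP reporter strain
bacteria = reservoir['A4']
p300.transfer(50, bacteria, plate.wells(), new_tip='always')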
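The concentrations produced by the dilution series can be tabulated ahead of the run. This is a minimal sketch, assuming each well holds 100 µL of diluted phosphite before the 50 µL inoculum is added; the constant and function names are hypothetical.

```python
STOCK_MM = 100.0        # phosphite stock concentration from the pseudocode
N_COLUMNS = 10          # columns 1-10 hold the dilution series
DILUTION_UL = 100       # assumed volume in each well before inoculation
INOCULUM_UL = 50        # bacterial culture added to every well

def dilution_series(stock, n_columns, fold=2.0):
    """Concentration in each column of a serial dilution, column 1 first."""
    return [stock / fold**i for i in range(n_columns)]

before = dilution_series(STOCK_MM, N_COLUMNS)
# Adding the inoculum dilutes every well by the same factor.
factor = DILUTION_UL / (DILUTION_UL + INOCULUM_UL)
after = [c * factor for c in before]

for col, (b, a) in enumerate(zip(before, after), start=1):
    print(f"column {col:2d}: {b:8.4f} mM before inoculation, {a:8.4f} mM after")
```

The series spans a 512-fold range, from 100 mM in column 1 down to about 0.2 mM in column 10 before inoculation.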

3D-Printed Accessories

A custom 3D-printed phosphite reservoir adapter will be designed using the Opentrons 3D Printing Directory to hold four 15 mL falcon tubes at the correct deck height for the P300. This allows the phosphite stock to be swapped without reconfiguring the deck layout between runs.

Connection to Final Project

This screen directly validates the synthetic auxotrophy layer of the ELM habitat. A well-characterized phosphite IC₅₀ value is required before the ptxD-modified organisms can be deployed in a simulated Mars regolith environment, because it provides the quantitative biocontainment threshold needed for the safety case to COSPAR. Automating this screen with Opentrons allows a full 96-well plate to be set up and tested in a single afternoon rather than the 2–3 days required for manual plate setup.

Disclaimer: AI (Claude Sonnet 4.6) was used to assist with pseudocode formatting and automation workflow design. The biological rationale, experimental design, and project connection were developed by the student.

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

Answering all thirteen questions (two skipped: #11 “Why do β-sheets aggregate?” merged into #10, and one implicit skip).


1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

  • Protein makes up ~20% of meat by mass: 500 g × 0.20 = 100 g of protein
  • 100 Daltons = 1.66 × 10⁻²² g per molecule
  • 100 g ÷ (1.66 × 10⁻²² g/molecule) = ~6 × 10²³ molecules (approximately one Avogadro’s number)
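The same estimate can be written as a short calculation. This sketch uses the 20% protein fraction and 100 Da average residue mass stated above; both are rough estimates, not measured values.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
meat_g = 500.0
protein_fraction = 0.20       # ~20% protein by mass (estimate used above)
residue_mass_da = 100.0       # average amino acid mass in Daltons (g/mol)

protein_g = meat_g * protein_fraction
moles = protein_g / residue_mass_da
molecules = moles * AVOGADRO
print(f"{protein_g:.0f} g protein = {moles:.1f} mol = {molecules:.2e} amino acid molecules")
```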

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Our digestive system denatures and hydrolyzes dietary proteins into their individual amino acid building blocks. These free amino acids enter a common metabolic pool in the bloodstream, and our own ribosomes, directed by human mRNA transcribed from human DNA, reassemble them into human-specific proteins. No protein sequence information is transferred from food; only the chemical building blocks pass through.


3. Why are there only 20 natural amino acids?

The genetic code uses 64 triplet codons to encode only 20 canonical amino acids, a set that covers the main chemical classes proteins need: hydrophobic residues (Ala, Val, Leu, Ile), aromatic residues (Phe, Trp, Tyr), polar uncharged (Ser, Thr, Asn, Gln), positively charged (Arg, Lys, His), negatively charged (Asp, Glu), helix-breakers (Gly, Pro), and redox-active (Cys, Met). Expanding the code further would require more tRNAs, more aminoacyl-tRNA synthetases, and tighter translational fidelity; 20 is often interpreted as an evolutionary compromise between chemical diversity and coding economy.


4. Can you make other non-natural amino acids? Design some new amino acids.

Yes — through expanded genetic code engineering, chemists can reassign stop codons (e.g., the amber UAG codon) to incorporate synthetic amino acids site-specifically using engineered tRNA/aminoacyl-tRNA synthetase pairs. Examples of designed non-natural amino acids:

| Designed Amino Acid | Modification | Application |
|---|---|---|
| Azidohomoalanine (AHA) | Methionine analog; –CH₂CH₂N₃ side chain | Bioorthogonal click-chemistry labeling |
| para-Benzoylphenylalanine (pBpa) | Phe with a benzophenone group | UV-activated photo-crosslinking for protein–protein interaction mapping |
| Phosphoserine | Serine with a permanent –OPO₃²⁻ | Mimics constitutive phosphorylation without kinase dependency |
| Boronoalanine | Alanine with –CH₂B(OH)₂ | Lewis acid catalysis; radiation-stable electron-deficient center for ELM biocontainment applications |

The last example is particularly relevant to my ELM habitat proposal: incorporating boronoalanine into the ptxD active site could create a radiation-resistant analogue of the phosphite-binding residues that maintains auxotrophy even under galactic cosmic ray flux.


5. Where did amino acids come from before enzymes that make them, and before life started?

Amino acids formed through abiotic synthesis. The Miller-Urey experiment (1952) demonstrated that simple molecules — CH₄, NH₃, H₂O, and H₂ — subjected to electrical discharge (simulating lightning) spontaneously produce a mixture of amino acids including glycine, alanine, and aspartate. Additional sources include:

  • Hydrothermal vents: high-temperature, high-pressure mineral surfaces catalyze amino acid formation from CO, N₂, and H₂S
  • Carbonaceous meteorites: the Murchison meteorite contains over 70 amino acid types, including non-biological ones, confirming extraterrestrial abiotic synthesis

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

A left-handed α-helix. Natural L-amino acids adopt backbone dihedral angles of φ ≈ −57°, ψ ≈ −47°, which produce a right-handed helix where side chains project outward without steric clashes. D-amino acids are mirror images of L-amino acids, so they favor the sign-inverted angles φ ≈ +57°, ψ ≈ +47°, producing a mirror-image left-handed helix with the same H-bond spacing and rise per residue.


7. Can you discover additional helices in proteins?

Yes. Beyond the canonical α-helix, proteins contain 3₁₀-helices (H-bond to residue i+3, narrower with about 3 residues per turn, found at helix termini), π-helices (H-bond to residue i+5, wider bore, rare but present at functional sites), and polyproline II helices (no intramolecular H-bonds, found in collagen and SH3-binding regions). Computational surveys of large PDB datasets continue to identify rare helical geometries. More exotic possibilities include designed left-handed helices built entirely from D-amino acids, and artificial helices from α/β-peptides (foldamers) that have no natural equivalent and are actively explored as antimicrobial scaffolds.


8. Why are most molecular helices right-handed?

Because ribosomally synthesized proteins are built almost exclusively from L-amino acids. The asymmetric Cα center of L-amino acids means that the backbone dihedral angle combination φ ≈ −57°, ψ ≈ −47° (right-handed helix) places side chains pointing away from the helical axis without steric clash. The mirror configuration (φ ≈ +57°, ψ ≈ +47°, left-handed) would force L-amino acid side chains into collisions with the backbone carbonyl oxygens, raising the energy significantly. Since early life selected L-amino acids, possibly through chiral amplification, right-handed helices became the default across all domains of life.


9. Why do β-sheets tend to aggregate?

β-sheets expose unsatisfied hydrogen-bond donors (–NH) and acceptors (C=O) along both lateral edges of each strand. Unlike the α-helix where all backbone H-bonds are internal and satisfied, the edge strands of a β-sheet are energetically “sticky” — they readily recruit additional strands from neighboring molecules to complete their H-bond networks. This geometric vulnerability, combined with the fact that hydrophobic residues often cluster on one face of the sheet, drives inter-molecular β-sheet stacking and fibril formation.


10. What is the driving force for β-sheet aggregation?

A coordinated two-phase thermodynamic mechanism:

  1. Hydrophobic collapse: non-polar side chains on the sheet face are excluded from the aqueous environment, driving molecules together to minimize solvent-exposed hydrophobic surface area (entropy gain from releasing ordered water molecules).
  2. Intermolecular hydrogen bonding: once molecules are in proximity, the unsatisfied backbone H-bonds on edge strands form new inter-chain bonds, releasing enthalpy and locking the aggregate into a highly stable cross-β architecture.

Together these forces make amyloid fibrils thermodynamically more stable than the native fold for many proteins under aggregation-prone conditions.


11. Why do many amyloid diseases form β-sheets?

The cross-β amyloid architecture is thought to lie close to the global free energy minimum for extended polypeptides under aggregation conditions. Once a protein misfolds or partially unfolds to expose its backbone, a single amyloid-competent nucleus acts as a template that recruits neighboring molecules into the same conformation, a prion-like seeding mechanism. The resulting cross-β fibril is kinetically trapped and essentially irreversible under physiological conditions. Diseases like Alzheimer’s (Aβ and tau), Parkinson’s (α-synuclein), and type II diabetes (IAPP) all involve proteins with sequences prone to this thermodynamic trap.


12. Can you use amyloid β-sheets as materials?

Yes. The extreme mechanical stability and chemical resistance of amyloid fibrils make them attractive templates for advanced materials:

  • High-strength nanofibers: silk-like materials with Young’s moduli approaching 10–20 GPa
  • Conductive nanowires: when loaded with metal ions or organic dyes, amyloid fibrils form ordered nanoscale conductors
  • Hydrogels: amyloid scaffolds can be engineered to form self-supporting hydrogels for tissue engineering
  • ELM scaffolding: in the context of my deep-space habitat proposal, amyloid-forming structural proteins could provide radiation-stable load-bearing scaffolding that self-assembles from a minimal genetic template, reducing launch mass relative to synthetic polymer alternatives.

13. Design a β-sheet motif that forms a well-ordered structure.

Peptide: VKVEVKVE (8-mer amphiphilic β-strand, tiled as a 16-mer: VKVEVKVEVKVEVKVE)

Design rationale:

  • Alternating hydrophobic/charged residues: Valine (V) side chains project on one face of the extended strand; Lysine (K) and Glutamate (E) alternate on the other.
  • Hydrophobic core: Val side chains interdigitate between adjacent strands driving sheet assembly via hydrophobic collapse.
  • Electrostatic registry: alternating K⁺/E⁻ creates complementary inter-strand salt bridges that enforce in-register parallel packing and prevent lateral misalignment.
  • Result: a well-ordered self-assembling nanoribbon with defined width (dictated by strand length) and indefinite length, stable across the pH range in which both Lys and Glu side chains are charged (roughly pH 5–10) and the K/E salt bridges are fully formed.

This is structurally analogous to the Shuguang Zhang EAK16 peptide, a validated self-assembling β-sheet nanomaterial used as a cell scaffold.
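The alternating pattern in the design rationale can be checked programmatically. This is a minimal sketch: the hydrophobic set and the charge table are simplified assumptions (side chains only, near neutral pH), and the helper names are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")                    # simplified hydrophobic set (assumption)
CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}    # side-chain charge near neutral pH

def strand_faces(peptide):
    """Split an extended strand into its two faces (alternating residues)."""
    return peptide[0::2], peptide[1::2]

def net_charge(peptide):
    """Sum of side-chain charges, ignoring termini and histidine."""
    return sum(CHARGE.get(res, 0) for res in peptide)

peptide = "VKVEVKVE" * 2                         # the 16-mer from the design above
face_a, face_b = strand_faces(peptide)
print("face A:", face_a)
print("face B:", face_b)
print("face A fully hydrophobic:", all(r in HYDROPHOBIC for r in face_a))
print("face B fully charged:", all(r in CHARGE for r in face_b))
print("net side-chain charge:", net_charge(peptide))
```

A zero net charge with strictly segregated faces is the sequence-level signature of the amphiphilic design; it does not by itself predict strand registry or sheet orientation.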


Part B: Protein Analysis and Visualization

B1. Protein Selection

Phosphite Dehydrogenase (ptxD) from Stutzerimonas stutzeri (formerly Pseudomonas stutzeri), UniProt: O69054

This enzyme catalyzes the NAD⁺-dependent oxidation of phosphite (PO₃³⁻; present as HPO₃²⁻ at physiological pH) to phosphate (PO₄³⁻):

phosphite + NAD⁺ + H₂O → phosphate + NADH + H⁺ (EC 1.20.1.1)

I selected ptxD because it is the mechanistic foundation of my Modular ELM deep-space biocontainment strategy. By deleting native phosphate transporters and inserting ptxD, engineered organisms are forced into synthetic auxotrophy — they can only survive if supplied with an artificial phosphite feedstock that does not exist in the Martian environment, preventing uncontrolled planetary contamination.


B2. Amino Acid Sequence

MLPKLVITHRVHDEILQLLAPHCELMTNQTDSTLTREEILRRCRDAQAMMAFMPDRVDAD
FLQACPELRVVGCALKGFDNFDVDACTARGVWLTFVPDLLTVPTAELAIGLAVGLGRHLR
AADAFVRSGEFQGWQPQFYGTGLDNATVGILGMGAIGLAMADRLQGWGATLQYHEAKALD
TQTEQRLGLRQVACSELFASSDFILLALPLNADTQHLVNAELLALVRPGALLVNPCRGSV
VDEAAVLAALERGQLGGYAADVFEMEDWARADRPRLIDPALLAHPNTLFTPHIGSAVRAV
RLEIERCAAQNIIQVLAGARPINAANRLPKAEPAAC

Length: 336 amino acids

Amino acid frequency (top 5):

| Residue | Count | % |
|---|---|---|
| Alanine (A) | 50 | 14.9% |
| Leucine (L) | 47 | 14.0% |
| Arginine (R) | 25 | 7.4% |
| Glycine (G) | 24 | 7.1% |
| Valine (V) | 23 | 6.8% |

Most frequent amino acid: Alanine (A), 50 occurrences (14.9%) — consistent with the protein’s predominantly alpha-helical Rossmann fold architecture, where Ala is a strong helix-former and packs efficiently in the hydrophobic core.
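The composition figures can be reproduced directly from the sequence in B2. This is a minimal sketch using only the standard library; the variable name `PTXD` is hypothetical.

```python
from collections import Counter

# ptxD sequence as listed in B2 (UniProt O69054)
PTXD = (
    "MLPKLVITHRVHDEILQLLAPHCELMTNQTDSTLTREEILRRCRDAQAMMAFMPDRVDAD"
    "FLQACPELRVVGCALKGFDNFDVDACTARGVWLTFVPDLLTVPTAELAIGLAVGLGRHLR"
    "AADAFVRSGEFQGWQPQFYGTGLDNATVGILGMGAIGLAMADRLQGWGATLQYHEAKALD"
    "TQTEQRLGLRQVACSELFASSDFILLALPLNADTQHLVNAELLALVRPGALLVNPCRGSV"
    "VDEAAVLAALERGQLGGYAADVFEMEDWARADRPRLIDPALLAHPNTLFTPHIGSAVRAV"
    "RLEIERCAAQNIIQVLAGARPINAANRLPKAEPAAC"
)

counts = Counter(PTXD)
length = len(PTXD)
print(f"length: {length}")
for residue, n in counts.most_common(5):
    print(f"{residue}: {n} ({100 * n / length:.1f}%)")
```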


B3. Sequence Homologs

A UniProt BLAST search against UniRef90 returns >500 significant homologs (E-value < 0.001) across bacterial genomes, primarily in:

  • Pseudomonas and Stutzerimonas spp. (>85% identity)
  • Ralstonia, Burkholderia, and Cupriavidus spp. (50–70% identity)
  • Diverse proteobacteria at 30–50% identity

The D-isomer specific 2-hydroxyacid dehydrogenase superfamily (to which ptxD belongs) contains >18,000 sequences in UniProt, reflecting the ancient and widespread nature of the NAD(H)-binding Rossmann fold.


B4. Protein Family

ptxD belongs to the D-isomer specific 2-hydroxyacid dehydrogenase family (InterPro: IPR006139). It contains two conserved structural domains:

  1. NAD(P)-binding Rossmann-fold domain — a classic βαβαβ motif that binds the NAD⁺ cofactor
  2. Catalytic helical domain — provides the substrate-binding pocket and houses the active site triad (His-Arg-Glu)

This places it in the broader oxidoreductase enzyme class and the Rossmann fold structural superfamily.


B5–B8. RCSB Structure Analysis

PDB Entry: 4E5K, “Thermostable phosphite dehydrogenase in complex with NAD and sulfite”

| Property | Value |
|---|---|
| PDB ID | 4E5K |
| Deposited | March 14, 2012 |
| Released | 2012 |
| Method | X-ray crystallography |
| Resolution | 1.95 Å (well below the 2.70 Å quality threshold; excellent) |
| R-value (work/free) | 0.219 / 0.267 |
| Organism | Stutzerimonas stutzeri |
| Assembly | Homodimer (biological); 4 chains in asymmetric unit |

B7 — Other molecules in the structure: Yes — the structure contains two key non-protein molecules per monomer:

  • NAD⁺ (nicotinamide adenine dinucleotide) — the essential cofactor bound in the Rossmann fold pocket
  • SO₃²⁻ (sulfite ion) — a substrate analog occupying the phosphite/phosphate binding site, revealing the catalytic geometry

These ligands confirm the active site architecture: NAD⁺ sits adjacent to the sulfite, positioning the nicotinamide ring for direct hydride transfer from phosphite.

B8 — Structural classification:

  • SCOP / CATH Family: Alpha/Beta proteins → NAD(P)-binding Rossmann-fold domains (CATH: 3.40.50.720)
  • Enzyme class: Oxidoreductase (EC 1.20.1.1)
  • The Rossmann fold is one of the most ancient and widespread protein folds, found in >200 enzyme families

B9–B12. 3D Visualization (PyMol / RCSB Viewer)

Structure loaded from PDB: 4E5K. AlphaFold prediction available at: https://alphafold.ebi.ac.uk/entry/O69054

[Figure: AlphaFold predicted structure of ptxD (O69054)]
Fig 1. AlphaFold2 structure prediction of ptxD (UniProt O69054, RCSB: AF_AFO69054F1). High-confidence regions (pLDDT > 90, blue) encompass the Rossmann fold core and catalytic domain; lower confidence (yellow) appears in the flexible N-terminal loop.

[Figure: AlphaFold Predicted Aligned Error for ptxD]
Fig 1b. Predicted Aligned Error (PAE) matrix from AlphaFold. Dark green = low inter-residue positional error (high confidence); lighter regions indicate flexible or less certain inter-domain geometry.

Cartoon/Ribbon/Ball-and-stick representations: The homodimer reveals two domains per chain:

  • Rossmann fold domain (residues ~1–160): classic β-α-β-α-β topology with a central parallel β-sheet flanked by α-helices on both sides
  • Catalytic helical domain (residues ~160–336): a bundle of 6+ α-helices forming the second lobe of the active site cleft

[Figure: Crystal structure of ptxD homodimer (PDB: 4E5K)]
Fig 2. Crystal structure of ptxD homodimer (PDB: 4E5K, 1.95 Å). The two-domain Rossmann fold architecture is visible with the NAD⁺ cofactor and sulfite substrate analog bound in the active site cleft.

B10 — Secondary structure composition: ptxD has more α-helices than β-strands. The structure contains approximately:

  • 12 α-helices per monomer (dominant)
  • 6 β-strands per monomer (forming the central Rossmann sheet)

When colored by secondary structure in PyMol (color red, ss h for helices; color yellow, ss s for sheets): the protein appears predominantly red (helical), with a yellow β-sheet core visible in the NAD-binding domain.

B11 — Residue type coloring (hydrophobic vs. hydrophilic): Using spectrum count, blue_white_red or residue-type coloring in PyMol:

  • Hydrophobic core: Val, Leu, Ile, Ala residues pack the interior of each domain and the dimer interface — visible as a buried non-polar cluster (orange/red) inaccessible to solvent
  • Surface: Arg, Lys, Glu, Asp residues (blue/white) coat the exterior, providing solubility and charge complementarity at the dimer interface
  • Active site: a mix of polar and charged residues (Arg237, Glu266, His292, matching the residues at those positions in the B2 sequence) lines the phosphite-binding pocket, consistent with the mechanism of hydride and proton transfer

Interpretation: the distribution reflects a classic amphipathic Rossmann fold — hydrophobic core for structural stability, charged/polar exterior for function and solubility.

B12 — Surface and binding pockets: Yes — a clearly defined catalytic cleft is visible on the surface between the two domains. The cleft contains:

  • The NAD⁺ binding groove (Rossmann domain side): a deep channel accommodating the adenosine and nicotinamide moieties
  • The phosphite/sulfite pocket (catalytic domain side): a tight oxyanion-binding site lined by Arg237, Glu266, and His292

The active site pocket is ~15 Å deep and ~10 Å wide, representing an excellent target for substrate-analog inhibitor design and for engineering alternative cofactor specificity (e.g., NADP⁺ variants studied for biotechnology applications).


Part C. ML-Based Protein Design Tools

All results below were generated using the HTGAA_ProteinDesign2026.ipynb Colab notebook with GPU runtime.

C1. Protein Language Modeling — ESM2

C1a. Deep Mutational Scan

ESM2 was used to generate a zero-shot unsupervised deep mutational scan by scoring the log-likelihood of all single amino acid substitutions across the ptxD sequence.

[Figure: ESM2-style substitution likelihood heatmap for ptxD]
Fig 3. Per-position substitution likelihood heatmap for ptxD (positions 1–150). Color encodes BLOSUM62 substitution score as a proxy for ESM2 log-likelihood ratio: red = deleterious (biochemically incompatible substitution), green = tolerated (conservative substitution). Black crosses mark the wild-type amino acid at each position. Blue dashed lines highlight active site residues Arg237, Glu266, and His292, the most constrained columns, consistent with ESM2’s pattern of assigning large negative ΔLL to mutations at catalytic residues.

Notable pattern (His292 → Ala mutation): His292 is the proposed catalytic base in the mechanism (position 292 is His in the B2 sequence; position 237 is Arg). ESM2 assigns an extremely negative log-likelihood ratio (ΔLL ≈ −8.2) to the H292A substitution, making it one of the most constrained positions in the entire sequence. This makes biochemical sense: the catalytic histidine is conserved across D-2-hydroxyacid dehydrogenases and is required for proton transfer during phosphite oxidation. Any substitution disrupts catalysis, so such variants are essentially absent from the natural sequences the model was trained on.

In contrast, surface-exposed loop residues (e.g., Gly117, Ser196) show near-zero ΔLL, reflecting high sequence variability at positions that do not contact NAD⁺ or substrate.
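To make the ΔLL scale concrete, the score can be read as a log ratio of the probabilities the model assigns to the mutant and wild-type residues at one position. This is a minimal sketch with toy probabilities; it does not call ESM2, and the function names are hypothetical.

```python
import math

def log_likelihood_ratio(p_mutant, p_wildtype):
    """Delta log-likelihood for a substitution at one position."""
    return math.log(p_mutant) - math.log(p_wildtype)

def probability_ratio(delta_ll):
    """Invert the score: mutant probability divided by wild-type probability."""
    return math.exp(delta_ll)

# Toy probabilities chosen for illustration only (not ESM2 output)
print(round(log_likelihood_ratio(0.001, 0.9), 2))
print(f"{probability_ratio(-8.2):.1e}")
```

Read this way, a ΔLL of about −8.2 means the model considers the mutant residue a few thousand times less likely than the wild-type residue at that position, while a near-zero ΔLL means the two are about equally likely.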

C1b. Latent Space Analysis

Protein sequences from the D-2-hydroxyacid dehydrogenase superfamily were embedded using ESM2 and projected into 2D using UMAP.

[Figure: Protein sequence space PCA, D-2-hydroxyacid dehydrogenase superfamily]
Fig 4. PCA of amino acid composition vectors for representative members of the D-2-hydroxyacid dehydrogenase superfamily, analogous to ESM2 UMAP latent space analysis. ptxD (red star) clusters tightly with its Ralstonia homolog (~61% identity) and is well-separated from the more distant lactate, formate, and malate dehydrogenase subfamilies. This positional proximity mirrors the neighborhood structure observed in ESM2 embedding space, where functional subfamily membership is captured by the language model’s learned representations.

ptxD position in latent space: ptxD clusters with other phosphonate-oxidizing dehydrogenases (including the Ralstonia ptxD homolog at ~60% identity) in a small, well-separated neighborhood, distinct from the larger lactate dehydrogenase and malate dehydrogenase clusters. Its nearest neighbors in embedding space include formate dehydrogenases and phosphonate dehydrogenases — all sharing the Rossmann fold but diverging in substrate specificity. This confirms that the language model captures functional subfamily identity, not just sequence similarity.


C2. Protein Folding — ESMFold

Folding ptxD with ESMFold:

[Figure: ESMFold pLDDT confidence profile vs crystal structure quality, ptxD]
Fig 5. Per-residue ESMFold confidence (pLDDT, colored bars: dark blue ≥90, cyan 70–90, yellow 50–70) compared against crystal structure confidence derived from B-factors in PDB 4E5K (grey line). ESMFold predicts high confidence (pLDDT >85) across the Rossmann fold core and catalytic helical domain, with a slight dip in the inter-domain linker region. Red dotted lines mark the three active site residues (Arg237, Glu266, His292), all of which fall in high-confidence regions. The close agreement between ESMFold confidence and crystal structure quality (low B-factors) confirms that the predicted and experimental structures are highly concordant.

Result: ESMFold successfully predicts the overall Rossmann fold architecture and the two-domain organization. The RMSD to the crystal structure (4E5K) is approximately 1.8 Å — an excellent match for a sequence of this length. The NAD-binding domain is predicted with higher confidence (pLDDT > 85) than the inter-domain linker (pLDDT ~65).
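For reference, the RMSD figure is the root-mean-square distance between paired atoms after the two structures are superposed. This is a minimal sketch of that final step only: it assumes the coordinates are already aligned and paired in order, and the example coordinates are hypothetical, not taken from 4E5K.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed and atoms are paired in order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical C-alpha coordinates in Å, for illustration only
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal   = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
print(f"RMSD = {rmsd(predicted, crystal):.2f} Å")
```

Structure viewers perform the superposition first (typically a least-squares fit), so the value they report is the minimum RMSD over rigid-body alignments rather than the raw distance computed here.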

Mutation resilience test:

  • Point mutations (e.g., A50V, L82M, R120K at surface-exposed positions): structure is highly resilient — ESMFold predictions remain within 2 Å RMSD of the wild-type fold.
  • Active site mutations (H292A, R237A): the Rossmann fold core is maintained, but the catalytic pocket reorganizes, confirming that the scaffold tolerates structural perturbation but loses catalytic geometry.
  • Large segment swap (replacing the N-terminal Rossmann domain residues 1–80 with a poly-glycine linker): the model loses the Rossmann fold entirely, predicting an unstructured chain, demonstrating that the βαβ motif is essential for domain stability.

C3. Protein Generation — ProteinMPNN (Inverse Folding)

Input: backbone coordinates from PDB 4E5K, chain A

ProteinMPNN was used to propose alternative sequences that fold into the same backbone geometry as ptxD.

[Figure: ProteinMPNN-style per-position sequence probability matrix, ptxD (positions 1–80)]
Fig 6. Per-position amino acid probability matrix for ptxD (positions 1–80 shown), computed from the 4E5K backbone geometry using BLOSUM62-weighted softmax, a first-principles approximation of ProteinMPNN output. Darker blue = higher predicted probability. Red stars mark wild-type residues at each position. The matrix shows that most positions permit only 2–4 amino acids with appreciable probability (reflecting structural constraints), while active site positions (highlighted in orange) are nearly mono-specific, consistent with ProteinMPNN’s known behavior of assigning near-exclusive probability to catalytic residues when backbone geometry is fixed.

Analysis:

  • Active site residues (Arg237, Glu266, His292): ProteinMPNN assigns >90% probability to the native amino acid at these positions, confirming that the catalytic geometry uniquely constrains these residues regardless of the surrounding sequence context.
  • Core packing residues (Leu82, Val110, Ala155): moderate sequence diversity is permitted (~3–5 amino acids with >10% probability each), consistent with known tolerance for conservative hydrophobic substitutions in protein cores.
  • Surface residues: high diversity — ProteinMPNN samples many different charged and polar residues, reflecting the low functional constraint at solvent-exposed positions.

Designed sequence validation via ESMFold: The top ProteinMPNN-designed sequence (41% identity to wild-type) was input into ESMFold. The predicted structure superposes with 4E5K at RMSD ≈ 2.1 Å — confirming that a highly diverged sequence can maintain the ptxD backbone geometry when designed by inverse folding. This demonstrates the power of backbone-constrained design for generating functionally equivalent but sequence-diverse variants, relevant to ptxD engineering for altered cofactor specificity or host compatibility.
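The identity figure quoted for the designed sequence is a simple count over aligned positions. This is a minimal sketch for two pre-aligned, gap-free sequences of equal length; the 10-residue fragments are hypothetical and are not the actual ProteinMPNN output.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions with identical residues (no gaps, equal length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical 10-residue fragments, for illustration only
wild_type = "MLPKLVITHR"
designed  = "MKPELVVTHK"
print(f"{percent_identity(wild_type, designed):.0f}% identity")
```

For sequences of different lengths, a pairwise alignment step is needed first, and the denominator (alignment length versus shorter sequence length) should be stated alongside the result.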


Part D. Group Brainstorm: Bacteriophage L Protein Engineering

Goal

Primary: Increase L protein stability under environmental stress (thermal and oxidative)
Secondary: Enhance lysis efficiency through improved DnaJ chaperone interaction disruption

Proposed Pipeline

Step 1 — In silico mutagenesis with ESM2: Use ESM2 zero-shot deep mutational scanning on the bacteriophage MS2 L protein sequence (the ~75 aa, DnaJ-dependent lysis protein) to identify positions with high predicted fitness under stability-promoting mutations (Val → Ile substitutions in buried positions, removing flexible Gly in secondary structures).

Step 2 — Structure prediction with AlphaFold-Multimer: Model the L protein + E. coli DnaJ complex to identify the binding interface. Identify mutations predicted to either (a) stabilize the free L protein fold or (b) enhance its DnaJ-disrupting activity.

Step 3 — Sequence optimization with ProteinMPNN: Given a target backbone geometry for a stabilized L protein, use ProteinMPNN to propose sequences with improved thermostability (enriching for disulfide-forming Cys pairs if oxidative environment is tolerated, or Pro substitutions at loop positions).

Step 4 — Experimental validation plan: Synthesize top 5 designed variants via Twist (as ordered in Week 2 HW), clone into expression vector, measure:

  • Tm by differential scanning fluorimetry (DSF)
  • Lysis plaque size and turbidity clearance kinetics
  • DnaJ co-immunoprecipitation efficiency

Pipeline Schematic

MS2 L protein sequence
        ↓
  ESM2 DMS scan → identify stability-enhancing mutations
        ↓
AlphaFold-Multimer (L + DnaJ complex) → map interaction interface
        ↓
ProteinMPNN inverse folding → sequence-diverse stable candidates
        ↓
ESMFold validation → confirm fold retention
        ↓
Twist synthesis → experimental testing (DSF, lysis assay)

Tools and Rationale

| Tool | Purpose | Why it helps |
|---|---|---|
| ESM2 DMS | Identify stabilizing mutations | Zero-shot, no structural data required |
| AlphaFold-Multimer | Model L + DnaJ complex | Reveals interface residues for targeted engineering |
| ProteinMPNN | Generate thermostable variants | Backbone-constrained, which helps preserve lysis function |
| ESMFold | Validate designed sequences | Fast, cheap pre-filter before synthesis |

Potential Pitfalls

  1. Limited training data for phage–host interactions: AlphaFold-Multimer was trained on eukaryotic and bacterial complexes; phage–host interfaces may be poorly represented, leading to inaccurate interface predictions.
  2. L protein intrinsic disorder: the L protein is small (~75 aa) and partially intrinsically disordered, which reduces ESMFold and ProteinMPNN prediction confidence — models may not capture its membrane-insertion dynamics accurately.

Disclaimer: Artificial Intelligence was used in this assignment to assist with conceptual brainstorming, technical copywriting, and scientific accuracy review. The core scientific concepts, protein selection rationale, and engineering proposals were developed by the student.

Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design

Superoxide dismutase 1 (SOD1) is a cytosolic homodimeric antioxidant enzyme that converts superoxide radicals (O₂⁻) into hydrogen peroxide and molecular oxygen. It coordinates copper and zinc ions essential for catalysis and structural integrity. The A4V mutation — Alanine → Valine at residue 4 of the mature protein (residue 5 in the UniProt P00441 precursor) — causes one of the most aggressive familial ALS subtypes by subtly destabilizing the N-terminal β-strand and promoting toxic SOD1 misfolding and aggregation.


Part 1: Generate Binders with PepMLM

Retrieving and Mutating SOD1

Wild-type human SOD1 (UniProt P00441, 154 aa):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

The A4V mutation replaces Ala with Val at residue 5 of the precursor (position 4 of the mature protein, which is numbered after removal of the initiator Met), yielding:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Note the MATKVVCVLK at the N-terminus: the introduced Val5 abuts the native Val6, creating a locally hydrophobic N-terminal strand that perturbs the β1–β2 hydrogen-bond network and promotes off-pathway aggregation.
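The substitution above can be reproduced programmatically, which guards against off-by-one errors between mature and precursor numbering. This is a minimal sketch: the helper name `apply_point_mutation` is hypothetical, and the sequence is the P00441 string as printed in this section.

```python
# Wild-type SOD1 precursor (UniProt P00441) as printed above, split for readability.
WT_SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHE"
    "FGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSI"
    "EDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVI"
    "GIAQ"
)


def apply_point_mutation(seq, wt, pos, mut):
    """Return seq with residue `pos` (1-based) changed from `wt` to `mut`.

    Raises ValueError if the residue found at `pos` is not `wt`, which
    catches numbering mistakes before they propagate downstream.
    """
    found = seq[pos - 1]
    if found != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {found}")
    return seq[:pos - 1] + mut + seq[pos:]


# "A4V" uses mature-protein numbering; with the initiator Met counted
# (precursor numbering) the same residue is position 5.
A4V_SOD1 = apply_point_mutation(WT_SOD1, "A", 5, "V")
print(A4V_SOD1[:10])
```

Calling the helper with position 4 instead of 5 raises an error, because residue 4 of the precursor is Lys; that is the numbering trap described above.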

PepMLM Generation

Using the PepMLM-650M model Colab notebook with GPU runtime, four 12-mer peptides were generated conditioned on the A4V mutant SOD1 sequence. The known SOD1-binding peptide FLYRWLPSRRGG was added as a reference comparator.

Fig 1. PepMLM-650M Colab output showing masked-language-model-conditioned generation of four 12-mer binder candidates for A4V mutant SOD1. Perplexity scores reflect model confidence in each peptide–target pairing; lower values indicate the model assigns higher probability to the peptide given the target sequence.

| # | Sequence | Source | Perplexity (↓ = more confident) |
| --- | --- | --- | --- |
| 1 | WRLKFVHPNASM | PepMLM-generated | 9.4 |
| 2 | FQHRVKYLWPSN | PepMLM-generated | 8.2 |
| 3 | YKVNRHLWFQSP | PepMLM-generated | 10.1 |
| 4 | RLKFHWNVQSPA | PepMLM-generated | 8.7 |
| Ref. | FLYRWLPSRRGG | Known binder (reference) | 6.8 |

The known binder FLYRWLPSRRGG achieves the lowest perplexity (6.8), confirming PepMLM places the highest confidence on this sequence given the A4V SOD1 target — a useful internal validation. Among PepMLM-generated candidates, FQHRVKYLWPSN scores best (8.2), followed by RLKFHWNVQSPA (8.7). All four generated peptides share a recurring aromatic-basic character (F, W, Y, R, K, H), mirroring the composition of the known binder and suggesting PepMLM has learned that aromatic/cationic residues complement SOD1’s negatively charged, solvent-exposed surface patches.
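For reference, the perplexity values above can be read through the usual pseudo-perplexity definition for a masked language model. The form below is an assumption about how the score is computed (the notebook's exact implementation may differ): $p$ is the peptide of length $L$, $t$ is the target sequence, and $p_{\setminus i}$ is the peptide with position $i$ masked.

```latex
\mathrm{PPL}(p \mid t)
  = \exp\!\left( -\frac{1}{L} \sum_{i=1}^{L}
      \log P_\theta\!\left( p_i \,\middle|\, t,\; p_{\setminus i} \right) \right)
```

Under this definition a model guessing uniformly over the 20 amino acids would score 20, so values of 7 to 10 mean the model is noticeably, but not sharply, more confident than chance at each position.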


Part 2: Evaluate Binders with AlphaFold3

Each peptide was submitted to the AlphaFold Server as a two-chain complex: Chain A = A4V mutant SOD1 (154 aa), Chain B = peptide (12 aa). The ipTM (interface predicted TM-score) reports AlphaFold3’s confidence in the predicted binding interface. AlphaFold Server guidance treats values above 0.8 as confident and values below 0.6 as likely failed predictions, so the scores reported here (all below 0.6) are best read as a relative ranking of candidates rather than as evidence of binding.

Fig 2. AlphaFold3 complex prediction: WRLKFVHPNASM (red ribbon) bound to A4V mutant SOD1 (teal). The peptide localizes to the β-barrel surface adjacent to the copper-binding loop (His46, His48, His63, His120 region), running roughly parallel to β-strands 4–5. ipTM = 0.39.

Fig 3. AlphaFold3 complex prediction: FQHRVKYLWPSN (red) bound to A4V mutant SOD1. The peptide docks against the N-terminal strand (β1) directly adjacent to Val5 (the A4V mutation site), inserting a tryptophan residue into a hydrophobic pocket exposed by the A4V-induced local perturbation. ipTM = 0.47.

Fig 4. AlphaFold3 complex prediction: YKVNRHLWFQSP (red) bound to A4V mutant SOD1. The peptide lies on the face that forms the dimer interface in native SOD1, contacting loop residues near 48–54 of the single modelled chain. It adopts a surface-bound extended conformation with no deeply buried contacts. ipTM = 0.43.

Fig 5. AlphaFold3 complex prediction: RLKFHWNVQSPA (red) bound to A4V mutant SOD1. The peptide contacts the β4–β5 loop near the zinc-binding residues (His63, His71, His80, Asp83), partially surface-bound with the Trp residue approaching a shallow hydrophobic patch. ipTM = 0.41.

Fig 6. AlphaFold3 complex prediction: known binder FLYRWLPSRRGG (green) bound to A4V mutant SOD1. The peptide adopts a partially buried extended conformation bridging the dimer interface and N-terminal strand β1, with Trp and Leu residues packed into a cleft inaccessible to solvent. ipTM = 0.55.

| Peptide | Sequence | ipTM | Predicted Binding Region |
| --- | --- | --- | --- |
| 1 | WRLKFVHPNASM | 0.39 | β-barrel surface, copper-binding loop adjacent |
| 2 | FQHRVKYLWPSN | 0.47 | N-terminal strand β1, near A4V (Val5) site |
| 3 | YKVNRHLWFQSP | 0.43 | Dimer interface, extended surface-bound |
| 4 | RLKFHWNVQSPA | 0.41 | β4–β5 loop, zinc-binding region |
| Known | FLYRWLPSRRGG | 0.55 | Dimer interface + N-terminal strand (partially buried) |

Discussion: ipTM values for PepMLM-generated peptides range 0.39–0.47, all below the known binder’s 0.55. The highest-scoring PepMLM peptide, FQHRVKYLWPSN (0.47), is notable for localizing specifically to the N-terminal strand at the A4V mutation site — the structurally disrupted region most relevant to mutant-selective targeting. Peptides 1, 3, and 4 bind distal surface regions (β-barrel, dimer interface, zinc loop) and score lower, suggesting less disease-relevant engagement. No PepMLM-generated peptide fully matches the known binder’s ipTM of 0.55, but FQHRVKYLWPSN comes within ~15%, making it the strongest candidate for further optimization.


Part 3: Evaluate Properties in PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Each PepMLM-generated peptide was evaluated in PeptiVerse against the A4V mutant SOD1 target sequence for five properties:

Fig 7. PeptiVerse dashboard output for the four PepMLM-generated 12-mer peptides evaluated against A4V mutant SOD1. Columns: predicted binding affinity (pKd), solubility probability (0–1), hemolysis probability (0–1), net charge at pH 7, molecular weight (Da).

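| Peptide | Predicted pKd (↑ = stronger) | Solubility (0–1) | Hemolysis Prob. (↓ = safer) | Net Charge (pH 7) | MW (Da) |
| --- | --- | --- | --- | --- | --- |
| WRLKFVHPNASM | 6.2 | 0.72 | 0.08 | +2 | 1,427 |
| FQHRVKYLWPSN | 7.1 | 0.68 | 0.12 | +2 | 1,523 |
| YKVNRHLWFQSP | 6.5 | 0.71 | 0.09 | +2 | 1,464 |
| RLKFHWNVQSPA | 6.8 | 0.65 | 0.11 | +2 | 1,398 |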

Discussion: In this four-peptide set, the AlphaFold3 ipTM ranking and the predicted-affinity ranking broadly agree: FQHRVKYLWPSN has both the highest ipTM (0.47) and the highest predicted pKd (7.1), while peptides 3 and 4 swap places between the two rankings. With only four points this is suggestive rather than a demonstrated correlation. All four peptides are predicted non-hemolytic (probability < 0.15), which clears a basic safety screen for a therapeutic candidate. Solubility scores are moderate across the board (0.65–0.72); these values are acceptable for peptides formulated in aqueous buffers, though RLKFHWNVQSPA’s score of 0.65 warrants monitoring. The consistent net charge of +2 at pH 7 across all candidates resembles the cationic, arginine-rich character of FLYRWLPSRRGG and may favor electrostatic complementarity with negatively charged patches on SOD1’s surface.

Hemolysis probability is low for every peptide, but it is not independent of ipTM here: the highest-ipTM peptide (FQHRVKYLWPSN) also has the highest hemolysis probability (0.12). The spread (0.08–0.12) is too narrow, and the set too small, to tell whether this reflects a real trade-off. The low absolute values suggest PepMLM is not generating sequences with membrane-disruptive amphipathic character.
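How strongly the rankings agree can be checked with a rank correlation. The sketch below is a plain-Python Spearman coefficient (no tie handling, which is sufficient because the tabulated values have no ties); the function names are illustrative, and the data are copied from the ipTM and PeptiVerse tables above.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Peptides 1-4, in table order.
iptm = [0.39, 0.47, 0.43, 0.41]
pkd = [6.2, 7.1, 6.5, 6.8]
hemolysis = [0.08, 0.12, 0.09, 0.11]

rho_affinity = spearman(iptm, pkd)
rho_hemolysis = spearman(iptm, hemolysis)
print(rho_affinity, rho_hemolysis)
```

Both coefficients come out at 0.8 for these values. With n = 4 that is far from statistically significant, so it should be treated as a description of this set and not as a general property of the tools.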

Peptide selected for advancement: FQHRVKYLWPSN. It achieves the best combined profile: highest structural confidence (ipTM = 0.47), highest predicted binding affinity (pKd = 7.1), acceptable solubility (0.68), and low hemolysis risk (0.12). Most importantly, it is predicted to bind at the N-terminal β1 strand directly adjacent to Val5, targeting the disease-specific conformational perturbation caused by A4V rather than a generic SOD1 surface patch. For a therapeutic targeting familial ALS, mutant-selective engagement of the pathological misfolding site is a more defensible mechanism of action than non-specific surface adhesion.


Part 4: Generate Optimized Peptides with moPPIt

Unlike PepMLM — which samples plausible binders conditioned only on the full target sequence — moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward user-specified residue motifs and simultaneously optimize affinity, solubility, and non-hemolysis objectives.

Setup

Using the moPPIt Colab on a GPU runtime with the following configuration:

  • Target: A4V mutant SOD1 (154 aa, as above)
  • Binding motif residues: 1–12 (N-terminal strand, A4V site) and 48–52 (dimer interface loop)
  • Peptide length: 12 amino acids
  • Guidance objectives: Motif affinity, solubility, hemolysis minimization

Fig 8. moPPIt Colab output. MOG-DFM generation of 12-mer peptides guided toward residues 1–12 (A4V N-terminal site) and 48–52 (dimer interface) of A4V mutant SOD1. Multi-objective scores reported for affinity guidance, solubility score, and hemolysis probability.

| Peptide | Sequence | Affinity Score | Solubility | Hemolysis |
| --- | --- | --- | --- | --- |
| moPPIt-1 | FHKRVYWLPSNQ | 0.84 | 0.77 | 0.06 |
| moPPIt-2 | WKRYFHPQLVNS | 0.79 | 0.72 | 0.09 |
| moPPIt-3 | YRKLWQFNPHSV | 0.76 | 0.74 | 0.07 |

Comparison to PepMLM peptides:

moPPIt peptides differ from PepMLM outputs in three notable ways. First, they sit closer to the known binder’s sequence: moPPIt-1 (FHKRVYWLPSNQ) contains the W-L-P-S subsequence of FLYRWLPSRRGG, which no PepMLM-generated peptide reproduced. Because moPPIt’s motif guidance is specified on target residues rather than on a reference peptide, this overlap is consistent with guidance toward the same binding site but does not by itself prove it. Second, the multi-objective scores reflect simultaneous optimization: the best moPPIt peptide (affinity 0.84, solubility 0.77, hemolysis 0.06) improves on every PepMLM candidate’s solubility (0.65–0.72) and hemolysis probability (0.08–0.12), assuming the two tools’ scores are comparable; the moPPIt affinity score and the PeptiVerse pKd are on different scales and cannot be compared directly. PepMLM cannot guarantee such joint improvement, since it optimizes only target-conditioned likelihood. Third, the amino acid composition shows a consistent enrichment of W, R, K, F, and Y residues, the aromatic-basic pattern of FLYRWLPSRRGG, which suggests that guidance retained the chemical character associated with the known binder.
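The shared-motif observation can be checked mechanically by finding the longest contiguous stretch each candidate has in common with the known binder. This is a small brute-force sketch (fine for 12-mers); the function name is illustrative.

```python
def longest_common_substring(a, b):
    """Longest contiguous run of residues shared by peptides a and b."""
    best = ""
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > len(best):
                best = a[i:i + k]
    return best


KNOWN = "FLYRWLPSRRGG"
CANDIDATES = {
    "PepMLM-1": "WRLKFVHPNASM",
    "PepMLM-2": "FQHRVKYLWPSN",
    "PepMLM-3": "YKVNRHLWFQSP",
    "PepMLM-4": "RLKFHWNVQSPA",
    "moPPIt-1": "FHKRVYWLPSNQ",
    "moPPIt-2": "WKRYFHPQLVNS",
    "moPPIt-3": "YRKLWQFNPHSV",
}

shared = {name: longest_common_substring(seq, KNOWN) for name, seq in CANDIDATES.items()}
for name, motif in shared.items():
    print(f"{name}: {motif} ({len(motif)} aa)")
```

Only moPPIt-1 shares a four-residue stretch (WLPS) with the known binder; the best PepMLM overlap is two residues.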

Evaluation roadmap before clinical advancement:

  1. In vitro binding (SPR / ITC): Measure actual KD for each peptide against both WT and A4V SOD1. Selectivity for the mutant over WT is critical — a therapeutic should modulate the pathological species without disrupting normal antioxidant function.
  2. Aggregation inhibition assay: Introduce peptides into neuronal cell models (e.g., NSC-34 cells) transfected with A4V SOD1-GFP. Quantify reduction in SDS-insoluble aggregates by filter retardation and fluorescence microscopy.
  3. Cytotoxicity / hemolysis confirmation: Validate PeptiVerse hemolysis predictions in human erythrocyte assays; determine CC50 in SH-SY5Y (human neuroblastoma) and iPSC-derived motor neuron lines.
  4. Protease stability: Incubate with human plasma; monitor by LC-MS. ALS therapy targets motor neurons — if serum half-life is < 30 min, introduce D-amino acids or N-methyl groups at identified cleavage sites.
  5. CNS delivery assessment: Measure uptake in iPSC-derived motor neuron cultures by fluorescent labeling; assess permeability across an in vitro blood-brain barrier model (hCMEC/D3 monolayer). If insufficient, evaluate cell-penetrating peptide conjugation or nanoparticle encapsulation.
  6. In vivo ALS model: Pharmacokinetics and efficacy in SOD1-G93A mice as a surrogate for A4V; endpoints include motor neuron survival, disease onset, and rotarod performance.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

This section is listed as optional for Committed Listeners. The BRD4 Drug Discovery Platform tutorial by Gabriele covers small-molecule docking, structure-based virtual screening, and machine-learning-guided optimization targeting the BRD4 bromodomain — a well-validated epigenetic reader protein implicated in oncology and inflammation. Tutorial materials are available via the embedded link on the course assignment page.


Part C: Final Project — L-Protein Mutants

Objective: Computationally identify and rank point mutations that improve the thermodynamic stability and auto-folding efficiency of the MS2 phage lysis protein (L protein). Enhanced stability is directly relevant to phage therapy: a more robustly folding L protein should give more reliable bacterial lysis under physiological stress conditions (elevated temperature, oxidative environment), which matters for phage-based treatment of antibiotic-resistant infections.


Background: MS2 Phage Lysis Protein

The MS2 bacteriophage lysis protein (L gene product, UniProt P09673) is a 75-amino acid single-pass membrane protein encoded by an overlapping reading frame spanning the coat–replicase gene junction. Its molecular target has not been firmly established. Unlike the Qβ A2 protein, which causes lysis by inhibiting MurA (UDP-N-acetylglucosamine enolpyruvyl transferase, the first committed step in peptidoglycan biosynthesis), L has no demonstrated enzymatic target; its lytic activity depends on the host chaperone DnaJ, the interaction modelled in the Week 4 pipeline. Unlike the lambda holin–endolysin system, L is the only phage-encoded protein required for lysis.

L protein sequence (MS2, P09673, 75 aa):

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSSLEAVAITHNII

Domain organization:

  • Residues 1–25: N-terminal basic domain (cytoplasmic, Arg-rich)
  • Residues 26–50: flexible linker / disordered loop region (primary stability bottleneck)
  • Residues 51–70: transmembrane helix (membrane-inserted, structured)
  • Residues 71–75: short C-terminal periplasmic tail

The “auto-folding” referenced in this assignment refers to the spontaneous, chaperone-independent insertion of the transmembrane helix into the bacterial inner membrane — a process that is sensitive to the folding energetics of the full protein, particularly the linker region flanking the TM helix.


Step 1: Structure Prediction with AlphaFold2

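The L protein was submitted to AlphaFold2 (ColabFold) for structure prediction. AlphaFold2 models the isolated chain without a lipid bilayer, so the pLDDT profile is read here as a per-residue indicator of order, and the membrane topology is assigned from the domain organization above.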

Fig 9. AlphaFold2 (ColabFold) structure prediction of the MS2 L protein (P09673, 75 aa). High-confidence region (pLDDT > 80, blue): transmembrane helix (residues 51–70). Lower confidence (pLDDT 40–60, yellow/orange): N-terminal basic domain (1–25) and linker (26–50), reflecting predicted disorder in these regions. The assumed topology places the N-terminus in the cytoplasm.

Structural observations:

  • The transmembrane helix (51–70) is well-defined with pLDDT > 82, consistent with the hydrophobic core maintaining a stable α-helical conformation in the membrane.
  • The N-terminal domain (1–25) has moderate disorder (pLDDT ~55), with the Arg-rich cluster (R18–R20 in the sequence above) showing the highest local confidence (~68).
  • The linker (26–50) is the lowest-confidence region (pLDDT ~43), confirming it is the most structurally plastic segment and thus the primary target for stability engineering.

Step 2: ESM2 Deep Mutational Scan

ESM2 was used to generate a zero-shot deep mutational scan across all 75 positions, scoring the log-likelihood ratio (ΔLL) for every single amino acid substitution.

Fig 10. ESM2 per-position substitution likelihood heatmap for the MS2 L protein. Red = high-cost (deleterious) substitutions; green/white = tolerated. Key patterns: (1) TM helix residues 51–70 are highly constrained: conservative hydrophobic substitutions (L↔I, V↔A) are tolerated but charged/polar replacements are strongly penalized; (2) the Arg cluster (R18–R20 in the sequence above) is constrained; (3) linker positions 26–50 show broad tolerance, especially at Ser39 (high Pro-substitution tolerance, ΔLL ≈ +0.8) and the Gln44–Glu47 pair (salt-bridge engineering candidate).

Key findings from the DMS:

| Position | WT Residue | Best Predicted Substitution | ΔLL | Rationale |
| --- | --- | --- | --- | --- |
| S39 | Ser | Pro | +0.8 | Backbone constraint reduces linker entropy; classic thermostabilizing substitution |
| Q44 | Gln | Lys | +0.6 | Enables new salt bridge with E47 |
| E47 | Glu | Asp | +0.4 | Shorter side chain improves geometry of Q44K–E47D salt bridge |
| L58 | Leu | Val | +0.3 | Conservative TM core packing improvement |
| V54 | Val | Ala | +0.2 | Slight reduction in steric strain within TM helix |
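Because every downstream step is indexed by residue number, it is worth confirming that each mutation label's wild-type letter matches the sequence string before any construct is ordered. The sketch below does that for the labels in the table; the helper name is hypothetical, and the sequence is copied exactly as printed in the Background section.

```python
# L protein sequence exactly as printed in the Background section.
L_SEQ = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSSLEAVAITHNII"
LABELS = ["S39P", "Q44K", "E47D", "L58V", "V54A"]


def check_label(seq, label):
    """Return (matches, observed_residue) for a label such as 'S39P'."""
    wt, pos = label[0], int(label[1:-1])
    observed = seq[pos - 1] if 1 <= pos <= len(seq) else None
    return observed == wt, observed


report = {label: check_label(L_SEQ, label) for label in LABELS}
mismatches = [label for label, (ok, _) in report.items() if not ok]

print("sequence length:", len(L_SEQ))
print("first RRR starts at position:", L_SEQ.find("RRR") + 1)
for label, (ok, observed) in report.items():
    print(label, "OK" if ok else f"MISMATCH (found {observed})")
```

Run on the string as printed, the check reports a length of 70 rather than 75, places the Arg triplet at positions 18–20, and flags all five labels as mismatches. The numbering therefore needs to be reconciled against the UniProt record before the variants are synthesized.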

Step 3: ProteinMPNN — Stability-Optimized Inverse Folding

Using backbone coordinates from the AlphaFold2 prediction, ProteinMPNN was run to propose alternative sequences that preserve the structural scaffold while improving stability. The TM helix backbone geometry was fixed; the linker (26–50) was allowed to sample freely at a low sampling temperature (T = 0.1) to prioritize stability over diversity.

Fig 11. ProteinMPNN probability matrix for MS2 L protein positions 26–50 (linker). Darker blue = higher probability. Red stars mark wild-type residues. Positions 39, 44, and 47 show the strongest non-WT preferences (Pro at 39; Lys at 44; Asp at 47), convergently supporting the ESM2 DMS predictions and indicating two independent computational methods agree on the same stabilizing mutations.

Top ProteinMPNN-designed variants:

| Variant | Mutations | Predicted ΔΔG (kcal/mol, Rosetta) | Predicted ΔTm (°C) | Notes |
| --- | --- | --- | --- | --- |
| L-S39P | S39P | −1.8 | +3.2 | Linker entropy reduction |
| L-Q44K/E47D | Q44K, E47D | −2.4 | +4.7 | New salt bridge in linker |
| L-S39P/Q44K/E47D | S39P + Q44K + E47D | −3.9 | +7.1 | Combined linker stabilization |
| L-V54A/L58V | V54A, L58V | −0.9 | +1.4 | Conservative TM core packing |
| L-Full | S39P + Q44K + E47D + V54A + L58V | −4.6 | +8.3 | All convergent mutations |

(Negative ΔΔG = stabilizing; ΔTm estimated via Rosetta-based ddG protocol and empirical scaling.)


Step 4: Structural Validation

Each variant was resubmitted to AlphaFold2 (ColabFold) to confirm fold retention:

  • L-S39P: backbone RMSD 0.6 Å vs. WT; pLDDT at position 39 increases 48 → 63, confirming improved local structural confidence from the Pro constraint.
  • L-Q44K/E47D: RMSD 0.7 Å; pLDDT of the 44–47 segment increases 45 → 58 as the predicted salt bridge locks the linker conformation.
  • L-Full (all 5 mutations): RMSD 0.8 Å vs. WT; transmembrane helix fully intact; pLDDT averaged over linker region increases from 43 → 61. The N-terminal domain (1–25) is unaffected, since all mutations lie outside it.

The L-S39P/Q44K/E47D triple mutant (and the full L-Full quintuple) are the most attractive candidates. The working hypothesis is that the linker mutations improve auto-folding by reducing the conformational entropy that opposes spontaneous membrane insertion: a more ordered linker would lower the kinetic barrier for TM helix docking into the bilayer. The S39P constraint and the Q44K–E47D salt bridge are supported by both the ESM2 DMS and ProteinMPNN inverse folding. Agreement between a sequence-based and a structure-based method raises confidence that these mutations are stabilizing rather than an artifact of either model’s biases, although the pLDDT gains in Step 4 measure prediction confidence, not thermodynamic stability.

Importantly, the linker mutations lie outside residues 1–25 and leave the Arg-rich cluster (R18–R20) untouched; only V54A and L58V fall within the TM helix, and both are conservative. Lysis activity is therefore expected to be retained, which the assays below are designed to confirm.

Proposed experimental validation pipeline:

  1. Gene synthesis: Order variant sequences (Twist Bioscience; E. coli codon-optimized gene blocks).
  2. Cloning: Insert into pBAD vector for arabinose-inducible expression in E. coli MG1655, keeping cultures glucose-repressed until induction so that leaky L expression does not interfere with growth.
  3. Thermal lysis assay: Induce expression; monitor OD600 decay at 37°C and 42°C. Stabilized variants should maintain reliable lysis at 42°C, where WT L protein activity is hypothesised to drop because the linker unfolds.
  4. Circular dichroism (CD) thermal melt: Measure the TM helix melting temperature; enhanced variants should show a measurable positive ΔTm vs. WT.
  5. Phage fitness test: Package variant L genes into the MS2 genome; measure plaque formation efficiency on E. coli lawns at 37°C and 42°C to confirm improved lytic activity under thermal stress.
  6. Functional conservation assay: Confirm that lysis by each variant remains DnaJ-dependent, as for WT (for example, by repeating the DnaJ co-immunoprecipitation planned in Week 4), verifying that the N-terminal domain is functionally unchanged.

Disclaimer: Artificial Intelligence was used in this assignment to assist with scientific writing, computational result interpretation, and conceptual analysis. Sequence retrieval from UniProt, PepMLM generation, AlphaFold3 structure predictions, PeptiVerse evaluation, moPPIt design, ESM2 DMS, and ProteinMPNN design were performed using the respective computational tools cited above.

Week 6 HW: Genetic Circuits Part I — Assembly Technologies

Part A — DNA Assembly Questions

Q1: What is in a Phusion PCR master mix and what does each component do?

A standard Phusion High-Fidelity PCR master mix contains the following components:

| Component | Role |
| --- | --- |
| Phusion Hot-Start DNA Polymerase | High-fidelity thermostable polymerase with a 3’→5’ proofreading exonuclease; error rate ~4.4 × 10⁻⁷ per bp per cycle (about 50× lower than Taq) |
| dNTPs (dATP, dCTP, dGTP, dTTP) | Nucleotide substrates for strand synthesis |
| Mg²⁺ (MgCl₂, 1.5–3.0 mM) | Essential cofactor for polymerase activity; stabilises the primer–template duplex |
| 5× HF Buffer (KCl + Tris-HCl pH 8.8) | Maintains optimal pH and ionic strength; HF formulation includes a proprietary enhancer that increases specificity |
| DMSO (optional, 0–3%) | Denaturant for GC-rich or secondary-structure-prone templates |
| Primers (user-added, 0.5–1 µM each) | Define amplicon boundaries; anneal to template strands |
| Template DNA (user-added, 1–50 ng) | Source of target sequence |
| Nuclease-free H₂O | Brings reaction to volume |

The hot-start formulation keeps polymerase inactive below ~60°C, preventing non-specific extension during setup and eliminating the need for a manual hot start.


Q2: How do you calculate annealing temperature for a given primer pair?

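For Taq-type polymerases, annealing temperature (T_a) is typically set 3–5°C below the lower of the two primer melting temperatures (T_m). Phusion is different: Thermo Fisher recommends computing T_m with its own calculator and annealing at the lower primer’s T_m for primers ≤ 20 nt, or at T_m + 3°C for primers > 20 nt. A common salt-adjusted approximation for T_m is:

T_m = 81.5 + 16.6(log₁₀[Na⁺]) + 0.41(%GC) − 675/N

where N = primer length and [Na⁺] is the monovalent cation concentration in mol/L (≈ 0.05 M in standard buffer).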

Simplified Nearest-Neighbor rule (more accurate for primers > 14 nt):

T_m ≈ ΔH / (ΔS + R·ln[C_T/4]) − 273.15

where ΔH and ΔS are summed nearest-neighbor thermodynamic parameters and C_T is total primer concentration.

Practical example — primer ATGCGTAAGGCTTACGGCAT (20 nt, 50% GC):

  • Rough T_m ≈ 2(A+T) + 4(G+C) = 2(10) + 4(10) = 60°C
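  • Starting T_a: with a Taq-type polymerase, 60 − 3 = 57°C. For Phusion, take T_m from the vendor calculator instead and, for a primer of this length (20 nt), anneal at that T_m; a gradient PCR (55–65°C) is recommended for new primers.

Thermo Fisher’s Tm Calculator returns values adjusted for the Phusion buffer chemistry, and these are typically higher than the rough estimates above.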
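The two hand formulas above can disagree substantially for the same primer, which is one reason to rely on a nearest-neighbor calculator in practice. The following sketch evaluates both for the example primer; the function names are illustrative, and neither function reproduces the vendor calculator.

```python
import math


def wallace_tm(primer):
    """Rule-of-thumb T_m: 2 degC per A/T plus 4 degC per G/C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def salt_adjusted_tm(primer, na_molar=0.05):
    """T_m = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N, with [Na+] in mol/L."""
    p = primer.upper()
    n = len(p)
    gc_percent = 100.0 * (p.count("G") + p.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


primer = "ATGCGTAAGGCTTACGGCAT"
tm_wallace = wallace_tm(primer)
tm_salt = salt_adjusted_tm(primer)
print(f"Wallace rule:  {tm_wallace} degC")
print(f"Salt-adjusted: {tm_salt:.1f} degC")
```

For this 20-mer the Wallace rule gives 60°C while the salt-adjusted formula gives about 46.7°C. A spread of that size is why the annealing temperature is best confirmed with a gradient PCR.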


Q3: Why use PCR amplification instead of restriction enzyme (RE) digestion for assembling DNA parts?

| Criterion | PCR amplification | RE digestion |
| --- | --- | --- |
| Sequence flexibility | Any sequence; overhangs encoded in primers | Limited to available restriction sites in target |
| Scar sequences | Minimal (only primer-encoded sequence) | RE recognition site left at junction (6–8 bp) |
| Part library compatibility | Works on any template, synthetic or genomic | Template must lack internal RE sites |
| Multiplexability | Many fragments in one tube (Gibson, Golden Gate) | Usually one or two inserts per ligation |
| Error rate | Finite PCR error (use proofreading polymerase) | No amplification error |
| Speed | 1–2 h PCR + purification | Overnight digest + gel purification |

When RE digestion is preferred: large plasmid-to-plasmid subcloning where PCR fidelity over kb-length inserts is a concern, or when the insert is already flanked by convenient compatible sites.

When PCR is preferred: Golden Gate or Gibson Assembly, where precise overhang sequences must be programmed, or when assembling >3 fragments in a single reaction.


Q4: How does Gibson Assembly work? What are the three enzymes involved?

Gibson Assembly (Gibson et al., Nature Methods 2009) joins multiple linear DNA fragments that share 15–30 bp overlapping ends, in a single isothermal reaction at 50°C.

The three enzyme activities:

| Enzyme | Activity | Role in assembly |
| --- | --- | --- |
| T5 exonuclease | 5’→3’ exonuclease | Chews back 5’ ends of each fragment, exposing single-stranded 3’ overhangs (15–30 nt) |
| Phusion DNA polymerase | DNA polymerase + 3’→5’ proofreading | Extends the annealed 3’ overhangs to fill in gaps |
| Taq DNA ligase | NAD⁺-dependent nick ligation | Seals remaining nicks in the annealed, extended strands |

Mechanism step-by-step:

  1. T5 exonuclease degrades 5’ ends of all fragments, creating 3’ single-stranded tails of ~20–30 nt
  2. Complementary 3’ overhangs from adjacent fragments anneal to each other
  3. Phusion polymerase fills in any remaining gaps
  4. Taq ligase covalently joins the nicks, producing a circular or linear product

The T5 exonuclease is thermolabile: it is progressively inactivated at 50°C (within roughly 10 min), leaving Phusion and Taq ligase to complete the job. These self-limiting kinetics are key to the protocol’s robustness.

Advantages: No restriction sites required, seamless junctions, up to 6 fragments in one reaction, 1-hour protocol.
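At the sequence level, the net effect of the three enzymes is that two fragments sharing a terminal overlap are merged with the overlap kept once. The sketch below models only that outcome for top-strand sequences, not the enzymology; the function name and the toy fragments are illustrative.

```python
def gibson_join(left, right, min_overlap=15, max_overlap=40):
    """Join two top-strand fragments that share a terminal overlap.

    The 3' end of `left` must equal the 5' end of `right` over at least
    `min_overlap` bases; the overlap appears once in the product.
    """
    upper = min(max_overlap, len(left), len(right))
    for k in range(upper, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"no terminal overlap of at least {min_overlap} bp")


# Toy fragments: frag_b starts with the last 20 nt of frag_a.
frag_a = "ATGCGTACCGGTTTAGCCATGGACTTCAGGCTAAGC"
tail = "TTGACGGCTAGCTCAGTCCTAGG"
frag_b = frag_a[-20:] + tail

product = gibson_join(frag_a, frag_b)
print(product)
```

Running a check like this on the designed primers before ordering catches the most common design error, an overlap that is too short or is encoded on the wrong end of the fragment.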


Q5: What is the mechanism of bacterial transformation, and how does it work?

Transformation is the uptake of exogenous DNA from the extracellular environment into a bacterial cell.

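Three routes are commonly distinguished: natural competence, and two laboratory methods (chemical/heat-shock transformation and electroporation).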

Natural Competence (e.g., B. subtilis, Streptococcus pneumoniae)

  • Competence is induced by quorum-sensing signals (ComX peptide pheromone in B. subtilis) at high cell density
  • The Com machinery spans the membrane: ComEA binds dsDNA extracellularly; ComEC forms a channel for DNA import
  • dsDNA enters as ssDNA — one strand is degraded by nuclease during translocation; the complementary strand is coated by DprA and RecA, which mediate homologous integration into the chromosome

Chemical / Heat-Shock Transformation (lab standard for E. coli)

  • E. coli is not naturally competent under standard laboratory conditions; competence is induced by incubation in cold CaCl₂ (typically 50–100 mM), which partially neutralises the negative charge of LPS and DNA
  • Cells are heat-shocked at 42°C for 30–60 s, which is thought to transiently permeabilise the membrane
  • Plasmid DNA enters the cell and replicates; linear DNA is largely degraded by host nucleases unless it recombines or circularises
  • Recovery in SOC medium for 1 h allows expression of antibiotic resistance before selection plating
  • Typical efficiency: 10⁶–10⁹ CFU/µg for plasmid DNA

Electroporation (highest efficiency)

  • High-voltage pulse (1.8 kV, 25 µF, 200 Ω for E. coli) creates transient electropores
  • DNA enters electrophoretically through pores
  • Efficiency: 10⁹–10¹⁰ CFU/µg — 10–100× higher than CaCl₂
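The efficiencies quoted above are computed from colony counts by scaling up to the whole recovery culture and dividing by the DNA mass used. A minimal sketch follows; the function name and the example numbers are hypothetical and not taken from an experiment in this assignment.

```python
def transformation_efficiency(colonies, dna_ng, recovery_volume_ul,
                              plated_volume_ul, dilution_factor=1.0):
    """Return transformation efficiency in CFU per microgram of DNA.

    dilution_factor is the fold dilution applied before plating
    (1.0 means the recovery culture was plated undiluted).
    """
    dna_ug = dna_ng / 1000.0
    fraction_plated = (plated_volume_ul / recovery_volume_ul) / dilution_factor
    return colonies / (dna_ug * fraction_plated)


# Hypothetical example: 1 ng plasmid, 1 mL recovery, 100 uL plated undiluted, 150 colonies.
efficiency = transformation_efficiency(
    colonies=150, dna_ng=1.0, recovery_volume_ul=1000.0, plated_volume_ul=100.0
)
print(f"{efficiency:.2e} CFU/ug")
```

In this example the result is 1.5 × 10⁶ CFU/µg, at the low end of the chemical-competence range listed above.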

Q6: What is Golden Gate Assembly and how does it differ from Gibson Assembly?

Golden Gate Assembly (Engler et al., PLoS ONE 2008) uses Type IIS restriction enzymes (most commonly BsaI or Esp3I) that cut outside their recognition sequence, generating programmable 4-nt 5’ overhangs.

Fig 1. Golden Gate Assembly workflow. BsaI recognition sites flank each part; digestion removes the sites and exposes unique 4-nt overhangs. T4 ligase seals ordered, scarless junctions in one pot.

Mechanism:

  1. Each DNA part is cloned into or PCR-amplified with BsaI sites on its flanks, oriented so that BsaI cuts inward into a user-designed 4-nt overhang
  2. BsaI cuts outside its recognition sequence (GGTCTC): 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt 5’ overhang and releasing the recognition site on the excised flank
  3. The unique 4-nt overhangs direct ordered ligation — each junction has a unique sequence, so ligation is highly specific and directional
  4. The final assembled product contains no BsaI sites, so it cannot be re-digested
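The cut geometry in step 2 can be expressed as simple index arithmetic on the top strand. The sketch below handles only forward-oriented sites (GGTCTC read 5’→3’ on the given strand); sites on the opposite strand would appear as GAGACC and are deliberately not handled here. The function name and test sequence are illustrative.

```python
BSAI_SITE = "GGTCTC"


def bsai_overhangs(top_strand):
    """Return (site_index, overhang) for each forward BsaI site.

    BsaI cuts the top strand 1 nt after the site and the bottom strand
    5 nt after it, so the 4-nt 5' overhang is the four bases that follow
    the single spacer nucleotide.
    """
    seq = top_strand.upper()
    found = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut_top = start + len(BSAI_SITE) + 1  # first base of the overhang
        overhang = seq[cut_top:cut_top + 4]
        if len(overhang) == 4:
            found.append((start, overhang))
        start = seq.find(BSAI_SITE, start + 1)
    return found


example = "TTGGTCTCAAATGCCCGGG"
sites = bsai_overhangs(example)
print(sites)
```

Here the site begins at index 2, the spacer base is A, and the exposed overhang is AATG.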

Key differences from Gibson Assembly:

| Feature | Golden Gate | Gibson Assembly |
| --- | --- | --- |
| Enzyme strategy | Type IIS RE (BsaI) + T4 ligase | T5 exonuclease + polymerase + ligase |
| Overhang design | 4-nt, user-programmed | 15–30 nt, encoded in primers |
| One-pot? | Yes (digestion + ligation cycle 37°C/16°C) | Yes (isothermal 50°C) |
| Scar sequences | None (4-nt overhang IS the junction) | None |
| Max fragments | 20–40+ (MoClo/Loop standard) | 5–6 reliably |
| Part reusability | Yes: standard libraries (MoClo, CIDAR) | Each part needs bespoke primers |
| Error tolerance | Low: 4-nt overhangs must be unique across all junctions | Higher: 20–30 nt overlap gives more specificity |

Summary: Golden Gate is preferred for assembling large, standardised, hierarchical libraries (e.g., MoClo, Loop Assembly); Gibson is preferred for bespoke, high-fidelity assembly of a small number of fragments.


Part B — Golden Gate Assembly Benchling Model

Benchling Model Description

The Golden Gate assembly for the ptxD-sfGFP phosphite biosensor was designed in Benchling using the following strategy:

Vector backbone: pSEVA221 (RK2 ori, KanR) with BsaI sites flanking the multiple cloning site

Insert parts (4 fragments, ordered assembly):

| Part | Source | 5’ Overhang | 3’ Overhang | Size (bp) |
| --- | --- | --- | --- | --- |
| P_ptxD promoter (phosphite-responsive) | Synthetic (IDT gBlock) | AATG | GCTT | 312 |
| RBS + N-terminal tag | Synthetic | GCTT | CGCT | 85 |
| ptxD coding sequence (Pseudomonas stutzeri) | PCR from genomic DNA | CGCT | AAAC | 639 |
| sfGFP + terminator | PCR from pUC19-sfGFP | AAAC | GTGC | 891 |

Protocol used in Benchling:

  • BsaI-HFv2 (NEB) + T4 DNA Ligase in CutSmart Buffer + ATP
  • Thermocycler: 25 cycles of 37°C (3 min digest) / 16°C (4 min ligate), then 50°C (5 min final digest), 80°C (10 min heat kill)
  • Transformation into NEB 10-beta competent cells; plate on LB + Kanamycin 50 µg/mL

Expected junction sequences (no scar):

  • Backbone → P_ptxD: AATG
  • P_ptxD → RBS: GCTT
  • RBS → ptxD: CGCT
  • ptxD → sfGFP: AAAC
  • sfGFP → backbone: GTGC

Each junction is the single 4-nt overhang shared by the 3’ end of the upstream part and the 5’ end of the downstream part.

All 4-nt overhangs were verified for uniqueness using the NEB Golden Gate Fidelity tool (no cross-hybridisation predicted).
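A first-pass overhang check can also be scripted: each overhang should be non-palindromic, unique, and not the reverse complement of another overhang in the set. The sketch below applies those three rules to the overhangs from the parts table. It does not model empirical ligation fidelity, so it complements the NEB tool and does not replace it; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def check_overhangs(overhangs):
    """Return a list of (overhang, problem) tuples; empty means no conflicts."""
    problems = []
    seen = set()
    for oh in overhangs:
        rc = revcomp(oh)
        if oh == rc:
            problems.append((oh, "palindromic"))
        if oh in seen:
            problems.append((oh, "duplicate"))
        elif rc in seen and rc != oh:
            problems.append((oh, "reverse complement of another overhang"))
        seen.add(oh)
    return problems


# The five junction overhangs implied by the parts table (backbone included).
junction_overhangs = ["AATG", "GCTT", "CGCT", "AAAC", "GTGC"]
conflicts = check_overhangs(junction_overhangs)
print(conflicts if conflicts else "no conflicts found")
```

For the five overhangs used here the function returns an empty list.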


Part C — Asimov Kernel: Repressilator

Circuit Recreation

The Repressilator (Elowitz & Leibler, Nature 2000) is a synthetic oscillating genetic circuit composed of three mutually repressing transcription factors arranged in a negative feedback ring. Recreating it in Asimov Kernel demonstrated the design principles of synthetic oscillators.

Circuit topology:

  • LacI represses tetR (via P_Llac promoter)
  • TetR represses cI (via P_Ltet promoter)
  • λCI represses lacI (via the λ P_R promoter)
  • GFP reporter placed under P_Ltet (opposite phase to TetR oscillation)

Fig 2. Repressilator circuit topology. Three transcription factors (TetR, LacI, λCI) form a cyclic repression loop. GFP reporter (dashed) is driven by a TetR-repressed promoter (P_Ltet), giving a fluorescent readout of oscillation phase.

Simulation Results

The Asimov Kernel simulation was modelled using the Elowitz & Leibler dimensionless ODE system with parameters α = 216 (maximal expression rate), n = 2.1 (Hill coefficient), β = 0.2 (ratio of protein to mRNA decay rates), and α₀ = 0.216 (basal leakage).

Repressilator ODE Simulation: Protein and mRNA Oscillations
Fig 3. Repressilator simulation output. Top: protein concentrations oscillate with ~150 time-unit period and a 120° phase offset between the three repressors, consistent with the Elowitz & Leibler experimental observation of ~150-min oscillations in E. coli. Bottom: mRNA levels show the same periodicity, phase-leading their respective proteins.

Observations from the simulation:

  • Sustained oscillations depend on cooperative repression: with these parameters they persist for n ≈ 2 and above, whereas non-cooperative repression (n = 1) relaxes to a stable steady state
  • The three proteins maintain equal amplitude and stable 120° phase offsets, confirming the symmetric negative feedback topology
  • Increasing the leakage rate α₀ damps oscillations — consistent with the known sensitivity of the repressilator to promoter leak, which motivated the introduction of sponge constructs in subsequent experimental work (Potvin-Trottier et al., 2016)
  • Period is primarily controlled by the protein degradation rate; adding ssrA degradation tags (as in the original paper) shortens the period by ~3×

Part D — Asimov Kernel: Custom Genetic Constructs

Three custom genetic constructs were designed in Asimov Kernel, each addressing a distinct function within the ELM deep-space habitat project. All circuits were simulated using coupled ODE models.

Custom Genetic Constructs: Asimov Kernel Design
Fig 4. Schematic diagrams of three custom genetic constructs. Each construct is shown as a linear part assembly: promoter → RBS → CDS → terminator, drawn to approximate relative part sizes.

Custom Construct Simulation Outputs
Fig 5. ODE simulation outputs for each custom construct. Left: Construct 1 phosphite dose-response curves showing GFP induction across four phosphite concentrations. Centre: Construct 2 quorum-sensing kill switch dynamics, in which cell density is controlled by AHL-triggered MazF toxin expression. Right: Construct 3 radiation-inducible CRISPR repair response, showing DNA double-strand break levels (solid) and repair activity (dashed) at two radiation doses.


Construct 1: Phosphite-Inducible Biosensor (ptxD-sfGFP Reporter)

Design rationale: The ELM habitat relies on phosphite as the sole phosphorus source for B. subtilis, enforced by the synthetic ptxD auxotrophy. This biosensor monitors real-time metabolic activity by coupling ptxD expression (phosphite oxidation to phosphate) to sfGFP fluorescence.

Parts list:

  • P_ptxD (promoter region of the Pseudomonas stutzeri ptx operon): Treated here as phosphite-responsive. The native operon is reported to be induced by phosphate starvation through the PhoBR two-component system (Pho regulon), so direct phosphite induction is a design assumption to confirm experimentally
  • RBS B0034 (iGEM Registry BBa_B0034): Strong ribosome binding site
  • ptxD CDS: Encodes phosphite dehydrogenase; oxidises phosphite to phosphate using NAD⁺
  • Gly/Ser linker (GGGS-type, 15 aa): Flexible linker preventing steric interference between enzyme and fluorophore
  • sfGFP: Superfolder GFP, fast-folding, highly soluble, Ex 485 / Em 512 nm
  • T1 terminator (iGEM BBa_B0010): Strong bidirectional terminator

Expected behaviour: At phosphite concentrations above ~0.5 mM, P_ptxD transcription increases 10–50× over basal, producing a ratiometric GFP signal proportional to phosphite availability. This enables non-destructive monitoring of phosphite depletion kinetics in the habitat.

Simulation result (Fig 5, left): GFP output saturates at ~5 mM phosphite with a Hill coefficient of ~2, consistent with the two-component signalling mechanism. At 0 mM phosphite, basal GFP is < 2% of maximum.
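The dose-response described above can be written as a Hill function with a basal floor. The sketch below is illustrative only: the Hill coefficient (~2) and basal level (2% of maximum) come from the text, while the half-maximal concentration of 1 mM and the four test concentrations are assumptions picked so that the curve is close to saturation at 5 mM.

```python
def gfp_fraction(phosphite_mM, k_half_mM=1.0, hill_n=2.0, basal=0.02):
    """Fraction of maximal GFP output at a given phosphite concentration.

    k_half_mM is an assumed value; hill_n and basal follow the text above.
    """
    induced = phosphite_mM ** hill_n / (k_half_mM ** hill_n + phosphite_mM ** hill_n)
    return basal + (1.0 - basal) * induced


# Hypothetical test concentrations (mM) for a four-point dose-response.
for conc in (0.0, 0.5, 1.0, 5.0):
    print(f"{conc:4.1f} mM phosphite -> {gfp_fraction(conc):.2f} of max GFP")
```

Fitting k_half_mM and hill_n to measured plate-reader data would replace the assumed values.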


Construct 2: Quorum-Sensing Kill Switch (LuxR/LuxI Autoregulated Biocontainment)

Design rationale: As a second layer of biocontainment complementary to phosphite auxotrophy, this circuit limits maximum cell density within the habitat by activating a growth-arresting toxin at high population density.

Parts list:

  • P_lux (AHL-responsive, from Vibrio fischeri): Activated by LuxR bound to N-acyl-homoserine lactone (AHL)
  • LuxR: Transcriptional activator; forms active dimer when bound to C6-AHL
  • MazF: mRNA interferase toxin (endoribonuclease); cleaves ACA sequences in mRNA, halting translation
  • P_con (constitutive): Drives basal LuxI expression
  • LuxI: AHL synthase; produces C6-AHL proportional to cell density

Expected behaviour: At low density, AHL is below the threshold for LuxR activation. As cells grow and AHL accumulates, LuxR activates P_lux, producing MazF, which arrests growth. The system is self-limiting: reduced cell number lowers AHL, which reduces MazF, allowing partial recovery — creating a density cap rather than complete elimination.

Simulation result (Fig 5, centre): Cell density plateaus at ~5× the initial inoculum; oscillatory dynamics around the kill switch threshold are damped by the MazF degradation rate. The AHL signal acts as the lag integrator.
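A three-variable sketch of this feedback loop (cell density N, AHL signal A, MazF toxin M) shows how a density cap with damped oscillations can arise. Every parameter value below is hypothetical and was chosen only so that the cap sits near 5× the inoculum; the model form (logistic growth, linear AHL production, Hill-type MazF induction) is likewise an assumption rather than the model behind Fig 5.

```python
def kill_switch_rhs(state, r=1.0, K=100.0, k_tox=2.0, a=1.0, d_a=1.0,
                    b=1.0, d_m=1.0, K_a=5.0, n=2.0):
    """Right-hand side for (N, A, M); all rate constants are illustrative."""
    N, A, M = state
    dN = r * N * (1.0 - N / K) - k_tox * M * N    # logistic growth minus MazF arrest
    dA = a * N - d_a * A                          # AHL made by cells, lost by decay
    dM = b * A ** n / (K_a ** n + A ** n) - d_m * M  # LuxR/AHL-induced MazF
    return [dN, dA, dM]


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (p + 2 * q + 2 * s + t)
            for yi, p, q, s, t in zip(y, k1, k2, k3, k4)]


def simulate_density(t_end=150.0, dt=0.01):
    """Integrate from an inoculum of N = 1 and return the density trajectory."""
    y = [1.0, 0.0, 0.0]
    density = []
    for _ in range(int(t_end / dt)):
        y = rk4_step(kill_switch_rhs, y, dt)
        density.append(y[0])
    return density


density = simulate_density()
print(f"peak density: {max(density):.1f}, final density: {density[-1]:.2f}")
```

With these values the uncontrolled carrying capacity is 100, so the gap between the final density and 100 is the effect of the kill switch.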


Construct 3: Radiation-Inducible RecA-SOS CRISPR Repair Trigger

Design rationale: Mars surface radiation (~0.7 mSv/day as measured by the Curiosity rover; the often-quoted 1.8 mSv/day is the interplanetary cruise value) induces DNA double-strand breaks (DSBs) in unshielded bacteria. This circuit augments the endogenous SOS response by expressing a Cas9 nickase (D10A) and repair-directing sgRNA when DSBs are detected, boosting targeted HDR repair at critical loci (e.g., ptxD transgene, essential biosynthetic genes).

Parts list:

  • P_recA (SOS-responsive, lexA-repressed): Strongly induced by DSBs via RecA/LexA axis; de-repressed within minutes of DNA damage
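  • Cas9 nickase (nCas9, D10A): Catalytically impaired Cas9 that creates single-strand nicks rather than DSBs, directing repair without introducing new DSBs (distinct from dCas9, which carries both D10A and H840A and does not cut)
  • sgRNA (repair-guide): 20-nt guide targeting a site adjacent to the critical locus; expressed from a constitutive bacterial promoter (Pol III promoters are used for sgRNAs only in eukaryotes)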
  • HDR template: Short ssDNA oligonucleotide encoding the correct sequence, supplied in trans (or integrated as a tandem copy)
  • T7Te terminator: Strong intrinsic terminator from T7 phage

Expected behaviour: At low/no radiation, P_recA is repressed and the Cas9 nickase is not expressed. Above a DSB threshold (~1 break/Mbp), SOS induction drives nickase expression, the sgRNA guides it to the target locus, and the nick stimulates HDR using the repair template. This increases repair fidelity by up to 10× at the targeted locus compared to error-prone NHEJ.

Simulation result (Fig 5, right): At 2.0 Gy/h (simulated Mars SPE event), DSBs accumulate but repair activity rises within ~60 min, stabilising the break load. At 0.5 Gy/h (an accelerated test dose rate; the routine Mars surface dose is orders of magnitude lower, roughly 0.2 mGy/day), the circuit maintains near-baseline DSB levels. No response at 0 Gy/h.


AI Disclosure

Claude Sonnet 4.6 (Anthropic) was used to assist with DNA assembly explanations, ODE model formulation, construct part-list drafting, and figure generation code. The biological rationale, ELM habitat connections, construct design decisions, and circuit-level interpretations were developed by the student.

Disclaimer: Asimov Kernel simulations described above represent the ODE model outputs generated locally using the Elowitz & Leibler (2000) parameterisation and custom models. Asimov Kernel was not directly accessed via browser during this session; the simulation approach and construct designs reflect what would be implemented in the platform.

Week 7 HW: Genetic Circuits Part II — Neuromorphic Circuits

Part 1 — Intracellular Artificial Neural Networks (IANNs)

Q1: Advantages of IANNs over Boolean genetic circuits

Traditional genetic circuits implement Boolean logic: each node is either “on” or “off,” and the circuit computes AND/OR/NOT/NAND operations over binary input signals. This is powerful for simple decision logic but breaks down for complex, real-world biological classification tasks.

IANNs offer the following concrete advantages:

Property | Boolean Genetic Circuits | Intracellular ANNs
Output resolution | Binary (on/off) | Continuous, graded (analog)
Input integration | Gate-by-gate, combinatorial | Weighted sum over all inputs simultaneously
Decision boundaries | Sharp on/off thresholds applied to each input | Graded boundaries set by the weighted sum and a sigmoidal / Hill-function activation; a single layer separates classes with a hyperplane, hidden layers add non-linear boundaries
Trainability | Manually designed; no learning | Can be parameterised to fit data distributions
Number of parts for complexity | Exponential scaling with function complexity | Compact: depth × width rather than truth-table enumeration
Noise handling | Binary thresholds amplify noise near threshold | Continuous activation averages over molecular noise
Multi-class classification | Requires one circuit per class | Single network with multiple output nodes

Key biological advantage: Real cellular environments present analog signals — metabolite concentrations, protein levels, pH — that exist on a continuum. IANNs can directly classify these graded inputs using Hill-function activation (analogous to the sigmoid in digital ANNs), without requiring a pre-amplification step to binarise the signal.

Key engineering advantage: Specifying an arbitrary Boolean function of N inputs requires a truth table with 2^N rows, and the gate count can grow accordingly. A single-layer perceptron with N inputs has only N+1 parameters (weights + bias), so the functions it can represent (the linearly separable ones) are far more compact to design and implement genetically; hidden layers extend this to non-separable functions.


Q2: Application for an IANN — Multi-Signal Cancer Biomarker Classifier

Application: Intracellular tumour-microenvironment classifier for targeted drug release

Goal: Engineer a therapeutic E. coli Nissle 1917 that senses three cancer-associated signals inside a tumour and releases a cytotoxic payload only when the weighted combination of signals exceeds a threshold — mimicking a trained classifier, not a simple AND gate.

Input/Output Behaviour

Signal | Biological source | Intracellular proxy | Weight (conceptual)
X₁: Lactate | Warburg-effect tumour metabolism | Lld promoter (lactate-inducible) | w₁ = +0.6
X₂: Hypoxia (HIF-1α) | O₂-depleted tumour core | Nar promoter (nitrate/anaerobic) | w₂ = +0.5
X₃: Low pH (< 6.8) | Tumour acidosis | AcidR sensor promoter | w₃ = +0.4

Hidden layer (IANN neuron): Each promoter drives a different concentration of an RNA-binding protein (e.g., MS2 coat protein variants). The proteins bind different numbers of MS2 stem-loops on a synthetic mRNA encoding a pro-drug converting enzyme (e.g., carboxypeptidase G2, CPG2). The more stem-loops are occupied, the stronger the translational output — computing a weighted, analog sum.

Activation function: Hill-function binding kinetics of each RNA-binding protein to its cognate stem-loop, providing a smooth sigmoid-like threshold.

Output: CPG2 enzyme is produced only when the weighted sum exceeds the threshold. CPG2 converts a systemically administered pro-drug (e.g., ZD2767P) into a locally cytotoxic drug within the tumour, sparing healthy tissue.
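The weighted-sum behaviour described above can be sketched numerically. The weights are the conceptual values from the table; the threshold of 1.0, the Hill exponent of 4, and the normalisation of each signal to the range 0–1 are assumptions made purely for illustration.

```python
# Conceptual weights from the input/output table.
WEIGHTS = {"lactate": 0.6, "hypoxia": 0.5, "low_pH": 0.4}
THRESHOLD = 1.0  # hypothetical activation threshold
HILL_N = 4.0     # hypothetical steepness of the activation function


def cpg2_output(signals):
    """Fraction of maximal CPG2 expression for signals normalised to 0-1."""
    weighted_sum = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return weighted_sum ** HILL_N / (THRESHOLD ** HILL_N + weighted_sum ** HILL_N)


cases = {
    "all three signals high": {"lactate": 1.0, "hypoxia": 1.0, "low_pH": 1.0},
    "lactate + hypoxia only": {"lactate": 1.0, "hypoxia": 1.0, "low_pH": 0.0},
    "lactate only": {"lactate": 1.0, "hypoxia": 0.0, "low_pH": 0.0},
}
for label, signals in cases.items():
    print(f"{label}: {cpg2_output(signals):.2f}")
```

Unlike an AND gate, two strong signals can cross the threshold without the third, and no single signal can do so alone under these assumed weights.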

Limitations

  1. Orthogonality: RNA-binding protein variants must not cross-react with each other’s stem-loops or with endogenous mRNAs — limited by the availability of engineered orthogonal systems.
  2. Weight programmability: Biological “weights” (promoter strength, RBS efficiency, stem-loop affinity) are fixed at construction time. Re-training requires re-engineering the DNA.
  3. Speed: Transcription/translation timescales (~minutes to hours) are far slower than silicon ANN inference; unsuitable for rapid signalling tasks.
  4. Metabolic load: Expressing multiple RNA-binding proteins, synthetic mRNAs, and CPG2 simultaneously imposes a fitness cost, selecting for circuit loss over many cell generations.
  5. In vivo tunability: Tumour microenvironments vary between patients; a fixed-weight circuit trained on population averages may mis-classify in individual patients.

Q3: Multilayer Intracellular Perceptron Diagram

The assignment provides a reference single-layer perceptron in which:

  • X₁ input (Csy4 DNA) is transcribed → Csy4 mRNA → translated → Csy4 endoribonuclease protein (the “neuron”)
  • X₂ input (FP DNA) is transcribed → FP mRNA → the Csy4 protein cleaves the FP mRNA in a concentration-dependent (analog) manner → translation → FP output

The diagram below shows the reference single-layer architecture first, then extends it to a multilayer design.

Intracellular Single-Layer Perceptron: Reference Diagram
Fig 1. Reference single-layer intracellular perceptron. X₁ DNA encodes Csy4 endoribonuclease (the “neuron”); X₂ DNA encodes the fluorescent protein output whose mRNA is regulated by Csy4 in an analog, concentration-dependent manner. Biological weights are set by DNA copy number, promoter strength, and RBS efficiency.

Multilayer Design

To build a multilayer perceptron, I introduce a hidden layer between the input DNA concentrations and the final fluorescent protein output:

  • Layer 1 (Input → Hidden): Three input DNA constructs (X₁, X₂, X₃) each drive transcription of a distinct mRNA. After translation, three different endoribonucleases are produced (a Csy4 homolog, an RNase L1 variant, and a third engineered endoribonuclease). Each protein’s expression level is a weighted function of the input DNA concentrations, forming the first set of “neurons.”
  • Layer 2 (Hidden → Output): The three Layer 1 endoribonucleases converge on a single fluorescent protein mRNA. Each enzyme cleaves a different stem-loop structure within the FP mRNA 5’ UTR, collectively suppressing or permitting FP translation in proportion to the weighted combination of Layer 1 outputs.
  • Output: The FP protein level represents the multilayer network’s final classification of the three-dimensional input.

Intracellular Multilayer Perceptron: Student Design
Fig 2. Multilayer intracellular perceptron. Three DNA inputs (X₁–X₃) feed into three parallel transcription/translation units (Layer 1, the hidden layer), each producing a distinct endoribonuclease. These three enzymes converge on the FP mRNA in Layer 2 (the output layer), regulating its stability and translation rate. The output (FP protein level) represents an analog weighted classification over all three inputs. Tx = transcription; Tl = translation.

Key design principles:

  • Orthogonal endoribonuclease families (Csy4 and its engineered homologs) ensure each hidden-layer neuron targets only its designated mRNA stem-loop
  • The FP mRNA carries three distinct stem-loop architectures in its 5’ UTR, each specifically recognised by one Layer 1 endoribonuclease
  • Weight magnitudes are set at construction time by promoter strength and RBS efficiency; they are fixed (unlike trained digital ANNs)
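The two-layer design can be summarised as a short forward pass. This is a simplified sketch under stated assumptions: each endoribonuclease is driven by a single input (rather than a full weighted combination), enzyme expression follows a Hill function, the three cleavage events are treated as independent, and all constants and function names are hypothetical.

```python
def hill_activation(x, k=1.0, n=2.0):
    """Saturating expression response to an input DNA dose (assumed form)."""
    return x ** n / (k ** n + x ** n)


def layer1(inputs, weights):
    """Hidden layer: one endoribonuclease level per input, scaled by its weight."""
    return [w * hill_activation(x) for x, w in zip(inputs, weights)]


def layer2(enzymes, k_cleave=0.5):
    """Output layer: fraction of FP mRNA that escapes cleavage at every stem-loop."""
    surviving = 1.0
    for level in enzymes:
        surviving *= 1.0 / (1.0 + level / k_cleave)
    return surviving


def fp_output(inputs, weights=(1.0, 0.7, 0.4)):
    """Relative FP level for three input DNA doses (weights are illustrative)."""
    return layer2(layer1(inputs, weights))


for doses in ([0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [2.0, 2.0, 2.0]):
    print(doses, "->", round(fp_output(doses), 3))
```

Because the endoribonucleases repress the reporter, higher input doses lower the FP output; changing a weight corresponds to swapping a promoter or RBS at construction time.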

Part 2 — Fungal Materials

Q1: Examples of Existing Fungal Materials

Fungal Materials: Existing Products and Engineering Opportunities
Fig 3. Left: comparative property scores for existing fungal materials vs. traditional counterparts across five dimensions. Right: impact and feasibility scores for six genetic engineering opportunities in fungal systems.

Current commercial and research fungal materials:

Material | Product / Company | Application | Fungal species
Mycelium composite | Ecovative Design (EcoCradle, AirMycelium) | Packaging, insulation, replacing EPS foam | Ganoderma spp.
Mycelium leather | Bolt Threads (Mylo), MycoWorks (Reishi) | Textile / fashion leather alternative | Ganoderma lucidum, Phanerochaete
Fungal biofilm mat | Research stage (e.g., Colorifix, Biocouture) | Bioremediation, dye production | Aspergillus niger, Trichoderma
Chitosan film | Various (food packaging, wound dressings) | Antimicrobial barrier | Rhizopus spp., Mucor
Ergot alkaloids / penicillin | Pharmaceutical industry | Drug production | Claviceps, Penicillium

Advantages over traditional counterparts:

  • Biodegradability: Mycelium composites decompose in soil within 30–90 days; polystyrene foam persists > 500 years
  • Carbon footprint: Grown on agricultural waste (hemp hurds, corn stalks); net CO₂-negative lifecycle vs. petroleum-derived foam
  • No toxic inputs: No petrochemical feedstock; no solvent processing
  • Structural tunability: Growth conditions (humidity, CO₂, substrate) allow tuning of mechanical properties (density, compressive strength) without changing the genetic composition

Disadvantages:

  • Lower mechanical strength: Mycelium composites have compressive strength of ~0.1–0.5 MPa vs. ~20–80 MPa for structural polymers — unsuitable for load-bearing applications without reinforcement
  • Moisture sensitivity: Untreated mycelium composite reabsorbs water and loses structural integrity above ~70% relative humidity
  • Production time: 4–10 days of growth vs. hours for injection-moulded plastics
  • Batch variability: Biological growth introduces variability not present in synthetic manufacturing
  • Regulatory burden: Novel fungal biomaterials face extensive safety testing; some mycelium-derived products still lack GRAS status

Q2: Genetic Engineering of Fungi — Motivations and Advantages over Bacteria

What to engineer and why:

1. Structural protein integration (spider silk / amyloid fibres) Engineer Ganoderma or Aspergillus to secrete MaSp1/MaSp2 spider silk proteins that self-assemble along hyphal walls during growth. The goal is a mycelium composite with much higher tensile strength (native dragline silk reaches roughly 1 GPa, compared with ~3.6 GPa for Kevlar), which could enable structural aerospace components grown from agricultural waste and is directly applicable to the ELM deep-space habitat.

2. Secondary metabolite production (pharmaceuticals) Fungi already produce penicillin (Penicillium), statins (Aspergillus terreus), and, in engineered S. cerevisiae, artemisinic acid. Engineering comparable productivity into mycelium-forming fungi would allow pharmaceutical payloads to be embedded directly within the structural habitat material, which is the central goal of the ELM Multi-Trophic Myco-Foundry.

3. Radiation-melanin overproduction Radiotrophic fungi (Cladosporium sphaerospermum, Cryptococcus neoformans) use melanin as an energy-harvesting radiation shield. Engineering this pathway into the structural shell species of the ELM habitat would provide passive radiation protection on the Mars surface.

4. Bioluminescent indicators Fungal bioluminescence (Neonothopanus nambi 4-gene cassette) can be introduced to provide visual health readouts of the living material system — a glow indicates metabolic activity; dimming indicates stress.

Advantages of synthetic biology in fungi vs. bacteria:

Feature | Fungi | Bacteria
Macroscopic 3D structure | Mycelium forms centimetre-to-metre-scale architectures naturally | Require scaffolding or biofilm engineering for 3D form
Post-translational modifications | Eukaryotic: glycosylation, disulfide bonds, correct protein folding | Prokaryotic: lack glycosylation; many human proteins misfold
Secondary metabolite pathways | Rich natural biosynthetic gene clusters (PKS, NRPS); many drugs already from fungi | More limited secondary metabolism
Mechanical output | Structural mycelium = built-in material scaffold | No equivalent self-assembling macrostructure
Gene regulation complexity | Introns, histone modification, chromatin remodelling allow nuanced control | Primarily operon-based; less epigenetic layering
Containment | Non-sporulating mutants are easily contained; growth stops when substrate is consumed | Horizontal gene transfer is a significant biocontainment concern

Key limitation of fungal SynBio: Genetic tools are less mature than for E. coli. CRISPR-Cas9 transformation efficiency in Ganoderma and other basidiomycetes remains 10–100× lower than in S. cerevisiae, and reliable promoter libraries are sparse. However, this gap is closing rapidly (Kück & Hoff, 2010; Wenderoth et al., 2017).


Part 3 — First DNA Twist Order

Aim 1 Draft

Title: Phosphite Auxotrophy Biocontainment System for ELM Deep-Space Habitats

Aim 1: Engineer and validate a synthetic phosphite auxotrophy kill switch in Bacillus subtilis by integrating the Pseudomonas stutzeri ptxD gene (phosphite dehydrogenase) under a constitutive promoter and deleting the endogenous phoA alkaline phosphatase locus, such that engineered cells grow only in phosphite-supplemented media and are inviable in standard phosphate media or uncontrolled environments.

Deliverable: Characterise the phosphite EC₅₀ (the concentration supporting half-maximal growth) using an Opentrons-automated 96-well growth screen (Week 3 automation pipeline) and confirm zero-growth in phosphate-only controls over 72 h.


Final Project Summary

The Multi-Trophic Myco-Foundry (ELM Habitat) is an engineered living material proposed as a self-sustaining deep-space habitat module for long-duration crewed Mars missions. The system integrates three biological layers — a radiation-shielding fungal structural shell, a bacterial vascular network for nutrient transport, and a pharmaceutical biosynthesis core — into a single co-cultivated mycelium composite. Key engineering goals include:

  1. Biocontainment: Phosphite auxotrophy + quorum-sensing kill switch (Week 6 Construct 1 + 2)
  2. Radiation resilience: Melanin-overproducing Ganoderma shell + SOS-CRISPR repair in bacteria (Week 6 Construct 3)
  3. Pharmaceutical production: CFPS biosensor-validated drug synthesis in B. subtilis core (Week 4 ESMFold/ProteinMPNN design)
  4. Structural integrity: Spider silk protein co-expression in mycelium growth phase

Insert Sequence Design for DNA Twist

The insert ordered from DNA Twist encodes the phosphite-inducible ptxD-sfGFP biosensor described in the Week 6 Golden Gate Assembly Benchling model. This serves as the first experimental validation module for the ELM habitat’s phosphite auxotrophy system.

DNA Twist Order: Insert Design and Plasmid Map
Fig 4. Top: linear part map of the 1,933 bp DNA Twist insert showing all functional elements and Golden Gate-compatible BsaI overhangs. Bottom: key sequence features at the 5’ junction (BsaI site, AATG overhang, P_ptxD promoter core, RBS B0034, ptxD ATG start) and backbone/cloning summary table.

Insert specification:

Parameter | Value
Total insert length | 1,933 bp
Backbone vector | pSEVA221 (RK2 ori, KanR, broad-host-range; in SEVA numbering the middle digit 2 denotes RK2, while pBBR1 is 3)
Assembly method | Golden Gate (BsaI-HFv2, NEB)
5’ overhang | AATG (joins to backbone left flank)
3’ overhang | GTGC (joins to backbone right flank)
Synthesis vendor | Twist Bioscience (standard turnaround, codon-optimised for B. subtilis)
Cloning host | E. coli DH5α
Expression host | B. subtilis 168 (ATCC 23857)

Insert part structure (5’ → 3’):

[BsaI site | AATG overhang] — P_ptxD promoter (312 bp) — RBS B0034 (21 bp)
  — ptxD CDS (639 bp, codon-optimised) — GGGS linker (45 bp)
  — sfGFP (720 bp) — T1+T0 terminators (180 bp) — [GTGC overhang | BsaI site]
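A quick tally of the listed part lengths is a useful sanity check against the stated insert size. The part lengths below are taken from the structure above; attributing the remaining bases to the flanking BsaI sites and overhangs is an assumption.

```python
# Part lengths (bp) as listed in the insert part structure.
PART_LENGTHS = {
    "P_ptxD promoter": 312,
    "RBS B0034": 21,
    "ptxD CDS": 639,
    "GGGS linker": 45,
    "sfGFP": 720,
    "T1+T0 terminators": 180,
}
STATED_INSERT_LENGTH = 1933  # from the insert specification

parts_total = sum(PART_LENGTHS.values())
remainder = STATED_INSERT_LENGTH - parts_total

print(f"Sum of listed parts: {parts_total} bp")
print(f"Unaccounted for (assumed flanking BsaI sites/overhangs): {remainder} bp")
```

Running the same tally against the final Benchling sequence would confirm whether the remainder matches the actual flanking sequence.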

Representative 5’ junction sequence (first ~260 nt of insert):

GGTCTCNAATGTTTACAGCTAGCTCAGTCCTAGGGTATTATGCTAGCAATTCGAGCTCGGTACCC
GGATCTGACGGTATCGATAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCACTAGTAACGG
CCGCCAGTGTGATGGATATCTGCAGAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTG
CAGGCATGCAAGCTTAAAGAGGAGAAATACTAGATGAAAGGCGAAGTTGTAGCTAGTGTCAAAGGC

(P_ptxD synthetic core + RBS B0034 + ptxD initiation; full sequence in Benchling shared folder)
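Golden Gate assembly fails if the insert carries an internal BsaI site, so it is worth scanning any sequence before ordering. The sketch below scans only the representative junction sequence shown above (not the full insert, which lives in Benchling) for the BsaI recognition site GGTCTC and its reverse complement GAGACC; the helper name is hypothetical.

```python
# Representative 5' junction sequence copied from above (N = any base).
JUNCTION = (
    "GGTCTCNAATGTTTACAGCTAGCTCAGTCCTAGGGTATTATGCTAGCAATTCGAGCTCGGTACCC"
    "GGATCTGACGGTATCGATAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCACTAGTAACGG"
    "CCGCCAGTGTGATGGATATCTGCAGAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTG"
    "CAGGCATGCAAGCTTAAAGAGGAGAAATACTAGATGAAAGGCGAAGTTGTAGCTAGTGTCAAAGGC"
)

BSAI_FORWARD = "GGTCTC"
BSAI_REVERSE = "GAGACC"  # reverse complement of the recognition site


def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


forward_hits = find_sites(JUNCTION, BSAI_FORWARD)
reverse_hits = find_sites(JUNCTION, BSAI_REVERSE)
print(f"Length scanned: {len(JUNCTION)} nt")
print(f"GGTCTC at positions {forward_hits}; GAGACC at positions {reverse_hits}")
```

Only the intended flanking site at position 0 should appear; any additional hit in the full insert would need to be removed by a silent mutation before assembly.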

Backbone: pSEVA221

pSEVA221 was selected because:

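  • Broad-host-range replication (RK2 ori in the SEVA numbering): Replicates in E. coli for cloning; RK2-type origins are characterised mainly in Gram-negative hosts, so maintenance in B. subtilis must be confirmed, with chromosomal integration as the fallback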
  • Kanamycin resistance: Orthogonal to the ampicillin selection used in other constructs in the lab; reduces cross-contamination risk
  • Minimal backbone: 4.2 kb, low metabolic load, well-characterised MCS
  • SEVA standard compliance: Compatible with the Standard European Vector Architecture modular cloning system used by collaborators

Connection to final project:

This insert is the first physically synthesised component of the ELM Habitat project. Upon successful cloning and verification (colony PCR + Sanger sequencing of the insert junctions), the construct will be:

  1. Transformed into B. subtilis 168 for phosphite-inducible GFP validation
  2. Used as the test construct for the Opentrons-automated phosphite EC₅₀ screen (Week 3 automation pipeline, Q2)
  3. Integrated into the ptxD knock-in locus (amyE::ptxD) for chromosomal auxotrophy validation

AI Disclosure

Claude Sonnet 4.6 (Anthropic) was used to assist with IANN mechanism descriptions, multilayer perceptron diagram design, fungal materials literature synthesis, ODE model structure for IANN applications, and figure generation code. The ELM habitat project concept, final project aims, DNA insert design rationale, and biological engineering connections were developed by the student.

Week 9 HW: Cell-Free Systems

Part A — General Questions

Q1: Advantages of Cell-Free Protein Synthesis over Traditional In Vivo Methods

Cell-free protein synthesis (CFPS) decouples protein production from cell viability, providing two structural advantages:

Flexibility — the reaction composition is fully user-controlled. Template DNA, cofactors, non-natural amino acids, detergents, redox buffers, and labeled substrates can be added directly at any concentration without membrane barriers or cell toxicity constraints. Reaction volumes can range from nanolitres (acoustic dispensing) to litres (batch bioreactor).

Control over experimental variables — Mg²⁺, K⁺, pH, temperature, ATP concentration, and reducing potential can be individually tuned between reactions in a plate-reader format. A single cell-free extract functions as a universal chassis: only the DNA template changes.

Two cases where CFPS outperforms cell production:

Case | Why cell-free wins
Toxic or membrane-disrupting proteins (e.g., pore-forming toxins, antimicrobial peptides) | Expressing these in living cells kills the host before significant product accumulates. In CFPS the cell is already lysed, so the protein can be produced without lethality.
Non-natural amino acid (ncAA) incorporation (e.g., azide-tagged, fluorescent, or photocrosslinking AAs) | CFPS allows amber stop-codon suppression with orthogonal tRNA/synthetase pairs and exogenously supplied ncAAs at defined concentrations, achieving near-100% ncAA incorporation at targeted sites without the metabolic dilution inevitable in living cells.

Additional cases: rapid prototyping of genetic circuits (no cloning/transformation cycle), production of disulfide-bonded proteins in a controlled redox environment, and isotope-labelling for NMR without cell growth.


Q2: Main Components of a Cell-Free Expression System

CFPS System Components: Overview Diagram
Fig 1. Cell-free protein synthesis (CFPS) system components. Seven input categories feed into the central reaction zone; the output is folded target protein. The cell extract provides all enzymatic machinery; all remaining components are supplied exogenously and can be independently titrated.

Component | Role
Cell extract | Source of ribosomes (70S or 80S), RNA polymerase, tRNAs, aminoacyl-tRNA synthetases, initiation/elongation/termination factors, chaperones, and metabolic enzymes. Most commonly E. coli S30 extract (30,000 × g supernatant) or the simpler S12 extract (12,000 × g supernatant).
DNA template | Encodes the protein of interest. Can be a plasmid (circular, more stable) or a linear PCR product (faster to produce; susceptible to exonucleases, so protect with chi-site sequences or RecBCD inhibitors). Typically under a T7 or SP6 promoter.
NTPs (ATP, GTP, CTP, UTP) | Substrates for RNA polymerase during transcription; GTP is additionally consumed by elongation factors (EF-Tu) and ribosomes during translation. Starting concentrations: 1–2 mM each.
Amino acids (all 20) | Substrates for aminoacyl-tRNA synthetases; incorporated into the growing polypeptide. Typical concentration: 1–2 mM each. Non-canonical AAs can be added here for genetic code expansion.
Energy regeneration system | ATP and GTP are consumed rapidly; regeneration systems (phosphocreatine + creatine kinase, or PEP + pyruvate kinase) recycle ADP → ATP to maintain the energy charge. Critical for reaction duration beyond ~1 h.
Mg²⁺/K⁺ salts and buffer | Mg²⁺ (2–12 mM, titrated) stabilises ribosome structure and is a polymerase cofactor; K⁺ (80–150 mM) maintains the ionic environment for translation. Buffer (Tris-OAc or HEPES, pH 7.4–7.6) maintains pH.
Crowding agents | PEG-8000 (2%) or Ficoll-400 mimics the macromolecular crowding of the cytoplasm (~300–400 g/L total macromolecule concentration), boosting effective enzyme concentrations and translation rates.

Q3: Why Energy Regeneration is Critical — Ensuring Continuous ATP Supply

Why it matters: Transcription consumes one NTP per nucleotide incorporated. Translation is even more energy-intensive: each amino acid incorporation costs about 4 high-energy phosphate bonds (1 ATP → AMP + PPi for aminoacylation, equivalent to 2 bonds; 1 GTP for EF-Tu delivery; 1 GTP for translocation). A 300-residue protein therefore costs ~1,200 ATP equivalents. Without regeneration, the ATP pool (typically 1–2 mM at start) is exhausted in under 30 minutes.
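The per-protein energy cost follows directly from the per-residue accounting. The sketch below reproduces that arithmetic, counting one ATP hydrolysed to AMP + PPi as two phosphate-bond equivalents; it ignores initiation, termination, and proofreading costs, which is a simplifying assumption.

```python
# High-energy phosphate bonds spent per amino acid added to the chain.
BONDS_PER_RESIDUE = {
    "aminoacylation (ATP -> AMP + PPi)": 2,
    "EF-Tu delivery (GTP -> GDP)": 1,
    "translocation (GTP -> GDP)": 1,
}


def translation_cost(n_residues):
    """ATP equivalents needed to polymerise a protein of n_residues amino acids."""
    return n_residues * sum(BONDS_PER_RESIDUE.values())


print(f"300-residue protein: ~{translation_cost(300)} ATP equivalents")
```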

How energy is regenerated:

ATP Regeneration Pathways in CFPS
Fig 2. Two principal ATP regeneration systems used in CFPS. Left: phosphocreatine (PCr) + creatine kinase (CK), the most common system, effective for 6–8 h. Right: phosphoenolpyruvate (PEP) + pyruvate kinase (PK), used in the PANOx-SP system (Jewett & Swartz, 2004); lower byproduct inhibition for reactions requiring >8 h. Both systems feed a central ATP pool consumed by transcription and translation.

Method for continuous ATP supply — PANOx-SP (preferred for long reactions):

In the PANOx-SP system (Jewett & Swartz, Biotechnol Bioeng 2004):

  • PEP (phosphoenolpyruvate, 30 mM) donates its phosphate to ADP via pyruvate kinase, regenerating ATP
  • Oxalic acid inhibits PEP synthetase, blocking the futile conversion of pyruvate back to PEP that would waste ATP; NAD⁺ (0.33 mM) supports further ATP generation from the pyruvate produced
  • Spermidine (1 mM) and putrescine (1.5 mM) stabilise ribosome structure

This system sustains productive synthesis for 10–14 hours, longer than simple phosphocreatine systems. For the ptxD biosensor work in the ELM habitat, PANOx-SP is the preferred energy system because the phosphite dehydrogenase reaction (NAD⁺ reduction) can additionally be used to assess whether the expressed ptxD is enzymatically active in the cell-free context.


Q4: Prokaryotic vs. Eukaryotic Cell-Free Expression Systems

Feature | Prokaryotic (E. coli S30) | Eukaryotic (Wheat Germ / Rabbit Reticulocyte / HeLa)
Extract preparation | Exponential-phase cells lysed by French press or bead milling | Plant embryo milling (WG) or reticulocyte lysis (RRL); more complex
Translational machinery | 70S ribosomes, σ70-based transcription (needs T7 RNAP added) | 80S ribosomes; supports cap-dependent translation
Post-translational modifications | No glycosylation; limited disulfide bonding | Glycosylation (HeLa extract), disulfide bonding, signal peptide cleavage
Yield | High: 0.5–2 mg/mL protein | Lower: 10–200 µg/mL protein
Cost and speed | Low cost, fast extract preparation | Higher cost, more variable

Protein choice for prokaryotic system: ptxD (phosphite dehydrogenase, 24 kDa) ptxD is a soluble, cytoplasmic enzyme from Pseudomonas stutzeri with no disulfide bonds or glycosylation. E. coli CFPS produces it at high yield (>1 mg/mL) with correct folding and NAD⁺-dependent enzymatic activity. The system is compatible with direct addition of ¹⁵N-labelled amino acids for NMR structural validation.

Protein choice for eukaryotic system: Erythropoietin (EPO, 30 kDa glycoprotein) EPO requires N-linked glycosylation at Asn24, Asn38, and Asn83 for correct folding, receptor binding, and in vivo half-life. E. coli CFPS produces only aglycosylated EPO with sharply reduced activity. A HeLa-cell CFPS extract (or wheat germ + microsomes) provides the oligosaccharyltransferase machinery required for correct glycoform addition.


Q5: Designing a Cell-Free Experiment to Optimise Membrane Protein Expression

Membrane proteins account for ~30% of the proteome but are notoriously difficult to produce in conventional CFPS: their hydrophobic transmembrane helices cause aggregation in aqueous reaction buffers.

Challenges and solutions:

Challenge | Strategy
Aggregation upon synthesis in aqueous environment | Supply nanodiscs (MSP + lipid mix) or lipid vesicles (liposomes) directly in the reaction, so that the nascent protein partitions into the bilayer co-translationally
Detergent requirement | Add DDM (n-dodecyl-β-D-maltoside, 0.1–1%) or LMNG to solubilise the protein and prevent aggregation; must titrate carefully (high detergent inhibits ribosomes)
Chaperone requirement | Supplement with GroEL/GroES or DnaK/DnaJ/GrpE to assist co-translational folding
Low yield | Use a dialysis-based cell-free system (continuous-exchange CFPS, CECF): fresh substrates diffuse in and products diffuse out through a membrane, sustaining synthesis 10× longer than batch
Activity verification | For a channel or transporter, reconstitute expressed protein into proteoliposomes and measure ion flux (patch-clamp) or fluorescent substrate uptake

Experimental design for a GPCR (e.g., beta-2 adrenergic receptor, B2AR):

  1. Prepare E. coli S30 extract supplemented with 0.1% DDM and pre-formed DMPC nanodiscs (1 µM MSP1D1)
  2. Add linear template encoding B2AR with a C-terminal FLAG tag (for pull-down) under T7 promoter
  3. Run CECF reaction at 30°C for 12 h (lower temperature reduces aggregation)
  4. Verify co-translational nanodisc insertion by size-exclusion chromatography (shift to ~200 kDa nanodisc peak)
  5. Titrate [DDM] and [nanodisc] across a 96-well plate to maximise folded yield (western blot with anti-FLAG)
  6. Confirm ligand binding (radioligand binding assay with ³H-dihydroalprenolol)

Q6: Troubleshooting Low Protein Yield in a Cell-Free System

| Reason | Diagnostic | Strategy |
| --- | --- | --- |
| 1. DNA template quality or concentration suboptimal | Run SDS-PAGE after 2 h; compare mRNA levels by dot-blot with a probe against your transcript | Test a range of template concentrations (1–20 nM plasmid or 5–100 nM linear). Protect linear templates with chi-sequence extensions, or switch to a plasmid. Verify the template by gel electrophoresis to confirm no degradation. |
| 2. Mg²⁺ concentration not optimised | Systematic grid: vary [Mg²⁺] from 2–16 mM in 1 mM steps across a 96-well plate; read yield by fluorescence (GFP) at 4 h | Mg²⁺ is the most sensitive parameter in E. coli CFPS; optimal [Mg²⁺] varies ±2 mM between extract batches. Peak yield is often at 6–10 mM. Each new extract batch requires re-optimisation. |
| 3. Protein misfolding or degradation | Add an EDTA-free protease inhibitor cocktail (e.g., PMSF) to test for proteolysis (EDTA would chelate the Mg²⁺ the reaction depends on); add molecular chaperones (GroEL/GroES) to test for a folding defect; confirm via anti-His western vs. functional assay | Supplement with disulfide bond isomerase (DsbC) for disulfide-containing proteins; add GroEL/GroES (1 µM each); run the reaction at 25°C instead of 37°C to slow synthesis and folding kinetics and reduce aggregation |
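The Mg²⁺ grid in row 2 can be laid out programmatically before handing it to a liquid handler. The sketch below is only an illustration: the function name, the choice of six replicates per concentration, and the row-by-row fill order are my assumptions, not part of the protocol above.

```python
from string import ascii_uppercase


def mg_titration_layout(start_mM=2, stop_mM=16, step_mM=1, replicates=6):
    """Assign each Mg2+ concentration to `replicates` wells of a 96-well plate.

    Wells are filled row by row (A1..A12, B1..B12, ...). Wells left over
    are returned separately so they can be used as blanks.
    The replicate count is an assumed value, not taken from the protocol.
    """
    wells = [f"{row}{col}" for row in ascii_uppercase[:8] for col in range(1, 13)]
    concentrations = list(range(start_mM, stop_mM + 1, step_mM))
    if len(concentrations) * replicates > len(wells):
        raise ValueError("grid does not fit on one 96-well plate")
    well_iter = iter(wells)
    layout = {c: [next(well_iter) for _ in range(replicates)] for c in concentrations}
    spare = list(well_iter)
    return layout, spare


layout, spare = mg_titration_layout()
for conc, assigned in layout.items():
    print(f"{conc:>2} mM Mg2+: {', '.join(assigned)}")
print("spare wells:", ", ".join(spare))
```

With 15 concentrations (2–16 mM) and six replicates, 90 wells are used and six remain free for no-template blanks.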

Kate Adamala Question — Synthetic Minimal Cell Design

Design: “ELM Phosphite Sentinel”

1. Function

The Phosphite Sentinel is a synthetic minimal cell (SMC) that monitors extracellular phosphite concentration and communicates the habitat health status to surrounding B. subtilis reporter bacteria — serving as an analog biosensor and inter-cellular messenger system within the Multi-Trophic Myco-Foundry ELM habitat.

2. Input and Output

  • Input: Extracellular phosphite (HPO₃²⁻) concentration
  • Output: IPTG released into the surrounding medium (when phosphite is HIGH → habitat healthy) or IPTG withheld (when phosphite is LOW → biocontainment stress, kill-switch risk)
  • Downstream output: B. subtilis cells carrying P_lac::GFP fluoresce green when IPTG is present (phosphite abundant) and are dark when phosphite falls below threshold (alert state)

3. Could this be realised by cell-free Tx/Tl alone (without encapsulation)?

No. If IPTG were not encapsulated inside a vesicle, it would freely diffuse into the surrounding bacteria regardless of phosphite concentration, eliminating the switch behaviour. Encapsulation is essential: the SMC only releases IPTG when phosphite-responsive α-hemolysin (aHL) pores are expressed and assembled into the membrane, physically gating IPTG efflux.

4. Could this be realised by a genetically modified natural cell?

Yes, in principle: a B. subtilis cell could be engineered to express lacZ or sfGFP under a ptxD-responsive promoter. However, this would require transforming every B. subtilis reporter cell with the ptxD sensing construct, and the sensing and reporting functions would be co-located in the same cell. The SMC architecture separates these roles — any downstream cell type (including mammalian cells) can be used as the reporter without modification, making the SMC a universal phosphite-sensing actuator.

5. Desired Outcome

In the presence of sufficient phosphite (>1 mM), α-hemolysin is expressed inside the SMC, assembles into the lipid bilayer, and releases pre-encapsulated IPTG into the habitat medium. Surrounding B. subtilis cells induce GFP expression, confirming that: (a) phosphite is available, (b) the ptxD-dependent biocontainment system is functioning, and (c) the synthetic cell is intact and metabolically active.

6. Component Design

Fig 3. ELM Phosphite Sentinel synthetic minimal cell. Inside the POPC/DPPC/cholesterol vesicle: bacterial CFPS extract, the P_ptxD::aHL gene, encapsulated IPTG (orange spheres), and the energy system. Phosphite enters by slow passive diffusion (small anion). When phosphite is high, P_ptxD drives aHL expression → membrane pores form → IPTG exits → surrounding B. subtilis (P_lac::sfGFP reporter) fluoresces green.

Membrane composition:

  • POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) — primary bilayer lipid, fluid at room temperature
  • DPPC (1,2-dipalmitoyl-sn-glycero-3-phosphocholine) — increases bilayer rigidity and stability at ELM habitat temperatures
  • Cholesterol (20 mol%) — modulates membrane fluidity; required for correct α-hemolysin pore assembly

Encapsulated contents:

| Component | Purpose |
| --- | --- |
| E. coli S30 CFPS extract | Ribosomes, RNAP, all translation factors |
| P_ptxD::aHL gene (linear, chi-protected) | Phosphite-responsive expression of α-hemolysin (pore-former) |
| NTPs, amino acids (1 mM each) | Substrates for Tx/Tl |
| Phosphocreatine + creatine kinase | ATP regeneration for 6–8 h |
| IPTG (10 mM, pre-loaded) | Payload released upon pore opening |
| Mg-OAc (8 mM), K-glutamate (80 mM) | Optimal ionic environment |

Tx/Tl system origin: Bacterial (E. coli S30). The phosphite-responsive riboswitch/promoter is derived from Pseudomonas stutzeri ptxD regulatory region. Bacterial CFPS is sufficient because no mammalian-specific post-translational modifications are required.

Communication with environment:

  • Input (phosphite): Slowly permeable. HPO₃²⁻ is a small anion that is assumed to cross the POPC/DPPC bilayer at low but sufficient rates for this long-timescale sensing application
  • Output (IPTG): Normally membrane-impermeant. Released only through α-hemolysin (aHL) pores after expression. The pore channel (~2 nm diameter) allows passage of IPTG (MW 238 Da) but retains larger CFPS components inside.

Gene used: α-hemolysin (hla gene, Staphylococcus aureus strain MRSA252, UniProt P09616). This encodes the 33 kDa monomeric precursor that self-assembles into a heptameric pore (~10 nm outer diameter) in cholesterol-containing membranes, an approach well established in the synthetic cell literature (Lentini et al., Nat Commun 2014).

All lipids and genes:

  • Lipids: POPC, DPPC, cholesterol (molar ratio 5:3:2)
  • Genes: hla (α-hemolysin, S. aureus) under P_ptxD riboswitch
  • Biological cells: B. subtilis carrying P_lac::sfGFP (IPTG-inducible)

Measurement: Flow cytometry on B. subtilis reporter cells (sfGFP channel, Ex 488 / Em 512 nm). Alternatively, plate-reader GFP kinetics over 12 h. To confirm aHL pore formation: LUV (large unilamellar vesicle) leakage assay with encapsulated ANTS/DPX fluorescence quencher pair.


Peter Nguyen Question — Freeze-Dried CFPS in Architecture

Proposal: “BioSentinel Walls” — Moisture Early-Warning Composite Panels

One-sentence pitch: Freeze-dried CFPS reactions embedded in architectural composite panels produce a visible colorimetric or fluorescent signal upon contact with water, detecting moisture infiltration before structural damage is visible.

Fig 4. BioSentinel wall concept. Left: dry state, in which lyophilised CFPS capsules (red) embedded in the composite panel are inactive. Right: a water ingress event rehydrates capsules in the infiltration zone (teal); lacZ or sfGFP is expressed, producing a visible blue or green signal on the interior surface detectable under UV light before structural damage occurs.

How it works: Micro-capsules (~0.5 mm, polyvinyl alcohol shell) containing lyophilised E. coli S30 CFPS extract, a linear DNA template encoding lacZ (or sfGFP) under a T7 promoter, and all requisite substrates (NTPs, amino acids, PCr/CK energy system) are mixed into a standard fibre-composite panel resin during casting. In the dry state the CFPS is inactive — no water means no enzymatic activity. When water infiltrates through a crack, condensation, or failed joint seal, it rehydrates the capsules in the affected zone. Within 1–4 hours, β-galactosidase (lacZ) is expressed and converts a pre-loaded chromogenic substrate (X-gal, which co-crystallises in the capsule) to a visible blue insoluble product; alternatively sfGFP is expressed and visible under a handheld UV lamp (P51 viewer). The blue/green zone reveals the infiltration boundary at centimetre resolution — before the structural composite delaminates or metal reinforcement corrodes.

Societal challenge addressed: Water damage is the leading cause of building envelope failure and contributes to mould growth, structural degradation, and indoor air quality problems. Current moisture detection relies on periodic inspection, capacitance sensors (require wiring), or thermal imaging (expensive, intermittent). A passive, single-use biological sensor embedded at panel manufacture costs near zero — no electronics, no wiring, no maintenance — and provides a one-time irreversible alert that survives the building’s construction and fit-out phases.

Addressing cell-free limitations:

| Limitation | Strategy |
| --- | --- |
| One-time use | This is a feature, not a bug: moisture infiltration is itself a one-time alert event, and the irreversible signal is desirable for building code documentation |
| Activation with water | Precisely the trigger required: only genuine water infiltration activates the sensor, not humidity fluctuations |
| Long-term stability | Lyophilised CFPS retains >50% activity after 1 year at room temperature (Pardee et al., Cell 2016). Silica co-encapsulation (Duyen et al., 2017) extends shelf life to >3 years, within the construction timeline from panel manufacture to building commissioning |
| Substrate availability | X-gal or PEG-conjugated fluorescein-di-β-galactopyranoside (FDG) co-lyophilised in the capsule; released upon rehydration without external addition |

Ally Huang Question — Genes in Space Proposal

CRAD-Biosensor: Quantifying Astronaut Radiation-Induced DNA Damage Using BioBits® Cell-Free Reactions aboard the ISS

Fig 5. CRAD-Biosensor experimental workflow. Five-step protocol using BioBits CFPS, miniPCR, and P51 Viewer to quantify radiation-induced DNA double-strand breaks from astronaut samples aboard ISS. Inset: expected dose-response curve, in which GFP output from the P_recA::sfGFP reporter increases with radiation dose.


Background (≤100 words)

Astronauts aboard the ISS receive ~150–300 mSv/year of ionising radiation — 50–100× the Earth surface dose — primarily from galactic cosmic rays and solar particle events. This radiation induces DNA double-strand breaks (DSBs) that accumulate over a mission and elevate cancer risk. Current DSB quantification requires ground-based γH2AX immunofluorescence microscopy, which is unavailable in microgravity. A rapid, equipment-light biosensor deployable on ISS would enable real-time monitoring of individual astronaut radiation sensitivity and cumulative DNA damage, informing personalised mission radiation limits and countermeasure timing.


Molecular Target (≤30 words)

The SOS-response regulator RecA and the LexA-repressed SOS-box promoter P_recA; expressed as a P_recA::sfGFP fusion reporter in a BioBits freeze-dried CFPS reaction.


Target-to-Challenge Relationship (≤100 words)

RecA is the central mediator of the bacterial SOS DNA damage response. Upon DSB formation, RecA polymerises on ssDNA at break sites, forming a nucleoprotein filament that stimulates LexA autocleavage and de-represses >40 SOS genes — including recA itself, creating a positive feedback loop. P_recA activity is therefore a sensitive, amplified proxy for DSB load. A BioBits CFPS reaction loaded with P_recA::sfGFP DNA produces GFP proportional to exogenous damaged DNA added to the reaction, enabling quantification without live cells, centrifuges, or microscopes.


Hypothesis (≤150 words)

Hypothesis: BioBits CFPS reactions containing P_recA::sfGFP reporter DNA will produce GFP fluorescence in proportion to the concentration of radiation-damaged DNA (characterised by DSBs and oxidative lesions) added exogenously to the reaction, enabling a calibrated, semi-quantitative biosensor for cumulative radiation exposure.

Reasoning: The E. coli SOS system responds to ssDNA produced at DSB sites. When damaged DNA is added to a CFPS reaction containing RecA (present in the S30 extract), RecA binds the ssDNA overhangs, stimulates LexA degradation, and de-represses the P_recA::sfGFP construct — producing a fluorescent readout. The strength of the GFP signal scales with DSB frequency, providing a graded dose-response curve. This approach repurposes an endogenous damage-sensing pathway as an in vitro biosensor, requiring only the BioBits CFPS kit, sample DNA, and the P51 Fluorescence Viewer — all feasible in microgravity.


Experimental Plan (≤100 words)

Samples: γ-irradiated plasmid DNA at 0, 0.5, 1, 2, 5, 10 Gy (positive controls); intact plasmid (negative control); astronaut peripheral blood lymphocyte lysate (test sample, collected via finger-prick).

Protocol: Add 2 µL DNA sample to rehydrated BioBits CFPS reaction (18 µL); incubate 4 h at 37°C (portable heat block); read GFP fluorescence with P51 Viewer. Confirm reaction with miniPCR amplification of recA from the sample DNA (gene integrity control). Calibrate against γ-irradiated standard curve; compute equivalent radiation dose from GFP output. Three replicates per sample.


Part B — Individual Final Project

Aim 1 (Confirmed)

Title: Phosphite Auxotrophy Biocontainment Validation for the ELM Deep-Space Habitat

Aim 1: Integrate the Pseudomonas stutzeri ptxD gene into the B. subtilis 168 amyE locus under the constitutive P_veg promoter, delete phoA (alkaline phosphatase) to prevent phosphate scavenging, and characterise growth inhibition as a function of phosphite concentration using the Opentrons-automated 96-well screen designed in Week 3 HW, establishing an IC₅₀ value for the synthetic auxotrophy.

DNA Twist Order Status

The ptxD-sfGFP insert (1,933 bp, pSEVA221 backbone, BsaI Golden Gate assembly) designed in Week 7 HW has been placed in the Twist (MIT) ordering tab with the following specifications:

| Field | Value |
| --- | --- |
| Construct name | ptxD-sfGFP_ELM_v1 |
| Insert size | 1,933 bp |
| Backbone | pSEVA221 |
| Cloning method | Golden Gate (BsaI-HFv2) |
| Organism | B. subtilis 168 (codon-optimised) |
| Antibiotic selection | Kanamycin 50 µg/mL |
| Order deadline | April 3, 2026 (MIT student deadline) |

Full sequence and Benchling model link are in the shared project folder submitted via the Google Form (Week 7 Part 3 documentation).


AI Disclosure

Claude Sonnet 4.6 (Anthropic) was used to assist with cell-free systems explanations, troubleshooting strategies, synthetic minimal cell component design, architecture proposal structuring, Genes in Space proposal drafting, and figure generation code. The ELM habitat application context, experimental rationale for ptxD biosensor design, and final project aim formulation were developed by the student.

Week 10 HW: Advanced Imaging & Measurement

Homework: Final Project — Measurement Plan for the ELM Biocontainment System

My final project centers on a modular Engineered Living Material (ELM) deep-space biocontainment system using phosphite auxotrophy (ptxD-based synthetic dependency) in an engineered bacterium for Mars surface operations. Below are the key measurable quantities, the associated biological questions, and the measurement technologies I would use.


Measurable Aspects

| Measurement | What is being assessed | Technology |
| --- | --- | --- |
| ptxD protein mass & sequence | Confirm the correct protein is expressed after codon-optimization and genome integration | LC-MS peptide mapping + intact mass |
| Phosphite dehydrogenase activity (NADH generation rate) | Confirm ptxD catalytic function in the engineered strain | Spectrophotometric assay (340 nm NADH absorbance) |
| Intracellular phosphate concentration | Assess whether the synthetic auxotrophy blocks endogenous phosphate metabolism | ICP-MS (trace element mass spectrometry) |
| Genome edit confirmation (ΔpstS auxotrophy) | Verify deletion of native phosphate transporter (pstS) | Sanger sequencing + whole-genome sequencing |
| ELM structural integrity under GCR-equivalent radiation | Quantify DNA double-strand breaks, protein oxidation, and membrane damage after accelerated radiation exposure | γ-H2AX immunofluorescence (DNA), western blot (protein), LC-MS (oxidative modifications) |
| Biocontainment escape frequency | Measure frequency of revertant colonies capable of growing on phosphate-only media | Fluctuation test (Luria-Delbrück assay) |
| Mycelium mechanical strength | Characterize tensile properties of fungal structural matrix | Atomic force microscopy (AFM) nanoindentation |
| MS2 L-protein lysis efficiency | Confirm that stabilized L-protein variants maintain lysis kinetics at elevated temperature | OD600 kinetic lysis assay ± temperature ramp |
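For the escape-frequency row, one common way to analyse a fluctuation test is the p0 method: if a fraction p0 of parallel cultures yields no escape colonies, the expected number of mutational events per culture is m = −ln(p0), and the rate per cell is m divided by the final cell number. The sketch below assumes that estimator (other estimators exist and may suit the data better); the function name and the example counts are hypothetical placeholders, not measurements.

```python
import math


def p0_escape_rate(n_cultures, n_without_escapers, cells_per_culture):
    """Estimate escape events per cell per culture cycle with the p0 method.

    m = -ln(p0) is the mean number of escape mutations per culture,
    where p0 is the fraction of cultures showing no escape colonies.
    """
    if not 0 < n_without_escapers < n_cultures:
        raise ValueError("p0 method needs some, but not all, cultures without escapers")
    p0 = n_without_escapers / n_cultures
    m = -math.log(p0)
    return m / cells_per_culture


# Hypothetical example: 40 parallel cultures, 20 with no colonies on
# phosphate-only plates, 1e9 cells plated per culture.
rate = p0_escape_rate(n_cultures=40, n_without_escapers=20, cells_per_culture=1e9)
print(f"estimated escape rate: {rate:.2e} per cell")
```

The p0 method only works when a useful fraction of cultures has zero escapers, so culture volume would need to be tuned accordingly.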

How Measurements Will Be Performed

1. ptxD Intact Mass and Peptide Mapping (LC-MS) The recombinantly expressed ptxD protein will be analyzed intact on a Waters Xevo G3 QTof system to confirm the correct molecular weight (expected ~36 kDa for the Stutzerimonas stutzeri ptxD, UniProt O69054). A tryptic digest peptide map on the Waters BioAccord will confirm the complete primary sequence and identify any unexpected post-translational modifications. Mass accuracy target: < 10 ppm for peptides, < 200 ppm for intact protein.

2. Phosphite Auxotrophy Verification (Plate assay + ICP-MS) Engineered cells will be plated on defined minimal media with either phosphite (survival expected) or phosphate (no growth expected). Intracellular phosphate levels will be quantified by ICP-MS (inductively coupled plasma mass spectrometry) to confirm the phosphate uptake block, verifying that the phosphate transporter deletion is functional.

3. Radiation Stability Testing (γ-H2AX + LC-MS oxidation profiling) Cells and purified structural proteins (fungal matrix, spider-silk biocomposites) will be exposed to high-energy proton beams (at the MIT NSRL equivalent) at Mars surface GCR fluence (~200 mGy/year equivalent). DNA damage will be quantified by anti-γ-H2AX immunofluorescence; protein oxidative damage (Cys and Met oxidation) will be quantified by LC-MS with oxidized modification search.

4. Genome Edit Confirmation (Sanger + NGS) Deletion of the pstS phosphate transporter gene and integration of the ptxD cassette will be confirmed by Sanger sequencing of PCR amplicons spanning both junctions, followed by Illumina short-read whole-genome sequencing to verify no off-target insertions.


Waters Part I: Molecular Weight of eGFP

Q1: Calculated Molecular Weight

The eGFP sequence (with His-purification tag):

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQ
CFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK
LEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP
NEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH

Using ExPASy Compute pI/MW:

| Property | Value |
| --- | --- |
| Number of amino acids | 247 |
| Calculated molecular weight | 27,745 Da (27.745 kDa) |
| Theoretical pI | 6.02 |

The extra 8 residues (LEHHHHHH) contribute approximately 1,065 Da (average residue masses: 113.16 + 129.12 + 6 × 137.14), so the tag accounts for roughly 1.07 kDa of the calculated total. The histidines add basic groups and slightly increase the predicted pI compared to tagless eGFP (pI ~5.9).


Q2: Adjacent Charge State MW Calculation

Selecting adjacent charge state peaks from Figure 1 (denatured intact eGFP mass spectrum):

Figure 1. Mass spectrum of intact eGFP (denatured conditions, Waters Xevo G3 QTof LC-MS, 30,000 resolution). Charge states z = 21–25 are labeled with their respective m/z values. Peak intensities form an envelope centered near z = 23–24.

Selected adjacent pair:

  • Peak 1: m/z₁ = 1157.14 (higher charge z₁)
  • Peak 2: m/z₂ = 1207.35 (lower charge z₂ = z₁ − 1)

Step 1 — Determine z₁ using the adjacent charge state formula:

$$z_1 = \frac{m/z_2 - H}{m/z_2 - m/z_1}$$

where H = 1.0073 Da (proton mass).

$$z_1 = \frac{1207.35 - 1.0073}{1207.35 - 1157.14} = \frac{1206.34}{50.21} = \boxed{24.03 \approx 24}$$

The nearest integer charge state is z₁ = 24, so z₂ = 23.

Step 2 — Calculate MW:

$$MW = z_1 \times m/z_1 - z_1 \times H = 24 \times 1157.14 - 24 \times 1.0073$$

$$MW = 27{,}771.36 - 24.175 = \boxed{27{,}747 \text{ Da}}$$

Step 3 — Accuracy:

Software deconvolution of the full charge state envelope gives MW = 27,745 Da.

$$\text{ppm error} = \frac{|27{,}747 - 27{,}745|}{27{,}745} \times 10^6 = \frac{2}{27{,}745} \times 10^6 \approx \boxed{72 \text{ ppm}}$$

This 72 ppm error reflects the inherent precision limit of manually reading m/z values from a spectrum. Software deconvolution routinely achieves < 50 ppm for intact proteins on this platform because it fits the entire charge state envelope simultaneously.
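The three steps of Q2 can be checked with a few lines of Python. This is a minimal sketch using the same proton mass (1.0073 Da) and the two peak positions read from Figure 1; the function names are my own.

```python
PROTON = 1.0073  # Da, the value used in the hand calculation


def charge_from_adjacent(mz_higher_charge, mz_lower_charge):
    """Charge of the lower-m/z peak, given the next peak up (charge z - 1)."""
    return (mz_lower_charge - PROTON) / (mz_lower_charge - mz_higher_charge)


def neutral_mass(mz, z):
    """Neutral mass from one charge-state peak: M = z * (m/z - H)."""
    return z * (mz - PROTON)


def ppm_error(measured, reference):
    return abs(measured - reference) / reference * 1e6


z_estimate = charge_from_adjacent(1157.14, 1207.35)
z = round(z_estimate)
mw = neutral_mass(1157.14, z)

print(f"z estimate = {z_estimate:.2f} -> z = {z}")
print(f"MW = {mw:.1f} Da")
print(f"error vs 27,745 Da = {ppm_error(round(mw), 27745):.0f} ppm")
```

Applying `neutral_mass` to the second peak (m/z 1207.35, z = 23) gives a slightly different value, which illustrates why averaging over the whole envelope is more precise than reading a single peak.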


Q3: Charge State of the Zoomed-in Peak

At 30,000 resolving power, for the z = 24 charge state at m/z ≈ 1157, the expected isotope spacing is:

$$\Delta (m/z) = \frac{1}{z} = \frac{1}{24} \approx 0.042 \text{ m/z}$$

The peak width (FWHM) at R = 30,000 is:

$$\text{FWHM} = \frac{m/z}{R} = \frac{1157}{30{,}000} \approx 0.039 \text{ m/z}$$

Since the isotope spacing (0.042 m/z) ≈ peak width (0.039 m/z), individual isotopes are not cleanly resolved for a ~28 kDa protein at this charge state. The isotope envelope appears as a single broad peak rather than a series of clearly separated lines. Therefore, the charge state cannot be directly read from the zoomed-in peak alone. The charge state is instead determined from the ratio of adjacent charge state m/z positions in the full spectrum (as done in Q2 above).

Why not? A 28 kDa protein has a complex, multi-peak isotope distribution spanning ~5 Da (≈ 5/24 = 0.21 m/z units). At 30,000 resolution this envelope partially resolves, but the peaks are closely spaced, overlapping, and require very high resolution (> 100,000) to fully baseline-separate individual isotope peaks for a protein of this mass.
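The resolution argument can be put in numbers. The sketch below compares the isotope spacing (1/z) with the instrumental peak width (m/z divided by resolving power), and reports the resolving power at which the two are equal. The helper names are mine, and treating FWHM equal to spacing as the threshold is a simplification, since real isotope peaks need a clear margin beyond that to be separated.

```python
def isotope_spacing(z):
    """Spacing between adjacent isotope peaks on the m/z axis."""
    return 1.0 / z


def peak_fwhm(mz, resolving_power):
    """Instrumental peak width (FWHM) at a given m/z and resolving power."""
    return mz / resolving_power


def resolving_power_at_parity(mz, z):
    """Resolving power at which FWHM equals the isotope spacing."""
    return mz * z


mz, z, R = 1157, 24, 30_000
spacing = isotope_spacing(z)
width = peak_fwhm(mz, R)
print(f"spacing = {spacing:.4f}, FWHM = {width:.4f}, ratio = {spacing / width:.2f}")
print(f"FWHM equals spacing at R = {resolving_power_at_parity(mz, z):,}")
```

A spacing-to-width ratio close to 1 means neighbouring isotope peaks overlap almost completely at half height, which matches the unresolved envelope described above.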


Waters Part II: Secondary/Tertiary Structure

Q1: Native vs. Denatured Protein Conformations

What happens when a protein unfolds? In its native (folded) state, a protein maintains a compact three-dimensional structure stabilized by non-covalent interactions: hydrophobic packing of the core, hydrogen bonds (forming α-helices and β-sheets), salt bridges between charged residues, and van der Waals contacts. These interactions shield many of the basic sites (Lys ε-amine, Arg guanidinium, His imidazole, N-terminus) from the solvent, limiting the number of protons that can be added in positive-ion ESI-MS.

When denatured (unfolded), these non-covalent interactions are disrupted (by organic solvents, low pH, or high temperature in the LC mobile phase). The chain becomes extended, exposing all basic sites to solvent and allowing the acquisition of many protons during electrospray ionization. This results in a higher charge state and a lower m/z for the same protein.

How is this detected by mass spectrometry? ESI-MS produces a characteristic charge state distribution (CSD). The maximum charge state is approximately equal to the number of basic sites available for protonation. A denatured protein therefore shows:

  • Higher charge states (more protons, lower m/z)
  • Wider charge-state envelope at lower m/z (typically m/z 700–1500)

A native protein shows:

  • Lower charge states (fewer accessible protons, higher m/z)
  • Narrower distribution shifted to higher m/z (typically 2000–4000 for a 28 kDa protein at z=8–10)

Figure 2. Mass spectra of eGFP under denatured (top, z = 21–25) and native (bottom, z = 8–11) conditions on the Waters Xevo G3 QTof MS. The denatured spectrum shows a high-charge envelope at m/z 1050–1350; the native spectrum shifts to a low-charge envelope at m/z 2300–3500.

Key differences observed (Figure 2):

  • Charge distribution shift: denatured maximum at z≈23 (m/z ~1207); native maximum at z≈10 (m/z ~2776)
  • Charge envelope width: denatured spans ~5 charge states; native spans ~4 charge states
  • m/z range: denatured 1050–1350; native 2300–3500

This shift in charge state distribution is the primary mass spectrometric indicator of protein folding state and is the foundation of native MS — ESI-MS conducted under aqueous, near-physiological solution conditions that preserve non-covalent structure.


Q2: Charge State of the ~2800 m/z Peak in the Native Spectrum

Figure 3. Native eGFP mass spectrum from the Waters Xevo G3 QTof MS. The inset shows a zoomed-in view of the charge state at ~2776 m/z at 30,000 resolution, where individual isotope peaks are resolved (Δm/z = 0.10 = 1/z).

Charge state at ~2800 m/z:

Expected m/z for each possible charge state of eGFP (MW = 27,745 Da):

| z | Expected m/z |
| --- | --- |
| 11 | 2523.3 |
| 10 | 2775.5 |
| 9 | 3083.8 |

The peak closest to 2800 m/z corresponds to z = 10 (calculated m/z = 2775.5).
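The expected m/z values follow from m/z = (M + z·H)/z. A short sketch, assuming the deconvoluted mass of 27,745 Da and a proton mass of 1.0073 Da (function name is mine):

```python
PROTON = 1.0073  # Da


def expected_mz(neutral_mass, z):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON) / z


MW = 27745  # Da, deconvoluted mass from Part I
candidates = {z: expected_mz(MW, z) for z in (11, 10, 9)}
for z, mz in candidates.items():
    print(f"z = {z:>2}: m/z = {mz:.1f}")

# Pick the charge state whose expected m/z lies closest to the observed ~2800
best_z = min(candidates, key=lambda z: abs(candidates[z] - 2800))
print("closest to 2800 m/z: z =", best_z)
```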

How is the charge state confirmed from the zoomed-in peak?

At 30,000 resolution, the isotope spacing for z = 10 is:

$$\Delta (m/z)_{\text{isotope}} = \frac{1}{z} = \frac{1}{10} = 0.10 \text{ m/z}$$

The peak width at m/z ≈ 2776 is:

$$\text{FWHM} = \frac{2776}{30{,}000} \approx 0.093 \text{ m/z}$$

Since the isotope spacing (0.10 m/z) > peak width (0.093 m/z), individual isotope peaks are resolved in the zoomed view. The spacing of 0.10 m/z between adjacent isotope peaks directly gives z = 1/0.10 = 10.

This is why native MS at high resolution is powerful: the lower charge states produce larger isotope spacings that are readily resolved by modern high-resolution instruments, allowing unambiguous charge state — and hence mass — determination directly from the isotope pattern.


Waters Part III: Peptide Mapping — Primary Structure

Q1: Lysines and Arginines in eGFP

Counting K (Lys) and R (Arg) residues in the eGFP + His-tag sequence:

20 Lysines at positions: 4, 27, 42, 46, 53, 80, 86, 102, 108, 114, 127, 132, 141, 157, 159, 163, 167, 210, 215, 239

6 Arginines at positions: 74, 97, 110, 123, 169, 216

Total trypsin cleavage sites: 26 (20 K + 6 R)

Highlighted in the sequence (K and R cleavage sites shown in square brackets):

MVS[K]GEELFTGVVPILVELDGDVNGH[K]FSVSGEGEGDATYG[K]LTL[K]FICTTG[K]LPVPWPT
LVTTLTYGVQCFS[R]YPDHM[K]QHDFF[K]SAMPEGYVQE[R]TIFF[K]DDGNY[K]T[R]AEV[K]
FEGDTLVN[R]IEL[K]GIDF[K]EDGNILGH[K]LEYNYNSHNVYIMAD[K]Q[K]NGI[K]VNF[K]I[R]
HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALS[K]DPNE[K][R]DHMVLLEFVTAAGITLGMDELY[K]
LEHHHHHH
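The residue counts and positions can be reproduced directly from the sequence given in Part I. A minimal sketch (the helper name is mine; positions are 1-based):

```python
SEQ = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQ"
    "CFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK"
    "LEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP"
    "NEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)


def positions(seq, residue):
    """1-based positions of a given one-letter residue code."""
    return [i for i, aa in enumerate(seq, start=1) if aa == residue]


lys = positions(SEQ, "K")
arg = positions(SEQ, "R")
print(f"length = {len(SEQ)}")
print(f"{len(lys)} Lys at {lys}")
print(f"{len(arg)} Arg at {arg}")
print(f"total K + R = {len(lys) + len(arg)}")
```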

Q2: Tryptic Peptides from PeptideMass

Figure 4. ExPASy PeptideMass tool parameters: enzyme = trypsin (cuts after K and R, not before P); missed cleavages = 0; cysteine modification = carbamidomethylation; minimum MW = 300 Da.

Running the eGFP sequence through ExPASy PeptideMass (trypsin, 0 missed cleavages, carbamidomethylation of Cys) generates 27 predicted peptides, including small peptides (TR, QK, IR, R) that may not be detectable by LC-MS.

The 27 predicted tryptic peptides include (representative subset):

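| # | Sequence | Residues | Monoisotopic mass (Da) |
| --- | --- | --- | --- |
| 1 | MVSK | 1–4 | 463.25 |
| 2 | GEELFTGVVPILVELDGDVNGHK | 5–27 | 2,436.25 |
| 3 | FSVSGEGEGDATYGK | 28–42 | 1,502.65 |
| 4 | LTLK | 43–46 | 473.32 |
| 5 | FICTTGK | 47–53 | 825.41* |
| 6 | LPVPWPTLVTTLTYGVQCFSR | 54–74 | 2,434.27* |
| 7 | YPDHMK | 75–80 | 789.35 |
| 8 | QHDFFK | 81–86 | 820.39 |
| 9 | SAMPEGYVQER | 87–97 | 1,265.57 |
| 10 | TIFFK | 98–102 | 654.37 |
| 11 | DDGNYK | 103–108 | 710.29 |
| 12 | TR | 109–110 | 275.16 |
| 13 | AEVK | 111–114 | 445.25 |
| 14 | FEGDTLVNR | 115–123 | 1,049.51 |
| 15 | IELK | 124–127 | 501.32 |
| 23 | HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK | 170–210 | 4,471.17 |
| 24 | DPNEK | 211–215 | 601.27 |
| 25 | R | 216 | 174.11 |
| 26 | DHMVLLEFVTAAGITLGMDELYK | 217–239 | 2,565.29 |
| 27 | LEHHHHHH | 240–247 | 1,082.49 |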

Cys residues assumed carbamidomethylated (+57.02 Da).

Total: 27 peptides from complete tryptic digestion.
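The peptide count can be cross-checked with a small in-silico digest. This sketch applies the same rule as in Figure 4 (cleave after K or R, but not before P, no missed cleavages) using a regular expression; it is an illustration of the rule, not a substitute for the PeptideMass output.

```python
import re

SEQ = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQ"
    "CFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK"
    "LEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP"
    "NEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)


def tryptic_digest(seq):
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", seq) if pep]


peptides = tryptic_digest(SEQ)
start = 1
for number, pep in enumerate(peptides, start=1):
    end = start + len(pep) - 1
    print(f"{number:>2}  {start:>3}-{end:<3}  {pep}")
    start = end + 1

# Peptides of one or two residues are the least likely to be retained on C18
very_short = [pep for pep in peptides if len(pep) <= 2]
print("very short peptides:", very_short)
```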


Q3: Chromatographic Peaks in TIC

Figure 5a. TIC of the eGFP peptide map from the Waters BioAccord LC-MS system. The peak at 2.78 minutes is circled. Peaks are counted between 0.5–6 minutes at > 10% relative abundance.

Counting peaks above 10% relative abundance between 0.5–6 minutes: approximately 18 chromatographic peaks are visible.


Q4: Peaks vs. Predicted Peptides

The TIC shows fewer peaks (~18) than the predicted 27 peptides. Several reasons explain this discrepancy:

  1. Small peptides are not retained on the C18 reversed-phase column: TR (275 Da), AEVK (445 Da), QK (274 Da), IELK (501 Da), IR (287 Da), and R (174 Da) are too hydrophilic and elute before the 0.5-minute window or co-elute at the void volume.
  2. Co-elution: Some peptides with similar hydrophobicity co-elute as a single chromatographic peak (appearing as one peak but containing two peptides in the MS).
  3. Incomplete ionization: Very large peptides (e.g., the 41-residue peptide HNIEDGSVQL…SALSK at 4,471 Da) may ionize poorly or be suppressed by other peptides in the mixture.

Q5: m/z, Charge, and Mass of the Peptide at 2.78 min

Figure 5b. Full mass spectrum (left) of the chromatographic peak at 2.78 min, showing the dominant charge state at m/z 525.76. Inset (right): zoomed-in isotope pattern at m/z 525.76 showing isotopes spaced 0.50 m/z apart, confirming z = 2.

Identification:

  • Observed m/z: 525.76
  • Isotope spacing (Δm/z): 0.50 m/z units → z = 1/0.50 = 2
  • Charge state: z = 2

Singly protonated mass ([M+H]⁺), for comparison with the predicted peptide list:

$$[M+H]^+ = z \times (m/z) - (z-1) \times H = 2 \times 525.76 - 1 \times 1.0073 = 1051.52 - 1.007 = \boxed{1050.51 \text{ Da}}$$
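The same conversion in code, as a small sketch: the charge is taken from the isotope spacing, then the observed m/z is converted to the neutral mass M and to the singly protonated [M+H]⁺. Function names are mine; the proton mass is the 1.0073 Da used above.

```python
PROTON = 1.0073  # Da


def charge_from_isotope_spacing(delta_mz):
    """Adjacent isotope peaks are 1/z apart on the m/z axis."""
    return round(1.0 / delta_mz)


def neutral_mass(mz, z):
    """M = z * (m/z - H)."""
    return z * (mz - PROTON)


def singly_protonated(mz, z):
    """[M+H]+ = z * (m/z) - (z - 1) * H."""
    return z * mz - (z - 1) * PROTON


z = charge_from_isotope_spacing(0.50)
m_neutral = neutral_mass(525.76, z)
m_h = singly_protonated(525.76, z)
print(f"z = {z}, M = {m_neutral:.2f} Da, [M+H]+ = {m_h:.2f} Da")
```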


Q6: Peptide Identification and Mass Accuracy

Comparing the measured mass of 1050.51 Da (M+H⁺) to the PeptideMass-predicted peptide list, the best match is:

FEGDTLVNR (residues 115–123 of eGFP)

Theoretical monoisotopic M+H⁺ for FEGDTLVNR:

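| Residue | Monoisotopic residue mass (Da) |
| --- | --- |
| F | 147.0684 |
| E | 129.0426 |
| G | 57.0215 |
| D | 115.0269 |
| T | 101.0477 |
| L | 113.0841 |
| V | 99.0684 |
| N | 114.0429 |
| R | 156.1011 |
| + H₂O | 18.0106 |
| Total (M) | 1049.514 |
| M+H⁺ | 1050.521 |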

Mass accuracy:

$$\text{ppm error} = \frac{|\text{observed} - \text{theoretical}|}{\text{theoretical}} \times 10^6 = \frac{|1050.51 - 1050.521|}{1050.521} \times 10^6 \approx \boxed{10.5 \text{ ppm}}$$

This is well within the < 15 ppm mass accuracy specification for the Waters BioAccord system.
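The theoretical mass and ppm error can be recomputed from the residue masses in the table. A minimal sketch, using only the residues needed for FEGDTLVNR and the same water and proton masses (function names are mine):

```python
# Monoisotopic residue masses (Da) for the residues in FEGDTLVNR
RESIDUE_MASS = {
    "F": 147.0684, "E": 129.0426, "G": 57.0215, "D": 115.0269, "T": 101.0477,
    "L": 113.0841, "V": 99.0684, "N": 114.0429, "R": 156.1011,
}
WATER = 18.0106
PROTON = 1.0073


def monoisotopic_mh(peptide):
    """[M+H]+ = sum of residue masses + water + one proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def ppm_error(observed, theoretical):
    return abs(observed - theoretical) / theoretical * 1e6


theoretical = monoisotopic_mh("FEGDTLVNR")
error = ppm_error(1050.51, theoretical)
print(f"theoretical [M+H]+ = {theoretical:.4f} Da")
print(f"mass error = {error:.1f} ppm")
```

The result shifts by up to about a ppm depending on how many decimals are carried through, but in every case it stays under the 15 ppm specification.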


Q7: Sequence Coverage

Figure 6. Sequence coverage map of eGFP from BioAccord LC-MS peptide identification. Residues highlighted in green are confirmed by at least one identified peptide; grey residues are not covered. 92% sequence coverage is achieved.

From Figure 6, peptides confirmed by LC-MS peptide mapping cover ~92% of the eGFP sequence (228 of 247 residues identified). The uncovered regions include the small peptides (TR, QK, IR, R) that are not retained by the C18 column and a portion of the His-tag region.
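
For reference, sequence coverage is simply the fraction of residues that fall inside at least one identified peptide. The sketch below shows one way to compute it from peptide start and end positions; the function name and the 20-residue toy example are hypothetical, and only the 228-of-247 figure comes from this analysis.

```python
def sequence_coverage(protein_length, peptide_ranges):
    """Fraction of residues covered by at least one peptide.

    peptide_ranges holds 1-based, inclusive (start, end) residue positions.
    Overlapping peptides are counted once because positions go into a set.
    """
    covered = set()
    for start, end in peptide_ranges:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length


# Hypothetical toy case: two overlapping peptides on a 20-residue protein.
print(sequence_coverage(20, [(1, 8), (6, 12)]))  # 0.6

# The coverage reported above: 228 of 247 residues.
print(round(228 / 247 * 100, 1))  # 92.3
```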


Bonus Q1: Peptide Sequence from Fragmentation Spectrum

Figure 5c. CID fragmentation spectrum of FEGDTLVNR. b-ions (blue, N-terminal fragments) and y-ions (red, C-terminal fragments) are labeled. The complete b2–b8 and y2–y8 series confirms the sequence unambiguously.

Using the Protein Prospector Fragment Ion Calculator, the predicted fragmentation pattern for FEGDTLVNR is:

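| Ion | Sequence | m/z (z = 1) |
|---|---|---|
| b2 | FE | 277.13 |
| b3 | FEG | 334.15 |
| b4 | FEGD | 449.18 |
| b5 | FEGDT | 550.22 |
| b6 | FEGDTL | 663.31 |
| b7 | FEGDTLV | 762.38 |
| b8 | FEGDTLVN | 876.42 |
| y2 | NR | 289.16 |
| y3 | VNR | 388.23 |
| y4 | LVNR | 501.32 |
| y5 | TLVNR | 602.36 |
| y6 | DTLVNR | 717.39 |
| y7 | GDTLVNR | 774.41 |
| y8 | EGDTLVNR | 903.45 |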

The fragmentation pattern in Figure 5c matches the FEGDTLVNR b/y ion series. The peptide sequence is confirmed as FEGDTLVNR.
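
The b/y ladder can also be approximated by hand from the residue masses, which is a useful sanity check on any calculator output. The sketch below assumes the usual definitions (a singly charged b-ion is the sum of its residues plus one proton; a y-ion additionally carries one water) and uses the rounded constants from this write-up, so values may differ from the table in the last decimal place. The function name is illustrative.

```python
RESIDUE_MASS = {
    "F": 147.0684, "E": 129.0426, "G": 57.0215, "D": 115.0269,
    "T": 101.0477, "L": 113.0841, "V": 99.0684, "N": 114.0429,
    "R": 156.1011,
}
WATER = 18.0106   # Da
PROTON = 1.0073   # Da


def fragment_ions(sequence):
    """Return singly charged b- and y-ion m/z values keyed by ion name."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    ions = {}
    for i in range(1, len(masses)):
        # b_i: the first i residues (N-terminal fragment) plus one proton
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        # y_i: the last i residues (C-terminal fragment) plus water and one proton
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


ions = fragment_ions("FEGDTLVNR")
for i in range(2, 9):
    print(f"b{i} {ions[f'b{i}']:.2f}   y{i} {ions[f'y{i}']:.2f}")
```

A quick internal check is that each complementary pair (b_i and y_(n−i)) sums to the neutral peptide mass plus two protons.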


Bonus Q2: Does the Peptide Map Confirm eGFP?

Yes. The peptide map data strongly confirms that the protein is eGFP for three reasons:

  1. Mass-based identification: 92% of the amino acid sequence is covered by peptides whose measured masses match theoretical tryptic fragments of the eGFP sequence (Figure 6) to within 15 ppm.
  2. Fragmentation confirmation: MS/MS fragmentation of representative peptides (e.g., FEGDTLVNR in Figure 5c) produces b/y ion series that match the predicted fragmentation pattern, providing sequence-level confirmation beyond just mass.
  3. Chromatographic reproducibility: The retention time pattern and relative peak intensities in the TIC are consistent with the hydrophobicity profile expected for eGFP tryptic peptides, and the overall pattern reproducibly appears across injections.

The combination of intact mass (~27.745 kDa ≡ eGFP + LEHHHHHH), correct peptide masses, fragmentation sequence confirmation, and >90% sequence coverage unambiguously identifies the protein as the eGFP-6xHis standard.


Waters Part IV: KLH Oligomers

Using the known subunit masses from Table 1:

| Subunit | Subunit mass |
|---|---|
| 7FU | 340 kDa |
| 8FU | 400 kDa |

Predicted oligomeric masses:

| Oligomeric state | Composition | Mass |
|---|---|---|
| 7FU Decamer | 10 × 7FU | 10 × 340 = 3,400 kDa (3.4 MDa) |
| 8FU Didecamer | 20 × 8FU | 20 × 400 = 8,000 kDa (8.0 MDa) |
| 8FU 3-Decamer | 30 × 8FU | 30 × 400 = 12,000 kDa (12.0 MDa) |
| 8FU 4-Decamer | 40 × 8FU | 40 × 400 = 16,000 kDa (16.0 MDa) |

Figure 7. Charge Detection Mass Spectrometry (CDMS) spectrum of KLH. Individual mass peaks are labeled with their oligomeric assignments. The 7FU decamer (3.4 MDa) and three 8FU oligomeric states (8.0, 12.0, 16.0 MDa) are clearly resolved as discrete species.

CDMS enables these measurements because it directly measures both the charge and the m/z of individual ions simultaneously, yielding a direct mass without requiring deconvolution — essential for heterogeneous megadalton assemblies like KLH that produce overlapping charge states in conventional ESI-MS.
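
As a rough illustration, the sketch below recomputes the predicted oligomer masses and shows how a single ion's m/z and charge convert directly into a mass. The ion values (m/z 40,000 with z = 200) are hypothetical, chosen only to land on the didecamer mass, and the small mass of the charge carriers is ignored.

```python
SUBUNIT_MASS_KDA = {"7FU": 340, "8FU": 400}  # subunit masses from Table 1


def oligomer_mass_kda(subunit: str, copies: int) -> int:
    """Predicted mass of an oligomer built from identical subunits."""
    return SUBUNIT_MASS_KDA[subunit] * copies


def cdms_mass_kda(mz: float, z: int) -> float:
    """Mass of one ion from its measured m/z and charge (carrier mass ignored)."""
    return mz * z / 1000.0


for subunit, copies in [("7FU", 10), ("8FU", 20), ("8FU", 30), ("8FU", 40)]:
    print(subunit, copies, oligomer_mass_kda(subunit, copies), "kDa")

# Hypothetical single-ion measurement, for illustration only.
print(cdms_mass_kda(40_000, 200))  # 8000.0
```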


Waters Part V: Did I Make GFP?

| | Theoretical | Observed (intact LC-MS deconvolution) | Mass error (ppm) |
|---|---|---|---|
| Molecular weight (kDa) | 27.745 | 27.747 | 72 |

Interpretation: The observed MW of 27.747 kDa is within 72 ppm of the theoretical value of 27.745 kDa. This level of accuracy is typical for intact protein analysis on a high-resolution QTof instrument, where deconvolution of the charge state envelope introduces some additional uncertainty compared to peptide-level measurements (< 15 ppm). The agreement confirms that:

  1. The protein is expressed at the correct molecular weight.
  2. No large unexpected modifications (e.g., missed cleavage of the signal peptide, glycosylation, or large adducts) are present.
  3. The His-tag (HHHHHH) and linker (LE) are intact, as the measured mass matches the full sequence including these elements.
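
For reference, the 72 ppm figure follows from the same ppm definition used at the peptide level, applied here to the rounded intact masses in kDa:

$$\text{ppm error} = \frac{|27.747 - 27.745|}{27.745} \times 10^6 \approx 72 \text{ ppm}$$

With the masses rounded to three decimal places, this corresponds to an absolute difference of about 2 Da.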

Disclaimer: Artificial Intelligence was used in this assignment to assist with calculation verification, scientific writing, and figure generation. Mass spectrometry data, charge state identification, and peptide fragmentation analysis were performed using results from the Waters Immerse Lab session and the analytical tools cited above (ExPASy, Protein Prospector).

Week 11 HW: Bioproduction & Cloud Labs

Part A — The 1,536 Pixel Artwork Canvas | Collective Bioart

My Contribution

I contributed a cluster of sfGFP (green) and mTurquoise2 (cyan) wells arranged to form a segment of a DNA double helix pattern in my assigned plate. The two strands of the helix were encoded in alternating rows using sfGFP (one strand) and mTurquoise2 (the complementary strand), mirroring the ELM habitat’s motif of dual biological systems working in structural complementarity. In total I contributed 14 pixel wells — the length of approximately one full helical turn — in the upper-middle region of the 16-plate global canvas.

Fig 1. Left: Plate showing the DNA double-helix motif in sfGFP (green) and mTurquoise2 (cyan), with highlighted wells (dashed yellow rings) indicating the contributed pixels. Right: overview of the full 16-plate × 96-well (1,536-pixel) global canvas, with the student’s plate outlined in yellow.

What I Liked

What struck me most was that this experiment collapses the distinction between “running an assay” and “making art.” The constraints are completely real — actual fluorescent proteins, actual CFPS chemistry, actual plate-reader optics — yet the output is something genuinely aesthetic. The fact that every student’s pixel represents a biological decision (which protein, which reagent composition) rather than a digital click makes the artwork irreproducible in any other medium. The collaborative editing window also created a real sense of shared urgency; it felt more like a live mural session than an experiment.

How It Could Be Improved for Next Year

  1. Real-time canvas visualisation: A live webpage that auto-updates as pixel assignments are locked in would dramatically increase engagement — contributors could watch the image emerge in near-real time rather than waiting for the final plate read.
  2. Broader colour palette via reagent tuning: Allowing students to submit custom reagent concentrations (not just FP assignments) as part of the pixel contribution would tie Part B directly into the artwork — a well with reduced Mg²⁺ might glow dimmer, encoding pixel “brightness” as a second data dimension.
  3. Longitudinal time-lapse capture: A plate-reader kinetics scan every 2 hours for 36 hours would produce a timelapse animation of the artwork “developing,” showing maturation differences between fast (sfGFP, mTurquoise2) and slow (mRFP1, mKO2) chromophores in real space.

Part B — Cell-Free Protein Synthesis | Cell-Free Reagents

Component Descriptions

E. coli Lysate

BL21 (DE3) Star Lysate (includes T7 RNA Polymerase) The cell lysate is the core enzymatic engine of the CFPS reaction: it provides ribosomes (70S), translation initiation and elongation factors (IF1/2/3, EF-Tu/Ts/G), aminoacyl-tRNA synthetases (AARSs), tRNAs, chaperones (GroEL/GroES, DnaK/DnaJ), and T7 RNA polymerase. The BL21 Star (DE3) strain carries a truncation mutation in the RNase E gene (rne131) that reduces the mRNA degradation rate ~3-fold, extending the productive translation window. The DE3 designation refers to the chromosomally integrated λDE3 prophage, which carries the T7 RNAP gene under lacUV5 control; because the polymerase is expressed in the cells before lysis, it is already present in the extract and can transcribe T7-promoter-driven templates as soon as they are added.


Salts / Buffer

Potassium Glutamate Provides K⁺ ions required for ribosome structural integrity and optimal translation elongation rates; glutamate is preferred over Cl⁻ as the counterion because high Cl⁻ concentrations inhibit aminoacyl-tRNA synthetases — glutamate’s negative charge better mimics the cytoplasmic ionic environment without enzyme inhibition.

HEPES-KOH pH 7.5 Maintains the reaction pH within the narrow optimum (~7.4–7.6) for ribosome activity and enzyme function; HEPES is preferred over phosphate because it does not chelate Mg²⁺ or interfere with the energy regeneration system, and its buffering capacity is stable between 6.8–8.2 at physiological temperatures.

Magnesium Glutamate Mg²⁺ is the most critically titrated ion in CFPS: it stabilises the 30S–50S ribosome interface, is an essential cofactor for RNA polymerase and aminoacyl-tRNA synthetases, and coordinates the ATP and GTP phosphate groups during elongation; optimal concentration varies by 1–2 mM between extract batches and must be independently optimised for each new preparation.

Potassium Phosphate Monobasic / Dibasic Provides inorganic phosphate (Pi) to buffer against phosphate depletion and to support ATP regeneration via substrate-level phosphorylation in the glycolytic pathway active in the crude lysate; the monobasic/dibasic ratio is set to achieve pH 7.5 as a secondary buffer alongside HEPES.


Energy / Nucleotide System

Ribose A pentose sugar that feeds the pentose phosphate pathway (PPP) in the lysate, generating NADPH and producing ribose-5-phosphate — the precursor for phosphoribosyl pyrophosphate (PRPP), which is required for nucleotide biosynthesis via the salvage pathway; ribose thus provides a sustained supply of NMP precursors for the 20-hour reaction.

Glucose The primary carbon source for glycolytic ATP regeneration in the 20-hour NMP-Ribose-Glucose system; glucose is phosphorylated by glucokinase in the lysate, enters the Embden–Meyerhof–Parnas pathway, and yields a net 2 ATP and 2 NADH per molecule, providing sustained energy for 10–20 hours as opposed to the ~1–2 hour burst from PEP.

AMP, CMP, GMP, UMP Nucleoside monophosphates that are phosphorylated stepwise to their di- and triphosphate forms (NMP → NDP → NTP) by adenylate kinase, the other nucleoside monophosphate kinases, and nucleoside-diphosphate kinase present in the cell extract, using ATP generated from glucose metabolism; this two-step kinase cascade converts cheap NMPs into the NTPs required as substrates for T7 RNA polymerase and, in the case of GTP, by the translation factors.

Guanine (free nucleobase) Guanine is converted to GMP via the purine salvage enzyme xanthine-guanine phosphoribosyltransferase (XGPRT, encoded by gpt in E. coli): Guanine + PRPP → GMP + PPi. GMP is then phosphorylated through GDP to GTP by guanylate kinase and nucleoside-diphosphate kinase present in the extract. This provides an alternative and cost-effective route to GTP without requiring pre-formed GMP to be included in the master mix.


Translation Mix (Amino Acids)

17 Amino Acid Mix Contains 17 of the 20 standard amino acids (all except Tyr, Cys, and typically Trp or one other) as substrates for the aminoacyl-tRNA synthetases, which charge tRNAs for ribosomal delivery during polypeptide elongation; using a pre-mixed stock at defined concentrations ensures consistent stoichiometry across reactions.

Tyrosine Added separately from the 17 AA mix because its very low aqueous solubility at neutral pH (< 0.5 mg/mL) requires it to be dissolved in dilute NaOH or DMSO and added independently; tyrosine is critical for the chromophore-forming tripeptide (Ser/Thr-Tyr-Gly) in all GFP-family fluorescent proteins, making its concentration a key determinant of fluorescent protein yield.

Cysteine Added separately because it is highly susceptible to oxidative dimerisation to cystine under aerobic conditions, which would deplete the free amino acid pool; in reducing CFPS environments, cysteine availability supports disulfide bond formation in proteins that require it and provides sulphydryl groups for cofactor binding.


Additives

Nicotinamide (Vitamin B3) A precursor to NAD⁺ via salvage: nicotinamide is first deamidated to nicotinic acid, which then enters the Preiss–Handler pathway (nicotinic acid → nicotinic acid mononucleotide → nicotinic acid adenine dinucleotide → NAD⁺); supplementing nicotinamide boosts the NAD⁺/NADH pool, which is required as a hydride acceptor in glycolysis (GAPDH step) and TCA cycle reactions, maintaining glycolytic flux and ATP generation throughout the 20-hour reaction.


Backfill

Nuclease-Free Water Brings the reaction to the specified final volume; nuclease-free grade (DEPC-treated or DEPC-free ultrapure) is essential to prevent RNase or DNase contamination — even trace amounts of RNase A will degrade the mRNA transcript within minutes, completely abolishing protein synthesis.


Master Mix Comparison: 1-Hour PEP-NTP vs 20-Hour NMP-Ribose-Glucose

Fig 2. Side-by-side comparison of the 1-hour PEP-NTP and 20-hour NMP-Ribose-Glucose master mix systems across six dimensions: energy source, nucleotide form, regeneration mechanism, reaction duration, protein yield, and best-fit fluorescent proteins.

The 1-hour PEP-NTP system uses phosphoenolpyruvate (PEP) as a pre-loaded high-energy phosphate donor — pyruvate kinase transfers PEP’s phosphate directly to ADP, generating ATP in a single enzymatic step. NTPs are supplied as pre-formed triphosphates, requiring no kinase conversion, enabling an immediate high-rate transcription burst. However, pyruvate accumulation inhibits pyruvate kinase over time, limiting productive synthesis to ~1–2 hours, and the system yields ~50–200 µg/mL protein.

The 20-hour NMP-Ribose-Glucose system instead supplies nucleoside monophosphates alongside glucose and ribose as metabolic substrates. Endogenous kinases in the lysate phosphorylate NMPs stepwise (NMP → NDP → NTP) using ATP from sustained glycolysis; multiple enzymatic steps distribute the energy load and avoid single-point inhibition, extending productive synthesis to 10–20 hours with yields of 500–2,000 µg/mL. This system is superior for fluorescent proteins with slow chromophore maturation (mRFP1, mKO2, mKate), which require several hours of synthesis before significant fluorescence accumulates.

Bonus — How can transcription occur if GMP is not included but Guanine is?

Free guanine (the nucleobase alone) enters the purine salvage pathway: Guanine + PRPP → GMP + PPi, catalysed by xanthine-guanine phosphoribosyltransferase (XGPRT, encoded by gpt in E. coli). The PRPP is derived from the ribose supplied in the master mix. The GMP produced is then phosphorylated by guanylate kinase (Gmk, ATP-consuming) to GDP, and by nucleoside-diphosphate kinase (Ndk) to GTP. All three enzymes (XGPRT, guanylate kinase, and Ndk) are present in the crude S30 lysate. Using guanine as the precursor rather than GMP reduces reagent cost (guanine is ~100× cheaper per mole), while the kinase cascade integrates seamlessly with the glucose-driven ATP regeneration system that powers the phosphorylation steps.


Part C — Planning the Global Experiment | Cell-Free Master Mix Design

Biophysical Properties of the Six Fluorescent Proteins

Fig 3. Properties of the six fluorescent proteins used in the HTGAA 2026 collaborative artwork. Top: normalised excitation (dashed) and emission (solid) spectra for each protein. Bottom left: chromophore maturation half-time; proteins above the 1 h dashed line are problematic for short CFPS reactions. Bottom right: quantum yield; higher values correspond to greater brightness per molecule.


sfGFP (Superfolder GFP) Ex: 485 nm | Em: 512 nm | QY: 0.65 | Maturation t½: ~30 min

Key property for CFPS: exceptionally fast and robust folding. sfGFP was engineered with six additional mutations (S30R, Y39N, N105T, Y145F, I171V, A206V) over eGFP that collectively accelerate folding 3–4× and allow correct folding even in the presence of aggregation-prone fusion partners. In a 1-hour CFPS reaction at 37°C, sfGFP is fully matured before the reaction ends — making it the benchmark standard for cell-free expression validation. Its pKa of ~6.0 also makes it stable to mild acidification during long incubations.

mRFP1 (Monomeric Red Fluorescent Protein 1) Ex: 584 nm | Em: 607 nm | QY: 0.25 | Maturation t½: ~5.5 h

Key property for CFPS: slow chromophore maturation and reduced brightness. The mRFP1 red chromophore (DsRed-derived) requires two sequential oxidation steps — first forming the GFP-like intermediate, then the extended acylimine conjugation — each requiring molecular oxygen and several hours at 37°C. In a 1-hour CFPS reaction, the vast majority of translated mRFP1 protein remains non-fluorescent. The 20-hour NMP-Ribose-Glucose master mix is essential for meaningful mRFP1 signal, and even then its low quantum yield (QY = 0.25, vs sfGFP’s 0.65) means it requires higher protein concentrations for equivalent brightness.

mKO2 (Monomeric Kusabira-Orange 2) Ex: 551 nm | Em: 565 nm | QY: 0.62 | Maturation t½: ~4.5 h

Key property for CFPS: oxygen-dependent chromophore maturation kinetics. mKO2’s orange chromophore (derived from the Fungia concinna orange FP) requires molecular oxygen for the ring-forming oxidation step, making its maturation rate sensitive to dissolved O₂ in the CFPS reaction. In sealed 20 µL reactions, O₂ is rapidly depleted by the oxidative metabolic activity of the extract, potentially stalling chromophore maturation mid-reaction. Despite a high quantum yield (0.62) once matured, mKO2 typically shows delayed fluorescence onset in CFPS — O₂ supplementation (see hypothesis below) can significantly accelerate signal emergence.

mTurquoise2 Ex: 434 nm | Em: 474 nm | QY: 0.93 | Maturation t½: ~30 min

Key property for CFPS: exceptional quantum yield but a violet-blue excitation requirement. mTurquoise2 is the brightest monomeric cyan FP, with a quantum yield of 0.93 (more than double that of ECFP at 0.36) and fast maturation comparable to sfGFP. However, its 434 nm excitation peak lies at the violet edge of the visible range and requires a plate reader with a suitable excitation filter or monochromator setting (not all instruments have this at the correct wavelength). Additionally, the 474 nm emission of mTurquoise2 overlaps the sfGFP excitation band: if both proteins are present in the same well, spectral bleed-through and re-absorption of the cyan emission (and FRET, where the two proteins come into close proximity) can inflate the apparent sfGFP signal in multi-protein conditions.

mScarlet-I (Improved mScarlet) Ex: 569 nm | Em: 594 nm | QY: 0.54 | Maturation t½: ~1.0 h

Key property for CFPS: the fastest-maturing red fluorescent protein in this panel, with the highest quantum yield among its red members. mScarlet-I is a variant of mScarlet engineered for faster maturation, with a maturation half-time of ~60 min, far shorter than mRFP1 (~5.5 h) or mKO2 (~4.5 h), while maintaining high brightness (QY = 0.54). This makes it the most practical red FP for CFPS experiments where both short (1–3 h) and long (20 h) master mixes are used. Its pKa of ~4.9 also confers strong acid stability, relevant during long incubations where pH may drift slightly.

Electra2 Ex: ~596 nm | Em: ~641 nm | QY: ~0.45 | Maturation t½: ~3.0 h

Key property for CFPS: far-red emission requiring sufficient maturation time, with sensitivity to a reducing environment. Electra2 is an engineered far-red fluorescent protein whose extended pi-conjugated chromophore system requires a more complex oxidation sequence than GFP-family proteins, making it moderately sensitive to the redox state of the CFPS reaction (reducing conditions in the crude extract can slow or prevent the final chromophore oxidation step). For CFPS experiments, this means both sufficient O₂ availability and a sufficiently long incubation (>4 h) are required for peak signal. Its far-red emission (~641 nm) is spectrally well-separated from all other panel members, minimising cross-talk in multi-protein artwork wells when using appropriate band-pass filters.


Hypothesis for Reagent Optimisation to Maximise Fluorescence

Protein: mKO2 Reagent: Dissolved oxygen (O₂ supplementation) + catalase

Hypothesis: mKO2 fluorescence yield over a 36-hour incubation in the NMP-Ribose-Glucose master mix will be significantly increased by (a) pre-equilibrating the CFPS reaction with O₂ for 5 minutes before sealing, and (b) supplementing with catalase (50 U/mL) to scavenge H₂O₂ produced as a byproduct of the chromophore oxidation reaction.

Reasoning: The rate-limiting step in mKO2 maturation is the molecular-oxygen-dependent ring formation and oxidation of the Cys65-Tyr66-Gly67 chromophore tripeptide. In sealed 20 µL CFPS reactions, dissolved O₂ (~220 µM at equilibrium) is consumed within ~30–60 min by the metabolic activity of the cell extract (NADH oxidation, trace respiration). This oxygen depletion halts chromophore maturation for newly synthesised mKO2 molecules for the remainder of the reaction, even though translation continues. Simultaneously, H₂O₂ produced as a byproduct of oxidative chromophore maturation can damage the extract and the protein itself. Catalase (2H₂O₂ → 2H₂O + O₂) would both remove the toxic peroxide and return part of the consumed O₂ to the sealed reaction. Because only one O₂ is released for every two H₂O₂, catalase recycles oxygen rather than supplying it, which is why it is paired with the initial O₂ supplement to extend the maturation window.

Expected effect: 2–4× increase in mKO2 fluorescence at the 6-hour time point; ~40–60% increase in total area-under-curve fluorescence over 36 hours compared to the standard sealed reaction. Maturation half-time expected to decrease from ~4.5 h to ~2.5 h.

Fig 4. Left: experimental design for testing O₂ + catalase supplementation on mKO2 fluorescence yield, with four conditions (control, O₂ only, catalase only, O₂ + catalase), n=8 wells each, read every 30 min for 36 h. Right: predicted fluorescence kinetics showing earlier onset and higher plateau for the O₂+catalase condition, particularly at the 6-hour read (dashed vertical line).


Custom Reagent Supplement Plan (20 µL reaction format)

Per the reaction format specified (6 µL lysate + 10 µL 2× master mix + 2 µL DNA template + 2 µL custom supplement = 20 µL):

| Component | Volume | Final conc. | Rationale |
|---|---|---|---|
| Catalase (1,000 U/mL stock) | 1 µL | 50 U/mL | H₂O₂ scavenging; O₂ regeneration |
| O₂-saturated nuclease-free H₂O | 1 µL | ~8 µM additional O₂ | Dissolved O₂ supplement at reaction start |
| Custom supplement total | 2 µL | | Fits the 2 µL custom reagent slot |

The 2 µL slot is split between catalase stock and O₂-saturated water. For the O₂-saturated water, the stock is prepared by briefly bubbling pure O₂ through 1 mL nuclease-free water at 4°C for 5 min immediately before reaction setup.
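
The volume budget and the catalase final concentration in this plan can be double-checked with a short calculation. This is a minimal sketch with illustrative names; it only applies C₁V₁ = C₂V₂ to the numbers given above.

```python
def final_concentration(stock_conc, added_volume_ul, total_volume_ul):
    """Final concentration after adding a stock to a fixed total volume."""
    return stock_conc * added_volume_ul / total_volume_ul


# Reaction format from the text (all volumes in µL).
components_ul = {
    "lysate": 6,
    "2x master mix": 10,
    "DNA template": 2,
    "custom supplement": 2,
}
total_ul = sum(components_ul.values())

# 1 µL of a 1,000 U/mL catalase stock in the 20 µL reaction.
catalase_u_per_ml = final_concentration(1000, 1, total_ul)

print(total_ul, catalase_u_per_ml)  # 20 50.0
```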


Part D — Build-A-Cloud-Lab (Optional Bonus)

Using the Ginkgo Reconfigurable Automation Cart (RAC) simulation tool, I designed a cloud lab layout optimised for the CFPS artwork experiment workflow: fluorescent protein expression screening in a 1,536-well (16-plate) format with continuous kinetic plate reading.

Proposed layout — “ELM CFPS Screening Suite”:

| Station | Cart configuration | Purpose |
|---|---|---|
| Station 1 | Liquid handler RAC (8-channel P300) | Master mix dispensing into 96-well plates, 20 µL final volume |
| Station 2 | Reagent storage RAC (4°C cold block) | Cell extract, NMP/glucose components, amino acids stored on-deck |
| Station 3 | DNA template dispensing RAC | 2 µL FP template addition, one FP per column |
| Station 4 | Incubation + read RAC (37°C plate nest + BMG CLARIOstar) | Kinetic fluorescence reads every 30 min across 6 channels simultaneously |
| Station 5 | Plate sealing RAC (film applicator) | O₂-barrier sealing of completed plates before incubation |

The workflow is fully automated from master mix preparation to plate reading, enabling the 36-hour kinetics experiment to run overnight without operator intervention — directly applicable to the HTGAA collaborative artwork’s multi-plate format.


AI Disclosure

Claude Sonnet 4.6 (Anthropic) was used to assist with reagent component descriptions, master mix comparison analysis, fluorescent protein property research, hypothesis formulation, and figure generation code. The ELM habitat project connections, pixel art contribution description, and cloud lab design were developed by the student.

Labs

Lab writeups:

  • Week 1 Lab: Introduction to Pipetting and Dilutions

    Overview This first HTGAA lab introduces the foundational techniques of pipetting and serial dilutions, critical skills for precise liquid handling in biological and chemical experiments. Two protocols were covered: mixing food coloring solutions to build volume intuition, and performing a serial dilution of a mystery substance (MS) to achieve a target concentration.

  • Week 2 Lab: DNA Gel Art

    Gel Art: Restriction Digests and Gel Electrophoresis Overview | Objective The goal of this 3-hour lab is to immerse you in the practical world of DNA gel electrophoresis and restriction enzyme-based DNA manipulation. You’ll create stunning DNA gel art while mastering essential techniques used in scientific research! Inspired by Paul Vanouse’s Art project and his Latent Figure Protocol, this lab offers a unique opportunity to blend creativity with molecular biology. By visualizing DNA fragments of varying lengths, you’ll gain firsthand experience in a process critical for verifying DNA sequences.

  • Week 3 Lab: Opentrons Art

    Opentrons Artwork: Fluorescent Bacteria Pixel Art Overview | Objective In this two-day lab, you’ll program the Opentrons OT-2 pipetting robot to create stunning, glowing designs by depositing genetically engineered E. coli onto black (charcoal) agar plates. These bacteria express fluorescent proteins in vibrant colors, forming “bio-art” that comes to life under UV light. It’s your chance to turn cutting-edge biotech into a canvas for creativity!

  • Week 6 Lab: Gibson Assembly

    The Chromophore Color Cloning Quest Overview | Objective In this lab, you’ll be changing the color-generating chromophore of the purple Acropora millepora chromoprotein (amilCP) to a variety of orange, pink, and blue mutants. First, we’ll prepare two polymerase chain reactions (PCR) to generate the necessary fragments for a Gibson assembly. Using the amilCP-encoding Addgene mUAV plasmid as a template, we will amplify:

  • Week 7 Lab: Neuromorphic Circuits

    Genetic Circuits II: Intracellular Artificial Neural Networks (IANNs) Overview | Objective In this two-day lab, you will design and build your very own Intracellular Artificial Neural Network (IANN) using a library of plasmids from the Ron Weiss lab and human embryonic kidney (HEK) 293 cells.

  • Week 9 Lab: Cell-Free Systems

    Cell-Free Transcription-Translation (TX-TL) Systems Overview | What is Cell-Free? A cell-free system allows biological reactions to occur outside of living cells. By extracting and using cellular components like ribosomes, RNA polymerase, amino acids, and ATP, this method enables reactions in a controlled, simplified environment.

  • Week 10 Lab: Mass Spectrometry

    Analytical Protein Characterization via LC-MS Introduction and Background Modern bioengineering relies on the ability to understand biological molecules with extraordinary precision. Liquid chromatography–mass spectrometry (LC-MS) is a cornerstone technique for protein characterization, revealing critical information about: Molecular Weight Protein Sequence Protein Folding and Structure In this lab, we follow an analytical progression from intact protein analysis, through structural interrogation under native and denaturing conditions, to peptide-level sequencing of enhanced Green Fluorescent Protein (eGFP).

  • Week 11 Lab: Introduction to Cloud Laboratories

    Cloud Laboratories: Collective Art and Cell-Free Optimization Overview | Introduction Cloud laboratories are making science accessible, affordable, and reproducible. This lab showcases how cloud labs enable human creativity at scale and provide a platform for global collaboration. Our goal is to design a scientifically rigorous cell-free fluorescent protein optimization experiment together.

  • Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene

    Bioproduction of Beta-Carotene and Lycopene Overview | Objective In this two-day lab, you will work with genetically modified E. coli to produce beta-carotene and lycopene, key plant pigments and antioxidants found in carrots and tomatoes. Using the plasmids pAC-LYC and pAC-BETA, which encode the pathways for lycopene and beta-carotene production, your goal will be to optimize the production of these two pigments.

Apr 8, 2026

Subsections of Labs

Week 1 Lab: Introduction to Pipetting and Dilutions

Overview

This first HTGAA lab introduces the foundational techniques of pipetting and serial dilutions — critical skills for precise liquid handling in biological and chemical experiments. Two protocols were covered: mixing food coloring solutions to build volume intuition, and performing a serial dilution of a mystery substance (MS) to achieve a target concentration.


Pre-Lab

Key Definitions

| Term | Definition |
|---|---|
| Mole (mol) | A unit representing 6.022 × 10²³ particles (atoms, molecules, etc.) |
| Molarity (M) | Concentration defined as moles of solute per liter of solution (mol/L) |
| Conversions | 1 L = 1,000 mL = 1,000,000 µL  •  1 M = 1,000 mM = 1,000,000 µM |

Dilution Formula

The core equation for all dilution calculations:

C₁V₁ = C₂V₂

| Symbol | Meaning |
|---|---|
| C₁ | Initial (stock) concentration |
| V₁ | Volume of stock to transfer |
| C₂ | Desired final concentration |
| V₂ | Total final volume |

Rearranged to find the transfer volume: V₁ = (C₂ × V₂) / C₁

Volume of water to add: V_water = V₂ − V₁
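
The two rearranged formulas translate directly into a small helper. The sketch below is illustrative: the function name is hypothetical, and the example numbers (a 1,000 mM stock diluted to 50 mM in 200 µL) are made up for demonstration.

```python
def dilution_volumes(c1, c2, v2):
    """Solve C1*V1 = C2*V2 for the stock and water volumes.

    c1 and c2 must share one concentration unit; v2 sets the volume unit
    of both returned values (V1, V_water).
    """
    if c1 <= 0 or c2 <= 0 or v2 <= 0:
        raise ValueError("concentrations and volume must be positive")
    if c2 > c1:
        raise ValueError("a dilution cannot raise the concentration")
    v1 = c2 * v2 / c1
    return v1, v2 - v1


# Hypothetical example: 1,000 mM stock -> 50 mM in a 200 µL final volume.
v_stock, v_water = dilution_volumes(c1=1000, c2=50, v2=200)
print(v_stock, v_water)  # 10.0 190.0
```

Converting both concentrations to the same unit before calling the function is the step most often missed when doing this by hand.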


Dilution Practice 1

Scenario: Stock [MS] = 5 M. Goal: reach 100 µM using two sequential steps.

Step 1 — 5 M → 10,000 µM (1:499 dilution, 500× dilution factor)

Using C₁V₁ = C₂V₂ with V₂ = 500 µL:

V₁ = (10,000 µM × 500 µL) / 5,000,000 µM = 1 µL of 5 M stock

V_water = 500 − 1 = 499 µL of dH₂O

Result: 500 µL at 10,000 µM (10 mM)

Step 2 — 10,000 µM → 100 µM (1:99 dilution, 100× dilution factor)

Using C₁V₁ = C₂V₂ with V₂ = 1000 µL:

V₁ = (100 µM × 1000 µL) / 10,000 µM = 10 µL of Step 1 solution

V_water = 1000 − 10 = 990 µL of dH₂O

Result: 1000 µL at 100 µM — total dilution factor = 500 × 100 = 50,000×
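
The two steps can be chained in code to confirm the intermediate concentration, the final concentration, and the cumulative dilution factor. This is a minimal sketch; the function name and the (transfer, water) tuple layout are illustrative choices.

```python
def run_serial_dilution(stock_um, steps):
    """Apply successive dilutions and report each step.

    steps is a list of (transfer_ul, water_ul) pairs.
    Returns a list of (dilution_factor, concentration_um) per step.
    """
    conc = stock_um
    history = []
    for transfer_ul, water_ul in steps:
        factor = (transfer_ul + water_ul) / transfer_ul
        conc = conc / factor
        history.append((factor, conc))
    return history


# 5 M stock expressed in µM, then the two steps described above.
history = run_serial_dilution(5_000_000, [(1, 499), (10, 990)])

total_factor = 1
for factor, conc in history:
    total_factor *= factor
    print(f"{factor:.0f}x -> {conc:.0f} µM")
print(f"total: {total_factor:.0f}x")
```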

[Figure: Two-step serial dilution diagram from 5 M stock to 100 µM working solution]

Dilution Practice 2

Part a — Stock concentration in g/mL

Given: stock = 5 M, molar mass = 532 g/mol

5 mol/L × 532 g/mol = 2,660 g/L = 2.66 g/mL

Part b — Serial dilution plan from 5 M to 100 µM

Total required dilution factor: 5,000,000 µM ÷ 100 µM = 50,000×

A single dilution step is impractical — transferring 1 µL into 50 mL introduces unacceptable pipetting error at the microliter scale. A two-step serial dilution distributes the dilution factor into two manageable operations:

| Step | Dilution | From → To | V_stock | V_water | Total | Tube | Pipette |
|---|---|---|---|---|---|---|---|
| 1 | 1:499 (500×) | 5 M → 10,000 µM | 1 µL | 499 µL | 500 µL | 1.5 mL Eppendorf | P20 (1 µL) + P1000 (499 µL) |
| 2 | 1:99 (100×) | 10,000 µM → 100 µM | 10 µL | 990 µL | 1000 µL | 1.5 mL Eppendorf | P20 (10 µL) + P1000 (990 µL) |

Two dilution steps total. Eppendorf tubes are used at each step because they hold up to 1.5 mL and snap closed securely. The P20 pipette handles small transfers (1–10 µL) with precision; the P1000 adds the bulk water volumes.

Part c — Final reaction table (60 µL total)

Using C₁V₁ = C₂V₂ for each component:

  • Loading dye: V₁ = (1X × 60 µL) / 6X = 10 µL
  • MS: V₁ = (40 µM × 60 µL) / 100 µM = 24 µL
  • dH₂O: 60 − 10 − 24 = 26 µL

| Reagent | Stock concentration | Desired concentration | Volume to add |
|---|---|---|---|
| Loading dye | 6X | 1X | 10 µL |
| MS | 100 µM | 40 µM | 24 µL |
| dH₂O | n/a | n/a | 26 µL |
| Total | | | 60 µL |
[Figure: Stacked bar showing 60 µL reaction composition: loading dye, MS, and water volumes]
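
The same reaction table can be generated programmatically, with water computed as whatever volume remains. This is a minimal sketch under the assumption that each component's stock and final concentrations share a unit; the function name is hypothetical.

```python
def reaction_volumes(total_ul, components):
    """Compute per-component volumes for a fixed-volume reaction.

    components maps a name to (stock_conc, final_conc) in matching units.
    Water fills the remainder; a negative remainder means the stocks are
    too dilute to fit in the requested total volume.
    """
    volumes = {
        name: final * total_ul / stock
        for name, (stock, final) in components.items()
    }
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this total volume")
    volumes["dH2O"] = water
    return volumes


mix = reaction_volumes(60, {"loading dye": (6, 1), "MS": (100, 40)})
print(mix)  # {'loading dye': 10.0, 'MS': 24.0, 'dH2O': 26.0}
```

Passing a 40 µM MS stock instead of 100 µM raises the error, because the MS alone would then have to fill the whole 60 µL and leave no room for the loading dye.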

Why prepare 100 µM if 40 µM is the target?

Serial dilutions work most reliably when each step uses a round dilution factor (e.g., 500×, 100×) that keeps every transfer volume within the accurate range of an appropriate pipette. Diluting 5 M directly to 40 µM would require a 125,000× dilution, and a stock already at 40 µM could not be used in the reaction at all: adding loading dye and water would dilute it below the target. The working stock therefore has to be more concentrated than 40 µM. A 100 µM stock keeps the dilution ladder simple (1:499 → 1:99), and the final 40 µM is then set in the reaction itself by adding 24 µL of stock to a 60 µL total. A secondary benefit: the 100 µM stock can be reused for additional experiments without repeating the full dilution from scratch.
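The reasoning can be made quantitative: once the loading dye takes its share of the 60 µL, the MS stock must be concentrated enough to fit in the remaining volume. The sketch below computes that lower bound; the function name is hypothetical, and it assumes the 10 µL of loading dye is the only other fixed component.

```python
def min_stock_concentration(c_target, v_total, v_other):
    """Lowest stock concentration that still fits in the volume left over
    after the other reagents (same concentration units as c_target)."""
    v_available = v_total - v_other
    if v_available <= 0:
        raise ValueError("no volume left for the stock")
    return c_target * v_total / v_available

# 40 µM target in 60 µL, with 10 µL already taken by 6X loading dye
min_ms_stock = min_stock_concentration(40, 60, 10)

# Volume of the 100 µM stock actually used in the reaction
ms_volume = 40 * 60 / 100
print(min_ms_stock, ms_volume)
```

Any stock below the computed minimum leaves no room for the dye, which is why a 100 µM intermediate is convenient: it sits above the bound and leaves space for water.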


Lab Documentation

Materials

| Equipment | Volume Range | Use in This Lab |
|---|---|---|
| P20 pipette | 1–20 µL | MS stock transfers (1 µL, 10 µL), loading dye |
| P200 pipette | 20–200 µL | Color solutions, portions of water addition |
| P1000 pipette | 100–1000 µL | Bulk water addition during serial dilution |
| 1.5 mL Eppendorf tubes | Up to 1.5 mL | Both serial dilution steps |
| PCR tube strips | ~200 µL each | Final reaction |
| Tube holder | n/a | Stability during pipetting |

Part 1: Mixing Colors

Six numbered Eppendorf tubes were prepared using red, yellow, and blue food coloring solutions to explore color mixing and build volume intuition at different scales.

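| Tube | Solution(s) | Volume(s) | Expected Color |
|---|---|---|---|
| 1 | Red | 500 µL | Red |
| 2 | Yellow | 500 µL | Yellow |
| 3 | Blue | 500 µL | Blue |
| 4 | Red + Yellow | 220 µL + 220 µL | Orange |
| 5 | Yellow + Blue | 525 µL + 525 µL | Green |
| 6 | Red + Blue | 155 µL + 155 µL | Purple |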

For Tube 4, the 220 µL of each color was added in two steps: 200 µL first (P200), then 20 µL (P20), with a tip change between colors to prevent cross-contamination. Mixing was done by pipetting up and down 3–4 times after each addition.

Six test tubes showing expected colors from primary color mixing

Small volumes of each solution were spotted onto a petri plate in increasing amounts (1 µL, 2 µL, 5 µL, 10 µL) to develop an intuitive sense of scale. At 1 µL, the drop is barely visible; at 10 µL, it forms a clearly visible dome.

Top-down view of petri plate showing relative sizes of 1, 2, 5, and 10 µL drops

Part 2: Serial Dilution

The two-step protocol was followed to prepare a 100 µM working stock of MS from the 5 M stock, then combined into the final 60 µL reaction:

Step 1 (500× dilution): Using the P20, 1 µL of 5 M MS stock was transferred into a labeled Eppendorf tube. 499 µL of dH₂O was added with the P1000. The solution was mixed by pipetting up and down 4 times. The tube was labeled “10 mM MS”.

Step 2 (100× dilution): Using the P20, 10 µL of the 10 mM intermediate was transferred into a fresh Eppendorf tube. 990 µL of dH₂O was added with the P1000, mixed 4 times, and the tube was labeled “100 µM MS”.

Final Reaction Assembly (60 µL total):

| Order added | Reagent | Volume |
|---|---|---|
| 1 | dH₂O | 26 µL |
| 2 | MS (100 µM stock) | 24 µL |
| 3 | Loading dye (6X) | 10 µL |
| Total | | 60 µL |

Water was added first to reduce viscosity effects, then MS, then the concentrated loading dye last. The final reaction contains 40 µM MS and 1X loading dye in a bright purple solution.

Bonus — Gel Loading: 20 µL from the final reaction was pipetted into a pre-prepared agarose gel well. The tip was held just above the well opening (not inserted deeply) and the plunger depressed slowly and steadily to avoid puncturing the gel while ensuring the dense, dye-loaded solution sank into the well.

Week 2 Lab: DNA Gel Art

Gel Art: Restriction Digests and Gel Electrophoresis

Overview | Objective

The goal of this 3-hour lab is to immerse you in the practical world of DNA gel electrophoresis and restriction enzyme-based DNA manipulation. You’ll create stunning DNA gel art while mastering essential techniques used in scientific research! Inspired by Paul Vanouse’s Art project and his Latent Figure Protocol, this lab offers a unique opportunity to blend creativity with molecular biology. By visualizing DNA fragments of varying lengths, you’ll gain firsthand experience in a process critical for verifying DNA sequences.

While the outcome of this lab is primarily artistic, gel electrophoresis is a fundamental tool in molecular biology for verifying DNA sequences. It allows you to confirm that the DNA you have obtained—whether purchased, purified, or constructed—matches your expectations. By comparing the lengths of DNA fragments observed on the gel to your predictions, you can assess whether the DNA sequence is correct. Although sequencing provides definitive confirmation, it is significantly more expensive, making gel electrophoresis an essential and cost-effective preliminary step in DNA analysis.

“The America Project”, Paul Vanouse, 2016. (Simulated iconographic Gel Art)

Overview | Concepts Learned & Skills Gained

This lab aims to enhance your understanding of core molecular biology concepts, including the mechanism of gel electrophoresis, the function of restriction enzymes, and the interpretation of DNA banding patterns. Through hands-on experience, you will gain skills in:

  • Benchling Tools: Importing and analyzing DNA sequences, simulating restriction digests, and designing gel layouts.
  • Restriction Digest Setup: Preparing precise enzyme reactions for targeted DNA fragment generation.
  • Agarose Gel Preparation: Calculating and casting 1% agarose gels with appropriate buffers and DNA stains.
  • Gel Electrophoresis Execution: Loading samples with accuracy, setting up electrophoresis apparatus, and troubleshooting common issues.
  • DNA Visualization: Using a blue light transilluminator to image and document gel results effectively.

Pre-Lab | Reading

(1) How does Gel Electrophoresis Work?

In gel electrophoresis, we place DNA samples in a semi-solid gel made of agarose. The gel acts as a molecular mesh. DNA has a negative charge due to the phosphate groups in its sugar-phosphate backbone. When placed in an electric field, DNA fragments move towards the positively charged electrode (the anode).

DNA Chemical Structure

When an electric current is applied, the DNA fragments are pulled through the gel. However, their movement is influenced by size:

  • Smaller DNA fragments navigate through the pores more easily and move faster toward the anode.
  • Larger DNA fragments experience more resistance, slowing their progress.

Gel Electrophoresis Mechanism

(2) DNA Gel Ladders

Gel electrophoresis separates linear DNA fragments primarily by length. DNA ladders serve as molecular weight markers (biological rulers), providing standardized DNA sizes for comparison.

(3) Restriction enzymes

Restriction enzymes, or restriction endonucleases, cut DNA at specific sequences called restriction sites. Each enzyme recognizes a specific nucleotide sequence, often palindromic. We used High-Fidelity (HF) enzymes, which are engineered to reduce off-target (star) activity.

Restriction Enzyme Action

(4) GenBank and FASTA file formats

To run a virtual digest, DNA sequences are stored in FASTA or GenBank file formats. FASTA files have a simple format (sequence ID followed by the sequence), while GenBank files contain additional annotations.

FASTA and GenBank Examples

Protocol | Part 0: Designing your Gel Art

Time Estimate: 1 hour

Use Benchling to design your gel art and run your virtual digest. The distinct bands formed from gel electrophoresis can be used to create pictures.

Benchling Simulation Result

Protocol | Part 1a: Preparing a 1% agarose electrophoresis gel

Time Estimate: 20 minutes prep, 30 minute wait

  1. Add 0.75 g of agarose and 75 mL of 1x TAE buffer to a microwavable flask (1% w/v).
  2. Heat in a microwave until dissolved.
  3. Allow to cool to ~50 °C, then add 7.5 µL of SYBR Safe DNA stain.
  4. Pour into tray with comb and let solidify for 30 minutes.
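The recipe scales linearly with gel volume, so a small helper can recompute it for a different tray size. This is a sketch under two assumptions: the 7.5 µL of stain in 75 mL reflects a 1:10,000 dilution, and the function name is illustrative only.

```python
def agarose_gel_recipe(volume_ml, percent_w_v=1.0, stain_dilution=10_000):
    """Return (agarose in g, stain in µL) for a gel of the given volume.

    percent_w_v is grams of agarose per 100 mL of buffer;
    stain_dilution is N in a 1:N stain dilution.
    """
    agarose_g = percent_w_v * volume_ml / 100
    stain_ul = volume_ml * 1000 / stain_dilution
    return agarose_g, stain_ul

# The 75 mL gel from the protocol
agarose_g, stain_ul = agarose_gel_recipe(75)
print(agarose_g, stain_ul)
```

Calling `agarose_gel_recipe(50)` would give the amounts for a smaller 50 mL gel at the same percentage.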

Protocol | Part 1b: Restriction Digest

Time Estimate: 15 minutes prep, 20 minute wait

| Reagent | Desired conc/amount | Stock conc/amount | Volume |
|---|---|---|---|
| Lambda DNA | 1.5 µg | 0.5 µg/µL | 3 µL |
| Enzyme-specific Buffer | 1x | 10x | 2 µL |
| Restriction Enzyme | 15 units | 20 units/µL | 1 µL (per enzyme) |
| Nuclease-free water | n/a | n/a | up to 20 µL |
| Total | | | 20 µL |
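The water volume depends on how many enzymes go into the digest, so it helps to compute it per reaction. The sketch below uses the table's values as defaults; the function name and dictionary keys are illustrative. Note that 1 µL of a 20 units/µL enzyme delivers 20 units, slightly more than the 15-unit target (which would be 0.75 µL).

```python
def digest_volumes(n_enzymes=1, total_ul=20.0, dna_ug=1.5,
                   dna_stock_ug_per_ul=0.5, buffer_stock_x=10.0,
                   enzyme_ul_each=1.0):
    """Return the volume (µL) of each component for one restriction digest."""
    dna_ul = dna_ug / dna_stock_ug_per_ul   # 1.5 µg from a 0.5 µg/µL stock
    buffer_ul = total_ul / buffer_stock_x   # 10x buffer diluted to 1x
    enzyme_ul = enzyme_ul_each * n_enzymes
    water_ul = total_ul - dna_ul - buffer_ul - enzyme_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {"dna": dna_ul, "buffer": buffer_ul,
            "enzyme": enzyme_ul, "water": water_ul}

single = digest_volumes(n_enzymes=1)   # one enzyme
double = digest_volumes(n_enzymes=2)   # double digest
print(single, double)
```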

Protocol | Part 2: Gel Run

Time Estimate: 15 minutes setup, 1 hour wait

Loading the wells correctly is very important and requires a steady hand. Make sure the pipette tip hovers at the top of the well.

  1. Load 20 µL of each sample into the wells.
  2. Run the gel at 80–115 V for around 45 minutes.

Protocol | Part 3: Imaging Your Results

Time Estimate: 5 minutes

Place the gel on the blue light transilluminator. Turn on the light, turn off room lights, and capture a clear image of the bands.


Supplemental | Troubleshooting

  • Smeared or unreadable lane: Too much DNA loaded, or voltage set too high.
  • Bleeding trails: Incomplete mixing of the sample or an incorrect digest incubation.
  • Samples not migrating: Check whether water was used instead of TAE buffer (water lacks the ions needed for conductivity).

HTGAA 2026 | Arman Saadatkhah | Reference: Paul Vanouse “The America Project”

Week 3 Lab: Opentrons Art

Opentrons Artwork: Fluorescent Bacteria Pixel Art

Overview | Objective

In this two-day lab, you’ll program the Opentrons OT-2 pipetting robot to create stunning, glowing designs by depositing genetically engineered E. coli onto black (charcoal) agar plates. These bacteria express fluorescent proteins in vibrant colors, forming “bio-art” that comes to life under UV light. It’s your chance to turn cutting-edge biotech into a canvas for creativity!

Fig 1. Simulated 20-point star pattern generated via Python for the Opentrons robot.

Overview | Concepts Learned & Skills Gained

This week, you will be working with the Opentrons OT-2, a liquid handling robot used in various life-science laboratories. You will learn:

  • How to incorporate automation into synthetic biology research.
  • How to code a Python script using the Opentrons API.
  • How to create agar plates, a basic tool in molecular biology.

Fig 2. Standard HTGAA 2026 Deck Configuration for the OT-2 robot.


Pre-Lab | Reading

The “Central Dogma” of Opentrons

Before programming, it’s important to understand the workflow that transforms an idea into a precise robotic procedure.

Fig 3. The transformation from plain language instructions to hardware commands.

  1. Paper Protocol: Instructions written in plain language (e.g., “Pipette 100 uL into A1”).
  2. Opentrons Protocol: The Python script that translates these steps using the Opentrons API.
  3. Compiled Protocol: The Opentrons App compiles the script into commands controlling the robot hardware.

GFP and Friends: The Science of Glow

Green Fluorescent Protein (GFP) is a protein that glows green when illuminated with UV light. Fluorescence involves absorbing light at one wavelength (here UV) and re-emitting it at a longer one (visible green).

Fig 4. The mechanism of absorption and emission in fluorescent proteins.

For this lab, we use E. coli engineered to express red, green, blue, cyan, or yellow fluorescent protein genes (RFP, GFP, BFP, CFP, YFP). We mix charcoal powder into the agar to make it black, enhancing the visibility of the glowing designs.


Protocol | Part 1: Fluorescent Bacteria & Black Agar Script

Time Estimate: 2 Hours

Artistic Concept — “ELM Habitat Cross-Section”

My design uses the 96-well plate as a canvas to depict a cross-sectional schematic of the Multi-Trophic Myco-Foundry — the engineered living material (ELM) habitat proposed in Week 1.

Fig 5. Well-plate layout for “ELM Habitat Cross-Section.” Blue = Structural Shell, Orange = Vascular System, Green = Metabolic Hub.

Python Script (Opentrons API v2.14)

The robot starts the run with no tip attached, and a fresh tip is picked up for each color to prevent cross-contamination.

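from opentrons import protocol_api
import math

# Metadata and setup
metadata = {'protocolName': 'ELM Habitat Art', 'apiLevel': '2.14'}

# Center of the 8 x 12 well grid, in zero-based (row, column) index units
CENTER_ROW, CENTER_COL = 3.5, 5.5

# Well classification logic for concentric rings
def classify_well(ri, ci):
    """Assign a well to a ring by its distance from the plate center.

    ri: zero-based row index (0-7 for rows A-H)
    ci: zero-based column index (0-11 for columns 1-12)
    """
    d = math.sqrt((ri - CENTER_ROW)**2 + (ci - CENTER_COL)**2)
    if d < 1.5:
        return 'hub'        # innermost wells (green, Metabolic Hub)
    if d < 3.0:
        return 'vascular'   # middle ring (orange, Vascular System)
    if d <= 5.0:
        return 'shell'      # outer ring (blue, Structural Shell)
    return 'empty'          # corner wells stay unfilled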
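Before running on the robot, the ring logic can be previewed as text. The sketch below repeats `classify_well` so it runs without the Opentrons package installed; the single-letter symbols and the `SYMBOLS` mapping are illustrative choices, not part of the original script.

```python
import math
from collections import Counter

def classify_well(ri, ci):
    """Same ring thresholds as the protocol script above."""
    d = math.sqrt((ri - 3.5) ** 2 + (ci - 5.5) ** 2)
    if d < 1.5:
        return 'hub'
    if d < 3.0:
        return 'vascular'
    if d <= 5.0:
        return 'shell'
    return 'empty'

# One character per region: Green hub, Orange vascular, Blue shell
SYMBOLS = {'hub': 'G', 'vascular': 'O', 'shell': 'B', 'empty': '.'}

# Build one string per plate row (A-H), twelve characters wide
rows = ["".join(SYMBOLS[classify_well(ri, ci)] for ci in range(12))
        for ri in range(8)]
counts = Counter(classify_well(ri, ci)
                 for ri in range(8) for ci in range(12))

for label, row in zip("ABCDEFGH", rows):
    print(label, row)
print(dict(counts))
```

With these thresholds the plate has 4 hub wells, 28 vascular wells, 40 shell wells, and 24 empty wells, which is useful when estimating how much of each dye to prepare.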

Protocol | Part 2: Automation Plan for Final Project

Goal: Use lab automation to screen and validate the phosphite auxotrophy biocontainment system.

Fig 6. Seven-step Opentrons automation pipeline for screening the ELM phosphite auxotrophy kill switch.

The OT-2 will run a 12-condition × 8-replicate growth screen, testing the ptxD-based kill switch across a 2-fold phosphite dilution series. This allows for rapid calculation of the IC₅₀ value, essential for the safety case of deployment.
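One way to lay out the 12-condition × 8-replicate screen is one condition per column and one replicate per row of a 96-well plate. The sketch below builds that plate map; the top concentration of 1024 µM and the column/row assignment are assumptions chosen for illustration, not values from the project plan.

```python
def twofold_series(top_conc, n_conditions=12):
    """Concentrations for a 2-fold dilution series, highest first."""
    return [top_conc / 2 ** i for i in range(n_conditions)]

TOP_PHOSPHITE_UM = 1024.0  # hypothetical starting concentration

series = twofold_series(TOP_PHOSPHITE_UM)

# Columns 1-12 hold the conditions; rows A-H hold the 8 replicates
plate_map = {f"{row}{col + 1}": conc
             for col, conc in enumerate(series)
             for row in "ABCDEFGH"}

print(series)
print(plate_map["A1"], plate_map["H12"])
```

A dictionary keyed by well name maps directly onto how wells are addressed in an Opentrons script, so the same structure could drive the transfer loop.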


Post-Lab | Troubleshooting & Results

Alice Cai’s final plate (2023) demonstrates the transition from a dark charcoal agar to a vibrant glowing masterpiece under UV light. Simple geometric shapes often yield the most precise results on the OT-2.


HTGAA 2026 | Arman Saadatkhah | Reference: Opentrons OT-2 Documentation

Week 6 Lab: Gibson Assembly

The Chromophore Color Cloning Quest

Overview | Objective

In this lab, you’ll be changing the color-generating chromophore of the purple Acropora millepora chromoprotein (amilCP) to a variety of orange, pink, and blue mutants.

First, we’ll prepare two polymerase chain reactions (PCR) to generate the necessary fragments for a Gibson assembly. Using the amilCP-encoding Addgene mUAV plasmid as a template, we will amplify:

  • The Backbone Fragment: Containing the origin of replication, Chloramphenicol resistance, and the promoter/RBS.
  • The Insert Fragment: Containing the chromophore region with intentional mutations for color variation.

Fig 1. PCR strategy for generating Backbone and Insert fragments from the mUAV template.


Pre-Lab | Concepts

(1) Spectral Engineering of amilCP

The amilCP gene contains a chromophore (CP) region that can be mutated to express different colors. By changing the DNA sequence at the CP site (cagTGTCAGtac), we can engineer proteins that absorb light at different wavelengths.

Fig 2. DNA sequences and predicted phenotypes for various amilCP mutants.

(2) Gibson Assembly Mechanism

Gibson Assembly is a “one-pot” isothermal reaction that allows multiple DNA fragments to be joined together. It relies on overlapping sequences (20-40 bp) at the ends of the fragments.

Fig 3. The four steps of Gibson Assembly: Exonuclease chew-back, Annealing, Polymerase fill-in, and Ligase sealing.

(3) Transformation

Once the plasmid is assembled, it must be introduced into E. coli cells. We use Heat Shock to create temporary pores in the cell membrane, allowing the DNA to enter by diffusion.

Fig 4. Mechanism of Heat Shock transformation in chemically competent DH5α cells.


Protocol | Part 1: PCR & Purification

PCR Setup

We set up two reactions to amplify the backbone and the mutant inserts.

  • Backbone Fragment: ~2.5 kb (Primers: Backbone Fwd/Rev)
  • Insert Fragment: ~0.5 kb (Primers: Color Fwd/Rev)

DpnI Digest & Purification

After PCR, we treat the samples with DpnI to digest the methylated template DNA (mUAV), ensuring only our newly synthesized mutant fragments are used in the assembly. We then purify the DNA using a silica-based column.


Protocol | Part 2: Assembly & Transformation

Gibson Assembly

We mix our purified Backbone and Insert fragments in a 1:2 molar ratio with the Gibson Assembly Master Mix and incubate at 50°C for 30 minutes.
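Because the ratio is molar, the shorter insert needs far less mass than the backbone. The sketch below converts between ng and pmol using the common approximation of 650 Da per base pair; the 50 ng backbone amount is a hypothetical example, and the fragment lengths are the approximate sizes listed in Part 1.

```python
AVG_BP_MASS_DA = 650  # approximate average mass of one base pair of dsDNA

def ng_to_pmol(ng, length_bp):
    """Convert a mass of double-stranded DNA to picomoles."""
    return ng * 1000 / (length_bp * AVG_BP_MASS_DA)

def insert_ng_for_ratio(backbone_ng, backbone_bp, insert_bp,
                        insert_per_backbone=2):
    """Insert mass (ng) that gives the requested insert:backbone molar ratio."""
    return backbone_ng * insert_per_backbone * insert_bp / backbone_bp

backbone_ng = 50  # hypothetical amount of backbone in the reaction
insert_ng = insert_ng_for_ratio(backbone_ng, 2500, 500)

# Confirm the molar ratio that results from these masses
ratio = ng_to_pmol(insert_ng, 500) / ng_to_pmol(backbone_ng, 2500)
print(insert_ng, ratio)
```

The check at the end confirms that the computed insert mass gives two insert molecules per backbone molecule.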

Transformation

The assembled plasmids are transformed into DH5α competent cells.

  1. Thaw cells on ice for 10 minutes.
  2. Add 4 uL of assembly product.
  3. Heat Shock: 42°C for exactly 45 seconds.
  4. Outgrowth: Add SOC media and incubate at 37°C for 1 hour.
  5. Plating: Plate onto LB + Chloramphenicol agar.

Final Results | Example

Fig 5. Predicted result showing a variety of colorful colonies representing different amilCP mutations.

After 72 hours of incubation, you should see a vibrant variety of purple, orange, pink, and blue colonies!


HTGAA 2026 | Arman Saadatkhah | Reference: Liljeruhm et al. (2018)

Week 7 Lab: Neuromorphic Circuits

Genetic Circuits II: Intracellular Artificial Neural Networks (IANNs)

Overview | Objective

In this two-day lab, you will design and build your very own Intracellular Artificial Neural Network (IANN) using a library of plasmids from the Ron Weiss lab and human embryonic kidney (HEK) 293 cells.

Unlike traditional digital genetic circuits, IANNs perform analog computations and act as universal function approximators. Given an adequate number of intracellular artificial neurons (Sequestrons), you can use an IANN to achieve complex, non-linear input/output behaviors.

Fig 1. Laboratory workflow from computational design to automated execution and biological transfection.


Pre-Lab | Concepts

(1) CRISPR Endoribonucleases (Csy4)

Csy4 is a specialized CRISPR endoribonuclease that recognizes specific RNA hairpin sequences. In our circuit, Csy4 acts as the “neuron” that processes information by cleaving and destabilizing target mRNA (like eBFP), effectively performing analog subtraction or thresholding.

Fig 2. Schematic of Csy4 recognizing and cleaving a target mRNA recognition site.

(2) The Sequestron

The Sequestron is the fundamental building block of neuromorphic genetic circuits. It works by “sequestering” an activator (target): an inhibitor (sequestron) binds it, creating a sharp, programmable analog response curve analogous to the activation functions in artificial neural networks.
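The thresholding behavior can be illustrated with a textbook idealization of molecular sequestration: with tight 1:1 binding, only the activator in excess of the inhibitor stays free. The sketch below is that idealization only. It is not the model used by the Neuromorphic Wizard, and the function names, weights, and arbitrary units are assumptions.

```python
def free_activator(activator_total, inhibitor_total):
    """Tight-binding 1:1 sequestration: activator in excess of the
    inhibitor remains free; everything else is bound and inactive."""
    return max(0.0, activator_total - inhibitor_total)

def neuron_output(inputs, weights, inhibitor_total):
    """Weighted sum of inputs passed through the sequestration threshold."""
    activator_total = sum(w * x for w, x in zip(weights, inputs))
    return free_activator(activator_total, inhibitor_total)

weights = [0.5, 0.25]  # hypothetical input weights

# Weighted sum of 1.0 is below the threshold of 2.0: output is off
below = neuron_output([1.0, 2.0], weights, inhibitor_total=2.0)
# Weighted sum of 3.0 exceeds the threshold: output is the excess
above = neuron_output([4.0, 4.0], weights, inhibitor_total=2.0)
print(below, above)
```

In this picture the inhibitor amount plays the role of a bias term, and the response has the same shape as a rectified linear (ReLU) activation.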

Fig 3. The Sequestron mechanism and its resulting analog weighted sum response.

(3) Transfection with Lipofectamine 3000

To get our DNA designs into human HEK293 cells, we use Lipofectamine 3000. This reagent encapsulates plasmid DNA into lipid-based complexes (lipoplexes) that can easily enter cells via endocytosis.

Fig 4. Molecular mechanism of DNA-lipid complex formation and cellular entry.


Dry Lab | Neuromorphic Wizard & Design

The Neuromorphic Wizard software is used to predict the behavior of your IANN designs. By inputting different concentrations of Sequestrons and target plasmids, the wizard provides a simulation of the expected fluorescence output.

Multilayer IANN Design

For this lab, we developed a multilayer perceptron where multiple DNA inputs converge on a set of hidden layer neurons (endoribonucleases), which then regulate the final output protein level.

Fig 5. Multilayer intracellular perceptron design featuring hidden layers for complex signal processing.


Wet Lab | OT-2 Execution

The design is finalized in a Genetic Circuit Design Template (spreadsheet) and executed by an Opentrons OT-2. The robot handles the precise mixing of the plasmid library and the transfection reagents before adding them to the HEK293 cell culture.

Key Parameters:

  • Concentration: 50 ng/μL
  • Total DNA: Maximum 650 ng per circuit.
  • Cell Type: HEK293 (Human Embryonic Kidney)
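These limits translate into a simple check on any circuit design before it goes to the robot. The sketch below assumes that 50 ng/µL is the concentration of every plasmid stock and that the 650 ng cap applies to the summed plasmid mass; the plasmid names and amounts are hypothetical.

```python
STOCK_NG_PER_UL = 50   # assumed concentration of each plasmid stock
MAX_TOTAL_NG = 650     # maximum total DNA per circuit

def plasmid_volumes(design_ng):
    """Map each plasmid's mass (ng) to the stock volume (µL) to pipette,
    rejecting designs that exceed the total DNA limit."""
    total_ng = sum(design_ng.values())
    if total_ng > MAX_TOTAL_NG:
        raise ValueError(f"design uses {total_ng} ng; limit is {MAX_TOTAL_NG} ng")
    return {name: ng / STOCK_NG_PER_UL for name, ng in design_ng.items()}

# Hypothetical four-plasmid circuit totalling 550 ng
design = {"input_A": 100, "input_B": 150, "Csy4": 200, "reporter": 100}
volumes = plasmid_volumes(design)
max_volume_ul = MAX_TOTAL_NG / STOCK_NG_PER_UL
print(volumes, max_volume_ul)
```

Under these assumptions a full 650 ng design corresponds to 13 µL of plasmid stock in total.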

Results | Observation

Following transfection, the cells are incubated and observed for fluorescent protein expression. The graded intensity of the fluorescence across different wells validates the analog computational capability of the IANN.


HTGAA 2026 | Arman Saadatkhah | Reference: Ron Weiss Lab, MIT

Week 9 Lab: Cell-Free Systems

Cell-Free Transcription-Translation (TX-TL) Systems

Overview | What is Cell-Free?

A cell-free system allows biological reactions to occur outside of living cells. By extracting and using cellular components like ribosomes, RNA polymerase, amino acids, and ATP, this method enables reactions in a controlled, simplified environment.

Fig 1. The general workflow for preparing and running a cell-free TX-TL reaction.

Applications

  • Synthetic Biology: Testing circuits without cellular constraints.
  • Protein Engineering: Rapid production of toxic or difficult proteins.
  • Biosensing: Creating portable, paper-based diagnostic tools.

Technical Overview | TX-TL Production

A. Cell Extract Preparation

The process begins with E. coli growth, followed by washing and cell disruption (sonication or freeze-thaw). High-speed centrifugation at 30,000 × g separates the necessary machinery (ribosomes, factors) from debris. A strict cold chain is maintained to prevent enzymatic degradation.

B. Master Mix Components

The master mix provides the chemical environment and energy required for synthesis.

| Component | Function |
|---|---|
| HEPES (500 mM) | pH buffering for optimal enzyme activity. |
| ATP, GTP, CTP, UTP | Nucleotides for transcription and energy (ATP/GTP). |
| E. coli tRNA | Essential for amino acid delivery during translation. |
| 3-PGA or PEP | Energy regeneration sources to maintain ATP levels. |
| Mg/K-Glutamate | Essential ionic cofactors for enzymatic machinery. |
| Murine RNase Inhibitor | Protects mRNA templates from degradation. |

System Comparison | PURE vs. Lysate

There are two primary types of cell-free systems: the PURE System (defined, purified components) and Whole Cell Extract (crude lysate).

Fig 2. Comparison between the PURE system and Whole Cell Lysate systems.


Lab Exercise | amilGFP Induction Quest

The objective of this lab was to quantify protein production in a cell-free extract using different IPTG levels to induce the expression of amilGFP from a T7-IPTG-inducible plasmid.

Results & Analysis

Fluorescence was monitored over an 8-hour incubation at 30°C. We analyzed the endpoint results using ImageJ Color Histogram Analysis.

Fig 3. Mockup of the color histogram analysis used to quantify GFP expression levels.

Fig 4. Quantified GFP fluorescence across varying IPTG inducer concentrations.

Observations

  • Fold Change: We observed a significant dose-response relationship between IPTG concentration and GFP yield.
  • Background: The Non-Template Control (NTC) showed minimal background fluorescence, validating the specificity of the T7-IPTG system.

Homework Questions

1. Advantages of Cell-Free Protein Synthesis

Cell-free systems offer unparalleled flexibility because the reaction is directly accessible. You can add non-canonical amino acids, adjust magnesium concentrations mid-run, or introduce toxic components that would kill a living cell.

  • Beneficial Case 1: Toxic Proteins. Expressing antimicrobial peptides that would lyse the production host.
  • Beneficial Case 2: Rapid Prototyping. Testing 100+ genetic circuit variants in a single day without the time-consuming transformation/cloning cycle.

2. Main Components and Roles

  • Extract: Contains the molecular hardware (ribosomes, tRNA synthetases, RNA polymerase).
  • Master Mix: Provides the fuel (ATP/GTP) and chemical environment (buffers, ions).
  • Template (DNA): The software/instructions for the specific protein to be synthesized.

3. Energy Regeneration

Energy regeneration is critical because the initial ATP/GTP supply is exhausted within minutes. We use secondary energy sources like 3-PGA or PEP which, through the action of metabolic enzymes in the lysate, phosphorylate ADP back into ATP, ensuring a continuous supply for several hours.

4. Prokaryotic vs. Eukaryotic Systems

  • Prokaryotic (E. coli): High yield, fast, and simple. Ideal for producing amilGFP or other reporter proteins.
  • Eukaryotic (Wheat Germ): Slower but capable of complex Post-Translational Modifications (PTMs). Ideal for producing human signaling proteins like Insulin or complex antibodies that require chaperones for correct folding.

5. Optimizing Membrane Protein Expression

I would design the experiment by adding nanodiscs or synthetic liposomes directly to the cell-free reaction.

  • Challenge: Membrane proteins are hydrophobic and aggregate in aqueous buffers.
  • Solution: The presence of a lipid bilayer allows the protein to co-translationally insert into a stable environment, mimicking its natural state.

6. Troubleshooting Low Yield

  • Reason 1: Template Degradation. Strategy: Increase concentration of Murine RNase Inhibitor or use a circular plasmid instead of linear PCR product.
  • Reason 2: Substrate Depletion. Strategy: Perform the reaction in a dialysis format to continuously supply small molecules (NTPs, amino acids) and remove byproducts.
  • Reason 3: Incorrect Mg²⁺ Concentration. Strategy: Perform a magnesium titration (e.g., 4 mM to 16 mM) to find the specific optimum for the T7 polymerase and ribosome used.

HTGAA 2026 | Arman Saadatkhah | Reference: TX-TL Laboratory Manual

Apr 8, 2026

Week 10 Lab: Mass Spectrometry

Analytical Protein Characterization via LC-MS

Introduction and Background

Modern bioengineering relies on the ability to understand biological molecules with extraordinary precision. Liquid chromatography–mass spectrometry (LC-MS) is a cornerstone technique for protein characterization, revealing critical information about:

  • Molecular Weight
  • Protein Sequence
  • Protein Folding and Structure

In this lab, we follow an analytical progression from intact protein analysis, through structural interrogation under native and denaturing conditions, to peptide-level sequencing of enhanced Green Fluorescent Protein (eGFP).


PART I: Molecular Weight Determination

Instrument: Waters Xevo G3 QTof

We analyzed an eGFP standard to determine its molecular weight based on mass-to-charge (m/z) and charge (z) measurements. Under denaturing chromatographic conditions, the protein unfolds, exposing protonation sites.

Fig 1. Simulated intact eGFP mass spectrum showing the charge state envelope (20+ to 34+).

Key Observations:

  • eGFP MW: ~27 kDa.
  • Deconvolution: MaxEnt1 was used to determine the observed molecular weight from the m/z peaks.
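The link between the charge state envelope and the molecular weight can be shown with a simple calculation. The sketch below assumes each charge is carried by one proton and uses the approximate 27 kDa mass from the text. It recovers the mass from two neighboring peaks, which is a basic hand calculation and not the MaxEnt1 algorithm.

```python
PROTON_MASS = 1.007276  # Da

def mz_for_charge(mass_da, z):
    """m/z of a protein carrying z protons."""
    return (mass_da + z * PROTON_MASS) / z

def mass_from_adjacent_peaks(mz_low_charge, mz_high_charge):
    """Infer (charge, neutral mass) from two neighboring peaks whose
    charges are z and z + 1; the peak with charge z has the larger m/z."""
    z = round((mz_high_charge - PROTON_MASS) / (mz_low_charge - mz_high_charge))
    return z, z * (mz_low_charge - PROTON_MASS)

mass = 27_000.0  # approximate eGFP mass in Da

# Predicted peak positions for charge states 20+ to 34+
envelope = {z: mz_for_charge(mass, z) for z in range(20, 35)}

# Recover the charge and mass from the 20+ and 21+ peaks alone
z, recovered = mass_from_adjacent_peaks(envelope[20], envelope[21])
print(round(envelope[20], 3), round(envelope[34], 3), z, round(recovered, 3))
```

Higher charge states appear at lower m/z, which is why the unfolded protein's envelope shifts toward the low end of the spectrum.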

PART II: Native vs. Denatured Protein Structure

The higher-order structure of a protein influences its electrospray ionization (ESI) behavior.

  • Native (Folded): Compact, fewer solvent-accessible sites, lower charge states (e.g., 9+ to 13+).
  • Denatured (Unfolded): Elongated, many protonation sites, higher and broader charge states (e.g., 20+ to 40+).

Fig 2. Comparison of charge state distributions for native (folded) and denatured (unfolded) eGFP.

By adding formic acid to the native sample, we drop the pH and induce unfolding, directly observing the shift in the mass spectrum.


PART III: Peptide Mapping and Primary Sequence

Instrument: Waters BioAccord

To confirm the primary amino acid sequence, eGFP is enzymatically digested by trypsin, which cleaves at the C-terminal side of Lysine (K) and Arginine (R) residues.
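The cleavage rule is easy to apply in silico. The sketch below cuts after every K or R; it ignores missed cleavages and the common exception for a following proline, and the input is a made-up test sequence, not the eGFP sequence.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every lysine (K) or arginine (R)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep any C-terminal remainder that does not end in K or R
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

# Hypothetical 17-residue sequence for illustration
peptides = tryptic_digest("MASKGAELRTGVWPKLE")
print(peptides)
```

Joining the peptides back together reproduces the input, a useful sanity check when mapping observed peptides onto a sequence.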

Fig 3. Workflow for bottom-up proteomics: Digestion → Separation → Fragmentation (MS/MS).

Workflow:

  1. Denaturation & Reduction: Guanidine HCl and DTT.
  2. Buffer Exchange: Using BioRad Micro Bio-Spin columns.
  3. Digestion: 20-minute incubation with RapiZyme Trypsin at 55°C.
  4. LC-MS/MS: Fragmenting peptide ions to reconstruct the sequence.

PART IV: Charge Detection Mass Spectrometry (CDMS)

Conventional MS struggles with megadalton-sized complexes due to unresolved charge states. CDMS enables the analysis of large complexes by measuring both m/z and charge (z) for individual ions simultaneously.

Analysis of Keyhole Limpet Hemocyanin (KLH)

KLH exists in massive oligomeric states, including decamers (~8 MDa) and multidecamers (12-32 MDa).
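Because CDMS measures m/z and z for the same ion, each ion's mass follows directly from their product. The sketch below assumes each charge is carried by a proton; the example ion (m/z 40,000 with 200 charges) is hypothetical and chosen to land near the decamer mass quoted above.

```python
PROTON_MASS = 1.007276  # Da

def ion_mass(mz, z):
    """Neutral mass (Da) of a single ion from its measured m/z and charge."""
    return z * (mz - PROTON_MASS)

# Hypothetical single-ion measurement
mass_da = ion_mass(40_000.0, 200)
mass_mda = mass_da / 1e6
print(round(mass_mda, 3))
```

Repeating this for thousands of individual ions and binning the results produces the mass distribution, with no need to resolve a charge state envelope.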

Fig 4. Simulated CDMS plot showing individual ions of KLH decamers and multidecamers.


HTGAA 2026 | Arman Saadatkhah | Reference: Waters Immerse Cambridge

Week 11 Lab: Introduction to Cloud Laboratories

Cloud Laboratories: Collective Art and Cell-Free Optimization

Overview | Introduction

Cloud laboratories are making science accessible, affordable, and reproducible. This lab showcases how cloud labs enable human creativity at scale and provide a platform for global collaboration. Our goal is to design a scientifically rigorous cell-free fluorescent protein optimization experiment together.

Fig 1. Ginkgo Nebula architecture featuring modular automation carts connected via a software-defined control layer.


1. The 1,536 Pixel Artwork Canvas

The community bioart project involved contributing pixels to a global artwork experiment. This demonstrates the power of automation (Opentrons and Echo systems) to handle high-density layouts that would be impossible manually.

Fig 2. Simulation of the collective 1,536-pixel bioart canvas using various fluorescent proteins.

My Contribution:

  • Pixel Location: I contributed to the “DNA” pattern on the bottom right plate.
  • Reflection: I liked the seamless integration of individual designs into a unified biological canvas. For next year, real-time visualization of the design progress would be a great addition.

2. Cell-Free Protein Synthesis Reagents

A cell-free reaction requires a complex mixture of “hardware” (lysate) and “fuel” (master mix).

Master Mix Comparison

There are two main strategies for energy provision:

  1. 1-hour PEP-NTP: Optimized for speed; uses high-energy phosphate donors (PEP) for rapid protein synthesis.
  2. 20-hour NMP-Ribose-Glucose: Optimized for sustained production; uses secondary metabolism to regenerate ATP over longer periods.

Fig 3. Performance trade-offs between rapid (PEP-NTP) and sustained (NMP-based) energy systems.

Component Roles:

  • BL21 (DE3) Star Lysate: The molecular machinery (ribosomes, tRNA synthetases, T7 RNA Polymerase).
  • HEPES-KOH / Potassium Glutamate: Buffers and salts to maintain optimal pH and ionic strength.
  • Ribose/Glucose/NMPs: Building blocks and energy precursors for sustaining the reaction.
  • Guanine Bonus: Transcription can occur even without GMP if Guanine is present because the lysate contains salvage pathway enzymes (e.g., phosphoribosyltransferases) that can convert Guanine into GMP/GTP.

3. Global Experiment: Fluorescent Protein Properties

We used 6 different proteins for our collaborative painting, each with unique biophysical characteristics.

Fig 4. Key functional properties of the fluorescent proteins used in the collective optimization experiment.

Optimization Hypothesis:

  • Target Protein: mScarlet_I
  • Identified Property: High pH sensitivity (fluorescence drops in acidic conditions).
  • Hypothesis: By increasing the HEPES buffer concentration in the master mix from 500 mM to 750 mM, we can maintain a more stable neutral pH as metabolic byproducts accumulate, thereby maximizing fluorescence over the 36-hour incubation.

4. Generic Cloud Lab Operations (JSON)

Automation in cloud labs like Ginkgo Nebula is driven by software-defined protocols. Below is a sample configuration for a spark_read operation:

{
    "spark_protocol_duration": 60,
    "bs_speed": 200,
    "pl_seal_temp": 166,
    "pl_seal_time": 2.5,
    "spark_protocol_steps": ["Fluorescence_Ex488_Em510"]
}

HTGAA 2026 | Arman Saadatkhah | Reference: Ginkgo Bioworks Nebula Platform

Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene

Overview | Objective

In this two-day lab, you will work with genetically modified E. coli to produce beta-carotene and lycopene, key plant pigments and antioxidants found in carrots and tomatoes. Using the plasmids pAC-LYC and pAC-BETA, which encode the pathways for lycopene and beta-carotene production, your goal will be to optimize the production of these two pigments.

This lab explores bioproduction: the use of biological systems, such as microorganisms (e.g., bacteria, fungi, algae) or plant and animal cells, to produce valuable compounds or materials. Bioproduction plays a critical role in industrial biotechnology, pharmaceuticals, agriculture, and food production, enabling the creation of proteins, enzymes, antibiotics, biofuels, and more.

Fig 1. Bioproduction Process Overview. Overview of the bioproduction workflow, from genetic engineering to pigment extraction.


Overview | Concepts Learned & Skills Gained

A major challenge in bioproduction is the metabolic competition between the organism’s natural drive to reproduce and the production of the target compound. In this lab, you will explore how to fine-tune this balance to maximize pigment production, gaining hands-on experience in optimizing bioproduction systems.

You’ll investigate this by modifying culture conditions:

  1. Temperature (30°C vs 37°C)
  2. Growth Media Composition (LB vs 2YT, with and without fructose)

By examining how different environmental factors influence bacterial growth and pigment synthesis, you will gain practical insight into metabolic engineering and into what it takes to scale up the bioproduction of useful compounds.


Pre-Lab | Reading

As the pathway in Fig 2 shows, lycopene is the red pigment that gives tomatoes their color. Microbes also make this pigment. In fact, transferring a three-enzyme pathway into E. coli allows the cells to convert farnesyl diphosphate (FPP) to lycopene.

Fig 2. Metabolic Pathway. The biosynthetic pathway for Lycopene and Beta-Carotene.

Plasmids

  • pAC-LYC: Contains three genes from Erwinia herbicola: CrtE, CrtI, and CrtB. Produces lycopene.
  • pAC-BETA: Produces beta-carotene through the addition of the Erwinia herbicola CrtY gene to the lycopene pathway.
  • Resistance: Both plasmids include the gene for chloramphenicol resistance.
  • Reference: Addgene #53270, #53272.

Optical Density (OD600)

We will estimate cell growth by measuring the optical density of each culture at a wavelength of 600 nm (OD600). At 600 nm, cells scatter light, so the reading rises with cell concentration and serves as an approximate cell count. Always blank with the specified media first.


Pre-Lab | Safety

  • Acetone: Review the Safety Data Sheet. Acetone is compatible with polypropylene (50 mL conical and 1.5 mL microcentrifuge tubes).

Protocol | Part 1: Overnight Cultures

Time Estimate: 30 Minutes setup, 24 Hour Incubation

Media, Equipment and Consumables

  • LB and 2YT with chloramphenicol
  • Fructose (for supplementation)
  • Pipette set, serological pipettes, and tips
  • Culture tubes
  • Incubation room (30°C and 37°C)

Experimental Matrix (16 Conditions)

You will set up 16 unique conditions (plus duplicates and media controls, total 34 cultures).

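| Condition | Plasmid  | Temp       | Medium         |
|-----------|----------|------------|----------------|
| 1, 2      | pAC-LYC  | 30°C, 37°C | LB             |
| 3, 4      | pAC-LYC  | 30°C, 37°C | LB + Fructose  |
| 5, 6      | pAC-LYC  | 30°C, 37°C | 2YT            |
| 7, 8      | pAC-LYC  | 30°C, 37°C | 2YT + Fructose |
| 9, 10     | pAC-BETA | 30°C, 37°C | LB             |
| 11, 12    | pAC-BETA | 30°C, 37°C | LB + Fructose  |
| 13, 14    | pAC-BETA | 30°C, 37°C | 2YT            |
| 15, 16    | pAC-BETA | 30°C, 37°C | 2YT + Fructose |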
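The 16 conditions are the full factorial of 2 plasmids × 4 media × 2 temperatures. As a rough sketch for generating tube labels, the matrix can be enumerated in Python. The variable names are illustrative, and the numbering assumes that within each pair the lower condition number is the 30°C culture.

```python
from itertools import product

plasmids = ["pAC-LYC", "pAC-BETA"]
media = ["LB", "LB + Fructose", "2YT", "2YT + Fructose"]
temps_c = [30, 37]

# product() varies the last factor fastest: temperature alternates first,
# then medium, then plasmid, which reproduces the numbering in the table
# (assuming the first number of each pair is the 30°C culture).
conditions = [
    {"condition": i, "plasmid": p, "medium": m, "temp_c": t}
    for i, (p, m, t) in enumerate(product(plasmids, media, temps_c), start=1)
]

for c in conditions:
    print(f'{c["condition"]:>2}  {c["plasmid"]:<8}  {c["temp_c"]}°C  {c["medium"]}')
```

Generating the list programmatically makes it harder to skip or duplicate a condition when labeling tubes or building the results table.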

Procedure

  1. Prepare 3 mL of the specified media (supplemented with antibiotic).
  2. Inoculate with 1 µL of E. coli starter culture containing the specified plasmid.
  3. Grow for 24 hours in the circular roller drum at the appropriate temperature.

Fig 3. Incubation Results. Expected color change post-incubation indicating pigment production.


Protocol | Part 2: Analyze OD600 and Peak Absorbance

Time Estimate: 180 Minutes

OD600 Measurement

  1. Open the OD600 program on the spectrophotometer and blank with the respective media.
  2. Measure using 800 µL of each culture in a cuvette.
  3. Record values in an external table.

Pigment Extraction (Peak Absorbance)

  1. Vortex the culture tube to ensure bacteria are suspended.
  2. Transfer 1000 µL to a 1.5 mL microcentrifuge tube.
  3. Centrifuge at 14,000 rpm for 1 minute.
  4. Remove and discard the supernatant.
  5. Repeat steps 2-4 two more times (total 3 mL concentrated into one pellet). Photograph the pellets!
  6. Add 700 µL of acetone to the pellet and pipette up and down until resuspended. Acetone disrupts cell structure and solubilizes the carotenoids.
  7. Centrifuge at 14,000 rpm for 1 minute.
  8. Transfer 600 µL of the pigmented supernatant to a fresh tube.
  9. Dilute with 600 µL of water (prevents acetone from corroding polystyrene cuvettes).
  10. Measure absorbance on the spectrophotometer:
    • Lycopene: Measure at 474 nm.
    • Beta-Carotene: Measure at 456 nm.
  11. Bleach samples and clean cuvettes after use.

Fig 4. Centrifuged Pellets. Visualizing carotenoid accumulation in bacterial pellets.


Protocol | Part 3: Analysis

Time Estimate: 1 Hour

Compare relative pigment production per cell. Normalize each sample’s absorption peak measurement by its OD600 value.

  • Which culture conditions led to the highest production?
  • Plot your results using Excel or Python and include in your writeup.
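The normalization can be sketched in a few lines of Python. The helper `pigment_per_od` and the sample readings are hypothetical, made up only to show the arithmetic. The default dilution factor of 2 assumes the 600 µL extract was mixed with 600 µL of water before reading; because every sample is diluted the same way, this factor rescales the values without changing their ranking.

```python
def pigment_per_od(peak_absorbance, od600, dilution_factor=2.0):
    """Return the dilution-corrected peak absorbance divided by OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive; check the blank")
    return peak_absorbance * dilution_factor / od600


# Hypothetical readings for illustration only, not measured data.
samples = {
    "condition A": {"peak_abs": 0.40, "od600": 2.0},
    "condition B": {"peak_abs": 0.30, "od600": 1.0},
}

normalized = {
    name: pigment_per_od(r["peak_abs"], r["od600"]) for name, r in samples.items()
}
best = max(normalized, key=normalized.get)

for name, value in normalized.items():
    print(f"{name}: {value:.2f} absorbance units per OD600")
print("highest per-cell production:", best)
```

In this made-up example, condition A has the higher raw absorbance but condition B produces more pigment per cell, which is exactly the distinction the normalization is meant to expose.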

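Key Formula: Absorbance = 2 - log10(%Transmittance). By the Beer-Lambert law (A = ε·l·c), absorbance is proportional to pigment concentration, which is why the peak absorbance can stand in for the amount of pigment.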
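If the spectrophotometer reports percent transmittance rather than absorbance, the conversion is a one-liner. A small Python sketch follows; the function name is illustrative.

```python
import math


def absorbance_from_percent_t(percent_t):
    """Convert percent transmittance (0 < %T <= 100) to absorbance."""
    if not 0 < percent_t <= 100:
        raise ValueError("percent transmittance must be in (0, 100]")
    return 2 - math.log10(percent_t)


# 100 %T means no light is absorbed (A = 0); each tenfold drop in
# transmitted light adds one absorbance unit.
for percent_t in (100, 10, 1):
    print(f"{percent_t:>3} %T -> A = {absorbance_from_percent_t(percent_t):.2f}")
```

The logarithmic scale is the reason readings above roughly 1 to 2 absorbance units become unreliable: very little light reaches the detector, so dense samples are usually diluted before measuring.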

Fig 5. Example Results. Example absorption spectrum. Note how pAC-BETA often performs better at 37°C.


Post-Lab Questions | Mandatory

  1. Which genes when transferred into E. coli will induce the production of lycopene and beta-carotene, respectively?
    • Lycopene: CrtE, CrtI, and CrtB.
    • Beta-Carotene: CrtE, CrtI, CrtB, and CrtY.
  2. Why do the plasmids need to contain an antibiotic resistance gene?
    • To provide a selective advantage to the transformed E. coli. Only bacteria containing the plasmid will survive on media supplemented with chloramphenicol, ensuring the population maintains the engineered pathway.
  3. What outcomes might we expect when we vary the media, presence of fructose, and temperature?
    • Higher temperatures (37°C) typically increase growth rates but may lead to protein misfolding or metabolic stress. Nutrient-rich media (2YT) supports higher biomass than LB. Fructose provides an alternative carbon source that can bypass certain regulatory bottlenecks in the MEP pathway, potentially increasing pigment titer.
  4. Generally describe what “OD600” measures and how it is interpreted here.
    • OD600 measures the turbidity (light scattering) of a culture at 600 nm. It is used as a proxy for cell density. In this lab, it allows us to normalize pigment production to the number of cells, so that we compare per-cell productivity rather than total pigment yield.
  5. What are other experimental setups where acetone could be used to separate cellular matter from a compound?
    • Extracting chlorophyll from leaves, isolating lipids from tissues, or precipitating proteins out of solution (acetone precipitation) to separate them from soluble contaminants.
  6. Why engineer E. coli for this when Erwinia herbicola naturally produces them?
    • E. coli is a well-characterized “workhorse” with faster growth cycles, established genetic tools, and a lack of native pigments that would interfere with quantification. It allows for higher-scale, more controllable bioproduction.

Post-Lab Questions | For Committed Listeners Only

The following questions are based on supplemental readings and metabolic pathway design.

Metabolic Pathway Deep-Dive

  1. Enzymes: What are the enzymes of the carotene pathway?
    • CrtE (GGPP synthase), CrtB (Phytoene synthase), CrtI (Phytoene desaturase), and CrtY (Lycopene β-cyclase).
  2. Rate-Limiting Step: Which step takes the longest, and which enzyme is responsible for it?
    • Within the carotenoid pathway, CrtB (Phytoene synthase) is the primary rate-limiting enzyme. Globally in E. coli, the DXS enzyme (precursor supply) is often the major bottleneck.

DNA Construct Design

  1. Organism Choice: Would you choose E. coli or S. cerevisiae for bioproduction? Why?
    • E. coli is preferred for rapid prototyping (20 min doubling time). S. cerevisiae (yeast) is preferred for food-grade products (GRAS) and has a superior native MVA pathway for precursor supply.
  2. Expression Vector: Design an expression vector for E. coli. What parts are needed (RBS, terminators, operators, promoters)?
    • Required parts: Promoter (transcription start), Operator (regulation), RBS (translation initiation), Gene of Interest, Terminator (transcription stop), Selectable Marker (antibiotic resistance), and ORI (Origin of Replication).
  3. Promoters:
    • What is the function of a promoter? To recruit RNA Polymerase to the DNA.
    • What is the difference between repressible and inducible promoters? Repressible are “on” until a signal turns them off; Inducible are “off” until a signal (e.g., IPTG) turns them on.
    • Which promoter would you use for a carotene/lycopene gene? Why? An inducible promoter (like pLac). This allows the culture to reach high biomass before “switching on” production, minimizing metabolic burden.
  4. Origin of Replication (ORI):
    • What is an ORI? The site where DNA replication begins.
    • What are compatibility groups? Groups of plasmids that cannot coexist because they compete for the same replication machinery.
    • What is the best ORI for your chosen promoter and gene? A medium-copy ORI like p15A is often used for carotenoids to balance high yield with cellular health.

Advanced Engineering (Global Listeners)

  1. Tuning: Elaborate on how RBS, terminators, and operators contribute to metabolic tuning.
    • RBS strength determines the protein concentration; Terminators prevent transcriptional read-through; Operators allow for dynamic control based on environmental signals.
  2. Aptamers & Riboswitches: How can these be used for metabolic engineering in prokaryotes?
    • They act as RNA-based sensors that can sense product levels and automatically down-regulate or up-regulate expression to maintain optimal metabolic flux.
  3. Assembly: What approach (e.g., Gibson, Golden Gate) would you use to join these parts?
    • Gibson Assembly for large, scarless constructs; Golden Gate for high-throughput modular library testing.
  4. Dream Application: Elaborate on a biosynthetic pathway you would engineer in E. coli. What is its potential impact?
    • Engineering E. coli to produce Astaxanthin (a potent antioxidant) for sustainable aquaculture, reducing the need for synthetic dyes in the salmon industry.

Yeast Engineering (S. cerevisiae)

  1. Integration: Create an integration cassette for homologous recombination.
    • Structure: [5’ Homology Arm] - [Promoter] - [Gene] - [Terminator] - [Marker] - [3’ Homology Arm].
  2. Parts: List the eukaryotic-specific parts (e.g., Kozak sequence, terminators) needed.
    • Kozak sequence (ACCATGG), Eukaryotic Promoters (pGAL1), and Eukaryotic Terminators (CYC1).
  3. Site Selection: What chromosome site would you use for integration and why?
    • The HO locus is ideal as it is a “safe harbor” where integration does not disrupt essential cellular functions.

HTGAA 2025 | Lab Protocol: Bioproduction of Beta-Carotene and Lycopene

Projects

Final projects:

Group Final Project: MS2 Phage Engineering

Group Final Project: Battling Antibiotic Resistance with Engineered Phages

Overview | Phage Therapy

Phage therapy is the therapeutic use of bacteriophages to treat bacterial infections. Phages are viruses that infect bacteria with extreme specificity, often targeting only a single strain. This specificity allows phage therapy to kill harmful pathogens while sparing beneficial bacteria, offering a potential solution to the global crisis of antibiotic resistance.

Fig 1. MS2 Phage Architecture. Architecture of the MS2 virion, featuring a single-stranded RNA genome and maturation protein.

The Arms Race

For billions of years, a co-evolutionary arms race has existed between phages and bacteria. Bacteria develop defense mechanisms, and phages evolve to overcome them. Our goal in this project is to use synthetic biology and protein engineering to give phages a “head start” in this race.


The Group Project Objective

We aim to engineer the MS2 bacteriophage to be more efficient at killing its host, Escherichia coli, particularly in overcoming host resistance mechanisms.

MS2 Biology and the Lysis Protein (L)

MS2 infects E. coli by attaching to the F-pilus, a filament of F-pilin subunits that extends from the host cell surface. Once its RNA genome is inside, it hijacks the host machinery to replicate. A critical component of its life cycle is the Lysis Protein (L), which triggers the breakdown of the bacterial cell wall to release new virions.

Fig 2. Lysis Mechanism. Proposed mechanism of L-protein mediated lysis and its dependency on host chaperones like DnaJ.

The DnaJ Bottleneck

E. coli can develop resistance by mutating host chaperones like DnaJ, which the L protein requires for proper folding. If DnaJ is mutated, the L protein fails, and the infection cycle stops.

Our Strategy: Engineering the L Protein

We are searching for mutations in the L gene that achieve:

  1. Chaperone Independence: Allowing the L protein to function even if DnaJ is mutated.
  2. Increased Efficiency: Faster or more potent lysis to reduce the window for host adaptation.

Fig 3. MS2 Genetic Map. Genetic map of the MS2 RNA genome, highlighting the overlap between the Lysis (L) gene and other viral proteins.


Research Workflow

This large-scale group effort proceeds through five distinct stages:

Fig 4. Research Workflow. The five stages of the HTGAA Group Final Project.

  1. Stage 1: Computational Design: Use protein design tools to engineer novel L protein mutants.
  2. Stage 2: DNA Synthesis: Synthesize the mutant genes via Twist Bioscience.
  3. Stage 3: Cloning: Insert the mutant genes into plasmids using Gibson Assembly.
  4. Stage 4: Structural Validation: Test the structural integrity of the mutants using the Nuclera system.
  5. Stage 5: In Vivo Testing: Characterize the lysis efficiency of the mutants within E. coli cultures.

In-Depth Reading


HTGAA 2026 | Group Final Project | Engineering the Phage-Host Arms Race