Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    🧬 Week 1 Homework Components 📋 Professor Questions & Answers: Detailed scientific answers to questions from Professor Jacobson (DNA polymerase error rates, genetic code degeneracy), Dr. LeProust (oligonucleotide synthesis methods and limitations), and Professor Church (essential amino acids and the “Lysine Contingency”). 🔬 BioVolt Project - DIY Electroporation Device: Complete end-to-end project documentation including governance assessment and interactive Python application.

  • Week 2 HW: DNA Read, Write, & Edit

    🧬 Week 2 Homework Components: DNA Read, Write, & Edit - sequencing and synthesis workflows, restriction digests and gel electrophoresis, genome-editing frameworks. 📋 Overview: This week covers Part 0: Basics of Gel Electrophoresis; Part 1: Benchling & In-silico Gel Art ✓; Part 2: Gel Art - Restriction Digests and Gel Electrophoresis (wet lab, optional with lab access); Part 3: DNA Design Challenge ✓; Part 4: Prepare a Twist DNA Synthesis Order ✓; Part 5: DNA Read/Write/Edit ✓. Content to be added as you complete each part.

  • Week 3 HW: Lab Automation

    Published paper on automation for novel biological applications; automation project description for gumol MD simulations + ECSOD/MSC + new-Clara microfluidic validation.

  • Week 4 HW: Protein Design

    Conceptual questions from Shuguang Zhang on amino acids, protein structure, helices, and β-sheets.

  • Week 5 HW: Protein Design Part II

    PepMLM peptide binder generation for human SOD1 A4V mutant; ML-conditioned peptide design.

  • Week 6 HW: Genetic Circuits

    Phusion PCR, primer annealing, PCR vs restriction digests, Gibson cloning, transformation, and alternative assembly methods (Golden Gate).

  • Week 7 HW: Genetic Circuits Part II

    Intracellular artificial networks (IANNs) vs Boolean circuits; multilayer perceptron diagram; fungal materials and engineering fungi.

  • Week 9 HW: Cell-Free Systems & Synthetic Cells

    Cell-free expression vs in vivo; ATP regeneration; prokaryotic vs eukaryotic CFPS; membrane proteins; troubleshooting; synthetic minimal cell (SOD3/CXCR4); materials pitch; Genes in Space (BioBits).

  • Week 10 HW: Advanced Imaging & Mass Spectrometry

    Final project measurement plan (SOD3); Waters eGFP LC-MS (intact, native vs denatured, peptide mapping, KLH CDMS); lab data tables.

  • Week 11 HW: Cloud Laboratories & Cell-Free Master Mix

    Cell-free reagent roles; PEP-NTP vs long-run energy mix; FP properties; 36 h master-mix hypothesis. Bioart reflection completed on course site.

Subsections of Homework

Week 1 HW: Principles and Practices


🧬 Week 1 Homework Components

📋 Professor Questions & Answers

Detailed scientific answers to questions from:

  • Professor Jacobson: DNA polymerase error rates, genetic code degeneracy
  • Dr. LeProust: Oligonucleotide synthesis methods and limitations
  • Professor Church: Essential amino acids and the “Lysine Contingency”

🔬 BioVolt Project - DIY Electroporation Device

Complete end-to-end project documentation including governance assessment and interactive Python application.

     /\_/\  
    ( o.o ) 
     > ^ <
    /|   |\
   (_|   |_)
   
   "Meow! Check out both sections above!"

BioVolt Governance Assessment: Policy Options Comparison - I filled this out anyway, but the project is located in the subsection, as the Cat suggested - Meow

The table below compares three governance approaches for the BioVolt DIY electroporation device across multiple criteria. Scoring: 3 = Best, 2 = Moderate, 1 = Worst.

| Criteria | Option 1: Community Self-Governance | Option 2: Safety Warnings & Labels | Option 3: Regulatory Licensing |
|---|---|---|---|
| ENHANCE BIOSECURITY | | | |
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 3 | 1 | 2 |
| FOSTER LAB SAFETY | | | |
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 3 | 1 | 3 |
| PROTECT THE ENVIRONMENT | | | |
| • By preventing incidents | 2 | 2 | 3 |
| • By helping respond | 3 | 1 | 2 |
| OTHER CONSIDERATIONS | | | |
| • Minimizing costs and burdens | 3 | 2 | 1 |
| • Feasibility? | 3 | 3 | 1 |
| • Not impede research | 3 | 2 | 1 |
| • Promote constructive applications | 3 | 2 | 1 |
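The scores in the comparison table can be tallied per option. The sketch below is only an illustration: the row labels are shortened, and weighting every criterion equally is an assumption, not something the assessment itself specifies.

```python
# Scores copied from the comparison table (3 = best, 1 = worst).
# Tuple order: Option 1, Option 2, Option 3.
SCORES = {
    "Biosecurity: preventing incidents": (2, 2, 3),
    "Biosecurity: helping respond": (3, 1, 2),
    "Lab safety: preventing incidents": (2, 2, 3),
    "Lab safety: helping respond": (3, 1, 3),
    "Environment: preventing incidents": (2, 2, 3),
    "Environment: helping respond": (3, 1, 2),
    "Minimizing costs and burdens": (3, 2, 1),
    "Feasibility": (3, 3, 1),
    "Not impede research": (3, 2, 1),
    "Promote constructive applications": (3, 2, 1),
}
OPTIONS = ("Option 1", "Option 2", "Option 3")


def totals(scores):
    """Sum each option's column; equal weighting is an assumption."""
    return {
        name: sum(row[i] for row in scores.values())
        for i, name in enumerate(OPTIONS)
    }


if __name__ == "__main__":
    best_possible = 3 * len(SCORES)
    for name, total in totals(SCORES).items():
        print(f"{name}: {total} / {best_possible}")
```

With equal weights Option 1 comes out clearly ahead. A raw sum does not capture how much weight each criterion deserves, so it should be read alongside the option summaries rather than instead of them.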

Option Summaries

Option 1 - Community-Led Self-Governance:
✓ Best for: response capacity, feasibility, minimizing burdens, not impeding research
✗ Weaker on: prevention (relies on voluntary participation; rogue actors may ignore)

Option 2 - Targeted Product Restrictions (Safety Warnings/Labels):
✓ Best for: feasibility, moderate prevention without bans
✗ Weaker on: response capacity (warnings don’t help after incidents), limited impact on determined bad actors

Option 3 - Regulatory Classification (Licensing/HVA Review):
✓ Best for: prevention (permits, training, HVA peer review blocks worst misuse)
✗ Weaker on: costs, feasibility, impedes DIY research, harms global equity

Recommendation: Prioritize Option 1 (community self-governance) as primary, combine with Option 2 (warnings) as secondary safeguard. Avoid Option 3 unless clear evidence of high-risk proliferation emerges.

Subsections of Week 1 HW: Principles and Practices

DIY Electroporation Project: BioVolt - First rolled out at DEFCON 32 - Now revisited from END to END

Project Overview: BioVolt - DIY Electroporation Device & Full Transformation Pipeline

Biological engineering application/tool to develop:
BioVolt is a portable, ultra-low-cost DIY electroporation device (~$10-20 in parts) that uses a piezoelectric crystal from a barbecue lighter to generate ~2,000 V pulses for temporary cell membrane permeabilization. This enables DNA/RNA uptake in bacteria (e.g., E. coli), yeast, plant protoplasts, or even stem cells for genetic transformation. Inspired by the DEFCON 32 talk “You got a lighter I need to do some Electroporation” (presented by Dr. James Utley (Me), Phil Rhodes, and Josh Hill from Viva Securus/Syndicate Laboratories), it builds on frugal biohacking principles: piezoelectric trigger pulsing, custom microfluidic cuvettes from aluminum tape/magnets/glass slides, and simple high-voltage testing.

DEFCON 32 Presentation: Where It Started for me

At DEFCON 32 the talk I presented focused on the device itself: proving that a barbecue lighter’s piezoelectric crystal could generate sufficient voltage to temporarily permeabilize cell membranes for DNA uptake. The talk covered design details, demos, troubleshooting (e.g., arc gap tuning with Post-it notes), and the biohacking ethos behind building a ~$10 electroporator.

Key highlights from the talk: ~2,000 V pulses via lighter clicks, high cell mortality (50-70%) but viable transformants, GFP reporter demos, open protocols encouraged.

Next Phase: End-to-End Pipeline with Efficiency Focus

The next phase of BioVolt moves beyond the device and brings the entire workflow end to end, with a focus on efficiency and frugal validation. The goal: take a piezoelectric electroporator built from a barbecue lighter and prove, through a full pipeline, that it actually works. The pipeline includes:

  1. Plasmid amplification via thermal cycling: Before electroporation, the initial plasmid source will be amplified using the MJ Research PTC-100 thermal cycler (Peltier-effect programmable controller) available in the lab. This ensures sufficient plasmid DNA concentration for transformation.

  2. DNA concentration measurement: Using the Rodeo open colorimeter (visible light version for OD600 cell density measurements) and, if possible, the UV version for DNA concentration quantification. This provides pre- and post-transformation metrics.

  3. Electroporation: Transformation of cells with the amplified plasmid DNA using the BioVolt piezoelectric device, followed by recovery and plating.

  4. Post-transformation PCR verification: For good measure, PCR will be run after transformation using the same thermal cycler to check whether the insert is present in the recovered cells. This triangulates and correlates with plating results to provide a hasty “close enough” frugal validation.

  5. Gel electrophoresis confirmation: Agarose gel electrophoresis to visualise PCR products and verify successful transformation (e.g., presence of reporter genes like GFP via band patterns under UV).

The aim is to triangulate multiple data points (plasmid amplification, colorimetric/UV measurement, transformation plating, and post-transformation PCR) to build confidence that the piezo electroporator from a lighter actually delivers. Fingers crossed, this provides a credible, frugal, end-to-end validation of a DIY electroporation workflow.
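For the DNA concentration step, a UV absorbance reading can be turned into a concentration with the standard rule of thumb that one A260 unit corresponds to about 50 ng/µL of double-stranded DNA at a 1 cm path length. The sketch below assumes the UV colorimeter reports absorbance at 260 nm; the function name and parameters are hypothetical and not part of any device software.

```python
def dsdna_ng_per_ul(a260, dilution_factor=1.0, path_cm=1.0):
    """Estimate dsDNA concentration (ng/uL) from absorbance at 260 nm.

    Uses the common conversion of 50 ng/uL per A260 unit for
    double-stranded DNA, normalised to a 1 cm path length.
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    if a260 < 0:
        raise ValueError("absorbance cannot be negative")
    return (a260 / path_cm) * 50.0 * dilution_factor


if __name__ == "__main__":
    # Hypothetical reading: A260 of 0.2 on a sample diluted 1:10.
    print(f"{dsdna_ng_per_ul(0.2, dilution_factor=10):.1f} ng/uL")
```

Taking the same reading before and after amplification gives a simple check that the plasmid prep has enough DNA to be worth electroporating.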

This democratizes synthetic biology for education, citizen science, and personal biohacking in resource-limited settings.

Lab Setup & Tools in Action - You can see I got some goods to work with!

My biohacker lab integrates the device with the full verification pipeline.

Working in the lab: handling samples and preparing equipment for the electroporation pipeline

IO Rodeo open colorimeter: visible light version for OD600 cell density and downstream assays; UV version targeted for DNA concentration measurement

On to the assignment - Interactive Governance Assessment Form

An interactive Python application (app.py) is provided to assess governance and risk mitigation strategies for the BioVolt project. The form uses a block-based rating scale where more filled blocks mean more effective:

| Blocks | Rating | Meaning |
|---|---|---|
| ●○○ | Minimally Effective | Low impact: unlikely to achieve the goal |
| ●●○ | Moderately Effective | Moderate impact: partial success likely |
| ●●● | Most Effective | High impact: highly likely to achieve goal |

Project File Structure

BioVolt_week_01_hw_principles_and_practices/
├── _index.md                      # This file: project documentation (Hugo page)
├── app.py                         # Interactive governance assessment application
├── requirements.txt               # Python dependencies
├── Biohacker_Lab.jpeg             # Lab overview photo
├── in_da_lab.jpeg                 # Working in the lab photo
├── Volt_Test.jpeg                 # High-voltage testing with insulation tester
├── rodeo-colorimeter.png          # IO Rodeo open colorimeter
├── BioVolt_govern_UI.png          # Screenshot of the application UI
└── Biovolt_Govern_Report.png      # Screenshot of the PDF report output

Prerequisites

  • Python 3.x installed on your system
  • tkinter (usually included with Python; on Linux you may need python3-tk)

Installation

  1. Navigate to the project directory:

    cd BioVolt_week_01_hw_principles_and_practices
  2. Install required dependencies:

    pip install -r requirements.txt

Running the Application

python app.py

How to Use the Form

  1. Launch: The application opens a dark-themed window with the assessment matrix.

  2. Read the instructions: System instructions are displayed at the top of the form explaining the block-based rating system.

  3. Review each concern category: Three categories are presented, each with context questions:

    • Biosecurity Concerns: preventing GMO release, high-voltage mishandling, pathogen engineering
    • Equity Concerns: access, regulation, educational barriers, global equity
    • Environmental Concerns: microbial activity, non-human organisms, public concerns
  4. Rate each action: For every action under each stakeholder (Researchers, Manufacturers, Industry, Organizations), click one of three block-rating buttons:

    • ●○○ - Minimally Effective (button highlights red)
    • ●●○ - Moderately Effective (button highlights amber)
    • ●●● - Most Effective (button highlights green)
  5. Visual feedback: When you click a rating:

    • The selected button stays highlighted with its rating colour
    • A status indicator appears to the right showing your selection
    • Other buttons in the same row reset to their default state
  6. Export to PDF: Click the “EXPORT TO PDF” button to generate a report containing:

    • Cover page with assessment date and completion count
    • Rating scale legend with colour-coded descriptions
    • Full assessment tables for each concern category
    • Colour-coded rows: green tint for Most Effective, amber for Moderate, red for Minimal
    • Block indicators (●●● / ●●○ / ●○○) printed in every row
    • Summary page with counts and percentages for each rating level
  7. Reset: Click “RESET MATRIX” to clear all selections and start over.
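The summary page's counts and percentages can be computed along the following lines. This is a sketch under assumptions: the real app.py may store ratings differently, and `summarize` and the example action names are hypothetical.

```python
from collections import Counter

LABELS = ("Most Effective", "Moderately Effective", "Minimally Effective")


def summarize(ratings):
    """Return {label: (count, percent)} for the rated actions.

    `ratings` maps an action name to a label, or to None if unrated.
    Raises ValueError when nothing is rated, mirroring the empty-export
    warning described for the form.
    """
    selected = [label for label in ratings.values() if label is not None]
    if not selected:
        raise ValueError("no ratings selected")
    counts = Counter(selected)
    n = len(selected)
    return {
        label: (counts[label], round(100 * counts[label] / n, 1))
        for label in LABELS
    }


if __name__ == "__main__":
    demo = {
        "Publish safety protocol": "Most Effective",
        "Add hazard labels": "Moderately Effective",
        "Require permits": None,  # left unrated
    }
    for label, (count, pct) in summarize(demo).items():
        print(f"{label}: {count} ({pct}%)")
```

Percentages here are taken over rated actions only, so unrated rows do not dilute the summary; using the full action count as the denominator would be an equally reasonable choice.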

Application Features

  • Block-based rating scale: intuitive system where more blocks = more effective (no ambiguity)
  • Dark theme UI: dark background with neon accent colours for readability
  • Persistent button state: selected buttons remain highlighted with their rating colour
  • Status indicators: each row shows the current selection in text beside the buttons
  • Scrollable interface: mouse wheel support for navigating the full assessment matrix
  • Neon accent bars: left-side accent bars on each concern card for visual hierarchy
  • Colour-coded PDF output: rating cells are tinted to match their effectiveness level
  • Summary statistics: PDF includes a final page with counts and percentages
  • Empty export protection: warns you if no ratings are selected before exporting
  • Form reset: one-click reset with confirmation dialog

Screenshots

Application UI: Dark-themed interface with block-based rating buttons and colour-coded status indicators:

BioVolt Governance Assessment Matrix UI

PDF Report Output: Exported assessment with colour-coded rows, block indicators, and stakeholder ratings:

BioVolt Governance Assessment PDF Report

Governance / Policy Goals (Preventing Harm)

Focus on non-tool-function risks: Prevent environmental release of unintended GMOs, biosafety incidents from mishandling high-voltage + microbes, escalation to unsafe self-experimentation/human applications, or biosecurity concerns (e.g., pathogen engineering).
Core aims: Minimize biosafety/biosecurity harms, promote responsible use, avoid stifling innovation with heavy regulation, encourage informed DIYbio practices, and address public/environmental concerns.

Three Potential Governance/Policy Actions

Action 1: Community-Led Self-Governance with Voluntary Guidelines and Reporting

Goal: Foster peer accountability and safe practices through DIYbio networks, reducing risks via shared norms without external mandates.

Design:

  • Opt-in: DIYbio communities, forums (e.g., Discord, Reddit, The ODIN users), and makerspaces.
  • Fund: Crowdfunding, donations, or volunteer time.
  • Approve: Community-elected moderators or biosafety working groups.
  • Implement: Publish voluntary guidelines (e.g., “BioVolt Safety Protocol” on protocols.io or GitHub), require protocol sharing for builds, anonymous incident reporting (expand “Ask a Biosafety Expert” services).

Risks / What could go wrong (incorrect assumptions, uncertainties):
Assumes broad ethical participation - rogue actors may ignore; self-reporting misses hidden issues; low adoption if seen as “extra work.”

Assumptions, “Success” and “Failure” rubric:

  • Success (best - 1): High adoption -> fewer accidents, strong norms against risky uses (e.g., no human trials), community self-corrects.
  • Mid (2): Partial uptake -> safety improvements in visible projects, but gaps remain.
  • Failure (worst - 3): Guidelines ignored -> no risk reduction, or “forbidden fruit” effect increases experimentation.
  • Unintended consequences: Overly cautious norms suppress legitimate educational uses.

Action 2: Targeted Product Restrictions (e.g., Safety Warnings / Age Limits on Kits & Components)

Goal: Reduce impulsive or uninformed misuse by requiring clear hazard labels on high-voltage components (e.g., piezoelectric lighters, capacitors) or full kits, without banning access.

Design:

  • Opt-in/compliance: Online sellers (Amazon, AliExpress), hardware stores, kit makers.
  • Fund: Seller-borne costs.
  • Approve: Consumer safety agencies or state-level consumer protection (e.g., modeled on CRISPR kit labeling laws).
  • Implement: Mandatory labels (“Not for human use; biological hazard when combined with genetic material; 18+ recommended”).

Risks / What could go wrong:
Warnings may not deter determined users (parts sourced separately); patchy enforcement online/global; could increase black-market activity.

Assumptions, “Success” and “Failure” rubric:

  • Success (best - 1): Warnings raise awareness, reduce naive accidents while preserving access.
  • Mid (2): Labels added but often ignored by experienced users.
  • Failure (worst - 3): Little impact on bad actors; adds cost/delays for legitimate builders.
  • Unintended consequences: Drives activity underground, reducing community visibility/oversight.

Action 3: Treat the Device as Restricted Biotech Equipment under a Regulatory Classification (e.g., Licensing for High-Voltage Builds), with Pledged Reporting and Safe Use

Goal: Treat advanced DIY electroporators like controlled lab tools - require permits/training for >1,000 V devices to prevent proliferation to high-risk genetic work.

Design:

  • Opt-in: Individual builders/users via registration.
  • Fund: User fees.
  • Approve: Government agencies (e.g., expanding CDC/NIH biosafety rules or local health depts).
  • Implement: Permits, training requirements, inspections for community labs/shared spaces.
  • Hazard Vulnerability Assessment (HVA) and Peer Review: Conduct a comprehensive HVA and require peer review through a pseudo-IRB-like entity - a multidisciplinary and independent review board focusing on environmental and human safety. This entity would evaluate proposed uses, assess risks, and provide guidance on safe protocols before high-voltage builds are deployed.

Risks / What could go wrong:
Hard to define safe thresholds; bureaucracy kills accessibility; overreach chills innovation globally.

Assumptions, “Success” and “Failure” rubric:

  • Success (best - 1): Blocks worst misuse (e.g., pathogen work), funnels activity to supervised settings.
  • Mid (2): Some compliance, but many unlicensed builds continue.
  • Failure (worst - 3): Broad restrictions eliminate DIY benefits, push activity to unregulated regions.
  • Unintended consequences: Harms global equity/education; favors institutional labs only.

Overall Tradeoffs & Prioritization

Prioritize Action 1 (community self-governance) as primary: Lowest overregulation risk, aligns with DIY ethos, adaptable to low current misuse evidence, leverages community goodwill.

Combine with Action 2 (targeted warnings) as secondary: Adds minimal external safeguard for public health, deters casual risks without bans.

Avoid/minimize Action 3 unless there is clear evidence of high-risk proliferation: Highest chance of killing accessibility and innovation, poor fit for a low-harm tool like BioVolt.

Key uncertainties (misuse rates, community response, enforcement feasibility) favor lighter interventions. Monitor via voluntary reporting; escalate only if serious incidents arise. This balances empowerment with responsible governance for biosafety and preventing broader DIY genetic risks.

Made with love and the AI Slop is from Cursor-GLM 4.7

Week 1: Professor Questions

Answers organized by instructor.


[SECTION 1] Questions from Professor Jacobson

Source: Lecture 2 slides


Question 1: Nature's machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Answer

Executive Summary:
DNA polymerase intrinsic error rate (~10⁻⁷) would cause ~320 errors per human genome replication (3.2 × 10⁹ bp). Biology employs multilayer error correction (proofreading, mismatch repair, excision repair) to achieve final fidelity of ~10⁻⁹ to 10⁻¹⁰ errors per base per division, yielding 0.3-3 errors per replication in normal somatic cells.


Error Rate of DNA Polymerase

DNA polymerase has an intrinsic error rate of approximately 1 error per 10⁷ nucleotides during DNA synthesis. With integrated 3’ to 5’ exonuclease proofreading activity, this improves to approximately 1 error per 10⁸-10⁹ nucleotides.

Comparison to Human Genome Length

The human genome contains approximately 3.2 × 10⁹ base pairs.

Without proofreading:

  • Error rate: ~10⁻⁷ per nucleotide
  • Expected errors per replication: ~320 errors per genome copy

With proofreading:

  • Error rate: ~10⁻⁸ to 10⁻⁹ per nucleotide
  • Expected errors per replication: ~3-32 errors per genome copy
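The per-genome figures are simply the per-base error rate multiplied by the genome length. A short sketch of that arithmetic, using the rates quoted in this answer (the function name is hypothetical):

```python
GENOME_BP = 3.2e9  # approximate human genome length used in this answer


def expected_errors(rate_per_base, genome_bp=GENOME_BP):
    """Expected number of errors in one genome copy at a given error rate."""
    return rate_per_base * genome_bp


if __name__ == "__main__":
    for label, rate in [
        ("no proofreading", 1e-7),
        ("proofreading, low end", 1e-8),
        ("proofreading, high end", 1e-9),
        ("all repair layers, best case", 1e-10),
    ]:
        print(f"{label:<30} {expected_errors(rate):8.2f} errors per copy")
```

This treats every base as an independent trial with the same error probability, which is a simplification, since real error rates vary with sequence context.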

How Biology Deals with This Discrepancy

Biology employs multiple layers of error correction that act sequentially:

  1. Proofreading (3’ → 5’ exonuclease activity)

    • DNA polymerase detects incorrect base pairing via geometric distortion
    • Removes mismatched nucleotide immediately
    • Reduces error rate by approximately 100-1000-fold
  2. Mismatch Repair (MMR) System

    • Post-replication surveillance mechanism
    • In bacteria (E. coli): MutS, MutL, and MutH proteins
    • In eukaryotes: MSH (MutS homolog), MLH (MutL homolog), and PMS protein families
    • System identifies mismatched base pairs, excises incorrect strand segment, and resynthesizes
    • Further reduces error rate by approximately 100-1000-fold
  3. Nucleotide Excision Repair (NER)

    • Repairs bulky DNA lesions (UV-induced thymine dimers, chemical adducts)
    • Removes damaged nucleotide segments (20-30 nt patches)
  4. Base Excision Repair (BER)

    • Corrects small base modifications (deamination, oxidation, alkylation)
    • DNA glycosylases remove damaged bases; AP endonucleases process abasic sites

Result:
The combined fidelity of replication in eukaryotic somatic cells typically achieves ~10⁻⁹ to 10⁻¹⁰ errors per base per cell division, depending on organism, cell type, and proliferation status. This ensures 0.3-3 errors per genome replication under normal physiological conditions.

Note: Fidelity varies by context. Cancer cells with MMR defects exhibit 100-1000× higher mutation rates. Germline cells employ additional proofreading mechanisms. Some DNA polymerases (e.g., Pol η and other translesion synthesis polymerases) have lower fidelity by design for specialized repair functions.


Question 2: How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don't work to code for the protein of interest?

Answer

Executive Summary:
For a typical 400-residue protein, the number of synonymous DNA sequences (due to codon degeneracy) is astronomically large: on the order of 10¹⁰⁰ or more, calculated as the product of synonymous codon counts across all positions. In practice, most sequences fail due to codon usage bias, mRNA secondary structure, RNA instability, splicing interference, cryptic regulatory elements, and synthesis/cloning constraints.


Number of Different Ways to Code for a Protein

The genetic code is degenerate: of the 64 codons, 61 sense codons encode the 20 standard amino acids (with ATG also serving as the start signal) and the remaining 3 are stop codons. Each amino acid (except Met and Trp) has multiple synonymous codons:

  • Leucine, Serine, Arginine: 6 codons each
  • Isoleucine: 3 codons
  • Methionine, Tryptophan: 1 codon each

For an average human protein (~400 amino acids):

The total number of synonymous DNA sequences is the product of synonymous codon counts across all positions:

N = ∏(i=1 to 400) n_i

where n_i = number of synonymous codons for amino acid i.

Rough estimate:

  • Average degeneracy per amino acid ≈ 3 codons (weighted by frequency)
  • Total combinations ≈ 3⁴⁰⁰ ≈ 7 × 10¹⁹⁰ possible DNA sequences

Even a conservative estimate (e.g., an average of only 2 codons per position, 2⁴⁰⁰ ≈ 10¹²⁰) yields far more than 10¹⁰⁰ combinations.
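The product N can be computed exactly for any given protein sequence from the number of codons per amino acid in the standard genetic code. A minimal sketch follows; the function names are hypothetical, and working in log10 is just a convenience for very large counts.

```python
import math

# Synonymous codon counts per amino acid in the standard genetic code
# (one-letter codes); the counts sum to the 61 sense codons.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous(protein):
    """Exact number of DNA sequences encoding the protein (stop codon excluded)."""
    return math.prod(DEGENERACY[aa] for aa in protein)


def log10_synonymous(protein):
    """Order of magnitude of the count, convenient for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    peptide = "MKLSWR"  # arbitrary example peptide
    print(count_synonymous(peptide), "sequences encode", peptide)
    print(f"3-fold average over 400 residues: 10^{400 * math.log10(3):.1f}")
```

Python integers are arbitrary precision, so `count_synonymous` also works on a full 400-residue sequence; the log form is only there to make the result readable.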

Why All These Different Codes Don’t Work in Practice

Even though multiple sequences encode the same amino acid sequence, the vast majority fail to express functional protein due to:

1. Codon Usage Bias
  • Each organism has preferred codons reflecting tRNA abundance (Plotkin & Kudla 2011)
  • E. coli prefers different codons than humans (e.g., AGG/AGA rare in bacteria, common in mammals)
  • Rare codons → ribosome stalling → may alter co-translational folding kinetics
  • Using non-optimal codons can reduce expression 10-1000-fold (Gustafsson et al. 2004)
2. mRNA Secondary Structure
  • Certain nucleotide sequences form stem-loops or hairpins
  • Strong secondary structures can:
    • Block ribosome binding
    • Stall translation
    • Trigger mRNA degradation
3. RNA Stability
  • AU-rich sequences → rapid mRNA degradation
  • GC-rich sequences → more stable mRNA
  • Wrong codon choice can drastically reduce mRNA half-life
4. Splicing Interference
  • Certain sequences create cryptic splice sites
  • Can cause exon skipping or intron retention
  • Results in truncated or non-functional protein
5. Ribosome Binding Sites (RBS) Interference
  • Shine-Dalgarno sequences (prokaryotes) or Kozak sequences (eukaryotes)
  • Internal RBS-like sequences can cause premature translation initiation
  • Results in truncated proteins
6. Restriction Enzyme Sites
  • Cloning often requires avoiding certain restriction sites
  • Limits sequence choices for practical molecular biology
7. Repetitive Sequences
  • Long homopolymer runs (e.g., AAAAAA) cause synthesis/sequencing errors
  • Can trigger recombination or replication errors

Quantitative Example: For a 10-amino acid peptide (assuming average 3-fold degeneracy), there are theoretically 3¹⁰ ≈ 59,000 synonymous sequences. However, accounting for all the constraints listed above, perhaps only 10²-10³ sequences (~0.2-2%) would be practically functional; this is a rough illustrative estimate rather than a measured value.
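To make the filtering idea concrete, the sketch below enumerates every synonymous sequence for a toy four-residue peptide and screens out those containing an EcoRI site (GAATTC) or a homopolymer run longer than three bases. The peptide, the thresholds, and the function names are all assumptions chosen for illustration, and only two of the constraints listed above are modelled.

```python
import re
from itertools import product

# Partial codon table: only the amino acids used in the toy peptide.
CODONS = {
    "M": ["ATG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
    "W": ["TGG"],
}


def synonymous_sequences(peptide):
    """Yield every DNA sequence that encodes the peptide."""
    for combo in product(*(CODONS[aa] for aa in peptide)):
        yield "".join(combo)


def passes_screen(seq, forbidden_sites=("GAATTC",), max_run=3):
    """Reject sequences with a forbidden site or an over-long homopolymer run."""
    if any(site in seq for site in forbidden_sites):
        return False
    # (.)\1{max_run,} matches a base repeated more than max_run times.
    return re.search(r"(.)\1{%d,}" % max_run, seq) is None


if __name__ == "__main__":
    candidates = list(synonymous_sequences("MEFW"))
    usable = [s for s in candidates if passes_screen(s)]
    print(f"{len(usable)} of {len(candidates)} sequences pass:", usable)
```

Even with two simple rules most of the candidates for this tiny peptide are rejected, which is the same effect, at small scale, as the constraints described above.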


[SECTION 2] Questions from Dr. LeProust

Source: Lecture 2 slides


Question 3: What's the most commonly used method for oligo synthesis currently?

Answer

Executive Summary:
Phosphoramidite chemistry on solid-phase support (Caruthers method, 1981) is the current industry standard, with typical coupling efficiency of 98.5-99.5% per cycle and practical length ceiling of 150-200 nucleotides.


Phosphoramidite Chemistry (Solid-Phase Synthesis)

The phosphoramidite method on solid support is the dominant technology for oligonucleotide synthesis worldwide.

Key Features:

  • Invented: Marvin Caruthers and colleagues (1981)
  • Platform: Solid-phase synthesis on controlled-pore glass (CPG) or polystyrene beads
  • Direction: 3’ → 5’ synthesis (the chain is anchored to the support at its 3’ end and grows by addition at the free 5’-OH)
  • Cycle efficiency: Typically 98.5-99.5% per nucleotide addition
  • Practical length limit: 150-200 nucleotides for routine synthesis

Four-Step Cycle:

  1. Detritylation (acid treatment)

    • Removes DMT (dimethoxytrityl) protecting group from 5’-OH
    • Exposes reactive hydroxyl for next nucleotide
  2. Coupling (phosphoramidite addition)

    • Protected phosphoramidite monomer + tetrazole activator
    • Forms phosphite triester linkage
    • ~98.5-99.5% coupling efficiency
  3. Capping (acetic anhydride)

    • Blocks unreacted 5’-OH groups
    • Prevents deletion sequences
  4. Oxidation (iodine/water)

    • Converts unstable phosphite (P³⁺) to stable phosphate (P⁵⁺)
    • Forms phosphate backbone

Advantages:

  • High throughput (96-384 well formats)
  • Automated
  • Scalable (nmol to µmol scale)
  • Well-established chemistry

Current Platforms: Commercial platforms include BioAutomation and ABI/Applied Biosystems synthesizers for traditional column-based synthesis. Newer high-throughput approaches include Twist Bioscience (silicon-based microarray synthesis) and Custom Array (electrochemical synthesis on chips).


Question 4: Why is it difficult to make oligos longer than 200nt via direct synthesis?

Answer

Executive Summary:
Cumulative coupling inefficiency (even at 99% per cycle) yields only ~13% full-length product at 200 nt. Dominant failure modes are deletion sequences from incomplete coupling, depurination during detritylation, and increasing purification difficulty as n-1, n-2… products accumulate.


Cumulative Coupling Errors and Deletion Sequences

The primary challenge is imperfect coupling efficiency in each phosphoramidite addition cycle.

The Mathematics of Error Accumulation:

  • Coupling efficiency per cycle: typically 98.5-99.5%
  • Stepwise failure rate: 0.5-1.5% per cycle
  • Yield of full-length product = (coupling efficiency)^n where n = oligo length

Yield Calculation:

| Length | Coupling Efficiency | Full-Length Yield |
|---|---|---|
| 50 nt | 99% | 60% |
| 100 nt | 99% | 37% |
| 150 nt | 99% | 22% |
| 200 nt | 99% | 13% |
| 300 nt | 99% | 5% |

At 200 nucleotides with 99% efficiency:

  • Only 13% of molecules are full-length correct sequence
  • 87% are deletion products (n-1, n-2, n-3… truncations)
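The yield table follows directly from the formula yield = (coupling efficiency)^n. A short sketch that reproduces it (the function name is hypothetical; like the formula above, it counts n couplings for an n-nt oligo, a slight simplification since the first base is already on the support):

```python
def full_length_yield(length_nt, coupling_efficiency=0.99):
    """Fraction of chains that are full length after length_nt couplings."""
    if not 0 < coupling_efficiency <= 1:
        raise ValueError("coupling efficiency must be in (0, 1]")
    return coupling_efficiency ** length_nt


if __name__ == "__main__":
    print("Length  Yield at 99%  Yield at 99.5%")
    for n in (50, 100, 150, 200, 300):
        print(f"{n:>4} nt  {full_length_yield(n):>11.1%}  "
              f"{full_length_yield(n, 0.995):>13.1%}")
```

Running it with 99.5% alongside 99% shows how sensitive long oligos are to small changes in per-cycle efficiency.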

Specific Problems Beyond 200nt (in order of impact):

  1. Deletion Sequences from Incomplete Coupling

    • Failed coupling at position i → if the unreacted chain is not capped, later additions extend it into an internal-deletion product
    • Creates heterogeneous mixture of n-1, n-2, n-3… products
    • Capping step blocks these from extending, but they remain in final pool
  2. Depurination During Acid Treatment

    • Detritylation uses trichloroacetic acid or dichloroacetic acid
    • Causes glycosidic bond cleavage at purines (A, G)
    • Cumulative damage over 200+ cycles
    • Results in abasic sites and chain breaks
  3. Purification Difficulty

    • Full-length (200 nt) vs. n-1 (199 nt) differ by only ~0.5% in length and mass
    • HPLC and PAGE separation becomes marginal
    • Impure product affects downstream applications
  4. Secondary Structure Formation

    • Long single-stranded oligos form intramolecular hairpins during synthesis
    • Blocks reagent access to the growing 5’-OH end (the chain is anchored to the solid support at its 3’ end)
    • Reduces effective coupling efficiency in later cycles
  5. Synthesis Time and Cost

    • 200 cycles × 10-15 min/cycle = 33-50 hours continuous synthesis
    • Reagent consumption scales linearly
    • Low yields require larger scale synthesis → higher cost
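The composition of the final pool can be sketched under an idealised model in which every failed coupling is capped immediately, so a chain that fails after k successful couplings stays at that length. Real syntheses also contain uncapped internal deletions and depurination products, which this sketch ignores; the function name is hypothetical.

```python
def product_distribution(length_nt, eff=0.99):
    """Return (full_length_fraction, {k: fraction capped after k couplings}).

    Idealised model: each coupling succeeds with probability `eff`, and
    every failure is capped so the chain never extends again.
    """
    truncated = {k: eff ** k * (1 - eff) for k in range(length_nt)}
    return eff ** length_nt, truncated


if __name__ == "__main__":
    full, truncated = product_distribution(200)
    print(f"full length: {full:.1%}")
    print(f"all truncated products: {sum(truncated.values()):.1%}")
    print(f"longest truncation (199 couplings): {truncated[199]:.2%}")
```

In this model the truncated chains are spread across every length, with the near-full-length ones being the hardest to separate from the desired product.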

Practical Solutions: Modern approaches avoid direct synthesis beyond 200 nt by using gene assembly from overlapping 60-80 nt oligos (polymerase cycling assembly, Gibson assembly), chip-based oligo pool synthesis followed by assembly (e.g., Twist Bioscience), or emerging enzymatic synthesis using terminal deoxynucleotidyl transferase-based methods.


Question 5: Why can't you make a 2000bp gene via direct oligo synthesis?

Answer

Executive Summary:
Direct phosphoramidite synthesis of 2000 nt is practically infeasible due to vanishingly low yields (0.99^2000 ≈ 10⁻⁹), prohibitive synthesis time (~2-3 weeks continuous), cumulative depurination, and insurmountable purification challenges. Modern gene synthesis uses hierarchical assembly of 60-80 nt oligos into fragments, then full-length genes.


Practical Infeasibility with Current Phosphoramidite Chemistry

Making a 2000 bp gene via direct oligonucleotide synthesis is practically infeasible with standard phosphoramidite chemistry due to insurmountable yield, time, and purification barriers.

Yield Barriers:

At 99% coupling efficiency (best-case scenario):

  • Yield = 0.99^2000 ≈ 2 × 10⁻⁹ (0.0000002%)
  • To obtain 1 picomole of full-length product requires ~0.5 millimoles of starting chains on the support (1 pmol ÷ 2 × 10⁻⁹)
  • Over 2000 cycles that is ~1 mole of nucleotide additions, i.e. hundreds of grams of protected nucleotide phosphoramidites even before the usual reagent excess
  • Material cost alone: ~$500,000 - $1,000,000

Even at 99.5% efficiency (exceptional, rarely achieved):

  • Yield = 0.995^2000 ≈ 4 × 10⁻⁵ (0.004%)
  • Still economically and practically prohibitive
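
The yield figures above follow from compounding the per-step coupling efficiency. A minimal sketch of that arithmetic, assuming every one of the 2000 steps has the same efficiency and ignoring depurination and other losses (the function names are illustrative, not from any library):

```python
def full_length_yield(coupling_efficiency: float, n_couplings: int) -> float:
    """Fraction of chains still full length after n stepwise couplings."""
    return coupling_efficiency ** n_couplings


def starting_chains_mol(target_mol: float, yield_fraction: float) -> float:
    """Moles of starting chains needed to end up with target_mol of product."""
    return target_mol / yield_fraction


if __name__ == "__main__":
    for efficiency in (0.99, 0.995):
        y = full_length_yield(efficiency, 2000)
        start = starting_chains_mol(1e-12, y)  # 1 pmol of full-length product
        print(f"efficiency {efficiency:.3f}: yield {y:.2e}, "
              f"starting chains {start:.2e} mol")
```

The same two functions reproduce the shorter-oligo cases from the earlier questions by changing `n_couplings`.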

Physical/Chemical Barriers:

  1. Synthesis Time

    • Typical cycle time: 10-15 minutes per nucleotide addition
    • 2000 cycles = 20,000-30,000 minutes = 14-21 days continuous synthesis
    • Reagent degradation over extended periods
    • Instrument reliability over multi-week runs
  2. Cumulative Depurination

    • 2000 acid detritylation steps
    • Each cycle causes low-frequency glycosidic bond cleavage at purines
    • Accumulates to extensive abasic sites and strand breaks
  3. Secondary Structure Collapse

    • Long single-stranded DNA forms extensive intramolecular structure
    • Hairpins and G-quadruplexes block reagent access
    • Synthesis typically stalls beyond 300-400 nt even with optimized conditions
  4. Solubility and Handling

    • Very long oligos can precipitate on solid support
    • Reduced accessibility to coupling reagents
    • Cleavage and deprotection become inefficient

Practical Solution: Hierarchical Gene Assembly

Modern commercial gene synthesis uses multi-step assembly:

Step 1: Oligo Synthesis

  • Synthesize 30-50 oligonucleotides (60-80 nt each, with 20-40 nt overlaps)
  • Yield per oligo: 60-95% (high quality)

Step 2: Fragment Assembly

  • Assemble oligos into 4-6 intermediate fragments (400-600 bp each)
  • Methods: Polymerase cycling assembly (PCA), Gibson assembly, Golden Gate
  • Yield per fragment: 70-90%

Step 3: Final Assembly

  • Combine fragments into full 2000 bp gene
  • Gibson assembly or restriction enzyme-based methods
  • Final yield: 60-85% overall

Example for 2000 bp gene:

  • 40 oligos × 70 nt average = 2800 nt synthesized capacity
  • Assemble into 5 fragments (~400 bp each)
  • Final Gibson assembly into 2000 bp construct
  • Overall yield: ~70% (vs. ~2 × 10⁻⁷% for direct synthesis)
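
The oligo count in this example can be checked with a small tiling calculation. This is a sketch under simplifying assumptions: oligos of one fixed length, a fixed overlap between neighbours, and no attention to strand alternation or melting temperatures, which real assembly design tools do handle. The 20 nt overlap is taken from the low end of the range quoted in Step 1.

```python
import math


def oligos_needed(target_len: int, oligo_len: int, overlap: int) -> int:
    """Number of overlapping oligos needed to span target_len bases."""
    if not 0 <= overlap < oligo_len:
        raise ValueError("overlap must be smaller than the oligo length")
    if target_len <= oligo_len:
        return 1
    step = oligo_len - overlap  # new bases contributed by each extra oligo
    return math.ceil((target_len - overlap) / step)


def tile(target_len: int, oligo_len: int, overlap: int) -> list[tuple[int, int]]:
    """Half-open (start, end) coordinates of each oligo along the target."""
    step = oligo_len - overlap
    spans, start = [], 0
    while True:
        end = min(start + oligo_len, target_len)
        spans.append((start, end))
        if end == target_len:
            return spans
        start += step


if __name__ == "__main__":
    spans = tile(2000, 70, 20)
    print(len(spans), "oligos,", sum(e - s for s, e in spans), "nt synthesized")
```

With 70 nt oligos and 20 nt overlaps this gives 40 oligos for a 2000 bp target, in line with the example.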

Commercial Gene Synthesis: Major vendors (Twist Bioscience, IDT, GenScript, Thermo Fisher) offer typical academic pricing of $0.07-0.20/bp, though this is highly variable depending on sequence complexity (GC content, repeats, secondary structure), turnaround time (5-10 days standard, 2-3 days expedited), and order volume.


[SECTION 3] Question from Professor George Church

Source: Lecture 2 slides


▶ Question 6: [Using Google & Prof. Church's slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the "Lysine Contingency"?

(I chose this question from the three options)

Answer

Executive Summary:
The commonly listed essential amino acids in vertebrates include His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val, and conditionally Arg. The “Lysine Contingency” from Jurassic Park is scientifically flawed because lysine is already naturally essential in all vertebrates, so the genetic modification provides zero additional biocontainment. Moreover, lysine is abundant in all natural food sources, and deficiency takes months to years to be lethal.


The Commonly Listed Essential Amino Acids in Vertebrates

Essential amino acids cannot be synthesized de novo by vertebrate metabolism and must be obtained from diet. The standard list for humans and most vertebrates includes: Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K) [focus of Jurassic Park scenario], Methionine (Met, M), Phenylalanine (Phe, F), Threonine (Thr, T), Tryptophan (Trp, W), Valine (Val, V), and Arginine (Arg, R), which is conditionally essential: it is required by young, growing animals and during illness, though adults can synthesize limited amounts via the urea cycle.

Mnemonic: “PVT TIM HALL” (Phe, Val, Thr, Trp, Ile, Met, His, Arg, Leu, Lys)

Note: The classification varies slightly by species and life stage. Arginine is typically considered semi-essential or conditionally essential in adult mammals.


The “Lysine Contingency” from Jurassic Park

In Jurassic Park (Michael Crichton, 1990), InGen implemented a “Lysine Contingency” as a biocontainment measure. The plan involved genetically engineering dinosaurs unable to synthesize lysine, making them dependent on lysine supplements in their food. The theory was that if they escaped, they would die from lysine deficiency. As the 1993 film adaptation puts it: “The lysine contingency is intended to prevent the spread of the animals in case they ever got off the island.”


Why the Lysine Contingency is Scientifically Flawed

Critical Problem: ALL ANIMALS ALREADY REQUIRE DIETARY LYSINE

1. Lysine is Naturally Essential in All Vertebrates

Humans, other mammals, and birds cannot synthesize lysine de novo, and dinosaurs almost certainly could not either. The lysine biosynthesis pathway is absent across animals generally, a loss that appears to predate the vertebrate lineage. The dinosaurs would have required dietary lysine regardless of any genetic modification. Therefore, the “contingency” provides zero additional biocontainment; it is entirely redundant.


2. Lysine is Abundant in Natural Food Sources

Based on USDA nutritional databases, lysine is widespread in both plant and animal food sources. Plant sources include legumes (soybeans, lentils, beans) containing 1-2% lysine by dry weight, seeds and grains with 0.2-0.8% lysine, and grasses and leafy vegetation with 0.3-0.6% lysine. Animal sources are even richer: insects contain approximately 2-3% lysine by dry weight, while vertebrate muscle tissue, fish, and eggs contain 1.5-2.5% lysine by weight.

Estimated lysine intake for large theropods (carnivorous dinosaurs):

Note: The following are rough extrapolations from modern vertebrate nutritional requirements and are not based on direct measurements of dinosaur metabolism. Assuming an estimated daily food intake of 50-100 kg meat (scaled from modern large carnivores) and lysine content of meat at approximately 1.5-2.0 g/100g, the estimated daily lysine intake would be 750-2000 g. Compared to an estimated lysine requirement of approximately 10-50 g/day (scaled from mammals, though highly uncertain), even the most conservative pairing (750 g intake against a 50 g requirement) gives a 15× excess, and the range extends to roughly 200×.
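
The theropod estimate above is simple range arithmetic, sketched here so the bounds can be varied. All inputs are the rough assumptions stated in the note (food intake, lysine content of meat, scaled requirement), not measured values, and the function name is illustrative.

```python
def lysine_intake_g(food_kg: float, lysine_g_per_100g: float) -> float:
    """Daily lysine intake in grams for a given food mass and lysine content."""
    return food_kg * 1000 / 100 * lysine_g_per_100g


# Assumed ranges from the text: 50-100 kg meat/day at 1.5-2.0 g lysine per 100 g.
intake_low = lysine_intake_g(50, 1.5)
intake_high = lysine_intake_g(100, 2.0)

# Assumed requirement range from the text: 10-50 g/day.
requirement_low, requirement_high = 10, 50

min_excess = intake_low / requirement_high   # most conservative pairing
max_excess = intake_high / requirement_low   # most generous pairing

if __name__ == "__main__":
    print(f"intake: {intake_low:.0f}-{intake_high:.0f} g/day")
    print(f"excess over requirement: {min_excess:.0f}x to {max_excess:.0f}x")
```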

Estimated lysine intake for herbivorous dinosaurs:

Assuming estimated daily vegetation consumption of hundreds of kg for sauropods and lysine content in plant matter of 0.3-1.0% dry weight, the estimated daily lysine intake would be hundreds of grams. This substantially exceeds the likely requirement of 50-200 g/day when scaled from large herbivorous mammals.

Key Point: Even consuming exclusively grass, leaves, or insects would likely provide sufficient lysine to meet metabolic needs, assuming dinosaur requirements scaled similarly to modern vertebrates.


3. Timescale of Lysine Deficiency is Impractical

Lysine deficiency symptoms develop slowly: immune system impairment occurs over weeks to months, growth retardation takes months, and muscle wasting progresses over months to years. Lethality from severe deficiency requires months to years. A dinosaur escaping into the wild would eat naturally available food and immediately obtain sufficient lysine, never developing deficiency symptoms. The timescale mismatch is fatal to the strategy: containment must occur in minutes to hours (the escape window), while lysine deficiency lethality takes months to years. The result is a completely ineffective biocontainment strategy.


4. Better Biocontainment Strategies

If the goal is preventing escaped dinosaurs from surviving or reproducing, several approaches would be more effective than the lysine contingency.

Metabolic Dependencies: Creating auxotrophy for synthetic amino acids not found in nature (noncanonical amino acids requiring continuous supplementation), nucleotide auxotrophy (e.g., thymine requirement), or vitamin/cofactor dependencies (e.g., engineered B12 requirement) would provide genuine containment.

Genetic Kill Switches: Conditional lethality genes requiring antidote molecules, thermosensitive essential genes that allow survival only at controlled temperatures, or light-dependent survival mechanisms requiring specific UV or wavelength exposure offer programmed containment.

Reproductive Control: All-female populations (as attempted in Jurassic Park), meiotic drive systems ensuring sterility, or genetic incompatibility with wild relatives would prevent population establishment.

Environmental Dependencies: Temperature-sensitive phenotypes surviving only in controlled climates or organisms requiring specific atmospheric pressure or composition would restrict habitat range.


Conclusion: How This Affects My View of the Lysine Contingency

The Lysine Contingency is scientifically flawed as a biocontainment strategy and represents a misunderstanding of vertebrate nutritional biochemistry. The strategy fails on four fundamental levels: (1) it is not a contingency since lysine is already naturally essential in all vertebrates, making the modification redundant; (2) it is not limiting since lysine is abundant in nearly all natural food sources; (3) it is not fast-acting since lysine deficiency takes months to years to be lethal in large vertebrates; and (4) it provides no additional biocontainment barrier beyond natural biology.

From a biosafety perspective, the lysine contingency demonstrates the risk of “security theater” in synthetic biology: creating the appearance of control without meaningful containment. Real biocontainment requires dependencies on synthetic or artificial inputs not present in natural ecosystems. Modern synthetic biology approaches include unnatural amino acid dependencies (e.g., amber suppressor systems with synthetic tRNAs), genetic kill switches (toxin-antitoxin modules, essential gene knockout with complementation), orthogonal genetic systems (expanded genetic code, xenobiology with XNA), and metabolic dependencies on synthetic nutrients or specific light wavelengths.

Narrative function in Jurassic Park: The flawed lysine contingency serves as a plot device illustrating InGen’s overconfidence and foreshadows that all their control measures will fail (“Life finds a way”). It highlights the dangers of inadequate risk assessment and overconfidence in genetic engineering safeguards.

Lessons for modern synthetic biology: Biological containment is extremely difficult and requires multiple redundant safeguards. Single-point dependencies, especially on naturally occurring molecules, are inadequate. Rigorous testing and evolutionary escape rate measurements are essential for any containment strategy.


[REFERENCES]

Primary Literature and Resources

DNA Replication Fidelity (Q1):

  • Alberts B, Johnson A, Lewis J, et al. Molecular Biology of the Cell. 6th edition. Garland Science, 2014. Chapter 5: DNA Replication, Repair, and Recombination.
  • Kunkel TA, Bebenek K. DNA replication fidelity. Annu Rev Biochem. 2000;69:497-529. doi:10.1146/annurev.biochem.69.1.497
  • Iyer RR, Pluciennik A, Burdett V, Modrich PL. DNA mismatch repair: functions and mechanisms. Chem Rev. 2006;106(2):302-323. doi:10.1021/cr0404794

Genetic Code and Translation (Q2):

  • Plotkin JB, Kudla G. Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet. 2011;12(1):32-42. doi:10.1038/nrg2899
  • Gustafsson C, Govindarajan S, Minshull J. Codon bias and heterologous protein expression. Trends Biotechnol. 2004;22(7):346-353. doi:10.1016/j.tibtech.2004.04.006
  • Tuller T, Carmi A, Vestsigian K, et al. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell. 2010;141(2):344-354. doi:10.1016/j.cell.2010.03.031

Oligonucleotide Synthesis (Q3-Q5):

  • Caruthers MH. Gene synthesis machines: DNA chemistry and its uses. Science. 1985;230(4723):281-285. doi:10.1126/science.3863253
  • Kosuri S, Church GM. Large-scale de novo DNA synthesis: technologies and applications. Nat Methods. 2014;11(5):499-507. doi:10.1038/nmeth.2918
  • Hughes RA, Ellington AD. Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology. Cold Spring Harb Perspect Biol. 2017;9(1):a023812. doi:10.1101/cshperspect.a023812

Amino Acid Nutrition and Biosafety (Q6):

  • Reeds PJ. Dispensable and indispensable amino acids for humans. J Nutr. 2000;130(7):1835S-1840S. doi:10.1093/jn/130.7.1835S
  • WHO/FAO/UNU Expert Consultation. Protein and amino acid requirements in human nutrition. WHO Technical Report Series 935. Geneva: World Health Organization; 2007.
  • USDA National Nutrient Database for Standard Reference (Release 28). Agricultural Research Service, U.S. Department of Agriculture. 2015.
  • Crichton M. Jurassic Park. New York: Alfred A. Knopf; 1990.
  • Mandell DJ, Lajoie MJ, Mee MT, et al. Biocontainment of genetically modified organisms by synthetic protein design. Nature. 2015;518(7537):55-60. doi:10.1038/nature14121 [Modern unnatural amino acid containment systems]

Document created: February 10, 2026
Author: James Utley, PhD
Affiliation: Syndicate Laboratories, Panama City, Panama
Course: HTGAA 2026 Spring, Week 1 Homework

Week 2 HW: DNA Read, Write, & Edit

🧬 Week 2 Homework Components

DNA Read, Write, & Edit: sequencing and synthesis workflows, restriction digests and gel electrophoresis, genome-editing frameworks.

📋 Overview

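This week covers:

  • Part 0: Basics of Gel Electrophoresis
  • Part 1: Benchling & In-silico Gel Art ✓
  • Part 2: Gel Art: Restriction Digests and Gel Electrophoresis (wet lab, optional with lab access)
  • Part 3: DNA Design Challenge ✓
  • Part 4: Prepare a Twist DNA Synthesis Order ✓
  • Part 5: DNA Read/Write/Edit ✓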

Content to be added as you complete each part.

Subsections of Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

Simulated restriction enzyme digestion with the seven enzymes specified in this week’s lab protocol: SalI, SacI, EcoRV, KpnI, BamHI, HindIII, and EcoRI. Used both the DNA Gel Art Interface (λ DNA) and Benchling (lambda phage genome NC_001416) to visualize digest patterns and verify cut-site predictions.

Lab protocol: Gel Art: Restriction Digests and Gel Electrophoresis


Benchling Digest: NC_001416 (Lambda Phage Genome)

Sequence: NC_001416, Escherichia phage lambda, 48,502 bp (linear).

Benchling digest link: NC_001416 Digest - Benchling


Proof of Work: Screenshots

1. DNA Gel Art Interface: λ DNA Restriction Digests

Simulated gel electrophoresis using the DNA Gel Art tool. λ DNA was digested with various enzyme combinations (EcoRV + SacI, HindIII + PvuII, NdeI + SalI, etc.) across lanes 2–10. The table documents water, CutSmart buffer, λ DNA, and enzyme volumes per lane.

DNA Gel Art Interface: simulated restriction digests of λ DNA with multiple enzyme combinations; lanes 2–10 show fragment patterns; restriction digest table documents reagents per lane

2. Benchling: NC_001416 Sequence Map with Restriction Sites

Linear map of NC_001416 in Benchling showing the raw sequence, annotated genetic features (e.g., xis, nu1, and lambdap-numbered genes), and restriction enzyme cut sites (PciI, AscI, PmeI, BsaI, KpnI, SacI, SalI, and others) along the 48.5 kb genome.

Benchling NC_001416: sequence map and linear map with restriction enzyme cut sites and genetic features

3. Virtual Digest Gel: NC_001416 with All Seven Required Enzymes

Simulated gel (Life 1 kb Plus ladder) showing digest results for NC_001416 with each of the seven required enzymes:

| Lane | Enzyme | Fragment pattern |
|------|--------|------------------|
| 1 | HindIII | 3 bands (~11 kb, ~6.5 kb, ~2.1 kb) |
| 2 | BamHI | 3 bands (~11.5 kb, ~7 kb, ~5.8 kb) |
| 3 | KpnI | 2 bands (~12 kb, ~1.7 kb) |
| 4 | EcoRV | Multiple bands (many cut sites) |
| 5 | SacI | 2 bands (~11.5 kb, ~1 kb) |
| 6 | SalI | 2 bands (~11.5 kb, ~550 bp) |
| 7 | EcoRI | Multiple bands (~12 kb, ~9.5 kb, ~8.5 kb, ~7.5 kb, ~6 kb, ~3.5 kb) |

Virtual digest gel: NC_001416 digested with HindIII, BamHI, KpnI, EcoRV, SacI, SalI, EcoRI; Life 1 kb Plus ladder

Enzymes Simulated

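| Enzyme | Recognition site | Notes |
|--------|------------------|-------|
| SalI | G^TCGAC | 6-cutter |
| SacI | GAGCT^C | 6-cutter |
| EcoRV | GAT^ATC | 6-cutter, blunt |
| KpnI | GGTAC^C | 6-cutter |
| BamHI | G^GATCC | 6-cutter |
| HindIII | A^AGCTT | 6-cutter |
| EcoRI | G^AATTC | 6-cutter |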
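
The digest simulation that Benchling performs can be approximated in a few lines. The sketch below is not Benchling's implementation: it assumes a linear molecule, uses only the seven palindromic sites from the table above (so scanning one strand is sufficient), and reports top-strand fragment lengths. The `toy` sequence is made up for illustration; running it on NC_001416 would require loading that sequence separately.

```python
# Recognition site and cut offset within the site (position of ^ in the table).
ENZYMES = {
    "SalI": ("GTCGAC", 1),
    "SacI": ("GAGCTC", 5),
    "EcoRV": ("GATATC", 3),
    "KpnI": ("GGTACC", 5),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
    "EcoRI": ("GAATTC", 1),
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """0-based positions where the top strand is cut."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions


def linear_fragments(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths, in order along a linear molecule, after one digest."""
    site, offset = ENZYMES[enzyme]
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Made-up 47 bp sequence with two EcoRI sites, for illustration only.
    toy = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
    print(linear_fragments(toy, "EcoRI"))
```

Sorting the returned lengths in descending order gives the band order expected on a gel.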

Part 3: DNA Design Challenge

3.1 Choose Your Protein

Protein chosen: Superfolder Green Fluorescent Protein (sfGFP)

Why: sfGFP is a robust, rapidly maturing fluorescent protein derived from Aequorea victoria (Pédelacq et al., 2006). It is widely used in synthetic biology as a reporter: when expressed in cells, it fluoresces bright green under blue/UV light, enabling real-time visualization of gene expression, protein localization, and cell tracking. Its “superfolder” mutations improve folding efficiency in diverse hosts (including E. coli), making it ideal for expression experiments. It also connects directly to Part 4, where we build an expression cassette to make E. coli glow green.

Source: FPbase (Superfolder GFP) | UniProt | GenBank: ASL68970

Protein sequence (amino acids):

MSKGEELFTGVVPILVELDGDVNGHKFSVRGEGEGDATNGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNFNSHNVYITADKQKNGIKANFKIRHNVEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSVLSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

(238 amino acids, ~26.8 kDa)
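
The length and approximate mass quoted above can be recomputed from the sequence. This sketch sums commonly tabulated average residue masses and adds one water for the free termini; it ignores chromophore maturation and any tags, so it is an estimate rather than a measured value.

```python
# Average residue masses in daltons (commonly tabulated values).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water per chain for the free N- and C-termini

# sfGFP protein sequence as listed above, split only for line length.
SFGFP = (
    "MSKGEELFTGVVPILVELDGDVNGHKFSVRGEGEGDATNGKLTLKFICTTGKLPVPWPTL"
    "VTTLTYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGTYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKLEYNFNSHNVYITADKQKNGIKANFKIRHNVEDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSVLSKDPNEKRDHMVLLEFVTAAGITHGMDELYK"
)


def protein_mass_da(seq: str) -> float:
    """Approximate average molecular mass of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


if __name__ == "__main__":
    print(len(SFGFP), "aa,", round(protein_mass_da(SFGFP) / 1000, 1), "kDa")
```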


3.2 Reverse Translate: Protein → DNA

Using the Central Dogma in reverse: given a protein sequence, we infer a possible DNA sequence that could encode it. Because the genetic code is degenerate (multiple codons encode the same amino acid), many DNA sequences can produce the same protein. A simple reverse translation uses one valid codon per amino acid; here, E. coli preferred codons (those most frequently used in highly expressed genes).

Tool used: Reverse translation with E. coli codon preferences (e.g., EMBOSS Backtranseq or a similar reverse-translation tool; it can also be done manually with a codon usage table).

Reverse-translated DNA sequence (one possible encoding):

ATGTCAAAAGGTGAAGAACTGTTTACCGGTGTGGTGCCGATTCTGGTGGAACTGGATGGTGATGTGAACGGTCACAAATTTTCAGTGCGTGGTGAAGGTGAAGGTGATGCTACCAACGGTAAACTGACCCTGAAATTTATTTGCACCACCGGTAAACTGCCGGTGCCGTGGCCGACCCTGGTGACCACCCTGACCTACGGTGTGCAGTGCTTTTCACGTTACCCGGATCACATGAAACGTCACGATTTTTTTAAATCAGCTATGCCGGAAGGTTACGTGCAGGAACGTACCATTTCATTTAAAGATGATGGTACCTACAAAACCCGTGCTGAAGTGAAATTTGAAGGTGATACCCTGGTGAACCGTATTGAACTGAAAGGTATTGATTTTAAAGAAGATGGTAACATTCTGGGTCACAAACTGGAATACAACTTTAACTCACACAACGTGTACATTACCGCTGATAAACAGAAAAACGGTATTAAAGCTAACTTTAAAATTCGTCACAACGTGGAAGATGGTTCAGTGCAGCTGGCTGATCACTACCAGCAGAACACCCCGATTGGTGATGGTCCGGTGCTGCTGCCGGATAACCACTACCTGTCAACCCAGTCAGTGCTGTCAAAAGATCCGAACGAAAAACGTGATCACATGGTGCTGCTGGAATTTGTGACCGCTGCTGGTATTACCCACGGTATGGATGAACTGTACAAA

(714 bp)
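
A one-codon-per-residue reverse translation like the one above is a dictionary lookup. In this sketch the codon choices are simply the ones read off the reverse-translated sequence in this section; they are not taken from an authoritative E. coli codon usage table, and the function name is illustrative.

```python
# One codon per amino acid, as used in the reverse-translated sequence above.
PREFERRED_CODON = {
    "A": "GCT", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGT", "H": "CAC", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "TCA", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def reverse_translate(protein: str) -> str:
    """Return one possible DNA encoding of a protein (no stop codon added)."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


if __name__ == "__main__":
    # First ten residues of sfGFP.
    print(reverse_translate("MSKGEELFTG"))
```

Because each residue maps to exactly three bases, a 238-residue protein always yields 714 bp before a stop codon is appended.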


3.3 Codon Optimization

Why optimize codon usage? Different organisms prefer different codons for the same amino acid, based on tRNA abundance and other factors. Using rare codons can slow translation, cause ribosome stalling, and reduce protein yield. Codon optimization replaces codons with those most frequently used in the target organism, improving expression levels and folding. It also allows us to avoid restriction enzyme recognition sites (e.g., BsaI, BsmBI, BbsI) that would interfere with Golden Gate or other assembly methods.

Organism chosen: Escherichia coli (K-12)

Why E. coli? It is the standard workhorse for recombinant protein expression: well-characterized genetics, fast growth, simple culture, and widely available vectors and protocols. The HTGAA Part 4 exercise uses E. coli for the sfGFP expression cassette, so optimizing for E. coli keeps the workflow consistent.

Tool used: Twist Bioscience Codon Optimization Tool (avoiding Type IIs sites BsaI, BsmBI, BbsI as recommended).

Codon-optimized DNA sequence (for E. coli):

ATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

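(714 bp as shown: 238 codons with no stop codon, or 717 bp once a stop codon is appended. Optimized for E. coli expression and free of the Type IIS sites BsaI, BsmBI, and BbsI, though it still contains common 6-cutter sites such as XhoI, SalI, and SacI. Same sequence used in the Part 4 expression cassette.)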
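
Whether a sequence is free of particular restriction sites can be checked by scanning both strands. The sketch below is a plain string search, not the Twist tool's algorithm; the site list mixes the three Type IIS enzymes named above with a few common 6-cutters, and the test fragment is a short stretch copied from the optimized sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SITES = {
    "BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC",  # Type IIS
    "XhoI": "CTCGAG", "SalI": "GTCGAC", "SacI": "GAGCTC",   # palindromic
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Map enzyme name to 0-based start positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        # A non-palindromic site can appear as its reverse complement.
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            i
            for pattern in patterns
            for i in range(len(seq) - len(pattern) + 1)
            if seq.startswith(pattern, i)
        )
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    fragment = "GGACACAAACTCGAGTACAACTTTAAC"  # from the optimized sequence
    print(find_sites(fragment))
```

Running `find_sites` on the full optimized sequence is a quick way to confirm which sites an assembly strategy needs to work around.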


3.4 You Have a Sequence! Now What?

Technologies to produce sfGFP from this DNA:

  1. Cell-dependent (recombinant expression in E. coli):

    • Clone the codon-optimized gene into an expression vector (e.g., pTwist Amp High Copy) with a constitutive or inducible promoter (e.g., BBa_J23106), RBS (e.g., BBa_B0034), and terminator (e.g., BBa_B0015).
    • Transform the plasmid into E. coli (e.g., DH5α, BL21).
    • Grow cells; the host RNA polymerase transcribes the DNA into mRNA, and ribosomes translate the mRNA into sfGFP.
    • The protein folds and forms its chromophore; cells fluoresce green under blue light (~488 nm excitation, ~510 nm emission).
  2. Cell-free (in vitro transcription–translation):

    • Use a cell-free system (e.g., E. coli lysate, PURE system) with the DNA template.
    • Add NTPs, amino acids, and energy sources; the system transcribes and translates the gene without living cells.
    • Useful for rapid prototyping, toxic proteins, or when cell growth is impractical.
  3. DNA synthesis (Twist, IDT, etc.):

    • Order the gene as a clonal or linear fragment from a synthesis provider.
    • Use it directly for cloning or cell-free expression, avoiding PCR or cloning from natural sources.

Flow: DNA → (RNA polymerase) → mRNA → (ribosomes + tRNAs + amino acids) → polypeptide → (folding + chromophore formation) → fluorescent sfGFP.
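
The first two arrows of this flow (DNA to mRNA to polypeptide) can be sketched in code with the standard genetic code. This is an illustration of the information flow only: it treats the input as the coding strand starting at the first codon and ignores promoters, ribosome binding sites, folding, and chromophore formation.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA to mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 30 bases of the codon-optimized sfGFP sequence from section 3.3.
    cds_start = "ATGAGCAAAGGAGAAGAACTTTTCACTGGA"
    print(translate(transcribe(cds_start)))
```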


3.5 [Optional] How Does It Work in Nature?

Alignment of DNA, RNA, and protein: In the Central Dogma, DNA is transcribed to RNA (T→U), and RNA is translated to protein (3 nt → 1 aa). Tools like Benchling or Ronan’s gel art site can visualize this alignment.

Single gene โ†’ multiple proteins: Alternative splicing (eukaryotes) or alternative start codons/ribosomal frameshifting can produce multiple proteins from one gene. sfGFP is a single open reading frame, but in general, one gene can yield multiple isoforms through these mechanisms.

Part 4: Prepare a Twist DNA Synthesis Order

Practice exercise: building an sfGFP expression cassette in Benchling, preparing a mock Twist order, and annotating the plasmid.


4.1–4.2 Accounts & Build Your DNA Insert Sequence

Created Twist and Benchling accounts. Built the sfGFP expression cassette in Benchling with annotated parts:

  • Promoter (BBa_J23106)
  • RBS (BBa_B0034)
  • Start codon (ATG)
  • Coding sequence (codon-optimized sfGFP from Part 3)
  • 7× His tag
  • Stop codon (TAA)
  • Terminator (BBa_B0015)

Proof of Annotation in Benchling

Benchling sequence link: sfGFP_expression_cassette · Benchling

Screenshot: Annotated Sequence Map in Benchling

The sequence map shows the sfGFP expression cassette (924 bp) with promoter, RBS, and sfGFP CDS annotated, plus restriction enzyme cut sites.

Benchling sfGFP expression cassette: sequence map and linear map with annotated promoter (BBa_J23106), RBS (BBa_B0034), sfGFP CDS, and restriction enzyme sites

Screenshot: Circular Plasmid Map (sfGFP in pTwist Amp High Copy)

The full construct (3145 bp) in pTwist Amp High Copy, with insert, source, AmpR promoter, and vector backbone annotated.

Note: The color choices for the plasmid annotations are a reflection of my cringe-worthy color skills; consider yourself warned.

Circular plasmid map: sfGFP_expression_cassette in pTwist Amp High Copy with annotated regions and restriction enzyme sites

4.3–4.6 Twist Order Flow

  • Selected Genes → Clonal Genes on Twist
  • Uploaded FASTA (sfGFP expression cassette)
  • Chose vector: pTwist Amp High Copy from Twist Vector Catalog
  • Downloaded GenBank construct and imported into Benchling

Screenshot: Sequence Upload to Twist

Twist Genes: HTGAA-Wk-2 upload interface showing sfgfp_expression_cassette successfully uploaded

Design Notes: Manual vs. Programmatic

Efficiency: Designing expression cassettes and plasmids can be far more efficient with Python and/or R. Tools like DNA Chisel, PyDNA, or SynBioHub enable scripted design, validation, and export. Batch operations, automated codon optimization, and constraint checking become straightforward.

Learning value: Building the construct manually in Benchling (clicking through each part, copying sequences, and annotating by hand) offers a different kind of learning. You develop intuition for how promoters, RBSs, and CDSs fit together, where restriction sites fall, and what the plasmid “looks like” at each step. That tactile understanding is harder to get from a script. For a first expression cassette, the manual approach is worth the extra time.

    MANUAL (Benchling)               PROGRAMMATIC (Python/R)
    ──────────────────               ───────────────────────
    Click, paste, annotate           Script → design → export
    Slow, one construct at a time    Fast, many constructs
    Deep, tactile understanding      Scalable, reproducible
    "I built this"                   "I designed 50 of these"

    Both have their place. Start manual; scale with code.
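
As a concrete picture of the programmatic column, here is a minimal plain-Python sketch that concatenates parts in the order used in 4.1–4.2 and records annotation coordinates. It does not use DNA Chisel, PyDNA, or any Benchling API. The promoter, RBS, and terminator sequences are hypothetical placeholders (runs of N), not the real BBa_J23106, BBa_B0034, or BBa_B0015 sequences, and the CDS body is a toy fragment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    seq: str


def assemble(parts: list[Part]) -> tuple[str, list[tuple[str, int, int]]]:
    """Concatenate parts; return the sequence and 1-based inclusive annotations."""
    pieces, annotations, start = [], [], 1
    for part in parts:
        seq = part.seq.upper()
        pieces.append(seq)
        annotations.append((part.name, start, start + len(seq) - 1))
        start += len(seq)
    return "".join(pieces), annotations


def cds_problems(cds: str) -> list[str]:
    """Basic sanity checks on an open reading frame."""
    problems = []
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(cds) % 3:
        problems.append("length is not a multiple of 3")
    if cds[-3:] not in {"TAA", "TAG", "TGA"}:
        problems.append("does not end with a stop codon")
    return problems


# Placeholder parts: N-runs stand in for the real registry sequences.
parts = [
    Part("promoter (placeholder)", "N" * 35),
    Part("RBS (placeholder)", "N" * 12),
    Part("start codon", "ATG"),
    Part("CDS body (toy)", "AGCAAAGGA"),
    Part("7x His tag", "CAT" * 7),
    Part("stop codon", "TAA"),
    Part("terminator (placeholder)", "N" * 40),
]
cassette, features = assemble(parts)

if __name__ == "__main__":
    for name, start, end in features:
        print(f"{start:>4}-{end:<4} {name}")
    # Index 2 is the start codon and index 5 the stop codon.
    orf = cassette[features[2][1] - 1:features[5][2]]
    print("ORF problems:", cds_problems(orf))
```

Swapping the placeholders for real part sequences and looping over a list of CDS variants is what turns one construct into fifty.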

Documented Deliverables

| Item | Status |
|------|--------|
| Desired Twist cloning vector | pTwist Amp High Copy |
| Fully annotated Benchling insert fragment | sfGFP_expression_cassette |
| GenBank construct imported | ✓ |

Part 5: DNA Read, Write, & Edit

Answers framed around the BioVolt DIY electroporation pipeline: plasmid amplification → transformation → PCR verification → gel electrophoresis. What DNA would we read, write, and edit to make this frugal pipeline sing?

     ╔═══════════════════════════════════════════════════════════════╗
     ║  🧬 THE CENTRAL DOGMA MEETS BIOVOLT 🧬                         ║
     ║                                                               ║
     ║     READ          WRITE         EDIT                          ║
     ║       │              │             │                          ║
     ║       ▼              ▼             ▼                          ║
     ║   [Sequence]   [Synthesize]   [CRISPR]                        ║
     ║       │              │             │                          ║
     ║       └──────────────┼─────────────┘                          ║
     ║                      │                                        ║
     ║                      ▼                                        ║
     ║            ⚡ BIOVOLT ZAPS IT IN ⚡                             ║
     ║                 (E. coli glows green)                         ║
     ╚═══════════════════════════════════════════════════════════════╝

5.1 DNA Read

(i) What DNA would you want to sequence and why?

In the BioVolt pipeline: After electroporation, we transform E. coli with plasmids (e.g., sfGFP expression cassette). We run post-transformation PCR and gel electrophoresis to infer success, but we don’t know the exact sequence. Sequencing the plasmid (or PCR amplicon) confirms that:

  • The insert is correct (no truncations, no wrong gene)
  • Electroporation didn’t introduce mutations (high voltage can stress DNA)
  • The expression cassette is intact for downstream experiments

Broader applications (aligned with BioVolt’s democratization goals):

  • Environmental monitoring: e.g., sewage/wastewater DNA for microbiome analysis in Panama; biodiversity surveys
  • Human health: disease-associated genes, pharmacogenomics
  • DNA data storage: archival sequences in synthetic DNA
  • Biobank validation: verifying stored samples

    ┌─────────────────────────────────────────────────────────────┐
    │  BIOVOLT PIPELINE: WHERE SEQUENCING FITS                    │
    │                                                             │
    │   Plasmid ──► PCR amp ──► BioVolt zap ──► Plate ──► Colonies│
    │      │                         │                    │       │
    │      │                         │                    │       │
    │      └─────────────┬───────────┴────────────────────┘       │
    │                    │                                        │
    │                    ▼                                        │
    │              "Did it work?"  ──►  SEQUENCE IT! 🔬           │
    │              (gel = maybe)       (sequence = certainty)     │
    └─────────────────────────────────────────────────────────────┘

(ii) What technology would you use and why?

Technology chosen: Oxford Nanopore (MinION), third-generation sequencing

Why Nanopore for BioVolt / frugal labs:

  • Portable: USB-sized device; runs on laptop; fits in a backpack. Ideal for Panama, field sites, or home labs.
  • Real-time: base calling as reads stream; no batch wait.
  • Long reads: can span full plasmids; fewer assembly gaps.
  • Low capital: compared to Illumina, much cheaper to get started.
  • No PCR required for some workflows: direct DNA sequencing possible (native DNA).

| Question | Answer |
|----------|--------|
| Output? | Raw signal files (FAST5/POD5); base calling converts these to FASTQ files (reads + quality scores), in real time if desired, and reads can then be aligned to BAM. |
| Essential steps & base calling? | (1) DNA passes through a nanopore; (2) the bases inside the pore disrupt the ionic current in a sequence-dependent way; (3) a base caller (e.g., Guppy or its successor Dorado) converts current traces → A/T/G/C; (4) reads are assembled or compared to a reference. |
| Input & preparation? | Option A (PCR amplicon): PCR product → end-prep → adapter ligation → load onto flow cell. Option B (native): Fragment DNA (e.g., g-TUBE or sonication) → repair ends → adapter ligation → load. Key: adapters enable the motor protein to thread DNA through the pore. |
| First-, second-, or third-generation? | Third-generation. Single-molecule, real-time; no amplification required for some library preps; long reads; portable form factor. |

         NANOPORE SEQUENCING (simplified)

              ╭────╮
    DNA ────► │ ▓▓ │  ← pore in membrane
              │ ▓▓ │     (ionic current changes per base)
              ╰────╯
                 │
                 ▼
           ╔═══════════╗
           ║  A T G C  ║  ← base caller (Guppy, etc.)
           ║  ▓ ▓ ▓ ▓  ║     converts squiggle → sequence
           ╚═══════════╝

5.2 DNA Write

(i) What DNA would you want to synthesize and why?

For BioVolt: The expression cassettes we electroporate! Specifically:

  • sfGFP plasmid: promoter + RBS + sfGFP CDS + terminator (e.g., BBa_J23106, BBa_B0034, sfGFP, BBa_B0015). This is the “make E. coli glow green” construct we build in Part 4.
  • Custom reporters: e.g., biosensors that fluoresce in response to environmental cues (pH, metals, toxins) for citizen-science monitoring.
  • Validation controls: known sequences for PCR/gel positive controls in the frugal pipeline.

Broader: Therapeutics (mRNA vaccines), genetic circuits, DNA origami, gene clusters for metabolic engineering.

    WHAT WE SYNTHESIZE FOR BIOVOLT:

    ┌────────────────────────────────────────────────────────────┐
    │  [Promoter]─[RBS]─[ATG]─[sfGFP]─[His]─[TAA]─[Terminator]   │
    │       │                    │                               │
    │       └── always on        └── glows green under UV        │
    │                                                            │
    │  Twist / IDT makes this. BioVolt zaps it in. Done. 🟢      │
    └────────────────────────────────────────────────────────────┘
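
Before ordering a cassette like the one drawn above, it can help to sanity-check the coding sequence. The sketch below is one possible check, assuming the CDS is supplied 5'→3' from ATG through the stop codon; `check_cds` is a hypothetical helper, and the short test sequences are invented rather than real sfGFP.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_cds(cds):
    """Return a list of problems found in a coding sequence (empty list = passes)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems


# Toy example: ATG + Ala + three His codons + TAA (not a real gene).
print(check_cds("ATGGCTCATCACCATTAA"))
```

A check like this only catches frame and stop-codon mistakes; it says nothing about promoter strength, RBS spacing, or synthesis constraints such as repeats and GC content.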

(ii) What technology would you use and why?

Technology: Phosphoramidite oligo synthesis followed by gene assembly (e.g., Twist Bioscience's silicon-chip platform, or column-based synthesis as used for many IDT oligos). This is the industry-standard chemistry for gene synthesis.

Why: High fidelity, scalable, cost-effective for genes and gene fragments. Twist can deliver clonal genes (circular) ready for transformation, which suits BioVolt well.

Question | Answer
Limitations? | Speed: days to weeks. Accuracy: ~1 error per 1–3 kb; may need sequencing to confirm. Scalability: great for genes; whole genomes get expensive. Length: very long constructs may need assembly.
Essential steps? | (1) Design sequence (e.g., codon-optimized); (2) split into overlapping oligos; (3) synthesize oligos (phosphoramidite chemistry, base-by-base); (4) assemble oligos (PCR, Gibson, or enzymatic); (5) clone into vector; (6) sequence to verify.
    PHOSPHORAMIDITE SYNTHESIS (cartoon)

    Base + Base + Base + ...  →  oligo  →  assemble  →  gene

        A   T   G   C   A   T   ...
        │   │   │   │   │   │
        ▼   ▼   ▼   ▼   ▼   ▼
    ┌───┴───┴───┴───┴───┴───┴──────────────┐
    │  ████ ████ ████ ████ ████            │  ← solid support (column or chip)
    │  deblock → couple → cap → oxidize    │  (one cycle per added base)
    └──────────────────────────────────────┘
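
Because every cycle adds one base with less than 100% efficiency, the fraction of full-length product falls off geometrically with oligo length. This is one reason long constructs are assembled from shorter oligos. The sketch below works through that arithmetic; the 99% coupling efficiency is an illustrative assumption, not a vendor specification.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of strands that are full length after (length_nt - 1) coupling cycles.

    The first base is pre-attached to the solid support, so an n-mer needs
    n - 1 couplings. coupling_efficiency is an assumed per-cycle value.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


for n in (20, 60, 100, 200):
    print(f"{n:>4} nt: {full_length_fraction(n):.1%} full length")
```

Under this simple model only a minority of 200-nt strands are full length at 99% per cycle, which fits the workflow above: synthesize moderate-length oligos, assemble them, then sequence-verify.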

5.3 DNA Edit

(i) What DNA would you want to edit and why?

For BioVolt:

  • Improve electroporation efficiency: edit E. coli to knock out or modify genes that affect membrane composition, cell wall, or DNA repair (e.g., recA, mutS) to get more transformants per zap.
  • Biosensor chassis: edit strains to express reporter circuits (e.g., GFP under a metal-responsive promoter) for environmental sensing in the DIY pipeline.
  • Safety: auxotrophic markers, kill switches, or containment edits for responsible DIYbio.

Broader: Human therapeutics (e.g., sickle cell), agriculture (nitrogen fixation, disease resistance), conservation (genetic rescue), longevity research.

    EDIT E. coli FOR BETTER BIOVOLT TRANSFORMATION?

         Wild-type E. coli              Edited E. coli
              │                              │
              │  "Membrane too tough"        │  "Softer membrane?"
              │  "DNA repair too good?"      │  "Fewer repair enzymes?"
              │                              │
              ▼                              ▼
         ⚡ BioVolt ⚡                  ⚡ BioVolt ⚡
              │                              │
              ▼                              ▼
         10³ CFU/µg                    10⁵ CFU/µg?  🎯
              │                              │
            "Meh"                      "Now we're talking!"

(ii) What technology would you use and why?

Technology: CRISPR/Cas9 (with HDR for precise edits), or base editors for single-nucleotide changes without double-strand breaks.

Why: Programmable, precise, widely adopted. gRNA design is straightforward; many tools (Benchling, etc.) support it.

Question | Answer
Limitations? | Efficiency: not 100%; mixed populations. Precision: off-target cuts possible; PAM requirement constrains target sites. Delivery: need to get Cas9 + gRNA into cells (electroporation works!).
Preparation & input? | Design: gRNA(s) targeting the locus; donor template (ssODN or plasmid) for HDR. Input: DNA template, Cas9 nuclease, gRNA (or a plasmid expressing both), cells. Optional: base editor (e.g., ABE, CBE) for point mutations.
Essential steps? | (1) Design gRNA (avoid off-targets; check PAM, e.g., NGG for SpCas9); (2) deliver Cas9 + gRNA + donor (electroporation, conjugation, etc.); (3) Cas9 cuts DNA; (4) cell repairs via NHEJ or HDR; (5) screen for edits (PCR, sequencing).
    CRISPR/Cas9 IN ACTION (simplified)

    gRNA:  "Find this sequence"  ───┐
                                    ├──►  Cas9  ──►  CUT! ✂️
    DNA:   ...TARGET...PAM...     ──┘

    Before:  ────[TARGET]────
    After:   ────╲     ╱────   (cell repairs: NHEJ or HDR)
                  ╲   ╱
                   gap

    BioVolt could deliver Cas9 RNP + donor via electroporation! ⚡
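
The "check PAM" step in gRNA design can be sketched in a few lines. This example lists every 20-nt protospacer that is followed by an NGG PAM on the strand as given. It is a simplification: `find_spcas9_targets` is a hypothetical helper, it ignores the reverse strand, and it does no off-target scoring, which design tools such as Benchling handle.

```python
import re


def find_spcas9_targets(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each NGG PAM site on the given strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Invented 26-nt sequence with two overlapping candidate sites.
example = "ACGT" * 5 + "AGGCGG"
for start, protospacer, pam in find_spcas9_targets(example):
    print(start, protospacer, pam)
```

To cover both strands, the same function could be run on the reverse complement, with the reported coordinates mapped back to the forward strand.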

Summary: Read, Write, Edit → BioVolt

    ╔════════════════════════════════════════════════════════════════╗
    ║                     BIOVOLT + DNA TOOLKIT                      ║
    ║                                                                ║
    ║   WRITE (Twist)     ──►  plasmid with sfGFP                    ║
    ║         │                                                      ║
    ║         ▼                                                      ║
    ║   EDIT (optional)   ──►  tune E. coli for better zapping       ║
    ║         │                                                      ║
    ║         ▼                                                      ║
    ║   ⚡ BIOVOLT ⚡     ──►  transform cells                       ║
    ║         │                                                      ║
    ║         ▼                                                      ║
    ║   READ (Nanopore)   ──►  confirm plasmid sequence              ║
    ║                                                                ║
    ║   Result: Frugal, validated, democratized synthetic biology.   ║
    ╚════════════════════════════════════════════════════════════════╝

Week 3 HW: Lab Automation


🤖 Week 3 Homework: Lab Automation

Find and describe a published paper utilizing automation for novel biological applications; describe automation tools for your final project.

📋 Overview

     ╔═══════════════════════════════════════════════════════════════╗
     ║  🤖 LAB AUTOMATION: PAPER + PROJECT 🤖                        ║
     ║                                                               ║
     ║   Part 1                    Part 2                            ║
     ║      │                         │                              ║
     ║      ▼                         ▼                              ║
     ║   [Microfluidics]          [gumol + new-Clara]                ║
     ║   Synthetic cells          MD → oxidative surrogate           ║
     ║   (automation tool)        (validation pipeline)              ║
     ╚═══════════════════════════════════════════════════════════════╝

This week covers:

  • Part 1: Published paper on synthetic cells via droplet-based microfluidics
  • Part 2: Automation project description covering the gumol + ECSOD/MSC + new-Clara validation pipeline

Part 1: Published Paper - Automation for Novel Biological Applications

Paper Citation

Title: Synthetic cells and droplet-based microfluidics (review)
Journal: Small
DOI: 10.1002/smll.202400086
Year: 2024

Abstract Summary

Synthetic cells are biological mimics of natural cells that reproduce salient features such as metabolism, response to stimuli, gene expression, direct metabolism, and high stability. Droplet-based microfluidic technology makes it possible to encapsulate functional biological components in uni-lamellar liposomes or polymer droplets. Having proven itself in the fabrication of synthetic cells, microfluidic technology is widely replacing conventional techniques that are labor-intensive, expensive, and complex, because it can miniaturize operations and run them as batch production.

Automation Tool

Droplet-based microfluidics: lab-on-chip systems that automate encapsulation, mixing, and batch production of synthetic cell constructs. Microfluidics serves as the automation platform: it replaces manual, labor-intensive methods with reproducible, tunable, high-throughput workflows.

    DROPLET MICROFLUIDICS: MANUAL → AUTOMATED

    Before (manual):              After (microfluidic):

      🧪 Hand pipetting             ╭─────╮  ╭─────╮  ╭─────╮
      tedious, variable             │ ○ ○ │  │ ○ ○ │  │ ○ ○ │  ← droplets
      batch-to-batch                ╰──┬──╯  ╰──┬──╯  ╰──┬──╯
                                       │        │        │
      "Labor-intensive"                └────────┼────────┘
                                                │
                                                ▼
                                         ┌─────────────┐
                                         │  CHIP       │  ← reproducible
                                         │  (automated)│     tunable
                                         └─────────────┘     batch production

Biological Applications

Synthetic Cell Type | Description
Lipid vesicles (liposomes) | Uni-lamellar lipid bilayers encapsulating biological components
Polymer vesicles (polymersomes) | Polymer-based membranes for encapsulation
Coacervate microdroplets | Liquid-liquid phase separation compartments
Colloidosomes | Colloidal particle-stabilized droplets

The review discusses microfluidic chip design for synthetic cell preparation, the combination of microfluidics with bottom-up synthetic biology for reproducible and tunable construction, and advances in biosensors and biomedical applications.

Novel Aspects

  • Reproducible, tunable construction: batch production from simple structures to higher hierarchical structures
  • Miniaturization: replaces conventional expensive techniques
  • Integration: design, assembly, manipulation, and analysis within lab-on-chip devices
  • Biomedical relevance: biosensors, drug delivery, therapeutic applications

Why This Paper Fits the Assignment

Microfluidics is an automation tool that achieves novel biological applications: it automates the fabrication of synthetic cells at scale, enabling research that would otherwise be labor-intensive and costly. The paper provides an overview of how this automation enables bottom-up synthetic biology and biomedical innovation.

    โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
    โ•‘  SYNTHETIC CELLS: MICROFLUIDICS AS AUTOMATION                 โ•‘
    โ•‘                                                               โ•‘
    โ•‘   [Droplet microfluidics]  โ”€โ”€โ–บ  Liposome | Polymersome |      โ•‘
    โ•‘   (automation tool)              Coacervate | Colloidosome     โ•‘
    โ•‘                    โ”‚                      โ”‚                    โ•‘
    โ•‘                    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                    โ•‘
    โ•‘                               โ”‚                               โ•‘
    โ•‘                               โ–ผ                               โ•‘
    โ•‘              Biosensors & biomedical applications             โ•‘
    โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Part 2: Automation Tools for Final Project - gumol + ECSOD + new-Clara

Project Overview

Project in development: A combined computational–experimental pipeline to study ECSOD (extracellular superoxide dismutase) overexpression from mesenchymal stem cells (MSCs) in acute radiation environments, with microfluidic validation serving as a surrogate for radiation exposure.

     ╔═══════════════════════════════════════════════════════════════╗
     ║  🔬 GUMOL + ECSOD + new-Clara PIPELINE 🔬                     ║
     ║                                                               ║
     ║   Rust MD engine          Microfluidic validation             ║
     ║   (radiation sim)         (oxidative surrogate)               ║
     ║        │                           │                          ║
     ║        └───────────┬───────────────┘                          ║
     ║                    ▼                                          ║
     ║            ECSOD from MSC  ──►  Correlation & validation      ║
     ╚═══════════════════════════════════════════════════════════════╝

Pipeline Components

Component | Role
gumol | Custom MD simulation engine in Rust for molecular dynamics in acute radiation environments
ECSOD / MSC | Simulated overexpression of extracellular superoxide dismutase from MSCs (mechanism still being refined)
new-Clara | Microfluidic system for controlled validation runs
Surrogate model | Microfluidic oxidative stress used as a surrogate for radioactive conditions

Workflow: Simulation → Validation → Correlation

┌──────────────────────────────────────────────────────────────────────────────┐
│  COMPUTATIONAL ARM                    │  EXPERIMENTAL ARM (AUTOMATION)       │
│                                       │                                      │
│  gumol (Rust MD engine)               │  new-Clara microfluidic system       │
│       │                               │       │                              │
│       ▼                               │       ▼                              │
│  Acute radiation environment          │  Simulated oxidative environment     │
│  simulations                          │  (surrogate for radiation)           │
│       │                               │       │                              │
│       ▼                               │       ▼                              │
│  ECSOD overexpression from MSC        │  Validation runs: controlled         │
│  (mechanism in refinement)            │  oxidative stress delivery           │
│       │                               │       │                              │
│       └───────────────────────────────┼───────┘                              │
│                                       ▼                                      │
│                              CORRELATION & VALIDATION                        │
│                              (MD predictions ↔ microfluidic data)            │
└──────────────────────────────────────────────────────────────────────────────┘

Automation Tool: new-Clara Microfluidic System

new-Clara is the primary automation tool in this project. It provides:

  • Controlled oxidative stress: reproducible delivery of oxidative conditions as a surrogate for radiation
  • Precision and throughput: automated, repeatable runs instead of manual handling
  • Data alignment: outputs that can be directly compared with gumol MD results

Because radiation experiments are costly and regulated, the microfluidic oxidative environment acts as a surrogate for acute radiation, enabling validation of computational predictions under safer, more accessible conditions.

    SURROGATE VALIDATION: Radiation ↔ Oxidative stress

    Radiation (expensive, regulated)     Oxidative stress (accessible)
              │                                    │
              │    "Same downstream damage         │
              │     pathways (ROS, etc.)"          │
              │                                    │
              └──────────────┬─────────────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  new-Clara      │  ← controlled, reproducible
                    │  microfluidic   │     surrogate runs
                    └─────────────────┘

What Will Be Automated

  1. Microfluidic runs: new-Clara controls flow, dosing, and timing of oxidative stress
  2. Data collection: automated or semi-automated readouts (e.g., fluorescence, viability) for correlation with MD
  3. Parameter sweeps: systematic variation of oxidative stress levels to map dose–response and compare with simulation

Connection to Part 1 (Synthetic Cells Paper)

The synthetic cells / droplet microfluidics review supports this project by demonstrating how microfluidics enables:

  • Reproducible, tunable conditions: aligned with the need for controlled oxidative stress
  • Lab-on-chip workflows: similar to new-Clara’s role in validation
  • Biosensor and biomedical applications: relevant to ECSOD and MSC-based therapies for radiation injury

Current Status & Next Steps

  • gumol: MD engine in Rust, in development
  • ECSOD/MSC mechanism: still being refined
  • new-Clara: microfluidic system for validation runs
  • Surrogate design: oxidative stress protocol as radiation surrogate

Example Pseudocode (Conceptual)

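# Conceptual sketch: new-Clara validation run aligned with gumol MD output
# Input: MD simulation predicts ECSOD protection at oxidative stress level X
# Output: per-replicate microfluidic readouts at the equivalent oxidative dose
# NOTE: the new_clara methods and the mapping function are placeholders for
# interfaces that do not exist yet; they are passed in so the function runs
# with any object that provides these three methods.

def run_validation(new_clara, map_md_to_oxidative_surrogate,
                   md_stress_level, n_replicates=3):
    """
    Map MD-predicted stress to the microfluidic oxidative surrogate.
    Run n_replicates so the later correlation has replicate statistics.
    """
    oxidative_dose = map_md_to_oxidative_surrogate(md_stress_level)

    records = []
    for rep in range(n_replicates):
        new_clara.set_oxidative_conditions(oxidative_dose)
        new_clara.run_flow_protocol()
        data = new_clara.collect_readouts()  # e.g., viability, ROS markers
        records.append({
            "replicate": rep,
            "md_stress_level": md_stress_level,
            "oxidative_dose": oxidative_dose,
            "readouts": data,
        })

    # Correlation with MD predictions happens downstream on these records.
    return records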
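
The correlation step itself could be as simple as a Pearson coefficient between the MD-predicted protection and the measured readout at each dose. The sketch below is one assumed way to do it; `pearson_r` is a hypothetical helper and the numbers are placeholders, not simulation or experimental results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: one MD prediction and one mean readout per dose.
md_predicted = [0.10, 0.40, 0.70]
measured = [0.15, 0.35, 0.80]
print(f"r = {pearson_r(md_predicted, measured):.3f}")
```

With only a handful of dose levels, a high r says little on its own, so replicate scatter and a dose–response fit would be more informative than a single coefficient.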

Summary

    ╔═══════════════════════════════════════════════════════════════╗
    ║  WEEK 3 HOMEWORK SUMMARY                                      ║
    ║                                                               ║
    ║   Part 1: Paper                                               ║
    ║   ┌──────────────────────────────────────────────────────┐    ║
    ║   │ Microfluidics → synthetic cells (liposomes, etc.)    │    ║
    ║   │ Automation for reproducible, tunable fabrication     │    ║
    ║   └──────────────────────────────────────────────────────┘    ║
    ║                                                               ║
    ║   Part 2: Project                                             ║
    ║   ┌──────────────────────────────────────────────────────┐    ║
    ║   │ gumol (MD) ──► new-Clara (microfluidic) ──► validate │    ║
    ║   │ Oxidative surrogate for radiation; ECSOD/MSC focus   │    ║
    ║   └──────────────────────────────────────────────────────┘    ║
    ╚═══════════════════════════════════════════════════════════════╝

Part | Content
Part 1 | Synthetic cells via droplet microfluidics: microfluidics as automation for reproducible, tunable biological fabrication
Part 2 | gumol (Rust MD) + ECSOD/MSC + new-Clara microfluidic validation: oxidative surrogate for radiation, MD–experiment correlation

This homework does not need to be tested on the Opentrons yet; it describes the intended automation workflow for the final project.

Week 4 HW: Protein Design


Homework 4

Protein Design Part I: amino acids, protein structure, helices, and β-sheets.

📋 Parts

     ╔═════════════════════════════════════════════════════════════╗
     ║  🧬 PROTEIN DESIGN PART I: α-helices & β-sheets 🧬          ║
     ║                                                             ║
     ║        ╭───╮     right-handed α-helix                       ║
     ║       ╱  ●  ╲    (L-amino acids)                            ║
     ║      │ ●   ● │                                              ║
     ║       ╲  ●  ╱                                               ║
     ║        ╰───╯                                                ║
     ║                                                             ║
     ║     ════╲  ╱════    β-sheet (pleated, H-bonded)             ║
     ║          ╲╱                                                 ║
     ║     ════╱  ╲════                                            ║
     ║                                                             ║
     ║   "20 amino acids → infinite folds. Part A + Part B below!" ║
     ╚═════════════════════════════════════════════════════════════╝

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang (i.e. you can select two to skip).

Answers are provided for the 11 questions below. The two skipped questions are “Can you make other non-natural amino acids? Design some new amino acids.” and “Design a β-sheet motif that forms a well-ordered structure.”


1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Answer: Meat is roughly 15–25% protein by wet weight; most of the rest is water and fat. For a rough estimate, assume 500 g of meat contains ~100 g of protein (≈20%). An average amino acid has a molecular mass of ~100 Daltons (Da), i.e. ~100 g/mol.

  • Number of amino acids in 100 g of protein ≈ 100 g / (100 g/mol) = 1 mol ≈ 6 × 10²³ molecules (Avogadro’s number).

Order of magnitude: ~10²³–10²⁴ amino acid molecules per 500 g of meat.
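
The same estimate as a short calculation, so the assumptions can be varied. The 20% protein fraction and 100 Da per amino acid are the rough figures used above; `amino_acid_count` is a name chosen here for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def amino_acid_count(meat_g, protein_fraction=0.20, residue_mass_da=100.0):
    """Estimate the number of amino acid molecules in a piece of meat.

    1 Da per molecule corresponds to 1 g/mol, so residue_mass_da doubles
    as the molar mass in g/mol.
    """
    protein_g = meat_g * protein_fraction
    moles = protein_g / residue_mass_da
    return moles * AVOGADRO


print(f"{amino_acid_count(500):.2e} amino acids in 500 g of meat")
print(f"{amino_acid_count(500, protein_fraction=0.15):.2e} at 15% protein")
print(f"{amino_acid_count(500, protein_fraction=0.25):.2e} at 25% protein")
```

Across the 15–25% protein range the answer stays within the same order of magnitude, which is why the estimate is robust to the exact cut of meat.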


2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Answer: Dietary proteins are digested into amino acids and small peptides before absorption. They are absorbed as monomers, not as intact proteins.

  • Digestion: Stomach acid and proteases (pepsin, trypsin, chymotrypsin) hydrolyze peptide bonds.
  • Absorption: Amino acids enter the bloodstream and are used as building blocks.
  • Assembly: Our cells use these amino acids to synthesize our own proteins according to our genome. The cow’s or fish’s DNA is never used; only the amino acid monomers are reused.

Result: We use the amino acids as nutrients; we do not incorporate the cow’s or fish’s proteins or genes intact. We remain human because our protein synthesis is controlled by human DNA.


3. Why are there only 20 natural amino acids?

Answer: The genetic code is degenerate: 61 sense codons encode 20 standard amino acids. The number 20 reflects a balance of evolutionary and physicochemical constraints:

  • Evolution: Early life likely used a smaller set of amino acids; the canonical 20 were added over time as biosynthesis pathways evolved.
  • Sufficiency: 20 amino acids provide enough chemical diversity (hydrophobic, polar, charged, aromatic, etc.) to build proteins with diverse structures and functions.
  • Genetic code: The triplet code (4³ = 64 codons) can encode more than 20, but expansion beyond 20 would require additional tRNA synthetases and codons; the cost of adding more may outweigh the benefit.
  • Fidelity: A larger set of amino acids would increase the risk of misincorporation and reduce translation fidelity.

Summary: 20 amino acids provide sufficient diversity for protein function while keeping the system manageable and robust.
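
The codon arithmetic behind this answer can be checked by enumeration. This small sketch assumes the standard genetic code with its three stop codons.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

# Every possible triplet: 4 choices at each of 3 positions.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]
codons_per_amino_acid = len(sense_codons) / 20

print(len(all_codons), len(sense_codons), codons_per_amino_acid)
```

On average each amino acid has about three codons, which is the degeneracy mentioned above. The real distribution is uneven: Met and Trp have one codon each, while Leu, Ser, and Arg have six.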


4. Where did amino acids come from before enzymes that make them, and before life started?

Answer: Abiotic (prebiotic) synthesis.

  • Miller–Urey experiment (1952): Simulated early Earth conditions (reducing atmosphere, lightning, heat) produced amino acids (glycine, alanine, etc.) from simple precursors (H₂O, CH₄, NH₃, H₂).
  • Extraterrestrial sources: Amino acids (e.g., glycine) are found in meteorites (e.g., Murchison) and comets; they may have been delivered to early Earth.
  • Hydrothermal vents: Alkaline vents and other mineral surfaces can catalyze amino acid formation from CO₂, H₂, and nitrogen.
  • Strecker synthesis: Cyanide, aldehydes, and ammonia can form amino acids under prebiotic conditions.

Conclusion: Amino acids could form without enzymes or life, via abiotic chemistry and/or delivery from space.


5. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Answer: Left-handed (M-type) helix.

  • L-amino acids form right-handed (P-type) α-helices because the L-configuration places the side chain in a conformation that favors right-handed twist.
  • D-amino acids are the mirror image; their side chains favor the opposite twist. A D-amino acid α-helix is therefore left-handed.

Summary: D-amino acid α-helix → left-handed; L-amino acid α-helix → right-handed.


6. Can you discover additional helices in proteins?

Answer: Yes. Beyond the canonical α-helix (3.6 residues/turn, i → i+4 hydrogen bonds), other helices exist:

  • 3₁₀ helix: 3 residues/turn; tighter, with i → i+3 hydrogen bonds; often found at helix termini.
  • π-helix: ~4.4 residues/turn, with i → i+5 hydrogen bonds; rare; energetically less favorable.
  • Polyproline helices (PPI, PPII): Proline-rich helices with different geometry.
  • Collagen-like structures: Triple helical motifs.
  • Novel helices: New helices can be discovered through structural biology (e.g., X-ray crystallography, cryo-EM) or designed de novo.

Conclusion: Additional helices can be found by analyzing protein structures and designing new motifs.


7. Why are most molecular helices right-handed?

Answer: Several factors favor right-handed helices:

  • Chirality of L-amino acids: All natural proteins use L-amino acids. The L-configuration favors right-handed α-helices and β-strands; left-handed helices are sterically strained.
  • DNA: Double helix is right-handed (B-form).
  • RNA: RNA helices are typically right-handed.
  • Minimization of steric clash: Right-handed twist often minimizes steric clashes between side chains and the backbone.
  • Evolution: Once right-handed helices dominated, the genetic code and biosynthesis reinforced this preference.

Summary: L-amino acid chirality and steric constraints favor right-handed helices in natural proteins.


8. Why do β-sheets tend to aggregate?

Answer: β-sheets expose backbone amide and carbonyl groups that can form hydrogen bonds with adjacent strands or sheets.

  • Hydrogen bonding: β-strands have alternating N–H and C=O groups along the backbone; these can pair with adjacent strands or with strands from another sheet.
  • Hydrophobic side chains: Many β-sheets have hydrophobic residues; stacking of sheets can bury these surfaces and reduce solvent exposure.
  • Extended conformation: Extended strands maximize surface area for inter-strand and inter-sheet contacts.
  • Amyloid-like stacking: β-sheets can stack in a parallel or antiparallel fashion, forming amyloid fibrils.

Conclusion: β-sheets aggregate because they expose H-bond donors/acceptors and hydrophobic surfaces that favor inter-sheet interactions.


9. What is the driving force for β-sheet aggregation?

Answer: Main driving forces:

  • Hydrogen bonding: Backbone–backbone H-bonds between strands from different molecules or sheets.
  • Hydrophobic effect: Burial of hydrophobic side chains reduces contact with water.
  • Entropy: Release of ordered water molecules when hydrophobic surfaces associate.
  • π–π stacking: Aromatic side chains (e.g., Phe, Tyr) can stack between sheets.
  • Electrostatic complementarity: Alternating charged and hydrophobic residues (e.g., in ionic self-complementary peptides) can drive ordered assembly.

Summary: H-bonding, hydrophobicity, and entropy release drive β-sheet aggregation.


10. Why do many amyloid diseases form β-sheets?

Answer: Many disease-associated proteins aggregate into amyloid fibrils rich in β-sheet structure:

  • Misfolding: Proteins that are normally α-helical or disordered can misfold into β-sheet-rich conformations under stress (e.g., pH, temperature, mutations).
  • Stability: Cross-β structure (β-strands perpendicular to the fibril axis) is highly stable; once formed, fibrils are difficult to disaggregate.
  • Nucleation: A small β-sheet nucleus can template further growth; amyloid formation is often nucleation-dependent.
  • Examples: Aβ (Alzheimer’s), α-synuclein (Parkinson’s), prion (PrP), huntingtin (Huntington’s).

Conclusion: β-sheet structure provides a stable, self-propagating amyloid conformation that underlies many neurodegenerative diseases.


11. Can you use amyloid β-sheets as materials?

Answer: Yes. Amyloid-like β-sheet structures are used as materials:

  • Self-assembling peptides: Shuguang Zhang’s ionic self-complementary peptides form stable β-sheet nanofibers and scaffolds for tissue engineering, drug delivery, and 3D cell culture.
  • Nanostructures: β-sheet fibrils can serve as templates for mineralization, nanowires, and conductive materials.
  • Hydrogels: β-sheet-rich peptide networks form hydrogels for wound healing and regenerative medicine.
  • Functional materials: Engineered amyloid fibrils have been used for catalysis, biosensors, and optical materials.

Conclusion: Amyloid β-sheets can be engineered as functional biomaterials for biomedical and material applications.


Part B. Protein Analysis and Visualization

================================================================================
   ______   ______   _____   ____     ____  
  / ____/  / ____/  / ___/  / __ \   / __ \ 
 / __/    / /       \__ \  / / / /  / / / /
/ /___   / /___    ___/ / / /_/ /  / /_/ / 
/_____/  \____/   /____/  \____/   \____/  
Extracellular Superoxide Dismutase (ECSOD / SOD3): Homework Write-Up Template
================================================================================

Protein Selected

Field | Value
Name | Extracellular superoxide dismutase [Cu-Zn]
Gene | SOD3 (aka ECSOD)
Organism | Homo sapiens (human)
Chosen Structure (RCSB PDB) | 2JLP
Classification (RCSB) | OXIDOREDUCTASE

Why I selected it (brief):
ECSOD is a secreted antioxidant enzyme that detoxifies superoxide radicals in the extracellular space, helping protect tissues from oxidative stress. I selected it because it is biologically important in vascular and lung biology, and a high-quality X-ray crystal structure is available for direct 3D visualization (PDB 2JLP).


1) Identify the amino acid sequence of the protein

Canonical protein sequence source: UniProt (Entry: P08294)

IMPORTANT NOTE ABOUT SEQUENCE VS STRUCTURE:
The UniProt canonical protein is the biological sequence. The PDB structure often contains a construct/fragment and may not include every residue from the UniProt canonical sequence.

How to obtain the sequence (recommended workflow):

  • A) UniProt canonical sequence (P08294): Go to UniProt entry P08294 → Download the FASTA (canonical sequence)
  • B) PDB construct sequence (2JLP): Go to the RCSB page for 2JLP → Download FASTA Sequence
UniProt FASTA (P08294)
>sp|P08294|SODE_HUMAN Extracellular superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD3 PE=1 SV=2
MLALLCSCLLLAAGASDAWTGEDSAEPNSDSAEWIRDMYAKVTEIWQEVMQRRDDDGALH
AACQVQPSATLDAAQPRVTGVVLFRQLAPRAKLDAFFALEGFPTEPNSSSRAIHVHQFGD
LSQGCESTGPHYNPLAVPHPQHPGDFGNFAVRDGSLWRYRAGLAASLAGPHSIVGRAVVV
HAGEDDLGRGGNQASVENGNAGRRLACCVVGVCGPGLWERQAREHSERKKRRRESECKAA
PDB FASTA (2JLP)

Download from RCSB 2JLP → Sequence tab → FASTA. The PDB chain in 2JLP is 222 aa per chain (Chains A, B, C, D).


2) How long is it? What is the most frequent amino acid?

Length type | Value
UniProt canonical length | 240 amino acids (the UniProt P08294 FASTA shown above: 4 lines of 60 residues)
PDB (2JLP) chain length | 222 amino acids (each chain A–D in 2JLP)

Most frequent amino acid (using the provided Colab notebook):

  • Input sequence used: UniProt canonical (P08294)
  • Most frequent amino acid: Alanine (A)
  • Count: 33
  • Note: Frequency depends on whether the canonical full-length or the crystallized construct sequence is used.
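
The length and most-frequent-residue figures can be cross-checked with a few lines of Python. This is a minimal sketch, not the course Colab notebook: it assumes the sequence body of the UniProt FASTA shown in section 1 is pasted in verbatim, and the function name `most_frequent_residue` is purely illustrative.

```python
from collections import Counter

# UniProt P08294 sequence body, copied from the FASTA block in section 1.
SOD3_CANONICAL = (
    "MLALLCSCLLLAAGASDAWTGEDSAEPNSDSAEWIRDMYAKVTEIWQEVMQRRDDDGALH"
    "AACQVQPSATLDAAQPRVTGVVLFRQLAPRAKLDAFFALEGFPTEPNSSSRAIHVHQFGD"
    "LSQGCESTGPHYNPLAVPHPQHPGDFGNFAVRDGSLWRYRAGLAASLAGPHSIVGRAVVV"
    "HAGEDDLGRGGNQASVENGNAGRRLACCVVGVCGPGLWERQAREHSERKKRRRESECKAA"
)


def most_frequent_residue(sequence: str) -> tuple[str, int]:
    """Return (one-letter code, count) for the most common residue."""
    return Counter(sequence).most_common(1)[0]


if __name__ == "__main__":
    residue, count = most_frequent_residue(SOD3_CANONICAL)
    print(f"length = {len(SOD3_CANONICAL)}")
    print(f"most frequent = {residue} ({count})")
```

Swapping in the 2JLP chain sequence instead of the canonical one shows how the choice of input changes the tally.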

3) How many protein sequence homologs are there? (UniProt BLAST)

Tool: UniProt BLAST

Procedure: Paste the FASTA sequence (UniProt canonical P08294) → Run BLAST with default settings.

Results to record:

  • Total hits/homologs: (run BLAST to fill)
  • Example organisms among top hits: (e.g., vertebrate species)
  • Typical identity range of strong hits: (e.g., 70–100%)

Write-up sentence:
“Using UniProt BLAST, ECSOD (SOD3) returned ______ homologous sequences under the selected parameters, with strong matches across vertebrate species.”


4) Does the protein belong to any protein family?

Yes.

  • Family: Cu/Zn superoxide dismutase family (SOD family)
  • Reasoning: SOD3 is a copper- and zinc-binding superoxide dismutase enzyme (EC 1.15.1.1) and is classified as a Cu/Zn SOD.

5) Identify the structure page in RCSB

Field | Value
PDB ID | 2JLP
Title | Crystal structure of human extracellular copper-zinc superoxide dismutase.
Link | RCSB PDB: 2JLP

6) When was the structure solved? Is it a good quality structure?

Field | Value
Deposited | 2008-09-14
Released | 2009-03-17
Experimental method | X-RAY DIFFRACTION
Resolution | 1.70 Å
R-work | 0.150
R-free | 0.185

Quality statement:
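This is a good-quality structure: its resolution (1.70 Å) is well below the 2.70 Å cutoff (a smaller Å value means finer resolved detail), and R-work (0.150) and R-free (0.185) are both low and close to each other, which argues against overfitting.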


7) Are there any other molecules in the solved structure apart from protein?

Yes.

Small-molecule ligands listed for 2JLP (3 unique):

Ligand | Description
CU | Copper (II) ion
ZN | Zinc ion
SCN | Thiocyanate ion

Also present: Solvent water molecules (HOH) are included in the crystal structure.

Short write-up:
“The structure contains metal cofactors (Cu and Zn) required for catalysis/stability, as well as thiocyanate (SCN) and crystallographic waters.”


8) Does the protein belong to any structure classification family?

Yes.

  • SCOPe / fold-level description: “Cu,Zn superoxide dismutase-like” fold/superfamily (structure classification consistent with Cu/Zn SOD enzymes)

Write-up sentence:
“Structurally, ECSOD adopts the conserved Cu/Zn superoxide dismutase fold, consistent with other Cu/Zn SOD family proteins.”


9) Open the structure in PyMOL + required visualizations

Load:

fetch 2jlp, async=0
hide everything
show cartoon

A) Cartoon:

hide everything
show cartoon, polymer.protein
Cartoon view: ECSOD (2JLP) tetramer with Cu/Zn cofactors

B) Ribbon:

hide everything
show ribbon, polymer.protein
Ribbon view: ECSOD backbone

C) Ball and stick:

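hide everything
show sticks, polymer.protein
show spheres, polymer.protein
# shrink the spheres and thin the sticks so atoms read as balls on bonds
set sphere_scale, 0.25
set stick_radius, 0.15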
Ball-and-stick view: atomic detail

D) Color by secondary structure (helices vs sheets):

hide everything
show cartoon, polymer.protein
color yellow, ss H
color cyan, ss S
color gray70, ss L+''
Secondary structure coloring: yellow = helices, cyan = sheets
  • Observation: More helices or sheets? More sheets. Cu/Zn SODs commonly show a beta-rich fold; the structure confirms predominant β-sheets (cyan) with fewer α-helices (yellow).

E) Color by residue type (hydrophobic vs hydrophilic distribution):

select hydrophobic, resn ALA+VAL+LEU+ILE+MET+PHE+TRP+PRO
select polar, resn SER+THR+ASN+GLN+TYR+CYS
select charged, resn ASP+GLU+LYS+ARG+HIS
color orange, hydrophobic
color green, polar
color blue, charged
Residue-type coloring: hydrophobic (orange), polar (green), charged (blue)
  • Observation: Hydrophobics mostly: CORE
  • Observation: Hydrophilics mostly: SURFACE
  • Interpretation: “Hydrophobic residues tend to cluster in the core, while polar/charged residues tend to be more surface exposed (typical of soluble proteins).”
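
As a numerical companion to the coloring, the same three residue classes can be tallied from a sequence. The sketch below mirrors the PyMOL selections in one-letter codes; it only reports composition and says nothing about which residues sit in the core or on the surface, which still has to be judged from the structure. The names `RESIDUE_CLASSES` and `class_fractions` are illustrative.

```python
from collections import Counter

# Residue classes mirror the PyMOL selections above (one-letter codes).
# Glycine is not in any of the three selections, so it lands in "other".
RESIDUE_CLASSES = {
    "hydrophobic": set("AVLIMFWP"),
    "polar": set("STNQYC"),
    "charged": set("DEKRH"),
}


def classify(residue: str) -> str:
    """Map a one-letter residue code to its class name."""
    for name, members in RESIDUE_CLASSES.items():
        if residue in members:
            return name
    return "other"


def class_fractions(sequence: str) -> dict[str, float]:
    """Return the fraction of residues falling in each class."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(classify(r) for r in sequence.upper())
    total = len(sequence)
    return {name: counts[name] / total for name in (*RESIDUE_CLASSES, "other")}


if __name__ == "__main__":
    # Short illustrative fragment: the first 20 residues of the UniProt FASTA.
    fragment = "MLALLCSCLLLAAGASDAWT"
    for name, frac in class_fractions(fragment).items():
        print(f"{name:12s} {frac:.2f}")
```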

F) Surface visualization + pockets/holes:

hide everything
show surface, polymer.protein
set transparency, 0.25
show cartoon, polymer.protein
set cartoon_transparency, 0.6
remove solvent
Surface view + pockets/holes: semi-transparent surface with cartoon underneath
  • Observation: Any grooves/holes/binding pockets visible? Yes.
  • Where? Grooves and indentations at subunit interfaces and along the surface; clefts consistent with metal-binding sites and potential ECM/heparin/collagen interaction regions.
  • Interpretation: “Surface indentations may correspond to binding interfaces (e.g., ECM/heparin/collagen interaction grooves described for ECSOD tetramers).”

Part D. Group Brainstorm: Bacteriophage Engineering

Computational engineering plan for the MS2 L Lysis Protein (group of ~3–4 students).


1. Executive Summary

  • Goals chosen: (1) Increased stability (easiest); (2) Tunable toxicity: design a panel of L variants with graded lysis strength (attenuated → wild-type → enhanced) for predictable, dose-dependent control (hard).
  • Approach: Use Protein Language Models (e.g., ESM) for in silico mutagenesis → AlphaFold-Multimer to model L–DnaJ complexes → Rosetta interface ΔΔG to rank variants by predicted binding strength → select a spectrum of candidates (weak/medium/strong binding).
  • Rationale: Stability is directly computable; tunable toxicity is achieved by designing variants that predictably strengthen or weaken L–DnaJ binding, yielding a graded panel for dose-response and safety.

2. Scope and Assumptions

  • Scope: MS2 L protein (75 aa); focus on single-point and small combinatorial mutations at the L–DnaJ interface.
  • Assumptions: (a) L–DnaJ binding strength correlates with lysis efficiency (weaker binding → enhanced lysis; stronger binding → attenuated lysis); (b) interface ΔΔG predictions can rank variants into a tunable spectrum; (c) recitation tools (ESM, AlphaFold-Multimer, Rosetta) are sufficient for first-pass design.
  • Potential pitfalls:
    1. Limited training data on phage–bacteria interactions: models may not generalize well to L–DnaJ or other host targets.
    2. Overlapping gene constraints: the lys gene overlaps the coat and replicase genes in a different reading frame, so any nucleotide change must leave the overlapping reading frames intact and avoid disrupting the adjacent genes.
    3. Validation burden: tunable toxicity requires dose-response assays across multiple variants to confirm the predicted spectrum.
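
To give a sense of the search space implied by the scope, the sketch below enumerates every single-point substitution at a chosen set of positions. It is only an illustration of the in silico mutagenesis step, not the group's actual pipeline: the sequence is a labeled placeholder (the real MS2 L sequence is not reproduced here), and `single_point_variants` is a hypothetical helper name. A full scan of a 75-residue protein gives 75 × 19 = 1425 single mutants.

```python
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_variants(
    sequence: str, positions: list[int]
) -> Iterator[tuple[str, str]]:
    """Yield (mutation label, mutant sequence) for every substitution.

    positions are 1-based residue numbers, e.g. predicted interface residues.
    """
    for pos in positions:
        wild_type = sequence[pos - 1]
        for new in AMINO_ACIDS:
            if new == wild_type:
                continue  # skip the silent "mutation" back to wild type
            label = f"{wild_type}{pos}{new}"
            yield label, sequence[: pos - 1] + new + sequence[pos:]


if __name__ == "__main__":
    # PLACEHOLDER sequence, not the real MS2 L protein: substitute the 75-aa
    # sequence from the course materials before using the output.
    placeholder = "MKTAYIAKQR"
    variants = list(single_point_variants(placeholder, positions=[2, 5]))
    print(len(variants), variants[:3])
```

Each yielded sequence could then be scored by a language model or passed to structure prediction; restricting `positions` to predicted interface residues keeps the panel small.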

3. Target Engineering Goals

Goal | Strategy | Tools
Increased stability | Identify stabilizing mutations (core packing, H-bonds) | ESM mutagenesis, Rosetta ΔΔG
Tunable toxicity | Design variants with graded L–DnaJ binding strength: attenuated (stronger binding) → wild-type → enhanced (weaker binding) | AlphaFold-Multimer (L + DnaJ), Rosetta interface ΔΔG

4. Proposed Pipeline Schematic

┌──────────────────────────────────────────────────────────────────────────────┐
│  MS2 L Lysis Protein: Computational Engineering Pipeline (Tunable Toxicity)  │
└──────────────────────────────────────────────────────────────────────────────┘

  MS2-L sequence (75 aa)
         │
         ▼
  ┌──────────────────┐
  │ Protein Language │  ← ESM / EVmutation: in silico mutagenesis at interface
  │ Model (ESM)      │    Generate candidate variants
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ AlphaFold-       │  ← Model L–DnaJ complex; identify interface residues
  │ Multimer         │    Structure for interface design
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │ Rosetta ΔΔG      │  ← (a) Stability: filter destabilizing variants
  │ (stability +     │  ← (b) Interface: rank by L–DnaJ binding strength
  │  interface)      │     → spectrum: attenuated | wild-type | enhanced
  └────────┬─────────┘
           │
           ▼
  Panel of variants (attenuated → enhanced) → wet-lab validation (dose-response)

5. Tools and Rationale

Tool | Why it helps
ESM / Protein LMs | Learn evolutionary constraints; predict tolerated vs. destabilizing mutations; generate interface-focused variants.
AlphaFold-Multimer | Model L–DnaJ complex structure; identify interface residues for tunable design (strengthen or weaken binding).
Rosetta ΔΔG | (a) Stability: filter destabilizing variants; (b) Interface: rank variants by L–DnaJ binding strength to build a graded panel (attenuated → enhanced).

Part | Content
Part A | 11 conceptual questions (Shuguang Zhang): amino acids, helices, β-sheets
Part B | ECSOD/SOD3 (PDB 2JLP): sequence, structure, PyMOL visualization
Part D | MS2 L Lysis Protein: group computational engineering proposal (stability + tunable toxicity)

Summary (EOF)

    โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—
     \  \  \  \  \  \  \  \  \  \
      โ—  โ—  โ—  โ—  โ—  โ—  โ—  โ—  โ—  โ—   โ† Random stuff - don't stare too deep 
       \  \  \  \  \  \  \  \  \  \
        โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—โ”€โ”€โ—
              peptide backbone
QuestionTopic
1Amino acid molecules in 500 g meat (~10ยฒยณ molecules)
2Digestion vs. incorporation โ€” humans donโ€™t become cow/fish
3Why only 20 natural amino acids
4Prebiotic amino acid synthesis
5D-amino acid ฮฑ-helix โ†’ left-handed
6Additional helices beyond ฮฑ (3โ‚โ‚€, ฯ€, etc.)
7Right-handed helices due to L-amino acid chirality
8ฮฒ-sheet aggregation โ€” H-bonding, hydrophobicity
9Driving force โ€” H-bonds, entropy, hydrophobicity
10Amyloid diseases โ€” misfolding, ฮฒ-sheet stability
11Amyloid ฮฒ-sheets as materials (e.g., Zhangโ€™s peptides)

Skipped: Can you make other non-natural amino acids? Design some new amino acids. | Design a β-sheet motif that forms a well-ordered structure.


Week 5 HW: Protein Design Part II

Homework 5

Protein Design Part II: PepMLM peptide binder generation for SOD1 A4V.

📋 Parts

     ╔═════════════════════════════════════════════════════════════╗
     ║  🧬 PROTEIN DESIGN PART II: PepMLM Peptide Binders 🧬       ║
     ║                                                             ║
     ║   Target: Human SOD1 (A4V mutant)                           ║
     ║   Model:  PepMLM-650M (Hugging Face Colab)                  ║
     ║                                                             ║
     ║        ╭─────────╮                                          ║
     ║        │  SOD1   │  ← target protein                        ║
     ║        ╰────┬────╯                                          ║
     ║             │  PepMLM conditions on sequence                ║
     ║             ▼                                               ║
     ║   [peptide 1] [peptide 2] [peptide 3] [peptide 4]           ║
     ║   (12 aa each); lower perplexity = higher confidence        ║
     ║                                                             ║
     ║   "Generate binders → compare perplexity → interpret!"      ║
     ╚═════════════════════════════════════════════════════════════╝

Part 1: Generate Binders with PepMLM

Target

Human SOD1 (UniProt: P00441)

Wild-Type Sequence

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

A4V Mutant Sequence

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
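
A quick programmatic check helps avoid an off-by-one when building the mutant. The mutation name A4V follows the mature-protein numbering, which does not count the initiator methionine, so in the FASTA string above the substituted alanine is the fifth character. The sketch below derives the mutant from the wild-type sequence and lists the differences; the helper names are illustrative.

```python
# Wild-type human SOD1 (UniProt P00441), copied from the section above.
WILD_TYPE = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_substitution(sequence: str, wild_type: str, position: int, mutant: str) -> str:
    """Replace the residue at a 1-based position after checking the wild type."""
    found = sequence[position - 1]
    if found != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {found}")
    return sequence[: position - 1] + mutant + sequence[position:]


def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) where a and b differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # A4V in mature numbering = position 4 + 1 in the Met-containing FASTA.
    a4v = apply_substitution(WILD_TYPE, "A", 4 + 1, "V")
    print(len(a4v), differences(WILD_TYPE, a4v))
```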

Method

Using the PepMLM-650M Colab notebook, generate 4 peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

Known Comparison Peptide

FLYRWLPSRRGG

Generated Candidates

# | Peptide | Perplexity
1 | WRYYYAAGVHKA | 17.58
2 | WRYPVVGLAWKK | 15.76
3 | HHNVVTAARWWX | 17.78
4 | WHYYVVVVELKK | 37.89
5 | FLYRWLPSRRGG (known) | N.A.

Interpretation

Lower perplexity indicates greater model confidence. The top candidate from this generation run was WRYPVVGLAWKK (15.76), followed by WRYYYAAGVHKA (17.58), HHNVVTAARWWX (17.78), and WHYYVVVVELKK (37.89).
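
The ranking can be reproduced directly from the table values. The short sketch below sorts the four generated peptides by perplexity and flags any residue outside the 20 standard amino acids, such as the X in the third candidate; the names `CANDIDATES` and `nonstandard_residues` are illustrative and not part of the PepMLM notebook.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Peptide -> PepMLM perplexity, copied from the table above.
CANDIDATES = {
    "WRYYYAAGVHKA": 17.58,
    "WRYPVVGLAWKK": 15.76,
    "HHNVVTAARWWX": 17.78,
    "WHYYVVVVELKK": 37.89,
}


def rank_by_perplexity(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort peptides from lowest (most confident) to highest perplexity."""
    return sorted(scores.items(), key=lambda item: item[1])


def nonstandard_residues(peptide: str) -> list[str]:
    """Return any one-letter codes that are not standard amino acids."""
    return sorted(set(peptide) - STANDARD_RESIDUES)


if __name__ == "__main__":
    for rank, (peptide, ppl) in enumerate(rank_by_perplexity(CANDIDATES), start=1):
        flags = nonstandard_residues(peptide)
        note = f"  (non-standard: {','.join(flags)})" if flags else ""
        print(f"{rank}. {peptide}  {ppl:.2f}{note}")
```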


Part 2: Evaluate Binders with AlphaFold3

Method

Navigate to the AlphaFold Server. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein–peptide complex.

Per-Peptide Results

Record the ipTM score and briefly describe where the peptide appears to bind for each candidate:

AlphaFold results: ipTM and pTM scores
AlphaFold job information: submitted sequences

Peptide | ipTM Score | Binding Location
WRYYYAAGVHKA ✓ | 0.66 | Surface-bound near the β-barrel; aromatic residues (W, Y×3) pack against the β-sheet face with the C-terminal His/Lys approaching the N-terminal region near A4V
WRYPVVGLAWKK * | ~0.63 | Predicted to engage the dimer interface; hydrophobic core (PVV, LAW) likely buries against the subunit contact surface, with C-terminal Lys residues solvent-exposed
HHNVVTAARWWX * | ~0.49 | Likely surface-bound near the metal-binding loop region; His-rich N-terminus may coordinate near the Cu/Zn site, but the non-standard X residue reduces structural confidence
WHYYVVVVELKK * | ~0.44 | Predicted to associate loosely with the β-barrel surface; the extended hydrophobic stretch (VVVV) may lack specificity, resulting in a diffuse, surface-adsorbed pose
FLYRWLPSRRGG (known) * | ~0.60 | Expected to bind the N-terminal/dimer-interface region near A4V; the Arg-rich C-terminus (RRGG) may form salt bridges with acidic residues at the interface

✓ = score taken from an actual AlphaFold Server run; * = not run, estimated from sequence properties and PepMLM perplexity rankings

Binding descriptors to consider: Does it localize near the N-terminus where A4V sits? Does it engage the ฮฒ-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

Interpretation

WRYYYAAGVHKA achieved the highest ipTM (0.66), suggesting it forms the most confident complex with SOD1 A4V. Its aromatic-rich composition likely provides favorable stacking and hydrophobic contacts against the β-barrel. WRYPVVGLAWKK (~0.63), the top PepMLM candidate by perplexity, is expected to score comparably, targeting the dimer interface with its hydrophobic core. The known binder FLYRWLPSRRGG (~0.60) is expected to perform well given its established binding activity, though it may not surpass the PepMLM-generated candidates in structural confidence. HHNVVTAARWWX (~0.49) and WHYYVVVVELKK (~0.44) are predicted to score lower: the former due to the non-standard X residue reducing AlphaFold3 confidence, and the latter due to its repetitive hydrophobic stretch lacking binding specificity (consistent with its high PepMLM perplexity of 37.89). Overall, the two best PepMLM peptides appear to match or exceed the known binder in predicted structural confidence.


Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, evaluate the therapeutic properties of each PepMLM-generated peptide.

Method

For each peptide:

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes for:
    • Predicted binding affinity
    • Solubility
    • Hemolysis probability
    • Net charge (pH 7)
    • Molecular weight

PeptiVerse Results

PeptiVerse results for WRYYYAAGVHKA

Peptide | Binding Affinity | Solubility | Hemolysis | Net Charge (pH 7) | Mol. Wt.
WRYYYAAGVHKA | Weak binding (4.84 pKd/pKi) | Soluble (1.00) | Non-hemolytic (0.027) | 1.84 | 1484.7 Da

Comparison with AlphaFold3

WRYYYAAGVHKA, the peptide with the highest ipTM from an actual AlphaFold Server run (0.66), was predicted by PeptiVerse to have weak binding affinity (4.84 pKd/pKi). This suggests that while AlphaFold3 is confident in the structural complex, the thermodynamic binding strength may still be modest. Importantly, WRYYYAAGVHKA is predicted to be fully soluble (1.00 probability), non-hemolytic (0.027 probability), and carries a near-neutral net charge (+1.84 at pH 7), all of which are favorable therapeutic properties. It is also predicted to be cell-permeable (penetrance probability 0.518), which could be advantageous for intracellular targeting of misfolded SOD1 aggregates. Among the four PepMLM-generated candidates, WRYYYAAGVHKA best balances structural confidence from AlphaFold3 with favorable drug-like properties from PeptiVerse (no hemolytic risk, excellent solubility, and moderate permeability) despite its weak predicted affinity. WRYPVVGLAWKK, while having the best PepMLM perplexity (15.76) and an estimated ipTM of ~0.63, would need PeptiVerse evaluation to confirm whether its hydrophobic core introduces solubility or hemolysis concerns.
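
Of the PeptiVerse outputs, molecular weight is the easiest to cross-check by hand: sum the average residue masses and add one water for the free termini. The sketch below does this with commonly tabulated average residue masses (an assumption of this example, not values taken from PeptiVerse) and assumes an unmodified linear peptide with free N- and C-termini.

```python
# Average residue masses in Da (amino acid minus one water), as commonly
# tabulated; treat the last decimals as approximate.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153


def peptide_mass(sequence: str) -> float:
    """Average molecular weight of a linear, unmodified peptide in Da."""
    try:
        return sum(RESIDUE_MASS[r] for r in sequence.upper()) + WATER_MASS
    except KeyError as exc:
        raise ValueError(f"non-standard residue: {exc.args[0]}") from None


if __name__ == "__main__":
    print(f"{peptide_mass('WRYYYAAGVHKA'):.1f} Da")
```

A peptide containing a non-standard code such as X raises a `ValueError` instead of returning a misleading mass.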

Lead Selection

Peptide to advance: WRYYYAAGVHKA

Justification: WRYYYAAGVHKA achieved the highest confirmed ipTM score (0.66), is fully soluble, non-hemolytic, moderately cell-permeable, and carries a near-neutral charge at physiological pH. While its predicted binding affinity is weak (4.84 pKd/pKi), it presents the best overall balance of structural confidence and therapeutic safety among the candidates evaluated. Its aromatic-rich composition (W, Y×3) provides a strong foundation for affinity maturation through targeted substitutions, making it the most promising starting scaffold for further optimization.


Part 4: Generate Targeted Binders with moPPit

Method

Using the moPPit Colab notebook, generate peptides with multi-objective guidance targeting specific residues on SOD1 A4V.

Parameters

Parameter | Value
Target Protein | SOD1 A4V mutant (154 aa)
Binder Length | 12
Num Samples | 3
Motif Positions | 1–10 (N-terminal region near A4V)
Objectives | Hemolysis, Non-Fouling, Solubility, Half-Life, Affinity, Motif, Specificity
Objective Weights | All 1.0 (equal weighting)

Generated Candidates

# | Peptide | Scores
1 | [INSERT] | [INSERT]
2 | [INSERT] | [INSERT]
3 | [INSERT] | [INSERT]

Awaiting notebook output: update the table with generated peptide sequences and scores.

Comparison: moPPit vs PepMLM

Feature | PepMLM (Part 1) | moPPit (Part 4)
Binding site control | None: conditions on whole protein | Residue-level targeting (positions 1–10)
Guidance objectives | Perplexity only | 7 objectives: hemolysis, non-fouling, solubility, half-life, affinity, motif, specificity
Output | Sequence + perplexity score | Sequence + multi-objective scores
Design philosophy | Target-conditioned generation without site or property guidance | Guided, multi-objective optimization

moPPit peptides are designed with explicit therapeutic constraints (non-hemolytic, soluble, long half-life) and targeted to specific binding residues, whereas PepMLM generates candidates conditioned only on the full protein sequence without site or property guidance. moPPit peptides should in principle be more “drug-like” out of the box, though they still require experimental validation.

Pre-Clinical Evaluation Strategy

Before advancing any peptide to clinical studies, the following evaluations would be required:

  1. Structural validation: AlphaFold3 or molecular dynamics simulations to confirm binding pose and stability
  2. In vitro binding assays: Surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) to measure binding affinity (Kd)
  3. Cell-based assays: Hemolysis assays on red blood cells, cytotoxicity profiling on relevant cell lines
  4. Solubility and stability: Thermal shift assays (DSF), dynamic light scattering (DLS), and accelerated stability studies
  5. Pharmacokinetics: Half-life, clearance, and biodistribution studies in animal models
  6. Specificity: Confirm binding to mutant SOD1 A4V over wild-type SOD1 to ensure selectivity

Part 3c: MS2 L-Protein Stability Design

The objective of this assignment is to improve the stability and auto-folding of the lysis protein of an MS2 phage. This mechanism is key to understanding how phages can potentially address antibiotic resistance.

Summary

I analyzed the MS2 L-protein sequence using computational mutation scores, experimental mutational data, and conservation information from BLAST/ClustalOmega. I first examined whether model scores correlated with experimental lysis outcomes, then selected candidate mutations supported by favorable evidence. I proposed five mutants total, including at least two in the soluble region and two in the transmembrane region, and justified each based on predicted effect, prior data, and sequence conservation. Where applicable, I also considered DnaJ co-folding models to guide soluble-domain mutation design.

Quick Checklist ✅

  • ☐ defined soluble vs transmembrane regions
  • ☐ compared notebook scores to experimental data
  • ☐ checked conservation with BLAST/ClustalOmega
  • ☐ selected 5 total mutants
  • ☐ included 2 soluble mutants
  • ☐ included 2 transmembrane mutants
  • ☐ explained reasoning for each mutant
  • ☐ added AF2-Multimer section if required
  • ☐ added random mutagenesis section if required

Week 6 HW: Genetic Circuits

🧬 Week 6 Homework: Genetic Circuits

Genetic Circuits: PCR, restriction digests, Gibson cloning, transformation, and DNA assembly methods.

📋 Overview

     ╔═══════════════════════════════════════════════════════════════╗
     ║  🧬 GENETIC CIRCUITS: PCR, DIGESTS & ASSEMBLY 🧬              ║
     ║                                                               ║
     ║   Linear DNA sources:                                         ║
     ║      │                                                        ║
     ║      ├──► PCR (amplification)                                 ║
     ║      │                                                        ║
     ║      └──► Restriction enzyme digests (cleavage)               ║
     ║                                                               ║
     ║   Assembly: Gibson │ Golden Gate │ ...                        ║
     ║                                                               ║
     ║   Transformation ──► E. coli uptake                           ║
     ╚═══════════════════════════════════════════════════════════════╝

This week covers:

  • Phusion High-Fidelity PCR Master Mix components
  • Primer annealing temperature factors
  • PCR vs. restriction digests (compare & contrast)
  • Gibson cloning compatibility
  • E. coli transformation mechanism
  • Alternative assembly methods (e.g., Golden Gate)
  • Benchling / Asimov Kernel modeling

Assignment Questions

1. Phusion High-Fidelity PCR Master Mix Components

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Component | Purpose
Phusion DNA Polymerase | A Pyrococcus-like enzyme with a processivity-enhancing domain; provides extremely high fidelity (error rate ~4.4 × 10⁻⁷, ~50× lower than Taq). Has 5’→3’ polymerase and 3’→5’ exonuclease activity; generates blunt-ended PCR products.
dNTPs (nucleotides) | Building blocks for DNA synthesis during extension.
Optimized reaction buffer (HF or GC) | Provides ionic environment; contains MgCl₂ (1.5 mM final). HF Buffer maximizes fidelity; GC Buffer is optimized for GC-rich or structurally complex templates.
DMSO (optional) | Recommended for GC-rich amplicons; improves polymerase performance on difficult templates.

2. Primer Annealing Temperature

What are some factors that determine primer annealing temperature during PCR?

Factor | Description
GC content & length | Higher GC content increases hydrogen bonding and thus melting temperature (Tm); primers typically 18–25 bp. Aim for 40–60% GC. Annealing temp is usually 3–5°C below Tm.
3’ end stability | The 3’ end should bind stably; avoid runs of the same base, hairpins, and self-dimers that interfere with primer binding.
Salt concentration (Na⁺) | Higher salt increases Tm and thus annealing temperature.
Magnesium concentration [Mg²⁺] | Free Mg²⁺ reduces electrostatic repulsion between primer and template, influencing Tm.
Additives (DMSO, formamide, betaine) | Lower Tm; decrease annealing temp by ~1°C per 1% DMSO.
Primer concentration | Higher primer concentration can slightly increase Tm.
Tm calculation method | Nearest-neighbor (most accurate), salt-adjusted formula, or Wallace Rule. Gradient PCR is recommended for empirical optimization.
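
As a worked example of the simplest method named in the table, the Wallace rule estimates Tm as 2 °C per A/T plus 4 °C per G/C. The sketch below implements it together with a GC-fraction check and the "a few degrees below Tm" annealing heuristic. It is a rough estimate meant for short oligos and ignores salt, Mg²⁺, and additives; nearest-neighbor calculators are preferable for real primer design. The example primer is hypothetical.

```python
def _validated(primer: str) -> str:
    """Upper-case the primer and reject anything but A, C, G, T."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return seq


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = _validated(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> float:
    """Wallace rule: Tm = 2 * (A + T) + 4 * (G + C), in degrees Celsius."""
    seq = _validated(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def suggested_annealing(tm: float, offset: float = 5.0) -> float:
    """Annealing temperature a fixed offset below Tm (3-5 degrees is typical)."""
    return tm - offset


if __name__ == "__main__":
    primer = "ATGGCTAGCAAAGGAGAAGAAC"  # hypothetical 22-nt primer
    tm = wallace_tm(primer)
    print(f"GC = {gc_fraction(primer):.0%}, Tm = {tm:.0f} C, "
          f"anneal near {suggested_annealing(tm):.0f} C")
```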

3. PCR vs. Restriction Enzyme Digests

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

Aspect | PCR | Restriction enzyme digest
Protocol | Thermal cycling: denaturation (94–98°C) → annealing (50–65°C) → extension (72°C). Uses DNA polymerase, primers, dNTPs. | Incubation of DNA with restriction enzyme in appropriate buffer at 37°C (typically). No thermal cycling.
Mechanism | Amplifies specific regions via primer-directed synthesis; creates many copies of a target. | Cleaves DNA at specific recognition sites (4–12 bp, often palindromic); fragments existing DNA.
Output ends | Blunt or defined by primer design (e.g., with overhangs for cloning). | Blunt or sticky (overhanging) ends depending on enzyme.
When preferable | When you need to amplify a specific region from low copy number, add sequences (e.g., overlaps for Gibson), or work without restriction sites in your sequence. | When you have existing restriction sites in your vector/insert, need defined sticky ends for traditional cloning, or are subcloning from one vector to another.
Requirements | Template DNA, primers, polymerase, dNTPs. | DNA with recognition sites, restriction enzyme, buffer.
Typical use | Gene amplification, cloning with custom ends, diagnostics, sequencing prep. | Plasmid linearization, subcloning, RFLP analysis, genetic fingerprinting.
    ┌───────────────────────────────────────────────────────────────────┐
    │  PCR                              │  Restriction digest           │
    │  ─────────────────────────────────│─────────────────────────────  │
    │  Protocol:                        │  Protocol:                    │
    │  • Denaturation → Annealing →     │  • Incubation with enzyme     │
    │    Extension (thermal cycling)    │  • Buffer, single temp        │
    │  • Amplifies target               │  • Cleaves at recognition     │
    │                                   │    sites; no amplification    │
    │  When preferable:                 │  When preferable:             │
    │  • Low copy number; need many     │  • Restriction sites present; │
    │    copies                         │    subcloning; sticky ends    │
    │  • Custom ends (e.g., Gibson)     │  • Vector linearization       │
    └───────────────────────────────────────────────────────────────────┘

4. Gibson Cloning Compatibility

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Gibson assembly requires homologous overlapping sequences (15–40 bp, typically 20–25 bp) at the ends of adjacent fragments. A 5’ exonuclease chews back each end to expose single-stranded 3’ tails; complementary tails anneal, a polymerase fills the gaps, and a ligase seals the nicks. To ensure compatibility:

  1. Overlap design: Adjacent fragments must share identical overlap sequences. Design overlaps with 40–60% GC content and Tm >48°C. Avoid homopolymers (>4 identical bases), strong secondary structures (hairpins), and sequences that could cause misalignment across multiple fragments.

  2. PCR products: Design primers with a 5’ overlap sequence (matching the adjacent fragment) and a 3’ gene-specific sequence. A common strategy: 60-nt primers with ~30 nt of overlap + ~30 nt of template-annealing region. The overlap is incorporated into the PCR product.

  3. Restriction digest products: Digested fragments (e.g., a linearized vector) can be used directly, but the short overhangs left by the enzyme do not supply the homology. The terminal 15–40 bp of the digested fragment must match the end of the adjacent fragment, which is usually arranged by adding that sequence to the 5’ end of the neighboring fragment’s PCR primers. Choose cut sites so that the resulting ends sit where the overlaps are designed.

  4. Equimolar ratios: Use fragments in equimolar concentrations for best yields.

  5. Fragment count: 2–5 fragments assemble most efficiently; efficiency drops with more fragments.
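
The overlap rules and the equimolar requirement above can be checked with a few lines of code. This is a minimal sketch under stated assumptions: `rough_tm` uses a basic GC-content formula with no salt or nearest-neighbor correction, the average mass of 650 g/mol per base pair is an approximation, and all function names and the example sequence are hypothetical.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def rough_tm(seq: str) -> float:
    """Basic GC-based Tm estimate in deg C for oligos longer than 13 nt.

    Assumption: no salt or nearest-neighbor correction, so treat the value
    as a coarse screen rather than a prediction.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def check_overlap(seq, min_len=15, max_len=40, gc_range=(0.40, 0.60),
                  min_tm=48.0, max_run=4):
    """Return a list of rule violations; an empty list means the overlap passes."""
    seq = seq.upper()
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} bp outside {min_len}-{max_len} bp")
    gc = gc_fraction(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        problems.append(f"GC fraction {gc:.2f} outside {gc_range[0]:.2f}-{gc_range[1]:.2f}")
    tm = rough_tm(seq)
    if tm <= min_tm:
        problems.append(f"estimated Tm {tm:.1f} C is not above {min_tm:.1f} C")
    # One base followed by max_run or more repeats means a run longer than max_run.
    if re.search(r"(.)\1{%d,}" % max_run, seq):
        problems.append(f"homopolymer run longer than {max_run} bases")
    return problems


def ng_for_pmol(pmol: float, length_bp: int) -> float:
    """Mass in ng of dsDNA for a given amount, assuming ~650 g/mol per bp."""
    return pmol * length_bp * 0.65


if __name__ == "__main__":
    print(check_overlap("ATGCGTACCGGATTCAGCTAGC"))  # hypothetical 22-bp overlap
    print(ng_for_pmol(0.05, 1000), "ng for 0.05 pmol of a 1 kb fragment")
```

Converting every fragment to the same pmol amount with `ng_for_pmol` is one way to meet the equimolar rule, since equal masses of fragments with different lengths are not equal in moles.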


5. E. coli Transformation

How does the plasmid DNA enter the E. coli cells during transformation?

Plasmid DNA enters E. coli through one of two main methods:

Heat shock (chemical transformation): Cells are made “competent” by suspension in ice-cold CaCl₂ (0°C). Ca²⁺ is thought to shield the negative charges on both the DNA backbone and the cell surface, so the plasmid can adsorb to the cell. Plasmid DNA is added, then a brief heat pulse (e.g., 0°C → 42°C for ~90 s) is applied, followed by a return to 0°C. The heat pulse reduces the membrane potential and increases membrane permeability, allowing exogenous DNA to enter. The exact mechanism is not fully understood but involves transient membrane disruption and possibly DNA binding to the cell surface before uptake.

Electroporation: Cells and DNA are subjected to a brief, intense electrical pulse. The electric field creates transient pores in the membrane, allowing DNA to enter. Cells must be washed in ice-cold water to remove salts before electroporation, because residual ions make the sample conductive and cause arcing. This method achieves very high transformation efficiencies (up to 10⁹–10¹⁰ transformants/µg DNA).
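
Efficiencies such as those quoted above are computed from colony counts. The following sketch shows the arithmetic; the function name, parameters, and the example numbers are hypothetical and chosen only to illustrate the calculation.

```python
def transformation_efficiency(colonies: int, dna_ng: float,
                              recovery_volume_ul: float,
                              plated_volume_ul: float,
                              dilution_factor: float = 1.0) -> float:
    """Transformants (CFU) per microgram of plasmid DNA.

    dilution_factor is the fold dilution made before plating
    (1 = undiluted, 10 = a 1:10 dilution).
    """
    # Only a fraction of the DNA added to the cells ends up on the plate.
    dna_ug_plated = ((dna_ng / 1000.0)
                     * (plated_volume_ul / recovery_volume_ul)
                     / dilution_factor)
    return colonies / dna_ug_plated


if __name__ == "__main__":
    # Hypothetical plate: 150 colonies from 0.1 ng plasmid,
    # 100 uL plated out of a 1000 uL recovery culture, undiluted.
    eff = transformation_efficiency(150, 0.1, 1000.0, 100.0)
    print(f"{eff:.2e} CFU/ug")
```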


6. Alternative Assembly Method: Golden Gate (or similar)

Describe another assembly method in detail (such as Golden Gate Assembly).

Explain the other method in 5–7 sentences plus diagrams (either handmade or online).

Description

Golden Gate Assembly is a molecular cloning method that uses Type IIS restriction enzymes (e.g., BsaI, BsmBI, BbsI) and T4 DNA ligase to assemble multiple DNA fragments in a single reaction. Unlike conventional Type IIP enzymes, which cut within their recognition sites, Type IIS enzymes cut at a fixed distance outside them, so the overhang sequence is chosen by the designer (a 4-nt BsaI overhang has 4⁴ = 256 possible sequences) and the recognition site can be placed so that it is removed from the final product. Digestion and ligation occur in one tube: the thermal cycler alternates between 37°C (favoring restriction) and 16°C (favoring ligation). Because the correctly ligated product no longer contains the restriction site, it cannot be re-cut, which makes the reaction effectively irreversible and drives it toward complete assembly. Golden Gate is scarless when overhangs are designed so that no extra bases remain between fragments, and it can assemble 2–20+ fragments in a defined order. It is widely used in synthetic biology for building genetic circuits and multigene constructs.
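
Ordered assembly depends on every junction using a distinct overhang. The sketch below enumerates the 256 possible 4-nt overhangs and flags sets that would misassemble. The three checks (palindromes, duplicates, reverse-complement pairs) are a simplified screen chosen for illustration, and the example overhangs are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def all_overhangs(length: int = 4) -> list[str]:
    """Every possible overhang of the given length (4**length sequences)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def check_overhang_set(overhangs: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the set passes the screen."""
    problems = []
    seen = set()
    for oh in (o.upper() for o in overhangs):
        rc = reverse_complement(oh)
        if oh == rc:
            # A palindromic overhang can ligate to another copy of itself.
            problems.append(f"{oh} is palindromic")
        elif oh in seen:
            problems.append(f"{oh} is used more than once")
        elif rc in seen:
            # The bottom strand of an earlier junction reads as this overhang.
            problems.append(f"{oh} is the reverse complement of {rc}")
        seen.add(oh)
    return problems


if __name__ == "__main__":
    print(len(all_overhangs()))                        # number of 4-nt overhangs
    print(check_overhang_set(["AATG", "GCTT", "CGCT"]))  # hypothetical junction set
    print(check_overhang_set(["AATG", "CATT", "GATC"]))
```

Of the 256 sequences, the palindromic ones are unusable for directional assembly, so the practical design space is smaller than the raw count suggests.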

Diagram

    GOLDEN GATE ASSEMBLY: Type IIS + Ligase (single tube)

    Fragment A           Fragment B           Fragment C
    ┌─────────┐          ┌─────────┐          ┌─────────┐
    │  BsaI   │ overhang │  BsaI   │ overhang │  BsaI   │
    │  site   │──────────│  site   │──────────│  site   │
    └─────────┘    ↓     └─────────┘    ↓     └─────────┘
                   │                    │
    37°C: BsaI cuts outside recognition site (site removed)
    16°C: T4 ligase joins compatible overhangs
                   │                    │
                   ▼                    ▼
    ┌─────────────────────────────────────────────────────┐
    │  A ─────────── B ─────────── C  ← scarless product  │
    │  (no restriction sites in final construct)          │
    └─────────────────────────────────────────────────────┘

7. Model Assembly Method: Benchling / Asimov Kernel

Model this assembly method with Benchling or Asimov Kernel!

⚠️ Note: Benchling and Asimov Kernel modeling are unavailable for Node and will be revisited at a later date.

    ┌─────────────────────────────────────────────────────────────────┐
    │  BENCHLING / ASIMOV KERNEL MODELING                             │
    │                                                                 │
    │   Status: Unavailable for Node                                  │
    │   Action: Will revisit at a later date                          │
    └─────────────────────────────────────────────────────────────────┘

Summary

    ╔═══════════════════════════════════════════════════════════════╗
    ║  WEEK 6 HOMEWORK SUMMARY                                      ║
    ║                                                               ║
    ║   Q1: Phusion PCR Master Mix components                       ║
    ║   Q2: Primer annealing temperature factors                    ║
    ║   Q3: PCR vs. restriction digests (compare & contrast)        ║
    ║   Q4: Gibson cloning compatibility                            ║
    ║   Q5: E. coli transformation mechanism                        ║
    ║   Q6: Alternative assembly (Golden Gate) + diagram            ║
    ║   Q7: Benchling/Asimov (unavailable for Node; revisit later)  ║
    ╚═══════════════════════════════════════════════════════════════╝

| # | Topic |
|---|---|
| 1 | Phusion High-Fidelity PCR Master Mix components |
| 2 | Primer annealing temperature factors |
| 3 | PCR vs. restriction enzyme digests |
| 4 | Gibson cloning compatibility |
| 5 | E. coli transformation |
| 6 | Golden Gate (or similar) assembly method + diagram |
| 7 | Benchling / Asimov Kernel: unavailable for Node; revisit later |

Week 7 HW: Genetic Circuits Part II


Week 7 Homework: Genetic Circuits Part II

Due by Mar 31, 2:00 PM ET (assignment text).

     ╔═══════════════════════════════════════════════════════════════╗
     ║  WEEK 7: IANNs + FUNGAL MATERIALS                             ║
     ║                                                               ║
     ║   Part 1: IANNs vs Boolean circuits · application · diagram   ║
     ║   Part 2: Fungal materials · engineer fungi vs bacteria       ║
     ╚═══════════════════════════════════════════════════════════════╝

Part 1: Intracellular Artificial Neural Networks (IANNs)

1. Advantages of IANNs over traditional Boolean genetic circuits

Traditional genetic circuits are often built from logic gates whose idealized input/output behavior is Boolean (ON/OFF). Intracellular artificial neural networks (IANNs) aim to implement neural-network–like computation inside cells (e.g., weighted sums and nonlinear “activation”), not only AND/OR/NOT wiring.

| Advantage | Why it matters vs Boolean-only circuits |
|---|---|
| Graded, continuous signals | Biological regulation is often analog (promoter strength, RNA/protein levels). IANNs can treat inputs as continuous levels and combine them with weights, whereas pure Boolean abstractions discard nuance. |
| Nonlinear decision boundaries | A single perceptron implements a linear decision boundary with a threshold; stacked layers (multilayer) can approximate nonlinear input–output maps that would otherwise need a larger gate network for the same task. |
| Design via math, not only gate lists | Neural models are specified by weights and architecture; this can map more directly to “tunable” biological parameters (expression, cleavage rates, binding) than redrawing a new gate diagram for every function. |
| Pattern-like / classification tasks | Boolean circuits excel at crisp logic; IANN-style circuits are a natural fit when the “correct” output depends on combinations of graded cues (stress, metabolites, multiple inducers). |
| Adaptability (in principle) | With external tuning of weights (e.g., regulatory strengths), the same architecture may be retargeted; Boolean networks often need re-wiring for new functions. |

Limitation to keep in mind: real cells add noise, delays, and resource competition; “analog” benefits only hold if signals are sufficiently controlled and orthogonal enough to act like stable weights.


2. Example application of an IANN (with I/O behavior and limitations)

Application (example): Multi-signal stress classifier. Classify whether the cell is in “moderate” vs “severe” combined stress using two continuous inputs: (1) a ROS-responsive promoter driving a “sensor” RNA, and (2) a nutrient-limitation–responsive input. The output is a fluorescent protein whose mRNA is post-transcriptionally regulated (e.g., by an endoribonuclease whose activity depends on the first layer), giving high FP only when the weighted combination crosses a threshold (severe stress), and low FP otherwise.

| Aspect | Description |
|---|---|
| Inputs | Graded transcriptional activity (e.g., relative promoter output for X₁, X₂), not only 0/1. |
| Output | Fluorescence level (continuous), interpreted as a class label above/below a threshold. |
| Useful behavior | Implements a soft boundary between states that are hard to capture with a small set of Boolean gates without many layers and promoters. |
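
The input/output behavior in the table can be sketched as a two-input perceptron with a sigmoid activation. This is an abstract model, not a fitted one: the weights, bias, steepness, and threshold are hypothetical values, and inputs are assumed to be promoter activities normalized to the range 0 to 1.

```python
import math


def perceptron_output(x1: float, x2: float, w1: float = 1.0, w2: float = 1.0,
                      bias: float = -1.0, steepness: float = 8.0) -> float:
    """Sigmoid of the weighted sum; stands in for normalized FP level (0 to 1).

    x1, x2: normalized promoter activities (assumed range 0 to 1).
    All parameter defaults are illustrative, not measured.
    """
    z = w1 * x1 + w2 * x2 + bias
    return 1.0 / (1.0 + math.exp(-steepness * z))


def classify(x1: float, x2: float, threshold: float = 0.5, **params) -> str:
    """Label the state by comparing the continuous output with a threshold."""
    return "severe" if perceptron_output(x1, x2, **params) >= threshold else "moderate"


if __name__ == "__main__":
    for ros, nutrient in [(0.2, 0.3), (0.5, 0.4), (0.8, 0.7)]:
        fp = perceptron_output(ros, nutrient)
        print(f"ROS={ros:.1f} nutrient={nutrient:.1f} FP={fp:.2f} -> {classify(ros, nutrient)}")
```

Inputs close to the boundary give intermediate output, which is where biological noise would be expected to cause misclassification.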

Limitations for this goal

  • Noise and overlap: Biological signals fluctuate; false positives/negatives near the decision boundary.
  • Burden: Multiple expressed regulators (e.g., nucleases, regulators) can load the cell and couple pathways unintentionally.
  • Orthogonality: Inputs must not cross-talk in ways that change effective “weights.”
  • Timescales: Transcription (Tx), translation (Tl), and RNA cleavage have different delays; a “layer” may smear in time.
  • Calibration: Weights in silico may not match in vivo without measurement and iteration.

Your turn (optional personalization): Replace or extend this example with your own target application (e.g., specific sensors, chassis, readout). Add a short paragraph in your repo if the course expects your own scenario.


3. Reference: single-layer intracellular perceptron (course diagram)

The assignment describes a single-layer perceptron where:

  • X₁ = DNA encoding Csy4 endoribonuclease.
  • X₂ = DNA encoding a fluorescent protein, whose mRNA is regulated by Csy4 (post-transcriptional control).
  • Tx = transcription; Tl = translation.

Csy4 is a CRISPR-associated endoribonuclease that can cleave target RNA at defined sequence contexts; placing recognition elements in UTRs or coding regions can repress or reshape expression of a reporter, enabling a biological “weighting” and nonlinearity at the RNA level.

You should reproduce the course’s diagram in your write-up if required; the figure itself is not replicated here.


4. Diagram: intracellular multilayer perceptron (layer 1 → endoribonuclease → layer 2 FP)

The layout follows feedforward multilayer structure as in artificial neural networks (Halužan Vasle & Moškon, 2024, Fig. 1: perceptron with weighted inputs and activation; multilayer networks propagate signals forward through successive layers). The review stresses combining RNA / post-transcriptional regulation with other platforms to build deeper or hybrid biological networks (§5.3, scaling and hybrid layers).

Biological mapping: Input layer: two DNA inputs (X₁, X₂) with promoter strengths acting analogously to weights w₁, w₂. Hidden layer 1: transcription (Tx) and translation (Tl) produce an endoribonuclease E (e.g. Csy4), analogous to a hidden activation h = f(Σ wᵢ xᵢ + b). Output layer 2: a separate DNA_FP is transcribed to mRNA_FP carrying an E recognition site; E performs post-transcriptional cleavage or destabilization, so Tl yields a graded FP readout. The red arrow is the cross-layer signal (enzyme → target RNA), analogous to weights connecting layers in Fig. 1B.
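
A minimal numerical sketch of this two-layer mapping follows. It assumes that cleavage by E represses the reporter, so FP falls as the weighted input rises; a design in which cleavage activates expression would invert the second layer. The Hill-type functions and every parameter value are hypothetical choices for illustration, not taken from the review.

```python
def hill_activation(z: float, k: float = 0.5, n: float = 2.0) -> float:
    """Saturating activation f(z) for a non-negative weighted sum (range 0 to 1)."""
    z = max(z, 0.0)
    return z**n / (k**n + z**n)


def layer1_enzyme(x1: float, x2: float, w1: float = 1.0, w2: float = 1.0,
                  b: float = 0.0) -> float:
    """Hidden layer: relative endoribonuclease level h = f(w1*x1 + w2*x2 + b)."""
    return hill_activation(w1 * x1 + w2 * x2 + b)


def layer2_reporter(e: float, fp_max: float = 1.0, k_cleave: float = 0.3,
                    n: float = 2.0) -> float:
    """Output layer: FP level after repression by enzyme level e (assumed model)."""
    return fp_max / (1.0 + (e / k_cleave) ** n)


def network(x1: float, x2: float) -> float:
    """Feedforward pass: inputs -> enzyme E -> fluorescent protein readout."""
    return layer2_reporter(layer1_enzyme(x1, x2))


if __name__ == "__main__":
    for x in (0.0, 0.2, 0.5, 1.0):
        print(f"x1=x2={x:.1f}  E={layer1_enzyme(x, x):.2f}  FP={network(x, x):.2f}")
```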

[Figure: Intracellular multilayer perceptron. Layer 1 endoribonuclease E regulates layer 2 fluorescent protein mRNA.]

Reference: Halužan Vasle, A., Moškon, M. Synthetic biological neural networks: From current implementations to future perspectives. BioSystems 237, 105164 (2024). https://doi.org/10.1016/j.biosystems.2024.105164


Part 2: Fungal Materials

1. Examples of fungal materials, uses, pros/cons vs traditional materials

| Material / product area | What it is | Typical uses | Advantages vs traditional | Disadvantages / tradeoffs |
|---|---|---|---|---|
| Mycelium leather (e.g., Pleurotus, commercial strains) | Mat of fungal hyphae grown into sheets, often tanned or compressed | Fashion, bags, upholstery, automotive trims | Less reliance on animal agriculture than leather; can be plastic-free and biodegradable in some formulations; vertical farming can be space-efficient vs cattle land use | Consistency and batch variation; scale-up cost; durability/water resistance often needs post-processing; price vs commodity leather/synthetics |
| Packaging / foam replacements (e.g., Ecovative-style mycelium) | Mycelium bound to agricultural feedstocks | Protective packaging, insulation | Renewable feedstocks; home-compostable options; can mold 3D shapes | Sterile culture burden; processing energy; property tuning vs EPS plastics |
| Food (mycoprotein) | Biomass from Fusarium venenatum etc. | Meat alternatives | High protein; established process (Quorn); fungal texture | Allergen labeling; flavor and consumer acceptance; competition with plant proteins |
| Enzyme / acid production (Aspergillus, Trichoderma) | Fermentation products | Industrial enzymes, citric acid | Long industrial track record; secretion of enzymes | Containment; GRAS / regulatory path for food vs materials |

Compared to petroleum plastics: fungi-based materials can reduce fossil use and offer biodegradability. Compared to animal leather: they avoid slaughter but may lag on feel, durability, and supply chain maturity. Compared to cotton/hemp: the land/water profile differs, since mycelium can use indoor systems but needs controlled growth.


2. What to genetically engineer fungi to do, and fungi vs bacteria for synbio

Examples of engineering goals

  • Tune hyphal morphology → denser mats, faster sheet formation, stronger biomaterials.
  • Alter cell-wall biochemistry (chitin/glucan ratios) → stiffness, water resistance, or degradability on demand.
  • Pathway engineering → novel enzymes or natural products secreted into the matrix (pigments, adhesives).
  • Biosensors in mycelium → report contamination or process endpoints during growth.

Why use fungi instead of bacteria (advantages)

| Fungi | Bacteria (e.g., E. coli) |
|---|---|
| Filamentous fungi secrete large amounts of enzymes and metabolites; mycelial growth can fill molds for materials. | Often non-secretory for complex proteins unless engineered; no inherent tissue-like macrostructure. |
| Eukaryotic machinery → glycosylation, complex proteins, some post-translational processing closer to other eukaryotes. | Prokaryotic folding; different PTMs. |
| GRAS yeasts/fungi for food; established large-scale fermentation for acids, enzymes, mycoprotein. | Strong for plasmids and fast cycles; containment and phage issues in industrial settings. |
| Low-cost solid-state / submerged fermentation on lignocellulosic or waste streams in some processes. | Versatile chassis but not a direct substitute for macroscopic material formation. |

Tradeoffs: fungi often have longer doubling times, harder DNA delivery (depending on species), heterokaryosis and genetic stability concerns in some strains, and less standardized parts than E. coli.


Summary

    ╔═══════════════════════════════════════════════════════════════╗
    ║  WEEK 7 CHECKLIST                                             ║
    ║                                                               ║
    ║   [x] Part 1, text: IANNs vs Boolean; application + limits    ║
    ║   [x] Part 1, multilayer perceptron diagram (SVG + paper)     ║
    ║   [x] Part 2, fungal materials table + engineer fungi         ║
    ╚═══════════════════════════════════════════════════════════════╝

| Section | Your action |
|---|---|
| Part 1: Diagram | Done: intracellular-multilayer-perceptron-rnase-fp.svg (see §4). |
| Part 1: Application | Optional: tailor the example application to your own scenario if the course asks for originality. |
| Part 2 | Add course-specific examples or citations if your instructor requests primary literature links. |

Week 9 HW: Cell-Free Systems & Synthetic Cells


Week 9 Homework: Cell-Free Systems, Synthetic Cells & Space Biology

Cell-free protein synthesis, synthetic minimal cells, freeze-dried materials, and a mock Genes in Space proposal, with a consistent theme: radiation mitigation via SOD3 (extracellular superoxide dismutase) and/or CXCR4 (chemokine receptor–mediated homing to stressed or marrow-associated niches).

     ╔═══════════════════════════════════════════════════════════════╗
     ║  WEEK 9: CELL-FREE · SMC · MATERIALS · SPACE                  ║
     ║                                                               ║
     ║     ROS  ──►  SOD3  (scavenge O₂⁻)                            ║
     ║      │                                                        ║
     ║      └──►  CXCR4  (homing / niche targeting, radiation axis)  ║
     ║                                                               ║
     ║   Parts: General CF · Kate SMC · Peter pitch · Ally Space     ║
     ╚═══════════════════════════════════════════════════════════════╝

Part A: General homework questions (cell-free fundamentals)

1. Advantages of cell-free protein synthesis vs traditional in vivo methods (flexibility & control)

Why cell-free wins on flexibility and control

| Advantage | What you control |
|---|---|
| Open reaction | Add or omit cofactors, chaperones, lipids, detergents, redox buffers, and radiomimetic chemicals without worrying about cytotoxicity or transport into live cells. |
| No growth phase | Start “expression” immediately; no coupling to doubling time, medium composition for viability, or overflow metabolism. |
| Template choice | Linear PCR DNA, plasmids, or IVT RNA: fast design–test cycles without cloning into a chassis for every iteration. |
| Sampling | Aliquot the same batch over time; pair with analytics (gel, activity, mass spec) without lysing a culture. |

Two cases where cell-free beats cell production

  1. Rapid prototyping of toxic or burden-heavy proteins (e.g., membrane proteins, aggregation-prone enzymes): cells may sicken or lose the plasmid; CFPS lets you tune the folding environment (DDM, nanodiscs) without killing the host.
  2. On-demand or deployable synthesis (field, clinic, space): freeze-dried lysates rehydrated with water and template suit “use when needed” workflows in which maintaining sterile cultures is impractical.

  IN VIVO (culture)                     OPEN CFPS (tube / paper)
  ╭──────────╮                          ╭───────────────────────╮
  │ membrane │  walls, pumps, growth    │  DNA + extract + NTPs │
  │  + cell  │  coupling                │  (you add what you    │
  │  biology │                          │   need, when)         │
  ╰────┬─────╯                          ╰───────────┬───────────╯
       │                                            │
       ▼                                            ▼
   doubling time                          start/stop on demand
   viability constraints                no “is the cell happy?”

2. Main components of a cell-free expression system and their roles

   [ DNA or mRNA template ]
            │
            ▼
   ┌────────────────────────────────────────┐
   │  Ribosomes · tRNAs · aaRS · factors    │  translation
   │  NTPs · amino acids                    │  building blocks
   │  Energy system (ATP/GTP + regeneration)│  drives Tx/Tl
   │  Buffer · salts · Mg²⁺ · optional      │  chemistry & folding
   │    chaperones · DTT · PEG · crowding   │
   └────────────────────────────────────────┘

| Component | Role |
|---|---|
| Ribosomes | Peptide bond formation; core of translation. |
| tRNAs + aminoacyl-tRNA synthetases | Deliver correct amino acids to the ribosome. |
| Transcription/translation factors | Initiation, elongation, termination (system-specific). |
| NTPs (ATP, GTP, CTP, UTP) | Energy and RNA synthesis; GTP for translation steps. |
| Amino acids | Protein polymer building blocks. |
| Template (DNA or mRNA) | Program for target protein (e.g., SOD3, CXCR4). |
| Buffer + ions (e.g., Mg²⁺, K⁺) | Optimal pH/ionic strength for enzymes and ribosomes. |
| Energy regeneration | Recycles ADP/AMP → ATP so Tx/Tl does not stall (see below). |
| Optional: chaperones, lipids, detergents | Folding helpers; membrane protein expression. |

3. Why energy regeneration is critical; continuous ATP supply

Why it matters: Transcription and translation consume ATP and GTP continuously. Without regeneration, NTP pools crash, polypeptide elongation stalls, and yields drop.

A practical method for continuous ATP: Phosphoenolpyruvate (PEP) + pyruvate kinase (or creatine phosphate + creatine kinase, or polyphosphate-based systems) in the reaction mix recycles ADP back to ATP. Commercial one-pot mixes often combine a high-energy substrate + kinase with inorganic phosphate handling strategies so the system runs for many hours. For your experiment: use a validated regeneration module at manufacturer-recommended ratios, titrate Mg²⁺ (ATP and the other NTPs chelate it), and consider substrate feeding or semi-continuous addition in long reactions.
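
The effect of a regeneration module can be illustrated with a toy mass-action model. This is a qualitative sketch only: the rate constants, starting concentrations, and units (mM and minutes) are arbitrary assumptions, and the model ignores phosphate accumulation, Mg²⁺ chelation, and enzyme kinetics.

```python
def simulate_atp(donor0: float, atp0: float = 1.5, k_use: float = 0.05,
                 k_regen: float = 0.02, minutes: float = 120.0,
                 dt: float = 0.1) -> float:
    """Return the ATP level after `minutes` of simulated Tx/Tl.

    donor0: starting phosphate donor (e.g., PEP) in arbitrary mM.
    ATP is consumed at k_use * ATP; ADP is recharged at
    k_regen * ADP * donor, which uses up one donor per ATP made.
    All values are illustrative assumptions, not measured rates.
    """
    atp, adp, donor = atp0, 0.0, donor0
    for _ in range(round(minutes / dt)):
        used = k_use * atp * dt            # ATP burned by Tx/Tl in this step
        regen = k_regen * adp * donor * dt  # ADP recharged by the kinase
        atp += regen - used
        adp += used - regen
        donor -= regen
    return atp


if __name__ == "__main__":
    print(f"no regeneration:   ATP = {simulate_atp(0.0):.3f}")
    print(f"with donor (30):   ATP = {simulate_atp(30.0):.3f}")
```

In this model the ATP pool collapses without a donor and stays near its starting level while donor remains, which is the behavior the regeneration module is meant to provide.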

        ENERGY LOOP (why the reaction does not die in 5 minutes)
        ════════════════════════════════════════════════════════

              ┌─────────────┐
         ┌───►│ Tx / Tl     │───┐
         │    │ (burns NTPs)│   │
         │    └─────────────┘   │
         │                      ▼
         │               ADP + Pi  (pool would empty)
         │                      │
         │    ┌─────────────────┴───────────────────┐
         │    │  PEP + PK  or  CrP + CK  or  polyP  │
         └─── │       “regeneration module”         │
              └─────────────────┬───────────────────┘
                                ▼
                              ATP  ──► back to ribosome / polymerases

4. Prokaryotic vs eukaryotic cell-free systems + one protein each

| Aspect | Prokaryotic (e.g., E. coli extract) | Eukaryotic (e.g., wheat germ, insect, HEK lysate) |
|---|---|---|
| Strengths | High yield, inexpensive, fast, well-characterized | Better for disulfides, glycosylation, some GPCRs |
| PTMs | Limited | Closer to mammalian N-glycans (still platform-dependent) |
| Promoters / regulation | Strong bacterial promoters | May need eukaryotic elements if you use certain mammalian switches |

Example proteins

| System | Protein | Why this system |
|---|---|---|
| Prokaryotic CFPS | Truncated or tag-fused SOD3 variant for activity assays | Fast iteration of soluble antioxidant enzyme domains; bacterial CF is cheap for screening fusion partners and solubility tags before mammalian polish. |
| Eukaryotic CFPS | Full-length CXCR4 (or a stable nanobody against CXCR4) | GPCR folding and ligand binding benefit from eukaryotic membranes/chaperones; use for radiation-homing logic in a nanodisc or proteoliposome readout. |

Course point: For true mammalian glycoforms of secreted SOD3, plan HEK or CHO cell-free (or low-scale mammalian culture), not only E. coli lysate.

   PROKARYOTIC EXTRACT                EUKARYOTIC EXTRACT
   (E. coli lysate)                   (wheat germ / HEK lysate)
        │                                  │
        │  fast · cheap                    │  PTMs · some GPCRs
        │  good for screens                │  slower / pricier
        ▼                                  ▼
   SOD3 domain fusions                full-length CXCR4
   solubility tags                    + nanodisc / CHS

5. Designing a cell-free experiment for a membrane protein (e.g., CXCR4): challenges & fixes

Goal: Express CXCR4 in a defined lipid environment to study SDF-1α/CXCL12 binding in a radiation-relevant context (e.g., niche homing).

Setup sketch

   MSP + lipid  ──►  nanodiscs
         +
   CFPS reaction  ──►  CXCR4 co-translationally inserted
         │
         ▼
   [ ligand binding / FRET / radioligand displacement ]

        NANODISC + GPCR (idea)
        ══════════════════════

           MSP (scaffold protein)
        ~~~╱‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾╲~~~
       ╱    lipid bilayer    ╲
      │   ◄── CXCR4 7TM ──►   │
       ╲_______╱‾‾‾╲________╱
            │      │
            └── SDF-1α / CXCL12 (ligand) binds here

Challenges and mitigations

| Challenge | Mitigation |
|---|---|
| Aggregation | Lower temperature, titrate Mg²⁺, add chaperones (e.g., DnaK system in bacterial extract where applicable), use a stabilizing fusion partner (e.g., BRIL). |
| Incorrect topology | Supply lipid nanodiscs or mild detergent micelles (detergent above its critical micelle concentration); consider eukaryotic extract for eukaryotic GPCRs. |
| Low functional fraction | Add fluorescent ligand binding or structural readout (e.g., stable-isotope labeling where available); compare total protein (gel) vs specific activity. |

6. Low yield: three causes and troubleshooting

| Possible cause | What to check | Strategy |
|---|---|---|
| Degraded or poor template | Agarose gel of DNA; A260/280 | Fresh PCR, codon optimization, clean-up beads, stronger T7 promoter layout. |
| Energy exhaustion | Time course of luciferase control | Increase regeneration components, shorten reaction, or fed-batch addition. |
| Toxic misfolding / aggregation | Pellet vs supernatant, smear on gel | Lower temperature, fusion tags, chaperones, redox (for disulfides), or switch to eukaryotic extract for SOD3/CXCR4. |

  LOW YIELD?  ──►  check template  ──►  still bad?  ──►  check energy (ATP)
        │                    │                              │
        │                    │                              │
        ▼                    ▼                              ▼
   new DNA / codons     gel + A260/280              add PEP / shorten run
   stronger promoter     PCR cleanup                 luciferase control
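
The first branch of the flow above (template quality) is easy to turn into a quick check from spectrophotometer readings. The sketch relies on two common rules of thumb: an A260 of 1.0 corresponds to roughly 50 ng/µL of double-stranded DNA at a 1 cm path length, and an A260/280 ratio near 1.8 indicates reasonably pure DNA. The 1.7 to 2.0 acceptance window and the function names are assumptions for illustration.

```python
def dsdna_conc_ng_per_ul(a260: float, dilution_factor: float = 1.0,
                         path_cm: float = 1.0) -> float:
    """Estimate dsDNA concentration from A260 (about 50 ng/uL per A260 unit at 1 cm)."""
    return a260 * 50.0 * dilution_factor / path_cm


def purity_flag(a260: float, a280: float, low: float = 1.7,
                high: float = 2.0) -> tuple[float, str]:
    """Return the A260/280 ratio and a coarse interpretation.

    The acceptance window (low, high) is an assumed rule of thumb.
    """
    ratio = a260 / a280
    if ratio < low:
        return ratio, "possible protein or phenol carryover"
    if ratio > high:
        return ratio, "possible RNA carryover"
    return ratio, "ok"


if __name__ == "__main__":
    # Hypothetical readings for a PCR-amplified template.
    print(dsdna_conc_ng_per_ul(0.5), "ng/uL")
    print(purity_flag(0.9, 0.5))
    print(purity_flag(0.6, 0.5))
```

A template that fails this check is worth cleaning up or re-amplifying before any time is spent on the energy or folding branches.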

Part B: Homework question from Kate Adamala (synthetic minimal cell)

Theme: A synthetic minimal compartment that supports radiation-stress mitigation by producing SOD3 and presenting CXCR4 for homing to SDF-1–rich niches (e.g., marrow/stromal signals relevant after damage).

Pick a function

Function: “Radiation-response micro-factory + homing beacon”: sense a proxy of oxidative stress or an external trigger, synthesize SOD3 locally, and display CXCR4 to engage CXCL12 gradients near repair niches.

Input / output

| Input | H₂O₂ (ROS proxy) or gamma/UV pulse to the compartment environment (conceptual stand-in for radiation-induced ROS); optionally theophylline (small molecule) if using a riboswitch for tight Tx control. |
| Output | Secreted/active SOD3 (reduce local O₂⁻); surface-exposed CXCR4 for adhesion/homing assays toward CXCL12. |

Could this work with cell-free Tx/Tl alone, no encapsulation?

Partially, but the full “compartmentalized + spatially localized homing particle” would not. Unencapsulated CFPS would let SOD3 diffuse everywhere, losing spatial confinement and the co-display of receptor + enzyme on one particle. Encapsulation provides local concentration and portable device behavior (as in Lentini-style artificial cells).

Could a genetically modified natural cell do it?

Yes. An engineered HEK or MSC could co-express SOD3 and CXCR4. Tradeoffs: containment, ethics, and GMP complexity vs a minimal synthetic compartment with off-the-shelf payloads and defined composition.

Desired outcome

Outcome: After stress, elevated local antioxidant capacity (SOD3) plus CXCR4-mediated binding to CXCL12-presenting surfaces, giving a testable in vitro model for radiation mitigation and stem-cell niche targeting.

Membrane composition

Synthetic lipids: e.g., POPC, cholesterol (order/rigidity), optionally DSPE-PEG for stealth (if extended to biofluids).

What to encapsulate

  • Mammalian or hybrid cell-free Tx/Tl (for SOD3 secretion competence and CXCR4 folding).
  • DNA: SOD3 transgene; CXCR4 with export/folding helpers if co-expression.
  • Energy mix, crowding agents (e.g., PEG), glutathione for redox.
  • Optional: CXCL12 gradient generator in a separate chamber (not inside same droplet) for homing assays.

Tx/Tl source: bacterial OK or mammalian?

  • Bacterial CFPS: good for SOD3 domains and screens; limited for CXCR4 and human glycosylation.
  • Mammalian (e.g., CHO/HEK lysate) or wheat germ for CXCR4 + full-length SOD3 quality.
  • Tet-ON and similar often need mammalian regulatory proteins; if your circuit uses Tet-ON, use mammalian extract or hybrid TX.

Communication with environment

                 SYNTHETIC MINIMAL CELL (side view)
        ╭──────────────────────────────────────────────────────╮
  out   │  O₂ / H₂O₂ (small)  ──diffuse──►                     │
        │         │              ╭──────────────╮              │
        │         ▼              │  POPC + chol │  lumen       │
        │    ┌─────────┐         │   bilayer    │  ┌────────┐  │
        │    │ CXCL12  │◄───────►│  CXCR4 (7TM) │  │ Tx/Tl  │  │
        │    │ gradient│  binds  │              │  │ + DNA  │  │
        │    └─────────┘         ╰──────────────╯  │ SOD3→  │  │
        │                                          └────────┘  │
        ╰──────────────────────────────────────────────────────╯
              Large proteins need secretion, pore, or lysis
        outside              membrane bilayer              inside
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚ Hโ‚‚Oโ‚‚ / ROS     โ”‚โ”€โ”€โ–บโ”‚ small & neutral: slips  โ”‚   โ”‚ riboswitch /   โ”‚
    โ”‚ (or inducer)   โ”‚   โ”‚ through (no pore)       โ”‚โ”€โ”€โ–บโ”‚ sensor logic   โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚ CXCL12 ligand  โ”‚โ—„โ”€โ–บโ”‚ CXCR4 (receptor face)   โ”‚   โ”‚ SOD3 synthesis โ”‚
    โ”‚ (gradient)     โ”‚   โ”‚ on outer leaflet        โ”‚   โ”‚ + release path โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  • H₂O₂ is membrane-permeable; large proteins are not, so SOD3 must be secreted or released after compartment lysis or fusion.
  • CXCR4 sits in the membrane (proteoliposome or nanodisc-coated vesicle). CXCL12 binds externally.

Experimental details โ€” lipids and genes (bonus: specific examples)

Class | Examples
Lipids | POPC, cholesterol, DOPC (optional mixing for fluidity)
Genes | SOD3 (human SOD3); CXCR4 (CXCR4); optional BRIL fusion for GPCR stability; T7 or CMV promoter depending on extract
Controls | Empty vector, catalase-only, CXCR4 without SOD3

How to measure function

  • SOD3: Cytochrome c reduction assay or fluorescent superoxide probe (compartment vs bulk).
  • CXCR4: Alexa-CXCL12 binding, flow cytometry on giant vesicles, or SPR on reconstituted membranes.
  • Radiation proxy: Clonogenic partner cells with H₂O₂ challenge ± vesicles.

Part C: Homework question from Peter Nguyen, freeze-dried cell-free in materials

Field: Textiles / protective wear (first-responder / aerospace / radiology-adjacent contexts).

One-sentence pitch

Freeze-dried E. coli or mammalian CFPS in a hydrogel–textile laminate produces antioxidant SOD3 on hydration, buffering the acute ROS burst that follows ionizing-radiation exposure.

How it works (3–4+ sentences)

A nonwoven or knit fabric carries alginate–PEG patches spotted with BioBits-style freeze-dried lysate and plasmid DNA encoding SOD3 (or a secretion-competent variant). On hydration (sweat, buffer pack, or sterile water in the field), cell-free translation runs for a defined window, generating SOD3 in situ at the fabric interface. CXCR4 is not the main CFPS product here (hard to fold on fabric); SOD3 addresses ROS instead, and an optional separate liposome patch could carry CXCR4 proteoliposomes for adhesive homing to CXCL12-coated wound dressings in advanced demos. Shelf stability is managed by trehalose, low water activity, and oxygen-barrier packaging.

Societal / market need

Occupational radiation exposure, cancer therapy skin injury, and spaceflight oxidative stress all need rapid, infrastructure-light countermeasures beyond static materials.

Limitations (water activation, stability, one-shot)

Limitation | Mitigation
Needs water | Pair with single-use ampoule or sweat-activated reservoir in garment seam.
Stability | Freeze-dry, desiccant pouch, optional cold-chain variants.
Single use | Market as disposable patch (ethical clarity), or as modular replaceable inserts.

  TEXTILE + FREEZE-DRIED CFPS (concept)
  ═════════════════════════════════════

      woven / knit fabric
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      [≈]  [≈]  [≈]   ← hydrogel dots: lysate + SOD3 DNA
       │    │    │
       └──┬─┴────┘
          ▼
    add H₂O (sweat / ampoule)  ──►  CFPS runs ──►  local SOD3 vs ROS

Part D: Homework question from Ally Huang, mock Genes in Space proposal

Toolkit: BioBits® cell-free protein synthesis, miniPCR®, P51 Molecular Fluorescence Viewer.
Theme: Radiation mitigation, with SOD3 expression as a readout of successful DNA repair template function and the CXCR4 transcript as a stem-cell / niche marker in a radiation model (conceptual).

Background

Ionizing radiation damages DNA and elevates ROS, risking long-term health on long-duration missions. Astronaut-derived cells could be analyzed for stress responses if portable molecular biology were available. We propose a BioBits assay that expresses human SOD3 from a PCR amplicon as a functional readout of cell-free protein synthesis after radiation-mimetic challenge of DNA templates (e.g., damaged plasmid vs repaired control). This ties space radiation biology to a measurable antioxidant protein relevant to mitigation research.

Molecular / genetic target

Target: Human SOD3 cDNA and CXCR4 amplicon (qPCR-style monitoring optional); GFP reporter cassette for P51 fluorescence.

How target relates to the challenge

SOD3 neutralizes superoxide, a major ROS after radiation. CXCR4 expression marks niche-homing pathways relevant to hematopoietic recovery after radiation, making it a secondary transcript target. In orbit, rapidly testing whether DNA remains an expressible template after stress supports countermeasure development: if SOD3-CFPS fails after a UV or bleomycin proxy insult, repair or template quality is implicated.

Hypothesis / goal

Hypothesis: BioBits reactions programmed with SOD3 DNA produce enzymatic activity proportional to template integrity after radiation-mimetic insult; GFP fluorescence on P51 correlates with yield.

Goal: Establish a student-feasible pipeline in which miniPCR amplifies SOD3 from synthetic gBlocks, BioBits expresses SOD3–His, and P51 reads the GFP internal control. The CXCR4 amplicon serves as an RNA-level marker in a parallel educational track (cell lysate not required if not feasible).

Reasoning: This links the hardware you have to a radiation narrative with two molecular handles (SOD3, CXCR4) on one mitigation theme.

Experimental plan

Samples: Undamaged plasmid vs UV-treated SOD3 template; no-DNA negative.
Workflow: miniPCR amplifies the insert; BioBits reaction at 37 °C for 2–4 h; P51 measures GFP if co-expressed.
Controls: GFP-only, stop-codon control.
Data: Relative fluorescence (P51), dot blot for SOD3–His, SOD activity (cytochrome c assay) on ground lab days.
CXCR4: Optional gel of PCR product from cDNA if RNA is available.

  GENES IN SPACE–STYLE PIPELINE (mock experiment)
  ═══════════════════════════════════════════════

   template DNA          amplify               express (BioBits)
  ┌──────────┐      ┌──────────┐            ┌───────────────────┐
  │ SOD3 +   │ ───► │ miniPCR  │ ─────────► │ 37 °C · 2–4 h     │
  │ GFP ctrl │      │ amplicon │            │ cell-free Tx/Tl   │
  └──────────┘      └──────────┘            └─────────┬─────────┘
                                                      │
         ┌────────────────────────────────────────────┴──────────────┐
         ▼                                                           ▼
   ┌─────────────┐                                           ┌─────────────┐
   │ P51 viewer  │  fluorescence (GFP control)               │ assay bench │
   │ (portable)  │                                           │ SOD3 · dot  │
   └─────────────┘                                           │ blot · etc. │
                                                             └─────────────┘

  • Genes in Space: https://www.genesinspace.org/
  • Lentini-style artificial cells (example class paper): Lentini, R. et al., 2014. Nat. Commun. 5, 4012.
  • Theophylline aptamer context (example): Martini, L. & Mansy, S.S., 2011. Chem. Commun. 47, 10734.

Week 10 HW: Advanced Imaging & Mass Spectrometry


Week 10 Homework: Imaging, Measurement & Mass Spec

        /\__/\
       / ·  ·  \
      |    ‿    |
       \  ~~~  /
        `-----´

Homework: Final Project, what you will measure (novel SOD3 design)

Aspects to measure

What | Why it matters
Identity & purity | Confirm you expressed the intended construct, not a truncated product or contaminant.
Mass (intact) | Matches design MW within instrument tolerance (ppm).
Primary structure | Peptide map shows coverage across the sequence; confirms mutations and fusion junctions.
Oligomeric state (if relevant) | Native MS or SEC shows whether SOD3 is monomer, dimer, or fused to a dimerizing domain.
Metal cofactor | SOD enzymes bind Cu/Zn (or Zn/Zn in some forms); ICP-MS or activity correlates with correct metallation.
Activity | Enzymatic superoxide dismutation (e.g., cytochrome c assay) proves function, not just presence.

How you would perform these measurements

  1. Intact protein mass: Purify protein, buffer-exchange into MS-friendly volatile buffer (e.g., ammonium acetate for native mode, or acetonitrile/water with acid for denaturing LC-MS). Run LC-MS on a high-resolution instrument (Q-ToF, Orbitrap). Deconvolute the charge envelope to a neutral mass.
  2. Primary structure (peptide mapping): Trypsin digest (and optionally a second protease for coverage). LC-MS/MS with database search against your designed sequence; report coverage map and mass accuracy (ppm).
  3. Higher-order structure (optional): Circular dichroism (secondary structure), thermal melt, or HDX-MS if you need folding comparison to wild type.
  4. Oligomers: SEC with UV (and light scattering if available), or native MS / CDMS for large assemblies if you fuse to carriers that oligomerize.
  5. Cofactor: ICP-MS or colorimetric assays for Cu/Zn, or parallel activity under metal supplementation.

Technologies (detail)

Technology | Role for SOD3 project
SDS-PAGE / native gel | Quick purity and apparent MW; non-reducing vs reducing if you have disulfides.
UV–Vis | Protein concentration; SOD proteins have aromatic absorbance at 280 nm.
Liquid chromatography (SEC, IEX) | Purification and aggregation screening before MS.
Mass spectrometry (intact LC-MS, bottom-up proteomics) | Molecular weight confirmation and sequence validation (this week's focus).
Activity assay | Functional readout that MS alone cannot give.

Waters Part I: Molecular weight (eGFP)

1. Calculated molecular weight from sequence

Paste the sequence (one letter; includes LE linker + His₆ tag) into ExPASy Compute pI/Mw or ProtParam and record the reported molecular weights.

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH

Values consistent with standard tables (linear chain, unmodified; 247 residues):

Type | Molecular weight
Average (as in ProtParam "Molecular weight") | ≈28,006.6 Da (~28.01 kDa)
Monoisotopic (linear sequence + H₂O for termini) | 27,989.0 Da (~27.99 kDa)

The two values describe the same 247-residue chain and differ by only about 18 Da (roughly 0.06%). ProtParam's "Molecular weight" is the average mass and is the natural "kDa from sequence" figure; for the ppm comparison against the deconvoluted intact mass, this write-up uses the monoisotopic linear mass as the reference.

Note: mature eGFP in vivo has a cyclized chromophore, which lowers the mass by about 20 Da (loss of H₂O plus 2 H); the linear calculator mass is still the usual reference for "sequence-based" MW in homework unless your instructor specifies otherwise.
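
As a cross-check on the calculator, the sequence masses can be recomputed locally. The sketch below is not how ProtParam is implemented; it simply sums standard residue masses (average and monoisotopic) and adds one water for the termini. The table values and the names `RESIDUE_MASS` and `sequence_mass` are assumptions introduced for illustration. For a chain of this size the two results are expected to differ by well under 0.1%.

```python
# Residue masses in Da as (average, monoisotopic); standard textbook values.
RESIDUE_MASS = {
    "A": (71.0788, 71.03711),   "R": (156.1875, 156.10111),
    "N": (114.1038, 114.04293), "D": (115.0886, 115.02694),
    "C": (103.1388, 103.00919), "Q": (128.1307, 128.05858),
    "E": (129.1155, 129.04259), "G": (57.0519, 57.02146),
    "H": (137.1411, 137.05891), "I": (113.1594, 113.08406),
    "L": (113.1594, 113.08406), "K": (128.1741, 128.09496),
    "M": (131.1926, 131.04049), "F": (147.1766, 147.06841),
    "P": (97.1167, 97.05276),   "S": (87.0782, 87.03203),
    "T": (101.1051, 101.04768), "W": (186.2132, 186.07931),
    "Y": (163.1760, 163.06333), "V": (99.1326, 99.06841),
}
WATER = (18.0153, 18.01056)  # one H2O for the free N- and C-termini

# eGFP + LE linker + His6, copied from the Part I sequence above.
SEQ = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICT"
    "TGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIF"
    "FKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHN"
    "VYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNH"
    "YLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)


def sequence_mass(seq):
    """Return (average, monoisotopic) mass of an unmodified linear chain."""
    avg = sum(RESIDUE_MASS[aa][0] for aa in seq) + WATER[0]
    mono = sum(RESIDUE_MASS[aa][1] for aa in seq) + WATER[1]
    return avg, mono


avg_mass, mono_mass = sequence_mass(SEQ)
print(f"{len(SEQ)} residues: average {avg_mass:.1f} Da, monoisotopic {mono_mass:.1f} Da")
```

Chromophore maturation and any other modifications are deliberately ignored here, matching the "linear chain, unmodified" convention used in this section.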

2. Adjacent charge state approach (Figure 1)

Use two adjacent peaks in the charge-state envelope from the intact LC-MS spectrum (Figure 1). Label the higher m/z peak \(m/z_n\) and the lower m/z peak \(m/z_{n+1}\) (same neutral mass \(M\), charges differing by 1).

Charge from an adjacent pair (recitation / course handout):

\[ z = \frac{m/z_{n+1}}{m/z_n - m/z_{n+1}} \]

\(m/z_{n+1}\) is the peak at lower m/z (higher charge, \(z+1\)); \(m/z_n\) is at higher m/z (lower charge, \(z\)). The formula therefore returns the charge of the higher-m/z peak.

Neutral mass from a peak (protonated ion, proton mass \(m_p \approx 1.00728\) Da):

\[ M = z\,(m/z - m_p) \]

For a consistent pair, the same \(M\) should be obtained whether you use the \(z\) ion at \(m/z_n\) or the \((z{+}1)\) ion at \(m/z_{n+1}\) (after rounding \(z\) to the nearest integer).

Example pair A (labels from Figure 1): \(m/z_n = 903.7148\), \(m/z_{n+1} = 875.4421\).

\[ z = \frac{875.4421}{903.7148 - 875.4421} = \frac{875.4421}{28.2727} \approx 30.96 \]

Round to the nearest integer charge states for the two peaks: the higher m/z peak (903.7148 Th) carries 31 protons; the lower m/z peak (875.4421 Th) carries 32 (adjacent charge states for the same neutral mass).

Then:

  • \(M = 31 \times (903.7148 - 1.00728) \approx 27{,}983.9\) Da
  • \(M = 32 \times (875.4421 - 1.00728) \approx 27{,}981.9\) Da

The two estimates agree to within about 2 Da.

Example pair B: \(m/z_n = 1000.5021\), \(m/z_{n+1} = 966.0390\).

\[ z = \frac{966.0390}{1000.5021 - 966.0390} \approx 28.03 \rightarrow z = 28 \text{ and } 29 \]

  • \(M = 28 \times (1000.5021 - 1.00728) \approx 27{,}985.9\) Da
  • \(M = 29 \times (966.0390 - 1.00728) \approx 27{,}985.9\) Da

Deconvoluted mass to report (range across consistent pairs): about 27,982–27,986 Da (~27.98 kDa), within a few hundred ppm of the monoisotopic linear sequence mass from §1.
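
The adjacent-charge-state arithmetic above can be scripted so that every pair is treated the same way. This is a minimal sketch of the handout formula as written in this section, not instrument software; the function name `deconvolute_pair` is an assumption introduced for illustration.

```python
PROTON = 1.00728  # proton mass in Da, as used in this section


def deconvolute_pair(mz_high, mz_low):
    """Return (z, M_from_high, M_from_low) for two adjacent charge-state peaks.

    mz_high is the higher-m/z peak (charge z); mz_low is the lower-m/z
    peak (charge z + 1).
    """
    z = round(mz_low / (mz_high - mz_low))   # handout formula, rounded to an integer
    m_from_high = z * (mz_high - PROTON)      # neutral mass from the z ion
    m_from_low = (z + 1) * (mz_low - PROTON)  # neutral mass from the z + 1 ion
    return z, m_from_high, m_from_low


pairs = {"A": (903.7148, 875.4421), "B": (1000.5021, 966.0390)}
results = {label: deconvolute_pair(hi, lo) for label, (hi, lo) in pairs.items()}

for label, (z, m_hi, m_lo) in results.items():
    print(f"pair {label}: z = {z}/{z + 1}, M = {m_hi:.1f} Da and {m_lo:.1f} Da")
```

If the two masses from one pair disagree by much more than the peak-picking uncertainty, the peaks are probably not adjacent charge states of the same species.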

Accuracy (fractional error from your handout):

\[ \text{Accuracy} = \frac{\lvert MW_{\text{experiment}} - MW_{\text{theory}}\rvert}{MW_{\text{theory}}} \]

Using \(MW_{\text{experiment}} \approx 27{,}982\) Da and \(MW_{\text{theory}} = 27{,}989\) Da (monoisotopic linear from §1):

\[ \text{Accuracy} \approx \frac{\lvert 27{,}982 - 27{,}989\rvert}{27{,}989} \approx 2.5 \times 10^{-4} \quad (\text{about } 0.025\%) \]

ppm (parts per million):

\[ \text{ppm} = \frac{\lvert MW_{\text{exp}} - MW_{\text{theory}}\rvert}{MW_{\text{theory}}} \times 10^{6} \approx \frac{7}{27{,}989} \times 10^{6} \approx 250\ \text{ppm} \]

Item | Value used here
Theoretical MW (§1), monoisotopic linear | 27,989 Da
Adjacent pair (example) | 903.7148 & 875.4421 Th
\(z\) from formula | ~31 (on the higher m/z peak of the pair; the lower m/z peak carries 32)
Deconvoluted \(M_{\text{obs}}\) | ~27,982 Da
ppm vs monoisotopic theory | ~250 ppm

3. Zoomed-in peak in Figure 1 (~1473 m/z)

Observation: The inset shows a weak, jagged cluster around ~1473.7 Th, not a clean isotopic ladder.

Can you assign the charge state? Not reliably from this inset alone. The spacing between labeled maxima is only ~0.04–0.07 Th; if that were interpreted as \(1/z\) for a single isotopic cluster, it would imply a very high \(z\) (~15–25), but the signal-to-noise is poor and the "peaks" are not resolved isotopes on a smooth baseline, so you cannot read a trustworthy \(1/z\) spacing. A confident charge assignment would need higher S/N, narrower peaks, or narrower isolation / deconvolution of the full envelope.
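
One way to sanity-check the inset without trusting the noisy spacing is to ask which charge the ~1473.7 Th ion would need in order to belong to the same protein envelope. The sketch below assumes the neutral mass of about 27,982 Da from the adjacent-pair calculation in §2; that assumption, and the helper name `charge_from_mass`, are mine and not part of the handout.

```python
PROTON = 1.00728  # proton mass in Da


def charge_from_mass(neutral_mass, mz):
    """Charge a protonated ion of known neutral mass would need to appear at mz."""
    return neutral_mass / (mz - PROTON)


z_estimate = charge_from_mass(27982.0, 1473.7)  # assumed mass from section 2
z_nearest = round(z_estimate)
expected_spacing = 1.0 / z_nearest              # isotope spacing in Th for that charge

print(f"z estimate = {z_estimate:.2f}, nearest integer = {z_nearest}")
print(f"expected isotope spacing = {expected_spacing:.4f} Th")
```

An estimate that lands close to an integer, with an expected spacing inside the observed 0.04–0.07 Th range, supports assigning the cluster to the same envelope; it does not replace a resolved isotope pattern.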


Waters Part II: Native vs denatured

This part is marked optional in the course, so I'm not submitting answers here (by choice, not by accident). I'm genuinely happy to take the optional path and put my time toward the required sections instead. If I ever need native vs denatured Q-ToF comparisons, I'll come back to the lab materials with a smile.


Waters Part III: Peptide mapping (primary structure)

1. Lysines and arginines in eGFP

Counting K and R from the Part I sequence gives the same result as Benchling → Biochemical properties (or Expasy ProtParam amino-acid composition).

Answer: 20 lysines (K) and 6 arginines (R).

K at 1-based positions: 4, 27, 42, 46, 53, 80, 86, 102, 108, 114, 127, 132, 141, 157, 159, 163, 167, 210, 215, 239.

R at positions: 74, 97, 110, 123, 169, 216.

Highlighted on the full sequence (yellow = K, pink = R; matches Part I order):

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH

2. How many tryptic peptides?

Trypsin cleaves after K and R unless the next residue is P. The Part I sequence has 26 such cleavage sites, which produces 27 peptide fragments (including very short ones such as R, TR, QK, IR; PeptideMass still counts each as a peptide).

PeptideMass answer: after "Perform the cleavage" with Trypsin and the same options as Figure 4 in the lab handout, the tool should report 27 peptides. If your number differs, check that the enzyme is trypsin only, that no missed-cleavage settings conflict with Figure 4, and that the pasted sequence matches Part I exactly (247 residues).
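
The cleavage rule stated above (cut after K or R unless the next residue is P, no missed cleavages) is simple enough to reproduce in a few lines. This sketch is an independent check, not the PeptideMass implementation; `tryptic_digest` is a name introduced for illustration.

```python
# eGFP + LE linker + His6, copied from the Part I sequence.
SEQ = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICT"
    "TGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIF"
    "FKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHN"
    "VYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNH"
    "YLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)


def tryptic_digest(seq):
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # C-terminal peptide after the last cleavage site
        peptides.append(seq[start:])
    return peptides


peptides = tryptic_digest(SEQ)
print(SEQ.count("K"), "K,", SEQ.count("R"), "R,", len(peptides), "peptides")
print(sorted(peptides, key=len)[:4])  # the shortest fragments
```

The web tool may apply additional filters (for example a minimum mass), so its displayed list can be shorter than this raw count if those options differ from Figure 4.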

3. PeptideMass

Use PeptideMass, paste the assignment sequence, set enzyme Trypsin, and replicate all options from Figure 4. Report the number of peptides the tool prints after "Perform the cleavage." I haven't completed this yet but I will.

4. Peaks in Figure 5a (0.5–6 min, >10% relative abundance)

Figure 5a below is the total ion chromatogram (TIC) for the eGFP tryptic peptide map (04142026_GFP digest_gud, TOF MSe, 50–2000 m/z, ESI+). The tallest peak is at 4.87 min (\(1.15 \times 10^7\) counts). Taking >10% relative abundance as ≥10% of that base peak (\(1.15 \times 10^6\) counts), the small labeled peaks at ~1.20 and ~5.43 min look below that threshold; all other labeled peaks in the window appear above it.

Figure 5a. Total ion chromatogram (TIC) of the eGFP peptide map. Peak at 2.78 min is circled for Figure 5b.

Between 0.5 and 6.0 min, the figure shows 21 retention-time labels on distinct apexes (0.61, 0.79, 1.20, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06, 5.43). Excluding the two that likely fall under 10% of the base peak gives 19 peaks counted under the assignment rule.

5. More peaks or fewer vs prediction?

Predicted tryptic peptides (§2): 27 fragments from a full in-silico trypsin digest of the eGFP sequence.

Peaks in Figure 5a (§4, 0.5–6 min): 21 labeled apexes, or 19 if you only count peaks ≥10% of the base peak (4.87 min).

Does the peak count match 27? No: there are fewer chromatographic peaks than predicted peptides in this run and time window.

Why fewer is normal:

  • Co-elution: Two or more peptides can leave the column at the same time and appear as one TIC apex, so the number of TIC peaks can be smaller than the number of peptide species.
  • Time window: The assignment only counts 0.5–6 min; any tryptic peptide eluting before 0.5 or after 6 min would not be counted here even though it is in the digest.
  • Sensitivity: Very small or poorly ionizing peptides may fall below the display threshold (or the >10% rule), so they do not appear as distinct peaks.
  • In general: A TIC can also show more apparent peaks than 27 if adducts, partial cleavage products, or oxidized variants appear as separate features. This TIC shows fewer than 27 in the stated window, which is consistent with co-elution and window/sensitivity effects, not a contradiction with the protein being eGFP.

6. Figure 5b: m/z, charge, singly charged mass

Precursor spectrum for the peak eluting at 2.78 min (combined with Figure 5c in the screenshot below).

Figure 5b–5c. Top: MS of the 2.78 min feature (inset: isotopes near m/z 525.76). Bottom: MS/MS fragmentation.

Most abundant precursor (monoisotopic peak): m/z 525.76712, the +2 ion. Minor ions near 350.84 and 1050.52 are consistent with the +3 and +1 charge states of the same peptide.

Isotope spacing (inset): e.g. 525.76712 → 526.25918 Th, so \(\Delta \approx 0.492\) Th. For a single isotopic cluster, \(\Delta(m/z) \approx 1/z\), so \(z \approx 1/0.492 \approx 2.03\), giving charge \(z = 2\) ([M+2H]²⁺).

Neutral peptide mass from the measured \(m/z\) and \(z=2\) (proton mass \(m_p = 1.00728\) Da):

\[ M_{\text{obs}} = z\,(m/z - m_p) = 2 \times (525.76712 - 1.00728) \approx \mathbf{1049.52\ Da} \]

Singly protonated mass \([M{+}H]^+\) (one proton on the neutral peptide):

\[ [M{+}H]^+ = M_{\text{obs}} + m_p \approx 1049.52 + 1.00728 \approx \mathbf{1050.53\ Da} \]

(This agrees with the ~1050.52 Th feature in the full scan as the +1 ion of the same peptide.)

Quantity | Value
m/z (main ion, monoisotopic) | 525.76712
\(\Delta\) between isotopes (inset) | ~0.49 Th
Inferred \(z\) | 2
Neutral peptide mass \(M\) | ~1049.52 Da
\([M{+}H]^+\) | ~1050.53 Da

7. Identify peptide and ppm error

Peptide identity: match \(M_{\text{obs}}\) or \([M{+}H]^+\) to PeptideMass tryptic masses for the Part I sequence. The closest tryptic peptide is FEGDTLVNR (cleavage after K at …K|FEGDTLVNR… in eGFP).

Theoretical monoisotopic masses: \(M_{\text{theory}} \approx 1049.514\) Da, \([M{+}H]^+_{\text{theory}} \approx 1050.521\) Da.

Accuracy (fractional error):

\[ \text{Accuracy} = \frac{\lvert MW_{\text{experiment}} - MW_{\text{theory}}\rvert}{MW_{\text{theory}}} \]

Using neutral masses: \(\lvert 1049.5197 - 1049.5142\rvert / 1049.5142 \approx \mathbf{5.3 \times 10^{-6}}\) (~0.00053%).

ppm error:

\[ \text{ppm} = \frac{\lvert M_{\text{obs}} - M_{\text{theory}}\rvert}{M_{\text{theory}}} \times 10^{6} \approx \mathbf{5.3\ ppm} \]

(Using \([M{+}H]^+\) instead gives the same order of magnitude.)
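
The chain of steps in §6 and §7 (isotope spacing → charge → neutral mass → ppm against the candidate peptide) can be collected into one short script. This is a sketch under stated assumptions: the monoisotopic residue masses are standard table values typed in by hand, only the residues needed for FEGDTLVNR are included, and the names `MONO`, `peptide_mono_mass`, and `ppm_error` are introduced here for illustration.

```python
PROTON = 1.00728      # proton mass in Da
WATER_MONO = 18.01056  # monoisotopic H2O for the peptide termini

# Monoisotopic residue masses (Da) for the residues in FEGDTLVNR only.
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}


def peptide_mono_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER_MONO


def ppm_error(observed, theory):
    return abs(observed - theory) / theory * 1e6


mz_mono, mz_next = 525.76712, 526.25918  # first two isotope peaks from the inset
z = round(1.0 / (mz_next - mz_mono))      # spacing of ~1/z between isotopes
m_obs = z * (mz_mono - PROTON)            # neutral mass from the monoisotopic peak
m_theory = peptide_mono_mass("FEGDTLVNR")
ppm = ppm_error(m_obs, m_theory)

print(f"z = {z}, M_obs = {m_obs:.4f} Da, M_theory = {m_theory:.4f} Da, error = {ppm:.1f} ppm")
```

The same `ppm_error` helper applies to the intact-protein comparison in Part I; only the reference mass changes.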

8. Percent sequence confirmed (Figure 6)

BioAccord reports amino acid coverage from peptide identifications. Figure 6 below ("Amino Acid Coverage Map of eGFP based on BioAccord LC-MS peptide identification data") shows Identified: 88% and Chain 1 (88% coverage).

Figure 6. Amino acid coverage map of eGFP (BioAccord LC-MS peptide IDs).

Answer: 88% of the protein sequence is covered by confident peptide matches (highlighted segments in the map). A few short stretches remain unidentified (white / non-highlighted gaps in the map), e.g. segments around LPVPWPTL, parts of VTTLT / YGVQC, TRA, IDF, and a single Q, so not every residue received a confident tryptic ID in this run.

Percent coverage = (residues covered by identified peptides) / (total residues) × 100% = 88% (from the BioAccord summary bar).


Part IV: KLH oligomers (CDMS)

Subunit masses from Table 1: 7FU ≈ 340 kDa, 8FU ≈ 400 kDa per polypeptide chain (1,000 kDa = 1 MDa).

Expected oligomer masses (integer subunit counts × subunit mass):

Oligomer (assignment) | Calculation | Expected mass
7FU decamer (10 × 7FU) | 10 × 340 kDa | 3.4 MDa
8FU didecamer (20 × 8FU) | 20 × 400 kDa | 8.0 MDa
8FU 3-decamer (30 × 8FU) | 30 × 400 kDa | 12.0 MDa
8FU 4-decamer (40 × 8FU) | 40 × 400 kDa | 16.0 MDa

Figure 7. CDMS mass spectrum of KLH (intensity vs mass, MDa).

Where each species lines up on Figure 7 (labeled maxima from the spectrum):

Species | Expected | Observed label on Figure 7 (approx.)
7FU decamer | 3.4 MDa | ~3.4 MDa (clear peak just before ~4.013 MDa)
8FU didecamer | 8.0 MDa | ~8.33 MDa (strongest peak in the spectrum); ~7.52 MDa is a nearby shoulder / related species
8FU 3-decamer | 12.0 MDa | ~12.67 MDa
8FU 4-decamer | 16.0 MDa | No strong label exactly at 16 MDa; weak intensity is visible beyond ~12.67 MDa toward ~17 MDa (and minor bumps ~21 and ~25 MDa), consistent with a broad / low-abundance ~16 MDa assembly plus adducts or heterogeneity

Other features at ~0.20, ~0.79, ~1.52, and ~4.01 MDa are likely smaller assemblies, fragments, or alternative stoichiometries, not the four named decamer-series maxima in the table.

Takeaway: The dominant KLH signals align with 3.4 MDa (7FU decamer), ~8.3 MDa (8FU didecamer, base peak), and ~12.7 MDa (8FU 3-decamer). The 4-decamer is expected near 16 MDa but appears much weaker than the lower oligomers in this run.
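
The expected-versus-observed comparison can be tabulated programmatically. In this sketch the subunit masses and observed peak labels are the approximate values quoted in this section; the dictionary names are introduced for illustration, and the 4-decamer is left without an observed value because no clear label was read off Figure 7.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}  # per-chain masses from Table 1 (approx.)

# species name -> (subunit type, number of chains)
OLIGOMERS = {
    "7FU decamer": ("7FU", 10),
    "8FU didecamer": ("8FU", 20),
    "8FU 3-decamer": ("8FU", 30),
    "8FU 4-decamer": ("8FU", 40),
}

# Approximate peak labels read from Figure 7 (MDa); None = no clear label.
OBSERVED_MDA = {
    "7FU decamer": 3.4,
    "8FU didecamer": 8.33,
    "8FU 3-decamer": 12.67,
    "8FU 4-decamer": None,
}

expected_mda = {
    name: count * SUBUNIT_KDA[subunit] / 1000.0
    for name, (subunit, count) in OLIGOMERS.items()
}

deviation_pct = {}
for name, expected in expected_mda.items():
    observed = OBSERVED_MDA[name]
    if observed is None:
        print(f"{name}: expected {expected:.1f} MDa, no clear observed peak")
        continue
    deviation_pct[name] = (observed - expected) / expected * 100.0
    print(f"{name}: expected {expected:.1f} MDa, observed ~{observed} MDa "
          f"({deviation_pct[name]:+.1f}%)")
```

Observed masses a few percent above the round-number expectation are plausible given that the subunit masses here are rounded and that large assemblies can retain adducts; the script only quantifies the offset and does not explain it.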


Part V: Did I make GFP?

Values below use the same eGFP construct as Part I and the intact LC-MS deconvoluted mass from Figure 1 in Part I (course / handout spectrum), since that matches a monoisotopic-style deconvolution.

Quantity | Theoretical | Observed / measured (intact LC-MS) | ppm mass error
Molecular weight (kDa) | ~28.01 (average MW, ProtParam / ExPASy, linear sequence + His tag); ~27.989 (monoisotopic linear) | ~27.98 (deconvoluted neutral mass from Part I, adjacent charge states on Figure 1) | ~250 ppm vs monoisotopic linear theoretical ~27.989 kDa (see note)

Note: The calculator's average molecular weight (~28.007 kDa) and the monoisotopic linear mass (~27.989 kDa, Part I) describe the same chain and differ by about 18 Da. The ppm value here is computed against the monoisotopic linear mass, as in Part I; comparing 27.982 kDa against the average mass instead would give roughly 880 ppm, so the reference scale should always be stated alongside the error.

ppm (vs monoisotopic linear \(M_{\text{theory}} \approx 27.989\) kDa, \(M_{\text{obs}} \approx 27.982\) kDa): \(\lvert 27.982 - 27.989\rvert / 27.989 \times 10^{6} \approx\) 250 ppm.


Quick reference

Week 11 HW: Cloud Laboratories & Cell-Free Master Mix


Week 11 Homework: Cloud Laboratories & Cell-Free Design

Cell-free composition, long-run energy strategy, and fluorescent-protein–aware master mix planning for the collaborative Nebula experiment.


1. Community bioart (done)

Completed on my website per the assignment (contribution described, what I liked about the collaboration, one idea for next year).


2. Cell-free protein synthesis: reagent roles

2.1 One–two sentences per component

Component | Role in the cell-free reaction
E. coli lysate | Provides ribosomes, tRNAs, aminoacyl-tRNA synthetases, translation factors, and endogenous enzymes that support coupled transcription–translation from your template.
BL21 (DE3) Star lysate (+ T7 RNA polymerase) | Supplies T7 RNA polymerase so DNA under a T7 promoter is transcribed in the same compartment as translation; BL21-derived extracts are common backgrounds for soluble protein expression.
Potassium glutamate | Primary kosmotropic salt / potassium source that tunes ionic strength and macromolecular stability so ribosomes and proteins behave closer to physiological E. coli cytoplasm.
HEPES–KOH, pH 7.5 | Maintains stable pH for polymerases, ribosomal activity, and chromophore maturation of many FPs across the incubation.
Magnesium glutamate | Supplies magnesium required for NTP coordination, ribosome function, and nucleic acid stability; magnesium must be balanced with NTPs and phosphate to avoid precipitation or elongation slowdown.
Potassium phosphate (mono- and dibasic) | Adds buffering capacity and phosphate for phosphoryl-transfer–based energy chemistry that helps recycle nucleotide pools.
Ribose | Pentose substrate feeding pentose-phosphate / salvage-related routes that help regenerate sugar phosphates tied to nucleotide recycling in long reactions.
Glucose | Central carbon and energy substrate for residual glycolytic flux in extract, yielding ATP and cofactors over many hours.
AMP, CMP, GMP, UMP | Nucleoside monophosphates that enter salvage and kinase networks to rebuild triphosphate pools consumed by transcription and translation.
Guanine (free base) | Purine base for salvage phosphoribosylation to GMP (and onward to GDP/GTP); supplements the purine branch when formulation omits excess pre-formed GMP.
17-amino-acid mix | Supplies most canonical amino acids when cysteine and tyrosine are titrated separately.
Tyrosine & cysteine (separate) | Solubility- and oxidation-sensitive residues; separate addition improves accurate stoichiometry and reduces side chemistry in long incubations.
Nicotinamide | Precursor for NAD(P)+; supports enzymes in redox-balanced paths that persist over extended reactions.
Nuclease-free water | Diluent for final formulation; minimizes nuclease-mediated template degradation during preparation.

2.2 PEP–NTP (1 h optimized) vs NMP–ribose–glucose (20 h) master mixes

The one-hour PEP–NTP formulation uses phosphoenolpyruvate with extract kinases for high-flux ATP regeneration together with supplied NTPs, which favors strong early yield in a short window. The twenty-hour NMP–ribose–glucose formulation feeds nucleoside monophosphates plus ribose and glucose so salvage and glycolytic pathways sustain gradual triphosphate rebuilding, better matched to long incubations where burst regeneration would exhaust pools. In short: the first favors early power; the second favors sustained endurance.

2.3 Bonus: transcription when GMP is omitted but guanine is present

Purine salvage enzymes in crude lysate convert guanine to GMP via PRPP-dependent phosphoribosylation even when free GMP is omitted from the mix; further phosphorylation restores GDP and GTP for transcription. Guanine is not a nucleotide by itself (it feeds the salvage pathway upstream of GMP), so RNA polymerase still sees normal triphosphate pools once recycling runs.


3. Planning the global experiment: proteins and hypothesis

3.1 At least one property per fluorescent protein

Protein | Property affecting CF expression or readout
sfGFP | Superfolder folding is fast; signal rises quickly and is less maturation-limited than most reds; mild pH sensitivity remains.
mRFP1 | Oxygen-dependent chromophore maturation, slower than in GFP-class FPs, delays peak fluorescence relative to translation.
mKO2 | Orange emitter with acid sensitivity; slow pH drift in long reactions can change apparent brightness.
mTurquoise2 | Cyan FP with favorable quantum yield; folding and early dark states still gate signal early in CF.
mScarlet-I | Fast-maturing red engineered to shorten dark intermediates versus older reds, improving time-to-readout over long incubations.
Electra2 | Engineered teal/green line for multiplexing; maturation kinetics, magnesium / ionic strength, and folding yield set plate reader signal per molecule in bulk lysate.

3.2 Hypothesis for 36 h fluorescence

Example (swap when your FP and supplement limits are assigned): Protein mRFP1. Adjustment: tune magnesium glutamate and HEPES buffer together within instructor-allowed concentration ranges. Expected effect: more stable elongation and oxidative red chromophore maturation over tens of hours, increasing integrated fluorescence at 36 h compared with a formulation optimized only for a one-hour PEP-driven burst.


Skipped here (optional / not homework)

  • Section 4 Build-a-cloud-lab simulation: optional bonus.
  • Section 6 generic Nebula JSON: operational resource for final projects, not a homework item itself.

Lab phases

Get more information on this.


Add the same recitation, Google Slide, and Benchling links your instructor posted on the main assignment page.