Week 10 HW: Advanced Imaging and Measurements

Final Project

Measurement Plan

The project’s central question — do AI-guided designs outperform standard, random, and unguided foundation-model designs in cell-free expression? — requires measurements at three levels: the DNA (to confirm we test what we designed), the protein output (the primary readout feeding the surrogate), and the surrogate model itself (to know whether the loop is learning).

1. DNA Identity and Quality (Build verification)

Before any cell-free reaction, every linear construct from Twist is verified:

  • Sequence identity — Sanger sequencing of the variable region using a T7-anchored primer (~$5/reaction, Eurofins/Azenta), aligned to the design CSV. Run on all Grade-A candidates (e.g. construct #40) and on any anomalous downstream result.
  • DNA concentration and purity — NanoDrop (A260/A280, A260/A230), confirmed by Qubit dsDNA HS for top candidates. Target ≥10 ng/µL, A260/A280 ≈ 1.8–2.0.
  • DNA integrity — 1% agarose gel electrophoresis (GelRed, 100 V, 30 min) on a per-batch sample, or Agilent TapeStation for higher throughput. Single tight band at ~770 bp confirms full-length, non-degraded DNA.
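The NanoDrop targets above can be written as a pass/fail filter for each batch. This is a minimal sketch: the function name and readings are hypothetical. A260/A230 is deliberately not checked because the plan gives no numeric target for it.

```python
def passes_dna_qc(conc_ng_per_ul, a260_280, min_conc=10.0, ratio_range=(1.8, 2.0)):
    """Return True if a construct meets the stated NanoDrop targets.

    Defaults follow the plan: >= 10 ng/uL and A260/A280 between 1.8 and 2.0.
    A260/A230 is not checked because no numeric target is given for it.
    """
    low, high = ratio_range
    return conc_ng_per_ul >= min_conc and low <= a260_280 <= high

# Hypothetical NanoDrop readings (ng/uL, A260/A280), for illustration only
readings = {
    "construct_A": (24.3, 1.86),
    "construct_B": (7.9, 1.90),   # too dilute
    "construct_C": (31.0, 1.62),  # ratio suggests protein carry-over
}
flagged = [name for name, (conc, ratio) in readings.items() if not passes_dna_qc(conc, ratio)]
print(flagged)
```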

2. Protein Expression Output (Test step — primary readout)

The headline measurement that drives the entire loop. Each construct is run in triplicate in 10 µL TX-TL reactions in a black, clear-bottom 96-well plate, sealed, and incubated at 30 °C in a kinetic-fluorescence plate reader (BioTek Synergy H1 or Tecan Spark) with 485/528 nm ex/em for sfGFP. Reads every 3 min for 8 h yield ~160 timepoints per well, from which four features are extracted:

  • Fluorescence AUC — the single number used to train the surrogate.
  • Maximum fluorescence (F_max) — proxy for total protein yield.
  • Time to signal (t_signal) — proxy for expression rate.
  • Replicate CV — the Aim-2 reproducibility metric.
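A minimal sketch of extracting the four features from one well's trace. It makes three assumptions that the plan does not specify: the trace is already blank-subtracted, AUC uses the trapezoidal rule, and t_signal is the first read at or above a chosen threshold. The threshold, the synthetic curve, and the triplicate values are hypothetical.

```python
import numpy as np

def kinetic_features(t_min, rfu, threshold):
    """Return (AUC, F_max, t_signal) for one blank-subtracted kinetic trace.

    t_signal is assumed to be the first read at or above `threshold`;
    it is None if the trace never reaches the threshold.
    """
    t = np.asarray(t_min, dtype=float)
    f = np.asarray(rfu, dtype=float)
    # Trapezoidal area under the curve, in RFU x min
    auc = float(np.sum((f[1:] + f[:-1]) / 2.0 * np.diff(t)))
    f_max = float(f.max())
    above = np.nonzero(f >= threshold)[0]
    t_signal = float(t[above[0]]) if above.size else None
    return auc, f_max, t_signal

def replicate_cv(values):
    """Coefficient of variation (sample std / mean) across replicate wells."""
    v = np.asarray(values, dtype=float)
    return float(v.std(ddof=1) / v.mean())

# Reads every 3 min for 8 h: 0 to 480 min inclusive gives 161 timepoints
t = np.arange(0, 481, 3)
trace = 1000.0 * (1 - np.exp(-t / 120.0))  # synthetic saturating curve, not real data
auc, f_max, t_sig = kinetic_features(t, trace, threshold=100.0)
cv = replicate_cv([0.95 * auc, auc, 1.05 * auc])  # hypothetical triplicate AUCs
print(len(t), round(f_max, 1), t_sig, round(cv, 3))
```

Here CV is computed on AUC because that is the value the surrogate uses. The same function applies to F_max or t_signal if Aim 2 defines reproducibility on a different feature.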

A purified recombinant sfGFP standard curve on every plate converts RFU to nM, making the readout comparable across plates and rounds. For top hits, total protein is independently confirmed by BCA assay (Thermo Pierce, ~$100/kit) and SDS-PAGE with Coomassie (clean band at ~27 kDa). In later rounds, when sfGFP is replaced by C1-metabolism enzymes, intact-mass LC-MS (Waters BioAccord) catches truncation products that would otherwise masquerade as low expression.
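The RFU-to-nM conversion can be sketched as a per-plate least-squares line through the sfGFP standards, inverted for the samples. This assumes that every standard and sample lies in the reader's linear range. The concentrations and RFU values below are hypothetical.

```python
import numpy as np

def fit_standard_curve(conc_nM, rfu):
    """Least-squares line RFU = slope * [sfGFP] + intercept for one plate."""
    slope, intercept = np.polyfit(np.asarray(conc_nM, dtype=float),
                                  np.asarray(rfu, dtype=float), 1)
    return float(slope), float(intercept)

def rfu_to_nM(rfu, slope, intercept):
    """Invert the standard curve; only meaningful inside the calibrated range."""
    return (np.asarray(rfu, dtype=float) - intercept) / slope

# Hypothetical standard series: 100 RFU per nM on a 120 RFU background
std_conc = [0, 50, 100, 200, 400]
std_rfu = [120, 5120, 10120, 20120, 40120]
slope, intercept = fit_standard_curve(std_conc, std_rfu)
sample_nM = rfu_to_nM([15120, 30120], slope, intercept)
print(np.round(sample_nM, 3))
```

Fitting each plate separately keeps gain or lamp drift between plates out of the cross-round comparison.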

3. Construct-Level Sequence Features (surrogate inputs)

Computed in silico on every designed construct and stored in the library CSV:

  • GC content of the variable region.
  • Predicted 5’ UTR secondary-structure ΔG (ViennaRNA / RNAfold).
  • RBS strength score (Salis Lab RBS Calculator).
  • Sequence complexity, Shannon entropy, k-mer frequencies.
  • Forbidden-site count (XbaI, EcoRI, BsaI, etc.).
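The sequence-only features can be sketched in plain Python. The RNAfold ΔG and RBS Calculator scores need those external tools and are not reproduced here. The site list contains only the three enzymes named above. Because BsaI is non-palindromic, its reverse complement is also searched so that sites on either strand are counted. All function names are illustrative.

```python
import math
from collections import Counter

# Recognition sequences (5'->3'); XbaI and EcoRI are palindromic, BsaI is not
FORBIDDEN_SITES = {"XbaI": "TCTAGA", "EcoRI": "GAATTC", "BsaI": "GGTCTC"}

def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gc_content(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def shannon_entropy(seq):
    """Per-base Shannon entropy in bits (2.0 is the maximum for DNA)."""
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def kmer_frequencies(seq, k=3):
    """Relative frequency of each overlapping k-mer."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {kmer: c / len(kmers) for kmer, c in Counter(kmers).items()}

def count_overlapping(seq, motif):
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif)

def forbidden_site_count(seq):
    """Count occurrences of the listed sites on both strands."""
    seq = seq.upper()
    total = 0
    for site in FORBIDDEN_SITES.values():
        total += count_overlapping(seq, site)
        rc = revcomp(site)
        if rc != site:  # palindromic sites would otherwise be counted twice
            total += count_overlapping(seq, rc)
    return total

print(gc_content("ATGC"), shannon_entropy("ACGT"), forbidden_site_count("GAATTCAAGAGACC"))
```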

4. Surrogate Model Performance (Learn step)

Three quantities tracked on the model between rounds:

  • Held-out R² and MAE on a 20% test split. Expected trajectory: ~0.3 (Round 1, n=47) → ~0.5–0.6 (Round 3, n=144).
  • Top-K enrichment — fraction of the surrogate’s predicted top-10 that fall in the experimental top-10 of the next round. The operationally relevant metric.
  • Uncertainty calibration — reliability diagrams of predicted std vs. observed error; essential for the exploration/exploitation balance.
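The three model checks can be sketched with NumPy. top_k_overlap assumes that the predictions and the next round's measurements are listed for the same constructs in the same order. interval_coverage is a simpler stand-in for a full reliability diagram: it checks a single confidence level and assumes Gaussian predictive uncertainty.

```python
import numpy as np

def r2_and_mae(y_true, y_pred):
    """Held-out R^2 (1 - SS_res/SS_tot) and mean absolute error."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot), float(np.mean(np.abs(y - p)))

def top_k_overlap(predicted, observed, k=10):
    """Fraction of the k highest-predicted constructs that are also among the k highest observed."""
    pred_top = set(np.argsort(predicted)[::-1][:k].tolist())
    obs_top = set(np.argsort(observed)[::-1][:k].tolist())
    return len(pred_top & obs_top) / k

def interval_coverage(y_true, mu, sigma, z=1.96):
    """Share of observations inside mu +/- z*sigma; near 0.95 if Gaussian sigmas are calibrated."""
    y, m, s = (np.asarray(a, dtype=float) for a in (y_true, mu, sigma))
    return float(np.mean(np.abs(y - m) <= z * s))

print(r2_and_mae([1, 2, 3], [2, 2, 2]), top_k_overlap([0.9, 0.1, 0.8, 0.2], [10, 1, 9, 5], k=2))
```

Repeating the coverage check at several z values and plotting expected against observed coverage gives the reliability diagram described above.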

The plate-reader fluorescence AUC is the single most important measurement — it is what the surrogate trains on and what the entire closed loop optimizes for. Every other measurement either verifies the DNA matches the design, catches artifacts that would corrupt the fluorescence signal, or assesses whether the loop itself is learning.

Waters Part I — Molecular Weight

Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

I calculated the mass from the exact sequence and then adjusted for the GFP chromophore, because chromophore maturation changes the intact mass (one H2O and two H are lost, about 20 Da). N-terminal methionine processing is a common ambiguity: whether the initiator Met is retained shifts the expected mass by about 131 Da. The expected intact mass for the mature fluorescent eGFP standard is approximately 27,986.6 Da (27.987 kDa).
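The same calculation can be sketched as an average-mass sum, with the chromophore and initiator-Met assumptions exposed as switches. The residue masses are standard average values. The 20.03 Da chromophore loss (H2O + 2 H) is applied as a fixed correction. The eGFP sequence is not reproduced in this section, so a placeholder peptide is used. The function name is illustrative, and the ExPASy tool may use slightly different constants.

```python
# Average residue masses in Da (free amino acid minus H2O)
AVG_RESIDUE = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153            # added once for the free N- and C-termini
CHROMOPHORE_LOSS = 20.03   # H2O + 2 H lost during GFP chromophore maturation

def intact_average_mass(seq, mature_chromophore=True, remove_initiator_met=False):
    """Average intact mass of a protein under the chosen processing assumptions."""
    seq = seq.upper()
    if remove_initiator_met and seq.startswith("M"):
        seq = seq[1:]
    mass = sum(AVG_RESIDUE[aa] for aa in seq) + WATER
    if mature_chromophore:
        mass -= CHROMOPHORE_LOSS
    return mass

# Placeholder peptide; substitute the full eGFP sequence to reproduce the estimate
print(round(intact_average_mass("MGG", mature_chromophore=False), 4))
```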

Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

Determine z for each adjacent pair of peaks (n, n+1) using: $z = \frac{(m/z)_{n+1}}{(m/z)_{n} - (m/z)_{n+1}}$

I selected adjacent charge-state peaks from the labeled m/z values in Figure 1. In ESI spectra of proteins, adjacent peaks usually differ by one charge, and the molecular weight follows from $m/z = (MW + zH)/z$, where H is the proton mass.

Pair 1: 903.7148 and 875.4421
z = 875.4421 / (903.7148 − 875.4421) = 30.96 ≈ 31

Here z = 31 is the charge of the 903.7148 peak, so the 875.4421 peak carries z = 32.
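The charge assignment and the mass estimate from Pair 1 can be sketched as follows. Following the recitation formula, the charge step ignores the proton mass and the result is rounded by the caller. The mass step then rearranges m/z = (MW + zH)/z to MW = z(m/z − H), with H = 1.007276 Da. Each peak in the pair gives its own MW estimate, and the two can be averaged.

```python
PROTON_MASS = 1.007276  # Da

def charge_from_adjacent_peaks(mz_n, mz_n_plus_1):
    """Unrounded charge of peak n for an adjacent pair with mz_n > mz_n_plus_1.

    Implements z = (m/z)_{n+1} / ((m/z)_n - (m/z)_{n+1}), neglecting the proton mass.
    """
    return mz_n_plus_1 / (mz_n - mz_n_plus_1)

def neutral_mass(mz, z):
    """Neutral mass from m/z = (MW + zH)/z, i.e. MW = z * (m/z - H)."""
    return z * (mz - PROTON_MASS)

z_raw = charge_from_adjacent_peaks(903.7148, 875.4421)
z = round(z_raw)                                   # charge of the 903.7148 peak
mw_from_n = neutral_mass(903.7148, z)
mw_from_n_plus_1 = neutral_mass(875.4421, z + 1)   # neighbouring peak carries z + 1
mw_mean = (mw_from_n + mw_from_n_plus_1) / 2
print(f"z_raw={z_raw:.2f}, z={z}, MW={mw_from_n:.1f} / {mw_from_n_plus_1:.1f} Da, mean {mw_mean:.1f} Da")
```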

Waters Part II — Secondary/Tertiary structure

Waters Part III — Peptide Mapping - primary structure

3.1

Analyzing the eGFP sequence in Benchling shows that it contains:

  • 20 Lysines (K)
  • 6 Arginines (R)

3.2

Using the Expasy PeptideMass tool as instructed, I arrive at 27 predicted peptides.
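An in-silico tryptic digest can be sketched with a zero-width regular-expression split after K or R, allowing no missed cleavages. PeptideMass also applies its own options, such as the proline rule, missed cleavages, and mass filters, so a count from this sketch need not reproduce the 27 above. The test sequence is a made-up placeholder, not eGFP.

```python
import re

def tryptic_digest(seq, proline_rule=True):
    """Split a protein sequence after every K or R, with no missed cleavages.

    With proline_rule=True, K/R followed by P is not cleaved (classic trypsin rule).
    """
    site = r"(?<=[KR])(?!P)" if proline_rule else r"(?<=[KR])"
    # re.split on a zero-width pattern needs Python 3.7+; drop the empty piece after a C-terminal K/R
    return [pep for pep in re.split(site, seq.upper()) if pep]

print(tryptic_digest("AKRPGKLR"))                      # proline rule on
print(tryptic_digest("AKRPGKLR", proline_rule=False))  # cleave at every K/R
```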

3.3

I count 18 chromatographic peaks.

3.4

No. 27 peptides are predicted, but only 18 peaks are observed, so the peaks do not correspond one-to-one to the predicted peptides.

3.5

The most abundant peptide peak is at m/z = 525.76712. Isotope peaks are spaced by 1/z, so the observed spacing of about 0.5 m/z gives z = 2.

The singly charged mass is MH⁺ = (525.767 × 2) − 1.0078 = 1050.53 Da.
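Both steps can be sketched in a few lines. The spacing value of 0.50 is the one implied by z = 2, not a new reading from the spectrum. The sketch subtracts the proton mass (1.007276 Da) rather than 1.0078, and both give 1050.53 Da to two decimals.

```python
PROTON_MASS = 1.007276  # Da

def charge_from_isotope_spacing(spacing_mz):
    """Isotope peaks are 1/z apart in m/z, so z is the rounded reciprocal of the spacing."""
    return round(1.0 / spacing_mz)

def mh_plus(mz, z):
    """Convert an ion observed at charge z to its singly protonated mass [M+H]+."""
    return mz * z - (z - 1) * PROTON_MASS

z = charge_from_isotope_spacing(0.50)
print(z, round(mh_plus(525.76712, z), 2))
```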

3.6

I am unsure, as none of the predicted peptide masses match the observed MH⁺.
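The matching step can be made explicit with a ppm-tolerance search. Both values must be compared in the same form, [M+H]+ against [M+H]+. The 10 ppm window is an assumption. The predicted masses below are placeholders, not the real PeptideMass output for eGFP. Re-running the search with a wider window, or against a list that includes missed-cleavage peptides, is one way to follow up when nothing matches.

```python
def match_mass(observed_mh, predicted_mh, tol_ppm=10.0):
    """List predicted peptides whose [M+H]+ lies within tol_ppm of the observed value.

    predicted_mh maps a peptide label to its predicted [M+H]+ in Da.
    Returns (label, predicted mass, error in ppm) tuples, closest first.
    """
    hits = []
    for label, mass in predicted_mh.items():
        error_ppm = (observed_mh - mass) / mass * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append((label, mass, error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[2]))

# Placeholder predictions, NOT real PeptideMass values
predicted = {"peptide_1": 1050.525, "peptide_2": 1100.600, "peptide_3": 998.470}
print(match_mass(1050.53, predicted))
```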

Waters Part IV — Oligomers

The goal is to identify KLH oligomeric species in the CDMS mass spectrum using the known masses of the KLH polypeptide subunits. The homework gives the subunit masses as 7FU = 340 kDa and 8FU = 400 kDa and asks us to assign the oligomeric states in the spectrum.

Given subunit masses

Subunit | Mass
7FU | 340 kDa
8FU | 400 kDa

Calculations and peak assignments

Oligomeric species | Calculation | Theoretical mass | Approximate location on spectrum
7FU Decamer | 10 × 340 kDa | 3,400 kDa = 3.4 MDa | Peak near 3.4 MDa
8FU Didecamer | 20 × 400 kDa | 8,000 kDa = 8.0 MDa | Major peak near 8.33 MDa
8FU 3-Decamer | 30 × 400 kDa | 12,000 kDa = 12.0 MDa | Peak near 12.67 MDa
8FU 4-Decamer | 40 × 400 kDa | 16,000 kDa = 16.0 MDa | Weak/broad signal near 16–17 MDa
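The theoretical masses in the table are simple multiples of the given subunit masses. The sketch below reproduces them and reports each observed peak's percent deviation from theory. The observed values are the approximate peak positions from the table. The 4-decamer is left without a number because its signal is broad rather than a single apex.

```python
SUBUNIT_MASS_KDA = {"7FU": 340.0, "8FU": 400.0}

def oligomer_mass_mda(subunit, n_subunits):
    """Theoretical oligomer mass in MDa as a simple multiple of the given subunit mass."""
    return SUBUNIT_MASS_KDA[subunit] * n_subunits / 1000.0

def percent_deviation(observed, theoretical):
    return (observed - theoretical) / theoretical * 100.0

# (label, subunit type, number of subunits, approximate observed peak in MDa or None)
species = [
    ("7FU Decamer", "7FU", 10, 3.4),
    ("8FU Didecamer", "8FU", 20, 8.33),
    ("8FU 3-Decamer", "8FU", 30, 12.67),
    ("8FU 4-Decamer", "8FU", 40, None),  # broad 16-17 MDa signal, no single apex
]
for label, subunit, n, observed in species:
    theo = oligomer_mass_mda(subunit, n)
    dev = "n/a" if observed is None else f"{percent_deviation(observed, theo):+.1f}%"
    print(f"{label}: theoretical {theo:.2f} MDa, observed {observed}, deviation {dev}")
```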

Final answer

The KLH oligomeric species can be assigned as follows:

  • 7FU Decamer: near 3.4 MDa
  • 8FU Didecamer: near 8.0–8.33 MDa
  • 8FU 3-Decamer: near 12.0–12.67 MDa
  • 8FU 4-Decamer: near 16.0–17.0 MDa

The strongest species in the spectrum is the 8FU didecamer, observed as the largest peak around 8.33 MDa.

Waters Part V — Did I make GFP?

I didn’t receive any further documents.