Week 10 Review: Advanced Imaging and Measurement
Week 10 — Advanced Imaging & Measurement: How do we know what we made?
Course: HTGAA Spring 2026
Lecture (Tues, Apr 7, 2026): Evan Daugharthy, Lindsay Morrison — Advanced Imaging & Measurement Tech
Recitation (Wed, Apr 8): Waters Corp Team — Mass spectrometry
Author: Fiona (Committed Listener track)
At a glance. Mass spectrometry asks a precise quantitative question: did the molecule that came out of the column have the mass we predicted from the sequence? When the answer is yes within a few parts per million, that is strong evidence it's the molecule you designed. When it isn't, the size of the difference often tells you what went wrong. This page builds the logic of intact-protein LC-MS, peptide mapping, and charge detection MS from first principles, with eGFP as the example throughout.
Headline takeaway
“Did I make my protein?” is a numerical question with a numerical answer. Mass spectrometry turns it into a comparison: theoretical mass from the sequence versus measured mass from the instrument, expressed as a parts-per-million error. Every layer of confirmation in synthetic biology — DNA, protein, fold, function — eventually passes through this comparison.
Why good and accurate measurement is crucial to experimental design
Synthetic biology is a design-build-test cycle. Every cycle ends with measurement: did the construct behave as designed? Mass spectrometry is the most quantitative tool we have for that final check on proteins. It's also the one technique that, across a few related experiments on the same kind of instrument, can tell you whether a protein:
- has the predicted sequence (yes/no, with ppm-level confidence),
- carries the post-translational modifications you expected (PTMs show up as mass shifts),
- folded into its native conformation (native MS reveals shape via charge state), and
- assembled into the right oligomeric state (CDMS works at megadalton scale).
For a project like Cholera Shield (engineering B. subtilis spores to display anti-cholera-toxin nanobodies), every one of those questions has to be answered before the platform's function can be evaluated. This week's content is the bridge between “we designed and built it” and “it actually works.”
Core concepts — the minimum vocabulary
A handful of terms recur through this page. Define them once here, then use them freely below.
- m/z (mass-to-charge ratio) — what the instrument actually measures. The protein is converted to an ion with multiple positive charges; the instrument reports the ion’s mass divided by its charge.
- Average vs monoisotopic mass — carbon in nature is ~99% ¹²C and ~1% ¹³C, and nitrogen, oxygen, and sulfur also carry small fractions of heavier isotopes. Average mass is what you'd get if you weighed a population; it accounts for the natural isotope mix. Monoisotopic mass is the mass of the all-light-isotope species (all ¹²C, all ¹⁴N, etc.). For intact proteins above ~10 kDa, the most-abundant peak of the isotope envelope sits well above the monoisotopic mass (by more than 15 Da for a 28 kDa protein), so average mass is the appropriate comparison; ProtParam reports average mass for this reason. For peptides below ~3 kDa on a high-resolution instrument, the isotope envelope is resolved and monoisotopic mass is what you compare. Mismatching the two is one of the most common ppm-error traps in real intact-MS work.
- Charge-state ladder — a single protein produces many peaks, not one. Each peak corresponds to the same molecule carrying a different number of protons.
- Electrospray ionization (ESI) — the gentle, biomolecule-compatible method for getting a protein into the gas phase as an ion.
- Native vs denatured MS — the same protein gives different spectra depending on whether it’s folded (native) or unfolded (denatured). The shape difference shows up as a difference in charge.
- Top-down vs bottom-up MS — top-down weighs the whole intact protein; bottom-up digests the protein into peptides and weighs each piece.
- Tryptic digest — using the enzyme trypsin to chop a protein into predictable peptide fragments. Trypsin cleaves on the C-terminal side of K and R (unless followed by P).
- ppm error — mass measurement accuracy expressed as (measured − theoretical) / theoretical × 1,000,000. The currency of “did the masses agree?”
- Charge detection MS (CDMS) — a single-molecule variant of mass spectrometry that works on assemblies too large for ordinary ESI-MS to resolve.
The shape of the measurement: from droplet to spectrum
Before the protein gets weighed, it has to be turned into an ion that the instrument can manipulate with electric fields. This is electrospray ionization (ESI), and it’s the foundation that every measurement in this week’s homework rests on.
Imagine the needle tip of the electrospray source held at +2–4 kV relative to the mass-spec inlet. The solvent (for denaturing work, protein in water mixed with methanol or acetonitrile, plus a little acid) is drawn out of the needle into a sharp Taylor cone, and tiny droplets break off and fly through warm nitrogen gas toward the inlet. Each droplet leaves the cone already carrying a net positive charge: thousands or more excess charges (mostly protons), spread over a droplet that can hold many protein molecules.
That’s the setup. The interesting physics happens next.
Coulombic repulsion. Inside the droplet, every excess charge is repelled by every other one. Surface tension (γ ≈ 72 mN/m for water) is what holds the droplet together as a sphere. As the droplet flies and evaporates, its radius R shrinks but its charge q stays roughly constant. The surface charge density q/(4πR²) climbs fast, and the electrostatic energy (∝ q²/R) grows relative to the surface energy holding the droplet together (∝ γR²).
At some point, repulsion wins. The threshold is the Rayleigh limit: q_R = 8π · √(ε₀ · γ · R³). When the droplet's charge approaches q_R, it becomes unstable and undergoes a Coulomb fission: it extrudes a thin jet that breaks into roughly 10–20 much smaller progeny droplets, which carry away only ~2% of the parent's mass but ~15% of its charge. The parent drops back below the threshold and resumes evaporating; the progeny repeat the cycle.
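To see why shrinking droplets keep hitting this limit, plug numbers into the Rayleigh formula. This is a minimal Python sketch that assumes pure water (γ ≈ 72 mN/m, as above) and rounded SI constants. Real ESI solvents containing methanol or acetonitrile have lower surface tension and hold somewhat less charge.

```python
import math

EPSILON_0 = 8.854e-12   # vacuum permittivity, F/m
GAMMA_WATER = 0.072     # surface tension of water, N/m (about 72 mN/m; assumption: pure water)
E_CHARGE = 1.602e-19    # elementary charge, C


def rayleigh_limit_charges(radius_m: float, gamma: float = GAMMA_WATER) -> float:
    """Rayleigh limit q_R = 8*pi*sqrt(eps0 * gamma * R^3), expressed in elementary charges."""
    q_r = 8 * math.pi * math.sqrt(EPSILON_0 * gamma * radius_m ** 3)
    return q_r / E_CHARGE


for radius_nm in (1000, 100, 10):
    n = rayleigh_limit_charges(radius_nm * 1e-9)
    print(f"R = {radius_nm:>4} nm -> q_R ~ {n:,.0f} elementary charges")
```

The numbers come out to roughly 125,000 charges at 1 µm, about 4,000 at 100 nm, and about 125 at 10 nm. Because q_R scales as R^(3/2), a tenfold drop in radius cuts the charge a droplet can hold about 32-fold. Evaporation removes almost no charge, so the droplet is forced into repeated fission.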
This cascades until you’re left with droplets containing single protein molecules. How the protein actually emerges depends on whether it’s folded or unfolded — and that turns out to be the entire reason native and denatured spectra look so different:
- Charge Residue Model (CRM) — Dole, 1968. The final droplet evaporates to nothing and the protein inherits whatever charges the droplet had left. Compact, folded proteins follow CRM. The leftover charge is roughly the Rayleigh limit of a droplet about the size of the protein itself, so it grows only slowly with protein size (∝ R^(3/2), i.e. roughly ∝ √M) → narrow charge envelope, lower z. This is the native-MS regime.
- Chain Ejection Model (CEM) — Konermann, 2013. An unfolded chain pokes out of the droplet surface and extrudes stepwise like a snake leaving a hole; protonation happens on the exposed segments as they emerge. Every basic residue along the chain becomes eligible to grab a proton → broad envelope, higher z. This is the denatured-MS regime.
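The CRM picture can be turned into a back-of-envelope charge estimate. Treat the last droplet as a water sphere with the same mass as the protein, then ask how much charge its Rayleigh limit allows. This is a minimal sketch. The ~1 g/cm³ density is our simplifying assumption, not a value from the lecture.

```python
import math

AVOGADRO = 6.022e23     # 1/mol
EPSILON_0 = 8.854e-12   # F/m
GAMMA_WATER = 0.072     # N/m
E_CHARGE = 1.602e-19    # C


def crm_charge_estimate(mass_da: float, density_kg_m3: float = 1000.0) -> float:
    """Rayleigh-limit charge of a water-like sphere whose mass equals the protein's.

    Assumption (illustrative only): the final droplet has density ~1 g/cm^3.
    """
    mass_kg = mass_da * 1e-3 / AVOGADRO                 # g/mol -> kg per molecule
    volume = mass_kg / density_kg_m3                     # m^3
    radius = (3 * volume / (4 * math.pi)) ** (1 / 3)     # m
    q_r = 8 * math.pi * math.sqrt(EPSILON_0 * GAMMA_WATER * radius ** 3)
    return q_r / E_CHARGE


print(f"mature eGFP (27,986.6 Da): z_CRM ~ {crm_charge_estimate(27_986.6):.1f}")
```

For eGFP this gives z ≈ 13, and the estimate scales as √M. Fission happens somewhat below the Rayleigh limit, so treat the result as an upper-end estimate. That fits the folded-protein range of about 8–14 charges quoted in the native-MS section.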
So “Coulombic repulsion” isn’t a hand-wave. It’s literally the force that explodes droplets into ever-smaller progeny and determines whether the protein leaves the droplet folded or unfolded.
Figure W10.4 — Charge-state ladder cartoon: same eGFP molecule shown carrying different proton counts, with arrows showing how higher charge maps to lower m/z on the spectrum.
Top-down MS: weighing the whole protein
The simplest mass-spec measurement is intact protein MS. You put the purified protein on the instrument, get a spectrum, deconvolute the charge ladder, and read off a single number: the protein’s molecular weight. The whole game is then comparing that number against what you predicted from the sequence.
Predicting the mass from sequence
A protein is a chain of amino acids linked end-to-end. Every individual amino acid has a known mass. When amino acids link up, each peptide-bond formation kicks out one water molecule. So if you know the sequence, you can predict the mass exactly — sum up the residue masses (the amino-acid mass with one water already subtracted), then add back one water for the free termini:
MW(protein) = Σ(residue masses) + 18.02 Da
That’s the entire formula. ExPASy ProtParam does it with a lookup table of average residue masses.
Figure W10.1 — Peptide bond formation with H₂O leaving. Justifies “residue mass = amino acid mass − 18.”
For our HTGAA homework eGFP sequence — MVSKGEELFTG...LGMDELYKLEHHHHHH, 247 amino acids — the calculation gives:
| Quantity | Value |
|---|---|
| Length | 247 amino acids |
| Composition | 20 K, 6 R, 15 H, 2 C, 6 M, plus the other 14 amino-acid types |
| Theoretical MW (unmodified) | 28,006.6 Da |
| Chromophore maturation correction (−20.03 Da) | — |
| Theoretical MW (mature, fluorescent) | 27,986.6 Da |
Figure W10.2 — Linear cartoon of the 247-aa eGFP-LE-6×His construct, with the T-Y-G chromophore tripeptide colored inside the eGFP body.
The first 239 residues are canonical eGFP — the same molecule used in fluorescent biosensors, FRET probes, and millions of transgenic mice. The last 8 residues, LEHHHHHH, are an engineered tag bolted onto the C-terminus to make the protein easy to purify on a nickel column (the six histidines bind immobilized Ni²⁺; the LE linker gives the tag room to move without interfering with folding).
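Here is a minimal sketch of the ProtParam-style sum, for checking the table above. The average residue masses are standard ExPASy-style values; we are assuming these match what ProtParam uses, and we did not copy them from the lecture. The full 247-residue sequence is reassembled from the P1–P27 tryptic peptides listed later on this page.

```python
# Average residue masses in Da (free amino acid minus one H2O); ExPASy-style values.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01524          # one water added back for the free N- and C-termini
CHROMOPHORE_SHIFT = -20.03    # -H2O (cyclization/dehydration) plus -H2 (oxidation)

# eGFP-LE-6xHis, reassembled from the P1-P27 tryptic peptides in the table on this page.
PEPTIDES = [
    "MVSK", "GEELFTGVVPILVELDGDVNGHK", "FSVSGEGEGDATYGK", "LTLK", "FICTTGK",
    "LPVPWPTLVTTLTYGVQCFSR", "YPDHMK", "QHDFFK", "SAMPEGYVQER", "TIFFK",
    "DDGNYK", "TR", "AEVK", "FEGDTLVNR", "IELK", "GIDFK", "EDGNILGHK",
    "LEYNYNSHNVYIMADK", "QK", "NGIK", "VNFK", "IR",
    "HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK", "DPNEK", "R",
    "DHMVLLEFVTAAGITLGMDELYK", "LEHHHHHH",
]
EGFP_HIS = "".join(PEPTIDES)


def average_mw(seq: str) -> float:
    """MW = sum of average residue masses + one water."""
    return sum(AVG_RESIDUE[aa] for aa in seq) + WATER_AVG


unmodified = average_mw(EGFP_HIS)
mature = unmodified + CHROMOPHORE_SHIFT
print(len(EGFP_HIS), f"{unmodified:.1f}", f"{mature:.1f}")
print({aa: EGFP_HIS.count(aa) for aa in "KRHCM"})
```

This should reproduce the table's 28,006.6 Da (unmodified) and 27,986.6 Da (mature) to within rounding, along with the K/R/H/C/M composition.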
The −20 Da chromophore correction
GFP fluoresces for one reason: it self-assembles a fluorescent group, the chromophore, from three of its own residues, at positions 65–66–67 in the conventional GFP numbering (Thr–Tyr–Gly in eGFP, after the canonical S65T mutation). In this construct the same tripeptide sits at residues 66–68, because eGFP carries an extra Val at position 2. The chromophore forms in two spontaneous chemical steps once the protein folds:
- Cyclization (with dehydration of the tetrahedral intermediate). The backbone amide N of Gly67 attacks the carbonyl C of Thr65 to form a tetrahedral intermediate, which loses water to give the 5-membered imidazolinone ring. Net mass change for this combined step: −18.01 Da. Some references (e.g., Barondeau 2003) describe cyclization and dehydration as two separate steps; this writeup combines them because only their net mass change matters here.
- Oxidation. Molecular O₂ removes two hydrogens off the Cα–Cβ of Tyr66, extending aromatic conjugation. This is the rate-limiting step (minutes to hours) and the reason GFP doesn’t fluoresce without O₂. −2.02 Da.
Net: −20.03 Da vs the unmodified sequence MW.
Figure W10.3 — Three-panel mechanism: Thr-Tyr-Gly tripeptide → cyclized intermediate (−18 Da) → mature chromophore (−2 Da). Total Δm = −20.03 Da.
This 20 Da shift is the single most common gotcha in intact GFP MS. On a 28 kDa protein it’s roughly 715 ppm — about 140× the Waters Xevo G3 QTof’s ~5 ppm mass-accuracy floor. A reader who forgets the correction will misdiagnose a perfectly good fluorescent sample as a failure (because the measurement looks “way off” relative to the unmodified theoretical), and conversely a non-fluorescent immature batch can match the unmodified prediction exactly. Mass alone doesn’t tell you which case you’re in — you need the fluorescence readout too.
Deconvoluting the charge-state ladder
The protein doesn’t give a single peak. It gives a ladder, because each molecule picks up a different number of protons during ESI. A 28 kDa eGFP molecule can pick up anywhere from 10 to 30 protons depending on solvent and exposed basic residues. The mass spec sees each charge state as a separate peak.
The m/z formula for any one peak:
m/z = (M + z·H) / z, where H ≈ 1.00728 Da is the proton mass.
This is one equation with two unknowns (M, z), so a single peak doesn’t determine M. The trick is picking two adjacent peaks — they differ in charge by exactly 1, with the lower-m/z peak carrying the higher charge — and solving the two equations simultaneously. Call the higher-m/z peak m₁ (charge z) and the lower-m/z peak m₂ (charge z+1):
z = (m₂ − H) / (m₁ − m₂), rounded to the nearest integer
M = z·(m₁ − H)
[Figure W10.5 — placeholder] Annotated overlay on the Waters denatured eGFP spectrum (assignment Figure 1): two adjacent peaks labeled m₁ and m₂, with the calculation panel inset.
In practice, read m/z values off two adjacent peaks, plug into the formula, round z to the nearest integer, compute M, repeat across several adjacent pairs, and average. The spread across pairs tells you the read-off uncertainty.
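The adjacent-pair procedure above can be sketched in a few lines of Python. The m/z values here are hypothetical: they were generated from the formula and rounded to mimic reading peaks off a printed figure, so they are not taken from the assignment figure.

```python
from statistics import mean, stdev

PROTON = 1.00728  # Da


def charge_from_pair(m1: float, m2: float) -> int:
    """m1: higher-m/z peak (charge z); m2: adjacent lower-m/z peak (charge z + 1)."""
    return round((m2 - PROTON) / (m1 - m2))


def mass_from_pair(m1: float, m2: float) -> float:
    """Neutral mass M = z * (m1 - H) for one adjacent peak pair."""
    return charge_from_pair(m1, m2) * (m1 - PROTON)


def deconvolute_ladder(peaks: list[float]) -> tuple[float, float]:
    """Mean mass and spread over every adjacent pair in a charge-state ladder."""
    ordered = sorted(peaks, reverse=True)  # highest m/z (lowest charge) first
    masses = [mass_from_pair(a, b) for a, b in zip(ordered, ordered[1:])]
    spread = stdev(masses) if len(masses) > 1 else 0.0
    return mean(masses), spread


# Hypothetical ladder: generated from M = 27,986.6 Da for z = 16..20, rounded to 0.1 m/z
true_mass = 27_986.6
ladder = [round((true_mass + z * PROTON) / z, 1) for z in range(16, 21)]
avg_mass, spread = deconvolute_ladder(ladder)
print(ladder)
print(f"M = {avg_mass:.1f} +/- {spread:.1f} Da")
```

Rounding each peak to 0.1 m/z moves an individual pair's mass by up to about a dalton. Averaging over the pairs pulls the answer back close to the true mass, and the spread is the read-off uncertainty described above.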
When the zoom can read charge directly
If you zoom into a single peak hard enough, sometimes you can see the isotope envelope: a series of sub-peaks at +0, +1, +2, +3 Da from the monoisotopic mass, corresponding to molecules with 0, 1, 2, 3 ¹³C atoms. The mass spacing between adjacent isotope peaks is 1.00336 Da (the ¹³C − ¹²C difference), but the instrument reports m/z, so the apparent spacing is:
Δ(m/z) = 1.00336 / z
If you can resolve the isotope peaks, charge is read straight off the zoom: z = 1.00336 / Δ(m/z).
The Xevo G3 QTof is specified at ≥40,000 FWHM resolving power, but the practical resolution on a ~28 kDa intact protein is closer to 25,000–30,000 FWHM, because conformational microheterogeneity and incomplete desolvation broaden the peaks. At 30,000 FWHM, a peak at m/z ≈ 1556 is about 1556/30,000 ≈ 0.052 m/z wide. That is where mature eGFP's z = 18 charge state sits ((27,986.6 + 18 × 1.00728)/18 ≈ 1556), and its isotope peaks would be spaced 1.00336/18 ≈ 0.056 m/z apart: right at the practical resolution limit. In practice, the isotope envelope of a 28 kDa protein on a Q-TOF is usually unresolved, so the adjacent-peak method above is the one that gives you the answer.
A higher-resolution instrument — an Orbitrap (~120,000+) or FTICR (>500,000) — would resolve the isotope envelope and let you read charge directly. This is one of the main reasons high-resolution instruments are preferred for intact-protein work.
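Here is a quick numerical comparison of isotope spacing and peak width for eGFP at z = 18. The resolvability criterion is our own rough rule of thumb and not an instrument specification: isotope peaks separate cleanly only when their spacing is comfortably larger than the FWHM, and a ratio near 1 means they blur together.

```python
C13_SPACING = 1.00336  # Da, 13C - 12C mass difference
PROTON = 1.00728       # Da


def isotope_check(mass_da: float, z: int, resolving_power: float) -> tuple[float, float, float]:
    """Return (m/z, isotope spacing in m/z, peak FWHM in m/z) for one charge state."""
    mz = (mass_da + z * PROTON) / z
    spacing = C13_SPACING / z          # gap between adjacent isotope peaks
    fwhm = mz / resolving_power        # peak width at half maximum
    return mz, spacing, fwhm


for label, rp in [("Q-TOF, practical", 30_000), ("Orbitrap", 120_000)]:
    mz, spacing, fwhm = isotope_check(27_986.6, 18, rp)
    print(f"{label}: m/z {mz:.1f}, spacing {spacing:.4f}, FWHM {fwhm:.4f}, "
          f"spacing/FWHM {spacing / fwhm:.1f}")
```

At 30,000 FWHM the spacing/FWHM ratio is about 1, which means the peaks smear together. At 120,000 it is about 4, and the isotope comb separates.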
Figure W10.6 — Side-by-side comparison: unresolved peak (Q-TOF, smooth Gaussian) vs resolved isotope comb (Orbitrap).
Closing the loop: the ppm comparison
Once you have a measured M and a theoretical M from the sequence, the comparison is one number:
ppm error = (M_observed − M_theoretical) / M_theoretical × 1,000,000
Interpretation under good instrument calibration:
- <5 ppm — confident match. Yes, you made the right protein.
- 5–50 ppm — probably a match. Check calibration; consider a missed modification.
- >50 ppm — not a match, or a major unaccounted modification (large PTM, disulfide miscount, etc.).
- ~700 ppm on a 28 kDa protein — a 20 Da gap. Almost certainly the chromophore-maturation shift, meaning the comparison was made against the wrong theoretical value.
For the HTGAA homework, the “did I make GFP?” table looks like this:
| Form of eGFP | Theoretical (Da) | Measured (Da) | ppm error |
|---|---|---|---|
| Unmodified sequence | 28,006.6 | (from intact-MS deconvolution) | (compute) |
| Mature, fluorescent eGFP | 27,986.6 | (from intact-MS deconvolution) | (compute) |
The mature row is the one that matters when the sample is fluorescent. If only the unmodified row matches (with the mature row ~715 ppm off), the protein was made but didn’t mature — probably no fluorescence. If neither row matches, something else came out of the column.
Figure W10.17 — Three-branch decision tree for interpreting the ppm result: mature ppm < 5 → fluorescent eGFP; unmodified ppm < 5 with mature ppm ≈ 715 → immature; neither matches → debug.
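The decision tree can be written as a short function. This is a minimal sketch that uses the two theoretical values from the table and the 5 ppm cutoff from the text. The observed masses in the example are hypothetical and are not real measurements.

```python
# Theoretical average masses for this construct, from the table above (Da)
THEORETICAL = {"unmodified": 28_006.6, "mature": 27_986.6}


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def interpret(observed: float, tol_ppm: float = 5.0) -> str:
    """Three-branch decision tree: mature match, immature match, or neither."""
    mature = ppm_error(observed, THEORETICAL["mature"])
    unmodified = ppm_error(observed, THEORETICAL["unmodified"])
    if abs(mature) < tol_ppm:
        return f"mature, fluorescent eGFP ({mature:+.1f} ppm)"
    if abs(unmodified) < tol_ppm:
        return f"made but immature ({unmodified:+.1f} ppm; mature row {mature:+.0f} ppm)"
    return f"neither matches (mature {mature:+.0f} ppm, unmodified {unmodified:+.0f} ppm): debug"


# Hypothetical deconvoluted masses, not real measurements
for m in (27_986.7, 28_006.5, 28_150.0):
    print(m, "->", interpret(m))
```

You can raise `tol_ppm` toward 50 to mimic the looser "probably a match" band. The two theoretical values sit about 715 ppm apart, so any tolerance below roughly 350 ppm keeps the first two branches from overlapping.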
Bottom-up MS: confirming the sequence piece by piece
Intact mass tells you what the protein weighs. It doesn't tell you the sequence: two proteins of identical mass can have completely different sequences (any reordering of the same residues has exactly the same mass, and Leu and Ile are isobaric). To verify the actual sequence, we shred the protein into smaller, identifiable pieces and check each piece against what we expect.
The metaphor: if intact MS is weighing the whole book to check it’s the right book, peptide mapping is tearing it into chapters and confirming each chapter is the one you expected.
Figure W10.7 — Six-panel workflow: purified eGFP → trypsin digest → HPLC column with eluting peptide peaks → mass spec → spectrum → peptide ID table → sequence coverage map.
Why trypsin
Trypsin is a serine protease that cleaves the peptide bond on the C-terminal side of K (lysine) or R (arginine) — unless the next residue is proline. The K-P / R-P exception comes from proline’s geometry: its side chain locks the backbone into a kink that’s a poor fit for trypsin’s active site.
Figure W10.9 — Cartoon of trypsin’s active site cleaving at K-X (success) vs K-P (failure, blocked by proline’s ring).
For the HTGAA eGFP construct, the lysine and arginine count comes out to 20 K + 6 R = 26 K/R residues. Scanning the sequence for K-P or R-P motifs: there are none. So all 26 sites are cleavable, and the C-terminal residue isn’t a K or R, so there’s no terminal cleavage to worry about.
Predicted peptide count = cleavage sites + 1 = 27 peptides (zero missed cleavages).
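The cleavage rule can be sketched as a one-line regular-expression split. This is a simplified in-silico digest with zero missed cleavages, not PeptideMass's own code. The count rule above falls straight out of it: one more piece than cleavable sites, as long as the C-terminal residue is not K or R.

```python
import re


def tryptic_digest(seq: str) -> list[str]:
    """In-silico trypsin: cut after K or R unless the next residue is P (0 missed cleavages)."""
    # Zero-width split point: preceded by K/R and not followed by P (Python 3.7+).
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in pieces if p]  # drop the empty piece left by a C-terminal K/R


# First 42 residues of the construct (P1 + P2 + P3 in the table below)
print(tryptic_digest("MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGK"))
# Hypothetical toy sequence showing the K-P exception
print(tryptic_digest("AAKPGGRCC"))
```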
Figure W10.8 — eGFP sequence with K residues colored one shade and R residues another; all 26 cleavage sites highlighted; tag region (LEHHHHHH) shaded separately.
The peptide ladder for our eGFP
A Python tryptic digest of the assignment sequence gives the following 27 peptides, with their predicted singly-protonated monoisotopic masses ([M+H]⁺):
| # | Position | Length | [M+H]⁺ (Da) | Sequence |
|---|---|---|---|---|
| P1 | 1–4 | 4 | 464.25 | MVSK |
| P2 | 5–27 | 23 | 2437.26 | GEELFTGVVPILVELDGDVNGHK |
| P3 | 28–42 | 15 | 1503.66 | FSVSGEGEGDATYGK |
| P4 | 43–46 | 4 | 474.33 | LTLK |
| P5 | 47–53 | 7 | 769.39 | FICTTGK |
| P6 | 54–74 | 21 | 2378.26 (or 2358.23 mature) | LPVPWPTLVTTLTYGVQCFSR ★ |
| P7 | 75–80 | 6 | 790.36 | YPDHMK |
| P8 | 81–86 | 6 | 821.39 | QHDFFK |
| P9 | 87–97 | 11 | 1266.58 | SAMPEGYVQER |
| P10 | 98–102 | 5 | 655.38 | TIFFK |
| P11 | 103–108 | 6 | 711.29 | DDGNYK |
| P12 | 109–110 | 2 | 276.17 | TR |
| P13 | 111–114 | 4 | 446.26 | AEVK |
| P14 | 115–123 | 9 | 1050.52 | FEGDTLVNR |
| P15 | 124–127 | 4 | 502.32 | IELK |
| P16 | 128–132 | 5 | 579.31 | GIDFK |
| P17 | 133–141 | 9 | 982.50 | EDGNILGHK |
| P18 | 142–157 | 16 | 1973.91 | LEYNYNSHNVYIMADK |
| P19 | 158–159 | 2 | 275.17 | QK |
| P20 | 160–163 | 4 | 431.26 | NGIK |
| P21 | 164–167 | 4 | 507.29 | VNFK |
| P22 | 168–169 | 2 | 288.20 | IR |
| P23 | 170–210 | 41 | 4472.18 | HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK |
| P24 | 211–215 | 5 | 602.28 | DPNEK |
| P25 | 216–216 | 1 | 175.12 | R |
| P26 | 217–239 | 23 | 2566.29 | DHMVLLEFVTAAGITLGMDELYK |
| P27 | 240–247 | 8 | 1083.50 | LEHHHHHH |
★ P6 contains the chromophore-forming tripeptide T-Y-G. In a mature-eGFP digest, P6’s observed mass will be ~20 Da lighter (2358.23 Da) than the unmodified prediction — direct peptide-level evidence of chromophore maturation. If the protein is correctly folded and fluorescent, P6 carries the same −20 Da signature we saw on the intact protein, but at peptide resolution.
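The [M+H]⁺ column can be recomputed from standard monoisotopic residue masses, assuming unmodified Cys and no missed cleavages (the same assumptions the table appears to use). The same loop also counts how many peptides clear a 500 Da display cutoff. This is a sketch, not the PeptideMass implementation.

```python
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056
PROTON = 1.00728
MATURATION_MONO = -(18.01056 + 2.01565)  # loss of H2O plus H2 on chromophore maturation

PEPTIDES = [
    "MVSK", "GEELFTGVVPILVELDGDVNGHK", "FSVSGEGEGDATYGK", "LTLK", "FICTTGK",
    "LPVPWPTLVTTLTYGVQCFSR", "YPDHMK", "QHDFFK", "SAMPEGYVQER", "TIFFK",
    "DDGNYK", "TR", "AEVK", "FEGDTLVNR", "IELK", "GIDFK", "EDGNILGHK",
    "LEYNYNSHNVYIMADK", "QK", "NGIK", "VNFK", "IR",
    "HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK", "DPNEK", "R",
    "DHMVLLEFVTAAGITLGMDELYK", "LEHHHHHH",
]


def mh_plus(peptide: str) -> float:
    """Singly protonated monoisotopic mass [M+H]+ of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER_MONO + PROTON


p6 = PEPTIDES[5]
print(f"P6 unmodified: {mh_plus(p6):.2f}  mature: {mh_plus(p6) + MATURATION_MONO:.2f}")
print("peptides with [M+H]+ >= 500 Da:", sum(mh_plus(p) >= 500 for p in PEPTIDES))
```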
What PeptideMass will actually report
ExPASy PeptideMass has display settings that filter the list: typically a minimum [M+H]⁺ around 500 Da and the option of 0 vs 1 missed cleavages. Under strict settings (≥500 Da, 0 missed cleavages), the eight peptides below 500 Da drop out: the di- and mono-residue stubs (P12, P19, P22, P25) plus four short peptides (P1, P4, P13, P20). That leaves 19 displayed peptides. The exact displayed count depends on Figure 4's parameter choices in the homework.
Reading the chromatogram
The total ion chromatogram (TIC) plots total ion intensity (y) against retention time (x). Each peak corresponds to one chromatographic event — a peptide (or co-eluting peptides) reaching the mass spec from the HPLC column at a particular time. The “10% relative abundance” filter means: take the tallest peak in the chromatogram, call it 100%, and count only peaks ≥10% of that height.
Figure W10.10 — Schematic TIC with peaks of varying heights and a horizontal dashed line at 10% of the tallest. Peaks above counted; below excluded. Time window 0.5–6 min annotated.
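The 10% filter is simple enough to sketch directly. The peak list below is hypothetical. The sketch follows the text and normalizes to the tallest peak in the whole chromatogram, then applies the 0.5–6 min window from the schematic; the order of those two steps is our assumption.

```python
def peaks_above_threshold(
    peaks: list[tuple[float, float]],
    rel: float = 0.10,
    window: tuple[float, float] = (0.5, 6.0),
) -> list[tuple[float, float]]:
    """Keep (retention_time_min, intensity) peaks at >= rel x the tallest peak.

    Assumption: 100% is the tallest peak in the whole chromatogram;
    the retention-time window is applied afterwards.
    """
    tallest = max(intensity for _, intensity in peaks)
    start, end = window
    return [
        (rt, intensity)
        for rt, intensity in peaks
        if intensity >= rel * tallest and start <= rt <= end
    ]


# Hypothetical peak list: (retention time in min, apex intensity in counts)
tic_peaks = [(0.3, 1.0e6), (1.2, 2.0e6), (2.1, 8.0e5), (2.8, 9.0e6),
             (3.5, 6.0e5), (4.4, 1.2e6), (6.5, 3.0e6)]
kept = peaks_above_threshold(tic_peaks)
print(len(kept), kept)
```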
Why the predicted count rarely matches the observed peak count
The TIC peak count almost never matches the in-silico digest count exactly. Both directions of mismatch are common, and understanding the direction of the mismatch tells you what to look for:
Fewer chromatographic peaks than predicted peptides:
- Very small peptides (the 1–2 residue stubs) elute in the solvent front, unresolved.
- Very hydrophilic peptides don’t retain on reverse-phase C18 and also elute together at the front.
- Peptides outside the m/z scan range (typically 50–2000 m/z on the BioAccord) are not detected.
- Co-elution merges two peptides into a single chromatographic peak.
More chromatographic peaks than predicted:
- Missed cleavages — partial digestion creates peptides spanning multiple expected pieces.
- Modifications — methionine oxidation (+16 Da), asparagine deamidation (+1 Da), cysteine carbamidomethylation (+57 Da, from iodoacetamide alkylation pre-digest). Each shifts a peptide's mass, and when only part of the population is modified, the modified and unmodified forms appear as separate peaks.
- In-source fragmentation — energetic ionization clips peptides, producing daughter ions at separate m/z.
Identifying a peptide from its mass spectrum
Given a single peak in the LC-MS spectrum at some m/z, identification is two steps.
Step 1 — charge from isotope spacing. Zoom on the peak. For peptides (much smaller than the intact protein, much lower charge), the BioAccord QTof comfortably resolves the isotope envelope. Measure the m/z spacing between adjacent isotope peaks and apply z = 1.00336 / Δ(m/z).
Figure W10.11 — Isotope envelopes at z = 1 (Δ ≈ 1.0 m/z), z = 2 (Δ ≈ 0.5), z = 3 (Δ ≈ 0.33). Reusable for any future peptide-MS work.
Step 2 — [M+H]⁺ from m/z and z. Convert the observed m/z to the singly-protonated mass (the form PeptideMass reports):
[M+H]⁺ = (m/z × z) − (z − 1) × H, where H ≈ 1.00728 Da
For z = 1, [M+H]⁺ = m/z. For z = 2, [M+H]⁺ = 2(m/z) − 1.00728. For z = 3, [M+H]⁺ = 3(m/z) − 2.01456.
Compare the observed [M+H]⁺ to the predicted peptide table (above), find the closest match, and compute the ppm error:
ppm = (observed [M+H]⁺ − theoretical [M+H]⁺) / theoretical × 10⁶
Confidence calls: <5 ppm is a confident match on the BioAccord, 5–50 ppm is probable but worth checking calibration, >50 ppm is either the wrong ID or an unexpected modification.
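Here are the two identification steps and the ppm call, sketched together. The isotope-peak m/z values are hypothetical. The predicted masses are a subset of the table above, and because those values are rounded to 0.01 Da, they alone limit the ppm precision to a few ppm. In real use, take predicted masses at full precision.

```python
C13_SPACING = 1.00336  # Da
PROTON = 1.00728       # Da

# Predicted [M+H]+ values copied from the peptide table above (rounded to 0.01 Da)
PREDICTED = {
    "P3 FSVSGEGEGDATYGK": 1503.66,
    "P9 SAMPEGYVQER": 1266.58,
    "P14 FEGDTLVNR": 1050.52,
    "P18 LEYNYNSHNVYIMADK": 1973.91,
}


def charge_from_isotopes(mz_first: float, mz_second: float) -> int:
    """Step 1: z = 1.00336 / (m/z spacing between adjacent isotope peaks)."""
    return round(C13_SPACING / (mz_second - mz_first))


def mh_from_mz(mz: float, z: int) -> float:
    """Step 2: [M+H]+ = (m/z * z) - (z - 1) * H."""
    return mz * z - (z - 1) * PROTON


def best_match(mh_obs: float, predicted: dict[str, float]) -> tuple[str, float]:
    """Closest predicted peptide and its ppm error."""
    name = min(predicted, key=lambda k: abs(predicted[k] - mh_obs))
    theo = predicted[name]
    return name, (mh_obs - theo) / theo * 1e6


mz_mono, mz_next = 633.794, 634.296  # hypothetical first two isotope peaks
z = charge_from_isotopes(mz_mono, mz_next)
mh = mh_from_mz(mz_mono, z)
name, ppm = best_match(mh, PREDICTED)
print(f"z = {z}, [M+H]+ = {mh:.3f}, best match {name} ({ppm:+.1f} ppm)")
```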
Sequence coverage and its limits
After all confidently-identified peptides are mapped back onto the eGFP sequence, sequence coverage is the fraction of residues covered by at least one identified peptide:
Coverage = (residues covered / total residues) × 100%
For a single-protease tryptic digest of a ~250-aa protein like ours, >80% coverage is good; >95% usually requires a second protease (Glu-C, chymotrypsin, Lys-C) to fill gaps. Even so, coverage is necessary but not sufficient to confirm identity. A point mutation in an uncovered region is invisible to peptide mapping. The coverage map tells you what you checked, not what's true about the uncovered region.
For an eGFP confirmation specifically: high coverage that includes P6 (the chromophore-containing peptide) is strong evidence the protein is eGFP; high coverage that excludes P6 is suspicious, because the chromophore region is the protein’s signature.
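Coverage is a union of residue ranges, which the sketch below computes from the table's 1-based Position column. The identified list is hypothetical: it assumes exactly the table peptides with [M+H]⁺ ≥ 500 Da were found, and nothing else.

```python
def sequence_coverage(length: int, spans: list[tuple[int, int]]) -> float:
    """Percent of residues covered by at least one identified peptide.

    spans are 1-based, inclusive (start, end) positions, as in the peptide table.
    Overlapping spans (e.g. from missed cleavages) are only counted once.
    """
    covered: set[int] = set()
    for start, end in spans:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / length


# Hypothetical result: only the table peptides with [M+H]+ >= 500 Da were identified
identified = [(5, 27), (28, 42), (47, 53), (54, 74), (75, 80), (81, 86), (87, 97),
              (98, 102), (103, 108), (115, 123), (124, 127), (128, 132), (133, 141),
              (142, 157), (164, 167), (170, 210), (211, 215), (217, 239), (240, 247)]
coverage = sequence_coverage(247, identified)
p6_seen = (54, 74) in identified  # chromophore-containing peptide P6
print(f"coverage {coverage:.1f}%, P6 observed: {p6_seen}")
```

Even this idealized case lands at about 91%. The small stubs are not random misses; they are the pieces the filters throw away.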
Figure W10.12 — Sequence coverage map: protein bar with colored stripes showing covered regions and gray gaps for uncovered residues.
Figure W10.27 — All 27 predicted tryptic peptides mapped onto the 247-aa eGFP-LE-6×His sequence, labeled by peptide number. P6 (chromophore-containing) is highlighted in orange; its presence in the observed map is strong evidence the protein is eGFP rather than a mass mimic. The 6×His tag (P27) is the purification handle.
The fragment-ion bonus
MS/MS fragmentation breaks the peptide backbone at peptide bonds, producing b-ions (N-terminal fragments retaining charge) and y-ions (C-terminal fragments retaining charge). Mass differences between adjacent b-ions (or y-ions) equal residue masses, so a fragmentation spectrum literally spells out the sequence one residue at a time. ExPASy’s FragIonServlet predicts the b/y-ion ladder for a given peptide; matching the predicted ladder against the observed spectrum confirms the peptide identity and, in turn, that the protein is eGFP.
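Here is the b/y arithmetic for a short peptide from the table, sketched under simplifying assumptions: singly charged fragments, no neutral losses, and monoisotopic masses. This is our illustration of the idea, not FragIonServlet's code.

```python
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056
PROTON = 1.00728


def fragment_ladder(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ion m/z values (no neutral losses)."""
    res = [MONO_RESIDUE[aa] for aa in peptide]
    b_ions = [sum(res[:i]) + PROTON for i in range(1, len(res))]               # N-terminal pieces
    y_ions = [sum(res[-i:]) + WATER_MONO + PROTON for i in range(1, len(res))]  # C-terminal pieces
    return b_ions, y_ions


b, y = fragment_ladder("AEVK")  # P13 from the table
print("b:", [round(m, 3) for m in b])
print("y:", [round(m, 3) for m in y])
# Gaps between consecutive b ions are residue masses: they spell out E, then V
print([round(b2 - b1, 3) for b1, b2 in zip(b, b[1:])])
```

Each b ion plus its complementary y ion adds up to the precursor [M+H]⁺ plus one extra proton. This is a handy sanity check when you are labeling a real spectrum.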
Native MS: shape from charge state
Mass spectrometry can do more than measure mass. In native mode, the same instrument reports on higher-order structure: whether the protein is folded, whether it's in an oligomeric complex, whether it's intact under physiological-ish conditions. The trick is in the solvent and the charge.
What native vs denatured means
When a protein is folded into its native state, it's a compact 3D shape, a tight, organized structure with each residue in a well-defined place. For eGFP, this is the β-barrel with the chromophore tucked inside. When it's denatured, the chain unfolds into something closer to a long floppy string, with nearly all residues exposed to solvent. The chromophore is still covalently intact, but without the barrel around it the protein no longer fluoresces.
The chemical bonds are intact in both states. The molecular weight is unchanged. But the spectrum looks completely different.
Figure W10.24 — Native eGFP as compact β-barrel with green chromophore vs denatured eGFP as long unfolded string. Same atoms, same bonds, same mass — different shape.
Why the spectrum changes
The number of charges a protein picks up during ESI depends on its shape. A compact, folded protein leaves the droplet by the charge-residue route and keeps only about as much charge as a protein-sized droplet can hold, so a folded ~28 kDa protein typically picks up 8–14 protons. An unfolded chain is ejected with its basic residues (K, R, H, N-terminal amine) exposed, and far more of them get protonated; the same 28 kDa molecule picks up 15–30 protons. (These charge ranges are typical values; exact numbers depend on solvent, source conditions, and protein identity. See Heck, Nat. Methods 2008, the canonical native-MS review.)
More protons → lower m/z (z is in the denominator). Fewer protons → higher m/z. The result:
| Feature | Denatured | Native |
|---|---|---|
| Number of charge states | Many (10–20 peaks) | Few (3–6 peaks) |
| m/z range | Broad (800–1800 for ~28 kDa) | Narrow, shifted high (2000–3500) |
| Charge per molecule | High (z = 15–30) | Low (z = 8–14) |
The narrow envelope at high m/z is the signature of a folded protein. This is why native MS is a structural biology tool, not just a mass-measuring tool.
Figure W10.25 — Two stacked spectra: top, denatured envelope across m/z = 800–1800; bottom, native narrow envelope at m/z = 2400–3200. Same protein, same mass, very different m/z positions.
Worked example: the peak at ~2800 m/z
In the assignment’s Figure 3 (zoomed native eGFP spectrum), the peak at ~2800 m/z has a charge that we can solve for directly, because we already know the mass from the intact analysis. From the m/z formula with M = 27,986.6 Da (mature):
z = M / (m/z − H) = 27,986.6 / (2800 − 1.00728) ≈ 10
So the ~2800 m/z peak is the z = 10 charge state of native, folded eGFP. Confirm by adjacent-peak spacing — the z = 9 peak should be at ~3110 m/z and the z = 11 peak at ~2545 m/z. If those positions match the figure, z = 10 is locked in.
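The worked example reduces to two small functions: solve for z given a known mass, then predict where the neighboring charge states should land. This is a minimal sketch using the mature eGFP mass from the intact analysis.

```python
PROTON = 1.00728  # Da


def charge_for_peak(mass_da: float, mz: float) -> int:
    """Solve m/z = (M + z*H)/z for z, given a known neutral mass M."""
    return round(mass_da / (mz - PROTON))


def mz_for_charge(mass_da: float, z: int) -> float:
    """Expected m/z of charge state z for neutral mass M."""
    return (mass_da + z * PROTON) / z


mature_egfp = 27_986.6
z = charge_for_peak(mature_egfp, 2800.0)
neighbors = {q: round(mz_for_charge(mature_egfp, q), 1) for q in (z - 1, z, z + 1)}
print(z, neighbors)  # 10 {9: 3110.6, 10: 2799.7, 11: 2545.2}
```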
Figure W10.26 — Annotated zoom on the native spectrum: ~2800 m/z peak labeled z = 10, with adjacent peaks labeled z = 9 and z = 11 at expected positions.
The conceptual point worth holding on to: native MS lets you ask “is my protein folded?” with a mass spectrometer. The answer is in the charge distribution, not the mass.
A practical trick — charge-reduction reagents. For small complexes that still pick up too many charges to resolve cleanly, you can add triethylammonium acetate (TEAA) or imidazole to the spray buffer at low millimolar concentrations. These compete with the protein for protons and lower the average charge state, shifting peaks to higher m/z. The same complex that gave a broad, overlapping envelope without TEAA can come out as a clean narrow envelope with it. Worth knowing when a native spectrum looks frustratingly noisy at first try — the fix may be additive chemistry, not a different instrument.
Charge detection MS: scaling to megadalton complexes
The charge-state ladder method has a hard upper limit. At very high masses (>1 megadalton), the peaks pile up too closely together to resolve, and the deconvolution math breaks. For complexes like KLH (keyhole limpet hemocyanin, an oxygen-transport protein from a sea snail) that reach 16 megadaltons (roughly 570× the mass of mature eGFP), ordinary ESI-MS gives an unresolved blob with no rungs to count.
Charge detection mass spectrometry (CDMS) solves this by measuring each ion individually. Instead of inferring mass from the m/z positions of a population, CDMS records both the m/z and the charge of each single ion, then multiplies them to get that ion's mass. Repeat across thousands of single-ion events and you build up a histogram on a true mass axis; no charge-ladder deconvolution required.
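Because each ion's charge is measured directly, its mass is z × (m/z − H), with H the proton mass as in the intact-MS formulas. At megadalton scale the proton term is only a ~0.005% correction. The sketch below bins a few hypothetical single-ion events into a mass histogram. The readings are illustrative and are not real KLH data, and the 100 kDa bin width is an arbitrary choice.

```python
from collections import Counter

PROTON = 1.00728  # Da


def ion_mass(mz: float, z: int) -> float:
    """Neutral mass of one ion from its m/z and its directly measured charge."""
    return z * (mz - PROTON)


def mass_histogram(ions: list[tuple[float, int]], bin_da: float = 100_000) -> Counter:
    """Count single-ion masses into bins of width bin_da; keys are bin indices."""
    counts: Counter = Counter()
    for mz, z in ions:
        counts[round(ion_mass(mz, z) / bin_da)] += 1
    return counts


# Hypothetical single-ion events (m/z, z); a real run records thousands of these
events = [(20_000.0, 400), (20_100.0, 398), (17_000.0, 200), (16_800.0, 202), (24_000.0, 500)]
hist = mass_histogram(events)
for key in sorted(hist):
    print(f"{key * 0.1:.1f} MDa: {hist[key]} ion(s)")
```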
Figure W10.14 — Side-by-side: clean charge ladder for a small protein vs unresolved blob for a megadalton complex. When peaks merge, we can’t measure them.
Figure W10.15 — Three-stage CDMS schematic: single ion measured for m/z and charge → counter ticks up over thousands of ions → finished mass spectrum on a true mass axis.
Reading KLH on a mass axis
KLH is built from subunits — 7FU subunits (340 kDa) and 8FU subunits (400 kDa, where “FU” denotes oxygen-binding functional units). The subunits stack into hollow cylinders called decamers, then stacks of decamers (didecamers, 3-decamers, 4-decamers). Multiplying out:
| Assembly | Subunit mass | Subunits | Total mass |
|---|---|---|---|
| 7FU decamer | 340 kDa | 10 | 3.4 MDa |
| 8FU didecamer | 400 kDa | 20 | 8.0 MDa |
| 8FU 3-decamer | 400 kDa | 30 | 12.0 MDa |
| 8FU 4-decamer | 400 kDa | 40 | 16.0 MDa |
Reading the CDMS spectrum is then trivial: find peaks at 3.4, 8.0, 12.0, and 16.0 MDa, label each with the matching oligomer.
Figure W10.13 — KLH assembly cartoon: subunit → decamer → didecamer → multi-decamer towers, with mass scale bar from 400 kDa to 16 MDa.
Figure W10.16 — Mass-axis cheat-sheet showing the four target masses with subunit math next to each.
The lineage worth knowing: electrostatic-trap CDMS is most closely associated with the Jarrold lab (Indiana University), which built on single-ion charge-detection work dating back to the 1990s. An Orbitrap-based variant was developed by the Heck lab (Utrecht) and is described in Wörner et al. Nat. Methods 2020, on this week's reading list; a closely related Orbitrap approach, “individual ion MS” (I²MS), came from the Kelleher lab and underlies the commercial “Direct Mass Technology” mode. Note that the Jarrold-lab electrostatic-trap CDMS and the Orbitrap-based single-ion methods are mechanistically distinct instrument architectures with the same conceptual logic: measure single ions individually rather than infer mass from population statistics. The Jarrold setup uses a true electrostatic ion trap with a charge-detection cylinder; the Orbitrap variants read each ion's charge from the intensity of its image-current signal inside a standard Orbitrap analyzer. The Waters workflow in this homework uses the Orbitrap-based lineage.
Pitfalls, controls, and how to know it worked
Half of mass spectrometry is knowing what can go wrong. Eight pitfalls worth keeping in mind whenever you interpret a spectrum:
- The chromophore-maturation gotcha. A 20 Da gap between predicted and measured mass on a ~28 kDa protein looks alarming until you remember it’s exactly the chromophore-maturation shift. Always compare against both the unmodified and the mature theoretical values.
- Read-off error vs instrumental error. When you read m/z values off a printed figure, you introduce uncertainty that the instrument itself didn’t have. A real intact-MS measurement of eGFP on an Xevo G3 QTof typically lands within ±5 ppm of theoretical; a hand-read deconvolution from a printed figure can easily come in at tens to a hundred ppm. The ppm number is real, but the source of the error is the reader, not the instrument.
- Q-TOF resolution limits intact-protein charge readout. A 28 kDa protein on a Q-TOF doesn’t usually give resolved isotope peaks at typical denatured charge states; the spectrum looks like a smooth envelope, and charge has to come from the adjacent-peak method, not the isotope spacing. Don’t expect to read charge straight off a zoomed intact-protein peak unless you’re on an Orbitrap or FTICR.
- Sequence coverage is necessary but not sufficient. A point mutation in an uncovered region is invisible to peptide mapping. >80% coverage from a tryptic digest is good practice, but it’s not proof; coverage that excludes the chromophore-containing peptide P6 is especially suspicious for an eGFP claim.
- Missed cleavages and PTMs proliferate peaks. Real chromatograms show more peaks than the in-silico digest predicts. Partial digestion produces peptides that span expected cuts. Modifications (Met oxidation, Asn deamidation, Cys carbamidomethylation from iodoacetamide alkylation) create further peptide species. None of these are wrong-protein signals; they’re variants of the right protein.
- Native vs denatured solvent matters. If you submit a folded protein in a denaturing solvent (50% MeCN + 0.1% formic acid), it’ll unfold during electrospray and you’ll see a denatured-style ladder. If you want native MS, you have to submit in ammonium acetate at physiological pH. The instrument doesn’t know what state you intended.
- Adduct ions look like extra peaks. Real intact-protein spectra often show small extra peaks shifted +22 Da (Na⁺ adduct), +38 Da (K⁺ adduct), or various phosphate/sulfate-buffer adducts off the main protein peak. These are not a different protein; they’re the same protein with non-covalent counter-ions stuck to it. The fix is sample prep: desalt rigorously. Before LC-MS, use reversed-phase ZipTip cleanup (C4 tips for intact proteins, C18 for peptides). For native MS, do extensive buffer exchange into ammonium acetate. If adducts are visible in the spectrum, deconvolute the main peak rather than the adducted ones, and report the adduct-free mass.
- Glycoproteins need de-glycosylation first. eGFP from E. coli isn’t glycosylated, so this doesn’t bite us here. But many engineered proteins from eukaryotic expression systems (CHO cells, HEK293, yeast) carry N-glycans that show up as mass heterogeneity. The result is a smear of peaks at +162, +203, +291 Da, etc., corresponding to added hexose, HexNAc, and sialic acid (NeuAc) units. Standard fix: treat the sample with PNGase F (peptide-N-glycosidase F) to remove N-linked glycans before LC-MS. The de-glycosylated protein then gives a single clean peak. That peak matches the predicted sequence MW plus about 1 Da (0.98 Da) per former N-glycosylation site, because PNGase F leaves the glycosylated Asn as Asp. Skip this step on a glycoprotein and the ppm comparison falls apart even though the synthesis was successful.
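The ppm arithmetic behind the first three pitfalls fits in a few lines of Python. This is a minimal sketch, not the vendor’s deconvolution algorithm. It assumes an unresolved envelope of protonated [M + zH]z+ ions, so it works in average masses (the ProtParam eGFP values quoted on this page). The function names are hypothetical. The sketch does four things:

1. Assigns charge by the adjacent-peak method.
2. Converts a peak to neutral mass.
3. Scores that mass in ppm against both the unmodified and the mature theoretical value.
4. Shows how a hand-read m/z error scales with charge.

```python
"""Adjacent-peak charge assignment and ppm scoring for an intact-protein envelope.

Minimal sketch: average masses, protonated ions [M + zH]z+ only, no adducts.
"""

PROTON = 1.007276  # Da, mass of the charge carrier

# Theoretical average masses for bare 239-aa eGFP (ProtParam values quoted on this page).
EGFP_UNMODIFIED = 28006.6
EGFP_MATURE = 27986.6  # after chromophore maturation (-20 Da)


def charge_from_adjacent_peaks(mz_high: float, mz_low: float) -> int:
    """Charge z of the higher-m/z peak; its lower-m/z neighbour carries z + 1.

    Both peaks share one neutral mass M:
        z * (mz_high - H+) = (z + 1) * (mz_low - H+)
    which rearranges to z = (mz_low - H+) / (mz_high - mz_low).
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be the larger m/z value")
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of an [M + zH]z+ ion observed at m/z = mz."""
    return z * (mz - PROTON)


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Hypothetical peaks, computed from the mature mass rather than measured.
    mz30 = EGFP_MATURE / 30 + PROTON
    mz31 = EGFP_MATURE / 31 + PROTON

    z = charge_from_adjacent_peaks(mz30, mz31)
    mass = neutral_mass(mz30, z)
    print(f"charge {z}, neutral mass {mass:.1f} Da")
    for label, theo in [("unmodified", EGFP_UNMODIFIED), ("mature", EGFP_MATURE)]:
        print(f"  vs {label}: {ppm_error(mass, theo):+.1f} ppm")

    # Read-off error: misreading the charge-30 peak by 0.05 m/z shifts M by 30 * 0.05 Da.
    misread = neutral_mass(mz30 + 0.05, z)
    print(f"  0.05 m/z misread: {ppm_error(misread, EGFP_MATURE):+.1f} ppm")
```

Scored against the unmodified value, a correctly matured eGFP sits about 714 ppm low; that is the 20 Da chromophore gap expressed in ppm. A 0.05 m/z misread on the charge-30 peak becomes a 1.5 Da (about 54 ppm) mass error, because neutral mass is z times the m/z reading. This is why hand-read figures land in the tens of ppm.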
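Several of these pitfalls come down to one lookup: which known modification explains the gap between measured and theoretical mass? Below is a minimal Python sketch using nominal values for the shifts named above:

- −20 Da chromophore maturation
- +16 Da Met oxidation
- +1 Da Asn deamidation
- +57 Da Cys carbamidomethylation
- +22/+38 Da Na⁺/K⁺ adducts
- +162/+203/+291 Da glycan units

The function name, the 0.5 Da tolerance, and the limit of three copies of a single shift are illustrative assumptions. Real deconvolution software also uses exact masses and mixed combinations of shifts.

```python
"""Match a measured-minus-theoretical mass gap against common known shifts.

Minimal sketch: nominal shifts only, one kind of shift (in 1-3 copies) at a time.
"""

# Nominal mass shifts in Da (approximate, adequate at intact-protein resolution).
KNOWN_SHIFTS = {
    "chromophore maturation": -20.0,
    "Met oxidation": 16.0,
    "Asn deamidation": 1.0,
    "Cys carbamidomethylation": 57.0,
    "Na+ adduct": 22.0,
    "K+ adduct": 38.0,
    "hexose": 162.0,
    "HexNAc": 203.0,
    "NeuAc (sialic acid)": 291.0,
}


def explain_shift(measured, theoretical, tol=0.5, max_copies=3):
    """Return every (label, copies) pair whose copies * shift is within tol Da of the gap.

    [("no shift", 0)] means the masses already agree; [] means nothing in the table fits.
    """
    delta = measured - theoretical
    if abs(delta) <= tol:
        return [("no shift", 0)]
    hits = []
    for label, shift in KNOWN_SHIFTS.items():
        for n in range(1, max_copies + 1):
            if abs(delta - n * shift) <= tol:
                hits.append((label, n))
    return hits


if __name__ == "__main__":
    unmodified, mature = 28006.6, 27986.6  # eGFP average masses quoted on this page
    # Hypothetical measured masses, chosen to illustrate each outcome.
    examples = [
        (27986.6, unmodified),  # mature protein scored against the unmodified value
        (28008.6, mature),      # one Na+ adduct
        (28100.6, mature),      # +114 Da: ambiguous on purpose
        (27986.9, mature),      # within tolerance
    ]
    for measured, theo in examples:
        print(f"{measured - theo:+7.1f} Da ->", explain_shift(measured, theo))
```

The +114 Da case shows why the function returns every match rather than a single best guess. Two carbamidomethylated cysteines and three K⁺ adducts have the same nominal mass. Sample history (was it alkylated? was it desalted?) has to break the tie.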
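The coverage and missed-cleavage pitfalls both start from the in-silico digest. Below is a minimal Python sketch that produces the kind of peptide list ExPASy PeptideMass returns. It is not that tool’s implementation, and it rests on simplifying assumptions:

- Trypsin cuts after K or R, except before P.
- Masses are unmodified monoisotopic residue sums.
- The 15-residue sequence is a hypothetical toy, not eGFP.

Allowing missed cleavages multiplies the candidate peptides. Coverage is the fraction of residues spanned by the peptides actually observed.

```python
"""In-silico tryptic digest with missed cleavages, plus sequence coverage."""

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def cleavage_sites(seq):
    """Positions after which trypsin cuts: after K/R, unless the next residue is P."""
    return [i + 1 for i in range(len(seq) - 1) if seq[i] in "KR" and seq[i + 1] != "P"]


def digest(seq, max_missed=0):
    """Return (start, end, peptide) tuples with up to max_missed skipped sites."""
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append((bounds[i], bounds[j], seq[bounds[i]:bounds[j]]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def coverage(seq_len, observed):
    """Fraction of residues covered by observed (start, end) intervals."""
    covered = set()
    for start, end in observed:
        covered.update(range(start, end))
    return len(covered) / seq_len


if __name__ == "__main__":
    toy = "MAGKPLSRVDEKHHR"  # hypothetical sequence; the K before P is not cut
    for start, end, pep in digest(toy, max_missed=1):
        print(f"{start:>2}-{end:<2} {pep:<14} [M+H]+ {mh_plus(pep):.4f}")
    # Suppose only two of the fully cleaved peptides were observed.
    print(f"coverage: {coverage(len(toy), [(0, 8), (12, 15)]):.0%}")
```

In the toy digest, the K followed by P stays intact, and one missed cleavage adds two longer peptides. Leaving VDEK unobserved drops coverage to 73%, so a mutation inside VDEK would be invisible.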
Applying the stack: a real-project measurement plan
Mass spectrometry is one layer in a larger stack. For a real engineered platform — say, the Cholera Shield project, where B. subtilis spores display anti-cholera-toxin VHH nanobodies and GM1-mimic decoys via CotB/CotC coat-protein fusions — no single technique answers “did it work?” The full plan answers four nested questions, in order, with each layer depending on the layer below working:
- Did we assemble the DNA correctly? Colony PCR + Sanger sequencing of the assembly junction + full-plasmid Nanopore (via a service like Plasmidsaurus).
- Is the protein on the spore the protein we designed? Intact LC-MS on recombinant VHH (same workflow as the intact-MS section, applied to the nanobody) plus SDS-PAGE Western blot of spore coat extract.
- Is it folded, accessible, and binding what it should? Flow cytometry with fluorescently labeled cholera toxin B-subunit, plus SPR/BLI affinity measurement on purified VHH.
- Does the platform actually neutralize cholera toxin? GM1-ELISA inhibition assay plus Vero cell challenge plus germination kinetics in simulated intestinal fluid.
Figure W10.18 — Four-layer measurement-stack pyramid: DNA → protein → fold/surface → function.
The plan splits into two resource scenarios, recognizing that not every lab has core-facility access:
| Question | MVP version (any lab) | Full version (core facility / industrial) |
|---|---|---|
| Right DNA? | Colony PCR + Sanger junction + Nanopore via service | Same — already universally accessible |
| Right protein on spore? | Western blot + (optional) MALDI-TOF | Intact LC-MS + Western blot |
| Folded and accessible? | Bulk plate-reader fluorescence + ELISA dose-response | Flow cytometry + SPR/BLI |
| Neutralizes toxin? | GM1-ELISA + SIF germination kinetics | Adds Vero challenge + in vivo mouse model |
Figure W10.23 — Side-by-side pyramid: MVP stack (cheap techniques, left) vs full stack (core-facility techniques, right).
Bottom line: the MVP stack covers all four questions with ≤$2k consumables and no specialized instruments beyond a plate reader and gel rig. The full stack tightens the answers (real K_D values, quantitative single-cell display, in vivo proof) but doesn’t change which question is being asked at each layer. MVP is sufficient for course scope; for publication or grant proposals, plan core-facility access at least for the protein-identity and surface-display layers.
A forward-looking idea worth noting. CDMS — the single-molecule MS technique we used to weigh KLH oligomers — could in principle be applied to whole B. subtilis spores carrying surface-displayed VHH. Spores are far larger than even KLH didecamers (gigadalton scale), but the single-ion measurement logic doesn’t fundamentally fail at that scale; recent work pushing CDMS into the gigadalton regime (e.g., for viral capsids and lipid nanoparticles) suggests this is technically feasible. The practical applications would be quantifying per-spore VHH copy number and detecting spore-to-spore display heterogeneity that bulk methods would average over. Not within scope for the current homework, but a real research direction if the Cholera Shield project scales up.
Every protein-level measurement in this plan is a direct application of Week 10’s content. (Layer 1, DNA verification, uses orthogonal techniques: PCR, Sanger, and Nanopore sequencing, not MS.) Three connections worth flagging:
- Layer 2’s intact LC-MS is the exact intact-MS workflow described in the top-down section above (theoretical vs measured mass, ppm error).
- Layer 2’s Western blot shares peptide mapping’s logic of confirming identity from a small piece of the protein, an antibody epitope rather than a tryptic peptide.
- Layer 3’s flow cytometry is the cellular analogue of native MS. It asks “is the protein folded and surface-displayed?” without disrupting the cell.

The measurement stack is the recurring theme of this week: no single technique answers “did it work?” Each layer answers a different sub-question, and confidence comes from agreement across layers.
Recommended reading
Five papers that anchor this week’s concepts (DOIs verified). This is one more than the usual four per week. The fifth, Aebersold & Mann, was added per peer-review recommendation to provide a bottom-up / peptide-mapping reference.
Donnelly DP, Rawlins CM, DeHart CJ et al. (2019). Best practices and benchmarks for intact protein analysis for top-down mass spectrometry. Nature Methods 16: 587–594. doi:10.1038/s41592-019-0457-0 Consortium for Top-Down Proteomics decision tree for intact-protein workflows. The reference text for everything in our intact-MS section.
Heck AJR (2008). Native mass spectrometry: a bridge between interactomics and structural biology. Nature Methods 5: 927–933. doi:10.1038/nmeth.1265 The canonical review of native MS. Read this to understand how a mass spectrum reveals quaternary structure.
Wörner TP, Snijder J, Bennett A et al. (2020). Resolving heterogeneous macromolecular assemblies by Orbitrap-based single-particle charge detection mass spectrometry. Nature Methods 17: 395–398. doi:10.1038/s41592-020-0770-7 The Heck-lab paper establishing Orbitrap-based CDMS as a method for megadalton biomolecular assemblies. The CDMS work underlying the KLH part of this week’s homework.
Smith LM, Kelleher NL & the Consortium for Top Down Proteomics (2013). Proteoform: a single term describing protein complexity. Nature Methods 10: 186–187. doi:10.1038/nmeth.2369 The paper that introduced the term “proteoform” to describe the molecular complexity that intact MS, but not bottom-up MS, can resolve. Read this to understand why top-down and bottom-up are not interchangeable.
Aebersold R, Mann M (2003). Mass spectrometry-based proteomics. Nature 422: 198–207. doi:10.1038/nature01511 The canonical review of bottom-up proteomics workflows. Reading it anchors the peptide-mapping logic in this week’s content to its original methodological context.
Course resources
- HTGAA Week 10 page (assignment source): https://2026a.htgaa.org/2026a/course-pages/weeks/week-10/index.html
- ExPASy ProtParam (theoretical MW + pI from sequence): web.expasy.org/protparam
- ExPASy PeptideMass (in-silico tryptic digest + peptide [M+H]⁺ list): web.expasy.org/peptide_mass
- FragIonServlet (predict b/y-ion fragmentation ladder for a peptide): db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html
- ASMS Fundamentals of Mass Spectrometry: asms.org/about-mass-spec/fundamentals-hardware-instrumentation
- Native vs Denatured charge state distributions (reading from HTGAA): pmc.ncbi.nlm.nih.gov/articles/PMC7539638
References
The DOI citations above are the working primary literature. Additional facts pulled from outside those five papers:
- Cormack BP, Valdivia RH, Falkow S (1996). FACS-optimized mutants of the green fluorescent protein (GFP). Gene 173: 33–38. doi:10.1016/0378-1119(95)00685-0 The original eGFP paper; source of the F64L/S65T mutations relative to wild-type GFP.
- Tsien RY (1998). The green fluorescent protein. Annu. Rev. Biochem. 67: 509–544. doi:10.1146/annurev.biochem.67.1.509 Canonical review of GFP chromophore maturation chemistry (cyclization + oxidation, −20 Da).
- Royant A, Noirclerc-Savoye M (2011). Stabilizing role of glutamic acid 222 in the structure of Enhanced Green Fluorescent Protein. J. Struct. Biol. 174: 385–390. PMC3473056 Crystal structure used to confirm chromophore positions and orientations.
- Konermann L, Ahadi E, Rodriguez AD, Vahidi S (2013). Unraveling the mechanism of electrospray ionization. Anal. Chem. 85: 2–9. doi:10.1021/ac302789c The Chain Ejection Model (CEM) reference cited in the ESI mechanism section.
Last reviewed: 2026-05-26. Figures W10.1–W10.26 are spec’d in notes.md and pending creation. Assignment-supplied Waters Figures 1, 2, 3, 4, 5a, 5b, 5c, 6, 7 are pending insertion from the HTGAA course page. ProtParam-derived eGFP MW values (28,006.6 unmodified / 27,986.6 mature) verified by independent Python calculation cross-checked against published Bio-Techne and FPbase reference values for the bare 239-aa eGFP form.