Week 10 Review: Advanced Imaging and Measurement

Week 10 — Advanced Imaging & Measurement: How do we know what we made?

Course: HTGAA Spring 2026
Lecture (Tues, Apr 7, 2026): Advanced Imaging & Measurement Tech (Evan Daugharthy, Lindsay Morrison)
Recitation (Wed, Apr 8): Mass spectrometry (Waters Corp Team)
Author: Fiona (Committed Listener track)

At a glance. Mass spectrometry asks a precise quantitative question: did the molecule that came out of the column have the mass we predicted from the sequence? When the answer is yes within a few parts per million, the sample is consistent with the intended molecule. When it isn’t, the difference itself tells you what went wrong. This page builds the logic of intact-protein LC-MS, peptide mapping, and charge detection MS from first principles, with eGFP as the example throughout.

Headline takeaway

“Did I make my protein?” is a numerical question with a numerical answer. Mass spectrometry turns it into a comparison: theoretical mass from the sequence versus measured mass from the instrument, expressed as a parts-per-million error. Every layer of confirmation in synthetic biology — DNA, protein, fold, function — eventually passes through this comparison.

Why good and accurate measurement is crucial to experimental design

Synthetic biology is a design-build-test cycle. Every cycle ends with measurement: did the construct behave as designed? Mass spectrometry is the most quantitative tool we have for that final check on proteins. It is also one of the few techniques that, across its operating modes, can tell you whether a protein:

has the predicted sequence (yes/no, with ppm-level confidence),
carries the post-translational modifications you expected (PTMs show up as mass shifts),
folded into its native conformation (native MS reveals shape via charge state), and
assembled into the right oligomeric state (CDMS works at megadalton scale).

For a project like Cholera Shield (engineering B. subtilis spores to display anti-cholera-toxin nanobodies), every one of those questions has to be answered before the platform’s function can be evaluated. This week’s content is the bridge between “we designed and built it” and “it actually works.”

Core concepts — the minimum vocabulary

A handful of terms recur through this page. Define them once here, then use them freely below.

m/z (mass-to-charge ratio) — what the instrument actually measures. The protein is converted to an ion with multiple positive charges; the instrument reports the ion’s mass divided by its charge.
Average vs monoisotopic mass — carbon in nature is ~99% ¹²C and ~1% ¹³C; nitrogen and sulfur have similar isotope distributions. Average mass is what you’d get if you weighed a population — it accounts for the natural isotope mix. Monoisotopic mass is the mass of the all-light-isotope species (all ¹²C, all ¹⁴N, etc.). For intact proteins above ~10 kDa, the spectrum’s most-abundant peak is shifted off the monoisotopic mass by several Da and average mass is the appropriate comparison; ProtParam reports average mass for this reason. For peptides below ~3 kDa on a high-resolution instrument, the isotope envelope is resolved and monoisotopic mass is what you compare. Mismatching the two is one of the most common ppm-error traps in real intact-MS work.
Charge-state ladder — a single protein produces many peaks, not one. Each peak corresponds to the same molecule carrying a different number of protons.
Electrospray ionization (ESI) — the gentle, biomolecule-compatible method for getting a protein into the gas phase as an ion.
Native vs denatured MS — the same protein gives different spectra depending on whether it’s folded (native) or unfolded (denatured). The shape difference shows up as a difference in charge.
Top-down vs bottom-up MS — top-down weighs the whole intact protein; bottom-up digests the protein into peptides and weighs each piece.
Tryptic digest — using the enzyme trypsin to chop a protein into predictable peptide fragments. Trypsin cleaves on the C-terminal side of K and R (unless followed by P).
ppm error — mass measurement accuracy expressed as (measured − theoretical) / theoretical × 1,000,000. The currency of “did the masses agree?”
Charge detection MS (CDMS) — a single-molecule variant of mass spectrometry that works on assemblies too large for ordinary ESI-MS to resolve.

The shape of the measurement: from droplet to spectrum

Before the protein gets weighed, it has to be turned into an ion that the instrument can manipulate with electric fields. This is electrospray ionization (ESI), and it’s the foundation that every measurement in this week’s homework rests on.

Imagine the needle tip of the electrospray source held at +2–4 kV relative to the mass-spec inlet. The solvent (protein dissolved in water, methanol, acetonitrile, with a little acid) is drawn out of the needle into a sharp Taylor cone, and tiny droplets break off and fly through warm nitrogen gas toward the inlet. Each droplet leaves the cone already carrying a net positive charge: excess protons sitting at the droplet surface, many thousands on a micron-scale droplet and falling to tens or hundreds once the droplet has shrunk to a few nanometres around the protein.

That’s the setup. The interesting physics happens next.

Coulombic repulsion. Inside the droplet, every excess proton is repelled by every other one. Surface tension (γ ≈ 72 mN/m for water) is what keeps the droplet spherical and intact. As the droplet flies and evaporates, its radius R shrinks but its charge q stays roughly constant. The electrostatic repulsion energy, which scales as q²/R, climbs, while the surface energy holding the droplet together, which scales as γR², falls.

At some point, repulsion wins. The threshold is the Rayleigh limit: q_R = 8π · √(ε₀ · γ · R³). When the droplet’s charge approaches q_R, it becomes unstable and undergoes a Coulomb fission: it extrudes a thin jet that breaks into 10–20 much smaller progeny droplets. The progeny carry away a small fraction of the mass but a disproportionately large share of the charge (commonly quoted figures are about 2% of the mass and about 15% of the charge). The parent shrinks below the threshold and resumes evaporation; the progeny repeat the cycle.
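To put numbers on the Rayleigh limit, the sketch below evaluates q_R for a few droplet radii and converts the result to a count of elementary charges. The radii are illustrative choices, not values from the lecture, and the droplet is assumed to be pure water at the surface tension quoted above.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19  # elementary charge, C
GAMMA_WATER = 0.072         # surface tension of water, N/m (about 72 mN/m)

def rayleigh_limit_charges(radius_m, gamma=GAMMA_WATER):
    """Number of elementary charges a droplet of this radius holds at the Rayleigh limit."""
    q_coulombs = 8 * math.pi * math.sqrt(EPS0 * gamma * radius_m ** 3)
    return q_coulombs / E_CHARGE

# Illustrative radii only (assumed, not taken from the lecture).
for radius_nm in (1000, 100, 10, 2):
    n = rayleigh_limit_charges(radius_nm * 1e-9)
    print(f"R = {radius_nm:>4} nm -> about {n:,.0f} charges at the Rayleigh limit")
```

Because q_R scales as R^1.5, a hundredfold drop in radius cuts the charge a droplet can hold a thousandfold. At a radius of about 2 nm (a rough, assumed size for a small folded protein with residual solvent) the limit comes out near a dozen charges.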

This cascades until you’re left with droplets containing single protein molecules. How the protein actually emerges depends on whether it’s folded or unfolded — and that turns out to be the entire reason native and denatured spectra look so different:

Charge Residue Model (CRM) — Dole, 1968. The final droplet evaporates to nothing and the protein inherits whatever charges the droplet had left. Compact, folded proteins follow CRM. The leftover charge scales with the droplet radius at the final step, which scales with the protein’s own size → narrow charge envelope, lower z. This is the native-MS regime.
Chain Ejection Model (CEM) — Konermann, 2013. An unfolded chain pokes out of the droplet surface and extrudes stepwise like a snake leaving a hole; protonation happens on the exposed segments as they emerge. Every basic residue along the chain becomes eligible to grab a proton → broad envelope, higher z. This is the denatured-MS regime.

So “Coulombic repulsion” isn’t a hand-wave. It’s literally the force that explodes droplets into ever-smaller progeny and determines whether the protein leaves the droplet folded or unfolded.

Figure W10.4 — Charge-state ladder cartoon: same eGFP molecule shown carrying different proton counts, with arrows showing how higher charge maps to lower m/z on the spectrum.

Top-down MS: weighing the whole protein

The simplest mass-spec measurement is intact protein MS. You put the purified protein on the instrument, get a spectrum, deconvolute the charge ladder, and read off a single number: the protein’s molecular weight. The whole game is then comparing that number against what you predicted from the sequence.

Predicting the mass from sequence

A protein is a chain of amino acids linked end-to-end. Every individual amino acid has a known mass. When amino acids link up, each peptide-bond formation kicks out one water molecule. So if you know the sequence, you can predict the mass exactly — sum up the residue masses (the amino-acid mass with one water already subtracted), then add back one water for the free termini:

MW(protein) = Σ(residue masses) + 18.02 Da

That’s the entire formula. ExPASy ProtParam does it with a lookup table of average residue masses.

Figure W10.1 — Peptide bond formation with H₂O leaving. Justifies “residue mass = amino acid mass − 18.”

For our HTGAA homework eGFP sequence — MVSKGEELFTG...LGMDELYKLEHHHHHH, 247 amino acids — the calculation gives:

Quantity	Value
Length	247 amino acids
Composition	20 K, 6 R, 15 H, 2 C, 6 M, plus the other 14 amino-acid types
Theoretical MW (unmodified)	28,006.6 Da
Chromophore maturation correction	−20.03 Da
Theoretical MW (mature, fluorescent)	27,986.6 Da

Figure W10.2 — Linear cartoon of the 247-aa eGFP-LE-6×His construct, with the T-Y-G chromophore tripeptide colored inside the eGFP body.

The first 239 residues are canonical eGFP — the same molecule used in fluorescent biosensors, FRET probes, and millions of transgenic mice. The last 8 residues, LEHHHHHH, are an engineered tag bolted onto the C-terminus to make the protein easy to purify on a nickel column (the six histidines bind immobilized Ni²⁺; the LE linker gives the tag room to move without interfering with folding).
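The mass formula can be checked directly against the table. The sketch below is a minimal version of what a ProtParam-style lookup does, not the tool's actual code: it sums average residue masses and adds one water. Because the page elides the middle of the sequence, the full 247-residue string is rebuilt here by joining the 27 tryptic peptides listed in the peptide table further down; the residue masses are standard average values typed in by hand.

```python
# Average residue masses in Da (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVG = 18.0153  # Da, added back once for the free N- and C-termini

# Sequence reassembled from the peptide table (P1..P27, in order).
PEPTIDES = """
MVSK GEELFTGVVPILVELDGDVNGHK FSVSGEGEGDATYGK LTLK FICTTGK
LPVPWPTLVTTLTYGVQCFSR YPDHMK QHDFFK SAMPEGYVQER TIFFK DDGNYK TR AEVK
FEGDTLVNR IELK GIDFK EDGNILGHK LEYNYNSHNVYIMADK QK NGIK VNFK IR
HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK DPNEK R
DHMVLLEFVTAAGITLGMDELYK LEHHHHHH
""".split()
SEQ = "".join(PEPTIDES)

def average_mw(seq):
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_AVG

unmodified = average_mw(SEQ)
mature = unmodified - 20.03  # chromophore maturation correction from the table

print(f"length            : {len(SEQ)}")
print(f"K / R / H counts  : {SEQ.count('K')} / {SEQ.count('R')} / {SEQ.count('H')}")
print(f"unmodified average: {unmodified:.1f} Da")
print(f"mature average    : {mature:.1f} Da")
```

Swapping the dictionary for monoisotopic residue masses would give the monoisotopic mass instead, which is the wrong comparison for an intact 28 kDa protein but the right one for peptides.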

The −20 Da chromophore correction

GFP fluoresces for one reason: it self-assembles a fluorescent group, the chromophore, from three of its own residues — positions 65–66–67 in the conventional GFP numbering (Thr–Tyr–Gly in eGFP, after the canonical S65T mutation). The chromophore forms in two spontaneous chemical steps once the protein folds:

Cyclization (with dehydration of the tetrahedral intermediate). The backbone amide N of Gly67 attacks the carbonyl C of Thr65 to form a tetrahedral intermediate, which loses water to give the 5-membered imidazolinone ring. Net mass change for this combined step: −18.01 Da. Some references (e.g., Barondeau 2003) describe cyclization and dehydration as two separate nominal steps; this writeup collapses them because the net mass change occurs together.
Oxidation. Molecular O₂ removes two hydrogens off the Cα–Cβ of Tyr66, extending aromatic conjugation. This is the rate-limiting step (minutes to hours) and the reason GFP doesn’t fluoresce without O₂. −2.02 Da.

Net: −20.03 Da vs the unmodified sequence MW.

Figure W10.3 — Three-panel mechanism: Thr-Tyr-Gly tripeptide → cyclized intermediate (−18 Da) → mature chromophore (−2 Da). Total Δm = −20.03 Da.

This 20 Da shift is the single most common gotcha in intact GFP MS. On a 28 kDa protein it’s roughly 715 ppm — about 140× the Waters Xevo G3 QTof’s ~5 ppm mass-accuracy floor. A reader who forgets the correction will misdiagnose a perfectly good fluorescent sample as a failure (because the measurement looks “way off” relative to the unmodified theoretical), and conversely a non-fluorescent immature batch can match the unmodified prediction exactly. Mass alone doesn’t tell you which case you’re in — you need the fluorescence readout too.

Deconvoluting the charge-state ladder

The protein doesn’t give a single peak. It gives a ladder, because each molecule picks up a different number of protons during ESI. A 28 kDa eGFP molecule can pick up anywhere from 10 to 30 protons depending on solvent and exposed basic residues. The mass spec sees each charge state as a separate peak.

The m/z formula for any one peak:

m/z = (M + z·H) / z, where H ≈ 1.00728 Da is the proton mass.

This is one equation with two unknowns (M, z), so a single peak doesn’t determine M. The trick is picking two adjacent peaks — they differ in charge by exactly 1, with the lower-m/z peak carrying the higher charge — and solving the two equations simultaneously. Call the higher-m/z peak m₁ (charge z) and the lower-m/z peak m₂ (charge z+1):

z = (m₂ − H) / (m₁ − m₂) → round to integer
M = z·(m₁ − H)

[Figure W10.5 — placeholder] Annotated overlay on the Waters denatured eGFP spectrum (assignment Figure 1): two adjacent peaks labeled m₁ and m₂, with the calculation panel inset.

In practice, read m/z values off two adjacent peaks, plug into the formula, round z to the nearest integer, compute M, repeat across several adjacent pairs, and average. The spread across pairs tells you the read-off uncertainty.
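The adjacent-pair procedure can be sketched in a few lines. The peak list here is synthetic: it is generated from the mature theoretical mass and then rounded to 0.1 m/z to imitate reading values off a printed figure, so none of these numbers come from the homework spectrum.

```python
H = 1.00728  # proton mass, Da

def mz(mass, z):
    """m/z of a species of neutral mass `mass` carrying z protons."""
    return (mass + z * H) / z

def charge_from_adjacent(m1, m2):
    """m1 is the higher-m/z peak (charge z); m2 is the next peak down (charge z + 1)."""
    return round((m2 - H) / (m1 - m2))

def mass_from_peak(m1, z):
    return z * (m1 - H)

# Synthetic ladder (assumption for illustration), rounded like a hand read-off.
TRUE_MASS = 27986.6
peaks = [round(mz(TRUE_MASS, z), 1) for z in range(15, 21)]  # highest m/z first

charges, estimates = [], []
for m1, m2 in zip(peaks, peaks[1:]):
    z = charge_from_adjacent(m1, m2)
    charges.append(z)
    estimates.append(mass_from_peak(m1, z))

mean_mass = sum(estimates) / len(estimates)
spread = max(estimates) - min(estimates)
print("charges :", charges)
print(f"mean M  : {mean_mass:.2f} Da, spread across pairs {spread:.2f} Da")
```

Rounding the peaks to one decimal place is enough to move individual estimates by a few tenths of a dalton, which is the read-off uncertainty the spread is meant to expose.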

When the zoom can read charge directly

If you zoom into a single peak hard enough, sometimes you can see the isotope envelope: a series of sub-peaks at +0, +1, +2, +3 Da from the monoisotopic mass, corresponding to molecules with 0, 1, 2, 3 ¹³C atoms. The mass spacing between adjacent isotope peaks is 1.00336 Da (the ¹³C − ¹²C difference), but the instrument reports m/z, so the apparent spacing is:

Δ(m/z) = 1.00336 / z

If you can resolve the isotope peaks, charge is read straight off the zoom: z = 1.00336 / Δ(m/z).

The Xevo G3 QTof is specified at ≥40,000 FWHM resolving power, but the practical resolution on a ~28 kDa intact protein is closer to 25,000–30,000 FWHM due to conformational microheterogeneity and incomplete desolvation broadening the peaks. At m/z = 1500 and 30,000 practical FWHM, the instrument can distinguish peaks separated by 1500/30,000 = 0.05 m/z. For eGFP at z = 18, which sits around m/z = 1556, isotope peaks would be spaced 1.00336/18 ≈ 0.056 m/z, right at the practical resolution limit. In practice, the isotope envelope of a 28 kDa protein on a Q-TOF is usually unresolved, so the adjacent-peak method above is the one that gives you the answer.

A higher-resolution instrument — an Orbitrap (~120,000+) or FTICR (>500,000) — would resolve the isotope envelope and let you read charge directly. This is one of the main reasons high-resolution instruments are preferred for intact-protein work.
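The resolution argument is simple arithmetic, sketched below for the mature eGFP mass at z = 18. The resolving-power values are the round figures quoted in the text, and the "margin" (isotope spacing divided by the smallest resolvable separation) is an illustrative yardstick introduced here, not an instrument specification.

```python
ISOTOPE_SPACING = 1.00336  # Da, 13C minus 12C
H = 1.00728                # proton mass, Da

def resolution_margin(mass, z, resolving_power):
    """Return (m/z, isotope spacing, resolvable width, spacing / width).

    A margin above 1 means the isotope comb is resolvable in principle.
    """
    mz = (mass + z * H) / z
    spacing = ISOTOPE_SPACING / z
    width = mz / resolving_power
    return mz, spacing, width, spacing / width

for label, rp in [("Q-TOF, practical low", 25_000),
                  ("Q-TOF, practical high", 30_000),
                  ("Orbitrap", 120_000)]:
    mz, spacing, width, margin = resolution_margin(27986.6, 18, rp)
    print(f"{label:<22} m/z {mz:7.1f}  spacing {spacing:.4f}  "
          f"width {width:.4f}  margin {margin:.2f}")
```

The margin works out to roughly 1.00336 × resolving power / mass, so it barely depends on which charge state you zoom into: resolving the isotopes of a 28 kDa protein takes a practical resolving power comfortably above about 28,000 whichever peak you pick.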

Figure W10.6 — Side-by-side comparison: unresolved peak (Q-TOF, smooth Gaussian) vs resolved isotope comb (Orbitrap).

Closing the loop: the ppm comparison

Once you have a measured M and a theoretical M from the sequence, the comparison is one number:

ppm error = (M_observed − M_theoretical) / M_theoretical × 1,000,000

Interpretation under good instrument calibration:

<5 ppm — confident match. Yes, you made the right protein.
5–50 ppm — probably a match. Check calibration; consider a missed modification.
>50 ppm — not a match, or a major unaccounted modification (large PTM, disulfide miscount, etc.).
~700 ppm on a 28 kDa protein — a 20 Da gap. Almost certainly the chromophore-maturation shift, meaning the comparison was made against the wrong theoretical value.

For the HTGAA homework, the “did I make GFP?” table looks like this:

Form of eGFP	Theoretical (Da)	Measured (Da)	ppm error
Unmodified sequence	28,006.6	(from intact-MS deconvolution)	(compute)
Mature, fluorescent eGFP	27,986.6	(from intact-MS deconvolution)	(compute)

The mature row is the one that matters when the sample is fluorescent. If only the unmodified row matches (with the mature row ~715 ppm off), the protein was made but didn’t mature — probably no fluorescence. If neither row matches, something else came out of the column.
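The two-row comparison and its three outcomes can be written as a small function. This is a sketch: the 5 ppm tolerance is the confident-match threshold from the list above, the measured masses passed in are invented for illustration, and the verdict strings are placeholders.

```python
UNMODIFIED = 28006.6  # Da, theoretical average mass from the sequence
MATURE = 27986.6      # Da, after the -20.03 Da chromophore correction

def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6

def diagnose(measured, tol_ppm=5.0):
    """Compare one deconvoluted mass against both theoretical values."""
    mature_ppm = ppm_error(measured, MATURE)
    unmod_ppm = ppm_error(measured, UNMODIFIED)
    if abs(mature_ppm) < tol_ppm:
        verdict = "mature, fluorescent eGFP"
    elif abs(unmod_ppm) < tol_ppm:
        verdict = "made, but chromophore not matured"
    else:
        verdict = "no match, debug"
    return verdict, mature_ppm, unmod_ppm

# Hypothetical measured masses, not homework data.
for measured in (27986.7, 28006.5, 28100.0):
    verdict, mature_ppm, unmod_ppm = diagnose(measured)
    print(f"{measured:9.1f} Da  mature {mature_ppm:+8.1f} ppm  "
          f"unmodified {unmod_ppm:+8.1f} ppm  -> {verdict}")
```

A mass read by hand off a printed figure will often miss the 5 ppm window through read-off error alone, so for hand-read values it is reasonable to loosen `tol_ppm` and rely on the roughly 715 ppm gap between the two rows to tell them apart.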

Figure W10.17 — Three-branch decision tree for interpreting the ppm result: mature ppm < 5 → fluorescent eGFP; unmodified ppm < 5 with mature ppm ≈ 715 → immature; neither matches → debug.

Bottom-up MS: confirming the sequence piece by piece

Intact mass tells you what the protein weighs. It doesn’t tell you the sequence: two proteins of identical mass can have completely different sequences (any reordering of the same residues, or a Leu/Ile swap, leaves the intact mass unchanged). To verify the actual sequence, we shred the protein into smaller, identifiable pieces and check each piece against what we expect.

The metaphor: if intact MS is weighing the whole book to check it’s the right book, peptide mapping is tearing it into chapters and confirming each chapter is the one you expected.

Figure W10.7 — Six-panel workflow: purified eGFP → trypsin digest → HPLC column with eluting peptide peaks → mass spec → spectrum → peptide ID table → sequence coverage map.

Why trypsin

Trypsin is a serine protease that cleaves the peptide bond on the C-terminal side of K (lysine) or R (arginine) — unless the next residue is proline. The K-P / R-P exception comes from proline’s geometry: its side chain locks the backbone into a kink that’s a poor fit for trypsin’s active site.

Figure W10.9 — Cartoon of trypsin’s active site cleaving at K-X (success) vs K-P (failure, blocked by proline’s ring).

For the HTGAA eGFP construct, the lysine and arginine count comes out to 20 K + 6 R = 26 K/R residues. Scanning the sequence for K-P or R-P motifs: there are none. So all 26 sites are cleavable, and the C-terminal residue isn’t a K or R, so there’s no terminal cleavage to worry about.

Predicted peptide count = cleavage sites + 1 = 27 peptides (zero missed cleavages).
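The cleavage rule is short enough to write out. The sketch below is one way to do an in-silico digest with zero missed cleavages; it is not the script used to produce this page's table. The sequence string is the 247-residue construct reassembled from the peptide table, since the text elides its middle.

```python
# eGFP-LE-6xHis construct, reassembled from the peptide table on this page.
SEQ = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGK"
    "LPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYK"
    "TRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADK"
    "QKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK"
    "DPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)

def tryptic_digest(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # C-terminal remainder that does not end in K or R
        peptides.append(seq[start:])
    return peptides

peptides = tryptic_digest(SEQ)
blocked = [i + 1 for i in range(len(SEQ) - 1) if SEQ[i] in "KR" and SEQ[i + 1] == "P"]

print("K/R sites       :", sum(SEQ.count(x) for x in "KR"))
print("K-P / R-P sites :", blocked)
print("peptides        :", len(peptides))
print("C-terminal piece:", peptides[-1])
```

On a toy input such as `tryptic_digest("AKPGRS")` the K-P bond is skipped and the result is `["AKPGR", "S"]`, which is the proline exception in action.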

Figure W10.8 — eGFP sequence with K residues colored one shade and R residues another; all 26 cleavage sites highlighted; tag region (LEHHHHHH) shaded separately.

The peptide ladder for our eGFP

A Python tryptic digest of the assignment sequence gives the following 27 peptides, with their predicted singly-protonated monoisotopic masses ([M+H]⁺):

#	Position	Length	[M+H]⁺ (Da)	Sequence
P1	1–4	4	464.25	MVSK
P2	5–27	23	2437.26	GEELFTGVVPILVELDGDVNGHK
P3	28–42	15	1503.66	FSVSGEGEGDATYGK
P4	43–46	4	474.33	LTLK
P5	47–53	7	769.39	FICTTGK
P6	54–74	21	2378.26 (or 2358.23 mature)	LPVPWPTLVTTLTYGVQCFSR ★
P7	75–80	6	790.36	YPDHMK
P8	81–86	6	821.39	QHDFFK
P9	87–97	11	1266.58	SAMPEGYVQER
P10	98–102	5	655.38	TIFFK
P11	103–108	6	711.29	DDGNYK
P12	109–110	2	276.17	TR
P13	111–114	4	446.26	AEVK
P14	115–123	9	1050.52	FEGDTLVNR
P15	124–127	4	502.32	IELK
P16	128–132	5	579.31	GIDFK
P17	133–141	9	982.50	EDGNILGHK
P18	142–157	16	1973.91	LEYNYNSHNVYIMADK
P19	158–159	2	275.17	QK
P20	160–163	4	431.26	NGIK
P21	164–167	4	507.29	VNFK
P22	168–169	2	288.20	IR
P23	170–210	41	4472.18	HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK
P24	211–215	5	602.28	DPNEK
P25	216–216	1	175.12	R
P26	217–239	23	2566.29	DHMVLLEFVTAAGITLGMDELYK
P27	240–247	8	1083.50	LEHHHHHH

★ P6 contains the chromophore-forming tripeptide T-Y-G. In a mature-eGFP digest, P6’s observed mass will be ~20 Da lighter (2358.23 Da) than the unmodified prediction — direct peptide-level evidence of chromophore maturation. If the protein is correctly folded and fluorescent, P6 carries the same −20 Da signature we saw on the intact protein, but at peptide resolution.
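The [M+H]⁺ column can be reproduced from monoisotopic residue masses: sum the residues, add one water for the termini, add one proton. The sketch below does this for three rows of the table and applies the monoisotopic maturation loss to P6. The mass constants are standard values typed in by hand, and the function name is an illustrative choice.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565       # monoisotopic H2O
PROTON = 1.007276
MATURATION = 20.026215  # H2O plus two H lost on chromophore formation

def mh_plus(peptide):
    """Singly protonated monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

p6 = "LPVPWPTLVTTLTYGVQCFSR"
for name, pep in [("P1", "MVSK"), ("P6", p6), ("P27", "LEHHHHHH")]:
    print(f"{name:<4}{mh_plus(pep):10.2f}  {pep}")
print(f"P6 mature {mh_plus(p6) - MATURATION:10.2f}")
```

If the digest is preceded by iodoacetamide alkylation, the cysteine-containing peptides (P5 and P6) would each sit about 57.02 Da higher than these unmodified values.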

What PeptideMass will actually report

ExPASy PeptideMass has default settings that filter the list, typically a minimum [M+H]⁺ around 500 Da and the option of 0 vs 1 missed cleavages. Under strict defaults (≥500 Da, 0 missed cleavages), the eight peptides below 500 Da (P1, P4, P12, P13, P19, P20, P22, P25) are filtered out, leaving 19 displayed peptides; P15 and P21 only just clear the cutoff. The exact displayed count depends on Figure 4’s parameter choices in the homework.
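Applying a mass cutoff to the table is a one-line filter. The masses below are copied from the peptide table; the 500 Da threshold is the approximate default quoted above, so treat it as an assumption about the tool's settings, not a documented constant.

```python
# [M+H]+ values for P1..P27, copied from the peptide table (Da).
MH_PLUS = [
    464.25, 2437.26, 1503.66, 474.33, 769.39, 2378.26, 790.36, 821.39, 1266.58,
    655.38, 711.29, 276.17, 446.26, 1050.52, 502.32, 579.31, 982.50, 1973.91,
    275.17, 431.26, 507.29, 288.20, 4472.18, 602.28, 175.12, 2566.29, 1083.50,
]
MIN_MASS = 500.0  # assumed display cutoff

table = {f"P{i}": m for i, m in enumerate(MH_PLUS, start=1)}
kept = [name for name, m in table.items() if m >= MIN_MASS]
dropped = [name for name, m in table.items() if m < MIN_MASS]
borderline = [name for name, m in table.items() if MIN_MASS <= m < MIN_MASS + 10]

print(f"kept {len(kept)} of {len(table)}")
print("dropped   :", ", ".join(dropped))
print("borderline:", ", ".join(borderline))
```

Moving `MIN_MASS` by a few daltons changes the count, because two peptides sit within 10 Da of the cutoff.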

Reading the chromatogram

The total ion chromatogram (TIC) plots total ion intensity (y) against retention time (x). Each peak corresponds to one chromatographic event — a peptide (or co-eluting peptides) reaching the mass spec from the HPLC column at a particular time. The “10% relative abundance” filter means: take the tallest peak in the chromatogram, call it 100%, and count only peaks ≥10% of that height.

Figure W10.10 — Schematic TIC with peaks of varying heights and a horizontal dashed line at 10% of the tallest. Peaks above counted; below excluded. Time window 0.5–6 min annotated.

Why the predicted count rarely matches the observed peak count

The TIC peak count almost never matches the in-silico digest count exactly. Both directions of mismatch are common, and understanding the direction of the mismatch tells you what to look for:

Fewer chromatographic peaks than predicted peptides:

Very small peptides (the 1–2 residue stubs) elute in the solvent front, unresolved.
Very hydrophilic peptides don’t retain on reverse-phase C18 and also elute together at the front.
Peptides outside the m/z scan range (typically 50–2000 m/z on the BioAccord) are not detected.
Co-elution merges two peptides into a single chromatographic peak.

More chromatographic peaks than predicted:

Missed cleavages — partial digestion creates peptides spanning multiple expected pieces.
Modifications — methionine oxidation (+16 Da), asparagine deamidation (+1 Da), cysteine carbamidomethylation (+57 Da, from iodoacetamide alkylation pre-digest) — each shifts a peptide’s mass and appears as a separate peak.
In-source fragmentation — energetic ionization clips peptides, producing daughter ions at separate m/z.

Identifying a peptide from its mass spectrum

Given a single peak in the LC-MS spectrum at some m/z, identification is two steps.

Step 1 — charge from isotope spacing. Zoom on the peak. For peptides (much smaller than the intact protein, much lower charge), the BioAccord QTof comfortably resolves the isotope envelope. Measure the m/z spacing between adjacent isotope peaks and apply z = 1.00336 / Δ(m/z).

Figure W10.11 — Isotope envelopes at z = 1 (Δ ≈ 1.0 m/z), z = 2 (Δ ≈ 0.5), z = 3 (Δ ≈ 0.33). Reusable for any future peptide-MS work.

Step 2 — [M+H]⁺ from m/z and z. Convert the observed m/z to the singly-protonated mass (the form PeptideMass reports):

[M+H]⁺ = (m/z × z) − (z − 1) × H, where H ≈ 1.00728 Da

For z = 1, [M+H]⁺ = m/z. For z = 2, [M+H]⁺ = 2(m/z) − 1.00728. For z = 3, [M+H]⁺ = 3(m/z) − 2.01456.

Compare the observed [M+H]⁺ to the predicted peptide table (above), find the closest match, and compute the ppm error:

ppm = (observed [M+H]⁺ − theoretical [M+H]⁺) / theoretical × 10⁶

Confidence calls: <5 ppm is a confident match on the BioAccord, 5–50 ppm is probable but worth checking calibration, >50 ppm is either the wrong ID or an unexpected modification.
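The two identification steps chain together as below. The observed peak (m/z and isotope spacing) is a made-up example, and only a handful of rows from the peptide table are included to keep the sketch short.

```python
H = 1.00728                # proton mass, Da
ISOTOPE_SPACING = 1.00336  # Da, 13C minus 12C

# A few predicted [M+H]+ values from the peptide table (Da).
PREDICTED = {
    "P3 FSVSGEGEGDATYGK": 1503.66,
    "P9 SAMPEGYVQER": 1266.58,
    "P14 FEGDTLVNR": 1050.52,
    "P17 EDGNILGHK": 982.50,
}

def charge_from_spacing(delta_mz):
    """Step 1: charge from the spacing between adjacent isotope peaks."""
    return round(ISOTOPE_SPACING / delta_mz)

def to_mh_plus(mz, z):
    """Step 2: convert an observed m/z at charge z to the singly protonated mass."""
    return mz * z - (z - 1) * H

def identify(mz, delta_mz):
    z = charge_from_spacing(delta_mz)
    mh = to_mh_plus(mz, z)
    name, theoretical = min(PREDICTED.items(), key=lambda kv: abs(kv[1] - mh))
    ppm = (mh - theoretical) / theoretical * 1e6
    return name, z, mh, ppm

# Hypothetical observation: a peak at m/z 633.795 with isotopes 0.502 m/z apart.
name, z, mh, ppm = identify(633.795, 0.502)
print(f"z = {z}, [M+H]+ = {mh:.3f} Da, closest = {name}, error = {ppm:+.1f} ppm")
```

The table values are rounded to 0.01 Da, which at about 1,000 Da is already several ppm, so a real ppm call should be made against unrounded theoretical masses.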

Sequence coverage and its limits

After all confidently-identified peptides are mapped back onto the eGFP sequence, sequence coverage is the fraction of residues covered by at least one identified peptide:

Coverage = (residues covered / total residues) × 100%

For a single-protease tryptic digest of a ~250-aa protein like ours, >80% coverage is good; >95% usually requires a second protease (Glu-C, chymotrypsin, Lys-C) to fill gaps. Coverage, however, is necessary but not sufficient to confirm identity. A point mutation in an uncovered region is invisible to peptide mapping. The coverage map tells you what you checked, not what’s true about the uncovered region.
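Coverage is a set calculation over residue positions. In the sketch below the "identified" list is hypothetical: it assumes every peptide of at least 500 Da from the table was confidently matched and every smaller one was missed, which will not be exactly what a real run returns.

```python
PROTEIN_LENGTH = 247

# Hypothetical identified peptides as (start, end), 1-based and inclusive,
# using positions from the peptide table.
IDENTIFIED = [
    (5, 27), (28, 42), (47, 53), (54, 74), (75, 80), (81, 86), (87, 97),
    (98, 102), (103, 108), (115, 123), (124, 127), (128, 132), (133, 141),
    (142, 157), (164, 167), (170, 210), (211, 215), (217, 239), (240, 247),
]

def coverage(length, ranges):
    """Return (percent covered, sorted list of uncovered positions)."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    gaps = sorted(set(range(1, length + 1)) - covered)
    return 100.0 * len(covered) / length, gaps

percent, gaps = coverage(PROTEIN_LENGTH, IDENTIFIED)
p6_covered = not any(54 <= pos <= 74 for pos in gaps)

print(f"coverage: {percent:.1f}%  ({len(gaps)} residues uncovered)")
print("P6 (chromophore peptide) covered:", p6_covered)
```

Using a set means overlapping peptides, for example from missed cleavages or a second protease, are not double counted.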

For an eGFP confirmation specifically: high coverage that includes P6 (the chromophore-containing peptide) is strong evidence the protein is eGFP; high coverage that excludes P6 is suspicious, because the chromophore region is the protein’s signature.

Figure W10.12 — Sequence coverage map: protein bar with colored stripes showing covered regions and gray gaps for uncovered residues.

Figure W10.27 — All 27 predicted tryptic peptides mapped onto the 247-aa eGFP-LE-6×His sequence, labeled by peptide number. P6 (chromophore-containing) is highlighted in orange — its presence in the observed map confirms the protein is eGFP, not a mass mimic. The 6×His tag (P27) is the purification handle.

The fragment-ion bonus

MS/MS fragmentation breaks the peptide backbone at peptide bonds, producing b-ions (N-terminal fragments retaining charge) and y-ions (C-terminal fragments retaining charge). Mass differences between adjacent b-ions (or y-ions) equal residue masses, so a fragmentation spectrum literally spells out the sequence one residue at a time. ExPASy’s FragIonServlet predicts the b/y-ion ladder for a given peptide; matching the predicted ladder against the observed spectrum confirms the peptide identity and, in turn, that the protein is eGFP.
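A b/y ladder is a running sum of residue masses from each end. The sketch below computes singly charged b- and y-ions for one short peptide from the table (P10, TIFFK); it is a minimal illustration of the arithmetic, not a reimplementation of any prediction tool, and it ignores neutral losses and multiply charged fragments.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276

def precursor_mh(peptide):
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

def by_ions(peptide):
    """Singly charged b-ions (N-terminal) and y-ions (C-terminal), shortest first."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

peptide = "TIFFK"
b, y = by_ions(peptide)
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mz:9.4f}    y{i} {y_mz:9.4f}")
```

Two quick sanity checks fall out of the definitions: the gap between consecutive b-ions is a residue mass, and each complementary pair b(i) + y(n − i) sums to the precursor [M+H]⁺ plus one proton.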

Native MS: shape from charge state

Mass spectrometry can do more than measure mass. In native mode, the same instrument reveals quaternary structure — whether the protein is folded, whether it’s in an oligomeric complex, whether it’s intact under physiological-ish conditions. The trick is in the solvent and the charge.

What native vs denatured means

When a protein is folded into its native state, it’s a compact 3D shape: a tight, organized structure where every amino acid is in its predetermined place. For eGFP, this is the β-barrel with the chromophore tucked inside. When it’s denatured, the chain unfolds into something closer to a long floppy string, with all residues exposed to solvent. The chromophore stays covalently intact (the −20 Da shift persists), but without the barrel shielding it, it is quenched and the protein no longer fluoresces.

The chemical bonds are intact in both states. The molecular weight is unchanged. But the spectrum looks completely different.

Figure W10.24 — Native eGFP as compact β-barrel with green chromophore vs denatured eGFP as long unfolded string. Same atoms, same bonds, same mass — different shape.

Why the spectrum changes

The number of charges a protein picks up during ESI depends on its shape. A folded protein is compact: it presents little surface area, and under the charge residue model the charge it inherits is capped by what a droplet roughly its own size can hold, so only a subset of its basic sites (K, R, H, N-terminal amine) end up protonated. A folded ~28 kDa protein typically picks up 8–14 protons. In an unfolded protein, every basic residue is exposed; the same 28 kDa molecule picks up 15–30 protons. (Charge ranges are typical orders of magnitude; exact values depend on solvent, source conditions, and protein identity; see Heck, Nat. Methods 2008, the canonical native-MS review.)

More protons → lower m/z (z is in the denominator). Fewer protons → higher m/z. The result:

Feature	Denatured	Native
Number of charge states	Many (10–20 peaks)	Few (3–6 peaks)
m/z range	Broad (800–1800 for ~28 kDa)	Narrow, shifted high (2000–3500)
Charge per molecule	High (z = 15–30)	Low (z = 8–14)

The narrow envelope at high m/z is the signature of a folded protein. This is why native MS is a structural biology tool, not just a mass-measuring tool.

Figure W10.25 — Two stacked spectra: top, denatured envelope across m/z = 800–1800; bottom, native narrow envelope at m/z = 2400–3200. Same protein, same mass, very different m/z positions.

Worked example: the peak at ~2800 m/z

In the assignment’s Figure 3 (zoomed native eGFP spectrum), the peak at ~2800 m/z has a charge that we can solve for directly, because we already know the mass from the intact analysis. From the m/z formula with M = 27,986.6 Da (mature):

z = M / (m/z − H) = 27,986.6 / (2800 − 1.00728) ≈ 10

So the ~2800 m/z peak is the z = 10 charge state of native, folded eGFP. Confirm by adjacent-peak spacing — the z = 9 peak should be at ~3110 m/z and the z = 11 peak at ~2545 m/z. If those positions match the figure, z = 10 is locked in.
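The worked example reduces to two rearrangements of the m/z formula, sketched here with the mature theoretical mass and the approximate 2800 m/z read-off from the text.

```python
H = 1.00728       # proton mass, Da
MATURE = 27986.6  # Da, mature eGFP theoretical average mass

def charge_from_known_mass(mass, mz):
    """Solve m/z = (M + z*H) / z for z when M is already known."""
    return mass / (mz - H)

def mz_for_charge(mass, z):
    return mass / z + H

z_raw = charge_from_known_mass(MATURE, 2800.0)
z = round(z_raw)
neighbours = {k: mz_for_charge(MATURE, k) for k in (z - 1, z, z + 1)}

print(f"raw z = {z_raw:.3f} -> z = {z}")
for k, value in neighbours.items():
    print(f"z = {k:>2}: expected m/z {value:7.1f}")
```

How close `z_raw` lands to an integer is itself a check: a value far from a whole number would suggest the peak belongs to a different species or that the read-off is poor.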

Figure W10.26 — Annotated zoom on the native spectrum: ~2800 m/z peak labeled z = 10, with adjacent peaks labeled z = 9 and z = 11 at expected positions.

The conceptual point worth holding on to: native MS lets you ask “is my protein folded?” with a mass spectrometer. The answer is in the charge distribution, not the mass.

A practical trick — charge-reduction reagents. For small complexes that still pick up too many charges to resolve cleanly, you can add triethylammonium acetate (TEAA) or imidazole to the spray buffer at low millimolar concentrations. These compete with the protein for protons and lower the average charge state, shifting peaks to higher m/z. The same complex that gave a broad, overlapping envelope without TEAA can come out as a clean narrow envelope with it. Worth knowing when a native spectrum looks frustratingly noisy at first try — the fix may be additive chemistry, not a different instrument.

Charge detection MS: scaling to megadalton complexes

The charge-state ladder method has a hard upper limit. At very high masses (>1 megadalton), the peaks pile up too closely together to resolve, and the deconvolution math breaks. For complexes like KLH (keyhole limpet hemocyanin, an oxygen-transport protein from a sea snail) that reach 16 megadaltons, roughly 570× the mass of mature eGFP, ordinary ESI-MS gives an unresolved blob with no rungs to count.

Charge detection mass spectrometry (CDMS) solves this by measuring each ion individually. Instead of inferring mass from the m/z positions of a population, CDMS records both the m/z and the charge z of a single ion, then multiplies them to get that ion’s mass. Repeat across thousands of single-ion events and you build up a histogram on a true mass axis, with no charge-state deconvolution required.
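The per-ion arithmetic is sketched below: each event contributes its own m/z and its own measured charge, and the mass follows from the same m/z formula used earlier. The four events are invented for illustration, and binning to 0.1 MDa is an arbitrary choice.

```python
from collections import Counter

H = 1.00728  # proton mass, Da

def ion_mass_da(mz, z):
    """Neutral mass of one ion from its own m/z and charge."""
    return z * (mz - H)

# Hypothetical single-ion events as (m/z, charge). Not real data.
events = [(20001.0, 400), (20001.0, 170), (19513.2, 410), (18889.9, 180)]

# Bin each ion's mass to the nearest 0.1 MDa.
histogram = Counter(round(ion_mass_da(mz, z) / 1e5) / 10 for mz, z in events)

for mass_mda, count in sorted(histogram.items()):
    print(f"{mass_mda:5.1f} MDa: {count} ions")
```

The first two events share the same m/z but differ in charge, so they land in different mass bins. A population m/z spectrum could not separate them, which is the case single-ion charge measurement is built for.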

Figure W10.14 — Side-by-side: clean charge ladder for a small protein vs unresolved blob for a megadalton complex. When peaks merge, we can’t measure them.

Figure W10.15 — Three-stage CDMS schematic: single ion measured for m/z and charge → counter ticks up over thousands of ions → finished mass spectrum on a true mass axis.

Reading KLH on a mass axis

KLH is built from subunits — 7FU subunits (340 kDa) and 8FU subunits (400 kDa, where “FU” denotes oxygen-binding functional units). The subunits stack into hollow cylinders called decamers, then stacks of decamers (didecamers, 3-decamers, 4-decamers). Multiplying out:

Stack	Subunit	Number	Mass
7FU decamer	340 kDa	10	3.4 MDa
8FU didecamer	400 kDa	20	8.0 MDa
8FU 3-decamer	400 kDa	30	12.0 MDa
8FU 4-decamer	400 kDa	40	16.0 MDa

Reading the CDMS spectrum is then trivial: find peaks at 3.4, 8.0, 12.0, and 16.0 MDa, label each with the matching oligomer.
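The subunit arithmetic and the peak labelling can be sketched together. Subunit masses and copy numbers come from the table above; the observed peak positions and the 0.3 MDa matching tolerance are hypothetical.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (label, subunit type, number of subunits), from the table above.
ASSEMBLIES = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]

expected_mda = {name: SUBUNIT_KDA[sub] * n / 1000 for name, sub, n in ASSEMBLIES}

def label_peak(peak_mda, tolerance_mda=0.3):
    """Name the closest expected assembly, or None if nothing is within tolerance."""
    name, mass = min(expected_mda.items(), key=lambda kv: abs(kv[1] - peak_mda))
    return name if abs(mass - peak_mda) <= tolerance_mda else None

# Hypothetical peak positions read from a mass-axis spectrum (MDa).
for peak in (3.5, 8.1, 5.5, 15.9):
    print(f"{peak:5.1f} MDa -> {label_peak(peak)}")
```

Returning `None` for an unmatched peak keeps an unexpected species, such as a partially assembled or mixed-subunit stack, from being silently forced onto the nearest label.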

Figure W10.13 — KLH assembly cartoon: subunit → decamer → didecamer → multi-decamer towers, with mass scale bar from 400 kDa to 16 MDa.

Figure W10.16 — Mass-axis cheat-sheet showing the four target masses with subunit math next to each.

The lineage worth knowing: charge detection MS was first demonstrated by Fuerstenau and Benner in the mid-1990s, and the high-resolution electrostatic-trap form was developed largely in the Jarrold lab (Indiana University). An Orbitrap-based variant (related approaches go by “individual ion MS” and the commercial name “Direct Mass Technology”) is described by the Heck lab (Utrecht) in Wörner et al. Nat. Methods 2020, on this week’s reading list. The electrostatic-trap and Orbitrap-based approaches are mechanistically distinct instrument architectures with the same conceptual logic: measure single ions individually rather than infer mass from population statistics. The Jarrold setup uses a true electrostatic ion trap with a charge-detection cylinder; the Orbitrap variant reads single-ion charge from the image-current signal inside a standard Orbitrap analyzer. Orbitrap is a Thermo Fisher analyzer, and Waters’ own CDMS instrumentation comes from the electrostatic-trap lineage, so check which instrument produced the homework spectrum before attributing it to either.

Pitfalls, controls, and how to know it worked

Half of mass spectrometry is knowing what can go wrong. Eight pitfalls worth keeping in mind whenever you interpret a spectrum:

The chromophore-maturation gotcha. A 20 Da gap between predicted and measured mass on a ~28 kDa protein looks alarming until you remember it’s exactly the chromophore-maturation shift. Always compare against both the unmodified and the mature theoretical values.
Read-off error vs instrumental error. When you read m/z values off a printed figure, you introduce uncertainty that the instrument itself didn’t have. A real intact-MS measurement of eGFP on an Xevo G3 QTof typically lands within ±5 ppm of theoretical; a hand-read deconvolution from a printed figure can easily come in at tens to a hundred ppm. The ppm number is real, but the source of the error is the reader, not the instrument.
Q-TOF resolution limits intact-protein charge readout. A 28 kDa protein on a Q-TOF doesn’t usually give resolved isotope peaks at typical denatured charge states; the spectrum looks like a smooth envelope, and charge has to come from the adjacent-peak method, not the isotope spacing. Don’t expect to read charge straight off a zoomed intact-protein peak unless you’re on an Orbitrap or FTICR.
Sequence coverage is necessary but not sufficient. A point mutation in an uncovered region is invisible to peptide mapping. >80% coverage from a tryptic digest is good practice, but it’s not proof; coverage that excludes the chromophore-containing peptide P6 is especially suspicious for an eGFP claim.
Missed cleavages and PTMs proliferate peaks. Real chromatograms show more peaks than the in-silico digest predicts because partial digestion (peptides spanning expected cuts) and modifications (Met oxidation, Asn deamidation, Cys carbamidomethylation from IAA alkylation) create extra peptide species. None of these are wrong-protein signals — they’re variants of the right protein.
Native vs denatured solvent matters. If you submit a folded protein in a denaturing solvent (50% MeCN + 0.1% formic acid), it’ll unfold during electrospray and you’ll see a denatured-style ladder. If you want native MS, you have to submit in ammonium acetate at physiological pH. The instrument doesn’t know what state you intended.
Adduct ions look like extra peaks. Real intact-protein spectra often show small extra peaks shifted +22 Da (Na⁺ adduct), +38 Da (K⁺ adduct), or various phosphate/sulfate-buffer adducts off the main protein peak. These are not a different protein — they’re the same protein with non-covalent counter-ions stuck to it. The fix is sample prep: desalt rigorously (C18 ZipTip cleanup before LC-MS, or extensive buffer exchange into ammonium acetate for native MS). If adducts are visible in the spectrum, deconvolute the main peak rather than the adducted ones, and report the adduct-free mass.
Glycoproteins need de-glycosylation first. eGFP from E. coli isn’t glycosylated, so this doesn’t bite us here. But many engineered proteins from eukaryotic expression systems (CHO cells, HEK293, yeast) carry N-glycans that show up as mass heterogeneity — a smear of peaks at +162, +203, +291 Da, etc., corresponding to added monosaccharide units. Standard fix: treat the sample with PNGase F (peptide-N-glycosidase F) to remove N-linked glycans before LC-MS. The de-glycosylated protein gives a single clean peak that matches the predicted sequence MW. Skip this step on a glycoprotein and the ppm comparison falls apart even though the synthesis was successful.
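
Several of the pitfalls above reduce to the same arithmetic: take the measured mass, compare it against each plausible theoretical mass, and see which explanation brings the ppm error down to instrument level. The sketch below does that for the two eGFP theoretical values quoted on this page (28,006.6 Da unmodified, 27,986.6 Da mature) plus the nominal shifts named in the list. The function names, the 50 ppm tolerance, and the monosaccharide labels attached to the +162/+203/+291 Da shifts are illustrative assumptions, and the shifts are rounded nominal values rather than exact masses.

```python
# Triage a measured intact mass against the explanations in the pitfalls list.
# Theoretical eGFP masses are the values quoted on this page; shift values are
# rounded nominal masses, good enough for triage but not for final reporting.

THEORY = {"unmodified": 28006.6, "mature chromophore": 27986.6}  # Da

KNOWN_SHIFTS = {  # Da, approximate
    "Na+ adduct": 22.0,
    "K+ adduct": 38.0,
    "hexose": 162.0,
    "HexNAc": 203.0,
    "sialic acid": 291.0,
}


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def triage(measured: float, tol_ppm: float = 50.0) -> list[tuple[str, float]]:
    """Return (explanation, ppm error) for every candidate within tol_ppm."""
    hits = []
    for form, base in THEORY.items():
        err = ppm_error(measured, base)
        if abs(err) <= tol_ppm:
            hits.append((form, err))
        for name, shift in KNOWN_SHIFTS.items():
            err = ppm_error(measured, base + shift)
            if abs(err) <= tol_ppm:
                hits.append((f"{form} + {name}", err))
    return hits


# Hypothetical measured masses, for illustration only.
for measured in (27986.9, 28008.6, 27950.0):
    matches = triage(measured)
    label = "; ".join(f"{n} ({e:+.1f} ppm)" for n, e in matches) or "unexplained"
    print(f"{measured:.1f} Da -> {label}")
```

The tolerance is deliberately looser than the ±5 ppm an instrument delivers, so that hand-read values from a printed figure still match. An empty result is the useful outcome here: it means none of the routine explanations fit and the spectrum deserves a closer look.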

Applying the stack: a real-project measurement plan

Mass spectrometry is one layer in a larger stack. For a real engineered platform — say, the Cholera Shield project, where B. subtilis spores display anti-cholera-toxin VHH nanobodies and GM1-mimic decoys via CotB/CotC coat-protein fusions — no single technique answers “did it work?” The full plan answers four nested questions, in order, with each layer depending on the layer below working:

Did we assemble the DNA correctly? Colony PCR + Sanger sequencing of the assembly junction + full-plasmid Nanopore (via a service like Plasmidsaurus).
Is the protein on the spore the protein we designed? Intact LC-MS on recombinant VHH (same workflow as the intact-MS section, applied to the nanobody) plus SDS-PAGE Western blot of spore coat extract.
Is it folded, accessible, and binding what it should? Flow cytometry with fluorescently-labeled cholera toxin B-subunit, plus SPR/BLI affinity measurement on purified VHH.
Does the platform actually neutralize cholera toxin? GM1-ELISA inhibition assay plus Vero cell challenge plus germination kinetics in simulated intestinal fluid.

Figure W10.18 — Four-layer measurement-stack pyramid: DNA → protein → fold/surface → function.

The plan splits into two resource scenarios, recognizing that not every lab has core-facility access:

Question	MVP version (any lab)	Full version (core facility / industrial)
Right DNA?	Colony PCR + Sanger junction + Nanopore via service	Same — already universally accessible
Right protein on spore?	Western blot + (optional) MALDI-TOF	Intact LC-MS + Western blot
Folded and accessible?	Bulk plate-reader fluorescence + ELISA dose-response	Flow cytometry + SPR/BLI
Neutralizes toxin?	GM1-ELISA + SIF germination kinetics	Adds Vero challenge + in vivo mouse model

Figure W10.23 — Side-by-side pyramid: MVP stack (cheap techniques, left) vs full stack (core-facility techniques, right).

Bottom line: the MVP stack covers all four questions with ≤$2k consumables and no specialized instruments beyond a plate reader and gel rig. The full stack tightens the answers (real K_D values, quantitative single-cell display, in vivo proof) but doesn’t change which question is being asked at each layer. MVP is sufficient for course scope; for publication or grant proposals, plan core-facility access at least for the protein-identity and surface-display layers.

A forward-looking idea worth noting. CDMS, the single-molecule MS technique we used to weigh KLH oligomers, could in principle be applied to whole B. subtilis spores carrying surface-displayed VHH. Spores are far heavier than even KLH didecamers (a spore sits at gigadalton scale or above), but the single-ion measurement logic doesn’t fundamentally fail at that scale; recent work pushing CDMS toward the gigadalton regime (e.g., for viral capsids and lipid nanoparticles) suggests this may be technically feasible. The potential applications would be estimating per-spore VHH copy number and detecting spore-to-spore display heterogeneity that bulk methods would average over. Not within scope for the current homework, but a real research direction if the Cholera Shield project scales up.

Every protein-level measurement in this plan connects back to Week 10’s content. (Layer 1, DNA verification, uses orthogonal techniques: PCR, Sanger, and Nanopore sequencing, not MS.) Three connections worth flagging: Layer 2’s intact LC-MS is the exact intact-MS workflow described in the top-down section above (theoretical vs measured mass, ppm error); Layer 2’s Western blot is loosely analogous to the bottom-up logic of peptide mapping, since recognizing a small piece of the protein (an antibody epitope rather than a tryptic peptide) is taken as evidence of identity; Layer 3’s flow cytometry is the cellular analogue of native MS, asking “is the protein folded and surface-displayed?” without disrupting the spore. The measurement stack is the recurring theme of this week: each layer answers a different sub-question, and confidence comes from agreement across layers.

Course resources

HTGAA Week 10 page: https://2026a.htgaa.org/2026a/course-pages/weeks/week-10/index.html (assignment source)
ExPASy ProtParam (theoretical MW + pI from sequence): web.expasy.org/protparam
ExPASy PeptideMass (in-silico tryptic digest + peptide [M+H]⁺ list): web.expasy.org/peptide_mass
FragIonServlet (predict b/y-ion fragmentation ladder for a peptide): db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html
ASMS Fundamentals of Mass Spectrometry: asms.org/about-mass-spec/fundamentals-hardware-instrumentation
Native vs Denatured charge state distributions (reading from HTGAA): pmc.ncbi.nlm.nih.gov/articles/PMC7539638

References

The DOI citations above are the working primary literature. Additional facts pulled from outside the four papers:

Cormack BP, Valdivia RH, Falkow S (1996). FACS-optimized mutants of the green fluorescent protein (GFP). Gene 173: 33–38. doi:10.1016/0378-1119(95)00685-0 The original eGFP paper; source of the F64L/S65T mutations relative to wild-type GFP.
Tsien RY (1998). The green fluorescent protein. Annu. Rev. Biochem. 67: 509–544. doi:10.1146/annurev.biochem.67.1.509 Canonical review of GFP chromophore maturation chemistry (cyclization + oxidation, −20 Da).
Royant A, Noirclerc-Savoye M (2011). Stabilizing role of glutamic acid 222 in the structure of Enhanced Green Fluorescent Protein. J. Struct. Biol. 174: 385–390. PMC3473056 Crystal structure used to confirm chromophore positions and orientations.
Konermann L, Ahadi E, Rodriguez AD, Vahidi S (2013). Unraveling the mechanism of electrospray ionization. Anal. Chem. 85: 2–9. doi:10.1021/ac302789c The Chain Ejection Model (CEM) reference cited in the ESI mechanism section.

Last reviewed: 2026-05-26. Figures W10.1–W10.26 are spec’d in notes.md and pending creation. Assignment-supplied Waters Figures 1, 2, 3, 4, 5a, 5b, 5c, 6, 7 are pending insertion from the HTGAA course page. ProtParam-derived eGFP MW values (28,006.6 unmodified / 27,986.6 mature) verified by independent Python calculation cross-checked against published Bio-Techne and FPbase reference values for the bare 239-aa eGFP form.