Week10 HW: Imaging and Measurement
What to measure: Identity, mass, purity, and post-translational modifications of the target protein; concentration of a biomarker; oligomeric state.
How:
- Intact mass by LC-MS (QTof) → confirms overall MW and detects unexpected modifications.
- Peptide mapping by tryptic digest + LC-MS/MS → confirms primary sequence and identifies PTM sites.
- Native MS / CDMS → reveals folded state and oligomeric assembly.
- SDS-PAGE / Western blot → quick purity and identity check before MS.
- UV-Vis (A280) → concentration.
Part I — Molecular Weight of Intact eGFP
Q1. Theoretical MW from sequence
Sequence length: 247 residues (includes LE linker + HHHHHH His-tag).
Calculated average MW ≈ 28,006.6 Da (~28 kDa).
Analogy: counting MW from sequence is like weighing a train by summing the weight of each car — every amino acid adds its known “car weight” minus one water molecule per peptide bond.
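The sum-of-residues idea can be sketched in a few lines. This is a minimal illustration, assuming standard average residue masses; the function name `average_mw` is hypothetical, and because the full construct sequence is not reproduced here the example only evaluates the LE + His₆ tag and a dipeptide sanity check.

```python
# Average residue masses in Da (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528  # Da, added once for the free N- and C-termini


def average_mw(sequence: str) -> float:
    """Average molecular weight of an unmodified linear peptide or protein."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_AVG


# Mass contributed by the LE linker plus His6 tag (residues only, no extra water).
tag_contribution = sum(AVG_RESIDUE_MASS[aa] for aa in "LEHHHHHH")
print(f"LE + His6 adds {tag_contribution:.1f} Da")
print(f"Gly-Gly check: {average_mw('GG'):.2f} Da")
```

Passing the full 247-residue sequence to `average_mw` should reproduce the ~28 kDa value above, up to rounding in the mass table.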
Q2. Deconvolution from Figure 1
Selected adjacent peaks (denatured envelope): m/z = 933.7148 and m/z = 903.7148.
Step 2.1 — Charge of the lower-charge peak (n):
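$$z_n = \frac{(m/z)_{n+1}}{(m/z)_n - (m/z)_{n+1}} = \frac{903.7148}{933.7148 - 903.7148} = \frac{903.7148}{30.0000} \approx 30.12$$
→ rounded to z = 30 (and the adjacent peak is z = 31). Subtracting the proton mass (1.00728) from the numerator gives 30.09, which rounds to the same charge.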
Step 2.2 — MW from m/z and z:
$$MW = z_n \times (m/z_n - 1.00728) = 30 \times (933.7148 - 1.00728) ≈ \mathbf{27{,}981\ Da}$$
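(A charge state must be an integer, so z = 30 is the value to use. Carrying the non-rounded z ≈ 30.12 through gives 28,097 Da, but that only propagates the peak-picking error into the mass.)
Step 2.3 — Accuracy:
$$\text{Accuracy} = \frac{|27{,}981 - 28{,}007|}{28{,}007} \approx 0.09\%$$
That is roughly 900 ppm (about 26 Da) from a single pair of peaks; the non-rounded 28,097 Da figure would be off by ~0.32% (~3,200 ppm). Either error is much larger than the tens of ppm usually expected for intact-protein QTof analysis, so this two-peak estimate is a rough check, and averaging over all charge states in the envelope would tighten it.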
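The adjacent-peak deconvolution in Steps 2.1 to 2.3 can be sketched as follows. This is a minimal example under stated assumptions: the helper names are hypothetical, the charge is rounded to an integer, and the charge estimate uses the proton-corrected form of the Step 2.1 ratio (it rounds to the same z).

```python
PROTON = 1.00728  # Da


def charge_from_adjacent(mz_lower_charge: float, mz_higher_charge: float) -> int:
    """Integer charge of the higher-m/z peak, from two adjacent charge states."""
    estimate = (mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge)
    return round(estimate)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass from m/z and charge for an [M + zH]z+ ion."""
    return z * (mz - PROTON)


def ppm_error(observed: float, theoretical: float) -> float:
    return abs(observed - theoretical) / theoretical * 1e6


z = charge_from_adjacent(933.7148, 903.7148)
mass = neutral_mass(933.7148, z)
error = ppm_error(mass, 28006.6)  # theoretical average MW from Q1
print(f"z = {z}, mass = {mass:.1f} Da, error = {error:.0f} ppm")
```

Running the same functions on every adjacent pair in the envelope and averaging the masses would be the natural next step; that is not done here because only one pair is quoted.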
Q3. Charge state of the zoomed-in peak (~1473 m/z region)
No, the charge state cannot be confidently assigned from the zoom. The zoomed peak shows partially resolved features, but at 30,000 resolution the isotope envelope of a ~28 kDa protein is not baseline-resolved, so adjacent isotope peaks do not show a clean 1/z spacing that can be read off. A ~28 kDa protein near 1473 m/z carries about 19 charges, which puts neighbouring isotope peaks only ~0.05 m/z apart, about the same as the peak width (1473 / 30,000 ≈ 0.05 m/z). Assigning charge from isotope spacing needs resolution high enough to separate individual ¹²C/¹³C isotope peaks (≥60,000–100,000 for a protein this size, ideally on an Orbitrap or FT-ICR).
Analogy: trying to count the steps of a staircase from far away — if your camera (resolution) isn’t sharp enough, the steps blur into a ramp.
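A quick numerical check of the resolution argument, as a sketch only. It assumes the ~1473 m/z peak belongs to the ~28,006.6 Da species (so the charge is inferred, not read from the figure) and treats the peak width as m/z divided by resolving power (FWHM definition).

```python
C13_C12_DELTA = 1.00335  # Da, mass difference between 13C and 12C


def isotope_spacing(z: int) -> float:
    """Spacing between neighbouring isotope peaks on the m/z axis."""
    return C13_C12_DELTA / z


def peak_fwhm(mz: float, resolving_power: float) -> float:
    """Peak width (FWHM) at a given m/z and resolving power."""
    return mz / resolving_power


z_assumed = round(28006.6 / 1473)         # inferred charge, an assumption
spacing = isotope_spacing(z_assumed)
width = peak_fwhm(1473, 30_000)
ratio = spacing / width                   # ~1 means neighbours overlap heavily
print(f"z ~ {z_assumed}, spacing = {spacing:.4f}, FWHM = {width:.4f}, ratio = {ratio:.2f}")
```

A ratio close to 1 means each isotope peak is about as wide as the gap to its neighbour, which is why the envelope looks like a ramp with ripples and not like separate steps.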
Part II — Native vs Denatured (Secondary/Tertiary)
Q1. What changes when a protein unfolds, and how MS sees it
- Native (folded): the protein is compact, so only a few surface residues are accessible to protons during electrospray → it picks up few charges → peaks appear at high m/z, in a narrow charge envelope (e.g., Figure 2 bottom: ~2500–3000 m/z, only ~2–3 charge states).
- Denatured (unfolded): the chain is extended, exposing every basic residue (K, R, H, N-terminus) to protonation → it picks up many charges → peaks span a broad envelope at low m/z (Figure 2 top: ~700–1500 m/z, many charge states).
The mass is the same — only the charge distribution shifts. MS reads the protein’s “shape” indirectly through how many protons it carries.
Analogy: a folded origami crane has only a few outer surfaces to stick stickers (charges) on; unfold it into a flat sheet and you can stick stickers everywhere. The paper hasn’t changed weight, but the sticker count tells you the shape.
Q2. Charge state at ~2800 m/z (native)
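Inset isotope spacing in Figure 3: peaks at 2544.4885, 2544.5801, 2544.6719, 2544.7637, 2544.8552, 2545.0388 → spacing ≈ 0.092 m/z between consecutive peaks (the last gap, 0.184, spans two spacings because one isotope peak is not listed).
$$z = \frac{1}{\Delta(m/z)} = \frac{1}{0.092} \approx \mathbf{11}$$
So the isotopically resolved peak at ~2544 m/z is the +11 charge state, giving 11 × (2544.49 − 1.007) ≈ 27,978 Da, consistent with a compact, native eGFP carrying few protons. If the peak at ~2800 m/z belongs to the same species, it is the neighbouring +10 charge state (27,978 / 10 + 1.007 ≈ 2,799 m/z).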
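The isotope-spacing rule z = 1/Δ(m/z) can be sketched as below. The function name is hypothetical, and using the median gap is a choice made here so that a skipped isotope peak (as in the last gap of the list) does not distort the result.

```python
from statistics import median


def charge_from_isotopes(peaks):
    """Return (integer charge, typical spacing) from a list of isotope peaks."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    spacing = median(gaps)          # robust against one doubled gap
    return round(1.0 / spacing), spacing


native_inset = [2544.4885, 2544.5801, 2544.6719, 2544.7637, 2544.8552, 2545.0388]
z_native, spacing_native = charge_from_isotopes(native_inset)
print(f"spacing = {spacing_native:.4f} m/z, z = {z_native}")
```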
Part III — Peptide Mapping (Primary Structure)
Q1. Lysines and Arginines in eGFP
- K (Lysine): 20
- R (Arginine): 6
- Total K + R cleavage sites: 26
Highlighted in the sequence (K and R in bold):
Q2. Predicted tryptic peptides
In silico digest (no missed cleavages, cleavage after K/R unless followed by P): 27 peptides total. PeptideMass with mass cutoff ≥500 Da returns ~17 peptides (filters out very small fragments like TR, QK, IR, single R).
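The cleavage rule used for the in silico digest (cut after K or R unless the next residue is P, no missed cleavages) can be sketched with a regular expression. This is an illustration, not the PeptideMass implementation: the toy sequence is hypothetical (not eGFP), and `min_length` is a stand-in for the 500 Da mass cutoff.

```python
import re


def tryptic_digest(sequence: str, min_length: int = 1):
    """Split after K/R unless followed by P; drop peptides shorter than min_length.

    Splitting on a zero-width pattern requires Python 3.7 or newer.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [p for p in pieces if len(p) >= min_length]


toy_sequence = "MVSKGEELFTGRPVKTR"  # hypothetical test sequence
peptides = tryptic_digest(toy_sequence)
longer_peptides = tryptic_digest(toy_sequence, min_length=3)
print(peptides)
print(longer_peptides)
```

In the toy sequence the R followed by P is not cleaved, and the short C-terminal peptide disappears once a size filter is applied, which mirrors why 27 predicted peptides shrink to ~17 after the mass cutoff.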
Q3. Chromatographic peaks in TIC (0.5–6 min, >10% abundance)
From Figure 5a, counting peaks above 10% relative intensity in the 0.5–6 min window: ~15–17 peaks. Labeled retention times in that window are 0.61, 0.79, 1.20, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06, and 5.43 min (21 labeled peaks, of which about 15 clearly exceed 10%). The labeled peaks at 0.43, 6.12, 6.50, 6.64, and 6.73 min fall outside the window and are not counted.
Q4. Does peak count match predicted peptides?
The TIC shows fewer peaks than predicted (~15 visible vs 27 predicted). Reasons:
- Very small peptides (TR, QK, IR, R, NGIK) are below the MS detection range or wash out in the dead volume.
- Some peptides co-elute (overlap in retention time).
- Some hydrophilic peptides are not retained on the C18 reversed-phase column.
Q5. Charge of the peak at 525.76712
Isotope spacing in Figure 5b inset: 525.76712 → 526.25918 → 526.76845 → 527.26998. The gaps are 0.492, 0.509, and 0.502 m/z, so the mean spacing is ≈ 0.501 m/z.
$$z = \frac{1}{0.501} \approx \mathbf{2}$$
Singly charged [M+H]⁺:
$$[M+H]^+ = z \times (m/z) - (z-1) \times 1.00728 = 2 \times 525.76712 - 1.00728 = \mathbf{1050.527\ Da}$$
Q6. Peptide identification and mass accuracy
Matching 1050.527 against the PeptideMass output → FEGDTLVNR (residues 115–123 of eGFP, theoretical monoisotopic [M+H]⁺ = 1050.5214 Da).
Mass error in ppm:
$$\text{ppm} = \frac{|1050.527 - 1050.5214|}{1050.5214} \times 10^6 ≈ \mathbf{5.3\ ppm}$$
Excellent accuracy (sub-10 ppm is standard for QTof).
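The Q5 and Q6 arithmetic can be reproduced with a short sketch. Function names are hypothetical; the charge is taken from the mean spacing across the four listed isotope peaks, which assumes no isotope peak is missing from the list.

```python
PROTON = 1.00728  # Da


def charge_from_envelope(peaks):
    """Integer charge from the mean isotope spacing (first to last peak)."""
    mean_gap = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return round(1.0 / mean_gap)


def singly_protonated(mz: float, z: int) -> float:
    """Convert an [M + zH]z+ m/z value to the equivalent [M+H]+ value."""
    return z * mz - (z - 1) * PROTON


def ppm_error(observed: float, theoretical: float) -> float:
    return abs(observed - theoretical) / theoretical * 1e6


inset = [525.76712, 526.25918, 526.76845, 527.26998]
z_pep = charge_from_envelope(inset)
mh = singly_protonated(inset[0], z_pep)
err = ppm_error(mh, 1050.5214)  # theoretical [M+H]+ of FEGDTLVNR
print(f"z = {z_pep}, [M+H]+ = {mh:.4f}, error = {err:.1f} ppm")
```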
Q7. Sequence coverage
From Figure 6: 88% of the eGFP sequence is confirmed by peptide mapping.
Bonus Q8. Sequence from fragmentation (Figure 5c)
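Fragment masses: 122.07, 214.09, 388.22, 501.31, 602.35, 537.25, 774.41, 903.44, 1050.52. Five of them match the singly charged y-ion series of FEGDTLVNR: y3 (VNR) = 388.23, y4 (LVNR) = 501.31, y5 (TLVNR) = 602.36, y7 (GDTLVNR) = 774.41, and y8 (EGDTLVNR) = 903.45; 1050.52 is the intact [M+H]⁺. Stepping down the ladder from 1050.52 gives mass differences of 147.08 (F), 129.03 (E), 172.06 (G + D), 101.04 (T), and 113.09 (L), which reads out the N-terminal part of the sequence; y1 (R) = 175.12 and y2 (NR) = 289.16 are the expected values for the remaining C-terminal ions. The peaks at 122.07, 214.09, and 537.25 are not explained by this y-series (the phenylalanine immonium ion is expected at 120.08, not 122.07) and are left unassigned. The y-ion ladder confirms FEGDTLVNR.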
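One way to check the fragment list against the proposed sequence is to generate the theoretical singly charged y-ions and match them within a tolerance. This is a sketch: the function names and the 0.02 m/z tolerance are assumptions, only the residues needed for this peptide are tabulated, and b-ions and immonium ions are not considered.

```python
# Monoisotopic residue masses (Da) for the residues in FEGDTLVNR.
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def y_ions(peptide: str):
    """Singly charged y-ion m/z values, ordered y1, y2, ... up to the full peptide."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide):
        running += MONO[aa]
        ions.append(running)
    return ions


def match_y(observed, theoretical, tol=0.02):
    """Map each matched observed mass to its y-ion index (1-based)."""
    return {
        obs: i + 1
        for obs in observed
        for i, theo in enumerate(theoretical)
        if abs(obs - theo) <= tol
    }


observed = [122.07, 214.09, 388.22, 501.31, 602.35, 537.25, 774.41, 903.44, 1050.52]
hits = match_y(observed, y_ions("FEGDTLVNR"))
unmatched = [m for m in observed if m not in hits]
print(hits)
print(unmatched)
```

The last entry of the y-series is the full peptide, so a hit at index 9 corresponds to the precursor [M+H]⁺.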
Bonus Q9. Does the data confirm eGFP?
Yes. 88% sequence coverage + a confirmed peptide (FEGDTLVNR) with <10 ppm mass error + matching fragmentation pattern is strong identification. The unobserved 12% is typical (very short peptides or unretained hydrophilic peptides) and doesn’t undermine the ID.
Analogy: it’s like recognizing a friend from 88% of their face uncovered — you don’t need every feature, you just need enough unique landmarks.
Part IV — KLH Oligomers (CDMS)
Using subunit masses: 7FU = 340 kDa, 8FU = 400 kDa, and “Decamer” = 10 subunits.
| Oligomer | Composition | Expected mass | Peak in Figure 7 |
|---|---|---|---|
| 7FU Decamer | 10 × 340 kDa | 3.4 MDa | peak at 3.4 MDa |
| 8FU Didecamer | 20 × 400 kDa | 8.0 MDa | peak at ~7.52 MDa (close to 8.0; some mass loss possible) |
| 8FU 3-Decamer | 30 × 400 kDa | 12.0 MDa | peak at 12.67 MDa |
| 8FU 4-Decamer | 40 × 400 kDa | 16.0 MDa | peak around ~16 MDa (small/absent) |
The prominent 4.013 MDa peak is likely an 8FU decamer (10 × 400 kDa = 4.0 MDa). The 0.1982, 0.79, and 1.52 MDa peaks are sub-decameric assemblies or free subunits.
Analogy: CDMS measures each particle individually — like weighing each LEGO build that walks past on a conveyor belt, instead of melting them all and weighing the slag. You see exactly which assemblies exist.
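The expected-mass column can be generated and compared with the peak list in a few lines. This is a sketch using only the subunit masses and peak positions quoted above; the function names are hypothetical and "nearest peak" is a simple closest-value lookup, not a peak assignment algorithm.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}
OBSERVED_MDA = [0.1982, 0.79, 1.52, 3.4, 4.013, 7.52, 12.67]


def expected_mda(subunit: str, n_decamers: int) -> float:
    """Expected mass in MDa for n stacked decamers of the given subunit."""
    return SUBUNIT_KDA[subunit] * 10 * n_decamers / 1000


def nearest_peak(expected: float) -> float:
    return min(OBSERVED_MDA, key=lambda peak: abs(peak - expected))


for subunit, n in [("7FU", 1), ("8FU", 1), ("8FU", 2), ("8FU", 3)]:
    exp = expected_mda(subunit, n)
    peak = nearest_peak(exp)
    deviation = (peak - exp) / exp * 100
    print(f"{subunit} x {10 * n}: expected {exp:.1f} MDa, nearest {peak} MDa ({deviation:+.1f}%)")
```

The percentage deviation makes it easy to see which assignments are tight (the decamers) and which are looser (the didecamer and 3-decamer).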
Part V — Validate GFP?
| | Theoretical | Observed (Intact LC-MS) | PPM Mass Error |
|---|---|---|---|
| Molecular weight (kDa) | 28.007 | ~27.981 with z = 30 (~28.097 with non-rounded z), from deconvolution of Figure 1 | ~900 ppm (~3,200 ppm, 0.32%, with non-rounded z) |
The observed intact mass agrees with the theoretical mass of eGFP+LE+His₆ to within ~0.3%. Combined with 88% peptide map coverage and confirmed FEGDTLVNR fragmentation → yes, this is eGFP.