Week 10 HW: Imaging and Measurement

Homework: Waters Part I — Molecular Weight

An eGFP standard was analyzed on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states.

1. What is the calculated molecular weight?

eGFP Sequence:

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Values obtained from Expasy’s ProtParam tool for the eGFP sequence containing the LE linker and His-purification tag (HHHHHH):

Property	Result
Number of amino acids	247
Average molecular weight	28006.60 Da
Monoisotopic molecular weight	27988.96 Da
Theoretical pI	5.90

I will use the average MW = 28006.60 Da.

However, the mature eGFP chromophore forms by autocatalytic cyclization, which causes a nominal mass loss of approximately H₂O + H₂ ≈ 20.03 Da (dehydration and loss of two hydrogen atoms).

MW_theory = 28006.60 − 20.03
MW_theory = 27986.57 Da
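
The correction can be reproduced in a few lines of Python. This is only a sketch of the arithmetic above: the ProtParam value is taken as given, and the average masses of H₂O and H₂ (18.01528 and 2.01588 Da) are standard values that I am supplying, not numbers from the assignment.

```python
# Average masses in Da (standard values, supplied here as assumptions).
AVG_H2O = 18.01528
AVG_H2 = 2.01588

# Average MW of the full-length sequence as reported by ProtParam.
mw_protparam = 28006.60

# Chromophore maturation: loss of one water (cyclization/dehydration)
# plus two hydrogen atoms (oxidation).
chromophore_loss = AVG_H2O + AVG_H2
mw_theory = mw_protparam - chromophore_loss

print(f"mass loss = {chromophore_loss:.2f} Da")
print(f"MW_theory = {mw_theory:.2f} Da")
```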

2. Calculate the molecular weight using the adjacent charge state approach

I selected the following adjacent charge state peaks from Figure 1:

(m/z)ₙ = 1000.4302
(m/z)ₙ₊₁ = 965.9684

2.1 Determine z for the adjacent pair (n, n+1):

The formula is: z = (m/z)ₙ₊₁ / [ (m/z)ₙ − (m/z)ₙ₊₁ ]

z = 965.9684 / (1000.4302 − 965.9684)
z = 965.9684 / 34.4618
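z = 28.03 ≈ 28

This simplified formula neglects the proton mass in the numerator, which is why the result lands slightly above an integer; using (m/z)ₙ₊₁ − H instead gives z = 28.00. The value obtained is n, the charge of the higher-m/z peak, so the peak at m/z 1000.4302 carries 28 charges and the peak at m/z 965.9684 carries 29.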
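
The same charge assignment can be sketched in Python. The function name and its optional proton argument are my own; with the default of 0 it reproduces the formula used above, and passing H gives the exact expression that follows from (m/z)ₙ = (MW + n·H) / n.

```python
PROTON = 1.0073  # Da, the proton mass used in this write-up

def charge_from_adjacent(mz_n, mz_n1, proton=0.0):
    """Charge n of the higher-m/z peak, given the next peak (n+1) at lower m/z.

    With proton=0.0 this is the simplified formula; passing the proton
    mass gives the exact expression.
    """
    return (mz_n1 - proton) / (mz_n - mz_n1)

mz_n, mz_n1 = 1000.4302, 965.9684  # adjacent peaks picked from Figure 1

z_simple = charge_from_adjacent(mz_n, mz_n1)
z_exact = charge_from_adjacent(mz_n, mz_n1, proton=PROTON)

print(f"simplified: z = {z_simple:.2f}")
print(f"exact:      z = {z_exact:.2f}")
print(f"assigned charge n = {round(z_exact)}")
```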

2.2 Determine the MW of the protein:

Using the relationship (m/z)ₙ = (MW + n·H) / n, where H ≈ 1.0073 Da (mass of a proton), solving for MW gives:

MW = n · (m/z)ₙ − n · H

Using n = 28 and (m/z)ₙ = 1000.4302:

MW = 28 × (1000.4302) − 28 × (1.0073)
MW = 28012.05 − 28.20
MW ≈ 27983.84 Da

As a check, applying the same formula to the adjacent peak with n+1 = 29 and (m/z)ₙ₊₁ = 965.9684:

MW = 29 × (965.9684) − 29 × (1.0073)
MW ≈ 27983.87 Da

Average experimental MW:

MW_exp = (27983.84 + 27983.87) / 2
MW_exp ≈ 27983.86 Da
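
As a cross-check on the hand calculation, here is a minimal Python sketch that applies MW = n · (m/z)ₙ − n · H to both peaks and averages the results. The helper name is my own choice.

```python
PROTON = 1.0073  # Da

def neutral_mass(mz, z, proton=PROTON):
    """Neutral mass from an observed m/z and its charge: MW = z*(m/z) - z*H."""
    return z * (mz - proton)

peaks = [(1000.4302, 28), (965.9684, 29)]  # (m/z, charge) from Figure 1
masses = [neutral_mass(mz, z) for mz, z in peaks]
mw_exp = sum(masses) / len(masses)

for (mz, z), m in zip(peaks, masses):
    print(f"z = {z:2d}, m/z = {mz:.4f} -> MW = {m:.2f} Da")
print(f"mean MW_exp = {mw_exp:.2f} Da")
```

Both charge states giving nearly the same mass is itself a useful sanity check on the charge assignment: a wrong n would make the two values disagree by hundreds of Da.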

2.3 Calculate the accuracy of the measurement:

The formula is: Accuracy = | MW_exp − MW_theory | / MW_theory

Accuracy = | 27983.86 − 27986.57 | / 27986.57
Accuracy = 2.71 / 27986.57
Accuracy = 9.68 × 10⁻⁵

Converting to ppm:

ppm error = 9.68 × 10⁻⁵ × 10⁶
ppm error ≈ 97 ppm
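
The ppm calculation is easy to wrap in a small function. This is a sketch with a function name of my own choosing; it reports the unsigned error, as in the formula above.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million (unsigned)."""
    return abs(observed - theoretical) / theoretical * 1e6

mw_exp = 27983.86     # Da, from the adjacent charge state calculation
mw_theory = 27986.57  # Da, ProtParam average MW minus chromophore loss

intact_ppm = ppm_error(mw_exp, mw_theory)
print(f"mass error = {intact_ppm:.0f} ppm")
```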

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP?

Yes. In the zoomed-in inset around m/z ≈ 1473, the isotopic peaks within the envelope are separated by approximately Δ(m/z) ≈ 0.052. Since the isotopic spacing equals 1/z:

z = 1 / 0.052
z ≈ 19

So the charge state of the zoomed-in peak is +19.
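
A short Python sketch of the same reasoning, plus a consistency check: if z = 19 is right, the experimental mass from question 2 should predict a peak near the inset position. The function names are my own, and the 0.052 spacing is my reading of the inset.

```python
PROTON = 1.0073  # Da

def charge_from_isotope_spacing(delta_mz):
    """Isotopologues differ by ~1 Da, so they sit 1/z apart on the m/z axis."""
    return round(1.0 / delta_mz)

def expected_mz(mass, z, proton=PROTON):
    """m/z of the [M + zH]^z+ ion for a given neutral mass."""
    return (mass + z * proton) / z

z = charge_from_isotope_spacing(0.052)  # spacing read from the inset
mz = expected_mz(27983.86, z)           # MW_exp from the adjacent-peak method
print(f"z = {z}, expected m/z = {mz:.2f}")
```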

Homework: Waters Part II — Secondary / Tertiary structure

1. Difference between native and denatured protein conformations

A native protein is folded into its biologically active 3D structure: secondary structure (α-helices, β-sheets) is organized into a tertiary fold, and the hydrophobic residues are buried in the core. When a protein denatures, the non-covalent interactions (hydrogen bonds, hydrophobic packing, salt bridges) break down. The chain opens up and many previously buried basic residues become exposed to the solvent.

On a mass spectrometer this difference shows up as a change in the charge state distribution, not in the mass itself. In native MS (gentle solvents, neutral pH), the protein keeps its compact fold and only a few exposed sites can pick up protons, so the spectrum shows a small number of charge states clustered at high m/z (e.g. around 2500–3000). Under denaturing conditions (organic solvent, low pH), the unfolded chain exposes many more basic residues and picks up many more protons, so the spectrum shifts to a broader distribution of charge states at much lower m/z (e.g. 700–1500).

In Figure 2, the denatured eGFP (top) shows a wide envelope of many closely spaced peaks at low m/z, while the native eGFP (bottom) shows only a few peaks at higher m/z. The total deconvoluted mass is the same — only the protonation pattern differs.
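
To illustrate why the same mass lands in such different m/z windows, the sketch below computes m/z = (MW + z·H) / z for a low and a high range of charge states. The charge ranges are illustrative assumptions of mine, not values read from Figure 2.

```python
PROTON = 1.0073  # Da
MW = 27983.86    # Da, experimental eGFP mass from Part I

def mz_for_charge(mass, z, proton=PROTON):
    """m/z at which the [M + zH]^z+ ion appears."""
    return (mass + z * proton) / z

# Illustrative charge ranges only; these are assumptions, not values
# read from Figure 2.
native_like = range(9, 12)       # few protons, compact fold
denatured_like = range(19, 40)   # many protons, unfolded chain

for label, charges in (("native-like", native_like),
                       ("denatured-like", denatured_like)):
    mzs = [mz_for_charge(MW, z) for z in charges]
    print(f"{label:15s} z = {charges[0]}-{charges[-1]}: "
          f"m/z {min(mzs):.0f} to {max(mzs):.0f}")
```

Because m/z scales roughly as MW / z, neighbouring peaks are also much closer together at high charge, which matches the crowded envelope seen for the denatured protein.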

2. Charge state of the peak at ~2800 m/z in the native spectrum

In Figure 3, the zoomed-in inset around m/z ≈ 2800 shows isotopic peaks separated by approximately Δ(m/z) ≈ 0.1. Since the isotopic spacing equals 1/z:

z = 1 / 0.1
z = 10

So the charge state is approximately +10. This low charge state is consistent with a folded, native conformation in which only a small number of basic residues are exposed and available for protonation.

Homework: Waters Part III — Peptide Mapping (primary structure)

1. How many Lysines (K) and Arginines (R) are in eGFP?

Highlighting K and R in the eGFP sequence:

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Lysines (K): 20
Arginines (R): 6
Total cleavage sites: 26
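
The counts can be checked by pasting the sequence into Python. This is a minimal sketch; the sequence is copied as written above and the spaces are stripped before counting.

```python
# eGFP sequence exactly as written above (whitespace is removed below).
EGFP_RAW = """
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT
GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF
KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV
YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY
LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH
"""
sequence = "".join(EGFP_RAW.split())

n_lys = sequence.count("K")
n_arg = sequence.count("R")
print(f"length = {len(sequence)}")
print(f"K = {n_lys}, R = {n_arg}, K + R = {n_lys + n_arg}")
```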

2. How many peptides will be generated from tryptic digestion of eGFP?

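Trypsin cleaves after K and R (except when followed by P). The eGFP sequence contains no KP or RP motifs, so a complete digest at all 26 sites would give 27 peptides. Using the Expasy PeptideMass tool with the parameters shown in Figure 4, the digest yields 19 peptides, which is consistent with the 8 shortest cleavage products (for example TR, QK, IR and the single residue R) falling below the tool's default mass cutoff.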
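
The cleavage rule itself can be applied directly to the sequence. The sketch below is my own simplified in-silico digest (complete cleavage, no missed cleavages, no mass filtering), so its peptide count is the unfiltered total rather than the PeptideMass output.

```python
EGFP_RAW = """
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT
GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF
KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV
YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY
LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH
"""
sequence = "".join(EGFP_RAW.split())

def tryptic_digest(seq):
    """Cleave C-terminal to K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # remaining C-terminal peptide
        peptides.append(seq[start:])
    return peptides

peptides = tryptic_digest(sequence)
print(f"{len(peptides)} peptides from a complete digest")
print("FEGDTLVNR present:", "FEGDTLVNR" in peptides)
```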

5. Identify the m/z, charge, and singly-charged mass of the peptide in Figure 5b

The most abundant peak is at m/z = 525.76.

The isotope peaks in the inset are separated by approximately Δ(m/z) ≈ 0.5, so the charge state is:

z = 1 / 0.5
z = 2

Calculating the singly-charged form [M+H]⁺ from m/z = (M + n·H) / n:

525.76 = (M + 2 × 1.00727) / 2
M = 2 × 525.76 − 2 × 1.00727
M = 1051.52 − 2.01
M ≈ 1049.51 Da

Therefore:

[M+H]⁺ = M + 1.00727
[M+H]⁺ ≈ 1050.51 Da
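
The conversion from a multiply charged ion to its singly charged equivalent can be sketched as follows. The helper names are mine; the inputs are the m/z and isotope spacing read from Figure 5b.

```python
PROTON = 1.00727  # Da

def neutral_mass(mz, z, proton=PROTON):
    """Neutral mass M from m/z = (M + z*H) / z."""
    return z * mz - z * proton

def singly_charged(mz, z, proton=PROTON):
    """[M+H]+ equivalent of an ion observed at m/z with charge z."""
    return neutral_mass(mz, z, proton) + proton

z = round(1 / 0.5)  # isotope spacing of ~0.5 gives z = 2
m = neutral_mass(525.76, z)
mh = singly_charged(525.76, z)
print(f"z = {z}, M = {m:.2f} Da, [M+H]+ = {mh:.2f}")
```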

6. Identify the peptide and calculate mass accuracy

Comparing the observed [M+H]⁺ = 1050.51 Da against the predicted peptide list from PeptideMass, the closest match is the tryptic peptide FEGDTLVNR (theoretical [M+H]⁺ = 1050.5214 Da).

Mass error in ppm:

ppm error = | 1050.51 − 1050.5214 | / 1050.5214 × 10⁶
ppm error = 0.0114 / 1050.5214 × 10⁶
ppm error ≈ 10.85 ppm
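
The theoretical value can also be recomputed from monoisotopic residue masses instead of being taken from the PeptideMass list. The residue, water and proton masses below are standard values that I am supplying; the function names are my own. Since the observed value is only known to two decimals, the resulting error is best read as roughly 11 ppm.

```python
# Monoisotopic residue masses in Da (standard values, supplied by me).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # Da, monoisotopic H2O
PROTON = 1.00728   # Da

def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

def ppm_error(observed, theoretical):
    return abs(observed - theoretical) / theoretical * 1e6

theo = mh_plus("FEGDTLVNR")
peptide_ppm = ppm_error(1050.51, theo)
print(f"theoretical [M+H]+ = {theo:.4f}")
print(f"mass error = {peptide_ppm:.1f} ppm")
```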

7. What is the percentage of the sequence confirmed by peptide mapping?

Based on the amino acid coverage map in Figure 6, approximately 88% of the eGFP sequence is confirmed by identified peptides.

Bonus — Does the peptide map data make sense?

Yes. The peptide FEGDTLVNR matches an expected tryptic peptide of eGFP within ~11 ppm, and ~88% of the full sequence is covered by identified peptides with correct masses and fragmentation patterns. Combined with the intact-mass measurement from Part I (which matched the expected MW within ~100 ppm), this strongly supports the conclusion that the protein analyzed is indeed the eGFP standard.

Homework: Waters Part IV — Oligomers

The known KLH subunit masses (about 340 kDa for the 7FU subunit and 400 kDa for the 8FU subunit) were used to assign the oligomeric species in the CDMS spectrum:

Oligomeric species	Calculation	Mass
7FU Decamer	340 kDa × 10	3.4 MDa
8FU Didecamer	400 kDa × 20	8.0 MDa
8FU 3-Decamer	400 kDa × 30	12 MDa
8FU 4-Decamer	400 kDa × 40	16 MDa

Each of these masses corresponds to a distinct peak in the CDMS spectrum of Figure 7, showing that KLH exists in solution as a mixture of these higher-order oligomeric states.
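
The expected masses are simple multiples of the subunit mass, which the sketch below tabulates. The dictionary layout and species labels are my own; the subunit masses and copy numbers are the ones used in the table above.

```python
# Subunit masses in kDa, as used in the table above.
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (label, subunit type, number of subunits)
species = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]

masses_kda = {name: SUBUNIT_KDA[sub] * n for name, sub, n in species}
for name, kda in masses_kda.items():
    print(f"{name:15s} {kda / 1000:.1f} MDa")
```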

Homework: Waters Part V — Did I make GFP?

Measurement	Theoretical	Observed	Mass error
Molecular weight of eGFP (Da, intact LC-MS)	27986.57	27983.86	~97 ppm
Peptide FEGDTLVNR ([M+H]⁺, peptide map)	1050.5214	1050.51	~11 ppm

The intact-mass measurement agrees with the predicted value to within about 100 ppm and the peptide-map measurement to within about 11 ppm, supporting the conclusion that the analyzed protein is the eGFP standard.

References

Expasy ProtParam (molecular weight + pI): https://web.expasy.org/protparam/
Expasy PeptideMass (tryptic digest prediction): https://web.expasy.org/peptide_mass/
Institute for Systems Biology Fragment Ion Calculator: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html
Heck, A. J. R. (2008). Native mass spectrometry: a bridge between interactomics and structural biology. Nature Methods, 5(11), 927–933.