Week 10 HW: Imaging and Measurement

Homework: Waters Part I — Molecular Weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

1. Theoretical MW = 27,836 Da (approximately 27.84 kDa)

This accounts for the full sequence including the LE linker and the HHHHHH His-tag at the C-terminus. Note that the initiator methionine is typically retained in bacterially expressed eGFP unless explicitly cleaved, and that chromophore maturation (cyclization of Thr65-Tyr66-Gly67 into the fluorophore; eGFP carries Thr rather than Ser at position 65) removes one water (18 Da) plus a further 2 Da in the oxidation step, bringing the mature eGFP MW to approximately 27,816 Da. For this calculation, the sequence-based theoretical MW without chromophore correction is used as the reference, which is standard practice unless stated otherwise.

2. [images] This is well within the expected mass accuracy range for a QTof instrument at 30,000 resolution, and the small discrepancy is consistent with the chromophore maturation mass shift and minor adducts commonly observed in intact protein LC-MS.
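The usual way to get an intact mass from a charge state series is to take two adjacent peaks, solve for the charge, and convert each peak back to a neutral mass. The sketch below shows that arithmetic under stated assumptions: the two m/z values are illustrative inputs chosen to be consistent with a protein of about 27,983 Da, not values read from Figure 1, and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

def charge_from_adjacent_peaks(mz_upper: float, mz_lower: float) -> int:
    """Charge z of the peak at mz_upper, assuming the peak at mz_lower
    is the same protein carrying exactly one more proton (z + 1)."""
    return round((mz_lower - PROTON) / (mz_upper - mz_lower))

def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M recovered from an [M + zH]^z+ peak."""
    return z * (mz - PROTON)

# Illustrative adjacent peaks (assumed values, not taken from the figure).
mz_upper, mz_lower = 965.94, 933.77

z = charge_from_adjacent_peaks(mz_upper, mz_lower)
masses = [neutral_mass(mz_upper, z), neutral_mass(mz_lower, z + 1)]
mean_mass = sum(masses) / len(masses)

print(f"z = {z}, mean neutral mass = {mean_mass:.1f} Da")
```

Averaging over more charge states in the same way reduces the influence of any single peak-picking error.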

3. The zoomed-in region of Figure 1 shows a cluster of peaks centered around $m/z \approx 1473$. One might try to read the charge state from the spacing of individual isotope peaks in this region, but they cannot be resolved here, and the reason follows directly from the instrument resolution and the charge state. At high charge states such as $z = 30$, adjacent isotope peaks are separated by $1/z = 1/30 \approx 0.033$ Da in the $m/z$ domain. The Waters Xevo G3 has a resolution of 30,000, which at $m/z = 1473$ gives a minimum resolvable peak separation of

$$\Delta(m/z) = \frac{1473}{30{,}000} \approx 0.049\ \text{Da}$$

Since the isotope spacing of $\approx 0.033$ Da is smaller than the minimum resolvable separation of $\approx 0.049$ Da at that $m/z$ value, the individual isotope peaks within that charge state envelope are not resolved. What is observed instead is an unresolved isotope envelope appearing as a single broad peak. This is why the zoomed-in peak appears as a cluster of closely spaced, partially merged peaks rather than a clean series of resolved isotope lines. To resolve isotopes at this $m/z$ range, an instrument with resolution exceeding approximately 100,000, such as an Orbitrap or FT-ICR, would be required.

Homework: Waters Part II — Secondary/Tertiary structure

1. A protein in its native, folded state maintains a compact three-dimensional architecture stabilized by a combination of hydrophobic interactions, hydrogen bonds, electrostatic contacts, and in some cases disulfide bridges. This folded conformation buries a large proportion of the polypeptide backbone and side chains within the protein interior, physically occluding many of the basic residues, primarily lysines, arginines, and histidines, that are capable of accepting protons during electrospray ionization (ESI). When a protein is denatured, whether by organic solvents such as acetonitrile, low pH, or heat, these non-covalent interactions are disrupted and the polypeptide chain unfolds into an extended, disordered conformation. This unfolding exposes all previously buried basic residues to the solvent and, critically, to the proton-donating conditions of the ESI source.

In ESI-MS, the number of charges acquired by a protein ion is directly related to the number of solvent-accessible basic sites available for protonation at the moment of ionization. A folded, compact protein presents fewer accessible protonation sites and therefore acquires a lower number of charges, producing ions at higher m/z values. An unfolded protein exposes its entire backbone and all basic residues, acquiring a much greater number of charges and producing ions at much lower m/z values. The mass spectrometer therefore reports the conformational state of the protein indirectly through the charge state distribution of the ion series observed in the spectrum.

The denatured spectrum (top, green) shows a broad charge state distribution spanning from approximately $m/z$ 750 to 1100, with many closely spaced peaks of relatively similar intensity. This is the hallmark of a highly charged, unfolded protein. The large number of charges acquired by the extended polypeptide chain compresses the ion series into the low-to-mid $m/z$ region, and the many overlapping charge states produce a dense, evenly distributed envelope of peaks. This is entirely consistent with what was analyzed in Part I, where charge states of approximately z = 29 to z = 30 were observed at $m/z$ values around 900 to 966.

The native spectrum (bottom, red) looks dramatically different. The charge state distribution is shifted to much higher $m/z$ values, with the dominant peaks appearing between approximately $m/z$ 2333 and 2800. There are far fewer peaks visible, and they are more widely spaced. This reflects a low charge state distribution, meaning the folded eGFP acquires far fewer protons during ionization because most of its basic residues are buried within the compact beta-barrel structure that is the hallmark of GFP-family proteins. The native eGFP is particularly well-suited to illustrate this principle because its beta-barrel fold is exceptionally stable and buries the chromophore and many internal residues very effectively.
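The link between charge state and position on the m/z axis can be made concrete with a short calculation. The sketch below assumes a neutral mass of 27,983 Da (the observed value reported in Part V) and simple protonation, and lists which charge states fall inside a given m/z window; the window limits are rough readings of the ranges described above and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

def mz_for_charge(mass_da: float, z: int) -> float:
    """m/z of the [M + zH]^z+ ion for a neutral mass in Da."""
    return (mass_da + z * PROTON) / z

def charges_in_window(mass_da: float, mz_min: float, mz_max: float) -> list[int]:
    """All charge states (1..100) whose m/z falls inside the window."""
    return [z for z in range(1, 101)
            if mz_min <= mz_for_charge(mass_da, z) <= mz_max]

OBSERVED_MASS = 27983.0  # Da, observed intact mass quoted in Part V

denatured = charges_in_window(OBSERVED_MASS, 750, 1100)   # unfolded protein
native = charges_in_window(OBSERVED_MASS, 2300, 2850)     # folded protein

print("denatured window:", denatured[0], "to", denatured[-1])
for z in native:
    print(f"native z = {z}: m/z = {mz_for_charge(OBSERVED_MASS, z):.1f}")
```

Under these assumptions the native window holds only three charge states, while the denatured window holds a dozen, which matches the sparse versus dense appearance of the two spectra.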

2. Yes, the charge state can be discerned and the instrument resolution of 30,000 is sufficient to resolve individual isotope peaks at this m/z range, unlike the situation at m/z ≈ 1473.

From the zoomed-in inset, adjacent isotope peaks within the charge envelope at m/z ≈ 2545 are clearly resolved. Reading from the labeled values, two adjacent isotope peaks are visible at: [image]

The spacing between adjacent isotope peaks in a charge state envelope is equal to 1/z, because adjacent isotopes differ by about 1 Da and the m/z difference is therefore 1/z. [image]

Rounding to the nearest integer: z = 11.

Verify using another adjacent pair. [image]

The isotope spacings visible in the inset are approximately 0.33 to 0.37 Da. [image]

z again rounds to 11. [image]

At m/z = 2545 and an instrument resolution of 30,000, the minimum resolvable spacing is Δ(m/z) = 2545 / 30,000 ≈ 0.085 Da.

Since 0.0909 > 0.085 the isotope peaks at z = 11 are just barely resolvable at 30,000 resolution, which explains why the inset shows a partially resolved but clearly structured isotope envelope rather than a single smooth peak.
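The same resolvability test is applied here and at m/z ≈ 1473 in Part I, so it can be written once and reused. This is a minimal sketch, assuming resolution is defined as (m/z)/Δ(m/z) so that the narrowest separable spacing is roughly (m/z)/R; it ignores peak shape and signal-to-noise, and the function names are illustrative.

```python
def isotope_spacing(z: int) -> float:
    """Spacing between adjacent isotope peaks on the m/z axis (about 1 Da / z)."""
    return 1.0 / z

def min_resolvable_spacing(mz: float, resolution: float) -> float:
    """Smallest peak separation the instrument can distinguish at this m/z."""
    return mz / resolution

def isotopes_resolved(mz: float, z: int, resolution: float) -> bool:
    """True when the isotope spacing exceeds the minimum resolvable spacing."""
    return isotope_spacing(z) > min_resolvable_spacing(mz, resolution)

RESOLUTION = 30_000  # instrument resolution quoted in the text

for label, mz, z in [("Part I, denatured", 1473, 30), ("Part II, native", 2545, 11)]:
    verdict = "resolved" if isotopes_resolved(mz, z, RESOLUTION) else "not resolved"
    print(f"{label}: spacing {isotope_spacing(z):.4f} vs "
          f"limit {min_resolvable_spacing(mz, RESOLUTION):.4f} -> {verdict}")
```

The margin in the native case is small (about 0.091 against 0.085), which fits the partially resolved envelope described above.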

Homework: Waters Part III — Peptide Mapping - primary structure

1. MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Counting carefully through the full 246-residue sequence:

Lysine (K): 20 residues
Arginine (R): 6 residues
Total tryptic cleavage sites: 26 (none of these residues is followed by proline, so every site is cleavable)

2. [images]

**3.** 
`Retention Time (min)	Approximate Relative Abundance
        0.43	                        ~35%
        0.61	                        ~35%
        0.79	                        ~18%
        1.20	                        ~12%
        1.43	                        ~18%
        1.80	                        ~30%
        1.85	                        ~25%
        1.93	                        ~25%
        2.17	                        ~25%
        2.26	                        ~22%
        2.54	                        ~35%
        2.78	                        ~90%
        3.27	                        ~15%
        3.53	                        ~30%
        3.59	                        ~65%
        3.70	                        ~55%
        4.30	                        ~18%
        4.48	                        ~30%
        4.64	                        ~40%
        4.87	                        ~100%
        5.06	                        ~18%
        5.43	                        ~15%`

4. There are fewer peaks in the chromatogram (~22 visible above 10% relative abundance) than predicted tryptic peptides (27 for a complete digest, since the 20 lysine and 6 arginine cleavage sites in the sequence split it into 27 pieces), but the comparison is nuanced and requires careful interpretation.

At first glance, 22 observed peaks appear fewer than the 27 predicted peptides. However, the count includes only peaks above 10% relative abundance, meaning several lower-abundance peptides are present but not counted. Additionally, a single chromatographic peak may contain more than one co-eluting peptide, and some peaks may come from peptides with missed cleavages that were not included in the zero-missed-cleavage prediction. Conversely, some very short peptides predicted by trypsin digestion, such as a single arginine or a dipeptide, are too small to be retained on the reverse-phase column and elute in the void volume or are not detected at all.
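The predicted peptide count used in this comparison can be reproduced with a small in-silico digest. The sketch below assumes the usual trypsin rule (cleave C-terminal to K or R unless the next residue is P) with no missed cleavages, and uses the sequence exactly as printed in Question 1; the function name is hypothetical and this is not the PeptideMass implementation.

```python
import re

# Sequence as printed in Question 1, with the block spacing removed.
EGFP = "".join("""
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL
VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV
NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD
HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE
HHHHHH
""".split())

def tryptic_digest(sequence: str) -> list[str]:
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

peptides = tryptic_digest(EGFP)
short = [p for p in peptides if len(p) <= 2]  # unlikely to be retained on RP-LC

print("K:", EGFP.count("K"), "R:", EGFP.count("R"))
print("predicted peptides:", len(peptides))
print("peptides of 1-2 residues:", short)
```

Peptides of one or two residues are the ones most likely to be lost in the void volume, which narrows the gap between the predicted and observed counts.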

5. m/z = 525.76712

The zoomed inset shows adjacent isotope peaks at 525.76712, 526.25918, 526.76845, and 527.26098. The spacing between adjacent isotope peaks is

Δ(m/z) = 526.25918 − 525.76712 = 0.49206 ≈ 0.5 Da

Since the isotope spacing equals 1/z, z = 1/0.492 ≈ 2.03, which rounds to z = 2. This is confirmed by the presence of the peak at m/z = 1050.52438 in the same spectrum, which corresponds to the singly charged [M + H]^+ ion:

[M + H]^+ = (2 × 525.76712) − 1.00728 = 1050.527 Da

6.

Residue	Monoisotopic Mass (Da)
L (Leu)	113.08406
G (Gly)	57.02146
M (Met)	131.04049
D (Asp)	115.02694
E (Glu)	129.04259
L (Leu)	113.08406
Y (Tyr)	163.06333
K (Lys)	128.09496
Water	18.01056

The residues in the table plus water sum to about 967.47 Da, the neutral mass of LGMDELYK. This does not match: the correct peptide should give a cation of about 1050.527 Da, that is, a neutral mass of about 1049.52 Da. Other candidates from the eGFP digest can be ruled out in the same way. GIDFK and IELK are too small, and for DGNILGHK (Asp-Gly-Asn-Ile-Leu-Gly-His-Lys),

MW = 115.027 + 57.021 + 114.043 + 113.084 + 113.084 + 57.021 + 137.059 + 128.095 + 18.011 = 852.44 Da

which does not match either. The tryptic peptide that does match is FEGDTLVNR (Phe-Glu-Gly-Asp-Thr-Leu-Val-Asn-Arg):

MW = 147.068 + 129.043 + 57.021 + 115.027 + 101.048 + 113.084 + 99.068 + 114.043 + 156.101 + 18.011 = 1049.514 Da

Adding a proton (1.007 Da) gives [M + H]^+ = 1050.521, within about 3 ppm of the singly charged peak observed at m/z = 1050.52438. The fragment ions visible in Figure 5c at 388.22, 501.31, 602.35, 774.41, and 903.44 agree with the y3, y4, y5, y7, and y8 ions of this 9-residue peptide, which supports the assignment.

7. Chain 1 (88% coverage)

The coverage map confirms the vast majority of the eGFP sequence from residues 1 through 246, with only a small number of short segments remaining unconfirmed. The unconfirmed regions appear to include a few short peptides that are either too small for LC-MS detection, too hydrophilic to be retained on the reverse-phase column, or present at abundances below the identification threshold.

The percentage of the eGFP sequence confirmed by peptide mapping is 88%.

8. The fragmentation spectrum in Figure 5c shows the following prominent fragment ion masses: 56.05, 122.08, 214.09, 388.22, 501.31, 536.75, 537.25, 602.35, 774.41, 775.40, 903.44, 904.44, and 1050.52.

The precursor cation is 1050.527 Da as established in Question 5. Using the FragIon tool at the link provided, and matching against the eGFP tryptic peptide list, the peptide whose theoretical fragmentation pattern best matches these observed ions is:

FEGDTLVNR

This peptide arises from the eGFP sequence in the segment ...AEVK FEGDTLVNR IELK..., with trypsin cleaving after the lysine that precedes it and after its own C-terminal arginine, so no missed cleavage is needed. Its y-ion series accounts for the observed fragments at 388.22 (y3, theoretical 388.230), 501.31 (y4, 501.314), 602.35 (y5, 602.362), 774.41 (y7, 774.410), and 903.44 (y8, 903.453). The peaks at 775.40 and 904.44 are consistent with the first isotope peaks of y7 and y8, and 1050.52 is the intact [M + H]^+ precursor.
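The precursor and fragment checks in Questions 5, 6, and 8 follow the same few formulas, sketched below. The residue masses are standard monoisotopic values, the candidate list is limited to peptides named in this write-up, and the 0.02 Da fragment tolerance is an assumption chosen for two-decimal peak labels; the function names are hypothetical and this is not the FragIon tool itself.

```python
MONO = {  # monoisotopic residue masses in Da
    "A": 71.03711, "R": 156.10111, "N": 114.04293, "D": 115.02694,
    "C": 103.00919, "E": 129.04259, "Q": 128.05858, "G": 57.02146,
    "H": 137.05891, "I": 113.08406, "L": 113.08406, "K": 128.09496,
    "M": 131.04049, "F": 147.06841, "P": 97.05276, "S": 87.03203,
    "T": 101.04768, "W": 186.07931, "Y": 163.06333, "V": 99.06841,
}
WATER = 18.01056
PROTON = 1.00728

def neutral_mass(peptide: str) -> float:
    """Monoisotopic neutral mass: residues plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER

def mh_plus(peptide: str) -> float:
    """Singly protonated [M + H]+ m/z."""
    return neutral_mass(peptide) + PROTON

def ppm(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6

def y_ions(peptide: str) -> list[float]:
    """Singly charged y1 .. y(n-1) ions, counted from the C-terminus."""
    return [sum(MONO[aa] for aa in peptide[-n:]) + WATER + PROTON
            for n in range(1, len(peptide))]

# Question 5: charge from the isotope spacing, then the [M + H]+ equivalent.
z = round(1 / (526.25918 - 525.76712))
mh_from_doubly = z * 525.76712 - (z - 1) * PROTON

# Question 6: pick the candidate closest to the observed singly charged peak.
OBSERVED_MH = 1050.52438
candidates = ["LGMDELYK", "DGNILGHK", "EDGNILGHK", "FEGDTLVNR"]
best = min(candidates, key=lambda p: abs(mh_plus(p) - OBSERVED_MH))

# Question 8: assign observed fragments to y-ions within 0.02 Da.
observed_fragments = [388.22, 501.31, 602.35, 774.41, 903.44]
ladder = y_ions(best)
assigned = {obs: next((n for n, y in enumerate(ladder, start=1)
                       if abs(y - obs) < 0.02), None)
            for obs in observed_fragments}

print(best, f"{mh_plus(best):.4f}", f"{ppm(OBSERVED_MH, mh_plus(best)):.1f} ppm")
print(assigned)
```

A fuller search would score every peptide from the digest, including missed cleavages, rather than a hand-picked candidate list.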

Bonus Peptide Map Questions

Yes, the peptide map data makes excellent sense and strongly confirms that the protein analyzed is indeed the eGFP standard. The evidence for this conclusion is multi-layered and compelling.

First, the 88% sequence coverage shown in Figure 6 means that peptides spanning 88% of the total eGFP amino acid sequence were positively identified by both their accurate precursor mass and their fragmentation pattern. This level of coverage is considered very high for a standard bottom-up proteomics experiment and is entirely consistent with what is expected for a well-characterized, abundant, and soluble protein like eGFP analyzed under optimized tryptic digest conditions.

Second, the identified peptides are distributed across the entire length of the protein from residue 1 to residue 246, including the His-tag region, rather than clustering in one region, which would be expected if the identification were spurious or if the protein were a contaminant. The unconfirmed 12% of the sequence corresponds to short or difficult-to-detect peptides that are a predictable consequence of tryptic digestion, not evidence of a wrong protein identity.

Third, the mass accuracy of individual peptide identifications, as demonstrated in Question 6 at approximately 2 to 3 ppm, is well within the performance threshold for confident peptide identification on the Waters BioAccord platform. At this level of mass accuracy, the probability of a false positive identification is extremely low.

Taken together, the 88% sequence coverage, the high mass accuracy of individual peptide matches, the broad distribution of confirmed residues across the full protein length, and the consistency of the fragmentation spectra with predicted ion series all provide strong and convergent evidence that the protein is unambiguously eGFP.

Homework: Waters Part IV — Oligomers

The terminology used is as follows. A decamer = 10 subunits, a didecamer = 20 subunits, a 3-decamer = 30 subunits, and a 4-decamer = 40 subunits.

From Table 1:

7FU subunit mass = 340 kDa = 0.340 MDa
8FU subunit mass = 400 kDa = 0.400 MDa

7FU Decamer (10 subunits of 7FU): Mass = 10 × 340 kDa = 3,400 kDa = 3.40 MDa

8FU Didecamer (20 subunits of 8FU): Mass = 20 × 400 kDa = 8,000 kDa = 8.00 MDa

8FU 3-Decamer (30 subunits of 8FU): Mass = 30 × 400 kDa = 12,000 kDa = 12.00 MDa

8FU 4-Decamer (40 subunits of 8FU): Mass = 40 × 400 kDa = 16,000 kDa = 16.00 MDa

Oligomeric Species	Calculated Mass	Closest Peak	Match
7FU Decamer	3.40 MDa	3.4 MDa	Exact match
8FU Didecamer	8.00 MDa	8.33 MDa	Close match (~4% deviation)
8FU 3-Decamer	12.00 MDa	12.67 MDa	Close match (~5.6% deviation)
8FU 4-Decamer	16.00 MDa	~16 MDa	Low-intensity region beyond 12.67 MDa

7FU Decamer at 3.40 MDa matches the peak labeled 3.4 MDa in Figure 7 with essentially perfect agreement. This is the most cleanly resolved assignment in the entire spectrum.

8FU Didecamer at 8.00 MDa matches the dominant peak labeled 8.33 MDa. The small positive deviation of approximately 0.33 MDa (~4%) is within the expected measurement uncertainty of CDMS for megadalton-scale complexes and is also consistent with the possibility that the 8FU subunit mass of 400 kDa is a rounded average value, and the actual assembled complex carries additional mass contributions from bound cofactors, glycosylation, or copper ions that are integral to hemocyanin function. The 8.33 MDa peak is the most intense feature in the entire spectrum, indicating that the 8FU didecamer is the dominant oligomeric form of KLH2 in solution, which is well-established in the structural biology literature.

8FU 3-Decamer at 12.00 MDa matches the peak labeled 12.67 MDa. The deviation of approximately 0.67 MDa (~5.6%) is again consistent with the same systematic offset seen for the didecamer, suggesting that the actual 8FU subunit mass is slightly higher than the nominal 400 kDa used in the calculation, or that the complex carries additional non-protein mass. The 3-decamer represents a stacked assembly of three decameric rings and is a known higher-order oligomeric state of KLH.

8FU 4-Decamer at 16.00 MDa would be expected to appear near 16 MDa in the spectrum. Figure 7 shows very low intensity signal in the region beyond 12.67 MDa extending toward 20 to 25 MDa, which is consistent with the presence of a minor population of 4-decamer assemblies, though the signal is too weak and broad to produce a well-resolved labeled peak. This is biologically expected, as higher-order KLH assemblies become increasingly rare and heterogeneous in solution.
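The calculated masses and percentage deviations quoted in this part come from two one-line formulas, sketched here. The subunit masses are the nominal values from Table 1 and the peak labels are those quoted from Figure 7; the variable and function names are illustrative.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}  # nominal subunit masses from Table 1

def oligomer_mass_mda(subunit: str, n_subunits: int) -> float:
    """Mass of an assembly of n identical subunits, in MDa."""
    return SUBUNIT_KDA[subunit] * n_subunits / 1000

def percent_deviation(calculated: float, observed: float) -> float:
    """Signed deviation of the observed peak from the calculated mass."""
    return (observed - calculated) / calculated * 100

# (label, subunit, copies, labeled peak in MDa from Figure 7)
assignments = [
    ("7FU decamer", "7FU", 10, 3.4),
    ("8FU didecamer", "8FU", 20, 8.33),
    ("8FU 3-decamer", "8FU", 30, 12.67),
]

for label, subunit, n, peak in assignments:
    calc = oligomer_mass_mda(subunit, n)
    print(f"{label}: calculated {calc:.2f} MDa, peak {peak} MDa, "
          f"deviation {percent_deviation(calc, peak):+.1f}%")
```

Both 8FU assemblies come out a few percent heavier than calculated, which is the systematic offset discussed above.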

Homework: Waters Part V — Did I make GFP?

	Theoretical	Observed	PPM Mass Error
Molecular Weight (kDa)	27.836 kDa	27.983 kDa	5,277 ppm

PPM Mass Error = (Observed − Theoretical) / Theoretical × 10^6

The data confirm that the protein produced is eGFP. The observed molecular weight of 27.983 kDa is in close agreement with the theoretical value of 27.836 kDa. The PPM error of approximately 5,277 ppm (~0.53%) appears large in the context of small molecule or peptide MS; the following factors contribute to it.

The theoretical MW used here is calculated from the bare amino acid sequence using average isotope masses and does not account for the chromophore maturation reaction, in which the Thr65-Tyr66-Gly67 tripeptide undergoes cyclization, dehydration, and oxidation with a net loss of about 20 Da (one water plus two hydrogens). That correction lowers the expected mass, so it does not by itself explain an observed mass that is higher than the theoretical value. Additionally, the denatured LC-MS conditions used in Parts I and II can introduce adducts from solvent molecules and buffer components, which do add mass. Charge state deconvolution contributes some further uncertainty, although for a ~28 kDa protein on a QTof this is normally far smaller than 100 to 200 Da, so the remaining offset more likely reflects adducts, modifications, or a difference between the reference sequence and the expressed construct. With that caveat, the agreement between theoretical and observed mass remains consistent with successful eGFP expression and purification.
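The mass error in the table follows from the standard ppm definition, sketched below with the rounded masses quoted in this part. With those rounded inputs the result is about 5,281 ppm, slightly different from the tabulated 5,277 ppm, which may have been computed from unrounded masses (an assumption; the original calculation is not recoverable here). The function name is illustrative.

```python
def ppm_error(theoretical: float, observed: float) -> float:
    """Signed mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6

THEORETICAL_DA = 27836.0  # sequence-based average mass used in this write-up
OBSERVED_DA = 27983.0     # deconvoluted intact mass

error_ppm = ppm_error(THEORETICAL_DA, OBSERVED_DA)
delta_da = OBSERVED_DA - THEORETICAL_DA

print(f"offset = {delta_da:.0f} Da, error = {error_ppm:.0f} ppm "
      f"({error_ppm / 1e4:.2f}%)")
```

Expressing the same offset in Da (147 Da here) alongside ppm makes it easier to compare against candidate explanations such as adducts or sequence differences.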