Week 10 HW: Advanced Imaging & Measurement Technology
Waters Part I — Molecular Weight
Q1. Calculated molecular weight of eGFP from its amino acid sequence
I pasted the sequence into the ExPASy ProtParam tool (compute_pi/protparam). The full sequence including the His-tag and LE linker is:
MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH
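Running this through ProtParam gives a calculated (average) molecular weight of about 28,007 Da (28.01 kDa) for the 247-residue linear chain. The His6-tag contributes about 823 Da of that and the LE linker about 242 Da, on top of the 239-residue core eGFP sequence. The mature protein should weigh less than this linear-sequence number. The chromophore, formed autocatalytically from Thr65-Tyr66-Gly67 (eGFP carries the S65T mutation, hence the TYG motif in the sequence), matures through cyclization, dehydration, and oxidation, which together remove about 20 Da. If the expression host also trims the initiator Met, which is plausible here because the second residue is Val, that removes another ~131 Da. Both together put the expected mature mass at roughly 27,855 Da, close to the 27,854 Da used as the theoretical value in the calculations below.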
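To double-check the ProtParam number, here is a minimal Python sketch that sums average residue masses over the sequence and adds one water. The residue table is the standard set of average masses and is assumed, not confirmed, to be the one ProtParam uses. The last step applies two assumed corrections for the mature protein (removal of the initiator Met and the ~20 Da lost when the chromophore forms); ProtParam itself makes neither.

```python
# Average residue masses in Da (standard values; assumed to match ProtParam's table).
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01524  # one water for the free N- and C-termini

SEQ = "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"

def average_mw(seq: str) -> float:
    """Average mass of an unmodified linear chain: residue masses plus one water."""
    return sum(AVG_RESIDUE[aa] for aa in seq) + WATER_AVG

linear = average_mw(SEQ)

# Assumed processing: initiator Met removed, and ~20 Da lost when the chromophore
# forms (one water from cyclization/dehydration plus two hydrogens from oxidation).
CHROMOPHORE_LOSS = 20.03
mature = linear - AVG_RESIDUE["M"] - CHROMOPHORE_LOSS

print(f"{len(SEQ)} residues; linear {linear:.1f} Da; processed ~{mature:.1f} Da")
```

If your ProtParam output differs from the linear value by more than rounding, the first thing to check is whether the pasted sequence gained or lost a residue.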
Q2. Calculating MW from adjacent charge states in Figure 1
This is the fun part: you’re essentially reverse-engineering the protein’s mass from the m/z peaks without needing to know the charge states in advance.
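The formula for finding z from two adjacent peaks is:
z = (m2 - 1.00728) / (m1 - m2)
where m1 is the peak at higher m/z (charge z), m2 is its neighbor at lower m/z (charge z + 1), and 1.00728 Da is the mass of a proton.
Looking at Figure 1, I’ll pick two adjacent peaks to work with. I’ll use the peaks at m/z = 1232 and m/z = 1169 as my pair.
Step 1 — Find z:
z = (1169 - 1.00728) / (1232 - 1169) = 1167.99 / 63 = 18.54, which rounds to z = 19 for the peak at m/z 1232, meaning z = 20 for the peak at m/z 1169 (the peak at higher m/z always carries the lower charge).
Step 2 — Calculate MW:
MW = z × (m/z) - z × 1.00728
MW = 19 × 1232 - 19 × 1.00728 = 23408 - 19.14 = 23388.9 Da
Do the same with the second peak as a check:
MW = 20 × 1169 - 20 × 1.00728 = 23380 - 20.15 = 23359.9 Da
The two estimates should agree to within a dalton or so and land near the ~27.85 kDa expected from Q1. These don’t: they differ by 29 Da, both sit about 4.5 kDa low, and z = 18.54 is far from a whole number. That combination means 1232 and 1169 aren’t an adjacent charge-state pair for this protein (or the apexes were misread), so the peak values should be re-read from Figure 1 and Steps 1 and 2 redone before moving on.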
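Steps 1 and 2 fit in one small Python function. It assumes the two peaks really are adjacent charge states, uses the proton mass from the text, and rounds z to the nearest integer; how far the unrounded z sits from a whole number is a useful warning sign. Run on the 1232/1169 pair, it gives an unrounded z of about 18.54 (so z = 19 for the 1232 peak).

```python
PROTON = 1.00728  # Da

def deconvolute_pair(mz_high: float, mz_low: float):
    """Charge and neutral mass from two adjacent charge-state peaks.

    mz_high is the peak at higher m/z (charge z); mz_low is its neighbour
    at lower m/z (charge z + 1).
    """
    z_exact = (mz_low - PROTON) / (mz_high - mz_low)
    z = round(z_exact)
    mw_from_high = z * (mz_high - PROTON)
    mw_from_low = (z + 1) * (mz_low - PROTON)
    return z_exact, z, mw_from_high, mw_from_low

z_exact, z, mw1, mw2 = deconvolute_pair(1232.0, 1169.0)
print(f"z_exact = {z_exact:.2f} -> z = {z}; MW = {mw1:.1f} vs {mw2:.1f} Da")
```

A quick way to test it is with synthetic peaks generated from a known mass: `deconvolute_pair(27855 / 22 + PROTON, 27855 / 23 + PROTON)` returns z = 22 and both mass estimates back at 27,855 Da.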
Step 3 — Calculate accuracy:
Accuracy = |MW_experiment - MW_theory| / MW_theory × 100%
Using the deconvoluted experimental value of 27,856 Da reported in Part V as the example:
Accuracy = |27856 - 27854| / 27854 × 100% = 0.007%
This is well within what you’d expect from a high-resolution instrument like the Xevo G3 QTof, typically under 0.02% for intact proteins.
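Step 3 as a helper that returns the error both as a percentage and in ppm (ppm is the unit Part V reports). The inputs below are the 27,856 and 27,854 Da values from the text; the function name is my own.

```python
def mass_error(observed: float, theoretical: float) -> tuple[float, float]:
    """Relative mass error as (percent, ppm); the two differ only by a factor of 10^4."""
    fraction = abs(observed - theoretical) / theoretical
    return fraction * 100.0, fraction * 1e6

pct, ppm = mass_error(27856.0, 27854.0)
print(f"{pct:.4f} % = {ppm:.1f} ppm")
```

So 0.007% and ~72 ppm are the same number expressed in different units.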
Q3. Can you observe the charge state for the zoomed-in peak?
Yes, you can, and this is one of those things that seems confusing until you think about what you’re actually looking at. Each peak in a protein mass spectrum isn’t a single line; it’s a cluster of isotope peaks. Neighboring isotopes differ by about 1 Da, so for an ion with charge z they sit about 1/z apart in m/z. So if your isotopes are spaced 0.05 m/z apart, z = 20. If they’re spaced 0.1 apart, z = 10.
On the Xevo G3 with 30,000 resolution, the instrument can resolve individual isotope peaks for charge states in the +20 to +25 range, but only just: at those charges the isotope spacing (~0.04 to 0.05 m/z) is only slightly larger than the peak width, which is m/z divided by the resolution (about 0.037 to 0.046 at m/z 1,100 to 1,400). For the zoomed-in peak in Figure 1, the isotope spacing visible in the zoom should tell you the charge directly: take 1 divided by the spacing. If the spacing looks like ~0.048, then z ≈ 21.
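A small sketch of both ideas in this answer: reading z from the isotope spacing, and comparing that spacing with the peak width implied by a resolving power of 30,000. The 1.003 Da isotope step (mostly the 13C to 12C difference) is an assumed value; using exactly 1 gives the same integer charges for the examples here.

```python
ISOTOPE_STEP = 1.003  # Da between neighbouring isotope peaks (assumed)
PROTON = 1.00728      # Da

def charge_from_isotope_spacing(spacing_mz: float) -> int:
    """Isotopes of a z+ ion sit ISOTOPE_STEP / z apart on the m/z axis."""
    return round(ISOTOPE_STEP / spacing_mz)

def peak_fwhm(mz: float, resolution: float) -> float:
    """Peak width at half height, from resolving power R = (m/z) / FWHM."""
    return mz / resolution

# Compare isotope spacing with peak width for a ~27.85 kDa protein at R = 30,000.
for z in (20, 25):
    mz = 27854 / z + PROTON
    print(f"z={z}: m/z {mz:.1f}, spacing {ISOTOPE_STEP / z:.4f}, "
          f"FWHM {peak_fwhm(mz, 30_000):.4f}")
```

At both charges the isotope spacing comes out only a few thousandths of an m/z unit larger than the peak width, which is what "just barely resolved" means in practice.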
Waters Part II — Secondary/Tertiary Structure
Q1. Native vs. denatured conformations , what’s happening and what does the MS tell you?
When a protein is in its native state, it’s folded: the hydrophobic residues are tucked inside, the backbone is constrained into helices and sheets, and the whole structure holds together through hydrophobic packing, hydrogen bonds, and sometimes disulfide bridges. In that compact state the protein has less exposed surface and fewer accessible sites for protonation, so when you spray it into the mass spec, it picks up fewer charges.
Denatured proteins are unfolded: the whole backbone is stretched out and solvent-exposed. Every basic site (Lys, Arg, His, and the N-terminus) can now pick up a proton from the electrospray solvent, so a denatured protein acquires many more charges than the same protein in its native state.
This is exactly what you see in Figure 2. The denatured spectrum (top) shows a wide distribution of charge states at lower m/z values: lots of highly charged ions spread across a broad range, because the unfolded chain picks up +20 to +30 charges. The native spectrum (bottom) shows a much tighter distribution at higher m/z values: fewer charges per ion and a much more compressed charge-state envelope. The two spectra are from the same protein but look almost nothing alike, which is kind of remarkable. It’s basically a mass spec readout of protein folding.
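To see where the two envelopes should land, this sketch converts a neutral mass into m/z for a few charge states. The charge values are illustrative picks based on the ~+10 (native) and +20 to +30 (denatured) figures in this section, and 27,854 Da is the theoretical mass used in this write-up.

```python
PROTON = 1.00728  # Da
MW = 27854.0      # theoretical mass used in this write-up

def mz_for_charge(mass: float, z: int) -> float:
    """m/z of a protein carrying z extra protons."""
    return (mass + z * PROTON) / z

# Illustrative charge states only, not read from Figure 2.
native = {z: round(mz_for_charge(MW, z), 1) for z in (9, 10, 11)}
denatured = {z: round(mz_for_charge(MW, z), 1) for z in (20, 25, 30)}
print("native:", native)
print("denatured:", denatured)
```

Native ions come out around 2,530 to 3,100 m/z and denatured ones around 930 to 1,390 m/z, which is the high-m/z versus low-m/z split described for Figure 2.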
Q2. Charge state of the peak at ~2800 m/z in the native eGFP spectrum (Figure 3)
To figure out the charge state, you look at the spacing between the isotope peaks in the zoomed inset. The relationship is simple:
z = 1 / isotope spacing
So if the isotope peaks in the inset are spaced 0.1 m/z apart:
z = 1 / 0.1 = 10
You can also sanity check this using the molecular weight:
z = MW / (m/z - 1.00728) = 27854 / 2798.99 = 9.95, which rounds to z = 10
Both approaches agree: the charge state is +10.
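Both estimates can be checked numerically. This sketch computes the unrounded charge from the mass with and without subtracting the proton mass, alongside the isotope-spacing estimate, using the 27,854 Da, m/z 2800, and 0.1 spacing values from the text. At this m/z the proton correction changes z by well under 0.1%, so either form works.

```python
PROTON = 1.00728  # Da

def charge_from_mass(mw: float, mz: float, proton_corrected: bool = True) -> float:
    """Unrounded charge implied by a known neutral mass and an observed m/z."""
    return mw / (mz - PROTON) if proton_corrected else mw / mz

z_naive = charge_from_mass(27854.0, 2800.0, proton_corrected=False)
z_corr = charge_from_mass(27854.0, 2800.0)
z_iso = 1 / 0.1  # from the isotope spacing read off the inset
print(f"{z_naive:.2f}, {z_corr:.2f}, {z_iso:.2f}")
```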
This makes sense for native eGFP. Because it’s folded into a compact beta-barrel, it exposes far less surface than the unfolded chain, and in native electrospray the number of charges an ion can carry tracks that surface area much more closely than the number of basic residues. So when it gets sprayed into the mass spec, it picks up only around 10 protons rather than the +20 to +30 you’d see in the denatured state. The narrow charge-state distribution in the native spectrum in Figure 2 is a direct reflection of that compact, folded structure: fewer charges, higher m/z, and a much cleaner-looking spectrum overall.
Waters Part III — Peptide Mapping
Q1. Lysines (K) and Arginines (R) in eGFP, and their count
Trypsin cuts after K and R (except when followed by P, which it typically skips). Going through the sequence systematically:
Going through the full sequence, it contains 20 lysines (K) and 6 arginines (R), so 26 potential tryptic cleavage sites, and none of them is followed by a proline, so trypsin should be able to cut at all 26. Benchling’s biochemical properties tab (or ProtParam’s amino acid composition) will confirm these counts for your submission. For the homework, you’d highlight each K and R in yellow/circle them as you go through the sequence.
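The K/R count is easy to verify in code. This sketch lists every K and R with its 1-based position and flags any that sit directly before a proline, the case trypsin usually skips; the sequence is the one from Part I, and the helper name is my own.

```python
SEQ = "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"

def kr_sites(seq: str):
    """(1-based position, residue, followed_by_P) for every K and R."""
    return [(i + 1, aa, seq[i + 1:i + 2] == "P")
            for i, aa in enumerate(seq) if aa in "KR"]

sites = kr_sites(SEQ)
n_k = sum(1 for _, aa, _ in sites if aa == "K")
n_r = sum(1 for _, aa, _ in sites if aa == "R")
before_pro = [pos for pos, _, blocked in sites if blocked]
print(f"K = {n_k}, R = {n_r}, total = {len(sites)}, followed by P: {before_pro}")
```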
Q2. Number of peptides from tryptic digestion (PeptideMass tool)
After pasting the eGFP sequence into the PeptideMass tool at web.expasy.org/peptide_mass with the conditions shown in Figure 4 (trypsin, one missed cleavage, monoisotopic masses), the number of peptides the tool lists depends on the exact parameters. With zero missed cleavages, the 26 K/R sites give the maximum theoretical cut: 27 peptides, the last one being the C-terminal LEHHHHHH. Allowing one missed cleavage (more realistic, since trypsin occasionally skips a site) adds every pair of neighboring peptides joined together, 26 more and all of them longer, for 53 in total. Any display filter you leave on, such as a mass range, will trim that list, so the number on screen can be smaller.
The important thing to record is the exact number the tool gives you for your specific parameter settings, since that’s what you’ll compare against the chromatogram.
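Those theoretical counts can be reproduced with a minimal in-silico digest. This sketch uses the simple textbook rule (cut after K or R unless the next residue is P) and joins up to `missed` neighboring pieces; PeptideMass may apply its own exceptions and filters, so treat its list as the authoritative one for the homework.

```python
SEQ = "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"

def trypsin_digest(seq: str, missed: int = 0) -> list[str]:
    """Cleave after K/R unless the next residue is P; allow up to `missed` missed cleavages."""
    cuts = [0] + [i + 1 for i in range(len(seq) - 1)
                  if seq[i] in "KR" and seq[i + 1] != "P"] + [len(seq)]
    peptides = []
    for start in range(len(cuts) - 1):
        for extra in range(missed + 1):      # join `extra` neighbouring pieces
            end = start + 1 + extra
            if end < len(cuts):
                peptides.append(seq[cuts[start]:cuts[end]])
    return peptides

for m in (0, 1):
    print(f"missed cleavages = {m}: {len(trypsin_digest(SEQ, m))} peptides")
```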
Q3. How many chromatographic peaks between 0.5 and 6 minutes in Figure 5a?
Looking at the TIC in Figure 5a and counting all peaks above 10% relative abundance between 0.5 and 6 minutes, you can see roughly 18–22 distinct peaks. The chromatogram shows a relatively complex elution profile as you’d expect — early-eluting peaks tend to be small, hydrophilic peptides that don’t retain well on reverse-phase columns, while later peaks are more hydrophobic. The peak at 2.78 minutes is circled as the example they want you to work with.
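The 10% threshold rule can be made concrete. This sketch builds a made-up trace (three Gaussian peaks with invented retention times and heights, not Figure 5a) and counts local maxima that reach at least 10% of the tallest point. Real TIC data would need smoothing first; the peak width, sampling step, and threshold here are all assumptions.

```python
import math

def synthetic_tic(peaks, t_start=0.5, t_end=6.0, step=0.005, width=0.02):
    """Sum of Gaussian peaks given as (retention_time_min, height); made-up data."""
    n = round((t_end - t_start) / step) + 1
    times = [t_start + i * step for i in range(n)]
    signal = [sum(h * math.exp(-((t - rt) ** 2) / (2 * width ** 2)) for rt, h in peaks)
              for t in times]
    return times, signal

def count_peaks(signal, rel_threshold=0.10):
    """Count local maxima at or above rel_threshold times the tallest point."""
    floor = rel_threshold * max(signal)
    return sum(1 for i in range(1, len(signal) - 1)
               if signal[i] >= floor and signal[i - 1] < signal[i] >= signal[i + 1])

times, tic = synthetic_tic([(1.2, 100.0), (2.78, 60.0), (4.1, 5.0)])
print(count_peaks(tic))
```

The 5%-height peak at 4.1 min is ignored at the 10% threshold but counted if the threshold drops to 1%, which is why it matters to state the cutoff you used.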
Q4. Do the peaks match the predicted peptide count?
Probably not perfectly, and that’s completely normal. There are almost always fewer peaks than predicted peptides for a few reasons. First, some very small peptides (like dipeptides or tripeptides) are too small to retain on a reverse-phase column and elute in the void volume or not at all. Second, some peptides with very similar hydrophobicity co-elute and appear as one merged peak. Third, the very large or hydrophobic peptides may elute after 6 minutes or stick to the column. So the chromatogram showing fewer peaks than the predicted count isn’t a problem; it just reflects the physical reality of LC separation.
Q5. m/z and charge state of the peptide at 2.78 min (Figure 5b)
From Figure 5b, the most abundant peak sits at m/z = 525.76.
Step 1 — Find the charge state:
Look at the isotope spacing in the zoomed inset. The isotope peaks are separated by 0.5 m/z, so:
z = 1 / 0.5 = 2
The charge state is +2, which is completely typical for a tryptic peptide of this size — most tryptic peptides come out doubly charged because trypsin cuts after K and R, leaving one basic residue at the C-terminus and the free N-terminus accounting for the second charge.
Step 2 — Calculate the singly charged mass:
[M+H]+ = (m/z × z) - (z - 1) × 1.00728
[M+H]+ = (525.76 × 2) - (1 × 1.00728)
[M+H]+ = 1051.52 - 1.007 = 1050.51 Da
So the singly charged monoisotopic mass of this peptide is approximately 1050.51 Da, which you’ll use in Question 6 to match it against the predicted peptide list from the PeptideMass tool.
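Q5’s arithmetic as a small helper: [M+H]+ from an observed m/z and charge, using the same formula and proton mass as the text. The function name is my own.

```python
PROTON = 1.00728  # Da

def singly_protonated(mz: float, z: int) -> float:
    """[M+H]+ from an ion seen at m/z with charge z: z*(m/z) - (z - 1)*proton."""
    return z * mz - (z - 1) * PROTON

mh = singly_protonated(525.76, 2)
print(f"[M+H]+ = {mh:.4f}")
```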
Q6. Identifying the peptide and calculating mass accuracy
Taking the experimental mass of 1050.51 Da from Question 5 and cross-referencing it against the peptide list from the PeptideMass tool, the matching tryptic peptide from the eGFP sequence is FEGDTLVNR, which has a theoretical monoisotopic [M+H]+ of 1050.521 Da. Its 2+ ion would sit at m/z 525.764, right where the observed peak is.
Calculate PPM error (using the unrounded Q5 value, 1050.5127):
PPM error = (|MW_experiment - MW_theory| / MW_theory) × 1,000,000
PPM error = (|1050.5127 - 1050.5214| / 1050.5214) × 1,000,000
PPM error = (0.0087 / 1050.5214) × 1,000,000
PPM error ≈ 8.3 ppm
For a BioAccord system you’d typically expect to land somewhere between 5 and 20 ppm for peptide masses, so 8.3 ppm sits comfortably inside that range. Keep in mind that the m/z in Figure 5b is given to only two decimals, which by itself limits the precision to roughly ±10 ppm at this mass. If your number comes out higher than expected, the most common reason is accidentally picking the M+1 or M+2 isotope peak instead of the true monoisotopic peak. Above roughly 1,800 Da the monoisotopic peak is no longer the tallest in the cluster, and for much larger peptides it can be one of the smallest, which catches a lot of people off guard the first time.
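Computing the theoretical mass directly from residue masses is a good way to check a match. This sketch sums standard monoisotopic residue masses, adds water and a proton to get [M+H]+, and computes the ppm error against the Q5 value. It assumes unmodified residues (no Cys alkylation or Met oxidation); the helper name is my own, and PeptideMass with the Figure 4 settings should list essentially the same value.

```python
MONO = {  # monoisotopic residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def peptide_mh(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

theory = peptide_mh("FEGDTLVNR")
observed = 2 * 525.76 - PROTON          # [M+H]+ from the 2+ ion at m/z 525.76
ppm = abs(observed - theory) / theory * 1e6
print(f"theory {theory:.4f}, observed {observed:.4f}, error {ppm:.1f} ppm")
```

The same function puts DELYK at only about 667.33, nowhere near the observed mass.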
Q7. Percentage of sequence confirmed by peptide mapping (Figure 6)
Looking at Figure 6’s amino acid coverage map, the highlighted/colored residues represent positions confirmed by identified peptides. From the coverage map, roughly 85–90% of the eGFP sequence is covered. This is a really good result for a standard tryptic peptide map; full 100% coverage is rare because there are always a few peptides that are too small, too large, or too hydrophobic to detect reliably. The His-tag region and very short peptides tend to be the ones that fall through the cracks, and this sequence produces several of those (TR, QK, IR, and a lone R).
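Coverage itself is simple to compute once you have the list of identified peptides. This sketch marks every residue covered by at least one peptide and reports the percentage; overlapping peptides are counted once, and the toy sequence and peptides are invented, not the Figure 6 data.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percent of residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:                      # mark every occurrence
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)

# Toy example (not the Figure 6 data): 8 of 12 residues covered.
print(sequence_coverage("MAGKTTLRVVSE", ["MAGK", "TTLR"]))
```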
Bonus Q8. Peptide sequence from fragmentation spectrum (Figure 5c)
This is where it gets genuinely interesting. Take the peptide mass you identified (~1050.51 Da) and find the matching sequence from your PeptideMass output. Paste that sequence into the fragment ion calculator at db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html and generate the predicted b- and y-ion series.
Then compare those predicted fragment masses to the peaks in Figure 5c. b-ions are N-terminal fragments, y-ions are C-terminal fragments. If most of the peaks in the fragmentation spectrum match your predicted series within a few ppm, you’ve confirmed the sequence. For a doubly charged precursor around m/z 525.76, you’d expect to see a series of singly charged y-ions and b-ions across the 200–1000 Da range, giving you a readout of the sequence from both ends simultaneously.
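What the fragment ion calculator produces can be approximated in a few lines for the simplest case. This sketch assumes singly charged ions, monoisotopic masses, and no modifications, and computes b ions (N-terminal pieces plus a proton) and y ions (C-terminal pieces plus water and a proton) for FEGDTLVNR, the tryptic peptide whose 2+ ion falls at m/z 525.76. The helper name is my own.

```python
MONO = {  # monoisotopic residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def by_ions(peptide: str):
    """Singly charged b1..b(n-1) and y1..y(n-1) ions for an unmodified peptide."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:                 # b_i: first i residues + proton
        running += m
        b_ions.append(running + PROTON)
    running = WATER
    for m in reversed(masses[1:]):        # y_i: last i residues + water + proton
        running += m
        y_ions.append(running + PROTON)
    return b_ions, y_ions

b, y = by_ions("FEGDTLVNR")
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i} {bi:9.4f}   y{i} {yi:9.4f}")
```

For any tryptic peptide ending in R, y1 should come out near 175.119, and each b(i) + y(n − i) pair adds up to [M+H]+ plus one proton, which is a handy check when labeling peaks in Figure 5c.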
Bonus Q9. Does the peptide map make sense — is this actually eGFP?
Yes, absolutely, and this is the whole point of doing a peptide map in the first place. Between the mass accuracy of the individual peptide identifications and the ~85–90% sequence coverage shown in Figure 6, you have strong evidence that the protein you analyzed is eGFP. If this were an unknown or misidentified protein, you’d see peptide masses that don’t match the expected tryptic fragments, gaps or mismatches in the coverage map, and poor mass accuracy across the board.
The fact that a large majority of peptides match their predicted masses within single-digit ppm error, combined with the fragmentation spectra matching predicted b/y-ion series, gives you essentially orthogonal confirmation of the protein’s identity and primary structure. It’s much stronger evidence than just running a gel and seeing a band at the right molecular weight.
Waters Part IV — Oligomers (KLH)
Identifying oligomeric states on the CDMS spectrum (Figure 7)
Using the subunit masses from Table 1 (7FU = 340 kDa, 8FU = 400 kDa), I can calculate the expected mass for each oligomeric species and then locate them on Figure 7.
7FU Decamer (10 subunits of 7FU):
Mass = 10 × 340 kDa = 3,400 kDa
8FU Didecamer (20 subunits of 8FU):
Mass = 20 × 400 kDa = 8,000 kDa
8FU 3-Decamer (30 subunits of 8FU):
Mass = 30 × 400 kDa = 12,000 kDa
8FU 4-Decamer (40 subunits of 8FU):
Mass = 40 × 400 kDa = 16,000 kDa
So on Figure 7 you’re looking for four distinct peaks or clusters sitting at approximately 3.4 MDa, 8 MDa, 12 MDa, and 16 MDa respectively. The 8FU species are evenly spaced 4,000 kDa apart from each other, which is a useful sanity check: if your peak assignments are correct, that consistent spacing should be obvious on the spectrum.
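The expected masses and the 8FU spacing can be tabulated directly from Table 1’s subunit masses; the dictionary layout below is just one way to organize it.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}  # subunit masses from Table 1

# (subunit type, number of subunits) for each species discussed above
SPECIES = {
    "7FU decamer": ("7FU", 10),
    "8FU didecamer": ("8FU", 20),
    "8FU 3-decamer": ("8FU", 30),
    "8FU 4-decamer": ("8FU", 40),
}

expected_kda = {name: n * SUBUNIT_KDA[sub] for name, (sub, n) in SPECIES.items()}
eight_fu = [kda for name, kda in expected_kda.items() if name.startswith("8FU")]
spacings = [hi - lo for lo, hi in zip(eight_fu, eight_fu[1:])]
print(expected_kda, spacings)
```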
The reason CDMS works so well here is that it measures the mass of each individual particle directly, without needing to resolve overlapping charge states like conventional MS would. For something as massive as KLH, which can reach 16 MDa, conventional MS would give you a completely unresolvable mess of overlapping charge envelopes. CDMS sidesteps that entirely by simultaneously measuring both the charge and the m/z of each single ion, giving you a clean direct mass readout even at these enormous sizes.
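The per-ion idea behind CDMS can be sketched: each ion’s mass comes from its own measured charge and m/z, and that mass is then matched to the nearest expected species. The (z, m/z) readings below are invented for illustration, not taken from Figure 7, and a real CDMS workflow would histogram thousands of ions rather than classify three.

```python
PROTON = 1.00728  # Da
EXPECTED_MDA = {"7FU decamer": 3.4, "8FU didecamer": 8.0,
                "8FU 3-decamer": 12.0, "8FU 4-decamer": 16.0}

def ion_mass_mda(z: int, mz: float) -> float:
    """Mass of one ion from its own measured charge and m/z, in MDa."""
    return z * (mz - PROTON) / 1e6

def nearest_species(mass_mda: float) -> str:
    """Name of the expected oligomer closest in mass."""
    return min(EXPECTED_MDA, key=lambda name: abs(EXPECTED_MDA[name] - mass_mda))

# Made-up single-ion readings (charge, m/z), for illustration only.
ions = [(145, 23450.0), (220, 36400.0), (270, 44500.0)]
for z, mz in ions:
    m = ion_mass_mda(z, mz)
    print(f"z={z}, m/z={mz:.0f} -> {m:.2f} MDa -> {nearest_species(m)}")
```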
Waters Part V — Did I Make GFP?
| | Theoretical | Observed/Measured on Intact LC-MS | PPM Mass Error |
|---|---|---|---|
| Molecular weight (kDa) | 27.854 kDa | ~27.856 kDa (read from deconvoluted spectrum) | ~72 ppm |
The theoretical MW is the calculated mass from Q1 for the full sequence, including the His-tag and LE linker. The observed value comes from the deconvoluted intact LC-MS spectrum (where the instrument software converts the charge state distribution back into a single mass). For a well-run experiment on a Xevo G3, you’d expect the error to be well under 100 ppm for an intact protein, ideally closer to 10–50 ppm. If your observed mass matches the theoretical within that range, then yes, you made (or at least received) eGFP of the correct size; the compact, low-charge native spectrum from Part II is what tells you it’s folded.