Week 10 HW: Advanced Imaging & Mass Spectrometry

Week 10 Homework: Imaging, Measurement & Mass Spec

        /\__/\
       / ·  ·  \
      |    ‿    |
       \  ~~~  /
        `-----´

Homework: Final Project — what you will measure (novel SOD3 design)

Aspects to measure

What	Why it matters
Identity & purity	Confirm you expressed the intended construct, not a truncated product or contaminant.
Mass (intact)	Matches design MW within instrument tolerance (ppm).
Primary structure	Peptide map shows coverage across the sequence; confirms mutations and fusion junctions.
Oligomeric state (if relevant)	Native MS or SEC shows whether SOD3 is monomeric, dimeric, or a higher-order assembly (for example, when fused to a dimerizing domain).
Metal cofactor	SOD3 binds Cu/Zn (or Zn/Zn in some forms); ICP-MS quantifies the bound metals, and activity correlates with correct metallation.
Activity	Enzymatic superoxide dismutation (e.g., cytochrome c assay) proves function, not just presence.

How you would perform these measurements

Intact protein mass: Purify protein, buffer-exchange into MS-friendly volatile buffer (e.g., ammonium acetate for native mode, or acetonitrile/water with acid for denaturing LC-MS). Run LC-MS on a high-resolution instrument (Q-ToF, Orbitrap). Deconvolute the charge envelope to a neutral mass.
Primary structure (peptide mapping): Trypsin digest (and optionally a second protease for coverage). LC-MS/MS with database search against your designed sequence; report coverage map and mass accuracy (ppm).
Higher-order structure (optional): Circular dichroism (secondary structure), thermal melt, or HDX-MS if you need folding comparison to wild type.
Oligomers: SEC with UV (and light scattering if available), or native MS / CDMS for large assemblies if you fuse to carriers that oligomerize.
Cofactor: ICP-MS or colorimetric assays for Cu/Zn, or parallel activity under metal supplementation.

Technologies (detail)

Technology	Role for SOD3 project
SDS-PAGE / native gel	Quick purity and apparent MW; non-reducing vs reducing if you have disulfides.
UV–Vis	Protein concentration; SOD proteins have aromatic absorbance at 280 nm.
Liquid chromatography (SEC, IEX)	Purification and aggregation screening before MS.
Mass spectrometry (intact LC-MS, bottom-up proteomics)	Molecular weight confirmation and sequence validation (this week’s focus).
Activity assay	Functional readout that MS alone cannot give.

Waters Part I — Molecular weight (eGFP)

1. Calculated molecular weight from sequence

Paste the sequence (one letter; includes LE linker + His₆ tag) into ExPASy Compute pI/Mw or ProtParam and record the reported molecular weights.

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH

Values consistent with standard tables (linear chain, unmodified; 247 residues):

Type	Molecular weight
Average (as in ProtParam “Molecular weight”)	28,006.6 Da (~28.01 kDa)
Monoisotopic (linear sequence + H₂O for termini)	27,989.0 Da (~27.99 kDa)

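ProtParam’s “Molecular weight” is an average mass and is the usual “kDa from sequence” value. A ppm comparison is only meaningful between masses of the same type; this write-up compares the deconvoluted intact mass against the monoisotopic linear mass.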

Note: mature eGFP in vivo has a cyclized chromophore, and maturation lowers the mass by about 20 Da (loss of H₂O on cyclization and 2 H on oxidation). The linear calculator mass is still the usual reference for “sequence-based” MW in homework unless your instructor specifies otherwise.
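As a cross-check on the calculator values, both masses can be recomputed from standard residue masses. The sketch below is a minimal example, not ProtParam's implementation: the residue-mass tables are typed in by hand from standard values, and the names `AVG`, `MONO`, and `chain_mass` are hypothetical. For a single chain, the average and monoisotopic masses should differ by only about 0.06% (under 20 Da at this size), so a much larger gap between the two entries suggests one of them was transcribed incorrectly.

```python
# Residue masses in Da (standard values, typed in by hand for this sketch).
AVG = {"A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
       "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
       "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
       "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326}
MONO = {"A": 71.03711, "R": 156.10111, "N": 114.04293, "D": 115.02694, "C": 103.00919,
        "E": 129.04259, "Q": 128.05858, "G": 57.02146, "H": 137.05891, "I": 113.08406,
        "L": 113.08406, "K": 128.09496, "M": 131.04049, "F": 147.06841, "P": 97.05276,
        "S": 87.03203, "T": 101.04768, "W": 186.07931, "Y": 163.06333, "V": 99.06841}
WATER_AVG = 18.0153    # one H2O for the free N- and C-termini
WATER_MONO = 18.01056

# Part I construct: eGFP + LE linker + His6 (copied from the sequence above).
EGFP = "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"


def chain_mass(seq, table, water):
    """Mass of an unmodified linear chain: sum of residue masses plus one water."""
    return sum(table[aa] for aa in seq) + water


avg_mass = chain_mass(EGFP, AVG, WATER_AVG)
mono_mass = chain_mass(EGFP, MONO, WATER_MONO)
print(f"{len(EGFP)} residues, average {avg_mass:.1f} Da, monoisotopic {mono_mass:.1f} Da")
```

The same function works for any designed construct (for example the SOD3 variant) as long as the sequence contains only the 20 standard residues.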

2. Adjacent charge state approach (Figure 1)

Use two adjacent peaks in the charge-state envelope from the intact LC-MS spectrum (Figure 1). Label the higher m/z peak \(m/z_n\) and the lower m/z peak \(m/z_{n+1}\); both come from the same neutral mass \(M\), with charges differing by 1.

Charge from an adjacent pair (recitation / course handout):

\[ z = \frac{m/z_{n+1}}{m/z_n - m/z_{n+1}} \]

\(m/z_{n+1}\) is the peak at lower m/z (higher charge); \(m/z_n\) is at higher m/z (lower charge). The formula returns the charge \(z\) of the higher m/z peak, so the lower m/z peak carries \(z+1\). It neglects the proton mass in the numerator, which shifts the estimate by less than 0.05 charge units for the pairs below and does not change the rounded integer.

Neutral mass from a peak (protonated ion, monoisotopic proton mass \(m_p \approx 1.00728\) Da):

\[ M = z\,(m/z - m_p) \]

For a consistent pair, the same \(M\) should be obtained whether you use the \(z\) ion at \(m/z_n\) or the \((z{+}1)\) ion at \(m/z_{n+1}\) (after rounding \(z\) to the nearest integer).

Example pair A (labels from Figure 1): \(m/z_n = 903.7148\), \(m/z_{n+1} = 875.4421\).

\[ z = \frac{875.4421}{903.7148 - 875.4421} = \frac{875.4421}{28.2727} \approx 30.96 \]

Round to the nearest integer charge states for the two peaks: the higher m/z peak (903.7148 Th) carries 31 protons; the lower m/z peak (875.4421 Th) carries 32 (adjacent charge states for the same neutral mass).

Then:

\(M = 31 \times (903.7148 - 1.00728) \approx 27{,}983.9\) Da
\(M = 32 \times (875.4421 - 1.00728) \approx 27{,}981.9\) Da

The two values agree to within about 2 Da.

Example pair B: \(m/z_n = 1000.5021\), \(m/z_{n+1} = 966.0390\).

\[ z = \frac{966.0390}{1000.5021 - 966.0390} = \frac{966.0390}{34.4631} \approx 28.03 \rightarrow z = 28,\ z + 1 = 29 \]

\(M = 28 \times (1000.5021 - 1.00728) \approx 27{,}985.9\) Da
\(M = 29 \times (966.0390 - 1.00728) \approx 27{,}985.9\) Da

Deconvoluted mass to report (average of consistent pairs): about 27,982–27,986 Da (~27.98 kDa), within a few daltons of the monoisotopic linear sequence mass from §1.
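The adjacent-pair arithmetic above can be scripted so every pair in the envelope is treated the same way. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the charge estimate uses the exact form with the proton mass subtracted in the numerator, which is a small refinement of the handout formula rather than the handout formula itself.

```python
PROTON = 1.00728  # Da, same m_p as in the text


def charge_of_higher_mz_peak(mz_high, mz_low, proton=PROTON):
    """Unrounded charge z of the higher-m/z peak; the lower-m/z peak carries z + 1."""
    return (mz_low - proton) / (mz_high - mz_low)


def neutral_mass(mz, z, proton=PROTON):
    """Neutral mass M = z * (m/z - m_p) for a protonated ion."""
    return z * (mz - proton)


# Peak pairs (higher m/z, lower m/z) read from Figure 1.
pairs = [(903.7148, 875.4421), (1000.5021, 966.0390)]

for mz_high, mz_low in pairs:
    z = round(charge_of_higher_mz_peak(mz_high, mz_low))
    m_from_high = neutral_mass(mz_high, z)
    m_from_low = neutral_mass(mz_low, z + 1)
    print(f"z = {z}/{z + 1}: M = {m_from_high:.1f} Da and {m_from_low:.1f} Da")
```

If the two masses from one pair disagree by much more than a few daltons, the peaks are probably not adjacent charge states of the same species.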

Accuracy (fractional error from your handout):

\[ \text{Accuracy} = \frac{\lvert MW_{\text{experiment}} - MW_{\text{theory}}\rvert}{MW_{\text{theory}}} \]

Using \(MW_{\text{experiment}} \approx 27{,}982\) Da and \(MW_{\text{theory}} = 27{,}989\) Da (monoisotopic linear from §1):

\[ \text{Accuracy} \approx \frac{\lvert 27{,}982 - 27{,}989\rvert}{27{,}989} \approx 2.5 \times 10^{-4} \quad (\text{about } 0.025\%) \]

ppm (parts per million):

\[ \text{ppm} = \frac{\lvert MW_{\text{exp}} - MW_{\text{theory}}\rvert}{MW_{\text{theory}}} \times 10^{6} \approx \frac{7}{27{,}989} \times 10^{6} \approx 250\ \text{ppm} \]

Item	Value used here
Theoretical MW (§1), monoisotopic linear	27,989 Da
Adjacent pair (example)	903.7148 & 875.4421 Th
\(z\) from formula	~31 (on the higher m/z peak of the pair; the lower m/z peak carries 32)
Deconvoluted \(M_{\text{obs}}\)	~27,982 Da
ppm vs monoisotopic theory	~250 ppm
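The ppm calculation is the same for every comparison in this homework, so it is worth keeping as a small helper. A minimal sketch, with a hypothetical function name:

```python
def ppm_error(observed, theoretical):
    """Absolute mass error in parts per million, relative to the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6


# Intact eGFP values used in this section (Da).
intact_ppm = ppm_error(27982.0, 27989.0)
print(f"{intact_ppm:.0f} ppm")
```

Both arguments must be on the same mass scale (both average or both monoisotopic) for the result to be meaningful.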

3. Zoomed-in peak in Figure 1 (~1473 m/z)

Observation: The inset shows a weak, jagged cluster around ~1473.7 Th, not a clean isotopic ladder.

Can you assign the charge state? Not reliably from this inset alone. The spacing between labeled maxima is only ~0.04–0.07 Th; if that were interpreted as \(1/z\) for a single isotopic cluster, it would imply a very high \(z\) (~14–25), but the signal-to-noise is poor and the “peaks” are not resolved isotopes on a smooth baseline, so you cannot read a trustworthy \(1/z\) spacing. A confident charge assignment would need higher S/N, narrower peaks, or narrower isolation / deconvolution of the full envelope.
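One way to use the full envelope instead of the inset is to ask what charge the peak would carry if it belonged to the same protein. The sketch below assumes, without proof from the inset, that the ~1473.7 Th feature is part of the eGFP envelope and reuses the ~27,982 Da deconvoluted mass from Part 2; the function name is hypothetical.

```python
PROTON = 1.00728  # Da


def charge_from_known_mass(neutral_mass, mz, proton=PROTON):
    """Unrounded charge implied by a known neutral mass and an observed m/z."""
    return neutral_mass / (mz - proton)


z_estimate = charge_from_known_mass(27982.0, 1473.7)
z = round(z_estimate)
expected_spacing = 1.0 / z  # isotope spacing in Th for that charge

print(f"z ≈ {z_estimate:.2f} -> {z}, expected isotope spacing ≈ {expected_spacing:.3f} Th")
```

An estimate that lands close to an integer supports the assumption, and the implied spacing can then be compared with the ~0.04–0.07 Th range seen in the inset.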

Waters Part II — Native vs denatured

This part is marked optional in the course, so I’m not submitting answers here—by choice, not by accident. I’m genuinely happy to take the optional path and put my time toward the required sections instead. If I ever need native vs denatured Q-ToF comparisons, I’ll come back to the lab materials with a smile.

Waters Part III — Peptide mapping (primary structure)

1. Lysines and arginines in eGFP

Counting K and R from the Part I sequence gives the same result as Benchling → Biochemical properties (or Expasy ProtParam amino-acid composition).

Answer: 20 lysines (K) and 6 arginines (R).

K at 1-based positions: 4, 27, 42, 46, 53, 80, 86, 102, 108, 114, 127, 132, 141, 157, 159, 163, 167, 210, 215, 239.

R at positions: 74, 97, 110, 123, 169, 216.

Highlighted on the full sequence (yellow = K, pink = R; matches Part I order):

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH
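The counts and positions can be reproduced directly from the sequence string. A minimal sketch (the helper name `positions` is hypothetical):

```python
# Part I construct: eGFP + LE linker + His6.
EGFP = "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"


def positions(seq, residue):
    """1-based positions of a residue in the sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == residue]


k_positions = positions(EGFP, "K")
r_positions = positions(EGFP, "R")
print(len(k_positions), "K at", k_positions)
print(len(r_positions), "R at", r_positions)
```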

2. How many tryptic peptides?

Trypsin cleaves after K and R unless the next residue is P. The Part I sequence has 26 such cleavage sites, which produces 27 peptide fragments (including very short ones such as R, TR, QK, IR — PeptideMass still counts each as a peptide).

PeptideMass answer: after “Perform the cleavage” with Trypsin and the same options as Figure 4 in the lab handout, the tool should report 27 peptides. If your number differs, check that enzyme is trypsin only, no extra missed-cleavage settings conflict with Figure 4, and the pasted sequence matches Part I exactly (247 residues).
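The cleavage rule above (after K or R, not before P, no missed cleavages) is simple enough to check independently of PeptideMass. This is a minimal sketch of that rule only; it does not reproduce PeptideMass options such as mass filters or modifications, and the function name is hypothetical.

```python
import re

# Part I construct: eGFP + LE linker + His6.
EGFP = "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"


def trypsin_digest(seq):
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    # Zero-width split: position is preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


peptides = trypsin_digest(EGFP)
print(len(peptides), "peptides")
for p in peptides:
    print(len(p), p)
```

Zero-width splitting with `re.split` requires Python 3.7 or later.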

3. PeptideMass

Use PeptideMass, paste the assignment sequence, set enzyme Trypsin, and replicate all options from Figure 4. Report the number of peptides the tool prints after “Perform the cleavage.” I haven’t completed this yet but I will.

4. Peaks in Figure 5a (0.5–6 min, >10% relative abundance)

Figure 5a below is the total ion chromatogram (TIC) for the eGFP tryptic peptide map (04142026_GFP digest_gud, TOF MSe, 50–2000 m/z, ESI+). The tallest peak is at 4.87 min (~\(1.15 \times 10^7\) counts). Taking >10% relative abundance as ≥10% of that base peak (~\(1.15 \times 10^6\) counts), the small labeled peaks at ~1.20 and ~5.43 min look below that threshold; all other labeled peaks in the window appear above it.

Between 0.5 and 6.0 min, the figure shows 21 retention-time labels on distinct apexes (0.61, 0.79, 1.20, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06, 5.43). Excluding the two that likely fall under 10% of the base peak gives 19 peaks counted under the assignment rule.

5. More peaks or fewer vs prediction?

Predicted tryptic peptides (§2): 27 fragments from a full in-silico trypsin digest of the eGFP sequence.

Peaks in Figure 5a (§4, 0.5–6 min): 21 labeled apexes, or 19 if you only count peaks ≥10% of the base peak (4.87 min).

Does the peak count match 27? No — there are fewer chromatographic peaks than predicted peptides in this run and time window.

Why fewer is normal:

Co-elution: Two or more peptides can leave the column at the same time and appear as one TIC apex, so the number of TIC peaks can be smaller than the number of peptide species.
Time window: The assignment only counts 0.5–6 min; any tryptic peptide eluting before 0.5 or after 6 min would not be counted here even though it is in the digest.
Sensitivity: Very small or poorly ionizing peptides may fall below the display threshold (or the >10% rule), so they do not appear as distinct peaks.
In general: For a TIC, you can also sometimes see more apparent peaks than “27” if you included adducts, partial cleavage products, or oxidized variants as separate features—but this TIC shows fewer than 27 in the stated window, which is consistent with co-elution and window/sensitivity effects, not a contradiction with the protein being eGFP.

6. Figure 5b — m/z, charge, singly charged mass

Precursor spectrum for the peak eluting at 2.78 min (combined with Figure 5c in the screenshot below).

Most abundant precursor (monoisotopic apex): m/z 525.76712 (a +2 charge envelope; minor ions near 350.84 and 1050.52 are consistent with the +3 and +1 charge states of the same peptide).

Isotope spacing (inset): e.g. 525.76712 → 526.25918 Th, so \(\Delta \approx 0.492\) Th. For a single isotopic cluster, \(\Delta(m/z) \approx 1/z\), so \(z \approx 1/0.492 \approx 2.03\), giving charge \(z = 2\) ([M+2H]²⁺).

Neutral peptide mass from the measured \(m/z\) and \(z=2\) (monoisotopic proton mass \(m_p = 1.00728\) Da):

\[ M_{\text{obs}} = z\,(m/z - m_p) = 2 \times (525.76712 - 1.00728) \approx \mathbf{1049.52\ Da} \]

Singly protonated mass \([M{+}H]^+\) (one proton on the neutral peptide):

\[ [M{+}H]^+ = M_{\text{obs}} + m_p \approx 1049.52 + 1.00728 \approx \mathbf{1050.53\ Da} \]

(This agrees with the ~1050.52 Th feature in the full scan as the +1 ion of the same peptide.)

Quantity	Value
m/z (main ion, monoisotopic)	525.76712
\(\Delta\) between isotopes (inset)	~0.49 Th
Inferred \(z\)	2
Neutral peptide mass \(M\)	~1049.52 Da
\([M{+}H]^+\)	~1050.53 Da
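The chain of steps in this part (isotope spacing, then charge, then neutral mass, then singly protonated mass) can be sketched as a few lines of code. The function name is hypothetical, and the two m/z values are the isotope peaks read from the Figure 5b inset.

```python
PROTON = 1.00728  # Da


def charge_from_isotope_spacing(mz_first, mz_second):
    """Charge from the spacing of two neighbouring isotope peaks (spacing ≈ 1/z)."""
    return round(1.0 / abs(mz_second - mz_first))


mono_mz = 525.76712
next_isotope_mz = 526.25918

z = charge_from_isotope_spacing(mono_mz, next_isotope_mz)
neutral = z * (mono_mz - PROTON)   # M = z * (m/z - m_p)
singly_protonated = neutral + PROTON  # [M+H]+

print(f"z = {z}, M = {neutral:.4f} Da, [M+H]+ = {singly_protonated:.4f} Da")
```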

7. Identify peptide and ppm error

Peptide identity: match \(M_{\text{obs}}\) or \([M{+}H]^+\) to PeptideMass tryptic masses for the Part I sequence. The closest tryptic peptide is FEGDTLVNR (residues 115–123; cleavage after K114 at …K|FEGDTLVNR… in eGFP).

Theoretical monoisotopic masses: \(M_{\text{theory}} \approx 1049.514\) Da, \([M{+}H]^+_{\text{theory}} \approx 1050.521\) Da.

Accuracy (fractional error):

\[ \text{Accuracy} = \frac{\lvert MW_{\text{experiment}} - MW_{\text{theory}}\rvert}{MW_{\text{theory}}} \]

Using neutral masses: \(\lvert 1049.5197 - 1049.5142\rvert / 1049.5142 \approx \mathbf{5.3 \times 10^{-6}}\) (~0.00053%).

ppm error:

\[ \text{ppm} = \frac{\lvert M_{\text{obs}} - M_{\text{theory}}\rvert}{M_{\text{theory}}} \times 10^{6} \approx \mathbf{5.3\ ppm} \]

(Using \([M{+}H]^+\) instead gives the same order of magnitude.)
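The theoretical peptide mass and the ppm error can be checked without PeptideMass by summing monoisotopic residue masses. A minimal sketch, assuming an unmodified peptide; the residue table is typed in by hand from standard values and the function name is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, typed in by hand).
MONO = {"A": 71.03711, "R": 156.10111, "N": 114.04293, "D": 115.02694, "C": 103.00919,
        "E": 129.04259, "Q": 128.05858, "G": 57.02146, "H": 137.05891, "I": 113.08406,
        "L": 113.08406, "K": 128.09496, "M": 131.04049, "F": 147.06841, "P": 97.05276,
        "S": 87.03203, "T": 101.04768, "W": 186.07931, "Y": 163.06333, "V": 99.06841}
WATER = 18.01056   # Da, monoisotopic H2O
PROTON = 1.00728   # Da


def peptide_mono_mass(seq):
    """Monoisotopic neutral mass of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


theory = peptide_mono_mass("FEGDTLVNR")
observed = 2 * (525.76712 - PROTON)  # [M+2H]2+ precursor from Figure 5b
ppm = abs(observed - theory) / theory * 1e6

print(f"theory {theory:.4f} Da, observed {observed:.4f} Da, error {ppm:.1f} ppm")
```

Running `peptide_mono_mass` over every peptide from an in-silico digest and picking the smallest ppm error is one way to automate the matching step.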

8. Percent sequence confirmed (Figure 6)

BioAccord reports amino acid coverage from peptide identifications. Figure 6 below (“Amino Acid Coverage Map of eGFP based on BioAccord LC-MS peptide identification data”) shows Identified: 88% and Chain 1 (88% coverage).

Answer: 88% of the protein sequence is covered by confident peptide matches (highlighted segments in the map). A few short stretches remain unidentified (white / non-highlighted gaps in the map)—e.g. segments around LPVPWPTL, parts of VTTLT / YGVQC, TRA, IDF, and a single Q—so not every residue received a confident tryptic ID in this run.

Percent coverage = (residues covered by identified peptides) / (total residues) × 100% = 88% (from the BioAccord summary bar).

Part IV — KLH oligomers (CDMS)

Subunit masses from Table 1: 7FU ≈ 340 kDa, 8FU ≈ 400 kDa per polypeptide chain (1,000 kDa = 1 MDa).

Expected oligomer masses (integer subunit counts × subunit mass):

Oligomer (assignment)	Calculation	Expected mass
7FU decamer (10 × 7FU)	10 × 340 kDa	3.4 MDa
8FU didecamer (20 × 8FU)	20 × 400 kDa	8.0 MDa
8FU 3-decamer (30 × 8FU)	30 × 400 kDa	12.0 MDa
8FU 4-decamer (40 × 8FU)	40 × 400 kDa	16.0 MDa
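The expected masses in the table are just copy number times subunit mass, and the same arithmetic gives a quick percent deviation once a peak label is read off the CDMS spectrum. A minimal sketch with hypothetical names; the subunit masses are the rounded Table 1 values quoted above, and 8.33 MDa is the label of the strongest peak in Figure 7.

```python
# Subunit masses per polypeptide chain, in kDa (rounded values from Table 1).
SUBUNIT_KDA = {"7FU": 340.0, "8FU": 400.0}

# (species name, subunit type, number of chains)
OLIGOMERS = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]


def expected_mda(subunit, copies):
    """Expected oligomer mass in MDa (1,000 kDa = 1 MDa)."""
    return SUBUNIT_KDA[subunit] * copies / 1000.0


def percent_deviation(observed_mda, expected):
    """Signed deviation of an observed peak from the expected mass, in percent."""
    return (observed_mda - expected) / expected * 100.0


for name, subunit, copies in OLIGOMERS:
    print(f"{name}: {expected_mda(subunit, copies):.1f} MDa")

didecamer_dev = percent_deviation(8.33, expected_mda("8FU", 20))
print(f"8FU didecamer peak deviates by {didecamer_dev:+.1f}% from 8.0 MDa")
```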

Where each species lines up on Figure 7 (labeled maxima from the spectrum):

Species	Expected	Observed label on Figure 7 (approx.)
7FU decamer	3.4 MDa	~3.4 MDa (clear peak just before ~4.013 MDa)
8FU didecamer	8.0 MDa	~8.33 MDa (strongest peak in the spectrum); ~7.52 MDa is a nearby shoulder / related species
8FU 3-decamer	12.0 MDa	~12.67 MDa
8FU 4-decamer	16.0 MDa	No strong label exactly at 16 MDa; weak intensity is visible beyond ~12.67 MDa toward ~17 MDa (and minor bumps ~21 and ~25 MDa), consistent with a broad / low-abundance ~16 MDa assembly plus adducts or heterogeneity

Other features at ~0.20, ~0.79, ~1.52, and ~4.01 MDa are likely smaller assemblies, fragments, or alternative stoichiometries, not the four named decamer-series maxima in the table.

Takeaway: The dominant KLH signals align with 3.4 MDa (7FU decamer), ~8.3 MDa (8FU didecamer, base peak), and ~12.7 MDa (8FU 3-decamer). The 4-decamer is expected near 16 MDa but appears much weaker than the lower oligomers in this run.

Part V — Did I make GFP?

Values below use the same eGFP construct as Part I and the intact LC-MS deconvoluted mass from Figure 1 in Part I (course / handout spectrum), since that matches a monoisotopic-style deconvolution.

	Theoretical	Observed / measured (intact LC-MS)	ppm mass error
Molecular weight (kDa)	~28.01 (average MW, ProtParam / ExPASy, linear sequence + His tag)	~27.98 (deconvoluted neutral mass from Part I, adjacent charge states on Figure 1)	~250 ppm vs monoisotopic linear theoretical ~27.989 kDa (same “type” as the MS value; see note)

Note: The 28.01 kDa entry is the average molecular weight from the calculator. This write-up treats the deconvoluted mass (~27.98 kDa) as monoisotopic, so ppm is computed against the monoisotopic linear theoretical mass (~27.989 kDa, Part I). Comparing 27.98 kDa directly to the average MW would mix scales and inflate the apparent error.

ppm (vs monoisotopic linear \(M_{\text{theory}} \approx 27.989\) kDa, \(M_{\text{obs}} \approx 27.982\) kDa): \(\lvert 27.982 - 27.989\rvert / 27.989 \times 10^{6} \approx\) 250 ppm.

Quick reference

ExPASy tools: ProtParam, PeptideMass, Compute pI/Mw
Fragment prediction: FragIon
Genes in Space (if cross-linking weeks): genesinspace.org