Week 10 HW: Advanced Imaging & Measurement

Homework: Final Project — Measurement Plan for the ELM Biocontainment System

My final project centers on a modular Engineered Living Material (ELM) deep-space biocontainment system that uses phosphite auxotrophy (a ptxD-based synthetic dependency) in an engineered bacterium for Mars surface operations. Below are the key measurable quantities, the associated biological questions, and the measurement technologies I would use.


Measurable Aspects

| Measurement | What is being assessed | Technology |
| --- | --- | --- |
| ptxD protein mass & sequence | Confirm the correct protein is expressed after codon-optimization and genome integration | LC-MS peptide mapping + intact mass |
| Phosphite dehydrogenase activity (NADH generation rate) | Confirm ptxD catalytic function in the engineered strain | Spectrophotometric assay (340 nm NADH absorbance) |
| Intracellular phosphate concentration | Assess whether the synthetic auxotrophy blocks endogenous phosphate metabolism | ICP-MS (trace element mass spectrometry) |
| Genome edit confirmation (ΔpstS auxotrophy) | Verify deletion of native phosphate transporter (pstS) | Sanger sequencing + whole-genome sequencing |
| ELM structural integrity under GCR-equivalent radiation | Quantify DNA double-strand breaks, protein oxidation, and membrane damage after accelerated radiation exposure | γ-H2AX immunofluorescence (DNA), western blot (protein), LC-MS (oxidative modifications) |
| Biocontainment escape frequency | Measure frequency of revertant colonies capable of growing on phosphate-only media | Fluctuation test (Luria-Delbrück assay) |
| Mycelium mechanical strength | Characterize tensile properties of fungal structural matrix | Atomic force microscopy (AFM) nanoindentation |
| MS2 L-protein lysis efficiency | Confirm that stabilized L-protein variants maintain lysis kinetics at elevated temperature | OD600 kinetic lysis assay ± temperature ramp |

How Measurements Will Be Performed

1. ptxD Intact Mass and Peptide Mapping (LC-MS) The recombinantly expressed ptxD protein will be analyzed intact on a Waters Xevo G3 QTof system to confirm the correct molecular weight (expected ~36 kDa for the Stutzerimonas stutzeri ptxD, UniProt O69054). A tryptic digest peptide map on the Waters BioAccord will confirm the complete primary sequence and identify any unexpected post-translational modifications. Mass accuracy target: < 10 ppm for peptides, < 200 ppm for intact protein.

2. Phosphite Auxotrophy Verification (Plate assay + ICP-MS) Engineered cells will be plated on defined minimal media with either phosphite (survival expected) or phosphate (no growth expected). Intracellular phosphate levels will be quantified by ICP-MS (inductively coupled plasma mass spectrometry) to confirm the phosphate uptake block, verifying that the phosphate transporter deletion is functional.

3. Radiation Stability Testing (γ-H2AX + LC-MS oxidation profiling) Cells and purified structural proteins (fungal matrix, spider-silk biocomposites) will be exposed to high-energy proton beams (at the MIT NSRL equivalent) at a dose matched to Mars surface GCR exposure (~200 mGy/year equivalent). DNA damage will be quantified by anti-γ-H2AX immunofluorescence; protein oxidative damage (Cys and Met oxidation) will be quantified by LC-MS with an oxidized-modification search.

4. Genome Edit Confirmation (Sanger + NGS) Deletion of the pstS phosphate transporter gene and integration of the ptxD cassette will be confirmed by Sanger sequencing of PCR amplicons spanning both junctions, followed by Illumina short-read whole-genome sequencing to verify no off-target insertions.
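
The mass accuracy targets in item 1 can be turned into absolute mass windows as a rough sanity check. This is a minimal sketch: the helper name `ppm_to_da` and the 1,000 Da example peptide are illustrative assumptions, and the 36 kDa figure is the approximate intact mass quoted above.

```python
def ppm_to_da(mass_da: float, ppm: float) -> float:
    """Convert a relative tolerance in ppm into an absolute window in Da."""
    return mass_da * ppm / 1e6


# ~36 kDa intact ptxD at the 200 ppm intact-protein target
intact_window = ppm_to_da(36_000.0, 200.0)
# hypothetical 1,000 Da tryptic peptide at the 10 ppm peptide target
peptide_window = ppm_to_da(1_000.0, 10.0)

print(f"intact window:  +/- {intact_window:.1f} Da")
print(f"peptide window: +/- {peptide_window:.3f} Da")
```

Under these assumptions the intact target corresponds to roughly ±7 Da and the peptide target to roughly ±0.01 Da.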


Waters Part I: Molecular Weight of eGFP

Q1: Calculated Molecular Weight

The eGFP sequence (with His-purification tag):

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQ
CFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK
LEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP
NEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH

Using ExPASy Compute pI/MW:

| Property | Value |
| --- | --- |
| Number of amino acids | 247 |
| Calculated molecular weight | 27,745 Da (27.745 kDa) |
| Theoretical pI | 6.02 |

The eight extra residues (LEHHHHHH) add approximately 1,065 Da to the untagged eGFP sequence (average residue masses: L 113.16 + E 129.12 + 6 × H 137.14). The His-tag adds a net positive charge and slightly increases the predicted pI compared to tagless eGFP (pI ~5.9).


Q2: Adjacent Charge State MW Calculation

Selecting adjacent charge state peaks from Figure 1 (denatured intact eGFP mass spectrum):

Figure 1. Mass spectrum of intact eGFP (denatured conditions, 30,000 resolution) from the Waters Xevo G3 QTof LC-MS. Charge states z = 21–25 are labeled with their respective m/z values. Peak intensities form an envelope centered near z = 23–24.

Selected adjacent pair:

  • Peak 1: m/z₁ = 1157.14 (higher charge z₁)
  • Peak 2: m/z₂ = 1207.35 (lower charge z₂ = z₁ − 1)

Step 1 — Determine z₁ using the adjacent charge state formula:

$$z_1 = \frac{m/z_2 - H}{m/z_2 - m/z_1}$$

where H = 1.0073 Da (proton mass).

$$z_1 = \frac{1207.35 - 1.0073}{1207.35 - 1157.14} = \frac{1206.34}{50.21} = \boxed{24.03 \approx 24}$$

The nearest integer charge state is z₁ = 24, so z₂ = 23.

Step 2 — Calculate MW:

$$MW = z_1 \times m/z_1 - z_1 \times H = 24 \times 1157.14 - 24 \times 1.0073$$

$$MW = 27{,}771.36 - 24.175 = \boxed{27{,}747 \text{ Da}}$$

Step 3 — Accuracy:

Software deconvolution of the full charge state envelope gives MW = 27,745 Da.

$$\text{ppm error} = \frac{|27{,}747 - 27{,}745|}{27{,}745} \times 10^{6} = \frac{2}{27{,}745} \times 10^{6} \approx \boxed{72 \text{ ppm}}$$

This 72 ppm error reflects the inherent precision limit of manually reading m/z values from a spectrum. Software deconvolution routinely achieves < 50 ppm for intact proteins on this platform because it fits the entire charge state envelope simultaneously.
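
The three steps above can be sketched in a few lines of Python. The function names are illustrative assumptions; the peak positions, the proton mass (1.0073 Da), and the 27,745 Da software value are taken from the text. Carrying the unrounded mass (27,747.18 Da) through gives about 79 ppm; the 72 ppm quoted above follows from rounding to 27,747 Da first.

```python
PROTON = 1.0073  # Da, the value of H used in the text


def charge_from_adjacent_peaks(mz1: float, mz2: float) -> float:
    """Charge z1 of the peak at mz1, given the adjacent peak mz2 at charge z1 - 1."""
    return (mz2 - PROTON) / (mz2 - mz1)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass from one charge-state peak: M = z * (m/z - H)."""
    return z * (mz - PROTON)


def ppm_error(value: float, reference: float) -> float:
    """Absolute relative deviation in parts per million."""
    return abs(value - reference) / reference * 1e6


z_est = charge_from_adjacent_peaks(1157.14, 1207.35)
z1 = round(z_est)                      # nearest integer charge state
mw = neutral_mass(1157.14, z1)
err = ppm_error(mw, 27745.0)           # against the software-deconvoluted mass

print(f"z1 estimate = {z_est:.3f} -> {z1}")
print(f"MW = {mw:.2f} Da, deviation = {err:.1f} ppm")
```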


Q3: Charge State of the Zoomed-in Peak

At 30,000 resolving power, for the z = 24 charge state at m/z ≈ 1157, the expected isotope spacing is:

$$\Delta (m/z) = \frac{1}{z} = \frac{1}{24} \approx 0.042 \text{ m/z}$$

The peak width (FWHM) at R = 30,000 is:

$$\text{FWHM} = \frac{m/z}{R} = \frac{1157}{30{,}000} \approx 0.039 \text{ m/z}$$

Since the isotope spacing (0.042 m/z) ≈ peak width (0.039 m/z), individual isotopes are not cleanly resolved for a ~28 kDa protein at this charge state. The isotope envelope appears as a single broad peak rather than a series of clearly separated lines. Therefore, the charge state cannot be directly read from the zoomed-in peak alone. The charge state is instead determined from the ratio of adjacent charge state m/z positions in the full spectrum (as done in Q2 above).

Why not? A 28 kDa protein has a complex, multi-peak isotope distribution spanning ~5 Da (≈ 5/24 = 0.21 m/z units). At 30,000 resolution this envelope partially resolves, but the peaks are closely spaced, overlapping, and require very high resolution (> 100,000) to fully baseline-separate individual isotope peaks for a protein of this mass.
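
The comparison between isotope spacing and peak width can be reproduced numerically. This sketch assumes a ~1 Da spacing between isotopologues and a constant resolving power of 30,000, as in the text; the function names are illustrative.

```python
def isotope_spacing(z: int) -> float:
    """Spacing between adjacent isotope peaks in m/z units (~1 Da / z)."""
    return 1.0 / z


def peak_fwhm(mz: float, resolving_power: float) -> float:
    """Peak width (FWHM) in m/z units for a given resolving power R = (m/z) / FWHM."""
    return mz / resolving_power


spacing = isotope_spacing(24)
fwhm = peak_fwhm(1157.0, 30_000)
envelope_width = 5.0 / 24            # ~5 Da isotope envelope expressed in m/z

print(f"spacing  = {spacing:.4f} m/z")
print(f"FWHM     = {fwhm:.4f} m/z")
print(f"ratio    = {spacing / fwhm:.2f}")
print(f"envelope = {envelope_width:.2f} m/z")
```

A spacing-to-width ratio close to 1 means neighbouring isotope peaks overlap at roughly half height, which is why the charge state is taken from adjacent charge-state peaks instead.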


Waters Part II: Secondary/Tertiary Structure

Q1: Native vs. Denatured Protein Conformations

What happens when a protein unfolds? In its native (folded) state, a protein maintains a compact three-dimensional structure stabilized by non-covalent interactions: hydrophobic packing of the core, hydrogen bonds (forming α-helices and β-sheets), salt bridges between charged residues, and van der Waals contacts. These interactions shield many of the basic sites (Lys ε-amine, Arg guanidinium, His imidazole, N-terminal amine) from the solvent, limiting the number of protons that can be added in positive-ion ESI-MS.

When the protein is denatured (unfolded), these non-covalent interactions are disrupted (by organic solvent or low pH in the LC mobile phase, or by elevated temperature). The chain becomes extended, exposing more basic sites to solvent so that the protein acquires more protons during electrospray ionization. This results in higher charge states and lower m/z values for the same protein.

How is this detected by mass spectrometry? ESI-MS produces a characteristic charge state distribution (CSD). The maximum charge state scales with the number of basic sites accessible for protonation. A denatured protein therefore shows:

  • Higher charge states (more protons, lower m/z)
  • Wider envelope shifted to lower m/z (roughly m/z 700–1500)

A native protein shows:

  • Lower charge states (fewer accessible protons, higher m/z)
  • Narrower distribution shifted to higher m/z (typically 2000–4000 for a 28 kDa protein at z=8–10)

Figure 2. Mass spectra of eGFP under denatured (top, z = 21–25) and native (bottom, z = 8–11) conditions on the Waters Xevo G3 QTof MS. The denatured spectrum shows a high-charge envelope at m/z 1050–1350; the native spectrum shifts to a low-charge envelope at m/z 2300–3500.

Key differences observed (Figure 2):

  • Charge distribution shift: denatured maximum at z≈23 (m/z ~1207); native maximum at z≈10 (m/z ~2776)
  • Charge envelope width: denatured spans ~5 charge states; native spans ~4 charge states
  • m/z range: denatured 1050–1350; native 2300–3500

This shift in charge state distribution is the primary mass spectrometric indicator of protein folding state and is the foundation of native MS — ESI-MS conducted under aqueous, near-physiological solution conditions that preserve non-covalent structure.


Q2: Charge State of the ~2800 m/z Peak in the Native Spectrum

Figure 3. Native eGFP mass spectrum from the Waters Xevo G3 QTof MS. The inset shows a zoomed-in view of the charge state at ~2776 m/z at 30,000 resolution, where individual isotope peaks are resolved (Δm/z = 0.10 = 1/z).

Charge state at ~2800 m/z:

Expected m/z for each possible charge state of eGFP (MW = 27,745 Da):

| z | Expected m/z |
| --- | --- |
| 11 | 2523.3 |
| 10 | 2775.5 |
| 9 | 3083.8 |

The peak closest to 2800 m/z corresponds to z = 10 (calculated m/z = 2775.5).
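
The expected m/z values follow from m/z = M/z + H. A minimal sketch, assuming M = 27,745 Da and H = 1.0073 Da as above (the function name is illustrative):

```python
PROTON = 1.0073  # Da
MASS = 27745.0   # Da, eGFP mass used in the text


def expected_mz(mass: float, z: int) -> float:
    """m/z of the [M + zH]^z+ ion."""
    return mass / z + PROTON


candidates = {z: expected_mz(MASS, z) for z in range(8, 12)}
closest_z = min(candidates, key=lambda z: abs(candidates[z] - 2800.0))

for z, mz in sorted(candidates.items(), reverse=True):
    print(f"z = {z:2d}  m/z = {mz:.1f}")
print(f"closest to 2800 m/z: z = {closest_z}")
```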

How is the charge state confirmed from the zoomed-in peak?

At 30,000 resolution, the isotope spacing for z = 10 is:

$$\Delta (m/z)_{\text{isotope}} = \frac{1}{z} = \frac{1}{10} = 0.10 \text{ m/z}$$

The peak width at m/z ≈ 2776 is:

$$\text{FWHM} = \frac{2776}{30{,}000} \approx 0.093 \text{ m/z}$$

Since the isotope spacing (0.10 m/z) slightly exceeds the peak width (0.093 m/z), individual isotope peaks are just resolved in the zoomed view (Figure 3 inset). The spacing of 0.10 m/z between adjacent isotope peaks directly gives z = 1/0.10 = 10.

This illustrates why high resolution matters in native MS: lower charge states produce larger isotope spacings (1/z), so when the instrument resolves them, the charge state, and hence the mass, can be read directly from the isotope pattern. Because the peak width at a fixed resolving power also grows with m/z, the margin here is narrow.
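
The comparison of isotope spacing with peak width can be condensed into a single ratio. This is a sketch that assumes a constant resolving power R across the m/z range (a first approximation for a time-of-flight analyzer) and a ~1 Da isotope spacing; M denotes the neutral mass.

$$\frac{\Delta (m/z)_{\text{isotope}}}{\text{FWHM}} = \frac{1/z}{(m/z)/R} = \frac{R}{z \times (m/z)} \approx \frac{R}{M} = \frac{30{,}000}{27{,}745} \approx 1.08$$

Under that assumption the ratio is the same at z = 10 (0.10/0.093) and at z = 24 (0.042/0.039), so the margin for resolving isotopes is set mainly by R relative to M, and whether the peaks are usable in practice also depends on signal-to-noise and peak shape.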


Waters Part III: Peptide Mapping — Primary Structure

Q1: Lysines and Arginines in eGFP

Counting K (Lys) and R (Arg) residues in the eGFP + His-tag sequence:

20 Lysines at positions: 4, 27, 42, 46, 53, 80, 86, 102, 108, 114, 127, 132, 141, 157, 159, 163, 167, 210, 215, 239

6 Arginines at positions: 74, 97, 110, 123, 169, 216

Total trypsin cleavage sites: 26 (20 K + 6 R)
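
The positions can be checked programmatically. A minimal sketch using the 247-residue sequence from Part I (the variable and function names are illustrative):

```python
EGFP_HIS = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQ"
    "CFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK"
    "LEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP"
    "NEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)


def positions(seq: str, residue: str) -> list[int]:
    """1-based positions of a residue in the sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == residue]


lys = positions(EGFP_HIS, "K")
arg = positions(EGFP_HIS, "R")

print(f"{len(EGFP_HIS)} residues, {len(lys)} Lys, {len(arg)} Arg")
print("K:", lys)
print("R:", arg)
```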

Highlighted in the sequence (K and R shown in square brackets):

MVS[K]GEELFTGVVPILVELDGDVNGH[K]FSVSGEGEGDATYG[K]LTL[K]FICTTG[K]LPVPWPT
LVTTLTYGVQCFS[R]YPDHM[K]QHDFF[K]SAMPEGYVQE[R]TIFF[K]DDGNY[K]T[R]AEV[K]
FEGDTLVN[R]IEL[K]GIDF[K]EDGNILGH[K]LEYNYNSHNVYIMAD[K]Q[K]NGI[K]VNF[K]I[R]
HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALS[K]DPNE[K][R]DHMVLLEFVTAAGITLGMDELY[K]
LEHHHHHH

Q2: Tryptic Peptides from PeptideMass

Figure 4. ExPASy PeptideMass tool parameters: enzyme = trypsin (cuts after K and R, not before P); missed cleavages = 0; cysteine modification = carbamidomethylation; minimum MW = 300 Da.

Running the eGFP sequence through ExPASy PeptideMass (trypsin, 0 missed cleavages, carbamidomethylation of Cys) generates 27 predicted peptides, including small peptides (TR, QK, IR, R) that may not be detectable by LC-MS.

The 27 predicted tryptic peptides include (representative subset):

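| # | Sequence | Residues | Monoisotopic mass, M (Da) |
| --- | --- | --- | --- |
| 1 | MVSK | 1–4 | 463.25 |
| 2 | GEELFTGVVPILVELDGDVNGHK | 5–27 | 2,436.25 |
| 3 | FSVSGEGEGDATYGK | 28–42 | 1,502.65 |
| 4 | LTLK | 43–46 | 473.32 |
| 5 | FICTTGK | 47–53 | 825.41* |
| 6 | LPVPWPTLVTTLTYGVQCFSR | 54–74 | 2,434.27* |
| 7 | YPDHMK | 75–80 | 789.35 |
| 8 | QHDFFK | 81–86 | 820.39 |
| 9 | SAMPEGYVQER | 87–97 | 1,265.57 |
| 10 | TIFFK | 98–102 | 654.37 |
| 11 | DDGNYK | 103–108 | 710.29 |
| 12 | TR | 109–110 | 275.16 |
| 13 | AEVK | 111–114 | 445.25 |
| 14 | FEGDTLVNR | 115–123 | 1,049.51 |
| 15 | IELK | 124–127 | 501.32 |
| 23 | HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK | 170–210 | 4,471.17 |
| 24 | DPNEK | 211–215 | 601.27 |
| 25 | R | 216 | 174.11 |
| 26 | DHMVLLEFVTAAGITLGMDELYK | 217–239 | 2,565.29 |
| 27 | LEHHHHHH | 240–247 | 1,082.49 |

Masses are neutral monoisotopic values (sum of residue masses + H₂O); add 1.007 Da for [M+H]⁺.

* Cys residues assumed carbamidomethylated (+57.02 Da).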

Total: 27 peptides from complete tryptic digestion.
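
The peptide count can be reproduced with a short in-silico digest. This is a sketch of the stated rule (cleave after K or R, not before P, no missed cleavages), not a re-implementation of PeptideMass; it relies on `re.split` accepting zero-width matches, which requires Python 3.7 or later.

```python
import re

EGFP_HIS = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQ"
    "CFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK"
    "LEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP"
    "NEKRDHMVLLEFVTAAGITLGMDELYKLEHHHHHH"
)


def tryptic_digest(seq: str) -> list[str]:
    """Split after every K or R that is not followed by P (zero missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


peptides = tryptic_digest(EGFP_HIS)

start = 1
for number, pep in enumerate(peptides, start=1):
    end = start + len(pep) - 1
    print(f"{number:2d}  {start:3d}-{end:3d}  {pep}")
    start = end + 1
```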


Q3: Chromatographic Peaks in TIC

Figure 5a. TIC of the eGFP peptide map from the Waters BioAccord LC-MS system. The peak at 2.78 minutes is circled. Peaks are counted between 0.5–6 minutes at > 10% relative abundance.

Counting peaks above 10% relative abundance between 0.5–6 minutes: approximately 18 chromatographic peaks are visible.


Q4: Peaks vs. Predicted Peptides

The TIC shows fewer peaks (~18) than the predicted 27 peptides. Several reasons explain this discrepancy:

  1. Small peptides are poorly retained on the C18 reversed-phase column: TR (275 Da), AEVK (445 Da), QK (274 Da), IELK (501 Da), IR (287 Da), and R (174 Da) are short and mostly hydrophilic, so they elute before the 0.5-minute window or co-elute at the void volume.
  2. Co-elution: Some peptides with similar hydrophobicity co-elute as a single chromatographic peak (appearing as one peak but containing two peptides in the MS).
  3. Incomplete ionization: Very large peptides (e.g., the 41-residue peptide HNIEDGSVQL…SALSK at 4,471 Da) may ionize poorly or be suppressed by other peptides in the mixture.

Q5: m/z, Charge, and Mass of the Peptide at 2.78 min

Figure 5b. Full mass spectrum (left) of the chromatographic peak at 2.78 min, showing the dominant charge state at m/z 525.76. Inset (right): zoomed-in isotope pattern at m/z 525.76 showing isotopes spaced 0.50 m/z apart, confirming z = 2.

Identification:

  • Observed m/z: 525.76
  • Isotope spacing (Δm/z): 0.50 m/z units → z = 1/0.50 = 2
  • Charge state: z = 2

Singly protonated mass ([M+H]⁺), for comparison with the predicted peptide list:

$$[M+H]^+ = z \times (m/z) - (z-1) \times H = 2 \times 525.76 - 1 \times 1.0073 = 1051.52 - 1.007 = \boxed{1050.51 \text{ Da}}$$
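
The same conversion in code, as a minimal sketch with illustrative function names. It uses the observed m/z (525.76), the isotope spacing (0.50), and H = 1.0073 Da from the text, and also reports the neutral mass M.

```python
PROTON = 1.0073  # Da


def charge_from_isotope_spacing(delta_mz: float) -> int:
    """Charge state from the spacing between adjacent isotope peaks (z = 1 / spacing)."""
    return round(1.0 / delta_mz)


def singly_protonated_mass(mz: float, z: int) -> float:
    """[M+H]+ from an ion observed at charge z: z * (m/z) - (z - 1) * H."""
    return z * mz - (z - 1) * PROTON


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M = z * (m/z - H)."""
    return z * (mz - PROTON)


z = charge_from_isotope_spacing(0.50)
mh = singly_protonated_mass(525.76, z)
m = neutral_mass(525.76, z)

print(f"z = {z}, [M+H]+ = {mh:.2f} Da, M = {m:.2f} Da")
```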


Q6: Peptide Identification and Mass Accuracy

Comparing the measured mass of 1050.51 Da (M+H⁺) to the PeptideMass-predicted peptide list, the best match is:

FEGDTLVNR (residues 115–123 of eGFP)

Theoretical monoisotopic M+H⁺ for FEGDTLVNR:

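| Residue | Monoisotopic residue mass (Da) |
| --- | --- |
| F | 147.0684 |
| E | 129.0426 |
| G | 57.0215 |
| D | 115.0269 |
| T | 101.0477 |
| L | 113.0841 |
| V | 99.0684 |
| N | 114.0429 |
| R | 156.1011 |
| + H₂O | 18.0106 |
| Total (M) | 1049.514 |
| M+H⁺ | 1050.521 |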

Mass accuracy:

$$\text{ppm error} = \frac{|\text{observed} - \text{theoretical}|}{\text{theoretical}} \times 10^{6} = \frac{|1050.51 - 1050.521|}{1050.521} \times 10^{6} \approx \boxed{10.5 \text{ ppm}}$$

This is within the < 15 ppm mass accuracy specification for the Waters BioAccord system.
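
The theoretical mass and the ppm error can be recomputed from the residue table. This sketch uses the residue masses listed above, with H₂O = 18.0106 Da and H = 1.0073 Da; the function names are illustrative. The result is sensitive to where values are rounded: roughly 8 ppm with the unrounded observed mass and roughly 11 ppm with the observed mass rounded to 1050.51 Da, inside the 15 ppm specification either way.

```python
MONO = {  # monoisotopic residue masses in Da, from the table above
    "F": 147.0684, "E": 129.0426, "G": 57.0215, "D": 115.0269, "T": 101.0477,
    "L": 113.0841, "V": 99.0684, "N": 114.0429, "R": 156.1011,
}
WATER = 18.0106
PROTON = 1.0073


def theoretical_mh(peptide: str) -> float:
    """Monoisotopic [M+H]+ = sum of residue masses + H2O + proton."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def signed_ppm(observed: float, theoretical: float) -> float:
    """Signed mass error in ppm (negative means observed is lighter)."""
    return (observed - theoretical) / theoretical * 1e6


theo = theoretical_mh("FEGDTLVNR")
err_rounded = signed_ppm(1050.51, theo)                   # observed rounded to 2 decimals
err_unrounded = signed_ppm(2 * 525.76 - PROTON, theo)     # straight from m/z 525.76, z = 2

print(f"theoretical [M+H]+ = {theo:.4f} Da")
print(f"error (rounded observed)   = {err_rounded:+.1f} ppm")
print(f"error (unrounded observed) = {err_unrounded:+.1f} ppm")
```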


Q7: Sequence Coverage

Figure 6. Sequence coverage map of eGFP from BioAccord LC-MS peptide identification. Residues highlighted in green are confirmed by at least one identified peptide; grey residues are not covered. 92% sequence coverage is achieved.

From Figure 6, peptides confirmed by LC-MS peptide mapping cover ~92% of the eGFP sequence (228 of 247 residues identified). The uncovered regions include the small peptides (TR, QK, IR, R) that are not retained by the C18 column and a portion of the His-tag region.


Bonus Q1: Peptide Sequence from Fragmentation Spectrum

Figure 5c. CID fragmentation (MS/MS) spectrum of FEGDTLVNR, the peptide eluting at 2.78 min. b-ions (blue, N-terminal fragments) and y-ions (red, C-terminal fragments) are labeled. The complete b2–b8 and y2–y8 series confirms the sequence unambiguously.

Using the Protein Prospector Fragment Ion Calculator, the predicted fragmentation pattern for FEGDTLVNR is:

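| Ion | Sequence | m/z (z=1) |
| --- | --- | --- |
| b2 | FE | 277.12 |
| b3 | FEG | 334.14 |
| b4 | FEGD | 449.17 |
| b5 | FEGDT | 550.21 |
| b6 | FEGDTL | 663.30 |
| b7 | FEGDTLV | 762.37 |
| b8 | FEGDTLVN | 876.41 |
| y2 | NR | 289.16 |
| y3 | VNR | 388.23 |
| y4 | LVNR | 501.31 |
| y5 | TLVNR | 602.36 |
| y6 | DTLVNR | 717.39 |
| y7 | GDTLVNR | 774.41 |
| y8 | EGDTLVNR | 903.45 |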

The fragmentation pattern in Figure 5c matches the FEGDTLVNR b/y ion series. The peptide sequence is confirmed as FEGDTLVNR.
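
The singly charged b/y series can be regenerated from monoisotopic residue masses. This is a sketch of the standard relationships (b = sum of N-terminal residues + proton; y = sum of C-terminal residues + H₂O + proton), not the Protein Prospector implementation; the function names are illustrative.

```python
MONO = {  # monoisotopic residue masses in Da
    "F": 147.0684, "E": 129.0426, "G": 57.0215, "D": 115.0269, "T": 101.0477,
    "L": 113.0841, "V": 99.0684, "N": 114.0429, "R": 156.1011,
}
WATER = 18.0106
PROTON = 1.0073


def b_ions(peptide: str) -> dict[int, float]:
    """Singly charged b-ions b1..b(n-1): N-terminal residue sum + proton."""
    ions, total = {}, 0.0
    for i, aa in enumerate(peptide[:-1], start=1):
        total += MONO[aa]
        ions[i] = total + PROTON
    return ions


def y_ions(peptide: str) -> dict[int, float]:
    """Singly charged y-ions y1..y(n-1): C-terminal residue sum + H2O + proton."""
    ions, total = {}, WATER
    for i, aa in enumerate(reversed(peptide[1:]), start=1):
        total += MONO[aa]
        ions[i] = total + PROTON
    return ions


b = b_ions("FEGDTLVNR")
y = y_ions("FEGDTLVNR")

for i in range(2, 9):
    print(f"b{i} = {b[i]:.2f}    y{i} = {y[i]:.2f}")
```

As a consistency check, each complementary pair (for example b2 + y7) sums to the neutral peptide mass plus two protons.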


Bonus Q2: Does the Peptide Map Confirm eGFP?

Yes. The peptide map data strongly confirms that the protein is eGFP for three reasons:

  1. Mass-based identification: 92% of the amino acid sequence is covered by peptides whose measured masses match theoretical tryptic fragments of the eGFP sequence (Figure 6) to within 15 ppm.
  2. Fragmentation confirmation: MS/MS fragmentation of representative peptides (e.g., FEGDTLVNR in Figure 5c) produces b/y ion series that match the predicted fragmentation pattern, providing sequence-level confirmation beyond just mass.
  3. Chromatographic reproducibility: The retention time pattern and relative peak intensities in the TIC are consistent with the hydrophobicity profile expected for eGFP tryptic peptides, and the overall pattern reproducibly appears across injections.

The combination of intact mass (~27.745 kDa, consistent with eGFP + LEHHHHHH), correct peptide masses, fragmentation sequence confirmation, and >90% sequence coverage identifies the protein as the eGFP-6xHis standard.


Waters Part IV: KLH Oligomers

Using the known subunit masses from Table 1:

| Subunit | Subunit Mass |
| --- | --- |
| 7FU | 340 kDa |
| 8FU | 400 kDa |

Predicted oligomeric masses:

| Oligomeric State | Composition | Mass |
| --- | --- | --- |
| 7FU Decamer | 10 × 7FU | 10 × 340 = 3,400 kDa (3.4 MDa) |
| 8FU Didecamer | 20 × 8FU | 20 × 400 = 8,000 kDa (8.0 MDa) |
| 8FU 3-Decamer | 30 × 8FU | 30 × 400 = 12,000 kDa (12.0 MDa) |
| 8FU 4-Decamer | 40 × 8FU | 40 × 400 = 16,000 kDa (16.0 MDa) |

Figure 7. Charge Detection Mass Spectrometry (CDMS) spectrum of KLH. Individual mass peaks are labeled with their oligomeric assignments. The 7FU decamer (3.4 MDa) and three 8FU oligomeric states (8.0, 12.0, 16.0 MDa) are clearly resolved as discrete species.

CDMS enables these measurements because it directly measures both the charge and the m/z of individual ions simultaneously, yielding a direct mass without requiring deconvolution — essential for heterogeneous megadalton assemblies like KLH that produce overlapping charge states in conventional ESI-MS.
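
The predicted oligomer masses, and the assignment of a measured CDMS mass to the nearest predicted species, can be sketched as follows. The subunit masses come from Table 1; the 8.1 MDa "measured" value is a hypothetical input chosen only to illustrate the lookup, and the function names are illustrative.

```python
SUBUNIT_KDA = {"7FU": 340.0, "8FU": 400.0}

OLIGOMERS = {  # name: (subunit type, copy number)
    "7FU decamer": ("7FU", 10),
    "8FU didecamer": ("8FU", 20),
    "8FU 3-decamer": ("8FU", 30),
    "8FU 4-decamer": ("8FU", 40),
}


def predicted_masses_mda() -> dict[str, float]:
    """Predicted oligomer masses in MDa (copies x subunit mass)."""
    return {name: SUBUNIT_KDA[sub] * n / 1000.0 for name, (sub, n) in OLIGOMERS.items()}


def assign_oligomer(measured_mda: float) -> str:
    """Name of the predicted oligomer closest in mass to a measured value."""
    masses = predicted_masses_mda()
    return min(masses, key=lambda name: abs(masses[name] - measured_mda))


masses = predicted_masses_mda()
for name, mass in masses.items():
    print(f"{name:14s} {mass:5.1f} MDa")

# Hypothetical measured mass, for illustration only
print(assign_oligomer(8.1))
```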


Waters Part V: Did I Make GFP?

| | Theoretical | Observed (intact LC-MS deconvolution) | PPM Mass Error |
| --- | --- | --- | --- |
| Molecular weight (kDa) | 27.745 | 27.747 | 72 ppm |

Interpretation: The observed MW of 27.747 kDa is within 72 ppm of the theoretical value of 27.745 kDa. This level of accuracy is typical for intact protein analysis on a high-resolution QTof instrument, where deconvolution of the charge state envelope introduces some additional uncertainty compared to peptide-level measurements (< 15 ppm). The agreement confirms that:

  1. The protein is expressed at the correct molecular weight.
  2. No large unexpected modifications (e.g., missed cleavage of the signal peptide, glycosylation, or large adducts) are present.
  3. The His-tag (HHHHHH) and linker (LE) are intact, as the measured mass matches the full sequence including these elements.

Disclaimer: Artificial Intelligence was used in this assignment to assist with calculation verification, scientific writing, and figure generation. Mass spectrometry data, charge state identification, and peptide fragmentation analysis were performed using results from the Waters Immerse Lab session and the analytical tools cited above (ExPASy, Protein Prospector).