Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

I would like to measure multiple biological and functional aspects of the synthetic rhizosphere consortium composed of Pseudomonas fluorescens, Azospirillum brasilense, and Bacillus subtilis. Key variables include the production of osmoprotectants (such as proline or trehalose) under saline stress, nitrogen fixation efficiency, biofilm formation and exopolysaccharide (EPS) production, and the presence, sequence accuracy, and expression of engineered genetic constructs, including kill switch systems. At a higher level, the project will also assess microbial population dynamics and plant growth indicators such as root length and biomass, which serve as direct proxies for improved agricultural productivity under salt stress.

2. Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

Osmoprotectant levels will be quantified using high-performance liquid chromatography (HPLC) or mass spectrometry, which allow precise detection of small metabolites. Nitrogen fixation will be evaluated using the acetylene reduction assay (ARA) to measure nitrogenase activity, complemented by colorimetric assays for ammonia production. Biofilm formation will be quantified using crystal violet staining, while EPS production will be assessed using carbohydrate quantification assays. Gene expression levels associated with salt response and nitrogen fixation will be measured using quantitative PCR (qPCR), and reporter systems (fluorescence) may be used to monitor activation of engineered circuits such as salt-inducible promoters or kill switches. Plant performance will be evaluated through standard phenotyping methods, including biomass measurements and root morphology analysis.

3. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

DNA sequencing (Sanger or next-generation sequencing) will be used to confirm the accuracy of genetic constructs designed in Benchling, while gel electrophoresis will verify plasmid size and integrity. Mass spectrometry and HPLC will enable sensitive metabolite quantification, and qPCR will provide precise measurement of gene expression levels. Protein expression can be validated using Western blotting or fluorescence-based detection systems. Additionally, colony-forming unit (CFU) counts and live/dead staining assays will be used to evaluate kill switch functionality under different environmental conditions. Finally, 16S rRNA sequencing will allow monitoring of microbial community composition and stability within the consortium. Together, these technologies create a comprehensive and quantitative framework to validate the performance and safety of the designed system.

Homework: Waters Part I — Molecular Weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

eGFP Sequence:

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Note: This contains a His-purification tag (HHHHHH) and a linker (the LE before it).

According to the ExPASy Compute pI/Mw tool, the theoretical molecular weight of the eGFP construct is 28,006.60 Da (≈ 28.01 kDa), with a predicted pI of 5.90.

2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

a) The formula provided expresses the charge state in terms of the ratios m/zₙ and m/zₙ₊₁, which represent two adjacent peaks in the mass spectrum.

Although it is written as m divided by z, these terms correspond directly to the experimentally measured m/z values of the peaks. Therefore, the equation can be simplified by replacing m/zₙ and m/zₙ₊₁ with the actual peak values. In this case, the peaks at 933.7148 and 965.9684 are adjacent.

Since m/z is inversely proportional to charge, the lower m/z value (933.7148) corresponds to the higher charge state (z+1), and the higher m/z value (965.9684) corresponds to the lower charge state (z).

z = (m/z)ₙ₊₁ / ( (m/z)ₙ − (m/z)ₙ₊₁ )

z = (smaller peak) / (bigger peak − smaller peak)

z = 933.7148 / (965.9684 − 933.7148)

z = 933.7148 / 32.2536

z = 28.95 ≈ 29

This is the charge of the peak at 965.9684, so the peak at 933.7148 carries z + 1 = 30 charges.

b) The molecular weight (MW) was calculated using:

MW = z * (m/z) - z * mH

Where mH = 1.007276 Da (mass of a proton).

Substituting the values for the peak at 933.7148, which carries z + 1 = 30 charges:

MW = 30 * 933.7148 - 30 * 1.007276

MW = 28011.444 - 30.21828

MW = 27981.23 Da

The term mH represents the mass of a proton (H⁺), which is approximately 1.007276 Da. This value is used because, in electrospray ionization mass spectrometry (ESI-MS), proteins are ionized by gaining protons, forming positively charged ions of the form [M + zH]⁺ᶻ. As a result, the measured m/z value includes not only the mass of the protein but also the mass of the added protons. Each proton contributes both one unit of positive charge and an additional mass of about 1.007276 Da.
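The two steps above (charge from adjacent peaks, then molecular weight) can be sketched in a few lines of Python. This is a minimal illustration using the simplified formula from the recitation. The function names are chosen here for illustration and are not part of any instrument software.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (mH in the text)


def charge_of_higher_mz_peak(mz_low: float, mz_high: float) -> float:
    """Unrounded charge z of the higher-m/z peak of two adjacent peaks.

    Simplified recitation formula: z = mz_low / (mz_high - mz_low).
    """
    return mz_low / (mz_high - mz_low)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral molecular weight from one peak: MW = z * (m/z) - z * mH."""
    return z * mz - z * PROTON_MASS


mz_low, mz_high = 933.7148, 965.9684  # adjacent peaks read from Figure 1

z_high = round(charge_of_higher_mz_peak(mz_low, mz_high))  # charge of the 965.9684 peak
z_low = z_high + 1  # the lower-m/z peak carries one more proton

mw_from_low = neutral_mass(mz_low, z_low)
mw_from_high = neutral_mass(mz_high, z_high)

print(f"z({mz_high}) = {z_high}, z({mz_low}) = {z_low}")
print(f"MW from {mz_low}: {mw_from_low:.2f} Da")
print(f"MW from {mz_high}: {mw_from_high:.2f} Da")
```

Computing the mass from both peaks is a quick consistency check: the two values should agree to within a few Da if the charges were assigned correctly.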

Accuracy = (MWexperimental − MWtheoretical) / MWtheoretical

Accuracy = |27981.23 − 28006.60| / 28006.60

Accuracy = 25.37 / 28006.60

Accuracy ≈ 0.000906

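This corresponds to an error of approximately 0.0906% (906 ppm, or about 25 Da), so the experimentally calculated molecular weight is close to the sequence-based value. Part of the remaining difference is expected: the sequence-based calculation does not account for chromophore maturation, which removes roughly 20 Da from the folded protein, and the estimate here rests on a single charge state.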
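The same accuracy calculation can be written as a small helper that reports the error in ppm. This is a sketch: the function name is illustrative, and the sign is kept so that a negative value means the measured mass is below the theoretical one.

```python
def mass_error_ppm(experimental: float, theoretical: float) -> float:
    """Signed relative mass error in parts per million."""
    return (experimental - theoretical) / theoretical * 1e6


mw_experimental = 27981.23  # Da, from the adjacent charge state calculation
mw_theoretical = 28006.60   # Da, from the ExPASy Compute pI/Mw tool

error_ppm = mass_error_ppm(mw_experimental, mw_theoretical)
print(f"Mass error: {error_ppm:.0f} ppm ({abs(error_ppm) / 1e4:.4f} %)")
```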

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not.

Not reliably from this zoomed-in view. The small peaks in this region (1473.5333, 1473.7429, 1474.0481) are spaced by roughly 0.2 to 0.3 m/z units. If they were isotope peaks, the relationship Δ(m/z) = 1/z would give z ≈ 3 to 5, but a charge that low at m/z ≈ 1473.7 would imply a mass of only about 4 to 7 kDa, far below the ~28 kDa of intact eGFP.

The charge state is better assigned from the molecular weight found in Question 2: z = MW / (m/z − mH) = 27981.23 / (1473.7429 − 1.007276) ≈ 19.0, so this peak is the +19 charge state. At z = 19 the isotope peaks would be only 1/19 ≈ 0.05 m/z units apart, too close to read off this view, which is why the adjacent charge states in the full spectrum, separated by much larger differences in m/z, are used instead.

Homework: Waters Part II — Secondary/Tertiary structure

1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations.

What happens when a protein unfolds?

In its native state, a protein like eGFP is properly folded into a compact, globular structure stabilized by noncovalent interactions (hydrogen bonds, hydrophobic interactions, ionic interactions). Many basic residues that can accept protons are buried inside the structure. When a protein becomes denatured, it unfolds into an extended conformation. This disrupts its tertiary structure and exposes previously buried residues, including basic amino acids (e.g., Lys, Arg, His), to the solvent.

How is that determined with a mass spectrometer?

In a mass spectrometer, this is determined by measuring the mass-to-charge ratio (m/z) of the protein ions produced during electrospray ionization on the Waters Xevo G3 QTof. As the protein enters the instrument, it picks up multiple protons, forming ions with different charge states. The instrument detects these ions as a series of peaks at different m/z values.

For a folded (native) protein, fewer protonation sites are accessible, so the protein carries fewer charges, and the detected peaks appear at higher m/z values with a narrow distribution. For a denatured protein, more sites are exposed, allowing more protons to attach, which produces ions with higher charge states that appear at lower m/z values and over a broader range. Thus, by analyzing the charge state distribution and the position of peaks in the spectrum, the mass spectrometer allows us to determine whether the protein is in a native or denatured conformation.

What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

The denatured protein (top spectrum) displays a broad distribution of many peaks at lower m/z values, indicating that the unfolded protein has acquired a higher number of charges. In contrast, the native protein (bottom spectrum) shows a narrower distribution with fewer peaks at higher m/z values, consistent with lower charge states. Overall, the denatured spectrum is more spread out and shifted to lower m/z, while the native spectrum is more compact and shifted to higher m/z.

2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800? What is the charge state? How can you tell?

The peak observed at approximately m/z ≈ 2800 in the native mass spectrum of eGFP corresponds to a specific charge state of the protein. In native mass spectrometry, proteins typically appear as a series of peaks rather than a single signal because they can carry multiple positive charges. Each peak in this series represents the same protein with a different number of charges (z), and determining this charge state is essential for interpreting the spectrum.

Zooming into the region around the peak shows that what initially appears to be a single peak is actually composed of multiple closely spaced isotopic peaks, and their spacing provides direct information about the charge state. Specifically, the distance between adjacent isotopic peaks is equal to 1/z.

From the zoomed spectrum, the spacing between neighboring isotopic peaks is approximately 0.1 m/z units. Using the relationship Δ(m/z) = 1/z, the charge state can be calculated as z = 1/0.1 = 10. This indicates that the protein molecules contributing to this signal carry ten positive charges, so the peak at m/z ≈ 2800 is assigned a charge state of +10. The assignment is consistent with the intact mass: 10 × (2800 − 1.007) ≈ 27,990 Da, which matches eGFP, and it places the neighboring peak near m/z ≈ 2545 at +11.
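As a cross-check on the charge assignment, the expected m/z of each charge state can be computed from the intact mass. The sketch below assumes the molecular weight obtained in Part I (27,981.23 Da) and prints the expected peak position and isotope spacing for a few native charge states. The range of charges shown is an arbitrary choice for illustration.

```python
PROTON_MASS = 1.007276  # Da


def expected_mz(mass: float, z: int) -> float:
    """m/z of an ion of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON_MASS) / z


mass = 27981.23  # Da, intact eGFP mass calculated in Part I

for z in range(9, 13):
    # Isotope peaks of a z-charged ion are spaced 1/z apart on the m/z axis.
    print(f"z = +{z}: m/z = {expected_mz(mass, z):.2f}, isotope spacing = {1 / z:.3f}")
```

The +10 state falls close to m/z 2800 and the +11 state close to m/z 2545, in line with the peak positions discussed above.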

Homework: Waters Part III — Peptide Mapping - primary structure

We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide – generating a “peptide map”. This process is used to confirm the primary structure of the protein.

There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.

1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

Lysine (K): 20 residues Arginine (R): 6 residues

2. How many peptides will be generated from tryptic digestion of eGFP?

a) Navigate to https://web.expasy.org/peptide_mass/*

b) Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.

c) Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.

d) Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.

Using the ExPASy PeptideMass tool with trypsin digestion, a total of 19 peptides are predicted from the eGFP sequence.
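The residue counts and the digest can be reproduced with a short in-silico digestion. This is a minimal sketch, not a reimplementation of PeptideMass: it cleaves after every K or R (except before proline), assumes no missed cleavages, and uses standard monoisotopic residue masses. The 500 Da cutoff is an assumption about the display setting used in Figure 4. It is included because the number of peptides listed depends on it.

```python
# eGFP construct sequence copied from Part I, question 1 (spaces removed below).
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE "
    "HHHHHH"
).replace(" ", "")

# Standard monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da
MIN_MASS = 500.0    # Da, ASSUMED display cutoff (not stated in the text)


def tryptic_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


peptides = tryptic_digest(EGFP)
listed = [p for p in peptides if mh_plus(p) > MIN_MASS]

print(f"K: {EGFP.count('K')}, R: {EGFP.count('R')}")
print(f"All cleavage products: {len(peptides)}")
print(f"Peptides above {MIN_MASS:.0f} Da: {len(listed)}")
```

Under these assumptions the 26 cleavage sites give 27 cleavage products, of which 19 are above the cutoff, which agrees with the 19 peptides reported above.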

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

21 chromatographic peaks are observed between 0.5 and 6 minutes with greater than 10% relative abundance, at retention times of 0.61, 0.79, 1.20, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06, and 5.43 minutes.

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

The number of chromatographic peaks observed in the LC-MS peptide map (21 peaks) is slightly higher than the 19 peptides predicted from the tryptic digest using ExPASy.

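This difference is expected because chromatographic peaks do not map one-to-one onto the predicted peptides. Incomplete digestion leaves missed-cleavage peptides, and modified forms of a peptide (for example, oxidized methionine) elute at their own retention times, so one predicted peptide can give rise to more than one peak. Within each chromatographic peak, the mass spectrum may additionally show several charge states or adducts (such as sodium) of the same peptide.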

5. Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ([M+H]^+) based on its m/z and z.

The peptide shown in Figure 5b has its most intense peak at m/z = 525.767, which corresponds to the most abundant charge state of the peptide. To determine the charge (z), the isotopic peak spacing in the zoomed region is examined. The distance between adjacent isotopic peaks (for example, 525.767 to 526.259 to 526.768) is approximately 0.5 m/z units. Since isotopic spacing follows the relationship Δ(m/z) = 1/z, a spacing of about 0.5 indicates that z = 2. Therefore, the most abundant charge state of the peptide is +2.

The mass of the singly charged peptide ([M+H]+) can be calculated using the equation m/z = (M + zH)/z.

Rearranging gives M = z(m/z) − zH.
Substituting the values (with H = 1.007276 Da, the proton mass used in Part I), M = 2 × 525.767 − 2 × 1.007276 = 1049.5194 Da.
Adding one proton gives the singly charged form: [M+H]+ = 1049.5194 + 1.0073 = 1050.5267 Da.
Thus, the peptide has m/z ≈ 525.767, a charge state of +2, and a singly charged mass [M+H]+ of approximately 1050.53 Da.
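The charge assignment and the conversion to [M+H]+ can be sketched as follows. The isotope peak positions are the ones read from Figure 5b. The calculation uses the exact proton mass (1.007276 Da) instead of rounding it to 1 Da, since the rounding matters once the result is compared at the ppm level.

```python
PROTON_MASS = 1.007276  # Da

# Isotope peaks of the most abundant charge state, read from Figure 5b.
isotope_peaks = [525.767, 526.259, 526.768]

# Mean spacing between neighbouring isotope peaks; spacing = 1 / z.
spacings = [b - a for a, b in zip(isotope_peaks, isotope_peaks[1:])]
mean_spacing = sum(spacings) / len(spacings)
z = round(1 / mean_spacing)

monoisotopic_mz = isotope_peaks[0]
neutral_mass = z * (monoisotopic_mz - PROTON_MASS)  # M = z(m/z) - z*H
mh_plus = neutral_mass + PROTON_MASS                # singly protonated form

print(f"mean isotope spacing = {mean_spacing:.4f} -> z = {z}")
print(f"M = {neutral_mass:.4f} Da, [M+H]+ = {mh_plus:.4f} Da")
```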

6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm. (Recall that Accuracy = (MWexperimental − MWtheoretical) / MWtheoretical)

FEGDTLVNR

Mass: 1050.5214 Position: 115-123

The experimental [M+H]+ is 1050.5267 Da (2 × 525.767 − 1.007276, using the exact proton mass because a ppm-level comparison is sensitive to the third decimal place), while the theoretical mass from the ExPASy PeptideMass tool is 1050.5214 Da. The mass accuracy is calculated using the formula:

Accuracy = (MWexperimental − MWtheoretical) / MWtheoretical.

Substituting the values gives:

Accuracy = (1050.5267 − 1050.5214) / 1050.5214 = 0.0053 / 1050.5214 ≈ 0.0000050, which corresponds to about 5 ppm.

This small error indicates excellent agreement between the experimental and theoretical masses.
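The theoretical mass can also be recomputed directly from the sequence as a check on the value taken from PeptideMass. The sketch below sums standard monoisotopic residue masses for FEGDTLVNR (only the residues needed are listed) and evaluates the error in ppm. The experimental value here is the unrounded [M+H]+ obtained with the exact proton mass, 2 × 525.767 − 1.007276, so the result can differ from a calculation based on a value rounded to two decimals.

```python
# Monoisotopic residue masses in Da (only the residues present in the peptide).
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.010565  # Da
PROTON = 1.007276  # Da


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def mass_error_ppm(experimental: float, theoretical: float) -> float:
    """Signed relative mass error in parts per million."""
    return (experimental - theoretical) / theoretical * 1e6


theoretical = mh_plus("FEGDTLVNR")
experimental = 2 * 525.767 - PROTON  # [M+H]+ from the +2 ion at m/z 525.767

print(f"theoretical [M+H]+  = {theoretical:.4f} Da")
print(f"experimental [M+H]+ = {experimental:.4f} Da")
print(f"error = {mass_error_ppm(experimental, theoretical):.1f} ppm")
```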

7. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

The percentage of the protein sequence confirmed by peptide mapping is 88%, as indicated by the sequence coverage shown in Figure 6.

8. Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this tool online to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html.) What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?

The peptide sequence that best matches the fragmentation spectrum in Figure 5c is FEGDTLVNR. This sequence was identified by comparing the experimental peptide mass with the predicted tryptic peptides obtained from ExPASy and selecting the closest match. The predicted fragmentation pattern for this peptide shows a series of characteristic b-ions and y-ions, which correspond to fragmentation along the peptide backbone.
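The predicted fragment ions can be approximated with a short calculation. This is a simplified sketch of what a fragment ion calculator does, restricted to singly charged b- and y-ions of the unmodified peptide with monoisotopic masses. It is not the algorithm of the online tool itself.

```python
# Monoisotopic residue masses in Da (only the residues present in the peptide).
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.010565  # Da
PROTON = 1.007276  # Da


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b-ions (N-terminal) and y-ions (C-terminal).

    b[i-1] holds b_i (first i residues + proton);
    y[i-1] holds y_i (last i residues + water + proton).
    """
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON)
    return b_ions, y_ions


peptide = "FEGDTLVNR"
b_ions, y_ions = fragment_ions(peptide)

for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:9.4f}    y{i} = {y:9.4f}")
```

Because trypsin leaves a basic residue (here R) at the C-terminus, the y-ion series is usually the more prominent one, and y1 for arginine appears near m/z 175.12.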

9. Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.

Yes, the peptide map data makes sense and supports that the protein is the eGFP standard. The results show 88% amino acid sequence coverage, which is considered excellent for protein identification by LC-MS.

Additionally, the high mass accuracy (under 10 ppm) indicates that the measured peptide masses closely match the theoretical values. The MS/MS fragmentation spectra further confirm the identity of the peptides, as the observed b-ion and y-ion patterns are consistent with the predicted sequences.

Homework: Waters Part IV — Oligomers

We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):

7FU Decamer
8FU Didecamer
8FU 3-Decamer
8FU 4-Decamer

Polypeptide Subunit Name	Subunit Mass
7FU	340 kDa
8FU	400 kDa

The 7FU decamer (340 kDa × 10) has a mass of 3.4 MDa and corresponds to the peak at 3.4 MDa.
The 8FU didecamer (400 kDa × 20) has a mass of 8.0 MDa and corresponds to the peak at 8.33 MDa.
The 8FU 3-decamer (400 kDa × 30) has a mass of 12.0 MDa and corresponds to the peak at ~12.67 MDa.
The 8FU 4-decamer (400 kDa × 40) has a mass of 16.0 MDa and corresponds to the signal around 16 MDa.
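The assignments above amount to multiplying the subunit mass by the number of copies and picking the nearest peak. A minimal sketch, using the subunit masses from Table 1 and the peak positions quoted above (the last one read as roughly 16 MDa):

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}  # subunit masses from Table 1

# (species name, subunit type, number of subunits)
SPECIES = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]

# Peak positions in MDa as quoted in the text for Figure 7.
observed_peaks_mda = [3.4, 8.33, 12.67, 16.0]

assignments = {}
for name, subunit, copies in SPECIES:
    expected_mda = SUBUNIT_KDA[subunit] * copies / 1000  # kDa -> MDa
    nearest = min(observed_peaks_mda, key=lambda peak: abs(peak - expected_mda))
    assignments[name] = (expected_mda, nearest)
    print(f"{name}: expected {expected_mda:.1f} MDa, nearest peak {nearest} MDa")
```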

Homework: Waters Part V — Did I make GFP?

Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.

Theoretical (kDa)	Observed/measured on the Intact LC-MS (kDa)	PPM Mass Error
28.01	27.98	906 ppm