Week 10 HW: Advanced Imaging & Measurement Technology

Week 10 — Advanced Imaging & Measurement Technology

Homework: Final Project

For your final project:

Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.
Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

Measurement and Validation Techniques for the Bio-Sticker

Controlled Gas Exposure Assays
The Bio-Sticker will first be tested in sealed exposure chambers containing precisely known concentrations of target toxic gases, such as ammonia or formaldehyde. These chambers allow accurate simulation of hazardous industrial environments while maintaining strict control over temperature, humidity, and gas concentration. By exposing the engineered fungal Bio-Sticker to increasing concentrations of the target analyte, we can determine its activation threshold, sensitivity, and dynamic range. This approach also enables the generation of dose-response curves, which are essential for calibrating the system and defining the concentration at which the color change becomes visible.

Colorimetric Analysis
The primary readout of the Bio-Sticker is the visible blue color produced by expression of the chromoprotein AmilCP. Colorimetric analysis will be used to quantify this response objectively. Images of the Bio-Sticker will be captured under standardized lighting conditions, and software such as ImageJ will be used to analyze changes in color intensity. Measurements will focus on RGB (red, green, blue) values and, when applicable, absorbance at the wavelength corresponding to AmilCP. This technique allows precise quantification of signal strength, comparison between samples, and monitoring of signal development over time.

Digital Image Analysis
In addition to simple colorimetric measurements, digital image processing will be employed to evaluate spatial uniformity, signal progression, and long-term stability of the color response. Time-course imaging can be used to track the kinetics of AmilCP expression after exposure to toxic gases. This enables measurement of response time, persistence of the signal, and any degradation or fading over extended periods. Such analyses are particularly important for assessing practical usability in field conditions.

Polymerase Chain Reaction (PCR)
PCR will be used to confirm successful integration of the engineered genetic circuit into the Aspergillus nidulans genome. Specific primers will be designed to amplify regions spanning the inserted construct and adjacent genomic sequences. Successful amplification of fragments of the expected size will verify the presence of the biosensing cassette. This serves as an initial molecular confirmation that the strain has been correctly engineered.

DNA Sequencing
Following PCR confirmation, DNA sequencing will be performed to verify the exact nucleotide sequence of the inserted construct. This step ensures that the promoter, sensing elements, reporter gene (AmilCP), and regulatory sequences have been integrated without mutations, deletions, or rearrangements. Sequence verification is critical to ensure that the genetic circuit will function as intended.

Reverse Transcription Quantitative PCR (RT-qPCR)
RT-qPCR will be used to measure transcriptional activation of the reporter gene after gas exposure. RNA will be extracted from the fungal cells before and after exposure to target gases, converted into complementary DNA (cDNA), and amplified using gene-specific primers. By comparing transcript levels under different conditions, this technique will quantify the extent to which the sensing circuit is activated. RT-qPCR provides highly sensitive, quantitative insight into gene expression dynamics.

Spectrophotometry (Optional)
Spectrophotometric analysis may be used to complement image-based measurements. Pigments extracted from fungal samples can be analyzed by measuring absorbance at wavelengths specific to AmilCP. This provides an additional quantitative assessment of chromoprotein production and can be particularly useful for validating colorimetric data.

Specificity Testing
To ensure selectivity, the Bio-Sticker will be exposed not only to target toxic gases but also to non-target compounds commonly present in industrial environments. By comparing responses across these conditions, we can determine whether the system selectively responds to the intended analyte or produces false positives. This is essential for establishing reliability in real-world applications.

Stability and Shelf-Life Testing
Long-term performance will be evaluated by monitoring the Bio-Sticker under different storage and environmental conditions. Parameters such as baseline color, response capability, and signal durability will be assessed over time. These studies will determine shelf life, operational stability, and robustness under field deployment conditions.

Together, these techniques will provide a comprehensive characterization of the Bio-Sticker, from genetic validation to functional performance, ensuring that it operates as a reliable, low-cost, and easily interpretable biosensor for toxic gas detection in hazardous industrial environments.
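As an illustration of the time-course colorimetric readout described above, the sketch below turns mean RGB values of an imaged region into a simple "blue fraction" and reports the first time point at which the signal rises above baseline. The metric B/(R+G+B), the 0.05 threshold, and all RGB values are hypothetical placeholders chosen for illustration; they are not measurements from the Bio-Sticker, and the mean RGB values are assumed to have been exported beforehand from an image analysis tool such as ImageJ.

```python
def blue_fraction(r, g, b):
    """Fraction of total intensity in the blue channel (hypothetical signal metric)."""
    total = r + g + b
    return b / total if total else 0.0


def response_time(frames, min_increase=0.05):
    """Return the first time (hours) at which the blue fraction exceeds
    the baseline (first frame) by at least min_increase, or None."""
    baseline = blue_fraction(*frames[0][1])
    for time_h, rgb in frames:
        if blue_fraction(*rgb) - baseline >= min_increase:
            return time_h
    return None


# Hypothetical (time in hours, mean RGB of the sticker region) pairs.
frames = [
    (0, (200, 200, 200)),
    (2, (190, 195, 205)),
    (4, (150, 165, 215)),
    (6, (110, 130, 225)),
]

signals = [(t, round(blue_fraction(*rgb), 3)) for t, rgb in frames]
first_response = response_time(frames)
print(signals)
print("first response at (h):", first_response)
```

The same two functions could be applied per gas concentration to build a dose-response table, with the threshold replaced by whatever visibility criterion is eventually validated.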

Homework: Waters Part I — Molecular Weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

eGFP Sequence: MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Note: This contains a His-purification tag (HHHHHH) and a linker (the LE before it).

The calculated molecular weight based on the sequence is 28006.60 Da.
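Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

A) Determine z for each adjacent pair of peaks n, n+1 using:

$$z = \frac{(m/z)_{n+1}}{(m/z)_n - (m/z)_{n+1}}$$

I chose these two peaks:
m/z = 966.0390 → (m/z)_n
m/z = 933.8391 → (m/z)_{n+1}

Substituting into the formula:

$$z = \frac{933.8391}{966.0390 - 933.8391} = \frac{933.8391}{32.1999} \approx 29.0$$

So the calculated charge state is z = 29. Because MW = n((m/z)_n − 1.0073) = (n+1)((m/z)_{n+1} − 1.0073), the z obtained from this formula is the charge of peak n, the higher-m/z (lower-charge) member of the pair. The peak at m/z 966.0390 is therefore the 29+ charge state and the peak at m/z 933.8391 is the 30+ charge state.

B) Determine the MW of the protein using the relationship between (m/z)_n, MW and z

The relationship between m/z, MW and z is:

$$MW = z \cdot (m/z) - z \cdot 1.0073$$

Using the 29+ peak: MW = 29(966.0390) − 29(1.0073) ≈ 27,985.9 Da
Using the 30+ peak: MW = 30(933.8391) − 30(1.0073) ≈ 27,985.0 Da

The two results agree within about 1 Da, which supports the charge assignment (assigning the peaks to 28+ and 29+ instead would give values that differ by about 31 Da). The experimental MW is therefore approximately 27,985.4 Da.

C) Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1 using:

$$\text{Accuracy} = \frac{|MW_{\text{experiment}} - MW_{\text{theory}}|}{MW_{\text{theory}}}$$

Using the sequence-based theoretical mass from question 1, MW theory = 28,006.60 Da:

Accuracy = |27,985.4 − 28,006.6| / 28,006.6 ≈ 7.6×10−4 (≈0.076%)

The measured mass is about 21 Da lower than the sequence-based value. This is most likely explained by a known modification of GFP: chromophore maturation removes one H2O and two H (about 20 Da), which gives a corrected theoretical mass of about 27,986.6 Da. Against that value the difference is only about 1 Da (roughly 40 ppm).

3) Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

The zoomed peak at m/z ≈ 1473.7 most likely corresponds to the 19+ charge state: z ≈ 28,006.6 / (1473.7 − 1.0073) ≈ 19.0. Its assignment from the peak itself is less certain, because lower charge states at higher m/z often show broader, less well-resolved isotopic distributions and lower signal intensity; at 19+ the isotope peaks are spaced only about 1/19 ≈ 0.05 m/z apart.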
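The adjacent charge state calculation can be checked with a few lines of code. This is a minimal sketch, assuming the exact relation n = ((m/z)_{n+1} − m_p) / ((m/z)_n − (m/z)_{n+1}), where n is the charge of the higher-m/z peak and m_p = 1.0073 Da is the proton mass used in the text; the function names are illustrative only.

```python
PROTON = 1.0073  # proton mass in Da, as used in the text


def charge_of_higher_mz_peak(mz_n, mz_n1):
    """Estimate the charge n of the higher-m/z peak of an adjacent pair (n, n+1)."""
    return (mz_n1 - PROTON) / (mz_n - mz_n1)


def neutral_mass(mz, z):
    """Neutral mass from m/z and charge: MW = z * (m/z) - z * proton."""
    return z * (mz - PROTON)


mz_n, mz_n1 = 966.0390, 933.8391  # the two adjacent peaks chosen in part A

n = round(charge_of_higher_mz_peak(mz_n, mz_n1))
mw_n = neutral_mass(mz_n, n)        # from the n+ peak
mw_n1 = neutral_mass(mz_n1, n + 1)  # from the (n+1)+ peak
mw_mean = (mw_n + mw_n1) / 2

# Charge of the zoomed peak near m/z 1473.7, using the mass found above.
z_zoomed = round(mw_mean / (1473.7 - PROTON))

print(n, round(mw_n, 1), round(mw_n1, 1), z_zoomed)
```

Checking that both peaks of the pair return nearly the same mass is a quick way to catch an off-by-one charge assignment.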

Homework: Waters Part III — Peptide Mapping - primary structure

We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide, generating a “peptide map”. This process is used to confirm the primary structure of the protein. There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.

How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

![K and R]( )

How many peptides will be generated from tryptic digestion of eGFP?
I) Navigate to https://web.expasy.org/peptide_mass/
II) Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.
III) Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.
IV) Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.

There are 19 peptides generated when we use Trypsin to digest eGFP.

The lysine and arginine residues (20 K and 6 R in the sequence given in Part I) are important because they readily accept protons, generating the multiple charge states observed in the intact protein mass spectrum. More Lys and Arg residues generally allow a protein to carry more positive charges during ESI-MS.
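The residue counts and the digest can be reproduced with a short in-silico sketch. It assumes the simple trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and standard monoisotopic residue masses. A complete digest under this rule yields more fragments than PeptideMass lists, because very small peptides (a 500 Da cutoff is assumed here) are not reported by the tool; the exact settings of Figure 4 are not reproduced.

```python
import re

# eGFP sequence from Part I; spaces are only visual grouping.
EGFP = "".join("""
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL
VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV
NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD
HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE
HHHHHH
""".split())

# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(seq):
    """Split after K or R, except before P (simple rule, no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


peptides = tryptic_peptides(EGFP)
reported = [p for p in peptides if mh_plus(p) > 500.0]  # assumed display cutoff

print("K:", EGFP.count("K"), "R:", EGFP.count("R"))
print("all fragments:", len(peptides), "above 500 Da:", len(reported))
```

With this sequence the sketch finds 27 fragments in total, 19 of which are above the assumed 500 Da cutoff, which is consistent with the 19 peptides reported by PeptideMass.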

Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

Looking at the TIC chromatogram, the labeled peaks between 0.5 and 6.0 min are: 0.61, 0.79, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06 and 5.43. That gives a total of 20 chromatographic peaks above ~10% relative abundance in that time window.

Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

The number of observed chromatographic peaks is very close to the number of peptides predicted from the tryptic digest. Using the ExPASy PeptideMass tool, a complete trypsin digestion of eGFP is predicted to generate 19 peptides. Since approximately 20 chromatographic peaks are observed between 0.5 and 6.0 minutes, the experimental result matches the theoretical prediction quite well. The slight difference is expected and may arise from peptide co-elution, minor impurities, incomplete digestion, or the presence of peptide isoforms such as missed-cleavage products or modified peptides. Overall, the peptide map is highly consistent with the expected tryptic digestion pattern of eGFP.

Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ([M+H]+) based on its m/z and z

The m/z of the most abundant peak (the monoisotopic peak) is labeled in the centre of the spectrum: 525.76712.

To find z, we look at the separation between the isotope peaks in the zoomed-in inset. Isotope peaks differ in mass by approximately 1 Da, so their spacing on the m/z axis is:

$$\Delta(m/z) = \frac{1}{z}$$

First peak (M): 525.76712
Second peak (M+1): 526.25916
Difference: 526.25916 − 525.76712 = 0.49204 ≈ 0.5

Solving for z: z = 1/0.5 = 2. The charge state of this peptide is 2+.

The singly charged mass represents the molecule with just one proton added. Since we know the m/z for the 2+ state, we can find the neutral mass (M) and then add one proton, or use the following derivation. The observed m/z for a charge z is:

$$(m/z)_{\text{obs}} = \frac{M + z \cdot 1.00727}{z}$$

To find the singly charged form [M+H]+, we can use:

$$[M+H]^+ = (m/z)_{\text{obs}} \cdot z - (z - 1) \cdot 1.00727$$

(using 1.00727 Da for the mass of a proton)

(m/z)obs × z = 525.76712 × 2 = 1051.53424

Now we subtract the extra proton (since z = 2, there is one more proton than in the [M+H]+ form): 1051.53424 − 1.00727 = 1050.52697

The mass of the singly charged form [M+H]+ is ≈ 1050.5270. (As a check, a smaller peak at m/z 1050.52438 is visible on the far right of the original spectrum, which is consistent with this calculation.)
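The two steps above (charge from isotope spacing, then conversion to [M+H]+) can be written as a short sketch. It uses the same approximation as the text, an isotope spacing of 1/z, and the proton mass of 1.00727 Da; the function names are illustrative.

```python
PROTON = 1.00727  # proton mass in Da, as used in the text


def charge_from_isotope_spacing(mz_mono, mz_next):
    """Charge state from the spacing of adjacent isotope peaks (spacing ~ 1/z)."""
    return round(1.0 / (mz_next - mz_mono))


def singly_charged_mass(mz, z):
    """Convert an observed m/z at charge z to the [M+H]+ value."""
    return mz * z - (z - 1) * PROTON


z = charge_from_isotope_spacing(525.76712, 526.25916)
mh = singly_charged_mass(525.76712, z)
print(z, round(mh, 5))
```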

Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm. (Recall that $\text{Accuracy} = \frac{MW_{\text{experiment}} - MW_{\text{theory}}}{MW_{\text{theory}}}$)

We need to identify the peptide by comparing the experimental mass derived from the spectrum with the theoretical masses of a tryptic digest of Green Fluorescent Protein (GFP).

Peptide Identification

Using a tool like PeptideMass (ExPASy) for a tryptic digestion of the GFP sequence (allowing for zero missed cleavages and focusing on the monoisotopic mass), we find a match for our experimental [M+H]+ value of ~1050.5270 Da.
Sequence: FEGDTLVNR (residues 115–123 of the eGFP sequence given in Part I)
Composition ([M+H]+): C45H72N13O16
Theoretical Monoisotopic Mass ([M+H]+): 1050.52142 Da

Mass Accuracy Calculation (Error in ppm)
MW experiment = 1050.5270 Da
MW theory = 1050.5214 Da (reference value)

$$\text{Error (ppm)} = \frac{|1050.5270 - 1050.5214|}{1050.5214} \times 10^6 = 5.33\ \text{ppm}$$

An error of ~5 ppm is characteristic of a high-resolution TOF (Time-of-Flight) analyzer. Interestingly, if we use the secondary peak labeled at m/z ~1050.52438 visible on the right of the original spectrum, the error drops to ~2.8 ppm, which further supports the identification.
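The ppm calculation is easy to reproduce for both candidate peaks. A minimal sketch, using the values quoted above:

```python
def ppm_error(observed, theoretical):
    """Absolute mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


# [M+H]+ derived from the 2+ ion versus the theoretical value.
ppm_from_doubly_charged = ppm_error(1050.5270, 1050.5214)

# Directly observed singly charged peak versus the theoretical value.
ppm_from_singly_charged = ppm_error(1050.52438, 1050.52142)

print(round(ppm_from_doubly_charged, 2), round(ppm_from_singly_charged, 2))
```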

What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

Based on Figure 6, the percentage of the sequence confirmed by peptide mapping is 88%. This value represents the sequence coverage, which is the proportion of the protein’s primary amino acid sequence successfully identified through detected peptides. In the provided figure, this information is explicitly stated in two places:

The progress bar in the top-left legend: Identified: 88%
The summary text at the bottom: Chain 1 (88% coverage).

The visualization in Figure 6 uses blue highlighting to show which specific amino acids were “mapped” or detected during the LC-MS analysis. The white gaps in the sequence (such as those seen between residues 61-90 or 121-150) represent segments of the protein that were either not ionized well, were too small/large to be detected, or were lost during the sample preparation process. A coverage of 88% is considered very high for a bottom-up proteomics experiment, indicating a successful digestion and high sensitivity in the mass spectrometry run.

Bonus Peptide Map Questions

Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this tool online to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html.) What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?

The peptide sequence that best matches the fragmentation spectrum in Figure 5c is: FEGDTLVNR (Phenylalanine-Glutamate-Glycine-Aspartate-Threonine-Leucine-Valine-Asparagine-Arginine).

Justification using fragmentation ions: In MS/MS fragmentation, peptides typically break along the backbone, producing y-ions (counting from the C-terminus) and b-ions (counting from the N-terminus). The prominent peaks observed in Figure 5c correspond to the theoretical y-ion series for this sequence:

Ion	Theoretical m/z	Observed in Fig 5c (approx)
y2	289.16	～289
y3	388.23	～388
y4	501.31	～501
y5	602.36	～602
y6	717.39	～717
y7	774.41	～774
y8	903.45	～903

The presence of this nearly complete y-ion series provides high-confidence confirmation that the peptide sequence is indeed FEGDTLVNR.
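The theoretical y-ion values in the table can be regenerated from the sequence alone. This is a minimal sketch, assuming singly charged y-ions and standard monoisotopic residue masses (only the residues present in FEGDTLVNR are listed); it is not the algorithm used by the online fragment ion tool.

```python
# Monoisotopic residue masses in Da for the residues in FEGDTLVNR.
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def y_ions(peptide):
    """Singly charged y-ion m/z values, built up from the C-terminus."""
    ions = {}
    running = WATER + PROTON  # C-terminal OH plus H, plus the charging proton
    for i, aa in enumerate(reversed(peptide), start=1):
        running += MONO[aa]
        ions[f"y{i}"] = running
    return ions


series = y_ions("FEGDTLVNR")
for name in ("y2", "y3", "y4", "y5", "y6", "y7", "y8"):
    print(name, round(series[name], 2))
```

The last member of the series (y9) is the intact peptide, so it equals the [M+H]+ value of about 1050.52.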

Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.

Yes, the results strongly indicate that the protein is the eGFP (enhanced Green Fluorescent Protein) standard.

Sequence Coverage: As shown in Figure 6, the analysis achieved 88% sequence coverage. In proteomics, coverage above 70-80% for a single protein digest is exceptionally high and serves as a “fingerprint” that definitively identifies the protein. It means almost every part of the eGFP sequence was accounted for by detected peptides.
Mass Accuracy: The experimental mass for the FEGDTLVNR peptide showed a very low error (approx. 5.3 ppm). This level of precision is characteristic of high-quality standards measured on high-resolution instruments (like the TOF used here).
MS/MS Validation: The fragmentation pattern (the “fingerprint” of the peptide) in Figure 5c matches the theoretical predictions for a known tryptic fragment of eGFP.
Retention Time & m/z Consistency: The peptide eluted at a specific retention time (2.78 min) and yielded an m/z that matches the value predicted from the eGFP sequence.

In summary, the combination of high mass accuracy, high sequence coverage, and matching fragmentation patterns leaves no doubt that the sample is the eGFP standard.

Homework: Waters Part IV — Oligomers

We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):

7FU Decamer
8FU Didecamer
8FU 3-Decamer
8FU 4-Decamer

Polypeptide Subunit Name	Subunit Mass
7FU	340 kDa
8FU	400 kDa
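The expected positions of the oligomers on the CDMS mass axis follow directly from the subunit masses in the table. The sketch below assumes that a decamer contains 10 subunits, a didecamer 20, a 3-decamer 30 and a 4-decamer 40; these subunit counts are inferred from the names and are not read from Figure 7.

```python
# Subunit masses in kDa, from the table above.
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (species name, subunit type, assumed number of subunits)
SPECIES = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]

# Expected oligomer masses in MDa.
expected_mda = {
    name: SUBUNIT_KDA[subunit] * count / 1000
    for name, subunit, count in SPECIES
}

for name, mass in expected_mda.items():
    print(f"{name}: {mass:.1f} MDa")
```

Peaks in the CDMS spectrum can then be assigned by matching them to the nearest expected mass.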

Homework: Waters Part V — Did I make GFP?

Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.

Feature	Theoretical	Observed/Measured(LC-MS)
Molecular Weight	28.0066 kDa	27.9820 kDa (approx.)
Sequence Coverage	100%	88% (from Figure 6)
Mass Accuracy	0 ppm	≈5.3 ppm (peptide FEGDTLVNR)

For the Keyhole Limpet Hemocyanin (KLH) oligomers in Part IV, the expected masses follow from the subunit masses in Table 1:
7FU Decamer: 10 × 340 kDa = 3.4 MDa
8FU Didecamer: 20 × 400 kDa = 8.0 MDa
These extremely high masses are measured using CDMS (Charge Detection Mass Spectrometry), in contrast to the much smaller eGFP protein (~28 kDa).

The peptide-level mass accuracy was calculated as:

$$\text{Accuracy} = \frac{MW_{\text{experiment}} - MW_{\text{theory}}}{MW_{\text{theory}}} = \frac{0.0056}{1050.5214} \times 10^6 = 5.33\ \text{ppm}$$

Feature	Theoretical	Observed/measured on the Intact LC-MS	PPM Mass Error
Molecular weight (kDa)	28.0066	27.9820 (approx.)	5.33 ppm (peptide-level, FEGDTLVNR)

Based on the LC-MS/MS analysis conducted, it can be concluded that the experimental results definitively identify the sample as the eGFP standard. The high-resolution characterization yielded a sequence coverage of 88%, providing an extensive “molecular fingerprint” that matches the primary structure of the protein. This identification is further supported by the high mass accuracy observed; for instance, the tryptic peptide FEGDTLVNR was detected with a mass error of only 5.33 ppm, which is well within the acceptable range for high-performance TOF (Time-of-Flight) instrumentation. Furthermore, the MS/MS fragmentation data confirmed the sequence through a clearly defined y-ion series, matching the theoretical predictions for eGFP. The integration of high sequence coverage, precise mass measurements, and consistent fragmentation patterns confirms both the identity and the integrity of the protein standard.