Week 10 HW: Imaging and Measurement

Final Project

1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

We will measure PETase thermostability (Tm), residual activity after heat challenge, expression yield, and PET-degradation rate for each variant.

2. Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

The measurable elements are fold-stability metrics (Tm and thermal survival), catalytic output (product formation rate), and protein production quality (yield/purity) across designed variants.

3. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

We will use AI-guided sequence design (ProteinMPNN/ESM), recombinant expression and purification, thermal denaturation assays, activity assays, and LC-MS to quantify and compare variant performance.

Homework: Waters Part I; Molecular Weight

1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

Theoretical pI/Mw: 5.58 / 26941.48 Da (excluding the C-terminal LEHHHHHH tag)

Theoretical pI/Mw: 5.90 / 28006.60 Da (including the C-terminal LEHHHHHH tag)
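
As a cross-check on the calculator output, the average mass can be summed directly from the sequence. This is a minimal sketch, not the ExPASy implementation: the residue masses are standard average values typed in by hand (an assumption), and `average_mw` is a hypothetical helper name.

```python
# Average residue masses in Da (standard values, entered by hand for this sketch).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added for the free N- and C-termini

# eGFP sequence with the LEHHHHHH tag, as given in the homework.
EGFP_HIS = "".join("""
MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT
TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF
FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN
VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH
YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYKL EHHHHHH
""".split())

TAG = "LEHHHHHH"


def average_mw(sequence):
    """Average molecular weight of an unmodified peptide chain, in Da."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER


mw_with_tag = average_mw(EGFP_HIS)
mw_without_tag = average_mw(EGFP_HIS[:-len(TAG)])
print(f"with tag:    {mw_with_tag:.2f} Da")
print(f"without tag: {mw_without_tag:.2f} Da")
```

The sketch ignores every modification, so it is only comparable to the unmodified "Theoretical pI/Mw" values above.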

2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

1. Determine z for each adjacent pair of peaks (n, n+1):

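z = (875.4421 - 1.007276) / (903.7148 - 875.4421) = 30.93, which rounds to z = 31 for the peak at m/z 903.7148 (the adjacent peak at m/z 875.4421 carries z = 32)

2. Determine the MW of the protein

MW = (m/z * z) - (z * 1.007276 Da) = (903.7148 * 31) - (31 * 1.007276 Da) = 27983.93 Da

3. Calculate the accuracy of the measurement

(27983.93 - 28006.60) / 28006.60 = -0.081% (about -810 ppm). The measured mass is about 23 Da below the sequence-based value, which may reflect chromophore maturation (a loss of roughly 20 Da) that the sequence calculation does not include.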
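
The adjacent charge state calculation can be sketched in a few lines, using the two peaks quoted above (m/z 903.7148 and 875.4421). The function names are hypothetical, and the sketch assumes both peaks are protonated forms of the same species whose charges differ by exactly one.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_of_higher_mz_peak(mz_high, mz_low):
    """Charge n of the higher-m/z peak when the lower-m/z neighbour carries n + 1.

    Follows from M = n * (mz_high - PROTON) = (n + 1) * (mz_low - PROTON).
    """
    return (mz_low - PROTON) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral mass M from an observed m/z and its integer charge z."""
    return z * (mz - PROTON)


mz_high, mz_low = 903.7148, 875.4421

z_raw = charge_of_higher_mz_peak(mz_high, mz_low)
z = round(z_raw)  # charge must be an integer

mass_from_high = neutral_mass(mz_high, z)
mass_from_low = neutral_mass(mz_low, z + 1)

print(f"raw z = {z_raw:.2f}, rounded z = {z}")
print(f"M from m/z {mz_high}: {mass_from_high:.2f} Da")
print(f"M from m/z {mz_low}: {mass_from_low:.2f} Da")
```

Rounding z to an integer before computing the mass matters: a fractional error in z is multiplied by the m/z value, so it shifts the result by hundreds of daltons.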

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

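No. At charge states above 30, the isotope peaks are only about 0.03 m/z apart (spacing = 1/z), which is too close together to resolve, so the charge state cannot be read from the zoomed-in peak.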

Homework: Waters Part II — Secondary/Tertiary structure

1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

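Denatured: the protein is unfolded, which exposes more surface area and more sites that can pick up a charge. The spectrum shows higher charge states (lower m/z) and a broader distribution of peaks.

Native: the protein stays folded, with less exposed surface area. The spectrum shows lower charge states (higher m/z) and fewer, sharper peaks.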

2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 m/z? What is the charge state? How can you tell?

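Yes, the charge state is z = 10. With MW = 28006.60 Da and the peak at m/z = 2800, z = MW / (m/z) = 28006.60 / 2800 = 10.

The isotope spacing confirms this: the isotope peaks are approximately 0.1 m/z apart, and spacing = 1 / z, so z = 10.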
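
The same assignment can be checked by asking which charge state places a 28006.60 Da protein closest to m/z 2800. This is a small sketch with hypothetical function names; it assumes simple protonation and uses the sequence-based mass from Part I.

```python
PROTON = 1.007276  # mass of a proton in Da


def expected_mz(neutral_mass, z):
    """m/z of a species with the given neutral mass carrying z protons."""
    return (neutral_mass + z * PROTON) / z


def closest_charge(neutral_mass, observed_mz, candidates):
    """Candidate charge whose expected m/z lies nearest to the observed m/z."""
    return min(candidates, key=lambda z: abs(expected_mz(neutral_mass, z) - observed_mz))


mw = 28006.60
z_native = closest_charge(mw, 2800.0, range(5, 21))

print(f"z = {z_native}, expected m/z = {expected_mz(mw, z_native):.3f}")
print(f"expected isotope spacing = {1 / z_native:.2f} m/z")
```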

Homework: Waters Part III — Peptide Mapping - primary structure

1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

There are 20 lysines (K) and 6 arginines (R), marked in brackets below.

MVS[K]GEELFTG VVPILVELDG DVNGH[K]FSVS GEGEGDATYG [K]LTL[K]FICTT G[K]LPVPWPTL VTTLTYGVQC FS[R]YPDHM[K]Q HDFF[K]SAMPE GYVQE[R]TIFF [K]DDGNY[K]T[R]A EV[K]FEGDTLV N[R]IEL[K]GIDF [K]EDGNILGH[K] LEYNYNSHNV YIMAD[K]Q[K]NG I[K]VNF[K]I[R]HN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALS[K]D PNE[K][R]DHMVL LEFVTAAGIT LGMDELY[K]LE HHHHHH

2. How many peptides will be generated from tryptic digestion of eGFP?

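Complete cleavage at all 26 K/R sites gives 27 peptides in total. 19 of them have a mass above 500 Da and appear in the PeptideMass table below; the other 8 (MVSK, LTLK, TR, AEVK, QK, NGIK, IR, R) are smaller.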

mass	position	#MC	modifications	peptide sequence
4472.1752	170-210	0		HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK
2566.2931	217-239	0		DHMVLLEFVTAAGITLGMDELYK
2437.2608	5-27	0		GEELFTGVVPILVELDGDVNGHK
2378.2577	54-74	0		LPVPWPTLVTTLTYGVQCFSR
1973.9062	142-157	0		LEYNYNSHNVYIMADK
1503.6597	28-42	0		FSVSGEGEGDATYGK
1266.5783	87-97	0		SAMPEGYVQER
1083.4979	240-247	0		LEHHHHHH
1050.5214	115-123	0		FEGDTLVNR
982.4952	133-141	0		EDGNILGHK
821.3940	81-86	0		QHDFFK
790.3552	75-80	0		YPDHMK
769.3913	47-53	0		FICTTGK
711.2944	103-108	0		DDGNYK
655.3813	98-102	0		TIFFK
602.2780	211-215	0		DPNEK
579.3137	128-132	0		GIDFK
507.2925	164-167	0		VNFK
502.3235	124-127	0		IELK
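
The peptide list above can be reproduced with a short in-silico digest. This is a minimal sketch, not the PeptideMass implementation: it assumes the usual trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, unmodified residues, and hand-entered monoisotopic residue masses. The helper names `tryptic_digest` and `mh_plus` are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard values, entered by hand for this sketch).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

# eGFP sequence with the LEHHHHHH tag, as given in the homework.
EGFP_HIS = "".join("""
MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT
TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF
FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN
VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH
YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYKL EHHHHHH
""".split())


def tryptic_digest(sequence):
    """Split after every K or R that is not followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


peptides = tryptic_digest(EGFP_HIS)
n_sites = EGFP_HIS.count("K") + EGFP_HIS.count("R")
above_500 = [p for p in peptides if mh_plus(p) > 500.0]

print(f"K/R sites: {n_sites}, peptides: {len(peptides)}, above 500 Da: {len(above_500)}")
for p in sorted(above_500, key=mh_plus, reverse=True):
    print(f"{mh_plus(p):10.4f}  {p}")
```

Filtering on mass rather than length mirrors the table, whose smallest entry is IELK at 502.3235.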

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

21 peaks.

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

There are more peaks in the chromatogram than predicted peptides. This could be due to post-translational modifications, missed cleavages, or other factors that generate additional peptide species.

5. Identify the mass-to-charge ($\frac{m}{z}$) of the peptide shown in Figure 5b. What is the charge ($z$) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ($\small{[M\!\!+\!\!H]^+}$) based on its $\frac{m}{z}$ and $z$.

Most abundant peak: m/z = 525.76712

Isotope spacing: 0.5

Charge: z = 1 / 0.5 = 2

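[M+H]+ = (525.76712 * 2) - 1.007276 = 1050.5270 Da. The doubly charged ion carries two protons, so one proton mass is subtracted to get the singly charged form (the neutral mass M is 1049.5197 Da).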
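
The conversion from an observed m/z and isotope spacing to the singly charged mass can be written out as a short sketch. The function names are hypothetical, and the code assumes the ion is protonated only (no sodium or other adducts).

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_from_isotope_spacing(spacing_mz):
    """Isotope peaks are 1/z apart on the m/z axis, so z = 1 / spacing."""
    return round(1.0 / spacing_mz)


def singly_protonated_mass(mz, z):
    """[M+H]+ from an [M+zH]z+ ion: remove all but one of the z protons."""
    return z * mz - (z - 1) * PROTON


z_peptide = charge_from_isotope_spacing(0.5)
mh = singly_protonated_mass(525.76712, z_peptide)
print(f"z = {z_peptide}, [M+H]+ = {mh:.4f} Da")
```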

6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm.

1050.5214	115-123	0	FEGDTLVNR

Measured [M+H]+ = (525.76712 * 2) - 1.007276 = 1050.5270

Accuracy = (1050.5270 - 1050.5214) / 1050.5214 = 0.0000053

Error = 0.0000053 * 1,000,000 = 5.3 ppm
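
For reference, the ppm calculation as a small sketch. `ppm_error` is a hypothetical helper; the measured value is the [M+H]+ derived from m/z 525.76712 at z = 2, and the theoretical value is the PeptideMass entry for FEGDTLVNR.

```python
PROTON = 1.007276  # mass of a proton in Da


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


observed_mh = 2 * 525.76712 - PROTON  # [M+H]+ from the doubly charged ion
theoretical_mh = 1050.5214            # FEGDTLVNR, positions 115-123

error_ppm = ppm_error(observed_mh, theoretical_mh)
print(f"error = {error_ppm:.1f} ppm")
```

Both masses must be in the same form before they are compared: a mismatch of one proton mass alone would add roughly 960 ppm at this mass.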

7. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

88%