Week 10 HW: Imaging and Measurement

For final project

In this project, I will measure several aspects of the DNA sensing system, including sequence correctness, predicted folding behavior, target response, orthogonality, and signal output. The most important biological measurements are whether the histamine and IgE circuits are correctly designed and whether they respond only to their intended targets. I will also measure the strength of the output signal after target binding, since the goal is to convert molecular recognition into a detectable readout. In addition, I will look at background activity and nonspecific activation to estimate how cleanly the system distinguishes true signal from noise. These measurements will help determine whether the platform is suitable for future wearable use.

I will begin by comparing the designed DNA sequences to the intended circuit architecture to make sure the correct aptamer, toehold, and trigger regions are present. Next, I will use secondary-structure prediction to check whether each circuit forms the expected hairpin and whether the toehold remains accessible for switching. To experimentally test conformational change, I would use native PAGE, which can reveal mobility shifts when the DNA switches structure or binds its trigger. If the circuit is extended to a functional reporter stage, I would also use a cell-free assay to measure whether target binding leads to a detectable output. Finally, I would compare matched, mismatched, and no-target controls to quantify specificity and background signal.

The main technologies in this project are computational DNA design tools, secondary-structure prediction software, native PAGE, and electrochemical sensing methods. Benchling or a similar platform will be used to organize and annotate the DNA circuits, while NUPACK or Mfold will help predict folding and accessibility. Native PAGE will be used to observe structural changes in the DNA constructs, and electrochemical readouts such as impedance or current measurements will be used for the wearable sensing concept. If needed, a cell-free expression system can provide an intermediate functional readout before moving to the wearable electrode platform. Together, these technologies allow both design validation and functional testing of the sensing system.

Waters Part I — Molecular Weight

For the following calculations, I will be using the provided eGFP sequence:

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Using the online ExPASy Compute pI/Mw tool, the baseline expected molecular weight of this unmodified amino acid sequence was found to be 28006.60 g/mol (Theoretical pI/Mw: 5.90 / 28006.60). Note that the His-tag (HHHHHH) and LE linker are included in this sequence and therefore in the calculated weight.

To calculate the experimental molecular weight of eGFP using the intact LC-MS data from Figure 1, I selected a pair of adjacent charge-state peaks:

  • Peak n (Higher m/z): 903.7148
  • Peak n+1 (Lower m/z): 875.4421

Step A: Determine the charge state (z)

Using the adjacent charge state formula, which neglects the proton mass (including it changes z by less than 0.04 here):

z = (m/z)_{n+1} / [ (m/z)_n − (m/z)_{n+1} ]

z = 875.4421 / (903.7148 − 875.4421) = 875.4421 / 28.2727 = 30.96 ≈ 31

From this calculation, the charge state of our first peak (n) is 31, meaning it carries 31 extra protons, while the adjacent peak (n+1) carries 32.
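
As a quick check of Step A, the calculation can be sketched in Python. This is a minimal sketch and not part of the Waters workflow: the function name charge_from_adjacent_peaks is my own, and the "exact" variant simply keeps the proton-mass term that the adjacent charge state formula drops.

```python
PROTON = 1.00728  # Da, proton mass as used in this write-up


def charge_from_adjacent_peaks(mz_high, mz_low, proton=PROTON):
    """Estimate the charge z of the higher-m/z peak.

    mz_high carries z charges, mz_low is the neighbouring peak with z + 1.
    Returns (approximate, exact): the approximate form ignores the proton mass.
    """
    spacing = mz_high - mz_low
    approximate = mz_low / spacing
    exact = (mz_low - proton) / spacing
    return approximate, exact


approx_z, exact_z = charge_from_adjacent_peaks(903.7148, 875.4421)
print(f"approximate z = {approx_z:.2f}")  # 30.96
print(f"exact z       = {exact_z:.2f}")   # 30.93
print(f"assigned z    = {round(exact_z)}")
```

Both forms round to the same integer charge, so the simpler formula is adequate for these two peaks.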

Step B: Determine the Experimental Molecular Weight

Using the mathematical relationship between m/z, MW, and z, where each charge is carried by a proton (mass = 1.00728 Da):

MW = z × (m/z)_n − z × 1.00728

MW = 31 × 903.7148 − 31 × 1.00728 = 28015.159 − 31.226 = 27983.93 Da
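
The same relationship can be applied to both selected peaks as a cross-check. The sketch below is illustrative only: neutral_mass is a hypothetical helper name, and it assumes the lower-m/z peak carries z = 32, as inferred in Step A.

```python
PROTON = 1.00728  # Da, proton mass as used in this write-up


def neutral_mass(mz, z, proton=PROTON):
    """Neutral molecular weight from one charge-state peak: MW = z * (m/z - proton)."""
    return z * (mz - proton)


mass_from_z31 = neutral_mass(903.7148, 31)
mass_from_z32 = neutral_mass(875.4421, 32)  # assumes z = 32 for the adjacent peak

print(f"MW from z = 31 peak: {mass_from_z31:.2f} Da")  # 27983.93
print(f"MW from z = 32 peak: {mass_from_z32:.2f} Da")  # 27981.91
print(f"difference: {mass_from_z31 - mass_from_z32:.2f} Da")
```

The two estimates differ by about 2 Da, which gives a rough sense of how much a single-peak estimate depends on the exact m/z value picked for each peak.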

When comparing the experimental result (27983.93 Da) with the theoretical molecular weight (28006.60 Da), the values differ by 22.67 Da, which is less than 0.1% of the total mass.

Accuracy Calculation

To quantify the accuracy of the calculated mass relative to the theoretical weight:

Accuracy (%) = |MW_experiment − MW_theory| / MW_theory × 100

Accuracy = |27983.93 − 28006.60| / 28006.60 × 100 = 22.67 / 28006.60 × 100 = 0.0809%

A mass error of 0.0809% (about 809 ppm) shows that the difference between the experimental mass and theoretical mass is small. It also supports the charge assignment, since an error of one unit in z would shift the calculated mass by roughly 900 Da.
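
The same error can be expressed in percent or in ppm. A minimal sketch, with relative_error as a hypothetical helper name:

```python
def relative_error(measured, theoretical):
    """Unsigned relative difference between a measured and a theoretical mass."""
    return abs(measured - theoretical) / theoretical


error = relative_error(27983.93, 28006.60)
print(f"{error * 100:.4f} %")    # 0.0809 %
print(f"{error * 1e6:.2f} ppm")  # 809.45 ppm
```

Reporting ppm makes it easier to compare the intact-mass result with the peptide-level mass accuracy in Part III.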

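The charge state can also be determined directly from the zoomed-in peak structure, provided the instrument resolves it. When zooming tightly into a single charge-state peak, individual isotopic lines become visible inside the peak envelope. These lines arise from the natural abundance of heavy isotopes (mainly 13C): neighboring isotopologues differ by about 1 Da, so on the m/z axis they are spaced 1/z apart, and z follows from the measured spacing. For intact eGFP at z = 31 this spacing is only about 0.03 m/z, so resolving it requires high resolving power.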

Waters Part II — Secondary/Tertiary Structure

Q1: Native vs. Denatured Protein Conformations

The native state of a protein is its thermodynamically stable, biologically active three-dimensional conformation, maintained by non-covalent interactions including hydrogen bonding, hydrophobic interactions, and electrostatic forces.

When a protein unfolds (denatures), environmental stressors such as organic solvents (e.g., acetonitrile) or acidic conditions (e.g., formic acid) disrupt these interactions. The ordered secondary and tertiary structure collapses into a disordered polypeptide chain, while the primary amino acid sequence remains intact.

A mass spectrometer detects this through changes in the charge state distribution (CSD). In the native state, the compact fold limits the solvent-accessible surface area, and many basic sites (Lys, Arg, His) are engaged in intramolecular interactions, so fewer protons are taken up during ionization. This produces a narrow distribution of low charge states at high m/z values (typically above 2500 m/z), as seen in the bottom spectrum of Figure 2.

In the denatured state, unfolding exposes many more protonatable sites to solvent, generating a broad distribution of high charge states shifted toward lower m/z values (500–1500 m/z), as seen in the top spectrum of Figure 2.


Q2: Charge State of the ~2800 m/z Peak (Figure 3)

Yes, the charge state can be determined from the zoomed-in inset in Figure 3. Isotopic peaks within a single charge envelope are separated by 1 Da / z, so measuring the spacing between adjacent isotopic lines gives z directly:

z = 1 / Δ(m/z)

From the inset, the isotopic spacing is approximately 0.09 m/z, giving:

z = 1 / 0.09 ≈ 11

The peak at ~2800 m/z therefore carries a charge state of +11, consistent with the compact, tightly folded native conformation producing low charge states at high m/z.
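
Because the spacing is read off a figure, it is worth seeing how sensitive z is to that reading. The sketch below is a rough illustration: charge_from_isotope_spacing is my own helper name, and the 0.10 value is a hypothetical alternative reading, not a measurement from Figure 3.

```python
def charge_from_isotope_spacing(delta_mz):
    """Charge state from the m/z spacing between adjacent isotopic lines (z = 1 / spacing)."""
    return 1.0 / delta_mz


# 0.09 is the spacing read from the inset; 0.10 is a hypothetical alternative reading.
for spacing in (0.09, 0.10):
    z = charge_from_isotope_spacing(spacing)
    print(f"spacing {spacing:.2f} m/z -> z = {z:.2f} -> assigned charge {round(z)}")
```

A change of 0.01 m/z in the reading moves the assigned charge by one unit, so averaging the spacing over several isotopic lines should give a more reliable value than a single pair.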


Waters Part III — Peptide Mapping

1. Residue Quantification and Sequence Highlight

Trypsin cleaves peptide bonds at the C-terminal side of Lysine (K) and Arginine (R) residues unless followed by Proline (P).

After counting through the full 247-amino-acid eGFP sequence (including the LE linker and His-tag), there are:

  • Lysine residues (K): 20
  • Arginine residues (R): 6
  • Total cleavage sites: 26

Highlighted cleavage sites:

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH
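
The residue counts can be reproduced with a short script. This is a minimal sketch that assumes the sequence exactly as written in Part I, with the block spacing removed; the variable names are my own.

```python
import re

# eGFP sequence from Part I (including the LE linker and His-tag), spaces removed
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH"
).replace(" ", "")

n_lys = EGFP.count("K")
n_arg = EGFP.count("R")

# 1-based positions of K or R residues that are followed by a residue other than P
cleavage_sites = [m.start() + 1 for m in re.finditer(r"[KR](?=[^P])", EGFP)]

print(f"length = {len(EGFP)}")
print(f"K = {n_lys}, R = {n_arg}, cleavage sites = {len(cleavage_sites)}")
```

None of the K or R residues in this sequence is followed by proline, so every one of them counts as a cleavage site.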

2. In Silico Tryptic Peptide Generation

Using the ExPASy PeptideMass tool with the parameters shown in Figure 4 (trypsin, monoisotopic, [M+H]+, 0 missed cleavages, mass range 500–unlimited Da), a total of 19 theoretical peptides were predicted. The 26 cleavage sites give 27 peptides in a complete digest, but eight of them (MVSK, LTLK, TR, AEVK, QK, NGIK, IR, and R) fall below the 500 Da mass filter, which reduces the reported count to 19. The C-terminal His-tag peptide (LEHHHHHH) does not end in K or R but is still counted, since its mass is above 500 Da.
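
The peptide count can be sanity-checked with a small in silico digest. This is a sketch under stated assumptions, not the PeptideMass implementation: the residue masses are standard monoisotopic values that I typed in (rounded to five decimals), cysteine is treated as unmodified, and the helper names tryptic_peptides and mh_plus are my own.

```python
import re

# Standard monoisotopic residue masses in Da (assumed values, rounded to 5 decimals)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL "
    "VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV "
    "NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH"
).replace(" ", "")


def tryptic_peptides(sequence):
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER + PROTON


peptides = tryptic_peptides(EGFP)
reported = [p for p in peptides if mh_plus(p) >= 500.0]
filtered_out = [p for p in peptides if mh_plus(p) < 500.0]

print(f"{len(peptides)} peptides in total, {len(reported)} at or above 500 Da")
print("below the filter:", ", ".join(filtered_out))
```

Under these assumptions the digest yields 27 peptides, of which 19 pass the 500 Da filter.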

3. LC-MS Chromatographic Peak Count

From the TIC chromatogram in Figure 5a, the tallest peak is at 4.87 minutes with approximately 1.2 × 10^7 counts. Counting all peaks above the 10% relative abundance threshold between 0.5 and 6.0 minutes gives 21 chromatographic peaks.

4. Comparison: Predicted vs. Observed Peaks

  • Predicted peptides (theory): 19
  • Observed chromatographic peaks (experiment): 21

There are more observed peaks than predicted. This can be explained by several factors:

Missed Cleavages

Trypsin does not always cleave at every K/R site with 100% efficiency. Peptides with one or more missed cleavages will appear as additional, larger peaks in the chromatogram.

Non-specific Cleavage

Trypsin may occasionally cleave at non-K/R sites, generating unexpected peptide fragments not predicted by the in silico digest.

Protein Modifications

Post-translational or chemical modifications (e.g., oxidized methionines) can shift peptide masses and cause modified and unmodified forms of the same peptide to appear as separate peaks.

5. Charge State and Mass of the Peak at 2.78 min (Figure 5b)

A. Charge State Determination

From the zoomed inset in Figure 5b, the two most abundant isotopic peaks are:

  • m/z1 = 525.76712
  • m/z2 = 526.25918

Isotopic spacing:

Δ(m/z) = 526.25918 − 525.76712 = 0.49206

Charge state:

z = 1 / 0.49206 = 2.032 ≈ 2

The dominant peptide ion carries a charge state of +2.

B. [M+H]+ Mass

Using the relationship [M+H]+ = (m/z × z) − (z − 1) × H, which reduces to (m/z × z) − H for z = 2, with H = 1.00727 Da:

[M+H]+ = (525.76712 × 2) − 1.00727 = 1051.53424 − 1.00727 = 1050.527 Da
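
Parts A and B can be combined into one short calculation. A minimal sketch, using the proton mass of 1.00727 Da from this section; singly_protonated_mass is a hypothetical helper name.

```python
PROTON = 1.00727  # Da, value used in this section


def singly_protonated_mass(mz, z, proton=PROTON):
    """Convert an observed m/z at charge z to the [M+H]+ value."""
    neutral = z * (mz - proton)  # remove all z protons
    return neutral + proton      # add one proton back


mz_first, mz_second = 525.76712, 526.25918
z = round(1.0 / (mz_second - mz_first))  # isotopic spacing is 1/z
mh = singly_protonated_mass(mz_first, z)

print(f"z = {z}")
print(f"[M+H]+ = {mh:.3f} Da")  # 1050.527
```

Writing the conversion as "remove z protons, add one back" keeps it valid for charge states other than +2.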

6. Peptide Identification and Mass Accuracy

Matching the experimental [M+H]+ of 1050.527 Da to the PeptideMass theoretical output for eGFP, the closest peptide is FEGDTLVNR, with a theoretical monoisotopic [M+H]+ of 1050.5214 Da.

Mass accuracy:

Accuracy (ppm) = |MW_experiment − MW_theory| / MW_theory × 10^6

Accuracy = |1050.527 − 1050.5214| / 1050.5214 × 10^6 = 0.00557 / 1050.5214 × 10^6 = 5.30 ppm

This is excellent mass accuracy, consistent with the high-resolution Waters BioAccord QToF instrument.
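
The matching step can be sketched as a search over candidate peptides ranked by ppm error. This is an illustration only: the candidate list is a hand-picked subset of the eGFP tryptic peptides, the residue masses are standard monoisotopic values that I typed in, and the helper names are my own.

```python
# Standard monoisotopic residue masses in Da (assumed values, rounded to 5 decimals)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER + PROTON


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Hand-picked subset of eGFP tryptic peptides, for illustration only
candidates = ["FEGDTLVNR", "EDGNILGHK", "SAMPEGYVQER", "QHDFFK", "DDGNYK"]
observed = 1050.527

best = min(candidates, key=lambda p: abs(ppm_error(observed, mh_plus(p))))
best_ppm = ppm_error(observed, mh_plus(best))
print(f"best match: {best}, theoretical [M+H]+ = {mh_plus(best):.4f}, error = {best_ppm:.2f} ppm")
```

In practice a tolerance (for example 10 ppm, a hypothetical choice here) would be applied so that a peptide with no close candidate is reported as unmatched rather than assigned to the nearest mass.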

7. Sequence Coverage

As shown in Figure 6, the amino acid coverage map reports 88% sequence coverage of eGFP based on the peptides positively identified by their calculated mass and fragmentation pattern.

8. Peptide Sequence from Fragmentation Spectrum (Bonus)

The peptide eluting at 2.78 minutes has a [M+H]+ of 1050.527 Da, which matches FEGDTLVNR (theoretical 1050.5214 Da) from the PeptideMass output. To confirm, the fragmentation spectrum in Figure 5c was compared to the predicted b/y ion series for FEGDTLVNR using the FragIon tool. The observed fragment masses are consistent with the expected y-ion series for this peptide sequence.
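
For reference, the predicted b/y ion series for FEGDTLVNR can be computed directly. This is a minimal sketch and not the FragIon tool itself: it assumes singly charged, unmodified fragments, uses standard monoisotopic residue masses that I typed in, and by_ions is a hypothetical helper name.

```python
# Standard monoisotopic residue masses in Da for the residues in FEGDTLVNR (assumed values)
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694, "T": 101.04768,
    "L": 113.08406, "V": 99.06841, "N": 114.04293, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def by_ions(peptide):
    """Singly charged b and y fragment m/z values for an unmodified peptide."""
    residues = [MONO[a] for a in peptide]
    n = len(residues)
    # b ions: N-terminal fragments, sum of residues plus one proton
    b = {f"b{i}": sum(residues[:i]) + PROTON for i in range(1, n)}
    # y ions: C-terminal fragments, sum of residues plus water plus one proton
    y = {f"y{i}": sum(residues[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


b_ions, y_ions = by_ions("FEGDTLVNR")
for name, mz in y_ions.items():
    print(f"{name}: {mz:.4f}")
```

A useful internal check is that each complementary pair (b_i and y_(n−i)) sums to [M+H]+ plus one proton mass.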

9. Does the Peptide Map Data Confirm eGFP? (Bonus)

Yes, the data strongly support that the sample is eGFP. The peptide map shows 88% amino acid sequence coverage (Figure 6), meaning the large majority of the protein's primary structure was directly confirmed by mass and fragmentation matching. The peptide examined in detail matches its theoretical eGFP tryptic mass to within 5.30 ppm, and the fragmentation spectrum of the 2.78-minute peak matches the expected ions for the eGFP peptide FEGDTLVNR. Together, these results provide high confidence that the protein standard is authentic eGFP.

Waters Part IV — Oligomers

Theoretical Mass Calculations

To identify each oligomeric assembly of KLH on the CDMS spectrum, the expected total mass was calculated by multiplying the subunit mass by the number of subunits in each complex:

  • 7FU Decamer (10 subunits): 10 × 340 kDa = 3.40 MDa
  • 8FU Didecamer (20 subunits): 20 × 400 kDa = 8.00 MDa
  • 8FU 3-Decamer (30 subunits): 30 × 400 kDa = 12.00 MDa
  • 8FU 4-Decamer (40 subunits): 40 × 400 kDa = 16.00 MDa

Spectral Peak Assignment (Figure 7)

Comparing the calculated masses to the labeled peaks in the CDMS spectrum:

Species          Theoretical Mass   Observed Peak (MDa)   Notes
7FU Decamer      3.40 MDa           3.4 MDa               Exact match
8FU Didecamer    8.00 MDa           8.33 MDa              Most abundant peak; highest intensity in spectrum
8FU 3-Decamer    12.00 MDa          12.67 MDa             Clearly resolved peak
8FU 4-Decamer    16.00 MDa          ~16.0 MDa             Low intensity; visible as small broad peak in blue trace

All four oligomeric species are detectable in the spectrum. The 8FU Didecamer at 8.33 MDa is the dominant species. The observed masses for the 8FU Didecamer and 3-Decamer are about 4–6% higher than theoretical (8.33 vs. 8.00 MDa and 12.67 vs. 12.00 MDa). An offset of this size is plausible for MDa-scale complexes, since the nominal subunit masses are rounded and glycosylation or retained solvent and salt adducts can add mass, in addition to any calibration error.
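
The theoretical masses and the offsets from the observed peaks can be tabulated with a few lines of Python. A minimal sketch, using the subunit masses and peak labels quoted in this section; offset_percent is a hypothetical helper name.

```python
def offset_percent(n_subunits, subunit_kda, observed_mda):
    """Percent difference between an observed mass and n_subunits * subunit mass."""
    theoretical_mda = n_subunits * subunit_kda / 1000.0
    return (observed_mda - theoretical_mda) / theoretical_mda * 100.0


# (name, number of subunits, subunit mass in kDa, observed peak in MDa)
assemblies = [
    ("7FU Decamer", 10, 340, 3.40),
    ("8FU Didecamer", 20, 400, 8.33),
    ("8FU 3-Decamer", 30, 400, 12.67),
    ("8FU 4-Decamer", 40, 400, 16.0),
]

for name, n, subunit_kda, observed in assemblies:
    theoretical = n * subunit_kda / 1000.0
    print(f"{name:14s} theory {theoretical:6.2f} MDa, observed {observed:6.2f} MDa, "
          f"offset {offset_percent(n, subunit_kda, observed):+.1f}%")
```

The 4-Decamer peak is only labeled approximately in the spectrum, so its offset should be read as a rough value.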


Waters Part V — Did I Make GFP?

The table below summarizes the theoretical and observed molecular weights for intact eGFP, using the data from the provided figure screenshots (lab work at Waters was not performed in person).

                         Theoretical                 Observed / Measured on the Intact LC-MS   PPM Mass Error
Molecular weight (kDa)   28,006.60 Da (28.007 kDa)   27,983.93 Da (27.984 kDa)                 809.45 ppm

The theoretical MW (28,006.60 Da) was obtained from ExPASy Compute pI/Mw using the full eGFP sequence including the His-tag. The observed MW (27,983.93 Da) was calculated from the intact LC-MS spectrum in Figure 1 using the adjacent charge state method (z = 31, m/z = 903.7148). The mass error of 809.45 ppm (22.67 Da) is the difference between the sequence-predicted mass and the mass measured for the denatured protein. It is much larger than the 5.30 ppm error obtained for the tryptic peptide in Part III, so it likely reflects the single-peak estimate or a real mass difference in the protein rather than instrument resolution alone.