week-10-hw-imaging-and-measurement

Protein Characterization: eGFP and KLH


Homework: Final Project

Project Title: A Hydrogel-Embedded Multiple Input-Output (MIMO) Genetic Circuit for IL-6 and Hypoxia Detection

What I Will Measure

My final project centers on engineering a cell-free genetic circuit embedded within a hydrogel matrix that responds to two physiological disease signals, IL-6 (an inflammatory cytokine) and low oxygen tension (hypoxia), and produces two corresponding outputs: sfGFP fluorescence as a reporter signal and a therapeutic peptide as a functional output.

The measurable aspects of this project include:

1. Input Signal Detection

Presence and concentration of IL-6 protein in the local microenvironment and dissolved oxygen levels indicating hypoxic conditions.

2. Circuit Output Characterization

Expression and fluorescence intensity of sfGFP (superfolder GFP) as a quantifiable reporter for circuit activation, and identity, mass, and sequence confirmation of the therapeutic peptide output.

3. System Integration Metrics

Encapsulation efficiency and retained activity of the cell-free Tx/Tl machinery within the hydrogel bioink matrix, and temporal response dynamics of the circuit to input signals.

Technologies I Will Use

Liquid Chromatography Mass Spectrometry (LC-MS): Used for intact molecular weight determination of both sfGFP and the therapeutic peptide output. The protein or peptide is separated by reverse-phase HPLC and detected by a quadrupole time-of-flight (QToF) mass spectrometer. The resulting m/z spectra are deconvoluted to yield the neutral molecular weight, which is compared against the theoretical value predicted from the DNA sequence to confirm correct translation and to reveal any mass-shifting modifications. Because the reverse-phase separation is denaturing, this measurement does not report on folding.

Tryptic Digest Peptide Mapping: The sfGFP output protein is digested with trypsin, which cleaves on the C-terminal side of lysine (K) and arginine (R) residues, except when the next residue is proline. The resulting peptides are analyzed by LC-MS/MS to confirm the primary amino acid sequence and assess sequence coverage. This confirms that the cell-free expression system is producing the correct protein from the codon-optimized sfGFP construct shown in the plasmid map.

Fluorescence Spectroscopy: sfGFP fluorescence (excitation ~485 nm, emission ~510 nm) is measured as the primary readout of circuit activation. Fluorescence intensity is quantified relative to input signal concentration to generate a dose-response curve relating IL-6 or O2 levels to circuit output.
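One common way to summarize a dose-response curve of this kind is a Hill function. The sketch below is illustrative only: the function name `hill_response` and every parameter value (baseline, maximum, half-maximal concentration, Hill coefficient) are hypothetical placeholders, not measurements or design values from this project.

```python
def hill_response(conc, f_min, f_max, k_half, n):
    """Predicted fluorescence for an activating input at concentration `conc`.

    f_min and f_max are the baseline and saturated signals, k_half is the
    concentration giving a half-maximal response, and n is the Hill coefficient.
    """
    if conc <= 0:
        return f_min
    ratio = (conc / k_half) ** n
    return f_min + (f_max - f_min) * ratio / (1.0 + ratio)


# Hypothetical parameters, chosen only to exercise the function.
params = dict(f_min=100.0, f_max=5000.0, k_half=1.0, n=2.0)
curve = [(c, hill_response(c, **params)) for c in (0.0, 0.1, 1.0, 10.0)]

if __name__ == "__main__":
    for conc, signal in curve:
        print(conc, round(signal, 1))
```

This form rises with the input, so it would suit the IL-6 arm; the hypoxia arm, where output should rise as O2 falls, would need a repressing variant of the same function.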

Native Mass Spectrometry: The sfGFP output is analyzed under non-denaturing conditions to confirm correct folding and fluorophore maturation. Since sfGFP fluorescence requires proper beta-barrel folding and chromophore formation, native MS provides structural confirmation that the circuit is producing properly folded, functional protein rather than misfolded aggregates.

Western Blot and ELISA: Used to measure IL-6 input concentration in test samples and to confirm therapeutic peptide production levels. ELISA is quantitative and sensitive enough to detect IL-6 across the picogram-to-nanogram per milliliter range relevant to inflammatory disease contexts, while Western blot provides a semi-quantitative check.

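Rheology: The mechanical properties of the hydrogel encapsulation matrix are characterized by oscillatory shear rheology, which reports stiffness as the storage and loss moduli (G′ and G″). Stiffness is an indirect guide to network mesh size, and therefore to how readily input signals diffuse into the hydrogel interior. Porosity and compatibility with the cell-free machinery still need to be confirmed separately, for example by measuring expression activity inside the gel.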

System Architecture

The MIMO genetic circuit operates as follows:

INPUT 1: IL-6 Signal ──────────┐
                               ▼
                    CELL-FREE GENETIC CIRCUIT
                    (Tx/Tl Machinery + DNA Template)
                    Encapsulated in Hydrogel/Bioink
                               │
INPUT 2: Low O₂ (Hypoxia) ────┘
                               │
                    ┌──────────┴──────────┐
                    ▼                     ▼
             sfGFP Fluorescence    Therapeutic Peptide
             (Reporter Output)     (Functional Output)

The DNA template encodes a T7 promoter-driven sfGFP construct (codon optimized for cell-free expression) alongside regulatory elements responsive to IL-6 and hypoxia sensing domains. The hydrogel bioink serves as the encapsulation matrix, protecting the cell-free machinery while allowing the inputs (dissolved oxygen and IL-6) and the outputs to diffuse across its porous network. Because IL-6 is a protein rather than a small molecule, the mesh must be open enough to admit it.


Part I: Molecular Weight of Intact eGFP

Question 1: Calculated Molecular Weight from Sequence

The eGFP sequence used for analysis (including the His-purification tag HHHHHH and the LE linker) is 247 amino acids in total:

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL
VTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV
NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD
HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLE
HHHHHH

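Using the ExPASy Compute pI/MW tool, which sums the average residue masses of the 247 residues and adds one water, the theoretical average molecular weight of this sequence is:

Theoretical MW ≈ 28,006.6 Da

This is the mass of the unmodified polypeptide. Maturation of the eGFP chromophore removes one water and two hydrogens (about 20 Da), so the mature, fluorescent protein is expected at ≈ 27,986.6 Da.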
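The theoretical mass can be cross-checked without the web tool by summing average residue masses and adding one water. The sketch below assumes the standard average residue masses (typed in here, not taken from the homework handout); the names `AVG_RESIDUE_MASS` and `average_mass` are illustrative. For the sequence as printed above it comes out near 28,006.6 Da.

```python
# Average residue masses in Da (standard values; an assumption of this sketch).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVG = 18.0153  # average mass of H2O in Da

# eGFP with LE linker and His6 tag, copied from the sequence listed above.
EGFP_HIS = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL"
    "VTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLE"
    "HHHHHH"
)


def average_mass(sequence):
    """Average mass of an unmodified linear polypeptide in Da."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER_AVG


if __name__ == "__main__":
    print(len(EGFP_HIS), round(average_mass(EGFP_HIS), 1))
```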

Question 2: Molecular Weight from Adjacent Charge State Approach

Step 1: Select two adjacent charge state peaks

From Figure 1, two clearly resolved adjacent charge state peaks were selected:

Peak                    m/z Value
Peak A (charge z)       903.7148
Peak B (charge z+1)     875.4421

Step 2: Determine charge z for each peak

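Using the adjacent charge state formula, where M1 is the m/z of Peak A (charge z), M2 is the m/z of Peak B (charge z+1), and 1.00728 Da is the mass of a proton: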

z = (M2 - 1.00728) / (M1 - M2)
z = (875.4421 - 1.00728) / (903.7148 - 875.4421)
z = 874.4348 / 28.2727
z = 30.93 ≈ 31

Therefore:

  • Peak A at m/z = 903.7148 has charge z = 31
  • Peak B at m/z = 875.4421 has charge z = 32
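The formula follows from writing both peaks in terms of the same neutral mass M: M1 = (M + z × 1.00728) / z and M2 = (M + (z+1) × 1.00728) / (z+1). Eliminating M gives z = (M2 − 1.00728) / (M1 − M2). A minimal sketch of that step, with the hypothetical helper name `charge_from_adjacent`:

```python
PROTON = 1.00728  # proton mass in Da


def charge_from_adjacent(mz_lower_charge, mz_higher_charge):
    """Charge of the higher-m/z peak of two adjacent charge states.

    mz_lower_charge is the peak carrying z protons (higher m/z);
    mz_higher_charge is its neighbour carrying z + 1 protons (lower m/z).
    Returns the raw estimate and the nearest integer charge.
    """
    z_raw = (mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge)
    return z_raw, round(z_raw)


z_raw, z = charge_from_adjacent(903.7148, 875.4421)

if __name__ == "__main__":
    print(round(z_raw, 2), z, z + 1)
```

A raw value far from an integer would suggest that the two peaks are not adjacent charge states of the same species.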

Step 3: Calculate Molecular Weight

Using the relationship: MW = z × (m/z) − z × 1.00728

From Peak A (z = 31):

MW = 31 × 903.7148 − 31 × 1.00728 = 28,015.16 − 31.23 = 27,983.93 Da

From Peak B (z = 32):

MW = 32 × 875.4421 − 32 × 1.00728 = 28,014.15 − 32.23 = 27,981.91 Da

Average MW from adjacent charge states: 27,982.92 Da ≈ 27.98 kDa
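The same arithmetic can be applied to any list of assigned peaks, which also exposes how much the individual estimates disagree. A small sketch (the helper name `neutral_mass` is illustrative):

```python
PROTON = 1.00728  # proton mass in Da


def neutral_mass(mz, z):
    """Neutral mass from an m/z value and its assigned charge."""
    return z * (mz - PROTON)


# (m/z, charge) pairs for Peak A and Peak B
peaks = [(903.7148, 31), (875.4421, 32)]
masses = [neutral_mass(mz, z) for mz, z in peaks]
mean_mass = sum(masses) / len(masses)
spread = max(masses) - min(masses)

if __name__ == "__main__":
    print([round(m, 2) for m in masses], round(mean_mass, 2), round(spread, 2))
```

The two estimates differ by about 2 Da, which gives a rough sense of the uncertainty of a mass read from only two hand-picked peaks.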

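Step 4: Mass Accuracy

Against the theoretical average mass of the unmodified sequence (≈ 28,006.6 Da, the sum of the average residue masses of the 247 residues plus one water):

Accuracy (%) = |Observed - Theoretical| / Theoretical × 100
             = |27,982.92 - 28,006.60| / 28,006.60 × 100
             ≈ 0.085%

About 20 Da of the 23.7 Da difference is expected, because chromophore maturation removes one water and two hydrogens. Against the mature mass of ≈ 27,986.6 Da, the residual is about 3.7 Da (≈ 0.013%).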

Question 3: Charge State of the Zoomed-In Peak

The charge state of the zoomed-in peak can be observed from the inset in Figure 1, which shows the peak at approximately m/z = 1473.74.

Using the protein MW of ~27,982 Da:

z = MW / ((m/z) - 1.00728)
z = 27,982 / (1473.74 - 1.00728)
z ≈ 19.0

The charge state is z = 19.

This is confirmed from the zoomed inset, where the spacing between adjacent isotope peaks is approximately 0.053 m/z units (1/z = 1/19 ≈ 0.053). At 30,000 resolution, these isotope peaks are resolvable, confirming z = 19.


Part II: Secondary and Tertiary Structure

Question 1: Native vs Denatured Protein Conformations

When a protein unfolds (denatures), it loses its compact three-dimensional structure. In its native state, eGFP maintains a specific folded beta-barrel conformation stabilized by non-covalent interactions including hydrogen bonds, hydrophobic interactions, van der Waals forces, and electrostatic interactions. Upon denaturation, these interactions are disrupted and the polypeptide chain unfolds into a more extended, disordered conformation.

In electrospray ionization mass spectrometry (ESI-MS), the charge state distribution directly reflects the protein’s conformation:

Denatured protein (top spectrum, Figure 2): The extended, unfolded chain exposes many basic residues (lysines, arginines, histidines) to solvent, allowing more protons to be added during ionization. This results in a high charge state distribution: a broad envelope of many closely spaced peaks at low m/z values (roughly 600 to 1300).

Native protein (bottom spectrum, Figure 2): The compact folded structure shields many basic residues within the protein interior, limiting the number of protons that can be added. This results in a low charge state distribution with fewer, sharper peaks at high m/z values (~2300 to 2800 range), indicating a tightly folded, compact conformation.

The mass spectrometer distinguishes these states by the position and width of the charge state envelope. A broad envelope of many charge states at lower m/z indicates denaturation, while a narrow envelope of a few charge states at higher m/z indicates a compact native fold.

Question 2: Charge State at ~2800 m/z in Native Spectrum

Using the protein MW of ~27,982 Da:

z = 27,982 / (2799.5 - 1.00728) ≈ 10.0

The charge state of the peak at ~2800 m/z is z = 10.

The adjacent peak cluster at ~2545 m/z corresponds to z = 11:

z = 27,982 / (2545.0 - 1.00728) ≈ 11.0

From the zoomed inset, the spacing between adjacent isotope peaks is approximately 0.09 m/z units, corresponding to z = 1/0.09 ≈ 11, consistent with the calculated value. This low charge state (z = 10 to 11) is characteristic of a compactly folded native protein, in sharp contrast to the charge states of z = 30+ observed in the denatured spectrum.
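The charge assignments here, and the one for the zoomed-in peak in Part I, all use the same relation between a known neutral mass and an observed m/z. A compact sketch (function names are illustrative; the 1/z isotope spacing treats neighbouring isotope peaks as exactly 1 Da apart, which is an approximation):

```python
PROTON = 1.00728  # proton mass in Da


def charge_from_mass(neutral_mass, mz):
    """Charge implied by a known neutral mass and an observed m/z."""
    return neutral_mass / (mz - PROTON)


def isotope_spacing(z):
    """Approximate m/z gap between adjacent isotope peaks at charge z."""
    return 1.0 / z


MW = 27982.0  # intact mass from the adjacent charge state calculation
assignments = {mz: round(charge_from_mass(MW, mz)) for mz in (1473.74, 2545.0, 2799.5)}

if __name__ == "__main__":
    for mz, z in assignments.items():
        print(mz, z, round(isotope_spacing(z), 3))
```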


Part III: Peptide Mapping — Primary Structure

Question 1: Lysines and Arginines in eGFP

Lysine (K) residues — 20 total at positions: 4, 27, 42, 46, 53, 80, 86, 102, 108, 114, 127, 132, 141, 157, 159, 163, 167, 210, 215, 239

Arginine (R) residues — 6 total at positions: 74, 97, 110, 123, 169, 216

Sequence with cleavage sites marked (| = trypsin cleavage after K or R):

MVSK|GEELFTGVVPILVELDGDVNGHK|FSVSGEGEGDATYGK|LTLK|FICTTGK|LPVPWPTL
VTTLTYGVQCFSR|YPDHMK|QHDFFK|SAMPEGYVQER|TIFFK|DDGNYK|TR|AEVK|
FEGDTLVNR|IELK|GIDFK|EDGNILGHK|LEYNYNSHNVYIMADK|QK|NGIK|VNFK|IR|
HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK|DPNEK|R|DHMVLLEFVTAAGITLGMDELYK|LEHHHHHH

Total trypsin cleavage sites: 26 (20 K + 6 R)

Question 2: Number of Tryptic Peptides

Using the ExPASy PeptideMass tool with trypsin digest conditions and no missed cleavages:

With 26 cleavage sites, none of them followed by proline, and no missed cleavages, the eGFP His-tag sequence yields 26 + 1 = 27 theoretical tryptic peptides. The last of these is the C-terminal LEHHHHHH fragment, which contains no K or R residues.
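The residue positions and the peptide count can be reproduced with a short in-silico digest. This is a sketch rather than a reimplementation of PeptideMass: it applies only the usual rule (cleave after K or R unless the next residue is P) with no missed cleavages, and the function names are hypothetical.

```python
# eGFP with LE linker and His6 tag, copied from the sequence in Part I.
EGFP_HIS = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL"
    "VTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLE"
    "HHHHHH"
)


def positions(sequence, residue):
    """1-based positions of a residue in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence) if aa == residue]


def tryptic_peptides(sequence):
    """Split after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


peptides = tryptic_peptides(EGFP_HIS)

if __name__ == "__main__":
    print(positions(EGFP_HIS, "K"))
    print(positions(EGFP_HIS, "R"))
    print(len(peptides), peptides[-1])
```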

Question 3: Chromatographic Peaks in the TIC

From Figure 5a (Total Ion Chromatogram), approximately [N] chromatographic peaks are observed with greater than 10% relative abundance between 0.5 and 6 minutes.

Question 4: Do Peak Numbers Match Predicted Peptides?

The number of chromatographic peaks observed in the TIC does not exactly match the 27 predicted tryptic peptides. There are typically fewer peaks in the chromatogram than predicted peptides for the following reasons:

  • Some peptides co-elute at the same retention time and appear as a single peak
  • Very small or highly hydrophilic peptides are not retained on the reverse-phase column and elute in the void volume at or before 0.5 minutes
  • Some peptides fall below the detection limit of the instrument
  • His-tag peptides such as LEHHHHHH may not retain well under standard reverse-phase gradient conditions

Question 5: m/z, Charge, and Mass of the Peptide at 2.78 min

Most abundant m/z peak from Figure 5b: 525.76712

Determining charge state from isotope spacing:

Delta(m/z) = 526.25918 - 525.76712 = 0.492 ≈ 0.5
z = 1 / 0.5 = 2

The peptide is doubly charged (z = 2).

Calculating the singly charged mass [M+H]+:

[M+H]+ = (m/z × z) - (z - 1) × 1.00728
        = (525.76712 × 2) - (1 × 1.00728)
        = 1051.534 - 1.007
        = 1050.527 Da

This is confirmed by the peak at 1050.52438 observed in Figure 5b.
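Both steps (charge from isotope spacing, then conversion to the singly protonated mass) can be written as two small functions. The names are illustrative, and the spacing rule assumes neighbouring isotope peaks differ by about 1 Da:

```python
PROTON = 1.00728  # proton mass in Da


def charge_from_isotopes(mz_first, mz_second):
    """Charge from the gap between two adjacent isotope peaks."""
    return round(1.0 / (mz_second - mz_first))


def singly_protonated(mz, z):
    """Convert an [M+zH]z+ m/z value to the [M+H]+ mass."""
    return mz * z - (z - 1) * PROTON


z = charge_from_isotopes(525.76712, 526.25918)
mh = singly_protonated(525.76712, z)

if __name__ == "__main__":
    print(z, round(mh, 3))
```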

Question 6: Peptide Identification and Mass Accuracy

Peptide Identification:

Searching the ExPASy PeptideMass results for a tryptic eGFP peptide with theoretical [M+H]+ ≈ 1050.52 Da identifies the peptide as FEGDTLVNR (residues 115 to 123 of eGFP).

Theoretical monoisotopic masses of each residue:

Residue      Mass (Da)
F (Phe)      147.0684
E (Glu)      129.0426
G (Gly)      57.0215
D (Asp)      115.0269
T (Thr)      101.0477
L (Leu)      113.0841
V (Val)      99.0684
N (Asn)      114.0429
R (Arg)      156.1011
H₂O          18.0106
Total MW     1049.5142 Da
[M+H]+       1050.5215 Da

y-ion series confirmation from Figure 5c:

Ion       Sequence      Theoretical (Da)   Observed (Da)   Match
y3        VNR           388.2303           388.21957       YES
y4        LVNR          501.3144           501.30846       YES
y5        TLVNR         602.3621           602.34777       YES
y7        GDTLVNR       774.4105           774.41334       YES
y8        EGDTLVNR      903.4531           903.44365       YES
[M+H]+    FEGDTLVNR     1050.5215          1050.52438      YES

The y-ion series is fully consistent with the sequence FEGDTLVNR.
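The theoretical values in both tables come from the same sum: a singly charged y-ion is the C-terminal residues plus one water plus one proton, so it has the same form as the [M+H]+ of that C-terminal fragment. The sketch below uses the residue masses listed above; the names `mh_plus` and `y_ions` are illustrative.

```python
# Monoisotopic residue masses in Da, as listed in the table above.
MONO = {
    "F": 147.0684, "E": 129.0426, "G": 57.0215, "D": 115.0269, "T": 101.0477,
    "L": 113.0841, "V": 99.0684, "N": 114.0429, "R": 156.1011,
}
WATER = 18.0106   # monoisotopic H2O
PROTON = 1.00728  # proton mass


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of a linear, unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def y_ions(peptide):
    """Singly charged y-ion masses keyed by ion number (y1 .. y(n-1))."""
    return {n: mh_plus(peptide[-n:]) for n in range(1, len(peptide))}


ladder = y_ions("FEGDTLVNR")

if __name__ == "__main__":
    print(round(mh_plus("FEGDTLVNR"), 4))
    for n, mass in ladder.items():
        print(f"y{n}", "FEGDTLVNR"[-n:], round(mass, 4))
```

The ladder also gives y6 (DTLVNR), which does not appear in the table, so its expected position can be checked against Figure 5c.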

Mass Accuracy:

Error (ppm) = |Observed - Theoretical| / Theoretical × 10^6
            = |1050.52438 - 1050.5215| / 1050.5215 × 10^6
            = 0.00288 / 1050.5215 × 10^6
            ≈ 2.74 ppm

This excellent mass accuracy of less than 3 ppm is consistent with the high-resolution Waters BioAccord QToF mass spectrometer used for the analysis.
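The same ppm formula can be applied to the fragment ions as well as the precursor. A short sketch using the observed and theoretical values from the y-ion table (the function name `ppm_error` is illustrative, and the error is kept signed so the direction of the deviation stays visible):

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


precursor_ppm = ppm_error(1050.52438, 1050.5215)

# (observed, theoretical) pairs from the y-ion table
fragments = {
    "y3": (388.21957, 388.2303),
    "y4": (501.30846, 501.3144),
    "y5": (602.34777, 602.3621),
    "y7": (774.41334, 774.4105),
    "y8": (903.44365, 903.4531),
}
fragment_ppm = {ion: ppm_error(obs, theo) for ion, (obs, theo) in fragments.items()}

if __name__ == "__main__":
    print("precursor", round(precursor_ppm, 2))
    for ion, err in fragment_ppm.items():
        print(ion, round(err, 1))
```

The fragment errors come out larger than the precursor error, which is worth keeping in mind when judging the match column.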

Question 7: Sequence Coverage

From Figure 6, the BioAccord LC-MS peptide mapping identified peptides covering 88% of the eGFP amino acid sequence.

The highlighted regions in Figure 6 show the portions of the sequence confirmed by peptide identification based on calculated mass and fragmentation pattern. The small uncovered gaps represent regions not confirmed, which may correspond to very small peptides below the detection limit, highly hydrophilic peptides that did not retain on the column, or peptides outside the instrument detection range. An 88% sequence coverage is excellent for a routine peptide mapping experiment and strongly confirms the identity of the protein as eGFP.

Bonus: Peptide Sequence from Fragmentation Spectrum

Based on the y-ion series analysis in Question 6, the peptide eluting at 2.78 minutes with [M+H]+ = 1050.52 Da is confirmed as FEGDTLVNR.

The y-ions observed in Figure 5c (y3 to y5, y7, and y8) read out the C-terminal sequence EGDTLVNR, and the gap of about 147.1 Da between y8 and the precursor [M+H]+ corresponds to the N-terminal phenylalanine, giving the full sequence FEGDTLVNR. This peptide maps to residues 115 to 123 of the eGFP sequence, flanked by the trypsin cleavage sites after K114 and R123.

Does the peptide map data make sense?

Yes. The peptide FEGDTLVNR is a predicted tryptic fragment of eGFP and its mass and fragmentation pattern are fully consistent with the expected sequence. The 88% amino acid coverage shown in Figure 6 further confirms that the protein analyzed is eGFP. The identified peptides span the full length of the protein including the N-terminal region, the GFP barrel domain, and the C-terminal His-tag region, providing high confidence that the correct protein was expressed and purified successfully.


Part IV: KLH Oligomeric States

Using the known subunit masses from Table 1 (7FU = 340 kDa, 8FU = 400 kDa), the expected masses of the KLH oligomeric species are:

Oligomeric Species    Number of Subunits    Subunit Mass    Total Mass
7FU Decamer           10 × 7FU              340 kDa         3,400 kDa (3.4 MDa)
8FU Didecamer         20 × 8FU              400 kDa         8,000 kDa (8.0 MDa)
8FU 3-Decamer         30 × 8FU              400 kDa         12,000 kDa (12.0 MDa)
8FU 4-Decamer         40 × 8FU              400 kDa         16,000 kDa (16.0 MDa)

From Figure 7 (CDMS spectrum), these four species appear as distinct peaks at approximately 3.4, 8.0, 12.0, and 16.0 MDa on the mass axis. The CDMS technique enables direct single-particle mass measurement without requiring charge state deconvolution, making it uniquely suited for these large, heterogeneous macromolecular assemblies that would produce unresolvable overlapping spectra on conventional ESI-MS instruments. Each distinct peak in the CDMS spectrum corresponds to one of the oligomeric states listed above.
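The expected masses, and the assignment of an observed CDMS peak to the closest species, can be sketched as follows. The subunit masses are the Table 1 values quoted above; the function names and the example peak position of 8.1 MDa are hypothetical.

```python
# Subunit masses in kDa, as quoted from Table 1.
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (species name, subunit type, number of subunits)
SPECIES = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]


def expected_masses_mda():
    """Expected assembly masses in MDa."""
    return {name: SUBUNIT_KDA[sub] * copies / 1000 for name, sub, copies in SPECIES}


def nearest_species(observed_mda):
    """Species whose expected mass is closest to an observed peak."""
    masses = expected_masses_mda()
    return min(masses, key=lambda name: abs(masses[name] - observed_mda))


if __name__ == "__main__":
    print(expected_masses_mda())
    print(nearest_species(8.1))  # hypothetical peak position
```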


Part V: Did I Make GFP?

Property                  Theoretical    Observed (Intact LC-MS)    PPM Mass Error
Molecular Weight (kDa)    28.007 kDa     27.983 kDa                 ~843 ppm

PPM Error Calculation:

PPM Error = |Observed MW - Theoretical MW| / Theoretical MW × 10^6
          = |27,983 - 28,006.6| / 28,006.6 × 10^6
          ≈ 843 ppm

Most of this difference is expected rather than measurement error. The theoretical value of 28,006.6 Da is the average mass of the unmodified 247-residue polypeptide, while mature eGFP is about 20 Da lighter after chromophore formation. Against the mature mass of ≈ 27,986.6 Da, the error falls to roughly 130 ppm. That is still well above the sub-3 ppm error observed in peptide mapping, because the intact mass here is an average mass estimated from two manually picked charge state peaks, whereas the peptide mass is a monoisotopic mass read from resolved isotope peaks.

The observed MW of 27,983 Da, combined with 88% sequence coverage from peptide mapping and confirmed tryptic peptide masses from LC-MS/MS, provides strong evidence that the protein analyzed is the expected eGFP His-tag standard. The intact mass, charge state distribution, and peptide map are all fully consistent with the predicted properties of eGFP, confirming successful expression and purification of the correct protein.