Week 10 HW: Imaging and Measurement
Protein Characterization: eGFP and KLH
Homework: Final Project
Project Title: A Hydrogel-Embedded Multiple Input-Output (MIMO) Genetic Circuit for IL-6 and Hypoxia Detection
What I Will Measure
My final project centers on engineering a cell-free genetic circuit embedded within a hydrogel matrix that responds to two physiological disease signals, IL-6 (an inflammatory cytokine) and low oxygen tension (hypoxia), and produces two corresponding outputs: sfGFP fluorescence as a reporter signal and a therapeutic peptide as a functional output.
The measurable aspects of this project include:
1. Input Signal Detection
Presence and concentration of IL-6 protein in the local microenvironment and dissolved oxygen levels indicating hypoxic conditions.
2. Circuit Output Characterization
Expression and fluorescence intensity of sfGFP (superfolder GFP) as a quantifiable reporter for circuit activation, and identity, mass, and sequence confirmation of the therapeutic peptide output.
3. System Integration Metrics
Encapsulation efficiency and retained activity of the cell-free transcription-translation (Tx/Tl) machinery within the hydrogel bioink matrix, and temporal response dynamics of the circuit to input signals.
Technologies I Will Use
Liquid Chromatography Mass Spectrometry (LC-MS)
Used for intact protein molecular weight determination of both sfGFP and the therapeutic peptide output. The protein or peptide is separated by reverse-phase HPLC and detected by a quadrupole time-of-flight (QToF) mass spectrometer. The resulting m/z spectra are deconvoluted to yield the neutral molecular weight, which is compared against the theoretical value predicted from the DNA sequence to confirm correct translation. Because the reverse-phase separation denatures the protein, this measurement reports on mass only and not on folding.
Tryptic Digest Peptide Mapping
The sfGFP output protein is digested with trypsin, which cleaves after lysine (K) and arginine (R) residues. The resulting peptides are analyzed by LC-MS/MS to confirm the primary amino acid sequence and assess sequence coverage. This confirms that the cell-free expression system is producing the correct protein from the codon-optimized sfGFP construct shown in the plasmid map.
Fluorescence Spectroscopy
sfGFP fluorescence (excitation ~485 nm, emission ~510 nm) is measured as the primary readout of circuit activation. Fluorescence intensity is quantified relative to input signal concentration to generate a dose-response curve relating IL-6 or O₂ levels to circuit output.
Native Mass Spectrometry
The sfGFP output is analyzed under non-denaturing conditions to check for correct folding. Since sfGFP fluorescence requires proper beta-barrel folding and chromophore formation, a narrow, low charge state distribution in the native spectrum provides supporting evidence that the circuit is producing compactly folded protein rather than misfolded aggregates. Fluorescence itself remains the direct test of chromophore maturation.
Western Blot and ELISA
Western blot provides semi-quantitative detection, and ELISA quantitative detection, of IL-6 input signal concentration in test samples and of therapeutic peptide production levels. ELISA provides high sensitivity for IL-6 across the picogram to nanogram per milliliter range relevant to inflammatory disease contexts.
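Rheology
The mechanical properties of the hydrogel encapsulation matrix are characterized by oscillatory shear rheology to confirm appropriate stiffness (storage and loss moduli). Stiffness also gives an indirect indication of how open the network is to diffusion of input signals into the hydrogel interior. Porosity and compatibility with the cell-free machinery are not measured directly by rheology and need to be confirmed separately, for example through the fluorescence readout of encapsulated reactions.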
System Architecture
The MIMO genetic circuit operates as follows:
The DNA template encodes a T7 promoter-driven sfGFP construct (codon optimized for cell-free expression) alongside regulatory elements controlled by IL-6-sensing and hypoxia-sensing domains. The hydrogel bioink serves as the encapsulation matrix, protecting the cell-free machinery while allowing the inputs (the IL-6 protein and dissolved oxygen) and the outputs to diffuse across its porous network.
Part I: Molecular Weight of Intact eGFP
Question 1: Calculated Molecular Weight from Sequence
The eGFP sequence used for analysis (including the His-purification tag HHHHHH and the LE linker) is 247 amino acids in total:
Using the ExPASy Compute pI/MW tool, the theoretical average molecular weight of this sequence is:
Theoretical MW ≈ 27,728 Da
Question 2: Molecular Weight from Adjacent Charge State Approach
Step 1: Select two adjacent charge state peaks
From Figure 1, two clearly resolved adjacent charge state peaks were selected:
| Peak | m/z Value |
|---|---|
| Peak A (charge z) | 903.7148 |
| Peak B (charge z+1) | 875.4421 |
Step 2: Determine charge z for each peak
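Using the adjacent charge state formula, where Peak A carries charge z and Peak B carries charge z + 1:
z = ((m/z)B − 1.00728) / ((m/z)A − (m/z)B) = (875.4421 − 1.00728) / (903.7148 − 875.4421) = 874.4348 / 28.2727 ≈ 30.93, which rounds to z = 31
Therefore: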
- Peak A at m/z = 903.7148 has charge z = 31
- Peak B at m/z = 875.4421 has charge z = 32
Step 3: Calculate Molecular Weight
Using the relationship: MW = z × (m/z) − z × 1.00728
From Peak A (z = 31): MW = 31 × (903.7148 − 1.00728) = 27,983.93 Da
From Peak B (z = 32): MW = 32 × (875.4421 − 1.00728) = 27,981.91 Da
Average MW from adjacent charge states: 27,982.92 Da ≈ 27.98 kDa
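The adjacent charge state calculation above can be sketched in a few lines of Python. This is a minimal illustration using the two m/z values from the table; the function names are placeholders chosen here, not part of any instrument software.

```python
PROTON = 1.00728  # proton mass in Da, matching the value used in the text

def charge_from_adjacent(mz_high, mz_low):
    """Charge of the higher-m/z peak, given its neighbour (charge z + 1) at lower m/z."""
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral mass from m/z and charge: M = z * (m/z - proton)."""
    return z * (mz - PROTON)

peak_a, peak_b = 903.7148, 875.4421   # Peak A (z) and Peak B (z + 1)

z_a = charge_from_adjacent(peak_a, peak_b)
z_b = z_a + 1
mass_a = neutral_mass(peak_a, z_a)
mass_b = neutral_mass(peak_b, z_b)
average_mass = (mass_a + mass_b) / 2

print(z_a, z_b, round(mass_a, 2), round(mass_b, 2), round(average_mass, 2))
```

Averaging over more than two charge states would normally tighten the estimate; two peaks are used here only because those are the values reported.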
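Step 4: Mass Accuracy
Mass error = (observed − theoretical) / theoretical × 10^6 = (27,982.92 − 27,728) / 27,728 × 10^6 ≈ 9,194 ppm (a difference of about 255 Da)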
Question 3: Charge State of the Zoomed-In Peak
The charge state of the zoomed-in peak can be observed from the inset in Figure 1, which shows the peak at approximately m/z = 1473.74.
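Using the protein MW of ~27,982 Da:
z = MW / (m/z − 1.00728) = 27,982 / (1473.74 − 1.00728) ≈ 19.0
The charge state is z = 19.
This is confirmed from the zoomed inset, where the spacing between adjacent isotope peaks is approximately 0.053 m/z units (1/z = 1/19 ≈ 0.053). At 30,000 resolution the peak width at this m/z is about 1473.74 / 30,000 ≈ 0.049 m/z units, slightly narrower than the isotope spacing, so the isotope peaks are just resolvable, confirming z = 19.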
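As a rough check of the reasoning above, the following sketch estimates the charge from the known mass and compares the expected isotope spacing with the approximate peak width at the quoted resolution. It assumes the common definition of resolution as m/z divided by peak width at half maximum, and it uses the text's approximation that isotope peaks are about 1 Da apart.

```python
PROTON = 1.00728  # proton mass in Da, matching the value used in the text

def charge_from_mass(neutral_mass, mz):
    """Estimate charge from a known neutral mass: z = M / (m/z - proton)."""
    return neutral_mass / (mz - PROTON)

neutral_mass = 27982.0    # approximate MW from the adjacent charge state result
mz = 1473.74              # zoomed-in peak position quoted in the text
resolving_power = 30000   # resolution quoted in the text

z = round(charge_from_mass(neutral_mass, mz))
isotope_spacing = 1.0 / z            # isotopes ~1 Da apart, so spacing is ~1/z in m/z
peak_width = mz / resolving_power    # approximate full width at half maximum

print(z, round(isotope_spacing, 4), round(peak_width, 4), isotope_spacing > peak_width)
```

The spacing exceeds the peak width only by a small margin, which is why the isotope peaks are described as just resolvable rather than cleanly separated.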
Part II: Secondary and Tertiary Structure
Question 1: Native vs Denatured Protein Conformations
When a protein unfolds (denatures), it loses its compact three-dimensional structure. In its native state, eGFP maintains a specific folded beta-barrel conformation stabilized by non-covalent interactions including hydrogen bonds, hydrophobic interactions, van der Waals forces, and electrostatic interactions. Upon denaturation, these interactions are disrupted and the polypeptide chain unfolds into a more extended, disordered conformation.
In electrospray ionization mass spectrometry (ESI-MS), the charge state distribution directly reflects the protein’s conformation:
Denatured protein (top spectrum, Figure 2): The extended, unfolded chain exposes many basic residues (lysines, arginines, histidines) to solvent, allowing more protons to be added during ionization. This results in a high charge state distribution with many overlapping peaks at low m/z values (600 to 1300 range) and a broad, multimodal envelope.
Native protein (bottom spectrum, Figure 2): The compact folded structure shields many basic residues within the protein interior, limiting the number of protons that can be added. This results in a low charge state distribution with fewer, sharper peaks at high m/z values (~2300 to 2800 range), indicating a tightly folded, compact conformation.
The mass spectrometer distinguishes these states by the position and width of the charge state envelope. A shift to lower m/z with more overlapping peaks indicates denaturation, while a shift to higher m/z with fewer resolved peaks indicates a native compact fold.
Question 2: Charge State at ~2800 m/z in Native Spectrum
Using the protein MW of ~27,982 Da:
z = MW / (m/z − 1.00728) = 27,982 / (2800 − 1.00728) ≈ 10.0
The charge state of the peak at ~2800 m/z is z = 10.
The adjacent peak cluster at ~2545 m/z corresponds to z = 11:
z = 27,982 / (2545 − 1.00728) ≈ 11.0
From the zoomed inset, the spacing between adjacent isotope peaks is approximately 0.09 m/z units, corresponding to z = 1/0.09 ≈ 11, consistent with the calculated value. This low charge state (z = 10 to 11) is characteristic of a compactly folded native protein, in sharp contrast to the charge states of z = 30+ observed in the denatured spectrum.
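To see how the same protein mass lands in such different m/z regions, the relationship can be run in the forward direction: given a mass and a charge, predict the m/z. The sketch below assumes the average mass of 27,982.92 Da from Part I and simply compares the native charge states (10 and 11) with the denatured ones (31 and 32).

```python
PROTON = 1.00728  # proton mass in Da, matching the value used in the text

def mz_for_charge(neutral_mass, z):
    """Predicted m/z of the [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON) / z

neutral_mass = 27982.92  # average MW from the adjacent charge state calculation

native = {z: round(mz_for_charge(neutral_mass, z), 2) for z in (10, 11)}
denatured = {z: round(mz_for_charge(neutral_mass, z), 2) for z in (31, 32)}

print("native:", native)
print("denatured:", denatured)
```

The predicted native peaks fall near 2800 and 2545 m/z, and the denatured ones near 900 m/z, in line with the positions described for Figure 2.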
Part III: Peptide Mapping — Primary Structure
Question 1: Lysines and Arginines in eGFP
Lysine (K) residues — 20 total at positions: 4, 27, 42, 46, 53, 80, 86, 102, 108, 114, 127, 132, 141, 157, 159, 163, 167, 210, 215, 239
Arginine (R) residues — 6 total at positions: 74, 97, 110, 123, 169, 216
Sequence with cleavage sites marked (| = trypsin cleavage after K or R):
Total trypsin cleavage sites: 26 (20 K + 6 R)
Question 2: Number of Tryptic Peptides
Using the ExPASy PeptideMass tool with trypsin digest conditions and no missed cleavages:
27 theoretical tryptic peptides are expected from the eGFP His-tag sequence. With 26 cleavage sites and no missed cleavages, 27 peptide fragments are expected, including the C-terminal LEHHHHHH fragment which has no internal K or R residues.
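The "cleavage sites plus one" counting can be illustrated with a small in-silico digest. This sketch applies only the simple rule stated above (cleave after every K or R, no missed cleavages, no proline exception) and runs on a short made-up sequence, not the eGFP sequence; the function name is hypothetical and this is not how PeptideMass is implemented.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal remainder with no K or R, e.g. a His-tag tail
        peptides.append(sequence[start:])
    return peptides

toy_sequence = "ACDKEFGRHILKRLEHHHHHH"  # hypothetical sequence for illustration only
peptides = tryptic_digest(toy_sequence)
cleavage_sites = sum(toy_sequence.count(r) for r in "KR")

print(peptides)
print(cleavage_sites, len(peptides))
```

In the toy sequence, the adjacent K and R produce a single-residue peptide "R", the same situation that arises at K215 and R216 in the positions listed for eGFP.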
Question 3: Chromatographic Peaks in the TIC
From Figure 5a (Total Ion Chromatogram), approximately [N] chromatographic peaks are observed with greater than 10% relative abundance between 0.5 and 6 minutes.
Question 4: Do Peak Numbers Match Predicted Peptides?
The number of chromatographic peaks observed in the TIC does not exactly match the 27 predicted tryptic peptides. There are typically fewer peaks in the chromatogram than predicted peptides for the following reasons:
- Some peptides co-elute at the same retention time and appear as a single peak
- Very small or highly hydrophilic peptides are not retained on the reverse-phase column and elute in the void volume at or before 0.5 minutes
- Some peptides fall below the detection limit of the instrument
- His-tag peptides such as LEHHHHHH may not retain well under standard reverse-phase gradient conditions
Question 5: m/z, Charge, and Mass of the Peptide at 2.78 min
Most abundant m/z peak from Figure 5b: 525.76712
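Determining charge state from isotope spacing: adjacent isotope peaks are separated by 1/z m/z units, so a spacing of about 0.5 corresponds to z = 2.
The peptide is doubly charged (z = 2).
Calculating the singly charged mass [M+H]+:
[M+H]+ = z × (m/z) − (z − 1) × 1.00728 = 2 × 525.76712 − 1.00728 = 1050.52696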
This is confirmed by the peak at 1050.52438 observed in Figure 5b.
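The conversion from the doubly charged ion to its singly protonated equivalent can be written as a small helper. This is a sketch using the two m/z values reported from Figure 5b; the function name is a placeholder.

```python
PROTON = 1.00728  # proton mass in Da, matching the value used in the text

def to_singly_protonated(mz, z):
    """Convert an observed m/z at charge z to the equivalent [M+H]+ value."""
    return z * mz - (z - 1) * PROTON

mh_from_doubly_charged = to_singly_protonated(525.76712, 2)
mh_observed = 1050.52438  # singly charged peak reported in Figure 5b
difference = mh_from_doubly_charged - mh_observed

print(round(mh_from_doubly_charged, 5), round(difference, 5))
```

The two routes to [M+H]+ agree to within a few thousandths of a Da, which supports the z = 2 assignment.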
Question 6: Peptide Identification and Mass Accuracy
Peptide Identification:
Searching the ExPASy PeptideMass results for a tryptic eGFP peptide with theoretical [M+H]+ ≈ 1050.52 Da identifies the peptide as FEGDTLVNR (residues 115 to 123 of eGFP).
Theoretical monoisotopic masses of each residue:
| Residue | Mass (Da) |
|---|---|
| F (Phe) | 147.0684 |
| E (Glu) | 129.0426 |
| G (Gly) | 57.0215 |
| D (Asp) | 115.0269 |
| T (Thr) | 101.0477 |
| L (Leu) | 113.0841 |
| V (Val) | 99.0684 |
| N (Asn) | 114.0429 |
| R (Arg) | 156.1011 |
| H₂O | 18.0106 |
| Total MW | 1049.5142 Da |
| [M+H]+ | 1050.5215 Da |
y-ion series confirmation from Figure 5c:
| Ion | Sequence | Theoretical (Da) | Observed (Da) | Match |
|---|---|---|---|---|
| y3 | VNR | 388.2303 | 388.21957 | YES |
| y4 | LVNR | 501.3144 | 501.30846 | YES |
| y5 | TLVNR | 602.3621 | 602.34777 | YES |
| y7 | GDTLVNR | 774.4105 | 774.41334 | YES |
| y8 | EGDTLVNR | 903.4531 | 903.44365 | YES |
| [M+H]+ | FEGDTLVNR | 1050.5215 | 1050.52438 | YES |
The y-ion series is fully consistent with the sequence FEGDTLVNR.
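The theoretical values in the two tables above can be reproduced from the residue masses. The sketch below uses the monoisotopic residue masses exactly as listed in this write-up and treats each singly charged y-ion as the sum of its residues plus water plus one proton; the observed values are copied from the y-ion table.

```python
# Monoisotopic residue masses (Da) as listed in the table above
RESIDUE_MASS = {
    "F": 147.0684, "E": 129.0426, "G": 57.0215, "D": 115.0269, "T": 101.0477,
    "L": 113.0841, "V": 99.0684, "N": 114.0429, "R": 156.1011,
}
WATER = 18.0106
PROTON = 1.00728

def y_ion(sequence, n):
    """Singly charged y_n ion: the last n residues plus water plus a proton."""
    return sum(RESIDUE_MASS[r] for r in sequence[-n:]) + WATER + PROTON

def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6

peptide = "FEGDTLVNR"
observed = {3: 388.21957, 4: 501.30846, 5: 602.34777, 7: 774.41334, 8: 903.44365}

errors = {}
for n, obs in observed.items():
    theo = y_ion(peptide, n)
    errors[n] = ppm_error(obs, theo)
    print(f"y{n} {peptide[-n:]:>9} theoretical={theo:.4f} observed={obs:.5f} error={errors[n]:+.1f} ppm")

# The full-length "y9" is the intact peptide, i.e. the [M+H]+ value
print(f"[M+H]+ theoretical={y_ion(peptide, len(peptide)):.4f}")
```

With these numbers the fragment ions agree to within a few tens of ppm, a looser match than the precursor, which is common for lower-intensity fragment peaks.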
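Mass Accuracy:
Mass error = (observed − theoretical) / theoretical × 10^6 = (1050.52438 − 1050.5215) / 1050.5215 × 10^6 ≈ 2.7 ppm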
This excellent mass accuracy of less than 3 ppm is consistent with the high-resolution time-of-flight mass analyzer of the Waters BioAccord system used for the analysis.
Question 7: Sequence Coverage
From Figure 6, the BioAccord LC-MS peptide mapping identified peptides covering 88% of the eGFP amino acid sequence.
The highlighted regions in Figure 6 show the portions of the sequence confirmed by peptide identification based on calculated mass and fragmentation pattern. The small uncovered gaps represent regions not confirmed, which may correspond to very small peptides below the detection limit, highly hydrophilic peptides that did not retain on the column, or peptides outside the instrument detection range. An 88% sequence coverage is excellent for a routine peptide mapping experiment and strongly confirms the identity of the protein as eGFP.
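Sequence coverage is the fraction of residues that fall inside at least one identified peptide. The sketch below shows the bookkeeping on a handful of illustrative residue ranges; these ranges are assumptions chosen for the example and are not the peptides identified in Figure 6.

```python
def sequence_coverage(protein_length, peptide_ranges):
    """Percent of residues covered by at least one peptide (1-based, inclusive ranges)."""
    covered = set()
    for start, end in peptide_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length

protein_length = 247  # eGFP with the LE linker and His-tag, as stated in Part I

# Illustrative ranges only; (115, 127) overlaps (115, 123) as a missed-cleavage peptide would
example_ranges = [(1, 4), (5, 27), (115, 123), (115, 127)]

coverage = sequence_coverage(protein_length, example_ranges)
print(f"{coverage:.1f}% of {protein_length} residues covered")
```

Overlapping peptides are counted once per residue, so missed-cleavage peptides extend coverage only where they add new residues. For reference, 88% of 247 residues corresponds to roughly 217 residues.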
Bonus: Peptide Sequence from Fragmentation Spectrum
Based on the y-ion series analysis in Question 6, the peptide eluting at 2.78 minutes with [M+H]+ = 1050.52 Da is confirmed as FEGDTLVNR.
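The y-ions observed in Figure 5c (y3, y4, y5, y7, and y8) account for the C-terminal sequence EGDTLVNR, and the remaining difference of about 147.1 Da between y8 and the [M+H]+ ion corresponds to the N-terminal phenylalanine, giving the full sequence FEGDTLVNR. Because y6 is not among the listed ions, the order of G and D rests on the match to the predicted tryptic peptide rather than on the fragment ladder alone. This peptide maps to residues 115 to 123 of the eGFP sequence, flanked by the trypsin cleavage sites after K114 and R123.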
Does the peptide map data make sense?
Yes. The peptide FEGDTLVNR is a predicted tryptic fragment of eGFP and its mass and fragmentation pattern are fully consistent with the expected sequence. The 88% amino acid coverage shown in Figure 6 further confirms that the protein analyzed is eGFP. The identified peptides span the full length of the protein including the N-terminal region, the GFP barrel domain, and the C-terminal His-tag region, providing high confidence that the correct protein was expressed and purified successfully.
Part IV: KLH Oligomeric States
Using the known subunit masses from Table 1 (7FU = 340 kDa, 8FU = 400 kDa), the expected masses of the KLH oligomeric species are:
| Oligomeric Species | Number of Subunits | Subunit Mass | Total Mass |
|---|---|---|---|
| 7FU Decamer | 10 × 7FU | 340 kDa | 3,400 kDa (3.4 MDa) |
| 8FU Didecamer | 20 × 8FU | 400 kDa | 8,000 kDa (8.0 MDa) |
| 8FU 3-Decamer | 30 × 8FU | 400 kDa | 12,000 kDa (12.0 MDa) |
| 8FU 4-Decamer | 40 × 8FU | 400 kDa | 16,000 kDa (16.0 MDa) |
From Figure 7 (CDMS spectrum), these four species appear as distinct peaks at approximately 3.4, 8.0, 12.0, and 16.0 MDa on the mass axis. CDMS measures both the m/z and the charge of individual ions, so the mass of each particle is obtained directly without charge state deconvolution. This makes it well suited to these large, heterogeneous macromolecular assemblies, which would produce unresolvable overlapping charge state distributions on conventional ESI-MS instruments. Each distinct peak in the CDMS spectrum corresponds to one of the oligomeric states listed above.
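The expected masses in the table follow from multiplying the subunit mass by the number of subunits. A minimal sketch, using only the subunit masses and stoichiometries given above:

```python
SUBUNIT_MASS_KDA = {"7FU": 340, "8FU": 400}  # subunit masses from Table 1

# (species name, subunit type, number of subunits), as listed in the table above
species = [
    ("7FU decamer", "7FU", 10),
    ("8FU didecamer", "8FU", 20),
    ("8FU 3-decamer", "8FU", 30),
    ("8FU 4-decamer", "8FU", 40),
]

expected_mda = {
    name: SUBUNIT_MASS_KDA[subunit] * count / 1000  # kDa to MDa
    for name, subunit, count in species
}

for name, mass in expected_mda.items():
    print(f"{name}: {mass:.1f} MDa")
```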
Part V: Did I Make GFP?
| Property | Theoretical | Observed (Intact LC-MS) | PPM Mass Error |
|---|---|---|---|
| Molecular Weight (kDa) | 27.728 kDa | 27.983 kDa | ~9,190 ppm |
PPM Error Calculation:
PPM error = (observed − theoretical) / theoretical × 10^6 = (27,982.92 − 27,728) / 27,728 × 10^6 ≈ 9,194 ppm
This error, about 255 Da or 0.9%, is far larger than the sub-3 ppm error observed in peptide mapping. Part of the gap reflects the difference between average mass measurements (used for intact proteins, where the isotope envelope is broad) and monoisotopic mass measurements (used for small peptides, where isotope peaks are resolved). However, intact protein measurements on a high-resolution time-of-flight instrument are typically accurate to within tens of ppm, so a 255 Da difference is unlikely to be instrument error alone. It more plausibly points to a mismatch between the sequence entered into the MW tool and the construct actually measured, or to a modification not included in the calculation, and the theoretical value should be rechecked.
The observed MW of 27,983 Da, combined with 88% sequence coverage from peptide mapping and confirmed tryptic peptide masses from LC-MS/MS, provides strong evidence that the protein analyzed is the expected eGFP His-tag standard. The charge state distribution and peptide map are consistent with the predicted properties of eGFP, and the intact mass agrees with the theoretical value to within about 1%, supporting expression and purification of the correct protein once the remaining mass difference is accounted for.
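The two mass errors discussed in this write-up can be compared side by side with a single helper. This sketch uses only the observed and theoretical values reported above.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

intact_ppm = ppm_error(27982.92, 27728.0)        # intact eGFP, average mass
peptide_ppm = ppm_error(1050.52438, 1050.5215)   # FEGDTLVNR [M+H]+, monoisotopic
intact_delta_da = 27982.92 - 27728.0

print(f"intact: {intact_ppm:.0f} ppm ({intact_delta_da:.2f} Da)")
print(f"peptide: {peptide_ppm:.1f} ppm")
```

Expressing the intact error in Da as well as ppm makes it easier to check whether the gap matches the mass of specific residues or a known modification.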