Week 10 HW

https://2026a.htgaa.org/2026a/course-pages/weeks/week-10/index.html

Homework: Final Project

Identify at least one aspect of your project to measure (protein mass, sequence, biomarker presence/quantity, etc.).

For this homework, let’s say my project is to produce human insulin (INS_HUMAN) by recombinant expression in a cell-based system (E. coli).

The aspect I want to measure is purification/presence of the insulin protein.

Describe all elements you intend to measure and the measurement procedures in detail.

Total protein content
- Bradford assay on cell lysate
Apparent molecular weight and purity
- SDS-PAGE on a Tricine gel system
  - Band Homogeneity: A pure protein will display a single, distinct band after staining.
Identity confirmation
- Western blot with anti-human-insulin antibody
Exact mass and sequence integrity
- LC-MS (ESI-TOF) on the purified product
  - Target intact mass 5807.57 Da (oxidized form, 3 disulfides).
- Tryptic digest followed by LC-MS/MS for full sequence coverage (explained in questions below)
Disulfide bond verification
- Non-reducing peptide mapping by LC-MS - to confirm the three native disulfides
Quantification and purity
- Reversed-phase HPLC against a USP insulin reference standard
Host cell protein and endotoxin
- Anti-E. coli HCP ELISA and LAL assay for endotoxin
  - The HCP ELISA detects and quantifies residual E. coli host cell proteins (HCPs); the LAL assay measures endotoxin.

Specify the technologies you will use (gel electrophoresis, DNA sequencing, mass spectrometry, etc.).

Mass spectrometry
- LC-MS and LC-MS/MS
SDS-PAGE
Reversed-phase HPLC

Homework: Waters Part I — Molecular Weight

Calculate the predicted molecular weight of eGFP based on its amino acid sequence using tools like the ExPASy compute_pi calculator.

Molecular weight (Da):
    26886.32 (average mass)
    26869.36 (monoisotopic mass)

https://web.expasy.org/cgi-bin/compute_pi/pi_tool_bis.cgi?P42212@1-238@average

Claude thinks I have the WT avGFP, not eGFP (wild type GFP, not enhanced GFP)? Not sure how to find this on UniProt.

Calculate the molecular weight using the adjacent charge state approach:

Select two charge states from the intact LC-MS data.

mh = 1.00728
m1=1037.4423   # 45%
m2=1000.4302   # 50%

Determine z for adjacent peaks using the provided formula.

z = (m2 - mh) / (m1 - m2)
  = (1000.4302 - 1.00728) / (1037.4423 - 1000.4302)
  = 27.0026
  ≈ 27 (a charge is a whole number of protons)

So m1 is the [M+27H]²⁷⁺ peak and m2 is the [M+28H]²⁸⁺ peak.

Reasoning/intuition:

You don't know M, and you don't know which divisor goes with which peak.
The trick: adjacent peaks must differ by exactly 1 in the divisor

peak_27 ≈ M / 27
peak_28 ≈ M / 28
peak_29 ≈ M / 29

Determine the protein MW using the relationship between m/z, MW, and z.

m/z = (MW + z * mh) / z
MW = z * ((m/z) - mh)

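MW = z * (m1 - mh), with z rounded to the integer 27
   = 27 * (1037.4423 - 1.00728)
   = 27 * 1036.43502
   = 27983.75 Da

Calculate measurement accuracy by comparing experimental vs. theoretical weight.

error = (m_measured - m_predicted) / (m_predicted)
      = (27983.75 - 26886.32) / (26886.32)
      = 0.040817
      = 4.082%

A gap of about 1097 Da is far too large to be instrument error. It suggests the predicted mass (computed from P42212, residues 1-238) does not correspond to the construct that was actually measured.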
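The adjacent-charge-state calculation above can be sketched in Python. This is a minimal sketch, not the course's reference method: the function names are made up for illustration, and it assumes the two peaks really are neighbouring charge states, so z is rounded to the nearest integer before the mass is computed.

```python
PROTON = 1.00728  # mass of a proton in Da (mh in the notes above)


def charge_from_adjacent_peaks(mz_lower_z, mz_higher_z, proton=PROTON):
    """Charge of the peak at mz_lower_z (the peak at the higher m/z).

    mz_lower_z  = (M + z*proton) / z
    mz_higher_z = (M + (z+1)*proton) / (z+1)
    Solving both for M and equating them gives
        z = (mz_higher_z - proton) / (mz_lower_z - mz_higher_z)
    Returns (rounded integer z, raw value before rounding).
    """
    z_raw = (mz_higher_z - proton) / (mz_lower_z - mz_higher_z)
    return round(z_raw), z_raw


def neutral_mass(mz, z, proton=PROTON):
    """Neutral mass from one peak: MW = z * (m/z - proton)."""
    return z * (mz - proton)


def relative_error(measured, predicted):
    return (measured - predicted) / predicted


m1, m2 = 1037.4423, 1000.4302            # the two peaks picked above
z, z_raw = charge_from_adjacent_peaks(m1, m2)
mw = neutral_mass(m1, z)
err = relative_error(mw, 26886.32)       # predicted average mass from above

print(z, round(z_raw, 4), round(mw, 2), f"{err:.3%}")
```

How far the raw value sits from a whole number (here about 0.003) is a quick sanity check that the two peaks really are adjacent charge states.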

Can you observe the charge state for the zoomed-in peak in the intact eGFP mass spectrum? If yes, what is it? If no, explain why.

Isotopes are variants of an element that differ in the number of neutrons.
Adjacent isotope peaks differ by one extra neutron (mostly ¹³C in place of ¹²C), a mass difference of about 1.003 Da.
Within an isotope cluster, adjacent peaks are therefore separated by 1.003/z on the m/z axis.
We can identify such a cluster in the inset; adjacent peaks are delta_m = 1474.1005 - 1474.0481 = 0.0524 apart.
You can calculate the charge as z = 1.003 / delta_m.
Calculating the charge, we get z2 = 19.14 ≈ 19, so yes, the charge state can be read off the zoomed-in peak.
Apply z2 to compute the MW, and you find the molecular weights from the two calculations agree to within an acceptable tolerance (100 ppm).

# Method 1: adjacent charge states (main spectrum)
z1 = (m2 - mh) / (m1 - m2)
   = (1000.4302 - 1.00728) / (1037.4423 - 1000.4302)
   = 999.42292 / 37.0121
   = 27.00

MW = z1 * (m1 - mh)
   = 27 * (1037.4423 - 1.00728)
   = 27 * 1036.43502
   = 27983.75 Da

# Method 2: isotope spacing (inset)
z2 = 1.003 / delta_m
   = 1.003 / 0.0524
   = 19.14
   ≈ 19

# Cross-check: apply z2 to the inset peak
MW_check = z2 * (m_inset - mh)
         = 19 * (1473.74 - 1.00728)
         = 19 * 1472.73272
         = 27981.92 Da

# Agreement
error = |MW - MW_check|
      = |27983.75 - 27981.92|
      = 1.82 Da

ppm = error / MW * 1e6
    = 1.82 / 27983.75 * 1e6
    = 65 ppm  ✓

Working for the isotope spacing:

# Two adjacent isotope peaks: same protein, same charge state z,
# but peak 2 has one extra neutron (so +1.003 Da in true mass)

m1 = (MW         + z*mh) / z   # lighter isotope
m2 = (MW + 1.003 + z*mh) / z   # heavier isotope (+1 neutron)

# Subtract
delta_m = m2 - m1
        = ((MW + 1.003 + z*mh) - (MW + z*mh)) / z
        = 1.003 / z

# Rearrange
z = 1.003 / delta_m
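
The isotope-spacing working can be checked the same way in Python. This is a rough sketch using the peak values quoted in these notes. The helper names are illustrative, and the 1.003 Da spacing is the same approximation used in the text.

```python
ISOTOPE_SPACING = 1.003  # approximate mass gap between isotope peaks, Da
PROTON = 1.00728         # Da


def charge_from_isotope_spacing(mz_a, mz_b, spacing=ISOTOPE_SPACING):
    """Charge from two neighbouring isotope peaks of the same charge state."""
    z_raw = spacing / abs(mz_b - mz_a)
    return round(z_raw), z_raw


def ppm_difference(reference, other):
    return abs(reference - other) / reference * 1e6


# Isotope peaks read from the inset
z2, z2_raw = charge_from_isotope_spacing(1474.0481, 1474.1005)

# Cross-check against the adjacent-charge-state result (z = 27 peak)
mw_inset = z2 * (1473.74 - PROTON)
mw_main = 27 * (1037.4423 - PROTON)
ppm = ppm_difference(mw_main, mw_inset)

print(z2, round(z2_raw, 2), round(mw_inset, 2), round(ppm))
```

Because delta_m is a small difference between two large numbers, a reading error of 0.001 on either peak shifts the raw z by roughly 0.4, which is why the result is rounded to an integer rather than used directly.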

Small note on units: m/z effectively is in Da — mass (Da) divided by charge (just an integer count) leaves you with Da on the x-axis; some textbooks call the unit “Thomson” (Th), but numerically it’s the same as Da.

Homework: Waters Part II — Secondary/Tertiary Structure

Explain the differences between native and denatured protein conformations. What happens when a protein unfolds? How is this determined with a mass spectrometer? What changes appear in the spectra between analyses?

A protein is one or more chains of amino acids. DNA is first transcribed into mRNA, and the ribosome then translates the mRNA into an amino acid chain.

These amino acids have inherent physical properties that result in various forces which twist the chain into a certain shape. This process is referred to as protein folding and is a process of energy minimisation. When the folding reaches equilibrium, the protein is said to be folded into its native form.

The 3D form of a protein is what confers its functionality. Proteins are in a sense 3D machines composed of chemical elements. Some proteins are merely static shapes, such as signalling molecules, whereas others have a dynamic mechanical function, such as ATP synthase. When a protein unfolds (denatures), the chain loses this 3D shape, and usually its function with it, while the amino acid sequence and therefore the mass of the chain stay the same.

How is this determined with a mass spectrometer? What changes appear in the spectra between analyses?

A mass spectrometer is a device which measures the mass-to-charge ratio of gaseous ions, which can be used to identify chemical substances.

Molecules are chemical compounds of atoms (e.g. CO2: one carbon, two oxygen atoms). An atom is composed of protons and neutrons in its nucleus, and electrons in orbit around it. Ionisation gives an atom or molecule a net charge, either by removing or adding an electron or, for proteins in electrospray ionisation (ESI), by attaching protons.

A mass spectrometer ionises a substance, producing charged ions. For most small molecules and atoms (like carbon-12), ionisation predictably produces a single, stable charge state (typically +1). A protein can pick up many protons, so it appears as a series of peaks, one per charge state.

A mass spectrometer outputs a plot of intensity (y) against $m/z$, the mass-to-charge ratio (x). The mass characterises the species, and the charge is the number of charges the ion carries.

Thus a mass spectrometer can be used to map the presence of atoms and larger structures (molecules and proteins).

From what we know:

A denatured protein is highly charged: the unfolded chain exposes its basic residues (amino acids), and each tends to become protonated.
A native protein has few charges, evenly spread across the surface.

The differences between the two runs:

Native: envelope at high m/z (~3000–4000), narrow (4–5 peaks), few charges. Compact fold, few accessible surface sites.
Denatured: envelope at low m/z (~1000–2000), broad (15–25 peaks), many charges. Extended chain, protonation sites everywhere.
Mass itself: native shows the assembled mass (subunits stuck together, plus any bound ligand or metal). Denatured shows only individual subunit masses; the assembly has come apart and the cargo has fallen off.
Peak sharpness: denatured peaks are crisp; native peaks are fuzzier because the folded protein drags along bound water, salt adducts, and some conformational wobble.
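
To see why the two envelopes sit where they do, here is a small Python sketch that places one protein mass at different charge states. The charge ranges (7 to 10 for native, 15 to 28 for denatured) are assumptions chosen for illustration, not values read from the spectra; the mass is the roughly 27984 Da value from Part I.

```python
PROTON = 1.00728  # Da


def mz_for_charge(mass, z, proton=PROTON):
    """m/z of a protein of neutral mass `mass` carrying z protons."""
    return (mass + z * proton) / z


def envelope(mass, z_min, z_max):
    """Map each charge state in [z_min, z_max] to its m/z (one decimal)."""
    return {z: round(mz_for_charge(mass, z), 1) for z in range(z_min, z_max + 1)}


MASS = 27983.75  # Da, from the adjacent-charge-state calculation in Part I

native = envelope(MASS, 7, 10)       # assumed: few charges on a compact fold
denatured = envelope(MASS, 15, 28)   # assumed: many charges on an open chain

for label, env in (("native", native), ("denatured", denatured)):
    print(label, f"{len(env)} peaks, m/z {min(env.values())} to {max(env.values())}")
```

Under these assumptions the same mass appears as a handful of peaks above m/z 2700 in the native case and as a long series between about 1000 and 1900 in the denatured case. A peak near m/z 2800 would correspond to z = 10 for a mass of about 27984 Da.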

To detect proteins, typically a divide-and-conquer strategy is used. Proteins are cut using an enzyme (a protease) into smaller groups of residues (peptides), and then the spectral measurement of the mass-charge profiles of all the individual peptides is used to match against a signature of existing measured peptides.

Digestion: divide-and-conquer, cutting the protein with an enzyme.
Separation: liquid chromatography (LC) separates the peptides so they pass into the spectrometer gradually.
Ionisation: peptides are converted into charged, gas-phase ions.

This relies on a database of peptide fragments, whose usage is detailed below:

Peptide identification problem: given an unknown peptide (a chain of amino acids), identify which amino acids it contains and in what order, using only mass-charge measurements. One measurement of the intact peptide gives total mass but not order; many different orderings yield the same mass.

Workaround: break many copies of the peptide at random points along the chain, producing sub-chains of every possible length, then measure the mass-charge of every sub-chain. The resulting set of mass-charges encodes the sequence, but noisily and incompletely; decoding the amino-acid order directly from that pattern is ill-posed because many candidate sequences fit any partial set of sub-chain masses.

Reformulation: look up against the known list of proteins in the organism, which fixes every peptide that could possibly exist (~10⁶ candidates). For each candidate, predict what its sub-chain mass-charges should be; filter the candidate set down to ~10¹ by demanding the intact-peptide mass match; score the survivors by how well their predicted sub-chain masses overlap with the observation; take the best.

Error control: bound the error rate by running the same pipeline against a catalog of fake (reversed) sequences and tuning the score threshold so fake matches stay below 1% of accepted matches, a calibrated null check standing in for ground truth.
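
The lookup idea can be illustrated with a toy Python sketch. It is heavily simplified and does not reproduce any real search engine: the candidate list is four tryptic eGFP peptides, fragments are singly charged b and y ions only, the score is a plain count of matched fragments, and the tolerances are arbitrary assumptions. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of an intact peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def fragment_mzs(seq):
    """Singly charged b- and y-ion m/z values for every break point."""
    mzs = []
    for i in range(1, len(seq)):
        mzs.append(sum(MONO[aa] for aa in seq[:i]) + PROTON)          # b ion
        mzs.append(sum(MONO[aa] for aa in seq[i:]) + WATER + PROTON)  # y ion
    return mzs


def score(candidate, observed, tol=0.02):
    """Count predicted fragments that have an observed peak within tol."""
    return sum(
        any(abs(pred - obs) <= tol for obs in observed)
        for pred in fragment_mzs(candidate)
    )


def identify(precursor_mass, observed, candidates, mass_tol=0.01):
    """Filter by intact mass, then return (best candidate, its score)."""
    survivors = [c for c in candidates
                 if abs(peptide_mass(c) - precursor_mass) <= mass_tol]
    if not survivors:
        return None, 0
    best = max(survivors, key=lambda c: score(c, observed))
    return best, score(best, observed)


targets = ["FEGDTLVNR", "EDGNILGHK", "SAMPEGYVQER", "GIDFK"]
decoys = [t[::-1] for t in targets]          # reversed (fake) sequences

# Simulated, incomplete observation: only the b ions of the true peptide.
observed = fragment_mzs("FEGDTLVNR")[::2]
precursor = peptide_mass("FEGDTLVNR")

best_target, target_score = identify(precursor, observed, targets)
best_decoy, decoy_score = identify(precursor, observed, decoys)
print(best_target, target_score)
print(best_decoy, decoy_score)
```

The reversed decoy has exactly the same intact mass as the true peptide, so it survives the first filter, and only the fragment score separates the two. A real pipeline would collect decoy scores over many spectra and pick the threshold that keeps decoy hits below 1% of accepted matches.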

In cases where the peptide sequence is unknown, de novo sequencing is performed.

From the native eGFP mass spectrum, discern the charge state of the peak at ~2800 m/z. What is the charge state? How can you tell?

Homework: Waters Part III — Peptide Mapping (Primary Structure)

Count the Lysines (K) and Arginines (R) in the eGFP sequence; circle or highlight them.

MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYK

How many peptides will be generated from tryptic digestion?

Navigate to the ExPASy PeptideMass tool.
Copy/paste the eGFP sequence.
Replicate the parameters shown in Figure 4.
Report the number of peptides generated.
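
As a cross-check on the K/R count and the digest, here is a minimal in-silico trypsin sketch in Python. It assumes the simple rule "cleave after K or R unless the next residue is P" with no missed cleavages and no mass cutoff, so the count may differ from what the PeptideMass tool reports with the Figure 4 parameters (for example, if very small peptides are filtered out).

```python
EGFP = (
    "MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT "
    "TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF "
    "FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN "
    "VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH "
    "YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYK"
).replace(" ", "")


def tryptic_digest(seq):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):          # trailing piece with no K/R at its end
        peptides.append(seq[start:])
    return peptides


n_lys = EGFP.count("K")
n_arg = EGFP.count("R")
peptides = tryptic_digest(EGFP)

print(len(EGFP), n_lys, n_arg, len(peptides))   # 239 20 6 26
for start_pep in peptides[:3]:
    print(start_pep)
```

None of the K or R residues in this sequence is followed by P, and the sequence ends in K, so the number of peptides equals the number of K plus R residues.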

Based on the LC-MS peptide map data, count the chromatographic peaks between 0.5–6 minutes (>10% relative abundance).

23-25

Does the peak count match the predicted peptide number? Are there more or fewer peaks?

There are more peaks!

Identify the m/z of the peptide in Figure 5b. Determine the charge (z) of the most abundant charge state using isotope separation. Calculate the singly charged peptide mass [M+H]⁺.

z = (m2 - mh) / (m1 - m2)
m/z = (MW + z * mh) / z
MW = z * ((m/z) - mh)
mh = 1.00728
m = MW + z * mh

Most abundant/intense peak: 525.76712

# spacing
delta_b = 526.25918 - 525.76712 = 0.49206 Da 

# charge
z = 1.003 / 0.49206 = 2.03836930456
  = 2

# molecular weight
MW = z * (m/z - mh)
   = 2 * (525.76712 - 1.00728)
   = 1049.51968 Da

M+H = MW + mh
    = 1049.51968 + 1.00728
    = 1050.52696

Note: MW is the neutral peptide’s mass (nothing added); m is the mass of the ion, i.e. the peptide plus the z protons it picked up to get charged. [M+H]⁺ is the special case z = 1.

Identify the peptide by comparing to expected masses from the PeptideMass tool. Calculate the mass accuracy in ppm.

HAHA NICE!

1050.5214 	115-123 	0 	FEGDTLVNR

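The peptide is FEGDTLVNR (residues 115-123).

ppm = (measured - expected) / expected * 1e6
    = (1050.52696 - 1050.5214) / 1050.5214 * 1e6
    = 5.3 ppm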
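
The whole Figure 5b chain (isotope spacing, charge, [M+H]⁺, ppm) fits in a few lines of Python. This is only a sketch with made-up helper names; the two isotope peaks and the expected mass 1050.5214 are the values quoted in these notes.

```python
PROTON = 1.00728         # Da
ISOTOPE_SPACING = 1.003  # approximate mass gap between isotope peaks, Da


def singly_charged_mass(mz, z, proton=PROTON):
    """Convert an observed m/z at charge z into the [M+H]+ value."""
    neutral = z * (mz - proton)   # strip all z protons
    return neutral + proton       # add one back


def ppm_error(measured, expected):
    return (measured - expected) / expected * 1e6


mono, next_isotope = 525.76712, 526.25918     # peaks read from Figure 5b
z = round(ISOTOPE_SPACING / (next_isotope - mono))
mh_plus = singly_charged_mass(mono, z)
ppm = ppm_error(mh_plus, 1050.5214)           # PeptideMass value for FEGDTLVNR

print(z, round(mh_plus, 5), round(ppm, 1))
```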

What percentage of the sequence is confirmed by peptide mapping (see Figure 6)?

88%?

Homework: Waters Part IV — Oligomers

Using charge detection mass spectrometry data and known subunit masses (Table 1), identify the locations of:

“FU” = functional unit (each KLH subunit is built from ~7–8 globular FU domains of ~50 kDa each); “decamer” = 10 subunits assembled into a ring. So:

7FU Decamer = 10 KLH2 subunits (each with 7 FUs) → ~3.5 MDa
8FU Didecamer = 20 KLH1 subunits (8 FUs each) in 2 stacked rings → ~8 MDa
8FU 3-Decamer = 30 KLH1 subunits in 3 stacked rings → ~12 MDa
8FU 4-Decamer = 40 KLH1 subunits in 4 stacked rings → ~16 MDa

Locations on the chart:

7FU Decamer - 3.4 MDa
8FU Didecamer - 8.33 MDa
8FU 3-Decamer - 12.67 MDa
8FU 4-Decamer - predicted 16 MDa, not labelled explicitly on chart
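
The rough predictions above come from multiplying subunits by functional units by about 50 kDa. A short Python sketch of that arithmetic, compared against the chart readings listed here, is below. The 50 kDa per FU figure is the approximation from these notes, not the Table 1 subunit masses, so the deviations are only indicative.

```python
FU_MASS_MDA = 0.05  # ~50 kDa per functional unit, approximation from the notes


def assembly_mass(n_subunits, fus_per_subunit, fu_mass=FU_MASS_MDA):
    """Rough assembly mass in MDa: subunits x FUs per subunit x mass per FU."""
    return n_subunits * fus_per_subunit * fu_mass


# name: (subunits, FUs per subunit, location read from the chart in MDa or None)
assemblies = {
    "7FU Decamer": (10, 7, 3.4),
    "8FU Didecamer": (20, 8, 8.33),
    "8FU 3-Decamer": (30, 8, 12.67),
    "8FU 4-Decamer": (40, 8, None),   # not labelled on the chart
}

for name, (n_sub, n_fu, observed) in assemblies.items():
    predicted = assembly_mass(n_sub, n_fu)
    if observed is None:
        print(f"{name}: predicted {predicted:.1f} MDa, no chart label")
    else:
        diff = (observed - predicted) / predicted
        print(f"{name}: predicted {predicted:.1f} MDa, chart {observed} MDa ({diff:+.1%})")
```

Using the actual subunit masses from Table 1 in place of the 50 kDa estimate should bring the predictions closer to the chart values.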

Homework: Waters Part V — Did I Make GFP?

Fill in the table with lab-acquired data from the Waters Immerse Lab showing theoretical vs. observed molecular weight with PPM mass error.

N/A - I do not have access to a node / the lab.