Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

For your final project:

  • Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

I will use computational methods to evaluate the physical and chemical properties (stability, folding, binding) of my redesigned receptor-binding domains across several key metrics:

(1) I will measure binding affinity: the thermodynamic strength of the binding between my redesigned T7 tail fiber and the target P. aeruginosa surface receptors (e.g., OprF, PilA).

(2) I will measure structural stability using the Root Mean Square Deviation (RMSD) of the polypeptide chain over time, to ensure the redesign has not introduced metastability or a failure to adopt the native, functional conformation.

(3) I will measure solubility and aggregation propensity, that is, the likelihood that the protein remains soluble rather than forming insoluble inclusion bodies during recombinant expression.

(4) To ensure functional efficiency, I will measure the Codon Adaptation Index (CAI) of each construct. For biosafety/biosecurity compliance, I will also screen each construct to confirm that it contains no regulated DNA sequences from restricted, highly pathogenic agents (e.g., Ebola, SARS-CoV-2, anthrax).
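As a minimal sketch of the RMSD metric in (2), assuming two frames that are already superimposed and list the same atoms in the same order, the deviation could be computed as below. The function name `rmsd` and the toy coordinates are illustrative only; in practice the value would come from the MD package's own analysis tools, which also handle the fitting step.

```python
import math

def rmsd(frame_a, frame_b):
    """Root mean square deviation between two equally sized coordinate sets (in Å)."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and hold the same number of atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(frame_a, frame_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(frame_a))

# Toy example: every atom is displaced by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, shifted))
```

Evaluating this against the starting structure for every frame of a trajectory gives the RMSD-versus-time curve used to judge stability.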

  • Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

I will perform these measurements using high-performance computational simulations that act as proxies for wet-lab benchtop techniques:

(1) I will use Waters Mass Spectrometry (simulated) and sequence-level verification to check the molecular weight and sequence fidelity of my redesigned 186-amino-acid RBD constructs, to make sure they match the theoretical design.

I will use Rosetta and FoldX to estimate binding energy. A more negative value indicates stronger predicted binding, analogous to measuring a dissociation constant (KD) in a physical Surface Plasmon Resonance (SPR) assay.

(2) GROMACS and OpenMM, using the CHARMM36 force field, can simulate the protein in a virtual solvent at 37 °C (310 K). I will measure structural integrity over a 100 ns trajectory. This stands in for physical stability tests such as Differential Scanning Fluorimetry (DSF).

(3) AlphaFold3 and ESMFold will be used to obtain the pLDDT (Predicted Local Distance Difference Test) score. High confidence scores (>70) serve as a computational indicator that the protein is likely to adopt the functional secondary and tertiary structures.

(4) Biosecurity measurement: using SecureDNA screening protocols, every redesigned sequence will be screened against global pathogen databases to ensure the host-range expansion complies with international safety standards.
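To make the link between a binding energy and a KD more concrete, here is a small sketch based on the standard relation ΔG = RT ln(KD) at 310 K. It assumes the energy is a true free energy in kcal/mol; Rosetta and FoldX scores are only approximations of that, so the resulting KD values should be read as rough orders of magnitude. The function name `kd_from_dg` and the example energies are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_dg(dg_kcal_per_mol, temperature_k=310.0):
    """Dissociation constant (in M) implied by a binding free energy via dG = RT ln(KD)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))

# Illustrative energies only; these are not results from the project.
for dg in (-6.0, -8.0, -10.0):
    print(f"dG = {dg:5.1f} kcal/mol -> KD ~ {kd_from_dg(dg):.1e} M")
```

Each additional -1.4 kcal/mol or so corresponds to roughly a tenfold tighter KD at this temperature, which is why ranking by ΔG is a reasonable stand-in for ranking by affinity.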

  • What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

I am executing a fully in silico protein engineering project to expand the host range of the T7 bacteriophage, specifically targeting antibiotic-resistant P. aeruginosa. My workflow begins with a 20-step computational pipeline in which I use ESM3 and AlphaFold3 to map the structural constraints of the gp17 receptor-binding domain and dock it against clinical Pseudomonas surface receptors. I then employ ProteinMPNN to generate a diverse library of 1,000 sequence candidates, which I filter for solubility using CamSol and rank by Rosetta ΔG binding energy. To validate these designs digitally, I measure their structural stability through 100 ns all-atom molecular dynamics simulations in GROMACS, requiring an RMSD of less than 2.5 Å. Finally, I generate five synthesis-ready GenBank files, optimized for high-yield recombinant expression in E. coli BL21(DE3) with a 6x-His tag, and verify them through SecureDNA biosecurity screening to ensure my redesigned viral fibers are safe for future physical production by Twist Bioscience.
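The filter-and-rank logic of this pipeline could be sketched as follows. This is a simplified illustration, not the actual pipeline code: the `Candidate` fields, the `select` function, the solubility cutoff of 0.0 and the toy values are all hypothetical, while the 2.5 Å RMSD limit and the final set of five come from the description above. It also applies all criteria in one pass, whereas in practice the MD step would likely run only on the top-ranked designs.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    solubility: float  # hypothetical solubility score (higher = more soluble)
    dg_binding: float  # predicted binding energy (more negative = stronger)
    rmsd: float        # RMSD over the MD trajectory, in Å

def select(candidates, min_solubility=0.0, max_rmsd=2.5, top_n=5):
    """Keep soluble, stable designs and return the strongest predicted binders."""
    soluble = [c for c in candidates if c.solubility >= min_solubility]
    stable = [c for c in soluble if c.rmsd < max_rmsd]
    return sorted(stable, key=lambda c: c.dg_binding)[:top_n]

# Toy data for illustration only.
pool = [
    Candidate("A", 1.2, -12.0, 1.8),
    Candidate("B", -0.5, -15.0, 1.5),  # fails the solubility filter
    Candidate("C", 0.8, -14.0, 3.1),   # fails the RMSD limit
    Candidate("D", 0.3, -13.5, 2.0),
]
print([c.name for c in select(pool)])
```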

(1) Theoretical pI/Mw: 27,875.41 Da

(2) I picked the two tallest peaks. Peak 1: m/z 848.9758. Peak 2: m/z 875.4421.

(Step 2.1) Determine z: z = 848.9758 / (875.4421 - 848.9758) = 32.077, so z = 32 after rounding to the nearest integer. This is the charge of the peak at 875.4421.

(Step 2.2) Actual MW: MW = 32 * (875.4421 - 1.0078) = 27,981.90 Da
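Steps 2.1 and 2.2 can be reproduced with a few lines of Python. This is a sketch that assumes the two peaks are adjacent charge states of the same protein; the function names are illustrative. It uses the same simplified formula as above. The exact form subtracts the proton mass from the numerator, z = (848.9758 - 1.0078) / (875.4421 - 848.9758), which gives about 32.04 and rounds to the same integer.

```python
PROTON = 1.0078  # proton mass in Da, the value used in Step 2.2

def charge_of_higher_peak(mz_low, mz_high):
    """Charge of the higher-m/z peak of two adjacent charge states (simplified form)."""
    return round(mz_low / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral mass from an m/z value and its charge."""
    return z * (mz - PROTON)

z = charge_of_higher_peak(848.9758, 875.4421)
print(z, neutral_mass(875.4421, z))
```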

(Step 2.3) Theoretical: 27,875.41 Da

Observed: 27,875.44 Da

PPM Error: 1.07 ppm
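The ppm error is the difference between observed and theoretical mass, scaled by the theoretical mass. A minimal sketch using the two values listed above (the function name `ppm_error` is illustrative):

```python
def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

print(ppm_error(27875.44, 27875.41))
```

With masses rounded to two decimals the result lands close to the reported value; small differences come from the rounding of the inputs.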

Homework: Waters Part III — Peptide Mapping - primary structure

(1) How many Lysines (K) and Arginines (R) are in eGFP? 20 lysine residues and 6 arginine residues.

MVS (K) GEELFTGVVP ILVELDGDVN GH (K) FSVSGEGE GDATYG (K) LT (K) LFICTTG (K) L PVPWPTLVTT LTYGVQCFS (R) YPDHM (K) QHDFF (K) SAMPE GYVQE (R) TIFF (K) DDGNY (K) (R) AEV (K) FEGDTLVN (R) IEL (K) GID (K) FEDGNILGH (K) LEYNYNSHNV YIMAD (K) Q (K) NGI (K) VNF (K) (R) HNIEDGSVQL ADHYQQNTPI GDGPVLLPDN HYLSTQSALS (K) DPNE (K) (R) DHMVLLEFVT AAGITLGMDE LY (K) LEHHHHHH
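The count can be double-checked programmatically. The sketch below takes the annotated sequence exactly as written above, strips the spaces and parentheses, and counts the two residues; the variable names are illustrative.

```python
import re

# Annotated sequence copied from above; parentheses mark K and R residues.
ANNOTATED = (
    "MVS (K) GEELFTGVVP ILVELDGDVN GH (K) FSVSGEGE GDATYG (K) LT (K) "
    "LFICTTG (K) L PVPWPTLVTT LTYGVQCFS (R) YPDHM (K) QHDFF (K) SAMPE "
    "GYVQE (R) TIFF (K) DDGNY (K) (R) AEV (K) FEGDTLVN (R) IEL (K) "
    "GID (K) FEDGNILGH (K) LEYNYNSHNV YIMAD (K) Q (K) NGI (K) VNF (K) (R) "
    "HNIEDGSVQL ADHYQQNTPI GDGPVLLPDN HYLSTQSALS (K) DPNE (K) (R) "
    "DHMVLLEFVT AAGITLGMDE LY (K) LEHHHHHH"
)

sequence = re.sub(r"[^A-Z]", "", ANNOTATED)  # keep residue letters only
counts = {aa: sequence.count(aa) for aa in "KR"}
print(counts)
```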

(2) 19 peptides in total.

(3) Counting all the peaks identified as being >10% relative abundance in that list, we get a total of approximately 18–20 peaks in that window.

(4) The number of peaks doesn’t match the number of predicted peptides. There are fewer peaks in the chromatogram than the tool predicted (19 peptides).

(5) The most abundant peak is at m/z 525.76712. The zoom-in shows an isotope spacing of ~0.5, confirming a charge of z = 2. Calculation of the singly protonated mass: (525.767 * 2) - 1.008 = 1050.526 Da
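The reasoning in (5) can be written out as a short sketch: adjacent isotope peaks are about 1/z apart, so a spacing of 0.5 implies z = 2, and removing one proton from the doubly charged ion gives the singly protonated value. The function names are illustrative, and the proton mass of 1.008 is the rounded value used in the calculation above.

```python
def charge_from_isotope_spacing(spacing):
    """Isotope peaks are spaced roughly 1/z apart on the m/z axis."""
    return round(1.0 / spacing)

def singly_protonated_mass(mz, z, proton=1.008):
    """Convert an m/z observed at charge z to the equivalent 1+ value."""
    return mz * z - (z - 1) * proton

z = charge_from_isotope_spacing(0.5)
print(z, round(singly_protonated_mass(525.767, z), 3))
```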

(6) At RT 2.78 min, the peptide is identified as FEGDTLVNR. Observed Mass: 1050.518 Da, Expected Mass: 1050.52 Da, Mass Error: -3.60 ppm.

(7) Percentage of sequence confirmed: 88%

Bonus Peptide Map Questions

(8) Since Figure 11 shows the fragmentation of the 2.78 min peak, the sequence is FEGDTLVNR.

Theoretical Mass: 1050.52 Da

Validation with the fragmentation tool: when I entered the sequence FEGDTLVNR into the fragmentation predictor, it generated a series of b-ions (fragments from the N-terminus) and y-ions (fragments from the C-terminus).

y1: 175.12 (R)

y2: 289.16 (NR)

y3: 388.23 (VNR)

y4: 501.31 (LVNR)

y5: 602.36 (TLVNR)

y6: 717.39 (DTLVNR)

The peak at 1050.52 Da in the spectrum represents the intact precursor peptide (the unfragmented molecule). This matches the calculated mass of the FEGDTLVNR peptide identified at the 2.78-minute retention time.
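The y-ion series and the precursor value can be recomputed from monoisotopic residue masses. The sketch below is a hand-rolled calculation, not the fragmentation tool itself: it assumes singly charged y-ions, where each ion is the sum of its residues plus one water and one proton. The names `MONO` and `y_ions` are illustrative, and the table only covers the residues present in this peptide.

```python
# Monoisotopic residue masses (Da) for the residues in FEGDTLVNR.
MONO = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146,
    "D": 115.02694, "T": 101.04768, "L": 113.08406,
    "V": 99.06841, "N": 114.04293, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728

def y_ions(peptide):
    """Singly charged y-ions, built up from the C-terminus."""
    ions = []
    total = WATER + PROTON
    for i, residue in enumerate(reversed(peptide), start=1):
        total += MONO[residue]
        ions.append((f"y{i}", peptide[-i:], round(total, 2)))
    return ions

for label, fragment, mz in y_ions("FEGDTLVNR"):
    print(label, fragment, mz)
```

The last entry of the series (y9) spans the whole peptide, so it equals the singly protonated precursor at about 1050.52.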

(9) The combination of high sequence coverage, accurate mass and successful fragmentation matching confirms that the sample is the eGFP standard. The small 12% gap in coverage (the white areas in Figure 6) is normal, as some peptides may be too small or too large to be easily detected under standard LC-MS conditions.

Homework: Waters Part IV — Oligomers

The species are identified by finding the peaks that most closely align with the multiples of the subunit masses.

7FU Decamer is the peak at 3.4 MDa.

8FU Didecamer is the peak at 8.33 MDa.

8FU 3-Decamer is the peak at 12.67 MDa.

8FU 4-Decamer is the small signal located around 16.0 MDa.

Homework: Waters Part V — Did I make GFP?

image.png image.png