PhaC Enzyme Engineering — LLM Context Document
Version: v1.0
Date: [DATE]
Engineer: [YOUR NAME]
Project goal: [ONE SENTENCE SUMMARY, e.g. “Engineer Class I PhaC to incorporate 3HHx at >15 mol%”]
1. Enzyme Family Background
1.1 Classification
| Class | Subunit structure | Size | Native substrate preference | Example organism |
|---|---|---|---|---|
| I | Single subunit | ~65 kDa | scl (C3–C5): 3HB, 3HV, 3HP | Cupriavidus necator H16 |
| II | Single subunit | ~60 kDa | mcl (C6–C14): 3HHx, 3HO, 3HD | Pseudomonas aeruginosa |
| III | Heterodimer (PhaC + PhaE) | ~40+40 kDa | scl | Allochromatium vinosum |
| IV | Heterodimer (PhaC + PhaR) | ~40+20 kDa | scl | Bacillus megaterium |
- Class I and II share ~50% sequence identity; Class III/IV are more distantly related
- Class I/II are the primary engineering targets for substrate specificity work
1.2 Reaction chemistry
- Catalyzes polymerization of (R)-3-hydroxyacyl-CoA thioesters into PHA
- Ping-pong (double displacement) mechanism:
  - Acylation: acyl group transferred to catalytic Cys, CoA released
  - Transacylation: acyl group transferred to growing polymer chain
- Lipase-like α/β hydrolase fold
- Catalytic triad: Cys – His – Asp
  - C. necator PhaC1 (Cn) reference numbering: C319, D480, H508
1.3 Substrate scope terminology
| Term | Chain length | Key monomers | Notes |
|---|---|---|---|
| scl | C3–C5 | 3HP, 3HB, 3HV | Most Class I enzymes |
| mcl | C6–C14 | 3HHx, 3HO, 3HD, 3HDD | Most Class II enzymes |
| lcl | >C14 | 3HHxD+ | Very rare |
| Broad/mixed | C3–C14 | scl + mcl | Rare, high engineering value |
| Specialty | varies | 3H4MV, 3H2MB, aromatic | Non-standard monomers |
1.4 Why substrate specificity is structurally interesting
- Substrate-binding tunnel geometry determines acyl chain length tolerance
- Residues within ~5–10 Å of catalytic Cys are primary selectivity determinants
- mcl selectivity often results from removing steric clashes (bulky residues replaced by smaller ones) rather than from adding new contacts; counterintuitive but well supported
- Electrostatic environment affects CoA-thioester positioning
- Dimerization interface indirectly influences active site geometry (Class I/II)
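The distance criterion in the second bullet above can be turned into a candidate list directly from a structure file. The following is a minimal sketch, assuming a standard fixed-column PDB file, chain `A`, and the Cys319 sulfur atom (`SG`) as the measurement origin. The function names, the 8 Å default, and the chain ID are illustrative assumptions, not values taken from a specific structure.

```python
import math

def parse_atoms(pdb_text):
    """Yield (chain, residue_number, residue_name, atom_name, xyz) from ATOM records."""
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            yield (
                line[21],                # chain ID (column 22)
                int(line[22:26]),        # residue sequence number
                line[17:20].strip(),     # residue name
                line[12:16].strip(),     # atom name
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            )

def residues_near(pdb_text, chain="A", center_res=319, center_atom="SG", cutoff=8.0):
    """List (residue_number, residue_name) with any atom within cutoff Å of the center atom."""
    atoms = list(parse_atoms(pdb_text))
    # Raises StopIteration if the center atom is absent (e.g. different numbering).
    center = next(xyz for c, r, _, a, xyz in atoms
                  if c == chain and r == center_res and a == center_atom)
    hits = set()
    for c, r, name, _, xyz in atoms:
        if c == chain and r != center_res and math.dist(center, xyz) <= cutoff:
            hits.add((r, name))
    return sorted(hits)
```

Running this at two cutoffs (for example 5 Å and 10 Å) gives an inner and outer shell of candidates that can be pasted into Section 2.4. Residue numbers in the file may be offset from the Cn reference numbering, so check the catalytic Cys position first.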
2. Structural Information
2.1 Available experimental structures
| PDB ID | Enzyme | Class | Resolution | Notes |
|---|---|---|---|---|
| 5T6O | C. necator PhaC1 | I | [X] Å | Primary Class I reference; catalytic domain only |
| 4QO9 | Chromobacterium sp. USM2 PhaC | I | [X] Å | [Confirm PDB ID against RCSB before use] |
| [ID] | [Enzyme] | [Class] | [Res] | [Notes] |
2.2 AlphaFold models
| UniProt accession | Organism | Class | pLDDT (overall) | Confidence notes |
|---|---|---|---|---|
| [ACCESSION] | [ORG] | [I/II] | [score] | [e.g. low in N-term, residues 1–40] |
2.3 Key structural regions
(Using C. necator PhaC1 residue numbering as reference)
| Region | Residues (Cn) | Function | Conservation |
|---|---|---|---|
| N-terminal domain | 1–170 | Regulatory, dimerization | Low |
| Core catalytic domain | 171–400 | Contains Cys319 | High |
| C-terminal domain | 401–589 | Contains Asp480, His508 | High |
| Substrate-binding tunnel | [list residues] | Selectivity determinant | Moderate |
| Dimer interface | [list residues] | Stability | Moderate |
2.4 Known substrate-contacting / selectivity residues
(From mutagenesis studies and structural analyses — update as you find more)
| Position (Cn) | WT residue | Role | scl consensus | mcl consensus | Evidence |
|---|---|---|---|---|---|
| 149 | Ala | Tunnel entrance | A/V (89%) | F/W (74%) | Mutagenesis |
| 171 | [AA] | Structural hinge | | | |
| 325 | Ser | Tunnel lining | S/A | A/G | |
| 392 | [AA] | Near active site | | | |
| 480 | Asp | Catalytic triad | D | D | Catalytic |
| 508 | His | Catalytic triad | H | H | Catalytic |
| [pos] | [AA] | [role] | [scl] | [mcl] | [evidence] |
2.5 Tunnel geometry notes
- scl enzymes: narrower tunnel, estimated constriction ~4–6 Å
- mcl enzymes: wider tunnel — bulky residues at key positions replaced by smaller ones (Ala, Gly) to accommodate longer acyl chains
- [Add any MD simulation or docking notes here as available]
3. Sequence Dataset Summary
3.1 Dataset composition
- Total sequences collected: [N]
- After 95% identity dereplication (cd-hit): [N]
- Labeled with substrate preference data: [N]
  - scl only: [N]
  - mcl only: [N]
  - broad/mixed: [N]
  - specialty monomer: [N]
- Unlabeled (phylogenetic diversity only): [N]
- Data sources: UniProt/SwissProt, NCBI RefSeq, literature
3.2 Taxonomic distribution
| Taxon | N sequences | Dominant class | Notes |
|---|---|---|---|
| Betaproteobacteria | [N] | Class I | C. necator relatives |
| Gammaproteobacteria | [N] | Class II | Pseudomonas relatives |
| Alphaproteobacteria | [N] | I/III | |
| Firmicutes | [N] | Class IV | |
| Other | [N] | [class] | [notes] |
3.3 Alignment properties
- Alignment method: [MUSCLE / Clustal Omega / MAFFT]
- Raw aligned length: [N] columns
- After gap trimming (columns with >80% gaps removed): [N] columns
- Mean pairwise identity, full dataset: [X]%
- Mean pairwise identity, scl group: [X]%
- Mean pairwise identity, mcl group: [X]%
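The trimming and identity figures above can be reproduced from the aligned sequences themselves. This is a minimal sketch, assuming the alignment is already loaded as equal-length strings with `-` for gaps. Identity is computed here over columns where both sequences have a residue, which is one common convention; alignment tools differ on the denominator, so numbers may not match theirs exactly. Function names are illustrative.

```python
from itertools import combinations

def trim_gappy_columns(seqs, max_gap_fraction=0.8):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction."""
    n = len(seqs)
    keep = [j for j in range(len(seqs[0]))
            if sum(s[j] == "-" for s in seqs) / n <= max_gap_fraction]
    return ["".join(s[j] for j in keep) for s in seqs]

def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def mean_pairwise_identity(seqs):
    """Mean identity (percent) over all unordered sequence pairs."""
    scores = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    return 100.0 * sum(scores) / len(scores)
```

Calling `mean_pairwise_identity` separately on the full set, the scl subset, and the mcl subset fills the three identity lines above. The all-pairs loop is quadratic, which is fine for a few hundred sequences but slow for tens of thousands.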
3.4 Top mutual information positions
(Fill in after running the mutual information (MI) analysis on the labeled alignment)
| Alignment col | Approx. residue (Cn) | scl consensus | mcl consensus | MI score |
|---|---|---|---|---|
| [col] | ~[res] | [AA (%)] | [AA (%)] | [score] |
| [col] | ~[res] | [AA (%)] | [AA (%)] | [score] |
| [col] | ~[res] | [AA (%)] | [AA (%)] | [score] |
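One way to produce the MI scores for this table is to treat each alignment column as one variable and the scl/mcl label as the other. The sketch below assumes aligned sequences as equal-length strings and one label per sequence; it skips gapped positions and reports MI in bits. Function names are illustrative, and raw MI is known to be inflated by small sample sizes and by phylogenetic relatedness, so treat the ranking as a starting point only.

```python
import math
from collections import Counter

def column_label_mi(column, labels):
    """Mutual information (bits) between residues in one column and class labels."""
    pairs = [(aa, lab) for aa, lab in zip(column, labels) if aa != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    aa_counts = Counter(aa for aa, _ in pairs)
    label_counts = Counter(lab for _, lab in pairs)
    mi = 0.0
    for (aa, lab), count in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y)
        ratio = count * n / (aa_counts[aa] * label_counts[lab])
        mi += (count / n) * math.log2(ratio)
    return mi

def rank_columns(seqs, labels, top=3):
    """Return (score, zero_based_column) tuples, highest MI first."""
    scores = [(column_label_mi([s[j] for s in seqs], labels), j)
              for j in range(len(seqs[0]))]
    return sorted(scores, key=lambda t: (-t[0], t[1]))[:top]
```

Column indices returned here are zero-based; add 1 before entering them in the table, and convert to Cn residue numbers with the mapping in Section 5.5.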
4. Experimental Mutation Database
(This section should grow over time as you mine the literature and generate your own data)
4.1 Key literature to mine
- Tsuge et al. (2003) Macromolecules — F420 region, Class I
- Amara et al. (2002) — systematic Class I mutagenesis
- Rehm lab series — Class II specificity determinants
- Nomura et al. — broad-specificity engineered variants
- Insomphun et al. — 3HHx incorporation engineering
- [Add others as you find them]
4.2 Gain-of-function mutations (toward mcl/broader specificity)
| Mutation | Background | Substrate effect | Quantitative data | Assay type | Reference |
|---|---|---|---|---|---|
| F420S | Cn PhaC1 | Gains 3HHx incorporation | 3HHx: 0 → 8 mol% | In vivo GC | Tsuge 2003 |
| A510S | Cn PhaC1 | Increased mcl acceptance | [data] | [assay] | Amara 2002 |
| [mut] | [bg] | [effect] | [data] | [assay] | [ref] |
4.3 Loss-of-function / specificity-narrowing mutations
| Mutation | Background | Substrate effect | Quantitative data | Assay type | Reference |
|---|---|---|---|---|---|
| [mut] | [bg] | [effect] | [data] | [assay] | [ref] |
4.4 Combinatorial / double mutants
| Mutations | Background | Effect vs. singles | Epistasis | Reference |
|---|---|---|---|---|
| F420S + A510S | Cn PhaC1 | [effect] | Additive / synergistic / antagonistic | [ref] |
| [muts] | [bg] | [effect] | [epistasis] | [ref] |
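The Epistasis column needs a consistent rule for calling a double mutant additive, synergistic, or antagonistic. The sketch below is one simple convention, assuming effects add on the measured scale (for example 3HHx mol%) and that a fixed tolerance stands in for measurement noise. Both assumptions are illustrative; a log scale or a replicate-based error estimate may suit your data better.

```python
def classify_epistasis(wt, single_a, single_b, double, tolerance=1.0):
    """Compare a double mutant with the sum of single-mutant effects.

    All values are in the same units (e.g. 3HHx mol%).
    Returns (expected_additive_value, deviation, label).
    """
    expected = wt + (single_a - wt) + (single_b - wt)
    deviation = double - expected
    if deviation > tolerance:
        label = "synergistic"
    elif deviation < -tolerance:
        label = "antagonistic"
    else:
        label = "additive"
    return expected, deviation, label
```

For example, with hypothetical values `wt=0`, `single_a=3`, `single_b=2`, the additive expectation is 5, so a double mutant measured at 9 would be labeled synergistic under a 1-unit tolerance. These numbers are made up for illustration and are not results.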
4.5 Thermostability mutations
(Relevant if stacking specificity mutations — need to preserve stability)
| Mutation | Background | ΔTm | Effect on activity | Reference |
|---|---|---|---|---|
| [mut] | [bg] | [+/- X°C] | [effect] | [ref] |
4.6 Notes on data quality and comparability
- Monomer incorporation % varies heavily with fermentation conditions (carbon source, growth phase, host strain) — cross-lab comparisons are unreliable
- In vitro assays (purified enzyme + CoA thioesters) are more reliable for intrinsic specificity than in vivo PHA production titers
- [Add any other caveats specific to your dataset]
5. Your Starting Enzyme (Wild-Type)
5.1 Identity
- UniProt accession: [ID]
- Organism: [NAME]
- PhaC class: [I / II / III / IV]
- Gene name: [phaC / phaC1 / phaC2]
- Full sequence length: [N] aa
5.2 Known properties
- Native substrate preference: [e.g. scl — 3HB/3HV, negligible 3HHx]
- Specific activity: [X nmol/min/mg if known]
- Thermostability: [Tm or optimal temperature]
- Expression: [e.g. soluble in E. coli BL21 at 25°C, typical yield X mg/L]
- Any known issues: [e.g. prone to aggregation, requires CoA for stability]
5.3 Sequence — full
[Paste full amino-acid sequence here, one-letter code]
5.4 Sequence — substrate-binding pocket region
(~30 residues centered on catalytic Cys; easier to include in prompts)
[Paste pocket-region sequence here, with start and end residue numbers]
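The pocket-region window can be cut from the full sequence rather than copied by hand. A minimal sketch, assuming the full sequence is a plain one-letter string and you know the 1-based position of the catalytic Cys in your own enzyme's numbering; the function name and the 15-residue flank are illustrative.

```python
def pocket_window(sequence, center_residue, flank=15):
    """Return (start, end, subsequence) around center_residue.

    Residue numbers are 1-based and inclusive; the window is clipped
    at the sequence ends rather than padded.
    """
    if not 1 <= center_residue <= len(sequence):
        raise ValueError("center_residue is outside the sequence")
    start = max(1, center_residue - flank)
    end = min(len(sequence), center_residue + flank)
    return start, end, sequence[start - 1:end]
```

Reporting `start` and `end` alongside the subsequence lets the LLM refer to residues by number instead of by offset within the snippet.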
5.5 Alignment position mapping
(Map your WT residue numbers to the C. necator reference numbering and to your alignment column numbers — critical for interpreting suggestions)
| Your residue | Your AA | Cn equivalent residue | Alignment column |
|---|---|---|---|
| [N] | [AA] | [N] | [col] |
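The mapping table can be generated from two rows of the same alignment: your enzyme and the C. necator reference. This is a minimal sketch, assuming both aligned strings come from one alignment, use `-` for gaps, and start at residue 1 of their respective proteins. Function names are illustrative.

```python
def column_to_residue(aligned_seq):
    """Map 1-based alignment column -> 1-based residue number (gap columns omitted)."""
    mapping, residue = {}, 0
    for col, char in enumerate(aligned_seq, start=1):
        if char != "-":
            residue += 1
            mapping[col] = residue
    return mapping

def map_to_reference(aligned_query, aligned_ref):
    """Rows of (query_residue, query_aa, ref_residue_or_None, alignment_column)."""
    query_map = column_to_residue(aligned_query)
    ref_map = column_to_residue(aligned_ref)
    return [
        (query_map[col], aligned_query[col - 1], ref_map.get(col), col)
        for col in sorted(query_map)
    ]
```

A `None` in the third field marks an insertion in your enzyme relative to the reference; those residues have no Cn equivalent and should be flagged when interpreting suggestions given in Cn numbering.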
6. Engineering Target
6.1 Primary goal
[State precisely, e.g.:]
Incorporate 3HHx (C6) at >15 mol% in scl-mcl copolymer produced in E. coli BL21 on mixed carbon source (sodium butyrate + sodium hexanoate)
6.2 Secondary goals
- [e.g. Retain 3HB incorporation >50 mol%]
- [e.g. Maintain thermostability — Tm drop <5°C acceptable]
6.3 Acceptable tradeoffs
- [e.g. Up to 30% reduction in overall polymerization activity]
- [e.g. Reduced expression yield acceptable if specificity goal is met]
6.4 Hard constraints — DO NOT VIOLATE
- Maximum simultaneous mutations: [N] (practical screening limit)
- Must retain soluble expression in E. coli
- Do not mutate catalytic triad residues (C319, D480, H508)
- Avoid dimer interface mutations (stability risk)
- [Add any others]
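These constraints are mechanical enough to check automatically before a suggestion reaches the bench. The sketch below assumes mutations are written in standard notation with Cn numbering, and that dimer-interface positions and previously tested combinations (Section 6.5) are supplied by you. The function name, parameter names, and message wording are illustrative.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")

def check_constraints(mutations, max_mutations, protected=(319, 480, 508),
                      avoid=(), already_tested=()):
    """Return a list of problems; an empty list means no violation was found."""
    problems = []
    if len(mutations) > max_mutations:
        problems.append(f"{len(mutations)} mutations exceeds limit of {max_mutations}")
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            problems.append(f"{mutation}: not in standard notation")
            continue
        position = int(match.group(2))
        if match.group(1) == match.group(3):
            problems.append(f"{mutation}: substitution equals wild type")
        if position in protected:
            problems.append(f"{mutation}: catalytic triad position {position}")
        if position in avoid:
            problems.append(f"{mutation}: position {position} is on the avoid list")
    # Order-independent comparison against previously tested combinations.
    tested = {frozenset(combo) for combo in already_tested}
    if frozenset(mutations) in tested:
        problems.append("combination already tested")
    return problems
```

For instance, `check_constraints(["C319A"], 3)` flags the catalytic Cys, while `check_constraints(["A149F"], 3)` returns an empty list. Soluble expression cannot be checked this way and still needs experimental confirmation.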
6.5 What has already been tested
(Critical — prevents the LLM from repeatedly suggesting things you’ve tried)
| Mutation(s) | Result | Date tested | Notes |
|---|---|---|---|
| F420S | 3HHx only 3% — insufficient | [date] | Tested in BL21, 30°C |
| [mut] | [result] | [date] | [notes] |
7. Production and Assay Context
7.1 Expression system
- Host: [e.g. E. coli BL21(DE3)]
- Vector: [e.g. pET-28a, His-tag]
- Expression conditions: [e.g. 25°C, 16h, 0.5 mM IPTG]
- Typical yield: [X mg/L culture]
7.2 PHA production conditions
- Carbon source(s): [e.g. 10 mM sodium butyrate + 5 mM sodium hexanoate]
- Co-pathway: [e.g. PhaA/PhaB co-expressed for 3HB-CoA supply; PhaJ for 3HHx-CoA]
- Growth phase at harvest: [e.g. 48h, stationary]
- PHA content typically: [X wt%]
7.3 Analytical method
- PHA extraction: [e.g. chloroform extraction, sodium hypochlorite method]
- Monomer analysis: [e.g. GC-FID after methanolysis, GC-MS for identification]
- Activity assay (if used): [e.g. DTNB assay monitoring CoA release]
- Throughput: [e.g. 24 variants per experiment]
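Monomer composition in mol% follows from the per-monomer molar amounts obtained after GC quantification. A minimal sketch, assuming peak areas have already been converted to moles using your own calibration standards; that conversion is not shown, and the function name is illustrative.

```python
def mol_percent(moles_by_monomer):
    """Convert per-monomer molar amounts into mol% of total detected monomer."""
    total = sum(moles_by_monomer.values())
    if total <= 0:
        raise ValueError("no monomer detected")
    return {monomer: 100.0 * amount / total
            for monomer, amount in moles_by_monomer.items()}
```

With hypothetical amounts of 8.5 µmol 3HB and 1.5 µmol 3HHx, the 3HHx fraction works out to 15 mol%. Any consistent molar unit works because only the ratios matter.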
8. Reasoning Guidelines for LLM
8.1 Prioritization criteria (in order)
- Mechanistic/structural plausibility — a rationale is required
- Consistency with experimental mutation database (Section 4)
- Conservation pattern in target-substrate homologs (Section 3.4)
- Novelty relative to literature
8.2 Required output format for mutation suggestions
For every suggested mutation, provide:
- (a) Mutation in standard notation (e.g. A149F)
- (b) Mechanistic rationale — why this residue, why this substitution
- (c) Supporting evidence — literature, alignment, structural
- (d) Confidence level: High / Medium / Low
- (e) Potential risks — stability, expression, off-target effects
- (f) Tag [SPECULATIVE] if based on analogy with no direct evidence
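If LLM suggestions are captured as structured records, the required fields (a) through (f) can be checked before a suggestion is logged. This is a minimal sketch, assuming each suggestion is stored as a dictionary with the keys shown; the key names and messages are illustrative, not a fixed schema.

```python
REQUIRED_TEXT_FIELDS = ("mutation", "rationale", "evidence", "risks")  # (a), (b), (c), (e)
CONFIDENCE_LEVELS = ("High", "Medium", "Low")                          # (d)

def format_issues(suggestion):
    """Return a list of format problems for one suggestion dictionary."""
    issues = [f"missing {name}" for name in REQUIRED_TEXT_FIELDS
              if not str(suggestion.get(name, "")).strip()]
    if suggestion.get("confidence") not in CONFIDENCE_LEVELS:
        issues.append("confidence must be High, Medium, or Low")
    # (f): the speculative tag must be stated explicitly, not left implicit.
    if not isinstance(suggestion.get("speculative"), bool):
        issues.append("speculative flag must be True or False")
    return issues

def render(suggestion):
    """One-line summary with the [SPECULATIVE] tag applied where flagged."""
    tag = " [SPECULATIVE]" if suggestion.get("speculative") else ""
    return f"{suggestion['mutation']}{tag} | confidence: {suggestion['confidence']}"
```

This only checks that each field is present and well formed; whether the rationale is mechanistically sound still needs your own review.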
8.3 Reasoning I do NOT want
- Suggestions based solely on “this residue differs between scl and mcl sequences” without structural or mechanistic reasoning
- Overconfident quantitative predictions (e.g. “this will give 20% 3HHx”)
- Suggestions that violate hard constraints in Section 6.4
- Ignoring the “already tested” table in Section 6.5
8.4 When hypotheses conflict
- Explicitly state the conflict and explain both sides
- Do not silently choose one; flag for experimental resolution
8.5 My background
[Describe your expertise so the LLM calibrates explanation depth, e.g.:]
PhD in microbiology/biochemistry. Comfortable with protein biochemistry and microbial fermentation. Less experienced with structural biology and computational methods — please explain structural reasoning in accessible terms but do not oversimplify the biochemistry.
9. Session Log
(Append after each LLM session — builds institutional memory)
Session [DATE]
Question asked: [Paste your prompt]
Key LLM output / hypotheses generated: [Summarize or paste]
Your assessment: [Which suggestions seem worth pursuing, which to discard and why]
Action items:
- [e.g. Test A149F single mutant]
- [e.g. Check position 171 in AlphaFold model]
Session [DATE]
(repeat block)
10. Experimental Results Log
(Append as data comes in — feeds back into Section 4 and future sessions)
Experiment [DATE / ID]
Variants tested:
| Variant | 3HB mol% | 3HV mol% | 3HHx mol% | Total PHA wt% | Notes |
|---|---|---|---|---|---|
| WT | [X] | [X] | [X] | [X] | Control |
| [mut] | [X] | [X] | [X] | [X] | [notes] |
Interpretation: [What do these results mean for your hypotheses?]
Updated hypotheses: [How do results change your model of specificity determinants?]
End of context document — keep this file updated and prepend it to every new LLM session