PhaC Enzyme Engineering — LLM Context Document

Version: v1.0
Date: [DATE]
Engineer: [YOUR NAME]
Project goal: [ONE SENTENCE SUMMARY, e.g. “Engineer Class I PhaC to incorporate 3HHx at >15 mol%”]


1. Enzyme Family Background

1.1 Classification

Class | Subunit structure | Size | Native substrate preference | Example organism
I | Single subunit | ~65 kDa | scl (C3–C5): 3HB, 3HV, 3HP | Cupriavidus necator H16
II | Single subunit | ~60 kDa | mcl (C6–C14): 3HHx, 3HO, 3HD | Pseudomonas aeruginosa
III | Heterodimer (PhaC + PhaE) | ~40 + ~40 kDa | scl | Allochromatium vinosum
IV | Heterodimer (PhaC + PhaR) | ~40 + ~20 kDa | scl | Bacillus megaterium
  • Class I and II share ~50% sequence identity; Class III/IV are more distantly related
  • Class I/II are the primary engineering targets for substrate specificity work

1.2 Reaction chemistry

  • Catalyzes polymerization of (R)-3-hydroxyacyl-CoA thioesters into PHA
  • Ping-pong (double displacement) mechanism:
    1. Acylation: acyl group transferred to catalytic Cys, CoA released
    2. Transacylation: acyl group transferred to growing polymer chain
  • Lipase-like α/β hydrolase fold
  • Catalytic triad: Cys – His – Asp
    • C. necator PhaC1 (Cn) reference numbering: C319, D480, H508

1.3 Substrate scope terminology

Term | Chain length | Key monomers | Notes
scl | C3–C5 | 3HP, 3HB, 3HV | Most Class I enzymes
mcl | C6–C14 | 3HHx, 3HO, 3HD, 3HDD | Most Class II enzymes
lcl | >C14 | 3HHxD+ | Very rare
Broad/mixed | C3–C14 | scl + mcl | Rare, high engineering value
Specialty | varies | 3H4MV, 3H2MB, aromatic | Non-standard monomers

1.4 Why substrate specificity is structurally interesting

  • Substrate-binding tunnel geometry determines acyl chain length tolerance
  • Residues within ~5–10 Å of the catalytic Cys (C319 in Cn numbering) are the most likely selectivity determinants
  • mcl selectivity often results from removal of steric clash (smaller residues), not addition of new contacts — counterintuitive but well-supported
  • Electrostatic environment affects CoA-thioester positioning
  • Dimerization interface indirectly influences active site geometry (Class I/II)
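
The ~5–10 Å shell around the catalytic Cys can be listed directly from a structure file. The following is a minimal sketch, assuming a standard fixed-column PDB file, that the Cys side-chain sulfur (atom name SG) is the reference point, and an 8 Å cutoff chosen from the middle of the range above. The function name `residues_near_cys` and its parameters are illustrative and not part of any library.

```python
import math

def residues_near_cys(pdb_text, cys_resnum, chain="A", cutoff=8.0):
    """Return sorted (resnum, resname) pairs that have at least one atom
    within `cutoff` angstroms of the SG atom of the Cys at `cys_resnum`."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only protein ATOM records on the requested chain are considered.
        if not line.startswith("ATOM") or len(line) < 54 or line[21] != chain:
            continue
        atoms.append({
            "name": line[12:16].strip(),
            "resname": line[17:20].strip(),
            "resnum": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    sg = next((a["xyz"] for a in atoms
               if a["resnum"] == cys_resnum and a["name"] == "SG"), None)
    if sg is None:
        raise ValueError(f"no SG atom found for residue {cys_resnum} on chain {chain}")
    near = set()
    for a in atoms:
        if a["resnum"] != cys_resnum and math.dist(sg, a["xyz"]) <= cutoff:
            near.add((a["resnum"], a["resname"]))
    return sorted(near)
```

Before reading positions off the output, check that the residue numbering in the structure file matches the Cn reference numbering used in this document.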

2. Structural Information

2.1 Available experimental structures

PDB ID | Enzyme | Class | Resolution | Notes
5T6O | C. necator PhaC1 | I | [X] Å | Primary Class I reference
4QO9 | Chromobacterium sp. USM2 PhaC | I | [X] Å | [verify PDB ID before use]
[ID] | [Enzyme] | [Class] | [Res] | [Notes]

2.2 AlphaFold models

UniProt accession | Organism | Class | pLDDT (overall) | Confidence notes
[ACCESSION] | [ORG] | [I/II] | [score] | [e.g. low in N-term, residues 1–40]

2.3 Key structural regions

(Using C. necator PhaC1 residue numbering as reference)

Region | Residues (Cn) | Function | Conservation
N-terminal domain | 1–170 | Regulatory, dimerization | Low
Core catalytic domain | 171–400 | Contains Cys319 | High
C-terminal domain | 401–589 | Contains Asp480, His508 | High
Substrate-binding tunnel | [list residues] | Selectivity determinant | Moderate
Dimer interface | [list residues] | Stability | Moderate

2.4 Known substrate-contacting / selectivity residues

(From mutagenesis studies and structural analyses — update as you find more)

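Position (Cn) | WT residue | Role | scl consensus | mcl consensus | Evidence
149 | Ala | Tunnel entrance | A/V (89%) | F/W (74%) | Mutagenesis
171 | [AA] | Structural hinge | | |
319 | Cys | Catalytic triad | C | C | Catalytic
325 | Ser | Tunnel lining | S/A | A/G |
392 | [AA] | Near active site | | |
480 | Asp | Catalytic triad | D | D | Catalytic
508 | His | Catalytic triad | H | H | Catalytic
[pos] | [AA] | [role] | | |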

2.5 Tunnel geometry notes

  • scl enzymes: narrower tunnel, estimated constriction ~4–6 Å
  • mcl enzymes: wider tunnel — bulky residues at key positions replaced by smaller ones (Ala, Gly) to accommodate longer acyl chains
  • [Add any MD simulation or docking notes here as available]

3. Sequence Dataset Summary

3.1 Dataset composition

  • Total sequences collected: [N]
  • After dereplication at 95% identity (CD-HIT): [N]
  • Labeled with substrate preference data: [N]
    • scl only: [N]
    • mcl only: [N]
    • broad/mixed: [N]
    • specialty monomer: [N]
  • Unlabeled (phylogenetic diversity only): [N]
  • Data sources: UniProt/SwissProt, NCBI RefSeq, literature

3.2 Taxonomic distribution

Taxon | N sequences | Dominant class | Notes
Betaproteobacteria | [N] | Class I | C. necator relatives
Gammaproteobacteria | [N] | Class II | Pseudomonas relatives
Alphaproteobacteria | [N] | Class I/III |
Firmicutes | [N] | Class IV |
Other | [N] | |

3.3 Alignment properties

  • Alignment method: [MUSCLE / Clustal Omega / MAFFT]
  • Raw aligned length: [N] columns
  • After gap trimming (columns with >80% gaps removed): [N] columns
  • Mean pairwise identity, full dataset: [X]%
  • Mean pairwise identity, scl group: [X]%
  • Mean pairwise identity, mcl group: [X]%
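
The trimming and identity figures above can be computed with a few lines of code. This is a minimal sketch, assuming the alignment is already loaded as a list of equal-length strings with "-" as the gap character. Identity is counted only over columns where both sequences have a residue, which is one of several conventions in use; record whichever you choose. All function names are illustrative.

```python
from itertools import combinations

def trim_gappy_columns(seqs, max_gap_fraction=0.8):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction."""
    n = len(seqs)
    keep = [i for i in range(len(seqs[0]))
            if sum(s[i] == "-" for s in seqs) / n <= max_gap_fraction]
    return ["".join(s[i] for i in keep) for s in seqs]

def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def mean_pairwise_identity(seqs):
    """Mean identity, in percent, over all unordered sequence pairs."""
    scores = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    return 100.0 * sum(scores) / len(scores)
```

Running `mean_pairwise_identity` separately on the scl and mcl subsets gives the per-group values listed above.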

3.4 Top mutual information positions

(Fill in after running Option 2 MI analysis)

Alignment col | Approx. residue (Cn) | scl consensus | mcl consensus | MI score
[col] | ~[res] | [AA (%)] | [AA (%)] | [score]
[col] | ~[res] | [AA (%)] | [AA (%)] | [score]
[col] | ~[res] | [AA (%)] | [AA (%)] | [score]
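
One common way to score columns for this table is the mutual information between the residue in a column and the substrate label of each sequence. The sketch below assumes that definition, which may differ from the analysis this section refers to, and assumes labels such as "scl" and "mcl" supplied in the same order as the sequences. Function names are illustrative.

```python
import math
from collections import Counter

def column_label_mi(column, labels):
    """Mutual information (bits) between residues in one alignment column
    and the substrate labels of the same sequences; gaps are ignored."""
    pairs = [(res, lab) for res, lab in zip(column, labels) if res != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    res_counts = Counter(res for res, _ in pairs)
    lab_counts = Counter(lab for _, lab in pairs)
    mi = 0.0
    for (res, lab), count in joint.items():
        # p(res, lab) * log2( p(res, lab) / (p(res) * p(lab)) )
        mi += (count / n) * math.log2(count * n / (res_counts[res] * lab_counts[lab]))
    return mi

def rank_columns(seqs, labels, top=3):
    """Return the `top` (column_number, mi) pairs, highest MI first (1-based columns)."""
    scored = [(i + 1, column_label_mi([s[i] for s in seqs], labels))
              for i in range(len(seqs[0]))]
    return sorted(scored, key=lambda item: -item[1])[:top]
```

Raw MI is biased upward for small label groups and does not correct for shared ancestry, so a high score is a prompt for structural inspection, not evidence of a selectivity role.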

4. Experimental Mutation Database

(This section should grow over time as you mine the literature and generate your own data)

4.1 Key literature to mine

  • Tsuge et al. (2003) Macromolecules — F420 region, Class I
  • Amara et al. (2002) — systematic Class I mutagenesis
  • Rehm lab series — Class II specificity determinants
  • Nomura et al. — broad-specificity engineered variants
  • Insomphun et al. — 3HHx incorporation engineering
  • [Add others as you find them]

4.2 Gain-of-function mutations (toward mcl/broader specificity)

Mutation | Background | Substrate effect | Quantitative data | Assay type | Reference
F420S | Cn PhaC1 | Gains 3HHx incorporation | 3HHx: 0 → 8 mol% | In vivo GC | Tsuge 2003
A510S | Cn PhaC1 | Increased mcl acceptance | | | Amara 2002
[mut] | [bg] | [effect] | [data] | [assay] | [ref]

4.3 Loss-of-function / specificity-narrowing mutations

Mutation | Background | Substrate effect | Quantitative data | Assay type | Reference
[mut] | [bg] | [effect] | [data] | [assay] | [ref]

4.4 Combinatorial / double mutants

Mutations | Background | Effect vs. singles | Epistasis | Reference
F420S + A510S | Cn PhaC1 | [effect] | Additive / synergistic / antagonistic | [ref]
[muts] | [bg] | [effect] | [epistasis] | [ref]
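
To fill the Epistasis column consistently, it helps to fix one rule for what counts as additive. The sketch below assumes an additive null model on the raw measurement scale (for example 3HHx mol%) and a hypothetical tolerance `tol` that should be set from your own replicate variability. The labels are only meaningful when both single mutants shift the measurement in the desired direction.

```python
def classify_epistasis(wt, single_a, single_b, double, tol=1.0):
    """Compare a double mutant with the sum of its single-mutant effects.

    All inputs are the same measurement (e.g. 3HHx mol%) in the same background.
    Returns (label, epsilon), where epsilon = observed - expected.
    """
    # Additive expectation: WT value plus each single-mutant shift.
    expected = wt + (single_a - wt) + (single_b - wt)
    epsilon = double - expected
    if abs(epsilon) <= tol:
        label = "additive"
    elif epsilon > 0:
        label = "synergistic"
    else:
        label = "antagonistic"
    return label, epsilon
```

As an illustration with made-up numbers, `classify_epistasis(0, 3, 2, 9)` compares an observed 9 against an expected 5 and returns a synergistic call with epsilon 4.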

4.5 Thermostability mutations

(Relevant if stacking specificity mutations — need to preserve stability)

Mutation | Background | ΔTm | Effect on activity | Reference
[mut] | [bg] | [+/- X°C] | [effect] | [ref]

4.6 Notes on data quality and comparability

  • Monomer incorporation % varies heavily with fermentation conditions (carbon source, growth phase, host strain) — cross-lab comparisons are unreliable
  • In vitro assays (purified enzyme + CoA thioesters) are more reliable for intrinsic specificity than in vivo PHA production titers
  • [Add any other caveats specific to your dataset]

5. Your Starting Enzyme (Wild-Type)

5.1 Identity

  • UniProt accession: [ID]
  • Organism: [NAME]
  • PhaC class: [I / II / III / IV]
  • Gene name: [phaC / phaC1 / phaC2]
  • Full sequence length: [N] aa

5.2 Known properties

  • Native substrate preference: [e.g. scl — 3HB/3HV, negligible 3HHx]
  • Specific activity: [X nmol/min/mg if known]
  • Thermostability: [Tm or optimal temperature]
  • Expression: [e.g. soluble in E. coli BL21 at 25°C, typical yield X mg/L]
  • Any known issues: [e.g. prone to aggregation, requires CoA for stability]

5.3 Sequence — full

[PASTE FULL AMINO ACID SEQUENCE HERE]

5.4 Sequence — substrate-binding pocket region

(~30 residues centered on catalytic Cys; easier to include in prompts)

[PASTE POCKET REGION SEQUENCE HERE — label residue numbers]
e.g. residues 305–335: XXXXXXXXXXXXXXCXXXXXXXXXXXXXXXX
                                     ^C319
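
The labelled pocket excerpt can be generated from the full sequence so that the window and the caret marker always agree. This is a minimal sketch, assuming 1-based residue numbering and a symmetric flank of 15 residues (about 30 in total); `pocket_window` is an illustrative name.

```python
def pocket_window(sequence, cys_position, flank=15):
    """Return (start, end, segment, marker) for a window centred on the catalytic Cys.

    Positions are 1-based. `marker` is a caret line that aligns with `segment`
    when both are printed from the same starting column.
    """
    if sequence[cys_position - 1] != "C":
        raise ValueError(f"residue {cys_position} is {sequence[cys_position - 1]}, not Cys")
    start = max(1, cys_position - flank)
    end = min(len(sequence), cys_position + flank)
    segment = sequence[start - 1:end]
    marker = " " * (cys_position - start) + f"^C{cys_position}"
    return start, end, segment, marker
```

Printing `segment` and `marker` on consecutive lines reproduces the layout of the example above. The residue check guards against pasting a sequence whose numbering is offset by a tag or signal peptide.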

5.5 Alignment position mapping

(Map your WT residue numbers to the C. necator reference numbering and to your alignment column numbers — critical for interpreting suggestions)

Your residue | Your AA | Cn equivalent residue | Alignment column
[N] | [AA] | [N] | [col]
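
The rows of this mapping table can be derived from a pairwise or multiple alignment containing both your enzyme and the Cn reference. The sketch below assumes the two aligned rows are given as equal-length strings with "-" for gaps, and that both sequences are full length so numbering starts at residue 1. `map_residues` is an illustrative name.

```python
def map_residues(aligned_query, aligned_ref):
    """Map query residues to reference residues through a shared alignment.

    Returns a list of (query_resnum, query_aa, ref_resnum, column) tuples,
    one per query residue; ref_resnum is None where the reference has a gap.
    Residue numbers and columns are 1-based.
    """
    rows = []
    query_num = ref_num = 0
    for column, (q, r) in enumerate(zip(aligned_query, aligned_ref), start=1):
        if q != "-":
            query_num += 1
        if r != "-":
            ref_num += 1
        if q != "-":
            rows.append((query_num, q, ref_num if r != "-" else None, column))
    return rows
```

If either sequence in the alignment is truncated or carries a tag, add the corresponding offset before copying numbers into the table.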

6. Engineering Target

6.1 Primary goal

[State precisely, e.g.:]

Incorporate 3HHx (C6) at >15 mol% in scl-mcl copolymer produced in E. coli BL21 on mixed carbon source (sodium butyrate + sodium hexanoate)

6.2 Secondary goals

  • [e.g. Retain 3HB incorporation >50 mol%]
  • [e.g. Maintain thermostability — Tm drop <5°C acceptable]

6.3 Acceptable tradeoffs

  • [e.g. Up to 30% reduction in overall polymerization activity]
  • [e.g. Reduced expression yield acceptable if specificity goal is met]

6.4 Hard constraints — DO NOT VIOLATE

  • Maximum simultaneous mutations: [N] (practical screening limit)
  • Must retain soluble expression in E. coli
  • Do not mutate catalytic triad residues (C319, D480, H508 in Cn numbering, or their equivalents in your enzyme per Section 5.5)
  • Avoid dimer interface mutations (stability risk)
  • [Add any others]

6.5 What has already been tested

(Critical — prevents the LLM from repeatedly suggesting things you’ve tried)

Mutation(s) | Result | Date tested | Notes
F420S | 3HHx only 3 mol%; insufficient | [date] | Tested in BL21, 30°C
[mut] | [result] | [date] | [notes]
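
Suggestions returned by the LLM can be screened automatically against the hard constraints in Section 6.4 and the table above before anyone spends bench time on them. This is a minimal sketch: the names `parse_mutation` and `check_variant` are illustrative, the protected positions default to the Cn catalytic triad, and dimer-interface positions would need to be added to `protected` once they are known.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")

def parse_mutation(text):
    """Split standard notation such as 'A149F' into (wild_type, position, new)."""
    match = MUTATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in standard notation: {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def check_variant(mutations, max_mutations, protected=(319, 480, 508), already_tested=()):
    """Return a list of constraint violations for one proposed variant.

    `mutations` is a list of strings; `already_tested` is an iterable of
    mutation combinations, each itself a tuple or list of strings.
    """
    problems = []
    if len(mutations) > max_mutations:
        problems.append(f"{len(mutations)} mutations exceeds limit of {max_mutations}")
    for text in mutations:
        wild_type, position, new = parse_mutation(text)
        if position in protected:
            problems.append(f"{text}: position {position} is protected")
        if wild_type == new:
            problems.append(f"{text}: substitution does not change the residue")
    # Combinations are compared as sets, so the order of mutations does not matter.
    tested = {frozenset(combo) for combo in already_tested}
    if frozenset(mutations) in tested:
        problems.append("combination already tested")
    return problems
```

An empty list means the variant passes these checks. It says nothing about solubility or stability, which still need experimental confirmation.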

7. Production and Assay Context

7.1 Expression system

  • Host: [e.g. E. coli BL21(DE3)]
  • Vector: [e.g. pET-28a, His-tag]
  • Expression conditions: [e.g. 25°C, 16h, 0.5 mM IPTG]
  • Typical yield: [X mg/L culture]

7.2 PHA production conditions

  • Carbon source(s): [e.g. 10 mM sodium butyrate + 5 mM sodium hexanoate]
  • Co-pathway: [e.g. PhaA/PhaB co-expressed for 3HB-CoA supply; PhaJ for 3HHx-CoA]
  • Growth phase at harvest: [e.g. 48h, stationary]
  • Typical PHA content: [X wt%]

7.3 Analytical method

  • PHA extraction: [e.g. chloroform extraction, sodium hypochlorite method]
  • Monomer analysis: [e.g. GC-FID after methanolysis, GC-MS for identification]
  • Activity assay (if used): [e.g. DTNB assay monitoring CoA release]
  • Throughput: [e.g. 24 variants per experiment]

8. Reasoning Guidelines for LLM

8.1 Prioritization criteria (in order)

  1. Mechanistic/structural plausibility — a rationale is required
  2. Consistency with experimental mutation database (Section 4)
  3. Conservation pattern in target-substrate homologs (Section 3.4)
  4. Novelty relative to literature

8.2 Required output format for mutation suggestions

For every suggested mutation, provide:

  • (a) Mutation in standard notation (e.g. A149F)
  • (b) Mechanistic rationale — why this residue, why this substitution
  • (c) Supporting evidence — literature, alignment, structural
  • (d) Confidence level: High / Medium / Low
  • (e) Potential risks — stability, expression, off-target effects
  • (f) Tag [SPECULATIVE] if based on analogy with no direct evidence
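
If LLM output is stored for later comparison, the fields (a) to (f) above map naturally onto a small record type. The following is a sketch of one possible structure, not a required format; the class name `MutationSuggestion` and its validation rules are assumptions layered on the list above.

```python
from dataclasses import dataclass

CONFIDENCE_LEVELS = ("High", "Medium", "Low")

@dataclass
class MutationSuggestion:
    mutation: str        # (a) standard notation, e.g. "A149F"
    rationale: str       # (b) why this residue and this substitution
    evidence: list       # (c) literature, alignment, or structural support
    confidence: str      # (d) one of CONFIDENCE_LEVELS
    risks: list          # (e) stability, expression, off-target effects
    speculative: bool = False  # (f) analogy only, no direct evidence

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
        if not self.rationale.strip():
            raise ValueError("a mechanistic rationale is required")
        if not self.evidence and not self.speculative:
            raise ValueError("suggestions without evidence must be tagged speculative")

    def render(self):
        """Format the record in the (a)-(e) layout, with the tag from (f) on the first line."""
        tag = " [SPECULATIVE]" if self.speculative else ""
        return "\n".join([
            f"(a) Mutation: {self.mutation}{tag}",
            f"(b) Rationale: {self.rationale}",
            f"(c) Evidence: {'; '.join(self.evidence) or 'none'}",
            f"(d) Confidence: {self.confidence}",
            f"(e) Risks: {'; '.join(self.risks) or 'none stated'}",
        ])
```

Rejecting records at construction time makes incomplete suggestions visible immediately, which is useful when the same prompt is run across many sessions.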

8.3 Reasoning I do NOT want

  • Suggestions based solely on “this residue differs between scl and mcl sequences” without structural or mechanistic reasoning
  • Overconfident quantitative predictions (e.g. “this will give 20% 3HHx”)
  • Suggestions that violate hard constraints in Section 6.4
  • Ignoring the “already tested” table in Section 6.5

8.4 When hypotheses conflict

  • Explicitly state the conflict and explain both sides
  • Do not silently choose one; flag for experimental resolution

8.5 My background

[Describe your expertise so the LLM calibrates explanation depth, e.g.:]

PhD in microbiology/biochemistry. Comfortable with protein biochemistry and microbial fermentation. Less experienced with structural biology and computational methods — please explain structural reasoning in accessible terms but do not oversimplify the biochemistry.


9. Session Log

(Append after each LLM session — builds institutional memory)

Session [DATE]

Question asked: [Paste your prompt]

Key LLM output / hypotheses generated: [Summarize or paste]

Your assessment: [Which suggestions seem worth pursuing, which to discard and why]

Action items:

  • [e.g. Test A149F single mutant]
  • [e.g. Check position 171 in AlphaFold model]

Session [DATE]

(repeat block)


10. Experimental Results Log

(Append as data comes in — feeds back into Section 4 and future sessions)

Experiment [DATE / ID]

Variants tested:

Variant | 3HB mol% | 3HV mol% | 3HHx mol% | Total PHA wt% | Notes
WT | [X] | [X] | [X] | [X] | Control
[mut] | [X] | [X] | [X] | [X] | [notes]

Interpretation: [What do these results mean for your hypotheses?]

Updated hypotheses: [How do results change your model of specificity determinants?]


End of context document — keep this file updated and prepend it to every new LLM session