PhaC Enzyme Engineering — LLM Context Document
Version: v2.2
Date: 2026.05.12
Engineer: JS
Project goal: Engineer PhaC_Cn for increased PHB production.
Dataset scope note: This document is built around a single reference enzyme (C. necator PhaC1) and published mutagenesis studies on that enzyme and its close variants. It does NOT use diverse multi-species sequence alignments. See Section 3 for implications and compensating strategies.
0. IMPORTANT
0.1 General
- All sequences given or referred to here are amino acid sequences.
- We are using the wild-type enzyme PhaC from Cupriavidus necator (PhaC_Cn) as the starting point.
- PhaC is a polyhydroxyalkanoate synthase enzyme that polymerizes monomers into polyhydroxyalkanoate polymers.
- The preferred product of PhaC_Cn is poly-3-hydroxybutyrate (PHB), which is built from the monomer 3-hydroxybutyrate (3HB).
0.2 Notation
- PhaC_Cn is the wild-type sequence from Cupriavidus necator, also called Cn PhaC1.
- All mutations are notated in the form of AXB, where A is the single letter code for an amino acid in PhaC_Cn, X is the amino acid position index number in PhaC_Cn, and B is the single letter code for the amino acid substituted in this mutation. Example: A510T is the C. necator wild-type amino acid sequence for PhaC with a single amino acid substitution from alanine to threonine at position 510.
- C. necator is sometimes referred to as Ralstonia eutropha.
- PHB is poly-3-hydroxybutyrate, sometimes also called polyhydroxybutyrate or poly[(R)-3-hydroxybutyrate].
- Both the single letter codes and three letter codes for amino acids are used throughout this document.
- PhaC_Cs is the wild-type sequence from Chromobacterium sp. USM2 (Class I, 42% pairwise identity with PhaC_Cn).
- PhaC_Ps is the wild-type sequence from Pseudomonas sp. 61-3, also called Ps PhaC1 (Class II, 67% pairwise identity with PhaC_Cn).
- PhaC_Ac is the wild-type sequence from Aeromonas caviae (Class I, 37% pairwise identity with PhaC_Cn).
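The AXB notation defined above can be checked mechanically before a mutation is entered into the tables in Section 4. The following is a minimal sketch, not part of any existing pipeline: the names `Mutation`, `parse_mutation`, and `apply_mutation` are hypothetical, and the five-residue sequence in the usage note is a toy stand-in for the real PhaC_Cn sequence.

```python
import re
from typing import NamedTuple

_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_AXB_PATTERN = re.compile(rf"^([{_AMINO_ACIDS}])(\d+)([{_AMINO_ACIDS}])$")


class Mutation(NamedTuple):
    wt: str        # wild-type residue, single-letter code
    position: int  # 1-based position in PhaC_Cn numbering
    new: str       # substituted residue, single-letter code


def parse_mutation(label: str) -> Mutation:
    """Parse an AXB label such as 'A510T' into its three parts."""
    match = _AXB_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not in AXB notation: {label!r}")
    wt, position, new = match.groups()
    return Mutation(wt, int(position), new)


def apply_mutation(sequence: str, mutation: Mutation) -> str:
    """Return the mutated sequence after confirming the wild-type residue matches."""
    index = mutation.position - 1  # convert 1-based numbering to a 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {mutation.position} is outside the sequence")
    if sequence[index] != mutation.wt:
        raise ValueError(
            f"expected {mutation.wt} at {mutation.position}, found {sequence[index]}"
        )
    return sequence[:index] + mutation.new + sequence[index + 1:]
```

For example, `apply_mutation("MACDE", parse_mutation("C3S"))` gives `"MASDE"` on the toy sequence. The wild-type check is the useful part: it catches labels that were transferred from PhaC_Cs or PhaC_Ac numbering without being re-indexed to PhaC_Cn.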
1. Enzyme Family Background
1.1 Classification
| Class | Subunit structure | Size | Native substrate preference | Example organism |
|---|---|---|---|---|
| I | Single subunit | ~65 kDa | scl (C3–C5): 3HB, 3HV, 3HP | Cupriavidus necator H16 |
| II | Single subunit | ~60 kDa | mcl (C6–C14): 3HHx, 3HO, 3HD | Pseudomonas sp. 61-3 |
- There are four classes, but we are not considering Classes III and IV at this point.
- Class I and II share ~50% sequence identity; Class III/IV are more distantly related.
- This project focuses exclusively on Class I, using Cn PhaC1 as the sole reference.
1.2 Reaction chemistry
- Catalyzes polymerization of (R)-3-hydroxyacyl-CoA thioesters into PHA
- Ping-pong (double displacement) mechanism:
- Acylation: acyl group transferred to catalytic Cys, CoA released
- Transacylation: acyl group transferred to growing polymer chain
- Lipase-like α/β hydrolase fold
- Catalytic triad: Cys – His – Asp
- C. necator PhaC1 (Cn) reference numbering: C319, D480, H508
- All residue positions in this document use Cn PhaC1 numbering unless noted
1.3 Substrate scope terminology
| Term | Chain length | Key monomers | Notes |
|---|---|---|---|
| scl | C3–C5 | 3HP, 3HB, 3HV | Native Cn PhaC1 preference |
| mcl | C6–C14 | 3HHx, 3HO, 3HD, 3HDD | Most Class II enzymes |
| Broad/mixed | C3–C14 | scl + mcl | Often from engineered enzymes — rare |
| Specialty | varies | 3H4MV, 3H2MB, aromatic | Non-standard monomers, almost exclusively from engineered enzymes |
- We are primarily interested in scl, specifically in 3HB because my hypothesis is that increasing substrate specificity for 3HB will increase PHB production.
1.4 Why substrate specificity is structurally interesting
- Substrate-binding tunnel geometry determines acyl chain length tolerance
- Residues within ~5–10 Å of catalytic Cys are primary selectivity determinants
- mcl selectivity often results from removal of steric clash (smaller residues), not addition of new contacts — counterintuitive but well-supported
- Electrostatic environment affects CoA-thioester positioning
- Dimerization interface indirectly influences active site geometry (Class I/II enzymes)
- The N-terminal domain is not well conserved; it has been suggested to play a role in substrate selection.
2. Structural Information
2.1 Experimental structures for reference
| PDB ID | Enzyme | Class | Resolution | Notes |
|---|---|---|---|---|
| 5T6O | C. necator PhaC1 | I | 1.8 Å | Primary reference, this structure contains the catalytic domain only |
| 5XAV | Chromobacterium sp. USM2 PhaC | I | 1.48 Å | additional Class I reference |
2.2 AlphaFold model for Cn PhaC1
- UniProt accession: P23608
- Overall pLDDT: 85.94
- Confidence notes: high confidence in core domain, low in N-terminal region residues 1–61
- Use AF model for: loop conformations, surface regions not in crystal structure
- Prefer crystal structure (5T6O) for: active site geometry, tunnel dimensions
2.3 Key structural regions (Cn PhaC1 numbering)
| Region | Residues | Function | Notes |
|---|---|---|---|
| N-terminal domain | 1–200 | Regulatory, dimerization, possibly involved in substrate selection | Less conserved, lower structure confidence; changes in positions 153 and 175 could affect substrate selection |
| Core catalytic domain | 200–400 | Contains Cys319 | High confidence |
| C-terminal domain | 401–589 | Contains Asp480, His508 | High confidence |
| Substrate-binding tunnel | Arg398, His481 | Selectivity determinant | channel is ∼18 Å in length, leading into C319. |
| Dimer interface | 70-88 | Stability | Avoid mutations here |
| Product-egress route | Ser201, Asp421 | PHB product egress | Avoid mutations here. The product channel is lined by a series of hydrophobic residues and leads from the active site to the protein surface at a ∼95° angle to the proposed substrate entrance channel. It extends ∼12.5 Å away from the β-sheet core of the catalytic domain and widens into a small solvent pocket near the surface, next to the two noted residues. |
2.4 Substrate-binding tunnel residues
(Fill this table carefully — it is the core of your structural reasoning)
| Position | WT residue | Role in tunnel | Notes |
|---|---|---|---|
| 398 | Arg | Entrance region | strictly conserved in Class I enzymes |
| 481 | His | Entrance region | highly conserved in Class I enzymes; a mutagenesis study showed that H481Q retains only ~20% of wild-type activity |
How to fill this table: Open 5T6O in PyMOL or ChimeraX, then run the following PyMOL selection, which picks every residue with at least one atom within 10 Å of residue 319:
select tunnel_res, byres (all within 10 of resi 319)
List those residues here with their distances to C319. This is worth spending 1–2 hours on; it will substantially improve LLM reasoning quality.
NOTE: Fill Section 2.5 when you have time to go through the structure in PyMOL. Sections 2.3 and 2.4 were filled from the literature.
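If PyMOL is not at hand, the same 10 Å neighbourhood can be approximated directly from the coordinate file. The sketch below is a dependency-free illustration, assuming a standard fixed-column PDB file and that the chain of interest is labelled "A" (an assumption that should be checked against 5T6O); the function names are hypothetical. It reports the minimum atom-to-atom distance between each residue and residue 319, which is the same criterion the `byres ... within 10` selection applies.

```python
import math
from collections import defaultdict


def parse_atoms(pdb_lines, chain="A"):
    """Group atom coordinates by residue from fixed-column PDB ATOM records."""
    residues = defaultdict(list)  # (residue number, residue name) -> [(x, y, z), ...]
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 54 or line[21] != chain:
            continue
        key = (int(line[22:26]), line[17:20].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[key].append(xyz)
    return residues


def residues_near(pdb_lines, center=319, cutoff=10.0, chain="A"):
    """List residues with any atom within `cutoff` angstroms of the center residue."""
    residues = parse_atoms(pdb_lines, chain)
    center_atoms = [
        xyz for (number, _), atoms in residues.items() if number == center
        for xyz in atoms
    ]
    if not center_atoms:
        raise ValueError(f"residue {center} not found in chain {chain}")
    hits = []
    for (number, name), atoms in residues.items():
        if number == center:
            continue
        distance = min(math.dist(a, c) for a in atoms for c in center_atoms)
        if distance <= cutoff:
            hits.append((number, name, round(distance, 2)))
    return sorted(hits, key=lambda hit: hit[2])  # closest residues first


# Usage sketch (the file name is a placeholder for a local copy of 5T6O):
# with open("5t6o.pdb") as handle:
#     for number, name, distance in residues_near(handle):
#         print(number, name, distance)
```

The output rows map directly onto the Position, WT residue, and Notes columns of the table in 2.4. Distances to the Cys319 side chain alone would need a filter on atom names, which this sketch does not attempt.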
2.5 Tunnel geometry notes
- Estimated tunnel constriction in WT Cn PhaC1: ~[X] Å (from structural analysis)
- Residues that form the constriction point: [list]
- Estimated minimum cavity volume for 3HHx-CoA accommodation: [X ų if known]
- [Add MD simulation or docking results here as they become available]
3. Dataset Scope, Limitations, and Compensating Strategies
This section is critical. Read before every LLM session.
3.1 What your dataset contains
- Reference enzyme: C. necator H16 PhaC1 (wild-type)
- Variants: Published point mutants of PhaC_Cn from the mutagenesis literature, plus published point mutants of PhaC_Cs and PhaC_Ac that are predicted to affect substrate specificity. The latter were mapped onto PhaC_Cn numbering by pairwise alignment.
- Labels: Substrate incorporation data from those studies
- What it does NOT contain: Homologous PhaC sequences from other species, Class II sequences, or unlabeled natural variants
3.2 Implications and honest limitations
| Issue | Explanation | Impact |
|---|---|---|
| No evolutionary signal | Without a multi-species alignment, you cannot use co-evolutionary analysis (MI, DCA) to identify specificity-determining positions | Cannot compute MI scores; Section 3.4 of the original template is not applicable |
| Narrow sequence space | All data points are close variants of one sequence (1–2 substitutions from WT, plus one N-terminal deletion) | Model cannot extrapolate to distant sequence space; suggestions far from WT are unreliable |
| Sparse coverage | Published mutagenesis studies cover only a small fraction of all possible positions | Many positions have no experimental data; reasoning about them is purely structural/hypothetical |
| Publication bias | Literature overwhelmingly reports positive results (mutations that did something interesting) | Negative results (mutations with no effect) are underrepresented; hard to learn what doesn’t matter |
| Lab-to-lab variability | Different studies use different assay conditions, hosts, carbon sources | Quantitative comparisons across studies are unreliable |
| Limited combinatorial data | Few studies systematically explore epistatic interactions | Combining individually beneficial mutations may not be additive |
3.3 What this dataset IS good for
- The LLM can reason very effectively about:
- Mechanistic hypotheses — why does mutation X change specificity, based on structure and chemistry?
- Interpreting your experimental results — what does an unexpected outcome tell you about the mechanism?
- Experimental design — which mutations to test next given what is known?
- Identifying gaps — which positions have never been mutated but are structurally important?
- Literature synthesis — connecting observations across papers into a coherent mechanistic model
3.4 Compensating strategies
To partially offset the lack of multi-species alignment data:
- Lean heavily on structural reasoning (Section 2): fill in the tunnel residue table as completely as possible; this replaces alignment signal as your primary source of positional hypotheses.
- Include Class II reference data explicitly: even if not in your training set, you can add a “comparative note” section describing which Cn PhaC1 positions correspond to Class II residues (from manual alignment of just Cn PhaC1 vs. Ps PhaC1). This gives the LLM evolutionary context without requiring a full MSA.
- Weight negative results equally to positive: if you can find papers reporting mutations that failed to shift specificity, record them in Section 4.3. They are highly informative and rare in the literature.
- Be explicit about data gaps in prompts: tell the LLM “position X has never been mutated in the literature” so it flags its reasoning as structural/hypothetical rather than evidence-based.
- Use the LLM to propose positions to structurally analyze: ask it which tunnel residues it would prioritize examining in the crystal structure, then verify those manually before including them in subsequent prompts.
3.5 Class II reference comparison
(Manual alignment of Cn PhaC1 against a single Class II enzyme, Ps PhaC1; this fills in some evolutionary context without a full MSA)
| Cn PhaC1 residue | Cn AA | Ps PhaC1 equivalent residue | Ps AA | Significance |
|---|---|---|---|---|
| 153 | D | 130 | E | N-terminal position predicted to affect selectivity |
| 175 | G | 151 | G | N-terminal position predicted to affect selectivity |
| 201 | S | 179 | G | PHB egress channel |
| 319 | C | 296 | C | catalytic triad residue C |
| 398 | R | 370 | R | Substrate tunnel entrance |
| 421 | D | 393 | D | PHB egress channel |
| 480 | D | 451 | D | catalytic triad residue D |
| 481 | H | 452 | H | Substrate tunnel entrance |
| 508 | H | 479 | H | catalytic triad residue H |
How to fill this: Use a pairwise alignment tool (e.g. EMBOSS Needle at https://www.ebi.ac.uk/Tools/psa/emboss_needle/) with Cn PhaC1 (UniProt P23608) and Ps PhaC1 (UniProt Q9Z3Y1). This takes ~10 minutes and is worth doing. The table above was built from an alignment run in Benchling.
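Once a pairwise alignment has been exported, the residue correspondence in the table can be read off programmatically rather than by eye. The sketch below assumes the two gapped alignment rows are available as plain strings of equal length with "-" as the gap character; `map_positions` is a hypothetical helper name, and the sequences in the usage note are toy strings, not real PhaC data.

```python
def map_positions(aligned_ref, aligned_other, gap="-"):
    """Map 1-based reference positions to (ref AA, other position, other AA).

    Reference residues aligned to a gap map to (ref AA, None, None).
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned rows must have the same length")
    mapping = {}
    ref_pos = 0
    other_pos = 0
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if other_char != gap:
            other_pos += 1  # advance numbering in the second sequence
        if ref_char == gap:
            continue  # insertion relative to the reference: nothing to record
        ref_pos += 1
        if other_char == gap:
            mapping[ref_pos] = (ref_char, None, None)
        else:
            mapping[ref_pos] = (ref_char, other_pos, other_char)
    return mapping
```

With the toy rows `"MA-CDE"` and `"M-KC-E"`, `map_positions(...)[3]` returns `("C", 3, "C")` and position 2 maps to `("A", None, None)`. Applied to a real Cn PhaC1 vs. Ps PhaC1 alignment, looking up 153, 175, 201, 319, 398, 421, 480, 481, and 508 would give an independent check of each row in the table.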
4. Experimental Mutation Database
(This is the heart of your dataset — populate as completely as possible)
4.1 Key literature to mine for Cn PhaC1 mutations
- Tsuge et al. (2003) Macromolecules — F420 region, systematic Class I
- Amara et al. (2002) — systematic Class I mutagenesis panel
- Rehm et al. — early mechanistic mutagenesis
- Nomura et al. — broad-specificity engineering attempts
- Insomphun et al. — 3HHx incorporation focus
- Hiroe et al. — combinatorial mutagenesis
- [Add others as you find them — search PubMed: “PhaC mutagenesis” OR “polyhydroxyalkanoate synthase substrate specificity”]
- Taguchi et al (2002) https://academic.oup.com/jb/article-abstract/131/6/801/785238?redirectedFrom=fulltext
- https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2021.627082/full
- https://openaccess.wgtn.ac.nz/articles/thesis/Toward_Engineering_the_Substrate_Specificity_of_a_PHA_Synthase_PhaC_/17152079?file=31714520
- https://onlinelibrary.wiley.com/doi/abs/10.1002/mabi.200400075
- https://journals.asm.org/doi/full/10.1128/aem.00564-13
- https://www.sciencedirect.com/science/article/pii/S0021925820346068
- https://www.sciencedirect.com/science/article/abs/pii/S1369703X08000909
Mining tip: For each paper, extract: (1) every mutation tested, including ones with no effect — these are just as valuable, (2) exact assay conditions, (3) quantitative data where reported. Even a table footnote saying “A300G showed no change in specificity” belongs here.
4.2 Gain-of-function mutations (change in substrate specificity or activity)
| Mutation | Effect | Activity vs WT | Reference | Notes |
|---|---|---|---|---|
| F420S | Increased 3HB specificity | increase | Taguchi et al 2002 | 2.4-fold increase in specific activity towards 3HB; this differs from studies on other PhaC enzymes but is correct here |
| F318Y | Increased mcl incorporation | no data | Harada et al 2021 | predicted from PhaC_Ac mutagenesis |
| Y440L | Increased mcl incorporation | no data | Harada et al 2021 | predicted from PhaC_Ac mutagenesis |
| R101L | Allowed aromatic monomer incorporation | no data | Kane 2021 | predicted from PhaC_Cs mutagenesis, possible false positive |
| A510D | Increased molecular weight of polymer produced | no data | Tsuge et al 2004 | |
| A510E | Increased molecular weight of polymer produced | no data | Tsuge et al 2004 | |
| A510M | Increased mcl incorporation | no data | Tsuge et al 2004 | |
| A510Q | Increased mcl incorporation | no data | Tsuge et al 2004 | |
| A510C | Increased mcl incorporation | no data | Tsuge et al 2004 | |
| A510G | Increased mcl incorporation | increased activity | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510W | no change | increased activity | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510S | Increased mcl incorporation | no change | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510T | Increased mcl incorporation | no change | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
4.3 Neutral mutations (no significant effect on specificity)
(Underrepresented in literature but critically important — record every instance you can find)
| Mutation | Region | Why tested | Outcome | Reference | Note |
|---|---|---|---|---|---|
| A510H | A510 | A510 mutations known to have effect | No change in specificity | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510I | A510 | A510 mutations known to have effect | No change in specificity | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510P | A510 | A510 mutations known to have effect | No change in specificity | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510V | A510 | A510 mutations known to have effect | No change in specificity | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510Y | A510 | A510 mutations known to have effect | No change in specificity | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| Deletion of residues 2-65 | N-terminal | Little existing data on N-terminal | No change in specificity | Ye et al 2008 | slight increase in activity |
4.4 Deleterious mutations (loss of activity or expression)
| Mutation | Effect | Reference | Note |
|---|---|---|---|
| A510F | Inactive | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510K | Inactive | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510L | Inactive | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510N | Inactive | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| A510R | Inactive | Chuah et al 2013 | predicted from PhaC_Cs mutagenesis |
| H481Q | Reduced to 20% of wild-type activity | Wittenborn et al 2016 | attributed to substrate binding loss |
4.5 Combinatorial / double mutants
| Mutations | Effect vs. singles | Reference |
|---|---|---|
| F420S + S80P | Reduced to 79% of wild-type activity, but with better thermostability | Taguchi et al 2002 |
4.6 Thermostability mutations
(Relevant when stacking specificity mutations)
| Mutation | ΔTm | Effect on activity | Effect on specificity | Reference |
|---|---|---|---|---|
| S80P | increase in thermostability | Reduced to 27% of wild-type activity | none observed | Taguchi et al 2002 |
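The F420S, S80P, and F420S + S80P entries are the only data in this document that allow a rough epistasis check. The sketch below compares the observed double-mutant activity with a simple multiplicative null model. It assumes all three values are expressed relative to the same wild-type baseline under comparable assay conditions, which the tables do not confirm (the 2.4-fold figure is a specific activity toward 3HB), so the result is an illustration of the arithmetic rather than a conclusion.

```python
def expected_multiplicative(*relative_activities):
    """Null model: independent effects multiply (each value is relative to WT = 1.0)."""
    product = 1.0
    for value in relative_activities:
        product *= value
    return product


# Values taken from Sections 4.2, 4.5, and 4.6; comparability is an assumption.
f420s = 2.4             # 2.4-fold specific activity toward 3HB
s80p = 0.27             # 27% of wild-type activity
observed_double = 0.79  # F420S + S80P, 79% of wild-type activity

expected_double = expected_multiplicative(f420s, s80p)
epistasis_ratio = observed_double / expected_double  # > 1: better than the null model

print(f"expected {expected_double:.2f}, observed {observed_double:.2f}, "
      f"ratio {epistasis_ratio:.2f}")
```

Under these assumptions the null model predicts about 0.65 of wild-type activity, and the observed 0.79 is roughly 1.2 times that. Given the lab-to-lab variability noted in Section 3.2, a ratio this close to 1 should not be read as evidence for or against epistasis.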
4.7 Positions that have NOT been mutated in literature
(Fill as you read — these are candidate positions for novel exploration)
| Position | WT AA | Structural role | Why interesting |
|---|---|---|---|
| 153 | D | N-terminal | predicted to affect substrate specificity |
| 175 | G | N-terminal | predicted to affect substrate specificity |
| 398 | R | Substrate tunnel entrance | substrate binding |
4.8 Data quality notes
- Substrate specificity data are given qualitatively only, because experimental conditions differ between studies
- In vitro CoA-release assays (DTNB) give intrinsic kinetic data but don’t fully reflect in vivo selectivity under substrate competition
- Some older studies used racemic substrates — stereospecificity may confound apparent chain-length specificity
- [Add specific notes about inconsistencies you notice across papers]
5. Your Starting Enzyme (Wild-Type Cn PhaC1)
5.1 Identity
- Organism: Cupriavidus necator H16
- UniProt accession: P23608
- PhaC class: I
- Gene: phaC1 (in pha operon: phaCAB)
- Full sequence length: 589 aa
5.2 Known properties of WT Cn PhaC1
- Native substrate preference: scl — 3HB (major), 3HV (minor), very minimal mcl - 3HO, 3HDD
- Specific activity: [X nmol/min/mg — fill from literature or your own data]
- Thermostability: [Tm or optimal temperature]
- Expression in E. coli: [your experience — yield, solubility]
- Dimerization: active as dimer; monomer is inactive
- Known issues: [e.g. requires careful lysis conditions, prone to aggregation at high concentration]
5.3 Full WT sequence
5.4 Substrate-binding pocket region
(Residues 300–340, centered on C319 — paste into prompts as needed)
5.5 Catalytic and key residue positions (for quick reference)
| Residue | AA | Role |
|---|---|---|
| C319 | Cys | Catalytic — nucleophile; DO NOT MUTATE |
| D480 | Asp | Catalytic triad; DO NOT MUTATE |
| H508 | His | Catalytic triad; DO NOT MUTATE |
| R398 | Arg | Tunnel entrance; strictly conserved - mutate with caution |
| H481 | His | Tunnel entrance; highly conserved - mutate with caution |
| D153 | Asp | N-terminal position predicted to affect selectivity; mutational target |
| G175 | Gly | N-terminal position predicted to affect selectivity; mutational target |
| S201 | Ser | PHB egress channel |
| D421 | Asp | PHB egress channel |
6. Engineering Target
6.1 Primary goal
Increase substrate specificity towards 3HB specifically or scl generally with combinatorial mutations in the N-terminal and elsewhere, with the hypothesis that this will increase overall PHB production.
6.2 Secondary goals
- Avoid total loss of activity
6.3 Acceptable tradeoffs
- Up to 30% reduction in activity acceptable
6.4 Hard constraints — DO NOT VIOLATE
- Do NOT mutate catalytic triad: C319, D480, H508
- Avoid dimer interface residues: 70-88
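These constraints are simple enough to enforce automatically on any list of suggested variants before it reaches the bench. The following is a minimal sketch with hypothetical names (`check_variant`, `CATALYTIC_TRIAD`, `DIMER_INTERFACE`); it treats the triad as a hard failure and the dimer interface as a warning, matching the wording of the two bullets above.

```python
import re

CATALYTIC_TRIAD = {319, 480, 508}  # C319, D480, H508: never mutate
DIMER_INTERFACE = range(70, 89)    # residues 70-88 inclusive: avoid


def check_variant(variant):
    """Return a list of constraint issues for a variant such as 'F420S + S80P'."""
    issues = []
    for label in (part.strip() for part in variant.split("+")):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            issues.append(f"{label}: not in AXB notation")
            continue
        position = int(match.group(2))
        if position in CATALYTIC_TRIAD:
            issues.append(f"{label}: catalytic triad (hard constraint)")
        elif position in DIMER_INTERFACE:
            issues.append(f"{label}: dimer interface 70-88 (avoid)")
    return issues
```

For example, `check_variant("C319A")` returns `["C319A: catalytic triad (hard constraint)"]` and `check_variant("F420S")` returns an empty list. Note that `check_variant("F420S + S80P")` flags S80P, because position 80 lies inside the 70-88 range; since S80P appears in Sections 4.5 and 4.6 as a thermostability mutation, that overlap is worth a deliberate decision rather than a silent exception.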
6.5 What has already been tested
(Update after every experiment round — prevents redundant suggestions)
| Mutation(s) | 3HHx result | Other notable effects | Date | Notes |
|---|---|---|---|---|
| WT control | ~0 mol% | Baseline | [date] | |
7. Production and Assay Context
7.1 Expression system
- Host: Cell-free expression, E. coli BL21(DE3) lysate
- Vector: pTwistChlor-HighCopy
- Expression conditions: unpurified cell-free reaction
- Typical soluble yield:
- Purification: none
7.2 Eventual in vivo PHA production conditions
- Host: Cyanobacterium aponinum UTEX 3222
- Co-expressed pathway genes: native PHA biosynthetic genes
- Carbon source(s): atmospheric carbon dioxide
- Growth conditions: TBD
- PHA content range (WT): unknown
7.3 In vitro activity assay (if used)
- Assay type: visual inspection for insoluble PHB granules
- Substrate(s): 3HB-CoA
- Buffer conditions: HEPES-KOH pH 7.5
7.4 PHA analysis
- Extraction method: TBD
- Monomer analysis: GC-MS for identification
- Quantification standard: LC-MS
- Throughput: TBD
8. Reasoning Guidelines for LLM
8.1 Dataset context — tell the LLM explicitly at session start
Always include this statement at the top of each session prompt:
“My dataset consists only of C. necator PhaC1 (wild-type) and published point mutants of this single enzyme. I do not have a multi-species alignment. All positional reasoning should be grounded in (a) the experimental mutation database in Section 4, and (b) structural analysis of PDB 5T6O / AlphaFold model P23608. Do not infer specificity determinants from phylogenetic patterns — that data is not available.”
8.2 Prioritization criteria (in order, adjusted for this dataset)
- Direct experimental evidence — mutations in Section 4 with measured outcomes
- Structural/mechanistic reasoning — based on 5T6O crystal structure and tunnel geometry (Section 2)
- Analogy to Class II — using the pairwise comparison in Section 3.5, noting explicitly when this is being used
- Chemical intuition — physicochemical rationale for a substitution, flagged as [SPECULATIVE] if no structural or experimental support
8.3 Required output format for mutation suggestions
For every suggested mutation, provide:
- (a) Mutation in standard notation (e.g. A149F, Cn PhaC1 numbering)
- (b) Primary evidence basis: Experimental / Structural / Class II analogy / Chemical intuition [SPECULATIVE]
- (c) Mechanistic rationale — specific, not generic
- (d) Consistency with existing data in Section 4 — does it contradict anything?
- (e) Confidence: High (direct experimental support) / Medium (structural + analogy) / Low (chemical intuition only)
- (f) Predicted risk: stability, expression, activity loss
8.4 Reasoning I do NOT want
- Statements like “this position is conserved in mcl enzymes” — you do not have alignment data to support this; use only the pairwise comparison in 3.5
- Quantitative predictions of mol% outcomes
- Suggestions violating hard constraints in Section 6.4
- Suggestions already in Section 6.5 “already tested” table
- Filling data gaps with plausible-sounding inventions — flag uncertainty explicitly
8.5 Especially useful prompts for this dataset type
Given the single-enzyme focus, these prompt types will be most productive:
- Gap analysis: “Which tunnel-lining residues (Section 2.4) have never been mutated in the literature (Section 4.7)? For each, give a structural rationale for whether they are likely to affect specificity.”
- Mechanistic interpretation: “Mutation X gave unexpected result Y. Given the structural context of position X (distance to C319, neighboring residues, tunnel role), propose 2–3 mechanistic explanations.”
- Epistasis prediction: “Given that A149F and S325A are individually beneficial, reason about whether their combination is likely to be additive, synergistic, or antagonistic, based on their structural relationship.”
- Experimental prioritization: “I can test 12 variants. Given the mutation database and structural data, design a 12-variant panel that maximizes information gained about specificity determinants.”
8.6 My background
PhD in bioengineering. Comfortable with molecular biology, enzyme kinetics, and microbial fermentation. Less experienced with structural biology and bioinformatics — please explain structural reasoning clearly but do not oversimplify the biochemistry.
9. Session Log
(Prepend full context document + append this log to every session)
Session 2026.05.12 — Mechanistic hypothesis building + Round 1 experimental panel
Prompts used:
- Mechanistic hypothesis building for PhaC_Cn scl selectivity
- Suggest single N-terminal mutations to probe that region
- Suggest single and combinatorial mutations outside the N-terminus
- Consolidate all suggestions into a Round 1 experimental panel
Key outputs / hypotheses:
Hypothesis 1 — Two-point selectivity model: scl selectivity in PhaC_Cn is established at two structural points:
- Tunnel constriction (A510): Ala510 acts as a steric gate physically excluding acyl chains longer than ~C5 from reaching C319. This is the primary passive size filter. WT Ala at this position is already near-optimal for scl: several bulkier or charged substitutions (F, K, L, N, R) are inactive, while small substitutions (G, S, T) permit mcl incorporation, consistent with a more permissive tunnel.
- Active site geometry (F420): Residues near the base of the active site, including F420, optimize catalytic geometry for the C4 acyl-enzyme intermediate. F420S gives a 2.4-fold increase in 3HB-specific activity, consistent with relief of steric strain in the binding pose for 3HB-CoA.
Hypothesis 2 — R398 and H481 are positioning/catalytic residues, not selectivity gates: Both are strictly or highly conserved across Class I enzymes regardless of substrate preference. R398 likely contacts the CoA moiety to position the acyl chain for nucleophilic attack; H481 likely contributes to transition state stabilization. Neither is a primary chain-length selectivity determinant. Both are high-risk mutation targets.
Hypothesis 3 — Residual mcl leakiness has an uncharacterized structural origin: WT PhaC_Cn incorporates very low levels of 3HO and 3HDD. Since A510 appears near-optimal for scl exclusion and F420S already improves 3HB activity, the residual mcl leakiness likely originates from dynamic flexibility elsewhere in the tunnel — possibly in the uncharacterized region between R398/H481 (entrance) and A510 (constriction). This region is not yet described in Section 2.4 and should be a priority for PyMOL analysis.
Hypothesis 4 — N-terminal domain role is genuinely unclear: Deletion of residues 2–65 (Ye et al. 2008) shows no specificity change, suggesting the extreme N-terminus is dispensable. D153 and G175 are proposed as candidate positions based on Class II pairwise comparison only. G175 is conserved as Gly in both Class I and II, suggesting structural rather than selectivity role. D153 differs conservatively between classes (Asp vs. Glu) and is the stronger candidate. All N-terminal reasoning is [SPECULATIVE].
Round 1 experimental panel:
| Variant | Type | Primary purpose | Priority |
|---|---|---|---|
| WT | Control | Baseline | Essential |
| F420S | Single | Best direct evidence for 3HB activity increase | 1 |
| F420S + S80P | Double | Validate combinatorial assay; thermostable scaffold | 2 |
| F420S + A510G | Double | Most informative epistasis test for goal | 3 |
| A510G | Single | Mechanistic reference; activity benchmark | 4 |
| F318Y | Single | Validate Harada 2021 transfer prediction | 5 |
| D153E | Single | N-terminal probe, conservative | 6 |
| D153A | Single | N-terminal probe, charge removal | 7 |
| G175A | Single | N-terminal structural probe | 8 |
| F420S + F318Y | Double | Epistasis / dominance test | 9 |
| Y440L | Single | Second Harada 2021 validation | 10 |
11 total variants including WT, which leaves one spare well in a standard 12-well format. Consider dropping Y440L (priority 10) to free a second spare well if expression failures are anticipated.
Flagged uncertainties / SPECULATIVE tags:
- All N-terminal suggestions (D153E, D153A, G175A) are [SPECULATIVE] — grounded only in AlphaFold model (low pLDDT in this region) and Class II pairwise comparison. No structural data from 5T6O available for this region.
- F318Y and Y440L are predictions transferred from PhaC_Ac (37% identity) — transfer reliability is unknown and should be treated as unvalidated until tested directly in PhaC_Cn.
- F420S mechanistic basis (Hypothesis A: steric relief vs. Hypothesis B: electrostatic contribution) is not resolved by available data — both remain plausible.
- The identity of tunnel residues between R398/H481 and A510 is not characterized in Section 2.4 — this gap limits mechanistic reasoning about the constriction region.
- Epistasis between all combinations is unknown; F420S + A510G is the combination with the most interpretable expected outcome.
Action items:
- Complete PyMOL tunnel analysis (Section 2.5) — residues within 10 Å of C319, focusing on the R398-to-A510 region
- Verify structural position of F420 relative to C319 and catalytic triad in 5T6O
- Order / construct Round 1 variant panel (11 variants listed above, including WT)
- After results: add all new data to Section 4 and Section 10 before next session
Session [DATE]
(repeat block)
10. Experimental Results Log
Experiment [DATE / ID]
Variants tested:
| Variant | 3HB mol% | 3HV mol% | 3HHx mol% | Total PHA wt% | Soluble expression? | Notes |
|---|---|---|---|---|---|---|
| WT | [X] | [X] | ~0 | [X] | Yes | Control |
| [mut] | [X] | [X] | [X] | [X] | [Y/N] | |
Interpretation: [What do these results mean for your mechanistic model?]
Surprises / inconsistencies with predictions: [Critical to record — unexpected results are often the most informative]
Updated hypotheses: [How do results revise your model of specificity determinants?]
Add to mutation database: [Y/N — copy rows to Section 4 as appropriate]
End of context document, v2.2 (single-enzyme / mutagenesis dataset scope). Keep this file updated and prepend it in full to every new LLM session.