Final Project Journey
Project drafts
The following three slides represent my drafts for my final project. Project 1 involves decaffeinating drinks using bacterial strains. Projects 2 and 3 are similar in nature: both are small-molecule drugs that I aim to synthesise using bacteria. Although this is ambitious, I also found that a shared precursor, such as a diterpene, could be made instead of the complete drug.



Optimising the oxidation reaction of Paclitaxel biosynthesis
Paclitaxel is a widely used chemotherapy drug for several cancers, including ovarian, breast, and lung cancer. However, its current production remains unsustainable from an environmental and economic perspective, and biosynthesis could improve it. On an industrial scale, paclitaxel is most commonly made by semi-synthesis: 10-deacetylbaccatin III is extracted from the European yew (Taxus baccata) or similar trees and then converted into paclitaxel. This route is not very efficient (paclitaxel after extraction: 10%) and contributes to environmental strain, because yew trees take a long time to mature, which drives up costs further. Current biosynthesis is not particularly effective either, although some work has been done on generating taxol precursors in E. coli (10.1126/science.1191652).
Complete biosynthesis is difficult because several enzymatic reactions are inefficient. One of the current bottlenecks is the first oxidation step, catalysed by taxadiene 5α-hydroxylase (CYP725A4). This enzyme converts taxadiene into taxadien‑5α‑ol but exhibits low catalytic efficiency and poor selectivity. Heterologous expression of CYP725A4 results in a high side‑product to main‑product ratio and low taxadien‑5α‑ol titres due to the formation of multiple oxygenated taxane derivatives, which limits metabolic flux toward paclitaxel precursors and hinders efficient microbial production (doi.org/10.1186/s12934-022-01922-1).
“The challenge for biosynthesis of paclitaxel lies on the insufficient precursor, such as taxadien-5α-ol” (Wu, QY et al. doi.org/10.1186/s40643-022-00569-5)

I want to optimise the CYP725A4‑catalysed oxidation step in paclitaxel biosynthesis, which currently exhibits low selectivity due to competing reaction pathways in its active site. This may be achieved through enzyme engineering approaches such as active site analysis, molecular docking, rational mutation prediction, or by exploring alternative enzyme variants. Improving this early biosynthetic step could increase taxadien‑5α‑ol production and enhance the overall efficiency and sustainability of microbial paclitaxel synthesis.
Final Project Slide
After some discussion with my node TAs, I settled on paclitaxel as my final project. My three aims are as follows:

Aim 1: Identify and design CYP725A4 variants with improved efficiency using DNA construct design, rational mutation prediction, active-site analysis, molecular docking, and AI
Aim 2: Experimentally test the best CYP725A4 variants in a heterologous expression system and compare product distributions to determine if product formation improves relative to current variants
Aim 3: Enable more efficient and sustainable microbial paclitaxel production by reducing a major bottleneck in the biosynthetic pathway, decreasing dependence on plant-derived intermediates
Project Journey and Raw Results
The rest of this page documents how I got to my final CYP725A4 engineering result. I included the mistakes, intermediate docking results, residue-selection process, mutation choices, raw data sequences, plasmid design, and final presentation slides.
1. Starting With the Wrong Structure: 8X3E
At the beginning, I mistakenly started with the CYP725A4 structure 8X3E. I did not realise at first that this structure already had taxadiene bound, so I visualised it in PyMOL and then ran docking in ChimeraX with AutoDock Vina. This gave me an initial result, but I later realised that this was not the correct starting point for my intended workflow because the structure was already ligand-bound.

Original wrong surface visualisation of 8X3E. This was the first structure I worked with before realising taxadiene was already bound.

Docking/visualisation result from the 8X3E structure. This was useful for learning, but it was not used as the main docking result.
Initial 8X3E Docking Output
This mistake was useful because it showed me that taxadiene can adopt multiple conformations in the CYP725A4 active site. However, because I had started from a ligand-bound structure, I did not use this as the main result.
2. Switching to the Correct Apo Structure: 8X1W
After realising the issue with 8X3E, I switched to the CYP725A4 apo structure 8X1W. This structure did not already contain taxadiene, so it was a better starting point for testing how taxadiene docks into the active site.

Docking of taxadiene into the 8X1W apo structure. This became my correct WT baseline.
WT 8X1W Docking Result
Note: The image shown here was generated using a slightly different docking setting than the refined results reported in the table below. The table values should be treated as the final refined docking results, so the visual pose may not match them exactly.
I then compared the 8X1W docking result visually against the taxadiene-bound 8X3E structure. The comparison looked reasonable, which gave me confidence that the docking setup was capturing the correct binding pocket.
3. Binding-Pocket Residue Selection
I first identified 13 residues or active-site features within the broader 3.5 Å, 4 Å, and 5 Å binding-pocket scans around taxadiene: W65, PHE69, M73, SER168, F169, H245, A246, T250, V314, G316, T317, L423, and HEM440. This initial list was then narrowed by removing residues that only appeared in the 5 Å shell, such as PHE69 and SER168, and residues with mostly negative or constrained mutation profiles, such as A246, T250, V314, G316, and L423. From the remaining candidates, W65, M73, F169, and H245 were selected as first-pass mutation targets because they were close enough to influence ligand positioning while still offering useful side-chain chemical changes. Combination variants were then included to test whether multiple mutations could improve substrate positioning through epistatic effects.

Mutation scan used to narrow down candidate residues and decide which positions were reasonable to mutate.
Residues Excluded From the First-Pass Library
| Residue | Reason for exclusion |
|---|---|
| PHE69 | Only appeared in the 5 Å scan |
| SER168 | Only appeared in the 5 Å scan |
| A246 | Already small; mostly backbone/constrained |
| T250 | Most mutations were negative |
| V314 | Mostly negative mutation profile |
| G316 | Glycine/backbone constrained |
| L423 | Mostly negative; direct contact was risky |
Final First-Pass Residue Positions
| Position | Residue | Reason for keeping |
|---|---|---|
| W65 | TRP | Bulky aromatic residue that may influence pocket shape |
| M73 | MET | Hydrophobic pocket-shaping residue |
| F169 | PHE | Aromatic residue that may affect ligand orientation |
| H245 | HIS | Possible second-shell or local interaction effect |
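The shell-based narrowing described above can be sketched in a few lines. This is a minimal illustration, assuming each residue is reduced to a list of atom coordinates; the coordinates and the `FAR1` label are hypothetical toy values, not measurements from 8X1W, and `residues_within` is an illustrative helper rather than a PyMOL or ChimeraX function.

```python
import math

def residues_within(ligand_atoms, residue_atoms, cutoff):
    """Return labels of residues with at least one atom within `cutoff` Å of any ligand atom."""
    hits = set()
    for label, coords in residue_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in coords for b in ligand_atoms):
            hits.add(label)
    return hits

# Hypothetical toy coordinates in Å (NOT taken from the real structure).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    "M73": [(0.0, 3.2, 0.0)],     # 3.2 Å from the ligand: inside every shell
    "F169": [(1.5, 0.0, 3.8)],    # 3.8 Å: inside the 4 Å and 5 Å shells
    "PHE69": [(0.0, 0.0, -4.6)],  # 4.6 Å: only inside the 5 Å shell
    "FAR1": [(9.0, 9.0, 9.0)],    # far away: in no shell
}

# One scan per cutoff, mirroring the 3.5 Å, 4 Å, and 5 Å scans.
shells = {cutoff: residues_within(ligand, residues, cutoff) for cutoff in (3.5, 4.0, 5.0)}

# Residues that appear only in the outermost shell are candidates for exclusion.
only_outer = shells[5.0] - shells[4.0]
print(sorted(only_outer))
```

In this toy case only `PHE69` falls in the 5 Å shell alone, which is the same kind of reasoning used to drop PHE69 and SER168 from the first-pass library.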
4. Mutation Design Logic
After selecting candidate positions, I used the mutation scan and amino acid characteristics to decide which mutations were worth testing. The goal was not just to improve binding affinity, but to improve the orientation of taxadiene relative to the heme iron, especially the C5–heme Fe distance.
| Mutation | Reason tested |
|---|---|
| W65F | Reduce bulky tryptophan while keeping aromatic character |
| M73L | Keep hydrophobicity but change side-chain shape |
| M73F | Test a more aromatic substitution at the same position |
| F169A | Remove aromatic bulk and create more space |
| F169S | Reduce aromatic bulk while adding a polar side chain |
| H245F | Replace histidine with a hydrophobic aromatic residue |
| H245L | Replace histidine with a hydrophobic aliphatic residue |
| M73L + F169A | Test whether two pocket-shaping mutations work better together |
| M73L + F169A + H245F | Test whether adding H245F improves productive positioning |
At this point, one difficulty was that docking produced many different conformations. This made it difficult to objectively choose the best result using affinity alone, so I focused on both docking score and C5–heme Fe distance.
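The C5–heme Fe distance is simply the Euclidean distance between two atoms, so it can be measured directly from coordinate records. The sketch below assumes standard fixed-column PDB-style `HETATM` lines and assumes the ligand carbon is named `C5` and sits in a residue called `LIG`; both names depend on how the ligand file was prepared, and the two coordinate lines are hypothetical toy data.

```python
import math

# Hypothetical toy records in PDB fixed-column layout (NOT real docking output).
TOY_LINES = [
    "HETATM 3001 FE   HEM A 440      10.000  12.000   8.000",
    "HETATM    5  C5  LIG A   1      13.000  16.000   8.000",
]

def find_atom(lines, resname, atom_name):
    """Return (x, y, z) of the first atom matching residue name and atom name."""
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # PDB columns: atom name 13-16, residue name 18-20, x/y/z 31-54.
        if line[12:16].strip() == atom_name and line[17:20].strip() == resname:
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise ValueError(f"{resname}/{atom_name} not found")

fe = find_atom(TOY_LINES, "HEM", "FE")
c5 = find_atom(TOY_LINES, "LIG", "C5")   # atom and residue names are assumptions
c5_fe_distance = math.dist(c5, fe)
print(f"C5-Fe distance: {c5_fe_distance:.3f} Å")
```

Applying the same measurement to every pose of every variant gives one distance per pose, which can then be read alongside the docking score.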
5. Individual Mutation Results
M73L

M73L docking result. This mutation preserved hydrophobicity while changing the shape of the binding pocket.
Note: The image shown here was generated using a slightly different docking setting than the refined results reported in the table below. The table values should be treated as the final refined docking results, so the visual pose may not match them exactly.
| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
|---|---|---|---|---|
| 1 | -8.236 | 0.000 | 0.000 | 7.025 |
| 2 | -7.606 | 1.239 | 3.233 | 8.374 |
| 3 | -7.515 | 1.698 | 4.996 | 5.177 |
| 4 | -6.899 | 1.463 | 2.085 | 5.802 |
| 5 | -6.733 | 1.471 | 4.931 | 7.280 |
M73L did not improve the top-ranked pose very much, but pose 3 gave a much shorter C5–Fe distance of 5.177 Å. This suggested that M73L could support a more productive orientation, but not consistently as the best-ranked pose.
F169A

F169A docking result. This mutation removed aromatic bulk to test whether creating more space would improve taxadiene positioning.
Note: The image shown here was generated using a slightly different docking setting than the refined results reported in the table below. The table values should be treated as the final refined docking results, so the visual pose may not match them exactly.
| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
|---|---|---|---|---|
| 1 | -8.165 | 0.000 | 0.000 | 8.770 |
| 2 | -8.021 | 1.297 | 3.518 | 7.070 |
| 3 | -7.972 | 1.339 | 3.785 | 7.196 |
| 4 | -7.397 | 1.528 | 4.769 | 9.026 |
| 5 | -7.301 | 1.732 | 5.489 | 7.516 |
F169A alone did not improve the productive geometry. The best-ranked pose had a worse C5–Fe distance than WT, which showed that removing aromatic bulk alone was not enough.
M73L + F169A

M73L + F169A docking result. This tested whether combining two pocket-shaping mutations could create a more productive binding pose.
| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
|---|---|---|---|---|
| 1 | -7.692 | 0.000 | 0.000 | 8.697 |
| 2 | -7.565 | 1.336 | 4.900 | 5.408 |
| 3 | -7.482 | 1.212 | 3.068 | 6.960 |
| 4 | -7.053 | 1.398 | 3.731 | 6.760 |
| 5 | -6.935 | 1.539 | 2.977 | 5.755 |
The combination of M73L + F169A produced some improved poses, especially pose 2 with a C5–Fe distance of 5.408 Å. However, the best-ranked pose still had poor geometry, which made the result harder to interpret.
M73L + F169A + H245F

M73L + F169A + H245F docking result. This was the final lead mutant because the best-ranked pose also had improved productive geometry.
| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
|---|---|---|---|---|
| 1 | -8.419 | 0.000 | 0.000 | 5.513 |
| 2 | -8.379 | 1.249 | 4.892 | 8.655 |
| 3 | -7.024 | 1.613 | 4.936 | 6.979 |
| 4 | -6.976 | 1.748 | 3.037 | 8.746 |
| 5 | -6.714 | 1.372 | 3.529 | 7.210 |
The M73L + F169A + H245F variant gave the clearest improvement because pose 1 had both a reasonable docking score and a shorter C5–Fe distance. This made it the strongest final candidate from my tested variants.

GIF of the final best docking pose for M73L + F169A + H245F.
6. Raw Docking Results Summary
The table below shows the main raw docking results for the tested variants. I mainly used C5–heme Fe distance as the productive-geometry metric, while keeping affinity as a secondary metric.
| Variant | Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
|---|---|---|---|---|---|
| WT | 1 | -10.150 | 0.000 | 0.000 | 7.014 |
| WT | 2 | -8.937 | 2.083 | 4.767 | 9.032 |
| WT | 3 | -8.723 | 1.224 | 3.280 | 8.245 |
| WT | 4 | -8.675 | 1.498 | 4.784 | 8.458 |
| WT | 5 | -8.580 | 1.385 | 2.025 | 9.033 |
| W65F | 1 | -9.326 | 0.000 | 0.000 | 7.187 |
| W65F | 2 | -9.232 | 1.408 | 4.824 | 8.165 |
| W65F | 3 | -9.039 | 1.385 | 3.835 | 8.577 |
| W65F | 4 | -8.764 | 1.396 | 3.006 | 7.095 |
| W65F | 5 | -8.644 | 1.765 | 5.187 | 7.002 |
| M73L | 1 | -8.236 | 0.000 | 0.000 | 7.025 |
| M73L | 2 | -7.606 | 1.239 | 3.233 | 8.374 |
| M73L | 3 | -7.515 | 1.698 | 4.996 | 5.177 |
| M73L | 4 | -6.899 | 1.463 | 2.085 | 5.802 |
| M73L | 5 | -6.733 | 1.471 | 4.931 | 7.280 |
| M73F | 1 | -7.453 | 0.000 | 0.000 | 8.356 |
| M73F | 2 | -6.543 | 1.175 | 4.869 | 5.115 |
| M73F | 3 | -6.088 | 1.281 | 3.421 | 6.966 |
| M73F | 4 | -6.084 | 1.513 | 2.171 | 7.570 |
| M73F | 5 | -5.580 | 1.583 | 3.073 | 7.125 |
| F169A | 1 | -8.165 | 0.000 | 0.000 | 8.770 |
| F169A | 2 | -8.021 | 1.297 | 3.518 | 7.070 |
| F169A | 3 | -7.972 | 1.339 | 3.785 | 7.196 |
| F169A | 4 | -7.397 | 1.528 | 4.769 | 9.026 |
| F169A | 5 | -7.301 | 1.732 | 5.489 | 7.516 |
| F169S | 1 | -8.079 | 0.000 | 0.000 | 8.736 |
| F169S | 2 | -8.029 | 1.336 | 3.810 | 7.138 |
| F169S | 3 | -7.826 | 1.282 | 3.500 | 7.076 |
| F169S | 4 | -7.556 | 1.082 | 1.291 | 7.422 |
| F169S | 5 | -7.278 | 1.505 | 4.790 | 8.993 |
| M73L + F169A | 1 | -7.692 | 0.000 | 0.000 | 8.697 |
| M73L + F169A | 2 | -7.565 | 1.336 | 4.900 | 5.408 |
| M73L + F169A | 3 | -7.482 | 1.212 | 3.068 | 6.960 |
| M73L + F169A | 4 | -7.053 | 1.398 | 3.731 | 6.760 |
| M73L + F169A | 5 | -6.935 | 1.539 | 2.977 | 5.755 |
| H245F | 1 | -7.890 | 0.000 | 0.000 | 7.042 |
| H245F | 2 | -6.536 | 1.654 | 2.980 | 6.848 |
| H245F | 3 | -6.442 | 0.918 | 4.799 | 9.109 |
| H245F | 4 | -5.912 | 1.316 | 2.627 | 7.595 |
| H245L | 1 | -9.979 | 0.000 | 0.000 | 6.303 |
| H245L | 2 | -9.560 | 1.400 | 3.692 | 9.501 |
| H245L | 3 | -9.557 | 1.887 | 4.088 | 6.810 |
| H245L | 4 | -9.275 | 1.522 | 5.013 | 6.754 |
| H245L | 5 | -9.028 | 1.725 | 3.640 | 6.291 |
| M73L + F169A + H245F | 1 | -8.419 | 0.000 | 0.000 | 5.513 |
| M73L + F169A + H245F | 2 | -8.379 | 1.249 | 4.892 | 8.655 |
| M73L + F169A + H245F | 3 | -7.024 | 1.613 | 4.936 | 6.979 |
| M73L + F169A + H245F | 4 | -6.976 | 1.748 | 3.037 | 8.746 |
| M73L + F169A + H245F | 5 | -6.714 | 1.372 | 3.529 | 7.210 |
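The affinity and RMSD columns above have the same three-value shape as the `REMARK VINA RESULT:` header that AutoDock Vina writes for each pose in its output PDBQT file. As a rough sketch of how such values could be collected without copying them by hand, the snippet below parses those header lines; the embedded text is a shortened, hand-written stand-in that reuses two rows from the table, not an actual output file from this project.

```python
# Hand-written stand-in for a Vina output file (only the header lines matter here).
TOY_PDBQT = """\
MODEL 1
REMARK VINA RESULT:    -8.419      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -8.379      1.249      4.892
ENDMDL
"""

def parse_vina_results(pdbqt_text):
    """Extract affinity and RMSD bounds for each pose from Vina result remarks."""
    poses = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            fields = line.split(":", 1)[1].split()
            affinity, rmsd_lb, rmsd_ub = (float(v) for v in fields[:3])
            poses.append({
                "pose": len(poses) + 1,   # poses are numbered in file order
                "affinity": affinity,     # kcal/mol
                "rmsd_lb": rmsd_lb,
                "rmsd_ub": rmsd_ub,
            })
    return poses

parsed = parse_vina_results(TOY_PDBQT)
for row in parsed:
    print(row)
```

The C5–Fe distance is not part of the Vina output and would still need to be measured separately for each pose.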
7. Best Productive Poses
This table summarises the most important productive poses from the raw docking results.
| Variant | Best productive pose | Affinity of that pose (kcal/mol) | C5–Fe distance (Å) |
|---|---|---|---|
| WT | Pose 1 | -10.150 | 7.014 |
| M73L | Pose 3 | -7.515 | 5.177 |
| H245L | Pose 5 | -9.028 | 6.291 |
| M73L + F169A + H245F | Pose 1 | -8.419 | 5.513 |
The WT enzyme had the strongest predicted binding affinity, but its C5–Fe distance was 7.014 Å. The M73L + F169A + H245F variant had a shorter C5–Fe distance of 5.513 Å, suggesting improved catalytic geometry despite weaker binding affinity.
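The "best productive pose" selection can be reproduced from the raw table by taking, for each variant, the pose with the shortest C5–Fe distance. The sketch below uses the values reported in section 6 for the four variants in this summary; the function name and data layout are illustrative choices, not part of the original workflow.

```python
# (affinity in kcal/mol, C5-Fe distance in Å) per pose, in rank order, from the raw results table.
POSES = {
    "WT": [(-10.150, 7.014), (-8.937, 9.032), (-8.723, 8.245), (-8.675, 8.458), (-8.580, 9.033)],
    "M73L": [(-8.236, 7.025), (-7.606, 8.374), (-7.515, 5.177), (-6.899, 5.802), (-6.733, 7.280)],
    "H245L": [(-9.979, 6.303), (-9.560, 9.501), (-9.557, 6.810), (-9.275, 6.754), (-9.028, 6.291)],
    "M73L + F169A + H245F": [(-8.419, 5.513), (-8.379, 8.655), (-7.024, 6.979), (-6.976, 8.746), (-6.714, 7.210)],
}

def best_productive_pose(pose_list):
    """Return (pose number, affinity, distance) for the pose with the shortest C5-Fe distance."""
    index = min(range(len(pose_list)), key=lambda i: pose_list[i][1])
    affinity, distance = pose_list[index]
    return index + 1, affinity, distance

summary = {variant: best_productive_pose(p) for variant, p in POSES.items()}
for variant, (pose, affinity, distance) in summary.items():
    print(f"{variant}: pose {pose}, {affinity:.3f} kcal/mol, {distance:.3f} Å")
```

Selecting purely on distance ignores pose rank, which is why M73L is represented by pose 3 and H245L by pose 5 here, while the triple mutant is represented by its top-ranked pose.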
8. Variant Sequences
Below are the protein sequences used for the docking workflow.
WT CYP725A4
W65F
M73L
M73F
F169A
F169S
H245F
H245L
M73L + F169A
M73L + F169A + H245F
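Variant sequences like these can be generated from the WT sequence by applying each substitution and checking that the expected WT residue is really at that position. The sketch below is a minimal illustration: the ten-residue sequence is a hypothetical toy string, not CYP725A4, and it assumes that mutation numbering matches 1-based sequence positions, which would need to be confirmed against the structure numbering.

```python
import re

def apply_mutations(sequence, spec):
    """Apply substitutions such as 'M73L' or 'M73L + F169A' to a protein sequence."""
    residues = list(sequence)
    for token in spec.split("+"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
        if match is None:
            raise ValueError(f"Cannot parse mutation: {token!r}")
        wt, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position {position} is outside the sequence")
        # Guard against numbering offsets: the stated WT residue must be present.
        if residues[position - 1] != wt:
            raise ValueError(f"Expected {wt} at {position}, found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)

TOY_WT = "MKTAYIAKQR"  # hypothetical toy sequence, NOT the real enzyme
single = apply_mutations(TOY_WT, "T3S")
double = apply_mutations(TOY_WT, "T3S + A4G")
print(single, double)
```

The WT-residue check is the useful part of the design: it fails loudly if the numbering used in the mutation names is offset from the sequence being edited.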
9. Final Candidate
The final lead variant was M73L + F169A + H245F.
This variant gave the best overall productive geometry because its best-ranked pose also had an improved C5–heme Fe distance.
I was a bit surprised by the results because I expected stronger improvements across more variants. However, the data showed that many single mutations did not improve the productive geometry, and some only produced useful conformations in lower-ranked poses. This made it clear that improving CYP725A4 selectivity is difficult and that small active-site changes can create many different substrate orientations.
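One way to make this comparison explicit is to ask, for each variant, whether its top-ranked pose has a shorter C5–Fe distance than the WT top-ranked pose. The sketch below uses the pose 1 distances from the raw results table; comparing only pose 1 against WT is an illustrative criterion chosen here, not a rule defined elsewhere in the project.

```python
# C5-Fe distance (Å) of the top-ranked pose for each variant, from the raw results table.
TOP_POSE_DISTANCE = {
    "WT": 7.014,
    "W65F": 7.187,
    "M73L": 7.025,
    "M73F": 8.356,
    "F169A": 8.770,
    "F169S": 8.736,
    "M73L + F169A": 8.697,
    "H245F": 7.042,
    "H245L": 6.303,
    "M73L + F169A + H245F": 5.513,
}

wt_distance = TOP_POSE_DISTANCE["WT"]

# Variants whose best-ranked pose sits closer to the heme iron than WT, shortest first.
improved = sorted(
    (v for v, d in TOP_POSE_DISTANCE.items() if v != "WT" and d < wt_distance),
    key=TOP_POSE_DISTANCE.get,
)
print(improved)
```

By this criterion only two variants improve on WT in their top-ranked pose, with the triple mutant giving the shortest distance, which is consistent with most single mutations showing useful geometry only in lower-ranked poses.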
10. What Took the Most Time
One of the biggest challenges was how long each mutation took to make, prepare, dock, inspect, and measure. Even though I did not test a very large mutation library, each variant required several steps: generating the mutant structure, adding/transferring the heme group, preparing the receptor, running AutoDock Vina, opening the docking poses, measuring C5–heme Fe distances, and deciding which poses were meaningful.
This made the project more time-consuming than expected, especially because docking produced many conformations and not all of them were easy to interpret objectively.
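Part of this repetition could be reduced by generating the docking command for each variant from a single template. The sketch below only builds AutoDock Vina command lines and does not run them; the file names, box centre, box size, and search settings are all hypothetical placeholders, not the values used in this project.

```python
def vina_command(variant, center, size, exhaustiveness=8, num_modes=5):
    """Build (but do not run) an AutoDock Vina command line for one variant."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        "vina",
        "--receptor", f"{variant}_receptor.pdbqt",  # hypothetical file name
        "--ligand", "taxadiene.pdbqt",              # hypothetical file name
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--exhaustiveness", str(exhaustiveness),
        "--num_modes", str(num_modes),
        "--out", f"{variant}_poses.pdbqt",          # hypothetical file name
    ]

VARIANTS = ["WT", "M73L", "F169A", "M73L_F169A", "M73L_F169A_H245F"]

# Placeholder search box: the real centre and size depend on the prepared receptor.
commands = {
    v: vina_command(v, center=(0.0, 0.0, 0.0), size=(20.0, 20.0, 20.0))
    for v in VARIANTS
}
for cmd in commands.values():
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` if the `vina` executable is installed and on the PATH. Mutant generation, heme transfer, and receptor preparation would still need to be done before this step.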
11. Molecular Dynamics Attempt
Near the end of the project, I started setting up a molecular dynamics simulation to test whether the best docking pose would remain stable over time. The plan was to run a 10–50 ns simulation for the lead mutant. However, I did not have enough time before the deadline, because some molecular dynamics simulations can take several days to set up and run properly.
This is why molecular dynamics was moved into future work instead of being included as a completed result.
12. Plasmid Design and Benchling
As part of the DNA construct planning for future experimental validation, I designed a plasmid construct for expressing CYP725A4 variants in a heterologous E. coli system. This design connects the computational docking work to the future wet-lab testing step.

Benchling plasmid design for the CYP725A4 construct.
View plasmid design on Benchling
13. Final Presentation Slides
Below are the final presentation slides for this project. I included them here so the presentation version of the project can be viewed alongside the raw computational work and final results.
Slide 1

Slide 2

Slide 3


