Final Project Journey
Project drafts
The following three slides represent my drafts for my final project. Project one involves decaffeinating drinks using bacterial strains; projects two and three are similar in nature, as both are small-molecule drugs which I aim to synthesise using bacteria. Although this is ambitious, I have also found that a shared precursor such as a diterpene could be made instead of the complete drug.
Optimising the oxidation reaction of Paclitaxel biosynthesis
Paclitaxel is a widely used chemotherapy drug for several cancers, including ovarian, breast, and lung cancer. However, its current production remains unsustainable from an environmental and economic perspective and could be improved using biosynthesis. The most common industrial route is semi-synthesis: 10-deacetylbaccatin III is extracted from the European yew (Taxus baccata) or similar trees and then converted into paclitaxel. This route is not very efficient (paclitaxel after extraction 10%) and contributes to environmental strain, because yew trees take a long time to mature, which drives up costs further.
Current biosynthesis is not particularly effective either, although some work has been done on generating taxol precursors in E. coli (10.1126/science.1191652). The complete synthesis is difficult due to the inefficiency of the enzymatic reactions. One of the current bottlenecks in production is the first oxidation step, catalysed by taxadiene 5α-hydroxylase (CYP725A4). This enzyme converts taxadiene into taxadien‑5α‑ol but exhibits low catalytic efficiency and poor selectivity. Heterologous expression results in a high side‑product to main‑product ratio and low taxadien‑5α‑ol titres due to the formation of multiple oxygenated taxane derivatives, thereby limiting metabolic flux toward paclitaxel precursors and hindering efficient microbial production (doi.org/10.1186/s12934-022-01922-1).
“The challenge for biosynthesis of paclitaxel lies on the insufficient precursor, such as taxadien-5α-ol” (Wu, QY et al. doi.org/10.1186/s40643-022-00569-5)
I want to optimise the CYP725A4‑catalysed oxidation step in paclitaxel biosynthesis, which currently exhibits low selectivity due to competing reaction pathways in its active site. This may be achieved through enzyme engineering approaches such as active site analysis, molecular docking, rational mutation prediction, or by exploring alternative enzyme variants. Improving this early biosynthetic step could increase taxadien‑5α‑ol production and enhance the overall efficiency and sustainability of microbial paclitaxel synthesis.
Final Project Slide
After some discussion with my node TAs, I settled on paclitaxel as my final project. My three goals are as follows:
Final Project IDEA Slide
Aim 1: Identify and design CYP725A4 variants with improved efficiency using DNA construct design, rational mutation prediction, active-site analysis, molecular docking, and AI
Aim 2: Experimentally test the best CYP725A4 variants in a heterologous expression system and compare product distributions to determine if product formation improves relative to current variants
Aim 3: Enable more efficient and sustainable microbial paclitaxel production by reducing a major bottleneck in the biosynthetic pathway, decreasing dependence on plant-derived intermediates
Project Journey and Raw Results
The rest of this page documents how I got to my final CYP725A4 engineering result. I included the mistakes, intermediate docking results, residue-selection process, mutation choices, raw data sequences, plasmid design, and final presentation slides.
1. Starting With the Wrong Structure: 8X3E
At the beginning, I mistakenly started with the CYP725A4 structure 8X3E. I did not realise at first that this structure already had taxadiene bound, so I visualised it in PyMOL and then ran docking in ChimeraX with AutoDock Vina. This gave me an initial result, but I later realised that this was not the correct starting point for my intended workflow because the structure was already ligand-bound.
Original wrong surface visualisation of 8X3E. This was the first structure I worked with before realising taxadiene was already bound.
Docking/visualisation result from the 8X3E structure. This was useful for learning, but it was not used as the main docking result.
Initial 8X3E Docking Output
Grid centre: X = 78.6, Y = 70.87, Z = 33.44
Grid size: X = 20, Y = 20, Z = 20
Best affinity: Mode 1 = -5.934 kcal/mol
Measured C5–heme Fe distance: 12.596 Å
This mistake was useful because it showed me that taxadiene can adopt multiple conformations in the CYP725A4 active site. However, because I had started from a ligand-bound structure, I did not use this as the main result.
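As a minimal sketch, the grid box reported above could be written into an AutoDock Vina configuration file as follows. The file names are hypothetical placeholders, and the `num_modes` and `exhaustiveness` values are assumptions, not settings recorded on this page.

```python
def vina_config(receptor, ligand, center, size, num_modes=5, exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"num_modes = {num_modes}",            # assumed value
        f"exhaustiveness = {exhaustiveness}",  # assumed value
    ]
    return "\n".join(lines) + "\n"


# Grid centre and size are the values reported for the 8X3E run;
# the file names are placeholders.
config_text = vina_config(
    "8x3e_receptor.pdbqt",
    "taxadiene.pdbqt",
    center=(78.6, 70.87, 33.44),
    size=(20, 20, 20),
)
print(config_text)
```

Keeping the box in one file makes it easier to reuse exactly the same search space when repeating the run on another structure.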
2. Switching to the Correct Apo Structure: 8X1W
After realising the issue with 8X3E, I switched to the CYP725A4 apo structure 8X1W. This structure did not already contain taxadiene, so it was a better starting point for testing how taxadiene docks into the active site.
Docking of taxadiene into the 8X1W apo structure. This became my correct WT baseline.
WT 8X1W Docking Result
Note: The image shown here was generated using a slightly different docking setting than the refined results reported in the table below. The table values should be treated as the final refined docking results, so the visual pose may not match them exactly.
Best affinity: Mode 1 = -10.15 kcal/mol
C5–heme Fe distance: 7.014 Å
I then compared the 8X1W docking result visually against the taxadiene-bound 8X3E structure. The comparison looked reasonable, which gave me confidence that the docking setup was capturing the correct binding pocket.
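The C5–heme Fe distance used throughout this page is a plain Euclidean distance between two atoms. I measured it interactively, but a scripted version might look like the sketch below. It assumes the standard fixed-width PDB coordinate columns (31–54); the coordinates in the example are made up for illustration and are not taken from my docking output.

```python
import math


def coords_from_pdb_line(line):
    """Read x, y, z from the fixed-width coordinate columns of an ATOM/HETATM line."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def atom_distance(a, b):
    """Euclidean distance in Å between two (x, y, z) coordinates."""
    return math.dist(a, b)


# Hypothetical coordinates, for illustration only.
c5 = (1.0, 2.0, 3.0)
fe = (4.0, 6.0, 3.0)
print(round(atom_distance(c5, fe), 3))  # 5.0
```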
3. Binding-Pocket Residue Selection
I first identified 13 residues or active-site features within the broader 3.5 Å, 4 Å, and 5 Å binding-pocket scans around taxadiene: W65, PHE69, M73, SER168, F169, H245, A246, T250, V314, G316, T317, L423, and HEM440. This initial list was then narrowed by removing residues that only appeared in the 5 Å shell, such as PHE69 and SER168, and residues with mostly negative or constrained mutation profiles, such as A246, T250, V314, G316, and L423. From the remaining candidates, W65, M73, F169, and H245 were selected as first-pass mutation targets because they were close enough to influence ligand positioning while still offering useful side-chain chemical changes. Combination variants were then included to test whether multiple mutations could improve substrate positioning through epistatic effects.
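The shell-based narrowing described above can be sketched in a few lines. This is only an illustration of the idea: the residue labels and coordinates are toy placeholders, and the real scans were done in a molecular viewer rather than with this code.

```python
import math


def residues_within(ligand_atoms, protein_atoms, cutoff):
    """Return labels of residues with at least one atom within `cutoff` Å of the ligand."""
    hits = set()
    for label, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            hits.add(label)
    return hits


# Toy data (hypothetical): one ligand atom and four protein atoms.
ligand = [(0.0, 0.0, 0.0)]
protein = [
    ("RES_A", (3.0, 0.0, 0.0)),
    ("RES_B", (3.8, 0.0, 0.0)),
    ("RES_C", (4.6, 0.0, 0.0)),
    ("RES_D", (9.0, 0.0, 0.0)),
]

shells = {cutoff: residues_within(ligand, protein, cutoff) for cutoff in (3.5, 4.0, 5.0)}
# Residues that only appear in the outermost shell are candidates for exclusion.
only_outer = shells[5.0] - shells[4.0]
print(sorted(only_outer))  # ['RES_C']
```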
Mutation scan used to narrow down candidate residues and decide which positions were reasonable to mutate.
Residues Excluded From the First-Pass Library
| Residue | Reason for exclusion |
| --- | --- |
| PHE69 | Only appeared in the 5 Å scan |
| SER168 | Only appeared in the 5 Å scan |
| A246 | Already small; mostly backbone/constrained |
| T250 | Most mutations were negative |
| V314 | Mostly negative mutation profile |
| G316 | Glycine/backbone constrained |
| L423 | Mostly negative; direct contact was risky |
Final First-Pass Residue Positions
| Position | Residue | Reason for keeping |
| --- | --- | --- |
| W65 | TRP | Bulky aromatic residue that may influence pocket shape |
| M73 | MET | Hydrophobic pocket-shaping residue |
| F169 | PHE | Aromatic residue that may affect ligand orientation |
| H245 | HIS | Possible second-shell or local interaction effect |
4. Mutation Design Logic
After selecting candidate positions, I used the mutation scan and amino acid characteristics to decide which mutations were worth testing. The goal was not just to improve binding affinity, but to improve the orientation of taxadiene relative to the heme iron, especially the C5–heme Fe distance.
| Mutation | Reason tested |
| --- | --- |
| W65F | Reduce bulky tryptophan while keeping aromatic character |
| M73L | Keep hydrophobicity but change side-chain shape |
| M73F | Test a more aromatic substitution at the same position |
| F169A | Remove aromatic bulk and create more space |
| F169S | Reduce aromatic bulk while adding a polar side chain |
| H245F | Replace histidine with a hydrophobic aromatic residue |
| H245L | Replace histidine with a hydrophobic aliphatic residue |
| M73L + F169A | Test whether two pocket-shaping mutations work better together |
| M73L + F169A + H245F | Test whether adding H245F improves productive positioning |
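Mutation names such as M73L encode the wild-type residue, the 1-based position, and the replacement. A small helper like the one below could turn a list of such names into a mutant sequence while checking that the wild-type residue matches. This is a sketch only: the ten-residue sequence is a toy example, not the CYP725A4 sequence.

```python
import re


def apply_mutations(sequence, mutations):
    """Apply point mutations written like 'M73L' (1-based positions) to a protein sequence."""
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"Cannot parse mutation: {mutation}")
        wild_type, new = match.group(1), match.group(3)
        index = int(match.group(2)) - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"Position out of range: {mutation}")
        if residues[index] != wild_type:
            raise ValueError(f"Expected {wild_type} at {index + 1}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


# Toy sequence (hypothetical), used only to show the mechanics.
toy_sequence = "MKWLFAHGTS"
print(apply_mutations(toy_sequence, ["W3F", "H7L"]))  # MKFLFALGTS
```

The wild-type check is the useful part: it catches numbering offsets before a wrong residue is mutated.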
At this point, one difficulty was that docking produced many different conformations. This made it difficult to objectively choose the best result using affinity alone, so I focused on both docking score and C5–heme Fe distance.
5. Individual Mutation Results
M73L
M73L docking result. This mutation preserved hydrophobicity while changing the shape of the binding pocket.
Note: The image shown here was generated using a slightly different docking setting than the refined results reported in the table below. The table values should be treated as the final refined docking results, so the visual pose may not match them exactly.
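| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
| --- | --- | --- | --- | --- |
| 1 | -8.236 | 0.000 | 0.000 | 7.025 |
| 2 | -7.606 | 1.239 | 3.233 | 8.374 |
| 3 | -7.515 | 1.698 | 4.996 | 5.177 |
| 4 | -6.899 | 1.463 | 2.085 | 5.802 |
| 5 | -6.733 | 1.471 | 4.931 | 7.280 |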
M73L did not improve the top-ranked pose very much, but pose 3 gave a much shorter C5–Fe distance of 5.177 Å. This suggested that M73L could support a more productive orientation, but not consistently as the best-ranked pose.
F169A
F169A docking result. This mutation removed aromatic bulk to test whether creating more space would improve taxadiene positioning.
Note: The image shown here was generated using a slightly different docking setting than the refined results reported in the table below. The table values should be treated as the final refined docking results, so the visual pose may not match them exactly.
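| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
| --- | --- | --- | --- | --- |
| 1 | -8.165 | 0.000 | 0.000 | 8.770 |
| 2 | -8.021 | 1.297 | 3.518 | 7.070 |
| 3 | -7.972 | 1.339 | 3.785 | 7.196 |
| 4 | -7.397 | 1.528 | 4.769 | 9.026 |
| 5 | -7.301 | 1.732 | 5.489 | 7.516 |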
F169A alone did not improve the productive geometry. The best-ranked pose had a worse C5–Fe distance than WT, which showed that removing aromatic bulk alone was not enough.
M73L + F169A
M73L + F169A docking result. This tested whether combining two pocket-shaping mutations could create a more productive binding pose.
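| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
| --- | --- | --- | --- | --- |
| 1 | -7.692 | 0.000 | 0.000 | 8.697 |
| 2 | -7.565 | 1.336 | 4.900 | 5.408 |
| 3 | -7.482 | 1.212 | 3.068 | 6.960 |
| 4 | -7.053 | 1.398 | 3.731 | 6.760 |
| 5 | -6.935 | 1.539 | 2.977 | 5.755 |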
The combination of M73L + F169A produced some improved poses, especially pose 2 with a C5–Fe distance of 5.408 Å. However, the best-ranked pose still had poor geometry, which made the result harder to interpret.
M73L + F169A + H245F
M73L + F169A + H245F docking result. This was the final lead mutant because the best-ranked pose also had improved productive geometry.
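| Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
| --- | --- | --- | --- | --- |
| 1 | -8.419 | 0.000 | 0.000 | 5.513 |
| 2 | -8.379 | 1.249 | 4.892 | 8.655 |
| 3 | -7.024 | 1.613 | 4.936 | 6.979 |
| 4 | -6.976 | 1.748 | 3.037 | 8.746 |
| 5 | -6.714 | 1.372 | 3.529 | 7.210 |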
The M73L + F169A + H245F variant gave the clearest improvement because pose 1 had both a reasonable docking score and a shorter C5–Fe distance. This made it the strongest final candidate from my tested variants.
GIF of the final best docking pose for M73L + F169A + H245F.
6. Raw Docking Results Summary
The table below shows the main raw docking results for the tested variants. I mainly used C5–heme Fe distance as the productive-geometry metric, while keeping affinity as a secondary metric.
| Variant | Pose | Affinity (kcal/mol) | RMSD l.b. | RMSD u.b. | C5–Fe distance (Å) |
| --- | --- | --- | --- | --- | --- |
| WT | 1 | -10.150 | 0.000 | 0.000 | 7.014 |
| WT | 2 | -8.937 | 2.083 | 4.767 | 9.032 |
| WT | 3 | -8.723 | 1.224 | 3.280 | 8.245 |
| WT | 4 | -8.675 | 1.498 | 4.784 | 8.458 |
| WT | 5 | -8.580 | 1.385 | 2.025 | 9.033 |
| W65F | 1 | -9.326 | 0.000 | 0.000 | 7.187 |
| W65F | 2 | -9.232 | 1.408 | 4.824 | 8.165 |
| W65F | 3 | -9.039 | 1.385 | 3.835 | 8.577 |
| W65F | 4 | -8.764 | 1.396 | 3.006 | 7.095 |
| W65F | 5 | -8.644 | 1.765 | 5.187 | 7.002 |
| M73L | 1 | -8.236 | 0.000 | 0.000 | 7.025 |
| M73L | 2 | -7.606 | 1.239 | 3.233 | 8.374 |
| M73L | 3 | -7.515 | 1.698 | 4.996 | 5.177 |
| M73L | 4 | -6.899 | 1.463 | 2.085 | 5.802 |
| M73L | 5 | -6.733 | 1.471 | 4.931 | 7.280 |
| M73F | 1 | -7.453 | 0.000 | 0.000 | 8.356 |
| M73F | 2 | -6.543 | 1.175 | 4.869 | 5.115 |
| M73F | 3 | -6.088 | 1.281 | 3.421 | 6.966 |
| M73F | 4 | -6.084 | 1.513 | 2.171 | 7.570 |
| M73F | 5 | -5.580 | 1.583 | 3.073 | 7.125 |
| F169A | 1 | -8.165 | 0.000 | 0.000 | 8.770 |
| F169A | 2 | -8.021 | 1.297 | 3.518 | 7.070 |
| F169A | 3 | -7.972 | 1.339 | 3.785 | 7.196 |
| F169A | 4 | -7.397 | 1.528 | 4.769 | 9.026 |
| F169A | 5 | -7.301 | 1.732 | 5.489 | 7.516 |
| F169S | 1 | -8.079 | 0.000 | 0.000 | 8.736 |
| F169S | 2 | -8.029 | 1.336 | 3.810 | 7.138 |
| F169S | 3 | -7.826 | 1.282 | 3.500 | 7.076 |
| F169S | 4 | -7.556 | 1.082 | 1.291 | 7.422 |
| F169S | 5 | -7.278 | 1.505 | 4.790 | 8.993 |
| M73L + F169A | 1 | -7.692 | 0.000 | 0.000 | 8.697 |
| M73L + F169A | 2 | -7.565 | 1.336 | 4.900 | 5.408 |
| M73L + F169A | 3 | -7.482 | 1.212 | 3.068 | 6.960 |
| M73L + F169A | 4 | -7.053 | 1.398 | 3.731 | 6.760 |
| M73L + F169A | 5 | -6.935 | 1.539 | 2.977 | 5.755 |
| H245F | 1 | -7.890 | 0.000 | 0.000 | 7.042 |
| H245F | 2 | -6.536 | 1.654 | 2.980 | 6.848 |
| H245F | 3 | -6.442 | 0.918 | 4.799 | 9.109 |
| H245F | 4 | -5.912 | 1.316 | 2.627 | 7.595 |
| H245L | 1 | -9.979 | 0.000 | 0.000 | 6.303 |
| H245L | 2 | -9.560 | 1.400 | 3.692 | 9.501 |
| H245L | 3 | -9.557 | 1.887 | 4.088 | 6.810 |
| H245L | 4 | -9.275 | 1.522 | 5.013 | 6.754 |
| H245L | 5 | -9.028 | 1.725 | 3.640 | 6.291 |
| M73L + F169A + H245F | 1 | -8.419 | 0.000 | 0.000 | 5.513 |
| M73L + F169A + H245F | 2 | -8.379 | 1.249 | 4.892 | 8.655 |
| M73L + F169A + H245F | 3 | -7.024 | 1.613 | 4.936 | 6.979 |
| M73L + F169A + H245F | 4 | -6.976 | 1.748 | 3.037 | 8.746 |
| M73L + F169A + H245F | 5 | -6.714 | 1.372 | 3.529 | 7.210 |
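The affinity and RMSD columns above come from the docking output. As a hedged sketch, assuming the output PDBQT files contain one `REMARK VINA RESULT:` line per pose (which is how AutoDock Vina normally writes them), these values could be collected automatically instead of copied by hand. The sample text below is a shortened, hand-written stand-in for a real output file, using two rows from the M73L + F169A + H245F results.

```python
def parse_vina_results(pdbqt_text):
    """Extract affinity and RMSD bounds for each pose from Vina output text."""
    poses = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            fields = line.split(":", 1)[1].split()
            affinity, rmsd_lb, rmsd_ub = (float(value) for value in fields[:3])
            poses.append({
                "pose": len(poses) + 1,
                "affinity": affinity,
                "rmsd_lb": rmsd_lb,
                "rmsd_ub": rmsd_ub,
            })
    return poses


# Hand-written stand-in for an output file (atom records omitted).
sample_output = """MODEL 1
REMARK VINA RESULT:    -8.419      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -8.379      1.249      4.892
ENDMDL
"""

poses = parse_vina_results(sample_output)
for pose in poses:
    print(pose)
```

The C5–Fe distance is not part of the Vina output, so it would still need to be measured for each pose separately.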
7. Best Productive Poses
This table summarises the most important productive poses from the raw docking results.
| Variant | Best productive pose | Affinity of that pose (kcal/mol) | C5–Fe distance (Å) |
| --- | --- | --- | --- |
| WT | Pose 1 | -10.150 | 7.014 |
| M73L | Pose 3 | -7.515 | 5.177 |
| H245L | Pose 5 | -9.028 | 6.291 |
| M73L + F169A + H245F | Pose 1 | -8.419 | 5.513 |
The WT enzyme had the strongest predicted binding affinity, but its C5–Fe distance was 7.014 Å. The M73L + F169A + H245F variant had a shorter C5–Fe distance of 5.513 Å, suggesting improved catalytic geometry despite weaker binding affinity.
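For the four variants in this summary, the "best productive pose" matches the pose with the shortest C5–Fe distance in the raw docking table. The sketch below reproduces that selection; treating minimum distance as the selection rule is my reading of the table, not a formally defined criterion.

```python
# C5–Fe distances (Å) for poses 1-5, copied from the raw docking results.
c5_fe_distances = {
    "WT": [7.014, 9.032, 8.245, 8.458, 9.033],
    "M73L": [7.025, 8.374, 5.177, 5.802, 7.280],
    "H245L": [6.303, 9.501, 6.810, 6.754, 6.291],
    "M73L + F169A + H245F": [5.513, 8.655, 6.979, 8.746, 7.210],
}


def best_productive_pose(pose_distances):
    """Return (pose number, distance) for the pose with the shortest C5–Fe distance."""
    index = min(range(len(pose_distances)), key=pose_distances.__getitem__)
    return index + 1, pose_distances[index]


for variant, distances in c5_fe_distances.items():
    pose, distance = best_productive_pose(distances)
    print(f"{variant}: pose {pose}, {distance:.3f} Å")
```

Only the triple mutant and WT have their shortest-distance pose ranked first by affinity, which is the distinction the paragraph above relies on.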
8. Variant Sequences
Below are the protein sequences used for the docking workflow.
This variant gave the best overall productive geometry because its best-ranked pose also had an improved C5–heme Fe distance.
Affinity: -8.419 kcal/mol
C5–heme Fe distance: 5.513 Å
I was a bit surprised by the results because I expected stronger improvements across more variants. However, the data showed that many single mutations did not improve the productive geometry, and some only produced useful conformations in lower-ranked poses. This made it clear that improving CYP725A4 selectivity is difficult and that small active-site changes can create many different substrate orientations.
10. What Took the Most Time
One of the biggest challenges was how long each mutation took to make, prepare, dock, inspect, and measure. Even though I did not test a very large mutation library, each variant required several steps: generating the mutant structure, adding/transferring the heme group, preparing the receptor, running AutoDock Vina, opening the docking poses, measuring C5–heme Fe distances, and deciding which poses were meaningful.
This made the project more time-consuming than expected, especially because docking produced many conformations and not all of them were easy to interpret objectively.
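Part of this repetition could probably be scripted. The sketch below only builds the AutoDock Vina command line for each variant; it does not run anything. The receptor, ligand, and config file names are hypothetical, and it assumes each mutant receptor has already been prepared as a PDBQT file with the heme in place.

```python
def vina_command(variant, ligand="taxadiene.pdbqt", config="box.txt"):
    """Build the Vina command for one variant (file names are placeholders)."""
    tag = variant.replace(" + ", "_")
    return [
        "vina",
        "--receptor", f"{tag}.pdbqt",
        "--ligand", ligand,
        "--config", config,
        "--out", f"{tag}_poses.pdbqt",
    ]


variants = [
    "WT", "W65F", "M73L", "M73F", "F169A", "F169S",
    "M73L + F169A", "H245F", "H245L", "M73L + F169A + H245F",
]

commands = [vina_command(variant) for variant in variants]
for command in commands:
    print(" ".join(command))
```

Each command list could then be passed to `subprocess.run(command, check=True)` on a machine where `vina` is installed. Mutant modelling, heme transfer, and pose inspection would still be manual steps.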
11. Molecular Dynamics Attempt
Near the end of the project, I started setting up molecular dynamics simulation to test whether the best docking pose would remain stable over time. The plan was to run a 10–50 ns simulation for the lead mutant. However, I did not have enough time before the deadline because some molecular dynamics simulations can take several days to set up and run properly.
This is why molecular dynamics was moved into future work instead of being included as a completed result.
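To give a sense of scale, the planned simulation length can be converted into a number of integration steps. The 2 fs timestep used here is an assumption (a common choice, but not something I had fixed), so the numbers are indicative only.

```python
def md_steps(simulation_ns, timestep_fs=2.0):
    """Number of integration steps for a simulation of the given length."""
    femtoseconds = simulation_ns * 1_000_000  # 1 ns = 1,000,000 fs
    return round(femtoseconds / timestep_fs)


print(md_steps(10))  # 5000000
print(md_steps(50))  # 25000000
```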
12. Plasmid Design and Benchling
As part of the DNA construct planning for future experimental validation, I designed a plasmid construct for expressing CYP725A4 variants in a heterologous E. coli system. This design connects the computational docking work to the future wet-lab testing step.
Benchling plasmid design for the CYP725A4 construct.
Below are the final presentation slides for this project. I included them here so the presentation version of the project can be viewed alongside the raw computational work and final results.
Slide 1
Slide 2
Slide 3
Final Project Report
SECTION 1: ABSTRACT
1. Provide an abstract/summary for your project. (minimum 150 words)
This project is aimed at a critical bottleneck in the biosynthesis of paclitaxel, a widely used chemotherapy drug. Current production methods rely heavily on semi-synthesis using yew-derived intermediates such as baccatin III, which creates environmental, economic, and supply-chain limitations [1]. This creates an opportunity to develop a more sustainable and economically viable microbial production route. A major challenge in microbial paclitaxel biosynthesis is the poor catalytic efficiency and selectivity of taxadiene 5α-hydroxylase, CYP725A4. When expressed in E. coli, CYP725A4 produces taxadien-5α-ol only as a minor product and forms multiple undesired oxidised side products instead [2,3].
The objective of this project is to improve the selectivity and efficiency of CYP725A4 through rational enzyme engineering. The hypothesis is that targeted mutations within the CYP725A4 active site can improve taxadiene positioning near the heme centre and reduce off-pathway reactions, thereby increasing productive C5 oxidation [4]. Specific aims include identifying key active-site residues, designing CYP725A4 mutation variants, and computationally evaluating these variants using molecular docking. Residues within 4–5 Å of the ligand were prioritised using structural analysis based on available CYP725A4 structural data, including the 8X1W apo structure [5]. Mutations were selected to test biochemical mechanisms such as hydrophobic pocket reshaping, aromatic interaction tuning, and second-shell interaction remodelling [6].
Methods include molecular docking with AutoDock Vina, structural analysis, focused mutation library design, and computational validation using binding affinity and C5–heme Fe distance as key metrics. The expected outcome is the identification of improved CYP725A4 variants that better position taxadiene for taxadien-5α-ol formation, contributing to more efficient and sustainable microbial paclitaxel biosynthesis.
SECTION 2: BACKGROUND
Provide background information and research for your final project.
Background and Literature Context
Paclitaxel is a clinically important anticancer drug, but its production remains challenging because natural extraction and semi-synthesis rely on plant-derived taxane intermediates [1]. Synthetic biology offers an alternative by engineering organisms to produce paclitaxel precursors, but the pathway remains difficult because several enzymatic steps are inefficient or poorly selective [1]. Recent reviews also describe paclitaxel production as limited by high cost, low natural abundance, and incomplete pathway optimization [7,8].
The current pathway, with the exact reaction targeted marked in red
A major bottleneck is CYP725A4, a cytochrome P450 enzyme from Taxus cuspidata that catalyzes the oxidation of taxadiene toward taxadien-5α-ol (See Figure above). Sagwan-Barkdoll and Anterola showed that taxadien-5α-ol is only a minor product when CYP725A4 is expressed in E. coli, while OCT and iso-OCT are the major products [2]. This shows that CYP725A4 does not reliably direct taxadiene toward the desired oxidation pathway in a microbial host.
Rouck et al. showed that CYP725A4 can be expressed and purified in E. coli using modified constructs, including N-terminal modifications and CYP725A4-TCPR fusion strategies [3]. Their work supports the feasibility of bacterial expression, but also shows that CYP725A4 is technically difficult because it is a membrane-associated plant P450 [3].
Recent structural work has made rational engineering more realistic. Song et al. used crystallography and computational analysis to investigate CYP725A4 and showed that taxadiene oxidation can follow competing routes leading either to taxadien-5α-ol or side products such as OCT and iso-OCT [4]. The 8X1W structure provides an experimentally determined CYP725A4 apo structure from Taxus cuspidata, solved by X-ray diffraction at 2.10 Å resolution [5]. The 8X3E structure is also relevant because it contains bound taxadiene and can therefore serve as an experimental reference for validating docking poses [9].
SECTION 3: VISION AND IMPACT
3a. Introduce the vision and impact of your final project. (min. 1-2 paragraphs)
Why the Project Matters
This project matters because CYP725A4 is one of the early bottlenecks in microbial paclitaxel biosynthesis. If CYP725A4 can be engineered to produce more taxadien-5α-ol and fewer side products, more pathway flux could move toward useful paclitaxel precursors [2,4].
Improving this step could make paclitaxel precursor production more sustainable by reducing dependence on slow-growing yew trees and plant extraction [1]. It could also reduce production costs and make biosynthetic manufacturing more practical. More broadly, this project contributes to synthetic biology by showing how enzyme structure, docking, and DNA construct design can be combined to improve a difficult pathway enzyme [3,4].
3b. Describe how your project is innovative (min. 3 sentences)
Novelty and Innovation
This project’s novelty lies in targeting the reason the established microbial pathway is not yet economically viable. The current system has a clear inefficiency, and this oxidation step is one of the main hurdles to making the pathway useful in commercial settings.
3c. Describe the bioethical considerations involved in your project. (min. 2 paragraphs)
Ethical Implications
The main ethical principles relevant to this project are beneficence, non-maleficence, responsibility, and justice. Beneficence applies because improving paclitaxel biosynthesis could support more sustainable production of an important anticancer drug. Justice is also relevant because improved biosynthetic production could eventually help lower manufacturing barriers and improve access to medicines [1,7]. However, non-maleficence is important because future experimental work would involve engineered E. coli strains expressing plant biosynthetic enzymes [3].
To ensure the project is ethical, future experiments should use non-pathogenic laboratory E. coli strains, standard biosafety containment, careful strain tracking, and responsible disposal of engineered material. An inducible expression system should be used to reduce unnecessary metabolic burden and limit uncontrolled pathway activity [3]. Potential unintended consequences include poor expression, unexpected metabolic toxicity, or environmental risk if engineered organisms were mishandled. Alternatives include testing variants first in purified enzyme assays, nanodisc systems, or cell-free systems before using living production strains.
SECTION 4: PROJECT AIMS
Outline three aims of your final project (min. 3 sentences, at least one for each aim)
Aim 1: Experimental Aim
The first aim of my final project is to identify and design CYP725A4 variants with improved productive taxadiene binding by utilising active-site analysis, rational mutation prediction, molecular docking, and DNA construct design. This aim uses CYP725A4 structural and mechanistic data, including the 8X1W apo structure and recent crystallographic/computational analysis of CYP725A4 [4,5].
Aim 2: Development Aim
The second aim is to experimentally test the best CYP725A4 variants in a heterologous E. coli expression system and compare product distributions to determine whether taxadien-5α-ol formation improves relative to wild-type CYP725A4 [2,3].
Aim 3: Visionary Aim
The third aim is to enable more efficient and sustainable microbial paclitaxel production by reducing a major enzymatic bottleneck in the biosynthetic pathway. If successful, this could reduce reliance on plant-derived intermediates and support more scalable production of paclitaxel precursors [1].
Illustrative outline of the project aims and respective methodologies
SECTION 5: EXPERIMENTAL DESIGN
Share a detailed experimental plan for your final project. Include a timeline for each part of your experimental plan (i.e., how long you would expect each step in your final project to take). (min. 15 lines/sentences—a numbered list is acceptable)
Detailed Experimental Plan
Use the CYP725A4 8X3E structure as the docking reference, and the 8X1W as the structural reference.
Dock taxadiene into 8X1W using AutoDock Vina and identify residues within 5 Å, 4 Å, and 3.5 Å shells of the taxadiene.
Exclude residues that are already small, backbone-constrained, or strongly disfavoured in mutation scans, such as A246, T250, V314, G316, and L423.
Run a mutation scan for conservation to exclude highly deleterious mutations.
Select the most favourable positions, in this case W65, M73, F169, and H245, as first-pass mutation targets.
Use amino acid properties to guide mutation logic; for example, M73L preserves hydrophobicity, but changes shape, while F169A removes aromatic bulk.
Design combination variants such as M73L + F169A and M73L + F169A + H245F to test epistasis.
Prepare mutant protein structures computationally.
Dock taxadiene into each model using AutoDock Vina.
Record binding affinity, RMSD values, and C5–heme Fe distance for each pose.
Compare each mutant to wild-type CYP725A4.
Select variants that improve C5–Fe distance without causing unacceptable loss of predicted binding.
Design a future E. coli expression construct using an N-terminally modified CYP725A4 fused to Taxus CPR through a flexible linker, based on previous expression work.
Add a C-terminal His6 tag for purification and use an inducible tac/trc-style promoter to reduce metabolic burden before induction.
In future wet-lab validation, express the top variants in E. coli and analyse product distribution using GC-MS.
Compare taxadien-5α-ol, OCT, and iso-OCT levels to determine whether the engineered enzyme improves product selectivity.
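For the final GC-MS comparison, one simple readout would be the share of each product among the three quantified peaks. The sketch below shows the arithmetic only. The peak areas are invented placeholders, not measurements, and using raw peak-area ratios assumes the three compounds have similar detector responses, which would need checking against standards.

```python
def product_fractions(peak_areas):
    """Convert peak areas into fractions of the total quantified product."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("Total peak area must be positive")
    return {name: area / total for name, area in peak_areas.items()}


# Hypothetical peak areas in arbitrary units (not measured data).
example_areas = {"taxadien-5α-ol": 20.0, "OCT": 50.0, "iso-OCT": 30.0}

fractions = product_fractions(example_areas)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

A variant would count as more selective if its taxadien-5α-ol fraction is higher than the wild-type fraction measured under the same conditions.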
SECTION 6: RESULTS
Share the experimental results of your project.
Validation Chosen
I validated the computational design aspect of the project by docking taxadiene into wild-type CYP725A4 and several rationally designed CYP725A4 variants. This tested whether active-site mutations could improve productive substrate positioning, measured mainly by C5–heme Fe distance [4,5]. AlphaFold 3 was used as an orthogonal validation method to assess structural integrity. The AF3 WT model aligned closely with the experimental CYP725A4 structure (PDB 8X1W), with a backbone RMSD of 0.79 Å, confirming that AF3 accurately reproduces the native fold. The M73L + F169A + H245F mutant model showed an even lower RMSD of 0.63 Å relative to the starting structure, indicating that the introduced mutations do not disrupt the global P450 architecture. Together, these results support that the engineered active-site mutations are structurally compatible and suitable for downstream molecular dynamics simulation.
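For reference, the RMSD values quoted above are root-mean-square deviations over matched atoms. The sketch below shows only that final calculation and assumes the two coordinate sets are already superposed and paired atom by atom; the superposition itself was done by the structure-alignment tool, not by this code. The coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in Å between two equally long, already superposed coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("Coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy coordinates (hypothetical): every atom is shifted by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, reference))  # 1.0
```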
Validation Protocol
Load the CYP725A4 structural model.
Identify the taxadiene binding pocket using a 5 Å residue scan.
Refine candidate residues using ≤4 Å proximity and residue chemistry.
Generate individual and combination mutant models.
Dock taxadiene into each variant using AutoDock Vina.
Record the top docking poses and predicted binding affinities.
Measure the distance between taxadiene C5 and heme Fe using ChimeraX.
Make mutations and fold via ESMFold and transfer the heme group from the WT version.
Orthogonally validate structure with AF3 and matchmake structures
Dock taxadiene onto mutant variants.
Compare each mutant against wild-type CYP725A4.
Select variants with improved C5–Fe distance as candidates for future experimental testing.
Mutation Scan
I first identified 12 residues, plus the heme group (HEM440), within the broader 3.5 Å, 4 Å, and 5 Å binding-pocket scans around taxadiene: W65, PHE69, M73, SER168, F169, H245, A246, T250, V314, G316, T317, L423, and HEM440. This initial list was then narrowed by removing residues that only appeared in the 5 Å shell, such as PHE69 and SER168, and residues with mostly negative or constrained mutation profiles, such as A246, T250, V314, G316, and L423. From the remaining candidates, W65, M73, F169, and H245 were selected as first-pass mutation targets because they were close enough to influence ligand positioning while still offering useful side-chain chemical changes. Combination variants were then included to test whether multiple mutations could improve substrate positioning through epistatic effects.
Key Docking Results
| Variant | Best productive pose | Affinity of the best pose | C5–Fe distance |
| --- | --- | --- | --- |
| WT | Pose 1 | -10.150 kcal/mol | 7.014 Å |
| M73L | Pose 3 | -7.515 kcal/mol | 5.177 Å |
| H245L | Pose 5 | -9.028 kcal/mol | 6.291 Å |
| M73L + F169A + H245F | Pose 1 | -8.419 kcal/mol | 5.513 Å |
The WT enzyme had the strongest predicted binding affinity, but its C5–Fe distance was 7.014 Å. The M73L + F169A + H245F variant had a shorter C5–Fe distance of 5.513 Å, suggesting improved catalytic geometry despite weaker binding affinity. This supports the hypothesis that productive substrate positioning may be more important than binding strength alone for improving CYP725A4 selectivity [4].
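The trade-off described here can be made explicit by computing the change relative to WT for both metrics. This is simple arithmetic on the values in the table above; a negative distance change means C5 sits closer to the heme iron, and a positive affinity change means weaker predicted binding.

```python
wild_type = {"affinity": -10.150, "c5_fe": 7.014}
lead_variant = {"affinity": -8.419, "c5_fe": 5.513}  # M73L + F169A + H245F, pose 1

delta_distance = round(lead_variant["c5_fe"] - wild_type["c5_fe"], 3)
delta_affinity = round(lead_variant["affinity"] - wild_type["affinity"], 3)

print(f"Change in C5-Fe distance: {delta_distance} Å")
print(f"Change in affinity: {delta_affinity} kcal/mol")
```

So the lead variant brings C5 about 1.5 Å closer to the iron at a predicted cost of about 1.7 kcal/mol in docking score.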
Modelled mutated protein with the best taxadiene pose
Plasmid Design and Benchling Construct
As part of the DNA construct planning for future experimental validation, I designed a plasmid construct for expressing CYP725A4 variants in a heterologous E. coli system. This design connects the computational docking work to the future wet-lab testing described in Aim 2. The plasmid design includes the CYP725A4 coding sequence, planned mutation sites, expression-control elements, and features needed for cloning and downstream validation.
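When marking planned mutation sites on a coding sequence, each residue number maps to a three-nucleotide codon. The sketch below gives that mapping under the assumption that residue 1 is the first codon of the CDS as annotated; any N-terminal modification, tag, or fusion partner placed upstream would shift these positions, so they are illustrative rather than the coordinates in my Benchling file.

```python
def codon_span(residue_number):
    """1-based (start, end) nucleotide positions of a residue's codon in the CDS."""
    if residue_number < 1:
        raise ValueError("Residue numbers start at 1")
    start = 3 * (residue_number - 1) + 1
    return start, start + 2


# Positions of the three mutated residues in the lead variant.
for residue in (73, 169, 245):
    print(residue, codon_span(residue))
```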
I also attempted to run a 10–50 ns MD simulation of the bound complex. However, the simulation took much longer than expected, so I have deferred it until after the project deadline.
Synthetic Biology Techniques Used
This validation used protein design, molecular docking, DNA construct planning, and database-supported structural analysis. Protein design was used to choose active-site mutations based on residue chemistry [6]. Docking was used to estimate binding poses and productive orientation. Structural databases were used because experimentally determined CYP725A4 structures now provide a stronger basis for rational engineering [5].
SECTION 7: DISCUSSION AND FUTURE WORK
7a. Discussion (2 paragraphs minimum)
One challenge is that docking does not fully capture enzyme flexibility, membrane effects, electron transfer, or true catalytic rate. This is important because CYP725A4 is a membrane-associated plant P450, and previous work showed that expression context and redox partners are important for functional testing [3].
Another limitation is that improved C5–Fe distance does not guarantee improved taxadien-5α-ol formation. CYP725A4 can form multiple products, including OCT and iso-OCT, so future validation must measure actual product distribution experimentally [2]. To overcome this, the next step should be experimental testing of the lead variants in E. coli, followed by GC-MS analysis. Molecular dynamics simulations could also be added before wet-lab testing to check whether the improved docking pose remains stable over time [4].
7b. Future Work (1 paragraph minimum)
The future plan of this project directly follows Aim 2 and Aim 3. The next step should be experimental testing of the lead CYP725A4 variants in a heterologous E. coli expression system, followed by GC-MS analysis to compare taxadien-5α-ol, OCT, and iso-OCT product distributions. This would show whether the computationally selected mutations improve product selectivity relative to wild-type CYP725A4. In the longer term, the best-performing CYP725A4 variants could be integrated into a larger microbial paclitaxel precursor pathway to support more efficient and sustainable paclitaxel biosynthesis. This connects to the visionary aim of reducing reliance on plant-derived intermediates and making microbial production of paclitaxel precursors more scalable.
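The GC-MS comparison described above could be summarised as product fractions per variant and a fold change relative to wild type. The following sketch assumes that integrated peak areas are available for the three products and that they can be compared directly. In practice this would need an internal standard or response factors. All peak areas shown are hypothetical placeholders, not data.

```python
PRODUCTS = ("taxadien-5alpha-ol", "OCT", "iso-OCT")


def product_fractions(peak_areas):
    """Fraction of the summed peak area contributed by each product."""
    total = sum(peak_areas[p] for p in PRODUCTS)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {p: peak_areas[p] / total for p in PRODUCTS}


def selectivity_fold_change(variant_areas, wild_type_areas,
                            product="taxadien-5alpha-ol"):
    """Variant fraction of `product` divided by the wild-type fraction."""
    variant = product_fractions(variant_areas)[product]
    wild_type = product_fractions(wild_type_areas)[product]
    return variant / wild_type


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units), for illustration only.
    wild_type = {"taxadien-5alpha-ol": 10.0, "OCT": 60.0, "iso-OCT": 30.0}
    variant = {"taxadien-5alpha-ol": 30.0, "OCT": 50.0, "iso-OCT": 20.0}
    for name, areas in (("wild type", wild_type), ("variant", variant)):
        fractions = product_fractions(areas)
        print(name, {p: round(f, 2) for p, f in fractions.items()})
    fold = selectivity_fold_change(variant, wild_type)
    print(f"fold change in taxadien-5alpha-ol fraction: {fold:.2f}")
```

Reporting fractions rather than raw areas makes variants comparable even when total expression or extraction efficiency differs between cultures.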
SECTION 8: TECHNIQUES, TOOLS, AND TECHNOLOGY
8. We discussed and practiced various techniques related to synthetic biology throughout the semester. Place a check next to the techniques relevant to your project.
Used Techniques
Lab Safety
Bioethical Considerations
DNA Construct Design
Databases
Protein Design
Models and Notebooks
Bioproduction
Chassis Selection
Plasmid Preparation
Bacterial Culturing
Quality Control/Analysis
Bacterial Processing
Protein Purification
Primer Design or Selection
PCR Reactions
Gibson Assembly
Designing a Twist Order
Use of Benchling
Gel Electrophoresis
Not Used Techniques
Pipetting
DNA Gel Art
DNA Sequencing
DNA Editing
Restriction Enzyme Digestion
DNA Purification From Gel
Lab Automation
Creating Code for Laboratory Automation
Using Liquid Handling Robots
Creating a plan to use the Autonomous lab at Ginkgo Bioworks
Use of Boltz or PepMLM
Use of Asimov Kernel
Registry of Standard Biological Parts
Cell-Free Reactions
Freeze-Dried Cell-Free Systems
miniPCR Tools
Other Cloning Methods
CRISPR/Cas9
Designing Prime Editing gRNA
9. Expand upon two techniques you checked in the previous question by describing how you would utilize those techniques in your final project. (min. 4 sentences)
Expanded Techniques
Protein design and molecular docking:
Protein design was used to select CYP725A4 active-site mutations based on residue position, side-chain chemistry, and predicted effects on substrate orientation. Molecular docking was then used to test whether these mutations improved taxadiene positioning near the heme iron. This is appropriate because recent CYP725A4 studies show that product selectivity depends on competing catalytic pathways controlled by substrate positioning [4].
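Before docking, each planned variant has to be generated from the wild-type sequence. A minimal sketch of that bookkeeping step is shown below, using the common "wild-type residue, position, new residue" notation. The sequence and the mutation are toy examples, not the CYP725A4 sequence or the mutations chosen in this project.

```python
import re

# Matches notation such as "A4V": wild-type residue, 1-based position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutation(protein_seq, mutation):
    """Return protein_seq with a single point mutation applied.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the stated wild-type residue does not match the sequence.
    """
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert 1-based residue number to 0-based index
    if not 0 <= index < len(protein_seq):
        raise ValueError(f"position {position} is outside the sequence")
    if protein_seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein_seq[index]}"
        )
    return protein_seq[:index] + new + protein_seq[index + 1:]


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # toy example, not CYP725A4
    print(apply_mutation(toy_sequence, "A4V"))
```

Checking the wild-type residue before substituting catches numbering mismatches, for example between a full-length sequence and a truncated crystal-structure construct.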
DNA construct design:
The future experimental construct is based on previous E. coli CYP725A4 expression systems. The native N-terminal membrane-anchor region would be replaced with an expression-improving peptide, CYP725A4 would be fused to Taxus CPR to support electron transfer, and a His6 tag would be added for purification [3]. An inducible promoter would be preferred to reduce metabolic stress before induction.
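The fusion layout described above can be checked in silico before ordering DNA. The sketch below joins the parts in one possible order and runs basic open-reading-frame checks. The part order (including a C-terminal His6 tag) is an assumption, and every sequence except the His6 codons is a short placeholder, since the real part sequences are not given here.

```python
HIS6 = "CATCACCATCACCATCAC"  # six histidine codons (CAT and CAC both encode His)
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Placeholder DNA only; the part order is an assumption for illustration.
PARTS = [
    ("n_terminal_peptide", "ATGGCTCTG"),  # hypothetical, begins with start codon
    ("cyp725a4_truncated", "GCTGCTGCT"),  # placeholder for the truncated CDS
    ("taxus_cpr", "GGTGGTGGT"),           # placeholder for the CPR domain
    ("his6_tag", HIS6),
    ("stop", "TAA"),
]


def assemble(parts):
    """Concatenate part sequences in the listed order."""
    return "".join(seq for _, seq in parts)


def check_orf(dna):
    """Return a list of problems found in a fusion ORF (empty if none)."""
    problems = []
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not dna.startswith("ATG"):
        problems.append("does not begin with ATG")
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop codon at the end")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


if __name__ == "__main__":
    construct = assemble(PARTS)
    print(len(construct), "bp;", check_orf(construct) or "no problems found")
```

Checks like these are most useful at part junctions, where a single-base slip would put the CPR domain and the His6 tag out of frame.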
SECTION 9: ADDITIONAL INFORMATION
10a. List all references cited in this assignment (bullet-point list)
1. Tong Y, Luo YF, Gao W. Biosynthesis of paclitaxel using synthetic biology. Phytochem Rev. 2022;21(3):863-877. doi:10.1007/s11101-021-09766-0.
2. Sagwan-Barkdoll L, Anterola AM. Taxadiene-5α-ol is a minor product of CYP725A4 when expressed in Escherichia coli. Biotechnol Appl Biochem. 2018;65(3):294-305. doi:10.1002/bab.1606.
3. Rouck JE, Biggs BW, Kambalyal A, Arnold WR, De Mey M, Ajikumar PK, et al. Heterologous expression and characterization of plant taxadiene-5α-hydroxylase CYP725A4 in Escherichia coli. Protein Expr Purif. 2017;132:60-67. doi:10.1016/j.pep.2017.01.008.
4. Song X, Wang Q, Zhu X, Fang W, Liu X, Shi C, et al. Unraveling the catalytic mechanism of taxadiene-5α-hydroxylase from crystallography and computational analyses. ACS Catal. 2024;14(6):3912-3925. doi:10.1021/acscatal.3c05807.
5. RCSB Protein Data Bank. 8X1W: CYP725A4 apo structure [Internet]. RCSB PDB; 2024 [cited 2026 May 24]. Available from: https://www.rcsb.org/structure/8X1W
6. Mutanda I, Li J, Xu F, Wang Y. Recent advances in metabolic engineering, protein engineering, and transcriptome-guided insights toward synthetic production of Taxol. Front Bioeng Biotechnol. 2021;9:632269. doi:10.3389/fbioe.2021.632269.
7. Zhang S, Ye T, Liu Y, Hou G, Wang Q, Zhao F, et al. Research advances in clinical applications, anticancer mechanism, total chemical synthesis, semi-synthesis and biosynthesis of paclitaxel. Molecules. 2023;28(22):7517. doi:10.3390/molecules28227517.
8. RCSB Protein Data Bank. 8X3E: CYP725A4-Taxa-4,11-diene complex [Internet]. RCSB PDB; 2024 [cited 2026 May 24]. Available from: https://www.rcsb.org/structure/8X3E
10b. Create a supply list and budget for your project (bullet-point list)