Final Project Report


SECTION 1: ABSTRACT

1. Provide an abstract/summary for your project. (minimum 150 words)

This project targets a critical bottleneck in the biosynthesis of paclitaxel, a widely used chemotherapy drug. Current production relies heavily on semi-synthesis from yew-derived intermediates such as baccatin III, which imposes environmental, economic, and supply-chain limitations [1]. These limitations create an opportunity to develop a more sustainable and economically viable microbial production route. A major challenge in microbial paclitaxel biosynthesis is the poor catalytic efficiency and selectivity of taxadiene 5α-hydroxylase (CYP725A4). When expressed in E. coli, CYP725A4 forms taxadien-5α-ol only as a minor product, and multiple undesired oxidised side products predominate [2,3].

The objective of this project is to improve the selectivity and efficiency of CYP725A4 through rational enzyme engineering. The hypothesis is that targeted mutations within the CYP725A4 active site can position taxadiene closer to the heme centre and reduce off-pathway reactions, thereby increasing productive C5 oxidation [4]. Specific aims include identifying key active-site residues, designing CYP725A4 variants, and evaluating these variants computationally using molecular docking. Residues within 4–5 Å of docked taxadiene were prioritised using available CYP725A4 structural data, including the 8X1W apo structure [5]. Mutations were selected to test biochemical mechanisms such as hydrophobic pocket reshaping, aromatic interaction tuning, and second-shell interaction remodelling [6].

Methods include molecular docking with AutoDock Vina, structural analysis, focused mutation library design, and computational validation using binding affinity and C5–heme Fe distance as key metrics. The expected outcome is the identification of improved CYP725A4 variants that better position taxadiene for taxadien-5α-ol formation, contributing to more efficient and sustainable microbial paclitaxel biosynthesis.


SECTION 2: BACKGROUND

Provide background information and research for your final project.

Background and Literature Context

Paclitaxel is a clinically important anticancer drug, but its production remains challenging because natural extraction and semi-synthesis rely on plant-derived taxane intermediates [1]. Synthetic biology offers an alternative by engineering organisms to produce paclitaxel precursors, but the pathway remains difficult because several enzymatic steps are inefficient or poorly selective [1]. Recent reviews also describe paclitaxel production as limited by high cost, low natural abundance, and incomplete pathway optimization [7,8].

The current pathway, with the targeted reaction marked in red

A major bottleneck is CYP725A4, a cytochrome P450 enzyme from Taxus cuspidata that catalyzes the oxidation of taxadiene toward taxadien-5α-ol (see figure above). Sagwan-Barkdoll and Anterola showed that taxadien-5α-ol is only a minor product when CYP725A4 is expressed in E. coli, while OCT and iso-OCT are the major products [2]. This shows that CYP725A4 does not reliably direct taxadiene toward the desired oxidation pathway in a microbial host.

Rouck et al. showed that CYP725A4 can be expressed and purified in E. coli using modified constructs, including N-terminal modifications and CYP725A4-TCPR fusion strategies [3]. Their work supports the feasibility of bacterial expression, but also shows that CYP725A4 is technically difficult because it is a membrane-associated plant P450 [3].

Recent structural work has made rational engineering more realistic. Song et al. used crystallography and computational analysis to investigate CYP725A4 and showed that taxadiene oxidation can follow competing routes leading either to taxadien-5α-ol or to side products such as OCT and iso-OCT [4]. The 8X1W entry provides an experimentally determined apo structure of CYP725A4 from Taxus cuspidata, solved by X-ray diffraction at 2.10 Å resolution [5]. The 8X3E entry is also relevant because it contains bound taxadiene (taxa-4,11-diene), giving an experimental ligand pose against which docking results can be validated [9].


SECTION 3: VISION AND IMPACT

3a. Introduce the vision and impact of your final project. (min. 1-2 paragraphs)

Why the Project Matters

This project matters because CYP725A4 is one of the early bottlenecks in microbial paclitaxel biosynthesis. If CYP725A4 can be engineered to produce more taxadien-5α-ol and fewer side products, more pathway flux could move toward useful paclitaxel precursors [2,4].

Improving this step could make paclitaxel precursor production more sustainable by reducing dependence on slow-growing yew trees and plant extraction [1]. It could also reduce production costs and make biosynthetic manufacturing more practical. More broadly, this project contributes to synthetic biology by showing how enzyme structure, docking, and DNA construct design can be combined to improve a difficult pathway enzyme [3,4].

3b. Describe how your project is innovative (min. 3 sentences)

Novelty and Innovation

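This project’s novelty lies in targeting the specific enzymatic step that prevents the established pathway from being economically viable. The current system has a clear inefficiency: CYP725A4 converts taxadiene mainly into side products such as OCT and iso-OCT rather than taxadien-5α-ol [2], which makes this oxidation step one of the main hurdles to commercial use. Rather than screening mutations at random, the project uses recently solved CYP725A4 structures [4,5,9] to design a small, mechanism-driven set of active-site variants and ranks them by C5–heme Fe distance, a geometric proxy for productive substrate positioning.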

3c. Describe the bioethical considerations involved in your project. (min. 2 paragraphs)

Ethical Implications

The main ethical principles relevant to this project are beneficence, non-maleficence, responsibility, and justice. Beneficence applies because improving paclitaxel biosynthesis could support more sustainable production of an important anticancer drug. Justice is also relevant because improved biosynthetic production could eventually help lower manufacturing barriers and improve access to medicines [1,7]. However, non-maleficence is important because future experimental work would involve engineered E. coli strains expressing plant biosynthetic enzymes [3].

To ensure the project is ethical, future experiments should use non-pathogenic laboratory E. coli strains, standard biosafety containment, careful strain tracking, and responsible disposal of engineered material. An inducible expression system should be used to reduce unnecessary metabolic burden and limit uncontrolled pathway activity [3]. Potential unintended consequences include poor expression, unexpected metabolic toxicity, or environmental risk if engineered organisms were mishandled. Alternatives include testing variants first in purified enzyme assays, nanodisc systems, or cell-free systems before using living production strains.


SECTION 4: PROJECT AIMS

Outline three aims of your final project (min. 3 sentences, at least one for each aim)

Aim 1: Experimental Aim

The first aim of my final project is to identify and design CYP725A4 variants with improved productive taxadiene binding by utilising active-site analysis, rational mutation prediction, molecular docking, and DNA construct design. This aim uses CYP725A4 structural and mechanistic data, including the 8X1W apo structure and recent crystallographic/computational analysis of CYP725A4 [4,5].

Aim 2: Development Aim

The second aim is to experimentally test the best CYP725A4 variants in a heterologous E. coli expression system and compare product distributions to determine whether taxadien-5α-ol formation improves relative to wild-type CYP725A4 [2,3].

Aim 3: Visionary Aim

The third aim is to enable more efficient and sustainable microbial paclitaxel production by reducing a major enzymatic bottleneck in the biosynthetic pathway. If successful, this could reduce reliance on plant-derived intermediates and support more scalable production of paclitaxel precursors [1].

Illustrative outline of the project aims and respective methodologies


SECTION 5: EXPERIMENTAL DESIGN

Share a detailed experimental plan for your final project. Include a timeline for each part of your experimental plan (i.e., how long you would expect each step in your final project to take). (min. 15 lines/sentences—a numbered list is acceptable)

Detailed Experimental Plan

  1. Use the taxadiene-bound 8X3E structure as the docking reference and the 8X1W apo structure as the structural reference.
  2. Dock taxadiene into 8X1W using AutoDock Vina and identify residues within 5 Å, 4 Å, and 3.5 Å shells of the ligand.
  3. Run a conservation-based mutation scan of these pocket residues to flag strongly deleterious substitutions.
  4. Exclude residues that are already small, backbone-constrained, or strongly disfavoured in the mutation scan, such as A246, T250, V314, G316, and L423.
  5. Select the most promising remaining residues, in this case W65, M73, F169, and H245, as first-pass mutation targets.
  6. Design individual variants: W65F, M73L, M73F, F169A, F169S, H245F, and H245L.
  7. Use amino acid properties to guide mutation logic; for example, M73L preserves hydrophobicity but changes side-chain shape, while F169A removes aromatic bulk.
  8. Design combination variants such as M73L + F169A and M73L + F169A + H245F to test epistasis.
  9. Prepare mutant protein structures computationally, for example by predicting each variant with ESMFold and transferring the heme from the wild-type structure.
  10. Dock taxadiene into each model using AutoDock Vina.
  11. Record binding affinity, RMSD values, and C5–heme Fe distance for each pose.
  12. Compare each mutant to wild-type CYP725A4.
  13. Select variants that improve C5–Fe distance without causing unacceptable loss of predicted binding.
  14. Design a future E. coli expression construct using an N-terminally modified CYP725A4 fused to Taxus CPR through a flexible linker, based on previous expression work.
  15. Add a C-terminal His6 tag for purification and use an inducible tac/trc-style promoter to reduce metabolic burden before induction.
  16. In future wet-lab validation, express the top variants in E. coli and analyse product distribution using GC-MS.
  17. Compare taxadien-5α-ol, OCT, and iso-OCT levels to determine whether the engineered enzyme improves product selectivity.
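Steps 6 and 8 can be made systematic by enumerating every combination of the planned substitutions. The Python sketch below assumes at most one substitution per position and uses only the substitutions listed in step 6; it is meant to show the size of the full combinatorial space, not to prescribe which combinations should be built.

```python
from itertools import combinations, product

# Substitutions per position, taken from step 6 of the plan.
SUBSTITUTIONS = {
    "W65": ["F"],
    "M73": ["L", "F"],
    "F169": ["A", "S"],
    "H245": ["F", "L"],
}

def enumerate_variants(subs, max_order=None):
    """Return every variant with at most one substitution per position.

    Each variant is a tuple of mutation strings such as ("M73L", "F169A").
    max_order limits how many positions may be mutated at once.
    """
    positions = list(subs)
    max_order = max_order or len(positions)
    variants = []
    for order in range(1, max_order + 1):
        for chosen in combinations(positions, order):
            # Pick one replacement amino acid for each chosen position.
            for aas in product(*(subs[p] for p in chosen)):
                variants.append(tuple(p + aa for p, aa in zip(chosen, aas)))
    return variants

if __name__ == "__main__":
    library = enumerate_variants(SUBSTITUTIONS)
    singles = [v for v in library if len(v) == 1]
    print(len(singles), "single variants;", len(library), "variants in total")
```

Running it reports 7 single variants and 53 variants in total (18 doubles, 20 triples, 8 quadruples), which illustrates why a few targeted combinations such as M73L + F169A + H245F are a practical starting point.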

SECTION 6: RESULTS

Share the experimental results of your project.

Validation Chosen

I validated the computational design aspect of the project by docking taxadiene into wild-type CYP725A4 and several rationally designed CYP725A4 variants. This tested whether active-site mutations could improve productive substrate positioning, measured mainly by C5–heme Fe distance [4,5]. AlphaFold 3 (AF3) was used as an orthogonal method to assess structural integrity. The AF3 WT model aligned closely with the experimental CYP725A4 structure (PDB 8X1W), with a backbone RMSD of 0.79 Å, indicating that AF3 reproduces the native fold well. The M73L + F169A + H245F mutant model showed an even lower RMSD of 0.63 Å relative to the starting structure, indicating that the introduced mutations do not disrupt the global P450 architecture. Together, these results suggest that the engineered active-site mutations are structurally compatible with the P450 fold and suitable for downstream molecular dynamics simulation, although a low global RMSD does not by itself confirm local active-site geometry.
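The backbone RMSD values above come from superimposing two models and averaging the remaining atom-to-atom distances. A minimal NumPy sketch of that calculation (the Kabsch algorithm) is shown below, assuming the two coordinate arrays are already paired atom for atom, for example matched backbone atoms. ChimeraX Matchmaker additionally performs sequence alignment and iteratively prunes poorly matching pairs, so the value it reports can differ from this unpruned RMSD.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between paired N x 3 coordinate sets P and Q after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays of paired atoms")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because translation and rotation are removed first, a rigidly moved copy of a structure gives an RMSD of essentially zero, so the reported values reflect genuine shape differences.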

Validation Protocol

  1. Load the CYP725A4 structural model.
  2. Identify the taxadiene binding pocket using a 5 Å residue scan.
  3. Refine candidate residues using ≤4 Å proximity and residue chemistry.
  4. Generate individual and combination mutant models.
  5. Dock taxadiene into each variant using AutoDock Vina.
  6. Record the top docking poses and predicted binding affinities.
  7. Measure the distance between taxadiene C5 and heme Fe using ChimeraX.
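  8. Introduce the mutations, predict each mutant structure with ESMFold, and transfer the heme group from the WT structure.
  9. Validate the mutant structures orthogonally with AlphaFold 3 and superimpose them on the reference structure using ChimeraX Matchmaker.
  10. Dock taxadiene into the mutant models.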
  11. Compare each mutant against wild-type CYP725A4.
  12. Select variants with improved C5–Fe distance as candidates for future experimental testing.
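Step 7 measures the taxadiene C5 to heme Fe distance in ChimeraX. The same measurement can be scripted across every docked pose with the Python sketch below, which reads the fixed-column coordinate records shared by PDB and PDBQT files. It assumes the heme is residue HEM with an iron atom named FE (the wwPDB convention). The ligand residue name LIG and atom name C5 are hypothetical defaults: docking output does not necessarily preserve taxane numbering, so they must match however the ligand file was prepared.

```python
import math
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]

def atom_coord(lines: Iterable[str], resname: str, atom_name: str) -> Coord:
    """Return x, y, z of the first ATOM/HETATM record matching resname and atom_name."""
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed PDB columns: atom name 13-16, residue name 18-20, x/y/z 31-54.
        if line[17:20].strip() == resname and line[12:16].strip() == atom_name:
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise KeyError(f"no atom {atom_name} in residue {resname}")

def split_models(lines: Iterable[str]) -> List[List[str]]:
    """Split multi-pose output into one list of lines per MODEL ... ENDMDL block."""
    lines = list(lines)
    models: List[List[str]] = []
    current: List[str] = []
    in_model = False
    for line in lines:
        if line.startswith("MODEL"):
            current, in_model = [], True
        elif line.startswith("ENDMDL"):
            models.append(current)
            in_model = False
        elif in_model:
            current.append(line)
    # A single-pose file without MODEL records is treated as one pose.
    return models or [lines]

def c5_fe_distances(receptor_lines, pose_lines, ligand_resname="LIG", c5_name="C5"):
    """C5-Fe distance in Å for every docked pose, in file order."""
    fe = atom_coord(receptor_lines, "HEM", "FE")
    return [math.dist(fe, atom_coord(model, ligand_resname, c5_name))
            for model in split_models(pose_lines)]
```

Scripting the measurement keeps it identical for WT and every variant, which matters when differences of about 1 Å drive the selection in step 12.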

Mutation Scan

I first identified 12 residues, plus the heme cofactor (HEM440), within the 3.5 Å, 4 Å, and 5 Å binding-pocket scans around taxadiene: W65, F69, M73, S168, F169, H245, A246, T250, V314, G316, T317, and L423. This initial list was then narrowed by removing residues that appeared only in the 5 Å shell, such as F69 and S168, and residues with mostly negative or constrained mutation profiles, such as A246, T250, V314, G316, and L423. From the remaining candidates, W65, M73, F169, and H245 were selected as first-pass mutation targets because they were close enough to influence ligand positioning while still offering useful side-chain chemical changes. Combination variants were then included to test whether multiple mutations could improve substrate positioning through epistatic effects.
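The 3.5 Å, 4 Å, and 5 Å shells can be computed directly from coordinates: a residue belongs to a shell if any of its atoms lies within the cutoff of any ligand atom. The Python sketch below assumes the atoms of the docked complex have already been read into (residue label, x, y, z) tuples; this brute-force approach is adequate for a single binding pocket.

```python
import math

def residues_within(atoms, ligand_xyz, cutoff):
    """Labels of residues with at least one atom within cutoff (Å) of any ligand atom.

    atoms is an iterable of (residue_label, x, y, z); ligand_xyz is a list of (x, y, z).
    """
    return {
        label
        for label, *xyz in atoms
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz)
    }

def pocket_shells(atoms, ligand_xyz, cutoffs=(3.5, 4.0, 5.0)):
    """Residue set for each cutoff; each inner shell is a subset of the outer ones."""
    atoms = list(atoms)
    return {cutoff: residues_within(atoms, ligand_xyz, cutoff) for cutoff in cutoffs}
```

With shells = pocket_shells(...), the residues found only in the outermost shell are the set difference shells[5.0] - shells[4.0], which corresponds to the first filtering step described above.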

Key Docking Results

Variant | Best productive pose | Affinity of best pose (kcal/mol) | C5–Fe distance (Å)
WT | Pose 1 | -10.150 | 7.014
M73L | Pose 3 | -7.515 | 5.177
H245L | Pose 5 | -9.028 | 6.291
M73L + F169A + H245F | Pose 1 | -8.419 | 5.513

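The WT enzyme had the strongest predicted binding affinity (-10.150 kcal/mol), but its C5–Fe distance was 7.014 Å. All three variants shortened this distance at the cost of weaker predicted binding. M73L placed C5 closest to the iron (5.177 Å) but showed the largest affinity loss (-7.515 kcal/mol), whereas the M73L + F169A + H245F variant combined a shorter C5–Fe distance of 5.513 Å with a smaller affinity loss (-8.419 kcal/mol), suggesting improved catalytic geometry. These results are consistent with the hypothesis that productive substrate positioning may matter more than binding strength alone for improving CYP725A4 selectivity [4].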
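Step 13 of the plan selects variants that shorten the C5–Fe distance without an unacceptable loss of predicted binding. The Python sketch below makes that rule explicit using the values from the table above. The 2.0 kcal/mol cutoff is a hypothetical placeholder, since the report does not define what counts as unacceptable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DockingResult:
    variant: str
    affinity: float  # best productive pose, kcal/mol (more negative = tighter)
    c5_fe: float     # taxadiene C5 to heme Fe distance, Å

RESULTS = [
    DockingResult("WT", -10.150, 7.014),
    DockingResult("M73L", -7.515, 5.177),
    DockingResult("H245L", -9.028, 6.291),
    DockingResult("M73L + F169A + H245F", -8.419, 5.513),
]

def triage(results, max_affinity_loss=2.0):
    """Compare each variant with WT and flag those passing the selection rule.

    max_affinity_loss is a hypothetical cutoff in kcal/mol.
    """
    wt = next(r for r in results if r.variant == "WT")
    rows = []
    for r in results:
        if r.variant == "WT":
            continue
        d_dist = r.c5_fe - wt.c5_fe       # negative = C5 closer to Fe than in WT
        d_aff = r.affinity - wt.affinity  # positive = weaker predicted binding
        passes = d_dist < 0 and d_aff <= max_affinity_loss
        rows.append((r.variant, round(d_dist, 3), round(d_aff, 3), passes))
    return rows

if __name__ == "__main__":
    for variant, d_dist, d_aff, passes in triage(RESULTS):
        print(f"{variant:<22} dC5-Fe {d_dist:+.3f} Å  dAffinity {d_aff:+.3f} kcal/mol  pass={passes}")
```

With this placeholder cutoff, H245L and the triple mutant pass while M73L, despite the shortest C5–Fe distance, is filtered out by its 2.635 kcal/mol affinity loss; relaxing the cutoff to 3.0 kcal/mol admits all three variants.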

Modelled mutated protein with the best taxadiene pose

Plasmid Design and Benchling Construct

As part of the DNA construct planning for future experimental validation, I designed a plasmid construct for expressing CYP725A4 variants in a heterologous E. coli system. This design connects the computational docking work to the future wet-lab testing described in Aim 2. The plasmid design includes the CYP725A4 coding sequence, planned mutation sites, expression-control elements, and features needed for cloning and downstream validation.
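Introducing the planned substitutions into the CYP725A4 coding sequence is essentially a codon swap with two sanity checks: the existing codon must encode the expected wild-type residue, and the new codon must encode the intended one. The Python sketch below does this with the standard genetic code. The offset parameter is an assumption worth checking: residue numbers taken from the crystal structure (W65, M73, F169, H245) may not match codon positions in an N-terminally modified expression construct, so offset maps structure numbering onto the coding sequence. Codon choice is left to the caller.

```python
import re

# Standard genetic code (NCBI table 1), built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(cds):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))

def apply_mutation(cds, mutation, new_codon, offset=0):
    """Replace the codon for a substitution written like 'M73L'.

    offset shifts structure numbering to codon numbering (hypothetical; derive it
    by aligning the construct's protein sequence with the structure sequence).
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wt_aa, position, mut_aa = match.group(1), int(match.group(2)), match.group(3)
    start = 3 * (position + offset - 1)
    old_codon = cds[start:start + 3].upper()
    if CODON_TABLE.get(old_codon) != wt_aa:
        raise ValueError(f"codon {old_codon!r} at {mutation} does not encode {wt_aa}")
    new_codon = new_codon.upper()
    if CODON_TABLE.get(new_codon) != mut_aa:
        raise ValueError(f"codon {new_codon!r} does not encode {mut_aa}")
    return cds[:start] + new_codon + cds[start + 3:]
```

Combination variants can be built by applying single mutations in sequence, since each call returns a new sequence and re-checks the wild-type residue at its own position.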

Benchling construct of native CYP725A4

View plasmid design on Benchling

I attempted a molecular dynamics (MD) simulation of the docked complex over 10–50 ns to test whether the binding pose remains stable. However, the simulation took much longer than expected, so I have deferred it until after the project deadline.
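The cost of a 10–50 ns run can be estimated from the integration timestep before committing compute time. The Python sketch below assumes a 2 fs timestep, which is common when bonds to hydrogen are constrained, and a throughput in ns/day that must be benchmarked on the actual system; the 20 ns/day figure is a hypothetical placeholder, not a measurement from this project.

```python
FS_PER_NS = 1_000_000  # 1 ns = 10^6 fs

def md_steps(sim_ns, timestep_fs=2.0):
    """Integration steps needed to simulate sim_ns nanoseconds."""
    return round(sim_ns * FS_PER_NS / timestep_fs)

def wall_clock_days(sim_ns, ns_per_day):
    """Estimated wall-clock days at a given throughput (ns simulated per day)."""
    if ns_per_day <= 0:
        raise ValueError("throughput must be positive")
    return sim_ns / ns_per_day

if __name__ == "__main__":
    # 20 ns/day is a hypothetical throughput; benchmark the real system first.
    for ns in (10, 50):
        print(f"{ns} ns: {md_steps(ns):,} steps, about {wall_clock_days(ns, 20.0):.1f} days")
```

At a 2 fs timestep, 10 ns requires 5,000,000 steps and 50 ns requires 25,000,000 steps, so even modest per-step costs add up quickly for a solvated P450.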

Synthetic Biology Techniques Used

This validation used protein design, molecular docking, DNA construct planning, and database-supported structural analysis. Protein design was used to choose active-site mutations based on residue chemistry [6]. Docking was used to estimate binding poses and productive orientation. Structural databases were used because experimentally determined CYP725A4 structures now provide a stronger basis for rational engineering [5].


SECTION 7: DISCUSSION AND FUTURE WORK

7a. Discussion (2 paragraphs minimum)

One challenge is that docking does not fully capture enzyme flexibility, membrane effects, electron transfer, or true catalytic rate. This is important because CYP725A4 is a membrane-associated plant P450, and previous work showed that expression context and redox partners are important for functional testing [3].

Another limitation is that improved C5–Fe distance does not guarantee improved taxadien-5α-ol formation. CYP725A4 can form multiple products, including OCT and iso-OCT, so future validation must measure actual product distribution experimentally [2]. To overcome this, the next step should be experimental testing of the lead variants in E. coli, followed by GC-MS analysis. Molecular dynamics simulations could also be added before wet-lab testing to check whether the improved docking pose remains stable over time [4].
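Comparing product distributions by GC-MS reduces to computing each product's share of the total oxidised products for WT and each variant. The Python sketch below assumes integrated peak areas have already been extracted and corrected with response factors (for example from authentic standards); uncorrected areas would bias the comparison because different products ionise differently.

```python
def product_fractions(peak_areas):
    """Share of total product for each compound.

    peak_areas maps product names (e.g. "taxadien-5α-ol", "OCT", "iso-OCT") to
    response-factor-corrected GC-MS peak areas.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: area / total for name, area in peak_areas.items()}

def selectivity_gain(variant_areas, wt_areas, target="taxadien-5α-ol"):
    """Fold change in the target product's share, variant relative to wild type."""
    wt_share = product_fractions(wt_areas)[target]
    if wt_share == 0:
        raise ValueError("target product not detected in wild type")
    return product_fractions(variant_areas)[target] / wt_share
```

A gain above 1 would indicate that a variant shifts flux toward taxadien-5α-ol relative to WT, which is the experimental counterpart of the shorter docked C5–Fe distance.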

7b. Future Work (1 paragraph minimum)

The future plan of this project directly follows Aim 2 and Aim 3. The next step should be experimental testing of the lead CYP725A4 variants in a heterologous E. coli expression system, followed by GC-MS analysis to compare taxadien-5α-ol, OCT, and iso-OCT product distributions. This would show whether the computationally selected mutations improve product selectivity relative to wild-type CYP725A4. In the longer term, the best-performing CYP725A4 variants could be integrated into a larger microbial paclitaxel precursor pathway to support more efficient and sustainable paclitaxel biosynthesis. This connects to the visionary aim of reducing reliance on plant-derived intermediates and making microbial production of paclitaxel precursors more scalable.


SECTION 8: TECHNIQUES, TOOLS, AND TECHNOLOGY

8. We discussed and practiced various techniques related to synthetic biology throughout the semester. Place a check next to the techniques relevant to your project.

Used Techniques

  • Lab Safety
  • Bioethical Considerations
  • DNA Construct Design
  • Databases
  • Protein Design
  • Models and Notebooks
  • Bioproduction
  • Chassis Selection
  • Plasmid Preparation
  • Bacterial Culturing
  • Quality Control/Analysis
  • Bacterial Processing
  • Protein Purification
  • Primer Design or Selection
  • PCR Reactions
  • Gibson Assembly
  • Designing a Twist Order
  • Use of Benchling
  • Gel Electrophoresis

Not Used Techniques

  • Pipetting
  • DNA Gel Art
  • DNA Sequencing
  • DNA Editing
  • Restriction Enzyme Digestion
  • DNA Purification From Gel
  • Lab Automation
  • Creating Code for Laboratory Automation
  • Using Liquid Handling Robots
  • Creating a plan to use the Autonomous lab at Ginkgo Bioworks
  • Use of Boltz or PepMLM
  • Use of Asimov Kernel
  • Registry of Standard Biological Parts
  • Cell-Free Reactions
  • Freeze-Dried Cell-Free Systems
  • miniPCR Tools
  • Other Cloning Methods
  • CRISPR/Cas9
  • Designing Prime Editing gRNA

9. Expand upon two techniques you checked in the previous question by describing how you would utilize those techniques in your final project. (min. 4 sentences)

Expanded Techniques

Protein design and molecular docking:

Protein design was used to select CYP725A4 active-site mutations based on residue position, side-chain chemistry, and predicted effects on substrate orientation. Molecular docking was then used to test whether these mutations improved taxadiene positioning near the heme iron. This is appropriate because recent CYP725A4 studies show that product selectivity depends on competing catalytic pathways controlled by substrate positioning [4].
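For reproducibility, each docking run can be driven by an AutoDock Vina configuration file, one per variant. The Python sketch below builds such a file from Vina's standard keys (receptor, ligand, center_x/y/z, size_x/y/z, exhaustiveness, num_modes, out). The file names, box centre (placing it on the heme Fe is one reasonable choice), 22 Å box edge, and exhaustiveness value are all hypothetical placeholders, and the receptor and ligand are assumed to be already converted to PDBQT.

```python
from pathlib import Path

def make_vina_config(receptor, ligand, center, size=(22.0, 22.0, 22.0),
                     exhaustiveness=16, num_modes=9, out=None, path=None):
    """Build (and optionally write) an AutoDock Vina config file; returns its text.

    center and size are (x, y, z) in Å. Every default here is a placeholder.
    """
    if len(center) != 3 or len(size) != 3:
        raise ValueError("center and size need three values each")
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{axis} = {value:.3f}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value:.1f}" for axis, value in zip("xyz", size)]
    lines += [f"exhaustiveness = {exhaustiveness}", f"num_modes = {num_modes}"]
    if out is not None:
        lines.append(f"out = {out}")
    text = "\n".join(lines) + "\n"
    if path is not None:
        Path(path).write_text(text)
    return text
```

Each file can then be run with `vina --config <file>`. Keeping the same box and search settings for WT and every variant helps make their scores and poses comparable.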

DNA construct design:

The future experimental construct is based on previous E. coli CYP725A4 expression systems. The native N-terminal membrane-anchor region would be replaced with an expression-improving peptide, CYP725A4 would be fused to Taxus CPR to support electron transfer, and a His6 tag would be added for purification [3]. An inducible promoter would be preferred to reduce metabolic stress before induction.
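The fusion described above (modified N-terminus, CYP725A4, flexible linker, Taxus CPR, C-terminal His6 tag) only works if every junction preserves a single reading frame. A minimal Python check is sketched below. It assumes each part is supplied as coding DNA without its own stop codon; the part order and the His6 codon choice (alternating CAT/CAC) are illustrative assumptions rather than the actual Benchling design.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS6_AND_STOP = "CATCACCATCACCATCAC" + "TAA"  # six His codons, then a stop

def assemble_fusion(parts):
    """Join (name, coding DNA) parts 5'->3' and append a C-terminal His6 tag and stop.

    Raises ValueError if a part would shift the reading frame, carries an
    in-frame stop codon, or if the construct does not begin with ATG.
    """
    cleaned = []
    for name, seq in parts:
        seq = seq.upper()
        if len(seq) % 3:
            raise ValueError(f"{name}: length {len(seq)} is not a multiple of 3")
        codons = {seq[i:i + 3] for i in range(0, len(seq), 3)}
        if codons & STOP_CODONS:
            raise ValueError(f"{name}: contains an in-frame stop codon")
        cleaned.append(seq)
    cds = "".join(cleaned) + HIS6_AND_STOP
    if not cds.startswith("ATG"):
        raise ValueError("construct must start with ATG")
    return cds
```

Because every part is checked to be a whole number of codons, the codons inspected inside each part line up with the reading frame of the full construct.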


SECTION 9: ADDITIONAL INFORMATION

10a. List all references cited in this assignment (bullet-point list)

  1. Tong Y, Luo YF, Gao W. Biosynthesis of paclitaxel using synthetic biology. Phytochem Rev. 2022;21(3):863-877. doi:10.1007/s11101-021-09766-0.
  2. Sagwan-Barkdoll L, Anterola AM. Taxadiene-5α-ol is a minor product of CYP725A4 when expressed in Escherichia coli. Biotechnol Appl Biochem. 2018;65(3):294-305. doi:10.1002/bab.1606.
  3. Rouck JE, Biggs BW, Kambalyal A, Arnold WR, De Mey M, Ajikumar PK, et al. Heterologous expression and characterization of plant taxadiene-5α-hydroxylase CYP725A4 in Escherichia coli. Protein Expr Purif. 2017;132:60-67. doi:10.1016/j.pep.2017.01.008.
  4. Song X, Wang Q, Zhu X, Fang W, Liu X, Shi C, et al. Unraveling the catalytic mechanism of taxadiene-5α-hydroxylase from crystallography and computational analyses. ACS Catal. 2024;14(6):3912-3925. doi:10.1021/acscatal.3c05807.
  5. RCSB Protein Data Bank. 8X1W: CYP725A4 apo structure [Internet]. RCSB PDB; 2024 [cited 2026 May 24]. Available from: https://www.rcsb.org/structure/8X1W
  6. BIOC2580. Amino acid properties: polarity and ionization [Internet]. Available from: https://ecampusontario.pressbooks.pub/bioc2580/chapter/bioc2580-lecture-2-amino-acid-properties-polarity-and-ionization/
  7. Mutanda I, Li J, Xu F, Wang Y. Recent advances in metabolic engineering, protein engineering, and transcriptome-guided insights toward synthetic production of Taxol. Front Bioeng Biotechnol. 2021;9:632269. doi:10.3389/fbioe.2021.632269.
  8. Zhang S, Ye T, Liu Y, Hou G, Wang Q, Zhao F, et al. Research advances in clinical applications, anticancer mechanism, total chemical synthesis, semi-synthesis and biosynthesis of paclitaxel. Molecules. 2023;28(22):7517. doi:10.3390/molecules28227517.
  9. RCSB Protein Data Bank. 8X3E: CYP725A4-Taxa-4,11-diene complex [Internet]. RCSB PDB; 2024 [cited 2026 May 24]. Available from: https://www.rcsb.org/structure/8X3E

10b. Create a supply list and budget for your project (bullet-point list)

  • Gene synthesis (Twist Bioscience): ~$200–500
  • Plasmid vectors: ~$50
  • E. coli strains: ~$50
  • Reagents (PCR, cloning): ~$200
  • Culture media: ~$100
  • Protein purification materials: ~$300
  • Total estimated cost: ~$900–1200