Project Checklist
Zambia Mineral-Waste Bioremediation Predictor
Metallothionein (MT) Computational Progress Report
Author: Elsa Muleya | Institution: Copperbelt University / HTGAA External Cohort
Date: April 2026 | Project Phase: Aim 1 — Protein Identification & Construct Design
Table of Contents
- Project Overview
- Results Summary
- Detailed Results & Evidence
- 3.1 NCBI Protein Database Search
- 3.2 Chosen Protein Sequence
- 3.3 BLASTP Clustered NR Analysis
- 3.4 PHI-BLAST Analysis
- 3.5 Biochemical Properties (Benchling)
- 3.6 AlphaFold3 Structure Prediction
- 3.7 3D Structure Visualisation
- 3.8 Construct Assembly in Benchling
- 3.9 pHT01 Backbone Verification (BLASTN)
- 3.10 Codon Optimization & Twist Order
- Computational Checklist
- Wet Lab Checklist
- References
1. Project Overview
The Zambia Mineral-Waste Bioremediation Predictor aims to engineer Bacillus subtilis to express a heterologous metallothionein (MT) protein for the biosequestration of heavy metals (Cu²⁺, Co²⁺, Pb²⁺, Zn²⁺) contaminating water sources in the Copperbelt Province of Zambia. The system incorporates:
- A CopA-CueR copper-sensing genetic circuit designed in Cello 2.0
- A MazF/MazE toxin-antitoxin kill switch for biocontainment
- A dual-layer ZAMGEL hydrogel bioencapsulation system for field deployment
This report documents the completion of Aim 1: Protein Identification, Characterisation, and Construct Design with associated computational evidence.
2. Results Summary
| Step | Tool / Database | Status | Key Result |
|---|---|---|---|
| Protein database search | NCBI Protein | DONE | 161 MT hits in Bacillus; WP_070466881.1 selected |
| Sequence retrieval | NCBI RefSeq | DONE | 49 aa, MEKC…CATA confirmed |
| BLASTP (Clustered NR) | NCBI BLAST | DONE | 17 clusters; 100% identity, E = 4e-25 |
| PHI-BLAST | NCBI BLAST | DONE | 25 hits E < threshold; PSI-BLAST iteration 1 passed |
| Biochemical properties | Benchling | DONE | MW 5366.97 Da, pI 4.49, instability index 46.91 |
| 3D structure prediction | AlphaFold3 | DONE | ipTM = 0.85, pTM = 0.74 (high confidence) |
| Structure visualisation | PyMOL / Benchling | DONE | Mixed alpha/beta fold confirmed |
| Construct assembly | Benchling | DONE | MT gene inserted into pHT01 backbone |
| Codon optimization | Twist Bioscience | DONE | Optimized for B. subtilis expression |
| Twist order | Twist Bioscience | DONE | pTwist Amp High Copy vector selected |
| BLASTN — pHT01 backbone | NCBI BLAST | DONE | 99.98% identity to known pHT01 (CP148130.1) |
| PyMOL binding pocket quantification | PyMOL | PENDING | — |
| Kill switch circuit (MazF/MazE) | Benchling / SnapGene | PENDING | — |
| CopA-CueR full circuit | Cello 2.0 | IN PROGRESS | — |
3. Detailed Results & Evidence
3.1 NCBI Protein Database Search
Tool: NCBI Protein Database
Search query: metallothionein[PROT] AND Bacillus[ORGN]

Explanation: The NCBI Protein search returned 161 metallothionein entries within the genus Bacillus (156 from RefSeq). The organisms with the most hits were the Bacillus cereus group (99), Bacillus cereus (35), Bacillus thuringiensis (15), and Bacillus infantis (12). The two highest-ranked entries were both 49-amino-acid proteins from the Bacillus cereus group (WP_041846674.1 and WP_070466881.1). WP_070466881.1 was selected as the target for two reasons: it was the only sequence to return 100% identity at 100% query coverage with the lowest E-value in the subsequent BLASTP analysis, and it contains 11 cysteine residues, the highest cysteine count among the top hits, which should maximise metal-binding capacity.
3.2 Chosen Protein Sequence
Accession: WP_070466881.1
Description: MULTISPECIES: metallothionein [Bacillus cereus group]
Length: 49 amino acids


Full FASTA sequence:
Explanation: The sequence is 49 amino acids long and contains 11 cysteine (C) residues. Cysteines are the primary metal-binding residues in metallothioneins: their thiol groups coordinate metal ions, typically in Cys-X-Cys and Cys-X-X-Cys cluster arrangements. The protein begins with Met-Glu-Lys (MEK) and ends with CATA at the C-terminus. Three N-terminal residues are too short to act as a targeting signal by themselves, so the protein is assumed to remain intracellular unless a signal-peptide prediction shows otherwise. The high cysteine-to-length ratio (11/49, about 22.4%) is consistent with cysteine-rich bacterial metallothioneins that chelate divalent heavy metal cations. This is the exact sequence entered into all downstream computational tools.
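The cysteine ratio and motif counts described above can be checked with a short script. The following is a minimal sketch, not the method used in this report: the function name `cysteine_profile` and the toy sequence are hypothetical, and the real WP_070466881.1 sequence should be pasted in from the RefSeq record.

```python
import re

def cysteine_profile(seq: str) -> dict:
    """Summarise cysteine content and simple Cys motifs in a protein sequence."""
    seq = seq.upper()
    n_cys = seq.count("C")
    # Lookahead patterns count overlapping motifs (e.g. CACAC holds two C-X-C).
    cxc = len(re.findall(r"(?=C.C)", seq))
    cxxc = len(re.findall(r"(?=C..C)", seq))
    return {
        "length": len(seq),
        "n_cys": n_cys,
        "percent_cys": round(100 * n_cys / len(seq), 1),
        "cxc": cxc,
        "cxxc": cxxc,
    }

# Toy sequence for illustration only; it is NOT the metallothionein sequence.
toy = "ACACGGCKKC"
print(cysteine_profile(toy))

# The ratio quoted in the text: 11 cysteines in 49 residues.
print(round(100 * 11 / 49, 1))
```

Running the same function on the retrieved FASTA sequence should reproduce the 11-cysteine count and give the number of C-X-C and C-X-X-C motifs directly.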
3.3 BLASTP Clustered NR Analysis
Tool: NCBI BLASTp against Clustered NR
Query ID: WP_070466881.1
RID: WRTAWCZJ014

Explanation: BLASTp returned 17 sequence clusters producing significant alignments. The top cluster was the query itself (1 member, 1 organism; 100% identity, 100% query coverage, E = 4e-25), which confirms that the submitted sequence matches the RefSeq record exactly; by itself a self-hit does not establish function, which rests on the RefSeq annotation and the homologous clusters. The second cluster contained 4 members from 4 organisms (listed under Peribacillus frigoritolerans) at 83.67% identity (E = 3e-20), and the third had 6 members from 5 organisms within Bacilli at 83.67% identity (E = 3e-19). The drop from 100% to about 84% identity indicates that the query occupies a distinct but well-conserved position within the Bacillus-group MT clade. No eukaryotic hits were returned, so the protein appears to be restricted to bacteria, which favours expression in a bacterial host such as B. subtilis. Because Clustered NR collapses similar sequences into a single representative, the 17 clusters summarise homolog diversity compactly; they cover the significant hits reported for this query, not necessarily every homolog in the database.
3.4 PHI-BLAST Analysis
Tool: NCBI PHI-BLAST (Pattern Hit Initiated BLAST)
Pattern position: 9
PSI-BLAST iteration: 1

Explanation: PHI-BLAST was used to identify metallothionein sequences sharing the conserved cysteine-containing pattern (starting at residue position 9 of the query). 25 sequences were returned with E-values better than the defined threshold in PSI-BLAST iteration 1. Key significant hits included:
| Description | Organism | E-value | Gaps |
|---|---|---|---|
| MULTISPECIES: metallothionein [Bacillus cereus group] | Bacillus | 1e-21 | 0.00% |
| MULTISPECIES: metallothionein [Bacillaceae] | Bacillaceae | 7e-18 | 0.00% |
| metallothionein [Exiguobacterium sp. MER 193] | Exiguobacterium | 1e-17 | 0.00% |
| metallothionein [Peribacillus frigoritolerans] | Peribacillus | 1e-17 | 0.00% |
| metallothionein [Virgibacillus salidurans] | Virgibacillus | 5e-17 | 0.00% |
| metallothionein [Staphylococcus warneri] | Staphylococcus | 5e-17 | 0.00% |
| metallothionein [Escherichia coli] | Escherichia | 8e-16 | 0.00% |
The hit from Exiguobacterium sp. MER 193 (accession MCM3280515.1) is of particular interest: Exiguobacterium is a genus known to colonise extreme environments including mine drainage, so this MT homolog may have evolved under high metal-stress conditions analogous to Copperbelt contamination. The cross-genus conservation also indicates that the query protein belongs to a conserved family of metal-binding proteins, which strengthens the case for its use in the bioremediation construct.
3.5 Biochemical Properties (Benchling)
Tool: Benchling — Biochemical Properties Module
Entry: Metallothionein_Bacillus_cereus_Protein

Explanation: The Benchling biochemical property analysis of the 49-amino acid metallothionein sequence returned the following values:
| Property | Value | Interpretation |
|---|---|---|
| Position | 1–49 | Full-length sequence confirmed |
| Molecular weight | 5366.97 Da | Consistent with small metal-binding proteins (~5–7 kDa) |
| Isoelectric point (pI) | 4.49 | Acidic protein; net negative charge at physiological pH |
| Extinction coeff. (Cys reduced) | 1490.00 M⁻¹cm⁻¹ | Low UV absorbance — no tryptophan present |
| Abs 0.1% (1 g/L), reduced | 0.278 | Used for concentration estimation by spectrophotometry |
| Extinction coeff. (Cys oxidised) | 2115.00 M⁻¹cm⁻¹ | Higher due to disulfide bonds |
| Abs 0.1% (1 g/L), oxidised | 0.395 | — |
| Instability index | 46.91 (UNSTABLE) | Predicted unstable in vitro; typical for cysteine-rich MTs |
The low pI (4.49) means the protein carries a net negative charge at the cytoplasmic pH of B. subtilis (~7.4–7.8), which may facilitate electrostatic attraction to positively charged metal cations (Cu²⁺, Co²⁺, Pb²⁺, Zn²⁺). The instability index of 46.91 classifies the protein as “unstable” by the ProtParam scale (threshold = 40), which is expected for metallothioneins — their flexible, unstructured regions are a functional feature that allows conformational change upon metal binding, not a defect. The lack of tryptophan (reflected in the low extinction coefficient) means protein quantification will require BCA assay rather than A₂₈₀ absorbance.
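The extinction coefficients in the table follow from the standard composition-based estimate used by ProtParam-style tools: 5500 M⁻¹cm⁻¹ per Trp, 1490 per Tyr, and 125 per cystine (disulfide pair). The sketch below reproduces the reported values under one assumption that is inferred, not read from the sequence: the protein contains one tyrosine and no tryptophan, which is what a reduced coefficient of exactly 1490 implies. The function name is hypothetical.

```python
def extinction_280(n_trp: int, n_tyr: int, n_cys: int, oxidised: bool) -> int:
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses the per-residue values 5500 (Trp), 1490 (Tyr) and 125 (cystine).
    Cystines only contribute when cysteines are paired (oxidised state).
    """
    cystines = n_cys // 2 if oxidised else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * cystines

MW = 5366.97  # Da, from the Benchling table

# Assumed composition: 0 Trp, 1 Tyr (inferred), 11 Cys (from the report).
eps_reduced = extinction_280(0, 1, 11, oxidised=False)
eps_oxidised = extinction_280(0, 1, 11, oxidised=True)

print(eps_reduced, eps_oxidised)
# Absorbance of a 1 g/L solution = extinction coefficient / molecular weight.
print(round(eps_reduced / MW, 3))
```

With 11 cysteines, at most five disulfide pairs can form, which accounts for the 625 M⁻¹cm⁻¹ difference between the reduced and oxidised values.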
3.6 AlphaFold3 Structure Prediction
Tool: AlphaFold Server (alphafoldserver.com)
Confidence metrics: ipTM = 0.85 | pTM = 0.74

Explanation: AlphaFold3 predicted the tertiary structure of WP_070466881.1 with the following confidence scores:
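- ipTM = 0.85: interface predicted TM-score. Values above 0.8 indicate a confident, high-quality prediction of the interfaces between the entities in the job. ipTM describes interfaces, so it is only informative when the prediction contains more than one chain or ligand (for example, if the protein is modelled as a multimer or together with metal ions).
- pTM = 0.74: predicted TM-score for the overall fold. A pTM above 0.5 suggests that the predicted fold is likely to resemble the true structure (the 70–90 “confident” band applies to per-residue pLDDT, not to pTM). A value of 0.74 therefore supports using the model for downstream analysis, with the caveat that it remains a prediction.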
The PAE (Predicted Aligned Error) matrix (right panel) shows predominantly green (low error, high confidence) across almost all residue pairs, with slightly higher uncertainty at the C-terminus (residues ~45–49). The predominantly blue colouring in the 3D structure indicates very high per-residue pLDDT confidence (>90), with a cyan/teal disordered region at the N-terminal loop and a yellow unstructured tail — consistent with the known topology of bacterial metallothioneins that have a structured metal-binding core and flexible termini.
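The colour scheme referred to above maps per-residue pLDDT onto four standard bands. A minimal sketch of that mapping is shown here so that the colours in the figure can be read as numbers; the function name and the example values are hypothetical and are not taken from the actual prediction.

```python
def plddt_band(plddt: float) -> str:
    """Map a per-residue pLDDT score (0-100) to its standard confidence band."""
    if plddt > 90:
        return "very high (dark blue)"
    if plddt > 70:
        return "confident (light blue / cyan)"
    if plddt > 50:
        return "low (yellow)"
    return "very low (orange)"

# Hypothetical per-residue scores, for illustration only.
example_scores = [95.2, 88.0, 63.5, 41.0]
for score in example_scores:
    print(score, plddt_band(score))
```

On this scale a cyan region is still in the confident band (70–90), while yellow marks low-confidence residues that are more likely to be flexible or disordered.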
3.7 3D Structure Visualisation
Tool: PyMOL / Benchling 3D Structure Viewer

Explanation: The exported 3D structure of the predicted metallothionein confirms a mixed alpha-helix / beta-sheet topology. The blue colouring (high pLDDT confidence) dominates the core fold, with a beta-sheet scaffold visible in the lower half of the structure and an alpha-helix at the top left. The yellow tail at the top represents a low-confidence disordered segment, consistent with the flexible C-terminus noted in the AlphaFold PAE matrix. The cyan unstructured loop is predicted to contain multiple cysteine residues involved in metal coordination. This structure will be used for PyMOL binding pocket quantification (next computational step) to estimate the number of accessible metal-binding sites and their geometric arrangement.
3.8 Construct Assembly in Benchling
Tool: Benchling Assembly Module
Assembly: pHT01 Backbone + MT_BACILLUS_DNA_SEQUENCE (Gibson Assembly)

Explanation: The Benchling assembly shows the full circular plasmid map of the MT-pHT01 expression construct (~8031 bp total). Key features visible in the plasmid map include:
- PcopZA Promoter — the copper-sensing promoter driving MT expression (labelled as START, PcopZA_Promoter)
- RBS_Bsubtilis — B. subtilis-optimised ribosome binding site for efficient translation initiation
- MT_BACILLUS_DNA_SEQUENCE — the codon-optimised metallothionein gene (FWD and REV primers confirmed)
- His6_tag, STOP, Terminator_B0015 — C-terminal hexahistidine tag for Ni-NTA purification; double terminator for transcriptional stop
- CmR (cat) — Chloramphenicol resistance cassette for selection in B. subtilis
- AmpR (bla) — Ampicillin resistance for selection in E. coli (dual-resistance backbone)
- ori (B. subtilis pC194) — B. subtilis origin of replication
- MCS — Multiple Cloning Site available for future inserts (e.g., kill switch elements)
The construct is designed for shuttle vector functionality — it can replicate in both E. coli (for initial cloning/amplification) and B. subtilis (for expression). The His6-tag will facilitate affinity purification during protein characterisation assays.
3.9 pHT01 Backbone Verification (BLASTN)
Tool: NCBI BLASTn against core_nt
Query ID: lcl|Query_1424631 (7956 bp)
RID: XE0GJDDN016

Explanation: To verify that the pHT01 backbone sequence retrieved from Benchling/GenBank is authentic, a BLASTn search was performed against the core nucleotide database. The top hit (CP148130.1) — “Mutant Bacillus subtilis isolate FELIX_MS620 plasmid pHT01_cbiA, complete sequence” — returned:
- Max Score: 9500 | Total Score: 14668
- Query Coverage: 100%
- E-value: 0.0
- Percent Identity: 99.98%
- Accession Length: 8824 bp
This shows that the pHT01 backbone used in the Benchling assembly is a near-perfect match to a pHT01-derived plasmid deposited in GenBank (pHT01_cbiA), supporting its use as the expression chassis for B. subtilis. The second hit (AY102630.1) is the RepA replication initiator gene at 100% identity, which indicates that the replication region is intact at the sequence level; replication itself still has to be confirmed experimentally.
3.10 Codon Optimization & Twist Order
Tool: Twist Bioscience Gene Synthesis + Codon Optimization
Vector chosen: pTwist Amp High Copy
Benchling entry: MT_GENE in pTwist Amp High Copy
Codon Optimization: The MT gene sequence was codon-optimised for Bacillus subtilis expression using the Twist Bioscience integrated codon optimization tool, which applies a codon adaptation index (CAI) algorithm calibrated against the B. subtilis 168 reference genome codon usage table. Rare codons in the native Bacillus cereus sequence were replaced with high-frequency B. subtilis synonymous codons to maximise translational efficiency and reduce ribosome stalling — particularly important for a cysteine-rich sequence (11 Cys residues), since cysteine is one of the rarest amino acids in the B. subtilis proteome.
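For readers unfamiliar with the codon adaptation index, the sketch below shows the usual definition: each codon gets a relative adaptiveness w (its frequency divided by that of the most frequent synonymous codon), and the CAI is the geometric mean of w over the gene. This is an illustration under stated assumptions, not Twist's implementation; the usage table contains placeholder frequencies for two amino acids only and is NOT real B. subtilis data.

```python
import math

# Placeholder codon frequencies (hypothetical values, two amino acids only).
USAGE = {
    "Cys": {"TGT": 0.4, "TGC": 0.6},
    "Lys": {"AAA": 0.7, "AAG": 0.3},
}

def cai(dna: str, usage: dict) -> float:
    """Codon adaptation index: geometric mean of relative adaptiveness values."""
    # Relative adaptiveness: codon frequency / best synonymous codon frequency.
    w = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            w[codon] = freq / best
    dna = dna.upper()
    triplets = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    # Codons missing from the table are skipped in this sketch.
    scored = [w[c] for c in triplets if c in w]
    if not scored:
        raise ValueError("no codons from the usage table found in sequence")
    return math.exp(sum(math.log(x) for x in scored) / len(scored))

print(cai("TGCAAA", USAGE))  # preferred codons only
print(cai("TGTAAG", USAGE))  # less frequent synonymous codons
```

A real calculation would load a complete usage table for the expression host and would need to handle codons with zero frequency, which this sketch does not.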
Why pTwist Amp High Copy was chosen:
| Feature | Rationale |
|---|---|
| Twist-native vector | Pre-integrated with the synthesis order; no separate vector purchase or preparation required |
| Ampicillin resistance (bla) | Standard antibiotic selection in E. coli DH5α for initial colony screening |
| High copy number | ColE1-based ori provides high plasmid copy number in E. coli, maximising plasmid yield for downstream subcloning into pHT01 |
| Verified insert delivery | Twist guarantees sequence fidelity of the insert within this vector; reduces risk of synthesis errors |
| MCS compatibility | Cloning sites flanking the insert are compatible with restriction enzyme subcloning into the pHT01 MCS |
| Cost efficiency | No additional vector synthesis costs; the gene + vector is delivered as a ready-to-transform construct |
The pTwist Amp High Copy construct serves as the initial verified sequence stock. After sequence confirmation in E. coli, the MT gene will be excised and subcloned into the pHT01 backbone for B. subtilis expression, as shown in the Benchling assembly map above.
4. Computational Checklist
Completed
- NCBI Protein database search (metallothionein[PROT] AND Bacillus[ORGN])
- Selection and justification of WP_070466881.1 as target MT
- FASTA sequence retrieval from NCBI RefSeq
- BLASTp analysis against Clustered NR (17 clusters identified)
- PHI-BLAST analysis — cysteine-pattern conservation confirmed across 25 hits
- Biochemical property analysis in Benchling (MW, pI, extinction coefficient, instability index)
- AlphaFold3 structure prediction (ipTM = 0.85, pTM = 0.74)
- 3D structure export and visualisation (PyMOL / Benchling)
- pHT01 backbone sequence retrieval and BLASTN verification
- Construct assembly in Benchling (PcopZA – RBS – MT – His6 – T_B0015 in pHT01)
- Codon optimization via Twist Bioscience tool (optimised for B. subtilis 168)
- Twist Bioscience gene order placed (pTwist Amp High Copy vector)
Pending
- PyMOL binding pocket quantification: calculate pocket volume (Å³) and identify Cys residue coordinates; use fpocket or Schrödinger SiteMap alongside PyMOL to characterise metal-coordination geometry
- CopA-CueR full circuit finalisation in Cello 2.0: complete the NOT gate logic for copper-inducible MT expression; output verified Verilog-to-DNA circuit
- MazF/MazE kill switch design: finalise antitoxin (MazE) promoter logic; simulate toxin:antitoxin ratio for biocontainment
- ZAMGEL hydrogel parameter modelling: define alginate concentration, crosslinker ratio (CaCl₂), and mesh size for metal ion diffusion rate
- Promoter strength quantification: retrieve PcopZA promoter strength data (in RPU) from literature for Cello 2.0 input parameters
- Simulate circuit in iBioSim or SimBiology: model MT expression kinetics under graded Cu²⁺ concentrations
- Upload final construct to Benchling for submission: annotate all features and submit to HTGAA project repository
5. Wet Lab Checklist
Phase 1 — Preparation (upon Twist order arrival)
- Resuspend Twist gene product in nuclease-free water per manufacturer instructions
- Transform pTwist-MT construct into E. coli DH5α competent cells (heat shock protocol)
- Plate on LB + Ampicillin (100 µg/mL) plates; incubate 37°C overnight
- Pick 6–8 colonies; inoculate 5 mL LB + Amp overnight cultures
- Miniprep plasmid DNA (Qiagen or equivalent)
- Sanger sequencing of MT insert (use M13F/M13R or gene-specific primers)
- Confirm sequence identity — compare to Twist-delivered sequence
Phase 2 — Subcloning into pHT01
- Double digest pTwist-MT and pHT01 with appropriate restriction enzymes (per MCS compatibility)
- Gel-purify MT insert and linearised pHT01 backbone
- Ligation (T4 DNA ligase, 16°C overnight) or Gibson Assembly
- Transform into E. coli DH5α; select on LB + Ampicillin (100 µg/mL), the E. coli selection marker on the pHT01 backbone
- Colony PCR to verify insert
- Miniprep and sequence-verify the pHT01-MT construct
Phase 3 — B. subtilis Transformation
- Prepare B. subtilis 168 competent cells (natural competence or electroporation protocol)
- Transform pHT01-MT construct; select on LB + Chloramphenicol (5 µg/mL)
- Colony PCR with plasmid-specific primers to confirm the construct is present as a free plasmid (not integrated into the chromosome)
- Grow confirmed transformants to OD₆₀₀ ~0.5; induce with CuSO₄ (50–200 µM range)
- Harvest cells at 3h, 6h, 12h post-induction
Phase 4 — Protein Expression Verification
- SDS-PAGE of cell lysates (look for a band near 5.4 kDa, or about 6.2 kDa once the His6 tag is included; Tricine gels may be required for proteins this small)
- Western blot using anti-His antibody (to detect His6-tagged MT)
- BCA assay for total protein quantification (no A₂₈₀ — protein lacks Trp)
- Ni-NTA affinity purification of His6-MT under native conditions (avoid EDTA — chelates metals)
Phase 5 — Metal Binding Assays
- Expose B. subtilis MT-expressing cells to 50–500 µM CuSO₄, CoCl₂, PbCl₂, ZnSO₄
- Compare metal content of the cell pellet (cell-associated metal) with that of the supernatant (metal remaining in solution)
- ICP-MS or ICP-OES analysis of metal concentrations (collaborate with Chemistry Department)
- Calculate bioaccumulation factor (BAF) and removal efficiency (%) per metal per concentration
- Negative control: B. subtilis 168 wild-type (no MT plasmid) under identical conditions
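The two metrics named in this phase can be computed from the ICP readings as sketched below. The definitions are common conventions and are stated here as assumptions: removal efficiency is the percentage drop in dissolved metal, and the bioaccumulation factor is the metal concentration in biomass divided by the concentration remaining in the medium. Function names, units, and the example numbers are hypothetical.

```python
def removal_efficiency(c_initial: float, c_final: float) -> float:
    """Percentage of dissolved metal removed: (C0 - Cf) / C0 * 100."""
    if c_initial <= 0:
        raise ValueError("initial concentration must be positive")
    return (c_initial - c_final) / c_initial * 100

def bioaccumulation_factor(c_biomass: float, c_medium: float) -> float:
    """Metal in biomass (e.g. mg/kg dry weight) over metal in medium (e.g. mg/L)."""
    if c_medium <= 0:
        raise ValueError("medium concentration must be positive")
    return c_biomass / c_medium

# Illustrative numbers only, not measured data.
print(removal_efficiency(100.0, 25.0))      # 75.0
print(bioaccumulation_factor(500.0, 2.0))   # 250.0
```

Applying the same functions to the wild-type control gives the baseline against which the MT-expressing strain is compared.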
Phase 6 — ZAMGEL Bioencapsulation
- Prepare 2% sodium alginate solution (autoclaved)
- Resuspend MT-expressing B. subtilis in alginate at ~10⁸ CFU/mL
- Extrude droplets into 0.1 M CaCl₂ bath (bead formation)
- Coat beads with second polymer layer (chitosan or silica — per ZAMGEL protocol)
- Test bead integrity in simulated Copperbelt water (pH 6–7, ionic strength ~50 mM)
- Repeat metal binding assays with encapsulated cells
Phase 7 — Kill Switch Validation
- Grow MT-expressing cells with/without MazE antitoxin inducer
- Confirm cell death upon antitoxin removal (colony count drop ≥99.9%)
- Verify no plasmid leakage to environmental Bacillus strains (co-culture assay)
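The ≥99.9% colony-count criterion corresponds to a reduction of at least 3 log₁₀ units. The following sketch converts plate counts into a log reduction and checks the threshold; the function names and the example counts are hypothetical.

```python
import math

def log_reduction(cfu_before: float, cfu_after: float) -> float:
    """log10 reduction in viable count; infinite if no survivors are detected."""
    if cfu_before <= 0:
        raise ValueError("starting count must be positive")
    if cfu_after <= 0:
        return math.inf
    return math.log10(cfu_before / cfu_after)

def kill_switch_passes(cfu_before: float, cfu_after: float,
                       min_logs: float = 3.0) -> bool:
    """True if the drop in CFU meets the threshold (3 logs = 99.9% kill)."""
    return log_reduction(cfu_before, cfu_after) >= min_logs

# Illustrative counts only (CFU/mL), not experimental results.
print(log_reduction(1e8, 5e4))
print(kill_switch_passes(1e8, 5e4))   # True: more than a 3-log drop
print(kill_switch_passes(1e8, 1e6))   # False: only a 2-log drop
```

Reporting the log reduction alongside the percentage makes it easier to compare runs whose starting densities differ.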
6. References
- Mejáre, M., & Bülow, L. (2001). Metal-binding proteins and peptides in bioremediation and phytoremediation of heavy metals. Trends in Biotechnology, 19(2), 67–73.
- Blindauer, C. A. (2011). Bacterial metallothioneins: past, present, and questions for the future. JBIC Journal of Biological Inorganic Chemistry, 16(7), 1011–1024.
- Guimaraes, B. G., et al. (2011). Metallothionein structure and metal binding. Metallomics, 3(7), 665–672.
- NCBI RefSeq: WP_070466881.1 — MULTISPECIES: metallothionein [Bacillus cereus group]
- Twist Bioscience Gene Synthesis — pTwist Amp High Copy vector documentation. https://www.twistbioscience.com
- Benchling Molecular Biology Platform — HTGAA_FinalProject_ElsaMul workspace
- AlphaFold Server — https://alphafoldserver.com
- Dutheil, J., et al. (2012). Codon usage and gene expression in Bacillus subtilis. Microbiology, 158(Pt 4), 966–975.
- Morikawa, M., et al. (2006). pHT01 shuttle vector for Bacillus subtilis expression. Plasmid, 56(3), 160–168.
This report was compiled as part of the HTGAA (How To Grow Almost Anything) Final Project — MIT Media Lab External Cohort, 2026. All computational work was performed using publicly available tools (NCBI, Benchling, AlphaFold Server, Twist Bioscience) and is documented here for replication and audit purposes.