Week 2 HW: DNA Read Write Edit
Molecular Biology 101
1. Nucleotides In Silico
Several free tools let you visualize and manipulate DNA/RNA sequences on your computer. Key options: SnapGene Viewer (plasmid maps), NCBI BLAST (sequence alignment), UCSC Genome Browser (reference genomes), and Benchling (all-in-one cloud platform).
Benchling is a great starting point — it’s free, browser-based, and lets you import sequences (GenBank, FASTA, or raw), view annotated maps, design primers, run in silico digests, and align sequencing data. It also supports team collaboration and version control.
2. DNA Synthesis
Instead of cloning from a template, you can order custom DNA directly from commercial providers like Twist Bioscience, IDT, or GenScript — typically delivered in 1–2 weeks. On Twist, you pick between two formats:
- Clonal genes (plasmid): Gene synthesized and cloned into a vector (e.g., pTwist Amp). Arrives as dried plasmid or E. coli stock. Ready to use.
- Linear DNA (fragments): Double-stranded DNA fragment for assembly into your own vector (e.g., Gibson or Golden Gate). Cheaper and faster.
3. Sequence Verification
Always verify your synthetic DNA before starting experiments. Two standard methods:
a. Sanger Sequencing + Benchling Alignment
Send plasmid + primer to a sequencing provider (Azenta, Eurofins). You get back a .ab1 trace file. Import it into Benchling, align against your reference — mismatches, insertions, and deletions are instantly highlighted. Each read covers ~800–1000 bp, so tile multiple primers for longer inserts.
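To plan the tiling, a rough calculation of primer positions can help. The sketch below is only an illustration: the function name `tile_primers`, the 800 bp read length, the 100 bp overlap between neighbouring reads, and the 50 bp of unreadable sequence right after each primer are all assumptions to adjust for your provider.

```python
def tile_primers(insert_len, read_len=800, overlap=100, lead_in=50):
    """Return approximate forward-primer positions relative to the insert start.

    A primer at position p is assumed to give clean sequence for
    [p + lead_in, p + read_len). Negative positions mean the primer
    binds upstream of the insert, in the vector backbone.
    All default values are assumptions, not provider specifications.
    """
    step = read_len - lead_in - overlap
    if step <= 0:
        raise ValueError("overlap + lead_in must be smaller than read_len")
    starts = []
    pos = -lead_in          # first clean base lands on insert position 0
    covered = 0
    while covered < insert_len:
        starts.append(pos)
        covered = pos + read_len   # end of the clean region for this read
        pos += step
    return starts


# Example: a 912 bp CDS (303 codons plus one stop codon)
print(tile_primers(912))
```

Under these assumptions a single read would cover up to about 750 bp of insert, so anything longer needs a second primer.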
b. Restriction Digest
Cut your plasmid with 1–2 restriction enzymes, run on an agarose gel, and compare the band pattern to the predicted digest (use Benchling or SnapGene). This confirms correct insert size and orientation. It won’t catch point mutations, so it is best used alongside Sanger.
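The predicted digest is simple enough to sketch by hand. The example below is a minimal illustration, not what Benchling or SnapGene do internally: it finds recognition sites on a circular sequence and reports fragment sizes. The toy plasmid is hypothetical, and the sketch assumes palindromic sites (so only one strand is searched) and uses the start of each site as the cut position, so sizes can be off by a few bp when two different enzymes are combined.

```python
def find_sites(seq, site):
    """0-based start positions of a recognition site on a circular sequence."""
    seq, site = seq.upper(), site.upper()
    # Extend the sequence so sites spanning the origin are found too.
    extended = seq + seq[: len(site) - 1]
    return [i for i in range(len(seq)) if extended.startswith(site, i)]


def circular_digest(seq, sites):
    """Fragment sizes (largest first) after cutting a circular plasmid."""
    n = len(seq)
    cuts = sorted({p for s in sites for p in find_sites(seq, s)})
    if not cuts:
        return [n]  # uncut plasmid
    # Distance from each cut to the next one, wrapping around the circle.
    sizes = [(cuts[(i + 1) % len(cuts)] - c) % n or n for i, c in enumerate(cuts)]
    return sorted(sizes, reverse=True)


# Hypothetical 362 bp toy plasmid with one EcoRI and one BamHI site
toy = "A" * 100 + "GAATTC" + "C" * 200 + "GGATCC" + "T" * 50
print(circular_digest(toy, ["GAATTC", "GGATCC"]))  # double digest
print(circular_digest(toy, ["GAATTC"]))            # single cut linearises
```

Comparing these predicted sizes with the gel is what distinguishes a correct clone from an empty vector or a wrong-orientation insert.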
4. Selected Protein Example: Reflectin Protein RfA1
4.1 Background
Reflectins are squid-derived proteins that can change the light-reflecting properties of a cell in response to external stimuli (such as changes in salt concentration). In squid they are responsible for dynamic skin colour and light-reflection functions. Chatterjee et al. (2020) showed that human HEK293 cells can be engineered to express reflectin A1 (RfA1), giving them tuneable light-scattering: squid-like optics in human cells.
Reference: Chatterjee et al. “Cephalopod-inspired optical engineering of human cells.” Nature Communications 11, 2708 (2020). DOI link
4.2 Getting the Protein Sequence from GenBank
RfA1 from Doryteuthis pealeii is at accession ACZ57764.1: NCBI link. Click Send to → File to download in GenBank or FASTA format.
GenBank format excerpt:
303 amino acids, rich in methionine, tyrosine, and charged residues — classic reflectin signature.
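Once the FASTA file is downloaded, the length and amino-acid composition can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names are made up for illustration, and the toy peptide is hypothetical, not the real RfA1 sequence.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def composition(protein):
    """Percentage of each residue, most common first."""
    counts = Counter(protein)
    return {aa: round(100 * n / len(protein), 1) for aa, n in counts.most_common()}


# Hypothetical toy record; replace with the contents of the downloaded file
toy_fasta = ">toy_peptide hypothetical\nMYMD\nRYMK\n"
records = read_fasta(toy_fasta)
for header, seq in records.items():
    print(header, len(seq), composition(seq))
```

Running the same functions on the real download lets you confirm the residue count and see which amino acids dominate.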
4.3 The Corresponding DNA Sequence
To go from protein → DNA, you do a reverse translation: convert each amino acid back to a codon triplet. The catch: the genetic code is degenerate (multiple codons per amino acid), so there’s no single “correct” DNA sequence — just many valid ones. The wild-type squid coding sequence can be found via the “Coded by” link in the CDS feature of the NCBI protein page.
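To get a feel for how large that space of valid sequences is, the sketch below counts the DNA sequences that encode a given peptide under the standard genetic code. The helper names are illustrative, and the example peptides are arbitrary.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(protein):
    """Number of distinct DNA sequences that translate to this protein."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


print(SYNONYMS["M"], SYNONYMS["Y"])  # Met has one codon, Tyr has two
print(count_encodings("MYR"))        # 1 * 2 * 6 possible sequences
```

Even a three-residue peptide has a dozen encodings, and the count multiplies with every added residue, which is why the choice of codons is a design decision rather than a lookup.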
4.4 Codon Optimisation
Squid codons likely won’t express well in human or E. coli cells due to codon bias — organisms prefer different synonymous codons. Rare codons stall ribosomes and tank protein yield. Codon optimisation swaps in host-preferred codons without changing the protein.
We can use the online VectorBuilder tool: vectorbuilder.com/tool/codon-optimization.html — paste your sequence, pick your host organism, get optimised DNA out.
For dual expression (human + bacterial), you can either optimise separately for each host, or just optimise for human — human-preferred codons generally work fine in E. coli at moderate expression levels.
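The simplest form of codon optimisation picks one preferred codon per amino acid. The sketch below shows that idea only; it is not how VectorBuilder's tool works, and real optimisers also balance GC content, repeats, and restriction sites. The `HUMAN_PREFERRED` table is an illustrative assumption and should be replaced with values from a real codon usage table for your host.

```python
from itertools import product

# Standard genetic code, used here to verify the result
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative "preferred codon" table (assumption, not authoritative data)
HUMAN_PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TAA",
}


def optimise(protein, preferred=HUMAN_PREFERRED):
    """Back-translate a protein using one preferred codon per residue."""
    return "".join(preferred[aa] for aa in protein)


def translate(dna):
    """Translate DNA in frame 1 using the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


peptide = "MYRD"                     # arbitrary example peptide
dna = optimise(peptide)
assert translate(dna) == peptide     # optimisation must not change the protein
print(dna)
```

The final assertion is the important part: whatever tool produces the optimised DNA, translating it back should reproduce the original protein exactly.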
5. From Sequence to Cells — Step-by-Step with RfA1
Step (i): Import into Benchling
- Log in to Benchling → your project folder.
- Create → DNA Sequence (or paste/upload FASTA).
- Annotate the RfA1 CDS. Check the translation matches the expected protein.
- Use Benchling’s cloning tools to design the full expression construct in silico.
Step (ii): Design and Order from Twist
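Goal: Express RfA1 in HEK293 cells via stable transposon integration, and also purify from E. coli. Note that PiggyBac and Sleeping Beauty transposases insert semi-randomly (at TTAA and TA sites respectively) rather than at a chosen locus; targeting the AAVS1 safe-harbour locus (chr. 19) specifically would require nuclease-assisted knock-in (e.g., CRISPR-Cas9 with AAVS1 homology arms) instead.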
Mammalian expression cassette:
- Promoter: CAG or EF1α (strong mammalian)
- Kozak: GCCACC before ATG
- RfA1 CDS (codon-optimised) + optional 6×His tag
- Stop codon: double stop (TAA-TAA)
- PolyA signal: SV40 or bGH
- Flanking: PiggyBac or Sleeping Beauty ITRs for transposon integration
- Selection: Puromycin or hygromycin resistance cassette (PGK promoter)
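Before ordering, the coding part of the cassette can be sanity-checked programmatically. The sketch below is one possible check under stated assumptions: the function name `check_cds` is made up, the input is the stretch from the Kozak sequence through the stop codons, and the example sequences are short hypothetical stand-ins, not the real RfA1 CDS.

```python
STOPS = {"TAA", "TAG", "TGA"}


def check_cds(seq, kozak="GCCACC", double_stop="TAATAA"):
    """Return a list of problems found in a Kozak + ORF + double-stop region."""
    seq = seq.upper()
    problems = []
    if not seq.startswith(kozak + "ATG"):
        problems.append("missing Kozak + ATG at start")
    orf = seq[len(kozak):] if seq.startswith(kozak) else seq
    if len(orf) % 3:
        problems.append("ORF length not a multiple of 3")
    if not orf.endswith(double_stop):
        problems.append("missing double stop")
    # Scan the body (everything before the terminal stops) for in-frame stops.
    body = orf[:-len(double_stop)] if orf.endswith(double_stop) else orf
    codons = [body[i:i + 3] for i in range(0, len(body) - len(body) % 3, 3)]
    if any(c in STOPS for c in codons):
        problems.append("premature stop codon")
    return problems


his_tag = "CAC" * 6                                   # one way to encode 6xHis
good = "GCCACC" + "ATGTACAGAGAC" + his_tag + "TAATAA"  # hypothetical mini-ORF
bad = "GCCACC" + "ATGTAAGAC" + "TAATAA"                # in-frame stop after ATG
print(check_cds(good))
print(check_cds(bad))
```

An empty list means the region passed these particular checks; it does not replace checking the translation against the expected protein in Benchling.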
Bacterial expression: Sub-clone RfA1+His into pET-28a (T7/lac, IPTG-inducible, kanamycin).
Order on Twist: Genes → Clonal Genes → upload sequence → choose vector → Twist checks feasibility → order. ~2–3 week turnaround.
Step (iii): Transform E. coli, Purify, Verify
Transform: Resuspend plasmid → add to competent cells (DH5α for plasmid propagation, BL21(DE3) for T7-driven expression from pET-28a) on ice 30 min → heat shock 42 °C / 45 sec → ice 2 min → recover in SOC 37 °C / 1 hr → plate on LB + antibiotic → overnight.
Miniprep: Pick 2–4 colonies → grow overnight in LB + antibiotic → miniprep (Qiagen or equivalent) → Nanodrop.
Verify — restriction digest: Digest ~500 ng with diagnostic enzymes → run on 1% agarose gel → compare bands to predicted pattern from Benchling.
Verify — Sanger sequencing: Send plasmid + tiling primers to Azenta/GENEWIZ → import .ab1 traces into Benchling → align to reference → confirm 100% match.
Step (iv): Transfect HEK293 via Transposon + Lipofectamine
- Seed HEK293 in a 6-well plate so the cells reach ~70–80% confluency at the time of transfection (DMEM + 10% FBS, no antibiotics).
- Lipofectamine 3000 mix: Tube A (Lipo 3000 + Opti-MEM) + Tube B (transposon plasmid + transposase helper plasmid + P3000 + Opti-MEM). Ratio ~3–5:1 transposon:transposase. Combine, wait 15 min.
- Add complexes drop-wise → incubate 37 °C, 5% CO₂.
- Start puromycin selection (1–2 µg/mL) 24–48 hrs after transfection. Change media every 2–3 days. Non-integrants die off in 5–10 days.
- Expand surviving pool or pick clones.
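The plasmid amounts for Tube B follow from the chosen ratio. The sketch below shows the arithmetic only, under explicit assumptions: the ratio is treated as a mass ratio (the protocol above does not say whether it is mass or molar), the 2500 ng total per well and the stock concentrations are hypothetical placeholders, and the reagent manufacturer's protocol should set the actual total.

```python
def split_dna(total_ng, ratio=4.0):
    """Split a total DNA mass into (transposon_ng, transposase_ng) at ratio:1 by mass."""
    transposase_ng = total_ng / (ratio + 1)
    transposon_ng = total_ng - transposase_ng
    return transposon_ng, transposase_ng


def volume_ul(mass_ng, conc_ng_per_ul):
    """Volume of stock needed to deliver a given DNA mass."""
    return mass_ng / conc_ng_per_ul


# Hypothetical numbers: 2500 ng total per well, 4:1 transposon:transposase
transposon_ng, transposase_ng = split_dna(2500, ratio=4)
print(transposon_ng, transposase_ng)
# Hypothetical Nanodrop readings of 400 and 250 ng/uL for the two minipreps
print(volume_ul(transposon_ng, 400), volume_ul(transposase_ng, 250))
```

If a molar ratio is intended instead, the masses would need to be scaled by each plasmid's length before splitting.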
Step (v): Verify Genomic Integration
Confirm RfA1 actually integrated into the genome. Options from cheapest to most comprehensive:
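A. Junction PCR + Sanger: One primer in the cassette, one in the flanking genome (e.g., AAVS1). A band of the expected size indicates integration at that locus; Sanger the product to confirm the junction. Cheap and fast, but it only checks one locus and requires knowing the integration site in advance, which is not the case for semi-random transposon insertions.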
B. Long-read amplicon sequencing (Nanopore/PacBio) — Long-range PCR across the full insert → single-read verification of the entire cassette. No primer tiling needed.
C. TLA or whole-genome sequencing — Maps all integration sites genome-wide (Cergentis TLA or shallow WGS). Most comprehensive, most expensive. For final clone characterisation.
D. Targeted NGS panel — Extract gDNA (Qiagen DNeasy) → targeted panel covering construct + AAVS1 flanks. High-depth, catches mosaicism.
Best practical combo: A + B — junction PCR confirms the right locus, long-read confirms full cassette integrity.
Summary
| Step | What You Do | Key Tool / Service |
|---|---|---|
| Sequence retrieval | Download RfA1 protein sequence | NCBI GenBank (ACZ57764.1) |
| Codon optimisation | Optimise for human/bacterial expression | VectorBuilder online tool |
| In silico design | Import sequence, design construct | Benchling |
| DNA synthesis | Order construct as clonal gene | Twist Bioscience |
| Bacterial work | Transform, miniprep, verify | Competent E. coli, Sanger sequencing |
| Mammalian transfection | Transposon + Lipofectamine into HEK293 | Lipofectamine 3000, PiggyBac/SB |
| Integration verification | Confirm genomic integration | Junction PCR, Sanger, Nanopore, or WGS |