Week 2 HW: DNA Read Write Edit

Molecular Biology 101

1. Nucleotides In Silico

Several free tools let you visualize and manipulate DNA/RNA sequences on your computer. Key options: SnapGene Viewer (plasmid maps), NCBI BLAST (sequence alignment), UCSC Genome Browser (reference genomes), and Benchling (all-in-one cloud platform).

Benchling is a great starting point — it’s free, browser-based, and lets you import sequences (GenBank, FASTA, or raw), view annotated maps, design primers, run in silico digests, and align sequencing data. It also supports team collaboration and version control.

2. DNA Synthesis

Instead of cloning from a template, you can order custom DNA directly from commercial providers like Twist Bioscience, IDT, or GenScript — typically delivered in 1–2 weeks. On Twist, you pick between two formats:

Clonal genes (plasmid): Gene synthesized and cloned into a vector (e.g., pTwist Amp). Arrives as dried plasmid or E. coli stock. Ready to use.
Linear DNA (fragments): Double-stranded DNA fragment for assembly into your own vector (e.g., Gibson or Golden Gate). Cheaper and faster.

3. Sequence Verification

Always verify your synthetic DNA before starting experiments. Two standard methods:

a. Sanger Sequencing + Benchling Alignment

Send plasmid + primer to a sequencing provider (Azenta, Eurofins). You get back a .ab1 trace file. Import it into Benchling, align against your reference — mismatches, insertions, and deletions are instantly highlighted. Each read covers ~800–1000 bp, so tile multiple primers for longer inserts.
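
As a rough sketch of the tiling arithmetic, the snippet below spaces read start positions so that consecutive reads overlap. The 800 bp usable read length is the lower end of the estimate above. The 100 bp overlap and the function name `tile_read_starts` are assumptions made for illustration. In practice each primer would sit somewhat upstream of its start position, because the first bases of a Sanger read are usually low quality.

```python
def tile_read_starts(insert_len, usable_read=800, overlap=100):
    """Return 0-based start positions of the reads needed to cover insert_len bp.

    usable_read and overlap are assumed values; adjust to your provider's data.
    """
    if overlap >= usable_read:
        raise ValueError("overlap must be smaller than the usable read length")
    step = usable_read - overlap
    starts = []
    pos = 0
    while True:
        starts.append(pos)
        # Stop once the current read reaches the end of the insert.
        if pos + usable_read >= insert_len:
            break
        pos += step
    return starts


# A hypothetical 2.5 kb insert needs four overlapping reads.
print(tile_read_starts(2500))  # prints [0, 700, 1400, 2100]
```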

b. Restriction Digest

Cut your plasmid with 1–2 restriction enzymes, run on an agarose gel, and compare the band pattern to the predicted digest (use Benchling or SnapGene). Confirms correct insert size and orientation. It won’t catch point mutations, so it is best used alongside Sanger sequencing.
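
To make the "predicted digest" idea concrete, here is a minimal sketch of what such a tool computes for a circular plasmid. It is not how Benchling or SnapGene implement it. The toy plasmid is made up, the cut is approximated as the start of the recognition site, and only palindromic sites (EcoRI GAATTC, BamHI GGATCC) are handled, so searching one strand is enough.

```python
def find_sites(plasmid, site):
    """0-based start positions of a recognition site on a circular sequence."""
    plasmid, site = plasmid.upper(), site.upper()
    n = len(plasmid)
    # Append the first len(site)-1 bases so sites spanning the origin are found.
    wrapped = plasmid + plasmid[: len(site) - 1]
    return [i for i in range(n) if wrapped.startswith(site, i)]


def circular_fragments(plasmid, sites):
    """Fragment sizes (bp), largest first, after cutting a circle at every site."""
    n = len(plasmid)
    cuts = sorted({p for s in sites for p in find_sites(plasmid, s)})
    if not cuts:
        return [n]  # uncut plasmid stays circular
    sizes = []
    for i, cut in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        size = (nxt - cut) % n
        sizes.append(size if size else n)  # a single cut linearises the circle
    return sorted(sizes, reverse=True)


# Made-up 400 bp "plasmid" with one EcoRI and one BamHI site.
TOY = "A" * 100 + "GAATTC" + "C" * 200 + "GGATCC" + "T" * 88
print(circular_fragments(TOY, ["GAATTC", "GGATCC"]))  # prints [206, 194]
print(circular_fragments(TOY, ["GAATTC"]))            # prints [400]
```

If the bands on the gel are far from the predicted sizes, the insert is probably missing, the wrong size, or in the wrong orientation.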

4. Selected Protein Example: Reflectin Protein RfA1

4.1 Background

Reflectins are squid-derived proteins that change a cell’s light-reflecting properties in response to external stimuli (such as changes in salt concentration). In squid they are responsible for dynamic skin colour and light reflection. Chatterjee et al. (2020) showed that human HEK293 cells can be engineered to express reflectin A1 (RfA1), which gives them tuneable light scattering: squid-like optics in human cells.

Reference: Chatterjee et al. “Cephalopod-inspired optical engineering of human cells.” Nature Communications 11, 2708 (2020). DOI link

4.2 Getting the Protein Sequence from GenBank

RfA1 from Doryteuthis pealeii is at accession ACZ57764.1: NCBI link. Click Send to → File to download in GenBank or FASTA format.

GenBank format excerpt:

LOCUS       ACZ57764                 303 aa            linear   INV
DEFINITION  reflectin-like protein A1 [Doryteuthis pealeii].
ACCESSION   ACZ57764
VERSION     ACZ57764.1
SOURCE      Doryteuthis pealeii (longfin inshore squid)
  ORGANISM  Doryteuthis pealeii
            Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Mollusca;
            Cephalopoda; Coleoidea; Decapodiformes; Myopsida;
            Loliginidae; Doryteuthis.

303 amino acids, rich in methionine, tyrosine, and charged residues — classic reflectin signature.

4.3 The Corresponding DNA Sequence

To go from protein → DNA, you do a reverse translation: convert each amino acid back to a codon triplet. The catch: the genetic code is degenerate (multiple codons per amino acid), so there’s no single “correct” DNA sequence — just many valid ones. The wild-type squid coding sequence can be found via the “Coded by” link in the CDS feature of the NCBI protein page.
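
To get a feel for how degenerate the code is, the sketch below counts how many distinct DNA sequences encode a given peptide by multiplying the number of synonymous codons per residue (standard genetic code). The function name `count_encodings` and the four-residue example peptide are illustrative and are not taken from RfA1.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein):
    """Number of distinct coding sequences for a protein (stop codon excluded)."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein.upper())


# Met (1) x Tyr (2) x Arg (6) x Trp (1)
print(count_encodings("MYRW"))  # prints 12
```

For a protein of a few hundred residues this product becomes astronomically large, which is why a selection rule such as codon optimisation is needed.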

4.4 Codon Optimisation

Squid codons likely won’t express well in human or E. coli cells due to codon bias — organisms prefer different synonymous codons. Rare codons stall ribosomes and tank protein yield. Codon optimisation swaps in host-preferred codons without changing the protein.

We can use the online VectorBuilder tool: vectorbuilder.com/tool/codon-optimization.html — paste your sequence, pick your host organism, get optimised DNA out.

For dual expression (human + bacterial), you can either optimise separately for each host, or just optimise for human — human-preferred codons generally work fine in E. coli at moderate expression levels.
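
The core codon-swapping idea can be sketched as a "one amino acid, one codon" lookup. This is a deliberately naive illustration and not VectorBuilder's algorithm. The `HUMAN_PREFERRED` table below is an assumed set of commonly used human codons and should be checked against a current codon-usage table before real use. Production tools also balance GC content and avoid repeats and unwanted restriction sites, which matters for repetitive proteins.

```python
# Assumed "preferred" human codon per amino acid (illustrative values only).
HUMAN_PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def naive_optimise(protein, table=HUMAN_PREFERRED):
    """Reverse-translate a protein using a single preferred codon per residue.

    Returns the coding sequence without a stop codon.
    """
    return "".join(table[aa] for aa in protein.upper())


# Toy tripeptide, not a fragment of RfA1.
print(naive_optimise("MKY"))  # prints ATGAAGTAC
```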

5. From Sequence to Cells — Step-by-Step with RfA1

Step (i): Import into Benchling

Log in to Benchling → your project folder.
Create → DNA Sequence (or paste/upload FASTA).
Annotate the RfA1 CDS. Check the translation matches the expected protein.
Use Benchling’s cloning tools to design the full expression construct in silico.
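
The translation check in the annotation step can also be done outside Benchling. Below is a minimal sketch using the standard genetic code; the helper names `translate` and `matches_expected` are hypothetical, and the toy sequence stands in for the real codon-optimised CDS.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a DNA coding sequence; stop codons are returned as '*'."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def matches_expected(cds, expected_protein):
    """True if the CDS encodes the expected protein (trailing stops ignored)."""
    return translate(cds).rstrip("*") == expected_protein.upper()


print(translate("ATGAAGTACTAATAA"))                # prints MKY**
print(matches_expected("ATGAAGTACTAATAA", "MKY"))  # prints True
```

A premature stop or a frameshift makes `matches_expected` return False, which is the signal to recheck the annotation before ordering.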

Step (ii): Design and Order from Twist

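Goal: Express RfA1 stably in HEK293 cells via transposon integration, and also purify it from E. coli. Note that PiggyBac and Sleeping Beauty insert at semi-random sites (TTAA and TA, respectively), so landing specifically at the AAVS1 safe-harbour locus (chr. 19) would need a targeted knock-in (e.g., CRISPR with homology arms) rather than a transposase.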

Mammalian expression cassette:

Promoter: CAG or EF1α (strong mammalian)
Kozak: GCCACC before ATG
RfA1 CDS (codon-optimised) + optional 6×His tag
Stop codon: double stop (TAA-TAA)
PolyA signal: SV40 or bGH
Flanking: PiggyBac or Sleeping Beauty ITRs for transposon integration
Selection: Puromycin or hygromycin resistance cassette (PGK promoter)
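
The open-reading-frame part of the cassette list above (Kozak, CDS, optional His tag, double stop) can be assembled as plain strings before pasting into Benchling. This is a sketch under assumptions: the tag is placed at the C-terminus, the His codons alternate CAT/CAC, and `build_orf_insert` is a hypothetical helper. The promoter, polyA signal, ITRs, and selection cassette are left to the vector.

```python
KOZAK = "GCCACC"
HIS6 = "CATCAC" * 3        # six His codons; codon choice is an assumption
DOUBLE_STOP = "TAATAA"
STOPS = {"TAA", "TAG", "TGA"}


def build_orf_insert(cds, his_tag=True):
    """Return Kozak + CDS (+ C-terminal 6xHis) + double stop.

    Any stop codons already at the end of the CDS are removed first so the
    tag stays in frame and is actually translated.
    """
    cds = cds.upper()
    if not cds.startswith("ATG") or len(cds) % 3:
        raise ValueError("CDS must start with ATG and be a multiple of 3 long")
    while cds[-3:] in STOPS:
        cds = cds[:-3]
    tag = HIS6 if his_tag else ""
    return KOZAK + cds + tag + DOUBLE_STOP


# Toy CDS standing in for the codon-optimised RfA1 sequence.
print(build_orf_insert("ATGAAGTACTAA"))
# prints GCCACCATGAAGTACCATCACCATCACCATCACTAATAA
```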

Bacterial expression: Sub-clone RfA1+His into pET-28a (T7/lac, IPTG-inducible, kanamycin).

Order on Twist: Genes → Clonal Genes → upload sequence → choose vector → Twist checks feasibility → order. ~2–3 week turnaround.

Step (iii): Transform E. coli, Purify, Verify

Transform: Resuspend plasmid → add to competent cells (DH5α or BL21) on ice 30 min → heat shock 42 °C / 45 sec → ice 2 min → recover in SOC 37 °C / 1 hr → plate on LB + antibiotic → overnight.

Miniprep: Pick 2–4 colonies → grow overnight in LB + antibiotic → miniprep (Qiagen or equivalent) → Nanodrop.

Verify — restriction digest: Digest ~500 ng with diagnostic enzymes → run on 1% agarose gel → compare bands to predicted pattern from Benchling.

Verify — Sanger sequencing: Send plasmid + tiling primers to Azenta/GENEWIZ → import .ab1 traces into Benchling → align to reference → confirm 100% match.

Step (iv): Transfect HEK293 via Transposon + Lipofectamine

Seed HEK293 cells in a 6-well plate so they are ~70–80% confluent at transfection (DMEM + 10% FBS, no antibiotics).
Lipofectamine 3000 mix: Tube A (Lipo 3000 + Opti-MEM) + Tube B (transposon plasmid + transposase helper plasmid + P3000 + Opti-MEM). Ratio ~3–5:1 transposon:transposase. Combine, wait 15 min.
Add complexes drop-wise → incubate 37 °C, 5% CO₂.
Select at 24–48 hrs with puromycin (1–2 µg/mL). Change media every 2–3 days. Non-integrants die off in 5–10 days.
Expand surviving pool or pick clones.
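
The transposon:transposase ratio in the Lipofectamine mix translates into plasmid amounts with simple arithmetic, sketched below. Two assumptions: the ratio is taken by mass (the steps above do not say mass or molar), and 2.5 µg total DNA per well is a placeholder to be checked against the reagent protocol for your plate format. The function name `split_dna_mass` is hypothetical.

```python
def split_dna_mass(total_ug, ratio_transposon=4.0, ratio_transposase=1.0):
    """Split a total DNA mass (µg) between transposon and helper plasmid.

    Returns (transposon_ug, transposase_ug) for a mass ratio such as 4:1.
    """
    parts = ratio_transposon + ratio_transposase
    transposon_ug = total_ug * ratio_transposon / parts
    return transposon_ug, total_ug - transposon_ug


# Assumed 2.5 µg total DNA per well at a 4:1 ratio.
transposon_ug, transposase_ug = split_dna_mass(2.5)
print(transposon_ug, transposase_ug)  # prints 2.0 0.5
```

Divide each mass by the NanoDrop concentration of the corresponding miniprep to get the volume to pipette.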

Step (v): Verify Genomic Integration

Confirm that RfA1 actually integrated into the genome. Options, ranging from cheap single-locus checks to genome-wide mapping:

A. Junction PCR + Sanger — One primer in cassette, one in flanking genome (e.g., AAVS1). Band = integration. Sanger the product. Cheap and fast but only checks one locus.

B. Long-read amplicon sequencing (Nanopore/PacBio) — Long-range PCR across the full insert → single-read verification of the entire cassette. No primer tiling needed.

C. TLA or whole-genome sequencing — Maps all integration sites genome-wide (Cergentis TLA or shallow WGS). Most comprehensive, most expensive. For final clone characterisation.

D. Targeted NGS panel — Extract gDNA (Qiagen DNeasy) → targeted panel covering construct + AAVS1 flanks. High-depth, catches mosaicism.

Best practical combo: A + B — junction PCR confirms the right locus, long-read confirms full cassette integrity.

Summary

Step	What You Do	Key Tool / Service
Sequence retrieval	Download RfA1 protein sequence	NCBI GenBank (ACZ57764.1)
Codon optimisation	Optimise for human/bacterial expression	VectorBuilder online tool
In silico design	Import sequence, design construct	Benchling
DNA synthesis	Order construct as clonal gene	Twist Bioscience
Bacterial work	Transform, miniprep, verify	Competent E. coli, Sanger sequencing
Mammalian transfection	Transposon + Lipofectamine into HEK293	Lipofectamine 3000, PiggyBac/SB
Integration verification	Confirm genomic integration	Junction PCR, Sanger, Nanopore, or WGS