Homework

Weekly homework submissions:

  • Pre Week 2 Lecture Questions

  • Week 1 HW: Principles and Practices

  • Week 2 HW: DNA Read Write Edit

Subsections of Homework

Pre Week 2 Lecture Questions

Professor Jacobson’s Questions

Q1: Polymerase Error Rate vs. the Human Genome

Raw polymerase error rate: DNA polymerase III (the baseline replicative polymerase) misincorporates roughly 1 in 10⁴ to 10⁵ nucleotides during synthesis.

If you factor in the polymerase's built-in proofreading, this error rate drops to about 1 in 10⁷.

After mismatch repair (MMR) and other post-replicative repair pathways, the final observed mutation rate drops to approximately 1 in 10⁹ - 10¹⁰ per base pair per cell division.

The human genome is ~3.2 x 10⁹ bp (diploid: ~6.4 x 10⁹ bp). So even with these correction systems, the rate above predicts roughly 0.3-6 new mutations per human cell division (3.2 x 10⁹ bp x 10⁻¹⁰ ≈ 0.3 at the low end; 6.4 x 10⁹ bp x 10⁻⁹ ≈ 6 at the high end).
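The arithmetic behind that range can be sketched in a few lines. This is only a back-of-the-envelope check using the genome sizes and rates quoted above; the function name is my own.

```python
# Expected new mutations per cell division = genome size (bp) x per-bp mutation rate.
HAPLOID_BP = 3.2e9   # haploid genome size quoted in the text
DIPLOID_BP = 6.4e9   # diploid genome size quoted in the text

def expected_mutations(genome_bp: float, rate_per_bp: float) -> float:
    """Mean number of new mutations per division for a given per-bp rate."""
    return genome_bp * rate_per_bp

low = expected_mutations(HAPLOID_BP, 1e-10)   # lowest rate, haploid size
high = expected_mutations(DIPLOID_BP, 1e-9)   # highest rate, diploid size
print(f"{low:.2f} to {high:.1f} expected mutations per division")
```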

Q2: How Many DNA Sequences Can Code for an Average Human Protein?

Number of possible DNA sequences: For a 400-AA protein:

~3⁴⁰⁰ ≈ 10¹⁹¹ different DNA sequences

Average human protein length: ~480 amino acids, rounded down to 400 here for a simple estimate.

Codon degeneracy: The genetic code has 61 sense codons encoding 20 amino acids, giving an average redundancy of ~3 codons per amino acid (61/20 ≈ 3.05). The unweighted geometric mean of the degeneracy across the 20 amino acids is a little lower, about 2.7, so ~3 is a convenient round figure.
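To sanity-check the degeneracy figures, here is a minimal sketch that tabulates codons per amino acid in the standard genetic code and estimates the size of the sequence space. The per-residue factor of 3 is the rounding used above, not an exact value, and the helper name is hypothetical.

```python
import math

# Number of codons per amino acid in the standard genetic code (stops excluded).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}

sense_codons = sum(DEGENERACY.values())
arithmetic_mean = sense_codons / len(DEGENERACY)
geometric_mean = math.exp(
    sum(math.log(d) for d in DEGENERACY.values()) / len(DEGENERACY)
)

def log10_sequence_count(n_residues: int, codons_per_residue: float) -> float:
    """log10 of codons_per_residue ** n_residues, without building a huge integer."""
    return n_residues * math.log10(codons_per_residue)

print(sense_codons, round(arithmetic_mean, 2), round(geometric_mean, 2))
print(f"3^400 is about 10^{log10_sequence_count(400, 3):.0f}")
```

An exact count for a specific protein would multiply the degeneracy of each residue in its sequence rather than use an average.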

Why don’t all of these “synonymous” sequences work in practice?

  • Codon usage bias: Every organism has preferred codons matched to its tRNA abundance. Rare codons cause ribosome stalling, reduced translation rate, and lower protein yield.
  • mRNA secondary structure: Certain sequences fold into stable hairpins or structures that block ribosome scanning or translation initiation.
  • GC content effects: Extreme GC or AT content affects transcription efficiency, mRNA stability, and chromatin structure.
  • Cryptic regulatory signals: Random sequences may inadvertently create splice sites, polyadenylation signals, transcription factor binding sites, or promoter elements.
  • CpG dinucleotide methylation: In mammals, CpG sites are targets for methylation and subsequent deamination, leading to mutational hotspots.
  • Codon pair bias: Adjacent codon combinations affect translation speed and accuracy beyond individual codon frequency.
  • mRNA half-life: Sequence composition influences mRNA decay rates via AU-rich elements or other destabilizing motifs.

This is why codon optimization is a critical step in synthetic biology and heterologous gene expression.


Dr. LeProust’s Questions

Q1: Most Commonly Used Method for Oligo Synthesis

Phosphoramidite chemistry on controlled-pore glass (CPG) solid supports, performed in a 3’→5’ direction. Developed by Marvin Caruthers in the ’80s, this method is the current standard for commercial oligonucleotide synthesis.


Q2: Why Is It Difficult to Make Oligos Longer Than 200 nt?

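The fundamental problem is compounding coupling inefficiency. Even with an excellent per-step coupling efficiency of ~99.5%, the yield of full-length product drops exponentially with length: for an n-nt oligo it is roughly 0.995ⁿ⁻¹, which is about 61% at 100 nt and about 37% at 200 nt.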
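A minimal sketch of this compounding, assuming the first nucleotide is pre-attached to the support so that an n-nt oligo needs n-1 couplings (that convention and the function name are my assumptions):

```python
def full_length_yield(length_nt: int, coupling_efficiency: float = 0.995) -> float:
    """Fraction of chains that reach full length after (length_nt - 1) couplings."""
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)

# Print the expected full-length fraction for a few oligo lengths.
for n in (20, 100, 200, 2000):
    print(f"{n:>5} nt: {full_length_yield(n):.4%}")
```

Changing `coupling_efficiency` shows how sensitive long syntheses are to even a 0.1-0.2% difference per step.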

Beyond ~200 nt, the full-length product becomes a minority species in a sea of truncation products. Additional failure modes compound the problem:

  • Depurination accumulates with each acid-catalyzed detritylation step, creating abasic sites.
  • Branching and deletion mutations increase with sequence length.
  • Steric hindrance: Synthesis is usually performed on solid supports like controlled-pore glass (CPG). As the oligonucleotide grows longer, it can clog the pores of the support, inhibiting the diffusion of reagents to the reactive 5’-end and decreasing coupling efficiency.
  • Purification becomes intractable: it becomes nearly impossible to separate the target sequence from failure sequences of similar size (n-1 or n-2 nt).

Q3: Why Can’t You Make a 2000 bp Gene via Direct Oligo Synthesis?

At 99.5% coupling efficiency over 2000 steps, the expected full-length yield is 0.995²⁰⁰⁰ ≈ 4 x 10⁻⁵, or about 0.004%.

The 2000-mer Problem: Even assuming a more optimistic average stepwise yield of 99.7%, the overall yield of full-length product for a 2000-mer synthesis would be only about 0.25%.

Failure Sequences: The majority of the product in a 2000 nt synthesis would be truncated sequences (shorter than 2000 nt) capped at the growing end, making them extremely difficult to separate from the desired full-length product.

So you would recover essentially zero full-length product. The synthesis would just yield a soup of truncated fragments.


Prof. Church’s Questions

Q1: The 10 Essential Amino Acids & the “Lysine Contingency”

The 10 essential amino acids (those that animals cannot synthesize and must obtain from diet):

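| # | Amino Acid | 3-Letter | 1-Letter |
|---|---|---|---|
| 1 | Histidine | His | H |
| 2 | Isoleucine | Ile | I |
| 3 | Leucine | Leu | L |
| 4 | Lysine | Lys | K |
| 5 | Methionine | Met | M |
| 6 | Phenylalanine | Phe | F |
| 7 | Threonine | Thr | T |
| 8 | Tryptophan | Trp | W |
| 9 | Valine | Val | V |
| 10 | Arginine* | Arg | R |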

*Arginine is semi-essential: it is required during growth and stress but can be synthesized in limited quantities by adults.

The “Lysine Contingency” (of Jurassic Park): The park's scientists engineered their dinosaurs to be unable to make lysine, so the animals would die without exogenous lysine supplementation. In the story it serves as a biological “kill switch.”

But this would not actually work in the real world as a biocontainment strategy:

  • All vertebrates are already lysine-auxotrophs. Lysine is essential for every animal on the planet. Making the dinosaurs “lysine-dependent” is no different from their natural state.
  • Lysine is abundant in the environment. Meat, fish, insects, and many plants are rich in lysine. Any escaped dinosaur with a carnivorous or omnivorous diet could get plenty of lysine from its food.
  • A true contingency would require dependence on something unavailable: a compound not found in the wild, or at least not at the levels the animal needs. A synthetic or unnatural cofactor, a severe nutrient dependency, or possibly an insulin dependency would be a far more realistic approach.

Q3: The DARPA GO project

This is an exceedingly interesting mission, as it seems it would require template-free nucleotide synthesis with orthogonally light-activated polymerase-like complexes for each nucleotide, or perhaps a single highly responsive complex that is differentially activated depending on the wavelength or pulse pattern? I wonder if it could be some very large multi-unit complex whose activation is under the control of a secondary optogenetic system? Would it be fast enough?

It's a very cool problem and I am still deep in the rabbit hole of it. If you have recommended papers on this, do send them my way!

Week 1 HW: Principles and Practices

First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

Biosensing Tattoo Patches

I will explore the development of ‘e-tattoo’ or microneedle patches with biomedical and environmental sensing capabilities.
I believe embedding diagnostics in formats that can be applied and interpreted at home, in low-resource settings, is an area where synthetic biology can create accessible tools that further democratise advanced healthcare and diagnostics.
It is an application of high potential but also high technical complexity. I am aware there are many technical hurdles, both biological and mechanical, that I hope to address more fully with the guidance of this course.
I have a few primary PoCs in mind just now, but they are very subject to change depending on application impact and biomarker suitability after further research.

| Application | Target | Function | Biomarker | Technical Complexity Prediction Score (0-10) | Impact Potential |
|---|---|---|---|---|---|
| Cancer recurrence monitoring | Prostate cancer recurrence | Wearer can monitor for prostate cancer markers at home rather than at hospital check-ups | PSA | 5 - simple biomarker but general circuit and device complexity challenges | Medium |
| Metastasis monitoring | General cancer metastasis | Wearer can monitor for metastasis markers at home rather than at hospital check-ups | OPN | 5 - simple biomarker but general circuit and device complexity challenges | High |
| Exposure / infection monitoring | Tuberculosis | Wearer can continuously monitor for TB infection in high-risk environments, such as for healthcare workers, low resource environments or natural disasters | TB RNA | 7 - potential biomarker complexity and sensitivity challenges, plus general circuit and device complexity challenges | High |
| Disease monitoring and management | Multiple Sclerosis (MS) | Wearer can self-monitor and adjust care for MS relapses | Serum neurofilament light chain (sNfL) | 8 - biomarker complexity and sensitivity challenges, plus general circuit and device complexity challenges | Medium |

Related Papers

  • E-tattoos PoC
  • Bio Tattoos PoC
  • Dermal Biosensors Example
  • OPN in metastasis
  • OPN in cancer 2
  • Dermal Sensor Tattoos

Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm).

Governance

Core governance for development and deployment could be established through core guiding principles that align with the aspiration of supporting autonomous, democratized healthcare for the general good, particularly in low resource contexts.

1) Ethics First Development

  • Beneficial use only: only developed to meet a medical or healthcare need.
  • Consensual use: only applied to consenting populations (not without clear consent, e.g. drug detection in incarcerated populations).

2) Accessibility

  • Support economic democracy: generate and deploy applications in a manner that at least 50% of the deployment is affordable and accessible to low resource users.
  • Support all users: ease of adoption, use and interpretation by the end user is a continuous core design principle.

3) Safety

  • User safety: ensure use of the device will cause no harm, immediate or lasting, to the user.
  • Containment safety: ensure the components of the device have suitable biological and component containment measures to prevent integration or harm beyond the device, to any living system, plant or animal.


3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).

| Aspect | Overview | Considerations | Opportunities | Stakeholders | Proposed Actions |
|---|---|---|---|---|---|
| Purpose | The purpose of the device is to provide simpler health autonomy to users | Healthcare providers may not be incentivised or receptive to increasing patient autonomy | Can reduce healthcare technician burden of running routine testing | Users (patients); Healthcare providers; Insurance providers; Regulators | Make client satisfaction and minimised time ‘in-clinic’ metrics of success for healthcare providers |
| Design | As a healthcare device it will likely require approval by regulatory bodies such as the FDA/MHRA and would need buy-in from large medical care groups (e.g. providers) | Regulatory bodies are struggling to distinguish between cell therapies and ‘living diagnostics’, and therefore to set appropriate regulatory expectations | Can provide a watershed case for effective regulation of living diagnostics | Regulators; Users (patients); Healthcare providers; Insurance providers; General public | Coalition action with subject experts and regulatory bodies to establish a dedicated taskforce to tackle areas of confusion |
| Assumptions | The current design brief assumes the device has suitable biomarker targets and can be suitably manufactured | The PoC detection circuit designs may require many cycles of iteration | Can set a precedent for acceptable thresholds of accuracy and sensitivity for such devices | Regulators; Creators; Funders | Creators choose well researched markers, seek input from field experts, and design quick PoCs in biological contexts |
| Risks | The device may not be reliable; device may be harmful when broken or misused; the device may not be robust enough for home use; selected biomarkers may not be specific enough; device may not be economically viable; device may be used for forced monitoring | There are many layers of risk in using a biologically active device ‘in the wild’: a possible electrical device in a liquid system, and creating diagnostics for non-expert users | Can identify and address risks early and become a model for considerate, purposeful and responsible synthetic biology application | Regulators; Users (patients); Healthcare providers; Insurance providers; General public | Biocontainment measures; electrical containment measures; maintain guiding values for responsible applications |

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals:

Does the option:Option 1Option 2Option 3
Enhance Biosecurity
• By preventing incidentsX
• By helping respondX
Foster Lab Safety
• By preventing incidentX
• By helping respondX
Protect the environment
• By preventing incidentsX
• By helping respondX
Other considerations
• Minimizing costs and burdens to stakeholdersX
• Feasibility?X
• Not impede researchN/A
• Promote constructive applicationsX

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

Firstly, I would prioritize a solid design in line with the guiding principles, as this affects the fundamental elements of the device and so prevents downstream risks. This may mean more time and resources in the design phase to factor in all considerations and seek input, but I believe it would ultimately save costs in the long term. One trade-off to consider is whether to first progress a single promising but low impact PoC application, or a simpler device design, in order to clear the path for future applications.

The second priority would be establishing regulatory clarity and acceptance with regulatory bodies such as the FDA and MHRA. Regulatory acceptance is a major point of uncertainty, and guidance on what is needed for approval is often a fundamental step for the effective development, widespread acceptance and deployment of new health-related technologies.

Week 2 HW: DNA Read Write Edit

Molecular Biology 101

1. Nucleotides In Silico

Several free tools let you visualize and manipulate DNA/RNA sequences on your computer. Key options: SnapGene Viewer (plasmid maps), NCBI BLAST (sequence alignment), UCSC Genome Browser (reference genomes), and Benchling (all-in-one cloud platform).

Benchling is a great starting point — it’s free, browser-based, and lets you import sequences (GenBank, FASTA, or raw), view annotated maps, design primers, run in silico digests, and align sequencing data. It also supports team collaboration and version control.


2. DNA Synthesis

Instead of cloning from a template, you can order custom DNA directly from commercial providers like Twist Bioscience, IDT, or GenScript — typically delivered in 1–2 weeks. On Twist, you pick between two formats:

  • Clonal genes (plasmid): Gene synthesized and cloned into a vector (e.g., pTwist Amp). Arrives as dried plasmid or E. coli stock. Ready to use.
  • Linear DNA (fragments): Double-stranded DNA fragment for assembly into your own vector (e.g., Gibson or Golden Gate). Cheaper and faster.

3. Sequence Verification

Always verify your synthetic DNA before starting experiments. Two standard methods:

a. Sanger Sequencing + Benchling Alignment

Send plasmid + primer to a sequencing provider (Azenta, Eurofins). You get back a .ab1 trace file. Import it into Benchling, align against your reference — mismatches, insertions, and deletions are instantly highlighted. Each read covers ~800–1000 bp, so tile multiple primers for longer inserts.
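To plan how many tiled reads a longer insert needs, here is a rough sketch. The 800 bp read length is the lower bound quoted above; the 100 bp overlap between neighbouring reads and the function name are my own assumptions. In practice each primer would sit somewhat upstream of its start position, since the first bases of a Sanger read are usually unreliable.

```python
def tiling_read_starts(insert_len: int, read_len: int = 800, overlap: int = 100) -> list[int]:
    """0-based start positions of reads that cover the insert end to end."""
    if insert_len < 1:
        raise ValueError("insert_len must be positive")
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be between 0 and read_len - 1")
    step = read_len - overlap
    starts = [0]
    # Keep adding reads until the last one reaches the end of the insert.
    while starts[-1] + read_len < insert_len:
        starts.append(starts[-1] + step)
    return starts

print(tiling_read_starts(2000))  # three reads for a 2 kb insert
```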

b. Restriction Digest

Cut your plasmid with 1–2 restriction enzymes, run on an agarose gel, and compare the band pattern to the predicted digest (use Benchling or SnapGene). Confirms correct insert size and orientation. Won’t catch point mutations, so it is best used alongside Sanger.
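What the predicted digest computes can be sketched as follows. This is a simplified illustration and not how Benchling or SnapGene implement it: it assumes a circular plasmid and palindromic recognition sites (so searching one strand is enough), and the toy plasmid is a made-up placeholder. The sites used are EcoRI (G^AATTC) and BamHI (G^GATCC).

```python
def cut_positions(plasmid: str, site: str, cut_offset: int) -> list[int]:
    """0-based cut coordinates for one palindromic site on a circular plasmid."""
    seq, site = plasmid.upper(), site.upper()
    n = len(seq)
    # Append the first len(site)-1 bases so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    positions = set()
    start = wrapped.find(site)
    while start != -1:
        positions.add((start + cut_offset) % n)
        start = wrapped.find(site, start + 1)
    return sorted(positions)

def fragment_sizes(plasmid: str, cuts: list[int]) -> list[int]:
    """Fragment lengths (largest first) after cutting a circular plasmid."""
    n = len(plasmid)
    if not cuts:
        return [n]  # uncut plasmid
    cuts = sorted(set(cuts))
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(n - cuts[-1] + cuts[0])  # fragment that spans the origin
    return sorted(sizes, reverse=True)

# Hypothetical 56 bp toy "plasmid" with one EcoRI and one BamHI site.
toy = "A" * 10 + "GAATTC" + "A" * 20 + "GGATCC" + "A" * 14
cuts = cut_positions(toy, "GAATTC", 1) + cut_positions(toy, "GGATCC", 1)
print(fragment_sizes(toy, cuts))
```

Comparing these predicted sizes for the correct and the reversed insert orientation shows which enzyme pair gives a diagnostic band pattern.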


4. Selected Protein Example: Reflectin Protein RfA1

4.1 Background

Reflectins are squid-derived proteins that can change the light-reflecting properties of a cell in response to external stimuli (such as changes in salt concentration). In squid they are responsible for dynamic skin colour and light-reflection functions. Chatterjee et al. (2020) showed that human HEK293 cells can be engineered to express reflectin A1 (RfA1), giving them tuneable light-scattering: squid-like optics in human cells.

Reference: Chatterjee et al. “Cephalopod-inspired optical engineering of human cells.” Nature Communications 11, 2708 (2020). DOI link


4.2 Getting the Protein Sequence from GenBank

RfA1 from Doryteuthis pealeii is at accession ACZ57764.1: NCBI link. Click Send to → File to download in GenBank or FASTA format.

GenBank format excerpt:

LOCUS       ACZ57764                 303 aa            linear   INV
DEFINITION  reflectin-like protein A1 [Doryteuthis pealeii].
ACCESSION   ACZ57764
VERSION     ACZ57764.1
SOURCE      Doryteuthis pealeii (longfin inshore squid)
  ORGANISM  Doryteuthis pealeii
            Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Mollusca;
            Cephalopoda; Coleoidea; Decapodiformes; Myopsida;
            Loliginidae; Doryteuthis.

303 amino acids, rich in methionine, tyrosine, and charged residues — classic reflectin signature.
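Once the FASTA file is downloaded, the composition claim can be checked with a short script. This is a minimal sketch with hypothetical function names; the record below is a made-up placeholder, not the real RfA1 sequence, so swap in the contents of the downloaded file.

```python
from collections import Counter

def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records

def composition(seq: str) -> dict[str, float]:
    """Fraction of each residue, most common first."""
    counts = Counter(seq)
    return {aa: count / len(seq) for aa, count in counts.most_common()}

# Hypothetical toy record, NOT the real RfA1 sequence.
toy_fasta = ">toy_protein placeholder\nMYMD\nRYYK\n"
for name, seq in read_fasta(toy_fasta).items():
    print(name, len(seq), composition(seq))
```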


4.3 The Corresponding DNA Sequence

To go from protein → DNA, you do a reverse translation: convert each amino acid back to a codon triplet. The catch: the genetic code is degenerate (multiple codons per amino acid), so there’s no single “correct” DNA sequence — just many valid ones. The wild-type squid coding sequence can be found via the “Coded by” link in the CDS feature of the NCBI protein page.
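A minimal sketch of reverse translation using the standard genetic code. The codon picked for each amino acid is simply the first one in table order, an arbitrary choice of mine that is not optimised for any host; it only demonstrates that the result translates back to the same protein.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by amino acid ("*" marks stop codons).
CODONS_FOR: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)

def reverse_translate(protein: str) -> str:
    """One valid DNA sequence for the protein (first codon in table order)."""
    return "".join(CODONS_FOR[aa][0] for aa in protein.upper())

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

dna = reverse_translate("MYK")
print(dna, translate(dna))
```

Choosing a different entry from `CODONS_FOR[aa]` at any position gives another equally valid coding sequence.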


4.4 Codon Optimisation

Squid codons likely won’t express well in human or E. coli cells due to codon bias — organisms prefer different synonymous codons. Rare codons stall ribosomes and tank protein yield. Codon optimisation swaps in host-preferred codons without changing the protein.

We can use the online VectorBuilder tool: vectorbuilder.com/tool/codon-optimization.html — paste your sequence, pick your host organism, get optimised DNA out.

For dual expression (human + bacterial), you can either optimise separately for each host, or just optimise for human — human-preferred codons generally work fine in E. coli at moderate expression levels.
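The core idea of codon optimisation, a synonymous swap, can be sketched as below. This is a deliberately naive illustration and not what VectorBuilder actually runs; real tools also weigh GC content, repeats, and unwanted motifs. The preference map is a hypothetical toy covering two amino acids only, and a real one would come from a codon usage table for the chosen host.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon ("*" marks a stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def optimise(cds: str, preferred: dict[str, str]) -> str:
    """Swap each codon for the preferred synonym; leave it alone if none is given."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        new_codon = preferred.get(aa, codon)
        if CODON_TABLE[new_codon] != aa:
            raise ValueError(f"{new_codon} does not encode {aa}")
        out.append(new_codon)
    return "".join(out)

# Hypothetical preference map, for illustration only.
toy_preferred = {"L": "CTG", "K": "AAG"}
cds = "ATGTTAAAATAA"  # toy CDS: M L K stop
optimised = optimise(cds, toy_preferred)
print(optimised, translate(optimised) == translate(cds))
```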


5. From Sequence to Cells — Step-by-Step with RfA1

Step (i): Import into Benchling

  1. Log in to Benchling → your project folder.
  2. Create → DNA Sequence (or paste/upload FASTA).
  3. Annotate the RfA1 CDS. Check the translation matches the expected protein.
  4. Use Benchling’s cloning tools to design the full expression construct in silico.

Step (ii): Design and Order from Twist

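Goal: Express RfA1 in HEK293 cells via stable genomic integration, and also purify it from E. coli. Note that PiggyBac and Sleeping Beauty transposons integrate semi-randomly (at TTAA and TA sites, respectively) rather than at a chosen locus. Targeting the AAVS1 safe-harbour locus (chr. 19) specifically would instead require a nuclease-assisted knock-in with AAVS1 homology arms (e.g. CRISPR-Cas9).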

Mammalian expression cassette:

  • Promoter: CAG or EF1α (strong mammalian)
  • Kozak: GCCACC before ATG
  • RfA1 CDS (codon-optimised) + optional 6×His tag
  • Stop codon: double stop (TAA-TAA)
  • PolyA signal: SV40 or bGH
  • Flanking: PiggyBac or Sleeping Beauty ITRs for transposon integration
  • Selection: Puromycin or hygromycin resistance cassette (PGK promoter)
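The coding part of this cassette (Kozak, CDS, tag, double stop) can be assembled and frame-checked in a few lines before it goes into Benchling. This is a sketch under assumptions: the tag is placed at the C-terminus, CAC is used for every histidine, and the CDS is a made-up placeholder rather than RfA1. The promoter, polyA signal, ITRs and selection cassette are not included.

```python
KOZAK = "GCCACC"        # Kozak sequence from the list above
HIS6 = "CAC" * 6        # one possible 6xHis encoding (assumption: all CAC)
DOUBLE_STOP = "TAATAA"  # double stop from the list above
STOPS = {"TAA", "TAG", "TGA"}

def build_orf_insert(cds: str, add_his_tag: bool = True) -> str:
    """Kozak + CDS (native stop removed) + optional C-terminal 6xHis + double stop."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("CDS must start with ATG")
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    if cds[-3:] in STOPS:
        cds = cds[:-3]  # drop the native stop so the tag is read in frame
    return KOZAK + cds + (HIS6 if add_his_tag else "") + DOUBLE_STOP

toy_cds = "ATGGCTAAATAA"  # hypothetical placeholder, not RfA1
orf_insert = build_orf_insert(toy_cds)
print(orf_insert, len(orf_insert))
```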

Bacterial expression: Sub-clone RfA1+His into pET-28a (T7/lac, IPTG-inducible, kanamycin).

Order on Twist: Genes → Clonal Genes → upload sequence → choose vector → Twist checks feasibility → order. ~2–3 week turnaround.


Step (iii): Transform E. coli, Purify, Verify

Transform: Resuspend plasmid → add to competent cells (DH5α for plasmid propagation, or BL21(DE3) for T7-driven expression from pET-28a) on ice 30 min → heat shock 42 °C / 45 sec → ice 2 min → recover in SOC 37 °C / 1 hr → plate on LB + antibiotic → overnight.

Miniprep: Pick 2–4 colonies → grow overnight in LB + antibiotic → miniprep (Qiagen or equivalent) → Nanodrop.

Verify — restriction digest: Digest ~500 ng with diagnostic enzymes → run on 1% agarose gel → compare bands to predicted pattern from Benchling.

Verify — Sanger sequencing: Send plasmid + tiling primers to Azenta/GENEWIZ → import .ab1 traces into Benchling → align to reference → confirm 100% match.


Step (iv): Transfect HEK293 via Transposon + Lipofectamine

  1. Seed HEK293 at ~70–80% confluency in 6-well plate (DMEM + 10% FBS, no antibiotics).
  2. Lipofectamine 3000 mix: Tube A (Lipo 3000 + Opti-MEM) + Tube B (transposon plasmid + transposase helper plasmid + P3000 + Opti-MEM). Ratio ~3–5:1 transposon:transposase. Combine, wait 15 min.
  3. Add complexes drop-wise → incubate 37 °C, 5% CO₂.
  4. Select at 24–48 hrs with puromycin (1–2 µg/mL). Change media every 2–3 days. Non-integrants die off in 5–10 days.
  5. Expand surviving pool or pick clones.

Step (v): Verify Genomic Integration

Confirm RfA1 actually integrated into the genome. Options from cheapest to most comprehensive:

A. Junction PCR + Sanger — One primer in cassette, one in flanking genome (e.g., AAVS1). Band = integration. Sanger the product. Cheap and fast but only checks one locus.

B. Long-read amplicon sequencing (Nanopore/PacBio) — Long-range PCR across the full insert → single-read verification of the entire cassette. No primer tiling needed.

C. TLA or whole-genome sequencing — Maps all integration sites genome-wide (Cergentis TLA or shallow WGS). Most comprehensive, most expensive. For final clone characterisation.

D. Targeted NGS panel — Extract gDNA (Qiagen DNeasy) → targeted panel covering construct + AAVS1 flanks. High-depth, catches mosaicism.

Best practical combo: A + B — junction PCR confirms the right locus, long-read confirms full cassette integrity.


Summary

| Step | What You Do | Key Tool / Service |
|---|---|---|
| Sequence retrieval | Download RfA1 protein sequence | NCBI GenBank (ACZ57764.1) |
| Codon optimisation | Optimise for human/bacterial expression | VectorBuilder online tool |
| In silico design | Import sequence, design construct | Benchling |
| DNA synthesis | Order construct as clonal gene | Twist Bioscience |
| Bacterial work | Transform, miniprep, verify | Competent E. coli, Sanger sequencing |
| Mammalian transfection | Transposon + Lipofectamine into HEK293 | Lipofectamine 3000, PiggyBac/SB |
| Integration verification | Confirm genomic integration | Junction PCR, Sanger, Nanopore, or WGS |