I am a biotech and synbio enthusiast. I have been working on the automation of iPSC and differentiated cell production, a venture where my educational foundation in Interdisciplinary Approaches to Biological Sciences and Molecular Genetics is instrumental. My experiences fuel my quest for innovative solutions that streamline intricate cell and molecular workflows, and I am now excited about designing whole new systems!
I also enjoy getting stuck into new tech in any form whenever I can get my hands on it, whether that is kitchen gadgets, synbio, robotics or, most recently, 3D printing!
Subsections of Homework
Pre Week 2 Lecture Questions
Professor Jacobson’s Questions
Q1: Polymerase Error Rate vs. the Human Genome
Raw polymerase error rate: DNA polymerase III (the replicative polymerase of E. coli, used here as the baseline) misincorporates roughly 1 in 10⁴ to 10⁵ nucleotides during synthesis.
If you factor in built-in proofreading, this error rate falls to about 1 in 10⁷.
After mismatch repair (MMR) and other post-replicative repair pathways, the final observed mutation rate drops to approximately 1 in 10⁹ to 10¹⁰ per base pair per cell division.
The human genome is ~3.2 × 10⁹ bp (diploid: ~6.4 × 10⁹ bp). So even with all of these correction systems, the rate above predicts roughly 0.3-6 new mutations per human cell division (3.2 × 10⁹ bp × 10⁻¹⁰ ≈ 0.3 at the low end; 6.4 × 10⁹ bp × 10⁻⁹ ≈ 6 at the high end).
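The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the figures quoted above; the function and variable names are illustrative, not from any library.

```python
HAPLOID_GENOME_BP = 3.2e9                    # genome size quoted in the text
DIPLOID_GENOME_BP = 2 * HAPLOID_GENOME_BP    # ~6.4e9 bp


def expected_mutations(genome_bp: float, error_rate_per_bp: float) -> float:
    """Expected number of new mutations in one round of genome replication."""
    return genome_bp * error_rate_per_bp


# (worst-case, best-case) error rates per base pair at each fidelity stage,
# taken from the ranges quoted in the text.
stages = {
    "raw polymerase": (1e-4, 1e-5),
    "with proofreading": (1e-7, 1e-7),
    "after mismatch repair": (1e-9, 1e-10),
}

for name, (worst, best) in stages.items():
    low = expected_mutations(HAPLOID_GENOME_BP, best)
    high = expected_mutations(DIPLOID_GENOME_BP, worst)
    print(f"{name}: {low:.3g} to {high:.3g} mutations per division")

# The final range discussed in the text (haploid/best to diploid/worst).
low_estimate = expected_mutations(HAPLOID_GENOME_BP, 1e-10)
high_estimate = expected_mutations(DIPLOID_GENOME_BP, 1e-9)
```

Without proofreading and repair, the same calculation gives tens of thousands of errors per division, which is why the layered correction systems matter.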
Q2: How Many DNA Sequences Can Code for an Average Human Protein?
Average human protein length: ~480 amino acids; 400 is used here as a round figure.
Codon degeneracy: The genetic code has 61 sense codons encoding 20 amino acids, an arithmetic mean of ~3 codons per amino acid (61/20 ≈ 3.05). The geometric mean of the degeneracy across all 20 amino acids is slightly lower, approximately 2.7.
Number of possible DNA sequences: For a 400-AA protein, taking ~3 codons per residue:
~3⁴⁰⁰ ≈ 10¹⁹¹ different DNA sequences
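The degeneracy figures and the sequence-count estimate can be checked directly from the standard genetic code. The sketch below builds the standard codon table (NCBI translation table 1, bases ordered T, C, A, G) and works in log10 to avoid enormous integers; the variable names are illustrative.

```python
from collections import Counter
from math import log10, prod

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Number of codons per amino acid, ignoring the three stop codons.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

arithmetic_mean = sum(degeneracy.values()) / len(degeneracy)
geometric_mean = prod(degeneracy.values()) ** (1 / len(degeneracy))

PROTEIN_LENGTH = 400  # round figure used in the text
log10_rough = PROTEIN_LENGTH * log10(3)                   # "~3 codons each"
log10_geometric = PROTEIN_LENGTH * log10(geometric_mean)  # uniform-composition estimate

print(f"sense codons: {sum(degeneracy.values())}, amino acids: {len(degeneracy)}")
print(f"arithmetic mean degeneracy: {arithmetic_mean:.2f}")
print(f"geometric mean degeneracy:  {geometric_mean:.2f}")
print(f"3^400 is about 10^{log10_rough:.0f}")
print(f"geometric-mean estimate is about 10^{log10_geometric:.0f}")
```

The geometric mean is the appropriate average when multiplying per-residue choices, and it assumes all 20 amino acids are equally frequent. Either way the count is astronomically large.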
Why don’t all of these “synonymous” sequences work in practice?
Codon usage bias: Every organism has preferred codons matched to its tRNA abundance. Rare codons cause ribosome stalling, reduced translation rate, and lower protein yield.
mRNA secondary structure: Certain sequences fold into stable hairpins or structures that block ribosome scanning or translation initiation.
GC content effects: Extreme GC or AT content affects transcription efficiency, mRNA stability, and chromatin structure.
Cryptic regulatory signals: Random sequences may inadvertently create splice sites, polyadenylation signals, transcription factor binding sites, or promoter elements.
CpG dinucleotide methylation: In mammals, CpG sites are targets for methylation and subsequent deamination, leading to mutational hotspots.
mRNA half-life: Sequence composition influences mRNA decay rates via AU-rich elements or other destabilizing motifs.
This is why codon optimization is a critical step in synthetic biology and heterologous gene expression.
Dr. LeProust’s Questions
Q1: Most Commonly Used Method for Oligo Synthesis
Phosphoramidite chemistry on controlled-pore glass (CPG) solid supports, performed in a 3’→5’ direction. Developed by Marvin Caruthers in the ’80s, this method is the current standard for commercial oligonucleotide synthesis.
Q2: Why Is It Difficult to Make Oligos Longer Than 200 nt?
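The fundamental problem is compounding coupling inefficiency. Even with an excellent per-step coupling efficiency of ~99.5%, the yield of full-length product drops exponentially with length: yield ≈ (coupling efficiency)^(number of couplings), so 0.995²⁰⁰ ≈ 0.37, i.e. only about 37% of chains are full length at 200 nt.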
Beyond ~200 nt, the full-length product becomes a minority species in a sea of truncation products. Additional failure modes compound the problem:
Depurination accumulates with each acid-catalyzed detritylation step, creating abasic sites.
Branching and deletion mutations increase with sequence length.
Steric hindrance: Synthesis is usually performed on solid supports such as controlled-pore glass (CPG). As the oligonucleotide grows longer, it can clog the pores of the support, inhibiting diffusion of reagents to the reactive 5’ end and decreasing coupling efficiency.
Purification becomes intractable: it is nearly impossible to separate the target sequence from failure sequences of almost the same size (n-1 or n-2 nt).
Q3: Why Can’t You Make a 2000 bp Gene via Direct Oligo Synthesis?
At 99.5% coupling efficiency over 2000 steps: 0.995²⁰⁰⁰ ≈ 4.4 × 10⁻⁵, i.e. about 0.004% full-length product.
The 2000-mer problem: Even assuming a better average stepwise yield of 99.7%, the overall yield of full-length product for a 2000-mer would be only ~0.25% (0.997²⁰⁰⁰ ≈ 0.0025).
Failure sequences: The majority of the product of a 2000 nt synthesis would be truncated sequences (shorter than 2000 nt) capped at the growing end, which are extremely difficult to separate from the desired full-length product.
So you would recover essentially zero full-length product. The synthesis would just yield a soup of truncated fragments.
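The exponential drop-off is easy to reproduce. The following is a minimal sketch that assumes every coupling step succeeds independently with the same efficiency, which is a simplification of real synthesis chemistry; the function name is illustrative.

```python
def full_length_yield(n_couplings: int, coupling_efficiency: float) -> float:
    """Fraction of chains still full length after n_couplings coupling steps,
    assuming each step succeeds independently with the same efficiency."""
    return coupling_efficiency ** n_couplings


for efficiency in (0.995, 0.997):
    for length in (50, 100, 200, 500, 2000):
        fraction = full_length_yield(length, efficiency)
        print(f"efficiency {efficiency:.1%}, {length:>4} nt: "
              f"{fraction:.4%} full length")
```

This is why long genes are assembled from many short, individually synthesised oligos rather than made in a single run.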
Prof. Church’s Questions
Q1: The 10 Essential Amino Acids & the “Lysine Contingency”
The 10 essential amino acids (those that animals cannot synthesize and must obtain from diet):
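| # | Amino Acid | 3-Letter | 1-Letter |
|---|---|---|---|
| 1 | Histidine | His | H |
| 2 | Isoleucine | Ile | I |
| 3 | Leucine | Leu | L |
| 4 | Lysine | Lys | K |
| 5 | Methionine | Met | M |
| 6 | Phenylalanine | Phe | F |
| 7 | Threonine | Thr | T |
| 8 | Tryptophan | Trp | W |
| 9 | Valine | Val | V |
| 10 | Arginine* | Arg | R |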
*Arginine is semi-essential: required during growth and stress but synthesizable in limited quantities by adults.
The “Lysine Contingency” (from Jurassic Park): The park’s scientists engineered their dinosaurs to be lysine-deficient, so the animals would die without exogenous lysine supplementation. In the story this serves as a biological “kill switch.”
But this would not actually work in the real world as a biocontainment strategy:
All vertebrates are already lysine auxotrophs. Lysine is essential for every animal on the planet. Making the dinosaurs “lysine-dependent” is no different from their natural state.
Lysine is abundant in the environment. Meat, fish, insects, and many plants are rich in lysine. Any escaped dinosaur with a carnivorous or omnivorous diet could get plenty of lysine from its diet.
A true contingency would require dependence on something unavailable: a compound not found in the wild, or at least not at the levels present in the natural environment. A synthetic or unnatural cofactor, a severe nutrient dependency, or possibly an insulin dependency would be a far more realistic approach.
Q3: The DARPA GO project
This is an exceedingly interesting mission, as it seems it would require template-free nucleotide synthesis with orthogonally light-activated polymerase-like complexes for each nucleotide, or perhaps a highly responsive complex that is differentially activated depending on wavelength or pulse pattern. I wonder if it could be some very large multi-unit complex with its activation under a secondary, optogenetically controlled system. Would it be fast enough?
It’s a very cool problem and I am still deep in the rabbit hole of it. If you have recommended papers on this, do send them my way!
Week 1 HW: Principles and Practices
1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.
Biosensing Tattoo Patches
I will explore the development of ‘e-tattoo’ or microneedle patches with biomedical and environmental sensing capabilities. I believe that embedding diagnostics in a format that can be applied and interpreted at home, in low-resource settings, is an application where synthetic biology can create an accessible tool that further democratises advanced healthcare and diagnostics. It is an application of high potential but also high technical complexity. I am aware there are many technical hurdles, both biological and mechanical, that I hope to address more fully with the guidance of this course. I have a few primary proofs of concept (PoCs) in mind just now, but they are very much subject to change depending on application impact and biomarker suitability after further research.
| Application | Target | Function | Biomarker | Technical Complexity Prediction Score (0-10) | Impact Potential |
|---|---|---|---|---|---|
| Cancer recurrence monitoring | Prostate cancer recurrence | Wearer can monitor for prostate cancer markers at home rather than at hospital check-ups | PSA1 | 5 - simple biomarker but general circuit and device complexity challenges | Medium |
| Metastasis monitoring | General cancer metastasis | Wearer can monitor for metastasis markers at home rather than at hospital check-ups | OPN | 5 - simple biomarker but general circuit and device complexity challenges | High |
| Exposure / infection monitoring | Tuberculosis | Wearer can continuously monitor for TB infection in high-risk environments, such as for healthcare workers, low-resource environments or natural disasters | TB RNA | 7 - potential biomarker complexity and sensitivity challenges, plus general circuit and device complexity challenges | High |
| Disease monitoring and management | Multiple Sclerosis (MS) | Wearer can self-monitor and adjust care for MS relapses | Serum neurofilament light chain (sNfL) | 8 - biomarker complexity and sensitivity challenges, plus general circuit and device complexity challenges | Medium |
Related Papers
2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm).
Governance
Core governance for development and deployment could be established through core guiding principles that align with the aspiration of supporting autonomous, democratised healthcare for the general good, particularly in low-resource contexts.
1) Ethics First Development
· Beneficial use only - only developed to meet a medical illness or healthcare need
· Consensual use only - applied only to consenting populations (e.g. not for drug detection in incarcerated populations without clear consent)
2) Accessibility
· Support economic democracy - generate and deploy applications such that at least 50% of the deployment is affordable and accessible to low-resource users.
· Support all users - ease of adoption, use and interpretation by the end user is a continuous core design principle.
3) Safety
· User safety - ensure use of the device will cause no harm, immediate or lasting, to the user
· Containment safety - ensure the components of the device have suitable biological and component containment measures to prevent integration or harm beyond the device, to any living system, plant or animal.
3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
| Aspect | Overview | Considerations | Opportunities | Stakeholders | Proposed Actions |
|---|---|---|---|---|---|
| Purpose | The purpose of the device is to provide simpler health autonomy to users | Healthcare providers may not be incentivised or receptive to increasing patient autonomy. | Can reduce healthcare technician burden of running routine testing | | Make client satisfaction and minimised time ‘in-clinic’ metrics of success for healthcare providers. |
| Design | As a healthcare device it will likely require approval by regulatory bodies such as FDA/MHRA; it would also need buy-in from large medical care groups (e.g. providers) | Regulatory bodies are struggling to distinguish between cell therapies and ‘living diagnostics’ and therefore to set appropriate regulatory expectations | Can provide a watershed case for effective regulation of living diagnostics | · Regulators · Users (patients) · Healthcare providers · Insurance providers · General public | Coalition action with subject experts and regulatory bodies to establish a dedicated taskforce to tackle areas of confusion. |
| Assumptions | The current design brief assumes the device has suitable biomarker targets and can be suitably manufactured | The PoC detection circuit designs may require many cycles of iteration | Can set a precedent of acceptable thresholds of accuracy and sensitivity for such devices | · Regulators · Creators · Funders | Creators choose well-researched markers, seek input from field experts, and design quick PoCs in biological contexts |
| Risks | · The device may not be reliable · Device may be harmful when broken or misused · The device may not be robust enough for home use · Selected biomarkers may not be specific enough · Device may not be economically viable · Device may be used for forced monitoring | There are many layers of risk in using a biologically active device ‘in the wild’, a possible electrical device in a liquid system, and creating diagnostics for non-expert users | Can identify and address risks early and become a model for considerate, purposeful and responsible synthetic biology application | · Regulators · Users (patients) · Healthcare providers · Insurance providers · General public | · Biocontainment measures · Electrical containment measures · Maintain guiding values for responsible applications |
4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals:
Does the option:
Option 1
Option 2
Option 3
Enhance Biosecurity
• By preventing incidents
X
• By helping respond
X
Foster Lab Safety
• By preventing incident
X
• By helping respond
X
Protect the environment
• By preventing incidents
X
• By helping respond
X
Other considerations
• Minimizing costs and burdens to stakeholders
X
• Feasibility?
X
• Not impede research
N/A
• Promote constructive applications
X
5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
Firstly, I would prioritize solid design in line with the guiding principles, as this affects the fundamental elements of the device and so prevents downstream risks. This may mean more time and resources in the design phase to factor in all considerations and seek input, but I believe it would ultimately save costs in the long term. There may be cases where it is worth weighing the trade-off of progressing a single promising but low-impact PoC application, or a simpler device design, to clear the path for future applications.
The second priority would be establishing regulatory clarity and acceptance with regulatory bodies such as the FDA and MHRA. Regulatory acceptance is a major point of uncertainty, and guidance on what is needed for approval is often a fundamental step for effective development, widespread acceptance and deployment of new technologies for health-related applications.
Week 2 HW: DNA Read Write Edit
Molecular Biology 101
1. Nucleotides In Silico
Several free tools let you visualize and manipulate DNA/RNA sequences on your computer. Key options: SnapGene Viewer (plasmid maps), NCBI BLAST (sequence alignment), UCSC Genome Browser (reference genomes), and Benchling (all-in-one cloud platform).
Benchling is a great starting point — it’s free, browser-based, and lets you import sequences (GenBank, FASTA, or raw), view annotated maps, design primers, run in silico digests, and align sequencing data. It also supports team collaboration and version control.
2. DNA Synthesis
Instead of cloning from a template, you can order custom DNA directly from commercial providers like Twist Bioscience, IDT, or GenScript — typically delivered in 1–2 weeks. On Twist, you pick between two formats:
Clonal genes (plasmid): Gene synthesized and cloned into a vector (e.g., pTwist Amp). Arrives as dried plasmid or E. coli stock. Ready to use.
Linear DNA (fragments): Double-stranded DNA fragment for assembly into your own vector (e.g., Gibson or Golden Gate). Cheaper and faster.
3. Sequence Verification
Always verify your synthetic DNA before starting experiments. Two standard methods:
a. Sanger Sequencing + Benchling Alignment
Send plasmid + primer to a sequencing provider (Azenta, Eurofins). You get back a .ab1 trace file. Import it into Benchling, align against your reference — mismatches, insertions, and deletions are instantly highlighted. Each read covers ~800–1000 bp, so tile multiple primers for longer inserts.
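Planning how many tiled primers an insert needs is simple arithmetic. The sketch below is a rough planning aid only: the 800 bp usable read length and 100 bp overlap are assumptions chosen for illustration, and the function name is hypothetical. In practice each primer also has to anneal some distance upstream of the region it is meant to cover, because the first bases of a Sanger read are usually low quality.

```python
def tile_read_starts(insert_length_bp: int,
                     usable_read_bp: int = 800,
                     overlap_bp: int = 100) -> list[int]:
    """Return 0-based start positions of reads that cover the whole insert,
    with consecutive reads overlapping by overlap_bp."""
    if overlap_bp >= usable_read_bp:
        raise ValueError("overlap must be smaller than the usable read length")
    step = usable_read_bp - overlap_bp
    starts = [0]
    # Keep adding reads until the last one reaches the end of the insert.
    while starts[-1] + usable_read_bp < insert_length_bp:
        starts.append(starts[-1] + step)
    return starts


for length in (800, 2000, 5000):
    starts = tile_read_starts(length)
    print(f"{length} bp insert: {len(starts)} read(s) starting at {starts}")
```

For example, a 2000 bp insert comes out at three reads under these assumptions, each overlapping its neighbour so that no junction is covered only by the unreliable tail of a read.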
b. Restriction Digest
Cut your plasmid with 1–2 restriction enzymes, run on an agarose gel, and compare the band pattern to the predicted digest (use Benchling or SnapGene). Confirms correct insert size and orientation. Won’t catch point mutations, so it is best used alongside Sanger.
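The predicted band pattern is what Benchling or SnapGene compute for you, but the idea can be sketched by hand. The example below is a simplified illustration, not a replacement for those tools: it only handles palindromic recognition sites (so searching one strand is enough), it treats the start of each site as the cut position (so sizes can be off by a few bp when two different enzymes are mixed), and the toy plasmid sequence is made up.

```python
def find_sites(plasmid: str, site: str) -> list[int]:
    """0-based start positions of a recognition site on a circular sequence."""
    plasmid, site = plasmid.upper(), site.upper()
    wrapped = plasmid + plasmid[: len(site) - 1]  # lets a site span the origin
    return [i for i in range(len(plasmid)) if wrapped.startswith(site, i)]


def circular_fragments(plasmid_length: int, cut_positions: list[int]) -> list[int]:
    """Fragment sizes (largest first) after cutting a circle at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # uncut plasmid
    sizes = []
    for i, cut in enumerate(cuts):
        next_cut = cuts[(i + 1) % len(cuts)]
        size = (next_cut - cut) % plasmid_length
        sizes.append(size or plasmid_length)  # a single cut linearises the circle
    return sorted(sizes, reverse=True)


# Hypothetical 300 bp toy plasmid: one EcoRI site and one BamHI site.
ECORI, BAMHI = "GAATTC", "GGATCC"
toy_plasmid = ECORI + "A" * 94 + BAMHI + "A" * 194

cuts = find_sites(toy_plasmid, ECORI) + find_sites(toy_plasmid, BAMHI)
fragments = circular_fragments(len(toy_plasmid), cuts)
print("cut positions:", sorted(cuts))
print("expected bands (bp):", fragments)
```

Comparing the printed fragment sizes with the ladder on the gel is the whole test: a missing or shifted band points to a wrong insert size or orientation.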
4. Selected Protein Example: Reflectin Protein RfA1
4.1 Background
Reflectins are squid-derived proteins that can change the light-reflecting properties of a cell in response to external stimuli (such as changes in salt concentration). In squid they are responsible for dynamic skin colour and light-reflection functions.
Chatterjee et al. (2020) showed that human HEK293 cells can be engineered to express reflectin A1 (RfA1), giving them tuneable light-scattering: squid-like optics in human cells.
Reference: Chatterjee et al. “Cephalopod-inspired optical engineering of human cells.” Nature Communications 11, 2708 (2020). DOI link
4.2 Getting the Protein Sequence from GenBank
RfA1 from Doryteuthis pealeii is at accession ACZ57764.1: NCBI link. Click Send to → File to download in GenBank or FASTA format.
GenBank format excerpt:
LOCUS       ACZ57764                 303 aa            linear   INV
DEFINITION  reflectin-like protein A1 [Doryteuthis pealeii].
ACCESSION   ACZ57764
VERSION     ACZ57764.1
SOURCE      Doryteuthis pealeii (longfin inshore squid)
  ORGANISM  Doryteuthis pealeii
            Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Mollusca;
            Cephalopoda; Coleoidea; Decapodiformes; Myopsida;
            Loliginidae; Doryteuthis.
303 amino acids, rich in methionine, tyrosine, and charged residues — classic reflectin signature.
4.3 The Corresponding DNA Sequence
To go from protein → DNA, you do a reverse translation: convert each amino acid back to a codon triplet. The catch: the genetic code is degenerate (multiple codons per amino acid), so there’s no single “correct” DNA sequence — just many valid ones. The wild-type squid coding sequence can be found via the “Coded by” link in the CDS feature of the NCBI protein page.
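Reverse translation can be sketched as a simple lookup. In the example below, the one-codon-per-residue table is an assumption: it lists codons commonly reported as frequent in human genes and should be replaced with a real codon usage table for serious work. The peptide is a made-up toy sequence, not RfA1, and the function names are illustrative.

```python
from math import prod

# ASSUMPTION: one commonly preferred human codon per amino acid (illustrative).
PREFERRED_CODON = {
    "A": "GCC", "R": "CGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def reverse_translate(protein: str, stop_codon: str = "TGA") -> str:
    """Return one valid coding sequence for the protein, ending in a stop codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper()) + stop_codon


def count_encodings(protein: str) -> int:
    """Number of distinct coding sequences for the protein (stop codon ignored)."""
    return prod(DEGENERACY[aa] for aa in protein.upper())


peptide = "MKTAYW"  # hypothetical toy peptide, not the RfA1 sequence
dna = reverse_translate(peptide)
print(dna)
print(f"{count_encodings(peptide)} possible coding sequences for {peptide}")
```

Even this six-residue peptide has dozens of valid encodings, which is the degeneracy problem in miniature.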
4.4 Codon Optimisation
Squid codons likely won’t express well in human or E. coli cells due to codon bias — organisms prefer different synonymous codons. Rare codons stall ribosomes and tank protein yield. Codon optimisation swaps in host-preferred codons without changing the protein.
For dual expression (human + bacterial), you can either optimise separately for each host, or just optimise for human — human-preferred codons generally work fine in E. coli at moderate expression levels.
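The simplest form of codon optimisation is a "most preferred codon" substitution, sketched below. This is a naive illustration under stated assumptions: the host-preference table is illustrative rather than taken from a specific usage database, the input CDS is a toy sequence, and real optimisers also balance GC content, avoid repeats and cryptic motifs, and do not always pick the single top codon.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# ASSUMPTION: illustrative host-preferred codons (human-like); swap in a real table.
HOST_PREFERRED = {
    "A": "GCC", "R": "CGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def codons(cds: str) -> list[str]:
    """Split a coding sequence into codons, checking it is in frame."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[codon] for codon in codons(cds))


def optimise_codons(cds: str) -> str:
    """Replace each sense codon with the host-preferred synonym; keep stop codons."""
    return "".join(
        codon if CODON_TABLE[codon] == "*" else HOST_PREFERRED[CODON_TABLE[codon]]
        for codon in codons(cds)
    )


toy_cds = "ATGAAATTATGGTAA"  # hypothetical toy CDS encoding M-K-L-W-stop
optimised = optimise_codons(toy_cds)
changed = sum(a != b for a, b in zip(codons(toy_cds), codons(optimised)))
print(toy_cds, "->", optimised, f"({changed} codons changed)")
assert translate(toy_cds) == translate(optimised)  # protein is unchanged
```

The final assertion is the key property: the DNA changes but the encoded protein does not.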
5. From Sequence to Cells — Step-by-Step with RfA1
Transform: Resuspend plasmid → add to competent cells (DH5α or BL21) on ice 30 min → heat shock 42 °C / 45 sec → ice 2 min → recover in SOC 37 °C / 1 hr → plate on LB + antibiotic → overnight.
Miniprep: Pick 2–4 colonies → grow overnight in LB + antibiotic → miniprep (Qiagen or equivalent) → Nanodrop.
Verify — restriction digest: Digest ~500 ng with diagnostic enzymes → run on 1% agarose gel → compare bands to predicted pattern from Benchling.
Verify — Sanger sequencing: Send plasmid + tiling primers to Azenta/GENEWIZ → import .ab1 traces into Benchling → align to reference → confirm 100% match.
Step (iv): Transfect HEK293 via Transposon + Lipofectamine
Seed HEK293 at ~70–80% confluency in 6-well plate (DMEM + 10% FBS, no antibiotics).
Lipofectamine 3000 mix: Tube A (Lipo 3000 + Opti-MEM) + Tube B (transposon plasmid + transposase helper plasmid + P3000 + Opti-MEM). Ratio ~3–5:1 transposon:transposase. Combine, wait 15 min.
Add complexes drop-wise → incubate 37 °C, 5% CO₂.
Select at 24–48 hrs with puromycin (1–2 µg/mL). Change media every 2–3 days. Non-integrants die off in 5–10 days.
Expand surviving pool or pick clones.
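The transposon:transposase ratio in the Lipofectamine step above translates into DNA amounts per well as follows. This is a minimal sketch that assumes the ratio is by mass and uses a hypothetical total of 2.5 µg DNA per well; check both assumptions against your reagent protocol and transposon system. The function name is illustrative.

```python
def split_dna_mass(total_ug: float,
                   transposon_parts: float,
                   transposase_parts: float = 1.0) -> tuple[float, float]:
    """Split a total DNA mass between transposon and transposase plasmids
    according to a transposon:transposase mass ratio."""
    total_parts = transposon_parts + transposase_parts
    transposon_ug = total_ug * transposon_parts / total_parts
    return transposon_ug, total_ug - transposon_ug


TOTAL_DNA_UG = 2.5  # ASSUMPTION: hypothetical total DNA per well of a 6-well plate

for ratio in (3, 4, 5):
    transposon_ug, transposase_ug = split_dna_mass(TOTAL_DNA_UG, ratio)
    print(f"{ratio}:1 -> {transposon_ug:.2f} ug transposon, "
          f"{transposase_ug:.2f} ug transposase")
```

If your system specifies a molar ratio instead, the masses need to be scaled by each plasmid's length before splitting.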
Step (v): Verify Genomic Integration
Confirm RfA1 actually integrated into the genome. Options from cheapest to most comprehensive:
A. Junction PCR + Sanger — One primer in cassette, one in flanking genome (e.g., AAVS1). Band = integration. Sanger the product. Cheap and fast but only checks one locus.
B. Long-read amplicon sequencing (Nanopore/PacBio) — Long-range PCR across the full insert → single-read verification of the entire cassette. No primer tiling needed.
C. TLA or whole-genome sequencing — Maps all integration sites genome-wide (Cergentis TLA or shallow WGS). Most comprehensive, most expensive. For final clone characterisation.