Professor Jacobson’s Questions Q1: Polymerase Error Rate vs. the Human Genome Raw polymerase error rate: DNA polymerase III (the baseline replicative polymerase) misincorporates roughly 1 in 10^4 to 10⁵ nucleotides during synthesis.
I fyou factor in built-in proofreading checkpoints this error rate reduces to about 1 in 10⁷.
First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.
Molecular Biology 101 1. Nucleotides In Silico Several free tools let you visualize and manipulate DNA/RNA sequences on your computer. Key options: SnapGene Viewer (plasmid maps), NCBI BLAST (sequence alignment), UCSC Genome Browser (reference genomes), and Benchling (all-in-one cloud platform).
Benchling is a great starting point — it’s free, browser-based, and lets you import sequences (GenBank, FASTA, or raw), view annotated maps, design primers, run in silico digests, and align sequencing data. It also supports team collaboration and version control.
Week 3: Lab Automation HTGAA 2026 — Fiona Connolly
What Lab Automation Can Do for Us? Lab automation is simply automating the processes in the lab. Scripted protocols, and integrated instruments to carry out experimental procedures with ideally minimal manual intervention. Particularly in molecular biology, this typically translates to very precise , temporally and temperature controlled liquid handling across the scale from picoL to Litres. The precise transfer of reagents, cultures, or genetic constructs between wells, plates, and vessels.
Week 4 — Protein Design Part I Foundations of protein chemistry: alphabet, structure, and design A topic guide on the amino-acid alphabet, secondary structure, chirality, β-sheet aggregation, and the principles of designing structured peptides — written as a stand-alone primer rather than a homework Q&A. This page is Part A of the Week 4 deliverable for HTGAA Spring 2026; Parts B, C, and D (sequence/structure analysis, ML protein design, MS2 L-protein engineering proposal) will be added as those sessions complete.
Week 6 — Genetic Circuits I: Assembly Technologies Part 1 — DNA Assembly: PCR, Gibson, Golden Gate, and transformation A topic guide on the molecular-biology toolkit that underpins all of synthetic biology: amplifying DNA (PCR), cutting it (restriction enzymes), joining it (Gibson and Golden Gate), and getting it into cells (transformation). Written as a stand-alone primer rather than a homework Q&A. Part 2 (Asimov Kernel: building genetic circuits computationally) will follow as a separate page once the simulation work is complete.
Subsections of Homework
Pre Week 2 Lecture Questions
Professor Jacobson’s Questions
Q1: Polymerase Error Rate vs. the Human Genome
Raw polymerase error rate: DNA polymerase III (the baseline replicative polymerase) misincorporates roughly 1 in 10^4 to 10⁵ nucleotides during synthesis.
I fyou factor in built-in proofreading checkpoints this error rate reduces to about 1 in 10⁷.
After mismatch repair (MMR) and other post-replicative repair pathways, the final observed mutation rate drops to approximately 1 in 10⁹ - 10¹⁰ per base pair per cell division.
The human genome is ~3.2 x 10⁹ bp (diploid: ~6.4 x 10⁹ bp). So even with the correction systems ,so with the above rate you could predict 0.3-6 new mutations per human cell division
Q2: How Many combinations to DNA Codes for an Average Human Protein?
Number of possible DNA sequences: For a 400-AA protein:
~3⁴⁰⁰ ≈ 10¹⁹¹ different DNA sequences
Average human protein length: ~480 amino acids , round to 400.
Codon degeneracy: The genetic code has 61 sense codons encoding 20 amino acids, giving an average redundancy of ~3 codons per amino acid. The geometric mean of the degeneracy factors across all 20 amino acids is approximately 2.8-3.2.
Why don’t all of these “synonymous” sequences work in practice?*
Codon usage bias: Every organism has preferred codons matched to its tRNA abundance. Rare codons cause ribosome stalling, reduced translation rate, and lower protein yield.
mRNA secondary structure: Certain sequences fold into stable hairpins or structures that block ribosome scanning or translation initiation.
GC content effects: Extreme GC or AT content affects transcription efficiency, mRNA stability, and chromatin structure.
Cryptic regulatory signals: Random sequences may inadvertently create splice sites, polyadenylation signals, transcription factor binding sites, or promoter elements.
CpG dinucleotide methylation: In mammals, CpG sites are targets for methylation and subsequent deamination, leading to mutational hotspots.
mRNA half-life: Sequence composition influences mRNA decay rates via AU-rich elements or other destabilizing motifs.
This is why codon optimization is a critical step in synthetic biology and heterologous gene expression.
Dr. LeProust’s Questions
Q1: Most Commonly Used Method for Oligo Synthesis
Phosphoramidite chemistry on controlled-pore glass (CPG) solid supports, performed in a 3’→5’ direction. Developed by Marvin Caruthers in the ’80s, this method is the current standard for commercial oligonucleotide synthesis.
–
Q2: Why Is It Difficult to Make Oligos Longer Than 200 nt?
The fundamental problem is compounding coupling inefficiency. Even with an excellent per-step coupling efficiency of ~99.5%, the yield of full-length product drops exponentially:
Beyond ~200 nt, the full-length product becomes a minority species in a sea of truncation products. Additional failure modes compound the problem:
Depurination accumulates with each acid-catalyzed detritylation step, creating abasic sites.
Branching and deletion mutations increase with sequence length.
Steric Hindrance Synthesis is usually performed on solid supports like Controlled Pore Glass (CPG). As the oligonucleotide grows longer, it can clog the pores of the support, inhibiting the diffusion of reagents to the reactive 5’-end and decreasing coupling efficiency
Purification becomes intractable it becomes nearly impossible to purify out the target sequences from similar sized failed sequences (-1 or -2bp )
Q3: Why Can’t You Make a 2000 bp Gene via Direct Oligo Synthesis?
At 99.5% coupling efficiency over 2000 steps:
The 2000-mer Problem: For a 2000-mer synthesis, assuming an average stepwise yield of 99.7%, the overall yield of the full-length product would be only 0.25%.
Failure Sequences: The majority of the product in a 2000 bp synthesis would be truncated sequences (shorter than 2000 bp) capped at the growing end, making them extremely difficult to separate from the desired full-length product
So you would recover essentially zero full-length product. The synthesis would just yield a soup of truncated fragments.
Prof. Church’s Questions
Q1: The 10 Essential Amino Acids & the “Lysine Contingency”
The 10 essential amino acids (those that animals cannot synthesize and must obtain from diet):
#
Amino Acid
3-Letter
1-Letter
1
Histidine
His
H
2
Isoleucine
Ile
I
3
Leucine
Leu
L
4
Lysine
Lys
K
5
Methionine
Met
M
6
Phenylalanine
Phe
F
7
Threonine
Thr
T
8
Tryptophan
Trp
W
9
Valine
Val
V
10
Arginine*
Arg
R
*Arginine is semi- essential — required during growth and stress but synthesizable in limited quantities by adults.
The “Lysine Contingency” (of Jurassic Park): They engineered their dinosaurs to be lysine-deficient, so the animals would die without exogenous lysine supplementation as a plot device for a biological “kill switch.”
But this would not actually work in real world as a bio- containment strategy:
All vertebrates are already lysine-auxotrophs. Lysine is essential for every animal on the planet. Making the dinosaurs “lysine-dependent” is no different from their natural state.
Lysine is abundant in the environment. Meat, fish, insects, and many plants are rich in lysine. Any escaped dinosaur with a carnivorous or omnivorous diet could get plenty of lysine from their diet
A true contingency would require dependence on unavailable. — something not found in the wild environment or at least not at levels found in natural envrioment. A synthetic or unnatural cofactor, or an severe nutrient or possibly insulin dependency would be a far more realistically applicable approach.
Q3: The DARPA GO project
This is an exceedingly interesting mission as it seems it would require template free nucelotide synthesis with orthogonally light activated polymermase-like complexes for each nucleotide or perhaps a super responsive differentially activated complex dependent on the wavelength or pulse pattern? I wonder if it could be some super huge multi unit complex with the activation under secondary system based optogenetic control ? would it be fast enough?
Its a very cool problem and I am still deep in the rabbit hole of it, if you have recommended papers on this do send them my way!
Week 1 HW: Principles and Practices
First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.
Biosensing Tattoo Patches
I will explore the development of ‘e-tattoo’ or microneedle patches with biomedical and environmental sensing capabilities. I believe embedding diagnostic devices in an at home low resource application and interpretation formats is an application where we can utilize synthetic biology to create an accessible tool to further democratise advanced healthcare and diagnostics. It is an application of high potential but also high technical complexity. I am aware there are many technical hurdles, both biological and mechanical, that I hope to address more fully with the guidance of this course. I have a few primary PoCs in mind just now but are very subject to change depending on the application impact and biomarker suitability after further research.
Application
Target
Function
Biomarker
Technical Complexity Prediction Score (0-10)
Impact Potential
Cancer recurrence monitoring
Prostate cancer recurrence
Wearer can monitor for Prostate cancer markers at home- rather than hospital check ups
PSA1
5 - simple biomarker but general circuit and device complexity challenges
Medium
Metastasis Monitoring
General cancer metastasis
Wearer can monitor for metastasis markers at home – rather than hospital check ups
OPN
5 - simple biomarker but general circuit and device complexity challenges
High
Exposure / Infection Monitoring
Tuberculosis
Wearer can continuously monitor for TB infection in high risk environments- such as for healthcare workers low resource environments or natural disasters
TB RNA
7-potential biomarker complexity challenges & sensitivity challenges and general circuit and device complexity challenges
High
Disease Monitoring and management
Multiple Sclerosis (MS)
Wearer can self-monitor and adjust care for MS relapses
Serum neurofilament light chain (sNfL)
8 biomarker complexity challenges & sensitivity challenges & general circuit and device complexity challenges
Medium
**Related Papers
Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm).
Governance
Core governance for development and deployment could be establishing thorugh core guiding principles that align with the aspirations of aiding autonomous democratized healthcare for general good and particularly in low resource contexts
1)Ethics First Development
· Beneficial Use only - only developed to meet medical illness or healthcare need
· Consensual Use Only applied to consenting populations (not without clear consent e.g drug detection in incarcerated populations)
2)Accessibility
· Support economic democracy- Generate and deploy applications in a manner that at least 50% of the deployment is affordable and accessible to low resource users.
· Support all users- ease of adoption, use and interpretation by the end user is a continuous core design principle.
3)Safety
· User safety- ensure use of the device will cause no harm, immediate or lasting to the user
· Containment Safety - Ensure the components of the device have suitable biological and component containment measures to prevent integration or harm beyond the device, to any living system plant or animal.
3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
Aspect
Overview
Considerations
Opportunities
Stakeholders
Proposed Actions
Purpose
The purpose of the device is to provide simpler health autonomy to users
Healthcare providers may not be incentivised or receptive to increasing patient autonomy.
Can reduce healthcare technician burden of running routine testing
Make client satisfaction and minimised time ‘in-clinic’ as metrics of success for healthcare providers.
Design
As a healthcare device it will likely require approval by regulatory bodies such as FDA/MHRAwould need buy in by large medical care groups ( e.g providers)
Regulatory bodies are struggling to define between cell therapies and ‘living diagnostics’ and therefore set appropriate regulatory expectations
Can provide a watershed case for effective regulation of living diagnostics
· Regulators · Users (patients) · Healthcare providers · Insurance Providers · General Public
Collation action with subject experts and regulatory bodies to establish a dedicated taskforce to tackle areas of confusion.
Assumptions
The current design brief assumes the device has suitable biomarker targets & can be suitably manufactured
The PoC detection circuit designs may require many cycles of iteration
Can set precedent of acceptable thresholds of accuracy & sensitivity for such devices
· Regulators · Creators · Funders
Creators choose well researched markers, seek input from field experts, design quick PoCs in biological contexts
Risks
· The device may not be reliable · Device may be harmful when broken or misused · The device may not be robust enough for home use. · Selected biomarkers may not be specific enough. · Device may not be economically viable · Device may be used for forced monitoring
There are many layers of risks using biologically active device ‘in the wild’ , a possible electrical device in a liquid system, Creating diagnostics for Non-expert users
Can identify and address risks early and become a model for considerate, purposeful and responsible synthetic biology application
· Regulators · Users (patients) · Healthcare providers · Insurance Providers · General Public
Biocontainment measuresElectrical containment measuresMaintain guiding values for responsible applications
4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals:
Does the option:
Option 1
Option 2
Option 3
Enhance Biosecurity
• By preventing incidents
X
• By helping respond
X
Foster Lab Safety
• By preventing incident
X
• By helping respond
X
Protect the environment
• By preventing incidents
X
• By helping respond
X
Other considerations
• Minimizing costs and burdens to stakeholders
X
• Feasibility?
X
• Not impede research
N/A
• Promote constructive applications
X
5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
Firstly, prioritize the solid design in line with the guiding principles as this would affect the fundamental elements of the device and so prevent downstream risks. This may mean more time and resources in the design phase to factor in all considerations and seek input. I believe this would ultimately save costs in the long term; there may be instance to consider the trade-off value of actioning progress of a single promising but low impact PoC application or simpler device design to clear the path for future applications.
Second priority would be establishing regulatory clarity and acceptance with regulatory bodies such as the FDA and MHRA. This regulatory acceptance would be a major point of uncertainty and guidance on what is needed for regulatory approval as this often a fundamental step for effective development and widespread acceptance and deployment of new technologies for health-related applications.
Week 2 HW :DNA Read Write Edit
Molecular Biology 101
1. Nucleotides In Silico
Several free tools let you visualize and manipulate DNA/RNA sequences on your computer. Key options: SnapGene Viewer (plasmid maps), NCBI BLAST (sequence alignment), UCSC Genome Browser (reference genomes), and Benchling (all-in-one cloud platform).
Benchling is a great starting point — it’s free, browser-based, and lets you import sequences (GenBank, FASTA, or raw), view annotated maps, design primers, run in silico digests, and align sequencing data. It also supports team collaboration and version control.
2. DNA Synthesis
Instead of cloning from a template, you can order custom DNA directly from commercial providers like Twist Bioscience, IDT, or GenScript — typically delivered in 1–2 weeks. On Twist, you pick between two formats:
Clonal genes (plasmid): Gene synthesized and cloned into a vector (e.g., pTwist Amp). Arrives as dried plasmid or E. coli stock. Ready to use.
Linear DNA (fragments): Double-stranded DNA fragment for assembly into your own vector (e.g., Gibson or Golden Gate). Cheaper and faster.
3. Sequence Verification
Always verify your synthetic DNA before starting experiments. Two standard methods:
a. Sanger Sequencing + Benchling Alignment
Send plasmid + primer to a sequencing provider (Azenta, Eurofins). You get back a .ab1 trace file. Import it into Benchling, align against your reference — mismatches, insertions, and deletions are instantly highlighted. Each read covers ~800–1000 bp, so tile multiple primers for longer inserts.
b. Restriction Digest
Cut your plasmid with 1–2 restriction enzymes, run on an agarose gel, and compare the band pattern to the predicted digest (use Benchling or SnapGene). Confirms correct insert size and orientation. Won’t catch point mutations best used used alongside Sanger.
4. Selected Protien Example — Reflectin Protein RfA1
4.1 Background
Reflectins are squid origin proteins that can change the light reflecting properties of a cell in repsonse to external stimuli (such as changes in salt concentration). In squid they are responsible for dynamic skin colour and light-reflection functions.
Chatterjee et al. (2020) showed that it is possible to produce engineered human HEK293 cells to express reflectin A1 (RfA1), giving them tuneable light-scattering — squid-like optics in human cells.
Reference: Chatterjee et al. “Cephalopod-inspired optical engineering of human cells.” Nature Communications 11, 2708 (2020). DOI link
4.2 Getting the Protein Sequence from GenBank
RfA1 from Doryteuthis pealeii is at accession ACZ57764.1: NCBI link. Click Send to → File to download in GenBank or FASTA format.
GenBank format excerpt:
LOCUS ACZ57764 303 aa linear INV
DEFINITION reflectin-like protein A1 [Doryteuthis pealeii].
ACCESSION ACZ57764
VERSION ACZ57764.1
SOURCE Doryteuthis pealeii (longfin inshore squid)
ORGANISM Doryteuthis pealeii
Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Mollusca;
Cephalopoda; Coleoidea; Decapodiformes; Myopsida;
Loliginidae; Doryteuthis.
303 amino acids, rich in methionine, tyrosine, and charged residues — classic reflectin signature.
4.3 The Corresponding DNA Sequence
To go from protein → DNA, you do a reverse translation: convert each amino acid back to a codon triplet. The catch: the genetic code is degenerate (multiple codons per amino acid), so there’s no single “correct” DNA sequence — just many valid ones. The wild-type squid coding sequence can be found via the “Coded by” link in the CDS feature of the NCBI protein page.
4.4 Codon Optimisation
Squid codons likely won’t express well in human or E. coli cells due to codon bias — organisms prefer different synonymous codons. Rare codons stall ribosomes and tank protein yield. Codon optimisation swaps in host-preferred codons without changing the protein.
For dual expression (human + bacterial), you can either optimise separately for each host, or just optimise for human — human-preferred codons generally work fine in E. coli at moderate expression levels.
5. From Sequence to Cells — Step-by-Step with RfA1
Transform: Resuspend plasmid → add to competent cells (DH5α or BL21) on ice 30 min → heat shock 42 °C / 45 sec → ice 2 min → recover in SOC 37 °C / 1 hr → plate on LB + antibiotic → overnight.
Miniprep: Pick 2–4 colonies → grow overnight in LB + antibiotic → miniprep (Qiagen or equivalent) → Nanodrop.
Verify — restriction digest: Digest ~500 ng with diagnostic enzymes → run on 1% agarose gel → compare bands to predicted pattern from Benchling.
Verify — Sanger sequencing: Send plasmid + tiling primers to Azenta/GENEWIZ → import .ab1 traces into Benchling → align to reference → confirm 100% match.
Step (iv): Transfect HEK293 via Transposon + Lipofectamine
Seed HEK293 at ~70–80% confluency in 6-well plate (DMEM + 10% FBS, no antibiotics).
Lipofectamine 3000 mix: Tube A (Lipo 3000 + Opti-MEM) + Tube B (transposon plasmid + transposase helper plasmid + P3000 + Opti-MEM). Ratio ~3–5:1 transposon:transposase. Combine, wait 15 min.
Add complexes drop-wise → incubate 37 °C, 5% CO₂.
Select at 24–48 hrs with puromycin (1–2 µg/mL). Change media every 2–3 days. Non-integrants die off in 5–10 days.
Expand surviving pool or pick clones.
Step (v): Verify Genomic Integration
Confirm RfA1 actually integrated into the genome. Options from cheapest to most comprehensive:
A. Junction PCR + Sanger — One primer in cassette, one in flanking genome (e.g., AAVS1). Band = integration. Sanger the product. Cheap and fast but only checks one locus.
B. Long-read amplicon sequencing (Nanopore/PacBio) — Long-range PCR across the full insert → single-read verification of the entire cassette. No primer tiling needed.
C. TLA or whole-genome sequencing — Maps all integration sites genome-wide (Cergentis TLA or shallow WGS). Most comprehensive, most expensive. For final clone characterisation.
Best practical combo: A + B — junction PCR confirms the right locus, long-read confirms full cassette integrity.
Summary
Step
What You Do
Key Tool / Service
Sequence retrieval
Download RfA1 protein sequence
NCBI GenBank (ACZ57764.1)
Codon optimisation
Optimise for human/bacterial expression
VectorBuilder online tool
In silico design
Import sequence, design construct
Benchling
DNA synthesis
Order construct as clonal gene
Twist Bioscience
Bacterial work
Transform, miniprep, verify
Competent E. coli, Sanger sequencing
Mammalian transfection
Transposon + Lipofectamine into HEK293
Lipofectamine 3000, PiggyBac/SB
Integration verification
Confirm genomic integration
Junction PCR, Sanger, Nanopore, or WGS
Week 3 Review: Lab Automation
Week 3: Lab Automation
HTGAA 2026 — Fiona Connolly
What Lab Automation Can Do for Us?
Lab automation is simply automating the processes in the lab. Scripted protocols, and integrated instruments to carry out experimental procedures with ideally minimal manual intervention. Particularly in molecular biology, this typically translates to very precise , temporally and temperature controlled liquid handling across the scale from picoL to Litres. The precise transfer of reagents, cultures, or genetic constructs between wells, plates, and vessels.
Automation is useful for lab processes in speed and consistency, a protocol run on a automated system produces the same volumes, timings, and positions every time, removing operator-dependent variability that limits reproducibility in manual workflows.
These systems range from large integrated platforms (Hamilton STAR, Beckman Biomek, CloudLab, DAMP Lab) down to benchtop robots accessible to academic labs,like the OT2 and CyBio Felix. One that we explore further this week is the OpenTrons OT-2, an open-source liquid-handling robot that costs roughly $5k (compared to $100k+ for legacy systems) and is programmed entirely in Python. It offers ±0.1 mm positional accuracy and <1% volume CV at the microlitre scale, making it practical for most molecular biology, microbiology, and synthetic biology workflows.
Here is a typical OT-2 deck setup for a combinatorial screening experiment:
PLACEHOLDER ![OT-2 deck layout showing tip rack, source plate, destination plate, tube rack and pipette head]
Example Application: Combinatorial Cross-Culture Screening
To illustrate what precise automated pipetting enables, here us common synthetic biology task: EXAMPLE HERE
The practical difference between manual and automated execution at this scale:
Manual
OT-2 Automated
Time
~6 hrs
~30 min
Volume CV
5–15%
<1%
Positional error
~1 mm
±0.1 mm
Reproducibility
Operator-dependent
Protocol-locked
Scale
~96 conditions/day
384+ conditions/run
Q1: Published Paper Using Automation for Novel Biology
Towards Automation of the DBTL Bioengineering Cycle: Application to Testing and Characterization of Standard Bioparts
Pushkareva, A., Beltran, J., Díaz-Iza, H., Arboleda-García, A., Boada, Y., Vignoni, A., Picó, J. (2023). XLIV Jornadas de Automática, 453–458. DOI: 10.17979/spudc.9788497498609.453
Pushkareva et al. address a gap in the Design-Build-Test-Learn (DBTL) cycle: while much prior work has automated the Design and Build steps individually, few efforts have tackled the Test and Learn steps together. This paper presents an integrated automated workflow combining the Opentrons OT-2 liquid-handling robot with an Agilent Biotek Cytation 3 plate reader to systematically characterise standard genetic bioparts.
The automated Test step worked as follows.
The OT-2 (fitted with a Multichannel P300 and Single Channel P1000) performs a two-part protocol controlled via Jupyter notebook.
-First, it dilutes 7 bacterial culture samples 1:4 in M9 minimal media with glucose, then transfers them to the plate reader for an initial OD600 measurement.
-The OD values are fed into a template spreadsheet that calculates the volumes needed to normalise all cultures to OD 0.1. —-The OT-2 then executes a second protocol using those calculated volumes, producing a standardised 96-well plate that goes into a 16-hour incubation/measurement experiment at 37°C and 230 rpm, recording both absorbance (600 nm) and fluorescence (530/488 nm).
PLACEHOLDER ![Automated Test and Learn workflow: OT-2 dilution and normalisation, plate reader measurement, parameter identification and cross-device prediction]
For the Learn steps,
-they used the resulting calibrated dataset (particles, MEFL, MEFL/particle, growth rate) from two GFP expression constructs — a low-copy (pSC101, C_N=5) and high-copy (ColE1, C_N=35) plasmid, both with identical promoter (BBa_J23106) and RBS (BBa_B0030).
-Using a growth-independent protein production model and genetic algorithm-based parameter identification on the low-copy data alone, they showed that the same parameter values could accurately predict the protein production of the high-copy device simply by changing the copy number, with comparable prediction error (MSE ~3.47×10⁸) to the optimisation error (MSE ~3.50×10⁸).
This is great example of using automation to scale up a standardised experiment and generate invaluable data because it demonstrates a practical closed loop: automated liquid handling produces consistent enough data that the Learn step (model fitting and prediction) actually works across devices. The reproducibility enabled by the OT-2 is what really makes this type of experiment doable and reprodicible.
Q2: Automation Plan for Final Project
Project Overview
The goal of on of final project ideas is to develop cell-free colorimetric biosensors that produce a visible colour change when a target biomarker exceeds a clinically relevant threshold. Automation would create a path for screening a combinatorial library of genetic circuit components (promoters, RBS variants, reporter genes, sensor elements) to identify which constructs give the best dose-responsive colorimetric output for each target analyte.
Beyond single-analyte detection, I want to explore the feasibility of multiplex circuits that detect more than one biomarker in a single reaction, using orthogonal detection modalities (e.g. toehold switches for RNA targets, transcription factor-based circuits for protein targets, and CRISPR-Cas12a for DNA targets).
I am planning for two application areas, each with three target biomarkers.
Two Use Cases and Their Molecular Targets
PLACEHOLDER ![Biosensor target panel showing pathogen exposure markers and cancer recurrence biomarkers with detection modalities]
Use Case 1 — Pathogen Exposure Markers:
The first application targets three infectious disease biomarkers relevant to resource-limited settings where point-of-care colorimetric diagnostics would have the most impact. For Ebola, the target is the secreted glycoprotein (sGP), a decoy antigen present in infected blood that can be detected earlier than PCR-based methods using sandwich immunoassay or CRISPR-Cas13a approaches targeting viral RNA. For HIV, the target is the p24 capsid antigen, detectable at approximately 10 pg/mL by colorimetric ELISA with ultrasensitive methods reaching 0.5 pg/mL. For tuberculosis, the target is lipoarabinomannan (LAM), a mycobacterial cell wall glycolipid excreted in urine at a median concentration of approximately 137 pg/mL in TB-positive individuals, making it suitable for non-invasive sample collection.
The multiplexing opportunity here is that these three targets are molecularly orthogonal (two proteins and one glycolipid), so independent detection channels could in principle operate in the same cell-free reaction without crosstalk.
Use Case 2 — Cancer Recurrence Biomarkers:
The second application targets three biomarkers associated with cancer recurrence monitoring. Circulating tumour DNA (ctDNA) carrying EGFR mutations (L858R, exon 19 deletions) is detectable at variant allele frequencies as low as 0.02% using CRISPR-Cas12a or multilevel toehold switch circuits, and predicts progression in approximately 64% of NSCLC cases before clinical detection. Circulating microRNAs miR-21 and miR-155 are upregulated 1.5–1.7-fold in the plasma of patients with breast, colorectal, pancreatic, and liver cancers, and are detectable without amplification using gold nanoparticle aggregation-based colorimetric assays. Exosomal PSA tracks prostate cancer recurrence at a threshold of 0.2–0.5 ng/mL, with exosomal urine tests achieving 92% negative predictive value.
These three targets span DNA, RNA, and protein modalities, which again creates the possibility for orthogonal multiplex detection in a single reaction format.
What Needs to Be Automated
For each use case, the screening task is the same: test a library of genetic circuit constructs against an 8-point concentration gradient of the target analyte, measure the colorimetric and fluorescent output over time, and identify which constructs produce a clear dose-response curve with a visible colour change at or below the clinical threshold concentration. The construct library for each target would include 3–5 promoter strengths × 3 RBS variants × 2 reporter genes (LacZ for colorimetric, mCherry for fluorescence), giving 30–60 constructs per target. Across 6 targets, that is 180–360 constructs to screen, each against an 8-point gradient — roughly 1,500–3,000 individual reactions. This is not feasible manually.
I have planned the automation for two scenarios depending on available infrastructure.
Scenario A: OT-2 + Standard Benchtop Automation
PLACEHOLDER ![OT-2 benchtop automation workflow for construct screening with dose-response readout]
In this scenario, the entire DBTL cycle runs on an OT-2 with an attached plate reader (Biotek Cytation or similar), following a protocol structure similar to Pushkareva et al.
Build: The OT-2 assembles constructs via Golden Gate or Gibson assembly into 96-well format, transforms into competent cells, and plates for colony selection. After overnight growth and colony picking, constructs are arrayed in a source plate.
Test: The OT-2 distributes cell-free protein synthesis (CFPS) master mix into a 96-well plate, adds DNA constructs from the source plate, then sets up an 8-point serial dilution of the target analyte across columns. The plate is sealed and incubated at 37°C for 1–6 hours, then read for absorbance (570 nm for colorimetric reporters) and fluorescence (530 nm for mCherry). The plate layout allocates rows A–F to 6 constructs, columns 1–8 to the analyte gradient, columns 9–10 to no-analyte controls, and columns 11–12 to positive controls, with row G as a blank and row H for calibration standards.
Learn: Dose-response curves are fitted to a Hill function for each construct. Constructs are ranked by dynamic range above the clinical threshold, signal-to-noise ratio at the threshold concentration, and time to visible colour change. Top performers from each target are carried forward into the next DBTL iteration.
At 96-well scale, each OT-2 run screens 6 constructs against one target. Screening the full library of 30–60 constructs per target requires 5–10 plates per target, or roughly 30–60 plates total across both use cases. At one plate per run (including setup and incubation), this takes approximately 2–3 weeks of daily runs.
Example OT-2 protocol for the analyte gradient step:
fromopentronsimportprotocol_apimetadata={'apiLevel':'2.13','description':'Biosensor dose-response screen — CFPS colorimetric'}defrun(protocol:protocol_api.ProtocolContext):# ── Labware ──────────────────────────────────────tiprack_300=protocol.load_labware('opentrons_96_tiprack_300ul','1')tiprack_20=protocol.load_labware('opentrons_96_tiprack_20ul','4')cfps_reservoir=protocol.load_labware('nest_12_reservoir_15ml','2')construct_plate=protocol.load_labware('corning_96_wellplate_360ul_flat','3')dest_plate=protocol.load_labware('corning_96_wellplate_360ul_flat','6')analyte_rack=protocol.load_labware('opentrons_24_tuberack_eppendorf_1.5ml_safelock_snapcap','5')p300=protocol.load_instrument('p300_multi_gen2','left',tip_racks=[tiprack_300])p20=protocol.load_instrument('p20_single_gen2','right',tip_racks=[tiprack_20])# ── Step 1: Distribute CFPS master mix (10 uL/well) ──p300.distribute(10,cfps_reservoir['A1'],dest_plate.wells()[:72],new_tip='once')# ── Step 2: Add DNA constructs (2 uL each, rows A-F) ──forrowinrange(6):forcolinrange(12):p20.transfer(2,construct_plate.wells()[row*12+col],dest_plate.wells()[row*12+col],new_tip='always')# ── Step 3: Analyte serial dilution (cols 1-8) ────────# Concentrations: 0, 0.1, 1, 10, 100, 500, 1000, 5000 pg/mLanalyte_vols=[0,0.2,0.5,1.0,2.0,4.0,6.0,8.0]# uL from stockbuffer_vols=[8.0,7.8,7.5,7.0,6.0,4.0,2.0,0]# uL bufferforcol_idxinrange(8):col_wells=dest_plate.columns()[col_idx][:6]# rows A-F onlyifbuffer_vols[col_idx]>0:p20.distribute(buffer_vols[col_idx],analyte_rack['A1'],col_wells,new_tip='once')ifanalyte_vols[col_idx]>0:p20.distribute(analyte_vols[col_idx],analyte_rack['B1'],col_wells,new_tip='always')
Scenario B: Ginkgo Bioworks Cloud Lab / DAMP Lab
PLACEHOLDER (![Cloud lab automation workflow using Ginkgo Nebula or DAMP Lab integrated instruments]))
In this scenario, the entire workflow is submitted remotely and executed on integrated high-throughput instruments at Ginkgo Bioworks (via Nebula) or the DAMP Lab at Boston University.
Build: Construct designs are submitted digitally. Gene synthesis is handled by Twist or IDT. The Echo 525 acoustic liquid handler transfers construct DNA at nanolitre precision into 384-well plates, and a Bravo liquid handler stamps in additional reagents.
Test: A Multiflo dispenser adds CFPS lysate to all wells. The plate is sealed (PlateLoc), incubated at 37°C (Inheco), unsealed (XPeel), and read on a PHERAstar plate reader for absorbance, fluorescence, and luminescence. The entire sequence runs without manual plate handling.
Learn: Data is exported via API for automated dose-response fitting, Hill coefficient extraction, and LOD calculation. Constructs are ranked by the same criteria as Scenario A.
The key advantages of the cloud lab over benchtop are scale and precision. At 384-well format, a single plate screens 24 constructs against one target (4× the throughput of OT-2). The Echo transfers at nanolitre precision, reducing reagent consumption. The integrated seal/unseal/incubate workflow eliminates manual plate moves entirely. And all 6 targets can be run in parallel, completing the entire screen in approximately 2–3 days rather than 2–3 weeks.
Cloud lab protocol pseudocode (per target):
1. Echo 525: Transfer construct DNA (2.5 nL each) into 384-well plate
→ 24 constructs × 16 replicates per plate
2. Echo 525: Transfer analyte at 8 concentrations across columns
→ nanolitre serial dilution, no tip waste
3. Bravo: Stamp CFPS reagent master mix into all wells
4. Multiflo: Dispense CFPS lysate to start protein expression
5. PlateLoc: Seal plate
6. Inheco: Incubate 37°C, 1–6 hrs
7. XPeel: Remove seal
8. PHERAstar: Read absorbance (570 nm) + fluorescence (530/488 nm)
→ kinetic or endpoint, per experimental design
9. Data export: Dose-response curves → Hill fit → rank constructs
Multiplex Circuit Design Considerations
For both use cases, the longer-term goal is to combine the top-performing single-analyte sensors into a multiplex format where two or three biomarkers are detected in the same cell-free reaction. This is feasible because the targets within each use case are molecularly orthogonal: in the pathogen panel, sGP (protein), p24 (protein), and LAM (glycolipid) can each bind to distinct sensor elements; in the cancer panel, ctDNA (nucleic acid), miRNA (nucleic acid), and exosomal PSA (protein) can be read by CRISPR-Cas12a, toehold switches, and aptamer-based circuits respectively. Each sensor module would drive a spectrally distinct reporter (e.g. LacZ/yellow at 405 nm, mCherry/red at 587 nm, and a luciferase/blue for luminescence), allowing deconvolution on a standard plate reader.
The automation requirements for multiplexing are the same as for single-analyte screening, but the number of conditions increases: each multiplex combination needs to be tested against a matrix of analyte concentrations (rather than a single gradient), making the cloud lab scenario (Scenario B) strongly preferable for this stage.
Comparison of Scenarios
Scenario A: OT-2 Benchtop
Scenario B: Cloud Lab
Format
96-well
384-well
Constructs per plate
6
24
Liquid transfer precision
±1% (µL)
±5% (nL, Echo)
Manual intervention
Plate moves, seal/unseal
None
Time for full screen
2–3 weeks
2–3 days
Cost per plate
Low (reagents only)
Service fees apply
Multiplex feasibility
Limited (96-well constraint)
Practical (384-well + nL precision)
Accessibility
Available now
Requires cloud lab access
Week 4 Review: Protein Design Part I
Week 4 — Protein Design Part I
Foundations of protein chemistry: alphabet, structure, and design
A topic guide on the amino-acid alphabet, secondary structure, chirality, β-sheet aggregation, and the principles of designing structured peptides — written as a stand-alone primer rather than a homework Q&A. This page is Part A of the Week 4 deliverable for HTGAA Spring 2026; Parts B, C, and D (sequence/structure analysis, ML protein design, MS2 L-protein engineering proposal) will be added as those sessions complete.
Course: HTGAA Spring 2026
Lecture (Tues, Feb 24, 2026): Thras Karydis, Jon Kaufman — Protein Design Part IRecitation (Wed, Feb 25): Allan Costa — Protein foldingAuthor: Fiona (Committed Listener BioPunk)
Why this matters
Proteins do nearly everything in biology: catalysis, structure, transport, signaling, immunity. For most of the field’s history, “designing” a protein meant rational point-mutation campaigns on a known scaffold — slow, limited in scope, and dependent on hard-won structural intuition. The last five years collapsed that timeline. Modern protein language models read the fitness landscape from hundreds of millions of natural sequences; structure predictors fold sequences in seconds; inverse-folding networks rewrite sequences for any backbone. Together they make computational protein engineering tractable for problems that used to require multi-year wet-lab campaigns.
Before we get to those tools (Parts B, C, D), we need the substrate they operate on: the amino-acid alphabet, the geometric constraints on protein structure, and the thermodynamics of folding and aggregation. That’s Part A. The conceptual questions in this section come from Shuguang Zhang’s HTGAA prompt set, but the answers are organized as a connected story rather than an isolated Q&A — so a reader landing on this page cold can follow it as a topic guide.
The thread running through everything: proteins are constrained polymers. Their building blocks are stereochemically restricted (only L-amino acids in nature, only ~20 of them, only one ribosomally-installable scaffold per residue). Those constraints set what proteins can do — which is the same as what synthetic biologists can engineer.
1. The amino-acid alphabet
1.1 The shared scaffold
Every amino acid except proline shares the same backbone scaffold: an α-carbon flanked by an amino group, a carboxyl group, a hydrogen, and a variable side chain (R). Peptide bonds link them via dehydration condensation; the resulting peptide bond is planar (partial double-bond character locks the C–N axis). That constrains the protein backbone to two free torsion angles per residue — φ (around N–Cα) and ψ (around Cα–C) — and the entire shape of secondary structure follows from the geometry of those allowed angles.
The 20 side chains span the chemistries biology needs: hydrophobic (A, V, L, I, M, F, W), polar uncharged (S, T, N, Q, Y, C), positively charged (K, R, H), negatively charged (D, E), and special-case residues — Gly (achiral, maximum flexibility), Pro (rigid imino acid that disrupts helices), Cys (forms disulfides). Proline is the structural exception: its side chain loops back to the backbone nitrogen forming a pyrrolidine ring, locking φ at roughly −60° ± 15°. That single deviation makes proline a helix-breaker, a Type II’ turn nucleator (when D-Pro is used), and the dominant residue of the polyproline II helix that builds collagen.
All 20 amino acids in life are L-stereoisomers. Glycine is achiral. This single geometric constraint is the most consequential one in protein structure: it’s why the α-helix in evolved proteins is right-handed, why D-amino-acid peptides invert handedness, and why every helical biological structure ultimately inherits the same handedness from its building blocks. We come back to this in Section 2.
1.2 The quantitative scale of protein chemistry
A useful intuition for protein chemistry comes from working out the throughput. If you eat 500 g of meat and treat it as pure protein with average residue mass 100 Da:
Two caveats. Real beef is only ~20% protein by mass (~70% water, ~10% fat), so adjusted for actual protein content the number drops to ~6 × 10²³ residues — still close to Avogadro’s number. And these are residues incorporated in protein, not free amino acids; they’re released by digestive proteolysis before they enter the bloodstream. A single 500 g serving carries roughly Avogadro’s-number-worth of amino-acid monomers, which calibrates intuition for the throughput of protein chemistry on the human scale.
1.3 Why species identity is genomic, not dietary
A frequent question — why don’t humans become cows from eating beef? — has a clean answer: species identity is genomic, not dietary.
When you eat beef, digestive proteases such as trypsin hydrolyze cow proteins into free amino acids and short peptides before they cross into the bloodstream. Those amino acids enter a species-agnostic pool — a Phe is a Phe whether it came from a cow, a tuna, or a yeast cell. What happens next is the key: your ribosomes don’t read cow mRNA. They read your mRNA, encoded by your genome, and assemble your proteins from the recovered amino-acid pool. The cow contributed the building blocks; the assembly blueprint is yours.
Two qualifiers. Essential amino acids (Phe, Trp, Lys, Met, Thr, Val, Leu, Ile, and His — the last conditionally essential) are residues humans cannot synthesize in adequate amounts and must obtain from diet — diet matters for what you can build, not for what you become. And post-translational modifications (glycosylation, hydroxylation, etc.) are also genome-encoded — cow collagen and your collagen differ at the modification level too, not just at the sequence level. The principle is template-directed assembly: diet supplies monomers, identity comes from the blueprint.
1.4 Why ~20 canonical amino acids
All known life uses the same canonical 20 amino acids in ribosomal protein synthesis, with two genetically-encoded exceptions: selenocysteine (Sec) is encoded by recoded UGA in selenoproteins across all three domains; pyrrolysine (Pyl) is encoded by recoded UAG in some methanogenic archaea. No organism has ever been found using a fundamentally different ribosomal alphabet — strong evidence that 20 + Sec + Pyl is a deep evolutionary attractor, not just an unsampled corner.
Why this particular set? Three mutually-reinforcing constraints filter the space:
Prebiotic availability. Miller–Urey (1953) experiments and the Murchison meteorite (1969) both produced ~10 of the canonical 20 — Gly, Ala, Asp, Glu, Val, Leu, Ile, Pro, Ser, Thr — under abiotic conditions. The simpler half of the alphabet was available before life had to encode anything.
Ribosomal compatibility. The peptidyl-transferase center of the ribosome has stereochemical and steric tolerances limiting which side chains it can incorporate accurately. β-amino acids and very bulky residues are translated poorly even today. Whatever life encoded had to fit through this geometric filter.
Biosynthetic cost vs. chemical novelty. Once translation existed, selection favored adding amino acids that gave new chemistry not yet in the set, even at high biosynthetic expense. Trp and Tyr — the only sources of aromatic chemistry — are biosynthetically expensive and were almost certainly late additions. Trifonov (2000) reconstructs an inferred temporal order: small-and-cheap first, complex-and-functional later.
A fourth, complementary constraint is codon space: a 4-base, 3-position genetic code yields 64 codons; subtract three stops and you have 61 sense codons distributed across 20 amino acids with redundancy. Many more than ~20 would over-strain the ribosome’s codon-recognition machinery; many fewer would waste coding capacity.
These constraints together explain why these 20, but the relative weighting among them remains debated — Higgs & Pudritz (2009) emphasize prebiotic availability; Wong (2005) emphasizes codon-coevolution; Crick’s “frozen accident” (1968) holds that the specific 20 was partly arbitrary at the moment of code lock-in.
Engineered exceptions are real. Schultz, Chin, and the Church group’s Genomically Recoded Organism work (Lajoie 2013) have built orthogonal aminoacyl-tRNA synthetase / tRNA pairs that let engineered organisms incorporate over 200 non-canonical amino acids ribosomally. The ribosome is permissive enough; evolution simply didn’t go there.
So: 20 is not a physical limit. It is the biological-historical attractor produced by prebiotic supply, ribosomal compatibility, and selection-cost trade-offs, layered onto a finite codon space.
1.5 Designing non-canonical amino acids
Engineered ribosomal incorporation has produced over 200 non-canonical amino acids (NCAAs). The field is largely Peter Schultz’s (Scripps) and Jason Chin’s (MRC-LMB); the Church group’s Genomically Recoded Organism (Lajoie 2013) removed all UAG stop codons from E. coli to free that codon for NCAA assignment without translational crosstalk.
A novel amino acid must clear three filters to be installable:
Stereochemical compatibility with the ribosome — must be α-amino (canonical backbone scaffold), with side-chain bulk roughly within the natural envelope. β-amino acids and very large side chains translate poorly.
Orthogonal aminoacyl-tRNA synthetase / tRNA pair — a dedicated aaRS that doesn’t cross-react with native AAs, paired with a tRNA decoding a non-natural codon. Methanocaldococcus jannaschii TyrRS (Schultz) and archaeal pyrrolysyl-tRNA synthetase (Chin) are the workhorses.
A free codon to assign — typically a reassigned stop codon (UAG amber, UGA opal).
Established NCAAs include p-azidophenylalanine (click chemistry), p-iodophenylalanine (X-ray phasing), photo-caged Lys/Tyr/Cys (light-activated control), and selenomethionine (heavy-atom for crystallography).
Two design proposals:
Photoswitch-Phe (azoPhe). Phenylalanine analog with the para-H replaced by an azobenzene group. UV flips the azobenzene between trans (extended) and cis (~3 Å shorter); visible light reverses it. Use case: optically toggleable local geometry — caged active sites, photoswitchable PPI inhibitors, kinetic studies of folding intermediates. Real precedent: Beharry & Woolley (2011).
Bipyridyl-Ala (BiPyAla). Alanine analog with a 2,2′-bipyridyl side chain replacing the methyl. Creates a bidentate metal-chelation site at a single residue. Drop Cu²⁺ or Fe²⁺ in and you have a redox-active or catalytic center installed on an arbitrary protein scaffold. A first-generation version exists in the Schultz lab; higher-affinity and pH-tuned variants remain open design space.
NCAAs are not the first tool for protein engineering — point mutations to canonical residues are simpler and the modern ML toolkit handles them well. NCAAs become useful when you want to covalently lock a fold (engineered disulfide, photo-crosslink) or install a sensor (FRET pair, EPR spin label) without disrupting the native sequence. Relevant for the L-protein engineering work later in the course.
2. Secondary structure and the chirality of life
Plot every residue’s (φ, ψ) torsions on a 2D plot — the Ramachandran plot — and only ~10% of the space is sterically allowed for L-amino acids. Most is forbidden by atomic clashes between the side chain and the backbone. Three allowed regions dominate: the lower-left (right-handed α-helix), the upper-left (β-sheet), and a small upper-right region (left-handed α-helix) tolerated only by glycine. Secondary structure is what you get when consecutive residues stack inside one of these allowed islands.
This single fact — the Ramachandran plot for L-amino acids — explains both the dominance of right-handed helices and the handedness of mirror-image polypeptides.
2.1 Why most molecular helices are right-handed
Life is homochiral, and the homochirality of biological building blocks fixes helical handedness across all macromolecular structures.
For protein α-helices specifically: living organisms use L-amino acids exclusively in ribosomal protein synthesis. L-side chains clash sterically with the backbone in any conformation other than the right-handed α-helix; the (φ, ψ) coordinates of a left-handed α-helix sit in a sterically forbidden region of the L-amino-acid Ramachandran plot. Result: every α-helix in every natural protein on Earth is right-handed, with rare local exceptions at glycine residues (Gly is achiral and tolerates either handedness).
The same homochirality argument generalizes beyond proteins. DNA uses D-deoxyribose; the canonical B-DNA double helix is right-handed (Z-DNA, a left-handed form, is a rare local conformation in special sequences). RNA uses D-ribose; the A-form helix is right-handed. Collagen is a striking edge case: each polyproline II chain is itself left-handed, but three such chains coil together into a right-handed triple super-helix. The chain-level handedness is still dictated by L-amino acid stereochemistry.
The deeper question — why life is homochiral in the first place — remains unsettled. Candidate explanations include parity violation in weak-force interactions producing tiny enantiomeric excess, asymmetric photolysis by circularly polarized light from neutron stars, and stochastic symmetry-breaking amplified by autocatalytic networks. Whatever the original cause, once L-amino acids and D-sugars were locked in by early biology, all derived helical structures inherited the corresponding handedness.
So molecular helices are right-handed in life because biological building blocks are homochiral, and that homochirality determines which Ramachandran (or analogous geometric) region is accessible.
2.2 D-amino acids and mirror-image proteins
If you build a polypeptide entirely from D-amino acids instead of L, the Ramachandran plot mirrors. The “L-allowed” lower-left region maps to the upper-right; helices wind in the opposite sense — left-handed.
The defining structural features of the α-helix are otherwise preserved: the i → i+4 hydrogen-bond pattern, the 3.6-residue periodicity, and the 5.4 Å pitch all hold. Only the handedness flips. Stephen Kent’s group (U. Chicago) chemically synthesized full-length D-VEGF — the D-amino-acid mirror of vascular endothelial growth factor — and crystallized it in the expected mirror-image structure with left-handed helices. Some natural antimicrobial peptides containing D-residues also exhibit local left-handed helical character.
A racemic polypeptide (mixed L and D residues) typically forms neither helix — consistent stereochemistry across consecutive residues is required to satisfy the Ramachandran constraints. For an accessible primer on racemic and mirror-image polypeptides, see Mirror Image Proteins (Mandal & Kent, 2017) and the Wikipedia article on Racemic crystallography.
A design-relevant insight from this: D-amino-acid peptides are proteolysis-resistant because natural proteases evolved to cleave L-peptides. Mirror-image therapeutic peptides are an active research area, particularly for oral or systemic delivery where stability is rate-limiting.
3. β-sheets, aggregation, and amyloid
A β-strand is an extended near-zigzag backbone with (φ, ψ) ≈ (−120°, +120°). Strands H-bond laterally to each other to form a β-sheet — parallel (strands run N→C in the same direction; H-bonds slant) or antiparallel (strands run opposite; H-bonds are roughly perpendicular and shorter, making antiparallel sheets slightly more stable). Side chains alternate above and below the sheet plane. Amphipathic β-strands (alternating hydrophobic-hydrophilic patterning at i, i+2) drive sheet formation in aqueous environments.
Two structural facts about β-sheets matter for everything that follows: edges are unsatisfied and faces can be hydrophobic. Both feed the aggregation thermodynamics in §3.1.
3.1 Why β-sheets aggregate
β-sheets aggregate because they are structurally incomplete on their edges. Each strand carries unpaired backbone hydrogen-bond donors (N–H) and acceptors (C=O), so a single β-sheet exposes a row of “lonely” hydrogen bonds along its edge. Adding a second β-sheet alongside lets those bonds pair up — the system’s energy drops and the configuration becomes more stable.
The driving force has two main components. Satisfying the unpaired hydrogen bonds (enthalpic): each new H-bond between sheets releases ~3–5 kcal/mol of heat. The hydrophobic effect (entropic): β-sheet faces often present alternating hydrophobic side chains; burying these against another sheet’s hydrophobic face releases ordered water molecules from around them, raising solvent entropy. This is the classical hydrophobic effect (Kauzmann 1959, Tanford 1962) — counter-intuitively, it’s entropy, not enthalpy, that drives most hydrophobic interactions in water.
There is one unfavorable contribution: the protein chain loses conformational entropy when it locks into a rigid stacked structure. This cost is outweighed by the gains above, but it’s why aggregation is concentration- and time-dependent rather than instantaneous.
Together, these forces drive sequences toward the cross-β amyloid fold — stacked β-sheets with strands oriented perpendicular to the fiber axis, side chains interdigitated between sheets, and all backbone hydrogen bonds satisfied along the fiber length. For many proteins, cross-β is the deepest free-energy minimum on the folding landscape, deeper even than the native fold. Once a sequence enters this minimum, escape requires aggressive denaturation — this is why amyloid is essentially irreversible under physiological conditions.
This also explains why amyloid-forming proteins span enormous sequence space: aggregation is driven by backbone properties (always present) plus generic hydrophobic contributions, not specific side-chain chemistry. Any sufficiently long, marginally soluble polypeptide can in principle form cross-β, given time and concentration. Knowles, Vendruscolo & Dobson (2014) is the canonical review.
3.2 Amyloid disease, propagation, and materials
Why amyloid diseases form β-sheets
Disease-associated amyloidogenic proteins reflect the genericness of the cross-β attractor: Aβ in Alzheimer’s, α-synuclein in Parkinson’s, prion protein PrP in CJD/BSE, islet amyloid polypeptide in type 2 diabetes, transthyretin in cardiac amyloidosis, tau in tauopathies, huntingtin in Huntington’s. Different sequences, different organs, same cross-β architecture.
Pathology has two drivers. Toxic oligomeric intermediates — small soluble oligomers on the aggregation pathway, not the mature fibers — are the primary cytotoxic species. They permeabilize membranes, disrupt mitochondria, and trigger inflammation. Tissue mechanical damage comes from accumulated mature fibers, producing the histological plaques.
Seeding and templated nucleation
The progressive nature of amyloid disease is explained by seeding. Aggregation is bottlenecked by formation of the first cross-β nucleus; once seeded, monomers dock onto existing fibril edges and get templated into the same geometry. Aggregation becomes autocatalytic:
Prion infectivity (Prusiner 1997 Nobel) — PrPSc converts native PrPC by template-directed refolding. A protein alone is the infectious agent.
Cell-to-cell spread of α-synuclein, tau, and Aβ — pre-formed fibrils released from one neuron seed misfolding in the next, explaining anatomical-pathway progression over years.
Secondary nucleation (Knowles 2009, Cohen 2013) — fibrils catalyze new nucleation on their surfaces, driving explosive late-stage growth.
Aging accelerates the cycle by weakening protein quality-control machinery (chaperones, proteasome, autophagy), letting rare nucleation events escape.
Therapeutic strategies — capping the chain
The same edge-pairing logic that drives aggregation suggests how to stop it. Five general approaches:
N-methylated capper peptides. A short β-strand peptide docks onto the fibril edge but has methylated backbone N–Hs on its outward face — no donors for the next incoming monomer. Chain terminates. Demonstrated for Aβ, IAPP, and tau (Soto, Gazit, Eisenberg groups).
Native-state stabilizers. Keep monomers out of the aggregation pool. Tafamidis (FDA-approved) stabilizes the transthyretin tetramer so it never dissociates into amyloidogenic monomers — the most successful clinical anti-amyloid strategy to date.
Anti-oligomer antibodies target toxic intermediates. Aducanumab and lecanemab in Alzheimer’s; modest, contested efficacy but the mechanism is right.
Anti-fibril-surface chaperones block secondary nucleation. The Brichos domain coats Aβ fibril surfaces; endogenous Hsp104 (yeast) and Hsp70/40 (mammals) disaggregate via ATP.
Small-molecule disaggregators like EGCG, curcumin, and resveratrol bind hydrophobic stacking surfaces and remodel amyloid in vitro; clinical translation has been limited by bioavailability.
Amyloid as engineering material
The properties that make amyloid medically dangerous also make it materially valuable: a Young’s modulus of ~10 GPa (silk-comparable), tensile strength rivaling steel by mass, resistance to proteolysis / heat / solvents, and spontaneous self-assembly from solution. Amyloid is an active engineering platform.
Natural functional amyloids show biology already exploits the architecture: curli fibers in E. coli biofilms, Pmel17 organizing melanin in melanosomes, fungal HET-s prion-like signaling, hormone storage in pituitary granules. Engineered applications include drug-delivery scaffolds, biosensors, conductive nanowires (cytochrome-c-amyloid hybrids, MIT Lu lab), and 3D-printable hydrogels. Recombinant silk proteins (e.g., AMSilk, Spiber) use the closely related β-sheet–crystallite architecture of natural silks, which is structurally distinct from canonical cross-β amyloid but exploits the same edge-pairing thermodynamics.
The elegant corollary: the same capping logic that prevents disease in vivo lets materials engineers tune fiber length and end-group chemistry in vitro. Cappers become a design parameter. The disease-relevant stability of cross-β is exactly the property that makes it useful — you just need to control where the chain stops.
4. Designing β-sheet motifs
A “well-ordered” β-sheet motif folds into a single intended sheet without unfolding or recruiting another sheet to aggregate. Four design rules govern the geometry:
Alternating hydrophobic-hydrophilic patterning at i, i+2 along each strand. The hydrophobic face must be buried (against another strand or core) or capped to prevent recruiting another sheet.
Tight turns. β-hairpins (two antiparallel strands) require Type I’ or II’ turns — the only turn types that fit antiparallel topology without backbone strain. Asn-Gly is the most common natural Type I’ turn nucleator; D-Pro-Gly is the strongest synthetic Type II’ turn (Stanger & Gellman 1998).
Edge capping. Place charged or aromatic residues on strand edges to disrupt lateral hydrogen bonding to other sheets — the single most important rule for preventing aggregation.
Length and topology. β-hairpins (~12–16 residues) are the smallest stable motif. Larger antiparallel β-meanders (Greek-key, β-barrel) give thermostability.
Sidebar: β-turn types
A β-turn is a 4-residue motif that reverses the direction of the polypeptide chain — every β-hairpin is built on one. Turns are classified by the (φ, ψ) torsions of the two central residues (positions i+1 and i+2 of the four-residue stretch).
Type
φ(i+1)
ψ(i+1)
φ(i+2)
ψ(i+2)
Required residues
I
−60°
−30°
−90°
0°
flexible
II
−60°
+120°
+80°
0°
Gly at i+2
I’
+60°
+30°
+90°
0°
Gly at i+1
II’
+60°
−120°
−80°
0°
Gly or D-Pro at i+1
The “prime” types are mirror images of the unprimed — same angles with signs flipped. Antiparallel β-hairpins force the chain to U-turn with specific handedness, so Type I’ and II’ turns are the natural fit; Types I and II produce slightly strained hairpin geometry. Both prime types require a positive φ at i+1 — sterically forbidden for L-amino acids (the same Ramachandran exclusion that forbids the left-handed α-helix in §2.1). Only glycine (achiral) tolerates positive φ in natural sequence, which is why every Type I’ turn has Gly at i+1. Type II’ turns have Gly orD-Pro at i+1; D-Pro forces a precise positive-φ geometry that even Gly can’t always nail. That’s why the strongest synthetic β-hairpin nucleator is D-Pro-Gly (Stanger & Gellman 1998).
Worked design — a stable β-hairpin
Starting scaffold: the GB1 hairpin (residues 41–56 of Streptococcus protein G B1 domain), GEWTYDDATKTFTVTE. This 16-residue peptide folds reversibly in water with two antiparallel β-strands joined by a Type I’ turn (D-D-A-T motif).
Designed variant:KEWTYDDATKVFTVTE
Replaces the N-terminal Gly with Lys to give the strand edge a positive charge, suppressing lateral aggregation.
Retains the GB1 Type I’ turn, a known good nucleator.
Strands carry alternating hydrophobic (W, Y, F, V) / polar (T, K, E) residues, with aromatics (W, Y, F) stacking diagonally across the hairpin to provide hydrophobic-core packing and a built-in fluorescence reporter.
Maximum-stability variant (requires chemical synthesis): replace the Type I’ turn with a D-Pro-Gly Type II’ turn, e.g., KEWTY-(D-Pro)-G-AT-KVFTVTE. The D-Pro at i+1 of the turn adopts the positive-φ geometry that is energetically optimal for antiparallel hairpin nucleation. D-amino-acid incorporation requires either solid-phase peptide synthesis or an orthogonal aaRS-tRNA pair (the design toolkit from §1.5).
Verification. Round-trip designability check: ESMFold the sequence; align to the GB1 parent backbone (PDB 1PGB residues 41–56); aim for Cα RMSD < 1.5 Å and mean pLDDT > 80. ProteinMPNN inverse-fold the validated backbone to confirm the design lies in the high-likelihood region of the model. Caveat: both ESMFold and AlphaFold are less reliable on short stand-alone peptides than on full domains, so for a 16-residue hairpin the computational round-trip is a sanity check, not a confirmation. Wet-lab validation is what closes the loop: CD spectroscopy (β-sheet signature at 218 nm), NMR ¹H-¹⁵N HSQC chemical-shift dispersion, and thermal melt for two-state cooperativity.
For a slightly larger and more thermostable motif, a four-strand antiparallel β-meander based on a Greek-key topology (~40 residues) is a good starting point. The standard test scaffold is ubiquitin’s β-grasp fold (PDB 1UBQ).
Part B — pick a protein, sequence and structural analysis: UniProt BLAST, RCSB structure quality, SCOP classification, PyMOL visualization (cartoon, ribbon, secondary structure / residue type coloring, surface pockets).
Part C — modern protein-ML stack applied to the chosen target: ESM-2 unsupervised deep mutational scan; latent-space embedding; ESMFold structure prediction and mutational resilience; ProteinMPNN inverse folding with round-trip designability check.
Part D — Bacteriophage MS2 L-protein engineering proposal: stability and auto-folding optimization using PLM-based in silico mutagenesis, AlphaFold-Multimer for L–DnaJ complex validation, and Rosetta/FoldX ddG scoring on top candidates.
Part 1 — DNA Assembly: PCR, Gibson, Golden Gate, and transformation
A topic guide on the molecular-biology toolkit that underpins all of synthetic biology: amplifying DNA (PCR), cutting it (restriction enzymes), joining it (Gibson and Golden Gate), and getting it into cells (transformation). Written as a stand-alone primer rather than a homework Q&A. Part 2 (Asimov Kernel: building genetic circuits computationally) will follow as a separate page once the simulation work is complete.
Course: HTGAA Spring 2026
Lecture (Tues, Mar 10, 2026): Doug Densmore & Traci Haddock — Genetic Circuits Part I: Assembly TechnologiesRecitation (Wed, Mar 11): Eyal Perry & Ronan Donovan — PCR, Gibson AssemblyAuthor: Fiona (Committed Listener BioPunk)
At a glance
Read time
~25 min
Audience
Anyone wanting a primer (pun-intended) on the molecular-biology toolkit of synthetic biology
What you’ll learn DNA amplification (PCR), cut it (restriction enzymes), join it (Gibson and Golden Gate), and get it into bacterial cells (transformation) — with verified protocols and the failure checks
Format
Part 1- Theorectical overview , Part 2 — genetic circuits and Asimov Kernel build example
Why this matters
Synthetic biology runs on the ability to compose DNA parts — promoters, RBSs, CDSs, terminators — into functional circuits. Until the late 2000s, multi-fragment assembly was painful: restriction enzymes left scars, ligation efficiencies were mediocre, and any non-trivial circuit took weeks of iterative cloning. Two innovations changed the field. Gibson Assembly (Gibson 2009) collapsed multi-fragment ligation into a single isothermal one-pot reaction with seamless joints. Golden Gate Assembly (Engler 2008) used Type IIS enzymes for scarless, hierarchical assembly with directional control over fragment order. Together, they unlocked the modular abstractions that earlier standards (BioBricks) only aspired to.
Computational design caught up. Benchling and platforms like Asimov Kernel let you simulate construct behavior — predicted protein output, dynamics, signal-to-noise — before ordering a single oligo. Densmore’s group (Boston University) has been particularly central to this design-automation thread.
This page is the bench-and-protocol foundation for everything later in HTGAA. Every later week — protein engineering, cell-free systems, genome engineering — assumes you can assemble defined DNA constructs and get them into cells. Master this, and the rest of the course is design problems.
1. PCR — making DNA in vitro
PCR (Kary Mullis, 1983; commercialized 1985) turns a single template DNA molecule into ~10⁹ copies in a couple of hours. It is the workhorse of every cloning, sequencing, diagnostic, and forensic lab.
1.1 The three-step cycle
Each PCR cycle has three temperature stages, repeated 25–35 times:
flowchart LR
A[Template + Primers<br/>+ Phusion + dNTPs] --> B[Denaturation<br/>~95 °C · 10–30 s]
B --> C[Annealing<br/>~50–65 °C · 15–30 s]
C --> D[Extension<br/>~72 °C · 15–30 s/kb]
D -->|repeat 25–35×| B
D --> E[~10⁹ copies<br/>of target]
Denaturation (~95 °C, 10–30 s) — double-stranded DNA melts apart into single strands.
Annealing (~50–65 °C, 15–30 s) — primers (short ~20-nt synthetic oligos that flank your target) bind to the now-single-stranded template at their complementary sequences.
Extension (~72 °C, 15–30 s/kb for Phusion) — the polymerase copies the template starting from each primer’s 3′ end, producing new dsDNA.
After the first few cycles, every newly-synthesized strand is itself a template for the next cycle. The amount of DNA between the two primers doubles per cycle — exponential amplification. After 30 cycles you have ~10⁹× the starting amount, enough to clone, sequence, or visualize on a gel.
1.2 Phusion HF — the modern default
For cloning-grade work, Phusion High-Fidelity DNA Polymerase (Thermo Fisher) is the standard choice. It is a proofreading polymerase fused to a non-specific DNA-binding domain that boosts processivity. The error rate is 4.4 × 10⁻⁷ errors per base — roughly 50× lower than Taq. For a 5 kb construct over 30 cycles, that is ~1 in 50 clones with a polymerase-introduced error. Manageable.
The Phusion HF Master Mix is sold pre-formulated and combines four components:
Component
Purpose
Phusion polymerase
Proofreading DNA polymerase fused to a non-specific DNA-binding domain for high processivity. ~50× lower error than Taq.
dNTPs (dATP, dTTP, dCTP, dGTP)
Building blocks for the new DNA strand.
HF reaction buffer
Tris-HCl (~pH 8.8) for buffering, K⁺ for ionic environment, Mg²⁺ as the catalytic cofactor for the polymerase.
Tracking dye
Density agent + visible dye (in the Phusion Green Master Mix variant only) for direct gel loading without a separate loading buffer. The standard HF Master Mix is colorless.
A separate GC buffer is sold for high-GC templates that don’t amplify well in HF buffer — the two buffers differ in additive composition tuned for fidelity (HF) versus yield on tough templates (GC).
1.3 Annealing temperature — what sets it
Primer annealing temperature is set by primer Tm (melting temperature), with three sequence-intrinsic factors and two solution-condition factors:
Sequence-intrinsic (set Tm):
Primer length — longer primers melt at higher T. Standard range 18–25 nt.
GC content — G–C base pairs (3 H-bonds) are stronger than A–T (2 H-bonds). 50% GC is the design sweet spot; > 65% gets hard to manage.
Sequence-specific effects — nearest-neighbor thermodynamics, runs of GC, and secondary structure (primer hairpins lower effective Tm).
Solution conditions (modify effective Tm):
Mg²⁺ concentration — higher Mg²⁺ stabilizes duplex formation, raising effective Tm. Standard 1.5–2 mM.
Monovalent salt (K⁺, Na⁺) — also stabilizes duplexes; standard ~50 mM K⁺ in Phusion buffer.
Practical rule: anneal 3–5 °C below the Tm of the lower-Tm primer in the pair. If your primers are Tm 60 °C and 65 °C, anneal at ~57 °C. For Phusion specifically, with sufficiently long primers (Tm > 72 °C), you can drop to a 2-step protocol with annealing/extension combined at 72 °C.
Tm calculators (Primer3, NEB Tm Calculator, Benchling primer tool) compute these automatically and should be used for any non-trivial PCR design.
1.4 Failure modes — including primer dimers
PCR fails in characteristic ways. The four to know:
Smeared product on gel. Non-specific amplification — usually fixed by raising annealing T, shortening extension time, or reducing template input.
No-template control (NTC) shows a band. Reagent contamination — discard primers/water/Master Mix and start fresh.
No product at all. Primer–template mismatch, dead polymerase, or template not present at adequate copy number.
Primer dimers. Two primer molecules anneal to each other (instead of to the template) and the polymerase extends across the junction. Even 3–5 bp of complementarity at the 3′ ends is enough. The result is a short product (30–80 bp) that gets amplified exponentially alongside (or instead of) your real target. Once dimers start forming, they hog dNTPs, primer, and polymerase. They show up as a low-MW band/smear on agarose gels (best seen at 2–3% agarose). A positive NTC lane is the classic giveaway. For downstream Gibson cloning, dimers are particularly damaging — they can mis-prime onto your fragments or compete for reaction components.
Mitigations for primer dimers, in priority order:
Primer design. Use Primer3, NEBuilder, or Benchling’s primer tool to flag (a) 3′-end complementarity between forward/reverse primers, (b) self-complementarity within each primer.
Hot-start polymerase. Phusion is sold in a Hot Start variant where the polymerase is held inactive (antibody- or aptamer-blocked) until the first denaturation. Single highest-leverage mitigation.
Lowest workable primer concentration (0.2–0.5 μM is plenty for most applications).
Raise the annealing temperature. Favors specific primer–template binding over primer–primer (which has lower Tm).
Touchdown PCR. Start at a high annealing T and step down incrementally — biases early cycles toward specific product before dimers can take over.
Key takeaway — PCR. Phusion HF Master Mix + carefully-designed primers + the right annealing temperature gets you 95% of the way. The remaining 5% is hot-start polymerase to suppress primer dimers. If you can read a gel, you can debug a PCR.
2. Restriction enzymes — cutting DNA at defined positions
Restriction enzymes are the molecular scissors of synthetic biology. Discovered in the 1960s as bacterial defense against phage infection (Arber, Smith, Nathans — Nobel 1978), they cut DNA at specific recognition sequences. The defining trick: bacteria methylate their own DNA, protecting it from their own restriction enzymes; foreign (phage) DNA is unmethylated and gets cut. Synthetic biology has repurposed thousands of these enzymes as cloning tools.
2.1 Type II restriction enzymes
The familiar workhorses (EcoRI, BamHI, HindIII, NotI) recognize palindromic sequences and cut inside the recognition site. EcoRI’s site is GAATTC; it cuts between G and A on both strands, leaving a 4-nt 5′ overhang of AATT:
5'...G AATTC...3'
3'...CTTAA G...5'
Two products with matching “sticky” ends. Anneal with another fragment that has the same overhang and seal the nicks with T4 DNA ligase — the classical cloning workflow from the 1980s.
Practical issues to know: sticky-end self-ligation (fix with phosphatase treatment of the linearized vector); limited overhang vocabulary (each enzyme has one specific overhang); star activity (relaxed-specificity cutting at high glycerol or low salt — use HF variants); and methylation sensitivity (Dam/Dcm-blocked sites need a Dam⁻/Dcm⁻ host like JM110 or GM2163).
2.2 Type IIS restriction enzymes
The clever ones. Type IIS enzymes (BsaI, BsmBI/Esp3I, SapI, BbsI) recognize an asymmetric site but cut outside that site at a defined offset, leaving 4-nt overhangs of arbitrary sequence. BsaI’s recognition site is GGTCTC(N₁) with cleavage at N₁+4 on the top strand and N₁+8 on the bottom:
5'...GGTCTCN^NNNN....3'
3'....CCAGAGNNNNN^....5'
The recognition site is removed in the cut product. The 4 N’s of the overhang can be designed to be anything. Two consequences: custom overhangs (you specify which fragments anneal to which neighbors) and irreversible one-pot reactions (re-ligated parental plasmid still carries the site and gets re-cut, while assembled product is stable). This is what underwrites Golden Gate Assembly (§4).
2.3 PCR vs restriction digest in practice (worked answer)
Both methods produce linear dsDNA fragments, but they differ at almost every protocol step and serve different use cases:
Feature
PCR
Restriction enzyme digest
Starting material
Any template (plasmid, genomic DNA, cDNA — even a few molecules)
Existing dsDNA construct, typically a plasmid
Defines fragment by
Primer position — anywhere on the template
Recognition site position — fixed by the enzyme
Product yield
Exponential amplification (1 → ~10⁹ copies)
Stoichiometric (1 → 1 per input molecule)
End chemistry
Blunt by default; engineered 5′ overhangs added via primer extensions
The enzyme’s specific overhang (sticky or blunt)
Mutation introduction
Trivial — encode in the primer
Requires installing a new restriction site first
Sequence fidelity
Polymerase error rate (~4.4 × 10⁻⁷ per base for Phusion)
Faithful — original sequence preserved
Time
~1–2 hours per cycling run
~30 min – 2 hours per digest
Common failure modes
Primer dimers, no product, smear
Star activity, methylation blocking, incomplete digestion
When PCR is preferable: building a fragment with arbitrary 5′ extensions for downstream Gibson or Golden Gate; introducing point mutations or insertions/deletions via primer design; amplifying from low-copy templates (genomic DNA, cDNA, environmental samples); generating multiple variants in parallel from a common template.
When restriction digest is preferable: recovering an intact fragment from an existing plasmid where you trust the sequence and can’t risk polymerase-introduced errors; subcloning very large fragments (> 10 kb) that PCR amplifies poorly; linearizing a vector for Gibson or ligation when you already have suitable restriction sites in place; preserving methylation status or other DNA modifications.
The hybrid in modern practice. Most cloning workflows combine both: PCR-amplify the insert with engineered overlaps; restriction-digest the backbone. PCR gives you any fragment you want with arbitrary ends; restriction gives you a clean, mutation-free backbone in high yield. For Gibson assembly specifically, PCR-for-inserts + RE-digested-backbone is the standard pattern.
Key takeaway — restriction enzymes. Type II enzymes are the classical cutters with palindromic sites and fixed overhangs. Type IIS enzymes (BsaI, BsmBI) cut outside their recognition sites, giving you arbitrary 4-nt overhangs — the foundation of Golden Gate Assembly.
3. Gibson Assembly
In 2009, Daniel Gibson at the J. Craig Venter Institute published a one-pot, isothermal DNA assembly method that made multi-fragment cloning routine. The motivation was JCVI’s effort to chemically synthesize the Mycoplasma mycoides genome (~1 Mb) — they needed efficient assembly of many overlapping fragments. The same chemistry now underpins most modern molecular cloning.
3.1 The three-enzyme cocktail
A Gibson reaction contains three enzymes operating simultaneously at 50 °C for 15–60 minutes:
flowchart LR
A[Fragments with<br/>15–40 bp overlaps] --> B[T5 exonuclease<br/>chews back 5′ ends]
B --> C[3′ ssDNA overhangs<br/>anneal]
C --> D[Phusion<br/>fills in gaps]
D --> E[Taq ligase<br/>seals nicks]
E --> F[Seamless<br/>assembled product]
T5 exonuclease — chews back DNA from 5′ ends, exposing single-stranded 3′ overhangs on each fragment.
Phusion polymerase — fills in any gaps after complementary overhangs anneal.
Taq DNA ligase — seals the remaining nicks at the joint.
T5 exonuclease is heat-labile, so its activity decreases over the reaction time, allowing the polymerase and ligase to take over and finalize the joints. The result is seamless joints — no scar, no leftover restriction sites, just continuous dsDNA.
Critical reagent caveat. T5 exonuclease activity drops sharply with freeze-thaw. Use a fresh aliquot of Gibson Master Mix within ~24 h of thawing, or pre-aliquot small volumes to single-use tubes. This is the most common silent cause of Gibson failure.
3.2 How it works mechanistically
Each fragment is designed to share 15–40 bp of homology with its neighbors. T5 exo chews the 5′ ends back, exposing complementary 3′ ssDNA tails. The tails anneal. Phusion fills in any gaps. Taq ligase seals the nicks.
Overlap design rules: 15–25 bp is standard for ~50% GC; 30–40 bp for high-GC overlaps that need more thermodynamic strength. Avoid hairpins or repeats in the overlap sequence — they mis-anneal. Match Tm of adjacent overlaps to ~60 °C for clean assembly. Tools like NEBuilder Assembly Tool and Benchling’s Gibson primer designer automate this.
3.3 Why Gibson is the modern default
Multi-fragment, one-pot. Up to 5–6 fragments routinely; up to 15+ with optimization.
No scars. Seamless joints mean any sequence can be the joint.
No restriction-site constraints. You don’t need to find unique cutters in the right places.
Predictable. With overlaps designed properly, reactions are high-efficiency.
3.4 Ensuring Gibson-readiness — a pre-flight checklist (worked answer)
Gibson is unforgiving of upstream sloppiness. The single most useful artifact is a checklist organized by stage.
Sequence design
Overlaps are 15–40 bp between adjacent fragments (15–25 bp for ~50% GC; 30–40 bp for high-GC).
Overlap Tm ~60 °C, matched across all junctions.
No hairpins or repeats within the overlap sequence — verify with NEBuilder, Benchling, or Primer3.
No unintended internal homology between non-adjacent fragments — would cause scrambled assemblies.
Designed in an assembly-aware tool (NEBuilder Assembly Tool, Benchling Gibson designer).
PCR product quality
Single clean band on a 1% agarose gel — no smear, no dimers, no extra bands.
Negative no-template control (NTC) — no contamination.
DpnI-digested if amplified from a circular plasmid template (37 °C, 1 h, then heat-inactivate at 80 °C). DpnI cuts only methylated G(m⁶)ATC, destroying parental plasmid carryover.
Gel-purified or column-purified to remove primer dimers, unincorporated primers, and dNTPs.
Sanger-sequenced if the fragment is novel or has been mutated, to confirm no polymerase errors before assembly.
Backbone digest quality (for the linearized vector)
Complete digestion — no residual circular plasmid (run a small aliquot on gel; uncut shows as supercoiled/nicked bands, fully cut as a single linear band).
Phosphatase-treated if using a single restriction site (CIP, rSAP, or Antarctic phosphatase) to prevent vector self-ligation.
Gel-purified to remove any excised fragment and residual undigested vector.
Quantity and stoichiometry
Quantified by Nanodrop or Qubit (Qubit is more accurate at low concentration).
Fragments at equimolar ratio in the assembly reaction. Compute molarity from concentration ÷ fragment length × Avogadro — don’t go by nanogram amounts alone.
Total DNA 50–100 ng per 20 µL Gibson reaction. More usually doesn’t help; less risks insufficient encounter rate.
Reaction setup
Fresh Gibson Master Mix — T5 exonuclease activity drops with freeze-thaw. Use within 24 h of thawing, or aliquot.
Reaction at 50 °C, 15–60 min depending on number of fragments (longer for more complex assemblies).
Transform a small aliquot (1–2 µL into 50 µL competent cells) — too much DNA can saturate cells or carry over inhibitors.
Sanity checks before transformation
Run 2–3 µL of the assembled reaction on a gel — for simple 2-fragment assemblies you should see a band at the expected combined size.
Plate a no-DNA negative control transformation alongside the experimental, to detect competent-cell contamination.
Key takeaway — Gibson. Three enzymes, one tube, 50 °C, 15–60 min. The chemistry is robust; the failures are upstream — bad primers, undigested parental plasmid, or unequal molar ratios. The §3.4 checklist catches all of them.
4. Golden Gate Assembly
In 2008, Carola Engler and Sylvestre Marillonnet at the Institute of Plant Biochemistry (Halle, Germany) published Golden Gate Assembly — a one-pot, scarless cloning method that exploits Type IIS restriction enzymes for fragment-order control. It’s the most-used alternative to Gibson today, particularly for plant and modular synthetic biology, where the MoClo hierarchical extension dominates.
4.1 The mechanism
A Golden Gate reaction contains, in one tube, at one temperature (37 °C, ~1–2 h):
Type IIS enzyme (typically BsaI, BsmBI/Esp3I, or SapI).
T4 DNA ligase + ATP.
Your fragments, each flanked by Type IIS recognition sites that point inward.
The enzyme cuts each fragment at its designed offset, removing the recognition site and exposing a 4-nt sticky end that you specified at the design stage. The ligase joins fragments whose overhangs are complementary. Both reactions run continuously and in parallel.
4.2 The reaction-loop trick
Because the recognition site is removed in the cut product, the assembled fragment cannot be re-cut. But any re-ligated parental plasmid still carries the recognition site, so the enzyme cuts it again. The reaction therefore drives toward the assembled product irreversibly. Running the reaction longer increases yield rather than damaging it — a feature distinctive to this assembly method.
flowchart LR
A[Parental plasmid<br/>+ Insert + BsaI + Ligase] --> B[BsaI cuts<br/>removes recognition site]
B --> C[4-nt overhangs<br/>exposed]
C --> D{Ligation}
D -->|matching overhangs| E[Assembled product<br/>no BsaI site → stable]
D -->|parental re-circularizes| F[Still has BsaI site<br/>→ re-cut → recycled]
F --> B
4.3 Overhang design
The 4-nt overhang is whatever you specify in the primer or synthesized fragment. Theoretically there are 4⁴ = 256 distinct 4-nt sequences. In practice, you want overhangs with high Hamming distance (≥ 2 differences between any two overhangs in the assembly) to prevent mis-pairing. Standard MoClo libraries provide curated overhang sets — typically 12–20 orthogonal overhangs that have been wet-lab-validated to assemble cleanly together.
4.4 Hierarchical assembly — MoClo and Loop
The same chemistry scales with alternating enzymes. MoClo (Weber et al. 2011) and Loop assembly (Pollak et al. 2019) cycle BsaI and BsmBI/Esp3I:
Level 0: parts (promoters, CDSs, terminators) flanked by BsaI sites in BsmBI-flanked vectors.
Level 1: BsaI assembles parts into transcription units; the new TU is flanked by BsmBI sites in a BsaI-flanked vector.
Level 2: BsmBI assembles TUs into multi-gene constructs.
Multi-gene plasmids of 5–10 transcription units assemble in a few cycles of these reactions — orders of magnitude faster than equivalent Gibson assemblies at the same complexity.
4.5 Worked alternative-method writeup (Q6)
The Week 6 homework asks for a 5–7 sentence description of an alternative assembly method, plus a schematic and a Benchling/Asimov Kernel model. Golden Gate is the natural choice. Here’s the writeup:
Golden Gate Assembly is a one-pot DNA cloning method developed by Engler and Marillonnet (2008) that uses Type IIS restriction enzymes to assemble multiple fragments scarlessly in a single reaction.
Type IIS enzymes (BsaI, BsmBI/Esp3I, SapI) recognize an asymmetric site and cut outside that site at a defined offset, producing 4-nt sticky overhangs whose sequences the user designs in the fragment ends.
The reaction tube contains the Type IIS enzyme, T4 DNA ligase, ATP, and the fragments — all at 37 °C — so cutting and ligation occur simultaneously, with the recognition site removed from each cut product.
Because each fragment’s overhang is custom-specified, the user controls the order and orientation of assembly: only fragments with matching overhangs anneal, and unwanted combinations are physically forbidden.
The reaction is irreversible because re-ligated parental plasmid still carries the recognition site and gets re-cut by the enzyme, while the assembled product no longer contains the site and is stable.
Compared to Gibson, Golden Gate excels at multi-fragment (10+) directional assembly, hierarchical schemes (MoClo, Loop), and standardized parts libraries; the main constraint is domestication — removing internal copies of the Type IIS recognition sequence from each fragment before assembly.
Schematic
BEFORE: Three fragments, each flanked by BsaI sites pointing inward
toward the fragment body.
[BsaI →NNNN — Fragment A — NNNN← BsaI]
[BsaI →NNNN — Fragment B — NNNN← BsaI]
[BsaI →NNNN — Fragment C — NNNN← BsaI]
↓ BsaI cuts; 4-nt sticky overhangs exposed
↓ designed so A's right overhang = B's left overhang, etc.
Fragment A — Fragment B — Fragment C (ligated, no recognition site)
↑
T4 ligase
Re-ligated parental still has BsaI site → re-cut → recycled
Assembled product has no BsaI site → stable
Benchling / Asimov Kernel model
In Benchling: open the Assembly Wizard → choose “Golden Gate” or “Type IIS” assembly mode. Add 2–3 fragments with BsaI flanking sequences and custom 4-nt overhangs designed for orthogonality (e.g., overhangs from the MoClo standard set: GGAG, AATG, GCTT, CGCT). Run the assembly simulation; verify the predicted product matches the intended sequence. Export the rendered glyph and the assembly trace for inclusion in the Notebook entry.
In Asimov Kernel: open a new Construct in your Repository. Drag in three Type IIS-flanked parts from the Characterized Bacterial Parts repo (or any compatible set). Use the assembly preview to verify directional ligation and confirm no internal BsaI sites in the parts. Save the Construct + the assembly graph to the Notebook. The simulated product should be a single, scarless plasmid with the three parts joined in the order specified by the overhangs. Mis-pairing or internal-site issues will show up as failed-assembly warnings in either tool.
4.6 When Golden Gate beats Gibson (and vice versa)
Golden Gate / MoClo
Gibson
Multi-fragment scaling
Excellent (10+ fragments routine)
Good up to 5–6, harder beyond
Scarless
Yes (recognition site removed)
Yes (no scar at all)
Sequence constraints
Internal Type IIS sites must be removed (“domestication”)
None
Library compatibility
Strong (MoClo, Loop, OpenMTA)
Weaker
Best for
Standardized parts assembly, multi-gene synbio
Ad-hoc cloning, novel sequences
Key takeaway — Golden Gate. Type IIS enzymes give you arbitrary 4-nt sticky overhangs and a self-driving assembly reaction. Excellent for ≥10-fragment, multi-level, library-friendly synthetic biology. The price is upfront effort: domesticate your parts first.
5. E. coli transformation — getting DNA into cells
After you’ve assembled your construct, you need to get it into a cell where it can replicate, express, and be selected. E. coli is the workhorse host because it grows fast, takes up DNA readily (with help), and has a century of genetic toolkit behind it.
The challenge: E. coli has a complex envelope — outer membrane with lipopolysaccharide (LPS), periplasm, peptidoglycan cell wall, and cytoplasmic membrane. Plasmid DNA is ~10 kbp of negatively-charged duplex; it doesn’t cross any of those layers passively. Two methods overcome this barrier.
5.1 Chemical transformation (CaCl₂ + heat shock)
Cells are pre-treated with cold CaCl₂. Ca²⁺ ions neutralize electrostatic repulsion between the phosphate backbone of the plasmid and the negatively-charged LPS, allowing DNA to associate with the cell surface; the cold rigidifies the lipid bilayer. A brief heat shock at 42 °C for 30 seconds is then thought to perturb the bilayer enough to create transient pores through which DNA enters. Returning to ice re-seals the membrane. The precise molecular details of the heat-shock step remain incompletely characterized — proposed contributions include LPS rearrangement, periplasmic Ca²⁺–DNA complex formation, and active uptake via outer-membrane porins (OmpC, OmpF).
A microsecond high-voltage pulse (1.8–2.5 kV across a 1–2 mm gap) polarizes the membrane beyond its dielectric breakdown threshold, creating transient electropores. DNA migrates electrophoretically through these pores during and immediately after the pulse. Once the field is removed, the bilayer re-seals on a millisecond timescale.
Time-critical step. Add recovery medium (SOC) to the cuvette immediately after the pulse. A one-minute delay can cause a ~3× drop in transformation efficiency (per NEB). Have the SOC pre-warmed and within arm’s reach before pulsing.
5.3 After uptake — recovery and selection
The cell needs ~1 hour of recovery in non-selective rich medium (LB or SOC at 37 °C) to (a) re-seal the envelope, (b) express the antibiotic-resistance gene encoded on the plasmid (typically AmpR, KanR, CmR, or TetR), and (c) begin replication. Plating directly on selective antibiotic without recovery kills successful transformants before they can produce enough resistance protein. Surviving colonies on the selective plate carry the plasmid (with some background from re-circularized vector — hence the controls in the §3.4 Gibson checklist).
5.4 Comparison — chemical vs electroporation
Chemical (CaCl₂ + heat shock)
Electroporation
Mechanism
Heat-shock-induced transient pores; Ca²⁺-mediated DNA–membrane association
Large libraries, low-yield assemblies, hard-to-transform strains
Common pitfalls
Old cells (efficiency drops with freeze-thaw); wrong heat-shock timing
Salt arcing in cuvette; delayed recovery (1 min delay = ~3× efficiency loss, per NEB)
Key takeaway — transformation. Chemical for routine work, electroporation when you need maximum efficiency. Both require a 1-hour antibiotic-free recovery before plating, and both punish stale cells.
NTC band positive. Reagent contamination — replace primers/water/Master Mix.
Primer dimers visible. Hot-start polymerase, redesign primers, raise annealing T.
Gibson returns no colonies. Verify (1) DpnI was added to PCR products; (2) overlaps are correct; (3) competent cells are fresh; (4) molar ratios are equimolar.
All colonies carry empty backbone. Linearized backbone wasn’t gel-purified away from undigested circular parental, or phosphatase wasn’t applied to a single-enzyme-cut vector (whose self-compatible overhangs let it re-circularize without insert).
No transformation colonies at all. Old competent cells (efficiency drops with freeze-thaw); wrong heat-shock timing; antibiotic plate too old.
Mixed colonies on plate. Cross-contamination between transformations or reagents.
Sequencing reveals point mutations. Polymerase error during PCR; switch to a higher-fidelity polymerase or sequence multiple clones to find a clean one.
The single most useful sanity check across the whole pipeline is the gel — at every step where DNA is generated or modified, run a small aliquot on agarose. Most failures are visible long before transformation if you look.
Asimov Kernel build — recreating the Repressilator from Characterized Bacterial Parts; comparison to the reference construct in Bacterial Demos.
Three custom Constructs — toggle switch, I1-FFL pulse generator, AND gate. Predicted vs simulator output, with discussion of any divergences.
Genetic-circuit design principles — Elowitz / Collins / Voigt lineage, with notes on how the Repressilator generalizes to other oscillators (Stricker 2008 dual-feedback oscillator, Potvin-Trottier 2016 low-noise variant).