Week 12 Review: Building Genomes

Week 12 — Building Genomes

How to rewrite an organism, one chromosome at a time

At a glance. Synthetic biology spent its first two decades learning to read DNA. This week is about writing it — not gene by gene, but genome by genome. We’ll meet the smallest free-living cell ever built (473 genes, and we still don’t know what 149 of them do), the E. coli strain whose entire genetic code was rewritten by hand, the yeast whose chromosomes are being replaced one at a time, and the CRISPR tricks that let you dial metabolic pathways like an audio mixer. The final two sections bring the toolkit home to my own work: the MS2 phage L-protein group project (where the whole 3.5 kb genome is small enough to redesign from scratch) and the Cholera Shield final project (where genome-scale tools become the obvious answer to B. subtilis protease degradation, biocontainment, and multi-function spore-display optimization). This is the chapter where synthetic biology stops asking “can we edit this?” and starts asking “what if we just typed the whole thing from scratch?”

Course: HTGAA Spring 2026 · Lecture (Apr 21): George Church, John Glass & Jef Boeke, “Building Genomes” · Recitation (Apr 22): Ice Kiattisewee, “CRISPR-based Metabolic Engineering” · Author: Fiona Connolly (Committed Listener BioPunk)

Why build genomes?

For most of synthetic biology’s history, the field has worked at the part scale: swap a promoter, knock out a gene, tune a ribosome binding site. That’s the syn-bio equivalent of editing a sentence in a finished novel. Powerful — but you’re never really questioning the book.

This week is about engineering at the genome scale, and that changes the question entirely. When you can write a whole chromosome from scratch, you can:

Delete things you didn’t know you needed until they were gone (and discover, like JCVI did, that 149 of life’s essential genes have no known function — a 31% mystery rate in the simplest cell ever built).
Free up codons that evolution stuck you with, so you can repurpose them for non-canonical amino acids, virus resistance, and biocontainment.
Add design features to every chromosome at once — recombination sites, watermarks, regulatory landing pads — that would be impossible to retrofit.
Rebuild metabolism wholesale, redirecting carbon flux through pathways nature never tried.

Put bluntly: if Week 4 was about designing one protein, Week 12 is about designing the whole ribosome’s worth of customers it has to serve. Genome-scale engineering is where synthetic biology graduates from edits to authorship.

Why we should care. Many of the bio-based drugs, biofuels, food ingredients, and probiotics now in commercial pipelines at Ginkgo, Amyris, Cargill, and a dozen quieter biofoundries use some combination of the tools on this page — multiplex genome editing, codon recoding, CRISPRi-based pathway tuning, or chromosome-scale assembly. (Not every product needs all of them; many still rely on classical single-pathway integration. The point is that the toolbox is now the industry default for anything that requires combinatorial optimization across many loci.) The minimal-cell work is separately the closest thing we have to an experimental definition of “what life needs.” Both ends of the spectrum — most-minimal and most-engineered — live in this chapter.

A quick timeline: how we got here

timeline
    title The DNA-writing era so far
    2002 : Cello & Wimmer assemble poliovirus from synthesized oligos (7.5 kb)
    2008 : Gibson assembles Mycoplasma genitalium genome (583 kb) at JCVI
    2009 : Wang & Church introduce MAGE for multiplex oligo-mediated edits
    2010 : JCVI-syn1.0 — first cell with a fully synthetic genome (1.08 Mb)
    2013 : Lajoie C321.ΔA — first genomically recoded E. coli (321 UAG → UAA)
    2014 : Annaluru et al. — Sc2.0 first synthetic yeast chromosome (synIII)
    2016 : Hutchison JCVI-syn3.0 — minimal cell (473 genes, 149 unknown)
    2019 : Fredens Syn61 — E. coli with 18,214 codons rewritten (4 Mb)
    2023 : Zhao/Boeke syn7.5 — yeast strain >50% synthetic DNA
    2024+ : Constructive Bio commercializes recoded chassis; GP-write pushes toward synthetic human chromosomes

The hierarchy of edit scale

flowchart LR
    A[Single base<br/>SNP, point mutation] --> B[Single gene<br/>knockout, replacement]
    B --> C[Operon / cassette<br/>5-30 kb]
    C --> D[Cluster / pathway<br/>30-100 kb<br/>e.g. biosynthetic gene cluster]
    D --> E[Chromosome<br/>200 kb - 12 Mb<br/>Sc2.0, JCVI synthetic chromosome]
    E --> F[Whole genome<br/>0.5 Mb - hundreds of Mb<br/>JCVI-syn1.0/3.0, Syn61]
    style A fill:#e8f4f8
    style B fill:#cee9f4
    style C fill:#a8d5e8
    style D fill:#7fbcd8
    style E fill:#5599c0
    style F fill:#306596,color:#fff

The further right you go, the more design choices you make at once — and the more failures you have to debug at once. Most teams work as far left as they can get away with. Week 12 is about the cases where you can’t.

Core concepts (the vocabulary you need)

Term	What it means	Why it matters
Minimal cell	A cell whose genome has been pruned to the smallest gene set that still supports autonomous replication	Tells you what life needs (vs. what it just happens to have)
Genomically recoded organism (GRO)	An organism whose genome has been edited so that one or more codons are no longer used for their original meaning	Lets you reassign codons to non-canonical amino acids or block viral hijacking
Codon compression	Removing redundant codons by synonymous substitution so the genetic code uses fewer than 64 codons	Frees up codons for new functions; the basis of Syn61
Synthetic chromosome	A chromosome rebuilt from synthesized DNA fragments, replacing the natural one	The Sc2.0 strategy — design features baked in everywhere at once
TAR cloning	Transformation-Associated Recombination — assembling large DNA fragments (100 kb+) inside S. cerevisiae using yeast’s homologous recombination	The workhorse for chromosome-scale assembly
MAGE	Multiplex Automated Genome Engineering — cycle in oligo pools to introduce many mutations across a population in parallel	Generates combinatorial diversity at dozens of loci simultaneously (Wang et al. 2009)
SCRaMbLE	Synthetic Chromosome Rearrangement and Modification by LoxPsym-mediated Evolution — induce Cre recombinase to shuffle a synthetic chromosome’s loxPsym sites in vivo	Generates massive structural diversity on demand for fitness landscape mapping
CRISPRi / CRISPRa	Catalytically dead Cas9 (dCas9) fused to a repressor (i) or activator (a) domain — tunes gene expression without cutting	Lets you dial pathways up/down without permanent mutation

The four big ideas (and what they actually built)

1. Minimal cells: how few genes can life run on?

The question is older than synthetic biology: what’s the smallest set of genes a free-living cell needs? For a long time it was mostly theoretical. Then John Glass and the J. Craig Venter Institute team decided to find out by building one.

The strategy was unsentimental: start with Mycoplasma mycoides (which already had a tiny ~1.1 Mb genome and the rare property that JCVI knew how to chemically synthesize it from scratch), then iteratively delete genes and see what kept the cell alive. After three full design-build-test cycles — including a humbling failure on the first design — they got down to JCVI-syn3.0: 531 kb, 473 genes (Hutchison et al. 2016).

That number is striking on its own. What’s more striking is this:

The 149-gene mystery. Of the 473 genes in the smallest known free-living cell, 149 have no known biological function. Not “we have a hypothesis” — no known function. The minimal cell is one-third black box. This is one of the most honest experimental results in modern biology, and the strongest existing argument that we don’t understand cells nearly as well as our textbooks suggest.

It also turned out there’s a hidden category between “essential” and “nonessential”: quasi-essential genes. Delete one and the cell technically still lives, but it grows so badly it’s effectively dead in a competitive culture. The first JCVI design missed these and produced a non-viable cell — the lesson being that “essentiality” is not a binary you can read off a single transposon screen.

Pop-quiz application. If you were designing a chassis cell for industrial fermentation, would you start from the minimal cell or add to E. coli? The minimal cell has fewer mysteries to debug but also fewer tools (no native CRISPR machinery, finicky media requirements). Most industry still starts from E. coli or Bacillus subtilis. But Syn3.0 derivatives are showing up in mammalian-vaccine production and synthetic chassis research.

2. Genomically recoded organisms: editing the genetic code itself

The genetic code has 64 codons mapping to 20 amino acids + 3 stops. That’s lots of redundancy — most amino acids have 2-6 synonymous codons. Two big lineages have asked: what if we just used fewer?

Round 1: Lajoie et al. 2013 (Church group, Science 342:357–360). They built C321.ΔA: an E. coli in which all 321 instances of the UAG (amber) stop codon were changed to UAA, and then release factor 1 (RF1, the translation termination factor that recognizes UAG) was deleted. UAG was now a “blank” codon, free to be reassigned. They handed it to an orthogonal aminoacyl-tRNA synthetase / tRNA pair, and the cell started incorporating non-canonical amino acids wherever a UAG appeared. The strain also became markedly more resistant to bacteriophage T7, because phage genes that rely on UAG were no longer translated correctly in a host without RF1.

Round 2 — Fredens et al. 2019 (Chin lab, Nature 569:514–518). The same idea, but at the genome scale that earlier teams had thought infeasible. Their target organism Syn61 had 18,214 codons rewritten — every TCG, TCA, and TAG replaced with synonymous alternatives across the entire 4 Mb genome. Three codons were now unused anywhere in the chromosome and could be reassigned. The cell was still viable. This is, to date, the most heavily edited free-living organism on Earth.
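To make the codon-compression idea concrete, here is a minimal Python sketch of an in-frame recoding pass. The swap table (TCG to AGC, TCA to AGT, TAG to TAA) is my reading of the scheme reported for Syn61 and should be checked against the paper before reuse; the toy ORF is hypothetical. It also ignores the hard part of the real project, which is overlapping genes and regulatory sequence hidden inside ORFs.

```python
# Swap table: an assumption based on the scheme reported for Syn61
# (two serine codons and the amber stop replaced by synonyms).
SYN61_SWAPS = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recode_orf(orf: str, swaps: dict[str, str] = SYN61_SWAPS) -> tuple[str, int]:
    """Return (recoded ORF, number of codons changed), reading in frame 0."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    recoded = [swaps.get(codon, codon) for codon in codons]
    changed = sum(1 for old, new in zip(codons, recoded) if old != new)
    return "".join(recoded), changed


# Hypothetical toy ORF: Met, Ser (TCG), Ser (TCA), Ser (AGC), amber stop.
toy_orf = "ATGTCGTCAAGCTAG"
recoded_seq, n_changed = recode_orf(toy_orf)
print(recoded_seq, n_changed)  # ATGAGCAGTAGCTAA 3
```

Run genome-wide over every annotated ORF, the count returned here is the kind of tally that adds up to the 18,214 figure above.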

Why bother? Three concrete payoffs:

NCAA incorporation. Site-specifically install azobenzene, click-chemistry handles, photo-crosslinkers, fluorescent probes, redox cofactors — at any position in any protein.
Virus resistance. Phages depend on the standard genetic code. Recoded cells mistranslate phage proteins. This is meaningful biocontainment, not just a curiosity (Constructive Bio spun out of Chin’s lab on this premise).
Biocontainment of the recoded strain itself. A GRO that needs an NCAA to live can’t survive outside the lab — no NCAA, no functional proteins.

Sequence check. C321.ΔA nomenclature: Chromosome with 321 UAG→UAA conversions and the Δeletion of prfA (RF1). The strain is available as Addgene #48998. Syn61 derivatives are Addgene #174513.

How you actually do this at the bench: the MAGE cycle

Lajoie’s UAG sweep wasn’t 321 individual cloning steps. It used MAGE (Multiplex Automated Genome Engineering, Wang et al. 2009) — an automated recombineering loop that introduces dozens of oligo-mediated edits per cycle, then repeats.

flowchart TD
    A[Design oligo pool<br/>~90 nt, one per target locus<br/>each oligo carries the desired mutation] --> B[Electroporate pool into E. coli<br/>expressing λ Red Beta recombinase]
    B --> C[Beta anneals oligos to the<br/>lagging strand at the replication fork]
    C --> D[Cells recover, replicate<br/>mutation is fixed in daughter strands]
    D --> E[Subset of population now carries<br/>1+ targeted mutations]
    E --> F{Enough cycles?}
    F -->|No| B
    F -->|Yes| G[Allelic-replacement library<br/>combinatorial diversity at all targets]
    style G fill:#306596,color:#fff

Each cycle takes ~2-3 hours. Stack 50 cycles and you’ve sampled a combinatorial library across dozens of loci that no traditional cloning approach could reach. The conjugative variant (CAGE, Conjugative Assembly Genome Engineering, Isaacs et al. 2011) extends the same idea by moving recoded chromosome segments between strains via Hfr conjugation, in a single-elimination “playoff bracket”: 32 partially recoded E. coli strains (each carrying ~10 TAG → TAA changes) were merged pairwise, round by round, and the 2011 paper reported intermediate strains carrying about 80 changes each. Lajoie 2013 then finished the bracket, converted the remaining TAG codons (321 in total), and deleted RF1 to lock in the final C321.ΔA strain.
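A back-of-envelope model helps show why cycling matters. The Python sketch below assumes every locus converts independently with the same per-cycle efficiency, with no reversion and no fitness cost. None of those assumptions comes from the papers above, and the 5% efficiency used in the example is a placeholder, not a measured value.

```python
from math import comb


def locus_converted(per_cycle_eff: float, cycles: int) -> float:
    """P(a given locus carries its edit after `cycles` MAGE rounds)."""
    return 1.0 - (1.0 - per_cycle_eff) ** cycles


def edit_count_distribution(n_loci: int, per_cycle_eff: float, cycles: int) -> list[float]:
    """P(a cell carries exactly k of n_loci edits) for k = 0..n_loci (binomial)."""
    p = locus_converted(per_cycle_eff, cycles)
    return [comb(n_loci, k) * p**k * (1 - p) ** (n_loci - k) for k in range(n_loci + 1)]


if __name__ == "__main__":
    N_LOCI = 10          # hypothetical number of targeted loci
    EFFICIENCY = 0.05    # hypothetical per-locus, per-cycle replacement efficiency
    for cycles in (1, 10, 50):
        dist = edit_count_distribution(N_LOCI, EFFICIENCY, cycles)
        print(f"{cycles:>2} cycles: P(locus edited) = {locus_converted(EFFICIENCY, cycles):.3f}, "
              f"P(all {N_LOCI} edited) = {dist[-1]:.3g}")
```

Under these assumptions a fully converted cell is vanishingly rare after one cycle and becomes common only after many, which is the reason the loop is automated.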

3. Synthetic eukaryotic chromosomes: the Sc2.0 project

While Church and Chin were rewiring bacteria, Jef Boeke and the international Sc2.0 consortium were doing the equivalent in Saccharomyces cerevisiae — yeast. The goal: rebuild all 16 chromosomes from scratch, with design features that wild-type yeast doesn’t have.

The Sc2.0 design rules are worth knowing because they’re a masterclass in what “design” looks like at chromosome scale:

Remove repetitive DNA (transposable elements, subtelomeric repeats) — genomic-stability headaches.
Move every tRNA gene off the main chromosomes and onto a single dedicated “neochromosome” (tRNA genes are hotspots of genomic instability, so this isolates them in one place).
Recode all TAG stop codons to TAA — frees TAG, same idea as the bacterial GROs.
Insert a loxPsym site downstream of every nonessential gene — the basis for SCRaMbLE (see below).
Add PCR-tagged watermarks — every synthetic stretch is identifiable in sequencing.

The killer feature is SCRaMbLE. loxPsym is a 34-bp palindromic variant of the loxP site — palindromic because the spacer that gives wild-type loxP its directionality has been made symmetric, so Cre recombinase can recombine two loxPsym sites in either orientation. When you induce Cre, all those thousands of sites recombine randomly — deletions, inversions, duplications, translocations — generating an enormous library of rearranged genomes in a single overnight culture. You then select for whatever phenotype you want (heat tolerance, ethanol tolerance, pathway yield) and sequence the survivors to read out which rearrangements work. It’s directed evolution at the structural-variation scale, baked into the chromosome architecture.

flowchart TD
    subgraph BEFORE [Before SCRaMbLE induction]
        S1[Gene A] --> L1[loxPsym]
        L1 --> S2[Gene B]
        S2 --> L2[loxPsym]
        L2 --> S3[Gene C]
        S3 --> L3[loxPsym]
        L3 --> S4[Gene D]
    end
    BEFORE -->|Induce Cre recombinase<br/>estradiol-activated Cre-EBD fusion| RECOMB
    subgraph RECOMB [Cre acts at every loxPsym pair]
        direction LR
        OUT1[Deletion<br/>A — D] 
        OUT2[Inversion<br/>A — C-rev B-rev — D]
        OUT3[Duplication<br/>A — B — B — C — D]
        OUT4[Translocation<br/>fragments swap between chromosomes]
    end
    RECOMB --> POOL[Library of millions of<br/>uniquely rearranged genomes]
    POOL --> SEL[Select on phenotype<br/>e.g. ethanol tolerance, pathway titer]
    SEL --> SEQ[Whole-genome sequence survivors<br/>read out which rearrangements work]
    style POOL fill:#306596,color:#fff
    style SEQ fill:#306596,color:#fff

Each round of SCRaMbLE explores a slice of the structural-variation landscape that random mutagenesis would never reach in any reasonable time. You can also iterate — re-induce Cre on a winner to layer further changes — building up complex, optimized architectures from incremental selection.
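The induce, rearrange, select loop can be sketched as a toy simulation in Python. Everything here is simplified and hypothetical: a loxPsym site sits at every internal gene boundary, each Cre event is either a deletion or an inversion with equal probability, duplications and translocations are left out, and "selection" is just survival of a chosen essential-gene set. It illustrates the logic and is not a model of real SCRaMbLE kinetics.

```python
import random


def flip(segment: str) -> str:
    """Toggle orientation: 'B' <-> '-B'."""
    return segment[1:] if segment.startswith("-") else "-" + segment


def scramble_once(segments: list[str], rng: random.Random) -> list[str]:
    """One Cre event between two loxPsym sites (toy: a site at every internal boundary)."""
    if len(segments) < 3:          # fewer than two internal sites left
        return list(segments)
    i, j = sorted(rng.sample(range(1, len(segments)), 2))
    middle = segments[i:j]
    if rng.random() < 0.5:
        middle = []                                        # deletion
    else:
        middle = [flip(s) for s in reversed(middle)]       # inversion
    return segments[:i] + middle + segments[j:]


def scramble_library(start, essential, n_cells, events_per_cell, seed=0):
    """Return the rearranged genomes that still carry every essential gene."""
    rng = random.Random(seed)
    survivors = []
    for _ in range(n_cells):
        genome = list(start)
        for _ in range(events_per_cell):
            genome = scramble_once(genome, rng)
        if essential <= {g.lstrip("-") for g in genome}:
            survivors.append(tuple(genome))
    return survivors


if __name__ == "__main__":
    library = scramble_library(list("ABCDEF"), {"A", "D"}, n_cells=1000, events_per_cell=2)
    print(len(library), "survivors,", len(set(library)), "distinct architectures")
```

Raising `events_per_cell` in this toy mimics inducing Cre harder: more diversity per cell, but more genomes that lose an essential gene.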

As of November 2023, the consortium published a coordinated package reporting syn7.5 — a strain in which roughly 7.5 of the 16 yeast chromosomes are now synthetic DNA (variously described as 7 whole synthetic chromosomes plus one chromosome arm, or 6.5 chromosomes plus synthetic chromosome IV — same total) consolidated into one cell, together with the assembly path to all 16 (Zhao et al. 2023, Cell 186:5220–5236). At the April 21, 2026 HTGAA lecture, Boeke flagged a newer strain called Synthetic 11 containing 11 of the 16 synthetic chromosomes, and said the lab is “really, really hoping we’re going to get to the finish line later this year.” The last chromosomes are being debugged; the complete-genome milestone is in sight, not over the horizon.

Why yeast and not human cells? Yeast does homologous recombination so well it will assemble overlapping DNA fragments into chromosome-sized constructs essentially for free. This is the same property that powers TAR cloning. Mammalian-cell chromosome synthesis is a much harder problem and is just now starting (the GP-write consortium has been laying groundwork).

4. CRISPR-based metabolic engineering: tuning the orchestra

The recitation (Ice Kiattisewee) covered the lighter-weight cousin of all this rewriting: leave the genome alone and modulate expression instead, using CRISPR tools whose Cas9 nuclease has been deactivated.

CRISPRi — dCas9 fused to a repressor (KRAB in mammals; just steric blocking is often enough in bacteria) sits on a promoter and silences transcription. Easy knockdown without permanent damage.
CRISPRa — dCas9 fused to an activator (VP64, p65, RTA stacks) drives transcription up. Easy overexpression without strong promoters.
Multiplexed sgRNA libraries — express many guides at once and you can repress or activate dozens of genes simultaneously. This is how you redirect flux through a metabolic pathway: knock down the competing branches, dial up the desired ones, find the combination that maximizes product titer.
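The first practical step for any of these tools is finding targetable sites. As a minimal sketch, the Python below lists 20-nt protospacers followed by an NGG PAM (the SpCas9 requirement) on one strand, plus a reverse-complement helper for scanning the other strand. The function names and the test sequence are mine and hypothetical. Real guide design also needs off-target scoring and a deliberate strand choice, both of which this skips.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start index, protospacer, PAM) for every NGG PAM on the given strand."""
    seq = seq.upper()
    # Lookahead keeps the match zero-width so overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical promoter-proximal sequence, for illustration only.
    region = "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGCAGGAGGAAACATATGAGCAAAGG"
    for strand_name, strand_seq in (("top", region), ("bottom", revcomp(region))):
        for start, spacer, pam in find_protospacers(strand_seq):
            print(strand_name, start, spacer, pam)
```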

flowchart TB
    subgraph CASA [dCas9 — the dead nuclease scaffold]
        DC[dCas9<br/>D10A + H840A mutations<br/>binds DNA but can't cut] --> SG[sgRNA<br/>specifies target sequence]
    end
    CASA --> SPLIT{Fuse to what?}
    SPLIT -->|nothing / KRAB / sterically blocks RNAP| CRI[CRISPRi - REPRESSION<br/>blocks transcription initiation or elongation<br/>analogous to a knockdown]
    SPLIT -->|VP64 / p65 / RTA stack| CRA[CRISPRa - ACTIVATION<br/>recruits Pol II machinery<br/>analogous to an overexpression]
    CRI --> MULT[Multiplex with many sgRNAs<br/>tune entire pathway at once]
    CRA --> MULT
    MULT --> APP[Metabolic engineering payoff<br/>repress competing branches +<br/>activate productive branches simultaneously]
    style APP fill:#306596,color:#fff

Why this matters alongside genome rewriting. Genome synthesis is permanent, expensive, and slow. CRISPRi/a is reversible, cheap, and same-day. In practice, teams use CRISPRi/a to find the right combinations of edits, then lock the best ones in with permanent edits (MAGE, recombineering, or — when scale demands — genome synthesis). The tools complement each other.

Concrete example from the recent literature: a multiplexed CRISPRi library in E. coli repressing competing pathway genes around the mevalonate pathway delivered a 3-4.5× boost in isoprenol titer (1.82 g/L), and the best CRISPRi strain scaled to 12.4 g/L in fed-batch (Tian et al. 2019). That’s the kind of move Cargill, Amyris, and Ginkgo make every day on dozens of pathways at once.
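It is worth seeing how quickly a multiplexed knockdown screen grows. This Python sketch enumerates every combination of up to k targets from a candidate list, assuming one sgRNA per target. The gene names are placeholders, not the targets used in the isoprenol work.

```python
from itertools import combinations
from math import comb


def guide_sets(targets: list[str], max_multiplex: int):
    """Yield every combination of 1..max_multiplex targets (one sgRNA per target)."""
    for k in range(1, max_multiplex + 1):
        yield from combinations(targets, k)


def library_size(n_targets: int, max_multiplex: int) -> int:
    """Number of distinct guide sets without enumerating them."""
    return sum(comb(n_targets, k) for k in range(1, max_multiplex + 1))


if __name__ == "__main__":
    targets = [f"gene{i}" for i in range(1, 9)]   # hypothetical placeholder names
    print(library_size(len(targets), 3))           # 92 guide sets for 8 targets, up to 3 at once
    print(next(guide_sets(targets, 3)))            # ('gene1',)
```

Eight candidate genes at up to three-fold multiplexing is already 92 strains, which is why pooled libraries and a good screen or selection matter more than the cloning.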

How chromosome-scale DNA actually gets built

You can’t synthesize a chromosome in one pour. Real workflows use a hierarchy:

flowchart LR
    A[Oligonucleotides<br/>~200 bp<br/>chemical synthesis] --> B[Gene fragments<br/>1-5 kb<br/>assembled in vitro]
    B --> C[Cassettes<br/>10-30 kb<br/>Gibson or Golden Gate]
    C --> D[Chunks<br/>30-100 kb<br/>TAR cloning in yeast]
    D --> E[Chromosome<br/>200 kb-1 Mb+<br/>iterative replacement in vivo]
    E --> F[Whole genome<br/>1-12 Mb+<br/>chromosome consolidation]

Each step changes hands (vendor → bench → yeast → target organism), and each transition introduces failure modes. The reason the field obsesses over error rates per kilobase in DNA synthesis is that errors compound multiplicatively through this hierarchy — a 1-in-10,000-bp synthesis error becomes a near-certainty across a 1 Mb chromosome unless you sequence-verify and repair at each level.
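The compounding is easy to check numerically. This short Python sketch assumes errors are independent and uniformly distributed at a fixed per-base rate, which is a simplification since real synthesis errors cluster. It uses the 1-in-10,000 figure from the paragraph above.

```python
def p_error_free(length_bp: int, error_rate_per_bp: float) -> float:
    """P(a construct of this length has zero errors), assuming independent errors."""
    return (1.0 - error_rate_per_bp) ** length_bp


def expected_errors(length_bp: int, error_rate_per_bp: float) -> float:
    """Mean number of errors in a construct of this length."""
    return length_bp * error_rate_per_bp


if __name__ == "__main__":
    ERROR_RATE = 1e-4   # 1 error per 10,000 bp, as in the text
    for label, length in (("gene fragment", 1_000), ("cassette", 30_000),
                          ("chunk", 100_000), ("chromosome", 1_000_000)):
        print(f"{label:>13} ({length:>9,} bp): "
              f"expected errors = {expected_errors(length, ERROR_RATE):7.1f}, "
              f"P(error-free) = {p_error_free(length, ERROR_RATE):.3g}")
```

At this rate a 1 kb fragment is clean about 90% of the time, while a 1 Mb chromosome carries on the order of 100 errors. That is the argument for sequence-verifying at every level rather than only at the end.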

The dominant players in the “write” half of DNA today:

Twist Bioscience: silicon-printed oligo arrays driving most clonal-gene synthesis (the assignment refers to a Twist order).
Ansa Biotechnologies: enzymatic (template-free) DNA synthesis, yielding longer single-molecule products.
Avery Digital Bio / DNA Script: electrochemical and enzymatic platforms for in-lab synthesis.
Elegen, Telesis Bio (formerly Codex DNA): gene-and-cassette-scale synthesis.

The headline trend is cost per base, which has dropped by roughly two orders of magnitude in the last decade. The headline limit is length and accuracy: most platforms still hit problems past ~3 kb without TAR-style yeast assembly to bail them out.

Voices from the lecture (Boeke / Glass / Church, Apr 21, 2026)

Three first-hand details from the lecture that don’t appear in the published papers, worth pinning here:

Jef Boeke on the Sc2.0 status as of April 2026: “As of last week, we have a strain that we call Synthetic 11 that has 11 of the 16 chromosomes consolidated into a single strain. […] We’re really, really hoping we’re going to get to the finish line later this year.” Boeke also flagged that “every 300 kilobases or so, we did find a bug” — the Sc2.0 design changes are mostly silent, but the bugs that do appear tend to be combinatorial (e.g., a loxPsym site in a promoter compounded with a half-strength synthetic tRNA recognizing rare tandem codons in an essential gene). This is the kind of failure mode you only discover by consolidating chromosomes, not by testing them individually — itself an argument for the consolidation work.

John Glass on what hasn’t worked yet: “No one has been able to make genome transplantation work for anything other than a small group of mycoplasmas.” Glass spent ~15 years figuring out why. Their newest result (BioRxiv ~March 2026) is that killing the recipient cell first with a mitomycin-C cross-link, then transplanting into the dead cell, gives clean transplantation without antibiotic selection — a long-standing source of false positives via homologous-recombination-mediated marker transfer. Separately, his lab discovered that most bacteria carry a calcium-activated surface endonuclease that mycoplasmas happen to lack — and that trypsin-shaving the recipient cells before transplantation may let whole-genome transplantation work in E. coli and other species. If that works, it collapses the 50-100 kb piecewise-replacement workflow Church’s group uses for recoded E. coli into “do the whole genome in one shot, in a day.”

George Church on why you’d bother with genome scale at all: Church framed eight reasons to engineer at genome scale: metabolic optimization, recoding, cell-differentiation code, cell-type delivery code, developmental code, de-aging, de-speciation, de-extinction. Two specific datapoints from his lecture: (1) Harris Wang built 4 billion combinatorial E. coli genomes in a day using MAGE and pulled out a ~5× lycopene-yield improvement (the original 2009 Nature paper); (2) Church’s group has now done 24,000 multiplex base-editor edits in a single strain — knocking out essentially every reverse-transcriptase-encoding endogenous retrovirus in pig cells, enabling pig-to-human xenotransplantation work. The general principle: “the more you change in the genome, the more you change even things that you didn’t think you were changing — but if you can keep a half-day doubling time, you can declare victory.”

These three voices map cleanly onto the three pillars of the chapter: Boeke = synthetic chromosomes, Glass = minimal cells + booting synthetic genomes, Church = recoding + multiplex editing at scale.

Pitfalls, controls, and how to know it worked

Where it goes wrong	What you’ll see	What to control for
Synthesis errors in long fragments	Cloned construct doesn’t sequence-confirm; ORF has a frameshift	Sanger or short-read sequence every cassette before assembly; budget rework time
Quasi-essential genes deleted in a minimal-cell design	Cells grow but ~10× slower than expected	Use transposon mutagenesis with growth-rate readouts, not just survival
Codon recoding breaks regulation you didn’t know existed	Recoded strain has weird phenotypes even though codons are “silent”	Codon usage isn’t truly silent — affects mRNA folding, translation speed, internal promoters. Test in small chunks before genome-scale
TAR cloning of GC-rich or repeat-heavy regions	Yeast loses or rearranges the insert	Break the region into smaller overlapping pieces; check by Sanger across joins
SCRaMbLE induced too aggressively	Lethal rearrangements dominate; library has no survivors	Tune Cre induction (concentration, time); use estradiol-inducible Cre for fine control
CRISPRi off-targets in metabolic engineering	Phenotype doesn’t match the intended single-gene knockdown	Verify with two independent sgRNAs per target; use RNA-seq to confirm specificity

The single best diagnostic for any whole-genome project is long-read sequencing (PacBio HiFi or Oxford Nanopore) of the final construct. Short reads miss large structural variants — exactly the kind that recombination-assembly methods are most likely to introduce.

Bringing it home: how this connects to my own projects

Week 12 is the chapter that ties almost everything else together, because both of my HTGAA projects sit on top of genome-scale design decisions whether I realize it or not. Splitting the connections out by project:

Group project — MS2 phage L-protein engineering

The MS2 genome is the easiest case in synthetic biology to think about whole-genome redesign for, because the whole thing is only ~3.5 kb and encodes just four ORFs (maturation, coat, lysis, replicase), with L overlapping its neighbors. That’s not a chromosome; it’s a postcard. At Twist’s current $0.09/bp clonal gene pricing (and up-to-7 kb maximum length), full MS2 genome synthesis comes in at about $320 (3,569 nt × $0.09), i.e. one Twist gene order pays for the whole genome. Resynthesizing the MS2 genome from scratch with a redesigned L is a single Friday-afternoon order, not a years-long project.
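The arithmetic behind that estimate, as a small Python helper. The price and length limit are the figures quoted above and will drift, so treat them as parameters rather than facts; the function name is mine.

```python
def synthesis_quote(length_bp: int, price_per_bp: float, max_fragment_bp: int) -> dict:
    """Rough cost and fragment count for ordering a sequence as clonal genes."""
    n_fragments = -(-length_bp // max_fragment_bp)   # ceiling division
    return {"cost_usd": round(length_bp * price_per_bp, 2), "fragments": n_fragments}


if __name__ == "__main__":
    MS2_GENOME_NT = 3569        # wild-type MS2 genome length
    PRICE_PER_BP_USD = 0.09     # quoted clonal-gene price; check current pricing
    MAX_FRAGMENT_BP = 7000      # quoted maximum clonal-gene length
    print(synthesis_quote(MS2_GENOME_NT, PRICE_PER_BP_USD, MAX_FRAGMENT_BP))
```

With these inputs the whole genome fits in one fragment at roughly $321, before any vector, shipping, or complexity surcharges.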

flowchart TD
    subgraph WT [Wild-type MS2 genome 3569 nt ssRNA]
        M[Maturation A-protein] --> C[Coat]
        C --> R[Replicase]
        C --> LO[L lysis gene<br/>out-of-frame overlap with<br/>end of coat and start of replicase<br/>~5 pct ribosome slip-back initiates]
        R --> LO
    end
    WT -->|Week 12 toolkit applied| ENG
    subgraph ENG [Engineered design space]
        OPT1[Option 1 - site-specific NCAA<br/>express L in Syn61 or C321 chassis<br/>install photocrosslinker or click handle]
        OPT2[Option 2 - refactor coat L replicase overlap<br/>full-genome resynthesis<br/>separate the three reading frames cleanly]
        OPT3[Option 3 - codon-tune for chassis<br/>but keep WT codons as a control<br/>recoding work shows synonymous is not silent]
    end
    style OPT1 fill:#cee9f4
    style OPT2 fill:#cee9f4
    style OPT3 fill:#cee9f4

Three concrete moves the Week 12 toolkit unlocks for the L-protein work:

Site-specific non-canonical amino acid (NCAA) installation. Express L in a recoded chassis (Syn61 or C321.ΔA) and put a photocrosslinker, click handle, or fluorescent probe at a defined residue. Cleaner than amber suppression in wild-type E. coli: because UAG is genuinely unused in the host genome, there is no read-through of native amber stop codons. Useful for trapping folding intermediates, mapping L’s interaction surface with the coat, or single-molecule labeling.
Refactor the coat/L/replicase overlap. L is encoded in an out-of-frame reading window that overlaps both the 3’ end of the coat ORF and the 5’ end of the replicase ORF, and translation of L is initiated only when a ribosome that has just terminated on the coat stop codon slips backward and re-initiates at the L start (~5% efficiency, Adhin & van Duin 1990). That’s a beautifully compact natural regulatory mechanism, but it is also a hard constraint on what L mutations you can make without breaking coat or replicase. If you ever want to fully decouple the three proteins for clean L engineering, the whole genome can be resynthesized with the ORFs separated and either a dedicated ribosome binding site or an explicit translational coupling for L (MS2 is an RNA genome, so L expression is set at translation initiation rather than by a promoter). This is exactly the kind of move Sc2.0 does at chromosome scale, just much smaller.
Treat codon optimization with suspicion. Lajoie’s GRO and Fredens’ Syn61 work made it concrete: synonymous codon changes affect mRNA folding, translation kinetics, internal cryptic promoters, even protein function. When ordering the Twist insert for L, keep the wild-type codon usage as a control alongside any vendor-“optimized” version, because the optimized version does not reliably express better and sometimes expresses worse. For MS2 specifically, the codon-redesign risk is amplified because synonymous changes to the coat sequence can knock out the ribosome slip-back signal that initiates L translation in the first place, so any “optimized” coat in the wild-type genome context needs to be checked against L expression by Western blot, not assumed safe.
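A cheap pre-order check for the third move: confirm that a vendor-optimized sequence is truly synonymous, and flag any nucleotide changes that fall inside a window you want left untouched, such as the region around the coat stop codon and the L start. The Python sketch below uses the standard genetic code. The toy sequences and window coordinates are hypothetical and are not MS2 coordinates; real ones must come from the annotated genome.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def normalize(seq: str) -> str:
    """Uppercase and convert RNA to DNA letters."""
    return seq.upper().replace("U", "T")


def translate(seq: str) -> str:
    """Translate in frame 0, ignoring any trailing partial codon."""
    seq = normalize(seq)
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))


def check_recode(wt: str, variant: str, protected: tuple[int, int]) -> dict:
    """Compare a variant to wild type; `protected` is a 0-based half-open nucleotide window."""
    wt, variant = normalize(wt), normalize(variant)
    if len(wt) != len(variant):
        raise ValueError("sequences must be the same length")
    changed = [i for i, (a, b) in enumerate(zip(wt, variant)) if a != b]
    return {
        "synonymous": translate(wt) == translate(variant),
        "changed_positions": changed,
        "protected_hits": [i for i in changed if protected[0] <= i < protected[1]],
    }


if __name__ == "__main__":
    wt_toy = "ATGGCTTCTAACTAA"         # hypothetical: Met-Ala-Ser-Asn-stop
    optimized_toy = "ATGGCTTCTAATTAA"  # hypothetical: AAC -> AAT, still Asn
    print(check_recode(wt_toy, optimized_toy, protected=(9, 15)))
```

A variant that comes back synonymous but with entries in `protected_hits` is exactly the case the Western blot control is for.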

Final project — Cholera Shield (engineered B. subtilis spore platform)

The Cholera Shield design is a multi-function B. subtilis spore: surface-display anti-cholera-toxin VHH nanobodies and GM1-mimic decoys via CotB/CotC coat-protein fusions, with optional bacteriocin and quorum-quenching modules expressed post-germination in the small intestine. Almost every Week 12 tool has a directly applicable use here, because B. subtilis genome engineering is mature and the spore-coat / sporulation regulons are exactly the kind of complex, multi-gene system that genome-scale tools were built for.

flowchart TD
    subgraph CORE [Cholera Shield core design]
        SP[B subtilis spore]
        CB[CotB or CotC fusion]
        NB[anti-CT VHH nanobody]
        GM[GM1-mimic decoy]
        SP --> CB
        CB --> NB
        CB --> GM
    end
    CORE --> W12{Week 12 tools applied}
    W12 -->|CRISPRi multiplex| PR[Repress 8 extracellular proteases<br/>aprE nprE nprB bpr vpr epr mpr wprA<br/>protect displayed nanobodies in gut]
    W12 -->|CRISPRa pilot| SC[Activate spore-coat assembly genes<br/>boost CotB and CotC display density per spore]
    W12 -->|MAGE-style multiplex| FU[Combinatorial optimization of<br/>fusion linker and display copy number<br/>across many candidate epitopes at once]
    W12 -->|Recoding for biocontainment| BC[NCAA-dependent recoded B subtilis<br/>cannot survive outside controlled environment<br/>addresses GMO regulatory pathway]
    W12 -->|TAR plus Gibson assembly| MO[Build full multi-function operon<br/>nanobody plus GM1 mimic plus bacteriocin plus QQ enzyme<br/>as one cassette]
    style PR fill:#cee9f4
    style SC fill:#cee9f4
    style FU fill:#cee9f4
    style BC fill:#cee9f4
    style MO fill:#cee9f4

The five concrete connections, in priority order for the project:

CRISPRi against the eight extracellular proteases. B. subtilis secretes a battery of proteases (AprE, NprE, NprB, Bpr, Vpr, Epr, Mpr, WprA) whose native job is scavenging nutrients, but which also degrade secreted and surface-displayed heterologous proteins; they are the main reason heterologous protein production in B. subtilis is harder than it should be. Strain WB800 has all eight knocked out classically (Wu et al. 2002; full genome sequence in Yang et al. 2018), but a multiplexed CRISPRi approach gives you reversible, titratable repression, which is useful if you find that complete protease knockout hurts sporulation efficiency. CRISPRi in B. subtilis is well-established and high-precision: Peters et al. 2016 (Cell) built a comprehensive xylose-inducible CRISPRi library targeting all 289 essential genes, available through the Bacillus Genetic Stock Center and Addgene, and the same toolkit can target the eight proteases. This is probably the single highest-impact Week 12 move for Cholera Shield: protect the displayed VHH nanobodies from being chewed up before they reach the gut.
CRISPRa on the spore-coat regulon (lower-confidence move). Display density per spore is a key efficacy lever — more anti-CT nanobodies per spore = more toxin neutralized per dose. CRISPRa on coat-assembly genes (or on the SigK / SigE sporulation σ-factor regulons) is conceptually attractive, but CRISPRa in B. subtilis is much less mature than CRISPRi — published systems (dCas9-ω) deliver only ~1.5-2× activation as of 2024, vs. the 100-1000× knockdowns CRISPRi achieves. Worth piloting at small scale; don’t bet the project plan on it.
MAGE-style multiplex optimization of fusion construct architecture. There are many design knobs in a CotB-VHH fusion: linker length and composition, position of the fusion (N- vs C-terminal), copy number, choice of CotB vs CotC vs CotG, ribosome binding site strength. MAGE-style ssDNA recombineering can in principle let you explore combinatorial space across all of these knobs in parallel rather than one variant at a time, though the canonical MAGE workflow was built for E. coli — B. subtilis variants exist (notably via the GP-pAC plasmid system) but the toolkit is less mature than in E. coli. [UNVERIFIED — confirm current state of MAGE-in-Bacillus literature before relying on this as a near-term experimental route.]
Recoding for biocontainment. The single biggest regulatory hurdle for a live GMO probiotic intended for humanitarian use in flood zones and refugee camps is the worry that the engineered strain could persist in the environment. A B. subtilis recoded to require an exogenous non-canonical amino acid (NCAA) in its essential proteins, the biocontainment strategy pioneered for E. coli in Mandell, Lajoie et al. 2015, would address this head-on. The strain only lives where you provide the NCAA. No supplementation, no growth. This is a longer-horizon engineering effort, but it is exactly the regulatory wedge the project would benefit from.
Multi-function operon assembly via TAR / Gibson. The Cholera Shield concept stacks four functions: toxin decoy, colonization blocker, bacteriocin/lysin, and quorum quencher. Building them as one coordinately regulated cassette (rather than four separate integrations) needs assembly at the 10-30 kb scale — the TAR / Gibson hierarchy from Week 12 is exactly the workflow.
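For the CRISPRi item at the top of this list, guide design starts with finding PAM-adjacent spacers inside each protease gene. The sketch below is a minimal illustration, not the Peters et al. design pipeline. It assumes S. pyogenes dCas9 (NGG PAM, 20-nt spacer) and the commonly reported rule of thumb that guides binding the non-template strand repress better when targeting inside a coding region. The function names and the toy sequence are hypothetical; real aprE or wprA sequence would come from the annotated genome, and real designs would also screen for off-targets.

```python
"""Sketch: list candidate CRISPRi spacers inside a coding sequence.

Assumptions (not from the course notes): SpCas9-type dCas9 with an NGG PAM,
20-nt spacers, and a preference for guides that bind the non-template strand.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def nontemplate_guides(coding_seq: str, spacer_len: int = 20) -> list[dict]:
    """Find spacers whose guide RNA would pair with the non-template strand.

    On the coding (non-template) strand, such a site reads 5'-CCN + N20-3'.
    Read on the template strand, that same site is N20 followed by NGG,
    so the spacer is the reverse complement of the N20 stretch.
    """
    seq = coding_seq.upper()
    guides = []
    # Stop early enough that CCN plus a full-length spacer still fits.
    for i in range(len(seq) - spacer_len - 2):
        if seq[i:i + 2] == "CC":
            target = seq[i + 3:i + 3 + spacer_len]
            guides.append({
                "pam_start": i,  # 0-based position of the CCN on the coding strand
                "spacer": reverse_complement(target),
            })
    return guides


if __name__ == "__main__":
    # Made-up sequence for illustration only; not a real protease gene.
    toy_cds = "ATG" + "CCA" + "AAAAAGGGGGAAAAAGGGGG" + "TTT"
    for guide in nontemplate_guides(toy_cds):
        print(guide["pam_start"], guide["spacer"])
```

Running the same scan over each of the eight protease genes gives a candidate list to filter before ordering oligos; guides near the 5' end of each gene are the usual starting point, though that choice should be checked against the CRISPRi literature for B. subtilis.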

One precise recommendation for the build: start with CRISPRi-mediated repression of WprA and AprE (the two most aggressive surface-display-eating proteases in B. subtilis) in your CotB-VHH-CT display strain. This is a same-month bench experiment, doesn’t require rebuilding the strain from scratch, and gives a clean readout: surface-displayed VHH yield by flow cytometry, with and without CRISPRi induction. If repression rescues display, you have evidence that protease degradation is the limit; if it doesn’t, the limit is upstream (sporulation, fusion folding) and Week 12 tools redirect accordingly.
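The readout logic above can be sketched in a few lines. This is a toy illustration under stated assumptions: per-spore fluorescence values are already gated and exported from the cytometer, the median is used as the summary statistic, and the 1.5-fold rescue threshold is a placeholder rather than a validated cutoff. The function names and the numbers in the usage example are hypothetical.

```python
"""Sketch: compare displayed-VHH signal with and without CRISPRi induction."""

from statistics import median


def display_fold_change(uninduced: list[float], induced: list[float]) -> float:
    """Ratio of median per-spore fluorescence, induced over uninduced."""
    baseline = median(uninduced)
    if baseline <= 0:
        raise ValueError("uninduced median must be positive")
    return median(induced) / baseline


def interpret(fold_change: float, rescue_threshold: float = 1.5) -> str:
    """Map the fold change onto the two outcomes described in the text.

    rescue_threshold is an assumed placeholder, not a measured cutoff.
    """
    if fold_change >= rescue_threshold:
        return "protease-limited"
    return "limit is upstream"


if __name__ == "__main__":
    # Made-up fluorescence values, for illustration only.
    no_inducer = [100.0, 120.0, 110.0]
    with_inducer = [300.0, 330.0, 360.0]
    fold = display_fold_change(no_inducer, with_inducer)
    print(fold, interpret(fold))
```

In practice the comparison would use biological replicates and a proper statistical test; the point here is only that the experiment reduces to one ratio and one branch.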

Cross-check before publication. The proteolytic-environment challenge is explicitly called out in the original Cholera Shield brainstorm brief (Cholera_Shield_BSubtilis_Project/00_Original_Brainstorm_Brief.md) under “Challenge 1.” This Week 12 application is a direct technical answer to that challenge, not a speculative add-on — worth flagging in the next iteration of the project plan.

Course resources

Lecture (Apr 21): George Church, John Glass & Jef Boeke — Building Genomes
Recitation (Apr 22): Ice Kiattisewee — CRISPR-based Metabolic Engineering
HTGAA Week 12 page: https://2026a.htgaa.org/2026a/course-pages/weeks/week-12/ [UNVERIFIED — link constructed from course URL pattern; confirm before publishing]
JCVI Syn3.0 background: https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell
Sc2.0 consortium: https://www.cell.com/consortium/synthetic-yeast-genome
C321.ΔA strain (Addgene #48998): https://www.addgene.org/48998/
Syn61 strain (Addgene #174513): https://www.addgene.org/174513/

References (inline)

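Hutchison et al. (2016). Design and synthesis of a minimal bacterial genome. Science 351: aad6253. DOI 10.1126/science.aad6253.
Lajoie et al. (2013). Genomically recoded organisms expand biological functions. Science 342: 357–360. DOI 10.1126/science.1241459.
Fredens et al. (2019). Total synthesis of Escherichia coli with a recoded genome. Nature 569: 514–518. DOI 10.1038/s41586-019-1192-5.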
Annaluru et al. (2014). Total synthesis of a functional designer eukaryotic chromosome. Science 344(6179): 55–58. DOI 10.1126/science.1249252. — First Sc2.0 synthetic chromosome (synIII), 272,871 bp replacing native 316,617 bp chromosome III.
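Zhao et al. (2023). Debugging and consolidating multiple synthetic chromosomes reveals combinatorial genetic interactions. Cell 186: 5220–5236. DOI 10.1016/j.cell.2023.09.025.
Wang et al. (2009). Programming cells by multiplex genome engineering and accelerated evolution. Nature 460: 894–898. DOI 10.1038/nature08187.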
Tian et al. (2023). Multiplexed CRISPRi-mediated isoprenol production in E. coli. [Microb Cell Fact reference cited in PMC10659101 — confirm exact citation before publishing.]
Mandell, Lajoie et al. (2015). Biocontainment of genetically modified organisms by synthetic protein design. Nature 518: 55–60. DOI 10.1038/nature14121. — NCAA-dependent biocontainment in recoded E. coli (C321.ΔA), cited in the Cholera Shield tie-in.
Yang et al. (2018). Complete genome sequence of Bacillus subtilis strain WB800N, an extracellular protease-deficient derivative of strain 168. Microbiology Resource Announcements 7: e01380-18. DOI 10.1128/mra.01380-18. — Reference for the B. subtilis WB800/WB800N protease-deletion background cited in the Cholera Shield tie-in.
Cello, Paul & Wimmer (2002). Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science 297: 1016–1018. DOI 10.1126/science.1072266. — Cited in DNA-writing timeline.
Gibson et al. (2008). Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319: 1215–1220. DOI 10.1126/science.1151721. — 582,970 bp synthetic M. genitalium JCVI-1.0; cited in DNA-writing timeline.
Isaacs et al. (2011). Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333(6040): 348–353. DOI 10.1126/science.1205822. — CAGE conjugative assembly merging 32 partially recoded E. coli strains (314 of 321 TAG → TAA codons converted via single-elimination “playoff bracket”). Predecessor to Lajoie 2013 C321.ΔA.
Peters et al. (2016). A Comprehensive, CRISPR-based functional analysis of essential genes in bacteria. Cell 165: 1493–1506. DOI 10.1016/j.cell.2016.05.003. — Genome-wide CRISPRi library for all 289 essential genes in B. subtilis. Cited as the established CRISPRi-in-Bacillus framework for the Cholera Shield protease-repression recommendation.
Adhin & van Duin (1990). Scanning model for translational reinitiation in eubacteria. Journal of Molecular Biology 213: 811–818. — Cited for the ~5% ribosome slip-back mechanism that initiates MS2 L lysis protein translation after coat termination.
Lecture (Apr 21, 2026): George Church, John Glass, Jef Boeke. Building Genomes. HTGAA Spring 2026. Quoted material from the recording transcript (uploads/GMT20260421-180630_Recording.transcript.vtt).

Last updated: 2026-05-26