Individual Final Project

Mapping the Thermodynamic Rules of Toehold Switch Function in Spinach Chloroplast Cell-Free Expression: An LDBT Approach
Abstract
Chloroplast cell-free expression (CFE) systems have recently been established as rapid-prototyping platforms for plastid genetic parts, yet whether they can support synthetic RNA logic has not been tested. Toehold switches, de novo-designed riboregulators that activate translation in response to specific trigger RNAs, are among the most programmable RNA gates in synthetic biology. Machine learning frameworks such as SANDSTORM (Riley et al., 2025), trained on E. coli CFE data, have begun to extract sequence-structure features predictive of switch performance, but whether those learned relationships hold in a chloroplast ribosome context is unknown. This project addresses that gap directly.
Applying the Learn-Design-Build-Test (LDBT) framework, we train a SANDSTORM predictive neural network, a dual-input CNN that takes one-hot-encoded RNA sequence and secondary structure arrays (Riley et al., 2025), on the publicly available 181-switch E. coli dataset to learn sequence-structure-function relationships for toehold switches. The trained SANDSTORM model is then paired with GARDN (Generative Adversarial RNA Design Network) to generate 12–15 novel toehold switch candidates with high predicted ON/OFF ratios for testing in a chloroplast ribosome context, including designs triggered by potato virus Y (PVY) coat protein mRNA. Whole-plasmid constructs are ordered from Twist Bioscience and tested in both spinach chloroplast CFE and crude E. coli S30 extract. A second SANDSTORM model retrained on the resulting chloroplast data constitutes, to our knowledge, the first sequence-structure-function ML model for toehold switches in a plant-native ribosomal context. The project therefore delivers three outputs: the first empirical dataset and neural network model of toehold switch performance in plant chloroplast CFE, a GARDN-SANDSTORM LDBT workflow transferable to any novel ribosome context, and a foundation for programmable RNA diagnostics manufacturable directly from plant material. All experiments are performed using the Ginkgo Bioworks autonomous laboratory infrastructure and grocery-store spinach, demonstrating that LDBT with deep learning can be executed with widely accessible materials.
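
As a minimal sketch of the dual-input encoding mentioned above, the Python below builds a one-hot sequence array and a binary array of candidate base pairs for a single RNA. The function names, the Watson-Crick plus G-U wobble pairing rule, and the min_loop parameter are illustrative assumptions; they are not taken from the published SANDSTORM code, whose structure encoding may differ.

```python
# Illustrative sketch only: helper names and the pairing rule are assumptions,
# not the published SANDSTORM implementation.
NUCLEOTIDES = "ACGU"

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def one_hot_sequence(seq):
    """Return a 4 x L matrix (rows A, C, G, U) for an RNA sequence."""
    seq = seq.upper().replace("T", "U")
    return [[1 if base == nt else 0 for base in seq] for nt in NUCLEOTIDES]


def pairing_matrix(seq, min_loop=3):
    """Return an L x L binary matrix marking positions that could base-pair.

    Positions separated by min_loop nucleotides or fewer are excluded
    (an assumed cutoff that rules out very tight hairpin loops).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    return [
        [1 if abs(i - j) > min_loop and (seq[i], seq[j]) in PAIRS else 0
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    example = "GGGAAAUCCC"  # toy hairpin-like sequence, not a real switch
    seq_input = one_hot_sequence(example)
    struct_input = pairing_matrix(example)
    print(len(seq_input), "x", len(seq_input[0]))
    print(len(struct_input), "x", len(struct_input[0]))
```

For an RNA of length L this gives a 4 x L sequence input and an L x L structure input, which could feed the two branches of a dual-input CNN.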