Week 2 HW: DNA Read Write and Edit
Part 3: DNA Design Challenge
3.1. Choose your protein.
Chosen protein: TlpA (temperature-sensing transcriptional repressor) from Salmonella typhimurium (UniProt: Q56080)
Why I chose it:
- TlpA is a protein “thermometer”: it changes its oligomeric/structural state with temperature. This temperature-triggered structural shift maps conceptually onto smart textiles that respond to heat or cold.
- In synthetic biology, thermosensitive repressors like TlpA are used as temperature-controlled switches (gene expression ON/OFF based on temperature).
TlpA_Salmonella_typhimurium protein sequence (UniProt accession Q56080), available in FASTA format from:
https://www.ncbi.nlm.nih.gov/protein/Q56080
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
Using the Benchling back-translation tool, I generated a DNA coding sequence for the TlpA amino acid sequence. Because of codon degeneracy, this is only one possible DNA sequence encoding TlpA; many different nucleotide sequences encode the same protein, depending on codon choice.
This version uses standard bacterial codons suitable for E. coli expression.
>tlpA_reverse_translated_DNA
ATGCGTCCGGCGACCTACTACGAACCGGAACAGATTATTGAAGCGGGCCTGGCGCTGCAG
GCGGAAGGCCGTAACATTACCGGCTTTGCGCTGCGTAACCAGGTGGGCGGCGGCAACCCG
ACCCGTCTGCGTCAGATTTGGGATGAATACCAGGCGTCGCAGTCTCAGACCGTGGTCACC
GAACCGGTGGCGGAGCTGCCGGTGGAGGTCGCGGAGGAGGTGAAAGCGGTGTCGGCGGCG
CTGTCGGAGCGTATTACCCAGCTGGCGACCGAGCTGAACGACAAGGCGGTGCGTGCGGCG
GAGCGCGTGGCAGAGGTGACCCGTGCGGCGGAGCAGACCGCGCAGGCGGAGCGCGAGCTG
GCGGACGCGGCGCAGACCGTGGACGACCTGGAGGAGAAGCTGGACGAGCTGCAGGACCGC
TACGACAGCCTGACCCTGGCGCTGGAGTCGGAGCGTTCGCTGCGTCAGCAGCACGACGTG
GAGATGGCGCAGCTGAAGGAGCGTCTGGCGGCGGAGGAGAACACCCGTCAGCGTGAGGAG
CGTTACCAGGAGCAGAAGACCGTGCTGCAGGACGCGCTGAACGCGGAGCAGGCGCAGCAC
AAGAACACCCGTGAGGACCTGCAGAAGCGTCTGGAGCAGATTTCTGCGGAGGCCAACGCG
CGTACCGAGGAGCTGAAGTCGGAGCGTGACAAGGTGAACACCCTGCTGACCCGTCTGGAG
TCGCAGGAGAACGCGCTGGCGTCGGAGCGTCAGCAGCACCTGGCGACCCGTGAGACCCTG
CAGCAGCGTCTGGAGCAGGCGATCGCGGACACCCAGGCGCGTGCGGGCGAGATCGCGCTG
GAGCGCGACCGCGTGTCGTCGCTGACCGCGCGTCTGGAGTCGCAGGAGAAGGCGTCGTCG
GAGCAGCTGGTGCGCATGGGCTCGGAGATCGCGTCGCTGACCGAGCGTTGCACCCAGCTG
GAGAACCAGCGCGACGACGCGCGTCTGGAGACCATGGGCGAGAAGGAGACCGTGGCGGAC
CTGCGCGGCGAGGCGGAGGCGCTGAAGCGTCAGAACCAGTCGCTGATGGCGGCGCTGTCT
GGCAACAAGCAGACCGGCGGCCAGAACGCGTAA
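To make the degeneracy point concrete, here is a minimal Python sketch (not Benchling's algorithm) that builds the standard genetic code, back-translates a peptide by picking one codon per amino acid, and counts how many coding sequences exist. The `preferred` mapping and the fallback rule "use the alphabetically first synonym" are illustrative assumptions.

```python
from math import prod
from typing import Dict, List, Optional

# Standard genetic code, built from the classic T/C/A/G-ordered codon table
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Invert the table: amino acid -> all synonymous codons ("*" = stop)
SYNONYMS: Dict[str, List[str]] = {}
for codon, aa in sorted(CODON_TABLE.items()):
    SYNONYMS.setdefault(aa, []).append(codon)


def back_translate(protein: str, preferred: Optional[Dict[str, str]] = None) -> str:
    """Return ONE DNA sequence encoding `protein`, followed by a TAA stop codon."""
    preferred = preferred or {}
    # Use the preferred codon if given, otherwise the alphabetically first synonym
    return "".join(preferred.get(aa, SYNONYMS[aa][0]) for aa in protein) + "TAA"


def translate(dna: str) -> str:
    """Translate codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def count_encodings(protein: str) -> int:
    """How many distinct coding sequences encode `protein` (stop codon not counted)."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


# The first 21 nt of the sequence above encode MRPATYY
print(translate("ATGCGTCCGGCGACCTACTAC"))
print(count_encodings("MRPA"))  # 1 * 6 * 4 * 4 = 96
print(translate(back_translate("MRPA")) == "MRPA")
```

Even the four-residue peptide MRPA has 96 possible encodings, so a full-length protein has an enormous number of them; that is why a back-translation tool has to apply some codon-choice rule.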
3.3. Codon optimization.
Although multiple codons encode the same amino acid, different organisms do not use synonymous codons equally. This is known as codon bias.
Codon optimization rewrites the DNA sequence without altering the amino acid sequence, selecting synonymous codons that match the host organism’s codon usage and tRNA pool. This typically increases protein expression efficiency while preserving protein function.
Codon optimization does not change the protein itself; it changes how efficiently the biological system produces it.
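A minimal sketch of the simplest optimization strategy: every codon is swapped for a single preferred synonym, and a translation check confirms the protein is unchanged. The preference table below (CTG for Leu, CGT for Arg, AAA for Lys) is a hypothetical placeholder; a real run would take its values from a measured E. coli codon usage table, and optimization tools usually weigh additional factors as well.

```python
from typing import Dict

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# HYPOTHETICAL preference table for illustration only (not measured E. coli data)
PREFERRED_CODONS = {"L": "CTG", "R": "CGT", "K": "AAA"}


def translate(dna: str) -> str:
    """Translate codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def optimize_codons(dna: str, preferred: Dict[str, str]) -> str:
    """Replace each codon with the preferred synonym for its amino acid."""
    for aa, codon in preferred.items():
        # Guard against a table entry that would change the protein
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    # Codons for amino acids not in the table (and stop codons) are kept as-is
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons)


original = "CTACGAAAGTAA"  # CTA CGA AAG TAA -> L R K, stop
optimized = optimize_codons(original, PREFERRED_CODONS)
print(optimized)                                    # CTGCGTAAATAA
print(translate(original) == translate(optimized))  # True: same protein
```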
I optimized the TlpA coding sequence for Escherichia coli (E. coli, K-12 strain).
Why E. coli?
- It is the most widely used host organism for recombinant protein expression.
- It allows rapid prototyping and testing of engineered constructs.
- Its codon usage bias is well documented, making optimization straightforward.
- TlpA is a bacterial protein, so expressing it in E. coli maintains a compatible folding and regulatory environment.
- Because TlpA is a temperature-sensitive transcriptional repressor, optimizing for E. coli allows efficient production and functional testing of its temperature-responsive behavior under controlled lab conditions.
3.4. You have a sequence! Now what?
Once the codon-optimized DNA sequence for TlpA is designed, it can be used to produce the protein using either cell-dependent or cell-free expression systems.
Cell-Dependent (In Vivo) Expression
The optimized TlpA gene is inserted into a plasmid containing a promoter and a ribosome binding site (RBS), and the plasmid is then transformed into E. coli.
Inside the cell:
- Transcription: RNA polymerase reads the DNA and produces mRNA (T is replaced by U).
- Translation: Ribosomes read the mRNA in 3-nucleotide codons and assemble amino acids into the TlpA protein.
- Folding: The protein folds into its functional, temperature-sensitive structure.
This allows the engineered cells to produce TlpA using their natural molecular machinery.
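The transcription and translation steps can be sketched in Python as string operations. This is a simplification: translation here starts at the first AUG, whereas real bacterial ribosomes choose the start codon with help from the ribosome binding site, and folding is not modeled at all. The short input sequence is made up for illustration.

```python
# RNA version of the standard genetic code (U/C/A/G ordering)
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA matches the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate_mrna(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon (simplified start choice)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = RNA_CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("GGATGTTTGGCTAAGG")
print(mrna)                  # GGAUGUUUGGCUAAGG
print(translate_mrna(mrna))  # MFG
```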
Cell-Free (In Vitro) Expression
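Alternatively, the DNA can be added to a cell-free transcription–translation system that supplies the expression machinery, either as a cell extract or as purified components (RNA polymerase, ribosomes, tRNAs, amino acids, nucleotides, and an energy source).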
In this setup, DNA → mRNA → protein occurs in a test tube, allowing rapid and controlled protein production without living cells.
Part 5: DNA Read/Write/Edit
5.1 DNA Read
(i) I want to read the biological code behind adaptive matter. I would sequence DNA from organisms and gene families that naturally exhibit signal-responsive material behavior, such as:
- Thermosensitive bacteria
- Phase-separating proteins
- Elastin-like polypeptides (ELPs)
- Stress-responsive regulatory networks
Specifically, I would sequence genes involved in:
- Temperature-sensitive protein folding
- Phase transitions
- Environmental sensing pathways
These systems encode how matter changes state in response to information. Sequencing them allows us to understand the genetic instructions that enable biological materials to:
- Fold differently at different temperatures
- Assemble or disassemble
- Switch functions based on environmental inputs
(ii) Based on the sequencing technologies covered, I would use Illumina and Nanopore to enable genome-scale decoding of signal-responsive systems.
1. What generation is this?
- Illumina → Second-generation sequencing
- Nanopore/PacBio → Third-generation sequencing
Illumina provides high accuracy and deep coverage.
Long-read (Nanopore) sequencing captures full-length genes and their structural context.
2. What is the input and preparation?
Input: Extracted genomic DNA or plasmid DNA.
Preparation steps (Illumina):
- DNA extraction
- Fragmentation
- Adapter ligation
- PCR amplification
- Cluster generation (bridge amplification on flow cell)
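The fragmentation and adapter-ligation steps can be pictured with a toy Python model. It cuts a single copy of a sequence into random-length pieces and flanks each with adapters; the adapter strings are hypothetical placeholders rather than real Illumina adapter sequences, and PCR and cluster generation are left out.

```python
import random
from typing import List

# Hypothetical placeholder adapters (NOT real Illumina adapter sequences)
LEFT_ADAPTER = "[ADAPTER-1]"
RIGHT_ADAPTER = "[ADAPTER-2]"


def fragment(dna: str, min_len: int, max_len: int, seed: int = 0) -> List[str]:
    """Cut `dna` into consecutive pieces of random length in [min_len, max_len]."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    pieces, pos = [], 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[pos:pos + size])  # the last piece may be shorter
        pos += size
    return pieces


def ligate_adapters(pieces: List[str]) -> List[str]:
    """Flank every insert with the two adapters, giving a toy sequencing library."""
    return [LEFT_ADAPTER + piece + RIGHT_ADAPTER for piece in pieces]


genome = "ATGCGTCCGGCGACCTACTACGAACCGGAACAGATTATTGAAGCG"
library = ligate_adapters(fragment(genome, 8, 15, seed=1))
for molecule in library:
    print(molecule)
```

In a real library, many copies of the genome are fragmented at different positions, so the resulting reads overlap and can be stitched back together computationally.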
3. How does it decode bases?
Sequencing by synthesis (Illumina):
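- In each cycle, all four fluorescently labeled reversible-terminator nucleotides are supplied; the terminator allows only one base to be added to each growing strand.
- After each cycle, the flow cell is imaged and the fluorescence of every cluster is recorded.
- The dye color identifies the incorporated base (A, T, C, or G); newer two-channel instruments encode the four bases with two dyes.
- The dye and terminator are then cleaved so the next cycle can begin, and software converts the fluorescence signals into base calls.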
This transforms chemical events into digital sequence information.
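A toy base caller makes the fluorescence-to-base step concrete. It assumes four-color chemistry with one intensity per base per cycle, calls the brightest channel, and turns the share of signal in the other channels into a Phred-style quality, Q = -10·log10(p_error). This is an illustrative simplification, not Illumina's base-calling software, which also corrects for effects such as phasing and dye cross-talk.

```python
import math
from typing import List, Sequence, Tuple

CHANNELS = "ACGT"  # assumption: intensities are given in A, C, G, T order


def call_bases(cycles: Sequence[Sequence[float]]) -> Tuple[str, List[int]]:
    """Call one base per cycle for a single cluster and attach a Phred quality."""
    bases, quals = [], []
    for intensities in cycles:
        total = sum(intensities)
        if total <= 0:
            bases.append("N")  # no signal: uncalled base
            quals.append(0)
            continue
        best = max(range(4), key=lambda i: intensities[i])
        # Crude error estimate: share of signal outside the brightest channel,
        # floored at 1e-4 so the quality is capped at Q40
        p_error = max(1 - intensities[best] / total, 1e-4)
        bases.append(CHANNELS[best])
        quals.append(round(-10 * math.log10(p_error)))
    return "".join(bases), quals


# Two imaging cycles for one cluster: a clear "A", then a clear "G"
print(call_bases([(900, 50, 30, 20), (10, 10, 980, 0)]))  # ('AG', [10, 17])
```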
4. What is the output?
- Millions of short DNA reads
- FASTQ files with quality scores
- Digital base sequences for downstream analysis
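A FASTQ record is four lines: an `@` header, the bases, a `+` separator, and one quality character per base. The sketch below assumes the Phred+33 encoding used by current Illumina output and does not handle wrapped (multi-line) records; it decodes each quality character into a numeric score.

```python
from typing import List, Tuple

Record = Tuple[str, str, List[int]]


def parse_fastq(text: str) -> List[Record]:
    """Parse 4-line FASTQ records into (read_id, sequence, quality_scores)."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: quality score = ASCII code - 33
        scores = [ord(ch) - 33 for ch in qual]
        records.append((header[1:], seq, scores))
    return records


example = "@read1\nACGT\n+\nII5#\n"
for read_id, seq, scores in parse_fastq(example):
    print(read_id, seq, scores)  # read1 ACGT [40, 40, 20, 2]
```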
5.2 DNA Write
(i) I would synthesize a temperature-responsive genetic circuit, combining:
- TlpA (temperature-sensitive repressor)
- A TlpA-regulated promoter (repressed at low temperature, released as temperature rises)
- A structural protein domain (e.g., elastin-like polypeptide)
- A reporter gene (GFP)
This would encode a system where:
Temperature signal → Gene regulation → Material phase shift
This directly supports programming matter to change state based on environmental signals.
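The signal chain above can be sketched as a toy model. It assumes that output from the TlpA-repressed promoter follows a logistic curve in temperature: OFF while TlpA is bound at low temperature, ON once it releases the DNA. The midpoint (40 °C), steepness and ON threshold are hypothetical numbers chosen for illustration, not measured TlpA parameters.

```python
import math

# HYPOTHETICAL parameters for illustration, not measured TlpA values
MIDPOINT_C = 40.0   # temperature of half-maximal expression
STEEPNESS_C = 1.0   # width of the transition, in degrees C
ON_THRESHOLD = 0.5  # output fraction above which the circuit counts as ON


def expression_level(temp_c: float) -> float:
    """Fraction of maximal GFP/ELP output from the TlpA-repressed promoter."""
    # Logistic switch: low temperature -> repressed (near 0), high -> released (near 1)
    return 1.0 / (1.0 + math.exp(-(temp_c - MIDPOINT_C) / STEEPNESS_C))


def circuit_state(temp_c: float) -> str:
    """Temperature signal -> gene regulation -> ON/OFF output."""
    return "ON" if expression_level(temp_c) > ON_THRESHOLD else "OFF"


for t in (30, 37, 40, 43, 50):
    print(f"{t} C: expression {expression_level(t):.3f}, circuit {circuit_state(t)}")
```

Fitting MIDPOINT_C and STEEPNESS_C to GFP fluorescence measured at several temperatures would replace these placeholders with real values.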
(ii) For DNA synthesis I would use:
- Silicon-based high-throughput oligo synthesis (Twist Bioscience model)
- Gene assembly (Gibson Assembly)
1. Essential steps of DNA synthesis
- Chemical synthesis of short oligonucleotides
- Cleavage and deprotection
- Assembly into longer constructs
- Cloning into plasmid
- Sequence verification
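The assembly step relies on neighboring fragments sharing identical ends. A minimal sketch, assuming a fixed fragment length and overlap (both hypothetical parameters), splits a target into overlapping pieces and checks that they join back into the original. It models only the sequence bookkeeping, not the exonuclease, polymerase and ligase chemistry of Gibson Assembly.

```python
from typing import List


def design_fragments(seq: str, frag_len: int, overlap: int) -> List[str]:
    """Split `seq` into pieces of `frag_len` whose neighbors share `overlap` bases."""
    if not 0 < overlap < frag_len:
        raise ValueError("need 0 < overlap < frag_len")
    step = frag_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            break  # this fragment reaches the end of the target
        start += step
    return fragments


def assemble(fragments: List[str], overlap: int) -> str:
    """Join fragments end to end, merging the shared homology regions."""
    seq = fragments[0]
    for frag in fragments[1:]:
        if seq[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        seq += frag[overlap:]
    return seq


target = (
    "ATGCGTCCGGCGACCTACTACGAACCGGAACAGATTATTGAAGCGGGCCTGGCGCTGCAG"
    "GCGGAAGGCCGTAACATTACCGGCTTTGCGCTGCGTAACCAGGTGGGCGGCGGCAACCCG"
)
parts = design_fragments(target, frag_len=40, overlap=20)
print(len(parts), assemble(parts, overlap=20) == target)  # 5 True
```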
2. Limitations
- Errors accumulate with length (the fraction of error-free molecules drops as constructs get longer)
- Repetitive sequences are difficult
- Cost increases with scale
- Large constructs require hierarchical assembly
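The first limitation follows from simple compounding. If each base is made correctly with independent probability 1 - e, a molecule of length L is error-free with probability (1 - e)^L. The sketch below uses a hypothetical 1% per-base error rate purely for illustration; at that rate only about 13% of 200-nt molecules would be perfect, which is one reason long constructs are assembled hierarchically from shorter, verified pieces.

```python
def fraction_error_free(length: int, per_base_error: float) -> float:
    """Probability that a molecule of `length` bases carries no synthesis error."""
    return (1.0 - per_base_error) ** length


def clones_per_correct(length: int, per_base_error: float) -> float:
    """Expected number of clones screened to find one error-free molecule."""
    return 1.0 / fraction_error_free(length, per_base_error)


ERROR_RATE = 0.01  # HYPOTHETICAL per-base error rate, for illustration only
for n in (60, 200, 1000):
    print(n, round(fraction_error_free(n, ERROR_RATE), 4), round(clones_per_correct(n, ERROR_RATE), 1))
```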
5.3 DNA Edit
(i) I would edit microbial genomes to embed temperature-sensitive regulatory modules directly into the chromosome. This would allow cells to produce structural or phase-transition proteins only under defined environmental conditions, enabling adaptive biological materials.
(ii) I would use CRISPR-Cas9 genome editing, potentially extended to base editing or prime editing, which make precise changes without a double-strand break.
1. How does CRISPR edit DNA?
- A guide RNA targets a specific DNA sequence.
- Cas9 creates a double-strand break.
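- The cell repairs the break:
  - NHEJ → insertions/deletions
  - HDR → precise edits using a repair template
- In E. coli, which lacks a canonical NHEJ pathway, Cas9 cutting is often combined with λ-Red recombineering and a donor template; the break kills unedited cells, which selects for the edit.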
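Why an NHEJ indel usually knocks a gene out can be shown with a made-up toy sequence (not TlpA): inserting one base shifts the reading frame and here exposes a premature stop codon, while deleting a whole codon keeps the frame intact.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_bases(dna: str, pos: int, bases: str) -> str:
    """Simulate an insertion at `pos` (e.g. an NHEJ repair outcome)."""
    return dna[:pos] + bases + dna[pos:]


def delete_bases(dna: str, pos: int, n: int) -> str:
    """Simulate deleting `n` bases starting at `pos`."""
    return dna[:pos] + dna[pos + n:]


gene = "ATGGCTAAGGCTGAATAA"  # hypothetical toy ORF: M A K A E, stop
print(translate(gene))                        # MAKAE
print(translate(insert_bases(gene, 4, "T")))  # MV (+1 frameshift reaches TAA)
print(translate(delete_bases(gene, 3, 3)))    # MKAE (in-frame codon loss)
```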
2. Preparation and input
Required components:
- Guide RNA
- Cas9 protein or plasmid
- Donor DNA template (for precise edits)
- Target cells
Design steps include PAM identification and off-target analysis.
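PAM identification and a first-pass off-target check can be sketched for SpCas9, whose PAM is NGG. The code scans both strands for NGG sites with 20 nt of room for a protospacer, then counts mismatches against a guide. Real design tools weight mismatches by position (the PAM-proximal seed region matters most) and search the whole genome, so this is only an illustration; the example sequence is made up.

```python
import re
from typing import List, Tuple

Hit = Tuple[str, int, str, str]  # (strand, start, protospacer, PAM)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def find_protospacers(dna: str, guide_len: int = 20) -> List[Hit]:
    """Every NGG PAM (SpCas9) with `guide_len` bases of room on its 5' side."""
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # Zero-width lookahead so overlapping PAMs (e.g. in "AGGG") are all found
        for m in re.finditer(r"(?=[ACGT]GG)", seq):
            pam = m.start()
            if pam >= guide_len:
                # "-" strand positions are in reverse-complement coordinates
                hits.append((strand, pam - guide_len, seq[pam - guide_len:pam], seq[pam:pam + 3]))
    return hits


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(guide: str, dna: str, max_mismatches: int = 3) -> List[Tuple[Hit, int]]:
    """Protospacers within `max_mismatches` of the guide (the on-target site has 0)."""
    scored = [(hit, mismatches(guide, hit[2])) for hit in find_protospacers(dna, len(guide))]
    return [(hit, mm) for hit, mm in scored if mm <= max_mismatches]


dna = "AAAA" + "GACCTGAAGCTTCGATCCAA" + "AGG" + "TTTT"  # made-up target region
guide = "GACCTGAAGCTTCGATCCAA"
for hit, mm in candidate_sites(guide, dna):
    print(hit, "mismatches:", mm)
```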
3. Limitations
- Off-target edits
- Variable efficiency
- Delivery challenges
- HDR is less efficient than NHEJ