Week 2 HW: DNA Read Write and Edit

Part 3: DNA Design Challenge

3.1. Choose your protein.

Chosen protein: TlpA (temperature-sensing transcriptional repressor) from Salmonella typhimurium (UniProt: Q56080)

Why I chose it:

TlpA is a protein “thermometer”: it changes its oligomeric/structural state with temperature, which is exactly the kind of temperature-triggered phase/structure shift that maps conceptually to smart textiles that respond to heat/cold.
In synthetic biology, thermosensitive repressors like TlpA are used as temperature-controlled switches (gene expression ON/OFF based on temperature).

TlpA (Salmonella typhimurium) protein sequence, shown in GenBank-style numbered blocks of ten residues (source: UniProt Q56080):

https://www.ncbi.nlm.nih.gov/protein/Q56080

        1 mrpatyepeq iieaglalqa egrnitgfal rnqvgggnpt rlrqiwdeyq asqstvvtep
       61 vaelpvevae evkavsaals eritqlatel ndkavraaer rvaevtraag eqtaqaerel
      121 adaaqtvddl eekldelqdr ydsltlales erslrqqhdv emaqlkerla aaeentrqre
      181 eryqeqktvl qdalnaeqaq hkntredlqk rleqisaean arteelkser dkvntlltrl
      241 esqenalase rqqhlatret lqqrleqaia dtqarageia lerdrvsslt arlesqekas
      301 seqlvrmgse iaslterctq lenqrddarl etmgeketva dlrgeaealk rqnqslmaal
      361 sgnkqtggqn a

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Using the Benchling back-translation tool, I generated a DNA coding sequence for the TlpA amino acid sequence.

Because the genetic code is degenerate, this is only one of many possible DNA sequences encoding TlpA; other codon choices would give different nucleotide sequences for the same protein.

This version uses common bacterial codons suitable for E. coli expression.

tlpA_reverse_translated_DNA ATGCGTCCGGCGACCTACTACGAACCGGAACAGATTATTGAAGCGGGCCTGGCGCTGCAG GCGGAAGGCCGTAACATTACCGGCTTTGCGCTGCGTAACCAGGTGGGCGGCGGCAACCCG ACCCGTCTGCGTCAGATTTGGGATGAATACCAGGCGTCGCAGTCTCAGACCGTGGTCACC GAACCGGTGGCGGAGCTGCCGGTGGAGGTCGCGGAGGAGGTGAAAGCGGTGTCGGCGGCG CTGTCGGAGCGTATTACCCAGCTGGCGACCGAGCTGAACGACAAGGCGGTGCGTGCGGCG GAGCGCGTGGCAGAGGTGACCCGTGCGGCGGAGCAGACCGCGCAGGCGGAGCGCGAGCTG GCGGACGCGGCGCAGACCGTGGACGACCTGGAGGAGAAGCTGGACGAGCTGCAGGACCGC TACGACAGCCTGACCCTGGCGCTGGAGTCGGAGCGTTCGCTGCGTCAGCAGCACGACGTG GAGATGGCGCAGCTGAAGGAGCGTCTGGCGGCGGAGGAGAACACCCGTCAGCGTGAGGAG CGTTACCAGGAGCAGAAGACCGTGCTGCAGGACGCGCTGAACGCGGAGCAGGCGCAGCAC AAGAACACCCGTGAGGACCTGCAGAAGCGTCTGGAGCAGATTTCTGCGGAGGCCAACGCG CGTACCGAGGAGCTGAAGTCGGAGCGTGACAAGGTGAACACCCTGCTGACCCGTCTGGAG TCGCAGGAGAACGCGCTGGCGTCGGAGCGTCAGCAGCACCTGGCGACCCGTGAGACCCTG CAGCAGCGTCTGGAGCAGGCGATCGCGGACACCCAGGCGCGTGCGGGCGAGATCGCGCTG GAGCGCGACCGCGTGTCGTCGCTGACCGCGCGTCTGGAGTCGCAGGAGAAGGCGTCGTCG GAGCAGCTGGTGCGCATGGGCTCGGAGATCGCGTCGCTGACCGAGCGTTGCACCCAGCTG GAGAACCAGCGCGACGACGCGCGTCTGGAGACCATGGGCGAGAAGGAGACCGTGGCGGAC CTGCGCGGCGAGGCGGAGGCGCTGAAGCGTCAGAACCAGTCGCTGATGGCGGCGCTGTCT GGCAACAAGCAGACCGGCGGCCAGAACGCGTAA
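To make the degeneracy point concrete, here is a minimal Python sketch that counts how many DNA sequences can encode a short peptide and checks a naive back-translation by translating it again. It uses the standard genetic code (NCBI translation table 1); the function names and the "first listed codon" rule are my own illustrative choices, not what Benchling does internally.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(protein):
    """Number of distinct DNA sequences encoding this peptide (stop codon excluded)."""
    total = 1
    for aa in protein.upper():
        total *= len(SYNONYMS[aa])
    return total


def back_translate(protein):
    """Naive back-translation: take the first listed codon for each residue."""
    return "".join(SYNONYMS[aa][0] for aa in protein.upper())


def translate(dna):
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = "MRPATYEPEQ"  # first ten residues of the TlpA sequence in 3.1
dna = back_translate(peptide)
print(count_encodings(peptide))        # number of possible encodings
print(dna, translate(dna) == peptide)  # round-trip check
```

Running the same round-trip check on a full back-translated gene is a quick way to catch an inserted or dropped codon before ordering the DNA.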

3.3. Codon optimization.

Although multiple codons encode the same amino acid, different organisms do not use synonymous codons equally. This is known as codon bias.

Codon optimization rewrites the DNA sequence without altering the amino acid sequence, selecting synonymous codons that match the host organism’s translational machinery. This can increase protein expression efficiency while preserving protein function.

In other words, codon optimization does not change the protein itself; it changes how efficiently the biological system produces it.

I optimized the TlpA coding sequence for Escherichia coli (E. coli, K-12 strain).

Why? E. coli is the most widely used host organism for recombinant protein expression. It allows rapid prototyping and testing of engineered constructs. Its codon usage bias is well-documented, making optimization straightforward. Since TlpA is a bacterial protein, expressing it in E. coli maintains a compatible folding and regulatory environment. Given that TlpA is a temperature-sensitive transcriptional repressor, optimizing for E. coli allows efficient production and functional testing of its temperature-responsive behavior under controlled lab conditions.
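The idea of swapping in synonymous codons can be sketched in a few lines of Python. The PREFERRED table below is an illustrative assumption (one commonly cited frequent codon per amino acid), not an authoritative E. coli K-12 usage table, and real optimizers also weigh GC content, repeats, and mRNA structure. The sketch only shows that the DNA changes while the protein does not.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# ASSUMPTION: illustrative "preferred" codon per amino acid for E. coli.
# Replace with values from a real codon usage table before relying on it.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def translate(dna):
    """Translate every complete codon, keeping '*' for stop codons."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def optimize(dna):
    """Replace each codon with the preferred synonym for the same amino acid."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = "".join(
        PREFERRED[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna), 3)
    )
    # Safety check: the encoded protein must be unchanged.
    if translate(optimized) != translate(dna):
        raise RuntimeError("optimization changed the protein sequence")
    return optimized


original = "ATGAGACCAGCAACATATGAGCCAGAACAATAA"  # encodes MRPATYEPEQ + stop
optimized = optimize(original)
print(original)
print(optimized)
print(translate(original) == translate(optimized))
```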

3.4. You have a sequence! Now what?

Once the codon-optimized DNA sequence for TlpA is designed, it can be used to produce the protein using either cell-dependent or cell-free expression systems.

Cell-Dependent (In Vivo) Expression

The optimized TlpA gene is inserted into a plasmid containing a promoter and ribosome binding site, then transformed into E. coli.

Inside the cell:

Transcription: RNA polymerase reads the DNA and produces mRNA (T is replaced by U).
Translation: Ribosomes read the mRNA in 3-nucleotide codons and assemble amino acids into the TlpA protein.
Folding: The protein folds into its functional, temperature-sensitive structure.

This allows the engineered cells to produce TlpA using their natural molecular machinery.
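The transcription and translation steps above can be mimicked in a short Python sketch. It assumes the input is the coding (sense) strand, so transcription is just T to U, and it starts translating at the first AUG it finds; the function names and the toy sequence are illustrative only.

```python
from itertools import product

# Standard genetic code written with RNA bases, codons ordered U, C, A, G.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODONS = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon (or the end of the mRNA)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = RNA_CODONS[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


cds = "ATGCGTCCGGCGACCTAA"  # toy coding sequence: M R P A T + stop
mrna = transcribe(cds)
print(mrna)
print(translate(mrna))
```

Folding is not modeled here; it depends on the protein's physical chemistry rather than on a lookup table.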

Cell-Free (In Vitro) Expression

Alternatively, the DNA can be added to a cell-free transcription–translation system containing purified RNA polymerase, ribosomes, tRNAs, and amino acids.

In this setup, DNA → mRNA → protein occurs in a test tube, allowing rapid and controlled protein production without living cells.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) I want to read the biological code behind adaptive matter. I would sequence DNA from organisms, and DNA encoding molecular systems, that naturally exhibit signal-responsive material behavior, such as:

Thermosensitive bacteria
Phase-separating proteins
Elastin-like polypeptides (ELPs)
Stress-responsive regulatory networks

Specifically, I would sequence genes involved in:

Temperature-sensitive protein folding
Phase transitions
Environmental sensing pathways

These systems encode how matter changes state in response to information. Sequencing them allows us to understand the genetic instructions that enable biological materials to:

Fold differently at different temperatures
Assemble or disassemble
Switch functions based on environmental inputs

(ii) Based on the sequencing technologies mentioned, I would use Illumina and Nanopore to enable genome-scale decoding of signal-responsive systems.

1. What generation is this?

Illumina → Second-generation sequencing
Nanopore/PacBio → Third-generation sequencing

Illumina provides high accuracy and deep coverage.

Long-read sequencing captures full-length genes and structural context.

2. What is the input and preparation?

Input: Extracted genomic DNA or plasmid DNA.

Preparation steps (Illumina):

DNA extraction
Fragmentation
Adapter ligation
PCR amplification
Cluster generation (bridge amplification on flow cell)

3. How does it decode bases?

Sequencing by synthesis (Illumina):

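All four fluorescently labeled reversible-terminator nucleotides are supplied together, and the polymerase incorporates exactly one per strand in each cycle.
After each incorporation, a camera images the flow cell and detects the fluorescence of every cluster.
Each color corresponds to A, T, C, or G; the dye and terminator are then cleaved so the next cycle can begin.
Software converts the fluorescence signals into base calls.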

This transforms chemical events into digital sequence information.
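As a toy illustration of that last step, the sketch below calls one base per cycle by picking the brightest of four channels for a single cluster. The intensity values and the 0.8 ambiguity threshold are made up for illustration, and real base callers also correct for channel crosstalk and phasing, which this ignores.

```python
CHANNELS = ("A", "C", "G", "T")


def call_bases(cycles, max_ratio=0.8):
    """Call one base per cycle from (A, C, G, T) intensity tuples.

    A cycle is called 'N' when the second-brightest channel is at least
    max_ratio times the brightest one (hypothetical ambiguity rule).
    """
    calls = []
    for intensities in cycles:
        ranked = sorted(zip(intensities, CHANNELS), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        if top > 0 and second / top < max_ratio:
            calls.append(base)
        else:
            calls.append("N")
    return "".join(calls)


# Hypothetical intensities for one cluster over three cycles.
cluster = [
    (0.90, 0.10, 0.05, 0.02),  # clearly A
    (0.10, 0.05, 0.80, 0.10),  # clearly G
    (0.40, 0.45, 0.10, 0.05),  # A and C too close to separate
]
print(call_bases(cluster))  # prints AGN
```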

4. What is the output?

Millions of short DNA reads
FASTQ files with quality scores
Digital base sequences for downstream analysis
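A FASTQ record is four lines: an @ header, the bases, a + separator, and one quality character per base. The sketch below reads such records and decodes the qualities, assuming the common Phred+33 encoding (Q = ASCII code minus 33, error probability = 10^(-Q/10)). The example read is made up.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        # Phred+33: subtract 33 from each character's ASCII code.
        yield header[1:], sequence, [ord(c) - 33 for c in quality]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Made-up example record; 'I' decodes to Q40, '5' to Q20, '!' to Q0.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
for read_id, sequence, scores in records:
    print(read_id, sequence, scores)
    print([error_probability(q) for q in scores])
```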

5.2 DNA Write

(i) I would synthesize a temperature-responsive genetic circuit, combining:

TlpA (temperature-sensitive repressor)
A regulatory promoter
A structural protein domain (e.g., elastin-like polypeptide)
A reporter gene (GFP)

This would encode a system where:

Temperature signal → Gene regulation → Material phase shift

This directly supports programming matter to change state based on environmental signals.

(ii) For DNA synthesis I would use:

Silicon-based high-throughput oligo synthesis (Twist Bioscience model)
Gene assembly (Gibson Assembly)

1. Essential steps of DNA synthesis

Chemical synthesis of short oligonucleotides
Cleavage and deprotection
Assembly into longer constructs
Cloning into plasmid
Sequence verification

2. Limitations

Error rates increase with length
Repetitive sequences are difficult
Cost increases with scale
Large constructs require hierarchical assembly
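The first limitation can be put in numbers with a simple model: if each base has an independent chance e of being wrong, the fraction of error-free molecules of length L is (1 - e)^L. The per-base error rate of 1 in 1000 below is a hypothetical value chosen for illustration, not a vendor specification; 1116 nt is the length of a 371-residue coding sequence plus a stop codon.

```python
def fraction_error_free(length, per_base_error):
    """Fraction of molecules with no errors, assuming independent per-base errors."""
    return (1.0 - per_base_error) ** length


# ASSUMPTION: hypothetical per-base error rate of 1 in 1000.
ERROR_RATE = 1 / 1000

for length in (60, 300, 1116):
    fraction = fraction_error_free(length, ERROR_RATE)
    print(f"{length:>5} nt: {fraction:.1%} expected error-free")
```

Under this model the error-free fraction falls off exponentially with length, which is why long genes are assembled from shorter oligos and then sequence-verified.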

5.3 DNA Edit

(i) I would edit microbial genomes to embed temperature-sensitive regulatory modules directly into the chromosome. This would allow cells to produce structural or phase-transition proteins only under defined environmental conditions, enabling adaptive biological materials.

(ii) I would use the CRISPR-Cas9 genome editing tool, potentially together with its base editing and prime editing variants.

1. How does CRISPR edit DNA?

A guide RNA targets a specific DNA sequence.
Cas9 creates a double-strand break.
The cell repairs the break:
- NHEJ → insertions/deletions
- HDR → precise edits using a repair template

2. Preparation and input

Required components:

Guide RNA
Cas9 protein or plasmid
Donor DNA template (for precise edits)
Target cells

Design steps include PAM identification and off-target analysis.

3. Limitations

Off-target edits
Variable efficiency
Delivery challenges
HDR is less efficient than NHEJ