Week 2 HW: DNA Design Challenge

⚙️ 3.1 Choose a protein

I chose the ATP synthase beta subunit because it carries the catalytic sites of a biological motor, ATP synthase, and connects to my broader interest in energy systems:

Protons flow down their electrochemical gradient across the inner mitochondrial membrane, almost like current moving through a circuit, and that flow physically spins part of the ATP synthase complex (the membrane rotor and its central stalk) like a tiny turbine. The rotating stalk drives conformational changes in the beta subunits, which catalyze the formation of ATP from ADP and inorganic phosphate.

So energy stored in a gradient is literally converted into mechanical motion and then into chemical energy. I find that idea really compelling: it’s molecular thermodynamics in action, where fundamental physical laws become something tangible inside living cells.

From NCBI I obtained the protein sequence:

https://www.ncbi.nlm.nih.gov/protein/NP_001677.2/

https://www.ncbi.nlm.nih.gov/protein/NP_001677.2?report=fasta

NP_001677.2 ATP synthase F(1) complex subunit beta, mitochondrial precursor [Homo sapiens]

MLGFVGRVAAAPASGALRRLTPSASLPPAQLLLRAAPTAVHPVRDYAAQTSPSPKAGAATGRIVAVIGAV VDVQFDEGLPPILNALEVQGRETRLVLEVAQHLGESTVRTIAMDGTEGLVRGQKVLDSGAPIKIPVGPET LGRIMNVIGEPIDERGPIKTKQFAPIHAEAPEFMEMSVEQEILVTGIKVVDLLAPYAKGGKIGLFGGAGV GKTVLIMELINNVAKAHGGYSVFAGVGERTREGNDLYHEMIESGVINLKDATSKVALVYGQMNEPPGARA RVALTGLTVAEYFRDQEGQDVLLFIDNIFRFTQAGSEVSALLGRIPSAVGYQPTLATDMGTMQERITTTK KGSITSVQAIYVPADDLTDPAPATTFAHLDATTVLSRAIAELGIYPAVDPLDSTSRIMDPNIVGSEHYDV ARGVQKILQDYKSLQDIIAILGMDELSEEDKLTVSRARKIQRFLSQPFQVAEVFTGHMGKLVPLKETIKG FQQILAGEYDHLPEQAFYMVGPIEEAVAKADKLAEEHSS

🔁 3.2 Reverse translate a protein sequence

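Each amino acid is encoded by a codon, three nucleotides in the mRNA transcribed from DNA: 3 DNA bases → 1 mRNA codon → 1 amino acid → 1 letter of the protein sequence. Because most amino acids have several codons, reverse translation does not give a unique DNA sequence.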
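To see why reverse translation is not unique, here is a minimal Python sketch (our own illustration, not an assignment tool) that builds the standard genetic code, counts how many DNA sequences could encode a short peptide, and back-translates it by taking the first listed codon for each residue. That codon choice is an arbitrary assumption, and `count_encodings` and `naive_reverse_translate` are hypothetical helper names.

```python
from math import prod

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Invert the table: amino acid -> every codon that encodes it.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences (stop codon excluded) that encode `protein`."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def naive_reverse_translate(protein: str) -> str:
    """Use the first listed codon for every residue: one valid answer out of many."""
    return "".join(SYNONYMS[aa][0] for aa in protein)


def translate(dna: str) -> str:
    """Read the DNA three bases at a time and look up each codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


peptide = "MLGFVGR"  # first seven residues of NP_001677.2
print(count_encodings(peptide))  # 4608
back = naive_reverse_translate(peptide)
print(back, translate(back) == peptide)
```

Even this seven-residue stretch has thousands of valid encodings, which is why it makes sense to start from the natural mRNA record (or a codon optimizer) rather than an arbitrary back-translation.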

Rather than reverse translating by hand, we can look up the natural mRNA record for this protein:

https://www.ncbi.nlm.nih.gov/nuccore/NM_001686.4
https://www.ncbi.nlm.nih.gov/nuccore/NM_001686.4?report=fasta

NM_001686.4 Homo sapiens ATP synthase F1 subunit beta (ATP5F1B), mRNA; nuclear gene for mitochondrial product

AGTCTCCACCCGGACTACGCCATGTTGGGGTTTGTGGGTCGGGTGGCCGCTGCTCCGGCCTCCGGGGCCT TGCGGAGACTCACCCCTTCAGCGTCGCTGCCCCCAGCTCAGCTCTTACTGCGGGCCGCTCCGACGGCGGT CCATCCTGTCAGGGACTATGCGGCGCAAACATCTCCTTCGCCAAAAGCAGGCGCCGCCACCGGGCGCATC GTGGCGGTCATTGGCGCAGTGGTGGACGTCCAGTTTGATGAGGGACTACCACCAATTCTAAATGCCCTGG AAGTGCAAGGCAGGGAGACCAGACTGGTTTTGGAGGTGGCCCAGCATTTGGGTGAGAGCACAGTAAGGAC TATTGCTATGGATGGTACAGAAGGCTTGGTTAGAGGCCAGAAAGTACTGGATTCTGGTGCACCAATCAAA ATTCCTGTTGGTCCTGAGACTTTGGGCAGAATCATGAATGTCATTGGAGAACCTATTGATGAAAGAGGTC CCATCAAAACCAAACAATTTGCTCCCATTCATGCTGAGGCTCCAGAGTTCATGGAAATGAGTGTTGAGCA GGAAATTCTGGTGACTGGTATCAAGGTTGTCGATCTGCTAGCTCCCTATGCCAAGGGTGGCAAAATTGGG CTTTTTGGTGGTGCTGGAGTTGGCAAGACTGTACTGATCATGGAGTTAATCAACAATGTCGCCAAAGCCC ATGGTGGTTACTCTGTGTTTGCTGGTGTTGGTGAGAGGACCCGTGAAGGCAATGATTTATACCATGAAAT GATTGAATCTGGTGTTATCAACTTAAAAGATGCCACCTCTAAGGTAGCGCTGGTATATGGTCAAATGAAT GAACCACCTGGTGCTCGTGCCCGGGTAGCTCTGACTGGGCTGACTGTGGCTGAATACTTCAGAGACCAAG AAGGTCAAGATGTACTGCTATTTATTGATAACATCTTTCGCTTCACCCAGGCTGGTTCAGAGGTGTCTGC ATTATTGGGCCGAATCCCTTCTGCTGTGGGCTATCAGCCTACCCTGGCCACTGACATGGGTACTATGCAG GAAAGAATTACCACTACCAAGAAGGGATCTATCACCTCTGTACAGGCTATCTATGTGCCTGCTGATGACT TGACTGACCCTGCCCCTGCTACTACGTTTGCCCATTTGGATGCTACCACTGTACTGTCGCGTGCCATTGC TGAGCTGGGCATCTATCCAGCTGTGGATCCTCTAGACTCCACCTCTCGTATCATGGATCCCAACATTGTT GGCAGTGAGCATTACGATGTTGCCCGTGGGGTGCAAAAGATCCTGCAGGACTACAAATCCCTCCAGGATA TCATTGCCATCCTGGGTATGGATGAACTTTCTGAGGAAGACAAGTTGACCGTGTCCCGTGCACGGAAAAT ACAGCGTTTCTTGTCTCAGCCATTCCAGGTTGCTGAGGTCTTCACAGGTCATATGGGGAAGCTGGTACCC CTGAAGGAGACCATCAAAGGATTCCAGCAGATTTTGGCAGGTGAATATGACCATCTCCCAGAACAGGCCT TCTATATGGTGGGACCCATTGAAGAAGCTGTGGCAAAAGCTGATAAGCTGGCTGAAGAGCATTCATCGTG AGGGGTCTTTGTCCTCTGTACTGTCTCTCTCCTTGCCCCTAACCCAAAAAGCTTCATTTTTCTGTGTAGG CTGCACAAGAGCCTTGATTGAAGATATATTCTTTCTGAACAGTATTTAAGGTTTCCAATAAAATGTACAC CCCTCAGAA

🧪 3.3 Codon Optimization

Multiple codons can code for the same amino acid, but different organisms prefer certain codons over others. We therefore optimize codon usage for the host organism; otherwise translation might be inefficient. The goal is to favor codons read by tRNAs that are plentiful in the host, since each tRNA binds a specific codon and delivers the matching amino acid.

I chose E. coli as the organism to optimize the sequence for, since we use it in the fluorescent bacteria artwork lab!

The sequence above is the entire mRNA, but we only need the coding sequence (CDS), which runs from the start codon (ATG) to the stop codon. The mRNA also contains 5′ and 3′ untranslated regions (UTRs) flanking the CDS. We can go to the consensus CDS (CCDS) record instead, obtain the coding sequence, and then run codon optimization on it.
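A minimal sketch of pulling a CDS out of an mRNA, assuming the first ATG is the start codon. That is a simplification (the curated CCDS annotation is the authoritative source), and `extract_cds` is a hypothetical helper name.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_cds(mrna: str) -> Optional[str]:
    """Return the sequence from the first ATG through the first in-frame stop codon.

    Assumes the first ATG is the real start codon, which is not always true.
    Whitespace is removed and RNA "U" is treated as "T".
    """
    seq = "".join(mrna.split()).upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None  # reading frame never hits a stop codon


# Toy example: 5' UTR "GGC", CDS "ATGAAATTTTAA", 3' UTR "GCC".
print(extract_cds("GGC ATGAAATTTTAA GCC"))  # ATGAAATTTTAA
```

In NM_001686.4 the first ATG appears to be the annotated start: ATG TTG GGG TTT GTG GGT CGG reads M L G F V G R, matching the start of NP_001677.2.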

https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS8924.1
https://www.idtdna.com/CodonOpt

Running the CDS through IDT's codon optimization tool for E. coli, we get this result:

ATG CTG GGA TTT GTT GGA CGT GTG GCT GCC GCG CCT GCG TCA GGA GCA CTG CGC CGC CTG ACT CCT TCT GCC TCT CTG CCG CCG GCG CAG CTG CTG CTG CGT GCG GCG CCA ACC GCG GTT CAC CCG GTG CGT GAT TAT GCC GCG CAG ACC TCG CCC TCT CCG AAA GCC GGT GCG GCC ACC GGC CGT ATC GTC GCG GTG ATC GGC GCG GTG GTA GAT GTA CAG TTT GAT GAA GGT CTG CCG CCG ATT CTC AAT GCG CTG GAA GTT CAG GGC CGT GAA ACC CGC CTG GTT CTG GAG GTA GCG CAG CAC CTG GGT GAG AGC ACC GTC CGT ACC ATT GCT ATG GAC GGC ACC GAA GGT CTG GTG CGT GGT CAG AAA GTG CTG GAT TCT GGT GCA CCG ATC AAA ATC CCG GTT GGC CCG GAA ACG TTG GGG CGT ATC ATG AAC GTC ATT GGT GAA CCG ATT GAT GAA CGT GGA CCG ATC AAA ACC AAA CAG TTT GCG CCG ATC CAT GCG GAA GCG CCG GAG TTT ATG GAA ATG AGC GTT GAG CAG GAG ATC CTG GTG ACC GGC ATC AAA GTG GTT GAT CTG CTG GCG CCG TAT GCC AAA GGC GGC AAA ATC GGC CTG TTC GGC GGT GCG GGT GTC GGC AAA ACC GTG CTG ATC ATG GAG CTG ATC AAC AAC GTG GCG AAA GCG CAC GGT GGT TAC AGC GTC TTT GCC GGT GTC GGT GAG CGC ACC CGT GAA GGT AAC GAC CTG TAT CAC GAA ATG ATT GAG AGC GGT GTG ATC AAC CTG AAA GAT GCG ACC AGC AAG GTC GCG CTG GTT TAC GGC CAG ATG AAC GAG CCG CCA GGT GCG CGT GCC CGT GTT GCG CTG ACT GGC CTG ACG GTA GCT GAG TAC TTC CGT GAC CAG GAA GGT CAG GAT GTG CTG CTG TTT ATC GAC AAC ATC TTC CGC TTC ACC CAG GCA GGC TCT GAA GTC TCT GCG CTG CTG GGT CGC ATC CCC TCA GCG GTT GGC TAT CAG CCG ACC CTG GCG ACC GAC ATG GGC ACC ATG CAG GAG CGT ATC ACC ACC ACC AAA AAA GGC TCT ATC ACC TCG GTT CAG GCG ATC TAT GTG CCG GCT GAT GAT CTG ACT GAT CCG GCA CCG GCA ACC ACC TTT GCC CAC CTG GAT GCC ACC ACC GTG CTC AGC CGT GCG ATT GCC GAG CTG GGT ATC TAC CCG GCG GTG GAT CCG CTG GAC AGC ACC TCG CGT ATT ATG GAC CCC AAC ATT GTC GGC TCT GAA CAC TAC GAT GTG GCG CGC GGC GTG CAG AAG ATC CTG CAG GAC TAC AAA AGC CTG CAG GAT ATC ATT GCC ATC CTG GGT ATG GAT GAA CTC TCT GAA GAA GAT AAA CTG ACC GTT AGC CGT GCG CGC AAA ATC CAG CGC TTC CTG AGC CAG CCG TTC CAG GTG GCG GAA GTG TTC ACC GGT CAC ATG GGC AAA CTG GTG CCG CTG AAA GAG ACT ATT AAA GGC TTC CAG CAG ATT CTG GCG GGT GAG TAC GAC 
CAC CTG CCG GAA CAG GCG TTC TAT ATG GTG GGC CCG ATT GAA GAG GCG GTG GCG AAA GCG GAT AAA CTG GCG GAA GAA CAT AGC AGC TAA
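Before ordering, it is worth checking that the optimized DNA still encodes the original protein. A minimal sketch, assuming the sequence is pasted in with its spaces and ends in a single stop codon; `check_optimized` and `gc_content` are our own hypothetical helper names.

```python
# Standard genetic code (NCBI table 1), bases enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def clean(seq: str) -> str:
    """Drop spaces and line breaks and uppercase the sequence."""
    return "".join(seq.split()).upper()


def translate(cds: str) -> str:
    """Translate codon by codon; '*' marks stop codons."""
    seq = clean(cds)
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def gc_content(cds: str) -> float:
    """Fraction of bases that are G or C."""
    seq = clean(cds)
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_optimized(optimized: str, protein: str) -> bool:
    """True if `optimized` encodes exactly `protein` followed by one stop codon."""
    aa = translate(optimized)
    return aa.endswith("*") and aa[:-1] == clean(protein)


# Short stand-in: the first seven optimized codons plus the TAA stop.
demo = "ATG CTG GGA TTT GTT GGA CGT TAA"
print(check_optimized(demo, "MLGFVGR"))  # True
print(round(gc_content(demo), 2))
```

Passing the full optimized sequence and the NP_001677.2 protein sequence should return True if every residue was preserved.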
🧫 3.4 What technologies could be used to produce this protein from your DNA?

We can use cell-dependent expression: clone the optimized DNA sequence into a plasmid vector and introduce it into a host organism such as E. coli. Once inside, the promoter recruits RNA polymerase, which transcribes the DNA into mRNA. Ribosomes then bind the mRNA, tRNAs match codons and deliver amino acids, and the amino acids are linked together to form the protein. The bacteria would then produce the ATP synthase beta subunit as a recombinant protein.
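The transcription and translation steps can be sketched as a toy model. It simplifies on purpose: in bacteria the ribosome is positioned by the ribosome binding site rather than simply scanning to the first AUG, and there is no folding or regulation here. The function names are hypothetical.

```python
# Standard genetic code (NCBI table 1), bases enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(template_strand: str) -> str:
    """RNA polymerase reads the template strand 3'->5' and builds mRNA 5'->3'.

    The mRNA therefore matches the coding strand, with U in place of T.
    """
    return reverse_complement(template_strand).replace("T", "U")


def ribosome(mrna: str) -> str:
    """Toy ribosome: start at the first AUG and add one residue per codon until a stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


coding = "ATGCTGGGATTTTAA"             # toy coding strand (5'->3')
template = reverse_complement(coding)  # strand actually read by RNA polymerase
mrna = transcribe(template)
print(mrna)            # AUGCUGGGAUUUUAA
print(ribosome(mrna))  # MLGF
```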

🧩 4.1-2 Build your DNA insert sequence

Expression Cassette

https://benchling.com/s/seq-QDGibA4g7TjoTuX3lb5A?m=slm-Gx8zqXYh9sr4lxSK0Xqu
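Whatever parts the cassette uses, the CDS inside it should keep its reading frame. A minimal sanity check, where the part names and the N placeholder sequences are hypothetical stand-ins for the real promoter, RBS and terminator in the Benchling file:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problems(cds: str) -> list[str]:
    """List reading-frame problems in a CDS (an empty list means it looks fine)."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("no ATG start codon")
    if len(cds) % 3:
        problems.append("length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end in a stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("internal in-frame stop codon")
    return problems


def build_cassette(parts: list[tuple[str, str]]) -> str:
    """Concatenate (name, sequence) parts 5'->3' into one insert."""
    return "".join(seq for _, seq in parts)


# Hypothetical placeholders: N runs stand in for the real promoter, RBS and
# terminator; the CDS is a toy ORF, not the optimized ATP5F1B sequence.
parts = [
    ("promoter", "N" * 40),
    ("rbs", "N" * 12),
    ("cds", "ATGCTGGGATTTTAA"),
    ("terminator", "N" * 30),
]
print(cds_problems(dict(parts)["cds"]))  # []
print(len(build_cassette(parts)))        # 97
```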

🔄 4.3-6 Twist, Vector choice, Sequence Download

We can view the full circular plasmid sequence, with our cloned gene in the pTwist Amp High Copy cloning vector, in Benchling:

https://benchling.com/s/seq-wsl9w63Z5DcxN7rlp5cG?m=slm-ndl9y5U2FSsJgNYW6z7
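Because the plasmid is circular, a plain text search of the downloaded sequence can miss a match that spans the point where the file "starts". A minimal sketch of a wrap-around search (forward strand only; `find_circular` and the toy sequence are hypothetical):

```python
def find_circular(plasmid: str, query: str) -> list[int]:
    """0-based start positions of `query` on a circular sequence (forward strand only).

    Appending the first len(query) - 1 bases lets matches wrap past the end.
    """
    plasmid, query = plasmid.upper(), query.upper()
    n = len(plasmid)
    if not query or len(query) > n:
        return []
    extended = plasmid + plasmid[:len(query) - 1]
    hits = []
    pos = extended.find(query)
    while pos != -1 and pos < n:
        hits.append(pos)
        pos = extended.find(query, pos + 1)
    return hits


# Hypothetical 7 bp "plasmid": CAGA only exists across the wrap-around point.
print(find_circular("GATTACA", "CAGA"))  # [5]
print("CAGA" in "GATTACA")               # False
```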

🧬 5.1 What DNA would you want to sequence and technologies used?

I would choose to sequence the DNA of extremophiles that thrive in high-radiation or high-temperature environments. By sequencing genes involved in radiation resistance, DNA repair, and protein stabilization, we could better understand the molecular mechanisms that allow biological systems to survive under extreme stress. This knowledge could help inform the engineering of radiation-resistant biological materials or bio-hybrid systems designed to operate in harsh energy environments. Studying these organisms connects molecular biology with broader challenges in advanced energy systems.

I would use Illumina sequencing because it provides high accuracy and high throughput and is well suited for whole-genome sequencing and variant detection. It is a second-generation technique that sequences millions of short DNA fragments in parallel using sequencing-by-synthesis: it reads DNA by copying it one base at a time and taking a picture after each base is added.

The input would be the extracted genomic DNA.

Preparation steps:

  1. Fragment DNA into short pieces
  2. Ligate sequencing adapters
  3. PCR amplify fragments
  4. Load onto flow cell for cluster amplification
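How much sequencing is needed depends on genome size, read length and the coverage we want. A worked sketch using the Lander-Waterman relation C = N·L/G (mean coverage C, number of reads N, read length L, genome size G). The 3.3 Mb genome, 150 bp reads and 30x target are assumed example values, not measurements for a specific extremophile.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Lander-Waterman mean coverage C = N * L / G, solved for N and rounded up."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def fraction_uncovered(coverage: float) -> float:
    """Poisson approximation: expected fraction of bases covered by zero reads, e^(-C)."""
    return math.exp(-coverage)


# Assumed example values: 3.3 Mb genome, 150 bp reads, 30x target coverage.
print(reads_needed(3_300_000, 150, 30))  # 660000
print(fraction_uncovered(30))
```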

Essential sequencing steps:

  1. DNA fragments bind to flow cell
  2. Bridge amplification forms clusters
  3. All four fluorescently labeled, reversibly terminated nucleotides are supplied each cycle, and each strand incorporates exactly one base
  4. A camera detects the fluorescent signal for each incorporated base
  5. The color signal determines the base
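The last two steps, reading the color and calling a base, can be sketched as picking the brightest channel in each cycle. This assumes a four-dye, one-color-per-base chemistry (some Illumina instruments use fewer channels) and ignores the corrections real base callers apply; the intensities are invented for illustration.

```python
CHANNELS = "ACGT"  # assumed: one dye (one image channel) per base


def call_bases(cycles: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """Call the brightest channel in each cycle and report its share of the total signal."""
    bases, purity = [], []
    for intensities in cycles:
        total = sum(intensities)
        best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        bases.append(CHANNELS[best])
        # A low share means the cluster signal is ambiguous in this cycle.
        purity.append(intensities[best] / total if total else 0.0)
    return "".join(bases), purity


# Invented intensities for one cluster over three cycles (A, C, G, T channels).
seq, shares = call_bases([(90, 5, 3, 2), (10, 10, 70, 10), (5, 60, 5, 30)])
print(seq)  # AGC
print(shares)
```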

The output would be millions of short reads, each with its nucleotide sequence and per-base quality scores (usually delivered as FASTQ files), which can be assembled into a genome or aligned to a reference.
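Quality scores in Illumina FASTQ files are usually Phred scores, Q = -10·log10(P_error), stored as ASCII characters with an offset of 33. A minimal decoder, assuming that Phred+33 encoding and a single four-line FASTQ record (the helper names and the record are hypothetical):

```python
def parse_fastq_record(record: str) -> tuple[str, str, str]:
    """Split one four-line FASTQ record into (read id, sequence, quality string)."""
    header, seq, plus, qual = record.strip().splitlines()
    if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
        raise ValueError("not a valid four-line FASTQ record")
    return header[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode Phred+33 characters into integer quality scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P): P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical record, not real sequencer output.
read_id, seq, qual = parse_fastq_record("@read1\nGATTACA\n+\nII5555#\n")
scores = phred_scores(qual)
print(read_id, seq, scores)  # read1 GATTACA [40, 40, 20, 20, 20, 20, 2]
print([error_probability(q) for q in scores])
```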

🧪 5.2 What DNA would you want to synthesize and technologies used?

I would want to synthesize a cluster of genes involved in enhanced DNA repair and protein stabilization from extremophiles and express them in a model organism. By combining multiple protective pathways, we could engineer cells with improved resistance to radiation and thermal stress. The idea would be to use this to develop radiation-resistant biomaterials or biological components for extreme energy environments. We could build a genetic circuit that enables engineered bacteria to sense and respond to radiation stress. This circuit could include radiation-responsive promoters, DNA repair genes and protective protein pathways that activate under high oxidative stress or ionizing radiation.

To synthesize this genetic circuit, we could order DNA from Twist, which makes it by phosphoramidite solid-phase synthesis, and join the fragments with Gibson Assembly.

Essential steps:

  1. Design optimized DNA sequence computationally
  2. Chemically synthesize short oligonucleotides (base-by-base addition)
  3. Cleave and purify oligos
  4. Assemble fragments into full-length gene (e.g., Gibson Assembly)
  5. Clone into plasmid backbone
  6. Sequence-verify construct
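The assembly step relies on neighboring fragments sharing identical ends. A minimal in-silico sketch of splitting a designed sequence into overlapping fragments and checking that they rejoin. The 300 nt fragment and 30 nt overlap sizes are assumed example values, and the exact-string join stands in for the real exonuclease, polymerase and ligase chemistry.

```python
import random


def split_with_overlaps(seq: str, fragment_len: int, overlap: int) -> list[str]:
    """Cut `seq` into fragments where each neighboring pair shares `overlap` bases."""
    if not 0 < overlap < fragment_len:
        raise ValueError("need 0 < overlap < fragment_len")
    step = fragment_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            return fragments
        start += step


def assemble(fragments: list[str], overlap: int) -> str:
    """In-silico join: check each junction's shared ends, then merge them once."""
    result = fragments[0]
    for frag in fragments[1:]:
        if result[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        result += frag[overlap:]
    return result


# Hypothetical random 1,000 nt design standing in for a real construct.
random.seed(0)
design = "".join(random.choice("ACGT") for _ in range(1000))
frags = split_with_overlaps(design, fragment_len=300, overlap=30)
print(len(frags), assemble(frags, overlap=30) == design)  # 4 True
```

A real design would also check that each overlap is unique in the construct, so fragments cannot anneal in the wrong order.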

Limitations:

Length limits: Direct chemical synthesis is reliable only for short fragments, so longer genes require assembly. Synthesis also introduces base errors, so every construct needs sequence validation, and large gene clusters can be very expensive and slow to produce.
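The length limit follows from each coupling step in phosphoramidite synthesis succeeding with probability slightly below 1, so the fraction of full-length strands falls off exponentially as E^(n-1). A worked sketch; the 99% coupling efficiency is an assumed round number, and substitution errors are ignored.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands reaching full length.

    The first nucleotide starts on the solid support, so an n-mer needs n - 1
    couplings, each succeeding with probability E: fraction = E ** (n - 1).
    """
    if length_nt < 1 or not 0 < coupling_efficiency <= 1:
        raise ValueError("need length >= 1 and 0 < efficiency <= 1")
    return coupling_efficiency ** (length_nt - 1)


# Assumed 99% coupling efficiency; real values depend on chemistry and vendor.
for n in (60, 200, 1000):
    print(n, f"{full_length_fraction(n, 0.99):.2%}")
```

Under this assumption only around 13.5% of 200-mers come out full length, and essentially no 1,000-mers do, which is why long genes are assembled from shorter pieces.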

✏️ 5.3 What DNA would you want to edit and why? What technologies?

I would edit the genomes of photosynthetic microorganisms such as algae and cyanobacteria to improve their efficiency in converting light energy into chemical fuels. I could target genes involved in photosystem efficiency, carbon fixation pathways, and hydrogen production.

Photosynthesis is essentially a natural solar energy conversion system, but it is quite inefficient. We could modify regulatory genes to reduce energy losses or redirect metabolic pathways toward hydrogen or biofuel production, so we could have biological systems that convert sunlight into storable chemical energy more efficiently.

I am interested in this because it connects directly to large-scale energy systems and to treating living cells as programmable energy conversion platforms, similar to designing more efficient reactors or turbines.

We could use CRISPR-Cas12a for genome editing in cyanobacteria.

How it edits DNA:

  1. Design guide RNAs targeting specific genes.
  2. Deliver Cas12a and guide RNAs into the cells.
  3. Cas12a cuts the DNA at precise locations.
  4. The cell repairs the cut by homologous recombination with a donor DNA template, inserting the optimized sequence.
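Step 4 depends on a donor whose ends match the DNA on either side of the cut. A minimal sketch of assembling such a donor, treating the cut as a single position (Cas12a actually makes a staggered cut) and using tiny toy arms; real homology arms are usually much longer, often hundreds of bases. `build_donor` is a hypothetical helper name.

```python
def build_donor(locus: str, cut_index: int, insert: str, arm_len: int) -> str:
    """Donor = left homology arm + insert + right homology arm.

    `cut_index` is treated as a single break between locus[cut_index - 1]
    and locus[cut_index]; Cas12a really leaves a staggered cut.
    """
    if arm_len <= 0 or cut_index - arm_len < 0 or cut_index + arm_len > len(locus):
        raise ValueError("homology arms would run past the ends of the locus")
    left_arm = locus[cut_index - arm_len:cut_index]
    right_arm = locus[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm


# Toy locus and insert, not real cyanobacterial sequence.
locus = "AAAACCCCGGGGTTTT"
print(build_donor(locus, cut_index=8, insert="NNN", arm_len=4))  # CCCCNNNGGGG
```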

Design:

  • Identify metabolic bottlenecks in photosynthesis or fuel production.
  • Design guide RNAs.
  • Design donor DNA templates if inserting new sequences.
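Guide design for Cas12a starts with finding PAM sites. A minimal sketch, assuming the common 5'-TTTV-3' PAM (V = A, C or G) located immediately 5' of the protospacer, and a 20 nt spacer. PAM and spacer length vary between Cas12a variants, so both are assumptions to check for the chosen enzyme; `find_cas12a_targets` is a hypothetical helper name.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def find_cas12a_targets(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, pam_position, protospacer) for each TTTV PAM on both strands.

    Positions on the "-" strand are counted on the reverse complement. The
    protospacer is the `spacer_len` bases immediately 3' of the PAM.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Lookahead so overlapping PAMs (e.g. in TTTTA) are all found.
        for m in re.finditer(r"(?=TTT[ACG])", s):
            protospacer = s[m.start() + 4:m.start() + 4 + spacer_len]
            if len(protospacer) == spacer_len:
                hits.append((strand, m.start(), protospacer))
    return hits


# Toy target: a TTTA PAM followed by a 20 nt protospacer.
print(find_cas12a_targets("TTTA" + "ACGT" * 5))  # [('+', 0, 'ACGTACGTACGTACGTACGT')]
```

The guide RNA's spacer would then carry the protospacer sequence (as RNA) for the chosen site.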

Inputs:

  • Cas enzyme
  • Guide RNAs
  • Donor DNA (if needed)
  • Host cells (e.g., cyanobacteria)

Limitations:

  • Off-target edits may occur.
  • Large pathway rewiring is complex.
  • Efficiency gains may be modest due to thermodynamic constraints.
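The off-target concern can be made concrete by scanning a genome for sites that nearly match the guide. A minimal sketch that counts mismatches only (forward strand, no PAM check, no bulges), so it is much cruder than dedicated guide-design tools; the helper names and toy sequences are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def off_target_sites(genome: str, protospacer: str, max_mismatches: int = 3) -> list[tuple[int, int]]:
    """(position, mismatch count) for every window within the mismatch budget.

    Forward strand only; ignores PAM requirements and insertions/deletions.
    """
    genome, protospacer = genome.upper(), protospacer.upper()
    k = len(protospacer)
    hits = []
    for i in range(len(genome) - k + 1):
        mm = mismatches(genome[i:i + k], protospacer)
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


# Toy example: one perfect site at 3 and a one-mismatch site at 10.
print(off_target_sites("AAAACGTTTTACGA", "ACGT", max_mismatches=1))  # [(3, 0), (10, 1)]
```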