Week 2 HW: DNA read, write and edit.

Week 02 - Lecture Questions

Professor Jacobson

The fidelity of DNA replication is governed by DNA polymerase and its associated repair systems. The intrinsic error rate of DNA polymerase, in the absence of proofreading, is approximately 10^{-4 to 10}-5 per nucleotide. In eukaryotes, replicative polymerases utilize 3’ —} 5’ exonuclease activity for proofreading, which enhances fidelity to an error rate of approximately 10^{-7. When integrated with post-replicative mismatch repair (MMR) mechanisms, the effective error rate is further optimized to roughly 10}-9 to 10^{-10 per nucleotide.Given that the human genome comprises approximately 3.2 x 10}9 base pairs, replication without these multi-layered fidelity mechanisms would result in a mutational load incompatible with cellular viability. Biological systems mitigate this risk through a hierarchy of safeguards—polymerase proofreading, mismatch repair, and various DNA damage response pathways—ensuring that the mutation rate per genome remains within a range that sustains evolutionary stability and life.
A typical human protein consists of approximately 300 to 400 amino acids. Due to the degeneracy of the genetic code—where 64 codons encode 20 amino acids—the theoretical number of DNA sequences capable of encoding a single protein is exceptionally high.

However, functional constraints significantly restrict this theoretical diversity. Key limiting factors include:

-Codon Usage Bias: Variations in tRNA availability that influence translation efficiency.

-mRNA Secondary Structure: Folding patterns that may impede ribosome binding or elongation.

-GC Content: Extreme ratios that affect both sequence stability and the feasibility of synthesis.

-Regulatory Interference: The unintended presence of cryptic splice sites or premature termination signals.

-Metabolic Burden: High expression levels that may lead to cellular stress or protein misfolding.

Consequently, while the sequence space is vast, the biological context dictates a much narrower range of viable genetic sequences.

Dr. LeProust

Modern oligonucleotide synthesis primarily relies on solid-phase phosphoramidite chemistry. In this process, DNA is synthesized in the 3’ to 5’ direction through iterative cycles of deprotection, coupling, capping, and oxidation.Direct chemical synthesis is currently limited to approximately 150–200 nucleotides. This constraint arises because coupling efficiency is never 100%; as the sequence length increases, the yield of full-length, error-free molecules decreases exponentially. Furthermore, the accumulation of truncated products and point mutations makes the purification of long, high-fidelity oligonucleotides technically prohibitive.To produce longer sequences, such as a 2,000 bp gene, researchers must assemble multiple overlapping short oligonucleotides using enzymatic techniques like PCR assembly or Gibson assembly, followed by sequence verification and cloning.
Animals cannot synthesize certain amino acids de novo and must acquire them through their diet. The ten commonly recognized essential amino acids are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine. Notably, lysine is universally essential across all animal species, representing a fundamental and highly conserved metabolic dependency.

George Church

Question 1: The “Lysine Contingency,” a biocontainment framework proposed by George Church, leverages the metabolic dependency on lysine to prevent the unintended proliferation of engineered organisms. By disabling endogenous lysine biosynthesis, the survival of the organism becomes contingent upon an external supply of this amino acid.

The universal necessity of lysine in animals reinforces the robustness of this strategy, as the evolutionary pressure to bypass such a deeply rooted biochemical constraint is significant. However, because many microorganisms possess the innate ability to synthesize lysine, effective biocontainment requires the knockout of redundant pathways and the implementation of multi-layered genetic safeguards. Thus, the lysine contingency is most effective when integrated into a broader, polygenic containment architecture rather than acting as a singular point of failure.

Week 2 - DNA Read, Write and Edit HM

Part 1: Benchling & In-silico Gel Art

By reordering restriction digest lanes of Lambda DNA, I created a symmetrical gel pattern resembling a butterfly!

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Unfortunately No Lab Access

Part 3: DNA Design Challenge

3.1 Chosen Protein: GFP

I chose Green Fluorescent Protein (GFP) because it is widely used as a reporter protein in molecular biology. Since MELiSSA involves plasmid-based control systems and monitoring metabolic states, GFP represents a practical and symbolic example of how biological systems can be visually tracked in real time. GFP was originally isolated from Aequorea victoria and is commonly used as a fluorescent marker in genetic engineering experiments. Using UniProt, I obtained the amino acid sequence for GFP (UniProt ID: P42212).

Amino Acid Sequence: >sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTT LSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELK GTDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIG DGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

3.2 Using an online reverse translation tool, I converted the GFP amino acid sequence into a possible coding DNA sequence. Because of codon degeneracy, multiple DNA sequences can encode the same protein. The sequence below represents one possible nucleotide sequence using standard codon usage.

One possible nucleotide sequence: ATGAGCAAAGGTGAAGAACTGTTTACCGGTGTTGTCCCAATTCTGGTTGAATTGGGTGATGGT AATGGTCATAAATTTTCTGTCTCTGGCGGAGAAGGTGATGCTACCTATAAGCTGACACTGAAA TTTATTTGCACCACTGGAAAATTGCCAGTTCCATGGCCAACACTGGTTACTACTCTGTCTTAT GGTGTTCAGTGCTTCTCTCGCTACCCAGATCATATGAAACATGATTTTTTTAAATCTGCCATG CCAGAGGGTTATGTTCAGGAGCGTACTATTTTTAAAGATGATGGTAATTATAAAACACGTGCT GAAGTCAAATTTGAAGGTGATACACTGGTAAATCGCATTGAGCTGAAAGGTACCGACTTTAAG GAAGATGGTAATATTCTGGGTCATAAACTGGAATACAATTATAACTCTCATAATGTCTATATT ATGGCTGATAAACAGAAGAATGGTATTAAAGTTAATTTTAAAATTCGTCATAATATTGAAGAT GGTTCTGTTCAGCTGGCTGATCACTACCAGCAGAATACTCCAATTGGAGATGGTCCTGTTCTG CTGCCAGATAATCACTATCTGAGTACTCAGTCTGCTCTGTCTAAGGATCCAAATGAAAAGCGA GATCATATGGTTCTGCTGGAATTTGTTACTGCTGCAGGTATTACCCATGGTATGGATGAGCTG TATAAATAA

3.3 Codon optimization is important to improve protein development in the chosen host organism.

As we know, multiple DNA sequences can encode the same protein due to the degeneracy of the genetic code, but not all codons are used equally in all organisms. This is due to the abundance of tRNA pools. If a gene contains codons that are rare for the organism, translation may decrease leading to slower protein production or ribosome stalling. I optimized the codon sequence for Escherichia coli (E. coli) because it grows rapidly, it is inexpensive and has a fully sequenced and well-characterized genome. Optimizing the gene for E. coli ensures that the codons match the organism’s tRNA abundance, thereby maximizing expression efficiency.

3.4 Cell-Free Protein Expression (In Vitro)

In this method: The DNA template is added to a reaction mixture containing: RNA polymerase, ribosomes, tARNs, aminoacids, energy sources. Transcription and translation occur in a test tube without living cells. The protein is synthesized directly in vitro.

Advantages: Faster expression No need to maintain living cells Useful for toxic proteins More controllable environment

Limitations: Higher cost Typically lower yield than in vivo systems

How DNA Becomes A Protein? In both systems (cell- dependent or cell-free), the process follows the Central Dogma: DNA → mRNA → Protein 1)The DNA sequence is transcribed into messenger RNA (mRNA). 2)The ribosome reads the mRNA in codons (sets of three nucleotides). 3)Transfer RNAs (tRNAs) match each codon with the corresponding amino acid. 4)The amino acids are linked together to form a polypeptide chain in a specific site in the ribosome. 5)The polypeptide folds into a functional protein.

Part 4: Prepare a Twist DNA Synthesis Order

For this design, I prepared a linear expression cassette in Benchling containing: Constitutive promoter, ribosome Binding Site (RBS), start codon, codon-optimized GFP coding sequence, 6xHis tag, stop codon, T7 terminator

This cassette would be ordered as a clonal gene through Twist Bioscience. I would select a high-copy plasmid backbone such as pTwist Amp High Copy, which provides: Ampicillin resistance for selection, high-copy origin of replication and efficient propagation in E. coli

Ordering as a clonal gene would allow direct transformation into E. coli without additional cloning steps, accelerating experimental validation.

Part 5: DNA Read/Write/Edit

5.1 DNA READ

(i) What DNA would you want to sequence and why? I would like to sequence environmental microbial DNA from closed ecological life-support systems, such as bioreactors used in regenerative environments (similar to MELiSSA-type systems). Specifically, I would sequence microbial community DNA to monitor biodiversity, metabolic stability, and potential pathogenic shifts. (ii) What sequencing technology would you use and why? I would use a combination of:

• Illumina sequencing • Oxford Nanopore sequencing

Illumina provides high accuracy short reads, ideal for detecting small mutations and precise taxonomic profiling. Oxford Nanopore provides long reads, which are useful for assembling genomes, detecting structural variants, and monitoring plasmids or gene clusters. Using both increases robustness and ecological insight.

Preparation (Essential Steps)

DNA extraction from environmental sample
Fragmentation (if needed for Illumina)
Adapter ligation
PCR amplification (Illumina)
Library preparation
Loading onto flow cell In closed systems, small microbial imbalances can lead to system instability or health risks. Sequencing allows early detection of contamination, horizontal gene transfer, or harmful mutations. Therefore, DNA sequencing becomes a tool for real-time biosurveillance and ecological control.

Essential Steps of Sequencing Technology -Illumina (Second-generation)

• DNA fragments attach to flow cell • Bridge amplification creates clusters • Sequencing-by-synthesis with fluorescent reversible terminators • Camera detects fluorescence • Base calling via signal interpretation Output: Short reads (FASTQ files with quality scores)

-Oxford Nanopore (Third-generation) • DNA passes through nanopore • Changes in ionic current measured • Signal processed into nucleotide sequence

Output: Long reads (FASTQ, real-time data)

5.2 DNA WRITE

(i) What DNA would you want to synthesize and why? I would synthesize a plasmid-based genetic circuit encoding: • A fluorescent reporter (e.g., GFP) • A stress-responsive promoter • A regulatory element sensitive to metabolic imbalance The purpose would be to create a biosensor that detects environmental stress inside a microbial ecosystem and produces a measurable fluorescence output. This construct could function as an early warning system in closed bioreactors.

(ii) What technology or technologies would you use to perform this DNA synthesis and why? I would use commercial gene synthesis through Twist Bioscience Why? • High accuracy • Scalable synthesis • Codon optimization • Assembly-ready fragments Essential Steps of DNA Synthesis

Digital DNA design
Oligonucleotide synthesis
Assembly (e.g., Gibson assembly)
Sequence verification
Plasmid construction

Limitations • GC-rich or repetitive sequences are difficult • Length constraints • Cost increases with size • Biosecurity screening restrictions

5.3 DNA Edit

(i) What DNA would you want to edit and why? I would edit the GFP gene expressed in E. coli to modify its fluorescence intensity. By introducing targeted mutations into the GFP coding sequence, it is possible to alter protein folding efficiency or chromophore structure, potentially enhancing fluorescence output. This modification would allow better signal detection and improved reporter performance in synthetic biology applications. (ii) I would use CRISPR-Cas9 genome editing CRISPR-Cas9 uses a guide RNA (gRNA) designed to match a specific DNA sequence within the GFP gene. The Cas9 enzyme introduces a double-strand break at that location. To introduce a precise modification, a donor DNA template containing the desired mutation would be supplied. The bacterial cell then repairs the break, incorporating the modified sequence. Essential inputs include: Guide RNA targeting GFP, Cas9 nuclease (plasmid or protein form), Donor DNA template containing the intended mutation and Competent E. coli cells

Limitations of this method include potential off-target effects, variable editing efficiency, and the need for downstream screening to confirm successful edits.

Note: This assignment was developed with the assistance of an AI language model (ChatGPT, Gemini), used to help structure ideas and refine wording. The concepts and final decisions were critically reviewed and adapted by the author.