Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices


  • Week 2 HW: How DNA read, write and edit


Subsections of Homework

Week 1 HW: Principles and Practices

A mother experiencing postpartum depression

Important

This page documents my idea for using synthetic biology to detect postpartum depression (PPD), and explores the governance and ethics around it.

Synthetic Biology for Early Detection of Postpartum Depression

Postpartum depression (PPD) is one of the most common complications of childbirth, affecting roughly 10–20% of mothers worldwide, with a recent large meta-analysis estimating a global prevalence around 17%. Untreated PPD harms not only maternal health but also infant development, attachment, and long-term family well-being.

Despite this burden, most health systems rely on short questionnaires such as the Edinburgh Postnatal Depression Scale (EPDS) or PHQ-9, whose sensitivity (the proportion of affected women they detect) varies widely and which are not always followed by clinical evaluation. Many women never receive a formal diagnosis or treatment because of under-screening, stigma, and uneven access to mental health care.

At the same time, emerging work suggests that specific blood biomarkers and epigenetic signatures during late pregnancy can predict which women are likely to develop PPD, with some studies reporting predictive accuracies close to 88%. This opens the door to biological sensing, rather than relying only on self-report.
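
To see why a headline accuracy figure still needs clinical interpretation, the sketch below applies Bayes' rule to the numbers quoted above. It assumes, purely for illustration, that "88% accuracy" means 88% sensitivity and 88% specificity (the cited studies may define accuracy differently) and uses the ~17% prevalence estimate. The function name `predictive_values` is my own.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a binary test using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Assumed inputs: prevalence from the meta-analysis estimate above;
# sensitivity = specificity = 0.88 is an illustrative assumption.
ppv, npv = predictive_values(prevalence=0.17, sensitivity=0.88, specificity=0.88)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.60, NPV = 0.97
```

Under these assumptions roughly four in ten positive results would be false positives, which is one reason a biosensor readout should feed into clinical judgment rather than replace it.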


The tool I want to develop

Concept: a biomarker-sensing synthetic biology platform for PPD risk

My idea is to design a modular synthetic biology platform that can sense molecular markers associated with postpartum depression risk in a minimally invasive sample (for example, finger-prick blood or saliva) and produce an easy-to-interpret readout that could be integrated into postpartum care.

Key features:

  • Uses known or emerging PPD-associated biomarkers (for example, genes regulating estrogen signaling, epigenetic changes around targets like TTC9B, or other stress-response–linked markers).
  • Implements biosensors in a safe, cell-free or contained system (for example, paper-based or microfluidic cell-free expression system) to avoid releasing live GMOs.
  • Outputs a semi-quantitative risk score that must be interpreted by a clinician, not a stand-alone diagnosis.

Why synthetic biology?

  • Synthetic biology has already been used to build low-cost, paper-based diagnostics for infectious diseases and metabolic markers, suggesting similar architectures could be adapted for mental health–relevant biomarkers.
  • A synthetic biosensor could ultimately enable decentralized, low-cost PPD risk screening, including in settings where mental health professionals are scarce, if developed responsibly.

For this class, my project focus is to:

  • Map from current knowledge about PPD biomarkers to plausible synthetic sensing strategies.
  • Prototype in-silico designs for sensor circuits and outline a feasible wet‑lab pipeline in a safe, scoped way (no actual patient samples).
  • Analyze governance and policy implications to avoid harm and promote equitable, ethical use.

Policy and governance goals

My overarching ethical goal is: Enable earlier detection of postpartum depression in a way that reduces harm, respects autonomy, and does not exacerbate inequities.

I break this into four concrete policy/governance goals:

  1. Ensure safety and biosecurity
    Prevent misuse of any biological components or data in ways that could harm mothers or infants, and ensure lab work uses appropriate containment and avoids unnecessary handling of human samples or high-risk constructs.

  2. Protect autonomy, privacy, and informed consent
    Avoid creating tools that pressure women into testing or treatment, and ensure any real-world deployment would require informed, voluntary participation with clear explanations of limitations.

  3. Promote equity and access
    Avoid building a “luxury diagnostic” that only wealthy health systems can implement, and consider how design choices affect low‑resource settings and marginalized groups who already face higher PPD risk.

  4. Support constructive, evidence-based use
    Ensure any biosensor results are used as one input in holistic care, not as a binary label, and encourage integration with follow‑up support, not just risk labeling.


Governance actions

Action 1: Institutional review and mental‑health ethics checklist

Purpose

Currently, many early‑stage synthetic biology projects in academic settings are reviewed primarily for biosafety, not for mental‑health–specific ethical risks such as stigma, coercion, and data privacy. I propose that any institutional project involving PPD biomarkers or synthetic biosensors for mental health must undergo standard biosafety/IRB review plus a short mental‑health–focused ethics checklist.

Design

  • Actors: university IRBs, biosafety committees, mental health professionals, PIs.
  • Requirements:
    • Projects submit a one‑page statement describing: target population, potential harms (false positives/negatives, stigma), data handling plans, and strategies to ensure results do not replace clinical judgment.
    • At least one reviewer with psychiatric or maternal health expertise participates in review for these projects.
    • For class projects, an adapted, lighter checklist is integrated into the syllabus (for example, an “ethics box” in weekly assignments).

Assumptions

  • Institutions have access to at least one clinician or ethicist with maternal mental health expertise.
  • Extra review burden does not discourage students or early‑career researchers from working on maternal mental health at all.
  • Checklists actually change design decisions rather than becoming a box‑ticking exercise.

Risks of failure and “success”

  • Failure: the checklist becomes performative, and projects move to less regulated settings (DIY labs, startups) to avoid oversight.
  • “Success” risks: higher perceived sensitivity of mental‑health projects may stigmatize the topic further, and over‑cautious review could delay low‑risk, beneficial work.

Action 2: Voluntary technical standard for low‑risk, cell‑free PPD biosensors

Purpose

Right now, biosensor design choices (cell‑free vs. live cells, DNA format, kill‑switches) are largely left to individual labs or companies. I propose a voluntary technical standard for “low‑risk” PPD biosensor prototypes that:

  • Strongly favors cell‑free systems and non‑replicating genetic material.
  • Recommends no direct use of patient samples in early prototyping (use synthetic oligos or spiked serum from standard sources).
  • Specifies minimal documentation for transparency.

Design

  • Actors: academic consortia, funding agencies, community labs.
  • Elements:
    • A short guidance document: acceptable chassis (for example, E. coli cell‑free extract on paper), permitted sample types (synthetic controls), safe storage and disposal.
    • Template protocol library emphasizing: use of standard constructs, safe waste handling, no clinical claims.
    • Funding incentives: grants and awards preferentially support projects that adhere to the standard.

Assumptions

  • Cell‑free systems are sufficient for proof-of-concept sensing of relevant biomarkers.
  • Labs will adopt a voluntary standard if funders and journals signal support.
  • Standards can be updated as biomarker science evolves.

Risks of failure and “success”

  • Failure: low adoption in high‑pressure environments where speed and novelty trump safety conventions, and standards become outdated as new biomarkers or assay technologies emerge.
  • “Success” risks: creates a de facto “approved” technical pathway that may unintentionally lock out innovative but safe alternatives, and could be misinterpreted as safety certification.

Action 3: Public sector guidance on responsible PPD biomarker testing and data use

Purpose

There is increasing interest in blood‑based or genetic tests to predict psychiatric risk, including PPD, but little clear public guidance on how these results should be communicated and used, particularly around stigma and insurance discrimination. I propose that national health agencies publish official guidance on:

  • Appropriate use of PPD biomarker or biosensor tests (always as adjuncts, never stand-alone diagnoses).
  • Communication of probabilistic risk and uncertainty to patients.
  • Data protection and non‑discrimination principles.

Design

  • Actors: national health agencies, professional societies in psychiatry and obstetrics, regulators.
  • Steps:
    • Convene working groups including obstetricians, psychiatrists, ethicists, and patient advocates.
    • Draft guidelines on consent forms, counselling before/after testing, limits of interpretation, and how to integrate test results into care pathways.
    • Encourage insurers and employers to sign on to non‑discrimination commitments for PPD biomarker information.

Assumptions

  • PPD biomarker tests will become increasingly accurate and available, making pre‑existing guidance necessary.
  • Public agencies are willing to act proactively rather than waiting for commercial products to force reactive regulation.
  • Non‑discrimination norms can be enforced or at least monitored.

Risks of failure and “success”

  • Failure: guidance exists but is poorly implemented at clinic level, while commercial tests emerge that sidestep guidance and create confusion.
  • “Success” risks: guidance that is too conservative may discourage investment in beneficial diagnostics, and strong data‑protection rules might make it harder to study which tests truly improve outcomes if not designed carefully.

Using a 1–3 scale (1 = best, 3 = worst, n/a = not applicable):

| Criterion | Action 1: Institutional ethics review | Action 2: Low‑risk technical standard | Action 3: Public guidance on PPD biomarker use |
|---|---|---|---|
| Enhance biosecurity – prevent incidents | 2 | 1 | 2 |
| Enhance biosecurity – help respond | 2 | 2 | 1 |
| Foster lab safety – prevent incidents | 2 | 1 | 3 |
| Foster lab safety – help respond | 2 | 2 | 3 |
| Protect environment – prevent incidents | 2 | 1 | 3 |
| Protect environment – help respond | 3 | 2 | 3 |
| Minimize costs/burdens to stakeholders | 2–3 | 2 | 2 |
| Feasibility (political / practical) | 2 | 2 | 3 |
| Not impede research | 2 | 1–2 | 2 |
| Promote constructive applications (equity, autonomy, appropriate use) | 1 | 2 | 1 |
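
As a rough cross-check on the scoring, the sketch below sums the scores for each action. It treats every criterion as equally weighted and replaces ranges such as "2–3" with their midpoint; both choices are simplifying assumptions of mine, not part of the scoring scheme itself.

```python
# Scores copied from the table above (1 = best, 3 = worst), in row order.
# Ranges such as "2–3" are replaced by their midpoint (an assumption).
scores = {
    "Action 1": [2, 2, 2, 2, 2, 3, 2.5, 2, 2, 1],
    "Action 2": [1, 2, 1, 2, 1, 2, 2, 2, 1.5, 2],
    "Action 3": [2, 1, 3, 3, 3, 3, 2, 3, 2, 1],
}

# Lower totals are better because 1 is the best score.
totals = {action: sum(values) for action, values in scores.items()}
ranking = sorted(totals, key=totals.get)

for action in ranking:
    print(f"{action}: total {totals[action]}")
```

With equal weights, Action 2 has the lowest (best) total, followed by Action 1 and then Action 3. A different weighting of the criteria could change this order.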

Which options I would prioritize and for whom

For an HTGAA‑scale class project and early‑stage academic work, I would prioritize a combination of Action 1 and Action 2:

  • Action 2 (low‑risk technical standard) directly shapes the design of my synthetic biology experiments: favor cell‑free systems, standard genetic parts, synthetic controls instead of patient samples, and clear waste‑handling.
  • Action 1 (institutional mental‑health ethics review) forces me and other students to explicitly consider how a PPD biosensor might affect real people, even if our class prototype never leaves the lab.

For national or international audiences (for example, a ministry of health or WHO‑like body), Action 3 becomes critical as soon as biomarker‑based PPD tests move from research to clinical or commercial deployment.

Trade‑offs and uncertainties:

  • Additional ethical review processes could slow or deter students from working on maternal mental health, a domain that is already under‑resourced.
  • It remains uncertain whether current biomarker candidates, even with promising predictive performance, will generalize across populations and avoid worsening inequities if tests are rolled out prematurely.
  • My scoring assumes that simple, cell‑free technical standards are enough to meaningfully reduce biosafety risk; future technologies may require revisiting these assumptions.

Experimental plan and documentation

In‑silico experiments

Goal: Explore how PPD‑associated biomarkers could be detected using synthetic circuits, and prototype a sensor design computationally.

  1. Literature mapping
    Summarize key PPD biomarker papers on blood biomarkers and epigenetic signatures, noting biomarker type, dynamic range, and sample type.

  2. Sensor design sketches
    Choose one biomarker type that is plausible to sense (for this class, assume mRNA or protein), and sketch two architectures: a cell‑free toehold switch and a small‑molecule–responsive transcription-factor system.

  3. Sequence design and simulation
    Use tools like NUPACK or Benchling to design and test a toehold switch recognizing a short biomarker-related RNA sequence, checking secondary structures and specificity.

  4. Risk reflection in design
    For each sensor variant, add notes about who could be helped or harmed if it worked perfectly, and explicitly state that any risk output would need clinical interpretation.
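
For step 3, the starting point of a toehold switch design is that the sensing region of the switch RNA is complementary to the trigger RNA. The sketch below covers only that complementarity step plus a GC-content check; it does not predict secondary structure, which is what NUPACK would be used for. The trigger sequence is a made-up placeholder, not a real PPD biomarker, and the function names are my own.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Hypothetical 30-nt trigger (placeholder, not a real biomarker sequence).
trigger = "AUGGCUAGCUUAGGCAUCGAUCCGAUUAGC"

# The sensing region must base-pair with the trigger, so it is the
# reverse complement of the trigger sequence.
sensing_region = reverse_complement_rna(trigger)

print("Sensing region:", sensing_region)
print(f"GC content: {gc_content(sensing_region):.2f}")
```

Candidate sensing regions produced this way would still need to be screened for unwanted secondary structure and for similarity to unrelated transcripts.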

Lab experiments (safe scope)

Because I will not be working with human samples or real depression biomarkers, the lab work focuses on generic sensor behavior using safe, synthetic targets.

Example pipeline:

  1. Cell‑free reporter system setup
    Use a cell‑free expression system to express a fluorescent or colorimetric reporter (for example, GFP or LacZ) under control of a toehold or inducible promoter.

  2. Synthetic input testing
    Design synthetic RNA or DNA oligos that mimic a biomarker‑related sequence region and test no‑input, low, medium, and high input conditions, measuring reporter output.

  3. Documenting failures
    When the reporter fails to turn on, record hypothesized reasons (for example, misfolding, degradation), troubleshooting steps, and include photos or instrument screenshots.

  4. Tie back to governance
    For each experimental step, note biosafety level, how you complied, and how this aligns with the “low‑risk technical standard” idea.


Ethical concerns and additional governance ideas

Working on a synthetic biology concept for postpartum depression highlighted that biomarker-based prediction of mental health conditions can be double-edged: it can enable early support but also increase labeling, discrimination, or pressure to act in certain ways. I also became more aware of how false positives and false negatives in mental-health diagnostics have different ethical implications than in infectious disease, because they intersect with identity and parenting.

Additional governance actions I would suggest:

  1. Co‑design with affected communities
    Require that any PPD diagnostic tool development includes consultation with postpartum women from diverse backgrounds, especially those most affected by PPD.

  2. Educational materials for clinicians and communities
    Develop plain‑language materials on what PPD is, the limits of biomarker tests, and how to access support regardless of test results.

  3. Open documentation of limitations
    Encourage projects (including class projects) to clearly document uncertainty, failed attempts, and limitations, aligning with the HTGAA emphasis on transparent documentation.

Note

The rest of this page will be updated as I add actual in‑silico and lab results, along with sketches, screenshots, and notes on what worked and what didn’t.

Week 2 HW: How DNA read, write and edit

Central Dogma of life ~ Khan Academy

Exploring how we can read, write, and edit DNA using modern molecular biology tools.

Part 1: Benchling & In-silico Gel Art

1.1 Restriction digestion simulation

Using Benchling, I simulated λ DNA digests with the following restriction enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI.

Simulated restriction digests of λ DNA with multiple enzymes

Benchling gel showing distinct fragment patterns for each enzyme combination.

Digest reaction setup

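| Condition | M | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Water | - | - | 13 µL | 13 µL | 13 µL | 13 µL | 13 µL | 13 µL | 14 µL | - | - |
| CutSmart | - | - | 2 µL | 2 µL | 2 µL | 2 µL | 2 µL | 2 µL | 2 µL | - | - |
| λ DNA | - | - | 3 µL | 3 µL | 3 µL | 3 µL | 3 µL | 3 µL | 3 µL | - | - |
| Enzymes | - | - | NdeI + SacI | EcoRI + SacI | EcoRI | EcoRI + EcoRV | EcoRI + SacI | NdeI + XhoI | NdeI | - | - |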
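
The logic behind a simulated digest can be sketched in a few lines: find every recognition site, record where the enzyme cuts the top strand, and report the fragment lengths. This is a minimal illustration of the idea, not Benchling's implementation. It handles linear DNA only and runs on a made-up 40-bp toy sequence rather than the λ genome.

```python
# Recognition sites and cut offsets (cut position within the site on the
# top strand) for the enzymes listed in the table above.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "EcoRV": ("GATATC", 3),  # GAT^ATC
    "SacI": ("GAGCTC", 5),   # GAGCT^C
    "NdeI": ("CATATG", 2),   # CA^TATG
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
}

def digest(seq, enzyme_names):
    """Return fragment lengths for a linear sequence cut by the given enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    edges = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Toy sequence (hypothetical, not λ DNA) with one EcoRI and one NdeI site.
toy = "AAAAAAAAAAGAATTCAAAAAAAAAACATATGAAAAAAAA"
print(digest(toy, ["EcoRI"]))          # [11, 29]
print(digest(toy, ["EcoRI", "NdeI"]))  # [11, 17, 12]
```

Each list of fragment lengths corresponds to one lane's band pattern, which is what makes it possible to plan gel art by choosing enzyme combinations.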

1.2 Latent Figure–style gel art

My pattern is a heart, in the spirit of Paul Vanouse’s Latent Figure Protocol artworks and in honor of Valentine’s Day.

Heart-shaped gel pattern

A heart-shaped arrangement of bands designed by combining different restriction digests.

Part 2: DNA Design Challenge

2.1 Chosen protein: NF1

My chosen protein is NF1 (neurofibromin), a tumor suppressor that negatively regulates the RAS pathway and helps prevent uncontrolled cell growth.

  • Germline NF1 mutations cause Neurofibromatosis Type 1, which increases risk of neurofibromas and malignant peripheral nerve sheath tumors.
  • Somatic NF1 mutations also appear in ~5–10% of sporadic cancers, including lung, breast, and melanoma.

UniProt entry: NF1_HUMAN (P21359)
Example N‑terminal sequence:
MAAHRPVEWVQAVVSRFDEQLPIKTGQQNTHTKVSTEHNKECLINISKYKFSLVISGLTT

2.2 Reverse translation to DNA

Using an online protein-to-DNA tool, I reverse translated the NF1 amino‑acid sequence to obtain a candidate coding DNA sequence (CDS).
I then cross‑checked the resulting sequence against the NF1 gene in the NCBI Gene database to confirm that the codons match an authentic NF1 coding region.

Example tool used: CUSABIO Protein-to-DNA Sequence Generator.

<details>
  <summary>Click to view example reverse-translated NF1 coding sequence</summary>
ATGGCTGCTCATCGTCCTGTTGAATGGGTTCAAGCTGTTGTTTCTCGTTTTGATGAACAATTACCTATTAAAACTGGTCAACAAAATACTCATACTAAAGTTTCTACTGAACATAATAAAGAATGTTTAATTAATATTTCTAAATATAAATTTTCTTTAGTTATTTCTGGTTTAACTACTATTTTAAAAAATGTTAATAATATGCGTATTTTTGGTGAAGCTGCTGAAAAAAATTTATATTTATCTCAATTAATTATTTTAGATACTTTAGAAAAATGTTTAGCTGGTCAACCTAAAGATACTATGCGTTTAGATGAAACT
</details>
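
A quick sanity check on any reverse-translated sequence is to translate it back and compare it with the protein. The sketch below does this for a 36-nt stretch taken from the reverse-translated sequence above, which should encode the first 12 residues of the N‑terminal sequence. It uses the standard genetic code; the function name `translate` is my own.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA coding sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

fragment = "ATGGCTGCTCATCGTCCTGTTGAATGGGTTCAAGCT"
expected = "MAAHRPVEWVQA"  # first 12 residues of the N-terminal sequence

print(translate(fragment))
print(translate(fragment) == expected)
```

Because the genetic code is degenerate, many different DNA sequences pass this check. It confirms the encoded protein, not that the codons match the native human gene.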

2.3 Codon optimisation

After I determined the nucleotide sequence encoding human NF1, I codon-optimised it for expression in yeast. Even though the amino‑acid sequence stays the same, different organisms do not use all synonymous codons equally. Yeast has its own pattern of preferred codons, linked to which tRNAs are abundant. If I kept the original human codon usage, the yeast translation machinery might stall on rare codons, leading to slower translation, ribosome pausing, and lower NF1 yield. Codon optimisation rewrites the DNA sequence using codons common in yeast, without altering the protein sequence, thereby improving translation efficiency and protein expression levels.

I optimised the NF1 coding sequence for Saccharomyces cerevisiae (strain S288C) because this is the baker’s yeast strain I plan to use as the expression host. Matching the codon usage to S. cerevisiae should maximise NF1 expression in this yeast background while maintaining the correct human NF1 protein sequence.
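
The simplest form of codon optimisation is a lookup: for each amino acid, pick the codon the host uses most often. The sketch below applies that idea to the first 12 residues of NF1. The codon choices in `YEAST_PREFERRED` are illustrative assumptions limited to the residues in this peptide and should be checked against a published S. cerevisiae codon-usage table. Real optimisation tools also balance GC content and avoid repeats and unwanted restriction sites, which this sketch does not attempt.

```python
# Illustrative "preferred codon" per amino acid for S. cerevisiae.
# Assumed values for this sketch only; verify against a codon-usage table.
YEAST_PREFERRED = {
    "M": "ATG", "A": "GCT", "H": "CAT", "R": "AGA", "P": "CCA",
    "V": "GTT", "E": "GAA", "W": "TGG", "Q": "CAA",
}

def naive_codon_optimise(protein):
    """Replace each residue with a single assumed preferred codon."""
    return "".join(YEAST_PREFERRED[aa] for aa in protein)

peptide = "MAAHRPVEWVQA"  # first 12 residues of NF1
optimised = naive_codon_optimise(peptide)
print(optimised)
```

The protein sequence is unchanged by construction, since every codon is chosen from the set of synonymous codons for that residue.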

I would place the codon-optimised NF1 sequence under the control of a yeast promoter (for example, GAL1) on an expression plasmid and transform it into S. cerevisiae. When I grow the yeast in inducing conditions, the GAL1 promoter is activated, and yeast RNA polymerase recognises the promoter region and transcribes the NF1 coding sequence into messenger RNA. This NF1 mRNA is processed and exported to the cytoplasm, where ribosomes read the codons and, using yeast tRNAs, assemble the NF1 polypeptide chain. Because the sequence was codon-optimised for yeast, the ribosomes mostly encounter codons that match abundant tRNAs, which should make translation more efficient and increase NF1 yield.

Some extra notes and readings: The organism I have chosen is S. cerevisiae, and the expression vector is pAG423GAL‑ccdB. That plasmid is set up for Gateway cloning, so I don’t insert NF1 into a multiple cloning site (MCS) with restriction enzymes; instead I swap NF1 into the ccdB Gateway cassette under the GAL1 promoter. blog.addgene

In this vector:

  • Promoter: GAL1 (yeast‑inducible).
  • Destination cassette: attR1–ccdB–CmR–attR2 between GAL1 and the terminator. This is where the NF1 gene goes.
  • I first clone NF1 into a Gateway entry vector (attL1–NF1–attL2), then do an LR reaction with pAG423GAL‑ccdB. The LR reaction recombines the attL and attR sites and replaces the ccdB cassette with my NF1 coding sequence, so the final plasmid has GAL1 promoter → NF1 ORF → terminator.

What is the Gateway LR assembly?

Gateway LR is a recombination cloning step that moves your gene from an entry vector into a destination vector (like pAG423GAL‑ccdB) using site‑specific recombination instead of restriction enzymes. blog.addgene

Basic idea

  • In the LR reaction, a mix of Gateway recombinase enzymes recombines attL with attR sites, swapping NF1 in and ccdB out.
  • After LR, the destination plasmid now has:

GAL1 promoter → NF1 ORF → terminator, plus new attB sites at the junctions.

Because ccdB is toxic to standard E. coli strains, only cells whose plasmid has had ccdB replaced by my gene survive, which gives strong selection for correct recombinants.

What you actually do in the lab

  1. Mix:
    • Entry plasmid (attL1‑NF1‑attL2)
    • Destination plasmid (pAG423GAL‑ccdB)
    • LR Clonase enzyme mix + buffer
  2. Incubate (often 1 hour / room temperature).
  3. Transform E. coli and plate on the antibiotic for the destination plasmid.
  4. Surviving colonies carry pAG423GAL‑NF1.

Part 3: Prepare a Twist DNA Synthesis Order

I attempted to create my NF1 gene construct and optimize it; however, the gene is too large for a plasmid construct. As a result, I couldn’t proceed further. Below, I am outlining the steps I took to create a plasmid that expresses GFP, as mentioned on the webpage. Check the files below:


Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would I sequence and why?

I would sequence synthetic DNA strands used for DNA-based digital data storage, where arbitrary binary data (e.g., images, text, archives) are encoded into nucleotide sequences and written as pools of oligonucleotides.

DNA-based storage is attractive because it offers extremely high information density, long-term stability (centuries under proper conditions), and technology-agnostic decoding (any future sequencer that can read A/T/C/G can, in principle, recover the data).

Sequencing these DNA pools accurately is critical because even small base-calling errors can corrupt the decoded file, so this is an ideal testbed for thinking about tradeoffs between read length, error profile, and redundancy in the encoding scheme.
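
To make the encoding idea concrete, here is a minimal sketch that maps every two bits of a file to one nucleotide and back. The specific mapping (00→A, 01→C, 10→G, 11→T) is an assumption for illustration. Practical schemes add indices and error-correcting codes and avoid long homopolymers, none of which is shown here.

```python
# Assumed 2-bits-per-base mapping, chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Decode a DNA string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

message = b"Hi"
strand = encode(message)
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

A single miscalled base changes two bits of the decoded file, which is why redundancy and error correction matter so much for this application.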

(ii) What technology would I use and why?

I would primarily use Illumina sequencing-by-synthesis (SBS), complemented by Oxford Nanopore Technologies (ONT) if I wanted rapid, on-demand reads.

  • Illumina SBS:

    • Mature, high-throughput platform with very low per-base error rates, widely used as the gold standard for reading DNA data storage constructs.
    • Excellent for large pools of short oligos (e.g., 100–300 bp) which are typical in DNA data storage systems.
  • Oxford Nanopore:

    • Single-molecule, real-time, third-generation sequencing that can read DNA without amplification and is increasingly used for rapid DNA data retrieval, despite higher raw error rates.

Given that DNA data storage typically tolerates some raw error through redundancy and error-correcting codes, combining Illumina for “archival, highly accurate” reads and ONT for “quick access” reads is attractive.


5.1: Method generation and classification

  • Illumina SBS is considered second-generation (NGS):
    • Characteristics: massively parallel sequencing of clonal clusters on a flow cell, short reads (typically 50–300 bp), imaging of fluorescence during synthesis.
  • Oxford Nanopore is third-generation (single-molecule):
    • Characteristics: direct sequencing of single DNA molecules as they translocate through a nanopore; no PCR is strictly required, long reads (kb–Mb), higher raw error but random.

5.1: Input and sample preparation

For DNA data storage, the input is a pool of synthetic DNA oligonucleotides (each encoding a chunk of digital data plus indices and error-correction). For Illumina SBS, essential prep steps:

  1. DNA input

    • Synthetic DNA pool (e.g., freeze-dried or in aqueous solution) containing thousands to millions of unique oligos (typically 100–300 nt).
  2. Fragmentation (if needed)

    • For short oligos used in data storage, additional fragmentation is usually unnecessary; for longer constructs, mechanical or enzymatic fragmentation can be used.
  3. End repair and A-tailing

    • Blunt-ending and addition of a single A overhang to make DNA compatible with T-tailed adapters.
  4. Adapter ligation

    • Ligate platform-specific adapters containing:
      • Flow cell binding sequences
      • Sequencing primer binding sites
      • Optional indices/barcodes.
  5. PCR amplification (library enrichment)

    • Limited-cycle PCR to enrich adapter-ligated molecules and add full adapter sequences; in data storage, PCR cycles are minimized to reduce bias.
  6. Library QC and normalization

    • Check size distribution and concentration; normalize and pool libraries as needed. [pmc.ncbi.nlm.nih]

For Oxford Nanopore, prep differs:

  1. Input DNA

    • Ideally longer DNA fragments if using data storage schemes that encode longer blocks, but short-oligo protocols exist. [academic.oup]
  2. End-repair / dA-tailing

    • Prepare DNA ends compatible with nanopore adapters.
  3. Adapter ligation

    • Ligate ONT’s motor-protein–containing adapters so that a motor controls DNA translocation through the pore.
  4. (Optional) Amplification

    • Some workflows (e.g., PCR-based or rolling-circle amplification) are used when input quantity is low, but direct sequencing of native DNA is also common.
  5. Load library onto flow cell

    • Introduce library onto the nanopore device for sequencing.

5.1: Essential steps and base calling

Illumina sequencing-by-synthesis (second-generation)

  1. Cluster generation

    • Adapter-ligated DNA binds to oligos on the flow cell surface and undergoes bridge amplification to form clonal clusters, each cluster representing many copies of one original molecule. pmc.ncbi.nlm.nih
  2. Sequencing-by-synthesis cycles

    • Reversible terminator nucleotides (A, C, G, T), each with a distinct fluorescent label, are added.
    • DNA polymerase incorporates a single nucleotide at each cluster; imaging detects fluorescence at every cycle. pmc.ncbi.nlm.nih
  3. Base calling

    • For each cycle, the color at each cluster is recorded.
    • Image data are converted into intensity traces, then into base calls with quality scores (e.g., Phred Q-scores). pmc.ncbi.nlm.nih
  4. Output

    • Short reads (FASTQ files) with per-base qualities, typically accompanied by indices to map reads back to specific oligos. pmc.ncbi.nlm.nih
    • In data storage, these reads are de-multiplexed and decoded to reconstruct the original digital file. academic.oup
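
The Phred quality scores mentioned in the base-calling step relate to error probability by Q = −10·log10(P). The sketch below converts between the two and decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding. The function names and the example quality string are my own.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)

def decode_fastq_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

quals = decode_fastq_quality("II5+")
print(quals)  # [40, 40, 20, 10]
for q in quals:
    print(q, phred_to_error_prob(q))
```

A score of Q30 corresponds to a 1-in-1,000 chance of a wrong base call and Q20 to 1 in 100. For data storage, scores like these could be used to weight reads when building a consensus.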

Oxford Nanopore sequencing (third-generation)

  1. DNA translocation through nanopores

    • DNA–adapter complexes are captured by nanopores in a membrane.
    • A motor protein ratchets DNA through the pore one or a few nucleotides at a time. chemistryworld
  2. Signal measurement

    • Ionic current across the pore fluctuates according to the sequence of nucleotides occupying the pore; this generates a continuous electrical trace. chemistryworld
  3. Base calling

    • Machine-learning models (e.g., RNNs, CNNs) translate current patterns into base sequences, assigning probabilities to each base at each position. chemistryworld
  4. Output

    • Long reads in FASTQ or FAST5 format with per-base quality scores. chemistryworld
    • For data storage, reads are aligned to expected oligo designs, consensus is built, then digital bits are reconstructed via the encoding scheme. academic.oup
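
The consensus step for noisy reads can be sketched as a column-wise majority vote. This minimal version assumes the reads have already been aligned and trimmed to the same length, so it only corrects substitutions. Handling the insertions and deletions common in nanopore data would require a proper alignment step first. The reads shown are made-up examples.

```python
from collections import Counter

def majority_consensus(reads):
    """Column-wise majority vote over pre-aligned reads of equal length."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be pre-aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )

# Three hypothetical reads of the same oligo, each with at most one error.
reads = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(majority_consensus(reads))  # ACGTAC
```

With enough independent reads per oligo, random errors are outvoted at each position, which is how redundancy compensates for a higher raw error rate.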

5.2 DNA Write

(i) What DNA would I synthesize and why?

I would synthesize a small DNA origami scaffold plus staple set encoding a recognizable 2D pattern (e.g., a smiley face or logo) that also embeds a short DNA data storage message in some staple sequences. academic.oup

This merges structural DNA nanotechnology (origami) with information-encoding DNA, demonstrating dual use: nanoscale art or devices and simultaneous archival of metadata (e.g., author, date, or a URL encoded in staples). academic.oup

Concretely, this could involve:

  • A ~7 kb scaffold (e.g., M13mp18-derived) with a designed folding path.
  • 200–250 staple oligos (20–60 nt) whose sequences are chosen both to fold the structure and to encode a short binary payload in some of their variable regions via an ATCG coding scheme. academic.oup

(ii) What technology would I use and why?

To synthesize these DNA sequences, I would use a commercial array-based phosphoramidite solid-phase synthesis platform such as that offered by Twist Bioscience or similar vendors. pmc.ncbi.nlm.nih

Reasons:

  • Array-based synthesis on silicon supports massively parallel synthesis of thousands of oligos (all the origami staples, plus any data-encoding variants) in one run at relatively low cost. healthandwealth.substack
  • Phosphoramidite chemistry is the current workhorse for high-throughput, custom oligo synthesis and is well supported by industrial pipelines. pmc.ncbi.nlm.nih

For longer constructs (e.g., custom scaffolds beyond single-oligo length), I would rely on gene synthesis workflows where shorter oligos are synthesized and then enzymatically assembled into longer double-stranded fragments. healthandwealth.substack


5.2: Essential steps of the synthesis method

(A) Chemical oligonucleotide synthesis (phosphoramidite)

  1. Design sequences

    • Use DNA origami design software (e.g., caDNAno) to design scaffold routing and staple sequences; optionally overlay a digital data encoding scheme onto selected staples. academic.oup
  2. Solid-phase phosphoramidite cycles

    • Oligos are synthesized 3’→5’ on a solid support (e.g., controlled-pore glass or silicon chip).
    • Each nucleotide addition involves:
      • De-protection of the terminal 5’ hydroxyl
      • Coupling of activated phosphoramidite nucleotide
      • Capping of unreacted 5’ OH
      • Oxidation of the newly formed phosphite triester linkage to a more stable phosphate triester. pmc.ncbi.nlm.nih
  3. Cleavage and deprotection

    • Oligos are cleaved from the solid support and base-protecting groups are removed. pmc.ncbi.nlm.nih
  4. Purification

  5. Assembly (for longer DNA)

    • For long scaffolds or genes:
      • Overlapping oligos are combined and assembled by PCR, Gibson assembly, or other enzymatic methods to yield kilobase-scale dsDNA. pmc.ncbi.nlm.nih
  6. Quality control

    • Verify oligo pools or assembled genes by mass spectrometry, capillary electrophoresis, and/or test sequencing. healthandwealth.substack

(B) DNA origami folding

  1. Mix scaffold and staples

    • Combine scaffold strand with large excess of staple oligos in appropriate buffer (e.g., Mg²⁺-containing folding buffer).
  2. Thermal annealing

    • Heat to denature, then slowly cool to allow staples to hybridize and fold scaffold into target shape.
  3. Validation

    • Observe structures by AFM or TEM (in principle for the project, though not strictly required for the “writing” step).

5.2: Limitations of the synthesis method

  • Length limitations

    • Phosphoramidite synthesis suffers from cumulative error; typical high-fidelity oligos are reliable up to ~200 nt, beyond which error rates and synthesis failures increase. healthandwealth.substack
    • Platforms like Twist can assemble longer gene fragments (up to ~1.8 kb) by combining shorter oligos, but this adds complexity and cost. healthandwealth.substack
  • Error profile

    • Errors include substitutions, deletions, and truncated products; these require purification or downstream error-correction (e.g., cloning and sequencing, or redundancy for data storage). pmc.ncbi.nlm.nih
  • Speed and throughput

    • Parallel synthesis arrays are fast for thousands of oligos, but turnaround time is still on the order of days to weeks from design to delivery. healthandwealth.substack
  • Scalability and cost

    • Cost per base remains higher than ideal for very large-scale DNA data storage or genome-scale synthesis; enzymatic synthesis methods (e.g., TdT-based) are being developed to address this but have their own biases and limitations. pmc.ncbi.nlm.nih
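
The length limitation above follows from simple arithmetic: if each coupling succeeds with probability c, an n-nt oligo needs n − 1 couplings, so the expected full-length fraction is roughly c^(n − 1). The Python sketch below illustrates this. The coupling efficiency of 99.5% is an assumed example value, not a figure from the cited sources, and the model ignores other error types such as depurination.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float = 0.995) -> float:
    """Approximate fraction of full-length product for an oligo of `length_nt`.

    Simplified model: every coupling succeeds independently with the same
    probability, and an n-nt oligo requires n - 1 couplings.
    The default efficiency is an illustrative assumption.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for n in (20, 60, 100, 200, 300):
        print(f"{n:>4} nt: {full_length_yield(n):.1%} full-length")
```

Even with a high per-step efficiency, the full-length fraction drops steadily with length, which is why longer constructs are assembled from shorter oligos.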

5.3 DNA Edit

(i) What DNA would I edit and why?

I would edit the genomes of closely related extant species to introduce traits from extinct animals, in line with de-extinction efforts such as Colossal Biosciences’ woolly mammoth–Asian elephant and dodo–Nicobar pigeon projects.

Specifically, edits would:

  • Install cold-adaptation traits (e.g., thick fur, fat metabolism adaptations) into Asian elephant genomes to create mammoth-like hybrid elephants that could help restore Arctic grassland ecosystems.
  • Introduce morphological and ecological traits of the dodo into the genome of the Nicobar pigeon to re-establish a functional analog of the extinct bird in its native habitat.

These edits have both conservation (ecosystem restoration, genetic rescue) and scientific (understanding genotype-phenotype relationships) motives.

(ii) What technology would I use and why?

I would use CRISPR–Cas9 genome editing, combined with cell culture and reproductive technologies (e.g., iPSCs and cloning), because CRISPR allows programmable, multiplexable edits at specific genomic loci with relatively high efficiency. CRISPR is already being applied in de-extinction pipelines: researchers extract and sequence ancient DNA, compare genomes between extinct and extant species, then program CRISPR guides to install key variants in cells of the living relative.


5.3: How CRISPR edits DNA – essential steps

  1. Ancient DNA reconstruction and target selection

    • Extract DNA from well-preserved remains of the extinct species (e.g., mammoth tusk, dodo bone).
    • Use high-throughput sequencing and assembly methods to reconstruct as much of the extinct genome as possible.
    • Align extinct and extant genomes to identify candidate genes and variants associated with desired traits (e.g., fur density, fat storage, beak shape).
  2. CRISPR design

    • For each target gene:
      • Design guide RNA (gRNA) sequences that match the extant species’ genomic sequence near the intended edit (adjacent to appropriate PAM sites).
      • Design donor DNA templates (ssODN or plasmid donors) encoding the extinct-species variant(s) if precise knock-ins are required.
  3. Editing in cells

    • Choose a suitable cell type from the extant species (e.g., fibroblasts, endothelial progenitor cells, or iPSCs).
    • Deliver the CRISPR components:
      • Cas9 protein or mRNA
      • gRNA(s)
      • Donor DNA (for homology-directed repair)
      • Delivery by electroporation, nucleofection, or viral vectors.
  4. Mechanism of edit

    • Cas9–gRNA complexes bind the target DNA sequence and introduce a double-strand break.
    • Cellular repair pathways act:
      • Non-homologous end joining (NHEJ) introduces insertions/deletions (useful for knockouts).
      • Homology-directed repair (HDR) uses supplied donor DNA to introduce precise nucleotide substitutions or insertions (needed for dodo/mammoth variants).
  5. Clone selection and validation

    • Screen edited cells by PCR and sequencing to confirm correct edits, check for off-target changes, and isolate clones carrying multiple desired edits.
  6. Embryo generation and development

    • Reprogram edited cells into iPSCs (if not already), differentiate where needed, or use nuclear transfer to create embryos.
    • Implant embryos into surrogate mothers or use ex utero gestation platforms if developed.
  7. Phenotypic assessment

    • Assess whether edited animals show target traits (e.g., cold resistance, morphology) and evaluate ecological impacts before any reintroduction.
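
The guide design in step 2 starts with locating candidate target sites next to a PAM. Below is a minimal Python sketch that scans both strands for 20-nt protospacers followed by an NGG PAM, the motif recognized by the commonly used SpCas9. The function names are hypothetical, and the sketch does no scoring for on-target activity or off-target risk, which dedicated gRNA design tools handle.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, guide_len: int = 20) -> list[tuple[str, str, str, int]]:
    """Return (strand, protospacer, PAM, start) for every NGG-adjacent site.

    `start` is the 0-based index on the scanned strand; for "-" hits that
    means the index on the reverse complement. Illustrative helper only.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)  # lookahead keeps overlapping hits
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.group(1), match.group(2), match.start()))
    return hits


if __name__ == "__main__":
    toy_locus = "TTGACCTGAAGCTAGCTAGGATCCGATCGATTGGCCATAGCTAGG"  # made-up sequence
    for hit in find_spcas9_sites(toy_locus):
        print(hit)
```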

5.3: Inputs and preparation

Design-stage inputs

  • Reconstructed extinct-species genome sequence.
  • High-quality reference genome of the extant species.
  • Computational pipelines for variant calling, functional annotation, and gRNA design.
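
As a toy illustration of what the variant-calling pipeline produces, the Python sketch below compares two already-aligned sequences (gaps written as `-`) and lists where the extinct sequence differs from the extant one. The function name and the example sequences are made up. Real ancient-DNA pipelines also have to handle sequencing errors and damage patterns, which this sketch does not attempt.

```python
def list_differences(extant_aln: str, extinct_aln: str) -> list[tuple[int, str, str, str]]:
    """Compare two pre-aligned sequences and return (position, extant, extinct, kind).

    `position` is the 1-based coordinate in the ungapped extant sequence; for an
    insertion it is the position of the extant base just before the inserted base.
    Hypothetical helper for illustration only.
    """
    if len(extant_aln) != len(extinct_aln):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    ref_pos = 0
    for extant_base, extinct_base in zip(extant_aln.upper(), extinct_aln.upper()):
        if extant_base != "-":
            ref_pos += 1  # advance only when the extant sequence has a base here
        if extant_base == extinct_base:
            continue
        if extant_base == "-":
            kind = "insertion"
        elif extinct_base == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        differences.append((ref_pos, extant_base, extinct_base, kind))
    return differences


if __name__ == "__main__":
    for diff in list_differences("ACGT-ATTG", "ACCTGAT-G"):
        print(diff)
```

Each reported difference is a candidate edit; deciding which ones matter for a trait still requires functional annotation.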

Wet-lab inputs

  • Living cells from the extant species (e.g., Asian elephant, Nicobar pigeon).
  • Cas9 (protein or mRNA).
  • Synthetic gRNAs for each targeted locus.
  • Donor DNA templates for precise edits.
  • Plasmids or RNP complexes for delivering the CRISPR machinery.

5.3: Limitations of the editing method

  • Efficiency and multiplex editing

    • Editing many loci simultaneously (required to approximate an extinct genome’s phenotype) can lower efficiency per site, sharply reduces the fraction of cells carrying every intended edit, and complicates clone selection.
  • Precision and off-target effects

    • gRNAs can direct Cas9 to partially mismatched sites, causing off-target cuts and unintended mutations, a concern for both animal welfare and ecological safety.
  • Context dependence

    • Introducing a few key genes from an extinct species into a modern genome may not fully recapitulate the original phenotype due to epistasis and regulatory differences.
  • Developmental and reproductive challenges

    • Edited embryos may fail to develop properly; gestation in surrogate species presents immunological, anatomical, and ethical hurdles.
  • Ethical and ecological limits (conceptual, not technical)

    • Even with powerful CRISPR tools, decisions about which traits to edit, how many individuals to release, and how to manage ecological consequences are non-trivial and may limit real-world deployments.
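
The multiplex-editing limitation can be made concrete with a rough probability estimate. If each locus is edited correctly with probability p, and loci are assumed to behave independently (a simplification), the chance that one cell carries all n edits is p^n. The Python sketch below computes that and the expected number of clones to screen; the function names and the example efficiency are illustrative assumptions, not measured values.

```python
def prob_all_edited(per_site_efficiency: float, n_sites: int) -> float:
    """Probability that a single cell carries every intended edit.

    Assumes each site is edited independently with the same efficiency,
    which real multiplex experiments only approximate.
    """
    if not 0.0 < per_site_efficiency <= 1.0 or n_sites < 1:
        raise ValueError("need 0 < efficiency <= 1 and n_sites >= 1")
    return per_site_efficiency ** n_sites


def expected_clones_to_screen(per_site_efficiency: float, n_sites: int) -> float:
    """Average number of clones to screen before finding one fully edited clone."""
    return 1.0 / prob_all_edited(per_site_efficiency, n_sites)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        p_all = prob_all_edited(0.5, n)  # 50% per site is a made-up example value
        print(f"{n:>2} sites: P(all edited) = {p_all:.2e}, "
              f"about {expected_clones_to_screen(0.5, n):,.0f} clones to screen")
```

The steep drop with n is one reason multi-locus edits are often installed over several sequential rounds rather than all at once.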