Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices


  • Week 2 HW: DNA Read, Write, and Edit


Subsections of Homework

Week 1 HW: Principles and Practices

Assignment 1

1. Gen Z and other young consumers are increasingly choosing scents based on mood versus identity, and their desire to use fragrance as a tool for communication and emotional intelligence is reshaping the fragrance market. The application I’ll be discussing this week - and my idea for a final project - is the development of a novel scent (perfume) called “Force Ambrosiaque”, a biological and chemical engineering-informed fragrance system that passively modulates its emitted scent profile in response to the wearer’s physiological state, without electronics, sensors, or living organisms.

This project has two tightly linked, aspirational engineering components that will act as conditions for successful completion:

First, I aim to improve the biosynthesis of ambergris-like fragrance materials. Ambergris is a rare material harvested from sperm whales with obvious environmental consequences to its attainment. As a result, alternative amber materials have been engineered to fill its place; these scents are central to modern perfumery but are often produced through petrochemical routes that yield mixtures of isomers with poorly controlled perceptual and behavioral effects. By engineering more selective biosynthetic routes through the use of microorganisms, the goal is to produce amber scaffolds that are chemically stable and clean, environmentally preferable, and more predictable.

Second, I am aiming to create a fully bottle-contained, self-modulating fragrance system that responds to changes in skin temperature, moisture, pH, enzymatic activity, and time. This is achieved through precursor molecules, stimulus-responsive carriers, and an understanding of how to work the natural volatility of various components into a stabilized perfume, so that the formulation itself determines which scent components are released under different physiological conditions. The system is designed to translate internal state into subtle changes in scent expression.

This system is inspired by Iso E Super, the aroma chemical showcased on its own in the perfume Molecule 01. It is a near-universally appealing molecule developed in the 1970s and often likened to an “olfactory MSG”. A stabilizer, it helps facilitate the aforementioned gated behaviors, but imperfectly. It also causes a number of potential adverse effects. I would like the backbone of this formula to be an Iso E Super successor.

The motivation for developing this tool is threefold:

1. to explore a constrained, safety-conscious form of biological engineering;

2. to treat scent as a communicative and expressive interface grounded in ancient cultural links between affect, embodiment, and social presence; and

3. to investigate whether responsive chemical systems can be designed and used ethically and with minimal ecological impact, while reducing adverse effects for consumers. I will touch on this more in the next section.

2. The overarching governance-related goals of this project are to ensure that Force Ambrosiaque contributes to an ethical future of perfuming by reducing potential adverse effects of Iso E Super-family molecules and facilitating more ecologically friendly methods of producing prized ingredients and scents. However, a third consequential goal emerges from the existence of this project at all and general consumer-unfriendly trends in the cosmetics industry, which is to avoid over-promising on the “abilities” of this fragrance and avoid consumer deception in a time of scientific illiteracy.

Goal A: Human Safety

Ensure that Force Ambrosiaque does not cause physical harm to users or bystanders, particularly in light of its use of amber-like molecules, Iso E Super-family molecules, and additional responsive fragrance chemistry.

Sub-goals:

  • Prevent acute harms such as skin irritation/sensitization or respiratory events due to volatility spikes (changes in rate or concentration of molecule release or application).
  • Prevent long-term harms related to cumulative exposure and uncertainty around use of novel biosynthetic fragrance compounds.
  • Ensure that biological engineering advances used to improve fragrance ingredients do not bypass existing toxicological, dermatological, or consumer safety evaluation norms.

Goal B: Environmental Stewardship

Ensure that neither the biosynthetic production of amber-like materials nor the downstream consumer use of the fragrance introduces ecological harm, and that the result works as a 1:1 replacement option for ambergris.

Sub-goals:

  • Avoid environmentally harmful, accumulative, or toxic components in production, formulation, and waste (as well as after disposal) especially in waterways.
  • Enable rapid assessment and response if environmental risks are identified after deployment.

Goal C: Consumer Transparency

Prevent the novel fragrance system used to develop Force from being used in ways that mislead users or overpromise biological effects for the sake of marketing.

Sub-goals:

  • Preserve user and buyer autonomy by ensuring clear communication and scientific honesty about how the system works, what biological signals it responds to, and the limits of its mechanisms and effects.
  • Prevent marketing that frames the fragrance as a tool for behavioral control, coercion, or guaranteed emotional or interpersonal outcomes.

3. Governance Actions (GAs)

GA1: Premarket safety standards for responsive fragrance systems

Purpose

Currently, fragrance safety evaluation is largely designed for “static” formulations (all fragrances change over time, of course; here I use “static” to mean fragrances designed without responsiveness in mind). I propose a new premarket safety standard specifically for stimulus-responsive fragrance systems, whether genuinely responsive or merely marketed as such, which intentionally change emission profiles in response to skin conditions. This may seem like overreach or pie-in-the-sky thinking, and it very well may be, but for the purposes of this thought experiment I think we can proceed with the idea.

Design

  • Actors involved include fragrance companies, independent toxicology laboratories, academic researchers, and consumer safety regulators.
  • The standard would require testing under worst-case conditions, including elevated heat, moisture, and prolonged wear, mostly for signs of skin or respiratory irritation.
  • Required evaluations would include volatility curves, sensitization testing, inhalation exposure modeling, and degradation profiling (many of these I believe are already standard).
  • Compliance would be required before retail sale.
  • I am unsure if it is necessary that safety evaluations be publicly available, but it’s certainly a possibility.

Assumptions

  • Laboratory testing can meaningfully approximate real-world physiological variability.
  • Standardized protocols will not be prohibitively expensive for small innovators.

Risks of failure and success

  • Failure could occur if companies treat compliance as a rote or box-checking exercise and overlook key reactions, instabilities, impurities, etc.
  • Lack of human testing results in an imperfect system for ascertaining risk.

GA2: Tiered ingredient and supply-chain controls for high-risk components

Purpose

At present, fragrance ingredient sourcing varies widely in transparency and control. This action proposes a tiered governance system for various ingredients, especially for novel biosynthetic amber materials and responsive carrier systems, which may be exceptionally volatile (meant colloquially and scientifically).

Design

  • Actors include chemical suppliers, contract manufacturers, brands, and importers.
  • Ingredients would be classified into tiers based on novelty, reactivity, and exposure risk.
  • Higher-tier components would require enhanced documentation, impurity profiling, and restricted distribution.
  • Audits would verify that final formulations match expectations.

Assumptions

  • Risk can be reasonably stratified by ingredient class and release mechanism as well as compound structure and nature of the ingredient (organic vs synthetic).
  • Market incentives should (?) encourage supplier compliance.

Risks of failure and success

  • Failure could drive sourcing underground or discourage open research.
  • Success could concentrate power among a small number of certified suppliers, raising costs and limiting experimentation (a current issue in the industry).

GA3: Transparent marketing restrictions

Purpose

Currently, fragrance labeling rarely addresses consumer impulse (beyond the one to buy things) or social implications. This action proposes explicit, science-oriented labeling and restrictions on deceptive claims.

Design

  • Actors include brands, retailers, advertising platforms, and consumer protection agencies.
  • Labels would clearly state that the fragrance changes emission profiles with skin conditions and disclose allergens.
  • Marketing claims implying emotional, sexual, or behavioral effects, or overpromising the mechanisms of the formula, would be prohibited.
  • Usage guidance would discourage deployment in enclosed public spaces without consent.

Assumptions

  • Transparency meaningfully influences consumer behavior and social norms.
  • Advertising platforms can enforce claim standards consistently.

Risks of failure and success

  • Failure could result from unwillingness to read the label or consumer fatigue.
  • Investors or partners could wish for less transparency on account of potentially “demystifying” the product.

4. Rubric: Embodied Safety, Transparency, and Stewardship

Scoring: 1 = best, 2 = moderate, 3 = weakest, n/a = not applicable

Governance Options

  • Option 1: Premarket safety standards for responsive fragrance systems
  • Option 2: Tiered ingredient and supply-chain controls
  • Option 3: Labeling and marketing restrictions

5. Drawing on the scoring, I would prioritize a combined approach: Option 1 + Option 3 as the baseline, with a narrowly targeted version of Option 2 applied only to clearly defined higher-risk components - for the scope of this project I feel that leaning too hard into option 2 would simply be infeasible.

Option 1 performs best on preventing physical harm and supporting constructive development, which is the core ethical requirement for a responsive, body-exposed system. Option 3 performs best on transparency, autonomy, feasibility, and low stakeholder burden, and directly addresses the highest-likelihood misuse pathway for this space: overpromising effects where people don’t understand the science (a similar thing was done in the past with alleged “pheromone” perfumes). This is destructive for the industry and undermines the public’s already fragile scientific trust.

Option 2 is valuable where environmental risk and traceability matter most, but it is also the most likely to impede research and concentrate overspending beyond the means of this project, so it should be reserved for specific ingredient classes that warrant heightened control and is more of an aspirational criterion as far as this project is concerned.

The main trade-off is between safety assurance and innovation cost. Option 1 can raise barriers for small teams if compliance is expensive. Option 3 reduces deception and supports autonomy, but could “over-explain” the product. Finally, Option 2 improves environmental stewardship and recall capability, but broad supply-chain restrictions could push budget-constrained teams toward opaque sourcing. A targeted tiered approach like the one explained above reduces that risk. I would direct this recommendation to a consumer product safety regulator and a retail and advertising coalition composed of large retailers, major e-commerce platforms, and major ad platforms, paired with an industry standards consortium that includes fragrance houses, contract manufacturers, and independent toxicology labs.

This recommendation assumes that premarket testing can meaningfully approximate real-world variability across diverse skin types, climates, and usage patterns, and that labeling and advertising enforcement can be applied consistently enough to reduce deceptive claims. It also assumes that the most consequential harms are likely to arise from (1) unanticipated exposure and sensitization and (2) coercive or misleading marketing.

Key uncertainties include the behavior of novel compounds under poor storage or usage conditions, and cultural effects around the concept of a perfume’s active role as a conduit for physiological/biological signal.

This week’s class reminded me to contemplate how ethical risk often arises not from overtly malicious intent, but from misalignment between technical capability, social interpretation, and governance scope. One concern that became salient to me in the context of cosmetics rather than my home territory of software and medical devices (where this is really, really important) is epistemic harm, a concept also raised and debated briefly in the Zoom chat during lecture: the risk that scientifically adjacent products can mislead users through the appearance of biological authority, even when physical risks are low. In the context of responsive or “bio-inspired” systems, overpromising mechanisms, effects, or certainty can undermine autonomy just as much as coercion, particularly when consumers lack the tools to distinguish poetic or marketing or abstract framing from empirical claims.

To address these issues, governance actions should extend beyond traditional safety regulation. In addition to premarket safety standards that account for dynamic exposure and degradation, I think claims governance and transparency requirements are especially important. I also see value in proportional, tiered oversight that focuses stricter controls on higher-risk components or contexts, rather than broadly constraining exploratory research. Together, these approaches help preserve innovation while addressing the ethical risks that arise at the intersection of embodiment, perception, and trust.

Homework Questions

Homework Questions from Professor Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

The length of the human genome is 3.2 Gbp (3.2 billion base pairs). With error correcting polymerase, the error rate during synthesis is 1 error:10^6 nucleotides. That’s about 3200 errors per cell division, a stark difference from the fewer than 5 errors that actually are observed during genome replication in humans. This is because exonucleolytic proofreading and mismatch repair systems save us from catastrophe.
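The arithmetic behind that estimate can be sketched in a few lines of Python. The function names are my own; the only inputs are the 3.2 Gbp genome length, the 1-in-10^6 error rate, and the "fewer than 5" observed errors quoted above.

```python
GENOME_BP = 3.2e9          # human genome length used above (one copy)
POLYMERASE_ERROR = 1e-6    # errors per nucleotide, from the answer above

def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of misincorporated bases in one pass over the genome."""
    return genome_bp * error_rate

def implied_error_rate(observed_errors: float, genome_bp: float) -> float:
    """Per-base error rate implied by an observed error count per replication."""
    return observed_errors / genome_bp

raw = expected_errors(GENOME_BP, POLYMERASE_ERROR)
print(f"Expected errors per replication: {raw:.0f}")
print(f"Rate implied by 5 errors: {implied_error_rate(5, GENOME_BP):.1e} per base")
```

Going from roughly 3,200 errors to fewer than 5 means proofreading and mismatch repair together have to improve fidelity by a factor of several hundred.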

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

There is an astronomically large number of ways to code for an average human protein: the 64 possible triplet codons encode only 20 amino acids (plus stop signals), so most amino acids have several synonymous codons, and the average human protein is a few hundred amino acids long. Although these sequences may, in theory, produce the same protein, not all codes are equally effective (codon usage bias, translation speed, mRNA stability, protein folding, etc.).
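To put a number on “astronomically large”, here is a minimal Python sketch that multiplies the codon counts per amino acid in the standard genetic code. The function names and the ten-residue example peptide are illustrative choices of mine, not part of the assignment.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences encoding `protein` (stop codon excluded)."""
    return math.prod(DEGENERACY[aa] for aa in protein)

def log10_encodings(protein: str) -> float:
    """Same count on a log10 scale, handy for proteins hundreds of residues long."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)

peptide = "MAEPRQEFEV"  # short example peptide, chosen only for illustration
print(count_encodings(peptide))
print(f"about 10^{log10_encodings(peptide):.1f} encodings")
```

Because the count is a product over residues, it grows exponentially with protein length, which is why even a modest protein has far more synonymous encodings than anyone could ever test.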

Homework Questions from Dr. LeProust

What’s the most commonly used method for oligo synthesis currently?

Solid-phase phosphoramidite synthesis is currently the dominant method in industry: nucleotides are added one base at a time, through a cyclic process, to a growing oligonucleotide.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Each coupling step is not 100% efficient, so even with very good coupling efficiency the yield of full-length product drops exponentially with increasing length, and past a certain point direct synthesis stops being practical in terms of both yield and error rate.

Why can’t you make a 2000bp gene via direct oligo synthesis?

Due to the bp length, basically no full length molecules would be produced due to the accumulation of errors mentioned above. In addition, purification and quality control would be extremely difficult (separating the desired signal of 2000bps from a background of noise). 
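The exponential yield loss behind both answers above can be illustrated with a short Python sketch. The coupling efficiencies (99% and 99.5%) are assumed round numbers for illustration, not figures from the lecture, and the model ignores every error source other than failed couplings.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that reach full length after (length_nt - 1) couplings."""
    return coupling_efficiency ** (length_nt - 1)

for length in (20, 200, 2000):
    for eff in (0.99, 0.995):  # assumed per-step coupling efficiencies
        frac = full_length_fraction(length, eff)
        print(f"{length:>5} nt at {eff:.1%} coupling: {frac:.2e} full-length")
```

Under these assumptions a 200-mer at 99% coupling yields only about 13.5% full-length strands, and a 2000-mer yields effectively none, which is why genes are assembled from shorter oligos instead.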

Homework Question from George Church:

[Using Google & Prof. Church’s slide #4]   What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in all animals - including us humans - are arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. Based on what we now know, it’s funny - allegedly, Dr. Henry Wu of Jurassic Park modified the genome to knock out the capability of dinosaurs to create their own lysine… but obviously, they couldn’t produce it anyway, and they would simply continue to get it from their diet. An unlikely oversight driven by hubris.

Week 2 HW: DNA Read, Write, and Edit

Part 1: Benchling and In-silico Gel Art 

I started this week with great ambition. Frankly, that faded pretty fast! Initially, I wanted to only use enzymes that started with letters in my name (Katharine) but that was over-ambitious given the time I had available to devote to the exercise this week. I had a couple ideas in mind for design… a space invader; the Windows logo. I had used Benchling before (in fact I’m trying to design a competitor) but was a bit rusty, so it took me a second to get my bearings. Following the documentation provided (thank you!) and playing around a bit with Ronan’s website for some inspiration, I was able to kind of make “HI”. Initially, I also included an exclamation point, but it just wasn’t translating. See below for screenshots of my viewer and the final PNG. Enzymes were selected and arranged according to the steps outlined in the provided protocol.

Part 2 was not completed due to lack of lab access. However, I ran gels often at an old gig looking at shark DNA. Took me a second to stop breaking the wells. Pissed my manager off a lot, though she did her best not to show it. Miss those days!

Part 3: DNA Design Challenge

3.1 Choose Your Protein

I have chosen the human microtubule-associated protein tau for this exercise. Before my current job at HoX, I spent 6 years working in Translational R&D at Ed Boyden and Li-Huei Tsai’s incredible startup, Cognito Therapeutics. As the “amyloid hypothesis” (which posits that the buildup of amyloid-beta peptides in the brain is the primary driver of Alzheimer’s pathogenesis) has fallen by the wayside, and interest in tau has risen in its place, I spent many hours devising a schematic/protocol for the exploration of novel tau and tau-adjacent biomarkers in our pivotal trial participants, and that’s all I can say for now 🙂

Here’s the sequence of this nasty thing! 

>sp|P10636|TAU_HUMAN Microtubule-associated protein tau OS=Homo sapiens OX=9606 GN=MAPT PE=1 SV=5

MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPG

SETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAG

HVTQEPESGKVVQEGFLREPGPPGLSHQLMSGMPGAPLLPEGPREATRQPSGTGPEDTEG

GRHAPELLKHQLLGDLHQEGPPLKGAGGKERPGSKEEVDEDRDVDESSPQDSPPSKASPA

QDGRPPQTAAREATSIPGFPAEGAIPLPVDFLSKVSTEIPASEPDGPSVGRAKGQDAPLE

FTFHVEITPNVQKEQAHSEEHLGRAAFPGAPGEGPEARGPSLGEDTKEADLPEPSEKQPA

AAPRGKPVSRVPQLKARMVSKSKDGTGSDDKKAKTSTRSSAKTLKNRPCLSPKHPTPGSS

DPLIQPSSPAVCPEPPSSPKYVSSVTSRTGSSGAKEMKLKGADGKTKIATPRGAAPPGQK

GQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREP

KKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLD

LSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEK

LDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDT

SPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL

3.2 Reverse Translate: Protein sequence to DNA sequence

Here is the unoptimized DNA sequence for this protein, generated with the reverse translation tool at bioinformatics.org:

atggcggaaccgcgccaggaatttgaagtgatggaagatcatgcgggcacctatggcctg

ggcgatcgcaaagatcagggcggctataccatgcatcaggatcaggaaggcgataccgat

gcgggcctgaaagaaagcccgctgcagaccccgaccgaagatggcagcgaagaaccgggc

agcgaaaccagcgatgcgaaaagcaccccgaccgcggaagatgtgaccgcgccgctggtg

gatgaaggcgcgccgggcaaacaggcggcggcgcagccgcataccgaaattccggaaggc

accaccgcggaagaagcgggcattggcgataccccgagcctggaagatgaagcggcgggc

catgtgacccaggaaccggaaagcggcaaagtggtgcaggaaggctttctgcgcgaaccg

ggcccgccgggcctgagccatcagctgatgagcggcatgccgggcgcgccgctgctgccg

gaaggcccgcgcgaagcgacccgccagccgagcggcaccggcccggaagataccgaaggc

ggccgccatgcgccggaactgctgaaacatcagctgctgggcgatctgcatcaggaaggc

ccgccgctgaaaggcgcgggcggcaaagaacgcccgggcagcaaagaagaagtggatgaa

gatcgcgatgtggatgaaagcagcccgcaggatagcccgccgagcaaagcgagcccggcg

caggatggccgcccgccgcagaccgcggcgcgcgaagcgaccagcattccgggctttccg

gcggaaggcgcgattccgctgccggtggattttctgagcaaagtgagcaccgaaattccg

gcgagcgaaccggatggcccgagcgtgggccgcgcgaaaggccaggatgcgccgctggaa

tttacctttcatgtggaaattaccccgaacgtgcagaaagaacaggcgcatagcgaagaa

catctgggccgcgcggcgtttccgggcgcgccgggcgaaggcccggaagcgcgcggcccg

agcctgggcgaagataccaaagaagcggatctgccggaaccgagcgaaaaacagccggcg

gcggcgccgcgcggcaaaccggtgagccgcgtgccgcagctgaaagcgcgcatggtgagc

aaaagcaaagatggcaccggcagcgatgataaaaaagcgaaaaccagcacccgcagcagc

gcgaaaaccctgaaaaaccgcccgtgcctgagcccgaaacatccgaccccgggcagcagc

gatccgctgattcagccgagcagcccggcggtgtgcccggaaccgccgagcagcccgaaa

tatgtgagcagcgtgaccagccgcaccggcagcagcggcgcgaaagaaatgaaactgaaa

ggcgcggatggcaaaaccaaaattgcgaccccgcgcggcgcggcgccgccgggccagaaa

ggccaggcgaacgcgacccgcattccggcgaaaaccccgccggcgccgaaaaccccgccg

agcagcggcgaaccgccgaaaagcggcgatcgcagcggctatagcagcccgggcagcccg

ggcaccccgggcagccgcagccgcaccccgagcctgccgaccccgccgacccgcgaaccg

aaaaaagtggcggtggtgcgcaccccgccgaaaagcccgagcagcgcgaaaagccgcctg

cagaccgcgccggtgccgatgccggatctgaaaaacgtgaaaagcaaaattggcagcacc

gaaaacctgaaacatcagccgggcggcggcaaagtgcagattattaacaaaaaactggat

ctgagcaacgtgcagagcaaatgcggcagcaaagataacattaaacatgtgccgggcggc

ggcagcgtgcagattgtgtataaaccggtggatctgagcaaagtgaccagcaaatgcggc

agcctgggcaacattcatcataaaccgggcggcggccaggtggaagtgaaaagcgaaaaa

ctggattttaaagatcgcgtgcagagcaaaattggcagcctggataacattacccatgtg

ccgggcggcggcaacaaaaaaattgaaacccataaactgacctttcgcgaaaacgcgaaa

gcgaaaaccgatcatggcgcggaaattgtgtataaaagcccggtggtgagcggcgatacc

agcccgcgccatctgagcaacgtgagcagcaccggcagcattgatatggtggatagcccg

cagctggcgaccctggcggatgaagtgagcgcgagcctggcgaaacagggcctg
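For anyone curious how a reverse-translation tool arrives at a sequence like this, here is a minimal Python sketch. The codon table is simply read off the output above (one fixed codon per amino acid), with tryptophan included for completeness since TGG is its only codon. It mirrors the output and is not a claim about how the bioinformatics.org tool is actually implemented.

```python
# One codon per amino acid, read off the unoptimized output above.
CODON_FOR = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}

def reverse_translate(protein: str) -> str:
    """Map each residue to its single chosen codon and concatenate."""
    return "".join(CODON_FOR[aa] for aa in protein.upper())

# First line of the tau FASTA record from section 3.1.
first_line = "MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPG"
print(reverse_translate(first_line))
```

Using one codon per residue is the simplest possible choice; it is also why this version is “unoptimized” for a human host, which motivates the next step.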

3.3 Codon optimization

Here is the codon-optimized DNA sequence from vectorbuilder.com: 

ATGGCTGAGCCCCGGCAGGAGTTCGAAGTGATGGAAGACCATGCTGGAACCTATGGTCTGGGCGACAGGAAGGACCAGGGCGGATACACAATGCATCAGGACCAGGAGGGCGACACAGACGCCGGCCTGAAAGAGTCTCCCCTGCAGACCCCTACCGAAGACGGGTCAGAAGAGCCCGGCTCTGAGACCTCTGACGCTAAGAGCACACCAACCGCCGAGGATGTCACCGCCCCCCTGGTGGATGAAGGCGCCCCCGGCAAACAGGCCGCAGCCCAGCCCCACACTGAGATCCCCGAAGGAACAACCGCTGAGGAGGCCGGCATTGGCGATACCCCTTCTCTGGAAGATGAAGCCGCCGGGCACGTGACCCAGGAACCTGAGTCTGGAAAGGTCGTGCAGGAAGGCTTCCTGCGCGAGCCAGGGCCTCCCGGACTGTCTCACCAACTCATGAGCGGCATGCCCGGGGCCCCTTTACTCCCCGAGGGTCCCAGAGAGGCCACACGTCAGCCATCTGGAACAGGCCCCGAGGACACCGAAGGCGGTAGACATGCTCCAGAGCTGCTTAAACACCAGCTGCTGGGCGACCTCCACCAGGAGGGCCCTCCTCTGAAGGGCGCCGGGGGCAAGGAAAGGCCCGGCAGTAAAGAGGAAGTGGATGAGGACAGAGATGTGGATGAATCTTCTCCTCAGGATTCTCCCCCATCTAAGGCCTCTCCTGCCCAGGACGGCAGGCCACCTCAGACTGCTGCCAGGGAGGCCACCTCCATTCCTGGATTCCCAGCAGAAGGCGCCATTCCACTGCCCGTGGATTTCCTGTCTAAAGTGTCAACCGAAATCCCCGCTTCTGAACCCGATGGCCCTTCCGTGGGGCGAGCCAAGGGCCAGGACGCCCCTCTGGAGTTCACCTTTCATGTGGAGATAACACCAAACGTGCAGAAGGAGCAGGCTCACTCTGAGGAGCATCTTGGGAGAGCTGCCTTTCCCGGCGCCCCTGGGGAAGGGCCAGAGGCCAGAGGGCCTTCCCTGGGCGAGGACACAAAGGAGGCCGATCTGCCCGAACCTAGCGAGAAGCAGCCCGCTGCTGCTCCTCGCGGGAAACCAGTGTCCCGGGTCCCACAACTCAAGGCTAGAATGGTTTCCAAGTCCAAGGACGGAACAGGCTCAGACGATAAAAAGGCCAAGACTAGCACCCGGTCTAGTGCCAAGACACTGAAAAACCGCCCCTGCCTGAGCCCTAAGCACCCAACACCCGGAAGTTCTGACCCTCTGATTCAGCCCTCTTCCCCTGCAGTGTGTCCCGAGCCTCCTTCCAGTCCCAAATACGTGTCATCTGTAACTAGCCGGACTGGCTCCAGCGGAGCCAAAGAGATGAAGCTCAAGGGGGCCGACGGGAAGACAAAGATTGCCACCCCTCGGGGCGCCGCCCCTCCTGGACAGAAGGGACAGGCCAACGCCACCCGAATCCCTGCCAAGACCCCTCCAGCCCCGAAGACCCCCCCTAGTTCCGGGGAACCTCCCAAGTCTGGAGACCGGTCCGGATATAGTTCACCAGGAAGCCCTGGGACCCCAGGATCTAGGTCCAGGACACCCTCTCTGCCTACTCCCCCTACAAGGGAGCCCAAAAAAGTCGCCGTGGTGAGAACCCCCCCTAAGTCACCCTCCTCCGCTAAATCTCGGCTGCAGACTGCTCCTGTGCCCATGCCTGACCTGAAAAATGTGAAGTCTAAAATCGGCTCCACCGAGAACCTGAAGCACCAGCCCGGGGGCGGCAAAGTGCAAATCATCAATAAGAAGCTGGATCTGTCCAACGTGCAGTCCAAATGCGGGTCCAAGGACAACATCAAGCATGTGCCTGGGGGTGGCTCCGTGCAGATTGTGTACAAGCCCGTGGATCTGAGCAAGGTTACCTCCAAGTGTGGGTCCCTGGGCAATATCCACCACAAGCCAGGAGGCGGACAGGTTGAGGTAAAATCCGAAAAGCTGGACTTTAAGGACCGGGT
GCAGAGCAAAATTGGCTCTCTGGATAATATCACCCACGTGCCAGGAGGCGGCAACAAGAAGATCGAAACCCATAAGCTGACTTTTCGCGAGAATGCCAAGGCAAAGACTGACCACGGGGCCGAGATCGTGTATAAAAGCCCGGTTGTCTCTGGGGATACATCTCCAAGGCACCTGTCCAACGTTAGTTCCACCGGGAGCATCGATATGGTGGATTCTCCTCAACTGGCAACACTGGCCGACGAGGTGTCCGCCTCCCTGGCTAAACAGGGGCTG
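A quick sanity check I find useful after codon optimization is translating the DNA back and confirming the protein is unchanged. The sketch below builds the standard genetic code table and checks only the first 60 bases of the optimized sequence against the first 20 residues of tau; the function and variable names are my own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon (no stop handling)."""
    dna = dna.upper().replace("\n", "")
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

optimized_start = "ATGGCTGAGCCCCGGCAGGAGTTCGAAGTGATGGAAGACCATGCTGGAACCTATGGTCTG"
protein_start = "MAEPRQEFEVMEDHAGTYGL"
print(translate(optimized_start) == protein_start)
```

The same function works on the lowercase unoptimized sequence, so both versions can be checked against the FASTA record in one go.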

3.4 You have a sequence! Now what?

i) What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Let’s remember our cornerstone here, the central dogma: DNA -> RNA -> protein. We just need to choose an environment and the methods by which to execute this workflow.

First, for research purposes, we can decide what expression system we want to use, for example E. coli, human iPSCs, or transgenic mice. Let’s go with stem cells, I think, which already contain the MAPT gene (the gene that encodes microtubule-associated protein tau). And let’s say we want to make pTau-217 specifically, a phosphorylated tau that is a really hot biomarker in Alzheimer’s research. It has recently shot up as a protein of interest but in fact, collaborators and I were researching this protein before the first commercial assay for it even came out (from Eli Lilly). It is a truly fascinating protein implicated in a lot of nervous system diseases, and I encourage interested parties to check it out.

First, we need to get the stem cells to express a neuronal identity. For AD research that is often forebrain cortical neurons. So, we need to turn off the pluripotency genes and turn on neural lineage transcription factors using chromatin remodeling complexes, neuronal transcription factors (to bind regulatory DNA near MAPT), and histone modifications to make the MAPT gene transcriptionally accessible. 

Inside the nucleus, RNA polymerase binds to the MAPT promoter, the DNA unwinds, and the polymerase reads the template strand, bringing us closer to the second step in our central dogma through the production of pre-mRNA. Then, a 5’ cap is added, introns are removed, exons are joined, and a poly-A tail is added. Alternative splicing will determine whether we produce 3R or 4R tau based on the inclusion/exclusion of exons (R represents the number of repeats). Then, the mRNA exits the nucleus. 
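The “polymerase reads the template strand” step can be sketched in Python as below. This is a toy model under loud assumptions: it covers base pairing only (no promoter, capping, splicing, or poly-A tail), and the function names are hypothetical.

```python
COMPLEMENT_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def template_strand(coding_5to3: str) -> str:
    """Template strand (written 5'->3') for a coding strand: its reverse complement."""
    return "".join(COMPLEMENT_DNA[b] for b in reversed(coding_5to3.upper()))

def transcribe(template_5to3: str) -> str:
    """mRNA (5'->3') made by pairing RNA bases along the template, read 3'->5'."""
    return "".join(DNA_TO_RNA_PAIR[b] for b in reversed(template_5to3.upper()))

coding = "ATGGCTGAGCCC"  # first four codons of the optimized sequence in 3.3
template = template_strand(coding)
mrna = transcribe(template)
print(template, mrna)
```

The resulting mRNA matches the coding strand with U in place of T, which is why sequences are conventionally written as the coding strand even though the polymerase physically reads the other one.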

The ribosome then binds the mRNA and begins translation at the start codon: tRNAs match codons with amino acids, and peptide bonds are formed. The polypeptide chain grows until a stop codon is reached.

The final step in generating pTau-217 from endogenous MAPT in iPSC-derived neurons is going to be the phosphorylation of tau at threonine 217 by a proline-directed kinase, a type of kinase that phosphorylates target proteins at serine or threonine residues that are immediately followed by a proline. After tau is translated, the kinase transfers a phosphate group from ATP to the hydroxyl group of Thr217 (threonine at amino acid position 217 in the tau protein), producing phosphorylated tau. This is very strongly correlated with Alzheimer’s pathology and prognosis, and is detectable in cerebrospinal fluid and plasma, making it a fascinating AND accessible exploratory biomarker. We have also seen it to be highly specific in discerning severity of amyloid-associated tau pathology.

3.5 [Optional] How does it work in nature/biological systems?

The system for creating pTau-217 naturally is the same as the steps outlined above here after we get past the turning-stem-cells-to-neurons part. However, I can explain how elevated levels are achieved in humans. This is through kinase overactivation and reduced phosphatase activity, which promote the aggregation of this protein.

i) Describe how a single gene codes for multiple proteins at the transcriptional level.

A single gene can code for multiple proteins at the transcriptional level through several regulatory mechanisms that alter how its RNA transcript is generated or processed before translation. The most common mechanism is alternative splicing, in which different combinations of exons are joined together from the same pre-mRNA by the spliceosome, producing mRNA variants. Another mechanism is alternative promoter usage, where transcription begins at different promoter regions within the same gene, generating transcripts with different 5′ exons and often different N-terminal protein sequences that can affect localization or regulation. Alternative polyadenylation also contributes to diversity by allowing transcription to terminate at different polyadenylation signals, producing mRNAs with different 3′ ends that can alter coding regions or influence mRNA stability and translation efficiency. Finally, RNA editing modifies specific nucleotides within the RNA after transcription, such as converting adenosine to inosine, which can change codons and result in amino acid substitutions not directly encoded in the DNA sequence. Together, these mechanisms enable a single gene to produce multiple distinct protein isoforms, greatly expanding functional diversity without increasing genome size.

ii) Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.

[Did not complete this optional sub-question].

Part 4: Prepare a Twist DNA Synthesis Order

Part 4.2 Build your DNA Insert Sequence

Below are the steps I used to build my insert sequence: 

(Not sure why this turned out so dark, but alas, it went like this and so on and so forth, until…) 

Glorious! 

View it here: https://benchling.com/s/seq-14sPMo30CkZYN9taSAEg?m=slm-oQBXs48FU1YuPssOy0A2 

Note: I wanted to use SBOL but it was throwing me an error the moment I opened the site. 

4.3 / 4.4: Using Twist

Nice! Now back to Benchling… We did it!!!

Part 5: DNA Read/Write/Edit

Part 5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I’ll start with a project pretty close to my work: sequencing DNA in areas of political uprising or conflict zones to test for the use of novel biothreats. In this case, I would want to look at human DNA and RNA. Engineered or heavily mutated pathogens can evade direct identification on the battlefield through novelty, low abundance, or deliberate obfuscation (being engineered to stay under certain detection thresholds or to resist easy categorization), all of which make them hard to match against reference databases. This is a pickle, but biology is on our side, because the human immune response is impossible (for now, thankfully!) to conceal. Whole genome sequencing can reveal susceptibility loci and structural variants, while single-cell RNA sequencing can capture interferon signaling, inflammatory cascades, and isoform switching that signal abnormal biological stress, even if we don’t know which pathogen is causing these phenomena. Time is very much of the essence on the battlefield or in areas where resources are limited or situations are volatile, so it’s not always an option to sequence a novel pathogen and do detective work to figure out exactly what the hell is going on. Integrating host genomics with metagenomic reads gives us a pretty illuminating readout of how the body is responding to a foreign encounter. This host-centered approach, which is growing in popularity, shifts the focus from taxonomy to identifying the impact on humans at a time when taxonomic identification just might not be feasible.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Let’s use the current state of the art, Oxford Nanopore’s MinION, as an example; it enables this kind of portable, real-time, long-read sequencing of host (and pathogen) nucleic acids in low-infrastructure settings. Long reads can resolve structural variation, haplotypes, splice isoforms, and unknown microbial genomes without relying on a reference database, which we have already established can be impractical in these settings.

Also answer the following questions:

Is your method first-, second- or third-generation or other? How so?

Nanopore sequencing is a third-generation sequencing technology: it reads single, native DNA or RNA molecules directly and in real time, including full-length transcripts, without PCR amplification.

What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

Input: Genomic DNA from blood (could also use tissue but blood is easier).

  1. Extract genomic DNA.

  2. Optionally fragment the DNA if more uniform read lengths are desired; with long-read sequencing this is not required, and fragments can be left long.

  3. Perform end repair and dA-tailing to prepare DNA ends.

  4. Ligate sequencing adapters containing a motor protein.

  5. Load the prepared library onto the nanopore flow cell.

For RNA sequencing: 

  1. Extract RNA.

  2. Enrich for polyadenylated RNA if targeting mRNA.

  3. Ligate sequencing adapters directly to RNA.

  4. Attach motor protein and load onto flow cell.

cDNA sequencing (not recommended due to use of PCR):

  1. Extract RNA.

  2. Reverse transcribe RNA into cDNA.

  3. Optionally amplify via PCR.

  4. Perform end repair.

  5. Ligate sequencing adapters.

  6. Load onto flow cell.

What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

  1. A voltage is applied across a membrane containing protein nanopores.

  2. A motor protein feeds a single DNA or RNA strand through the nanopore.

  3. As short groups of bases pass through the pore, they disrupt the ionic current in characteristic ways. The MinION records this.

  4. Computational base-calling algorithms use models to translate electrical signal patterns into nucleotide sequences.

  5. In short, bases are decoded from measured changes in electrical current rather than from an optical signal.

What is the output of your chosen sequencing technology?

The output includes:

  • Raw electrical signal data files.

  • Basecalled sequences in FASTQ format.

  • Long sequencing reads ranging from kilobases to potentially megabases.

  • Quality scores for each base.

These reads can then be aligned to reference genomes, assembled de novo, analyzed for structural variants, used to determine splice isoforms, and screened for metagenomic content.
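As a small sketch of what working with this output can look like, the Python below parses FASTQ records and computes read lengths, a mean quality score, and an N50. It assumes the common four-lines-per-record FASTQ layout and Phred+33 quality encoding; the two reads are made up, and the helper names are my own rather than part of any Oxford Nanopore tool.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:].split()[0], seq, qual

def mean_phred(qual):
    """Arithmetic mean of per-base Phred scores, assuming Phred+33 encoding."""
    return sum(ord(c) - 33 for c in qual) / len(qual)

def n50(lengths):
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Two invented reads standing in for a real basecalled file.
example = io.StringIO(
    "@read1 made-up example\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGT\n+\n5555\n"
)
reads = list(read_fastq(example))
lengths = [len(seq) for _, seq, _ in reads]
for read_id, seq, qual in reads:
    print(read_id, len(seq), mean_phred(qual))
print("N50:", n50(lengths))
```

For real data, an established parser (for example Biopython's `SeqIO`) would be the safer choice; this only shows the shape of the data.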

Part 5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

Back in the day (and still sometimes in my fantasies, though I don’t think I have the mechanical engineering chops for it) my goal was to work in next-generation bionics, engineering intelligent prosthetic limbs for people who no longer had their own. I’ve always been curious about the mechanical hardware, and as a neuroscientist had considered the biology of the nervous system, but never the meeting of the two in the molecular, biological hardware.

If I were to choose anything, I would want to use DNA synthesis for a bionic application in amputees. This would look like a project to improve the long-term biological interface between the residual nerves and the implanted electrodes; a major limitation of advanced prosthetics is chronic inflammation and fibrotic encapsulation at the implant site, which degrades signal quality over time, leading to increasingly poor performance. To address this, I would want to introduce cassettes that reduce inflammation and promote nerve stability and regeneration. For example, I would take the human IL10 coding sequence (NM_000572.3; interleukins being a class of biomarkers I am familiar with from my work in Alzheimer’s Disease) and drive production of anti-inflammatory interleukin-10 (IL-10) when inflammation is particularly active, creating a feedback-mediated response that can reduce damaging immune responses without causing broad immune suppression.

For nerve stability and regeneration, conditional expression of the BDNF coding sequence under an injury state-responsive promoter might do the trick (only make BDNF when the nearby neurons are in an injury response state) (NM_001709.5). 

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

Commercial gene synthesis should do the trick, like the kind that Twist offers: short DNA oligonucleotides are synthesized on high-density silicon platforms, then assembled into double-stranded gene fragments using overlap-based assembly methods. This method should be precise and defined enough to accommodate my cassettes.

Also answer the following questions:

What are the essential steps of your chosen synthesis methods?

  1. Chemical oligonucleotide synthesis

  2. Cleavage and removal of protection groups 

  3. Oligo assembly into longer fragments

  4. Amplification and cloning

  5. Sanger sequencing or next-gen sequencing verification
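To make step 3 above a bit more concrete, here is a toy Python sketch of overlap-based assembly. It is a simplification under stated assumptions: fragments are given in order, on one strand, with exact overlaps, and the target sequence is invented. Real oligo assembly works on double-stranded DNA with polymerases or ligases, so treat this as string logic only.

```python
def overlap_len(left, right, min_overlap=4):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def assemble(fragments, min_overlap=4):
    """Join ordered fragments, keeping each shared overlap only once."""
    product = fragments[0]
    for frag in fragments[1:]:
        k = overlap_len(product, frag, min_overlap)
        if k == 0:
            raise ValueError(f"no overlap of at least {min_overlap} bases before {frag!r}")
        product += frag[k:]
    return product

# Invented 27-base target, split into three fragments that overlap by 5 bases.
target = "ATGGCTAGCTTGACCGGATCCAAGTGA"
fragments = [target[0:12], target[7:20], target[15:27]]

assembled = assemble(fragments)
print(assembled == target)
```

The `min_overlap` threshold is a hypothetical parameter; it stands in for the design rule that overlaps must be long and unique enough to pair fragments unambiguously.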

What are the limitations of your synthesis method (if any) in terms of speed, accuracy, scalability?

Speed: Larger constructs require assembly of multiple fragments, which increases time and complexity. Turnaround can be days to weeks (we provide these services at my job, and we’re looking at ways to make this shorter using automation).

Accuracy: While this method has a small per-base error rate, errors can accumulate during assembly. We can use sequence verification to reduce the likelihood of this. 

Scalability: The scalability issue here mostly comes down to cost. Cost scales with length and complexity.
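The accuracy point above can be put into rough numbers. If each base is assumed to fail independently with the same small probability, the chance that a whole construct is error-free falls off exponentially with length. The sketch below uses an illustrative error rate of 1 in 3,000 bases, which is a placeholder I chose for the example and not a vendor specification.

```python
def fraction_error_free(length_bp, per_base_error):
    """Probability that a construct has no errors, assuming independent,
    identical per-base error probabilities."""
    return (1.0 - per_base_error) ** length_bp

# Illustrative rate only; real rates depend on the chemistry and vendor.
ASSUMED_ERROR_RATE = 1 / 3000

for length in (300, 1000, 3000):
    p = fraction_error_free(length, ASSUMED_ERROR_RATE)
    print(f"{length:>5} bp: {p:.1%} of molecules expected error-free")
```

This is why longer constructs usually mean screening more clones by sequencing before finding a perfect one.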

Part 5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I honestly think I’ll use the same example, because for the purposes of HTGAA it fascinatingly elucidates how completely different methods can be used to the same end; the distinction being whether we are adding an engineered construct or modifying DNA directly.

(ii) What technology or technologies would you use to perform these DNA edits and why?

We could use a CRISPR-based system to insert the IL10 or BDNF constructs into the genome. This could allow stable, long-term expression, which would be especially useful in these chronic applications where we want to stay as close to peak efficacy as possible over the long term.

Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 edits DNA by creating a double-strand break at a specific location in the genome. A guide RNA directs the Cas9 nuclease to the target site, where Cas9 introduces the double-strand break. The break is then repaired through either non-homologous end joining or homology-directed repair; the latter can insert a designed DNA sequence from a donor template, which is what we would use in this case. Finally, we would screen the edited cells to confirm that the edit worked correctly.
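As a sketch of how candidate target sites can be found for the guide RNA, the Python below scans one strand of a sequence for 20-base spacers followed by an NGG PAM, which is the motif the commonly used SpCas9 requires next to its target. The function name and the example sequence are hypothetical, and a real design tool would also scan the reverse strand and score each guide.

```python
import re

def find_spcas9_targets(seq, spacer_len=20):
    """Return (spacer, pam, cut_index) for every NGG PAM on the given strand.

    cut_index marks the expected blunt cut, 3 bases upstream of the PAM.
    """
    seq = seq.upper()
    targets = []
    # A lookahead lets overlapping PAMs (e.g. inside 'GGG') all be reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full spacer
        spacer = seq[pam_start - spacer_len:pam_start]
        targets.append((spacer, match.group(1), pam_start - 3))
    return targets

# Made-up sequence: a 20-base spacer, then a TGG PAM, then filler.
example = "GATTACAGATTACAGATTAC" + "TGG" + "AAA"
for spacer, pam, cut in find_spcas9_targets(example):
    print(spacer, pam, cut)
```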

What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

First we need to select our genomic target site or “safe harbor” locus that would support the expression of the inserted sequence. Then we need to design our guide RNA. We need the donor DNA template of course, with our cassette flanked by homology arms that match the sequence on either side of the cut site. Then we perform off-target analysis in silico.

For this, we need:

  1. The Cas9 nuclease 

  2. Synthetic guide RNA 

  3. Donor DNA

  4. Target cells 

  5. Delivery system (viral vectors or even nanoparticles would be fun)

Then we need to sequence to double check accuracy. 

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Off-target effects are a concern; Cas9 may cut unintended genomic sites with partial sequence similarity, creating unwanted mutations.

Also, homology-directed repair tends to be inefficient in non-dividing or slowly dividing cells like mature neurons, since it’s most active during the S and G2 phases of the cell cycle.
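The off-target concern above is usually screened in silico before any wet-lab work. Below is a minimal Python sketch of that idea: it slides along one strand of a sequence and reports PAM-adjacent sites that differ from the guide by only a few bases. The function names, the mismatch cutoff of 3, and the sequences are all assumptions made for illustration; dedicated tools additionally search both strands, weight mismatch positions, and handle whole genomes.

```python
def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def candidate_off_targets(guide, genome, max_mismatches=3):
    """Report (index, site, mismatch_count) for sites followed by an NGG PAM
    that differ from the guide by 1..max_mismatches bases."""
    guide, genome = guide.upper(), genome.upper()
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 2):
        site, pam = genome[i:i + n], genome[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue  # no NGG PAM next to this window
        d = mismatches(guide, site)
        if 0 < d <= max_mismatches:  # d == 0 is the intended on-target site
            hits.append((i, site, d))
    return hits

guide = "GATTACAGATTACAGATTAC"
# Made-up "genome": the on-target site, then a near-match with one changed base.
genome = "CC" + guide + "AGG" + "TT" + "GATTACAGTTTACAGATTAC" + "TGG"
print(candidate_off_targets(guide, genome))
```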