Homework

Week 1 HW: Principles and Practices

Question 1

In the 1980s, Keith Wood became the first to make a tobacco plant glow using firefly luciferase. However, he faced a critical limitation: the plants could not synthesize their own luciferin, so they needed an external luciferin spray to emit light. As Wood pivoted to fungal pathways, the firefly route was largely abandoned despite its superior light efficiency. Recent breakthroughs have reopened this door, including the discovery that benzoquinone and cysteine spontaneously form L-luciferin, and the identification of ACOT1 in the conversion of L-luciferin to D-luciferin. My project aims to put these breakthroughs together by engineering plants capable of bioluminescence through the firefly pathway.

As we know:

  • Fireflies produce an enzyme called luciferase that oxidizes D-luciferin to emit light.
  • Luciferase can be engineered into plants, but the plants must then be sprayed with luciferin to glow.
  • Kanie et al. (2016) discovered that p-benzoquinone (which can be derived from arbutin) reacts spontaneously with L-cysteine (naturally found in plants) to form L-luciferin.
  • Zhang et al. (2020) identified ACOT1 as the enzyme responsible for catalyzing the conversion of L-luciferin to D-luciferin.
  • In theory, these reactions could make a plant glow, with many foreseen and unforeseen modifications.

My proposed reaction:

Arbutin = GLUCOSE + HYDROQUINONE

The glucose-O-hydroquinone bond is cut by BGL (β-glucosidase).

Hydroquinone -> Benzoquinone

We introduce laccase to oxidize the two -OH groups, creating benzoquinone. It is important that the benzoquinone is confined where it meets cysteine immediately; otherwise it can damage proteins in the plant’s cytoplasm.

Benzoquinone + Cysteine = L-LUCIFERIN

As Kanie et al. (2016) showed, benzoquinone reacts spontaneously with cysteine under certain conditions, producing L-luciferin.

L-luciferin + ACOT1 = D-LUCIFERIN (with some L-LUCIFERIN remaining)

We introduce ACOT1 (naturally occurring in the firefly) to catalyze the conversion into D-luciferin (the form that glows).

D-luciferin + ATP = LUCIFERYL-ADENYLATE + PPi
LUCIFERYL-ADENYLATE + O2 = EXCITED OXYLUCIFERIN + CO2 + AMP
EXCITED OXYLUCIFERIN -> OXYLUCIFERIN + PHOTON

(These are the well-understood steps by which luciferase converts D-luciferin into light.)

Plant already supplies: arbutin, cysteine, ATP, O2

We add 4 enzymes: BGL, laccase, ACOT1, luciferase
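The proposed reaction chain can be written down as data and checked for gaps. This is only a bookkeeping sketch: the step list mirrors the scheme above, while the tuple layout and the names `PATHWAY`, `PLANT_SUPPLIES` and `check_pathway` are made up for illustration. It says nothing about whether the chemistry works in a real plant cell.

```python
# Each step: (substrates, enzyme or None if spontaneous, products).
# Layout and names are hypothetical; the steps mirror the scheme above.
PATHWAY = [
    ({"arbutin"}, "BGL", {"glucose", "hydroquinone"}),
    ({"hydroquinone"}, "laccase", {"benzoquinone"}),
    ({"benzoquinone", "cysteine"}, None, {"L-luciferin"}),
    ({"L-luciferin"}, "ACOT1", {"D-luciferin"}),
    ({"D-luciferin", "ATP", "O2"}, "luciferase", {"oxyluciferin", "light"}),
]

PLANT_SUPPLIES = {"arbutin", "cysteine", "ATP", "O2"}


def check_pathway(pathway, supplied):
    """Walk the steps in order and return the enzymes that must be added.

    Raises ValueError if a step needs a substrate that is neither supplied
    by the plant nor produced by an earlier step.
    """
    available = set(supplied)
    enzymes = []
    for substrates, enzyme, products in pathway:
        missing = substrates - available
        if missing:
            raise ValueError(f"missing substrates: {sorted(missing)}")
        if enzyme is not None:
            enzymes.append(enzyme)
        available |= products
    return enzymes


enzymes_to_add = check_pathway(PATHWAY, PLANT_SUPPLIES)
print(enzymes_to_add)
```

Removing an entry from `PLANT_SUPPLIES` (for example arbutin, if the host plant does not make it) makes the check fail at the first step that depends on it.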

Question 2

Beyond the ethical application of synthetic genomics, an important goal is to ensure the plant will not harm or disrupt natural ecosystems. (Example: GloFish, fluorescent transgenic zebrafish, turning up in wild streams in Brazil after being released or escaping.) The options for this specific goal are broken down below.

Option 1: Lab-work will be contained and carefully scrutinized to ensure plants are not accidentally released. This includes but is not limited to: proper disposal, proper containment, proper sanitization.

Option 2: Plants will eventually be tested in greenhouse settings, or other settings replicating natural ecosystems to verify no immediate or obvious harm and gather data in case of contamination.

Option 3: Modified plants will be common, non-invasive species (e.g., petunias), and will be modified only to the extent needed for autonomous bioluminescence.

Option 4: Research in later stages will be done with the aim of open-environment release, where doing so will be safe for natural ecosystems.

| Does the option: | Option 1 | Option 2 | Option 3 | Option 4 |
|---|---|---|---|---|
| Protect the environment? | | | | |
| • By preventing harm to natural ecosystems? | ME | RE | ME | ME |
| • By helping respond | n/a | ME | n/a | SE |
| Enhance Biosecurity | | | | |
| • By preventing incidents | ME | SE | SE | ME |
| • By helping respond | n/a | SE | n/a | SE |
| Foster Lab Safety | | | | |
| • By preventing incident | ME | n/a | n/a | MIN |
| • By helping respond | n/a | MOD | n/a | MIN |
| Other considerations | | | | |
| • Minimizing costs and burdens to stakeholders | SE | MIN | MOD | ME |
| • Feasibility? | RE | MIN | RE | MOD |
| • Not impede research | n/a | SE | n/a | SE |
| • Is net good for the world? | RE | RE | SE | RE |
  • ME = Most Effective
  • MOD = Moderately Effective
  • RE = Relatively Effective
  • SE = Somewhat Effective
  • MIN = Minimally Effective
  • n/a = not applicable

A second goal is to remain responsible with tools and research. As a newcomer to synthetic biology, I want to ensure proper attribution to the work of the researchers who make this possible.

Option 1: Prevent misuse of plasmids/enzymes/resources from previous academic papers.

Option 2: Be respectful to my lab space and peers, maintaining proper safety and etiquette.

| Does the option: | Option 1 | Option 2 |
|---|---|---|
| Protect the environment? | | |
| • By preventing harm to natural ecosystems? | SE | RE |
| • By helping respond | n/a | RE |
| Enhance Biosecurity | | |
| • By preventing incidents | SE | RE |
| • By helping respond | SE | MIN |
| Foster Lab Safety | | |
| • By preventing incident | ME | ME |
| • By helping respond | SE | RE |
| Other considerations | | |
| • Minimizing costs and burdens to stakeholders | ME | MOD |
| • Feasibility? | ME | RE |
| • Not impede research | ME | RE |
| • Is net good for the world? | ME | RE |
  • ME = Most Effective
  • MOD = Moderately Effective
  • RE = Relatively Effective
  • SE = Somewhat Effective
  • MIN = Minimally Effective
  • n/a = not applicable

Question 3

Action 1

Purpose: As this is such a new field, there is little regulation on the genetic modification of plants for commercial or artistic purposes. I propose a more thorough environmental review before such plants are released.

Design: Researchers would need to submit their own opinions and data, while federal regulators would need to propose their own processes.

Assumptions: That everyone will be acting in good faith, not unnecessarily over or under regulating, and that regulators have the knowledge required to make educated decisions.

Risks: Over-regulation would create extreme bureaucracy in the bio-plant space and cause resentment in the scientific community. Under-regulation might cause serious harm to natural ecosystems in Canada.

Action 2

Purpose: People lack the incentive or the freedom to pursue such projects. Companies can make it more attractive for workers to pursue bioengineering for commercial or artistic purposes.

Design: Corporations or the government of Canada can create programs that incentivize bio-plant discoveries, making it more accessible and attractive for individuals.

Assumptions: Corporate or government programs would be executed well enough to make progress, and enough attention would be directed at students and workers to ensure some level of success.

Risks: Corporations may over-rely on potential profits, pushing individuals into projects that may not be beneficial for society. This is where Action 1 regulations should ideally be applicable. Programs may not be executed in good faith or passion, leaving people uninspired and destitute.

Action 3

Purpose: People lack access to labs, and most places do not have community bio-labs. Autonomous labs will still take time to be implemented on a wider scale. People should be able to get government certification to run biosafety level 1 (BSL-1) labs at home.

Design: Organizations representing the government should run a certification process. Once you show that you can meet certain standards, an inspection visit confirms it. If approved, you are certified for a fixed period, after which you must re-certify.

Assumptions: Individuals running home labs are responsible and truly strive to meet these standards at all times. Federal regulators act in good faith and are responsible with their certifications.

Risks: Individuals running home labs accidentally contaminate or cause injury to themselves or others.

Week 2 Lecture Prep

Homework Questions from Professor Jacobson:

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

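Error rate of polymerase in nature: $1 \times 10^{-6}$, or 1 in $10^{6}$

Human genome: 3.2 Gbp (billion base pairs), or $3.2 \times 10^{9}$ bases

So if the error rate is $1 \times 10^{-6}$ per base: $(1 \times 10^{-6}) \times (3.2 \times 10^{9}) = 3.2 \times 10^{3} = 3{,}200$ errors per genome copy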

Biology deals with this through the MutS mismatch repair system, in which a DNA repair protein (MutS) scans the newly replicated DNA and binds wherever bases are mispaired. Once MutS is bound, that part of the new strand is cut and an exonuclease removes the stretch containing the error. Lastly, DNA polymerase resynthesizes the correct sequence and ligase seals the strand. This system improves fidelity by 100-1000x, lowering the expected number of errors from about 3,200 to roughly 3 to 32 per genome copy.
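The arithmetic can be checked in a few lines. This is a back-of-the-envelope sketch using only the figures quoted in this answer (a $10^{-6}$ raw error rate, a 3.2 Gbp genome, and a 100-1000x repair improvement); the function name `expected_errors` is made up for illustration.

```python
# Figures quoted in the answer above.
ERROR_RATE_PER_BASE = 1e-6   # raw polymerase error rate
GENOME_SIZE_BP = 3.2e9       # human genome length in base pairs


def expected_errors(error_rate, genome_bp, repair_fold=1.0):
    """Expected errors per genome copy, given a repair system that
    improves fidelity by a factor of `repair_fold`."""
    return error_rate * genome_bp / repair_fold


raw = expected_errors(ERROR_RATE_PER_BASE, GENOME_SIZE_BP)
best_case = expected_errors(ERROR_RATE_PER_BASE, GENOME_SIZE_BP, repair_fold=1000)
worst_case = expected_errors(ERROR_RATE_PER_BASE, GENOME_SIZE_BP, repair_fold=100)

print(f"without repair: {raw:.0f} errors per copy")
print(f"with repair:    {best_case:.1f} to {worst_case:.0f} errors per copy")
```

With these inputs the repaired range works out to a handful to a few dozen errors per copy, compared with thousands without repair.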

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

The average human protein is estimated to be around 400 amino acids (~1036 bp of coding sequence corresponds to about 345 codons, so 400 is a rounded estimate). On average, about 3 different codons encode the same amino acid, so the number of possible coding sequences is ≈ $3^{400}$, which is an absurdly high number. However, this does not translate in practice. Firstly, different organisms prefer different codons, which is why we codon-optimize for the host organism of a project. Biology still cares about the nucleotide sequence even when the amino acid sequence is identical. Other reasons include mRNA secondary structure changing how the transcript behaves, certain codon choices negatively affecting how the protein folds, and RNA instability, among other practical hurdles. Slide 62 has an interesting argument as to why the optimal alphabet size falls in the tens, not hundreds.
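To get a feel for how large that count is, it helps to look at its order of magnitude. This sketch uses the same simplification as the answer (a flat average of 3 codons per amino acid and 400 residues); the function name is made up for illustration.

```python
import math


def log10_coding_count(n_amino_acids, codons_per_aa=3):
    """log10 of the number of DNA sequences encoding one protein,
    assuming every residue has `codons_per_aa` synonymous codons."""
    return n_amino_acids * math.log10(codons_per_aa)


exponent = log10_coding_count(400)
digits = len(str(3 ** 400))  # Python integers are exact, so this is reliable

print(f"3^400 is about 10^{exponent:.1f}, a number with {digits} digits")
```

For comparison, the number of atoms in the observable universe is usually estimated at around $10^{80}$, so enumerating every synonymous sequence is out of the question.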

Homework Questions from Dr. LeProust

What’s the most commonly used method for oligo synthesis currently?

The most commonly used method is phosphoramidite solid-phase DNA synthesis, for which Twist is a major commercial provider. While phosphoramidite chemistry has been optimized since its introduction in 1981, the overall method has remained the same. The DNA is built one base at a time using a repeating cycle. First, the starting nucleotide is attached to a solid support, with a DMT protecting group blocking its 5’ end. An acid then removes the DMT and exposes the 5’-OH, after which a phosphoramidite nucleotide is coupled to the free 5’-OH (~99-99.5% efficiency per cycle). Since not every strand couples successfully, the failed strands are chemically capped so they cannot keep growing. The new linkage is then oxidized to form a stable bond, and the cycle repeats.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because efficiency per cycle is about 99.5% rather than 100%, the yield of full-length product drops with every additional nucleotide. For a 100-nucleotide strand, $0.995^{100} \approx 61\%$, so only about 61% of molecules reach full length while the rest are truncated somewhere along the way. At 200 nucleotides, $0.995^{200} \approx 37\%$, so most molecules are truncated.

Why can’t you make a 2000bp gene via direct oligo synthesis?

Because at 2000 bp, the fraction of full-length molecules is essentially 0 ($0.995^{2000} \approx 0.004\%$). There would be almost no full-length molecules. This is a hard limit of the chemistry rather than of biology, which is why long genes are assembled from shorter oligos.
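The yield numbers in the last two answers all come from the same formula: full-length fraction = (coupling efficiency) raised to the oligo length. A small sketch, assuming the 99.5% per-cycle efficiency used above and, as in the text, counting one coupling per nucleotide (the function name is made up for illustration):

```python
def full_length_fraction(length_nt, coupling_efficiency=0.995):
    """Fraction of strands that couple successfully at every cycle."""
    return coupling_efficiency ** length_nt


for length in (100, 200, 2000):
    fraction = full_length_fraction(length)
    print(f"{length:>5} nt: {fraction * 100:.4f}% full length")
```

Even at 99.9% per-cycle efficiency, `full_length_fraction(2000, 0.999)` is only about 14%, so better chemistry alone does not make direct synthesis of whole genes practical.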

Homework Question from George Church

Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in animals are arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. They are considered essential because animals generally cannot synthesize them and must obtain them through diet. The “lysine contingency” is an idea popularized by the Jurassic Park franchise. In the story, the dinosaurs were engineered to be unable to make lysine, so that in theory they could not survive without a human-provided supply. This was meant as a safety measure to keep escaped animals from surviving. However, this wouldn’t work in real life. Animals are already “lysine contingent”: we are unable to synthesize lysine. That’s why lysine is obtained from a wide variety of animal diets and is easily supplemented through food. Virtually all food sources and ecosystems contain lysine, so scavenging would easily supply it. Furthermore, there are many other amino acids we require, so singling one out is arbitrary. Effective containment would have to rely on dependencies that are rare in nature.

Other References

Iwano, S., Sugiyama, M., Hama, H., Watakabe, A., Hasegawa, N., Kuchimaru, T., … Miyawaki, A. (2018). Single-cell bioluminescence imaging of deep tissue in freely moving animals. Science, 359(6378), 935–939. https://doi.org/10.1126/science.aaq1067

Kanie, S., Abe, K., Hirano, T., & Niwa, H. (2016). One-pot non-enzymatic formation of firefly luciferin in a neutral buffer from p-benzoquinone and cysteine. Scientific Reports, 6, 24794. https://doi.org/10.1038/srep24794

Zhang, R., Chen, L., Jiao, J., Zhou, Y., Liu, Y., & Wang, Y. (2020). Genomic and experimental data provide new insights into luciferin biosynthesis and bioluminescence evolution in fireflies. Scientific Reports, 10, 15882. https://doi.org/10.1038/s41598-020-72900-z

Applicable AI Prompts: Summarize Keith Wood’s lifework and research in firefly related plant bioluminescence.

(Image of example table) explain what they mean by options, and how the graph works.

Explain the MutS mismatch repair system in detail.

How many different ways are there to code (DNA nucleotide code) for an average human protein? Why doesn’t it translate in practice?

Explain how phosphoramidite solid-phase DNA synthesis works in detail.

Is lysine in all of our foods?

Google searches: What are the 10 essential amino acids in all animals (https://www.sciencedirect.com/science/article/pii/S216183132201273X)

What is the “lysine contingency” (https://jurassicpark.fandom.com/wiki/Lysine_contingency#:~:text=The%20Lysine%20Contingency%20was%20a,acquire%20Lysine%20by%20eating%20herbivores.)

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

[Figure: Ronan’s website design] [Figure: Benchling design]

If you ignore the ladder and lane 1, you can see the design! I tried to recreate No Face from Spirited Away.

Link to Benchling: https://benchling.com/s/seq-8pB9vY3uTYXRrqsIKJS6?m=slm-FP5NlW0BAaqfbSmaoUg0

DNA Design Challenge

For my homework, I decided to pick the enzyme that is the strongest candidate for facilitating luciferin synthesis, drawing on my independent research. First I downloaded the data from Fallon et al. 2018, “Firefly genomes illuminate parallel origins of bioluminescence in beetles” in eLife (DOI: 10.7554/eLife.36495). This experiment compared gene expression between the fat body (roughly a firefly’s liver) and the lantern (the organ that makes light) to find which genes are highly expressed in the lantern. I filtered the file PPYR_OGS1.1_fatbody-vs…_test.txt, keeping only genes with TPM ≥ 50 and sleuth b ≥ 3. TPM measures how actively a gene is expressed in the lantern tissue; a higher TPM signals a higher likelihood that the gene is involved in luciferin synthesis. sleuth is the statistical software Fallon used, and its b value is an effect-size estimate analogous to a log fold change. The higher the b value, the more specifically a gene is expressed in the lantern rather than the fat body. I also required qval ≤ 1e-10 to keep only statistically significant genes and filter out random noise.
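The three thresholds can be expressed as one small filter. This is a sketch rather than the script actually used: the column name `lantern_tpm` is an assumed label for the lantern TPM column, `target_id`, `b` and `qval` follow sleuth's usual output names, and the demo rows are invented placeholders, not values from the paper.

```python
import csv
import io


def filter_candidates(rows, min_tpm=50.0, min_b=3.0, max_qval=1e-10):
    """Return the IDs of genes that pass all three thresholds."""
    hits = []
    for row in rows:
        if (float(row["lantern_tpm"]) >= min_tpm
                and float(row["b"]) >= min_b
                and float(row["qval"]) <= max_qval):
            hits.append(row["target_id"])
    return hits


# Invented demo table (tab-separated); replace with the real file handle.
demo_table = io.StringIO(
    "target_id\tlantern_tpm\tb\tqval\n"
    "GENE_A\t120.0\t4.2\t1e-15\n"   # passes all three filters
    "GENE_B\t80.0\t1.5\t1e-20\n"    # fails b >= 3
    "GENE_C\t10.0\t5.0\t1e-30\n"    # fails TPM >= 50
    "GENE_D\t300.0\t3.5\t1e-4\n"    # fails qval <= 1e-10
)

rows = list(csv.DictReader(demo_table, delimiter="\t"))
hits = filter_candidates(rows)
print(hits)
```

Keeping the thresholds as keyword arguments makes it easy to rerun the filter with looser cutoffs and see which candidates sit just below the line.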

Afterwards, I ran my candidates through Fallon’s enzyme annotation file (PPYR_OGS1.1.enzyme.ids.txt), which lists the enzymes predicted by Fallon’s InterProScan analysis. Although both filters use arbitrary thresholds and likely have blind spots, they produced a good starting point.

I then ran my candidates through hmmsearch to predict function, and through BLAST to check whether each enzyme is found only in fireflies or in all insects (the latter would point to different, non-applicable functions such as exoskeleton hardening). One candidate was already discussed in Zhang et al. (2020), but has not yet been experimentally validated. The rest are potentially novel candidates not yet found in the literature.

I then did a reciprocal (reverse) BLAST to see whether the top hits in other organisms point back to the assigned enzyme. This checks whether they are true orthologs or whether BLAST was only matching generic similarities. This narrowed down my search to four candidates.

I decided to go with PPYR_02911 as my homework protein for several reasons. Firstly, it’s an oxidative enzyme (a cytochrome P450), which is the class you’d expect at the bottleneck: firefly luciferin requires a benzoquinone precursor, and making it requires an oxidative step carried out by an oxidative enzyme. Secondly, it’s a tandem duplicate of one of my other candidates (PPYR_02910), sitting only 4 kb away. The two are 87% identical, pointing to possible neofunctionalization. Thirdly, its BLAST hits were almost all bioluminescent species, and its lantern TPM was the highest of the group. Lastly, the candidate proposed by Zhang et al. (2020) likely helps with storage, pointing to a support function, whereas I’m more interested in finding the missing catalytic step.

I already had the protein sequence on hand from my independent research: it comes from Fallon’s PPYR_OGS peptide FASTA file, which contains protein sequences for all the predicted genes in the common eastern firefly. This file was obtained from Fallon’s GitHub (https://github.com/photocyte/PPYR_OGS). However, for learning purposes, I pasted my PPYR_02911-PA protein sequence from Fallon’s file into NCBI BLAST, specifying Photinus pyralis (the firefly) as my organism. My top hit was the NCBI equivalent, XP_031330391. I cross-referenced XP_031330391 with PPYR_02911-PA and confirmed it is the same gene.

>XP_031330391.1 cytochrome P450 4C1-like [Photinus pyralis]
MMNLVDEFAPKSALQALVPIALVTFLVWYMQYHWNRRRLYKMAALFDGPICLPFVGNGLYFVGSTSDILQNVISLVSNFK
LPVRVWLGQKLFYALVDPGDLEIIMNSPHALEKDELYQYAEPIVGTGLFTAPVPKWKRHRKVIMPTFNQRILDEFVPVFA
EQSEILLEQLKKQVGKGSFDIFQLVSRCTLDIICETAMGVKVEAQTTDSDYVKWANKAMEIMFTRMFNIWYHFDSIFNLT
QSARDLLDVQTKMKTFTGAVVRNRREAYQRKMRERRQLPEGYVDKEAPTRKTFLQQLIELSEGGANFTDDELREEVDTFM
VAGSDTTASMNSFIFIMLGMHPDVQEEVYQEVLDVLGPDRAVEAADLGRFHYMERVMRETMRIFPVGPILVRAITKDLQL
ENCVIPAGSSVVMVIMQTHRSEKIWPHPLRFEPDRFLPEEVAKRHPYAWLPFSGGPRNCVGPKYAFMAMKALIATVVRRY
KFKTDYKCIEDIELKADLMLKPVNGYNVSVELRE

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

I clicked on XP_031330391 next to the “Sequence ID” in the XP_031330391 protein page in NCBI. From there I clicked on the DBSOURCE to get the DNA sequence, and then FASTA.

[Figure: Benchling XP_031330391 sequence]

>XM_031474531.1 PREDICTED: Photinus pyralis cytochrome P450 4C1-like (LOC116161249), mRNA
AGTGGTAGTGTTTTGTTCACCTTCAAACATGATGAATCTTGTCGACGAATTTGCGCCTAAAAGCGCTCTC
CAGGCGCTGGTGCCCATCGCACTTGTCACGTTTTTGGTGTGGTACATGCAGTACCATTGGAACCGTAGGC
GCCTCTACAAAATGGCGGCCCTGTTCGATGGGCCCATCTGTTTGCCGTTTGTTGGCAATGGACTATACTT
CGTCGGCTCCACGAGTGATATCCTGCAAAACGTCATCTCGCTGGTGAGCAATTTTAAACTGCCGGTGAGA
GTGTGGTTGGGCCAGAAACTATTCTACGCCTTAGTTGATCCTGGAGATCTGGAAATAATCATGAATAGCC
CTCACGCGCTGGAGAAGGACGAGTTGTACCAGTATGCCGAACCCATTGTGGGTACCGGATTGTTCACCGC
CCCAGTGCCAAAATGGAAACGCCACCGTAAAGTGATAATGCCGACATTCAATCAGCGCATCCTCGACGAG
TTTGTGCCGGTATTTGCCGAACAGTCCGAAATACTCTTGGAGCAGCTGAAGAAACAAGTTGGAAAGGGCA
GCTTCGACATCTTTCAACTGGTCAGCCGTTGTACCTTGGATATTATCTGCGAGACGGCAATGGGAGTTAA
GGTGGAGGCGCAGACTACAGACTCAGATTACGTCAAGTGGGCTAACAAGGCAATGGAGATTATGTTTACG
AGAATGTTCAACATTTGGTACCACTTCGATTCCATATTTAACCTTACACAAAGCGCACGTGACCTTCTCG
ACGTCCAGACCAAGATGAAAACGTTCACTGGAGCCGTTGTAAGAAACAGACGGGAGGCGTACCAGAGAAA
GATGAGGGAACGAAGGCAACTACCGGAGGGGTACGTAGACAAAGAGGCGCCGACTCGAAAAACCTTCCTT
CAACAGCTCATCGAGTTGTCCGAAGGAGGGGCCAACTTCACAGATGATGAACTGCGGGAAGAGGTCGACA
CATTTATGGTGGCGGGAAGTGACACCACTGCGTCGATGAACAGTTTCATATTCATCATGCTTGGAATGCA
CCCAGATGTTCAGGAAGAAGTCTACCAAGAGGTGCTAGACGTACTCGGGCCAGACAGAGCAGTAGAAGCA
GCCGACCTTGGTCGTTTCCACTACATGGAGAGAGTGATGAGGGAGACCATGCGCATATTTCCCGTCGGAC
CTATACTGGTTAGGGCAATCACCAAGGACCTTCAGCTAGAGAACTGCGTGATACCGGCCGGAAGTTCAGT
GGTAATGGTGATAATGCAAACGCACAGAAGCGAAAAGATTTGGCCGCATCCGCTTAGGTTCGAACCCGAC
CGTTTCCTACCCGAAGAGGTAGCCAAACGACATCCATACGCTTGGTTGCCATTCTCTGGTGGTCCACGTA
ACTGCGTTGGTCCTAAATATGCATTTATGGCCATGAAGGCGCTCATCGCTACCGTCGTCAGGCGATACAA
ATTTAAGACCGACTATAAGTGCATCGAGGATATCGAACTGAAGGCTGATTTGATGTTGAAACCGGTCAAC
GGGTACAATGTTTCGGTCGAATTGAGGGAATAACAACAATTTATCAGCGCACAATTACTTTAGAACTATT
CGTGTTGCAGTATAAACTTTTTTACTCGCC
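A quick way to confirm that this mRNA really encodes the protein above is to translate it from the first ATG and compare. The sketch below builds the standard genetic code table and translates only the first 58 nucleotides copied from the mRNA record, so it checks just the start of the protein; the function name is made up for illustration, and taking the first ATG as the start codon is an assumption that happens to hold for this record.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_atg(mrna):
    """Translate from the first ATG until a stop codon or the end."""
    start = mrna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 58 nt of the XM_031474531.1 record shown above (5' UTR + 10 codons).
mrna_start = "AGTGGTAGTGTTTTGTTCACCTTCAAACATGATGAATCTTGTCGACGAATTTGCGCCT"
protein_start = translate_from_first_atg(mrna_start)
print(protein_start)  # MMNLVDEFAP, the first ten residues of XP_031330391.1
```

Running the same function on the full mRNA (with line breaks removed) should reproduce the whole protein sequence if the two records match.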

3.3. Codon optimization

Since the same amino acid can be encoded by different codons, organisms differ in which codons they prefer for each amino acid. My firefly gene has different preferences than a plant would, so I need to optimize for plants to avoid truncated proteins or low expression. I’m specifically optimizing for Nicotiana rustica. I chose this species over the standard N. benthamiana because I couldn’t source N. benthamiana anywhere in Vancouver, and there’s a slight benefit that my construct results will be more realistic for general plant species. I chose to optimize in GenSmart. Although GenSmart did not offer Nicotiana rustica, I went with Nicotiana tabacum, as the two are closely related. I excluded BsaI recognition sites since I am planning to use Golden Gate assembly.

[Figure: Benchling XP_031330391 sequence]
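Excluding BsaI matters because Golden Gate assembly cuts at every BsaI recognition site (GGTCTC, or GAGACC when the site sits on the opposite strand), so an internal site would chop the gene. A minimal scanner for checking a sequence before ordering is sketched below; the function name is made up for illustration, and the second test string is a short stretch copied from the native mRNA shown earlier, which does contain a site.

```python
def find_bsai_sites(seq):
    """Return (position, strand) for every BsaI recognition site.

    "+" means GGTCTC on the given strand; "-" means the site is on the
    complementary strand and therefore reads GAGACC here.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in (("GGTCTC", "+"), ("GAGACC", "-")):
        position = seq.find(motif)
        while position != -1:
            hits.append((position, strand))
            position = seq.find(motif, position + 1)
    return sorted(hits)


# Synthetic example with one site on each strand.
print(find_bsai_sites("AAGGTCTCAAGAGACCTT"))

# Short stretch from the native firefly mRNA: one internal site.
native_snippet = "GTGATGAGGGAGACCATGCGC"
print(find_bsai_sites(native_snippet))
```

An empty list for the optimized coding sequence means the only BsaI sites in the final construct will be the ones added deliberately for assembly.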

3.4. What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein.

I will be using a cell-dependent approach since my long-term goal is to test this enzyme in plants. First I will use Golden Gate assembly to clone the optimized gene into an expression vector. I will then transform this plasmid into Agrobacterium cells by heat shock and infiltrate the cultures into N. rustica leaves. The plant will do the rest of the work, transcribing the DNA into mRNA and then translating it into the cytochrome P450 protein.

Part 4: Prepare a Twist DNA Synthesis Order

I inserted the optimized linear DNA sequence from GenSmart into Benchling and annotated it as the XP_031330391 CDS. Since I am building a plant expression cassette, my inserts differ from the default. I will need:

  • CaMV 35S promoter
  • Start Codon ATG
  • XP_031330391 Coding Sequence
  • Stop Codon
  • NOS terminator
  • No RBS needed (Plant cells don’t use RBS) and no His tag needed (I won’t need to purify).

Next I searched “CaMV 35S promoter” on Addgene. I wasn’t sure how to pick the correct promoter, so I had ChatGPT help me narrow it down to pP35S. Considering its description says “Provides the CaMV 35S Promoter as a level 0”, it makes sense as a choice. Next I uploaded the GenBank file into Benchling, found the promoter, and inserted it at the beginning of my DNA sequence. I already had the start codon ATG at the beginning of my XP_031330391 CDS. GenSmart did not include a stop codon, so I added TAA.

I searched “NOS terminator” on Addgene. I inferred lvl0 NOSt was the correct plasmid because its description says “Agrobacterium terminator for plant expression”, but double-checked with ChatGPT just to be sure. I downloaded the GenBank file, uploaded it into Benchling, and pasted the terminator after the XP_031330391 CDS.
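The cassette layout (promoter, CDS with start and stop codons, terminator) can be sanity-checked in code before ordering. The sketch below is illustrative only: `build_cassette` is a made-up helper, and the promoter and terminator strings in the example are short placeholders, not the real CaMV 35S or NOS sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_cassette(promoter, cds, terminator, stop="TAA"):
    """Join promoter + CDS + terminator, checking the CDS on the way.

    The CDS must start with ATG and be a whole number of codons; a stop
    codon is appended only if the CDS does not already end with one.
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("CDS does not start with ATG")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        cds += stop
    return promoter.upper() + cds + terminator.upper()


# Placeholder parts (NOT real promoter/terminator sequences).
cassette = build_cassette("ccc", "ATGAAA", "ggg")
print(cassette)
```

The length check catches a common copy-paste mistake: a CDS that lost or gained a base will shift the reading frame even if it still starts with ATG.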

Here is the link share: https://benchling.com/s/seq-aAJ0m9ksaKuRH7IAhY6i?m=slm-6QKUQA992f2toe6Y0U2a

From my understanding, I would need Gene Fragments rather than Clonal Genes since I am doing Golden Gate assembly. For the purpose of this homework I’ll still pick Clonal Genes, choosing the pTwist Amp High Copy vector.

The Twist order in Benchling link: https://benchling.com/s/seq-5ic0dMqtz8CArU6mDtAJ?m=slm-ihfHt2ynzSsQTZMWejYG

5.1 DNA Read

I would use Nanopore for several reasons. Firstly, I am working out of a community lab, and this is one of the more accessible DNA sequencing technologies (that’s why it was taken to space!). Secondly, I’m still early in my experimentation; if I need higher accuracy, I’d use Illumina+ or MGI+. Nanopore is a third-generation technology because it reads single molecules directly, without needing to amplify the DNA through PCR. Therefore, my input would simply be raw DNA or RNA from my sample. To prepare my DNA, I’d extract it and purify it from non-genetic material, then break it into smaller pieces for easier reading. Since fragmenting DNA leaves the ends jagged, I’d repair them into blunt ends and add an adenine overhang so the adapters can attach. Nanopore requires special adapters ligated onto the DNA ends; these carry the motor protein that feeds the strand through the pore at a controlled pace. Lastly, I’d pipette my prepared library onto a small flow cell plugged into the device.

The flow cell has many protein nanopores embedded in a membrane, and a voltage across the membrane drives an ionic current through each pore. As the motor feeds a DNA strand through a pore, the bases squeeze through and each one blocks the current differently because of slight differences in shape and size. The machine measures these small changes in current and translates the resulting pattern into DNA letters (basecalling). This simplicity makes nanopore an efficient and relatively cheap way to sequence DNA.

The output is a FASTQ file which carries the sequence and quality score (the confidence each base is correct). There is also a raw signal file of the actual electrical current data for analysis or troubleshooting.
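To make the quality score concrete: each FASTQ record is four lines (name, sequence, a "+" separator, and one quality character per base), and in the common Phred+33 encoding a character's ASCII code minus 33 gives a score Q, with error probability 10^(-Q/10). The sketch below parses a tiny invented record; the read name and bases are placeholders, and the helper names are made up for illustration.

```python
def parse_fastq(text):
    """Yield (name, sequence, quality_string) for each 4-line record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        name, sequence, _separator, quality = lines[i:i + 4]
        yield name[1:], sequence, quality  # drop the leading "@"


def phred_scores(quality, offset=33):
    """Convert a Phred+33 quality string into integer scores."""
    return [ord(character) - offset for character in quality]


def error_probability(score):
    """Probability that a base with this Phred score is wrong."""
    return 10 ** (-score / 10)


# Invented one-read example.
demo_fastq = "@read1\nACGT\n+\nI5+!\n"
name, sequence, quality = next(parse_fastq(demo_fastq))
scores = phred_scores(quality)

for base, score in zip(sequence, scores):
    print(f"{base}: Q{score}, P(error) = {error_probability(score):.4f}")
```

A Q20 base has a 1% chance of being wrong and a Q40 base 0.01%, which is why reads are often filtered by mean quality before analysis.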

5.2 DNA Write

As mentioned earlier, I’d like to synthesize my codon-optimized PPYR_02911 (XP_031330391), as I believe it is a good candidate for luciferin biosynthesis. Out of the genes already known from the literature, I’d like to synthesize plant-optimized BGL, laccase, ACOT1, and luciferase, as they are the key players in autonomous bioluminescence in plants.

I’d use Twist’s phosphoramidite chemistry as it is the industry standard, and Twist has developed supplementary technology to make it more efficient and cheaper. I also love Twist, and want any excuse to use their website. Firstly, single-stranded DNA is built one nucleotide at a time from 3’ to 5’. Each nucleotide is added in a single cycle of deprotection, coupling, capping, and oxidation. Once the oligo is built, it is cleaved off the solid support it was built on. If the gene is longer than a single oligo, Twist assembles multiple oligos into a single gene with PCR. This gene is then cloned into a plasmid vector, verified, and shipped. As with any other writing method, errors compound. Furthermore, since oligos max out fairly quickly in length, stitching them together into genes makes the process more error-prone.

5.3 DNA Edit

Although glowing plants are a marvel, I would love my long term goals to be more beneficial for humanity. Plants are integral to our survival, whether it’s due to crops, materials, or our air. They are beautiful and crucial. My dream would be to take my ambition further, whether it’s to engineer nitrogen fixation in crops, help conserve plant species, or make trees more efficient at carbon sequestration. For such ambitions, you’d need the big guns: CRISPR-Cas9.

The steps of CRISPR are as follows. First you design a guide RNA, a short sequence that matches the genome region you’d like to edit, using a tool such as Benchling or CRISPOR. This guide RNA tells Cas9 where to cut. You’d also prepare your donor template if using one: a sequence containing your insert, flanked at both ends by sequences matching the cut site. Therefore, the inputs are the Cas9 plasmid, sgRNA, donor template DNA, and target cells. You then deliver these components into whichever cells you’re using. After Cas9 makes a double-stranded break, the edit happens through either NHEJ or HDR. With Non-Homologous End Joining, the cell glues the broken ends back together, often introducing small insertions or deletions, which is useful for knocking out genes. Homology-Directed Repair uses the donor template to insert new DNA at the cut site, which knocks in a gene. Lastly, you verify the edit by sequencing the targeted region.

The limitations of CRISPR: 1. Knocking in genes has a low success rate in plants. 2. The guide RNA might lead Cas9 to the wrong site, since similar sequences occur elsewhere in the genome (off-target effects). 3. You may get mosaicism, and regenerating a whole plant from edited cells is difficult.
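The guide design step boils down to finding 20-nucleotide stretches that sit directly next to a PAM, which for the commonly used SpCas9 is NGG. The sketch below lists candidates on the given strand only and does no scoring or off-target checking, which tools like CRISPOR handle; the function name is made up, and the example sequence is a placeholder rather than a real gene.

```python
import re

# A 20-nt protospacer followed by an NGG PAM. The lookahead lets the
# search report overlapping candidates.
PROTOSPACER_PATTERN = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def find_protospacers(seq):
    """Return (position, protospacer) for each NGG-adjacent 20-mer
    on the forward strand of `seq`."""
    seq = seq.upper()
    return [(match.start(), match.group(1))
            for match in PROTOSPACER_PATTERN.finditer(seq)]


# Placeholder target: 20 nt followed by the PAM "TGG".
target = "ACGTACGTACGTACGTACGT" + "TGG"
candidates = find_protospacers(target)
print(candidates)
```

Searching the reverse complement as well would double the candidate pool, since Cas9 can target either strand.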

Work Cited

Fallon et al. 2018, “Firefly genomes illuminate parallel origins of bioluminescence in beetles” in eLife (DOI: 10.7554/eLife.36495)

AI Citations: I cannot find N. benthamiana for my glowing plant project, would Nicotiana rustica work?

[Copy and pasted 4.2 question], what inserts would my version need since I’m optimizing for plants?

Is GAA a stop codon? [Follow-up]: what should be my stop codon?

Is nanopore first, second or third generation?

Explain how a nanopore works in detail. What are the inputs and outputs?

Explain how Twist sequences DNA in detail.

Explain how CRISPR works in detail.

Top limitations of CRISPR in Plant edits

Week 3 HW: Lab Automation

Python Script for Opentrons Artwork

After designing in http://opentrons-art.rcdonovan.com/ (a pig with a heart in the centre), I chose to run the Colab notebook for simplicity. I used Claude Code to help convert my coordinates into proper code, before running all cells.

[Figure: Design]

After my first simulation I noticed “WASTING BIO-INK : more aspirated than dispensed” warnings, so I edited my code. The next simulation still seemed to be wasting ink, but Claude informed me the remaining warnings come from floating-point rounding.

[Figure: Simulation]
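Floating-point rounding can trigger this kind of warning even when volumes balance on paper. The sketch below is a generic illustration, not the notebook's actual check: it assumes the simulator compares running totals stored as floats, and the volumes are made up.

```python
import math

aspirated_ul = 1.0

# Dispense 0.1 uL ten times. 0.1 has no exact binary representation,
# so the running total ends up a hair below 1.0.
dispensed_ul = 0.0
for _ in range(10):
    dispensed_ul += 0.1

# A strict comparison reports "wasted ink" for a purely numerical reason.
naive_warning = aspirated_ul > dispensed_ul

# Comparing with a small tolerance removes the false alarm.
tolerant_warning = (aspirated_ul > dispensed_ul
                    and not math.isclose(aspirated_ul, dispensed_ul, abs_tol=1e-9))

print(dispensed_ul)      # slightly less than 1.0
print(naive_warning)     # True
print(tolerant_warning)  # False
```

Tracking volumes as whole numbers of a small unit (for example nanolitres as integers) avoids the problem altogether.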

Python script link: https://colab.research.google.com/drive/1ONInhseZmTcpt755AN2gSf4Gsnn31Nd3#scrollTo=pczDLwsq64mk&line=1&uniqifier=1

Post-Lab Questions

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Since my project requires extensive wet-lab trial and error, I found “AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots” to be the most applicable paper. DNA assembly is still mostly done by hand, which makes such a meticulous process error-prone. Furthermore, it poses a high barrier to entry for those without lab or educational access. AssemblyTron is a free, open-source Python package for the Opentrons OT-2 that automates DNA construct assembly protocols, including PCR setup, Golden Gate assembly, homology-dependent assembly, and more. The OT-2 handles a wide range of tedious and time-consuming steps, allowing for better time management and faster research progress. Lastly, as someone with a diagnosed hand tremor, I struggle constantly with contamination and resource waste (e.g., repeatedly grazing tips) in the lab. This would be immensely helpful for individuals who struggle with fine motor skills, especially for delicate procedures such as cloning.

Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more.

My plan requires building a four-gene bioluminescence pathway with Golden Gate assembly. These genes are novel in plants, and will inevitably require troubleshooting via different combinations of promoters, luciferase variants, targeting signals, etc. Instead of assembling one construct at a time (a real time bottleneck in this project), I could set up dozens of constructs in parallel. An example sketch could be as follows:

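# Combinatorial Golden Gate assembly for the bioluminescence pathway (sketch).
# 3 promoters x 2 targeting signals = 6 constructs, each carrying all four genes.
# This only builds and prints a worklist; on the OT-2 each transfer() call
# would become a real pipette transfer with a fresh tip.
from itertools import product

promoters = ["CaMV_35S", "UBQ10", "NOS"]
targeting = ["PTS1", "PTS2"]
genes = ["BGL", "laccase", "ACOT1", "Ppyr_luciferase"]


def transfer(reagent, well):
    """Stand-in for one aspirate/dispense step with its own tip."""
    print(f"{reagent} -> {well}")


def assemble(promoter, signal, well):
    """Queue every component of one four-gene construct into one well."""
    transfer("backbone_fragment", well)
    transfer(f"{promoter}_promoter_fragment", well)
    for gene in genes:
        transfer(f"{gene}_cassette_with_{signal}", well)
    transfer("BsaI_T4_ligase_master_mix", well)


for i, (promoter, signal) in enumerate(product(promoters, targeting)):
    well = f"{'ABCDEFGH'[i % 8]}{i // 8 + 1}"  # fill the plate column by column
    assemble(promoter, signal, well)

# After the run:
# 1. Run the Golden Gate thermocycler protocol
# 2. Transform into Agrobacterium
# 3. Infiltrate N. rustica leaves
# 4. Image bioluminescence after 3-5 days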

Work Cited

Bryant, J. A., Kellinger, M., Longmire, C., Miller, R., & Wright, R. C. (2023). AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots. Synthetic Biology, 8(1), ysac032. https://doi.org/10.1093/synbio/ysac032

AI Citations (Claude 4.6)

Generate [coordinates] into this code [your code section in Colab notebook] for my opentrons project.

Edit the code to not waste bio-ink [attached screenshot]

Is it still wasting ink? [attached screenshot]

What are the cutting edge Opentrons use cases in synthetic bio right now?