Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM https://www.uniprot.org/uniprotkb/P00441/entry#sequences
Subsections of Homework
Week 1 HW: Principles and Practices
SCOBY DNA Steganography
A biological engineering application or tool that I want to develop is a system of biomaterials for artist books that contain the book contents in their DNA. Above is an example of a previous project of mine, where I was making masks out of SCOBYs, Symbiotic Cultures Of Bacteria and Yeast. A next-level project might be to use SCOBY to create pages of artist books, where the content of the pages (illustrations, text, relief sculptures, etc.) is also written into the DNA of the yeasts and/or bacteria that make up the SCOBY. The reasons I am interested in this, and believe others will be interested as well, are primarily twofold: 1) I am interested in this as an unusual form of artwork that “grows on you” on multiple levels (such as genetic level, personal level, cultural(!) level, etc.) and 2) I am interested in this as a form of storytelling and story distribution that can replicate itself, where the DNA creates new copies as the yeast reproduces, then these new copies are used to create new copies of books.
I chose the mask example above because these might be sort of like Lovecraftian grimoires, spell books with monstrous faces on them, as I am thinking of sufficiently advanced technologies being indistinguishable from magic, computer codes and genetic codes as forms of magic, etc. This is also based on “steganography,” the practice of embedding hidden coded information within another object. The term “steganography” dates back to the “Steganographia” from 1499, which is a book of cryptography disguised as a book of magic.
I could also possibly drink the brewed kombucha to let the DNA into my own body’s systems, and possibly sell the drink to others during art events, where people could drink the kombucha that makes the SCOBYs that make the pages of the books …
Governance goals and actions
My main governance/policy goals are to make sure that this biomaterial system is safe for artists like myself when they are creating the materials, and also to make sure they are safe for viewers and collectors of the artist books. I will name these goals as the standard goals provided in class, 1) Enhance Biosecurity, 2) Foster Lab Safety, 3) Protect the environment, and 4) Other considerations.
My three different potential governance “actions” are 1) Develop guide for safe SCOBY DNA Steganography creation for artists, 2) Provide screening and training of potential collecting institutions so they can safely handle and preserve the books, and 3) Register any kombucha drink manufacturing facility with the FDA, and adhere to FDA regulations related to creating and selling drinks.
Scoring from 1-3 with 1 as the best, 3 as the worst:
| Does the option: | Develop guide for safe SCOBY DNA Steganography creation for artists | Provide screening and training of potential collecting institutions so they can safely handle and preserve the books | Register any kombucha drink manufacturing facility with FDA, adhere to FDA regulations |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 1 |
| • By helping respond | 2 | 2 | 1 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 3 | 1 |
| • By helping respond | 2 | 3 | 1 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 1 | 1 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 3 | 1 | 2 |
| • Feasibility? | 1 | 3 | 2 |
| • Not impede research | 1 | 2 | 2 |
| • Promote constructive applications | 1 | 3 | 2 |
Based upon this scoring, I think the governance option I would prioritize is “Develop guide for safe SCOBY DNA Steganography creation for artists.” This is not only an effective option for creating lab safety for artists and protecting the environment and others in it, it is also a governance option that I know would be highly feasible, as I would be the primary artist to use it. If this project scales up and includes others drinking the kombucha used to make the SCOBYs, then FDA registration and following FDA regulations would be required and highly beneficial.
Reflecting on this project proposal and the new ethical concerns that I am thinking about, I would say that what stands out is the combination of this as a lab safety project, an art preservation project, and a potential food safety project. Those are areas that I have previously thought of individually, and I am now thinking of them all together on the same project for the first time. I am also now thinking about whether this introduces other governance/policy questions, perhaps related to shipping, storage, etc.
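As a rough cross-check on the scoring table, here is a minimal Python sketch that adds up each option's scores with equal weight for every row. The short option labels and the function name `unweighted_totals` are my own shorthand, and equal weighting is an assumption; the table itself does not say how the rows should be combined.

```python
# Scores copied from the table (1 = best, 3 = worst), in row order:
# biosecurity x2, lab safety x2, environment x2, then the four "other" rows.
scores = {
    "Guide for artists": [1, 2, 1, 2, 1, 2, 3, 1, 1, 1],
    "Screening and training for institutions": [2, 2, 3, 3, 1, 2, 1, 3, 2, 3],
    "FDA registration": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
}


def unweighted_totals(table):
    """Sum each option's scores; a lower total means a better overall score."""
    return {option: sum(values) for option, values in table.items()}


totals = unweighted_totals(scores)
for option, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{total:>3}  {option}")
```

With equal weights the totals are 15 for the guide, 22 for screening and training, and 14 for FDA registration, so the guide and the FDA option land close together and well ahead of the screening option. The prioritization of the guide also leans on feasibility, which a plain sum does not weight any more heavily than the other rows.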
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
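The error rate of polymerase is about 1 in 10^6 bases. The human genome length is roughly 3 × 10^9 base pairs, so at that rate a single copy of the genome would pick up on the order of 3,000 errors. Biology deals with that discrepancy through error correction such as the MutS repair system, which lowers the effective error rate (see Error Correcting Gene Synthesis, slides 14 and 15).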
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Google AI Overview says: “An average human protein (~500 amino acids) can be encoded by a practically astronomical number of different DNA sequences, potentially exceeding 10^100 combinations, due to codon redundancy where 61 triplets encode 20 amino acids. However, only a few of these codes are functionally efficient or viable in vivo due to factors like codon usage bias, mRNA stability, and proper folding.”
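To make the "astronomical number" concrete, here is a minimal Python sketch that multiplies together the number of codons available for each amino acid in a protein. The codon counts come from the standard genetic code (they add up to the 61 coding triplets mentioned in the quote); the function name `count_encodings` and the tiny example peptides are my own, and the stop codon is left out.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (sums to 61).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein):
    """Count the DNA sequences that encode `protein` (stop codon not included)."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in protein.upper())


print(count_encodings("MW"))   # Met and Trp each have a single codon
print(count_encodings("MLS"))  # 1 * 6 * 6 choices
```

With roughly three codons per amino acid on average, a 500-residue protein has on the order of 3^500, or about 10^238, possible encodings, which is comfortably past the 10^100 in the quote.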
What’s the most commonly used method for oligo synthesis currently?
Solid phase synthesis
Why is it difficult to make oligos longer than 200nt via direct synthesis?
Google AI Overview says: “Making oligonucleotides (oligos) longer than 200 nucleotides (nt) via direct chemical synthesis is difficult primarily because of exponentially decreasing yields caused by imperfect coupling efficiency and the accumulation of errors.”
Why can’t you make a 2000bp gene via direct oligo synthesis?
Google AI Overview says: “A 2000bp gene cannot be produced via direct (single-pass) chemical oligonucleotide synthesis because the efficiency of the coupling reaction drops significantly, leading to low yields of full-length product and high rates of sequence errors (insertions/deletions).”
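The "exponentially decreasing yields" in these two answers can be sketched with one line of arithmetic: if every coupling step succeeds with the same probability, the fraction of full-length strands is that probability raised to the number of couplings. The 99% coupling efficiency below is an assumed round number for illustration, not a figure from the lecture, and `full_length_yield` is a hypothetical helper name.

```python
def full_length_yield(length_nt, coupling_efficiency):
    """Fraction of strands that reach full length after length_nt - 1 couplings."""
    return coupling_efficiency ** (length_nt - 1)


# Assumed per-step coupling efficiency, for illustration only.
ASSUMED_EFFICIENCY = 0.99

for length in (20, 200, 2000):
    fraction = full_length_yield(length, ASSUMED_EFFICIENCY)
    print(f"{length:>5} nt: {fraction:.3g}")
```

Under that assumption a 200-mer comes out at roughly 13% full-length product, while a 2000-mer drops to around 2 in a billion strands, which is why long genes are assembled from shorter oligos instead.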
Using Google & Prof. Church’s slide #4 What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
Google AI Overview says: “The 10 essential amino acids that all animals must obtain from their diet (as they cannot synthesize them in sufficient quantities) are phenylalanine, valine, threonine, tryptophan, isoleucine, methionine, histidine, arginine, leucine, and lysine. This reality makes the “Lysine Contingency” from Jurassic Park scientifically flawed, as all vertebrates, including engineered dinosaurs, would already be unable to synthesize lysine, rendering the engineered deficiency redundant.”
Week 2 HW: DNA Read, Write, & Edit
Part 1: Benchling & In-silico Gel Art
Above is maybe my favorite Gel Electrophoresis design I created. I was trying to make monster faces, so hopefully this looks sort of like a skull with horns! I made this with the “DNA Gel Art Interface” website created by Ronan at https://rcdonovan.com/gel-art
And below is a screenshot of some of my work in Benchling with the Lambda DNA from https://raw.githubusercontent.com/htgaa/htgaa2023/main/02_gel-art/Lambda_NEB.fasta and restriction enzymes, with the “NEB 2-log” ladder selected in the Virtual Digest tab, and with multiple Digests appearing in the same Virtual Digest, which I reordered by dragging the tabs around. These are the same restriction enzymes used in the horned skull drawing at the top of the page.
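For anyone who wants to see what a virtual digest is doing under the hood, here is a minimal Python sketch that cuts a linear sequence at every occurrence of a recognition site and reports the fragment lengths, which are what set the band positions on a gel. This is not Benchling's code; `digest_linear` is a hypothetical helper, the toy sequence is made up, and the only real-world fact used is that EcoRI recognizes GAATTC and cuts after the first G.

```python
def digest_linear(sequence, site, cut_offset):
    """Return fragment lengths from cutting a linear sequence at every site.

    cut_offset is how many bases into the site the top strand is cut
    (for example, 1 for EcoRI, which cuts G^AATTC). Only the given strand
    is scanned, which is enough for palindromic sites like this one.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(edges, edges[1:])]


# Made-up toy sequence with two EcoRI sites, for illustration only.
toy = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
print(digest_linear(toy, "GAATTC", 1))
```

Running the same function on the Lambda sequence from the FASTA file, once per enzyme, would give the list of fragment sizes to compare against the ladder.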
Part 2: Gel Art - Restriction Digests and Gel Electrophoresis
Below is my DNA gel electrophoresis art looking pretty good after about 45 minutes. This was done at the BUGSS Lab (Baltimore UnderGround Science Space) on Sunday, Feb 15. Thanks to Amanda and Joel and everyone else!
Part 3: DNA Design Challenge
Will do in lab on Sunday? Or on my own on Monday?
Part 4: Prepare a Twist DNA Synthesis Order
I created my account, still need to prepare my order …
Week 3 HW: Lab Automation
Part 1: Generate an artistic design
NOTE: Some of my newer and hopefully maybe better images are toward the end!
I generated the above artistic design – a self portrait – using the GUI at opentrons-art.rcdonovan.com – NOTE: BUGSS Lab has colors: orange, green, yellow, purple, red, blue
And below is me working on simulating that, redrawing and recoloring a little bit in the Google Colab notebook with the six colors at BUGSS Lab. Thanks to Amanda and everyone for helping hack up that code.
And below this is the photo of the six colors at BUGSS Lab:
And then below this is my self-portrait trying to get closer to those colors:
And below this is me working in the Google Colab notebook to make symmetrical multicolor designs. This is me drawing directly with Google Colab Python code, rather than importing code from Ronan’s website.
Updated Feb 21: Here is a new self-portrait, closer up, with fewer flat areas of color, in a screenshot from the Google Colab notebook:
Also updated Feb 21: Here is that same new self-portrait, trying to simulate the six colors at BUGSS Lab:
OK! Last set of updates before lab! Here is maybe my best, trying to hit the sweet spot of everything I did so far. Here on Ronan’s site:
That with saturated colors in the Google Colab:
And finally that trying to simulate colors in the photo from BUGSS Lab:
And, update March 4! Here is a photo of the actual agar art, looking pretty awesome and wild. (I’ve upped the contrast on the photo in Photoshop).
Week 4 HW: Protein Design Part I
Part A. Conceptual Questions
1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Meat is about 20% protein, so that is 100 g of protein. There are 6.022e+23 Daltons per gram, so for 100 grams that is 6.022e+25 Daltons. Then if there are 100 Daltons in an average amino acid, we’re back to 6.022e+23 molecules of amino acids.
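The same arithmetic as a minimal Python sketch, so the assumptions are easy to change. The 20% protein fraction and 100 Daltons per amino acid are the round numbers used in the answer; `amino_acid_count` is just a hypothetical helper name.

```python
AVOGADRO = 6.022e23  # Daltons per gram (equivalently, molecules per mole)


def amino_acid_count(meat_grams, protein_fraction=0.20, daltons_per_amino_acid=100):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_grams = meat_grams * protein_fraction
    total_daltons = protein_grams * AVOGADRO
    return total_daltons / daltons_per_amino_acid


print(f"{amino_acid_count(500):.3e}")  # about one mole of amino acids
```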
2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Because our food preparation and digestive systems have broken down the cow (and the cow DNA), the cow DNA does not mix with human DNA to produce new cells, and our immune systems fight DNA other than our own.
To be continued … ??? … Will finish later …
Part B: Protein Analysis and Visualization
1. Briefly describe the protein you selected and why you selected it.
I chose the protein “Oxidoreductase cns1” from Cordyceps militaris (strain CM01) (Caterpillar fungus). (https://www.uniprot.org/uniprotkb/G3JF08/entry) I chose it because I am interested in parasitic fungi and because it is “part of the gene cluster that mediates the biosynthesis of cordycepin (COR)” and “Cordycepin has antitumor, antibacterial, antifungal, antivirus, and immune regulation properties.”
2. Identify the amino acid sequence of your protein.
XP_006669647.1 oxidoreductase domain-containing protein [Cordyceps militaris CM01]
MAMNENAYPTTFPSFERENHRDALRQPFDPAFRRTWSNGVALRQLVDFARPTVANHTMSYALIEYCLSRL
PMQHLERLGQLKIPVELHAAPFQYLQKHHRACGFDWVERFVWRTHDLHKPYNFLRPELLLAQESGSQRIV
ALLTIMPGEDYIRHYASILEVAQHDGAISSHHGPIRCVLYPHLTQSMMAWTGLTELSLSVEPGDILILGF
VAELLPRFASLVPTARVIGRQDAQYYGLVRLELRPGLVFSLIGAKYSYWGNLGGRVVRELAARRPRAICY
IAKQGTLLSPGDIHRTIYSPTRYCVFDKGQACWHGDDHSALPINPLSSRFPTFDRGLHVSTPTIVEQDVD
FRTQVEAHGASSVDNELAQMARALTDVHEENPSMERVQLLPLMFITDYLRRPEELGMTVPFDLTSRNETV
HRNKELFLARSAHLVLEAFNVIERPKAIIVGTGYGVKTILPALQRRGVEVVGLCGGRDRAKTEAAGNKHG
IPCIDVSLAEVQATHGANLLFVASPHDKHAALVQEALDLGGFDIVCEKPLALDMATMRHFANQSQGSSQL
RLMNHPLRFYPPLIQLKAASKEPSNILAIDIQYLTRRLSKLTHWSAGFSKAAGGGMMLAMATHFLDLIEW
LTSSSLTPASVQDMSTSNSIGPLPTEDAGATKTPDVESAFQMNGCCGLSTKYSVDCDGAADTELFSVTLR
LDNEHELRFIQRKGSPVLLEQRLPGREWLPLKVHWEQRVREGSPWQISFQYFAEELVEAICMGTRSAFAD
KATGFSDYARQVGVFGSKVGIA
I am running the Google Colab notebook to count the frequency of amino acids (had some errors) … it says the sequence length is 748 and the most frequent amino acid is glycine (317) followed by alanine (193). See below:
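Since the notebook run had some errors, one way to cross-check the length and the counts is a few lines of standard-library Python. This is a minimal sketch and not the course notebook's code; `amino_acid_counts` is a hypothetical helper and the toy record is made up. Note that it only skips header lines that start with `>`, so a header pasted without that character would be counted as residues.

```python
from collections import Counter


def amino_acid_counts(fasta_text):
    """Count residues in a single-record FASTA string, skipping the header line."""
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    residues = "".join(line for line in lines if not line.startswith(">"))
    return Counter(residues.upper())


# Made-up toy record for illustration; paste the real sequence here instead.
example = """>toy_record
MAMNENAYPT
GGA
"""

counts = amino_acid_counts(example)
print("length:", sum(counts.values()))
print("most common:", counts.most_common(3))
```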
I ran a search for homologs on BLAST. There are six that are in the red, that have an E value of about 0. See below:
3. Identify the structure page of your protein in RCSB.
I did not find “Oxidoreductase cns1” from Cordyceps militaris (strain CM01) (Caterpillar fungus) in RCSB. I did find “Crystal Structure of endo-beta-N-acetylglucosaminidase from Cordyceps militaris D154N/E156Q mutant in complex with fucosyl-N-acetylglucosamine” at https://www.rcsb.org/structure/6KPN
I don’t see a solved date. It was Deposited: 2019-08-15. I don’t see it listed as part of any structure classification family at https://www.ebi.ac.uk/pdbe/scop/
4. Open the structure of your protein in any 3D molecule visualization software
Cartoon:
Ribbon:
Ball and Stick:
To Do:
Color the protein by secondary structure. Does it have more helices or sheets?
Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
Part C1. Using ML-Based Protein Design Tools: Protein Language Modeling
To do: Can you explain any particular pattern? (choose a residue and a mutation that stands out) and (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
2. Latent Space Analysis
To do: Use the provided sequence dataset to embed proteins in reduced dimensionality. Analyze the different formed neighborhoods: do they approximate similar proteins? Place your protein in the resulting map and explain its position and similarity to its neighbors.
Part C2. Using ML-Based Protein Design Tools: Protein Folding
Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
https://www.uniprot.org/uniprotkb/P00441/entry#sequences
“Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.” I did this step several times. Here are two examples of my generating four peptides:
“To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.”
“Record the perplexity scores that indicate PepMLM’s confidence in the binders.” These can be seen in the above screenshots, and I have also recorded them in an Excel spreadsheet.
Part 2: Evaluate Binders with AlphaFold3
“Navigate to the AlphaFold Server: alphafoldserver.com For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.” This is one example:
“Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?”
In the one above, I’d say probably surface-bound, and closer to the β-barrel region than to the N-terminus.
“In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.”
The ipTM value I got for the known binder was 0.31. The peptides I generated ranged from 0.28 to 0.43 in ipTM values. My understanding is that higher ipTM values show more confidence in the prediction. I had a few that were slightly higher than the known binder, like WRYPAVALALGX at 0.38, and my highest, at 0.43, was WHYYVYVVNHGX.
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
Below is a screenshot of part of my spreadsheet of results of my work in PeptiVerse:
“Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties? Choose one peptide you would advance and justify your decision briefly.”
The few peptides I tested did not show much difference. They each were soluble, and each had weak binding affinity. The widest differences I saw were in hydrophobicity, where I had some positives and some negatives. The one that was closest to zero, at -0.21, was WHYYVYVVNHGX. Of these, that might be the one that I would advance. I would probably go back and look for ones that had more binding affinity.
Part 4: Generate Optimized Peptides with moPPIt
I am rerunning this … ran on Sunday but did not save my results …
Some results of my rerun:
| Result | Hemolysis | Solubility | Affinity | Motif |
| --- | --- | --- | --- | --- |
| STKLHTKIKCQC | 0.9697153624147177 | 0.8333333134651184 | 6.467465877532959 | 0.7173478007316589 |
| SVTKKETQKRFA | 0.9688531029969454 | 0.75 | 5.76106595993042 | 0.7000954747200012 |
| GSAEMTCKKQRK | 0.9745583962649107 | 0.8333333134651184 | 6.006259441375732 | 0.6160933375358582 |
“briefly describe how these moPPit peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?”
These seem to be getting different results by using different strategies. For advancing to trials, I would probably test a few of the best results found from each system.
Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)
(optional, might get to later)
Part C: Final Project: L-Protein Mutants
“Run this notebook to generate for each position in the amino acid sequence, a “score” for what would happen to the protein if you mutated into another amino acid”:
“does the experimental data correlate with the scores from the notebook”?
… I don’t think so … might’ve done something off … or maybe these are just different systems with different results …
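One way to put a number on "does it correlate" instead of eyeballing it is a rank correlation between the notebook scores and the experimental values for the same mutations. Below is a minimal standard-library Python sketch of Spearman correlation (Pearson correlation computed on ranks). The function names are my own, and the two short lists are placeholder numbers, not results from the notebook or from the experiment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Placeholder numbers only, NOT results from the notebook or the experiment.
model_scores = [-2.1, -0.4, 0.3, -1.7, 0.9]
measured = [0.10, 0.55, 0.70, 0.20, 0.65]
print(round(spearman(model_scores, measured), 3))
```

A value near +1 or -1 would mean the two rankings mostly agree (or mostly disagree), and a value near 0 would back up the impression that they do not track each other.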
Week 6 HW: Genetic Circuits Part I
Assignment: DNA Assembly
1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
From New England Biolabs “Phusion High-Fidelity PCR Master Mix with HF Buffer is a 2X master mix consisting of Phusion DNA Polymerase, deoxynucleotides and reaction buffer that has been optimized and includes MgCl2. All that is required is the addition of template, primers and water.”
2. What are some factors that determine primer annealing temperature during PCR?
From ThermoFisher Scientific “The annealing temperature is determined by calculating the melting temperature (Tm) of the selected primers for PCR amplification. A general rule of thumb is to begin with an annealing temperature 3–5°C lower than the lowest Tm of the primers … One important consideration in Tm calculation is the use of PCR additives, co-solvents, and modified nucleotides. The presence of these reagents lowers the Tm of the primer-template complex.”
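The rule of thumb in that quote can be sketched in a few lines of Python. The quote does not say which Tm formula to use, so this sketch assumes the simple Wallace rule (2 °C per A/T plus 4 °C per G/C), which is only a rough estimate for short primers; real Tm calculators use nearest-neighbor models and salt concentrations. The function names and the primer sequences are made up for illustration.

```python
def wallace_tm(primer):
    """Rough melting temperature in °C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def starting_annealing_temp(forward, reverse, offset=5):
    """Start `offset` °C below the lower of the two primer Tm values."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


# Made-up primer sequences for illustration only.
fwd = "ATGCGTACCGGTTAGCTA"
rev = "TTAGGCATCGATCCGTAA"
print(wallace_tm(fwd), wallace_tm(rev), starting_annealing_temp(fwd, rev))
```

The `offset` parameter covers the 3–5 °C range from the quote, so `starting_annealing_temp(fwd, rev, offset=3)` gives the upper end of the suggested starting range.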
3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
PCR is used to make many copies of a section of DNA. Restriction enzyme digests cut the DNA at specific points, and do not make multiple copies.
4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
5. How does the plasmid DNA enter the E. coli cells during transformation?
6. Describe another assembly method in detail (such as Golden Gate Assembly)
7. Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
8. Model this assembly method with Benchling or Asimov Kernel!