Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Final Project Frugal Benchtop Bioreactors: Editing the DNA of an organism is more accessible than ever. Basic lab equipment and plasmid services like GenScript mean that you can dream up your own sequence and express it in a host for around a hundred US dollars. However, to unlock the real-world impact of gene edits you usually need to be able to scale production up. The next step in scaling beyond the shaker flask is a benchtop bioreactor, where you figure out how to actively manage and optimize your organism's growth and characteristics. The expense of the benchtop stage makes it less accessible than the edit stage, even though in many ways the technology involved is simpler. For example, a new benchtop bioreactor typically costs tens of thousands of dollars or more. Even used bioreactors cost thousands of dollars.

  • Week 2 HW: DNA Read, Write, and Edit

    Homework Part 0: Basics Of Gel Electrophoresis Attended lecture and watched recitation video Part 1: Benchling & In-silico Gel Art Link to Benchling Project. I am not sure who can see this link: I asked to join the HTGAA group but it doesn’t seem like my invite was accepted yet. My drawing of an “E Gel Person” and associated enzymes in each lane are also in screenshot below Part 2: Gel Art - Restriction Digests and Gel Electrophoresis I don’t have access to these enzymes and DNA in my local makerspace lab.

  • Week 3 HW: Lab Automation

    Opentrons Python Script Art Basic Idea The limited pixel resolution and colors of the petri dish reminded me of the old school bitmap monitors like the IBM PC that I grew up with. Also I wasn’t looking forward to guessing/figuring out a lot of pixel locations by hand, so I took a retro route and wrote some code to provide a terminal-like API that lets you specify a cursor location to write text to using a specific bitmap font and color.

  • Week 4 HW: Protein Design Part I

    Part A: Conceptual Questions How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Why do humans eat beef but do not become a cow, eat fish but do not become fish? Why are there only 20 natural amino acids? Can you make other non-natural amino acids? Design some new amino acids. Where did amino acids come from before enzymes that make them, and before life started? If you make an α-helix using D-amino acids, what handedness (right or left) would you expect? Can you discover additional helices in proteins? Why are most molecular helices right-handed? Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation? Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials? Design a β-sheet motif that forms a well-ordered structure. Okay I am going to take a first pass through here just going off the lecture, wikipedia, and background knowledge I already have and then go back and try with AI assistance for the ones I have no answer for.

Subsections of Homework

Week 1 HW: Principles and Practices

cover image

Final Project

Frugal Benchtop Bioreactors:

Editing the DNA of an organism is more accessible than ever. Basic lab equipment and plasmid services like GenScript mean that you can dream up your own sequence and express it in a host for around a hundred US dollars. However, to unlock the real-world impact of gene edits you usually need to be able to scale production up. The next step in scaling beyond the shaker flask is a benchtop bioreactor, where you figure out how to actively manage and optimize your organism's growth and characteristics. The expense of the benchtop stage makes it less accessible than the edit stage, even though in many ways the technology involved is simpler. For example, a new benchtop bioreactor typically costs tens of thousands of dollars or more. Even used bioreactors cost thousands of dollars.

Substantial progress has been made toward accessible and open bioreactors with efforts like BIO-SPEC: An open-source bench-top parallel bioreactor system, but the bill of materials for a BIO-SPEC system is still over 2000 euros. Another great project is the Pioreactor, which is available as a kit for only $350 US. However, the max volume of a Pioreactor is 40 mL and it only has a single tank, which limits the ability to do things like media optimization unless you buy multiple Pioreactors.

The goal of my final project is to take advantage of fabrication tools like 3D printing to design and build a simple benchtop bioreactor with a bill of materials of around a hundred US dollars, demonstrate that the reactor works by scaling up and optimizing production for at least one simple engineered microbe, and then release the plans and software with an open license. The bioreactor should support at least up to 1 L of total culture and have multiple reaction tanks to support environmental optimization. A key aspect of benchtop scale-up is monitoring and optimizing both the biomass and the level of expression, which can be difficult to monitor cheaply. To prove out the concept and provide a demo project for the system, the project will also edit an organism like a microbe to express a colored protein, so that monitoring and optimizing is easier using inexpensive technology like digital cameras. Eventually a library of scaffold organisms could be developed that allows users to insert their edits in such a way that their protein is co-expressed with an easy-to-monitor trait like color, enabling inexpensive scale-up.

Governance Policy and Actions

Equity And Autonomy

In line with the desire to give as many people as possible the opportunity to learn how to scale up their bio-engineering projects, the primary policy concern is promoting ongoing equity and autonomy.

Actions

  • The project can release all CAD designs, documentation, and software with open licenses that allow people to use and extend them. (Most Effective - 1)
  • The project can ensure the bill of materials only includes items that are broadly available and/or have open licenses. (Minimally Effective - 3)
  • Fab labs and service manufactures can sign up to print and mail kits of 3d printed parts to people who don’t have access to 3d printers. (Minimally Effective - 3)
  • Schools and DIY labs can incorporate this into their curriculum or sponsor people to build them locally in order to expand the number of people who have experience with the process of scaling up synthetic organisms. (Minimally Effective - 3)
  • An existing organization, like Neosynbio or Frugal Science Academy which support open and frugal biological tools, can sponsor the project and manage the licenses and copyrights. In addition to providing exposure, if the project is successful this allows the project to exist beyond a single person and prevents license and copyright changes that restrict access. (Most Effective - 1)

Biosafety

While a benchtop reactor does not introduce new biological risks, it magnifies the ones that already exist in the bio-engineered organism, so a secondary policy goal needs to be managing this magnified risk.

Actions

  • The project can provide documentation and training materials that reference existing training on bioethics and biosafety. (Moderately Effective - 2)
  • The project and labs can provide known safe demo projects that demonstrate the principles of scale up and optimization with minimal biological risk. (Minimally Effective - 3)
  • Schools and labs can extend their existing bioethics and biosafety training to explicitly discuss how to manage the magnified risk of scaling up a biological process. (Moderately Effective - 2)
  • Governments and regulators can extend their existing regulations and logging practices to cover specific requirements around scaling up, or just regulate edits with the assumption that scale-up will happen. (Moderately Effective - 2)

Homework Questions

Homework Questions from Professor Jacobson

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The error rate for polymerase is 1 in 10^6 according to slide #8. Slide #10 indicates the human genome is 3.2 Gbp, so at that rate a single copy of the genome would carry roughly 3,200 errors (3.2 × 10^9 / 10^6), which is pretty high, especially for fast-growing cells that replicate once a day, since errors accumulate with each replication. Biology deals with this by having many layers of error correction and handling beyond the already excellent ones built into the polymerase. One example is the Lamers et al. work on MutS from the slides, which is part of a repair system that finds and corrects mismatches left in the DNA after copying. Beyond error correction, there are also mechanisms that cause cells with serious errors to self-destruct or be marked for destruction by other cells. This fail-safe removes cells with serious errors from the population so that they don’t replicate further.
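The arithmetic behind this estimate can be checked with a few lines of Python. This is a minimal sketch using only the two numbers quoted from the slides; the function names are illustrative, and it assumes errors are independent and ignores proofreading and repair.

```python
# Numbers quoted from the lecture slides
GENOME_BP = 3.2e9    # human genome length in base pairs (slide #10)
ERROR_RATE = 1e-6    # polymerase errors per base copied (slide #8)


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of copying errors in one full replication."""
    return genome_bp * error_rate


def accumulated_errors(genome_bp: float, error_rate: float, replications: int) -> float:
    """Expected errors after several replications with no repair at all."""
    return expected_errors(genome_bp, error_rate) * replications


print(f"per replication: ~{expected_errors(GENOME_BP, ERROR_RATE):.0f}")
print(f"after 30 daily replications: ~{accumulated_errors(GENOME_BP, ERROR_RATE, 30):.0f}")
```

Thousands of uncorrected errors per copy is the gap that the extra repair layers have to close.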

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Slide 6 indicates that the average human protein is about 1000 base pairs and ~330 amino acids. While the exact number of alternate codings depends on the specific amino acid sequence of the protein, on average each amino acid has ~3 codons that map to it (64/21), so there are ~3^330 (roughly 10^157) alternate codings for an average human protein. In practice not all of these encodings will actually work, because different codons are still physically and chemically different from each other, which can create differences in the structure of the DNA and the transcribed RNA, as indicated in later slides. For example, structural differences in RNA chains can impact ribosome translational efficiency, which means a given DNA chain might code for the same protein but make too much or too little of that protein for the organism to survive.
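For a specific protein the count can be computed exactly instead of using the ~3 codons average. The sketch below assumes the standard genetic code (61 sense codons spread over 20 amino acids); the table and function names are my own for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_codings(protein: str) -> int:
    """Exact number of distinct DNA sequences that encode this protein."""
    total = 1
    for aa in protein:
        total *= DEGENERACY[aa]
    return total


def log10_codings(protein: str) -> float:
    """Same count on a log10 scale, handy for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


# Short example: M has 1 codon, S has 6, V has 4
print(count_codings("MSV"))  # 24

# The rough estimate from the text: 330 residues at ~3 codons each
print(f"3^330 is about 10^{330 * math.log10(3):.1f}")
```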

Homework Questions from Dr. LeProust: [Lecture 2 slides]

  1. What’s the most commonly used method for oligo synthesis currently?

From the Jacobson slides and the timeline in the LeProust slides, it appears that while the details of the chemistry, level of automation, and miniaturization have been massively improved over the years, most oligo synthesis methods are variants of the same open-loop chemical synthesis (solid-phase phosphoramidite chemistry), which adds one base per cycle using a protecting group.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Also from the Jacobson slides, the error rate for chemical synthesis is about 1 in 100 bases. That error rate makes it very hard to go much beyond a couple hundred bases without getting an error.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis?

2000 base pairs is much too large for a 1 in 100 error rate. At 2000 bp, even if you sequence the different oligos produced and then try to find and amplify a good oligo, you have an almost zero chance of getting a correct sequence of that length in the first place.
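To put numbers on "almost zero", here is a minimal sketch that treats each base as an independent 1 in 100 chance of error (a simplifying assumption; real synthesis errors are not perfectly independent). The function names are illustrative.

```python
def p_error_free(length_nt: int, error_rate: float = 0.01) -> float:
    """Probability that a synthesized strand of this length has no errors."""
    return (1.0 - error_rate) ** length_nt


def molecules_per_good_copy(length_nt: int, error_rate: float = 0.01) -> float:
    """Average number of molecules you would screen to find one perfect copy."""
    return 1.0 / p_error_free(length_nt, error_rate)


for n in (20, 200, 2000):
    print(f"{n:5d} nt: P(perfect) = {p_error_free(n):.3g}, "
          f"screen ~{molecules_per_good_copy(n):.3g} molecules per good one")
```

Under this model a 200 nt oligo is perfect roughly 13% of the time, while a 2000 nt strand is perfect only about twice in a billion tries, which is why long genes are assembled from shorter verified pieces.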

Homework Questions from George Church

I looked at question #1

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

I had to google “What is the lysine contingency” because I had forgotten that part of Jurassic Park. Looking at Wikipedia - Essential Amino Acids, it looks like there are 9 amino acids that animals absolutely can’t synthesize and 6 more that can’t always be synthesized in sufficient amounts. I was not able to reconcile this with the 10 amino acids in the question. Independent of that, it is clear that the “Lysine Contingency” makes no sense as a form of bio-safety for animals, since all animals already suffer from the lysine contingency and do just fine getting lysine from the food they eat. Even knocking out the ability to produce a non-essential amino acid like alanine would not help containment unless the dinosaur species has such a narrow diet that it couldn’t survive on any modern wild plant or animal matter. In particular, carnivorous dinosaurs that are happy to eat modern mammals (the ones you most want to contain) could get the full suite of amino acids from their prey. I used Google and their AI answer to verify that all amino acids survive stomach acid and digestion.

Extra Investigation for GRO:

While reading the George Church slides on Genetically Recoded Organisms (GRO) and the claim that they are immune to viral infection (“Swapped genetic code blocks viral infections and gene transfer”), I was driven to do some additional investigation into the risks using Google Gemini, knowing how important viral infection is in microbial control in the wild. Some of the main prompts were:

  • “Isn’t immune to natural viruses a little dangerous since that is the main predator/cause of death of most microbes?”
  • “It seems like if they could escape and dump the auxotrophy gene they might have enough of a survival advantage (there are many slow growth bacteria in nature) that they could climb back up the fitness curve over time.”
  • “what is the pragmatic outcome advantage or technology that justifies this risk”

I need to do more investigation in this area, but my first impression is that the researchers involved are thinking deeply about the bio-safety risks and developing layered countermeasures. However many of those countermeasures are fundamentally dependent on responsible actors in a high trust world and it isn’t clear that in a low trust multi-polar world these safeguards will be sufficient? The same economic forces pushing in this direction (virus free bioreactors) may tempt people to bypass the fitness safe-guards for economic benefit. Once a virus immune GRO population becomes established in the wild it isn’t clear to me how you would eradicate them. Even if they are initially slow growing or not dominant in their niche, being immune to viruses is such an incredible fitness advantage that it seems likely GRO microbes will eventually climb the fitness curve and come to dominate all ecosystems. This might take thousands or even millions of years, but seems like a very bad outcome. I guess we would have to hope that natural virus evolution figures out how to bypass the GRO defense before then?

Week 2 HW: DNA Read, Write, and Edit

Homework

Part 0: Basics Of Gel Electrophoresis

Attended lecture and watched recitation video

Part 1: Benchling & In-silico Gel Art

  • Link to Benchling Project. I am not sure who can see this link: I asked to join the HTGAA group but it doesn’t seem like my invite was accepted yet.
  • My drawing of an “E Gel Person” and associated enzymes in each lane are also in screenshot below
E Gel Person

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

I don’t have access to these enzymes and DNA in my local makerspace lab.

Part 3: DNA Design Challenge

Protein Choice

For the frugal bioreactor project I thought it would be ideal to have an organism and protein combination that is easily detected, to help people (including myself) learn how to do yield optimization experiments. To that end I am choosing a protein that has an easily visible color, so intensity of color can be used to judge protein expression. To find a list of proteins I:

  • Asked Gemini “Are there proteins that have visible color and can be expressed by e coli?”.
  • Selected EforRed from the list, since it will probably be easiest to measure the intensity of primary colors
  • Found EforRed in FPBase, though the common alias is apparently eforCP. There is also spectrum data on FPBase, which could be useful for calibration and detection.
1 MSVIKQVMKT KLHLEGTVNG HDFTIEGKGE GKPYEGLQHM KMTVTKGAPL PFSVHILTPS HMYGSKPFNK YPADIPDYHK 
81 QSFPEGMSWE RSMIFEDGGV CTASNHSSIN LQENCFIYDV KFHGVNLPPD GPVMQKTIAG WEPSVETLYV RDGMLKSDTA 
161 MVFKLKGGGH HRVDFKTTYK AKKPVKLPEF HFVEHRLELT KHDKDFTTWD QQEAAEGHFS PLPKALP

Reverse Translate

Encode the protein as a DNA string. There are lots of tools to do this, but for fun I just wrote Python code using a codon table (RNA codon to amino acid) that I took out of some old Rosalind code that translated RNA to proteins.

# Want to find a DNA encoding of a protein
# This is the RNA codon to amino acid table from Rosalind
prot_string = """UUU F             
UUC F             
UUA L             
UUG L             
UCU S             
UCC S             
UCA S             
UCG S             
UAU Y             
UAC Y             
UAA Stop          
UAG Stop          
UGU C             
UGC C             
UGA Stop          
UGG W             
CUU L 
CUC L 
CUA L 
CUG L 
CCU P 
CCC P 
CCA P 
CCG P 
CAU H 
CAC H 
CAA Q 
CAG Q 
CGU R 
CGC R 
CGA R 
CGG R 
AUU I 
AUC I 
AUA I 
AUG M 
ACU T 
ACC T 
ACA T 
ACG T 
AAU N 
AAC N 
AAA K 
AAG K 
AGU S 
AGC S 
AGA R 
AGG R 
GUU V
GUC V
GUA V
GUG V
GCU A
GCC A
GCA A
GCG A
GAU D
GAC D
GAA E
GAG E
GGU G
GGC G
GGA G
GGG G"""

# Build the reverse mapping (amino acid -> codon); the last codon listed for each amino acid wins
prot_rna_map = {x.split()[1]: x.split()[0] for x in prot_string.split('\n')}
# Map spaces in the protein string to nothing so they are skipped
prot_rna_map[" "] = ""
print(len(prot_rna_map), prot_rna_map)

my_protein = """MSVIKQVMKT KLHLEGTVNG HDFTIEGKGE GKPYEGLQHM KMTVTKGAPL PFSVHILTPS HMYGSKPFNK YPADIPDYHK QSFPEGMSWE RSMIFEDGGV CTASNHSSIN LQENCFIYDV KFHGVNLPPD GPVMQKTIAG WEPSVETLYV RDGMLKSDTA MVFKLKGGGH HRVDFKTTYK AKKPVKLPEF HFVEHRLELT KHDKDFTTWD QQEAAEGHFS PLPKALP"""
my_rna = "".join([prot_rna_map[l] for l in my_protein])
print("-----")
print(my_rna)

my_dna = my_rna.replace("U", "T")
print("-----")
print(my_dna)

The result I got (out of the many possible encodings) was:

ATGAGCGTGATAAAGCAGGTGATGAAGACGAAGCTGCACCTGGAGGGGACGGTGAACGGGCACGACTTCACGATAGAGGGGAAGGGGGAGGGGAAGCCGTACGAGGGGCTGCAGCACATGAAGATGACGGTGACGAAGGGGGCGCCGCTGCCGTTCAGCGTGCACATACTGACGCCGAGCCACATGTACGGGAGCAAGCCGTTCAACAAGTACCCGGCGGACATACCGGACTACCACAAGCAGAGCTTCCCGGAGGGGATGAGCTGGGAGAGGAGCATGATATTCGAGGACGGGGGGGTGTGCACGGCGAGCAACCACAGCAGCATAAACCTGCAGGAGAACTGCTTCATATACGACGTGAAGTTCCACGGGGTGAACCTGCCGCCGGACGGGCCGGTGATGCAGAAGACGATAGCGGGGTGGGAGCCGAGCGTGGAGACGCTGTACGTGAGGGACGGGATGCTGAAGAGCGACACGGCGATGGTGTTCAAGCTGAAGGGGGGGGGGCACCACAGGGTGGACTTCAAGACGACGTACAAGGCGAAGAAGCCGGTGAAGCTGCCGGAGTTCCACTTCGTGGAGCACAGGCTGGAGCTGACGAAGCACGACAAGGACTTCACGACGTGGGACCAGCAGGAGGCGGCGGAGGGGCACTTCAGCCCGCTGCCGAAGGCGCTGCCG

Codon Optimization

For a given protein string of amino acids there are many possible DNA codes that could generate that protein. While all these codes are equivalent in an abstract sense, in reality the translation machinery (ribosomes and tRNAs) in each organism is different, and the physical characteristics of equivalent codons can differ, which can impact an organism’s ability to actually translate the DNA into a protein. For example, the organism could be more efficient with certain tRNAs than others, impacting the rate of expression, or an organism could give special meaning to sequences of codons that it doesn’t use in its own proteins. To avoid this you want to optimize your encoding for expression in your chosen organism.

I tried to optimize my sequence on Twist, but I got a 404 whenever I went to their codon optimization calculator. Instead I googled for a codon optimization calculator and used another free one, VectorBuilder, to get the optimization below. I used E. coli as my organism and didn’t specify any restriction enzymes to avoid:

Pasted Sequence: GC=62.26%, CAI=0.60

ATGAGCGTGATAAAGCAGGTGATGAAGACGAAGCTGCACCTGGAGGGGACGGTGAACGGGCACGACTTCACGATAGAGGGGAAGGGGGAGGGGAAGCCGTACGAGGGGCTGCAGCACATGAAGATGACGGTGACGAAGGGGGCGCCGCTGCCGTTCAGCGTGCACATACTGACGCCGAGCCACATGTACGGGAGCAAGCCGTTCAACAAGTACCCGGCGGACATACCGGACTACCACAAGCAGAGCTTCCCGGAGGGGATGAGCTGGGAGAGGAGCATGATATTCGAGGACGGGGGGGTGTGCACGGCGAGCAACCACAGCAGCATAAACCTGCAGGAGAACTGCTTCATATACGACGTGAAGTTCCACGGGGTGAACCTGCCGCCGGACGGGCCGGTGATGCAGAAGACGATAGCGGGGTGGGAGCCGAGCGTGGAGACGCTGTACGTGAGGGACGGGATGCTGAAGAGCGACACGGCGATGGTGTTCAAGCTGAAGGGGGGGGGGCACCACAGGGTGGACTTCAAGACGACGTACAAGGCGAAGAAGCCGGTGAAGCTGCCGGAGTTCCACTTCGTGGAGCACAGGCTGGAGCTGACGAAGCACGACAAGGACTTCACGACGTGGGACCAGCAGGAGGCGGCGGAGGGGCACTTCAGCCCGCTGCCGAAGGCGCTGCCG
Improved DNA[1]: GC=51.84%, CAI=0.95

ATGAGCGTGATTAAACAGGTAATGAAAACCAAACTGCATCTGGAAGGCACGGTGAACGGCCATGATTTCACCATTGAAGGCAAAGGCGAAGGCAAACCGTATGAAGGCCTGCAGCATATGAAAATGACCGTGACCAAAGGCGCGCCGCTGCCGTTTAGCGTGCATATTCTGACGCCGAGCCACATGTACGGCAGCAAACCGTTTAACAAATACCCGGCGGACATCCCGGATTATCACAAACAAAGCTTCCCGGAAGGCATGAGCTGGGAACGTAGCATGATCTTCGAGGATGGCGGCGTGTGCACTGCGAGCAATCATAGCAGCATTAACCTGCAGGAAAACTGCTTCATTTACGATGTGAAATTTCATGGCGTGAATCTGCCGCCGGATGGTCCGGTGATGCAGAAAACCATTGCGGGCTGGGAACCGAGCGTGGAAACCCTGTACGTGCGTGATGGCATGCTGAAAAGCGATACCGCCATGGTGTTTAAACTGAAAGGCGGCGGCCATCATCGCGTGGACTTCAAAACCACCTATAAAGCGAAAAAACCGGTGAAACTGCCGGAATTTCACTTTGTGGAACATCGCCTGGAACTGACCAAACATGATAAAGATTTCACCACCTGGGATCAGCAGGAAGCGGCGGAAGGCCATTTTTCACCGCTGCCGAAAGCGCTGCCG

I knew GC was the fraction of those bases in the sequence, which varies from species to species and can impact structure, but I didn’t know what CAI was, so I googled it and learned it is the “Codon Adaptation Index”, a score between 0 and 1 for how closely the codons in your sequence match the codons the organism prefers in its own highly expressed genes (which tracks tRNA availability).
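The GC number is easy to reproduce yourself. The following is a minimal sketch; the function name is my own, and it simply counts G and C over the total length, so the result may differ slightly from a tool that handles ambiguous bases differently.

```python
def gc_content(seq: str) -> float:
    """Percentage of bases in the sequence that are G or C."""
    seq = seq.upper().replace(" ", "").replace("\n", "")
    if not seq:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Tiny examples; paste either full sequence above in place of these
print(gc_content("ATGC"))      # 50.0
print(gc_content("ATGAGCGTG")) # 5 of 9 bases are G or C
```

Running it on the pasted and improved sequences is a quick way to confirm the values reported by the optimizer.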

Now What

I still have a lot to learn about how to express proteins, but at a high level I know that with cell culture you:

  1. Pick an organism to express in. I will assume it is a bacterium like E. coli.
  2. Manufacture this DNA segment (with a service like Twist) inside a plasmid in a way that
  • The plasmid has some other trait like antibiotic resistance that we can use to select for bacteria that have taken up the plasmid
  • Some promoter that we can use to make sure our protein is expressed.
  3. Use some kind of transformation method to get the plasmid inside of the bacteria
  4. Select for bacteria that have the plasmid by culturing, then plating on antibiotic agar and selecting survivors
  5. Culture and grow these bacteria at scale and trigger the conditions (if any) for our protein to be expressed.

Part 4: Twist Synthesis Order

Followed the homework directions pretty closely here, using the same prefix/suffix sequences they recommend, even the purification histidines, though I probably don’t need to extract my color protein. This is my linear sequence.

After that I followed the instructions to create a Twist order, downloaded it in GenBank format, and imported the plasmid into Benchling.

I used Gemini freely to understand Benchling and standard practices in DNA construction and editing. Prompts were:

  • “IN benchlig once I create a DNA sequence, can I add more DNA to it in benchling?”
  • “When I right click it asks me if I want to insert bases or parts. What is the difference?”
  • “When we talk about DNA sequences are we typically using the 5 to 3 or 3 to 5 strand? What is each called and which one is the primary thing we are editing and viewing in benchling?”
  • Is it best practice to separately annotate start and stop codons on their strands?

Part 5: DNA Read, Write, Edit

DNA Read

What DNA?

If I could sequence any DNA I would sequence Valonia ventricosa, an alga where a single cell (with multiple nuclei) can grow to be centimeters across! My local DIY bio lab has just started working on doing synthetic biology with them, and the first step would be to sequence the entire genome, which hasn’t been done yet.

What Sequencing?

  1. I think to assemble the overall structure of the genome from scratch we need the longest possible reads, so we would want to use a third-generation sequencing method like nanopore for that. Once we have the overall structure we might want to use a high-throughput method like Illumina to get better resolution and coverage, including how much variation there is between individuals.
  2. To get started we need a decent amount of DNA. This is roughly the amount in 1 million nuclei? We have estimated (assuming that a 1 mm cell has ~1000 nuclei) that we need around 1000 1 mm cells. Once we have the cells we would need to separate out the DNA from the rest of the cell.
  3. Since the lecture didn’t go into much detail on prep with a nanopore I asked Gemini “What kind of preparation do you need to use a nanapore sequencer”. The summary is:
    • Gentle separation of DNA so you keep the strands long
    • High purity (no phenols/salts)
    • Repair the DNA ends and add a single A overhang
    • Attach a motor protein to DNA to move strand through the membrane.
    • Setup the flow cell with buffer
    • Put your sample in
  4. Again I didn’t know the output format so I asked Gemini “What is the raw output format of nanapore”. Summary is
    • The data is large (can be a terabyte)
    • Nanopore records the raw signal as changes in electrical current as DNA goes through the pore. The wiggles are stored in binary files in either POD5 or FAST5 format
    • The DNA sequence implied from raw electrical is also output as FASTQ files
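FASTQ itself is a simple text format: each read is four lines (a header starting with @, the bases, a + separator, and one quality character per base). As a rough illustration, here is a minimal parser sketch. It assumes well-formed four-line records and the common Phred+33 quality encoding; the function name is my own, and a real project would more likely use an existing library.

```python
def parse_fastq(text: str):
    """Parse FASTQ text into (read_id, sequence, quality_scores) tuples.

    Assumes each record is exactly four lines and qualities are Phred+33.
    """
    lines = text.strip().splitlines()
    records = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        scores = [ord(c) - 33 for c in qual]  # Phred+33 decoding
        records.append((header[1:], seq, scores))
    return records


example = "@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n!!\n"
for read_id, seq, scores in parse_fastq(example):
    print(read_id, seq, scores)
```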

DNA Write

  1. If I was going to synthesize DNA independent of editing I would encode Wikipedia as DNA. In particular, it would be fun to encode the main text of the Wikipedia page about DNA as a DNA strand. Using the wc shell utility on a copy of the file I see it is about 64K bytes, which is something that could be encoded as DNA.
  2. I checked with Gemini “What are the abilities and limitations of cell-free DNA assembly? How long can strands be?” and it looks like this size is possible to do with cell-free assembly, though it is pushing the upper end. I think from the lecture that to pull this off you would need to synthesize small pieces (in the range of 500 bp) with oligo synthesis so that they have correctly overlapping ends, and then repeatedly use Gibson assembly. At 2 bits per base, 64K bytes is roughly 256,000 bases, or about 512 pieces of 500 bp. If you arrange the reactions to take advantage of the fact that fragments double in size each round, you can do this in 9 rounds of repeated gluing (2^9 = 512) from the initial ~512 segments you start with.
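The sizing can be sketched in code. This is only an illustration under stated assumptions: a naive 2 bits per base mapping (A=00, C=01, G=10, T=11), no error correction or homopolymer avoidance, 64,000 bytes standing in for "about 64K", and pairwise joining that halves the fragment count each round. Overlap regions are ignored, and all names are my own.

```python
BASES = "ACGT"  # A=00, C=01, G=10, T=11 (an arbitrary choice for this sketch)


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(dna: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def assembly_plan(n_bytes: int, fragment_len: int = 500):
    """Return (bases, fragments, pairwise assembly rounds)."""
    n_bases = n_bytes * 4
    n_fragments = -(-n_bases // fragment_len)   # ceiling division
    rounds = (n_fragments - 1).bit_length()     # ceil(log2(n_fragments))
    return n_bases, n_fragments, rounds


print(bytes_to_dna(b"DNA"))
print(assembly_plan(64_000))  # (256000, 512, 9)
```

A practical encoding would add redundancy and avoid long runs of one base, which would increase the number of fragments.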

DNA Edit

  1. If I had the capability to do any DNA edits I wanted, I would want to try to edit a eukaryote like algae or yeast to genetically support symbiosis with some kind of hydrogen-oxidizing bacteria. This would be a massive amount of editing to both the bacteria and the host, but would unlock the ability to directly produce interesting products, such as proteins, at scale with only water, sunlight, and atmosphere. For example, it would enable things like cellular agriculture and space habitats to function without existing agricultural inputs.
  2. This is such a massive edit I am not even sure what technologies would work. On the bacteria side you would probably want to use the edit techniques used in the construction of the minimal living cell, though you would possibly want to cut even deeper because you want to make sure some of the bacteria’s fundamental functions are provided by the host. My earlier Gemini query on cell-free assembly volunteered that the Minimal Bacterial Genome project used a combination of cell-free assembly and yeast-based assembly (TAR cloning). I don’t know anything about TAR cloning so I asked Gemini “Can you tell me more about TAR cloning?” and it sounds like the perfect thing for large-scale and complex construction and edits of DNA, so it is probably what would end up being used.

Week 3 HW: Lab Automation

Opentrons Python Script Art

Basic Idea

The limited pixel resolution and colors of the petri dish reminded me of the old school bitmap monitors like the IBM PC that I grew up with. Also I wasn’t looking forward to guessing/figuring out a lot of pixel locations by hand, so I took a retro route and wrote some code to provide a terminal-like API that lets you specify a cursor location to write text to using a specific bitmap font and color.

Result

The code to turn the petri dish into a screen was pretty straightforward to do even by hand. I did a bunch of experiments on what I should do with the API and ended up making a logo for my local bio makerspace:

Tri DIY Bio Logo

With the screen API, drawing this logo is only a few lines of Python

char_width = 8
char_height = 8
char_border = 1
screen = PetriScreen(40, 1, (char_width, char_height, char_border), font_maps)
screen.write_text_at((-1,4), '\u0004\u0004',font="Ibm437", color="green")
screen.write_text_at((-3,2), "TRI  \u001E", font="Ibm437", color="red")
screen.write_text_at((-3,0), "\u0003 DIY \u0003", font="Ibm437", color="cyan")
screen.write_text_at((-3,-2), "\u0002   BIO", font="Ibm437", color="yellow")
screen.write_text_at((-1,-4), '\u0004\u0004',font="Ibm437", color="green")

run = screen.run

How

I think AI could have done this really quickly but keeping with the retro theme I did things by hand in my colab notebook. Even by hand this wasn’t too bad and ended up being about 100 lines of python code for the logic.

  • Encoded the font bytes as dictionaries of binary numbers (so you can sort of see the shape), where the highest-value digit is the upper-leftmost pixel
  • Code supports screens with different bitmap sizes.
  • The API uses (row, line) numbers to specify location like an old-fashioned terminal instead of (x,y).
    • Since the petri dish is a circle I made (0,0) the center row and line and used negative numbers for rows to left and lines below the center
    • The number of rows and lines a petri dish can fit depends on the size of the font and the physical pixel separation which are specified when the Petri Screen is created.
    • I experimented with a bunch of different pixel separations. I ended up using 1 mm, but I don’t know how this will actually look in real life.
  • The python code itself is NOT optimized for speed and uses string operations to manipulate the bytes.
  • The use of the robot is optimized because the screen computes all of the pixels before rendering begins, so
    • Uses minimal number of tips
    • Uses minimal amount of fluid
    • The move path is also near optimal because the tip is moved in a left to right scan for each color across the entire screen
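To make the approach above concrete, here is a small sketch of the three ideas: decoding a glyph stored as one binary number, mapping a character cell to plate coordinates with (0,0) at the center, and ordering the dispense points in a left-to-right scan per color. This is not the actual PetriScreen code. The function names, the choice of the cell's lower-left corner as its anchor, and the top-to-bottom scan direction are all assumptions for illustration.

```python
def glyph_pixels(glyph: int, width: int = 8, height: int = 8):
    """Return (col, row) for each set pixel; the highest bit is the upper-left pixel."""
    total = width * height
    pixels = []
    for row in range(height):          # row 0 is the top row of the glyph
        for col in range(width):       # col 0 is the leftmost column
            bit = total - 1 - (row * width + col)
            if (glyph >> bit) & 1:
                pixels.append((col, row))
    return pixels


def to_plate_mm(cell, pixel, width=8, height=8, border=1, spacing_mm=1.0):
    """Map a character cell plus a pixel inside it to (x, y) in mm from the dish center."""
    cell_col, cell_line = cell         # negative values are left of / below the center
    col, row = pixel
    x = (cell_col * (width + border) + col) * spacing_mm
    y = (cell_line * (height + border) + (height - 1 - row)) * spacing_mm
    return (x, y)


def scan_order(points_by_color):
    """Sort each color's points top to bottom, then left to right within a row."""
    return {color: sorted(pts, key=lambda p: (-p[1], p[0]))
            for color, pts in points_by_color.items()}


# A glyph with only its upper-left pixel lit
print(glyph_pixels(1 << 63))                 # [(0, 0)]
print(to_plate_mm((-1, 0), (0, 7)))          # (-9.0, 0.0)
```

Computing every point before any dispensing starts is what lets each color be drawn in a single pass with one tip.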

There are a lot of sources for bitmap font data, all of which use a different format. I went with the classic IBM 8x8 font data available in this project’s header files. I converted this to my format using some simple search and replace and then running this Python code outside of the notebook.

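# raw_font_data: dict mapping each character to its 8 row bytes (built from the header file by search and replace)
for k, v in raw_font_data.items():
    # Non-alphanumeric keys are written as a 4-digit \uXXXX escape so the output is valid Python
    key_str = k if k.isalnum() else f'\\u{ord(k):04X}'
    # print(f'---- char {key_str} ----')
    rows = []
    for b in v:
        # Reverse the bit order of each row byte
        rows.append(format(b, "08b")[::-1])
    print(f'"{key_str}" : 0b{"_".join(rows)},')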

Lab Run

I wasn’t able to make my node’s office hours and didn’t get a reply to if and how we are supposed to run my code, so I wasn’t able to actually run.

Post-Lab Questions

Find and Describe an Opentrons Paper

I read the paper Environmental modulators of algae-bacteria interactions at scale. This paper explores how autotrophs and heterotrophs interact as a combined system under various environmental conditions. The space of variables for a two-organism system under a variety of environmental conditions is large (~225K), so massively parallel automation is essential. The key technology is observing nanoliter-size droplets in a microfluidic system, using fluorescent barcodes to identify the conditions in each droplet and fluorescence intensity to measure protein growth. Given the small scale, number of possibilities, and precision required, the solutions used to create the droplets were mixed by an Opentrons OT-2 robot (STAR section page e2).

Lab Automation for Final Project

My main current proposed project is building a frugal benchtop bioreactor, which is more along the lines of trying to bring simple automation and sensor control to a wider audience. Given this, it isn’t clear how I could use an Opentrons robot. Even if I do another project, it also isn’t clear that as a remote committed listener I would have access to either an Opentrons robot or lab automation in a way that I could do a project.

Final Project Ideas

1. Frugal Benchtop

This is what I outlined in homework #1. I would try to include demo projects with E. coli and/or yeast that have been edited to express visible spectrum colors.

2. Non-sterile simple genetic modification

Extend sourdough into a 2-species system that supports genetic modification of either microbe. The sourdough environment already selects against most competitors.

3. Edit yeast or algae for symbiosis with HOB

Not even really sure how to go about this because it is a massive edit.

Extra Node Specific Questions

I imagine AI could easily answer these questions, but I intentionally tried to use just the notes that were sent out and my basic understanding of how things work to see what I could guess without help.

1. If we were given a random segment of 100,000 bases, how would we determine if it is encoding for eukaryotic or prokaryotic genes? Could we find specific “parts” - promoters, operators, enhancers, silencers?

Assuming it is a protein-coding section of DNA, I think you would have a decent chance of finding enough patterns to make good guesses about the structure and encodings.

Prokaryote vs Eukaryote

Some things you could look for

  • Circular vs linear
  • Ribosome Binding Sequences
  • Origins of Replication

Specific Parts

If you can identify patterns that you think are potential ori and RBS sites, then you can use those guesses to orient yourself on the strand and look for:

  • Between the ori and RBS sequence: common promoters and operators for the domain you think it is.
  • After the RBS: start codons and stop codons. Do the stop codons seem far enough away from the start codon to encode a reasonably sized protein?
  • If you have multiple potential RBS sites and reasonable stop codons, you can then iteratively look for more promoters and operators between a stop codon and the next potential RBS.
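The start/stop codon step above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: `find_orfs` and `has_rbs_upstream` are hypothetical helper names, only the forward strand is scanned, the RBS check looks for the Shine-Dalgarno consensus `AGGAGG`, and the 20-base upstream window and the minimum length are arbitrary choices rather than tuned values.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end) index pairs for in-frame ATG...stop stretches.

    Only the forward strand is scanned; end is the index just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until a stop codon or the end of the sequence.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                found_stop = j + 3 <= len(seq)
                if found_stop and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                    i = j + 3  # continue searching after this reading frame
                    continue
            i += 3
    return orfs


def has_rbs_upstream(seq, start, motif="AGGAGG", window=20):
    """Check for a candidate RBS motif in the bases just before a start codon."""
    upstream = seq[max(0, start - window):start].upper()
    return motif in upstream


# Tiny made-up sequence: RBS-like motif, spacer, then ATG + 5 codons + stop.
demo = "CC" + "AGGAGG" + "ACTGAC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
for start, end in find_orfs(demo, min_codons=3):
    print(start, end, has_rbs_upstream(demo, start))
```

A real analysis would also scan the reverse complement and use a much larger minimum length, since short ATG-to-stop stretches show up by chance in any long random sequence.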

2. Be able to explain why you chose your sequencing method for HW #2 and what other options you considered. Be able to explain how you would synthesize a particular piece of DNA - is it all in one piece, or are you assembling several synthesized parts together? What editing methods did you consider, and how would you confirm your DNA editing approach worked? Basically, be able to discuss your answer to this question.

I tried to justify this in the text of last week’s homework.

3. What published paper using Opentrons are you analyzing for HW #3? In discussing with your peers, have researchers generally approached this tool for similar uses, or for vastly different fields?

See above.

Week 4 HW: Protein Design Part I

Part A: Conceptual Questions

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
  3. Why are there only 20 natural amino acids?
  4. Can you make other non-natural amino acids? Design some new amino acids.
  5. Where did amino acids come from before enzymes that make them, and before life started?
  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
  7. Can you discover additional helices in proteins?
  8. Why are most molecular helices right-handed?
  9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?
  10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
  11. Design a β-sheet motif that forms a well-ordered structure.

Okay, I am going to take a first pass through here just going off the lecture, Wikipedia, and background knowledge I already have, and then go back and try with AI assistance for the ones I have no answer for.

  1. A dalton is another name for the atomic mass unit, roughly the mass of one hydrogen atom. $6\times10^{23}$ daltons is roughly $1g$ of mass, so at ~100 daltons per amino acid, $1g$ of amino acids is roughly $6\times10^{21}$ amino acid molecules, and 500g of meat should contain at most $3\times10^{24}$ amino acid molecules. In reality, a good fraction of the mass of meat is water, fat, and other non-protein material, so the number will be less than that, probably between 10-30% of the max depending on the meat involved, which gives a range of $3\times10^{23}$ to $1\times10^{24}$ molecules.
  2. The state of being a cow is a complex relationship between a cow’s cells. Being a cow cell is a complex relationship between the DNA, lipids, and proteins in that cell as well as the cell’s history and its relationship to other cow cells. In other words, being a cow is a delicate state. Eating a cow (especially if we cook it first) destroys the cells, relationships, and molecules, breaking them down into component parts at a molecular level. In particular, protein is broken down into its component amino acids, so all the relationships, structure, and patterns are lost, and the parts are re-assembled into the patterns and structure of whatever is eating the cow by the machinery of that organism and its cells (plus the injection of energy). You can acquire the molecular shadow of what you eat in the form of the isotope concentration of your components, e.g. if you eat a lot of corn (or things that eat corn) you will have the isotope ratio of a C4 plant like corn.
  3. I am not sure there is a strict answer to this, because there seems to be some historical contingency here. In fact, what I have seen in the past indicates that different people cite different numbers of “natural amino acids”, e.g. Wikipedia says 22 amino acids instead of 20. Given that, I think the best answer is something like this: early life must have existed in an environment where the current naturally occurring amino acids were being manufactured by some abiogenic process and ended up being incorporated into the structure of early proto-life. In addition to requiring that the amino acids were created by some abiogenic process in decent amounts, the current amino acids of life are also the ones that life figured out how to synthesize internally. I can imagine there may have been a commonly occurring abiogenic amino acid that some proto-life started using, but that proto-life didn’t figure out how to self-synthesize. As the abiogenic source of a hard-to-biologically-synthesize amino acid waned, proto-life that used that amino acid would have been very strongly selected against, so even a common abiogenic amino acid may not show up as a current “natural” amino acid.
  4. An amino acid is an organic compound that has both a carboxyl and an amine group. This means that to design an amino acid you can attach those groups to any organic/carbon backbone. For example, you could take octane (eight carbon atoms in a chain with hydrogen atoms) and attach a carboxyl and an amine group to the last atom in the chain to make an amino acid (octine?). It is hard to tell if this is a “new” amino acid without a web/AI search, because according to Wikipedia there are 500+ amino acids just in nature.
  5. There are abiogenic processes that naturally create amino acids. One famous experiment put methane, nitrogen, etc in a jar and passed electricity through and ended up creating many organic compounds including amino acids. We have also detected amino acids and other organic compounds on remote comets/asteroids presumably created abiogenically by heating/cooling/light energy impacting on the frozen components like methane ice.
  6. I have no idea, but seems like D should be right-handed?
  7. I am not sure what this question is asking? Additional relative to what?
  8. Again, I am not sure what this question is asking? I assume they mean most biological molecular helices, because I don’t know that molecular helices have a preferred direction in general. If it is biological, I guess this is because some organic compounds are chiral and biology (because of historical accident?) selects/builds only one chirality of that organic compound. The chirality then impacts the shape formed when repeating units bind together leading to helices with a certain direction also.
  9. Not sure, but if I had to guess it would be hydrogen bonding between the parts of amino acids that are perpendicular to the direction of the sheet?
  10. No idea.
  11. No idea.

OK, now I will check/redo using outside research and AI, especially for the last questions.
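As a quick check on the arithmetic in answer 1, the estimate can be reproduced in a few lines of Python. This is a minimal sketch: `amino_acid_count` is a hypothetical helper name, the ~100 dalton average comes from the question, and the 10-30% protein fractions are the rough guesses from the answer rather than measured values.

```python
AVOGADRO = 6.022e23        # daltons per gram (equivalently, molecules per mole)
AMINO_ACID_DALTONS = 100   # average amino acid mass given in the question
MEAT_GRAMS = 500


def amino_acid_count(mass_g, protein_fraction=1.0):
    """Estimate the number of amino acid molecules in a given mass of meat."""
    protein_daltons = mass_g * protein_fraction * AVOGADRO
    return protein_daltons / AMINO_ACID_DALTONS


# Upper bound (all of the mass is amino acids) and the rough 10-30% protein range.
for fraction in (1.0, 0.1, 0.3):
    print(f"{fraction:.0%} protein: {amino_acid_count(MEAT_GRAMS, fraction):.1e} molecules")
```

With these inputs the upper bound comes out near $3\times10^{24}$ and the 10-30% range spans roughly $3\times10^{23}$ to $9\times10^{23}$, consistent with the estimate above.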

Part B: Protein Analysis and Visualization

Part C: Using ML-Based Protein Design Tools

C1: Protein Language Modeling

C2: Protein Folding

C3: Protein Generation

Part D: Group Brainstorm on Bacteriophage Engineering

Part E: William and Mary Node Questions

  1. Be prepared to answer/discuss all 11 questions posed by Dr. Zhang. We will choose the most interesting ones to discuss in class.
  2. Be prepared to discuss the phage literature reading.
  3. A discussion of the phage literature will lead into our main discussion point: please be prepared to address and discuss the “big picture” question: how to apply these protein analysis tools to engineer a better bacteriophage. Please develop specific ideas for discussion.
  4. Time permitting - we will review final projects.