Part 0: Basics of Gel Electrophoresis I watched the lecture, recitation, and read the lab. Essentially, we use the negative charge of DNA to pull DNA fragments towards a positive anode in a porous agarose gel. Larger DNA fragments move slower in the agarose gel.
Opentrons Artwork My artwork is here: https://rcdonovan.com/?id=vmns94wqt45wpqc
Subsections of Homework
Week 1 HW: Principles and Practices
Class assignment
1. First, describe a biological engineering application or tool you want to develop and why.
I’m heavily inspired by Professor Jacobson’s call for a “bio-FPGA” tool, as well as his lecture about cellular automata. I’d like to develop a bio-FPGA that can be programmed to grow into arbitrary 2D patterns on a petri dish, using the machine learning technique mentioned in the lecture to reverse learn the cellular automata rules for growing a specific pattern. The learned CA rules can be encoded by genetically programming the bio-FPGA then using bacteria with the genes to grow an actual cell culture into the pattern, like the butterfly wing letter patterns in the lecture. If this is feasible, 3D patterns would be the next step, and one might even imagine a wild future of programmable plants that grow into the shapes of houses and furniture.
2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
My primary goal is to protect health and safety. The sub-goals are:
Prevent the development of biological weapons.
Prevent outbreak of harmful bacteria.
Maximize productive use-cases.
A bio-FPGA has the possibility to be used for great benefit with many applications, but could also be abused.
3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
Action 1: Engineer a genetic "off switch" into the bio-FPGA to stop proliferation at any time.
Purpose: We would build a genetic off switch to immediately turn off all genes of the bio-FPGA.
Design: This would involve researchers and industry as actors to build this prior to releasing the bio-FPGA. The government could also regulate by requiring all bio-FPGA and adjacent tools have such a fail-safe.
Assumptions: This assumes that such a technological solution could be reliably engineered and triggered.
Risks: The technical fail-safe might not work. Even if it does work, it could cause problems, because it could be abused to disable legitimate use cases.
Action 2: Regulate against use for biological warfare.
Purpose: Although there are already regulations in place, we could craft regulation to specifically account for bio-FPGA technology.
Design: This would require the government to understand the technology and its dangers, and to pass appropriate laws preventing malicious use of bio-FPGAs.
Assumptions: This assumes that lawmakers would be motivated to pass regulation and that the public would be accepting of such regulation. It also assumes that lawmakers are able to craft good laws or adapt accordingly.
Risks: The risk is that excessive regulation could stifle adoption and research for beneficial use cases. Another risk is that lawmakers don’t understand the science and pass inappropriate laws.
Action 3: Host a conference for researchers and industry to share new developments.
Purpose: To share beneficial use cases, foster collaboration, and disseminate research learnings.
Design: This requires coordinating and organizing the research and industry community, as well as raising funds to host a venue.
Assumptions: I assume that researchers would be interested in attending and discussing.
Risks: The conference could be used to develop malicious use-cases, or ethics could be overlooked in favor of scientific progress at all costs.
4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.
| Does the action: | Action 1 | Action 2 | Action 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 3 | 1 | 2 |
| • By helping respond | 1 | 3 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 3 | 1 | 2 |
| • By helping respond | 1 | 3 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 3 | 1 | 2 |
| • By helping respond | 1 | 3 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility? | 2 | 1 | 1 |
| • Not impede research | 1 | 3 | 1 |
| • Promote constructive applications | 3 | 1 | 1 |
Week 2 lecture prep
Homework Questions from Professor Jacobson
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
According to the slides, the error rate of polymerase is 1:10^6, or 1 in 1 million (online sources suggest it can be even worse depending on the polymerase). The length of the human genome is 3.2 Gbp (3.2 * 10^9), so at that rate there would be ~3.2 * 10^3 (3,200) errors in the human genome per copy. That would be a lot of errors, but there are additional pathways that perform error correction, such as the MutS mismatch repair system.
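The estimate above is a single multiplication, sketched here so the inputs are easy to change. The error rate and genome length are the values quoted in the answer; the function name is just an illustrative choice.

```python
# Back-of-the-envelope estimate of raw polymerase errors per genome copy.
ERROR_RATE = 1e-6          # errors per base copied (1 in 10^6, from the slides)
GENOME_LENGTH_BP = 3.2e9   # approximate human genome length in base pairs


def expected_errors(error_rate: float, length_bp: float) -> float:
    """Expected number of miscopied bases when copying length_bp bases once."""
    return error_rate * length_bp


errors_per_copy = expected_errors(ERROR_RATE, GENOME_LENGTH_BP)
print(f"Expected raw errors per genome copy: {errors_per_copy:,.0f}")
```

This only counts raw polymerase errors before any repair pathway acts.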
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
The slides mention the average human protein is 1036 base pairs. That is 345 codons, so 345 amino acids.
Amino acids can have multiple corresponding codons. With 4 possible nucleotides at each of 3 positions there are 4^3 = 64 codons for 20 amino acids, or about 3 possible codons per amino acid.
That gives an estimate of 3^345 ways to code for a given protein, a huge number.
However, even though synonymous codons code for the same amino acid, the choice of bases can change the secondary structure of the mRNA, which affects things like RNA cleavage.
Another view is that there are 4^1035 possible nucleotide sequences of that length, and a random one is very unlikely to code for the specific protein, even with synonymous codons, given the sheer size of the possibility space.
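To get a feel for how large these numbers are, here is a small sketch that counts their decimal digits. It also replaces the "about 3 codons per amino acid" approximation with the actual per-residue degeneracy of the standard genetic code. The toy peptide `MKLV` is a made-up example, not part of the homework.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def synonymous_encodings(protein: str) -> int:
    """Number of distinct coding sequences that translate to `protein`."""
    return math.prod(DEGENERACY[aa] for aa in protein)


def digits(n: int) -> int:
    """Number of decimal digits in a positive integer."""
    return len(str(n))


rough_estimate = 3 ** 345   # ~3 codons per residue, 345 residues
all_sequences = 4 ** 1035   # every possible 1035-nt sequence

print("3^345 has", digits(rough_estimate), "digits")
print("4^1035 has", digits(all_sequences), "digits")
print("Encodings of the toy peptide MKLV:", synonymous_encodings("MKLV"))
```

For a real protein, passing its amino acid sequence to `synonymous_encodings` gives the exact count instead of the 3^345 estimate.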
Homework Questions from Dr. LeProust
What’s the most commonly used method for oligo synthesis currently?
The phosphoramidite method.
Why is it difficult to make oligos longer than 200nt via direct synthesis?
The coupling step cannot reach perfect efficiency. It is repeated once per cycle, and each additional base requires another cycle, so the losses compound. This means that longer oligos become dramatically harder to make, even with extremely high efficiencies.
Why can’t you make a 2000bp gene via direct oligo synthesis?
The above answer explains why we can’t synthesize longer oligos. At 2000bp, the probabilities become near impossible even at the highest efficiencies.
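The compounding loss can be sketched numerically. This assumes a simple model in which every coupling succeeds independently with the same efficiency, so the full-length fraction is efficiency^(length - 1). The efficiencies below are illustrative values, not measured ones.

```python
def full_length_yield(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of strands that are full length after length_nt - 1 couplings."""
    return coupling_efficiency ** (length_nt - 1)


for efficiency in (0.99, 0.995, 0.999):   # illustrative per-step efficiencies
    for length in (20, 200, 2000):
        fraction = full_length_yield(efficiency, length)
        print(f"efficiency={efficiency:.3f}  length={length:>4} nt  "
              f"full-length fraction={fraction:.2e}")
```

Under this model, even at 99.9% per-step efficiency only a small fraction of 2000-nt strands would be full length, which is consistent with the answers above.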
Homework Question from George Church
Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.
I’m answering question 1.
[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
Arginine (R), sometimes considered conditionally essential.
The "Lysine Contingency" is apparently from Jurassic Park: a plot element in which the dinosaurs were genetically modified so that they could not produce lysine and would die off without human-provided lysine supplements.
However, lysine is one of the 10 essential amino acids, so animals cannot produce it anyway, making this a scientifically dubious plot point (the genetic modification would have done nothing).
[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?
[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:
I watched the lecture, recitation, and read the lab. Essentially, we use the negative charge of DNA to pull DNA fragments towards a positive anode in a porous agarose gel. Larger DNA fragments move slower in the agarose gel.
Part 1: Benchling & In-silico Gel Art
I spent some time playing around with Ronan’s gel art site to make a pattern (below on the left). I noticed that some of the restriction enzymes in the gel art tool weren’t on the HTGAA enzyme list, so I didn’t use them.
I think it looks kind of like Darth Vader.
Then, I added the Lambda DNA to Benchling. I made a custom enzyme list with EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. Then, I added the restriction enzymes from the gel art tool to make a virtual digest (below on the right). I had some difficulty ordering the digests properly, so I saved them with the names and ordered them by dragging the tabs afterwards.
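What a virtual digest computes can be sketched in a few lines: find each recognition site on a linear sequence, cut there, and list the fragment lengths. This is a minimal illustration and not how Benchling works internally. It covers only three of the enzymes, and `toy_dna` is a made-up sequence rather than Lambda DNA.

```python
# Recognition site and cut offset within the site (top strand) for three enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
}


def cut_positions(seq, enzyme_names):
    """Sorted cut coordinates for the chosen enzymes on a linear sequence."""
    positions = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            positions.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(positions)


def digest(seq, enzyme_names):
    """Fragment lengths from a linear digest, largest first (top of the gel)."""
    cuts = [0] + cut_positions(seq, enzyme_names) + [len(seq)]
    return sorted((b - a for a, b in zip(cuts, cuts[1:])), reverse=True)


# Hypothetical 84-nt sequence with one EcoRI site and one BamHI site.
toy_dna = "ATGC" * 5 + "GAATTC" + "ATGC" * 10 + "GGATCC" + "ATGC" * 3

print("EcoRI:", digest(toy_dna, ["EcoRI"]))
print("EcoRI + BamHI:", digest(toy_dna, ["EcoRI", "BamHI"]))
```

Sorting largest first mirrors the gel, where larger fragments travel less far from the well.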
Part 2: Gel Art - Restriction Digests and Gel Electrophoresis (Wet Lab)
N/A: This is optional for committed listeners and I didn’t have access to a wet lab this week.
Part 3: DNA Design Challenge
3.1. Choose your protein
The protein I chose is Miraculin, which is from the miracle berry and famous for temporarily causing sour things to taste sweet. I picked this because I have tried a miracle berry tasting before and it was an interesting experience.
I used tblastn (Translated BLAST) to get the nucleotide sequence that corresponds with Miraculin. This found two nucleotide sequences in a database: AB512278.1 and D38598.1. One appears to be the mRNA rather than the genes.
I used VectorBuilder’s codon optimization tool since Twist’s was down for maintenance. I optimized for E. coli, so the protein could be mass produced in its “cellular factory”. It gave the following:
Since I chose E. coli, we can order the gene with a promoter in a plasmid, transform E. coli with the plasmid by heat shock (a cell-dependent method), then culture the cells to produce large amounts of this protein.
This uses E. coli’s own transcription and translation machinery to express the protein from the plasmid.
Part 4: Prepare a Twist DNA Synthesis Order
I followed the steps to build the provided sfGFP sequence in Benchling. Here is my Benchling project.
Here is the FASTA file of the expression cassette:
(i) What DNA would you want to sequence (e.g., read) and why?
I’d like to sequence DNA of the human microbiome. There’s been recent research about how beneficial flora of the gut and skin microbiome contribute to our health, and there’s already a significant effort to sequence our microbiome in the Human Microbiome Project.
I would also be interested in the widespread sequencing and cataloguing of viruses that make up the common cold. I think it could be useful to detect the geographic spread of these viruses and how they mutate over time, to potentially contribute to a cure.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
I would use Sanger Sequencing, since it’s a straightforward and well-tested technique, and I understand it the best.
Is your method first-, second- or third-generation or other? How so?
Sanger Sequencing is a first-generation technique. It’s the earliest and most classic form of sequencing.
What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
The inputs are the DNA sample, regular nucleotides (d*), chain-terminating nucleotides (dd*), a primer (like in PCR), and DNA polymerase. You prepare the input by amplifying the sample with PCR so there is plenty of DNA.
What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
You thermocycle the amplified sample with the nucleotides, primers, and polymerases to build many DNA fragments. These DNA fragments will be different lengths because probabilistically each fragment will incorporate a chain-terminating nucleotide which stops polymerization. Finally, the fragments are run through electrophoresis and imaged one base pair at a time to get the sequence.
What is the output of your chosen sequencing technology?
The result is the electrophoresis imaging data, which can be processed to determine the most likely base pair at each position.
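The chain-termination idea can be illustrated with a toy simulation. It is a simplified sketch, not a model of real chemistry or base-calling software. It treats `template` as the strand being synthesized, gives every position the same termination probability, and assumes the terminal base of each fragment can be read perfectly. The sequence and parameter values are made up.

```python
import random


def sanger_fragments(template, n_reads=2000, p_terminate=0.1, seed=0):
    """Simulate chain-terminated copies of `template`.

    Each copy is extended base by base. At every step a chain-terminating
    nucleotide is incorporated with probability p_terminate, ending the chain.
    Returns (length, terminal_base) pairs for the terminated fragments.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_reads):
        for i, base in enumerate(template):
            if rng.random() < p_terminate:
                fragments.append((i + 1, base))
                break
    return fragments


def call_bases(fragments):
    """Order fragments by length (as electrophoresis does) and read the labels."""
    base_at_length = {}
    for length, base in fragments:
        base_at_length[length] = base
    return "".join(base_at_length[k] for k in sorted(base_at_length))


template = "ATGGCCATTGTAATGGGCCG"  # hypothetical 20-nt sequence
fragments = sanger_fragments(template)
print("Recovered:", call_bases(fragments))
```

With enough reads, every length from 1 to the template length is represented, so reading the terminal base of each fragment in order of length recovers the sequence.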
One limitation of the technique is that you need a pure sample of DNA, so it may be inefficient for the volumes of organisms we’d want to sequence.
5.2 DNA Write
(i) What DNA would you want to synthesize (e.g., write) and why?
I would want to synthesize DNA origami, as art! I’m curious what it would take to make the smallest art pieces.
(ii) What technology or technologies would you use to perform this DNA synthesis and why?
I would use the phosphoramidite method.
What are the essential steps of your chosen sequencing [sic; synthesis?] methods?
Deprotection, coupling, capping, oxidation, and repeat.
What are the limitations of your sequencing [sic; synthesis?] method (if any) in terms of speed, accuracy, scalability?
The limitation, as discussed last homework, is the length of DNA oligos that can be synthesized with this technique. However, this isn’t a problem for DNA origami, which doesn’t need full length DNA.
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I would edit human DNA, for example to cure hearing loss and tinnitus as a gene therapy. Hearing loss is permanent and affects 5% of the world population. Noise exposure from work and the environment also contributes to increased rates of hearing loss. Birds, unlike humans, can regenerate inner ear cells, and researchers have demonstrated regrowth in cell cultures, so there is a theoretical target for gene therapy. There have already been successful gene therapy treatments for deafness in children due to congenital disorders.
(ii) What technology or technologies would you use to perform these DNA edits and why?
I would use CRISPR-Cas9, since it is the most popular technique today.
How does your technology of choice edit DNA? What are the essential steps?
Cas9 cuts the DNA at the site specified by the guide RNA, then the edit is introduced via homology-directed repair using a template.
What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
The guide RNA for Cas9 needs to be designed, as well as the template DNA (for knock-ins).
What are the limitations of your editing methods (if any) in terms of efficiency or precision?
Homology-directed repair is not entirely efficient, since the ends that Cas9 breaks could rejoin without the knock-in sequence. Performing CRISPR in humans (rather than bacteria or cell cultures) requires sophisticated delivery methods to reach the target cells or tissues.
I used Ronan’s tool to make this. I uploaded an image of tomatoes but it didn’t render well, so I modified it significantly by hand with the editor.
Then, I attended the Saturday session on Zoom with Ronan, Michelle, and Ice at Ginkgo Bioworks. Here’s the end result:
Post-Lab Questions
Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
I read Assembly of small silica nanoparticles using lipid-tethered DNA “bonds”. This paper demonstrated a novel DNA-based assembly by embedding silica nanoparticles in a lipid bilayer, embedding the cholesterol end of a DNA-cholesterol molecule within the bilayer, then assembling the nanoparticles with complementary sticky-end "bridge" DNA. This is best explained with the image from the paper below:
They used Opentrons to rapidly iterate on and evaluate different concentrations of DNA-Chol, NaCl, and bridge DNA in the assembly mixture. These were then screened with SAXS to determine the structural qualities of each sample.
Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.
One project idea I have is genetically modifying trees for use in urban areas. I’d like to use automation tools to permute different genetic combinations. I envision custom modules that could germinate and monitor an array of seeds for different qualities.
For example, I could create a module with grow lights and watering capabilities that cares for the different seed variants, plus cameras to compare growth rates.
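As a rough sketch of the bookkeeping such a module would need, the snippet below enumerates every combination of a few genetic parts and assigns each one (with replicates) to a slot in a seed tray. The part names, replicate count, and tray layout are hypothetical placeholders, not real constructs or hardware.

```python
from itertools import product

# Hypothetical factors, purely for illustration.
promoters = ["promoter_A", "promoter_B"]
genes = ["gene_X", "gene_Y", "gene_Z"]


def plan_tray(promoters, genes, replicates=2, columns=4):
    """Assign every promoter/gene combination (with replicates) to a tray slot."""
    variants = [
        (promoter, gene)
        for promoter, gene in product(promoters, genes)
        for _ in range(replicates)
    ]
    plan = []
    for index, (promoter, gene) in enumerate(variants):
        row, col = divmod(index, columns)
        slot = f"{chr(ord('A') + row)}{col + 1}"  # e.g. A1, A2, ... like a well plate
        plan.append({"slot": slot, "promoter": promoter, "gene": gene})
    return plan


tray_plan = plan_tray(promoters, genes)
for entry in tray_plan:
    print(entry["slot"], entry["promoter"], entry["gene"])
```

The same plan could later be used to label camera images, so growth measurements can be matched back to each variant.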
Week 4 HW: Protein Design, Part I
Part A: Conceptual questions
Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)
How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
A dalton is 1.66053906892(52) * 10^-24 g, so 500 g = 500 / (1.66053906892 * 10^-24) ≈ 3.011e+26 daltons.
If an amino acid averages 100 daltons, then 3e+26 daltons is about 3e+24 amino acids, treating the meat as pure amino acids. (Equivalently, 500 g at 100 g/mol is 5 mol.)
~3,000,000,000,000,000,000,000,000 amino acids!
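The arithmetic can be checked with a short script. It takes 1 Da as about 1.6605e-24 g (that is, 1.6605e-27 kg) and, like the answer, treats the whole 500 g as amino acids averaging 100 Da each, which is a simplification since meat is mostly water. The variable names are illustrative.

```python
DALTON_IN_GRAMS = 1.66053906892e-24   # 1 Da = 1.66053906892e-27 kg
AVOGADRO = 6.02214076e23              # molecules per mole

mass_g = 500.0            # mass of the piece of meat
avg_residue_da = 100.0    # average amino acid mass from the question

total_daltons = mass_g / DALTON_IN_GRAMS
n_amino_acids = total_daltons / avg_residue_da
moles = n_amino_acids / AVOGADRO

print(f"Total mass: {total_daltons:.3e} Da")
print(f"Amino acids: {n_amino_acids:.3e}")
print(f"Equivalent to about {moles:.2f} mol")
```

The mole calculation is a cross-check: 500 g divided by 100 g/mol should give about 5 mol.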
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
We digest the meat and it does not alter our genetics.
Why are there only 20 natural amino acids?
Codon redundancy allows for both more efficient genetic coding and error correction. 20 appears to be enough (as evidenced by life).
Can you make other non-natural amino acids? Design some new amino acids.
Yes, you can make non-natural amino acids (e.g. non-proteinogenic amino acids). Amino acids require an amine, a carboxyl, a central carbon, and a side-chain. You could design one by using a unique side chain that doesn’t exist in the natural amino acids.
Where did amino acids come from before enzymes that make them, and before life started?
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
I would expect it to turn the other way, since D-amino acids are mirror images of the natural L-amino acids. Normal alpha helices are right-handed, so I would expect a left-handed helix.
Can you discover additional helices in proteins?
Skipped (1/2).
Why are most molecular helices right-handed?
Skipped (2/2).
Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?
This is because beta sheets (which form a "pleated sheet") can be hydrophilic on one side and hydrophobic on the other, so the hydrophobic faces tend to pack together away from water.
Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
Shuguang mentions that β-sheets can aggregate to cause disease because they can’t be easily untangled once formed.
Yes, β-sheet-rich materials can be very strong: β-sheets give spider silk its unique properties.
Design a β-sheet motif that forms a well-ordered structure.
We can use the rule from Shuguang’s slides: alternate hydrophobic and hydrophilic residues, so that every other amino acid is hydrophobic.
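The alternation rule can be turned into a small generator and checker. The grouping of residues into hydrophobic and hydrophilic sets is a common textbook simplification and an assumption here. The example call produces the (RADA)4 repeat, the pattern of the RADA16 self-assembling peptide described by Zhang's group.

```python
from itertools import cycle

# Simplified residue classes (an assumption; real hydrophobicity is a scale).
HYDROPHOBIC = set("AVILMFWY")
HYDROPHILIC = set("RKDESTNQH")


def alternating_motif(hydrophobic: str, hydrophilic: str, length: int) -> str:
    """Build a peptide that alternates hydrophilic and hydrophobic residues."""
    pho, phi = cycle(hydrophobic), cycle(hydrophilic)
    return "".join(next(phi) if i % 2 == 0 else next(pho) for i in range(length))


def is_alternating(peptide: str) -> bool:
    """True if hydrophobic and non-hydrophobic residues strictly alternate."""
    kinds = [aa in HYDROPHOBIC for aa in peptide]
    return all(a != b for a, b in zip(kinds, kinds[1:]))


motif = alternating_motif("A", "RD", 16)
print(motif, is_alternating(motif))
```

Alternating positive (R) and negative (D) residues on the hydrophilic face is one design choice. Other residue choices that keep the alternation would also satisfy the rule.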
Part B: Protein Analysis and Visualization
I’m picking the same protein as in week 1: Miraculin, a taste altering protein.
Amino acid sequence:
How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
>1ZKR_1|Chains A, B|Major allergen I polypeptide, fused chain 1, chain 2|Felis catus (9685)
MEICPAVKRDVDLFLTGTPDEYVEQVAQYKALPVVLENARILKNCVDAKMTEEDKENALSLLDKIYTSPLCVKMAETCPIFYDVFFAVANGNELLLDLSLTKVNATEPERTAMKKIQDCYVENGLISRVLDGLVMTTISSSKDCMGEHHHHHH
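The length and amino acid frequencies can also be computed locally with the standard library, as a stand-in for the Colab notebook (whose exact code is not reproduced here). The sequence is copied from the FASTA record above.

```python
from collections import Counter

# Sequence from the FASTA record above (1ZKR_1).
sequence = (
    "MEICPAVKRDVDLFLTGTPDEYVEQVAQYKALPVVLENARILKNCVDAKMTEEDKENALSLLDKIYTSPLCVKMAETCPIFYDVFFAVANGNELLLDLSLTKVNATEPERTAMKKIQDCYVENGLISRVLDGLVMTTISSSKDCMGEHHHHHH"
)

counts = Counter(sequence)

print("Length:", len(sequence))
for amino_acid, count in counts.most_common(5):
    share = 100 * count / len(sequence)
    print(f"{amino_acid}: {count} ({share:.1f}%)")
```

`most_common(1)` gives the single most frequent residue if only that answer is needed.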
Here’s the mutation scan heatmap. It looks like the amino acids at the end of the protein are most sensitive to mutation. We can also see that “W” tends to be a bad mutation anywhere in the protein.
And here’s the t-SNE plot. It looks like it’s not very closely related to the other proteins, which would make sense since it’s a unique allergen (otherwise we might expect cat allergies to correlate with lots of other allergies).
Here’s the predicted fold, colored based on confidence. The red at the end means lower confidence.
Here it is compared to the actual structure. On the left in cyan is the experimental structure, and on the right is the predicted fold. We can see that the main structure looks similar but the end is totally wrong, in line with the lower confidence. However, this is probably due to the His-tag at the end of the protein:
Note that in PDB data, there were two protein molecules included in the experimental data, apparently as an asymmetric unit, so I removed one for better comparison.