Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Class Assignment 1. First, describe a biological engineering application or tool you want to develop and why. There is currently an urgent research focus on the biodegradation of plastics, due to the extremely long life cycle of synthetic polymers. Prior work has focused on a mix of exploring bacterial and microbial processes (e.g. anaerobic digestion) to break down plastics, and developing compositions that can be commercially compostable (e.g. for single-use plastics). My personal interest is in fiber arts and sustainability, so I’d like to tackle this problem from a textile perspective. Fast fashion has exacerbated the volume of cheap, low-quality clothes produced every day. These clothes are often made with synthetic fibers and not for long-term use (although the two are not necessarily interchangeable). I believe it’s incredibly important to find a way to biodegrade polyester, one of the most common synthetic polymers in fast fashion clothing.

  • Week 2 HW: DNA Read, Write, & Edit

    Part 1: Benchling & In-silico Gel Art Had an initial mess-up where I tried to “speedrun” the process and ended up with a ladder packed with the effects of multiple restriction enzymes. Finally got success with all of the listed enzymes, separately.

  • Week 3 HW: Lab Automation

    Python Script for Opentrons Artwork Opentrons Art I tried to play around with math functions to create a design, like the Mathematical Heart sample. I drew up a cute fox in Desmos graphing calculator using the following functions, making sure to scale them to the 40 mm limit. Transferring that to Colab was a bit more difficult, and I had to play around with the functions, ranges, and dispense volume to find something that looked good.

  • Week 4 HW: Protein Design Part I

    Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang: How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Why do humans eat beef but do not become a cow, eat fish but do not become fish? Why are there only 20 natural amino acids? The natural amino acids are specified by codons, each made of three nucleotides (each of which can be adenine, uracil, guanine, or cytosine). This gives 4 x 4 x 4 = 64 total codons, but redundancy among codons means they map to only 20 unique amino acids (plus stop signals).

Subsections of Homework

Week 1 HW: Principles and Practices

Class Assignment

1. First, describe a biological engineering application or tool you want to develop and why.

There is currently an urgent research focus on the biodegradation of plastics, due to the extremely long life cycle of synthetic polymers. Prior work has focused on a mix of exploring bacterial and microbial processes (e.g. anaerobic digestion) to break down plastics, and developing compositions that can be commercially compostable (e.g. for single-use plastics). My personal interest is in fiber arts and sustainability, so I’d like to tackle this problem from a textile perspective. Fast fashion has exacerbated the volume of cheap, low-quality clothes produced every day. These clothes are often made with synthetic fibers and not for long-term use (although the two are not necessarily interchangeable). I believe it’s incredibly important to find a way to biodegrade polyester, one of the most common synthetic polymers in fast fashion clothing.

  • Ensuring safety:
    • What material/process is being used for plastic degradation? Any byproducts?
    • Can we guarantee the safety of workers throughout the process?
    • Can we guarantee the safety of the surrounding community?
  • Upholding equity:
    • Who/what affected areas will benefit the most from this application? More specifically, how can we prioritize places that need the most help (e.g. fast fashion landfills in Chile)?
    • Textile and fast fashion industries in particular have historically exploited low-cost labor. How can we counter that in our mission to combat fast fashion production?
  • Promoting a circular economy (avoiding greenwashing):
    • How can we ensure that this solution is actually helpful? How do we avoid being just another step before the landfill?
    • What is our end product and its use? What are the byproducts?

3. Next, describe at least three different potential governance “actions”.

Action 1: Standardization of Process

  • Purpose: A standard process, with clearly understood materials (end products, byproducts) and equipment, will better communicate the safety and efficacy of industrial facilities to the public. Such standards exist for some products, like compostable utensils, but not necessarily for synthetic clothing.
  • Design: Research scientists and regulatory safety boards must work in conjunction to develop and validate the process.
  • Assumptions: That the biodegradation process can be strictly controlled given the wide range of material types in clothing, and that biodegradation can scale up the same way industrial composting does.
  • Risks of Failure & “Success”: Restricting the process might lower the success rate by making the process less efficient.

Action 2: Polyester Tracking for Success Metrics

  • Purpose: Keeping tabs on where the material comes from and where it goes, upstream and downstream, to evaluate how successful we are. Where are we getting polyester from? Where are our outputs going?
  • Design: We need the cooperation of all parties for data transfer.
  • Assumptions: That this data is currently already collected, and if not, that it’d be easy to start collecting. E.g. do most clothing landfills track the percentage breakdown of natural vs synthetic fibers?
  • Risks of Failure & “Success”: Our process is less effective than it appears. E.g. maybe we are primarily biodegrading high-quality athleticwear. Still good, but not the impact we want.

Action 3: Community Awareness

  • Purpose: Education programs about clothing and material composition can encourage more sustainable practices by the public, as well as more engagement with our facility.
  • Design: Planning events, town hall discussions, etc., as well as accepting donations of clothing.
  • Assumptions: That community support hinges upon understanding of what we’re doing, and the negative impacts on the community are otherwise negligible.
  • Risks of Failure & “Success”: Education programs are ineffective if the work needed is not in the hands of the community. E.g. if we cannot accept mixed-material clothing, the community cannot necessarily separate material on our behalf. That lies with clothing production facilities.

Action 4: Regulation on Composite Material Production

  • Purpose: A big bottleneck in clothing recycling is the mixing of different materials (e.g. polyester fabric with cotton stitching, metal zippers, PVC coating). High-level regulations could target the production of such clothing.
  • Design: Regulation through fines/taxes in local textile facilities. May be harder to regulate overseas production.
  • Assumptions: That the volume of clothes produced locally is a significant enough % of clothes we take in. That these regulations are effective against overseas textile production.
  • Risks of Failure & “Success”: We have to turn away a majority of clothes and are only able to focus on a niche in synthetic fibers. Could also end up constricting the companies that choose to produce locally, leading to failure.

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

(Image: Score table of governance actions)

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why.

I’d focus on Process Standardization, as safety is the absolute first priority. Following that, between Success Metrics and Community Awareness, both have potential to contribute to a circular economy, but I’d like to prioritize Success Metrics for its potential to better target impacted areas down the line. So I’d work on a more technical level to develop more effective processes and data collection (which would likely involve academic institutions/environment-focused agencies).

6. Ethical Concerns

I’m wary of how effective we’d be in a global setting, especially since my perceived impact depends on how well we can affect overseas institutions, where I believe most fast fashion waste is made and accumulated.

Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The error rate of polymerase is about 1:$10^{6}$, i.e. roughly 1 out of every 1,000,000 bases copied might be wrong. Meanwhile, the human genome spans billions of base pairs: about 6.3 gigabase pairs for a diploid genome (or 3.2 gigabase pairs for a haploid one) [1]. At that rate, copying a single haploid genome would leave about 3,200 errors. However, the polymerase can proofread: it uses its exonuclease activity to excise a mismatched nucleotide (released as a nucleoside monophosphate) [2], essentially allowing the sequence to “backspace” before continuing.

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

  1. The coding sequence of an average human protein is about 1036 bp.
  2. Codons are sets of 3 bases, so this is about 345 amino acids.
  3. However, the genetic code is redundant: most amino acids can be encoded by several codons, with 6 being the highest number (e.g. arginine). We can roughly estimate the number of variations per amino acid as 64 codons / 20 amino acids = 3.2 codons per amino acid.
  4. This gives about $3.2^{345}$ possible DNA sequences, which gives my calculator an overflow error (about $10^{174}$).
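
The overflow can be avoided by working in logarithms. Here is a minimal Python sketch of the estimate in the steps above, assuming the same rough figures (a 1036 bp coding sequence and 64/20 = 3.2 codons per amino acid); the function name is only illustrative.

```python
import math


def log10_encodings(num_amino_acids, codons_per_amino_acid):
    """Return log10 of codons_per_amino_acid ** num_amino_acids."""
    return num_amino_acids * math.log10(codons_per_amino_acid)


# Rough figures from the steps above, not measured values.
num_amino_acids = 1036 // 3                      # 345 codons
exponent = log10_encodings(num_amino_acids, 64 / 20)

print(f"about 10^{exponent:.0f} possible DNA sequences")
```

This is only an order-of-magnitude estimate: it treats every amino acid as having the same number of codons, which the real genetic code does not.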

So that’s clearly an excessive amount of variation for a single protein. However, organisms have developed something called “codon usage bias”, or a preference for certain codons that evolved over time. This can be due to the following reasons [3]:

  • Resource use: different tRNAs recognize different codons, so less variation means more efficient tRNA production.
  • Protein folding: different tRNAs for codons can translate at different rates. Codons can be deliberately chosen to have the protein fold at “fast translated” sections while waiting for the “slow translated” sections.
  • Gene expression: certain codons result in stronger gene expression than others. Interestingly, this can work the other way around: codon optimization is a technique that aims to increase protein expression by swapping codons [4].

Homework Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

Oligonucleotides are defined as DNA chains with a length under 200 nucleotides [5]. Oligo synthesis began with solid-phase synthesis, with additional methods (phosphodiester, phosphotriester, phosphite triester, phosphoramidite) developed up until the 1980s. Currently, solid-phase synthesis using the phosphoramidite method is the most common; the process was leveraged to build the first automated DNA synthesizer and has since been optimized for high DNA production volume and thermal control [5].

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Longer chains have reduced theoretical yield, since each additional nucleotide has an additional “elongation cycle efficiency” (think error rate) that stacks up [5]. This is calculated with the equation $\text{theoretical yield} = (\text{elongation cycle efficiency})^{nt}$. Assuming efficiency of 99%,

  • nt = 100 → yield = $0.99^{100}$ = 0.366
  • nt = 200 → yield = $0.99^{200}$ = 0.134
  • nt = 300 → yield = $0.99^{300}$ = 0.049

The phosphoramidite method in particular becomes ineffective beyond 200 nucleotides. As a result, more recent alternatives (e.g. enzymatic synthesis) are being explored as research turns to using longer sequences.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

If we further calculate using the above equation, the yield becomes

  • bp = 2000 → yield = $0.99^{2000} \approx 0.000000002 \approx 0$

or effectively zero. At such a high length, the individual “error rates” compound, leaving no chance for success. Current efforts try to improve the process, i.e. increasing the elongation cycle efficiency, or use workarounds like making batches of shorter segments to link together [5].
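
The yield figures in the last two answers can be reproduced with a few lines of Python. This is a minimal sketch of the same formula (yield = efficiency raised to the chain length), assuming the 99% cycle efficiency used above; the function name is just for illustration.

```python
def theoretical_yield(length_nt, cycle_efficiency=0.99):
    """Fraction of strands expected to be full length after length_nt cycles."""
    return cycle_efficiency ** length_nt


# Lengths used in the answers above.
for length_nt in (100, 200, 300, 2000):
    print(f"{length_nt:>5} nt -> yield {theoretical_yield(length_nt):.3g}")
```

Passing a different `cycle_efficiency` shows how strongly the result depends on per-cycle efficiency, which is why improving that number is one of the approaches mentioned above.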

Homework Question from George Church

1. [Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

Not sure I’m fully understanding the question, but given there are 64 possible codons for amino acids yet only 20 amino acids, I’d create a code where all possible codons are inputted and outputted as “plaintext” and “ciphertext”, with the encryption “key” being the 20 amino acids they could be interpreted as. Something like the drawing below:

(Image: code)

This could even be further streamlined for repeating letters:

(Image: expanded code)

(Prof. Church’s slides and the paper at [6] were used for reference.)

Citations

[1] Piovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L. On the length, weight and GC content of the human genome. BMC Res Notes. 2019 Feb 27;12(1):106. https://doi.org/10.1186/s13104-019-4137-z

[2] Hopfield, JJ. Kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity. Proc Natl Acad Sci USA. 1974 Oct; 71(10):4135-9. https://doi.org/10.1073/pnas.71.10.4135

[3] Ford, T. Plasmids 101: Codon usage bias. addgene Blog. 2018 Sept. https://blog.addgene.org/plasmids-101-codon-usage-bias

[4] Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics. 2017 Sep 2; 18(1):391. https://doi.org/10.1186/s12859-017-1793-7

[5] Hoose, A., Vellacott, R., Storch, M. et al. DNA synthesis technologies to close the gene writing gap. Nat Rev Chem 7, 2023 Jan; 144–161. https://doi.org/10.1038/s41570-022-00456-9

[6] Acevedo-Rocha CG, Budisa N. Xenomicrobiology: a roadmap for genetic code engineering. Microb Biotechnol. 2016 Sep; 9(5):666-76. https://doi.org/10.1111/1751-7915.12398

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

Had an initial mess-up where I tried to “speedrun” the process and ended up with a ladder packed with the effects of multiple restriction enzymes.

(Image: Sequence Attempt 1)

Finally got success with all of the listed enzymes, separately.

(Image: Sequence Attempt 2)

Some experimentation on Ronan’s website got me this pattern that sort of looks like a pair of pants. In hindsight I should’ve definitely explored results from a combination of enzymes (e.g. EcoRI and HindIII together), which would’ve given me a bigger range of visual results.

(Image: Sequence on Ronan’s Website)

Replicated “sort-of pants” on Benchling, and my final result.

(Image: Sequence Attempt 3)

Part 3: DNA Design Challenge

3.1. Choose your protein.

I recently read about snake venom and how its composition, mostly proteins/enzymes, makes it (theoretically) edible, since it can be digested in the stomach. That was a pretty fun fact. For this assignment, I picked irditoxin, a three-finger toxin that is selectively neurotoxic towards birds and lizards (but not mammals).

I found two subunits on UniProt and went with A.

>sp|A0S864|3NBA_BOIIR Irditoxin subunit A OS=Boiga irregularis OX=92519 PE=1 SV=1
MKTLLLAVAVVAFVCLGSADQLGLGRQQIDWGQGQAVGPPYTLCFECNRMTSSDCSTALR
CYRGSCYTLYRPDENCELKWAVKGCAETCPTAGPNERVKCCRSPRCNDD

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

I located irditoxin subunit A in the European Nucleotide Archive, with the following DNA sequence:

ATGAAAACTCTGCTGCTGGCCGTGGCGGTGGTGGCATTCGTGTGCCTGGGCTCAGCTGATCAGCTGGGACTCGGAAGGCAGCAAATAGATTGGGGACAAGGCCAAGCAGTAGGTCCACCCTACACACTTTGTTTCGAATGCAATCGAATGACTTCTTCGGATTGTTCAACCGCTTTGAGATGTTACCGCGGATCGTGCTACACTTTGTACAGACCTGATGAGAATTGTGAATTGAAATGGGCTGTCAAGGGATGTGCTGAAACGTGCCCTACTGCGGGACCTAATGAGAGGGTGAAGTGTTGCAGATCACCAAGGTGCAACGATGATTGAAAAAGAATACACTGAGTTTTGATCTTCGTCTTCAGCAGTAAGCCTCCTTGCTGCTCCTGTATTTTCAACGACTATTATCCTCCTATCAAACAACATATTATAATTCCCATGTAGACGCAAATGATAGAAGCTGAATGATTATTATTAGACATACTTGAAAGTATTATTACTGTTATTAGTATTGTCTATAAAGTAAATTGATCATTATTAGAACTATTATTTTAACTACTATTATTAATCTCACAATGATTATTAAATACTTTGCGAATCT

3.3. Codon optimization.

Codon optimization can be useful in controlling gene expression within a sequence (both increasing and decreasing it). It can also make mRNA production more efficient and impact translation speed, which can in turn affect things like folding speed (fast-translated sequences fold while waiting for slow-translated sequences).

I tried out IDT’s Codon Optimization Tool. IDT recognized a couple of different start and stop codons, so I picked the sequence between the first codon (ATG, a start) and the first stop codon (TGA, ending at position 330). The sequence was optimized for E. coli, to go with a standard, well-understood, commonly used organism.

Shortened old sequence:

ATG AAA ACT CTG CTG CTG GCC GTG GCG GTG GTG GCA TTC GTG TGC CTG GGC TCA GCT GAT CAG CTG GGA CTC GGA AGG CAG CAA ATA GAT TGG GGA CAA GGC CAA GCA GTA GGT CCA CCC TAC ACA CTT TGT TTC GAA TGC AAT CGA ATG ACT TCT TCG GAT TGT TCA ACC GCT TTG AGA TGT TAC CGC GGA TCG TGC TAC ACT TTG TAC AGA CCT GAT GAG AAT TGT GAA TTG AAA TGG GCT GTC AAG GGA TGT GCT GAA ACG TGC CCT ACT GCG GGA CCT AAT GAG AGG GTG AAG TGT TGC AGA TCA CCA AGG TGC AAC GAT GAT TGA

Optimized sequence:

ATG AAA ACA CTG CTG CTG GCG GTT GCG GTC GTT GCC TTT GTC TGC CTG GGC TCA GCC GAC CAG CTG GGG CTG GGC CGT CAG CAG ATT GAT TGG GGT CAG GGC CAG GCG GTT GGG CCG CCG TAC ACG CTG TGC TTT GAA TGC AAC CGT ATG ACC AGC AGC GAC TGC AGC ACT GCG CTG CGT TGT TAT CGC GGC TCG TGC TAT ACG CTG TAT CGT CCG GAT GAA AAC TGC GAA CTG AAA TGG GCG GTG AAA GGC TGC GCT GAA ACC TGC CCG ACT GCA GGC CCG AAT GAA CGC GTG AAA TGC TGC CGC TCT CCG CGC TGC AAC GAC GAT TGA
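
A quick sanity check for codon optimization is that the old and new sequences still translate to the same protein. The sketch below does this in Python for the first 12 codons of each sequence above, using the standard genetic code; the compact table construction and the function name are my own illustration, not part of IDT's tool.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(spaced_codons):
    """Translate a space-separated string of DNA codons ('*' marks a stop)."""
    return "".join(CODON_TABLE[codon] for codon in spaced_codons.split())


# First 12 codons of the shortened old sequence and of the optimized sequence.
original = "ATG AAA ACT CTG CTG CTG GCC GTG GCG GTG GTG GCA"
optimized = "ATG AAA ACA CTG CTG CTG GCG GTT GCG GTC GTT GCC"

print(translate(original))
print(translate(optimized))
print("same protein:", translate(original) == translate(optimized))
```

Pasting the full sequences into `original` and `optimized` runs the same comparison over the whole gene.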

Another mess-up: IDT rejected the optimized sequence due to its complexity, which means this sequence isn’t currently manufacturable and needs to be redesigned further. It seems some of my enzyme recognition sites weren’t ideal.

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

As we’re working with a neurotoxin, and not for high-volume production, I’d probably turn to a cell-free method.

Part 4: Prepare a Twist DNA Synthesis Order

4.1-2. Create a Twist account and a Benchling account, Build Your DNA Insert Sequence

From the downloaded FASTA file:

>DNA_sample_order
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGATGAAAACACTGCTGCTGGCGGTTGCGGTCGTTGCCTTTGTCTGCCTGGGCTCAGCCGACCAGCTGGGGCTGGGCCGTCAGCAGATTGATTGGGGTCAGGGCCAGGCGGTTGGGCCGCCGTACACGCTGTGCTTTGAATGCAACCGTATGACCAGCAGCGACTGCAGCACTGCGCTGCGTTGTTATCGCGGCTCGTGCTATACGCTGTATCGTCCGGATGAAAACTGCGAACTGAAATGGGCGGTGAAAGGCTGCGCTGAAACCTGCCCGACTGCAGGCCCGAATGAACGCGTGAAATGCTGCCGCTCTCCGCGCTGCAACGACGATTGACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

My expression cassette can be accessed here.

4.3-6. Select The “Genes” Option, “Clonal Genes”, Import your sequence, and Choose Your Vector

Here’s my plasmid. She’s beautiful!

(Image: I love her)

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I think bioindicators are an interesting group of organisms, and sequencing them could help us isolate genes that react to their surroundings, which could then be used for a more standardized, widespread environmental monitoring tool. E.g. microalgae can detect a wide range of water quality issues, from heavy metals to nanoparticles, yet “only a few species have been fully sequenced without any gaps” [1].

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions: Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

I’d likely use an NGS method to handle sequencing the entire genome, and for multiple species at that. So potentially something like PacBio, which is third-generation since it reads long single molecules in real time. Both the sample and the library need to be prepared: the sample is purified, and the DNA is fragmented to length and capped at both ends with adapters. The DNA is decoded by a polymerase that runs along the sequence. As it incorporates each fluorescently labeled nucleotide, a pulse of light is emitted, which is recorded live and appended to the current sequence. The end result is a straightforward sequence of DNA nucleotides, given in a text-based file that can be read in any notepad app.

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I’m continuously interested in biomaterials as an alternative to things like plastic. There’s been a lot of work on getting plastic that is biodegradable, and I’m wondering if there’s a way to go about it from the opposite direction, like fortifying kombucha leather to last longer.

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions: What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

Solid-phase synthesis using the phosphoramidite method seems to be the go-to method, so I’d stick with that. The steps are coupling the next base as a phosphoramidite, capping unreacted sites, oxidizing the phosphite linkage, deblocking, and then repeating as needed. The limitation of this process is that its yield drops past a certain length (~200 nt, as discussed last week), so it could become difficult as the needed sequence gets longer.

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

This feels like a touchy subject in line with our ethical considerations from last week. The responsibility of human genome editing seems enormous, so I’ll consider other organisms. I’m thinking of how filter feeders play an important role in the water ecosystem, essentially “purifying” the water. Could something like that be intentionally edited into plants (or other microorganisms) to boost their “purifying” effect on the air?

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions: How does your technology of choice edit DNA? What are the essential steps? What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing? What are the limitations of your editing methods (if any) in terms of efficiency or precision?

I’m unsure of this part as I’m not too familiar with the work needed to introduce this behavior into land organisms. I believe CRISPR-Cas9 is a go-to for gene editing, so it would probably be a popular approach regardless. The process begins with a guide RNA finding the target sequence in the DNA, at which point Cas9 “cleaves” both strands. Then either new DNA can be added (replacing the segment), or the DNA strands repair themselves (often deleting or disrupting the segment). For preparation, the appropriate guide RNA needs to be designed and sourced. A limitation of CRISPR-Cas9 is its precision; it can cut or insert DNA at the wrong site, which can lead to unintended mutations, a serious concern when applied to human genome editing.
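
As a rough illustration of the guide RNA preparation step, candidate target sites can be listed by scanning a sequence for PAM motifs. The Python sketch below assumes the commonly used Streptococcus pyogenes Cas9, which needs an NGG motif directly after a 20-nt target; it only scans the given strand, and the function name and toy sequence are made up for the example. Real design tools also score off-target matches, which this does not attempt.

```python
import re


def find_ngg_sites(dna, guide_len=20):
    """List (start, target, pam) for each guide_len-nt target followed by NGG."""
    dna = dna.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Toy sequence (not a real gene): 20 A's, then a TGG PAM, then filler.
toy = "A" * 20 + "TGG" + "CCCCC"
for start, target, pam in find_ngg_sites(toy):
    print(start, target, pam)
```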

Citations

[1] Evangelia Stavridou, Lefkothea Karapetsi, Georgia Maria Nteve, Georgia Tsintzou, Marianna Chatzikonstantinou, Meropi Tsaousi, Angel Martinez, Pablo Flores, Marián Merino, Luka Dobrovic, José Luis Mullor, Stefan Martens, Leonardo Cerasino, Nico Salmaso, Maslin Osathanunkul, Nikolaos E. Labrou, Panagiotis Madesis, Landscape of microalgae omics and metabolic engineering research for strain improvement: An overview, Aquaculture, Volume 587, 2024, https://doi.org/10.1016/j.aquaculture.2024.740803.

Week 3 HW: Lab Automation

Python Script for Opentrons Artwork

Opentrons Art

I tried to play around with math functions to create a design, like the Mathematical Heart sample. I drew up a cute fox in Desmos graphing calculator using the following functions, making sure to scale them to the 40 mm limit. Transferring that to Colab was a bit more difficult, and I had to play around with the functions, ranges, and dispense volume to find something that looked good.

(Image: Fox design)

Some notes from the process:

  • Polar functions are ideal for spacing out the points. I used polynomial equations ($y = x^2$) for the body but ended up switching to ellipses ($y^2 + x^2 = r^2$).
  • I had trouble picking up/dropping tips, especially with the color changes in each part of the fox. So my program appends all points to lists sorted by color and then dispenses each list in a draw() function (like the Mathematical Heart).
  • I had a lot of trouble tracking volume to aspirate, especially as some for loops go into the negatives, so my refill is based on the actual volume of the pipette (pipette_20ul.current_volume) and is handled all at once in the draw() function.
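
The notes above can be sketched in plain Python. This is not my actual Colab script: `FakePipette` is a hypothetical stand-in that only tracks volume (so the example runs without a robot), the shapes and volumes are made-up values, and tip changes are left out. The names `aspirate`, `dispense`, and `current_volume` mirror the pipette calls mentioned above, although the real calls take labware locations.

```python
import math

MAX_RADIUS_MM = 40.0      # drawing limit mentioned above
DROP_UL = 1.0             # assumed volume per dot
TIP_CAPACITY_UL = 20.0    # assumed capacity of the 20 uL pipette


def ellipse_points(cx, cy, a, b, n):
    """Return n points on an ellipse, evenly spaced in angle."""
    return [
        (cx + a * math.cos(2 * math.pi * k / n), cy + b * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


class FakePipette:
    """Hypothetical stand-in for the robot pipette; it only tracks volume."""

    def __init__(self):
        self.current_volume = 0.0
        self.log = []

    def aspirate(self, volume, source):
        self.current_volume += volume
        self.log.append(("aspirate", volume, source))

    def dispense(self, volume, point):
        self.current_volume -= volume
        self.log.append(("dispense", volume, point))


def draw(pipette, points_by_color):
    """Dispense every point, one color at a time, refilling only when empty."""
    for color, points in points_by_color.items():
        for i, point in enumerate(points):
            if pipette.current_volume < DROP_UL:
                remaining_ul = (len(points) - i) * DROP_UL
                pipette.aspirate(min(TIP_CAPACITY_UL, remaining_ul), color)
            pipette.dispense(DROP_UL, point)


# Points are collected per color first, then drawn in one pass.
points_by_color = {
    "orange": ellipse_points(0, 0, 30, 20, 24),   # made-up body outline
    "white": ellipse_points(0, -5, 10, 6, 8),     # made-up belly patch
}
assert all(
    math.hypot(x, y) <= MAX_RADIUS_MM
    for points in points_by_color.values()
    for x, y in points
)

pipette = FakePipette()
draw(pipette, points_by_color)
aspirations = [step for step in pipette.log if step[0] == "aspirate"]
print(len(aspirations), "aspirations for", len(pipette.log) - len(aspirations), "dots")
```

Because the refill decision reads the pipette's own volume rather than a separate counter, the loops that build the point lists never need to know how much liquid is left.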

I also used https://ginkgoartworks.com/ to draw a mushroom and imported the program into Colab. Since the bacteria names don’t register as RGB colors, I had to “color-correct” well_colors to get the visualization to show up (but I assume both versions will work as long as the PCR tubes are physically in order).

(Image: Mushroom design)

Colab link for both projects here, including the color-corrected version for the mushroom.

Extra: Nebula art

I designed some artwork for the 1536-well plates on the Nebula, which were made during the Saturday 2-6 pm Cloud Lab session. The first one was a firefly squid, inspired by bioluminescent photos I’ve seen of them underwater. Link to gallery image here.

(Image: Firefly squid)

I also made a second one resembling the Chinese jianzhi for Lunar New Year. I experimented with two color sets, to see how bacteria with similar coloring would contrast against each other. Link to gallery images here and here.

(Image: Jianzhi)

Post-Lab Questions

  1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

(Image: Research paper mentioned)

This paper explores “dyeing” bacterial cellulose, a bioplastic alternative to leather, which I found pretty interesting. Instead of applying an independent biodegradable dye, the researchers engineered Komagataeibacter rhaeticus to develop eumelanin (dark melanin), which gives a range of shades seen in the photo above. Opentrons is used in the production of the eumelanin development buffer, in which the K. rhaeticus pellicle is incubated to “dye” itself. The robot mixes the buffer, cells, and supernatant within a reaction plate while maintaining a constant low temperature to prevent initial eumelanin growth.

Walker, K.T., Li, I.S., Keane, J. et al. Self-pigmenting textiles grown from cellulose-producing bacteria with engineered tyrosinase expression. Nat Biotechnol 43, 345–354 (2025). https://doi.org/10.1038/s41587-024-02194-3

  2. Write a description about what you intend to do with automation tools for your final project.

More research is needed on my part for this, but I’d like to explore scaling up or going in depth with the range of results for my project (e.g. if self-pigmenting, then trials to develop swatches of colors). This would require a lot of samples, and liquid handlers like the Opentrons would be necessary for producing all the samples identically.

Final Project Ideas

  1. Coloring Bioplastic/Biotextiles (with an art-focused approach)

    a. Self-pigmenting Bacterial Cellulose: Building on the above paper, further development with dyed bacterial cellulose using pheomelanin instead of eumelanin for a different color range. Likely this is already being explored, so as an ambitious goal Komagataeibacter rhaeticus could be edited to express both pheomelanin and eumelanin, allowing you a 2D range of colors.

    b. Structural Color on Textiles as Biopigment: Naturally occurring structural color is tied to the genome, so we could intentionally induce colors as natural, biodegradable dyes for textiles. This paper highlights bacteria that naturally form colonies showing structural color, and this paper explores gene knockout to change the color expressed by one of these bacteria, Flavobacterium IR1. A potential project could explore different colors on these bacteria, or find a way to introduce the bacteria to textiles without affecting their formation.

  2. Environmental Sensors (Algae biosensors?)

    • I’d like to explore fluorescence through engineering algae to detect toxic pollutants. I’d be interested in getting a biosensor to detect a single type of pollutant (e.g. the presence of heavy metals), which would require multiple strains of microalgae versus just one general strain that reacts uniformly to all pollutants. So part of this work would probably also include characterizing different responses to find the one that points to that pollutant.
  3. Polyester Biodegradation

    a. Integration with Byproduct Biodegradation: Ideonella sakaiensis is a bacterium that can break down PET plastic through a two-step process involving the PETase and MHETase enzymes, which yields terephthalic acid and ethylene glycol. Further breakdown of these can end in carbon dioxide, water, and (under anaerobic conditions) methane, which is itself a pollutant. Methane is also a subject of research, with methanotrophs being a type of bacteria that metabolize methane. I’m wondering if the bacteria used for breaking down plastic can somehow be given the added function of breaking down methane through genetic engineering.

    b. Polyester-Eating Enzymes: This is a less familiar topic for me, but current work on enzymatic degradation focuses on improving the performance of natural enzymes, e.g. their thermostability, pH control, etc. Since there’s such a wide range of work being done, I’m sure there’s some further testing that could be done on an understudied bacterium/performance metric/modification method.

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

Answer any NINE of the following questions from Shuguang Zhang:

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
  3. Why are there only 20 natural amino acids?

The natural amino acids are specified by codons, each of which is made of three nucleotides (any of adenine, uracil, guanine, or cytosine). This gives 4 x 4 x 4 = 64 total codons, but the genetic code is redundant: several codons map to the same amino acid and three act as stop signals, so the 64 codons specify only 20 unique amino acids.
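The codon arithmetic above can be checked with a short sketch. It enumerates every triplet over the four RNA bases and maps each one through the standard genetic code, written here as the compact 64-letter string used for NCBI translation table 1 (`*` marks a stop codon). The variable names are my own and only for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), ordered with the first
# base varying slowest and the third base fastest; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 x 4 x 4 triplets, generated in the same order as AMINO_ACIDS.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")
unique_amino_acids = sorted(set(AMINO_ACIDS) - {"*"})

print(len(codons))              # 64 possible codons
print(stop_codons)              # the three stop codons
print(len(unique_amino_acids))  # 20 amino acids
```

Grouping `codon_table` by amino acid would show the redundancy directly, e.g. six codons for leucine but only one for methionine.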

  4. Can you make other non-natural amino acids? Design some new amino acids.
  5. Where did amino acids come from before enzymes that make them, and before life started?
  6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
  7. Can you discover additional helices in proteins?
  8. Why are most molecular helices right-handed?
  9. Why do β-sheets tend to aggregate?
    • What is the driving force for β-sheet aggregation?
  10. Why do many amyloid diseases form β-sheets?
    • Can you use amyloid β-sheets as materials?
  11. Design a β-sheet motif that forms a well-ordered structure.

Part B: Protein Analysis and Visualization

  1. Briefly describe the protein you selected and why you selected it.

I picked crystallin, a water-soluble structural protein that makes up much of the eye lens and keeps it transparent so it can focus light. Its water solubility was a callback to our lecture. I picked the protein because I was interested in how cataracts form.

The specific protein I went with for the following questions is P02511, or Alpha-crystallin B (in humans).

  2. Identify the amino acid sequence of your protein.

The AA sequence from UniProt is

sp|P02511|CRYAB_HUMAN Alpha-crystallin B chain OS=Homo sapiens OX=9606 GN=CRYAB PE=1 SV=2
MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFPTSTSLSPFYLRPPSFLRAPSW
FDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEVHGKHEERQDEHGFISREFHR
KYRIPADVDPLTITSSLSSDGVLTVNGPRKQVSGPERTIPITREEKPAVTAAPKK

  • How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

Using the Colab notebook, the protein is 175 amino acids long, and the most common amino acid is P (proline), which appears 17 times.
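For reference, the same count can be reproduced outside the course notebook. This is a minimal sketch using only Python's standard library, not the notebook's own code; the sequence is copied from the FASTA record above, and it prints the top few residues rather than just the first so that any ties are visible.

```python
from collections import Counter

# Alpha-crystallin B chain (UniProt P02511), copied from the FASTA record above.
SEQUENCE = (
    "MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFPTSTSLSPFYLRPPSFLRAPSW"
    "FDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEVHGKHEERQDEHGFISREFHR"
    "KYRIPADVDPLTITSSLSSDGVLTVNGPRKQVSGPERTIPITREEKPAVTAAPKK"
)

counts = Counter(SEQUENCE)
print(len(SEQUENCE))  # total number of residues

# most_common breaks ties by first appearance, so show more than one entry.
for residue, n in counts.most_common(3):
    print(residue, n)
```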

  • How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

Homologs are protein sequences that likely share a common ancestor, which is inferred from similarities in sequence or structure. Running UniProt’s BLAST tool gives 250 results for similar proteins, most of which appear to be Alpha-crystallin B in different animals.

  • Does your protein belong to any protein family?

According to UniProt, it’s part of the small heat shock protein (HSP20) family, along with all other Alpha-crystallin B proteins. However, according to the Transporter Classification Database, it’s part of the α-Crystallin Chaperone (CryA) family (where other Alpha-crystallin B proteins don’t appear).

  3. Identify the structure page of your protein in RCSB

This step was particularly difficult for me, as I didn’t always understand how to get to the answer based on what I had on the screen.

  • When was the structure solved? Is it a good quality structure?

The first structure appears to have been solved in 2009, and more entries have been added through 2025. Some particularly high-resolution structures were determined in 2012 and 2014 by X-ray diffraction, with resolutions of 1.0 to 1.5 Å.

  • Are there any other molecules in the solved structure apart from protein?

I’m not entirely sure how to identify this…

  • Does your protein belong to any structure classification family?

Using the Structural Classification website, it belongs to the “Alpha crystallin-like” family, which in turn sits within the “Hsp20 chaperone-like” family.

  4. Open the structure of your protein in any 3D molecule visualization software:

I chose to use PyMol to open my structure, getting the structure below.

  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
Opened structure on PyMol
  • Color the protein by secondary structure. Does it have more helices or sheets?

The protein seems to mostly be composed of sheets with some helices.

Secondary structure colored protein

  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

I colored hydrophobic residues in red and hydrophilic residues in green.

  • Hydrophobic (in red): glycine (Gly), alanine (Ala), valine (Val), leucine (Leu), isoleucine (Ile), proline (Pro), phenylalanine (Phe), methionine (Met), tryptophan (Trp)
    • PyMOL code: select hydrophobic, resn GLY+ALA+VAL+LEU+ILE+PRO+PHE+MET+TRP
  • Hydrophilic (in green): serine (Ser), threonine (Thr), asparagine (Asn), glutamine (Gln), cysteine (Cys)
    • PyMOL code: select hydrophilic, resn SER+THR+ASN+GLN+CYS
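Typing long residue lists by hand is error-prone, so here is a small sketch that builds the PyMOL commands from two Python lists and checks that no residue lands in both groups. The helper name `pymol_select` is hypothetical, and the grouping is an assumption on my part: glycine is kept only in the hydrophobic group, and charged residues and tyrosine are left uncolored. The printed commands use PyMOL's `resn A+B+C` list syntax and can be pasted into the PyMOL command line.

```python
# Residue groups as three-letter codes. This grouping is one common
# convention (an assumption here), not the only way to split residues.
HYDROPHOBIC = ["GLY", "ALA", "VAL", "LEU", "ILE", "PRO", "PHE", "MET", "TRP"]
HYDROPHILIC = ["SER", "THR", "ASN", "GLN", "CYS"]


def pymol_select(name, residues):
    """Return a PyMOL `select` command matching any of the residue names."""
    return f"select {name}, resn {'+'.join(residues)}"


# A residue listed in both groups would take whichever color is applied last.
overlap = set(HYDROPHOBIC) & set(HYDROPHILIC)
if overlap:
    raise ValueError(f"residues in both groups: {sorted(overlap)}")

commands = [
    pymol_select("hydrophobic", HYDROPHOBIC),
    "color red, hydrophobic",
    pymol_select("hydrophilic", HYDROPHILIC),
    "color green, hydrophilic",
]
print("\n".join(commands))
```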

Residue-colored protein

I had to switch to a spheres visualization to better see how residues were interacting. It was a little hard for me to see a significant pattern, but I do feel like the hydrophilic residues have more “open” facing areas, whereas the hydrophobic residues were more clumped (both together and with neighboring residues).

  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Visualizing the protein as a surface was really helpful! I could easily find a couple areas that could be binding pockets. It’s a little difficult to show it accurately in a photo, but I indicated potential areas below:

Surface visualization of protein

Part C. Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

  1. Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.
  2. Choose your favorite protein from the PDB.
  3. We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

  1. Deep Mutational Scans
    1. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
    2. Can you explain any particular pattern? (choose a residue and a mutation that stands out)
    3. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
  2. Latent Space Analysis
    1. Use the provided sequence dataset to embed proteins in reduced dimensionality.
    2. Analyze the different formed neighborhoods: do they approximate similar proteins?
    3. Place your protein in the resulting map and explain its position and similarity to its neighbors.
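For the deep mutational scan step, one common way to turn language model likelihoods into mutation scores is a log-likelihood ratio: log p(mutant) minus log p(wild type) at each position. The sketch below only shows that scoring step. It assumes the per-position log-probabilities have already been obtained from the model, and it uses a made-up toy table in place of real ESM2 output so that it runs without a GPU. The function name `mutation_scores` and the toy numbers are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutation_scores(sequence, log_probs):
    """Score every substitution as log p(mutant) - log p(wild type).

    `log_probs[i][aa]` is the model's log-probability of amino acid `aa`
    at position i. Negative scores mean the model prefers the wild type.
    """
    scores = {}
    for i, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue
            label = f"{wild_type}{i + 1}{aa}"  # e.g. "P8A", 1-based position
            scores[label] = log_probs[i][aa] - log_probs[i][wild_type]
    return scores


# Toy stand-in for model output (NOT real ESM2 values): the wild-type residue
# gets probability 0.62 and the other 19 amino acids get 0.02 each.
toy_sequence = "MDIAIHHP"
toy_log_probs = [
    {aa: math.log(0.62 if aa == wt else 0.02) for aa in AMINO_ACIDS}
    for wt in toy_sequence
]

scores = mutation_scores(toy_sequence, toy_log_probs)
print(len(scores))    # 8 positions x 19 substitutions each
print(scores["P8A"])  # negative, since the toy table favors the wild type
```

With real model output, arranging these scores as a positions-by-amino-acids grid gives the heatmap in which standout residues and mutations can be spotted.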

C2. Protein Folding

Folding a protein

  1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
  2. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?
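To keep track of which variants get folded, the mutated sequences can be generated programmatically before pasting them into ESMFold. This is a minimal sketch with two hypothetical helpers, `apply_mutation` for point mutations in the usual "P8A" notation and `replace_segment` for swapping out larger stretches. It only edits strings and does not call ESMFold itself.

```python
import re


def apply_mutation(sequence, mutation):
    """Apply a point mutation written as <wild type><1-based position><new residue>."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation: {mutation!r}")
    wild_type, new_residue = match.group(1), match.group(3)
    position = int(match.group(2))
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1
    # Guard against off-by-one errors by checking the stated wild-type residue.
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new_residue + sequence[index + 1:]


def replace_segment(sequence, start, end, new_segment):
    """Replace residues start..end (1-based, inclusive) with `new_segment`."""
    return sequence[:start - 1] + new_segment + sequence[end:]


wild_type = "MDIAIHHPWIRRPFFPFHSP"  # first 20 residues of P02511, for illustration
print(apply_mutation(wild_type, "P8A"))
print(replace_segment(wild_type, 5, 10, "GGGGGG"))
```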

C3. Protein Generation

Picture Source: 1. Post from Sergey Ovchinnikov 2. Roney, Ovchinnikov et al (2022). State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
  2. Input this sequence into ESMFold and compare the predicted structure to your original.
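Comparing a ProteinMPNN design against the original sequence usually starts with sequence identity, i.e. the fraction of positions that were recovered unchanged. The sketch below assumes both sequences have the same length, which holds when a design is generated for a fixed backbone. The function names are my own, and the "designed" sequence is made up for illustration rather than real ProteinMPNN output.

```python
def sequence_identity(original, designed):
    """Fraction of positions where two equal-length sequences agree."""
    if len(original) != len(designed):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(original, designed))
    return matches / len(original)


def differences(original, designed):
    """List substitutions as (1-based position, original residue, designed residue)."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, designed), start=1)
        if a != b
    ]


original = "MDIAIHHPWI"  # first 10 residues of P02511
designed = "MDLAIHHPWV"  # made-up design, for illustration only
print(sequence_identity(original, designed))  # 0.8
print(differences(original, designed))        # [(3, 'I', 'L'), (10, 'I', 'V')]
```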

Part D. Group Brainstorm on Bacteriophage Engineering

  1. Find a group of ~3–4 students
  2. Read through the Phage Reading material listed under “Reading & Resources” below.
  3. Review the Bacteriophage Final Project Goals for engineering the L Protein:
    • Increased stability (easiest)
    • Higher titers (medium)
    • Higher toxicity of lysis protein (hard)
  4. Brainstorm Session
    • Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).
    • Write a 1-page proposal (bullet points or short paragraphs) describing:
    • Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).
    • Why do you think those tools might help solve your chosen sub-problem?
    • Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).
    • Include a schematic of your pipeline.
    • This resource may be useful: HTGAA Protein Engineering Tools
  5. Each individually put your plan on your HTGAA website
    • Include your group’s short plan for engineering a bacteriophage