Shitong Luo — HTGAA Spring 2026

About me

I am a second-year PhD student at MIT EECS. I work in computational biology and machine learning.

Contact info

Homework

Labs

Projects

Individual Final Project

Homework

Weekly homework submissions:

Week 01 HW: Principles and Practices
Class Assignment 1. First, describe a biological engineering application or tool you want to develop and why. I want to optimize PETase (polyethylene terephthalate hydrolase). PETase is an enzyme that can break down PET plastics, which are widely used in packaging. By optimizing PETase, we can enhance its efficiency in degrading PET and increase its stability under various conditions. This could lead to more effective recycling processes and help reduce plastic pollution.
Week 02 HW: DNA Read, Write, & Edit
Part 0: Basics of Gel Electrophoresis I have attended the recitation. Part 1: Benchling & In-silico Gel Art I made the gel art below. It is “HT” for “How To grow almost anything”. Part 2: Gel Art - Restriction Digests and Gel Electrophoresis I worked in a group with Louisa, Jasmine, and Yutong. We tried to make the cat gel art designed by Louisa, but unfortunately it was not very successful. Photo below:
Week 03 HW: Lab Automation
Python Script for Opentrons Artwork I created a design using opentrons-art.rcdonovan.com Opentrons-Art Website: https://opentrons-art.rcdonovan.com/?id=80fx569l8o4tho4 Google Colab: https://colab.research.google.com/drive/1UPiCmwBP3sIFD_rNVRHeT3YhuiQQ5ZGP#scrollTo=pczDLwsq64mk&line=6&uniqifier=1

Week 04 HW: Protein Design Part I
Part A: Conceptual Questions 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) 1 Dalton = $1.66 \times 10^{-24}$ grams, so one amino acid (100 Daltons) weighs about $1.66 \times 10^{-22}$ grams; equivalently, 100 grams of amino acids is 1 mole, or $6.02 \times 10^{23}$ molecules. About 20% of meat is protein, so 500 grams of meat contains about 100 grams of protein. Therefore: $100 / (1.66 \times 10^{-22}) \approx 6.02 \times 10^{23}$ molecules of amino acids.
Week 05 HW: Protein Design Part II
Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM The original sequence of SOD1 is: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Mutate the 4th amino acid A to V (A4V): MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:
Week 06 HW: Genetic Circuits Part I
DNA Assembly 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion DNA Polymerase: synthesizes new DNA strands by adding nucleotides to the template strand during PCR dNTPs (deoxynucleotides): building blocks of DNA Buffer: provides the optimal conditions 2. What are some factors that determine primer annealing temperature during PCR? Tm (melting temperature) of the primer: the temperature at which half of the DNA duplex dissociates Primer length: longer primers generally require higher annealing temperatures GC content: higher GC content increases the Tm and may require higher annealing temperatures Salt concentration: higher salt concentrations can stabilize the DNA duplex and may require higher annealing temperatures 3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other. PCR:
Week 07 HW: Genetic Circuits Part II
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) 1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? IANNs can perform more complex computations than traditional genetic circuits, which are limited to Boolean functions. IANNs can process continuous inputs and produce continuous outputs, allowing for more nuanced control of gene expression. Additionally, IANNs can learn and adapt over time, making them more versatile and capable of handling dynamic environments.
Week 09 HW: Cell-Free Systems
General homework questions 1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Advantages: cell-free systems allow for precise control over the reaction conditions, such as temperature, pH, and the concentration of substrates and cofactors. This can lead to higher yields and faster protein production compared to in vivo methods.
Week 10 HW: Imaging and Measurement
Final Project 1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. We will measure PETase thermostability (Tm), residual activity after heat challenge, expression yield, and PET-degradation rate for each variant.
Week 11 HW: Bioproduction & Cloud Labs
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork 1. Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST. Done.

Make a note on your HTGAA webpages including: what you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”), what you liked about the project, and what about this collaborative art experiment could be made better for next year. I contributed a little dot on the bottom right plate, but it was overlapped by other contributions later. I liked the interactivity of the project. It’s cool to see how the plates evolve over time as more people contribute.

Week 12 HW: Bioproduction
No homework assignment for this week. Working on the final project.
Week 13 HW: Bio-Design & Living Materials
No homework assignment for this week. Working on the final project.
Week 14 HW: Biofabrication
No homework assignment for this week. Working on the final project.

Week 01 HW: Principles and Practices

Class Assignment

1. First, describe a biological engineering application or tool you want to develop and why.

I want to optimize PETase (polyethylene terephthalate hydrolase). PETase is an enzyme that can break down PET plastics, which are widely used in packaging. By optimizing PETase, we can enhance its efficiency in degrading PET and increase its stability under various conditions. This could lead to more effective recycling processes and help reduce plastic pollution.

I plan to use AI models such as ProteinMPNN to propose mutations and test them in the lab.

One governance goal for optimizing PETase is to ensure that the enzyme does not have unintended consequences on the environment or human health, such as producing harmful byproducts.

Possible sub-goals:

Identify possible byproducts in the lab.
Test the toxicity of the byproducts.
Ensure that there are no byproducts that could be harmful to the environment or human health.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

[Option 1] As researchers, we could conduct comprehensive testing of the optimized PETase to identify any potential harmful byproducts and assess their toxicity.
- Purpose: To ensure that the optimized PETase does not produce harmful byproducts.
- Design: Conduct experiments to identify all the products produced by the PETase.
- Assumptions: We know what it looks like when there are no unexpected byproducts.
- Risks of Failure: If harmful byproducts are not identified, it could lead to environmental or health issues.
- Success: The optimized PETase is found to be safe and does not produce harmful byproducts.
[Option 2] Companies that produce the enzymes should provide detailed information about the enzyme’s properties, including any potential risks and safety measures.
- Purpose: To sufficiently inform users about the potential risks associated with the enzyme.
- Design: Disclose information about the enzyme’s properties and potential risks.
- Assumptions: Companies will comply with the reporting requirements and provide accurate and sufficient information.
- Risks of Failure: If companies withhold or misreport information, users could mishandle the enzyme.
- Success: Users are well-informed about the enzyme and can use it safely.
[Option 3] Regulators could establish guidelines for safe use and disposal to minimize potential impact.
- Purpose: To ensure that the enzyme is used and disposed of properly to minimize potential environmental impact.
- Design: Develop best-practice guidelines for use and disposal.
- Assumptions: Users will follow the guidelines.
- Risks of Failure: If users do not follow the guidelines, it could lead to negative consequences.
- Success: Environmental and health impacts are minimized.

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework, but feel free to make your own.

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	1	2	2
• By helping respond	2	1	2
Foster Lab Safety
• By preventing incidents	1	1	2
• By helping respond	2	2	2
Protect the environment
• By preventing incidents	1	1	3
• By helping respond	2	1	2
Other considerations
• Minimizing costs and burdens to stakeholders	1	2	3
• Feasibility?	1	1	2
• Not impede research	2	2	2
• Promote constructive applications	2	2	2

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

I would prioritize Option 1, because this option is our responsibility as researchers, and it is the most direct way to ensure safety by eliminating risks at the source.

Lab Preparation

Complete Lab Specific Training in Person.
Complete Safety Training in Atlas

Week 2 Lecture Prep

Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

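Error rate: about $1:10^6$ (one error per million bases copied).

Human genome length: 3.2 Gbp (billion base pairs), so copying it once at this error rate would introduce roughly $3.2 \times 10^9 / 10^6 \approx 3200$ errors.

Mechanisms to deal with the discrepancy: polymerase proofreading and post-replication mismatch repair (e.g., MutS).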

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

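Average human protein: 345 amino acids. Number of different ways to code: roughly $3^{345} \approx 10^{165}$, since the 61 sense codons encode 20 amino acids, or about 3 synonymous codons per amino acid on average.

Reasons that not all codes work: codon usage bias differs among species, and some sequences form mRNA secondary structures that hinder expression.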
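As a rough check on this kind of estimate, the number of synonymous DNA encodings of a protein is the product of the codon counts of its residues. The sketch below uses the standard genetic code; the function name `count_encodings` is my own choice, and the stop codon is not counted.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Return how many distinct DNA sequences encode `protein` (stop codon not counted)."""
    total = 1
    for aa in protein:
        total *= CODONS_PER_AA[aa]
    return total


if __name__ == "__main__":
    # A tiny peptide: Met has 1 codon, Ala has 4, Leu has 6.
    print(count_encodings("MAL"))
    # Order of magnitude (log10) of the 3^345 estimate for an average-length protein.
    print(round(345 * math.log10(3), 1))
```

The exact count depends on the amino acid composition, so the $3^{345}$ figure is only an average-case approximation.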

Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

Solid-phase phosphoramidite synthesis.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because each coupling step is less than 100% efficient, the yield of full-length product decreases exponentially with length, and errors accumulate with length as well.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Because the cumulative error rate would be too high, leading to practically zero yield of the correct full-length product.
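The exponential drop in yield can be illustrated with a short calculation. This is a minimal sketch that assumes a fixed per-step coupling efficiency of 99%, which is an illustrative value I picked rather than a figure from the lecture; `full_length_yield` is a hypothetical helper name.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of strands that reach full length after length_nt - 1 couplings."""
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    # Compare a typical primer, a long oligo, and a gene-length strand.
    for n in (20, 200, 2000):
        print(f"{n:>5} nt: {full_length_yield(n):.2e}")
```

Under this assumption only a small fraction of 200 nt strands are full length, and essentially none at 2000 nt, which is why genes are assembled from shorter oligos instead.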

Question from George Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Phenylalanine, Valine, Threonine, Tryptophan, Isoleucine, Methionine, Histidine, Leucine, Lysine, Arginine.

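Lysine is essential for protein synthesis and enzyme production, which is critical for survival. Because lysine is already on this list, no animal can synthesize it and all animals obtain it from their diet, so the “Lysine Contingency” (making an engineered organism dependent on supplied lysine) adds little containment on its own.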

Your HTGAA Website

Here it is!

Week 02 HW: DNA Read, Write, & Edit

Part 0: Basics of Gel Electrophoresis

I have attended the recitation.

Part 1: Benchling & In-silico Gel Art

I made the gel art below. It is “HT” for “How To grow almost anything”.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

I worked in a group with Louisa, Jasmine, and Yutong. We tried to make the cat gel art designed by Louisa, but unfortunately it was not very successful. Photo below:

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose EGFR (Epidermal Growth Factor Receptor), because it is a protein that plays a critical role in cell growth and division, and it is often mutated in various cancers.

>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor OS=Homo sapiens OX=9606 GN=EGFR PE=1 SV=2
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA
VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF
QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC
TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV
VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK
NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF
ENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKL
FGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCN
LLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVM
GENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSIATGMVGALLLLLVV
ALGIGLFMRRRHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGS
GAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGI
CLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAA
RNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSY
GVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPK
FRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQ
QGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTED
SIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLN
TVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENAEYLRV
APQSSEFIGA

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

I used the reverse translation tool on the Twist Bioscience website. I added a stop codon (TAA) manually to the end of the DNA sequence. The DNA sequence is as follows:

>Egfr
ATGCGTCCGTCCGGCACTGCAGGGGCCGCACTGTTGGCTCTGCTCGCAGCCTTATGCCCCGCCAG
TCGAGCATTGGAGGAGAAGAAGGTGTGCCAGGGAACCTCTAACAAGCTGACACAACTCGGGACCT
TCGAGGACCATTTCCTGAGCCTCCAGAGAATGTTTAACAACTGCGAGGTTGTGTTGGGGAATCTG
GAGATTACCTACGTCCAGCGTAACTACGACCTCAGTTTCCTCAAGACAATACAGGAAGTGGCTGG
TTATGTTCTGATTGCACTGAATACCGTAGAGAGAATCCCCCTGGAAAACCTGCAAATTATTAGAG
GGAACATGTACTACGAAAACTCATATGCACTGGCCGTCCTGTCTAATTACGATGCCAACAAAACC
GGCCTGAAGGAGCTGCCCATGAGGAATCTCCAGGAAATTCTGCATGGGGCTGTAAGATTCAGCAA
TAACCCGGCTCTCTGTAACGTCGAGAGCATCCAGTGGCGCGACATAGTGAGCTCTGATTTCCTTT
CAAATATGTCCATGGATTTTCAAAACCATCTGGGGTCTTGTCAAAAGTGCGACCCATCTTGTCCC
AATGGTTCCTGTTGGGGCGCCGGAGAGGAGAATTGTCAGAAGCTTACCAAGATTATCTGTGCACA
GCAGTGTTCAGGGAGATGTAGGGGGAAGTCTCCAAGCGATTGCTGTCACAATCAATGCGCCGCCG
GGTGCACCGGCCCCAGAGAGAGTGACTGTTTAGTGTGCCGTAAATTCAGGGATGAGGCTACGTGC
AAAGACACATGCCCTCCACTCATGTTGTACAACCCAACAACTTATCAGATGGACGTGAACCCCGA
AGGGAAGTATAGCTTTGGGGCGACGTGCGTCAAGAAATGTCCACGCAACTATGTCGTTACGGACC
ACGGTTCCTGTGTGCGCGCTTGCGGAGCCGATTCATACGAAATGGAGGAAGACGGCGTCCGCAAG
TGTAAGAAGTGTGAAGGTCCTTGTCGCAAAGTCTGTAACGGTATCGGAATCGGCGAATTCAAGGA
CTCTCTCAGTATTAATGCAACCAATATCAAGCATTTTAAAAATTGTACCTCAATCTCAGGGGACT
TGCACATCCTGCCCGTTGCGTTTAGGGGGGACTCATTTACACACACCCCTCCACTGGATCCACAG
GAATTGGATATTCTGAAGACGGTGAAGGAAATCACCGGGTTCCTCCTGATTCAGGCCTGGCCAGA
GAACCGGACCGACCTTCACGCATTTGAGAACCTTGAAATTATTCGCGGGAGAACGAAACAACATG
GGCAATTCAGTCTCGCTGTCGTCAGCCTTAACATTACCTCCCTCGGATTGAGATCACTGAAGGAG
ATAAGCGACGGGGACGTGATAATCTCTGGGAACAAGAACTTGTGCTACGCCAACACTATTAACTG
GAAGAAGTTGTTCGGTACCTCAGGTCAGAAGACTAAAATCATCTCTAATCGAGGCGAAAATTCTT
GCAAAGCAACAGGGCAGGTGTGCCATGCCCTGTGCTCCCCTGAAGGGTGCTGGGGCCCCGAACCA
AGAGACTGCGTCTCTTGTCGGAATGTTTCCCGGGGGAGGGAATGTGTGGATAAGTGCAACCTGCT
TGAAGGAGAACCGAGAGAGTTTGTGGAAAATTCCGAATGTATCCAATGCCACCCTGAATGTTTAC
CTCAGGCCATGAACATTACATGCACAGGTAGGGGCCCCGACAATTGTATTCAGTGCGCCCATTAC
ATCGACGGACCCCACTGCGTAAAGACGTGCCCTGCCGGCGTTATGGGCGAGAATAATACCCTGGT
GTGGAAGTATGCCGACGCCGGGCATGTTTGCCATCTGTGCCACCCAAATTGTACATATGGCTGCA
CCGGCCCCGGGTTGGAGGGCTGCCCCACTAATGGACCTAAGATCCCCAGCATCGCCACCGGGATG
GTTGGCGCTCTGCTGCTGCTGTTGGTGGTCGCCTTGGGGATAGGGTTGTTCATGCGAAGGAGGCA
TATCGTCAGGAAGCGCACCCTGAGGAGGCTGCTCCAAGAGCGTGAGCTTGTCGAACCCCTGACCC
CCAGCGGCGAGGCCCCAAACCAGGCACTTTTGCGCATTCTGAAAGAGACCGAATTCAAAAAGATC
AAGGTGCTCGGCAGCGGGGCCTTCGGCACTGTGTACAAGGGTCTTTGGATCCCGGAGGGCGAAAA
AGTTAAAATACCCGTGGCAATAAAAGAGCTGAGGGAGGCTACAAGTCCAAAGGCAAACAAAGAAA
TCCTTGACGAAGCATACGTTATGGCATCCGTCGATAATCCACACGTGTGCAGGCTTCTCGGGATT
TGCCTGACGTCAACCGTGCAGCTGATTACGCAACTGATGCCATTTGGCTGTCTGCTCGATTATGT
GCGCGAGCACAAGGACAACATTGGATCTCAATACTTACTCAACTGGTGCGTGCAGATCGCGAAGG
GGATGAACTATCTGGAGGACCGCAGACTCGTCCACAGGGATCTTGCAGCCAGGAACGTACTCGTT
AAGACCCCGCAGCATGTGAAAATTACCGATTTCGGCCTTGCAAAACTGCTGGGTGCCGAAGAGAA
AGAATACCATGCAGAGGGTGGCAAAGTTCCTATCAAATGGATGGCGTTAGAGTCAATTCTGCATC
GGATCTATACCCATCAGAGCGACGTGTGGTCTTACGGTGTGACCGTTTGGGAGCTTATGACTTTT
GGGAGCAAACCGTACGACGGCATCCCGGCAAGCGAAATTTCCTCAATACTGGAAAAGGGGGAACG
TCTGCCCCAACCACCTATCTGCACTATAGACGTTTATATGATAATGGTGAAATGTTGGATGATCG
ACGCCGACAGTCGACCCAAGTTTCGAGAGTTAATCATCGAGTTCTCCAAGATGGCTCGGGATCCC
CAAAGGTACTTAGTCATCCAGGGCGATGAAAGAATGCACTTACCCTCACCCACAGATTCAAACTT
CTATCGAGCTTTGATGGATGAGGAAGACATGGATGATGTGGTAGACGCCGACGAGTACCTGATAC
CACAGCAGGGTTTTTTTTCTTCACCAAGCACATCTCGTACGCCTCTTCTGAGTAGCCTCAGCGCG
ACCTCCAACAACTCCACAGTGGCGTGCATCGACCGCAACGGACTTCAGTCCTGTCCAATTAAAGA
GGATTCCTTCCTGCAGAGGTATAGCAGCGACCCTACCGGAGCCCTTACCGAGGATAGTATTGATG
ATACATTCCTGCCTGTACCCGAATACATTAATCAGTCCGTGCCAAAACGCCCCGCAGGGAGTGTA
CAGAATCCAGTGTACCACAACCAGCCGCTGAACCCCGCACCCAGCCGAGACCCCCACTACCAGGA
CCCACATAGCACGGCCGTGGGAAATCCTGAATACCTGAACACCGTGCAGCCTACATGTGTGAATA
GCACTTTCGATAGCCCCGCACACTGGGCCCAGAAGGGCTCACACCAAATTAGCCTTGATAACCCT
GATTACCAGCAGGACTTCTTCCCCAAGGAGGCAAAGCCAAACGGTATCTTTAAGGGTAGCACGGC
CGAAAACGCAGAGTACTTGAGGGTTGCCCCTCAGTCCAGTGAGTTCATTGGCGCCTAA

3.3. Codon optimization.

I used the codon optimization tool on the Twist Bioscience website. The optimized DNA sequence is attached below.

Codon optimization is necessary because different organisms have different preferences for codons to encode the same amino acid. This can affect the efficiency of protein expression. I chose to optimize the codon sequence for E. coli, because it is a commonly used host organism for protein expression in the lab.

>Egfr
ATGAGACCCAGTGGAACCGCCGGTGCAGCTCTCCTTGCTTTGCTCGCTGCGCTCTGTCCAGCTTC
ACGGGCCCTTGAAGAAAAGAAAGTCTGTCAAGGTACAAGCAATAAACTCACGCAGTTGGGAACTT
TTGAAGATCACTTTCTGTCCCTGCAAAGGATGTTCAATAATTGTGAAGTAGTTCTGGGCAACCTC
GAAATCACATATGTACAGAGAAATTATGATTTATCCTTTCTGAAAACCATCCAAGAGGTAGCCGG
GTACGTCTTGATCGCTTTAAACACGGTTGAACGGATACCACTCGAGAATTTGCAGATAATCCGCG
GCAATATGTATTATGAGAATAGCTACGCCCTCGCGGTGCTCTCAAACTATGACGCGAATAAGACA
GGGTTAAAAGAATTACCAATGAGAAACCTGCAAGAGATACTCCACGGTGCAGTTAGGTTTAGTAA
CAATCCAGCCCTGTGCAATGTGGAATCTATTCAATGGCGAGATATCGTTAGTAGTGACTTTCTGT
CCAACATGAGTATGGACTTCCAGAATCACCTTGGCAGTTGCCAGAAATGTGATCCCAGCTGCCCA
AACGGGAGCTGCTGGGGAGCTGGGGAAGAAAACTGCCAGAAACTCACTAAAATCATATGCGCTCA
ACAATGCTCTGGCAGGTGCAGAGGCAAAAGCCCTTCCGACTGTTGCCATAACCAGTGTGCAGCTG
GATGTACTGGGCCGAGGGAAAGCGATTGCCTTGTCTGTAGAAAGTTTCGGGACGAAGCCACCTGT
AAGGATACTTGTCCACCCCTGATGCTCTATAATCCTACGACCTACCAAATGGATGTTAATCCGGA
AGGAAAATACTCCTTCGGCGCCACCTGTGTGAAGAAGTGCCCGCGGAATTACGTTGTGACAGATC
ATGGGTCTTGCGTCCGAGCCTGTGGTGCAGACTCTTATGAGATGGAAGAGGATGGGGTGAGGAAA
TGCAAGAAATGCGAGGGGCCATGCAGGAAGGTATGCAATGGAATTGGCATAGGTGAGTTTAAAGA
TTCACTGAGCATCAACGCGACAAACATTAAACACTTCAAGAACTGCACGTCCATATCTGGAGATC
TTCATATTCTTCCGGTGGCTTTCCGAGGAGATTCTTTCACCCATACACCACCCTTAGACCCTCAA
GAGCTGGACATATTGAAAACAGTTAAAGAGATTACAGGCTTTCTGCTTATCCAAGCTTGGCCTGA
AAATAGGACGGATCTCCATGCCTTCGAAAATCTGGAGATCATCAGAGGACGCACAAAGCAGCACG
GACAGTTTTCCCTGGCGGTGGTGTCTCTCAATATAACTTCACTTGGCTTACGCAGCCTCAAAGAA
ATTTCCGACGGAGATGTCATCATAAGTGGAAATAAGAATCTCTGTTACGCTAATACCATCAATTG
GAAGAAACTCTTCGGAACATCTGGACAAAAGACAAAGATCATTAGCAACCGCGGGGAGAACAGCT
GTAAGGCTACCGGACAAGTCTGTCACGCACTTTGTTCTCCAGAGGGATGTTGGGGCCCAGAGCCT
CGTGATTGTGTGTCCTGCAGGAACGTCAGCCGCGGCAGAGAGTGCGTTGACAAATGTAATCTCCT
CGAGGGCGAGCCTCGCGAATTCGTTGAGAACAGTGAGTGCATTCAGTGTCATCCAGAGTGCTTGC
CACAAGCTATGAATATCACCTGTACTGGACGCGGACCTGATAACTGCATCCAATGTGCTCACTAT
ATAGATGGGCCACATTGTGTGAAAACTTGTCCAGCTGGTGTAATGGGAGAAAACAACACATTAGT
TTGGAAATACGCAGATGCTGGTCACGTGTGTCACCTTTGTCATCCTAACTGCACCTACGGGTGTA
CTGGTCCAGGCCTCGAAGGTTGTCCGACCAACGGCCCAAAGATTCCTTCAATTGCAACAGGCATG
GTCGGGGCCTTGTTGTTGTTGCTGGTTGTGGCACTCGGTATTGGCCTGTTTATGAGACGGCGGCA
CATTGTGCGTAAAAGAACATTACGTCGCCTCTTACAGGAACGAGAACTGGTAGAGCCGCTCACAC
CTTCTGGGGAAGCACCGAATCAAGCCCTGCTCCGTATATTAAAGGAAACTGAGTTTAAGAAAATT
AAAGTACTGGGATCCGGCGCTTTTGGTACAGTTTATAAAGGGCTGTGGATACCTGAAGGGGAGAA
GGTGAAGATCCCTGTCGCCATCAAGGAATTGCGAGAAGCGACTTCCCCCAAAGCCAATAAAGAGA
TTCTGGATGAGGCCTATGTCATGGCTTCTGTGGACAACCCTCATGTTTGTCGCCTGTTAGGCATC
TGTCTTACTAGCACTGTCCAACTTATCACTCAGTTGATGCCGTTCGGGTGCCTTCTGGACTACGT
TAGAGAACATAAAGATAATATCGGTAGCCAGTATCTCCTGAATTGGTGTGTTCAAATAGCAAAAG
GCATGAATTACCTCGAAGATCGGCGGCTGGTTCATCGCGACCTGGCTGCTCGGAATGTCCTTGTC
AAAACACCCCAACACGTAAAGATAACAGACTTTGGGCTGGCTAAGCTTCTTGGCGCTGAGGAAAA
GGAGTATCACGCTGAAGGCGGAAAGGTCCCCATTAAGTGGATGGCTCTGGAATCCATCTTGCACA
GGATATACACTCACCAATCAGATGTCTGGTCCTATGGGGTAACAGTATGGGAACTGATGACCTTC
GGCTCCAAGCCATATGATGGAATACCTGCGAGTGAGATAAGCTCCATTTTGGAGAAAGGAGAGAG
GTTACCGCAGCCGCCCATATGTACAATTGATGTCTACATGATTATGGTCAAGTGCTGGATGATTG
ATGCAGATAGCCGGCCGAAATTCCGCGAATTGATTATTGAATTTAGCAAAATGGCCCGCGACCCA
CAGCGCTATTTGGTTATTCAAGGGGACGAGAGGATGCATCTGCCAAGCCCAACTGACAGCAACTT
TTACCGCGCCCTTATGGACGAAGAGGATATGGACGACGTTGTGGATGCTGATGAATATTTGATTC
CTCAACAAGGGTTCTTCAGTTCTCCCTCAACTTCCAGAACCCCACTGCTTTCAAGTTTATCCGCA
ACTAGTAATAATAGCACCGTTGCATGTATTGATCGGAATGGGCTGCAAAGCTGCCCCATCAAGGA
AGACTCATTCTTACAACGTTACTCATCTGATCCCACTGGGGCGCTGACTGAAGACTCAATCGACG
ACACCTTTCTTCCAGTCCCGGAGTATATCAACCAAAGTGTCCCCAAGAGACCTGCCGGATCCGTG
CAAAACCCTGTTTATCATAATCAACCTCTCAATCCAGCGCCGTCTAGGGATCCTCATTATCAAGA
TCCCCACTCTACTGCTGTAGGGAACCCAGAGTATCTCAATACAGTTCAACCCACTTGCGTCAACT
CTACCTTTGACAGTCCTGCCCATTGGGCGCAGAAAGGTTCCCATCAGATCTCCCTGGACAATCCA
GACTATCAACAAGATTTCTTTCCTAAAGAAGCCAAACCCAATGGGATATTCAAAGGATCTACCGC
AGAGAATGCGGAATATCTGCGCGTGGCACCCCAAAGCAGCGAATTTATCGGGGCTTAG
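One way to sanity-check that codon optimization changed only the codons and not the protein is to translate both DNA sequences and compare the results. The sketch below builds the standard genetic code table and translates the first 21 nucleotides of each sequence above; the function name `translate` is my own, and the full sequences can be pasted in the same way.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop line breaks from FASTA-style wrapping
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    reverse_translated = "ATGCGTCCGTCCGGCACTGCA"  # first 21 nt of the sequence in 3.2
    codon_optimized = "ATGAGACCCAGTGGAACCGCC"     # first 21 nt of the sequence in 3.3
    print(translate(reverse_translated), translate(codon_optimized))
```

If the two translations differ anywhere, the optimization step introduced an amino acid change and should be rerun.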

3.4. You have a sequence! Now what?

Both cell-dependent and cell-free technologies could be used to produce the protein from this DNA.

Cell-dependent method: First, insert the DNA sequence into a plasmid vector, and then transform it into a host cell (e.g. E. coli). The host cell will transcribe the DNA into mRNA, which will then be translated into the protein.

Part 4: Prepare a Twist DNA Synthesis Order

I have done everything in Part 3 using the Twist Bioscience website, and followed the tutorial to finish the remaining steps.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

I would want to sequence the DNA of a cancer cell, because we can learn which mutations led to the cancer, and use corresponding drugs (if available) to treat the cancer.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Illumina sequencing.

Q1: Is your method first-, second- or third-generation or other? How so?

Second-generation, because it is based on massively parallel sequencing by synthesis of short reads.

Q2: What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

The input is the DNA extracted from the cancer cell.

Fragmentation: The DNA is fragmented into smaller pieces.

Adapter ligation: Adapters are ligated to the ends of the DNA fragments.

PCR: The DNA fragments are amplified using PCR.

Q3: What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

First, DNA fragments are immobilized on a flow cell and amplified to form clusters. Then, fluorescently labeled nucleotides are added to the flow cell. Each time a nucleotide is incorporated into the growing DNA strand by DNA polymerase, a fluorescent signal is emitted. The fluorescence is detected to determine which nucleotide was incorporated. Finally, the fluorescence signals are converted into base calls.

Q4: What is the output of your chosen sequencing technology?

Short sequence reads with per-base quality scores (typically as FASTQ files). The reads can then be aligned to a reference or assembled into a genome sequence.
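The quality scores attached to each read are Phred scores, defined as $Q = -10 \log_{10} P$, where $P$ is the probability that the base call is wrong. The sketch below converts between the two and decodes a FASTQ quality string; it assumes the Phred+33 ASCII encoding, and the function names are my own.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_quality_string(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into per-base Phred scores."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    scores = decode_quality_string("II5!")  # made-up quality string for illustration
    for q in scores:
        print(q, phred_to_error_probability(q))
```

A score of 30, for example, corresponds to a 1 in 1000 chance that the base is called incorrectly.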

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I would want to synthesize the DNA sequence of mutated EGFR found in cancers. By synthesizing the DNA sequence of EGFR and expressing it as protein, we can study its function and develop targeted drugs.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use phosphoramidite chemical synthesis for short DNA fragments (e.g. 200bp), and then Gibson assembly for assembling the short fragments into the full-length DNA sequence.

Q1: What are the essential steps of your chosen sequencing methods?

Phosphoramidite chemical synthesis: One nucleotide is added to the growing strand per cycle. Each cycle consists of four steps: deprotection, coupling, capping, and oxidation.

Gibson assembly: DNA fragments with overlapping ends are mixed together. Enzymes (exonuclease, DNA polymerase, and DNA ligase) are added to the mixture and stitch the DNA fragments together.
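At the sequence level, the outcome of Gibson assembly can be sketched as joining fragments whose ends overlap. This is only a string-level illustration that ignores the enzymology; the function name `assemble` and the default overlap of 20 bases are assumptions made for the example.

```python
def assemble(fragments: list[str], overlap: int = 20) -> str:
    """Join fragments in order, requiring each to share `overlap` bases with the next."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        if assembled[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        assembled += frag[overlap:]  # keep the shared region only once
    return assembled


if __name__ == "__main__":
    # Toy fragments with 4-base overlaps (real designs use longer overlaps).
    print(assemble(["AAACCCGGG", "CGGGTTTA", "TTTAGC"], overlap=4))
```

Designing the synthesized fragments so that each neighboring pair shares a unique overlap is what fixes the order in which they are stitched together.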

5.3 DNA Edit

(i) What DNA would you want to edit and why?

I would want to edit the DNA sequence of a gene that is mutated in a genetic disease, such as cystic fibrosis. By editing the DNA sequence to correct the mutation, we can potentially cure the disease.

Q1: How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 is a commonly used technology for DNA editing.

First, a guide RNA (gRNA) is designed to target the specific DNA sequence to be edited. The gRNA is then complexed with the Cas9 protein to form a ribonucleoprotein (RNP) complex, which is delivered into the target cells. The RNP binds to the target DNA sequence and creates a double-strand break. The cell’s repair mechanisms then repair the break according to a repair template.

Q2: What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

Preparation and input: a gRNA sequence that matches the edit site (find), a repair template carrying the corrected sequence (replace), Cas9 protein, and the target cells.
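The gRNA design step can be sketched as a search for candidate target sites. The commonly used Cas9 from Streptococcus pyogenes requires an NGG PAM immediately after a 20 nt target, so the sketch below scans for that pattern. It is a simplification: it checks only the forward strand and does no off-target scoring, and `find_guides` is a hypothetical helper name.

```python
def find_guides(seq: str, guide_len: int = 20) -> list[tuple[int, str, str]]:
    """List (start, protospacer, PAM) for every NGG PAM on the forward strand."""
    seq = seq.upper()
    guides = []
    for start in range(len(seq) - guide_len - 2):
        pam = seq[start + guide_len : start + guide_len + 3]
        if pam[1:] == "GG":
            guides.append((start, seq[start : start + guide_len], pam))
    return guides


if __name__ == "__main__":
    # Made-up sequence: a 20 nt target followed by an AGG PAM.
    demo = "GATTACAGATTACAGATTAC" + "AGG" + "TT"
    for start, protospacer, pam in find_guides(demo):
        print(start, protospacer, pam)
```

In practice the candidates would then be filtered for proximity to the edit site and for similarity to other genomic sites.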

Q3: What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The main limitation is the off-target effect, where the edits are applied to unintended sites.

Week 03 HW: Lab Automation

Python Script for Opentrons Artwork

I created a design using opentrons-art.rcdonovan.com

Opentrons-Art Website: https://opentrons-art.rcdonovan.com/?id=80fx569l8o4tho4

Google Colab: https://colab.research.google.com/drive/1UPiCmwBP3sIFD_rNVRHeT3YhuiQQ5ZGP#scrollTo=pczDLwsq64mk&line=6&uniqifier=1

The OpentronMock gives the following output:

Code:

  ###
  ### YOUR CODE HERE to create your design
  ###
  sfgfp_points = [(-3.3, -3.3),(-1.1, -3.3),(1.1, -3.3),(3.3, -3.3),(-3.3, -5.5),(-1.1, -5.5),(1.1, -5.5),(3.3, -5.5),(-1.1, -7.7),(1.1, -7.7),(-5.5, -9.9),(-1.1, -9.9),(1.1, -9.9),(5.5, -9.9),(-3.3, -12.1),(-1.1, -12.1),(1.1, -12.1),(3.3, -12.1)]
  mrfp1_points = [(-23.1, 27.5),(-20.9, 27.5),(-18.7, 27.5),(18.7, 27.5),(20.9, 27.5),(23.1, 27.5),(-23.1, 25.3),(-20.9, 25.3),(-18.7, 25.3),(-16.5, 25.3),(-14.3, 25.3),(14.3, 25.3),(16.5, 25.3),(18.7, 25.3),(20.9, 25.3),(23.1, 25.3),(-23.1, 23.1),(-20.9, 23.1),(-18.7, 23.1),(-16.5, 23.1),(-14.3, 23.1),(-12.1, 23.1),(-3.3, 23.1),(-1.1, 23.1),(1.1, 23.1),(3.3, 23.1),(12.1, 23.1),(14.3, 23.1),(16.5, 23.1),(18.7, 23.1),(20.9, 23.1),(23.1, 23.1),(-23.1, 20.9),(-20.9, 20.9),(-18.7, 20.9),(-16.5, 20.9),(-14.3, 20.9),(-12.1, 20.9),(-9.9, 20.9),(-7.7, 20.9),(-5.5, 20.9),(-3.3, 20.9),(-1.1, 20.9),(1.1, 20.9),(3.3, 20.9),(5.5, 20.9),(7.7, 20.9),(9.9, 20.9),(12.1, 20.9),(14.3, 20.9),(16.5, 20.9),(18.7, 20.9),(20.9, 20.9),(23.1, 20.9),(-23.1, 18.7),(-20.9, 18.7),(-18.7, 18.7),(-16.5, 18.7),(-14.3, 18.7),(-12.1, 18.7),(-9.9, 18.7),(-7.7, 18.7),(7.7, 18.7),(9.9, 18.7),(12.1, 18.7),(14.3, 18.7),(16.5, 18.7),(18.7, 18.7),(20.9, 18.7),(23.1, 18.7),(-23.1, 16.5),(-20.9, 16.5),(-18.7, 16.5),(-16.5, 16.5),(-14.3, 16.5),(-12.1, 16.5),(12.1, 16.5),(14.3, 16.5),(16.5, 16.5),(18.7, 16.5),(20.9, 16.5),(23.1, 16.5),(-23.1, 14.3),(-20.9, 14.3),(-18.7, 14.3),(-16.5, 14.3),(16.5, 14.3),(18.7, 14.3),(20.9, 14.3),(23.1, 14.3),(-23.1, 12.1),(-20.9, 12.1),(-18.7, 12.1),(18.7, 12.1),(20.9, 12.1),(23.1, 12.1),(-23.1, 9.9),(-20.9, 9.9),(20.9, 9.9),(23.1, 9.9),(-23.1, 7.7),(-20.9, 7.7),(20.9, 7.7),(23.1, 7.7),(-23.1, 5.5),(23.1, 5.5),(-25.3, 3.3),(-23.1, 3.3),(23.1, 3.3),(25.3, 3.3),(-25.3, 1.1),(-23.1, 1.1),(23.1, 1.1),(25.3, 1.1),(-25.3, -1.1),(-23.1, -1.1),(23.1, -1.1),(25.3, -1.1),(-25.3, -5.5),(-23.1, -5.5),(23.1, -5.5),(25.3, -5.5),(-25.3, -7.7),(25.3, -7.7),(-23.1, -9.9),(23.1, -9.9),(-23.1, -12.1),(23.1, -12.1),(-23.1, -14.3),(23.1, -14.3),(-20.9, -16.5),(20.9, -16.5),(-20.9, -18.7),(-18.7, -18.7),(18.7, -18.7),(20.9, -18.7),(-18.7, -20.9),(-16.5, -20.9),(16.5, -20.9),(18.7, -20.9),(-16.5, -23.1),(-14.3, -23.1),(-12.1, -23.1),(12.1, -23.1),(14.3, -23.1),(16.5, -23.1),(-14.3, -25.3),(-12.1, 
-25.3),(-9.9, -25.3),(-7.7, -25.3),(7.7, -25.3),(9.9, -25.3),(12.1, -25.3),(14.3, -25.3),(-9.9, -27.5),(-7.7, -27.5),(-5.5, -27.5),(-3.3, -27.5),(-1.1, -27.5),(1.1, -27.5),(3.3, -27.5),(5.5, -27.5),(7.7, -27.5),(9.9, -27.5),(-1.1, -29.7),(1.1, -29.7)]
  azurite_points = [(-9.9, 7.7),(-7.7, 7.7),(7.7, 7.7),(9.9, 7.7),(-12.1, 5.5),(-9.9, 5.5),(-7.7, 5.5),(7.7, 5.5),(9.9, 5.5),(12.1, 5.5),(-9.9, 3.3),(9.9, 3.3)]
  mwasabi_points = [(-27.5, -3.3),(-25.3, -3.3),(-23.1, -3.3),(-20.9, -3.3),(-18.7, -3.3),(-16.5, -3.3),(16.5, -3.3),(18.7, -3.3),(20.9, -3.3),(23.1, -3.3),(25.3, -3.3),(27.5, -3.3),(-23.1, -7.7),(-20.9, -7.7),(-18.7, -7.7),(-16.5, -7.7),(16.5, -7.7),(18.7, -7.7),(20.9, -7.7),(23.1, -7.7),(-27.5, -9.9),(-25.3, -9.9),(25.3, -9.9),(27.5, -9.9),(-16.5, -12.1),(16.5, -12.1),(-20.9, -14.3),(-18.7, -14.3),(18.7, -14.3),(20.9, -14.3),(-23.1, -16.5),(23.1, -16.5),(-25.3, -18.7),(25.3, -18.7)]

  scale = 1

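  def draw_points(points, color="Red"):
    # Split the points into batches of at most 20, because the 20 µL pipette
    # holds enough liquid for 20 dispenses of 1 µL each.
    segments = []
    for i in range(0, len(points), 20):
      segments.append(points[i : i+20])
    for seg in segments:
      # Use a fresh tip per batch and aspirate 1 µL for every point in it.
      pipette_20ul.pick_up_tip()
      pipette_20ul.aspirate(len(seg), location_of_color(color))
      for x, y in seg:
        # Offsets are relative to the plate center and multiplied by `scale`.
        adjusted_location = center_location.move(types.Point(x=x*scale, y=y*scale))
        dispense_and_detach(pipette_20ul, 1, adjusted_location)
      pipette_20ul.drop_tip()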

  draw_points(sfgfp_points, "Red")
  draw_points(mrfp1_points, "Green")
  draw_points(azurite_points, "Orange")
  draw_points(mwasabi_points, "Orange")
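
The batching step in draw_points can be tried outside the robot context. The following is a minimal standalone sketch; the helper name batch_points and the placeholder coordinates are assumptions made for illustration and are not part of the Colab notebook.

```python
def batch_points(points, batch_size=20):
    """Split a list of points into consecutive batches of at most batch_size."""
    return [points[i:i + batch_size] for i in range(0, len(points), batch_size)]


# Hypothetical design with 34 points (placeholder coordinates, not the real art).
demo_points = [(float(i), 0.0) for i in range(34)]
batches = batch_points(demo_points)

# Each batch corresponds to one tip pick-up and one aspirate in draw_points.
print([len(b) for b in batches])
```

A design with 34 points would therefore need two tips: one for the first 20 points and one for the remaining 14.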

Result

With the help of our TA Ronan, the art was printed with an Opentrons robot. The result is shown below:

Published paper about lab automation

PyLabRobot: An open-source, hardware-agnostic interface for liquid-handling robots and accessories. Wierenga, Rick P. et al. Device, Volume 1, Issue 4, 100111

This paper introduces PyLabRobot, an open-source Python library that provides a unified interface for controlling various liquid-handling robots and accessories, including Opentrons. PyLabRobot also includes a simulator (like the OpentronMock provided in this homework’s Google Colab notebook), which allows users to test and debug their protocols without needing access to the physical robot. Further, this paper also demonstrates the integration with LLMs, allowing users who are not familiar with programming to create protocols using natural language instructions, which are then translated into executable code for the robot.

My plan to use automation tools

I am interested in using lab automation for machine-learning-guided directed evolution of PETase (the PET plastic degradation enzyme).

First, I will use machine learning models such as ProteinMPNN to design an initial library of PETase variants, and order the DNA fragments encoding these variants from Twist Bioscience.

Second, I will use a liquid handler to assemble the DNA fragments into plasmids, and then transform the plasmids into E. coli cells.

Then, I will use a plate reader to measure the activity of the PETase variants in degrading PET plastic. This can also be done in a high-throughput manner using 96-well or 384-well plates with an automation robot.

Finally, I will use the activity data to train a machine learning model to predict the activity of new PETase variants, and then use the model to design the next round of variants for testing. This iterative process can be repeated until we find highly active PETase variants for degrading PET plastic.

Week 04 HW: Protein Design Part I

Part A: Conceptual Questions

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

1 Dalton = 1.66 x 10^-24 grams, so an average amino acid of 100 Daltons weighs 1.66 x 10^-22 grams (equivalently, 100 grams per mole). Assuming about 20% of meat is protein, 500 grams of meat contain about 100 grams of protein. Therefore: 100 grams / (1.66 x 10^-22 grams per amino acid) ≈ 6.02 x 10²³ molecules of amino acids, i.e. about 1 mole.
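
The arithmetic can be checked with a short script. This is only a sketch: the 20% protein fraction is an assumption, and the 100 Da average mass comes from the question itself.

```python
AVOGADRO = 6.022e23          # molecules per mole

meat_g = 500.0               # mass of the piece of meat
protein_fraction = 0.20      # assumed protein content of meat
aa_mass_g_per_mol = 100.0    # ~100 Da per amino acid, i.e. ~100 g/mol

protein_g = meat_g * protein_fraction            # grams of protein
aa_moles = protein_g / aa_mass_g_per_mol         # moles of amino acids
aa_molecules = aa_moles * AVOGADRO               # number of amino acid molecules

print(f"{protein_g:.0f} g protein -> {aa_moles:.2f} mol -> {aa_molecules:.2e} amino acids")
```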

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

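We digest dietary proteins into amino acids, and our cells reassemble those amino acids into human proteins according to our own genome. The proteins and DNA we eat are not incorporated into our genome.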

3. Why are there only 20 natural amino acids?

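The 20 natural amino acids already span the main side-chain chemistries (hydrophobic, polar, charged, aromatic), which is sufficient chemical diversity to build functional proteins efficiently.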

4. Can you make other non-natural amino acids? Design some new amino acids.

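Yes. For example, we can replace the sulfur atom in cysteine with selenium to create selenocysteine. Note that selenocysteine also occurs naturally in some proteins, so it is non-canonical rather than strictly non-natural.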

5. Where did amino acids come from before enzymes that make them, and before life started?

Chemical reactions in the environment.

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Left-handed.

7. Can you discover additional helices in proteins?

Yes. For example, the pi-helix and the 3-10-helix.

8. Why are most molecular helices right-handed?

Proteins in living organisms are built almost exclusively from L-amino acids, and the stereochemistry of L-amino acids favors right-handed helices.

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

Because of the parallel and anti-parallel hydrogen bonding between the strands. The driving force is the formation of hydrogen bonds.

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

I chose EGFR (Epidermal Growth Factor Receptor), because it is a protein that plays a critical role in cell growth and division, and it is often mutated in various cancers.

2. Identify the amino acid sequence of your protein

MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALAVLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFKNCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAFENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKLFGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVMGENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSIATGMVGALLLLLVVALGIGLFMRRRHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAARNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQQGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTEDSIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLNTVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENAEYLRVAPQSSEFIGA

Sequence Length: 1210 amino acids

Amino Acid Frequencies:
L: 111 (9.17%)
G: 85 (7.02%)
S: 84 (6.94%)
E: 77 (6.36%)
P: 75 (6.20%)
A: 72 (5.95%)
V: 70 (5.79%)
I: 69 (5.70%)
K: 66 (5.45%)
N: 66 (5.45%)
T: 64 (5.29%)
D: 61 (5.04%)
R: 60 (4.96%)
C: 60 (4.96%)
Q: 49 (4.05%)
F: 36 (2.98%)
Y: 36 (2.98%)
H: 31 (2.56%)
M: 25 (2.07%)
W: 13 (1.07%)
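
A frequency table like the one above can be produced with a few lines of Python. This is a minimal sketch and not necessarily how the numbers above were generated; the function name aa_frequencies is an assumption, and the demo uses only the first 20 residues of the EGFR sequence.

```python
from collections import Counter


def aa_frequencies(sequence):
    """Return (residue, count, percent) tuples sorted by decreasing count."""
    seq = "".join(sequence.split()).upper()   # drop whitespace and line breaks
    counts = Counter(seq)
    total = len(seq)
    return [(aa, n, 100.0 * n / total) for aa, n in counts.most_common()]


# First 20 residues of the EGFR sequence above, used only as a short demo.
demo = "MRPSGTAGAALLALLAALCP"
table = aa_frequencies(demo)
for aa, n, pct in table:
    print(f"{aa}: {n} ({pct:.2f}%)")
```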

BLAST results (250 hits found in UniProtKB): https://www.uniprot.org/blast/uniprotkb/ncbiblast-R20260301-211408-0195-52595182-p2m/overview

Family: Protein kinase superfamily. Tyr protein kinase family. EGF receptor subfamily.

3. Identify the structure page of your protein in RCSB

There are multiple structures of different EGFR domains in RCSB PDB. The earliest one is 1M14 deposited on 2002-06-17. Resolution is 2.60 Å, so it is a good quality structure.

There are water molecules in the structure, but no ligands or cofactors.

Structure classification family:

Structural Class: Alpha and beta proteins (a+b)
Fold: Protein kinase-like (PK-like)
Superfamily: Protein kinase-like (PK-like)
Family: Protein kinases catalytic domain-like

4. Open the structure of your protein in any 3D molecule visualization software:

Cartoon:

Ribbon:

Ball and stick:

Secondary structure (6 alpha helices and 6 beta strands):

Residue type (green: hydrophilic, gray: hydrophobic. Hydrophilic residues are more on the surface, while hydrophobic residues are more buried inside):

Surface (it has obvious pockets on the surface, as it is a receptor):

Part C. Using ML-Based Protein Design Tools

In this part, I will use a different protein sequence: Poly(ethylene terephthalate) hydrolase (PETase) (https://www.uniprot.org/uniprotkb/A0A0K8P6T7/entry), which is an enzyme that can degrade PET plastics. The PDB accession code is 5XFY.

MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPSSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS

C1. Protein Language Modeling

1. Deep Mutational Scans

Columns with more dark cells (the wild-type amino acid is strongly preferred) indicate more conserved residues, which are likely important for the structure and function of the protein.

2. Latent Space Analysis

Neighboring proteins usually share the same SCOPe structural class and superfamily number, which indicates that the latent space captures structural and functional similarities between proteins.

When placing my PETase sequence embedding into the dimensionality reduction plot, many of its neighbors belong to c.69: alpha/beta-Hydrolases.

C2. Protein Folding

The protein on the left is the experimental structure of PETase (PDB: 5XFY), and the protein on the right is the structure predicted by ESMFold. The two structures are very similar: the RMSD reported by PyMOL is 0.540 Å, and ESMFold is confident in its prediction (predicted TM-score of 0.913).

Introducing a few point mutations has no visible impact on the predicted structure. However, changing large fragments sometimes disrupts the predicted structure.

C3. Protein Generation

I used ProteinMPNN to design a sequence based on the structure of the PETase I chose (PDB: 5XFY). The designed sequence is:

NPTVLGPEPTRESLEAPRGPFAVESFEVAAPQGFGAGTVYWPRDAGGKVPAIAIAPGYGQGRAAVAWKGELLASHGFVVLVIDPRSPTSDAPQIAAELMAGLAYLDALNADPASPIYGKIDTSRRGVSGHSLGGGGALIAAMENPELKAAAPMAPYHPETDFSKITVPTLIFASENDTIAPPEKYSKPMYNSITKAPKRLLTIKGGDHGATLTGNPHRGLIGRYLVAWFALYMRDDKRYSEFATENPDSDDVSYWESSNLS

1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

For about half of the sequence, the highest probability amino acid is the same as the original sequence. For the other half, the predicted amino acid is different from the original one, but they often have similar properties (e.g., both are hydrophobic or both are polar).

2. Input this sequence into ESMFold and compare the predicted structure to your original.

The protein on the left is the experimental structure of PETase (PDB: 5XFY), and the protein on the right is the structure predicted by ESMFold from the ProteinMPNN-designed sequence. The predicted structure of the designed sequence is very similar to the original structure (RMSD of 0.756 Å), and ESMFold is very confident about the prediction (predicted TM-score = 0.944).

Week 05 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

The original sequence of SOD1 is:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Mutate the 4th amino acid A to V (A4V):

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:

index	Binder	Pseudo Perplexity
0	HLYYAVALELKX	13.299815648347872
1	WRSYAVVLELWK	17.97100111129112
2	WRYYPVAAAWKK	11.081842724779028
3	WHYGAVGLRHKX	13.983770011694478

The pseudo-perplexity of the reference SOD1-binding peptide FLYRWLPSRRGG is 20.63523127283615:

ppl_value = compute_pseudo_perplexity(model, tokenizer, protein_seq, "FLYRWLPSRRGG")
ppl_value  # Output: 20.63523127283615
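
For intuition, pseudo-perplexity can be sketched as the exponential of the negative mean log-probability that a masked language model assigns to each true residue when that residue is masked. The snippet below assumes this standard definition; it is not the notebook's compute_pseudo_perplexity, and the per-residue log-probabilities are placeholders rather than model outputs.

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp(-mean(log p_i)), where p_i is the model probability of the true
    residue at position i when that position is masked."""
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Placeholder: a model that is uniformly unsure among the 20 amino acids
# at every position of a 12-residue peptide.
uniform_log_probs = [math.log(1 / 20)] * 12
print(pseudo_perplexity(uniform_log_probs))
```

Under this definition, a model that guesses uniformly among 20 amino acids scores about 20, so lower values mean the model finds the peptide more plausible in the context of the target.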

Part 2: Evaluate Binders with AlphaFold3

The ipTM scores for the reference peptide against the wild-type and mutant SOD1 are both fairly low (0.36 and 0.41, respectively), indicating that AlphaFold3 is not very confident about the predicted binding interface. The first three generated peptides have ipTM scores of 0.24, 0.25, and 0.32, which are lower than the reference. The last generated peptide has an ipTM score of 0.43, which is slightly higher than the reference.

Only the third generated peptide (A4V-2) is predicted to bind to the dimerization interface of SOD1.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Predicted properties of the generated peptides:

Sequence	Property	Prediction	Value	Unit
HLYYAVALELK	💧 Solubility	Soluble	1.000	Probability
HLYYAVALELK	🩸 Hemolysis	Non-hemolytic	0.081	Probability
HLYYAVALELK	🔗 Binding Affinity	Weak binding	6.052	pKd/pKi
HLYYAVALELK	📏 Length		11	aa
HLYYAVALELK	⚖️ Molecular Weight		1319.5	Da
HLYYAVALELK	⚡ Net Charge (pH 7)		-0.15
HLYYAVALELK	🎯 Isoelectric Point		6.75	pH
HLYYAVALELK	💦 Hydrophobicity (GRAVY)		0.55	GRAVY
WRSYAVVLELWK	💧 Solubility	Soluble	1.000	Probability
WRSYAVVLELWK	🩸 Hemolysis	Non-hemolytic	0.130	Probability
WRSYAVVLELWK	🔗 Binding Affinity	Weak binding	6.818	pKd/pKi
WRSYAVVLELWK	📏 Length		12	aa
WRSYAVVLELWK	⚖️ Molecular Weight		1549.8	Da
WRSYAVVLELWK	⚡ Net Charge (pH 7)		0.76
WRSYAVVLELWK	🎯 Isoelectric Point		8.59	pH
WRSYAVVLELWK	💦 Hydrophobicity (GRAVY)		0.17	GRAVY
WRYYPVAAAWKK	💧 Solubility	Soluble	1.000	Probability
WRYYPVAAAWKK	🩸 Hemolysis	Non-hemolytic	0.021	Probability
WRYYPVAAAWKK	🔗 Binding Affinity	Weak binding	6.124	pKd/pKi
WRYYPVAAAWKK	📏 Length		12	aa
WRYYPVAAAWKK	⚖️ Molecular Weight		1538.8	Da
WRYYPVAAAWKK	⚡ Net Charge (pH 7)		2.76
WRYYPVAAAWKK	🎯 Isoelectric Point		10.00	pH
WRYYPVAAAWKK	💦 Hydrophobicity (GRAVY)		-0.72	GRAVY
WHYGAVGLRHK	💧 Solubility	Soluble	1.000	Probability
WHYGAVGLRHK	🩸 Hemolysis	Non-hemolytic	0.023	Probability
WHYGAVGLRHK	🔗 Binding Affinity	Weak binding	5.442	pKd/pKi
WHYGAVGLRHK	📏 Length		11	aa
WHYGAVGLRHK	⚖️ Molecular Weight		1323.5	Da
WHYGAVGLRHK	⚡ Net Charge (pH 7)		1.93
WHYGAVGLRHK	🎯 Isoelectric Point		9.99	pH
WHYGAVGLRHK	💦 Hydrophobicity (GRAVY)		-0.73	GRAVY

Predicted properties of the reference peptide:

Sequence	Property	Prediction	Value	Unit
FLYRWLPSRRGG	💧 Solubility	Soluble	1.000	Probability
FLYRWLPSRRGG	🩸 Hemolysis	Non-hemolytic	0.047	Probability
FLYRWLPSRRGG	🔗 Binding Affinity	Weak binding	5.968	pKd/pKi
FLYRWLPSRRGG	📏 Length		12	aa
FLYRWLPSRRGG	⚖️ Molecular Weight		1507.7	Da
FLYRWLPSRRGG	⚡ Net Charge (pH 7)		2.76
FLYRWLPSRRGG	🎯 Isoelectric Point		11.71	pH
FLYRWLPSRRGG	💦 Hydrophobicity (GRAVY)		-0.71	GRAVY

The peptide WHYGAVGLRHK has the highest ipTM score (0.43), but a relatively low predicted binding affinity (5.442 pKd/pKi). The peptide WRSYAVVLELWK has a lower ipTM score (0.25) but the highest predicted binding affinity among the generated peptides (6.818 pKd/pKi). None of the generated peptides is predicted to be hemolytic or poorly soluble. I would pick WRSYAVVLELWK as the best balance between predicted binding and therapeutic properties, mainly because of its predicted binding affinity, with the caveat that its ipTM score is low.

Part 4: Generate Optimized Peptides with moPPIt

Generated peptide sequence with predicted solubility score, affinity score, and hemolysis score:

['DFRQSTTYQY']
[0.9166666865348816, 6.323781490325928, 0.7198045253753662]

The moPPIt-generated peptide DFRQSTTYQY has a predicted binding affinity score of 6.32 and a predicted solubility score of 0.92. Assuming the scores are comparable with the PeptiVerse predictions above, its affinity is higher than three of the four PepMLM-generated peptides (only WRSYAVVLELWK, at 6.818, scores higher). Before advancing this peptide to clinical studies, I would evaluate its binding affinity experimentally in vitro, and further assess its stability, toxicity, and pharmacokinetic properties in cell and animal models.

Part C: L-Protein Mutants

I first used Boltz-2 to predict the complex structure of the wild-type L-protein and DnaJ protein:

Next, I used FoldX, a force field-based protein design tool that can predict the effects of mutations on protein-protein interfaces. The goal is to identify mutations in the L-protein that are energetically favorable to stabilize the interaction with DnaJ.

To do this, I first relax the sidechain structure of the L-protein using the following command:

foldx --command=RepairPDB --pdb=result.pdb

The relaxation process slightly adjusts sidechain conformations to minimize steric clashes and optimize interactions. The resulting relaxed structure is shown below in blue (green and cyan are the original structure predicted by Boltz-2):

Next, I scan through L-protein and mutate each residue to all 20 amino acids, and compute the change in binding energy (ΔΔG) for each mutation using the following command:

# Soluble
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=MA1a,EA2a,TA3a,RA4a,FA5a,PA6a,QA7a,QA8a,SA9a,QA10a,QA11a,TA12a,PA13a,AA14a,SA15a,TA16a,NA17a,RA18a,RA19a,RA20a,PA21a,FA22a,KA23a,HA24a,EA25a,DA26a,YA27a,PA28a,CA29a,RA30a,RA31a,QA32a,QA33a,RA34a,SA35a,SA36a,TA37a,LA38a,YA39a,VA40a

# Transmembrane
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=LA41a,IA42a,FA43a,LA44a,AA45a,IA46a,FA47a,LA48a,SA49a,KA50a,FA51a,TA52a,NA53a,QA54a,LA55a,LA56a,LA57a,SA58a,LA59a,LA60a,EA61a,AA62a,VA63a,IA64a,RA65a,TA66a,VA67a,TA68a,TA69a,LA70a,QA71a,QA72a,LA73a,LA74a,TA75a
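
The long --positions arguments do not need to be typed by hand. The sketch below builds them from a sequence, following the pattern visible in the commands above (residue letter, chain ID, residue number, then a trailing "a"). The helper name foldx_positions is an assumption made for illustration and is not part of FoldX.

```python
def foldx_positions(sequence, chain="A", start=1):
    """Build a comma-separated position list such as 'MA1a,EA2a,TA3a'."""
    return ",".join(
        f"{aa}{chain}{number}a"
        for number, aa in enumerate(sequence.upper(), start=start)
    )


# First five residues of the soluble region and first three of the TM region,
# read off the position lists in the commands above.
print(foldx_positions("METRF"))
print(foldx_positions("LIF", start=41))
```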

The results for mutations in the soluble region are shown below. Green indicates mutations that are predicted to stabilize the interaction (negative ΔΔG), while red indicates mutations that are predicted to destabilize the interaction (positive ΔΔG):

The results for mutations in the transmembrane (TM) region are shown below:

Based on the results above, I would propose the following multi-site mutations in the soluble region (each sum adds up the single-mutation ΔΔG values, in kcal/mol):

DA26L + NA17W + CA29W: sum = −9.23
DA26L + EA25P + QA8T + FA22H: sum = −9.03
DA26L + NA17W + RA4E + SA9F: sum = −9.95
DA26L + EA2D + FA5M + RA20N: sum = −8.69
DA26L + HA24P + PA28K + RA34M: sum = −7.63

My rationale is that combining single stabilizing mutations will have an additive effect on the overall binding affinity. However, this assumption ignores potential epistatic interactions between mutations (non-additive effects).

Week 06 HW: Genetic Circuits Part I

DNA Assembly

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion DNA Polymerase: synthesizes new DNA strands by adding nucleotides to the template strand during PCR
dNTPs (deoxynucleotides): building blocks of DNA
Buffer: provides the optimal pH and salt conditions for the polymerase, including Mg²⁺, which the polymerase requires as a cofactor

2. What are some factors that determine primer annealing temperature during PCR?

Tm (melting temperature) of the primer: the temperature at which half of the primer-template duplexes are dissociated
Primer length: longer primers generally require higher annealing temperatures
GC content: higher GC content increases the Tm and may require higher annealing temperatures
Salt concentration: higher salt concentrations can stabilize the DNA duplex and may require higher annealing temperatures
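
To make the dependence on length and GC content concrete, the sketch below uses the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in °C, a rough rule of thumb for short primers (roughly 14 to 20 nt). It ignores salt concentration, and polymerase vendors provide their own calculators, which are the better choice for real experiments. The primer sequence is a made-up placeholder.

```python
def wallace_tm(primer):
    """Rough melting temperature in °C: 2 per A/T base plus 4 per G/C base."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


demo_primer = "ATGCATGCATGCATGC"   # hypothetical 16-nt primer
print(wallace_tm(demo_primer), gc_content(demo_primer))
```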

3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR:
- Protocol: denaturation, annealing, and extension steps to amplify specific DNA sequences using primers and a DNA polymerase
- Advantages: can amplify specific DNA sequences from complex mixtures, does not require specific restriction sites, can introduce mutations or tags through primer design
- Disadvantages: may produce non-specific products, requires optimization of conditions, can be time-consuming
Restriction enzyme digests:
- Protocol: cutting DNA at specific sites using restriction enzymes
- Advantages: can generate specific fragments based on known restriction sites
- Disadvantages: requires specific restriction sites

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Create overlapping sequences at the ends of the DNA fragments that are complementary to each other, which allows for efficient assembly during Gibson cloning.
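
A quick in-silico check of this condition is to confirm that the end of each fragment matches the start of the next one. The sketch below is one way to do it; the function name, the 20 bp minimum, and the fragment sequences are assumptions for illustration, and the minimum overlap should follow the assembly kit's recommendations.

```python
def shared_overlap(upstream, downstream, min_len=20):
    """Length of the longest suffix of `upstream` that equals a prefix of
    `downstream`, or 0 if it is shorter than min_len."""
    upstream, downstream = upstream.upper(), downstream.upper()
    max_len = min(len(upstream), len(downstream))
    for n in range(max_len, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


# Hypothetical fragments sharing a 20 bp homology region.
homology = "ACGTTGCAAGCTTGGATCCA"
fragment_a = "TTTTTTTTTT" + homology
fragment_b = homology + "GGGGGGGGGG"
print(shared_overlap(fragment_a, fragment_b))
```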

5. How does the plasmid DNA enter the E. coli cells during transformation?

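Heat shock: an abrupt temperature change transiently generates pores in the cell membrane of chemically competent cells, through which the plasmid enters
Electroporation: a short high-voltage electrical pulse transiently generates pores in the cell membrane, through which the plasmid enters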

6. Describe another assembly method in detail (such as Golden Gate Assembly)

Golden Gate Assembly: assembly of multiple DNA fragments in a single reaction using a type IIS restriction enzyme and DNA ligase. Type IIS restriction enzymes cut outside their recognition site and create an overhang whose sequence can be chosen freely. Each overhang is designed to be complementary to the overhang of the neighboring fragment, and the recognition sites are removed from the final product, allowing for seamless assembly. This method is efficient and allows for the assembly of multiple fragments in a single reaction.
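
Before a Golden Gate reaction, it is useful to check that the fragments contain no unwanted internal recognition sites. The sketch below scans both strands for the BsaI site GGTCTC; choosing BsaI is an assumption here (it is one commonly used type IIS enzyme), and the test sequence is a made-up placeholder.

```python
BSAI_SITE = "GGTCTC"   # BsaI recognition sequence (enzyme choice is an assumption)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, site=BSAI_SITE):
    """Return (index, strand) for every occurrence of the site on either strand."""
    seq = seq.upper()
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


demo_seq = "AAGGTCTCAAGAGACCTT"   # hypothetical sequence with one site per strand
print(find_sites(demo_seq))
```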

Asimov Kernel

Recreate Repressilator

I recreated the Repressilator by dragging the parts one by one from the Characterized Bacterial Parts repository to the editor.

Below are the simulation results:

The results look similar to the simulation results of the original Repressilator construct, showing oscillations in the expression levels of the three proteins.

My Construct

I created a construct consisting of a pLacI promoter, an RBS, the LacI coding sequence, and a terminator. Below are the simulation results, which show that the LacI protein is expressed at a high level:

Next, I tried to remove the promoter, and the simulation results show that the LacI protein is not expressed at all:

I also tried to put the promoter after the LacI coding sequence, and the simulation results show that the LacI protein is not expressed at all:

Week 07 HW: Genetic Circuits Part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

IANNs can perform more complex computations than traditional genetic circuits, which are limited to Boolean functions. IANNs can process continuous inputs and produce continuous outputs, allowing for more nuanced control of gene expression. Additionally, IANNs can learn and adapt over time, making them more versatile and capable of handling dynamic environments.

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

One useful application for an IANN could be in the field of personalized medicine. An IANN can sense biomarkers from a patient’s blood and regulate the release of drugs in response to the biomarkers.

3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

X1 (DNA encoding Csy4) --> Tx --> Csy4 mRNA --> Tl --> Csy4 protein
X2 (DNA encoding fluorescent protein) --> Tx --> fluorescent protein mRNA (regulated by Csy4) --> Tl --> fluorescent protein output

Assignment Part 2: Fungal Materials

1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Mycelium-based materials used for packaging and insulation.

Advantages: sustainability, biodegradability, and low environmental impact.

Disadvantages: limited durability and potential for mold growth.

2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Bioproduction of biofuels or other valuable chemicals. Fungi have the ability to break down complex organic materials, making them well-suited for this purpose. Fungi can grow in a wide range of environments, which makes them a more sustainable option compared to bacteria.

Week 09 HW: Cell-Free Systems

General homework questions

1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Advantages: cell-free systems allow for precise control over the reaction conditions, such as temperature, pH, and the concentration of substrates and cofactors. This can lead to higher yields and faster protein production compared to in vivo methods.

Examples:

Production of toxic proteins that would harm living cells.
Rapid prototyping of genetic circuits without the need for transformation and cell growth.

2. Describe the main components of a cell-free expression system and explain the role of each component.

Cell extract: contains the necessary machinery for transcription and translation, including ribosomes, tRNAs, and enzymes.
Energy source: provides the necessary energy for protein synthesis, such as ATP or GTP.
DNA template: contains the genetic information for the protein to be synthesized.
Substrates: amino acids and nucleotides required for protein synthesis.
Cofactors: molecules that assist in enzymatic reactions, such as magnesium ions.
Buffer: maintains the optimal pH and ionic strength for the reaction.

3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

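Protein synthesis continuously consumes ATP (and GTP), and a cell-free reaction has only a limited ability to replenish it, so the reaction stops once the initial supply is depleted.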

Phosphoenolpyruvate (PEP): it can be converted to pyruvate by the enzyme pyruvate kinase, which generates ATP from ADP in the process.

4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic: high yields, high speed, and low cost. Suitable for simple proteins without post-translational modifications (e.g. Green Fluorescent Protein GFP).

Eukaryotic: can perform post-translational modifications, but slower and more expensive. Suitable for complex proteins that require PTM (e.g. human insulin).

5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Membrane proteins are hydrophobic (i.e. lipophilic). Therefore, I would use a cell-free system that mimics the membrane lipid environment to avoid misfolding and aggregation. Specifically, I would use a cell-free system that includes liposomes to create a suitable environment.

6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

DNA template degraded: replace the DNA template with a fresh one
Insufficient energy supply: add more energy source (e.g. ATP or PEP)

Homework question from Kate Adamala

Design an example of a useful synthetic minimal cell as follows:

1.a. What would your synthetic cell do? What is the input and what is the output?

I would design a synthetic cell that can detect viral pathogens in water samples. The input would be the water sample, and the output would be a detectable signal. For example, the cell could produce luciferase in response to the presence of viral RNA, which would emit light that can be easily detected.

1.b. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

Yes.

The cell-free system can be designed to include the necessary components for transcription and translation, as well as a reporter gene that produces a detectable signal in response to the target viral RNA.

Encapsulation is not needed for this function, as the cell-free system can operate in solution.

1.c. Could this function be realized by genetically modified natural cell?

Yes. For example, based on Noctiluca scintillans.

1.d. Describe the desired outcome of your synthetic cell operation.

Sensitive and low-cost detection of viral pathogens in water samples.

2. Design all components that would need to be part of your synthetic cell.

a. What would be the membrane made of?

Lipid bilayer.

b. What would you encapsulate inside? Enzymes, small molecules.

Enzymes for transcription and translation, a reporter gene (e.g. luciferase), substrates (luciferin) and cofactors.

c. Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)

Bacterial system is sufficient for this application.

d. How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

The viral RNA can enter the cell-free system through pores in the membrane, and the luciferin substrate can also diffuse into the system to enable the luciferase reaction.

3. Experimental details

a. List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)

Lipids: phosphatidylcholine, phosphatidylethanolamine, cholesterol.

Genes: RNA polymerase, luciferase gene.

b. How will you measure the function of your system?

Add samples of viral RNA and measure the luminescence.

Homework question from Peter Nguyen

1. Write a one-sentence summary pitch sentence describing your concept.

A cell-free system integrated into textiles that can sense environmental pollutants and change color to alert the user.

2. How will the idea work, in more detail? Write 3-4 sentences or more.

The cell-free system will be designed to detect specific pollutants, such as heavy metals or viral pathogens. When the target pollutant is detected, the system will trigger a colorimetric reaction that changes the color of the textile. This could be achieved by incorporating a reporter gene that produces a pigment in response to the pollutant. The textile could be used in clothing or face masks to provide real-time detection for the wearer.

3. What societal challenge or market need will this address?

Not reusable: the color change is irreversible, so the textile would need to be replaced after each use. This could be addressed by designing a system that can be reset or by using a reversible color change mechanism.

4. How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

To address the limitation of one-time use, I would design the system to be reversible. The color would change back when the pollutant is no longer detected.

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

1. Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

In space, the intense radiation can cause genetic variation and protein misfolding, leading to reduced functionality and potential health risks for astronauts. A cell-free system could be used to produce functional proteins in space, mitigating these issues and supporting long-term space exploration.

2. Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

The molecular target I propose to study is the DNA repair enzyme, specifically the protein RAD51, which plays a crucial role in repairing DNA damage caused by radiation.

3. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

RAD51 is essential for repairing DNA double-strand breaks, which are a common form of damage caused by space radiation. By studying RAD51 in a cell-free system, we can understand how it functions under space-like conditions and potentially enhance its activity.

4. Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

Hypothesis: Enhancing the activity of RAD51 in a cell-free system can improve DNA repair efficiency under space-like conditions.

5. Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

I will use a cell-free system to express RAD51 and expose it to simulated space radiation. I will measure the DNA repair efficiency by assessing the ability of RAD51 to repair induced DNA damage using a reporter assay. Controls will include a cell-free system without RAD51 and a system with a known DNA repair-deficient mutant of RAD51.

Week 10 HW: Imaging and Measurement

Final Project

1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

We will measure PETase thermostability (Tm), residual activity after heat challenge, expression yield, and PET-degradation rate for each variant.

2. Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

The measurable elements are fold-stability metrics (Tm and thermal survival), catalytic output (product formation rate), and protein production quality (yield/purity) across designed variants.

3. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

We will use AI-guided sequence design (ProteinMPNN/ESM), recombinant expression and purification, thermal denaturation assays, activity assays, and LC-MS to quantify and compare variant performance.

Homework: Waters Part I; Molecular Weight

1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

Theoretical pI/Mw: 5.58 / 26941.48 Da (Excluding: LEHHHHHHH)

Theoretical pI/Mw: 5.90 / 28006.60 Da (Including: LEHHHHHHH)

2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

1. Determine z for each adjacent pair of peaks (n, n+1):

z = (875.4421 - 1.007276) / (903.7148 - 875.4421) = 30.93, which rounds to z = 31 for the peak at m/z 903.7148 (the adjacent peak at m/z 875.4421 carries z + 1 = 32)

2. Determine the MW of the protein

MW = z * (m/z - 1.007276 Da) = 31 * (903.7148 - 1.007276) = 27983.93 Da

3. Calculate the accuracy of the measurement

(27983.93 - 28006.60) / 28006.60 = -0.08%
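The adjacent-charge-state calculation can be sketched in Python as below. This is a minimal sketch, assuming both peaks are [M + zH]^z+ ions of the same protein and that the lower-m/z peak carries exactly one more proton. The function names are illustrative and do not come from the recitation. It subtracts the proton mass before dividing and rounds z to an integer, because a charge state has to be a whole number.

```python
PROTON = 1.007276  # mass of a proton in Da


def adjacent_charge(mz_high: float, mz_low: float) -> int:
    """Charge of the peak at mz_high, given the neighbouring peak at lower m/z (charge z + 1)."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of the protein from an [M + zH]^z+ peak."""
    return z * (mz - PROTON)


# The two adjacent peaks used in the answer above.
z = adjacent_charge(903.7148, 875.4421)
mass_a = neutral_mass(903.7148, z)        # from the higher-m/z peak
mass_b = neutral_mass(875.4421, z + 1)    # from the lower-m/z peak
print(z, round(mass_a, 2), round(mass_b, 2))
```

Computing the mass from both peaks is a quick consistency check: if the charge assignment is right, the two values agree to within a few Da.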

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

No, the peaks are too close together to resolve the charge states.

Homework: Waters Part II — Secondary/Tertiary structure

1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

Denatured: unfolded, exposing more surface area and resulting in higher charge states and a broader distribution of peaks.

Native: folded, less surface area and lower charge states and sharper peaks.

2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 m/z? What is the charge state? How can you tell?

m / z = 2800, MW = 28006.60 Da => z = (MW / m/z) = 10

Isotope spacing: approximately 0.1 = 1 / z

Homework: Waters Part III — Peptide Mapping - primary structure

1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

MVS[K]GEELFTG VVPILVELDG DVNGH[K]FSVS GEGEGDATYG [K]LTL[K]FICTT G[K]LPVPWPTL VTTLTYGVQC FS[R]YPDHM[K]Q HDFF[K]SAMPE GYVQE[R]TIFF [K]DDGNY[K]T[R]A EV[K]FEGDTLV N[R]IEL[K]GIDF [K]EDGNILGH[K] LEYNYNSHNV YIMAD[K]Q[K]NG I[K]VNF[K]I[R]HN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALS[K]D PNE[K][R]DHMVL LEFVTAAGIT LGMDELY[K]LE HHHHHH

2. How many peptides will be generated from tryptic digestion of eGFP?

19 peptides

mass	position	peptide sequence
4472.1752	170-210	HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK
2566.2931	217-239	DHMVLLEFVTAAGITLGMDELYK
2437.2608	5-27	GEELFTGVVPILVELDGDVNGHK
2378.2577	54-74	LPVPWPTLVTTLTYGVQCFSR
1973.9062	142-157	LEYNYNSHNVYIMADK
1503.6597	28-42	FSVSGEGEGDATYGK
1266.5783	87-97	SAMPEGYVQER
1083.4979	240-247	LEHHHHHH
1050.5214	115-123	FEGDTLVNR
982.4952	133-141	EDGNILGHK
821.3940	81-86	QHDFFK
790.3552	75-80	YPDHMK
769.3913	47-53	FICTTGK
711.2944	103-108	DDGNYK
655.3813	98-102	TIFFK
602.2780	211-215	DPNEK
579.3137	128-132	GIDFK
507.2925	164-167	VNFK
502.3235	124-127	IELK
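The peptide table above can be cross-checked with a small in-silico digest. This is a sketch under stated assumptions: trypsin cleaves after every K or R unless the next residue is P, there are no missed cleavages or modifications, and the listed masses are monoisotopic [M+H]+ values. The residue masses are standard monoisotopic values, and the function names are my own. Under these rules the full digest has more fragments than the table, because the table only lists peptides above roughly 500 Da.

```python
# eGFP sequence from Waters Part I (brackets removed), 247 residues.
EGFP = (
    "MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT "
    "GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF "
    "KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV "
    "YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY "
    "LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH"
).replace(" ", "")

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def trypsin_digest(seq):
    """Return (start, end, peptide) tuples with 1-based positions."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append((start + 1, i + 1, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):  # C-terminal peptide that does not end in K/R
        peptides.append((start + 1, len(seq), seq[start:]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


peptides = trypsin_digest(EGFP)
above_500 = [(s, e, p) for s, e, p in peptides if mh_plus(p) > 500]
print(len(peptides), "peptides in total,", len(above_500), "above 500 Da")
```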

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

21 peaks.

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

There are more peaks in the chromatogram than predicted peptides. This could be due to post-translational modifications, missed cleavages, or other factors that generate additional peptide species.

5. Identify the mass-to-charge ($\frac{m}{z}$) of the peptide shown in Figure 5b. What is the charge ($z$) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ($\small{[M\!\!+\!\!H]^+}$) based on its $\frac{m}{z}$ and $z$.

Most abundant peak: m/z = 525.76712

Isotope spacing: 0.5

Charge: z = 1 / 0.5 = 2

[M+H]+ = (525.76712 * 2) - 1.007276 = 1050.5270 Da (one of the two protons is removed to obtain the singly charged form)

6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm.

1050.5214	115-123	0	FEGDTLVNR

Accuracy = (1050.5270 - 1050.5214) / 1050.5214 = 0.0000053

Error = 0.0000053 * 1,000,000 = 5.3 ppm
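The conversion from an observed m/z to the singly protonated mass, and from there to a ppm error, can be sketched as follows. This is a minimal sketch, assuming the ion is a simple [M + zH]^z+ species and that the isotope peaks are spaced by 1/z. The function names are illustrative. A doubly charged ion carries two protons, so one proton mass has to be subtracted from m/z times z to get [M+H]+.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_from_isotope_spacing(spacing: float) -> int:
    """Isotope peaks of a z-charged ion are spaced by about 1/z in m/z."""
    return round(1.0 / spacing)


def singly_protonated_mass(mz: float, z: int) -> float:
    """[M+H]+ from an [M + zH]^z+ peak: remove the extra z - 1 protons."""
    return mz * z - (z - 1) * PROTON


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


z = charge_from_isotope_spacing(0.5)
mh = singly_protonated_mass(525.76712, z)
err = ppm_error(mh, 1050.5214)  # theoretical [M+H]+ of FEGDTLVNR
print(z, round(mh, 4), round(err, 1))
```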

7. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

88%

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

1. Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST.

Done.

2. Make a note on your HTGAA webpages including: what you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”), what you liked about the project, and what about this collaborative art experiment could be made better for next year.

I contributed a little dot on the bottom right plate, but it was overlapped by other contributions later. I liked the interactivity of the project. It’s cool to see how the plates evolve over time as more people contribute.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

E. coli Lysate: Provides the necessary cellular machinery (e.g., ribosomes, tRNAs, enzymes) for transcription and translation to occur in the cell-free system.
Salts/Buffer: Provide suitable pH conditions for cell-free reactions.
Energy / Nucleotide System: Provide energy sources needed for the reactions.
Translation Mix (Amino Acids): Provide the building blocks for protein synthesis.
Additives (Nicotinamide): Improve the efficiency of the reactions.
Backfill (Nuclease Free Water): Adjust the final volume of the reaction and ensure that it is nuclease-free to prevent degradation of DNA/RNA.

2. Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

1-hour optimized PEP-NTP: Rapid protein synthesis, using phosphoenolpyruvate (PEP) as an energy source.

20-hour NMP-Ribose-Glucose master mix: Optimized for longer reactions, using nucleoside monophosphates (NMPs) and glucose for sustained energy production over a longer period.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)

sfGFP

Maturation Time: 5–15 minutes, very fast
Acid Sensitivity: generally more stable, maintains roughly 50% of its fluorescence at pH 5.4
Folding: high folding efficiency, fluorescence is thus enhanced in cell-free systems
Oxygen Dependence: oxygen required

2. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

Use glucose as the main energy source in the master mix to sustain energy production over a longer period, which could improve the folding and maturation of sfGFP, leading to increased fluorescence over a 36-hour incubation.

3. The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here.

4. The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (date TBD!). The reaction composition for each well will be as follows:

Week 12 HW: Bioproduction

No homework assignment for this week. Working on the final project.

Week 13 HW: Bio-Design & Living Materials

No homework assignment for this week. Working on the final project.

Week 14 HW: Biofabrication

No homework assignment for this week. Working on the final project.

Labs

Lab writeups:

Week 1 Lab: Pipetting
I used P20 (1-20uL) pipettes to create a letter “H” pattern with red food coloring solution contained in an Eppendorf tube.
Week 10 Lab: Mass Spectrometry
See homework for this week.
Week 11 Lab: Cloud Lab
See homework for this week.
Week 12 Lab: Bioproduction
No lab assignment for this week. Working on the final project.
Week 2 Lab: DNA Gel Art
Louisa, Jasmine, Yutong and I worked on this lab together. TAE Buffer Preparation We mixed 8mL of TAE stock (50X) with 492mL of deionized water to make 500mL of working TAE buffer (approximately 1X). We also added dye to the buffer to make the electrophoresis process traceable. Agarose Electrophoresis Gel Preparation First, we added 0.75g of agarose powder and 75mL of TAE buffer into a microwavable flask. We swirled the flask to mix the powder and the buffer.
Week 3 Lab: Opentrons Art
Python Script for Opentrons Artwork I created a design using opentrons-art.rcdonovan.com Opentrons-Art Website: https://opentrons-art.rcdonovan.com/?id=80fx569l8o4tho4 Google Colab: https://colab.research.google.com/drive/1UPiCmwBP3sIFD_rNVRHeT3YhuiQQ5ZGP#scrollTo=pczDLwsq64mk&line=6&uniqifier=1 Result With the help of our TA Ronan, the art was printed with an Opentrons robot. The result is shown below:
Week 4 Lab: Protein Design Part I
See Homework 4.
Week 5 Lab: Protein Design Part II
See Homework 5.
Week 6 Lab: Gibson Assembly
Louisa, Yutong, Jasmine and I worked together to complete this lab. Day 1: PCR and DNA Purification PCR First, we performed PCR to amplify the backbone and color DNA fragments. We prepared the PCR reactions according to the tables below, and ran the PCR program on the thermocycler.
Week 7 Lab: Neuromorphic Circuits
Louisa, Yao, and I worked together to complete this lab. Below is the design of the neuromorphic circuit. The first input signal X1 is PgU, and the second input signal X2 is PgU_rec_CasE. When PgU is high, it will repress CasE, and when CasE is repressed, mNeon will be high. Below are the simulation and results of the neuromorphic circuit. Simulation Result
Week 9 Lab: Cell-Free Systems
See homework for this week.

Week 1 Lab: Pipetting

I used P20 (1-20uL) pipettes to create a letter “H” pattern with red food coloring solution contained in an Eppendorf tube.

Week 10 Lab: Mass Spectrometry

See homework for this week.

Week 11 Lab: Cloud Lab

See homework for this week.

Week 12 Lab: Bioproduction

No lab assignment for this week. Working on the final project.

Week 2 Lab: DNA Gel Art

Louisa, Jasmine, Yutong and I worked on this lab together.

TAE Buffer Preparation

We mixed 8mL of TAE stock (50X) with 492mL of deionized water to make 500mL of working TAE buffer (approximately 1X). We also added dye to the buffer to make the electrophoresis process traceable.
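The dilution can be checked with the usual C1·V1 = C2·V2 relation. The sketch below assumes the TAE stock is a 50-fold concentrate, which is my reading of the stock label. The helper names are hypothetical. With the volumes recorded above it gives a working strength slightly under 1X, and it also shows how much stock an exact 1X buffer would take.

```python
def final_concentration(stock_conc: float, stock_volume: float, total_volume: float) -> float:
    """C2 = C1 * V1 / V2 (any consistent units)."""
    return stock_conc * stock_volume / total_volume


def stock_volume_needed(stock_conc: float, target_conc: float, total_volume: float) -> float:
    """V1 = C2 * V2 / C1."""
    return target_conc * total_volume / stock_conc


# Volumes recorded in the lab: 8 mL stock in 500 mL total.
working_strength = final_concentration(stock_conc=50, stock_volume=8, total_volume=500)

# Stock volume that would give exactly 1X in 500 mL.
exact_stock_ml = stock_volume_needed(stock_conc=50, target_conc=1, total_volume=500)

print(f"{working_strength:.2f}X working buffer; {exact_stock_ml:.1f} mL stock for 1X")
```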

Agarose Electrophoresis Gel Preparation

First, we added 0.75g of agarose powder and 75mL of TAE buffer into a microwavable flask. We swirled the flask to mix the powder and the buffer.

Next, we microwaved the flask for multiple rounds of 20 seconds each until the agarose powder was completely dissolved.

After that, we waited for the agarose solution to cool down for about 20 minutes. We then added 7.5uL of SYBR Safe DNA stain to the solution.

Finally, we poured the agarose solution into a gel mold, inserted the comb, and let it solidify for about 30 minutes. After the gel solidified, we removed the comb and placed the gel in the gel box.

Digestion

Following the gel art design, we used EcoRV, SacI, BamHI, and KpnI. We prepared mixtures of Lambda DNA, restriction enzyme, and buffer according to the protocol. After mixing, we incubated the reactions at 37°C for 30 minutes to allow the digestion to occur.

Running the Gel

First, we poured the TAE buffer into the gel box until the gel was submerged. Next, we pipetted the digested DNA samples into the wells of the gel.

We then connected the gel box to the power supply and ran the gel at 120V for about 30 minutes.

Result

Finally, we visualized the gel in the imaging system. The gel art is shown below. It does not look exactly like the design we created, but at least we gave it a try and had fun in the process!

Week 3 Lab: Opentrons Art

Python Script for Opentrons Artwork

I created a design using opentrons-art.rcdonovan.com

Opentrons-Art Website: https://opentrons-art.rcdonovan.com/?id=80fx569l8o4tho4

Google Colab: https://colab.research.google.com/drive/1UPiCmwBP3sIFD_rNVRHeT3YhuiQQ5ZGP#scrollTo=pczDLwsq64mk&line=6&uniqifier=1

Result

With the help of our TA Ronan, the art was printed with an Opentrons robot. The result is shown below:

Week 4 Lab: Protein Design Part I

See Homework 4.

Week 5 Lab: Protein Design Part II

See Homework 5.

Week 6 Lab: Gibson Assembly

Louisa, Yutong, Jasmine and I worked together to complete this lab.

Day 1: PCR and DNA Purification

PCR

First, we performed PCR to amplify the backbone and color DNA fragments. We prepared the PCR reactions according to the tables below, and ran the PCR program on the thermocycler.

Ice bucket
Phusion HF PCR Master Mix
Primers (5 uM stock)
UltraPure Water
PCR tubes
Thermocycler
P20 pipette and 10uL tips
P200 pipette and 200uL tips

Backbone DNA Fragment (Primers: Backbone Fwd and Backbone Rev)

Reagent	Stock Conc.	Desired Conc.	Volume (uL)
Template mUAV Plasmid	38.5	20	0.8
Backbone Forward Primer	5 uM	0.5 uM	2.5
Backbone Reverse Primer	5 uM	0.5 uM	2.5
Phusion HF PCR Mix	2X	1x	12.5
Nuclease-free water			6.8
Total Volume			25.0

Color DNA Fragment (Primers: Color Fwd and Color Rev)

Reagent	Stock Conc.	Desired Conc.	Volume (uL)
Template mUAV Plasmid	38.5	20	0.8
Color Forward Primer	5 uM	0.5 uM	2.5
Color Reverse Primer	5 uM	0.5 uM	2.5
Phusion HF PCR Mix	2X	1x	12.5
Nuclease-free water			6.8
Total Volume			25.0

Below are the photos of the PCR tubes in the thermocycler:

After the PCR program was completed, we ran the E-gel. We observed clear bands at the expected sizes for both the backbone and color fragments, indicating that the PCR was successful.

DNA Purification:

PCR products
Zymo DNA Clean & Concentrator
UltraPure Water
1.5ml microcentrifuge tubes
50ml Falcon tube for liquid waste
Centrifuge (Set to 13,000 rpm, or roughly 17,900 x g)
NanoDrop/Qubit
P200 pipette with 200uL tips
P20 pipette with 20uL tips

Day 2: Gibson Assembly and Transformation

Gibson Assembly

Backbone fragment purified, Color fragment(s) purified
Gibson Assembly Master Mix
PCR tubes
UltraPure Water
Thermal Cycler
P20 pipette and 10uL tips
Ice Bucket

Reagent	Stock Conc.	Desired Amount or Conc.	Volume (uL)
Backbone Fragment	50 ng/uL	25 ng	0.5
Color fragment (Single)	50 ng/uL	50 ng	1.0
Gibson Assembly Mix	2X	1X	5
Nuclease-free water			3.5
Total Volume			10
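The volumes in the Gibson table follow from simple arithmetic, sketched below. This sketch assumes the "desired" value for each DNA fragment is a mass in ng (0.5 uL of a 50 ng/uL stock delivers 25 ng), that the master mix is added at half the final volume because it is a 2X stock, and that water fills the remainder. The function and argument names are hypothetical.

```python
def assembly_volumes(fragments, master_mix_fold=2.0, total_ul=10.0):
    """fragments maps name -> (stock concentration in ng/uL, desired mass in ng)."""
    volumes = {name: desired_ng / stock for name, (stock, desired_ng) in fragments.items()}
    volumes["Gibson Assembly Mix"] = total_ul / master_mix_fold  # 2X stock -> 1X final
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    volumes["Nuclease-free water"] = water
    return volumes


volumes = assembly_volumes({
    "Backbone Fragment": (50.0, 25.0),
    "Color fragment (Single)": (50.0, 50.0),
})
for name, ul in volumes.items():
    print(f"{name}: {ul:.1f} uL")
```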

Transformation

Result

After 72 hours of incubation:

Week 7 Lab: Neuromorphic Circuits

Louisa, Yao, and I worked together to complete this lab.

Below is the design of the neuromorphic circuit. The first input signal X1 is PgU, and the second input signal X2 is PgU_rec_CasE. When PgU is high, it will repress CasE, and when CasE is repressed, mNeon will be high.

Below are the simulation and results of the neuromorphic circuit.

Simulation	Result

Week 9 Lab: Cell-Free Systems

See homework for this week.

Projects

Final projects:

Individual Final Project
Deep learning guided optimization of PETase Introduction In this project, we will explore the application of deep learning techniques to optimize the performance of PETase, an enzyme that can degrade polyethylene terephthalate (PET), a common plastic. The goal is to enhance the thermal stability and catalytic efficiency of PETase through computational methods.

Individual Final Project

Deep learning guided optimization of PETase

Introduction

In this project, we will explore the application of deep learning techniques to optimize the performance of PETase, an enzyme that can degrade polyethylene terephthalate (PET), a common plastic. The goal is to enhance the thermal stability and catalytic efficiency of PETase through computational methods.

Aims

Aim 1: Develop a deep learning model to predict the stability of PETase variants

In this aim, we will create a deep learning model that can predict the thermal stability of different PETase variants. We will use a dataset of known PETase variants and their corresponding stability data to train the model. The model will be designed to take into account various features of the enzyme, such as amino acid sequence and structural properties.

Aim 2: Synthesize the wild-type PETase and selected variants

In this aim, we will synthesize the wild-type PETase and a selection of variants that are predicted to have improved stability based on the deep learning model. We will use recombinant DNA technology to express and purify these enzymes for further testing.

Aim 3: Evaluate the catalytic efficiency of the synthesized PETase variants

In this aim, we will assess the catalytic efficiency of the synthesized PETase variants. We will perform enzymatic assays to measure the rate of PET degradation by each variant under various conditions. This will allow us to determine if the predicted improvements in stability also translate to enhanced catalytic performance.

Aim 4 (Visionary Aim): Close the loop between computational predictions and experimental validation via laboratory automation

In this visionary aim, we will implement a closed-loop system that integrates computational predictions with experimental validation. We will use laboratory automation to rapidly test the predicted PETase variants and feed the results back into the deep learning model for further refinement. This iterative process will enable us to continuously improve the model’s predictive accuracy and optimize the performance of PETase variants in a more efficient manner.

Deep learning model to predict the stability of PETase variants

Zero-shot Prediction of Mutational Effects with ProteinMPNN

ProteinMPNN is a pre-trained graph neural network that generates protein sequences for given protein backbone structures. It learns the conditional probability of protein sequence autoregressively conditioned on the backbone structure:

$$\log p(\mathbf{s} | \mathbf{X}) = \sum_{i=1}^L \log p( s_{\sigma(i)} | s_{\sigma(1)}, \ldots, s_{\sigma(i-1)}, \mathbf{X} )$$

where:

$L$ is the number of amino acids in the structure.
$\sigma(\cdot)$ is a permutation of $\{1, \ldots, L\}$ (the decoding order).
$\mathbf{X}$ is the backbone coordinate matrix.
$\mathbf{s} = [s_1, \ldots, s_L]$ is the amino acid sequence.
$s_i \in \{1, \ldots, 20\}$ denotes the amino acid type.

ProteinMPNN can be used to score amino acid types at a specific position $k$ using the following conditional log-probability:

$$\log p(s_k | \mathbf{s}_{\backslash k}, \mathbf{X})$$

The difference in log-likelihood, $\log p(s_k = \text{M} | \mathbf{s}_{\backslash k}, \mathbf{X}) - \log p(s_k = \text{W} | \mathbf{s}_{\backslash k}, \mathbf{X})$, is used to estimate the effect of substituting the wild-type amino acid W at position $k$ with the mutant amino acid M.

We benchmarked the original pre-trained ProteinMPNN on the SKEMPI dataset. The likelihood difference shows only a weak correlation with experimental $\Delta \Delta G$, which highlights the gap between the structural compatibility learned by ProteinMPNN and experimentally measured mutational effects.
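The scoring rule above can be sketched in a few lines. This is an illustration only: it assumes the model returns a vector of 20 unnormalized logits for position $k$, and that the logits are ordered by the alphabet string below. Both are assumptions about the interface rather than a description of ProteinMPNN's actual API, and the toy logits are made up.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 logits


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mutation_score(logits_k, wild_type, mutant):
    """log p(s_k = mutant | ...) - log p(s_k = wild_type | ...)."""
    log_p = log_softmax(logits_k)
    return log_p[AMINO_ACIDS.index(mutant)] - log_p[AMINO_ACIDS.index(wild_type)]


# Toy logits for one position (hypothetical numbers, not model output).
toy_logits = [0.0] * 20
toy_logits[AMINO_ACIDS.index("A")] = 2.0
toy_logits[AMINO_ACIDS.index("D")] = 1.0
score = mutation_score(toy_logits, wild_type="A", mutant="D")
print(round(score, 3))  # negative: the model prefers the wild type here
```

A positive score means the model finds the mutant more compatible with the backbone than the wild type. A negative score means the opposite.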

Assaywise Alignment with Experimental Data

The visionary aim of this project is to close the loop between computational predictions and experimental validation via laboratory automation. To achieve this, we implement an assaywise alignment strategy that integrates the deep learning model’s predictions with experimental data from enzymatic assays.

We repurposed the Direct Preference Optimization (DPO) method to align ProteinMPNN with experimental values.

Each mutation entry is treated as an ordered preference pair:

When $\Delta \Delta G > 0$, the wild-type amino acid is preferred over the mutant.
When $\Delta \Delta G < 0$, the mutant amino acid is preferred over the wild-type.

Intuitively, the DPO loss function increases the likelihood of the favorable amino acid type and penalizes the unfavorable one:

$$\mathcal{L} = \begin{cases} -\log \sigma\left( \beta \log \frac{p_\theta(\text{W}|\mathbf{s}, \mathbf{X})}{p_{\text{ref}}(\text{W}|\mathbf{s}, \mathbf{X})} - \beta\log \frac{p_\theta(\text{M}|\mathbf{s}, \mathbf{X})}{p_{\text{ref}}(\text{M}|\mathbf{s}, \mathbf{X})} \right) & \Delta \Delta G > 0 \\ -\log \sigma\left( \beta \log \frac{p_\theta(\text{M}|\mathbf{s}, \mathbf{X})}{p_{\text{ref}}(\text{M}|\mathbf{s}, \mathbf{X})} - \beta\log \frac{p_\theta(\text{W}|\mathbf{s}, \mathbf{X})}{p_{\text{ref}}(\text{W}|\mathbf{s}, \mathbf{X})} \right) & \Delta \Delta G < 0 \end{cases}$$
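For a single mutation, the loss above reduces to a few scalar operations, sketched below. The inputs are the log-probabilities of the wild-type (W) and mutant (M) residues under the trainable model and the frozen reference model. The value of beta and the handling of $\Delta \Delta G = 0$ (skipped here) are my assumptions, since the formula does not define them. A real training loop would use an autodiff framework rather than plain floats.

```python
import math


def softplus(x: float) -> float:
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_theta_w, logp_theta_m, logp_ref_w, logp_ref_m, ddg, beta=0.1):
    """Per-mutation DPO loss; the preferred residue is chosen by the sign of ddG."""
    if ddg == 0:
        return 0.0  # assumption: ties carry no preference signal
    margin_w = beta * (logp_theta_w - logp_ref_w)
    margin_m = beta * (logp_theta_m - logp_ref_m)
    diff = margin_w - margin_m if ddg > 0 else margin_m - margin_w
    return softplus(-diff)  # equals -log(sigmoid(diff))


# Before any fine-tuning the model equals the reference, so the loss is log 2.
start = dpo_loss(-1.0, -2.0, -1.0, -2.0, ddg=1.0)
# After the model shifts probability toward W, a destabilizing mutation costs less.
improved = dpo_loss(-0.5, -2.5, -1.0, -2.0, ddg=1.0)
print(round(start, 4), round(improved, 4))
```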

Data Augmentation via Mutation Merging

A limitation of preference optimization using only individual mutations from the dataset is that the model learns if a mutation is favorable or not, but it does not learn to compare different mutations directly. To address this, we augmented the training data by combining pairs of mutations.

Suppose there are two mutations at the same position in the dataset:

A30D: $\Delta\Delta G = -1.0$
A30K: $\Delta\Delta G = +2.0$

We can combine these by inverting the first mutation into D30A ($\Delta\Delta G = +1.0$) and then merging it with the second to create D30K ($\Delta\Delta G = 3.0$).

While data augmentation introduces necessary ordering between mutations, we cannot merge every possible pair. Doing so might introduce artificial ordering between mutations from different experimental settings. Therefore, we use an assay-wise approach: (1) Two mutations are only merged if they belong to the same assay; (2) “same assay” is defined by sharing the same protein structure, the same publication source, and the same experimental method.

By restricting the scope of merging, we guarantee that relative ordering is introduced only within the same assay and not across inconsistent experimental conditions.
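The merging rule can be sketched as follows. Each entry is a plain dictionary, and the `assay` field is a hypothetical identifier standing in for the (structure, publication, method) triple. The field names and the toy entries are assumptions for illustration, not the actual dataset schema. Two mutations are merged only when they share the assay, the position, and the wild-type residue.

```python
from collections import defaultdict
from itertools import permutations


def merge_mutations(entries):
    """Create X->Y entries from pairs of W->X and W->Y measured in the same assay."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e["assay"], e["pos"], e["wt"])].append(e)

    merged = []
    for (assay, pos, _wt), group in groups.items():
        for a, b in permutations(group, 2):
            if a["mut"] == b["mut"]:
                continue
            # Invert a (X->W has ddG = -a.ddg), then apply b (W->Y).
            merged.append({
                "assay": assay, "wt": a["mut"], "pos": pos,
                "mut": b["mut"], "ddg": b["ddg"] - a["ddg"],
            })
    return merged


entries = [
    {"assay": "assay_1", "wt": "A", "pos": 30, "mut": "D", "ddg": -1.0},
    {"assay": "assay_1", "wt": "A", "pos": 30, "mut": "K", "ddg": 2.0},
    {"assay": "assay_2", "wt": "A", "pos": 30, "mut": "W", "ddg": 0.5},  # other assay
]
merged = merge_mutations(entries)
for m in merged:
    print(f'{m["wt"]}{m["pos"]}{m["mut"]}: ddG = {m["ddg"]:+.1f}')
```

The entry from `assay_2` produces no merged pairs, which is the point of the assay-wise restriction.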

Zero-shot Prediction of PETase Mutational Effects with ProteinMPNN

The PETase sequence we used is A0A0K8P6T7:

MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPSSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS

There are multiple structures of PETase available in the Protein Data Bank (PDB). We used the structure with PDB ID 5XFY for ProteinMPNN inference. We scored all 19 possible amino acid substitutions on each position of the PETase sequence and ranked them by the likelihood difference compared to the wild-type amino acid.

According to the predictions, the top 10 mutations are:

T84P
E202R
T84L
E202L
K230F
S95M
H75W
E202M
A123R
S159R
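A ranking like the one above can be produced from a per-position table of log-probabilities, as sketched here. The sketch assumes a precomputed `log_probs[i][j]` holding the log-probability of amino acid `j` at position `i`, with the alphabet order shown. The `offset` argument maps list indices to residue numbers and depends on the structure's numbering. All of these names, and the toy values, are illustrative rather than taken from the project code.

```python
import heapq

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of log_probs


def rank_substitutions(sequence, log_probs, top_k=10, offset=1):
    """Score all 19 substitutions per position and return the top_k as (score, name)."""
    scored = []
    for i, wt in enumerate(sequence):
        wt_lp = log_probs[i][AMINO_ACIDS.index(wt)]
        for j, mut in enumerate(AMINO_ACIDS):
            if mut != wt:
                scored.append((log_probs[i][j] - wt_lp, f"{wt}{i + offset}{mut}"))
    return heapq.nlargest(top_k, scored)


# Toy two-residue example with made-up log-probabilities.
toy_sequence = "AC"
toy_log_probs = [[-3.0] * 20 for _ in toy_sequence]
toy_log_probs[0][AMINO_ACIDS.index("D")] = -1.0  # D favoured over A at position 1
toy_log_probs[1][AMINO_ACIDS.index("W")] = -2.0  # W favoured over C at position 2
top = rank_substitutions(toy_sequence, toy_log_probs, top_k=2)
print(top)
```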

Synthesize the wild-type PETase

DNA synthesis

The DNA sequence submitted for synthesis, codon-optimized for E. coli, is as follows:

ATGCATCACCATCACCATCACCAGACGAATCCGTATGCGCGAGGCCCGAACCCAACGGCGGCGTCGCTGGAAGCCAGCGCCGGCCCATTCACCGTGCGTAGCTTTACGGTTTCCCGCCCGTCCGGCTACGGTGCTGGGACAGTATATTACCCGACCAATGCGGGCGGCACTGTGGGGGCGATCGCCATTGTGCCTGGTTACACTGCTCGTCAGAGTAGCATTAAATGGTGGGGCCCTCGCCTGGCCAGCCACGGTTTTGTCGTCATCACGATTGATACCAACTCCACCCTAGATCAGCCATCGAGCCGCAGTTCGCAACAGATGGCGGCGTTACGGCAGGTTGCGAGCTTGAATGGCACGTCTAGCTCTCCGATCTACGGAAAAGTGGATACCGCACGTATGGGCGTTATGGGCTGGAGTATGGGAGGCGGTGGGTCACTGATTAGCGCCGCCAATAATCCGTCCTTAAAGGCTGCGGCACCCCAAGCACCTTGGGATTCAAGCACCAACTTTAGTAGCGTCACTGTACCCACACTGATTTTTGCATGCGAAAACGATTCAATTGCGCCGGTGAATAGCTCGGCGCTCCCGATATATGACTCTATGTCACGTAATGCGAAGCAATTTCTTGAAATCAATGGAGGTTCTCATAGCTGTGCAAACTCAGGTAACTCAAACCAGGCCCTGATTGGTAAAAAAGGCGTTGCCTGGATGAAACGTTTCATGGATAATGACACCCGCTATTCGACATTCGCCTGCGAGAACCCGAACTCGACCCGCGTGAGTGACTTCCGCACCGCAAACTGTAGCTAA
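Before ordering a synthetic gene, it can help to translate it back and compare the result against the intended protein. The sketch below uses the standard genetic code and stops at the first stop codon. The helper names are my own. Only the first 42 nucleotides of the sequence above are translated here, which is enough to see the N-terminal Met-His6 tag followed by the start of the mature PETase sequence (QTNPYAR).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1 until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 14 codons of the synthesized construct.
prefix = translate("ATGCATCACCATCACCATCACCAGACGAATCCGTATGCGCGA")
print(prefix)
```

Passing the full sequence to `translate` and comparing it with the expected protein would catch frame shifts or premature stop codons before synthesis.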

The vector used for cloning is pET-28a(+), which contains a T7 promoter.

Cell-free protein synthesis

I used the Ginkgo CFPS protocol for cell-free protein synthesis of PETase. The reaction mixture contains the following components:

6 uL of E. coli lysate
10 uL of Economy Reagent Mix
4 uL of DNA template (at least 10 nM)

Below is the photo of the pipetting process for creating the CFPS reaction mixture:

The reaction was mixed and centrifuged to remove bubbles. Then, it was incubated at 30°C for 24 hours.

Below is the incubated CFPS reaction mixture (in the leftmost PCR tube) after 24 hours:

SDS-PAGE analysis

To verify the expression of PETase, we performed SDS-PAGE analysis on the CFPS reaction mixture. The expected molecular weight of PETase is approximately 28 kDa.

Below is the protocol followed for SDS-PAGE analysis:

1. Sample Preparation (1:20 Final Concentration)

We will perform a two-step dilution to ensure the protein is concentrated enough to see clearly.

Step 1 (Intermediate 1:10 Dilution):
Mix 2 µL CFPS stock + 18 µL PBS.
Total volume: 20 µL.
Step 2 (Final 1:20 Loading Sample):
Mix 10 µL of the 1:10 dilution (from Step 1).
Add 5 µL 4X Bolt LDS Sample Buffer (No DTT added).
Add 5 µL DNase-free water.
Total volume: 20 µL.
Final Concentration: 1:20 CFPS in 1X Sample Buffer.

2. Denaturation

Heat the samples at 95°C for 10 minutes.
Note: Even without DTT, heating is required to allow the SDS (in the LDS buffer) to coat the proteins and give them a uniform negative charge.

3. Gel Tank & Buffer Setup

Buffer: Dilute your 20X Bolt MES SDS Running Buffer stock to 1X with deionized water (e.g., 25 mL stock + 475 mL water).
Assembly: Remove the comb and the tape from the bottom of the Bolt Bis-Tris Plus 4–12% gel. Place it in the Mini Gel Tank.
Fill: Fill the inner chamber (between the gels) and the outer chamber with the 1X MES buffer.

4. Loading and Running

Load Ladder: Add 3 µL of PageRuler prestained ladder.
Load Samples: Load 15 µL of your 1:20 prepared sample per lane.
Tip: Using 15 µL instead of 10 µL will deliver even more protein to the lane to help with the “faint band” issue.
Run Settings:
Run at 180 V (constant).
Stop when the dye front is near the bottom.

5. Staining and Imaging

Stain: Use Coomassie Brilliant Blue stain to visualize the protein bands.
Destain: Destain the gel in water for 12 hours to reduce background and enhance band visibility.

Mini-prep to create even more DNA template for CFPS

We only have ~500 ng of DNA template from the initial synthesis, which is not enough for multiple rounds of CFPS. To create more DNA template, we first cloned the synthesized DNA into the pET-28a(+) vector and transformed it into E. coli for amplification. After growing the transformed bacteria, we used a mini-prep kit to extract the plasmid DNA.

Transformation and culturing

Below is the protocol followed for the culturing process, adapted from the protocol of the Week 6 Gibson Assembly lab:

Thaw competent cells on ice in their containers (meaning: take a tube of cells out of the -80°C freezer, put it directly on ice, and wait until it becomes liquid). Thaw for exactly 10 minutes.
Take microcentrifuge tubes (one for each Gibson Assembly reaction). Label them.
Transfer 20uL of competent cells to the tube.
Transfer 4uL of your purified & diluted Gibson Assembly product into each tube. Keep the tubes on ice.
Incubate on ice for 30 min.
Take your ice bucket to the heat bath. Set-up a timer. With your hands holding both tubes, submerge the tubes in the 42°C heat bath (or thermal cycler) such that half of the tube is submerged. Keep the tubes at 42°C for exactly 45 seconds, then transfer the tubes back to ice for 5 minutes.
Add 200uL-500uL of SOC media to each of the tubes, and grow in a (shaking) incubator for 60 minutes.
While you’re waiting, label your agar plates.
For the first plates, transfer 100uL from each tube to its appropriate plate and use plating beads or a plastic spreader. Dispose of the beads into a clear container to be recycled (never put them back in the jar!).
Incubate the plates at 37°C for 24 hours. Make sure you place the plates upside down (agar side up) to avoid condensation issues.
For all of the extra liquid you have that contains bacteria - transfer to the liquid waste disposal tube. Add bleach to kill all bacteria (this is how we dispose of anything that has live bacteria in it).

The next day, I saw colonies on the agar plates, which indicates successful transformation. I picked 3 colonies and grew them in liquid culture for mini-prep.

The liquid culture was incubated at 30°C for 12 hours to allow the bacteria to grow.

Mini-prep

First, I transferred the liquid culture to centrifuge tubes (1.5mL each) and centrifuged them to pellet the bacteria. Then, I removed the supernatant.

After that, I followed the mini-prep kit protocol to extract the plasmid DNA from the bacterial pellets.

The final eluted DNA was quantified using a spectrophotometer. I obtained a concentration of approximately 30-50 ng/µL, which is sufficient for additional CFPS reactions.
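
To put the measured concentration in context, the total DNA mass per prep is concentration times elution volume. The sketch below is illustrative only: the elution volume is not stated in this write-up, so `ELUTION_VOLUME_UL = 50.0` is an assumed placeholder that should be replaced with the volume actually used, and `total_yield_ng` is a hypothetical helper.

```python
def total_yield_ng(concentration_ng_per_ul, elution_volume_ul):
    """Total DNA mass (ng) = concentration (ng/µL) x elution volume (µL)."""
    return concentration_ng_per_ul * elution_volume_ul


# ASSUMPTION: the elution volume is not given in the write-up.
# 50 µL is a placeholder; substitute the volume actually used.
ELUTION_VOLUME_UL = 50.0

# From the text: ~500 ng from the initial synthesis,
# and a measured mini-prep concentration of roughly 30-50 ng/µL.
INITIAL_TEMPLATE_NG = 500.0
LOW_CONC_NG_PER_UL = 30.0
HIGH_CONC_NG_PER_UL = 50.0

low_yield = total_yield_ng(LOW_CONC_NG_PER_UL, ELUTION_VOLUME_UL)
high_yield = total_yield_ng(HIGH_CONC_NG_PER_UL, ELUTION_VOLUME_UL)

print(f"Estimated yield per prep: {low_yield:.0f}-{high_yield:.0f} ng")
print(f"Relative to the initial template: "
      f"{low_yield / INITIAL_TEMPLATE_NG:.1f}x-"
      f"{high_yield / INITIAL_TEMPLATE_NG:.1f}x")
```

Under that assumed volume, a single prep would give several times the original ~500 ng, and the estimate scales linearly with the real elution volume.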


Evaluate the capability of PET degradation

To evaluate the catalytic capability of the synthesized PETase, I bought a PET film from Amazon. The PET film was cut into small pieces and preprocessed by heating at 95°C for 30 minutes to increase its surface area and make it more accessible to enzymatic degradation.


Photos (left to right): PET film bought from Amazon; PET film cut into small pieces and heated at 95°C for 30 minutes; small PET pieces soaked in the PETase solution.

I soaked the preprocessed PET pieces in the CFPS reaction mixture containing the synthesized PETase for 48 hours at 30°C.


Subsections of Shitong Luo — HTGAA Spring 2026

Homework

Weekly homework submissions:

Subsections of Homework

Week 01 HW: Principles and Practices

Class Assignment

1. First, describe a biological engineering application or tool you want to develop and why.

2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

Lab Preparation

Week 2 Lecture Prep

Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Question from George Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Your HTGAA Website

Week 02 HW: DNA Read, Write, & Edit

Part 0: Basics of Gel Electrophoresis

Part 1: Benchling & In-silico Gel Art

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Part 3: DNA Design Challenge

3.1. Choose your protein.

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

3.3. Codon optimization.

3.4. You have a sequence! Now what?

Part 4: Prepare a Twist DNA Synthesis Order

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

5.3 DNA Edit

(i) What DNA would you want to edit and why?

Week 03 HW: Lab Automation

Python Script for Opentrons Artwork

Result

Published paper about lab automation

My plan to use automation tools

Week 04 HW: Protein Design Part I

Part A: Conceptual Questions

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

3. Why are there only 20 natural amino acids?

4. Can you make other non-natural amino acids? Design some new amino acids.

5. Where did amino acids come from before enzymes that make them, and before life started?

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

7. Can you discover additional helices in proteins?

8. Why are most molecular helices right-handed?

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

2. Identify the amino acid sequence of your protein

3. Identify the structure page of your protein in RCSB

4. Open the structure of your protein in any 3D molecule visualization software:

Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

1. Deep Mutational Scans

2. Latent Space Analysis

C2. Protein Folding

C3. Protein Generation

1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

2. Input this sequence into ESMFold and compare the predicted structure to your original.

Week 05 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM