Homework

Weekly homework submissions:

  • Week 01 HW: Principles and Practices

    Class Assignment 1. First, describe a biological engineering application or tool you want to develop and why. I want to optimize PETase (polyethylene terephthalate hydrolase). PETase is an enzyme that can break down PET plastics, which are widely used in packaging. By optimizing PETase, we can enhance its efficiency in degrading PET and increase its stability under various conditions. This could lead to more effective recycling processes and help reduce plastic pollution.

  • Week 02 HW: DNA Read, Write, & Edit

    Part 0: Basics of Gel Electrophoresis I attended the recitation. Part 1: Benchling & In-silico Gel Art I made the gel art below. It is “HT” for “How To grow almost anything”. Part 2: Gel Art - Restriction Digests and Gel Electrophoresis I worked in a group with Louisa, Jasmine, and Yutong. We tried to make the cat gel art designed by Louisa, but unfortunately it was not very successful. Photo below:

  • Week 03 HW: Lab Automation

    Python Script for Opentrons Artwork I created a design using opentrons-art.rcdonovan.com. Opentrons-Art Website: https://opentrons-art.rcdonovan.com/?id=80fx569l8o4tho4 Google Colab: https://colab.research.google.com/drive/1UPiCmwBP3sIFD_rNVRHeT3YhuiQQ5ZGP#scrollTo=pczDLwsq64mk&line=6&uniqifier=1 Result: With the help of our TA Ronan, the art was printed with an Opentrons robot.

  • Week 04 HW: Protein Design Part I

    Part A: Conceptual Questions 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) 1 Dalton = $1.66 \times 10^{-24}$ grams, so 100 Daltons = $1.66 \times 10^{-22}$ grams per amino acid. Equivalently, 100 grams of amino acids is 1 mole, or $6.02 \times 10^{23}$ molecules. Assuming about 20% of meat is protein, 500 grams of meat contains about 100 grams of protein. Therefore: $100 / (1.66 \times 10^{-22}) \approx 6.02 \times 10^{23}$ molecules of amino acids.

  • Week 05 HW: Protein Design Part II

    Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM The original sequence of SOD1 is: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Introduce the A4V mutation (alanine to valine at residue 4 in the conventional SOD1 numbering, which omits the initiator methionine, so it is position 5 of the sequence above): MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:

  • Week 06 HW: Genetic Circuits Part I

    DNA Assembly 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion DNA Polymerase: synthesizes new DNA strands by adding nucleotides to the template strand during PCR dNTPs (deoxynucleotides): building blocks of DNA Buffer: provides the optimal conditions 2. What are some factors that determine primer annealing temperature during PCR? Tm (melting temperature) of the primer: the temperature at which half of the DNA duplex dissociates Primer length: longer primers generally require higher annealing temperatures GC content: higher GC content increases the Tm and may require higher annealing temperatures Salt concentration: higher salt concentrations can stabilize the DNA duplex and may require higher annealing temperatures 3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other. PCR:

  • Week 07 HW: Genetic Circuits Part II

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) 1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? IANNs can perform more complex computations than traditional genetic circuits, which are limited to Boolean functions. IANNs can process continuous inputs and produce continuous outputs, allowing for more nuanced control of gene expression. Additionally, IANNs can learn and adapt over time, making them more versatile and capable of handling dynamic environments.

  • Week 09 HW: Cell-Free Systems

    General homework questions 1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Advantages: cell-free systems allow for precise control over the reaction conditions, such as temperature, pH, and the concentration of substrates and cofactors. This can lead to higher yields and faster protein production compared to in vivo methods.

  • Week 10 HW: Imaging and Measurement

    Final Project 1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. We will measure PETase thermostability (Tm), residual activity after heat challenge, expression yield, and PET-degradation rate for each variant.

  • Week 11 HW: Bioproduction & Cloud Labs

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork 1. Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST. Done.

  1. Make a note on your HTGAA webpages including: what you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”), what you liked about the project, and what about this collaborative art experiment could be made better for next year. I contributed a little dot on the bottom right plate, but it was overlapped by other contributions later. I liked the interactivity of the project. It’s cool to see how the plates evolve over time as more people contribute.

Subsections of Homework

Week 01 HW: Principles and Practices

Class Assignment

1. First, describe a biological engineering application or tool you want to develop and why.

I want to optimize PETase (polyethylene terephthalate hydrolase). PETase is an enzyme that can break down PET plastics, which are widely used in packaging. By optimizing PETase, we can enhance its efficiency in degrading PET and increase its stability under various conditions. This could lead to more effective recycling processes and help reduce plastic pollution.

I plan to use AI models such as ProteinMPNN to propose mutations and test them in the lab.

One governance goal for optimizing PETase is to ensure that the enzyme does not have unintended consequences on the environment or human health, such as producing harmful byproducts.

Possible sub-goals:

  • Identify possible byproducts in the lab.
  • Test the toxicity of the byproducts.
  • Ensure that there are no byproducts that could be harmful to the environment or human health.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

  • [Option 1] As researchers, we could conduct comprehensive testing of the optimized PETase to identify any potential harmful byproducts and assess their toxicity.

    • Purpose: To ensure that the optimized PETase does not produce harmful byproducts.
    • Design: Conduct experiments to identify all the products produced by the PETase.
    • Assumptions: We know what the product profile looks like when there are no unexpected byproducts.
    • Risks of Failure: If harmful byproducts are not identified, it could lead to environmental or health issues.
    • Success: The optimized PETase is found to be safe and does not produce harmful byproducts.
  • [Option 2] Companies that produce the enzymes should provide detailed information about the enzyme’s properties, including any potential risks and safety measures.

    • Purpose: To sufficiently inform users about the potential risks associated with the enzyme.
    • Design: Disclose information about the enzyme’s properties and potential risks.
    • Assumptions: Companies will comply with the reporting requirements and provide accurate and sufficient information.
    • Risks of Failure: Could lead to mishandling of the enzyme.
    • Success: Users are well-informed about the enzyme and can use it safely.
  • [Option 3] Regulators could establish guidelines for safe use and disposal to minimize potential impact.

    • Purpose: To ensure that the enzyme is used and disposed of properly to minimize potential environmental impact.
    • Design: Develop guidelines of best practices for the use and disposal.
    • Assumptions: Users will follow the guidelines.
    • Risks of Failure: If users do not follow the guidelines, it could lead to negative consequences.
    • Success: Environmental and health impacts are minimized.

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework, but feel free to make your own.

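| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 1 | 2 |
| Foster Lab Safety | | | |
| • By preventing incident | 1 | 1 | 2 |
| • By helping respond | 2 | 2 | 2 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 1 | 3 |
| • By helping respond | 2 | 1 | 2 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 1 | 2 | 3 |
| • Feasibility? | 1 | 1 | 2 |
| • Not impede research | 2 | 2 | 2 |
| • Promote constructive applications | 2 | 2 | 2 |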

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

I would prioritize Option 1, because this option is our responsibility as researchers, and it is the most direct way to ensure safety by eliminating risks at the source.

Lab Preparation

  • Complete Lab Specific Training in Person.
  • Complete Safety Training in Atlas

Week 2 Lecture Prep

Questions from Professor Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

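Error rate: about 1 error per $10^6$ bases copied ($1:10^6$).

Human genome length: 3.2 Gbp (billion base pairs). At that error rate, copying the genome once would introduce roughly 3,200 errors.

Mechanisms to deal with the discrepancy: proofreading by the polymerase during replication, and mismatch repair afterwards (e.g., MutS).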
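The comparison above is simple arithmetic, sketched here as a rough estimate. It uses only the two figures quoted in the answer; the "break-even" rate is derived from them and is not a measured value.

```python
# Rough estimate only: uses the two figures quoted in the answer above.
ERROR_RATE = 1e-6        # errors per base copied (1 in 10^6)
GENOME_LENGTH = 3.2e9    # base pairs in the human genome


def expected_errors(error_rate: float, length: float) -> float:
    """Expected number of errors when copying `length` bases once."""
    return error_rate * length


raw_errors = expected_errors(ERROR_RATE, GENOME_LENGTH)

# Error rate at which one genome copy carries about one error on average.
break_even_rate = 1 / GENOME_LENGTH

print(f"Expected errors per genome copy: {raw_errors:.0f}")
print(f"Rate needed for ~1 error per copy: {break_even_rate:.1e}")
```

The gap between the two rates (more than three orders of magnitude) is what proofreading and repair have to make up.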

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

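Average human protein: about 345 amino acids. With roughly 3 synonymous codons per amino acid on average (61 sense codons for 20 amino acids), the number of different ways to code is about $3^{345}$.

Reasons that not all codes work: codon usage bias differs among species, so codons that are rare in the host are translated poorly, and some sequences form mRNA secondary structures that interfere with translation.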
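To get a feel for the size of $3^{345}$, and for how the exact count depends on the sequence, here is a minimal sketch. It assumes the standard genetic code; the seven-residue peptide is just the start of the EGFR sequence used later on this page, and `count_encodings` is an illustrative helper name.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons for each amino acid (e.g. M -> 1, L -> 6).
DEGENERACY = {}
for aa in CODON_TABLE.values():
    DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def count_encodings(protein: str) -> int:
    """Exact number of DNA sequences that encode `protein` (stop codon excluded)."""
    return prod(DEGENERACY[aa] for aa in protein)


# The back-of-the-envelope estimate from the answer above.
print(f"3^345 is a {len(str(3 ** 345))}-digit number")

# Exact count for a short peptide: 1 * 6 * 4 * 6 * 4 * 4 * 4.
print(count_encodings("MRPSGTA"))
```

The exact count varies by sequence because degeneracy ranges from 1 codon (M, W) to 6 (L, R, S), which is why 3 per residue is only an average.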

Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

Phosphoramidite method (solid-phase chemical synthesis).

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because each coupling step is slightly less than 100% efficient, the yield of full-length product decreases exponentially with length, and errors accumulate with every added base.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Because the cumulative error rate would be too high, leading to practically zero yield of correct full-length product.
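The exponential drop in yield can be sketched numerically. The 99% per-step coupling efficiency below is an illustrative assumption, not a figure from the lecture, and `full_length_yield` is a hypothetical helper name.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of strands that reach full length after length_nt - 1 couplings.

    Assumes every coupling step succeeds independently with the same probability.
    """
    return coupling_efficiency ** (length_nt - 1)


for n in (20, 200, 2000):
    print(f"{n:>5} nt: {full_length_yield(n):.2e}")
```

Under this assumption a 200 nt oligo still gives a usable fraction of full-length strands, while a 2000 nt strand is essentially never completed.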

Question from George Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Phenylalanine, Valine, Threonine, Tryptophan, Isoleucine, Methionine, Histidine, Leucine, Lysine, Arginine.

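Lysine is one of these ten, so no animal can synthesize it; every animal already has to obtain lysine from its diet for protein synthesis and survival. The “Lysine Contingency” (engineering an animal to depend on supplied lysine) therefore adds little real containment, because lysine is available from ordinary food sources.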

Your HTGAA Website

Here it is!

Week 02 HW: DNA Read, Write, & Edit

Part 0: Basics of Gel Electrophoresis

I have attended the recitation.

Part 1: Benchling & In-silico Gel Art

I made the gel art below. It is “HT” for “How To grow almost anything”.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

I worked in a group with Louisa, Jasmine, and Yutong. We tried to make the cat gel art designed by Louisa, but unfortunately it was not very successful. Photo below:

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose EGFR (Epidermal Growth Factor Receptor), because it is a protein that plays a critical role in cell growth and division, and it is often mutated in various cancers.

>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor OS=Homo sapiens OX=9606 GN=EGFR PE=1 SV=2
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA
VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF
QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC
TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV
VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK
NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF
ENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKL
FGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCN
LLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVM
GENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSIATGMVGALLLLLVV
ALGIGLFMRRRHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGS
GAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGI
CLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAA
RNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSY
GVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPK
FRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQ
QGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTED
SIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLN
TVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENAEYLRV
APQSSEFIGA

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

I used the reverse translation tool on the Twist Bioscience website. I added a stop codon (TAA) manually to the end of the DNA sequence. The DNA sequence is as follows:

>Egfr
ATGCGTCCGTCCGGCACTGCAGGGGCCGCACTGTTGGCTCTGCTCGCAGCCTTATGCCCCGCCAG
TCGAGCATTGGAGGAGAAGAAGGTGTGCCAGGGAACCTCTAACAAGCTGACACAACTCGGGACCT
TCGAGGACCATTTCCTGAGCCTCCAGAGAATGTTTAACAACTGCGAGGTTGTGTTGGGGAATCTG
GAGATTACCTACGTCCAGCGTAACTACGACCTCAGTTTCCTCAAGACAATACAGGAAGTGGCTGG
TTATGTTCTGATTGCACTGAATACCGTAGAGAGAATCCCCCTGGAAAACCTGCAAATTATTAGAG
GGAACATGTACTACGAAAACTCATATGCACTGGCCGTCCTGTCTAATTACGATGCCAACAAAACC
GGCCTGAAGGAGCTGCCCATGAGGAATCTCCAGGAAATTCTGCATGGGGCTGTAAGATTCAGCAA
TAACCCGGCTCTCTGTAACGTCGAGAGCATCCAGTGGCGCGACATAGTGAGCTCTGATTTCCTTT
CAAATATGTCCATGGATTTTCAAAACCATCTGGGGTCTTGTCAAAAGTGCGACCCATCTTGTCCC
AATGGTTCCTGTTGGGGCGCCGGAGAGGAGAATTGTCAGAAGCTTACCAAGATTATCTGTGCACA
GCAGTGTTCAGGGAGATGTAGGGGGAAGTCTCCAAGCGATTGCTGTCACAATCAATGCGCCGCCG
GGTGCACCGGCCCCAGAGAGAGTGACTGTTTAGTGTGCCGTAAATTCAGGGATGAGGCTACGTGC
AAAGACACATGCCCTCCACTCATGTTGTACAACCCAACAACTTATCAGATGGACGTGAACCCCGA
AGGGAAGTATAGCTTTGGGGCGACGTGCGTCAAGAAATGTCCACGCAACTATGTCGTTACGGACC
ACGGTTCCTGTGTGCGCGCTTGCGGAGCCGATTCATACGAAATGGAGGAAGACGGCGTCCGCAAG
TGTAAGAAGTGTGAAGGTCCTTGTCGCAAAGTCTGTAACGGTATCGGAATCGGCGAATTCAAGGA
CTCTCTCAGTATTAATGCAACCAATATCAAGCATTTTAAAAATTGTACCTCAATCTCAGGGGACT
TGCACATCCTGCCCGTTGCGTTTAGGGGGGACTCATTTACACACACCCCTCCACTGGATCCACAG
GAATTGGATATTCTGAAGACGGTGAAGGAAATCACCGGGTTCCTCCTGATTCAGGCCTGGCCAGA
GAACCGGACCGACCTTCACGCATTTGAGAACCTTGAAATTATTCGCGGGAGAACGAAACAACATG
GGCAATTCAGTCTCGCTGTCGTCAGCCTTAACATTACCTCCCTCGGATTGAGATCACTGAAGGAG
ATAAGCGACGGGGACGTGATAATCTCTGGGAACAAGAACTTGTGCTACGCCAACACTATTAACTG
GAAGAAGTTGTTCGGTACCTCAGGTCAGAAGACTAAAATCATCTCTAATCGAGGCGAAAATTCTT
GCAAAGCAACAGGGCAGGTGTGCCATGCCCTGTGCTCCCCTGAAGGGTGCTGGGGCCCCGAACCA
AGAGACTGCGTCTCTTGTCGGAATGTTTCCCGGGGGAGGGAATGTGTGGATAAGTGCAACCTGCT
TGAAGGAGAACCGAGAGAGTTTGTGGAAAATTCCGAATGTATCCAATGCCACCCTGAATGTTTAC
CTCAGGCCATGAACATTACATGCACAGGTAGGGGCCCCGACAATTGTATTCAGTGCGCCCATTAC
ATCGACGGACCCCACTGCGTAAAGACGTGCCCTGCCGGCGTTATGGGCGAGAATAATACCCTGGT
GTGGAAGTATGCCGACGCCGGGCATGTTTGCCATCTGTGCCACCCAAATTGTACATATGGCTGCA
CCGGCCCCGGGTTGGAGGGCTGCCCCACTAATGGACCTAAGATCCCCAGCATCGCCACCGGGATG
GTTGGCGCTCTGCTGCTGCTGTTGGTGGTCGCCTTGGGGATAGGGTTGTTCATGCGAAGGAGGCA
TATCGTCAGGAAGCGCACCCTGAGGAGGCTGCTCCAAGAGCGTGAGCTTGTCGAACCCCTGACCC
CCAGCGGCGAGGCCCCAAACCAGGCACTTTTGCGCATTCTGAAAGAGACCGAATTCAAAAAGATC
AAGGTGCTCGGCAGCGGGGCCTTCGGCACTGTGTACAAGGGTCTTTGGATCCCGGAGGGCGAAAA
AGTTAAAATACCCGTGGCAATAAAAGAGCTGAGGGAGGCTACAAGTCCAAAGGCAAACAAAGAAA
TCCTTGACGAAGCATACGTTATGGCATCCGTCGATAATCCACACGTGTGCAGGCTTCTCGGGATT
TGCCTGACGTCAACCGTGCAGCTGATTACGCAACTGATGCCATTTGGCTGTCTGCTCGATTATGT
GCGCGAGCACAAGGACAACATTGGATCTCAATACTTACTCAACTGGTGCGTGCAGATCGCGAAGG
GGATGAACTATCTGGAGGACCGCAGACTCGTCCACAGGGATCTTGCAGCCAGGAACGTACTCGTT
AAGACCCCGCAGCATGTGAAAATTACCGATTTCGGCCTTGCAAAACTGCTGGGTGCCGAAGAGAA
AGAATACCATGCAGAGGGTGGCAAAGTTCCTATCAAATGGATGGCGTTAGAGTCAATTCTGCATC
GGATCTATACCCATCAGAGCGACGTGTGGTCTTACGGTGTGACCGTTTGGGAGCTTATGACTTTT
GGGAGCAAACCGTACGACGGCATCCCGGCAAGCGAAATTTCCTCAATACTGGAAAAGGGGGAACG
TCTGCCCCAACCACCTATCTGCACTATAGACGTTTATATGATAATGGTGAAATGTTGGATGATCG
ACGCCGACAGTCGACCCAAGTTTCGAGAGTTAATCATCGAGTTCTCCAAGATGGCTCGGGATCCC
CAAAGGTACTTAGTCATCCAGGGCGATGAAAGAATGCACTTACCCTCACCCACAGATTCAAACTT
CTATCGAGCTTTGATGGATGAGGAAGACATGGATGATGTGGTAGACGCCGACGAGTACCTGATAC
CACAGCAGGGTTTTTTTTCTTCACCAAGCACATCTCGTACGCCTCTTCTGAGTAGCCTCAGCGCG
ACCTCCAACAACTCCACAGTGGCGTGCATCGACCGCAACGGACTTCAGTCCTGTCCAATTAAAGA
GGATTCCTTCCTGCAGAGGTATAGCAGCGACCCTACCGGAGCCCTTACCGAGGATAGTATTGATG
ATACATTCCTGCCTGTACCCGAATACATTAATCAGTCCGTGCCAAAACGCCCCGCAGGGAGTGTA
CAGAATCCAGTGTACCACAACCAGCCGCTGAACCCCGCACCCAGCCGAGACCCCCACTACCAGGA
CCCACATAGCACGGCCGTGGGAAATCCTGAATACCTGAACACCGTGCAGCCTACATGTGTGAATA
GCACTTTCGATAGCCCCGCACACTGGGCCCAGAAGGGCTCACACCAAATTAGCCTTGATAACCCT
GATTACCAGCAGGACTTCTTCCCCAAGGAGGCAAAGCCAAACGGTATCTTTAAGGGTAGCACGGC
CGAAAACGCAGAGTACTTGAGGGTTGCCCCTCAGTCCAGTGAGTTCATTGGCGCCTAA

3.3. Codon optimization.

I used the codon optimization tool on the Twist Bioscience website. The optimized DNA sequence is attached below.

Codon optimization is necessary because different organisms have different preferences for codons to encode the same amino acid. This can affect the efficiency of protein expression. I chose to optimize the codon sequence for E. coli, because it is a commonly used host organism for protein expression in the lab.

>Egfr
ATGAGACCCAGTGGAACCGCCGGTGCAGCTCTCCTTGCTTTGCTCGCTGCGCTCTGTCCAGCTTC
ACGGGCCCTTGAAGAAAAGAAAGTCTGTCAAGGTACAAGCAATAAACTCACGCAGTTGGGAACTT
TTGAAGATCACTTTCTGTCCCTGCAAAGGATGTTCAATAATTGTGAAGTAGTTCTGGGCAACCTC
GAAATCACATATGTACAGAGAAATTATGATTTATCCTTTCTGAAAACCATCCAAGAGGTAGCCGG
GTACGTCTTGATCGCTTTAAACACGGTTGAACGGATACCACTCGAGAATTTGCAGATAATCCGCG
GCAATATGTATTATGAGAATAGCTACGCCCTCGCGGTGCTCTCAAACTATGACGCGAATAAGACA
GGGTTAAAAGAATTACCAATGAGAAACCTGCAAGAGATACTCCACGGTGCAGTTAGGTTTAGTAA
CAATCCAGCCCTGTGCAATGTGGAATCTATTCAATGGCGAGATATCGTTAGTAGTGACTTTCTGT
CCAACATGAGTATGGACTTCCAGAATCACCTTGGCAGTTGCCAGAAATGTGATCCCAGCTGCCCA
AACGGGAGCTGCTGGGGAGCTGGGGAAGAAAACTGCCAGAAACTCACTAAAATCATATGCGCTCA
ACAATGCTCTGGCAGGTGCAGAGGCAAAAGCCCTTCCGACTGTTGCCATAACCAGTGTGCAGCTG
GATGTACTGGGCCGAGGGAAAGCGATTGCCTTGTCTGTAGAAAGTTTCGGGACGAAGCCACCTGT
AAGGATACTTGTCCACCCCTGATGCTCTATAATCCTACGACCTACCAAATGGATGTTAATCCGGA
AGGAAAATACTCCTTCGGCGCCACCTGTGTGAAGAAGTGCCCGCGGAATTACGTTGTGACAGATC
ATGGGTCTTGCGTCCGAGCCTGTGGTGCAGACTCTTATGAGATGGAAGAGGATGGGGTGAGGAAA
TGCAAGAAATGCGAGGGGCCATGCAGGAAGGTATGCAATGGAATTGGCATAGGTGAGTTTAAAGA
TTCACTGAGCATCAACGCGACAAACATTAAACACTTCAAGAACTGCACGTCCATATCTGGAGATC
TTCATATTCTTCCGGTGGCTTTCCGAGGAGATTCTTTCACCCATACACCACCCTTAGACCCTCAA
GAGCTGGACATATTGAAAACAGTTAAAGAGATTACAGGCTTTCTGCTTATCCAAGCTTGGCCTGA
AAATAGGACGGATCTCCATGCCTTCGAAAATCTGGAGATCATCAGAGGACGCACAAAGCAGCACG
GACAGTTTTCCCTGGCGGTGGTGTCTCTCAATATAACTTCACTTGGCTTACGCAGCCTCAAAGAA
ATTTCCGACGGAGATGTCATCATAAGTGGAAATAAGAATCTCTGTTACGCTAATACCATCAATTG
GAAGAAACTCTTCGGAACATCTGGACAAAAGACAAAGATCATTAGCAACCGCGGGGAGAACAGCT
GTAAGGCTACCGGACAAGTCTGTCACGCACTTTGTTCTCCAGAGGGATGTTGGGGCCCAGAGCCT
CGTGATTGTGTGTCCTGCAGGAACGTCAGCCGCGGCAGAGAGTGCGTTGACAAATGTAATCTCCT
CGAGGGCGAGCCTCGCGAATTCGTTGAGAACAGTGAGTGCATTCAGTGTCATCCAGAGTGCTTGC
CACAAGCTATGAATATCACCTGTACTGGACGCGGACCTGATAACTGCATCCAATGTGCTCACTAT
ATAGATGGGCCACATTGTGTGAAAACTTGTCCAGCTGGTGTAATGGGAGAAAACAACACATTAGT
TTGGAAATACGCAGATGCTGGTCACGTGTGTCACCTTTGTCATCCTAACTGCACCTACGGGTGTA
CTGGTCCAGGCCTCGAAGGTTGTCCGACCAACGGCCCAAAGATTCCTTCAATTGCAACAGGCATG
GTCGGGGCCTTGTTGTTGTTGCTGGTTGTGGCACTCGGTATTGGCCTGTTTATGAGACGGCGGCA
CATTGTGCGTAAAAGAACATTACGTCGCCTCTTACAGGAACGAGAACTGGTAGAGCCGCTCACAC
CTTCTGGGGAAGCACCGAATCAAGCCCTGCTCCGTATATTAAAGGAAACTGAGTTTAAGAAAATT
AAAGTACTGGGATCCGGCGCTTTTGGTACAGTTTATAAAGGGCTGTGGATACCTGAAGGGGAGAA
GGTGAAGATCCCTGTCGCCATCAAGGAATTGCGAGAAGCGACTTCCCCCAAAGCCAATAAAGAGA
TTCTGGATGAGGCCTATGTCATGGCTTCTGTGGACAACCCTCATGTTTGTCGCCTGTTAGGCATC
TGTCTTACTAGCACTGTCCAACTTATCACTCAGTTGATGCCGTTCGGGTGCCTTCTGGACTACGT
TAGAGAACATAAAGATAATATCGGTAGCCAGTATCTCCTGAATTGGTGTGTTCAAATAGCAAAAG
GCATGAATTACCTCGAAGATCGGCGGCTGGTTCATCGCGACCTGGCTGCTCGGAATGTCCTTGTC
AAAACACCCCAACACGTAAAGATAACAGACTTTGGGCTGGCTAAGCTTCTTGGCGCTGAGGAAAA
GGAGTATCACGCTGAAGGCGGAAAGGTCCCCATTAAGTGGATGGCTCTGGAATCCATCTTGCACA
GGATATACACTCACCAATCAGATGTCTGGTCCTATGGGGTAACAGTATGGGAACTGATGACCTTC
GGCTCCAAGCCATATGATGGAATACCTGCGAGTGAGATAAGCTCCATTTTGGAGAAAGGAGAGAG
GTTACCGCAGCCGCCCATATGTACAATTGATGTCTACATGATTATGGTCAAGTGCTGGATGATTG
ATGCAGATAGCCGGCCGAAATTCCGCGAATTGATTATTGAATTTAGCAAAATGGCCCGCGACCCA
CAGCGCTATTTGGTTATTCAAGGGGACGAGAGGATGCATCTGCCAAGCCCAACTGACAGCAACTT
TTACCGCGCCCTTATGGACGAAGAGGATATGGACGACGTTGTGGATGCTGATGAATATTTGATTC
CTCAACAAGGGTTCTTCAGTTCTCCCTCAACTTCCAGAACCCCACTGCTTTCAAGTTTATCCGCA
ACTAGTAATAATAGCACCGTTGCATGTATTGATCGGAATGGGCTGCAAAGCTGCCCCATCAAGGA
AGACTCATTCTTACAACGTTACTCATCTGATCCCACTGGGGCGCTGACTGAAGACTCAATCGACG
ACACCTTTCTTCCAGTCCCGGAGTATATCAACCAAAGTGTCCCCAAGAGACCTGCCGGATCCGTG
CAAAACCCTGTTTATCATAATCAACCTCTCAATCCAGCGCCGTCTAGGGATCCTCATTATCAAGA
TCCCCACTCTACTGCTGTAGGGAACCCAGAGTATCTCAATACAGTTCAACCCACTTGCGTCAACT
CTACCTTTGACAGTCCTGCCCATTGGGCGCAGAAAGGTTCCCATCAGATCTCCCTGGACAATCCA
GACTATCAACAAGATTTCTTTCCTAAAGAAGCCAAACCCAATGGGATATTCAAAGGATCTACCGC
AGAGAATGCGGAATATCTGCGCGTGGCACCCCAAAGCAGCGAATTTATCGGGGCTTAG
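A quick sanity check for both DNA sequences above is to translate them back and confirm they still encode the same protein. The sketch below assumes the standard genetic code and, to stay short, checks only the first 21 nucleotides of each sequence; `translate` is an illustrative helper, not part of the Twist tool.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1 until the first stop codon or the end of the sequence."""
    dna = "".join(dna.upper().split())  # drop line breaks and spaces
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 21 nt of the reverse-translated and the codon-optimized sequences.
reverse_translated = "ATGCGTCCGTCCGGCACTGCA"
codon_optimized = "ATGAGACCCAGTGGAACCGCC"

print(translate(reverse_translated))
print(translate(codon_optimized))
```

Both fragments should print the same peptide, the first seven residues of EGFR, even though 8 of the 21 nucleotides differ. Pasting the full sequences into `translate` extends the check to the whole protein.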

3.4. You have a sequence! Now what?

Cell-dependent and cell-free technologies could be used.

Cell-dependent method: First, insert the DNA sequence into a plasmid vector, and then transfer it into a host cell (e.g. E. coli). The host cell will transcribe the DNA into mRNA, which will then be translated into the protein.

Part 4: Prepare a Twist DNA Synthesis Order

I have done everything in Part 3 using the Twist Bioscience website, and followed the tutorial to finish the remaining steps.

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?

I would want to sequence the DNA of a cancer cell, because we can learn which mutations led to the cancer, and use corresponding drugs (if available) to treat the cancer.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Illumina sequencing.

Q1: Is your method first-, second- or third-generation or other? How so?

Second-generation, because it is based on massively parallel sequencing by synthesis of short, clonally amplified fragments.

Q2: What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

The input is the DNA extracted from the cancer cell.

Fragmentation: The DNA is fragmented into smaller pieces.

Adapter ligation: Adapters are ligated to the ends of the DNA fragments.

PCR: The DNA fragments are amplified using PCR.

Q3: What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

First, DNA fragments are immobilized on a flow cell and amplified to form clusters. Then, fluorescently labeled nucleotides are added to the flow cell. Each time a nucleotide is incorporated into the growing DNA strand by DNA polymerase, a fluorescent signal is emitted. The fluorescence is imaged to determine which nucleotide was incorporated in each cluster. Finally, the fluorescence signals are converted into base calls.

Q4: What is the output of your chosen sequencing technology?

Sequence reads with per-base quality scores. The reads can be assembled into a complete genome sequence or aligned to a reference genome.
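The quality scores mentioned above are Phred scores, where $Q = -10 \log_{10} p$ and $p$ is the estimated probability that the base call is wrong. A minimal sketch of decoding them, assuming the Phred+33 ASCII encoding used in current FASTQ files; the quality string is made up for illustration.

```python
def phred_to_error_probability(q: int) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_quality_string(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality line into Phred scores (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


scores = decode_quality_string("II5!")  # made-up example quality string
for q in scores:
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
```

A Q30 base, a common quality threshold, corresponds to a 1 in 1,000 chance of a wrong call.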

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I would want to synthesize the DNA sequence of EGFR variants found in cancers. By synthesizing the mutated EGFR sequence and expressing it as protein, we can study its function and develop targeted drugs.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use phosphoramidite chemical synthesis for short DNA fragments (e.g., up to about 200 nt), and then Gibson assembly to join the short fragments into the full-length DNA sequence.

Q1: What are the essential steps of your chosen sequencing methods?

Phosphoramidite chemical synthesis: One DNA base is added at a time. Each cycle consists of four steps: deprotection, coupling, capping, and oxidation.

Gibson assembly: DNA fragments with overlapping ends are mixed together with three enzymes. An exonuclease chews back the 5′ ends to expose single-stranded overhangs, the complementary overlaps anneal, DNA polymerase fills in the gaps, and DNA ligase seals the nicks, stitching the fragments together.
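The overlap requirement can be checked in silico before ordering fragments. The sketch below only mimics the end result (fragments joined at their shared ends), not the enzymology. The function names and the 4 nt minimum overlap are illustrative assumptions; real Gibson overlaps are much longer, typically tens of base pairs. The three fragments are cut from the start of the EGFR sequence in Part 3.2.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 4) -> str:
    """Merge two fragments at their longest suffix/prefix overlap."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments do not share an overlap of the required length")


def assemble(fragments: list[str], min_overlap: int = 4) -> str:
    """Join fragments in the given order, each overlapping the next."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        assembled = join_by_overlap(assembled, fragment, min_overlap)
    return assembled


# Three toy fragments with 4 nt overlaps (GTCC and CTGC).
fragments = ["ATGCGTCCGTCC", "GTCCGGCACTGC", "CTGCAGGGGCC"]
print(assemble(fragments))
```

If any adjacent pair lacks the required overlap, `assemble` raises an error, which is the in-silico equivalent of an assembly that would fail at the bench.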

5.3 DNA Edit

(i) What DNA would you want to edit and why?

I would want to edit the DNA sequence of a gene that is mutated in a genetic disease, such as cystic fibrosis. By editing the DNA sequence to correct the mutation, we can potentially cure the disease.

Q1: How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 is a commonly used technology for DNA editing.

First, a guide RNA (gRNA) is designed to target the specific DNA sequence to be edited. The gRNA is then complexed with the Cas9 protein to form a ribonucleoprotein (RNP) complex, which is delivered into the target cells. The RNP binds to the target DNA sequence and creates a double-strand break. The cell’s repair mechanisms then repair the break, using the supplied repair template to introduce the desired edit (homology-directed repair).

Q2: What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

Preparation and input: gRNA sequence that corresponds to the edit site (find), repair template (replace).
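Designing the gRNA starts with finding candidate target sites. A minimal sketch, assuming the commonly used SpCas9, which needs an NGG PAM immediately 3′ of a 20 nt protospacer. It scans only the strand it is given and does no off-target scoring; `find_candidate_guides` is a hypothetical helper, and the input sequence is made up.

```python
def find_candidate_guides(dna: str, guide_length: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for every NGG PAM site on the given strand."""
    dna = dna.upper()
    hits = []
    # Each site needs guide_length bases followed by a 3-base PAM.
    for start in range(len(dna) - guide_length - 2):
        protospacer = dna[start:start + guide_length]
        pam = dna[start + guide_length:start + guide_length + 3]
        if pam[1:] == "GG":
            hits.append((start, protospacer, pam))
    return hits


# Made-up 25 nt sequence with a single AGG PAM after the first 20 bases.
example = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for start, protospacer, pam in find_candidate_guides(example):
    print(start, protospacer, pam)
```

A real design would also scan the reverse complement and rank candidates by predicted off-target sites, which is the limitation discussed in the next question.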

Q3: What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The main limitation is the off-target effect, where the edits are applied to unintended sites.

Week 03 HW: Lab Automation

Python Script for Opentrons Artwork

I created a design using opentrons-art.rcdonovan.com.

Opentrons-Art Website: https://opentrons-art.rcdonovan.com/?id=80fx569l8o4tho4

Google Colab: https://colab.research.google.com/drive/1UPiCmwBP3sIFD_rNVRHeT3YhuiQQ5ZGP#scrollTo=pczDLwsq64mk&line=6&uniqifier=1

The OpentronMock gives the following output:

Code:

  ###
  ### YOUR CODE HERE to create your design
  ###
  sfgfp_points = [(-3.3, -3.3),(-1.1, -3.3),(1.1, -3.3),(3.3, -3.3),(-3.3, -5.5),(-1.1, -5.5),(1.1, -5.5),(3.3, -5.5),(-1.1, -7.7),(1.1, -7.7),(-5.5, -9.9),(-1.1, -9.9),(1.1, -9.9),(5.5, -9.9),(-3.3, -12.1),(-1.1, -12.1),(1.1, -12.1),(3.3, -12.1)]
  mrfp1_points = [(-23.1, 27.5),(-20.9, 27.5),(-18.7, 27.5),(18.7, 27.5),(20.9, 27.5),(23.1, 27.5),(-23.1, 25.3),(-20.9, 25.3),(-18.7, 25.3),(-16.5, 25.3),(-14.3, 25.3),(14.3, 25.3),(16.5, 25.3),(18.7, 25.3),(20.9, 25.3),(23.1, 25.3),(-23.1, 23.1),(-20.9, 23.1),(-18.7, 23.1),(-16.5, 23.1),(-14.3, 23.1),(-12.1, 23.1),(-3.3, 23.1),(-1.1, 23.1),(1.1, 23.1),(3.3, 23.1),(12.1, 23.1),(14.3, 23.1),(16.5, 23.1),(18.7, 23.1),(20.9, 23.1),(23.1, 23.1),(-23.1, 20.9),(-20.9, 20.9),(-18.7, 20.9),(-16.5, 20.9),(-14.3, 20.9),(-12.1, 20.9),(-9.9, 20.9),(-7.7, 20.9),(-5.5, 20.9),(-3.3, 20.9),(-1.1, 20.9),(1.1, 20.9),(3.3, 20.9),(5.5, 20.9),(7.7, 20.9),(9.9, 20.9),(12.1, 20.9),(14.3, 20.9),(16.5, 20.9),(18.7, 20.9),(20.9, 20.9),(23.1, 20.9),(-23.1, 18.7),(-20.9, 18.7),(-18.7, 18.7),(-16.5, 18.7),(-14.3, 18.7),(-12.1, 18.7),(-9.9, 18.7),(-7.7, 18.7),(7.7, 18.7),(9.9, 18.7),(12.1, 18.7),(14.3, 18.7),(16.5, 18.7),(18.7, 18.7),(20.9, 18.7),(23.1, 18.7),(-23.1, 16.5),(-20.9, 16.5),(-18.7, 16.5),(-16.5, 16.5),(-14.3, 16.5),(-12.1, 16.5),(12.1, 16.5),(14.3, 16.5),(16.5, 16.5),(18.7, 16.5),(20.9, 16.5),(23.1, 16.5),(-23.1, 14.3),(-20.9, 14.3),(-18.7, 14.3),(-16.5, 14.3),(16.5, 14.3),(18.7, 14.3),(20.9, 14.3),(23.1, 14.3),(-23.1, 12.1),(-20.9, 12.1),(-18.7, 12.1),(18.7, 12.1),(20.9, 12.1),(23.1, 12.1),(-23.1, 9.9),(-20.9, 9.9),(20.9, 9.9),(23.1, 9.9),(-23.1, 7.7),(-20.9, 7.7),(20.9, 7.7),(23.1, 7.7),(-23.1, 5.5),(23.1, 5.5),(-25.3, 3.3),(-23.1, 3.3),(23.1, 3.3),(25.3, 3.3),(-25.3, 1.1),(-23.1, 1.1),(23.1, 1.1),(25.3, 1.1),(-25.3, -1.1),(-23.1, -1.1),(23.1, -1.1),(25.3, -1.1),(-25.3, -5.5),(-23.1, -5.5),(23.1, -5.5),(25.3, -5.5),(-25.3, -7.7),(25.3, -7.7),(-23.1, -9.9),(23.1, -9.9),(-23.1, -12.1),(23.1, -12.1),(-23.1, -14.3),(23.1, -14.3),(-20.9, -16.5),(20.9, -16.5),(-20.9, -18.7),(-18.7, -18.7),(18.7, -18.7),(20.9, -18.7),(-18.7, -20.9),(-16.5, -20.9),(16.5, -20.9),(18.7, -20.9),(-16.5, -23.1),(-14.3, -23.1),(-12.1, -23.1),(12.1, -23.1),(14.3, -23.1),(16.5, -23.1),(-14.3, -25.3),(-12.1, 
-25.3),(-9.9, -25.3),(-7.7, -25.3),(7.7, -25.3),(9.9, -25.3),(12.1, -25.3),(14.3, -25.3),(-9.9, -27.5),(-7.7, -27.5),(-5.5, -27.5),(-3.3, -27.5),(-1.1, -27.5),(1.1, -27.5),(3.3, -27.5),(5.5, -27.5),(7.7, -27.5),(9.9, -27.5),(-1.1, -29.7),(1.1, -29.7)]
  azurite_points = [(-9.9, 7.7),(-7.7, 7.7),(7.7, 7.7),(9.9, 7.7),(-12.1, 5.5),(-9.9, 5.5),(-7.7, 5.5),(7.7, 5.5),(9.9, 5.5),(12.1, 5.5),(-9.9, 3.3),(9.9, 3.3)]
  mwasabi_points = [(-27.5, -3.3),(-25.3, -3.3),(-23.1, -3.3),(-20.9, -3.3),(-18.7, -3.3),(-16.5, -3.3),(16.5, -3.3),(18.7, -3.3),(20.9, -3.3),(23.1, -3.3),(25.3, -3.3),(27.5, -3.3),(-23.1, -7.7),(-20.9, -7.7),(-18.7, -7.7),(-16.5, -7.7),(16.5, -7.7),(18.7, -7.7),(20.9, -7.7),(23.1, -7.7),(-27.5, -9.9),(-25.3, -9.9),(25.3, -9.9),(27.5, -9.9),(-16.5, -12.1),(16.5, -12.1),(-20.9, -14.3),(-18.7, -14.3),(18.7, -14.3),(20.9, -14.3),(-23.1, -16.5),(23.1, -16.5),(-25.3, -18.7),(25.3, -18.7)]

  scale = 1

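  def draw_points(points, color="Red"):
    # Split the points into batches of at most 20, so that each batch fits in
    # one 20 uL aspiration at 1 uL per dot.
    segments = []
    for i in range(0, len(points), 20):
      segments.append(points[i : i+20])
    for seg in segments:
      # Use a fresh tip per batch and aspirate exactly 1 uL per dot.
      pipette_20ul.pick_up_tip()
      pipette_20ul.aspirate(len(seg), location_of_color(color))
      for x, y in seg:
        # Offset each dot from the plate center, scaled by `scale`.
        adjusted_location = center_location.move(types.Point(x=x*scale, y=y*scale))
        dispense_and_detach(pipette_20ul, 1, adjusted_location)
      pipette_20ul.drop_tip()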

  draw_points(sfgfp_points, "Red")
  draw_points(mrfp1_points, "Green")
  draw_points(azurite_points, "Orange")
  draw_points(mwasabi_points, "Orange")
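
The batching used in draw_points can be checked offline, without the robot or the Colab mock. The sketch below is a stand-alone illustration only: `chunk` and `tips_needed` are hypothetical helper names, and the batch size of 20 assumes 1 µL per dot with a 20 µL pipette, as in the code above.

```python
import math


def chunk(points, size=20):
    """Split a list of points into consecutive batches of at most `size`."""
    return [points[i:i + size] for i in range(0, len(points), size)]


def tips_needed(n_points, size=20):
    """Number of tips used when every batch gets a fresh tip."""
    return math.ceil(n_points / size)


# Dummy coordinates standing in for a real design (45 dots).
demo_points = [(float(i), 0.0) for i in range(45)]
batches = chunk(demo_points)
print([len(b) for b in batches])      # sizes of each batch
print(tips_needed(len(demo_points)))  # tips consumed for this color
```

Each batch size is also the volume (in µL) aspirated for that batch, so the printed list doubles as a check that no aspiration exceeds the pipette capacity.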

Result

With the help of our TA Ronan, the art was printed with an Opentrons robot. The result is shown below:

Published paper about lab automation

PyLabRobot: An open-source, hardware-agnostic interface for liquid-handling robots and accessories. Wierenga, Rick P. et al. Device, Volume 1, Issue 4, 100111

This paper introduces PyLabRobot, an open-source Python library that provides a unified interface for controlling various liquid-handling robots and accessories, including Opentrons. PyLabRobot also includes a simulator (like the OpentronMock provided in this homework’s Google Colab notebook), which allows users to test and debug their protocols without needing access to the physical robot. Further, this paper also demonstrates the integration with LLMs, allowing users who are not familiar with programming to create protocols using natural language instructions, which are then translated into executable code for the robot.

My plan to use automation tools

I am interested in using lab automation for machine-learning-guided directed evolution of PETase (the PET plastic degradation enzyme).

First, I will use machine learning models such as ProteinMPNN to design an initial library of PETase variants, and order the DNA fragments encoding these variants from Twist Bioscience.

Second, I will use a liquid handler to assemble the DNA fragments into plasmids, and then transform the plasmids into E. coli cells.

Then, I will use a plate reader to measure the activity of the PETase variants in degrading PET plastic. This can also be done in a high-throughput manner using 96-well or 384-well plates with an automation robot.

Finally, I will use the activity data to train a machine learning model to predict the activity of new PETase variants, and then use the model to design the next round of variants for testing. This iterative process can be repeated until we find highly active PETase variants for degrading PET plastic.

Week 04 HW: Protein Design Part I

Part A: Conceptual Questions

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

1 Dalton = 1.66 x 10^-24 grams, so one amino acid (~100 Daltons) weighs about 1.66 x 10^-22 grams; equivalently, 100 grams of amino acids is 1 mole = 6.02 x 10^23 molecules. Assuming about 20% of meat is protein, 500 grams of meat contains 100 grams of protein. Therefore: 100 g / (1.66 x 10^-22 g per molecule) ≈ 6.02 x 10^23 molecules of amino acids.
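
As a quick order-of-magnitude check, the estimate can be reproduced in a few lines of Python. This is a minimal sketch: the 20% protein fraction and the 100 Da average residue mass are the assumptions stated in the answer, and `amino_acid_count` is a hypothetical helper name.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_count(meat_g, protein_fraction, aa_mass_da):
    """Estimate the number of amino acid molecules in a piece of meat.

    A mass of X Daltons per molecule corresponds to X grams per mole.
    """
    protein_g = meat_g * protein_fraction  # grams of protein
    moles = protein_g / aa_mass_da         # moles of amino acids
    return moles * AVOGADRO


n = amino_acid_count(meat_g=500.0, protein_fraction=0.20, aa_mass_da=100.0)
print(f"{n:.2e} amino acid molecules")
```

With these inputs, 100 g of protein at 100 g/mol is one mole, so the result is on the order of 10^23 molecules.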

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

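Digestion breaks dietary proteins down into amino acids, which our cells then reassemble into human proteins according to our own genome. The proteins and DNA of the cow or fish are not incorporated into our genome.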

3. Why are there only 20 natural amino acids?

20 natural amino acids are sufficient to create chemical diversity and efficiency.

4. Can you make other non-natural amino acids? Design some new amino acids.

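Yes. For example, we can replace the sulfur atom in cysteine with selenium to create selenocysteine (which also occurs naturally as the 21st proteinogenic amino acid), or with tellurium to create the non-natural tellurocysteine.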

5. Where did amino acids come from before enzymes that make them, and before life started?

Chemical reactions in the environment.

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Left-handed.

7. Can you discover additional helices in proteins?

Yes. For example, the pi-helix and the 3-10-helix.

8. Why are most molecular helices right-handed?

Most amino acids in life are L-amino acids => right-handed helices.

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

Because of the parallel and anti-parallel hydrogen bonding between the strands. The driving force is the formation of hydrogen bonds.


Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

I chose EGFR (Epidermal Growth Factor Receptor), because it is a protein that plays a critical role in cell growth and division, and it is often mutated in various cancers.

2. Identify the amino acid sequence of your protein

MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALAVLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFKNCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAFENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKLFGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVMGENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSIATGMVGALLLLLVVALGIGLFMRRRHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAARNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQQGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTEDSIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLNTVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENAEYLRVAPQSSEFIGA
Sequence Length: 1210 amino acids

Amino Acid Frequencies:
L: 111 (9.17%)
G: 85 (7.02%)
S: 84 (6.94%)
E: 77 (6.36%)
P: 75 (6.20%)
A: 72 (5.95%)
V: 70 (5.79%)
I: 69 (5.70%)
K: 66 (5.45%)
N: 66 (5.45%)
T: 64 (5.29%)
D: 61 (5.04%)
R: 60 (4.96%)
C: 60 (4.96%)
Q: 49 (4.05%)
F: 36 (2.98%)
Y: 36 (2.98%)
H: 31 (2.56%)
M: 25 (2.07%)
W: 13 (1.07%)
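
A frequency table like the one above can be reproduced with the Python standard library. The sketch below is only an illustration: `aa_frequencies` is a hypothetical helper name and the demo sequence is a short made-up peptide, not EGFR.

```python
from collections import Counter


def aa_frequencies(sequence):
    """Return (residue, count, percent) tuples, most frequent first."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    counts = Counter(seq)
    total = len(seq)
    return [(aa, n, 100.0 * n / total) for aa, n in counts.most_common()]


demo = "MKTLLLAG"  # made-up 8-residue example
print(f"Sequence Length: {len(demo)} amino acids")
for aa, n, pct in aa_frequencies(demo):
    print(f"{aa}: {n} ({pct:.2f}%)")
```

Pasting the full protein sequence in place of `demo` gives the counts and percentages in the same format as the list above.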

BLAST results (250 hits found in UniProtKB): https://www.uniprot.org/blast/uniprotkb/ncbiblast-R20260301-211408-0195-52595182-p2m/overview

Family: Protein kinase superfamily. Tyr protein kinase family. EGF receptor subfamily.

3. Identify the structure page of your protein in RCSB

There are multiple structures of different EGFR domains in RCSB PDB. The earliest one is 1M14 deposited on 2002-06-17. Resolution is 2.60 Å, so it is a good quality structure.

There are water molecules in the structure, but no ligands or cofactors.

Structure classification family:

  • Structural Class: Alpha and beta proteins (a+b)
  • Fold: Protein kinase-like (PK-like)
  • Superfamily: Protein kinase-like (PK-like)
  • Family: Protein kinases catalytic domain-like

4. Open the structure of your protein in any 3D molecule visualization software:

Cartoon:

Ribbon:

Ball and stick:

Secondary structure (6 alpha helices and 6 beta strands):

Residue type (green: hydrophilic, gray: hydrophobic. Hydrophilic residues are more on the surface, while hydrophobic residues are more buried inside):

Surface (it has obvious pockets on the surface, as it is a receptor):

Part C. Using ML-Based Protein Design Tools

In this part, I will use a different protein sequence: Poly(ethylene terephthalate) hydrolase (PETase) (https://www.uniprot.org/uniprotkb/A0A0K8P6T7/entry), which is an enzyme that can degrade PET plastics. The PDB access code is 5XFY.

MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPSSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS

C1. Protein Language Modeling

1. Deep Mutational Scans

Columns with more dark cells (the wild-type amino acid is strongly preferred) indicate more conserved residues, which are likely important for the structure and function of the protein.

2. Latent Space Analysis

Neighboring proteins usually share the same SCOPe structural class and superfamily number, which indicates that the latent space captures structural and functional similarities between proteins.

When placing my PETase sequence embedding into the dimensionality reduction plot, many of its neighbors belong to c.69: alpha/beta-Hydrolases.

C2. Protein Folding

The protein on the left is the experimental structure of PETase (PDB: 5XFY), and the protein on the right is the structure predicted by ESMFold. The two structures are very similar, with a predicted TM score of 0.913. The RMSD reported by PyMOL is 0.540 Å.

Introducing a few point mutations does not noticeably change the predicted structure. However, replacing large fragments sometimes disrupts the predicted structure.

C3. Protein Generation

I used ProteinMPNN to design a sequence based on the structure of the PETase I chose (PDB: 5XFY). The designed sequence is:

NPTVLGPEPTRESLEAPRGPFAVESFEVAAPQGFGAGTVYWPRDAGGKVPAIAIAPGYGQGRAAVAWKGELLASHGFVVLVIDPRSPTSDAPQIAAELMAGLAYLDALNADPASPIYGKIDTSRRGVSGHSLGGGGALIAAMENPELKAAAPMAPYHPETDFSKITVPTLIFASENDTIAPPEKYSKPMYNSITKAPKRLLTIKGGDHGATLTGNPHRGLIGRYLVAWFALYMRDDKRYSEFATENPDSDDVSYWESSNLS

1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

For about half of the sequence, the highest probability amino acid is the same as the original sequence. For the other half, the predicted amino acid is different from the original one, but they often have similar properties (e.g., both are hydrophobic or both are polar).

2. Input this sequence into ESMFold and compare the predicted structure to your original.

The protein on the left is the experimental structure of PETase (PDB: 5XFY), and the protein on the right is the structure predicted by ESMFold from the ProteinMPNN-designed sequence. The predicted structure of the designed sequence is very similar to the original structure (RMSD = 0.756 Å), and ESMFold is very confident about the prediction (predicted TM score = 0.944).

Week 05 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

The original sequence of SOD1 is:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Mutate alanine 4 to valine (A4V; the numbering excludes the initiator methionine, so this is the 5th residue of the sequence above):

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:

index  Binder        Pseudo Perplexity
0      HLYYAVALELKX  13.299815648347872
1      WRSYAVVLELWK  17.97100111129112
2      WRYYPVAAAWKK  11.081842724779028
3      WHYGAVGLRHKX  13.983770011694478

The perplexity of the reference SOD-1 binding sequence FLYRWLPSRRGG is 20.63523127283615:

ppl_value = compute_pseudo_perplexity(model, tokenizer, protein_seq, "FLYRWLPSRRGG")
ppl_value  # Output: 20.63523127283615
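
The body of compute_pseudo_perplexity is not shown here. As a rough sketch of what such a score means, the code below assumes the usual masked-language-model definition: each binder position is masked in turn, the model's log-probability of the true residue is recorded, and the pseudo-perplexity is the exponential of the negative mean. The function name `pseudo_perplexity` and the input log-probabilities are hypothetical; no real model is called.

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp of the negative mean log-probability over masked positions.

    token_log_probs: natural-log probabilities that the model assigned to the
    true residue at each position when that position was masked.
    """
    if not token_log_probs:
        raise ValueError("need at least one position")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up example: a 12-residue binder where the model gives the true residue
# a probability of 0.25 at every position.
log_probs = [math.log(0.25)] * 12
print(pseudo_perplexity(log_probs))
```

Under this definition a lower value means the model finds the peptide more plausible given the target, which is why the generated binders (11 to 18) compare favorably with the reference (about 20.6).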

Part 2: Evaluate Binders with AlphaFold3

The ipTM scores for the reference peptide against the wild-type and mutant SOD1 are both fairly low (0.36 and 0.41, respectively), indicating that AlphaFold3 is not very confident about the predicted binding structure. The first three generated peptides have ipTM scores of 0.24, 0.25, and 0.32, which are lower than the reference. The last generated peptide has an ipTM score of 0.43, which is higher than the reference.

Only the third (A4V-2) generated peptide binds to the dimerization interface of SOD1.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Predicted properties of the generated peptides:

Sequence      Property                Prediction     Value   Unit
HLYYAVALELK   Solubility              Soluble        1.000   Probability
HLYYAVALELK   Hemolysis               Non-hemolytic  0.081   Probability
HLYYAVALELK   Binding Affinity        Weak binding   6.052   pKd/pKi
HLYYAVALELK   Length                  -              11      aa
HLYYAVALELK   Molecular Weight        -              1319.5  Da
HLYYAVALELK   Net Charge (pH 7)       -              -0.15   -
HLYYAVALELK   Isoelectric Point       -              6.75    pH
HLYYAVALELK   Hydrophobicity (GRAVY)  -              0.55    GRAVY
WRSYAVVLELWK  Solubility              Soluble        1.000   Probability
WRSYAVVLELWK  Hemolysis               Non-hemolytic  0.130   Probability
WRSYAVVLELWK  Binding Affinity        Weak binding   6.818   pKd/pKi
WRSYAVVLELWK  Length                  -              12      aa
WRSYAVVLELWK  Molecular Weight        -              1549.8  Da
WRSYAVVLELWK  Net Charge (pH 7)       -              0.76    -
WRSYAVVLELWK  Isoelectric Point       -              8.59    pH
WRSYAVVLELWK  Hydrophobicity (GRAVY)  -              0.17    GRAVY
WRYYPVAAAWKK  Solubility              Soluble        1.000   Probability
WRYYPVAAAWKK  Hemolysis               Non-hemolytic  0.021   Probability
WRYYPVAAAWKK  Binding Affinity        Weak binding   6.124   pKd/pKi
WRYYPVAAAWKK  Length                  -              12      aa
WRYYPVAAAWKK  Molecular Weight        -              1538.8  Da
WRYYPVAAAWKK  Net Charge (pH 7)       -              2.76    -
WRYYPVAAAWKK  Isoelectric Point       -              10.00   pH
WRYYPVAAAWKK  Hydrophobicity (GRAVY)  -              -0.72   GRAVY
WHYGAVGLRHK   Solubility              Soluble        1.000   Probability
WHYGAVGLRHK   Hemolysis               Non-hemolytic  0.023   Probability
WHYGAVGLRHK   Binding Affinity        Weak binding   5.442   pKd/pKi
WHYGAVGLRHK   Length                  -              11      aa
WHYGAVGLRHK   Molecular Weight        -              1323.5  Da
WHYGAVGLRHK   Net Charge (pH 7)       -              1.93    -
WHYGAVGLRHK   Isoelectric Point       -              9.99    pH
WHYGAVGLRHK   Hydrophobicity (GRAVY)  -              -0.73   GRAVY

Predicted properties of the reference peptide:

Sequence      Property                Prediction     Value   Unit
FLYRWLPSRRGG  Solubility              Soluble        1.000   Probability
FLYRWLPSRRGG  Hemolysis               Non-hemolytic  0.047   Probability
FLYRWLPSRRGG  Binding Affinity        Weak binding   5.968   pKd/pKi
FLYRWLPSRRGG  Length                  -              12      aa
FLYRWLPSRRGG  Molecular Weight        -              1507.7  Da
FLYRWLPSRRGG  Net Charge (pH 7)       -              2.76    -
FLYRWLPSRRGG  Isoelectric Point       -              11.71   pH
FLYRWLPSRRGG  Hydrophobicity (GRAVY)  -              -0.71   GRAVY

The peptide WHYGAVGLRHK has the highest ipTM score (0.43), but it has the lowest predicted binding affinity (5.442 pKd/pKi). The peptide WRSYAVVLELWK has a lower ipTM score (0.25) but the highest predicted binding affinity (6.818 pKd/pKi). None of the generated peptides is predicted to be hemolytic or poorly soluble. Overall, I consider WRSYAVVLELWK the best balance of predicted binding and therapeutic properties: it has the highest predicted binding affinity among the generated peptides and is predicted to be soluble and non-hemolytic, although its ipTM score is low.

Part 4: Generate Optimized Peptides with moPPIt

Generated peptide sequence with predicted solubility score, affinity score, and hemolysis score:

['DFRQSTTYQY']
[0.9166666865348816, 6.323781490325928, 0.7198045253753662]

The moPPIt-generated peptide DFRQSTTYQY has a predicted binding affinity score (6.32) that is higher than that of three of the four PepMLM-generated peptides (only WRSYAVVLELWK, at 6.818, scores higher), together with a high predicted solubility score (0.92). Before advancing this peptide to clinical studies, I would evaluate its binding affinity experimentally in vitro, and further assess its stability, toxicity, and pharmacokinetic properties in cell and animal models.

Part C: L-Protein Mutants

I first used Boltz-2 to predict the complex structure of the wild-type L-protein and DnaJ protein:

Next, I used FoldX, a force field-based protein design tool that can predict the effects of mutations on protein-protein interfaces. The goal is to identify mutations in the L-protein that are energetically favorable to stabilize the interaction with DnaJ.

To do this, I first relax the sidechain structure of the L-protein using the following command:

foldx --command=RepairPDB --pdb=result.pdb

The relaxation process slightly adjusts sidechain conformations to minimize steric clashes and optimize interactions. The resulting relaxed structure is shown below in blue (green and cyan are the original structure predicted by Boltz-2):

Next, I scan through L-protein and mutate each residue to all 20 amino acids, and compute the change in binding energy (ΔΔG) for each mutation using the following command:

# Soluble
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=MA1a,EA2a,TA3a,RA4a,FA5a,PA6a,QA7a,QA8a,SA9a,QA10a,QA11a,TA12a,PA13a,AA14a,SA15a,TA16a,NA17a,RA18a,RA19a,RA20a,PA21a,FA22a,KA23a,HA24a,EA25a,DA26a,YA27a,PA28a,CA29a,RA30a,RA31a,QA32a,QA33a,RA34a,SA35a,SA36a,TA37a,LA38a,YA39a,VA40a

# Transmembrane
foldx --command=Pssm --analyseComplexChains=A,B --pdb=result_Repair.pdb --positions=LA41a,IA42a,FA43a,LA44a,AA45a,IA46a,FA47a,LA48a,SA49a,KA50a,FA51a,TA52a,NA53a,QA54a,LA55a,LA56a,LA57a,SA58a,LA59a,LA60a,EA61a,AA62a,VA63a,IA64a,RA65a,TA66a,VA67a,TA68a,TA69a,LA70a,QA71a,QA72a,LA73a,LA74a,TA75a
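
Typing the --positions argument by hand is error-prone, so it can be generated from the sequence. The sketch below simply mirrors the format visible in the commands above (wild-type residue, chain letter, residue number, then a trailing "a"); `pssm_positions` is a hypothetical helper name, and the meaning of the trailing "a" should be checked against the FoldX documentation.

```python
def pssm_positions(sequence, chain="A", start=1, suffix="a"):
    """Build a comma-separated positions string such as 'MA1a,EA2a,...'."""
    return ",".join(
        f"{residue}{chain}{number}{suffix}"
        for number, residue in enumerate(sequence, start=start)
    )


# Residues 1-40 of the L-protein, read off the soluble-region command above.
soluble = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"

print(pssm_positions(soluble[:4]))        # MA1a,EA2a,TA3a,RA4a
print(pssm_positions("LIF", start=41))    # LA41a,IA42a,FA43a
```

Calling `pssm_positions(soluble)` yields the full 40-position string for the soluble region, and passing `start=41` with the transmembrane residues does the same for the second command.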

The results of the mutations in the soluble region are shown below. Green indicates mutations that are predicted to stabilize the interaction (negative ddG), while red indicates mutations that are predicted to destabilize the interaction (positive ddG):

The results of the mutations in the TM region are shown below:

Based on the results above, I would propose the following multi-site mutations in the soluble region:

  • DA26L + NA17W + CA29W: sum = −9.23
  • DA26L + EA25P + QA8T + FA22H: sum = −9.03
  • DA26L + NA17W + RA4E + SA9F: sum = −9.95
  • DA26L + EA2D + FA5M + RA20N: sum = −8.69
  • DA26L + HA24P + PA28K + RA34M: sum = −7.63

My rationale is that combining single stabilizing mutations will have an additive effect on the overall binding affinity. However, this assumption ignores potential epistatic interactions between mutations (non-additive effects).
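
The additive scoring described above can be sketched as follows. This is only an illustration of the additivity assumption: the single-mutation ΔΔG values in `single_ddg` are made-up placeholders (not values from the FoldX scan), and `parse_mutation` and `additive_ddg` are hypothetical helper names.

```python
import re

# Mutation codes follow the notation used above, e.g. "DA26L":
# wild-type residue D, chain A, position 26, mutant residue L.
MUTATION = re.compile(r"^([A-Z])([A-Z])(\d+)([A-Z])$")


def parse_mutation(code):
    """Split a code like 'DA26L' into (wild_type, chain, position, mutant)."""
    match = MUTATION.match(code)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code}")
    wild_type, chain, position, mutant = match.groups()
    return wild_type, chain, int(position), mutant


def additive_ddg(single_ddg, combo):
    """Sum single-mutation ddG values, assuming no epistasis."""
    sites = [parse_mutation(code)[1:3] for code in combo]
    if len(set(sites)) != len(sites):
        raise ValueError("a combination may mutate each site only once")
    return sum(single_ddg[code] for code in combo)


# Placeholder numbers for illustration only; real values come from the scan.
single_ddg = {"DA26L": -3.0, "NA17W": -2.0, "CA29W": -1.5}
total = additive_ddg(single_ddg, ["DA26L", "NA17W", "CA29W"])
print(f"additive ddG estimate: {total:.2f}")
```

Because the sum ignores epistasis, a more reliable check would be to rebuild each multi-site mutant and recompute the binding energy directly.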

Week 06 HW: Genetic Circuits Part I

DNA Assembly

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

  • Phusion DNA Polymerase: synthesizes new DNA strands by adding nucleotides to the template strand during PCR
  • dNTPs (deoxynucleotides): building blocks of DNA
  • Buffer: provides the optimal pH and salt conditions (including Mg2+) for the polymerase

2. What are some factors that determine primer annealing temperature during PCR?

  • Tm (melting temperature) of the primer: the temperature at which half of the DNA duplex dissociates
  • Primer length: longer primers generally require higher annealing temperatures
  • GC content: higher GC content increases the Tm and may require higher annealing temperatures
  • Salt concentration: higher salt concentrations can stabilize the DNA duplex and may require higher annealing temperatures
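
To make the effect of primer length and GC content concrete, the sketch below uses the Wallace rule, Tm ≈ 2 °C x (A+T) + 4 °C x (G+C), a rough approximation meant for short oligos. It ignores salt concentration, it is not the calculation recommended for any particular polymerase, and the function names and the example primer are hypothetical.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C using the Wallace rule."""
    primer = primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


primer = "ATGCATGCATGCATGCATGC"  # made-up 20-mer with 50% GC
print(gc_content(primer))  # 0.5
print(wallace_tm(primer))  # 60
```

Lengthening the primer or swapping A/T for G/C raises the estimate, which matches the trends listed above. The annealing temperature is then usually set a few degrees around this Tm, depending on the polymerase.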

3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

  • PCR:

    • Protocol: denaturation, annealing, and extension steps to amplify specific DNA sequences using primers and a DNA polymerase
    • Advantages: can amplify specific DNA sequences from complex mixtures, does not require specific restriction sites, can introduce mutations or tags through primer design
    • Disadvantages: may produce non-specific products, requires optimization of conditions, can be time-consuming
  • Restriction enzyme digests:

    • Protocol: cutting DNA at specific sites using restriction enzymes
    • Advantages: can generate specific fragments based on known restriction sites
    • Disadvantages: requires specific restriction sites

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Create overlapping sequences at the ends of the DNA fragments that are complementary to each other, which allows for efficient assembly during Gibson cloning.
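
A simple in-silico check for this is to confirm that the end of each fragment matches the start of the next. The sketch below is a minimal illustration: `shared_overlap` is a hypothetical helper, the sequences are made up, and the 20 bp minimum is an assumed threshold that should be set according to the assembly kit's recommendations.

```python
def shared_overlap(upstream, downstream, min_len=20):
    """Longest suffix of `upstream` that is also a prefix of `downstream`.

    Returns an empty string if the shared region is shorter than `min_len`.
    Both fragments are given 5'->3' on the same strand.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return upstream[-n:]
    return ""


homology = "ACGTTGCAAGCTTGGATCCA"     # made-up 20 bp homology arm
fragment_a = "TTTTTTTT" + homology    # e.g. end of the PCR product
fragment_b = homology + "CCCCCCCC"    # e.g. start of the digested backbone

overlap = shared_overlap(fragment_a, fragment_b)
print(len(overlap), overlap)
```

For a circular assembly the same check is repeated for every junction, including the one between the last fragment and the first.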

5. How does the plasmid DNA enter the E. coli cells during transformation?

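  • Heat shock: an abrupt temperature change transiently generates pores in the bacterial cell membrane, through which the plasmid enters
  • Electroporation: a high-voltage electrical pulse transiently generates pores in the bacterial cell membrane, through which the plasmid enters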

6. Describe another assembly method in detail (such as Golden Gate Assembly)

Golden Gate Assembly: assembly of multiple DNA fragments in a single reaction using type IIS restriction enzymes and DNA ligase. Type IIS restriction enzymes cut outside their recognition sites and create overhangs, which are designed to be complementary to the overhangs of the adjacent fragments. Because the recognition sites are removed from the final product, the assembly is seamless. This method is efficient and allows multiple fragments to be assembled in a defined order in a single reaction.

Asimov Kernel

Recreate Repressilator

I recreated the Repressilator by dragging the parts one by one from the Characterized Bacterial Parts repository to the editor.

Below are the simulation results:

The results look similar to the original Repressilator construct simulation results, which shows oscillations in the expression levels of the three proteins.

My Construct

I created a construct consisting of a pLacI promoter, an RBS, the LacI coding sequence, and a terminator. Below are the simulation results, which show that the LacI protein is expressed at a high level:

Next, I tried to remove the promoter, and the simulation results show that the LacI protein is not expressed at all:

I also tried to put the promoter after the LacI coding sequence, and the simulation results show that the LacI protein is not expressed at all:

Week 07 HW: Genetic Circuits Part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

IANNs can perform more complex computations than traditional genetic circuits, which are limited to Boolean functions. IANNs can process continuous inputs and produce continuous outputs, allowing for more nuanced control of gene expression. Additionally, IANNs can learn and adapt over time, making them more versatile and capable of handling dynamic environments.

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

One useful application for an IANN could be in the field of personalized medicine. An IANN can sense biomarkers from a patient’s blood and regulate the release of drugs in response to the biomarkers.

3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

X1 (DNA encoding Csy4) --> Tx --> Csy4 mRNA --> Tl --> Csy4 protein
X2 (DNA encoding fluorescent protein) --> Tx --> fluorescent protein mRNA (regulated by Csy4) --> Tl --> fluorescent protein output

Assignment Part 2: Fungal Materials

1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Mycelium-based materials used for packaging and insulation.

Advantages: sustainability, biodegradability, and low environmental impact.

Disadvantages: limited durability and potential for mold growth.

2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Bioproduction of biofuels or other valuable chemicals. Fungi have the ability to break down complex organic materials, making them well-suited for this purpose. Fungi can grow in a wide range of environments, which makes them a more sustainable option compared to bacteria.

Week 09 HW: Cell-Free Systems

General homework questions

1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Advantages: cell-free systems allow for precise control over the reaction conditions, such as temperature, pH, and the concentration of substrates and cofactors. This can lead to higher yields and faster protein production compared to in vivo methods.

Examples:

  • Production of toxic proteins that would harm living cells.
  • Rapid prototyping of genetic circuits without the need for transformation and cell growth.

2. Describe the main components of a cell-free expression system and explain the role of each component.

  • Cell extract: contains the necessary machinery for transcription and translation, including ribosomes, tRNAs, and enzymes.
  • Energy source: provides the necessary energy for protein synthesis, such as ATP or GTP.
  • DNA template: contains the genetic information for the protein to be synthesized.
  • Substrates: amino acids and nucleotides required for protein synthesis.
  • Cofactors: molecules that assist in enzymatic reactions, such as magnesium ions.
  • Buffer: maintains the optimal pH and ionic strength for the reaction.

3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

The synthesis of proteins requires ATP supply.

Phosphoenolpyruvate (PEP): it can be converted to pyruvate by the enzyme pyruvate kinase, which generates ATP from ADP in the process.

4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic: high yields, high speed, and low cost. Suitable for simple proteins without post-translational modifications (e.g. Green Fluorescent Protein GFP).

Eukaryotic: can perform post-translational modifications, but slower and more expensive. Suitable for complex proteins that require PTM (e.g. human insulin).

5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Membrane proteins are hydrophobic (i.e. lipophilic), so the main challenge is that they misfold and aggregate in aqueous solution. Therefore, I would use a cell-free system that mimics the membrane lipid environment. Specifically, I would add liposomes to the reaction to provide a suitable environment for the protein to insert into.

6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

  • DNA template degraded: replace the DNA template with a fresh one
  • Insufficient energy supply: add more energy source (e.g. ATP or PEP)

Homework question from Kate Adamala

Design an example of a useful synthetic minimal cell as follows:

1.a. What would your synthetic cell do? What is the input and what is the output?

I would design a synthetic cell that can detect viral pathogens in water samples. The input would be the water sample, and the output would be a detectable signal. For example, the cell could produce luciferase in response to the presence of viral RNA, which would emit light that can be easily detected.

1.b. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

Yes.

The cell-free system can be designed to include the necessary components for transcription and translation, as well as a reporter gene that produces a detectable signal in response to the target viral RNA.

Encapsulation is not needed for this function, as the cell-free system can operate in solution.

1.c. Could this function be realized by genetically modified natural cell?

Yes. For example, based on Noctiluca scintillans.

1.d. Describe the desired outcome of your synthetic cell operation.

Sensitive and low-cost detection of viral pathogens in water samples.

2. Design all components that would need to be part of your synthetic cell.

a. What would be the membrane made of?

Lipid bilayer.

b. What would you encapsulate inside? Enzymes, small molecules.

Enzymes for transcription and translation, a reporter gene (e.g. luciferase), substrates (luciferin) and cofactors.

c. Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)

Bacterial system is sufficient for this application.

d. How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

The viral RNA can enter the cell-free system through pores in the membrane, and the luciferin substrate can also diffuse into the system to enable the luciferase reaction.

3. Experimental details

a. List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)

Lipids: phosphatidylcholine, phosphatidylethanolamine, cholesterol.

Genes: RNA polymerase, luciferase gene.

b. How will you measure the function of your system?

Add samples of viral RNA and measure the luminescence.

Homework question from Peter Nguyen

1. Write a one-sentence summary pitch sentence describing your concept.

A cell-free system integrated into textiles that can sense environmental pollutants and change color to alert the user.

2. How will the idea work, in more detail? Write 3-4 sentences or more.

The cell-free system will be designed to detect specific pollutants, such as heavy metals or viral pathogens. When the target pollutant is detected, the system will trigger a colorimetric reaction that changes the color of the textile. This could be achieved by incorporating a reporter gene that produces a pigment in response to the pollutant. The textile could be used in clothing or face masks to provide real-time detection for the wearer.

3. What societal challenge or market need will this address?

Not reusable: the color change is irreversible, so the textile would need to be replaced after each use. This could be addressed by designing a system that can be reset or by using a reversible color change mechanism.

4. How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

To address the limitation of one-time use, I would design the system to be reversible. The color would change back when the pollutant is no longer detected.

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

1. Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

In space, the intense radiation can cause genetic variation and protein misfolding, leading to reduced functionality and potential health risks for astronauts. A cell-free system could be used to produce functional proteins in space, mitigating these issues and supporting long-term space exploration.

2. Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

The molecular target I propose to study is the DNA repair enzyme, specifically the protein RAD51, which plays a crucial role in repairing DNA damage caused by radiation.

3. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

RAD51 is essential for repairing DNA double-strand breaks, which are a common form of damage caused by space radiation. By studying RAD51 in a cell-free system, we can understand how it functions under space-like conditions and potentially enhance its activity.

4. Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

Hypothesis: Enhancing the activity of RAD51 in a cell-free system can improve DNA repair efficiency under space-like conditions.

5. Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

I will use a cell-free system to express RAD51 and expose it to simulated space radiation. I will measure the DNA repair efficiency by assessing the ability of RAD51 to repair induced DNA damage using a reporter assay. Controls will include a cell-free system without RAD51 and a system with a known DNA repair-deficient mutant of RAD51.

Week 10 HW: Imaging and Measurement

Final Project

1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

We will measure PETase thermostability (Tm), residual activity after heat challenge, expression yield, and PET-degradation rate for each variant.

2. Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

The measurable elements are fold-stability metrics (Tm and thermal survival), catalytic output (product formation rate), and protein production quality (yield/purity) across designed variants.

3. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

We will use AI-guided sequence design (ProteinMPNN/ESM), recombinant expression and purification, thermal denaturation assays, activity assays, and LC-MS to quantify and compare variant performance.

Homework: Waters Part I; Molecular Weight

1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

Theoretical pI/Mw: 5.58 / 26941.48 Da (excluding the LEHHHHHH tag)

Theoretical pI/Mw: 5.90 / 28006.60 Da (including the LEHHHHHH tag)

2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

1. Determine z for each adjacent pair of peaks (n, n+1):

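The peak at the higher m/z (903.7148) carries charge z, and the adjacent peak at the lower m/z (875.4421) carries charge z + 1:

z = (875.4421 - 1.007276) / (903.7148 - 875.4421) = 30.93, which rounds to z = 31 for the peak at m/z 903.7148 (and z = 32 for the peak at m/z 875.4421)

2. Determine the MW of the protein

MW = (m/z * z) - (z * 1.007276 Da) = (903.7148 * 31) - (31 * 1.007276 Da) = 27983.93 Da

3. Calculate the accuracy of the measurement

(27983.93 - 28006.60) / 28006.60 = -0.081%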
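
As a cross-check, here is a minimal Python sketch of the adjacent charge state calculation using the two m/z values quoted above. It assumes the peak at the higher m/z carries the lower charge, and it rounds the estimated charge to the nearest integer before computing the mass. The function names are my own and are not from the recitation.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_from_adjacent(mz_high, mz_low):
    """Estimate the charge of the higher-m/z peak.

    Assumes the lower-m/z neighbor carries exactly one more proton.
    """
    return (mz_low - PROTON) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral (uncharged) mass from an m/z value and an integer charge."""
    return z * (mz - PROTON)


mz_high, mz_low = 903.7148, 875.4421
theoretical_mw = 28006.60  # Da, eGFP including the His tag (from question 1)

z_estimate = charge_from_adjacent(mz_high, mz_low)
z = round(z_estimate)  # charges are integers
mw = neutral_mass(mz_high, z)
error_pct = (mw - theoretical_mw) / theoretical_mw * 100

print(f"estimated z = {z_estimate:.2f}, rounded to {z}")
print(f"MW = {mw:.2f} Da, error = {error_pct:.3f}%")
```

Rounding z matters here: multiplying m/z by a non-integer charge shifts the mass by hundreds of Da.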

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

No, the peaks are too close together to resolve the charge states.

Homework: Waters Part II — Secondary/Tertiary structure

1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

Denatured: unfolded, exposing more surface area, which results in higher charge states and a broader distribution of peaks.

Native: folded, with less exposed surface area, lower charge states, and sharper peaks.

2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 m/z? What is the charge state? How can you tell?

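Yes, the charge state is about 10. With m/z ≈ 2800 and MW = 28006.60 Da, z ≈ MW / (m/z) = 28006.60 / 2800 ≈ 10.

The isotope peaks are spaced approximately 0.1 m/z apart, and the spacing equals 1 / z, which also gives z ≈ 10.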
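
The two estimates above can be checked with a few lines of Python. This is only a sketch: the m/z of 2800 and the spacing of 0.1 are approximate values read from the figure, and the variable names are my own.

```python
PROTON = 1.007276  # mass of a proton in Da

mw = 28006.60          # Da, theoretical eGFP mass including the His tag
observed_mz = 2800.0   # approximate peak position read from the spectrum
isotope_spacing = 0.1  # approximate spacing between isotope peaks (m/z)

# Estimate 1: charge from the known mass and the observed m/z
z_from_mass = round(mw / observed_mz)

# Estimate 2: isotopes differ by about 1 Da, so spacing in m/z is 1 / z
z_from_isotopes = round(1 / isotope_spacing)

# m/z expected for this charge state, for comparison with the spectrum
expected_mz = (mw + z_from_mass * PROTON) / z_from_mass

print(z_from_mass, z_from_isotopes, round(expected_mz, 3))
```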

Homework: Waters Part III — Peptide Mapping - primary structure

1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

eGFP (including the LEHHHHHH tag) contains 20 lysines (K) and 6 arginines (R), marked in brackets below:

MVS[K]GEELFTG VVPILVELDG DVNGH[K]FSVS GEGEGDATYG [K]LTL[K]FICTT G[K]LPVPWPTL VTTLTYGVQC FS[R]YPDHM[K]Q HDFF[K]SAMPE GYVQE[R]TIFF [K]DDGNY[K]T[R]A EV[K]FEGDTLV N[R]IEL[K]GIDF [K]EDGNILGH[K] LEYNYNSHNV YIMAD[K]Q[K]NG I[K]VNF[K]I[R]HN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALS[K]D PNE[K][R]DHMVL LEFVTAAGIT LGMDELY[K]LE HHHHHH

2. How many peptides will be generated from tryptic digestion of eGFP?

19 peptides

mass	position	#MC	modifications	peptide sequence
4472.1752	170-210	0		HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK
2566.2931	217-239	0		DHMVLLEFVTAAGITLGMDELYK
2437.2608	5-27	0		GEELFTGVVPILVELDGDVNGHK
2378.2577	54-74	0		LPVPWPTLVTTLTYGVQCFSR
1973.9062	142-157	0		LEYNYNSHNVYIMADK
1503.6597	28-42	0		FSVSGEGEGDATYGK
1266.5783	87-97	0		SAMPEGYVQER
1083.4979	240-247	0		LEHHHHHH
1050.5214	115-123	0		FEGDTLVNR
982.4952	133-141	0		EDGNILGHK
821.3940	81-86	0		QHDFFK
790.3552	75-80	0		YPDHMK
769.3913	47-53	0		FICTTGK
711.2944	103-108	0		DDGNYK
655.3813	98-102	0		TIFFK
602.2780	211-215	0		DPNEK
579.3137	128-132	0		GIDFK
507.2925	164-167	0		VNFK
502.3235	124-127	0		IELK
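
The digestion can also be reproduced with a short script. The sketch below cleaves after every K or R unless the next residue is P (the usual trypsin rule) and computes monoisotopic [M+H]+ masses from standard residue masses. The 500 Da cutoff is my assumption about why the table lists fewer peptides than the number of cleavage sites would suggest, and the function names are my own.

```python
# eGFP sequence with the LEHHHHHH tag, copied from Waters Part I question 1
EGFP = (
    "MVSKGEELFTG" "VVPILVELDG" "DVNGHKFSVS" "GEGEGDATYG" "KLTLKFICTT"
    "GKLPVPWPTL" "VTTLTYGVQC" "FSRYPDHMKQ" "HDFFKSAMPE" "GYVQERTIFF"
    "KDDGNYKTRA" "EVKFEGDTLV" "NRIELKGIDF" "KEDGNILGHK" "LEYNYNSHNV"
    "YIMADKQKNG" "IKVNFKIRHN" "IEDGSVQLAD" "HYQQNTPIGD" "GPVLLPDNHY"
    "LSTQSALSKD" "PNEKRDHMVL" "LEFVTAAGIT" "LGMDELYKLE" "HHHHHH"
)

# Monoisotopic residue masses in Da
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(seq):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal fragment
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


peptides = tryptic_digest(EGFP)
above_cutoff = [p for p in peptides if mh_plus(p) > 500.0]  # assumed cutoff

print("K:", EGFP.count("K"), "R:", EGFP.count("R"))
print("all fragments:", len(peptides), "above 500 Da:", len(above_cutoff))
for p in sorted(above_cutoff, key=mh_plus, reverse=True):
    print(f"{mh_plus(p):10.4f}  {p}")
```

With no missed cleavages, this gives more fragments than the table shows. The extra ones are short peptides such as MVSK and LTLK that fall below the assumed 500 Da cutoff.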

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

21 peaks.

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

There are more peaks in the chromatogram than predicted peptides. This could be due to post-translational modifications, missed cleavages, or other factors that generate additional peptide species.

5. Identify the mass-to-charge ($\frac{m}{z}$) of the peptide shown in Figure 5b. What is the charge ($z$) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ($\small{[M\!\!+\!\!H]^+}$) based on its $\frac{m}{z}$ and $z$.

Most abundant peak: m/z = 525.76712

Isotope spacing: 0.5

Charge: z = 1 / 0.5 = 2

[M+H]+ = (525.76712 * 2) - 1.007276 = 1050.5270 Da (the doubly charged ion carries two protons, so one proton mass is subtracted to get the singly charged form)

6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm.

1050.5214	115-123	0	FEGDTLVNR

Accuracy = (1050.5270 - 1050.5214) / 1050.5214 = 0.0000053

Error = 0.0000053 * 1,000,000 = 5.3 ppm
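
A minimal Python sketch of this calculation is below. It converts the observed m/z to the singly charged [M+H]+ form by removing the extra protons and then computes the ppm error against the PeptideMass value for FEGDTLVNR. The helper names are my own.

```python
PROTON = 1.007276  # mass of a proton in Da


def singly_charged_mass(mz, z):
    """[M+H]+ from an observed m/z at charge z: keep one proton, drop the rest."""
    return mz * z - (z - 1) * PROTON


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


observed_mz = 525.76712
isotope_spacing = 0.5
z = round(1 / isotope_spacing)  # isotope spacing in m/z is 1 / z

mh = singly_charged_mass(observed_mz, z)
error = ppm_error(mh, 1050.5214)  # expected [M+H]+ of FEGDTLVNR

print(f"z = {z}, [M+H]+ = {mh:.4f} Da, error = {error:.1f} ppm")
```

Comparing the uncorrected value (m/z times z) with the [M+H]+ mass from the table would inflate the error by about one proton mass, which is close to 960 ppm for a peptide of this size.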

7. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

88%

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

1. Contribute at least one pixel to this global artwork experiment before the editing ends on Sunday 4/19 at 11:59 PM EST.

Done.

2. Make a note on your HTGAA webpages including: what you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”), what you liked about the project, and what about this collaborative art experiment could be made better for next year.

I contributed a little dot on the bottom right plate, but it was overlapped by other contributions later. I liked the interactivity of the project. It’s cool to see how the plates evolve over time as more people contribute.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

  • E. coli Lysate: Provides the necessary cellular machinery (e.g., ribosomes, tRNAs, enzymes) for transcription and translation to occur in the cell-free system.
  • Salts/Buffer: Provide suitable pH conditions for cell-free reactions.
  • Energy / Nucleotide System: Provide energy sources needed for the reactions.
  • Translation Mix (Amino Acids): Provide the building blocks for protein synthesis.
  • Additives (Nicotinamide): Improve the efficiency of the reactions.
  • Backfill (Nuclease Free Water): Adjust the final volume of the reaction and ensure that it is nuclease-free to prevent degradation of DNA/RNA.

2. Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

1-hour optimized PEP-NTP: Rapid protein synthesis, using phosphoenolpyruvate (PEP) as an energy source.

20-hour NMP-Ribose-Glucose master mix: Optimized for longer reactions, using nucleoside monophosphates (NMPs) and glucose for sustained energy production over a longer period.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)

sfGFP

  • Maturation Time: 5–15 minutes, very fast
  • Acid Sensitivity: generally more stable, maintains roughly 50% of its fluorescence at pH 5.4
  • Folding: high folding efficiency, fluorescence is thus enhanced in cell-free systems
  • Oxygen Dependence: oxygen required

2. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

Use glucose as the main energy source in the master mix to sustain energy production over a longer period, which could improve the folding and maturation of sfGFP, leading to increased fluorescence over a 36-hour incubation.

3. The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here.

4. The final phase of this lab will be analyzing the fluorescence data we collect to determine whether we can draw any conclusions about favorable reagent compositions for our fluorescent proteins. This will be due a week after the data is returned (date TBD!). The reaction composition for each well will be as follows: