Homework

Weekly homework submissions:

  • Week 01 HW: Principles and Practices

    Class Assignment 1. First, describe a biological engineering application or tool you want to develop and why. Endometriosis is an inflammatory disease characterized by the growth of endometrial-like tissue outside the uterine cavity. This ectopic growth leads to hormonal imbalances, systemic inflammation, and debilitating pain during menstruation, sexual intercourse, and bodily functions. Although it affects 10–15% of women of reproductive age, there is currently no cure, and diagnosing the disease remains a clinical challenge [1]. Current clinical management is limited to hormonal suppression, pain control, and surgical excision [2]. Consequently, there is a critical need for non-invasive, targeted therapies that can modulate the immune response and minimize recurrence rates without compromising the patient’s reproductive health.

  • Week 02 HW: DNA READ, WRITE AND EDIT

    Part 1: Benchling & In-silico Gel Art This DNA gel art was designed in the style of Paul Vanouse’s Latent Figure Protocol. I chose to create the letter “P” as it is the initial of my name, Paula. To achieve this, I used Ronan’s website, which was a helpful tool for quickly iterating on the designs and determining the best enzyme combinations to form the silhouette of the letter.

  • Week 03 HW: Lab Automation

    Your task this week is to Create a Python file to run on an Opentrons liquid handling robot. 1. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com. 2. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. Link to cell: https://colab.research.google.com/drive/1-f-vpwBCOx1gmlD5qXbW5z-7sLun1xwP?authuser=2#scrollTo=pczDLwsq64mk&line=1&uniqifier=1

  • Week 04 HW: Protein Design part I

    Part A. Conceptual Questions 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) The average composition of muscle without external fat cover is composed of approximately 70% water, 20% protein, and 9% fat (The exact values vary depending on the animal source) [1]. Therefore 500 g of meat provides 100 g of protein. Since proteins are chains of amino acids, once digested they break down into individual amino acid molecules. We are told the average molecular weight of an amino acid is ~100 Daltons, which means its molar mass is 100 g/mol.

  • Week 05 HW: Protein Design part II

    Part A: SOD1 Binder Peptide Design (From Pranam) Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • Week 06 HW: Genetic Circuits Part I

    Assignment: DNA Assembly 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? According to New England Biolabs [1]: Phusion DNA polymerase: a DNA polymerase (enzymes that catalyze the synthesis of DNA molecules from dNTPs) that offers high fidelity and speed, with a lower error rate than Taq DNA polymerase.

  • Week 07 HW: Genetic circuits part II

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) 1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Boolean functions are limited to discrete on/off states, while IANNs can process analog signals and therefore carry more information. Real-world phenomena are analog, and inside a cell there is inherent molecular noise; Boolean circuits are fragile to this noise, especially at low signal concentrations.

  • Week 09 HW: Cell Free Systems

    Homework Part A: General and Lecturer-Specific Questions General homework questions 1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

    Aspect | In vivo | Cell free
    Production speed | Hours to days (requires growth) | Minutes to hours (immediate synthesis)
    Cell membrane | Cell membranes prevent scientists from interacting with the reaction components or manipulating cellular processes. | No cell membranes to get in the way of directly manipulating the reaction components.
    Toxic proteins | If the target protein is toxic to the host cell, the cell may die before it can produce a significant amount of the protein. | No living cells to keep alive, so it bypasses toxicity issues.
    Manipulation of reaction conditions | Cell viability needs to be considered. | Enables optimization (adjust pH, ionic strength, redox potential, metal ion concentrations, or temperature).
    Interference & purity | Host cells need to produce their own proteins to stay alive, interfering with or delaying the production of the target protein. It is difficult to separate the target protein from all the other proteins and cellular components. | Since CFPS contains only the minimal cellular components necessary for protein synthesis, it simplifies the extraction and purification process.
    Non-natural amino acids | Restricted to the use of the 20 naturally occurring amino acids. | Enables the use of non-natural amino acids to produce proteins with novel properties.
    Storage & shipping | Produced in large batches and shipped cold on ice, which is expensive. | Can be freeze-dried to last longer at room temperature.

    When is cell-free expression more beneficial than cell production?

  • Week 10 HW: Imaging and Measurement

    For your final project:
    - Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.
    - Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
    - What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

    - Toehold switch activation: Does the switch open specifically in response to lncRNA H19? I would order the toehold switch construct from Twist and have it expressed at Ginkgo in a PURExpress cell-free reaction, with and without the H19 trigger, using sfGFP as the reporter. Technology: fluorescence spectroscopy with a plate reader.

  • Week 11 HW: Building Genomes

    Part C: Planning the Global Experiment | Cell-Free Master Mix Design

    1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each) 1. sfGFP 2. mRFP1 3. mKO2 4. mTurquoise2 5. mScarlet_I 6. Electra2

    Fluorescent protein / Maturation (min) / pKa / Brightness / Description
    sfGFP 13.6 54.15 Rapidly-maturing weak dimer
    mRFP1 60.0 4.5 12.5 Slowly-maturing monomer with low acid sensitivity
    mKO2 108.0 5.5 39.56 Moderate acid sensitivity
    mTurquoise2 33.5 3.1 27.9 Rapidly-maturing monomer with very low acid sensitivity
    mScarlet_I 174.0 5.3 70 Moderate acid sensitivity
    Electra2 61.48

    3. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

    I chose mKO2 because it is acid sensitive and has a slow maturation time. My hypothesis is that increasing the HEPES-KOH concentration in the 36-hour master mix would help maintain the pH closer to 7.5 throughout the reaction. This is important because in cell-free reactions the pH decreases over long incubations due to the accumulation of acidic metabolic byproducts, and mKO2's slow maturation time means most of the signal is generated in the later hours, when the pH tends to drop.

Subsections of Homework

Week 01 HW: Principles and Practices


Class Assignment


1. First, describe a biological engineering application or tool you want to develop and why.

Endometriosis is an inflammatory disease characterized by the growth of endometrial-like tissue outside the uterine cavity. This ectopic growth leads to hormonal imbalances, systemic inflammation, and debilitating pain during menstruation, sexual intercourse, and bodily functions. Although it affects 10–15% of women of reproductive age, there is currently no cure, and diagnosing the disease remains a clinical challenge [1]. Current clinical management is limited to hormonal suppression, pain control, and surgical excision [2]. Consequently, there is a critical need for non-invasive, targeted therapies that can modulate the immune response and minimize recurrence rates without compromising the patient’s reproductive health.

To address these challenges, I propose Endo-Biotics, a vaginal suppository containing probiotic bacteria (Lactobacillus) genetically programmed to deliver bispecific nanobodies that block the IL-17 cytokine inflammatory cascade after specifically anchoring to CD44 receptors.

  • Nanobodies: Antigen-binding fragments derived from the naturally occurring heavy-chain-only antibodies present in the serum of camelids. Their small size, high stability, strong antigen-binding affinity, water solubility, and natural origin offer new treatment possibilities compared with conventional antibodies, which are limited by their large size and poor penetration into solid tissues [3].

  • Expression host: Lactobacillus is a commensal bacterium found naturally in the microbiota of the female reproductive tract. This natural affinity enables effective mucosal colonization, ensuring the system persists long enough to deliver a therapeutic dose.

  • Targeting module: Endometrial cells from women with endometriosis overexpress CD44 variants, which is associated with increased adherence to peritoneal cells and plays a key role in the development of early endometriotic lesions [4]. Targeting CD44 allows the nanobody to be retained at the lesion site and reduces exposure of surrounding healthy tissue.

  • Effector module: Elevated levels of IL-17 have been observed in patients during the early stages of the disease. This pro-inflammatory cytokine promotes the proliferation, invasion, and implantation of endometriotic cells by triggering the construction of new blood vessel networks [5]. Blocking IL-17 not only reduces inflammation but also interrupts the development of the blood supply these lesions require to survive and persist outside the uterine cavity.


  1. Biological Containment

    • Prevent uncontrolled growth: Bacterial growth should be limited to therapeutic levels and can be stopped by discontinuing use.
    • Prevent dissemination beyond the host: The organism should not spread to other individuals or into the external environment.
    • Limit horizontal gene transfer: Avoid genetically modified elements being transferred to native microbiota or environmental bacteria through horizontal gene transfer mechanisms.
  2. Patient safety

    • Minimize off-target effects: Avoid interfering with normal immune functions outside of the tissue affected by endometriosis.
    • Microbiome integrity: Prevent genetically modified Lactobacillus from altering the balance of the vaginal microbiome.
    • Responsible patient use: Ensure that patients use the therapy correctly, with adequate understanding of benefits, risks, and limitations.
  3. Biosafety

    • Laboratory safety: Manufacturing processes follow established biosafety protocols.
    • Safe handling and distribution: Ensure appropriate storage, transport, and handling conditions.
    • Misuse prevention: Prevent unauthorized acquisition, modification, or use for non-therapeutic purposes.
  4. Equitable access

    • Affordability: Avoid limiting access to high-income populations only.
    • Inclusive clinical evaluation: Clinical trials and tests will consider diverse populations to reduce bias and ensure effective results.

3. Describe at least three different potential governance “actions” by considering Purpose, Design, Assumptions, Risks of Failure & “Success”

Action 1: Kill Switch - Researchers

  • Purpose: Prevent bacteria from growing uncontrolled or escaping into the environment.
  • Design: The bacteria are designed to depend on a nutrient that is absent from the body and from nature and present only in the vaginal suppository.
  • Assumptions: The bacteria will not mutate to acquire another form of subsistence.
  • Risks of Failure & “Success”: If this fails, the bacteria could colonize the reproductive system. If successful, the synthetic nutrient could increase the cost of production.

Action 2: Chromosomal Integration - Researchers

  • Purpose: Prevent genetically modified elements from spreading to the native microbiota or environmental bacteria.
  • Design: In advanced stages of research, the expectation is to transition from plasmid-based genetic modification to integrating the therapeutic DNA directly into the Lactobacillus chromosome.
  • Assumptions: Chromosomal integration is stable and will not negatively affect the growth or therapeutic efficacy of the strain.
  • Risks of Failure & “Success”: DNA could still be transferred via transduction or natural transformation. However, the risk is significantly reduced, giving a higher level of security.

Action 3: Education and Transparency – User

  • Purpose: Ensure correct and informed use
  • Design: Clear instructions on how to use and contraindications with total transparency for informed decision making.
  • Assumptions: Patients will read the material, and the information system will be accessible to everyone.
  • Risks of Failure & “Success”: If this fails, patients may use the treatment negligently.

Action 4: Access under prescription - Health regulatory agencies (DIGEMID, INS, SUSALUD)

  • Purpose: Avoid unauthorized acquisition, home modification, or use of the therapy for purposes other than the treatment of endometriosis
  • Design: Endo-Biotics must be classified as a prescription-only treatment. Only specialist doctors can issue the prescription after diagnosing endometriosis.
  • Assumptions: Patients will not try to acquire the product through unofficial channels and specialists are willing to prescribe new therapies.
  • Risks of Failure & “Success”: If successful, it provides a high level of patient safety and clinical oversight, but it may limit access for those without easy access to specialists.

Action 5: Financing and Subsidy – Public Health Organizations (ProCiencia and MINSA – Perú | WHO and EndoFound - Internationally)

  • Purpose: Ensure the therapy reaches all women regardless of their socioeconomic status.
  • Design: Locally, we will work to include the therapy in Peru’s National Petition of Essential Medicines (PNME) to enable coverage through MINSA (SIS); internationally, we will partner with NGOs like the Gates Foundation to ensure lower costs for vulnerable populations in developing regions.
  • Assumptions: There is sufficient political will and international funding available specifically for endometriosis, which is traditionally an underfunded area.
  • Risks of Failure & “Success”: Dependence on external financing or subsidies can make the project unstable. If successful, a technology that could improve quality of life would be accessible to all sectors of the population.

Action 6: Rigorous Lab Protocols - Researchers

  • Purpose: Avoid human error and ensure the modified Lactobacillus is produced with total sterility and verified binding ability.
  • Design: Mandatory “binding assays” to confirm the bacteria actually adhere to the CD44 target, and strict sterility protocols to minimize the risk of contamination or environmental release.
  • Assumptions: Researchers will follow protocols, and all equipment will be properly calibrated.
  • Risks of Failure & “Success”: Small errors could lead to contamination or a batch with incorrect genetic markers. If successful, the constant auditing and verification could slow down the production process.

4. Score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

[Figure: Heat map]

5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

I would prioritize the Kill Switch and Chromosomal Integration. Although these technical decisions could increase the complexity and cost of production, nanobody treatment is an emerging technology that is not yet fully regulated, so it is necessary to take all possible precautions. The fundamental goal is to improve the quality of life of patients with endometriosis without invasive treatments or harm to their fertility. I am assuming that physicians will be willing to adopt this new biotherapeutic and that public health organizations will maintain a long-term interest in funding endometriosis research so that these treatments can be developed. The biggest uncertainty is whether biological containment can be achieved without altering the vaginal microbiota, given that it is such a complex system.


Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues.

One of the ethical concerns discussed in class was “who has access.” Synthetic biology is emerging as a powerful tool that can improve quality of life and open new avenues for innovation, but it can also be used negligently in ways that may harm people or the environment. For this reason, hearing about “trust” as a central theme in biotechnology made me reflect on the importance of closing the gap between experts and the general public, and on how doing so could open the door to new approaches and perspectives, as long as it is done in an ethical way.


Assignment (Week 2 Lecture Prep)


Homework Questions from Professor Jacobson:

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The error rate of polymerase is about 1 in 10^6, meaning one error for every million nucleotides added. The human genome consists of approximately 3 x 10^9 bp (3,088,269,832 bp [6]), so each time a cell divides there would be approximately 3,000 errors. Biology deals with these errors through DNA polymerase proofreading during extension and the MutS mismatch repair system.
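
The estimate above is a single multiplication. A minimal sketch of the arithmetic, using only the two figures quoted in the answer (the function and constant names are illustrative):

```python
# Expected replication errors per genome copy, before any repair.
# Both inputs are the figures quoted in the answer; the names are illustrative.
ERROR_RATE = 1e-6                 # one error per 10^6 nucleotides added
GENOME_LENGTH_BP = 3_088_269_832  # human genome length from reference [6]


def expected_errors(genome_length_bp: int, error_rate: float) -> float:
    """Return the expected number of misincorporated bases per genome copy."""
    return genome_length_bp * error_rate


if __name__ == "__main__":
    errors = expected_errors(GENOME_LENGTH_BP, ERROR_RATE)
    print(f"Expected errors per replication: {errors:.0f}")
```

With the exact genome length the result is slightly above the rounded 3,000 figure, which is why proofreading and mismatch repair are needed to bring the effective error rate down.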

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Based on the Lecture 2 slides, an average human protein is encoded by 1036 bp. Since DNA is composed of four nucleotides (A, T, C, G), the number of possible ways to write a DNA sequence of this length is 4^1036.

In reality, only a small fraction of these sequences are functional. Multiple codons can encode the same amino acid, but they are not equally efficient. The DNA sequence also determines secondary structure formation: high GC content, repetitive sequences, or unfavorable base-pairing energies can lead to unstable secondary structures that interfere with transcription, translation, or synthesis.
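
The size of 4^1036 is easier to appreciate as a number of decimal digits. A minimal sketch, relying only on Python's arbitrary-precision integers and taking the 1036 bp length from the answer above (the function name is illustrative):

```python
import math

AVERAGE_GENE_LENGTH_BP = 1036  # figure quoted from the Lecture 2 slides


def count_sequences(length_bp: int) -> int:
    """Number of distinct DNA sequences of a given length (4 choices per base)."""
    return 4 ** length_bp


if __name__ == "__main__":
    total = count_sequences(AVERAGE_GENE_LENGTH_BP)
    print(f"Decimal digits in 4^{AVERAGE_GENE_LENGTH_BP}: {len(str(total))}")
    print(f"log10 of the count: {AVERAGE_GENE_LENGTH_BP * math.log10(4):.1f}")
```

The count has 624 decimal digits, far more than could ever be synthesized or tested, so the practical constraints listed above matter much more than the raw size of the sequence space.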

Homework Questions from Dr. LeProust:

  1. What’s the most commonly used method for oligo synthesis currently?

The most commonly used method is solid-phase chemical synthesis using phosphoramidite chemistry, where nucleotides are added one at a time in repeated cycles.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because each step isn’t perfectly efficient. As the oligo gets longer, small mistakes build up, so after around 200 nucleotides the yield drops a lot and many sequences are incomplete or wrong.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis?

At that length, the error accumulation makes getting a fully correct sequence extremely unlikely. That’s why long genes are made by assembling shorter oligos instead of synthesizing them all at once.
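
The yield argument can be made concrete with a simple model in which every coupling step succeeds independently with the same probability. The 99% coupling efficiency below is an assumed illustrative value, not a figure from the lecture, and the function name is hypothetical:

```python
# ASSUMPTION: a per-step coupling efficiency chosen only for illustration.
ASSUMED_COUPLING_EFFICIENCY = 0.99


def full_length_fraction(length_nt: int, efficiency: float) -> float:
    """Fraction of strands in which every coupling step succeeded.

    The first nucleotide is attached to the solid support, so a strand of
    length n requires n - 1 coupling steps.
    """
    return efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for length in (20, 200, 2000):
        fraction = full_length_fraction(length, ASSUMED_COUPLING_EFFICIENCY)
        print(f"{length:>5} nt: {fraction:.2e} of strands are full length")
```

Under this assumption only about one strand in seven is full length at 200 nt, and at 2000 nt essentially none are, which matches the reasoning for assembling long genes from shorter oligos.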

Homework Question from George Church:

  1. What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in animals are: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine [7].

“The lysine contingency is intended to prevent the spread of the animals in case they ever got off the island. Dr. Wu inserted a gene that creates a single faulty enzyme in protein metabolism. The animals can’t manufacture the amino acid lysine. Unless they’re continually supplied with lysine by us, they’ll slip into a coma and die [8].” —Ray Arnold

It highlights how genetic codes can be engineered to enhance biological containment. I proposed this technique as the containment method in my biotechnological application, since it stops the uncontrolled growth of Lactobacillus by making it depend on a component that is absent outside its intended environment.

Bibliography

  • [1] M. Sahni and E. S. Day, “Nanotechnologies for the detection and treatment of endometriosis,” Front. Biomater. Sci., vol. 2, Nov. 2023, doi: 10.3389/fbiom.2023.1279358.

  • [2] “Endometriosis.” Accessed: Feb. 08, 2026. [Online]. Available: https://medlineplus.gov/endometriosis.html

  • [3] I. Jovčevska and S. Muyldermans, “The Therapeutic Potential of Nanobodies,” BioDrugs Clin. Immunother. Biopharm. Gene Ther., vol. 34, no. 1, pp. 11–26, Feb. 2020, doi: 10.1007/s40259-019-00392-z.

  • [4] J. F. Knudtson et al., “Overexpression of CD44 is involved in the development of the early endometriotic lesion,” Fertil. Steril., vol. 110, no. 4, p. e390, Sep. 2018, doi: 10.1016/j.fertnstert.2018.07.1090.

  • [5] J. V. Garmendia, C. V. De Sanctis, M. Hajdúch, and J. B. De Sanctis, “Endometriosis: An Immunologist’s Perspective,” Int. J. Mol. Sci., vol. 26, no. 11, p. 5193, May 2025, doi: 10.3390/ijms26115193.

  • [6] A. Piovesan, M. C. Pelleri, F. Antonaros, P. Strippoli, M. Caracausi, and L. Vitale, “On the length, weight and GC content of the human genome,” BMC Res. Notes, vol. 12, no. 1, p. 106, Feb. 2019, doi: 10.1186/s13104-019-4137-z.

  • [7] “Essential Amino Acid - an overview | ScienceDirect Topics.” Accessed: Feb. 10, 2026. [Online]. Available: https://www.sciencedirect.com/topics/pharmacology-toxicology-and-pharmaceutical-science/essential-amino-acid

  • [8] “Lysine contingency,” Jurassic Park Wiki. Accessed: Feb. 10, 2026. [Online]. Available: https://jurassicpark.fandom.com/wiki/Lysine_contingency

Week 02 HW: DNA READ, WRITE AND EDIT


Part 1: Benchling & In-silico Gel Art

This DNA gel art was designed in the style of Paul Vanouse’s Latent Figure Protocol. I chose to create the letter “P” as it is the initial of my name, Paula. To achieve this, I used Ronan’s website, which was a helpful tool for quickly iterating on the designs and determining the best enzyme combinations to form the silhouette of the letter.

[Figure: Gel Art]

Part 3: DNA Design Challenge

3.1 Choose your protein

Hydrophobin HFBI from Trichoderma reesei: I chose this protein because I will be participating in a summer research program at Aalto University focused on bio-based foams and mycelium-derived materials. Hydrophobins are proteins naturally produced by fungi and play an important role in fungal growth, particularly in modifying surface properties and mediating interactions at air–water interfaces. These characteristics are directly relevant to mycelium-based biomaterials, where fungal networks interact with substrates to form structured materials with tunable mechanical properties.

Hydrophobin HFBI from Trichoderma reesei AA sequence:

>sp|P52754.1|HYP1_HYPJE RecName: Full=Class II hydrophobin 1; AltName: Full=Hydrophobin I; Short=HFBI; Flags: Precursor

MKFFAIAALFAAAAVAQPLEDRSNGNGNVCPPGLFSNPQCCATQVLGLIGLDCKVPSQNVYDGTDFRNVC AKTGAQPLCCVAPVAGQALLCQTAVGA


[Figure: 3.1.1]

3.2 Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence

Hydrophobin HFBI from Trichoderma reesei DNA sequence:

atgaaattttttgcgattgcggcgctgtttgcggcggcggcggtggcgcagccgctggaa gatcgcagcaacggcaacggcaacgtgtgcccgccgggcctgtttagcaacccgcagtgc tgcgcgacccaggtgctgggcctgattggcctggattgcaaagtgccgagccagaacgtg tatgatggcaccgattttcgcaacgtgtgcgcgaaaaccggcgcgcagccgctgtgctgc gtggcgccggtggcgggccaggcgctgctgtgccagaccgcggtgggcgcg
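
One way to check a reverse translation is to translate the DNA back and compare it with the starting protein. The sketch below is a minimal check under the assumption that the standard genetic code applies; the helper name `translate` is illustrative and is not part of any tool used in this homework.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order (TTT, TTC, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence; whitespace and case are ignored."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Protein and DNA sequences copied from sections 3.1 and 3.2.
HFBI_PROTEIN = (
    "MKFFAIAALFAAAAVAQPLEDRSNGNGNVCPPGLFSNPQCCATQVLGLIGLDCKVPSQNVYDGTDFRNVC"
    "AKTGAQPLCCVAPVAGQALLCQTAVGA"
)
HFBI_DNA = (
    "atgaaattttttgcgattgcggcgctgtttgcggcggcggcggtggcgcagccgctggaa"
    "gatcgcagcaacggcaacggcaacgtgtgcccgccgggcctgtttagcaacccgcagtgc"
    "tgcgcgacccaggtgctgggcctgattggcctggattgcaaagtgccgagccagaacgtg"
    "tatgatggcaccgattttcgcaacgtgtgcgcgaaaaccggcgcgcagccgctgtgctgc"
    "gtggcgccggtggcgggccaggcgctgctgtgccagaccgcggtgggcgcg"
)

if __name__ == "__main__":
    print("Round trip matches:", translate(HFBI_DNA) == HFBI_PROTEIN)
```

The same check can be applied to a codon-optimized sequence, since optimization should change codons without changing the encoded protein.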


[Figure: 3.2]

[Figure: 3.3]

3.3. Codon optimization

Hydrophobin HFBI from Trichoderma reesei with Codon-Optimization:

ATGAAATTTTTTGCCATTGCCGCGCTGTTCGCGGCGGCCGCGGTCGCACAGCCGCTGGAAGATCGTAGCAATGGCAACGGTAACGTGTGCCCGCCGGGCCTGTTTTCGAATCCACAGTGTTGTGCGACGCAAGTGTTAGGCCTAATCGGATTGGATTGCAAAGTACCCTCACAGAATGTTTATGACGGTACCGATTTCCGCAACGTCTGCGCGAAGACCGGGGCTCAGCCGCTCTGCTGTGTGGCACCTGTTGCTGGCCAAGCCCTTCTGTGCCAGACTGCCGTGGGTGCA

[Figure: 3.3.1]

3.4. What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein.

  • Cell-Dependent Recombinant Protein Expression: After chemically synthesizing the DNA sequence encoding HFBI, the gene is inserted into a plasmid vector using DNA assembly methods such as Gibson Assembly or restriction enzyme-based cloning methods. Then, the recombinant plasmid is introduced into a host organism, which uses its own transcription and translation machinery to express the protein as the cells grow.

  • Cell-Free Protein Expression: In this method, instead of using living cells, the DNA sequence encoding HFBI is added to a reaction mixture containing ribosomes, enzymes, nucleotides, amino acids, and energy sources extracted from cells, so that transcription and translation occur directly in vitro [1]. Compared to in vivo techniques based on bacterial or tissue culture cells, in vitro protein expression is considerably faster because it does not require gene transfection, cell culture, or extensive protein purification [2].

Part 4: Prepare a Twist DNA Synthesis Order

Build Your DNA Insert Sequence

[Figure: linear_map]
>HFBI_part4

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAAATTTTTTGCCATTGCCGCGCTGTTCGCGGCGGCCGCGGTCGCACAGCCGCTGGAAGATCGTAGCAATGGCAACGGTAACGTGTGCCCGCCGGGCCTGTTTTCGAATCCACAGTGTTGTGCGACGCAAGTGTTAGGCCTAATCGGATTGGATTGCAAAGTACCCTCACAGAATGTTTATGACGGTACCGATTTCCGCAACGTCTGCGCGAAGACCGGGGCTCAGCCGCTCTGCTGTGTGGCACCTGTTGCTGGCCAAGCCCTTCTGTGCCAGACTGCCGTGGGTGCACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA


[Figure: plasmid]

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence and why?

I would sequence the synthetic bispecific nanobody construct designed to bind CD44 and block IL-17 signaling. Sequencing would allow me to verify that the DNA was synthesized correctly, confirm the absence of mutations, and ensure the construct is suitable for expression in the probiotic host before experimental use.


(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

I would use next-generation sequencing (NGS), specifically Illumina sequencing, to analyze the engineered construct. Illumina platforms provide high accuracy, relatively low cost per base, and are well suited for short constructs such as nanobody genes as well as targeted panels of inflammatory genes.

1. Is your method first-, second- or third-generation or other? How so?

This method is considered second-generation sequencing because it relies on massively parallel sequencing of many short DNA fragments simultaneously after amplification on a flow cell.

2. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

The input would be DNA extracted from the engineered plasmid containing the nanobody gene.

Essential preparation steps:

  1. DNA extraction and purification
  2. Fragmentation
  3. Adapter ligation to both ends of DNA fragments
  4. PCR amplification to generate sufficient material
  5. Loading onto the sequencing flow cell

3. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

Essential steps:

  1. DNA fragments bind to complementary oligos on a flow cell.
  2. Clonal amplification creates clusters of identical molecules.
  3. Fluorescently labeled nucleotides are incorporated one base at a time.
  4. A camera detects the fluorescent signal after each cycle.

The color detected corresponds to a specific base (A, T, C, or G), which allows the sequence to be reconstructed digitally. This process is called base calling.
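
Base calling can be sketched as picking, for each cycle, the fluorescence channel with the strongest signal. The channel-to-base assignment and the intensity values below are invented for illustration and do not describe any specific Illumina instrument:

```python
# ASSUMPTION: a hypothetical four-color mapping from channel to base.
CHANNEL_TO_BASE = {"red": "A", "green": "T", "blue": "C", "yellow": "G"}

# ASSUMPTION: made-up intensities for one cluster over three cycles.
EXAMPLE_CYCLES = [
    {"red": 0.91, "green": 0.05, "blue": 0.02, "yellow": 0.02},
    {"red": 0.04, "green": 0.88, "blue": 0.05, "yellow": 0.03},
    {"red": 0.03, "green": 0.06, "blue": 0.04, "yellow": 0.87},
]


def call_bases(cycles):
    """Return the called sequence for one cluster.

    `cycles` holds one dict per sequencing cycle, mapping each channel name
    to its measured intensity; the brightest channel determines the base.
    """
    return "".join(
        CHANNEL_TO_BASE[max(signal, key=signal.get)] for signal in cycles
    )


if __name__ == "__main__":
    print(call_bases(EXAMPLE_CYCLES))  # prints ATG
```

Real base callers also estimate how confident each call is, which is where the quality scores in the sequencing output come from.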

4. What is the output of your chosen sequencing technology?

The output is a large dataset of short DNA reads in digital format (FASTQ files), which can then be aligned to reference sequences or assembled to confirm the construct sequence.
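
A FASTQ record has four lines: a header starting with `@`, the read sequence, a separator starting with `+`, and one quality character per base. The sketch below parses a single made-up record, assuming the common Phred+33 quality encoding; the read name and function name are hypothetical.

```python
# ASSUMPTION: a made-up record; quality characters use Phred+33 encoding.
EXAMPLE_RECORD = "@read_1\nATGAAA\n+\nIIII5!\n"


def parse_fastq_record(record: str):
    """Split one FASTQ record into (read name, sequence, Phred scores)."""
    header, sequence, separator, quality = record.strip().splitlines()
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33: the quality score is the ASCII code minus 33.
    scores = [ord(char) - 33 for char in quality]
    return header[1:], sequence, scores


if __name__ == "__main__":
    name, sequence, scores = parse_fastq_record(EXAMPLE_RECORD)
    print(name, sequence, scores)
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so low-scoring bases can be trimmed before the reads are aligned to the expected construct.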


5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I would synthesize a genetic construct encoding a bispecific nanobody that can anchor to CD44 receptors while simultaneously blocking IL-17 inflammatory signaling. This construct would be expressed in a probiotic Lactobacillus strain to create a localized therapeutic system for endometriosis.

The goal is to combine targeted binding with immune modulation to reduce inflammation and lesion growth without systemic side effects.

The DNA construct would include:

  • Promoter for bacterial expression
  • Secretion signal peptide
  • Anti-CD44 nanobody domain
  • Flexible linker
  • Anti-IL-17 nanobody domain
  • Terminator sequence

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use phosphoramidite chemical DNA synthesis combined with gene assembly, such as the synthesis services provided by companies like Twist Bioscience.

This method allows precise control of nucleotide sequence and is scalable for custom gene design.

1. What are the essential steps of your chosen sequencing methods?

Essential steps:

  1. Chemical synthesis of short oligonucleotides
  2. Assembly of oligos into full gene fragments
  3. Error correction and cloning into plasmid vectors
  4. Sequence verification
  5. Delivery as plasmid DNA
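
The first two steps above can be illustrated by splitting a gene into overlapping oligos and joining them again. This is a simplified sketch: the 60 nt oligo length and 20 nt overlap are assumed values, all oligos are taken from one strand, and the function names are hypothetical. Commercial workflows differ in their details.

```python
def tile_into_oligos(sequence: str, oligo_length: int = 60, overlap: int = 20):
    """Split a sequence into oligos where neighbours share `overlap` bases."""
    if not 0 < overlap < oligo_length:
        raise ValueError("overlap must be between 0 and the oligo length")
    step = oligo_length - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + oligo_length])
        if start + oligo_length >= len(sequence):
            break  # this oligo reaches the end of the gene
        start += step
    return oligos


def reassemble(oligos, overlap: int = 20) -> str:
    """Join tiled oligos by dropping the overlap at the start of each one."""
    assembled = oligos[0]
    for oligo in oligos[1:]:
        assembled += oligo[overlap:]
    return assembled


if __name__ == "__main__":
    gene = "ATGAAATTTTTTGCC" * 10  # ASSUMPTION: placeholder 150 nt sequence
    oligos = tile_into_oligos(gene)
    print(len(oligos), "oligos; reassembly matches:", reassemble(oligos) == gene)
```

Keeping each oligo short keeps the per-oligo error rate manageable, and the shared overlaps are what allow the fragments to be assembled in the right order.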

3. What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

  • Error rates increase with longer sequences
  • Cost increases with length and complexity
  • Repetitive or GC-rich regions can be difficult to synthesize
  • Turnaround time can vary depending on design complexity

5.3 DNA Edit

(i) What DNA would you want to edit and why?

I would edit the genome of a probiotic Lactobacillus strain to stably express the therapeutic bispecific nanobody. Genome integration would improve stability compared to plasmid-based expression and reduce the need for antibiotic selection.

This could enable long-term therapeutic delivery directly at mucosal surfaces.


(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use CRISPR-Cas9 genome editing because it allows precise insertion of DNA sequences into specific genomic locations with relatively high efficiency.

1. How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 uses a guide RNA to direct the Cas9 enzyme to a specific DNA sequence. Cas9 creates a double-strand break at that location. The cell’s repair machinery then inserts the desired DNA sequence using a repair template provided by the researcher.
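The targeting step described above can be sketched in a few lines. This is a minimal illustration, assuming the commonly used SpCas9 enzyme, which requires an NGG PAM immediately 3' of a 20-nt protospacer; the answer above does not specify a Cas9 variant, and the function name and demo sequence are hypothetical. Only the given strand is scanned.

```python
def find_protospacers(seq, spacer_len=20):
    """Return (start, protospacer, pam) for every NGG PAM site on the given strand.

    Assumes SpCas9 (NGG PAM directly 3' of the protospacer); other Cas9
    variants use different PAMs.
    """
    seq = seq.upper()
    hits = []
    # The last valid start leaves room for the protospacer plus a 3-nt PAM.
    for i in range(len(seq) - spacer_len - 3 + 1):
        pam = seq[i + spacer_len : i + spacer_len + 3]
        if pam[1:] == "GG":
            hits.append((i, seq[i : i + spacer_len], pam))
    return hits


# Hypothetical toy target: a 20-nt protospacer followed by an AGG PAM.
demo_spacer = "GATTACAGATTACAGATTAC"
demo_target = demo_spacer + "AGG" + "TT"
demo_hits = find_protospacers(demo_target)
```

A real design would also scan the reverse complement and screen each candidate guide against the host genome for off-target matches.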

2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

Inputs would include:

  • Guide RNA targeting the desired genomic site
  • Cas9 enzyme or expression plasmid
  • Donor DNA template containing the nanobody gene
  • Host bacterial cells

3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

  • Off-target edits may occur
  • Editing efficiency can vary between organisms
  • Delivery of CRISPR components into cells can be challenging
  • Integration success may require screening multiple clones

Bibliography

Week 03 HW: Lab Automation


Your task this week is to Create a Python file to run on an Opentrons liquid handling robot.


1. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com.

[Image: volleyball design]

2. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons.

Link to cell: https://colab.research.google.com/drive/1-f-vpwBCOx1gmlD5qXbW5z-7sLun1xwP?authuser=2#scrollTo=pczDLwsq64mk&line=1&uniqifier=1

from opentrons import types

metadata = {    # see https://docs.opentrons.com/v2/tutorial.html#tutorial-metadata
    'author': 'Paula Mariana Carrodeguas González',
    'protocolName': 'Volleyball inspired agArt',
    'description': 'Volleyball design using Blue, Yellow, and Red fluorescent E. coli on black agar.',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants - don't change these
##############################################################################

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

well_colors = {
    'A1' : 'Red',
    'B1' : 'Yellow',
    'C1' : 'Green',
    'D1' : 'Cyan',
    'E1' : 'Blue'       # if in a 24-well plate, this needs to be moved to e.g. D2
}


def run(protocol):
  ##############################################################################
  ###   Load labware, modules and pipettes
  ##############################################################################

  # Tips
  tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

  # Pipettes
  pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

  # Modules
  temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)

  # Temperature Module Plate
  temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul',
                                                      'Cold Plate')
  # Choose where to take the colors from
  color_plate = temperature_plate

  # Agar Plate
  agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')  ## TA MUST CALIBRATE EACH PLATE!
  # Get the top-center of the plate, make sure the plate was calibrated before running this
  center_location = agar_plate['A1'].top()

  pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

  ##############################################################################
  ###   Patterning
  ##############################################################################

  ###
  ### Helper functions for this lab
  ###

  # pass this e.g. 'Red' and get back a Location which can be passed to aspirate()
  def location_of_color(color_string):
    for well,color in well_colors.items():
      if color.lower() == color_string.lower():
        return color_plate[well]
    raise ValueError(f"No well found with color {color_string}")

  # For this lab, instead of calling pipette.dispense(1, loc) use this: dispense_and_detach(pipette, 1, loc)
  def dispense_and_detach(pipette, volume, location):
      """
      Move laterally 5mm above the plate (to avoid smearing a drop); then drop down to the plate,
      dispense, move back up 5mm to detach drop, and stay high to be ready for next lateral move.
      5mm because a 4uL drop is 2mm diameter; and a 2deg tilt in the agar pour is >3mm difference across a plate.
      """
      assert(isinstance(volume, (int, float)))
      above_location = location.move(types.Point(z=5))  # move() adds an offset, so z=5 means 5mm above
      pipette.move_to(above_location)       # Go to 5mm above the dispensing location
      pipette.dispense(volume, location)    # Go straight downwards and dispense
      pipette.move_to(above_location)       # Go straight up to detach drop and stay high

  ###

  # 1. Defining my coordinate lists
  azurite_points = [(-12.1, 18.7),(-9.9, 18.7),(-16.5, 16.5),(-14.3, 16.5),(-12.1, 16.5),(3.3, 16.5),(-20.9, 14.3),(-18.7, 14.3),(-16.5, 14.3),(-14.3, 14.3),(5.5, 14.3),(7.7, 14.3),(-20.9, 12.1),(-18.7, 12.1),(-16.5, 12.1),(7.7, 12.1),(-23.1, 9.9),(-20.9, 9.9),(-18.7, 9.9),(9.9, 9.9),(-23.1, 7.7),(-20.9, 7.7),(-5.5, 7.7),(-3.3, 7.7),(-1.1, 7.7),(9.9, 7.7),(-25.3, 5.5),(-23.1, 5.5),(-7.7, 5.5),(-5.5, 5.5),(-3.3, 5.5),(-1.1, 5.5),(9.9, 5.5),(12.1, 5.5),(-25.3, 3.3),(-9.9, 3.3),(-7.7, 3.3),(-5.5, 3.3),(-3.3, 3.3),(-1.1, 3.3),(9.9, 3.3),(12.1, 3.3),(-12.1, 1.1),(-9.9, 1.1),(-7.7, 1.1),(-5.5, 1.1),(9.9, 1.1),(12.1, 1.1),(-14.3, -1.1),(-12.1, -1.1),(-9.9, -1.1),(-7.7, -1.1),(7.7, -1.1),(9.9, -1.1),(12.1, -1.1),(-16.5, -3.3),(-14.3, -3.3),(-12.1, -3.3),(-9.9, -3.3),(-5.5, -3.3),(9.9, -3.3),(12.1, -3.3),(-18.7, -5.5),(-16.5, -5.5),(-14.3, -5.5),(-12.1, -5.5),(-7.7, -5.5),(-5.5, -5.5),(-3.3, -5.5),(-20.9, -7.7),(-18.7, -7.7),(-16.5, -7.7),(-14.3, -7.7),(-7.7, -7.7),(-5.5, -7.7),(-3.3, -7.7),(-1.1, -7.7),(-20.9, -9.9),(-18.7, -9.9),(-16.5, -9.9),(-5.5, -9.9),(-3.3, -9.9),(-1.1, -9.9),(1.1, -9.9),(-20.9, -12.1),(-3.3, -12.1),(-1.1, -12.1),(1.1, -12.1),(3.3, -12.1),(-1.1, -14.3),(1.1, -14.3),(3.3, -14.3),(5.5, -14.3),(1.1, -16.5),(3.3, -16.5),(-12.1, -18.7)]
  mko2_points = [(-5.5, 18.7),(-3.3, 18.7),(-1.1, 18.7),(-3.3, 16.5),(-1.1, 16.5),(-9.9, 14.3),(-7.7, 14.3),(-5.5, 14.3),(-1.1, 14.3),(1.1, 14.3),(-12.1, 12.1),(-9.9, 12.1),(-7.7, 12.1),(-5.5, 12.1),(-3.3, 12.1),(1.1, 12.1),(3.3, 12.1),(-14.3, 9.9),(-12.1, 9.9),(-9.9, 9.9),(-7.7, 9.9),(1.1, 9.9),(3.3, 9.9),(5.5, 9.9),(-16.5, 7.7),(-14.3, 7.7),(-12.1, 7.7),(-9.9, 7.7),(3.3, 7.7),(5.5, 7.7),(-18.7, 5.5),(-16.5, 5.5),(-14.3, 5.5),(-12.1, 5.5),(3.3, 5.5),(5.5, 5.5),(-20.9, 3.3),(-18.7, 3.3),(-16.5, 3.3),(-14.3, 3.3),(3.3, 3.3),(5.5, 3.3),(-23.1, 1.1),(-20.9, 1.1),(-18.7, 1.1),(-16.5, 1.1),(1.1, 1.1),(5.5, 1.1),(-25.3, -1.1),(-23.1, -1.1),(-20.9, -1.1),(-18.7, -1.1),(-3.3, -1.1),(-1.1, -1.1),(1.1, -1.1),(3.3, -1.1),(-25.3, -3.3),(-23.1, -3.3),(-20.9, -3.3),(-1.1, -3.3),(1.1, -3.3),(3.3, -3.3),(5.5, -3.3),(-23.1, -5.5),(1.1, -5.5),(3.3, -5.5),(5.5, -5.5),(7.7, -5.5),(3.3, -7.7),(5.5, -7.7),(7.7, -7.7),(9.9, -7.7),(-12.1, -9.9),(-9.9, -9.9),(5.5, -9.9),(7.7, -9.9),(9.9, -9.9),(-14.3, -12.1),(-12.1, -12.1),(-9.9, -12.1),(-7.7, -12.1),(7.7, -12.1),(-18.7, -14.3),(-12.1, -14.3),(-9.9, -14.3),(-7.7, -14.3),(-5.5, -14.3),(-16.5, -16.5),(-9.9, -16.5),(-7.7, -16.5),(-5.5, -16.5),(-3.3, -16.5),(-7.7, -18.7),(-5.5, -18.7),(-3.3, -18.7)]
  mrfp1_points = [(18.7, 9.9),(20.9, 9.9),(25.3, 9.9),(27.5, 9.9),(18.7, 7.7),(20.9, 7.7),(23.1, 7.7),(25.3, 7.7),(27.5, 7.7),(20.9, 5.5),(23.1, 5.5),(25.3, 5.5),(23.1, 3.3),(18.7, -1.1),(20.9, -1.1),(25.3, -1.1),(27.5, -1.1),(18.7, -3.3),(20.9, -3.3),(23.1, -3.3),(25.3, -3.3),(27.5, -3.3),(20.9, -5.5),(23.1, -5.5),(25.3, -5.5),(23.1, -7.7),(18.7, -12.1),(20.9, -12.1),(25.3, -12.1),(27.5, -12.1),(18.7, -14.3),(20.9, -14.3),(23.1, -14.3),(25.3, -14.3),(27.5, -14.3),(20.9, -16.5),(23.1, -16.5),(25.3, -16.5),(23.1, -18.7)]

  # 2. Assign colors to the points
  design_layers = [
      ('Blue', azurite_points),
      ('Yellow', mko2_points),
      ('Red', mrfp1_points)
  ]

  # 3. Robot execution loop
  for color_name, points in design_layers:
      source_loc = location_of_color(color_name)
      pipette_20ul.pick_up_tip()

      # The p20 pipette can hold max 20uL, so it should aspirate 15uL at a time for safety.
      for i in range(0, len(points), 15):
          batch = points[i : i + 15]
          pipette_20ul.aspirate(len(batch), source_loc)

          for x, y in batch:
              # Move relative to the calibrated center of the agar plate
              target = center_location.move(types.Point(x=x, y=y))
              dispense_and_detach(pipette_20ul, 1, target)

      pipette_20ul.drop_tip()

  # --- END OF CUSTOM DESIGN ---
  ###


  # Don't forget to end with a drop_tip()
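The batching used in the execution loop can be checked off-robot before a run. The sketch below is plain Python with no Opentrons dependency; the helper names and the demo layers are hypothetical. It shows that slicing in steps of 15 never asks the pipette for more than 15 µL at 1 µL per point, and it flags coordinates that appear in more than one color layer.

```python
def split_into_batches(points, batch_size=15):
    """Split a list of (x, y) points into consecutive batches of at most batch_size."""
    return [points[i : i + batch_size] for i in range(0, len(points), batch_size)]


def shared_points(layers):
    """Return coordinates that appear in more than one (color, points) layer."""
    seen, duplicates = set(), set()
    for _, points in layers:
        for point in set(points):
            if point in seen:
                duplicates.add(point)
            seen.add(point)
    return duplicates


# Hypothetical demo data: 32 points, and two layers that overlap at one coordinate.
demo_points = [(float(i), 0.0) for i in range(32)]
demo_batches = split_into_batches(demo_points)
demo_layers = [
    ("Blue", [(0.0, 0.0), (1.1, 0.0)]),
    ("Red", [(1.1, 0.0), (2.2, 0.0)]),
]
demo_overlap = shared_points(demo_layers)
```

Running `shared_points` on the real `design_layers` list would show whether any spot is dispensed twice with different colors.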

Week 04 HW: Protein Design part I


Part A. Conceptual Questions

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Muscle without external fat cover is composed of approximately 70% water, 20% protein, and 9% fat (the exact values vary depending on the animal source) [1]. Therefore, 500 g of meat provides about 100 g of protein (500 g × 0.20). Since proteins are chains of amino acids, once digested they break down into individual amino acid molecules. We are told the average molecular weight of an amino acid is ~100 Daltons, which means its molar mass is 100 g/mol.

Converting grams of protein to moles of amino acids:

100 g ÷ 100 g/mol = 1 mol of amino acids

Converting moles to number of molecules:

1 mol × 6.022 × 10²³ molecules/mol ≈ 6 × 10²³ molecules of amino acids

A piece of 500 g of meat contains approximately 6 × 10²³ molecules of amino acids.

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Biological identity is not preserved through digestion because the human body breaks down the macromolecules we consume into universal building blocks.

Proteins undergo hydrolysis of their peptide bonds by proteases in the stomach and small intestine, releasing amino acids: the same 20 standard amino acids used by all living organisms. Once absorbed, they enter the bloodstream and are either incorporated into new human proteins according to the sequence information encoded in human DNA or further metabolized [2].

3. Why are there only 20 natural amino acids?

There are more than 20 amino acids in nature. However, the standard genetic code is restricted to 20 amino acids because this set provides a balance between structural diversity and metabolic efficiency.

Chemical coverage: the set of 20 amino acids covers the range of hydrophobicity, charge and molecular size required for complex protein folding [3], for example:

  • Charge: Positive (Lysine) and Negative (Glutamate).
  • Polarity: Hydrophilic (Serine) vs. Hydrophobic (Leucine).
  • Specialized Shapes: Small (Glycine), Rigid (Proline), and Bulky (Tryptophan).

Frozen Accident: This theory, proposed by Francis Crick, states that “the code is universal because at the present time any change would be lethal, or at least very strongly selected against.” Any attempt by an organism to ‘recode’ or add a new amino acid today would trigger a proteome-wide failure, as it would disrupt the sequence of every existing protein simultaneously [4].

Error minimization: Research suggests that our genetic code is ‘one in a million’ in its ability to ensure that a single-point mutation likely results in a chemically similar amino acid, thereby preserving the protein’s overall structure and function [5].

4. Can you make other non-natural amino acids? Design some new amino acids.

Non-natural amino acids are compounds that differ from the standard set. They can be produced by organic synthesis and incorporated into proteins via engineered orthogonal translation systems. By modifying side-chain chemistry, we can expand the functional diversity of proteins beyond the constraints of the canonical genetic code, enabling novel catalytic, structural, and responsive properties.

All amino acids share the same basic structure, with four groups attached to the same α-carbon:

  • An amino group (–NH₂)
  • A carboxyl group (–COOH)
  • A hydrogen
  • A variable side chain (R group)

So to design new amino acids, we keep the backbone and modify the R group to give new chemical properties.

Example: Photoresponsive amino acid with the following side chain:

R = –CH₂–C₆H₄–N=N–C₆H₅

This side chain contains an azobenzene group, which can switch between trans and cis conformations when exposed to different wavelengths of light.

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

I selected the anti-IL-17A nanobody from PDB entry 8B7W because IL-17 is a relevant target in endometriosis. Elevated levels of IL-17 have been observed in patients during the early stages of the disease. This pro-inflammatory cytokine promotes the proliferation, invasion, and implantation of endometriotic cells by triggering the formation of new blood vessel networks. Blocking IL-17 not only reduces inflammation but also interrupts the development of the blood supply these lesions require to survive and persist outside the uterine cavity.

2. Identify the amino acid sequence of your protein.

8B7W_1|Chain A[auth H]|anti-IL-17A-76|Lama glama (9844)
QVQLVQSGGGLVQAGGSLRLSCAASGGTFATSPMGWLRQAPGKGTEFVAAISPSGGDRIYADSVKGRFTISRDNAGNFIYLQMNSLKPEDTAVYYCAVRRRFDGTSYYTGDYDSWGQGTLVTVSS

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

8B7W_1 is 125 amino acids long, and its most frequent amino acid is glycine.
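For reference, the length and residue frequencies can be reproduced with the Python standard library. This is a minimal sketch and not the course notebook itself; the variable names are illustrative.

```python
from collections import Counter

# Chain H of 8B7W (anti-IL-17A nanobody), as listed above.
NANOBODY = (
    "QVQLVQSGGGLVQAGGSLRLSCAASGGTFATSPMGWLRQAPGKGTEFVAAISPSGGDRIYADSVKGRFTISRDNAGNF"
    "IYLQMNSLKPEDTAVYYCAVRRRFDGTSYYTGDYDSWGQGTLVTVSS"
)

length = len(NANOBODY)
frequencies = Counter(NANOBODY)
most_common_residue, most_common_count = frequencies.most_common(1)[0]

print(f"Length: {length}")
print(f"Most frequent: {most_common_residue} ({most_common_count} residues)")
```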

[Image: notebook]
How many protein sequence homologs are there for your protein?

250 homologs

Does your protein belong to any protein family?

It is part of the single-domain antibody family.

3. Identify the structure page of your protein in RCSB
[Image: 8B7W]
When was the structure solved? Is it a good quality structure? A good quality structure is one with good resolution; the smaller the better (Resolution: 2.70 Å).

The structure was deposited on October 3, 2022 and made publicly available on December 28, 2022. It was solved using X-ray diffraction.

Resolution: 2.85 Å. This is in the moderate quality range, close to but slightly above the 2.70 Å threshold for a good structure.

Are there any other molecules in the solved structure apart from protein?

Yes, 8B7W is a protein-protein complex containing:

  • IL-17A: pro-inflammatory cytokine from Homo sapiens
  • Anti-IL-17A-76: nanobody derived from Lama glama (llama)

Both proteins were expressed in Escherichia coli and the structure contains mutations.

Does your protein belong to any structure classification family?

Classification: IMMUNE SYSTEM/INHIBITOR

4. Open the structure of your protein in any 3D molecule visualization software:

I used PyMOL to analyze the three-dimensional structure of 8B7W, a protein complex formed by interleukin-17A (IL-17A) bound to an anti-IL-17A nanobody. Since my main interest lies in the anti-IL-17A nanobody, I visualized only this chain, referred to in the PDB as “chain H”, and hid “chain B”. This allowed a clearer examination of its structural features, including its secondary structure, residue distribution, and potential interaction regions involved in antigen recognition.

[Image]
Visualize the protein as:

Cartoon: simplifies the protein structure and highlights the secondary structure elements.

Ribbon: follows the protein backbone and helps visualize the trajectory of the polypeptide chain and how the secondary structure elements are arranged.

Ball and stick: shows atoms as spheres and chemical bonds as sticks. It makes it possible to see individual amino acids, atomic interactions and contacts between residues.

[Image]
Color the protein by secondary structure. Does it have more helices or sheets?

Red → α-helices

Yellow → β-sheets

Green → loops

The nanobody shows a structure dominated by β-sheets. This is typical of antibody variable domains, which adopt an immunoglobulin fold (many β-strands and long loops). In this case, the fold plays an important role in forming the antigen-binding interface with IL-17A.

[Image: color]
Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

When visualizing the surface of the nanobody, one region looks almost like a missing portion of the structure. This happens because the antigen is hidden in the visualization: the region is the interface where the nanobody interacts with IL-17A in the complex.

[Image]

Part C: Using ML-Based Protein Design Tools

In this section, we will learn about the capabilities of modern protein AI models and test some of them in your chosen protein.

C1. Protein Language Modeling

1. Deep Mutational Scans
a. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
[Image: heatmap]

The ESM2 score represents how evolutionarily likely it is for a given amino acid to appear at a given position, based on millions of protein sequences.

Y-axis: Represents the 20 standard amino acids that each position could be mutated to.

X-axis: Represents the positions of the amino acids in the input protein sequence.

Yellow (high score) → ESM2 considers this mutation evolutionarily plausible. This amino acid has appeared at this position in similar sequences. The protein likely tolerates this mutation.

Blue (low score) → ESM2 considers this mutation evolutionarily improbable. This amino acid rarely appears at this position. The mutation likely damages the protein.

b. Can you explain any particular pattern? (choose a residue and a mutation that stands out)

Positions 20, 88, 92 and 94 show a vertical dark blue pattern (low score), indicating that these positions do not tolerate mutations. This suggests that the wild-type residues are critical.

Two horizontal lines are also noteworthy due to their dark blue pattern. These correspond to W (tryptophan) and C (cysteine), which suggests that introducing these residues at any position is unfavorable. This makes sense because cysteine can form spurious disulfide bonds that disrupt folding, and tryptophan is bulky enough to cause steric clashes in most sequence contexts. With a lighter shade of blue, methionine (M) and histidine (H) also show moderately negative scores across many positions, likely reflecting their more specialized chemical properties and lower natural abundance in protein sequences.

Position 113 shows predominantly positive scores across many substitutions, indicating that this position is highly tolerant to mutation. This suggests that, at this position, the identity of the amino acid does not matter much structurally. It probably sits in a flexible loop or solvent-exposed region.

The C-terminal region exhibits more yellow (high score) areas, suggesting that the C-terminal end of the nanobody is much more tolerant of mutations.

2. Latent Space Analysis
a. Use the provided sequence dataset to embed proteins in reduced dimensionality.
b. Analyze the different neighborhoods formed: do they approximate similar proteins?

Yes, I identified a cluster containing different types of proteins from bacteria such as Mycobacterium tuberculosis, Pseudomonas aeruginosa, and Thermus thermophilus.

c. Place your protein in the resulting map and explain its position and similarity to its neighbors.

I think the database is limited and does not contain enough nanobody sequences, because the cluster where my protein sits is surrounded by proteins from different animals and even IL-12. I would have expected to see other VHH sequences from llama.

C2. Protein Folding

1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

Yes, it is very similar to the structure I saw in PyMOL: the β-barrel and the loops are visible.

[Image]
2. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

My protein appears resilient to point mutations: although I introduced some mutations that looked unfavorable in the ESM2 mutational scan heatmap, I did not see drastic changes in the predicted structure. The β-barrel and the loops are still there.

These are the changes I made:

  • Position 20: L → E
  • Position 65: K → C
  • Position 94: Y → R

There are probably small changes, but since I cannot compare the structures side by side, they still appear similar to me.

However, if I remove the first ten amino acids, the structure changes dramatically. This is probably because the first amino acids of a nanobody are almost identical across variants.
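The two sequence variants fed to ESMFold can be generated reproducibly rather than edited by hand. The sketch below applies the three substitutions listed in this answer (positions are 1-based) and checks the wild-type residue at each position first; the function name is illustrative and ESMFold itself is not called.

```python
WILD_TYPE = (
    "QVQLVQSGGGLVQAGGSLRLSCAASGGTFATSPMGWLRQAPGKGTEFVAAISPSGGDRIYADSVKGRFTISRDNAGNF"
    "IYLQMNSLKPEDTAVYYCAVRRRFDGTSYYTGDYDSWGQGTLVTVSS"
)


def apply_point_mutations(sequence, mutations):
    """Apply {position: (expected_residue, new_residue)} using 1-based positions."""
    residues = list(sequence)
    for position, (expected, new) in mutations.items():
        found = residues[position - 1]
        if found != expected:
            raise ValueError(f"Position {position} is {found}, not {expected}")
        residues[position - 1] = new
    return "".join(residues)


MUTATIONS = {20: ("L", "E"), 65: ("K", "C"), 94: ("Y", "R")}

mutant = apply_point_mutations(WILD_TYPE, MUTATIONS)
truncated = WILD_TYPE[10:]  # variant with the first ten residues removed
```

The wild-type check guards against off-by-one errors when reading positions from the heatmap.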

C3. Protein Generation

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

8B7W, score=1.5216, fixed_chains=[], designed_chains=['H'], model_name=v_48_020
QVQLVQSGGGLVQAGGSLRLSCAASGGTFATSPMGWLRQAPGKGTEFVAAISPSGGDRIYADSVKGRFTISRDNAGNFIYLQMNSLKPEDTAVYYCAVRRRFDGTSYYTGDYDSWGQGTLVTVSS

T=0.1, sample=0, score=0.7640, seq_recovery=0.5760
SLTLSQSGGGTVAAGGSVTLTCSSSGGTFKDQWMGWLRQAPGKPVEFVAAISPSGSTTLYSDKVKGRFTISKDSSGQTVTLTMNDLKPEDTATYYCAVRTSSNGSPMDPSNYQPDGSGQKLTVLP
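The reported seq_recovery can be checked directly by comparing the two sequences position by position. This is a small sketch, assuming recovery is the fraction of identical residues at aligned positions (both sequences have the same length); the function name is illustrative.

```python
ORIGINAL = (
    "QVQLVQSGGGLVQAGGSLRLSCAASGGTFATSPMGWLRQAPGKGTEFVAAISPSGGDRIYADSVKGRFTISRDNAGNF"
    "IYLQMNSLKPEDTAVYYCAVRRRFDGTSYYTGDYDSWGQGTLVTVSS"
)
DESIGNED = (
    "SLTLSQSGGGTVAAGGSVTLTCSSSGGTFKDQWMGWLRQAPGKPVEFVAAISPSGSTTLYSDKVKGRFTISKDSSGQT"
    "VTLTMNDLKPEDTATYYCAVRTSSNGSPMDPSNYQPDGSGQKLTVLP"
)


def sequence_recovery(original, designed):
    """Fraction of positions where the designed residue equals the original."""
    if len(original) != len(designed):
        raise ValueError("Sequences must have the same length")
    matches = sum(a == b for a, b in zip(original, designed))
    return matches / len(original)


recovery = sequence_recovery(ORIGINAL, DESIGNED)
# 1-based positions that ProteinMPNN changed
changed_positions = [i + 1 for i, (a, b) in enumerate(zip(ORIGINAL, DESIGNED)) if a != b]
```

Listing `changed_positions` shows where the redesign departs from the original, which can be compared against the tolerant and intolerant positions in the ESM2 heatmap.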

2. Input this sequence into ESMFold and compare the predicted structure to your original.

The predicted structure is similar but not identical: it appears to have the same overall fold, but it folds in a slightly different way.

[Image]

Part D. Group Brainstorm on Bacteriophage Engineering

Choose one or two main goals from the list that you think you can address computationally

Goal: Higher toxicity of lysis protein

Sub-goal: Eliminate the dependence of the L protein on the host chaperone DnaJ (E. coli) to accelerate and enhance bacterial lysis.

Context

  • The L protein of bacteriophage MS2 lyses the bacterium Escherichia coli. The authors identify that this process depends on the host chaperone DnaJ, specifically through an interaction with its C-terminal domain.
  • Using mutants, the study demonstrates that the P330Q mutation in DnaJ blocks the lytic capacity of the L protein at certain temperatures.
  • In the absence of interaction with DnaJ, the N-terminal domain of L interferes with its ability to bind to its unknown target.
  • The discovery of variants called Lodj mutants revealed that the N-terminal domain of the L protein is dispensable and is responsible for generating this chaperone dependence.

Tools/approaches

  • ESM2 for Deep Mutational Scanning (DMS): we want to identify residues in the N-terminal domain that can be removed or mutated without disrupting folding or making the protein unstable, while maintaining critical residues that are essential for lytic function.

  • ESMFold for structural prediction: predict the structure of the designed variants.

  • ProteinMPNN for inverse folding: once we find a stable structure for the L protein we will use ProteinMPNN to generate new sequences that might fold more efficiently.

Why these tools?

  • Since the N-terminal domain of L is intrinsically disordered and difficult to fold without DnaJ, ESM2 can help identify favorable mutations to improve stability.

  • ESMFold would help validate whether a drastic mutation causes the rest of the protein to collapse.

  • ProteinMPNN can help redesign the sequence of the lysis protein so that it folds more efficiently.

Pitfalls

  • Bacterial cells could lyse too early, not allowing the production of sufficient phage particles and resulting in very low final phage titers. This could lead to resistance to the phages.

  • The interaction between the N-terminal domain and DnaJ confers some stability on the L protein. If we aim to eliminate this dependency, there is a possibility that the redesigned protein could become unstable and degrade before reaching the membrane.

---
title: Pipeline schematic
---
graph TD
  A[WT L-protein sequence] --> B[ESM2 deep mutational scan]
  B --> |Identify favorable mutations| C[ESMFold structural prediction]
  C --> |Validate stability| D[ProteinMPNN inverse folding]
  D --> |New sequences independent from DnaJ and with more efficient folding| E[AF2-Multimer co-fold with DnaJ]
  E --> |Verify loss of interaction| F[Candidate variants for lab]

AI Prompts:

  • What could be the reason there are horizontal dark blue lines in W and C, assigning low score to that residues? M and H also have low score but a bit higher than the others, why could be the chemical reason behind this?

Bibliography

[1] “Meat Composition - an overview | ScienceDirect Topics.” Accessed: Mar. 02, 2026. [Online]. Available: https://www.sciencedirect.com/topics/food-science/meat-composition

[2] D. L. Nelson and M. M. Cox, Lehninger Principles of Biochemistry, 8th ed. New York, NY, USA: W.H. Freeman/Macmillan Learning, 2021, ch. 18, pp. 695–750.

[3] G. K. Philip and S. J. Freeland, “Did evolution select a nonrandom ‘alphabet’ of amino acids?,” Astrobiology, vol. 11, no. 3, pp. 235–240, Apr. 2011, doi: 10.1089/ast.2010.0567.

[4] F. H. C. Crick, “The origin of the genetic code,” J. Mol. Biol., vol. 38, no. 3, pp. 367–379, Dec. 1968, doi: 10.1016/0022-2836(68)90392-6.

[5] S. J. Freeland and L. D. Hurst, “The Genetic Code Is One in a Million,” J. Mol. Evol., vol. 47, no. 3, pp. 238–248, Sep. 1998, doi: 10.1007/PL00006381.

[6] “Understanding Non-natural Amino Acids: A Guide for Researchers.” Accessed: Mar. 03, 2026. [Online]. Available: https://www.nbinno.com/article/pharmaceutical-intermediates/understanding-non-natural-amino-acids-a-guide-for-researchers

Week 05 HW: Protein Design part II


Part A: SOD1 Binder Peptide Design (From Pranam)

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge:

  1. Design short peptides that bind mutant SOD1.
  2. Then decide which ones are worth advancing toward therapy.

You will use three models developed in our lab:

  • PepMLM: target sequence-conditioned peptide generation via masked language modeling

  • PeptiVerse: therapeutic property prediction

  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

Original sequence:

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Introducing the A4V mutation (the residue numbering excludes the initiator methionine, so the substitution falls at position 5 of the UniProt sequence):

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

2. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
[Image: peptides]
3. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
[Image: code]
4. Record the perplexity scores that indicate PepMLM’s confidence in the binders.
[Image: pseudoPerplexity]

Part 2: Evaluate Binders with AlphaFold3

2. Navigate to the AlphaFold Server and for each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

One of the generated peptides contained an “X” residue representing an unspecified amino acid. For AlphaFold modeling, I replaced this position with alanine to allow structure prediction.

3. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

[Image: 0]

N-terminus/A4V site: The peptide binds far from the N-terminus where A4V sits, almost on the opposite side.

β-barrel/dimer interface: The peptide localizes to the face opposite the dimerization surface, away from the free loop termini that would normally contact the second monomer.

Surface-bound or buried: It appears surface-bound, not inside the β-barrel or the loops.


4. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder
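| # | Peptide | Pseudo-perplexity | ipTM |
|---|---------|-------------------|------|
| 0 | WRYPVVAGRLKK | 14.84 | 0.29 |
| 1 | KRVPVVAAAHWK | 9.49 | 0.45 |
| 2 | WSYPAAGGKWWA | 16.20 | 0.47 |
| 3 | WRYYVVAGKWGE | 20.02 | 0.39 |
| 4 | FLYRWLPSRRGG | 22.36 | 0.41 |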

ipTM measures the accuracy of the predicted relative positions of the subunits within the complex. Values higher than 0.8 represent confident, high-quality predictions, while values below 0.6 suggest a likely failed prediction.

All the ipTM values are below 0.6, ranging from 0.29 to 0.47, suggesting that the predicted complexes may not represent reliable interactions. Nevertheless, compared with the known SOD1-binding peptide FLYRWLPSRRGG (ipTM = 0.41), peptide #2 (ipTM = 0.47) and peptide #1 (ipTM = 0.45) slightly exceed the known binder. These results suggest that, while the predicted interactions are weak, some generated peptides show comparable or slightly improved interface scores relative to the known binder.
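The comparison against the known binder can be made explicit with a short script. This sketch uses only the values recorded in this section; the 0.6 cutoff is the threshold quoted in this answer, and the variable names are illustrative.

```python
# peptide -> (pseudo-perplexity, ipTM), as recorded in this section
RESULTS = {
    "WRYPVVAGRLKK": (14.84, 0.29),
    "KRVPVVAAAHWK": (9.49, 0.45),
    "WSYPAAGGKWWA": (16.20, 0.47),
    "WRYYVVAGKWGE": (20.02, 0.39),
    "FLYRWLPSRRGG": (22.36, 0.41),  # known SOD1-binding peptide
}
KNOWN_BINDER = "FLYRWLPSRRGG"
FAILED_BELOW = 0.6  # ipTM values under this suggest a failed prediction

known_iptm = RESULTS[KNOWN_BINDER][1]

# Generated peptides whose ipTM exceeds the known binder, best first
exceeding = sorted(
    (p for p, (_, iptm) in RESULTS.items() if p != KNOWN_BINDER and iptm > known_iptm),
    key=lambda p: RESULTS[p][1],
    reverse=True,
)

# Peptides that clear the failed-prediction threshold
passing = [p for p, (_, iptm) in RESULTS.items() if iptm >= FAILED_BELOW]

# Peptide with the lowest pseudo-perplexity (PepMLM's most confident)
lowest_perplexity = min(RESULTS, key=lambda p: RESULTS[p][0])
```

Note that the peptide with the lowest pseudo-perplexity is not the one with the highest ipTM, so the two scores do not rank the candidates identically.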

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes
    1. Predicted binding affinity
    2. Solubility
    3. Hemolysis probability
    4. Net charge (pH 7)
    5. Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

[Images: 0, 1, 2, 3, 4]

Week 06 HW: Genetic Circuits Part I


Assignment: DNA Assembly

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

According to New England Biolabs [1]:

  • Phusion DNA polymerase: a DNA polymerase (an enzyme that catalyzes the synthesis of DNA molecules from dNTPs) that offers high fidelity and speed, with a lower error rate than Taq DNA polymerase.

  • Deoxynucleotides (dNTPs): the building blocks of DNA, which allow synthesis of new DNA strands.

  • Reaction buffer: a solution that maintains optimal conditions for the PCR reaction.

  • MgCl₂: provides Mg²⁺, a required cofactor for the DNA polymerase.

2. What are some factors that determine primer annealing temperature during PCR?

According to New England Biolabs [2]:

The optimal annealing temperature (Ta) is determined by the melting temperature (Tm) of the PCR primer which is the temperature at which half of the primer-template duplex dissociates to become single-stranded.

Factors that influence the Tm of the primers, and thus the Ta of the PCR:

  • Length: Longer primers → higher Tm values

  • Sequence: higher GC content → higher Tm → higher Ta

  • Concentration of the primers: higher primer concentration → higher likelihood of binding
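The length and GC trends above can be illustrated with a rough rule of thumb. This is a minimal sketch using the Wallace rule (Tm ≈ 2·(A+T) + 4·(G+C)), which is only a coarse estimate for short oligos; it is not the method used in the NEB reference, it ignores primer and salt concentration, and the primer sequences are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical primers chosen only to isolate each factor.
primers = {
    "short, AT-rich": "ATATATATAT",
    "short, GC-rich": "GCGCGCGCGC",
    "longer, 50% GC": "ATGCATGCATGCATGC",
}

for label, seq in primers.items():
    print(f"{label:16s} {len(seq):2d} nt  Tm ~ {wallace_tm(seq)} °C")
```

At equal length, the GC-rich primer gets the higher estimate, and the longer primer gets a higher estimate than either 10-mer. For real primer design, a dedicated Tm calculator is the safer choice.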

3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR uses primers that anneal to a template at specific temperatures, then a polymerase extends them. What we get is a copy of the designed region, so we could say that the result is determined by where the primers bind.

Restriction enzymes are proteins that recognize a specific short DNA sequence, called a restriction site, and then cut both strands of the double helix at or near that site. The result is a defined DNA fragment with either blunt ends or sticky ends.

Restriction enzymes only cut at their recognition sequence. What if I want to cut DNA at a boundary that has no natural restriction site nearby? There are hundreds of known restriction enzymes, but the set of available sites is still limited, and engineering proteins to cut any sequence I want is still hard. PCR primers, on the other hand, are just short DNA oligonucleotides, so it is easy to design a sequence in silico and order it. In this case, PCR is preferable to restriction enzymes.

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Gibson cloning is a method to join DNA fragments in a single tube. This method requires overlapping complementary sequences at their ends. Therefore, I would make sure that the DNA sequences will be appropriate for Gibson Assembly by designing the primers with 5’ tails matching the ends of the backbone at the cut site. Only the 3’ end of a primer needs to anneal to the template, so the tail just gets carried into the product as the polymerase extends forward.
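The primer design described above can be sketched in Python. This is an illustrative sketch, not a validated design tool: the function name `gibson_primers`, the default lengths of 20 nt, and the toy sequences are all assumptions, and it does not check Tm or secondary structure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def gibson_primers(upstream, insert, downstream, overlap=20, anneal=20):
    """Build primers whose 5' tails overlap the backbone at the cut site.

    upstream   : backbone sequence ending at the cut site (top strand)
    insert     : sequence to amplify (top strand)
    downstream : backbone sequence starting at the cut site (top strand)
    """
    # Forward primer: backbone tail (5') + start of insert (3', anneals).
    forward = upstream[-overlap:] + insert[:anneal]
    # Reverse primer: written 5'->3' on the bottom strand, so the tail
    # (backbone) comes first and the annealing part (insert end) last.
    reverse = reverse_complement(insert[-anneal:] + downstream[:overlap])
    return forward, reverse


# Toy sequences with short lengths so the result is easy to read.
fwd, rev = gibson_primers(
    upstream="AAAACCCCGGGG",
    insert="ATGGCTAGCAAAGGAGAATAA",
    downstream="TTTTGGGGCCCC",
    overlap=4,
    anneal=6,
)
print("forward:", fwd)
print("reverse:", rev)
```

In each primer only the 3' portion matches the insert template, while the 5' tail carries the backbone homology into the PCR product.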

5. How does the plasmid DNA enter the E. coli cells during transformation?

There are two main physical methods, and both share the same objective: transiently stress the cell envelope so that plasmid DNA can enter the cell.

  • Heat shock: a sudden, drastic change in temperature, from ice (0°C) to 42°C for 30 seconds, which briefly makes the cell envelope permeable to DNA.

  • Electroporation: an electroporator applies a brief high-voltage electric pulse that temporarily induces pores in the cell membranes, which permits plasmid entry into the cells.

6. Describe another assembly method in detail (such as Golden Gate Assembly)
  1. Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Golden Gate assembly is a one-pot, one-step cloning procedure. The method takes advantage of Type IIS restriction enzymes, which cleave DNA outside their recognition sequences. The result is an ordered assembly of a vector and one or more DNA fragments [3].

Golden Gate works in two simultaneous steps: digestion by a Type IIS enzyme that cuts outside its recognition site, and ligation by T4 ligase in the same tube. Because the recognition sites sit on the outside of each fragment pointing inward, the enzyme removes itself when it cuts and leaves a 4 bp overhang whose sequence is chosen during design. Once the destination vector and DNA insert(s) are digested, their complementary overhangs are joined together by DNA ligase to create an ordered assembly [3].

[Figure: golden_gate] Figure extracted from New England Biolabs' web page.
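Because the 4 bp overhangs are chosen during design, a quick check can confirm that they define a single ordered assembly. The sketch below assumes two common design rules that are not stated in the source: adjacent parts must share the same overhang, and overhangs should be unique and non-palindromic. The part names and overhang sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def check_overhangs(parts):
    """parts: list of (name, left_overhang, right_overhang) in the intended
    circular order, overhangs written on the top strand. Returns a list of
    human-readable problems (empty if the design passes these checks)."""
    problems = []
    junctions = []
    for i, (name, _left, right) in enumerate(parts):
        next_name, next_left, _ = parts[(i + 1) % len(parts)]
        if right != next_left:
            problems.append(f"{name} -> {next_name}: {right} != {next_left}")
        junctions.append(right)

    seen = set()
    for overhang in junctions:
        if len(overhang) != 4:
            problems.append(f"{overhang} is not 4 nt long")
        if overhang == reverse_complement(overhang):
            problems.append(f"{overhang} is palindromic (can self-ligate)")
        if overhang in seen or reverse_complement(overhang) in seen:
            problems.append(f"{overhang} is not unique")
        seen.add(overhang)
    return problems


good_design = [
    ("vector", "GCTT", "GGAG"),
    ("promoter", "GGAG", "TACT"),
    ("cds", "TACT", "AATG"),
    ("terminator", "AATG", "GCTT"),
]
bad_design = [
    ("vector", "GCTT", "AATT"),
    ("insert", "AATT", "GCTT"),
]

print(check_overhangs(good_design))
print(check_overhangs(bad_design))
```

The first design passes all checks, while the second is flagged because AATT equals its own reverse complement.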


Assignment: Asimov Kernel

Create a blank Construct and save it to your Repository
  1. Recreate the Repressilator in an empty Construct by using parts from the Characterized Bacterial Parts repository
  1. Confirm it works as expected by running the Simulator and compare your results with the Repressilator Construct found in the Bacterial Demos repository
  • TetR represses pTetR
  • LacI represses pLacI
  • LambdaCI represses pLambdaCI

pTetR drives LacI → pLacI drives LambdaCI → pLambdaCI drives TetR

The simulation works as expected, showing clear oscillation. When one repressor is high, the next promoter in the cycle is shut off, so the next protein falls; once that protein is low, the promoter after it becomes free, and so on around the loop.
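The ring logic can also be reproduced outside the Kernel with a toy model. This is a minimal sketch, assuming a dimensionless protein-only model with Hill repression and simple Euler integration; the parameters `alpha` and `n` are illustrative guesses and are not taken from the Asimov Kernel simulator or its characterized parts.

```python
def simulate_repressilator(alpha=50.0, n=3.0, dt=0.01, t_end=60.0):
    """Euler integration of a three-repressor ring.

    Wiring follows the construct: pTetR drives LacI, pLacI drives LambdaCI,
    pLambdaCI drives TetR, and each repressor shuts off its own promoter.
    """
    tetr, laci, ci = 1.0, 1.5, 2.0  # unequal start so the symmetry breaks
    history = []
    for step in range(round(t_end / dt)):
        d_laci = alpha / (1.0 + tetr ** n) - laci  # LacI made from pTetR
        d_ci = alpha / (1.0 + laci ** n) - ci      # LambdaCI made from pLacI
        d_tetr = alpha / (1.0 + ci ** n) - tetr    # TetR made from pLambdaCI
        tetr += dt * d_tetr
        laci += dt * d_laci
        ci += dt * d_ci
        history.append((step * dt, tetr, laci, ci))
    return history


history = simulate_repressilator()
late = history[len(history) // 2:]  # skip the initial transient
for name, column in (("TetR", 1), ("LacI", 2), ("LambdaCI", 3)):
    values = [row[column] for row in late]
    print(f"{name:9s} min={min(values):.2f}  max={max(values):.2f}")
```

A large gap between the late-time minimum and maximum of each protein indicates sustained oscillation rather than settling to a steady state.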

Build three of your own Constructs using the parts in the Characterized Bacterial Parts repository
Construct 1 - OR gate:

Design:

  • pTet and pTac drive a shared transcription unit (A1 RBS → LacI → L3S2P24).
  • pTet is induced by aTc, pTac by IPTG.
  • LacI is the output.

Simulation: E. coli, 24 h, 10-min timestep, transient transfection. Ligand schedule:

  • aTc → max at h 6
  • IPTG → max at h 12
  • aTc → 0 at h 18

Truth table:

| Phase | Time (h) | aTc | IPTG | Output |
|---|---|---|---|---|
| 1 | 0–6 | 0 | 0 | 0 |
| 2 | 6–12 | 1 | 0 | 1 |
| 3 | 12–18 | 1 | 1 | 1 |
| 4 | 18–24 | 0 | 1 | 1 |

The circuit behaves as an OR gate: all four truth-table states are reproduced. (1,1) gives the highest output, as expected from additive promoter activity.


Construct 2 - NOR gate:


Design:

  • Two cassettes: pTet + pTac → A1 RBS → AmtR → L3S2P24, then pAmtR → A1 RBS → LacI → L3S2P24.
  • aTc induces pTet, IPTG induces pTac, both produce AmtR which represses pAmtR.
  • LacI is the output.

Simulation: E. coli, 24 h, 10-min timestep, transient transfection. Ligand schedule:

  • aTc → max at h 6
  • IPTG → max at h 12
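  • aTc → 0 at h 18

| Phase | Time (h) | aTc | IPTG | Output |
|---|---|---|---|---|
| 1 | 0–6 | 0 | 0 | 1 |
| 2 | 6–12 | 1 | 0 | 0 |
| 3 | 12–18 | 1 | 1 | 0 |
| 4 | 18–24 | 0 | 1 | 0 |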

The circuit behaves as a NOR gate: all four truth-table states are reproduced. LacI only accumulates in the (0,0) state, when no AmtR is being produced; any inducer drives LacI to near zero through AmtR repression of pAmtR.


Construct 3 - Double repression cascade (NOT-NOT):


Design:

  • Three cassettes: pTet + pTac → A1 RBS → AmtR → L3S2P24, then pAmtR → A1 RBS → LitR → L3S2P24, then pLitR → A1 RBS → LacI → L3S2P24.
  • aTc/IPTG → AmtR → represses LitR → de-represses LacI. Two inversions in series.
  • LacI is the output.

Simulation: E. coli, 24 h, 10-min timestep, transient transfection. Ligand schedule:

  • aTc → max at h 6
  • IPTG → max at h 12
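  • aTc → 0 at h 18

| Phase | Time (h) | aTc | IPTG | LitR | Output (LacI) |
|---|---|---|---|---|---|
| 1 | 0–6 | 0 | 0 | high | 0 |
| 2 | 6–12 | 1 | 0 | low | 1 |
| 3 | 12–18 | 1 | 1 | low | 1 |
| 4 | 18–24 | 0 | 1 | low | 1 |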

The construct behaves like an OR gate (same truth table as C1), but with propagation delay and signal attenuation due to the extra repression layers. LitR protein falls before LacI rises, confirming that the repression signal propagates through the cascade in the expected order.
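The three truth tables can be cross-checked with an idealized Boolean model of each construct, evaluated over the same four-phase ligand schedule. This is a sketch under a strong simplifying assumption: every promoter is either fully on or fully off, so it cannot show the delay, attenuation, or the higher (1,1) output seen in the simulations.

```python
# (phase, time window in h, aTc, IPTG), following the ligand schedule.
SCHEDULE = [
    (1, "0-6", 0, 0),
    (2, "6-12", 1, 0),
    (3, "12-18", 1, 1),
    (4, "18-24", 0, 1),
]


def or_gate(atc: int, iptg: int) -> int:
    """Construct 1: pTet or pTac drives LacI directly."""
    return int(bool(atc or iptg))


def nor_gate(atc: int, iptg: int) -> int:
    """Construct 2: inducers make AmtR, which represses pAmtR -> LacI."""
    amtr = bool(atc or iptg)
    return int(not amtr)


def cascade(atc: int, iptg: int) -> int:
    """Construct 3: AmtR represses LitR, LitR represses LacI."""
    amtr = bool(atc or iptg)
    litr = not amtr
    return int(not litr)


print("phase  time    aTc IPTG  OR NOR NOT-NOT")
for phase, window, atc, iptg in SCHEDULE:
    print(f"{phase:5d}  {window:6s}  {atc:3d} {iptg:4d}  "
          f"{or_gate(atc, iptg):2d} {nor_gate(atc, iptg):3d} "
          f"{cascade(atc, iptg):7d}")
```

The cascade column matches the OR column in every phase, which is the Boolean version of the observation that Construct 3 has the same truth table as Construct 1.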

References

[1] “Phusion® High-Fidelity PCR Master Mix with HF Buffer | NEB.” Accessed: Apr. 22, 2026. [Online]. Available: https://www.neb.com/en/products/m0531-phusion-high-fidelity-pcr-master-mix-with-hf-buffer?srsltid=AfmBOorM6I4NLehgZdJrP18X-RzKB2X2U5oq3v8pfXh-kiMF-Tdxn5LL

[2] S. M. Klee, S. Lund, and G. C. Patton, “Universal annealing temperature in PCR and its impact on amplification results,” New England Biolabs, Ipswich, MA, USA, Application Note, Apr. 2026.

[3] “Golden Gate Assembly - Snapgene.” Accessed: May 25, 2026. [Online]. Available: https://www.snapgene.com/guides/golden-gate-assembly

Week 07 HW: Genetic circuits part II


Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
  • Boolean functions are limited to discrete on/off states, while IANNs can process analog signals and therefore carry more information. Real-world phenomena are analog: inside a cell there is inherent molecular noise, and Boolean circuits are fragile to it, especially at low signal concentrations.

  • Boolean functions can only handle simple logical relationships (AND, OR, etc.) between inputs. IANNs, through weighted connections and nonlinear activation functions, are capable of solving problems that are not linearly separable. [1]

  • IANNs have potential for adaptability and unsupervised learning. There's a principle known as "neurons that fire together, wire together":

    “This means that the strength of the connection between neurons changes based on how often they are activated. When a connection between two neurons is activated frequently, its weight increases and vice-versa: when the activation is less frequent, the weight weakens.” [1]

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

Endometriosis has a multi-signal inflammatory signature, and no single biomarker is sensitive enough on its own to diagnose or treat it.

Dual Region IANN: Two Sequestrons running in parallel, each sensing a different axis of the disease:

Sequestron A — inflammatory axis

  • X1: NF-κB
  • X2: IL-17A mRNA

If NF-κB is high but IL-17A is low → generic inflammation (PID) → Output_A ≈ 0

If IL-17A is high AND exceeds NF-κB → endometriosis signature → Output_A > 0

Sequestron B — angiogenic axis

  • X1: NF-κB
  • X2: VEGF/IL-8

Pelvic Inflammatory Disease elevates NF-κB but not the angiogenic markers. Endometriosis elevates both. By requiring both Sequestrons to fire, the circuit filters out false positives.

Output: anti-IL17A nanobody, released only when both axes are active simultaneously.
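The intended input/output behavior can be sketched numerically. This is a toy model under several assumptions that go beyond the description above: each Sequestron is idealized as a ReLU-like subtraction (output only when X2 exceeds X1), Sequestron B is assumed to follow the same rule as Sequestron A, the AND-like combination is modeled as the minimum of the two outputs, and all input levels are hypothetical arbitrary units.

```python
def sequestron(x_activating: float, x_sequestering: float) -> float:
    """Idealized sequestration unit: output only when the activating
    input exceeds the sequestering input (ReLU-like response)."""
    return max(0.0, x_activating - x_sequestering)


def circuit_output(nfkb: float, il17a: float, angiogenic: float) -> float:
    """Nanobody release level; nonzero only if both axes fire."""
    output_a = sequestron(il17a, nfkb)       # inflammatory axis
    output_b = sequestron(angiogenic, nfkb)  # angiogenic axis (VEGF/IL-8)
    return min(output_a, output_b)           # AND-like combination


# Hypothetical marker levels (arbitrary units) for three scenarios.
scenarios = {
    "healthy": dict(nfkb=0.5, il17a=0.2, angiogenic=0.2),
    "PID": dict(nfkb=5.0, il17a=1.0, angiogenic=1.0),
    "endometriosis": dict(nfkb=3.0, il17a=6.0, angiogenic=5.0),
}

for name, levels in scenarios.items():
    print(f"{name:14s} output = {circuit_output(**levels):.1f}")
```

Only the endometriosis-like profile produces a nonzero output in this toy model, because it is the only one in which both X2 inputs exceed NF-κB.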


Assignment Part 2: Fungal Materials

1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
  • Adaptive fungal architectures: The FUNGAR project is working on a “living monolith”, a large-scale structure made of interconnected fungal threads. It is designed to be sentient, using its natural “internet-like” properties to sense the environment and respond to touch, light, or electrical stimuli. The goal is to move toward intelligent bio-buildings that can monitor themselves and communicate information through their own biological network.

    Advantages:

    • Sustainable: the materials are completely biodegradable, require minimal energy, and are grown from industrial waste.
    • Fungal-based materials can detect light, chemicals, and pressure, so they can react to their surroundings.

    Disadvantages:

    • It is still difficult to achieve long-distance structures while maintaining the integrity and functionality of the fungus
    • Its chemical and mechanical properties change constantly according to its metabolism
  • Fungi as biological computers: By inserting electrodes into the mycelium, researchers have found that they can record spikes of electrical activity and use them to implement Boolean logic gates (like ‘AND’, ‘OR’, and ‘XOR’). This means the fungus itself acts as a computing substrate, potentially leading to architecture that can process data without traditional electronic hardware.

    Advantages:

    • It reduces the need for cables and batteries
    • It uses its own bioelectricity to function and communicate over long distances within its own biological body.

    Disadvantages:

    • If the material dries out, its electrical resistance increases dramatically and it stops working as a sensor or computer.
  • Biological Information Valves (Fungal Automata): Hyphae are divided into compartments by septa. Woronin bodies, organelles located next to the septal pores, act as valves that open or close the pores to control the flow of cytoplasm. Researchers are using these valves as a way to control the flow of information through a fungal filament. This turns the fungus into a series of binary switches, similar to how transistors work in a computer chip.

    Advantages:

    • They can process information and act as binary switches.

    Disadvantages:

    • The internal valves that move information can close due to stress or simply the age of the fungus, affecting the system’s logic.
    [Figure: mycelium]

    Towards fungal computing. (a) Exemplar setup of recording electrical activity of mycelium of Pleurotus ostreatus. (b) Example of Boolean gates implementation with computer model of spikes travelling in a fungal colony. Fragment of electrical potential record in response to inputs (01), black dashed line, (10), red dotted line, (11), solid green line, entered as impulses. (c) A biological scheme of a fragment of a fungal hypha of an ascomycete, where we can see septa and associated Woronin bodies. (d) A scheme representing states of Woronin bodies: ‘0’ open, ‘1’ closed. (e) Exemplar evolution of a one-dimensional fungal automaton: the array of finite state machines is vertical and time increases from the left to the right.

References

[1] A. Halužan Vasle and M. Moškon, “Synthetic biological neural networks: From current implementations to future perspectives,” BioSystems, vol. 237, p. 105164, Feb. 2024, doi: 10.1016/j.biosystems.2024.105164.

Week 09 HW: Cell Free Systems


Homework Part A: General and Lecturer-Specific Questions

General homework questions

1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

| Aspect | In vivo | Cell-free |
|---|---|---|
| Production speed | Hours to days (requires growth) | Minutes to hours (immediate synthesis) |
| Cell membrane | Cell membranes prevent scientists from interacting directly with the reaction components or manipulating cellular processes. | No cell membranes to get in the way of directly manipulating the reaction components. |
| Toxic proteins | If the target protein is toxic to the host cell, the cell may die before it can produce a significant amount of the protein. | No living cells to keep alive, so toxicity issues are bypassed. |
| Manipulation of reaction conditions | Cell viability must be considered. | Enables optimization (adjust pH, ionic strength, redox potential, metal ion concentrations, or temperature). |
| Interference & purity | Host cells need to produce their own proteins to stay alive, interfering with or delaying the production of the target protein. It is difficult to separate the target protein from all the other proteins and cellular components. | Since CFPS contains only the minimal cellular components necessary for protein synthesis, it simplifies extraction and purification. |
| Non-natural amino acids | Restricted to the 20 naturally occurring amino acids. | Enables the use of non-natural amino acids to produce proteins with novel properties. |
| Storage & shipping | Produced in large batches and shipped cold on ice, which is expensive. | Can be freeze-dried to last longer at room temperature. |

When is cell-free expression more beneficial than cell production?

  • Prototyping: Cell-free systems serve as excellent platforms for prototyping synthetic genetic circuits before implementation in living cells. Researchers can test promoters, ribosome binding sites, regulatory elements, and genetic circuit designs in hours rather than days, dramatically accelerating the design-build-test cycle [1].

  • Antibody discovery: Cell-free systems accelerate antibody engineering by enabling rapid production and screening of large antibody libraries [1].

  • Diagnostics and Point-of-Care Testing: Cell-free systems enable decentralized protein production for diagnostics, particularly valuable in resource-limited settings. Freeze-dried cell-free reactions can be stored at room temperature for months, then reconstituted with template DNA to produce protein sensors, antibodies, or enzymes on-demand [1].


2. Describe the main components of a cell-free expression system and explain the role of each component.
  • DNA: specific gene sequence of the target protein we wish to transcribe and translate

  • RNA polymerase: enzyme that transcribes the DNA instructions into mRNA.

  • Ribosomes: read the mRNA instructions.

  • tRNAs: transport the correct amino acids to the ribosome to build the protein.

  • Amino acids: building blocks for the construction of the actual proteins

  • Energy sources: provide the necessary power to drive both the transcription and translation reactions.

  • Translation factors: helper proteins that assist the ribosome during the initiation, elongation, and termination phases of building the protein chain.

  • Aminoacyl-tRNA synthetases: specific enzymes responsible for attaching the correct amino acid to its corresponding tRNA molecule.

  • Energy regeneration system: it is added to continuously recycle consumed ADP and GDP back into functional ATP and GTP, ensuring the reaction does not stop prematurely.


3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Protein synthesis requires energy in the form of ATP. Living cells constantly regenerate ATP, but cell-free reactions cannot do so on their own, which limits how long the reaction can run.

Because I want to validate a toehold switch, I would prioritize signal clarity, and the PURE system eliminates background noise. I found a paper (Yadav et al., 2025) that extends PURE by integrating an ATP regeneration system based on pyruvate oxidase, acetate kinase, and catalase. The new pathway generates acetyl phosphate from pyruvate, phosphate, and oxygen, which is used to regenerate ATP from ADP in situ. This pathway can function independently or in combination with the existing creatine-based system in PURE; the combined system produced an enhancement of 78% compared to using the creatine system alone [2].


4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

| Aspect | Prokaryotic | Eukaryotic |
|---|---|---|
| Yield | High | Low |
| Cost-efficiency | Cheap and fast | Expensive and laborious |
| Modifications (PTMs) | Severely limited | Multiple (glycosylation, lipidation, and disulfide-bridge formation) |
| Folding | Relies on native prokaryotic chaperones (may cause incorrect folding of complex eukaryotic proteins) | Optimal environment for complex/human proteins |
| Membranes | No natural membranes | Possess endogenous microsomes for direct insertion |

  • For a Prokaryotic System: this is a suitable option for small, soluble proteins that don’t need post-translational modifications to function, and when high yield is a priority. This system is especially good for genetic circuit validation. For this system I would produce sfGFP, since it only needs to fold correctly to fluoresce and has no PTM requirements, making E. coli cell-free the most straightforward and cost-effective choice.

  • For a Eukaryotic System: this is the right choice when the protein is human-derived, requires post-translational modifications (PTMs) like glycosylation or disulfide bonds, or has complex folding that depends on eukaryotic chaperones. For this system I would produce IL-6, since it requires N-linked glycosylation for proper receptor binding.


References

[1] “Cell-Free Systems for Protein Production: Advantages Over Living Cells,” Cytion. Accessed: Apr. 06, 2026. [Online]. Available: https://www.cytion.com/ca/About-Cytion/Knowledge-Hub/Blog/Cell-Free-Systems-for-Protein-Production-Advantages-Over-Living-Cells/

[2] S. Yadav, A. J. P. Perkins, S. B. W. Liyanagedera, A. Bougas, and N. Laohakunakorn, “ATP Regeneration from Pyruvate in the PURE System,” ACS Synth. Biol., vol. 14, no. 1, p. 247, Jan. 2025, doi: 10.1021/acssynbio.4c00697.

Week 10 HW: Imaging and Measurement


For your final project:

- Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.
- Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
- What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

----------------------------------------------------------------------------------------------------------------------------------------------------
- Toehold switch activation:
  • Does the switch open specifically in response to lncRNA H19? I would order the toehold switch construct from Twist and have it expressed at Ginkgo in a PURExpress cell-free reaction, with and without the H19 trigger, using sfGFP as the reporter.

Technology: fluorescence spectroscopy with a plate reader.


----------------------------------------------------------------------------------------------------------------------------------------------------
- Fusion protein identity:
  • Was the anti-STAT3 monobody fused to the E3 ligase recruitment domain expressed at the right size and with the correct sequence?

Technology:

  • SDS-PAGE: approximate size (first check)
  • Intact LC-MS: confirm molecular weight
  • Peptide mapping by LC-MS/MS: confirm the correct order of amino acids (primary structure)

----------------------------------------------------------------------------------------------------------------------------------------------------
- Ternary complex formation:
  • Does the bioPROTAC bridge STAT3 to the cell's degradation machinery?

Technology: native mass spectrometry. I would look for three peaks:

  • STAT3 (alone)
  • E3 recruitment domain alone
  • Assembled ternary complex (If the third peak appears, the mechanism is functional)

----------------------------------------------------------------------------------------------------------------------------------------------------
- LNP formulation integrity:
  • Are the lipid nanoparticles used for intraperitoneal delivery correctly assembled and homogeneous?

Technology: charge detection mass spectrometry (CDMS)


----------------------------------------------------------------------------------------------------------------------------------------------------
- STAT3 degradation:
  • Do STAT3 protein levels decrease?

Technology: Western Blot in endometriosis cells treated vs. untreated.


----------------------------------------------------------------------------------------------------------------------------------------------------
- IL-6 and IL-8:
  • Do downstream inflammatory markers (IL-6 & IL-8) drop as functional output?

Technology: ELISA


Homework: Waters Part I — Molecular Weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

1. What is the calculated molecular weight?

eGFP Sequence: MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Values obtained from Expasy’s molecular weight calculator for the eGFP sequence, including the linker and the His purification tag (HHHHHH):

| Property | Result |
|---|---|
| Number of amino acids | 247 |
| Average molecular weight | 28006.60 Da |
| Monoisotopic molecular weight | 27988.96 Da |
| Theoretical pI | 5.90 |

I will use the average molecular weight, $MW = 28006.60\ \text{Da}$.

However, according to this week’s lab: “This self-cyclization is observed as a nominal loss of 20 Da, through dehydration and loss of two hydrogen atoms.”

So a correction is needed: formation of the mature chromophore causes a mass loss of approximately $H_2O + H_2 \approx 20.03\ \text{Da}$.

$$ MW_{theory} = 28006.60 - 20.03 $$

$$ MW_{theory} = 27986.57\ \text{Da} $$
2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

I selected the following charge state peaks:

  • $m/z_n=1000.4302$

  • $m/z_{n+1}=965.9684$

  • Determine z for each adjacent pair of peaks $(n, n + 1)$ using:

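$$ z=\frac{\frac{m}{z_{n+1}}}{\frac{m}{z_n}-\frac{m}{z_{n+1}}} $$

Replacing the values of the charge state peaks:

$$ z=\frac{965.9684}{1000.4302-965.9684} $$

$$ z=28.03 $$

Rounding to the nearest integer gives a charge of $n = 28$ for the peak at $m/z_n = 1000.4302$, and therefore $n + 1 = 29$ for the adjacent peak at $m/z_{n+1} = 965.9684$.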
  • Determine the MW of the protein using the relationship between $\frac{m}{z_n}$, $MW$ and $z$

$$\frac{m}{z_n} = \frac{MW + n \cdot H}{n}$$

$H \approx 1.0073$ Da (mass of a proton). Solving for $MW$:

$$MW = n \cdot \left(\frac{m}{z_n}\right) - n \cdot H$$

Using $n = 28$ (from part 2.1) and $m/z_n = 1000.4302$:

$$MW = 28 \cdot (1000.4302) - 28 \cdot (1.0073)$$

$$MW = 27983.8412 \text{ Da}$$

As a check, applying the same formula to the adjacent peak with $n+1 = 29$ and $m/z_{n+1} = 965.9684$:

$$MW = 29 \cdot (965.9684) - 29 \cdot (1.0073)$$

$$MW = 27983.87 \text{ Da}$$

Now I’ll take the average experimental MW:

$$ MW_{exp} = \frac{27983.8412 + 27983.87}{2} $$

$$ MW_{exp} = 27983.86\ \text{Da} $$
  • Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1 using:
$$ Accuracy=\frac{|MW_{experiment} - MW_{theory}|}{MW_{theory}} $$

Replacing the values:

$$ Accuracy=\frac{|27983.86 - 27986.57|}{27986.57} $$

$$ Accuracy=0.0000968 $$

Converting to percent:

$$ 0.00009683 \times 100 = 0.0097\% $$

Converting to ppm:

$$ 0.00009683 \times 1000000 \approx 97\ \text{ppm} $$
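The adjacent charge state calculation can be reproduced in a few lines of Python as a cross-check of the arithmetic. This is a sketch that reuses the peak values, the proton mass, and the theoretical mass from this section; the function names are my own and are not part of any Waters software.

```python
PROTON = 1.0073  # Da, mass of a proton as used above


def charge_from_adjacent(mz_n: float, mz_n1: float) -> float:
    """Estimate the charge n of the peak at mz_n from the adjacent
    peak mz_n1, which carries charge n + 1 and sits at lower m/z."""
    return mz_n1 / (mz_n - mz_n1)


def mw_from_peak(mz: float, z: int) -> float:
    """Neutral molecular weight from one charge state: z*(m/z) - z*H."""
    return z * mz - z * PROTON


mz_n, mz_n1 = 1000.4302, 965.9684
n = round(charge_from_adjacent(mz_n, mz_n1))

mw_a = mw_from_peak(mz_n, n)
mw_b = mw_from_peak(mz_n1, n + 1)
mw_exp = (mw_a + mw_b) / 2

mw_theory = 28006.60 - 20.03  # sequence mass minus chromophore maturation
error_ppm = abs(mw_exp - mw_theory) / mw_theory * 1e6

print(f"n = {n}")
print(f"MW from each peak: {mw_a:.4f} Da, {mw_b:.4f} Da")
print(f"Experimental MW:   {mw_exp:.2f} Da")
print(f"Mass error:        {error_ppm:.0f} ppm")
```

Because the script averages the unrounded masses, the ppm value can differ in the last decimals from the hand calculation, but it rounds to the same result.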
  • Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

Yes, the charge state can be determined from the zoomed-in peak at $m/z \approx 1473$. The isotope peaks within the envelope are separated by approximately $\Delta(m/z) \approx 0.052$ m/z (between 1473.7428 and 1473.7950). Since the isotopic spacing equals $1/z$, this gives:

$$z = \frac{1}{0.052} \approx 19$$
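The same estimate as a short Python check, using the two isotope peak positions quoted above and the $1/z$ spacing relation from this section (the function name is illustrative):

```python
def charge_from_isotope_spacing(mz_a: float, mz_b: float) -> float:
    """Charge estimate from two neighbouring isotope peaks: z = 1 / spacing."""
    return 1.0 / abs(mz_b - mz_a)


z_estimate = charge_from_isotope_spacing(1473.7428, 1473.7950)
print(f"spacing = {1473.7950 - 1473.7428:.4f} m/z, z ~ {z_estimate:.1f}")
```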


Week 11 HW: Building Genomes


Part C: Planning the Global Experiment | Cell-Free Master Mix Design

1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)
1. sfGFP
2. mRFP1
3. mKO2
4. mTurquoise2
5. mScarlet_I
6. Electra2

| Fluorescent protein | Maturation (min) | pKa | Brightness | Description |
|---|---|---|---|---|
| sfGFP | 13.6 | | 54.15 | Rapidly-maturing weak dimer |
| mRFP1 | 60.0 | 4.5 | 12.5 | Slowly-maturing monomer with low acid sensitivity |
| mKO2 | 108.0 | 5.5 | 39.56 | Moderate acid sensitivity |
| mTurquoise2 | 33.5 | 3.1 | 27.9 | Rapidly-maturing monomer with very low acid sensitivity |
| mScarlet_I | 174.0 | 5.3 | 70 | Moderate acid sensitivity |
| Electra2 | | | 61.48 | |

3. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

I chose mKO2 because it is acid sensitive and matures slowly. My hypothesis is that increasing the HEPES-KOH concentration in the 36-hour master mix would help maintain the pH closer to 7.5 throughout the reaction. This matters because in cell-free reactions the pH decreases over long incubations due to the accumulation of acidic metabolic byproducts, and the slow maturation of mKO2 means most of its signal is generated in the later hours, when the pH tends to have dropped.
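To get a feel for how much a pH drop could matter, the fraction of fluorescent chromophore can be estimated from the pKa. This is a rough sketch that assumes a single-site Henderson-Hasselbalch model in which only the deprotonated chromophore fluoresces; the pKa of 5.5 for mKO2 comes from the table above, while the pH values are illustrative and not measured.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of chromophore in the deprotonated form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


MKO2_PKA = 5.5  # from the fluorescent protein table

for ph in (7.5, 6.5, 6.0, 5.5):
    fraction = fraction_deprotonated(ph, MKO2_PKA)
    print(f"pH {ph:.1f}: ~{fraction * 100:.0f}% of maximal signal")
```

Under this simple model the signal loss is small near pH 7.5 and grows quickly as the pH approaches the pKa, which is the regime that extra buffering is meant to avoid.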