Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Scale-up of nanocapsules for drug delivery using bacteria as ferritin manufacturers. 1. Describe a biological engineering application or tool you want to develop and why. Biologics are drugs synthesized by living organisms, and they have gained prominence over the years (Walsh, 2018). Cancer drugs and vaccines are some of the achievements scientists have accomplished with biotechnology. This is a novel area with increasing knowledge and many applications. Currently, iron deficiency is one of the main global issues affecting overall health (Lee et al., 2025). This project aims to develop a drug delivery system using bacteria-made ferritin, given the long-standing and widespread use of these microorganisms for drug manufacturing (Kulkarni, 2026).

  • Week 10 HW: Advanced Imaging & Measurement Technology

    Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. The main aspect to be measured is the expression and activity of the biosynthetic gene cluster (BGC). This includes: Presence and expression of BGC-associated enzymes Production of candidate metabolites Antibacterial activity against Leptospira Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

  • Week 2 HW: DNA Read, Write, and Edit

    HOMEWORK Part 1: Benchling & In-silico Gel Art Below are some screenshots from the steps followed to create a basic pattern: Step 1: The sequence is imported from the webpage to Benchling. Figures 1 and 2. Lambda DNA import process. Step 2: The digest function is shown as a test with EcoRI as the chosen restriction enzyme.

  • Week 3 HW: Lab Automation

    Assignment no. 1: Python Script for Opentrons Artwork. The following code was used to create a simple swirl pattern with four different colors (the full script appears under Week 3 HW: Lab Automation).

  • Week 4 HW: Protein Design Part I

    PART A: 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Assumptions: 500 g of meat ~31 g of protein per 100 g of meat (British Nutrition Foundation, 2021 ) Average amino acid mass ≈ 100 g/mol Avogadro’s number = 6.022 × 10^23 molecules/mol 1. Protein content in 500 g of meat

  • Week 5 HW: Protein Design part II

    PART A: SOD1 Binder Peptide Design The sequence for the original protein is: // sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Mutation occurs at residue 4: Alanine becomes Valine // 1UXM_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|SUPEROXIDE DISMUTASE [CU-ZN]|HOMO SAPIENS (9606) ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVS IEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Part 1: Generate Binders with PepMLM

  • Week 6 HW: Genetic Circuits Part I

    PART 1: Protocol questions What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? The mastermix contains: Phusion Hi-Fi DNA Polymerase: It is crucial for completing the amplicons generated during PCR. Deoxynucleotides: The building blocks necessary for replicating DNA fragments. Buffer including MgCl2: Prevents enzyme denaturation by maintaining pH at a fixed level. What are some factors that determine primer annealing temperature during PCR? Some factors include the primer length

  • Week 7 HW: Genetic Circuits Part II

    PART 1: IANNs What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

  • Week 9 HW: Cell-Free Systems

    Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Main advantages (flexibility & control): Open system: You can directly add/remove components (DNA, cofactors, salts, inhibitors). Precise control: You can tune Mg²⁺, ATP, amino acids, etc. Rapid expression: No need for cloning → transformation → growth. Toxic proteins: You can express proteins that would normally kill cells.

Subsections of Homework

Week 1 HW: Principles and Practices

Scale-up of nanocapsules for drug delivery using bacteria as ferritin manufacturers

1. Describe a biological engineering application or tool you want to develop and why.

Biologics are drugs synthesized by living organisms, and they have gained prominence over the years (Walsh, 2018). Cancer drugs and vaccines are some of the achievements scientists have accomplished with biotechnology. This is a novel area with increasing knowledge and many applications. Currently, iron deficiency is one of the main global issues affecting overall health (Lee et al., 2025). This project aims to develop a drug delivery system using bacteria-made ferritin, given the long-standing and widespread use of these microorganisms for drug manufacturing (Kulkarni, 2026).

As I entered my senior year in university, I wanted to work on topics related to drug development or delivery, and on scaling that process up, which is why I was doing some research for my bachelor’s thesis. Life took me down a different path, but here I am, trying to learn and see how far I can take this idea.

2. Describe one or more governance policy goals related to ensuring this application contributes to an ethical future & prevents harm.

  • Ensure quality in scaled production: Ensure that the biologic product complies with Good Manufacturing Practices (GMP).

  • Ensure non-maleficence in biomanufacturing: Avoid harmful use of bioengineered bacteria.

  • Foster and promote innovation and global access: Enable technology transfer to low-resource settings.

3. Describe at least three different potential governance actions by considering the purpose, design, assumptions, and risks of failures & “success”.

Below is a table with the three main governance actions:

Table 1. Governance actions

Governance Action | Purpose | Design | Assumptions | Risks & Failures
Biologics safeguards | Create biocontainment for bacteria as biological hazards | Entities such as FDA | The safeguards will be effective | Overlooking the safeguards may affect their effectiveness
Standardized GMPs | Elaborate guidelines for safe production | QA department staff | Fast implementation and adaptation by companies | High associated costs may create manufacturing monopolies
Traceability of biological product | Avoid misuse of the biologic | Molecular signatures | Traceability methods are robust | Mutations in microorganisms may render signatures ineffective

4. Score each of your governance actions against your rubric of policy goals

Table 2. Scoring

Does the option: | Biologics safeguards | Standardized GMPs | Traceability of biological product
Enhance Biosecurity | | |
• By preventing incidents | 1 | 2 | 2
• By helping respond | 2 | 1 | 1
Foster Lab Safety | | |
• By preventing incidents | 1 | 1 | 2
• By helping respond | 2 | 2 | 2
Protect the environment | | |
• By preventing incidents | 1 | 2 | 2
• By helping respond | 2 | 2 | 2
Other considerations | | |
• Minimizing costs and burdens | 3 | 2 | 2
• Feasibility | 2 | 2 | 2
• Not impede research | 2 | 3 | 2
• Promote constructive applications | 2 | 1 | 2
TOTAL SCORE | 18 | 18 | 19

5. Based on scores, describe which governance option or combination of options you would prioritize, and why.

After reviewing the different options and their scores, the most reasonable combination of options to prioritize would be Standardized GMPs and Traceability of Biological Products. The former is selected due to its strong impact on both productivity and product quality, as well as its capacity to establish clear guidelines that ensure biological safety for both production staff and consumers. The latter is essential because traceability enables the identification of errors and deviations throughout the production process, allowing them to be corrected in a timely manner and ensuring the ethical and responsible use of this technology.


References

Walsh, G. (2018). Biopharmaceutical benchmarks 2018. Nature Biotechnology, 36, 1136–1145. https://www.nature.com/articles/nbt.4305

Lee, G. R., et al. (2025). Global burden of iron deficiency and its impact on health. Nature Medicine. https://www.nature.com/articles/s41591-025-03624-8

Kulkarni, S. (2026). Engineered microbes as API manufacturers in pharma. Pharma Now. https://www.pharmanow.live/knowledge-hub/research/engineered-microbes-api-manufacturing-pharma

Week 10 HW: Advanced Imaging & Measurement Technology

Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

The main aspect to be measured is the expression and activity of the biosynthetic gene cluster (BGC). This includes:

  • Presence and expression of BGC-associated enzymes
  • Production of candidate metabolites
  • Antibacterial activity against Leptospira

Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

  • BGC enzyme expression: Measured using Western blot to confirm protein presence and approximate expression levels.
  • Metabolite production: Measured using LC-MS to detect and quantify candidate compounds produced by the BGC.
  • Antibacterial activity: Evaluated through antibiograms to assess inhibition of Leptospira growth.

What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

  • Western blot: To detect specific proteins encoded by the BGC after separation by gel electrophoresis.
  • Gel electrophoresis: For protein separation prior to blotting.
  • LC-MS (Liquid Chromatography–Mass Spectrometry): Main analytical technique to identify and quantify metabolites based on retention time and mass-to-charge ratio.
  • Antibiogram assays: To determine the antibacterial effectiveness of produced compounds.

PART 1

Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Based on the website provided, the molecular weight is 26.94 kDa, which does not include the linker and His tag. Including them, the MW becomes approximately 28 kDa. The former value is consistent with other GFPs in databases, such as Q9U6Y4 (26.17 kDa), P42212 (26.89 kDa) and Q9GZ28 (25.91 kDa).
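
The same kind of estimate can be reproduced offline. The snippet below is a minimal sketch, not the Expasy implementation: it assumes average residue masses (standard textbook values), adds one water for the free termini, and ignores any post-translational modification. The names `RESIDUE_MASS`, `average_mass` and `EGFP_HIS` are illustrative.

```python
# Average residue masses in Da (amino acid minus one water); assumed textbook values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water for the free N- and C-termini

# eGFP sequence from the question, including the LE linker and His tag.
EGFP_HIS = "".join("""
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL
VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV
NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD
HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE
HHHHHH
""".split())


def average_mass(sequence):
    """Return the average molecular mass (Da) of an unmodified peptide chain."""
    seq = "".join(sequence.split()).upper()
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    print(f"Full construct: {average_mass(EGFP_HIS) / 1000:.2f} kDa")
    # Dropping the last 8 residues (LEHHHHHH) gives the tag-free value.
    print(f"Without linker and His tag: {average_mass(EGFP_HIS[:-8]) / 1000:.2f} kDa")
```

Slicing off the last eight residues is only a convenient way to compare the tagged and untagged constructs; it assumes the tag is exactly `LEHHHHHH`, as in the sequence given.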

The molecular weight can also be estimated from the measured mass spectrum. The charge state $z$ of a peak is first obtained from two adjacent peaks in the charge-state envelope. The expression is:

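$ z = \frac{\frac{m}{z_{n+1}} - 1}{\frac{m}{z_n} - \frac{m}{z_{n+1}}} $

Here $\frac{m}{z_n}$ is the higher-m/z peak (charge $z$), $\frac{m}{z_{n+1}}$ is the adjacent lower-m/z peak (charge $z + 1$), and the 1 subtracted is the approximate mass of a proton.

Then, considering peaks such as 800.5508 and 824.0635, $z = \frac{800.5508 - 1}{824.0635 - 800.5508} = 34.005$, which rounds to the integer charge state $z = 34$ for the peak at 824.0635.

Then, the molecular weight is given by:

$ MW = z \cdot \left(\frac{m}{z_n} - 1\right) $ $ MW = 34 \cdot (824.0635 - 1) = 27{,}984.159 \ \text{Da} = 27.98 \ \text{kDa} $

With these values, the relative deviation from the calculated value of 28 kDa is:

$ \frac{|27.98 - 28|}{28} \approx 7 \times 10^{-4} $

That is, about 0.07%.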
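
The charge-state arithmetic can also be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two peaks are adjacent in the charge envelope (charges differ by exactly one), the charge carriers are protons, and $z$ is rounded to the nearest integer because charge states are whole numbers. The function names and the proton mass constant (1.00728 Da, rather than the rounded value of 1) are illustrative choices.

```python
PROTON_MASS = 1.00728  # Da; assumption: charge carriers are protons


def charge_state(mz_high, mz_low, proton=PROTON_MASS):
    """Integer charge of the higher-m/z peak, from two adjacent peaks."""
    return round((mz_low - proton) / (mz_high - mz_low))


def neutral_mass(mz, z, proton=PROTON_MASS):
    """Neutral molecular mass (Da) from one peak and its charge state."""
    return z * (mz - proton)


if __name__ == "__main__":
    z = charge_state(824.0635, 800.5508)
    print("z =", z)
    print(f"MW = {neutral_mass(824.0635, z):.1f} Da")
    # The neighbouring peak carries one more charge and should agree closely.
    print(f"MW (from 800.5508) = {neutral_mass(800.5508, z + 1):.1f} Da")
```

Comparing the mass obtained from both peaks is a quick consistency check: if the two values disagree by more than a fraction of a dalton, the peaks were probably not adjacent charge states.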

Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

PART 2

Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

When a protein unfolds, residues that were buried in the native fold become exposed to the solvent, so more sites are available to pick up charge during ionization. This increases the number of charges the protein can carry and, therefore, the number of charge-state peaks, which is why the denatured protein shows a broader spectrum than the native protein.

Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800? What is the charge state? How can you tell?

TBA

PART 3

How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

MVS K GEELFTG VVPILVELDG DVNGH K FSVS GEGEGDATYG K LTL K FICTT G K LPVPWPTL VTTLTYGVQC FS R YPDHM K Q HDFF K SAMPE GYVQE R TIFF K DDGNY K T R A EV K FEGDTLV N R IEL K GIDF K EDGNILGH K LEYNYNSHNV YIMAD K Q K NG I K VNF K I R HN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALS K D PNE K R DHMVL LEFVTAAGIT LGMDELY K LE HHHHHH

Number of Lysines: 20

Number of Arginines: 6

How many peptides will be generated from tryptic digestion of eGFP?

Based on the website provided, there will be 19 fragments

Figure 1. 19 fragments cut using the Expasy PeptideMass website.
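
As a cross-check on the residue counts and the digest, the snippet below applies the textbook trypsin rule (cut after K or R, except when the next residue is P) with no missed cleavages. It is a minimal sketch, not the PeptideMass algorithm, and `tryptic_digest` is an illustrative name. For this sequence the rule yields 27 peptides; a tool that hides short or low-mass peptides would list fewer, which may explain the 19 fragments reported above (an assumption, not verified here).

```python
# eGFP sequence from the question, including the LE linker and His tag.
EGFP_HIS = "".join("""
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL
VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV
NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD
HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE
HHHHHH
""".split())


def tryptic_digest(seq):
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # C-terminal peptide without a trailing K/R
        peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    peptides = tryptic_digest(EGFP_HIS)
    print("Lysines:", EGFP_HIS.count("K"), "Arginines:", EGFP_HIS.count("R"))
    print("Peptides:", len(peptides))
    for p in peptides:
        print(len(p), p)
```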

Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

Week 2 HW: DNA Read, Write, and Edit

HOMEWORK

Part 1: Benchling & In-silico Gel Art

Below are some screenshots from the steps followed to create a basic pattern:

Step 1: The sequence is imported from the webpage to Benchling.

Figures 1 and 2. Lambda DNA import process.

Step 2: The digest function is shown as a test with EcoRI as the chosen restriction enzyme.

Figures 3 and 4. DNA Gel using EcoRI.
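
What the digest function does can be illustrated with a short script. This is a minimal sketch for a linear molecule, assuming only the top strand is tracked: EcoRI recognizes GAATTC and cuts after the first G. The function name `digest_linear` and the toy sequence are hypothetical, and this is not how Benchling implements its digest.

```python
def digest_linear(seq, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases into the site where the top strand
    is cut (1 for EcoRI: G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAGAATTCTTTTGAATTCGG"  # hypothetical 20-nt sequence with two EcoRI sites
    fragments = digest_linear(toy)
    # Fragment lengths are what determine band positions on a gel.
    print(sorted((len(f) for f in fragments), reverse=True))
```

Sorting the fragment lengths from longest to shortest mirrors the top-to-bottom order of bands in a gel lane.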

Step 3: The process is repeated using enzymes requested in the homework, and the result is the following:

Figures 5 and 6. DNA pattern using homework enzymes.

Step 4: The process is repeated, now using different enzymes requested in the homework to create a different pattern. In this case, the pattern might look like a series of barcodes, using the enzymes shown in Figure 7. Since they cut at different sites, they create many short DNA fragments that scatter across the gel, giving the impression of a barcode. This enzyme behavior could perhaps be used as a biomarker, although this idea needs further development:

Figures 7 and 8. DNA Gel pattern using different enzymes.

Part 3: DNA Design Challenge

3.1. Choose your protein

In this case, I chose the Translation Initiation Factor 3 (TIF-3), encoded by the gene infC, which is a relatively short protein involved in the translation process (Gutu et al., 2013; Arenz & Wilson, 2016). Modifying the structure of this protein may be crucial to combat antibiotic resistance.

From UniProt, the amino acid sequence is the following:

//

tr|A0A8S0FV27|A0A8S0FV27_ECOLX Translation initiation factor IF-3 OS=Escherichia coli OX=562 GN=infC PE=3 SV=1
MSLREALEKAEEAGVDLVEISPNAEPPVCRIMDYGKFLYEKSKSSKEQKKKQKVIQVKEI
KFRPGTDEGDYQVKLRSLIRFLEEGDKAKITLRFRGREMAHQQIGMEVLNRVKDDLLRRT
GSGRILPNEDRRPPDDHGARS

3.2 and 3.3: Reverse translation and codon optimization

Going back to Benchling, the amino acid sequence was imported and the back-translate function was used to obtain the DNA sequence, which is already codon-optimized, as shown in the figure below:

Figure 9. Codon optimization.

// DNA optimized sequence: ATGAGTTTACGTGAAGCACTGGAAAAAGCGGAAGAAGCCGGTGTTGATCTGGTCGAAATCAGTCCTAATGCAGAACCCCCGGTGTGCCGTATCATGGACTATGGCAAATTCCTCTACGAGAAATCTAAAAGCTCAAAGGAACAAAAAAAGAAACAGAAGGTTATTCAGGTCAAAGAGATTAAGTTTCGACCGGGGACTGACGAAGGAGACTATCAAGTGAAACTTCGCTCCTTGATTCGCTTCCTGGAAGAGGGGGATAAAGCGAAAATTACCCTGCGCTTTCGCGGCAGAGAGATGGCCCACCAGCAGATCGGCATGGAAGTATTGAACCGTGTGAAAGATGACTTACTGCGTCGCACGGGTAGCGGTCGTATACTGCCAAACGAGGATCGCCGGCCGCCGGATGATCATGGCGCTCGGTCG

The organism selected for this protein is E. coli due to its wide use in biotechnology. Codon optimization is needed because codon usage and the cellular translation machinery differ from one bacterium to another, so a bacterium other than E. coli might express this protein at a different rate and level. In this case, the protein was obtained from E. coli according to UniProt, but since E. coli comprises different strains, codon optimization would still have to be performed.
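
A quick way to confirm that a back-translated, codon-optimized sequence still encodes the intended protein is to translate it again and compare. The snippet below is a minimal sketch using the standard genetic code; `translate` is an illustrative helper, not a Benchling function, and only the first 30 nucleotides of the optimized sequence are used as the example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna):
    """Translate a coding DNA sequence (reading frame 1) into one-letter amino acids."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    optimized_start = "ATGAGTTTACGTGAAGCACTGGAAAAAGCG"  # first 10 codons of the sequence above
    protein_start = "MSLREALEKA"                        # first 10 residues from UniProt
    print(translate(optimized_start) == protein_start)
```

Passing the full optimized sequence and the full UniProt sequence to the same comparison would check the whole construct.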

3.4 and 3.5: Production technologies and alignment

I would use host cells, since the chosen protein is bacterial and E. coli is a microorganism commonly used for this purpose. This technique has a much lower cost than cell-free systems, in which all the cellular components have to be supplied.

The alignment is shown in the following figures:

Figures 10, 11 and 12. Central Dogma for TIF-3.

Part 4: Prepare a Twist DNA Sequence Order

Step 1: DNA Sequence

The same DNA linear sequence was already obtained in Part no. 3.

Figure 13. DNA Sequence.

Step 2: Building the chassis

The parts were initially searched in iGEM, but the website shut down. Due to this, the parts provided in the homework were used.

Figures 14 and 15. iGEM issue.

The chassis now looks like this:

Figure 16. Chassis.

Step 4: Ordering in Twist

The process is shown in the figures below:

Figure 17. Importing the sequence to Twist from Benchling.

Step 5: Creating the plasmid

The process is shown in the figures below:

Figures 18 and 19. Creating the plasmid.

Finally, the plasmid is shown below:

Figure 20. Final plasmid.

Part 5: DNA Read/Edit/Write

5.1.1: What DNA would you want to sequence (e.g., read) and why?

I would like to analyze DNA from insects such as flies, since many species act as vectors for infectious diseases. By sequencing their DNA, I could identify genetic elements associated with viral transmission, pathogen resistance, or susceptibility. This information could help improve disease monitoring and vector control strategies.

5.1.2: In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

I would use second-generation sequencing technologies such as Illumina sequencing due to their high throughput, accuracy, and cost-effectiveness. Illumina sequencing is particularly efficient for short DNA fragments and allows the parallel sequencing of millions of reads, making it ideal for large-scale genomic analysis of insect populations.

5.2.1: What DNA would you want to synthesize (e.g., write) and why?

I would like to synthesize bacterial DNA initially because bacterial genomes are less complex than eukaryotic genomes, which makes them more manageable in terms of cost and laboratory procedures. This would allow me to gain experience with gene design and expression systems before working with more complex organisms.

5.2.2: What technology or technologies would you use to perform this DNA synthesis and why?

I would use common routes such as solid-phase phosphoramidite chemical DNA synthesis combined with gene assembly techniques. These methods allow precise synthesis of short oligonucleotides, which can then be assembled into longer DNA constructs. It is widely used, reliable, and scalable for constructing bacterial genes or plasmids.

5.3.1: What DNA would you want to edit and why?

I would edit DNA from mammalian cells, focusing on genes involved in the immune response. By modifying specific regulatory or coding sequences, it may be possible to enhance resistance to infectious diseases or better understand the mechanisms underlying autoimmune disorders. However, such research would need to be conducted carefully and ethically due to the potential implications of editing mammalian genomes.

PRE-LECTURE NOTES

Homework questions from Prof. Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The error rate of polymerase is approximately 1 in a million nucleotides. Considering the human genome length of approximately 3.2 billion base pairs, or 6.4 billion in a diploid cell, there would be roughly 6,400 errors each time a diploid genome is copied, and half that number for a haploid genome. This implies a significant chance for defects or mutations to occur and potentially be passed down to offspring. However, biology has evolved multiple mechanisms, such as proofreading and mismatch repair, that increase the fidelity of DNA replication. For instance, MutS-1 is a protein shown to bind to mismatched DNA sequences. This mechanism therefore acts as an additional layer that improves the fidelity of de novo DNA synthesis (Carr et al., 2004).
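
The estimate is a simple product of genome size and per-nucleotide error rate, sketched below. The constants are the approximate values quoted in the answer, and `expected_errors` is an illustrative name.

```python
ERROR_RATE = 1e-6   # errors per nucleotide copied (approximate value from the answer)
HAPLOID_BP = 3.2e9  # approximate human haploid genome length in base pairs


def expected_errors(genome_bp, error_rate=ERROR_RATE):
    """Expected number of errors per round of replication, without any repair."""
    return genome_bp * error_rate


if __name__ == "__main__":
    print(f"Haploid: {expected_errors(HAPLOID_BP):,.0f} errors")
    print(f"Diploid: {expected_errors(2 * HAPLOID_BP):,.0f} errors")
```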

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

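Because the genetic code is degenerate (61 sense codons encode 20 amino acids, about 3 synonymous codons per amino acid on average), there are approximately $3^n$ ways to code a human protein, where $n$ represents the length of the protein. For instance, a typical protein may consist of 300 amino acids. Therefore, there are about $3^{300} \approx 10^{143}$ possible coding sequences, an extremely large number (Alberts et al., 2002). For comparison, $20^{300}$ is the number of distinct amino acid sequences of that length, not the number of ways to encode one of them. Some of the reasons these codes do not work in practice include: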

  • Codon usage bias: The prevalence of a codon is related to its translation efficiency; some codons are translated faster than others. This impacts protein expression levels and availability (Chakravarty, 2026).

  • Protein structure: Since proteins fold co-translationally, changes in codon usage can alter the timing of folding events, affecting protein structure and function (Moss et al., 2024).
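
The number of DNA sequences that encode one specific protein can be counted exactly by multiplying, residue by residue, the number of synonymous codons in the standard genetic code. The snippet below is a minimal sketch; `coding_sequence_count` is an illustrative name, and the stop codon is not included in the count.

```python
from collections import Counter
from math import log10, prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons per amino acid, e.g. 1 for M and W, 6 for L, S and R.
DEGENERACY = Counter(aa for aa in AMINO if aa != "*")


def coding_sequence_count(protein):
    """Number of distinct DNA sequences that encode `protein` (stop codon excluded)."""
    protein = "".join(protein.split()).upper()
    unknown = set(protein) - set(DEGENERACY)
    if unknown:
        raise ValueError(f"unknown residues: {sorted(unknown)}")
    return prod(DEGENERACY[aa] for aa in protein)


if __name__ == "__main__":
    example = "MSLREALEKA"  # first 10 residues of the IF-3 sequence used in Part 3
    n = coding_sequence_count(example)
    print(f"{example}: {n} coding sequences (about 10^{log10(n):.1f})")
```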

Homework questions from Dr. LeProust

What’s the most commonly used method for oligo synthesis currently?

Currently, oligo synthesis is most commonly performed using phosphoramidite nucleosides as building blocks. This process consists of four main chemical reactions: detritylation, coupling, capping, and oxidation (Kosuri & Church, 2014).

Why is it difficult to make oligos longer than 200 nt via direct synthesis?

The main challenge in synthesizing long oligonucleotides using standard phosphoramidite chemistry lies in cumulative yield loss and error accumulation. Unwanted reactions, such as depurination during detritylation, and incomplete removal of protecting groups can leave gaps in the oligo backbone, reducing overall yield. In addition, single-base deletions are the predominant errors caused by inefficiencies during these reaction steps.

Why can’t you make a 2000 bp gene via direct oligo synthesis?

Manufacturing an oligo of this length is highly prone to errors due to several factors. First, oligo concentrations obtained from a selected pool after processing are often quite low, reducing assembly efficiency. Second, when synthesizing large numbers of oligos, overlapping coding regions may introduce assembly errors at scale. Finally, significantly higher costs are required to produce the large number of strands necessary for successful gene assembly.
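
The effect of cumulative yield loss can be made concrete with a short calculation. This is a minimal sketch assuming every coupling step succeeds independently with the same efficiency; the 99.5% figure is an illustrative assumption, not a value from the lecture, and `full_length_yield` is a hypothetical helper.

```python
def full_length_yield(length_nt, coupling_efficiency=0.995):
    """Fraction of strands that reach full length after (length_nt - 1) couplings."""
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    for length in (20, 100, 200, 2000):
        print(f"{length:>5} nt: {full_length_yield(length):.2e} of strands are full length")
```

Because the yield decays exponentially with length, even a small per-step loss leaves almost no full-length product at 2000 nt, which is why long genes are assembled from shorter oligos instead.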

Homework question from George Church

Using Google & Prof. Church’s slide #4, what are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The ten essential amino acids in most animals are:

Arginine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, and Valine. Depending on species-specific metabolic capabilities, Cysteine or Tyrosine can sometimes be essential as well (Hou and Wu, 2018).

If the Lysine Contingency is understood as a bioengineered constraint, the dependence of animals on multiple essential amino acids further strengthens this strategy. This dependency enables researchers to implement safer in vivo containment systems, as organisms lacking access to these amino acids are unable to survive outside controlled environments (Shivni, 2023).


References

Carr, A. M., Lambert, S., & Replication Stress Group. (2004). Mismatch repair proteins and DNA replication fidelity. Nucleic Acids Research, 32(20), e162. https://academic.oup.com/nar/article/32/20/e162/1115791

Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., & Walter, P. (2002). Molecular Biology of the Cell (4th ed.). Garland Science. https://www.ncbi.nlm.nih.gov/books/NBK26830/

Chakravarty, A. (2026). What is codon bias? GoldBio. https://www.goldbio.com/blogs/articles/what-is-codon-bias

Moss, A. J., et al. (2024). Codon usage and protein folding dynamics. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC11227313/

Kosuri, S., & Church, G. M. (2014). Large-scale de novo DNA synthesis. Nature Methods, 11, 499–507. https://www.nature.com/articles/nmeth.2918

Hou, Y., & Wu, G. (2018). Nutritionally essential amino acids in animals. Advances in Nutrition, 9(6), 849–858. https://doi.org/10.1093/advances/nmy054

Shivni, R. (2023). A pioneer of the multiplex frontier. The Scientist. https://www.the-scientist.com/a-pioneer-of-the-multiplex-frontier-71132

Week 3 HW: Lab Automation

Assignment no. 1: Python Script for Opentrons Artwork

The following code was used to create a simple swirl pattern with four different colors:

from opentrons import types
import math

metadata = {
    'author': 'Jean Colmenares',
    'protocolName': 'Agar Swirl Pattern - 4 Colors',
    'description': 'Swirl pattern with four colors per branch',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

well_colors = {
    'A1': 'Red',
    'B1': 'Green',
    'C1': 'Orange',
    'D1': 'Blue'
}

def run(protocol):

    tips_20ul = protocol.load_labware(
        'opentrons_96_tiprack_20ul',
        TIP_RACK_DECK_SLOT,
        'Opentrons 20uL Tips'
    )

    pipette_20ul = protocol.load_instrument(
        "p20_single_gen2",
        "right",
        [tips_20ul]
    )

    temperature_module = protocol.load_module(
        'temperature module gen2',
        COLORS_DECK_SLOT
    )

    temperature_plate = temperature_module.load_labware(
        'opentrons_96_aluminumblock_generic_pcr_strip_200ul',
        'Cold Plate'
    )

    color_plate = temperature_plate

    agar_plate = protocol.load_labware(
        'htgaa_agar_plate',
        AGAR_DECK_SLOT,
        'Agar Plate'
    )

    center_location = agar_plate['A1'].top()

    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

    # ------------------------------------------------------------------
    # Helper functions
    # ------------------------------------------------------------------

    def location_of_color(color_string):
        for well, color in well_colors.items():
            if color.lower() == color_string.lower():
                return color_plate[well]
        raise ValueError(f"No well found with color {color_string}")

    def dispense_and_detach(pipette, volume, location):
        above_location = location.move(types.Point(z=location.point.z + 5))
        pipette.move_to(above_location)
        pipette.dispense(volume, location)
        pipette.move_to(above_location)

    # ------------------------------------------------------------------
    # SWIRL PATTERN — BIG + FIXED COLOR PER BRANCH (P20 SAFE)
    # ------------------------------------------------------------------

    DROP_VOLUME = 3

    branches = 4
    points_per_branch = 24

    radius_start = 3
    radius_step = 1.6
    angle_step = math.pi/9

    branch_colors = ['Red', 'Green', 'Orange', 'Blue']

    for branch in range(branches):

        base_angle = branch * (2*math.pi/branches)
        color = branch_colors[branch]
        source = location_of_color(color)

        for i in range(points_per_branch):

            pipette_20ul.pick_up_tip()

            pipette_20ul.aspirate(DROP_VOLUME, source.bottom(1))

            angle = base_angle + i * angle_step
            radius = radius_start + i * radius_step

            x = radius * math.cos(angle)
            y = radius * math.sin(angle)

            loc = center_location.move(types.Point(x=x, y=y, z=0))
            dispense_and_detach(pipette_20ul, DROP_VOLUME, loc)

            pipette_20ul.drop_tip()
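
The swirl geometry in the script above can be checked without a robot. The snippet below is a minimal sketch that recomputes the same offsets with plain `math` and reports the largest distance from the plate center and the number of tips used. The 40 mm usable radius is a hypothetical value for illustration only; the real limit depends on the agar plate labware definition.

```python
import math


def spiral_points(branches=4, points_per_branch=24,
                  radius_start=3.0, radius_step=1.6, angle_step=math.pi / 9):
    """Return (branch, x, y) offsets in mm from the plate center, as in the protocol."""
    points = []
    for branch in range(branches):
        base_angle = branch * (2 * math.pi / branches)
        for i in range(points_per_branch):
            angle = base_angle + i * angle_step
            radius = radius_start + i * radius_step
            points.append((branch, radius * math.cos(angle), radius * math.sin(angle)))
    return points


if __name__ == "__main__":
    USABLE_RADIUS_MM = 40.0  # hypothetical limit, not taken from the labware file
    pts = spiral_points()
    farthest = max(math.hypot(x, y) for _, x, y in pts)
    print(f"{len(pts)} drops (one tip each), farthest drop at {farthest:.1f} mm")
    print("Fits on plate:", farthest <= USABLE_RADIUS_MM)
```

Since the protocol picks up a fresh tip for every drop, the number of points also equals the number of tips consumed, which is worth comparing against the capacity of a single 96-tip rack.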

The pattern is shown below:

Assignment no.2: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Automated Strain Construction for Biosynthetic Pathway Screening in Yeast Astolfi et al., 2025

In this study, the researchers programmed a Hamilton Microlab VANTAGE liquid-handling robot (a high-end automation platform, not an Opentrons) to integrate with additional on- and off-deck hardware (e.g., thermocyclers, plate sealers, colony pickers) via its central arm. Together with custom software and a user interface developed in the Hamilton VENUS environment, this system automated key steps in yeast strain construction such as transformation setup, heat-shock, washing, and plating.

This automated workflow achieved a throughput of up to ~2,000 transformations per week, enabling high-throughput construction and screening of libraries of engineered yeast strains. As a proof of concept, the team applied the system to screen gene variants within a biosynthetic pathway for the plant alkaloid precursor verazine. They identified several genes that significantly increased pathway product titers, demonstrating the utility of automated strain construction for rapid pathway discovery and optimization.

Assignment no. 3: Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

For my first idea, a liquid-handling robot such as an Opentrons would be a crucial tool for determining the right enzyme concentration for dye breakdown.
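
As a rough sketch of how such a concentration screen could be planned, the snippet below uses the dilution relation C1·V1 = C2·V2 to compute how much enzyme stock and diluent go into each well. The stock concentration, target concentrations, units and final volume are all hypothetical placeholders, not measured values from the project.

```python
def dilution_volumes(stock_conc, target_concs, final_volume):
    """Return (stock_volume, diluent_volume) pairs for each target concentration.

    Uses C1 * V1 = C2 * V2. Concentrations share one unit and volumes share
    another (for example U/mL and uL); both are assumptions of this sketch.
    """
    plan = []
    for target in target_concs:
        if target > stock_conc:
            raise ValueError(f"Target {target} exceeds stock concentration {stock_conc}")
        stock_volume = target * final_volume / stock_conc
        plan.append((stock_volume, final_volume - stock_volume))
    return plan


# Hypothetical example: 10 U/mL laccase stock, 100 uL per well.
plan = dilution_volumes(stock_conc=10, target_concs=[1, 5], final_volume=100)
for stock_volume, diluent_volume in plan:
    print(f"stock: {stock_volume} uL, diluent: {diluent_volume} uL")
```

Each pair could then be turned into two transfer steps per well in an Opentrons protocol.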

Week 4 HW: Protein Design Part I

PART A:

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

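Assumptions: meat contains about 31 g of protein per 100 g, and an average amino acid weighs ~100 Da (100 g/mol).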


1. Protein content in 500 g of meat

$ 500 \,\text{g meat} \times \frac{31 \,\text{g protein}}{100 \,\text{g meat}} = 155 \,\text{g protein} $

2. Convert grams of protein to moles of amino acids

$ 155 \,\text{g} \times \frac{1 \,\text{mol}}{100 \,\text{g}} = 1.55 \,\text{mol amino acids} $

3. Convert moles to molecules

$ 1.55 \,\text{mol} \times 6.022 \times 10^{23} \,\text{mol}^{-1} = 9.33 \times 10^{23} \,\text{molecules} $

Final Answer

$ \boxed{9 \times 10^{23} \text{ amino acid molecules}} $
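
The same arithmetic can be reproduced in a few lines of Python. This is only a check of the calculation above, using the same assumed protein content (31 g per 100 g) and average amino acid mass (100 g/mol).

```python
AVOGADRO = 6.022e23  # molecules per mole, as used in the calculation above

meat_mass_g = 500
protein_fraction = 0.31          # assumed protein content of meat
amino_acid_molar_mass = 100      # g/mol, i.e. ~100 Da per amino acid

protein_mass_g = meat_mass_g * protein_fraction
moles_amino_acids = protein_mass_g / amino_acid_molar_mass
molecules = moles_amino_acids * AVOGADRO

print(f"{protein_mass_g:.0f} g protein, {moles_amino_acids:.2f} mol, {molecules:.2e} molecules")
```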

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Humans need to feed on beef, fish and other nutrients to obtain energy and raw materials. Even though we consume proteins and nucleic acids that were originally built according to another organism’s DNA, digestion breaks them down into basic biomolecules, namely amino acids and nucleotides. Our cells then use our DNA to reassemble those building blocks according to human genetic instructions, not those of a cow or a fish.

3. Why are there only 20 natural amino acids?

Around 500 amino acids are known in nature, but only 20 of them are encoded by the standard genetic code and used to build human proteins.

5. Where did amino acids come from before enzymes that make them, and before life started?

Amino acids likely formed through prebiotic chemical reactions before life emerged. Experimental evidence suggests they could have been synthesized under early Earth atmospheric conditions, through energy sources such as lightning, volcanic activity, and hydrothermal systems rich in sulfur compounds. In this way, simple inorganic molecules, combined with an energy input, could generate organic building blocks like amino acids without the need for enzymes.

Several scientists have tried to answer this question and, surprisingly, found that amino acids could have been synthesized abiotically under the early atmospheric conditions and in the sulfur-rich sea (Farias-Rico and Mourra-Diaz, 2022).

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Because D-amino acids are the mirror image of L-amino acids, the energetically favorable backbone torsion angles are also inverted. As a result, the most stable α-helix formed by D-amino acids is left-handed.

7. Can you discover additional helices in proteins?

Yes. In fact, some studies have identified different forms of the α-helix in globular proteins, namely linear, curved or kinked (Kumar & Bansal, 1998). Additionally, there are $3_{10}$ and $\pi$ helices, which are less favourable in terms of stability but still occur (Kumar et al., 2022).

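8. Why are most molecular helices right-handed?

This is because the amino acids in natural proteins are L-enantiomers, for which the right-handed α-helix is the energetically favorable conformation.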

9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation

β-sheets tend to aggregate because exposed backbone hydrogen bond donors and acceptors allow β-strands from different molecules to bind to each other. Hydrophobic side chains further stabilize these interactions, making sheet stacking energetically favorable. Aggregation is driven mainly by intermolecular hydrogen bonding and hydrophobic interactions. These forces lower the system’s free energy and promote ordered β-sheet assembly, similar to crystallization. (Chen et al., 2017)

10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Many amyloid diseases involve β-sheet structures because the cross-β arrangement is highly stable and self-propagating. This stability allows misfolded proteins to accumulate as insoluble fibrils that disrupt normal cellular function (Chen et al., 2017). Despite their toxic effects on health, there is increasing research on their development as materials for several applications (Yadav et al., 2024).

PART B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

In this case I chose CotA, the laccase from Bacillus subtilis, an enzyme shown to have multiple applications, ranging from bioremediation to dye breakdown. It belongs to the family of multicopper oxidases (MCOs), which can oxidize a wide range of chemical compounds. Since one of my project ideas was to determine the optimal laccase concentration for wastewater treatment in textile factories, I thought it would be a great start to get used to the different tools we have for protein design.

2. Identify the amino acid sequence of your protein.

The sequence is:

\ \ MTLEKFVDALPIPDTLKPVQQSKEKTYYEVTMEECTHQLHRDLPPTRLWGYNGLFPGPTIEVKRNENVYVKWMNNLPSTHFLPIDHTIHHSDSQHEEPEVKTVVHLHGGVTPDDSDGYPEAWFSKDFEQTGPYFKREVYHYPNQQRGAILWYHDHAMALTRLNVYAGLVGAYIIHDPKEKRLKLPSDEYDVPLLITDRTINEDGSLFYPSAPENPSPSLPNPSIVPAFCGETILVNGKVWPYLEVEPRKYRFRVINASNTRTYNLSLDNGGDFIQIGSDGGLLPRSVKLNSFSLAPAERYDIIIDFTAYEGESIILANSAGCGGDVNPETDANIMQFRVTKPLAQKDESRKPKYLASYPSVQHERIQNIRTLKLAGTQDEYGRPVLLLNNKRWHDPVTETPKVGTTEIWSIINPTRGTHPIHLHLVSFRVLDRRPFDIARYQESGELSYTGPAVPPPPSEKGWKDTIQAHAGEVLRIAATFGPYSGRYVWHCHILEHEDYDMMRPMDITDPHK

Using the code provided, the most common amino acid is proline (P), which appears 46 times.
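
The course-provided code is not reproduced here, but a residue count of this kind can be sketched with `collections.Counter`. The short sequence below is a hypothetical toy example, not the CotA sequence; pasting the full sequence into `toy_sequence` would give the real counts.

```python
from collections import Counter


def most_common_residue(sequence):
    """Return (residue, count) for the most frequent one-letter code."""
    counts = Counter(sequence.upper())
    return counts.most_common(1)[0]


# Hypothetical toy sequence used only to illustrate the call.
toy_sequence = "MPPKLP"
residue, count = most_common_residue(toy_sequence)
print(residue, count)
```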

Some additional facts were obtained from UniProt:

  • Length (Number of AAs): 513
  • Molecular mass: 58.5 kDa
  • Family: Multicopper Oxidase
  • Number of homologues: 242

3. Identify the structure page of your protein in RCSB

The protein’s structure was solved in 2003 by Francisco J. Enguita and collaborators, with a resolution of 1.70 Å. There are 4 ligands: C2O, GOL, C1O and CU. It belongs to the Laccase family.

4. Open the structure of your protein in any 3D molecule visualization software

The following images of the protein were obtained from PyMOL:


Main structure

Figure 1. Structural representations of the laccase protein generated in PyMOL: cartoon, ribbon, and balls-and-sticks representations.

Figure 1 presents different structural representations of CotA. The cartoon view highlights its compact, globular fold dominated by β-sheets and connecting loops, characteristic of bacterial laccases, while the ribbon representation emphasizes the backbone organization and overall topology. The balls-and-sticks model displays all atoms explicitly, revealing dense atomic packing and the presence of copper ions within the protein core, coordinated by conserved histidine residues typical of multicopper oxidase active sites.


Secondary structures

Figure 2. Secondary structures and residue groups.

The surface of CotA was colored according to residue type: hydrophobic residues in yellow, polar uncharged in cyan, positively charged in blue, and negatively charged in red. The visualization shows a typical globular organization, with hydrophobic residues mainly buried in the protein core, contributing to structural stability, and polar and charged residues predominantly exposed on the surface, supporting solubility and potential functional interactions. The distribution is consistent with CotA’s role as a multicopper oxidase.

PART C: Using ML-Based Protein Design Tools

1. Protein Language Modeling

The selected protein was $\beta$-lactoglobulin, which is the main component of whey protein.

Sequence

\ \ 3NPO_1|Chain A|Beta-lactoglobulin|Bos taurus (9913) LIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI

Deep Mutational Scans

Figure 3. Deep Mutational Scans.

A few mutational hotspots were identified, particularly around positions 17, 63, 142 and 158. These sites show a tendency toward substitution to Leucine (L) and Glutamic Acid (E). Given that these residues appear to be located on the outer region of the protein, this pattern may reflect structural permissiveness. Substitution toward Leucine could enhance local hydrophobic packing, whereas Glutamic Acid, being negatively charged, may be well tolerated on the protein surface. This suggests that these positions are structurally flexible and can accommodate both hydrophobic and charged residues without significantly disrupting the overall fold.

Latent Space Analysis

Figure 4. Latent Space Analysis.

2. Protein Folding

Figure 5. Structure obtained.

Due to issues with the Colab Notebook, I was not able to provide an image with the different mutations for the protein. However, I was able to obtain the amino acid probabilities, which may help me in the future to guess the most likely mutations and try folding the protein again.

Figure 6. Amino acid probabilities.

PART D: Group Brainstorm on Bacteriophage Engineering

Group Members: Alvaro Pacheco (Lima, PE) and Renzo Condori (Lima, PE)

Goals

  • Perform mutagenesis on the LS-motif to enhance stability
  • Modify the promoter region of the DNA sequence to express larger amounts of the MS2 protein

Tools

  • AlphaFold: Predicts the 3D structure of the mutant variants. It makes it possible to evaluate whether the mutations maintain the transmembrane topology and overall conformation, and to verify that the functional LS motif keeps its orientation and stability.

  • FoldX/Rosetta: Estimates the change in free energy caused by each mutation. This eases the selection of mutant variants that are more likely to be thermodynamically stable, reducing the number of candidates.

  • GROMACS: Simulates and analyzes the protein's stability in a bacterial environment.

These tools complement each other by combining evolutionary, structural, and physical insights to improve MS2 lysis protein stability. Protein Language Models suggest mutations consistent with evolutionary constraints, AlphaFold2 screens for preserved structure and topology, energy calculations prioritize stabilizing variants, and molecular dynamics simulations test behavior in membrane conditions. Together, they enable rational design of stabilizing mutations while reducing the risk of impairing lytic function.

Pitfalls

One potential pitfall is the limited accuracy of current prediction tools for small, membrane-associated, and partially disordered proteins, which may reduce the reliability of both structural and energetic predictions. A second limitation is the trade-off between stability and function: increasing stability may reduce the conformational flexibility required for interaction with the membrane target, potentially impairing lytic activity.

Pipeline

Procedure

  1. Functional Analysis and Definition of Constraints: The conserved transmembrane motif Leu48–Ser49 (LS) and its immediate surrounding region will be identified as critical functional regions that must not be mutated.

  2. Directed In Silico Mutagenesis: Mutant variants will be generated using protein language models, restricting mutations to the remainder of the sequence.

  3. Preliminary Energy Filtering: Variants will be evaluated through ΔΔG stability predictions, selecting those with improved thermodynamic stability.

  4. Structural Prediction: Selected variants will be modeled using AlphaFold2 to verify preservation of the transmembrane domain and overall conformation.

  5. Dynamic Validation in a Membrane Environment: Top candidates will be evaluated through molecular dynamics simulations in a bacterial membrane environment.

  6. Final Variant Selection: Mutants showing the best balance between enhanced stability and functional conservation will be selected.
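
The constraint in the first two steps (mutate anywhere except the protected motif) can be sketched as a simple enumeration. This is an illustrative assumption of how candidates might be listed before scoring, not the group's actual implementation; the four-residue sequence is a hypothetical toy, and for the real L-protein the protected set would contain positions 48 and 49 plus the chosen flanking residues.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(sequence, protected_positions):
    """List (label, mutant_sequence) for every substitution outside protected positions.

    Positions are 1-based; labels follow the usual wild-type/position/mutant form.
    """
    protected = set(protected_positions)
    mutants = []
    for index, wild_type in enumerate(sequence):
        position = index + 1
        if position in protected:
            continue  # never touch the conserved motif
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                mutant = sequence[:index] + residue + sequence[index + 1:]
                mutants.append((f"{wild_type}{position}{residue}", mutant))
    return mutants


# Hypothetical toy sequence with an "LS" motif at positions 2-3.
toy_sequence = "MLSK"
toy_mutants = single_point_mutants(toy_sequence, protected_positions={2, 3})
print(len(toy_mutants), toy_mutants[0])
```

The resulting list could then be passed to the language-model scoring, ΔΔG filtering and structure prediction steps in that order.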


Week 5 HW: Protein Design Part II

PART A: SOD1 Binder Peptide Design

The sequence for the original protein is:

// sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2 MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

The mutation occurs at residue 4 (A4V, numbered without the initiator methionine): alanine becomes valine.

// 1UXM_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|SUPEROXIDE DISMUTASE [CU-ZN]|HOMO SAPIENS (9606) ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVS IEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
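
Because the A4V numbering skips the initiator methionine, it is easy to mutate the wrong residue. The helper below is a small sketch (the function name and `offset` parameter are illustrative, not from any library) that applies a point mutation and checks the wild-type residue first. Only the first ten residues of the UniProt sequence above are used, for brevity.

```python
def apply_mutation(sequence, wild_type, position, mutant, offset=0):
    """Return `sequence` with `wild_type` replaced by `mutant` at 1-based `position`.

    `offset` is the number of leading residues the numbering skips
    (1 when the sequence still carries the initiator methionine).
    """
    index = position - 1 + offset
    if sequence[index] != wild_type:
        raise ValueError(
            f"Expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Prefix of the P00441 sequence quoted above (includes the initiator Met).
sod1_prefix = "MATKAVCVLK"
a4v_prefix = apply_mutation(sod1_prefix, "A", 4, "V", offset=1)
print(a4v_prefix)  # MATKVVCVLK
```

The mutated prefix matches the start of the 1UXM chain once its missing methionine is added back.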

Part 1: Generate Binders with PepMLM

After adding an N-terminal methionine to the mutated sequence, the generated peptides were:

| No. | Binder | Perplexity |
| --- | --- | --- |
| 1 | WRYYVAVVRWKK | 28.003177 |
| 2 | WRYYAAVLEWKE | 16.368043 |
| 3 | WHSYAVVLEWWK | 19.985691 |
| 4 | WLSGPVAVEWKK | 11.674838 |

Compared with the known binder, FLYRWLPSRRGG, all of them are quite different. This might be due to the input parameters of the code, as well as the fact that the generation is stochastic, which contributes to a higher degree of perplexity.
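
One simple way to put a number on "quite different" is position-wise identity between each generated peptide and the known binder. The sketch below assumes an ungapped comparison of equal-length peptides, which is a simplification rather than a proper alignment.

```python
def positional_identity(peptide, reference):
    """Return (matches, fraction) for two equal-length peptides, compared position by position."""
    if len(peptide) != len(reference):
        raise ValueError("Peptides must have the same length for an ungapped comparison")
    matches = sum(a == b for a, b in zip(peptide, reference))
    return matches, matches / len(reference)


reference = "FLYRWLPSRRGG"
generated = ["WRYYVAVVRWKK", "WRYYAAVLEWKE", "WHSYAVVLEWWK", "WLSGPVAVEWKK"]

identities = {peptide: positional_identity(peptide, reference) for peptide in generated}
for peptide, (matches, fraction) in identities.items():
    print(f"{peptide}: {matches}/12 identical positions ({fraction:.0%})")
```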

Part 2: Evaluate Binders with AlphaFold3

The following figures show how the SOD1 protein interacts with each of the four binders:

Figure 1. Visual representation of docking between different mutated binders and SOD1 protein.

The website also provided the following scores:

| No. | Peptide Sequence | ipTM score | pTM score |
| --- | --- | --- | --- |
| 1 | WRYYVAVVRWKK | 0.78 | 0.85 |
| 2 | WRYYAAVLEWKE | 0.64 | 0.77 |
| 3 | WHSYAVVLEWWK | 0.74 | 0.83 |
| 4 | WLSGPVAVEWKK | 0.73 | 0.83 |

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

| No. | Peptide Sequence | Solubility | Hemolysis | Binding Affinity | Molecular Weight | Net charge @ pH = 7 | Isoelectric Point | Hydrophobicity (GRAVY) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | WRYYVAVVRWKK | Soluble | Non-hemolytic (0.066) | Medium (7.377 pKd/pKi) | 1654.0 Da | 3.76 | 10.45 | -0.57 |
| 2 | WRYYAAVLEWKE | Soluble | Non-hemolytic (0.057) | Weak binding (6.837 pKd/pKi) | 1613.8 Da | -0.23 | 6.28 | -0.68 |
| 3 | WHSYAVVLEWWK | Soluble | Non-hemolytic (0.090) | Weak binding (6.133 pKd/pKi) | 1603.8 Da | -0.15 | 6.75 | -0.12 |
| 4 | WLSGPVAVEWKK | Soluble | Non-hemolytic (0.039) | Weak binding (5.301 pKd/pKi) | 1399.6 Da | 0.76 | 8.59 | -0.16 |
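
The GRAVY column can be cross-checked by hand, since it is the mean hydropathy of the residues. The sketch below assumes the standard Kyte-Doolittle scale; how PeptiVerse computes the value internally is not stated in the source, so this is only an independent check.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


for peptide in ["WRYYVAVVRWKK", "WLSGPVAVEWKK"]:
    print(peptide, round(gravy(peptide), 2))
```

For peptides 1 and 4 this gives -0.57 and -0.16, in agreement with the table.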

Part 4: Generate Optimized Peptides with moPPIt

The generated binders are shown in the following table:

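| Binder | Hemolysis | Solubility | Half-Life | Affinity | Motif | Specificity |
| --- | --- | --- | --- | --- | --- | --- |
| EAVEGLTAEQIW | 0.95 | 0.5 | 9.92 | 5.91 | 0.34 | 0.61 |
| WIIWVTTTKAQK | 0.94 | 0.5 | 5.66 | 5.77 | 0.77 | 0.67 |
| ITLDEWLKKQCY | 0.88 | 0.67 | 14.46 | 7.16 | 0.78 | 0.57 |
| TDEQKVQLSAYW | 0.84 | 0.67 | 5.64 | 6.41 | 0.66 | 0.53 |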

PART C: Final Project: L-Protein Mutants

The programs were run and the heatmap along with some mutation candidates were obtained:

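| Amino Acid | Position in Protein | Amino Acid in Protein Sequence | Score |
| --- | --- | --- | --- |
| L | 50 | K | 2.56 |
| R | 29 | C | 2.39 |
| L | 39 | Y | 2.24 |
| S | 29 | C | 2.04 |
| Q | 9 | S | 2.01 |
| Q | 29 | C | 1.99 |
| P | 29 | C | 1.97 |
| L | 29 | C | 1.96 |
| I | 50 | K | 1.92 |
| L | 53 | N | 1.86 |

Figure 2. Amino acid-mutation predictions.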

Week 6 HW: Genetic Circuits Part I

PART 1: Protocol questions

  1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The mastermix contains:

  • Phusion Hi-Fi DNA Polymerase: Extends the primers to synthesize the new DNA strands; its proofreading activity keeps the error rate low.
  • Deoxynucleotides (dNTPs): The building blocks necessary for replicating DNA fragments.
  • Buffer including MgCl2: Maintains a stable pH for the enzyme and supplies Mg²⁺, a cofactor the polymerase requires.

  2. What are some factors that determine primer annealing temperature during PCR?

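Some factors include the primer length, its GC content and the resulting melting temperature (Tm), as well as the salt and primer concentrations in the reaction.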
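
To illustrate how length and base composition feed into the melting temperature, the sketch below applies the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in °C. This is a rough rule of thumb for short oligos and not the calculation recommended for Phusion reactions; the primer sequence is a hypothetical example.

```python
def wallace_tm(primer):
    """Estimate Tm (in deg C) with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("Primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


# Hypothetical 14-nt primer used only for illustration.
example_primer = "ATGCGTACGTTAGC"
print(wallace_tm(example_primer), gc_fraction(example_primer))
```

Longer or more GC-rich primers give a higher estimate, which is why both factors shift the annealing temperature.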

  3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

| Category | PCR | Restriction Enzymes |
| --- | --- | --- |
| Protocol | | |

  4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

  5. How does the plasmid DNA enter the E. coli cells during transformation?

The plasmid DNA enters the cells through a process called electroporation where, by means of an externally applied voltage, the membrane permeability increases, allowing the plasmid to enter the bacterial cytosol.

  6. Describe another assembly method in detail (such as Golden Gate Assembly). 6.1. Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online). 6.2. Model this assembly method with Benchling or Asimov Kernel!

Week 7 HW: Genetic Circuits Part II

PART 1: IANNs

  1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

  2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

  3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

PART 2: Fungal Materials

  1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

  2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Week 9 HW: Cell-Free Systems

  1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Main advantages (flexibility & control):

  • Open system: You can directly add/remove components (DNA, cofactors, salts, inhibitors).
  • Precise control: You can tune Mg²⁺, ATP, amino acids, etc.
  • Rapid expression: No need for cloning → transformation → growth.
  • Toxic proteins: You can express proteins that would normally kill cells.

When CFPS is better than in vivo:

  • Producing toxic proteins (e.g., antimicrobial peptides)
  • Studying protein variants quickly (high-throughput screening, mutant libraries)
  • Incorporating non-natural amino acids
  • Expressing membrane proteins without worrying about cell viability

  2. Describe the main components of a cell-free expression system and explain the role of each component.

  • Cell extract (lysate): Contains ribosomes, tRNAs, enzymes
  • DNA or mRNA template: The blueprint for your protein
  • Amino acids: Building blocks for protein synthesis
  • Energy system (ATP, GTP + regeneration system): Fuels translation
  • Salts (Mg²⁺, K⁺): Maintain ribosome stability and activity
  • Cofactors (NAD⁺, CoA, etc.): Support metabolic reactions
  • Enzymes (optional): Folding, disulfide bond formation, etc.

  3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Protein synthesis is an ATP-burning monster. If ATP runs out, your system basically gives up and stares into the void. Without regeneration, translation stops quickly and yield drops dramatically.

For instance, phosphoenolpyruvate (PEP) with pyruvate kinase, or creatine phosphate with creatine kinase, can serve as secondary energy sources, since these regenerate ATP from ADP via substrate-level phosphorylation.

  4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic systems (e.g., E. coli extract):

  • Fast, cheap, high yield
  • Poor at post-translational modifications

Use case:

Produce enzymes like β-galactosidase → no complex folding/modifications needed

Eukaryotic systems (e.g., wheat germ, insect, mammalian extracts):

  • Slower, expensive
  • Can do folding, disulfide bonds, glycosylation (depending on system)

Use case:

Produce antibodies or glycoproteins → need proper folding and modifications

  5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Challenges include aggregation, misfolding and insolubility. The strategies might include:

  • Add detergents
  • Use liposomes or nanodiscs to mimic membranes
  • Optimize Mg²⁺ and chaperones
  • Lower temperature to improve folding

A design idea would be:

CFPS + nanodiscs + chaperones → allows co-translational insertion into a membrane-like environment

  6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

  • Poor DNA template quality

Problem: degraded DNA or bad promoter. Fix: use fresh plasmid, a stronger promoter (e.g., T7), or optimized codons.

  • Energy depletion

Problem: ATP runs out. Fix: improve the regeneration system (PEP, glucose system).

  • Protein misfolding or degradation

Problem: aggregates or proteolysis. Fix: add chaperones, reduce temperature, include protease inhibitors.

  • Suboptimal ion concentrations

Problem: Mg²⁺/K⁺ imbalance kills ribosome activity. Fix: optimize salt concentrations experimentally.
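
Optimizing salt concentrations experimentally usually means a small titration grid. The sketch below lists every Mg²⁺/K⁺ combination so the conditions can be assigned to wells; the concentration values are hypothetical starting points chosen for illustration, not recommendations from the source.

```python
from itertools import product


def titration_grid(mg_mM, k_mM):
    """Return one condition per Mg/K combination, in plate-filling order."""
    return [{"Mg_mM": mg, "K_mM": k} for mg, k in product(mg_mM, k_mM)]


# Hypothetical concentration ranges in mM.
grid = titration_grid(mg_mM=[4, 8, 12, 16], k_mM=[50, 100, 150])
for well_index, condition in enumerate(grid, start=1):
    print(well_index, condition)
```

Measuring yield for each of the twelve conditions would show which combination to carry forward.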