PART A: 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Assumptions:
500 g of meat
~31 g of protein per 100 g of meat (British Nutrition Foundation, 2021)
Average amino acid mass ≈ 100 g/mol
Avogadro’s number = 6.022 × 10^23 molecules/mol
1. Protein content in 500 g of meat
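The estimate that follows from these assumptions can be sketched in a few lines of Python. This is only an illustration: the function name and its default arguments are hypothetical, and the numbers are the ones listed in the assumptions (31 g of protein per 100 g of meat, 100 g/mol per amino acid).

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g, protein_g_per_100g=31.0, aa_molar_mass=100.0):
    """Estimate how many amino acid molecules a piece of meat contains."""
    protein_g = meat_g * protein_g_per_100g / 100.0  # grams of protein in the meat
    moles = protein_g / aa_molar_mass                # moles of amino acids
    return moles * AVOGADRO                          # number of molecules


n_molecules = amino_acid_molecules(500)
print(f"{n_molecules:.2e} amino acid molecules")
```

With these inputs, 500 g of meat holds 155 g of protein, or 1.55 mol of amino acids, which is on the order of 10^24 molecules.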
PART A: SOD1 Binder Peptide Design The sequence for the original protein is:
// sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
The mutation (A4V) occurs at residue 4, counted without the initiator methionine: Alanine becomes Valine.
// 1UXM_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|SUPEROXIDE DISMUTASE [CU-ZN]|HOMO SAPIENS (9606)
ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVS
IEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Part 1: Generate Binders with PepMLM
PART 1: Protocol questions
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
The master mix contains:
Phusion Hi-Fi DNA Polymerase: It is crucial for completing the amplicons generated during PCR.
Deoxynucleotides: The building blocks necessary for replicating DNA fragments.
Buffer including MgCl2: Prevents enzyme denaturation by maintaining pH at a fixed level.
What are some factors that determine primer annealing temperature during PCR?
Some factors include the primer length
PART 1: IANNs What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Main advantages (flexibility & control):
Open system: You can directly add/remove components (DNA, cofactors, salts, inhibitors).
Precise control: You can tune Mg²⁺, ATP, amino acids, etc.
Rapid expression: No need for cloning → transformation → growth.
Toxic proteins: You can express proteins that would normally kill cells.
Subsections of Homework
Week 1 HW: Principles and Practices
Scale-up of nanocapsules for drug delivery using bacteria as ferritin manufacturers
1. Describe a biological engineering application or tool you want to develop and why.
Biologics are drugs synthesized by living organisms, and they have gained prominence over the years (Walsh, 2018). Cancer drugs and vaccines are some of the achievements scientists have accomplished with biotechnology. This is a novel area with increasing knowledge and many applications. Currently, iron deficiency is one of the main global issues affecting overall health (Lee et al., 2025). This project aims to develop a drug delivery system using bacteria-made ferritin, given the popularity and extended use of these microorganisms over the years for drug manufacturing (Kulkarni, 2026).
As I entered my senior year in university, I wanted to work on topics related to drug development or delivery, and on scaling that process up, which is why I was doing some research for my bachelor’s thesis. Life took me down a different path, but here I am, trying to learn and see how far I can take this idea.
2. Describe one or more governance policy goals related to ensuring this application contributes to an ethical future & prevents harm.
Ensure quality in scaled production: Ensure that the biologic product complies with Good Manufacturing Practices (GMP).
Uphold non-maleficence in biomanufacturing: Avoid harmful use of bioengineered bacteria.
Foster and promote innovation and global access: Enable technology transfer to low-resource settings.
3. Describe at least three different potential governance actions by considering the purpose, design, assumptions, and risks of failures & “success”.
Below is a table with the three main governance actions:
Table 1. Governance actions
| Governance Action | Purpose | Design | Assumptions | Risks & Failures |
| --- | --- | --- | --- | --- |
| Biologics safeguards | Create biocontainment for bacteria as biological hazards | Entities such as FDA | The safeguards will be effective | Overlooking the safeguards may affect their effectiveness |
| Standardized GMPs | Elaborate guidelines for safe production | QA department staff | Fast implementation and adaptation by companies | High associated costs may create manufacturing monopolies |
| Traceability of biological product | Avoid misuse of the biologic | Molecular signatures | Traceability methods are robust | Mutations in microorganisms may render signatures ineffective |
4. Score each of your governance actions against your rubric of policy goals
Table 2. Scoring
| Does the option: | Biologics safeguards | Standardized GMPs | Traceability of biological product |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 1 | 1 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 1 | 2 |
| • By helping respond | 2 | 2 | 2 |
| Protect the environment | | | |
| • By preventing incidents | 1 | 2 | 2 |
| • By helping respond | 2 | 2 | 2 |
| Other considerations | | | |
| • Minimizing costs and burdens | 3 | 2 | 2 |
| • Feasibility | 2 | 2 | 2 |
| • Not impede research | 2 | 3 | 2 |
| • Promote constructive applications | 2 | 1 | 2 |
| TOTAL SCORE | 18 | 18 | 19 |
5. Based on scores, describe which governance option or combination of options you would prioritize, and why.
After reviewing the different options and their scores, the most reasonable combination of options to prioritize would be Standardized GMPs and Traceability of Biological Products. The former is selected due to its strong impact on both productivity and product quality, as well as its capacity to establish clear guidelines that ensure biological safety for both production staff and consumers. The latter is essential because traceability enables the identification of errors and deviations throughout the production process, allowing them to be corrected in a timely manner and ensuring the ethical and responsible use of this technology.
Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.
The main aspect to be measured is the expression and activity of the biosynthetic gene cluster (BGC). This includes:
Presence and expression of BGC-associated enzymes
Production of candidate metabolites
Antibacterial activity against Leptospira
Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
BGC enzyme expression: Measured using Western blot to confirm protein presence and approximate expression levels.
Metabolite production: Measured using LC-MS to detect and quantify candidate compounds produced by the BGC.
Antibacterial activity: Evaluated through antibiograms to assess inhibition of Leptospira growth.
What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.
Western blot: To detect specific proteins encoded by the BGC after separation by gel electrophoresis.
Gel electrophoresis: For protein separation prior to blotting.
LC-MS (Liquid Chromatography–Mass Spectrometry): Main analytical technique to identify and quantify metabolites based on retention time and mass-to-charge ratio.
Antibiogram assays: To determine the antibacterial effectiveness of produced compounds.
PART 1
Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/
Based on the website provided, the molecular weight is 26.94 kDa, which does not include the linker and His tag. Including them, the MW is approximately 28 kDa. The former value is consistent with other GFPs in databases, such as Q9U6Y4 (26.17 kDa), P42212 (26.89 kDa) and Q9GZ28 (25.91 kDa).
Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/
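The expression is:
$
z = \frac{(m/z)_{n+1}}{(m/z)_n - (m/z)_{n+1}}
$
where $(m/z)_n$ is the peak carrying $z$ charges and $(m/z)_{n+1}$ is the adjacent peak at lower $m/z$, carrying $z+1$ charges.
Then, considering adjacent peaks such as 824.0635 and 800.5508, $z$ is equal to 34.047, which corresponds to a charge state of 34 for the peak at 824.0635.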
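The same estimate can be sketched in Python. Unlike the expression above, this sketch also subtracts the proton mass from the lower peak, which is an added refinement and not part of the original calculation; the function names are hypothetical and the two peaks are the ones quoted above.

```python
PROTON_MASS = 1.007276  # Da


def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge of the peak at mz_high, given the neighbouring peak at mz_low
    that carries exactly one more proton."""
    return (mz_low - PROTON_MASS) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral mass of a species observed at m/z with z protons attached."""
    return z * (mz - PROTON_MASS)


z_estimate = charge_from_adjacent_peaks(824.0635, 800.5508)
z = round(z_estimate)               # charge states are integers
mass = neutral_mass(824.0635, z)    # in Da
print(f"z ≈ {z_estimate:.3f} -> {z}, neutral mass ≈ {mass / 1000:.2f} kDa")
```

Rounding to the nearest integer gives the charge state, and the resulting neutral mass of about 28 kDa agrees with the molecular weight calculated for eGFP with its linker and His tag.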
Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?
PART 2
Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?
When a protein unfolds, residues that were buried in the native fold become exposed to the solvent, so more basic sites are available to pick up protons during ionization. This increases the number of charges the protein can carry and, therefore, the number of charge-state peaks, which is why the denatured protein shows a broader spectrum, shifted toward lower m/z, than the native protein.
Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800? What is the charge state? How can you tell?
TBA
PART 3
How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).
MVS K GEELFTG VVPILVELDG DVNGH K FSVS GEGEGDATYG K LTL K FICTT G K LPVPWPTL VTTLTYGVQC FS R YPDHM K Q HDFF K SAMPE GYVQE R TIFF K DDGNY K T R A EV K FEGDTLV N R IEL K GIDF K EDGNILGH K LEYNYNSHNV YIMAD K Q K NG I K VNF K I R HN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALS K D PNE K R DHMVL LEFVTAAGIT LGMDELY K LE HHHHHH
Number of Lysines: 20
Number of Arginines: 6
How many peptides will be generated from tryptic digestion of eGFP?
Based on the website provided, tryptic digestion will generate 19 fragments.
Figure 1. 19 fragments cut using the Expasy PeptideMass website.
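The counts above can be cross-checked with a short in-silico digest. This is a sketch under stated assumptions: trypsin is modelled as cutting after every K or R except before P, with no missed cleavages, and the 500 Da cutoff is an assumed display filter (a guess at how the PeptideMass result was generated, not something stated in the homework). The residue masses are standard average values.

```python
import re

# eGFP construct (with linker and His tag) copied from the answer above;
# the spaces only set off the highlighted K/R residues and are stripped here.
EGFP = (
    "MVS K GEELFTG VVPILVELDG DVNGH K FSVS GEGEGDATYG K LTL K FICTT G K LPVPWPTL "
    "VTTLTYGVQC FS R YPDHM K Q HDFF K SAMPE GYVQE R TIFF K DDGNY K T R A EV K FEGDTLV "
    "N R IEL K GIDF K EDGNILGH K LEYNYNSHNV YIMAD K Q K NG I K VNF K I R HN IEDGSVQLAD "
    "HYQQNTPIGD GPVLLPDNHY LSTQSALS K D PNE K R DHMVL LEFVTAAGIT LGMDELY K LE HHHHHH"
).replace(" ", "")

# Average residue masses in Da; one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02
MASS_CUTOFF_DA = 500.0  # assumed display cutoff, not taken from the homework


def tryptic_peptides(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(peptide):
    """Average mass of a peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


peptides = tryptic_peptides(EGFP)
large = [p for p in peptides if peptide_mass(p) > MASS_CUTOFF_DA]
print("K:", EGFP.count("K"), "R:", EGFP.count("R"))
print("all peptides:", len(peptides), "above cutoff:", len(large))
```

Under these assumptions the complete digest contains more peptides than the 19 listed, because very short fragments such as single residues and dipeptides fall below the mass cutoff.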
Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.
Week 2 HW: DNA Read, Write, and Edit
HOMEWORK
Part 1: Benchling & In-silico Gel Art
Below are some screenshots from the steps followed to create a basic pattern:
Step 1: The sequence is imported from the webpage to Benchling.
Figures 1 and 2. Lambda DNA import process.
Step 2: The digest function is shown as a test with EcoRI as the chosen restriction enzyme.
Figures 3 and 4. DNA Gel using EcoRI.
Step 3: The process is repeated using enzymes requested in the homework, and the result is the following:
Figures 5 and 6. DNA pattern using homework enzymes.
Step 4: The process is repeated, now using different enzymes requested in the homework to create a different pattern. In this case, the pattern might look like a series of barcodes, using the enzymes shown in Figure 7. Since they cut at different sites, they create many short DNA fragments that scatter across the gel, giving the impression of a barcode. This enzyme behavior could perhaps be used as a biomarker, although this idea needs further development:
Figures 7 and 8. DNA Gel pattern using different enzymes.
Part 3: DNA Design Challenge
3.1. Choose your protein
In this case, I chose Translation Initiation Factor 3 (TIF-3), encoded by the gene infC, a relatively short protein that is involved in the translation process (Gutu et al., 2013; Arenz & Wilson, 2016). Modifying the structure of this protein could help combat antibiotic resistance.
According to UniProt, the amino acid sequence is the following:
3.2 and 3.3: Reverse translation and codon optimization
Going back to Benchling, the AA sequence was imported and the back translate function was used to obtain the DNA sequence, which is already codon-optimized, as shown in the figures below:
Figure 9. Codon optimization.
// DNA optimized sequence:
ATGAGTTTACGTGAAGCACTGGAAAAAGCGGAAGAAGCCGGTGTTGATCTGGTCGAAATCAGTCCTAATGCAGAACCCCCGGTGTGCCGTATCATGGACTATGGCAAATTCCTCTACGAGAAATCTAAAAGCTCAAAGGAACAAAAAAAGAAACAGAAGGTTATTCAGGTCAAAGAGATTAAGTTTCGACCGGGGACTGACGAAGGAGACTATCAAGTGAAACTTCGCTCCTTGATTCGCTTCCTGGAAGAGGGGGATAAAGCGAAAATTACCCTGCGCTTTCGCGGCAGAGAGATGGCCCACCAGCAGATCGGCATGGAAGTATTGAACCGTGTGAAAGATGACTTACTGCGTCGCACGGGTAGCGGTCGTATACTGCCAAACGAGGATCGCCGGCCGCCGGATGATCATGGCGCTCGGTCG
The organism selected for this protein is E. coli due to its wide use in biotechnology. The codons had to be optimized because codon usage and the translation machinery may differ from one bacterium to another. This means that a bacterium other than E. coli might express this protein at a different rate and level. In this case, the protein was obtained from E. coli according to UniProt, but since E. coli comprises different strains, codon optimization would still have to be performed.
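A quick sanity check on any back-translated sequence is to translate it again and confirm that it still encodes the intended protein. The sketch below uses the standard genetic code and, to keep it short, only the first 30 nucleotides of the optimized sequence above; the helper names are hypothetical and this is not the Benchling implementation.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 nt of the optimized sequence listed above.
prefix = "ATGAGTTTACGTGAAGCACTGGAAAAAGCG"
print(translate(prefix))
```

Running the same function on the full optimized sequence and comparing the result with the UniProt entry would confirm that optimization changed only the codons and not the protein.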
3.4 and 3.5: Production technologies and alignment
I would use host cells, since the chosen protein is from a bacterium and E. coli is a common microorganism used for these purposes. This technique has a much lower cost than using cell-free systems, in which all the cellular components have to be supplied.
The alignment is shown in the following figures
Figures 10, 11 and 12. Central Dogma for TIF-3.
Part 4: Prepare a Twist DNA Sequence Order
Step 1: DNA Sequence
The same DNA linear sequence was already obtained in Part no. 3.
Figure 13. DNA Sequence.
Step 2: Building the chassis
The parts were initially searched for in iGEM, but the website shut down. Due to this, the parts provided in the homework were used.
Figures 14 and 15. iGEM issue.
The chassis now looks like this:
Figure 16. Chassis.
Step 4: Ordering in Twist
The process is shown in the figures below:
Figure 17. Importing the sequence to Twist from Benchling.
Step 5: Creating the plasmid
The process is shown in the figures below:
Figures 18 and 19. Creating the plasmid.
Finally, the plasmid is shown below:
Figure 20. Final plasmid.
Part 5: DNA Read/Edit/Write
5.1.1: What DNA would you want to sequence (e.g., read) and why?
I would like to analyze DNA from insects such as flies, since many species act as vectors for infectious diseases. By sequencing their DNA, I could identify genetic elements associated with viral transmission, pathogen resistance, or susceptibility. This information could help improve disease monitoring and vector control strategies.
5.1.2: In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
I would use second-generation sequencing technologies such as Illumina sequencing due to their high throughput, accuracy, and cost-effectiveness. Illumina sequencing is particularly efficient for short DNA fragments and allows the parallel sequencing of millions of reads, making it ideal for large-scale genomic analysis of insect populations.
5.2.1: What DNA would you want to synthesize (e.g., write) and why?
I would like to synthesize bacterial DNA initially because bacterial genomes are less complex than eukaryotic genomes, which makes them more manageable in terms of cost and laboratory procedures. This would allow me to gain experience with gene design and expression systems before working with more complex organisms.
5.2.2: What technology or technologies would you use to perform this DNA synthesis and why?
I would use common routes such as solid-phase phosphoramidite chemical DNA synthesis combined with gene assembly techniques. These methods allow precise synthesis of short oligonucleotides, which can then be assembled into longer DNA constructs. This approach is widely used, reliable, and scalable for constructing bacterial genes or plasmids.
5.3.1: What DNA would you want to edit and why?
I would edit DNA from mammalian cells, focusing on genes involved in the immune response. By modifying specific regulatory or coding sequences, it may be possible to enhance resistance to infectious diseases or better understand the mechanisms underlying autoimmune disorders. However, such research would need to be conducted carefully and ethically due to the potential implications of editing mammalian genomes.
PRE-LECTURE NOTES
Homework questions from Prof. Jacobson
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?
The error rate of polymerase is approximately 1 in a million nucleotides. Considering the human genome length of approximately 3.2 billion base pairs, or 6.4 billion nucleotides in a diploid cell, there would be roughly 6,400 errors each time the diploid genome is copied, and half that number for a haploid genome. This implies a significant chance for defects or mutations to occur and potentially be passed down to offspring. However, biology has evolved multiple mechanisms that increase the fidelity of DNA replication. For instance, MutS-1 is a protein shown to bind to mismatched DNA sequences. This mechanism therefore acts as an additional layer that improves the fidelity of de novo DNA synthesis (Carr et al., 2004).
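The arithmetic behind these numbers is a single multiplication, sketched here in Python with the same inputs as the answer (an error rate of one per million nucleotides and a 3.2 billion base pair haploid genome); the names are hypothetical.

```python
ERROR_RATE = 1e-6            # errors per nucleotide copied
HAPLOID_GENOME_NT = 3.2e9    # nucleotides in one copy of the human genome


def expected_errors(nucleotides, error_rate=ERROR_RATE):
    """Expected number of copying errors for a given number of nucleotides."""
    return nucleotides * error_rate


haploid_errors = expected_errors(HAPLOID_GENOME_NT)
diploid_errors = expected_errors(2 * HAPLOID_GENOME_NT)
print(f"haploid: ~{haploid_errors:.0f} errors, diploid: ~{diploid_errors:.0f} errors")
```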
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
There are approximately $20^n$ ways to code a human protein, where $n$ represents the length of the protein. For instance, a typical protein may consist of 300 amino acids. Therefore, there are $20^{300}$ possible ways, which corresponds to an extremely large number of potential coding sequences (Alberts et al., 2002). Some of the reasons these codes do not work in practice include:
Codon usage bias: The prevalence of a codon is related to its translation efficiency; some codons are translated faster than others. This impacts protein expression levels and availability (Chakravarty, 2026).
Protein structure: Since proteins fold co-translationally, changes in codon usage can alter the timing of folding events, affecting protein structure and function (Moss et al., 2024).
Homework questions from Dr. LeProust
What’s the most commonly used method for oligo synthesis currently?
Currently, oligo synthesis is most commonly performed using phosphoramidite nucleosides as building blocks. This process consists of four main chemical reactions: detritylation, coupling, capping, and oxidation (Kosuri & Church, 2014).
Why is it difficult to make oligos longer than 200 nt via direct synthesis?
The main challenge in synthesizing long oligonucleotides using standard phosphoramidite chemistry lies in cumulative yield loss and error accumulation. Unwanted reactions, such as depurination during detritylation, and incomplete removal of protecting groups can leave gaps in the oligo backbone, reducing overall yield. In addition, single-base deletions are the predominant errors caused by inefficiencies during these reaction steps.
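The cumulative yield loss can be made concrete with a simple model in which every coupling step succeeds with the same probability, so the full-length fraction is that probability raised to the number of couplings. The 99.5% coupling efficiency below is an assumed, illustrative value, not a figure from the lecture.

```python
def full_length_yield(length_nt, coupling_efficiency):
    """Fraction of strands that reach full length after (length_nt - 1) couplings."""
    return coupling_efficiency ** (length_nt - 1)


ASSUMED_EFFICIENCY = 0.995  # illustrative per-step coupling efficiency

for length in (20, 100, 200, 300):
    fraction = full_length_yield(length, ASSUMED_EFFICIENCY)
    print(f"{length:>3} nt: {fraction:.1%} full-length product")
```

Even with a high per-step efficiency, the full-length fraction falls off exponentially with length, which is why direct synthesis becomes impractical for long oligos.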
Why can’t you make a 2000 bp gene via direct oligo synthesis?
Manufacturing an oligo of this length is highly prone to errors due to several factors. First, oligo concentrations obtained from a selected pool after processing are often quite low, reducing assembly efficiency. Second, when synthesizing large numbers of oligos, overlapping coding regions may introduce assembly errors at scale. Finally, significantly higher costs are required to produce the large number of strands necessary for successful gene assembly.
Homework question from George Church
Using Google & Prof. Church’s slide #4, what are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
The ten essential amino acids in most animals are:
Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and sometimes Cysteine or Tyrosine, depending on species-specific metabolic capabilities (Hou and Wu, 2018).
Understanding the Lysine Contingency as a bioengineered constraint, the dependence of animals on multiple essential amino acids further strengthens this strategy. This dependency enables researchers to implement safer in vivo containment systems, as organisms lacking access to these amino acids are unable to survive outside controlled environments (Shivni, 2023).
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., & Walter, P. (2002). Molecular Biology of the Cell (4th ed.). Garland Science. https://www.ncbi.nlm.nih.gov/books/NBK26830/
Assignment no. 1: Python Script for Opentrons Artwork
The following code was used to create a simple swirl pattern with four different colors:
from opentrons import types
import math

metadata = {
    'author': 'Jean Colmenares',
    'protocolName': 'Agar Swirl Pattern - 4 Colors',
    'description': 'Swirl pattern with four colors per branch',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

well_colors = {'A1': 'Red', 'B1': 'Green', 'C1': 'Orange', 'D1': 'Blue'}


def run(protocol):
    tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')
    pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])
    temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)
    temperature_plate = temperature_module.load_labware(
        'opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate')
    color_plate = temperature_plate
    agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
    center_location = agar_plate['A1'].top()
    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

    # ------------------------------------------------------------------
    # Helper functions
    # ------------------------------------------------------------------
    def location_of_color(color_string):
        for well, color in well_colors.items():
            if color.lower() == color_string.lower():
                return color_plate[well]
        raise ValueError(f"No well found with color {color_string}")

    def dispense_and_detach(pipette, volume, location):
        above_location = location.move(types.Point(z=location.point.z + 5))
        pipette.move_to(above_location)
        pipette.dispense(volume, location)
        pipette.move_to(above_location)

    # ------------------------------------------------------------------
    # SWIRL PATTERN - BIG + FIXED COLOR PER BRANCH (P20 SAFE)
    # ------------------------------------------------------------------
    DROP_VOLUME = 3
    branches = 4
    points_per_branch = 24
    radius_start = 3
    radius_step = 1.6
    angle_step = math.pi / 9
    branch_colors = ['Red', 'Green', 'Orange', 'Blue']

    for branch in range(branches):
        base_angle = branch * (2 * math.pi / branches)
        color = branch_colors[branch]
        source = location_of_color(color)
        for i in range(points_per_branch):
            pipette_20ul.pick_up_tip()
            pipette_20ul.aspirate(DROP_VOLUME, source.bottom(1))
            angle = base_angle + i * angle_step
            radius = radius_start + i * radius_step
            x = radius * math.cos(angle)
            y = radius * math.sin(angle)
            loc = center_location.move(types.Point(x=x, y=y, z=0))
            dispense_and_detach(pipette_20ul, DROP_VOLUME, loc)
            pipette_20ul.drop_tip()
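Before running a pattern like this on the robot, the geometry can be checked offline in plain Python, without the Opentrons library. The sketch below reuses the swirl parameters from the protocol; the 40 mm usable plate radius is an assumed value for illustration and should be replaced with the real dimension of the agar plate.

```python
import math

PLATE_RADIUS_MM = 40.0  # assumed usable radius of the agar plate (hypothetical)


def swirl_points(branches=4, points_per_branch=24, radius_start=3.0,
                 radius_step=1.6, angle_step=math.pi / 9):
    """Return the (x, y) offsets from the plate center for every drop."""
    points = []
    for branch in range(branches):
        base_angle = branch * (2 * math.pi / branches)
        for i in range(points_per_branch):
            angle = base_angle + i * angle_step
            radius = radius_start + i * radius_step
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


points = swirl_points()
max_radius = max(math.hypot(x, y) for x, y in points)
print(f"{len(points)} drops, outermost drop at {max_radius:.1f} mm from the center")
print("fits on plate:", max_radius <= PLATE_RADIUS_MM)
```

The check also makes the tip budget visible: the protocol picks up a fresh tip for every drop, so the number of drops is the number of tips consumed.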
The pattern is shown below:
Assignment no.2: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
Automated Strain Construction for Biosynthetic Pathway Screening in Yeast (Astolfi et al., 2025)
In this study, the researchers programmed a Hamilton Microlab VANTAGE liquid-handling robot (a high-end automation platform, not an Opentrons) to integrate with additional on- and off-deck hardware (e.g., thermocyclers, plate sealers, colony pickers) via its central arm. Together with custom software and a user interface developed in the Hamilton VENUS environment, this system automated key steps in yeast strain construction such as transformation setup, heat-shock, washing, and plating.
This automated workflow achieved a throughput of up to ~2,000 transformations per week, enabling high-throughput construction and screening of libraries of engineered yeast strains. As a proof of concept, the team applied the system to screen gene variants within a biosynthetic pathway for the plant alkaloid precursor verazine. They identified several genes that significantly increased pathway product titers, demonstrating the utility of automated strain construction for rapid pathway discovery and optimization.
Assignment no. 3: Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.
For my first idea, using a liquid-handling robot like the ones at Opentrons would be a crucial step in figuring out the right enzyme concentration for dye breakdown.
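One way to plan such an experiment is to compute a serial dilution of the enzyme first and then hand the volumes to the robot. The sketch below is pure Python with no robot calls; the stock concentration, dilution factor, number of steps, and well volume are all hypothetical placeholders, not values from the project.

```python
def serial_dilution(stock_conc, dilution_factor, steps, well_volume_ul):
    """Plan a serial dilution.

    Returns one (concentration, transfer_ul, diluent_ul) tuple per well, where
    transfer_ul comes from the previous well (or the stock for the first well).
    """
    transfer_ul = well_volume_ul / dilution_factor
    diluent_ul = well_volume_ul - transfer_ul
    plan = []
    conc = stock_conc
    for _ in range(steps):
        conc = conc / dilution_factor
        plan.append((conc, transfer_ul, diluent_ul))
    return plan


# Hypothetical example: two-fold dilutions of a 100 U/mL stock in 100 µL wells.
plan = serial_dilution(stock_conc=100.0, dilution_factor=2, steps=6, well_volume_ul=100.0)
for well, (conc, transfer, diluent) in enumerate(plan, start=1):
    print(f"well {well}: {conc:g} U/mL ({transfer:g} µL transfer + {diluent:g} µL diluent)")
```

Each tuple maps directly onto one aspirate/dispense pair, so the same plan could later drive a liquid-handling protocol and keep the dye-breakdown readout tied to a known enzyme concentration.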
Week 4 HW: Protein Design Part I
PART A:
1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Humans need to feed on beef, fish and other nutrients to obtain energy and raw materials. Even though we consume proteins and nucleic acids that were originally built according to another organism’s DNA, digestion breaks them down into basic biomolecules, namely amino acids and nucleotides. Our cells then use our DNA to reassemble those building blocks according to human genetic instructions, not those of a cow or a fish.
3. Why are there only 20 natural amino acids?
Around 500 amino acids are known, but only 20 are encoded by the standard genetic code and used to build human proteins.
5. Where did amino acids come from before enzymes that make them, and before life started?
Amino acids likely formed through prebiotic chemical reactions before life emerged. Experimental evidence suggests they could have been synthesized under early Earth atmospheric conditions, through energy sources such as lightning, volcanic activity, and hydrothermal systems rich in sulfur compounds. In this way, simple inorganic molecules, combined with energy input, could generate organic building blocks like amino acids without the need for enzymes.
Several scientists have tried to answer this question and, surprisingly, amino acids could have been synthesized abiotically under the early atmospheric conditions and in the sulfur-rich sea (Farias-Rico and Mourra-Diaz, 2022).
6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Because D-amino acids are the mirror image of L-amino acids, the energetically favorable backbone torsion angles are also inverted. As a result, the most stable α-helix formed by D-amino acids is left-handed.
7. Can you discover additional helices in proteins?
Yes. In fact, some studies have identified different forms of the alpha helix in globular proteins, namely linear, curved or kinked (Kumar & Bansal, 1998). Additionally, there are $3_{10}$ and $\pi$ helices, which are less favourable in terms of stability but still occur (Kumar et al., 2022).
8. Why are most molecular helices right-handed?
This is because the amino acids in natural proteins are L-oriented, and L-amino acids favor right-handed helices.
9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation
β-sheets tend to aggregate because exposed backbone hydrogen bond donors and acceptors allow β-strands from different molecules to bind to each other. Hydrophobic side chains further stabilize these interactions, making sheet stacking energetically favorable. Aggregation is driven mainly by intermolecular hydrogen bonding and hydrophobic interactions. These forces lower the system’s free energy and promote ordered β-sheet assembly, similar to crystallization.
(Chen et al., 2017)
10. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
Many amyloid diseases involve β-sheet structures because the cross-β arrangement is highly stable and self-propagating. This stability allows misfolded proteins to accumulate as insoluble fibrils that disrupt normal cellular function (Chen et al., 2017). Despite their toxic effects on health, there is increasing research into their development as materials for several applications (Yadav et al., 2024).
PART B: Protein Analysis and Visualization
1. Briefly describe the protein you selected and why you selected it.
In this case I chose the laccase CotA from Bacillus subtilis, an enzyme shown to have multiple applications, ranging from bioremediation to dye breakdown. It belongs to the family of multicopper oxidases (MCOs), which are capable of oxidizing a wide range of chemical compounds. Since one of my project ideas was to determine the optimal concentration of laccase for wastewater treatment in textile factories, I thought it would be a great start to get used to the different tools we have for protein design.
2. Identify the amino acid sequence of your protein.
Using the code provided, the most common amino acid is Proline (P), which appears 46 times.
Some additional facts were obtained from UniProt:
Length (Number of AAs): 513
Molecular mass: 58.5 kDa
Family: Multicopper Oxidase
Number of homologues: 242
3. Identify the structure page of your protein in RCSB
The protein’s structure was solved in 2003 by Francisco J. Enguita and collaborators, with a resolution of 1.70 Å. There are 4 ligands: C2O, GOL, C1O and CU.
It belongs to the Laccase family.
4. Open the structure of your protein in any 3D molecule visualization software
The following images of the protein were obtained from PyMOL:
Main structure
Figure 1. Structural representations of the laccase protein generated in PyMOL: cartoon, ribbon, and ball-and-stick representations.
Figure 1 presents different structural representations of CotA. The cartoon view highlights its compact, globular fold dominated by β-sheets and connecting loops, characteristic of bacterial laccases, while the ribbon representation emphasizes the backbone organization and overall topology. The ball-and-stick model displays all atoms explicitly, revealing dense atomic packing and the presence of copper ions within the protein core, coordinated by conserved histidine residues typical of multicopper oxidase active sites.
Secondary structures
Figure 2. Secondary structures and residue groups.
The surface of CotA was colored according to residue type: hydrophobic residues in yellow, polar uncharged in cyan, positively charged in blue, and negatively charged in red. The visualization shows a typical globular organization, with hydrophobic residues mainly buried in the protein core, contributing to structural stability, and polar and charged residues predominantly exposed on the surface, supporting solubility and potential functional interactions. The distribution is consistent with CotA’s role as a multicopper oxidase.
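A coloring scheme like the one described above can be written as a small PyMOL command script. The sketch below only builds the script text in plain Python; the assignment of residues to each group is an assumption (the text does not list which residues were placed in each class), and the `grp_` selection names are hypothetical.

```python
# Residue grouping is an assumption; the text only names the four classes
# and their colors, not the residues in each class.
RESIDUE_GROUPS = {
    "grp_hydrophobic": ("yellow", ["ALA", "VAL", "LEU", "ILE", "MET",
                                   "PHE", "TRP", "PRO", "GLY"]),
    "grp_polar": ("cyan", ["SER", "THR", "ASN", "GLN", "TYR", "CYS"]),
    "grp_positive": ("blue", ["LYS", "ARG", "HIS"]),
    "grp_negative": ("red", ["ASP", "GLU"]),
}


def build_pymol_script() -> str:
    """Return PyMOL commands that show the surface colored by residue class."""
    lines = ["hide everything", "show surface"]
    for name, (color, residues) in RESIDUE_GROUPS.items():
        # "resn A+B+C" selects residues by three-letter name.
        lines.append(f"select {name}, resn {'+'.join(residues)}")
        lines.append(f"color {color}, {name}")
    return "\n".join(lines)


print(build_pymol_script())
```

The printed commands can be pasted into the PyMOL command line after the structure has been loaded.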
PART C: Using ML-Based Protein Design Tools
1. Protein Language Modeling
The selected protein was β-lactoglobulin, the main component of whey protein.
A few mutational hotspots were identified, particularly around positions 17, 63, 142 and 158. These sites show a tendency toward substitution to Leucine (L) and Glutamic Acid (E). Given that these residues appear to be located on the outer region of the protein, this pattern may reflect structural permissiveness. Substitution toward Leucine could enhance local hydrophobic packing, whereas Glutamic Acid, being negatively charged, may be well tolerated on the protein surface. This suggests that these positions are structurally flexible and can accommodate both hydrophobic and charged residues without significantly disrupting the overall fold.
Latent Space Analysis
Figure 4. Latent Space Analysis.
2. Protein Folding
Figure 5. Structure obtained.
Due to issues with the Colab notebook, I could not produce an image showing the different mutations of the protein. However, I did obtain the amino acid probabilities, which may help me later to identify the most likely mutations and fold the protein again.
Figure 6. Amino acid probabilities.
PART D: Group Brainstorm on Bacteriophage Engineering
Group Members: Alvaro Pacheco (Lima, PE) and Renzo Condori (Lima, PE)
Goals
Perform mutagenesis on the LS-motif to enhance stability
Modify the promoting region of the DNA sequence to express larger amounts of the MS2 protein
Tools
-AlphaFold: Predicts the 3D structure of the mutant variants. It makes it possible to evaluate whether the mutations maintain the transmembrane topology and general conformation, and to verify that the functional LS motif keeps its orientation and stability.
-FoldX/Rosetta: Estimates the change in free energy (ΔΔG) caused by each mutation. This eases the selection of the variants most likely to provide thermodynamic stability, reducing the number of candidates.
-GROMACS: Simulates and analyzes protein stability in a bacterial environment.
These tools complement each other by combining evolutionary, structural, and physical insights to improve MS2 lysis protein stability. Protein Language Models suggest mutations consistent with evolutionary constraints, AlphaFold2 screens for preserved structure and topology, energy calculations prioritize stabilizing variants, and molecular dynamics simulations test behavior in membrane conditions. Together, they enable rational design of stabilizing mutations while reducing the risk of impairing lytic function.
Pitfalls
One potential pitfall is the limited accuracy of current prediction tools for small, membrane-associated, and partially disordered proteins, which may reduce the reliability of both structural and energetic predictions. A second limitation is the trade-off between stability and function: increasing stability may reduce the conformational flexibility required for interaction with the membrane target, potentially impairing lytic activity.
Pipeline
Procedure
Functional Analysis and Definition of Constraints
The conserved transmembrane motif Leu48–Ser49 (LS) and its immediate surrounding region will be identified as critical functional regions that must not be mutated.
Directed In Silico Mutagenesis
Mutant variants will be generated using protein language models, restricting mutations to the remainder of the sequence.
Preliminary Energy Filtering
Variants will be evaluated through ΔΔG stability predictions, selecting those with improved thermodynamic stability.
Structural Prediction
Selected variants will be modeled using AlphaFold2 to verify preservation of the transmembrane domain and overall conformation.
Dynamic Validation in a Membrane Environment
Top candidates will be evaluated through molecular dynamics simulations in a bacterial membrane environment.
Final Variant Selection
Mutants showing the best balance between enhanced stability and functional conservation will be selected.
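The first two steps of the procedure (fixing the constraints, then generating variants outside them) can be sketched as below. This is a minimal sketch under stated assumptions: it enumerates every single substitution rather than sampling from a protein language model, the 2-residue flank around the LS motif is a hypothetical choice for the "immediate surrounding region", and the function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def protected_positions(start: int, end: int, flank: int = 0) -> set[int]:
    """1-based positions of a motif plus `flank` residues on each side."""
    return set(range(max(1, start - flank), end + flank + 1))


def single_mutants(sequence: str, protected: set[int]):
    """Yield (label, mutant_sequence) for every allowed single substitution."""
    for index, wild_type in enumerate(sequence):
        position = index + 1  # 1-based numbering, as in the text
        if position in protected:
            continue  # never touch the constrained region
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue
            label = f"{wild_type}{position}{residue}"
            yield label, sequence[:index] + residue + sequence[index + 1:]


# LS motif at positions 48-49 (from the text); the flank of 2 is an assumption.
ls_region = protected_positions(48, 49, flank=2)
print(sorted(ls_region))
```

Each yielded variant would then go through the ΔΔG filter and structure prediction described in the following steps.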
Mutation occurs at residue 4: Alanine becomes Valine
// 1UXM_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|SUPEROXIDE DISMUTASE [CU-ZN]|HOMO SAPIENS (9606)
ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVS
IEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
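The substitution can be checked programmatically. The sketch below takes the sequence exactly as listed above (which already carries valine at position 4), reverts it to the wild type stated in the text (alanine at position 4), and re-applies the mutation; `apply_point_mutation` is a hypothetical helper, not part of any tool used in the homework.

```python
def apply_point_mutation(sequence: str, position: int,
                         wild_type: str, mutant: str) -> str:
    """Replace the residue at a 1-based position after checking its identity."""
    index = position - 1
    if sequence[index] != wild_type:
        raise ValueError(
            f"Expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Sequence as listed above (mutant form, V at position 4).
SOD1_A4V = (
    "ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVS"
    "IEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

# Revert to the wild type described in the text, then re-apply A4V.
wild_type_seq = apply_point_mutation(SOD1_A4V, 4, "V", "A")
mutant_seq = apply_point_mutation(wild_type_seq, 4, "A", "V")
print(wild_type_seq[:10], "->", mutant_seq[:10])
```

The identity check guards against off-by-one errors in residue numbering, which are easy to make when a leading methionine is added or removed.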
Part 1: Generate Binders with PepMLM
Adding Methionine to the mutated sequence, the generated peptides were:
| No. | Binder sequence | Perplexity |
| --- | --- | --- |
| 1 | WRYYVAVVRWKK | 28.003177 |
| 2 | WRYYAAVLEWKE | 16.368043 |
| 3 | WHSYAVVLEWWK | 19.985691 |
| 4 | WLSGPVAVEWKK | 11.674838 |
Compared with the reference binding peptide, FLYRWLPSRRGG, all of them are quite different. This might be due to the input parameters of the code, as well as the fact that the generation is random, which contributes to a higher degree of perplexity.
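The comparison with the reference peptide can be made quantitative with a position-by-position identity count. This is a minimal sketch using the sequences and perplexity values from the table; counting identical positions without alignment is an assumption that works here only because all peptides have the same length.

```python
REFERENCE = "FLYRWLPSRRGG"

# Binder sequence -> perplexity, copied from the table above.
BINDERS = {
    "WRYYVAVVRWKK": 28.003177,
    "WRYYAAVLEWKE": 16.368043,
    "WHSYAVVLEWWK": 19.985691,
    "WLSGPVAVEWKK": 11.674838,
}


def identical_positions(a: str, b: str) -> int:
    """Count positions where two equal-length peptides share a residue."""
    if len(a) != len(b):
        raise ValueError("Peptides must have the same length")
    return sum(x == y for x, y in zip(a, b))


identities = {seq: identical_positions(seq, REFERENCE) for seq in BINDERS}
lowest_perplexity = min(BINDERS, key=BINDERS.get)

for seq, perplexity in BINDERS.items():
    print(f"{seq}  identity={identities[seq]}/{len(REFERENCE)}  perplexity={perplexity:.2f}")
print("Lowest perplexity:", lowest_perplexity)
```

At most two of the twelve positions match the reference in any generated peptide, which supports the observation that the binders are quite different from it.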
Part 2: Evaluate Binders with AlphaFold3
The following figures show how the SOD1 protein interacts with each of the four binders:
Figure 1. Visual representation of docking between different mutated binders and SOD1 protein.
The website also provided the following scores:
| Peptide No. | Sequence | ipTM score | pTM score |
| --- | --- | --- | --- |
| 1 | WRYYVAVVRWKK | 0.78 | 0.85 |
| 2 | WRYYAAVLEWKE | 0.64 | 0.77 |
| 3 | WHSYAVVLEWWK | 0.74 | 0.83 |
| 4 | WLSGPVAVEWKK | 0.73 | 0.83 |
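To make the comparison explicit, the peptides can be ranked by their scores. The sketch below sorts by ipTM and uses pTM as a tie-breaker; that ordering criterion is an assumption for illustration, not a rule taken from AlphaFold3.

```python
# (sequence, ipTM, pTM), copied from the table above.
SCORES = [
    ("WRYYVAVVRWKK", 0.78, 0.85),
    ("WRYYAAVLEWKE", 0.64, 0.77),
    ("WHSYAVVLEWWK", 0.74, 0.83),
    ("WLSGPVAVEWKK", 0.73, 0.83),
]

# Higher ipTM first; pTM breaks ties (assumed criterion).
ranked = sorted(SCORES, key=lambda row: (row[1], row[2]), reverse=True)

for rank, (sequence, iptm, ptm) in enumerate(ranked, start=1):
    print(f"{rank}. {sequence}  ipTM={iptm:.2f}  pTM={ptm:.2f}")
```

Under this criterion peptide 1 ranks first and peptide 2 last.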
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
| Peptide No. | Sequence | Solubility | Hemolysis | Binding Affinity | Molecular Weight | Net charge @ pH = 7 | Isoelectric Point | Hydrophobicity (GRAVY) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | WRYYVAVVRWKK | Soluble | Non-hemolytic (0.066) | Medium (7.377 pKd/pKi) | 1654.0 Da | 3.76 | 10.45 | -0.57 |
| 2 | WRYYAAVLEWKE | Soluble | Non-hemolytic (0.057) | Weak binding (6.837 pKd/pKi) | 1613.8 Da | -0.23 | 6.28 | -0.68 |
| 3 | WHSYAVVLEWWK | Soluble | Non-hemolytic (0.090) | Weak binding (6.133 pKd/pKi) | 1603.8 Da | -0.15 | 6.75 | -0.12 |
| 4 | WLSGPVAVEWKK | Soluble | Non-hemolytic (0.039) | Weak binding (5.301 pKd/pKi) | 1399.6 Da | 0.76 | 8.59 | -0.16 |
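The GRAVY column can be reproduced by hand: it is the mean Kyte-Doolittle hydropathy value over all residues of the peptide. The sketch below is an independent check, assuming PeptiVerse uses the standard Kyte-Doolittle scale; the function name is illustrative.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


for peptide in ["WRYYVAVVRWKK", "WRYYAAVLEWKE", "WHSYAVVLEWWK", "WLSGPVAVEWKK"]:
    print(peptide, round(gravy(peptide), 2))
# -> -0.57, -0.68, -0.12, -0.16 (matches the table)
```

All four values are negative, so the peptides are on average hydrophilic, consistent with the "Soluble" prediction.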
Part 4: Generate Optimized Peptides with moPPIt
The generated binders are shown in the following table:
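| Binder | Hemolysis | Solubility | Half-Life | Affinity | Motif | Specificity |
| --- | --- | --- | --- | --- | --- | --- |
| EAVEGLTAEQIW | 0.95 | 0.5 | 9.92 | 5.91 | 0.34 | 0.61 |
| WIIWVTTTKAQK | 0.94 | 0.5 | 5.66 | 5.77 | 0.77 | 0.67 |
| ITLDEWLKKQCY | 0.88 | 0.67 | 14.46 | 7.16 | 0.78 | 0.57 |
| TDEQKVQLSAYW | 0.84 | 0.67 | 5.64 | 6.41 | 0.66 | 0.53 |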
PART C: Final Project: L-Protein Mutants
The programs were run and the heatmap along with some mutation candidates were obtained:
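| Amino Acid | Position in Protein | Amino Acid in Protein Sequence | Score |
| --- | --- | --- | --- |
| L | 50 | K | 2.56 |
| R | 29 | C | 2.39 |
| L | 39 | Y | 2.24 |
| S | 29 | C | 2.04 |
| Q | 9 | S | 2.01 |
| Q | 29 | C | 1.99 |
| P | 29 | C | 1.97 |
| L | 29 | C | 1.96 |
| I | 50 | K | 1.92 |
| L | 53 | N | 1.86 |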
Figure 2. Amino acid-mutation predictions.
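The candidates can be rewritten in standard mutation notation and grouped by position to see where the predictions cluster. This sketch assumes that the first column is the suggested substitution and the third column is the residue currently in the protein, which is how the table reads but is not stated explicitly.

```python
from collections import Counter

# (suggested residue, position, current residue, score), copied from the table.
CANDIDATES = [
    ("L", 50, "K", 2.56),
    ("R", 29, "C", 2.39),
    ("L", 39, "Y", 2.24),
    ("S", 29, "C", 2.04),
    ("Q", 9, "S", 2.01),
    ("Q", 29, "C", 1.99),
    ("P", 29, "C", 1.97),
    ("L", 29, "C", 1.96),
    ("I", 50, "K", 1.92),
    ("L", 53, "N", 1.86),
]

# Standard notation: current residue, position, new residue (e.g. K50L).
labels = [f"{current}{position}{new}" for new, position, current, _ in CANDIDATES]

# How many of the top candidates fall on each position.
per_position = Counter(position for _, position, _, _ in CANDIDATES)

print(labels)
print(per_position.most_common())
```

Half of the ten candidates fall on position 29, which suggests that the model tolerates many substitutions at that cysteine.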
Week 6 HW: Genetic Circuits Part I
PART 1: Protocol questions
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
The mastermix contains:
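Phusion High-Fidelity DNA Polymerase: Extends the primers to synthesize the new DNA strands; its proofreading (3'→5' exonuclease) activity keeps the error rate low.
Deoxynucleotides (dNTPs): The building blocks incorporated into the newly synthesized DNA strands.
Buffer including MgCl2: Maintains the pH and ionic conditions the enzyme needs; Mg²⁺ is an essential cofactor for polymerase activity.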
What are some factors that determine primer annealing temperature during PCR?
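Some factors include primer length, GC content (and therefore the primers' melting temperature, Tm), the salt concentration of the buffer, and the polymerase used.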
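The effect of length and GC content can be illustrated with the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This is only a rough estimate for short oligonucleotides; it ignores salt and primer concentration, so the polymerase supplier's Tm calculator should be preferred for real primers. The primer in the example is hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) from the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("Primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical primer, for illustration only.
primer = "ATGCGTACGTTAGC"
print(f"GC content: {gc_content(primer):.0%}, Wallace Tm: {wallace_tm(primer)} C")
```

Adding bases or raising the GC fraction both increase the estimate, which is why longer or GC-rich primers need higher annealing temperatures.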
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other
Category
PCR
Restriction Enzymes
Protocol
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
How does the plasmid DNA enter the E. coli cells during transformation?
The plasmid DNA enters the cells through a process called electroporation where, by means of an externally applied voltage, the membrane permeability increases, allowing the plasmid to enter the bacterial cytosol.
Describe another assembly method in detail (such as Golden Gate Assembly)
6.1 Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online)
6.2. Model this assembly method with Benchling or Asimov Kernel!
Week 7 HW: Genetic Circuits Part II
PART 1: IANNs
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation. Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
PART 2: Fungal Materials
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
Week 9 HW: Cell-Free Systems
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Main advantages (flexibility & control):
Open system: You can directly add/remove components (DNA, cofactors, salts, inhibitors).
Precise control: You can tune Mg²⁺, ATP, amino acids, etc.
Rapid expression: No need for cloning → transformation → growth.
Toxic proteins: You can express proteins that would normally kill cells.
When CFPS is better than in vivo:
Producing toxic proteins (e.g., antimicrobial peptides)
Studying protein variants quickly (high-throughput screening, mutant libraries)
Incorporating non-natural amino acids
Expressing membrane proteins without worrying about cell viability
Describe the main components of a cell-free expression system and explain the role of each component.
Cell extract (lysate): Contains ribosomes, tRNAs, enzymes
DNA or mRNA template: The blueprint for your protein
Amino acids: Building blocks for protein synthesis
Energy system (ATP, GTP + regeneration system): Fuels translation
Salts (Mg²⁺, K⁺): Maintain ribosome stability and activity
Cofactors (NAD⁺, CoA, etc.): Support metabolic reactions
Enzymes (optional): Folding, disulfide bond formation, etc.
Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.
Protein synthesis is an ATP-burning monster. If ATP runs out, your system basically gives up and stares into the void. Without regeneration, translation stops quickly and yield drops dramatically.
For instance, phosphoenolpyruvate (PEP) or creatine phosphate can be used as secondary energy sources, since kinases (pyruvate kinase or creatine kinase) transfer their phosphate group to ADP and regenerate ATP via substrate-level phosphorylation.
Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
Prokaryotic systems (e.g., E. coli extract):
Fast, cheap, high yield
Poor at post-translational modifications
Use case:
Produce enzymes like β-galactosidase → no complex folding/modifications needed
Eukaryotic systems (e.g., wheat germ, insect, mammalian extracts):
Slower, expensive
Can do folding, disulfide bonds, glycosylation (depending on system)
Use case:
Produce antibodies or glycoproteins → need proper folding and modifications
How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.
Challenges include aggregation, misfolding and insolubility. The strategies might include:
Add detergents
Use liposomes or nanodiscs to mimic membranes
Optimize Mg²⁺ and chaperones
Lower temperature to improve folding
A design idea would be:
CFPS + nanodiscs + chaperones → allows co-translational insertion into a membrane-like environment
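The optimization itself could be laid out as a small full-factorial screen over the variables listed above. The sketch below only enumerates the reaction conditions; every concentration and temperature in it is a hypothetical placeholder, not a recommended value.

```python
from itertools import product

# All levels below are hypothetical placeholders for illustration.
MG_MM = [8, 12, 16]            # Mg2+ concentration, mM
TEMPERATURE_C = [25, 30, 37]   # incubation temperature, deg C
NANODISCS = [False, True]      # membrane mimic present or absent
CHAPERONES = [False, True]     # chaperone supplement present or absent


def build_conditions() -> list[dict]:
    """Enumerate every combination of the four factors (full factorial)."""
    return [
        {"mg_mM": mg, "temperature_C": temp, "nanodiscs": discs, "chaperones": chap}
        for mg, temp, discs, chap in product(
            MG_MM, TEMPERATURE_C, NANODISCS, CHAPERONES
        )
    ]


conditions = build_conditions()
print(f"{len(conditions)} reactions to set up")
print(conditions[0])
```

Reactions without nanodiscs or chaperones act as internal controls, so the contribution of each additive to soluble yield can be separated from the effect of Mg²⁺ and temperature.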
Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
Poor DNA template quality
Problem: degraded DNA or bad promoter
Fix: use fresh plasmid, stronger promoter (e.g., T7), optimize codons
Energy depletion
Problem: ATP runs out
Fix: improve regeneration system (PEP, glucose system)
Protein misfolding or degradation
Problem: aggregates or proteolysis
Fix: add chaperones, reduce temperature, include protease inhibitors