<Paulina Flores> — HTGAA Spring 2026

About me
Contact info
Homework
- Week 1 HW: Principles and Practices
- Week 2 HW: DNA Read, Write & Edit
- Week 3 HW: Lab Automation
- Week 4: Protein Design - part I
- Week 5: Protein Design - part II

Week 1 HW: Principles and Practices
First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about. Thyroid implant for canines In the canine world, there is a very specific problem that owners face: hypothyroidism. It is a common condition that dogs develop around the ages of 2 or 3. Still, most of the time it is mistaken for other health conditions, such as intestinal problems, allergies, or dermatological conditions. Because hypothyroidism affects so many systems in a dog's body, veterinarians can sometimes arrive at a wrong diagnosis.
Week 2 HW: DNA Read, Write & Edit
Part 1: Benchling & In-silico Gel Art See this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis” for details. Overview: Make a free account at benchling.com Import the Lambda DNA. Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI HindIII BamHI KpnI EcoRV SacI SalI Restriction Enzyme Digestion made with Benchling Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks. You might find Ronan’s website a helpful tool for quickly iterating on designs! E=m*a2 EcoRV vs. EcoRI Single Enzymes Pyramid Enzymes Part 3: DNA Design Challenge
Assignment: Python Script for Opentrons Artwork - DUE BY YOUR LAB TIME! Your task this week is to Create a Python file to run on an Opentrons liquid handling robot. Review this week’s recitation and this week’s lab for details on the Opentrons and programming it. Generate an artistic design using the GUI at opentrons-art.rcdonovan.com. Star's birth Rectangular color palette Iteration color palette 1 Iteration color palette 2 Iteration color palette 3 - Ellipse Iteration color palette 4 - Circumference Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script which draws your design using the Opentrons. You may use AI assistance for this coding; Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept. Iteration color palette 5 Coding done by Gemini This code was written with Google Gemini. The steps were: first, loading the coordinates made in the GUI; second, giving the AI instructions describing the expected outcome; finally, iterating until the idea was achieved. The instructions given to the AI did not include any code; everything was specified in written instructions.
Week 4: Protein Design - part I
Part A. Conceptual Questions Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Why do humans eat beef but do not become a cow, eat fish but do not become fish? Why are there only 20 natural amino acids? Can you make other non-natural amino acids? Design some new amino acids. Where did amino acids come from before enzymes that make them, and before life started? If you make an α-helix using D-amino acids, what handedness (right or left) would you expect? Can you discover additional helices in proteins? Why are most molecular helices right-handed? Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation? Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials? Design a β-sheet motif that forms a well-ordered structure. Part B: Protein Analysis and Visualization
Week 5: Protein Design - part II
Part A: SOD1 Binder Peptide Design (From Pranam) Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.
Thyroid implant for canines
In the canine world, there is a very specific problem that owners face: hypothyroidism. It is a common condition that dogs develop around the ages of 2 or 3. Still, most of the time it is mistaken for other health conditions, such as intestinal problems, allergies, or dermatological conditions. Because hypothyroidism affects so many systems in a dog's body, veterinarians can sometimes arrive at a wrong diagnosis.
There are three types of hypothyroidism: primary, secondary, and tertiary.
We are going to focus on primary hypothyroidism, which is caused by an immunological condition or an idiopathic problem.
Primary hypothyroidism originates in the thyroid gland itself rather than in the instructions it receives from the brain. Communication between the hypothalamus, the pituitary gland (hypophysis), and the thyroid works at its normal pace, and the gland receives the message, but it cannot sustain T4 production because it has been attacked by immune cells or has begun an irreversible transformation into fatty tissue.
The interesting thing about this system is that it is only partially broken. The uptake and use of T4 and T3 do not depend on the gland; they depend on the genetic switches inside the organs the hormones enter: liver, kidneys, brain, and muscles. Another interesting fact is that T4 is not a cell but a largely inactive hormone that travels around the body through the circulatory system.
Today there is a pharmaceutical solution: levothyroxine. This pill is a synthetic form of T4 that is taken orally; once absorbed, it is delivered to the different organs, where it is converted to T3. Although it is a simple solution, the dose is not always accurate, and it has to be monitored every 6 months and adjusted according to multiple blood tests until it is right.
Now, what if we could implant a thyroid substitute that could read TSH directly from the blood and produce T4 as the body needs it, with a more precise response? To achieve this objective, we must address the following challenges:
The body needs to accept the cells as its own. The most reliable way to do this is to extract endodermal or glandular cells from the pet's own body and reprogram them into thyroid cells. This should be done in a lab, and the cells must reach a stable state so that they operate accurately and do not stop working once they are inside the body. Using cells from the same body matters because they will not be read as a major threat, as other materials could be.
Since the body has a very active immune system, the reprogrammed cells must be placed in a container or membrane that keeps out immune cells and direct infiltration. This membrane should be soft, with pores sized to let oxygen, TSH, and nutrients in and to let T4 out into the bloodstream. It should be made of biomimetic hydrogels, which are highly hydrated and can be read as neutral elements by immune cells.
To grow an implant, it is important to guide the cells so that they reproduce the architecture of the thyroid gland. This could be done with a biopolymer scaffold that suggests how the cells should grow together.
Finally, the implant should be placed in the subcutaneous region, where it has access to the blood supply, so that TSH can reach it and trigger the production of T4.
The implant must be tested and tuned to respond over a period of weeks, because T4 levels are regulated not at the exact moment that TSH goes up or down, but when TSH stays at a given level for weeks. These slow responses resemble how the original gland reacts and are not a red flag to immune cells. The implant would respond accurately and at its natural pace, without introducing inaccurate doses of T4.
Another important addition would be a nanochip that could scan and give feedback on the situation inside the body. For now, however, this is not a stable solution, because the immune system would read it as a major danger that must be eliminated.

Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals, for example, those relating to equity or autonomy.
Prevention of physical and psychological harm during experimental stages, non-malfeasance.
Fair and free access to information
Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).
A. Review of individual cases based on the evaluation of diagnostics made by an external entity and professional
To obtain a verified hypothyroid diagnosis, an external entity must review and confirm each evaluation. The review must rely on physical and chemical tests that provide consistent and standardized results, and its conclusion would accept or deny the participation of each subject. Applying this action requires collaboration between institutions and external professionals, who must be accredited by and affiliated with the government (of Ecuador, in this case). External evaluation must be a mandatory step before initiating any procedure, because it ensures that participants are correctly diagnosed and that budgets, time, and assets are well used.
For this action to work, it is assumed that external professionals will be qualified and available to be part of the program, that diagnostic criteria will be consistent in every case, and that these reviews will reduce misdiagnosis. The risks include clinical disagreements between internal and external actors, and a high risk of corruption among professionals and institutions, which could dramatically lower the chances of upholding ethical values and good treatment.
B. Unannounced auditor inspections of research developments and animals' health and care
Research progress should be audited through unannounced visits so that all the information about the project stays clear. This action would be led by an external inspector who reviews both the research institutions and the research team. It would ensure supervision of ethical practices in the lab or clinic, as well as of the real progress of the experimental project. The external inspector would be accredited by external academic institutions that are renowned for their knowledge in the field and their ethical practices. It is important that this actor has the authority to propose adjustments when needed, and to stop the research or allow it to continue once the conditions are met.
This would be a successful action assuming that academic institutions would have this type of professional under their wing, and that they have the capacity to support these mechanisms of control. The risks that could arise in this policy are that frequent inspections could provoke a hostile environment, affecting the performance of researchers and, therefore, the success of the research. Administrative burdens could slow the speed of the process, and rigidity could cloud the creative and precise environment needed in this type of research.
C. Mandatory and transparent educational programs
Mandatory education for all actors involved, together with transparent communication pathways, ensures that experimental procedures, ethical considerations, and long-term care requirements are responsibly applied beyond the research setting. This action would be implemented through educational programs given by the academic institutions involved in this research. The programs would be open to the specialized students, professionals, and caregivers who will be part of the program. Accessible and clear training sessions would also be provided, focusing on care requirements and ethical responsibilities.
Assuming that the information is passed on transparently and is understood by all attendees, this could lead to more ethical decision-making and supervision among all the actors involved. On the other hand, these actions could fail if the educational content is not accessible to all participants because it is overly technical or poorly communicated. It is also important to keep in mind the emotional and subjective burden that caregivers will experience throughout the experimentation period; this can lead to inadvertent or sudden dropouts, putting at risk both the research process and the animal’s life.
Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 1 | 1 |
| • By helping respond | 3 | 2 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | 1 | 1 |
| • By helping respond | 3 | 2 | 2 |
| Protect the environment | | | |
| • By preventing incidents | 3 | 2 | 3 |
| • By helping respond | 3 | 2 | 3 |
| Other considerations | | | |
| • Animal welfare and intervention | 1 | 1 | 1 |
| • Diagnostic accuracy and animal inclusion | 1 | 2 | 1 |
| • Ethical inclusion and transparency | 2 | 1 | 2 |
| • Equity in access to knowledge and care | 2 | 3 | 1 |
| • Minimizing costs and burdens to stakeholders | 1 | 2 | 3 |
| • Feasibility? | 1 | 2 | 2 |
| • Not impede research | 2 | 3 | 2 |
| • Promote constructive applications | 3 | 1 | 1 |
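
One way to read the scoring table at a glance is to tally each column. The short Python sketch below copies the scores from the table (1 is best, 3 is worst) and reports a total, a mean, and the number of top scores per option. It assumes every criterion carries equal weight, which is a simplification introduced here and not part of the rubric; the function name `summarize` is only illustrative.

```python
# Scores copied from the table, row by row (1 = best, 3 = worst).
scores = {
    "Option 1": [1, 3, 2, 3, 3, 3, 1, 1, 2, 2, 1, 1, 2, 3],
    "Option 2": [1, 2, 1, 2, 2, 2, 1, 2, 1, 3, 2, 2, 3, 1],
    "Option 3": [1, 2, 1, 2, 3, 3, 1, 1, 2, 1, 3, 2, 2, 1],
}


def summarize(scores):
    """Return the total, the mean and the number of top (1) scores per option."""
    summary = {}
    for option, values in scores.items():
        summary[option] = {
            "total": sum(values),  # lower is better
            "mean": round(sum(values) / len(values), 2),
            "top_scores": values.count(1),
        }
    return summary


summary = summarize(scores)
for option, stats in summary.items():
    print(option, stats)
```

An unweighted tally like this is only a rough guide: a criterion such as diagnostic accuracy may deserve more weight than the others, so the totals should be read alongside the qualitative reasoning rather than in place of it.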
Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix.
The fundamental governance option to be prioritized is “Review of individual cases based on the evaluation of diagnostics made by an external entity and professional”. Because it acts on the foundations of the experimental phases, it can prevent misdiagnosis or weakly supported clinical assumptions before the animal enters the trial. This option reduces the risk of unethical procedures, minimizes harm, protects animal welfare, and strengthens the scientific validity of the research foundation. The trade-off to consider is that this extra evaluation may slow the research at the beginning and limit the number of participants, but it prevents many problems later on.
Options 2 and 3 are complementary governance options. These actions ensure accountability among all the actors involved by making knowledge part of an independent regulatory system. They prevent the misuse of information and empower informed decision-making throughout the research cycle. External regulatory actors are also essential because they can verify whether clinical, laboratory, and welfare standards are applied before, during, and after experimentation. The trade-off here is that research institutions might perceive these actors as intruders and as a source of uncertainty for the development of the research. On the other hand, the combination of early prevention, continuous oversight, and broad education creates a distributed responsibility, which reduces the chance that ethical failures go unnoticed or unaddressed.
Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.
Because this is experimental research that must be done in live animals, specifically canines, it is important to keep the boundary between ethical and unethical procedures very clear. The fact that we do not communicate in the same way might be an important barrier between researchers and subjects. The animals might end up being seen as numbers rather than as brave and important living beings who are giving their lives to the experiment. This raises the question of whether the medical advancement to be gained is worth the many health discomforts that the research subjects will undergo.
Another question to keep in mind is whether we could develop a nanoscanner able to read the gland’s condition, avoiding any invasive procedure, preventing misdiagnosis, and giving a much more accurate reading of the situation inside the animal's body.
Some bibliography found about thyroid organoids for humans:
Kariyawasam, D., Stoupa, A., Nguyen Quoc, A., Pimentel Dantas, I., Polak, M., & Carré, A. (2025). From stem cells to organoids in thyroid: Useful tools or a step for cell therapy? La Presse Médicale, 54(4), 104301. https://doi.org/10.1016/j.lpm.2025.104301
Zhang, Y., Fu, M., Wang, H., & Sun, H. (2023). Advances in the Construction and Application of Thyroid Organoids. Physiological Research, 72(5), 557–564. https://doi.org/10.33549/physiolres.935102

Part 1: Benchling & In-silico Gel Art
See this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis” for details. Overview:





Part 3: DNA Design Challenge
3.1. Choose your protein.
In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
[Example from our group homework, you may notice the particular format — The example below came from UniProt]
THYROGLOBULIN (CANIS LUPUS FAMILIARIS) / ACTIN (CANIS LUPUS FAMILIARIS) vs. ACTIN (FUNGUS: S. CEREVISIAE)
The world of proteins is so vast that choosing a single protein has been a profound task.
To follow the same path as week 1 HW, let's start with thyroglobulin, a very complex and specialized protein that is key to the generation of the T3 and T4 hormones; in other words, it is a hormone precursor protein. Because of its complexity and specificity, it can be considered a modern protein. Some interesting facts about thyroglobulin: it is very large compared with other proteins, it only functions in the thyroid gland, it is prone to being attacked by immune cells when something is not working well, and its processing tolerates few errors. If we compare it to actin, we can see that actin is a simpler protein that performs a general function and has been present in all eukaryotic life forms since early life on Earth. Actin is the protein in charge of the formation of the cytoskeleton and of the motility and shape of cells, among many other functions. The interesting fact about actin is that it can tolerate errors, in contrast to thyroglobulin, which is very precise.
In the exercise below I will work with thyroglobulin from Canis lupus familiaris and also compare actin in dogs with actin in a fungus (Saccharomyces cerevisiae).
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.
[Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]
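The reverse-translation step described above can be sketched in a few lines of Python. Because most amino acids are encoded by several codons, a protein sequence does not map back to a single DNA sequence; this sketch simply picks one valid codon per residue and counts how many DNA sequences would be equally valid. The codon table is the standard genetic code; the peptide `MKTL` and the function names are hypothetical placeholders, not the thyroglobulin sequence or any specific tool's API.

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def reverse_translate(protein):
    """Return one valid DNA sequence (first listed codon for each residue)."""
    return "".join(SYNONYMS[aa][0] for aa in protein)


def translate(dna):
    """Translate DNA back to protein, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the same protein."""
    total = 1
    for aa in protein:
        total *= len(SYNONYMS[aa])
    return total


peptide = "MKTL"  # hypothetical four-residue example
dna = reverse_translate(peptide)
print(dna, translate(dna), count_encodings(peptide))
```

The round trip through `translate` confirms that the chosen DNA still encodes the original peptide, and the count shows why a reverse-translated sequence is only one of many possible answers.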
3.3. Codon optimization.
Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?
[Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]
Codon optimization adapts the codons of a gene from its original organism to the codon preferences of the organism that will express it later, without modifying the amino acid sequence. In research, a supply of the protein is needed for analysis and testing, and it is not sustainable to always obtain it from the original host, for many reasons: budget, quantity, ethics, etc.
In the case of canine thyroglobulin, the experiment is based on the question: how can we produce canine thyroglobulin that could be used as one component of a thyroid gland implant? The organism that will produce the protein will be CHO (Chinese hamster ovary) cells, a mammalian cell line with the capacity to carry out the complex processing needed to produce proteins as specialized as thyroglobulin.
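The idea of changing codons without changing the protein can be illustrated with a minimal Python sketch. It swaps each codon for a host-preferred synonym and then checks that the translated protein is unchanged. The `PREFERRED` table below is a hypothetical placeholder, not measured CHO codon usage; a real optimization would take these values from a codon usage table for the chosen host. The input DNA is likewise a made-up example.

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# HYPOTHETICAL host preferences, for illustration only.
PREFERRED = {"L": "CTG", "K": "AAG", "T": "ACC"}


def translate(dna):
    """Translate DNA to protein, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def optimize(dna, preferred):
    """Replace each codon with the preferred synonym for its amino acid."""
    usable = len(dna) - len(dna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        new_codon = preferred.get(aa, codon)  # keep codon if no preference
        if CODON_TABLE[new_codon] != aa:
            raise ValueError(f"{new_codon} does not encode {aa}")
        out.append(new_codon)
    return "".join(out)


original = "ATGAAAACTTTA"  # hypothetical input encoding M-K-T-L
optimized = optimize(original, PREFERRED)

# The DNA changes, but the encoded protein must stay the same.
assert translate(optimized) == translate(original)
print(original, "->", optimized)
```

Real tools also handle constraints this sketch ignores, such as avoiding specific restriction sites or extreme GC content.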
3.4. You have a sequence! Now what?
What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.
To produce this protein from my DNA, I would use a codon-optimized sequence so that the gene can be expressed efficiently by the host cells. Although canine and human thyroglobulin are not very different, and neither is their DNA, this step is necessary so that the host cell can transcribe the DNA into mRNA and its ribosomes can translate the codons into the amino acids in the correct order. The technologies to do this would be:
cell-dependent methods: for producing the protein, specifically CHO cells, which come from Chinese hamster ovaries. As this protein comes from a mammal, the cells producing it need to be from the same group; it is not efficient to use bacterial cells such as E. coli, for example, because the protein needs to fold in a specific way, and the differences between bacteria, mammals, and plants make this process very different.
bioreactor: for scaling up production, avoiding contamination of the cell culture, and giving the cells a controlled environment in which to grow and fold the protein.
Part 4: Prepare a Twist DNA Synthesis Order
This is a practice exercise, not necessarily your real Twist order!
- 4.1. Create a Twist account and a Benchling account
- 4.2. Build Your DNA Insert Sequence
- 4.3. On Twist, Select The “Genes” Option
- 4.4. Select “Clonal Genes” option
- 4.5. Import your sequence
- 4.6. Choose Your Vector

Assignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME!
Your task this week is to Create a Python file to run on an Opentrons liquid handling robot.








This code was written with Google Gemini. The steps were: first, loading the coordinates made in the GUI; second, giving the AI instructions describing the expected outcome; finally, iterating until the idea was achieved. The instructions given to the AI did not include any code; everything was specified in written instructions.
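The first step described above, loading the GUI coordinates and turning them into dispensing steps, can be sketched in plain Python. This is a minimal sketch and not the Gemini-generated script: the points, color names, `DROP_UL`, and `PIPETTE_MAX_UL` are hypothetical placeholders, and no Opentrons API calls are shown. It groups the points by color and splits each group into batches that fit in a single aspiration.

```python
from collections import defaultdict

# Hypothetical (x, y, color) points, as might be exported from the design GUI.
points = [
    (0.0, 0.0, "red"), (2.2, 0.0, "red"), (4.4, 0.0, "green"),
    (0.0, 2.2, "green"), (2.2, 2.2, "red"),
]

DROP_UL = 1.0         # assumed volume per dot, in microliters
PIPETTE_MAX_UL = 2.0  # assumed limit per aspiration (kept small for the demo)


def group_by_color(points):
    """Collect the (x, y) coordinates belonging to each color."""
    groups = defaultdict(list)
    for x, y, color in points:
        groups[color].append((x, y))
    return dict(groups)


def plan_batches(coords, drop_ul, max_ul):
    """Split coordinates into batches that fit in one aspiration."""
    per_batch = int(max_ul // drop_ul)
    if per_batch < 1:
        raise ValueError("drop volume exceeds pipette capacity")
    return [coords[i:i + per_batch] for i in range(0, len(coords), per_batch)]


plan = {
    color: plan_batches(coords, DROP_UL, PIPETTE_MAX_UL)
    for color, coords in group_by_color(points).items()
}

for color, batches in plan.items():
    for batch in batches:
        print(f"{color}: aspirate {len(batch) * DROP_UL} uL, dispense at {batch}")
```

Planning the batches separately from the robot commands makes it easier to check the design before anything is sent to the Opentrons.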
If the Python component is proving too problematic even with AI and human assistance, download the full Python script from the GUI website and submit that:
If you use AI to help complete this homework or lab, document how you used AI and which models made contributions.

Submit your Python file via this form.
SUCCESSFUL ATTEMPT
As you can see, the code has not been uploaded, because I could not work out how to do it. After I asked several people for help, one of my classmates, María José Rivas, gave me this link: https://github.com/Mozta/opentrons-bioart-sim/tree/main?tab=readme-ov-file#from-source-for-development. She used it to load her coordinates and well colors from opentrons-art into the Colab doc. The catch is that the well colors in opentrons-art differ from the well colors in the Colab doc. This protocol runs fine.
For this process, I first downloaded Python to see if I could run it there. It was not successful, but I came to understand how the program works (at a very basic level). Then I went to the Colab doc and tried to import the documents, again without success, so I asked ChatGPT for help. We went through the whole process together, and step by step it helped me import the link info as well as my .py file.
The final results are these:
Finally, the result is this:

Some info to keep in mind:
Special thanks to María José and Rafael Pérez Aguirre (@Mozta)
Post-Lab Questions — DUE BY START OF FEB 24 LECTURE
One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.
For this week, we’d like for you to do the following:
Automation of protein crystallization scale-up via Opentrons-2 liquid handling
This study presents an approach for optimizing protein crystallization trials at multi-microliter scale using the Opentrons-2 (OT-2) liquid handling robot. The research shows that, using Python scripts for precise control, the robot can mix and set up crystallization plates with a model protein (hen egg white lysozyme) and a periplasmic protein from Campylobacter jejuni, whose crystals are used in the Snow lab as a biomaterial for nanotechnology and must be produced in large, consistent batches. Automating the process can significantly reduce manual labor and costs and improve the reliability of protein crystallization results. Because the Opentrons is programmed in Python, protocols are easy to iterate on and improve.
DeRoo, J. B., Jones, A. A., Slaughter, C. K., Ahr, T. W., Stroup, S. M., Thompson, G. B., & Snow, C. D. (2025). Automation of protein crystallization scaleup via Opentrons-2 liquid handling. SLAS Technology, 32, 100268. https://doi.org/10.1016/j.slast.2025.100268
Other interesting studies demonstrate how Opentrons can be linked to other types of technology, such as 3D bioprinting. Although a 3D printer does not work with proteins in the same way as the OT-2, it can print different types of labware, reducing costs and producing specialized tools. Beyond robots, there is now also collaboration between automated labs and AI assistance.
While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.
Project tech proposals:
Final Project Ideas — DUE BY START OF FEB 24 LECTURE
For the final project ideas, there are 3 options to take into consideration:
MUSIC & BACTERIA

AQUATIC MICROORGANISMS & BIOLUMINESCENT SENSORS

PROTEIN BASED CRYSTALLINE MATERIALS & SPIDER-SILK TEXTILES

Part A. Conceptual Questions
Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)
Part B: Protein Analysis and Visualization
In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:
LUCIFERASE OF Pyrophorus plagiophthalamus
Luciferase is an enzyme that generates bioluminescence by catalyzing the oxidation of D-luciferin in the presence of ATP, oxygen, and Mg2+. In this particular insect, Pyrophorus plagiophthalamus, different isoforms of luciferase can emit light ranging from green to orange, depending on the organ in which the gene is expressed. These color variations arise from subtle structural differences in the enzyme's active site that alter the electronic environment of the excited oxyluciferin intermediate. Click beetle luciferase is stable over a wide pH range compared to other active luciferases. This enzyme is very commonly used for in vivo imaging applications, especially the red-emitting variants. It is also used as a biosensor to monitor gene expression and as a reporter gene.
I chose this particular protein because I am interested in analyzing how sound frequencies might influence bacterial protein expression, growth dynamics, or spatial organization. For this purpose, luciferase is an ideal biosensor: light emission provides a real-time, quantifiable readout.
Burbelo, P. D., Kisailus, A. E., & Peck, J. W. (2002). Detecting Protein-Protein Interactions Using Renilla Luciferase Fusion Proteins. BioTechniques, 33(5), 1044–1050. https://doi.org/10.2144/02335st05
How long is it? What is the most frequent amino acid?
For this part, I used Google Colab and did some research on leucine. The luciferase of Pyrophorus plagiophthalamus has 543 amino acids; the most frequent is L (leucine), which appears 56 times. Leucine is commonly known as an amino acid that helps synthesize muscle proteins and supports tissue regeneration. In this protein, its role is related to forming the hydrophobic core, correct protein folding, and the formation of alpha helices.
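A count like the one above can be reproduced in Colab with a few lines of Python. This is a minimal sketch using only the standard library; the toy sequence is a hypothetical placeholder, not the luciferase sequence, which would be pasted in from UniProt in its place.

```python
from collections import Counter


def summarize_sequence(seq):
    """Return (length, most frequent residue, its count) for a protein sequence."""
    seq = "".join(seq.split()).upper()  # drop whitespace and line breaks
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    return len(seq), residue, n


toy = "MLKLLAEGLL"  # hypothetical 10-residue example
length, residue, n = summarize_sequence(toy)
print(f"length={length}, most frequent={residue} ({n} times)")
```

Stripping whitespace first means a sequence copied with line breaks can be pasted in directly.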
How many protein sequence homologs are there for your protein?
According to UniProt's BLAST tool, it has 236 homologs. This means that there is a variety of similar proteins across living organisms. They are not identical, but they share a very similar structure. These homologs can be orthologs or paralogs. Paralogs are related proteins found within the same species, with subtle variations in their structure.
Does your protein belong to any protein family?
Yes, it belongs to the insect luciferase family. This type of protein needs ATP, D-luciferin, and oxygen to perform the oxidation process.
When was the structure solved? Is it a good quality structure? A good-quality structure is one with high resolution. Smaller the better (Resolution: 2.70 Å) :
This particular protein, the luciferase of Pyrophorus plagiophthalamus, is not available in the RCSB Protein Data Bank, so I took the first luciferase structure deposited there: 1LCI, firefly luciferase from Photinus pyralis. Its structure was solved in 1997. The resolution is 2.00 Å, which indicates good quality.

Are there any other molecules in the solved structure apart from protein?
The structure contains crystallographic water molecules (HOH), which stabilize the protein and may participate in hydrogen bond formation. As this is the first luciferase structure to be solved, it does not include any other components besides the protein and water.

Does your protein belong to any structure classification family?
It belongs to the ATP-dependent AMP-binding enzyme family. This family includes enzymes that activate substrates through adenylation using ATP, forming an AMP-bound intermediate.
Open the structure of your protein in any 3D molecule visualization software: - PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands) - Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.



Color the protein by secondary structure. Does it have more helices or sheets?

The protein shows a predominance of alpha helices (red) over beta sheets (green). This indicates that firefly luciferase is mainly an alpha-helical protein with a smaller portion of beta-sheet structures.
Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

This enzyme, which operates in an aqueous environment, has an exterior dominated by hydrophilic residues and a core made up mostly of hydrophobic residues.
Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?



Yes, the protein surface shows several pockets: one large pocket and several smaller ones. It is in the large pocket that ATP reacts with D-luciferin to form luciferyl-AMP, which then reacts with molecular oxygen to form oxyluciferin and light.
Part C. Using ML-Based Protein Design Tools
C1. Protein Language Modeling
Deep Mutational Scans
a. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods. b. Can you explain any particular pattern? (choose a residue and a mutation that stands out). c. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.

The map shows that many mutations are tolerated, although two main regions should not be changed because the protein could lose its fold; those regions appear as two dark blue columns. In addition, three subtle rows show consistent coloring, corresponding to W, M, and C.
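The values behind such a map are usually log-likelihood ratios: for each position, the score of a substitution is the log probability the model gives to the mutant residue minus the log probability it gives to the wild-type residue. The sketch below illustrates that scoring under stated assumptions: it does not call ESM2, the probabilities are made up, and the helper names `mutation_scores` and `toy_probs` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutation_scores(wild_type: str, position_probs: list[dict[str, float]]) -> list[dict[str, float]]:
    """Score every substitution as log p(mutant) - log p(wild type) at each position."""
    scores = []
    for wt_residue, probs in zip(wild_type, position_probs):
        wt_logp = math.log(probs[wt_residue])
        scores.append({aa: math.log(probs[aa]) - wt_logp for aa in AMINO_ACIDS})
    return scores


def toy_probs(favoured: str, weight: float) -> dict[str, float]:
    """Hypothetical distribution: `weight` on one residue, the rest spread evenly."""
    rest = (1.0 - weight) / (len(AMINO_ACIDS) - 1)
    return {aa: (weight if aa == favoured else rest) for aa in AMINO_ACIDS}


# Made-up probabilities standing in for the language model output.
wild_type = "GA"
probs = [toy_probs("G", 0.96), toy_probs("A", 0.10)]
scores = mutation_scores(wild_type, probs)
print(round(scores[0]["W"], 2))  # strongly negative: position 1 tolerates few changes
print(round(scores[1]["W"], 2))  # mildly negative: position 2 is more permissive
```

A position where the model is very sure of the wild-type residue gives strongly negative scores for all 19 substitutions, which would show up in the heatmap as a uniformly colored column.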
Latent Space Analysis
a. Use the provided sequence dataset to embed proteins in reduced dimensionality. b. Analyze the different formed neighborhoods: do they approximate similar proteins? c. Place your protein in the resulting map and explain its position and similarity to its neighbors.


In the map, the closest neighbor of the analyzed protein (firefly luciferase from Photinus pyralis, PP) is the luciferase of Luciola cruciata (LC), a protein produced by another species of firefly. PP is from North America, while LC is from Japan. Beyond geographical origin, the two differ in their molecular composition, which results in a slightly different emission color and in the stability of the enzyme. Although both proteins use D-luciferin and ATP to produce light, PP luciferase is widely used in biotech as a reporter gene. In contrast, LC luciferase is used to understand how active-site residues interact with the substrate.
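Neighborhoods in an embedding map reflect how close the underlying vectors are. As a rough illustration, the sketch below ranks neighbors by cosine similarity. The three-dimensional vectors are invented for the example (real language model embeddings have hundreds of dimensions), and the function name `cosine_similarity` is simply a label chosen here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "Photinus pyralis luciferase": [0.9, 0.1, 0.2],
    "Luciola cruciata luciferase": [0.8, 0.2, 0.2],
    "unrelated protein": [0.1, 0.9, 0.3],
}

query = "Photinus pyralis luciferase"
# Sort all other proteins from most to least similar to the query.
neighbours = sorted(
    (name for name in embeddings if name != query),
    key=lambda name: cosine_similarity(embeddings[query], embeddings[name]),
    reverse=True,
)
print(neighbours[0])
```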
C2. Protein Folding
Fold your protein with ESMFold. Do the predicted coordinates match your original structure? Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?


When folded with ESMFold, the protein shows a structure almost identical to the original. When a few mutations are introduced, the prediction shows some small anomalies but no radical changes, meaning that the protein is largely resilient to mutations.
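To generate the mutated sequences systematically before pasting them into ESMFold, a small helper can apply point mutations and segment replacements. This is only a sketch: `apply_point_mutation` and `replace_segment` are hypothetical names, the notation assumed is wild type, 1-based position, mutant (for example `A4V`), and `toy` is a made-up sequence rather than the luciferase.

```python
import re


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a substitution written as <wild type><1-based position><mutant>, e.g. 'A4V'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognised mutation format: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(f"Expected {wild_type} at position {position}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]


def replace_segment(sequence: str, start: int, end: int, replacement: str) -> str:
    """Replace residues start..end (1-based, inclusive) with another stretch of residues."""
    return sequence[:start - 1] + replacement + sequence[end:]


# Made-up sequence for illustration only.
toy = "MKTAYIAKQR"
print(apply_point_mutation(toy, "A4V"))
print(replace_segment(toy, 5, 8, "GGGG"))
```

The wild-type check guards against off-by-one numbering errors, which are easy to make when a sequence does or does not include the initial methionine.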
C3. Protein Generation
Inverse-Folding a protein Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one. Input this sequence into ESMFold and compare the predicted structure to your original.


Even though the predicted sequence has a completely different amino acid composition, the structure remains the same. This is why the protein shown in 3D is very similar to the original in its alpha helices and beta sheets. The backbone is not altered, and neither is the overall pattern in which certain types of amino acids are distributed.
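How different the designed sequence is from the original can be expressed as percent identity. Because inverse folding keeps the backbone fixed, both sequences have the same length and can be compared position by position. The sketch below assumes exactly that; the function name `percent_identity` and both example sequences are invented for illustration.

```python
def percent_identity(original: str, designed: str) -> float:
    """Percentage of positions where two equal-length sequences carry the same residue."""
    if len(original) != len(designed):
        raise ValueError("Sequences must have the same length")
    matches = sum(a == b for a, b in zip(original, designed))
    return 100.0 * matches / len(original)


# Made-up sequences for illustration only.
original = "MKTAYIAKQR"
designed = "MRSAYLAEQK"
print(f"{percent_identity(original, designed):.1f}% identical")
```

A low identity combined with a nearly unchanged predicted fold is the pattern described in the answer: many sequences are compatible with the same backbone.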
Part D. Group Brainstorm on Bacteriophage Engineering
Part A: SOD1 Binder Peptide Design (From Pranam)
Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.
Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.
Your challenge:
Design short peptides that bind mutant SOD1. Then decide which ones are worth advancing toward therapy.
A. Part 1: Generate Binders with PepMLM
SOD1 SEQUENCE
SOD1 SEQUENCE with A4V mutation
After generating 4 peptides of 12 amino acids each against the mutant sequence, we got:
The lower the pseudo-perplexity, the higher the confidence of the model. This means that the peptide with 15.42 is less natural, while the peptide with 10.32 is more natural and closer to what the model expects for this sequence. Adding the SOD1-binding sequence makes a difference: it has a pseudo-perplexity of 20.63, a very high number, which means that the model has low confidence in it.
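For reference, pseudo-perplexity is commonly defined for masked language models as the exponential of the negative mean log probability that the model assigns to each residue when that residue is masked. The sketch below assumes PepMLM follows this standard definition; the per-residue probabilities are made up and the function name `pseudo_perplexity` is chosen here.

```python
import math


def pseudo_perplexity(token_probs: list[float]) -> float:
    """exp of the negative mean log probability of each (masked) residue."""
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


# Hypothetical per-residue probabilities, invented for illustration.
confident = [0.5, 0.25, 0.5, 0.25]
uncertain = [0.05, 0.05, 0.05, 0.05]
print(round(pseudo_perplexity(confident), 2))
print(round(pseudo_perplexity(uncertain), 2))
```

When the model assigns a probability of about 1/20 to every residue, which is no better than guessing among the 20 amino acids, the pseudo-perplexity comes out at about 20. This is why values near 20 read as low confidence.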
B. Part 2: Evaluate Binders with AlphaFold3
I took the peptides generated with PepMLM and ran them in AlphaFold3 together with the mutant SOD1 sequence. The results show that the model has high confidence in the predicted structure of the protein itself. The ipTM values, however, are below 0.6, which means there is low confidence in the interaction between the peptide and the protein. Also, the parts where the peptide does make some contact with the protein correspond to the beginning of the sequence, which appears to be a more flexible region of the protein.





C. Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
It seems that the best candidate as a therapeutic peptide is Peptide 3: KRYPAVALAWWE. Although other candidates show very good qualities and achieve similar results in terms of solubility, hemolysis, molecular weight, and net charge, they do not present a strong binding score. In this case, Peptide 3 shows the highest predicted binding affinity among the candidates.
If we compare these results with the ipTM values predicted by AlphaFold3, we can observe that the confidence in the interaction between the peptides and the protein is generally low. For Peptide 3 in particular, the ipTM value lies in the middle of the observed range, suggesting moderate structural confidence in the predicted interaction relative to the other (low) values.
Additionally, when designing peptides for therapeutic purposes, several properties must be considered. First, peptides need to be soluble so that they can circulate in the biological fluids without forming aggregates. Second, hemolysis probabilities should remain below 0.2, since higher values indicate that peptides may disrupt red blood cells and release hemoglobin into the bloodstream, which can be toxic. Third, binding affinity is important because it helps to predict whether a peptide will interact strongly with the target protein. Furthermore, molecular weight is preferably small, as smaller peptides are easier to synthesize and diffuse through biological environments. Finally, a moderate positive net charge is often favorable, because it can promote electrostatic interactions with negatively charged regions on protein surfaces, potentially stabilizing the peptide-protein interaction.
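The net charge criterion can be checked by hand or with a short script. The sketch below is a rough estimate only: it counts K and R as +1 and D and E as -1, and it ignores histidine, the termini, and pH effects, so it may differ from the value PeptiVerse reports. The function name `approximate_net_charge` is hypothetical.

```python
POSITIVE = set("KR")  # lysine, arginine: positively charged near neutral pH
NEGATIVE = set("DE")  # aspartate, glutamate: negatively charged near neutral pH


def approximate_net_charge(peptide: str) -> int:
    """Rough net charge: (+1 per K/R) + (-1 per D/E); ignores His and the termini."""
    positives = sum(aa in POSITIVE for aa in peptide)
    negatives = sum(aa in NEGATIVE for aa in peptide)
    return positives - negatives


peptide_3 = "KRYPAVALAWWE"
print(peptide_3, approximate_net_charge(peptide_3))
```

For Peptide 3 this simple count gives one K and one R against one E, which is a small positive net charge and fits the "moderate positive" range described above.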
D. Part 4: Generate Optimized Peptides with moPPIt
I chose to target the residues nearest to the mutation because the flexibility of this region favors peptide-protein binding.
Reference values for the analysis:
- 💧 Solubility: 1.0 (good)
- 🩸 Hemolysis: 1.0 (good)
- 🔗 Binding Affinity: the higher the better
- 🧩 Motif: 1.0 (good)
Therefore, Peptide 3: YYQKTCLVKKEH is the best candidate for binding to the mutant SOD1. It presents balanced and consistent results in every aspect: hemolysis, solubility, affinity, and motif. Although its solubility is slightly lower than that of Peptide 0, it still falls in the favorable range, suggesting that the peptide can remain stable and soluble in physiological conditions. It also shows high affinity and motif scores, meaning that it is predicted to interact strongly and specifically with the selected residues of the protein.
Compared to the PepMLM peptides, the moPPIt results show good affinity and motif scores, which did not appear in the PepMLM peptides. I think the moPPIt peptides have a higher affinity and better chances of binding to the protein because they were generated against a specific set of target residues in a specific region, while PepMLM gives a general result based on stable and more plausible sequences without focusing on any particular binding site.
How would you evaluate these peptides before advancing them to clinical studies?
I would first run a few more computational tests to obtain consistent results on the stability and strength of the peptide-protein interaction, using docking and molecular dynamics simulations. Afterward, it would be necessary to do some in vitro experiments to test whether solubility, hemolysis, and binding affinity at the target motif remain consistent with the computational predictions. Finally, in vivo models would be used to assess safety, stability, and pharmacokinetic properties to see if the peptide meets the requirements for clinical studies.
Part C: Final Project: L-Protein Mutants
High level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein of the MS2 phage. This mechanism is key to understanding how phages could help address antibiotic resistance.