Pelin Onur — HTGAA Spring 2026


About me

Contact info

Homework

Labs

Projects

Subsections of Pelin Onur — HTGAA Spring 2026

Homework


Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Part1: Assignment 1. The Project Concept: Integrated Plant-Based Bone Scaffolds The field of regenerative medicine currently relies heavily on static bone scaffolds that provide structural support but lack the ability to interact with the biological environment. I propose the development of a 3D bioprinted smart scaffold designed from sustainable, plant-based materials. This system will serve a dual purpose by providing a physical matrix for bone growth and integrating biosensors for real-time physiological monitoring. By using materials like alginate or cellulose, this approach offers a personalized and environmentally responsible alternative to traditional synthetic implants.

  • Week 2 HW: DNA Read Write and Edit

    ✨Part 1: Benchling & In silico Gel Art I simulated a restriction digest on λ DNA in Benchling using the enzymes EcoRI, HindIII, BamHI, EcoRV, and KpnI. By comparing the band patterns, I could visualize how different enzymes cut the DNA into fragments of varying sizes. This simulation helped me understand how we verify DNA fingerprints before moving to synthesis.

  • Week 3 HW: Lab Automation

    Week 3 – Lab Automation ✨ Week 3 - Homework ✨ You can view my Automation Art design here: Opentrons Art Link After creating this shell pattern using Opentrons Art, I duplicated the provided Colab notebook to develop a Python protocol. To program the Opentrons robot to physically recreate the artwork on a plate, I systematically entered the coordinate data from my design step-by-step into the script. Once the protocol was complete, it successfully generated the images shown below.

  • Week 4 HW: Protein Design

    WEEK 4 — PROTEIN DESIGN PART I ✨ Part A. Conceptual Questions ✨ 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Let’s walk through the math by looking directly at the weight. We know an average amino acid is about 100 Daltons. If we convert that to grams, one single Dalton is an incredibly tiny fraction of a gram (about 1.66 × 10⁻²⁴ g). That means our single 100-Dalton amino acid weighs roughly 1.66 × 10⁻²² grams. If we have a 500-gram piece of meat and we pretend for a second that it is 100% pure protein, we just divide the total weight by the weight of one molecule. So, 500 g divided by 1.66 × 10⁻²² g/molecule gives us roughly 3.01 × 10²⁴ amino acid molecules.

  • Week 5 HW: Protein Design: Part II

    ✨ Part A. SOD1 Binder Peptide Design ✨ Part 1: Generate Binders with PepMLM Sequence Retrieval and Mutation I began by retrieving the human Superoxide dismutase 1 (SOD1) sequence from the UniProt database using the accession number P00441. The native (wild-type) sequence consists of 154 amino acids: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ To model the disease state, I introduced the ALS-causing A4V mutation (Alanine → Valine at residue 4). Noting that standard numbering excludes the initiator Methionine (M), I replaced the Alanine at the 5th position with a Valine to create my target mutant sequence:

  • Week 6 HW: Genetic Circuits: Part I

    ✨ DNA Assembly ✨ Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Since the protocol didn’t list the exact ingredients, I looked up the standard components of a Phusion Master Mix from a supplier like New England Biolabs. A PCR Master Mix is basically a pre-mixed tube of everything needed to copy DNA, minus the specific template and primers. The main components are:

  • Week 7 HW: Genetic Circuits: Part II

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) 1. What advantages do IANNs have over traditional genetic circuits? In my research, I found that IANNs offer a much more flexible way to handle biological data compared to standard Boolean (ON/OFF) circuits. Here are the main benefits:

  • Week 9 HW: Cell Free Systems

    HTGAA Week 9: Cell-Free Systems Part A: General and Lecturer-Specific Questions 1. Explain the main advantages of cell-free protein synthesis (CFPS) over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. The biggest advantage I see is that CFPS turns “biology” into “chemistry.” In traditional in vivo systems, the cell membrane is a wall that prevents us from easily tweaking the internal environment. In a cell-free setup, I have an “open” system. I can add non natural amino acids, adjust salt concentrations (Mg²⁺ and K⁺) in real time, or even add detergents to help fold tricky proteins.

  • Week 10 HW: Advanced Imaging & Measurement Technology

    Homework: Final Project 1. Which aspect(s) of your project will you measure? The main goal is to measure how well my custom DNA construct actually stays stuck to the 3D-printed scaffold. I also need to measure the bioactivity of the produced protein — essentially checking if it actually triggers bone-growing signals like it’s supposed to. Finally, I’ll be measuring the retention time, which tells me how much longer my “anchored” version stays on the scaffold compared to a standard version that usually just washes away.

  • Week 11 HW: Bioproduction & Cloud Labs

    Week 11 — Bioproduction & Cloud Labs Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork Make a note on your HTGAA webpages including: (a) What you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”) Honestly, I didn’t get to contribute a pixel this time. The window between the personalized URL going out and the editing deadline on Sunday 4/19 closed before I was able to sit down and place mine, which I’m a bit bummed about because the project sounded cool.

  • Week 12 HW: Building Genomes

  • Week 13 HW: Biodesign and Engineered Living Materials

  • Week 14 HW: Bio Design and Bio Fabrication

Subsections of Homework

Week 1 HW: Principles and Practices

Part1: Assignment

1. The Project Concept: Integrated Plant-Based Bone Scaffolds

The field of regenerative medicine currently relies heavily on static bone scaffolds that provide structural support but lack the ability to interact with the biological environment. I propose the development of a 3D bioprinted smart scaffold designed from sustainable, plant-based materials. This system will serve a dual purpose by providing a physical matrix for bone growth and integrating biosensors for real-time physiological monitoring. By using materials like alginate or cellulose, this approach offers a personalized and environmentally responsible alternative to traditional synthetic implants.

Technical Phases:

• Phase 1: Structural Foundation. The scaffold is bioprinted using biodegradable plant polymers tailored to the specific geometry of a patient’s bone defect. This provides the necessary mechanical integrity to support new tissue formation.

• Phase 2: Biological Intelligence. Biosensors are embedded within the matrix to monitor variables such as pH levels, calcium concentration, and mechanical strain. Simultaneously, a controlled delivery system releases growth factors to promote rapid vascularization and bone density.

• Phase 3: Controlled Degradation. As natural bone tissue regenerates and takes over the load-bearing responsibilities, the scaffold undergoes programmed biodegradation. This leaves behind only healthy, natural bone without the need for secondary surgeries to remove permanent hardware.

2. Governance Goals for Ethical Bioengineering

To ensure this technology aligns with safety and ethical standards, the following governance goals have been established.

Goal A: Environmental Sustainability and Non-Toxicity

The project must ensure that the transition to plant-based materials does not result in unintended ecological or biological consequences.

• Sub-goal: Utilize biodegradable materials that break down into inert metabolites to avoid systemic toxicity.

• Sub-goal: Standardize sourcing methods to ensure that plant extraction does not disrupt local ecosystems or biodiversity.

Goal B: Clinical Efficacy and Patient Protection

The integration of active growth factors requires strict oversight to prevent adverse biological reactions.

• Sub-goal: Validate the biocompatibility of all plant-derived components to eliminate the risk of chronic inflammation or immune rejection.

• Sub-goal: Implement precise delivery protocols for growth factors to prevent unregulated cellular proliferation.

3. Proposed Governance Actions

Action 1: Regulatory Frameworks for Bio-Hybrid Materials

The primary purpose is to establish clear safety benchmarks for plant-based medical devices that do not fit into existing regulatory categories. This involves collaboration with the MHRA and FDA to define specific testing protocols for the degradation rates of cellulose-based implants. The design of this action requires rigorous longitudinal studies to confirm that the breakdown of these materials is safe over several years. One significant risk is that high regulatory hurdles may delay the delivery of these life-changing treatments to patients in need.

Action 2: Data Security Protocols for In-Vivo Biosensors

As these scaffolds generate continuous streams of patient health data, it is vital to establish ethical data handling practices. The design of this action includes the development of encrypted transmission standards to ensure that sensitive biological information is only accessible to authorized medical personnel. A key assumption is that patient data can be transmitted wirelessly without compromising the physical integrity of the scaffold. The risk of failure involves potential cybersecurity vulnerabilities that could expose private health metrics.

Action 3: Global Sustainability Certification

This action focuses on creating a “Green Biotech” certification to encourage the use of eco-friendly materials in the medical industry. By working with the United Nations Environment Programme, we can set international standards for the carbon footprint of medical manufacturing. This assumes that a global market exists for sustainable medical products. However, a potential risk is that the cost of obtaining such certifications could increase the final price of the scaffold, potentially limiting access for lower-income healthcare systems.

4. Scoring of Governance Actions

| Evaluation Criteria | Action 1: Regulation | Action 2: Data Privacy | Action 3: Certification |
| --- | --- | --- | --- |
| Enhance Biosecurity | 1 | 2 | 3 |
| Foster Lab Safety | 1 | 3 | 2 |
| Protect the Environment | 2 | 3 | 1 |
| Stakeholder Feasibility | 2 | 2 | 1 |
| Constructive Application | 1 | 2 | 1 |

(Note: 1 represents the highest alignment with the goal)
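As a rough check on the scoring, the ranks can be totalled per action. This is only a sketch: summing ordinal ranks with equal weight for every criterion is an assumption made here for illustration, and the names `SCORES`, `ACTIONS`, `totals`, and `best_action` are hypothetical.

```python
# Ranks copied from the scoring table (1 = highest alignment with the goal).
ACTIONS = (
    "Action 1: Regulation",
    "Action 2: Data Privacy",
    "Action 3: Certification",
)

SCORES = {
    "Enhance Biosecurity": (1, 2, 3),
    "Foster Lab Safety": (1, 3, 2),
    "Protect the Environment": (2, 3, 1),
    "Stakeholder Feasibility": (2, 2, 1),
    "Constructive Application": (1, 2, 1),
}


def totals(scores=SCORES):
    """Sum the ranks for each action, weighting every criterion equally (assumption)."""
    return {
        action: sum(row[i] for row in scores.values())
        for i, action in enumerate(ACTIONS)
    }


def best_action(scores=SCORES):
    """Return the action with the lowest total, since 1 is the best rank."""
    summed = totals(scores)
    return min(summed, key=summed.get)


if __name__ == "__main__":
    for action, total in totals().items():
        print(f"{action}: {total}")
    print("Lowest total:", best_action())
```

Under this equal-weight assumption, Action 1 has the lowest total, with Action 3 close behind and Action 2 last.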

5. Prioritization and Ethical Considerations

Upon reviewing the scores, Action 1 (Regulatory Frameworks for Bio-Hybrid Materials) is the highest priority. Without a validated safety profile and regulatory approval, the clinical and environmental benefits of the scaffold cannot be realized. While Action 3 is easier to implement, it remains secondary to the fundamental safety of the patient.

During the development of this proposal, an important ethical concern arose regarding “Biotelemetry Equity.” If smart scaffolds become the gold standard, there is a risk that only patients in high-resource settings will benefit from real-time healing monitoring. To address this, governance actions should include incentives for companies to develop “passive” versions of the scaffold that provide high-quality structural support at a lower cost for global distribution.

Relevant Audiences

The recommendations for these governance actions are directed toward the FDA and the World Health Organization. These bodies are essential for establishing the international safety and sustainability standards required to bring 3D bioprinted plant-based scaffolds into mainstream clinical practice.

6. References

10.1109/SENSORS56945.2023.10325163

10.1002/adhm.202102807

https://cordis.europa.eu/project/id/101177877

Part2: Lab Preparation

Not applicable for Committed Listeners.

Part3: Week 2 Lecture Prep

Questions from Professor Jacobson

Q1: Even though it is not perfect, the precision of nature’s machinery for copying DNA is staggering. The intrinsic error rate of DNA polymerase is approximately one mistake for every million base pairs copied (10⁻⁶). For context, the human genome comprises around 3.2 billion base pairs, so depending solely on polymerase would introduce roughly 3,200 new mutations every time the genome is copied. This would have catastrophic consequences for the stability of life over many generations, but biology closes this gap through a multi-layered proofreading and repair system. First, the polymerase itself has a ‘delete’ function (its exonuclease proofreading activity) whereby it can sense a mismatch, back up and correct it. Secondary systems, such as the MutS mismatch repair complex, then scan the DNA afterwards to detect any rare mistakes that have slipped through the first net. This combined effort brings the final error rate down to approximately one in a billion (10⁻⁹), or only a few errors per genome copy, which is reliable enough to maintain the blueprint of a human being.
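The arithmetic behind these error rates can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted above (a 3.2 billion base pair genome and error rates of one in a million and one in a billion); the function name `expected_errors` is hypothetical.

```python
GENOME_BP = 3_200_000_000  # approximate human genome size quoted above


def expected_errors(one_error_per_n_bases, genome_bp=GENOME_BP):
    """Expected number of copying errors per genome copy.

    one_error_per_n_bases: e.g. 1_000_000 for an error rate of 10^-6.
    """
    return genome_bp / one_error_per_n_bases


if __name__ == "__main__":
    print("Polymerase alone (1 in 10^6):", expected_errors(1_000_000))
    print("With proofreading and repair (1 in 10^9):", expected_errors(1_000_000_000))
```

The thousandfold drop in error rate is what turns thousands of expected errors per copy into a handful.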

Q2: When it comes to coding proteins, there is an incredible amount of flexibility because the genetic code is redundant. Since most amino acids are linked to several different three-letter codons, you could theoretically write the DNA sequence for an average human protein in more ways than there are atoms in the universe. In practice, however, most of these sequences just do not work in a living cell. A major reason for this is the physical shape the RNA takes. If a sequence accidentally folds into a tight hairpin or a complex secondary structure, the cellular machinery gets physically stuck, much like a zipper hitting a snag in fabric. There are also issues with sequences having extreme GC content, which makes them too unstable or difficult for the cell to handle. Plus, cells have internal “cleavage rules” where they recognize certain patterns as signals to chop up the genetic instructions before they can even be translated. So, while the number of theoretical options is astronomically large, the actual biological grammar needed to express a protein is much more restrictive.
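To get a feel for how quickly the number of synonymous DNA sequences grows, the sketch below multiplies the codon counts of the standard genetic code for each residue of a protein. The function names are hypothetical, and the 400-residue test sequence is an arbitrary stand-in, not a real protein.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNT = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def synonymous_sequences(protein):
    """Count the distinct DNA sequences that encode the given protein."""
    count = 1
    for residue in protein.upper():
        count *= CODON_COUNT[residue]
    return count


def log10_synonymous_sequences(protein):
    """Order of magnitude of the count (Python integers do not overflow)."""
    return math.log10(synonymous_sequences(protein))


if __name__ == "__main__":
    print(synonymous_sequences("MLSR"))  # 1 * 6 * 6 * 6
    # Arbitrary 400-residue stand-in: each of the 20 amino acids, 20 times.
    stand_in = "".join(CODON_COUNT) * 20
    print(f"about 10^{log10_synonymous_sequences(stand_in):.0f} sequences")
```

For comparison, the number of atoms in the observable universe is commonly estimated at around 10^80, so even this modest stand-in exceeds it by many orders of magnitude.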

Questions from Dr. LeProust

Q1: The standard approach is the phosphoramidite method, which follows a four-step cycle. It starts with coupling the phosphoramidite to the chain, followed by capping any unreacted sites to prevent errors. The link is then oxidized to stabilize it, and finally, the growing chain is deblocked to prepare it for the next nucleotide addition.

Q2: The main issue is the cumulative effect of coupling efficiency. Even with a very high success rate for each step, small errors add up quickly over many cycles. By the time you reach 200 nucleotides, these compounding errors and the accumulation of truncated or incorrect sequences make it nearly impossible to retrieve a pure, full-length product.
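The compounding effect can be sketched as a simple power law: if each cycle succeeds with probability p, the fraction of chains that are still full length after n cycles is p^n. The per-step efficiencies used below (99% and 99.5%) are illustrative assumptions, not measured values, and `full_length_yield` is a hypothetical helper name.

```python
def full_length_yield(coupling_efficiency, cycles):
    """Fraction of chains that complete every cycle without a failed coupling."""
    return coupling_efficiency ** cycles


if __name__ == "__main__":
    for efficiency in (0.99, 0.995):  # assumed per-step efficiencies
        for length in (200, 2000):
            fraction = full_length_yield(efficiency, length)
            print(f"efficiency={efficiency:.3f}, cycles={length}: {fraction:.2e}")
```

Even at an assumed 99% per step, only about 13% of chains survive 200 cycles intact, and essentially none survive 2000, which is why long genes are assembled from shorter fragments.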

Q3: Synthesizing a 2000bp gene directly would require 2000 consecutive coupling cycles without a single mistake, which is chemically unrealistic with current technology. The yield of the correct full-length molecule would be effectively zero. Beyond the chemistry, the sheer cost and the buildup of chemical damage over such a long process make it much more practical to assemble smaller fragments rather than trying to print the whole gene at once.

Questions from George Church

The ten amino acids that are generally considered to be essential for animals are: arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan and valine. They are classified as essential because animals cannot synthesise sufficient quantities of their carbon skeletons, meaning they must be obtained through diet or from symbiotic relationships with microbes.

The ‘lysine contingency’ builds on the fact that animals cannot produce lysine independently. From an evolutionary perspective, this seems less like a biological flaw and more like a way for ecosystems to create a reliance between different species. Our specific need for this amino acid has shaped the world as we know it, creating massive agricultural systems and complex food webs that would not exist if we could produce lysine ourselves. For example, the industrial production of lysine for livestock feed is a significant global enterprise centred on optimising animal growth. Without this dependence, much of that economic and agricultural infrastructure might not exist, and we might not have moved towards such extensive farming practices. I wonder if, over millions of years, animals became dependent on lysine as a kind of self-imposed evolutionary trade-off. Perhaps it was once non-essential, but because it was so abundant in the environment, our ancestors eventually ‘turned off’ the expensive metabolic machinery needed to produce it. In that sense, what we call a contingency is really just nature’s efficient way of outsourcing production to the surrounding environment.

References:

https://www.ncbi.nlm.nih.gov/books/NBK557845/

https://www.ncbi.nlm.nih.gov/books/NBK234922/


Week 2 HW: DNA Read Write and Edit

✨Part 1: Benchling & In silico Gel Art

I simulated a restriction digest on λ DNA in Benchling using the enzymes EcoRI, HindIII, BamHI, EcoRV, and KpnI. By comparing the band patterns, I could visualize how different enzymes cut the DNA into fragments of varying sizes. This simulation helped me understand how we verify DNA fingerprints before moving to synthesis.

Week 2 Part 1
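The idea behind the simulated digest can be sketched in a few lines of Python. This is not how Benchling works internally; it is a toy model that finds each enzyme's recognition site in a linear sequence, cuts the top strand at the standard position, and reports fragment lengths. The toy sequence and the names `SITES` and `digest` are made up for illustration.

```python
import re

# Recognition site and top-strand cut offset (bases from the start of the site).
SITES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "KpnI": ("GGTACC", 5),     # GGTAC^C
}


def digest(sequence, enzyme):
    """Return fragment lengths from cutting a linear sequence with one enzyme."""
    site, offset = SITES[enzyme]
    sequence = sequence.upper()
    # A lookahead finds every occurrence, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAAAGAATTCCCCCCGATATCTTT"  # hypothetical 24-base sequence
    for name in SITES:
        print(name, digest(toy, name))
```

On a gel, each fragment length would appear as one band, so enzymes with different site positions give different band patterns for the same DNA.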

✨ Part 3: DNA Design Challenge

3.1. Choose your protein

My Choice: For this assignment, I chose Lysostaphin, a glycylglycine endopeptidase enzyme. This protein is naturally produced by Staphylococcus simulans to kill rival bacteria.

Why I chose it:

My background is in dentistry and tissue engineering, where peri-implantitis (infection around dental implants) is a critical failure mode. These infections are often caused by methicillin-resistant Staphylococcus aureus (MRSA) forming biofilms on the titanium surface. Lysostaphin is capable of slicing through the cell wall of S. aureus by cleaving the pentaglycine cross-bridges in its peptidoglycan, destroying the biofilm effectively where traditional antibiotics fail. It represents a potential “biological scalpel” for saving failing implants.

Sequence:

Using UniProt, I obtained the amino acid sequence for Lysostaphin:

>sp|P10547|LSTP_STASI Lysostaphin OS=Staphylococcus simulans OX=1286 GN=lss PE=1 SV=2
MKKTKNNYYTRPLAIGLSTFALASIVYGGIQNETHASEKSNMDVSKKVAEVETSKAPVEN
TAEVETSKAPVENTAEVETSKAPVENTAEVETSKAPVENTAEVETSKAPVENTAEVETSK
APVENTAEVETSKAPVENTAEVETSKAPVENTAEVETSKAPVENTAEVETSKAPVENTAE
VETSKAPVENTAEVETSKAPVENTAEVETSKAPVENTAEVETSKAPVENTAEVETSKALV
QNRTALRAATHEHSAQWLNNYKKGYGYGPYPLGINGGMHYGVDFFMNIGTPVKAISSGKI
VEAGWSNYGGGNQIGLIENDGVHRQWYMHLSKYNVKVGDYVKAGQIIGWSGSTGYSTAPH
LHFQRMVNSFSNSTAQDPMPFLKSAGYGKAGGTVTPTPNTGWKTNKYGTLYKSESASFTP
NTDIITRTTGPFRSMPQSGVLKAGQTIHYDEVMKQDGHVWVGYTGNSGQRIYLPVRTWNK
STNTLGVLWGTIK

3.2. Reverse Translate

Using the online resource at https://www.bioinformatics.org/, I converted the amino acid sequence of the Lysostaphin protein (taken from https://www.uniprot.org) back into a potential DNA sequence. This runs the Central Dogma of Molecular Biology, which outlines the flow of genetic information from DNA to RNA and finally to protein, in reverse. Because most amino acids can be encoded by several codons, the tool picks the most likely codon for each residue and so produces one of many nucleotide sequences capable of encoding that specific protein.


Converted Sequence:

reverse translation of Untitled to a 1479 base sequence of most likely codons.
atgaaaaaaaccaaaaacaactattatacccgcccgctggcgattggcctgagcaccttt
gcgctggcgagcattgtgtatggcggcattcagaacgaaacccatgcgagcgaaaaaagc
aacatggatgtgagcaaaaaagtggcggaagtggaaaccagcaaagcgccggtggaaaac
accgcggaagtggaaaccagcaaagcgccggtggaaaacaccgcggaagtggaaaccagc
aaagcgccggtggaaaacaccgcggaagtggaaaccagcaaagcgccggtggaaaacacc
gcggaagtggaaaccagcaaagcgccggtggaaaacaccgcggaagtggaaaccagcaaa
gcgccggtggaaaacaccgcggaagtggaaaccagcaaagcgccggtggaaaacaccgcg
gaagtggaaaccagcaaagcgccggtggaaaacaccgcggaagtggaaaccagcaaagcg
ccggtggaaaacaccgcggaagtggaaaccagcaaagcgccggtggaaaacaccgcggaa
gtggaaaccagcaaagcgccggtggaaaacaccgcggaagtggaaaccagcaaagcgccg
gtggaaaacaccgcggaagtggaaaccagcaaagcgccggtggaaaacaccgcggaagtg
gaaaccagcaaagcgccggtggaaaacaccgcggaagtggaaaccagcaaagcgctggtg
cagaaccgcaccgcgctgcgcgcggcgacccatgaacatagcgcgcagtggctgaacaac
tataaaaaaggctatggctatggcccgtatccgctgggcattaacggcggcatgcattat
ggcgtggatttttttatgaacattggcaccccggtgaaagcgattagcagcggcaaaatt
gtggaagcgggctggagcaactatggcggcggcaaccagattggcctgattgaaaacgat
ggcgtgcatcgccagtggtatatgcatctgagcaaatataacgtgaaagtgggcgattat
gtgaaagcgggccagattattggctggagcggcagcaccggctatagcaccgcgccgcat
ctgcattttcagcgcatggtgaacagctttagcaacagcaccgcgcaggatccgatgccg
tttctgaaaagcgcgggctatggcaaagcgggcggcaccgtgaccccgaccccgaacacc
ggctggaaaaccaacaaatatggcaccctgtataaaagcgaaagcgcgagctttaccccg
aacaccgatattattacccgcaccaccggcccgtttcgcagcatgccgcagagcggcgtg
ctgaaagcgggccagaccattcattatgatgaagtgatgaaacaggatggccatgtgtgg
gtgggctataccggcaacagcggccagcgcatttatctgccggtgcgcacctggaacaaa
agcaccaacaccctgggcgtgctgtggggcaccattaaa
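The mapping the tool applied can be approximated with a small lookup table. The sketch below is not the tool's own code: the codon chosen for each amino acid is simply read off the converted sequence above, so it covers only the 19 amino acids that occur in Lysostaphin (cysteine does not appear and is left out). The names `MOST_LIKELY_CODON`, `reverse_translate`, and `translate` are hypothetical.

```python
# One codon per amino acid, as observed in the converted sequence above.
MOST_LIKELY_CODON = {
    "A": "gcg", "D": "gat", "E": "gaa", "F": "ttt", "G": "ggc",
    "H": "cat", "I": "att", "K": "aaa", "L": "ctg", "M": "atg",
    "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc", "S": "agc",
    "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}

# Inverse table, used only to confirm that the mapping round-trips.
CODON_TO_AMINO_ACID = {codon: aa for aa, codon in MOST_LIKELY_CODON.items()}


def reverse_translate(protein):
    """Replace each residue with its single most likely codon."""
    return "".join(MOST_LIKELY_CODON[residue] for residue in protein.upper())


def translate(dna):
    """Translate DNA that uses only the codons in the table above."""
    dna = dna.lower()
    return "".join(CODON_TO_AMINO_ACID[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    start = "MKKTKNNYYTRPLAIGLSTF"  # first 20 residues of the UniProt sequence
    dna = reverse_translate(start)
    print(dna)
    print(translate(dna) == start)
```

Because every residue always gets the same codon, repetitive protein regions become exactly repetitive DNA, which is visible in the long repeated stretch of the converted sequence.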

3.3. Codon optimization

  1. Why do we optimize codons?

I need to ensure my DNA “reads” fluently in the host organism. If the codons are rare in the host, protein production will stall. Optimization replaces these rare codons with the host’s preferred ones without changing the final protein structure.
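To check that optimization really leaves the protein unchanged, both DNA versions can be translated and compared. The sketch below uses the standard genetic code and only the first nine codons of each sequence from this page (the reverse-translated one and the optimized coding sequence in Part 4); the function names are my own.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(dna: str) -> list[str]:
    """Split a DNA string into upper-case triplets."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(dna))


original = "atgaaaaaaaccaaaaacaactattat"    # start of the reverse-translated sequence
optimized = "ATGAAAAAAACGAAAAACAATTACTAT"   # start of the optimized coding sequence

changed = sum(a != b for a, b in zip(codons(original), codons(optimized)))
print(translate(original), translate(optimized), changed)  # MKKTKNNYY MKKTKNNYY 3
```

Three of the nine codons differ, but both versions translate to the same amino acids.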

  1. Which organism did you choose and why?

I chose Escherichia coli (E. coli) for codon optimization. While my final application is for dental patients, E. coli is the industrial standard for manufacturing proteins. By optimizing for E. coli, I can grow large vats of bacteria, induce them to produce Lysostaphin, and then purify the enzyme to be applied as a dental gel or coating for implants.

Optimization Result:

pic3 pic3

3.4. You have a sequence! Now what?

Now that I have the optimized DNA sequence, the goal is recombinant protein production to create a therapeutic solution for peri-implantitis.

  1. Cloning

I will insert the optimized Lysostaphin gene into an expression vector (plasmid). This plasmid acts as the delivery vehicle, containing a strong promoter that signals the host cell to begin producing the protein.

  1. Transformation

The recombinant plasmid is introduced into E. coli cells by transformation (for example, heat shock), which lets the bacteria take up the foreign DNA and maintain it.

  1. Expression

The bacteria act as biological factories, following the Central Dogma of Molecular Biology. The E. coli cells transcribe the optimized DNA into mRNA, which is then translated into the Lysostaphin protein. Because the codons were optimized for E. coli (K12), translation should be efficient and give a high protein yield.

  1. Purification

Finally, I will extract the protein from the bacterial culture. Through a series of filtration and chromatography steps, the Lysostaphin is isolated from other bacterial proteins. The result is a pure protein that can be formulated into a bioactive gel designed to target and eliminate Staphylococcus biofilms in patients with peri-implantitis.

3.5. [Optional] How does it work in nature/biological systems?

  1. Describe how a single gene codes for multiple proteins at the transcriptional level.

A gene is first transcribed into a long RNA molecule called pre-mRNA. This pre-mRNA contains both coding regions (exons) and non-coding regions (introns). Through a process called Alternative Splicing, the cell can cut out the introns and stitch the exons together in different combinations. Just like editing a movie scene in different ways, different combinations of exons create different final mRNA molecules. $$\text{Different mRNA variants} \rightarrow \text{Different proteins}$$ Because the mRNA sequence changes, the resulting amino acid sequence changes too. This allows a single gene to code for multiple different protein isoforms, maximizing the efficiency of the genome.

  1. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!

In nature, the enzyme RNA Polymerase reads the DNA template strand and synthesizes a single-stranded RNA molecule based on base complementarity.

• A pairs with U (Uracil replaces Thymine in RNA).

• T pairs with A.

• G pairs with C.

• C pairs with G.

After transcription, the Ribosome reads the mRNA in groups of three nucleotides called codons. Each codon corresponds to one specific amino acid.

Alignment for Lysostaphin (Start of Sequence):

Here is the flow of information for the first 6 amino acids of my Lysostaphin protein (MKKTKN…), using the first six codons of the reverse-translated sequence from above.

• DNA (Coding Strand): ATG AAA AAA ACC AAA AAC

• mRNA (Transcription): AUG AAA AAA ACC AAA AAC

• Protein (Translation): M K K T K N

Key:

• M (Methionine): The “Start” signal.

• K (Lysine): Positively charged.

• T (Threonine): A polar amino acid.

• N (Asparagine): Polar, uncharged.
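The same DNA → RNA → protein flow can be written as a short Python sketch. It starts from the first six codons of the reverse-translated sequence in section 3.2 (atg aaa aaa acc aaa aac), derives the template strand, applies the pairing rules listed above, and translates with the standard genetic code. The function names are my own and the example ignores everything a real cell does beyond base pairing.

```python
from itertools import product

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # pairing rules from the text

# Standard genetic code in RNA letters, codons ordered UUU, UUC, UUA, UUG, UCU, ...
RNA_BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(RNA_BASES, repeat=3), AMINO)}


def template_strand(coding: str) -> str:
    """Template strand written base-by-base under the coding strand (3'->5')."""
    return "".join(DNA_COMPLEMENT[b] for b in coding)


def transcribe(template: str) -> str:
    """RNA polymerase pairs each template base with its RNA partner."""
    return "".join(TEMPLATE_TO_RNA[b] for b in template)


def translate(mrna: str) -> str:
    """The ribosome reads the mRNA three bases (one codon) at a time."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


coding = "ATGAAAAAAACCAAAAAC"
mrna = transcribe(template_strand(coding))
print(mrna)             # AUGAAAAAAACCAAAAAC
print(translate(mrna))  # MKKTKN
```

The mRNA ends up identical to the coding strand with U in place of T, which is why the coding strand is the one usually written down.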

Part 4: Prepare a Twist DNA Synthesis Order

I created a new sequence in Benchling named Lysostaphin_e.coli. I combined my optimized gene with the standard parts required for E. coli expression:

Promoter (BBa_J23106):

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (BBa_B0034):

CATTAAAGAGGAGAAAGGTACC

Start Codon:

ATG

Coding Sequence:

ATGAAAAAAACGAAAAACAATTACTATACCCGCCCGCTGGCCATTGGCCTGAGCACTTTTGCGCTGGCGAGCATCGTGTACGGCGGCATTCAGAACGAAACCCATGCGAGCGAAAAAAGCAATATGGATGTAAGCAAAAAAGTGGCGGAAGTTGAAACCAGCAAAGCGCCGGTCGAAAACACCGCGGAAGTGGAAACTAGCAAAGCGCCGGTCGAAAACACCGCCGAAGTGGAAACCAGCAAAGCGCCGGTTGAAAACACCGCCGAAGTGGAGACCAGCAAAGCGCCGGTGGAAAATACCGCCGAAGTAGAAACCAGCAAAGCCCCGGTGGAAAATACCGCGGAAGTGGAGACCTCAAAAGCGCCGGTTGAAAACACCGCGGAAGTGGAAACGAGCAAAGCACCGGTGGAGAATACCGCGGAAGTGGAAACCAGCAAAGCGCCGGTGGAAAATACCGCGGAAGTGGAAACGAGCAAAGCCCCAGTTGAAAATACGGCCGAGGTGGAAACCAGCAAAGCGCCGGTGGAAAACACCGCCGAAGTTGAAACCTCCAAAGCCCCGGTTGAAAATACCGCGGAAGTAGAAACCTCGAAAGCACCGGTGGAAAACACCGCCGAAGTGGAAACCTCAAAAGCCCCGGTGGAAAACACCGCGGAAGTTGAAACCTCTAAAGCGCCGGTGGAAAATACGGCGGAAGTGGAAACCAGCAAAGCCCTGGTCCAGAACCGCACCGCGCTGCGCGCGGCAACCCATGAACATAGCGCGCAGTGGCTGAATAACTACAAAAAAGGCTATGGCTATGGCCCGTATCCGCTGGGCATTAATGGCGGCATGCATTATGGTGTCGACTTTTTCATGAACATCGGCACCCCGGTTAAAGCGATTTCGAGCGGTAAAATCGTGGAAGCCGGCTGGAGCAACTACGGCGGCGGCAACCAGATTGGTCTGATTGAAAATGATGGCGTGCATCGCCAGTGGTACATGCATCTGAGCAAATACAACGTCAAAGTGGGTGATTATGTGAAAGCAGGTCAGATTATTGGCTGGAGCGGCAGCACCGGCTACAGCACCGCACCGCACCTGCATTTCCAGCGTATGGTGAATAGCTTCAGCAATAGCACCGCGCAGGATCCGATGCCGTTTCTGAAATCAGCGGGCTATGGCAAAGCGGGCGGCACCGTGACCCCGACCCCGAATACCGGCTGGAAAACCAACAAATATGGCACCCTGTATAAAAGCGAAAGCGCGAGCTTTACCCCGAACACCGATATCATTACCCGCACCACCGGCCCGTTCCGCAGCATGCCGCAGTCAGGCGTGCTGAAAGCGGGCCAGACCATTCATTATGATGAAGTGATGAAACAGGATGGCCATGTGTGGGTGGGTTATACCGGCAACTCGGGCCAGCGCATCTACCTGCCGGTGCGCACCTGGAACAAAAGCACCAACACCCTGGGTGTACTGTGGGGTACCATTAAA

7x His Tag:

CATCACCATCACCATCATCAC

Stop Codon:

TAA

Terminator (BBa_B0015):

CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
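Before ordering, it can help to assemble the parts in code and sanity-check the open reading frame. The sketch below joins the parts in the order listed above; `CDS_STUB` is only the first six codons of the coding sequence, used to keep the example short, and `check_orf` is a hypothetical helper of my own. Note that the listed coding sequence already begins with ATG, so a literal concatenation with the separate start codon begins with two ATG codons.

```python
PROMOTER = "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"   # BBa_J23106
RBS = "CATTAAAGAGGAGAAAGGTACC"                     # BBa_B0034
START = "ATG"
CDS_STUB = "ATGAAAAAAACGAAAAAC"                    # first 6 codons only (illustrative)
HIS_TAG = "CATCACCATCACCATCATCAC"                  # 7 x His
STOP = "TAA"
TERMINATOR = (                                     # BBa_B0015
    "CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTG"
    "TCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA"
)

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_orf(orf: str) -> list[str]:
    """Return a list of problems found in an open reading frame (empty if none)."""
    problems = []
    triplets = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if len(orf) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if triplets[0] != "ATG":
        problems.append("does not start with ATG")
    if triplets[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(t in STOP_CODONS for t in triplets[:-1]):
        problems.append("premature stop codon")
    return problems


orf = START + CDS_STUB + HIS_TAG + STOP
cassette = PROMOTER + RBS + orf + TERMINATOR
print(len(orf), check_orf(orf))  # 45 []
```

With the full coding sequence in place of the stub, the same check would flag a frame shift or an accidental internal stop codon before the order is placed.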

pic4 pic4



pic5 pic5

Part 5: DNA Read/Write/Edit

5.1 DNA Read

What to Read: I would sequence the biofilm microbiome found in the pockets of failing dental implants.

Why:

Current treatment for peri-implantitis is often “blind” mechanical cleaning. By sequencing the DNA of the infection site, we can identify exactly which pathogens are present (e.g., P. gingivalis vs. S. aureus) and detect whether they carry antibiotic resistance genes (ARGs). This allows for precision dentistry: choosing the right treatment rather than guessing.

Technology:

I would use the Oxford Nanopore MinION.

• Reason: It is portable and rapid. I could theoretically bring it into a dental clinic, swab an implant, and get sequencing data in real time to guide surgery.

• Process: Extract DNA from plaque → Load into MinION → Nanopore reads electrical signals of DNA strands → Output is the pathogen profile.

5.2 DNA Write

What to Write:

I want to synthesize the gene for Lysostaphin (as designed in Part 3).

Why:

Nature provided S. simulans with this weapon, but we need to mass-produce it to use it as a medicine. By writing (synthesizing) this DNA, we can create a pure, high-concentration anti-biofilm agent that dissolves the cell walls of MRSA, saving titanium implants that would otherwise need to be removed.

Technology:

I would use Twist Bioscience silicon-based synthesis.

• Reason: It allows me to order the exact “Expression Cassette” I designed, ensuring the sequence is perfect for my E. coli factories.

5.3 DNA Edit

What to Edit:

I would use CRISPR to edit commensal oral bacteria (like Streptococcus salivarius) to naturally secrete Lysostaphin.

Why:

Instead of applying a gel, we could introduce “guardian bacteria” into the patient’s mouth. These edited bacteria would live on the gums and constantly produce small amounts of Lysostaphin, preventing the dangerous S. aureus from ever forming a biofilm on the implant in the first place.

Week 3 HW: Lab Automation

Week 3 – Lab Automation

✨ Week 3 - Homework ✨

You can view my Automation Art design here: Opentrons Art Link

After creating this shell pattern using Opentrons Art, I duplicated the provided Colab notebook to develop a Python protocol. To program the Opentrons robot to physically recreate the artwork on a plate, I systematically entered the coordinate data from my design step-by-step into the script. Once the protocol was complete, it successfully generated the images shown below.

Digital Shell Design

Digital Design Digital Design

✨ Post-Lab Questions ✨

1) Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Article Title: An Automation Workflow for High-Throughput Manufacturing and Analysis of Scaffold-Supported 3D Tissue Arrays

Authors: Ruonan Cao, Nancy T. Li, Simon Latour, Jose L. Cadavid, Cassidy M. Tan, Ari Forman, Hartland W. Jackson, Alison P. McGuigan

Year: 2023

DOI: 10.1002/adhm.202202422

Article 1

This paper tackles a real bottleneck in advanced 3D culture: patient-derived organoids and complex co-cultures are powerful, but hard to scale and hard to analyze at single-cell resolution when manufacturing and handling are manual. The authors focus on the SPOT platform (Scaffold-supported Platform for Organoid-based Tissues), which generates flat, thin, dimensionally controlled microtissues in 96- and 384-well plate formats compatible with longitudinal imaging—yet historically limited by manual fabrication.

What’s automated with Opentrons OT-2 (and what makes it novel):

  • Automated 3D microtissue manufacturing (seeding): They use the Opentrons OT-2 to dispense a cell–gel mixture into 96/384-SPOT plates and optimize the process so that automated manufacturing is comparable in consistency to manual seeding.

  • Temperature control + custom hardware for reliability: Two temperature modules set to 4 °C keep the SPOT plate and cell-gel cold during seeding, and a custom aluminum plate improves support and heat conduction for the thin plate—showing how automation often needs small mechanical/thermal design choices to work well.

  • Automation beyond seeding (screening + single-cell endpoints): The OT-2 also supports drug/reagent addition and culture maintenance, and it automates gel digestion to recover single cells for high-throughput flow cytometry.

  • Multiplexed CyTOF enablement: A particularly strong “novel application” angle is that OT-2 is used to generate a barcode master plate and automate parts of the barcoding/washing/pooling workflow to reduce manual errors—enabling scalable CyTOF proteomic readouts.

  • Proof-of-value biology: They generate 3D complex tissues with different tumor/stromal ratios and show the workflow can incorporate primary patient-derived organoids, supporting scalable, patient-relevant screening and analysis.


2) Write a description about what you intend to do with automation tools for your final project.

Across all three ideas, the Opentrons OT-2 is my core “execution engine” for repeatable, programmable liquid handling—reducing variability, scaling to multi-sample workflows, and producing clean run logs/plate maps. Where formats aren’t standard labware, I’d use custom 3D-printed holders.

Idea 1 — Automated Seeding of Patient-Specific Bone Scaffolds

Goal: Improve cell distribution and viability deep inside porous bone scaffolds by replacing static pipetting with automated, repeatable dynamic “drip-seeding.”

What I would automate on the OT-2

  • A custom 3D-printed scaffold holder mounting multiple scaffolds on the OT-2 deck.
  • A timed protocol dispensing cell suspension (e.g., MSCs) + osteogenic media cues across scaffolds in multi-pass patterns.
  • Optional scheduled media refresh + standardized sampling for assays.

Example pseudocode (conceptual)

# Conceptual workflow: dynamic drip-seeding across multiple scaffolds
# (helper functions and parameters are placeholders, not real Opentrons API calls)
scaffolds = load_custom_holder(num_scaffolds=8)
cell_source = reservoir("MSC_suspension")
media_source = reservoir("osteogenic_media")

# Seed in several passes so cells can settle between rounds
for seed_round in range(N_seed_rounds):
    for scaf in scaffolds:
        drip_dispense(cell_source, scaf, volume=V_cell, pattern="multi-point")
    wait(minutes=settle_time)

# Culture phase: refresh media on every scaffold and log each day
for day in culture_days:
    for scaf in scaffolds:
        exchange_media(scaf, media_source, volume=V_media)
    log_run(day)

Idea 2 — Anti-Biofilm “Guardian Bacteria” (high-throughput screening)

Goal: Run a high-throughput anti-biofilm screen on titanium-relevant surfaces using a plate-based assay format.

What I would automate on the OT-2

  • A 96-well screening layout (controls + variants + replicates).
  • Automated mixing, dispensing, wash steps, and readout reagent handling.
  • Standardized timing + plate map + run log for comparability.

Example pseudocode (conceptual)

# Conceptual workflow: plate-based biofilm screen automation
plate = load_labware("96_well_plate")
variants = load_conditions("variant_plate_map.csv")

for well, condition in variants.items():
    dispense_inoculum(plate[well], condition)

incubate(plate, hours=incubation_time)

for wash in range(n_washes):
    wash_plate(plate)

apply_readout_reagent(plate)   # e.g., stain
final_wash_and_dispense(plate)
export_plate_map_and_log()

Idea 3 — Bioprinted Tooth-on-a-Chip Biosensor (automated long-term culture + exposure)

Goal: Improve dental material testing realism using a chip that includes a dentin barrier + engineered reporter pulp cells for real-time toxicity/biocompatibility readouts.

What I would automate on the OT-2

  • Daily/recurring media exchange across multiple chips.
  • Controlled dosing/exposure scheduling for different materials.
  • Optional sampling workflow into plates for downstream measurements.

Example pseudocode (conceptual)

# Conceptual workflow: chip maintenance + condition dosing
chips = load_custom_chip_holder(n=6)
media = reservoir("pulp_media")
test_conditions = load_conditions("cement_conditions.csv")

for day in range(total_days):
    for chip in chips:
        media_exchange(chip, media)

    if day in dosing_days:
        for chip in chips:
            condition = test_conditions[chip.id]
            dispense_condition(chip, condition)

    log_run(day)

✨ Final Project Ideas ✨

Description 1 | Description 2 | Description 3

Week 4 HW: Protein Design

WEEK 4 — PROTEIN DESIGN PART I

✨ Part A. Conceptual Questions ✨


1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Let’s walk through the math by looking directly at the weight. We know an average amino acid is about 100 Daltons. If we convert that to grams, one single Dalton is an incredibly tiny fraction of a gram (about 1.66 × 10⁻²⁴ g). That means our single 100-Dalton amino acid weighs roughly 1.66 × 10⁻²² grams. If we have a 500-gram piece of meat and we pretend for a second that it is 100% pure protein, we just divide the total weight by the weight of one molecule. So, 500 g divided by 1.66 × 10⁻²² g/molecule gives us roughly 3.01 × 10²⁴ amino acid molecules.
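The same arithmetic can be checked in Python in two ways: through the mass of one molecule (as above) and through moles and Avogadro's number. Both assume, as the text does, that the meat is pure protein made of 100 Da amino acids.

```python
AVOGADRO = 6.022e23          # molecules per mole
DALTON_IN_GRAMS = 1.66e-24   # rounded value used in the text

mass_g = 500.0               # piece of meat, treated as pure protein
aa_mass_da = 100.0           # average amino acid mass (100 Da = 100 g/mol)

# Route 1: divide the total mass by the mass of one amino acid molecule.
by_mass = mass_g / (aa_mass_da * DALTON_IN_GRAMS)

# Route 2: convert grams to moles, then moles to molecules.
by_moles = mass_g / aa_mass_da * AVOGADRO

print(f"{by_mass:.2e}")   # 3.01e+24
print(f"{by_moles:.2e}")  # 3.01e+24
```

Both routes agree because one Dalton is defined so that 1 g/mol corresponds to 1 Da per molecule.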


2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

It all comes down to how digestion works. When we eat a steak or a piece of salmon, our bodies don’t absorb intact “cow proteins” or “fish proteins.” Instead, our digestive enzymes act like scissors, chopping those foreign proteins down into individual, universal amino acid building blocks. Our cells then take those generic blocks and use our own human DNA as the instruction manual to build uniquely human proteins. We steal the bricks, but we use our own blueprint.


3. Why are there only 20 natural amino acids?

The simplest reason is just how our genetic instruction manual is wired. Our DNA and mRNA use a system of 64 different three-letter codes (called codons) to tell the cell which amino acid to add next. You might think 64 codes would mean 64 different amino acids, but the system has a lot of built-in redundancy, meaning several different codons actually act as instructions for the exact same amino acid.
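The redundancy is easy to count from the standard genetic code. In the sketch below the table is built from the usual 64-letter summary string (codons ordered TTT, TTC, TTA, TTG, TCT, ...), and `*` marks a stop codon; the variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons point at each amino acid?
degeneracy = Counter(CODON_TABLE.values())
stop_codons = degeneracy.pop("*")

print(len(CODON_TABLE), len(degeneracy), stop_codons)  # 64 20 3
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1
```

So 61 codons are shared among 20 amino acids: leucine has six codons, while methionine and tryptophan have only one each.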


5. Where did amino acids come from before enzymes that make them, and before life started?

From what I understand, amino acids were formed through non-enzymatic chemistry long before life or enzymes even existed. A classic piece of evidence for this is the Miller-Urey experiment from 1953. In that experiment, scientists mixed simple gases thought to be on early Earth like methane (CH₄), ammonia (NH₃), hydrogen (H₂), and water vapor (H₂O) and used an electrical spark to simulate lightning. This setup spontaneously produced amino acids. I’ve also read that amino acids have been found inside meteorites, which suggests that the basic building blocks of life can form naturally in the universe without needing any biological enzymes to make them.


6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

I think that since D-amino acids are the exact mirror images of the natural L-amino acids we usually see in proteins, the structures they form would be mirrored, too. In nature, L-amino acids favor forming right-handed α-helices because that specific twist prevents their side chains from crashing into each other. So, if we built a chain entirely out of D-amino acids, I would expect it to naturally fold in the opposite direction, creating a left-handed α-helix to keep the structure stable.


7. Can you discover additional helices in proteins?

I think it is possible. Even though the α-helix is the most common one we learn about, I know there are already a few other known variations, like the 3₁₀-helix (which is more tightly coiled) and the π-helix (which is wider). Also, considering how fast AI tools like AlphaFold are advancing and how much imaging techniques like cryo-EM are improving, I guess we will probably discover new, unusual, or temporary helices in flexible parts of proteins that were just too hard to see or predict before.


8. Why are most molecular helices right-handed?

It ties right back to the fact that all natural proteins are made of L-amino acids. When you string L-amino acids together, twisting them into a right-handed helix naturally pushes the bulky side chains outward and away from the backbone. If they tried to twist into a left-handed helix, those side chains would crash into the backbone, making the structure highly unstable.


9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

β-sheets tend to aggregate because the backbone groups (the NH and CO atoms) on the outer edges of the sheets are exposed and can easily form new hydrogen bonds with strands from completely different protein molecules. The main driving force for this aggregation is the formation of these extensive hydrogen bonds, along with hydrophobic interactions. By stacking together, they maximize these bonds, which increases their structural stability and lowers the overall energy of the system.


10. Why do many amyloid diseases form beta-sheets? Can you use amyloid beta-sheets as materials?

Amyloid diseases form beta-sheets because when proteins misfold, they expose their backbone edges. These edges easily form hydrogen bonds with other misfolded proteins, causing them to stack together into long, repetitive beta-sheet structures called amyloid fibrils. These fibrils are thermodynamically very stable, so they aggregate into tough plaques that the body cannot easily break down, leading to disease. The same properties also make them useful as materials: because amyloid β-sheets are incredibly strong and stable, scientists are engineering synthetic versions of them to create tough biomaterials, like hydrogels and nanomaterials.


11. Design a beta-sheet motif that forms a well-ordered structure.

I think to design a stable, well-ordered beta-sheet, I would try using a sequence that perfectly alternates between hydrophobic and hydrophilic amino acids, something like Valine (hydrophobic), Serine (hydrophilic), Isoleucine (hydrophobic), and Glutamine (hydrophilic). I learned that in a beta-strand, the side chains naturally alternate pointing up and down. Because of this, an alternating sequence would force all the hydrophobic side chains to point to one face of the sheet, and all the hydrophilic ones to point to the other. That way, two of these sheets could snap together by hiding their hydrophobic faces in the middle, leaving the water-loving sides facing outward to interact with the surrounding water, making the whole structure stable.
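The alternating pattern can be checked with a small Python sketch. It assumes a simplified hydrophobic set and treats even and odd positions as the two faces of a strand; the repeat `VSIQ` is the example motif from the paragraph above, and the helper names are my own.

```python
HYDROPHOBIC = set("AVILMFWY")  # simplified grouping, an assumption for this sketch


def strand_faces(sequence: str) -> tuple[str, str]:
    """Side chains alternate sides, so even and odd positions form two faces."""
    return sequence[0::2], sequence[1::2]


def is_alternating(sequence: str) -> bool:
    """True if one face is fully hydrophobic and the other fully hydrophilic."""
    face_a, face_b = strand_faces(sequence)
    a_hydrophobic = all(res in HYDROPHOBIC for res in face_a)
    b_hydrophilic = all(res not in HYDROPHOBIC for res in face_b)
    return a_hydrophobic and b_hydrophilic


motif = "VSIQ" * 3
print(strand_faces(motif))    # ('VIVIVI', 'SQSQSQ')
print(is_alternating(motif))  # True
```

A pattern check like this says nothing about whether the strand actually folds; it only confirms that the hydrophobic residues would all land on the same face.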

✨ Part B. Protein Analysis and Visualization ✨

1. Briefly describe the protein you selected and why you selected it.

I selected the protein Lysostaphin, which is an antimicrobial enzyme naturally produced by Staphylococcus simulans. It works by cleaving the pentaglycine cross-bridges in the cell wall of Staphylococcus aureus, causing the bacteria to burst. I chose this protein because I actually used the Lysostaphin gene in a previous project I designed, so I already have a strong personal interest in it. Furthermore, because it is being heavily researched as a potential alternative to antibiotics for treating MRSA infections, it has great 3D structures available and is highly relevant to synthetic biology, making it a perfect candidate for this structural analysis.


2. Identify the amino acid sequence of your protein.

From the RCSB Protein Data Bank (https://www.rcsb.org/structure/4LXC)

I picked the best 3D structure:

• The search results list the different 3D structures that scientists have solved for this protein. Each one has a unique 4-character code called a PDB ID (like 4LXC, 4QPB, 1QWY, etc.).

• I scrolled through the results and opened 4LXC because its description said it contained the full mature enzyme, which is exactly what I wanted for this project.

I downloaded the FASTA sequence for the mature active enzyme:

>4LXC_1|Chain A|Lysostaphin|Staphylococcus simulans
AATHEHSAQWLNNYKKGYGYGPYPLGINGGMHYGVDFFMNIGTPVKAISSGKIVEAGWSNYGGGNQIGLIENDGVHRQWYMHLSKYNVKVGDYVKAGQIIGWSGSTGYSTAPHLHFQRMVNSFSNSTAQDPVKILRQVNIPWAKNRGAHSWDWSKSRNRGVNAEGFPIPASTPNGAMAVGLGGHGSSTQGSGGSGTTKPKQAPGSNGSQSGSTGGSTGGAEGGKAGGNGGNGGAWNGNGGNGGGWGKGKGK

How long is it? What is the most frequent amino acid?

Using the downloaded FASTA sequence from the RCSB PDB (ID: 4LXC) and the provided Colab notebook, the Lysostaphin sequence is 255 amino acids long. The most frequent amino acid is Glycine (G), which appears 35 times in the sequence.
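Counting length and the most frequent residue takes only a few lines with `collections.Counter`. This is a minimal sketch rather than the Colab notebook's code, and to keep it short it runs on just the first 30 residues of the FASTA sequence above, so its numbers describe that fragment and not the full protein.

```python
from collections import Counter


def summarize(sequence: str) -> tuple[int, str, int]:
    """Return (length, most frequent residue, its count)."""
    counts = Counter(sequence)
    residue, n = counts.most_common(1)[0]
    return len(sequence), residue, n


# First 30 residues of the FASTA sequence above (illustrative fragment only).
fragment = "AATHEHSAQWLNNYKKGYGYGPYPLGINGG"
print(summarize(fragment))  # (30, 'G', 6)
```

Passing the full sequence string to `summarize` gives the values for the whole protein.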

How many protein sequence homologs are there for your protein?

Using UniProt’s BLAST tool, I found 250 protein sequence homologs for Lysostaphin. The search hit the 250-result limit, representing proteins with significant sequence similarity, mostly from other Staphylococcus bacterial species.

Does your protein belong to any protein family?

Yes, based on the UniProt database, Lysostaphin belongs to the M23B metallopeptidase family. This is a family of enzymes that act as molecular scissors, using a metal ion (like Zinc) to cleave the cell walls of bacteria. This perfectly matches Lysostaphin's specific job of cutting through the protective wall of Staphylococcus aureus.

Figure B1. Lysostaphin visualization (initial view)

3. Identify the structure page of your protein in RCSB
When was the structure solved? Is it a good quality structure?

I used the structure page for Lysostaphin with the PDB ID 4LXC (https://www.rcsb.org/structure/4LXC). The structure was released on July 9, 2014. It was solved using X-ray diffraction at a resolution of 3.50 Å. Because a larger Å value means coarser detail and 3.50 Å is above the 2.70 Å benchmark, it is a lower-resolution structure, but it is still highly valuable because it captures the complete architecture of the mature enzyme.

Are there any other molecules in the solved structure apart from protein?

Yes, apart from the protein chain, the solved structure contains Zinc ions. This is important because Lysostaphin is a metalloenzyme and needs that Zinc trapped in its active site to cut the bacterial cell wall. There are also a few sulfate ions present, likely used to help crystallize the protein.

Does your protein belong to any structure classification family?

Yes. While RCSB doesn't list the older SCOP classification for this specific structure, I checked the InterPro database on the Annotations tab. It classifies the overall protein into the 'Bacterial cell wall metabolism enzyme' family. Structurally, it classifies the main cutting region as the 'M23ase, beta-sheet core domain' and the targeting region as an 'SH3-like domain.' This perfectly describes its 3D shape and its job of binding to and cutting bacterial walls.

Figure B2. Lysostaphin structure (RCSB 4LXC)

4. Open the structure of your protein in any 3D molecule visualization software:

Cartoon;

Figure B3. Cartoon representation

Ribbon;

Figure B4. Ribbon representation

Ball and Stick;

Figure B5. Ball-and-stick representation

Color by Secondary Structure

Figure B6. Colored by secondary structure

The view shows mostly yellow arrows, meaning more beta sheets than helices (lysostaphin includes a β-rich SH3b domain).

Color by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Figure B7. Colored by residue type

When colored by residue type, lysostaphin shows that polar (cyan) and charged (blue) residues are broadly distributed on the surface, while hydrophobic residues (orange) are more common in the interior/core. This pattern is typical of soluble proteins, which maintain a hydrophobic core and a hydrophilic exterior.

Visualize the surface of the protein. Does it have any ‘holes’ (aka binding pockets)?

Figure B8. Surface visualization

After switching to the surface representation, I rotated the protein and inspected it for cavities. I did not observe a clear deep tunnel-like hole, but I did see a noticeable cleft/groove between the two domains (an opening in the middle region). From the surface view alone I cannot confirm with certainty that this is a true binding pocket, but the indentation suggests a potential pocket-like substrate-binding groove.


Visualization workflow (PyMOL; lysostaphin 4LXC)

I opened the lysostaphin structure (PDB 4LXC) in PyMOL, removed solvent/extra non-protein atoms, and kept a single protein chain to make the view clear. I then visualized the protein using three standard representations: cartoon (best for overall fold), ribbon (backbone trace), and ball-and-stick (atomic detail).

Next, I colored the protein by secondary structure (helices, sheets, loops). The structure shows more β-sheets than α-helices, visible as many β-strand arrows compared with fewer helical segments. Then, I colored residues by residue type (hydrophobic vs polar vs charged). Hydrophobic residues were mainly concentrated in the interior, while polar/charged residues were enriched on the surface, which is typical for soluble proteins (hydrophobic core, hydrophilic exterior).

Finally, I visualized the surface representation and inspected it for cavities. In the surface view there appears to be an opening/cleft between the two domains, but from this visualization alone I cannot confirm with certainty whether it represents a true binding pocket (as opposed to a general surface groove). However, the indentation suggests a possible pocket-like region.

✨ Part C. Using ML-Based Protein Design Tools ✨


C1. Protein Language Modeling

1) Deep Mutational Scans

a. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
I used ESM2 to generate a deep mutational scan for Lysostaphin (PDB ID: 4LXC). The results are shown as a heatmap in which every position in the sequence is scored for each of the 20 possible amino acids.

Figure C1. Mutation scan heatmap (ESM2).

b. Can you explain any pattern? (choose a residue and a mutation that stands out)

In the heatmap, yellow represents beneficial or tolerated mutations, while dark blue represents unfavourable mutations. I noticed that some positions are mostly dark blue, which suggests they are important for the protein structure and do not tolerate changes well.

Figure C2. Overview of mutational tolerance pattern.

For the positive mutation, at position 116, changing the residue to L (Leucine) gives a high score (+6.10). This basically means the model thinks leucine fits very well there. The position is probably flexible or not very important structurally, so swapping in a hydrophobic residue like leucine doesn’t cause problems. In other words, the protein seems totally fine with this mutation.

Figure C3. Example of a tolerated/beneficial mutation (L at position 116).

For the negative mutation, at position 98, changing the residue to H (Histidine) gives a very low score (-10.36). This tells us the model really dislikes this substitution. Histidine has a bulky ring structure and can carry a charge, so forcing it into a spot where it doesn’t belong can easily disrupt the protein’s folding or stability. This suggests that position 98 is highly sensitive and does not tolerate big chemical changes.
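One common way to turn language-model probabilities into a mutation score is a log-likelihood ratio between the mutant and the wild-type residue at a position. The sketch below assumes that the heatmap scores were computed this way (an assumption, not something checked against the notebook), and the probabilities are invented for illustration rather than taken from ESM2.

```python
import math


def llr_score(probs: dict[str, float], wild_type: str, mutant: str) -> float:
    """log P(mutant) - log P(wild type) at one sequence position."""
    return math.log(probs[mutant]) - math.log(probs[wild_type])


# Made-up model probabilities at a single position whose wild-type residue is S.
toy_position = {"S": 0.02, "L": 0.60, "H": 0.0005}

favoured = llr_score(toy_position, "S", "L")     # model prefers L over S -> positive
disfavoured = llr_score(toy_position, "S", "H")  # model finds H very unlikely -> negative
print(round(favoured, 2), round(disfavoured, 2))
```

Under this scoring, a positive value means the model finds the mutant more likely than the original residue, and a strongly negative value means it finds the mutant very unlikely in that context.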

c. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.

I searched for experimental mutational scans (such as Deep Mutational Scanning datasets) for my chosen protein, Lysostaphin, but no comprehensive DMS data were readily available. Therefore, a direct one-to-one comparison between the ESM2 language model predictions and experimental heatmap results could not be performed for this specific sequence.


2) Latent Space Analysis

a. Use the provided sequence dataset to embed proteins in reduced dimensionality.

To map my protein in the latent space, I first loaded the provided SCOP sequence dataset. Since Lysostaphin (4LXC) wasn’t naturally in that database, I had to manually insert it before generating the embeddings. I created a new code block and used the Bio.Seq library to define my specific 255-letter amino acid sequence. I then packed it into a SeqRecord object with the custom ID ‘MY_LYSOSTAPHIN’ and used the .append() function to attach it to the very end of the sequences list.
Once my protein was successfully added to the dataset, I ran the entire batch through the ESM2 language model to extract the hidden state embeddings. Finally, I used 3D t-SNE dimensionality reduction to plot the coordinates and visualize how all the proteins, including mine, clustered together in the resulting latent space.

Figure C4. 3D t-SNE latent space embedding.

b. Analyze the different formed neighborhoods: do they approximate similar proteins?

Looking at the 3D t-SNE scatter plot, the proteins form a dense, cohesive 3D structure with distinct regional neighborhoods, and these neighborhoods do appear to approximate similar proteins. t-SNE projects the high-dimensional ESM2 embeddings into 3D coordinates while trying to keep similar points close together. Because sequences that share similar evolutionary motifs and folding patterns receive similar embeddings from the ESM2 model, they tend to clump together into specific neighborhoods within the larger map.

c. Place your protein in the resulting map and explain its position and similarity to its neighbors.

I plotted Lysostaphin (4LXC) within this 3D latent space. Because Lysostaphin is a highly specific metalloendopeptidase, the model likely picked up the sequence patterns associated with its M23ase beta-sheet core and its zinc-binding active site (as seen in my InterPro database search). The model therefore did not place it randomly; it mapped Lysostaphin into a neighborhood surrounded by other proteins that appear to share similar enzymatic functions and structural properties, effectively grouping it with its functional ‘relatives’.


C2. Protein Folding

a. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
When comparing the ESMFold prediction to the experimental structure (4LXC, viewed in PyMOL), the coordinates only partially match. The model successfully predicted the primary catalytic core of the enzyme, forming a tight, high-confidence beta-sheet bundle (colored blue/purple) that clearly mirrors one of the domains in the experimental structure.
However, Lysostaphin is a multi-domain protein. While the experimental structure clearly shows two distinct folded domains, the ESMFold prediction failed to fold the second domain. Instead, it predicted the rest of the sequence as low-confidence, unstructured flexible tails (colored yellow/orange). This suggests that while the model is good at predicting single stable domains, it struggled to predict the entire multi-domain architecture of this specific protein from sequence alone.

Figure C5. ESMFold prediction.

Figure C6. Experimental structure (PyMOL, 4LXC).

b. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

Step 1: The Small Mutations (Point Mutations)

To test the protein’s structural resilience to minor changes, I introduced five random point mutations into the original Lysostaphin sequence and predicted its structure again using ESMFold. As seen in the resulting model, the protein proved to be highly resilient to these small alterations. The central catalytic core (colored blue and purple, indicating high prediction confidence) remained completely intact, successfully folding into the same tightly packed beta-sheet structure as the original unmutated prediction. This demonstrates that swapping a few random amino acids does not destroy the overall structural integrity or the underlying folding instructions of the protein.

Figure C7. ESMFold after point mutations.
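
The point-mutation test above can be reproduced with a few lines of Python before pasting the result into ESMFold. This is a minimal sketch, not the code used for the homework: the function name `random_point_mutations`, the fixed seed, and the toy sequence are all illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def random_point_mutations(sequence, n_mutations, seed=0):
    """Return (mutant, labels) with n_mutations distinct positions substituted."""
    rng = random.Random(seed)  # seeded so the same mutant can be regenerated
    positions = rng.sample(range(len(sequence)), n_mutations)
    residues = list(sequence)
    labels = []
    for pos in sorted(positions):
        original = residues[pos]
        # Choose a replacement that differs from the original residue.
        replacement = rng.choice([aa for aa in AMINO_ACIDS if aa != original])
        residues[pos] = replacement
        labels.append(f"{original}{pos + 1}{replacement}")  # 1-based label, e.g. A12G
    return "".join(residues), labels

toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical toy sequence, not Lysostaphin
mutant, labels = random_point_mutations(toy_sequence, n_mutations=5, seed=42)
print(mutant)
print(labels)
```

Recording the labels makes it possible to check afterwards whether any mutation landed in the catalytic core or only in the flexible regions.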

Step 2: The Massive Deletion (Breaking the Core)

For the final test, I deleted a large segment of about 30 to 40 amino acids from the middle of the sequence to see if the structure was resilient to major changes. Interestingly, the protein did not completely unfold. The AI still managed to pack the remaining sequence into a folded core, which is shown by the high-confidence blue and purple regions.
However, because a large portion of the middle was missing, the model had to stretch the remaining sequence to bridge the gap. This created a massive, unstructured loop, which is colored yellow and orange to indicate low prediction confidence. Based on this, the protein is not resilient to large segment deletions. Even though it tried to fold the leftover pieces, the overall 3D shape is severely distorted and missing the critical structural connections it needs to function.

Figure C8. ESMFold after large deletion.
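
A large central deletion like the one described above can be generated programmatically so that the exact residues removed are on record. The sketch below is an illustration under assumptions: `delete_central_segment` is a hypothetical helper, and the 100-residue toy sequence and 35-residue deletion stand in for the real values, which the text gives only approximately.

```python
def delete_central_segment(sequence, segment_length):
    """Remove segment_length residues centred on the middle of the sequence.

    Returns the shortened sequence and the 1-based (first, last) deleted residues.
    """
    if not 0 < segment_length < len(sequence):
        raise ValueError("segment_length must be between 1 and len(sequence) - 1")
    start = (len(sequence) - segment_length) // 2  # 0-based index of first deleted residue
    end = start + segment_length                   # 0-based index just past the deletion
    return sequence[:start] + sequence[end:], (start + 1, end)

toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 5  # hypothetical 100-residue toy sequence
truncated, (first, last) = delete_central_segment(toy_sequence, 35)
print(len(truncated), first, last)
```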


C3. Protein Generation

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN
Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

Using ProteinMPNN, I inverse-folded the 4LXC backbone to generate a novel sequence. When comparing the predicted sequence to the native sequence, the sequence recovery rate was 42.28% (seq_recovery=0.4228). This indicates that the model substantially redesigned the protein, changing over half of the amino acids. Interestingly, the model assigned a better (lower) score to its generated sequence (score = 0.8426) than to the original native sequence (score = 1.6437). This score is a negative log-likelihood rather than a thermodynamic energy, so it means the model considers its own sequence more probable for this backbone than the native one; it suggests, but does not prove, that the heavily mutated sequence will fold into the target 3D backbone.

ProteinMPNN Output:

T=0.1, sample=0, score=0.8426, seq_recovery=0.4228
TPTPKCNASWLNNYPLKLPFGPAPPGLNGGIHYGVTFEMPVGTPVRAPVTGEVVFAGYDERWRGNVVVIKSDDGKTIWRYAHLSSFKVKAGDKVEAGQVIGYSGAPPPGLGPHLEFVLMEGAFSDENAIDPMPFLEACGLGQPPAAPPPEPGDGWKVDADGTRWREKTFTFTPNKDLVLRKNAPKASEPVAGVLKAGEAVTAYKEYKYDGHLWIQFKDANGNLVYLPIADYNAETNTWGPLYGTFT
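
To make the two numbers in the output header concrete, the sketch below parses a header line of the form shown above and recomputes sequence recovery as the fraction of identical positions. The helper names are hypothetical, the toy sequences are made up, and ProteinMPNN itself may compute recovery only over the positions it was asked to design, so treat this as an approximation of the idea.

```python
def parse_mpnn_header(header):
    """Parse 'key=value' pairs from a ProteinMPNN-style header line into floats."""
    fields = {}
    for item in header.lstrip(">").split(","):
        key, _, value = item.strip().partition("=")
        if value:
            fields[key] = float(value)
    return fields

def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

header = "T=0.1, sample=0, score=0.8426, seq_recovery=0.4228"
fields = parse_mpnn_header(header)
print(fields["score"], fields["seq_recovery"])

# Toy sequences (hypothetical): 3 of 8 positions match.
print(sequence_recovery("ACDEFGHI", "ACDKLMNP"))
```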

Input this sequence into ESMFold and compare the predicted structure to your original.

Structural Comparison using ESMFold:

Finally, I inputted the novel ProteinMPNN-generated sequence back into ESMFold to predict its 3D structure and compared it to the original native prediction. The result was highly successful. Despite the sequence being only 42.28% identical to the original, ESMFold predicted that it would fold into the target topology. In fact, the newly generated sequence produced a tightly packed, highly compact globular structure with clear, well-defined secondary structures (prominent beta-sheets). It maintained the core architecture of the original protein while appearing to eliminate some of the looser, unstructured regions seen in the wild-type prediction. This supports the idea that the generative model learned the structural grammar required to design a novel sequence for a specific 3D fold, although only a wet-lab experiment could confirm that the sequence really folds this way.

Figure C9. ESMFold prediction for ProteinMPNN-generated sequence.


✨ Part D. Group Brainstorm on Bacteriophage Engineering ✨


1. My Chosen Goal: Increased Stability of the L Protein

For my project, I decided to tackle the “easiest” but arguably most foundational goal: increasing the thermodynamic stability of the bacteriophage L (lysis) protein.
While engineering higher toxicity or meddling with host interactions (like DnaJ) sounds exciting, none of that matters if the phage degrades on a shelf or misfolds during assembly. Phages hold massive potential as an alternative to antibiotics (phage therapy), but to be used as medicine, their proteins need to be tough enough to survive manufacturing, storage, and the human body. By computationally stabilizing the L protein, I aim to ensure the phage can reliably assemble and survive until it is time to punch a hole in the E. coli membrane.


2. Proposed Computational Tools and Workflow

To achieve this, I plan to use the computational inverse-folding pipeline explored in recitation, specifically relying on ESMFold and ProteinMPNN.

  • Step 1: Baseline Structure Prediction (ESMFold): First, I will take the wild-type (natural) amino acid sequence of the L protein and run it through ESMFold. This gives me my baseline 3D structural backbone. I need to see exactly how nature folds this protein before I try to improve it.
  • Step 2: Inverse-Folding for Stability (ProteinMPNN): Next, I will strip away the wild-type amino acid letters, keeping only the 3D backbone coordinates. I will feed this empty 3D skeleton into ProteinMPNN and ask the model to generate a batch of novel sequence candidates that fit this exact shape. My primary filtering metric will be the negative log-likelihood score; I am looking for sequences that ProteinMPNN scores lower (better) than the wild-type sequence, which I will treat as a proxy for better compatibility with the backbone and, hopefully, higher thermodynamic stability.
  • Step 3: In Silico Validation (ESMFold & AlphaFold-Multimer): I can’t just trust ProteinMPNN blindly. I will take my top newly generated, highly stable sequences and feed them back into ESMFold. If the AI-generated sequence successfully folds back into the original L protein shape with high confidence (high pLDDT scores), I know I have a viable candidate. If I have extra compute time, I might also run it through AlphaFold-Multimer to ensure the stabilized protein doesn’t accidentally block its own ability to form complexes.

3. Why I Think These Tools Will Solve the Problem

These tools are perfect for this because they optimize for completely different things than nature does. Natural evolution is lazy—it selects for “good enough to survive.” Because of this, the wild-type L protein likely has suboptimal, loose regions in its hydrophobic core.
ProteinMPNN, on the other hand, purely optimizes for mathematical and physical stability. By locking the 3D shape and asking the AI to invent a new sequence, the model can identify bulky or awkward amino acids that nature left behind and swap them for residues that pack together perfectly. I am essentially using AI to clean up nature’s messy structural grammar.


4. Potential Pitfalls I Might Face

  • The “Brick” Problem (Over-stabilization): The biggest risk of purely optimizing for thermodynamic stability is that I might make the L protein too rigid. The L protein needs to be dynamic to function—it has to physically interact with and rupture the E. coli membrane. If ProteinMPNN packs the core so tightly that the protein turns into an inflexible “brick,” it might be highly stable but biologically useless.
  • Lack of Cellular Context: ProteinMPNN and ESMFold operate in a digital vacuum. They don’t account for the chaotic, crowded cytoplasm of an E. coli cell, the specific pH, or the presence of bacterial chaperones. A sequence that looks perfectly stable on my Colab notebook might instantly misfold or aggregate when introduced to a real biological environment.

5. Schematic of My Engineering Pipeline

  • Input: Wild-Type L Protein Sequence
  • [ ↓ ] Forward Prediction (ESMFold)
  • Output: 3D Backbone Template (PDB format)
  • [ ↓ ] Inverse Folding (ProteinMPNN)
  • Output: Dozens of novel sequence candidates
  • [ ↓ ] Filter & Select
  • Action: Pick the sequence with the best (lowest) ProteinMPNN score.
  • [ ↓ ] Validation (ESMFold)
  • Output: Confirmed 3D structure (ensuring it doesn’t unfold into an unstructured loop).
  • Result: Final optimized sequence ready for wet-lab synthesis!

Note to myself if I look back

1. Forward Prediction: From 1D to 3D (ESMFold)

  • What it is doing technically: ESMFold uses a massive Large Language Model (LLM) called ESM-2, which was trained on hundreds of millions of natural protein sequences. It treats amino acids like words in a sentence.
  • The Math/Logic: When you input the wild-type L protein sequence, the model’s “attention mechanisms” calculate which amino acids are likely to physically interact with each other, even if they are far apart in the 1D text string.
  • The Output: It predicts the spatial coordinates (X, Y, Z positions) of the atoms in the protein and writes them out as a .pdb (Protein Data Bank) file. This gives us our baseline geometry, keeping in mind that it is a prediction and not an experimental ground truth.

2. Inverse Folding: From 3D to 1D (ProteinMPNN)

  • What it is doing technically: ProteinMPNN is a Graph Neural Network (GNN). While ESMFold reads text, ProteinMPNN reads geometry.
  • The Math/Logic: It takes your 3D .pdb backbone and turns it into a mathematical graph. Every amino acid position becomes a “node,” and the physical distances between them become “edges.” It completely deletes the actual amino acid letters (a process called masking) and only looks at the angles and distances of the backbone atoms (Nitrogen, Alpha-Carbon, Carbon, Oxygen).
  • The Output: The neural network passes messages between these nodes to calculate a probability distribution for all 20 possible amino acids at every single position. It asks: “Based on the geometry of this pocket, which amino acid has the perfect chemical properties and physical size to fit here without clashing?”

3. Filtering & Selection (The Negative Log-Likelihood Score)

  • What it is doing technically: You don’t just pick a sequence at random; you select based on mathematical confidence.
  • The Math/Logic: ProteinMPNN grades its own homework using a score calculated as $-\log P(\text{sequence} \mid \text{structure})$. This represents the negative log-likelihood of a sequence given the 3D structure.
    o A lower score means higher probability.
    o If the AI generates a sequence with a score of 0.84 and the wild-type natural sequence scores 1.64 (like you saw in your actual run!), it means the model considers its own sequence more probable for that target backbone than the natural sequence. That is consistent with better core packing and fewer clashes, but the score is a statistical proxy, not a measured folding energy.
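
The score described above can be worked through numerically. The sketch below assumes the reported score is the negative log-probability averaged over residues; the per-position probabilities are invented for illustration and the function name is hypothetical.

```python
import math

def mean_negative_log_likelihood(per_residue_probs):
    """Average of -log(p) over positions, where p is the probability of the chosen residue."""
    return -sum(math.log(p) for p in per_residue_probs) / len(per_residue_probs)

# Hypothetical per-position probabilities for two sequences on the same backbone.
designed_probs = [0.6, 0.5, 0.4, 0.5]
native_probs = [0.3, 0.2, 0.1, 0.25]

designed_score = mean_negative_log_likelihood(designed_probs)
native_score = mean_negative_log_likelihood(native_probs)
print(round(designed_score, 3), round(native_score, 3))

# Convert an average score back into a geometric-mean per-residue probability.
print(round(math.exp(-0.8426), 3), round(math.exp(-1.6437), 3))
```

Under that averaging assumption, a score of 0.84 corresponds to a typical per-residue probability of roughly 0.43, and 1.64 to roughly 0.19, which is an easier way to picture the gap between the designed and native sequences.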

4. Orthogonal Validation (ESMFold, again)

  • What it is doing technically: This is the most crucial step that proves this isn’t just hypothetical. ProteinMPNN assumes the 3D backbone is frozen in space, but in reality, proteins are moving, dynamic chains. We have to prove the new sequence will actually fold into that shape from scratch.
  • The Math/Logic: We take the brand new, AI-generated sequence and feed it back into ESMFold. ESMFold has never seen the target 3D structure; it only sees the new letters.
  • The Output: If ESMFold (a sequence-to-structure model) independently predicts the exact same 3D geometry that ProteinMPNN (a structure-to-sequence model) designed it for, the loop is closed.

Week 5 HW: Protein Design: Part II

✨ Part A. SOD1 Binder Peptide Design ✨

Part 1: Generate Binders with PepMLM

Sequence Retrieval and Mutation I began by retrieving the human Superoxide dismutase 1 (SOD1) sequence from the UniProt database using the accession number P00441. The native (wild-type) sequence consists of 154 amino acids:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

To model the disease state, I introduced the ALS-causing A4V mutation (Alanine → Valine at residue 4). Noting that standard numbering excludes the initiator Methionine (M), I replaced the Alanine at the 5th position with a Valine to create my target mutant sequence:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
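
The numbering offset described above is easy to get wrong by one, so here is a small sketch that applies a substitution label such as A4V to a sequence that still carries the initiator Methionine and checks the wild-type residue first. The helper `apply_substitution` and its `includes_initiator_met` flag are hypothetical names introduced for illustration.

```python
def apply_substitution(sequence, mutation, includes_initiator_met=True):
    """Apply a substitution such as 'A4V', numbered without the initiator Met."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    # With Met still present, residue N (Met excluded) sits at 0-based index N.
    index = position if includes_initiator_met else position - 1
    if sequence[index] != wild_type:
        raise ValueError(f"expected {wild_type} at index {index}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]

# Wild-type SOD1 sequence as listed above (UniProt P00441, initiator Met included).
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHE"
    "FGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSI"
    "EDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

sod1_a4v = apply_substitution(SOD1_WT, "A4V")
print(sod1_a4v[:11])
```

The wild-type check raises an error if the offset is wrong, which catches the common mistake of mutating the Lysine at sequence position 4 instead of the Alanine at position 5.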

Peptide Generation Using the PepMLM Colab notebook, I inputted the mutated A4V SOD1 sequence. I configured the model parameters to generate 4 peptide binders, explicitly setting the target length to 12 amino acids.

Results and Perplexity Analysis I recorded the pseudo-perplexity scores for the four newly generated peptides. A lower pseudo-perplexity score indicates higher model confidence in the sequence’s ability to bind the target.

To establish a baseline, I wrote a custom code block in the notebook to calculate the pseudo-perplexity for the known SOD1-binding peptide (FLYRWLPSRRGG) against my mutated sequence.

[Figure: Known Binder Code]

Below is the consolidated table of my generated binders compared against the known binder:

| Binder Index | Peptide Sequence | Pseudo Perplexity |
|---|---|---|
| Binder 0 | WHYPAVAAAWKE | 9.54 |
| Binder 2 | WRYPAVAAELKE | 10.01 |
| Binder 3 | KHYGVAAAELKE | 14.70 |
| Binder 1 | WRYYVTAAAWWK | 18.48 |
| Known Binder | FLYRWLPSRRGG | 20.64 |
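
For reference, pseudo-perplexity is the exponential of the average negative log-probability that the model gives each true residue when that residue is masked. The sketch below shows only that arithmetic; it does not call PepMLM, the probabilities are invented, and the function name is hypothetical.

```python
import math

def pseudo_perplexity(masked_token_probs):
    """exp of the mean negative log-probability of each true residue when it is masked."""
    nll = -sum(math.log(p) for p in masked_token_probs) / len(masked_token_probs)
    return math.exp(nll)

# Hypothetical per-residue probabilities for a 4-residue toy peptide.
confident = [0.20, 0.10, 0.15, 0.10]
uncertain = [0.05, 0.04, 0.06, 0.05]
print(round(pseudo_perplexity(confident), 2), round(pseudo_perplexity(uncertain), 2))

# A model that spreads probability evenly over the 20 standard amino acids.
uniform = [1 / 20] * 12
print(round(pseudo_perplexity(uniform), 2))
```

The uniform case gives a useful yardstick: a value near 20 means the model is about as unsure as an even guess over the 20 standard amino acids, so the lower values for the generated binders reflect a stronger model preference, not a measured binding strength.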

Conclusion for Part 1:

The PepMLM model generated four valid candidate peptides. Notably, all four generated peptides achieved lower pseudo-perplexity scores than the known binder (20.64), suggesting that the model is highly confident these novel sequences will bind favorably to the A4V mutant SOD1 protein.

Part 2: Evaluate Binders with AlphaFold3

1. Known Binder (FLYRWLPSRRGG)

  • ipTM Score: 0.90
  • Structural Analysis: The known binder achieved the highest confidence score. Structurally, it localizes centrally in the upper cleft, wedged directly at the dimer interface between the two SOD1 chains. It is entirely surface-bound and does not localize near the N-terminus where the A4V mutation sits. Because it is short and flexible, the peptide itself appears reddish-orange (pLDDT < 50), though its binding location is predicted with high confidence.

2. Binder 3 (KHYGVAAAELKE)

  • ipTM Score: 0.86
  • Structural Analysis: This was the best-performing generated peptide. Instead of approaching the dimer interface, it stretches out along the bottom-right outer edge of the β-barrel. It is completely surface-bound and, like the control, does not localize near the N-terminus.

3. Binder 1 (WRYYVTAAAWWK)

  • ipTM Score: 0.82
  • Structural Analysis: Similar to Binder 3, this peptide acts as a surface-bound string, but it engages the far right lateral edge of the β-barrel. It stays on the exterior of the protein, avoids the dimer interface, and does not interact with the N-terminus region.

4. Binder 0 (WHYPAVAAAWKE)

  • ipTM Score: 0.78
  • Structural Analysis: This peptide behaves uniquely by curling into a short alpha-helix rather than stretching out. It is surface-bound, floating near the top left surface of the β-barrel. It does not penetrate into any binding pockets, nor does it approach the dimer interface or the N-terminus.

5. Binder 2 (WRYPAVAAELKE)

  • ipTM Score: 0.76
  • Structural Analysis: This peptide yielded the lowest structural confidence. It is entirely surface-bound, loosely clinging to the bottom edge of the β-barrel with a noticeable portion of the sequence floating freely as a flexible tail away from the main complex.

Summary and Comparison

Overall, the ipTM values reflect confident protein-peptide interactions, ranging from 0.76 to 0.90. None of the peptides were buried deeply in the protein; all remained surface-bound, and none localized near the N-terminus where the A4V mutation sits. While the PepMLM model in Part 1 scored the generated sequences as more favorable than the control, AlphaFold3’s structural modeling gives the Known Binder the highest structural confidence (ipTM = 0.90), and it is the only peptide placed at the dimer interface. None of the generated peptides matched or exceeded the known binder, as they mostly engaged the outer β-barrel. However, Binder 3 (0.86) and Binder 1 (0.82) still showed high-confidence predicted interfaces.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

After evaluating the four PepMLM-generated binders against the A4V mutant SOD1 target in PeptiVerse, I observed excellent safety profiles across the board, though their binding affinities varied.

Binder 1 (WRYYVTAAAWWK) emerged as a standout candidate. It is highly soluble (probability = 1.000) and safely non-hemolytic (probability = 0.056). Most notably, it achieved the highest predicted binding affinity of the group (pKd/pKi = 7.196), making it the only peptide classified as “Medium binding.” It has a net charge of 1.76 at pH 7, a molecular weight of 1600.8 Da, an isoelectric point of 9.70, and a hydrophobicity of -0.40. This strong predicted affinity aligns well with its high structural confidence in AlphaFold3 (ipTM = 0.82).

Binder 3 (KHYGVAAAELKE), which had the highest AlphaFold3 confidence (ipTM = 0.86), also showed a perfect safety profile with 1.000 solubility and extremely low hemolysis (0.028). However, its predicted binding affinity was lower (pKd/pKi = 5.424), falling into the “Weak binding” category. It has a near-neutral net charge (-0.14 at pH 7), a molecular weight of 1315.5 Da, and an isoelectric point of 6.77.

Binders 0 and 2 followed a similar pattern: both are completely soluble (1.000) and non-hemolytic (0.025 and 0.041, respectively), but they only demonstrated weak predicted binding affinities (5.140 and 5.651), matching their slightly lower AlphaFold3 ipTM scores (0.78 and 0.76).

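| Property | WRYYVTAAAWWK | KHYGVAAAELKE | WHYPAVAAAWKE | WRYPAVAAELKE |
|---|---|---|---|---|
| ipTM | 0.82 | 0.86 | 0.78 | 0.76 |
| Solubility 💧 (probability) | 1.000 | 1.000 | 1.000 | 1.000 |
| Hemolysis 🩸 (probability) | 0.056 | 0.028 | 0.025 | 0.041 |
| Binding Affinity 🔗 (pKd/pKi) | 7.196 | 5.424 | 5.140 | 5.651 |
| Length 📏 (residues) | 12 | 12 | 12 | 12 |
| Molecular Weight ⚖️ (Da) | 1600.8 | 1315.5 | 1428.6 | 1432.6 |
| Net Charge ⚡ (pH 7) | 1.76 | -0.14 | -0.15 | -0.23 |
| Isoelectric Point 🎯 | 9.70 | 6.77 | 6.76 | 6.28 |
| Hydrophobicity 💦 | -0.40 | -0.53 | -0.32 | -0.48 |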

Structural and Therapeutic Comparison

Comparing the AlphaFold3 structures to the PeptiVerse predictions reveals an interesting dynamic. While higher ipTM scores generally indicate better structural stability, the absolute highest ipTM (Binder 3) did not yield the highest biochemical binding affinity. Instead, Binder 1, which still had a very strong ipTM (0.82), significantly outperformed the others in predicted affinity (7.196). Fortunately, none of the generated peptides are predicted to be hemolytic or poorly soluble; the model successfully generated highly safe, hydrophilic sequences across all candidates.

Chosen Candidate:

Based on the compiled predictions, Binder 1 (WRYYVTAAAWWK) is the best candidate to advance. The reasons are:

  • Strongest predicted binding: Its pKd/pKi of 7.196 is the highest by a wide margin, making it the only sequence to cross into the “Medium binding” threshold for the A4V mutant SOD1 target.
  • High structural confidence: With an ipTM of 0.82, AlphaFold3 predicts the surface-bound interaction with high confidence.
  • Favorable safety profile: Despite its high predicted affinity, it is predicted to be fully soluble (1.000) and non-hemolytic (0.056).
  • Best overall balance: While Binder 3 had a slightly higher structural confidence, Binder 1 provides the best balance, with a much higher predicted binding affinity and equally good predicted safety properties.
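
The selection logic above can be written down as a simple filter-then-rank step using the values from the table. This is a sketch of one possible rule, not the method PeptiVerse uses: the 0.5 cut-offs for solubility and hemolysis are illustrative assumptions, and `select_candidates` is a hypothetical helper.

```python
# Predicted values copied from the PeptiVerse / AlphaFold3 table above.
candidates = [
    {"name": "Binder 0", "sequence": "WHYPAVAAAWKE", "iptm": 0.78,
     "solubility": 1.000, "hemolysis": 0.025, "affinity": 5.140},
    {"name": "Binder 1", "sequence": "WRYYVTAAAWWK", "iptm": 0.82,
     "solubility": 1.000, "hemolysis": 0.056, "affinity": 7.196},
    {"name": "Binder 2", "sequence": "WRYPAVAAELKE", "iptm": 0.76,
     "solubility": 1.000, "hemolysis": 0.041, "affinity": 5.651},
    {"name": "Binder 3", "sequence": "KHYGVAAAELKE", "iptm": 0.86,
     "solubility": 1.000, "hemolysis": 0.028, "affinity": 5.424},
]

def select_candidates(candidates, min_solubility=0.5, max_hemolysis=0.5):
    """Drop peptides failing the (assumed) safety cut-offs, then rank by affinity, then ipTM."""
    passing = [
        c for c in candidates
        if c["solubility"] >= min_solubility and c["hemolysis"] <= max_hemolysis
    ]
    return sorted(passing, key=lambda c: (c["affinity"], c["iptm"]), reverse=True)

ranked = select_candidates(candidates)
for c in ranked:
    print(c["name"], c["sequence"], c["affinity"], c["iptm"])
```

Ranking by affinity first and ipTM second encodes the same priority used in the reasoning above; swapping the key order would put Binder 3 on top instead.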

Part 4: Generate Optimized Peptides with moPPIt

In the peptide generation tool, I first pasted the A4V mutant SOD1 sequence. Then I set the peptide length to 12 amino acids. After that, I enabled the option “Enable motif and affinity guidance” (as well as solubility/hemolysis guidance), specifically targeting residues 4, 5, and 6 to steer binding toward the disease-causing A4V mutation site. After running the tool, three peptides were generated: SEQKGLECRVTM, EQYKKNPGGLCI, and EKKCWDTKQTVN.

Then, I evaluated the generated peptides in PeptiVerse, as in the previous step, in order to compare the peptides generated by PepMLM and moPPIt and assess their physicochemical properties.

| Peptide | Solubility | Hemolysis | Binding Affinity (pKd/pKi) | Net Charge | GRAVY |
|---|---|---|---|---|---|
| SEQKGLECRVTM | Soluble | Non-hemolytic (0.049) | 6.296 | 0.00 | -0.70 |
| EQYKKNPGGLCI | Soluble | Non-hemolytic (0.048) | 5.971 | 1.00 | -0.93 |
| EKKCWDTKQTVN | Soluble | Non-hemolytic (0.029) | 6.091 | 1.00 | -1.45 |

All three moPPIt peptides were predicted to be soluble and non-hemolytic, which indicates a favorable safety profile. Among them, SEQKGLECRVTM shows the highest predicted binding affinity (6.296 pKd/pKi), suggesting stronger interaction with the target. In contrast, EKKCWDTKQTVN has the lowest hemolysis probability and the highest hydrophilicity (GRAVY = -1.45), indicating potentially better biological compatibility and formulation ease.

Compared with the peptides generated by PepMLM, the moPPIt peptides offer a potential functional advantage. PepMLM was not told where to bind, and its peptides ended up on assorted surface sites, whereas moPPIt’s motif-guided design explicitly steered these peptides toward the location of the A4V mutation. Before advancing these to clinical studies, I would run them through AlphaFold3 to check whether they actually dock at the targeted N-terminal motif, followed by in vitro binding assays (such as Surface Plasmon Resonance) to measure their affinity and laboratory assays to validate their safety.

Comparison and Clinical Evaluation

1. How moPPIt peptides differ from PepMLM peptides: The core difference lies in control and optimization. PepMLM acts as an unguided sampler; it analyzes the A4V mutant SOD1 target and proposes sequences that should bind somewhere on the protein, which resulted in peptides attaching to various spots on the outer surface or β-barrel. In contrast, moPPIt utilizes multi-objective guided discrete flow matching. moPPIt was explicitly steered to bind the disease-causing site (residues 4, 5, and 6) while optimizing the sequences for four traits simultaneously: target motif adherence, high binding affinity, high solubility, and low hemolysis. This results in targeted, functionally optimized candidates rather than just general binders.

2. Evaluation prior to clinical studies: Before advancing these generated peptides to clinical trials, a rigorous validation pipeline is required:

  • In Silico Validation: First, I would model the moPPIt peptides using AlphaFold3 to visually confirm that they actually dock at the targeted N-terminus motif (residues 4-6) as intended. I would also run Molecular Dynamics (MD) simulations to ensure the binding complex remains stable over time.
  • In Vitro Assays: The computational predictions must be validated in a physical lab. I would use Surface Plasmon Resonance (SPR) or Biolayer Interferometry (BLI) to measure the actual physical binding affinity ($K_d$). Additionally, laboratory hemolysis and solubility assays are required to confirm the AI’s safety predictions.
  • In Vivo Studies: Finally, the most promising candidates would be tested in animal models (such as transgenic ALS mouse models) to evaluate their pharmacokinetics (how long they last in the body), bio-distribution (if they reach the target tissue), and overall systemic toxicity before ever being tested in humans.

✨ Part B: BRD4 Drug Discovery Platform Tutorial ✨

Optional

✨ Part C: Final Project: L-Protein Mutants ✨

Objective: The primary goal of this project was to engineer the MS2 bacteriophage L-protein (lysis protein) to overcome a common E. coli resistance mechanism. Typically, the L-protein relies on the bacterial chaperone DnaJ to fold correctly and form a pore in the cell membrane. By mutating the L-protein, I aimed to design variants that are either completely independent of DnaJ (by altering the soluble domain) or capable of lysing the bacteria much faster (by optimizing the transmembrane domain).

Computational Procedure: To achieve this, I chose Option 1 (Data-Driven Mutagenesis) and utilized a state-of-the-art Protein Language Model (ESM) via a Google Colab notebook.

  1. Sequence Input: I first inputted the wild-type amino acid sequence of the MS2 L-protein (METRFPQQ...).
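  2. AI Scoring: I ran the ESM model to computationally score every possible single amino acid mutation at every position along the 75-residue protein. The model calculated a Log-Likelihood Ratio (LLR) score for each mutation, comparing how probable it finds the mutant residue versus the wild-type residue at that position. A high positive score means the model finds the mutant residue more plausible than the wild type in that sequence context, which I treat as a proxy for a mutation that is structurally tolerated and possibly beneficial.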
  3. Experimental Validation: To ensure the AI’s mathematical predictions matched physical biology, I uploaded an experimental dataset (L-Protein Mutants_sheet.csv) containing actual wet-lab results of L-protein mutations. I observed a strong correlation: mutations that broke the protein in the lab (Lysis = 0) generally had poor computational scores, while the AI successfully assigned high scores to conservative, structure-preserving mutations.
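
The LLR score in step 2 can be illustrated with a tiny calculation. The sketch below assumes the score is the log-probability of the mutant residue minus that of the wild-type residue at one position; the notebook’s exact scoring scheme may differ, the probabilities are invented, and both helper names are hypothetical.

```python
import math
import re

def parse_mutation(label):
    """Split a label such as 'K50L' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant

def log_likelihood_ratio(position_probs, wild_type, mutant):
    """log P(mutant) - log P(wild type) at a single position."""
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])

# Hypothetical model probabilities at one position (not real ESM output).
position_probs = {"K": 0.05, "L": 0.40, "A": 0.10}

wild_type, position, mutant = parse_mutation("K50L")
llr = log_likelihood_ratio(position_probs, wild_type, mutant)
print(position, round(llr, 3))
```

A positive value means the model prefers the mutant residue, zero means no preference, and a negative value means the wild-type residue is favored.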

Selected Mutations and Biological Rationale

Using the highest-scoring AI predictions and guided by the biological requirement to target specific domains, I selected the following 5 mutations:

I. Soluble Region Mutations (Residues 1-40)

The N-terminal soluble domain is responsible for physically interacting with the E. coli DnaJ chaperone. My strategy here was to introduce mutations that disrupt this specific dependency, forcing the protein to auto-fold.

  • 1. C29R (Position 29, Cysteine to Arginine | AI Score: 2.39): I selected this mutation because introducing a bulky, positively charged Arginine in place of Cysteine is a structurally disruptive change to the surface interface. This aims to decrease the protein’s binding affinity for DnaJ while remaining structurally stable overall, as predicted by the high AI score.
  • 2. Y39L (Position 39, Tyrosine to Leucine | AI Score: 2.24): Located right at the boundary of the soluble domain, swapping a bulky Tyrosine for a highly hydrophobic Leucine locally increases the hydrophobicity of the sequence. I hypothesize this will help the protein begin its insertion into the membrane independently, bypassing the need for chaperone assistance.

II. Transmembrane Region Mutations (Residues 41-75)

The C-terminal transmembrane domain must embed deep into the bacterial lipid bilayer to form the lethal lysis pore.

My strategy here was to use highly conservative, hydrophobic mutations to make membrane insertion faster and more thermodynamically favorable.

  • 3. K50L (Position 50, Lysine to Leucine | AI Score: 2.56): This was the highest-scoring mutation generated by the model. By replacing a charged, polar Lysine (which resists entering lipid membranes) with Leucine (which is highly hydrophobic and “greasy”), I expect to improve the membrane-insertion profile of the pore.
  • 4. N53L (Position 53, Asparagine to Leucine | AI Score: 1.86): Similar to my reasoning for K50L, this mutation removes a polar amino acid deep inside the transmembrane region and replaces it with a hydrophobic Leucine. This optimizes the hydrophobic packing of the pore, potentially speeding up the lysis mechanism to kill the cell before it can mount a defense.

III. Wildcard Mutation

  • 5. S9Q (Position 9, Serine to Glutamine | AI Score: 2.01): I chose this highly-scored substitution early in the soluble domain to serve as a structural stabilizer. Glutamine maintains polar characteristics necessary for the soluble region but provides a larger side chain, which the AI predicts will optimize the local hydrogen-bonding network and support independent auto-folding.

Week 6 HW: Genetic Circuits: Part I

✨ DNA Assembly ✨

Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Since the protocol didn’t list the exact ingredients, I looked up the standard components of a Phusion Master Mix from a supplier like New England Biolabs. A PCR Master Mix is basically a pre-mixed tube of everything needed to copy DNA, minus the specific template and primers. The main components are:

  • Phusion DNA Polymerase: This is the actual enzyme that copies and builds the new DNA. The ‘High-Fidelity’ part means it has a proofreading feature to catch and fix its own mistakes.
  • dNTPs: These are the free-floating A, T, C, and G building blocks that the polymerase uses to assemble the new DNA strands.
  • Reaction Buffer: This keeps the pH and environment stable so the enzyme doesn’t break down during the extreme heating and cooling cycles.
  • Magnesium Chloride: This provides magnesium ions, which the polymerase enzyme physically requires as a ‘helper’ to function and connect the DNA letters.

Question 2: What are some factors that determine primer annealing temperature during PCR?

The annealing temperature is usually set a few degrees below the primer’s melting temperature ($T_m$), which is the temperature at which half of the primer-template duplexes have separated into single strands. A few main factors determine the $T_m$:

  • Primer Length: The protocol says the binding region should be 18–22 base pairs long. Just like a longer piece of Velcro is harder to pull apart, a longer primer forms more connections with the DNA, so it requires a higher temperature to separate.
  • GC Content: The protocol mentions aiming for a 40–60% GC content. This is because the DNA letters G and C are held together by three hydrogen bonds, while A and T are only held together by two. So, a primer with a lot of Gs and Cs is bonded much tighter and needs a higher temperature to melt.
  • Salt Concentration: The salts in the PCR buffer help stabilize the DNA strands, which can also affect the temperature needed for the primers to stick properly.
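
The length and GC effects listed above can be seen in a quick rule-of-thumb calculation. The sketch below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), which is a rough estimate intended for short oligos and ignores salt concentration; supplier calculators use more detailed models. The primer sequence is a made-up example.

```python
def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in °C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

primer = "ATGCGTACCGGATTCAGCTA"  # hypothetical 20-nt primer
print(len(primer), gc_content(primer), wallace_tm(primer))

# Swapping A/T for G/C at the same length raises the estimate.
print(wallace_tm("GCGCGCGCGCGCGCGCGCGC"), wallace_tm("ATATATATATATATATATAT"))
```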

Question 3: There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other

Both PCR and restriction enzyme digests create linear DNA fragments, but they work in fundamentally different ways: PCR builds new DNA, while a restriction digest cuts existing DNA.

Protocol Differences:

  • PCR: The protocol involves mixing a DNA template with primers, dNTPs, and a polymerase, then running it through rapid heating and cooling cycles in a thermocycler to amplify specific regions.
  • Restriction Digest: The protocol is generally simpler. It involves mixing existing DNA with specific restriction enzymes and a buffer, then incubating the mixture at a steady, warm temperature (usually 37°C) while the enzymes act like molecular scissors to cut the DNA at specific recognition sequences.

When to Use Which:

  • PCR is preferable when you need to actively modify, build onto, or amplify a DNA sequence. Because you design the primers yourself, you have complete control over the final product. You can add extra sequences to the ends of your primers, which become permanent overlapping tails on your new DNA fragments; this is essential for seamless cloning methods like Gibson Assembly. PCR is also the go-to method for site-directed mutagenesis (like swapping out a few base pairs to change a protein’s color): you simply build the mutation into the primer, and the polymerase incorporates it into every new copy. Finally, if you only have a tiny amount of starting DNA, PCR is necessary because it can amplify the target into billions of copies.
  • Restriction digests are preferable when you want to work with the exact DNA sequence you already have without risking any copying errors. Even the best “high-fidelity” PCR polymerases can occasionally make random mistakes while building new strands, but restriction enzymes just cut the existing DNA exactly as it is. If your plasmid already contains convenient restriction cut sites surrounding the gene you want to move, doing a simple digest to cut and paste that fragment into a new plasmid backbone is much cheaper, faster, and safer than running a PCR. Digests are also the standard method for a “diagnostic check.” After assembling a new plasmid, you can cut it with specific enzymes and check the fragment sizes on an agarose gel to quickly verify that your cloning experiment actually worked.

Question 4: How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

To make sure our DNA pieces are ready to be glued together in Gibson Assembly, we have to clean them up and check that we actually made the right thing. We do this in four main steps:

  • DpnI Digest (Destroying the original template): Right after PCR, our tube contains our newly built mutated DNA, but it also still contains the original, non-mutated plasmid we used as a template. We add an enzyme called DpnI, which only cuts GATC sites that are methylated. Plasmid DNA grown in most lab E. coli strains is methylated at those sites, while DNA made in a PCR tube is not, so DpnI chops up the original template but leaves our new PCR copies alone. This ensures we don’t accidentally assemble the old, wrong DNA.
  • DNA Purification (Washing away the junk): The PCR tube is also full of leftover ingredients like the old polymerase enzyme, unused DNA building blocks (dNTPs), and buffer salts. We use a spin column kit to wash all that junk away and trap just our pure DNA fragments. If we skip this, those leftover chemicals can actually block the Gibson Assembly enzymes from working properly.
  • Quantification (Measuring the concentration): We need to know exactly how much DNA we successfully made, so we measure the concentration using a machine like a Nanodrop. Gibson Assembly works best when you mix the DNA pieces together in a very specific ratio (the protocol says 2 parts color insert to 1 part backbone). We can’t calculate the right volumes to pipette if we don’t know our starting concentrations.
  • Diagnostic Gel (Checking our work): Finally, we run a small sample of our cleaned DNA on an agarose gel. This lets us visually check the DNA bands to confirm they are the correct size we predicted. It also proves that our PCR didn’t accidentally build a bunch of random, wrong-sized DNA fragments that would ruin the final assembly.
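The quantification step feeds directly into a pipetting calculation, which can be sketched as below. This assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation). The concentrations, fragment lengths, and the 0.02 pmol backbone target are hypothetical placeholders, not values from the protocol; only the 2:1 insert-to-backbone ratio comes from the text above.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximation)


def pmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/µL to pmol/µL."""
    return conc_ng_per_ul * 1000.0 / (AVG_BP_MASS * length_bp)


def gibson_volumes(backbone, insert, backbone_pmol=0.02, insert_ratio=2.0):
    """Return (backbone µL, insert µL) for the requested molar ratio.

    backbone and insert are (conc_ng_per_ul, length_bp) tuples.
    """
    vol_backbone = backbone_pmol / pmol_per_ul(*backbone)
    vol_insert = backbone_pmol * insert_ratio / pmol_per_ul(*insert)
    return vol_backbone, vol_insert


# Hypothetical Nanodrop readings and fragment sizes
backbone = (40.0, 2000)  # 40 ng/µL, 2000 bp
insert = (25.0, 700)     # 25 ng/µL, 700 bp

vb, vi = gibson_volumes(backbone, insert)
print(f"backbone: {vb:.2f} µL, insert: {vi:.2f} µL")
```

Because the ratio is molar, a short insert needs far less mass than a long backbone to hit 2:1, which is why the concentration and the fragment length both matter.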

Question 5: How does the plasmid DNA enter the E. coli cells during transformation?

Based on the protocol, the assembled plasmid DNA gets inside the E. coli bacteria through a process called heat shock transformation. Here is how it works:

  • The Setup: First, I mix the ‘competent’ bacteria (which are specially prepped and kept super cold on ice) with our newly built plasmid DNA. At this point, the DNA is just floating in the liquid outside the bacterial cells.
  • The Heat Shock: I take the tube directly from the ice and plunge it into a hot water bath (42°C) for exactly 45 seconds, and then immediately stick it back on the ice.
  • Entering the Cell: This sudden, extreme shift from cold to hot physically stresses the bacteria and causes tiny pores (holes) to temporarily open up in their cell walls and membranes. Because the plasmid DNA is floating right outside, it simply diffuses (flows) through these microscopic open pores and into the cell.
  • Recovery: When the tube goes back on ice, the pores close up, trapping the DNA inside. After that, we just give the bacteria some nutrient-rich SOC broth and let them rest at a warm temperature for an hour so they can heal their cell walls and start reading the new color genes.

Question 6: Describe another assembly method in detail (such as Golden Gate Assembly)

a. Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Golden Gate Assembly is another way to seamlessly join multiple DNA pieces together, but instead of using overlapping PCR primers like Gibson Assembly, it uses special restriction enzymes (like BsaI). These are called Type IIS enzymes because they physically cut the DNA a few steps away from their actual recognition sequence, leaving behind custom 4-letter sticky ends. By designing these 4-letter overhangs to match perfectly between our different fragments, we can mix multiple pieces into one tube and they will naturally assemble in the exact right order. The best part is that when the enzyme cuts the DNA, the original recognition sequence is completely chopped off and left behind. This means the DNA pieces get glued together ‘scarlessly’ and the enzyme can’t accidentally cut them apart again, allowing both the cutting and the gluing to happen simultaneously in a single tube.

Golden Gate Assembly Diagram. Source: Thermo Fisher Scientific (Source Link)
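The "cuts a few steps away" behavior can be sketched in a few lines. BsaI recognizes GGTCTC and cuts the top strand 1 nt past the site and the bottom strand 5 nt past it, so the 4 nt in between become the sticky end. The function below is a simplified illustration: it only scans the top strand for forward-facing sites, and the example sequence is made up.

```python
BSAI_SITE = "GGTCTC"


def bsai_overhangs(seq: str) -> list:
    """Return the 4-nt overhangs left by forward-facing BsaI sites.

    Simplification: reverse-orientation sites (GAGACC on the top strand)
    are ignored.
    """
    seq = seq.upper()
    overhangs = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1  # top-strand cut: 1 nt past the site
        overhang = seq[cut:cut + 4]       # 4 nt between top and bottom cuts
        if len(overhang) == 4:
            overhangs.append(overhang)
        start = seq.find(BSAI_SITE, start + 1)
    return overhangs


# Hypothetical fragment: site, one spacer base, then the designed overhang
example = "AAGGTCTCAGCCATTTT"
print(bsai_overhangs(example))
```

Since the overhang sequence sits outside the recognition site, it can be any 4 letters you choose, which is what lets each junction get its own unique "address."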

b. Model this assembly method with Benchling or Asimov Kernel

Basically, here is what I just did in Benchling:

First, I needed an empty delivery truck to hold my gene, so I imported the standard pUC19 plasmid. I noted that it was exactly 2686 base pairs long. Next, I had to go get the purple color gene (amilCP). I pulled up a different plasmid (MG252981) just so I could copy that specific purple gene out of it. I didn’t need the whole circle, just that ~666 bp chunk of linear DNA. Then came the slightly annoying but super important part: the restriction enzyme I am using (BsaI) leaves a specific 4-letter sticky overhang when it cuts pUC19 open, and that overhang is GCCA. So, for my purple gene to fit perfectly into that gap like a puzzle piece, I literally just typed ‘GCCA’ onto the very beginning and the very end of my amilCP sequence (adding 8 bp, for ~674 bp total). Finally, I opened the Assembly Wizard. I told Benchling that pUC19 was my ‘backbone’ (the truck) and my edited purple gene was the ‘insert’. The software saw those matching GCCA sticky ends, snapped them together automatically, and spit out my final circular plasmid. Because 2686 + ~674 = 3360, seeing that final size on the map was the proof that I did it right.

Benchling Plasmid Map

✨ Asimov Kernel ✨

Week 7 HW: Genetic Circuits: Part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits?

In my research, I found that IANNs offer a much more flexible way to handle biological data compared to standard Boolean (ON/OFF) circuits. Here are the main benefits:

  • Graded vs. Binary Responses: While Boolean gates force a sharp decision, IANNs treat molecules as continuous inputs. This allows the cell to compute a proportional response (like partial induction) rather than just being fully “on” or “off.”
  • Multivariate Integration: IANNs can sum up multiple weighted inputs at once. This lets them perform pattern recognition and complex classification that simple logic gates can’t handle.
  • Noise & Fault Tolerance: IANNs are much more robust. In a Boolean gate, a single signal crossing a threshold can flip the whole output (brittleness). In an IANN, since it’s an analog sum, noise in one specific regulator has a limited effect on the final result.
  • Adaptability: The “weights” and thresholds of these networks can be tuned or trained through directed evolution, making them easier to optimize than rigid digital circuits.

2. Application: Sentinel Implant Probiotic

I propose using an IANN to create a “Guardian” probiotic (using Lactobacillus reuteri) designed to prevent infections around dental implants.

  • Input Behavior: The bacteria would monitor two specific analog cues:
    1. AHL concentration (pathogen quorum-sensing molecules).
    2. Local pH levels (which drop when acid-producing pathogens are present).
  • Output Behavior: These inputs are weighted within the cell. If the weighted sum of “AHL + Low pH” hits a certain threshold, the probiotic expresses an antimicrobial peptide at a proportional level.
  • The Goal: This creates a “soft AND” gate, ensuring the antimicrobial is only produced during actual dysbiosis, which protects the healthy oral flora.
  • Limitations: Setting precise biological weights is difficult. We also have to worry about signal leakage (AHL diffusing away) and the long-term stability of the engineered strain in a competitive biofilm.
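The "soft AND" idea can be sketched as a single weighted sum with a threshold. Everything numeric here is a hypothetical placeholder: inputs are normalized to 0–1, the weights are 0.5 each, and the threshold is 0.6, chosen so that neither cue alone can cross it. In a real cell these would be set by promoter strengths and binding affinities, not by numbers in a script.

```python
def soft_and(ahl: float, acidity: float,
             w_ahl: float = 0.5, w_acid: float = 0.5,
             threshold: float = 0.6) -> float:
    """Proportional output once the weighted sum passes the threshold.

    ahl and acidity are normalized signals in [0, 1]
    (acidity = 1 means the lowest pH considered).
    """
    weighted_sum = w_ahl * ahl + w_acid * acidity
    return max(0.0, weighted_sum - threshold)


for ahl, acid in [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8), (1.0, 1.0)]:
    print(f"AHL={ahl}, acidity={acid} -> output={soft_and(ahl, acid):.2f}")
```

Either input alone gives zero output, while both together give a graded response that grows with signal strength, which is the difference from a hard Boolean AND.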

3. Multilayer Perceptron Diagram

Below is a conceptual layout for a two-layer IANN:

Multilayer IANN Diagram

  • Layer 1: Takes inputs $X_1$ and $X_2$ (DNA-encoded regulators) and a bias. Their weighted sum drives the production of an endoribonuclease (ERN-A).
  • Layer 2: Uses the ERN-A from Layer 1 as a negative weight (it cleaves the mRNA of the output) and integrates it with a third input, $X_3$.
  • Output: The final fluorescent protein reflects a cascaded computation, allowing for a more complex “decision boundary” than a single-layer model.
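The two layers described above can be written as a tiny numerical model. This is only a conceptual sketch: the sigmoid stands in for a saturating dose-response curve, and every weight and the bias are made-up values rather than measured parameters. The one structural feature taken from the design is that ERN-A enters Layer 2 with a negative weight.

```python
import math


def sigmoid(x: float) -> float:
    """Saturating response used as a stand-in for gene expression."""
    return 1.0 / (1.0 + math.exp(-x))


def two_layer_iann(x1, x2, x3,
                   w1=1.0, w2=1.0, bias=-1.0,  # Layer 1 (hypothetical)
                   w_ern=-2.0, w3=1.5):        # Layer 2 (hypothetical)
    """Return (ERN-A level, output level) for inputs X1, X2, X3."""
    ern_a = sigmoid(w1 * x1 + w2 * x2 + bias)  # Layer 1: weighted sum -> ERN-A
    output = sigmoid(w_ern * ern_a + w3 * x3)  # Layer 2: ERN-A represses output
    return ern_a, output


for inputs in [(0, 0, 1), (5, 5, 1), (0, 0, 0)]:
    ern_a, out = two_layer_iann(*inputs)
    print(f"X={inputs}: ERN-A={ern_a:.2f}, output={out:.2f}")
```

With $X_3$ held constant, raising $X_1$ and $X_2$ increases ERN-A and pushes the output down, so the final signal depends on all three inputs in a way a single weighted sum could not reproduce.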

Assignment Part 2: Fungal Materials

1. Existing Fungal Materials: Use Cases, Pros, and Cons

Fungi provide a versatile range of materials that can replace traditional plastics and leathers.

| Material | Use Case | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Mycelium Composites (MBCs) | Packaging foam, insulation, acoustic panels. | Low carbon footprint, fully compostable, fire-resistant. | Low mechanical strength; absorbs water/moisture. |
| Myco-leather | Sustainable fashion (e.g., Mylo™). | 50% lower $CO_2$ footprint than animal leather; high toughness. | Consistent production quality is hard to maintain. |
| Fungal Chitosan | Medical wound dressings. | Biocompatible, biodegradable, and shellfish-allergen free. | Requires strict regulatory approval; high cost at small scales. |

2. Genetic Engineering in Fungi

What I would engineer: I’d want to engineer filamentous fungi (like Aspergillus) to secrete human growth factors (e.g., BMP-2) for bone and dental tissue regeneration.

Why Fungi over Bacteria?

  • Eukaryotic Processing: Fungi have the Golgi and ER needed to perform post-translational modifications (like glycosylation and disulfide bonding). Bacteria like E. coli often fail at this, leaving proteins unfolded or inactive.
  • High Secretion Capacity: Industrial fungi are powerhouses; they can secrete up to 100 g/L of protein, which is far beyond what most bacterial systems can do.
  • Safety: Fungi don’t produce endotoxins, making the purification process for medical-grade human proteins much simpler and safer.

Final Project: DNA Design & Backbone Documentation

Project Design Overview

Vector Map Selection


Backbone Vector Details

The insert sequence will be synthesized and cloned into the pET28a expression vector, obtained from Addgene.

Key Features of pET28a:

  • Promoter: Carries a T7 promoter for high-level, IPTG-inducible expression.
  • Selection Marker: Includes a kanamycin resistance cassette for reliable bacterial selection.
  • Purification Tag: Features an N-terminal His-tag, allowing for efficient protein purification via IMAC (Immobilized Metal Affinity Chromatography).

Experimental Context:

This backbone is widely validated for recombinant protein production in E. coli BL21(DE3) and is directly compatible with the final project’s experimental aim. By using this standardized vector, I ensure that the synthesized DNA can be expressed and verified using established laboratory protocols.


Progress Checklist for March 20 Deadline:

  • Draft Aim 1 and Project Summary.
  • Select HTGAA Industry Council members.
  • Shared Benchling/Kernel folder created.
  • Insert sequence designed and uploaded to shared folder.
  • Backbone vector documented (above).

Week 9 HW: Cell Free Systems

HTGAA Week 9: Cell-Free Systems

Part A: General and Lecturer-Specific Questions

1. Explain the main advantages of cell-free protein synthesis (CFPS) over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

The biggest advantage I see is that CFPS turns “biology” into “chemistry.” In traditional in vivo systems, the cell membrane is a wall that prevents us from easily tweaking the internal environment. In a cell-free setup, I have an “open” system. I can add non-natural amino acids, adjust salt concentrations (Mg²⁺ and K⁺) in real time, or even add detergents to help fold tricky proteins.

Two cases where this is a game changer:

  • Toxic Protein Production: If I’m trying to express an antimicrobial peptide designed to kill Streptococcus mutans, a live E. coli host would likely die before it can produce a high yield. In CFPS, the “host” is already an extract, so toxicity isn’t an issue.
  • Screening Biofilm Disruptors: I can rapidly test dozens of enzyme variants that degrade biofilm matrices (like glucanases) without the long turnaround time of bacterial transformation and culture.

2. Describe the main components of a cell-free expression system and explain the role of each component.

  • Cell Extract: The “engine” containing ribosomes, tRNAs, and initiation factors harvested from a host cell.
  • Energy Buffer: A mix of NTPs (ATP, GTP, etc.) that power the translation process.
  • Amino Acids: The building blocks used to assemble the protein.
  • DNA/mRNA Template: The genetic instructions (blueprint) for the specific protein I want to make.
  • Salts/Cofactors: Essential for stabilizing the ribosome and metabolic enzymes in the extract.
  • Energy Regeneration System: Reagents like Phosphoenolpyruvate (PEP) that recycle spent ADP back into ATP to keep the reaction running for hours.

3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Energy is the literal fuel for the ribosome. Without a regeneration system, the “pool” of ATP would be depleted in minutes due to the high metabolic demand of protein synthesis and background phosphatase activity. To keep the reaction going, I’d use the PANOX system (which utilizes PEP and the enzyme Pyruvate Kinase). This allows the system to constantly “recharge” the ATP so the reaction can last for 10+ hours.

Figure: ATP regeneration pathways for cell-free protein synthesis showing the glucose system (top) and creatine phosphate (CP) system (bottom). In the dual-energy system, inorganic phosphate released by the CP system is recycled into glycolysis. Source: Kim et al., Biotechnology and Bioengineering, 2007.

4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

  • Prokaryotic (E. coli): Best for high speed and high yield. I’d use this for Mutanase, an enzyme that breaks down the (1→3)-α-glucan in dental biofilms. It’s a bacterial enzyme, so it folds well in a bacterial extract.
  • Eukaryotic (CHO or Wheat Germ): Better for complex human proteins. I’d use this for Histatin-5, a salivary protein with antifungal properties. Eukaryotic extracts are better at handling the specific folding and potential modifications this protein might need to be fully active.

5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Membrane proteins are hydrophobic and tend to aggregate if they aren’t embedded in a lipid bilayer. To optimize this, I’d use nanodiscs: tiny, disc-shaped pieces of lipid bilayer held together by a scaffold protein. I would add these nanodiscs directly to the cell-free reaction so the membrane protein can “sit” in the lipid environment as it’s being made, keeping it stable and functional.

6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

  • RNA Degradation: RNases in the extract might be eating my mRNA. Fix: Add a potent RNase inhibitor like RNasin.
  • Magnesium Titration: The Mg²⁺ concentration might be off. Ribosomes are very picky. Fix: Run a “magnesium sweep” (testing a range of 5–15 mM) to find the peak yield.
  • Energy Exhaustion: The reaction might be running out of steam. Fix: Switch to a “fed batch” approach where I add fresh energy mix every few hours.

Homework Question from Kate Adamala: Synthetic Minimal Cell (SMC)

Pick a function and describe it. What would your synthetic cell do? What is the input and what is the output?

I want to design a “Biofilm Sentinel.” Its job is to sense when the dental biofilm is becoming acidogenic and release a neutralizing enzyme.

  • Input: High acidity (low pH) or Quorum Sensing signals (like CSP from S. mutans).
  • Output: Urease (which produces ammonia to raise the pH) or a bacteriocin to kill the pathogens.

Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

No. If it’s not encapsulated, the sensors and enzymes would just wash away with saliva. Encapsulation allows the SMC to stay localized within the “niche” of the biofilm and concentrate the response where it’s needed.

Could this function be realized by genetically modified natural cell?

Possibly, but natural cells are hard to control and might be outcompeted by the existing oral microbiome. An SMC is more predictable and won’t “evolve” into something else.

Describe the desired outcome of your synthetic cell operation.

The outcome is a more balanced oral microbiome where the “bad” bacteria are kept in check, preventing the pH from dropping low enough to cause demineralization.

Design all components that would need to be part of your synthetic cell.

  • Membrane: DOPC/Cholesterol for a robust vesicle.
  • Internal Machinery: E. coli S30 extract.
  • Communication: I need a pH sensitive promoter or a receptor for CSP (Competence Stimulating Peptide). I’ll use the α-hemolysin (aHL) pore to allow the output (Urease) to exit the cell.

Experimental details (Lipids and Genes):

  • Lipids: POPC, Cholesterol.
  • Genes:
    1. comD/comE: The two-component sensing system for the S. mutans CSP quorum signal.
    2. ureA/ureB: Genes for Urease to neutralize acid.
    3. hla (from S. aureus): To create the α-hemolysin pores.
  • Measurement: I’d measure the pH of the surrounding medium over time using a pH sensitive dye like Bromocresol Purple.

Homework Question from Peter Nguyen: Cell-Free Materials

Choose one application field—Architecture, Textiles/Fashion, or Robotics—and propose an application using cell-free systems that are functionally integrated into the material.

Field: Architecture (Healthcare Surface Design)

Summary Pitch: I propose “Active Antimicrobial Coatings” for dental clinic surfaces that only activate when they detect dental pathogens.

Detailed Mechanism: I envision a transparent polymer coating embedded with freeze-dried, cell-free reaction pellets. These pellets contain a genetic circuit that is triggered by AI-2 (Autoinducer-2), a common signaling molecule in oral biofilms. When a pathogen like Porphyromonas gingivalis settles on the surface, its signals rehydrate and activate the CFPS, which then expresses and secretes a localized dose of Lysostaphin to kill the bacteria and prevent biofilm formation.

Societal Challenge: This reduces the reliance on harsh chemical disinfectants and helps prevent cross-contamination in clinical settings.

Addressing Limitations: To handle “one time use,” the coating would be applied as a “smart film” that can be peeled and replaced once the color indicator (built into the circuit) shows the “bioink” has been spent.

Homework Question from Ally Huang: Genes in Space Proposal

1. Provide background information that describes the space biology question or challenge you propose to address.

Astronauts often suffer from “Space Gingivitis.” Microgravity and radiation seem to shift the oral microbiome toward a more pathogenic state, and the lack of traditional hygiene tools makes it worse. This isn’t just about gum health; oral pathogens are linked to systemic issues like heart disease, which is a major risk on a 3-year mission to Mars.

2. Name the molecular or genetic target that you propose to study.

The target is the gtfB gene (Glucosyltransferase B) of Streptococcus mutans, which is the primary enzyme responsible for the “glue” that allows biofilms to stick to teeth.

3. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses.

Biofilms behave differently in microgravity. They often grow thicker and more resistant to antibiotics. By studying how gtfB is expressed and how its resulting protein (GTF-B) folds and functions in space using BioBits, we can understand if the “stickiness” of dental plaque increases in microgravity.

4. Clearly state your hypothesis or research goal and explain the reasoning behind it.

Hypothesis: I hypothesize that the BioBits® system can be used to produce a functional GTF-B inhibitor (a specific nanobody or peptide) in microgravity, and that its binding affinity to the GTF-B protein will be altered by the lack of convection in space.

The reasoning is that if we can’t brush effectively in space, we need “on demand” biological therapeutics. Using BioBits to produce biofilm disrupting proteins allows astronauts to create fresh, personalized oral health treatments without needing a cold chain supply from Earth.

5. Outline your experimental plan.

I will use BioBits® to express the gtfB gene and a fluorescently labeled version of a known inhibitor.

  • Test Sample: BioBits + gtfB DNA + Inhibitor DNA.
  • Control: BioBits + gtfB DNA (no inhibitor).
  • Measurements: I will use the P51 viewer to check for fluorescence. If the inhibitor is being produced and binding correctly, we should see a specific “quench” or shift in the signal. I’ll also use the miniPCR to check if the S. mutans DNA found in astronaut saliva samples shows an “upregulation” of the gtfB gene during flight compared to ground controls.

Homework Part B: Individual Final Project

I’ve added my slide to the slide deck and submitted my final project form.

Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

1. Which aspect(s) of your project will you measure?

The main goal is to measure how well my custom DNA construct actually stays stuck to the 3D-printed scaffold. I also need to measure the bioactivity of the produced protein — essentially checking if it actually triggers bone-growing signals like it’s supposed to. Finally, I’ll be measuring the retention time, which tells me how much longer my “anchored” version stays on the scaffold compared to a standard version that usually just washes away.

2. Which elements will you measure and how?

  • DNA Adsorption — I’ll measure how much DNA the scaffold can actually “soak up” and hold onto over time.
  • Protein Expression — Instead of using live animals right away, I’ll use a cell-free protein synthesis system to see if the DNA I designed successfully produces the chimeric peptide.
  • Osteogenic Bioactivity — To prove this grows bone, I’ll use an alkaline phosphatase assay, a classic lab test to see if bone-building cells are being activated.

3. What technologies will you use?

  • Benchling & DNA Synthesis — I’ll start by designing the DNA “blueprint” in Benchling and ordering the physical gene block from Twist Bioscience.
  • PCR & Gel Electrophoresis — To confirm the DNA was manufactured correctly and is the right size, I’ll amplify it with PCR and run it on an agarose gel to visualize the bands.
  • Ceramic 3D Printing — I’ll use specialized ceramic printing to create calcium phosphate scaffolds that mimic the structure of real human bone.
  • Opentrons OT-2 — To keep things precise and automated, I’ll write a Python script for this liquid-handling robot to load the DNA onto the scaffolds automatically.

Homework: Waters Part I — Molecular Weight

1. Calculated molecular weight of eGFP from the sequence

Using the Expasy Compute pI/Mw tool (web.expasy.org/compute_pi) on the provided eGFP sequence (including the LE linker and the HHHHHH His-tag):

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL
VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV
NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD
HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE
HHHHHH

Theoretical MW: 28,006.60 Da

2. MW from the adjacent charge state approach

Protein molecules in the mass spec pick up extra protons. I can use the spacing between two adjacent peaks to work out the charge state, and from that the intact mass.

Figure 1. Mass Spectrum of intact eGFP protein from the Waters Xevo G3 LC-MS (30,000 resolution), with individual charge state peaks labeled with $m/z$ values.

Selected peaks: $m_1 = 903.71$ and $m_2 = 875.44$.

Step 2.1 — Determine the charge $z$:

$$z = \frac{m_2 - 1.008}{m_1 - m_2} = \frac{875.44 - 1.008}{903.71 - 875.44} \approx \mathbf{31}$$

Step 2.2 — Determine MW:

$$MW = z \times (m_1 - 1.008) = 31 \times (903.71 - 1.008) \approx \mathbf{27{,}983.76\ \text{Da}}$$

Step 2.3 — Mass error (ppm):

$$\text{Error} = \frac{27{,}983.76 - 28{,}006.60}{28{,}006.60} \times 10^{6} \approx \mathbf{-815\ \text{ppm}}$$
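The three steps above can be reproduced with a short script, which also makes it easy to try other peak pairs. The function names are my own; the peak values, the 1.008 proton mass, and the theoretical mass are the ones used in this section. The formula assumes the two peaks are adjacent charge states, with the lower-$m/z$ peak carrying one more proton.

```python
PROTON = 1.008  # proton mass as rounded in the equations above


def charge_from_adjacent(mz_high: float, mz_low: float) -> int:
    """Charge of the higher-m/z peak, assuming the lower one is z + 1."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def intact_mass(mz: float, z: int) -> float:
    """Neutral mass from one peak: remove z protons, multiply by z."""
    return z * (mz - PROTON)


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


z = charge_from_adjacent(903.71, 875.44)
mw = intact_mass(903.71, z)
err = ppm_error(mw, 28006.60)
print(f"z = {z}, MW = {mw:.2f} Da, error = {err:.0f} ppm")
```

Running the same `intact_mass` calculation on the neighboring peak with `z + 1` is a quick sanity check that the charge assignment is consistent.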

3. Can you observe the charge state for the zoomed-in peak?

No. At such a high charge state (+31), the isotope spacing is only $1/31 \approx 0.03$ $m/z$. The instrument can’t cleanly resolve peaks that close together, so the isotopes blend into a single smooth envelope instead of appearing as individual lines.


Homework: Waters Part III — Peptide Mapping (Primary Structure)

1. How many Lysines (K) and Arginines (R) are in eGFP?

Counting the tryptic cleavage sites in the sequence:

| Residue | Count |
| --- | --- |
| Lysine (K) | 20 |
| Arginine (R) | 6 |
| Total cleavage sites | 26 |

2. How many peptides from tryptic digestion?

Trypsin acts like molecular scissors cutting after every K and R. 26 cut sites → 27 peptides.
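The count can be double-checked with a simple in-silico digest of the sequence from Part I. This sketch uses the common simplified rule (cut after K or R, but not when the next residue is P); it ignores missed cleavages and is not a substitute for the Expasy PeptideMass tool. The function name is my own.

```python
EGFP = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL"
    "VTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKLE"
    "HHHHHH"
)


def tryptic_peptides(seq: str) -> list:
    """Cut after every K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal piece after the last cut
    return peptides


peptides = tryptic_peptides(EGFP)
print("K:", EGFP.count("K"), "R:", EGFP.count("R"))
print("peptides:", len(peptides))
print("shortest:", sorted(peptides, key=len)[:5])
```

Listing the shortest peptides is a handy way to see which ones are likely too small to be retained on the column.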

Figure 4. Example conditions for predicting the number of tryptic peptides from the eGFP standard in the Expasy PeptideMass tool.

3. How many chromatographic peaks in Figure 5a (>10% abundance)?

Figure 5a. Total ion chromatogram (TIC) of the eGFP peptide map. The peak at 2.78 min is circled; its MS data appears in Figure 5b.

Counting peaks that stand clearly above baseline between 0.5–6 min: 21 peaks.

4. Does the number of peaks match the prediction?

No — I predicted 27 but only count 21. This usually happens because:

  • some peptides are so small they elute in the void volume (too fast to be separated),
  • some peptides co-elute (very close in hydrophobicity, so they come off the column at the same time),
  • and some may be below the 10% abundance threshold.

5. $m/z$ and charge of the peptide in Figure 5b

Figure 5b. Mass spectrum for the 2.78-min peak; inset zooms into $m/z$ 525.76 to show the isotopes.

  • Peak: $m/z = 525.76$
  • Charge — the isotope spacing is $\approx 0.5$, so $z = 1 / 0.5 = \mathbf{+2}$
  • Singly-charged mass:

$$[M+H]^+ = (525.76 \times 2) - 1.008 \approx \mathbf{1050.51\ \text{Da}}$$

6. Peptide identity and mass accuracy

Figure 5c. Fragmentation spectrum of the peptide eluting at 2.78 min.

This matches the predicted tryptic peptide FEGDTLVNR.

$$\text{Error (ppm)} = \frac{|1050.526 - 1050.521|}{1050.521} \times 10^{6} \approx \mathbf{4.7\ \text{ppm}}$$

A sub-5-ppm error is well within the spec of a high-resolution LC-MS, so the match is confident.
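As a cross-check on the theoretical value, the $[M+H]^+$ of FEGDTLVNR can be summed from standard monoisotopic residue masses, and the observed doubly charged peak can be converted to the same scale. The residue masses below are standard reference values rounded to five decimals; the proton mass here is 1.00728 rather than the rounded 1.008 used earlier, so the observed value differs slightly in the third decimal. Function names are my own.

```python
# Monoisotopic residue masses (Da) for the residues in this peptide
RESIDUE_MASS = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146,
    "D": 115.02694, "T": 101.04768, "L": 113.08406,
    "V": 99.06841, "N": 114.04293, "R": 156.10111,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728


def mh_theoretical(peptide: str) -> float:
    """Theoretical singly protonated monoisotopic mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def mh_from_mz(mz: float, z: int) -> float:
    """Convert an observed m/z at charge z to the [M+H]+ scale."""
    return z * mz - (z - 1) * PROTON


theory = mh_theoretical("FEGDTLVNR")
observed = mh_from_mz(525.76, 2)
print(f"theoretical [M+H]+ = {theory:.3f}")
print(f"observed    [M+H]+ = {observed:.3f}")
```

The observed value is limited by the two decimals read off the spectrum label, so the instrument's reported mass is the better number to use for the ppm calculation.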

7. Percentage of sequence confirmed

Figure 6. Amino acid coverage map of eGFP from the BioAccord LC-MS peptide identification data.

From the coverage map: 88% of the eGFP sequence was identified by peptide mapping.


Homework: Waters Part IV — Oligomers

Using the known subunit masses (7FU = 340 kDa, 8FU = 400 kDa), I identified the following oligomeric states on the CDMS spectrum:

Figure 7. Mass spectrum of Keyhole Limpet Hemocyanin (KLH) acquired on the CDMS.

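| Oligomer | Subunits × subunit mass | Calculated mass | Peak on spectrum |
| --- | --- | --- | --- |
| 7FU Decamer | 10 × 340 kDa | 3.4 MDa | ~3.4 MDa |
| 8FU Decamer | 10 × 400 kDa | 4.0 MDa | ~4.0 MDa |
| 8FU Didecamer | 20 × 400 kDa | 8.0 MDa | ~8.0 MDa |
| 8FU 3-Decamer | 30 × 400 kDa | 12.0 MDa | ~12.0 MDa |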

Homework: Waters Part V — Did I make GFP?

Final summary of the intact LC-MS measurement against theoretical values:

| Measurement | Theoretical | Measured | Result |
| --- | --- | --- | --- |
| Intact Protein Weight | 28.007 kDa | 27.984 kDa | ✅ Match (~815 ppm) |
| Sequence Coverage | 100% | 88% | ✅ High confidence |
| Peptide Mass $[M+H]^+$ | 1050.521 Da | 1050.526 Da | ✅ Very accurate (~4.7 ppm) |

Conclusion: the intact mass, the peptide-map coverage, and the MS/MS identification of FEGDTLVNR all agree with the expected eGFP standard — so yes, the protein is eGFP.

Week 11 HW: Bioproduction & Cloud Labs

Week 11 — Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Make a note on your HTGAA webpages including:

(a) What you contributed to the community bioart project (e.g., “I made part of the DNA on the bottom right plate”)

Honestly, I didn’t get to contribute a pixel this time. The window between the personalized URL going out and the editing deadline on Sunday 4/19 closed before I was able to sit down and place mine, which I’m a bit bummed about because the project sounded cool.

(b) What you liked about the project

There’s something charming about turning a piece of lab plasticware into a collaborative painting. The fact that the “paint” is actually six different fluorescent proteins (so the colours come from real biology, not a filter) made it feel meaningful in a way that ordinary digital art wouldn’t. And the global participation aspect was the part I kept thinking about: strangers in different time zones jointly producing one coherent image is exactly the kind of thing cloud labs are supposed to make possible.

(c) What about this collaborative art experiment could be made better for next year

The biggest thing for me would be a longer contribution window.


Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

E. coli Lysate, BL21(DE3) Star with T7 RNA Polymerase: This is the engine of the whole reaction. Lysing the cells gives you all the ribosomes, tRNAs, translation factors, and aminoacyl-tRNA synthetases you need. The BL21 Star background carries the rne131 mutation, a truncated RNase E with reduced mRNA-degrading activity, so your transcripts last longer. The included T7 RNAP transcribes any DNA template that’s under a T7 promoter.

Salts / Buffer

  • Potassium glutamate: The main monovalent cation source. Glutamate is used instead of chloride because it’s much gentler on translation. Chloride tends to inhibit ribosome activity at the concentrations you’d need.
  • HEPES-KOH pH 7.5: Buffers the reaction near physiological pH. This matters a lot over long incubations because cell-free reactions tend to acidify as energy substrates get metabolized.
  • Magnesium glutamate: Mg²⁺ is essential for ribosome assembly, RNAP activity, and basically every NTP-using enzyme. The concentration is fussy: too little and translation stalls, too much and you get misreading.
  • Potassium phosphate (mono- and dibasic): Together they act as a secondary buffer and supply inorganic phosphate, which feeds back into NTP regeneration.

Energy / Nucleotide System

  • Ribose: Phosphorylated in the lysate to ribose-5-phosphate, a pentose phosphate pathway intermediate, and then converted into PRPP, which is used to build nucleotides from free bases.
  • Glucose: Cheap, slow-burn energy source feeding glycolysis. It produces ATP gradually rather than all at once, which is what makes the long-incubation format possible.
  • AMP, CMP, GMP, UMP: The NMPs are starting material for nucleotide regeneration. Endogenous kinases in the lysate phosphorylate them up to NTPs, which are what RNAP uses.
  • Guanine: Free base that feeds into the salvage pathway (more on this in question 3).

Translation Mix (Amino Acids)

  • 17 amino acid mix: Provides the pool of monomers for the ribosome to assemble into protein.
  • Tyrosine and Cysteine: Added separately because they need their own handling. Tyrosine is poorly soluble at neutral pH and cysteine oxidizes readily in solution, so they’re broken out of the main mix.

Additives

Nicotinamide: Precursor for NAD⁺/NADH regeneration. The lysate’s redox metabolism eats through these cofactors quickly and keeping the NAD pool topped up helps maintain energy regeneration over long reactions.

Backfill

Nuclease-Free Water: Brings the reaction to its final volume without introducing contaminating RNases or DNases that would chew up your template or transcripts.

2. Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

The 1-hour PEP-NTP mix is built for speed: it ships pre-formed NTPs and uses phosphoenolpyruvate as a high-energy phosphate donor, so ATP regeneration happens in basically a single enzymatic step (via pyruvate kinase). You get a fast burst of expression, but PEP is expensive and gets exhausted quickly, so the reaction plateaus within an hour.

The 20-hour NMP-Ribose-Glucose mix takes the opposite approach. Instead of pre-built NTPs, it relies on the lysate’s own glycolysis and pentose phosphate pathway to slowly assemble NTPs from cheaper precursors (NMPs, ribose, glucose). Yields per unit time are lower but the reaction sustains itself for much longer.

In practice, PEP-NTP is what you’d reach for in a screen where you want a quick yes/no on expression, while NMP-Ribose-Glucose is the right call for our 36-hour fluorescence experiment where total integrated signal matters more than how fast it gets there.

3. Bonus question: How can transcription occur if GMP is not included but Guanine is?

Through the purine salvage pathway. The lysate retains the enzymes that normally let E. coli recycle bases rather than building them from scratch. Free guanine gets joined to PRPP (5-phosphoribosyl-1-pyrophosphate, made from the ribose in the energy mix) by a phosphoribosyltransferase, producing GMP. In E. coli this is mainly xanthine-guanine phosphoribosyltransferase (Gpt), the counterpart of mammalian HGPRT. From there, guanylate kinase phosphorylates GMP → GDP, and nucleoside diphosphate kinase finishes the job → GTP, which is the actual substrate T7 RNAP uses for transcription. So even without GMP in the mix directly, the reaction generates it on demand, which is partly why ribose is included in the energy system in the first place.


Part C: Planning the Global Experiment | Cell-Free Master Mix Design

1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (1-2 sentences each)

Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc.

sfGFP (superfolder GFP): The standout property here is folding robustness. sfGFP was engineered specifically to fold correctly even when fused to poorly folding partners or under suboptimal conditions. In a cell-free lysate, where chaperone availability is limited compared to a live cell, this matters a lot: you get reliable maturation and a clean fluorescence readout even if other reaction parameters drift.

mRFP1: Slow chromophore maturation is the big one. mRFP1 was an early-generation monomeric red derived from DsRed, and while it fixed the tetramerization problem, the maturation half-time is still on the order of an hour. A chunk of the protein synthesized in a 36-hour cell-free run will therefore be sitting in a non-fluorescent intermediate state, especially at the early time points.

mKO2: Acid sensitivity. mKO2 has a pKa of about 5.5, which sounds fine next to a starting pH of 7.5, but cell-free reactions noticeably acidify over long incubations as glycolytic byproducts accumulate. As the pH drifts toward the pKa, mKO2’s apparent brightness can drop later in the run not because less protein is being made, but because more of it is sitting in a protonated, dim state.

mTurquoise2: Very high quantum yield (~0.93) and tight folding kinetics. The practical consequence in cell-free is that you get an unusually favorable signal-to-noise ratio per molecule of folded protein, so mTurquoise2 is forgiving of low expression yields. It’s also a popular FRET donor for the same reason.

mScarlet-I: The “I” variant was engineered specifically for faster maturation than the original mScarlet, trading a small amount of brightness for speed. In a 36-hour cell-free run that distinction shows up clearly: you see signal accumulation earlier in the time course rather than only at late time points, which is the main reason mScarlet-I tends to be preferred over mScarlet in dynamic measurements.

Electra2: Oxygen-dependent chromophore maturation. All FPs in the GFP/DsRed family need molecular O₂ for the oxidation step that completes the chromophore after the autocatalytic cyclization, but Electra2 is on the more demanding end. In a sealed or partially anaerobic cell-free reaction, that can cap the fraction of molecules that ever become fluorescent.
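
To put rough numbers on the mKO2 acid-sensitivity point, here is a small Python sketch based on the Henderson-Hasselbalch relation. It assumes a single titratable site and that only the deprotonated chromophore is bright. The pKa is the value quoted above, and the pH values are a hypothetical acidification trajectory, not measured data.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of chromophore in the deprotonated state.

    Assumes a single titratable site; the deprotonated form is taken to be
    the bright one (an assumption for this sketch).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


MKO2_PKA = 5.5  # value quoted in the text

# Hypothetical pH values a reaction might pass through as it acidifies.
for ph in (7.5, 7.0, 6.5, 6.0, 5.5):
    bright = fraction_deprotonated(ph, MKO2_PKA)
    print(f"pH {ph:.1f}: {bright:.2f} of mKO2 in the bright state")
```

The loss is small near the starting pH and grows quickly as the pH approaches the pKa, where half of the protein is dim by definition.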

2. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

I’m going to base my hypothesis on sfGFP, partly because it’s the protein I have the cleanest mental model for and partly because its main bottleneck is something you can actually push on with the master mix.

Hypothesis: For sfGFP, modestly increasing magnesium glutamate (in the 8 to 12 mM range) together with topping up the energy regeneration components, specifically ensuring sustained NTP supply via the NMP-Ribose-Glucose route, will increase the time-integrated fluorescence over a 36-hour incubation.

Reasoning: sfGFP’s folding is essentially not the rate-limiting step (that’s the whole point of “superfolder”), so the real ceiling on its 36-hour signal is how much protein the system can actually translate before energy or ribosome activity gives out. Mg²⁺ directly affects both ribosome stability and translation fidelity, so a small bump should improve elongation rates without tipping into misreading territory. And because sfGFP matures fast and is photostable, every additional molecule synthesized translates almost immediately into detectable signal, with no maturation backlog masking the gains.

Expected effect: Earlier rise to a higher fluorescence plateau, and crucially a flatter decay curve toward the back end of the 36-hour window, since translation continues feeding new molecules into a pool that doesn’t lose signal quickly.
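
The "no maturation backlog" argument can be illustrated with a toy model. The Python sketch below assumes protein is synthesized at a constant rate and matures with first-order kinetics, neither of which is guaranteed in a real lysate. The 1-hour half-time is the order of magnitude quoted above for mRFP1. The 0.25-hour value is a hypothetical stand-in for a fast maturer, not a measured sfGFP number.

```python
import math


def mature_fraction(t_h: float, t_half_h: float) -> float:
    """Fraction of all protein made by time t_h that has already matured.

    Toy model: constant synthesis rate, first-order maturation with
    half-time t_half_h. Immature pool = r * (1 - exp(-k t)) / k and
    total made = r * t, so the rate r cancels out.
    """
    if t_h <= 0:
        return 0.0
    k = math.log(2) / t_half_h
    return 1.0 - (1.0 - math.exp(-k * t_h)) / (k * t_h)


half_times_h = {
    "fast maturer (hypothetical 0.25 h)": 0.25,
    "slow maturer (~1 h, as quoted for mRFP1)": 1.0,
}

for label, t_half in half_times_h.items():
    row = ", ".join(f"{t} h: {mature_fraction(t, t_half):.2f}" for t in (1, 4, 36))
    print(f"{label} -> {row}")
```

In this model the gap between fast and slow maturers is largest at early time points and mostly closes by 36 hours, which is why maturation matters more for the shape of the curve than for the endpoint.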


Notes on next phases

The second phase of this lab will be to define the precise reagent concentrations for the cell-free experiment. Wells with specific fluorescent proteins will be assigned and instructions sent by email by April 24. Master mix compositions can be drafted ahead of time.

The final phase will be analyzing the fluorescence data once it comes back, to see if any conclusions can be drawn about favorable reagent compositions for our fluorescent proteins. Due a week after the data is returned (date TBD).

Reaction composition per well

| Component | Volume |
|---|---|
| Lysate | 6 μL |
| 2X Optimized Master Mix | 10 μL |
| Assigned fluorescent protein DNA template | 2 μL |
| Custom reagent supplements | 2 μL |
| Total | 20 μL |
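
For drafting master mix compositions ahead of time, the per-well volumes can be scaled to a batch, and the 2 μL supplement slot can be converted into a required stock concentration with C1·V1 = C2·V2. The Python sketch below uses the volumes from the table. The 10% pipetting overage, the well count, and the function names are assumptions for illustration only.

```python
# Per-well volumes in µL, taken from the table above.
PER_WELL_UL = {
    "Lysate": 6.0,
    "2X Optimized Master Mix": 10.0,
    "DNA template": 2.0,
    "Custom reagent supplements": 2.0,
}
TOTAL_UL = sum(PER_WELL_UL.values())


def scale_up(n_wells: int, overage: float = 0.10) -> dict[str, float]:
    """Batch volumes for n_wells plus a fractional overage (10% is an assumed default)."""
    factor = n_wells * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_WELL_UL.items()}


def supplement_stock_mm(delta_final_mm: float) -> float:
    """Stock concentration (mM) needed so the 2 µL supplement adds
    delta_final_mm to the final reaction (C1*V1 = C2*V2)."""
    return delta_final_mm * TOTAL_UL / PER_WELL_UL["Custom reagent supplements"]


print(f"total per well: {TOTAL_UL} µL")
print(scale_up(8))               # hypothetical batch of 8 wells
print(supplement_stock_mm(4.0))  # stock needed to add 4 mM to the reaction
```

Because the supplement is 2 μL of a 20 μL reaction, any stock is diluted tenfold. Adding 4 mM magnesium glutamate (for example, moving from 8 to 12 mM, the range in the hypothesis) would need a 40 mM stock if the whole supplement slot is used for it.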

Week 12 HW: Building Genomes

Week 13 HW: Biodesign and Engineered Living Materials

Week 14 HW: Bio Design and Bio Fabrication

Subsections of Labs

Week 1 Lab: Pipetting

Subsections of Projects

Individual Final Project

Group Final Project
