IAN SEBASTIAN TERAN GARCIA — HTGAA Spring 2026

About me

Hello! I’m Ian Sebastian Teran Garcia. I’m a third-year Biotechnology Engineering student in Cochabamba, Bolivia. I’m passionate about Synthetic Biology and Bioinformatics :)

I am also a co-founder of ReGlassia, a synthetic biology startup. You can learn more about us here: https://linktr.ee/re.glassia

Contact info

Homework

Homework 01 — Principles & Practices Side A

Weyes Blood — Titanic Rising

HTGAA 2026 · Week 01

Homework 02 — DNA Read, Write & Edit Side A

Björk — Vespertine

HTGAA 2026 · Week 02

Labs

Projects

Subsections of IAN SEBASTIAN TERAN GARCIA — HTGAA Spring 2026

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    This page includes Class Assignment and Week 2 Lecture Preparation Questions
  • Week 10 HW: Advanced Imaging & Measurement Technology

    Homework: Final Project 1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. I would like to measure multiple biological and functional aspects of the synthetic rhizosphere consortium composed of Pseudomonas fluorescens, Azospirillum brasilense, and Bacillus subtilis. Key variables include the production of osmoprotectants (such as proline or trehalose) under saline stress, nitrogen fixation efficiency, biofilm formation and exopolysaccharide (EPS) production, and the presence, sequence accuracy, and expression of engineered genetic constructs, including kill switch systems. At a higher level, the project will also assess microbial population dynamics and plant growth indicators such as root length and biomass, which serve as direct proxies for improved agricultural productivity under salt stress.

  • Week 11 HW: Bioproduction & Cloud Labs

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

    1. My contribution: Unfortunately, I was not able to contribute a pixel to the collective artwork, as I was in the middle of midterm exams at my university during that period, which limited my availability to participate.

    2. What I liked about the project: I really liked the project because of its biological foundation, particularly its connection to cell-free fluorescent protein optimization and how it was used for a global pixel artwork designed by HTGAA students :)
  • Week 2 HW: DNA Read, Write, & Edit

    HOMEWORK 2

    Part 1: Benchling & In-silico Gel Art

    See this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis” for details.

    Overview:

      • Make a free account at benchling.com
      • Import the Lambda DNA.
      • Simulate Restriction Enzyme Digestion with the following Enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, SalI
      • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
      • You might find Ronan’s website a helpful tool for quickly iterating on designs!

    HOMEWORK RESULTS :)

  • Week 4 HW: Protein Design

    Part A. Conceptual Questions

    1. How many amino acid molecules do you take with 500 g of meat?

    If we assume that meat is approximately 20% protein by mass, then 500 g of meat contains about 100 g of protein. The average molecular weight of an amino acid is roughly 100 Da (100 g/mol), so 100 g ÷ 100 g/mol ≈ 1 mol of amino acids. One mole contains 6.02 × 10²³ molecules (Avogadro’s number), so eating 500 g of meat means ingesting roughly 6 × 10²³ amino acid molecules.
  • Week 5 HW: Protein Design Part II

    Part A: SOD1 Binder Peptide Design (From Pranam) Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • Week 6 HW: Genetic Circuits Part I

    Assignment: DNA Assembly Answer these questions about the protocol in this week’s lab:

  1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? The Phusion High-Fidelity PCR Master Mix contains several key components necessary for efficient and accurate DNA amplification. First, it includes Phusion DNA polymerase, a high-fidelity enzyme with proofreading activity (3’ → 5’ exonuclease), which reduces errors during amplification. It also contains dNTPs (deoxynucleotide triphosphates), which are the building blocks used to synthesize new DNA strands. The mix includes a reaction buffer with the pH and salt concentrations optimized for enzyme activity. Additionally, it contains Mg²⁺ ions, which act as essential cofactors for the polymerase. Some mixes may also include stabilizers to maintain enzyme activity during thermal cycling.
    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

  1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

      • Continuous signal processing: Unlike Boolean circuits that operate in binary (On/Off), IANNs can process graded inputs and outputs, enabling more nuanced cellular responses.
      • Integration of multiple inputs: IANNs can combine many signals simultaneously and compute a weighted response, similar to an artificial neural network.
      • Nonlinear computation: Instead of being limited to simple logic gates (AND, OR, NOT), IANNs can model nonlinear relationships between inputs and outputs.
  • Week 9 HW: Cell-Free Systems

    Homework Part A: General and Lecturer-Specific Questions

    General homework questions

    1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

    Cell-free protein synthesis (CFPS) offers significant advantages over traditional in vivo expression systems, primarily due to its flexibility and precise control over experimental conditions. Because CFPS operates in an open environment without living cells, researchers can directly manipulate the concentrations of DNA templates, ions, cofactors, and other components in real time. This eliminates constraints associated with cellular viability, such as toxicity or metabolic burden. As a result, CFPS is particularly advantageous for the production of proteins that are toxic to host cells, such as antimicrobial peptides or pore-forming proteins. Additionally, CFPS enables rapid prototyping of genetic constructs, making it highly suitable for applications like synthetic biology circuit testing, where speed and iterative design are essential.

  • Week 3 HW: Lab Automation

    Assignment: Python Script for Opentrons Artwork

    Link: https://colab.research.google.com/drive/1k7nG8YrBwt0K0HMJtZNGSgcDcVOCriU7#scrollTo=JpjyYDE79Dfl

    MY CODE:

        import math

        ################################
        # GREEN SECTION (Body + Flagella)
        ################################
        # pipette_20ul, center_location, location_of_color, dispense_and_detach
        # and types are assumed to be defined earlier in the linked Colab notebook.
        pipette_20ul.pick_up_tip()
        center = center_location

        # Oval body: 40 points around an ellipse with semi-axes a and b
        a = 16
        b = 8
        points = 40
        for i in range(points):
            # Aspirate 8 units every 8 points; each point dispenses 1
            if i % 8 == 0:
                pipette_20ul.aspirate(8, location_of_color('Green'))
            angle = 2 * math.pi * i / points
            x = a * math.cos(angle)
            y = b * math.sin(angle)
            loc = center.move(types.Point(x=x, y=y, z=0))
            dispense_and_detach(pipette_20ul, 1, loc)

        # Flagella:
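
The geometry of the oval-body loop above can be checked offline, without a robot or the Colab helpers. The following is a minimal sketch, not part of the submitted script: `ellipse_points` and `refill_steps` are hypothetical helper names, and the values a = 16, b = 8 and 40 points are taken from the code above.

```python
import math


def ellipse_points(a, b, points):
    """Return (x, y) offsets evenly spaced in angle around an ellipse."""
    coords = []
    for i in range(points):
        angle = 2 * math.pi * i / points
        coords.append((a * math.cos(angle), b * math.sin(angle)))
    return coords


def refill_steps(points, dispenses_per_aspirate):
    """Indices at which the loop would aspirate again."""
    return [i for i in range(points) if i % dispenses_per_aspirate == 0]


body = ellipse_points(a=16, b=8, points=40)
print(body[0])               # first offset, on the positive x axis
print(refill_steps(40, 8))   # loop indices where a refill happens
```

Plotting or printing `body` before running the protocol is a quick way to confirm that every offset stays inside the usable area of the plate.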

Subsections of Homework

Week 1 HW: Principles and Practices

This page includes Class Assignment and Week 2 Lecture Preparation Questions


Class Assignment

1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.

For HTGAA 2026, I’d like to propose the design and development of a synthetic-biology-based microbial system to improve agricultural productivity in the saline soils of the Bolivian Altiplano. Soil salinization continues to advance in the high-altitude areas of Bolivia as a consequence of climate change, water shortage and historical land use (Andrade, 2025). According to the Food and Agriculture Organization (n.d.), a considerable fraction of irrigated and arid agricultural land worldwide already faces the challenge of soil salinity. Scientific studies have shown that soil salinity significantly reduces crop yields, alters soil biological functions, and directly threatens food security, particularly in smallholder farming systems (Farooq et al., 2021). Likewise, most smallholder farmers in the Altiplano rely on marginal soils where conventional fertilizers are often ineffective or unaffordable, so salinization is a direct threat to local food security and livelihoods. This is why my proposed project aims to investigate the conceptual design of soil microorganisms that can sense high salinity and, in response, improve soil structure and plant stress tolerance. Beyond its technical feasibility, this application raises ethical, environmental and governance issues around environmental release, biosafety and equitable access to biotechnology. Finally, as a Bolivian, I see this work as an opportunity to link cutting-edge biological engineering with locally anchored solutions that address real challenges faced by vulnerable agricultural communities in my country.

2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals. Below is one example framework (developed in the context of synthetic genomics) you can choose to use or adapt, or you can develop your own. The example was developed to consider policy goals of ensuring safety and security, alongside other goals, like promoting constructive uses, but you could propose other goals for example, those relating to equity or autonomy.

Main Goal: Ensuring Environmental Safety and Biosecurity

This goal focuses on preventing ecological harm and unintended consequences associated with the environmental use of engineered microorganisms.

  • Sub-goal 1: Preventing Uncontrolled Spread

-> Design biological containment mechanisms to limit survival outside target environments.

-> Require environmental risk assessments prior to any field deployment.

  • Sub-goal 2: Reducing Ecological Uncertainty

-> Promote long-term monitoring of soil and microbial ecosystem impacts.

-> Establish protocols for detecting and responding to unintended ecological effects.

Main Goal: Promoting Equity and Responsible Use

This goal ensures that the benefits of the technology reach vulnerable communities without reinforcing existing inequalities.

  • Sub-goal 1: Supporting Smallholder Farmers

-> Ensure that the technology is affordable and adapted to local agricultural contexts.

-> Encourage community involvement in deployment decisions.

  • Sub-goal 2: Preventing Technological Exploitation

-> Avoid extractive research practices in developing regions.

-> Promote benefit-sharing and local capacity building.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).

  • Purpose: What is done now and what changes are you proposing?
  • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
  • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

A) Biosafety by design through genetic containment. Purpose: Current agricultural biotechnology often relies on external monitoring after deployment. This action proposes embedding biosafety mechanisms directly into the engineered organisms.

Design:

  • Implemented by academic researchers and biotech developers.
  • Reviewed by institutional biosafety committees and environmental regulators.

Assumptions: Genetic containment systems function reliably in complex soil environments.

Risks of Failure & “Success”:

  • Failure: Evolutionary escape from containment mechanisms
  • Unintended Success: Reduced emphasis on ecological monitoring due to overconfidence in technical controls.

B) Regulatory frameworks for environmental synthetic biology. Purpose: Environmental release regulations are often unclear or inconsistent. This action proposes clearer regulatory pathways specific to environmental synthetic biology.

Design: National environmental and agricultural agencies conduct standardized risk assessments.

Assumptions: Regulators have sufficient technical expertise.

Risks of Failure & “Success”:

  • Failure: Overregulation slows innovation
  • Unintended Success: Rapid approval without sufficient local adaptation

C) Community centered deployment and oversight. Purpose: Agricultural technologies should align with the needs and values of affected communities.

Design:

  • Collaboration among researchers, NGOs, and local farming communities.
  • Participatory decision making processes.

Assumptions: Community participation is meaningful and informed.

Risks of Failure & “Success”:

  • Failure: Delays due to conflicting priorities.
  • Unintended Success: Token participation without real influence.

4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

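| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 2 | 2 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 2 | N/A |
| • By helping respond | 2 | 2 | N/A |
| Protect the environment | | | |
| • By preventing incidents | 2 | 1 | 1 |
| • By helping respond | 2 | 2 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 2 |
| • Feasibility? | 1 | 2 | 2 |
| • Not impede research | 2 | 3 | 1 |
| • Promote constructive applications | 2 | 2 | 1 |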
5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why.

Based on the comparative scoring of the governance options, I would prioritize a combination of biosafety by design and community centered governance. Embedding safety mechanisms directly into engineered soil microorganisms is essential to prevent unintended ecological harm and to address biosecurity concerns at the earliest stage of development. This option performs strongly in preventing incidents and maintaining laboratory and environmental safety, making it a foundational requirement for any responsible application of environmental synthetic biology. At the same time, community centered governance is critical for ensuring that this technology is deployed ethically in the Bolivian Altiplano: engaging local farming communities helps align the technology with real agricultural needs, promotes trust and reduces the risk of inequitable or extractive use.

Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.

A key ethical concern that stood out to me was the increasing use of artificial intelligence in synthetic biology. AI tools can greatly accelerate the design of engineered microorganisms, such as those proposed in my project to improve agricultural productivity in saline soils of the Bolivian Altiplano. However, an ethical issue that was new to me is that decisions driven by AI models may lack transparency or embed biases, potentially leading to unintended ecological consequences when organisms are applied in open environments. To address these issues, I would suggest governance actions such as transparency in the use of AI for biological design, and rigorous validation and risk assessment prior to environmental application. In addition, governance frameworks should encourage participatory approaches that involve local communities and ensure that the resulting technologies are accessible, safe and aligned with local agricultural needs.

Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson:

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

DNA polymerase copies DNA with high accuracy: its raw error rate is about 1 mistake per 10⁵ nucleotides copied. Most DNA polymerases also have a proofreading function that corrects many of these mistakes, improving accuracy to about 1 error per 10⁷ to 10⁸ nucleotides. After replication, additional DNA repair systems fix remaining errors, bringing the final error rate to roughly 1 mistake per 10⁹ to 10¹⁰ nucleotides. The human genome, by comparison, is about 3 × 10⁹ base pairs long, so at the raw polymerase error rate each copy of the genome would carry tens of thousands of errors (3 × 10⁹ × 10⁻⁵ ≈ 3 × 10⁴). Biology deals with this discrepancy through three layers of control: polymerase proofreading, mismatch repair and other DNA repair pathways. Together they keep mutation rates low enough for genome stability while still allowing evolution.
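
The comparison between error rate and genome length is a single multiplication per fidelity layer. The sketch below uses the figures quoted in the answer, taking the more error-prone end of each range. The dictionary keys are illustrative labels only.

```python
GENOME_BP = 3e9  # approximate human genome length used in the answer

# Errors per nucleotide copied, from the ranges quoted above
# (the more error-prone end of each range).
error_rates = {
    "polymerase alone": 1e-5,
    "with proofreading": 1e-7,
    "with proofreading and repair": 1e-9,
}

expected_errors = {
    stage: rate * GENOME_BP for stage, rate in error_rates.items()
}
for stage, n in expected_errors.items():
    print(f"{stage}: about {n:,.0f} errors per genome copy")
```

With these inputs the expected count falls from roughly 30,000 errors per copy to roughly 300 and then to about 3, which is why all three layers are needed.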

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Proteins are encoded using codons, which are groups of three DNA nucleotides. There are 64 possible codons but only 20 amino acids plus stop signals, so most amino acids are encoded by multiple codons; this is called the degeneracy of the genetic code. For an average human protein of about 400 amino acids, with roughly three codons available per amino acid, the number of DNA sequences that could encode the same protein is on the order of 3⁴⁰⁰ ≈ 10¹⁹⁰. In practice, however, most of these sequences do not work well: some codons are translated more efficiently than others, certain sequences affect mRNA stability, others create unwanted secondary structures, and some interfere with translation speed and protein folding. Moreover, regulatory elements, splicing signals, and GC content also limit which DNA sequences can successfully produce a functional protein in real cells.
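
The size of this sequence space is easiest to handle in logarithms. The sketch below counts synonymous coding sequences from the standard genetic code's degeneracy (61 sense codons over 20 amino acids). The function name and the short peptide are hypothetical, and the estimate of about three codons per residue is a rough assumption, not a measured average for human proteins.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons in total).
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2,
    "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def log10_synonymous_sequences(protein):
    """log10 of the number of DNA sequences encoding this protein."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein)


# Rough estimate for a 400-residue protein at about 3 codons per residue.
rough_log10 = 400 * math.log10(3)
print(f"about 10^{rough_log10:.0f} possible coding sequences")

# Exact count for a short, made-up peptide: M (1) x K (2) x W (1) = 2.
print(10 ** log10_synonymous_sequences("MKW"))
```

Working in log10 avoids building astronomically large integers and makes it easy to compare proteins of different length or composition.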

Homework Questions from Dr. LeProust:

1. What’s the most commonly used method for oligo synthesis currently?

The most widely used method today is solid-phase phosphoramidite chemical synthesis. In this method, DNA is built one nucleotide at a time on a solid support (such as controlled-pore glass). Each synthesis cycle adds one base through a series of chemical steps (deprotection, coupling, capping, oxidation). The process is fast and reliable for short DNA sequences, which is why it dominates both research and commercial oligo production.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because each synthesis step is not 100% efficient. As oligos get longer, these small inefficiencies compound, so a growing share of the molecules is truncated or carries an incorrect sequence. By around 200 cycles the fraction of full-length, correct molecules has dropped sharply. In addition, longer oligos accumulate chemical side products, have higher error rates and are harder to purify.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Because a 2000 bp gene would require 2000 consecutive chemical synthesis cycles, the yield of correct, full-length DNA would be extremely low. The final product would be dominated by short fragments and mutated sequences, making purification impractical. Instead, long genes are made by assembling shorter, high-quality oligos through methods such as Gibson assembly or Golden Gate assembly, which improves accuracy and yield.
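
The compounding argument in the last two answers can be made concrete. If every coupling step succeeds independently with the same probability, the fraction of full-length chains after n cycles is that probability raised to the power n. In this minimal sketch the 99% per-step efficiency is an illustrative assumption, not a figure from the lecture, and `full_length_fraction` is a hypothetical helper.

```python
def full_length_fraction(cycles, coupling_efficiency):
    """Fraction of chains that completed every cycle, assuming each
    coupling step succeeds independently with the same probability."""
    return coupling_efficiency ** cycles


EFFICIENCY = 0.99  # assumed per-step coupling efficiency (illustrative)

for cycles in (20, 200, 2000):
    fraction = full_length_fraction(cycles, EFFICIENCY)
    print(f"{cycles:>4} cycles: {fraction:.2e} full-length")
```

Under this assumption about 82% of chains are full length at 20 cycles and about 13% at 200. At 2000 cycles the fraction is on the order of one in a billion, which is why genes are assembled from short oligos rather than synthesized directly.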

Homework Question from George Church:

1. [Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids that all animals require are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine (HyperPhysics, n.d.). This changed my view of the “Lysine Contingency”: since all animals already require lysine from their environment, synthetic biology could turn this constraint into a design principle. By engineering organisms that depend on externally supplied lysine, scientists would be able to control growth, improve biosafety and limit ecological spread. In my opinion, this would be very interesting for applications in agriculture.

References.

Andrade, D. (2025). Characterization, prediction, and remediation of salt-affected soils in the High Valley of Cochabamba - Bolivia (Doctoral thesis, Université de Liège - Gembloux Agro-Bio Tech). ORBi-University of Liège. https://orbi.uliege.be/handle/2268/325556

Farooq, M., et al. (2021). Climate change and salinity effects on crops and plant–microbe interactions. Frontiers in Sustainable Food Systems, 5, 618092. https://www.frontiersin.org/articles/10.3389/fsufs.2021.618092/full

Food and Agriculture Organization of the United Nations. (n.d.). Soil salinity. FAO Global Soil Partnership. https://www.fao.org/global-soil-partnership/areas-of-work/soil-salinity/en/

HyperPhysics. (n.d.). Essential Amino Acids. HSC.edu.kw. http://www.hsc.edu.kw/student/materials/Physics/website/hyperphysics%20modified/hbase/organic/essam.html

Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

1. Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

I would like to measure multiple biological and functional aspects of the synthetic rhizosphere consortium composed of Pseudomonas fluorescens, Azospirillum brasilense, and Bacillus subtilis. Key variables include the production of osmoprotectants (such as proline or trehalose) under saline stress, nitrogen fixation efficiency, biofilm formation and exopolysaccharide (EPS) production, and the presence, sequence accuracy, and expression of engineered genetic constructs, including kill switch systems. At a higher level, the project will also assess microbial population dynamics and plant growth indicators such as root length and biomass, which serve as direct proxies for improved agricultural productivity under salt stress.

2. Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

Osmoprotectant levels will be quantified using high-performance liquid chromatography (HPLC) or mass spectrometry, which allow precise detection of small metabolites. Nitrogen fixation will be evaluated using the acetylene reduction assay (ARA) to measure nitrogenase activity, complemented by colorimetric assays for ammonia production. Biofilm formation will be quantified using crystal violet staining, while EPS production will be assessed using carbohydrate quantification assays. Gene expression levels associated with salt response and nitrogen fixation will be measured using quantitative PCR (qPCR), and reporter systems (fluorescence) may be used to monitor activation of engineered circuits such as salt-inducible promoters or kill switches. Plant performance will be evaluated through standard phenotyping methods, including biomass measurements and root morphology analysis.

3. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

DNA sequencing (Sanger or next-generation sequencing) will be used to confirm the accuracy of genetic constructs designed in Benchling, while gel electrophoresis will verify plasmid size and integrity. Mass spectrometry and HPLC will enable sensitive metabolite quantification, and qPCR will provide precise measurement of gene expression levels. Protein expression can be validated using Western blotting or fluorescence-based detection systems. Additionally, colony-forming unit (CFU) counts and live/dead staining assays will be used to evaluate kill switch functionality under different environmental conditions. Finally, 16S rRNA sequencing will allow monitoring of microbial community composition and stability within the consortium. Together, these technologies create a comprehensive and quantitative framework to validate the performance and safety of the designed system.

Homework: Waters Part I — Molecular Weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

eGFP Sequence:

MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE HHHHHH

Note: This contains a His-purification tag (HHHHHH) and a linker (the LE before it).

According to the ExPASy Compute pI/Mw tool, the theoretical molecular weight of the eGFP construct is 28,006.60 Da (≈ 28.01 kDa), with a predicted pI of 5.90.

2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

a) The formula provided expresses the charge state in terms of the ratios m/zₙ and m/zₙ₊₁, which represent two adjacent peaks in the mass spectrum.

Although it is written as m divided by z, these terms correspond directly to the experimentally measured m/z values of the peaks. Therefore, the equation can be simplified by replacing m/zₙ and m/zₙ₊₁ with the actual peak values. In this case, the peaks at 933.7148 and 965.9684 are adjacent.

Since m/z is inversely proportional to charge, the lower m/z value (933.7148) corresponds to the higher charge state (z+1), and the higher m/z value (965.9684) corresponds to the lower charge state (z).

z = (m/z)ₙ₊₁ / ((m/z)ₙ − (m/z)ₙ₊₁)

z = (smaller peak) / (bigger peak − smaller peak)

z = 933.7148 / (965.9684 − 933.7148)

z = 933.7148 / 32.2536

z = 28.95 ≈ 29

b) The molecular weight (MW) was calculated using:

MW = z * (m/z) - z * mH

Where mH = 1.007276 Da (mass of a proton).

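Substituting the values for the peak at 933.7148, which carries z + 1 = 30 charges (the value z = 29 belongs to the adjacent peak at 965.9684):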

MW = 30 * 933.7148 - 30 * 1.007276

MW = 28011.444 - 30.21828

MW = 27981.23 Da

The term mH represents the mass of a proton (H⁺), which is approximately 1.007276 Da. This value is used because, in electrospray ionization mass spectrometry (ESI-MS), proteins are ionized by gaining protons, forming positively charged ions of the form [M + zH]⁺ᶻ. As a result, the measured m/z value includes not only the mass of the protein but also the mass of the added protons. Each proton contributes both one unit of positive charge and an additional mass of about 1.007276 Da.
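The adjacent charge state calculation in parts a) and b) can be sketched in a few lines. This is a minimal illustration using the two peak values quoted above; the function names are hypothetical. Unlike the simplified formula in part a), the charge estimate here also subtracts the proton mass, which follows from writing m/z = (M + z·mH)/z for both peaks and changes the result only slightly.

```python
PROTON_MASS = 1.007276  # Da, same value as mH in the text


def charge_of_higher_mz_peak(mz_high: float, mz_low: float) -> float:
    """Charge n of the higher-m/z peak, given the adjacent lower-m/z
    peak that carries n + 1 charges. Solves the two equations
    mz = (M + z * mH) / z for n."""
    return (mz_low - PROTON_MASS) / (mz_high - mz_low)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M from one peak: M = z * (m/z) - z * mH."""
    return z * (mz - PROTON_MASS)


if __name__ == "__main__":
    mz_high, mz_low = 965.9684, 933.7148  # adjacent peaks from the text
    n = round(charge_of_higher_mz_peak(mz_high, mz_low))
    print("charge of 965.9684 peak:", n)
    print("charge of 933.7148 peak:", n + 1)
    print("mass from 933.7148 peak:", round(neutral_mass(mz_low, n + 1), 2), "Da")
    print("mass from 965.9684 peak:", round(neutral_mass(mz_high, n), 2), "Da")
```

Computing the mass from both peaks is a quick sanity check: the two estimates should agree to within a few Da if the charge assignment is right.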

c)

Accuracy = |MWexperimental − MWtheoretical| / MWtheoretical

Accuracy = |27981.23 − 28006.60| / 28006.60

Accuracy = 25.37 / 28006.60

Accuracy ≈ 0.000906

An error of approximately 0.0906% (about 906 ppm, or 25 Da) indicates that the experimentally calculated molecular weight is close to the theoretical value.

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not.

Yes, the charge state can be observed because the small peaks in this region (1473.5333, 1473.7429, 1474.0481) correspond to the isotopic distribution of a single charge state. The spacing between adjacent isotopic peaks is approximately 0.3 m/z units.

Since isotopic spacing follows the relationship Δ(m/z) = 1/z, the charge state can be estimated as z = 1/0.3 ≈ 3.3, which rounds to 3. Therefore, the peak corresponds to a charge state of approximately +3. While it is true that adjacent charge states in the full spectrum are separated by much larger differences in m/z, the charge state of an individual peak can still be determined from the isotopic spacing within the zoomed-in region.

Homework: Waters Part II — Secondary/Tertiary structure

1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations.

What happens when a protein unfolds?

In its native state, a protein like eGFP is properly folded into a compact, globular structure stabilized by noncovalent interactions (hydrogen bonds, hydrophobic interactions, ionic interactions). Many basic residues that can accept protons are buried inside the structure. When a protein becomes denatured, it unfolds into an extended conformation. This disrupts its tertiary structure and exposes previously buried residues, including basic amino acids (e.g., Lys, Arg, His), to the solvent.

How is that determined with a mass spectrometer?

This is determined in a mass spectrometer by measuring the mass-to-charge ratio (m/z) of the protein ions produced during electrospray ionization on the Waters Xevo G3-QToF. As the protein enters the instrument, it picks up multiple protons, forming ions with different charge states. The instrument detects these ions as a series of peaks at different m/z values.

For a folded (native) protein, fewer protonation sites are accessible, so the protein carries fewer charges, and the detected peaks appear at higher m/z values with a narrow distribution. For a denatured protein, more sites are exposed, allowing more protons to attach, which produces ions with higher charge states that appear at lower m/z values and over a broader range. Thus, by analyzing the charge state distribution and the position of peaks in the spectrum, the mass spectrometer allows us to determine whether the protein is in a native or denatured conformation.

What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

The denatured protein (top spectrum) displays a broad distribution of many peaks at lower m/z values, indicating that the unfolded protein has acquired a higher number of charges. In contrast, the native protein (bottom spectrum) shows a narrower distribution with fewer peaks at higher m/z values, consistent with lower charge states. Overall, the denatured spectrum is more spread out and shifted to lower m/z, while the native spectrum is more compact and shifted to higher m/z.

2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800? What is the charge state? How can you tell?

The peak observed at approximately m/z ≈ 2800 in the native mass spectrum of eGFP corresponds to a specific charge state of the protein. In native mass spectrometry, proteins typically appear as a series of peaks rather than a single signal because they can carry multiple positive charges. Each peak in this series represents the same protein with a different number of charges (z), and determining this charge state is essential for interpreting the spectrum.

Zooming into the region around m/z ≈ 2545 shows that what initially appears to be a single peak is actually composed of multiple closely spaced isotopic peaks, and their spacing provides direct information about the charge state. Specifically, the distance between adjacent isotopic peaks is equal to 1/z.

From the zoomed spectrum, the spacing between neighboring isotopic peaks is approximately 0.1 m/z units. Using the relationship Δ(m/z) = 1/z, the charge state can be calculated as z = 1/0.1 = 10. This indicates that the protein molecules contributing to this signal carry ten positive charges. Because the peaks in this region belong to the same charge envelope, the peak at m/z ≈ 2800 can therefore be assigned a charge state of +10. This assignment is also consistent with the protein mass: 10 × 2800 ≈ 28,000 Da, close to the ~28 kDa expected for eGFP.
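The link between charge state and peak position used in this part can be sketched as follows. The example takes the intact mass calculated in Part I (27981.23 Da) and shows where ions of different charge would appear, together with the expected isotopic spacing 1/z. The function name and the selection of charge states are illustrative choices.

```python
PROTON_MASS = 1.007276  # Da


def mz_for_charge(neutral_mass: float, z: int) -> float:
    """Expected m/z of an [M + zH]z+ ion: (M + z * mH) / z."""
    return neutral_mass / z + PROTON_MASS


if __name__ == "__main__":
    mass = 27981.23  # Da, intact mass calculated in Part I
    # Low charges (native-like) land at high m/z; high charges
    # (denatured-like) land at low m/z with tighter isotope spacing.
    for z in (10, 11, 20, 30):
        print(f"z=+{z:<2}  m/z={mz_for_charge(mass, z):9.3f}  "
              f"isotope spacing={1 / z:.3f}")
```

With this mass, z = +10 falls near m/z 2800 and z = +30 near m/z 934, matching the native and denatured regions discussed above.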

Homework: Waters Part III — Peptide Mapping - primary structure

We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after Lysine (K) and Arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights and fragmented to confirm the amino acid sequence within each peptide – generating a “peptide map”. This process is used to confirm the primary structure of the protein.

There are a variety of tools available online to calculate protein molecular weight and predict a list of peptides generated from a tryptic digest. We will be using tools within the online resource Expasy (the bioinformatics resource portal of the Swiss Institute of Bioinformatics (SIB)) to predict a list of tryptic peptides from eGFP.

1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).

Lysine (K): 20 residues

Arginine (R): 6 residues

2. How many peptides will be generated from tryptic digestion of eGFP?

a) Navigate to https://web.expasy.org/peptide_mass/

b) Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.

c) Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.

d) Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.

Using the ExPASy PeptideMass tool with trypsin digestion, a total of 19 peptides are predicted from the eGFP sequence.
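The residue counts and the digest can be cross-checked with a short script. This is a minimal sketch of a complete tryptic digest with no missed cleavages, assuming the common rule that trypsin does not cleave before proline. It lists every fragment, including very short ones, so its total need not match the number reported by PeptideMass, which depends on the parameters chosen in the tool (Figure 4).

```python
# eGFP construct from Part I, written in the same blocks as above.
EGFP = "".join("""
MVSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL
VTTLTYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV
NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD
HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT LGMDELYKLE
HHHHHH
""".split())


def tryptic_digest(sequence: str) -> list:
    """Return (start, end, peptide) tuples with 1-based positions.
    Cleaves after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start + 1, i + 1, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start + 1, len(sequence), sequence[start:]))
    return peptides


if __name__ == "__main__":
    print("K:", EGFP.count("K"), " R:", EGFP.count("R"))
    fragments = tryptic_digest(EGFP)
    print("fragments (all lengths):", len(fragments))
    for first, last, peptide in fragments:
        print(f"{first:>3}-{last:<3} {peptide}")
```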

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

23 chromatographic peaks are observed between 0.5 and 6 minutes with greater than 10% relative abundance: 0.61, 0.79, 1.20, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, 4.87, 5.06, 5.43.

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

The number of chromatographic peaks observed in the LC-MS peptide map (23 peaks) is slightly higher than the 19 peptides predicted from the tryptic digest using ExPASy.

This difference is expected because a single peptide can generate multiple signals in mass spectrometry. For example, peptides can appear with different charge states, form adducts (such as with sodium) or undergo minor modifications like oxidation, all of which produce additional peaks.

5. Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ([M+H]^+) based on its m/z and z.

The peptide shown in Figure 5b has its most intense peak at m/z = 525.767, which corresponds to the most abundant charge state of the peptide. To determine the charge (z), the isotopic peak spacing in the zoomed region is examined. The distance between adjacent isotopic peaks (for example, 525.767 to 526.259 to 526.768) is approximately 0.5 m/z units. Since isotopic spacing follows the relationship Δ(m/z) = 1/z, a spacing of about 0.5 indicates that z = 2. Therefore, the most abundant charge state of the peptide is +2.

The mass of the singly charged peptide ([M+H]+) can be calculated using the equation m/z = (M + zH)/z.

  • Rearranging gives M = z(m/z) − zH.

  • Substituting the values (with H ≈ 1 Da), M = 2 × 525.767 − 2 × 1 = 1049.534 Da.

  • Adding one proton gives the singly charged form: [M+H]+ = 1049.534 + 1 = 1050.534 Da.

  • Thus, the peptide has m/z ≈ 525.767, a charge state of +2, and a singly charged mass [M+H]+ of approximately 1050.53 Da.
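The two steps of this answer (charge from isotope spacing, then the singly charged mass) can be sketched as below. The peak values are the three isotopic peaks quoted above and the function names are hypothetical. The sketch uses the proton mass 1.007276 Da from Part I instead of the approximation H ≈ 1 Da, so it gives about 1050.527 rather than 1050.534, a difference that becomes visible when errors are expressed in ppm.

```python
PROTON_MASS = 1.007276  # Da


def charge_from_isotope_peaks(peaks: list) -> int:
    """Estimate z from the mean spacing of consecutive isotopic peaks,
    using spacing = 1 / z."""
    mean_spacing = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return round(1 / mean_spacing)


def singly_charged_mass(mz: float, z: int) -> float:
    """[M+H]+ from an observed m/z and charge: M = z * (m/z - mH), then add one proton."""
    neutral = z * (mz - PROTON_MASS)
    return neutral + PROTON_MASS


if __name__ == "__main__":
    isotope_peaks = [525.767, 526.259, 526.768]  # values read from Figure 5b
    z = charge_from_isotope_peaks(isotope_peaks)
    print("charge state:", z)
    print("[M+H]+ =", round(singly_charged_mass(isotope_peaks[0], z), 4), "Da")
```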

6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm. (Recall that Accuracy = (MWexperimental − MWtheoretical) / MWtheoretical)

FEGDTLVNR

Mass: 1050.5214 Position: 115-123

The experimental mass of the peptide was determined to be 1050.53 Da, while the theoretical mass from the ExPASy PeptideMass tool is 1050.5214 Da. The mass accuracy is calculated using the formula:

  • Accuracy = (MWexperimental − MWtheoretical) / MWtheoretical.

Substituting the values gives:

  • Accuracy = (1050.53 − 1050.5214) / 1050.5214 = 0.0086 / 1050.5214 = 0.00000819, which corresponds to 8.19 ppm.

This small error indicates excellent agreement between the experimental and theoretical masses.
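The accuracy formula is easy to reuse for any measured and theoretical pair. The sketch below applies it to the peptide values above and to the intact-protein values from Part I; the function name is an illustrative choice.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Peptide FEGDTLVNR: experimental vs. PeptideMass value
    print(f"peptide: {ppm_error(1050.53, 1050.5214):.2f} ppm")
    # Intact eGFP: adjacent charge state result vs. Compute pI/Mw value
    print(f"intact:  {ppm_error(27981.23, 28006.60):.0f} ppm")
```

The sign shows the direction of the deviation: positive when the measured mass is above the theoretical one, negative when it is below.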

7. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

The percentage of the protein sequence confirmed by peptide mapping is 88%, as indicated by the sequence coverage shown in Figure 6.

8. Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this tool online to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html. What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?

The peptide sequence that best matches the fragmentation spectrum in Figure 5c is FEGDTLVNR. This sequence was identified by comparing the experimental peptide mass with the predicted tryptic peptides obtained from ExPASy and selecting the closest match. The predicted fragmentation pattern for this peptide shows a series of characteristic b-ions and y-ions, which correspond to fragmentation along the peptide backbone.
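A predicted fragmentation pattern of this kind can be approximated with a few lines of code. This is a minimal sketch that computes only singly charged b- and y-ions from standard monoisotopic residue masses; the online tool linked in the question may report further ion types and charge states. The mass table is deliberately limited to the residues present in FEGDTLVNR.

```python
# Monoisotopic residue masses (Da) for the residues in FEGDTLVNR only.
RESIDUE_MASS = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146, "D": 115.02694,
    "T": 101.04768, "L": 113.08406, "V": 99.06841, "N": 114.04293,
    "R": 156.10111,
}
WATER = 18.01056    # Da
PROTON = 1.007276   # Da


def precursor_mh(peptide: str) -> float:
    """Monoisotopic [M+H]+ of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fragment_ions(peptide: str):
    """Singly charged b- and y-ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


if __name__ == "__main__":
    peptide = "FEGDTLVNR"
    print(f"[M+H]+ = {precursor_mh(peptide):.4f}")
    b_ions, y_ions = fragment_ions(peptide)
    for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
        print(f"b{i}={b:9.4f}   y{i}={y:9.4f}")
```

The calculated [M+H]+ agrees with the 1050.5214 value listed by PeptideMass, and the y1 ion near m/z 175.119 is the usual signature of a C-terminal arginine in a tryptic peptide.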

9. Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.

Yes, the peptide map data makes sense and supports that the protein is the eGFP standard. The results show 88% amino acid sequence coverage, which is considered excellent for protein identification by LC-MS.

Additionally, the high mass accuracy (under 10 ppm) indicates that the measured peptide masses closely match the theoretical values. The MS/MS fragmentation spectra further confirm the identity of the peptides, as the observed b-ion and y-ion patterns are consistent with the predicted sequences.

Homework: Waters Part IV — Oligomers

We will determine Keyhole Limpet Hemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 1) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 7):

  • 7FU Decamer

  • 8FU Didecamer

  • 8FU 3-Decamer

  • 8FU 4-Decamer

Polypeptide Subunit Name | Subunit Mass
7FU | 340 kDa
8FU | 400 kDa

  • The 7FU decamer (340 kDa × 10) has a mass of 3.4 MDa and corresponds to the peak at 3.4 MDa.

  • The 8FU didecamer (400 kDa × 20) has a mass of 8.0 MDa and corresponds to the peak at 8.33 MDa.

  • The 8FU 3-decamer (400 kDa × 30) has a mass of 12.0 MDa and corresponds to the peak at ~12.67 MDa.

  • The 8FU 4-decamer (400 kDa × 40) has a mass of 16.0 MDa and corresponds to the signal around 16 MDa.
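The assignments above follow from multiplying the subunit mass by the number of copies. The sketch below repeats that arithmetic and compares each expected mass with the peak position quoted in the answer; the observed values are the ones stated above, and the variable names are illustrative.

```python
SUBUNIT_MASS_KDA = {"7FU": 340, "8FU": 400}  # from Table 1

# (label, subunit, copies, observed peak in MDa as quoted in the answer)
SPECIES = [
    ("7FU decamer", "7FU", 10, 3.4),
    ("8FU didecamer", "8FU", 20, 8.33),
    ("8FU 3-decamer", "8FU", 30, 12.67),
    ("8FU 4-decamer", "8FU", 40, 16.0),
]


def expected_mass_mda(subunit: str, copies: int) -> float:
    """Expected oligomer mass in MDa: subunit mass times copy number."""
    return SUBUNIT_MASS_KDA[subunit] * copies / 1000


if __name__ == "__main__":
    for label, subunit, copies, observed in SPECIES:
        expected = expected_mass_mda(subunit, copies)
        deviation = (observed - expected) / expected * 100
        print(f"{label:<14} expected={expected:5.2f} MDa  "
              f"observed={observed:5.2f} MDa  deviation={deviation:+.1f}%")
```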

Homework: Waters Part V — Did I make GFP?

Please fill out this table with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge, or else the data screenshots in this document if you were unable to have lab work done at Waters.

Theoretical (kDa) | Observed/measured on the Intact LC-MS (kDa) | PPM Mass Error
28.01 | 27.98 | 906 ppm

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

1. My contribution

Unfortunately, I was not able to contribute a pixel to the collective artwork, as I was in the middle of midterm exams at my university during that period, which limited my availability to participate.

2. What I liked about the project

I really liked the project because of its biological foundation and particularly its connection to cell-free fluorescent protein optimization and how it was used for a global pixel artwork designed by HTGAA students :)

3. What could be improved for next year

For future versions, it could be interesting to include a live chat feature so participants can coordinate in real time and create more elaborate and intentional designs. Additionally, increasing the number of pixels beyond the 1,536 used in this edition could allow for more detailed and realistic compositions.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

  1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.
Cell-Free Master Mix

E. coli Lysate

BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)

The lysate provides the essential molecular machinery required for transcription and translation, including ribosomes, metabolic enzymes, cofactors, and tRNAs. The presence of T7 RNA polymerase enables efficient transcription of target genes under T7 promoter control.

Salts / Buffer

  • Potassium Glutamate

Maintains ionic strength and mimics intracellular conditions, thereby stabilizing macromolecular interactions and supporting enzymatic activity.

  • HEPES-KOH pH 7.5

Serves as a buffering agent to maintain a stable pH, which is critical for optimal enzyme function during transcription and translation.

  • Magnesium Glutamate

Functions as an essential cofactor for ribosomes and polymerases, playing a key role in both transcriptional and translational processes.

  • Potassium Phosphate Monobasic / Dibasic

Contributes to buffering capacity and provides phosphate ions necessary for nucleotide metabolism and energy transfer reactions.

Energy / Nucleotide System

  • Ribose

Acts as a precursor for nucleotide biosynthesis, supporting sustained RNA production over extended reaction times.

  • Glucose

Serves as a metabolic energy source, enabling ATP regeneration through endogenous enzymatic pathways present in the lysate.

  • AMP, CMP, GMP, UMP

Provide nucleotide monophosphates that can be phosphorylated into their corresponding triphosphates, which are required substrates for RNA synthesis.

  • Guanine

Functions as a precursor in nucleotide salvage pathways, allowing for the biosynthesis of GMP and subsequently GTP for transcription.

Translation Mix (Amino Acids)

  • 17 Amino Acid Mix

Supplies the majority of amino acids required for protein synthesis during translation.

  • Tyrosine

Provided separately due to solubility and stability constraints; essential for incorporation into nascent polypeptides.

  • Cysteine

Added separately due to its susceptibility to oxidation; plays a critical role in protein structure through disulfide bond formation.

Additives

  • Nicotinamide

Supports redox balance and enzymatic activity by contributing to NAD⁺-dependent metabolic processes within the reaction.

Backfill

  • Nuclease-Free Water

Used to adjust the final reaction volume while preventing nucleic acid degradation.

  2. Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

The 1-hour PEP-NTP system relies on the direct addition of high-energy phosphate donors and nucleotide triphosphates, enabling rapid and high-yield protein synthesis within a short time frame. However, this approach is limited by the rapid depletion of energy substrates and accumulation of inhibitory byproducts.

In contrast, the 20-hour NMP-ribose-glucose system employs a metabolically sustained strategy in which substrates such as ribose and glucose support continuous nucleotide regeneration and ATP production. This results in prolonged reaction stability and sustained protein expression over extended periods.

  3. Bonus question: How can transcription occur if GMP is not included but Guanine is?

Transcription can still occur in the absence of externally supplied GMP because guanine can be converted into GMP through endogenous nucleotide salvage pathways present in the lysate. The resulting GMP can then be phosphorylated to GTP, which serves as the direct substrate for RNA polymerase during transcription.


Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Week 2 HW: DNA Read, Write, & Edit

HOMEWORK 2

Part 1: Benchling & In-silico Gel Art

See this week’s lab protocol “Gel Art: Restriction Digests and Gel Electrophoresis” for details. Overview:

  • Make a free account at benchling.com
  • Import the Lambda DNA.
  • Simulate Restriction Enzyme Digestion with the following Enzymes:
  1. EcoRI
  2. HindIII
  3. BamHI
  4. KpnI
  5. EcoRV
  6. SacI
  7. SalI
  • Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.
  • You might find Ronan’s website a helpful tool for quickly iterating on designs!

HOMEWORK RESULTS :)

1ST ATTEMPT

For my first attempt, I tried to form the phrase “Hi!”. It didn’t turn out as perfect as I imagined, but with practice, I hope to create more creative drawings.

2ND ATTEMPT

For my second attempt I tried to draw my own name in capital letters, “IAN”.

3RD ATTEMPT

For my third attempt I tried to draw the silhouette of an animal’s head.

Part 3: DNA Design Challenge

3.1. Choose your protein.

In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

The protein I chose is …

Q68KI4 · NHX1_ARATH

Function: Acts in low affinity electroneutral exchange of protons for cations such as Na+ or K+ across membranes. Can also exchange Li+ and Cs+ with a lower affinity. Involved in vacuolar ion compartmentalization necessary for cell volume regulation and cytoplasmic Na+ detoxification. Required during leaves expansion, probably to stimulate epidermal cell expansion. Confers competence to grow in high salinity conditions.

FASTA sequence

>sp|Q68KI4|NHX1_ARATH Sodium/hydrogen exchanger 1 OS=Arabidopsis thaliana OX=3702 GN=NHX1 PE=1 SV=2
MLDSLVSKLPSLSTSDHASVVALNLFVALLCACIVLGHLLEENRWMNESITALLIGLGTG
VTILLISKGKSSHLLVFSEDLFFIYLLPPIIFNAGFQVKKKQFFRNFVTIMLFGAVGTII
SCTIISLGVTQFFKKLDIGTFDLGDYLAIGAIFAATDSVCTLQVLNQDETPLLYSLVFGE
GVVNDATSVVVFNAIQSFDLTHLNHEAAFHLLGNFLYLFLLSTLLGAATGLISAYVIKKL
YFGRHSTDREVALMMLMAYLSYMLAELFDLSGILTVFFCGIVMSHYTWHNVTESSRITTK
HTFATLSFLAETFIFLYVGMDALDIDKWRSVSDTPGTSIAVSSILMGLVMVGRAAFVFPL
SFLSNLAKKNQSEKINFNMQVVIWWSGLMRGAVSMALAYNKFTRAGHTDVRGNAIMITST
ITVCLFSTVVFGMLTKPLISYLLPHQNATTSMLSDDNTPKSIHIPLLDQDSFIEPSGNHN
VPRPDSIRGFLTRPTRTVHYYWRQFDDSFMRPVFGGRGFVPFVPGSPTERNPPDLSKA

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

[Example: Get to the original sequence of phage MS2 L-protein from its genome phage MS2 genome - Nucleotide - NCBI]

Reverse Translation:

reverse translation of sp|Q68KI4|NHX1_ARATH Sodium/hydrogen exchanger 1 OS=Arabidopsis thaliana OX=3702 GN=NHX1 PE=1 SV=2 to a 1614 base sequence of most likely codons. atgctggatagcctggtgagcaaactgccgagcctgagcaccagcgatcatgcgagcgtggtggcgctgaacctgtttgtggcgctgctgtgcgcgtgcattgtgctgggccatctgctggaagaaaaccgctggatgaacgaaagcattaccgcgctgctgattggcctgggcaccggcgtgaccattctgctgattagcaaaggcaaaagcagccatctgctggtgtttagcgaagatctgttttttatttatctgctgccgccgattatttttaacgcgggctttcaggtgaaaaaaaaaacagttttttcgcaactttgtgaccattatgctgtttggcgcggtgggcaccattattagctgcaccattattagcctgggcgtgacccagttttttaaaaaactggatattggcacctttgatctgggcgattatctggcgattggcgcgatttttgcggcgaccgatagcgtgtgcaccctgcaggtgctgaaccaggatgaaaccccgctgctgtatagcctggtgtttggcgaaggcgtggtgaacgatgcgaccagcgtggtggtgtttaacgcgattcagagctttgatctgacccatctgaaccatgaagcggcgtttcatctgctgggcaactttctgtatctgtttctgctgagcaccctgctgggcgcggcgaccggcctgattagcgcgtatgtgattaaaaaactgtattttggccgccatagcaccgatcgcgaagtggcgctgatgatgctgatggcgtatctgagctatatgctggcggaactgtttgatctgagcggcattctgaccgtgtttttttgcggcattgtgatgagccattatacctggcataacgtgaccgaaagcagccgcattaccaccaaacatacctttgcgaccctgagctttctggcggaaacctttatttttctgtatgtgggcatggatgcgctggatattgataaatggcgcagcgtgagcgataccccgggcaccagcattgcggtgagcagcattctgatgggcctggtgatggtgggccgcgcggcgtttgtgtttccgctgagctttctgagcaacctggcgaaaaaaaaccagagcgaaaaaattaactttaacatgcaggtggtgatttggtggagcggcctgatgcgcggcgcggtgagcatggcgctggcgtataacaaatttacccgcgcgggccataccgatgtgcgcggcaacgcgattatgattaccagcaccattaccgtgtgcctgtttagcaccgtggtgtttggcatgctgaccaaaccgctgattagctatctgctgccgcatcagaacgcgaccaccagcatgctgagcgatgataacaccccgaaaagcattcatattccgctgctggatcaggatagctttattgaaccgagcggcaaccataacgtgccgcgcccggatagcattcgcggctttctgacccgcccgacccgcaccgtgcattattattggcgccagtttgatgatagctttatgcgcccggtgtttggcggccgcggctttgtgccgtttgtgccgggcagcccgaccgaacgcaacccgccggatctgagcaaagcg

3.3. Codon optimization.

Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

[Example from Codon Optimization Tool | Twist Bioscience while avoiding Type IIs enzyme recognition sites BsaI, BsmBI, and BbsI]

Codon Optimized sequence

NHX1_optimized_IDT ATGTTAGATTCTTTAGTTAGCAAATTGCCCTCACTCTCAACCTCTGACCACGCCAGCGTGGTTGCGCTGAACCTGTTTGTGGCGCTGCTGTGTGCCTGTATTGTGCTGGGCCACCTGCTGGAAGAAAACCGCTGGATGAATGAATCCATCACTGCGCTGCTGATCGGCCTGGGTACTGGTGTCACCATCCTGCTGATCAGTAAAGGCAAAGCTCCACCTGCTGGTGTTCTCTGAAGATCTGTTCTTTATCTATCTGCTGCCGCCGATCATCTTCAACGCCGGTTTCCAGGTGAAAAAGAAACAGTTCTTCCGTAATTCGTCACCATCATGCTGTTTGGTGCGGTAGGTACCATTATCAGCTGTACCATTATCAGCCTGGGTGTGACTCAGTTCTTCAAAAAACTGGATATCGGTACCTTTGACCTGGGTGATTATCTTGCGATTGGTGCGATCTTTGCTGCAACCGACAGTGTGTGCACCCTGCAGGTGCTGAACCAGGATGAAACCCCGCTGCTGTACAGCCTGGTGTTCGGTGAAGGTGTGGTGAACGATGCGACCTCGGTGGTGGTTTTAACGCCATTCAGAGCTTTGACCTGACCCATCTGAACCATGAAGCGGCGTTCCACCTGCTCGGCAACTTCCTGTACCTGTTCCTGCTGTCCACCCTGCTGGGTGCGGCGACCGGTCTGATCTCTGCCTATGTGATCAAGAAGCTGTATTTGGTCGTCACAGCACCGACCGCGAAGTTGCACTGATGATGCTGATGGCGTACCTGAGCTACATGCTGGCAGAGCTGTTTGACCTCAGTGGTATCCTGACCGTGTTCTTCTGCGGTATTGTCATGAGCCACTACACCTGGCATAACGTGACTGAAGCAGCCGTATCACCACCAAACACACCTTTGCCACCCTGTCGTTCTTGGCTGAAACCTTTATCTTCCTGTATGTCGGTATGGATGCGCTGGACATCGATAAGTGGCGCTCGGTAAGCGACACACCGGGTACCTCTATTGCGGTTAGCTCGATTCTGATGGGCCTGGTGATGGTAGGTCGTGCGGCGTTCGTGTTCCCGCTGTCGTTCTTGAGCAACCTGGCGGAGAAGAACCAGTCTGAGAAAATCAACTTCAACATGCAGGTGGTGATCTGGTGGTCTGGGCTGATGCGTGGTGCAGTCTCTATGGCCCTGGCCTACAACAAGTTTACCCGTGCAGGTCACACTGATGTACGTGGTAATGCGATTATGATCACCTCCACCATCACCGTGTGCCTGTTCAGCACCGTGGTGTTTGGCATGCTGACCAAACCGCTGATCAGCTACCTGCTGCCGCATCAGAATGCCACCACCAGCATGCTGTCTGATGACAACACGCCGAAATCTATTCACATTCCGCTGCTGGATCAGGACAGCTTTATTGAGCCGTCTGGTAACCACAATGTTCCACGTCCGGACAGCATTCGCGGTTTCCTGACCCGCCCGACCCGCACCGTGCACTACTATTGGCGTCAGTTTGATGACTCCTTCATGCGCCCGGTGTTTGGTGGTCGCGGCTTTGTGCCGTTTGTTCCGGGCTCCCCAACTGAGCGTAACCCGCCGGATCTGAGCAAAGCA

My answer:

Codon optimization is necessary because, although multiple codons can encode the same amino acid, each organism preferentially uses certain codons over others. If a gene from one organism is expressed in a different host without optimization, rare codons may reduce translation efficiency, slow ribosome movement, decrease protein yield, or cause premature termination. The NHX1 coding sequence was optimized according to the codon usage preference of Escherichia coli, which was selected because it is one of the most widely used systems for recombinant protein expression due to its rapid growth, well-characterized genetics and availability of expression vectors and laboratory tools.
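Because optimization should change codons without changing the protein, a useful sanity check is to translate the optimized DNA back and compare it with the original amino acid sequence. The sketch below uses the standard genetic code and, as a short example, the first 30 nucleotides of the optimized sequence and the first 10 residues of NHX1. The function names are hypothetical; a full check would pass the complete sequences.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; a trailing incomplete codon is ignored."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def encodes_protein(dna: str, protein: str) -> bool:
    """True if the DNA is a whole number of codons and translates to the protein."""
    return len(dna) % 3 == 0 and translate(dna) == protein


if __name__ == "__main__":
    optimized_start = "ATGTTAGATTCTTTAGTTAGCAAATTGCCC"  # first 30 nt
    protein_start = "MLDSLVSKLP"                        # first 10 residues
    print(translate(optimized_start))
    print("matches protein:", encodes_protein(optimized_start, protein_start))
```

A single inserted or deleted base shifts the reading frame, so a check like this catches copy and paste damage as well as optimization errors before a sequence is ordered.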

The codon optimization was performed using the IDT Codon Optimization Tool (Integrated DNA Technologies). During optimization, the tool adjusted synonymous codons to match E. coli codon bias while maintaining the original amino acid sequence. Additionally, to facilitate downstream cloning, recognition sites for the Type IIS restriction enzymes BsaI, BsmBI, and BbsI were avoided during optimization, which ensures compatibility with Golden Gate assembly and prevents unwanted internal digestion of the gene sequence.
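The check for internal Type IIS sites can also be done by hand. The following is a minimal sketch, assuming the standard recognition sequences for BsaI (GGTCTC), BsmBI (CGTCTC) and BbsI (GAAGAC); the function names and the two toy sequences are illustrative and are not part of the IDT tool.

```python
# Recognition sites of the three Type IIS enzymes named above.
TYPE_IIS_SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_type_iis_sites(seq: str) -> dict:
    """Return {enzyme: [0-based positions]} for sites found on either strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in TYPE_IIS_SITES.items():
        positions = []
        # A site on the bottom strand appears as its reverse complement on top.
        for pattern in (site, reverse_complement(site)):
            start = seq.find(pattern)
            while start != -1:
                positions.append(start)
                start = seq.find(pattern, start + 1)
        if positions:
            hits[enzyme] = sorted(positions)
    return hits


# Hypothetical toy sequences, not the NHX1 construct itself.
clean = "ATGTTAGATTCTTTAGTTAGCAAATTG"
with_sites = "ATGGGTCTCAAAGTCTTCTAA"  # BsaI on the top strand, BbsI on the bottom

print(find_type_iis_sites(clean))
print(find_type_iis_sites(with_sites))
```

Running the same function on the full optimized sequence would be a quick way to confirm that no site slipped through before ordering.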

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

My answer:

Cell-dependent protein expression

In this approach, the optimized NHX1 gene is cloned into an expression vector (plasmid) containing:

  • A strong promoter
  • A ribosome binding site (RBS)
  • A selectable marker (antibiotic resistance gene)
  • A transcription terminator

The recombinant plasmid is then introduced into a host (Escherichia coli in this case) through transformation. Once inside the cell, the DNA sequence is first transcribed: RNA polymerase recognizes the promoter and synthesizes messenger RNA (mRNA) that is complementary to the template strand and therefore matches the coding strand, with U in place of T. The mRNA is then translated: ribosomes bind to the mRNA at the RBS and read the codons in triplets. Transfer RNAs (tRNAs) bring the corresponding amino acids, which are linked together through peptide bonds to form the NHX1 protein.
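The two steps described above can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code; the helper names are hypothetical, and the example input is the first 21 nucleotides of the optimized NHX1 sequence followed by a TAA stop codon added for the demonstration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons in triplets from the first AUG until a stop codon."""
    dna = mrna.upper().replace("U", "T")
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# First seven codons of the optimized NHX1 sequence plus an illustrative stop.
fragment = "ATGTTAGATTCTTTAGTTAGC" + "TAA"
mrna = transcribe(fragment)
print(mrna)
print(translate(mrna))
```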

Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account and a Benchling account

4.2. Build Your DNA Insert Sequence

For example, let’s make a sequence that will make E. coli glow fluorescent green under UV light by constitutively (always) expressing sfGFP (a green fluorescent protein): In Benchling, select New DNA/RNA sequence Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).

The image above shows the Codon Optimized sequence of Q68KI4 · NHX1_ARATH.

Go through each piece of the given DNA sequences highlighted below (Promoter, RBS, Start Codon, Coding Sequence, His Tag, Stop Codon, Terminator) and paste the sequences into the Benchling file one after the other (replacing the coding sequence with your codon optimized DNA sequence of interest!). Each time you add a new piece of the sequence, make sure to annotate by right clicking over the sequence and creating an annotation that describes what each piece (e.g., Promoter, RBS, etc.) is.

Promoter (e.g. BBa_J23106): TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (e.g. BBa_B0034 with spacers for optimal expression): CATTAAAGAGGAGAAAGGTACC

Start Codon: ATG

Coding Sequence (your codon optimized DNA for a protein of interest, sfGFP for example): AGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCCGTGGAGAGGGTGAAGGTGATGCTACAAACGGAAAACTCACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCGTGGCCAACACTTGTCACTACTCTGACCTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGACCTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAGGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAACTCGAGTACAACTTTAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACGTTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGTCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAA

7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli): CATCACCATCACCATCATCAC

Stop Codon: TAA

Terminator (e.g. BBa_B0015): CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

Once you’ve completed this, click on Linear Map to preview the entire sequence. If you intend to have a TA review a sequence in the future, this is a good way to verify that all sections are annotated!
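The same assembly can be reproduced outside Benchling as a sanity check. The sketch below concatenates the parts listed above in order and records where each one starts and ends; the function names are hypothetical, and the short coding sequence is a toy placeholder rather than NHX1 or sfGFP.

```python
# Part sequences copied from the list above.
PROMOTER = "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"  # BBa_J23106
RBS = "CATTAAAGAGGAGAAAGGTACC"                    # BBa_B0034 with spacers
START = "ATG"
HIS_TAG = "CATCACCATCACCATCATCAC"                 # 7x His
STOP = "TAA"
TERMINATOR = (                                    # BBa_B0015
    "CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTG"
    "AACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA"
)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_cassette(cds: str) -> list:
    """Return the ordered (name, sequence) parts of the expression cassette."""
    cds = cds.upper()
    # Avoid a duplicated start codon or a stop codon in front of the His tag.
    if cds.startswith(START):
        cds = cds[3:]
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [
        ("Promoter", PROMOTER),
        ("RBS", RBS),
        ("Start codon", START),
        ("Coding sequence", cds),
        ("7x His tag", HIS_TAG),
        ("Stop codon", STOP),
        ("Terminator", TERMINATOR),
    ]


def annotate(parts: list) -> tuple:
    """Join the parts and return (sequence, [(name, start, end)]), 1-based."""
    sequence, features = "", []
    for name, part in parts:
        features.append((name, len(sequence) + 1, len(sequence) + len(part)))
        sequence += part
    return sequence, features


toy_cds = "ATGAGCAAAGGATAA"  # hypothetical placeholder coding sequence
full_sequence, features = annotate(build_cassette(toy_cds))
for name, start, end in features:
    print(f"{name}: {start}..{end}")
```

The printed coordinates can be compared against the annotations in the Linear Map to confirm that every part is present and in the intended order.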

https://benchling.com/ian-teran-35/f_/91Ap236lfD-htgaa-2026/

Downloaded FASTA sequence of the construct:

AtNHX1_Ecoli_expression_construct TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGTTAGATTCTTTAGTTAGCAAATTGCCCTCACTCTCAACCTCTGACCACGCCAGCGTGGTTGCGCTGAACCTGTTTGTGGCGCTGCTGTGTGCCTGTATTGTGCTGGGCCACCTGCTGGAAGAAAACCGCTGGATGAATGAATCCATCACTGCGCTGCTGATCGGCCTGGGTACTGGTGTCACCATCCTGCTGATCAGTAAAGGCAAAGCTCCACCTGCTGGTGTTCTCTGAAGATCTGTTCTTTATCTATCTGCTGCCGCCGATCATCTTCAACGCCGGTTTCCAGGTGAAAAAGAAACAGTTCTTCCGTAATTCGTCACCATCATGCTGTTTGGTGCGGTAGGTACCATTATCAGCTGTACCATTATCAGCCTGGGTGTGACTCAGTTCTTCAAAAAACTGGATATCGGTACCTTTGACCTGGGTGATTATCTTGCGATTGGTGCGATCTTTGCTGCAACCGACAGTGTGTGCACCCTGCAGGTGCTGAACCAGGATGAAACCCCGCTGCTGTACAGCCTGGTGTTCGGTGAAGGTGTGGTGAACGATGCGACCTCGGTGGTGGTTTTAACGCCATTCAGAGCTTTGACCTGACCCATCTGAACCATGAAGCGGCGTTCCACCTGCTCGGCAACTTCCTGTACCTGTTCCTGCTGTCCACCCTGCTGGGTGCGGCGACCGGTCTGATCTCTGCCTATGTGATCAAGAAGCTGTATTTGGTCGTCACAGCACCGACCGCGAAGTTGCACTGATGATGCTGATGGCGTACCTGAGCTACATGCTGGCAGAGCTGTTTGACCTCAGTGGTATCCTGACCGTGTTCTTCTGCGGTATTGTCATGAGCCACTACACCTGGCATAACGTGACTGAAGCAGCCGTATCACCACCAAACACACCTTTGCCACCCTGTCGTTCTTGGCTGAAACCTTTATCTTCCTGTATGTCGGTATGGATGCGCTGGACATCGATAAGTGGCGCTCGGTAAGCGACACACCGGGTACCTCTATTGCGGTTAGCTCGATTCTGATGGGCCTGGTGATGGTAGGTCGTGCGGCGTTCGTGTTCCCGCTGTCGTTCTTGAGCAACCTGGCGGAGAAGAACCAGTCTGAGAAAATCAACTTCAACATGCAGGTGGTGATCTGGTGGTCTGGGCTGATGCGTGGTGCAGTCTCTATGGCCCTGGCCTACAACAAGTTTACCCGTGCAGGTCACACTGATGTACGTGGTAATGCGATTATGATCACCTCCACCATCACCGTGTGCCTGTTCAGCACCGTGGTGTTTGGCATGCTGACCAAACCGCTGATCAGCTACCTGCTGCCGCATCAGAATGCCACCACCAGCATGCTGTCTGATGACAACACGCCGAAATCTATTCACATTCCGCTGCTGGATCAGGACAGCTTTATTGAGCCGTCTGGTAACCACAATGTTCCACGTCCGGACAGCATTCGCGGTTTCCTGACCCGCCCGACCCGCACCGTGCACTACTATTGGCGTCAGTTTGATGACTCCTTCATGCGCCCGGTGTTTGGTGGTCGCGGCTTTGTGCCGTTTGTTCCGGGCTCCCCAACTGAGCGTAACCCGCCGGATCTGAGCAAAGCACATCACCATCACCATCATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA

4.3. On Twist, Select The “Genes” Option

4.4. Select “Clonal Genes” option

For this demonstration, we’ll choose Clonal Genes. You’ll select clonal genes or gene fragments depending on your final project. Historically, HTGAA projects using clonal genes (circular DNA) have reached experimental results 1-2 weeks quicker because they can be transformed directly into E. coli without additional assembly. Gene fragments (linear DNA) offer greater design flexibility but typically require an assembly or cloning step prior to transformation. An advantage is that, if designed with the appropriate exonuclease protection, gene fragments can be used directly in cell-free expression.

4.5. Import your sequence

You just took an amino acid sequence of interest and converted it into DNA, codon optimized it, and built an expression cassette around it! Choose the Nucleotide Sequence option and Upload Sequence File to upload your FASTA file.

4.6. Choose Your Vector

Since we’re ordering a clonal gene, you will need to refer to Twist’s Vector Catalog to choose your circular backbone. You can think of this as taking your linear expression cassette for your protein of interest, and completing the rest of the circle!

The backbone confers many special properties like antibiotic resistance, an origin of replication, and more. Discuss with your node to decide on appropriate antibiotic options. At MIT/Harvard, you can use Ampicillin, Chloramphenicol, or Kanamycin resistance.

Twist vectors do not contain restriction sites near the insert fragment, so make sure to flank your design with cut sites if you are intending to extract this DNA insert fragment later.

For this demonstration, choose a Twist cloning vector such as pTwist Amp High Copy. Click into your sequence and select download construct (GenBank) to get the full plasmid sequence:

Go back to your Benchling account. Inside of a folder, click the import DNA/RNA sequence button and upload the GenBank file you just downloaded.

WOW! :)

Part 5: DNA Read/Write/Edit

5.1 DNA Read (i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would sequence the cry4Ba gene from field isolates of Bacillus thuringiensis, together with the promoter and regulatory regions that control cry4Ba expression and comparable cry4 family homologs from different strains. My goal is to understand the genetic diversity of cry4Ba. This would help to explore new ways of improving efficacy against mosquito larvae, reveal natural sequence variation that influences toxicity, and assist in environmental monitoring of Bt toxin dissemination.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? Also answer the following questions:

1. Is your method first-, second- or third-generation or other? How so?

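The technology I would use is Illumina short-read sequencing, which is a second-generation sequencing method because it reads millions of amplified DNA fragments in parallel rather than one template at a time. It provides high accuracy and cost effectiveness, and it is well suited to bacterial genes.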

2. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

  • DNA extraction
  • Fragmentation (to ~300 bp)
  • Adapter ligation
  • PCR enrichment
  • Library quantification & pooling

The input is genomic DNA from Bacillus thuringiensis cultures.

3. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

  • Sequencing-by-synthesis.
  • Fluorescently labeled nucleotides are incorporated one at a time, and each incorporation is imaged.
  • Signals are captured and used for base calling.

4. What is the output of your chosen sequencing technology?

FASTQ files of read sequences and paired reads that can be aligned to reference genomes.
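To make the output format concrete, here is a minimal sketch of reading FASTQ records and decoding their quality strings. It assumes the four-line record layout and the Phred+33 quality encoding used by current Illumina pipelines; the two reads are invented for the example, and `read_fastq` and `mean_quality` are hypothetical helper names.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip()
        # Phred+33: the quality score is the ASCII code minus 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


def mean_quality(scores):
    """Average Phred score of one read."""
    return sum(scores) / len(scores)


# Two invented reads, used only to exercise the parser.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n"

for read_id, sequence, scores in read_fastq(io.StringIO(EXAMPLE)):
    print(read_id, sequence, scores, mean_quality(scores))
```

In practice, reads with a low mean quality would be trimmed or filtered before alignment to a reference.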

5.2 DNA Write (i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I would design and synthesize a codon-optimized cry4Ba gene for high toxin expression in a chosen bacterial host, variant versions with enhanced insecticidal activity, and chimeric constructs that combine parts of different Cry proteins. The aims are to improve biological control of mosquitoes, increase production yield in recombinant strains, and create toxin variants tailored to resistant insect populations.

cry4Ba gene DNA sequence:

ATGATGAATTCTGGTTATCCTTTAGCTAATGATTTACAAGGTTCTATGAAAAATACTAATTATAAAGATTGGTTAGCTATGTGTGAAAATAATCAACAATATGGTGTTAATCCTGCTGCTATTAATTCTTCTTCTGTTTCTACTGCTTTAAAAGTTGCTGGTGCTATTTTAAAATTTGTTAATCCTCCTGCTGGTACTGTTTTAACTGTTTTATCTGCTGTTTTACCTATTTTATGGCCTACTAATACTCCTACTCCTGAACGTGTTTGGAATGATTTTATGACTAATACTGGTAATTTAATTGATCAAACTGTTACTGCTTATGTTCGTACTGATGCTAATGCTAAAATGACTGTTGTTAAAGATTATTTAGATCAATATACTACTAAATTTAATACTTGGAAACGTGAACCTAATAATCAATCTTATCGTACTGCTGTTATTACTCAATTTAATTTAACTTCTGCTAAATTACGTGAAACTGCTGTTTATTTTTCTAATTTAGTTGGTTATGAATTATTATTATTACCTATTTATGCTCAAGTTGCTAATTTTAATTTATTATTAATTCGTGATGGTTTAATTAATGCTCAAGAATGGTCTTTAGCTCGTTCTGCTGGTGATCAATTATATAATACTATGGTTCAATATACTAAAGAATATATTGCTCATTCTATTACTTGGTATAATAAAGGTTTAGATGTTTTACGTAATAAATCTAATGGTCAATGGATTACTTTTAATGATTATAAACGTGAAATGACTATTCAAGTTTTAGATATTTTAGCTTTATTTGCTTCTTATGATCCTCGTCGTTATCCTGCTGATAAAATTGATAATACTAAATTATCTAAAACTGAATTTACTCGTGAAATTTATACTGCTTTAGTTGAATCTCCTTCTTCTAAATCTATTGCTGCTTTAGAAGCTGCTTTAACTCGTGATGTTCATTTATTTACTTGGTTAAAACGTGTTGATTTTTGGACTAATACTATTTATCAAGATTTACGTTTTTTATCTGCTAATAAAATTGGTTTTTCTTATACTAATTCTTCTGCTATGCAAGAATCTGGTATTTATGGTTCTTCTGGTTTTGGTTCTAATTTAACTCATCAAATTCAATTAAATTCTAATGTTTATAAAACTTCTATTACTGATACTTCTTCTCCTTCTAATCGTGTTACTAAAATGGATTTTTATAAAATTGATGGTACTTTAGCTTCTTATAATTCTAATATTACTCCTACTCCTGAAGGTTTACGTACTACTTTTTTTGGTTTTTCTACTAATGAAAATACTCCTAATCAACCTACTGTTAATGATTATACTCATATTTTATCTTATATTAAAACTGATGTTATTGATTATAATTCTAATCGTGTTTCTTTTGCTTGGACTCATAAAATTGTTGATCCTAATAATCAAATTTATACTGATGCTATTACTCAAGTTCCTGCTGTTAAATCTAATTTTTTAAATGCTACTGCTAAAGTTATTAAAGGTCCTGGTCATACTGGTGGTGATTTAGTTGCTTTAACTTCTAATGGTACTTTATCTGGTCGTATGGAAATTCAATGTAAAACTTCTATTTTTAATGATCCTACTCGTTCTTATGGTTTACGTATTCGTTATGCTGCTAATTCTCCTATTGTTTTAAATGTTTCTTATGTTTTACAAGGTGTTTCTCGTGGTACTACTATTTCTACTGAATCTACTTTTTCTCGTCCTAATAATATTATTCCTACTGATTTAAAATATGAAGAATTTCGTTATAAAGATCCTTTTGATGCTATTGTTCCTATGCGTTTATCTTCTAATCAATTAATTACTATTGCTATTCAACCTTTAAATATGACTTCTAATAATCAAGTTATTATTGATCGTATTGAAATTATTCCTATTACTCAATCTGTTTTAGATGAAACTGAAAATCAAAATTTAGAATCTGAACGTGAAGTTGTTAATGCTTTATTTACTAATGATGCTAAAGATGCTTTAAA
TATTGGTACTACTGATTATGATATTGATCAAGCTGCTAATTTAGTTGAATGTATTTCTGAAGAATTATATCCTAAAGAAAAAATGTTATTATTAGATGAAGTTAAAAATGCTAAACAATTATCTCAATCTCGTAATGTTTTACAAAATGGTGATTTTGAATCTGCTACTTTAGGTTGGACTACTTCTGATAATATTACTATTCAAGAAGATGATCCTATTTTTAAAGGTCATTATTTACATATGTCTGGTGCTCGTGATATTGATGGTACTATTTTTCCTACTTATATTTTTCAAAAAATTGATGAATCTAAATTAAAACCTTATACTCGTTATTTAGTTCGTGGTTTTGTTGGTTCTTCTAAAGATGTTGAATTAGTTGTTTCTCGTTATGGTGAAGAAATTGATGCTATTATGAATGTTCCTGCTGATTTAAATTATTTATATCCTTCTACTTTTGATTGTGAAGGTTCTAATCGTTGTGAAACTTCTGCTGTTCCTGCTAATATTGGTAATACTTCTGATATGTTATATTCTTGTCAATATGATACTGGTAAAAAACATGTTGTTTGTCAAGATTCTCATCAATTTTCTTTTACTATTGATACTGGTGCTTTAGATACTAATGAAAATATTGGTGTTTGGGTTATGTTTAAAATTTCTTCTCCTGATGGTTATGCTTCTTTAGATAATTTAGAAGTTATTGAAGAAGGTCCTATTGATGGTGAAGCTTTATCTCGTGTTAAACATATGGAAAAAAAATGGAATGATCAAATGGAAGCTAAACGTTCTGAAACTCAACAAGCTTATGATGTTGCTAAACAAGCTATTGATGCTTTATTTACTAATGTTCAAGATGAAGCTTTACAATTTGATACTACTTTAGCTCAAATTCAATATGCTGAATATTTAGTTCAATCTATTCCTTATGTTTATAATGATTGGTTATCTGATGTTCCTGGTATGAATTATGATATTTATGTTGAATTAGATGCTCGTGTTGCTCAAGCTCGTTATTTATATGATACTCGTAATATTATTAAAAATGGTGATTTTACTCAAGGTGTTATGGGTTGGCATGTTACTGGTAATGCTGATGTTCAACAAATTGATGGTGTTTCTGTTTTAGTTTTATCTAATTGGTCTGCTGGTGTTTCTCAAAATGTTCATTTACAACATAATCATGGTTATGTTTTACGTGTTATTGCTAAAAAAGAAGGTCCTGGTAATGGTTATGTTACTTTAATGGATTGTGAAGAAAATCAAGAAAAATTAACTTTTACTTCTTGTGAAGAAGGTTATATTACTAAAACTGTTGATGTTTTTCCTGATACTGATCGTGTTCGTATTGAAATTGGTGAAACTGAAGGTTCTTTTTATATTGAATCTATTGAATTAATTTGTATGAATGAATAA

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

I would use high-throughput chemical DNA synthesis combined with an assembly method such as Gibson Assembly. Chemical oligonucleotide synthesis (phosphoramidite chemistry) is the standard technology for producing short DNA fragments with a controlled sequence and high purity. These pieces can then be assembled into the full-length cry4Ba gene using Gibson Assembly, which joins DNA pieces that share overlapping ends in a single reaction. I would choose this combination because it enables accurate synthesis of long genes, allows codon optimization for different expression hosts, and supports modular design.
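The idea of overlapping pieces can be illustrated with a short sketch that splits a sequence into fragments sharing a fixed overlap and then checks that they join back into the original. The fragment length and overlap are toy values chosen for readability, not recommended Gibson Assembly parameters, and the function names are hypothetical. The input is the first 60 nucleotides of the cry4Ba sequence above.

```python
def split_with_overlaps(seq: str, fragment_len: int, overlap: int) -> list:
    """Split seq into fragments where neighbours share `overlap` bases."""
    if not 0 < overlap < fragment_len:
        raise ValueError("overlap must be positive and shorter than a fragment")
    step = fragment_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            break
        start += step
    return fragments


def reassemble(fragments: list, overlap: int) -> str:
    """Join fragments end to end, checking that each overlap matches."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        if assembled[-overlap:] != fragment[:overlap]:
            raise ValueError("neighbouring fragments do not overlap")
        assembled += fragment[overlap:]
    return assembled


# First 60 nt of the cry4Ba sequence; toy fragment size and overlap.
cry4ba_start = "ATGATGAATTCTGGTTATCCTTTAGCTAATGATTTACAAGGTTCTATGAAAAATACTAAT"
fragments = split_with_overlaps(cry4ba_start, fragment_len=25, overlap=10)

for fragment in fragments:
    print(fragment)
print(reassemble(fragments, overlap=10) == cry4ba_start)
```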

Also answer the following questions:

1. What are the essential steps of your chosen synthesis methods?

  • Oligo synthesis
  • Purification
  • Gene assembly
  • Cloning into an expression vector

2. What are the limitations of your synthesis method (if any) in terms of speed, accuracy, scalability?

  • Cost increases with length.
  • Errors may occur during oligo synthesis.
  • Requires verification.
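One way to see why errors accumulate with length is a simplified coupling model: if every nucleotide addition succeeds independently with the same efficiency, the fraction of full-length product for an n-mer is the efficiency raised to the power n - 1. The 99% efficiency used below is an illustrative assumption, not a measured value for any particular synthesis platform.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of molecules that received every one of the n - 1 couplings."""
    if length_nt < 1:
        raise ValueError("length must be at least 1 nucleotide")
    return coupling_efficiency ** (length_nt - 1)


# Illustrative efficiency only; real values depend on chemistry and platform.
for length in (20, 60, 100, 200):
    fraction = full_length_fraction(length, coupling_efficiency=0.99)
    print(f"{length} nt: {fraction:.1%} full-length")
```

The steep drop with length is why long genes are built from many short oligonucleotides and then verified by sequencing.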

5.3 DNA Edit (i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I would edit the cry4Ba coding sequence to enhance toxicity or stability, its regulatory elements to improve expression in non-Bt hosts, and its domains to broaden target specificity. The goal of these edits would be to develop improved insecticides or novel delivery systems.

(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use CRISPR/Cas9 because it allows precise modification of DNA: a guide RNA (gRNA) directs the Cas9 nuclease to a specific sequence within the cry4Ba gene. It is highly specific, relatively easy to design, efficient in bacteria, and scalable for generating multiple toxin variants. Alternatively, if I wanted to introduce small point mutations to improve toxin activity or stability without creating double-strand breaks, I would use CRISPR base editors, which enable single-nucleotide changes with greater precision and a lower risk of unwanted insertions or deletions.

Also answer the following questions:

1. How does your technology of choice edit DNA? What are the essential steps?

CRISPR/Cas9 edits DNA by creating a targeted double-strand break at a specific sequence defined by a guide RNA (gRNA). First, the gRNA is computationally designed to be complementary to the desired target site in the cry4Ba locus, and it is then cloned or synthesized. Next, the Cas9 nuclease and gRNA are delivered into Bacillus cells via plasmid transformation. Once inside the cell, the gRNA directs Cas9 to the target sequence, where Cas9 cuts the DNA. The cell’s natural DNA repair mechanisms then repair the break, either through non-homologous end joining or, if a donor DNA template containing the desired modifications is provided, through homologous recombination. Finally, edited colonies are screened and verified by PCR and sequencing to confirm the intended modification.
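The guide design step can be sketched as a simple search for candidate target sites. The example assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG PAM immediately 3' of a 20-nucleotide protospacer. It ignores off-target scoring entirely, and both the function names and the toy sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_cas9_targets(seq: str, protospacer_len: int = 20) -> list:
    """Return (strand, position, protospacer, PAM) for every NGG site.

    Positions are 0-based and, for the '-' strand, are counted along the
    reverse complement of the input sequence.
    """
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    targets = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            targets.append((strand, match.start(), match.group(1), match.group(2)))
    return targets


# Hypothetical toy target: 20 nt protospacer, a TGG PAM, and a short tail.
toy_locus = "ATGATGAATTCTGGTTATCC" + "TGG" + "AAAA"
for target in find_cas9_targets(toy_locus):
    print(target)
```

A real design would additionally rank each candidate by its similarity to other sites in the host genome.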

2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

  • Design guide RNA targeting cry4Ba.
  • Cas9 delivery vector or ribonucleoprotein.
  • Editing template with desired mutations.
  • Bacterial host cells.

3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

  • Off-target activity (needs careful design).
  • Editing efficiency depends on repair pathways.
  • Delivery methods vary in success.

Week 4 HW: Protein Design

Part A. Conceptual Questions

1. How many amino acid molecules do you take with 500 g of meat?

If we assume that meat is approximately 20% protein, then 500 grams of meat contains about 100 grams of protein. The average molecular weight of an amino acid is roughly 100 Daltons (100 g/mol). Dividing 100 grams by 100 g/mol gives approximately 1 mole of amino acids, and one mole contains 6.02 × 10²³ molecules (Avogadro’s number). Therefore, consuming 500 grams of meat means ingesting roughly 6 × 10²³ amino acid molecules.
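The estimate can be written out as a short calculation. The 20% protein content and the 100 g/mol average mass are the same rough assumptions used in the answer; the second call uses 110 g/mol, a commonly quoted average residue mass, only to show how little the order of magnitude changes.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g: float,
                         protein_fraction: float = 0.20,
                         residue_mass_g_per_mol: float = 100.0) -> float:
    """Estimate the number of amino acid molecules in a portion of meat."""
    protein_g = meat_g * protein_fraction
    moles = protein_g / residue_mass_g_per_mol
    return moles * AVOGADRO


print(f"{amino_acid_molecules(500):.2e}")
print(f"{amino_acid_molecules(500, residue_mass_g_per_mol=110.0):.2e}")
```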

2. Why do humans eat beef but do not become a cow?

Humans do not become cows after eating beef because biological identity is determined by genetic information and regulatory networks, not by the origin of the molecules we consume. Proteins from beef are digested into individual amino acids, which are then absorbed and reused by our cells to synthesize human proteins according to our own DNA instructions.

3. Why are there only 20 natural amino acids?

There are only 20 canonical amino acids because they provide sufficient chemical diversity to build functional proteins while maintaining evolutionary stability. These amino acids cover a wide range of chemical properties: hydrophobic, polar, charged, aromatic, flexible, and rigid. Early in evolution, once the translation machinery became established, expanding the genetic code would have introduced significant risk of translational errors.

4. Can you make non-natural amino acids? Design some.

Yes, non-natural amino acids can be synthesized chemically or incorporated through expanded genetic code technologies. For example, a fluorinated amino acid can be designed by replacing a methyl group with a trifluoromethyl group to increase hydrophobicity and protein stability. Another possibility is a photo-switchable amino acid containing an azobenzene group, allowing protein conformation to be controlled with light. A third design could include a redox-active moiety, such as a ferrocene group, enabling electron transfer within engineered proteins.

5. Where did amino acids come from before enzymes and life started?

Amino acids likely originated through prebiotic chemistry before the emergence of life. The Miller-Urey experiment demonstrated that amino acids can form spontaneously under simulated early Earth conditions involving simple gases and electrical discharge. Additionally, amino acids have been found in meteorites such as the Murchison meteorite, suggesting extraterrestrial delivery may have contributed to the prebiotic inventory.

6. If you make an α-helix using D-amino acids, what handedness would you expect?

Natural proteins are composed of L-amino acids and form predominantly right-handed α-helices. If a scientist constructs a helix entirely from D-amino acids, the chirality of each residue is inverted, which reverses the allowed backbone dihedral angles. As a result, the helix formed would be left-handed.

7. Why are most molecular helices right-handed?

Most molecular helices in biological systems are right-handed because life is based almost exclusively on L-amino acids. Their stereochemistry constrains backbone geometry in a way that energetically favors right-handed helices, minimizing steric clashes and optimizing hydrogen bonding. If life had evolved using D-amino acids instead, left-handed helices would likely dominate.

8. Why do β-sheets tend to aggregate?

- What is the driving force?

β-sheets tend to aggregate because their backbone hydrogen bond donors and acceptors become exposed when proteins partially unfold. These exposed regions seek to form hydrogen bonds to stabilize themselves, often binding to similar β-strands from other molecules. The aggregation is driven by intermolecular hydrogen bonding, hydrophobic interactions, and the entropic gain associated with releasing ordered water molecules from hydrophobic surfaces.

9. Why do many amyloid diseases form β-sheets?

- Can you use amyloid β-sheets as materials?

Many amyloid diseases, including Alzheimer’s disease, involve the misfolding of proteins into stable cross-β-sheet fibrils. These proteins are particularly prone to forming highly ordered, self-templating aggregates that grow through nucleation-dependent polymerization. Although pathological in a biological context, these same structural properties make amyloid fibrils attractive as biomaterials: engineered amyloid-like peptides can form hydrogels, scaffolds for tissue engineering, and nanostructured materials.

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it

The protein I selected is the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (RbcL) with UniProt accession Q8DIS5, entry name RBL_THEVB. This protein comes from the thermophilic cyanobacterium Thermosynechococcus vestitus BP-1. RuBisCO is the central enzyme of the Calvin cycle and is responsible for fixing atmospheric CO₂ into organic carbon during photosynthesis. I selected this protein because carbon fixation underlies all plant biomass production and therefore directly impacts agriculture, food security, and global carbon cycling. Studying RuBisCO at the structural level provides insight into how photosynthetic efficiency might be improved in crops.

https://www.uniprot.org/uniprotkb/Q8DIS5/entry

2. Identify the amino acid sequence of your protein. MAYTQSKSQKVGYQAGVKDYRLTYYTPDYTPKDTDILAAFRVTPQPGVPFEEAAAAVAAESSTGTWTTVWTDLLTDLDRYKGCCYDIEPLPGEDNQFIAYIAYPLDLFEEGSVTNMLTSIVGNVFGFKALKALRLEDLRIPVAYLKTFQGPPHGIQVERDKLNKYGRPLLGCTIKPKLGLSAKNYGRAVYECLRGGLDFTKDDENINSQPFQRWRDRFLFVADAIHKAQAETGEIKGHYLNVTAPTCEEMLKRAEFAKELEMPIIMHDFLTAGFTANTTLSKWCRDNGMLLHIHRAMHAVMDRQKNHGIHFRVLAKCLRMSGGDHIHTGTVVGKLEGDKAVTLGFVDLLRENYIEQDRSRGIYFTQDWASMPGVMAVASGGIHVWHMPALVDIFGDDAVLQFGGGTLGHPWGNAPGATANRVALEACIQARNEGRDLMREGGDIIREAARWSPELAAACELWKEIKFEFEAQDTI

- How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

The length of the protein is 475 amino acids. The most common amino acid is A (alanine), which appears 47 times.
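The same count can be reproduced without the Colab notebook. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the example runs on just the first 30 residues of the sequence above, so its numbers differ from the full-length values reported here.

```python
from collections import Counter


def composition(sequence: str) -> tuple:
    """Return (length, [(amino_acid, count), ...]) sorted by frequency."""
    sequence = sequence.upper()
    counts = Counter(sequence)
    return len(sequence), counts.most_common()


# First 30 residues of Q8DIS5 only; paste the full sequence to repeat the count.
n_terminal_fragment = "MAYTQSKSQKVGYQAGVKDYRLTYYTPDYT"

length, ranked = composition(n_terminal_fragment)
print("Length:", length)
print("Most frequent:", ranked[0])
```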

- How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

When the amino acid sequence of Q8DIS5 was analyzed using UniProt’s BLAST tool, the search returned 250 homologous protein sequences. This number probably reflects the limit on hits reported by the search rather than the total number of homologs that exist. These homologs correspond primarily to RuBisCO large subunits from cyanobacteria and photosynthetic organisms. The presence of numerous homologs indicates that this protein is highly conserved across species, reflecting its essential role in carbon fixation and global photosynthetic metabolism.

- Does your protein belong to any protein family?

Yes, the protein Q8DIS5 (RBL_THEVB) belongs to the RuBisCO superfamily, specifically Form I RuBisCO large subunits (RbcL family). This protein family includes the catalytic large chains of ribulose-1,5-bisphosphate carboxylase/oxygenase enzymes found in cyanobacteria, plants, and algae. Members of this family share highly conserved sequence motifs that are essential for carbon fixation, including residues involved in substrate binding and magnesium coordination at the active site.

3. Identify the structure page of your protein in RCSB

- When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

The structure 2YBV was solved by X-ray crystallography, with:

Deposited: March 10, 2011

Released: March 28, 2012

Resolution: 2.30 Å

A resolution of 2.30 Å is considered high quality for structural analysis and visualization because at this resolution, the positions of amino acid side chains, cofactors, and active-site features are clearly defined.

- Are there any other molecules in the solved structure apart from protein?

Apart from the protein chains, crystallized RuBisCO structures typically include magnesium ions (Mg²⁺), which are essential cofactors for catalysis, as well as substrate analogs and water molecules that help stabilize the active site.

- Does your protein belong to any structure classification family?

Structurally, the RuBisCO large subunit belongs to the alpha/beta protein class, with a conserved α/β barrel-like fold characteristic of the RuBisCO superfamily.

4. Open the structure of your protein in any 3D molecule visualization software: PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)

  • Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
  • Color the protein by secondary structure. Does it have more helices or sheets?
  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

When visualized in cartoon representation, the protein displays a mixed α/β architecture. A central beta-sheet core is surrounded by multiple alpha helices, forming the characteristic α/β fold typical of RuBisCO large subunits. The beta sheets form the structural core of the enzyme, while the helices surround and stabilize this framework. Overall, the structure contains both helices and sheets, with a prominent beta-sheet catalytic core.

When colored by secondary structure, the beta sheets appear concentrated in the central region, while alpha helices are distributed around the periphery. This confirms that the enzyme follows a classical α/β barrel-like organization commonly observed in metabolic enzymes. Coloring the structure by residue type reveals that hydrophobic residues are predominantly located in the interior of the protein, forming a stable hydrophobic core. In contrast, hydrophilic residues are mostly exposed on the surface, consistent with a soluble enzyme. Because this RuBisCO comes from a cyanobacterium, which has no chloroplasts, it functions in the aqueous interior of the cell rather than in a chloroplast stroma. This distribution reflects proper protein folding and stability in aqueous environments.

Surface visualization of 2YBV reveals clear cavities and clefts between structural domains, indicating the presence of binding pockets. However, this specific crystal structure represents an apo form of the enzyme, as it does not contain a modeled Mg²⁺ ion or bound substrate. The only heteroatoms present in this structure are water molecules (HOH).

To visualize the catalytic metal ion, an additional RuBisCO structure (PDB ID 4RUB) was examined. In this structure, the Mg²⁺ ion appears as a green sphere located within a deep catalytic pocket and coordinated by nearby residues, confirming the structural location of the active site. Although Mg²⁺ is essential for enzymatic activity, it is not present in the 2YBV model itself.

Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

Deep Mutational Scans

a) Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.

b) Can you explain any particular pattern? (choose a residue and a mutation that stands out)

Distinct vertical bands of strongly negative scores are observed at several positions, indicating that most substitutions at these sites are predicted to be unfavorable. These residues appear to be highly constrained, suggesting structural or functional importance. For example, substitution of a hydrophobic wild-type residue with a chemically dissimilar charged residue at one of these constrained positions results in a pronounced decrease in log-likelihood. This pattern is consistent with disruption of structural stability, particularly if the residue contributes to hydrophobic packing or functional integrity. In contrast, other positions exhibit relatively mild score variations across multiple substitutions, indicating higher mutational tolerance. These sites likely correspond to surface-exposed or flexible regions of the protein.
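The scores in such a scan are often computed as a log-likelihood ratio between the mutant and the wild-type residue at each position. The sketch below shows that calculation on invented probabilities that stand in for the language model output. It is an assumption about the scoring scheme and does not reproduce the notebook's exact code or any real ESM2 values.

```python
import math


def mutation_scores(wild_type: str, log_probs: list) -> dict:
    """Score each substitution as log P(mutant) - log P(wild type).

    log_probs[i] maps amino acid letters to log-probabilities at position i.
    """
    scores = {}
    for index, wt_residue in enumerate(wild_type):
        wt_log_prob = log_probs[index][wt_residue]
        for residue, log_prob in log_probs[index].items():
            if residue != wt_residue:
                label = f"{wt_residue}{index + 1}{residue}"
                scores[label] = log_prob - wt_log_prob
    return scores


# Invented probabilities: position 1 is constrained, position 2 is tolerant.
toy_log_probs = [
    {"L": math.log(0.90), "I": math.log(0.08), "D": math.log(0.02)},
    {"S": math.log(0.40), "T": math.log(0.35), "D": math.log(0.25)},
]

for label, score in sorted(mutation_scores("LS", toy_log_probs).items()):
    print(f"{label}: {score:.2f}")
```

In this toy case the hydrophobic-to-charged change at the constrained position receives the most negative score, which mirrors the vertical bands described above.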

Latent Space Analysis

a) Use the provided sequence dataset to embed proteins in reduced dimensionality.

b) Analyze the different formed neighborhoods: do they approximate similar proteins?

c) Place your protein in the resulting map and explain its position and similarity to its neighbors.

To evaluate the biological relevance of the learned embedding space, my protein of interest, RuBisCO (PDB ID: 2YBV), was embedded using the same ESM-2 pipeline applied to the SCOPe dataset. The resulting embedding was projected into the same reduced-dimensional space and analyzed using cosine similarity in the original high-dimensional representation.

The nearest neighbor to my protein (cosine similarity = 0.9904) corresponds to another ribulose-1,5-bisphosphate carboxylase-oxygenase from Oryza sativa. This extremely high similarity value indicates that the embedding space accurately captures functional and structural conservation. Given that RuBisCO is a highly conserved enzyme involved in carbon fixation, this result is biologically consistent and expected.

Beyond the closest homolog, additional neighboring proteins include large metabolic enzymes such as aconitase, nitrogenase, and ornithine decarboxylase. Many of these proteins belong to α/β structural classes in SCOPe, suggesting that the latent space organizes proteins not only by specific biochemical function but also by shared structural architecture.
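The nearest-neighbor lookup described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the function names and the toy three-dimensional vectors are hypothetical stand-ins for the high-dimensional ESM-2 embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(query, embeddings, k=3):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Toy vectors (hypothetical); real ESM-2 embeddings have hundreds of dimensions.
toy_embeddings = {
    "rubisco_homolog": [0.9, 0.1, 0.2],
    "aconitase": [0.6, 0.5, 0.3],
    "unrelated": [-0.2, 0.9, -0.4],
}
query_embedding = [1.0, 0.1, 0.2]
neighbors = nearest_neighbors(query_embedding, toy_embeddings, k=2)
```

Computing similarity in the original embedding space, rather than in the 2D projection, avoids distortions introduced by dimensionality reduction.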

C2. Protein Folding

Folding a protein

  1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

Total sequence length: 475
Running ESMFold inference for sequence with length 475…
Prediction complete.
ptm: 0.919 plddt: 90.429
Results saved to test_d3e9a/
CPU times: user 1min 33s, sys: 8.76 s, total: 1min 41s
Wall time: 2min 15s

ExecutiveAlign: 3395 atoms aligned.
ExecutiveRMS: 132 atoms rejected during cycle 1 (RMSD=1.74).
ExecutiveRMS: 183 atoms rejected during cycle 2 (RMSD=1.04).
ExecutiveRMS: 97 atoms rejected during cycle 3 (RMSD=0.85).
ExecutiveRMS: 46 atoms rejected during cycle 4 (RMSD=0.79).
ExecutiveRMS: 22 atoms rejected during cycle 5 (RMSD=0.77).
Executive: RMSD = 0.761 (2868 to 2868 atoms)

The amino acid sequence of the protein (475 residues) was folded using ESMFold to predict its three-dimensional structure. The predicted model showed high confidence with a pTM score of 0.919 and a pLDDT value of approximately 90, indicating reliable structural prediction. The predicted structure was then aligned with the experimentally determined structure PDB 2YBV using PyMOL. The alignment resulted in an RMSD of 0.761 Å across 2868 atoms, indicating that the predicted coordinates match the original structure very closely.

  1. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

To evaluate the effect of mutations on the protein structure, several point mutations were introduced into the amino acid sequence, including substitutions such as V->A, L->A, F->Y, G->A, and I->A at different positions along the 475-residue protein. The modified sequence was then folded using ESMFold and compared with the experimental structure PDB 2YBV using PyMOL. Structural alignment produced an RMSD of 0.761 Å, indicating very high structural similarity between the mutated and original structures. These results suggest that the protein fold is resilient to this set of mostly conservative point substitutions, maintaining its overall three-dimensional structure despite several amino-acid changes.

Total sequence length: 475
Running ESMFold inference for sequence with length 475…
Prediction complete.
ptm: 0.921 plddt: 90.388
Results saved to test_cdeb6/
CPU times: user 1min 45s, sys: 8.49 s, total: 1min 53s
Wall time: 2min 23s

Match: read scoring matrix.
Match: assigning 475 x 5591 pairwise scores.
MatchAlign: aligning residues (475 vs 5591)…
MatchAlign: score 2538.000
ExecutiveAlign: 3395 atoms aligned.
ExecutiveRMS: 132 atoms rejected during cycle 1 (RMSD=1.74).
ExecutiveRMS: 183 atoms rejected during cycle 2 (RMSD=1.04).
ExecutiveRMS: 97 atoms rejected during cycle 3 (RMSD=0.85).
ExecutiveRMS: 46 atoms rejected during cycle 4 (RMSD=0.79).
ExecutiveRMS: 22 atoms rejected during cycle 5 (RMSD=0.77).
Executive: RMSD = 0.761 (2868 to 2868 atoms)

modified sequence = “MAYTQSKSQKVGYQAGVKDYRLTYYTPDYTPKDTDILAAFRVTPQPGVPFEEAAAAVAAESSTGTWTTVWTDLLTDLDRYKGCCYDIEPLPGEDNQFIAYIAYPLDLFEEGSVTNMLTSIVGNVFGFKALKALRLEDLRIPVAYLKTFQGPPHGIQVERDKLNKYGRPLLGCTIKPKLGLSAKNYGRAVYECLRGGLDFTKDDENINSQPFQRWRDRFLFVADAIHKAQAETGEIKGHYLNVTAPTCEEMLKRAEFAKELEMPIIMHDFLTAGFTANTTLSKWCRDNGMLLHIHRAMHAVMDRQKNHGIHFRVLAKCLRMSGGDHIHTGTVVGKLEGDKAVTLGFVDLLRENYIEQDRSRGIYFTQDWASMPGVMAVASGGIHVWHMPALVDIFGDDAVLQFGGGTLGHPWGNAPGATANRVALEACIQARNEGRDLMREGGDIIREAARWSPELAAACELWKEIKFEFEAQDTI”

C3. Protein Generation

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

  1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

The amino acid probability heatmap generated by ProteinMPNN shows the likelihood of each amino acid at every position of the protein backbone. In the heatmap, several positions display strong probabilities for a single amino acid, which appear as bright signals for specific residues. These positions likely correspond to structurally constrained residues that are important for maintaining the stability of the protein fold. Other positions show more distributed probabilities across multiple amino acids, suggesting that these regions are more flexible and can tolerate substitutions.

When comparing the predicted sequence to the original sequence from PDB 2YBV, the sequence recovery was approximately 47.2%, meaning that about half of the residues match the native sequence while many positions were redesigned by the model. This observation is consistent with the heatmap, which indicates that while some residues are strongly constrained by the structure, other positions allow multiple amino acids. Overall, this result demonstrates that several different amino acid sequences can be compatible with the same protein backbone.
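As a rough illustration of how a sequence recovery value is obtained, the sketch below counts identical residues between a native and a designed sequence of equal length. It is not ProteinMPNN's own implementation: the function name is hypothetical, and skipping `X` positions (unresolved residues in the structure) is an assumption. The two fragments are the first 18 residues of the native and designed sequences reported on this page.

```python
def sequence_recovery(native, designed, skip="X"):
    """Fraction of compared positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    # Ignore positions marked as unknown in either sequence (assumption).
    pairs = [(n, d) for n, d in zip(native, designed) if n not in skip and d not in skip]
    matches = sum(n == d for n, d in pairs)
    return matches / len(pairs)

native_fragment = "TYYTPDYTPKDTDILAAF"
designed_fragment = "KYLRLDYVPKDDDILAAF"
recovery = sequence_recovery(native_fragment, designed_fragment)
print(f"recovery over fragment: {recovery:.3f}")
```

Recovery over a short fragment can differ substantially from the whole-chain value, so the full sequences should be used when comparing against the reported 47.2%.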

Generating sequences…

>2YBV, score=1.5070, fixed_chains=[], designed_chains=[‘A’], model_name=v_48_020 TYYTPDYTPKDTDILAAFRVTPQPGVPFEEAAAAVAAESSTGXXXXXWTDLLTDLDRYKGCCYDIEPLPGEDNQFIAYIAYPLDLFEEGSVTNMLTSIVGNVFGFKALKALRLEDLRIPVAYLKTFQGPPHGIQVERDKLNKYGRPLLGCTIKPKLGLSAKNYGRAVYECLRGGLDFTKDDENINSQPFQRWRDRFLFVADAIHKAQAETGEIKGHYLNVTAPTCEEMLKRAEFAKELEMPIIMHDFLTAGFTANTTLSKWCRDNGMLLHIHRAMHAVMDRQKNHGIHFRVLAKCLRMSGGDHIHTGTVVGKLEGDKAVTLGFVDLLRENYIEQDRSRGIYFTQDWASMPGVMAVASGGIHVWHMPALVDIFGDDAVLQFGXXXXGHPWGNAPGATANRVALEACIQARNEGRDLMREGGDIIREAARWSPELAAAC

>T=0.1, sample=0, score=0.7574, seq_recovery=0.4720 KYLRLDYVPKDDDILAAFLVTPKPGVPPEEAAAAVARESSVGXXXXXEELFEIDLEKYRAVCYKITPLPGADNRFLAYVAYPLSLFKPGSVTDLWNTIVGRVFEHPLLAALKLLDLRIPPSFIATFPGPPLGIEAVRARLGIHGRPLLGFAIRPREGLSPAEYGERARAILEGGADFTFDHGSVRDHPWAKFEDRAAAVAEAIRAAEAATGRRKAHAVNISAPTLEEALARADLCKSLGLPMISFNYLRDGFDAAAAIAAYCREHGIILYAGTGGISKVSSDPDRGVDRRVYYKLIRLTGADAIAVGSLKGKDPETRAKILGTVRLITEDYVEADPSKGIYFSQDWAGLPGIIPVVAGGLDVTDIPALVAEWGDDAILLFGXXXXAHPLGLRAGAAARRAALDAAVAARKAGVDLAKDGAAVLAAAAATNPELAAML

  1. Input this sequence into ESMFold and compare the predicted structure to your original.

The sequence generated by ProteinMPNN was then folded using ESMFold to predict its three-dimensional structure. The predicted structure was subsequently compared to the original structure from PDB 2YBV.

Despite the differences between the designed sequence and the native sequence, the predicted structure maintained a similar overall fold. This suggests that the designed sequence is structurally compatible with the original backbone. These results illustrate an important principle of protein design: multiple distinct amino acid sequences can adopt very similar three-dimensional structures when the structural constraints of the protein backbone are preserved.

Generating sequences…

>2YBV, score=1.5417, fixed_chains=[], designed_chains=[‘A’], model_name=v_48_020 TYYTPDYTPKDTDILAAFRVTPQPGVPFEEAAAAVAAESSTGXXXXXWTDLLTDLDRYKGCCYDIEPLPGEDNQFIAYIAYPLDLFEEGSVTNMLTSIVGNVFGFKALKALRLEDLRIPVAYLKTFQGPPHGIQVERDKLNKYGRPLLGCTIKPKLGLSAKNYGRAVYECLRGGLDFTKDDENINSQPFQRWRDRFLFVADAIHKAQAETGEIKGHYLNVTAPTCEEMLKRAEFAKELEMPIIMHDFLTAGFTANTTLSKWCRDNGMLLHIHRAMHAVMDRQKNHGIHFRVLAKCLRMSGGDHIHTGTVVGKLEGDKAVTLGFVDLLRENYIEQDRSRGIYFTQDWASMPGVMAVASGGIHVWHMPALVDIFGDDAVLQFGXXXXGHPWGNAPGATANRVALEACIQARNEGRDLMREGGDIIREAARWSPELAAAC

>T=0.1, sample=0, score=0.7618, seq_recovery=0.4720 KYYRPDYVPKPDDILVAFLVVPKPGVPPEEAAASVAAKSSVGXXXXXLLALLIDLEKYRAVCYEIRPLPGAENRFLAYVAYPLSRFKPGSVTDLFNTIFGNVFSHPDLEALRLLDIRIPPAFIATFPGPPLGIDAVREKLGIYGRPLLAFSIRPRLGLSAAERGERAYEILKGGADITKDPSSLTDQPFCPFADLAREVAAAIRRAEAETGRKKAHAVNISAPDLAAALARLDLCVALGLPMVKFNFLTDGFEAAAAVARRCRETGVILYADTTGISAVSSDPLRGIDKRVYYKLIRLTGADAIEVGTLKGKDPETQAKILGTIDLITKDRVEKDESKGIYFTQDWAGLPGIIPVVSGGLTVKDIPYLVDAYGDNAIIEFGXXXXAHPKGLAAGAKAYKTALDAAVAAKKAGVDLKKDGEKVLADAAKTNPELAAML

Part D. Group Brainstorm on Bacteriophage Engineering

  1. Find a group of ~3–4 students

  2. Read through the Phage Reading material listed under “Reading & Resources” below.

  3. Review the Bacteriophage Final Project Goals for engineering the L Protein:

  • Increased stability (easiest)
  • Higher titers (medium)
  • Higher toxicity of lysis protein (hard)
  1. Brainstorm Session
  • Choose one or two main goals from the list that you think you can address computationally (e.g., “We’ll try to stabilize the lysis protein,” or “We’ll attempt to disrupt its interaction with E. coli DnaJ.”).

  • Write a 1-page proposal (bullet points or short paragraphs) describing:

    • Which tools/approaches from recitation you propose using (e.g., “Use Protein Language Models to do in silico mutagenesis, then AlphaFold-Multimer to check complexes.”).

    • Why do you think those tools might help solve your chosen sub-problem?

    • Name one or two potential pitfalls (e.g., “We lack enough training data on phage–bacteria interactions.”).

    • Include a schematic of your pipeline.

  • This resource may be useful: HTGAA Protein Engineering Tools

  1. Each individually put your plan on your HTGAA website
  • Include your group’s short plan for engineering a bacteriophage

PROJECT OBJECTIVE

  • Engineer the L protein of the MS2 phage to increase structural stability.
  • Disrupt or reduce its interaction with the bacterial chaperone DnaJ.
  • Preserve the C-terminal lysis domain to maintain lytic function.
  • Avoid mutations that interfere with structurally or evolutionarily coupled residues.

Phase 1: Mapping the DnaJ Interaction Interface

Since the exact binding interface between the L protein and DnaJ is unknown, the first step is to identify it computationally rather than introducing arbitrary mutations.

  • Use AlphaFold-Multimer to model the complex between L protein and DnaJ.
  • Generate multiple structural predictions and select the top-ranked models.
  • Identify consensus interface residues that consistently appear in the predicted binding interface.
  • Perform in silico alanine scanning of the N-terminal residues in the complex to determine which residues significantly contribute to binding energy (ΔΔG).
  • Analyze whether the N-terminal region resembles known DnaJ-binding motifs, typically hydrophobic residues flanked by basic amino acids.

This phase defines which residues are critical for interaction and should not be mutated randomly.

Phase 2: Targeted N-Terminal Redesign

Instead of deleting regions or performing extensive random substitutions, introduce controlled chemical modifications to disrupt interaction while preserving structural stability.

  • Focus on charge inversion strategies:

    • Basic residues (K, R) → Acidic residues (E, D)
    • Acidic residues (E, D) → Basic residues (K, R)
  • Disrupt hydrophobic interaction patches:

    • Hydrophobic residues (L, I, V, F) → Polar residues (S, T, N, Q)
    • Aromatic residues (F, Y, W) → Aliphatic or small residues
  • Generate a graded library of variants:

    • Minor charge modifications
    • Moderate interface perturbations
    • Strong hydrophobic disruption

This yields a graded set of variants from which a Pareto front, balancing reduced DnaJ interaction against preserved protein stability, can be identified.
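The charge-inversion part of this phase can be sketched as a simple variant generator. This is only an illustration under stated assumptions: the specific pairing (K to E, R to D, E to K, D to R) and the choice of residues 1 to 40 as the soluble N-terminal window are hypothetical, and the function name is not from any tool used in the course. The sequence is the MS2 lysis protein from UniProt P03609.

```python
# Hypothetical one-to-one pairing for charge inversion (assumption).
CHARGE_SWAP = {"K": "E", "R": "D", "E": "K", "D": "R"}

def charge_inversion_variants(sequence, start, end):
    """Single-point charge-inversion variants for 1-based positions start..end."""
    variants = {}
    for pos in range(start, end + 1):
        wt = sequence[pos - 1]
        if wt in CHARGE_SWAP:
            mut = CHARGE_SWAP[wt]
            # Standard mutation label, e.g. K23E.
            variants[f"{wt}{pos}{mut}"] = sequence[:pos - 1] + mut + sequence[pos:]
    return variants

L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
n_terminal_variants = charge_inversion_variants(L_PROTEIN, 1, 40)
for name in n_terminal_variants:
    print(name)
```

Each variant would then pass through the stability and plausibility filters of Phase 3 before being considered further.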

Phase 3: Stability and Functional Filtering

To ensure that redesigned variants remain structurally viable and functionally relevant:

  • Use Rosetta or FoldX to calculate ΔΔG and verify that mutations do not destabilize the overall protein fold.

  • Confirm that mutations in the N-terminal region do not propagate structural stress toward the C-terminal lysis domain.

  • Perform co-evolutionary analysis (e.g., EVcouplings):

    • Identify residue pairs that co-evolved between the N-terminal and C-terminal regions.
    • Avoid mutating co-evolved residues independently to prevent functional disruption.
  • Evaluate aggregation propensity using tools such as Aggrescan3D to ensure that mutations do not create exposed hydrophobic patches leading to cytoplasmic aggregation.

  • Assess sequence plausibility using protein language models such as ESM to filter out unlikely or non-natural variants.

Key Limitations:

  • The DnaJ binding mode may be transient or dynamic, reducing AlphaFold-Multimer accuracy.
  • Protein language model scores do not guarantee in vivo functionality.
  • Intrinsically disordered regions may not be accurately modeled.
  • Computational predictions must ultimately be validated experimentally.

Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Your challenge: Design short peptides that bind mutant SOD1. Then decide which ones are worth advancing toward therapy.

You will use three models developed in our lab:

  • PepMLM: target sequence-conditioned peptide generation via masked language modeling
  • PeptiVerse: therapeutic property prediction
  • moPPIt: motif-specific multi-objective peptide design using Multi-Objective Guided Discrete Flow Matching (MOG-DFM)

Part 1: Generate Binders with PepMLM

  1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation. Original sequence:

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Sequence with mutation:
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
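Introducing the mutation by hand is error-prone because A4V is numbered on the mature protein, which lacks the initiator methionine, while the UniProt sequence includes it. The sketch below applies the substitution with an explicit numbering offset and checks the wild-type residue first. The helper name and the `offset` parameter are hypothetical, and only the first 23 residues of the sequence are used to keep the example short.

```python
def apply_substitution(sequence, wt, position, mutant, offset=0):
    """Apply a point substitution; `position` is 1-based, `offset` shifts the numbering."""
    index = position - 1 + offset
    if sequence[index] != wt:
        raise ValueError(f"expected {wt} at position {position}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]

# N-terminal fragment of the UniProt P00441 sequence (includes Met1).
SOD1_NTERM = "MATKAVCVLKGDGPVQGIINFEQ"

# offset=1 because A4V counts from the residue after the initiator Met.
SOD1_NTERM_A4V = apply_substitution(SOD1_NTERM, "A", 4, "V", offset=1)
print(SOD1_NTERM_A4V)
```

Without the offset, position 4 in the UniProt sequence is a lysine, so the wild-type check would raise an error instead of silently mutating the wrong residue.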

  1. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:

  2. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

  3. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.

  4. Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Generated sequences:

| Binder | Pseudo Perplexity |
| --- | --- |
| WRYGVTALAHWX | 10.28 ⭐ |
| KRYPVVGLEWKX | 14.16 |
| KHYPPVVVAHKK | 14.86 |
| WRYYAAVVRHKK | 19.84 |

Known Binder: FLYRWLPSRRGG

Four candidate peptides of length 12 were generated using PepMLM conditioned on the mutant SOD1 sequence. The model assigned pseudo-perplexity scores to each peptide, which reflect the likelihood of the sequence under the model. Lower perplexity values indicate higher confidence. Among the generated peptides, WRYGVTALAHWX had the lowest pseudo-perplexity (10.28), suggesting it is the most plausible binder candidate. For comparison, the known SOD1-binding peptide FLYRWLPSRRGG was also included.
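To make the score more concrete, the sketch below computes a pseudo-perplexity from per-residue log-probabilities, assuming the usual masked-language-model definition: the exponential of the negative mean log-probability of each residue when it alone is masked. The input values are hypothetical and are not PepMLM output; PepMLM's own implementation may differ in detail.

```python
import math

def pseudo_perplexity(token_log_probs):
    """exp of the negative mean log-probability over all masked positions."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical 12-mer where the model is maximally unsure (uniform over 20 residues).
uniform = [math.log(1 / 20)] * 12
ppl_uniform = pseudo_perplexity(uniform)

# Hypothetical 12-mer where the model assigns probability 0.5 to every true residue.
confident = [math.log(0.5)] * 12
ppl_confident = pseudo_perplexity(confident)

print(round(ppl_uniform, 2), round(ppl_confident, 2))
```

Under this definition, a value near 20 corresponds to a model that is effectively guessing among the 20 amino acids, which gives a scale for reading scores such as 10.28 or 19.84.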

Part 2: Evaluate Binders with AlphaFold3

  1. Navigate to the AlphaFold Server: alphafoldserver.com

  2. For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.

a) ipTM = 0.45, pTM = 0.78

b) ipTM = 0.38, pTM = 0.87

c) ipTM = 0.24, pTM = 0.87

d) ipTM = 0.27, pTM = 0.71

e) Known Binder: ipTM = 0.35, pTM = 0.83

  1. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

Model 1

  • The peptide sits along the side of the β-sheet barrel.
  • It contacts surface loops near the barrel edge.
  • Surface-bound, not buried.
  • Not near the extreme N-terminus (A4V region).

Model 2

  • The peptide is positioned above the β-barrel core, interacting mainly with loop regions.
  • Still surface-exposed.
  • No clear contact with the dimer interface.

Model 3

  • The peptide extends toward a flexible loop projecting from the barrel.
  • Appears loosely associated, with much of the peptide exposed.
  • Again not near the N-terminal A4V site.

Model 4

  • The peptide approaches between β-strands and adjacent loops, slightly closer to the barrel surface.

  • Part of the peptide appears partially tucked against the protein, but still largely surface-bound.

Model 5

  • The peptide lies along the β-sheet surface, contacting residues on the outer barrel face.
  • The orientation is consistent with surface docking rather than deep insertion.
  1. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

AlphaFold3 was used to model complexes between mutant SOD1 and each candidate peptide. The predicted interface TM-scores (ipTM) ranged from 0.24 to 0.45. All of these values fall in the low-confidence range for interface prediction, so the modeled poses should be read as weak, tentative interactions rather than confirmed binding modes. Most peptides appeared surface-bound along the β-barrel region of SOD1, interacting primarily with exposed loop regions rather than the N-terminal region where the A4V mutation occurs. The known SOD1-binding peptide FLYRWLPSRRGG produced an ipTM score of 0.35. Notably, one PepMLM-generated peptide (WRYGVTALAHWA) showed a higher ipTM score of 0.45, suggesting a potentially stronger interaction than the reference peptide. These results indicate that some generated peptides may represent promising candidates for further optimization and evaluation.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  1. Paste the peptide sequence.

  2. Paste the A4V mutant SOD1 sequence in the target field.

  3. Check the boxes

  • Predicted binding affinity

  • Solubility

  • Hemolysis probability

  • Net charge (pH 7)

  • Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?

Choose one peptide you would advance and justify your decision briefly.

PEPTIDE #1:

PEPTIDE #2:

PEPTIDE #3:

PEPTIDE #4:

PEPTIDE #5:

| Peptide | ipTM (AF3) | Binding Affinity (pKd) | Solubility | Hemolysis (Prob) | Net Charge |
| --- | --- | --- | --- | --- | --- |
| WRYGVTALAHWA | 0.45 | 6.223 | Soluble (1.00) | 0.048 | +1.85 |
| KRYPVVGLEWKA | 0.38 | 5.542 | Soluble (1.00) | 0.033 | +1.76 |
| KHYPPVVVAHKK | 0.24 | 4.835 | Soluble (1.00) | 0.018 | +2.93 |
| WRYYAAVVRHKK | 0.27 | 6.045 | Soluble (1.00) | 0.032 | +3.84 |
| FLYRWLPSRRGG | 0.31 | 5.968 | Soluble (1.00) | 0.047 | +2.76 |

PeptiVerse predictions were used to evaluate the therapeutic properties of the generated peptides, including binding affinity, solubility, hemolysis probability, and net charge. The peptide WRYGVTALAHWA showed the highest predicted binding affinity (pKd = 6.223) and the highest AlphaFold3 interface score (ipTM = 0.45), suggesting a relatively stronger interaction with SOD1 compared to the other candidates and the reference peptide FLYRWLPSRRGG. All peptides were predicted to be highly soluble with low hemolysis probabilities, indicating generally favorable therapeutic properties. Although some peptides displayed higher net charges, WRYGVTALAHWA maintained a moderate charge and a low predicted hemolysis risk. Based on the combined structural and therapeutic predictions, WRYGVTALAHWA appears to provide the best balance of binding strength and developability, so I selected it as the peptide to advance for further optimization.
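The selection logic can be written down as a small filter-and-rank step. The sketch below uses the values from the table above; the hemolysis cutoff of 0.1 and the choice to rank by pKd first and ipTM second are hypothetical assumptions made for illustration, not criteria defined by PeptiVerse or the assignment.

```python
candidates = [
    {"peptide": "WRYGVTALAHWA", "iptm": 0.45, "pkd": 6.223, "hemolysis": 0.048, "charge": 1.85},
    {"peptide": "KRYPVVGLEWKA", "iptm": 0.38, "pkd": 5.542, "hemolysis": 0.033, "charge": 1.76},
    {"peptide": "KHYPPVVVAHKK", "iptm": 0.24, "pkd": 4.835, "hemolysis": 0.018, "charge": 2.93},
    {"peptide": "WRYYAAVVRHKK", "iptm": 0.27, "pkd": 6.045, "hemolysis": 0.032, "charge": 3.84},
    {"peptide": "FLYRWLPSRRGG", "iptm": 0.31, "pkd": 5.968, "hemolysis": 0.047, "charge": 2.76},
]

def shortlist(peptides, max_hemolysis=0.1):
    """Drop peptides above the hemolysis cutoff, then rank by pKd and ipTM (descending)."""
    safe = [p for p in peptides if p["hemolysis"] <= max_hemolysis]
    return sorted(safe, key=lambda p: (p["pkd"], p["iptm"]), reverse=True)

ranked = shortlist(candidates)
for entry in ranked:
    print(entry["peptide"], entry["pkd"], entry["iptm"])
```

With these assumptions the top-ranked peptide is the same one chosen in the paragraph above; a different weighting of charge or ipTM could change the order of the remaining candidates.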

Part 4: Generate Optimized Peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

  1. Open the moPPIt Colab linked from the HuggingFace moPPIt model card

  2. Make a copy and switch to a GPU runtime.

  3. In the notebook:

  • Paste your A4V mutant SOD1 sequence.

  • Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).

  • Set peptide length to 12 amino acids.

  • Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.

  1. After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

Peptides generated:

a) STCKYKKIGGTL

b) GRYKCYCRDSRY

c) DDTITCKKKQCT

In this step, peptides were generated using moPPIt, which applies Multi-Objective Guided Discrete Flow Matching to design binders toward specific residues on the target protein while simultaneously optimizing multiple objectives such as binding affinity, solubility, and toxicity. The mutant sequence of Superoxide Dismutase 1 was provided as the target, and residues 4-7 near the A4V mutation were selected to guide peptide binding toward the N-terminal region of the protein. Compared to the peptides generated with PepMLM, the moPPIt peptides displayed more structured residue patterns, including higher frequencies of positively charged residues such as lysine and arginine and the presence of cysteine residues that may contribute to stabilizing protein–peptide interactions. This suggests that moPPIt performs directed optimization of binding motifs rather than broadly sampling plausible sequences.

Before advancing these peptides toward clinical studies, several validation steps would be required. First, computational structural modeling using AlphaFold3 or molecular docking could confirm whether the peptides bind near the intended SOD1 residues. Property prediction tools such as PeptiVerse could further evaluate binding affinity, solubility and toxicity risks. Finally, experimental validation would be necessary, including in vitro binding assays, aggregation inhibition assays for mutant SOD1, and toxicity testing in relevant cellular systems.

Part C: Final Project: L-Protein Mutants

High-level summary: The objective of this assignment is to improve the stability and auto-folding of the lysis protein of the MS2 phage. Understanding this mechanism is key to learning how phages could help address antibiotic resistance.

Lysis Protein Sequence (UniProtKB ID: P03609, https://www.uniprot.org/uniprotkb/P03609/entry)

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Note: The lysis protein contains a soluble N-terminal domain followed by a transmembrane domain (blue; last 35 residues). The transmembrane domain affects the lysis activity. The soluble domain (green) is responsible for interaction with DnaJ.

L-Protein Engineering | Option 1: Mutagenesis

STEP 1:

A multiple sequence alignment of homologous L-protein sequences was performed using Clustal Omega to identify conserved and variable regions across related bacteriophages. The alignment revealed that the transmembrane region, located in the C-terminal portion of the protein, is highly conserved, particularly in residues forming a hydrophobic helix (LYVLIFLAIFLSKFTNQLLLSLL). This high level of conservation suggests a critical functional role in membrane insertion and pore formation during bacterial lysis. In contrast, the N-terminal soluble region displayed greater sequence variability, indicating a higher tolerance to mutations. Based on these observations, conserved residues were avoided during mutational design, while more variable positions, especially in the soluble domain, were prioritized as potential targets for mutation.

STEP 2:

To evaluate the effect of mutations across the L-protein sequence, a protein language model (ESM-2) was used to compute log-likelihood ratio (LLR) scores for all possible amino acid substitutions at each position. This approach estimates how favorable a mutation is relative to the wild-type residue based on sequence patterns learned from large protein datasets. A positive LLR score means the model considers the substitution more likely than the wild-type residue, which is commonly interpreted as tolerated or potentially beneficial, while a negative score suggests a deleterious effect. The LLR is a statistical plausibility score and not a direct measurement of stability. The results were compiled into a ranked list of candidate mutations, allowing the identification of positions and substitutions with the highest predicted improvement. These scores were then used as a primary filter to guide mutation selection, in combination with conservation analysis from the multiple sequence alignment.
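The LLR calculation for a single position can be sketched as follows, assuming the model returns one logit per amino acid for that position. The logits here are hypothetical toy values and not ESM-2 output, and the function names are illustrative only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]

def llr_scores(position_logits, wild_type):
    """LLR of every substitution at one position: log p(mutant) - log p(wild type)."""
    log_probs = dict(zip(AMINO_ACIDS, log_softmax(position_logits)))
    return {aa: log_probs[aa] - log_probs[wild_type] for aa in AMINO_ACIDS if aa != wild_type}

# Hypothetical logits for one position whose wild-type residue is K.
toy_logits = [0.0] * 20
toy_logits[AMINO_ACIDS.index("L")] = 2.0
toy_logits[AMINO_ACIDS.index("K")] = 0.5

scores = llr_scores(toy_logits, wild_type="K")
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Repeating this for every position and sorting all substitutions by score gives the kind of ranked list described above.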

The protein language model identified several mutations with high positive LLR scores, indicating potentially favorable substitutions. The top-ranked mutations included K50L (LLR = 2.56), C29R (LLR = 2.39), Y39L (LLR = 2.24), C29S (LLR = 2.04), and S9Q (LLR = 2.01). Additional high-scoring mutations were observed at positions within both the soluble and transmembrane regions, such as T52L (LLR = 1.81), N53L (LLR = 1.86), and A45L (LLR = 1.54), particularly favoring substitutions to hydrophobic residues in the transmembrane domain. These results suggest that increasing hydrophobicity in the membrane region and selecting tolerated substitutions in variable regions may improve protein stability and folding.

STEP 3:

To assess how well the model predictions reflect real functional outcomes, the LLR scores were compared with available experimental lysis data for L-protein mutants. While some overlap between high-scoring mutations and experimentally tested variants was observed, many of the top-ranked mutations identified by the model were not present in the experimental dataset. Therefore, the experimental data was used when available, but for many candidate mutations, selection relied primarily on LLR scores in combination with conservation analysis.

STEP 4:

Based on the combined analysis of LLR scores, sequence conservation, and structural considerations, five mutations were selected as potential candidates for improving the L-protein. In the soluble region, the mutations S9Q and K23R were chosen due to their high LLR scores and location in more variable regions, suggesting a higher tolerance for substitutions that may improve folding stability. In the transmembrane region, K50L and T52L were selected, as both mutations introduce more hydrophobic residues, which is consistent with the conserved nature of this domain and may enhance membrane insertion and pore formation. Additionally, a combined mutant (S9Q + K50L) was designed to explore potential additive effects between improved folding in the soluble region and enhanced hydrophobicity in the transmembrane domain.
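The variant sequences can be derived programmatically from the wild-type sequence, which avoids copy-and-paste errors. The sketch below parses standard mutation labels such as S9Q and checks the wild-type residue before substituting. The helper name is hypothetical; the wild-type sequence is the UniProt P03609 sequence given earlier.

```python
import re

WT_L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"

def mutate(sequence, mutations):
    """Apply labels like 'S9Q' (1-based positions), verifying each wild-type residue."""
    residues = list(sequence)
    for label in mutations:
        wt, pos, mut = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label).groups()
        index = int(pos) - 1
        if residues[index] != wt:
            raise ValueError(f"{label}: found {residues[index]} at position {pos}")
        residues[index] = mut
    return "".join(residues)

variants = {
    "S9Q": mutate(WT_L_PROTEIN, ["S9Q"]),
    "K23R": mutate(WT_L_PROTEIN, ["K23R"]),
    "K50L": mutate(WT_L_PROTEIN, ["K50L"]),
    "T52L": mutate(WT_L_PROTEIN, ["T52L"]),
    "S9Q_K50L": mutate(WT_L_PROTEIN, ["S9Q", "K50L"]),
}
for name, seq in variants.items():
    print(f">{name}\n{seq}")
```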

FASTA SEQUENCES:

>WT_L_protein
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.44

>S9Q
METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>K23R
METRFPQQSQQTPASTNRRRPFRHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>K50L
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

>T52L
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.46

>S9Q_K50L
METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT

pTM = 0.43

AlphaFold predictions were used to assess the structural impact of the selected mutations. The wild-type protein showed a pTM score of 0.44, while most mutants exhibited similar values around 0.43, indicating no major change in the predicted fold. The T52L mutant showed a slightly higher pTM score of 0.46. Because pTM measures prediction confidence rather than thermodynamic stability, and a difference of 0.02 is small, this is at most a weak hint of improvement. It is nonetheless consistent with the introduction of a more hydrophobic residue in the transmembrane region, which may favor membrane insertion. Overall, these findings indicate that the proposed mutations are predicted to be structurally tolerated; whether they improve protein stability would need experimental confirmation.

Week 6 HW: Genetic Circuits Part I

Assignment: DNA Assembly

Answer these questions about the protocol in this week’s lab:

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The Phusion High-Fidelity PCR Master Mix contains several key components necessary for efficient and accurate DNA amplification. First, it includes Phusion DNA polymerase, a high-fidelity enzyme with proofreading activity (3’ → 5’ exonuclease), which reduces errors during DNA replication. It also contains dNTPs (deoxynucleotide triphosphates), which are the building blocks used to synthesize new DNA strands. The mix includes a reaction buffer, optimized with the correct pH and salt concentrations to ensure proper enzyme activity. Additionally, it contains Mg²⁺ ions, which act as essential cofactors for the polymerase. Some mixes may also include stabilizers to maintain enzyme activity during thermal cycling.

2. What are some factors that determine primer annealing temperature during PCR?

Primer annealing temperature depends mainly on the melting temperature (Tm) of the primers. Tm is influenced by primer length, GC content (G-C pairs form three hydrogen bonds, whereas A-T pairs form two), and sequence composition. As a general rule, the annealing temperature is set about 5°C below the Tm of the lower-Tm primer, although the optimal offset depends on the polymerase and buffer. Other factors include primer specificity, since mismatches weaken binding, and salt concentration, which affects DNA duplex stability. If the temperature is too low, nonspecific binding may occur; if too high, primers may not bind efficiently.
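The dependence of Tm on length and GC content can be illustrated with the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is a rough estimate intended for short oligonucleotides. Real primer design tools use nearest-neighbor thermodynamics and account for salt concentration, so the sketch below is only a first approximation. The primer sequence is hypothetical.

```python
def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

primer = "ATGCGTACGTTAGCCTAGGA"  # hypothetical 20-mer
tm = wallace_tm(primer)
annealing = tm - 5  # general starting point of about 5 degrees below Tm
print(tm, annealing, gc_fraction(primer))
```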

3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR amplifies a specific DNA region using primers and DNA polymerase, allowing you to generate large amounts of a defined fragment and even introduce mutations or overlaps. It is highly flexible and does not require specific restriction sites. In contrast, restriction enzyme digestion cuts DNA at specific recognition sequences using restriction enzymes. This method is precise but limited by the presence of those recognition sites in the DNA.

PCR is preferable when you need to amplify DNA, modify sequences or add overlaps for cloning. On the other hand, restriction digestion is preferable when working with existing plasmids and known restriction sites, especially for traditional cloning methods.

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

DNA fragments must have overlapping homologous regions (typically 20–40 base pairs) at their ends. These overlaps can be designed into PCR primers or generated through careful restriction digestion. It is also important to verify that sequences are correct (no mutations) and in the proper orientation. DNA fragments should be clean and free of contamination. Finally, checking sequences using software (like Benchling) ensures that overlaps align correctly for seamless assembly.
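A quick check for a usable overlap between two adjacent fragments can be sketched as below. The function looks for the longest stretch that ends the upstream fragment and begins the downstream fragment, within the 20 to 40 bp window mentioned above. The function name and toy sequences are hypothetical, and the check assumes both fragments are written 5' to 3' on the same strand.

```python
def terminal_overlap(upstream, downstream, min_len=20, max_len=40):
    """Length of the longest suffix of `upstream` equal to a prefix of `downstream`.

    Returns 0 if no overlap of at least `min_len` bases is found.
    """
    limit = min(max_len, len(upstream), len(downstream))
    for size in range(limit, min_len - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return size
    return 0

# Toy fragments sharing a 25-base homology region (hypothetical sequences).
shared_end = "ACGTTGCATCGGATCCTAGGCATGC"
fragment_a = "ATGAAACCCGGG" + shared_end
fragment_b = shared_end + "TTTAAACCCTGA"

overlap_length = terminal_overlap(fragment_a, fragment_b)
print(overlap_length)
```

For a circular assembly, the same check would also be run between the last fragment and the first.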

5. How does the plasmid DNA enter the E. coli cells during transformation?

Plasmid DNA enters E. coli cells that have been made competent, either by chemical treatment or by electroporation. In chemical transformation, cells are treated with calcium chloride, which shields the negative charges on both the DNA and the cell membrane. A brief heat shock then transiently destabilizes the membrane, allowing DNA to enter the cell. Alternatively, in electroporation, an electrical pulse creates transient pores in the membrane, through which DNA can pass. Once inside, the plasmid replicates independently if it has an origin of replication.

6. Describe another assembly method in detail (such as Golden Gate Assembly)

- Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Golden Gate Assembly relies on Type IIS restriction enzymes that cut DNA outside their recognition sites, generating unique overhangs. These overhangs are designed to be complementary between adjacent fragments, ensuring correct assembly order. During the reaction, the enzyme cuts the DNA and DNA ligase joins the fragments together. Because the recognition sites are removed after cutting, the assembled DNA cannot be re-cut, making the process highly efficient. Multiple fragments can be assembled in a single reaction tube in a predefined sequence.

Diagram:

- Model this assembly method with Benchling or Asimov Kernel!

The Golden Gate Assembly was modeled in Benchling by inserting a genetic circuit I had previously designed into the pXTK058 backbone. The circuit consisted of a constitutive promoter, a ribosome binding site (RBS), the coding sequence for butyryl-CoA dehydrogenase and a terminator, forming a complete expression cassette.

Type IIS restriction enzyme sites (BbsI) were used to generate compatible overhangs for directional assembly. The Assembly Wizard was used to simulate the process by treating these overhangs as overlaps. The final construct was verified to ensure correct insertion, orientation, and sequence integrity.
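A small Python sketch of how the overhangs arise, assuming BbsI's standard cut positions (recognition site GAAGAC, top strand cut 2 nt downstream and bottom strand 6 nt downstream, leaving a 4-nt overhang). The example sequence and function names are hypothetical, and only forward-oriented sites are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BBSI_SITE = "GAAGAC"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bbsi_overhang(seq):
    """Return the 4-nt overhang left by the first forward BbsI site,
    or None if there is no site or not enough sequence after it."""
    seq = seq.upper()
    i = seq.find(BBSI_SITE)
    if i == -1:
        return None
    start = i + len(BBSI_SITE) + 2  # skip the 2-nt spacer after the site
    overhang = seq[start:start + 4]
    return overhang if len(overhang) == 4 else None


def junctions_are_valid(overhangs):
    """Check a set of junction overhangs for a directional assembly.

    Rejects duplicates, palindromic overhangs (which can self-ligate)
    and pairs that are reverse complements of each other.
    """
    seen = set()
    for oh in overhangs:
        oh = oh.upper()
        rc = reverse_complement(oh)
        if oh == rc or oh in seen or rc in seen:
            return False
        seen.add(oh)
    return True


example_overhang = bbsi_overhang("TTGAAGACAAGGCTCCCC")
```

Because the overhang sequence lies outside the recognition site, it can be chosen freely for each junction, which is what fixes the order of the parts.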

Assignment: Asimov Kernel

  1. Create a Repository for your work

  2. Create a blank Notebook entry to document the homework and save it to that Repository

  3. Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel)

  4. Create a blank Construct and save it to your Repository

  • Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository

  • Search the parts using the Search function in the right menu

  • Drag and drop the parts into the Construct

  • Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository

  • Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook

  5. Build three of your own Constructs using the parts in the Characterized Bacterial Parts Repo
  • Explain in the Notebook Entry how you think each of the Constructs should function

  • Run the simulator and share your results in the Notebook Entry

  • If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome


HTGAA WEEK #6: IAN SEBASTIAN TERAN GARCIA’S HOMEWORK

1. Exploring the devices in the Bacterial Demos Repo:

I found a construct called “Circuit 3” in the Bacterial Demos Repo. From what I could observe, it corresponds to a plasmid backbone containing the AmeR resistance gene, which allows bacterial selection under antibiotic conditions. The promoter pAmeR drives the expression of this resistance gene.

2. Recreating the Repressilator:

After recreating the repressilator, no noticeable differences were observed between the original circuit obtained from the repository and the one recreated in the simulation environment. Both produced nearly identical graphical outputs, indicating that the reconstruction was accurate and functionally equivalent.

Graphs interpretation:

The graph of RNA concentrations over time shows clear periodic oscillations for the three transcripts. Each gene’s mRNA level rises and falls in a regular pattern, with a noticeable phase shift between them. When one gene is highly expressed, it represses the next gene, causing its expression to decrease, while the third gene begins to increase. This cyclical pattern confirms that the circuit is functioning as an oscillator, with coordinated and repeating changes in gene expression.

Similarly, the protein concentration graph also displays oscillatory behavior, although the fluctuations are smoother and slightly delayed compared to the RNA levels. This delay occurs because protein production depends on the prior synthesis of mRNA, and proteins generally have longer degradation times. Therefore, protein dynamics tend to be more stable and less abrupt than RNA dynamics, which is consistent with biological expectations.

The RNAP flux graph represents the transcriptional activity of each gene at a specific moment in time. Higher values indicate stronger promoter activity, meaning that more RNA polymerase is actively transcribing that gene. In contrast, lower values suggest that the gene is being repressed. This snapshot reflects the regulatory interactions within the circuit at that particular time point.

Finally, the ribosome flux graph shows the rate of protein synthesis for each gene. Similar to the RNAP flux, higher values correspond to increased translation activity. The patterns observed here are consistent with the RNA levels but may show slight delays due to the time required for translation. Overall, these flux measurements provide additional confirmation of the dynamic regulation and oscillatory behavior of the repressilator system.
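The oscillations and the lag between RNA and protein can be reproduced outside the simulator with a minimal Python sketch of the dimensionless three-gene repressilator model (mRNA m and protein p for each gene, each gene repressed by the previous protein in the ring). The parameter values are illustrative assumptions chosen so that the model oscillates; they are not the values used by the Kernel simulator, and plain Euler integration is used only to keep the example dependency-free.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=0.2, n=2.0,
                           dt=0.01, steps=30000):
    """Euler integration of the three-gene repressilator.

    dm_i/dt = -m_i + alpha / (1 + p_prev**n) + alpha0
    dp_i/dt = -beta * (p_i - m_i)

    alpha  : maximal transcription rate (assumed value)
    alpha0 : leaky transcription under full repression (assumed value)
    beta   : protein/mRNA decay-rate ratio; beta < 1 means proteins
             turn over more slowly than mRNA (assumed value)
    Returns a list of (m, p) snapshots, one per step.
    """
    m = [0.0, 0.0, 0.0]
    p = [1.0, 2.0, 3.0]  # asymmetric start so the genes do not move in lockstep
    history = []
    for _ in range(steps):
        # p[i - 1] is the repressor of gene i (index -1 wraps to gene 3)
        dm = [-m[i] + alpha / (1.0 + p[i - 1] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        history.append((m, p))
    return history


history = simulate_repressilator()
protein_1 = [p[0] for _, p in history]
```

Plotting the three mRNA and protein traces from `history` should show the same qualitative picture as the simulator graphs: phase-shifted cycles, with the protein curves smoother and delayed relative to the RNA curves.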

2.2. Repressilator simulation recreated by me:

3. My constructs:

3.1. Construct 1.

The first genetic circuit consists of the TetR gene under the control of the inducible pBad promoter. This represents a simple gene expression system without regulatory feedback. The RNA concentration graph shows a rapid increase followed by a stable plateau, indicating that transcription is activated and reaches a steady state where production and degradation are balanced.

The protein concentration shows a delayed increase compared to RNA levels, which is expected due to the time required for translation. Eventually, protein levels stabilize, indicating equilibrium.

The RNAP and ribosome flux graphs show relatively constant activity, suggesting sustained transcription and translation under the simulated conditions. Overall, this circuit behaves as a simple inducible expression system, producing a stable amount of TetR without dynamic regulation.
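The plateau can be understood with a simple production and degradation balance. The sketch below is my own simplified two-equation model with hypothetical rate constants (arbitrary units), not the simulator's internal model: mRNA is made at rate k_tx and degraded at rate d_m, and protein is made at k_tl per mRNA and degraded at d_p.

```python
def steady_state(k_tx, d_m, k_tl, d_p):
    """Analytic steady state where production equals degradation."""
    m_ss = k_tx / d_m
    p_ss = k_tl * m_ss / d_p
    return m_ss, p_ss


def simulate_expression(k_tx, d_m, k_tl, d_p, dt=0.01, steps=20000):
    """Euler integration of dm/dt = k_tx - d_m*m and dp/dt = k_tl*m - d_p*p,
    starting from zero mRNA and zero protein."""
    m = p = 0.0
    for _ in range(steps):
        m, p = m + dt * (k_tx - d_m * m), p + dt * (k_tl * m - d_p * p)
    return m, p


# Hypothetical rates: protein degrades more slowly than mRNA.
RATES = dict(k_tx=2.0, d_m=0.5, k_tl=4.0, d_p=0.1)
m_ss, p_ss = steady_state(**RATES)
m_end, p_end = simulate_expression(**RATES)
```

Because d_p is smaller than d_m here, the protein takes longer than the mRNA to approach its plateau, which matches the delay seen in the graphs.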

3.2. Construct 2.

For the second construct, the genetic circuit expresses QacR under the control of the pSrpR promoter. Similar to the first construct, this system lacks explicit feedback regulation and behaves as a simple expression module. The RNA concentration rapidly increases and stabilizes, indicating steady transcriptional activity. Compared to Construct 1, the higher RNA levels suggest that the pSrpR promoter is stronger under the simulation conditions.

Moreover, the protein concentration follows the expected delayed increase and reaches a higher steady-state level. RNAP and ribosome fluxes confirm sustained transcription and translation activity. Overall, this construct demonstrates stable gene expression and highlights how promoter strength affects system output.

3.3. Construct 3.

The third construct includes two genes: QacR under the inducible pBad promoter and LitR under the pLitR promoter, which introduces regulatory interactions and feedback into the system. The simulation results show a strong dominance of LitR expression, while QacR remains near zero. This suggests that the inducible pBad promoter is not sufficiently activated under the simulation conditions, resulting in minimal QacR production.

As a consequence, repression of pLitR by QacR is ineffective, allowing LitR to accumulate. Additionally, LitR negatively regulates its own promoter, creating a feedback loop that stabilizes its expression level.

The protein concentration reflects this behavior, with LitR reaching a high steady-state level and QacR remaining negligible. RNAP and ribosome fluxes confirm strong transcription and translation for LitR and minimal activity for QacR.

Week 7 HW: Genetic Circuits Part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

  • Continuous signal processing: Unlike Boolean circuits that operate in binary (On/Off), IANNs can process graded inputs and outputs, enabling more nuanced cellular responses.

  • Integration of multiple inputs: IANNs can combine many signals simultaneously and compute a weighted response, similar to an artificial neural network.

  • Nonlinear computation: Instead of being limited to simple logic gates (AND, OR, NOT), IANNs can model nonlinear relationships between inputs and outputs.

  • Parameters like the promoter strength, binding affinity and degradation rates can be tuned to adjust how strongly each input influences the output.

  • Cells can make context-dependent decisions rather than rigid binary responses.

2. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

A useful application of an intracellular artificial neural network (IANN) is the classification of cancer cells based on microRNA expression profiles. In this system, the inputs are the intracellular concentrations of specific microRNAs (for example, miR-21 or miR-34), which are differentially expressed in cancerous cells. These microRNAs regulate gene expression by repressing or permitting translation of target mRNAs, effectively acting as weighted inputs in the network.

The output is the expression of a reporter or therapeutic gene, such as a fluorescent protein or an apoptosis-inducing factor. The IANN integrates the multiple microRNA inputs and produces a response only when the combined signal exceeds a threshold, similar to a perceptron. However, this approach faces several limitations. Biological noise in gene expression can reduce classification accuracy, and unintended interactions may interfere with circuit behavior. Additionally, there are constraints on the number of inputs that can be reliably implemented. Finally, delivering such engineered systems into patients remains a significant practical limitation.
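The perceptron-like behavior can be written down in a few lines of Python. The weights, threshold and expression levels below are hypothetical numbers chosen for illustration; I assume miR-21 is elevated and miR-34 is reduced in the cancer cell, with levels normalized to a 0–1 scale.

```python
def classify_cell(mirna_levels, weights, threshold):
    """Single-layer perceptron: fire (True) when the weighted sum of
    microRNA levels exceeds the threshold. Missing inputs count as 0."""
    score = sum(w * mirna_levels.get(name, 0.0) for name, w in weights.items())
    return score > threshold


# Assumed weights: positive for a marker that is high in cancer cells,
# negative for a marker that is high in healthy cells.
WEIGHTS = {"miR-21": 1.0, "miR-34": -1.0}
THRESHOLD = 0.5

cancer_like = {"miR-21": 0.9, "miR-34": 0.1}
healthy_like = {"miR-21": 0.2, "miR-34": 0.8}

cancer_call = classify_cell(cancer_like, WEIGHTS, THRESHOLD)
healthy_call = classify_cell(healthy_like, WEIGHTS, THRESHOLD)
```

In the biological circuit the weights would correspond to tunable properties such as the number and strength of microRNA target sites, and noise in the input levels is what pushes borderline cells across the threshold by mistake.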

Reference:

Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R., & Benenson, Y. (2011). Multi-input RNAi-based logic circuit for identification of specific cancer cells. Science (New York, N.Y.), 333(6047), 1307–1311. https://doi.org/10.1126/science.1205527

3. Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.

Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

This diagram, which I made in Canva, represents an intracellular artificial neural network composed of two layers. In Layer 1, the inputs X₁ and X₂ are transcribed (Tx) and translated (Tl) to produce an endoribonuclease (E), which acts as an intermediate regulatory signal. This enzyme then acts on Layer 2, where inputs X₃ and X₄ are transcribed into mRNA. The endoribonuclease E negatively regulates this layer by cleaving the mRNA, thereby reducing its availability for translation. As a result, the production of the fluorescent protein Y is modulated post-transcriptionally. This design mimics a multilayer perceptron, where the first layer processes inputs to generate a hidden signal (E), and the second layer integrates both direct inputs and regulatory signals to determine the final output.

Assignment Part 2: Fungal Materials

1. What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Fungal materials are primarily based on mycelium, the filamentous network of fungi, which can bind organic matter into solid structures. One of the most well-known examples is mycelium-based packaging, used as an alternative to polystyrene foam for protecting goods during shipping. These materials are lightweight, biodegradable, and can be molded into custom shapes. Another example is mycelium leather, a sustainable alternative to animal leather, used in fashion and upholstery. Additionally, fungal materials are used in construction, such as insulation panels and biodegradable bricks, due to their thermal resistance and low density. Some applications also include acoustic panels and biocomposites for furniture.

Advantages: Fungal materials are biodegradable, renewable, and can be grown using agricultural waste, which significantly reduces environmental impact compared to plastics or synthetic foams. Their production typically requires less energy and generates fewer emissions. Furthermore, they exhibit useful properties such as thermal insulation, fire resistance, and lightweight structure.

Disadvantages: Fungal materials generally have lower mechanical strength than plastics or metals, which restricts their use in load-bearing applications. They can be sensitive to moisture and environmental conditions if not properly treated. Additionally, scaling production while maintaining consistency can be challenging, and their durability over long periods may be lower than that of conventional materials.

2. What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

| Category | Description | Why it matters |
| --- | --- | --- |
| Stronger materials | Engineer fungi to produce enhanced structural proteins or denser mycelium networks | Improves mechanical strength for construction, packaging, and durable biomaterials |
| Water resistance | Modify fungi to synthesize hydrophobic compounds | Increases durability in humid environments and expands real-world applications |
| Self-healing materials | Program fungi to regrow and repair damaged structures | Extends lifespan of materials and reduces maintenance costs |
| Antimicrobial properties | Engineer production of antimicrobial compounds | Prevents contamination and increases safety in medical or packaging uses |
| Responsive (smart) materials | Enable fungi to respond to stimuli (light, temperature, chemicals) | Allows development of adaptive or sensing materials |

Fungi vs Bacteria in Synthetic Biology

| Feature | Fungi | Bacteria |
| --- | --- | --- |
| Cell structure | Multicellular, filamentous (mycelium) | Unicellular |
| Material formation | Naturally forms 3D structures | Cannot form structures without scaffolds |
| Protein secretion | High secretion capacity | Limited secretion |
| Substrate use | Can degrade complex biomass (e.g., agricultural waste) | Prefer simpler substrates |
| Growth speed | Slower | Faster |
| Genetic manipulation | More complex | Easier |
| Best use case | Living materials, biomaterials, structure-based applications | Fast production of molecules, simple genetic circuits |

Week 9 HW: Cell-Free Systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free protein synthesis (CFPS) offers significant advantages over traditional in vivo expression systems, primarily due to its flexibility and precise control over experimental conditions. Because CFPS operates in an open environment without living cells, researchers can directly manipulate the concentrations of DNA templates, ions, cofactors, and other components in real time. This eliminates constraints associated with cellular viability, such as toxicity or metabolic burden. As a result, CFPS is particularly advantageous for the production of proteins that are toxic to host cells, such as antimicrobial peptides or pore-forming proteins. Additionally, CFPS enables rapid prototyping of genetic constructs, making it highly suitable for applications like synthetic biology circuit testing, where speed and iterative design are essential.

2. Describe the main components of a cell-free expression system and explain the role of each component.

A cell-free expression system consists of several essential components that collectively replicate the molecular machinery of protein synthesis. The core component is the cell extract (lysate), which contains ribosomes, transfer RNAs (tRNAs), aminoacyl-tRNA synthetases, and various translation factors required for protein assembly. A DNA or messenger RNA (mRNA) template provides the genetic instructions encoding the target protein. Amino acids serve as the building blocks for protein synthesis, while an energy system—typically composed of ATP, GTP, and associated regeneration pathways—fuels transcription and translation processes. Additionally, salts and cofactors such as magnesium and potassium ions are necessary to maintain proper structural and functional conditions for enzymatic activity. When DNA is used as a template, transcriptional enzymes such as T7 RNA polymerase are also included to generate mRNA.

3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Energy regeneration is critical in CFPS because protein synthesis is an energy-intensive process that rapidly consumes ATP and GTP. Without a continuous supply of energy, translation halts prematurely, leading to low protein yields. To address this limitation, CFPS systems incorporate energy regeneration mechanisms that recycle ADP into ATP.

One commonly used method, which I could apply in my experiment, uses phosphoenolpyruvate (PEP) together with pyruvate kinase: the enzyme transfers the phosphate group from PEP to ADP, regenerating ATP during the reaction. Alternative systems, such as creatine phosphate with creatine kinase or glucose-based metabolic pathways, can also be employed depending on the desired duration and efficiency of protein production. These strategies extend reaction lifetimes and significantly improve overall protein yield.
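A toy bookkeeping model in Python shows why the regeneration pool extends the reaction. It is deliberately simplified and all numbers are arbitrary units I made up: each synthesis step spends one ATP, and pyruvate kinase converts one PEP plus the resulting ADP back into one ATP until the PEP runs out. Real translation costs several ATP equivalents per peptide bond.

```python
def synthesis_steps(atp, pep, atp_per_step=1, max_steps=1000):
    """Count how many synthesis steps run before ATP is exhausted.

    After each step, up to `atp_per_step` ATP is regenerated from the
    PEP pool (1 PEP -> 1 ATP, the pyruvate kinase stoichiometry).
    """
    steps = 0
    for _ in range(max_steps):
        if atp < atp_per_step:
            break  # translation halts: no energy left
        atp -= atp_per_step
        steps += 1
        recharged = min(pep, atp_per_step)
        pep -= recharged
        atp += recharged
    return steps


without_regeneration = synthesis_steps(atp=10, pep=0)
with_regeneration = synthesis_steps(atp=10, pep=50)
```

In this sketch the reaction lifetime grows in proportion to the size of the PEP pool, which is the effect the regeneration system is meant to provide.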

4. Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic and eukaryotic CFPS systems differ in complexity, cost, and functional capabilities. Prokaryotic systems, such as those derived from Escherichia coli, are widely used due to their simplicity, high protein yield, and cost-effectiveness. However, they lack the machinery required for many post-translational modifications. These systems are well suited for expressing proteins that do not require complex folding or modifications, such as fluorescent reporters like GFP or metabolic enzymes.

In contrast, eukaryotic CFPS systems, including wheat germ or rabbit reticulocyte extracts, provide a more physiologically relevant environment that supports proper folding, disulfide bond formation, and certain post-translational modifications. Consequently, they are more appropriate for producing complex proteins such as human hormones or antibodies, where structural accuracy is critical for functionality.

5. How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

The expression of membrane proteins in CFPS systems presents unique challenges due to their hydrophobic nature and dependence on lipid environments for proper folding and stability. These proteins are prone to aggregation when synthesized in aqueous conditions. To overcome these challenges, CFPS reactions can be supplemented with membrane-mimicking systems such as liposomes, nanodiscs, or mild detergents that facilitate proper insertion and stabilization of the protein.

Additionally, molecular chaperones may be included to assist in correct folding. Careful optimization of ionic conditions, particularly magnesium and potassium concentrations, as well as modulation of expression rates, can further enhance protein quality. These strategies collectively create a suitable environment for functional membrane protein production.

6. Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Low protein yield in CFPS systems can arise from several factors. One common issue is inefficient transcription or translation, which may result from weak promoters, suboptimal ribosome binding sites, or degraded DNA templates. This can be addressed by optimizing genetic elements, increasing template concentration, or verifying DNA integrity.

A second factor is insufficient energy supply; rapid depletion of ATP can prematurely terminate protein synthesis. Implementing or optimizing an energy regeneration system can significantly improve yields. A third potential cause is protein misfolding or degradation, often due to the absence of proper folding conditions or the presence of proteases in the extract. This can be mitigated by adding molecular chaperones, reducing reaction temperature, or incorporating protease inhibitors. Systematic optimization of these parameters is essential to achieve efficient and reliable protein production.

Homework Question from Kate Adamala

1. Pick a function and describe it.

a) What would your synthetic cell do? What is the input and what is the output?

The synthetic minimal cell is designed to function as a biosensor and detoxification system for mercury contamination in aqueous environments. The input is the presence of mercury ions (Hg²⁺), which are detected by a mercury-responsive regulatory element inside the synthetic cell. In response, the system activates gene expression. The output consists of two components: (i) the production of a fluorescent reporter protein (GFP), which enables detection, and (ii) the enzymatic conversion of Hg²⁺ into elemental mercury (Hg⁰), a less toxic and more diffusible form.
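The input/output relationship of such a sensor is often summarized with a Hill function. The Python sketch below is an assumption on my part about the shape of the response; the half-maximal concentration, Hill coefficient and maximum signal are placeholder values, not measured properties of a mercury-responsive promoter.

```python
def gfp_output(hg_conc, max_gfp=1.0, k_half=1.0, hill_n=2.0, leak=0.0):
    """Hill-type dose response: GFP signal as a function of Hg2+ level.

    hg_conc : Hg2+ concentration (arbitrary units)
    k_half  : concentration giving half-maximal induction (assumed)
    hill_n  : steepness of the response (assumed)
    leak    : signal in the absence of mercury (assumed)
    """
    if hg_conc < 0:
        raise ValueError("concentration cannot be negative")
    activation = hg_conc ** hill_n / (k_half ** hill_n + hg_conc ** hill_n)
    return leak + (max_gfp - leak) * activation


no_mercury = gfp_output(0.0)
half_induced = gfp_output(1.0)
saturated = gfp_output(10.0)
```

Fitting these parameters to measured fluorescence at known mercury concentrations would give a calibration curve for the sensor.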

b) Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

No. Without encapsulation, the system would lack spatial organization and controlled interaction with the environment. The components responsible for sensing and response would diffuse freely, reducing efficiency and eliminating the ability to function as a defined, cell-like unit. Encapsulation is essential to maintain compartmentalization and regulate the exchange of molecules.

c) Could this function be realized by genetically modified natural cell?

Yes, this function could be implemented in genetically modified bacteria carrying mercury-resistance operons. However, such approaches involve the use of living genetically modified organisms, which raises biosafety and regulatory concerns. In contrast, synthetic minimal cells provide a non-living, modular alternative that allows precise control over system components and avoids environmental risks associated with engineered cells.

d) Describe the desired outcome of your synthetic cell operation.

The desired outcome is that, in the presence of mercury, the synthetic minimal cell simultaneously detects, reports, and detoxifies the contaminant. This results in both a measurable fluorescent signal and a reduction in mercury toxicity, enabling combined environmental sensing and remediation.

2. Design all components that would need to be part of your synthetic cell.

a) What would be the membrane made of?

The membrane would consist of a lipid bilayer composed of phospholipids such as POPC combined with cholesterol. This composition provides structural stability, appropriate fluidity, and controlled permeability, mimicking natural biological membranes.

b) What would you encapsulate inside? Enzymes, small molecules.

The synthetic cell would encapsulate a complete cell-free transcription/translation (Tx/Tl) system, including ribosomes, tRNAs, enzymes, amino acids, nucleotides, and cofactors. Additionally, it would contain the DNA encoding the mercury-responsive genetic circuit, an energy regeneration system (e.g., phosphoenolpyruvate-based), and all necessary components for protein synthesis, including the fluorescent reporter and detoxification enzymes.

c) Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason? (hint: for example, if you want to use small molecule modulated promotors, like Tet-ON, you need mammalian)

The Tx/Tl system would be derived from bacterial extracts (Escherichia coli), as this system is efficient, cost-effective and compatible with the mercury-responsive regulatory elements used in the design.

Since the system does not require complex post-translational modifications, a prokaryotic expression system is sufficient.

d) How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

Communication with the environment will be achieved through a combination of membrane permeability and specific transport mechanisms. Mercury ions (Hg²⁺), which are not readily permeable through lipid membranes, will enter the synthetic cell via the mercury transport proteins MerT and MerP. Once inside, they activate the regulatory system. The detoxified product (Hg⁰) is more hydrophobic and can diffuse out across the membrane. The fluorescent signal remains inside the vesicle and can be detected externally using appropriate instrumentation.

3. Experimental details

a) List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)

Lipids:

  • POPC (1-palmitoyl-2-oleoyl-phosphatidylcholine)

  • Cholesterol

Genes:

  • merR (mercury-responsive transcriptional regulator)

  • merT and merP (mercury transport proteins)

  • merA (mercury reductase enzyme)

  • gfp (fluorescent reporter under mercury-inducible promoter)

Additional components:

  • Bacterial cell-free Tx/Tl system (E. coli extract)

  • Energy regeneration system (e.g., PEP + pyruvate kinase)

b) How will you measure the function of your system?

The function of the system will be evaluated using two complementary methods. First, fluorescence measurements will be used to quantify GFP expression as an indicator of mercury detection, using techniques such as plate readers or fluorescence microscopy. Second, chemical analysis of mercury transformation will be performed to confirm detoxification, using analytical methods such as atomic absorption spectroscopy to measure the conversion of Hg²⁺ to Hg⁰.

Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:

Application field: Textiles in fashion.

Write a one-sentence summary pitch sentence describing your concept.

A smart textile incorporating freeze-dried cell-free systems that detects environmental pollutants and responds by producing visible color changes and neutralizing harmful compounds.

How will the idea work, in more detail? Write 3-4 sentences or more.

The proposed system consists of fabrics embedded with freeze-dried cell-free transcription/translation (Tx/Tl) reactions, distributed within microcapsules integrated into the textile fibers. Ambient moisture (humidity or sweat) rehydrates the cell-free components and activates the system. Once active, the embedded genetic circuits sense the chemical signatures of air pollutants (e.g., nitrogen oxides or volatile organic compounds) and trigger the expression of reporter proteins that produce visible color changes, allowing real-time detection. In addition, the system can express enzymes capable of partially degrading or neutralizing harmful compounds in the immediate surroundings. This creates a dual-function material that acts both as a biosensor and a localized remediation system.

What societal challenge or market need will this address?

Current monitoring systems are often centralized and do not provide individuals with real-time, localized information about their exposure. This smart textile addresses the need for personal, wearable environmental monitoring, empowering users to make informed decisions about their surroundings. Furthermore, integrating a remediation function adds value by not only detecting pollutants but also contributing to their reduction at a microenvironmental level. This concept is particularly relevant for urban populations, industrial workers and populations exposed to poor air quality.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

One key limitation of cell-free systems is their dependence on hydration for activation. This can be addressed by designing the textile to utilize ambient humidity, sweat, or embedded hydrogel layers that retain moisture and enable controlled activation. Stability during storage can be improved through freeze-drying (lyophilization) combined with protective matrices such as sugars (trehalose), which preserve biological activity over extended periods. To address the one-time-use limitation, the textile can be engineered with replaceable or rechargeable patches containing the cell-free components.

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

1. Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

Sustainable agriculture is essential for long-duration space missions, where food must be produced in controlled and resource-limited environments. Beneficial soil bacteria play a critical role in plant growth by promoting nutrient availability and stress resistance. However, microgravity and space radiation may alter bacterial gene expression and reduce their effectiveness. Understanding how plant growth–promoting bacteria respond to space conditions is therefore essential for developing reliable bioregenerative life-support systems. This topic is significant for enabling food production beyond Earth and scientifically interesting for studying microbial adaptation to extreme environments.

(This is a topic I have always wanted to explore further.)

2. Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

Stress-response and plant-growth–related genes in Bacillus subtilis (spo0A, sigB and auxin-related pathways).

3. Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

Bacillus subtilis is a model plant growth–promoting bacterium known for its resilience and ability to enhance plant health.

In space, environmental stressors such as microgravity and radiation may disrupt its gene expression, affecting its capacity to support plant growth. Analyzing stress-response and growth-related genes would show whether these beneficial bacterial functions are maintained under space conditions. This directly addresses the challenge of ensuring reliable microbial support systems for extraterrestrial agriculture.

4. Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

The hypothesis is that space conditions, including microgravity and radiation, alter the expression of key stress-response and plant growth–promoting genes in Bacillus subtilis. Specifically, it is expected that stress-response genes such as sigB will be upregulated, while genes associated with plant growth promotion may be downregulated or dysregulated.

The research goal is to determine whether these changes can be detected using the BioBits® cell-free system as a rapid, portable diagnostic tool. By linking gene expression outputs to fluorescent reporters, this system could enable real-time monitoring of microbial health and functionality in space. This approach supports the development of robust microbial systems for sustainable agriculture beyond Earth.

5. Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

Samples of Bacillus subtilis grown under simulated microgravity conditions will be compared to Earth controls. DNA or RNA will be extracted and amplified using the miniPCR. Target sequences will be introduced into the BioBits cell-free system with reporter constructs to measure gene expression via fluorescence. The P51 Molecular Fluorescence Viewer will be used to quantify signal intensity. Controls will include non-stressed bacteria and no-template reactions. Data will consist of fluorescence levels corresponding to gene activity, allowing comparison of stress-response and growth-related gene expression.

References.

Su, L., Wang, Y., Liu, J., & Zhang, X. (2023). Effects of short-term exposure to simulated microgravity on the physiology of Bacillus subtilis. Journal of Basic Microbiology. https://www.sciencedirect.com/org/science/article/pii/S0008416623000080

Morrison, M. D., Fajardo-Cavazos, P., & Nicholson, W. L. (2017). Cultivation in space flight produces minimal alterations in Bacillus subtilis physiology and spore formation. NPJ Microgravity, 3(1). https://pubmed.ncbi.nlm.nih.gov/28821547/

Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork

Link: https://colab.research.google.com/drive/1k7nG8YrBwt0K0HMJtZNGSgcDcVOCriU7#scrollTo=JpjyYDE79Dfl

MY CODE:

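import math
from opentrons import types  # provides types.Point for the offsets below

################################
# GREEN SECTION (Body + Flagella)
################################

pipette_20ul.pick_up_tip()
center = center_location

# Oval body: ellipse with semi-axes a and b (mm), drawn with `points` droplets
a = 16
b = 8
points = 40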

for i in range(points):

  if i % 8 == 0:
      pipette_20ul.aspirate(8, location_of_color('Green'))

  angle = 2 * math.pi * i / points
  x = a * math.cos(angle)
  y = b * math.sin(angle)

  loc = center.move(types.Point(x=x, y=y, z=0))
  dispense_and_detach(pipette_20ul, 1, loc)

# Flagella: six radial lines of five droplets each, starting just outside the oval

flagella_points = 6

for i in range(flagella_points):

  pipette_20ul.aspirate(6, location_of_color('Green'))

  angle = 2 * math.pi * i / flagella_points
  start_x = (a + 1) * math.cos(angle)
  start_y = (b + 1) * math.sin(angle)

  for t in range(5):
      fx = start_x + t * 2 * math.cos(angle)
      fy = start_y + t * 2 * math.sin(angle)
      loc = center.move(types.Point(x=fx, y=fy, z=0))
      dispense_and_detach(pipette_20ul, 1, loc)

pipette_20ul.drop_tip()

################################
# RED SECTION (Eyes + Smile)
################################

pipette_20ul.pick_up_tip()

# Eyes: 2 µL per eye
pipette_20ul.aspirate(4, location_of_color('Red'))
left_eye = center.move(types.Point(x=-5, y=2, z=0))
right_eye = center.move(types.Point(x=5, y=2, z=0))
dispense_and_detach(pipette_20ul, 2, left_eye)
dispense_and_detach(pipette_20ul, 2, right_eye)

# Smile: lower half of an ellipse, 1 µL per point
smile_points = 15
pipette_20ul.aspirate(15, location_of_color('Red'))

for i in range(smile_points):
  angle = math.pi * i / smile_points
  x = 6 * math.cos(angle)
  y = -3 * math.sin(angle) - 2
  loc = center.move(types.Point(x=x, y=y, z=0))
  dispense_and_detach(pipette_20ul, 1, loc)

# End with drop_tip()
pipette_20ul.drop_tip()


RESULT :) :

AI tools (ChatGPT and Gemini) assisted in suggesting mathematical approaches for generating an oval body and radial flagella. I reviewed, modified and finalized the code to ensure correct simulation behavior and compliance with lab constraints (volume limits).
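The volume limits mentioned above can also be checked offline, without the robot or the simulator. The following is only a minimal sketch: it re-implements the green section's geometry and aspirate/dispense sequence in plain Python. The 20 µL tip capacity is taken from the pipette name, and the 40 mm usable plate radius is an assumption, not a value from the lab handout.

```python
import math

TIP_CAPACITY_UL = 20   # inferred from the pipette name (pipette_20ul)
PLATE_RADIUS_MM = 40   # assumed usable radius of the agar plate


def body_offsets(a=16, b=8, points=40):
    """(x, y) offsets of the oval body, same formula as the protocol."""
    return [(a * math.cos(2 * math.pi * i / points),
             b * math.sin(2 * math.pi * i / points)) for i in range(points)]


def flagella_offsets(a=16, b=8, count=6, steps=5, spacing=2):
    """(x, y) offsets of the radial flagella, same formula as the protocol."""
    offsets = []
    for i in range(count):
        angle = 2 * math.pi * i / count
        start_x = (a + 1) * math.cos(angle)
        start_y = (b + 1) * math.sin(angle)
        for t in range(steps):
            offsets.append((start_x + t * spacing * math.cos(angle),
                            start_y + t * spacing * math.sin(angle)))
    return offsets


def green_volume_trace():
    """Volume held in the tip after every aspirate and dispense (green section)."""
    volume, trace = 0, []
    for i in range(40):          # oval body
        if i % 8 == 0:
            volume += 8          # aspirate 8 µL
            trace.append(volume)
        volume -= 1              # dispense 1 µL
        trace.append(volume)
    for _ in range(6):           # flagella
        volume += 6              # aspirate 6 µL
        trace.append(volume)
        for _ in range(5):
            volume -= 1          # dispense 1 µL
            trace.append(volume)
    return trace


offsets = body_offsets() + flagella_offsets()
farthest = max(math.hypot(x, y) for x, y in offsets)
trace = green_volume_trace()

print(f"farthest droplet: {farthest:.1f} mm (assumed limit {PLATE_RADIUS_MM} mm)")
print(f"peak tip volume: {max(trace)} µL (limit {TIP_CAPACITY_UL} µL)")
print(f"left in tip at drop_tip: {trace[-1]} µL")
```

Each flagellum aspirates 6 µL but dispenses only 5 µL, so the sketch reports a small leftover volume in the tip at the end of the green section. This stays within the tip capacity, but aspirating 5 µL per flagellum would avoid wasting dye.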


Post-Lab Questions:

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

For this week, we’d like for you to do the following:

  1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Paper link: https://www.sciencedirect.com/science/article/pii/S2472630325000263

This paper describes how an Opentrons OT-2 liquid handling robot can be used to automate and scale up protein crystallization experiments, a foundational step in structural biology that is traditionally labor-intensive. The robot was programmed via Python scripts to prepare 24-well sitting-drop crystallization trials with precise reagent mixing and drop deposition. By comparing results against a standard manual setup, the study showed that automation:

  • Reduces hands-on labor and variability, improving reproducibility.
  • Produces consistent crystal growth for both a model protein (hen egg white lysozyme) and a periplasmic protein from Campylobacter jejuni.
  • Scales preparation in a way that could benefit labs doing structural studies or materials research requiring uniform crystal batches.

I find this application novel because protein crystallization is a critical but laborious step in X-ray crystallography and related structural methods. Most labs still perform it manually or with high-cost automation. This work shows that a relatively low-cost, openly programmable robot like the OT-2 can reliably handle complex setup steps, lowering the barrier to high-throughput crystallization workflows and enabling new scale and reproducibility in structural biology applications.

  2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details. While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate. This is due at the start of lecture and does not need to be tested on the Opentrons yet.

What I Would Automate

A) Automated DNA Assembly & Construct Library Design

Using a liquid handling robot such as the Opentrons OT-2, I would automate Golden Gate or Gibson assembly reactions to generate a small library of salt-inducible constructs:

  • Normalize DNA part concentrations.
  • Set up combinatorial assembly reactions.
  • Transform into competent cells.
  • Plate onto selective media.

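Python (pseudocode; pipette, vector, and assembly_wells are placeholders defined elsewhere in the protocol):

well_index = 0

for promoter in promoter_list:
    for gene in response_genes:
        # One destination well per promoter/gene combination
        assembly_well = assembly_wells[well_index]
        well_index += 1

        pipette.transfer(2, promoter, assembly_well)
        pipette.transfer(2, gene, assembly_well)
        # Add the backbone last and mix 5 times with 10 µL
        # (assumes the well already holds enough master mix for a 10 µL mix)
        pipette.transfer(1, vector, assembly_well, mix_after=(5, 10))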
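The combinatorial layout can be planned before touching the robot. The sketch below is one possible approach, not a tested protocol: it assigns every promoter/gene pair to a well name on a 96-well plate, filling row by row. The function name `plan_assembly_wells` and the part names are hypothetical and used only for illustration.

```python
from itertools import product
from string import ascii_uppercase


def plan_assembly_wells(promoters, genes, n_rows=8, n_cols=12):
    """Map every promoter/gene pair to a well name, filling row by row (A1, A2, ...)."""
    combos = list(product(promoters, genes))
    if len(combos) > n_rows * n_cols:
        raise ValueError("more combinations than wells on the plate")
    plan = {}
    for index, (promoter, gene) in enumerate(combos):
        row, col = divmod(index, n_cols)
        plan[f"{ascii_uppercase[row]}{col + 1}"] = (promoter, gene)
    return plan


# Hypothetical part names, for illustration only
promoters = ["P_salt_1", "P_salt_2", "P_salt_3"]
genes = ["response_gene_A", "response_gene_B"]

plan = plan_assembly_wells(promoters, genes)
for well, (promoter, gene) in plan.items():
    print(well, promoter, gene)
```

Because the keys follow the usual "A1" to "H12" well naming, the same dictionary could later be used both to index wells in a protocol and to label plate reader data.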

B) Automated Salinity Gradient Screening

To test salt responsiveness, the Opentrons OT-2 liquid handling robot would:

  • Prepare a 96-well plate with increasing NaCl concentrations (0–400 mM).
  • Inoculate engineered strains.

If I used a cloud lab platform such as Ginkgo Bioworks' Ginkgo Nebula (conceptually), the workflow would include:

  • Automated liquid handling for salt gradients.
  • Plate sealing and incubation.
  • Automated plate reader measurements.
  • Data export for downstream analysis.
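The pipetting volumes for the 0–400 mM gradient follow from C1·V1 = C2·V2. The sketch below is a rough planning aid under stated assumptions: a 1 M (1000 mM) NaCl stock, 200 µL final volume per well, and 12 linearly spaced steps (one per plate column). None of these values come from a tested protocol, and the inoculum volume is ignored.

```python
def nacl_gradient(max_mm=400.0, n_steps=12, stock_mm=1000.0, final_ul=200.0):
    """Return (target mM, stock µL, medium µL) for a linear NaCl gradient.

    Uses C1 * V1 = C2 * V2, so V_stock = C_target * V_final / C_stock.
    """
    if max_mm > stock_mm:
        raise ValueError("target concentration cannot exceed the stock")
    rows = []
    for i in range(n_steps):
        target = max_mm * i / (n_steps - 1)
        stock_ul = target * final_ul / stock_mm
        rows.append((round(target, 1),
                     round(stock_ul, 1),
                     round(final_ul - stock_ul, 1)))
    return rows


for column, (target, stock_ul, medium_ul) in enumerate(nacl_gradient(), start=1):
    print(f"column {column:2d}: {target:6.1f} mM = "
          f"{stock_ul:5.1f} µL stock + {medium_ul:5.1f} µL medium")
```

The resulting volume pairs could be passed to two transfer steps per column, one for the NaCl stock and one for plain medium.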

C) Data Processing & Optimization

I would automate analysis using Python:

import pandas as pd
import matplotlib.pyplot as plt

# Plate reader export with one row per well
data = pd.read_csv("plate_reader_output.csv")

# Mean fluorescence at each NaCl concentration
grouped = data.groupby("NaCl_concentration")["fluorescence"].mean()

plt.plot(grouped.index, grouped.values, marker="o")
plt.xlabel("NaCl (mM)")
plt.ylabel("Normalized Fluorescence")
plt.show()
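The plot label refers to normalized fluorescence, so a normalization step would be needed before plotting. One possible approach, sketched here with only the standard library, is to divide fluorescence by OD600 for each well and then express each NaCl level as fold induction over the 0 mM control. The OD600 column, the function name `fold_induction`, and the readings are assumptions made for illustration; they are not real data.

```python
from collections import defaultdict
from statistics import mean


def fold_induction(records):
    """Mean fluorescence/OD600 per NaCl level, divided by the 0 mM mean.

    records: iterable of (nacl_mM, fluorescence, od600) tuples.
    Assumes at least one record at 0 mM to serve as the baseline.
    """
    per_level = defaultdict(list)
    for nacl, fluorescence, od600 in records:
        per_level[nacl].append(fluorescence / od600)
    means = {nacl: mean(values) for nacl, values in per_level.items()}
    baseline = means[0]
    return {nacl: value / baseline for nacl, value in sorted(means.items())}


# Made-up readings, for illustration only
records = [
    (0, 100.0, 0.50), (0, 120.0, 0.60),
    (200, 300.0, 0.50), (200, 360.0, 0.60),
    (400, 400.0, 0.40), (400, 500.0, 0.50),
]

result = fold_induction(records)
for nacl, fold in result.items():
    print(f"{nacl} mM NaCl: {fold:.2f}x over the 0 mM control")
```

Dividing by OD600 corrects for differences in cell density, which matters here because high salt is likely to slow growth as well as induce the reporter.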

Subsections of Labs

Week 1 Lab: Pipetting


Projects


Subsections of Projects

Individual Final Project

SECTION 1: ABSTRACT

Provide a concise, self-contained summary of your project (minimum 150 words). The abstract should allow a reader to understand the purpose, approach, and expected outcomes of the work without referring to other sections.

1. Your abstract should briefly address the following elements:

a. Significance: What problem or question does the project address, and why is it important?

b. Broad Objective: What is the overall goal of the project?

c. Hypothesis: What prediction or principle is the project testing or demonstrating?

d. Specific Aims: What key steps or milestones will be completed to achieve the objective?

e. Methods: What experimental or technical approaches will be used?

Abstract

Soil salinity is a growing constraint to agricultural productivity, particularly in arid and high-altitude regions such as the Bolivian Altiplano, where environmental conditions promote the accumulation of salts and limit crop growth. This project addresses the need for sustainable, biologically based solutions to improve plant resilience under saline stress. The overall objective is to design a synthetic rhizosphere consortium capable of enhancing crop productivity by integrating complementary microbial functions.

The central hypothesis is that a functionally coordinated microbial consortium can improve plant tolerance to salinity by simultaneously promoting osmoprotection, maintaining nitrogen fixation, and stabilizing soil structure. To test this, the project focuses on three key organisms: Pseudomonas fluorescens, Azospirillum brasilense, and Bacillus subtilis, each engineered or selected to perform a specific role within the rhizosphere.

The specific aims include: (1) designing salt-responsive genetic circuits for osmoprotectant production, (2) ensuring stable nitrogen fixation under saline conditions, and (3) enhancing biofilm formation for improved soil aggregation. This project will be conducted primarily in silico using Benchling to design and organize genetic constructs, simulate functional pathways, and document workflows. The expected outcome is a modular and scalable microbial system that can be translated into future experimental validation and ultimately applied to improve agricultural sustainability in saline environments.

Group Final Project
