Homework

Weekly homework submissions:

  • Week 1 homework

    Principles and practices 💼 1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about. Purification of enzymes for natural pigment synthesis facilitated by microalgal cell wall release

  • Week 2 homework

    DNA read, write, and edit 🧬 Part 1: Benchling and in-silico gel art The genome of the λ-phage was imported and virtually digested with the following restriction endonucleases: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI before being visualized on Benchling’s agarose gel simulator (Figure 2.1).

  • Week 3 homework

    Lab automation 🦾 Python script for Opentrons artwork Generate an artistic design using Ronan’s GUI. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script, which draws your design using the Opentrons. You may use AI assistance for this coding; Google Gemini is integrated into Colab (see the stylized star bottom center). It will do a good job writing functional Python, while you probably need to take charge of the art concept. If you use AI to help complete this homework or lab, document how you used AI and which models made contributions. Consistent with this week’s highly automated and digitized theme, for this assignment I drew inspiration from an image popularized by the Internet: KC Green’s web comic strip “On Fire”, which in 2014 became a famous (and my personal favorite) online meme (Figure 3.1). Like many other people all over the world, I deeply relate to this meme, which, I feel, accurately describes my life.

  • Week 4 homework

    Protein design-Part I 💻 Part 1: Conceptual questions Answer any nine of the following questions from Shuguang Zhang: (i.e. you can select two to skip) How many molecules of amino acids do you take with a piece of 500 grams of meat? (On average, an amino acid is ~100 Daltons.) Depending on the type of meat, as well as the way it is processed prior to consumption, 500 g of meat contains approximately 100–130 g of protein. Assuming that this protein consists entirely of amino acids (i.e., excluding metal ions, such as iron or zinc, that can be bound to protein molecules, as well as glycans and other moieties added through post-translational modifications), and given that 1 g ≈ 6.022×10²³ Da (Avogadro’s number), 100–130 g of amino acids corresponds to approximately 6.02–7.83×10²⁵ Da. Therefore, if the average molecular weight of one amino acid is ~100 Da, 500 g of meat contains (6.02–7.83×10²⁵ Da)/(100 Da) ≈ 6.02–7.83×10²³ amino acid molecules.

  • Week 5 homework

    Protein design-Part II 💻 Part 1: SOD1 binder peptide design Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

  • Week 6 homework

    Genetic circuits-Part I: Assembly technologies 🧩 DNA Assembly Answer these questions about the protocol in this week’s lab: 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? The components in the Phusion High-Fidelity PCR Master Mix, along with their purpose, are the following:

  • Week 7 homework

    Genetic circuits-Part II: Neuromorphic circuits 🧠

  • Week 9 homework

    Cell-free systems 🧪 General homework questions 1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Compared to conventional in vivo methods, cell-free protein synthesis provides modularity and substantially higher experimental control, as all of the system’s components can be readily added or removed, especially when each cellular element required for the process is produced or extracted separately and then combined into a single reaction. Cell-free systems also allow precise control over reaction conditions, such as pH and ion concentrations, while being more flexible and versatile: they allow the expression of proteins that are toxic to living cells, support the incorporation of non-canonical amino acids into peptide backbones, and are compatible with diverse DNA templates (linear or plasmid). Additionally, they eliminate constraints imposed by living cells. For instance, unlike traditional cell cultures, they do not require growth monitoring, cultivation, or other interventions aimed at keeping cells alive, nor are they susceptible to issues of cell viability, growth limits, or stress responses. Similarly, since the cell-free machinery operates outside of a cellular context, there are no cell-membrane barriers, which facilitates direct access to the biochemical reactions, and there is no interference or competition from other metabolic processes or regulatory signals, so all available resources can be channeled towards the synthesis of the desired protein.
The absence of living cells also removes the need for cloning and cellular transformation, which in turn makes handling safer, as no genetically modified organisms are involved in cell-free protein production. More generally, one of the method’s most significant advantages is that it is a highly efficient technique for rapid protein synthesis that can also be transported over long distances and stored for long periods, as the entire system can be easily freeze-dried for later use.

  • Week 10 homework

    Advanced imaging and measurement technology 🎞️ Waters Part I: Molecular weight We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis). 1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? eGFP amino acid sequence with C-terminal linker and 6x-His tag
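As a quick check of the Week 4 estimate summarized above, the arithmetic can be scripted in a few lines of Python. The 100–130 g protein range and the ~100 Da average amino acid mass come from the text; the only added constant is Avogadro’s number (1 g ≈ 6.022×10²³ Da). This is a back-of-the-envelope sketch, not a nutritional calculation.

```python
# Back-of-the-envelope sketch: amino acid molecules in 500 g of meat.
# Assumptions taken from the text: 100-130 g of protein per 500 g of meat,
# and an average amino acid mass of ~100 Da.
AVOGADRO = 6.022e23      # daltons per gram (1 g = 6.022e23 Da)
AVG_AA_MASS_DA = 100.0   # average amino acid mass in daltons

def amino_acid_molecules(protein_grams: float) -> float:
    """Convert a protein mass in grams into an approximate amino acid count."""
    total_da = protein_grams * AVOGADRO
    return total_da / AVG_AA_MASS_DA

for grams in (100, 130):
    print(f"{grams} g protein -> {amino_acid_molecules(grams):.2e} amino acids")
```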

Subsections of Homework

Week 1 homework

Principles and practices 💼

1. First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something you are already doing in your research, or something you are just curious about.

Purification of enzymes for natural pigment synthesis facilitated by microalgal cell wall release

In recent decades, microalgae have emerged as promising platforms for the sustainable biosynthesis of various high-value compounds 1 2; however, challenges can arise during their purification. This project aims to propose a method of protein purification from microalgal cultures by tapping into a largely overlooked resource of microalgal cells of various species, namely their cell wall. Proteins of interest could be purified by fusing them to elements of the microalgal cell wall and then harvesting the shed cell walls following nitrogen starvation-induced sexual reproduction, an approach inspired by the ecdysis (molting) of insects. Theoretically, this purification method could be implemented for any fused protein, but here, for a more tangible example stemming from an interest in the production of eco-friendly ink, the idea described below will focus on the production of indigoidine synthetase, the primary enzyme responsible for generating the blue pigment indigoidine.

The organism chosen for the purposes of this project is the model green microalga Chlamydomonas reinhardtii, as its physiology and metabolism have been extensively studied 3 4 5, and multiple tools have already been developed and established for its genetic manipulation 6 7 8. In more detail, the first step of this process would be to engineer C. reinhardtii 9 10 to overexpress fusion proteins of pherophorin (a microalgal cell wall protein) and indigoidine synthetase (Figure 1.1A). Subsequently, gamete generation can be induced by depleting nitrogen from the growth medium, leading to mating of haploid microalgal cells followed by shedding of their cell walls (Figure 1.1B). The shed cell wall components can then be isolated by sucrose-gradient centrifugation (Figure 1.1C), before being subjected to enzymatic processing both for polysaccharide degradation and for separation of indigoidine synthetase molecules from pherophorins (Figure 1.1D). Lastly, affinity chromatography can be employed to further purify the indigoidine synthetase (Figure 1.1E), which can afterwards be screened in an activity assay (Figure 1.1F) monitoring the enzymatic conversion of L-glutamine into indigoidine (a blue pigment).

Figure 1.1 Schematic overview of cell wall release-based purification of indigoidine synthetase. Figure modified from Sekimoto, 2017 11 and partially created on BioRender.com.

The primary policy goal behind this project is sustainability, as microalgae require minimal resources to efficiently synthesize numerous valuable compounds used in human food and animal feed, in pharmaceuticals and cosmetics, and even in the energy sector, through the production of biofuels coupled with photosynthesis-driven carbon sequestration. Besides their role as promising green cell factories and a potential carbon sink, microalgae, and in turn the projects revolving around them, contribute to sustainability by offering great spatial autonomy in terms of cultivation, as they are not constrained by the availability of arable land or freshwater irrigation. In terms of performance, microalgae have also been advocated for their high photosynthetic capacity, which enables them to generate biomass more efficiently than most crops. Taking all these benefits into consideration, a second principle emerges, namely equity: microalgal cultivation offers opportunities for development both to established facilities and, most importantly, to low-income communities and developing countries, including small island nations. Culturing microalgae essentially requires exposing them to light and letting them fix atmospheric inorganic carbon, and it has become even more affordable with the introduction of the plastic tubular photobioreactor 12, which also allows otherwise unused vertical space to be exploited.

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g. 3D printing, drones, financial systems, etc.).
  • Purpose: What is done now and what changes are you proposing?
  • Design: What is needed to make it “work”? (including the actor(s) involved - who must opt-in, fund, approve, or implement, etc)
  • Assumptions: What could you have wrong (incorrect assumptions, uncertainties)?
  • Risks of Failure & “Success”: How might this fail, including any unintended consequences of the “success” of your proposed actions?

Option 1. Initiative to establish a teaching module on biosafety and biosecurity in high school Biology classes, with lectures from Biotechnology experts and academics of the local university or from iGEM teams that are active in the area. Raising awareness of biosafety and biosecurity issues as early as high school will help ensure that future generations of researchers have the necessary stimuli to pursue good scientific practices. This initiative can also help high school students gain an overview of both the academic research conducted at local universities and the regulatory frameworks governing that research, especially when this happens through the lens of an aspiring young scientist closer to their age, such as a university student member of a local iGEM team. In the case of a smaller town or village without a university, lectures from invited experts from larger cities could be arranged, perhaps with the help and support of the Ministry of Education, which could even twin schools in rural areas with metropolitan universities.

Option 2. Enforcement of regular inspections by EU representatives to ensure that Biotechnology-focused academic, research, and industrial facilities adhere to existing regulations. Surprisingly, there is no official EU institution, board, or committee primarily devoted to matters of biosafety and biosecurity, so pushing for the establishment of one should be a priority at this stage. In the meantime, laws requiring regular inspections of biotechnology facilities, even if carried out by experts appointed by the local government, should be introduced. Close monitoring of academic and industrial facilities to verify, for instance, that HEPA filters are replaced on schedule and biohazardous waste is handled appropriately is paramount, and heavy fines should be issued in case of non-compliance.

Option 3. Increased funding of programs dedicated to designing and integrating novel kill switches and other biocontainment mechanisms, to expand the arsenal of available strategies both for a wider range of projects and conditions and for a wider range of genetically modified organisms. This action would promote innovative research in the field of biosecurity. By redirecting resources, or even creating new programs, to fund research into novel biocontainment approaches at the EU level (for example, through the Marie Skłodowska-Curie Actions) or at the national level, more scientists could be encouraged to discover or devise cutting-edge kill-switch mechanisms 13 14 15 16 (Figure 1.2B), genetic safeguards or firewalls that prevent horizontal gene transfer, and auxotrophic strains 17 (Figure 1.2A). Hopefully, such an initiative could help extend biocontainment research and practice beyond bacteria and yeast, for which biocontainment is already well established, to less widespread biotechnological hosts such as microalgae, whose kill-switch and auxotrophy strategies are currently far less advanced.

Figure 1.2 Proposed biocontainment strategies for genetically engineered microalgae. (A) A promising biocontainment method involves the development of auxotrophic strains, such as microalgae that can only grow on a non-standard phosphorus (P) source, for example, phosphite 17. (B) Kill-switch mechanism employing a light-controlled riboregulator. The main premise behind this circuit is that genetically engineered microalgae are primarily cultured in well-illuminated ponds. In this design, the light-inducible pLIP promoter from Dunaliella sp. 13 drives the expression of a trigger RNA molecule (trRNA) that binds to a three-way junction riboregulator 14 15 in the 5′ UTR of the NucA nuclease coding sequence and thereby suppresses the gene’s expression. However, if the engineered microalgae accidentally end up in an underground aquifer or inside another organism after ingestion, trRNA is no longer produced in the dark, exposing an integrated internal ribosome entry site (IRES) that enables expression of the nuclease 16 and, effectively, kills the microalgal cell. Figure from Motomura et al., 2018 17 and partially created on BioRender.com.
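The logic of the kill switch in Figure 1.2B can be summarized as a toy boolean model. This is a deliberate simplification under stated assumptions: each element is treated as fully on or off, whereas real promoters and riboregulators respond in a graded and often leaky way, and the function names are hypothetical.

```python
# Toy boolean model of the light-controlled kill switch (Figure 1.2B).
# Hypothetical simplification: every element is strictly on or off.

def trigger_rna_expressed(light: bool) -> bool:
    """pLIP is light-inducible, so trRNA is transcribed only in the light."""
    return light

def nuclease_expressed(light: bool) -> bool:
    """trRNA bound to the 5' UTR riboregulator represses NucA;
    without trRNA, the IRES is exposed and NucA is translated."""
    return not trigger_rna_expressed(light)

def cell_survives(light: bool) -> bool:
    """The cell survives only while the nuclease stays off."""
    return not nuclease_expressed(light)

for condition, light in (("illuminated pond", True), ("dark aquifer or gut", False)):
    print(f"{condition}: NucA expressed={nuclease_expressed(light)}, "
          f"survives={cell_survives(light)}")
```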

4. Next, score (from 1 to 3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework, but feel free to make your own.

Following the suggested format, I scored the proposed governance actions as presented below (Table 1.1).

Table 1.1 Scores of the proposed governance measures with respect to the rubric of policy goals.

Does the option:                                    Option 1   Option 2   Option 3
Enhance Biosecurity
  • By preventing incidents                         1          2          1
  • By helping respond                              2          3          3
Foster Lab Safety
  • By preventing incidents                         1          2          2
  • By helping respond                              1          3          2
Protect the environment
  • By preventing incidents                         2          2          1
  • By helping respond                              2          3          3
Other considerations
  • Minimizing costs and burdens to stakeholders    n/a        1          n/a
  • Feasibility?                                    1          2          2
  • Not impede research                             1          2          1
  • Promote constructive applications               1          3          1

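One simple, admittedly crude way to summarize Table 1.1 is to average each option’s scores while skipping n/a entries (lower is better). The unweighted mean is an assumption made here for illustration; the rubric itself does not say how criteria should be combined.

```python
# Unweighted mean of the Table 1.1 scores per option (1 = best; n/a skipped).
# Equal weighting of all criteria is an illustrative assumption.
from statistics import mean

NA = None
SCORES = {
    # criterion: (Option 1, Option 2, Option 3)
    "Biosecurity: preventing incidents": (1, 2, 1),
    "Biosecurity: helping respond": (2, 3, 3),
    "Lab safety: preventing incidents": (1, 2, 2),
    "Lab safety: helping respond": (1, 3, 2),
    "Environment: preventing incidents": (2, 2, 1),
    "Environment: helping respond": (2, 3, 3),
    "Minimizing costs and burdens": (NA, 1, NA),
    "Feasibility": (1, 2, 2),
    "Not impede research": (1, 2, 1),
    "Promote constructive applications": (1, 3, 1),
}

def mean_scores(scores):
    """Average each option's column, ignoring n/a (None) entries."""
    n_options = len(next(iter(scores.values())))
    averages = []
    for i in range(n_options):
        values = [row[i] for row in scores.values() if row[i] is not None]
        averages.append(mean(values))
    return averages

for option, avg in enumerate(mean_scores(SCORES), start=1):
    print(f"Option {option}: mean score {avg:.2f}")
```

With equal weights, Options 1 and 3 average roughly 1.3 and 1.8, versus 2.3 for Option 2, which is in line with the prioritization discussed below; weighting feasibility or cost differently could change the ranking.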
5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties. For this, you can choose one or more relevant audiences for your recommendation, which could range from the very local (e.g. to MIT leadership or Cambridge Mayoral Office) to the national (e.g. to President Biden or the head of a Federal Agency) to the international (e.g. to the United Nations Office of the Secretary-General, or the leadership of a multinational firm or industry consortia). These could also be one of the “actor” groups in your matrix. Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.

Among the three actions listed above, the least feasible would be the enforcement of regular inspections by an EU board devoted to biosafety and biosecurity issues. Since such an institution does not currently exist, establishing it would require considerable time, as well as extensive bureaucratic, administrative, and legislative mobilization involving several EU bodies, including the European Council and the European Commission.

On the other hand, the other two initiatives, namely integrating modules on biosafety and biosecurity into high school curricula and increasing funding for research on biocontainment practices, appear more feasible and promising in the short term and should, therefore, be prioritized. The basic rationale behind this prioritization is that, judging from personal experience, the educational community, including secondary and higher education, has always been more open-minded and welcoming towards new initiatives than legislative bodies and political agents. As educational institutions, regardless of level, have always been united by ideals such as service to humanity and a strong drive towards innovation and intellectual progress, it is generally much easier and faster to set internal actions in motion, such as institutional twinning and raising awareness about biosafety and biosecurity, which mostly concern the first option. Another benefit of prioritizing the first option would be the opportunity to promote other principles alongside biosafety and biosecurity to younger audiences, for instance, sustainability and equity, as previously mentioned. Regarding the third option, increasing funding for relevant strategies internally appears more feasible as well, since research groups and laboratories are often given substantial flexibility in allocating resources to individual projects from national grants that do not specify a particular objective. In the long run, options one and three could both have a significantly positive influence on establishing more fundamental and impactful governance policies, as they follow a more “bottom-up” approach, in which members of the public who have become more aware of the aforementioned principles (school students, university students, and academics) can start building an initial framework.
This will lay the foundation for more radical reforms, as more citizens become aware of biosafety, sustainability, and equity issues and push (through voting and other forms of sociopolitical engagement) for broader changes and a renewal of governance policies.

Preparation for week 2 lecture 🔎

Homework Questions from Prof. Jacobson

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

The primary polymerases for nuclear genome replication in human cells are DNA polymerases δ (delta) and ε (epsilon) 18 19 20. Based on nucleotide selection alone (i.e., without proofreading), they display an error rate of 10⁻⁴ to 10⁻⁵ per base pair per replication cycle. Given that the entire haploid human genome spans approximately 3.3×10⁹ bp, up to (3.3×10⁹ bp) × 10⁻⁴ = 3.3×10⁵ errors could arise in every replication cycle in a worst-case scenario.

However, DNA polymerases δ and ε also possess a 3′→5′ proofreading exonuclease activity: they can recognize a misincorporated nucleotide by its failure to pair with the template strand, “go in reverse”, excise it, and replace it with the correct one 18 19 20. It has been shown that, owing to this proofreading capacity, these polymerases exhibit extremely low error rates, estimated at less than 10⁻⁹ per base pair per replication cycle 20, which, following the calculation above, translates into fewer than (3.3×10⁹ bp) × 10⁻⁹ ≈ 3.3 errors per genome replication. In addition to the polymerases’ proofreading capacity, cells also possess mismatch repair (MMR) mechanisms that correct replication errors missed by the polymerases 20. In case all the previously mentioned systems fail, cell cycle checkpoints are in place that will not allow cellular division to proceed while DNA replication errors persist.
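The error estimates above reduce to multiplying the genome size by a per-base error rate. The sketch below reproduces them using the 3.3×10⁹ bp haploid genome and the rates quoted in the text; it ignores mismatch repair and treats errors as independent, so it is only an order-of-magnitude illustration.

```python
# Expected misincorporations per replication of the haploid human genome,
# using the genome size and error rates quoted in the text.
GENOME_BP = 3.3e9

def expected_errors(error_rate_per_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Expected number of errors = genome size x per-base error rate."""
    return genome_bp * error_rate_per_bp

scenarios = {
    "nucleotide selection only (worst case)": 1e-4,
    "nucleotide selection only (best case)": 1e-5,
    "with proofreading (upper bound)": 1e-9,
}
for label, rate in scenarios.items():
    print(f"{label}: ~{expected_errors(rate):.1e} errors per genome replication")
```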

Another factor that limits the impact of mutations on the “final product” of the genome, namely proteins, is that protein-coding sequences constitute only about 1% of the entire genome, as they are “diluted” within the large non-coding portion that has been called “junk DNA”. Because of this “dilution” effect, even if an error occurs, it is far more likely to be located in a non-coding region of the genome. Although such an error could influence the regulation of expression, it will not affect the amino acid sequence of the encoded protein. To the same end, namely the preservation of a protein’s primary structure, the genetic code has also evolved to be degenerate. The degeneracy of the genetic code, meaning the redundancy whereby multiple distinct codons (nucleotide triplets) encode the same amino acid, makes protein sequences more tolerant of point mutations. Therefore, even if an error occurs within a coding region, the mutated codon may still be translated into the correct amino acid (a synonymous mutation). Finally, as a fail-safe at the protein level, if a replication error results in a codon that corresponds to a different amino acid, the tertiary structure and function of the final protein could still be preserved if the new amino acid has the same or very similar biochemical properties as the original one (for instance, aspartic acid and glutamic acid).
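To put a rough number on the degeneracy argument, the sketch below enumerates every single-nucleotide substitution of the 61 sense codons of the standard genetic code and counts how many leave the encoded amino acid unchanged. It assumes all substitutions are equally likely, which real mutation spectra do not follow, so the result is only an illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def count_synonymous_substitutions():
    """Count single-nucleotide substitutions of sense codons that keep the amino acid."""
    synonymous = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only start from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] == aa:
                    synonymous += 1
    return synonymous, total

syn, total = count_synonymous_substitutions()
print(f"{syn} of {total} substitutions are synonymous ({syn / total:.1%})")
# -> 134 of 549 substitutions are synonymous (24.4%)
```

Roughly one in four possible point substitutions in a coding region is therefore silent, most of them at the third codon position.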

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons that all of these different codes do not work to code for the protein of interest?

Each of the 20 standard amino acids can be encoded by one to six different codons, or about three codons on average (61 sense codons / 20 amino acids ≈ 3.05). The coding sequence of an average human protein is 1,036 bp long, meaning that an average human protein consists of approximately 1,036 / 3 ≈ 345 amino acids. Based on this information, an average human protein could be encoded by about 3³⁴⁵ ≈ 4×10¹⁶⁴ different codon combinations.
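The counting argument above can be made concrete with the standard codon table. The sketch below counts sense codons per amino acid, reproduces the 3³⁴⁵ estimate, and computes the exact number of encodings for a given protein as the product of each residue’s codon count; the three-residue peptide used as an example is hypothetical.

```python
import math
from collections import Counter

# Standard genetic code in TCAG codon order ("*" marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of sense codons for each amino acid.
CODONS_PER_AA = Counter(aa for aa in AMINO_ACIDS if aa != "*")

def count_encodings(protein: str) -> int:
    """Exact number of distinct coding sequences for a protein (stop codon excluded)."""
    n = 1
    for aa in protein.upper():
        if aa not in CODONS_PER_AA:
            raise ValueError(f"unknown amino acid {aa!r}")
        n *= CODONS_PER_AA[aa]
    return n

mean_codons = sum(CODONS_PER_AA.values()) / len(CODONS_PER_AA)  # 61 / 20
print(f"{len(CODONS_PER_AA)} amino acids, {mean_codons:.2f} codons each on average")

# The text's estimate: ~345 residues with ~3 codons each -> 3**345 combinations.
log10_combinations = 345 * math.log10(3)
mantissa = 10 ** (log10_combinations % 1)
print(f"3**345 ~ {mantissa:.1f}e{int(log10_combinations)}")

print(count_encodings("MKV"))  # hypothetical peptide: 1 * 2 * 4 = 8
```

For a real protein, the exact count depends on its composition (methionine- and tryptophan-rich sequences have fewer encodings, leucine-, serine-, and arginine-rich ones have more), so 3ⁿ is only an average-case approximation.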

Although this count is valid in theory, in reality no organism, including humans, shows such flexibility in the coding sequences of its genome, due to codon bias as well as the genome’s GC content. Codon bias is largely linked to the relative abundance of the tRNAs (and hence of the aminoacyl-tRNAs) that decode each codon inside the cell; since aminoacyl-tRNAs deliver amino acids to the ribosome, their availability substantially affects the rate and eventual success of protein synthesis. Codon usage tables (also known as “Kazusa tables”, after the Kazusa Codon Usage Database) have been compiled from studies of different organisms’ codon usage and biases 21 22. When designing an artificial coding sequence for the expression of a human protein, the GC content of the genome needs to be taken into consideration too, as large deviations can lead to silencing effects. Another factor that can drastically reduce the number of functional combinations for a protein’s coding sequence is the thermodynamics of the resulting mRNA. Different combinations of codons, and hence of nucleotide triplets, can favor the formation of secondary structures, such as hairpins, in a transcript, which can slow down or even prevent translation. The selected codon sequence should, therefore, produce an mRNA with an appropriate thermodynamic profile for optimal protein expression. Finally, further limitations can be imposed by the synthesis method when the coding sequence has to be assembled artificially, as DNA molecules with balanced (neither very low nor very high) GC content and few repeated sequences are preferred. Additionally, if the artificially synthesized gene also needs to contain introns that stabilize or enhance transcription, the chosen codon combination should ensure the presence of positions where introns can be inserted, for instance, GG dinucleotides at defined intervals or spots.
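Two of the synthesis constraints mentioned above, GC content and repeated sequence, are easy to screen for programmatically. The sketch below flags a candidate sequence whose GC content falls outside a window or that contains a long single-base run. The 25–65% window and the 8-nt run limit are illustrative placeholders, not the rules of any particular provider, and predicting mRNA secondary structure would require a dedicated folding tool, which is beyond this sketch.

```python
from itertools import groupby

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)

def screen_for_synthesis(seq: str, gc_min=0.25, gc_max=0.65, max_run=8) -> list[str]:
    """Return warnings for a candidate sequence; thresholds are illustrative placeholders."""
    warnings = []
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:
        warnings.append(f"GC content {gc:.0%} outside {gc_min:.0%}-{gc_max:.0%}")
    run = longest_homopolymer(seq)
    if run > max_run:
        warnings.append(f"homopolymer run of {run} nt (limit {max_run})")
    return warnings

print(screen_for_synthesis("ATGGCC" + "A" * 10 + "GGCTAA"))
# -> ['homopolymer run of 10 nt (limit 8)']
```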

Homework Questions from Dr. LeProust

1. What’s the most commonly used method for oligo synthesis currently?

Currently, the most established and widely used method for oligonucleotide synthesis is the solid-phase phosphoramidite (S-PP) method. This is a type of chemical synthesis that uses a solid support, typically controlled pore glass (CPG) or microporous polystyrene, on which oligonucleotides are assembled in the 3′→5′ direction. S-PP synthesis can be highly automated, enabling high-throughput production of oligonucleotides in 96- and 384-well plates.

The basic principle behind S-PP synthesis can be summarized as exposing the chemical groups that should react while protecting those that should not. To this end, specifically modified nucleotides, called phosphoramidite monomers, whose reactive groups are “concealed”, need to be used. In particular, to prevent the monomers from forming undesirable bonds during the process, their nitrogenous bases are protected by a benzoyl or isobutyryl moiety (Figure 1.3A, pink), their 3′-OH carries a 2-cyanoethyl N,N-diisopropylphosphoramidite group, in which the 2-cyanoethyl moiety protects the phosphorus (Figure 1.3A, purple, orange, and yellow), and their 5′-OH is protected by a dimethoxytrityl (DMT) moiety (Figure 1.3A, green). In the case of RNA oligonucleotide synthesis, the highly reactive 2′-OH has to be “concealed” as well, by a tert-butyldimethylsilyl (TBDMS) moiety.

In more detail, the method consists of a cyclic four-step process that allows the assembly of the oligonucleotide chain by starting with a first monomer that has already been covalently attached to the synthesis platform and then elongating it one nucleotide at a time (Figure 1.3B):

  • Step 1. Detritylation: The 5′-DMT protecting group is removed under acidic conditions (typically with trichloroacetic or dichloroacetic acid), exposing the 5′-OH.
  • Step 2. Coupling: The incoming phosphoramidite monomer is added in large excess together with an azole catalyst (such as tetrazole), which activates its phosphoramidite group; the now exposed 5′-OH attacks the activated phosphorus, releasing the diisopropylamino group and forming a phosphite triester bond that links the new monomer to the growing oligonucleotide.
  • Step 3. Capping: Unreacted 5′-OH groups are capped through acetylation so that they cannot be extended in later cycles, which would otherwise yield oligonucleotides with internal deletions (i.e., the wrong sequence).
  • Step 4. Oxidation: The unstable phosphite triester is oxidized (typically with iodine in the presence of water) into a stable phosphate triester.
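The role of capping in the cycle above can be illustrated with a small bookkeeping model. The sketch assumes (beyond the text) that every coupling succeeds with the same probability, that capping blocks all unreacted 5′-OH groups, and that detritylation and oxidation are complete; under these assumptions, each failed coupling leaves a capped, truncated chain rather than one with an internal deletion. The function name is hypothetical.

```python
def spp_product_distribution(length: int, coupling_eff: float) -> dict[int, float]:
    """Return {oligo length: fraction of chains} after synthesizing `length`-mers.

    The first nucleotide is pre-attached to the support, so `length - 1`
    cycles of detritylation -> coupling -> capping -> oxidation are run.
    Assumes perfect capping, detritylation, and oxidation.
    """
    active = 1.0  # fraction of chains still being extended
    pool: dict[int, float] = {}
    for current_len in range(1, length):
        # Step 1 (detritylation) exposes the 5'-OH of every active chain.
        # Step 2 (coupling): only a fraction `coupling_eff` gains a nucleotide.
        failed = active * (1 - coupling_eff)
        # Step 3 (capping): failed chains are acetylated and stay at current_len.
        pool[current_len] = failed
        active -= failed
        # Step 4 (oxidation) stabilizes the new linkage; no effect on counts here.
    pool[length] = active
    return pool

dist = spp_product_distribution(20, coupling_eff=0.99)
print(f"full-length 20-mer: {dist[20]:.1%}")
print(f"capped failure sequences: {1 - dist[20]:.1%}")
```

Because the first nucleotide is pre-attached, an n-mer needs n − 1 couplings, so in this model the full-length fraction is p^(n−1), essentially the p^n approximation used below.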

Once the last cycle is completed, the synthesized oligonucleotide is cleaved from the solid support under alkaline conditions (typically with concentrated aqueous ammonia). This increase in pH also removes the 2-cyanoethyl groups, as well as the protective moieties of the nitrogenous bases, from all the building blocks of the oligonucleotide, rendering the newly synthesized oligonucleotide biochemically suitable for downstream applications 23 24 25.

Figure 1.3 (A) Representation of a phosphoramidite monomer. (B) Schematic overview of solid-phase phosphoramidite synthesis. Figure from Twist Bioscience phosphoramidite chemistry and BOC Sciences S-PP cycle.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis? Why can 2000bp genes not be made via direct oligo synthesis?

Generating oligonucleotides via direct chemical synthesis, such as the S-PP method, presents challenges due to cumulative inefficiencies in the building process. These inefficiencies ultimately lead to a drastic decrease in the yield and purity of the final product because, for the reasons analyzed below, both drop exponentially with the length of the desired oligonucleotide.

One such factor is cumulative yield loss. While S-PP synthesis is highly efficient (>99% per coupling step), the small per-step shortfall from 100% compounds multiplicatively, so the full-length yield decays exponentially as the chain grows. Treating each phosphoramidite addition as an independent event and assuming a coupling efficiency of 99%, the yield of a 100-mer is approximately 0.99¹⁰⁰ ≈ 36.6%, while the yield of a 200-nt oligonucleotide is even lower, 0.99²⁰⁰ ≈ 13.4%. Accordingly, the yield of a 2,000-nt sequence is practically zero, since 0.99²⁰⁰⁰ ≈ 1.9×10⁻⁹, i.e., about 1.9×10⁻⁷%.
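The yield figures above follow from raising the coupling efficiency to the power of the chain length. This sketch uses the same simplification as the text (p^n, with p = 0.99) and prints each yield as a fraction and as a percentage.

```python
def full_length_yield(length: int, coupling_eff: float = 0.99) -> float:
    """Fraction of full-length chains, using the text's p**length estimate."""
    return coupling_eff ** length

for n in (100, 200, 2000):
    y = full_length_yield(n)
    print(f"{n:>5}-mer: {y:.3g}  ({y * 100:.3g} %)")
# prints 0.366 (36.6 %), 0.134 (13.4 %), and 1.86e-09 (1.86e-07 %)
```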

Another shortcoming stems from the accumulation of truncated sequences, which can arise at any stage of the assembly due to incomplete coupling. These shorter oligonucleotides (also called “failure sequences”) accumulate as impurities in the pool of S-PP-synthesized oligonucleotides. As elongation advances, the relative length difference between the desired full-length product and a truncated oligonucleotide becomes negligible. For instance, the two nucleotides that differentiate a truncated 198-mer from the desired 200-mer correspond to a 1% difference, whereas in the case of a truncated 1,998-nt oligonucleotide and the desired 2,000-mer, the difference is only 0.1%. Due to the resolution limits of high-performance liquid chromatography (HPLC) and polyacrylamide gel electrophoresis (PAGE), which are used to separate the desired products from truncated ones, isolating the desired oligonucleotide (especially in the case of the 2,000-mer) proves extremely difficult, compromising the purity of the final product.
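The shrinking relative difference between a failure sequence and the full-length product is simple to compute; the sketch below uses the 2-nt truncations from the example above.

```python
def relative_length_difference(full_length: int, truncated_length: int) -> float:
    """Length difference as a fraction of the full-length product."""
    return (full_length - truncated_length) / full_length

for full, truncated in ((200, 198), (2000, 1998)):
    diff = relative_length_difference(full, truncated)
    print(f"{truncated}-mer vs {full}-mer: {diff:.1%} shorter")
```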

Lastly, the physical properties of the solid support, as well as the chemistry of the oligonucleotides themselves, can pose limitations to the synthesis, which intensify with increasing oligonucleotide length. More specifically, when CPG beads are used as the support matrix, their pores can become clogged as the oligonucleotide chains grow longer, ultimately hindering the diffusion of reagents to the reactive 5′-OH groups and reducing coupling efficiency. Similarly, repeated cycles of detritylation under acidic conditions can induce depurination (hydrolytic loss of purine bases). Depurination can accumulate over 200 cycles and reach concerning levels over 2,000 cycles, leading to damaged, fragmented, or incorrect sequences.

Homework Questions from Prof. Church

3. Given the one-paragraph abstracts for these real 2026 grant programs, sketch a response to one of them or devise one of your own:

For the generative optogenetics program, one approach could be to design and express a light-activated polymerase with a “pinwheel” structure that works the way a Danish Christmas star is folded/woven (Figure 1.4A, B, and C). More specifically, the polymerase would have four arms (one for each of the standard DNA nucleotides), each activated by a distinct wavelength. For the light-activation element, domains from different opsins could be incorporated into the molecule; upon illumination, they would cause a conformational change in the corresponding arm, bending it towards the central cavity where the nascent oligonucleotide is anchored (Figure 1.4D). Recruitment of the correct nucleotide could be achieved through selective affinity mediated by specific amino acid interactions, while each arm should retain the catalytic properties of a polymerase so that it can also form phosphodiester bonds. As an initiator strand, which polymerases normally need in order to start synthesis, a short poly-A stretch (similar to the poly-A tail at the 3’ end of mature eukaryotic transcripts) could be employed and cleaved off once synthesis of the desired oligonucleotide is complete. This approach could, theoretically, be easily automated and scaled up using a 96-well Light Plate Apparatus (LPA), or extended with additional arms to incorporate non-standard nucleotides, such as pseudouridine and inosine, for different applications. 1

Figure 1.4 (A) and (B) Photographs of folding/weaving a Danish Christmas star, which can be seen finished in (C). (D) Schematic representation of the “pinwheel” polymerase, whose arms bend and add a particular nucleotide upon activation with a specific wavelength. Figure from How to make a Danish Christmas star, the Bureau of Betterment, and partially created on BioRender.com.
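To make the proposed workflow more concrete, the control side of such a system reduces to scheduling: each nucleotide of the target sequence becomes one light pulse at the wavelength of the matching arm. The sketch below is purely hypothetical; the arm-to-wavelength mapping, the pulse and dark durations, and all names are placeholders rather than properties of any real opsin or of the Light Plate Apparatus.

```python
from dataclasses import dataclass

# HYPOTHETICAL arm -> activation wavelength (nm); placeholders only.
ARM_WAVELENGTH_NM = {"A": 470, "C": 530, "G": 590, "T": 630}


@dataclass(frozen=True)
class Pulse:
    wavelength_nm: int
    on_s: float   # illumination time to trigger one arm (placeholder)
    off_s: float  # dark time for the arm to relax back (placeholder)


def light_program(target: str, on_s: float = 1.0, off_s: float = 2.0) -> list[Pulse]:
    """Translate a target sequence (written 5'->3', the direction in which
    polymerases extend) into one light pulse per nucleotide addition."""
    program = []
    for base in target.upper():
        if base not in ARM_WAVELENGTH_NM:
            raise ValueError(f"no arm defined for {base!r}")
        program.append(Pulse(ARM_WAVELENGTH_NM[base], on_s, off_s))
    return program


if __name__ == "__main__":
    for pulse in light_program("ATGC"):
        print(pulse)
```

Supporting non-standard nucleotides, as suggested above, would only require adding entries to `ARM_WAVELENGTH_NM`; a real controller would also have to make sure that the arms' activation spectra do not overlap.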


  1. Fabris M, Abbriano RM, Pernice M, et al. Emerging Technologies in Algal Biotechnology: Toward the Establishment of a Sustainable, Algae-Based Bioeconomy. Front Plant Sci. 2020;11:279. doi:10.3389/fpls.2020.00279

  2. Brodie J, Chan CX, De Clerck O, et al. The Algal Revolution. Trends Plant Sci. 2017;22(8):726-738. doi:10.1016/j.tplants.2017.05.005

  3. Johnson X, Alric J. Central carbon metabolism and electron transport in Chlamydomonas reinhardtii: metabolic constraints for carbon partitioning between oil and starch. Eukaryotic Cell. 2013;12(6):776-793. doi:10.1128/EC.00318-12

  4. Rochaix JD. Chlamydomonas reinhardtii as the photosynthetic yeast. Annu Rev Genet. 1995;29:209-230. doi:10.1146/annurev.ge.29.120195.001233

  5. Calatrava V, Tejada-Jimenez M, Sanz-Luque E, Fernandez E, Galvan A. Nitrogen metabolism in Chlamydomonas. In: The Chlamydomonas Sourcebook. Elsevier; 2023:99-128. doi:10.1016/B978-0-12-821430-5.00004-3

  6. Ghribi M, Nouemssi SB, Meddeb-Mouelhi F, Desgagné-Penix I. Genome Editing by CRISPR-Cas: A Game Change in the Genetic Manipulation of Chlamydomonas. Life (Basel). 2020;10(11). doi:10.3390/life10110295

  7. Perozeni F, Baier T. Current Nuclear Engineering Strategies in the Green Microalga Chlamydomonas reinhardtii. Life (Basel). 2023;13(7). doi:10.3390/life13071566

  8. Einhaus A, Baier T, Kruse O. Molecular design of microalgae as sustainable cell factories. Trends Biotechnol. December 12, 2023. doi:10.1016/j.tibtech.2023.11.010

  9. Baier T, Kros D, Feiner RC, Lauersen KJ, Müller KM, Kruse O. Engineered Fusion Proteins for Efficient Protein Secretion and Purification of a Human Growth Factor from the Green Microalga Chlamydomonas reinhardtii. ACS Synth Biol. 2018;7(11):2547-2557. doi:10.1021/acssynbio.8b00226

  10. Torres-Tiji Y, Fields FJ, Yang Y, et al. Optimized production of a bioactive human recombinant protein from the microalgae Chlamydomonas reinhardtii grown at high density in a fed-batch bioreactor. Algal Research. 2022;66:102786. doi:10.1016/j.algal.2022.102786

  11. Sekimoto H. Sexual reproduction and sex determination in green algae. J Plant Res. 2017;130(3):423-431. doi:10.1007/s10265-017-0908-6

  12. Shahar B, Haim E, Kuc ME, Azerrad SP, Dudai N, Kurzbaum E. Simplified and cost-effective modulatory photobioreactor setup for upscaling microalgal culture for research and semi-industrial purposes. Algal Research. 2023;74:103200. doi:10.1016/j.algal.2023.103200

  13. Park S, Lee Y, Lee JH, Jin E. Expression of the high light-inducible Dunaliella LIP promoter in Chlamydomonas reinhardtii. Planta. 2013;238(6):1147-1156. doi:10.1007/s00425-013-1955-4

  14. Kim J, Zhou Y, Carlson PD, et al. De novo-designed translation-repressing riboregulators for multi-input cellular logic. Nat Chem Biol. 2019;15(12):1173-1182. doi:10.1038/s41589-019-0388-1

  15. Zhao EM, Mao AS, de Puig H, et al. RNA-responsive elements for eukaryotic translational control. Nat Biotechnol. 2022;40(4):539-545. doi:10.1038/s41587-021-01068-2

  16. Sebesta J, Xiong W, Guarnieri MT, Yu J. Biocontainment of Genetically Engineered Algae. Front Plant Sci. 2022;13. doi:10.3389/fpls.2022.839446

  17. Motomura K, Sano K, Watanabe S, et al. Synthetic Phosphorus Metabolic Pathway for Biosafety and Contamination Management of Cyanobacterial Cultivation. ACS Synth Biol. 2018;7(9):2189-2198. doi:10.1021/acssynbio.8b00199

  18. Thapa R. DNA Replication: Enzymes, Mechanism, Steps, Applications. November 2, 2023. Accessed February 8, 2026. https://microbenotes.com/dna-replication-steps/

  19. Prindle MJ, Loeb LA. DNA polymerase delta in DNA replication and genome maintenance. Environ Mol Mutagen. 2012;53(9):666-682. doi:10.1002/em.21745

  20. Bulock CR, Xing X, Shcherbakova PV. Mismatch repair and DNA polymerase δ proofreading prevent catastrophic accumulation of leading strand errors in cells expressing a cancer-associated DNA polymerase ϵ variant. Nucleic Acids Res. 2020;48(16):9124-9134. doi:10.1093/nar/gkaa633

  21. Nakamura Y, Gojobori T, Ikemura T. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 2000;28(1):292. doi:10.1093/nar/28.1.292

  22. Codon Usage Database. Accessed February 8, 2026. https://www.kazusa.or.jp/codon/

  23. McLaughlin L. What Is Oligonucleotide Synthesis? Phosphoramidite oligonucleotide synthesis. May 8, 2025. Accessed February 7, 2026. https://www.biotechnologyreviews.com/p/what-is-oligonucleotide-synthesis

  24. A Simple Guide to Phosphoramidite Chemistry and How it Fits in Twist Bioscience’s Commercial Engine. Accessed February 8, 2026. https://www.twistbioscience.com/blog/science/simple-guide-phosphoramidite-chemistry-and-how-it-fits-twist-biosciences-commercial

  25. the bumbling biochemist. Solid State Oligonucleotide Synthesis (Phosphoramidite Method). 2023. Accessed February 8, 2026. https://www.youtube.com/watch?v=t29CQywQpMY

Week 2 homework

DNA read, write, and edit 🧬

Part 1: Benchling and in-silico gel art

The genome of the λ-phage was imported and virtually digested with the following restriction endonucleases: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI before being visualized on Benchling’s agarose gel simulator (Figure 2.1).

Figure 2.1 Virtual digest of λ-phage’s DNA treated with seven different restriction enzymes (as indicated by the gel lane legend on the top left). Figure created on Benchling.com.
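Under the hood, a virtual digest amounts to locating each enzyme's recognition site and measuring the distances between cuts. The following is a minimal sketch, not Benchling's implementation: it treats the molecule as linear (as the packaged λ genome is), measures fragments between top-strand cut positions (ignoring the few nucleotides of overhang), and uses illustrative function names.

```python
import re

# Recognition sites of the seven enzymes used above; "^" marks the
# top-strand cut position within each site.
ENZYMES = {
    "EcoRI": "G^AATTC",
    "HindIII": "A^AGCTT",
    "BamHI": "G^GATCC",
    "KpnI": "GGTAC^C",
    "EcoRV": "GAT^ATC",
    "SacI": "GAGCT^C",
    "SalI": "G^TCGAC",
}


def cut_positions(seq: str, site_with_cut: str) -> list[int]:
    """0-based top-strand cut coordinates for one enzyme."""
    offset = site_with_cut.index("^")
    site = site_with_cut.replace("^", "")
    # All seven sites are palindromic, so scanning one strand is enough;
    # the lookahead also catches overlapping occurrences.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def fragment_sizes(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths (bp) after complete digestion of a linear molecule."""
    cuts = cut_positions(seq, ENZYMES[enzyme])
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "TTGAATTCAAAGGATCCAA"  # made-up sequence, not part of the λ genome
    for name in ("EcoRI", "BamHI", "SalI"):
        print(name, fragment_sizes(toy, name))
```

Feeding in the full λ sequence instead of the toy string would give the fragment lists behind each gel lane; a gel simulator then converts those lengths into band positions.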

Part 3: DNA design challenge

3.1. Choose your protein. In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, Google), obtain the protein sequence for the protein you chose.

Modern medicine relies heavily on a sea creature so ancient and resilient that it has barely changed since its ancestors first appeared approximately 450 million years ago, around the time when, according to a recent hypothesis, Planet Earth may have had its own ring system (like Saturn and Uranus). Besides a hardy, armour-like exoskeleton and a bizarre body plan, horseshoe crabs bring to the table a primitive, yet extremely functional, immune system packed into their blue blood. Key components of their immune response are the granular amoebocytes found in their blood (Figure 2.2B), which, upon contact with a bacterial endotoxin, initiate a coagulation cascade that protects the horseshoe crab by sequestering and neutralizing the harmful agent (Figure 2.2C). This very property of horseshoe crab immunity has been harnessed by pharmaceutical and medical-device companies since the 1960s-1970s as an effective way to screen vaccines, other injectable pharmaceuticals, and implantable biomedical devices for bacteria-derived toxins, in a procedure called the “Limulus amoebocyte lysate (LAL) assay” 1 (Figure 2.2A).

Figure 2.2 (A) Photograph from a horseshoe crab blood harvesting facility. (B) The blue blood of horseshoe crabs contains granular amoebocytes (depicted in brown), which, upon contact with a bacterial endotoxin (typically lipopolysaccharide, LPS), trigger a coagulation cascade that sequesters the foreign compound. (C) This property has been utilized by the medical establishment for decades to test pharmaceuticals for the presence of bacterial endotoxins. Figure from an NPR report on horseshoe crab blood harvesting and partially created on BioRender.com.

Based on this premise, an interesting idea to pursue would be whole-cell engineering of bacteria or, preferably, yeast cells to render them functionally similar to the granular amoebocytes of horseshoe crab blood. This endeavor, primarily inspired by an earlier iGEM project that aimed to turn Escherichia coli bacteria into red blood cell substitutes 2, could contribute to the conservation of the fragile ecosystem to which horseshoe crabs belong, and could also drastically reduce the invasive, time-consuming, and expensive practice of harvesting horseshoe crab haemolymph 3.

To this end, a critical step is to transform the cells to be engineered with the gene coding for factor C, the serine protease zymogen that initiates the immune response and triggers the coagulation pathway 4 5. The amino acid sequence of factor C from the mangrove horseshoe crab Carcinoscorpius rotundicauda (CrFactor C) was retrieved from UniProt under accession number “Q26422” 4 6:

>sp|Q26422|LFC_CARRO Limulus clotting factor C OS=Carcinoscorpius rotundicauda OX=6848 PE=2 SV=1 MVLASFLVSGLVLGLLAQKMRPVQSKGVDLGLCDETRFECKCGDPGYVFNIPVKQCTYFY RWRPYCKPCDDLEAKDICPKYKRCQECKAGLDSCVTCPPNKYGTWCSGECQCKNGGICDQ RTGACACRDRYEGVHCEILKGCPLLPSDSQVQEVRNPPDNPQTIDYSCSPGFKLKGMARI SCLPNGQWSNFPPKCIRECAMVSSPEHGKVNALSGDMIEGATLRFSCDSPYYLIGQETLT CQGNGQWNGQIPQCKNLVFCPDLDPVNHAEHKVKIGVEQKYGQFPQGTEVTYTCSGNYFL MGFDTLKCNPDGSWSGSQPSCVKVADREVDCDSKAVDFLDDVGEPVRIHCPAGCSLTAGT VWGTAIYHELSSVCRAAIHAGKLPNSGGAVHVVNNGPYSDFLGSDLNGIKSEELKSLARS FRFDYVRSSTAGKSGCPDGWFEVDENCVYVTSKQRAWERAQGVCTNMAARLAVLDKDVIP NSLTETLRGKGLTTTWIGLHRLDAEKPFIWELMDRSNVVLNDNLTFWASGEPGNETNCVY MDIQDQLQSVWKTKSCFQPSSFACMMDLSDRNKAKCDDPGSLENGHATLHGQSIDGFYAG SSIRYSCEVLHYLSGTETVTCTTNGTWSAPKPRCIKVITCQNPPVPSYGSVEIKPPSRTN SISRVGSPFLRLPRLPLPLARAAKPPPKPRSSQPSTVDLASKVKLPEGHYRVGSRAIYTC ESRYYELLGSQGRRCDSNGNWSGRPASCIPVCGRSDSPRSPFIWNGNSTEIGQWPWQAGI SRWLADHNMWFLQCGGSLLNEKWIVTAAHCVTYSATAEIIDPNQFKMYLGKYYRDDSRDD DYVQVREALEIHVNPNYDPGNLNFDIALIQLKTPVTLTTRVQPICLPTDITTREHLKEGT LAVVTGWGLNENNTYSETIQQAVLPVVAASTCEEGYKEADLPLTVTENMFCAGYKKGRYD ACSGDSGGPLVFADDSRTERRWVLEGIVSWGSPSGCGKANQYGGFTKVNVFLSWIRQFI

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence. The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools (Google “reverse translation tools”), determine the nucleotide sequence that corresponds to the protein sequence you chose above.

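Instead of reverse-translating the protein sequence, which would yield only one of the many degenerate DNA sequences able to encode it, the native coding sequence for CrFactor C was retrieved from the corresponding NCBI entry 7: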

>S77063.1 factor C=endotoxin-sensitive intracellular serine protease zymogen {clone CrFC21} [Carcinoscorpius rotundicauda=Singapore horseshoe crabs, blood, amoebocytes, CDS, 3060 nt] ATGGTCTTAGCGTCGTTTTTGGTGTCTGGTTTAGTTCTAGGGCTACTAGCCCAAAAAATGCGCCCAGTTCAGTCCAAAGGAGTAGATCTAGGCTTGTGTGATGAAACGAGGTTCGAGTGTAAGTGTGGCGATCCAGGCTATGTGTTCAACATTCCAGTGAAACAATGTACATACTTTTATCGATGGAGGCCGTATTGTAAACCATGTGATGACCTGGAGGCTAAGGATATTTGTCCAAAGTACAAACGATGTCAAGAGTGTAAGGCTGGTCTTGATAGTTGTGTTACTTGTCCACCTAACAAATATGGTACTTGGTGTAGCGGTGAATGTCAGTGTAAGAATGGAGGTATCTGTGACCAGAGGACAGGAGCTTGTGCATGTCGTGACAGATATGAAGGGGTGCACTGTGAAATTCTCAAAGGTTGTCCTCTTCTTCCATCGGATTCTCAGGTTCAGGAAGTCAGAAATCCACCAGATAATCCCCAAACTATTGACTACAGCTGTTCACCAGGGTTCAAGCTTAAGGGTATGGCACGAATTAGCTGTCTCCCAAATGGACAGTGGAGTAACTTTCCACCCAAATGTATTCGAGAATGTGCCATGGTTTCATCTCCAGAACATGGGAAAGTGAATGCTCTTAGTGGTGATATGATAGAAGGGGCTACTTTACGGTTCTCATGTGATAGTCCCTACTACTTGATTGGTCAAGAAACATTAACCTGTCAGGGTAATGGTCAGTGGAATGGACAGATACCACAATGTAAGAACTTGGTCTTCTGTCCTGACCTGGATCCTGTAAACCATGCTGAACACAAGGTTAAAATTGGTGTGGAACAAAAATATGGTCAGTTTCCTCAAGGCACTGAAGTGACCTATACGTGTTCGGGTAACTACTTCTTGATGGGTTTTGACACCTTAAAATGTAACCCTGATGGGTCTTGGTCAGGATCACAGCCATCCTGTGTTAAAGTGGCAGACAGAGAGGTCGACTGTGACAGTAAAGCTGTAGACTTCTTGGATGATGTTGGTGAACCTGTCAGGATCCACTGTCCTGCTGGCTGTTCTTTGACAGCTGGTACTGTGTGGGGTACAGCCATATACCATGAACTTTCCTCAGTGTGTCGTGCAGCCATCCATGCTGGCAAGCTTCCAAACTCTGGAGGAGCGGTGCATGTTGTGAACAATGGCCCCTACTCGGACTTTCTGGGTAGTGACCTGAATGGGATAAAATCGGAAGAGTTGAAGTCTCTTGCCCGGAGTTTCCGATTCGATTATGTCCGTTCCTCCACAGCAGGTAAATCAGGATGTCCTGATGGATGGTTTGAGGTAGACGAGAACTGTGTGTACGTTACATCAAAACAGAGAGCCTGGGAAAGAGCTCAAGGTGTGTGTACCAATATGGCTGCTCGTCTTGCTGTGCTGGACAAAGATGTAATTCCAAATTCGTTGACTGAGACTCTACGAGGGAAAGGGTTAACAACCACGTGGATAGGATTGCACAGACTAGATGCTGAGAAGCCCTTTATTTGGGAGTTAATGGATCGTAGTAATGTGGTTCTGAATGATAACCTAACATTCTGGGCCTCTGGCGAACCTGGAAATGAAACTAACTGTGTATATATGGACATCCAAGATCAGTTGCAGTCTGTGTGGAAAACCAAGTCATGTTTTCAGCCCTCAAGTTTTGCTTGCATGATGGATCTGTCAGACAGAAATAAAGCCAAATGCGATGATCCTGGATCACTGGAAAATGGACACGCCACACTTCATGGACAAAGTATTGATGGGTTCTATGCTGGTTCTTCTATAAGGTACAG
CTGTGAGGTTCTCCACTACCTCAGTGGAACTGAAACCGTAACTTGTACAACAAATGGCACATGGAGTGCTCCTAAACCTCGATGTATCAAAGTCATCACCTGCCAAAACCCCCCTGTACCATCATATGGTTCTGTGGAAATCAAACCCCCAAGTCGGACAAACTCGATAAGTCGTGTTGGGTCACCTTTCTTGAGGTTGCCACGGTTACCCCTCCCATTAGCTAGAGCAGCCAAACCTCCTCCAAAACCTAGATCCTCACAACCCTCTACTGTGGACTTGGCTTCTAAAGTTAAACTACCTGAAGGTCATTACCGGGTAGGGTCTCGAGCCATCTACACGTGCGAGTCGAGATACTACGAACTACTTGGATCTCAAGGCAGAAGATGTGACTCTAATGGAAACTGGAGTGGTCGGCCAGCGAGCTGTATTCCAGTTTGTGGACGGTCAGACTCTCCTCGTTCTCCTTTTATCTGGAATGGGAATTCTACAGAAATAGGTCAGTGGCCGTGGCAGGCAGGAATCTCTAGATGGCTTGCAGACCACAATATGTGGTTTCTCCAGTGTGGAGGATCTCTATTGAATGAGAAATGGATCGTCACTGCTGCCCACTGTGTCACCTACTCTGCTACTGCTGAGATTATTGACCCCAATCAGTTTAAAATGTATCTGGGCAAGTACTACCGTGATGACAGTAGAGACGATGACTATGTACAAGTAAGAGAGGCTCTTGAGATCCACGTGAATCCTAACTACGACCCCGGCAATCTCAACTTTGACATAGCCCTAATTCAACTGAAAACTCCTGTTACTTTGACAACACGAGTCCAACCAATCTGTCTGCCTACTGACATCACAACAAGAGAACACTTGAAGGAGGGAACATTAGCAGTGGTGACAGGTTGGGGTTTGAATGAAAACAACACCTATTCAGAGACGATTCAACAAGCTGTGCTACCTGTTGTTGCAGCCAGCACCTGTGAAGAGGGGTACAAGGAAGCAGACTTACCACTGACAGTAACAGAGAACATGTTCTGTGCAGGTTACAAGAAGGGACGTTATGATGCCTGCAGTGGGGACAGTGGAGGACCTTTAGTGTTTGCTGATGATTCCCGTACCGAAAGGCGGTGGGTCTTGGAAGGGATTGTCAGCTGGGGCAGTCCCAGTGGATGTGGCAAGGCGAACCAGTACGGGGGCTTCACTAAAGTTAACGTTTTCCTGTCATGGATTAGGCAGTTCATTTGA
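Whether a retrieved coding sequence really encodes the expected protein can be checked by translating it with the standard genetic code. A minimal sketch, assuming the two sequences are pasted in without their FASTA header lines; the function names are illustrative, and libraries such as Biopython offer more complete translation functions.

```python
BASES = "TCAG"
# Amino acids of NCBI translation table 1, ordered by codon with T, C, A, G
# as first, second, and third base ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; stop codons appear as '*'."""
    cds = "".join(cds.split()).upper()  # drop spaces and line breaks
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of three")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def matches_protein(cds: str, protein: str) -> bool:
    """True if `cds` encodes exactly `protein` and ends with one stop codon."""
    product = translate(cds)
    expected = "".join(protein.split()).upper()
    return product.endswith("*") and product[:-1] == expected
```

Passing the S77063.1 CDS and the Q26422 sequence to `matches_protein` should return True if the two database entries agree residue for residue; the same check applies to the codon-optimized version below, which must encode an identical protein.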

3.3. Codon optimization. Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. You may, once again, utilize Google for a “codon optimization tool”. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

Different organisms display different codon biases 8 9. Codon usage bias is closely linked to the relative abundance of the corresponding tRNA species inside a cell: codons read by abundant tRNAs are translated efficiently, whereas codons whose cognate tRNAs are scarce can slow or stall translation. Together with the aminoacyl-tRNA synthetases, which load each tRNA with the proper amino acid, tRNAs are crucial components of the translational machinery. This codon bias has to be taken into account when transforming an organism with a gene from another organism: thanks to the degeneracy of the genetic code, the coding sequence can be rewritten with synonymous codons that match the host cell’s preferences without altering the protein, thus ensuring smooth heterologous expression of the protein of interest. If the gene of interest is not codon optimized for the expression host, the protein may be synthesized at very low levels, or not at all.
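The principle can be illustrated with the simplest possible strategy, replacing every codon with the host's most frequently used synonymous codon. This is only a sketch: `HOST_USAGE` is a hypothetical, deliberately incomplete table with made-up frequencies (a real run would load the full host table, for example from the Codon Usage Database), and dedicated tools typically also balance GC content and avoid unwanted sequence motifs.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# HYPOTHETICAL usage (per thousand codons) for two amino acids only;
# illustrative numbers, not taken from any real host table.
HOST_USAGE = {
    "GTT": 27.0, "GTC": 15.0, "GTA": 10.0, "GTG": 12.0,                        # Val
    "AGA": 20.0, "AGG": 7.0, "CGT": 6.0, "CGC": 2.0, "CGA": 3.0, "CGG": 2.0,  # Arg
}


def preferred_codons(usage: dict[str, float]) -> dict[str, str]:
    """Map each amino acid covered by `usage` to its most frequent codon."""
    best: dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODON_TO_AA[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds: str, usage: dict[str, float]) -> str:
    """Re-encode `cds` codon by codon; the encoded protein is unchanged.
    Codons for amino acids missing from `usage` are kept as they are."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of three")
    best = preferred_codons(usage)
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(best.get(CODON_TO_AA[c], c) for c in codons)
```

Using a single codon per amino acid is the crudest form of optimization; it is shown here only because it makes the "swap synonymous codons, keep the protein" idea easy to follow.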

Similarly, several parameters have to be considered when choosing an expression host for the protein of interest. In this specific case, factor C is a protein derived from a eukaryote, has a complex molecular structure involving disulfide bonds, and is glycosylated at several amino acid positions. A suitable host for a protein with those characteristics should also be a eukaryote, as eukaryotic cells harbor the biochemical pathways needed for post-translational modifications such as glycosylation, and should provide an oxidizing environment in its secretory pathway (the endoplasmic reticulum) to facilitate the formation of disulfide bridges. A promising candidate that fulfils all those criteria is the methylotrophic yeast Pichia pastoris, for which the original coding sequence of factor C was codon optimized using Benchling’s codon optimization tool:

>factor C=endotoxin-sensitive intracellular serine protease zymogen {clone CrFC21} [codon optimized for Pichia pastoris, CDS, 3060 nt] ATGGTCTTAGCGTCGTTTTTGGTTTCTGGTTTAGTTCTAGGGCTACTAGCCCAAAAAATGCGCCCAGTTCAGTCCAAAGGAGTAGATCTAGGCTTGTGTGATGAAACGAGGTTCGAGTGTAAGTGTGGCGATCCAGGCTATGTTTTCAACATTCCAGTCAAACAATGCACATACTTTTATCGATGGAGGCCGTATTGTAAACCATGTGATGACCTGGAGGCTAAGGATATTTGTCCAAAGTACAAGCGATGTCAAGAGTGTAAGGCTGGTCTTGATAGTTGTGTTACTTGTCCACCTAACAAGTATGGTACTTGGTGTAGCGGTGAATGTCAGTGCAAGAACGGAGGTATCTGTGACCAGAGGACAGGAGCTTGTGCATGTCGTGACAGATATGAAGGGGTGCACTGCGAAATTCTCAAAGGTTGTCCTCTTCTTCCATCGGATTCTCAGGTTCAAGAAGTCAGAAATCCACCAGATAATCCCCAAACTATTGACTACAGCTGCTCACCAGGGTTCAAGCTTAAGGGTATGGCACGAATTAGCTGCCTCCCAAATGGACAGTGGAGTAACTTTCCACCAAAATGTATTAGAGAATGTGCCATGGTTTCATCTCCAGAACATGGTAAAGTTAATGCTCTTTCCGGTGATATGATAGAAGGTGCTACTTTACGGTTCTCCTGTGATAGTCCCTACTACTTGATTGGTCAAGAAACATTAACCTGCCAAGGTAATGGTCAGTGGAATGGACAGATACCACAATGTAAGAACTTGGTCTTTTGCCCTGACCTGGATCCTGTAAACCATGCTGAACACAAGGTTAAAATTGGTGTTGAACAAAAATATGGTCAGTTTCCTCAAGGAACTGAAGTTACCTATACGTGTTCGGGTAACTACTTCTTGATGGGTTTTGATACCTTAAAATGCAACCCTGATGGGTCTTGGTCAGGATCACAGCCATCCTGTGTTAAAGTGGCAGACAGAGAGGTCGACTGTGACAGTAAAGCTGTAGACTTCTTGGATGATGTTGGTGAACCGGTCAGGATCCACTGTCCTGCTGGCTGTTCTTTGACAGCTGGTACTGTTTGGGGTACAGCCATATACCATGAGCTTTCCTCCGTGTGCCGCGCAGCCATCCATGCTGGCAAGCTTCCAAACTCTGGAGGAGCTGTCCATGTTGTGAACAATGGCCCGTACTCCGACTTTCTGGGTTCCGACCTGAATGGTATAAAATCGGAAGAGTTGAAGTCTCTTGCCAGAAGTTTTAGATTCGATTATGTCCGTTCCTCCACAGCAGGTAAGTCAGGATGCCCTGATGGATGGTTTGAGGTAGACGAGAACTGTGTGTATGTTACATCAAAGCAGAGAGCATGGGAAAGAGCTCAAGGTGTGTGCACCAATATGGCTGCTAGACTTGCTGTGCTGGACAAAGATGTAATTCCAAACTCGTTGACTGAGACTCTAAGAGGGAAAGGTTTAACCACCACGTGGATAGGATTGCATAGACTAGATGCTGAGAAGCCCTTTATTTGGGAGTTAATGGATCGTAGTAATGTGGTTCTGAATGATAACCTAACCTTCTGGGCCTCTGGTGAACCTGGAAATGAAACTAACTGCGTATATATGGACATCCAAGATCAGTTGCAGTCTGTGTGGAAAACCAAGTCATGTTTTCAGCCATCTAGTTTTGCTTGCATGATGGATCTGTCAGATAGAAATAAAGCCAAGTGCGATGATCCTGGATCATTGGAAAATGGACACGCCACACTTCATGGACAATCCATTGATGGTTTCTATGCTGGTTCTTCTATAAGGTACAGCTGCGAGGTTCTCCACTACCTCAGTGGAACTGAAACCGTAACTTGTAC
CACAAATGGCACTTGGAGTGCTCCGAAACCGCGATGTATCAAAGTCATCACCTGCCAAAACCCCCCTGTACCATCATATGGTTCTGTGGAAATCAAACCCCCAAGTAGAACTAACTCGATAAGTCGTGTTGGGTCACCTTTCTTGAGGTTGCCAAGATTACCCCTCCCATTAGCTAGAGCAGCCAAGCCTCCTCCAAAGCCTAGATCCTCACAACCCTCTACTGTGGACTTGGCCTCTAAGGTTAAATTGCCTGAAGGTCATTACCGTGTCGGGTCTAGGGCCATCTACACGTGCGAGTCGAGATACTACGAACTATTGGGATCTCAAGGCAGAAGATGTGACTCTAACGGAAACTGGTCCGGTCGGCCAGCGAGCTGTATTCCAGTTTGCGGACGGTCAGATTCTCCTCGTTCTCCTTTTATCTGGAATGGTAATTCTACAGAAATTGGTCAGTGGCCGTGGCAGGCAGGAATCTCTAGATGGCTTGCAGACCACAATATGTGGTTTCTCCAATGTGGAGGATCTCTATTGAATGAGAAGTGGATCGTCACTGCTGCCCATTGTGTCACCTACTCTGCTACTGCTGAGATTATTGACCCCAATCAATTTAAAATGTATCTGGGCAAGTACTACCGTGATGACTCCAGAGATGATGACTATGTACAAGTAAGAGAGGCTCTTGAGATCCACGTCAATCCTAACTACGACCCCGGCAATTTGAACTTTGACATAGCCTTGATTCAACTGAAAACTCCTGTTACTTTGACTACACGAGTCCAACCAATTTGTCTGCCTACTGACATCACGACAAGAGAACATTTGAAGGAGGGAACATTAGCAGTTGTTACGGGTTGGGGTTTGAATGAAAACAACACCTATTCAGAGACTATTCAACAAGCTGTGTTGCCTGTTGTTGCAGCCAGCACCTGCGAAGAGGGGTACAAGGAGGCAGACTTACCACTGACTGTTACAGAGAACATGTTCTGTGCAGGTTACAAGAAGGGACGTTATGATGCCTGCTCCGGTGACAGCGGAGGACCTTTAGTGTTTGCTGATGATTCCCGTACCGAAAGGAGATGGGTCTTGGAAGGGATTGTCAGCTGGGGCAGTCCCTCCGGATGTGGAAAGGCGAACCAGTATGGTGGCTTCACTAAAGTTAACGTTTTCCTGTCATGGATTAGACAATTCATTTAA

3.4. You have a sequence! Now what? What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

For the expression of the codon-optimized CrFactor C in P. pastoris, the first step would be to replace the regulatory elements in the cassette generated above, which were selected for bacterial expression, with parts recognized by a yeast cell, such as the methanol-inducible AOX1 promoter, an appropriate Kozak sequence, and the AOX1 terminator. After assembly, the new expression cassette would be inserted into an integrative vector (probably from the pPICZα series), which also carries a selection marker, for instance, the Sh ble gene conferring resistance to Zeocin. This integrative vector would then be used to transform the yeast cells. In a portion of the transformed cells, the expression cassette and the selection marker would be integrated into the genome, and positive transformants could be identified through the antibiotic resistance conferred by the marker. With Zeocin in particular, high-expressing strains can be enriched by increasing the antibiotic concentration, because the level of resistance correlates with the number of integrated marker copies and, thus, usually with the number of integrated copies of the gene of interest. The selected transformants would then be cultured in the presence of methanol, which strongly induces transcription of the codon-optimized CrFactor C gene; the resulting mRNA would be translated (with initiation facilitated by the Kozak sequence) into CrFactor C protein. Lastly, directed by a secretion signal such as the α-factor signal peptide provided by pPICZα vectors, the nascent CrFactor C would enter the endoplasmic reticulum (ER) and pass through the Golgi apparatus for post-translational modifications.

For proteins with extensive post-translational modification requirements, such as factor C, a cell-free expression system would generally not be the first choice. Nevertheless, T7-based in vitro transcription coupled with a highly active and reliable eukaryotic in vitro translation system, such as rabbit reticulocyte lysate (RRL), would be an appealing alternative. For the expression of recombinant CrFactor C, however, the translation system should also be supplemented with microsomal membranes to provide the capacity for glycosylation 10.

3.5. How does it work in nature/biological systems?
  • Describe how a single gene codes for multiple proteins at the transcriptional level.

At the transcriptional/post-transcriptional level, the same gene, or rather the same transcript, can code for multiple proteins through alternative splicing. More specifically, in most eukaryotic organisms, genes consist of exons, segments that are retained in the mature mRNA (and largely translated), and introns, intervening segments that are removed before translation, some of which have regulatory roles. A crucial step in transcript maturation is the excision of introns by the spliceosome, a large ribonucleoprotein complex found in the nucleus. Besides removing the introns, the spliceosome joins the remaining exons of the precursor mRNA to generate the mature transcript. During this process, however, one or more exons can be omitted, for instance, if they are excised along with their flanking introns, leading to several variants of the mature mRNA (depending on the exons they contain) and, thus, to multiple proteins. Another way in which a single transcript can encode multiple proteins is multicistronic (polycistronic) transcription in prokaryotes. Multicistronic genes are frequently clustered in operons, where a single promoter drives the synthesis of one long transcript containing several open reading frames, each with its own start codon. Each of these reading frames is translated into a different protein, and the products of one operon usually serve related functions, often within the same metabolic pathway.
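The combinatorial effect of exon skipping can be shown with a toy model in which the first and last exons are always retained and every internal exon may be either included or skipped. This is a simplification (other forms of alternative splicing, such as alternative 5’/3’ splice sites, mutually exclusive exons, or intron retention, are not modeled), and the exon sequences in the example are made up.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """All mature transcripts obtainable by skipping any subset of the
    internal exons; exon order is always preserved."""
    if len(exons) < 2:
        return ["".join(exons)]
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):  # combinations keep input order
            isoforms.append(first + "".join(kept) + last)
    return isoforms


def in_frame(mrna: str) -> bool:
    """Crude check that an isoform keeps the reading frame (length divisible by 3)."""
    return len(mrna) % 3 == 0


if __name__ == "__main__":
    exons = ["AUGGCU", "GAAUUU", "CCGG", "UAA"]  # made-up exons
    for iso in exon_skipping_isoforms(exons):
        print(iso, "in frame" if in_frame(iso) else "frameshift")
```

With k skippable internal exons there are 2^k possible isoforms; skipping an exon whose length is not a multiple of three shifts the downstream reading frame, which is why only some isoforms yield functional proteins.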

At the translational level, a single transcript can also code for multiple proteins if it contains more than one reading frame, and this occurs in eukaryotes as well. The reading frames can even overlap; however, each one has its own start and stop codon. Lastly, at the post-translational level, a protein in a eukaryotic cell can exist in several distinct versions. Even though those versions are not considered different molecules, as they originate from the same gene, a protein can be altered after its biosynthesis through post-translational modifications (PTMs), which include, but are not limited to, removal of the initiator methionine and the addition of other chemical moieties, such as glycans or phosphate groups.
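Overlapping reading frames become visible when a sequence is scanned in all three forward frames. A minimal sketch, assuming the simplest ORF definition (an ATG followed by the first in-frame stop codon, forward strand only); nested ATGs inside an ORF are not reported separately, and the example sequence is made up.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF in the three forward frames;
    `end` is exclusive and includes the stop codon. RNA input is accepted."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    # Made-up sequence with two ORFs in different frames that share bases 4-11.
    print(find_orfs("ATGAATGCCTAACTAA"))
```

In the example, the frame-0 ORF (ATG AAT GCC TAA) and the frame-1 ORF (ATG CCT AAC TAA) overlap, yet each has its own start and stop codon.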

  • Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated protein!

The alignment of the gene for the codon-optimized CrFactor C, the transcribed RNA, and the resulting protein is illustrated in Figure 2.3. As the gene is fairly large, I opted to show only the part of the gene that codes for the protein’s N-terminus.

Figure 2.3 Alignment of the gene encoding CrFactor C, as it has been codon optimized for expression in P. pastoris, with its mature mRNA transcript, and the resulting CrFactor C protein. Figure generated with Benchling.com.
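A plain-text version of this kind of alignment can be generated directly from the coding strand, since the mRNA carries the same sequence with U in place of T. A minimal sketch using the standard genetic code; it places each amino acid under the middle base of its codon, leaves any incomplete trailing codon out of the protein row, and the function name is illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def aligned_view(coding_dna: str) -> str:
    """Return three aligned rows: coding-strand DNA, mRNA, and protein."""
    dna = coding_dna.upper()
    mrna = dna.replace("T", "U")  # transcript = coding strand with U for T
    usable = len(dna) - len(dna) % 3  # ignore an incomplete last codon
    protein = "".join(f" {CODON_TABLE[dna[i:i + 3]]} " for i in range(0, usable, 3))
    return "\n".join((
        f"DNA     5' {dna} 3'",
        f"mRNA    5' {mrna} 3'",
        f"Protein N- {protein} -C",
    ))


if __name__ == "__main__":
    # First 30 nt of the codon-optimized CrFactor C sequence shown earlier.
    print(aligned_view("ATGGTCTTAGCGTCGTTTTTGGTTTCTGGT"))
```

The printed protein row begins with M V L A S F L V S G, matching the N-terminus of the UniProt sequence.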

Part 4: Prepare a Twist DNA Synthesis Order

By following the instructions on preparing a DNA synthesis order for Twist, a plasmid containing the CrFactor C sequence codon-optimized for P. pastoris was generated (Figure 2.4).

Figure 2.4 Snapshot of the plasmid map for pTwist(amp, high copy)-CrFactor_C generated for the Twist DNA synthesis order. Plasmid map created on Benchling.com.

Part 5: DNA read/write/edit

In continuation of the CrFactor C project, once again, expressing the codon-optimized sequence in P. pastoris would require assembling the genetic cassette, including the gene and all its flanking regulatory elements. Sequencing the assembled construct constitutes an important step before proceeding with the transformation in order to verify that the cloning was indeed successful and that the newly assembled expression cassette is identical to the designed one.
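Once a consensus sequence of the plasmid comes back, the comparison with the designed construct can be partly automated. A minimal sketch, assuming the consensus is a single circular sequence that may start at a different position or be reported on the opposite strand; it uses Python's generic `difflib.SequenceMatcher`, which is not a biological aligner (dedicated alignment tools are preferable for real data), and the function names are illustrative.

```python
import difflib
from typing import List, Optional, Tuple


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rotate_to_match(designed: str, read: str, anchor_len: int = 20) -> Optional[str]:
    """Rotate (and, if needed, reverse-complement) a circular consensus so
    that it starts where the designed sequence starts."""
    anchor = designed[:anchor_len].upper()
    for candidate in (read.upper(), revcomp(read)):
        doubled = candidate + candidate  # a circular sequence read twice
        k = doubled.find(anchor)
        if k != -1:
            return doubled[k:k + len(candidate)]
    return None


def differences(designed: str, read: str) -> List[Tuple[str, int, int, str, str]]:
    """List (kind, start, end, designed_bases, read_bases) for every mismatch,
    insertion, or deletion; an empty list means the sequences are identical."""
    rotated = rotate_to_match(designed, read)
    if rotated is None:
        raise ValueError("start of the design not found in the read")
    designed = designed.upper()
    matcher = difflib.SequenceMatcher(None, designed, rotated, autojunk=False)
    return [
        (tag, i1, i2, designed[i1:i2], rotated[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Because the rotation relies on the first bases of the design as an anchor, an error inside that anchor raises an error instead of being reported, so the anchor should be placed in a region already known to be correct (for example, the vector backbone).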

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Also, answer the following questions:

  1. Is your method first-, second- or third-generation or other? How so?
  2. What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.
  3. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?
  4. What is the output of your chosen sequencing technology?

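For whole-plasmid sequencing of the integrative vector carrying the codon-optimized CrFactor C sequence, I would choose Oxford Nanopore sequencing. It is a third-generation (single-molecule, long-read) sequencing technology that combines high speed, reliability, high read accuracy (98-99%), and low cost; crucially, its long reads can cover the entire plasmid in a single read, so the whole construct can be verified without reassembling it from short fragments.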

Another advantage of this technology is that it requires minimal library preparation, omitting time-consuming steps such as PCR amplification. Sample preparation can be reduced to a single, rapid “tagmentation” step performed by a hyperactive transposase complex. This complex, also called a transposome, is pre-loaded with synthetic, double-stranded adapters, so that it simultaneously fragments the DNA and attaches adapter sequences to the fragment ends; for Nanopore sequencing specifically, the adapters also carry a motor protein that controls the threading of each strand through a nanopore in the flow cell. The entire transposome-based library preparation takes only a few minutes.

Once the sequencing library is prepared, the sample can be loaded onto the flow cell, which contains thousands of nanopores (nanoscale protein channels) embedded in an electrically resistant membrane. The membrane separates two chambers filled with an electrolyte solution and allows ions to flow between them exclusively through the embedded nanopores. The ionic current through each nanopore is monitored by an individual electronic sensor. As a DNA strand passes through the pore, the sensor records characteristic disruptions in the current; because several nucleotides occupy the narrowest part of the pore at any moment, each disruption reflects the short stretch of sequence (k-mer) currently inside the pore rather than a single base. Specialized base-calling algorithms interpret these disruptions and convert them into nucleotide sequences in real time. Therefore, the raw output of the technology is an electrical current trace of characteristic “squiggles”, which is translated into sequencing reads (typically stored as FASTQ files with per-base quality scores). These reads can then be further processed with bioinformatics tools for alignment, assembly, and analysis 11.
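Base-calling quality is usually reported on the Phred scale, Q = −10·log10(P_error), and FASTQ files encode one such score per base as an ASCII character (offset 33 in the standard Sanger encoding). The sketch below converts the 98-99% accuracy quoted above into Q scores and averages the error probabilities encoded in a quality string; the function names are illustrative.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Phred quality Q = -10 * log10(error probability)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def mean_error_from_quality(quality_line: str, offset: int = 33) -> float:
    """Average per-base error probability encoded in a FASTQ quality line
    (Phred+33 encoding assumed)."""
    if not quality_line:
        raise ValueError("empty quality string")
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_line]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for acc in (0.98, 0.99):
        print(f"{acc:.0%} accuracy -> Q{phred_from_accuracy(acc):.1f}")
```

98% and 99% correspond to roughly Q17 and Q20. Error probabilities, not Q values, are averaged, because the Phred scale is logarithmic.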

5.2 DNA Write (i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize!

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

To actually build the genetic cassette for the expression of CrFactor C in P. pastoris described above, each genetic element would first have to be synthesized individually by a DNA synthesis company, such as Twist. Apart from the codon-optimized CrFactor C coding sequence shown previously, those genetic elements include:

  • the AOX1 promoter from P. pastoris, combined with its native Kozak sequence (last six nucleotides) CTGTTCTAACCCCTACTTGACAGCAATATATAAACAGAAGGAAGCTGCCCTGTCTTAAACCTTTTTTTTTATCATCATTATTAGCTTACTTTCATAATTGCGACTGGTTCCAATTGACAAGCTTTTGATTTTAACGACTTTTAACGACAACTTGAGAAGATCAAAAAACAACTAATTATTGAAAGAATTCCGAAACG and
  • the AOX1 terminator from P. pastoris TCAAGAGGATGTCAGAATGCCATTTGCCTGAGAGATGCAGGCTTCATTTTTGATACTTTTTTATTTGTAACCTATATAGTATAGGATTTTTTTTGTCATTTTGTTTCTTCTCGTACGAGCTTGCTCCTGATCAGCCTATCTCGCAGCTGATGAATATCTTGTGGTAGGGGTTTGGGAAAATCATTCGAGTTTGATGTTTTTCTTGGTATTTCCCACTCCTCTTCAGAGTACAGAAGATTAAGTGAGA

Both parts were sourced from the iGEM Registry of Standard Biological Parts. Once the individually synthesized parts arrive, the promoter, Kozak sequence, coding sequence, and terminator would have to be assembled into the expression cassette for CrFactor C. For this, I would use a reliable cloning technique such as Golden Gate assembly. This method exploits the ability of type IIS restriction enzymes to cleave double-stranded DNA outside (downstream of) their recognition site. Because the cut falls outside the recognition sequence, a single enzyme can generate custom four-nucleotide overhangs. This enables seamless, high-fidelity assembly of complex DNA constructs (of up to 30 distinct parts) in a one-pot reaction.

The method demands careful design. Each segment must carry flanking regions, in the correct orientation, that generate the appropriate overhangs during the assembly step. Alternatively, these flanks can be added by PCR after the parts have been synthesized. Any internal recognition sites for the selected type IIS endonuclease must also be removed before assembly. This can be done through site-directed mutagenesis or, for synthesized parts, at the design stage (e.g. with silent mutations within the coding sequence). With proper design and execution of these steps, the cassette for the expression of recombinant CrFactor C in P. pastoris can then be assembled.
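The two Golden Gate design checks mentioned above can be sketched in a few lines of Python:

- No internal type IIS sites in the parts.
- A compatible set of overhangs.

This sketch assumes BsaI (recognition site GGTCTC) as the type IIS enzyme, which the text does not specify. BsmBI (CGTCTC) can be checked by passing a different `site`. The overhang set is a hypothetical placeholder. Treat this as a quick pre-screen under those assumptions, not a replacement for dedicated assembly-design software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Part sequences as listed above (iGEM Registry versions)
AOX1_PROMOTER_KOZAK = "CTGTTCTAACCCCTACTTGACAGCAATATATAAACAGAAGGAAGCTGCCCTGTCTTAAACCTTTTTTTTTATCATCATTATTAGCTTACTTTCATAATTGCGACTGGTTCCAATTGACAAGCTTTTGATTTTAACGACTTTTAACGACAACTTGAGAAGATCAAAAAACAACTAATTATTGAAAGAATTCCGAAACG"
AOX1_TERMINATOR = "TCAAGAGGATGTCAGAATGCCATTTGCCTGAGAGATGCAGGCTTCATTTTTGATACTTTTTTATTTGTAACCTATATAGTATAGGATTTTTTTTGTCATTTTGTTTCTTCTCGTACGAGCTTGCTCCTGATCAGCCTATCTCGCAGCTGATGAATATCTTGTGGTAGGGGTTTGGGAAAATCATTCGAGTTTGATGTTTTTCTTGGTATTTCCCACTCCTCTTCAGAGTACAGAAGATTAAGTGAGA"


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_internal_sites(seq, site="GGTCTC"):
    """0-based start positions (forward coordinates) of `site` on either strand."""
    seq = seq.upper()
    hits = []
    for motif in {site, revcomp(site)}:
        pos = seq.find(motif)
        while pos != -1:
            hits.append(pos)
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


def check_overhangs(overhangs):
    """Flag 4-nt overhangs that are palindromic or could pair with an earlier one."""
    problems = []
    used = set()
    for oh in (o.upper() for o in overhangs):
        rc = revcomp(oh)
        if oh == rc:
            problems.append(f"{oh}: palindromic, compatible with itself")
        if oh in used or rc in used:
            problems.append(f"{oh}: duplicates or pairs with an earlier overhang")
        used.add(oh)
    return problems


for name, seq in [("AOX1 promoter + Kozak", AOX1_PROMOTER_KOZAK), ("AOX1 terminator", AOX1_TERMINATOR)]:
    print(name, "BsaI:", find_internal_sites(seq), "BsmBI:", find_internal_sites(seq, "CGTCTC"))

# Hypothetical junctions: backbone|promoter, promoter|CDS, CDS|terminator, terminator|backbone
print(check_overhangs(["GGAG", "AATG", "GCTT", "CGCT"]))
```

An empty list from `check_overhangs` only means the overhangs pass these two simple rules. Ligation fidelity between near-matching overhangs is a separate question.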

5.3 DNA Edit

(i) What DNA would you want to edit and why? In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

It would probably be more sensible to simply redo the cassette assembly with the desired DNA elements. Alternatively, I could use a gene-editing method such as CRISPR on the pTwist(amp, high copy)-CrFactor_C plasmid designed above for the Twist order. The edit would replace its bacteria-specific regulatory parts with parts of similar function that are specialized for gene expression in P. pastoris. Specifically, the promoter, RBS, and terminator already integrated into the vector would be replaced by the AOX1 promoter, its native Kozak sequence, and the AOX1 terminator, respectively. This way, an already assembled genetic construct could be repurposed for heterologous expression in a different organism (in this case, one better suited to expressing the selected gene). At a later stage, the swapped regulatory sequences could be replaced again, for instance with a stronger promoter and a more efficient terminator, to optimize transcription and obtain more robust protein synthesis.

(ii) What technology or technologies would you use to perform these DNA edits and why? Also answer the following questions:

  1. How does your technology of choice edit DNA? What are the essential steps?
  2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?
  3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

The CRISPR-Cas9 system is a tremendously versatile and powerful genome-editing technology that can introduce precise modifications in DNA. Both its principle and its main components have been adapted from bacterial cells. In bacteria, transcripts from the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) array combine with Cas (CRISPR-associated) proteins to recognize and neutralize foreign genetic material, such as bacteriophage DNA, thus functioning as a natural defense mechanism.

At its core, CRISPR editing relies on guiding a DNA-cutting enzyme to a specific sequence in the genome. There, the enzyme introduces a targeted break that the cell then repairs, often resulting in the desired genetic modification. More specifically, a guide molecule called a single-guide RNA (sgRNA or gRNA), designed to be complementary to the target DNA sequence, forms a complex with the Cas9 endonuclease (Figure 2.5A and B). Once the complex is delivered into the target cells, the gRNA directs Cas9 to the genetic target through base pairing. Importantly, the target site must lie immediately adjacent to a short motif known as the PAM (protospacer adjacent motif), which is required for Cas9 binding. For the commonly used Streptococcus pyogenes Cas9, the PAM is 5'-NGG-3'. Upon recognition, Cas9 generates a double-strand break in the DNA (Figure 2.5C), which in turn activates the cell's natural repair pathways. These include non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ often introduces small insertions or deletions (indels) at the target site, which can disrupt the gene's function. HDR, on the other hand, uses a homologous DNA template, which can be supplied by the experimenter, to achieve more precise editing (Figure 2.5D).

Figure 2.5 Overview of the CRISPR-Cas9 gene editing mechanism. Figure modified from Addgene’s CRISPR guide.
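The following sketch illustrates the PAM requirement and the expected cut position. It scans a sequence on both strands for 20-nt protospacers followed by an NGG PAM. It then reports where SpCas9 would be expected to make its blunt cut, 3 bp upstream of the PAM (Figure 2.5C).

The sketch assumes SpCas9 specifically: other Cas variants use different PAMs and cut positions. Real guide-design tools also score specificity and on-target efficiency, which this sketch does not attempt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_sites(seq, spacer_len=20):
    """Return (strand, spacer, pam, cut) for every protospacer followed by an NGG PAM.

    `cut` is a 0-based boundary on the forward strand: the break lies just
    before that index. Assumes SpCas9, which cuts 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(n - spacer_len - 2):
            pam = s[i + spacer_len:i + spacer_len + 3]
            if pam[1:] != "GG":
                continue
            cut = i + spacer_len - 3      # boundary 3 bp 5' of the PAM, in s coordinates
            if strand == "-":
                cut = n - cut             # map the boundary back to forward coordinates
            sites.append((strand, s[i:i + spacer_len], pam, cut))
    return sites


for site in find_cas9_sites("TTGACCTGACGTACGTAGCTAGCTAGGCATG"):
    print(site)
```

Reporting cut sites in forward-strand coordinates for both strands makes it easy to compare candidates against the position of the intended edit.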

Preparation for a CRISPR experiment involves several design and input considerations. First, the target DNA sequence must be analyzed computationally to identify sites that can be specifically targeted by gRNA molecules and that lie next to a PAM sequence. The selected sgRNA sequence is then synthesized and often cloned into a plasmid vector that also encodes Cas9. Alternatively, Cas9 protein and sgRNA can be delivered directly as a ribonucleoprotein (RNP) complex. If the experiment aims to integrate a precisely altered sequence via HDR, a donor DNA template containing the intended modification must also be designed. This template typically carries homology arms flanking the edit site. Additional inputs may include:

- primers for verification;
- plasmids for delivery (as already mentioned);
- host cells appropriate for the application;
- a delivery method suited to the host cell type.
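For the HDR route, the donor template can be sketched as two homology arms surrounding the intended edit. The function below is a hypothetical helper, not part of any library. The 40-nt default arm length is an arbitrary placeholder, since suitable arm lengths depend on the host and on the donor format (single-stranded oligo or plasmid). In practice, the PAM or seed region is also often mutated in the donor so that the repaired site is not cut again. That step is left out here.

```python
def design_hdr_donor(seq, cut, edit, arm_len=40, replace_len=0):
    """Return left homology arm + edit + right homology arm.

    seq: target region (forward strand); cut: 0-based boundary of the break;
    edit: sequence inserted at the cut; replace_len: bases after the cut that
    the edit replaces (0 means a pure insertion).
    """
    if cut - arm_len < 0 or cut + replace_len + arm_len > len(seq):
        raise ValueError("not enough flanking sequence for the requested arms")
    left_arm = seq[cut - arm_len:cut]
    right_arm = seq[cut + replace_len:cut + replace_len + arm_len]
    return left_arm + edit + right_arm


# Hypothetical example: insert a short tag at a cut site at position 50
target = "A" * 50 + "C" * 50
print(design_hdr_donor(target, cut=50, edit="GGATCC", arm_len=20))
```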

Despite its potential, CRISPR has several limitations. The main one concerns off-target effects, where the Cas9-sgRNA complex binds and cuts at unintended genomic sites, potentially causing unwanted mutations. Improved gRNA design and engineered high-fidelity Cas variants have reduced this risk without entirely eliminating it. Efficiency can be another shortcoming, particularly for HDR-based edits, which are often less efficient than NHEJ and depend on the cell cycle stage. Additionally, delivering CRISPR components into certain cell types or tissues remains challenging. There are also biological constraints, such as immune responses to Cas proteins and variability in editing outcomes between cells. Lastly, ethical considerations and regulatory frameworks can restrict the use of CRISPR, especially in human germline editing.
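Off-target risk is often pre-screened by looking for sites that resemble the guide. The sketch below is a simplification: it counts plain mismatches between the guide and every NGG-adjacent site on both strands. Real off-target tools go further in several ways:

- They weight mismatches by position, e.g. in the PAM-proximal seed region.
- They consider non-canonical PAMs and DNA/RNA bulges.
- They search whole genomes efficiently.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def off_target_sites(guide, genome, max_mismatches=3):
    """Return (strand, position, mismatches) for NGG-adjacent sites similar to the guide.

    Positions on the '-' strand are indices in the reverse-complemented sequence.
    """
    guide, genome = guide.upper(), genome.upper()
    k = len(guide)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - k - 2):
            if seq[i + k + 1:i + k + 3] != "GG":   # PAM = N, G, G right after the protospacer
                continue
            mm = sum(a != b for a, b in zip(guide, seq[i:i + k]))
            if mm <= max_mismatches:
                hits.append((strand, i, mm))
    return hits


guide = "ACGTACGTACGTACGTACGT"   # hypothetical 20-nt spacer
region = guide + "TGG" + "TTTT" + "ACGAACGTACGTACGTACGT" + "AGG"
print(off_target_sites(guide, region))
```

The on-target site appears with 0 mismatches. Any other hit with few mismatches would be a candidate to check experimentally, or a reason to pick a different guide.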


  1. Real Science. Why Horseshoe Crab Blood Is So Valuable. 2020. Accessed February 15, 2026. https://www.youtube.com/watch?v=oXVnuG3zO_0

  2. BerkiGEM2007Present1 - 2007.igem.org. Accessed February 16, 2026. https://2007.igem.org/BerkiGEM2007Present1

  3. Maloney T, Phelan R, Simmons N. Saving the horseshoe crab: A synthetic alternative to horseshoe crab blood for endotoxin detection. PLoS Biol. 2018;16(10):e2006607. doi:10.1371/journal.pbio.2006607

  4. Ding JL, Navas MA, Ho B. Molecular cloning and sequence analysis of factor C cDNA from the Singapore horseshoe crab, Carcinoscorpius rotundicauda. Mol Marine Biol Biotechnol. 1995;4(1):90-103.

  5. Ding JL, Ho B. Endotoxin detection–from limulus amebocyte lysate to recombinant factor C. Subcell Biochem. 2010;53:187-208. doi:10.1007/978-90-481-9078-2_9

  6. UniProt. UniProt. Accessed February 15, 2026. https://www.uniprot.org/uniprotkb/Q26422/entry

  7. factor C=endotoxin-sensitive intracellular serine protease zymogen {cl - Nucleotide - NCBI. Accessed February 15, 2026. https://www.ncbi.nlm.nih.gov/nuccore/S77063

  8. Nakamura Y, Gojobori T, Ikemura T. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 2000;28(1):292. doi:10.1093/nar/28.1.292

  9. Codon Usage Database. Accessed February 8, 2026. https://www.kazusa.or.jp/codon/

  10. Beckler GS, Thompson D, Van Oosbree T. In vitro translation using rabbit reticulocyte lysate. Methods Mol Biol. 1995;37:215-232. doi:10.1385/0-89603-288-4:215

  11. BestDx Academy. Nanopore Sequencing. 2023. Accessed February 15, 2026. https://www.youtube.com/watch?v=FYEWrUVJ2as

Week 3 homework

Lab automation 🦾

Python script for Opentrons artwork

  1. Generate an artistic design using Ronan’s GUI.
  2. Using the coordinates from the GUI, follow the instructions in the HTGAA26 Opentrons Colab to write your own Python script, which draws your design using the Opentrons. You may use AI assistance for this coding — Google Gemini is integrated into Colab (see the stylized star bottom center); it will do a good job writing functional Python, while you probably need to take charge of the art concept.
  3. If you use AI to help complete this homework or lab, document how you used AI and which models made contributions.

Consistent with this week’s highly automated and digitized theme, I drew inspiration for this assignment from an image popularized on the Internet: KC Green’s web comic strip “On Fire”. In 2014, the strip became a famous (and my personal favorite) online meme (Figure 3.1). Like many other people around the world, I deeply relate to this meme, which, I feel, accurately describes my life.

Figure 3.1 Panel from KC Green’s web comic strip “On Fire”, which generated the popular “This is fine” online meme in 2014. Figure from NPR’s report on the related ‘All Things Considered’ podcast episode.

As a first step, I fed the right-hand panel of the meme into Ronan’s automation art interface. However, the generated artwork required a lot of additional manual processing to resemble the original image (Figure 3.2).

Figure 3.2 (A) Initial image produced by feeding the right-hand panel of the “This is fine” meme into Ronan’s automation art interface. (B) Final artwork generated after manual rendering of the initial image.

After composing the final artwork (Figure 3.2B), I gave Gemini 2.5 Flash the nine bacterial dyes I used, along with their respective coordinates. The dyes were mClover3, mLychee_TF, mWatermelon, Ultramarine, mKO2, dsRed, mScarlet_I, mCherry, and mKate2. Gemini 2.5 Flash was integrated into my personal copy of the HTGAA26 Opentrons Colab notebook. I asked it to write a Python script that would reproduce the “This is fine”-inspired artwork on a petri dish with the Opentrons system. The Python script obtained from this prompt is shown below. I made some light manual edits: I added comments at several steps and functions, and I renamed the bacterial dyes with their corresponding hex codes.

  # Optional: install the required libraries (e.g. in a fresh Colab runtime)
  import subprocess, sys
  subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy", "pandas", "opentrons"])

  from opentrons import types

  metadata = {    # see https://docs.opentrons.com/v2/tutorial.html#tutorial-metadata
      'protocolName': 'This is fine meme Opentrons artwork',
      'author': 'Sofia Oikonomou',
      'description': 'Generate a representation of the This is fine meme with 9 different bacteria-synthesized dyes',
      'source': 'HTGAA 2026 Opentrons Lab',
      'apiLevel': '2.20'
  }

  # Robot deck setup constants - don't change these

  TIP_RACK_DECK_SLOT = 9
  COLORS_DECK_SLOT = 6
  AGAR_DECK_SLOT = 5
  PIPETTE_STARTING_TIP_WELL = 'A1'

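  # Source well on the cold plate -> dye, identified by its hex code
  well_colors = {
      'A1': '#409945',   # mClover3
      'A2': '#40223D',   # mCherry
      'A3': '#E45741',   # mLychee_TF
      'A4': '#B9474B',   # mScarlet_I
      'A5': '#B39223',   # mKO2
      'A6': '#0F7449',   # mWatermelon
      'A7': '#0F184C',   # Ultramarine
      'A8': '#45203E',   # mKate2
      'A9': '#7C463E'    # dsRed
  }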

  def run(protocol):
      # Load labware, modules and pipettes

      # Tips
      tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

      # Pipettes
      pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

      # Modules
      temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)

      # Temperature Module Plate
      temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul',
                                                          'Cold Plate')
      # Choose where to take the colors from
      color_plate = temperature_plate

      # Agar Plate
      agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')  ## TA MUST CALIBRATE EACH PLATE!
      # Get the top-center of the plate, make sure the plate was calibrated before running this
      center_location = agar_plate['A1'].top()

      pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

      # Patterning

      # Helper functions for this lab

      # Pass a hex code from well_colors (e.g. '#409945') and get back the source well,
      # which can be passed to aspirate()
      def location_of_color(color_string):
          for well, color in well_colors.items():
              if color.lower() == color_string.lower():
                  return color_plate[well]
          raise ValueError(f"No well found with color {color_string}")

      # For this lab, instead of calling pipette.dispense(1, loc) use this: dispense_and_detach(pipette, 1, loc)
      def dispense_and_detach(pipette, volume, location):
          """
          Move laterally 5mm above the plate (to avoid smearing a drop); then drop down to the plate,
          dispense, move back up 5mm to detach drop, and stay high to be ready for next lateral move.
          5mm because a 4uL drop is 2mm diameter; and a 2deg tilt in the agar pour is >3mm difference across a plate.
          """
          assert isinstance(volume, (int, float))
          # Location.move() adds an offset to the location's point, so z=5 means 5 mm above it
          above_location = location.move(types.Point(z=5))
          pipette.move_to(above_location)       # Go to 5mm above the dispensing location
          pipette.dispense(volume, location)    # Go straight downwards and dispense
          pipette.move_to(above_location)       # Go straight up to detach drop and stay high

      # YOUR CODE HERE to create your design

      # Define all points (x, y offsets in mm from the plate center) for each dye
      all_dye_points = {
          #mClover3-Background wall lower
          '#409945': [(-11, 11),(-9, 11),(-31, 9),(-9, 9),(-3, 9),(-1, 9),(1, 9),(5, 9),(7, 9),(9, 9),(15, 9),(17, 9),(21, 9),(23, 9),(33, 9),(35, 9),(-39, 7),(-29, 7),(-27, 7),(-25, 7),(-7, 7),(-5, 7),(-3, 7),(-1, 7),(1, 7),(3, 7),(5, 7),(7, 7),(9, 7),(11, 7),(13, 7),(15, 7),(17, 7),(19, 7),(21, 7),(23, 7),(35, 7),(37, 7),(39, 7),(-39, 5),(-5, 5),(-3, 5),(-1, 5),(1, 5),(3, 5),(5, 5),(7, 5),(9, 5),(11, 5),(13, 5),(17, 5),(19, 5),(21, 5),(37, 5),(39, 5),(-5, 3),(-3, 3),(-1, 3),(1, 3),(3, 3),(5, 3),(7, 3),(9, 3),(11, 3),(19, 3),(21, 3),(37, 3),(39, 3),(-7, 1),(-5, 1),(-3, 1),(-1, 1),(1, 1),(3, 1),(5, 1),(7, 1),(9, 1),(19, 1),(39, 1),(-3, -1),(-1, -1),(1, -1),(3, -1),(5, -1),(7, -1),(9, -1),(-1, -3),(1, -3),(3, -3),(5, -3),(7, -3),(1, -5),(3, -5),(5, -5),(7, -5),(1, -7),(3, -7),(5, -7),(7, -7),(1, -9),(3, -9),(5, -9),(-1, -27),(1, -27),(3, -27),(-1, -29),(1, -29),(3, -29),(1, -31)],
          #mCherry-Dog outline
          '#40223D': [(-29, 25),(-27, 25),(-25, 15),(-23, 15),(-13, 15),(-11, 9),(-9, 7),(-27, 5),(-25, 5),(-7, 5),(-27, 3),(-7, 3),(-25, 1),(-23, 1),(-21, 1),(-19, 1),(-17, 1),(-15, 1),(-13, 1),(-11, 1),(-9, 1),(-7, -3),(-5, -3),(-3, -3),(-19, -5),(-17, -5),(-15, -5),(-9, -5),(-1, -5),(-21, -7),(-13, -7),(-11, -7),(-1, -7),(-21, -9),(-11, -9),(-1, -9),(7, -9),(-21, -11),(-11, -11),(-1, -11),(1, -11),(3, -11),(5, -11),(-21, -13),(-11, -13),(-9, -13),(-3, -13),(-1, -13),(1, -13),(-21, -15),(-11, -15),(-7, -15),(-5, -15),(-19, -17),(-11, -17),(-17, -19),(-15, -19),(-13, -19),(13, -19),(11, -21),(3, -23),(5, -23),(7, -23),(9, -23),(-9, -25),(-7, -25),(-5, -25),(-3, -25),(-1, -25),(1, -25),(3, -25),(-3, -27),(-3, -29),(-11, -31),(-3, -31),(-9, -33),(-3, -33),(-7, -35),(-5, -35)],
          #mLychee_TF-Dog and hat
          '#E45741': [(-21, 15),(-19, 15),(-17, 15),(-15, 15),(-23, 13),(-21, 13),(-19, 13),(-17, 13),(-15, 13),(-13, 13),(-23, 11),(-21, 11),(-19, 11),(-17, 11),(-15, 11),(-13, 11),(-23, 9),(-21, 9),(-19, 9),(-17, 9),(-15, 9),(-13, 9),(-11, 7),(-11, 5),(-9, 5),(-25, 3),(-23, 3),(-21, 3),(-19, 3),(-17, 3),(-15, 3),(-13, 3),(-11, 3),(-9, 3),(-23, -1),(-21, -1),(-19, -1),(-17, -1),(-15, -1),(-13, -1),(-11, -1),(-9, -1),(-7, -1),(-5, -1),(-25, -3),(-23, -3),(-21, -3),(-19, -3),(-17, -3),(-15, -3),(-13, -3),(-11, -3),(-9, -3),(-27, -5),(-25, -5),(-23, -5),(-21, -5),(-13, -5),(-11, -5),(-25, -7),(-23, -7),(-25, -9),(-23, -9),(-25, -11),(-23, -11),(-25, -13),(-23, -13),(3, -13),(5, -13),(-25, -15),(-23, -15),(-9, -15),(-3, -15),(-1, -15),(1, -15),(3, -15),(5, -15),(-25, -17),(-23, -17),(-21, -17),(-9, -17),(-7, -17),(-5, -17),(-3, -17),(-1, -17),(1, -17),(3, -17),(5, -17),(7, -17),(-25, -19),(-23, -19),(-21, -19),(-19, -19),(-11, -19),(-9, -19),(-7, -19),(-5, -19),(-3, -19),(-1, -19),(1, -19),(3, -19),(5, -19),(7, -19),(9, -19),(11, -19),(-25, -21),(-23, -21),(-21, -21),(-19, -21),(-17, -21),(-15, -21),(-13, -21),(-11, -21),(-9, -21),(-7, -21),(-5, -21),(-3, -21),(-1, -21),(1, -21),(3, -21),(5, -21),(7, -21),(9, -21),(-25, -23),(-23, -23),(-21, -23),(-19, -23),(-17, -23),(-13, -23),(-11, -23),(-9, -23),(-7, -23),(-5, -23),(-3, -23),(-1, -23),(1, -23),(-25, -25),(-23, -25),(-21, -25),(-19, -25),(-17, -25),(-27, -27),(-25, -27),(-23, -27),(-21, -27),(-19, -27),(-17, -27),(-13, -27),(-11, -27),(-9, -27),(-27, -29),(-25, -29),(-23, -29),(-21, -29),(-19, -29),(-17, -29),(-15, -29),(-11, -29),(-9, -29),(-7, -29),(-25, -31),(-23, -31),(-21, -31),(-19, -31),(-17, -31),(-15, -31),(-13, -31),(-9, -31),(-7, -31),(-5, -31),(-21, -33),(-19, -33),(-17, -33),(-15, -33),(-13, -33),(-11, -33),(-7, -33),(-5, -33),(-19, -35),(-17, -35),(-15, -35),(-13, -35),(-11, -35),(-9, -35),(-3, -35),(-15, -37),(-13, -37),(-11, -37),(-9, -37),(-7, -37),(-5, -37),(-3, -37),(-7, -39),(-5, -39),(-3, -39)],
          #mScarlet_I-Flame outline
          '#B9474B': [(27, 13),(29, 13),(-35, 11),(25, 11),(31, 11),(-35, 9),(-33, 9),(25, 9),(31, 9),(-37, 7),(-31, 7),(25, 7),(33, 7),(-37, 5),(-29, 5),(15, 5),(23, 5),(35, 5),(-39, 3),(13, 3),(17, 3),(23, 3),(35, 3),(11, 1),(17, 1),(21, 1),(37, 1),(11, -1),(19, -1),(39, -1),(9, -3),(9, -5),(9, -7),(5, -25),(5, -27),(5, -29),(-1, -31),(3, -31),(-1, -33),(1, -33)],
          #mKO2-Flame body
          '#B39223': [(27, 11),(29, 11),(27, 9),(29, 9),(-35, 7),(-33, 7),(27, 7),(29, 7),(31, 7),(-35, 5),(-33, 5),(-31, 5),(25, 5),(27, 5),(29, 5),(31, 5),(33, 5),(-37, 3),(-35, 3),(-33, 3),(-31, 3),(-29, 3),(15, 3),(25, 3),(27, 3),(29, 3),(31, 3),(33, 3),(-39, 1),(-37, 1),(-35, 1),(-33, 1),(-31, 1),(-29, 1),(-27, 1),(13, 1),(15, 1),(23, 1),(25, 1),(27, 1),(29, 1),(31, 1),(33, 1),(35, 1),(-39, -1),(-37, -1),(-35, -1),(-33, -1),(-31, -1),(-29, -1),(-27, -1),(-25, -1),(13, -1),(15, -1),(17, -1),(21, -1),(23, -1),(25, -1),(27, -1),(29, -1),(31, -1),(33, -1),(35, -1),(37, -1),(-39, -3),(-37, -3),(-35, -3),(-33, -3),(-31, -3),(-29, -3),(-27, -3),(11, -3),(13, -3),(15, -3),(17, -3),(19, -3),(21, -3),(23, -3),(25, -3),(27, -3),(29, -3),(31, -3),(33, -3),(35, -3),(37, -3),(39, -3),(-39, -5),(-37, -5),(-35, -5),(-33, -5),(-31, -5),(-29, -5),(11, -5),(13, -5),(15, -5),(17, -5),(19, -5),(21, -5),(23, -5),(25, -5),(27, -5),(29, -5),(31, -5),(33, -5),(35, -5),(37, -5),(39, -5),(-39, -7),(-37, -7),(-35, -7),(-33, -7),(-31, -7),(11, -7),(13, -7),(15, -7),(17, -7),(19, -7),(21, -7),(23, -7),(25, -7),(27, -7),(29, -7),(31, -7),(33, -7),(35, -7),(37, -7),(39, -7),(-37, -9),(-35, -9),(-33, -9),(15, -9),(17, -9),(19, -9),(21, -9),(23, -9),(25, -9),(27, -9),(29, -9),(31, -9),(33, -9),(35, -9),(37, -9),(-37, -11),(-35, -11),(17, -11),(19, -11),(21, -11),(23, -11),(25, -11),(27, -11),(29, -11),(31, -11),(33, -11),(35, -11),(37, -11),(-37, -13),(-35, -13),(17, -13),(19, -13),(21, -13),(23, -13),(25, -13),(27, -13),(29, -13),(31, -13),(33, -13),(35, -13),(37, -13),(-37, -15),(17, -15),(19, -15),(21, -15),(23, -15),(25, -15),(27, -15),(29, -15),(31, -15),(33, -15),(35, -15),(37, -15),(15, -17),(17, -17),(19, -17),(21, -17),(23, -17),(25, -17),(27, -17),(29, -17),(31, -17),(33, -17),(35, -17),(15, -19),(17, -19),(19, -19),(21, -19),(23, -19),(25, -19),(27, -19),(29, -19),(31, -19),(33, -19),(35, -19),(13, -21),(15, -21),(17, -21),(19, -21),(21, -21),(23, -21),(25, -21),(27, -21),(29, -21),(31, -21),(33, -21),(11, -23),(13, -23),(15, -23),(17, -23),(19, -23),(21, -23),(23, -23),(25, -23),(27, -23),(29, -23),(31, -23),(7, -25),(9, -25),(11, -25),(13, -25),(15, -25),(17, -25),(19, -25),(21, -25),(23, -25),(25, -25),(27, -25),(29, -25),(31, -25),(-29, -27),(7, -27),(9, -27),(11, -27),(13, -27),(15, -27),(17, -27),(19, -27),(21, -27),(23, -27),(25, -27),(27, -27),(29, -27),(7, -29),(21, -29),(23, -29),(25, -29),(27, -29),(5, -31),(7, -31),(19, -31),(23, -31),(25, -31),(3, -33),(5, -33),(7, -33),(19, -33),(-1, -35),(1, -35),(3, -35),(5, -35),(7, -35),(19, -35),(-1, -37),(1, -37),(3, -37),(5, -37),(7, -37),(-1, -39),(1, -39),(3, -39),(5, -39),(7, -39)],
          #mWatermelon-Background wall upper
          '#0F7449': [(-37, 13),(-35, 13),(-33, 13),(-31, 13),(-11, 13),(-9, 13),(-1, 13),(1, 13),(7, 13),(9, 13),(11, 13),(13, 13),(15, 13),(17, 13),(25, 13),(31, 13),(33, 13),(35, 13),(37, 13),(-37, 11),(-33, 11),(-31, 11),(-29, 11),(-27, 11),(-25, 11),(-7, 11),(-5, 11),(-3, 11),(-1, 11),(1, 11),(5, 11),(7, 11),(9, 11),(11, 11),(13, 11),(15, 11),(17, 11),(19, 11),(21, 11),(23, 11),(33, 11),(35, 11),(37, 11),(-37, 9),(-29, 9),(-27, 9),(-25, 9),(-7, 9),(-5, 9),(11, 9),(13, 9),(19, 9),(37, 9)],
          #Ultramarine-Dark blue elements
          '#0F184C': [(-25, 31),(-23, 31),(-21, 31),(-19, 31),(-15, 31),(-13, 31),(-11, 31),(-9, 31),(-7, 31),(-3, 31),(-1, 31),(1, 31),(3, 31),(7, 31),(9, 31),(11, 31),(13, 31),(15, 31),(21, 31),(23, 31),(25, 31),(-23, 29),(-19, 29),(-15, 29),(-13, 29),(-11, 29),(-3, 29),(-1, 29),(7, 29),(13, 29),(15, 29),(17, 29),(21, 29),(23, 29),(-23, 27),(-19, 27),(-17, 27),(-15, 27),(-13, 27),(-11, 27),(-9, 27),(-7, 27),(-3, 27),(-1, 27),(1, 27),(3, 27),(7, 27),(9, 27),(11, 27),(13, 27),(15, 27),(19, 27),(21, 27),(23, 27),(25, 27),(27, 27),(-23, 25),(-19, 25),(-15, 25),(-13, 25),(-7, 25),(-3, 25),(3, 25),(7, 25),(13, 25),(15, 25),(21, 25),(23, 25),(-23, 23),(-19, 23),(-15, 23),(-13, 23),(-11, 23),(-9, 23),(-7, 23),(-3, 23),(-1, 23),(1, 23),(3, 23),(7, 23),(13, 23),(15, 23),(21, 23),(23, 23),(25, 23),(27, 23),(-27, 21),(31, 21),(-33, 19),(-31, 19),(-29, 19),(33, 19),(35, 19),(-37, 15),(-35, 15),(-33, 15),(-31, 15),(-11, 15),(-9, 15),(-1, 15),(1, 15),(9, 15),(11, 15),(13, 15),(15, 15),(17, 15),(25, 15),(27, 15),(29, 15),(31, 15),(-29, 13),(-27, 13),(-25, 13),(-7, 13),(-5, 13),(-3, 13),(19, 13),(21, 13),(23, 13),(-23, 7),(-21, 7),(-19, 7),(-17, 7),(-15, 7),(-13, 7),(-23, 5),(-21, 5),(-19, 5),(-17, 5),(-15, 5),(-13, 5),(-29, -7),(-27, -7),(-7, -7),(-5, -7),(-3, -7),(-31, -9),(-29, -9),(-27, -9),(-17, -9),(-15, -9),(-7, -9),(-5, -9),(-3, -9),(9, -9),(11, -9),(13, -9),(-33, -11),(-31, -11),(-29, -11),(-27, -11),(-17, -11),(-15, -11),(-13, -11),(-7, -11),(-5, -11),(-3, -11),(7, -11),(9, -11),(11, -11),(13, -11),(15, -11),(-33, -13),(-31, -13),(-29, -13),(-27, -13),(-17, -13),(-15, -13),(-13, -13),(-5, -13),(7, -13),(9, -13),(11, -13),(13, -13),(15, -13),(-35, -15),(-33, -15),(-31, -15),(-29, -15),(-27, -15),(-17, -15),(-15, -15),(-13, -15),(7, -15),(9, -15),(11, -15),(13, -15),(15, -15),(-35, -17),(-33, -17),(-31, -17),(-29, -17),(-27, -17),(-17, -17),(-15, -17),(-13, -17),(9, -17),(11, -17),(13, -17),(-35, -19),(-33, -19),(-31, -19),(-29, -19),(-27, -19),(-33, -21),(-31, -21),(-29, -21),(-27, -21),(-31, -23),(-29, -23),(-27, -23),(-31, -25),(-29, -25),(-27, -25),(-7, -27),(-5, -27),(-5, -29),(11, -29),(13, -29),(15, -29),(17, -29)],
          #mKate2-Puff of smoke
          '#45203E': [(-7, 39),(-5, 39),(-3, 39),(-1, 39),(1, 39),(3, 39),(5, 39),(7, 39),(-15, 37),(-13, 37),(-11, 37),(11, 37),(13, 37),(15, 37),(-19, 35),(-27, 29),(-29, 27),(-27, 27),(-31, 25),(31, 25),(-31, 23),(-29, 23),(-27, 23),(31, 23),(-33, 21),(-31, 21),(-29, 21),(-25, 21),(29, 21),(33, 21),(-35, 19),(-27, 19),(-25, 19),(-23, 19),(27, 19),(29, 19),(31, 19),(-35, 17),(-33, 17),(-31, 17),(-29, 17),(-27, 17),(-25, 17),(-23, 17),(-21, 17),(-19, 17),(-17, 17),(-15, 17),(-13, 17),(-11, 17),(-9, 17),(-7, 17),(-5, 17),(-3, 17),(-1, 17),(1, 17),(9, 17),(11, 17),(13, 17),(15, 17),(17, 17),(19, 17),(21, 17),(23, 17),(25, 17),(27, 17),(29, 17),(31, 17),(33, 17),(35, 17),(-29, 15),(-27, 15),(-7, 15),(-5, 15),(-3, 15),(19, 15),(21, 15),(23, 15),(33, 15),(35, 15),(37, 15),(-15, -23),(-15, -25),(-13, -25),(-11, -25),(-15, -27),(-13, -29)],
          #dsRed-Coffee mug
          '#7C463E': [(9, -29),(19, -29),(9, -31),(11, -31),(13, -31),(15, -31),(17, -31),(21, -31),(9, -33),(11, -33),(13, -33),(15, -33),(17, -33),(21, -33),(9, -35),(11, -35),(13, -35),(15, -35),(17, -35),(9, -37),(11, -37),(13, -37),(15, -37)]
      }

      for dye_name, points_list in all_dye_points.items():
          if not points_list:  # Skip if no points for this dye
              continue

          color_source_well = location_of_color(dye_name)

          pipette_20ul.pick_up_tip()

          # Aspirate in batches of up to 20 uL (the P20's maximum volume), then dispense 1 uL per point
          for i, (x_coord, y_coord) in enumerate(points_list):
              if pipette_20ul.current_volume < 1:  # Re-aspirate once the tip is empty
                  # Aspirate enough for the remaining points or 20 uL, whichever is less
                  volume_to_aspirate = min(20, len(points_list) - i)
                  pipette_20ul.aspirate(volume_to_aspirate, color_source_well)

              adjusted_location = center_location.move(types.Point(x=x_coord, y=y_coord))
              dispense_and_detach(pipette_20ul, 1, adjusted_location)

          pipette_20ul.drop_tip()

      # Don't forget to end with a drop_tip()

  # Execute Simulation / Visualization -- don't change this code block
  # (OpentronsMock is provided by the HTGAA26 Opentrons Colab notebook, not by the opentrons package)
  protocol = OpentronsMock(well_colors)
  run(protocol)
  protocol.visualize()

Finally, simulating the protocol confirmed that the Python script presented above reproduces the designed image (Figure 3.3).

Figure 3.3 Simulation of the “This is fine” artwork generated by running the Python script displayed above.
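Before sending a design like this to the robot, it can help to sanity-check the coordinate dictionary. The helper below is hypothetical, not part of the course notebook. It does three things:

- It flags points listed more than once, which would receive two drops.
- It flags points outside an assumed printable radius. The 40 mm default is a placeholder to adjust to the actual plate.
- It estimates the dye volume and the number of P20 aspirations per color. This assumes 1 µL drops and up to 20 µL per aspiration, as in the script above.

```python
import math
from collections import Counter


def check_design(points_by_color, radius_mm=40.0, drop_ul=1, tip_max_ul=20):
    """Summarize a {hex_color: [(x, y), ...]} design before running it.

    radius_mm is an assumed printable radius around the plate center.
    """
    counts = Counter(p for pts in points_by_color.values() for p in pts)
    return {
        "duplicates": sorted(p for p, n in counts.items() if n > 1),
        "out_of_bounds": sorted(p for p in counts if math.hypot(*p) > radius_mm),
        "volume_ul": {c: len(pts) * drop_ul for c, pts in points_by_color.items()},
        # One aspiration per full (or final partial) tip load
        "aspirations": {c: math.ceil(len(pts) * drop_ul / tip_max_ul)
                        for c, pts in points_by_color.items()},
    }


# Toy design: one point shared by two colors, one point beyond the radius
example = {
    "#40223D": [(5, -23), (7, 21), (9, -23)],
    "#0F184C": [(7, 21), (1, 23)],
    "#409945": [(39, 7), (40, 30)],
}
report = check_design(example)
print(report["duplicates"])
print(report["out_of_bounds"])
print(report["volume_ul"])
```

The volume figures also tell you how much of each dye to load into the cold-plate wells, plus some dead volume.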

Post-lab questions related to using the Opentrons system

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely. For this week:

  1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

In their paper from March 2026 1, Kostanjšek et al. present the development of Rhodo-Box, a toolkit of standardized genetic parts for the emerging Synthetic Biology chassis Rhodobacter sphaeroides. R. sphaeroides is a purple non-sulfur alphaproteobacterium with a highly versatile metabolism, including photosynthetic pathways, which render it a promising platform for the sustainable biosynthesis of numerous compounds. To realize the microorganism’s full potential, the researchers built and characterized a collection of modular genetic elements specifically tailored for use in R. sphaeroides.

The modular parts include three broad-host-range origins of replication (ORIs), namely RSF1010, pBBR1, and RK2, functioning as high-, medium-, and medium-copy-number ORIs, respectively. The toolbox also contains 13 constitutive promoters of native (e.g. PJ95025), heterologous (e.g. PcrtE), and synthetic origin (e.g. PJ23100), spanning a 270-fold dynamic range. It also contains 11 ribosome binding sites (RBSs) for translational regulation, spanning a 49-fold dynamic range. These RBSs were originally designed either for E. coli (e.g. B0034) or for R. sphaeroides (e.g. J95028).

Another significant feature of Rhodo-Box is the set of four inducible expression systems assessed in the study: NahR-PsalTTC, LacI-PlacT7A1_O3O4, VanR-PvanCC, and XylS-Pm. Among them, NahR-PsalTTC and VanR-PvanCC, induced by salicylic and vanillic acid respectively, appeared the most promising, as they displayed high tunability and low basal expression. The former is particularly appealing for industrial-scale biomanufacturing, given the affordability of salicylic acid.

To further increase the flexibility and orthogonality of the modular cloning strategy, the authors constructed plasmid backbones for Rhodo-Box by combining the ORIs mentioned above with common antibiotic resistance markers. At the same time, they ensured that combining the different promoter and RBS sequences tested would not produce genetic context-dependent effects on expression levels.

To cross-screen for such context-dependent interactions among Rhodo-Box’s components, Kostanjšek et al. implemented a semi-automated cloning workflow on an Opentrons OT2 platform to build an R. sphaeroides strain library in a time-efficient manner. In particular, they programmed the Opentrons liquid-handling robot to assemble five promoters with six RBSs and four transcriptional terminators, yielding a total of 42 constructs (Figure 3.4). Through this process, they obtained 38 correctly assembled strains (38/42, an overall success rate of about 90% in genetic circuit construction), while reducing the time required to build the library by 50%.

Figure 3.4 Combinatorial characterization of parts from the Rhodo-Box toolkit using a semiautomated cloning workflow on the Opentrons OT2 platform. (A) General layout of the semiautomated workflow, combining five promoters (red) with six RBSs (green), and characterizing four terminators (blue) with eGFP. (B) Heat map representing the strengths of the combinatorial promoters and RBSs in R. sphaeroides. (C) Normalized fluorescence of the tested terminators T0, B0010, B0015, and J95029 expressed in combinations of PJ23100 and PJ95025 with B0034 and B0030. Graphs (B, C) represent the average fluorescence values normalized to OD600, with the standard deviation of three technical replicates depicted in (C). Figure from Kostanjšek et al., 2026 1.
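The sketch below illustrates the bookkeeping behind a combinatorial build like this one. It enumerates every promoter × RBS pair and assigns each pair to a well of a 96-well plate in column-major order (A1, B1, ..., H1, A2, ...). This mirrors how Opentrons labware lists its wells. The part names are hypothetical placeholders, and this is not the authors' code. An actual OT2 protocol would then iterate over such a map to perform the transfers.

```python
from itertools import product


def plate_wells(rows="ABCDEFGH", cols=range(1, 13)):
    """96-well names in column-major order (A1, B1, ..., H1, A2, ...)."""
    return [f"{r}{c}" for c in cols for r in rows]


def build_plate_map(promoters, rbss):
    """Assign every promoter x RBS combination to its own well."""
    combos = list(product(promoters, rbss))
    wells = plate_wells()
    if len(combos) > len(wells):
        raise ValueError("more combinations than wells on one 96-well plate")
    return dict(zip(wells, combos))


# Hypothetical part names; the actual toolkit parts would go here
promoters = [f"P{i}" for i in range(1, 6)]
rbss = [f"RBS{i}" for i in range(1, 7)]
plate_map = build_plate_map(promoters, rbss)
for well, (promoter, rbs) in plate_map.items():
    print(well, promoter, rbs)
```

Keeping the layout as plain data makes it easy to reuse the same map for the assembly step, the transformation step, and the later fluorescence/OD600 readout.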

By coupling the validation of the modular design principles underlying the Rhodo-Box toolkit with the automated approach facilitated by the Opentrons OT2 platform, the researchers hope to further expand the repertoire of genetic circuits built for expression in R. sphaeroides, so that this novel chassis can be employed for more advanced Metabolic Engineering and Synthetic Biology applications.

  2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details. While your description/project idea doesn’t need to be set in stone, we would like to see core details of what you would automate.

A large portion of my experience in Synthetic Biology comes from wet-lab work, so I hope you can understand my reluctance to adopt a more automated approach. However, I believe that automation can substantially help with highly repetitive tasks, accelerating several processes and making them higher-throughput.

In the context of one of my final project ideas in particular, I would like to combine a genetically modified fungus with a cyanobacterium or a microalga to generate a synthetic lichen. One of the major challenges of building a lichen from the bottom up lies in selecting the appropriate organisms to take part in the symbiotic relationship. Although there are thousands of naturally occurring lichen species 2 that could serve as a basic foundation for this investigation, it would be interesting to experiment with different combinations of fungi and photobionts (possibly more than one from each category per combination). Exploring novel combinations of microorganisms for the generation of a synthetic lichen could lead both to optimizing existing species and to creating an artificial lichen that constitutes a particularly suitable platform for the proposed application, a color-shifting building coating. Opentrons could be deployed both to prepare the initial axenic cultures of the different fungi and photobionts and to generate the mixed co-cultures.

Another difficulty arises from growth incompatibility between the different symbiotic partners, so Opentrons, paired with absorbance measurements (for example, from a plate reader), could also help monitor the co-cultures and pinpoint the combinations in which the growth rate of the fungus (or fungi) matches that of the photobiont(s) in a stable and consistent manner. Additionally, such monitoring could help ensure that the cultures stay uncontaminated (contamination can also show up as changes in light absorption), as well as identify the co-cultures that display the potential for a lichen’s structural complexity.
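To make the growth-matching criterion concrete, the sketch below estimates each partner's specific growth rate as the least-squares slope of ln(OD) against time during exponential growth, then flags fungus-photobiont pairs whose rates differ by no more than a chosen relative tolerance. It assumes each partner's optical density can be measured separately (for example, in axenic cultures); the strain names, readings, and the 20% tolerance are hypothetical.

```python
import math
from itertools import product


def specific_growth_rate(times_h, ods):
    """Least-squares slope of ln(OD) versus time, in 1/h, for exponential-phase readings."""
    logs = [math.log(od) for od in ods]
    mean_t = sum(times_h) / len(times_h)
    mean_y = sum(logs) / len(logs)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    var = sum((t - mean_t) ** 2 for t in times_h)
    return cov / var


def matched_pairs(fungal_rates, photobiont_rates, tolerance=0.2):
    """List (fungus, photobiont) pairs whose positive growth rates differ by at most `tolerance` (relative)."""
    pairs = []
    for (fungus, mu_f), (photobiont, mu_p) in product(fungal_rates.items(), photobiont_rates.items()):
        if abs(mu_f - mu_p) / max(mu_f, mu_p) <= tolerance:
            pairs.append((fungus, photobiont))
    return pairs


if __name__ == "__main__":
    hours = [0, 24, 48, 72]
    # Hypothetical OD readings of axenic cultures, for illustration only.
    fungi = {"fungus_A": specific_growth_rate(hours, [0.05, 0.10, 0.20, 0.40])}
    photobionts = {
        "alga_X": specific_growth_rate(hours, [0.05, 0.095, 0.19, 0.37]),
        "alga_Y": specific_growth_rate(hours, [0.02, 0.08, 0.32, 1.28]),
    }
    print(matched_pairs(fungi, photobionts))  # [('fungus_A', 'alga_X')]
```

In a real co-culture, separating the two partners' contributions to a single OD reading is harder; pigment-specific absorbance (such as chlorophyll) would be one possible, still untested, route.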
Rather than merely co-existing in the same culture, the symbionts should also demonstrate the capacity to recreate the highly organized, layered structure of natural lichens (called a thallus), which constitutes an additional criterion for identifying the most promising candidates. To this end, apart from using the Opentrons platform for co-culture maintenance and monitoring, I could design and 3D-print a 24-well plate with an embedded micro-scaffold that would “encourage” the symbiotic microorganisms to form a rudimentary thallus. Lastly, as genetic parts for putative lichen partners are often not as well established as those for model organisms such as E. coli, several versions of the color-changing genetic construct should be assembled, each combining different genetic parts. In that case, Opentrons could be extremely helpful in screening the performance of all circuit combinations in a high-throughput manner, as well as in characterizing a number of novel parts engineered or optimized for the purposes of the project. Since all three of my proposals involve assembling a genetic construct and integrating it into a biological system, Opentrons could perform a similar screening of genetic circuits for my other final project ideas too.
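Screening circuit variants usually comes down to normalizing each construct's fluorescence by cell density and ranking the constructs, in the same spirit as the OD600-normalized values in Figure 3.4. The sketch below assumes plate-reader readings exported as (construct, fluorescence, OD600) records with technical replicates; the construct names and numbers are hypothetical, and blank subtraction is omitted for brevity.

```python
from collections import defaultdict
from statistics import mean, stdev


def rank_constructs(readings):
    """Average fluorescence/OD600 per construct over replicates, sorted strongest first.

    `readings` is an iterable of (construct, fluorescence, od600) tuples, one per replicate well.
    Returns a list of (construct, mean_normalized, sd_normalized) tuples.
    """
    normalized = defaultdict(list)
    for construct, fluorescence, od600 in readings:
        normalized[construct].append(fluorescence / od600)
    summary = [
        (name, mean(values), stdev(values) if len(values) > 1 else 0.0)
        for name, values in normalized.items()
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical triplicate readings for two color-changing circuit variants.
    data = [
        ("variant_1", 1200, 0.60), ("variant_1", 1150, 0.58), ("variant_1", 1250, 0.61),
        ("variant_2", 3000, 0.55), ("variant_2", 2900, 0.52), ("variant_2", 3100, 0.57),
    ]
    for name, avg, sd in rank_constructs(data):
        print(f"{name}: {avg:.0f} +/- {sd:.0f} RFU/OD600")
```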


  1. Kostanjšek M, Raynal A, Dimopoulos G, et al. Rhodo-Box: A Synthetic Biology Toolbox to Facilitate Metabolic Engineering of Rhodobacter sphaeroides. ACS Synth Biol. March 15, 2026. doi:10.1021/acssynbio.5c00808 ↩︎

  2. Lutzoni F, Miadlikowska J. Lichens. Curr Biol. 2009;19(13):R502-3. doi:10.1016/j.cub.2009.04.034 ↩︎

Week 4 homework

Protein design-Part I 💻

Part 1: Conceptual questions

Answer any nine of the following questions from Shuguang Zhang: (i.e. you can select two to skip)
  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (On average, an amino acid is ~100 Daltons.)

Depending on the type of meat, as well as the way it is processed prior to consumption, 500 g of meat contains approximately 100 - 130 g of protein. Assuming that this protein consists entirely of amino acids (meaning, excluding metal ions, such as iron or zinc, which can be found bound to protein molecules, as well as glycans and other moieties added to proteins through post-translational modifications), and since 1 g = 6.022 × 10^23^ Da (Avogadro’s number), 100 - 130 g of amino acids correspond to approximately 6.02 - 7.83 × 10^25^ Da. Therefore, if the molecular weight of one amino acid is on average ~100 Da, then 500 g of meat contains (6.02 - 7.83 × 10^25^ Da)/100 Da ≈ 6.02 - 7.83 × 10^23^ amino acid molecules.
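The conversion can be checked in a few lines of Python. The only inputs are the protein mass range given above, the ~100 Da average mass per amino acid from the question, and Avogadro's number; like the estimate above, the sketch ignores the water released when residues are joined into chains.

```python
AVOGADRO = 6.02214076e23  # daltons per gram (1 g = N_A Da)


def amino_acid_molecules(protein_grams: float, mean_mass_da: float = 100.0) -> float:
    """Approximate number of amino acid molecules in a given mass of protein."""
    total_mass_da = protein_grams * AVOGADRO
    return total_mass_da / mean_mass_da


if __name__ == "__main__":
    for grams in (100, 130):
        print(f"{grams} g protein -> {amino_acid_molecules(grams):.2e} amino acids")
```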

  1. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

When humans consume proteins from beef or fish, the proteins do not remain intact; they are digested into oligopeptides and, eventually, into their building blocks, namely amino acids. Amino acids can be broken down further; however, many of them are absorbed and used to synthesize new protein molecules in the cells of the organism that consumed them. Which proteins are synthesized from these amino acids is dictated by the organism’s (in this case, human) genes, in combination with intercellular interactions and other environmental cues, since an organism’s genome is what determines its developmental program, in other words, what makes a cow a cow, a fish a fish, and a human a human.

  1. Why are there only 20 natural amino acids?

There are 20 standard (canonical) proteinogenic amino acids encoded by the genetic code (selenocysteine and pyrrolysine, which some organisms incorporate by recoding stop codons, are usually counted separately). It is very likely that, over the roughly four billion years since the first organic molecules started to emerge on Earth, many different amino acids have arisen. Nevertheless, this set of 20 amino acids seems to have been favored by what can be considered a deeply fundamental form of natural selection. The main factors behind this set becoming the proteinogenic code of all life on the planet can be traced to the diversity of sizes and chemical properties its members display, the abundance of the elements they are made of, and the stability they provide to protein folding and structure, all balanced against their biosynthetic cost. 1

  1. Can you make other non-natural amino acids? Design some new amino acids.

Designing new amino acids primarily entails identifying what modifications can be introduced to the 20 natural ones. Among the most obvious is replacing one of the usual elements found in organic compounds (C, H, O, N, S, P) with another element from the periodic table. Based on this concept, I decided to replace the C in alanine’s side-chain methyl group with Si (Figure 4.1A), which sits directly below carbon in group 14 of the periodic table and has similar bonding properties, but is associated with digital computing. For the second amino acid, I focused on aromatic rings, which have always fascinated me as chemical moieties due to their conjugated π electrons and extraordinary properties. Sticking with the aromatic theme, I replaced the benzene ring in phenylalanine’s side chain with azulene (Figure 4.1B). Azulene is an aromatic hydrocarbon made of fused five- and seven-membered rings and has a distinctive blue color, which could be useful for detection assays. Finally, to design the third amino acid (which will also ensure my being locked away in Chemistry prison for all eternity), I joined two copies of the basic amino acid skeleton (the core C with one amino and one carboxyl group) to generate what could be described as a “double” amino acid (Figure 4.1C). Although very unorthodox and probably chemically unstable, especially in its trans form (Figure 4.1C, bottom), the “double” amino acid is envisioned to function as an adapter that connects amino acids in unusual ways, such as generating peptides with two N- or two C-termini.

Figure 4.1 Schematics of new non-natural amino acids: (A) alanine with the C in its methyl group replaced with Si, (B) phenylalanine with the benzene in its side chain swapped with azulene, and (C) the “double” amino acid, both as a cis (top) and trans (bottom) isomer. Figure drawn on ChemDoodle.

  1. Where did amino acids come from before enzymes that make them, and before life started?

In 1953, Stanley Miller and Harold Urey published what is now known as the “Miller-Urey experiment”, which supports the prebiotic (prior to the emergence of life) synthesis of amino acids. In more detail, the researchers simulated what were then thought to be the chemical conditions of Earth’s early atmosphere by combining methane, ammonia, hydrogen, and water vapor in a closed apparatus. By exposing those compounds to electric sparks (simulating lightning), Miller and Urey detected simple amino acids, such as glycine and alanine, demonstrating that organic molecules such as amino acids could form spontaneously from simple precursors in a prebiotic “primordial soup”. 2

Another hypothesis proposes that amino acids were delivered to Earth from space by meteorites. Among them, the Murchison meteorite 3, which fell near Murchison, Victoria, in Australia in 1969, has been found to contain over 90 amino acids, 19 of which are also found on Earth, suggesting that at least some of early Earth’s organic compounds could have had an extraterrestrial origin.

  1. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Natural proteins consist of L-amino acids (rather than D-amino acids), meaning that the chirality (the three-dimensional arrangement of groups around the central α-carbon) of the amino acids in naturally occurring proteins corresponds to the L, or “left-handed”, configuration 4. Because of this property, structural stability favors the formation of right-handed (clockwise-rotating) α-helices from L-amino acids 5 6, with the exception of glycine-rich regions, since glycine is the only achiral amino acid among the 20 proteinogenic ones. Based on this rationale, a peptide chain composed of D-amino acids, being the mirror image of an L-peptide, would display a structural propensity towards left-handedness and would, therefore, most likely form left-handed (counterclockwise-rotating) α-helices.
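The mirror-image argument can be illustrated geometrically. The sketch below builds an idealized right-handed α-helix trace (Cα atoms only, using textbook parameters of 100° of rotation and 1.5 Å of rise per residue and a 2.3 Å radius), reflects it through a plane to mimic the all-D enantiomer, and classifies handedness from the sign of the triple product of consecutive bond vectors, which is positive for a right-handed helix. It is a purely geometric illustration, not an energy calculation: it shows that the mirror image of a right-handed helix is left-handed, not why one hand is more stable.

```python
import math


def ideal_helix(n_residues=12, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha coordinates of an idealized right-handed alpha-helix (3.6 residues per turn)."""
    coords = []
    for i in range(n_residues):
        angle = math.radians(turn_deg * i)
        coords.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return coords


def mirror(coords):
    """Reflect the trace through the xy-plane, i.e. build its enantiomer (L -> D)."""
    return [(x, y, -z) for x, y, z in coords]


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def handedness(coords):
    """Classify a helical trace from the summed triple products of consecutive bond vectors."""
    bonds = [_sub(b, a) for a, b in zip(coords, coords[1:])]
    total = sum(_dot(_cross(b1, b2), b3) for b1, b2, b3 in zip(bonds, bonds[1:], bonds[2:]))
    return "right" if total > 0 else "left"


if __name__ == "__main__":
    helix = ideal_helix()
    print(handedness(helix), handedness(mirror(helix)))  # right left
```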

  1. Why are most molecular helices right-handed?

Similarly to proteins’ “preference” for L-amino acids, the dominant forms of DNA and RNA consist of nucleotides containing D-sugars, D-deoxyribose and D-ribose respectively. In both proteins and nucleic acids, the chirality of the building blocks thermodynamically favors right-handed helical structures, which minimize steric clashes and optimize hydrogen bonding among monomers. In other words, the inherent geometry of the amino acids and sugars found in naturally occurring proteins and nucleic acids renders right-handed α-helices (in the case of proteins) and the right-handed B-form double helix (in the case of DNA) more stable and energetically favorable than their left-handed counterparts 5 6.

  1. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

In physiologically folded proteins, β-strands, which adopt an extended, zigzag (pleated) conformation, are arranged side by side in a parallel or antiparallel orientation to form β-sheets. This arrangement is stabilized by hydrogen bonds between the backbone amino (N-H) and carbonyl (C=O) groups of neighbouring strands. However, in pathogenic states (which often also involve protein misfolding), this very property favors the “edge-to-edge” aggregation of multiple β-sheets, as an exposed edge strand can keep recruiting new strands through backbone hydrogen bonds, potentially leading to the formation of insoluble amyloid fibrils. Another driving force for β-sheet aggregation arises from the geometry of this type of secondary structure: the backbone hydrogen bonds between neighbouring strands lie in the plane of the sheet, while the side chains alternate above and below that plane, so the hydrophobic residues that are common in β-sheets protrude from either face. These hydrophobic side chains can pack against those of another sheet, burying hydrophobic surfaces between the two. It is, therefore, thermodynamically favorable for β-sheets to stack on top of each other, which, once again, can result in their aggregation 7 8 9.

  1. Design a β-sheet motif that forms a well-ordered structure.

For this task, I first thought to draw inspiration from nature and, specifically, from naturally occurring proteins characterized by a well-ordered structure despite being rich in β-sheets. One such category is fluorescent proteins, for instance GFP, whose wild-type structure forms a can-shaped, fluorophore-protecting cavity termed a “β-barrel”. I thought I could experiment with increasing the stability of the β-barrel while retaining most of its intrinsic structural properties. One way to approach this would be to swap several of the hydrophobic amino acids located in its barrel-forming region (approximately amino acids 13 - 227) with prolines. As explained in the previous question, amino acid residues with hydrophobic R-groups contribute to β-sheet aggregation propensity both through their side chains and through their backbone amino and carbonyl groups. Proline, by contrast, although also nonpolar, has a cyclic side chain that curls back and bonds to its own backbone nitrogen, which therefore lacks the hydrogen needed to form backbone hydrogen bonds with other amino acids 10 11 12. However, I then rejected this idea, as this well-ordered β-sheet pattern already exists and has been optimized by nature through evolution. I similarly rejected other ideas involving symmetric naturally occurring protein motifs, such as the ones found in viral capsids.

At this point, I wondered what other intricate β-sheet motifs had already been discovered in nature. I was intrigued to find one called the “Greek key”, a protein supersecondary structure consisting of four adjacent antiparallel β-strands connected non-sequentially by hairpin loops in a shape reminiscent of the meander pattern (also known as the “Greek key”) 13. Wanting to honor Greek heritage as well, I designed a supersecondary structure motif abstractly inspired by the swirls commonly observed at the bottom of traditional Greek antefixes (“ακροκέραμα”), which are ornamental ceramic tiles placed at the edge of the roof (Figure 4.2B, top). This motif contains eight β-strands, divided into two columns and positioned antiparallel to each other in pairs, connected by hairpin loops in the order 1-3-2-4-5-7-6-8 (Figure 4.2A). On second glance, I realized that this pattern also resembles the capital, meaning the upper part (“κιονόκρανο”), of columns sculpted in the Ionic order (Figure 4.2B, bottom). This motif is, of course, imaginary; it would therefore have to be engineered into a protein molecule to test its structural integrity and determine whether it qualifies as a well-ordered protein supersecondary structure.

Figure 4.2 Visual representation of the antefix-derived β-sheet supersecondary structure motif and its inspirational source. (A) The pattern contains eight β-strands, divided into two columns and positioned antiparallel to each other in pairs non-sequentially. (B) This β-sheet motif was inspired by the spirals displayed at the bottom part of traditional Greek antefixes (top) or bilaterally in Ionic column capitals (bottom). Figure from the ceramic art workshop “Akrokeramo”.

Part 2: Protein analysis and visualization

Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions:
  1. Briefly describe the protein you selected and why you selected it.

For this assignment, I selected reflectin from Sepia officinalis, the common cuttlefish, as it contributes, among other proteins and structures, to the intricate iridescent patterns that distinguish several cephalopods, such as cuttlefish, squids, and octopuses. In general, reflectins enable rapid, dynamic camouflage by controlling light reflection: they have high refractive indices and assemble into nanostructured platelets within specialized skin cells called iridophores, which in some species can shift color in response to neurotransmitters 14. I am deeply fascinated by these patterns and the way they are created, and I drew inspiration from this natural phenomenon for one of my final project ideas, hence my interest in delving deeper into this particular protein.

  1. Identify the amino acid sequence of your protein.
  • How long is it? What is the most frequent amino acid?
  • How many protein sequence homologs are there for your protein?
  • Does your protein belong to any protein family?

I retrieved reflectin’s amino acid sequence from Sepia officinalis (SoREF8) from its corresponding page on NCBI:

>CCI88216.1 reflectin [Sepia officinalis]
MNRFMNRYRPMFNNMHNNMYNNMYRGRYRGMMEPMSRMTMDFQGRYMDSQGRMVDPRYYDYYGRWNDYDR
YYGKSMFNYGWMMDGDRYNNYYRWMDFPERYMDMSGYQMDMYGRWMDMQGRHCNPFNQWGHNRYGQSFNY
NYGRNMFYPERWMDMSNYSMDMQGRYMDRWGRHCNPFSQNMNWYGRYWNYPGYNNYYYNRHMYYPERYFD
MSNWQMDMQGRWMDMQGRHNNPYWYNWYGRQMYYPYQNNWYGRWDYPGMDYSNWQMDMQGRWMDMQGRYM
DPWMSDYSYNN

The peptide chain is 291 amino acids (aa) long, and the most frequent amino acid is tyrosine (Y), which appears 47 times in the sequence.
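The length and residue counts are easy to reproduce from the FASTA sequence above with Python's `collections.Counter`; the sketch below simply pastes the SoREF8 sequence into a string.

```python
from collections import Counter

# SoREF8 sequence (NCBI CCI88216.1), copied from the FASTA record above.
SOREF8 = (
    "MNRFMNRYRPMFNNMHNNMYNNMYRGRYRGMMEPMSRMTMDFQGRYMDSQGRMVDPRYYDYYGRWNDYDR"
    "YYGKSMFNYGWMMDGDRYNNYYRWMDFPERYMDMSGYQMDMYGRWMDMQGRHCNPFNQWGHNRYGQSFNY"
    "NYGRNMFYPERWMDMSNYSMDMQGRYMDRWGRHCNPFSQNMNWYGRYWNYPGYNNYYYNRHMYYPERYFD"
    "MSNWQMDMQGRWMDMQGRHNNPYWYNWYGRQMYYPYQNNWYGRWDYPGMDYSNWQMDMQGRWMDMQGRYM"
    "DPWMSDYSYNN"
)

counts = Counter(SOREF8)
residue, n = counts.most_common(1)[0]
print(len(SOREF8), residue, n)   # 291 Y 47
print(counts.most_common(5))     # the five most frequent residues with their counts
```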

UniProt’s BLAST tool identified 20 hits with 50 - 88% sequence identity that appear to be homologs of SoREF8 from other appearance-shifting cephalopods, such as the Pharaoh cuttlefish (Sepia pharaonis), the East Asian common octopus (Octopus sinensis), and the common octopus (Octopus vulgaris).

Reflectins belong to a unique family of intrinsically disordered proteins (IDPs) found in cephalopods. They are highly unusual, being rich in aromatic and sulfur-containing amino acids, such as tyrosine (Y), tryptophan (W), and methionine (M), and can self-assemble into diverse nanostructures 14.

  1. Identify the structure page of your protein in RCSB.
  • When was the structure solved? Is it a good quality structure? (A good quality structure is one with good resolution, the smaller the better (Resolution: 2.70 Å)).
  • Are there any other molecules in the solved structure apart from protein?
  • Does your protein belong to any structure classification family?

Searching RCSB for reflectin did not yield any results, even when computed structure models (CSMs) were included. However, the protein’s UniProt page provided a predicted 3D model of its structure (Figure 4.3).

Figure 4.3 Snapshot of reflectin’s (SoREF8) 3D structure visualization. Figure from its UniProt page.

  1. Open the structure of your protein in any 3D molecule visualization software.
  • Visualize the protein as “cartoon”, “ribbon”, and “ball and stick”.
  • Color the protein by secondary structure. Does it have more helices or sheets?
  • Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
  • Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Below you can find the 3D structure visualizations of SoREF8 in the “cartoon” and “ribbon” (Figure 4.4A) and “ball and stick” (or “licorice”) (Figure 4.4B) representations, as rendered in PyMOL.

Figure 4.4 Visualizations of reflectin’s (SoREF8) 3D structure in the “cartoon” and “ribbon” model (A), as well as in the “ball and stick” (or “licorice”) model (B). Figure generated with the PyMOL software.

When the protein is colored by secondary structure, it appears to have an almost equal number of very short α-helices (Figure 4.5, red) and β-strands (Figure 4.5, yellow), while most of the chain adopts less structurally defined loops (Figure 4.5, green), as would be expected for an intrinsically disordered protein.

Figure 4.5 Visualization of reflectin’s (SoREF8) secondary structures, including α-helices (depicted in red), β-strands (highlighted in yellow), and loops (marked with green). Figure generated with the PyMOL software.

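Furthermore, reflectin seems to contain more hydrophilic and charged amino acids than hydrophobic ones (Figure 4.6). This is notable given the family’s high content of aromatic and sulfur-containing amino acids, such as tryptophan (W) and methionine (M), which are usually considered hydrophobic; the abundance of tyrosine, whose hydroxyl group makes it comparatively polar, as well as of asparagine (N) and arginine (R), likely accounts for the overall hydrophilic character.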

Figure 4.6 Visualization of reflectin’s (SoREF8) structure based on its amino acid hydrophobicity: hydrophobic residues are illustrated in green, whereas hydrophilic/polar and charged ones are displayed in red and blue respectively. Figure generated with the PyMOL software.
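As a complementary, sequence-only check on the hydrophobicity pattern, the grand average of hydropathy (GRAVY) can be computed with the Kyte-Doolittle scale. This is not what PyMOL's coloring does, and classifications differ between scales: Kyte-Doolittle, for example, gives tyrosine and tryptophan slightly negative (hydrophilic) values and methionine a positive one. The minimal sketch below is demonstrated on one of SoREF8's repeated stretches (residues 266 - 278 of the sequence above); the full sequence can be passed in the same way.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues with a positive Kyte-Doolittle value."""
    return sum(KYTE_DOOLITTLE[aa] > 0 for aa in sequence) / len(sequence)


if __name__ == "__main__":
    fragment = "MDMQGRWMDMQGR"  # a repeated stretch of SoREF8 (residues 266-278)
    print(f"GRAVY = {gravy(fragment):.2f}, hydrophobic fraction = {hydrophobic_fraction(fragment):.2f}")
```

A negative GRAVY indicates an overall hydrophilic sequence under this particular scale.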

Lastly, upon visualizing reflectin’s (SoREF8) surface, the protein does form large, gaping holes where its loops are located; however, their size makes them highly unsuitable for binding ligands (Figure 4.7). After a quick 3D inspection of the entire molecule’s surface, I could not find any binding pockets, which is consistent with the absence of a compact tertiary fold in the protein. A well-packed tertiary structure, often driven by the burial of hydrophobic amino acid residues, which are scarce in reflectin’s sequence, folds the amino acid chain into pores and cavities that can serve as binding sites for specific ligands. The absence of binding pockets in reflectin does not come as a surprise, though, as the principal thing these proteins have to interact with is light.

Figure 4.7 Visualization of reflectin’s (SoREF8) surface, where no binding pockets can be seen. Figure generated with the PyMOL software.

Part 3: Using ML-based protein design tools

Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU. Then, choose your favorite protein from the PDB.

3.1 Protein language modeling
Deep mutational scans

a. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods. Can you explain any particular pattern? (Choose a residue and a mutation that stands out.)

The esm2_t30_150M_UR50D model was used to generate the deep mutational scan of reflectin (SoREF8), which revealed several regions of the protein that are highly sensitive to mutations. More specifically, positions 17 - 22, 40 - 48, 97 - 122, 145 - 153, 158 - 165, 207 - 230, and 264 - 278, which appear darker in the mutation scan heatmap, likely correspond to regions with an integral role in reflectin’s structure and function, being, therefore, less tolerant to amino acid substitutions (Figure 4.8). Those regions may also correspond to short patterns repeated across the protein’s amino acid sequence; for instance, positions 158 - 165 (YSMDMQGR) and 264 - 278 (WQMDMQGRWMDMQGR) both contain copies of the recurring MDMQGR stretch. Additionally, some substitutions stand out in the heatmap as favorable (denoted in yellow in Figure 4.8): mutations to glutamine (Q) at positions 110 and 157, as well as to methionine (M) at position 207.

Figure 4.8 Mutation scan heatmap of reflectin from Sepia officinalis. Figure generated by utilizing Evolutionary Scale Modeling version 2 (esm2_t30_150M_UR50D).
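One common way to turn a masked language model's output into such a heatmap is to score each substitution as the log-likelihood ratio between the mutant and the wild-type residue at that position. The sketch below assumes the per-position log-probabilities have already been obtained from ESM2 (for example, from a forward pass over the wild-type sequence); the course notebook may use a different variant, such as masking each position in turn, and the two-position example is hypothetical.

```python
import math


def mutation_scores(log_probs, wild_type, alphabet):
    """Score each substitution as log p(mutant) - log p(wild type) at every position.

    log_probs: one dict per position mapping amino acid -> log-probability
    (in practice these would come from ESM2; here they are supplied directly).
    Negative scores mean the model finds the substitution less likely than the wild-type residue.
    """
    return [
        {aa: position[aa] - position[wt] for aa in alphabet}
        for position, wt in zip(log_probs, wild_type)
    ]


def least_tolerant_positions(scores, top_n=3):
    """Return the 1-based positions with the most negative mean substitution score."""
    means = [(i + 1, sum(s.values()) / len(s)) for i, s in enumerate(scores)]
    return [pos for pos, _ in sorted(means, key=lambda item: item[1])[:top_n]]


if __name__ == "__main__":
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    # Hypothetical two-residue example: position 1 is tightly constrained, position 2 is not.
    constrained = {aa: math.log(0.01) for aa in alphabet}
    constrained["M"] = math.log(0.81)
    uniform = {aa: math.log(0.05) for aa in alphabet}
    scores = mutation_scores([constrained, uniform], "MY", alphabet)
    print(least_tolerant_positions(scores, top_n=1))  # [1]
```

In this framing, the darker (more negative) columns of a heatmap correspond to positions where almost every substitution is penalized.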

b. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.


  1. Doig AJ. Frozen, but no accident - why the 20 standard amino acids were selected. FEBS J. 2017;284(9):1296-1305. doi:10.1111/febs.13982 ↩︎

  2. Miller SL. A Production of Amino Acids Under Possible Primitive Earth Conditions. Science. 1953;117(3046):528-529. doi:10.1126/science.117.3046.528 ↩︎

  3. Kvenvolden KA, Lawless JG, Ponnamperuma C. Nonprotein amino acids in the Murchison meteorite. Proc Natl Acad Sci USA. 1971;68(2):486-490. doi:10.1073/pnas.68.2.486 ↩︎

  4. Egli M, Zhang S. Making sense of helices: right and wrong models in science and art. Mol Front J. 2023;07(01n02):71-81. doi:10.1142/S2529732523500086 ↩︎

  5. Rzepa H. Why are α-helices in proteins mostly right-handed? Henry Rzepa’s Blog. Published November 29, 2019. https://www.ch.ic.ac.uk/rzepa/blog/?p=3802 ↩︎ ↩︎

  6. Cole BJ, Bystroff C. Alpha helical crossovers favor right-handed supersecondary structures by kinetic trapping: the phone cord effect in protein folding. Protein Sci. 2009;18(8):1602-1608. doi:10.1002/pro.182 ↩︎ ↩︎

  7. Aggregation Prone Regions (APRs). VIB Switch Laboratory. https://switchlab.org/aprs ↩︎

  8. Liu L, Klausen LH, Dong M. Two-dimensional peptide based functional nanomaterials. Nano Today. 2018;23:40-58. doi:10.1016/j.nantod.2018.10.008 ↩︎

  9. Eskandari S, Guerin T, Toth I, Stephenson RJ. Recent advances in self-assembled peptides: Implications for targeted drug delivery and vaccine engineering. Advanced Drug Delivery Reviews. 2016;110-111:169-187. doi:10.1016/j.addr.2016.06.013 ↩︎

  10. Samuel D, Kumar TK, Ganesh G, et al. Proline inhibits aggregation during protein refolding. Protein Sci. 2000;9(2):344-352. doi:10.1110/ps.9.2.344 ↩︎

  11. Richardson JS, Richardson DC. Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc Natl Acad Sci USA. 2002;99(5):2754-2759. doi:10.1073/pnas.052706099 ↩︎

  12. Shamsir MS, Dalby AR. Beta-sheet containment by flanking prolines: molecular dynamic simulations of the inhibition of beta-sheet elongation by proline residues in human prion protein. Biophys J. 2007;92(6):2080-2089. doi:10.1529/biophysj.106.092320 ↩︎

  13. Zhang C, Kim SH. A comprehensive analysis of the Greek key motifs in protein β-barrels and β-sandwiches. Proteins. 2000;40(3). doi:10.1002/1097-0134(20000815)40:3 ↩︎

  14. Kramer RM, Crookes-Goodson WJ, Naik RR. The self-organizing properties of squid reflectin protein. Nat Mater. 2007;6(7):533-538. doi:10.1038/nmat1930 ↩︎ ↩︎

Week 5 homework

Protein design-Part II 💻

Part 1: SOD1 binder peptide design

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

This week, the assignment entails designing short peptides that bind mutant SOD1 and then deciding which ones are worth advancing toward therapy by using three models developed in the Chatterjee Lab.
A. Generate Binders with PepMLM
  1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
  2. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card, generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
  3. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
  4. Record the perplexity scores that indicate PepMLM’s confidence in the binders.

Upon retrieving the Homo sapiens SOD1 (HsSOD1) sequence from its UniProt page, I noticed that the alanine residue mutated in the A4V variant is at position 5, not position 4, of the UniProt sequence. This is because the initiator methionine at position 1 of the nascent polypeptide is cleaved during the protein’s maturation 1, and the A4V nomenclature follows the numbering of the mature protein. Based on this, and after incorporating the A4V mutation into the peptide chain, the SOD1 sequence I used for this week’s assignment is the following:

>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2, A4V variant
ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
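Because the numbering offset is an easy place to slip, it can help to apply the substitution programmatically and check the wild-type residue before mutating. The sketch below is a minimal version, assuming mutation names follow mature-protein numbering as described above; the function name is my own, and the example uses only the first 12 residues of UniProt P00441.

```python
def apply_substitution(sequence: str, mutation: str, skip_initiator_met: bool = True) -> str:
    """Apply a substitution written as <wt><position><mut>, e.g. 'A4V'.

    If skip_initiator_met is True, positions follow mature-protein numbering: the
    initiator methionine of the UniProt sequence is removed before counting.
    """
    wt, position, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    if skip_initiator_met:
        if not sequence.startswith("M"):
            raise ValueError("expected the sequence to start with the initiator methionine")
        sequence = sequence[1:]
    index = position - 1
    if sequence[index] != wt:
        raise ValueError(f"expected {wt} at position {position}, found {sequence[index]}")
    return sequence[:index] + mut + sequence[index + 1:]


if __name__ == "__main__":
    n_terminus = "MATKAVCVLKGD"  # first 12 residues of UniProt P00441
    print(apply_substitution(n_terminus, "A4V"))  # ATKVVCVLKGD
```

The wild-type check raises an error if the numbering convention is wrong (for example, asking for A5V on the mature sequence), instead of silently mutating the wrong residue.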

As the length and the number of the binding peptides had already been defined by the assignment, I thought it would be interesting to experiment with the k value (the number of most likely amino acids the model samples from at each position, i.e. top-k sampling), which, in the script we were given, was set to 3 2. As we could design only four peptides, I decided to “sacrifice” model confidence to a certain degree (and accept higher perplexity values) in favor of diversity by increasing k to 4. The results of this analysis, including the known SOD1-binding peptide FLYRWLPSRRGG, are presented below, in Table 5.1.

Table 5.1 Peptide binders generated by PepMLM for the A4V mutant of SOD1, along with their perplexity scores and the known SOD1-binding peptide FLYRWLPSRRGG.

| # | Peptide sequence | Control or test | Peptide perplexity |
|---|---|---|---|
| 0 | FLYRWLPSRRGG | Control | n/a |
| 1 | WLSPATVAARKX | Test | 7.249112 |
| 2 | WRYGAVGAKLWX | Test | 9.529020 |
| 3 | HRYVWTAARHKX | Test | 13.445100 |
| 4 | WRYGVAGVAHKX | Test | 9.256418 |
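The trade-off between k and perplexity can be illustrated with a generic sketch: with top-k sampling, each position is drawn only from the k most probable residues, so raising k admits less likely residues, and perplexity, the exponential of the mean negative log-probability of the chosen tokens, tends to rise. This is not PepMLM's code, PepMLM's own perplexity calculation may differ in detail, and the probabilities below are made up.

```python
import math
import random


def top_k_sample(probs: dict, k: int, rng: random.Random) -> str:
    """Sample one residue from the k most probable residues, weighted by their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    residues, weights = zip(*top)
    return rng.choices(residues, weights=weights, k=1)[0]


def perplexity(token_probs) -> float:
    """exp(mean negative log-probability) of the chosen tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical distribution over a few residues at one position.
    position = {"W": 0.40, "R": 0.25, "Y": 0.15, "G": 0.10, "K": 0.10}
    print([top_k_sample(position, k=3, rng=rng) for _ in range(5)])
    print(perplexity([0.40, 0.25, 0.15]))   # lower: confident picks
    print(perplexity([0.40, 0.10, 0.10]))   # higher: less likely residues admitted
```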
B. Evaluate binders with AlphaFold3
  1. After navigating to the AlphaFold Server, for each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
  2. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
  3. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

AlphaFold3 requires a fully defined peptide sequence to model protein-peptide interactions, which I did not obtain for my test peptides, as can be seen from the “X” at position 12 in all four of them (possibly due to the increased k value I used). This provided a “loophole” I could capitalize on to experiment with amino acid residues with different biochemical properties. So, I chose eight of the 20 proteinogenic amino acids, each representing a different chemical profile:

  • arginine (R), as an amino acid harboring a positively charged side chain
  • aspartic acid (D), as an amino acid harboring a negatively charged side chain
  • serine (S), as an amino acid harboring a polar but uncharged side chain
  • cysteine (C), as an amino acid harboring a sulfur-containing side chain
  • glycine (G), as the only achiral amino acid
  • proline (P), because it has a straight-up weird structure and belongs to its own category
  • leucine (L), as an amino acid harboring a non-aromatic hydrophobic side chain and
  • tryptophan (W), as an amino acid harboring an aromatic and spatially-challenging-to-accommodate hydrophobic side chain.

Based on this rationale, instead of four, I tried 32 different peptides on the AlphaFold3 server, eight for each peptide template. After screening each peptide template with the eight amino acids selected above, I proceeded with the variant that produced the highest ipTM score in each case. The results of this process are displayed in Table 5.2. I also experimented with simulating multiple copies of the same selected peptide interacting with SOD1 A4V. Nevertheless, this introduced a new level of complexity, as the model then also had to account for peptide-peptide interactions, so I opted not to continue with this approach.

Table 5.2 Peptide binders selected after a binding-based screening on AlphaFold3 for the A4V mutant of SOD1, along with their ipTM scores and including the known SOD1-binding peptide FLYRWLPSRRGG (iteration 1).

| # | Peptide sequence | Control or test | ipTM |
|---|---|---|---|
| 0 | FLYRWLPSRRGG | Control | 0.89 |
| 1 | WLSPATVAARKC | Test | 0.90 |
| 2 | WRYGAVGAKLWC | Test | 0.90 |
| 3 | HRYVWTAARHKW | Test | 0.90 |
| 4 | WRYGVAGVAHKW | Test | 0.91 |
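The bookkeeping for this screen can be sketched as follows: each PepMLM template has a single placeholder `X`, which is replaced with every residue in the eight-residue panel, giving 32 sequences, and the best variant per template is then picked from the recorded scores. The sketch does not call AlphaFold Server; it assumes the ipTM values are entered by hand, and the function names are hypothetical.

```python
# Residue panel chosen in the text: R, D, S, C, G, P, L, W.
PANEL = "RDSCGPLW"


def fill_placeholder(template: str, panel: str = PANEL, placeholder: str = "X"):
    """Replace the single placeholder residue in a template with every residue in the panel."""
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return [template.replace(placeholder, aa) for aa in panel]


def best_variant(scores: dict) -> str:
    """Return the variant with the highest score (e.g. an ipTM value entered by hand)."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    templates = ["WLSPATVAARKX", "WRYGAVGAKLWX", "HRYVWTAARHKX", "WRYGVAGVAHKX"]
    variants = {t: fill_placeholder(t) for t in templates}
    print(sum(len(v) for v in variants.values()))  # 32 sequences to submit
```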

Upon obtaining the four peptide binders presented in Table 5.2, I wondered whether I could further improve them by replacing the final amino acid with a different one with similar biochemical properties, which I had arbitrarily chosen not to include in the first iteration of my binding screen. For this second iteration, I replaced cysteine with methionine (M) in the first two peptides and tryptophan with phenylalanine (F) and tyrosine (Y) in the third and fourth PepMLM-generated peptides. As a result, the first peptide was further optimized, with its ipTM score increasing from 0.90 to 0.91 when methionine was its final residue (Table 5.3).

Table 5.3 Peptide binders selected after a binding-based screening on AlphaFold3 for the A4V mutant of SOD1, along with their ipTM scores and including the known SOD1-binding peptide FLYRWLPSRRGG (iteration 2).

| # | Peptide sequence | Control or test | ipTM |
|---|---|---|---|
| 0 | FLYRWLPSRRGG | Control | 0.89 |
| 1 | WLSPATVAARKM | Test | 0.91 |
| 2 | WRYGAVGAKLWC | Test | 0.90 |
| 3 | HRYVWTAARHKW | Test | 0.90 |
| 4 | WRYGVAGVAHKW | Test | 0.91 |

All peptides (0 - 4, as numbered in Table 5.3) appear to localize near the N-terminus of both monomers in the homodimer (Figure 5.1C-G). They also seem to approach the dimer interface, rather than engage the β-barrels, and they are all surface-bound (Figure 5.1C-G).

Figure 5.1 3D visualizations of SOD1 and SOD1 A4V, alone and in complex with known and PepMLM-generated binding peptides, as computed by the AlphaFold3 server: (A) SOD1, (B) SOD1 A4V, (C) SOD1 A4V with the known binding peptide FLYRWLPSRRGG (peptide 0), (D) SOD1 A4V with the binding peptide WLSPATVAARKM (peptide 1), (E) SOD1 A4V with the binding peptide WRYGAVGAKLWC (peptide 2), (F) SOD1 A4V with the binding peptide HRYVWTAARHKW (peptide 3), and (G) SOD1 A4V with the binding peptide WRYGVAGVAHKW (peptide 4). Figure generated with AlphaFold3.

All four PepMLM-generated peptides shown in Table 5.3 were assigned ipTM scores >0.8, indicating confident, high-quality predictions of the complex’s structure by the model. More importantly, peptides 1 - 4 received higher ipTM scores than the known SOD1-binding peptide FLYRWLPSRRGG, meaning that the model is more confident about the predicted relative positions of the components of these complexes, although the differences are small (0.02 at most). Peptides 1 and 4 in particular scored 0.91, compared to 0.89 for peptide 0, making them promising alternatives at this stage of the analysis. As an additional measure of the peptides’ performance, I also calculated their combined score, defined as 0.8 x ipTM + 0.2 x pTM, to take their pTM metrics into account too 3. Once again, peptides 1 - 4 consistently scored higher than FLYRWLPSRRGG: peptides 2 and 3 received a combined score of 0.904 and peptides 1 and 4 a score of 0.914, compared to 0.896 for peptide 0.
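Because the combined score is a fixed weighted sum, it can also be inverted to recover the pTM values behind the reported numbers. The sketch below assumes that the ipTM and combined scores quoted above are the only inputs: the pTM values it prints are back-calculated (and only as precise as the two- and three-decimal rounding allows), not read from the AlphaFold3 output. The function names are hypothetical.

```python
def combined_score(iptm, ptm, w_iptm=0.8, w_ptm=0.2):
    """Weighted confidence score: 0.8 x ipTM + 0.2 x pTM."""
    return w_iptm * iptm + w_ptm * ptm


def implied_ptm(combined, iptm, w_iptm=0.8, w_ptm=0.2):
    """Invert the formula to recover the pTM behind a reported combined score."""
    return (combined - w_iptm * iptm) / w_ptm


# (ipTM, combined score) for peptides 0 - 4, as reported in the text (Table 5.3).
REPORTED = {0: (0.89, 0.896), 1: (0.91, 0.914), 2: (0.90, 0.904),
            3: (0.90, 0.904), 4: (0.91, 0.914)}

for pid, (iptm, combined) in REPORTED.items():
    ptm = implied_ptm(combined, iptm)
    print(f"peptide {pid}: implied pTM ~ {ptm:.2f}, "
          f"recomputed combined score = {combined_score(iptm, ptm):.3f}")
```

With these inputs, every peptide's implied pTM lands at about 0.92 - 0.93, so the ranking by combined score is driven almost entirely by the ipTM term.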

C. Evaluate properties of generated peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:

  1. Paste the peptide sequence.
  2. Paste the A4V mutant SOD1 sequence in the target field.
  3. Check the boxes:
  • Predicted binding affinity
  • Solubility
  • Hemolysis probability
  • Net charge (pH 7)
  • Molecular weight

Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties? Choose one peptide you would advance and justify your decision briefly.

To better simulate the binding target, which in this case is a metal-binding protein homodimer, I entered two copies of the SOD1 A4V amino acid sequence in the designated field of the PeptiVerse interface. The predicted properties of the PepMLM-generated peptides, along with those of the known binder (peptide 0), are summarized in Table 5.4.

Table 5.4 SOD1 A4V-binding peptides’ properties as predicted by PeptiVerse, with the peptides either being known binders or having been generated with PepMLM.

| # | Peptide sequence | Solubility | Haemolysis | Binding affinity (pKd/pKi) | Molecular weight (Da) | Net charge (pH 7) | Isoelectric point | Hydrophobicity (GRAVY) |
|---|---|---|---|---|---|---|---|---|
| 0 | FLYRWLPSRRGG | Soluble (100%) | Non-haemolytic (95.3%) | Weak (6.098) | 1,507.7 | 2.76 | 11.71 | -0.71 |
| 1 | WLSPATVAARKM | Soluble (100%) | Non-haemolytic (98.4%) | Weak (5.368) | 1,330.6 | 1.76 | 11.00 | 0.24 |
| 2 | WRYGAVGAKLWC | Soluble (100%) | Non-haemolytic (92.4%) | Medium (7.831) | 1,409.7 | 1.75 | 9.31 | 0.15 |
| 3 | HRYVWTAARHKW | Soluble (100%) | Non-haemolytic (98.3%) | Weak (5.572) | 1,610.8 | 2.93 | 11.00 | -1.28 |
| 4 | WRYGVAGVAHKW | Soluble (100%) | Non-haemolytic (96.8%) | Medium (7.799) | 1,429.6 | 1.85 | 9.99 | -0.29 |

According to PeptiVerse’s predictions in Table 5.4, all screened peptides are soluble (100% probability) and non-haemolytic (>92% probability). Higher ipTM scores also do not appear to correlate with stronger predicted binding affinity to the protein variant. This is especially the case for peptide 1, which, despite receiving a 0.91 ipTM score, is predicted to bind the target only weakly. Based on these data, I would proceed with one of the two peptides that displayed medium binding affinity, namely peptide 2 or peptide 4. Since the difference between their predicted binding affinities is small (7.831 versus 7.799 pKd/pKi), I would advance peptide 4, which was assigned a higher ipTM score by AlphaFold3 (0.91 versus 0.90) and has a higher probability of being non-haemolytic (96.8% versus 92.4%).
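The comparison between structural confidence and predicted affinity, and the reasoning behind choosing peptide 4, can be sketched as follows. The values are transcribed from Tables 5.3 and 5.4. `pick_candidate` and the order of its criteria (acceptable affinity class, then ipTM, then probability of being non-haemolytic) are an illustrative assumption that mirrors the decision above, not a PeptiVerse feature. With only five peptides, the Pearson coefficient is descriptive, not statistical evidence.

```python
import math

# Values transcribed from Table 5.3 (ipTM) and Table 5.4 (PeptiVerse predictions).
PEPTIDES = {
    0: dict(seq="FLYRWLPSRRGG", iptm=0.89, pkd=6.098, affinity="Weak", non_haem=95.3, control=True),
    1: dict(seq="WLSPATVAARKM", iptm=0.91, pkd=5.368, affinity="Weak", non_haem=98.4, control=False),
    2: dict(seq="WRYGAVGAKLWC", iptm=0.90, pkd=7.831, affinity="Medium", non_haem=92.4, control=False),
    3: dict(seq="HRYVWTAARHKW", iptm=0.90, pkd=5.572, affinity="Weak", non_haem=98.3, control=False),
    4: dict(seq="WRYGVAGVAHKW", iptm=0.91, pkd=7.799, affinity="Medium", non_haem=96.8, control=False),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pick_candidate(peptides, acceptable=("Medium", "Tight")):
    """Among test peptides with an acceptable affinity class, prefer higher ipTM,
    then higher probability of being non-haemolytic."""
    pool = [pid for pid, p in peptides.items()
            if not p["control"] and p["affinity"] in acceptable]
    return max(pool, key=lambda pid: (peptides[pid]["iptm"], peptides[pid]["non_haem"]))


ids = sorted(PEPTIDES)
r = pearson([PEPTIDES[i]["iptm"] for i in ids], [PEPTIDES[i]["pkd"] for i in ids])
print(f"Pearson r(ipTM, pKd) = {r:.2f} over {len(ids)} peptides")
print("Peptide to advance:", pick_candidate(PEPTIDES))
```

The coefficient comes out weak and positive, consistent with the observation that ipTM alone does not predict affinity here.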

D. Generate optimized peptides with moPPIt

Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.

  1. After opening the moPPIt Colab linked from the HuggingFace moPPIt model card, make a copy and switch to a GPU runtime.
  2. In the notebook:
  • Paste your A4V mutant SOD1 sequence.
  • Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
  • Set peptide length to 12 amino acids.
  • Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
  3. After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?

For the generation of binding peptides through moPPIt, I assigned a weight of 2 to the “Haemolysis” and “Solubility” criteria, as I considered them less significant than “Specificity”, which I weighted at 3, and than “Affinity” and “Motif”, whose weights I increased to 7, since these two are the principal factors determining the structure of the peptide binders in this assignment. For the “Motif” criterion specifically, I designated residues 1 - 10, 106 - 112, and 140 - 153 as the positions to prioritize. The first region contains the A4V mutation site, while the second seems to be involved in the formation of two short β-sheets beneath the β-barrel regions, which are present in the mutant variant but not in the wild-type one (Figure 5.1B compared to A). Additionally, the C-terminus of the protein, represented here by residues 140 - 153, appears to be the main interface for homodimer formation and stabilization, hence its inclusion as a region that should guide the generation of binding peptides. The newly designed peptides are shown below (Table 5.5, Figure 5.2).
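The moPPIt settings described above can be written down as a small configuration sketch, which makes it easy to check which residues the motif guidance covers and how the total guidance weight is split. The dictionary keys and helper functions are hypothetical and do not reproduce the moPPIt Colab's own variables; in particular, whether the notebook expects 0- or 1-based residue indices is an assumption to verify before entering the indices.

```python
# Hypothetical representation of the moPPIt settings used above; the real Colab
# may use different variable names and index conventions.
GUIDANCE_WEIGHTS = {"Haemolysis": 2, "Solubility": 2, "Specificity": 3,
                    "Affinity": 7, "Motif": 7}
MOTIF_RANGES = [(1, 10), (106, 112), (140, 153)]  # 1-based, inclusive
PEPTIDE_LENGTH = 12


def expand_ranges(ranges, zero_based=False):
    """Turn inclusive (start, end) ranges into a flat list of residue indices."""
    shift = 1 if zero_based else 0
    return [i - shift for start, end in ranges for i in range(start, end + 1)]


def relative_weights(weights):
    """Fraction of the total guidance weight carried by each objective."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


motif = expand_ranges(MOTIF_RANGES)
# Residues 1 - 10 cover the A4V site whether or not the initiator Met is counted.
print(len(motif), "motif residues; A4V site included:", 4 in motif)
print({name: round(share, 2) for name, share in relative_weights(GUIDANCE_WEIGHTS).items()})
```

Seen as fractions, "Affinity" and "Motif" each carry one third of the total weight, which matches the intent of prioritizing binding location and strength.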

Table 5.5 Peptide binders for the A4V mutant of SOD1 generated by moPPIt, along with their ipTM scores.

| # | Peptide sequence | ipTM |
|---|---|---|
| 1 | KKKCGVLVVVHD | 0.89 |
| 2 | AVTMKRKPLFCQ | 0.92 |
| 3 | PKSQKVKTCVAQ | 0.89 |

Figure 5.2 3D visualizations of SOD1 A4V interacting with moPPIt-generated binding peptides as computed by the AlphaFold3 server: (A) SOD1 A4V with the binding peptide KKKCGVLVVVHD (peptide 1_moPPIt), (B) SOD1 A4V with the binding peptide AVTMKRKPLFCQ (peptide 2_moPPIt), and (C) SOD1 A4V with the binding peptide PKSQKVKTCVAQ (peptide 3_moPPIt). Figure generated with AlphaFold3.

Before advancing any of the moPPIt-generated peptides to clinical studies, I would first evaluate them with PeptiVerse to assess their biochemical properties and screen for possible unintended effects. The results of this analysis are presented in Table 5.6.

Table 5.6 SOD1 A4V-binding peptides’ properties as predicted by PeptiVerse, with the peptides having been generated with moPPIt.

| # | Peptide sequence | Solubility | Haemolysis | Binding affinity (pKd/pKi) | Molecular weight (Da) | Net charge (pH 7) | Isoelectric point | Hydrophobicity (GRAVY) |
|---|---|---|---|---|---|---|---|---|
| 1 | KKKCGVLVVVHD | Soluble (100%) | Non-haemolytic (95.7%) | Tight (9.909) | 1,324.6 | 1.84 | 9.20 | 0.36 |
| 2 | AVTMKRKPLFCQ | Soluble (100%) | Non-haemolytic (98.0%) | Weak (6.635) | 1,421.8 | 2.78 | 10.06 | -0.09 |
| 3 | PKSQKVKTCVAQ | Soluble (100%) | Non-haemolytic (98.4%) | Medium (8.371) | 1,316.6 | 2.95 | 9.81 | -0.76 |
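The hydrophobicity column in Tables 5.4 and 5.6 is a GRAVY (grand average of hydropathy) score, i.e. the mean Kyte-Doolittle hydropathy value over all residues of the peptide. A short sketch of that calculation follows; with the standard Kyte-Doolittle scale it agrees with the tabulated GRAVY values (to within rounding), although how PeptiVerse computes the column internally is assumed here rather than documented. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


for pep in ["KKKCGVLVVVHD", "AVTMKRKPLFCQ", "PKSQKVKTCVAQ"]:
    print(pep, f"{gravy(pep):.2f}")
```

Positive values indicate an overall hydrophobic peptide (such as KKKCGVLVVVHD, driven by its four valines), negative values an overall hydrophilic one.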

  1. Stevens JC, Chia R, Hendriks WT, et al. Modification of superoxide dismutase 1 (SOD1) properties by a GFP tag–implications for research into amyotrophic lateral sclerosis (ALS). PLoS ONE. 2010;5(3):e9541. doi:10.1371/journal.pone.0009541

  2. Chen T, Dumas M, Watson R, et al. PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling. arXiv. August 11, 2024. doi:10.48550/arxiv.2310.03842

  3. Omidi A, Møller MH, Malhis N, Bui JM, Gsponer J. AlphaFold-Multimer accurately captures interactions and dynamics of intrinsically disordered protein regions. Proc Natl Acad Sci USA. 2024;121(44):e2406407121. doi:10.1073/pnas.2406407121

Week 6 homework

Genetic circuits-Part I: Assembly technologies 🧩

DNA Assembly

Answer these questions about the protocol in this week’s lab:
1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The components in the Phusion High-Fidelity PCR Master Mix, along with their purpose, are the following:

  • Phusion PCR buffer: The buffer solution ensures optimal salt concentration and pH conditions for the amplification reaction, while also providing the Mg2+ ions needed for the polymerase’s catalytic activity.
  • Deoxyribonucleotides (dNTPs): This is a mix of A, T, G, and C nucleotides in equal concentrations (to avoid biased incorporation) that provides the building blocks for the synthesis of the nascent DNA strands.
  • Phusion polymerase: This is the enzyme that, given a free 3’-OH group and a complementary template strand, catalyzes the actual amplification step, in which new strands are synthesized.
  • Dimethyl sulfoxide (DMSO): DMSO can be added to especially challenging amplification reactions, in particular when the template sequence is GC-rich. A high GC content can lead to incomplete denaturation, as well as to the formation of secondary structures that lower PCR efficiency; therefore, a denaturing aid such as DMSO, which lowers the DNA melting temperature and reduces non-specific binding, can make amplification of the target sequence more effective.
  • Nuclease-free water: If needed, nuclease-free water can be added to the PCR solution to reach the final volume of the reaction. This ensures that all ingredients are at the appropriate concentration for the amplification to happen and that no contaminating nucleases compromise the integrity of the template DNA or the nascent strands. The DNA template and the primers (which are usually custom-designed) are provided by the researcher.
2. What are some factors that determine primer annealing temperature during PCR?

The two primary factors determining primer annealing temperature are the primer’s length and its GC content. In general, the longer a primer, the higher its annealing temperature. The same holds true for a higher GC percentage, since in both cases more hydrogen bonds form between the template strand and the primer (three per G-C pair versus two per A-T pair), making the annealing stronger and more stable. These two factors also strongly influence the formation of secondary structures in primer molecules and of primer dimers, which in turn affect PCR efficiency. However, the degree to which these two parameters influence primer annealing temperature is relative. For instance, a primer designed to introduce several point mutations at short distances from one another (a “garland primer”) or a primer designed to insert an entirely new short fragment into a sequence (a primer with a 5’ overhang), although seemingly longer, does not necessarily require a higher annealing temperature, at least not in the first cycles of the PCR, when the template DNA molecules outnumber the newly synthesized sequences and only the matching part of the primer can anneal. Similarly, apart from the overall GC content, the distribution of Gs and Cs in the primer can modulate its annealing point: primers with runs of more than two or three consecutive Gs and Cs bind more firmly to their complementary strand at that point and, thus, display higher Tms than primers in which Gs and Cs are evenly spaced out and separated by short stretches of As and Ts. The need for higher primer annealing temperatures can be alleviated by adding denaturing agents, such as DMSO as explained previously, which lower the effective primer Tm. Furthermore, the salt concentration of the reaction, through its ionic strength, as well as the concentration of the primers themselves, can influence the annealing temperature too. Higher salt concentrations shield the electrostatic repulsion between the negatively charged phosphate backbones and stabilize the primer-template duplex, while higher primer concentrations favor non-specific binding; in both cases, a higher annealing temperature may be needed.
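The length and GC-content effects described above can be illustrated with two classic rule-of-thumb formulas: the Wallace rule (2 °C per A/T and 4 °C per G/C, meant for short oligos) and the basic GC formula Tm = 64.9 + 41 x (nGC - 16.4) / N for longer primers. This is a rough sketch: neither formula accounts for salt, primer concentration, or DMSO, and polymerase-specific calculators (for example, NEB's Tm Calculator for Phusion) use nearest-neighbor thermodynamics and will give different values. The primer sequence and function names below are hypothetical.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def tm_wallace(primer):
    """Wallace rule: 2 °C per A/T plus 4 °C per G/C (short oligos only)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def tm_basic(primer):
    """Basic GC formula for longer primers: 64.9 + 41 * (nGC - 16.4) / N."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(p)


def longest_gc_run(primer):
    """Length of the longest uninterrupted stretch of G/C bases."""
    best = current = 0
    for base in primer.upper():
        current = current + 1 if base in "GC" else 0
        best = max(best, current)
    return best


primer = "ATGACCGGTTACGCATTAGC"  # hypothetical 20-mer
print(f"GC = {gc_content(primer):.0%}, Wallace Tm = {tm_wallace(primer)} °C, "
      f"basic Tm = {tm_basic(primer):.1f} °C, longest G/C run = {longest_gc_run(primer)}")
```

`longest_gc_run` is included because, as noted above, clustered Gs and Cs matter in addition to the overall GC percentage.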

3. There are two methods from this class that create linear fragments of DNA: PCR and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR and restriction enzyme digests show both similarities and differences. In terms of their protocols, a PCR requires a thermocycler to cycle the temperature through the different stages of the reaction, whereas a restriction digest is incubated at a single temperature dictated by the restriction endonuclease (optionally followed by a higher temperature, typically 65°C or 80°C depending on the enzyme, to heat-inactivate it). Additionally, the same three steps (denaturation, annealing, and extension) have to be repeated 25 - 30 times in a PCR, which generally makes it more time-consuming than a restriction digest, which involves no repeated steps. Apart from the DNA template, the buffer, and the respective enzyme, which both techniques share, PCR needs more reagents than a restriction digest, including the primers, the dNTPs, and, where needed, GC enhancers (like DMSO). Another difference lies in heat tolerance: restriction endonucleases used in digests lack it (as highlighted above, they can be easily inactivated by raising the reaction temperature), whereas DNA polymerases used in PCR are chosen precisely because they withstand temperatures close to water’s boiling point and have even been optimized for this property. On the other hand, there is a much wider variety of restriction enzymes available for digests than of commercially available DNA polymerases. Lastly, while PCR requires appropriate primers to be designed in advance and restriction digests require the selected endonuclease’s recognition site to already exist in the DNA template, the products of both processes usually need column-based purification before being used in downstream applications.

Regarding their general use and purpose, both of these fundamental Molecular Biology techniques generate linear DNA fragments. Nonetheless, PCR amplicons produced with a proofreading polymerase such as Phusion have blunt ends (Taq polymerase instead leaves a single 3’ A overhang), whereas DNA fragments from restriction digests can also have 5’ or 3’ overhangs. In many cases, both techniques are crucial for cloning designed DNA sequences, as they can isolate a particular DNA segment with great specificity and efficiency. Both techniques can also be employed to confirm that the correct DNA fragment has indeed been inserted into an assembled construct after cloning (for example, through colony PCR or a diagnostic digest whose products are visualized with agarose gel electrophoresis), and they are often combined, for instance, to carry out a Gibson assembly or to degrade the remaining plasmid template through DpnI digestion after a PCR and before a bacterial transformation. By contrast, PCR alone would be the preferred technique when the goal of the experiment is to insert a point mutation or a short new fragment into a known DNA sequence (with suitable primers) or when the ultimate purpose of a reaction is simply to amplify DNA. PCR is a synthetic technique, yielding a much higher amount and concentration of in vitro-synthesized nucleic acid than the starting material, which is, of course, not the case for restriction enzyme digestion, which only cuts the DNA it is given, whether that DNA was produced in vitro or in vivo. Restriction enzyme digestion, in turn, can be used when DNA needs to be broken down, for example, to degrade the plasmid template molecules after a PCR as mentioned above, or for other cloning methods, such as Golden Gate assembly, which, traditionally, does not involve any PCR steps.
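The contrast above can be made concrete with an in-silico sketch: a digest can only cut where a recognition site already exists, whereas a PCR product is defined by where the two primers bind. The sketch assumes exact-match primers without 5' overhangs and a palindromic enzyme (EcoRI, G^AATTC, by default), so only the top strand needs to be scanned; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def digest_linear(seq, site="GAATTC", cut_offset=1):
    """Top-strand fragments of a linear molecule cut at every occurrence of a
    palindromic site; defaults model EcoRI (G^AATTC, 4-nt 5' overhang)."""
    seq = seq.upper()
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def pcr_product(template, fwd, rev):
    """Amplicon between exact-match primer sites; rev is given 5'->3' as ordered.
    Returns None if either primer has no binding site."""
    template = template.upper()
    start = template.find(fwd.upper())
    rev_site = reverse_complement(rev)  # where the reverse primer binds, on the top strand
    end = template.find(rev_site, start)
    if start == -1 or end == -1:
        return None
    return template[start:end + len(rev_site)]


print(digest_linear("AAGAATTCTTGAATTCCC"))
print(pcr_product("TTTATGCCCGGGAAATTT", "ATGC", "ATTT"))
```

The digest returns whatever fragments the existing sites dictate, while the PCR output is fully determined by primer placement, which is why PCR is the more flexible option when no convenient sites exist.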

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

The first and most crucial step to ensure that PCR-amplified and digested DNA sequences are appropriate for Gibson cloning is to take as much time as needed to design them correctly, in particular the overlapping ends shared by adjacent fragments. During this phase, the design can be verified by simulating the assembly in silico in software like Benchling. On a more practical note, apart from carefully following the steps of the designated protocols, PCR-amplified and digested DNA sequences should be purified with a column-based method and, ideally, have their concentration measured (for example, with a NanoDrop instrument) before any downstream processes. Lastly, before proceeding with the cloning, it is always a good idea to visualize the DNA amplicons and digested segments through agarose gel electrophoresis, to confirm that they were indeed generated and have the expected size, as an indication that no mistakes were made in previous steps.
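Checking the overlaps between neighboring fragments is one concrete way to verify a Gibson design in silico before ordering primers. A minimal sketch follows; the 15-nt minimum is an assumed default (recommended overlap lengths depend on the kit), only exact top-strand matches are checked, and the helper names are hypothetical.

```python
def end_overlap(left, right, min_len=15, max_len=60):
    """Longest k (min_len <= k <= max_len) such that the 3' end of `left`
    equals the 5' start of `right` on the top strand; 0 if there is none."""
    left, right = left.upper(), right.upper()
    for k in range(min(max_len, len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def check_gibson_design(fragments, circular=True, min_len=15):
    """Return (i, j, overlap) for every junction; an overlap of 0 flags a problem."""
    n = len(fragments)
    pairs = [(i, i + 1) for i in range(n - 1)]
    if circular:
        pairs.append((n - 1, 0))  # closing junction of a circular plasmid
    return [(i, j, end_overlap(fragments[i], fragments[j], min_len)) for i, j in pairs]


# Hypothetical fragments sharing a 20-nt overlap at the first junction only.
shared = "GATTACAGATTACAGATTAC"
for i, j, k in check_gibson_design(["AAAA" + shared, shared + "CCCC"]):
    print(f"fragment {i} -> fragment {j}: {k} nt overlap")
```

A zero at any junction (here, the closing junction) signals that a primer tail needs to be redesigned before the assembly can work.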

5. How does the plasmid DNA enter the E. coli cells during transformation?

For plasmid DNA to successfully enter E. coli cells during transformation, the plasmid molecules first have to be brought close to the surface of the bacterial cells. Several chemical agents, such as MgCl2 and polyethylene glycol (PEG) 8000, are used to this end during the preparation of competent bacterial cells (bacteria that have been rendered able to take up exogenous DNA). In more detail, Mg2+ cations from MgCl2 help neutralize the negative charges of DNA’s phosphate backbone, allowing plasmids to approach and remain close to the bacterial cell walls, while extremely hydrophilic compounds, such as PEG, help remove the aqueous coating of plasmid molecules, once again eliminating barriers between the foreign DNA and the surface of the cells. This is further facilitated by keeping the cells about to be transformed on ice for approximately 10 min. Once the plasmids are in position, a heat shock or an electric shock transiently disrupts the integrity of the bacterial cell wall and cell membrane, allowing the circular DNA to enter the cell. After the shock, bacteria are left to recover and repair their membranes, with a fraction of the cells having acquired the additional piece of DNA.

6. Describe another assembly method in detail (such as Golden Gate Assembly).
  • Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
  • Model this assembly method with Benchling or Asimov Kernel!

For this section, I decided to describe Golden Gate assembly, as it is one of the methods I am most familiar with. It is a versatile, highly efficient, one-pot, scarless cloning method that allows joining up to 30 DNA segments in a single reaction. As visual aids, I will use two slides from a workshop presentation we organized with other members of iGEM Athens 2022 for the 18th Autumn Assembly of the European Pharmaceutical Students Association (EPSA) held in Athens in November 2022.

A specific category of restriction endonucleases, namely type IIS restriction enzymes (such as BsaI and BsmBI), constitutes the basis of Golden Gate assembly. What distinguishes type IIS endonucleases from traditional restriction enzymes is that they recognize a specific, non-palindromic sequence of nucleotides but cleave DNA a fixed number of nucleotides downstream of (and therefore outside) their recognition site. Because the bases at the cleavage site can be chosen freely, this property enables the generation of custom four-nucleotide overhangs using a single restriction enzyme (Figure 6.1).

Figure 6.1 Slide from a workshop presentation of iGEM Athens 2022 for the 18th Autumn Assembly of EPSA explaining the mechanism of function for traditional restriction enzymes compared to type IIS endonucleases.

Because custom overhangs are created by just one restriction endonuclease, complex DNA constructs consisting of up to 30 distinct parts can be arranged in a specific predesigned order and assembled with an extremely low error rate in a one-pot reaction. If the individual DNA segments are designed correctly, meaning with recognition sites “outward” of the cleavage sites for inserts and the inverse for the selected backbone (recognition sites “inward” of the cleavage sites), all parts should be assembled seamlessly (without leaving any restriction sites behind) in the right configuration (Figure 6.2). Besides the orientation of the recognition and cleavage sites, another design prerequisite is that insert fragments must not contain internal recognition sites for the chosen type IIS endonuclease; any such sites have to be removed beforehand, for example via site-directed mutagenesis.
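The overhang logic described above can be simulated for BsaI, whose recognition site GGTCTC is followed by a cut 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt 5' overhang. The sketch below (hypothetical helper names, top-strand coordinates only) reports the overhangs a sequence would produce and counts the internal sites that would have to be removed before assembly.

```python
BSAI_SITE = "GGTCTC"     # BsaI recognition sequence, cuts GGTCTC(N1/N5)
BSAI_SITE_RC = "GAGACC"  # the same site read on the top strand in reverse orientation


def find_all(seq, motif):
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def bsai_overhangs(seq):
    """Top-strand sequence of the 4-nt overhangs BsaI would generate."""
    seq = seq.upper()
    overhangs = []
    for i in find_all(seq, BSAI_SITE):        # site points right: cuts downstream
        end = i + len(BSAI_SITE)
        if end + 5 <= len(seq):
            overhangs.append(seq[end + 1:end + 5])
    for j in find_all(seq, BSAI_SITE_RC):     # site points left: cuts upstream
        if j - 5 >= 0:
            overhangs.append(seq[j - 5:j - 1])
    return overhangs


def internal_bsai_sites(insert):
    """Number of BsaI sites (either orientation) inside an insert."""
    insert = insert.upper()
    return len(find_all(insert, BSAI_SITE)) + len(find_all(insert, BSAI_SITE_RC))


# Hypothetical insert: outward-facing BsaI sites releasing AATG and GCTT overhangs.
part = "GGTCTCA" + "AATG" + "CCCCCCCC" + "GCTT" + "T" + "GAGACC"
print(bsai_overhangs(part), internal_bsai_sites(part[6:-6]))
```

Listing the overhangs of every part makes it easy to confirm that each one is unique and pairs only with its intended neighbor.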

Figure 6.2 Slide from a workshop presentation of iGEM Athens 2022 for the 18th Autumn Assembly of EPSA illustrating how Golden Gate assembly enables building complex genetic circuits in a one-pot reaction.

Apart from the selected type IIS restriction enzyme(s), T4 DNA ligase is included in the master mix too, so that digestion and ligation occur in a single tube, typically through 30 - 60 consecutive cycles at the appropriate temperatures in a standard thermocycler. This protocol ensures that molecules of the desired final construct accumulate over time, as they no longer contain type IIS recognition sites, whereas incorrect products that retain them are re-digested. This recycling property ensures high efficiency, which, along with all the aforementioned advantages of Golden Gate assembly, makes this cloning method ideal for various Synthetic Biology applications.

Week 7 homework

Genetic circuits-Part II: Neuromorphic circuits 🧠

Week 9 homework

Cell-free systems 🧪

General homework questions

1. Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Compared to conventional in vivo methods, cell-free protein synthesis provides modularity and substantially higher experimental control, as all the system’s components can be readily added or removed, especially when the strategy employed is to separately produce or extract each cellular element required for the process and then combine them all together into a single reaction. Cell-free systems also offer the potential for precise control over reaction conditions, such as pH and ion concentration, while being more flexible and versatile since they allow the expression of proteins deleterious to living cells, support the integration of non-natural and non-canonical amino acids into peptide backbones, and are compatible with diverse DNA templates (linear or plasmid). Additionally, they eliminate constraints imposed by the existence of living cells. For instance, unlike traditional cell cultures, they do not need any monitoring, cultivating, or other interventions aimed at preservation, nor are they susceptible to issues of cell viability, growth limits, or stress responses. Similarly, since the cell-free apparatus exists outside of the context of a cellular platform, there are no cell-membrane barriers, facilitating access to biochemical reactions, while, at the same time, there is no interference or competition from other metabolic procedures or regulatory signals, enabling all the available resources to be channeled towards the synthesis of the desired protein. The absence of living cells can be translated into abolishing the need for cloning and cellular transformation as well, which, in turn, ensures safer handling, as no genetically modified organisms are involved in cell-free protein production. 
More generally, one of the method’s most significant advantages is that it is a highly efficient technique for rapid protein synthesis that can also withstand being transferred across larger distances for longer periods of time, as the entire system can be easily freeze-dried and stored for later use.

For more tangible examples, more specific cases where cell-free expression is more beneficial than cell-dependent protein production are presented below:

  • In theranostic applications, where the system has to be implanted close to or inside the human body. Since no living cells are involved, whose components could potentially be recognized as harmful agents, the probability of a toxic immune or allergic reaction is expected to be lower.
  • In experiments conducted to study the foundations of transcription and translation. The isolation of a cell-free platform ensures the appropriate conditions to investigate gene expression mechanisms without the background noise from other cellular processes.
  • For remote field testing, as cell-free systems generally require far less infrastructure than traditional cell-based production facilities. Because of this, cell-free platforms can easily be made portable, enabling experiments to be carried out, for instance, even in space.
  • For on-demand biomanufacturing, since, not only are all the system’s resources directed to the generation of the desired product, but also cell-free systems can achieve higher titers in considerably less time (minutes to hours instead of days). Apart from the efficiency, the desired product is less contaminated with unwanted cellular metabolites, allowing for higher purity and, therefore, for the implementation of less complex purification methods.
2. Describe the main components of a cell-free expression system and explain the role of each component.

The main input of a cell-free system is a circular or linear DNA template containing the gene to be expressed under an appropriate promoter, while the principal output is the desired protein. The first step in expressing the gene is its transcription, which requires an RNA polymerase along with Mg2+ ions as essential cofactors. For the mRNA to be synthesized, the system must also contain its building blocks, namely ribonucleotides (NTPs). To translate the transcript into protein, the reaction additionally needs ribosomes, translation factors, and tRNA molecules charged by aminoacyl-tRNA synthetases with the amino acids supplied in the solution: the ribosomes assemble the peptide chain, while the tRNAs deliver each amino acid to the correct position of the nascent peptide. Lastly, the energy required for this entire machinery to function is supplied by nucleoside triphosphates, chiefly ATP (with GTP also consumed during translation), added to the cell-free system.

3. Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

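Despite their many advantages, cell-free systems have only a limited ability to regenerate energy on their own, mostly because they lack intricate cellular structures, such as intracellular compartments and membrane protein complexes. As transcription and translation rapidly consume ATP, protein synthesis slows down or stops once it is depleted, which is why it is critical to “recharge” cell-free platforms of protein production with the frequent addition of ATP. Alternatively, a secondary energy source, such as creatine phosphate with creatine kinase or phosphoenolpyruvate, can be included to regenerate ATP from ADP in situ and sustain a more continuous supply.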

The host cells are also still naturally producing their own proteins in order to stay alive. These other proteins can interfere with or delay the production of the target protein. Furthermore, once the protein has been made, it might be difficult to separate the target protein from all the other proteins and cellular components. In some cases, the target protein may itself be toxic to the host cell, making the host cells die before significant amounts of protein can be made. To help overcome all of these challenges, researchers have developed a method of expressing proteins that doesn’t require living cells, called cell-free protein expression.

There are two major strategies currently used to make cell-free reactions. The first is to prepare each component separately: some components, like nucleotides and amino acids, can be chemically synthesized, while others, such as ribosomes and polymerases, still need to be produced by living cells and then separated from them. Since scientists have to individually create and purify each component, setting up this type of cell-free reaction is still complex and costly. However, because scientists are able to individually determine every molecule that is put into the reaction, they have tremendous control over the process, which can result in high-quality proteins. The second method is to extract all the components directly from host cells at once. Scientists grow a large number of cells and then break them open through a process called lysis. In doing so, they can extract the polymerases, ribosomes, and other biological components needed for transcription and translation, and then supplement the extract with chemically synthesized nucleotides, amino acids, and an energy source. This makes the entire process much simpler and more cost-effective, but it also results in a less purified reaction, as the extract still contains many unneeded cellular components.

Since cell-free reactions don’t have cells membranes getting in the way, scientists can directly interact with and manipulate the different components in the reaction. This allows them to learn more and experiment with cellular processes that were previously too difficult to study in living cells. One example is to incorporate non-natural amino acids into the reaction. There are 20 naturally occurring amino acids, but scientists have been able to develop synthetic amino acids with unique chemical properties, and then use these non-natural amino acids to build new proteins in cell-free reactions that cannot be built in natural cells. In 2018, Kazutoyo Miura and their team used nonnatural amino acids to develop a new malaria antigen, which is a small protein that mimics a pathogen used in vaccines to “train” immune systems to fight against specific diseases. The non-natural amino acids in this antigen allow it to bind strongly to immune cells, trigger an immune response, and train them to recognize similar pathogens in the future. With many parts of the world still suffering from malaria and other diseases, we need new vaccines and treatments; using non-natural amino acids may help us discover them. Cell-free reactions don’t have cells that need to be kept alive, but they do contain sensitive molecules that require specific storage conditions. To get around this, scientists freeze-dry the reaction to make them last longer at room temperature. By freezing the reaction and then pulling all of the water out with a vacuum pump, they produce a dry solid that is stable outside of the freezer—similar to how beef left at room temperature will begin to rot, but beef jerky is stable for a long time. All the user has to do is rehydrate their reaction with water, add their DNA of interest, and transcription and translation will begin. Typically, pharmaceutical companies will produce medically-relevant proteins in large batches and ship them on ice to the patients who need them. 
However, the live-cell production and cold-shipping processes are expensive. Freeze-dried, cell-free reactions could be shipped instead, so that therapeutic proteins can be produced directly in small batches on demand, virtually anywhere in the world, at a fraction of the cost.

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Week 10 homework

Advanced imaging and measurement technology 🎞️

Waters Part I: Molecular weight

We will analyze an eGFP standard on a Waters Xevo G3 QTof MS system to determine the molecular weight of intact eGFP and observe its charge state distribution in the native and denatured (unfolded) states. The conditions for LC-MS analysis of intact protein cause it to unfold and be detected in its denatured form (due to the solvents and pH used for analysis).

1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight?

eGFP amino acid sequence with C-terminal linker and 6x-His tag

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK LE HHHHHH

Using the molecular weight calculator provided by the ExPASy portal, the molecular weight of the C-terminally tagged eGFP presented above (247 residues) was calculated to be $MW_{th} = 28006.60$ Da.
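The ExPASy value can be cross-checked by summing average residue masses over the sequence and adding one water for the free termini. This is a minimal sketch under stated assumptions: the residue-mass table is a standard set of average masses (assumed here; ExPASy's own table may differ in the last decimal places), and no modifications of any kind are included.

```python
# Average (not monoisotopic) residue masses in Da, i.e. amino acid minus H2O.
# Assumed values; ExPASy may use a slightly different table.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVG_WATER = 18.01528  # added once for the free N-terminus (H) and C-terminus (OH)

# eGFP + C-terminal "LE" linker + 6x-His tag, copied from the sequence above
EGFP_HIS = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPT"
    "LVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTL"
    "VNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLA"
    "DHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK"
    "LE"
    "HHHHHH"
)


def average_mw(seq: str) -> float:
    """Average MW of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + AVG_WATER


mw_th = average_mw(EGFP_HIS)
print(f"{len(EGFP_HIS)} residues, average MW = {mw_th:,.2f} Da")
```

With these residue masses the sum comes out at about 28,006.60 Da for the 247-residue construct, in line with the ExPASy result.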

2. Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 10.1).

Figure 10.1 Mass Spectrum of intact eGFP protein from the Waters Xevo G3 LC-MS (a mass spectrometer with 30,000 resolution) with individual charge state peaks labeled with m/z values.

2.1 Determine z for each adjacent pair of peaks (n, n+1) using the formula $z_n = \frac{(m/z)_{n+1} - 1}{(m/z)_n - (m/z)_{n+1}}$.

For the charge calculation, I chose two consecutive peaks from Figure 10.1: $(m/z)_{n+1} = 848.9758$ and $(m/z)_n = 875.4421$. The charge of state n ($z_n$) equals the number of protons carried by the protein ($n_H$), so applying the formula above gives $z_n = (848.9758 - 1)/(875.4421 - 848.9758) = 847.9758/26.4663 \approx 32.0398$. Therefore, $z_n \approx 32$ for the peak at m/z 875.4421 and $z_{n+1} = 33$ for the peak at m/z 848.9758.

2.2 Determine the MW of the protein using the relationship between $(m/z)_n$, MW, and $z_n$.

Using the charge from the previous step and the formula $MW = (m/z)_n \times z_n - z_n$, the experimentally measured MW of eGFP is $MW_{exp} = 875.4421 \times 32 - 32 \approx 27982.15$ Da.

2.3 Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from question 1 using the formula:
$\text{accuracy} = |MW_{exp} - MW_{th}|/MW_{th}$.

Applying the formula above, $|27982.15 - 28006.60|/28006.60 \approx 8.73 \times 10^{-4}$, i.e., an error of about 873 ppm.
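The adjacent charge-state calculation and the accuracy check can be reproduced in a few lines of Python. This is a minimal sketch: the helper names are hypothetical, the m/z values are the ones read from Figure 10.1, and the proton mass is approximated as 1 Da to match the recitation formulas (PROTON can be set to 1.007276 Da for the exact value).

```python
PROTON = 1.0  # recitation approximation; 1.007276 Da is the exact proton mass


def charge_from_adjacent(mz_n: float, mz_n1: float, proton: float = PROTON) -> float:
    """Charge z_n of the peak at mz_n; mz_n1 is the adjacent peak at lower m/z (z_n + 1)."""
    return (mz_n1 - proton) / (mz_n - mz_n1)


def neutral_mass(mz: float, z: int, proton: float = PROTON) -> float:
    """Solve m/z = (M + z * proton) / z for the neutral mass M."""
    return mz * z - z * proton


mz_n, mz_n1 = 875.4421, 848.9758  # two adjacent peaks read from Figure 10.1
z_raw = charge_from_adjacent(mz_n, mz_n1)
z_n = round(z_raw)  # charges are integers
mw_exp = neutral_mass(mz_n, z_n)

mw_th = 28006.60  # ExPASy average MW of the tagged construct
accuracy = abs(mw_exp - mw_th) / mw_th
print(f"z = {z_raw:.4f} -> {z_n}, MW_exp = {mw_exp:,.2f} Da")
print(f"accuracy = {accuracy:.2e} ({accuracy * 1e6:.0f} ppm)")
```

Rounding the raw charge to the nearest integer is what makes the method robust: small m/z reading errors shift z_raw slightly (here 32.04) but not the assigned charge state.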

3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP in Figure 10.1? If yes, what is it? If no, why not?

The peaks in the zoomed-in frame of Figure 10.1 are not clearly separated, probably because the resolution of the instrument is limited in this m/z range, so the charge cannot easily be determined from the zoomed-in peak alone. However, since each successive charge-state peak to the right of the $(m/z)_n$ peak carries one charge fewer than the previous one (starting from $z_n = 32$), counting these peaks places the charge of the peak shown in the close-up of Figure 10.1 at around 19.
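An alternative to counting peaks is to predict where each charge state should appear for the measured mass and compare the zoomed-in peak against that ladder. The sketch below assumes MW ≈ 27,982.15 Da (from 2.2) and a 1 Da proton, as in the recitation formula; charge_ladder is a hypothetical helper.

```python
MW_EXP = 27982.15  # Da, deconvoluted mass from the adjacent charge-state calculation
PROTON = 1.0       # recitation approximation


def charge_ladder(mw: float, z_min: int, z_max: int, proton: float = PROTON) -> dict:
    """Expected m/z for every charge state z_max..z_min, from m/z = (MW + z * proton) / z."""
    return {z: (mw + z * proton) / z for z in range(z_max, z_min - 1, -1)}


for z, mz in charge_ladder(MW_EXP, 17, 33).items():
    print(f"z = {z:2d}   m/z = {mz:8.2f}")
```

The ladder reproduces the z = 32 peak at m/z ≈ 875.44, so matching the zoomed-in peak's m/z against the listed values gives its charge without relying on peak counting.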

Waters Part II: Secondary/Tertiary structure

We will analyze eGFP in its native, folded state and compare it to its denatured, unfolded state on a quadrupole time-of-flight MS. We will be doing MS-only analysis (no liquid chromatography, also known as “direct infusion” experiments) on the Waters Xevo G3-QToF MS.

1. Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 10.2)?

Figure 10.2 Comparison of the mass spectra between denatured (top) and native (bottom) eGFP standard on the Waters Xevo G3 QTof MS.

In their native conformation, proteins are folded and fully functional, as they retain their secondary and tertiary structures. In their denatured state, however, proteins have lost their native fold due to denaturing conditions (such as heat, acidic or basic pH, or organic solvents). Because these factors disrupt the interactions between amino acid residues that stabilize the fold (hydrogen bonds, disulfide bridges, hydrophobic interactions), denaturation removes the secondary and tertiary structure of proteins, allowing them to unravel into an extended chain of amino acids (primary structure). As the protein fully or partially unfolds, a larger surface area is exposed, including regions previously hidden in cavities and grooves of the native conformation, so protons can attach to a greater number of amino acid residues (mainly to their side chains) during ionization in the mass spectrometer.

This is supported by the significantly higher number of peaks observed in the left (lower m/z) part of the denatured eGFP spectrum (Figure 10.2, top) compared to the corresponding part of the native protein’s spectrum (Figure 10.2, bottom). Since the mass of the protein is the same in both cases, peaks at lower m/z correspond to ions carrying more charges. The scarcity of such peaks in the native spectrum indicates that few ions carry a large number of charges, whereas in the denatured eGFP spectrum their number is substantially increased by the exposure of larger regions of the protein to ionization. The broader set of peaks in the top image of Figure 10.2 also shows that denatured eGFP ions carry a wide range of charges, each charge state appearing as a discrete m/z peak. In contrast, the native eGFP spectrum (Figure 10.2, bottom) presents only a few peaks, found mostly towards the right (higher m/z) part of the graph, consistent with the lower charge acquired by the more compact, folded native conformation of the protein during ionization.
Lastly, the native eGFP spectrum (Figure 10.2, bottom) includes a small number of peaks to the left, which could be attributed to eGFP molecules that were partially denatured, or even broken apart into shorter peptides, under the conditions of the analysis, and thus carry more charges per unit of mass.

2. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 10.3), can you discern the charge state of the peak at m/z = ~2,800? What is the charge state? How can you tell?

Figure 10.3 Native eGFP mass spectrum from the Waters Xevo G3 Q-Tof MS. The inset is a zoomed-in view of the charge state at m/z = ~2,600 on a mass spectrometer with 30,000 resolution.

As before, to calculate the charge state of the peak at m/z ≈ 2,800, I used the formula $z_n = \frac{(m/z)_{n+1} - 1}{(m/z)_n - (m/z)_{n+1}}$ with the two consecutive peaks near m/z 2,600 and m/z 2,800 in the spectrum of Figure 10.3, namely $(m/z)_{n+1} = 2547.4929$ and $(m/z)_n = 2799.4199$. Based on these data, $z_n = 2546.4929/251.9270 \approx 10.11$, so the charge state of native eGFP at m/z ≈ 2,800 is 10.
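The same adjacent-peak relation applied to the native spectrum can be sketched as follows, using the two m/z values from Figure 10.3 and the recitation's 1 Da proton approximation (an assumption carried over from Part I). As an extra consistency check, the sketch also computes the mass implied by the assigned charge.

```python
PROTON = 1.0  # recitation approximation

mz_n, mz_n1 = 2799.4199, 2547.4929  # peaks with charge z_n and z_n + 1 (Figure 10.3)

# z_n = ((m/z)_{n+1} - proton) / ((m/z)_n - (m/z)_{n+1})
z_raw = (mz_n1 - PROTON) / (mz_n - mz_n1)
z_n = round(z_raw)

# Mass implied by the assigned charge: M = (m/z)_n * z_n - z_n * proton
mw_native = mz_n * z_n - z_n * PROTON
print(f"z = {z_raw:.2f} -> {z_n}, implied MW = {mw_native:,.1f} Da")
```

With z = 10 the implied mass is about 27,984 Da, within roughly 2 Da of the value obtained from the denatured spectrum in Part I, which supports the charge assignment.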

Waters Part III: Peptide mapping and primary structure

We will digest the eGFP protein standard into peptides using trypsin (an enzyme that selectively cleaves the peptide bond after lysine (K) and arginine (R) residues). The resulting peptides will be analyzed on the Waters BioAccord LC-MS to measure their molecular weights, and fragmented to confirm the amino acid sequence within each peptide, generating a “peptide map”. This process is used to confirm the primary structure of the protein.

1. How many lysines (K) and arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above.

The amino acid chain of the C-terminally tagged eGFP previously presented contains 20 lysine and 6 arginine amino acid residues.

eGFP amino acid sequence with C-terminal linker and 6x-His tag

MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK LE HHHHHH

2. How many peptides will be generated from tryptic digestion of eGFP?

Using the PeptideMass tool on the ExPASy portal and following the relevant guide on the Week 10 homework page, trypsin digestion of the C-terminally 6x-His-tagged eGFP is predicted to yield 19 peptides with MW >500 Da (shown in full detail in Figure 10.4A) and 8 smaller peptides (Figure 10.4B). This gives 27 peptides in total, consistent with the 26 tryptic cleavage sites in the sequence (20 K and 6 R, none of which is followed by a proline).
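The digestion rule behind these numbers can be sketched as an in-silico tryptic digest. This is a minimal sketch, not the PeptideMass implementation: it assumes cleavage after every K or R not followed by P, no missed cleavages, unmodified cysteines, and standard monoisotopic residue masses, and it reports [M+H]+ values. It also counts the lysines and arginines asked for in question 1.

```python
# Monoisotopic residue masses (Da); assumed standard values.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

EGFP_HIS = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPT"
    "LVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTL"
    "VNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLA"
    "DHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK"
    "LE"
    "HHHHHH"
)


def trypsin_digest(seq: str) -> list:
    """Cleave after K/R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # C-terminal peptide without a final K/R
        peptides.append(seq[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER + PROTON


peptides = trypsin_digest(EGFP_HIS)
large = [p for p in peptides if mh_plus(p) > 500]
print(f"K: {EGFP_HIS.count('K')}, R: {EGFP_HIS.count('R')}")
print(f"{len(peptides)} peptides, {len(large)} with [M+H]+ > 500")
for p in sorted(large, key=mh_plus, reverse=True):
    print(f"{mh_plus(p):10.4f}  {p}")
```

Under these assumptions the sketch yields 27 peptides, 19 of them above 500. IELK and VNFK sit just above the cut-off (about 502 and 507), so a different threshold or mass convention could shift the split between panels A and B.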

Figure 10.4 Overview of the peptides that occur after the 6x-His C-terminally tagged eGFP previously demonstrated is digested with trypsin. Both peptides with MW >500Da (A) and smaller peptides (B, highlighted in black) are generated after the digestion according to the ExPASy PeptideMass tool.

3. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 10.5 as a reference), how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.

Figure 10.5 Total ion chromatogram (TIC) of the eGFP peptide map. The peak at 2.78 minutes is circled and its MS data are shown in the mass spectrum in Figure 10.6.

To determine which chromatographic peaks have >10% relative abundance, I defined the peak at 4.87 minutes (the highest peak in the chromatogram) as 100% relative abundance. Since this peak corresponds to approximately $12 \times 10^6$ ions, only peaks with an ion count of $1.2 \times 10^6$ or above were counted. Based on this criterion, the peaks between 0.5 and 6 minutes that qualify are at 0.61, 0.79, 1.43, 1.80, 1.85, 1.93, 2.17, 2.26, ~2.46, 2.54, 2.78, 3.27, 3.53, 3.59, 3.70, 4.30, 4.48, 4.64, and 4.87 minutes: 19 peaks in total.

4. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

Yes: the number of chromatographic peaks between 0.5 and 6 minutes with >10% relative abundance (19) matches the number of peptides with MW >500 Da predicted by ExPASy (19), so there are neither more nor fewer peaks.

5. Identify the mass-to-charge (m/z) of the peptide shown in Figure 10.6. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide ([M+H]+) based on its m/z and z.

Figure 10.6 Mass spectrum figure to show the m/z for the chromatographic peak at 2.78 minutes from Figure 10.5 above. The inset is a zoom-in of the peak at m/z = 525.76, to discern the isotope peaks.

In the inset of Figure 10.6, the isotope peaks of the species eluting at 2.78 minutes appear at m/z 525.76712 (leftmost peak, corresponding to the molecule’s monoisotopic mass), 526.25918 (next peak to the right), 526.76845 (following peak), and 527.26098 (rightmost peak of the spectrum). Based on these measurements, the spacing between each pair of consecutive isotope peaks is $\Delta(m/z) \approx 0.5$ (Table 10.1), which corresponds to a charge of z = 2 for the most abundant charge state of the molecule according to the formula $z = 1/\Delta(m/z)$. Finally, solving $m/z = (M + z \cdot m_{H^+})/z$ for the neutral mass M with z = 2 and the proton mass $m_{H^+} = 1.00728$ Da gives $M = 2 \times 525.76712 - 2 \times 1.00728 = 1049.51968$ Da, so the mass of the singly charged form of the peptide is $[M+H]^+ = M + m_{H^+} = 1050.52696$.

Table 10.1 Δm/z calculations for all pairs of consecutive isotope peaks in the inset of Figure 10.6.

Pair | m/z of left peak | m/z of right peak | Δm/z
1 | 525.76712 | 526.25918 | 0.49206
2 | 526.25918 | 526.76845 | 0.50927
3 | 526.76845 | 527.26098 | 0.49253

6. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is the mass accuracy of the measurement? Please calculate the error in ppm.

Comparing this [M+H]+ (1,050.52696) with the masses of the peptides expected from the tryptic digestion of eGFP, the peptide eluting at 2.78 minutes is most likely FEGDTLVNR, whose theoretical [M+H]+ is 1,050.5214 according to ExPASy (ninth row of the table in Figure 10.4A). Applying the accuracy formula from Waters Part I gives $|1050.52696 - 1050.5214|/1050.5214 \approx 5.3 \times 10^{-6}$; multiplying by $10^6$ to convert to ppm, the error of the measurement is ≈5.3 ppm, which is below the 10 ppm identification threshold for peptides. Approximating the proton mass as 1 Da instead (as in the formula used for the intact protein) would give [M+H]+ = 1,050.53424 and an error of ≈12.2 ppm, slightly above the threshold, so at the ppm level this approximation is not negligible.
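The isotope-spacing, charge, [M+H]+ and ppm-error steps for the 2.78-minute peak can be sketched in Python as follows. This is a minimal sketch using the m/z values read from Figure 10.6 and the ExPASy theoretical [M+H]+ of FEGDTLVNR; it assumes the exact proton mass (1.007276 Da) and approximates the isotope spacing as 1/z. Setting PROTON = 1.0 reproduces the 1 Da approximation, which gives [M+H]+ = 1,050.53424 and an error of about 12.2 ppm instead of about 5.3 ppm.

```python
PROTON = 1.007276  # Da; set to 1.0 to reproduce the 1 Da approximation

# Isotope peaks of the 2.78 min species, read from the inset of Figure 10.6
isotope_mz = [525.76712, 526.25918, 526.76845, 527.26098]

# Consecutive isotope peaks differ by ~1 Da in mass, i.e. ~1/z on the m/z axis
spacings = [right - left for left, right in zip(isotope_mz, isotope_mz[1:])]
mean_spacing = sum(spacings) / len(spacings)
z = round(1 / mean_spacing)

# Solve m/z = (M + z * PROTON) / z for the neutral monoisotopic mass M
neutral_mass = z * isotope_mz[0] - z * PROTON
mh_measured = neutral_mass + PROTON  # singly protonated form [M+H]+

mh_theoretical = 1050.5214  # FEGDTLVNR, ExPASy PeptideMass (Figure 10.4A)
ppm_error = abs(mh_measured - mh_theoretical) / mh_theoretical * 1e6

print(f"spacings = {[round(s, 5) for s in spacings]}, z = {z}")
print(f"[M+H]+ = {mh_measured:.5f}, error = {ppm_error:.1f} ppm")
```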

7. What is the percentage of the sequence that is confirmed by peptide mapping? (See Figure 10.7)
Figure 10.7 Amino acid coverage map of eGFP based on BioAccord LC-MS peptide identification data.

The percentage of the sequence that is confirmed by peptide mapping is 88% or 218 out of the protein’s 247 amino acids (Figure 10.7).

Waters Part IV: Oligomers

We will determine Keyhole Limpet Haemocyanin (KLH)’s oligomeric states using charge detection mass spectrometry (CDMS). CDMS single-particle measurements of KLH allow us to make direct mass measurements to determine what oligomeric states (that is, how many protein subunits combine) are present in solution. Using the known masses of the polypeptide subunits (Table 10.2) for KLH, identify where the following oligomeric species are on the spectrum shown below from the CDMS (Figure 10.8):

- 7FU decamer
- 8FU didecamer
- 8FU 3-decamer
- 8FU 4-decamer

Table 10.2 Keyhole Limpet Haemocyanin (KLH) subunit masses.

Polypeptide subunit name | Subunit mass
7FU | 340 kDa
8FU | 400 kDa

Based on the subunit masses given in Table 10.2 and the number of subunits in each oligomeric state of KLH given in the description above (with “deca”, from the Greek “δέκα”, meaning “ten”), I calculated the masses of the four oligomers as follows.

  • 7FU decamer: 340 kDa × 10 = 3,400 kDa = 3.40 MDa, corresponding to the peak at 3.40 MDa highlighted in Figure 10.8 with a black arrow.
  • 8FU didecamer: 400 kDa × 20 = 8,000 kDa = 8.00 MDa. The peak closest to this value is the one at 8.33 MDa (denoted in Figure 10.8 with a red arrow), which, however, shows a discrepancy of ~330 kDa. I could not find anything about this difference in the available literature, so my current hypothesis is that these ~330 kDa could be attributed to a peptidic, linker-like feature joining the two decamers together.
  • 8FU 3-decamer: 400 kDa × 30 = 12,000 kDa = 12.00 MDa. Once again, the closest peak is not exactly at the calculated mass but at 12.67 MDa (denoted in Figure 10.8 with a purple arrow). A substantial discrepancy between the theoretical and the measured mass of the oligomer is observed here as well, this time roughly doubled (~670 kDa ≈ 2 × 330 kDa). This further supports the hypothesis above that the additional mass corresponds to a peptide linker contributing to the association of individual decamers, since adding a third decamer to form the KLH 3-decamer would require another linker module, and therefore another ~330 kDa.
  • 8FU 4-decamer: 400 kDa × 40 = 16,000 kDa = 16.00 MDa. Following the linker hypothesis described above, a KLH 8FU 4-decamer would need three linker units to be assembled, i.e. 16.00 MDa plus an additional 3 × 330 kDa ≈ 1 MDa, for a total mass of approximately 17 MDa. Surprisingly, there are no visible peaks at 16-17 MDa in the CDMS data in Figure 10.8, only a region of very low signal (marked with a green arrow in Figure 10.8). This region could be interpreted as the 8FU 4-decamer of KLH; one possible explanation for the very low signal intensity is that the more decameric modules a multi-decamer KLH complex contains, the less stable it is. This instability would make the complex more vulnerable to the sample preparation conditions before MS detection, so that fewer intact molecules reach the detector, hence the low signal in Figure 10.8.
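The expected oligomer masses, with and without the hypothesized linker, can be tabulated with a short sketch. The 330 kDa per decamer-decamer junction comes from the linker hypothesis above and is not an established property of KLH; oligomer_mass_mda is a hypothetical helper.

```python
# Subunit masses from Table 10.2 (kDa); a decamer contains 10 subunits.
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}
# Hypothetical extra mass per decamer-decamer junction (linker hypothesis only).
LINKER_KDA = 330


def oligomer_mass_mda(subunit: str, n_decamers: int, linker_kda: float = 0.0) -> float:
    """Mass in MDa of n_decamers decamers plus (n_decamers - 1) optional linkers."""
    total_kda = 10 * n_decamers * SUBUNIT_KDA[subunit] + (n_decamers - 1) * linker_kda
    return total_kda / 1000


for subunit, n in [("7FU", 1), ("8FU", 2), ("8FU", 3), ("8FU", 4)]:
    plain = oligomer_mass_mda(subunit, n)
    linked = oligomer_mass_mda(subunit, n, LINKER_KDA)
    print(f"{subunit} x{n} decamer(s): {plain:.2f} MDa; with linkers: {linked:.2f} MDa")
```

Comparing the "with linkers" column (3.40, 8.33, 12.66, 16.99 MDa) against the arrowed peaks in Figure 10.8 is a quick way to test how well a single per-junction offset explains all the observed species.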

Figure 10.8 Mass spectrum of Keyhole Limpet Haemocyanin (KLH) acquired on the CDMS. The peaks corresponding to discrete oligomers of KLH are signified with arrows of different colors.

Waters Part V: Did I make GFP?

Please fill out Table 10.3 with the data you acquired from the lab work done at the Waters Immerse Lab in Cambridge or else the data screenshots in this document if you were unable to have lab work done at Waters.

Table 10.3 Data gathered from eGFP-related MS measurements.

Theoretical MW | Observed/measured MW on the intact LC-MS | Mass error
28.007 kDa | 27.982 kDa | 870 ppm

The mass error from the accuracy calculation is ~870 ppm, much higher than the 30-50 ppm threshold for proteins; therefore, I cannot confidently claim that what was measured was eGFP.