Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    1. Biological Engineering Application This project proposes the development of an intestinal spheroid culture platform derived from cell lines (e.g., Caco-2 spheroid or organoid-like cultures), combined with multi-omics profiling (transcriptomics, proteomics, and metabolomics) and computational modeling using systems biology and machine-learning approaches. The platform is intended to support research on drug absorption, inflammatory bowel disease (IBD) diagnostics, and predictive analysis of treatment outcomes. Initially, the system will be used to generate hypotheses from experimental data, with the long-term goal of becoming a predictive research tool.

  • Week 2 HW: DNA Read, Write, & Edit

    1-Benchling-in-silico-gel-art Using Benchling.com, Lambda DNA, Paul Vanouse’s Latent Figure Protocol artworks, and Ronan’s website as references, and incorporating creative design principles, simulations of restriction enzyme digests of the Lambda genome were performed using EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI: Figure 1. Virtual restriction digest of Lambda DNA. Once the banding patterns were characterized, images inspired by the previously mentioned works were created:

  • Week 3 HW: Lab automation

    Assignment: Python Script for Opentrons Artwork Based on the Lissajous function, the figure to be created on the agar will be the following: Post-Lab Questions — DUE BY START OF FEB 24 LECTURE Paper: Automation of biochemical assays using an open-sourced, inexpensive robotic liquid handler Moukarzel et al. 2024

  • Week 4 HW: protein design part 1

    Part A. Conceptual Questions 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) If we assume that meat contains approximately 20% protein, then 500 g of meat provides around 100 g of protein. Since the average molecular weight of an amino acid is approximately 100 Daltons (approx 100 g/mol), that 100 g corresponds to approximately 1 mole of amino acids. One mole contains approx 6.02 × 10²³ molecules, so you would ingest approximately 6 × 10²³ molecules of amino acids.
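The estimate above can be reproduced in a few lines. This is a minimal sketch of the same arithmetic; the 20% protein fraction and the ~100 g/mol average residue mass are the assumptions stated in the answer, and the function name is only illustrative.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g, protein_fraction=0.20, residue_g_per_mol=100.0):
    """Rough count of amino acid molecules in a piece of meat.

    protein_fraction and residue_g_per_mol are the approximations
    used in the answer above, not measured values.
    """
    protein_g = meat_g * protein_fraction   # 500 g of meat -> about 100 g of protein
    moles = protein_g / residue_g_per_mol   # about 1 mol of amino acids
    return moles * AVOGADRO


print(f"{amino_acid_molecules(500):.2e} molecules")
```

Changing the protein fraction or the average residue mass shifts the result proportionally, so the answer is best read as an order-of-magnitude estimate.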

  1. Why do humans eat beef but do not become a cow, eat fish but do not become fish? Because when we eat meat, the animal’s proteins (from cows, fish, etc.) are not incorporated intact into our bodies. Instead, they are digested in the gastrointestinal tract into their basic components: amino acids and small peptides. These are absorbed, and then our own cells reuse them to synthesize human proteins according to the information encoded in our DNA. In other words, we don’t incorporate the “biological identity” of the animal we eat, but rather molecular raw material that our genome reorganizes according to the instructions specific to the human species.
  2. Why are there only 20 natural amino acids? There are only 20 standard “natural” amino acids because the universal genetic code that has evolved encodes precisely these 20 building blocks for protein synthesis (with rare exceptions such as selenocysteine and pyrrolysine). This selection is not chemical but evolutionary: among many possible molecules, these 20 offered an optimal balance between structural diversity (charges, sizes, polarity, hydrophobicity), chemical stability, and biosynthetic efficiency. With this set, an enormous variety of protein structures and functions can be generated, so evolution did not need to significantly expand the basic alphabet to support biological complexity.
  3. Can you make other non-natural amino acids? Design some new amino acids. Yes, it is possible to create non-natural amino acids both chemically and by expanding the genetic code in biological systems. From a design perspective, it is sufficient to maintain the α-amino acid backbone (amino group, carboxyl group, and chiral α-carbon) and modify the side chain to introduce new physicochemical properties. For example, one could design (1) an amino acid with a bulky fluorinated side group to increase hydrophobic stability and resistance to degradation, (2) one with a photoreactive side chain (such as an azide or diazirine group) to allow light-induced cross-linking, (3) an amino acid with a chelating metal group to create artificial catalytic sites, or (4) one with a redox-active side chain capable of participating in electron transfer. In fact, synthetic biology has already incorporated hundreds of non-natural amino acids into proteins through reassigned codons or modified tRNA/synthetase systems, functionally expanding the protein “alphabet” beyond the standard 20.
  4. Where did amino acids come from before enzymes that make them, and before life started? Before enzymes and cellular life existed, amino acids could have formed through abiotic prebiotic chemistry. Classic experiments like the Miller-Urey experiment demonstrated that, under conditions simulating the early atmosphere (simple gases such as methane, ammonia, water vapor, and electrical discharges), several amino acids can be spontaneously synthesized. Furthermore, amino acids have been found in meteorites such as the Murchison meteorite, indicating that they can also form in space through interstellar chemistry and then reach Earth via impacts. Other possible environments include oceanic hydrothermal vents and mineral surfaces that catalyze organic reactions. Taken together, the evidence suggests that amino acids arose through natural physicochemical processes before the emergence of enzymes and were part of the molecular inventory that preceded the origin of life.
  5. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect? If you build an α-helix exclusively from D-amino acids, you would expect it to adopt a left-handed helix. In natural proteins made from L-amino acids, the stable α-helix is typically right-handed due to stereochemical constraints of the α-carbon and the allowed φ and ψ angles in conformational space. By inverting the chirality (using D instead of L), the geometric preference for folding is also reversed, producing the mirror image: a stable α-helix but with the opposite orientation.
  6. Can you discover additional helices in proteins? Yes, in principle, additional helices beyond the classical conformations can be discovered or engineered. Helical conformation depends on the allowed φ/ψ angles, hydrogen bonding patterns, and the chemistry of the peptide backbone. Modifying these variables, for example, by using non-natural amino acids, changing the backbone length, or applying specific steric constraints, can lead to the emergence of new, stable helical geometries. In fact, helical architectures not commonly found in natural proteins have been observed in synthetic peptides and foldamers. However, within proteins composed of the 20 standard amino acids and the natural peptide backbone, the repertoire of helices is strongly restricted by the stereochemistry and physics of the peptide bond, so additional variants tend to be rare or less stable.
  7. Why are most molecular helices right-handed? Most molecular helices are right-handed because they are built from chiral building blocks with a predominant stereochemical configuration. In terrestrial biology, almost all amino acids have the L configuration, and this chirality imposes specific geometric constraints on the angles of the peptide backbone, favoring a right-handed α-helix as the most energetically and sterically stable conformation. In other words, the molecular asymmetry of the monomers propagates to the level of secondary structure. If life had predominantly adopted D-amino acids, the predominant helices would most likely have been left-handed.
  8. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation? β-sheets tend to aggregate because their geometry exposes repeating patterns of amide and carbonyl groups in the backbone that can form extensive networks of intermolecular hydrogen bonds. When several polypeptide chains adopt extended conformations, they can readily align and stabilize each other through these bonds, forming larger sheets and, in extreme cases, amyloid-like fibrils. The primary driving force for aggregation is the minimization of the system’s free energy: the cooperative formation of hydrogen bonds, along with hydrophobic interactions between side chains and the burial of nonpolar surfaces, offsets the entropic cost of ordering the chains. Taken together, geometric complementarity and intermolecular stabilization make β-sheets particularly prone to aggregation when they are partially unfolded or misfolded.
  • Week 5 HW: protein design part 2

    Homework — DUE BY START OF MAR 10 LECTURE Part A: SOD1 Binder Peptide Design (From Pranam) Introduction Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • Week 6 HW: genetic-circuits-part-i

    Assignment: DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion High-Fidelity PCR Master Mix is a pre-mixed PCR solution containing several essential components for high-precision DNA amplification. Its main components and functions are:

  • Week 7 HW: genetic-circuits-part-ii

    Task Part 1: Intracellular Artificial Neural Networks (IANN)

    1. Advantages of IANNs vs. Boolean Logic
      • Continuous/gradual response (not just ON/OFF)
      • Integrate multiple weighted signals (weighted sum type)
      • Allow thresholds, nonlinearity, and fine-tuning
      • Greater robustness to biological noise
      • Scalability for complex functions (not just AND/OR)

    2. Useful Application (example: intelligent intestinal biosensor)
      Input:
      • X1 = microbial metabolites (SCFAs)
      • X2 = inflammatory signal (NF-κB)
      • X3 = drug present
      Behavior: The IANN integrates signals → calculates an “activation”. If it exceeds the threshold → activates the expression of a therapeutic (or fluorescent) protein.
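The weighted-sum-and-threshold behavior described above can be sketched as a single artificial neuron. This is only an illustration under stated assumptions: the weights, bias, threshold, and function names are hypothetical and are not taken from the assignment or from any measured circuit.

```python
import math


def iann_activation(inputs, weights, bias):
    """Weighted sum of input signals passed through a sigmoid (graded response)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def reporter_on(inputs, weights, bias, threshold=0.5):
    """True when the activation reaches the expression threshold."""
    return iann_activation(inputs, weights, bias) >= threshold


# Hypothetical weights: SCFAs (X1) and NF-kB (X2) activate, drug (X3) represses.
WEIGHTS = [2.0, 3.0, -4.0]
BIAS = -2.5

print(reporter_on([1, 1, 0], WEIGHTS, BIAS))  # inflammation without drug
print(reporter_on([1, 1, 1], WEIGHTS, BIAS))  # same signals with drug present
```

Unlike an AND/OR gate, the activation varies continuously with the input levels, so intermediate metabolite concentrations give intermediate responses before the threshold is applied.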

  • Week 9 HW: Cell-Free System

    General homework questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. Cell-free protein synthesis offers key advantages over in vivo systems by eliminating the complexity and limitations of the cell as a “black box.” In these systems, all components are defined and manipulable, allowing direct control over variables such as DNA concentrations, expression levels, biochemical composition, cofactors, and reaction conditions. In fact, expression can be precisely adjusted simply by varying the DNA concentration, achieving proportional regulation of each protein—something difficult to achieve in living cells. Furthermore, the system is fully customizable, allowing modification of the internal chemistry and each molecular component, which provides a level of experimental control and predictability far superior to that of traditional cell systems.

  • Week 10 HW: imaging and measurement

    Proposed set of measurements to be implemented in the project, subject to refinement as the study progresses. Homework: Waters Part I — Molecular Weight Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/ 28,006.6 Da. It is calculated by summing the average masses of all amino acids in the sequence, including the His-tag, and accounting for water loss during peptide bond formation. No major modifications significantly change the total mass, so this value matches the expected intact mass of eGFP.
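The calculation described above (sum of average residue masses plus one water) can be sketched as follows. The eGFP sequence is not reproduced in this excerpt, so the example uses short placeholder peptides; the residue masses are standard average values, and the function name is illustrative. It does not reproduce the 28,006.6 Da figure, which depends on the full His-tagged sequence.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


# Placeholder peptide, not the eGFP sequence.
print(round(molecular_weight("MHHHHHH"), 2))
```

Post-translational changes such as chromophore maturation or N-terminal methionine removal would have to be applied as explicit mass corrections on top of this sum.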

  • Week 11 HW: building-genomes

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork What about this collaborative art experiment could be made better for next year? I didn’t have the opportunity to contribute. I think it would be useful to design a protocol with https://rcdonovan.com and then calculate the volumes per well, concentrations, or data that can be used later.

Subsections of Homework

Week 1 HW: Principles and Practices

[Spheroid Cell Culture]

1. Biological Engineering Application

This project proposes the development of an intestinal spheroid culture platform derived from cell lines (e.g., Caco-2 spheroid or organoid-like cultures), combined with multi-omics profiling (transcriptomics, proteomics, and metabolomics) and computational modeling using systems biology and machine-learning approaches. The platform is intended to support research on drug absorption, inflammatory bowel disease (IBD) diagnostics, and predictive analysis of treatment outcomes. Initially, the system will be used to generate hypotheses from experimental data, with the long-term goal of becoming a predictive research tool.

Platform Workflow

  • 3D Intestinal Model Generation: Establishment of Caco-2–derived 3D epithelial cultures to model intestinal barrier function.
  • Experimental Perturbation: Exposure of cultures to inflammatory signals, drug compounds, or microbiota-related metabolites.
  • Multi-Omics Acquisition: Collection of transcriptomic, proteomic, and metabolomic data to capture cellular responses.
  • Data Processing and Integration: Quality control, normalization, and integration of omics datasets using reproducible bioinformatics pipelines.
  • Computational Modeling: Application of systems biology and machine-learning approaches to identify patterns and generate hypotheses.
  • Validation and Iteration: Experimental validation of model predictions through iterative testing.

2. Governance Framework

Governance Objectives

The project integrates governance principles to ensure safe, transparent, and equitable use of the technology.

Scientific and Clinical Safety

  • Implement staged validation protocols before diagnostic use.
  • Establish quality-control standards for omics data.
  • Limit early platform use to research contexts.
  • Document uncertainty in predictive models.

Biological Data Protection

  • Anonymize patient-derived data.
  • Comply with research ethics and data protection regulations.
  • Implement controlled access to datasets and software.
  • Maintain traceability of samples and analyses.

Responsible Use of Predictive Models

  • Design software as a research-support tool.
  • Include confidence and uncertainty metrics in predictions.
  • Validate models with independent datasets.
  • Avoid automated decision-making without human supervision.

Equity and Access

  • Promote open-source computational tools.
  • Design scalable experimental protocols.
  • Encourage collaboration with public institutions.
  • Document methodologies for technology transfer.

3. Governance Actions

  1. Stage-based validation requirement: Restrict initial platform use to research applications until validation standards are met. In the early stages, use cell lines as a working model (3D spheroids/organoids).
  2. Controlled access data management: Use public databases to triangulate working hypotheses. Implement anonymized datasets with institutional oversight and traceability.
  3. Transparent computational workflows: Share bioinformatics processes and documentation through reproducible research practices.

Prioritization Strategy

The project prioritizes a combination of staged validation protocols and open, reproducible computational standards. These actions balance scientific safety with research feasibility and transparency. Controlled-access data infrastructure will be implemented progressively when human biological samples are incorporated.

4. Rating of governance actions

The following table summarizes the evaluation of governance options according to course criteria.

Does the option:                                   Option 1   Option 2   Option 3
Enhance Biosecurity
• By preventing incidents                          1          2          2
• By helping respond                               2          1          2
Foster Lab Safety
• By preventing incidents                          1          2          2
• By helping respond                               2          1          2
Protect the environment
• By preventing incidents                          2          2          3
• By helping respond                               2          2          3
Other considerations
• Minimizing costs and burdens to stakeholders     2          3          1
• Feasibility?                                     2          2          1
• Not impede research                              2          3          1
• Promote constructive applications                1          1          1

Governance Actions: Visual Comparison

Stage-based validation requirement
Biosecurity:   ██░
Lab safety:    ██░
Environment:   ██░
Feasibility:   ██░

Controlled access data management
Biosecurity:   ██░
Lab safety:    ██░
Environment:   ██░
Feasibility:   ██░

Transparent computational workflows
Biosecurity:   ██░
Lab safety:    ██░
Environment:   ███
Feasibility:   █░░

Bars represent rubric scores (1 = █ strong alignment, 2 = ██ moderate alignment, 3 = ███ weaker alignment).

5. Strategies for an Ethical Biological Future

Based on the scoring above, the priority would be a combination of Option 1 (staged validation requirement) and Option 3 (open and reproducible computational standards). Together, these actions balance scientific safety with research feasibility. Validation protocols reduce the risk of incorrect interpretation or premature diagnostic use, while reproducible computational workflows promote transparency, collaboration, and constructive scientific applications without significantly increasing costs. The project will follow validation-driven research practices, responsible data governance, and open computational workflows. Periodic ethical evaluation will accompany platform development to identify risks and support responsible translation into diagnostic or predictive applications.

Ethical Reflection and Protocol Standardization

To improve reproducibility and reliability, the project will:

  • Implement standard operating procedures (SOPs).
  • Validate and benchmark protocols across experiments.
  • Use shared documentation and version control for methods.

Assignment (Week 2 Lecture Prep) — DUE BY START OF FEB 10 LECTURE

Professor Jacobson’s homework questions:

  1. Nature’s machinery for copying DNA is called polymerase. What is the polymerase error rate? How does this compare to the length of the human genome? How does biology address this discrepancy?

DNA polymerase with proofreading activity (3′-5′ exonuclease) has an approximate error rate of 1 in 10⁶ nucleotides incorporated.

The human genome has approximately 3.2 × 10⁹ base pairs, so if only polymerase fidelity existed, thousands of errors would be introduced per complete genome replication. Biology resolves this discrepancy through multiple levels of error correction, for example:

  • polymerase proofreading
  • DNA repair systems (e.g., mismatch repair such as MutS)
  • redundancy and robustness of the biological system

Together, these mechanisms reduce the effective error rate to levels compatible with genome stability.
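The comparison between error rate and genome length can be checked with a short calculation. This is a sketch using the figures given above (1 error per 10⁶ nucleotides, 3.2 × 10⁹ bp); the post-repair rate of 1 in 10⁹ is an illustrative assumption, not a value from the lecture.

```python
GENOME_BP = 3.2e9              # approximate human genome size, from the answer above
POLYMERASE_ERROR_RATE = 1e-6   # errors per nucleotide with proofreading, from the answer above
ASSUMED_REPAIRED_RATE = 1e-9   # hypothetical effective rate after mismatch repair


def expected_errors(genome_bp, error_rate):
    """Expected number of misincorporations in one pass over the genome."""
    return genome_bp * error_rate


print(expected_errors(GENOME_BP, POLYMERASE_ERROR_RATE))  # thousands per replication
print(expected_errors(GENOME_BP, ASSUMED_REPAIRED_RATE))  # a handful under the assumed rate
```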

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice, what are some of the reasons why all these different codes fail to encode the protein of interest?

The genetic code has 64 codons (61 sense codons and 3 stop codons) for 20 amino acids, so the code is degenerate. According to the slides, an average human protein is encoded by approximately 1036 bp, or about 345 amino acids. If each amino acid can be encoded by an average of 3 codons, the number of possible sequences is approximately 3^345 (on the order of 10^164), an extremely large number of DNA sequences that code for the same protein.

In practice, many of these variants do not work well because of biological and physical constraints, for example:

  • codon bias and translation efficiency (different codons for the same amino acid)
  • GC content and DNA stability
  • DNA/RNA secondary structures
  • unwanted regulatory signals
  • repeats or sequences that are difficult to synthesize
  • mRNA stability

In other words, even though the genetic code is redundant, not all equivalent sequences are functionally equivalent.
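The size of the synonymous-sequence space can be estimated in code. The sketch below gives both the rough 3^345 estimate used above and an exact count for a specific peptide based on the per-residue degeneracy of the standard genetic code; working in log10 avoids printing very large integers. The function names and the example peptide are illustrative.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def log10_rough_estimate(n_residues, mean_codons=3):
    """log10 of mean_codons ** n_residues (the slide-style estimate)."""
    return n_residues * math.log10(mean_codons)


def log10_encodings(protein):
    """log10 of the exact number of DNA sequences encoding this protein."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein.upper())


print(f"3^345 is about 10^{log10_rough_estimate(345):.1f}")
print(f"MALWMRLLPL has about 10^{log10_encodings('MALWMRLLPL'):.1f} encodings")
```

The exact count depends on composition: proteins rich in leucine, serine, or arginine (six codons each) have many more encodings than proteins rich in methionine or tryptophan (one codon each).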

Dr. LeProust’s Homework Questions:

  1. What is the most commonly used method for oligonucleotide synthesis?
  2. Why is it difficult to produce oligonucleotides longer than 200 nucleotides by direct synthesis?
  3. Why can’t a 2000 bp gene be created by direct oligonucleotide synthesis?

The most commonly used method for oligonucleotide synthesis is solid-phase phosphoramidite synthesis. This process occurs cyclically, adding one nucleotide at a time, and each step has an efficiency slightly less than 100%. Because of this imperfect efficiency in each cycle, the fraction of error-free, full-length product falls with sequence length, making it difficult to produce oligonucleotides longer than approximately 200 nucleotides by direct synthesis. For the same reason, it is not practical to directly synthesize a 2000 base pair gene as a single oligonucleotide, as the accumulation of errors and truncated products would be too high. In practice, long genes are constructed by assembling multiple shorter oligos using molecular assembly methods (e.g., PCR assembly or Gibson assembly).
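The length limit follows from compounding the per-cycle efficiency. The sketch below assumes a constant coupling efficiency per cycle; the values 0.99 and 0.995 are illustrative assumptions, not figures from the lecture.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of chains that reach full length.

    An n-mer needs n - 1 coupling steps after the first, support-bound
    nucleotide, each succeeding with the given (assumed constant) efficiency.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 200, 2000):
    for eff in (0.99, 0.995):
        print(f"{length:>5} nt at {eff:.3f}: {full_length_fraction(length, eff):.2e}")
```

Under the 99% assumption, only about one chain in seven reaches 200 nt, and essentially none reach 2000 nt, which is why long genes are assembled from shorter oligos.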

George Church’s Homework Question:

Choose ONE of the following three questions to answer; and cite any AI prompts or paper citations used.

  1. [Using Google Slide #4 and Professor Church] What are the 10 essential amino acids in all animals? And how does this affect your view of the “Lysine Contingency”?

The essential amino acids in animals include: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine (and in some contexts, arginine). These amino acids must be obtained from the diet because animals cannot synthesize them. In relation to the “Lysine Contingency,” lysine becomes a critical point in bioengineering because its metabolic availability can be used as a control mechanism or biological dependency in synthetic systems. This illustrates how natural metabolic constraints can be exploited as biocontainment or functional control strategies in synthetic biology.

  1. [Given slides #2 and 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

A possible code for AA:AA interactions could be based on physicochemical complementarity (e.g., charge, hydrophobicity, and size), analogous to how NA:NA interactions rely on base pairing and AA:NA interactions rely on codon recognition. This is because amino acid–amino acid interactions are primarily determined by chemical and structural complementarity rather than a fixed symbolic code, unlike nucleic acid base pairing.

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling in-silico gel art

Using Benchling.com, Lambda DNA, Paul Vanouse’s Latent Figure Protocol artworks, and Ronan’s website as references, and incorporating creative design principles, simulations of restriction enzyme digests of the Lambda genome were performed using EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI:

Figure 1. Virtual restriction digest of Lambda DNA.
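A virtual digest like the one run in Benchling can be approximated with simple string matching. This is a minimal sketch, not Benchling's implementation: it treats the molecule as linear, searches only the top strand (sufficient for these palindromic sites), and uses a short made-up sequence instead of the 48.5 kb Lambda genome.

```python
# Enzyme name -> (recognition site, top-strand cut offset within the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def cut_positions(seq, enzyme):
    """0-based positions at which the enzyme cuts the top strand."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzymes):
    """Fragment lengths (largest first) for a linear molecule cut by all enzymes."""
    cuts = sorted({p for name in enzymes for p in cut_positions(seq, name)})
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Made-up toy sequence with one EcoRI site and one BamHI site.
toy = "AAAA" + "GAATTC" + "TTTT" + "GGATCC" + "AA"
print(digest(toy, ["EcoRI"]))
print(digest(toy, ["EcoRI", "BamHI"]))
```

Sorting fragments from largest to smallest mirrors the order of bands from the top to the bottom of a gel lane.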

Once the banding patterns were characterized, images inspired by the previously mentioned works were created:

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

Part 3: DNA Design Challenge

3.1. Choosing a protein:

Protein chosen: Homo sapiens insulin (INS). It is a well-characterized protein with a short sequence that contains no internal stops:

AAP35454.1 insulin [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

Obtaining the DNA sequence in tblastn:

Using tblastn with the human INS protein sequence (AAP35454.1) as the query, the DNA sequence with the highest identity was identified.

Figure 2. tblastn result for the selected sequence.

Once the result is obtained, entering the sequence ID (AB587580.1) retrieves the gene sequence:

Figure 3. Selected DNA sequence.

BT006808.1 Homo sapiens insulin mRNA, complete cds
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG
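One way to check that a retrieved coding sequence matches the chosen protein is to translate it with the standard genetic code and compare the result to the protein record. The sketch below is a minimal translator; the function name is illustrative, and only the first 30 nucleotides of the sequence above are used in the example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 30 nt of the coding sequence shown above.
print(translate("ATGGCCCTGTGGATGCGCCTCCTGCCCCTG"))
```

Running `translate` on the full sequence and comparing the output with the AAP35454.1 protein string would confirm whether the two records agree residue by residue.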

3.2. Reverse translation: Protein sequence (amino acids) to DNA sequence (nucleotides)

The central dogma, discussed and recited in class, describes the process by which the DNA sequence is transcribed and translated into protein. The central dogma provides the framework for working in reverse from a given protein sequence and inferring the DNA sequence from which it is derived. Using one of the tools discussed in class, the NCBI (tblastn) or online tools (search “reverse translation tools” on Google), determine the nucleotide sequence that corresponds to the protein sequence you chose earlier.

3.3. Codon optimization

Codon optimization, performed here with the NovoProLabs web tool, modifies the nucleotide sequence of a gene to maximize protein production in a specific organism by adapting codon usage to the host’s codon bias. Gene expression in a different organism may require adjusting codon usage to match the host’s translational preferences. The restriction enzymes EcoRI and XhoI are not involved in the codon optimization itself; they are used to facilitate cloning of the optimized gene into an expression vector. During sequence design, the optimization tool adds the recognition sites of these enzymes to the ends of the gene, so that digesting both the insert and the plasmid with the same enzymes generates compatible cohesive ends that allow the gene to be inserted in the correct orientation within the vector. By contrast, the changes observed in Relative Adaptiveness and GC content come exclusively from the codon optimization process, which modifies the nucleotide sequence without altering the amino acid sequence to improve translation efficiency in the host organism.

Codon optimization of the insulin coding sequence for E. coli expression increased the Codon Adaptation Index (CAI) from 0.48 to 0.90, indicating improved compatibility with the host’s codon usage preferences and a higher expected translation efficiency. At the same time, GC content was adjusted from 64.56% to 60.00%, bringing it closer to a balanced range that can improve mRNA stability and transcriptional performance. Overall, the optimization modifies the nucleotide sequence without changing the amino acid sequence, with the goal of enhancing recombinant protein expression in the bacterial system.
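GC content, one of the two metrics reported above, is straightforward to compute directly from a sequence. This is a minimal sketch with an illustrative function name; CAI is not shown because it needs a host codon-usage table that is not part of this page.

```python
def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    gc = dna.count("G") + dna.count("C")
    return 100.0 * gc / len(dna)


# Short illustrative fragments, not the full insulin sequences.
print(f"{gc_content('ATGGCCCTGTGGATGCGC'):.2f}%")
print(f"{gc_content('ATGGCACTGTGGATGCGT'):.2f}%")
```

Applying the same function to the original and optimized coding sequences gives the before-and-after comparison reported by the optimization tool.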

3.4 Optimized protein transcription and translation

To produce this protein from DNA, the optimized gene can be inserted into a recombinant expression vector and introduced into a host such as E. coli using recombinant DNA technology and heterologous expression systems. Inside the cell, the DNA sequence is first transcribed into mRNA by RNA polymerase, and the mRNA is then translated by ribosomes into the protein according to the genetic code. Alternatively, the same DNA template can be used in cell-free expression systems (in vitro transcription-translation, IVTT), which contain purified enzymes, ribosomes, tRNAs, and energy sources that allow transcription and translation to occur outside living cells. Both approaches rely on the central dogma of molecular biology (DNA → RNA → protein) but differ in whether protein production occurs in living cells or in a controlled biochemical system.

3.5 Alternative splicing

A single gene can produce multiple proteins at the transcriptional level mainly through alternative splicing. During pre-mRNA processing, different exons can be included or excluded in different combinations, generating multiple mature mRNAs from the same gene. Each of these mRNAs can be translated into a different protein isoform, with variations in structure and function. Furthermore, mechanisms such as alternative promoters or alternative polyadenylation sites can also produce different transcripts from the same gene locus.

Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account, and Benchling account

4.2. Build Your DNA Insert Sequence

In Benchling, select New DNA/RNA sequence. Give your insert sequence a name and select DNA with a Linear topology (this is a linear sequence that will be inserted into a circular backbone vector of our choosing).

INS insert sequence

4.5. Vector Selection

The sequence of plasmid PP533546.1 was downloaded from GenBank and used as cloning backbone. Then, a synthetic INS coding sequence from Homo sapiens was designed in Benchling. After preparing both sequences, virtual DNA assembly was performed to insert the INS construct into the plasmid backbone, generating a recombinant circular plasmid containing the expression cassette (promoter, RBS, INS CDS, His tag, and terminator). This in-silico cloning step allowed visualization and verification of the final construct.

pUC57-E. coli-INS Hs plasmid

Part 5: DNA Read/Write/Edit

5.1 DNA Read

Sanger sequencing is a first-generation DNA sequencing method based on chain termination. During DNA synthesis, modified nucleotides (dideoxynucleotides, ddNTPs) are incorporated randomly, stopping elongation. The resulting fragments of different lengths are separated by capillary electrophoresis, and the sequence is read from fluorescent labels. It is highly accurate and ideal for validating plasmids or specific genes, but it is low-throughput and typically limited to ~700–1000 bp per read.

Nanopore sequencing is a third-generation method that sequences DNA in real time by passing single DNA molecules through a biological nanopore. As nucleotides move through the pore, they disrupt an ionic current in characteristic ways, allowing base identification. It can generate very long reads (even entire plasmids or genomes in one fragment) and works without PCR amplification, but raw accuracy can be slightly lower than Sanger, though it has improved significantly.

An example of next-generation sequencing (NGS) is Illumina sequencing. It uses massively parallel sequencing by synthesis, where millions of DNA fragments are immobilized on a flow cell and sequenced simultaneously through cyclic incorporation of fluorescently labeled nucleotides. This approach provides extremely high throughput and is commonly used for whole-genome sequencing, transcriptomics (RNA-seq), and large-scale variant analysis.

(ii) What technology or technologies would you use to perform your DNA sequencing and why?

Target: genes of the gut microbiota. To sequence gut microbiota genes, the most widely used and appropriate technology would be second-generation sequencing (NGS), especially platforms like Illumina. Third-generation sequencing, such as Oxford Nanopore Technologies, could also be considered, depending on the objective (taxonomic resolution or complete assembly). Also answer the following questions:

Is your method first, second, or third generation, or something else? What does that mean?

Recommended primary technology: Illumina (NGS). What generation is it? It’s second generation. It’s characterized by performing millions of reads in parallel (massively parallel sequencing) with short fragments and high accuracy. For gut microbiota studies (e.g., 16S rRNA gene sequencing or metagenomics), Illumina is ideal due to its high accuracy, low cost per sample, high sequencing depth, and excellent bioinformatics support. Nanopore would be useful if you are looking for long reads or whole genome assembly.

What is your input? How do you prepare your input (e.g., fragmentation, adapter ligation, PCR)? List the essential steps.

Sample Preparation (e.g., 16S rRNA or metagenomics). Essential Steps:

  1. DNA extraction from the fecal sample.
  2. Fragmentation (if metagenomics; not always necessary for 16S rRNA).
  3. PCR amplification; for 16S rRNA, amplification of variable regions (e.g., V3–V4).
  4. Ligation of adapters and indices (barcodes).
  5. Purification and quantification.
  6. Loading into the flow cell of the sequencer.
What are the essential steps of the sequencing technology you have chosen and how does it decode the bases of your DNA sample (base calling)?

How does Illumina technology work?

  1. DNA with adapters is attached to a flow cell.
  2. A cluster is generated by bridge amplification.
  3. Fluorescently labeled reversible-terminator nucleotides are incorporated.
  4. Each cycle adds one base.
  5. A camera detects the fluorescence.
  6. The terminator and fluorophore are removed, and the cycle repeats.

How are the bases decoded? Each base (A, T, C, G) carries a distinguishable fluorescent signal. The system records the optical signal in each cycle and converts it into a digital sequence (base calling).
What is the outcome of the chosen sequencing technology?

The end result is millions of short reads (FASTQ files). Each read includes base sequence and quality scores (Phred score). Subsequently, bioinformatics analysis, taxonomic identification, bacterial abundance profiling, and alpha and beta diversity are performed.
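To make the FASTQ output more concrete, here is a minimal sketch of how the quality line of one read can be decoded into Phred scores and error probabilities. It assumes the common Phred+33 ASCII encoding; the function names and the four-base example read are illustrative, not taken from a real run.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record into (read name, sequence, Phred scores)."""
    header, seq, _plus, qual = (line.strip() for line in lines)
    if not header.startswith("@") or len(seq) != len(qual):
        raise ValueError("malformed FASTQ record")
    return header[1:], seq, phred_scores(qual)


# Hypothetical read used only for illustration
record = ["@read1", "ACGT", "+", "II5!"]
name, seq, quals = parse_fastq_record(record)
for base, q in zip(seq, quals):
    print(base, q, error_probability(q))
```

A score of Q20 corresponds to a 1% chance that the base call is wrong, and Q40 to 0.01%, which is why quality filtering is usually the first step of the downstream analysis.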

5.2 DNA writing

What DNA would you like to synthesize (e.g., write) and why?

Reporter constructs that respond to inflammatory settings. The idea is to synthesize genes that are artificially expressed in cell cultures, spheroids, or organoids under IBD-like conditions, so that they serve as a reliable indicator of the presence of butyrate and other SCFAs, other bacterial metabolites, or oxidative stress. The construct should be activated by a common marker metabolite or signal: a promoter sensitive to an inflammatory or metabolic signal controls the expression of a reporter gene (e.g., GFP, luciferase, mCherry). When the stimulus appears (butyrate, ROS, NF-κB activation, etc.), the promoter is activated and the reporter gene is expressed.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?

To design and validate a promoter sensitive to inflammatory or metabolic signals, it is first necessary to identify endogenous regulatory regions that respond to the stimulus of interest using NGS technologies such as RNA-seq to detect stimulus-induced genes, and ATAC-seq or ChIP-seq to map open chromatin regions or binding sites of specific transcription factors (e.g., NF-κB). Complementarily, techniques such as STARR-seq allow for the functional evaluation of promoter activity across thousands of sequences simultaneously, identifying those that actually activate transcription under the stimulus. By combining these data, synthetic promoters can be designed to drive the specific and quantifiable expression of a reporter gene (GFP, luciferase, mCherry) in response to the desired signal.

Please also answer the following questions:
What are the essential steps of the sequencing methods you have chosen?
What are the limitations of your sequencing method (if any) in terms of speed, accuracy, and scalability?

The essential steps of RNA-seq are: (1) isolation of total RNA or messenger RNA (mRNA) from the samples of interest; (2) fragmentation and library preparation, which includes conversion to cDNA, ligation of adapters, and sometimes size selection; (3) sequencing of the libraries on an NGS platform to obtain short reads; (4) data processing, which involves quality control, filtering, alignment of the reads to the reference genome or transcriptome, and quantification of gene expression; and (5) differential analysis to identify genes whose expression level changes between conditions, followed by functional annotation and pathway analysis.

The main limitations of RNA-seq in terms of speed, accuracy, and scalability are:

Speed: Library preparation and sequencing can take anywhere from several hours to several days, especially if biological replicates and multiple conditions are required; bioinformatics analyses can also be slow if the datasets are large.

Accuracy: Quantification can be affected by amplification bias, reverse transcription efficiency, fragmentation differences, or ambiguous mapping of reads to homologous genes or isoforms; genes with low expression are difficult to detect reliably.

Scalability: Processing many samples simultaneously increases costs and complexity, and storing and analyzing large volumes of data requires robust computational infrastructure; furthermore, high-resolution methods such as single-cell RNA-seq greatly increase the amount of data and the complexity of the analysis.

5.3 DNA Editing

(i) Which DNA would you like to edit and why?

CRISPR editing. I believe it’s a technique with a very promising future.

(ii) What technology or technologies would you use to perform these DNA edits and why?
Also answer the following questions:
How does your preferred technology edit DNA? What are the essential steps?

CRISPR edits DNA using a Cas enzyme guided by a guide RNA that recognizes a specific sequence in the genome. Cas cuts the DNA at that point, and then the cell repairs the break, which can result in insertions, deletions, or allow the incorporation of a new sequence if a repair template is provided.

What preparation is needed (e.g., design steps) and what information (e.g., DNA template, enzymes, plasmids, primers, guides, cells) is required for editing?

A guide RNA that recognizes the target sequence needs to be designed, and, if insertion is desired, a DNA template with the sequence to be incorporated. Information and materials include: Cas9 enzyme (or similar), plasmids or delivery vectors, guide RNA, verification primers, the repair template if applicable, and the cells to be edited.
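As a rough illustration of the guide design step, the sketch below scans a DNA sequence for candidate SpCas9 target sites, meaning a 20-nt protospacer immediately followed by an NGG PAM, on both strands. It assumes SpCas9 specifically; other Cas enzymes use different PAMs. The function names and the demo sequence are hypothetical, and no off-target or efficiency scoring is attempted.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq, guide_len=20):
    """List (strand, start, protospacer, PAM) for every protospacer followed by NGG.

    For the "-" strand, start is given in reverse-complement coordinates.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead makes overlapping candidate sites visible as well
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Hypothetical 25-nt demo sequence, not a real genomic target
demo = "GATTACAGATTACAGATTACTGGCC"
for site in find_spcas9_sites(demo):
    print(site)
```

In practice the candidates found this way would still need to be checked for off-target matches elsewhere in the genome before choosing a guide.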

What are the limitations of your editing methods (if any) in terms of efficiency or accuracy?

CRISPR has limitations in efficiency, since not all cells correctly receive the editing machinery or repair DNA in the desired way, and in precision, because off-target effects or unforeseen insertions/deletions can occur at the cutting site. Furthermore, efficiency and precision depend on cell type, guide RNA design, and genomic context.


Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork

Based on the Lissajous function, the figure to be created on the agar will be the following:
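As a minimal sketch of how a Lissajous figure can be turned into dispense positions, the code below samples the curve x = A·sin(a·t + δ), y = A·sin(b·t) and returns offsets in millimetres from the plate center. The frequency ratio, phase, safe radius, and number of points are illustrative assumptions, not the values used in the submitted script, and the robot commands themselves are left out.

```python
import math


def lissajous_points(a=3, b=2, delta=math.pi / 2, safe_radius_mm=30.0, n_points=60):
    """Sample a Lissajous curve as (x, y) offsets in mm from the plate center."""
    # Scale the amplitude so that even the corners of the bounding square
    # stay inside a circle of radius safe_radius_mm.
    amplitude = safe_radius_mm / math.sqrt(2)
    points = []
    for i in range(n_points):
        t = 2 * math.pi * i / n_points
        x = amplitude * math.sin(a * t + delta)
        y = amplitude * math.sin(b * t)
        points.append((round(x, 2), round(y, 2)))
    return points


points = lissajous_points()
print(len(points), "dispense positions; first:", points[0])
```

Each (x, y) pair could then be applied as an offset from the center of the agar plate when dispensing a droplet; changing the ratio a:b changes the number of lobes in the figure.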

Post-Lab Questions — DUE BY START OF FEB 24 LECTURE

Paper: Automation of biochemical assays using an open-sourced, inexpensive robotic liquid handler

Moukarzel et al. 2024

https://doi.org/10.1016/j.slast.2024.100205

The study aimed to evaluate the feasibility of using an open-sourced, low-cost robotic liquid handler—specifically the Opentrons OT-2—for automating biochemical assays that are traditionally run on expensive industrial liquid handling platforms. High-throughput screening is a core process in pharmaceutical development, but the cost and training requirements of conventional robotic systems can be prohibitive. The authors set out to determine whether a lightweight, Python-programmable robot could perform common assay workflows with sufficient precision and reliability to be useful in early-stage assay development and method transfer.

To test this, the team programmed the OT-2 to perform two standard biochemical assays—PicoGreen for DNA quantification and Bradford for protein concentration—using custom Python protocols that controlled pipetting and reagent transfers across microplates. These automated workflows were run repeatedly and compared to runs on a more expensive Tecan EVO liquid handler to benchmark performance. The study measured pipetting accuracy, variability, and overall assay consistency to assess how well the OT-2 handled the tasks relative to the industrial system.

The results showed that the OT-2 delivered accurate pipetting with low covariance across replicates, demonstrating performance close to that of the Tecan EVO despite its substantially lower cost and simpler hardware. Although limitations such as the absence of a crash detection system and a relatively small deck space were noted, the robot’s affordability and flexibility were highlighted as significant advantages. The authors concluded that the OT-2 represents a cost-effective, medium-throughput automation solution well suited for early-stage assay development and method transfer without requiring large capital investments.

Final Project Description – Automation of ABC Transporter Uptake and Efflux Assays in Intestinal Organoids

This project will develop a semi-automated workflow for ABC transporter uptake and efflux assays using intestinal spheroid and organoid cultures in 6-, 12-, and 96-well plate formats. The objective is to improve reproducibility, throughput, and quantitative accuracy while reducing manual variability in washing, incubation timing, and sample collection steps.

1- Automation:

A- Cell Culture:

a. Cell counter (density according to culture type). Create a script or application to count the number of cells.
b. Plate treatment: drying and seeding (Opentrons).

B- Automation of Treatments:

a. Cell culture media changes
b. Washes with buffer of cultures in pretreatments
c. Media changes and special media

C- TNA (total nucleic acid) extraction
D- Bioinformatics:

a. Logging of culture variables of diagnostic interest, e.g., pH, CO2, temperature

E- Multiomics

2. Liquid Handling Automation

The aim is to design a pipetting workflow compatible with a benchtop liquid handler (e.g., an Opentrons-like platform). The automated protocol will:
1- Remove culture medium
2- Wash wells with PBS (1–2 cycles, optimized).
3- Add loading medium with defined metabolite concentrations.
4- Incubate for a programmable time.
5- Remove loading medium.
6- Perform PBS washes.
7- Add HBSS efflux buffer.
8- Incubate for defined time intervals.
9- Transfer efflux supernatant to a secondary 96-well plate or tubes.
10- Add diluted Triton for cell lysis.
Timing precision will be critical, especially for efflux kinetics.

Considerations: Automatic mapping of conditions per well (using an imported CSV file).
Differential control by column (e.g., column 1–3 control, 4–6 MK-571).
Automatic metadata recording (plate ID, date, batch).
Kinetic analysis at multiple intervals (e.g., collect at 5, 10, 15 min).
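The well-mapping considerations above can be sketched in plain Python. The example builds a well-to-condition map either by column blocks or from an imported CSV, and derives absolute collection times for a kinetic series. The CSV column names (`well`, `condition`) and all function names are assumptions made for illustration; nothing here calls the robot.

```python
import csv
import io


def plate_map_by_column(column_conditions, rows="ABCDEFGH"):
    """Assign a condition to every well of the given columns (96-well layout)."""
    mapping = {}
    for columns, condition in column_conditions:
        for col in columns:
            for row in rows:
                mapping[f"{row}{col}"] = condition
    return mapping


def plate_map_from_csv(text):
    """Read a well,condition table (assumed header names) into a dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["well"].strip(): row["condition"].strip() for row in reader}


def collection_times_s(start_s, timepoints_min):
    """Absolute times (seconds) at which efflux samples should be collected."""
    return [start_s + 60 * t for t in timepoints_min]


# Columns 1-3 control, columns 4-6 MK-571, as in the example above
layout = plate_map_by_column([(range(1, 4), "control"), (range(4, 7), "MK-571")])
print(layout["A1"], layout["H6"], collection_times_s(0, [5, 10, 15]))
```

Keeping the map as a simple dictionary makes it easy to write the same information into the metadata record (plate ID, date, batch) alongside the measurements.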

B. Example Pseudocode (Conceptual Workflow)

Example Opentrons Protocol – ABC Transporter Efflux Assay (96-well format)

Possible example for 96-well plates on the Opentrons
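As a conceptual sketch only, the ten workflow steps listed earlier can be written as an ordered plan in plain Python before translating them into robot commands. This is not Opentrons API code: the `Step` class, the function name, and every default (two washes, 30 min loading, 100 µL volumes, 20 µL samples, collection at 5, 10, and 15 min) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str            # "remove", "wash", "add", "incubate", or "transfer"
    reagent: str
    volume_ul: float = 0.0
    wait_min: float = 0.0


def efflux_plan(wash_cycles=2, load_min=30.0, efflux_timepoints_min=(5, 10, 15),
                volume_ul=100.0, sample_ul=20.0):
    """Build the ordered step list for one efflux assay (all defaults are assumed)."""
    washes = [Step("wash", "PBS", volume_ul) for _ in range(wash_cycles)]
    plan = [Step("remove", "culture medium", volume_ul)]
    plan += washes
    plan += [Step("add", "loading medium", volume_ul),
             Step("incubate", "loading medium", wait_min=load_min),
             Step("remove", "loading medium", volume_ul)]
    plan += washes
    plan.append(Step("add", "HBSS efflux buffer", volume_ul))
    elapsed = 0.0
    for t in efflux_timepoints_min:
        # Wait only for the time remaining until the next collection point
        plan.append(Step("incubate", "HBSS efflux buffer", wait_min=t - elapsed))
        plan.append(Step("transfer", "efflux supernatant", sample_ul))
        elapsed = t
    plan.append(Step("add", "diluted Triton", volume_ul))
    return plan


for number, step in enumerate(efflux_plan(), start=1):
    print(number, step)
```

Writing the plan as data first makes the timing explicit, which matters for the efflux kinetics, and each `Step` can later be mapped onto the corresponding aspirate, dispense, or delay command of the chosen liquid handler.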

Week 4 HW: protein design part 1

Part A. Conceptual Questions

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
If we assume that meat contains approximately 20% protein, then 500 g of meat provides around 100 g of protein. Since the average molecular weight of an amino acid is approximately 100 Daltons (approx 100 g/mol), that 100 g corresponds to approximately 1 mole of amino acids. One mole contains approx 6.02 × 10²³ molecules, so you would ingest approximately 6 × 10²³ molecules of amino acids.
2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Because when we eat meat, the animal’s proteins (from cows, fish, etc.) are not incorporated intact into our bodies. Instead, they are digested in the gastrointestinal tract into their basic components: amino acids and small peptides. These are absorbed, and then our own cells reuse them to synthesize human proteins according to the information encoded in our DNA. In other words, we don’t incorporate the “biological identity” of the animal we eat, but rather molecular raw material that our genome reorganizes according to the instructions specific to the human species.
3. Why are there only 20 natural amino acids?
There are only 20 standard “natural” amino acids because the universal genetic code that has evolved encodes precisely these 20 building blocks for protein synthesis (with rare exceptions such as selenocysteine and pyrrolysine). This selection is not chemical but evolutionary: among many possible molecules, these 20 offered an optimal balance between structural diversity (charges, sizes, polarity, hydrophobicity), chemical stability, and biosynthetic efficiency. With this set, an enormous variety of protein structures and functions can be generated, so evolution did not need to significantly expand the basic alphabet to support biological complexity.
4. Can you make other non-natural amino acids? Design some new amino acids.
Yes, it is possible to create non-natural amino acids both chemically and by expanding the genetic code in biological systems. From a design perspective, it is sufficient to maintain the α-amino acid backbone (amino group, carboxyl group, and chiral α-carbon) and modify the side chain to introduce new physicochemical properties. For example, one could design (1) an amino acid with a bulky fluorinated side group to increase hydrophobic stability and resistance to degradation, (2) one with a photoreactive side chain (such as an azide or diazirine group) to allow light-induced cross-linking, (3) an amino acid with a chelating metal group to create artificial catalytic sites, or (4) one with a redox-active side chain capable of participating in electron transfer. In fact, synthetic biology has already incorporated hundreds of non-natural amino acids into proteins through reassigned codons or modified tRNA/synthetase systems, functionally expanding the protein “alphabet” beyond the standard 20.
5. Where did amino acids come from before enzymes that make them, and before life started?
Before enzymes and cellular life existed, amino acids could have formed through abiotic prebiotic chemistry. Classic experiments like the Miller-Urey experiment demonstrated that, under conditions simulating the early atmosphere (simple gases such as methane, ammonia, water vapor, and electrical discharges), several amino acids can be spontaneously synthesized. Furthermore, amino acids have been found in meteorites such as the Murchison meteorite, indicating that they can also form in space through interstellar chemistry and then reach Earth via impacts. Other possible environments include oceanic hydrothermal vents and mineral surfaces that catalyze organic reactions. Taken together, the evidence suggests that amino acids arose through natural physicochemical processes before the emergence of enzymes and were part of the molecular inventory that preceded the origin of life.
6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
If you build an α-helix exclusively from D-amino acids, you would expect it to adopt a left-handed helix. In natural proteins made from L-amino acids, the stable α-helix is typically right-handed due to stereochemical constraints of the α-carbon and the allowed φ and ψ angles in conformational space. By inverting the chirality (using D instead of L), the geometric preference for folding is also reversed, producing the mirror image: a stable α-helix but with the opposite orientation.
7. Can you discover additional helices in proteins?
Yes, in principle, additional helices beyond the classical conformations can be discovered or engineered. Helical conformation depends on the allowed φ/ψ angles, hydrogen bonding patterns, and the chemistry of the peptide backbone. Modifying these variables, for example, by using non-natural amino acids, changing the backbone length, or applying specific steric constraints, can lead to the emergence of new, stable helical geometries. In fact, helical architectures not commonly found in natural proteins have been observed in synthetic peptides and foldamers. However, within proteins composed of the 20 standard amino acids and the natural peptide backbone, the repertoire of helices is strongly restricted by the stereochemistry and physics of the peptide bond, so additional variants tend to be rare or less stable.
8. Why are most molecular helices right-handed?
Most molecular helices are right-handed because they are built from chiral building blocks with a predominant stereochemical configuration. In terrestrial biology, almost all amino acids are L-enantiomers, and this chirality imposes specific geometric constraints on the angles of the peptide backbone, favoring a right-handed α-helix as the most energetically and sterically stable conformation. In other words, the molecular asymmetry of the monomers is propagated to the level of secondary structure. If life had predominantly adopted D-amino acids, the predominant helices would most likely have been left-handed.
9. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?
β-sheets tend to aggregate because their geometry exposes repeating patterns of amide and carbonyl groups in the backbone that can form extensive networks of intermolecular hydrogen bonds. When several polypeptide chains adopt extended conformations, they can readily align and stabilize each other through these bonds, forming larger sheets and, in extreme cases, amyloid-like fibrils. The primary driving force for aggregation is the minimization of the system’s free energy: the cooperative formation of hydrogen bonds, along with hydrophobic interactions between side chains and the burial of nonpolar surfaces, offsets the entropic cost of ordering the chains. Taken together, geometric complementarity and intermolecular stabilization make β-sheets particularly prone to aggregation when they are partially unfolded or misfolded.
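The estimate from question 1 can be checked with a few lines of arithmetic. This is only a sketch of the same back-of-the-envelope calculation; the 20% protein content and the ~100 g/mol average mass per amino acid are the assumptions stated in the answer, and the function name is illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g=500.0, protein_fraction=0.20, mass_per_aa_g_mol=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction       # about 100 g of protein
    moles = protein_g / mass_per_aa_g_mol       # about 1 mol of amino acids
    return moles * AVOGADRO


print(f"{amino_acid_molecules():.2e} amino acid molecules")
```

Changing the assumed protein fraction scales the result linearly, so a leaner or fattier cut shifts the answer by a small factor but not by orders of magnitude.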

Part B: Protein Analysis and Visualization

Protein selected: PyMOL>fetch 6BHU
Cryo-EM structure of ATP-bound, outward-facing bovine multidrug resistance protein 1 (MRP1)

Size (get_extent reports the bounding box of the structure in Å, not the sequence length):
PyMOL>get_extent
cmd.extent: min: [ 73.772, 111.074, 115.447]
cmd.extent: max: [ 206.780, 198.350, 201.157]
The most frequent amino acid:
PyMOL>stored.residues=[]
PyMOL>iterate name CA, stored.residues.append(resn)
Iterate: iterated over 1465 atoms.
PyMOL>count_atoms 6BHU and name CA
count_atoms: 1465 atoms
PyMOL>print(max(set(stored.residues), key=stored.residues.count))
LEU
With Colab: The length of the protein is 1345 amino acids.
The most common amino acid is L, which appears 158 times.
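The Colab result can be reproduced with a short, self-contained function. This is a minimal sketch using only the standard library; the function name and the toy sequence are illustrative, and the real input would be the FASTA sequence of 6BHU with the header line removed.

```python
from collections import Counter


def sequence_summary(sequence):
    """Return (length, most common residue, its count) for a one-letter sequence."""
    seq = "".join(sequence.split()).upper()   # drop line breaks and spaces
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    return len(seq), residue, n


# Toy sequence for illustration only
length, residue, n = sequence_summary("MLLKA\nLL")
print(f"Length: {length}; most common residue: {residue} ({n} times)")
```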

• How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs
PyMOL>print(cmd.get_fastastr("6BHU"))
Does your protein belong to any protein family? ABC Family

Figure 1. Chosen protein sequence.

Figure 2 and 3. BLASTp alignment results.

3. Identify the structure page of your protein in RCSB

Figure 4. Protein structure selected in RCSB.

• When was the structure solved? Is it a good quality structure? A good-quality structure is one with good resolution; the smaller the value, the better (Resolution: 2.70 Å).
Quality: By the standards of traditional crystallography, a resolution of around 3 Å is considered low-to-medium. However, for a membrane protein as large as this one (MRP1) resolved by cryo-electron microscopy (cryo-EM), it is a structure of acceptable quality.
• Are there any other molecules in the solved structure apart from protein?
Yes, there are several. In addition to the protein chain, the resolved structure contains small molecules (ligands) that are crucial for its function or for the process of obtaining the structure:
• ATP (adenosine-5’-triphosphate): Appears as “ATP”. There are two molecules, one at each nucleotide-binding domain (NBD). It is the fuel the protein uses for pumping.
• Magnesium ions (Mg2+): Appear as “MG”. They are necessary for ATP to bind correctly.
• Cholesterol (CLR): Lipid molecules that remain attached to the protein during purification.
• LMT (dodecyl-beta-D-maltoside): A detergent used to keep the protein stable outside the cell membrane.
• Does your protein belong to any structure classification family?
Yes, it can be placed in the standard classification hierarchies (CATH and SCOP):
• Superfamily: ABC transporters (ATP-Binding Cassette). It is one of the largest and oldest families of membrane proteins.
• Specific family: MRP (Multidrug Resistance-associated Proteins).
• Functional classification: It is an exporter.
• Structural architecture (CATH): It is classified as an Alpha-Beta type structure, since it combines transmembrane helices (alpha) with cytoplasmic domains that have beta sheets where ATP binds.
4. Open the structure of your protein in any 3D molecule visualization software:
• PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)
• Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
• Color the protein by secondary structure. Does it have more helices or sheets?
PyMOL>as cartoon
PyMOL>util.cbss("6BHU", "red", "yellow", "green")

Figure 5. Chosen protein structure.

• Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
PyMOL># Hydrophobic (orange: residues expected inside the membrane)
PyMOL>color orange, resn ala+val+leu+ile+met+phe+trp+pro+gly
Executive: Colored 4865 atoms.
PyMOL># Polar hydrophilic (blue: residues in contact with water or the cytoplasm)
PyMOL>color marine, resn ser+thr+asn+gln+tyr+cys
Executive: Colored 2342 atoms.
PyMOL># Charged hydrophilic (slate blue: very polar)
PyMOL>color slate, resn arg+his+lys+asp+glu
Executive: Colored 2455 atoms.

Figure 6. Chosen protein structure.

• Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
Visualizing cavities:
PyMOL># 1. Show a solid surface
PyMOL>show surface, 6BHU
PyMOL># 2. Make it slightly transparent to see the interior
PyMOL>set transparency, 0.4
Setting: transparency set to 0.40000.
PyMOL># 3. Highlight the ligands (ATP) to see where the main pockets are
PyMOL>show spheres, organic
PyMOL>color brightorange, organic
Executive: Colored 146 atoms.

• Does it have binding pockets?
Yes, and they are essential to its function. When observing the surface of 6BHU, you will notice three main types of cavities:
• The large central vestibule: Because 6BHU is a transporter in an outward-facing conformation, you will see a large hole or “funnel” at the top (the side facing the cell exterior). This is the pathway through which substrates exit.
• Nucleotide-binding sites (in the NBDs): On the cytoplasmic (lower) side, there are two deep pockets where ATP molecules are held. If you used the transparency command, you will see the ATP “stuck” in these pockets.
• Internal substrate cavity: In the center of the transmembrane region, there is a highly flexible pocket that, in this conformation, is arranged to release the cargo.

Figure 7. ABC protein pockets.

Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

Deep Mutational Scans

The heat map displays the Model Scores, which in this case are the Log Likelihood Ratios (LLRs) of a specific mutation at a given position, compared to the wild-type (WT) amino acid. The ‘Viridis’ color scale used in the visualization assigns:
Darker colors (blues/purples): Lower, more negative LLR values. This indicates that the mutation to that amino acid at that position is unfavorable compared to the WT.
Lighter colors (greens/yellows): Higher, more positive LLR values. This indicates that the mutation to that amino acid at that position is favorable compared to the WT, or at least more likely.
A conserved site is a position in the protein where the wild-type amino acid is crucial for function or structure, and any mutation to a different amino acid would be detrimental or very unlikely. In terms of LLRs, this would mean that all possible mutations at that position (i.e., the entire column for that position, excluding the wild-type amino acid) would have very low and negative LLR values (dark colors on the heatmap). Therefore, the yellow dots (positive or high LLRs) on the heatmap indicate positions where certain mutations are favorable or better accepted by the model, not where the site is conserved. Conserved sites are represented by entire columns of the heatmap that are predominantly dark in color.

(Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.
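To make the relationship between LLRs and conservation concrete, here is a minimal sketch that computes the LLR of each substitution at one position, LLR = log p(mutant) − log p(wild type), and flags the position as conserved when every substitution scores below a cutoff. The probabilities are made-up numbers rather than model output, and the cutoff of −2 is an arbitrary assumption.

```python
import math


def llr_column(probabilities, wild_type):
    """LLR of every substitution at one position relative to the wild-type residue."""
    log_wt = math.log(probabilities[wild_type])
    return {aa: math.log(p) - log_wt
            for aa, p in probabilities.items() if aa != wild_type}


def is_conserved(probabilities, wild_type, threshold=-2.0):
    """Call a position conserved if all substitutions fall below the (assumed) cutoff."""
    return all(llr <= threshold for llr in llr_column(probabilities, wild_type).values())


# Hypothetical per-position probabilities over a reduced alphabet
strict_site = {"L": 0.90, "A": 0.05, "G": 0.05}
tolerant_site = {"L": 0.40, "I": 0.35, "V": 0.25}
print(is_conserved(strict_site, "L"), is_conserved(tolerant_site, "L"))
```

On the heatmap, the first kind of position would appear as a uniformly dark column, while the second would show lighter cells for the tolerated substitutions.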

Figure 8. Amino Acids Mutation Position in Protein Sequence.

Latent Space Analysis

Figure 9. 3D t-SNE Visualization of Protein Sequence Embeddings.

C3. Protein Generation

Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
Input this sequence into ESMFold and compare the predicted structure to your original.
Due to the inability to run the code for the “Protein Folding with ESMFold” block, I chose a smaller protein hoping it would work by using less memory, but it wasn’t possible. I’ve included an image of the protein below, but the second part of the activity is missing.

Figure 10. 3D visualization of the 1CRN protein.

Week 5 HW: protein design part 2

Homework — DUE BY START OF MAR 10 LECTURE

Part A: SOD1 Binder Peptide Design (From Pranam)

Introduction

Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.
Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Part 1: Generate Binders with PepMLM

Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
Generate four 12-amino-acid peptides conditioned on the mutant SOD1 sequence.
Add the known SOD1-binding peptide, FLYRWLPSRRGG, to the generated list for comparison.
Record the perplexity scores, which indicate PepMLM’s confidence in the binders.

sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHE
FGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSI
EDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVI
GIAQ
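Introducing A4V into the sequence above requires care with numbering. The sketch below assumes that the conventional SOD1 mutation name A4V counts residues without the initiator methionine, so the affected alanine is the fifth character of the UniProt sequence. The function name and the `skip_initiator_met` flag are illustrative; the check on the wild-type residue guards against an off-by-one error.

```python
import re

# Human SOD1 as listed above (UniProt P00441), including the initiator Met
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHE"
    "FGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSI"
    "EDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVI"
    "GIAQ"
)


def apply_substitution(seq, mutation, skip_initiator_met=True):
    """Apply a point mutation such as 'A4V' and verify the wild-type residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    # If numbering ignores Met1, residue n sits at 0-based index n; otherwise n - 1
    index = position if skip_initiator_met else position - 1
    if seq[index] != wild_type:
        raise ValueError(f"expected {wild_type} at index {index}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


sod1_a4v = apply_substitution(SOD1_WT, "A4V")
print(sod1_a4v[:10])
```

The mutant sequence produced this way is what would be supplied to PepMLM as the target for conditioning.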

Part 2: Evaluate Binders with AlphaFold3

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Part 4: Generate Optimized Peptides with moPPIt

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

Part C: Final Project: L-Protein Mutants

The proposed mutations in the Bacteriophage MS2 lysis protein are generally biochemically consistent and plausible for improving stability without compromising function. The Q→E and S→T changes are conservative substitutions between polar residues that could favor electrostatic interactions or slightly increase local stability. The hydrophobic L→I and V→I mutations are particularly reasonable within the transmembrane domain, as they maintain the hydrophobicity necessary for membrane insertion and may improve helix packing. The F→Y substitution preserves aromaticity but introduces a hydroxyl group that could facilitate interactions at the membrane-water interface. Overall, these mutations do not significantly alter the length or overall hydrophobicity of the transmembrane segment, which is important if, as assumed here, the L protein exerts its function by interfering with the lipid II flippase MurJ in Escherichia coli; the proposed changes are therefore compatible with maintaining activity while potentially improving folding or structural stability.

Variants with more “aggressive” mutations (increasing hydrophobicity or propensity for helix formation in the transmembrane region):

  1. S -> L (increased hydrophobicity within the TM domain)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSLTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
  2. T -> L (reinforces the hydrophobic character of the helix)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRLVTTLQQLLT
  3. A -> L (greater transmembrane helix stability)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
  4. L -> F (introduces aromatic residue into TM)
    METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFFSKFTNQLLLSLLEAVIRTVTTLQQLLT

These variants introduce less conservative changes that increase hydrophobicity, or the propensity to form α-helices, within the transmembrane domain of the Bacteriophage MS2 L protein. By replacing polar or small residues with more hydrophobic amino acids such as leucine or phenylalanine, more stable insertion into the bacterial membrane and more efficient packing of the transmembrane helix can be favored. This could increase the protein’s interaction with the lipid II flippase MurJ in Escherichia coli, the assumed functional target of this lysis protein, which could intensify the inhibition of peptidoglycan precursor transport and accelerate the cell lysis process. Overall, mutations of this type aim to increase the structural stability and functional efficiency of the lysis mechanism.
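To double-check that each listed sequence carries exactly the intended substitution, the variants can be compared against a reference. The sketch below does not use an external wild-type sequence; it assumes the per-position majority residue across the four variants is the reference, which works here because each variant changes a different position. Function and variable names are illustrative.

```python
from collections import Counter

VARIANTS = {
    "1": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSLTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "2": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRLVTTLQQLLT",
    "3": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "4": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFFSKFTNQLLLSLLEAVIRTVTTLQQLLT",
}


def consensus(sequences):
    """Majority residue at each position (assumed to equal the unmutated sequence)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


def substitutions(reference, variant):
    """List differences in standard notation, e.g. 'S36L' (1-based positions)."""
    return [f"{ref}{pos}{var}"
            for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
            if ref != var]


reference = consensus(VARIANTS.values())
for name, sequence in VARIANTS.items():
    print(name, substitutions(reference, sequence))
```

Each variant should report a single substitution matching its label in the list (S→L, T→L, A→L, L→F), which also gives the residue numbers needed when discussing where in the transmembrane segment each change falls.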

Week 6 HW: genetic-circuits-part-i

Assignment: DNA Assembly

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion High-Fidelity PCR Master Mix is a pre-mixed PCR solution containing several essential components for high-precision DNA amplification. Its main components and functions are:

  1. Phusion DNA Polymerase
    This is a high-fidelity, Pyrococcus-like proofreading polymerase fused to a processivity-enhancing DNA-binding domain. Its function is to synthesize new DNA strands during PCR. It has 3’→5’ exonuclease proofreading activity, which reduces replication errors.
  2. dNTPs (deoxynucleotide triphosphates)
    These include dATP, dTTP, dCTP, and dGTP. They are the building blocks that the polymerase uses to form the new DNA strand.
  3. Reaction Buffer (Phusion HF or GC buffer)
    This maintains optimal chemical conditions for enzyme activity, such as appropriate pH and ionic stability. It also contains salts that improve PCR specificity.
  4. Mg²⁺ (usually MgCl₂)
    Magnesium is an essential cofactor for DNA polymerase. It allows the enzyme to catalyze the formation of phosphodiester bonds between nucleotides.
  5. Stabilizers and detergents
    Small molecules that protect the enzyme and improve reaction stability, helping to maintain activity during the thermal cycles of PCR.

Together, these components allow for DNA amplification with high precision and a low error rate, which is especially important in applications such as site-directed mutagenesis or cloning, where even a single nucleotide error can alter the experimental outcome.

What are some factors that determine primer annealing temperature during PCR?

The annealing temperature in PCR depends primarily on the primer properties and reaction conditions. Some key factors are:

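  1. Primer melting temperature (Tm): This is the most important factor. Tm depends on the primer sequence and represents the temperature at which 50% of the primer is bound to the template DNA. The annealing temperature is usually set 3–5 °C below the lower Tm of the primer pair, although some polymerases, including Phusion, are typically run at a higher annealing temperature calculated with the manufacturer's tool.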
  2. GC content: G–C pairs form three hydrogen bonds, while A–T pairs form two. Therefore, a primer with a higher percentage of GC will have a higher Tm and require a higher annealing temperature.
  3. Primer length: Longer primers form more interactions with the template DNA, which increases hybrid stability and raises the annealing temperature.
  4. Complementarity with the template DNA: If mismatches exist between the primer and the target sequence, the binding is less stable and may require lower hybridization temperatures.
  5. Reaction conditions: Factors such as Mg²⁺ concentration, salts, and buffer additives influence the stability of DNA-DNA hybridization and can modify the optimal temperature.
  6. Primer secondary structures: The formation of primer dimers or hairpins can affect the primer’s availability to bind to the template DNA, altering the optimal hybridization temperature.

Together, these factors determine the temperature at which the primer binds specifically and efficiently to DNA during PCR, which is crucial for accurate amplification.

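To make the dependence on GC content and length concrete, the sketch below estimates Tm with two textbook approximations: the Wallace rule and a length-corrected GC formula. Both are rough estimates that ignore salt, Mg²⁺ and nearest-neighbor effects, so a vendor Tm calculator should be preferred for real primers. The example primer is made up.

```python
def gc_count(primer):
    """Number of G and C bases in the primer."""
    p = primer.upper()
    return p.count("G") + p.count("C")


def tm_wallace(primer):
    """Wallace rule: 2 °C per A/T and 4 °C per G/C (short oligos only)."""
    gc = gc_count(primer)
    return 2 * (len(primer) - gc) + 4 * gc


def tm_gc_length(primer):
    """Length-corrected GC formula, commonly quoted for longer primers."""
    return 64.9 + 41 * (gc_count(primer) - 16.4) / len(primer)


primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% GC
print("GC fraction:", gc_count(primer) / len(primer))
print("Tm (Wallace):", tm_wallace(primer), "°C")
print("Tm (GC/length):", round(tm_gc_length(primer), 1), "°C")
```

The two estimates differ by several degrees for the same primer, which is one reason annealing temperatures are usually refined empirically, for example with a gradient PCR.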
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

Two common methods used to generate linear DNA fragments are PCR amplification and restriction enzyme digestion, and they differ in both protocol and experimental applications.
PCR (Polymerase Chain Reaction) amplifies a specific DNA region using sequence-specific primers, a thermostable DNA polymerase, nucleotides, and thermal cycling. The process involves repeated cycles of denaturation, primer annealing, and extension, producing large amounts of a defined DNA fragment. PCR is particularly useful when the desired DNA segment must be selectively amplified from a template, when the starting DNA is present in small amounts, or when mutations, tags, or new restriction sites need to be introduced through primer design.
In contrast, restriction enzyme digestion uses enzymes that recognize specific DNA sequences and cut the DNA at those sites, generating defined fragments with blunt or sticky ends. The protocol is simpler: DNA is incubated with the appropriate restriction enzyme(s), buffer, and temperature conditions. This method is preferable when the DNA already contains the desired restriction sites, when one needs precise ends for cloning, or when verifying plasmids or isolating fragments from existing constructs.
In summary, PCR is best for generating or modifying specific DNA sequences and amplifying small amounts of DNA, while restriction enzyme digestion is ideal for cutting existing DNA molecules at known sites to produce fragments suitable for cloning or analysis. Both methods generate linear DNA but serve different experimental purposes.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

To ensure that DNA fragments generated by PCR or restriction digestion are suitable for Gibson Assembly, the key requirement is that all fragments share compatible overlapping ends.

  1. Design overlapping regions (most critical factor): Each fragment must have ~20–40 bp of sequence homology with the adjacent fragment.
    • For PCR products, this is achieved by adding overlaps directly into the 5′ ends of primers.
    • For restriction-digested fragments, you must verify that the cut sites preserve or expose the required overlaps, or add them via PCR if needed.
  2. Ensure correct fragment sequence and orientation: The overlaps must correspond exactly to the adjacent fragment sequence and direction, otherwise assembly will fail or produce incorrect constructs.
  3. Use high-fidelity PCR: Amplify fragments with a high-accuracy enzyme (e.g., Phusion) to minimize mutations that could disrupt overlap regions or coding sequences.
  4. Generate clean, specific fragments:
    • Confirm size by gel electrophoresis
    • Purify fragments to remove primers, enzymes, and nonspecific products
    • Contaminants can inhibit the Gibson reaction.
  5. Avoid incompatible ends: Unlike traditional cloning, Gibson does not require restriction sites, but fragments must be linear and not have conflicting overhangs that interfere with exonuclease processing.
  6. Balance fragment concentrations: Use roughly equimolar amounts of each fragment to improve assembly efficiency.
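Two of the checks above lend themselves to a quick script: confirming that adjacent fragments share a 20–40 bp overlap, and converting between mass and molar amounts so that fragments can be mixed at roughly equimolar ratios. The sketch below is a minimal illustration with made-up sequences. The conversion uses the common approximation of 650 Da per base pair of double-stranded DNA.

```python
def shared_overlap(upstream, downstream, min_len=20, max_len=40):
    """Length of the longest upstream suffix that equals the downstream prefix.

    Returns 0 if no overlap within [min_len, max_len] is found.
    """
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


def pmol_from_ng(ng, length_bp):
    """Approximate pmol of dsDNA, assuming 650 Da per base pair."""
    return ng * 1000 / (length_bp * 650)


def ng_for_pmol(pmol, length_bp):
    """Mass in ng that contains the requested pmol of a dsDNA fragment."""
    return pmol * length_bp * 650 / 1000


# Hypothetical fragments sharing a 26 bp junction.
junction = "ACGTTGCAAGCTTGGATCCGATATCG"
fragment_a = "TTTTT" + junction
fragment_b = junction + "CCCCC"

print("overlap (bp):", shared_overlap(fragment_a, fragment_b))
print("ng of a 3000 bp fragment for 0.05 pmol:", ng_for_pmol(0.05, 3000))
```

A return value of 0 from `shared_overlap` would indicate that the overlap has to be added through primer design before assembly.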
How does the plasmid DNA enter the E. coli cells during transformation?

During transformation, plasmid DNA enters Escherichia coli cells only after the cells are made competent, meaning their membranes are temporarily altered to allow DNA uptake. There are two main mechanisms:

  1. Chemical transformation (CaCl₂ + heat shock): Cells are treated with calcium chloride, which helps neutralize the negative charges on both the DNA and the bacterial cell membrane. This allows the plasmid DNA to approach the cell surface. A brief heat shock (≈42 °C) then creates a thermal imbalance that induces transient pores or increases membrane permeability, enabling the plasmid DNA to enter the cytoplasm.
  2. Electroporation: Cells are exposed to a short, high-voltage electric pulse, which creates temporary pores in the membrane. The plasmid DNA is driven into the cell by the electric field and enters through these pores. After the pulse, the membrane reseals.
Describe another assembly method in detail (such as Golden Gate Assembly)

Golden Gate Assembly is a DNA assembly technique that uses Type IIS restriction enzymes, such as BsaI, which cut outside of their recognition sequences. This allows researchers to design custom overhangs that determine how multiple DNA fragments will join together in a specific order. In a single reaction, the DNA fragments, restriction enzyme, and ligase are combined. The enzyme cuts the DNA to create compatible overhangs, and the ligase joins the fragments, resulting in a seamless assembly where the original restriction sites are removed. This method is highly efficient for assembling multiple DNA fragments simultaneously and in a defined orientation, making it ideal for modular cloning applications. It is particularly useful when working with standardized genetic parts, such as promoters, coding sequences, and terminators. However, it requires careful design to avoid internal restriction sites within the fragments and to ensure correct overhang compatibility. Overall, Golden Gate Assembly enables rapid, scarless, and high-throughput DNA construction.

Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Diagram (simplified)
Step 1: Fragments flanked by Type IIS (BsaI) sites
[Fragment A] --BsaI--> [Fragment B] --BsaI--> [Fragment C]
Step 2: Cutting creates custom overhangs (shown as 3 nt for brevity; BsaI leaves 4-nt overhangs)
A: ATG |---
B: | TAC
C: | GGA
Step 3: Complementary overhangs anneal
ATG --- TAC --- GGA
Step 4: Ligase seals fragments
[Fragment A][Fragment B][Fragment C] (final construct)
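The design constraint mentioned above, avoiding internal restriction sites, can be checked before ordering parts. The sketch below scans a sequence for the BsaI recognition site (GGTCTC) on both strands. The function names and example sequences are illustrative.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_internal_sites(seq, site="GGTCTC"):
    """Return sorted (position, strand) pairs for every occurrence of the site.

    Positions are 0-based indices on the given strand; '-' marks a site
    that reads as the recognition sequence on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


part = "ATGAAAGGTCTCTTTGAGACCTAA"  # hypothetical part with two internal sites
print(find_internal_sites(part))
```

Any hit inside a part would need to be removed, for example by a silent mutation, before the part is used in a BsaI-based assembly.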

Assignment: Asimov Kernel

The construction consists of assembling three independent expression modules onto a single pUC-CmR vector: pTet–RBS A1–LacI–L3S2P24 terminator, pLac–RBS A1–λCI–L3S2P24, and pλ–RBS A1–PhlF–L3S2P24. Each module includes a specific promoter, a strong ribosomal binding site (A1), a repressor CDS, and a strong bacterial terminator to isolate transcription. The three cassettes are arranged in series to form a closed-loop cross-repression system.

The simulation was plausible; the design needs to be optimized, but I couldn’t do it due to lack of time.

Function

Each module should function as a “block” in which the promoter drives expression of its repressor, and that repressor inhibits the next promoter in the chain: pTet produces LacI, which turns off pLac; pLac produces λCI, which turns off pλ; and pλ produces PhlF, which is intended to turn off pTet. This closes a cycle of cross-repression that, ideally, produces oscillations in the expression of each gene. PhlF natively represses its own cognate promoter rather than pTet (which TetR represses), so closing the loop as described would require swapping either the third repressor or the first promoter.
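The intended behavior of an idealized three-repressor ring can be sketched with a minimal protein-only model. This is not the Asimov Kernel simulation: the equations, the parameter values (`alpha`, `n`) and the starting concentrations are assumptions chosen only to show how cyclic repression can oscillate. Each species is produced at a rate repressed by the previous one in the ring and decays at unit rate.

```python
def simulate_ring(alpha=50.0, n=4.0, dt=0.01, steps=6000, start=(1.0, 2.0, 3.0)):
    """Forward-Euler integration of dp_i/dt = alpha / (1 + p_prev**n) - p_i.

    Variable names follow the ring described in the text; all values are
    dimensionless and hypothetical.
    """
    laci, ci, phlf = start
    history = []
    for _ in range(steps):
        d_laci = alpha / (1 + phlf ** n) - laci  # first promoter, repressed by the last repressor
        d_ci = alpha / (1 + laci ** n) - ci      # pLac, repressed by LacI
        d_phlf = alpha / (1 + ci ** n) - phlf    # p-lambda, repressed by lambda CI
        laci += dt * d_laci
        ci += dt * d_ci
        phlf += dt * d_phlf
        history.append((laci, ci, phlf))
    return history


history = simulate_ring()
late_laci = [state[0] for state in history[len(history) // 2:]]
print("LacI swing in second half of run:", round(max(late_laci) - min(late_laci), 2))
```

In this toy model sustained oscillation depends on sufficiently cooperative repression (a large enough `n`) and strong promoters. With weak or non-cooperative repression the three levels settle to a steady state instead.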

Subsections of Week 6 HW: genetic-circuits-part-i

Week 6 Lab

Laboratory (Week 6): Gibson Assembly

Laboratory Protocol: The Quest for Chromophore Color Cloning

Introduction | Objective

Variant | Bases
Original | TGTCAG (AC)
Orange | Gly: GGT, GGC, GGA, GGG; Val: GTT, GTC, GTA, GTG
Pink | Ala: GCC, GCA, GCT; Cys: TGT, TGC

Week 7 HW: genetic-circuits-part-ii

Task Part 1: Intracellular Artificial Neural Networks (IANN)

1. Advantages of IANNs vs. Boolean Logic

  • Continuous/gradual response (not just ON/OFF)
  • Integrate multiple weighted signals (weighted sum type)
  • Allow thresholds, nonlinearity, and fine-tuning
  • Greater robustness to biological noise
  • Scalability for complex functions (not just AND/OR)

2. Useful Application (example: intelligent intestinal biosensor)

Input:
X1 = microbial metabolites (SCFAs)
X2 = inflammatory signal (NF-κB)
X3 = drug present
Behavior:
The IANN integrates signals → calculates an “activation”
If it exceeds the threshold → activates the expression of a therapeutic (or fluorescent) protein

Output:
Proportional protein production (non-binary)

Limitations:

Cellular noise and variability
Difficulty in adjusting “biological weights”
Slow kinetics (Tx/Tl)
Cellular metabolic load
Interference with endogenous networks
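The input, behavior and output described above amount to a weighted sum passed through a threshold-like response. The sketch below illustrates that idea with a sigmoid. The weights, threshold, steepness and input levels are hypothetical placeholders, not measured biological parameters.

```python
import math

# Hypothetical weights: signs and magnitudes are illustrative only.
WEIGHTS = {"scfa": -1.0, "nfkb": 2.0, "drug": -0.5}
THRESHOLD = 1.0


def activation(inputs, weights=WEIGHTS, threshold=THRESHOLD, steepness=4.0):
    """Weighted sum of the inputs passed through a sigmoid centred on the threshold.

    The result lies between 0 and 1 and stands in for relative protein output.
    """
    total = sum(weights[name] * level for name, level in inputs.items())
    return 1.0 / (1.0 + math.exp(-steepness * (total - threshold)))


healthy = {"scfa": 1.0, "nfkb": 0.1, "drug": 0.0}
inflamed = {"scfa": 0.2, "nfkb": 1.5, "drug": 0.0}
print("healthy:", round(activation(healthy), 3))
print("inflamed:", round(activation(inflamed), 3))
```

Unlike a Boolean gate, the output varies continuously near the threshold, which corresponds to the proportional (non-binary) protein production listed as the output.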

3. Intracellular Perceptron

This figure illustrates a synthetic biological perceptron designed to sense gut metabolic signals. The circuit integrates two key inputs, short-chain fatty acids (butyrate) and bile acids, which modulate the expression of RNA-processing enzymes. These enzymes control the stability of the reporter mRNA (mNeonGreen), determining fluorescence output. The system discriminates between dysbiotic and healthy states based on the combined input signals.

Circuit interpretation:

The system works like a biological perceptron that integrates two environmental signals:
  • X1 (butyrate/SCFAs) → inhibitory signal
  • X2 (bile salts) → activating signal
Under normal conditions (healthy state):
  • High X1 represses CasE expression
  • X2 at basal levels does not activate Csy4
  • Because the output RNA (NG, mNeonGreen) is not processed, it is translated normally, giving a controlled basal fluorescence. With no cleavage, the circuit is “off” in terms of the dysbiosis response.
In conditions of dysbiosis:
  • Decrease in X1 = CasE expression is disinhibited
  • Increase in X2 = activates Csy4 expression
  • Both effectors process/cleave the NG RNA, so translation is blocked and there is no fluorescence
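The healthy and dysbiotic cases above can be expressed as a small quantitative sketch. It assumes Hill-type regulation of CasE (repressed by butyrate) and Csy4 (activated by bile salts), and it assumes that each endonuclease cleaves the reporter mRNA independently. The text does not specify what happens when only one enzyme is present, so that part, like the constants `k` and `n`, is an assumption.

```python
def hill_repression(signal, k=1.0, n=2.0):
    """Fraction of maximal expression when the signal acts as a repressor."""
    return 1.0 / (1.0 + (signal / k) ** n)


def hill_activation(signal, k=1.0, n=2.0):
    """Fraction of maximal expression when the signal acts as an activator."""
    ratio = (signal / k) ** n
    return ratio / (1.0 + ratio)


def relative_fluorescence(butyrate, bile_salts):
    """Fraction of reporter mRNA left intact and therefore translated."""
    case_level = hill_repression(butyrate)    # CasE is disinhibited when butyrate is low
    csy4_level = hill_activation(bile_salts)  # Csy4 is induced when bile salts are high
    return (1.0 - case_level) * (1.0 - csy4_level)


print("healthy:", round(relative_fluorescence(butyrate=10.0, bile_salts=0.1), 3))
print("dysbiosis:", round(relative_fluorescence(butyrate=0.1, bile_salts=10.0), 3))
```

Under these assumptions the reporter stays bright in the healthy state and is nearly dark in dysbiosis, matching the qualitative description.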

Summary

Week 9 HW: Cell-Free System

General homework questions

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free protein synthesis offers key advantages over in vivo systems by eliminating the complexity and limitations of the cell as a “black box.” In these systems, all components are defined and manipulable, allowing direct control over variables such as DNA concentrations, expression levels, biochemical composition, cofactors, and reaction conditions. In fact, expression can be precisely adjusted simply by varying the DNA concentration, achieving proportional regulation of each protein—something difficult to achieve in living cells. Furthermore, the system is fully customizable, allowing modification of the internal chemistry and each molecular component, which provides a level of experimental control and predictability far superior to that of traditional cell systems.

This approach is particularly advantageous in several scenarios. First, in the rapid prototyping of metabolic pathways or gene networks, as it allows the expression of multiple proteins in just a few hours directly from linear DNA, avoiding complex steps such as cloning and cell culture. Second, in applications requiring fine control of protein stoichiometry, since it is possible to simultaneously modulate the expression of multiple genes in the same system. Additionally, it is particularly useful for producing compounds or proteins that would be toxic or difficult to handle in living cells, and for on-demand biofabrication (e.g., rapid synthesis of proteins or drugs), where the simplicity and speed of the cell-free system represent a significant advantage.

Describe the main components of a cell-free expression system and explain the role of each component.

Components:
  • Cell extract (transcription/translation machinery): contains ribosomes, tRNA, and associated factors that enable protein synthesis; it acts as the “functional cytoplasm” of the system and directly executes gene expression.

  • Template DNA (plasmid or linear): provides the genetic information for the protein of interest; its concentration determines the level of expression and allows for quantitative modulation of protein production.

  • Nucleotides (ATP, GTP, CTP, UTP): are the substrates for RNA synthesis during transcription; in addition, ATP and GTP participate as energy sources in different steps of translation.

  • Amino acids: constitute the building blocks for protein synthesis; they must be present in adequate concentrations to sustain translation.

  • Energy regeneration system: maintains constant ATP levels; it is essential because the system does not have its own active metabolism, and energy would be rapidly consumed without regeneration.

  • Salts and ions (Mg²⁺, K⁺, etc.): stabilize the structure of ribosomes and enzymes; they regulate the efficiency and fidelity of translation.

  • Cofactors and small molecules: include compounds necessary for enzymatic activity (such as NAD⁺, CoA); they allow essential biochemical reactions to occur within the system.

  • Chaperones and folding factors (optional): help newly synthesized proteins acquire their correct functional structure, especially in complex proteins.

A cell-free expression system essentially consists of a cell extract containing the transcription and translation machinery (ribosomes, tRNA, initiation, elongation, and termination factors), along with the enzymes necessary for RNA and protein synthesis. This extract constitutes the system’s “functional cytoplasm” and allows genetic information to be translated into protein without the need for a living cell. Added to this is the template DNA (plasmid or linear fragment), which provides the genetic information to be expressed, and whose concentration can directly modulate protein expression levels.

Furthermore, the system includes a mixture of small molecules and cofactors: nucleotides (for transcription), amino acids (for translation), energy sources (such as ATP or regenerative systems), salts and ions that stabilize the machinery, and in some cases, chaperones or additional components that promote protein folding. A key feature is that all these components are defined and adjustable, allowing fundamental control of the system’s biochemistry, including which molecules participate and under what conditions the reaction occurs.

Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Energy provision and regeneration are critical in cell-free systems because transcription and translation processes are highly demanding on ATP and GTP. Unlike a living cell, where active metabolic pathways continuously regenerate these nucleotides, in a cell-free system the energy pool is rapidly depleted if a regenerative system is not implemented. This leads to premature cessation of protein synthesis and low overall system efficiency, limiting both the yield and duration of the reaction.
To ensure a continuous supply of ATP, a regeneration system based on phosphoenolpyruvate (PEP) and pyruvate kinase can be employed. In this scheme, PEP acts as a high-energy phosphate donor, enabling the sustained conversion of ADP to ATP. Alternatively, more stable and cost-effective systems can be used, such as those based on creatine phosphate/creatine kinase, or even more complex energy sources like glucose or maltodextrin coupled to regenerative enzyme pathways. These approaches allow extending the duration of the reaction and maintaining adequate energy levels for efficient protein synthesis.

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Cell-free prokaryotic systems (e.g., those based on E. coli) are characterized by their high efficiency, low cost, and speed, making them ideal for large-scale production and rapid prototyping. However, they lack the machinery necessary to perform complex post-translational modifications (such as glycosylation or compartmentalization-dependent folding). In contrast, eukaryotic systems (derived from yeast, insect, or mammalian cell extracts) allow for more precise folding and the incorporation of post-translational modifications, although they are typically more expensive and less efficient. This difference aligns with the general principle that cell-free systems are highly tunable in their composition, allowing the selection of the extract source according to experimental needs.
As an example, I would choose to produce a bacterial metabolic enzyme (e.g., β-galactosidase) in a prokaryotic system, as it does not require complex post-translational modifications and benefits from the system’s high efficiency. In contrast, for a eukaryotic system, I would select an antibody or a human membrane protein, which require proper folding and possible modifications such as glycosylation to be functional. This type of choice is justified by each system’s differential ability to reproduce specific cellular conditions.

How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

To optimize the expression of a membrane protein in a cell-free system, I would design an experiment based on the system’s ability to be fully tunable in its composition. I would use an extract (preferably prokaryotic for initial simplicity) and systematically vary key conditions: DNA concentration, temperature, Mg²⁺ and K⁺ concentrations, and reaction time. In parallel, I would incorporate different membrane-mimetic environments (mild detergents, liposomes, or nanodiscs) directly into the reaction mixture, evaluating which one promotes the greatest protein solubility and activity. The design would be a parallel screening experiment, taking advantage of the system’s speed to compare multiple conditions in a short time.

The main challenge is correct folding and insertion into a lipid environment, since in the absence of a membrane, the protein tends to aggregate or lose functionality. To address this, I would include stabilizing agents (non-ionic detergents such as DDM), nanodisc systems or lipid vesicles that allow co-translational insertion, and potentially chaperones if the system allows it. Another problem is the relatively low yield, which can be mitigated by optimizing the system’s energy and DNA concentration (directly controllable).
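The parallel screening design described above can be laid out as a full-factorial grid of reaction conditions. The sketch below generates such a grid. Every level listed (DNA, Mg²⁺, K⁺, temperature, membrane mimetic) is a placeholder chosen for illustration, not a recommended value.

```python
from itertools import product

# Placeholder levels for each factor; real ranges depend on the extract used.
FACTORS = {
    "dna_nM": [1, 5, 10],
    "mg_mM": [8, 12, 16],
    "k_mM": [60, 100, 140],
    "temp_C": [25, 30, 37],
    "mimetic": ["none", "DDM", "liposomes", "nanodiscs"],
}


def build_conditions(factors):
    """Return one dict per reaction, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


conditions = build_conditions(FACTORS)
print(len(conditions), "reactions; first condition:", conditions[0])
```

A full grid grows quickly with the number of factors, so in practice one might screen the membrane mimetics first and then refine ion concentrations around the best performer.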

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Low protein production in a cell-free system can be due, firstly, to an inadequate concentration or poor quality of the template DNA. Since expression depends directly on the amount of DNA present, low concentration, degradation, or impurities can limit protein synthesis. As a strategy, the DNA concentration should be optimized over a broad range and its integrity verified (e.g., by electrophoresis), taking advantage of the fact that the system allows direct modulation of this parameter.
Secondly, there may be a limitation in the provision or regeneration of energy, leading to early reaction arrest. Without an efficient ATP regeneration system, the transcription/translation machinery quickly becomes inactive. To address this, the energy system can be optimized or changed (e.g., PEP or creatine phosphate), and the reaction conditions can be adjusted to extend its duration.
Finally, a third factor can be a suboptimal biochemical environment (ions, cofactors, or protein folding), which affects translation efficiency or protein stability. Since the system is fully adjustable in its composition, concentrations of Mg²⁺ and K⁺ can be optimized, chaperones can be added, or conditions such as temperature can be modified to improve performance.

Homework question from Kate Adamala

Design an example of a useful synthetic minimal cell as follows:
  1. Pick a function and describe it.
      a. What would your synthetic cell do? What is the input and what is the output?
    Chosen function: Synthetic biosensor for intestinal inflammation
    The synthetic minimal cell would be designed to detect a specific inflammatory signal (e.g., a cytokine or metabolite associated with intestinal inflammation) and respond by producing a reporter or therapeutic protein. The system’s input would be the signal molecule (e.g., TNF-α or a metabolite derived from dysbiosis), which would be recognized by a sensor module (such as a receptor or transcription-coupled regulatory system). The output would be the controlled expression of a protein, such as a fluorescent protein (for diagnostics) or an anti-inflammatory protein (for intervention).
    This design is based on the possibility of fully controlling the components of the cell-free system and programming specific gene circuits. Thus, the “cell” is not living in the strict sense, but rather a programmable minimal system where the input-output relationship is precisely defined, enabling rapid, modular, and highly specific applications.
      b. Could this function be realized by cell-free Tx/Tl alone, without encapsulation?
    Yes, this function can be performed using only a cell-free transcription/translation (Tx/Tl) system, without encapsulation, especially in the case of an in vitro biosensor. The recitation highlights that these systems are fully defined and programmable, allowing gene circuits to be executed directly in solution. In this context, the biological sample acts as the input, the circuit responds, and the output (e.g., fluorescence or an enzyme signal) is generated directly in the reaction medium, without requiring a “cell” as a physical compartment. Encapsulation could become relevant depending on the objective: it is not necessary for plate-based detection (where the system functions as an open biosensor), but it would be useful for greater stability, portability, or in situ/in vivo applications, because it isolates the system, protects it from interference, and creates a more controlled microenvironment.
      c. Could this function be realized by genetically modified natural cell?
    Yes, this function could be performed using a genetically modified natural cell, since it is possible to introduce a sensor-response gene circuit that detects a specific signal (for example, a molecule associated with inflammation) and activates the expression of a reporter or therapeutic protein. This type of design is common in synthetic biology, where inducible promoters and transcriptional regulators are used to couple an environmental input to a functional output within a living cell. Compared to the cell-free system, the use of cells introduces less experimental control and greater complexity due to endogenous regulation, metabolism, and potential unwanted interactions. While in cell-free Tx/Tl systems all components and conditions are defined and adjustable, in living cells there are limitations such as toxicity, biological variability, and lower predictability. Therefore, although feasible, the choice between the two approaches depends on the balance between control (in vitro) and integrated functionality (in vivo).
      d. Describe the desired outcome of your synthetic cell operation.

  2. Design all components that would need to be part of your synthetic cell
      a. What would be the membrane made of?
    Following Kate Adamala, a synthetic cell should include a lipid bilayer membrane, typically built from phospholipids forming liposomes, to create a defined compartment that mimics cellular boundaries.

  b. What would you encapsulate inside? Enzymes, small molecules.
Inside the compartment, you would encapsulate a minimal gene expression system: DNA, RNA polymerase, ribosomes, tRNAs, enzymes, nucleotides, amino acids, and an energy regeneration system—essentially a cell-free Tx/Tl system confined within the vesicle.

  c. Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason?
A bacterial system (e.g., E. coli) is typically used because of its robustness and simplicity for building minimal cells, unless specific eukaryotic features are required.

  d. How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)
Communication is achieved by membrane permeability or engineered channels. Small molecules may diffuse passively, but for controlled exchange, membrane proteins (pores or transporters) can be incorporated to regulate input and output, enabling interaction with the environment.

Homework question from Peter Nguyen

  • Write a one-sentence summary pitch sentence describing your concept.
    A smart textile embedded with freeze-dried cell-free systems that detects inflammatory biomarkers in sweat and produces a visible signal for real-time health monitoring.
  • How will the idea work, in more detail?
    Inspired by Peter Nguyen, the textile would incorporate freeze-dried cell-free Tx/Tl reactions within fibers or patches. Upon contact with sweat (rehydration trigger), the system activates and detects specific metabolites or proteins associated with inflammation or stress. The embedded genetic circuit drives the expression of a colorimetric or fluorescent reporter, enabling immediate visual readout. The system remains inactive and stable until hydration, ensuring on-demand functionality.
  • What societal challenge or market need will this address?
    This addresses the need for non-invasive, real-time health monitoring, particularly for chronic inflammatory conditions, athletes, or early disease detection, reducing reliance on laboratory diagnostics.
  • How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?
    The system leverages freeze-drying for long-term stability and uses sweat as a natural activation mechanism. To address one-time use, the textile could incorporate replaceable sensing patches, while stability can be enhanced through protective matrices and material integration, as suggested in freeze-dried cell-free platforms.

Homework question from Ally Huang

Inspired by Ally Huang, a major challenge in space is microbial dysbiosis and altered host–microbe interactions under microgravity, which can affect astronaut health and immune function. Limited access to laboratory infrastructure makes real-time molecular diagnostics difficult. Developing portable, low-resource biosensing systems is critical for long-duration missions (e.g., Mars). Cell-free systems offer a unique solution due to their stability, programmability, and minimal requirements, making them ideal for monitoring biological changes in space environments.

  • Molecular or genetic target
    Inflammation-associated cytokine mRNA (e.g., IL-6) and microbial metabolite-responsive regulatory elements.

  • Relation to space biology challenge
    Altered microbiota and immune dysregulation in space can lead to increased inflammation and infection risk. Monitoring biomarkers such as IL-6 provides insight into astronaut immune status. A cell-free system can be designed to detect these molecular signals directly from biological samples (e.g., saliva), enabling rapid assessment of physiological changes without complex lab equipment.

  • Hypothesis / research goal
    We hypothesize that a freeze-dried BioBits® cell-free system can be engineered to detect inflammation-associated molecular signals (e.g., IL-6 mRNA or related metabolites) in astronaut samples under microgravity conditions. Upon rehydration, the system will activate and produce a measurable fluorescent signal proportional to the target concentration. This approach leverages the stability and programmability of cell-free systems to function reliably in space. The goal is to demonstrate that biological sensing can be performed in a minimal, portable format, supporting astronaut health monitoring during long-term missions.

  • Experimental plan
    Samples: simulated saliva containing target RNA or metabolites. Controls: negative (no target), positive (known concentration). Use miniPCR® to amplify target sequences if needed. Add samples to freeze-dried BioBits® reactions and incubate. Measure fluorescence using P51 viewer. Compare signal intensity across conditions to evaluate sensitivity and specificity of detection.

Homework Part B: Individual Final Project

Week 10 HW: imaging and measurement

Proposed set of measurements to be implemented in the project, subject to refinement as the study progresses.

Homework: Waters Part I — Molecular Weight

  1. Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

28,006.6 Da. It is calculated by summing the average masses of all amino acid residues in the sequence, including the His-tag, and accounting for the water lost during peptide bond formation. No modification changes the total mass substantially (chromophore maturation removes only about 20 Da), so this value approximates the expected intact mass of eGFP.
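The calculation described above can be sketched as a sum of average residue masses plus one water for the free termini. The eGFP sequence itself is not reproduced here, so the example uses a dipeptide. The function name is illustrative, and the masses are standard average values rounded to four decimals.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # Da, added once for the free N- and C-termini


def average_mass(sequence):
    """Average molecular weight of an unmodified, linear peptide or protein."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print("Gly-Gly:", round(average_mass("GG"), 2), "Da")
```

Applying the same function to the full construct sequence should give a value close to the one reported by an online calculator, before any correction for modifications.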

Using the adjacent charge state method, you first pick two neighboring peaks in the spectrum with m/z values m₁ (charge z) and m₂ (charge z + 1, so m₂ < m₁):

$$ z = \frac{m_2 - H}{m_1 - m_2} $$

Once z is known, calculate the molecular weight (MW):

$$ MW = z\,(m_1 - H) $$

Where H ≈ 1.0073 Da, the proton mass.

  • Choose two adjacent peaks (m₁, m₂)
  • Compute z
  • Plug into MW equation
    This gives the intact mass of eGFP.
  1. Determine the MW of the protein using the relationship between m/z, MW, and z.
    Use the standard ESI relationship between m/z, charge, and mass:

$$ \frac{m}{z} = \frac{MW + zH}{z} $$

Rearrange to solve for molecular weight:

$$ MW = z\left(\frac{m}{z} - H\right) $$

So, for a given peak:

  • Take its m/z value
  • Multiply by the charge z
  • Subtract z×1.0073 Da (proton mass)
    This gives the molecular weight of the protein.
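  1. Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1.
    To calculate the accuracy, use:

$$ \text{Accuracy (ppm)} = \frac{MW_{exp} - MW_{theory}}{MW_{theory}} \times 10^{6} $$

    From 2.1:
    $MW_{theory} = 28{,}006.6$ Da
    From 2.2 (deconvoluted value; a typical experimental value for eGFP is assumed here):
    $MW_{exp} \approx 28{,}000$ Da
    With these values, the accuracy is (28,000 − 28,006.6) / 28,006.6 × 10⁶ ≈ −236 ppm.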
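
A short Python sketch of the accuracy calculation follows. It assumes the common signed convention (observed minus theoretical, divided by theoretical, times 10⁶); the course formula may use an absolute value instead. The experimental mass is the assumed 28,000 Da from the text, not a measured value.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


mw_theory = 28006.6  # Da, calculated mass from question 2.1
mw_exp = 28000.0     # Da, assumed experimental value from the text

error = ppm_error(mw_exp, mw_theory)
print(f"Mass error: {error:.1f} ppm")
```

A negative result means the observed mass is lighter than the predicted one.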

  2. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?
    No, you cannot directly determine the charge state from the zoomed-in peak. This region corresponds to the deconvoluted (zero-charge) spectrum, where all charge states have already been mathematically combined into a single neutral mass peak (~28 kDa). Because the m/z information for individual ions is removed during deconvolution, the original charge state distribution is no longer visible.

Homework: Waters Part II — Secondary/Tertiary structure

  1. A mass spectrometer detects this difference through the charge state distribution. In the native state, the compact structure limits protonation, resulting in lower charge states (higher m/z values) and fewer, narrower peaks. In the denatured state, the unfolded protein can accommodate more charges, leading to higher charge states (lower m/z values) and a broader distribution of peaks.

In Figure 2, this is clearly observed:

  • The top spectrum (denatured) shows a wide range of peaks at lower m/z, corresponding to many high charge states.
  • The bottom spectrum (native) shows fewer peaks at higher m/z, indicating lower charge states and a more compact structure.

Thus, the shift in charge state distribution and m/z range directly reflects the conformational state of the protein.
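
To make the link between charge state and m/z concrete, the sketch below applies the relationship from Part I, m/z = MW / z + H, to the calculated eGFP mass. The charge ranges are illustrative assumptions for a folded and an unfolded protein; they are not read from Figure 2.

```python
PROTON = 1.0073  # Da


def mz_for_charge(mass: float, z: int) -> float:
    """m/z of a protein of neutral mass `mass` carrying z protons."""
    return mass / z + PROTON


egfp_mass = 28006.6  # Da, calculated mass from Part I

# Assumed charge ranges, chosen only to illustrate the trend
native_like = {z: round(mz_for_charge(egfp_mass, z), 1) for z in (9, 10, 11)}
denatured_like = {z: round(mz_for_charge(egfp_mass, z), 1) for z in (20, 30, 40)}

print("native-like charges   :", native_like)
print("denatured-like charges:", denatured_like)
```

Fewer charges put the same mass at higher m/z, which is why the native spectrum sits to the right of the denatured one.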

  1. Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 m/z? What is the charge state? How can you tell?
    Yes, the charge state can be determined from the isotopic peak spacing in the zoomed region.

At ~2800 m/z, the inset shows clearly resolved isotopic peaks. The spacing between adjacent isotopic peaks is approximately:
Δ(m/z) ≈ 1/z
From the figure, the peak spacing is about 0.5 m/z, so:
z ≈ 1/0.5 = 2
Final answer: the charge state is +2.

You can tell this because the isotopic peak spacing is inversely proportional to the charge state, and a ~0.5 m/z spacing corresponds to a doubly charged ion.
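
A charge assignment from isotope spacing can be cross-checked against the peak position, because z × (m/z − H) should land near the intact mass calculated in Part I. The sketch below uses the ~2800 m/z and ~0.5 m/z values quoted above as inputs; the function names are hypothetical. If the implied mass is far from 28,006.6 Da, the spacing is worth re-measuring on the figure.

```python
PROTON = 1.0073  # Da


def charge_from_isotope_spacing(spacing: float) -> int:
    """Charge state from the m/z gap between adjacent isotope peaks."""
    return round(1.0 / spacing)


def implied_mass(mz: float, z: int) -> float:
    """Neutral mass implied by a peak position and a charge state."""
    return z * (mz - PROTON)


def expected_charge(mass: float, mz: float) -> int:
    """Charge a species of known mass would need to appear at mz."""
    return round(mass / (mz - PROTON))


peak_mz = 2800.0   # approximate peak position quoted in the question
spacing = 0.5      # approximate isotope spacing quoted in the answer

z_from_spacing = charge_from_isotope_spacing(spacing)
mass_from_spacing = implied_mass(peak_mz, z_from_spacing)
z_for_intact = expected_charge(28006.6, peak_mz)

print(f"z from spacing: {z_from_spacing}, implied mass: {mass_from_spacing:.0f} Da")
print(f"z expected for intact eGFP at this m/z: {z_for_intact}, "
      f"isotope spacing would be about {1 / z_for_intact:.2f} m/z")
```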

Homework: Waters Part III — Peptide Mapping - primary structure

  1. How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above. (Note: adding the sequence to Benchling as an amino acid file and clicking biochemical properties tab will show you a count for each amino acid).
  2. How many peptides will be generated from tryptic digestion of eGFP?
  • Navigate to https://web.expasy.org/peptide_mass/
  • Copy/paste the sequence above into the input box in the PeptideMass tool to generate expected list of peptides.
  • Use Figure 4 below as a guide for the relevant parameters to predict peptides from eGFP.
  • Click “Perform the Cleavage” button in the PeptideMass tool and report the number of peptides generated when using trypsin to perform the digest.
    19 peptides
  1. Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance.
    In the peptide mapping chromatogram of eGFP (Figure 5a), approximately 9 chromatographic peaks are observed between 0.5 and 6 minutes with a relative abundance greater than 10%. These peaks correspond to the peptides generated after protein digestion.

  2. Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?
    Yes, the observed number of peaks is consistent with the predicted peptides. Approximately 18 chromatographic peaks were detected, which is very close to the 19 peptides predicted by ExPASy. Therefore, the results are within the expected range. Minor differences can be explained by factors such as co-elution, low-abundance peptides, or small peptides that are not efficiently detected.
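
The predicted peptide count comes from an in silico tryptic digest. A minimal sketch of the usual rule (cleave after K or R, but not before P) is shown below. The input is a short hypothetical sequence, not eGFP. PeptideMass also applies settings such as allowed missed cleavages and a minimum peptide mass, so a bare rule like this is not expected to reproduce its count exactly.

```python
import re


def tryptic_digest(sequence: str) -> list:
    """Split a protein sequence after every K or R not followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p]  # drop the empty trailing piece


# Hypothetical sequence for illustration (not the eGFP sequence)
demo = "MKAAGRPLLKDDERWK"
peptides = tryptic_digest(demo)

print(peptides)
print("number of peptides:", len(peptides))
```

The R in "GRP" is not cleaved because it is followed by proline, which is why the second peptide spans it.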

  3. Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state). Calculate the mass of the singly charged form of the peptide (MH+) based on its m/z and z.

  4. Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement? Please calculate the error in ppm. (Recall that Accuracy formula)

  5. What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)
    88%
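
Sequence coverage is the fraction of residues that fall inside at least one identified peptide. The sketch below shows the calculation with hypothetical peptide positions and a hypothetical protein length; it does not use the actual peptide list behind Figure 6.

```python
def sequence_coverage(protein_length: int, peptide_ranges) -> float:
    """Percent of residues covered by at least one peptide.

    peptide_ranges holds (start, end) positions, 1-based and inclusive.
    Overlapping peptides are only counted once.
    """
    covered = set()
    for start, end in peptide_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Hypothetical example: a 50-residue protein and three identified peptides
example_ranges = [(1, 20), (15, 30), (41, 44)]
coverage = sequence_coverage(50, example_ranges)
print(f"coverage: {coverage:.1f}%")
```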

Bonus Peptide Map Questions

  1. Can you determine the peptide sequence for the peptide fragmentation spectrum shown in Figure 5c? (HINT: Use your results from Question 2 above to match the peptide molecular weight that is closest to that shown in Figure 5b. Copy and paste its sequence into this tool online to predict the fragmentation pattern based on its amino acid sequence: http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet.html.) What is the sequence of the eGFP peptide that best matches the fragmentation spectrum in Figure 5c?
  2. Does the peptide map data make sense, i.e. do the results indicate the protein is the eGFP standard? Why or why not? Consult with Figure 6, which depicts the % amino acid coverage of peptides positively identified using their calculated mass and fragmentation pattern.

Homework: Waters Part IV — Oligomers

7FU Decamer: ~3.4 MDa
8FU Didecamer: ~8.33 MDa
8FU 3-Decamer: ~12.67 MDa
8FU 4-Decamer: not clearly represented in the given values.

Homework: Waters Part V — Did I make GFP?

Molecular weight (kDa)
Theoretical: 27.989 kDa
Observed (measured): 27.982 kDa
PPM mass error: 252.6 ppm
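
As a rough consistency check, the ppm error can be re-derived from the two masses as listed. Recomputing from the three-decimal kDa values gives a magnitude near 250 ppm, while the reported 252.6 ppm corresponds to a mass difference of about 7.07 Da. The small gap is compatible with rounding of the displayed masses, but that is an assumption, since the unrounded values are not listed here.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def mass_shift_from_ppm(ppm: float, theoretical: float) -> float:
    """Mass difference (same unit as `theoretical`) implied by a ppm error."""
    return ppm * theoretical / 1e6


theoretical_da = 27989.0  # 27.989 kDa as listed
observed_da = 27982.0     # 27.982 kDa as listed

from_rounded = ppm_error(observed_da, theoretical_da)
implied_shift = mass_shift_from_ppm(252.6, theoretical_da)

print(f"ppm error from listed masses: {from_rounded:.1f}")
print(f"mass difference implied by 252.6 ppm: {implied_shift:.2f} Da")
```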

Week 11 HW: building-genomes

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

What about this collaborative art experiment could be made better for next year?

I didn’t have the opportunity to contribute. I think it would be useful to design a protocol with https://rcdonovan.com and then calculate the volumes per well, concentrations, or data that can be used later.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

  1. Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.
    E. coli Lysate: Cell extract that provides the ribosomes, tRNAs, enzymes, and factors necessary for transcription and translation.
  • BL21 (DE3) Star Lysate (includes T7 RNA Polymerase): Optimized lysate that includes T7 RNA polymerase to efficiently transcribe genes under the T7 promoter.
    Salts/Buffer
  • Potassium Glutamate: Maintains ionic strength and protein stability, simulating intracellular conditions.
  • HEPES-KOH pH 7.5: Buffer that stabilizes the pH during the reaction.
  • Magnesium Glutamate: Supplies Mg²⁺, an essential cofactor for ribosomes, RNA polymerase, and enzymatic reactions.
  • Potassium phosphate monobasic: Contributes to the phosphate buffer system and ionic balance.
  • Potassium phosphate dibasic: Sets the pH together with the monobasic phosphate and stabilizes the medium.
    Energy/Nucleotide System
  • Ribose: Carbon source for energy regeneration and nucleotide synthesis.
  • Glucose: Energy source for ATP production in the system.
  • AMP: Nucleoside monophosphate that the lysate phosphorylates to ATP, which serves as both an RNA monomer and the main energy carrier.
  • CMP: Nucleoside monophosphate that is phosphorylated to CTP, a monomer for RNA synthesis.
  • GMP: Nucleoside monophosphate that is phosphorylated to GTP, a monomer for RNA synthesis and an energy source for translation.
  • UMP: Nucleoside monophosphate that is phosphorylated to UTP, a monomer for RNA synthesis.
  • Guanine: Nitrogenous base that can be recycled for nucleotide synthesis.
    Translation Mix (Amino Acids)
  • 17 Amino Acid Mix: Provides most of the amino acids necessary for protein synthesis.
  • Tyrosine: Amino acid added separately because of its low solubility.
  • Cysteine: Amino acid added separately due to its reactivity and tendency to oxidize.
    Additives
  • Nicotinamide: Precursor of NAD⁺/NADH, key for redox reactions and energy metabolism.
    Backfill
  • Nuclease Free Water: Brings the reaction to its final volume without introducing nucleases that would degrade RNA/DNA.
  1. Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)
    The 1-hour optimized PEP–NTP system uses phosphoenolpyruvate as a high-energy phosphate donor and directly supplies NTPs, enabling rapid transcription–translation but with limited longevity due to fast energy depletion and byproduct accumulation. In contrast, the 20-hour NMP–ribose–glucose system relies on slower metabolic regeneration of NTPs from nucleoside monophosphates using ribose and glucose, which reduces inhibitory byproducts and sustains protein synthesis for much longer periods, albeit with slower initial rates.

  2. Bonus question: How can transcription occur if GMP is not included but Guanine is?
    Transcription can still occur because guanine is salvaged into GMP inside the lysate. Purine salvage enzymes, such as xanthine-guanine phosphoribosyltransferase (Gpt) in E. coli, the counterpart of eukaryotic hypoxanthine-guanine phosphoribosyltransferase (HGPRT), convert guanine + PRPP into GMP. GMP is then phosphorylated to GDP and GTP, the actual substrate used by RNA polymerase.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

  1. Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)
| Protein | Relevant property | Impact in cell-free systems |
|---|---|---|
| sfGFP | Highly efficient folding (superfolder variant) | Folds robustly even under suboptimal conditions, producing fast and reliable fluorescence in CFE. |
| mRFP1 | Relatively slow maturation | Delays fluorescence signal, which can underestimate expression in short experiments. |
| mKO2 | pH sensitivity (higher pKa) | Fluorescence decreases under acidic conditions, making signal dependent on buffer composition. |
| mTurquoise2 | High quantum yield (very bright) | Generates strong signal even at low expression levels, improving detectability. |
| mScarlet-I | Fast maturation and high brightness | Enables strong fluorescence in short timeframes, ideal for rapid CFE assays. |
| Electra2 | Oxygen-dependent chromophore maturation | Fluorescence can be limited in low-oxygen conditions, affecting readout in closed systems. |
  1. Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.
    Hypothesis: For mScarlet-I, increasing the buffering capacity (e.g., higher HEPES-KOH) and supplementing the energy system (e.g., optimizing glucose/ribose levels) will maintain stable pH and ATP availability over 36 hours, thereby supporting sustained protein synthesis and efficient chromophore maturation, resulting in higher cumulative fluorescence.
    Expected effect: Improved long-term stability of the reaction environment will prevent fluorescence loss due to acidification and energy depletion, maximizing total signal output during extended incubation.

  2. The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions.

Hypothesis (tailored to mix):

  • For mKO2, increasing HEPES-KOH buffer concentration (from 45 mM to ~60–70 mM) and slightly adjusting potassium phosphate balance will stabilize pH over a 36-hour incubation, reducing fluorescence loss due to acidification.
  • Additionally, increasing glucose concentration (from 1.25 g/L to ~2–3 g/L) will enhance long-term ATP regeneration, sustaining protein synthesis and chromophore maturation.
    Expected effect: Improved pH stability will preserve mKO2 fluorescence (which is pH-sensitive), while enhanced energy availability will maintain translation over time, resulting in higher cumulative fluorescence after 36 hours.
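| Reagent | Preset | Current | Delta | Delta % |
|---|---|---|---|---|
| Cell Lysate | 6.000 µL | 6.000 µL | - | - |
| DNA Template | 2.000 µL | 2.000 µL | - | - |
| Nuclease-Free Water | 2.000 µL | 1.525 µL | -0.475 µL | -23.8% |
| Potassium Glutamate | 312.563 mM | 312.563 mM | - | - |
| Magnesium Glutamate | 6.975 mM | 6.975 mM | - | - |
| HEPES-KOH pH 7.5 | 45.000 mM | 60.000 mM | +15.000 mM | +33.3% |
| 17 Amino Acid Mix | 4.063 mM | 4.063 mM | - | - |
| Tyrosine | 4.063 mM | 4.063 mM | - | - |
| Cysteine | 4.000 mM | 4.000 mM | - | - |
| Ribose | 11.625 g/L | 11.625 g/L | - | - |
| AMP | 0.625 mM | 0.625 mM | - | - |
| CMP | 0.375 mM | 0.375 mM | - | - |
| GMP | - | - | - | - |
| UMP | 0.375 mM | 0.375 mM | - | - |
| Guanine | 0.156 mM | 0.156 mM | - | - |
| Glucose | 1.250 g/L | 3.000 g/L | +1.750 g/L | +140.0% |
| Potassium phosphate dibasic | 5.625 mM | 5.625 mM | - | - |
| Potassium phosphate monobasic | 5.625 mM | 5.625 mM | - | - |
| Nicotinamide | 3.125 mM | 3.125 mM | - | - |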
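
The Delta and Delta % columns can be reproduced with a few lines of Python. This sketch only recomputes the three rows that differ from the preset, using the values in the table. It does not model how the design tool itself derives the water backfill, which presumably shrinks to make room for the extra HEPES and glucose stock volumes.

```python
def delta(preset: float, current: float):
    """Return (absolute change, percent change relative to the preset)."""
    change = current - preset
    return change, 100.0 * change / preset


# (preset, current) pairs copied from the rows that were modified
modified = {
    "Nuclease-Free Water (uL)": (2.000, 1.525),
    "HEPES-KOH pH 7.5 (mM)": (45.000, 60.000),
    "Glucose (g/L)": (1.250, 3.000),
}

for reagent, (preset, current) in modified.items():
    change, percent = delta(preset, current)
    print(f"{reagent}: {change:+.3f} ({percent:+.1f}%)")
```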

Part D: Build-A-Cloud-Lab | (optional) Bonus Assignment