Week 04 - Part A: Conceptual Questions How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Meat is roughly 22% protein by mass, so 500 g × 0.22 = 110 g of protein.
An average amino acid is ≈ 100 Daltons, and 1 Dalton ≈ 1 g/mol, so 100 Da ≈ 100 g/mol. This molar mass converts grams of protein into moles of amino acids.
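To finish the conversion, the grams of protein are divided by the molar mass and multiplied by Avogadro's number. A minimal Python sketch, assuming the 22% protein fraction and 100 g/mol average stated above (the function and parameter names are illustrative, and the small mass of water lost in peptide bonds is ignored):

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g, protein_fraction=0.22, aa_molar_mass=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction   # grams of protein
    moles = protein_g / aa_molar_mass       # moles of amino acids
    return moles * AVOGADRO                 # molecules


n = amino_acid_molecules(500)
print(f"{n:.2e} amino acid molecules")
```

With these assumptions, 110 g of protein is about 1.1 mol of amino acids, on the order of 10²³ molecules.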
Week 5 Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation. Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card: Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence. To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison. Record the perplexity scores that indicate PepMLM’s confidence in the binders. Part 2: Evaluate Binders with AlphaFold3 Navigate to the AlphaFold Server: alphafoldserver.com For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried? In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder. Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:
Week 6 — Genetic Circuits Part I: Assembly Technologies DNA Assembly Answer these questions about the protocol in this week’s lab:
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
The Phusion High-Fidelity PCR Master Mix contains several components:
-Phusion DNA polymerase → a high-fidelity enzyme that synthesizes DNA with a very low error rate (reported to be about 50 times lower than Taq and 6 times lower than Pfu), which is critical when amplifying fragments of the amilCP gene and makes it a good choice for cloning.
-dNTPs (deoxynucleotide triphosphates) → building blocks for new DNA strands.
-MgCl₂ → cofactor necessary for polymerase activity.
-Buffer system → maintains optimal pH and ionic conditions.
These components work together to ensure accurate and efficient DNA amplification. Phusion DNA polymerases also offer robust performance with short protocol times, even in the presence of PCR inhibitors, and generate higher yields with less enzyme than other DNA polymerases. In this protocol, the master mix is used to amplify amilCP fragments that will later be assembled using Gibson Assembly.
What are some factors that determine primer annealing temperature during PCR?
Primer annealing temperature depends on:
-Primer length → longer primers have higher melting temperatures.
-GC content → higher GC content increases duplex stability and raises Tm, because G-C pairs form three hydrogen bonds while A-T pairs form only two.
In this protocol, primers include additional overhangs (20–22 bp) for Gibson Assembly, but only the template-binding region determines the annealing temperature. The annealing temperature is typically set a few degrees below the melting temperature (Tm) to ensure specific binding.
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
In this protocol, PCR amplifies specific regions of the amilCP gene, including mutated regions in the chromophore, allowing precise control over sequence design. In contrast, restriction digestion (using PvuII) is used to linearize the pUC19 plasmid backbone. PCR is more flexible and allows the introduction of mutations and overlaps, while restriction digestion relies on specific enzyme recognition sites. PCR is preferable for designing new constructs, whereas digestion is useful for preparing existing plasmid backbones.
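Returning to the annealing-temperature question above, the effect of primer length and GC content can be illustrated with the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) °C. This is a rough approximation for short oligos and not the method used in the lab protocol; a vendor Tm calculator would normally be preferred for Phusion. The primer below is a made-up example, and only the template-binding region (not the Gibson overhang) should be passed in:

```python
def gc_content(seq):
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) via the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


binding_region = "ATGCGTACCGGTTAGC"  # hypothetical 16-nt binding region
print(gc_content(binding_region))    # 0.5625
print(wallace_tm(binding_region))    # 50
```

Each extra G or C adds 4 °C in this approximation versus 2 °C for A or T, which is the "higher GC raises Tm" statement in numerical form.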
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Traditional genetic circuits primarily rely on Boolean logic (AND, OR, NOT gates), which results in “all-or-nothing” digital responses. Intracellular Artificial Neural Networks (IANNs) offer several distinct advantages:
-Non-linear Signal Integration: Unlike Boolean gates that require strict thresholds, IANNs use activation functions (like Hill functions) to process analog chemical gradients, allowing for more nuanced environmental sensing.
-Weighted Inputs: IANNs allow for “tunable” inputs. By varying promoter strength or ribosome binding site (RBS) efficiency, the cell can assign different weights (w) to various biological signals, prioritizing one metabolite over another.
-Noise Filtering: Biological environments are inherently “noisy.” The summation and thresholding architecture of a perceptron acts as a natural buffer, preventing the circuit from misfiring due to minor stochastic fluctuations in gene expression.
-Computational Density: A single-layer IANN can perform complex classifications that would require a much larger and more metabolically taxing combination of traditional logic gates.
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Application: An engineered E. coli strain that acts as a therapeutic diagnostic tool within the human gut.
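The weighted-input and Hill-activation ideas listed among the advantages above can be sketched as a single perceptron-like node. This is only a toy numerical model under assumed parameters (the weights, the half-maximal constant k, and the Hill coefficient n are all hypothetical), not a description of any specific genetic design:

```python
def hill(x, k=1.0, n=2.0):
    """Hill activation: 0 at x = 0, 0.5 at x = k, approaching 1 for x >> k."""
    return x**n / (k**n + x**n)


def iann_node(inputs, weights, k=1.0, n=2.0):
    """Weighted sum of analog inputs passed through a Hill activation."""
    total = sum(w * x for w, x in zip(weights, inputs))
    total = max(total, 0.0)  # a concentration-like signal cannot be negative
    return hill(total, k, n)


# Two hypothetical metabolite signals; the second is weighted more strongly.
print(iann_node([2.0, 1.0], [0.25, 0.5]))  # 0.5
print(iann_node([0.0, 0.0], [0.25, 0.5]))  # 0.0
```

The output varies smoothly with the inputs instead of switching between 0 and 1, which is the contrast with Boolean gates drawn in the text.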
Subsections of Homework
Week 1 HW: Principles and Practices
1) First, describe a biological engineering application or tool you want to develop and why. This could be inspired by an idea for your HTGAA class project and/or something for which you are already doing in your research, or something you are just curious about.
Inspired by the MELiSSA project (Micro-Ecological Life Support System Alternative) from ESA, this project proposes an ecosystem composed of microorganisms and higher plants using their metabolic waste products as a substrate for the next compartment. This project is designed to study the behavior of artificial ecosystems and to develop the technologies required for future regenerative life-support systems in long-duration human space missions, such as lunar bases or missions to Mars.
The system comprises five compartments, colonized respectively by anoxygenic thermophilic bacteria, photoheterotrophic bacteria, nitrifying bacteria, photosynthetic bacteria together with higher plants (which share the fourth compartment), and the human crew.
I would like to conceptually integrate these microorganisms and higher plants with a plasmid-based control system, through the use of reporter genes and inducible regulatory elements. This would increase the safety (allowing real-time monitoring of metabolic states, for example) and predictability of the system.
Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
A) Safety
The goal is to guarantee that biotechnological systems used in closed life-support environments do not cause biological, ecological, or health-related harm.
Sub-goals:
-Biological control:
Ensure that all microorganisms used in the system are strictly contained within closed bioreactors, with multiple physical and genetic safeguards to prevent unintended survival outside the system.
-Genetic stability and monitoring:
Establish continuous monitoring protocols to detect mutations, horizontal gene transfer, or loss of function in engineered plasmids and microbial strains over long mission durations.
-Human health protection:
Assess and regulate potential risks to astronaut health, including allergenicity, toxin production, or unintended interactions with the human microbiome in confined environments.
B) Promote responsible and transparent use of synthetic biology
Goal: Ensure that the development of biotechnological life-support systems is governed transparently and responsibly.
Sub-goals:
-Ethical oversight and review:
Require interdisciplinary ethical review (including biologists, engineers, ethicists, and policymakers) before implementing genetically modified organisms in space missions.
-Clear responsibility and accountability:
Define who is responsible for the design, maintenance, and emergency response related to biotechnological failures during long-term missions.
-Open scientific communication:
Promote the publication and sharing of safety data, failures, and best practices to avoid repetition of risks and to foster responsible innovation in space biotechnology.
Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).
Action: Ethical and biosafety protocols
Actors: Academic institutions & research ethics committees
Purpose:
This action proposes a standardized requirement for ethical and biosafety review, agreed upon by researchers, universities, and space agencies, before biotechnological applications are deployed or published.
Design:
Universities and research institutions must require approval from ethics and biosafety committees. Funding agencies could condition grants on compliance. Researchers must submit risk assessments and mitigation plans.
Assumptions:
Assumes ethics committees have sufficient expertise and resources. Assumes researchers will comply honestly. Assumes training and standardization significantly reduce human error.
Risks of Failure & “Success”:
Failure: Bureaucratic delays could slow innovation.
Success risk: Over-standardization may discourage exploratory or low-risk research.
Action: Incentives for safety-by-design practices
Actors: Biotech companies & funding bodies
Purpose:
Currently, safety features are often added after development. This action encourages integrating safety mechanisms from the design stage.
Design:
Grant programs, tax benefits, or certifications for companies that implement safety-by-design standards. Requires collaboration between engineers, biologists, and policymakers.
Assumptions:
Assumes financial incentives are strong enough to change behavior. Assumes safety-by-design standards can be clearly defined across technologies.
Risks of Failure & “Success”:
Failure: Incentives may be insufficient.
Success risk: Companies may focus on “checking boxes” rather than meaningful safety improvements.
Action: Controlled access and monitoring of biotechnological tools
Actors: Federal regulators & law enforcement
Purpose:
At present, access to certain tools or data may be insufficiently monitored. This action proposes tiered access controls to prevent misuse while allowing legitimate research.
Design:
Regulators define categories of risk. Developers implement user verification, logging, and auditing systems. Law enforcement intervenes only in cases of credible misuse.
Assumptions:
Assumes misuse can be detected through monitoring. Assumes access controls do not excessively burden legitimate users.
Risks of Failure & “Success”:
Failure: Overly strict controls may push users toward unregulated alternatives.
Success risk: Normalization of surveillance could raise privacy and academic freedom concerns.
Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:
| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 2 | 1 | 1 |
| • By helping respond | 2 | 1 | 1 |
| Foster Lab Safety | | | |
| • By preventing incidents | 1 | 1 | 1 |
| • By helping respond | 1 | 1 | 1 |
| Protect the environment | | | |
| • By preventing incidents | 2 | 2 | 2 |
| • By helping respond | 3 | 3 | 1 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 1 | 2 | 2 |
| • Feasibility? | 1 | 2 | 2 |
| • Not impede research | 1 | 1 | 1 |
| • Promote constructive applications | 1 | 1 | 1 |
Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
The most important governance option for me would be a combination of the three, emphasizing “Safety-by-design” and “Ethical and biosafety protocols” supported by “Controlled access and monitoring” as a complementary safeguard.
For complex projects such as MELiSSA, a single governance idea is not enough; several are needed in order to control each step of the project properly.
Safety-by-design is important because it encourages the integration of biosafety from the beginning, for example through the use of plasmid-based mechanisms to control the metabolic pathways at each step.
Ethical and biosafety protocols are more than just formalities; they are tools that ensure shared responsibility and protect scientific integrity through risk prevention and accountability mechanisms.
Prioritizing these governance actions required balancing competing interests. While ‘safety-by-design’ might delay early research and increase budgets, these trade-offs are necessary given the high stakes of life-support failures in space. This strategy relies on the assumption that institutional incentives work and that standards remain consistent across platforms. Despite lingering uncertainties about how space environments affect genetic stability, merging technical guardrails with institutional oversight creates a more resilient framework than relying on a single approach.
Target Audience: This proposal targets international bodies like NASA and ESA, which have the strategic power to align regulations and funding for space biotech.
Ethical Reflection: A core concern is accountability within semi-autonomous systems. In setups like MELiSSA, failures might stem from unpredictable biological behaviors rather than human oversight, blurring the lines of responsibility. Furthermore, we must prevent the ‘silent’ transfer of extreme bio-engineering to Earth without public oversight.
Proposed Actions: We need explicit accountability frameworks, scenario-based ethical reviews for off-Earth missions, and transparent protocols for knowledge sharing. These steps ensure that space biotech evolves safely and ethically.
Note: This assignment was developed with the assistance of an AI language model (ChatGPT, Gemini), used to help structure ideas and refine wording. The concepts and final decisions were critically reviewed and adapted by the author.
Week 2 HW: DNA Read, Write and Edit
Week 02 - Lecture Questions
Professor Jacobson
The fidelity of DNA replication is governed by DNA polymerase and its associated repair systems. The intrinsic error rate of DNA polymerase, in the absence of proofreading, is approximately 10⁻⁴ to 10⁻⁵ per nucleotide. In eukaryotes, replicative polymerases utilize 3’ → 5’ exonuclease activity for proofreading, which enhances fidelity to an error rate of approximately 10⁻⁷. When integrated with post-replicative mismatch repair (MMR) mechanisms, the effective error rate is further reduced to roughly 10⁻⁹ to 10⁻¹⁰ per nucleotide. Given that the human genome comprises approximately 3.2 × 10⁹ base pairs, replication without these multi-layered fidelity mechanisms would result in a mutational load incompatible with cellular viability. Biological systems mitigate this risk through a hierarchy of safeguards (polymerase proofreading, mismatch repair, and various DNA damage response pathways), ensuring that the mutation rate per genome remains within a range that sustains evolutionary stability and life.
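The scale of the problem becomes clearer when each error rate is multiplied by the genome size. A short Python sketch, assuming the upper end of each range quoted above and a 3.2 × 10⁹ bp genome (the names are illustrative):

```python
GENOME_BP = 3.2e9  # approximate human genome size in base pairs

# Per-nucleotide error rates (upper end of the ranges quoted in the text).
ERROR_RATES = {
    "polymerase alone": 1e-4,
    "with proofreading": 1e-7,
    "with proofreading + MMR": 1e-9,
}


def expected_errors(rate, genome_bp=GENOME_BP):
    """Expected number of errors in one full copy of the genome."""
    return rate * genome_bp


for label, rate in ERROR_RATES.items():
    print(f"{label}: ~{expected_errors(rate):,.0f} errors per replication")
```

Under these assumptions the expected load drops from hundreds of thousands of errors per replication to a handful once proofreading and mismatch repair are both active.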
A typical human protein consists of approximately 300 to 400 amino acids. Due to the degeneracy of the genetic code—where 64 codons encode 20 amino acids—the theoretical number of DNA sequences capable of encoding a single protein is exceptionally high.
However, functional constraints significantly restrict this theoretical diversity. Key limiting factors include:
-Codon Usage Bias: Variations in tRNA availability that influence translation efficiency.
-mRNA Secondary Structure: Folding patterns that may impede ribosome binding or elongation.
-GC Content: Extreme ratios that affect both sequence stability and the feasibility of synthesis.
-Regulatory Interference: The unintended presence of cryptic splice sites or premature termination signals.
-Metabolic Burden: High expression levels that may lead to cellular stress or protein misfolding.
Consequently, while the sequence space is vast, the biological context dictates a much narrower range of viable genetic sequences.
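The size of the theoretical sequence space can be made concrete by multiplying the number of synonymous codons for each residue. A minimal Python sketch using the standard genetic code (the function names are illustrative, and the stop codon is not counted):

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
assert sum(DEGENERACY.values()) == 61  # 64 codons minus 3 stop codons


def count_coding_sequences(protein):
    """Exact number of DNA sequences that encode the given protein."""
    return math.prod(DEGENERACY[aa] for aa in protein)


def log10_coding_sequences(protein):
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


print(count_coding_sequences("MSK"))  # 1 * 6 * 2 = 12
```

For a protein of 300 to 400 residues the count is astronomically large, so the log10 version is the practical one to report.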
Dr. LeProust
Modern oligonucleotide synthesis primarily relies on solid-phase phosphoramidite chemistry. In this process, DNA is synthesized in the 3’ to 5’ direction through iterative cycles of deprotection, coupling, capping, and oxidation. Direct chemical synthesis is currently limited to approximately 150–200 nucleotides. This constraint arises because coupling efficiency is never 100%; as the sequence length increases, the yield of full-length, error-free molecules decreases exponentially. Furthermore, the accumulation of truncated products and point mutations makes the purification of long, high-fidelity oligonucleotides technically prohibitive. To produce longer sequences, such as a 2,000 bp gene, researchers must assemble multiple overlapping short oligonucleotides using enzymatic techniques like PCR assembly or Gibson assembly, followed by sequence verification and cloning.
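The exponential drop in yield can be illustrated with a simple model in which each coupling step succeeds independently with the same probability. The coupling efficiencies below (99% and 99.5%) are assumed values for illustration, not figures from the lecture:

```python
def full_length_yield(length_nt, coupling_efficiency=0.99):
    """Fraction of molecules that reach full length.

    An n-mer needs n - 1 coupling steps, because the first nucleotide
    is already attached to the solid support.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 100, 200, 2000):
    y99 = full_length_yield(length, 0.99)
    y995 = full_length_yield(length, 0.995)
    print(f"{length:>5} nt: {y99:.3g} at 99%, {y995:.3g} at 99.5%")
```

At 99% per step, a 200-mer retains only about 13–14% full-length product, and a 2,000-mer essentially none, which is why long genes are assembled from shorter oligos.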
Animals cannot synthesize certain amino acids de novo and must acquire them through their diet. The ten commonly recognized essential amino acids are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine. Notably, lysine is universally essential across all animal species, representing a fundamental and highly conserved metabolic dependency.
George Church
Question 1: The “Lysine Contingency,” a biocontainment concept raised in George Church's lecture question, leverages the metabolic dependency on lysine to prevent the unintended proliferation of engineered organisms. By disabling endogenous lysine biosynthesis, the survival of the organism becomes contingent upon an external supply of this amino acid.
The universal necessity of lysine in animals reinforces the robustness of this strategy, as the evolutionary pressure to bypass such a deeply rooted biochemical constraint is significant. However, because many microorganisms possess the innate ability to synthesize lysine, effective biocontainment requires the knockout of redundant pathways and the implementation of multi-layered genetic safeguards. Thus, the lysine contingency is most effective when integrated into a broader, polygenic containment architecture rather than acting as a singular point of failure.
Week 2 - DNA Read, Write and Edit HW
Part 1: Benchling & In-silico Gel Art
By reordering restriction digest lanes of Lambda DNA, I created a symmetrical gel pattern resembling a butterfly!
Part 2: Gel Art - Restriction Digests and Gel Electrophoresis
Unfortunately, I had no lab access.
Part 3: DNA Design Challenge
3.1 Chosen Protein: GFP
I chose Green Fluorescent Protein (GFP) because it is widely used as a reporter protein in molecular biology. Since MELiSSA involves plasmid-based control systems and monitoring metabolic states, GFP represents a practical and symbolic example of how biological systems can be visually tracked in real time.
GFP was originally isolated from Aequorea victoria and is commonly used as a fluorescent marker in genetic engineering experiments.
Using UniProt, I obtained the amino acid sequence for GFP (UniProt ID: P42212).
Amino Acid Sequence: >sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTT
LSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELK
GTDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIG
DGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
3.2 Using an online reverse translation tool, I converted the GFP amino acid sequence into a possible coding DNA sequence. Because of codon degeneracy, multiple DNA sequences can encode the same protein. The sequence below represents one possible nucleotide sequence using standard codon usage.
One possible nucleotide sequence: ATGAGCAAAGGTGAAGAACTGTTTACCGGTGTTGTCCCAATTCTGGTTGAATTGGGTGATGGT
AATGGTCATAAATTTTCTGTCTCTGGCGGAGAAGGTGATGCTACCTATAAGCTGACACTGAAA
TTTATTTGCACCACTGGAAAATTGCCAGTTCCATGGCCAACACTGGTTACTACTCTGTCTTAT
GGTGTTCAGTGCTTCTCTCGCTACCCAGATCATATGAAACATGATTTTTTTAAATCTGCCATG
CCAGAGGGTTATGTTCAGGAGCGTACTATTTTTAAAGATGATGGTAATTATAAAACACGTGCT
GAAGTCAAATTTGAAGGTGATACACTGGTAAATCGCATTGAGCTGAAAGGTACCGACTTTAAG
GAAGATGGTAATATTCTGGGTCATAAACTGGAATACAATTATAACTCTCATAATGTCTATATT
ATGGCTGATAAACAGAAGAATGGTATTAAAGTTAATTTTAAAATTCGTCATAATATTGAAGAT
GGTTCTGTTCAGCTGGCTGATCACTACCAGCAGAATACTCCAATTGGAGATGGTCCTGTTCTG
CTGCCAGATAATCACTATCTGAGTACTCAGTCTGCTCTGTCTAAGGATCCAAATGAAAAGCGA
GATCATATGGTTCTGCTGGAATTTGTTACTGCTGCAGGTATTACCCATGGTATGGATGAGCTG
TATAAATAA
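Because reverse translation is easy to get subtly wrong, it is worth translating the DNA back and comparing it with the UniProt entry before ordering. The listed sequence appears to diverge from P42212 near residue 19 (the codon there is GGT, Gly, where the protein has Asp), so a check along these lines is recommended. The sketch below uses the standard genetic code; the function names and the short test sequence are illustrative only:

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("\n", "").replace(" ", "")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_mismatch(dna, protein):
    """Return the 0-based index of the first differing residue, or None."""
    translated = translate(dna)
    for i, (a, b) in enumerate(zip(translated, protein)):
        if a != b:
            return i
    if len(translated) != len(protein):
        return min(len(translated), len(protein))
    return None


print(translate("ATGAGCAAAGGTTAA"))               # MSKG
print(first_mismatch("ATGAGCAAAGGTTAA", "MSKG"))  # None
```

Passing the full nucleotide sequence and the full P42212 protein sequence to `first_mismatch` would report where, if anywhere, the two disagree.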
3.3 Codon optimization is important to improve protein expression in the chosen host organism.
As we know, multiple DNA sequences can encode the same protein due to the degeneracy of the genetic code, but not all codons are used equally in all organisms. This bias reflects differences in the abundance of each organism's tRNA pools.
If a gene contains codons that are rare in the host organism, translation may slow down, leading to lower protein production or ribosome stalling.
I optimized the codon sequence for Escherichia coli (E. coli) because it grows rapidly, it is inexpensive and has a fully sequenced and well-characterized genome.
Optimizing the gene for E. coli ensures that the codons match the organism’s tRNA abundance, thereby maximizing expression efficiency.
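The simplest form of codon optimization replaces every residue with a single codon that the host uses frequently. The sketch below does exactly that; the codon choices are an illustrative assumption of commonly cited high-frequency E. coli codons and should be checked against a real codon-usage table, and real optimization tools also balance codon diversity, GC content, and secondary structure, which this does not attempt:

```python
# Illustrative "one preferred codon per amino acid" table for E. coli.
# ASSUMPTION: these choices are for demonstration, not a validated usage table.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein, add_stop=True):
    """Encode a protein using one preferred codon per residue."""
    dna = "".join(PREFERRED[aa] for aa in protein.upper())
    return dna + "TAA" if add_stop else dna


print(back_translate("MSKGEELF"))
```

Using a single codon per residue maximizes this simple notion of preference but can create repetitive sequence, which is one reason real tools sample from several frequent codons instead.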
3.4 Cell-Free Protein Expression (In Vitro)
In this method:
The DNA template is added to a reaction mixture containing RNA polymerase, ribosomes, tRNAs, amino acids, and energy sources.
Transcription and translation occur in a test tube without living cells.
The protein is synthesized directly in vitro.
Advantages:
Faster expression
No need to maintain living cells
Useful for toxic proteins
More controllable environment
Limitations:
Higher cost
Typically lower yield than in vivo systems
How Does DNA Become a Protein?
In both systems (cell-based or cell-free), the process follows the Central Dogma:
DNA → mRNA → Protein
1) The DNA sequence is transcribed into messenger RNA (mRNA).
2) The ribosome reads the mRNA in codons (sets of three nucleotides).
3) Transfer RNAs (tRNAs) match each codon with the corresponding amino acid.
4) The amino acids are linked together at a specific site in the ribosome to form a polypeptide chain.
5) The polypeptide folds into a functional protein.
Part 4: Prepare a Twist DNA Synthesis Order
For this design, I prepared a linear expression cassette in Benchling containing: constitutive promoter, ribosome binding site (RBS), start codon, codon-optimized GFP coding sequence, 6xHis tag, stop codon, and T7 terminator.
This cassette would be ordered as a clonal gene through Twist Bioscience.
I would select a high-copy plasmid backbone such as pTwist Amp High Copy, which provides ampicillin resistance for selection, a high-copy origin of replication, and efficient propagation in E. coli.
Ordering as a clonal gene would allow direct transformation into E. coli without additional cloning steps, accelerating experimental validation.
Part 5: DNA Read/Write/Edit
5.1 DNA READ
(i) What DNA would you want to sequence and why?
I would like to sequence environmental microbial DNA from closed ecological life-support systems, such as bioreactors used in regenerative environments (similar to MELiSSA-type systems). Specifically, I would sequence microbial community DNA to monitor biodiversity, metabolic stability, and potential pathogenic shifts.
(ii) What sequencing technology would you use and why?
I would use a combination of:
Illumina provides high accuracy short reads, ideal for detecting small mutations and precise taxonomic profiling.
Oxford Nanopore provides long reads, which are useful for assembling genomes, detecting structural variants, and monitoring plasmids or gene clusters.
Using both increases robustness and ecological insight.
Preparation (Essential Steps)
DNA extraction from environmental sample
Fragmentation (if needed for Illumina)
Adapter ligation
PCR amplification (Illumina)
Library preparation
Loading onto flow cell
In closed systems, small microbial imbalances can lead to system instability or health risks. Sequencing allows early detection of contamination, horizontal gene transfer, or harmful mutations. Therefore, DNA sequencing becomes a tool for real-time biosurveillance and ecological control.
Essential Steps of Sequencing Technology
-Illumina (Second-generation)
• DNA fragments attach to flow cell
• Bridge amplification creates clusters
• Sequencing-by-synthesis with fluorescent reversible terminators
• Camera detects fluorescence
• Base calling via signal interpretation
Output:
Short reads (FASTQ files with quality scores)
-Oxford Nanopore (Third-generation)
• DNA passes through nanopore
• Changes in ionic current measured
• Signal processed into nucleotide sequence
Output:
Long reads (FASTQ, real-time data)
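Both platforms deliver FASTQ records, in which each base carries a Phred quality score Q encoded as an ASCII character (Q = ord(char) − 33 in the standard Phred+33 encoding), with error probability 10^(−Q/10). A minimal Python sketch of reading such records; the parser assumes simple four-line records, and the example read is made up:

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(quality_line, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_error_probability(quality_line):
    """Average per-base error probability implied by the quality string."""
    scores = phred_scores(quality_line)
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


example = "@read1\nACGT\n+\nII5!\n"  # hypothetical record
for read_id, seq, qual in parse_fastq(example):
    print(read_id, seq, phred_scores(qual))  # read1 ACGT [40, 40, 20, 0]
```

A per-read summary like `mean_error_probability` is one simple way to filter low-quality reads before taxonomic profiling or assembly.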
5.2 DNA WRITE
(i) What DNA would you want to synthesize and why?
I would synthesize a plasmid-based genetic circuit encoding:
• A fluorescent reporter (e.g., GFP)
• A stress-responsive promoter
• A regulatory element sensitive to metabolic imbalance
The purpose would be to create a biosensor that detects environmental stress inside a microbial ecosystem and produces a measurable fluorescence output.
This construct could function as an early warning system in closed bioreactors.
(ii) What technology or technologies would you use to perform this DNA synthesis and why?
I would use commercial gene synthesis through Twist Bioscience.
Why?
• High accuracy
• Scalable synthesis
• Codon optimization
• Assembly-ready fragments
Essential Steps of DNA Synthesis
Digital DNA design
Oligonucleotide synthesis
Assembly (e.g., Gibson assembly)
Sequence verification
Plasmid construction
Limitations
• GC-rich or repetitive sequences are difficult
• Length constraints
• Cost increases with size
• Biosecurity screening restrictions
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I would edit the GFP gene expressed in E. coli to modify its fluorescence intensity. By introducing targeted mutations into the GFP coding sequence, it is possible to alter protein folding efficiency or chromophore structure, potentially enhancing fluorescence output. This modification would allow better signal detection and improved reporter performance in synthetic biology applications.
(ii) I would use CRISPR-Cas9 genome editing
CRISPR-Cas9 uses a guide RNA (gRNA) designed to match a specific DNA sequence within the GFP gene, located next to a PAM site. The Cas9 enzyme introduces a double-strand break at that location. To introduce a precise modification, a donor DNA template containing the desired mutation would be supplied. The bacterial cell then repairs the break by homologous recombination with the donor, incorporating the modified sequence.
Essential inputs include: guide RNA targeting GFP, Cas9 nuclease (plasmid or protein form), donor DNA template containing the intended mutation, and competent E. coli cells.
Limitations of this method include potential off-target effects, variable editing efficiency, and the need for downstream screening to confirm successful edits.
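The guide-design step can be sketched as a search for candidate target sites. The example below assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG PAM immediately 3’ of a 20-nt protospacer; it scans only the forward strand and does no off-target scoring, so it is a starting point and not a full design tool:

```python
import re


def find_guides(seq, guide_len=20):
    """Return (start, protospacer, PAM) for every NGG site on the forward strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical 23-nt target: 20-nt protospacer followed by a TGG PAM.
target = "ACGTACGTACGTACGTACGA" + "TGG"
for start, protospacer, pam in find_guides(target):
    print(start, protospacer, pam)
```

Candidates found this way would still need to be checked for uniqueness in the host genome, which relates to the off-target limitation noted above.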
Note: This assignment was developed with the assistance of an AI language model (ChatGPT, Gemini), used to help structure ideas and refine wording. The concepts and final decisions were critically reviewed and adapted by the author.
Week 3 HW: Lab Automation
Week 03 - Python Script for Opentrons Artwork
I was not able to write the code entirely by myself. The closest I got was generating concentric circles, which reminded me of the Argentine “Escarapela” (with the help of AI).
My original idea, however, was to make an Argentine Mate, which I did in https://opentrons-art.rcdonovan.com/.
I also did a Cherry!
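For reference, the concentric-circle idea can be expressed as plain coordinate generation. The sketch below is not the script used for the artwork and does not call the Opentrons API; it only computes (x, y) offsets in millimetres from the plate centre, with the ring radii and droplet spacing as assumed values. Each point would then be passed to whatever dispensing routine the course template provides:

```python
import math


def circle_points(radius_mm, n_points, center=(0.0, 0.0)):
    """Evenly spaced (x, y) points on a circle around the given centre."""
    cx, cy = center
    return [
        (cx + radius_mm * math.cos(2 * math.pi * i / n_points),
         cy + radius_mm * math.sin(2 * math.pi * i / n_points))
        for i in range(n_points)
    ]


def concentric_circles(radii_mm, spacing_mm=2.0):
    """Map each radius to its points, keeping droplets ~spacing_mm apart."""
    rings = {}
    for r in radii_mm:
        n = max(1, round(2 * math.pi * r / spacing_mm))
        rings[r] = circle_points(r, n)
    return rings


# Three hypothetical rings, e.g. light blue / white / light blue.
rings = concentric_circles([5.0, 10.0, 15.0])
for radius, points in rings.items():
    print(radius, len(points))
```

Scaling the number of points with the circumference keeps the droplet density roughly constant from the inner ring to the outer one.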
Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
Case Study: Automation in Drug Discovery
Paper Title: Improving an Open-Sourced Automated Microplate Assay for our Drug Discovery Process
Authors: M. Yunos Alizai, Brianna N. Davis, and Paul H. Davis (University of Nebraska at Omaha).
In order to discover new medicines (mainly against infections), scientists must test hundreds of chemical compounds against different pathogens and cells. These assays are usually performed using manual microplate techniques, which are labor-intensive and highly susceptible to user-associated variation and human error, limiting the speed and reliability of the drug discovery process.
The solution? In this paper the authors developed an automated wide-spectrum screening assay utilizing the Opentrons liquid handling platform. The robot was programmed to automate the preparation of microplate assays, handling precise liquid transfers for:
a) Compound Screening: Rapidly evaluating the effectiveness of various substances against specific pathogens.
b) Cytotoxicity Testing: Measuring the impact of these compounds on host cell metabolism to determine potential toxicity.
The significance of this study lies in the optimization of an open-source tool to achieve high-throughput screening (HTS) capabilities that were previously reserved for labs with much more expensive, proprietary equipment. Key achievements described in the paper include:
- Scalability: The ability to process a significantly larger number of samples in a reduced timeframe.
- Precision: A marked reduction in human-induced variability, leading to more reproducible data.
- Feasibility: Proving that open-source automation is a robust and viable tool for complex clinical applications in combating infections.
Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details
Final Project Proposal: Plasmid-Based Autonomous Control Loops for the MELiSSA Ecosystem
The goal of this project is to implement an autonomous biological regulation system within the MELiSSA (Micro-Ecological Life Support System Alternative) framework. By engineering specific plasmids to act as “genetic controllers,” we can regulate metabolic flux and resource production in response to environmental fluctuations (such as CO2 levels or nutrient concentration). This ensures the stability of the artificial ecosystem during long-term space missions.
A central component of this project is the use of GFP (Green Fluorescent Protein) as a reporter. The plasmids will be designed with sensor-promoter systems that trigger GFP expression when specific conditions are met (e.g., a stress-induced promoter).
a) Real-time Monitoring: The fluorescence intensity will serve as a direct proxy for the “health” of a specific compartment (like the cyanobacteria loop).
b) Feedback Loop: Automation tools will be used to measure this fluorescence. If the signal deviates from the setpoint, the system can automatically trigger a corrective action, such as adjusting the flow of nutrients or light intensity.
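The feedback loop described above can be sketched as a simple proportional controller. This is a minimal illustration, not the project's implementation: the function name `corrective_action`, the gain, and the tolerance are all hypothetical, and whether a positive adjustment means "more nutrients" or "more light" would depend on what the reporter actually senses.

```python
def corrective_action(fluorescence, setpoint, gain=0.5, tolerance=0.05):
    """Return a fractional actuator change from a fluorescence reading.

    fluorescence and setpoint share the same arbitrary units (e.g. GFP per OD).
    A positive result means "increase the actuator", a negative one "decrease it".
    """
    error = (setpoint - fluorescence) / setpoint  # relative deviation from the setpoint
    if abs(error) <= tolerance:
        return 0.0                                # inside the dead band: do nothing
    adjustment = gain * error                     # proportional response
    return max(-1.0, min(1.0, adjustment))        # clamp to at most a 100% change

# Hypothetical readings against a setpoint of 100 units.
for reading in (100, 50, 150):
    print(reading, corrective_action(reading, setpoint=100))
```

A dead band is used so that small measurement noise does not trigger constant corrections.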
Automation Tools
The complexity of characterizing these genetic circuits requires high-throughput automation:
a) Opentrons Platform: The OT-2 will be utilized to automate the DNA Assembly (Golden Gate or Gibson Assembly) of the plasmid variants. It will also handle the transformation protocols, ensuring high reproducibility when inserting these controllers into the target microbial strains.
b) Custom 3D-Printed Hardware: To bridge the gap between automation and biology, I will design and 3D-print custom modular tube holders and adapters. These will allow the Opentrons to interface directly with specialized bioreactor sampling tubes, maintaining the required thermal conditions for sensitive enzymes and reagents.
c) Ginkgo Nebula Integration: For large-scale characterization, Ginkgo Nebula will be used to test the plasmids under a vast array of simulated space environments. This high-throughput data will allow for the fine-tuning of the genetic “gain” and “sensitivity” of the controllers before they are deployed in a physical MELiSSA prototype.
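As a rough planning aid for the automation steps above, the assembly of plasmid variants can be laid out as a pipetting worklist before any robot code is written. The sketch below is plain Python and does not use the Opentrons API; the part names, the 2 µL volume, and the function names are hypothetical placeholders.

```python
from itertools import product
from string import ascii_uppercase

def plate_wells(rows=8, cols=12):
    """Well names of a 96-well plate in column-major order: A1, B1, ..., H12."""
    return [f"{ascii_uppercase[r]}{c}" for c in range(1, cols + 1) for r in range(rows)]

def assembly_worklist(promoters, reporters, part_volume_ul=2.0):
    """Assign one destination well per promoter/reporter combination."""
    wells = plate_wells()
    combos = list(product(promoters, reporters))
    if len(combos) > len(wells):
        raise ValueError("more variants than wells on one plate")
    worklist = []
    for well, (promoter, reporter) in zip(wells, combos):
        for part in (promoter, reporter):
            worklist.append({"source": part, "dest": well, "volume_ul": part_volume_ul})
    return worklist

# Hypothetical parts: two sensor promoters driving one GFP reporter.
steps = assembly_worklist(["P_stress", "P_co2"], ["GFP"])
for step in steps:
    print(step)
```

Each worklist row could later be translated into a single transfer step in a robot protocol.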
By replacing electronic sensors with biological ones (plasmids + GFP), we reduce the reliance on external hardware that can fail in deep space. This “living” control system makes the MELiSSA loop more resilient, self-healing, and inherently integrated into the biological processes it aims to sustain.
Week 4 HW: Protein Design Part I
Week 04 - Part A: Conceptual Questions
How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
500 g of meat is roughly 22% protein, so 500 g × 0.22 = 110 g of protein.
An average amino acid is ≈ 100 Daltons, and 1 Dalton ≈ 1 g/mol, so 100 Da ≈ 100 g/mol. This converts grams of protein to moles of amino acids:
110 g ÷ 100 g/mol = 1.1 moles of amino acids in 500 g of meat
To convert moles to number of molecules, use Avogadro’s number: 6.022×10^23 molecules/mol
1.1 moles × 6.022×10^23 molecules/mol ≈ 6.6×10^23 amino acid molecules (roughly 660 sextillion)
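The arithmetic above can be checked with a few lines of Python. The 22% protein fraction and the 100 g/mol average residue mass are the same rough assumptions used in the answer, not measured values.

```python
AVOGADRO = 6.022e23  # molecules per mole

def amino_acids_in_meat(mass_g, protein_fraction=0.22, residue_mass_g_per_mol=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = mass_g * protein_fraction        # grams of protein
    moles = protein_g / residue_mass_g_per_mol   # moles of amino acids
    return moles * AVOGADRO                      # number of molecules

n = amino_acids_in_meat(500)
print(f"{n:.2e} amino acids")  # prints 6.62e+23 amino acids
```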
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
When we eat beef or fish, we do not incorporate their proteins intact. Our digestive enzymes break them into amino acids that enter our metabolic pool.
Our ribosomes then synthesize human proteins encoded in our own DNA sequences.
So we recycle the amino acids, not the structure or identity of the organism.
Biological identity is encoded in genetic information, not in dietary proteins.
Why are there only 20 natural amino acids?
The canonical genetic code uses 20 amino acids because evolution optimized for:
Chemical diversity (hydrophobic, polar, charged, aromatic, special cases like Gly and Pro)
Translational efficiency
Error minimization
Adding more amino acids would increase the complexity of the aminoacyl-tRNA synthetase system and the risk of translational errors.
There are actually two additional genetically encoded amino acids:
Selenocysteine (21st)
Pyrrolysine (22nd)
But they require specialized insertion machinery.
Evolution settled on 20 as a balance between chemical versatility and system simplicity.
Where did amino acids come from before enzymes and before life?
Several hypotheses:
Atmospheric synthesis
The classic 1953 experiment by Stanley Miller and Harold Urey simulated early Earth conditions and produced amino acids from simple gases and electrical sparks.
Extraterrestrial delivery
The Murchison meteorite contained over 70 amino acids.
Hydrothermal vent chemistry
Mineral-catalyzed reactions at deep-sea vents could generate organic molecules.
Before enzymes, amino acids formed via abiotic chemistry driven by energy sources like UV radiation, lightning, or geothermal heat.
If you make an α-helix using D-amino acids, what handedness would you expect?
Natural proteins use L-amino acids and form right-handed α-helices.
If you build a protein entirely from D-amino acids:
→ The chirality inverts
→ You obtain a left-handed α-helix
Helix handedness is dictated by the stereochemistry of the α-carbon.
Can you discover additional helices in proteins?
Yes. Besides the α-helix, known helices include:
3₁₀ helix
π-helix
Polyproline helix
With non-natural amino acids, we could theoretically design: tighter helices, helices with internal charge networks or metal-stabilized helices
The constraints are geometric (bond angles, sterics) and thermodynamic (free energy minimization).
Why are most molecular helices right-handed?
Because biological proteins are built from L-amino acids.
The stereochemistry of L-amino acids restricts backbone dihedral angles (φ and ψ) such that the energetically favored α-helix is right-handed.
If life had evolved using D-amino acids, helices would predominantly be left-handed.
Molecular chirality propagates upward into macroscopic structure.
Why do β-sheets tend to aggregate?
β-strands expose backbone hydrogen bond donors and acceptors.
When proteins partially unfold:
These groups seek new hydrogen bonding partners.
Intermolecular β-sheet formation occurs.
Extended networks form between molecules.
Additionally:
Alternating hydrophobic side chains promote stacking.
β-sheets are inherently “sticky” when exposed.
a) What is the driving force for β-sheet aggregation?
The main driving forces are:
Intermolecular hydrogen bonding
Hydrophobic interactions
Entropic gain from water release
Formation of extended β-sheet networks lowers free energy.
Aggregation is often thermodynamically favorable once nucleation begins.
Why do many amyloid diseases form β-sheets?
In diseases like Alzheimer’s disease:
Proteins misfold.
Normally buried β-prone sequences become exposed.
They assemble into extended β-sheets.
These stack into amyloid fibrils.
β-sheet architecture allows: Extremely stable cross-β structures, template-based propagation and resistance to degradation
β-sheets represent a deep energy minimum in protein conformational space.
a) Can amyloid β-sheets be used as materials?
Yes! This is a growing area in biomaterials science.
Amyloid fibrils have:
High tensile strength
Self-assembly properties
Chemical stability
Applications include:
Tissue engineering scaffolds
Nanofibers
Biocompatible materials
Conductive biomaterials
The same structural features that cause disease can be harnessed for design.
Part B: Protein Analysis and Visualization
Protein Choice: Human Adenylate Cyclase Type 5 (ADCY5)
Organism: Homo sapiens
UniProt ID: O95622
For easier structural analysis in PyMOL, I chose to use a structure of the catalytic domain. A classic solved structure is 1CJK.
This is the catalytic core of mammalian adenylyl cyclase in complex with Gsα.
Briefly describe the protein you selected and why you selected it.
Adenylate cyclase (AC) is the enzyme that converts:
ATP → cAMP + PPi
cAMP is a second messenger that regulates:
PKA
Ion channels
Gene transcription (CREB pathway)
I selected adenylate cyclase because: It is central to signal transduction, links extracellular signals to intracellular responses, is regulated by G proteins (GPCR signaling) and its catalytic mechanism is structurally well characterized.
Identify the amino acid sequence of your protein. How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids. How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs. Does your protein belong to any protein family?
Length: 1261 amino acids (If analyzing only catalytic domain → ~400 residues)
It is a large membrane protein with:
2 transmembrane domains
2 cytosolic catalytic domains (C1 and C2)
Most frequent amino acid: L (129 times)
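For reference, a residue count like the one above can be reproduced with the standard library. This is a minimal sketch rather than the course Colab notebook; the function name is hypothetical and the sequence shown is a toy string, not the ADCY5 sequence.

```python
from collections import Counter

def most_frequent_residue(sequence):
    """Return (residue, count) for the most common amino acid in a sequence."""
    cleaned = sequence.upper().replace("\n", "").replace(" ", "")
    counts = Counter(cleaned)
    return counts.most_common(1)[0]

toy_sequence = "MLLKALLGE"  # toy example only
print(most_frequent_residue(toy_sequence))
```

Pasting the full UniProt sequence in place of the toy string would give the count for the real protein.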
Homologs? 250 results found in UniProtKB
Adenylate cyclases exist in: Mammals, Insects, Fungi and Bacteria (structurally different class)
The UniProt BLAST hits span many eukaryotic organisms, reflecting the conserved role of cAMP signaling in evolution.
Family
It belongs to:
Adenylate cyclase family
Nucleotide cyclase superfamily
Class III adenylate cyclases (in mammals)
Class III ACs are evolutionarily conserved catalytic enzymes.
RCSB structure page:
In the structure, three main components can be identified:
At the top, the purple chain corresponds to the regulatory G protein subunit (Gsα).
Below, the green and orange chains represent the two catalytic domains (C1 and C2) of adenylyl cyclase.
In the center of the complex, small molecules can be observed, corresponding to ATP (or an ATP analog) and associated magnesium ions (Mg²⁺), which are required for catalytic activity.
For structural and functional analysis, the most relevant region is the complex formed by the green and orange domains. These two domains together constitute the catalytic core of adenylyl cyclase. The active site is located at the interface between these domains, where ATP binds and is converted into cyclic AMP (cAMP).
When was it solved? Resolution?
1CJK:
Method: X-ray crystallography
Resolution: ~2.3 Å
Year: 1997
2.3 Å = good quality structure
Other molecules present?
Gsα protein fragment
ATP analog
Magnesium ions (Mg²⁺)
These are essential for catalysis and regulation.
Structure classification family
It belongs to:
Class III nucleotidyl cyclase fold
Alpha/beta enzyme family
P-loop NTP-binding–like fold (structurally related)
It forms a dimer of catalytic domains (C1 + C2).
a) Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.
b) Color the protein by secondary structure. Does it have more helices or sheets?
In the secondary structure representation, alpha helices are shown in red, beta sheets in yellow, and loops in green.
The protein contains more alpha helices than beta sheets, indicating that the structure is predominantly alpha-helical with some beta-sheet elements connecting the domains.
c) Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?
Using the previous picture, we can tell that hydrophobic residues tend to be located in the interior of the protein, forming a stable hydrophobic core that helps maintain the folded structure.
On the other hand, hydrophilic and charged residues are mainly exposed on the surface, where they can interact with the aqueous environment or participate in molecular interactions such as ligand binding or protein-protein interactions.
d) Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
Based on the surface representation of the PyMOL model, the protein does indeed exhibit distinct “holes” or binding pockets, which are characteristic of its enzymatic function.
The most prominent “hole” is the deep, central valley located at the interface of the two domains. In a functional Adenylate Cyclase dimer, this is the active site where ATP binds to be converted into cAMP.
Besides the main central cleft, there are smaller peripheral pockets. In AC, these are often the docking sites for regulatory proteins, such as the G-protein alpha subunit (G alpha).
The “red” and “yellow” regions in the surface map indicate an irregular landscape. The “red” areas often correspond to deeper, recessed regions (cavities) that are less accessible to the solvent, which is a classic signature of a binding pocket designed to “cradle” a small molecule substrate.
Part C. Using ML-Based Protein Design Tools
C1. Protein Language Modeling
a) Deep Mutational Scans
Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.
Can you explain any particular pattern? (choose a residue and a mutation that stands out)
The deep mutational scan generated using the ESM2 protein language model shows several highly conserved positions with strongly negative scores across most amino acid substitutions. These positions correspond to residues located in the catalytic site of adenylate cyclase. Because these residues are directly involved in substrate binding and catalysis, mutations at these positions are predicted to be highly unfavorable. The model suggests that introducing bulky or chemically different residues would disrupt ATP binding or interfere with the coordination of catalytic magnesium ions. In contrast, regions outside the active site show more neutral mutation scores, indicating greater tolerance to amino acid substitutions. This pattern is consistent with the functional constraints expected for catalytic residues in enzymes.
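One common way to turn language-model likelihoods into the kind of mutation scores discussed above is a log-likelihood ratio between the mutant and wild-type residue at each position. The sketch below shows only that scoring step on made-up probabilities; it does not call ESM2, and the function name, the toy probability table, and the assumption that the notebook scores mutations this way are all mine.

```python
import math

def mutation_scores(wild_type, position_probs):
    """Score each substitution as log p(mutant) - log p(wild type).

    position_probs[i] maps amino acid -> model probability at position i.
    Strongly negative scores mark substitutions the model considers unlikely.
    """
    scores = {}
    for i, (wt, probs) in enumerate(zip(wild_type, position_probs)):
        for aa, p in probs.items():
            if aa != wt:
                scores[f"{wt}{i + 1}{aa}"] = math.log(p) - math.log(probs[wt])
    return scores

# Toy probabilities: position 1 is "conserved", position 2 is tolerant.
toy_probs = [
    {"D": 0.90, "A": 0.05, "W": 0.05},
    {"S": 0.40, "A": 0.35, "W": 0.25},
]
scores = mutation_scores("DS", toy_probs)
for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(name, round(value, 2))
```

In this toy table, substitutions at the conserved position score far lower than the same substitutions at the tolerant one, which mirrors the active-site pattern described above.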
b) Latent Space Analysis
Use the provided sequence dataset to embed proteins in reduced dimensionality.
Analyze the different formed neighborhoods: do they approximate similar proteins?
Place your protein in the resulting map and explain its position and similarity to its neighbors.
This t-SNE technique prioritizes the preservation of local structures, meaning that proteins clustered in close proximity share significant biochemical, structural, or evolutionary features.
The clusters formed in this map represent functional neighborhoods.
Functional Approximation: Proteins within the same neighborhood typically share similar catalytic activities or binding domains.
Evolutionary Density: Dense regions often represent highly conserved protein families (e.g., globins or kinases), while sparser regions indicate specialized or divergent proteins.
The AC protein is located in a distinct peripheral “arm” of the latent space (red circle). Its position at a high TSNE1 value suggests that, while it shares the fundamental characteristics of the broader dataset, it possesses unique structural motifs or regulatory domains that differentiate it from the primary central cluster.
Its neighbors in this specific coordinate range are likely other cyclase enzymes or proteins involved in signal transduction. The localization reflects the protein’s specific role in synthesizing cAMP, a vital second messenger. In Spirulina platensis, these enzymes are often modular, potentially containing additional sensory domains that respond to light or metabolic stress, which accounts for their specific “address” in the latent map.
C2. Protein Folding
a) Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
b) Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?
C3. Protein Generation
a) Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN
Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.
b) Input this sequence into ESMFold and compare the predicted structure to your original.
Part D. Group Brainstorm on Bacteriophage Engineering
L Protein Stabilization
Primary Goal: Increased stability (easiest).
Specific Approach: Engineering DnaJ-independence by reducing chaperone-recognition signals while preserving the structural scaffold of the L protein.
Computational Tools and Pipeline Justification
To achieve this goal, we propose a computationally efficient three-step pipeline:
Step 1: Sequence-level Mutational Scanning using ESM2
Approach: We will perform a zero-shot in silico mutational scan across the L protein sequence using the ESM2 Protein Language Model (PLM). We aim to identify exposed hydrophobic patches (typical DnaJ recognition motifs) and propose polar/hydrophilic substitutions.
Why this helps: ESM2 has learned deep evolutionary constraints across millions of protein sequences. It allows us to rapidly differentiate between highly constrained residues (which are structurally vital and “untouchable”) and mutation-tolerant positions. This ensures we only disrupt chaperone-binding motifs without breaking the core evolutionary scaffold of the protein, all at a fraction of the computational cost of molecular dynamics.
Step 2: Rapid Structural Filtering using ESMFold
Approach: The top candidate sequences from the ESM2 scan will be predicted using ESMFold. We will filter out any variants that collapse, show low pLDDT (confidence) scores, or have a high RMSD compared to the Wild-Type (WT) backbone.
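The filtering rule in Step 2 can be written down as a small function. This is a sketch under stated assumptions: the thresholds (mean pLDDT of at least 70, RMSD of at most 2 Å) are hypothetical, the function names are illustrative, and the RMSD assumes the two coordinate sets are already superimposed and matched residue by residue.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def passes_filter(plddt_per_residue, coords, wt_coords, min_plddt=70.0, max_rmsd=2.0):
    """Keep a variant only if it is confidently predicted and close to the WT backbone."""
    mean_plddt = sum(plddt_per_residue) / len(plddt_per_residue)
    return mean_plddt >= min_plddt and rmsd(coords, wt_coords) <= max_rmsd

# Toy C-alpha coordinates in angstroms (not a real structure).
wt = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
variant = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0)]
print(passes_filter([85.0, 90.0], variant, wt))
```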
Why this helps: While ESM2 evaluates sequence-level fitness, we need explicit 3D structural validation. ESMFold is significantly faster than AlphaFold2, making it ideal for high-throughput filtering. This step ensures that our hydrophilic mutations do not inadvertently destroy the L protein’s ability to fold independently.
Step 3: Complex Modeling using Boltz-1
Approach: We will model the L protein + DnaJ complex for both the WT and our top folded mutant candidates. We will analyze the predicted interface contacts and the Predicted Aligned Error (PAE) across the interface as a rough proxy for how confidently the two proteins are predicted to interact.
Why this helps: Folding correctly in isolation is not enough; we also need evidence of reduced chaperone dependency. By comparing the mutant-DnaJ interface against the WT-DnaJ interface, we can prioritize variants that maintain a stable fold but show a significantly weakened or abolished predicted interaction with the DnaJ chaperone.
Potential Pitfalls
Pitfall 1: Overlapping Reading Frames and Genomic Constraints. Phage genomes are highly compact, meaning the DNA sequence encoding the L protein might also encode parts of other proteins or regulatory elements in alternative reading frames. Our targeted mutations could have unintended, fatal consequences for the phage’s overall viability. While genomic foundation models like Evo could assess these genome-wide constraints, their computational cost is prohibitive for our current scope.
Pitfall 2: The Stability vs. Function Trade-off. ESMFold can predict that the protein adopts a confident 3D fold, but a confident prediction guarantees neither stability in the cell nor biological function (membrane lysis). Lytic activity heavily depends on complex factors like membrane insertion dynamics, oligomerization, and reaction kinetics. Furthermore, completely abolishing chaperone interaction might inadvertently prevent the L protein from being properly delivered to its target membrane.
Week 5 — Protein Design Part II
Week 5
Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
Using the PepMLM Colab linked from the HuggingFace PepMLM-650M model card:
Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
To your generated list, add the known SOD1-binding peptide FLYRWLPSRRGG for comparison.
Record the perplexity scores that indicate PepMLM’s confidence in the binders.
Part 2: Evaluate Binders with AlphaFold3
Navigate to the AlphaFold Server: alphafoldserver.com
For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide:
Paste the peptide sequence.
Paste the A4V mutant SOD1 sequence in the target field.
Check the boxes
a) Predicted binding affinity
b) Solubility
c) Hemolysis probability
d) Net charge (pH 7)
e) Molecular weight
Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
Choose one peptide you would advance and justify your decision briefly.
Part 4: Generate Optimized Peptides with moPPIt
Now, move from sampling to controlled design. moPPIt uses Multi-Objective Guided Discrete Flow Matching (MOG-DFM) to steer peptide generation toward specific residues and optimize binding and therapeutic properties simultaneously. Unlike PepMLM, which samples plausible binders conditioned on just the target sequence, moPPIt lets you choose where you want to bind and optimize multiple objectives at once.
Open the moPPIt Colab linked from the HuggingFace moPPIt model card
Make a copy and switch to a GPU runtime.
In the notebook:
a) Paste your A4V mutant SOD1 sequence.
b) Choose specific residue indices on SOD1 that you want your peptide to bind (for example, residues near position 4, the dimer interface, or another surface patch).
c) Set peptide length to 12 amino acids.
d) Enable motif and affinity guidance (and solubility/hemolysis guidance if available). Generate peptides.
After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?
Part B: Optional
Part C: Final Project: L-Protein Mutants
Week 6 — Genetic Circuits Part I: Assembly Technologies
DNA Assembly
Answer these questions about the protocol in this week’s lab:
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
The Phusion High-Fidelity PCR Master Mix contains several components:
Phusion DNA polymerase → a high-fidelity enzyme that synthesizes DNA with a very low error rate (reportedly 50 times lower than Taq and 6 times lower than Pfu), which makes it an excellent choice for cloning and is critical when amplifying fragments of the amilCP gene.
dNTPs (deoxynucleotide triphosphates) → building blocks for new DNA strands
MgCl₂ → cofactor necessary for polymerase activity
Buffer system → maintains optimal pH and ionic conditions
These components work together to ensure accurate and efficient DNA amplification. Phusion DNA polymerases also offer robust performance with short protocol times, even in the presence of PCR inhibitors, and generate higher yields with less enzyme than other DNA polymerases.
In this protocol, the master mix is used to amplify amilCP fragments that will later be assembled using Gibson Assembly.
What are some factors that determine primer annealing temperature during PCR?
Primer annealing temperature depends on:
Primer length → longer primers have higher melting temperatures.
GC content → higher GC content increases duplex stability and raises the Tm, because G–C base pairs form three hydrogen bonds while A–T pairs form only two.
In this protocol, primers include additional overhangs (20–22 bp) for Gibson Assembly, but only the binding region determines the annealing temperature. The annealing temperature is typically set a few degrees below the melting temperature (Tm) to ensure specific binding.
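A quick way to see how length and GC content feed into the annealing temperature is the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), applied to the binding region only. This is a crude estimate meant for illustration: a vendor Tm calculator would normally be used for real primer design, and the 3 °C offset and the example sequence below are assumptions.

```python
def wallace_tm(binding_region):
    """Rough melting temperature in Celsius: 2 per A/T plus 4 per G/C."""
    seq = binding_region.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def annealing_temp(binding_region, offset_c=3):
    """Annealing temperature set a few degrees below the estimated Tm."""
    return wallace_tm(binding_region) - offset_c

# Hypothetical 20-nt binding region (the Gibson overhang is deliberately excluded).
binding = "ATGCATGCATGCATGCATGC"
print(wallace_tm(binding), annealing_temp(binding))  # prints 60 57
```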
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
In this protocol, PCR amplifies specific regions of the amilCP gene, including mutated regions in the chromophore, allowing precise control over sequence design.
In contrast, restriction digestion (using PvuII) is used to linearize the pUC19 plasmid backbone.
PCR is more flexible and allows introduction of mutations and overlaps, while restriction digestion relies on specific enzyme recognition sites. PCR is preferable for designing new constructs, whereas digestion is useful for preparing existing plasmid backbones.
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
To ensure compatibility with Gibson Assembly, DNA fragments must have overlapping homologous regions of ~20–22 base pairs.
In this protocol, these overlaps are introduced through primer design during PCR amplification of the amilCP fragments. The outer primers carry overlaps that match the ends of the pUC19 backbone generated by restriction digestion, so the backbone ends are compatible as well. These overlaps allow fragments to anneal and be joined seamlessly during the Gibson Assembly reaction.
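The overlap requirement described above can be checked in silico before ordering primers. The sketch below compares the end of each fragment with the start of the next one and reports whether every junction shares at least 20 identical bases. The function names and the toy sequences are hypothetical, and only the top strand in a fixed orientation is considered.

```python
def shared_overlap(upstream, downstream):
    """Length of the longest suffix of upstream that equals a prefix of downstream."""
    up, down = upstream.upper(), downstream.upper()
    for length in range(min(len(up), len(down)), 0, -1):
        if up[-length:] == down[:length]:
            return length
    return 0

def gibson_compatible(fragments, min_len=20, circular=True):
    """True if every adjacent pair of fragments overlaps by at least min_len bases."""
    pairs = list(zip(fragments, fragments[1:]))
    if circular:
        pairs.append((fragments[-1], fragments[0]))  # close the plasmid
    return all(shared_overlap(a, b) >= min_len for a, b in pairs)

# Toy fragments sharing a 22-nt overlap (not real amilCP or pUC19 sequence).
overlap = "ACGTACGTACGTACGTACGTAC"
frag_a = "TTTT" + overlap
frag_b = overlap + "GGGG"
print(shared_overlap(frag_a, frag_b), gibson_compatible([frag_a, frag_b], circular=False))
```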
How does the plasmid DNA enter the E. coli cells during transformation?
Plasmid DNA enters E. coli cells during transformation through heat shock or electroporation.
In heat shock, cells are chemically treated (for example with CaCl₂) and briefly heated, creating transient pores in the membrane.
In electroporation, an electric pulse temporarily disrupts the membrane.
These methods allow DNA to pass into the cell. Once inside, the plasmid replicates and expresses the amilCP gene, allowing colonies to be visually identified by color.
Describe another assembly method in detail (such as Golden Gate Assembly)
a) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
Golden Gate Assembly is a molecular cloning method that uses Type IIS restriction enzymes (such as BsaI) and DNA ligase in a single reaction.
These enzymes cut DNA outside their recognition site, generating customizable overhangs. This allows multiple DNA fragments to be assembled in a specific order without leaving unwanted sequences (scarless assembly). The reaction cycles between digestion and ligation, increasing efficiency. Because of its precision, Golden Gate is ideal for assembling multiple fragments simultaneously. It is widely used in synthetic biology for modular cloning. Compared to Gibson Assembly, it relies on Type IIS restriction sites rather than homologous overlaps to define how fragments join.
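To illustrate how a Type IIS enzyme defines the overhang, the sketch below scans a sequence for forward-oriented BsaI sites (GGTCTC) and reports the 4-nt overhang each one would leave, using the BsaI cut positions of 1 nt after the site on the top strand and 5 nt on the bottom strand. It is a simplification: reverse-oriented sites are ignored, and the function name and example sequence are hypothetical.

```python
BSAI_SITE = "GGTCTC"

def bsai_overhangs(seq):
    """Return the 4-nt overhangs produced by forward-oriented BsaI sites."""
    seq = seq.upper()
    overhangs = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1  # top strand is cut 1 nt past the site
        overhang = seq[cut:cut + 4]       # bottom strand is cut 4 nt further on
        if len(overhang) == 4:            # skip sites too close to the end
            overhangs.append(overhang)
        start = seq.find(BSAI_SITE, start + 1)
    return overhangs

# Toy sequence: site, one spacer base, then the designed overhang AATG.
print(bsai_overhangs("AAGGTCTCAAATGCCCC"))  # prints ['AATG']
```

Because the overhang lies outside the recognition site, each junction can be given a different 4-nt sequence, which is what fixes the assembly order.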
 Model this assembly method with Benchling or Asimov Kernel!
Assignment: Asimov Kernel
Create a Repository for your work
Create a blank Notebook entry to document the homework and save it to that Repository
Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel)
Create a blank Construct and save it to your Repository
a) Recreate the Repressilator in that empty Construct by using parts from the Characterized Bacterial Parts repository
b) Search the parts using the Search function in the right menu
c) Drag and drop the parts into the Construct
d) Confirm it works as expected by running the Simulator (“play” button) and compare your results with the Repressilator Construct found in the Bacterial Demos repository
e) Document all of this work in your Notebook entry - you can copy the glyph image and the simulator graphs, and paste them into your Notebook
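For comparison with the simulator output, the Repressilator can also be sketched as a small set of differential equations in which each of three genes is repressed by the previous one in a ring. The code below integrates a dimensionless version of that model with a simple Euler step. It is not the Asimov Kernel model: the parameter values, initial conditions, and function name are illustrative assumptions.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=5.0, n=2.0,
                           dt=0.01, steps=5000):
    """Euler integration of a three-gene ring oscillator; returns protein levels."""
    m = [0.0, 0.0, 0.0]  # mRNA levels
    p = [1.0, 2.0, 3.0]  # protein levels, deliberately asymmetric
    history = [tuple(p)]
    for _ in range(steps):
        # Gene i is repressed by the protein of gene i-1 (index -1 wraps around).
        dm = [-m[i] + alpha / (1.0 + p[i - 1] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        history.append(tuple(p))
    return history

trajectory = simulate_repressilator()
print(len(trajectory), "time points; final protein levels:", trajectory[-1])
```

Plotting the three protein columns against time would show whether the chosen parameters give sustained oscillations.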
Build three of your own Constructs using the parts in the Characterized Bacterial Parts Repo
a) Explain in the Notebook Entry how you think each of the Constructs should function
b) Run the simulator and share your results in the Notebook Entry
c) If the results don’t match your expectations, speculate on why and see if you can adjust the simulator settings to get the expected outcome
Week 07: Genetic Circuits Part 2
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits primarily rely on Boolean logic (AND, OR, NOT gates), which results in “all-or-nothing” digital responses. Intracellular Artificial Neural Networks (IANNs) offer several distinct advantages:
Non-linear Signal Integration: Unlike Boolean gates that require strict thresholds, IANNs use activation functions (like Hill functions) to process analog chemical gradients, allowing for more nuanced environmental sensing.
Weighted Inputs: IANNs allow for “tunable” inputs. By varying promoter strength or ribosome binding site (RBS) efficiency, the cell can assign different weights (w) to various biological signals, prioritizing one metabolite over another.
Noise Filtering: Biological environments are inherently “noisy.” The summation and thresholding architecture of a perceptron acts as a natural buffer, preventing the circuit from misfiring due to minor stochastic fluctuations in gene expression.
Computational Density: A single-layer IANN can perform complex classifications that would require a much larger and more metabolically taxing combination of traditional logic gates.
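The weighted-sum-plus-activation idea above can be made concrete with a tiny numerical perceptron that uses a Hill function as its activation. This is a conceptual sketch only: in a cell the weights would come from promoter or RBS strength rather than numbers in code, and the function names, weights, and Hill parameters are assumptions.

```python
def hill(x, k=1.0, n=2.0):
    """Hill activation: 0 at x = 0, 0.5 at x = k, approaching 1 for large x."""
    return x ** n / (k ** n + x ** n)

def perceptron(inputs, weights, k=1.0, n=2.0):
    """Weighted sum of input concentrations passed through a Hill activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return hill(max(weighted_sum, 0.0), k, n)  # concentrations cannot be negative

# Two hypothetical signals with equal weights.
print(perceptron([1.0, 1.0], [0.5, 0.5]))  # prints 0.5
print(perceptron([2.0, 2.0], [1.0, 1.0]))
```

Unlike a Boolean gate, the output changes smoothly with the inputs, and raising the Hill coefficient n makes the response more switch-like.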
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
Application: An engineered E. coli strain that acts as a therapeutic diagnostic tool within the human gut.
Input/Output Behavior:
A) Inputs (Xn): The system senses multiple biomarkers of inflammation simultaneously, such as Nitric Oxide (X1), Thiosulfate (X2), and Calprotectin (X3).
B) Processing: The IANN integrates these concentrations. Only if the weighted sum of these inflammatory markers exceeds a specific threshold (indicating a disease state rather than a transient spike) does the “neuron” fire.
C) Output (Y): The controlled secretion of an anti-inflammatory cytokine (e.g., IL-10) or a visual reporter like GFP for diagnostic stool analysis.
Limitations:
A) Metabolic Burden: Expressing multiple sensing proteins and processing machinery can redirect significant resources away from cellular growth (chassis stress).
B) Orthogonality: Ensuring that the synthetic components do not cross-react with the host cell’s native RNA processing machinery is a major design challenge.
Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.
Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
The diagram illustrates a two-layer genetic cascade functioning as an artificial neural network within a cellular chassis.
Layer 1 (Input Processing): Genetic input X1 undergoes transcription (Tx) and translation (Tl) to produce the endoribonuclease Csy4 (represented by the node in Layer 2).
Layer 2 (Signal Integration): Genetic input X2 is transcribed into mRNA. The Csy4 protein produced in Layer 1 acts as a negative regulatory weight, targeting and cleaving the X2 mRNA transcript. This site-specific cleavage inhibits the subsequent translation (Tl) of the final output.
Output (Y): The fluorescent protein (FP, output Y) is expressed strongly only when the X2 input is present and Csy4 is low or absent, so the circuit behaves like an "X2 AND NOT X1" logic gate with tunable biochemical weights.
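The two-layer cascade described above can be written as a small steady-state model. This is a rough sketch under simplifying assumptions: Csy4 is taken to be proportional to the X1 dose, and cleavage of the X2 transcript is modeled as a repressive Hill term. The parameters w1, w2, k, and n are hypothetical.

```python
def csy4_level(x1, w1=1.0):
    """Layer 1: steady-state Csy4, assumed proportional to the X1 DNA dose."""
    return w1 * max(x1, 0.0)


def fp_output(x1, x2, w1=1.0, w2=1.0, k=0.5, n=2.0):
    """Layer 2: FP expressed from X2, reduced by Csy4 cleavage of its mRNA."""
    csy4 = csy4_level(x1, w1)
    # Fraction of X2 transcripts that escape cleavage (repressive Hill term).
    surviving_fraction = 1.0 / (1.0 + (csy4 / k) ** n)
    return w2 * max(x2, 0.0) * surviving_fraction


for x1, x2 in ((0.0, 1.0), (0.5, 1.0), (1.0, 1.0), (1.0, 0.0)):
    print(f"X1={x1}, X2={x2} -> FP={fp_output(x1, x2):.2f}")
```

In this sketch, w1 plays the role of a negative weight on X1: raising it (for example, a stronger promoter on the Csy4 gene) lowers the FP output for the same X2 dose.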
Assignment Part 2: Fungal Materials
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
Fungal materials, often referred to as mycomaterials, are a rapidly growing field of sustainable engineering. These materials are typically grown by inoculating agricultural waste with mycelium (the root-like structure of fungi), which acts as a natural biological glue.
Some existing fungal materials are:
Myco-Foam: Used as a direct replacement for Polystyrene (Styrofoam). Companies like Ecovative Design grow custom-molded packaging that is fully compostable.
Myco-Bricks: Mycelium is grown into bricks or insulation panels. These are used in experimental architecture for their thermal and acoustic properties, which come from the material's porous and fibrous nature.
Myco-Leather: Brands like Mylo or Reishi produce a material that mimics the texture and durability of animal leather for the fashion industry.
In terms of sustainability, fungal materials are fully biodegradable and are often described as low-carbon or even carbon-negative. They grow on agricultural “waste” (such as corn husks or wood chips), turning low-value byproducts into high-value materials. Traditional plastics, by contrast, are petroleum-based and contribute to long-term microplastic pollution, and animal leather has a large carbon footprint because of the land and water that livestock require.
Regarding growth time, fungal materials can be grown and “manufactured” in days to weeks, whereas traditional leather requires years for an animal to mature. Plastic production itself is nearly instant, but the oil it comes from takes millions of years to form.
Fungal materials are also naturally fire-resistant and do not off-gas volatile organic compounds (VOCs), which are common in synthetic foams and glues. Many traditional foams, by contrast, are highly flammable and release toxic fumes during combustion or as they degrade over time.
Despite their potential, fungal materials face specific engineering hurdles.
Biological systems are inherently variable. For fungal materials, factors like humidity, temperature, and substrate consistency can introduce biological “noise”, making it difficult to produce perfectly uniform batches. Industrial processes for plastics and metals, by contrast, are highly standardized, so units come out nearly identical.
Because they are designed to be biodegradable, fungal materials are sensitive to moisture. If not properly sealed, they can begin to decay in outdoor or high-humidity environments. Materials like PVC or high-density polyethylene, by contrast, are extremely durable and resist decay, which is their greatest strength during use but their biggest flaw as waste.
For fungal materials, moving from lab-scale prototypes to industrial-scale throughput requires significant infrastructure. Furthermore, there is often a “yuck factor” or stigma associated with using “mushrooms” for clothing or housing that must be overcome.
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
An application for genetically engineering fungi is the development of wearable, autonomous biosensors (Biostickers) for industrial safety, specifically in mining environments.
I want to engineer filamentous fungi (such as Aspergillus nidulans) to sense hazardous mine gases, such as carbon monoxide (CO) and methane (CH4), at concentrations below dangerous levels.
Using an Intracellular Artificial Neural Network (IANN), the fungi would integrate chemical signals from the mine’s atmosphere. When a specific safety threshold is reached, the circuit triggers a visible phenotypic change, such as the expression of high-intensity chromoproteins (e.g., amilCP for a dark blue/purple color) or bioluminescence.
This provides a zero-power, spark-safe, and low-cost early warning system for miners. Unlike electronic sensors, a “living sticker” on a helmet is intrinsically safe in explosive atmospheres and highly resistant to the physical rigors of a mine.
What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
While E. coli is often the default chassis for synthetic biology, fungi offer unique mechanical and biological advantages for a mining Biosticker:
Environmental Resilience: Many fungi have evolved to tolerate harsh, low-moisture, and variable-pH environments. In a mine, where humidity fluctuates and surfaces are abrasive, the thick chitin-based fungal cell wall and the interwoven mycelium provide greater structural integrity than the thinner cell envelopes of most bacteria.
3D Morphological Engineering (Mycelium): Fungi grow in complex hyphal networks. We can engineer the branching density of the mycelium to create a “living fabric” within the sticker. This allows for a higher surface area for gas diffusion and a more robust physical form factor that can be integrated into a wearable adhesive.
Eukaryotic Transcriptional Control: Fungi possess sophisticated eukaryotic gene regulation, which supports the implementation of complex, multi-layered IANNs. They also carry out eukaryotic protein folding and post-translational modifications, which some advanced reporter proteins require and which bacteria might struggle to provide.
Secretory Power and Matrix Integration: Fungi are masters of secretion. They can be engineered to secrete protective proteins into the surrounding hydrogel matrix of the sticker, effectively “engineering their own environment” to remain viable on a miner’s helmet for weeks without external maintenance.
Assignment Part 3: First DNA Twist Order
Review Part 3: DNA Design Challenge of the week 2 homework. Design at least 1 insert sequence and place it into the Benchling/Kernel/Other folder you shared in the Google Form above. Document the backbone vector it will be synthesized in on your website.
The insert contains a fungal expression cassette designed for a biosensing system in mining environments. The PgpdA promoter from Aspergillus drives expression of the amilCP chromoprotein reporter, which gives the fungus a visible blue color. Because PgpdA is generally used as a strong constitutive promoter, this first construct tests reporter expression; in the final biosensor, the reporter would be placed under a promoter that responds to the stress caused by toxic gases. The construct also includes a Kozak sequence for translation initiation and a transcription terminator.
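Before ordering, the layout of a cassette like this one (promoter, Kozak sequence, coding sequence, terminator) can be sanity-checked with a short script. The sketch below uses placeholder sequences only: the strings of N and the toy coding sequence are NOT the real PgpdA, Kozak, amilCP, or terminator sequences, and the helper names check_cds and assemble are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Return a list of problems found in a coding sequence (empty if none)."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal in-frame stop codon")
    return problems


def assemble(parts):
    """Concatenate (name, sequence) parts, in order, into one insert string."""
    return "".join(seq.upper() for _, seq in parts)


# Placeholder sequences for illustration; replace with the real part sequences.
toy_cds = "ATGGCTAGCAAATAA"
parts = [
    ("promoter (PgpdA placeholder)", "NNNNNNNNNN"),
    ("Kozak (placeholder)", "NNNNNN"),
    ("CDS (amilCP placeholder)", toy_cds),
    ("terminator (placeholder)", "NNNNNNNNNN"),
]

print("CDS problems:", check_cds(toy_cds))
print("Insert length:", len(assemble(parts)))
```

A check like this only catches basic framing mistakes; it does not replace reviewing the annotated construct in the design tool or the synthesis provider's own sequence screening.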