Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
The human SOD1 sequence without the mutation:
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
The human SOD1 sequence with the A4V mutation:
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
These are the 4 generated peptides and the added peptide "FLYRWLPSRRGG" I obtained from Google Colab:
Assignment: DNA Assembly
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Some components in the Phusion High-Fidelity PCR Master Mix are:
➜ Phusion DNA Polymerase - This enzyme copies the DNA with very high accuracy.
➜ dNTPs (deoxynucleotide triphosphates) - These are the building blocks that the polymerase uses to synthesize new DNA.
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits
😐 Traditional genetic circuits typically implement logic like AND, OR, NOT, meaning outputs are binary (on/off).
😐 Boolean circuits are limited to combinations of discrete logic rules.
Subsections of Homework
Week 1 HW: Principles and Practices
Step 1. First, describe a biological engineering application or tool you want to develop and why.
I am passionate about reading, as well as Biology, so I came up with an idea that could mix both and be sustainable and enjoyable for us bookworms.
My idea that I would like to put into practice is a reading light powered by bioluminescent algae, combined with microbial fuel cells for supplemental energy. Instead of relying solely on electricity, the lamp is partially powered by living systems, creating a sustainable, educational, and interactive device.
Bioluminescent organisms (e.g., the alga Pyrocystis lunula, or E. coli engineered to express a luciferase system) produce a gentle glow. Pyrocystis emits light when the culture is agitated during its night phase, while engineered bacteria glow as long as they have nutrients and oxygen, so either can serve as a living light source. This can lead to sustainable outcomes, such as reducing electricity consumption, using renewable biological processes and using biodegradable components that minimize waste.
My additional goals are to enhance the aesthetic experience of reading and to explore practical, safe uses of bioluminescence beyond the laboratory.🦠🌱
This image is AI generated
Step 2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
My first big goal is that this lamp should, of course, not harm any humans, animals or the environment.
Sub-goal 1A: Containment of living organisms
Use non-pathogenic, lab-safe algae or bacteria.
Ensure that the lamp is a closed system to prevent accidental release into homes or the environment.
Include fail-safes so organisms cannot survive outside the lamp (e.g., nutrient-dependent survival).
Sub-goal 1B: Safe user interaction
Develop clear usage guidelines, labeling, and instructions.
Prevent accidental ingestion, skin reactions, or allergic responses.
Educate users on proper disposal of nutrients and lamp components.
My second big goal is that the lamp should not negatively impact ecosystems or contribute to waste.
Sub-goal 2A: Biodegradable materials
Use compostable biomaterials for the lamp casing and cartridges.
Reduce reliance on plastics or non-renewable resources.
Sub-goal 2B: Minimal ecological footprint
Design the lamp to consume minimal electricity and nutrients.
Ensure any waste products from the lamp (e.g., spent algae or nutrient capsules) are safe and compostable.
Step 3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)
Governance Action 1: Implement a Biosafety Certification Program
Purpose: Ensure that all SymbioLight lamps (SymbioLight is the working name for this reading light) meet strict biosafety standards to prevent accidental release of organisms or harm to users.
Design: Require all lamps to be tested in labs for non-pathogenicity and containment integrity.
Assumptions: Users may handle the lamp incorrectly or dispose of it improperly.
Risks of Failure: Contaminated or unsafe lamps reaching consumers.
Success: Safe adoption of living lamps in households.
Governance Action 2: Biodegradable, Low-Impact Materials Policy
Purpose: Ensure the lamp’s components do not harm the environment when discarded, aligning with sustainability goals.
Design: Mandate mycelium, algae-based plastics, or bacterial cellulose for lamp casing and nutrient cartridges. Plus, require testing for complete compostability and low environmental toxicity.
Assumptions: Users will dispose of lamps in composting or bio-waste systems.
Risks of Failure: Non-compostable waste entering landfills or water systems.
Success: Increased public trust in synthetic biology products.
Governance Action 3: Mandatory User Education & Ethical Guidance
Purpose: Promote safe, responsible, and informed use of SymbioLight, and foster public understanding of living systems.
Design: Include educational manuals and labels explaining the biology, safety protocols, and proper disposal.
Assumptions: Users may be unfamiliar with living systems and mishandle them without guidance.
Risks of Failure: Misuse or neglect of living organisms leading to lamp failure or ecological impact.
Success: Informed users who safely interact with living lamps.
Step 4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.
| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 3 | 3 | 3 |
| • By helping respond | 2 | 2 | 2 |
| Foster Lab Safety | | | |
| • By preventing incident | 2 | 3 | 2 |
| • By helping respond | 2 | 2 | 2 |
| Protect the environment | | | |
| • By preventing incidents | 2 | 3 | 2 |
| • By helping respond | 1 | 1 | 2 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 2 | 2 |
| • Feasibility? | 2 | 3 | 2 |
| • Not impede research | 2 | 2 | 2 |
| • Promote constructive applications | 3 | 2 | 2 |
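One simple way to read the rubric above is to add up each option's scores. The sketch below does that under the assumption that every criterion carries equal weight (the rubric itself does not state any weighting), so it is only a rough tally, not a formal decision method.

```python
# Rubric scores copied row by row from the table above (1 = best, 3 = worst).
scores = {
    "Option 1": [3, 2, 2, 2, 2, 1, 2, 2, 2, 3],
    "Option 2": [3, 2, 3, 2, 3, 1, 2, 3, 2, 2],
    "Option 3": [3, 2, 2, 2, 2, 2, 2, 2, 2, 2],
}

# Assumption: all criteria are weighted equally, so a plain sum is used.
totals = {option: sum(values) for option, values in scores.items()}

# Lower totals are better because 1 is the best score.
ranking = sorted(totals, key=totals.get)

for option in ranking:
    print(option, totals[option])
```

With equal weights, Options 1 and 3 tie on the lowest total and Option 2 comes last; a different weighting could change that order.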
Step 5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.
After considering the three proposed governance actions, (1) biosafety certification, (2) sustainable materials policy, and (3) user education requirements, I would prioritize a combination of Options 1 and 3, with Option 2 as a secondary but important long-term goal.
The most critical ethical responsibility of SymbioLight is to ensure non-malfeasance—that the product cannot cause harm to people or the environment. Because the lamp involves living organisms, even if they are benign, the greatest potential risk lies in accidental release of engineered algae or bacteria, contamination of local ecosystems as well as unintended health effects in homes. Without strong biosafety guarantees, the entire concept could become unethical regardless of how sustainable or educational it is.
The reason I would also prioritize Option 3 is that even a perfectly engineered product can cause problems if misused. Therefore, governance must include the human element. SymbioLight users need to understand how to care for living organisms, how to dispose of materials properly and the limits of what the lamp can safely do.
Lastly, Option 2 is important but comes second: sustainability is a core motivation for SymbioLight, yet it does not address immediate safety risks. Using biodegradable materials is ethically desirable, but a lamp made from non-ideal materials is still less harmful than a lamp that releases unsafe organisms.
Week 2 HW: DNA read, write and edit
Part 1: Benchling & In-silico Gel Art
Here is a simulation with the Restriction Enzyme Digestion on Benchling.com:
Part 3: DNA Design Challenge
3.1. Choose your protein.
I chose the Green Fluorescent Protein (GFP) because it naturally glows green when exposed to blue or UV light. It revolutionized cell biology by allowing scientists to see proteins inside living cells, and its discovery and development were recognized with the 2008 Nobel Prize in Chemistry.
This protein was first isolated from the jellyfish Aequorea victoria and forms a beta-barrel structure (like a protective can). Inside the barrel sits the chromophore, the light-emitting part.
The GFP visible in Aequorea victoria. Source: https://www.universityofcalifornia.edu/news/how-glow-dark-jellyfish-inspired-scientific-revolution
sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
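As a minimal sketch of this step, the code below reverse translates a protein by picking one codon per amino acid. The codon table is an illustrative assumption (one commonly used E. coli codon per residue), not the output of a codon-optimization tool, and `reverse_translate` is a hypothetical helper name. Because the genetic code is redundant, this gives only one of many possible DNA sequences.

```python
# Illustrative codon choice: one commonly used E. coli codon per amino acid.
# This table is an assumption for the sketch, not a codon-optimization result.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def reverse_translate(protein: str, add_stop: bool = True) -> str:
    """Return one possible DNA coding sequence for a protein sequence."""
    codons = [PREFERRED_CODON[aa] for aa in protein.upper()]
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


# First ten residues of the GFP sequence shown above (P42212).
gfp_start = "MSKGEELFTG"
dna = reverse_translate(gfp_start)
print(dna)  # ATGAGCAAAGGCGAAGAACTGTTTACCGGCTAA
```

The same function can be called on the full 238-residue GFP sequence; a real design would also check for unwanted restriction sites and adjust codon usage for the chosen host.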
3.4 Technologies for producing the Green Fluorescent Protein.
To produce the Green Fluorescent Protein (GFP), a gene encoding GFP is necessary. The expression construct contains a promoter, the GFP coding sequence and a terminator.
A: Cell-Dependent Method.
The GFP gene is inserted into a plasmid with a bacterial promoter. The bacteria (E. coli) transcribe and translate the gene, and the protein accumulates in the cytoplasm, from which it can be purified using chromatography. This method is cheap and fast.
Another cell-dependent method is to introduce the GFP plasmid into yeast or mammalian cells (by transformation or transfection). The cells then express GFP for microscopy studies or protein assays, which is useful if GFP needs to fold in a eukaryotic environment.
graph LR;
DNA-->mRNA-->Protein-->Fluorescence
B: Cell-Free Method.
The cell-free method uses a cell-free protein synthesis system, which contains ribosomes, tRNAs, amino acids, nucleotides, ATP, GTP, and transcription/translation enzymes. The DNA template for GFP is added to this mixture, and the protein is produced without living cells, often within a few hours. This method is rapid, avoids the toxic effects of protein expression on cells and makes it easy to add modifications.
3.5. [Optional] How does it work in nature/biological systems?
A single gene can produce multiple different proteins at the transcriptional level mainly through mechanisms that modify the RNA transcript before it becomes translated. This greatly expands protein diversity without increasing gene number. The key mechanisms are Alternative Splicing, Alternative Promoters, Alternative Polyadenylation and RNA Editing. The most crucial method is alternative splicing. During transcription, a gene is copied into pre-mRNA containing exons and introns. Before translation, introns are removed and exons are joined. In alternative splicing, different combinations of exons are included or excluded. Types of alternative splicing are exon skipping, alternative 5′ splice site, alternative 3′ splice site, intron retention and mutually exclusive exons.
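To make the exon-skipping idea concrete, here is a small sketch that lists the mRNA isoforms a gene could produce if some internal exons are optional. The exon names and the choice of which exons are "cassette" exons are hypothetical; real splicing is regulated and does not produce every combination.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, cassette):
    """List isoforms obtained by including or skipping each cassette exon."""
    isoforms = []
    for k in range(len(cassette) + 1):
        for skipped in combinations(cassette, k):
            # Keep the exon order, drop only the skipped cassette exons.
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms


# Hypothetical four-exon gene; E2 and E3 can each be skipped.
exons = ["E1", "E2", "E3", "E4"]
isoforms = exon_skipping_isoforms(exons, cassette=["E2", "E3"])
for iso in isoforms:
    print("-".join(iso))
```

Two optional exons already give four isoforms, which illustrates how splicing expands protein diversity without adding genes.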
➜ The gene TP53 provides instructions for making a protein called tumor protein p53 (or p53).
➜ This protein acts as a tumor suppressor, which means that it regulates cell division by keeping cells from growing and dividing (proliferating) too fast or in an uncontrolled way.
➜ It is often called the guardian of the genome because it prevents cells with damaged DNA from becoming cancerous.
➜ Thus, the TP53 gene is arguably the most important gene in cancer biology.
The p53 protein is located in the nucleus of cells throughout the body, where it binds directly to DNA. When the DNA in a cell becomes damaged by agents such as toxic chemicals, radiation, or UV rays from sunlight, this protein plays a critical role in determining whether the DNA will be repaired or the damaged cell will self-destruct. If the DNA can be repaired, p53 activates other genes to fix the damage. If the DNA cannot be repaired, this protein prevents the cell from dividing and signals it to undergo self-destruction. By stopping cells with mutated or damaged DNA from dividing, p53 helps prevent the development of tumors.
TP53: A central mediator of stress responses. Source: https://p53.fr/images/image_info/TP53_Knowledge/TP53_Pathtway_2.png
(ii) For sequencing, I would choose Illumina sequencing (sequencing by synthesis).
Generation
It is second generation because it sequences millions of DNA fragments at the same time (massively parallel sequencing), requires PCR amplification first (cluster generation on a flow cell) and it produces short reads (usually 75-300 base pairs).
Input & Preparation
The biological input would be human genomic DNA extracted from samples such as blood, tissue or cultured cells; the molecular input is double-stranded DNA fragments. How to prepare the input: DNA extraction ➜ Fragmentation ➜ End Repair and A-Tailing ➜ Adapter Ligation ➜ PCR Amplification ➜ Library Quantification
How it reads the DNA
Illumina uses fluorescent reversible terminator nucleotides. In each cycle, one labeled nucleotide (A, T, C or G) is incorporated, the terminator prevents further extension, a laser excites the fluorophore and a camera detects the emitted color. Each color corresponds to one base, which is how the base is identified. The label and terminator are then cleaved off so the next cycle can begin.
Output
Illumina produces raw data files (BCL files), which are converted into FASTQ files. These contain the sequence reads and their per-base quality scores.
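A FASTQ record has four lines: a header starting with `@`, the read sequence, a `+` separator, and a quality string. The sketch below parses such records and decodes the quality characters, assuming the common Phred+33 encoding and unwrapped four-line records. The two reads are made-up examples, not real sequencing data, and `parse_fastq` is a hypothetical helper.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Phred+33 encoding: quality score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


# Hypothetical two-read example.
example = """@read1
ACGT
+
IIII
@read2
GGCA
+
I5I#
"""

reads = list(parse_fastq(example))
for read_id, seq, scores in reads:
    # A Phred score Q corresponds to an error probability of 10 ** (-Q / 10).
    print(read_id, seq, scores)
```

Here `I` decodes to Q40 (about 1 error in 10,000 bases) while `#` decodes to Q2, so low-quality bases can be spotted and trimmed before analysis.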
5.2 DNA Write
(i) If I could synthesize a gene, I would choose synthesizing nitrogen-fixation genes for crops.
Why are nitrogen-fixing crops so powerful?
➜ Right now, only legumes (like beans and peas) form symbiosis with nitrogen-fixing bacteria such as Rhizobium
➜ Major crops (wheat, rice, maize) rely heavily on synthetic fertilizer
➜ Fertilizer production uses the Haber–Bosch process, which is extremely energy-intensive
➜ If cereal crops could fix nitrogen, there would be a massive reduction in greenhouse gas emissions, lower farming costs, less nitrate pollution in rivers and improved soil health.
➜ Unfortunately, nitrogen fixation is a complex process because nitrogenase requires ~15–20 genes (nifHDK and accessory genes), tight regulatory control and metal cofactors (such as the iron-molybdenum cofactor).
➜ Key genes include nifH, nifD, and nifK.
(ii) I would use the Oxford Nanopore method (long-read sequencing)
1. Essential steps:
High-molecular-weight DNA extraction
Adapter attachment
DNA passes through nanopore protein
Electrical signal changes detected
Base calling via AI algorithms
It is well suited to very long reads: it can sequence an entire nif cluster in one read and it detects structural variations easily.
2. Limitations:
The Oxford Nanopore method has lower raw accuracy than short-read methods, more insertion/deletion errors and it requires substantial computational analysis.
5.3 DNA Edit
(i) If I could edit a gene, I would edit disease vectors (Gene Drives), specifically Mosquito Malaria Control.
Why did I choose this editing?
The main species targeted is the mosquito Anopheles gambiae. This species spreads malaria by transmitting the parasite Plasmodium falciparum. Malaria is a very problematic disease because it kills hundreds of thousands of people per year, mostly children.
By editing Anopheles gambiae with a gene drive, mosquitoes can be made sterile or unable to transmit the malaria parasite. I would edit a fertility gene (often doublesex) and attach a CRISPR-based gene drive. When modified mosquitoes breed, the gene drive copies itself onto the partner chromosome, so nearly all of the offspring inherit it. But as good as this sounds, there are obstacles, such as new mutations that break the CRISPR target site, preserve fertility and outcompete the drive.
CRISPR technologies for the control and study of malaria. Source: https://media.springernature.com/lw685/springer-static/image/art%3A10.1186%2Fs13071-025-06905-w/MediaObjects/13071_2025_6905_Fig1_HTML.png
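The "nearly all offspring inherit it" effect can be illustrated with a very simplified population model. The sketch assumes random mating, no fitness cost, and no resistance alleles (all of which the obstacles mentioned above would violate), and the homing efficiency value is a hypothetical parameter, not a measured one.

```python
def drive_frequency(p0, homing_efficiency, generations):
    """Deterministic drive-allele frequency under random mating, no fitness cost."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Heterozygotes (frequency 2p(1-p)) pass on the drive with probability
        # (1 + e) / 2 instead of the Mendelian 1/2, which simplifies to
        # p' = p + e * p * (1 - p).
        p = p + homing_efficiency * p * (1 - p)
        freqs.append(p)
    return freqs


# Hypothetical release: 1% drive alleles, 90% homing efficiency, 10 generations.
freqs = drive_frequency(0.01, 0.9, 10)
for generation, p in enumerate(freqs):
    print(generation, round(p, 3))
```

Setting `homing_efficiency` to 0 gives ordinary Mendelian inheritance, where the allele frequency stays constant; any positive value makes the drive spread.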
(ii) I would use CRISPR-Cas9 combined with a gene drive cassette as the technology to edit the gene.
How CRISPR edits DNA and the essential steps
CRISPR–Cas9 editing works in 3 core steps: target recognition, DNA cleavage and repair, where the repair pathway determines the outcome. For a gene drive, the inserted cassette includes a Cas9 gene, an sgRNA gene and homology arms. After Cas9 cuts the wild-type allele, the cell uses the gene drive allele as the repair template, so the drive copies itself and the organism becomes homozygous. The essential steps are:
1. Target gene selection
Choosing a gene critical for female fertility or parasite transmission
2. Guide RNA design
Using bioinformatics tools to identify unique 20-nt target sequence, minimize off-target matches and avoid polymorphic regions
3. Construct Gene Drive Cassette
The components include Cas9 coding sequence, germline-specific promoter, sgRNA expression cassette and homology arms (~1 kb each side)
4. Embryo Microinjection
Inputs delivered into early embryos
5. Screening
6. Contained Population Testing
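Step 2 above (guide RNA design) starts with finding candidate target sites. As a rough sketch, the code below scans the forward strand of a DNA sequence for 20-nt protospacers followed by an NGG PAM, which is what SpCas9 recognizes. The function name and the test sequence are hypothetical, and the sketch does not check the reverse strand, off-target matches or polymorphic regions.

```python
import re


def find_cas9_targets(dna):
    """Return (position, protospacer, PAM) for each 20-nt site followed by NGG."""
    dna = dna.upper()
    targets = []
    # The lookahead allows overlapping candidate sites to be reported.
    for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", dna):
        targets.append((match.start(), match.group(1), match.group(2)))
    return targets


# Hypothetical sequence containing exactly one site.
protospacer = "GATTACAGATTACAGATTAC"
dna = "TT" + protospacer + "TGG" + "AA"
targets = find_cas9_targets(dna)
print(targets)
```

Each candidate would then be filtered with dedicated design tools before being used in the sgRNA expression cassette.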
Input & Preparation
Input: Cas9 enzyme (or Cas9 expression plasmid), sgRNA sequence, donor DNA template with homology arms, promoter sequences, selectable marker, mosquito embryos, microinjection equipment and PCR primers for validation.
I found a published paper which describes how researchers leveraged the Opentrons OT-2 automated liquid handler to develop an automated, high-throughput proxy viscometer. The robot was programmed to dispense liquids of various viscosities and collect data for machine-learning models to predict viscosity, demonstrating a practical application of the OT-2 in fluid characterization workflows — requiring minimal human intervention while significantly increasing throughput.
If I went with my first project idea, the Bacterial Microplastic Sensor, I could apply the following automation tools:
Automated Fluorescence Detection System
The goal is to automatically quantify GFP output in response to PET degradation products. I can automate the timed fluorescence measurement (every 10 min), background subtraction, data logging (CSV), real-time plotting and threshold alert.
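A minimal sketch of the data-handling part of this idea is shown below: background subtraction, a threshold alert and CSV logging. The readings, the blank value and the threshold are hypothetical numbers in arbitrary fluorescence units, and reading the actual sensor or plate reader is left out because it depends on the hardware.

```python
import csv
import io


def process_readings(readings, blank, threshold):
    """Subtract the background and flag readings at or above the threshold."""
    rows = []
    for minute, raw in readings:
        corrected = raw - blank
        rows.append({
            "minute": minute,
            "raw": raw,
            "corrected": corrected,
            "alert": corrected >= threshold,
        })
    return rows


def to_csv(rows):
    """Serialize the processed rows as CSV text for logging."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["minute", "raw", "corrected", "alert"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical measurements taken every 10 minutes.
readings = [(0, 120), (10, 150), (20, 310), (30, 620)]
rows = process_readings(readings, blank=100, threshold=200)
print(to_csv(rows))
```

In a real setup the same rows could be appended to a file after each measurement and plotted as they arrive.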
Automated Incubation + Sampling
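There can be shaking platform automation, with a 3D tube rack and/or temperature control using a heat pad, a temperature sensor and automated regulation. A pseudocode example can be:
# On/off control around the 37 °C incubation setpoint
if temperature < 37:
    heater_on()   # too cold: switch the heat pad on
else:
    heater_off()  # at or above 37 °C: switch it off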
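A plain threshold like the one above can make the heater switch on and off rapidly when the temperature hovers around 37 °C. One common refinement is a hysteresis band, sketched below. The band width, the function name and the sensor trace are assumptions for illustration; real sensor and heater calls would replace the list of values.

```python
def heater_command(temperature, heater_is_on, setpoint=37.0, band=0.5):
    """Return True if the heater should be on, using a hysteresis band."""
    if temperature < setpoint - band:
        return True            # clearly too cold: heat
    if temperature > setpoint + band:
        return False           # clearly too warm: stop heating
    return heater_is_on        # inside the band: keep the current state


# Hypothetical sensor trace in °C.
trace = [35.0, 36.4, 36.8, 37.3, 37.6, 37.2, 36.7, 36.4]
state = False
states = []
for temperature in trace:
    state = heater_command(temperature, state)
    states.append(state)

print(states)
```

The heater stays on until the temperature passes 37.5 °C and stays off until it drops below 36.5 °C, which avoids constant toggling.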
AI-Based Fluorescence Quantification
Instead of raw intensity, computer vision or a trained model could be used to classify fluorescence levels.
Ginkgo Nebula Integration Plan
Using Ginkgo Nebula could be very helpful for DNA design automation (promoter optimization, RBS strength prediction, codon optimization, circuit simulation) or experimental tracking (automated protocol versioning, construct iteration tracking, strain documentation).
Week 4 HW: Protein Design Part I
Part A. Conceptual Questions
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
When you eat beef or fish, your body does not keep the meat intact and turn it into “cow tissue” or “fish tissue.” Instead, your digestive system breaks everything down into basic molecules, like proteins into amino acids, fats into fatty acids + glycerol, carbohydrates into simple sugars and DNA into nucleotides.
Why are there only 20 natural amino acids?
Life could have used more (and sometimes does), but 20 appears to be a near-optimal balance. But why is that so? The genetic code has limits. Proteins are built using codons, 3-letter sequences in DNA/RNA. ➜ 4 bases (A, U/T, C, G), 3 positions per codon, 4³ = 64 possible codons. But 3 are stop signals and many codons are redundant (multiple codons for the same amino acid), which leaves room for more than 20 in principle. The code settled on 20 standard amino acids early in evolution and then became highly conserved: changing it would alter nearly all existing proteins at once and would be catastrophically disruptive!
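The codon arithmetic above can be checked with a few lines of code. The three stop codons listed are those of the standard genetic code.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code

# All 3-letter combinations of the 4 RNA bases.
codons = ["".join(c) for c in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))              # 64 possible codons
print(len(sense_codons))        # 61 codons that encode amino acids
print(len(sense_codons) / 20)   # average codons per standard amino acid
```

With 61 sense codons for 20 amino acids, each amino acid has about three codons on average, which is the redundancy mentioned above.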
Can you make other non-natural amino acids? Design some new amino acids.
Synthetic biology and medicinal chemistry routinely create non-natural amino acids, and some are even genetically encoded in engineered organisms. Some are chemically synthesized and incorporated during peptide synthesis, while others are genetically encoded using engineered tRNA/synthetase systems.
One example would be Alkyne-Lysine (a bioorthogonal handle). The lysine side chain is modified to include a terminal alkyne. Alkynes allow click chemistry (azide–alkyne cycloaddition), site-specific labeling and fluorescent tagging. This amino acid could be used in protein imaging, drug conjugation and synthetic protein networks.
Where did amino acids come from before enzymes that make them, and before life started?
Amino acids are found in all living things on Earth, being the building blocks of proteins. Proteins are essential for many processes within living organisms, including catalysing reactions (enzymes), replicating genetic material (polymerases), transporting molecules (transport proteins) and providing structure to cells and organisms (e.g. collagen). Therefore, amino acids would have been needed in significant amounts in the region where life began on Earth. The Miller–Urey experiment (1953) showed that organic molecules can form spontaneously under plausible early-Earth conditions. The chemists simulated early Earth’s atmosphere and, within days, the flask contained amino acids such as glycine, alanine and aspartate, produced without enzymes or cells, just chemistry. That means enzymes did not invent amino acids. Instead, geochemistry made amino acids, those amino acids accumulated, some began forming short peptides, self-replicating systems eventually emerged, and only later did enzyme-based metabolism evolve.
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
If you build an α-helix entirely from D-amino acids, it will form a left-handed helix. The reason is that natural proteins use L-amino acids almost exclusively, and the standard α-helices built from them are right-handed. D-amino acids are mirror images of L-amino acids. So if a peptide of L residues forms a right-handed α-helix, the exact mirror molecule (all D residues) must adopt the mirror conformation. The entire structure inverts, and the mirror image of a right-handed helix is a left-handed helix.
Can you discover additional helices in proteins?
Yes, and in fact additional helices beyond the standard α-helix have already been discovered, such as the 3₁₀ helix and the π-helix. Whether entirely new helices can exist is a deeper structural question: there can be more, but the backbone geometry and hydrogen-bonding patterns make the options very constrained.
Why are most molecular helices right-handed?
Most molecular helices in biology are right-handed because life uses L-amino acids, and L stereochemistry makes the right-handed α-helix energetically favored.
Why do β-sheets tend to aggregate?
β-sheets aggregate because their backbone hydrogen bonding is unsatisfied at the edges, and the easiest way to satisfy it is by binding to another β-sheet.
8.1 What is the driving force for β-sheet aggregation?
β-sheet aggregation is driven by a combination of backbone hydrogen bonding, hydrophobic interactions, and water-mediated entropy effects, with cooperativity making it autocatalytic.
Why do many amyloid diseases form β-sheets?
Amyloid diseases form β-sheets because β-sheets have exposed hydrogen-bonding edges at misfolded regions, β-strands are geometrically compatible with stacking and fibril formation, hydrophobic and polar side chains stabilize sheet stacking, cross-β fibrils represent a low-energy, highly stable state and misfolding exposes β-prone sequences that nucleate aggregation.
9.1 Can you use amyloid β-sheets as materials?
Yes, amyloid β-sheets are not just pathological; their structural properties make them ideal building blocks for engineered materials. Some amyloid-based materials are:
Hydrogels ➜ Short amyloidogenic peptides form cross-β networks in water, creating soft, viscoelastic gels that can be used in tissue engineering scaffolds, drug delivery systems and 3D cell culture matrices.
Nanofibers and Films ➜ Amyloid fibrils can be aligned to make strong, thin fibers, and they can be embedded in composites, e.g. for biocompatible electronics.
Functionalized Materials ➜ Side chains can be chemically modified to bind metals, fluorophores, or enzymes, which enables catalytic amyloid materials, light-responsive materials, and sensing platforms.
Part B: Protein Analysis and Visualization
I selected human hemoglobin because it is a crucial and very well-known protein that transports oxygen from the lungs to tissues and carbon dioxide back to the lungs.
The structure of hemoglobin. Source: https://chemistwizards.com/wp-content/uploads/2026/01/hemoglobin-structure-1024x687.webp
🩸This is the frequency of amino acids from Google Colab:
🩸UniProt's BLAST tool showed 113 protein sequence homologs.
🩸Hemoglobin belongs to the globin superfamily, which is a large group of proteins that bind heme and transport or store oxygen. Some common features of globins are globin fold, heme-binding pocket and conserved residues. Also, within the globin superfamily, hemoglobin has a subfamily ➜ Alpha-globin and Beta-globin.
🩸The structure from RCSB was released on 1998-04-29. The resolution is 1.80 Å.
🩸There is a non-protein molecule in the structure: a ligand called “PROTOPORPHYRIN IX CONTAINING FE” (heme).
🩸SCOP showed me the following structure classification families:
🩸Visualizing the protein in PyMOL as:
Cartoon:
Ribbon:
Ball and Stick:
🩸By coloring the protein by secondary structure, it showed more helices than sheets.
🩸The hydrophobic residues are (color yellow in image):
ALA, VAL, LEU, ILE, MET, PHE, TRP, PRO
The hydrophilic residues are (color cyan in image):
SER, THR, ASN, GLN, TYR, CYS
And the charged residues (also hydrophilic):
ASP, GLU (negative, color red in image), LYS, ARG, HIS (positive, color blue in image)
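The grouping above can also be written as a small lookup, which is handy for counting how many residues of each class a chain contains. The residue sets are copied from the lists above; the example residue list is a hypothetical fragment, and residues that appear in none of the lists (such as GLY) are reported as unclassified.

```python
from collections import Counter

# Residue classes exactly as listed above.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}
POLAR = {"SER", "THR", "ASN", "GLN", "TYR", "CYS"}
NEGATIVE = {"ASP", "GLU"}
POSITIVE = {"LYS", "ARG", "HIS"}


def classify(residue):
    """Return the class of a three-letter residue name."""
    name = residue.upper()
    if name in HYDROPHOBIC:
        return "hydrophobic"
    if name in POLAR:
        return "polar"
    if name in NEGATIVE:
        return "negative"
    if name in POSITIVE:
        return "positive"
    return "unclassified"  # e.g. GLY, which is in none of the lists above


# Hypothetical fragment used only to show the counting.
fragment = ["VAL", "LEU", "SER", "PRO", "ALA", "ASP", "LYS", "GLY"]
counts = Counter(classify(r) for r in fragment)
print(counts)
```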
🩸Here you can see the surface of hemoglobin and the cavity (binding pocket) where the heme sits.
Part C. Using ML-Based Protein Design Tools
C1. Protein Language Modeling
For this part, I chose the lysozyme protein from PDB. Sequence:
b. Two patterns stand out immediately: a strong horizontal dark band for C (cysteine) across many positions, and a few bright vertical columns where almost every substitution is predicted to be beneficial. The C row reads as strongly negative: it is deep purple and darker than its surroundings. This is very likely related to lysozyme’s disulfide-forming cysteines. In addition, there is a column where many substitutions are bright yellow/green across many amino acids. That usually means one of the following: the wild-type residue is suboptimal, the position is surface-exposed, many substitutions improve stability or packing, or the model predicts energetic relief.
A mutation that stands out is Cys → Ser. Even though serine is chemically similar (small, polar), the heatmap shows it as strongly negative at those positions. That is because Ser cannot form a disulfide bond, even subtle size changes can disrupt precise geometry and disulfides in lysozyme are deeply integrated into folding topology. That means these cysteines are structurally essential, not just chemically similar residues.
2. Latent Space Analysis
This is the resulting map:
C2. Protein Folding
1. Folded protein with ESMFold:
This is the 3D structure in PDB:
Part D. Group Brainstorm on Bacteriophage Engineering
The main goal would be to computationally design mutations that increase the structural stability of the bacteriophage L lysis protein using structure prediction and stability analysis tools. It is the easiest and most common computational protein-engineering task, but very useful as well.
The second goal would be computationally analyzing conserved residues in L-like lysis proteins to identify mutations that may increase the toxicity of the protein. This method is more difficult, but quite interesting because this goal tries to make the protein kill bacteria more efficiently. The challenge is that toxicity often depends on complex cellular interactions and membrane effects, which are harder to model.
➜ Increased Stability of the Lysis (L) Protein
Tools/approaches 🧬
Protein language models, for example ESM-2, to perform in-silico mutagenesis and identify mutations that are evolutionarily compatible with the L protein sequence.
Approach: Input of the L-protein sequence into a protein language model, performing in-silico mutagenesis (substituting amino acids at different positions), scoring mutations based on their likelihood or predicted fitness and selecting mutations predicted to be tolerated or beneficial.
Structure Prediction with AlphaFold to evaluate structural effects with 3D structures
Approach: Predicting the structure of the wild-type L protein, modeling mutant variants suggested by language models or other methods and comparing structural confidence scores (pLDDT) and structural changes.
These tools help to solve the problem, because: 🔎
Protein language models allow large-scale in-silico mutagenesis, filter out mutations likely to destabilize the protein and suggest mutations that resemble evolutionarily acceptable variants.
Structure predictions help locate buried vs. exposed residues, identify functional or interaction sites and see whether mutations affect structural packing.
With energy calculations, you can quantitatively compare mutations and select variants predicted to produce a more stable protein fold.
Potential pitfalls 🚩
Bacteriophage lysis proteins are relatively poorly characterized compared with many other proteins. This means that there may be few experimentally validated structures and limited functional data for mutations. Because of this, models trained on general protein datasets (e.g., ESM-2) may not fully capture the specific biology of phage lysis mechanisms in Escherichia coli.
The activity of lysis proteins depends on the complex environment of the bacterial cell, including membranes and host proteins like DnaJ. A mutation predicted to improve stability could accidentally reduce interaction with the membrane, alter timing of lysis or disrupt important host interactions.
➜ Higher Toxicity of the Lysis Protein
Tools/approaches 🧬
Structure Prediction and Structural Analysis with AlphaFold and PyMOL
Approach: Predicting the 3D structure of the L protein with AlphaFold, visualizing the structure in PyMOL and identifying structural features, such as exposed residues, potential interaction sites and membrane-facing regions.
Analysis of Lysis Proteins with BLAST and Clustal Omega
Approach: Using BLAST to find homologous lysis proteins from other bacteriophages, aligning the sequences using Clustal Omega and identifying highly conserved residues or motifs.
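Once an alignment exists, finding conserved positions is a simple column-by-column count. The sketch below assumes the aligned sequences have already been exported from Clustal Omega as equal-length strings with `-` for gaps; the three short sequences in the example are invented for illustration and are not real lysis proteins.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(aligned: List[str]) -> List[Tuple[str, float]]:
    """For each alignment column, return (most common residue, fraction of sequences)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("All aligned sequences must have the same length")
    result = []
    for col in range(length):
        residues = [seq[col] for seq in aligned if seq[col] != "-"]
        if not residues:
            result.append(("-", 0.0))  # column contains only gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Gaps count against conservation because we divide by all sequences.
        result.append((residue, count / len(aligned)))
    return result


def conserved_positions(aligned: List[str], threshold: float = 0.9) -> List[Tuple[int, str]]:
    """Return 1-based alignment positions whose top residue meets the threshold."""
    return [(i + 1, residue)
            for i, (residue, fraction) in enumerate(column_conservation(aligned))
            if fraction >= threshold]


if __name__ == "__main__":
    toy_alignment = ["MKLF-S", "MRLFAS", "MKLYAS"]  # made-up example data
    print(conserved_positions(toy_alignment, threshold=1.0))
```

The threshold is a free choice: a strict value highlights only invariant residues, while a looser one also picks up positions that tolerate a few substitutions.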
These tools help to solve the problem, because: 🔎
The Structure Prediction and Analysis helps identify regions that could be modified to strengthen the interactions with the bacterial membrane or other proteins.
Analysing the lysis protein with these tools helps identify the positions of conserved residues. With these positions known, it is possible to locate functional regions responsible for lysis and design mutations near these sites to potentially enhance activity.
Potential pitfalls 🚩
Protein toxicity often depends on complex interactions inside the bacterial cell, such as membrane disruption or interactions with host proteins in Escherichia coli. Computational tools like AlphaFold can predict protein structures and interactions, but they cannot fully model the cellular environment.
Sometimes mutations that improve function can reduce protein stability or folding efficiency. Even if a mutation increases interaction with bacterial targets, it might cause misfolding, make the protein degrade faster or reduce expression levels.
These are the 4 generated peptides and the added peptide "FLYRWLPSRRGG" I obtained from Google Colab:
Perplexity measures how surprised the model is by a peptide sequence: lower values mean the sequence looks more natural or compatible with the protein context, and higher values mean a stranger, less likely sequence. According to PepMLM, peptide no. 1 has the best (lowest) perplexity score, 8.770207. In other words, PepMLM rates the generated peptides as more likely, better-fitting sequences for the mutant SOD1 context than the known peptide.
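To make the score concrete, here is a minimal sketch of the general perplexity formula: the exponential of the negative mean log-probability per residue. It is an assumption that PepMLM's reported value follows this general form; its exact masking procedure is not reproduced here, and the probabilities in the example are invented.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum of natural-log probabilities of each residue)."""
    if not token_log_probs:
        raise ValueError("Need at least one log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # A model that guesses uniformly among 20 amino acids is maximally "surprised".
    uniform = [math.log(1 / 20)] * 12
    # A model that gives every residue probability 0.5 is far more confident.
    confident = [math.log(0.5)] * 12
    print(perplexity(uniform), perplexity(confident))
```

Read this way, a perplexity near 20 means the model is doing no better than guessing among the 20 amino acids, so a score around 9 says the model finds the peptide more predictable than chance but far from certain.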
Part 2: Evaluate Binders with AlphaFold3
Because peptides 0 and 1 contain the placeholder "X", which stands for an unknown amino acid, I used the other peptides, which don't contain any unknown amino acids. These are the results:
🟢(1)
🟠(2)
🟣(3)
All of the peptides seem to "float" over the SOD1 protein structure. That means that these peptides are not buried within the structure, but rather surface-bound.
🟢(1) contains peptide no. 2 from the table above. Its ipTM is 0.86 and pTM is 0.9.
🟠(2) contains peptide no. 3 from the table above. Its ipTM is 0.81 and pTM is 0.86.
🟣(3) contains the known SOD1-binding peptide "FLYRWLPSRRGG". Its ipTM is 0.9 and pTM is 0.92.
Comparing all three binding peptides, the known peptide 🟣(3) gives the best results. pTM measures how confident AlphaFold3 is in the overall 3D structure of the complex, and ipTM measures its confidence in the predicted interface between the protein and peptide chains. 🟣(3) scores highest on both, which means AlphaFold3 is most confident in its predicted binding pose on the mutant SOD1. This is a confidence measure and not a direct estimate of binding strength.
In the structure from 🟣(3), the peptide (orange) binds on the surface of SOD1, near loops on one end of the β-barrel. It is close to the N-terminal region where the A4V mutation sits but does not penetrate the β-barrel core. The peptide does not appear to directly engage the dimer interface; rather, it interacts with one monomer of the SOD1 dimer.
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
These are the following predictions for each peptide on PeptiVerse:
🟢(1)
🟠(2)
🟣(3)
➜ Solubility: All of the 3 peptides are soluble.
➜ Hemolysis: The lower, the safer. The safest one is Peptide 1.
➜ Binding Affinity: The lower the value, the better/stronger predicted binding. All 3 of them show weak binding, but Peptide 1 has the best score. Although AlphaFold3 gave Peptide 3 the best ipTM score, this discrepancy is expected, because the tools measure different things. AlphaFold3 measures the structural confidence of the interaction and PeptiVerse measures the predicted binding energy.
➜ Net Charge: Positive charge helps interaction with proteins and membranes. Peptide 1 scores the best, followed by Peptide 3.
➜ Hydrophobicity: Moderate values are usually ideal, so Peptide 3 has the best value, followed by Peptide 1.
To conclude, Peptide 🟢(1) has the best overall balance. It has the strongest predicted binding, the lowest hemolysis (safest), good positive charge and good hydrophilicity and solubility.
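Net charge and hydrophobicity can also be estimated by hand as a sanity check. The sketch below is a simplified approximation and not PeptiVerse's method: the charge only counts K/R as +1 and D/E as -1 (ignoring histidine, the termini and pKa effects), and hydrophobicity is the mean Kyte-Doolittle hydropathy (GRAVY), where more positive means more hydrophobic.

```python
# Kyte-Doolittle hydropathy scale (more positive = more hydrophobic)
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(peptide: str) -> int:
    """Rough integer charge near neutral pH: +1 per K/R, -1 per D/E."""
    positive = sum(peptide.count(aa) for aa in "KR")
    negative = sum(peptide.count(aa) for aa in "DE")
    return positive - negative


def gravy(peptide: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues of the peptide."""
    return sum(KD_SCALE[aa] for aa in peptide) / len(peptide)


if __name__ == "__main__":
    known_binder = "FLYRWLPSRRGG"  # known SOD1-binding peptide from the text
    print(net_charge(known_binder), round(gravy(known_binder), 2))
```

Values from a dedicated tool will differ somewhat, but the ranking of peptides by charge or hydropathy should usually point in the same direction.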
Part 4: Generate Optimized Peptides with moPPIt
moPPIt generated the following binding peptides:
KTFAQFKKIFLQ
PQKEITRCQFFE
VTYCAYYWVTCV
Part C: Final Project: L-Protein Mutants
I chose Option 3: Random Mutagenesis.
I used ChatGPT to help me write a Python function that generates random mutation combinations. It generated the following mutations:
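The original function is not reproduced here. As a rough idea of what such a helper could look like, the sketch below picks a few distinct positions at random and swaps each residue for a different amino acid. The function name, the placeholder sequence and the fixed seed are all assumptions made for illustration; the real L protein sequence and the actual generated mutations are not shown.

```python
import random
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_mutant(seq: str, n_mutations: int, rng: random.Random) -> Tuple[List[str], str]:
    """Return (mutation labels, mutant sequence) with n_mutations random substitutions."""
    if n_mutations > len(seq):
        raise ValueError("Cannot mutate more positions than the sequence has")
    # Choose distinct positions so no residue is mutated twice.
    positions = sorted(rng.sample(range(len(seq)), n_mutations))
    residues = list(seq)
    labels = []
    for pos in positions:
        wild_type = residues[pos]
        new_residue = rng.choice([aa for aa in AMINO_ACIDS if aa != wild_type])
        residues[pos] = new_residue
        labels.append(f"{wild_type}{pos + 1}{new_residue}")  # 1-based label
    return labels, "".join(residues)


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed so the run is reproducible
    placeholder_seq = "MKTAYIAK"     # placeholder only, NOT the L protein
    for _ in range(3):
        labels, mutant = random_mutant(placeholder_seq, 2, rng)
        print("+".join(labels), mutant)
```

Passing the random generator in explicitly makes it easy to regenerate exactly the same set of mutants later, which helps when the variants are ordered or tested in the lab.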
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Some components in the Phusion High-Fidelity PCR Master Mix are:
➜ Phusion DNA Polymerase - This enzyme copies the DNA with very high accuracy.
➜ dNTPs (deoxynucleotide triphosphates) - These are the building blocks that the polymerase uses to synthesize new DNA.
➜ Reaction Buffer - This keeps the pH and salt conditions so the enzyme works properly.
➜ Mg²⁺ ions (Magnesium ions) - Magnesium is required for the DNA polymerase to function during DNA synthesis.
➜ Stabilizers and additives - These help keep the enzyme stable and improve the efficiency of the PCR reaction.
What are some factors that determine primer annealing temperature during PCR?
➜ One factor that determines primer annealing temperature during PCR is primer length: longer primers usually have a higher annealing temperature because they bind more strongly to the DNA.
➜ Furthermore, the GC content of the primer is important because primers with more G and C bases have a higher annealing temperature since G–C pairs form stronger bonds than A–T pairs.
➜ Another factor is the primer sequence, because the exact order of bases can affect how strongly the primer binds to the DNA template.
➜ Salt and magnesium concentration is another important factor because higher concentrations can stabilize primer binding and influence the optimal annealing temperature.
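The effect of length and GC content can be illustrated with the Wallace rule, a rough estimate of melting temperature for short primers: Tm ≈ 2 °C per A/T plus 4 °C per G/C. This is a simplified sketch: it ignores salt and magnesium concentration and nearest-neighbour sequence effects, the example primer is invented, and the annealing temperature for a real reaction is best taken from the polymerase manufacturer's calculator.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace rule estimate of Tm in degrees Celsius: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    example_primer = "ATGCATGCATGCATGCAT"  # made-up 18-mer for illustration
    print(f"GC content: {gc_content(example_primer):.0%}")
    print(f"Estimated Tm: {wallace_tm(example_primer)} C")
```

Both trends from the list show up directly in the formula: adding bases raises the estimate, and swapping an A/T for a G/C raises it by 2 °C.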
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
Restriction Enzyme Digest
This method uses restriction enzymes to cut DNA at specific sequences. It is commonly used to analyze DNA fragments or to prepare DNA for cloning.
PCR (Polymerase Chain Reaction)
PCR is used to amplify a specific DNA sequence using DNA polymerase and primers.
The experiment first uses PCR to amplify DNA fragments and introduce mutations into the amilCP gene. After PCR, the samples are treated with the DpnI restriction enzyme, which digests the original methylated plasmid template so that only the newly amplified PCR DNA remains. The PCR fragments are then combined using Gibson Assembly and transformed into E. coli. PCR is the method of choice when you want to make many copies of a DNA region or change its sequence, whereas a restriction digest is preferable when you need to cut existing DNA at specific recognition sites.
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
To ensure the DNA fragments work for Gibson cloning, you need to design them carefully so they can join together correctly. The DNA fragments must have matching overlapping ends (usually ~20–40 base pairs). These overlaps allow the fragments to stick together during Gibson Assembly. Furthermore, when doing PCR, primers should be designed to add these overlapping regions to the ends of the DNA fragments and you also need to make sure the overlaps match the correct neighboring fragment so everything assembles in the right order. After PCR or digestion, the DNA should be pure and free of contaminants so the assembly reaction works efficiently. Unlike restriction cloning, you don’t need specific restriction sites, but the ends must be designed to be complementary.
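The overlap requirement can be checked in software before ordering primers. The sketch below assumes every fragment is written 5′→3′ on the same strand and in the intended assembly order; the function names and the toy sequences are made up for illustration, and the example uses a very short minimum overlap only to keep the sequences readable.

```python
from typing import List


def overlap_length(upstream: str, downstream: str, max_len: int = 60) -> int:
    """Length of the longest suffix of `upstream` that equals a prefix of `downstream`."""
    limit = min(len(upstream), len(downstream), max_len)
    for n in range(limit, 0, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


def check_assembly(fragments: List[str], min_overlap: int = 20,
                   circular: bool = True) -> List[bool]:
    """For each junction, report whether neighbours share at least min_overlap bases."""
    pairs = list(zip(fragments, fragments[1:]))
    if circular:
        # A plasmid closes on itself: the last fragment must overlap the first.
        pairs.append((fragments[-1], fragments[0]))
    return [overlap_length(a, b) >= min_overlap for a, b in pairs]


if __name__ == "__main__":
    toy_fragments = ["AAACCC", "CCCGGGAAA"]  # toy data with 3-base overlaps
    print(check_assembly(toy_fragments, min_overlap=3))
    print(check_assembly(toy_fragments, min_overlap=20))
```

A junction that reports `False` at the intended minimum usually means a primer tail is too short or the fragments are listed in the wrong order.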
How does the plasmid DNA enter the E. coli cells during transformation?
Plasmid DNA enters E. coli cells during transformation by making the cells temporarily permeable so the DNA can pass through the membrane. A common method is heat shock transformation, where the cells are first treated with calcium chloride to make their membranes more permeable. Then a sudden increase in temperature (heat shock) creates a temporary opening, allowing the plasmid DNA to enter the cells. Another method is electroporation, where a short electrical pulse is applied to the cells, which creates tiny pores in the membrane. The plasmid DNA can then pass through these pores into the cell.
Describe another assembly method in detail (such as Golden Gate Assembly).
I found another assembly method on the website Addgene.org: SLIC (Sequence and Ligation Independent Cloning). Like Gibson Assembly, SLIC joins DNA fragments using short homologous (matching) sequences, but it needs fewer enzymes. The fragments are given matching single-stranded overhangs that anneal, and the final DNA is repaired inside the bacteria.
How does SLIC work?
1. Each DNA fragment (for example, your insert and plasmid) is designed so that their ends share 15–25 base pairs of identical sequence. These overlaps are essential because they will allow the fragments to recognize and bind to each other.
2. The DNA is treated with an enzyme such as T4 DNA polymerase. This enzyme “chews back” the ends of the DNA, removing nucleotides from the 3′ ends and creating single-stranded overhangs.
3. When the treated DNA fragments are mixed together, the complementary single-stranded overhangs base-pair (anneal) with each other. This brings the fragments together in the correct order based on their matching sequences.
4. At this stage, the DNA fragments are joined, but there may still be missing bonds in the backbone. So the DNA is not fully complete yet.
5. The partially assembled DNA is introduced into E. coli cells.
6. The bacteria’s natural DNA repair systems fill in missing nucleotides and seal the nicks by using ligase enzymes. This results in a fully intact plasmid.
SLIC Method Diagram
DNA Fragments with Overlaps → Exonuclease Treatment (creates overhangs) → Annealing (fragments stick together) → Transformation into E. coli → DNA Repair in Cell
Week 7 HW: Genetic Circuits Part II
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Traditional genetic circuits
😐 Traditional genetic circuits typically implement logic like AND, OR, NOT — meaning outputs are binary (on/off).
😐 Boolean circuits are limited to combinations of discrete logic rules.
😐 Traditional genetic circuits are hard-coded.
😐 Biological systems are inherently noisy (stochastic gene expression). Boolean circuits can fail if signals fluctuate around thresholds.
IANNs
😊 Neural networks operate with continuous values, not just 0 or 1. This allows cells to respond proportionally to input concentrations and encode gradients and subtle differences in signals.
😊 IANNs can map complex environmental signals and multi-factor biological states.
😊 IANNs can be trained and adapted to new conditions.
😊 Neural networks distribute computation across many nodes and use weighted sums → more tolerant to noise.
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
A useful application would be a smart therapeutic cell that detects and responds to cancer. It's a living classifier that decides whether to trigger a treatment based on a complex molecular signature. The goal is to engineer a cell that detects whether it is in a tumor microenvironment and activates a therapeutic response only when high confidence is reached.
Input (each input corresponds to a measurable molecular feature):
The IANN processes inputs using weighted gene regulation and combinatorial control. Biologically, each “neuron” is a gene whose expression depends on a weighted sum of inputs. Activation functions are implemented via cooperative binding, and thresholding via repression/activation dynamics. The network therefore learns a nonlinear decision boundary, which allows detection of patterns like:
“Moderate hypoxia + high lactate + mild inflammation = tumor”,
even if no single signal is decisive.
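A toy numerical model shows how a weighted sum plus a sigmoidal activation can produce this behavior. Everything in the sketch is hypothetical: inputs are normalized to a 0 to 1 range, all weights and thresholds are arbitrary illustrative numbers, and a Hill function stands in for cooperative binding. It is not a model of any real circuit.

```python
from typing import Sequence


def hill(x: float, k: float = 1.0, n: float = 2.0) -> float:
    """Hill activation: sigmoidal response standing in for cooperative binding."""
    x = max(x, 0.0)  # expression cannot be driven below zero
    return x ** n / (k ** n + x ** n)


def neuron(inputs: Sequence[float], weights: Sequence[float],
           k: float = 1.0, n: float = 4.0) -> float:
    """One 'gene neuron': weighted sum of inputs passed through a Hill function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return hill(total, k, n)


def tumor_classifier(hypoxia: float, lactate: float, inflammation: float) -> float:
    """Two hidden nodes feed one output node; weights are arbitrary examples."""
    h1 = neuron([hypoxia, lactate], [1.0, 1.0])
    h2 = neuron([lactate, inflammation], [1.0, 1.0])
    return neuron([h1, h2], [1.0, 1.0], k=1.2)


if __name__ == "__main__":
    cases = {
        "moderate hypoxia + high lactate + mild inflammation": (0.5, 0.9, 0.3),
        "high lactate only": (0.0, 0.9, 0.0),
        "moderate hypoxia only": (0.5, 0.0, 0.0),
    }
    for name, signals in cases.items():
        output = tumor_classifier(*signals)
        print(f"{name}: output={output:.2f}, therapy={'ON' if output > 0.5 else 'OFF'}")
```

With these made-up parameters the combined signature crosses the 0.5 decision threshold while each single signal stays below it, which is the kind of behavior a single Boolean gate with fixed thresholds cannot express as gracefully.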
Output:
➡️ Expression of a therapeutic protein (e.g., cytokine, toxin, checkpoint inhibitor)
Limitations:
➡️ Network outputs may drift or become inconsistent
➡️ It is difficult to “train” the network accurately. In electronics, weights are precise numbers; in cells, weights are promoter strengths, binding affinities and degradation rates, which are hard to tune precisely and are sensitive to context
➡️ Large circuits consume energy and resources. This leads to slower growth and evolutionary pressure to disable the circuit
Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.