Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
Step 1. First, describe a biological engineering application or tool you want to develop and why. I am passionate about reading, as well as Biology, so I came up with an idea that could mix both and be sustainable and enjoyable for us bookworms. My idea that I would like to put into practice is a reading light powered by bioluminescent algae, combined with microbial fuel cells for supplemental energy. Instead of relying solely on electricity, the lamp is partially powered by living systems, creating a sustainable, educational, and interactive device.
Week 2 HW: DNA read, write and edit
Part 1: Benchling & In-silico Gel Art Here is a simulation with the Restriction Enzyme Digestion on Benchling.com: Part 3: DNA Design Challenge 3.1. Choose your protein. I chose the Green Fluorescent Protein (GFP) because it naturally glows green when exposed to UV light. This revolutionized cell biology by allowing scientists to see proteins inside living cells and it won the 2008 Nobel Prize in Chemistry. This protein has been isolated from the jellyfish Aequorea victoria and forms a beta-barrel structure (like a protective can). Inside the barrel is the chromophore — the light-producing part.
Week 3 HW: Lab Automation
Opentrons Artwork Post-Lab Questions I found a published paper which describes how researchers leveraged the Opentrons OT-2 automated liquid handler to develop an automated, high-throughput proxy viscometer. The robot was programmed to dispense liquids of various viscosities and collect data for machine-learning models to predict viscosity, demonstrating a practical application of the OT-2 in fluid characterization workflows — requiring minimal human intervention while significantly increasing throughput. If I would go for my first project idea, which is the Bacterial Microplastic Sensor, there would be the following automation tools that I could apply: Automated Fluorescence Detection System The goal is to automatically quantify GFP output in response to PET degradation products. I can automate the timed fluorescence measurement (every 10 min), background subtraction, data logging (CSV), real-time plotting and threshold alert. Automated Incubation + Sampling There can be a shaking plattform automation, with 3D tube rack and/or temperature control with heat pad, temperature sensor and automated regulation. A pseudocode example can be:
Week 4 HW: Protein Design Part I
Part A. Conceptual Questions Why do humans eat beef but do not become a cow, eat fish but do not become fish? When you eat beef or fish, your body does not keep the meat intact and turn it into “cow tissue” or “fish tissue.” Instead, your digestive system breaks everything down into basic molecules, like proteins into amino acids, fats into fatty acids + glycerol, carbohydrates into simple sugars and DNA into nucleotides.
Week 5 HW: Protein Design Part II
Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM The human SOD1 sequence without the mutation: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ The human SOD1 sequence with the A4V mutation: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ These are the 4 generated peptides and the added peptide "FLYRWLPSRRGG" I obtained from Google Colab:
Week 6 HW: Genetic Circuits Part I
Assignment: DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Some components in the Phusion High-Fidelity PCR Master Mix are: ➜ Phusion DNA Polymerase - This enzyme copies the DNA with very high accuracy. ➜ dNTPs (deoxynucleotide triphosphates) - These are the building blocks that the polymerase uses to synthesize new DNA.
Week 7 HW: Genetic Circuits Part II
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Traditional genetic circuits 😐 Traditional genetic circuits typically implement logic like AND, OR, NOT — meaning outputs are binary (on/off). 😐 Boolean circuits are limited to combinations of discrete logic rules.

Week 1 HW: Principles and Practices

Step 1. First, describe a biological engineering application or tool you want to develop and why.

I am passionate about reading, as well as Biology, so I came up with an idea that could mix both and be sustainable and enjoyable for us bookworms.

My idea that I would like to put into practice is a reading light powered by bioluminescent algae, combined with microbial fuel cells for supplemental energy. Instead of relying solely on electricity, the lamp is partially powered by living systems, creating a sustainable, educational, and interactive device.

Bioluminescent organisms (e.g., the alga Pyrocystis lunula or engineered E. coli) produce a gentle glow and provide a natural light source. Note that Pyrocystis lunula flashes mainly at night and when the water is agitated, so the lamp design has to account for that. This can lead to sustainable outcomes, such as reducing electricity consumption, using renewable biological processes and biodegradable components that minimize waste.

My additional goals are to enhance the aesthetic experience of reading and to explore practical, safe uses of bioluminescence beyond the laboratory.🦠🌱

This image is AI generated

My first big goal is that this lamp should, of course, not harm any humans, animals or the environment.

Sub-goal 1A: Containment of living organisms
- Using non-pathogenic, lab-safe algae or bacteria.
- Ensuring that the lamp is a closed system to prevent accidental release into homes or the environment.
- Including fail-safes so organisms cannot survive outside the lamp (e.g., nutrient-dependent survival).

Sub-goal 1B: Safe user interaction
- Developing clear usage guidelines, labeling, and instructions.
- Preventing accidental ingestion, skin reactions, or allergic responses.
- Educating users on proper disposal of nutrients and lamp components.

My second big goal is that the lamp should not negatively impact ecosystems or contribute to waste.

Sub-goal 2A: Biodegradable materials
- The use of compostable biomaterials for the lamp casing and cartridges.
- Reducing reliance on plastics or non-renewable resources.

Sub-goal 2B: Minimal ecological footprint
- Designing the lamp to consume minimal electricity and nutrients.
- Ensuring any waste products from the lamp (e.g., spent algae or nutrient capsules) are safe and compostable.

Step 3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

Governance Action 1: Implement a Biosafety Certification Program

Purpose: Ensure that all SymbioLight lamps meet strict biosafety standards to prevent accidental release of organisms or harm to users.
Design: Require all lamps to be tested in labs for non-pathogenicity and containment integrity.
Assumptions: Users may handle the lamp incorrectly or dispose of it improperly.
Risks of Failure: Contaminated or unsafe lamps reaching consumers.
Success: Safe adoption of living lamps in households.

Governance Action 2: Biodegradable, Low-Impact Materials Policy

Purpose: Ensure the lamp’s components do not harm the environment when discarded, aligning with sustainability goals.
Design: Mandate mycelium, algae-based plastics, or bacterial cellulose for the lamp casing and nutrient cartridges. Also require testing for complete compostability and low environmental toxicity.
Assumptions: Users will dispose of lamps in composting or bio-waste systems.
Risks of Failure: Non-compostable waste entering landfills or water systems.
Success: Increased public trust in synthetic biology products.

Governance Action 3: Mandatory User Education & Ethical Guidance

Purpose: Promote safe, responsible, and informed use of SymbioLight, and foster public understanding of living systems.
Design: Include educational manuals and labels explaining the biology, safety protocols, and proper disposal.
Assumptions: Users may be unfamiliar with living systems and mishandle them without guidance.
Risks of Failure: Misuse or neglect of living organisms leading to lamp failure or ecological impact.
Success: Informed users who safely interact with living lamps.

Step 4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	3	3	3
• By helping respond	2	2	2
Foster Lab Safety
• By preventing incidents	2	3	2
• By helping respond	2	2	2
Protect the environment
• By preventing incidents	2	3	2
• By helping respond	1	1	2
Other considerations
• Minimizing costs and burdens to stakeholders	2	2	2
• Feasibility?	2	3	2
• Not impede research	2	2	2
• Promote constructive applications	3	2	2
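
As a quick check on the table above, the scores can be added up per option. This is only a sketch: it assumes every row carries the same weight, which the rubric itself does not state, and the names `scores` and `total_scores` are made up for the illustration.

```python
# Scores copied from the table above (1 = best, 3 = worst), one list per option,
# in row order: biosecurity (prevent, respond), lab safety (prevent, respond),
# environment (prevent, respond), costs, feasibility, research, applications.
scores = {
    "Option 1": [3, 2, 2, 2, 2, 1, 2, 2, 2, 3],
    "Option 2": [3, 2, 3, 2, 3, 1, 2, 3, 2, 2],
    "Option 3": [3, 2, 2, 2, 2, 2, 2, 2, 2, 2],
}


def total_scores(table):
    """Sum each option's scores; with 1 as best, a lower total is better.

    Assumption: all rows are weighted equally.
    """
    return {option: sum(values) for option, values in table.items()}


totals = total_scores(scores)
for option, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{option}: {total}")
```

With equal weights, Options 1 and 3 both total 21 and Option 2 totals 23, so the unweighted sum slightly favors Options 1 and 3.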

Step 5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

After considering the three proposed governance actions, namely (1) biosafety certification, (2) sustainable materials policy, and (3) user education requirements, I would prioritize a combination of Options 1 and 3, with Option 2 as a secondary but important long-term goal.

The most critical ethical responsibility of SymbioLight is to ensure non-maleficence, meaning that the product cannot cause harm to people or the environment. Because the lamp involves living organisms, even if they are benign, the greatest potential risks are accidental release of engineered algae or bacteria, contamination of local ecosystems and unintended health effects in homes. Without strong biosafety guarantees, the entire concept could become unethical regardless of how sustainable or educational it is. This is why Option 1 comes first.

I would prioritize Option 3 as well because even a perfectly engineered product can cause problems if misused. Therefore, governance must include the human element. SymbioLight users need to understand how to care for living organisms, how to dispose of materials properly and the limits of what the lamp can safely do.

Lastly, Option 2 is also important but does not come first: sustainability is a core motivation for SymbioLight, but it does not address immediate safety risks. Using biodegradable materials is ethically desirable, yet a lamp made from non-ideal materials is still less harmful than a lamp that releases unsafe organisms.

Week 2 HW: DNA read, write and edit

Part 1: Benchling & In-silico Gel Art

Here is a simulation with the Restriction Enzyme Digestion on Benchling.com:

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose the Green Fluorescent Protein (GFP) because it naturally glows green when exposed to UV or blue light. It revolutionized cell biology by allowing scientists to see proteins inside living cells, and its discovery and development were recognized with the 2008 Nobel Prize in Chemistry. The protein was isolated from the jellyfish Aequorea victoria and forms a beta-barrel structure (like a protective can). Inside the barrel is the chromophore, the light-producing part.

The GFP visible in Aequorea victoria. Source: https://www.universityofcalifornia.edu/news/how-glow-dark-jellyfish-inspired-scientific-revolution

Sequence from https://www.uniprot.org/uniprotkb/P42212/entry:

>sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 GN=GFP PE=1 SV=1
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

GFP DNA Sequence from https://www.bioinformatics.org/sms2/rev_trans.html:

atgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtggaactggatggcgatgtgaacggccataaatttagcgtgagcggcgaaggcgaaggcgatgcgacctatggcaaactgaccctgaaatttatttgcaccaccggcaaactgccggtgccgtggccgaccctggtgaccacctttagctatggcgtgcagtgctttagccgctatccggatcatatgaaacagcatgatttttttaaaagcgcgatgccggaaggctatgtgcaggaacgcaccattttttttaaagatgatggcaactataaaacccgcgcggaagtgaaatttgaaggcgataccctggtgaaccgcattgaactgaaaggcattgattttaaagaagatggcaacattctgggccataaactggaatataactataacagccataacgtgtatattatggcggataaacagaaaaacggcattaaagtgaactttaaaattcgccataacattgaagatggcagcgtgcagctggcggatcattatcagcagaacaccccgattggcgatggcccggtgctgctgccggataaccattatctgagcacccagagcgcgctgagcaaagatccgaacgaaaaacgcgatcatatggtgctgctggaatttgtgaccgcggcgggcattacccatggcatggatgaactgtataaa

3.3. Codon optimization.

GFP DNA Sequence with Codon-Optimization from https://punnettsquare.org/codon-optimizer/:

ATGTCTAAAGGCGAGGAACTGTTCACCGGCGTTGTTCCGATTTTAGTGGAACTGGATGGCGATGTGAACGGCCACAAATTTAGCGTGTCTGGCGAGGGTGAGGGCGACGCAACTTACGGTAAACTGACCCTGAAGTTCATTTGTACTACCGGTAAACTGCCAGTGCCATGGCCAACCCTGGTGACCACCTTTTCTTATGGCGTGCAGTGTTTCTCTCGTTATCCGGATCATATGAAACAGCACGACTTCTTCAAAAGCGCGATGCCAGAAGGCTATGTGCAGGAGCGTACCATTTTTTTTAAAGATGACGGTAACTATAAAACCCGTGCAGAAGTTAAATTCGAAGGCGATACCCTGGTGAACCGTATTGAACTGAAAGGTATTGACTTCAAGGAAGATGGCAACATTTTAGGTCACAAATTAGAATATAATTATAACTCTCACAACGTTTACATTATGGCAGATAAACAGAAGAACGGTATCAAAGTTAACTTCAAGATCCGCCATAATATTGAGGATGGTAGCGTGCAATTAGCGGATCATTACCAGCAAAATACCCCGATTGGCGATGGCCCGGTGCTGCTGCCAGATAACCATTACTTAAGCACCCAAAGCGCGTTAAGCAAGGATCCAAATGAAAAACGTGACCATATGGTGTTACTGGAGTTTGTGACCGCAGCGGGCATTACCCACGGTATGGATGAACTGTATAAA

3.4 Technologies for producing the Green Fluorescent Protein.

To produce the Green Fluorescent Protein (GFP), a gene encoding GFP is necessary. The expression construct contains a promoter, the coding sequence and a terminator.

A: Cell-Dependent Method.

The GFP gene is inserted into a plasmid with a bacterial promoter. The bacteria (E. coli) transcribe and translate the gene, and the protein accumulates in the cytoplasm. It can then be purified using chromatography. This method is cheap and fast.

Another cell-dependent method is to introduce the GFP plasmid into yeast or mammalian cells via transfection. The cells then express GFP for microscopy studies or protein assays. This is useful if GFP needs to fold properly in a eukaryotic environment.

DNA ➜ mRNA ➜ Protein ➜ Fluorescence

B: Cell-Free Method.

The cell-free method uses cell-free protein synthesis systems, which contain ribosomes, tRNAs, amino acids, nucleotides, ATP, GTP, and transcription/translation enzymes. The DNA template for GFP is then added to the mixture. The protein is produced without living cells, often within a few hours. This method is rapid, avoids the toxic effects of protein expression on cells and makes it easy to add modifications.

3.5. [Optional] How does it work in nature/biological systems?

A single gene can produce multiple different proteins at the transcriptional level mainly through mechanisms that modify the RNA transcript before it becomes translated. This greatly expands protein diversity without increasing gene number. The key mechanisms are Alternative Splicing, Alternative Promoters, Alternative Polyadenylation and RNA Editing. The most crucial method is alternative splicing. During transcription, a gene is copied into pre-mRNA containing exons and introns. Before translation, introns are removed and exons are joined. In alternative splicing, different combinations of exons are included or excluded. Types of alternative splicing are exon skipping, alternative 5′ splice site, alternative 3′ splice site, intron retention and mutually exclusive exons.
I used the site https://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm to convert sequences from DNA to RNA to protein:

DNA sequence:

ATGTCTAAAGGCGAGGAACTGTTCACCGGCGTTGTTCCGATTTTAGTGGAACTGGATGGCGATGTGAACGGCCACAAATTTAGCGTGTCTGGCGAGGGTGAGGGCGACGCAACTTACGGTAAACTGACCCTGAAGTTCATTTGTACTACCGGTAAACTGCCAGTGCCATGGCCAACCCTGGTGACCACCTTTTCTTATGGCGTGCAGTGTTTCTCTCGTTATCCGGATCATATGAAACAGCACGACTTCTTCAAAAGCGCGATGCCAGAAGGCTATGTGCAGGAGCGTACCATTTTTTTTAAAGATGACGGTAACTATAAAACCCGTGCAGAAGTTAAATTCGAAGGCGATACCCTGGTGAACCGTATTGAACTGAAAGGTATTGACTTCAAGGAAGATGGCAACATTTTAGGTCACAAATTAGAATATAATTATAACTCTCACAACGTTTACATTATGGCAGATAAACAGAAGAACGGTATCAAAGTTAACTTCAAGATCCGCCATAATATTGAGGATGGTAGCGTGCAATTAGCGGATCATTACCAGCAAAATACCCCGATTGGCGATGGCCCGGTGCTGCTGCCAGATAACCATTACTTAAGCACCCAAAGCGCGTTAAGCAAGGATCCAAATGAAAAACGTGACCATATGGTGTTACTGGAGTTTGTGACCGCAGCGGGCATTACCCACGGTATGGATGAACTGTATAAA

RNA sequence:

AUGUCUAAAGGCGAGGAACUGUUCACCGGCGUUGUUCCGAUUUUAGUGGAACUGGAUGGCGAUGUGAACGGCCACAAAUUUAGCGUGUCUGGCGAGGGUGAGGGCGACGCAACUUACGGUAAACUGACCCUGAAGUUCAUUUGUACUACCGGUAAACUGCCAGUGCCAUGGCCAACCCUGGUGACCACCUUUUCUUAUGGCGUGCAGUGUUUCUCUCGUUAUCCGGAUCAUAUGAAACAGCACGACUUCUUCAAAAGCGCGAUGCCAGAAGGCUAUGUGCAGGAGCGUACCAUUUUUUUUAAAGAUGACGGUAACUAUAAAACCCGUGCAGAAGUUAAAUUCGAAGGCGAUACCCUGGUGAACCGUAUUGAACUGAAAGGUAUUGACUUCAAGGAAGAUGGCAACAUUUUAGGUCACAAAUUAGAAUAUAAUUAUAACUCUCACAACGUUUACAUUAUGGCAGAUAAACAGAAGAACGGUAUCAAAGUUAACUUCAAGAUCCGCCAUAAUAUUGAGGAUGGUAGCGUGCAAUUAGCGGAUCAUUACCAGCAAAAUACCCCGAUUGGCGAUGGCCCGGUGCUGCUGCCAGAUAACCAUUACUUAAGCACCCAAAGCGCGUUAAGCAAGGAUCCAAAUGAAAAACGUGACCAUAUGGUGUUACUGGAGUUUGUGACCGCAGCGGGCAUUACCCACGGUAUGGAUGAACUGUAUAAA

Protein sequence:

MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
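
The same DNA ➜ RNA ➜ protein conversion can be sketched in a few lines of Python. This is a minimal illustration, not the method the website uses: it assumes the standard genetic code, treats the input as the coding strand (so transcription is just T ➜ U), and the function names `transcribe` and `translate` are my own.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate coding-strand DNA codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


prefix = "ATGTCTAAAGGCGAGGAACTG"  # first 21 nt of the DNA sequence above
print(transcribe(prefix))  # AUGUCUAAAGGCGAGGAACUG
print(translate(prefix))   # MSKGEEL
```

Running `translate` on the full reverse-translated and codon-optimized sequences is a simple way to confirm that both still encode the same protein.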

Part 4: Prepare a Twist DNA Synthesis Order

Promoter:

TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS:

CATTAAAGAGGAGAAAGGTACC

Start codon:

ATG

Coding sequence:

ATGTCTAAAGGCGAGGAACTGTTCACCGGCGTTGTTCCGATTTTAGTGGAACTGGATGGCGATGTGAACGGCCACAAATTTAGCGTGTCTGGCGAGGGTGAGGGCGACGCAACTTACGGTAAACTGACCCTGAAGTTCATTTGTACTACCGGTAAACTGCCAGTGCCATGGCCAACCCTGGTGACCACCTTTTCTTATGGCGTGCAGTGTTTCTCTCGTTATCCGGATCATATGAAACAGCACGACTTCTTCAAAAGCGCGATGCCAGAAGGCTATGTGCAGGAGCGTACCATTTTTTTTAAAGATGACGGTAACTATAAAACCCGTGCAGAAGTTAAATTCGAAGGCGATACCCTGGTGAACCGTATTGAACTGAAAGGTATTGACTTCAAGGAAGATGGCAACATTTTAGGTCACAAATTAGAATATAATTATAACTCTCACAACGTTTACATTATGGCAGATAAACAGAAGAACGGTATCAAAGTTAACTTCAAGATCCGCCATAATATTGAGGATGGTAGCGTGCAATTAGCGGATCATTACCAGCAAAATACCCCGATTGGCGATGGCCCGGTGCTGCTGCCAGATAACCATTACTTAAGCACCCAAAGCGCGTTAAGCAAGGATCCAAATGAAAAACGTGACCATATGGTGTTACTGGAGTTTGTGACCGCAGCGGGCATTACCCACGGTATGGATGAACTGTATAAA

7×His tag:

CATCACCATCACCATCATCAC

Stop codon:

TAA

Terminator:

CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
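
Before ordering, it can help to check that the translated part of the construct (start codon + coding sequence + His tag + stop codon) stays in frame. The sketch below is only an illustration: `build_orf` and `check_orf` are hypothetical helper names, and `CDS_PREFIX` is a truncated stand-in (the first 21 nt of the coding sequence above) used to keep the example short.

```python
START = "ATG"
HIS_TAG = "CATCACCATCACCATCATCAC"     # 7 His codons, from the section above
STOP = "TAA"
CDS_PREFIX = "ATGTCTAAAGGCGAGGAACTG"  # truncated stand-in for the full CDS
STOP_CODONS = ("TAA", "TAG", "TGA")


def build_orf(cds, start=START, tag=HIS_TAG, stop=STOP):
    """Concatenate the translated parts in the order listed above."""
    return start + cds + tag + stop


def check_orf(orf):
    """Report basic frame checks for an open reading frame."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    leading_met = 0
    for codon in codons:
        if codon != "ATG":
            break
        leading_met += 1
    return {
        "in_frame": len(orf) % 3 == 0,
        "ends_with_stop": codons[-1] in STOP_CODONS,
        "internal_stops": [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS],
        "leading_met_codons": leading_met,
    }


report = check_orf(build_orf(CDS_PREFIX))
print(report)
```

One thing the check makes visible: the coding sequence listed above already begins with ATG, so together with the separate start codon the ORF starts with two methionine codons. Whether to keep both is a design decision for the order.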

This is the linear map:

The Plasmid:

Part 5: DNA Read/Write/Edit

5.1 DNA read

(i) I'd choose to sequence the gene TP53.

What is TP53?

➜ The gene TP53 provides instructions for making a protein called tumor protein p53 (or p53).

➜ This protein acts as a tumor suppressor, which means that it regulates cell division by keeping cells from growing and dividing (proliferating) too fast or in an uncontrolled way.

➜ It is often called the guardian of the genome because it prevents cells with damaged DNA from becoming cancerous.

➜ Thus, the TP53 gene is arguably the most important gene in cancer biology.

The p53 protein is located in the nucleus of cells throughout the body, where it binds directly to DNA. When the DNA in a cell becomes damaged by agents such as toxic chemicals, radiation, or UV rays from sunlight, this protein plays a critical role in determining whether the DNA will be repaired or the damaged cell will self-destruct. If the DNA can be repaired, p53 activates other genes to fix the damage. If the DNA cannot be repaired, this protein prevents the cell from dividing and signals it to undergo self-destruction. By stopping cells with mutated or damaged DNA from dividing, p53 helps prevent the development of tumors.

TP53: A central mediator of stress responses. Source: https://p53.fr/images/image_info/TP53_Knowledge/TP53_Pathtway_2.png

(ii) For sequencing, I would choose Illumina sequencing (sequencing by synthesis).

Generation	It is second generation because it sequences millions of DNA fragments at the same time (massively parallel sequencing), requires PCR amplification first (cluster generation on a flow cell) and it produces short reads (usually 75-300 base pairs).
Input & Preparation	The biological input would be human genomic DNA extracted from blood, tissue or cultured cells; the molecular input is double-stranded DNA fragments. How to prepare the input: DNA extraction ➜ Fragmentation ➜ End Repair and A-Tailing ➜ Adapter Ligation ➜ PCR Amplification ➜ Library Quantification
How it reads the DNA	Illumina uses fluorescent reversible terminator nucleotides. One labeled nucleotide (A, T, C or G) is incorporated, the terminator prevents further extension, a laser excites the fluorophore and a camera detects the emitted color. This is how the base is identified, since each color corresponds to one base. The terminator and dye are then removed and the cycle repeats for the next base.
Output	Illumina produces raw data files (BCL files) that are converted into FASTQ files. These contain the sequence reads and their quality scores.
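
To make the output format concrete, here is a minimal sketch of reading a FASTQ record and decoding its quality string. It assumes the common four-line record layout and Phred+33 encoding; the read in the example is made up, and `parse_fastq`, `phred_scores` and `error_probability` are illustrative names rather than part of any Illumina tool.

```python
def parse_fastq(text):
    """Parse FASTQ text with four lines per record: header, bases, '+', qualities."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, _plus, quality = lines[i:i + 4]
        records.append({
            "id": header.lstrip("@"),
            "sequence": sequence,
            "quality": quality,
        })
    return records


def phred_scores(quality, offset=33):
    """Convert a quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(char) - offset for char in quality]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


example = "@read1\nGATTACA\n+\nIIII#II\n"  # made-up read for illustration
records = parse_fastq(example)
scores = phred_scores(records[0]["quality"])
print(records[0]["sequence"], scores)
```

A Phred score of 40 corresponds to an error probability of 1 in 10,000, while a score of 2 means the base call is very unreliable.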

5.2 DNA Write

(i) If I could synthesize a gene, I would synthesize nitrogen-fixation genes for crops.

Why are nitrogen-fixing crops so powerful?

➜ Right now, mainly legumes (like beans and peas) form a symbiosis with nitrogen-fixing bacteria such as Rhizobium

➜ Major crops (wheat, rice, maize) rely heavily on synthetic fertilizer

➜ Fertilizer production uses the Haber–Bosch process, which is extremely energy-intensive

➜ If cereal crops could fix nitrogen, there would be a massive reduction in greenhouse gas emissions, lower farming costs, less nitrate pollution in rivers and improved soil health.

➜ Unfortunately, nitrogen fixation is a complex process because it requires ~15–20 genes (nifHDK and accessory genes), tight regulatory control and metal cofactors (Fe-Mo clusters).

➜ Key genes include nifH, nifD, and nifK.

(ii) I would use the Oxford Nanopore method (long-read sequencing)

1. Essential steps:

High-molecular-weight DNA extraction
Adapter attachment
DNA passes through nanopore protein
Electrical signal changes detected
Base calling via AI algorithms

It produces very long reads, so it can sequence an entire nif cluster in one read, and it detects structural variations easily.

2. Limitations:

The Oxford Nanopore method has lower raw accuracy, produces more insertion/deletion errors and requires substantial computational analysis.

5.3 DNA Edit

(i) If I could edit a gene, I would edit a disease vector using a gene drive, specifically for malaria control in mosquitoes.

Why did I choose this editing?

The main species targeted is the mosquito Anopheles gambiae. This species spreads malaria by transmitting the parasite Plasmodium falciparum. Malaria is a very serious disease because it kills hundreds of thousands of people per year, mostly children.

By editing Anopheles gambiae with a gene drive, the mosquitoes can be made sterile, which would suppress the population and prevent transmission of the malaria parasite. I would edit a fertility gene (often doublesex) and attach a CRISPR-based gene drive. When modified mosquitoes breed, the gene drive copies itself onto the partner chromosome, so nearly all of the offspring inherit it. But as good as this sounds, there are obstacles, such as new mutations that break the CRISPR target site, preserve fertility and outcompete the drive.
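
To see why a drive spreads so much faster than an ordinary allele, here is a deliberately simplified sketch. It assumes random mating, no fitness cost (so it ignores the sterility of a doublesex-type suppression drive) and no resistance alleles. The homing efficiency `e` is a hypothetical parameter: heterozygotes pass on the drive with probability (1 + e) / 2 instead of the Mendelian 1/2, which gives the recursion q' = q + e·q·(1 − q) for the drive allele frequency q.

```python
def next_drive_frequency(q, homing_efficiency):
    """One generation of drive allele frequency under the simplified model.

    Derived from q' = q**2 + 2*q*(1 - q) * (1 + e) / 2, which simplifies to
    q + e*q*(1 - q). Assumes random mating, no fitness cost, no resistance.
    """
    return q + homing_efficiency * q * (1.0 - q)


def simulate(q0, homing_efficiency, generations):
    """Return the drive allele frequency for each generation, starting at q0."""
    frequencies = [q0]
    for _ in range(generations):
        frequencies.append(next_drive_frequency(frequencies[-1], homing_efficiency))
    return frequencies


# Hypothetical values: 10% starting frequency, 90% homing efficiency.
mendelian = simulate(0.1, 0.0, 10)
drive = simulate(0.1, 0.9, 10)
for generation, (m, d) in enumerate(zip(mendelian, drive)):
    print(f"gen {generation:2d}  Mendelian {m:.3f}  drive {d:.3f}")
```

Under these assumptions the Mendelian allele stays at 10%, while the drive allele passes 99% within about seven generations. Real drives behave less cleanly because of the fitness costs and resistance alleles mentioned above.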

CRISPR technologies for the control and study of malaria. Source: https://media.springernature.com/lw685/springer-static/image/art%3A10.1186%2Fs13071-025-06905-w/MediaObjects/13071_2025_6905_Fig1_HTML.png

(ii) I would use CRISPR-Cas9 combined with a gene drive cassette as the technology to edit the gene.

How CRISPR edits DNA and the essential steps

CRISPR–Cas9 editing works in 3 core steps: target recognition, DNA cleavage and repair, where the repair pathway determines the outcome. For a gene drive, the inserted cassette includes a Cas9 gene, an sgRNA gene and homology arms. After Cas9 cuts the wild-type allele, the cell uses the gene drive allele as the repair template, so the drive copies itself and the organism becomes homozygous. The essential steps are:

1. Target gene selection

Choosing a gene critical for female fertility or parasite transmission

2. Guide RNA design

Using bioinformatics tools to identify a unique 20-nt target sequence, minimize off-target matches and avoid polymorphic regions

3. Construct Gene Drive Cassette

The components include the Cas9 coding sequence, a germline-specific promoter, an sgRNA expression cassette and homology arms (~1 kb on each side)

4. Embryo Microinjection

The components are delivered into early embryos

5. Screening

6. Contained Population Testing
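
The guide RNA design in step 2 can be sketched as a simple search. This example assumes the commonly used SpCas9, which needs an NGG PAM directly after the 20-nt target. It scans only the forward strand and does no off-target or polymorphism checks, so it is a starting point rather than a design tool. `find_spcas9_targets` is an illustrative name, and the demo input is just the start of the GFP coding sequence from Week 2, not a mosquito gene.

```python
import re


def find_spcas9_targets(sequence, guide_length=20):
    """Find forward-strand targets: guide_length bases followed by an NGG PAM.

    Returns (position, protospacer, pam) tuples. A lookahead is used so that
    overlapping candidate sites are all reported.
    """
    sequence = sequence.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{guide_length}}})([ACGT]GG))")
    return [
        (match.start(), match.group(1), match.group(2))
        for match in pattern.finditer(sequence)
    ]


# Demo input only: the first bases of the GFP coding sequence from Week 2.
demo = "ATGTCTAAAGGCGAGGAACTGTTCACCGGCGTTGTTCCGATTTTAGTGGAACTGGATGGC"
for position, protospacer, pam in find_spcas9_targets(demo):
    print(position, protospacer, pam)
```

A real design would also scan the reverse strand and compare every candidate against the whole genome to rule out off-target matches.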

Input & Preparation

Input: Cas9 enzyme (or Cas9 expression plasmid), sgRNA sequence, donor DNA template with homology arms, promoter sequences, selectable marker, mosquito embryos, microinjection equipment and PCR primers for validation.

Design preparation ➜ whole genome sequence analysis, off-target prediction, population genetic modelling, ecological modelling

Limitations in terms of efficiency or precision

1. Resistance Alleles

Mutations can disrupt the target site so that the drive no longer cuts, which allows resistant mosquitoes to survive

2. Off-target effects

CRISPR may cut unintended sites, which can cause fitness defects and unpredictable phenotypes

3. Evolutionary Instability

The parasites may evolve resistance, and gene flow between populations complicates the spread of the drive

References

https://www.universityofcalifornia.edu/news/how-glow-dark-jellyfish-inspired-scientific-revolution
https://www.uniprot.org/uniprotkb/P42212/entry
https://medlineplus.gov/genetics/gene/tp53/
https://pmc.ncbi.nlm.nih.gov/articles/PMC10030364/
https://www.nature.com/articles/s41434-024-00468-8

Week 3 HW: Lab Automation

Opentrons Artwork

Post-Lab Questions

I found a published paper that describes how researchers used the Opentrons OT-2 automated liquid handler to develop an automated, high-throughput proxy viscometer. The robot was programmed to dispense liquids of various viscosities and collect data for machine-learning models that predict viscosity. This demonstrates a practical application of the OT-2 in fluid characterization workflows, requiring minimal human intervention while significantly increasing throughput.

If I went with my first project idea, the Bacterial Microplastic Sensor, I could apply the following automation tools:

Automated Fluorescence Detection System	The goal is to automatically quantify GFP output in response to PET degradation products. I can automate timed fluorescence measurements (every 10 min), background subtraction, data logging (CSV), real-time plotting and a threshold alert.
Automated Incubation + Sampling	There can be an automated shaking platform with a 3D tube rack and/or temperature control with a heat pad, a temperature sensor and automated regulation. A pseudocode example can be: if temperature < 37: heater_on() else: heater_off()
AI-Based Fluorescence Quantification	Instead of raw intensity, computer vision or a trained model can be used to classify fluorescence levels.
Ginkgo Nebula Integration Plan	Ginkgo Nebula could be very helpful for DNA design automation (promoter optimization, RBS strength prediction, codon optimization, circuit simulation) or experimental tracking (automated protocol versioning, construct iteration tracking, strain documentation).
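
The temperature pseudocode and the fluorescence logging from the table can be turned into a small runnable sketch. Everything hardware-related is left out: the function names, the 0.5 °C hysteresis band (added so the heater does not switch rapidly around 37 °C) and the example readings are all assumptions for illustration, not measured values or a real device API.

```python
import csv
import io

TARGET_C = 37.0  # setpoint from the pseudocode in the table


def heater_should_be_on(temp_c, currently_on, target=TARGET_C, band=0.5):
    """Decide the heater state for one control step.

    Below target - band the heater turns on, above target + band it turns
    off, and inside the band it keeps its current state (hysteresis).
    """
    if temp_c < target - band:
        return True
    if temp_c > target + band:
        return False
    return currently_on


def process_reading(raw, blank, threshold):
    """Subtract the blank (background) and flag readings at or above threshold."""
    corrected = raw - blank
    return corrected, corrected >= threshold


def log_readings(readings, blank, threshold, out):
    """Write (minute, raw) fluorescence readings as CSV rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["minute", "raw", "corrected", "alert"])
    for minute, raw in readings:
        corrected, alert = process_reading(raw, blank, threshold)
        writer.writerow([minute, raw, corrected, alert])


# Made-up example values, one reading every 10 minutes.
buffer = io.StringIO()
log_readings([(0, 120), (10, 180), (20, 260)], blank=100, threshold=100, out=buffer)
print(buffer.getvalue())
```

On real hardware, the temperature and fluorescence values would come from the sensor and the reader, and `out` would be an open CSV file instead of an in-memory buffer.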

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

Why do humans eat beef but do not become a cow, eat fish but do not become fish?

When you eat beef or fish, your body does not keep the meat intact and turn it into “cow tissue” or “fish tissue.” Instead, your digestive system breaks everything down into basic molecules, like proteins into amino acids, fats into fatty acids + glycerol, carbohydrates into simple sugars and DNA into nucleotides. Your cells then reassemble these building blocks into human proteins and tissues according to the instructions in your own DNA.

Why are there only 20 natural amino acids?

Life could have used more (and sometimes does), but 20 appears to be a near-optimal balance. Why is that so? The genetic code has limits. Proteins are built using codons, which are 3-letter sequences in DNA/RNA. ➜ 4 bases (A, U/T, C, G) and 3 positions per codon give 4³ = 64 possible codons. Two-letter codons would give only 4² = 16 combinations, which is too few for 20 amino acids. Of the 64 codons, 3 are stop signals and many of the rest are redundant (multiple codons for the same amino acid). The code settled on 20 standard amino acids early in evolution and became highly conserved. Changing the code now would alter nearly all existing proteins and would be catastrophically disruptive!
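
The codon arithmetic can be checked with a short script. This sketch assumes the standard genetic code, written as the usual 64-letter string with bases ordered T, C, A, G; the variable names are my own.

```python
from collections import Counter
from itertools import product

# Standard genetic code; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons encode each amino acid (the code's redundancy).
degeneracy = Counter(codon_table.values())
stop_codons = degeneracy.pop("*")

print("total codons:", len(codon_table))
print("stop codons:", stop_codons)
print("amino acids:", len(degeneracy))
print("sense codons:", sum(degeneracy.values()))
print("codons per amino acid:", dict(sorted(degeneracy.items())))
```

The output shows 64 codons in total, 3 stops and 61 sense codons shared among 20 amino acids, with leucine, serine and arginine each having 6 codons and methionine and tryptophan only 1.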

Can you make other non-natural amino acids? Design some new amino acids.
Synthetic biology and medicinal chemistry routinely create non-natural amino acids, and some are even genetically encoded in engineered organisms. Some are chemically synthesized and incorporated during peptide synthesis, while others are genetically encoded using engineered tRNA/synthetase systems.
One amino acid would be Alkyne-Lysine (Bioorthogonal Handle). The lysine’s side chain is modified to include a terminal alkyne. Alkynes allow click chemistry (azide–alkyne cycloaddition), site-specific labeling and fluorescent tagging. This amino acid could be used in protein imagin, drug conjugationg and synthetic protein networks.
Where did amino acids come from before enzymes that make them, and before life started?
Amino acids are within all living things on Earth, being the building blocks of proteins. Proteins are essential for many processes within living organisms, including catalysing reactions (enzymes), replicating genetic material (ribosomes), transporting molecules (transport proteins) and providing a structure to cells and organisms (e.g. collagen). Therefore, amino acids would have been needed in significant amounts within the region where life began on Earth. The Miller–Urey Experiment (1953) showed, that organic molecules can spontaneously form under plausible early-Earth conditions. Chemists simulated early Earth’s atmosphere and within days, the flask contained amino acids like Glycine, Alanine and Aspartate. Without enzymes or cells, just chemistry. That means, enzymes didn´t invent amino acids. Instead, Geochemistry made amino acids, those amino acids accumulated, some began forming short peptides, eventually, self-replicating systems emerged and only later did enzyme-based metabolism evolve.
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
If you build an α-helix entirely from D-amino acids, it will form a left-handed helix. The reason is that natural proteins use L-amino acids almost exclusively, and standard α-helices in proteins are right-handed. D-amino acids are mirror images of L-amino acids. So if a peptide built from L residues forms a right-handed α-helix, the exact mirror molecule (all D residues) must adopt the mirror conformation. Therefore, the entire structure inverts, and the mirror image of a right-handed helix is a left-handed helix.
Can you discover additional helices in proteins?

Yes, and in fact additional helices beyond the standard α-helix have already been discovered, namely the 3₁₀ helix and the π-helix. Whether further new helices can exist is a deeper structural question: more are possible, but the options are very constrained.

Why are most molecular helices right-handed?
Most molecular helices in biology are right-handed because life uses L-amino acids, and L stereochemistry makes the right-handed α-helix energetically favored.
Why do β-sheets tend to aggregate?

β-sheets aggregate because their backbone hydrogen bonding is unsatisfied at the edges, and the easiest way to satisfy it is by binding to another β-sheet.

8.1 What is the driving force for β-sheet aggregation?

β-sheet aggregation is driven by a combination of backbone hydrogen bonding, hydrophobic interactions, and water-mediated entropy effects, with cooperativity making it autocatalytic.

Why do many amyloid diseases form β-sheets?

Amyloid diseases form β-sheets for several reasons: β-sheets have exposed hydrogen-bonding edges at misfolded regions; β-strands are geometrically compatible with stacking and fibril formation; hydrophobic and polar side chains stabilize sheet stacking; cross-β fibrils represent a low-energy, highly stable state; and misfolding exposes β-prone sequences that nucleate aggregation.

9.1 Can you use amyloid β-sheets as materials?

Yes, amyloid β-sheets are not just pathological; their structural properties make them ideal building blocks for engineered materials. Some amyloid-based materials are:

Hydrogels ➜ Short amyloidogenic peptides form cross-β networks in water and create soft, viscoelastic gels that can be used in tissue engineering scaffolds, drug delivery systems and 3D cell culture matrices.
Nanofibers and Films ➜ Amyloid fibrils can be aligned to make strong, thin fibers, and they can be embedded in composites, e.g. for biocompatible electronics.
Functionalized Materials ➜ Side chains can be chemically modified to bind metals, fluorophores, or enzymes, which enables catalytic amyloid materials, light-responsive materials, and sensing platforms.

Part B: Protein Analysis and Visualization

I selected the human hemoglobin because it is a crucial and very well-known protein that transports oxygen from the lungs to tissues and carbon dioxide back to the lungs.

The structure of hemoglobin. Source: https://chemistwizards.com/wp-content/uploads/2026/01/hemoglobin-structure-1024x687.webp

🩸Sequence (from FASTA):

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR

🩸This is the frequency of amino acids from Google Colab:
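A count like this can be reproduced with a few lines of Python. The following is a minimal sketch, not the course notebook; the function name `amino_acid_frequencies` is my own, and the sequence is the hemoglobin subunit alpha sequence given above:

```python
from collections import Counter

# Hemoglobin subunit alpha (P69905), copied from the FASTA record above
HBA_HUMAN = (
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG"
    "KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP"
    "AVHASLDKFLASVSTVLTSKYR"
)

def amino_acid_frequencies(sequence):
    """Return {residue: (count, fraction)}, most frequent residue first."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: (n, n / total) for aa, n in counts.most_common()}

freqs = amino_acid_frequencies(HBA_HUMAN)
for aa, (n, frac) in freqs.items():
    print(f"{aa}: {n:3d} ({frac:.1%})")
```

The same dictionary can be passed to any plotting library to draw the bar chart.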

🩸UniProt's BLAST tool showed that there are 113 protein sequence homologs.

🩸Hemoglobin belongs to the globin superfamily, which is a large group of proteins that bind heme and transport or store oxygen. Some common features of globins are the globin fold, the heme-binding pocket and conserved residues. Within the globin superfamily, hemoglobin subunits fall into two subfamilies ➜ alpha-globin and beta-globin.

🩸The structure from RCSB was released on 1998-04-29. The resolution is 1.80 Å.

🩸There is a non-protein molecule in the structure: a ligand called “PROTOPORPHYRIN IX CONTAINING FE” (the heme group).

🩸SCOP showed me the following structure classification families:

🩸Visualizing the protein on PyMol as:

Cartoon:

Ribbon:

Ball and Stick:

🩸By coloring the protein by secondary structure, it showed more helices than sheets.

🩸The hydrophobic residues are (color yellow in image):

ALA, VAL, LEU, ILE, MET, PHE, TRP, PRO

The hydrophilic residues are (color cyan in image):

SER, THR, ASN, GLN, TYR, CYS

And the charged residues (also hydrophilic):

ASP, GLU (negative, color red in image), LYS, ARG, HIS (positive, color blue in image)

🩸Here you can see the surface of hemoglobin and the cavity (binding pocket) where the heme sits.

Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

For this part, I chose the lysozyme protein from PDB. Sequence:

>168L_1|Chains A, B, C, D, E|T4 LYSOZYME|Enterobacteria phage T4 (10665)
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRMLQQKRWDAAAAALAAAAWYNQTPNRAKRVITTFRTGTWDAYKNL

1. Deep Mutational Scans

b. Two patterns stand out immediately: a strong horizontal dark band for C (cysteine) across many positions, and a few bright vertical columns where almost every substitution is predicted to be beneficial. The C row is strongly negative: it has a deep purple color and is darker than the surrounding rows. Assuming each row corresponds to the substituted amino acid, this means that mutating to cysteine is predicted to be unfavorable at most positions, which is plausible because cysteine is rare in proteins and a new reactive thiol can form unwanted disulfide bonds. In addition, there is a column where many substitutions are bright yellow/green. That usually means that the wild-type residue is suboptimal, the position is surface-exposed, many substitutions improve stability or packing, or the model predicts energetic relief.

A mutation that stands out is Cys → Ser. Even though serine is chemically similar to cysteine (small, polar), the heatmap shows it as strongly negative at those positions. Disulfide bonds are unlikely to be the reason, because T4 lysozyme (unlike hen egg-white lysozyme) has no native disulfide bonds; its two cysteines are free thiols. More plausible explanations are that even subtle changes in size and polarity can disrupt the local packing geometry, or that the model has learned these positions as conserved. The score is therefore a model prediction and does not by itself show that these cysteines are structurally essential.
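Heatmaps of this kind are commonly built from log-likelihood ratios: each cell is log p(mutant) minus log p(wild type) at that position. The sketch below shows only that scoring step. The probabilities are made up for a two-residue toy sequence (an assumption; real values would come from a protein language model), and `mutation_scores` and `toy_probs` are hypothetical helper names:

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutation_scores(wild_type, position_probs):
    """Return {(position, wt, mut): log-likelihood ratio} for every single substitution.

    position_probs[i] maps each amino acid to a probability at position i.
    Positions in the result are 1-based.
    """
    scores = {}
    for i, wt in enumerate(wild_type):
        probs = position_probs[i]
        for mut in AMINO_ACIDS:
            if mut != wt:
                scores[(i + 1, wt, mut)] = math.log(probs[mut]) - math.log(probs[wt])
    return scores

def toy_probs(favoured, p_favoured=0.62):
    """Made-up distribution: one favoured residue, the rest share what is left."""
    rest = (1.0 - p_favoured) / (len(AMINO_ACIDS) - 1)
    return {aa: (p_favoured if aa == favoured else rest) for aa in AMINO_ACIDS}

# Toy sequence "MC" where the model strongly prefers the wild-type residues
position_probs = [toy_probs("M"), toy_probs("C")]
scores = mutation_scores("MC", position_probs)
print(scores[(2, "C", "S")])  # negative: Ser is less likely than the wild-type Cys
```

A negative score only says that the model finds the substitution less likely than the wild type; it does not explain why.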

2. Latent Space Analysis

This is the resulting map:

C2. Protein Folding

1. Folded protein with ESMFold:

This is the 3D structure in PDB:

Part D. Group Brainstorm on Bacteriophage Engineering

The main goal would be to computationally design mutations that increase the structural stability of the bacteriophage L lysis protein using structure prediction and stability analysis tools. It is the easiest and most common computational protein-engineering task, but very useful as well.

The second goal would be computationally analyzing conserved residues in L-like lysis proteins to identify mutations that may increase the toxicity of the protein. This method is more difficult, but quite interesting because this goal tries to make the protein kill bacteria more efficiently. The challenge is that toxicity often depends on complex cellular interactions and membrane effects, which are harder to model.

➜ Increased Stability of the Lysis (L) Protein

Tools/approaches 🧬

Protein language models to perform in-silico mutagenesis and identify mutations that are evolutionarily compatible with the L protein sequence, for example ESM-2.
Approach: Input of the L-protein sequence into a protein language model, performing in-silico mutagenesis (substituting amino acids at different positions), scoring mutations based on their likelihood or predicted fitness and selecting mutations predicted to be tolerated or beneficial.
Structure Prediction with AlphaFold to evaluate structural effects with 3D structures
Approach: Predicting the structure of the wild-type L protein, modeling mutant variants suggested by language models or other methods and comparing structural confidence scores (pLDDT) and structural changes.

These tools help to solve the problem, because: 🔎

Protein language models allow large-scale in-silico mutagenesis, filter out mutations likely to destabilize the protein and suggest mutations that resemble evolutionarily acceptable variants.
Structure predictions help locate buried vs. exposed residues, identify functional or interaction sites and see whether mutations affect structural packing.
With energy calculations, you can quantitatively compare mutations and select variants predicted to produce a more stable protein fold.

Potential pitfalls 🚩

Bacteriophage lysis proteins are relatively poorly characterized compared with many other proteins. This means that there may be few experimentally validated structures and limited functional data for mutations. Because of this, models trained on general protein datasets (e.g., ESM-2) may not fully capture the specific biology of phage lysis mechanisms in Escherichia coli.
The activity of lysis proteins depends on the complex environment of the bacterial cell, including membranes and host proteins like DnaJ. A mutation predicted to improve stability could accidentally reduce interaction with the membrane, alter timing of lysis or disrupt important host interactions.

➜ Higher Toxicity of the Lysis Protein

Tools/approaches 🧬

Structure Prediction and Structural Analysis with AlphaFold and PyMol
Approach: Predicting the 3D structure of the L protein with AlphaFold, visualizing the structure in PyMOL and identifying structural features, such as exposed residues, potential interaction sites and membrane-facing regions.
Analysis of Lysis Proteins with BLAST and Clustal Omega
Approach: Using BLAST to find homologous lysis proteins from other bacteriophages, aligning the sequences using Clustal Omega and identifying highly conserved residues or motifs.

These tools help to solve the problem, because: 🔎

The Structure Prediction and Analysis helps identify regions that could be modified to strengthen the interactions with the bacterial membrane or other proteins.
Analysing the lysis protein with the given tools helps identify the positions of conserved residues. By identifying these positions, it is possible to locate functional regions responsible for lysis and design mutations near these sites to potentially enhance activity.

Potential pitfalls 🚩

Protein toxicity often depends on complex interactions inside the bacterial cell, such as membrane disruption or interactions with host proteins in Escherichia coli. Computational tools like AlphaFold can predict protein structures and interactions, but they cannot fully model the cellular environment.
Sometimes mutations that improve function can reduce protein stability or folding efficiency. Even if a mutation increases interaction with bacterial targets, it might cause misfolding, make the protein degrade faster or reduce expression levels.

References

https://pmc.ncbi.nlm.nih.gov/articles/PMC10105836/
https://www.chemistryworld.com/features/why-are-there-20-amino-acids/3009378.article
https://www.pittwire.pitt.edu/pittwire/features-articles/liu-chemistry-proteins-synthesis#:~:text=to%20Pittwire%20Today-,A%20new%20chemical%20process%20makes%20it%20easier%20to%20craft%20amino,proteins%20or%20their%20smaller%20cousins.
https://astrobiology.com/2023/04/how-were-amino-acids-formed-before-the-origin-of-life-on-earth.html#:~:text=After%20several%20millions%20of%20years,other%2C%20similar%20to%20human%20hands.
https://pmc.ncbi.nlm.nih.gov/articles/PMC8508955/#:~:text=Abstract,conductive%20materials%2C%20and%20catalytic%20materials.

Week 5 HW: Protein Design Part II

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

The human SOD1 sequence without the mutation:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

The human SOD1 sequence with the A4V mutation:

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

These are the 4 generated peptides and the added peptide "FLYRWLPSRRGG" I obtained from Google Colab:

Perplexity measures how surprised the model is by a peptide sequence: lower values mean the sequence looks more natural or compatible with the protein context, and higher values mean a stranger, more unlikely sequence. Therefore, according to PepMLM, peptide no. 1 has the best perplexity score with 8.770207. That means PepMLM considers the generated peptides more likely, better-fitting sequences for the mutant SOD1 context than the known peptide.
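As a rough illustration of what the score means, perplexity can be written as the exponential of the mean negative log-probability per residue. The sketch below uses made-up per-residue probabilities; it shows the general definition and is not a reproduction of PepMLM's own scoring code:

```python
import math

def perplexity(residue_probs):
    """Perplexity = exp(mean negative log-probability per residue)."""
    nll = [-math.log(p) for p in residue_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities the model assigns to each residue of two peptides
confident = [0.5, 0.4, 0.6, 0.5]
uncertain = [0.05, 0.1, 0.08, 0.05]

print(perplexity(confident) < perplexity(uncertain))  # True
```

If every residue had probability 0.25, the perplexity would be 4, as if the model were choosing between four equally likely residues at each position.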

Part 2: Evaluate Binders with AlphaFold3

Because peptides 0 and 1 contain the character "X", which means unknown amino acid, I used the other peptides, which do not contain any unknown amino acids. These are the results:

🟢(1)

🟠(2)

🟣(3)

All of the peptides seem to "float" over the SOD1 protein structure. That means that these peptides are not buried within the structure, but rather surface-bound.

🟢(1) contains peptide no. 2 from the table above. Its ipTM is 0.86 and pTM is 0.9

🟠(2) contains peptide no. 3 from the table above. Its ipTM is 0.81 and pTM is 0.86

🟣(3) contains the known SOD1-binding peptide "FLYRWLPSRRGG". Its ipTM is 0.9 and pTM is 0.92

By comparing all three of the binding peptides, the one with the best results is the known peptide 🟣(3). pTM measures how confident AlphaFold is in the overall 3D structure of the protein and ipTM measures confidence in the interaction between the protein and peptide chains. 🟣(3) scores the highest, which means that it is predicted by AlphaFold to bind most stably to the mutant SOD1.

In the structure from 🟣(3), the peptide (orange) binds on the surface of SOD1, near loops on one end of the β-barrel. It is close to the N-terminal region where the A4V mutation sits but does not penetrate the β-barrel core. The peptide does not appear to directly engage the dimer interface; rather, it interacts with one monomer of the SOD1 dimer.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

These are the following predictions for each peptide on PeptiVerse:

🟢(1)

🟠(2)

🟣(3)

➜ Solubility: All of the 3 peptides are soluble.

➜ Hemolysis: The lower, the safer. The safest one is Peptide 1

➜ Binding Affinity: The lower the value, the better/stronger predicted binding. All 3 of them show weak binding, but Peptide 1 has the best score. Although AlphaFold3 showed Peptide 3 with the best ipTM score, this measurement difference is normal, because the tools measure different things. AlphaFold3 measures the structural confidence of the interaction and PeptiVerse measures the predicted binding energy.

➜ Net Charge: Positive charge helps interaction with proteins and membranes. Peptide 1 scores the best, followed by Peptide 3.

➜ Hydrophobicity: Moderate values are usually ideal, so Peptide 3 has the best value, followed by Peptide 1.

To conclude, Peptide 🟢(1) has the best overall balance. It has the strongest predicted binding, the lowest hemolysis (safest), good positive charge and good hydrophilicity and solubility.

Part 4: Generate Optimized Peptides with moPPIt

moPPIt generated the following binding peptides:

KTFAQFKKIFLQ
PQKEITRCQFFE
VTYCAYYWVTCV

Part C: Final Project: L-Protein Mutants

I chose Option 3: Random Mutagenesis.

I used ChatGPT to help me create a Python function to generate random mutation combinations. It generated the following mutations:

Sequence:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Mutations:

E → A at position 2
L → F at position 44
T → G at position 69

This is the new sequence with the mutations:

MATRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFFAIFLSKFTNQLLLSLLEAVIRTVTGLQQLLT
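A function of this kind can be sketched as follows. This is my own minimal version, not the ChatGPT-generated code, and the names `random_mutations`, `apply_mutations` and `list_mutations` are hypothetical. Positions are 1-based, and the start methionine is left untouched by default (an assumption):

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type L-protein sequence as given above
L_PROTEIN = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"

def random_mutations(sequence, n_mutations, seed=None, keep_start=True):
    """Pick n distinct positions and a different random residue for each.

    Returns a list of (position, wild_type, mutant) with 1-based positions.
    """
    rng = random.Random(seed)
    first = 1 if keep_start else 0  # optionally protect the start methionine
    indices = rng.sample(range(first, len(sequence)), n_mutations)
    mutations = []
    for idx in sorted(indices):
        wt = sequence[idx]
        mut = rng.choice([aa for aa in AMINO_ACIDS if aa != wt])
        mutations.append((idx + 1, wt, mut))
    return mutations

def apply_mutations(sequence, mutations):
    """Apply substitutions, checking that each stated wild-type residue is present."""
    residues = list(sequence)
    for pos, wt, mut in mutations:
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)

def list_mutations(wild_type, mutant):
    """Recover 1-based substitutions by comparing two equal-length sequences."""
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]

muts = random_mutations(L_PROTEIN, 3, seed=0)
mutant = apply_mutations(L_PROTEIN, muts)
print(muts)
print(list_mutations(L_PROTEIN, mutant) == muts)  # True
```

Running `list_mutations` on the wild-type and mutated sequences above is a quick way to confirm which positions actually changed, in 1-based numbering.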

This is the 3D structure of the L-protein:

And this is the mutated protein:

Week 6 HW: Genetic Circuits Part I

Assignment: DNA Assembly

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Some components in the Phusion High-Fidelity PCR Master Mix are:

➜ Phusion DNA Polymerase - This enzyme copies the DNA with very high accuracy.

➜ dNTPs (deoxynucleotide triphosphates) - These are the building blocks that the polymerase uses to synthesize new DNA.

➜ Reaction Buffer - This keeps the pH and salt conditions stable so the enzyme works properly.

➜ Mg²⁺ ions (Magnesium ions) - Magnesium is required for the DNA polymerase to function during DNA synthesis.

➜ Stabilizers and additives - These help keep the enzyme stable and improve the efficiency of the PCR reaction.

What are some factors that determine primer annealing temperature during PCR?

➜ One factor that determines primer annealing temperature during PCR is the primer length: longer primers usually have a higher annealing temperature because they bind more strongly to the DNA.

➜ Furthermore, the GC content of the primer is important because primers with more G and C bases have a higher annealing temperature since G–C pairs form stronger bonds than A–T pairs.

➜ Another factor is the primer sequence, because the exact order of bases can affect how strongly the primer binds to the DNA template.

➜ Salt and magnesium concentration is another important factor because higher concentrations can stabilize primer binding and influence the optimal annealing temperature.
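The effect of length and GC content can be illustrated with the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This is only a rough estimate for short primers and it ignores salt and magnesium concentration; the primer sequences below are made up for illustration:

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

short = "ATGCATGCATGCATGC"       # hypothetical 16-mer, 50% GC
longer = "ATGCATGCATGCATGCATGC"  # same composition, 20-mer
gc_rich = "GCGCGCGCATGCGCGC"     # 16-mer with higher GC content

for name, primer in [("short", short), ("longer", longer), ("gc_rich", gc_rich)]:
    print(name, len(primer), f"{gc_content(primer):.0%}", wallace_tm(primer))
```

Both the longer primer and the GC-rich primer get a higher estimated Tm than the short 50% GC primer. For real primer design, a nearest-neighbor calculator that takes buffer conditions into account is the better choice.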

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

Restriction Enzyme Digest ➜ This method uses restriction enzymes to cut DNA at specific sequences. It is commonly used to analyze DNA fragments or to prepare DNA for cloning.
PCR (Polymerase Chain Reaction) ➜ PCR is used to amplify a specific DNA sequence using DNA polymerase and primers.

The experiment first uses PCR to amplify DNA fragments and introduce mutations into the amilCP gene. After PCR, the samples are treated with the DpnI restriction enzyme, which digests the original methylated plasmid template so that only the newly amplified PCR DNA remains. Then the PCR fragments are combined using Gibson Assembly and later transformed into E. coli. PCR is preferable when you want to make many copies of a DNA region or add new sequence (such as mutations or overlaps) to its ends, while a restriction digest is preferable when you need to cut existing DNA at specific recognition sites.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

To ensure the DNA fragments work for Gibson cloning, you need to design them carefully so they can join together correctly. The DNA fragments must have matching overlapping ends (usually ~20–40 base pairs). These overlaps allow the fragments to stick together during Gibson Assembly. Furthermore, when doing PCR, primers should be designed to add these overlapping regions to the ends of the DNA fragments and you also need to make sure the overlaps match the correct neighboring fragment so everything assembles in the right order. After PCR or digestion, the DNA should be pure and free of contaminants so the assembly reaction works efficiently. Unlike restriction cloning, you don’t need specific restriction sites, but the ends must be designed to be complementary.
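The overlap requirement can be checked in silico before ordering primers. The sketch below looks for the longest end-to-start match between neighboring fragments of a circular assembly. It assumes all fragments are written 5'→3' on the same strand; the toy sequences and the function names `shared_overlap` and `check_circular_assembly` are made up for illustration:

```python
def shared_overlap(upstream, downstream, min_len=20, max_len=40):
    """Length of the longest suffix of `upstream` that equals a prefix of `downstream`.

    Returns 0 if no overlap of at least `min_len` bases is found.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return k
    return 0

def check_circular_assembly(fragments, min_len=20):
    """Check every junction, including last -> first, for a circular plasmid."""
    report = []
    for i, frag in enumerate(fragments):
        j = (i + 1) % len(fragments)
        report.append((i, j, shared_overlap(frag, fragments[j], min_len)))
    return report

# Toy fragments: two 20-bp overlaps joined by short unique spacers
overlap_ab = "ACGTTGCAAGGCTTAACCGG"
overlap_ba = "TTGACCGGTAAGCTTGGCAA"
frag_a = overlap_ba + "ATATATATAT" + overlap_ab
frag_b = overlap_ab + "CGCGCGCGCG" + overlap_ba

print(check_circular_assembly([frag_a, frag_b]))  # [(0, 1, 20), (1, 0, 20)]
```

A junction that reports 0 has no usable overlap, which means the primers for that fragment pair need to be redesigned.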

How does the plasmid DNA enter the E. coli cells during transformation?

Plasmid DNA enters E. coli cells during transformation when the cells are made temporarily permeable so the DNA can pass through the membrane. A common method is heat shock transformation, where the cells are first treated with calcium chloride to make their membranes more permeable. Then a sudden increase in temperature (heat shock) creates a temporary opening, allowing the plasmid DNA to enter the cells. Another method is electroporation, where a short electrical pulse is applied to the cells, creating tiny pores in the membrane. The plasmid DNA can then pass through these pores into the cell.

Describe another assembly method in detail (such as Golden Gate Assembly).

I found another assembly method on the website Addgene.org. This assembly method is called SLIC (Sequence and Ligation Independent Cloning). SLIC is a method that joins DNA fragments using short homologous (matching) sequences, similar to Gibson, but with fewer enzymes. SLIC joins DNA fragments by creating matching overhangs that anneal, and the final DNA is repaired inside the bacteria. How does SLIC work?

1. Each DNA fragment (for example, your insert and plasmid) is designed so that their ends share 15–25 base pairs of identical sequence. These overlaps are essential because they will allow the fragments to recognize and bind to each other.

2. The DNA is treated with an enzyme such as T4 DNA polymerase. This enzyme “chews back” the ends of the DNA, removing nucleotides from the 3′ ends and creating single-stranded overhangs.

3. When the treated DNA fragments are mixed together, the complementary single-stranded overhangs base-pair (anneal) with each other. This brings the fragments together in the correct order based on their matching sequences.

4. At this stage, the DNA fragments are joined, but there may still be missing bonds in the backbone. So the DNA is not fully complete yet.

5. The partially assembled DNA is introduced into E. coli cells.

6. The bacteria’s natural DNA repair systems fill in missing nucleotides and seal the nicks by using ligase enzymes. This results in a fully intact plasmid.

SLIC Method Diagram

DNA Fragments with Overlaps → Exonuclease Treatment (creates overhangs) → Annealing (fragments stick together) → Transformation into E. coli → DNA Repair in Cell

Week 7 HW: Genetic Circuits Part II

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Traditional genetic circuits

😐 Traditional genetic circuits typically implement logic like AND, OR, NOT — meaning outputs are binary (on/off).

😐 Boolean circuits are limited to combinations of discrete logic rules.

😐 Traditional genetic circuits are hard-coded.

😐 Biological systems are inherently noisy (stochastic gene expression). Boolean circuits can fail if signals fluctuate around thresholds

IANNs

😊 Neural networks operate with continuous values, not just 0 or 1. This allows cells to respond proportionally to input concentrations and encode gradients and subtle differences in signals.

😊 IANNs can map complex environmental signals and multi-factor biological states.

😊 IANNs can be trained and adapted to new conditions.

😊 Neural networks distribute computation across many nodes and use weighted sums → more tolerant to noise.

Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

A useful application would be a smart therapeutic cell that detects and responds to cancer. It is a living classifier that decides whether to trigger a treatment based on a complex molecular signature. The goal is to engineer a cell that detects whether it is in a tumor microenvironment and activates a therapeutic response only when high confidence is reached.

Input (each input corresponds to a measurable molecular feature):

➡️ Surface protein markers (e.g. high HER2, EGFR)

➡️ Metabolic signals (high lactate → tumor glycolysis)

➡️ Oxygen level (low)

➡️ Inflammatory cytokines

➡️ Low pH level (acidic microenvironment)

IANN processes inputs using weighted gene regulation and combinatorial control. Biologically, each “neuron” is a gene whose expression depends on a weighted sum of inputs. Activation functions are implemented via cooperative binding and thresholding via repression/activation dynamics. Therefore, the network learns a nonlinear decision boundary and this allows detection of patterns, like:

“Moderate hypoxia + high lactate + mild inflammation = tumor”,

even if no single signal is decisive.
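The weighted-sum-plus-threshold idea can be sketched with a single "gene neuron" whose activation is a Hill function. All numbers below (weights, Hill constants, input values, decision threshold) are illustrative assumptions, not measured parameters:

```python
def hill(x, k=1.0, n=2.0):
    """Hill activation: sigmoidal response, cooperativity n sets the steepness."""
    x = max(x, 0.0)
    return x**n / (k**n + x**n)

def neuron(inputs, weights, bias=0.0, k=1.0, n=2.0):
    """One 'gene neuron': weighted sum of inputs passed through a Hill function."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return hill(total, k, n)

# Hypothetical normalised inputs in [0, 1]: hypoxia, lactate, inflammation
weights = [1.0, 1.0, 1.0]   # stand-ins for promoter strengths in a real cell
healthy = [0.1, 0.1, 0.1]
one_high = [0.9, 0.1, 0.1]  # a single strong signal
combined = [0.6, 0.9, 0.5]  # moderate hypoxia + high lactate + mild inflammation

THRESHOLD = 0.7  # assumed decision threshold for triggering the output
for name, x in [("healthy", healthy), ("one_high", one_high), ("combined", combined)]:
    y = neuron(x, weights, k=1.5, n=4)
    print(name, round(y, 3), "ACTIVATE" if y > THRESHOLD else "off")
```

With these numbers only the combined pattern crosses the threshold, while a single strong signal does not. A single neuron still draws a linear boundary on the summed input, so a truly nonlinear decision boundary needs at least one more layer feeding into it.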

Output:

➡️ Expression of a therapeutic protein (e.g., cytokine, toxin, checkpoint inhibitor)

Limitations:

➡️ Network outputs may drift or become inconsistent

➡️ It is difficult to “train” the network accurately. In electronics, weights are precise numbers; in cells, weights are promoter strengths, binding affinities and degradation rates, which are hard to tune precisely and are sensitive to context

➡️ Large circuits consume energy and resources. This leads to slower growth and evolutionary pressure to disable the circuit

Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

[Diagram: inputs X₁ and X₂ → Layer 1 (Tx/Tl) → Layer 2 (Tx/Tl) → output Y]

References
https://d1wqtxts1xzle7.cloudfront.net/34415923/Eluyode_DT-libre.pdf?1407773797=&response-content-disposition=inline%3B+filename%3DScholars_Research_Library_Comparative_st.pdf&Expires=1774290637&Signature=ZCAud8G9WvQouddiClQgtlBZpLtToyVSkyu45AEt8SLRpZKPVEolnvW-p9s0SfUJMcu4mrZxZDlTnn93bUv34VL5Nz9etoQJX3uNFYJBo58Go6eqAyymB05X~qSoi7T8I1eJH9DvNaZgOLyIcfB724kloAsogijGkWcH5~FCUPkvPMYzXPh596yjNVFefl4GhilZi~APAooLZRiFBErfAr39sBRYsLKfUwoRSLNJ1i3nUbiMu0oEl77XOwneTsqR9tcbhSKG-RL9QtvdxQyE92JCsGd4G3ZAza2N7Ika1Izoc8H9fGUf1sYgYGf1U~zknoTutvSwfSv1VxdzAE4fpg__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA
