Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
Week 2 – DNA Read, Write & Edit
Pages Lecture Prep Homework
Week 3 – Lab Automation
Week 4 — Protein Design Part I
Week 5 — Protein Design Part II
✨ Part A. SOD1 Binder Peptide Design ✨ Part 1: Generate Binders with PepMLM I searched for the SOD1 amino acid sequence in the UniProt database (P00441) and found that the protein has 154 amino acids: >MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ After that, I inserted the sequence into the Colab notebook. I set the parameters to generate 4 peptide binders, each with a length of 12 amino acids. The model then generated the following peptides:
Week 6 — Genetic Circuits Part I: Assembly Technologies
✨ DNA Assembly ✨ 1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? The Phusion High-Fidelity PCR Master Mix contains Phusion DNA polymerase, dNTPs, reaction buffer, and Mg²⁺ ions. These components allow accurate DNA amplification, with the polymerase synthesizing DNA, dNTPs acting as building blocks, and the buffer and Mg²⁺ providing optimal conditions for the reaction.

Week 1 HW: Principles and Practices

Step 1. First, describe a biological engineering application or tool you want to develop and why.

We are interested in both reading and biology, and we wanted to find a way to combine these two interests. We thought about creating bioluminescent bookmarks. These bookmarks would produce light naturally, using engineered microorganisms or biological materials that glow in different colors, without the need for batteries, which makes them environmentally friendly.

Other motivations are to make reading a little more aesthetically pleasing and to explore safe applications of bioluminescence outside the lab.

AI generated

The main ethical goal would be to reduce environmental impact by avoiding traditional batteries, while making sure the user is safe and minimizing risks from the materials used in the bookmark (plastic or glass). This big goal can be broken down into:

For environmental protection, the bookmarks should be made from biodegradable materials so that they help reduce plastic waste and are less harmful to the environment.
To make sure users are safe, the bioluminescent organisms used in the bookmarks need to be tested carefully, and if necessary, they could be genetically modified to prevent them from having any harmful or toxic genes.

Step 3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).

Purpose: Right now, light-up bookmarks exist, but they use batteries that can damage the environment. We want to replace them with bookmarks that produce light without harmful batteries.
Design: We plan to make the bookmark thin enough for the book to close properly, while still using a biodegradable material that resists drops and damage. Inside the bookmark, bioluminescent organisms would be placed to provide light without using batteries.
Assumptions: The working lifetime of the bookmark depends on how long the bioluminescent organism can live, which might be relatively short. We also assume that production costs could be high, making the bookmark expensive to produce and buy, even though it might only be used for a limited period of time.
Risks of Failure: The bioluminescence may fail due to lack of oxygen, poor organism survival, or degradation of the material.
Success: The bookmark is visually appealing, functional, and draws attention while being eco-friendly, providing a safer alternative to battery-powered bookmarks.

Step 4. Next, score (from 1-3 with, 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option:	Purpose	Design	Assumptions
Enhance biosecurity
1. Preventing incidents	2	2	3
2. Helping respond	3	3	3
Foster lab safety
1. Improving procedures	2	1	2
2. Encouraging safe behavior	2	2	2
Protect the environment
1. Reducing waste	1	1	2
2. Designing safer materials	1	1	2
Other considerations
1. Cost & feasibility	2	2	3
2. Social impact	1	1	2

🌟 Short explanation of the scoring choices

I gave the best scores (1 on the 1-3 scale) to environmental protection because the bookmarks are made from biodegradable materials that reduce waste. Lab safety scored well since testing the organisms helps prevent accidents. Biosecurity scored lower because this project isn’t focused on biological risks. Cost and social impact were moderate: the bookmarks might be a bit expensive, but they are attractive and eco-friendly. Overall, the scores reflect a balance between safety, environmental benefits, and practicality.

Step 5. Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

I would prioritize a combination of the bookmark’s aesthetic appeal and environmental safety. A visually attractive bookmark will draw attention and encourage people to buy and use it. At the same time, using biodegradable materials and safe bioluminescent organisms helps protect the environment and ensures user safety.

The main trade-off is cost and production complexity. Making a thin, eco-friendly, and visually appealing bookmark with tested organisms could be more expensive and harder to produce. Another assumption is that the organisms will live long enough to provide visible light, but their lifespan might be limited.

I would recommend this approach to companies that make educational or novelty products and to environmental regulators, because they can make sure the product is safe, sustainable, and still attractive.

Resources

Bioluminescence - FACTSHEET

Week 2 – DNA Read, Write & Edit

Week 2 – Homework

✨ Part 1: Benchling & In‑silico Gel Art ✨

I simulated a restriction digest on λ‑DNA (E. coli bacteriophage) in Benchling using several restriction enzymes: EcoRI, HindIII, BamHI, KpnI, EcoRV, SacI, and SalI. Each enzyme recognizes its own specific DNA sequence, producing different fragment patterns depending on how many cut sites are present. Some enzymes generate sticky ends, while others like EcoRV create blunt ends. By comparing the band patterns, we can see which enzymes cut the DNA, how many fragments they produce, and estimate fragment sizes—from large fragments (~10 kb) to very small ones (~100 bp). If an enzyme doesn’t cut, the result is a single intact band.
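To make the idea of cut sites and fragment patterns concrete, here is a minimal Python sketch of a single-enzyme digest on a linear molecule. It is not how Benchling works internally. The 21-bp `toy_dna` sequence is a made-up example (not λ‑DNA), and only four of the enzymes listed above are included.

```python
# Recognition site and cut offset (position of the top-strand cut within the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC, sticky ends
    "HindIII": ("AAGCTT", 1),  # A^AGCTT, sticky ends
    "BamHI": ("GGATCC", 1),    # G^GATCC, sticky ends
    "EcoRV": ("GATATC", 3),    # GAT^ATC, blunt ends
}


def digest(sequence, enzyme):
    """Return fragment lengths after cutting a linear sequence with one enzyme."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical 21-bp sequence containing two EcoRI sites and no EcoRV site.
toy_dna = "AAAGAATTCTTTTGAATTCCC"
ecori_fragments = digest(toy_dna, "EcoRI")  # two cuts give three fragments
ecorv_fragments = digest(toy_dna, "EcoRV")  # no site gives one intact fragment
print(ecori_fragments, ecorv_fragments)
```

Sorting the fragment lengths from largest to smallest gives the order of the bands from the top of the gel to the bottom.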

✨ Part 3: DNA Design Challenge ✨

3.1. Choose your protein

For this assignment, I chose linalool synthase, an enzyme involved in the biosynthesis of linalool, one of the major aromatic and bioactive compounds found in lavender (Lavandula spp.). Since my bachelor’s thesis focuses on the bioactive components of lavender, including linalool, this protein felt like a meaningful and relevant choice.

Using UniProt, I obtained the amino acid sequence for the linalool synthase I selected:

>sp|Q2XSC5|LALIN_LAVAN R-linalool synthase OS=Lavandula angustifolia (Lavender) OX=39329 PE=2 SV=1
MSININMPAAAVLRPFRCSQLHVDETRRSGNYRPSAWDSNYIQSLNSQYKEKKCLTRLEGLIEQVKELKGTKMEAVQQLELIDDSQNLGLSYYFQDKIKHILNLIYNDHKYFYDSEAEGMDLYFTALGFRLFRQHGFKVSQEVFDRFKNENGTYFKHDDTKGLLQLYEASFLVREGEETLEQAREFATKSLQRKLDEDGDGIDANIESWIRHSLEIPLHWRAQRLEARWFLDAYARRPDMNPVIFELAKLNFNIVQATQQEELKALSRWWSSLGLAEKLPFVRDRLVESYFWAIPLFEPHQYGYQRKVATKIITLITSLDDVYDIYGTLDELQLFTNLFERWDNASIGRLPEYLQLFYFAIHNFVSEVAYDILKEKGFTSIVYLQRSWVDLLKGYLKEAKWYNSGYTPSLEEYFDNAFMTIGAPPVLSQAYFTLGSSMEKPIIESMYEYDNILRVSGMLVRLPDDLGTSSFEMERGDVPKSVQLYMKETNATEEEAVEHVRFLNREAWKKMNTAEAAGDSPLVSDVVAVAANLGRAAQFMYFDGDGNQSSLQQWIVSMLFEPYA

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence

Using the online tool available at https://proteiniq.io , I reverse‑translated the amino acid sequence of the linalool synthase protein into a corresponding DNA sequence. This process is based on the Central Dogma of Molecular Biology, which states that DNA makes RNA, and RNA makes protein. By working backwards from the protein sequence, the tool generates a plausible nucleotide sequence that could encode the same protein.

I obtained the following reverse‑translated DNA sequence:

Converted Sequence
ATGTCAATAAACATAAATATGCCGGCAGCAGCTGTTCTGCGGCCTTTCCGGTGCAGCCAGCTTCATGTCGATGAAACACG
AAGATCGGGTAACTATAGGCCCTCTGCCTGGGATTCTAACTACATACAAAGCTTGAACTCACAATATAAGGAGAAGAAGT
GCTTAACGAGGCTTGAGGGTCTAATTGAGCAAGTAAAAGAATTAAAAGGGACGAAAATGGAAGCTGTGCAGCAACTGGAA
TTGATCGACGATTCCCAAAACCTTGGGCTATCATATTACTTTCAGGATAAAATTAAGCACATTCTCAATCTGATCTACAA
CGACCACAAGTATTTTTACGATAGCGAGGCAGAGGGGATGGACTTATATTTCACGGCCTTGGGTTTCCGTCTCTTTCGGC
AACACGGGTTCAAGGTCTCACAAGAAGTCTTCGATAGGTTCAAAAATGAAAACGGAACGTACTTTAAACACGATGACACT
AAAGGGCTACTTCAGCTTTATGAGGCGTCCTTCTTGGTCCGAGAAGGAGAGGAGACGCTAGAACAAGCACGAGAGTTCGC
CACTAAGAGCTTACAGAGAAAACTTGACGAGGACGGAGACGGAATTGACGCGAATATCGAATCATGGATACGGCATAGTC
TTGAGATACCCTTGCATTGGCGCGCGCAGCGTCTCGAAGCCCGCTGGTTTTTAGACGCATACGCTCGGAGGCCGGATATG
AACCCTGTTATTTTCGAATTGGCTAAGTTAAATTTTAATATTGTGCAAGCAACGCAACAAGAGGAGCTCAAGGCGCTTTC
TCGGTGGTGGTCGTCATTGGGCCTAGCCGAGAAACTACCATTTGTGAGAGACAGACTGGTGGAGTCATACTTTTGGGCCA
TTCCGCTATTTGAACCACATCAGTACGGTTACCAGCGAAAGGTGGCGACTAAGATAATTACGCTCATAACCTCATTAGAT
GACGTCTACGATATCTATGGAACCTTAGACGAATTGCAACTTTTCACCAACCTCTTCGAACGCTGGGATAACGCGTCGAT
CGGGAGGCTACCCGAATATCTGCAACTGTTTTACTTTGCGATACACAATTTTGTCAGTGAGGTCGCGTATGATATCCTGA
AAGAAAAGGGCTTCACTTCAATAGTATACTTACAAAGAAGTTGGGTTGATTTACTTAAAGGTTACCTCAAGGAAGCTAAA
TGGTACAACAGCGGGTATACGCCTTCGCTGGAAGAGTATTTTGACAATGCGTTCATGACGATAGGTGCGCCCCCGGTCCT
TTCTCAAGCCTACTTTACACTGGGTTCAAGCATGGAAAAACCCATTATAGAATCCATGTATGAATATGACAATATCCTAC
GAGTAAGCGGCATGCTGGTGCGCCTGCCTGACGATTTGGGAACAAGTTCGTTCGAAATGGAGCGCGGGGACGTTCCTAAA
TCCGTCCAGCTCTACATGAAGGAGACCAATGCAACTGAAGAGGAAGCAGTAGAACATGTGCGCTTTCTGAACAGGGAGGC
TTGGAAAAAAATGAACACTGCTGAGGCTGCGGGCGACTCGCCGTTAGTGTCCGACGTAGTTGCTGTAGCAGCCAATCTAG
GACGCGCAGCGCAATTTATGTATTTCGACGGAGATGGCAATCAATCCTCGTTGCAACAGTGGATTGTGTCCATGCTTTTC
GAGCCATATGCA
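Because most amino acids are encoded by several codons, reverse translation has many valid answers, and the sequence above is only one of them. The sketch below shows this for the first six residues of the protein (MSININ) using the standard genetic code. It is an illustration only and not the algorithm used by proteiniq.io. Picking the first codon in the table for each residue is an arbitrary choice made for this example.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def reverse_translate(protein):
    """Return one possible DNA sequence (first listed codon per residue)."""
    return "".join(AA_TO_CODONS[aa][0] for aa in protein)


def translate(dna):
    """Translate DNA codon by codon using the standard code."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the same protein."""
    return prod(len(AA_TO_CODONS[aa]) for aa in protein)


peptide = "MSININ"  # first six residues of the linalool synthase sequence
dna = reverse_translate(peptide)
n_encodings = count_encodings(peptide)
print(dna, translate(dna), n_encodings)
```

Even this six-residue peptide has 216 possible encodings, which is why a tool has to choose one sequence among many and why codon optimization is possible at all.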

3.3. Codon optimization

1. Why do we optimize codons?

Different organisms prefer different codons for the same amino acid. When a gene from one species is expressed in another, the codon usage may not match the host’s preferences, which can slow down translation and reduce protein expression.
Codon optimization rewrites the DNA sequence using the codons most frequently used by the host organism, without changing the amino acid sequence. This can increase translation efficiency, mRNA stability, and overall protein yield. It is a standard technique in biotechnology to improve recombinant protein production.
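The simplest form of this idea can be sketched in a few lines of Python: for every amino acid, pick the codon the host uses most often. The usage values in `TOY_USAGE` are invented for illustration and are not real Nicotiana tabacum or E. coli data. Real optimization tools typically also balance other factors such as GC content, so this is only a rough sketch of the principle.

```python
# Hypothetical relative codon frequencies for an imaginary host (made-up numbers).
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.30, "AAG": 0.70},
    "L": {"TTA": 0.10, "TTG": 0.20, "CTT": 0.25,
          "CTC": 0.15, "CTA": 0.10, "CTG": 0.20},
}


def optimize(protein, usage):
    """Encode each residue with the host's most frequent codon."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


toy_peptide = "MKL"  # placeholder peptide covered by the toy table
optimized = optimize(toy_peptide, TOY_USAGE)
print(optimized)
```

The amino acid sequence stays the same. Only the choice among synonymous codons changes.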

2. Which organism did you choose and why?

I chose Nicotiana tabacum for codon optimization because it is a plant species, just like lavender, the natural source of linalool synthase. Since both are plants, their codon usage patterns are more similar, making N. tabacum a more suitable host for expressing a plant-derived enzyme than bacteria or yeast.
In addition, several studies on related Nicotiana species show that linalool plays an important ecological role in plant defense. For example, Nicotiana attenuata emits (S)-(+)-linalool to attract predators of herbivores such as Manduca sexta, reducing leaf damage. Linalool is also known to have insecticidal and repellent properties in many species, including mosquitoes and agricultural pests.
Because N. tabacum is widely used in biotechnology, cosmetics, and pharmaceutical production, enhancing its natural protection against insects through increased linalool production could reduce the need for pesticides. Introducing a codon‑optimized linalool synthase gene from lavender into tobacco could therefore help the plant produce higher levels of linalool and benefit from its natural repellent and defensive properties.

Codon Optimization Using Two Different Tools

To ensure that my codon optimization results were reliable and not dependent on a single algorithm, I performed the optimization using two independent online tools, each of which uses different reference datasets and calculation methods for Nicotiana tabacum. Because of these differences, the CAI (Codon Adaptation Index) and GC% values vary slightly between platforms, which is expected.

Tool 1 — VectorBuilder Codon Optimization

https://en.vectorbuilder.com/tool/codon-optimization/59fae592-9784-4c9e-976c-f649a1865c8f.html

Metric	Before Optimization	After Optimization
CAI	0.67	0.88
GC content	46.34%	38.06%

This tool showed a clear improvement in CAI, indicating that the optimized sequence is much better adapted to the codon usage preferences of Nicotiana tabacum.

Tool 2 — NovoPro Codon Optimization

https://www.novoprolabs.com/tools/codon-optimization

Metric	Before Optimization	After Optimization
CAI	0.86	0.81
GC content	40.37%	42.55%

In this case, NovoPro’s reference dataset already rated the original sequence as well adapted (CAI 0.86). The optimized version scored slightly lower (CAI 0.81) with a slightly higher GC% (42.55% vs. 40.37%), so this tool did not improve the CAI for this gene.
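To show what the two metrics in these tables measure, here is a minimal Python sketch. GC content is the percentage of G and C bases. CAI is computed here as the geometric mean of each codon's frequency relative to the most used synonymous codon. The `TOY_USAGE` table contains invented numbers, and neither VectorBuilder's nor NovoPro's actual reference data or exact formula is reproduced, which is one reason the two tools report different values for the same gene.

```python
from math import exp, log

# Hypothetical relative codon frequencies for an imaginary host (made-up numbers).
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.30, "AAG": 0.70},
    "L": {"TTA": 0.10, "TTG": 0.20, "CTT": 0.25,
          "CTC": 0.15, "CTA": 0.10, "CTG": 0.20},
}


def gc_content(dna):
    """Percentage of bases that are G or C."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def cai(dna, usage):
    """Simplified Codon Adaptation Index: geometric mean of relative codon weights."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / best  # 1.0 for the most used codon
    codons_in_gene = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return exp(sum(log(weights[c]) for c in codons_in_gene) / len(codons_in_gene))


best_case = cai("ATGAAGCTT", TOY_USAGE)   # only the most frequent codons
worse_case = cai("ATGAAACTG", TOY_USAGE)  # same peptide, less frequent codons
print(gc_content("ATGAAGCTT"), best_case, worse_case)
```

A CAI of 1.0 means every codon is the host's preferred one. Lower values mean the gene uses rarer codons according to the chosen reference table.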

3.4. You have a sequence! Now what?

Now that I have my optimized DNA sequence, the next step is to think about how this DNA could actually be used to produce the protein. In general, the process is the same in any biological system: the DNA is transcribed into mRNA, and then the mRNA is translated by ribosomes into the protein.

One way to do this is by expressing the gene directly in a plant, such as Nicotiana tabacum. To get the gene into the plant genome, a genome‑editing tool like CRISPR–Cas9 could be used. Cas9 can make a cut at a specific location in the plant’s DNA, and then my optimized gene can be inserted at that site. After the gene is integrated, the plant’s own machinery will read it, make mRNA from it, and then produce the protein.

Another option would be to use cell‑free expression systems or express the gene in E. coli, for example, but plant expression is especially relevant when the protein is naturally part of a plant pathway.

Overall, once the DNA is inside the host (either a cell or a cell‑free system), the basic flow is the same: DNA → mRNA → protein.

DNA → RNA → Protein

3.5. [Optional] How does it work in nature/biological systems?

1. Describe how a single gene codes for multiple proteins at the transcriptional level.

A gene is first transcribed into a long RNA molecule called pre‑mRNA. This pre‑mRNA contains exons, which are kept, and introns, which are removed.
Alternative splicing: The cell can splice (cut and join) the exons in different combinations. Different exon combinations = different mRNA molecules.
Different mRNAs → different proteins: Each mRNA variant is translated into a protein. Because the combination of exons changes, the amino acid sequence changes too, so one gene can produce multiple proteins.

DNA → EXON1 — intron — EXON2 — intron — EXON3
↓ alternative splicing
mRNA Variant 1 → EXON1 + EXON2 + EXON3
mRNA Variant 2 → EXON1 + EXON3
↓ translation
Protein 1 ≠ Protein 2
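The same idea can be sketched in Python with three tiny made-up exons. The exon sequences below are hypothetical and chosen only so that both variants stay in frame. Sequences are written as DNA (T instead of U) for simplicity.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

# Hypothetical exons (not from a real gene).
EXONS = {"EXON1": "ATGGCT", "EXON2": "GAAAAA", "EXON3": "TTTTAA"}


def splice(exon_names):
    """Join the selected exons in order, as if the introns had been removed."""
    return "".join(EXONS[name] for name in exon_names)


def translate(coding_sequence):
    """Translate codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        aa = CODON_TO_AA[coding_sequence[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


variant_1 = splice(["EXON1", "EXON2", "EXON3"])
variant_2 = splice(["EXON1", "EXON3"])  # EXON2 is skipped
protein_1 = translate(variant_1)
protein_2 = translate(variant_2)
print(protein_1, protein_2)
```

Skipping EXON2 removes two residues from the protein, so the two variants give different proteins from the same gene.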

2. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!!

In nature, DNA is double-stranded, but only one strand is used as the template during transcription. RNA polymerase reads the template strand (3′→5′) and synthesizes a single-stranded RNA molecule based on base complementarity: A pairs with U (because RNA has no T), T pairs with A, G with C, and C with G.

After transcription, the mRNA is read in groups of three nucleotides called codons. Each codon corresponds to one amino acid. During translation, tRNA molecules bring amino acids to the ribosome by matching their anticodon to each codon on the mRNA. As amino acids join together, they form the polypeptide chain — the protein.

DNA 5′→3′: ATG TCA ATA AAC ATA AAT
DNA 3′→5′: TAC AGT TAT TTG TAT TTA
RNA 5′→3′: AUG UCA AUA AAC AUA AAU
AA: M S I N I N
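The alignment above can be reproduced with a short Python sketch that starts from the first 18 bases of the reverse-translated sequence. The function and variable names are my own. The codon table is the standard genetic code written with U instead of T.

```python
from itertools import product

# Standard genetic code for RNA codons, bases ordered U, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

coding_strand = "ATGTCAATAAACATAAAT"  # DNA 5'→3', first six codons of the gene

# Template strand, written 3'→5' so it lines up base by base with the coding strand.
template_strand = coding_strand.translate(str.maketrans("ACGT", "TGCA"))

# The mRNA is complementary to the template, so it matches the coding strand with U for T.
mrna = coding_strand.replace("T", "U")

protein = "".join(RNA_CODON_TO_AA[mrna[i:i + 3]] for i in range(0, len(mrna), 3))


def in_codons(sequence):
    """Insert a space every three characters for display."""
    return " ".join(sequence[i:i + 3] for i in range(0, len(sequence), 3))


print("DNA 5'->3':", in_codons(coding_strand))
print("DNA 3'->5':", in_codons(template_strand))
print("RNA 5'->3':", in_codons(mrna))
print("AA:", " ".join(protein))
```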

✨ Part 4: Prepare a Twist DNA Synthesis Order ✨

I created my Twist Bioscience account, and I already had a Benchling account from the previous step. In Benchling, I created a new DNA sequence named Linalool_E.coli, where I inserted the codon‑optimized DNA sequence of my gene of interest (Linalool synthase), optimized for E. coli. Before the coding sequence, I added the following genetic elements:

Promoter (BBa_J23106)

 TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC

RBS (BBa_B0034 – ribosome binding site)

 CATTAAAGAGGAGAAAGGTACC

Start Codon (ATG)

ATG

Coding Sequence (Linalool synthase, codon‑optimized for E. coli)

AGCATCAACATTAATATGCCGGCGGCGGCGGTGCTGCGCCCGTTTCGTTGCAGCCAGCTGCACGTTGATGAAACCCGTCGCAGCGGCAATTATCGTCCGAGCGCGTGGGATAGCAATTATATTCAGAGCCTGAATAGCCAGTATAAAGAAAAAAAATGTCTGACCCGCCTGGAAGGCCTGATTGAACAGGTGAAAGAACTGAAAGGCACCAAAATGGAGGCGGTGCAGCAGCTGGAACTGATCGATGATAGCCAGAATTTGGGCCTGAGCTATTATTTTCAGGATAAAATTAAACATATTCTGAACCTGATTTATAACGATCATAAATATTTTTACGATAGCGAAGCGGAAGGCATGGACCTGTACTTTACCGCGCTTGGCTTTCGCCTGTTTCGCCAGCATGGCTTTAAAGTGTCGCAGGAAGTGTTTGATCGCTTTAAAAACGAAAATGGCACCTATTTTAAACATGATGATACCAAAGGTCTGCTGCAGCTGTATGAAGCGAGCTTTCTGGTTCGCGAAGGCGAAGAAACCTTAGAACAGGCCCGCGAATTCGCGACGAAAAGCCTGCAGCGCAAACTGGATGAAGATGGCGATGGCATTGATGCGAACATTGAAAGCTGGATTCGCCACAGCCTGGAAATTCCGCTGCATTGGCGTGCGCAGCGCCTGGAAGCCCGCTGGTTTCTGGATGCCTACGCGCGCCGCCCGGATATGAATCCGGTGATTTTCGAACTGGCCAAACTGAACTTTAACATTGTGCAGGCGACCCAGCAAGAAGAACTGAAAGCGCTGAGCCGCTGGTGGAGCTCTCTGGGCCTGGCAGAAAAACTGCCGTTTGTGCGTGATCGTCTGGTGGAAAGCTATTTCTGGGCGATTCCGCTGTTTGAACCGCATCAGTATGGCTATCAGCGCAAAGTCGCGACCAAAATTATTACCCTGATTACCAGCCTGGATGATGTCTATGATATTTATGGCACCCTGGATGAACTGCAGCTGTTCACGAATTTATTTGAACGTTGGGATAACGCGAGCATTGGTCGCCTGCCGGAATATCTGCAGCTGTTCTATTTCGCGATCCATAATTTTGTGTCGGAAGTGGCCTATGATATTCTGAAAGAAAAAGGCTTTACCAGCATTGTGTACCTGCAGCGCTCCTGGGTGGATCTGCTGAAAGGCTACCTGAAAGAAGCGAAATGGTATAATTCAGGCTATACCCCGAGCCTGGAAGAATATTTTGATAATGCCTTCATGACGATTGGCGCCCCTCCGGTGCTGTCGCAGGCCTATTTCACCCTGGGCAGCAGCATGGAGAAACCGATTATTGAAAGCATGTATGAATATGATAATATTCTGCGTGTGAGCGGCATGCTGGTTCGCCTGCCGGATGATCTGGGCACCAGCAGTTTTGAAATGGAGCGCGGCGATGTGCCGAAAAGCGTGCAGCTGTACATGAAAGAAACCAACGCCACCGAAGAAGAAGCCGTGGAACATGTGCGCTTCCTGAATCGCGAAGCGTGGAAAAAAATGAATACCGCGGAAGCAGCGGGTGACAGCCCGCTGGTAAGCGATGTGGTGGCGGTGGCCGCGAACCTGGGCCGCGCAGCGCAGTTCATGTATTTTGATGGCGATGGCAACCAGAGCTCACTGCAGCAGTGGATTGTGAGCATGCTGTTTGAACCGTATGCG

After the coding sequence, I added:

7×His Tag

 CATCACCATCACCATCATCAC

Stop Codon (TAA)

TAA

Terminator (BBa_B0015)

 CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
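The order of the parts can be checked with a short Python sketch that joins them and runs a few sanity checks on the open reading frame. The promoter, RBS, His tag, and terminator are copied from the parts above. To keep the example short, the coding sequence is replaced by a placeholder containing only its first 18 bases, so the printed lengths do not match the real construct.

```python
PROMOTER = "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"  # BBa_J23106
RBS = "CATTAAAGAGGAGAAAGGTACC"                     # BBa_B0034
START = "ATG"
CDS_PLACEHOLDER = "AGCATCAACATTAATATG"  # placeholder: first 18 nt of the coding sequence
HIS_TAG = "CATCACCATCACCATCATCAC"       # seven histidine codons
STOP = "TAA"
TERMINATOR = (                          # BBa_B0015
    "CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGC"
    "TCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA"
)

# The open reading frame is everything the ribosome translates.
orf = START + CDS_PLACEHOLDER + HIS_TAG + STOP
construct = PROMOTER + RBS + orf + TERMINATOR

codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
stop_codons = {"TAA", "TAG", "TGA"}

in_frame = len(orf) % 3 == 0
no_internal_stop = all(codon not in stop_codons for codon in codons[:-1])
his_codons = [HIS_TAG[i:i + 3] for i in range(0, len(HIS_TAG), 3)]

print("ORF in frame:", in_frame)
print("No premature stop codon:", no_internal_stop)
print("His codons:", his_codons)
print("Construct length with placeholder CDS:", len(construct))
```

The in-frame check matters because the His tag is only translated correctly if the coding sequence before it is a whole number of codons.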

The resulting linear map can be seen in the first image, next to this text, and the second one is the visual diagram.

✨ This is the plasmid I just built! ✨

✨ Part 5: DNA Read/Write/Edit ✨

5.1 DNA Read

I would choose to sequence DNA from plants that naturally show resistance to agricultural pathogens, as well as DNA from the pathogens themselves. Understanding the genetic basis of plant immunity — for example, genes involved in pathogen recognition, antimicrobial compound production, or stress signaling — can help identify natural strategies that crops use to defend themselves without relying on chemical pesticides.

This connects directly with my own project, where I designed a plasmid for the biosynthesis of linalool in E. coli. Linalool is a naturally occurring monoterpene found in many aromatic plants, and it is known to have antimicrobial and insect‑repellent properties. By studying the DNA of pathogen‑resistant plants, we can discover how these organisms use compounds like linalool or related molecules as part of their defense systems.

Sequencing both plant and pathogen DNA would therefore support sustainable agriculture by revealing natural defense pathways that could be enhanced, transferred, or synthetically produced — reducing the need for synthetic insecticides and promoting more resilient crop systems.

Sequencing technology I would use

To sequence plant resistance genes and agricultural pathogens in a way that is fast, portable, and useful directly in the field, I would use Oxford Nanopore MinION. This technology allows rapid, on‑site DNA analysis, which supports sustainable agriculture by allowing early pathogen detection and reducing unnecessary pesticide use.

1. Generation	I would use Oxford Nanopore MinION, a third‑generation method. It reads DNA directly by measuring small electrical changes as the strand passes through a nanopore.
2. Input & Preparation	Input: purified DNA from plant tissue or pathogens. Preparation steps: • DNA extraction • Optional DNA cutting/fragmentation • Add Nanopore adapters • Load the sample into the MinION • Sequencing starts
3. How it reads the DNA	The DNA strand moves through a nanopore. Each base changes the electrical signal slightly. The device reads these signal patterns and the software turns them into A, T, C, or G.
4. Output	The output is the actual DNA reading: long sequences + quality scores (FASTQ files). This shows exactly which bases were detected in the sample.
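As a small illustration of the FASTQ output described in the table, the Python sketch below parses one record and converts its quality characters into Phred scores (quality character code minus 33). The record in `TOY_FASTQ` is invented for this example and is not real MinION data.

```python
from io import StringIO

# Hypothetical single-read FASTQ record: header, bases, separator, qualities.
TOY_FASTQ = "@read_1\nATGTCAATAAACATAAAT\n+\n555555555+++++++++\n"


def read_fastq(handle):
    """Yield (name, sequence, phred_scores) for each four-line FASTQ record."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


records = list(read_fastq(StringIO(TOY_FASTQ)))
name, sequence, scores = records[0]
mean_quality = sum(scores) / len(scores)

# A Phred score Q corresponds to an error probability of 10^(-Q/10).
error_probabilities = [10 ** (-q / 10) for q in scores]
print(name, sequence, mean_quality, max(error_probabilities))
```

In this toy read, the first nine bases have Q20 (about a 1% chance of being wrong) and the last nine have Q10 (about 10%).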

5.2 DNA Write

I would like to synthesize the DNA for a plant gene that naturally produces linalool, a fragrant molecule with mild antimicrobial and insect‑repellent properties. Adding this gene to Nicotiana tabacum could help the plant protect itself better in a natural way, supporting more sustainable agriculture without relying on chemical pesticides.

What are the essential steps of your chosen sequencing methods?

I would use commercial DNA synthesis technology, such as the automated chemical DNA writing used by companies like Twist Bioscience. This method can quickly and accurately produce the exact DNA sequence I want, including the gene responsible for linalool production. It is reliable, fast, and ideal for creating small custom DNA fragments for research.

What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

The main limitations of DNA synthesis are related to length, errors, time, and cost. Longer DNA sequences are harder to synthesize and may take more time to produce. Errors can appear during synthesis, so the final DNA often needs to be checked. The process can also take longer for complex or larger sequences. Finally, one of the biggest limitations is cost, because high‑quality synthesis technologies and equipment are expensive.

5.3 DNA Edit

If I could edit DNA, I would choose to modify the genome of common ragweed (Ambrosia artemisiifolia). I would edit the genes responsible for pollen development and allergenic pollen proteins. I would target two types of DNA regions.

First, I would edit genes controlling pollen formation so the plant becomes male-sterile and produces non-viable pollen. This could be done using modern gene-editing technologies such as CRISPR, which allow precise mutations in specific genes. If the pollen cannot develop properly, the plant would release little or no functional pollen, which would strongly reduce allergy problems and also limit the plant’s uncontrolled spread.

Second, I would consider modifying the DNA coding for the main allergenic pollen proteins, such as the Amb a allergens, so that their structure becomes less likely to trigger immune reactions. Even if allergies cannot be completely removed, reducing both pollen quantity and allergen strength could significantly decrease the public health impact.

AI generated

The reason I would edit ragweed DNA is that this plant already grows naturally in polluted and disturbed environments and tolerates poor soil conditions. Because of this, it could potentially be used for opportunistic phytoremediation, meaning helping absorb some heavy metals from contaminated soil without needing intensive cultivation. Currently, public policy focuses on elimination, not domestication, of ragweed because of its allergy risk. However, if genetic editing reduced pollen hazards and spread, the plant might instead be safely managed and used in an ecological direction for environmental cleanup.

Therefore, editing ragweed DNA could transform a harmful invasive species into a controlled plant with potential environmental benefits while reducing risks to human health.

1. How does your technology of choice edit DNA? What are the essential steps?

A: The DNA editing technology I would use for both approaches is CRISPR. For the first, CRISPR would target pollen-development genes by designing guide RNA, cutting the DNA with Cas9, and letting the cell repair it to create non-viable pollen, while for the second, CRISPR would target allergen genes, cut the DNA at epitope regions, and use a repair template to introduce small changes so the protein becomes less allergenic.

2. What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

A: The DNA editing technology I would use for both approaches is CRISPR. For the first, I would prepare a guide RNA targeting the pollen-development gene, and for the second, a guide RNA targeting the allergen gene along with a DNA template for small changes. The input for the editing includes the plant cells, the Cas9 enzyme, the guide RNAs, and the repair template (for the allergen modification), which together allow the plant to make the desired DNA changes.
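One of the design steps, choosing a guide RNA target, can be sketched in Python. This assumes the commonly used Cas9 from Streptococcus pyogenes, which needs an NGG PAM directly after a 20-nt target. The `target_region` sequence is invented and is not from ragweed. The sketch scans only the forward strand and does not check for off-target sites, which a real design tool would do.

```python
import re

# Hypothetical stretch of a target gene (made-up sequence).
target_region = "ATAT" + "GACCTGAAGCTTCGATACGA" + "TGG" + "ATAT"


def find_guide_sites(dna):
    """Return (start, protospacer, pam) for every 20-nt site followed by NGG."""
    sites = []
    # The lookahead lets overlapping candidate sites be reported as well.
    for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", dna):
        sites.append((match.start(), match.group(1), match.group(2)))
    return sites


guide_sites = find_guide_sites(target_region)
for start, protospacer, pam in guide_sites:
    print(f"position {start}: guide target {protospacer}, PAM {pam}")
```

The 20-nt target found this way becomes the variable part of the guide RNA. The PAM itself is not included in the guide.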

3. What are the limitations of your editing methods (if any) in terms of efficiency or precision?

A: The main limitations of CRISPR are that DNA repair is not always perfect, which can cause unintended mutations, and not all cells may be successfully edited, so the efficiency is less than 100%. Precise changes, like modifying allergen proteins, are harder to achieve than simply turning a gene off, making the method less precise for complex edits.


Week 2 – Lecture Prep

AI generated

Questions

1. What’s the most commonly used method for oligo synthesis currently?

Answer: The most commonly used method for oligo synthesis today is phosphoramidite DNA synthesis.

2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Answer: It is difficult because each chemical synthesis step has less than 100% efficiency, so errors accumulate with length, making oligos longer than ~200 nt unreliable.

3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Answer: Chemical DNA synthesis has an error rate of about 1 in 100 bases. Over 2000 bases these errors accumulate, so almost no molecule would be a correct full‑length gene, which makes direct synthesis of the whole gene impractical.
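The effect of length can be estimated with a short calculation. If every base is correct with probability 0.99 (the 1 in 100 error rate quoted above) and errors at different positions are treated as independent, which is a simplifying assumption, the fraction of error-free molecules is 0.99 raised to the length. A minimal Python sketch:

```python
def error_free_fraction(length_nt, per_base_accuracy=0.99):
    """Fraction of molecules with no error, assuming independent errors per base."""
    return per_base_accuracy ** length_nt


fraction_200 = error_free_fraction(200)
fraction_2000 = error_free_fraction(2000)

print(f"200 nt:  {fraction_200:.3f}")
print(f"2000 nt: {fraction_2000:.2e}")
```

Under these assumptions roughly 13% of 200-nt oligos are error-free, but only about 2 in a billion 2000-nt molecules would be, which is why long genes are assembled from shorter oligos instead.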

Week 3 – Lab Automation

✨ Week 3 - homework ✨

Here is the link to my Automation Art (2026 HTGAA Bacteriophage): https://opentrons-art.rcdonovan.com/?id=jy86j81azdyuadc

After generating this bacteriophage design in Opentrons Art, I created a copy of the Colab notebook and worked there to build a Python protocol that would allow the Opentrons robot to reproduce the artwork on a plate. Since I don’t know Python, I first used ChatGPT to generate the code, but the initial version contained many errors when running in Colab. I then switched to Gemini, which helped me debug and fix the issues.

I manually entered the coordinates from the link above, step by step, to reconstruct the design inside the protocol. After completing the process, I obtained the following image.

✨ Post-Lab Questions ✨

1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Article Title: AssemblyTron: Flexible automation of DNA assembly with Opentrons OT‑2 lab robots
Authors: John A. Bryant Jr., Mason Kellinger, Cameron Longmire, Ryan Miller, R. Clay Wright
Year: 2022
DOI: https://doi.org/10.1101/2022.09.29.510219

Article 1

In the article “AssemblyTron: Flexible automation of DNA assembly with Opentrons OT‑2 lab robots,” the authors aim to demonstrate the value of using automation tools in scientific research. Their main objective is to show how the Opentrons OT‑2 platform can simplify laboratory workflows, reduce human error, and significantly shorten reaction setup time.

The study uses plasmid DNA, including E. coli plasmids carrying chromoprotein genes and plasmids encoding plant transcription factors. By working with these constructs, the authors illustrate several capabilities of the OT‑2 robot. For example, the system automatically prepares PCR reactions with optimized annealing temperatures, performs Golden Gate Assembly to build multi‑fragment plasmids with high accuracy, and executes homology‑based assembly methods such as AQUA and IVA for cloning and site‑directed mutagenesis.

Overall, the article highlights how integrating Opentrons automation into the Design‑Build‑Test‑Learn cycle can make molecular biology more efficient, reliable, and accessible to researchers.

Article 2

However, the second article, “Real‑time AI‑driven quality control for laboratory automation,” demonstrates that even though automation brings many advantages, systems like the Opentrons OT‑2 are not completely error‑free. The authors highlight that issues such as missing pipette tips, incorrect liquid volumes, or failed aspiration steps can still occur during automated workflows.

To address these limitations, the study introduces an AI‑based computer‑vision system capable of detecting such errors in real time. By integrating a YOLOv8 deep‑learning model with a camera mounted on the OT‑2, the system continuously monitors pipetting actions and alerts the user when something goes wrong.

This shows that while automation improves efficiency and reproducibility, additional quality‑control tools are essential to ensure reliability, especially in sensitive biological experiments.

2. Write a description about what you intend to do with automation tools for your final project.

For my final project, I want to explore how automation tools can support a workflow focused on improving plant-based strategies for reducing heavy metal contamination in soil. I’m also interested in how genetic modifications could help plants grow with fewer pesticides. To make the experimental steps more reliable and easier to repeat, I plan to automate several parts of the Design–Build–Test cycle.

I would use the Opentrons OT‑2 to automate tasks such as preparing PCR reactions, assembling genetic constructs, and setting up transformation mixes. Automating these steps would reduce pipetting errors and make it easier to test multiple gene variants in parallel. I may also design a 3D‑printed holder to keep plant DNA extraction tubes stable on the OT‑2 deck, since plant samples often come in irregular formats.

Here is an example of how part of the workflow could look in pseudocode:

# Automated workflow for plant construct testing
load_labware("PCR_plate")
load_reagents(["master_mix", "template_DNA", "primer_sets"])
for variant in gene_variants:
    pipette.transfer(master_mix, PCR_well[variant])
    pipette.transfer(template_DNA, PCR_well[variant])
    pipette.transfer(primer_sets[variant], PCR_well[variant])
run_thermocycler("PCR_program")
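As a rough illustration of the pseudocode above, the transfer steps can first be written out as plain Python data before being moved into a real robot protocol. This is only a sketch: the variant names, the column-by-column well assignment, and the volumes are hypothetical placeholders, and no Opentrons API calls are used.

```python
from string import ascii_uppercase

def well_names(n, rows=8, cols=12):
    """Return the first n well names of a 96-well plate, column by column (A1, B1, ...)."""
    if n > rows * cols:
        raise ValueError("more variants than wells")
    names = [f"{ascii_uppercase[r]}{c + 1}" for c in range(cols) for r in range(rows)]
    return names[:n]

def build_pcr_plan(gene_variants, master_mix_ul=12.5, template_ul=1.0, primer_ul=1.0):
    """List every transfer as (reagent, destination well, volume in microliters)."""
    plan = []
    for variant, well in zip(gene_variants, well_names(len(gene_variants))):
        plan.append(("master_mix", well, master_mix_ul))
        plan.append(("template_DNA", well, template_ul))
        plan.append((f"primer_set_{variant}", well, primer_ul))
    return plan

# Hypothetical variant names, for illustration only
plan = build_pcr_plan(["variantA", "variantB", "variantC"])
for step in plan:
    print(step)
```

Keeping the plan as data makes it easy to check the number of transfers and destination wells before any liquid is moved.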

Later, I could use Ginkgo Nebula to explore or simulate different gene designs related to metal uptake or pest resistance, helping me decide which constructs are worth testing. Overall, automation would make the workflow more efficient and reproducible, allowing me to focus more on analyzing plant performance rather than repeating manual prep steps.

✨ Final Project Ideas ✨

As explained in this week’s recitation, I created three slides in my Node’s section of the shared slide deck, each presenting a different idea for my Individual Final Project.

These ideas reflect different ways I could combine synthetic biology, environmental applications, and automation tools for my final project.

Week 4 — Protein Design Part I

✨ Part A. Conceptual Questions ✨

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Using an online converter ( https://www.unitconverters.net/weight-and-mass/gram-to-dalton.htm ), I calculated that 100 Daltons (1 amino acid) corresponds to approximately 1.66 × 10⁻²² g. After dividing the mass of a 500 g piece of meat by this value, I found the total number of amino acid molecules:

500 g ÷ (1.66 × 10⁻²² g/molecule) ≈ 3.01 × 10²⁴ molecules

This means there are ~ 3.01 × 10²⁴ amino acid molecules in 500 grams of meat.
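The same estimate can be cross-checked with Avogadro's number instead of a unit converter. A minimal sketch, assuming (as the question does) that the whole 500 g consists of amino acids of about 100 Da each:

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def amino_acid_count(mass_g, avg_mass_da=100.0):
    """Number of molecules in mass_g, given the average molecular mass in Da (= g/mol)."""
    moles = mass_g / avg_mass_da
    return moles * AVOGADRO

n = amino_acid_count(500)
print(f"{n:.2e} molecules")
```

500 g at 100 g/mol is 5 mol, so the result agrees with the value obtained from the converter.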

2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Humans eat beef or fish, but we do not become cows or fish because each species has its own unique genome. Eating proteins from another species does not change our DNA; our body simply digests the proteins into amino acids and uses them to build its own proteins. We only take the building blocks, not the instructions for making another species.

3. Why are there only 20 natural amino acids?

There are only 20 natural amino acids because the genetic code in DNA and mRNA is built to encode only these 20. Although there are 64 codons, many codons code for the same amino acid (redundancy in the genetic code). Scientists are experimenting with creating non-natural amino acids to expand the range of possible proteins, but in nature, only 20 are used.
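The redundancy mentioned above can be counted directly. The sketch below builds the standard genetic code (NCBI translation table 1, written in its usual T, C, A, G codon order) and tallies how many codons map to each amino acid; `*` marks stop codons.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(c) for c in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

codons_per_aa = Counter(codon_table.values())
n_stop = codons_per_aa.pop("*")  # separate stop codons from amino acids

print(len(codon_table), "codons in total")
print(len(codons_per_aa), "amino acids,", n_stop, "stop codons")
print("Codons for leucine:", codons_per_aa["L"])
print("Codons for tryptophan:", codons_per_aa["W"])
```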

4. Can you make other non-natural amino acids? Design some new amino acids.

I’m not completely sure how it works, but I remember from George Church’s slides that scientists can create new non-natural nucleobases. I guess that by using these artificial bases in the genetic code, it might be possible to produce new non-natural amino acids, although I don’t know the exact method.

5. Where did amino acids come from before enzymes that make them, and before life started?

Based on the information I found in this article https://doi.org/10.1002/chem.202201419 , amino acids existed before life and before enzymes, formed through non-enzymatic chemistry. Enzymes appeared later as proteins that accelerated chemical reactions, including the synthesis of other proteins. Experimental evidence for prebiotic amino acid formation comes from the 1953 Miller–Urey experiment, in which gases such as CH₄, NH₃, H₂O, and H₂ were exposed to electrical discharges, producing amino acids. ( https://www.britannica.com/science/Miller-Urey-experiment )

6. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

D-amino acids (D = dextro, right) are enantiomers, meaning they are mirror versions of L-amino acids (the natural amino acids in proteins, L = levo, left). If you make an α-helix using D-amino acids, the helix will be left-handed. Even though L-amino acids are “left” in configuration, when they form a helix they twist to the right because this is the most stable arrangement for hydrogen bonds and steric interactions. D-amino acids are mirror images, so their α-helix twists in the opposite direction.

7. Can you discover additional helices in proteins?

I’m not sure, but I guess it might be possible to discover additional or unusual helices in proteins that we don’t know yet. Methods like X-ray crystallography or NMR might reveal new structures, but I don’t know the details.

8. Why most molecular helices are right-handed?

Most molecular helices are right-handed because the natural amino acids in proteins are L-amino acids (left-handed in configuration). When L-amino acids fold into a helix, the right-handed α-helix is the most stable arrangement due to optimal hydrogen bonding and minimal steric strain.

9. Why do β-sheets tend to aggregate? -What is the driving force for β-sheet aggregation?

β-sheets tend to aggregate because their backbone groups (NH and CO) can form extensive hydrogen bonds with neighboring strands from other molecules. The main driving force for β-sheet aggregation is hydrogen bonding, together with hydrophobic interactions, which increase structural stability and lower the overall energy of the system.

AI generated

10. Why do many amyloid diseases form β-sheets? -Can you use amyloid β-sheets as materials?

Many amyloid diseases form β-sheets because misfolded proteins adopt β-sheet–rich structures that can form extensive hydrogen bonds between different molecules. This leads to stable aggregates called amyloid fibrils, which accumulate in tissues and cause disease.

Yes, amyloid β-sheets can be used as materials because they form highly stable and mechanically strong fibrils. Scientists are studying them as biomaterials for nanotechnology and medical applications.

11. Design a β-sheet motif that forms a well-ordered structure.

I am not completely sure how to design a specific β-sheet motif, but I would use amino acids that favor β-sheet formation, like valine and isoleucine. These amino acids are hydrophobic and have side chains that fit well in the extended β-strand structure, which helps the sheet stay stable. I would also alternate hydrophobic and polar residues, because in β-strands the side chains point up and down, so this pattern allows the sheet to interact with water on one side and form a stable hydrophobic core on the other. Together with hydrogen bonds between the strands, this could make a well-ordered and stable β-sheet.

✨ Part B. Protein Analysis and Visualization ✨

1. Briefly describe the protein you selected and why you selected it.

I selected the Cry1A.105 protein from Bacillus thuringiensis. Cry1A.105 is a chimeric δ-endotoxin used in genetically modified Bt crops for insect pest control. The structure available corresponds to its tryptic core, which represents the active form of the toxin.

I chose this protein because it has a well-resolved 3D crystal structure and a clearly defined three-domain organization, making it ideal for structural analysis. Additionally, it is biologically and biotechnologically relevant, as it contributes to sustainable agriculture by reducing the need for chemical insecticides.

2. Identify the amino acid sequence of your protein.

From the RCSB Protein Data Bank ( https://www.rcsb.org/structure/6DJ4 ), I downloaded the FASTA sequence:

 >>6DJ4_1|Chain A|Cry1A.105|Bacillus thuringiensis
IETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFLVQIEQLINQRIEEFARNQAISRLEGLSNLYQIYAESFREWEADPTNPALREEMRIQFNDMNSALTTAIPLFAVQNYQVPLLSVYVQAANLHLSVLRDVSVFGQRWGFDAATINSRYNDLTRLIGNYTDHAVRWYNTGLERVWGPDSRDWIRYNQFRRELTLTVLDIVSLFPNYDSRTYPIRTVSQLTREIYTNPVLENFDGSFRGSAQGIEGSIRSPHLMDILNSITIYTDAHRGEYYWSGHQIMASPVGFSGPEFTFPLYGTMGNAAPQQRIVAQLGQGVYRTLSSTLYRRPFNIGINNQQLSVLDGTEFAYGTSSNLPSAVYRKSGTVDSLDEIPPQNNNVPPRQGFSHRLSHVSMFRSGFSNSSVSIIRAPMFSWIHRSAEFNNIIASDSITQIPLVKAHTLQSGTTVVRGPGFTGGDILRRTSGGPFAYTIVNINGQLPQRYRARIRYASTTNLRIYVTVAGERIFAGQFNKTMDTGDPLTFQSFSYATINTAFTFPMSQSSFTVGADTFSSGNEVYIDRFELIPVTATLEAEYNLER

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

The Cry1A.105 protein consists of 591 amino acids, with serine (S) being the most frequent, appearing 54 times in the sequence.
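The counting step can also be reproduced outside the notebook with a few lines of Python. This is a minimal sketch, not the course notebook's code; the demo string is only the first 34 residues of the 6DJ4 sequence above, so its numbers differ from those of the full protein.

```python
from collections import Counter

def summarize_sequence(sequence):
    """Return (length, most frequent residue, its count) for a one-letter protein sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    counts = Counter(seq)
    residue, count = counts.most_common(1)[0]
    return len(seq), residue, count

# Short demo fragment; paste the full FASTA sequence (without the header line) to repeat the analysis
length, residue, count = summarize_sequence("IETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVD")
print(length, residue, count)
```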

- How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

Using UniProt’s BLAST tool, I found 250 protein sequence homologs for Cry1A.105, representing proteins with significant sequence similarity in Bacillus thuringiensis and related species.

- Does your protein belong to any protein family?

Yes, Cry1A.105 belongs to the Cry protein family (δ-endotoxins) produced by Bacillus thuringiensis. Specifically, it is part of the Cry1 subfamily, which consists of insecticidal proteins with a conserved three-domain structure and pore-forming mechanism in susceptible insect midguts.

3. Identify the structure page of your protein in RCSB

When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

The Cry1A.105 structure was deposited on 24 May 2018 and released on 12 September 2018. It was solved using X-ray diffraction and has a resolution of 3.01 Å. This is a moderate resolution: it is suitable for analyzing the overall fold and domain organization, although individual side-chain positions are less certain than in higher-resolution structures.

Are there any other molecules in the solved structure apart from protein?

Yes, apart from the protein polymer, the solved structure contains water molecules. No other ligands or cofactors are present. The entry also reports crystallographic information, including the space group (C 1 2 1).

Does your protein belong to any structure classification family?

Yes, Cry1A.105 belongs to the delta-endotoxin family, C-terminal domain, according to SCOP classification. The classification is based on chain A, residues 477–609, from Bacillus thuringiensis serovar aizawai.

4. Open the structure of your protein in any 3D molecule visualization software:

RIBBON

BALL AND STICK

When the protein is colored by secondary structure, it shows both α-helices and β-sheets. In Cry1A.105, I would say that β-sheets are more abundant than α-helices, forming the main structural framework of the protein.

When colored by residue type, Cry1A.105 shows that light green (polar) and grey (charged) residues predominate, mostly on the surface, while dark green (hydrophobic) residues are mainly buried in the interior. This distribution reflects the typical organization of soluble proteins, with hydrophobic cores and hydrophilic exteriors.

I visualized the solvent-accessible surface of Cry1A.105 and observed several cavities. I highlighted one region that seems to be a pocket, but I am not very confident in identifying binding pockets, so I am not sure if this is correct.

✨ Part C. Using ML-Based Protein Design Tools ✨

C1. Protein Language Modeling

Deep Mutational Scans

a. Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.

I used ESM2 to generate a deep mutational scan for Cry1A.105 (PDB ID: 6DJ4). The results are shown as a heatmap, where each position in the sequence is tested with different possible mutations.

b. Can you explain any particular pattern? (choose a residue and a mutation that stands out)

In the heatmap, yellow represents beneficial or tolerated mutations, while dark blue represents unfavorable mutations. I noticed that some positions are mostly dark blue, which suggests they are important for the protein structure and do not tolerate changes well.

For the positive mutation, at position 373, changing the residue to L (Leucine) gives a high score (+2.80). This basically means the model thinks leucine fits well there. The position is probably flexible or not very important structurally, so swapping in a hydrophobic residue like leucine doesn’t cause problems. In other words, the protein seems totally fine with this mutation.

For the negative mutation, at position 334, changing the residue to C (Cysteine) gives a very low score (–5.46). This tells us the model really dislikes this substitution. Cysteine is reactive and can form disulfide bonds, so putting it in the wrong place can easily mess up the structure. This suggests that position 334 is more sensitive and doesn’t tolerate big chemical changes.
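One common way to turn language-model probabilities into a mutational-scan score is the log-likelihood ratio between the mutant and the wild-type residue at a position. Whether the course notebook uses exactly this definition is an assumption on my part, and the probabilities below are made-up numbers for illustration, not ESM2 output.

```python
import math

def mutation_score(p_mutant, p_wild_type):
    """Log-likelihood ratio: positive favors the mutant, negative favors the wild type."""
    return math.log(p_mutant) - math.log(p_wild_type)

# Hypothetical model probabilities at a single position
tolerated = mutation_score(0.30, 0.02)     # mutant much more likely than wild type
disfavored = mutation_score(0.001, 0.25)   # mutant much less likely than wild type
print(round(tolerated, 2), round(disfavored, 2))
```

Under this reading, a score such as +2.80 means the model assigns the mutant residue a higher probability than the wild-type one, and a score such as –5.46 means the opposite.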

c. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.

I searched for experimental mutational scans (such as Deep Mutational Scanning datasets) for Cry1A.105, but no DMS data were available for this protein. Therefore, a comparison between language model predictions and experimental results (e.g., using a heatmap) could not be performed.

Latent Space Analysis

a. Use the provided sequence dataset to embed proteins in reduced dimensionality.

- I used the provided protein sequence dataset and generated embeddings with a pretrained model (ESM2). These embeddings are numerical representations of each protein.

- Then, I reduced their dimensionality using t-SNE and visualized them in a 3D interactive plot. Each point represents one protein, and similar proteins tend to cluster together.

b. Analyze the different formed neighborhoods: do they approximate similar proteins?

In the t‑SNE embedding, proteins that appear close to each other form neighborhoods that reflect real structural similarity. For example, one of the clusters I inspected contained proteins from very different species (PDB IDs 3UKN, 4D7S, and 5J3U), but all of them share the same type of domain: a cyclic‑nucleotide binding domain (CNBD/CNBHD). Even though they come from zebrafish, a thermophilic bacterium, and Toxoplasma gondii, their 3D structures belong to the same SCOPe fold family (b.82.x.x). This explains why the model places them close together.

Conclusion: the neighborhoods approximate similar proteins — the embedding groups proteins by structural fold, not by species.

c. Place your protein in the resulting map and explain its position and similarity to its neighbors.

How I highlighted my protein in the t‑SNE map

To clearly identify my protein in the t‑SNE embedding, I added a small piece of code that colors all SCOPe proteins in blue and my protein in green. I also increased the marker size for my protein so it stands out from the rest of the points in the map. This allowed me to easily locate it in the 3D projection and inspect its closest neighbors. By highlighting it in green, I could immediately see which proteins cluster around it and analyze their structural similarity to mine.
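The highlighting logic described above can be sketched as follows. This is not the exact code from my notebook: the function name, marker sizes, and label list are illustrative assumptions. It builds one color and one size per point, which most plotting libraries accept as per-point marker settings.

```python
def marker_styles(labels, my_label, base_color="blue", my_color="green",
                  base_size=3, my_size=10):
    """Return parallel lists of colors and sizes, one entry per point."""
    colors = [my_color if label == my_label else base_color for label in labels]
    sizes = [my_size if label == my_label else base_size for label in labels]
    return colors, sizes

# Hypothetical labels: a few SCOPe entries plus the query protein
labels = ["3UKN", "4D7S", "6DJ4", "5J3U"]
colors, sizes = marker_styles(labels, "6DJ4")
print(colors)
print(sizes)
```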

In the t-SNE map, 6DJ4 is located close to the following neighbors:

2BVC – glutamine synthetase
5Z37 – Abrin A chain
3MVG – IRIP (ribosome-inactivating protein)
2BU9 – Isopenicillin N synthase

Although these proteins have different biological roles (metabolic enzyme, toxin, oxidase), they share important structural similarities. All are relatively large, soluble, globular proteins with a mixed α/β architecture and compact catalytic-like domains. The fact that 6DJ4 clusters with these neighbors suggests that the embedding captures similarities in overall fold, secondary structure composition, and global 3D organization rather than strict functional similarity.

C2. Protein Folding

Folding a protein

1. Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

I folded my protein using ESMFold in Google Colab and obtained a predicted 3D structure. In the comparison image, the predicted model is shown on the left and the original PDB structure on the right.

The two structures look quite similar overall. The general shape of the protein and the arrangement of the main secondary structure elements (α-helices and β-sheets) are preserved. This means that ESMFold was able to correctly predict the overall fold of the protein.

There are small differences in some regions, probably in flexible loops, which is normal for structure prediction.

2. Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

To test whether my protein structure is resilient to mutations, I introduced a single conservative point mutation in the middle region of the sequence (approximately positions 250–300), since central regions are generally more structurally stable than terminal ends.

Specifically, I replaced an isoleucine (I) with valine (V), changing the sequence fragment from:

MDILNSITIYTDA to: MDVLNSITIYTDA

The mutated structure (Figure A) appears very similar to both the original ESMFold prediction (Figure B) and the experimental PDB structure (Figure C). The global fold and overall 3D organization are preserved.

This suggests that the protein is resilient to small, conservative point mutations, as a single amino acid substitution did not significantly alter the overall structure.

The larger segment mutation (MDILNSI → AAAAAAA) in Figure A causes a clear structural change compared to the original predicted (Figure B) and experimental PDB (Figure C) structures, showing that the protein fold is disrupted by extensive mutations.
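Both edits can be generated programmatically before pasting the sequences into ESMFold. A minimal sketch, using only the short fragment shown above rather than the full protein; the helper name `substitute` is my own:

```python
def substitute(sequence, old, new):
    """Replace the first occurrence of the fragment `old` with `new` (same length)."""
    if len(old) != len(new):
        raise ValueError("old and new fragments must have the same length")
    start = sequence.find(old)
    if start == -1:
        raise ValueError(f"fragment {old!r} not found")
    return sequence[:start] + new + sequence[start + len(old):]

fragment = "MDILNSITIYTDA"
point_mutant = substitute(fragment, "MDI", "MDV")            # single I -> V substitution
segment_mutant = substitute(fragment, "MDILNSI", "AAAAAAA")  # seven-residue alanine block
print(point_mutant)
print(segment_mutant)
```

Requiring equal lengths keeps the residue numbering unchanged, which makes the mutant and original structures easier to compare.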

C3. Protein Generation

Inverse-Folding a protein

1. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

I performed inverse folding using the backbone coordinates from the PDB structure (6DJ4) and obtained a new sequence predicted to be compatible with the 3D structure. The predicted sequence is:

 >ASPETVALRLTRFLIEDNELGAKFAKGLVDIVWGAEGPALWREWIAAVEDLIGRAVPEAVRDAAIAAAEELNKLFKEFVATVEAWEADPTDPALRKAMIEAFERLIAALEEALPLYRPPGWAIPLLPLYVLVSILYLYVLSYVSKFGKKWGFSEEKIEEYKKKLKESIISFTEFVLKHYHEGLAAVRGPTEADFVRYLRYERVMTITCLDLVALFEKFDTDLFPIKVRFQLTRVLYFNPVLEARDEDLPGSAEAIRAALPAPALVRNLVSTTFYLGSVNGKTFISGLVNTSRPAGGPLPETTDPLRGVAETALPTGFTVTSRGDGIVNLKASVVYENPEPXXXKKLTLRTLDGITFEYGSTSSMAPDVVLKSGDVDTRDVFPPARTDVPYTAGFSFKLSDISMYYEGDLTGSDRIVRSPIFGFRHRSATDNNDVYPDQITIIPLTRATRLYPGVTVVKGPGFLGGDLLKITSPGNLARLSLRLKXXXGVTYQFRVRYSANADFTVWVTVDGTRTLSTNCSKTFNAGEPLTPKSFKYCTIPESFTFEKPTFTLDVGASNFPSGNTFYVDYVELVPTSL

2. Input this sequence into ESMFold and compare the predicted structure to your original.

Next, I input the predicted sequence from ProteinMPNN into ESMFold, following the same procedure as before. The resulting 3D structure is very similar to the original PDB structure, with the overall α/β fold and domain organization preserved.

This indicates that the inverse folding approach was successful: the predicted sequence is largely compatible with the original fold and reproduces the protein’s global 3D structure in the ESMFold prediction.
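To put a number on "predicted sequence vs the original one", sequence recovery (the fraction of identical positions) is a simple metric. The sketch below assumes the two sequences are already aligned position by position and skips unknown residues (X); the example strings are toy data, not the real 6DJ4 sequences.

```python
def sequence_recovery(original, designed, unknown="X"):
    """Fraction of aligned positions with identical residues, skipping unknown residues."""
    if len(original) != len(designed):
        raise ValueError("sequences must have the same length (align them first)")
    compared = [(a, b) for a, b in zip(original, designed) if unknown not in (a, b)]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return matches / len(compared)

# Toy example: 7 comparable positions, 6 of them identical
recovery = sequence_recovery("GLVDIIWG", "GLVDIVWX")
print(f"{recovery:.2f}")
```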

✨ Part A. Conceptual Questions ✨

Engineering the MS2 L Protein: Disrupting Interaction with E. coli DnaJ

Goal

Disrupt the interaction between MS2 L protein and E. coli DnaJ by truncating the N-terminal domain (aa 1–35) while keeping the transmembrane domain intact.

Tools

AlphaFold Server – structure prediction for WT and truncated L protein, complex prediction with DnaJ.
ColabFold (AlphaFold-Multimer) – interactive 3D structure prediction and visualization of WT and truncated complexes.

Pipeline

WT-L-protein

Truncated-L-protein

WT-L-protein + DnaJ

Truncated-L-protein + DnaJ

MS2 L protein sequence (WT, 75 aa)

▼

Identify N-terminal domain (aa 1–35)
responsible for DnaJ interaction

▼

Generate truncated variant (Δ1–35, i.e. residues 36–75 retained)

▼

Predict 3D structures of WT & truncated L protein
Tools: AlphaFold Server

▼

Predict interaction with DnaJ
Tools: AlphaFold-Multimer / ColabFold

▼

Compare WT vs truncated

WT: N-terminal tail contacts DnaJ
Truncated: helix alone, no contact
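The truncation step in the pipeline is a simple slice of the sequence. A minimal sketch, using the 75-residue L protein sequence as it is listed in the Week 5 section of this page; the helper name is my own:

```python
def truncate_n_terminus(sequence, n_removed):
    """Remove the first n_removed residues and return the remaining C-terminal part."""
    if not 0 <= n_removed < len(sequence):
        raise ValueError("n_removed must be between 0 and len(sequence) - 1")
    return sequence[n_removed:]

# MS2 L protein sequence as listed in the Week 5 section of this page
WT_L = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT"
truncated = truncate_n_terminus(WT_L, 35)
print(len(WT_L), len(truncated))
print(truncated)
```

The truncated sequence (residues 36–75) is what goes into the structure and complex predictions in place of the WT sequence.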

Results

Variant	Observation	pTM	ipTM
WT L-protein + DnaJ	N-terminal tail interacts with DnaJ	0.54	0.12
Truncated L-protein (Δ1–35) + DnaJ	Helix alone, no contact	0.59	0.11

In the WT prediction, the N-terminal tail of L protein lies near DnaJ, which is consistent with a possible interaction but, given the low ipTM, does not confirm it.
In the truncated-variant prediction, contact with DnaJ is lost, while the transmembrane helix remains folded.
ColabFold visualizations provide interactive 3D structures to illustrate the difference.

Potential Pitfalls

AlphaFold predictions may not fully capture membrane environment effects.
Low ipTM scores indicate an uncertain interface prediction; experimental validation would be needed.
Flexible regions in DnaJ and the N-terminal domain may differ in reality from predicted models.

Conclusion

In these predictions, truncating the N-terminal domain of MS2 L protein removes the predicted contact with DnaJ without unfolding the transmembrane helix.
At the level of structure prediction, this supports the goal of modifying L protein for DnaJ-independent activity.

Week 5 — Protein Design Part II

✨ Part A. SOD1 Binder Peptide Design ✨

Part 1: Generate Binders with PepMLM

I searched for the SOD1 amino acid sequence in the UniProt database (P00441) and found that the protein has 154 amino acids:

 >MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

After that, I inserted the sequence into the Colab notebook. I set the parameters to generate 4 peptide binders, each with a length of 12 amino acids. The model then generated the following peptides:

Binder	Pseudo Perplexity
WHSYAAGVAWKX	12.096901
HHYPAVVVAHKE	12.724163
WHVGVVVVRHKX	18.296936
WLYGATVVRLKE	20.406855

I added a new block to the Colab notebook with code to calculate the pseudo-perplexity of the known SOD1-binding peptide. The sequence FLYRWLPSRRGG was used as input, and the model returned a pseudo-perplexity of 31.43.

I recorded the pseudo-perplexity scores for all the generated peptide binders, as well as for the known SOD1-binding peptide FLYRWLPSRRGG. These scores indicate PepMLM’s confidence in each peptide: a lower pseudo-perplexity means the model predicts the peptide is more likely to bind the target protein, while a higher value indicates less confidence.
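For reference, the pseudo-perplexity of a masked language model is usually defined as the exponential of the negative mean log-probability that the model assigns to each residue when that residue is masked. The sketch below assumes PepMLM's notebook follows this standard definition; the probabilities are invented for illustration and are not PepMLM output.

```python
import math

def pseudo_perplexity(masked_token_probs):
    """exp(-mean(log p_i)), where p_i is the model probability of residue i when it is masked."""
    log_probs = [math.log(p) for p in masked_token_probs]
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical per-residue probabilities for two 4-residue peptides
confident = pseudo_perplexity([0.5, 0.4, 0.6, 0.5])
uncertain = pseudo_perplexity([0.05, 0.04, 0.06, 0.05])
print(confident, uncertain)
```

As a sanity check on the scale, a model that guesses uniformly among 20 amino acids (probability 0.05 per residue) has a pseudo-perplexity of 20.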

Part 2: Evaluate Binders with AlphaFold3

I first input the SOD1 sequence together with the peptide binder into the model. Initially, the prediction failed because SOD1 is a homodimer, so I corrected the input by including two chains with the same sequence of 154 amino acids, followed by the peptide binder. Two of the previously generated binders could not be used because they contained an “X”, which represents an unknown amino acid.
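Filtering out binders with unknown residues can be done automatically before submitting anything to AlphaFold. A minimal sketch applied to the four peptides generated above; the helper name is my own:

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def usable_binders(peptides):
    """Keep only peptides made entirely of the 20 standard one-letter residues."""
    return [p for p in peptides if set(p) <= STANDARD_AA]

generated = ["WHSYAAGVAWKX", "HHYPAVVVAHKE", "WHVGVVVVRHKX", "WLYGATVVRLKE"]
usable = usable_binders(generated)
print(usable)
```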

I first tested the known SOD1-binding peptide FLYRWLPSRRGG (pseudo-perplexity pp = 31). The predicted complex produced an ipTM score of 0.88, which indicates a reasonably confident interaction. In the structure visualization, the two SOD1 chains appear in blue, while the binder appears in orange, corresponding to very low confidence (pLDDT < 50). This may suggest that the position of the peptide is not very stable in the predicted structure. The peptide does not localize near the N-terminus where the A4V mutation is located. Instead, it appears close to residues around 90–100, near several β-sheets, indicating that it engages the β-barrel region of the protein.

Next, I tested two of the generated binders. For HHYPAVVVAHKE (pp = 12.72), the predicted complex had an ipTM score of 0.87, and the visualization showed a similar pattern, with the peptide again appearing orange, indicating low confidence. For WLYGATVVRLKE (pp = 20.40), the ipTM score was 0.82, with the same color distribution observed.

In all cases, the peptide does not appear tightly attached to the protein surface. Instead, it seems to float near the chain, suggesting that the peptide is surface-bound rather than deeply buried within the structure.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

After evaluating both HHYPAVVVAHKE and WLYGATVVRLKE in PeptiVerse, we observed the following:

HHYPAVVVAHKE shows high solubility (probability = 1.0) and is non-hemolytic (probability = 0.02), suggesting it is safe and easy to formulate. Its predicted binding affinity is classified as weak (pKd/pKi = 4.968), but in AlphaFold it had a relatively high ipTM score (0.87), indicating a reasonably confident structural interaction with the A4V mutant SOD1. The peptide has a neutral net charge (0.03 at pH 7), a molecular weight of 1386.6 Da, an isoelectric point of 7.03, and slight negative hydrophobicity (GRAVY = -0.31), all suggesting favorable biophysical properties.
WLYGATVVRLKE also shows high solubility (probability = 1.0) and is non-hemolytic (probability = 0.074). Its predicted binding affinity is stronger (pKd/pKi = 6.159; a higher pKd means tighter binding), but the ipTM from AlphaFold is lower (0.82), indicating a slightly less confident structural interaction. The peptide has a net charge of 0.77 at pH 7, molecular weight of 1434.7 Da, an isoelectric point of 8.59, and slight positive hydrophobicity (GRAVY = 0.22).

Property	HHYPAVVVAHKE	WLYGATVVRLKE
ipTM	0.87	0.82
Solubility 💧	1.000	1.000
Hemolysis 🩸	0.020	0.074
Binding Affinity 🔗	4.971	6.163
Length 📏	12	12
Molecular Weight ⚖️	1386.6	1434.7
Net Charge ⚡	0.03	0.77
Isoelectric Point 🎯	7.03	8.59
Hydrophobicity 💦	-0.31	0.22

Based on the data and analysis, HHYPAVVVAHKE is the better overall candidate, with one caveat regarding binding affinity. The reasons are:

Higher structural confidence: ipTM = 0.87 vs 0.82, meaning AlphaFold is more confident in the predicted interface.
Favorable therapeutic properties: Both peptides are soluble and non-hemolytic, but HHYPAVVVAHKE has a lower hemolysis probability (0.020 vs 0.074), a near-neutral net charge, and slightly more hydrophilic character, which is advantageous for formulation and bioavailability.
Caveat on predicted binding: Its pKd/pKi of 4.971 is lower than that of WLYGATVVRLKE (6.163). Because pKd = −log₁₀(Kd), a higher value means tighter binding, so WLYGATVVRLKE has the stronger predicted affinity.
Overall balance: HHYPAVVVAHKE trades some predicted affinity for higher interface confidence and a safer physicochemical profile, which is why I would advance it first.
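Since pKd is defined as −log₁₀(Kd), the two predicted values can be converted to dissociation constants to compare them on a concentration scale, where a smaller Kd means tighter binding. A minimal sketch, assuming the PeptiVerse values are pKd with Kd expressed in mol/L:

```python
def pkd_to_kd_molar(pkd):
    """Convert pKd (= -log10 of Kd in mol/L) to Kd in mol/L."""
    return 10 ** (-pkd)

for name, pkd in [("HHYPAVVVAHKE", 4.971), ("WLYGATVVRLKE", 6.163)]:
    kd_um = pkd_to_kd_molar(pkd) * 1e6  # convert mol/L to micromolar
    print(f"{name}: pKd {pkd} -> Kd ~ {kd_um:.2f} uM")
```

A difference of about 1.2 pKd units corresponds to roughly a 15-fold difference in predicted Kd.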

Part 4: Generate Optimized Peptides with moPPIt

In the peptide generation tool, I first pasted the A4V mutant SOD1 sequence. Then I set the peptide length to 12 amino acids. After that, I enabled the options “Enable motif and affinity guidance” (as well as solubility/hemolysis guidance if available). After running the tool, three peptide motifs were generated: RKMICGRYRYYI, SCFLYYYYTIIL, and SARRQKCVRYYT.

Then, I entered the generated peptides into PeptiVerse, as in the previous step, in order to compare the peptides generated by PepMLM and moPPIt and evaluate their physicochemical properties.

Peptide	Solubility	Hemolysis	Binding Affinity (pKd/pKi)	Net Charge	GRAVY
RKMICGRYRYYI	Soluble	Non-hemolytic (0.085)	8.512	3.75	-0.69
SCFLYYYYTIIL	Soluble	Non-hemolytic (0.205)	9.137	-0.55	1.27
SARRQKCVRYYT	Soluble	Non-hemolytic (0.050)	7.301	3.45	-1.38

All three moPPIt peptides were predicted to be soluble and non-hemolytic, which indicates a favorable safety profile. Among them, SCFLYYYYTIIL shows the highest predicted binding affinity (9.137 pKd/pKi), suggesting stronger interaction with potential targets, although it is also the most hydrophobic. In contrast, SARRQKCVRYYT has the lowest hemolysis probability and the highest hydrophilicity, indicating potentially better biological compatibility.

Compared with the best peptide generated by PepMLM, the top moPPIt peptide shows improved predicted properties such as stronger binding affinity and favorable solubility, suggesting that motif-guided design may produce peptides with more optimized functional characteristics.

✨ Part B: BRD4 Drug Discovery Platform Tutorial ✨


✨ Part C: Final Project: L-Protein Mutants ✨

For this part of the assignment, I chose Option 3: Random Mutagenesis. I generated three mutations in Colab using a Python script (code provided below). I prompted ChatGPT with the original Colab code and the instructions from the course, and this is what I obtained.

import random

def generate_random_mutations(sequence, possible_positions, n_mutations=2):
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    # Select n_mutations random positions from the list of possible positions
    positions_to_mutate = random.sample(possible_positions, n_mutations)
    
    mutated_sequence = list(sequence)
    mutations = []
    
    for pos in positions_to_mutate:
        # Choose a new amino acid different from the original one
        new_aa = random.choice([aa for aa in amino_acids if aa != sequence[pos]])
        mutated_sequence[pos] = new_aa
        # Store mutation info as (position, original AA, new AA), using 1-based indexing
        mutations.append((pos+1, sequence[pos], new_aa))
    
    return "".join(mutated_sequence), mutations

# Example usage
sequence = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT"
possible_positions = [9, 25, 30, 39, 50, 53]  # safe positions to mutate
mut_seq, mut_info = generate_random_mutations(sequence, possible_positions, n_mutations=3)
print(mut_seq)
print(mut_info)

The three mutations generated are as follows:

Mutation 1: Y → V at position 40
Mutation 2: F → T at position 51
Mutation 3: Q → M at position 54

After applying these mutations, the resulting protein sequence is:

 >METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLY**V**LIFLAIFLSK**T**TN**M**LLLSLLEAVIRTVTTLQQLLT

WT-L-protein

Mutated-L-protein

Week 6 — Genetic Circuits Part I: Assembly Technologies

✨ DNA Assembly ✨

1. What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

The Phusion High-Fidelity PCR Master Mix contains Phusion DNA polymerase, dNTPs, reaction buffer, and Mg²⁺ ions. These components allow accurate DNA amplification, with the polymerase synthesizing DNA, dNTPs acting as building blocks, and the buffer and Mg²⁺ providing optimal conditions for the reaction.

2. What are some factors that determine primer annealing temperature during PCR?

Primer annealing temperature during PCR is mainly determined by the melting temperature (Tm) of the primers, their length and GC content, and the specificity of primer binding to the template DNA. These factors influence how strongly the primers bind to the target sequence.
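To make the link between length, GC content, and Tm concrete, the sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough approximation meant for short primers. It is not the method used by any particular polymerase supplier, whose own Tm calculator should take precedence when choosing the real annealing temperature. The primer shown is a made-up example.

```python
def wallace_tm(primer):
    """Rough Tm in degrees C by the Wallace rule: 2 per A/T plus 4 per G/C (short primers only)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

primer = "ATGCGTACGTTAGC"  # hypothetical 14-nt primer
print(wallace_tm(primer), gc_content(primer))
```

Longer primers and higher GC content both raise the estimate, which is why these two properties dominate the choice of annealing temperature.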

3. There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR and restriction enzyme digestion can both create linear fragments of DNA, but they differ in their protocol and purpose.

The PCR protocol is more complex because it requires different temperature steps during the process: denaturation, primer annealing, and extension. These steps are repeated many times in a thermocycler to amplify the DNA fragment between the primers. PCR also requires several components such as primers, DNA polymerase, and nucleotides.

In contrast, restriction enzyme digestion is simpler. In this method, DNA is incubated with restriction enzymes at an optimal temperature so that the enzymes can recognize specific sequences and cut the DNA at those sites. Usually, it only requires maintaining the right conditions for the enzyme to function properly.

The choice between these methods depends on the goal of the experiment. PCR is preferable when we want to amplify a specific DNA fragment from a small amount of DNA. Restriction enzyme digestion is preferable when we want to cut DNA into specific fragments.

4. How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

You can ensure that DNA sequences are appropriate for Gibson cloning by designing primers during PCR that add overlapping sequences at the ends of your fragments. These overlaps are complementary to the adjacent fragment or vector, so after PCR, the fragments will already have ends compatible for Gibson assembly.
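The primer design described above can be sketched in a few lines: the forward primer starts with the end of the upstream fragment, and the reverse primer is the reverse complement of the insert's end plus the start of the downstream fragment. Overlap and annealing lengths here are illustrative defaults, and the sequences are toy data.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gibson_primers(upstream, insert, downstream, overlap=20, anneal=20):
    """Primers that amplify `insert` and add overlaps to its two neighbors."""
    forward = upstream[-overlap:] + insert[:anneal]
    reverse = reverse_complement(insert[-anneal:] + downstream[:overlap])
    return forward, reverse

# Toy sequences, for illustration only
fwd, rev = gibson_primers("AAAACCCC", "ATGGTGAGCAAGGGC", "GGGGTTTT", overlap=4, anneal=6)
print(fwd)
print(rev)
```

The 5' part of each primer does not anneal to the template in the first cycle; it only adds the overlap that the assembly reaction later uses.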

5. How does the plasmid DNA enter the E. coli cells during transformation?

The plasmid DNA cannot enter E. coli cells on its own because the membrane is a barrier. The cells must be made competent, which can be done in two ways:

Chemical competence: Treating cells with salts such as CaCl₂, which makes the membrane more permeable, followed by a heat shock to allow the plasmid to enter.
Electroporation: Applying a short electrical pulse to create temporary pores in the membrane, through which the plasmid can pass.

6. Describe another assembly method in detail (such as Golden Gate Assembly)

a. Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Golden Gate Assembly (GGA) is a molecular cloning method that allows multiple DNA fragments to be assembled in a defined order in a single reaction. Each fragment must have overhangs that are complementary to the vector or to the adjacent fragment, so they can fit together like puzzle pieces. These overhangs are created by Type IIS restriction enzymes, which cut outside of their recognition site, generating specific single-stranded ends.

Unlike Gibson Assembly, GGA does not require PCR to create overlaps if fragments already contain the enzyme sites. After the overhangs are formed, DNA ligase joins the fragments together, sealing the nicks and producing a continuous DNA molecule.

This method is very efficient for assembling multiple fragments at once. Because the recognition sites are cut away from the final product, it leaves no extra sequence between fragments (it is scarless). It is particularly useful when a precise order and orientation of fragments are required.

SnapGene Academy ( https://www.snapgene.com/guides/golden-gate-assembly )

b. Model this assembly method with Benchling or Asimov Kernel!

In Benchling, I created a new DNA sequence (type: DNA, topology: circular) to represent the pUC19 plasmid. I looked up its sequence, which is 2686 bp long. I then added a second DNA fragment to insert into pUC19: GFP. From UniProt, I obtained the GFP sequence of 238 amino acids and used an online tool to reverse translate it from amino acids into a nucleotide sequence.

In the pUC19 sequence, I ran a digest using the Type IIS enzyme BsaI, and I found that it cuts at position 2006. I added annotations for the cut site: 1999–2004 for the BsaI recognition site, and 2006–2009 for the 4 bp overhang CGGT.
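The position arithmetic in these annotations can be checked with a short sketch. It assumes the standard BsaI geometry, GGTCTC(N1)/(N5): the top strand is cut one nucleotide after the recognition site and the bottom strand five nucleotides after it, leaving a 4-nt 5’ overhang. The function name `bsai_forward_overhangs` is hypothetical; it scans only the forward strand of a linear sequence, so sites on the reverse strand and sites spanning the origin of a circular plasmid are not handled.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence (forward strand)


def bsai_forward_overhangs(seq: str) -> list[tuple[int, int, int, int, str]]:
    """Find forward-strand BsaI sites and the 4-nt overhang each one produces.

    Returns tuples of 1-based, inclusive positions:
    (site_start, site_end, overhang_start, overhang_end, overhang_sequence).
    """
    seq = seq.upper()
    hits = []
    pos = seq.find(BSAI_SITE)
    while pos != -1:
        site_start = pos + 1                  # convert to 1-based
        site_end = pos + len(BSAI_SITE)
        overhang_start = site_end + 2         # skip the single spacer base (N1)
        overhang_end = overhang_start + 3     # overhang is 4 nt long
        if overhang_end <= len(seq):
            overhang = seq[overhang_start - 1:overhang_end]
            hits.append((site_start, site_end, overhang_start, overhang_end, overhang))
        pos = seq.find(BSAI_SITE, pos + 1)
    return hits


if __name__ == "__main__":
    toy = "AA" + BSAI_SITE + "A" + "CGGT" + "TT"  # toy sequence, not pUC19
    print(bsai_forward_overhangs(toy))
```

With this geometry, a recognition site at 1999–2004 places the overhang at 2006–2009, which agrees with the annotations described above.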

To link the GFP fragment, I added the complementary overhangs (GCCA) at both the 5’ and 3’ ends of the fragment. Then, using the Assembly Wizard, I selected the method I wanted (Golden Gate Assembly), assigned the backbone as pUC19 and the insert as GFP, and finally obtained the assembled plasmid with GFP correctly inserted.
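When adding overhangs by hand, it helps to be explicit about strand orientation. The sketch below computes both the plain complement and the reverse complement of an overhang, and tests whether two single-stranded overhangs, each written 5’→3’, can base-pair. The helper names are hypothetical, and how Benchling expects overhangs to be entered is not something this sketch assumes.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, keeping the same left-to-right order."""
    return seq.upper().translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement read in the opposite direction, i.e. 5'->3' on the other strand."""
    return complement(seq)[::-1]


def overhangs_anneal(a: str, b: str) -> bool:
    """True if overhangs a and b, both written 5'->3', can base-pair."""
    return a.upper() == reverse_complement(b)


if __name__ == "__main__":
    overhang = "CGGT"
    print("complement:        ", complement(overhang))
    print("reverse complement:", reverse_complement(overhang))
```

For the overhang CGGT, the complement is GCCA (read 3’→5’ on the opposite strand) and the reverse complement is ACCG (the same bases read 5’→3’).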
