Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices


  • Week 2 HW: DNA-Read Write and Edit


  • Week 3 HW: lab automation

    Create a Python file to run on an Opentrons liquid handling robot. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications. Journal article: Taeok Kim, Eun Jung Jeon, Kil Koang Kwon, Minji Ko, Ha-Neul Kim, Seong Keun Kim, Eugene Rha, Jonghyeok Shin, Haseong Kim, Dae-Hee Lee, Bong Hyun Sung, Soo-Jung Kim, Hyewon Lee, Seung-Goo Lee. Cell-free biosensor with automated acoustic liquid handling for rapid and scalable characterization of cellobiohydrolases on microcrystalline cellulose. Synthetic Biology, Volume 10, Issue 1, 2025, ysaf005. https://doi.org/10.1093/synbio/ysaf005

  • Week-04-hw-protein-design-part-1

    Homework: Protein Design I Part A. Conceptual Questions 1. Why are there only 20 natural amino acids? The 20 natural amino acids evolved as an optimal set very early, during the RNA world (about 4 billion years ago). The set then became frozen: changing it would disrupt all existing proteins, and the limits of tRNA recognition prevented further expansion. 2. Where did amino acids come from before enzymes that make them, and before life started?

Subsections of Homework

Week 1 HW: Principles and Practices

Describe a biological engineering application or tool you want to develop and why.

I want to engineer a bacterium that produces enzymes to convert plastic and glass waste into fertilizer. Microbes secrete extracellular enzymes, such as PETase, MHETase, cutinases, lipases, and esterases, that hydrolyze the chemical bonds of plastics, releasing monomers (e.g., ethylene glycol, terephthalic acid) and oligomers. For glasses (phosphate-based), bacteria such as Bacillus ascheri and Burkholderia eburnea can aid in dissolving the glass and releasing silicon and other plant-growth-promoting nutrients. I want to combine these two functions in one bacterium by engineering it to produce enzymes that break down both plastics and glasses and convert them into useful biofertilizers for plants. I want to develop such an organism because both plastic and glass pose serious threats: they remain non-biodegradable for ages.

Describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

The main goal is to make the enzyme-producing engineered bacterium an ethical and safely managed product, with minimal risk to the environment, responsible use, and proper containment of the engineered organism.

This main goal can be broken down into sub-goals:

Protect the environment

  • As the product is meant to reduce environmental pollution from plastics and glasses, releasing these organisms into soil should not harm the natural ecosystem of soil bacteria; in particular, horizontal gene transfer must be prevented.
  • Auxotrophic strains of the engineered bacteria that depend on unnatural amino acids should be created, so that the bacteria die in the absence of those amino acids.
  • Farmers should be well informed about the fertilizer produced by the bacteria and the concentration to use, so that the natural pH of the soil is not harmed while its fertility is maintained.

Responsible and Receptive Approach

  • Companies or industries producing the product should make information about it available to the public.
  • The end user (the farmer) should be made aware of the product, and the government should provide subsidies to promote it.

Future Research

  • Encourage researchers in the area to innovate better strains to produce an optimised enzyme for production of the fertilizer from such waste.
  • Government funding for projects that aim to decrease environmental pollution and give useful products back to nature.

Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”).

1. A new law on the manufacturing of “Fertilizers from plastic waste and glass” should be created

Purpose: The main aim of this law would be to involve a regulatory body that checks the safety standards of the fertilizers before they are applied to soil: the amount of contaminants in the end product, the concentration and application limits of the fertilizer, and the safety levels for the end user.

Design: The regulatory body will decide on threshold levels for fertilizer use and ensure that the soil ecosystem on which it is applied is not affected. The government agricultural department should ensure that the end user is fully informed about the product. The product should pass a quality inspection before release into the market.

Assumptions: The law is accepted at the same level in all countries. The runoff from these fertilizers is assumed to be low in heavy metals.

Risks of Failure and Success: Risk of failure: End users may not be receptive to the idea of using fertilizers produced from waste. Risk of success: The law decreases the burden of plastic waste in landfills and provides a sustainable alternative to chemical fertilizers.

2. Government subsidies to farmers using the fertilizer, and incentives to companies

Purpose: These strategies will promote the product and make people aware of, and receptive to, such sustainable approaches.

Design: The government can conduct awareness camps and demonstrate its application. Tax benefits can be provided to companies that sell these products.

Assumptions: While initial setup requires high capital investment, the government assumes long-term savings in waste management and lower fertilizer costs for farmers.

Risks of Failure & Success: Risk of failure: Subsidies may be subject to change, and if environmental regulations on microplastics in fertilizer tighten, current products might become non-compliant. Risk of success: Many startups might adopt these fertilizers quickly just to obtain the incentives, without thinking about the long-term impacts.

3. Handling of the engineered bacteria

Purpose: Proper protocols should be followed when handling these genetically engineered “superbugs” to prevent them from mixing with the natural biota of the ecosystem, which could create a pathogen through horizontal gene transfer.

Design: The organisms should be created as auxotrophs, so that the microbe dies in the absence of the required nutrient. Genes that kill the bacterium under specific environmental conditions can also be incorporated, creating a suicide circuit.

Assumption: It is assumed that the genetic modification does not significantly impair the growth rate or metabolic function of the bacteria, allowing for sufficient yield.

Risks of Failure & Success: Risk of failure: The engineered genes may be lost or mutated over successive generations, causing the bacteria to lose their intended function or gain unintended traits. Risk of success: The survival and effectiveness of the genetically engineered bacteria (GEB) in the field depend on factors like temperature, pH, and nutrient availability.

| Policies and Action | Biosafety Law | Government Subsidies | Handling GE Bacteria |
| --- | --- | --- | --- |
| Protection of environment | | | |
| Prevention of environmental hazard | 1 | 3 | 2 |
| Product quality and Safety | 1 | 3 | 2 |
| Responsible and Receptive Approach | | | |
| Public awareness | 2 | 1 | n/a |
| Government subsidy | 1 | 2 | n/a |
| Future Research | | | |
| R&D for such innovative strains | 2 | 3 | 1 |
| Government funding projects | 2 | 3 | 1 |
| Other considerations | | | |
| Minimizing costs and burdens to stakeholders | 3 | 1 | 2 |
| Feasibility? | 2 | 3 | 1 |
| Promote constructive applications | 1 | 3 | 2 |

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

I would prioritize a combination of the biosafety law on fertilizers made from plastic and glass waste and government subsidies, because together they would promote the new product and help it be used efficiently. Awareness of the product would also make the public more receptive to new technology and more willing to adopt sustainable practices.

Many assumptions are made here: that the engineered bacterium is a safe organism that will not be a hazard to the ecosystem, that it will prevent nutrient leaching, and that it will not mutate into a pathogen in the process. These uncertainties can be mitigated by some of the methods mentioned above. The handling of the GE bacteria becomes important when you consider the soil ecosystem with its natural organisms. There are also uncertainties regarding the process of converting plastic and glass waste into fertilizer.

Professor Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

The error rate of DNA polymerase typically ranges from 10^-4 to 10^-6 errors per base pair during initial nucleotide insertion. With proofreading, accuracy improves to 10^-7 to 10^-8 errors per base pair. The human haploid genome is about 3.1 to 3.2 billion base pairs long (roughly 6.3 billion base pairs per diploid cell), so even with proofreading a single round of replication would leave tens to hundreds of errors. Biology resolves this discrepancy by combining proofreading (3’-5’ exonuclease activity) with post-replication mismatch repair, which together bring the overall error rate down to about 1 error per 10^9 to 10^10 nucleotides, on the order of one error per genome copy.
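The comparison between error rate and genome size is simple arithmetic, and a small Python sketch makes it concrete. The rates below are single order-of-magnitude values picked from the ranges in the answer, and the genome size is the approximate diploid figure; both are assumptions for illustration, not measured values.

```python
# Expected copying errors per genome replication at each fidelity stage.
# All rates are order-of-magnitude assumptions taken from the ranges quoted above.
GENOME_BP = 6.3e9  # approximate diploid human genome size in base pairs

ERROR_RATE_PER_BP = {
    "polymerase insertion only": 1e-5,
    "plus 3'-5' exonuclease proofreading": 1e-7,
    "plus mismatch repair": 1e-10,
}


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Mean number of errors when every base pair is copied once."""
    return genome_bp * error_rate


for stage, rate in ERROR_RATE_PER_BP.items():
    print(f"{stage}: ~{expected_errors(GENOME_BP, rate):,.2f} errors per replication")
```

Changing the assumed rates within the quoted ranges shifts each result by up to a factor of ten, which is why the answer is best read as an order-of-magnitude estimate.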

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

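The standard genetic code has 64 codons: 61 encode the 20 amino acids and 3 are stop signals.

Calculation for an average protein: for a 375-amino-acid protein, the total number of coding DNA sequences is the product of the codon choices at each position. Across the 20 amino acids the choices are 1 codon for 2 amino acids, 2 for 9, 3 for 1, 4 for 5, and 6 for 3, so one copy of each gives 1^2 × 2^9 × 3^1 × 4^5 × 6^3 ≈ 3.4 × 10^8 encodings. Taking about 3 synonymous codons per position on average, with the start methionine fixed, a 375-residue protein has roughly 3^374 ≈ 10^178 possible coding sequences; the exact figure depends on the amino acid composition.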
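As a rough check on this kind of count, the sketch below builds the standard genetic code (NCBI translation table 1), counts the synonymous codons per amino acid, and multiplies the choices along a protein. The function name `log10_encodings` is introduced here for illustration only, and working in log10 is a convenience so that very large counts stay readable.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons per amino acid ('*' marks the three stop codons).
DEGENERACY = Counter(CODON_TABLE.values())


def log10_encodings(protein: str) -> float:
    """log10 of the number of distinct DNA sequences that encode `protein`."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


# Product of codon counts over one copy of each of the 20 amino acids.
one_of_each = math.prod(n for aa, n in DEGENERACY.items() if aa != "*")
print(one_of_each)
# log10 of 3^374, the "about 3 codons per position" estimate for a 375-residue protein.
print(round(374 * math.log10(3), 1))
# Exact count (as log10) for the first ten residues of GFP.
print(round(log10_encodings("MSKGEELFTG"), 2))
```

Passing a full protein sequence to `log10_encodings` gives the composition-specific count instead of the average-based estimate.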

In reality, codon usage varies from species to species: an organism prefers certain codons over others for the same amino acid, which is exploited in protein engineering as codon optimization. Synonymous codon variants often fail to produce functional, equivalent proteins in practice because of cellular biases and kinetic effects during gene expression.

Codon usage bias matches tRNA availability in human cells, so rare codons slow ribosome speed, reducing protein yield. Optimal codons boost expression up to 15-fold, while mismatched ones drop levels dramatically. Codon swaps can change protein structure, solubility, or stability, even if the amino acid sequence stays identical.

Dr. LeProust

What’s the most commonly used method for oligo synthesis currently?

Phosphoramidite solid-phase synthesis is the most commonly used method for oligonucleotide (oligo) synthesis today. This technique builds oligos stepwise on a solid support like controlled pore glass (CPG), adding protected nucleoside phosphoramidite monomers one at a time. Key steps include detritylation (removing the 5’-protecting group), coupling (adding the next nucleotide), oxidation (stabilizing the phosphite linkage), and capping (blocking failed sequences).

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Each nucleotide addition has a coupling yield of about 99%, but losses compound exponentially; a 200-mer requires roughly 200 cycles, dropping the full-length product yield below practical levels (e.g., about 13% theoretical at 99% efficiency and about 37% at 99.5%, and lower in practice). Longer sequences also accumulate deletions, truncations, and depurination from the repeated harsh cycles (oxidation, capping, deprotection).

Failure sequences (n-1 mers, mutations) dominate the output, and standard purification methods such as HPLC or gel electrophoresis struggle to resolve the small full-length fraction from closely related byproducts.

Longer strands form secondary structures that sterically hinder reagent diffusion and coupling, especially on porous solid supports like CPG, where diffusion slows dramatically.
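The compounding of per-step losses can be sketched in a few lines of Python. This is a simplified model that assumes every coupling step succeeds independently with the same efficiency and ignores depurination and purification losses; the function name is hypothetical.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains that reach full length after (length_nt - 1) couplings.

    The first nucleotide is already attached to the solid support,
    so an n-mer needs n - 1 coupling steps.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 100, 200, 2000):
    for efficiency in (0.99, 0.995):
        fraction = full_length_yield(length, efficiency)
        print(f"{length:>5} nt at {efficiency:.1%} per step: {fraction:.3g} full length")
```

Under this model even a small gain in per-step efficiency matters a great deal at 200 nt, while at 2000 nt the full-length fraction is effectively zero at either efficiency.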

Why can’t you make a 2000bp gene via direct oligo synthesis?

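Deletions, insertions, and substitutions build up rapidly beyond 100-200 nt, as there’s no proofreading like in enzymatic replication. Long sequences form stable hairpins or folds that sterically hinder reagent access and coupling. At 99% coupling efficiency the theoretical full-length yield for 2000 nt is about 0.99^2000 ≈ 2 × 10^-9, so genes of this size are assembled from shorter oligos instead of being synthesized in one piece.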

Professor Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 amino acids often referenced as essential for many animals (e.g., swine, dogs, rats) are those that the animals cannot synthesize in sufficient amounts: lysine, methionine, tryptophan, threonine, valine, isoleucine, leucine, arginine, histidine, and phenylalanine.

The lysine contingency refers to a fictional genetic failsafe from the Jurassic Park franchise. In Jurassic Park, geneticist Henry Wu engineered dinosaurs unable to synthesize the essential amino acid lysine, making them dependent on external supplements provided by park staff. Without lysine, the dinosaurs would enter a coma and die, preventing their survival if they escaped Isla Nublar and disrupted ecosystems.

Lysine is abundant in nature (found in plants like soy, in bacteria, and in prey animals), so dinosaurs, like any other organism, could obtain it through diet. Humans and other animals can’t synthesize lysine either but thrive without supplements by eating lysine-rich foods, which undermines the contingency’s viability. Lysine is an essential amino acid critical for protein synthesis, collagen formation, and carnitine production, with deficiencies linked to anemia or impaired metabolism.

Information courtesy: Perplexity Pro, on lysine as an essential amino acid and the lysine contingency.

Week 2 HW: DNA-Read Write and Edit

Part 1: Benchling & In-silico Gel Art

Import lambda DNA

Courtesy:NCBI - O’Leary NA, Cox E, Holmes JB, Anderson WR, Falk R, Hem V, Tsuchiya MTN, Schuler GD, Zhang X, Torcivia J, Ketter A, Breen L, Cothran J, Bajwa H, Tinne J, Meric PA, Hlavina W, Schneider VA. Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets. Sci Data. 2024 Jul 5;11(1):732. doi: 10.1038/s41597-024-03571-y. PMID: 38969627; PMCID: PMC11226681.

Simulation of digestion by different restriction enzymes.

[Image: design image]

Courtesy: Benchling [Biology Software]. (2026). Retrieved from https://benchling.com.
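What an in-silico digest computes can be sketched in Python: find every recognition site in a linear sequence, cut at a fixed offset inside the site, and report the fragment lengths that would appear as bands. This is not Benchling's implementation; the toy sequence is made up for illustration, and only the EcoRI site (GAATTC, cut after the first G) is taken as known.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear molecule at every `site`.

    `cut_offset` is how many bases into the recognition site the
    top strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical toy sequence with two EcoRI sites (not lambda DNA).
toy = "ACGT" * 3 + "GAATTC" + "A" * 10 + "GAATTC" + "T" * 5
fragments = digest(toy, "GAATTC", 1)
print(sorted(fragments, reverse=True))  # largest fragment first, as on a gel
```

Running the same function on the imported lambda sequence with each enzyme's site and cut offset would give the band sizes for one lane of the gel art.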

Create a pattern/image in the style of Paul Vanouse’s Latent Figure Protocol artworks.

[Image: pattern]

[Image: lane labels]

Courtesy: Benchling [Biology Software]. (2026). Retrieved from https://benchling.com.

Part 3: DNA Design Challenge

Choose your protein.

I have chosen wild type green fluorescent protein. This protein is used as a reporter gene in plasmids to study expression of genes as well as in biosensing, protein localization and also in live cell imaging.

I would like to study GFP variants and understand how they can be better used in the biosensor field.

Sequence: Green fluorescent protein
Gene: GFP
Status: UniProtKB reviewed (Swiss-Prot)
Organism: Aequorea victoria (Water jellyfish) (Mesonema victoria)

FASTA sequence

>AAA58246.1 green-fluorescent protein [Aequorea victoria]
MSKGEELFTGVVPILVELDGDVNGQKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQC
FSRYPDHMKQHDFFKSAMPEGYVQERTIFYKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK
MEYNYNSHNVYIMADKPKNGIKVNFKIRHNIKDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKD
PNEKRDHMILLEFVTAAGITHGMDELYK

Courtesy: The UniProt Consortium, “UniProt: the Universal Protein Knowledgebase in 2025,” Nucleic Acids Research, 2025.

Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

>L29345.1 Aequorea victoria green-fluorescent protein (GFP) mRNA, complete cds
TACACACGAATAAAAGATAACAAAGATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTT
GTTGAATTAGATGGCGATGTTAATGGGCAAAAATTCTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACAT
ACGGAAAACTTACCCTTAAATTTATTTGCACTACTGGGAAGCTACCTGTTCCATGGCCAACACTTGTCAC
TACTTTCTCTTATGGTGTTCAATGCTTTTCAAGATACCCAGATCATATGAAACAGCATGACTTTTTCAAG
AGTGCCATGCCCGAAGGTTATGTACAGGAAAGAACTATATTTTACAAAGATGACGGGAACTACAAGACAC
GTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATAGAATCGAGTTAAAAGGTATTGATTTTAAAGA
AGATGGAAACATTCTTGGACACAAAATGGAATACAACTATAACTCACATAATGTATACATCATGGCAGAC
AAACCAAAGAATGGAATCAAAGTTAACTTCAAAATTAGACACAACATTAAAGATGGAAGCGTTCAATTAG
CAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTC
CACACAATCTGCCCTTTCCAAAGATCCCAACGAAAAGAGAGATCACATGATCCTTCTTGAGTTTGTAACA
GCTGCTGGGATTACACATGGCATGGATGAACTATACAAATAAATGTCCAGACTTCCAATTGACACTAAAG
TGTCCGAACAATTACTAAATTCTCAGGGTTCCTGGTTAAATTCAGGCTGAGACTTTATTTATATATTTAT
AGATTCATTAAAATTTTATGAATAATTTATTGATGTTATTAATAGGGGCTATTTTCTTATTAAATAGGCT
ACTGGAGTGTAT

Courtesy:NCBI - O’Leary NA, Cox E, Holmes JB, Anderson WR, Falk R, Hem V, Tsuchiya MTN, Schuler GD, Zhang X, Torcivia J, Ketter A, Breen L, Cothran J, Bajwa H, Tinne J, Meric PA, Hlavina W, Schneider VA. Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets. Sci Data. 2024 Jul 5;11(1):732. doi: 10.1038/s41597-024-03571-y. PMID: 38969627; PMCID: PMC11226681.

Codon optimization

Codon optimization is done because each organism has its own codon preferences for the same amino acid. Because the genetic code is redundant, one amino acid can be encoded by several codons. Codon optimization maximises protein expression by tailoring the DNA sequence to the tRNA abundance of the host organism. It increases translation efficiency, improves protein yield, and can eliminate negative regulatory sequence elements, which is crucial for producing recombinant proteins, vaccines, and gene therapies.

This was done using VectorBuilder (https://www.vectorbuilder.com; citation page: https://en.vectorbuilder.com/resources/cite.html).

[Image: codon optimised]
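The simplest form of the idea, choosing one preferred codon per amino acid, can be sketched as follows. This is not VectorBuilder's algorithm, which is not described here, and the preferred-codon table is an illustrative assumption covering only the residues in the example; a real table would be derived from codon-usage frequencies for the chosen host.

```python
# Illustrative preferred codons only (assumption); not a measured codon-usage table.
PREFERRED_CODON = {
    "M": "ATG", "S": "AGC", "K": "AAA", "G": "GGC",
    "E": "GAA", "L": "CTG", "F": "TTT", "T": "ACC",
}


def back_translate(protein: str, preferred: dict[str, str]) -> str:
    """Reverse-translate a protein using one fixed codon per amino acid."""
    missing = sorted(set(protein) - set(preferred))
    if missing:
        raise KeyError(f"no preferred codon for: {', '.join(missing)}")
    return "".join(preferred[aa] for aa in protein)


# First ten residues of the GFP sequence above.
print(back_translate("MSKGEELFTG", PREFERRED_CODON))
```

Practical tools go beyond this one-codon-per-residue rule, for example by balancing GC content and avoiding repeats or unwanted restriction sites.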

You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

There are many technologies that can be used to produce protein.

1. Recombinant DNA technology

The GFP gene is inserted into a plasmid that can replicate inside the host cell. Expression plasmids carry strong promoters to ensure high levels of GFP expression. The plasmids are introduced into host cells, and cells carrying the plasmid are selected based on expression of the selectable marker.

2. Heterologous Expression Systems (Host Cells)

Bacterial protein expression, primarily using E. coli, is a fast, cost-effective, and scalable method for producing recombinant proteins. It involves:

Cloning: Inserting the GFP gene into an expression vector (an E. coli plasmid).

Transformation: Introducing the plasmid into competent bacterial cells.

Expression: Culturing the cells and inducing GFP production.

Harvesting & Purification: Lysis of cells and purification of the GFP.

3. Cell-Dependent Method (In Vivo)

This occurs within the host cell.

Transcription (DNA to mRNA):

  • Initiation: The enzyme RNA polymerase binds to the promoter, a DNA sequence upstream of the GFP gene that signals the start of the gene.
  • Elongation: RNA polymerase unwinds the DNA helix and reads one strand (the template strand) in the 3′ to 5′ direction, synthesizing a complementary RNA molecule in the 5′ to 3′ direction.
  • Termination: Upon reaching a “terminator” sequence, the RNA polymerase releases the newly formed GFP transcript (a pre-mRNA in eukaryotes).

Translation (mRNA to Protein): The mature GFP mRNA binds to a ribosome in the cytoplasm (in eukaryotes, after export from the nucleus). The ribosome reads the codons. Transfer RNA (tRNA) molecules, carrying specific amino acids, match their anticodons to the mRNA codons. The ribosome catalyzes a peptide bond between amino acids, building a polypeptide chain until a stop codon is reached.

In vivo expression is highly regulated and, in eukaryotes, capable of complex post-translational modifications (folding, glycosylation), but it is slow and limited by cell viability.

4. Cell-Free Method (In Vitro)

Cell-free protein synthesis (CFPS) harnesses the machinery (ribosomes, tRNAs, enzymes) extracted from cells to produce proteins in a test tube, allowing direct control over the environment.

Preparation: Cells (e.g., E. coli) are grown, lysed, and centrifuged to remove DNA, cell walls, and debris, leaving only the transcription and translation machinery.

Method: The extracted, active machinery is mixed with the GFP DNA template (plasmid or PCR-amplified), amino acids, energy sources (like ATP/GTP), and cofactors.

Process: The system can be coupled (transcription and translation occur together) or uncoupled (using mRNA directly). It bypasses the need for cell viability, making it ideal for toxic, membrane, or complex proteins.

CFPS is fast (hours instead of days), is an open system that allows direct manipulation, and can produce toxic proteins and give high yields, but it can be expensive for large-scale production.

[Optional] How does it work in nature/biological systems?

In the jellyfish Aequorea victoria, GFP absorbs the blue light emitted by the photoprotein aequorin and re-emits it as green fluorescence, which gives the animal its green glow.

Describe how a single gene codes for multiple proteins at the transcriptional level.

A single gene codes for multiple proteins primarily through alternative splicing of pre-mRNA, where different combinations of exons are joined together after introns are removed.

Pre-mRNA is spliced in multiple ways to include or exclude specific exons. This produces different mRNA transcripts, which are then translated into different protein isoforms.

Alternative Promoters: A gene may have multiple promoters, allowing transcription to start at different points, resulting in mRNA molecules with different 5’ ends.

Alternative Polyadenylation: This process alters the end of the mRNA, which can affect mRNA stability and localization, leading to different protein products.
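Exon skipping, the simplest kind of alternative splicing, can be illustrated with a small enumeration in Python. The exon names and the choice of which exons are skippable are hypothetical; real genes splice only a subset of the combinations that are possible on paper.

```python
from itertools import combinations


def isoforms(exons: list[str], cassette: set[str]) -> list[list[str]]:
    """All exon combinations obtained by skipping any subset of cassette exons.

    `exons` is the ordered list of exons in the pre-mRNA; exons not in
    `cassette` are constitutive and always kept.
    """
    optional = [exon for exon in exons if exon in cassette]
    result = []
    for n_skipped in range(len(optional) + 1):
        for skipped in combinations(optional, n_skipped):
            result.append([exon for exon in exons if exon not in skipped])
    return result


# Hypothetical four-exon gene in which E2 and E3 can each be skipped.
for transcript in isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"}):
    print("-".join(transcript))
```

With two skippable exons this gives four possible transcripts, and the number doubles with each additional cassette exon.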

Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!!

[Image: GFP Central Dogma]
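The same alignment can be reproduced in a few lines of Python for the start of the gene. The sketch below uses the first 30 nucleotides of the GFP coding sequence (from the ATG in the mRNA record above) and the standard genetic code; the helper names `transcribe` and `translate` are introduced here for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA to mRNA: same sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGAGTAAAGGAGAAGAACTTTTCACTGGA"  # first 30 nt of the GFP coding sequence
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
print("DNA:     " + " ".join(codons))
print("RNA:     " + " ".join(transcribe(c) for c in codons))
print("Protein: " + "   ".join(translate(transcribe(dna))))
```

The printed rows line up codon by codon, with one amino acid letter under each triplet.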

Part 4: Prepare a Twist DNA Synthesis Order

Build Your DNA Insert Sequence

[Image: DNA insert]

Courtesy: Benchling [Biology Software]. (2026). Retrieved from https://benchling.com.

[Image: DNA insert, full view]

Courtesy: Benchling [Biology Software]. (2026). Retrieved from https://benchling.com.

[Image: Lysostaphin in plasmid]

Courtesy: Twist Bioscience

Part 5: DNA Read/Write/Edit

DNA Read (i) What DNA would you want to sequence (e.g., read) and why?

I would like to explore the DNA of bacteria that are resistant to penicillin. I want to study their sequences to understand how they become resistant, study how they interact with antibiotics, and compare their genes with those of susceptible counterparts to understand which sequences produce which type of resistance.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Long-Read Sequencing (Oxford Nanopore Technologies - ONT): Usage: Emerging, real-time sequencing (e.g., MinION) for rapid diagnostics.

Why: Produces very long reads (10 kb to >1 Mb) that can bridge repetitive regions, allowing for the easy reconstruction of plasmids and the identification of the genetic context of resistance genes (e.g., whether they are on a plasmid or chromosome).

Is your method first-, second- or third-generation or other? How so?

Oxford Nanopore Technologies (ONT) is considered a third-generation (or sometimes referred to as long-read) sequencing technology. The first-generation (Sanger) or second-generation (Illumina/NGS) methods rely on DNA synthesis and detection of light signals, while ONT measures changes in electrical current as single molecules of DNA/RNA pass through a protein nanopore.

It is categorized as third-generation sequencing because it sequences long, single molecules of nucleic acid in real time. It generates very long to ultra-long reads, which enable superior assembly of complex genomes and detection of structural variants.

What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps.

The input for the ONT method would be the genomic DNA (gDNA) or plasmid DNA, extracted from pure bacterial cultures or environmental samples.

Steps for Sequencing AMR Genes

Sample Preparation and DNA Extraction:

AMR bacteria are cultured in LB broth and collected. DNA is extracted using kits designed for high-molecular-weight DNA (e.g., Qiagen kits such as MagAttract). It is then quantified with Qubit, and its purity is checked with NanoDrop.

Library Preparation:

  • Rapid Kits (e.g., Rapid Barcoding Kit SQK-RBK114.24): Transposases fragment and tag DNA with adapters in one step, ideal for quick turnaround.
  • Ligation Kits (e.g., Ligation Sequencing Kit SQK-LSK114): Provide higher output and longer reads for more comprehensive genome coverage.

What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)?

The prepared library is loaded onto an ONT flow cell (e.g., R10.4 or R9.4.1) on a device like the MinION or GridION. As DNA passes through the nanopore, ionic current changes are measured and recorded.

1. Raw Signal Capturing and Preprocessing

  • Signal Acquisition: As DNA/RNA strands pass through a nanopore, they disrupt an electrical current. MinKNOW (ONT’s controlling software) records these changes.
  • File Format Conversion: Raw signals are traditionally stored in .fast5 files but are increasingly saved in the more efficient POD5 file format, which is designed for faster data handling and processing.
  • Data Preparation: Before basecalling, the raw signals are often organized and prepared, potentially involving filtering to remove uninformative data.

2. The Basecalling Process (Neural Network Decoding)

  • Deep Learning Models: Modern ONT basecallers (like Dorado, Guppy, or Bonito) use neural networks (such as Recurrent Neural Networks - RNNs, or Transformer models) to analyze the raw signal data.
  • “Squiggle” to Base Translation: The neural network maps the electrical signal changes to the corresponding nucleotide sequences, usually in real time while the sequencing is still running.
  • Move Tables: The process identifies when a new base enters the pore, producing a “move table” that indicates which part of the signal corresponds to which base.
  • Quality Scoring: Alongside the base sequence, the basecaller assigns a probability score to each base, often represented as a Phred score to indicate confidence in the call.

What is the output of your chosen sequencing technology?

Post-Processing and Output

FASTQ File Generation: The primary output is a FASTQ file, containing the sequences and their associated quality scores.

BAM/CRAM Output: Alternatively, basecallers can output files in SAM, BAM, or CRAM formats, which can include both the sequence data and the signal-level information.

Demultiplexing (Optional): If multiple samples were mixed in a single run (barcoding), the software identifies which reads belong to which barcode and separates them into individual files.

Polishing (Optional): Additional steps like Medaka or Nanopolish may be used to refine the sequence data further, especially for improving consensus accuracy.
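To show how the quality scores in a FASTQ file relate to basecalling confidence, here is a minimal Python sketch that parses one four-line record and converts each quality character to a Phred score and an error probability. It assumes the usual Phred+33 ASCII encoding; the read name and bases are made up for illustration.

```python
def phred_to_error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(p), so p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_fastq_record(lines: list[str]) -> tuple[str, str, list[int]]:
    """Parse one 4-line FASTQ record into (read name, sequence, Phred scores)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(char) - 33 for char in quality]  # Phred+33 encoding (assumed)
    return header[1:], sequence, scores


# Hypothetical record, not real sequencing data.
record = ["@read1", "ACGT", "+", "I5+!"]
name, sequence, scores = parse_fastq_record(record)
for base, q in zip(sequence, scores):
    print(f"{name} {base}: Q{q}, error probability {phred_to_error_probability(q):.4f}")
```

A higher Phred score means a lower estimated chance that the base call is wrong, which is why downstream tools often filter reads by mean quality.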

DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?

I want to synthesize the lysostaphin gene of Staphylococcus staphylolyticus (now classified as Staphylococcus simulans biovar staphylolyticus).

    1 gaaaattcca aaaaaaaacc tactttctta atattgattc atattatttt aacacaatca
   61 gttagaattt caaaaatctt aaagtcaatt tttgagtgtg tttgtatatt tcatcaaagc
  121 caatcaatat tattttactt tcttcatcgt taaaaaatgt aatatttata aaaatatgct
  181 attctcataa atgtaataat aaattaggag gtattaaggt tgaagaaaac aaaaaacaat
  241 tattatacga cacctttagc tattggactg agtacatttg ccttagcatc tattgtttat
  301 ggagggattc aaaatgaaac acatgcttct gaaaaaagta atatggatgt ttcaaaaaaa
  361 gtagctgaag tagagacttc aaaaccccca gtagaaaata cagctgaagt agagacttca
  421 aaagctccag tagaaaatac agctgaagta gagacttcaa aagctccagt agaaaataca
  481 gctgaagtag agacttcaaa agctccagta gaaaatacag ctgaagtaga gacttcaaaa
  541 gctccggtag aaaatacagc tgaagtagag acttcaaaag ctccggtaga aaatacagct
  601 gaagtagaga cttcaaaagc cccagtagaa aatacagctg aagtagagac ttcaaaagct
  661 ccagtagaaa atacagctga agtagagact tcaaaagctc cggtagaaaa tacagctgaa
  721 gtagagactt caaaagcccc agtagaaaat acagctgaag tagagacttc aaaagctcca
  781 gtagaaaata cagctgaagt agagacttca aaagctccgg tagaaaatac agctgaagta
  841 gagacttcaa aagccccagt agaaaataca gctgaagtag agacttcaaa agccctggtt
  901 caaaatagaa cagctttaag agctgcaaca catgaacatt cagcacaatg gttgaataat
  961 tacaaaaaag gatatggtta cggtccttat ccattaggta taaatggcgg tatccactac
 1021 ggagttgatt tttttatgaa tattggaaca ccagtaaaag ctatttcaag cggaaaaata
 1081 gttgaagctg gttggagtaa ttacggagga ggtaatcaaa taggtcttat tgaaaatgat
 1141 ggagtgcata gacaatggta tatgcatcta agtaaatata atgttaaagt aggagattat
 1201 gtcaaagctg gtcaaataat cggttggtct ggaagcactg gttattctac agcaccacat
 1261 ttacacttcc aaagaatggt taattcattt tcaaattcaa ctgcccaaga tccaatgcct
 1321 ttcttaaaga gcgcaggata tggaaaagca ggtggtacag taactccaac gcccaataca
 1381 ggttggaaaa caaacaaata tggcacacta tataaatcag agtcagctag cttcacacct
 1441 aatacagata taataacaag aacgactggt ccatttagaa gcatgccgca gtcaggagtc
 1501 ttaaaagcag gtcaaacaat tcattatgat gaagtgatga aacaagacgg tcatgtttgg
 1561 gtaggttata caggtaacag tggccaacgt atttacttgc ctgtaagaac atggaataaa
 1621 tctactaata ctttaggtgt tctttgggga actataaagt gagcgcgctt tttataaact
 1681 tatatgataa ttagagcaaa taaaaatttt ttctcattcc taaagttgaa gcttttcgta
 1741 atcatgtcat agcgtttcct gtgtgaaatt gcttagcctc acaattccac acaacatacg
 1801 agccggaaca taaagtgcta agcct

Courtesy:NCBI - O’Leary NA, Cox E, Holmes JB, Anderson WR, Falk R, Hem V, Tsuchiya MTN, Schuler GD, Zhang X, Torcivia J, Ketter A, Breen L, Cothran J, Bajwa H, Tinne J, Meric PA, Hlavina W, Schneider VA. Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets. Sci Data. 2024 Jul 5;11(1):732. doi: 10.1038/s41597-024-03571-y. PMID: 38969627; PMCID: PMC11226681.
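The record above is in GenBank ORIGIN layout, with a position number at the start of each line and the bases in blocks of ten. Before it can be pasted into a design tool or a codon optimizer, the numbers and spaces have to be removed. A minimal Python sketch, using only the first two lines of the record as input:

```python
import re


def origin_to_sequence(origin_text: str) -> str:
    """Strip position numbers and whitespace from GenBank ORIGIN lines."""
    return re.sub(r"[^A-Za-z]", "", origin_text).upper()


# First two lines of the lysostaphin record above.
example = """
        1 gaaaattcca aaaaaaaacc tactttctta atattgattc atattatttt aacacaatca
       61 gttagaattt caaaaatctt aaagtcaatt tttgagtgtg tttgtatatt tcatcaaagc
"""

sequence = origin_to_sequence(example)
gc_fraction = (sequence.count("G") + sequence.count("C")) / len(sequence)
print(len(sequence), sequence[:10], f"GC = {gc_fraction:.0%}")
```

The same function applied to the full record returns the complete sequence as one uninterrupted string.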

Lysostaphin is a potent zinc-dependent metalloendopeptidase (specifically a glycylglycine endopeptidase) produced by Staphylococcus simulans. It acts as an antibacterial agent (bacteriocin) with high efficiency against Staphylococcus aureus. It remains highly active against methicillin-resistant S. aureus (MRSA) and vancomycin-intermediate S. aureus (VISA) because its mechanism of action, cleavage of the pentaglycine cross-bridges in the staphylococcal cell wall, is not affected by the usual antibiotic resistance mechanisms.

What technology or technologies would you use to perform this DNA synthesis and why?

Codon Optimization: To increase expression efficiency, the native lss gene is often codon-optimized to match the preferences of the host microorganism, such as E. coli.
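
A minimal sketch of the idea, assuming the simplest strategy of one preferred codon per amino acid. The PREFERRED_CODON table is an illustrative, partial set of codons commonly reported as frequent in E. coli, and the peptide is hypothetical. Real tools weigh more than codon frequency, so this is not how VectorBuilder computes its result.

```python
# Illustrative "most frequent codon" back-translation.
# PREFERRED_CODON is an assumed, partial table for E. coli; check a real
# codon usage table before using it for an actual design.
PREFERRED_CODON = {
    "M": "ATG", "A": "GCG", "G": "GGC", "K": "AAA",
    "L": "CTG", "T": "ACC", "H": "CAT", "W": "TGG",
}

def back_translate(peptide: str) -> str:
    """Return DNA encoding the peptide, using one preferred codon per residue."""
    missing = sorted(set(peptide) - set(PREFERRED_CODON))
    if missing:
        raise ValueError(f"no codon in the demo table for: {missing}")
    return "".join(PREFERRED_CODON[aa] for aa in peptide)

# Hypothetical 5-residue peptide, not the lysostaphin sequence.
print(back_translate("MAGKL"))  # ATGGCGGGCAAACTG
```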

This was done using VectorBuilder (https://www.vectorbuilder.com; citation guidance: https://en.vectorbuilder.com/resources/cite.html).

Vector Construction and Expression Systems: The gene is cloned into various expression vectors (e.g., pET-22b(+), pWB980, pET32a) and transformed into hosts like E. coli BL21(DE3) or Bacillus subtilis WB600, a strain engineered to be deficient in six extracellular proteases, reducing protein degradation.

Constitutive and Inducible Promoters: While many systems use inducible promoters (e.g., IPTG-inducible promoters) to control production, recent advances include using constitutive, non-inducible promoters (e.g., pemIK-Sa1 from staphylococcal toxin-antitoxin systems) to reduce costs for large-scale production.

Restriction Enzyme Cloning: The synthetic lysostaphin gene is digested with restriction enzymes (e.g., EcoRI, XhoI, NdeI) and ligated into expression vectors like pPIC9 or pET22b(+) using T4 DNA ligase.
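
Before choosing cloning enzymes it helps to check where their recognition sites occur, since a site inside the gene would cut the insert. The sketch below searches for the three enzymes named above. The insert sequence is hypothetical, and only one strand is searched, which is enough here because these recognition sites are palindromic.

```python
# Recognition sequences for the enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "XhoI": "CTCGAG", "NdeI": "CATATG"}

def find_sites(seq: str) -> dict:
    """Map each enzyme to the 0-based start positions of its recognition site."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits

# Hypothetical insert: NdeI site, a short coding stretch, then an XhoI site.
insert = "CATATGGCTGCAACACATGAACTCGAG"
print(find_sites(insert))  # {'EcoRI': [], 'XhoI': [21], 'NdeI': [0]}
```

An enzyme with no hits inside the coding stretch is safe to use at the flanks.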

Gibson Assembly: The NEBuilder Assembly Tool is used to design primers for amplifying target regions, which are then assembled into plasmids (e.g., pMAD).

Homologous Recombination: A homologous recombinase is used to join the optimized lysostaphin fragments to vectors.

What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

[Image: limitations table]

DNA Edit

What DNA would you want to edit and why?

I would want to edit, silence, or delete antimicrobial resistance (AMR) genes in bacteria, specifically targeting resistance plasmids or chromosomal genes to restore antibiotic susceptibility.

What technology or technologies would you use to perform these DNA edits and why?

I would use CRISPR technology. CRISPR-Cas9 is a genome-editing technology that uses a guide RNA (gRNA) to direct the Cas9 enzyme to a specific DNA sequence, where it acts as molecular scissors to create a double-strand break. The cell then repairs this cut using either NHEJ (resulting in gene knockouts) or HDR (enabling precise gene insertion or correction). It is a faster, cheaper, and more accurate alternative to previous methods.

The other editing technologies are TALENs & ZFNs: Older, customizable nuclease technologies that bind to specific DNA sequences to induce breaks, though they are generally less flexible than CRISPR.

TALENs generally exhibit significantly lower off-target effects than both ZFNs and CRISPR, making them safer for certain applications. They are widely used for precise, large-scale, and stable genome engineering in plants and animals.

How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 is a programmable gene-editing technology that uses a guide RNA (sgRNA) to direct the Cas9 enzyme to a specific DNA sequence, creating a targeted double-strand break (DSB). The system relies on a PAM sequence for binding, after which the cell repairs the cut using error-prone NHEJ (for gene disruption) or precise HDR (for gene correction).

Guide RNA (sgRNA): A synthetic RNA sequence designed to be complementary to the target DNA, directing the Cas9 enzyme to the precise location in the genome.

Cas9 Nuclease: An enzyme acting as molecular scissors that creates a double-strand break (DSB) in the DNA, specifically three bases upstream of a required Protospacer Adjacent Motif (PAM) sequence.

Target Recognition: The CRISPR-Cas9 complex scans the genome for a PAM sequence (commonly 5’-NGG-3’). Once found, it checks if the sgRNA matches the adjacent DNA sequence.

Once the DNA is cut, the cell attempts to repair it, which allows for gene editing:

Non-Homologous End Joining (NHEJ): A fast, error-prone repair mechanism that often introduces small deletions or insertions (indels), disrupting or “knocking out” the target gene.

Homology-Directed Repair (HDR): A precise repair mechanism used if a repair template is provided, allowing for the insertion of new, desired genetic information or correction of mutations.
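
The target-recognition and cutting steps can be sketched in a few lines. This is a simplified model that scans only the given strand for 5'-NGG-3' PAMs and reports the 20-nt protospacer plus the cut position three bases upstream of the PAM. The function name and the example sequence are hypothetical, and a real search would also scan the reverse complement.

```python
import re

def find_cas9_targets(seq: str, guide_len: int = 20):
    """Yield (protospacer, pam, cut_index) for each NGG PAM on this strand.

    cut_index is the 0-based index of the first base after the blunt cut,
    which falls 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = seq[pam_start - guide_len:pam_start]
        yield protospacer, match.group(1), pam_start - 3

# Hypothetical target: 20-nt protospacer, an AGG PAM, then flanking bases.
target = "ATGCATGCATGCATGCATGCAGGTTT"
for protospacer, pam, cut in find_cas9_targets(target):
    print(protospacer, pam, target[:cut], "|", target[cut:])
```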

What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

The preparation step involves identifying the AMR gene to be targeted and then analysing its sequence for a protospacer adjacent motif (PAM).

Design gRNAs complementary to the AMR genes. The Benchling tool can be used to reduce off-target effects and maximise specificity.

Construct plasmids by cloning the gRNA and Cas9 gene inserts using restriction enzymes and ligase.

Transform E. coli cells with the plasmids, then purify them and verify the sequences using Sanger sequencing.

Prepare the delivery system: chemical transformation, electroporation, or bacteriophage particles for phage-mediated delivery.
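
As a rough illustration of the guide design step, candidate guides can be pre-filtered by GC content before any off-target analysis. The 40-60% window used here is an assumed rule of thumb, not a Benchling setting, and the candidate sequences are made up.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def reverse_complement(seq: str) -> str:
    """Reverse complement, for designing guides against the opposite strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def filter_guides(candidates, gc_min=0.40, gc_max=0.60):
    """Keep guides whose GC fraction lies in [gc_min, gc_max] (assumed window)."""
    return [g for g in candidates if gc_min <= gc_fraction(g) <= gc_max]

# Hypothetical 20-nt candidates.
candidates = [
    "ATGCATGCATGCATGCATGC",  # 50% GC
    "ATATATATATATATATATAT",  # 0% GC
    "GCGCGCGCGCGCGCGCGCGC",  # 100% GC
]
print(filter_guides(candidates))
```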

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Inputs for AMR editing

Cas Nuclease: Cas9 (commonly S. pyogenes Cas9) is the primary protein; Cas12a (Cpf1) is an alternative.

Guide RNA (gRNA/sgRNA): Specifically designed 20-nt guide sequence with a scaffold.

Plasmids: Expression vectors containing both the Cas9 gene and the gRNA sequence (e.g., pX330).

Primers: For PCR verification of the edit.

Enzymes: Restriction enzymes (e.g., BpiI, BsmBI) and T4 DNA ligase for cloning.

Delivery Vehicle: Phages (e.g., temperate or lytic phages), nanoparticles, or conjugative plasmids.

Limitations of CRISPR

Off-Target Effects: The CRISPR-Cas9 complex may bind to and modify genomic sites that are not the intended target, leading to unintended and sometimes harmful mutations.

Delivery Challenges: Delivering the large CRISPR-Cas9 components into specific cells or tissues is difficult, which limits its application in many clinical contexts.

Low Efficiency: The process is not 100% efficient, particularly with homology-directed repair (HDR), leading to cells that may not have the desired edit.

Mosaicism: In animal models, not all cells may be edited equally, resulting in mosaicism where only some cells carry the desired modification, making it difficult to identify, study, or rely on the desired edit.

PAM Sequence Requirements: The Cas9 protein must bind to a specific protospacer adjacent motif (PAM) sequence located next to the target DNA, which may not be present at the desired location.

Persistent Binding: In some instances, the Cas9 protein binds to the cut site persistently, preventing the DNA repair machinery from functioning, leading to editing failure.

Week 3 HW: lab automation

Create a Python file to run on an Opentrons liquid handling robot.

[Images: house design coordinates; house design code; house design]
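
A minimal sketch of how a house outline could be turned into a list of dispense points, assuming offsets in millimetres from the plate centre. The shape, dimensions and spacing are illustrative and are not the coordinates used in this submission. The Opentrons calls that would consume the points are left out so the snippet runs without the robot software.

```python
def segment_points(start, end, n):
    """Return n evenly spaced points from start to end, inclusive."""
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / (n - 1), y0 + (y1 - y0) * i / (n - 1))
        for i in range(n)
    ]

def house_outline(width=20.0, wall=14.0, roof=10.0, n=5):
    """Points (x, y) in mm, relative to the plate centre, tracing a simple house."""
    half = width / 2
    corners = [
        (-half, -wall / 2),       # bottom-left
        (half, -wall / 2),        # bottom-right
        (half, wall / 2),         # right eave
        (0.0, wall / 2 + roof),   # roof peak
        (-half, wall / 2),        # left eave
        (-half, -wall / 2),       # back to bottom-left
    ]
    points = []
    for a, b in zip(corners, corners[1:]):
        for p in segment_points(a, b, n):
            if p not in points:   # corners are shared between segments
                points.append(p)
    return points

for x, y in house_outline():
    print(f"x={x:+.1f} mm, y={y:+.1f} mm")
```

Each point would become one small dispense at that offset from the plate centre.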

Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Taeok Kim, Eun Jung Jeon, Kil Koang Kwon, Minji Ko, Ha-Neul Kim, Seong Keun Kim, Eugene Rha, Jonghyeok Shin, Haseong Kim, Dae-Hee Lee, Bong Hyun Sung, Soo-Jung Kim, Hyewon Lee, Seung-Goo Lee, Cell-free biosensor with automated acoustic liquid handling for rapid and scalable characterization of cellobiohydrolases on microcrystalline cellulose, Synthetic Biology, Volume 10, Issue 1, 2025, ysaf005, https://doi.org/10.1093/synbio/ysaf005

This paper addresses the high-throughput screening challenges of engineering enzymes that degrade cellulose in paper sludge or microplastics in sewage sludge, since solid substrates are not readily accessible in cell-based biosensor systems. The authors adopted a cell-free cellobiose-detectable biosensor (CB-biosensor) for rapid characterization of cellobiohydrolase (CBH) activity, enabling direct detection of hydrolysis products without cellular constraints. The biosensor distinguishes between CBH subtypes (CBHI and CBHII) based on their modes of action. The Echo 525 liquid handler enables precise and reproducible sample processing, with fluorescence signals from automated preparations comparable to those from manual experiments. Assay volumes can be reduced to just a few microlitres, which is impractical with manual methods. The Echo 525 minimizes reagent consumption, accelerates testing, and facilitates reliable large-scale screening, advancing enzyme screening and accelerating the Design-Build-Test-Learn cycle for sustainable biomanufacturing.

Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

My final project idea is a COF-based biosensor with an aptamer for detection of AMR genes and drug delivery. A cloud laboratory can be used to automate the time-consuming SELEX (Systematic Evolution of Ligands by Exponential Enrichment) process for aptamer selection. Repeated rounds increase binding affinity and improve specificity by removing sequences that bind to non-target components. PCR/RT-PCR amplification of the bound sequences is carried out automatically to generate the pool for the next round.

Robots control the functionalization of the COF surface with various reagents (e.g., amine functionalization). Immobilization: automated, high-throughput liquid handling ensures consistent covalent attachment of the aptamer to the COF and uniform batch production. Robots then precisely load the COF-aptamer complex with siRNA for silencing AMR genes, ensuring consistent dosages for therapeutic applications.
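
The select-then-amplify loop that SELEX automation repeats can be sketched as a toy model. Everything here is an assumption made for illustration: the three library members, their retention probabilities, and the treatment of PCR as a simple renormalisation of the bound fraction.

```python
def selex_round(pool, affinity):
    """One selection + amplification round.

    pool maps sequence name -> fraction of the pool; affinity maps
    sequence name -> probability of being retained on the target.
    """
    bound = {seq: frac * affinity[seq] for seq, frac in pool.items()}
    total = sum(bound.values())
    # PCR amplification is modelled as renormalising the bound fraction.
    return {seq: amount / total for seq, amount in bound.items()}

# Hypothetical three-member library with made-up retention probabilities.
affinity = {"aptamer_A": 0.50, "aptamer_B": 0.10, "aptamer_C": 0.01}
pool = {name: 1 / 3 for name in affinity}

for round_no in range(1, 6):
    pool = selex_round(pool, affinity)
    print(round_no, {name: round(frac, 3) for name, frac in pool.items()})
```

In this model the strongest binder takes over the pool within a few rounds, which is why the number of rounds is worth tuning when the process is automated.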

Final project ideas

[Images: project]

Week 4 HW: Protein Design Part 1

Homework: Protein Design I

Part A. Conceptual Questions

1. Why are there only 20 natural amino acids?

The set of 20 natural amino acids was fixed very early in evolution, around the time of the RNA world (roughly 4 billion years ago). The set then became frozen: changing it would disrupt every existing protein, and the limits of tRNA recognition prevented further expansion.

2. Where did amino acids come from before enzymes that make them, and before life started?

Amino acids were formed by abiotic processes on the early Earth (about 4.5 billion years ago) from the gases, minerals and energy sources present at that time.

The Miller-Urey experiment simulated a similar environment and produced glycine, alanine and 33 other amino acids by condensation and reduction.

3. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

L-amino acids form right-handed α-helices because their chirality favours that twist, which avoids steric clashes involving the side chains. D-amino acids are the mirror image, so for the same reason they should prefer left-handed helices.
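
The mirror-image argument can be written down directly: reflecting a molecule negates every backbone dihedral angle. The sketch below uses approximate textbook values for an ideal right-handed α-helix, and the sign-of-phi classification is a simplification that only makes sense for helical conformations.

```python
# Approximate backbone dihedrals (degrees) of an ideal right-handed alpha-helix.
RIGHT_HANDED_ALPHA = (-57.0, -47.0)

def mirror(dihedrals):
    """Mirror-imaging a molecule negates every dihedral angle."""
    phi, psi = dihedrals
    return (-phi, -psi)

def handedness(dihedrals):
    """Classify a helical (phi, psi) pair by the sign of phi (simple heuristic)."""
    phi, _ = dihedrals
    return "right-handed" if phi < 0 else "left-handed"

d_helix = mirror(RIGHT_HANDED_ALPHA)
print(d_helix, handedness(d_helix))  # (57.0, 47.0) left-handed
```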

4. Can you discover additional helices in proteins?

There are other helix types, such as 3₁₀-helices, π-helices, and polyproline II (PPII) helices. They are formed by specific hydrogen bonding patterns and amino acid sequences.

5. Why are most molecular helices right-handed?

Most molecular helices in biology are right-handed because of the L-chirality of amino acids and the D-chirality of sugars. These configurations sterically favour a right-handed twist for stability and folding efficiency.

6. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

β-sheets aggregate because the hydrogen bond donors and acceptors at their edges promote edge-to-edge interactions with other sheets or unfolded chains. Hydrophobic side chains at the edges prefer to be buried by intermolecular contacts, leading to associations that extend the sheets into fibrils or amyloids.

The primary driving force for β-sheet aggregation is thermodynamic. Hydrogen bonds and van der Waals forces lower the free energy, and cooperativity on dimerization lowers it further. Aggregation occurs when hydrophobic residues bury themselves in a compact core; this “collapse” reduces the solvent-exposed area and gives an entropy gain from released water molecules.

7. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Many amyloid diseases occur because proteins misfold, adopt a β-sheet conformation, and then self-assemble into insoluble fibrils. The native protein structure is destabilized first, and partial unfolding then exposes β-strand regions that stack via hydrogen bonding into cross-β-sheet architectures. The resulting fibrils are highly ordered parallel or antiparallel β-sheets that aggregate in a prion-like manner, leading to plaques that disrupt tissue function in conditions like Alzheimer’s and type II diabetes.

Amyloid β-sheets can be used as biomaterials because of their exceptional mechanical strength, biocompatibility, and nanoscale self-assembly. Non-pathogenic or engineered amyloid fibrils form robust scaffolds for tissue engineering, drug delivery, and biosensors. They mimic extracellular matrices to support cell adhesion and growth. They can be fabricated into bioplastics, hydrogels, and functional coatings, with properties tuned by genetic modification or hybridization with nanoparticles.

8. Can you make other non-natural amino acids? Design some new amino acids.

Yes, we can.

The side chain of the amino acid has to be modified, for example by methylation, by adding some other functional group, or by replacing it with another, bulkier side chain. Advantages: green, selective; challenges: low yield, stability issues.

Part B: Protein Analysis and Visualization

Briefly describe the protein you selected and why you selected it.

mCardinal is the far-red fluorescent protein I have chosen. It is a bright, monomeric protein derived from Entacmaea quadricolor, with an emission peak around 659 nm.

I chose it because its excitation at 604 nm and emission at 659 nm lie in the far-red range that is well suited to deep-tissue imaging. It is far brighter than mKate2 and other early-generation far-red variants. The monomeric form minimizes toxicity and can be used as a fusion tag with target proteins without causing aggregation. It is highly photostable, so it can be used for long-term imaging.

How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.

The protein is 268 amino acids long. The most common amino acid is G, it occurs 25 times.
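
A count like this takes only a few lines of Python. The sketch below is not the course Colab notebook, and the demo string is a short hypothetical sequence rather than mCardinal; pasting in the real sequence would reproduce the length and the most frequent residue.

```python
from collections import Counter

def residue_stats(sequence: str):
    """Return (length, most frequent residue, its count) for a protein sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    return len(seq), residue, n

demo = "MGSGEELIKGNMHMKLG"  # hypothetical 17-residue sequence
print(residue_stats(demo))  # (17, 'G', 4)
```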

How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

It has many homologs, and some of them are uncharacterised proteins. Most of the homologs belong to the red fluorescent protein family.

Does your protein belong to any protein family?

mCardinal belongs to the GFP-like protein family (specifically the Green Fluorescent Protein superfamily)

Identify the structure page of your protein in RCSB When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

The structure was solved in 2014. It is a good-quality structure, with a resolution of 2.21 Å.

Are there any other molecules in the solved structure apart from protein?

No.

Does your protein belong to any structure classification family?

It belongs to the family of fluorescent proteins.

Open the structure of your protein in any 3D molecule visualization software: PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)

Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

It has more sheets. Helices are red, sheets are yellow, and loops are green.

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Hydrophobic residues are coloured yellow and hydrophilic residues gray. The colouring shows that hydrophilic residues lie mostly on the outer surface of the protein, while hydrophobic residues are buried inside the molecule.
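
The hydrophobic/hydrophilic split can also be estimated from sequence alone using the Kyte-Doolittle hydropathy scale. This is a complement to the structural view, not a replacement, since a sequence-level score says nothing about where residues sit in 3D. The cutoff of 0 for "hydrophobic" is an assumption, and the demo peptide is hypothetical.

```python
# Kyte-Doolittle hydropathy values; positive means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues with a positive hydropathy value."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] > 0 for aa in seq) / len(seq)

def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

demo = "AILKDE"  # hypothetical peptide: three hydrophobic, three charged residues
print(hydrophobic_fraction(demo), round(gravy(demo), 3))
```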

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Part C. Using ML-Based Protein Design Tools

Copy the HTGAA_ProteinDesign2026.ipynb notebook and set up a colab instance with GPU.

Choose your favorite protein from the PDB.

I am choosing the mCardinal far red fluorescent protein.

We will now try multiple things in the three sections below; report each of these results in your homework writeup on your HTGAA website:

C1. Protein Language Modeling

Deep Mutational Scans

Use ESM2 to generate an unsupervised deep mutational scan of your protein based on language model likelihoods.

Can you explain any particular pattern? (choose a residue and a mutation that stands out)
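
A deep mutational scan scores every possible single substitution, so it helps to see how that set is enumerated. The sketch below only generates the mutants and their labels; the ESM2 likelihood scoring done in the course notebook is not shown, and the three-residue input is a hypothetical example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(seq: str):
    """Yield (label, mutant_sequence) for every single substitution.

    Labels use the usual wild-type/position/mutant form, e.g. "G2A",
    with 1-based positions.
    """
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]

mutants = dict(single_mutants("MGS"))
print(len(mutants), mutants["G2A"])  # 57 MAS
```

A protein of length L therefore has 19 × L single mutants, which is the size of the heatmap a scan produces.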

Latent Space Analysis

Use the provided sequence dataset to embed proteins in reduced dimensionality.

Analyze the different formed neighborhoods: do they approximate similar proteins?

Place your protein in the resulting map and explain its position and similarity to its neighbors.

Protein Folding

Folding a protein

Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?