Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices


  • Week 2 HW: DNA Read, Write, and Edit


  • Week 3 HW Lab Automation


  • Week 4 HW: Protein Design Part I

    Part A. Conceptual Questions How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Answer: On average, raw meat is ~20-25% protein by weight. Taking 20%, a 500 g sample contains 500 × 0.20 = 100 g of protein. Since 1 g = 6.022e+23 Daltons, 100 g = 6.022e+25 Daltons, so the number of amino acids is 6.022e+25 / 100 = 6.022e+23 AAs!

Subsections of Homework

Week 1 HW: Principles and Practices


Class Assignment

A Wearable or At-Home Androgen Biosensing Platform for PCOS Diagnosis and Management

I am interested in developing an easy-to-use biosensor/microfluidic kit that can detect and quantify androgen levels from blood or sweat to support the diagnosis and personalized management of Polycystic Ovary Syndrome (PCOS).

PCOS is a common hormonal disorder affecting 5-13% of reproductive-aged women and is characterized by androgen excess, irregular or absent ovulation, and polycystic ovaries. Symptoms can include severe acne, hirsutism, hair loss, and infertility, and the condition is associated with long-term risks such as diabetes, cardiovascular disease, and endometrial cancer.

Currently, one of the key diagnostic criteria for PCOS is elevated androgen levels, which are typically assessed through blood tests performed specifically on days 2-5 of the menstrual cycle. However, for many women with PCOS, menstrual cycles are irregular or absent, making it difficult to determine when testing should occur. As a result, diagnosis can be delayed or inconclusive.

A biosensing platform capable of continuously or repeatedly monitoring androgen levels throughout an entire cycle would provide a more accurate picture of hormonal dynamics. This could improve diagnostic reliability and enable more personalized treatment strategies by tracking hormonal responses to lifestyle changes or medications over time.

Governance and Policy Goals

1. Ensure Biological Safety and Security

Prevent misuse, unsafe operation, or unintended biological harm associated with the biosensing platform.

Sub-goals:

  • Restrict access to raw sensor calibration, modification protocols, or firmware to reduce the risk of misuse.
  • Incorporate built-in safety checks, clear user instructions, and warnings to minimize incorrect use or misinterpretation of results.

2. Protect User Privacy and Data Security

Protect sensitive hormonal and reproductive health data to prevent discrimination, stigma, or exploitation.

Sub-goals:

  • Minimize data collection to only what is strictly necessary for hormonal monitoring and trend analysis.
  • Ensure hormonal data is encrypted, anonymized, or processed locally when possible to reduce exposure.

3. Promote Equity and Accessibility in Women’s Health

Ensure the tool reduces existing barriers to diagnosis and care rather than reinforcing healthcare inequities.

Sub-goals:

  • Design the biosensing kit to be affordable and usable outside specialized clinical settings.
  • Validate the platform across diverse populations to reduce bias and improve diagnostic reliability for underrepresented groups.

Governance Actions

1. Encrypted and Minimal Data Storage System

Actor: Companies (product developers), with regulatory oversight
Ensure that hormonal data is encrypted and that only the minimum necessary data is collected or stored, preferably processed locally on-device.

2. Diagnostic Clarity and Uncertainty Communication

Actor: Companies + regulators
Ensure that diagnostic outputs clearly distinguish between high-confidence diagnoses and ambiguous or borderline results. The tool should explicitly communicate diagnostic confidence and uncertainty, and automatically prompt users to seek further clinical evaluation when results fall outside validated confidence thresholds.

3. Equity-Focused Validation Across Diverse Populations

Actor: Academic researchers, funders, public health agencies
Conduct and incentivize validation studies across diverse populations to reduce bias and improve diagnostic reliability for underrepresented groups.

4. Informed-Use and Result Interpretation Requirements

Actor: Companies + regulators
Require a short, standardized informed-use process before first use that explains the scope of the tool, the meaning of diagnostic confidence and uncertainty, and appropriate user actions in response to different types of results.

5. Post-Market Monitoring and Feedback Loop

Actor: Regulators + companies
Require ongoing, anonymized post-market monitoring of the tool after deployment, including reporting of false positives, false negatives, and aggregate user outcomes. This feedback should be used to identify unintended harms, performance gaps across populations, and opportunities for improvement over time.

Actions Ranking

[Image: table scoring each governance action against the policy goals]

Recommendation for Best Governance Action

Based on the scoring analysis, I would prioritize a combination of governance actions, with Encrypted and Minimal Data Storage as the primary action, alongside Informed-Use and Result Interpretation Requirements and Post-Market Monitoring and Feedback as complementary measures.

The highest priority should be the implementation of an encrypted and minimal data storage system. This action consistently scored strongest across privacy and data security sub-goals and provides a foundational safeguard against harm. Because hormonal and reproductive health data is highly sensitive, failures in data protection could lead to discrimination, stigma, or misuse at scale. Prioritizing data minimization and local or encrypted processing reduces these risks regardless of downstream user behavior or diagnostic accuracy. This action is also relatively feasible for companies to implement early and does not significantly postpone research or innovation.

As a second priority, I would emphasize informed-use and result interpretation requirements together with post-market monitoring. Informed-use mechanisms reduce misinterpretation and panic by helping users understand their results and the uncertainty attached to them, while post-market monitoring enables the detection of errors, biases, or unintended harms that emerge after deployment.

One key trade-off is the tension between data minimization and the need for enough data to support post-market monitoring. There is also uncertainty around how effectively users will engage with and understand informed-use materials, which could limit their intended protective impact.

This recommendation is directed toward the research and development team responsible for building the biosensing platform, including academic labs and early-stage startups developing the technology. Overall, the scoring highlights that no single action is sufficient on its own. Instead, a comprehensive approach that prioritizes privacy infrastructure while supporting safe use and continuous evaluation offers the most ethically robust path forward.

Reflection on Ethical Concerns

An ethical concern that became especially salient to me this week is the risk associated with centralized access to sensitive hormonal and reproductive health data. Considering what could happen if a single actor gained access to aggregated user data highlighted how data breaches or misuse could cause large-scale harm, including discrimination or loss of trust. This concern reinforced the importance of prioritizing encrypted and minimal data storage, local data processing when possible, and ongoing post-market monitoring, rather than treating privacy as a secondary or purely technical consideration.


Lab Preparation

🦠 Complete Lab Specific Training in Person DONE
🦠 Complete Safety Training in Atlas DONE


Week 2 Lecture Prep

Questions from Professor Jacobson:

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Answer: The error rate of polymerase is approximately 1 in 10⁶ bases copied. The human genome is about 3 × 10⁹ bp, so that corresponds to roughly 3,000 potential errors per genome copy (and across many dividing cells, the total number of errors could be much higher). Biology deals with this discrepancy through multiple layers of error correction: proofreading and repair mechanisms, such as the MutS mismatch repair system, that reduce the number of errors.
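The arithmetic behind the estimate can be written out as a short sketch. The figures are the ones quoted in the answer; the variable names and the `errors_after_correction` helper (with its fold-improvement argument) are my own illustration, not measured values.

```python
# Expected replication errors per genome copy, using the figures from the answer:
# about 1 error per 10^6 bases and a genome of about 3 x 10^9 bp.
BASES_PER_ERROR = 1_000_000
GENOME_LENGTH_BP = 3_000_000_000

expected_errors = GENOME_LENGTH_BP // BASES_PER_ERROR
print(f"Expected errors per genome copy: {expected_errors:,}")


def errors_after_correction(raw_errors: float, fold_improvement: float) -> float:
    """Hypothetical helper: errors left if repair improves fidelity by some fold factor.

    The fold factor is an assumption supplied by the caller, not a measured value.
    """
    return raw_errors / fold_improvement
```

For example, `errors_after_correction(expected_errors, 1000)` shows how a thousand-fold improvement from repair would scale the estimate down.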

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Answer: Most amino acids can be encoded by multiple codons, so there are an astronomically large number of possible DNA sequences that could encode the same protein, with many choices per amino acid compounding across the full length of the gene.

In practice, not all of these sequences work because certain codons are translated more efficiently than others, and some sequences can create unfavorable secondary structures or other issues. As a result, only a small subset of possible DNA sequences is functional and practical for expressing the desired protein.
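To make "astronomically large" concrete, here is a minimal sketch that multiplies the number of synonymous codons for each residue. The degeneracy values are those of the standard genetic code; the function name and the choice of the first 20 residues of Pro-resilin as the example are my own.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}
STOP_CODONS = 3  # TAA, TAG, TGA


def count_encodings(protein: str, include_stop: bool = True) -> int:
    """Count the DNA sequences that encode `protein` (choices multiply per residue)."""
    n = math.prod(CODONS_PER_AA[aa] for aa in protein.upper())
    return n * STOP_CODONS if include_stop else n


example = "MFKLLGLTLLMAMVVLGRPE"  # first 20 residues of Pro-resilin (Q9V7U0)
n = count_encodings(example)
print(f"{len(example)} residues -> about 10^{math.log10(n):.1f} possible coding sequences")
```

Because the choices multiply, the count grows exponentially with protein length, so a protein of several hundred residues has far more possible encodings than could ever be tested.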

Questions from Dr. LeProust:

  1. What’s the most commonly used method for oligo synthesis currently?

Answer: The most commonly used method for oligo synthesis is chemical phosphoramidite DNA synthesis, which builds DNA sequences one nucleotide at a time through repeated cycles of deprotection, coupling, capping, and oxidation on a solid support.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

Answer: It is difficult to synthesize oligos longer than 200nt via direct chemical synthesis because each nucleotide addition (chemical reaction) has an error rate, and these errors accumulate with every synthesis cycle, so the fraction of correct full-length sequences falls as the length increases.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Answer: A 2000bp gene cannot be made via direct oligo synthesis because synthesis errors accumulate with each nucleotide addition. As sequence length increases, the probability of producing a correct full-length sequence drops dramatically. Since direct oligo synthesis is already unreliable beyond 200nt, synthesizing a 2000bp sequence in a single run is effectively impossible. In addition, oligo synthesis produces single-stranded DNA, and generating a correct double-stranded gene of this length would further compound errors, making direct synthesis impractical.
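A small sketch shows how quickly the yield of correct full-length product falls. It assumes each coupling step succeeds independently with the same probability; the 99.5% per-step efficiency is an illustrative assumption on my part, not a figure from the lecture.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands with every coupling correct, assuming independent steps."""
    return coupling_efficiency ** length_nt


ASSUMED_EFFICIENCY = 0.995  # hypothetical per-step coupling efficiency

for length in (20, 200, 2000):
    y = full_length_yield(length, ASSUMED_EFFICIENCY)
    print(f"{length:>5} nt: {y:.2e} of strands are full-length and correct")
```

Under this assumption the yield decays exponentially with length, which is why long genes are assembled from short oligos instead of being synthesized in one run.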

Question from George Church:

  1. Using Google & Prof. Church’s slide #4, what are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?

Answer: The 10 essential amino acids in animals are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine.

I did not watch Jurassic Park, but according to what I read online, the “Lysine Contingency” was a genetic alteration Henry Wu performed in the dinosaurs’ genome to knock out their ability to produce lysine, forcing them to rely on lysine supplements provided by park staff in an attempt to imprison them in the park and prevent them from destroying the global ecosystem.

After checking where lysine comes from (I asked Gemini: “How do animals produce lysine?”), I found that animals cannot produce lysine at all and must obtain it through nutrition. This means that humans, animals, and dinosaurs would all need dietary lysine regardless. Based on this, I conclude that Henry Wu’s lysine contingency was useless as a containment strategy, and it is unclear what genetic mechanism he could have targeted to achieve this goal, since there is no direct lysine biosynthesis pathway in animals to knock out.

Resource: Lysine contingency, Jurassic Park Wiki


My HTGAA Website

What I did to personalise the website

Week 2 HW: DNA Read, Write, and Edit

Predicted structure of Pro-resilin (Drosophila melanogaster), generated by AlphaFold and retrieved from UniProt (UniProtKB: Q9V7U0).

*You can find parts 1 and 2 in the week 2 lab section!

Part 3: DNA Design Challenge

My protein choice: Pro-resilin!
Pro-resilin is a highly elastic, rubber-like structural protein found in insects that enables efficient energy storage for jumping, flight, and sound production. About seven years ago, my biotechnology teacher in high school showed us an article about a very cool leading professor in Israel, who was doing things that seemed almost impossible. I vividly remember one of the projects he was working on: scientists had identified a protein that enables a particular spider species to jump remarkably high relative to its body size, and this scientist wanted to give humans the same ability by creating shoes with soles made from some protein similar to Pro-resilin.

That moment has stayed with me until today. I think it would be very cool to look at the structure of this protein, try to understand what gives it its unique characteristics, and potentially think about what could be done with it.

I went to UniProt.org and searched for “Pro-resilin protein Drosophila melanogaster”. Here is the AA sequence:

>sp|Q9V7U0|RESIL_DROME Pro-resilin OS=Drosophila melanogaster OX=7227 GN=resilin PE=1 SV=1
MFKLLGLTLLMAMVVLGRPEPPVNSYLPPSDSYGAPGQSGPGGRPSDSYGAPGGGNGGRP
SDSYGAPGQGQGQGQGQGGYAGKPSDTYGAPGGGNGNGGRPSSSYGAPGGGNGGRPSDTY
GAPGGGNGGRPSDTYGAPGGGGNGNGGRPSSSYGAPGQGQGNGNGGRSSSSYGAPGGGNG
GRPSDTYGAPGGGNGGRPSDTYGAPGGGNNGGRPSSSYGAPGGGNGGRPSDTYGAPGGGN
GNGSGGRPSSSYGAPGQGQGGFGGRPSDSYGAPGQNQKPSDSYGAPGSGNGNGGRPSSSY
GAPGSGPGGRPSDSYGPPASGSGAGGAGGSGPGGADYDNDEPAKYEFNYQVEDAPSGLSF
GHSEMRDGDFTTGQYNVLLPDGRKQIVEYEADQQGYRPQIRYEGDANDGSGPSGPGGPGG
QNLGADGYSSGRPGNGNGNGNGGYSGGRPGGQDLGPSGYSGGRPGGQDLGAGGYSNGKPG
GQDLGPGGYSGGRPGGQDLGRDGYSGGRPGGQDLGASGYSNGRPGGNGNGGSDGGRVIIG
GRVIGGQDGGDQGYSGGRPGGQDLGRDGYSSGRPGGRPGGNGQDSQDGQGYSSGRPGQGG
RNGFGPGGQNGDNDGSGYRY

*I chose the Drosophila melanogaster (fruit fly) protein because it is the best-characterized and most widely studied form of resilin.

Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
To determine the nucleotide sequence corresponding to Pro-resilin in Drosophila melanogaster, I followed the link for the canonical transcript (Q9V7U0-1) on UniProt, which directed me to Ensembl Metazoa. There, I downloaded only the coding sequence (CDS) in FASTA format, excluding UTRs and introns, to obtain the exact DNA sequence that encodes the protein. The sequence is shown below:

>resilin-RA cds:protein_coding
ATGTTCAAGTTACTCGGCTTGACGCTGCTCATGGCAATGGTGGTCCTTGGGCGACCGGAG
CCACCAGTTAACTCGTATCTACCTCCGTCCGATAGCTATGGAGCACCGGGTCAGAGTGGT
CCCGGCGGCAGGCCGTCGGATTCCTATGGAGCTCCTGGTGGTGGAAACGGTGGACGGCCC
TCAGACAGCTATGGCGCTCCAGGCCAGGGTCAAGGACAGGGACAAGGACAAGGTGGATAT
GCAGGCAAGCCCTCAGATACCTATGGAGCTCCTGGTGGTGGAAATGGCAACGGAGGTCGT
CCATCGAGCAGCTATGGCGCTCCTGGCGGTGGAAACGGTGGTCGTCCTTCGGATACCTAC
GGTGCTCCTGGTGGCGGAAATGGTGGACGCCCATCGGACACTTATGGTGCTCCTGGTGGT
GGTGGAAATGGCAACGGCGGACGACCTTCAAGCAGCTATGGAGCTCCTGGTCAAGGACAA
GGCAACGGAAATGGCGGTCGCTCATCGAGCAGCTATGGTGCTCCTGGCGGTGGAAACGGC
GGTCGTCCTTCGGATACCTACGGTGCTCCCGGTGGTGGAAACGGTGGTCGTCCTTCGGAT
ACTTACGGCGCTCCTGGTGGCGGCAATAATGGCGGTCGTCCCTCAAGCAGCTACGGCGCT
CCTGGTGGTGGAAACGGTGGTCGTCCATCTGACACCTATGGCGCTCCTGGTGGCGGTAAC
GGAAACGGCAGCGGTGGTCGTCCTTCAAGCAGCTATGGAGCTCCTGGTCAGGGCCAAGGT
GGATTTGGTGGTCGTCCATCGGACTCCTATGGTGCTCCTGGTCAGAACCAAAAACCATCA
GATTCATATGGCGCCCCTGGTAGCGGCAATGGCAACGGCGGACGTCCTTCGAGCAGCTAT
GGAGCTCCAGGCTCAGGACCTGGTGGCCGACCCTCCGACTCCTACGGACCCCCAGCTTCT
GGATCGGGAGCAGGTGGCGCTGGAGGCAGTGGACCCGGCGGCGCTGACTACGATAACGAT
GAGCCCGCCAAGTACGAATTTAATTACCAGGTTGAGGACGCGCCCAGCGGACTCTCGTTC
GGGCATTCAGAGATGCGCGACGGTGACTTCACCACCGGCCAGTACAATGTCCTGTTGCCC
GACGGAAGGAAGCAAATTGTGGAGTATGAAGCCGACCAGCAGGGCTACCGGCCACAGATC
CGCTACGAAGGCGATGCCAACGATGGCAGTGGTCCCAGCGGTCCTGGAGGTCCTGGCGGT
CAGAATCTTGGTGCCGATGGCTACTCCAGTGGACGTCCCGGCAATGGAAATGGCAACGGA
AATGGCGGTTACTCCGGTGGACGTCCAGGAGGCCAGGATTTGGGACCTAGTGGATATTCC
GGTGGTCGTCCAGGAGGTCAGGATCTAGGCGCCGGTGGCTACTCCAATGGCAAGCCGGGC
GGCCAAGACTTGGGACCAGGCGGTTACTCCGGTGGTCGCCCTGGAGGTCAGGACTTGGGT
CGAGACGGCTACTCCGGTGGACGTCCAGGTGGACAGGACCTCGGTGCCAGCGGCTACTCC
AATGGTAGGCCAGGCGGCAATGGCAACGGTGGATCCGATGGCGGTCGTGTGATCATCGGT
GGACGTGTGATAGGCGGCCAGGATGGCGGTGATCAGGGCTACTCCGGCGGACGTCCCGGT
GGTCAGGATCTTGGACGTGATGGCTACTCCAGCGGTCGTCCTGGTGGTCGGCCAGGCGGC
AACGGCCAGGATAGTCAGGATGGCCAAGGATACTCGAGCGGCAGGCCGGGTCAGGGTGGC
CGGAATGGATTCGGACCCGGTGGTCAGAACGGTGACAACGATGGCAGCGGTTATCGGTAC
TAG
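As a sanity check that the downloaded CDS really encodes the UniProt protein, the DNA can be translated back and compared. This is a minimal sketch using the standard genetic code; the `translate` function is my own helper, and for brevity it is applied only to the first line and the last 18 nucleotides of the CDS above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


cds_start = "ATGTTCAAGTTACTCGGCTTGACGCTGCTCATGGCAATGGTGGTCCTTGGGCGACCGGAG"
cds_end = "AGCGGTTATCGGTACTAG"
print(translate(cds_start))  # compare with the start of the UniProt sequence
print(translate(cds_end))    # compare with the end of the UniProt sequence
```

Running the same function on the full CDS and comparing the result with the full UniProt sequence would confirm the match end to end.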

Codon optimization.
Since the Twist Codon Optimization Tool was out of service on the website, I searched Google and found a free alternative, the GenSmart Codon Optimization Tool. I input my CDS and chose optimization for E. coli:

After some time, I got the optimized sequence, shown here in plain text:

ATGTTTAAATTGCTTGGATTAACACTGCTAATGGCAATGGTTGTACTGGGCCGCCCGGAACCGCCTGTTAATAGCTATCTCCCGCCGTCGGACAGCTATGGCGCTCCGGGCCAAAGCGGCCCCGGCGGCCGTCCGTCAGACAGCTATGGCGCTCCGGGCGGTGGCAATGGCGGTCGGCCGTCCGACAGCTACGGAGCGCCGGGCCAAGGGCAAGGTCAGGGCCAGGGTCAAGGTGGCTATGCAGGTAAGCCGTCTGACACGTACGGTGCGCCGGGAGGTGGTAATGGCAACGGCGGACGACCAAGCAGCAGCTACGGTGCTCCGGGTGGTGGAAACGGTGGTCGTCCGAGCGACACGTACGGTGCTCCGGGCGGGGGGAACGGCGGACGTCCGAGCGACACCTATGGCGCGCCGGGGGGTGGCGGTAACGGTAACGGCGGTCGTCCGTCCTCTTCTTACGGTGCGCCGGGTCAAGGTCAGGGTAACGGCAACGGTGGCCGTAGCTCCAGTTCTTACGGTGCCCCGGGTGGTGGGAACGGCGGCCGTCCGTCCGACACTTACGGGGCCCCAGGGGGCGGTAATGGCGGCCGTCCGTCTGACACCTACGGCGCGCCGGGTGGTGGAAACAACGGCGGCCGTCCGTCATCCTCGTACGGCGCCCCGGGTGGTGGCAACGGTGGTCGCCCGTCCGACACCTATGGTGCACCTGGCGGTGGCAACGGCAACGGTAGCGGAGGACGTCCAAGCAGCTCCTACGGTGCCCCGGGTCAGGGTCAGGGTGGCTTCGGCGGTCGTCCGTCCGATTCTTACGGCGCCCCGGGTCAGAATCAGAAACCGTCTGATAGCTATGGCGCGCCGGGTTCGGGCAATGGTAACGGTGGCCGTCCTTCCTCTTCGTACGGGGCGCCGGGCAGCGGTCCGGGTGGCCGGCCGAGCGACAGCTACGGCCCACCGGCAAGCGGTAGTGGCGCTGGTGGTGCGGGTGGGTCGGGCCCGGGTGGCGCTGACTATGATAATGATGAACCGGCGAAATACGAGTTCAACTATCAGGTTGAGGATGCGCCGAGCGGCCTGAGCTTTGGTCACAGCGAAATGCGTGATGGTGATTTTACCACCGGTCAATATAACGTGCTGCTGCCTGATGGTCGTAAACAGATCGTGGAATATGAGGCAGATCAGCAGGGCTACCGCCCACAGATTCGTTATGAGGGTGACGCAAATGATGGTTCGGGTCCGAGCGGTCCGGGCGGCCCCGGCGGCCAAAACCTGGGTGCGGATGGTTACAGCTCCGGTAGACCGGGCAACGGCAACGGTAATGGCAATGGTGGCTATAGCGGTGGCCGTCCGGGCGGTCAGGATTTGGGCCCAAGCGGTTATTCCGGTGGTCGCCCGGGTGGCCAAGACTTAGGAGCGGGAGGCTACTCTAATGGCAAGCCGGGTGGTCAAGACCTGGGTCCGGGAGGTTACTCTGGTGGTCGTCCGGGTGGTCAGGATCTGGGCCGCGATGGCTACTCCGGTGGCCGCCCGGGTGGTCAAGATTTGGGTGCGAGCGGTTACTCTAATGGACGCCCGGGTGGGAACGGTAATGGCGGTAGCGACGGTGGGCGCGTCATCATCGGTGGTCGCGTGATTGGTGGCCAAGACGGTGGCGACCAGGGTTACAGCGGGGGCCGTCCGGGCGGTCAGGATTTGGGGCGTGATGGTTATAGCAGCGGTAGACCTGGTGGTCGTCCTGGTGGCAACGGACAAGACAGCCAGGACGGCCAGGGCTATAGCAGCGGTCGTCCGGGGCAAGGCGGACGTAATGGCTTCGGTCCGGGTGGCCAAAACGGCGACAACGATGGCAGCGGCTATCGCTATTAA

There are several reasons why we need to optimize codon usage. Although multiple codons can encode the same amino acid, different organisms have preferences for certain codons over others - a phenomenon known as codon bias. This means that a DNA sequence that works efficiently in one organism may not be translated as efficiently in another. If rare or non-preferred codons are used, the ribosome may stall, leading to lower protein expression or incomplete translation. Therefore, it is important to adapt the codons in the sequence to match the preferred codon usage of the organism in which the protein will be expressed.

I chose to optimize the sequence for Escherichia coli because it is one of the most well-characterized and widely used organisms for recombinant protein expression. E. coli grows quickly, is easy to manipulate genetically, and has a large body of existing protocols and tools available. Since this is my first time going through the full synthetic gene design and expression workflow, I felt it was best to work with a well-established system before moving on to more complex organisms.
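Codon optimization should change the codons but not the protein. The sketch below compares the first 20 codons of the native CDS with the first 20 codons of the optimized sequence, checking that both translate to the same residues and counting how many codons were swapped. The helper names are my own, and only these short prefixes are checked here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive codons, ignoring any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


native = "ATGTTCAAGTTACTCGGCTTGACGCTGCTCATGGCAATGGTGGTCCTTGGGCGACCGGAG"
optimized = "ATGTTTAAATTGCTTGGATTAACACTGCTAATGGCAATGGTTGTACTGGGCCGCCCGGAA"

pairs = list(zip(codons(native), codons(optimized)))
synonymous = all(CODON_TABLE[a] == CODON_TABLE[b] for a, b in pairs)
changed = sum(a != b for a, b in pairs)

print(f"Same protein: {synonymous}; codons changed: {changed} of {len(pairs)}")
```

The same comparison over the full sequences would show how heavily the tool rewrote the gene while leaving the amino acid sequence intact.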

What technologies could be used to produce this protein from your DNA?
One technology that can be used to produce this protein from my DNA is recombinant protein expression in bacteria, which is exactly the approach taken here. After obtaining and codon-optimizing the coding sequence (CDS), the gene is inserted into a plasmid vector that contains a promoter, ribosome binding site (RBS), terminator, and a selectable marker such as an antibiotic resistance gene. The plasmid is then transformed into E. coli cells, which are grown under selective conditions to ensure that only bacteria containing the correct vector survive.

Once inside the bacteria, the promoter drives transcription of the inserted DNA into mRNA. The ribosome then binds to the mRNA and translates the coding sequence into the corresponding protein. As the bacteria grow and divide, they produce increasing amounts of the recombinant protein. After sufficient expression, the protein can be purified to isolate it from the rest of the cell contents.


Part 4: Prepare a Twist DNA Synthesis Order

Linear Map
After following all of the instructions on the website, I have obtained the following Linear Map:

And here is the link to view it in Benchling: Resilin-RA expression cassette

When I proceeded to Twist, the platform flagged issues with my sequence and recommended re-optimization. After running their internal optimization, the system still classified the sequence as too complex; however, it did allow me to download the plasmid design.

Here it is:

Yayyyyy :)


Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why?
I would like to sequence the gut microbiome of athletes before and after intense training, or before and after a traumatic injury (such as ligament tears or broken bones), to investigate whether specific bacterial compositions or functional genes correlate with recovery speed, inflammation levels, or injury resilience. I think it would be fascinating to explore whether the microbiome, often referred to as “the second brain”, plays an even greater role in recovery and overall health than we currently understand.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
To obtain a microbiome sample, I would rely on a non-invasive stool sample collection method, since this is typically how researchers study the gut microbiome. For sequencing, I would probably choose next-generation sequencing (NGS), specifically shotgun metagenomic sequencing. NGS allows many DNA fragments to be sequenced in parallel, enabling comprehensive analysis of complex microbial communities. This approach would allow me not only to identify the bacterial species present, but also to analyze functional genes that may be associated with inflammation regulation, tissue repair, and recovery capacity.

  1. Shotgun metagenomic sequencing using Illumina technology is considered a second-generation sequencing method because it enables massively parallel sequencing of millions of short DNA fragments simultaneously. Unlike first-generation Sanger sequencing, which reads one fragment at a time, second-generation platforms generate high-throughput short reads using sequencing-by-synthesis chemistry.
  2. The input for this method is the total DNA extracted from a stool sample. The extracted DNA is fragmented into smaller pieces, typically a few hundred base pairs long. Synthetic adapter sequences are ligated to both ends of each fragment, allowing them to bind to the sequencing flow cell. The fragments are then amplified by PCR to generate sufficient material, creating a sequencing library ready for analysis.
  3. In Illumina sequencing, DNA fragments bind to a flow cell and undergo cluster amplification. During sequencing-by-synthesis, fluorescently labeled nucleotides are incorporated one base at a time to synthesize the complementary strand. After each incorporation, a camera detects the emitted fluorescence signal, and the color corresponds to a specific base (A, T, C, or G). This process repeats in cycles, allowing the machine to determine the DNA sequence through base calling.
  4. The output consists of millions of short DNA reads, typically stored in FASTQ format. Each read includes both the nucleotide sequence and a quality score (a measure of confidence in each base call) for each base. These reads can then be computationally analyzed to determine microbial composition and identify functional genes within the microbiome.
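To illustrate the output format described in point 4, here is a minimal FASTQ reader. It assumes four-line records and the common Phred+33 quality encoding; the record shown is made up for illustration and does not come from a real run.

```python
def parse_fastq(text: str):
    """Yield (read_id, sequence, quality_scores) from four-line FASTQ records.

    Assumes Phred+33 encoding: quality score = ASCII code of the character minus 33.
    """
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, seq, _separator, qual = lines[i:i + 4]
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


def error_probability(q: int) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Hypothetical record, for illustration only.
example = "@read1\nACGT\n+\nII5!\n"

for read_id, seq, quals in parse_fastq(example):
    for base, q in zip(seq, quals):
        print(f"{read_id} {base} Q{q} P(error)={error_probability(q):.4f}")
```

A score of 40 corresponds to an error probability of 1 in 10,000, which is why low-quality bases are usually trimmed or filtered before analysis.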

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why?
I would like to synthesize DNA that can be transcribed and translated into a therapeutic protein “paste” (or hydrogel-like material) to support healing of joint cartilage, and specifically the meniscus. Since there isn’t a sufficient solution for meniscal tears today, and the effects of surgeries can be lifelong, I think that creating something like this could be game-changing. One of the main challenges with meniscus healing is that blood supply is not consistent across the tissue (especially toward the inner region), which makes delivery of therapeutics through the bloodstream difficult. Because of that, an externally applied material that can consistently deliver needed proteins or signaling molecules directly to the injury site might help the meniscus heal better and faster. More specifically, I would start by synthesizing a construct encoding a cartilage-supporting growth factor, such as members of the TGF-β superfamily (Fortier et al., 2011), or a small engineered protein designed to promote extracellular matrix production. This construct could eventually be incorporated into a biomaterial scaffold for localized and sustained protein delivery.

(ii) What technology or technologies would you use to perform this DNA synthesis and why?
The DNA would be synthesized using solid-phase phosphoramidite synthesis, which builds short oligonucleotides one nucleotide at a time. These short oligonucleotides are then assembled into a full-length gene using enzymatic methods such as ligation or Gibson Assembly, and cloned into a plasmid for verification and use. I would use this method because it is the current standard for accurate, high-fidelity DNA synthesis and allows precise control over the exact sequence being produced. It also enables incorporation of specific design elements such as codon optimization, regulatory sequences, or tags, making it great for constructing engineered therapeutic genes.

  1. Solid-phase phosphoramidite synthesis builds DNA one nucleotide at a time on a solid support, in repeated chemical cycles. The growing DNA strand is attached to a solid surface. In each cycle, the protecting group at the end of the strand is removed (deprotection) to expose a reactive site, and a protected nucleotide (phosphoramidite) is added to the chain in a coupling reaction. A capping step then inactivates any strands that did not successfully react, which helps reduce errors, and an oxidation step stabilizes the newly formed bond. This cycle repeats until the full oligonucleotide sequence is synthesized. After synthesis is complete, the DNA is cleaved from the solid support, fully deprotected, and purified. Since this chemical method efficiently produces short oligonucleotides, multiple synthesized fragments are then assembled using enzymes into a longer gene construct and sequence-verified.
  2. In solid-phase phosphoramidite synthesis, each chemical coupling step is slightly less than 100% efficient, meaning that errors accumulate as the sequence length increases. For this reason, individual oligonucleotides are typically limited to around 150–200 nucleotides, and longer genes must be assembled from shorter fragments. Additionally, sequences that are highly repetitive or have extreme GC content can be more difficult to synthesize accurately.
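The two sequence features mentioned in point 2, GC content and repetitiveness, are easy to screen for before ordering. This is a rough sketch with helper names of my own; it does not reproduce any vendor's actual complexity rules, and the fragment is one line of the native resilin CDS from Part 3.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def repeated_kmers(seq: str, k: int) -> dict[str, int]:
    """Return every k-mer that occurs more than once, with its count."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


fragment = "CCATCGAGCAGCTATGGCGCTCCTGGCGGTGGAAACGGTGGTCGTCCTTCGGATACCTAC"
print(f"GC content: {gc_content(fragment):.0%}")
print(f"Repeated 6-mers: {repeated_kmers(fragment, 6)}")
```

Running the same checks over a whole gene gives a quick sense of whether a glycine-rich, repetitive sequence is likely to be flagged as hard to synthesize.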

5.3 DNA Edit

(i) What DNA would you want to edit and why?
I would like to explore editing the DNA of cancerous cells to selectively induce apoptosis (programmed cell death) and prevent uncontrolled cell division as a therapeutic strategy for cancer treatment. Many cancers arise from mutations in genes that regulate cell growth and survival. By using targeted gene editing tools to either restore normal tumor suppressor function or disrupt oncogenes specifically in cancer cells, it may be possible to stop tumor progression while minimizing damage to healthy tissue. A major challenge in this approach would be ensuring precise delivery to cancer cells while avoiding unintended edits in healthy cells.

(ii) What technology or technologies would you use to perform these DNA edits and why?
To perform these DNA edits, I would use a CRISPR-based editing platform, specifically base editing or prime editing, depending on the type of mutation involved. These technologies allow precise genetic modifications without introducing full double-strand breaks, which reduces the risk of unintended insertions or deletions. However, significant challenges remain, particularly achieving efficient delivery to cancer cells while minimizing off-target effects in healthy tissue.

  1. I would use prime editing, a CRISPR-based technology that uses a Cas9 nickase fused to a reverse transcriptase and a prime editing guide RNA (pegRNA), which both identifies the target site and carries the template for the desired genetic edit. The pegRNA directs the enzyme to a specific mutation in the cancer cell DNA. Instead of creating a full double-strand break, the system makes a single-strand nick and directly writes the corrected DNA sequence into the genome. The cell’s repair machinery then stabilizes the edit, making the change permanent.
  2. First, I would identify a mutation specific to the cancer type and design a pegRNA targeting that sequence. The editing components (Cas9 nickase–reverse transcriptase and pegRNA) would need to be packaged into a delivery system, such as targeted lipid nanoparticles that bind receptors overexpressed on cancer cells. The key inputs include the editing enzyme, the guide RNA, and a selective delivery vehicle to minimize effects on healthy cells.
  3. Although prime editing is more precise than traditional CRISPR-Cas9, challenges remain. Editing efficiency may be incomplete, and off-target effects are still possible. The biggest limitation is safe and selective delivery to cancer cells, since unintended edits in healthy tissues could cause harm.

References

Fortier, L. A., Barker, J. U., Strauss, E. J., McCarrel, T. M., & Cole, B. J. (2011).
The role of growth factors in cartilage repair. Clinical Orthopaedics and Related Research, 469(10), 2706–2715.

Week 3 HW Lab Automation

*You can find the Python Script for Opentrons Artwork part in the Week 3 Lab section!

Post-Lab Questions

1. Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

The paper I chose, “An Automated Versatile Diagnostic Workflow for Infectious Disease Detection in Low-Resource Settings” (Urrutia Iturritza et al., 2024), presents a fully automated diagnostic pipeline built using the Opentrons OT-One-S Hood robot. The authors combined open-source modular automation with molecular biology protocols to create a workflow capable of detecting Neisseria meningitidis, a pathogen responsible for meningitis. What I found especially compelling is that the system automates the entire process - from DNA isolation using magnetic beads to isothermal amplification (RPA), enzymatic digestion, and final detection on a paper-based vertical flow microarray. Instead of focusing solely on analytical novelty, the study emphasizes integration: connecting multiple biological modules into a single continuous, robot-executed workflow.

The workflow processes eight samples in parallel and completes the full diagnostic pipeline in about 110 minutes, roughly 18% faster than manual processing by trained personnel. Even more interesting is the cost analysis: the estimated cost per sample (~USD 16) is about 5.8× lower than that of traditional PCR-based diagnostic tests. The use of recombinase polymerase amplification (RPA), which operates at constant temperature, eliminates the need for expensive thermocycling infrastructure. The detection step uses gold nanoparticles on paper-based microarrays, producing a visible signal, which makes the system potentially adaptable for decentralized or low-resource settings.

It is important to note that the workflow was not completely autonomous. Before running the protocol, researchers manually prepared the samples and placed the required reagents, consumables, microarrays, and supporting equipment on the robot deck. Some user interaction was still needed during the thermal cycling steps, and imaging and quantitative analysis of the microarrays were carried out manually after the run. Overall, while the liquid-handling and main molecular processes were automated, the system still relied on human setup and post-processing - showing both the strengths and current practical limits of laboratory automation.

This work highlights automation not as a luxury but as a strategy for improving accessibility, reproducibility, and safety in molecular diagnostics. By using an open-source robot such as Opentrons, standard laboratory consumables, and modular protocols, the authors demonstrate a strong proof of concept for automating nearly an entire diagnostic workflow. Importantly, this approach minimizes human handling steps, thereby reducing the risk of contamination and operational errors, while allowing scientists to focus on experimental design, interpretation, and optimization. In this sense, the engineering contribution lies not only in the individual modules, but in the successful integration of these modules into a cohesive, largely automated system with clear translational potential.

Resources: Urrutia Iturritza, M., Mlotshwa, P., Gantelius, J., Alfvén, T., Loh, E., Karlsson, J., . . . Gaudenzi, G. (2024). An automated versatile diagnostic workflow for infectious disease detection in low-resource settings. Micromachines, 15(6), 708. https://doi.org/10.3390/mi15060708

2. Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

Automation plan for BioGlue Meniscal Repair project

For my final project, I intend to use lab automation to systematically screen BioGlue formulations that deliver a cartilage-supporting growth factor (e.g., a TGF-β family protein). The goal of automation is to remove pipetting variability and enable parallel testing of many conditions (protein dose, gel composition, crosslinking conditions) in a reproducible way.

What I would automate:

A) Hydrogel formulation + dose matrix (OT-2 / Opentrons)

  1. Prepare a concentration series of growth factor (e.g., 0×, low, medium, high, very high).
  2. Dispense hydrogel precursor components into a 96-well plate (or small molds).
  3. Add growth factor to each well according to a planned matrix (dose × gel composition).
  4. Mix consistently (pipette mixing program).
  5. Start a standardized “release study” (how much protein leaves the gel over time) by overlaying buffer and scheduling timed sampling.

B) Automated sampling for release kinetics

  6. At defined timepoints (e.g., 1 h, 6 h, 24 h, 48 h), the robot removes a small aliquot of supernatant and transfers it to a readout plate (for ELISA / fluorescence / total protein assay).
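The dose × gel composition matrix and the timed sampling described above can be planned in plain Python before any robot protocol is written. The sketch below is only an illustration under stated assumptions: the gel names (`gel_A`, `gel_B`, `gel_C`), the replicate count, and the helper functions are hypothetical placeholders, not part of the actual project plan, and no Opentrons API calls are made. It assigns each condition to a well on a single 96-well plate and lists the supernatant transfers needed at each timepoint.

```python
from itertools import product
from string import ascii_uppercase

# Screening axes. Dose labels and timepoints follow the plan above;
# gel names and replicate count are assumptions made for illustration.
DOSES = ["0x", "low", "medium", "high", "very high"]
GELS = ["gel_A", "gel_B", "gel_C"]   # hypothetical gel compositions
REPLICATES = 3                        # hypothetical replicate count
TIMEPOINTS_H = [1, 6, 24, 48]         # sampling times in hours


def well_names(rows=8, cols=12):
    """Return 96-well plate names in row-major order: A1, A2, ..., H12."""
    return [f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)]


def build_plate_map(doses, gels, replicates):
    """Assign every (gel, dose, replicate) condition to one well."""
    conditions = [
        (gel, dose, rep)
        for gel, dose in product(gels, doses)
        for rep in range(1, replicates + 1)
    ]
    wells = well_names()
    if len(conditions) > len(wells):
        raise ValueError("design does not fit on one 96-well plate")
    return dict(zip(wells, conditions))


def build_sampling_schedule(plate_map, timepoints_h):
    """List (time_h, source_well, readout_well) transfers.

    Assumes one readout plate per timepoint, mirroring the source layout.
    """
    return [(t, well, well) for t in timepoints_h for well in plate_map]


plate_map = build_plate_map(DOSES, GELS, REPLICATES)
schedule = build_sampling_schedule(plate_map, TIMEPOINTS_H)
print(len(plate_map), "wells,", len(schedule), "sampling transfers")
```

Keeping the plate map as a dictionary separate from any liquid-handling code means the same layout could later drive both the robot transfers and the analysis of the readout plates.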

Final Project Ideas

As requested, I uploaded my ideas to the slide deck, but here they are too:

BioGlue for Meniscal Repair

[Image: idea1]

At-Home Androgen Biosensing for PCOS (Wearable / Microfluidic Kit)

[Image: idea2]

Programmable RGB Fluorescence System

[Image: idea3]

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

    Answer: Raw meat is roughly 20-25% protein by weight. Taking 20%, a 500 g piece contains 500 × 0.20 = 100 g of protein. Since 1 g = 6.022e+23 Daltons, 100 g = 6.022e+25 Daltons, so the number of amino acids is 6.022e+25 / 100 = 6.022e+23 AAs, which is about one mole of amino acids!
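The arithmetic in this answer can be checked with a few lines of Python. This is a minimal sketch: the function name and parameter names are made up for illustration, and the 20% protein fraction and ~100 Da per amino acid are the same rough assumptions used in the answer.

```python
AVOGADRO = 6.022e23  # daltons per gram (equivalently, molecules per mole)


def amino_acid_count(meat_g, protein_fraction=0.20, residue_mass_da=100.0):
    """Estimate the number of amino acids in a piece of meat.

    protein_fraction and residue_mass_da are rough averages, not measured values.
    """
    protein_g = meat_g * protein_fraction   # grams of protein in the sample
    protein_da = protein_g * AVOGADRO       # same mass expressed in daltons
    return protein_da / residue_mass_da     # daltons / (daltons per amino acid)


n = amino_acid_count(500)
print(f"{n:.3e} amino acids")
```

Passing `protein_fraction=0.25` gives the upper end of the 20-25% range, which raises the estimate by a quarter but leaves the order of magnitude unchanged.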

  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Answer: First of all, the cells in the fish and beef are usually dead, and cooking denatures their proteins, so no active process can come from them. There is also no gene transfer from dead cells to humans. Additionally, even if the DNA somehow reached our cells, it would most likely be detected as foreign well before it got there, and there is no mechanism to integrate it into our own DNA. It also would not knock out our own DNA, so the worst case would be a combination of Homo sapiens and cow or fish.

  4. Why are there only 20 natural amino acids?

  5. Can you make other non-natural amino acids? Design some new amino acids.

  6. Where did amino acids come from before enzymes that make them, and before life started?

  7. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

  8. Can you discover additional helices in proteins?

  9. Why are most molecular helices right-handed?

  10. Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

  11. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

  12. Design a β-sheet motif that forms a well-ordered structure.


Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

Answer: Nav1.7 is a voltage-gated sodium channel located in neuronal membranes that initiates and propagates action potentials by allowing Na⁺ ions to enter the cell in response to changes in membrane voltage. It plays a crucial role in controlling the excitability of nociceptor neurons and is therefore essential for the sensory perception of pain. This protein demonstrates a clear relationship between 3D structure and electrical function and has significant physiological and clinical relevance. I was particularly drawn to it because of my interest in understanding how the brain works and because its structural properties provide an opportunity to explore the mechanisms underlying neuronal signaling.

2. Identify the amino acid sequence of your protein:

Here is the AA sequence of Nav1.7:

>7W9K_3|Chain C|Sodium channel subunit beta-2|Homo sapiens (9606)
MHRDAWLPRPAFSLTGLSLFFSLVPPGRSMEVTVPATLNVLNGSDARLPCTFNSCYTVNHKQFSLNWTYQECNNCSEEMFLQFRMKIINLKLERFQDRVEFSGNPSKYDVSVMLRNVQPEDEGIYNCYIMNPPDRHRGHGKIHLQVLMEEPPERDSTVAVIVGASVGGFLAVVILVLMVVKCVRRKKEQKLSTDDLKTEEEGKTDGEGNPDDGAK
>7W9K_2|Chain B|Sodium channel subunit beta-1|Homo sapiens (9606)
MGRLLALVVGAALVSSACGGCVEVDSETEAVYGMTFKILCISCKRRSETNAETFTEWTFRQKGTEEFVKILRYENEVLQLEEDERFEGRVVWNGSRGTKDLQDLSIFITNVTYNHSGDYECHVYRLLFFENYEHNTSVVKKIHIEVVDKANRDMASIVSEIMMYVLIVVLTIWLVAEMIYCYKKIAAATETAAQENASEYLAITSESKENCTGVQVAE
>7W9K_1|Chain A|Sodium channel protein type 9 subunit alpha|Homo sapiens (9606)
MASWSHPQFEKGGGARGGSGGGSWSHPQFEKGFDYKDDDDKGTMAMLPPPGPQSFVHFTKQSLALIEQRIAERKSKEPKEEKKDDDEEAPKPSSDLEAGKQLPFIYGDIPPGMVSEPLEDLDPYYADKKTFIVLNKGKTIFRFNATPALYMLSPFSPLRRISIKILVHSLFSMLIMCTILTNCIFMTMNNPPDWTKNVEYTFTGIYTFESLVKILARGFCVGEFTFLRDPWNWLDFVVIVFAYLTEFVNLGNVSALRTFRVLRALKTISVIPGLKTIVGALIQSVKKLSDVMILTVFCLSVFALIGLQLFMGNLKHKCFRNSLENNETLESIMNTLESEEDFRKYFYYLEGSKDALLCGFSTDSGQCPEGYTCVKIGRNPDYGYTSFDTFSWAFLALFRLMTQDYWENLYQQTLRAAGKTYMIFFVVVIFLGSFYLINLILAVVAMAYEEQNQANIEEAKQKELEFQQMLDRLKKEQEEAEAIAAAAAEYTSIRRSRIMGLSESSSETSKLSSKSAKERRNRRKKKNQKKLSSGEEKGDAEKLSKSESEDSIRRKSFHLGVEGHRRAHEKRLSTPNQSPLSIRGSLFSARRSSRTSLFSFKGRGRDIGSETEFADDEHSIFGDNESRRGSLFVPHRPQERRSSNISQASRSPPMLPVNGKMHSAVDCNGVVSLVDGRSALMLPNGQLLPEVIIDKATSDDSGTTNQIHKKRRCSSYLLSEDMLNDPNLRQRAMSRASILTNTVEELEESRQKCPPWWYRFAHKFLIWNCSPYWIKFKKCIYFIVMDPFVDLAITICIVLNTLFMAMEHHPMTEEFKNVLAIGNLVFTGIFAAEMVLKLIAMDPYEYFQVGWNIFDSLIVTLSLVELFLADVEGLSVLRSFRLLRVFKLAKSWPTLNMLIKIIGNSVGALGNLTLVLAIIVFIFAVVGMQLFGKSYKECVCKINDDCTLPRWHMNDFFHSFLIVFRVLCGEWIETMWDCMEVAGQAMCLIVYMMVMVIGNLVVLNLFLALLLSSFSSDNLTAIEEDPDANNLQIAVTRIKKGINYVKQTLREFILKAFSKKPKISREIRQAEDLNTKKENYISNHTLAEMSKGHNFLKEKDKISGFGSSVDKHLMEDSDGQSFIHNPSLTVTVPIAPGESDLENMNAEELSSDSDSEYSKVRLNRSSSSECSTVDNPLPGEGEEAEAEPMNSDEPEACFTDGCVWRFSCCQVNIESGKGKIWWNIRKTCYKIVEHSWFESFIVLMILLSSGALAFEDIYIERKKTIKIILEYADKIFTYIFILEMLLKWIAYGYKTYFTNAWCWLDFLIVDVSLVTLVANTLGYSDLGPIKSLRTLRALRPLRALSRFEGMRVVVNALIGAIPSIMNVLLVCLIFWLIFSIMGVNLFAGKFYECINTTDGSRFPASQVPNRSECFALMNVSQNVRWKNLKVNFDNVGLGYLSLLQVATFKGWTIIMYAAVDSVNVDKQPKYEYSLYMYIYFVVFIIFGSFFTLNLFIGVIIDNFNQQKKKLGGQDIFMTEEQKKYYNAMKKLGSKKPQKPIPRPGNKIQGCIFDLVTNQAFDISIMVLICLNMVTMMVEKEGQSQHMTEVLYWINVVFIILFTGECVLKLISLRHYYFTVGWNIFDFVVVIISIVGMFLADLIETYFVSPTLFRVIRLARIGRILRLVKGAKGIRTLLFALMMSLPALFNIGLLLFLVMFIYAIFGMSNFAYVKKEDGINDMFNFETFGNSMICLFQITTSAGWDGLLAPILNSKPPDCDPKKVHPGSSVEGDCGNPSVGIFYFVSYIIISFLVVVNMYIAVILENFSVATEESTEPLSEDDFEMFYEVWEKFDPDATQFIEFSKLSDFAAALDPPLLIAKPNKVQLIAMDLPMVSGDRIHCLDILFAFTKRVLGESGEMDSLRSQMEERFMSANPSKVSYEPITTTLKRKQEDVSATVIQRAYRRYRLRQNVKNISSIYIKDGDRDDDLLNKKDMAFDNVNENSSPEKTDATSSTTSPPS
YDSVTKPDKEKYEQDRTEKEDKGKDSKESKK

As shown in the structure, the Nav1.7 complex includes auxiliary beta subunits that regulate channel function (chains B and C). For the purposes of this assignment, I will focus on chain A, which corresponds to the alpha subunit forming the pore and voltage-sensing domains of the channel!

  • How long is it? 2031 AAs!
  • What is the most frequent amino acid? L, Leucine, which appears 202 times.
  • How many protein sequence homologs are there for your protein? Because the UniProt BLAST did not load for a long time, I used the NCBI BLAST tool instead. I counted the hits with an E-value of 0 and 95-100% identity, which gave 9 homologs.
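The length and most-frequent-residue numbers above can be reproduced from the FASTA text with the Python standard library. The sketch below is an illustration only: `read_fasta` and `sequence_stats` are hypothetical helper names, and the input is a short toy record, not the real Nav1.7 sequence. Joining wrapped sequence lines matters here because the chain A sequence above is split across two lines.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def sequence_stats(seq):
    """Return (length, most frequent residue, its count) for one sequence."""
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    return len(seq), residue, n


# Toy input for illustration; replace with the FASTA text of the real entry.
toy_fasta = """>toy_chain_A
MLLKLA
LLG
>toy_chain_B
MKKA
"""

records = read_fasta(toy_fasta)
length, residue, count = sequence_stats(records["toy_chain_A"])
print(length, residue, count)
```

To apply it to the real entry, the header key would be the full chain A header line without the leading `>`.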

Part C. Using ML-Based Protein Design Tools


Part D. Group Brainstorm on Bacteriophage Engineering