Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    The Biological Engineering Tool. Tool: Portable Cell-Free Allergen Biosensor. Description: A single-use, portable reaction unit containing shelf-stable biological sensing reagents. Mechanism: The user introduces a small sample of food (solid or liquid) into the unit. The device initiates a biochemical reaction that specifically recognizes the molecular signature of a target allergen (e.g., peanut or soy). If the target is detected, the device triggers a distinct visual signal (such as a color change or fluorescence) within minutes.

  • Week 2 HW: DNA Read Write and Edit

    DNA Design Challenge. Chosen Protein: I chose GFP because it serves as a robust reporter that could be used for my allergen biosensor. The goal of the device is to turn a biological detection (sensing peanut DNA) into a signal the user can see. GFP spontaneously fluoresces green when exposed to UV or blue light (like a simple black light LED). By designing the system so that GFP is activated only when the allergen is detected (or shut off in the presence of the allergen), I can create an intuitive user interface.

  • Week 3 HW: Lab Automation

    Project Overview: Cell-Free Allergen Biosensor. I am hoping to develop a rapid, consumer-grade biosensor designed to detect trace allergens like peanut or soy in a restaurant setting. To prioritize speed and accuracy, I will use a DNA-to-RNA detection circuit. The workflow consists of three main stages. (1) Extraction and Amplification: I could use RPA (Recombinase Polymerase Amplification) to exponentially copy target DNA (like the Ara h 1 gene) at a constant 37°C. (2) Transcription: T7 RNA polymerase can concurrently convert that DNA into Trigger RNA. (3) RNA Toehold Detection: This Trigger RNA can bind to a synthetic Toehold Switch and unzip an RNA hairpin to allow the translation of a reporter protein. This can create a visible color change or induce luminescence in under 20 minutes. By using a cell-free protein synthesis system, the entire reaction is shelf-stable and functions without the need for a traditional lab environment.

  • Week 4 HW: Protein Design part I

    HW Questions:

    How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
    Meat is roughly 20% protein by mass, so there is ~100 g of protein in 500 g of meat. Taking the average amino acid residue mass as ~110 Da: 100 g ÷ 110 g/mol ≈ 0.91 mol of amino acid residues. Multiplying by Avogadro’s number: 0.91 × 6.022 × 10²³ ≈ 5.5 × 10²³ amino acid residues.

    Why do humans eat beef but do not become a cow, eat fish but do not become fish?
    During digestion, proteases break the cow/fish proteins down into their constituent free amino acids and small peptides. These are then absorbed as monomers. Your ribosomes then reassemble them according to your own mRNA instructions.

    Why are there only 20 natural amino acids?
    Likely a combination of three things. First, once the genetic code co-evolved around these 20 amino acids, any change would catastrophically mis-translate the entire proteome, so those 20 were “locked in”. Second, the 20 cover the necessary chemical space (charged, polar, hydrophobic, aromatic, etc). Third, the simplest amino acids (Gly, Ala, Asp, Glu, Val…) are exactly the ones most readily produced abiotically, so the code evolved around what was chemically accessible early on. Selenocysteine and pyrrolysine as the “21st and 22nd” amino acids show the code can expand, but only under very constrained circumstances.

    Where did amino acids come from before enzymes that make them, and before life started?
    They form spontaneously from simple chemistry. Amino acids are thermodynamically reasonable products. The Miller-Urey experiment showed that electric discharge through CH₄, NH₃, H₂O and H₂ produces Gly, Ala, Asp and others. Hydrothermal vents provide mineral catalysts and redox gradients. Meteorites (like Murchison) contain over 70 amino acids synthesized in space, suggesting that abiotic production is widespread wherever C, N, O, H and energy coexist.

    If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
    Left-handed. L-amino acids prefer φ≈−57°, ψ≈−47° → right-handed helix. D-amino acids are the mirror image → φ≈+57°, ψ≈+47° → left-handed helix.

    Can you discover additional helices in proteins?
    Beyond the α-helix, several others exist. The 3₁₀-helix is tighter (i→i+3 H-bonds), more strained, and common at helix termini. The π-helix is wider (i→i+5 H-bonds) and surprisingly prevalent at functional sites; roughly 15% of proteins contain at least one. The polyproline II (PPII) helix has no intramolecular H-bonds at all, is left-handed, and is extremely common in disordered regions, collagen, and signaling domains (SH3 recognition). The collagen triple helix is three intertwined PPII-like chains stabilized by interchain H-bonds. New folds continue to emerge from cryo-EM and AlphaFold-era structural biology, so the catalogue is probably not closed.

    Why are most molecular helices right-handed?
    Most molecular helices are right-handed because all biological amino acids are L-form, so their bond geometry naturally favors coiling clockwise when chained together.

    Why do β-sheets tend to aggregate?
    Edge strands have unpaired backbone NH and C=O groups pointing outward; they are basically unsatisfied H-bond donors and acceptors, which makes them inherently “sticky”. Flat hydrophobic surfaces on sheet faces also drive stacking through the hydrophobic effect. Side chains interdigitate into a tight steric zipper, which is favorable in both enthalpy and entropy, and thus very hard to prevent without chaperones or proline residues.

    Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
    For many disease-associated sequences (Aβ in Alzheimer’s, α-synuclein in Parkinson’s, IAPP in type II diabetes), the amyloid state is actually thermodynamically more stable than the native fold. Stress, concentration increases, mutations, or metal ions can nucleate conversion, after which elongation proceeds rapidly like crystal growth. The resulting cross-β architecture is extraordinarily stable, resistant to heat, detergent, and proteases. As materials, amyloid fibrils are definitely useful. They can have a Young’s modulus of 1–20 GPa (similar to silk), high aspect ratio, and nanoscale precision. Functional amyloids already exist naturally (curli fibers in bacterial biofilms, yeast prions as regulatory switches). Proposed applications include conductive nanowires (metallized with silver or gold), hydrogels for drug delivery, tissue engineering scaffolds, and even food technology (whey protein amyloids are already used commercially as emulsifiers).

    Protein Analysis and Visualization
    Insulin is a small hormone produced by the pancreas that regulates blood glucose levels. I chose it because of its fascinating history as a therapeutic protein. Decades of protein engineering have produced analogs like insulin lispro and glargine, where just one or two amino acid changes dramatically alter how the drug behaves in the body. I wanted to explore the structure underlying a protein that has been so deliberately and successfully redesigned.

    Sequence: MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    Length: 110 amino acids. The most common residue is L, which appears 20 times. This protein has 228 homologs. This specific insulin protein is in the broader insulin family.

    Protein Structure Page
    The structure was released on February 24, 2009. (Note: While the very first insulin structure was solved in 1969, this version (3E7Y) is the modern high-resolution reference for the native human protein.) The quality is excellent: 1.60 Å. There are other molecules in the solved structure of the protein, as this classic structure represents the storage form (hexamer). It contains zinc ions, chloride ions, and water molecules. This protein is the defining member of the Insulin-like superfamily.

    3D Structure (views: cartoon, ribbon, sticks)
    Secondary structure: the protein has more helices than sheets.
    Protein surface: the distribution of hydrophobic (orange) vs hydrophilic (cyan) residues is as expected. Most of the surface residues are hydrophilic, and the hydrophobic residues line the binding pockets.
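The two small calculations in the Week 4 answers (the number of amino acid residues in 500 g of meat, and the composition of the insulin precursor sequence) can be re-derived with a few lines of Python. This is only a sketch: the 20% protein fraction and ~110 Da residue mass are the figures used in the answer above, and the function names are made up for illustration.

```python
from collections import Counter

AVOGADRO = 6.022e23  # molecules per mole


def residues_in_meat(meat_g=500.0, protein_fraction=0.20, residue_da=110.0):
    """Estimate the number of amino acid residues in a piece of meat.

    Assumes meat is ~20% protein by mass and an average residue mass of ~110 Da.
    """
    protein_g = meat_g * protein_fraction   # grams of protein
    moles = protein_g / residue_da          # mol of residues (1 Da = 1 g/mol)
    return moles * AVOGADRO


# Insulin precursor sequence exactly as quoted in the homework text.
PREPROINSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)

composition = Counter(PREPROINSULIN)

print(f"Residues in 500 g of meat: {residues_in_meat():.2e}")
print("Sequence length:", len(PREPROINSULIN))
print("Most common residue:", composition.most_common(1)[0])
```

Changing `residue_da` to 100 (the value suggested in the question) raises the estimate by about 10%, so the order of magnitude does not depend on which average is used.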

Subsections of Homework

Week 1 HW: Principles and Practices

[Image: allergen_image]

The Biological Engineering Tool

Tool: Portable Cell-Free Allergen Biosensor.

Description: A single-use, portable reaction unit containing shelf-stable biological sensing reagents.

Mechanism: The user introduces a small sample of food (solid or liquid) into the unit. The device initiates a biochemical reaction that specifically recognizes the molecular signature of a target allergen (e.g., peanut or soy). If the target is detected, the device triggers a distinct visual signal (such as a color change or fluorescence) within minutes.

Why: To increase food confidence and reduce anxiety for people with dietary restrictions. While the primary function is to prevent anaphylaxis, the secondary goal is to validate “mystery foods” (like sauces or baked goods) in social settings, allowing users to eat with peace of mind rather than fear.


Governance & Policy Goals

Primary Goal: Non-Malfeasance (Preventing Harm via Rigorous Safety)

  • Sub-Goal A (Reliability & Trust): The tool must provide high confidence in negative results (telling a user a food is safe). If a user trusts the device and eats a “safe” food that is actually contaminated (False Negative), the physical harm is severe. Conversely, if the device constantly cries “wolf” (False Positive), the user loses confidence and stops using it, returning to a state of anxiety.
  • Sub-Goal B (User Safety & Disposal): Ensure that the device—which may contain chemical neutralizing agents or biological waste—is safe to handle during use and safe to dispose of, without introducing new chemical hazards to the user.

Governance Actions (The Options)

Option 1: The “Matrix Stress Test” Certification

  • Purpose: Mandates that the sensor must be proven to work in “Worst Case Matrices” (foods high in fat, sugar, or acidity) to prevent false negatives caused by complex food chemistry interfering with the sensor.
  • Actor: FDA / Food Safety Regulators.

Option 2: The “Fail-Safe” Internal Control

  • Purpose: Every unit must contain a secondary “Positive Control” mechanism that signals if the biological reagents are functional. If this control signal is missing, the user knows the test is broken/expired. This is critical for confidence: a user needs to know the difference between “This food is safe” and “The test didn’t work.”
  • Actor: The Company / Product Designers.

Option 3: Hazardous Containment & Neutralization Regulations

  • Purpose: If the device uses a chemical “kill switch” (e.g., a bleaching agent or strong acid) to neutralize the biological components before disposal, strict regulations must govern the containment of these hazardous materials.
  • Actor: Consumer Product Safety Commission (CPSC) / Environmental Regulators.
  • Key Policy: Mandate “Child-Resistant” sealing mechanisms and clear, high-contrast warning labels (e.g., “CAUTION: CORROSIVE CONTENTS”) to prevent users from accidentally exposing themselves to the neutralizing chemicals inside.

Scoring Matrix

Scoring Key:

  • 1 = Strong Positive Impact (Best Outcome)
  • 2 = Moderate Impact / Minor Trade-off
  • 3 = Weak Impact / Negative Trade-off / Not Applicable

| Does the option: | Option 1 (Matrix Stress) | Option 2 (Fail-Safe) | Option 3 (Safe Containment) |
|---|---|---|---|
| Enhance Biosecurity | N/A | N/A | 1 |
| • By preventing incidents | 3 | 3 | 1 |
| • By helping respond | 3 | 3 | 3 |
| Foster User Safety | 1 | 1 | 3 |
| • By preventing incident (False Negatives) | 1 (High) | 1 (High) | 3 |
| • By helping respond (Minimizing harm) | 3 | 1 | 3 (Risk of Leaks) |
| Protect the environment | 3 | 3 | 1 |
| • By preventing incidents | 3 | 3 | 1 |
| • By helping respond | 3 | 3 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 (Dev Cost) | 2 (Complexity) | 2 (Packaging Cost) |
| • Feasibility? | 2 | 1 | 1 |
| • Not impede research | 2 | 2 | 1 |
| • Promote constructive applications | 1 | 1 | 2 |

Recommendation & Prioritization

Recommendation: I would prioritize Option 2 (The Fail-Safe Internal Control), followed by Option 1.

Reasoning:

  • Why Option 2 First? To build confidence in “mystery foods,” ambiguity is the enemy. If the device fails silently (e.g., due to storage conditions), the user may assume the food is safe. Establishing a positive indication of functionality is the only way to give the user the peace of mind they are looking for.
  • The Trade-off of Option 3: While neutralizing biological waste is important for environmental governance, introducing a toxic, neutralizing chemical creates a new chemical safety hazard for the user. If the containment fails, the user could be burned or injured. Therefore, regulations on how that chemical is contained (Warning Labels, Shatter-proof casing) are critical, but the risk of injury might outweigh the benefit of neutralizing trace amounts of biological material.

Reflection

In class, we discussed the ‘Responsibility of the Toolmaker.’ If I build a tool that claims to detect peanuts, I am effectively taking responsibility for that person’s life for that meal. The ethical weight of a ‘False Negative’ here is far heavier than in other biodesign projects. This made me realize that ‘Accuracy’ isn’t just a technical spec; it’s an ethical requirement. If I can’t guarantee >99% accuracy across all food types, is it ethical to release the product at all?

Proposed Governance: We might need a ‘Beta Testing Transparency’ law. Startups often release ‘beta’ products to iterate quickly. However, for safety diagnostics, ‘Beta’ labels are insufficient. There should be governance prohibiting the release of ‘beta’ medical/safety diagnostics to consumers until they are fully validated.


Pre-Lecture Questions

1. The biological synthesis of DNA using an error-correcting polymerase has an error rate of about 1 in 10^6 (one error for every million base pairs added). The human genome is 3.2 Gbp (3.2 billion base pairs), so 3,200,000,000 bases * (1 error)/(1,000,000 bases) = 3,200 errors per cell division. At that rate the cell would accumulate thousands of mutations every time it divides, which is too high for a complex organism to tolerate. Biology resolves this discrepancy with a post-replication “spell check” mechanism known as mismatch repair (the MutS system). This system uses specific proteins (MutS, MutL, and MutH) to scan the DNA for mismatches that the polymerase missed. To ensure it fixes the right letter, the system must distinguish the correct “template” strand from the error-prone “new” strand; in E. coli it does this by looking for methylation markers, since the old strand is methylated while the newly made strand is not yet. The system then cuts the new strand, removes the section containing the error using exonucleases, and fills in the correct sequence.
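The arithmetic in answer 1 can be written out as a short Python sketch. The genome size and the polymerase error rate are the figures quoted in the answer; the function name is made up for illustration.

```python
GENOME_BP = 3.2e9              # human genome size in base pairs (from the answer)
POLYMERASE_ERROR_RATE = 1e-6   # one error per million bases added (from the answer)


def expected_errors(genome_bp, error_rate):
    """Expected number of misincorporated bases in one genome replication."""
    return genome_bp * error_rate


errors = expected_errors(GENOME_BP, POLYMERASE_ERROR_RATE)
print(f"Expected errors per replication before repair: {errors:,.0f}")
```

The expected count scales linearly with both inputs, which is why an additional repair step that lowers the effective error rate matters so much for a genome of this size.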

2. The number of different ways to code for an average human protein is very large. The average human protein coding sequence is approximately 1,036 bp long, which is roughly 345 amino acids. Because the genetic code is redundant (most amino acids are specified by more than one three-letter codon), there are on average about three different options for every single position in the protein chain. To find the total number of combinations, you multiply these options across every amino acid (3 times 3 times 3… for 345 times, i.e. 3^345), resulting in roughly 4 * 10^164 different DNA sequences capable of making the same protein.
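The size of that number is easiest to check on a log scale. The sketch below uses the figures from the answer (a 1,036 bp coding sequence and an average of three synonymous codons per residue); the function name is an illustrative assumption.

```python
import math


def log10_synonymous_sequences(n_residues, avg_codons_per_residue=3.0):
    """log10 of the number of DNA sequences encoding the same protein,
    assuming every position has the same average number of codon choices."""
    return n_residues * math.log10(avg_codons_per_residue)


cds_bp = 1036                 # average human coding sequence length (from the answer)
n_residues = cds_bp // 3      # whole codons only
exponent = log10_synonymous_sequences(n_residues)

print("Residues:", n_residues)
print(f"Synonymous sequences: about 10^{exponent:.1f}")
print("Digits in 3**345:", len(str(3 ** n_residues)))
```

A real protein would give a somewhat different number, because methionine and tryptophan have a single codon while leucine, serine and arginine have six.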

In practice, however, the vast majority of these theoretical codes will not work inside a cell due to several biological and physical constraints. One major issue is RNA folding, or secondary structure; as shown in the NUPACK analysis slides, specific nucleotide sequences can twist into tight knots or hairpins based on their “Minimum Free Energy”. If the code you choose creates a tight structure near the start of the molecule, the ribosome may be unable to latch on, preventing the protein from ever being made. Additionally, cells possess “cleanup” enzymes like RNase III that hunt for specific sequence patterns or structures to destroy old or foreign RNA. If your engineered sequence accidentally creates one of these cleavage targets, the cell’s own immune-like system will chop up the instructions before they can be used. Finally, sequences that are extremely repetitive or have difficult chemical properties (such as improper GC content) can be nearly impossible to synthesize or assemble reliably in the lab without introducing errors.

3. The standard method for oligonucleotide synthesis is the phosphoramidite cycle, a solid-phase chemical process that builds DNA strands one base at a time through a repeating four-step sequence. The cycle begins with deprotection, where a blocking group is removed from the sugar of the previous nucleotide, followed by coupling to attach a new building block. This is followed by a capping step to block any unreacted chains from continuing with an incorrect sequence, and finally, oxidation to stabilize the newly formed phosphate linkage.

4. The difficulty in synthesizing oligos longer than 200 nucleotides arises from the cumulative effect of yield decay and chemical imperfections. Even with a highly efficient coupling rate of 99%, the percentage of sequence-perfect, full-length product decreases exponentially with every added base, leaving a mixture dominated by truncated “failure” sequences. Furthermore, as the strand grows, the likelihood of the DNA forming secondary structures increases, which can physically shield the growing end of the chain from the incoming chemicals and prevent successful reactions.
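The exponential decay described in answer 4 can be sketched as below. It assumes every coupling step succeeds independently with the same probability, and that an n-nucleotide oligo needs n - 1 couplings because the first base is already attached to the solid support; both are simplifying assumptions, and the function name is made up.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of chains that reach full length after length_nt - 1 couplings."""
    return coupling_efficiency ** (length_nt - 1)


for n in (20, 100, 200, 2000):
    print(f"{n:>5} nt: {full_length_fraction(n):.2e} full-length")
```

At 99% coupling efficiency only about one chain in seven is full length at 200 nt, and by 2000 nt the full-length fraction is vanishingly small, which is the quantitative reason long genes are assembled from shorter oligos.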

5. It is impractical to make a 2000bp gene through direct synthesis because no chemical process is accurate enough to maintain high fidelity over thousands of consecutive steps: even at 99% coupling efficiency, essentially none of the chains would be full length after 2000 cycles. Instead, large genes are produced using a hierarchical assembly approach where shorter, high-quality oligonucleotides are synthesized first and then “stitched” together. These overlapping fragments are joined using enzymatic methods such as Polymerase Chain Reaction (PCR) or Gibson Assembly, allowing for the construction of long, complex sequences while providing opportunities to filter out errors that occurred during the initial oligo printing.

6. In most animals, there are 10 essential amino acids that they simply can’t make themselves and have to get from food. These are Phenylalanine, Valine, Threonine, Tryptophan, Isoleucine, Methionine, Histidine, Arginine, Leucine, and Lysine. In biology, we look at these as the fundamental code—the “basepair code” and ribosomal translation that turns 4 RNA bases into 20+ amino acids to build life.

When you look at the “Lysine Contingency” from Jurassic Park through this lens, the logic is actually pretty weak. Since lysine is already an essential amino acid for almost all animals, every creature in the wild is technically already on a “lysine contingency”. If an engineered animal escaped, it would just find lysine by eating natural plants or other animals.

Modern tech, like the Genomically Recoded Organisms (GROs) mentioned in the slides, takes this concept much further. Instead of relying on something common like lysine, scientists are swapping out codons to create “metabolic isolation”. They engineer life to require Non-Standard Amino Acids (NSAAs)—synthetic building blocks that don’t exist in nature. If these organisms don’t get their specific lab-made “fuel,” their proteins won’t fold, and they won’t survive.

We’re even looking at “Mirror World” life, where the chirality of DNA and proteins is flipped. Since natural life uses L-amino acids and D-sugar nucleic acids (which form right-handed B-DNA), a “mirror” organism would be totally invisible to natural viruses and couldn’t exchange nutrients with the wild. It’s a much more secure “lock” than just hoping the dinosaurs don’t find a snack.

Week 2 HW: DNA Read Write and Edit

[Image: gel_art_image]

DNA Design Challenge

Chosen Protein: I chose GFP because it serves as a robust reporter that could be used for my allergen biosensor. The goal of the device is to turn a biological detection (sensing peanut DNA) into a signal the user can see. GFP spontaneously fluoresces green when exposed to UV or blue light (like a simple black light LED). By designing the system so that GFP is activated only when the allergen is detected (or shut off in the presence of the allergen), I can create an intuitive user interface.

Amino Acid Sequence:

>sp|P42212|GFP_AEQVI Green fluorescent protein OS=Aequorea victoria OX=6100 PE=1 SV=1
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.: atgagtaaag gagaagaact tttcactgga gttgtcccaa ttcttgttga attagatggt gatgttaatg ggcacaaatt ttctgtcagt ggagagggtg aaggtgatgc aacatacgga aaacttaccc ttaaatttat ttgcactact ggaaaactac ctgttccatg gccaacactt gtcactactt tctgttatgg tgttcaatgc ttttcaagat acccagatca tatgaaacag catgactttt tcaagagtgc catgcccgaa ggttatgtac aggaaagaac tatatttttc aaagatgacg ggaactacaa gacacgtgct gaagtcaagt ttgaaggtga tacccttgtt aatagaatcg agttaaaagg tattgatttt aaagaagatg gaaacattct tggacacaaa ttggaataca actataactc acacaatgta tacatcatgg cagacaaaca aaagaatgga atcaaagtta acttcaaaat tagacacaac attgaagatg gaagcgttca actagcagac cattatcaac aaaatactcc aattggcgat ggccctgtcc ttttaccaga caaccattac ctgtccacac aatctgccct ttccaaagat cccaacgaaa agagagatca catggtcctt cttgagtttg taacagctgc tgggattaca catggcatgg atgaactata caaa
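A quick way to sanity-check a reverse-translated sequence is to translate it back and compare it with the protein. The sketch below builds the standard genetic code from the conventional TCAG-ordered amino acid string and translates the first ten codons of the sequence above; the `translate` helper is written for this illustration and is not part of any tool used in the assignment.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence, stopping at the first stop codon.

    Whitespace is ignored so space-grouped sequences can be pasted directly.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 30 nt of the reverse-translated sequence above.
print(translate("atgagtaaag gagaagaact tttcactgga"))
```

Running the same function over the full DNA sequence and comparing the result with the protein sequence, residue by residue, would flag any position where the two disagree.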

Codon optimization: Codon optimization is necessary because the genetic code is redundant; while multiple codons can specify the same amino acid, different organisms utilize these codons with varying frequencies. By rewriting the sequence to use the host’s “preferred” codons, we ensure that the corresponding tRNA molecules are readily available, which prevents ribosomal stalling and significantly accelerates protein synthesis. Additionally, optimization allows for the removal of inhibitory mRNA secondary structures and “unfavorable” sequences that could lead to premature degradation of the genetic instructions, ultimately ensuring a more robust and rapid visual signal for the user.

For this assignment, I have chosen to optimize the sequence for Escherichia coli. As the most well-characterized model organism in synthetic biology, E. coli provides a reliable and standard baseline for protein expression with highly optimized commercial algorithms available for sequence design.

E. coli is the selected host because robust, pre-set optimization tools are available for it on platforms like Twist Bioscience, and it serves as a functional baseline for standardization and ease of integration into common laboratory workflows. However, the ideal biological choice for a timely sensor would be Vibrio natriegens (TaxID: 1219067), which has a doubling time of under 10 minutes and a significantly higher ribosomal density. Optimizing for E. coli ensures a high Codon Adaptation Index (CAI) and reliable synthesis, though a custom-optimized Vibrio sequence would remain the preferred engineering solution for achieving maximum metabolic speed in a real-world application.

Optimized using Twist’s codon optimization tool for E. coli, avoiding EcoRI, BamHI, XhoI, HindIII, BsaI, and BbsI cut sites:
ATGTCCAAAGGTGAAGAGTTGTTTACCGGCGTTGTTCCCATCTTAGTGGAGCTCGACGGAGATGTCAATGGTCACAAATTCAGTGTATCAGGTGAAGGGGAAGGCGACGCGACATACGGGAAACTTACCTTAAAATTTATATGCACCACCGGCAAATTGCCCGTACCATGGCCAACGTTAGTGACCACCTTTTCCTACGGTGTCCAGTGCTTTTCACGGTACCCGGATCACATGAAACAGCACGACTTCTTCAAGTCCGCGATGCCGGAAGGCTACGTTCAAGAGCGCACCATATTCTTCAAGGATGACGGAAACTACAAGACGCGAGCAGAGGTTAAGTTCGAGGGAGACACCTTGGTAAATCGAATTGAATTAAAAGGCATTGACTTCAAGGAAGATGGAAACATCCTGGGCCATAAGCTGGAGTACAACTACAATAGTCATAATGTTTACATCATGGCGGATAAACAAAAGAATGGTATCAAGGTCAACTTCAAGATACGACACAATATCGAAGATGGATCTGTCCAATTAGCGGACCACTACCAGCAAAATACCCCCATTGGTGATGGTCCAGTTCTGCTCCCGGACAACCACTATTTGAGTACACAGTCGGCCCTCTCTAAGGACCCTAACGAAAAGCGGGACCATATGGTGCTCCTGGAATTTGTAACGGCCGCCGGAATTACCCACGGCATGGACGAGCTGTACAAATGA
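Whether an optimized sequence really avoids the listed cut sites can be checked independently of the vendor tool. The sketch below scans both strands for the recognition sequences of the six enzymes named above, taking the fifth enzyme to be BsaI. The recognition sequences are the standard ones for these enzymes and are not given in the original text; the helper names are made up for illustration.

```python
# Recognition sequences (5'->3'); these values are supplied here as an assumption.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "HindIII": "AAGCTT",
    "BsaI": "GGTCTC",
    "BbsI": "GAAGAC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, sites=SITES):
    """Map enzyme name -> sorted 0-based start positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        # Non-palindromic sites (BsaI, BbsI) must also be searched as reverse complements.
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            i
            for pattern in patterns
            for i in range(len(seq) - len(pattern) + 1)
            if seq.startswith(pattern, i)
        )
        if positions:
            hits[name] = positions
    return hits


# First 30 nt of the optimized sequence above; an empty dict means no sites were found.
print(find_sites("ATGTCCAAAGGTGAAGAGTTGTTTACCGGC"))
```

Passing the full optimized sequence to `find_sites` gives a simple pass/fail check before placing a synthesis order.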

[Image: codon_optimization]

Now what?:

The Transcription and Translation Process: The process occurs in two stages. First, transcription happens when RNA Polymerase binds to the promoter and creates a messenger RNA (mRNA) strand. Next, translation begins as ribosomes dock onto the mRNA at the Ribosome Binding Site. The ribosome “reads” the mRNA codons and recruits tRNA molecules to assemble amino acids, starting at the initial M (Methionine) and ending at the TGA stop codon of the optimized sequence. The finished chain then folds into a functional protein.

Cell-Dependent Production: In cell-dependent systems, the DNA is inserted into a plasmid and transformed into a living host like E. coli or V. natriegens. The host’s metabolism provides the energy (ATP) and raw amino acids needed for synthesis. While highly scalable for mass production, this method requires significant time for cell growth and complex purification steps to separate your target protein from the host’s cellular components.

Cell-Free Production: A cell-free (TX-TL) system mixes the linear DNA fragment directly with a lysate containing ribosomes and enzymes harvested from broken-open cells. This might be the ideal “timely” choice for a sensor because it eliminates the need to wait for living cultures to grow. It allows for immediate protein production and is more resilient to the lysis buffers used in food testing, though it is generally more expensive for large-scale use.


Prepare a Twist DNA Synthesis Order

Linear Map (Annotated): [Image: annotated_map]

Twist Cloning Vector: [Image: cloning_vector_pic]


DNA Read/Write/Edit

Read

What DNA would you want to sequence (e.g., read) and why? I would sequence the metagenomic DNA from the human gut microbiome of patients with neurodegenerative diseases like Alzheimer’s. Research into the “gut-brain axis” suggests that microbial diversity and specific bacterial metabolites (such as short-chain fatty acids) directly influence neuroinflammation. By reading the entire microbial community, we can identify specific species that are neuroprotective or neurotoxic, providing a non-invasive way to discover early biomarkers for brain health.

In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why? I would use the Illumina NextSeq because it offers a good balance of accuracy and throughput for complex microbiome analysis. When sequencing a fecal sample to understand the gut-brain axis, we are looking at a soup of DNA from thousands of different bacterial species. To detect rare but potentially influential neuro-active microbes, we need deep sequencing, meaning we must generate many millions of reads from the sample. Illumina’s second-generation technology is the industry standard for this because its cost per base is significantly lower than third-generation methods (like Nanopore) and its per-base accuracy is higher (about 99.9%), which reduces the chance of misidentifying a bacterial species due to a sequencing error.

  1. This is a second-generation technology. It is defined by its “sequencing-by-synthesis” (SBS) method, which uses massive parallelism to read millions of short DNA fragments simultaneously with extremely high accuracy.
  2. The input is high-quality genomic DNA extracted from fecal samples. The essential preparation steps are fragmentation, adapter ligation, and PCR amplification.
  3. The sequencer adds fluorescently labeled nucleotides so that each growing strand is extended by one base per cycle. Each time a base is incorporated, it emits a specific color. A high-resolution camera records these flashes, and the software performs base calling by converting the sequence of colors into a digital string of A, C, G, and T.
  4. The output is a FASTQ file containing millions of short, highly accurate “reads” which are then computationally assembled to map the microbial species present in the gut.
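The FASTQ output and the quoted 99.9% accuracy are linked through Phred quality scores: a score Q corresponds to an error probability of 10^(-Q/10), so Q30 means one miscall per thousand bases. The sketch below reads a tiny in-memory FASTQ record and converts its scores. It assumes simple four-line records and the Phred+33 character encoding; the read in the example is invented for illustration.

```python
import io


def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from four-line FASTQ records.

    Assumes Phred+33 encoding (quality character code minus 33 is the score).
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield header[1:], sequence, [ord(c) - 33 for c in quality]


# Invented example record: '?' encodes Q30 under Phred+33.
example = io.StringIO("@read1\nACGT\n+\n????\n")
for read_id, sequence, scores in read_fastq(example):
    accuracy = 1 - phred_to_error_probability(scores[0])
    print(read_id, sequence, scores, f"per-base accuracy {accuracy:.1%}")
```

For real data a dedicated parser would be the safer choice, since FASTQ files can be compressed or contain truncated final records.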

Write

What DNA would you want to synthesize (e.g., write) and why? I would synthesize a closed-loop genetic logic circuit for use in iPSC-derived human microglia. This circuit would be designed to sense high levels of pro-inflammatory cytokines (like IL-1β) or Amyloid-beta and respond by triggering the production of a therapeutic protein (like the anti-inflammatory cytokine IL-10 or an amyloid-clearing enzyme), or of a fluorescent reporter like mCherry for real-time monitoring of disease state in the lab. This would be useful because current drugs for Alzheimer’s are always on, which leads to side effects, whereas a synthetic circuit would trigger a response only when it senses the disease signal.

What technology or technologies would you use to perform this DNA synthesis and why? I would use Twist Bioscience’s silicon platform because synthetic genetic circuits require the writing of many different regulatory parts (promoters, insulators, and reporters) that must work with each other. Twist’s technology is superior for this because it uses a semiconductor-based approach to synthesize thousands of Gene Fragments in parallel on a single silicon chip. For a BME project, this allows me to multiplex. I can order 50 variations of my Alzheimer’s-sensing circuit with slightly different promoter strengths to see which one has the best signal-to-noise ratio in my iPSC-derived microglia.

  1. Twist utilizes a semiconductor-based silicon platform to perform traditional phosphoramidite chemistry in miniature. The process involves a four-step cycle: De-protection (removing a blocking group), Coupling (adding the next nucleotide), Capping (preventing incorrect chains from growing), and Oxidation (stabilizing the bond). This is repeated for each base until the custom sequence is complete.

  2. Limitations:

  • Speed: Synthesis and shipping typically take 5–10 days, which is slower than biological replication.
  • Accuracy: Chemical synthesis has a small error rate that compounds with length; therefore, sequences longer than 2kb must be built by stitching together smaller, verified fragments.
  • Scalability: While the silicon platform allows for synthesizing thousands of different genes at once, the cost per base remains higher than large-scale natural DNA replication.

Edit

What DNA would you want to edit and why? I would want to edit the human APOE gene to convert the high-risk APOE4 allele into the protective APOE2 allele. APOE4 is the strongest genetic risk factor for late-onset Alzheimer’s, while APOE2 is known to be neuroprotective. Making this switch in human neural stem cells could fundamentally change a person’s risk profile and slow the progression of neurodegeneration.

What technology or technologies would you use to perform these DNA edits and why? I would choose CRISPR Base Editing specifically because of its safety profile in post-mitotic or sensitive cells like those found in the brain. Standard CRISPR-Cas9 (first-generation editing) creates Double-Strand Breaks, which can trigger a p53-mediated toxicity response or cause large, unintended deletions that could be catastrophic in a neural environment. Base Editing is the superior choice for a therapeutic application in Alzheimer’s research because it performs a search and replace at the single-base level. By chemically converting the target nucleotide without making a double-strand break, we minimize the risk of genomic instability while achieving the precise C -> T changes needed to convert the APOE4 risk allele into the protective APOE2 variant. Because APOE4 carries arginine at both position 112 and position 158 while APOE2 carries cysteine at both, two such edits are required.

  1. Base editing uses a “deactivated” Cas9 (dCas9) or a nickase (nCas9) fused to a deaminase enzyme. Unlike standard CRISPR, it does not create a double-strand break (the nickase version cuts only one strand). The process begins with targeting, where a custom-designed guide RNA leads the Base Editor complex to the precise location of the APOE SNP within the genome. Once the target is reached, the Cas9 domain performs unzipping, pulling the DNA strands apart to create a localized window of single-stranded DNA. Finally, the chemical conversion occurs: the fused deaminase enzyme chemically modifies a specific base, for instance deaminating a Cytosine into a Uracil. The cell’s natural repair and replication machinery then reads this Uracil as a Thymine (C -> T), effectively flipping the genetic switch from a risk allele to a protective one without ever creating a double-strand break.
  2. For preparation, we must design a guide RNA with a 20bp spacer that is unique to the APOE locus. The input is typically the Base Editor protein (or mRNA) and the synthetic gRNA, delivered via lipid nanoparticles or viral vectors.
  3. Despite its precision, the method faces a significant challenge in efficiency, as not every target cell will successfully receive the editor or undergo the chemical conversion. This results in mosaicism, a state where only a fraction of the neural population is corrected while others remain in the high-risk state. Furthermore, the technology is limited by its precision regarding bystander editing. If other identical bases (such as multiple Cytosines) are located within the narrow activity window of the deaminase enzyme, the editor may unintentionally change those nearby bases as well, potentially leading to unintended genetic modifications.
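The bystander-editing concern in point 3 can be checked at design time. The sketch below is a minimal illustration, not a validated design tool: the 20-nt spacer is a made-up sequence (not the real APOE locus), the function name `bystander_cytosines` is hypothetical, and the activity window of protospacer positions 4 to 8 (counted from the PAM-distal end) is an assumption that depends on the specific base editor used.

```python
def bystander_cytosines(protospacer, target_pos, window=(4, 8)):
    """Return 1-based positions of non-target cytosines inside the editing window.

    protospacer: 20-nt spacer sequence, 5'->3', position 1 is PAM-distal.
    target_pos:  1-based position of the cytosine we intend to edit.
    window:      assumed deaminase activity window (inclusive), editor-dependent.
    """
    seq = protospacer.upper()
    start, end = window
    if len(seq) != 20:
        raise ValueError("expected a 20-nt protospacer")
    if not start <= target_pos <= end:
        raise ValueError("target cytosine lies outside the activity window")
    if seq[target_pos - 1] != "C":
        raise ValueError("target position is not a cytosine")
    return [
        pos
        for pos in range(start, end + 1)
        if seq[pos - 1] == "C" and pos != target_pos
    ]


# Made-up spacer for illustration only (NOT the APOE sequence).
example_spacer = "GAGCACGTGAAGCTTGACAT"
print(bystander_cytosines(example_spacer, target_pos=6))  # [4]
```

An empty list would mean the target is the only cytosine in the assumed window; any positions returned are candidates for unintended C -> T conversion and a reason to try a different spacer or a narrower-window editor.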

Week 3 HW: Lab Automation

Project Overview: Cell-Free Allergen Biosensor

I am hoping to develop a rapid, consumer-grade biosensor designed to detect trace allergens like peanut or soy in a restaurant setting. To prioritize speed and accuracy, I will use a DNA-to-RNA detection circuit.

The workflow consists of three main stages:

  1. Extraction and Amplification: I could use RPA (Recombinase Polymerase Amplification) to exponentially copy target DNA (like the Ara h 1 gene) at a constant 37°C.
  2. Transcription: T7 RNA polymerase can concurrently convert that DNA into Trigger RNA.
  3. RNA Toehold Detection: This Trigger RNA can bind to a synthetic Toehold Switch, and unzip an RNA hairpin to allow the translation of a reporter protein. This can create a visible color change or induce luminescence in under 20 minutes.

Because the design relies on a cell-free protein synthesis system, the entire reaction can be made shelf-stable (for example by freeze-drying) and can function without a traditional lab environment.


Automation Stack

  • Nebula: I could use this for computational modeling to predict the thermodynamic stability ($\Delta G$) of different toehold designs. It can help me pick sequences that are stable at room temperature but unzip quickly when triggered.
  • Opentrons OT-2: This is a good tool for kinetic screening. I can use it to automate the distribution of reagents and synchronize the start of up to 96 reactions across a plate, ensuring my “time-to-detection” data is precise.
  • Ginkgo: I can look to foundry-scale automation for validating my optimized sensors against complex food matrices (oils, acids, salts) to minimize false negatives.

Kinetic Screening Pseudo-code

This protocol could be used to screen my Nebula-designed candidates to find the one with the fastest response time.

# SENSOR OPTIMIZATION PROTOCOL

# REAGENTS
SET Designs = [Nebula_A, Nebula_B, Nebula_C, Nebula_D]
SET Concentrations = [0, 10, 50, 100] # ppm of Ara h 1 DNA

# 1. REAGENT DISTRIBUTION
FOR EACH well IN plate:
    DISPENSE 10uL Cell-Free/RPA Master Mix

# 2. SENSOR LOADING
FOR i, design IN enumerate(Designs):
    DISPENSE 2uL design INTO Column(i)

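# 3. SYNCHRONIZED REACTION START
# Using multichannel to eliminate timing offsets within each column.
# The OT-2 8-channel head spans one column, so the trigger dilutions are
# pre-arrayed down a source column (tip j carries Concentrations[j] into Row(j)).
FOR i, design IN enumerate(Designs):
    MULTICHANNEL_DISPENSE Concentrations FROM SourceColumn INTO Column(i)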

# 4. KINETIC DATA COLLECTION
WHILE time < 20 minutes:
    READ_ABSORBANCE (570nm) EVERY 30 seconds
    LOG_DATA(well_id, time, signal_intensity)

# 5. ANALYSIS
RANK designs BY (Signal-to-Noise Ratio) AND (Time-to-Threshold)
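
The analysis step (ranking by time-to-threshold and signal-to-noise ratio) could be sketched in Python roughly as follows. This is a minimal sketch under several assumptions: the absorbance traces are made-up numbers, the threshold of 0.3 is arbitrary, signal-to-noise is taken as the endpoint ratio of the positive well to the 0 ppm blank, and the function names are hypothetical rather than part of any plate-reader software.

```python
def time_to_threshold(times, signal, threshold):
    """Return the first time point at which signal >= threshold, or None."""
    for t, s in zip(times, signal):
        if s >= threshold:
            return t
    return None


def rank_designs(traces, times, threshold):
    """Rank designs: fastest time-to-threshold first, ties broken by higher SNR.

    traces maps design name -> {"blank": [...], "positive": [...]} absorbance
    readings taken at the given times. Designs that never cross the threshold
    are placed last.
    """
    rows = []
    for name, trace in traces.items():
        ttt = time_to_threshold(times, trace["positive"], threshold)
        blank_end = trace["blank"][-1]
        snr = trace["positive"][-1] / blank_end if blank_end else float("inf")
        rows.append((name, snr, ttt))
    rows.sort(key=lambda r: (r[2] is None, r[2] if r[2] is not None else 0, -r[1]))
    return rows


# Made-up example data: readings every 30 seconds at 570 nm.
times = [0, 30, 60, 90, 120]
traces = {
    "Nebula_A": {"blank": [0.05, 0.05, 0.06, 0.06, 0.06],
                 "positive": [0.05, 0.10, 0.25, 0.45, 0.60]},
    "Nebula_B": {"blank": [0.05, 0.06, 0.08, 0.10, 0.12],
                 "positive": [0.05, 0.20, 0.40, 0.55, 0.60]},
    "Nebula_C": {"blank": [0.05, 0.05, 0.05, 0.05, 0.05],
                 "positive": [0.05, 0.06, 0.08, 0.10, 0.15]},
}

ranking = rank_designs(traces, times, threshold=0.3)
for name, snr, ttt in ranking:
    print(f"{name}: SNR={snr:.1f}, time-to-threshold={ttt}")
```

Sorting on time first reflects the stated goal of finding the fastest sensor. A design with a leaky blank (like the made-up Nebula_B) can still rank first on speed, so in practice a minimum SNR cutoff may be worth adding.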

Novel Lab Automation for Brain Organoids

Overview

Organoids are 3D in vitro models that closely replicate the biology and physiology of their in vivo counterparts, making them highly valuable for developmental research and disease modeling. However, traditional manual cell culture protocols often lack consistency and expose these complex models to erratic swings in nutrient availability and the buildup of toxic metabolites, which can lead to cellular stress.

To address these limitations, researchers developed the “Autoculture” platform, a modular, automated microfluidic system designed to optimize 3D organoid growth. This Internet of Things (IoT)-enabled system uses a custom 24-well polydimethylsiloxane (PDMS) chip to house individual organoids in isolated microenvironments. The platform automatically delivers precisely scheduled media and removes waste without the need to take the cultures out of the incubator, offering high spatiotemporal resolution and customizable feeding schedules.

Findings

When tested on cerebral cortex organoids over an 18-day period, the Autoculture platform supported robust growth and accurate neural differentiation comparable to conventional orbital shaker suspension methods. Crucially, RNA sequencing revealed that the automated microfluidic environment significantly reduced markers of cellular stress. Specifically, organoids grown in the automated system showed a marked downregulation in genes associated with canonical glycolysis and endoplasmic reticulum (ER) stress when compared to those maintained in traditional suspension cultures.

(figure: figure_2)

paper_link

Week 4 HW: Protein Design part I

HW Questions:

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
    • Meat is roughly 20% protein by mass, so there’s ~100 g of protein in 500 g of meat. I use an average residue mass of ~110 Da; the ~100 Da in the prompt gives ~6.0 × 10²³ instead.
    • 100 g ÷ 110 g/mol ≈ 0.91 mol of amino acid residues. Multiplying by Avogadro’s number: 0.91 × 6.022 × 10²³ ≈ 5.5 × 10²³ amino acid residues.
  2. Why do humans eat beef but do not become a cow, eat fish but do not become fish?
    • During digestion, proteases break the cow/fish proteins down into their constituent free amino acids and small peptides, which are then absorbed. Your ribosomes reassemble the amino acids according to your own mRNA instructions.
  3. Why are there only 20 natural amino acids?
    • Likely a combination of three things. First, once the genetic code co-evolved around these 20 amino acids, any change would catastrophically mis-translate the entire proteome, so those 20 were ’locked in’. Second, the 20 cover the necessary chemical space (charged, polar, hydrophobic, aromatic, etc). Third, the simplest amino acids (Gly, Ala, Asp, Glu, Val…) are exactly the ones most readily produced abiotically, so the code evolved around what was chemically accessible early on. Selenocysteine and pyrrolysine as the “21st and 22nd” amino acids show the code can expand, but only under very constrained circumstances.
  4. Where did amino acids come from before enzymes that make them, and before life started?
    • They form spontaneously from simple chemistry. Amino acids are thermodynamically reasonable products.
    • The Miller-Urey experiment showed electric discharge through CH₄, NH₃, H₂O and H₂ produces Gly, Ala, Asp and others.
    • Hydrothermal vents provide mineral catalysts and redox gradients. Meteorites (like Murchison) contain over 70 amino acids synthesized in space, suggesting that abiotic production can happen wherever C, N, O, H and energy coexist.
  5. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
    • Left-handed. L-amino acids prefer φ≈−57°, ψ≈−47° → right-handed helix. D-amino acids are the mirror image → φ≈+57°, ψ≈+47° → left-handed helix.
  6. Can you discover additional helices in proteins?
    • Beyond the α-helix, several others exist.
    • The 3₁₀-helix is tighter (i→i+3 H-bonds), more strained, and common at helix termini.
    • The π-helix is wider (i→i+5 H-bonds) and surprisingly prevalent at functional sites: maybe 15% of proteins contain at least one π-helix.
    • The polyproline II (PPII) helix has no intramolecular H-bonds at all, is left-handed, and is extremely common in disordered regions, collagen, and signaling domains (SH3 recognition).
    • The collagen triple helix is three intertwined PPII-like chains stabilized by interchain H-bonds.
    • New folds continue to emerge from cryo-EM and AlphaFold-era structural biology, so the catalogue is probably not closed.
  7. Why are most molecular helices right-handed?
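    • In proteins, helices are mostly right-handed because ribosomally made proteins are built from L-amino acids (glycine aside, which is achiral). With L-residues, the right-handed α-helix keeps the side chains clear of the backbone, while the left-handed version pushes them into steric clashes, so the right-handed coil is energetically favored.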
  8. Why do β-sheets tend to aggregate?
    • Edge strands have unpaired backbone NH and C=O groups pointing outward. These are unsatisfied H-bond donors and acceptors, which makes them inherently “sticky.”
    • Flat hydrophobic surfaces on sheet faces also drive stacking through the hydrophobic effect.
    • Side chains interdigitate into a tight steric zipper, which is favorable in both enthalpy and entropy, and thus very hard to prevent without chaperones or proline residues.
  9. Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?
    • For many disease-associated sequences (Aβ in Alzheimer’s, α-synuclein in Parkinson’s, IAPP in type 2 diabetes), the amyloid state can be thermodynamically more stable than the native fold. Stress, concentration increases, mutations, or metal ions can nucleate conversion, after which elongation proceeds rapidly, like crystal growth. The resulting cross-β architecture is extraordinarily stable and resistant to heat, detergent, and proteases.
    • As materials, amyloid fibrils are definitely useful. They can have a Young’s modulus of 1–20 GPa (similar to silk), high aspect ratio, and nanoscale precision. Functional amyloids already exist naturally (curli fibers in bacterial biofilms, yeast prions as regulatory switches). Proposed applications include conductive nanowires (metallized with silver or gold), hydrogels for drug delivery, tissue engineering scaffolds, and even food technology (whey protein amyloids are already used commercially as emulsifiers).
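
The arithmetic in question 1 can be reproduced with a few lines of Python. This is a small sketch using the same assumptions as the answer (20% protein by mass, ~110 Da per residue); the function name `residues_in_meat` is just an illustrative choice, and the second call shows the result with the ~100 Da figure from the prompt.

```python
AVOGADRO = 6.02214076e23  # residues per mole


def residues_in_meat(meat_g, protein_fraction=0.20, residue_mass_da=110.0):
    """Estimate the number of amino acid residues in a piece of meat."""
    protein_g = meat_g * protein_fraction      # grams of protein
    moles = protein_g / residue_mass_da        # Da is numerically g/mol
    return moles * AVOGADRO


n_110 = residues_in_meat(500)
n_100 = residues_in_meat(500, residue_mass_da=100.0)
print(f"{n_110:.2e}")  # 5.47e+23
print(f"{n_100:.2e}")  # 6.02e+23
```

Either way the answer is on the order of 10²³ to 10²⁴ residues, so the choice between 100 and 110 Da changes the estimate by only about 10%.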

Protein Analysis and Visualization

  1. Insulin is a small hormone produced by the pancreas that regulates blood glucose levels. I chose it because of its fascinating history as a therapeutic protein. Decades of protein engineering have produced analogs like insulin lispro and glargine, where just a few amino acid changes (two swapped residues in lispro, one substitution plus two added arginines in glargine) dramatically alter how the drug behaves in the body. I wanted to explore the structure underlying a protein that has been so deliberately and successfully redesigned.
  2. Sequence:
    • MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    • length: 110 amino acids
    • most common residue is L, which appears 20 times
    • this protein has 228 homologs
    • this specific insulin protein is in the broader insulin family
  3. Protein Structure Page
    • The structure was released on February 24, 2009. (Note: While the very first insulin structure was solved in 1969, this version (3E7Y) is the modern high-resolution reference for the native human protein.)
    • The quality is excellent: the resolution is 1.60 Å.
    • There are other molecules in the solved structure of the protein, as this classic structure represents the storage form (hexamer). It contains zinc ions, chloride ions, and water molecules
    • This protein is the defining member of the Insulin-like superfamily.
  4. 3D Structure
    • cartoon (figure: insulin_cartoon)
    • ribbon (figure: insulin_ribbon)
    • sticks (figure: insulin_sticks)
    • secondary structure: the protein has more helices than sheets (figure: insulin_secondary_structure)
    • protein surface: the distribution of hydrophobic (orange) vs. hydrophilic (cyan) residues is as expected. Most of the surface residues are hydrophilic, and the hydrophobic residues line the binding pockets (figure: insulin_residue_type)
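
The sequence statistics listed under point 2 (length and most common residue) can be double-checked with a short script. This is a minimal sketch using only the Python standard library; the sequence is copied from the entry above, and the variable names are arbitrary.

```python
from collections import Counter

# Human preproinsulin sequence, copied from the Sequence entry above.
PREPROINSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
    "GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)

length = len(PREPROINSULIN)
counts = Counter(PREPROINSULIN)
top_residue, top_count = counts.most_common(1)[0]

print(f"length: {length}")                               # length: 110
print(f"most common: {top_residue} x {top_count}")       # most common: L x 20
```

The many leucines are concentrated in the N-terminal signal peptide, which is consistent with that hydrophobic stretch being cleaved off before the mature hormone is formed.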