Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Q1. Describe a biological engineering application or tool you want to develop and why: Concept: Bio-Circuit for CO₂ Sensing and Reduction

  • Week 2 Homework: DNA Read, Write & Edit

    3.1. Choose your protein Question: In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose. The protein I have chosen is Rubisco, the protein responsible for fixing carbon dioxide during photosynthesis. I have chosen this protein because one of my final project ideas was to create a system that is more efficient at fixing carbon.

  • Week 3: Automation Research & Project Proposal

    1. Published Automation Research. Question: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications. The Paper: An automated cell-free platform for the rapid characterization of genetic regulators. Journal: Scientific Reports (2024). Source: https://www.nature.com/articles/s41598-024-52642-x Description: This research utilizes the Opentrons OT-2 to automate the setup of cell-free protein synthesis (CFPS) reactions. The authors focused on characterizing genetic parts like promoters and riboswitches. By using automation, they were able to test over 1,000 different conditions in a fraction of the time it would take a human, with much higher reproducibility across plates.

Subsections of Homework

Week 1 HW: Principles and Practices


Image courtesy of Vincent Muir

Q1. Describe a biological engineering application or tool you want to develop and why:

Concept: Bio-Circuit for CO₂ Sensing and Reduction

One idea I have is to create a biological circuit that senses CO₂ emissions (or a chemical indicative of excess CO₂ production or presence). The broader goal would be a self-sustaining biological system that reduces CO₂ emissions, with applications in green climate technology. With modern synthetic biology tools, I envision modifying a signaling cascade so that it triggers a fluorescent response for detection, and then engineering a protein like Rubisco for greater carbon fixation as a potential route to emission reduction.

Goal: Biosafety and Biocontainment

The biggest concern for applications of this idea is biosafety. Because this is a living system, its biological agents must be contained: releasing a synthetic organism into the environment could prove detrimental given its potential for uncontrolled growth and competition with natural organisms. It is therefore crucial that my system can “self-prune,” regulating itself to prevent excess growth.

Q3. Describe at least three different potential governance “actions.”

  1. Mandatory DNA Synthesis Screening (Option 1): I propose changing the practices of commercial DNA synthesis companies and academic researchers. To my knowledge, DNA synthesis screening is typically voluntary and/or limited to a small number of target pathogens. I propose mandatory screening of all synthetic DNA orders against a broad, continually updated database of functional genomic markers, regardless of whether they belong to a known pathogen. This would hopefully lead to the characterization of more sequences responsible for hazardous biological activity. Implementation would likely require a standard technical protocol that all synthesis providers must use. Success would require all major industry players to opt in, so that customers cannot avoid the associated costs by “offshoring” orders to providers in countries without such requirements. This assumes that screening algorithms can accurately distinguish benign research sequences from flagged motifs without a high rate of false positives.

  2. Incentivizing Responsible Research via Insurance (Option 2): I propose a model that incentivizes responsible research by systematically changing how actors (private insurance companies, research institutions, etc.) behave. Biorisk management is often seen as a bureaucratic cost. Under this model, institutions would receive lower insurance premiums upon demonstrating high-quality biorisk oversight. This could include frequent independent audits, mandatory training, and transparent quality assessments. This only works if private insurers have the technical expertise to judge scientific risk, and if the savings from lower premiums are a large enough incentive to reshape institutional behavior.

  3. Mandatory Transparency in Research Publications (Option 3): Research papers typically do not acknowledge the potential for misuse of new findings. This proposal would create a rule that no federal funding be granted, and no publication in a major journal be accepted, without a clearly detailed section outlining potential risks and mitigation strategies. Regulators would need to create a standardized template to ensure quality and compliance, and editorial staff would also include biosecurity reviewers who evaluate these statements before publication. This assumes that scientists are able to envision potential misuse cases of their own work, a task that grows harder as AI-assisted ideation expands what is possible.
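To make the screening mechanism in Option 1 more concrete, here is a minimal Python sketch of order screening by exact k-mer matching against a database of flagged sequences. Everything in it is an assumption for illustration: the marker name and sequence are made up, and the 12-base k-mer size is arbitrary. Real screening pipelines rely on curated databases and homology search rather than exact matching, which is also where the false-positive concern above comes from.

```python
# Hypothetical screening database: marker name -> DNA sequence of concern.
# The entry below is invented for illustration and is NOT a real hazard marker.
FLAGGED_MARKERS = {
    "hypothetical_marker_1": "ATGCGTACGTTAGCCTAGGA",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def kmers(seq: str, k: int) -> set[str]:
    """Return every length-k substring of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_order(order: str, flagged: dict[str, str], k: int = 12) -> list[str]:
    """Return the names of flagged markers that share at least one exact
    k-mer with the order, checking both strands of the ordered DNA."""
    order_kmers = kmers(order, k) | kmers(reverse_complement(order), k)
    return [name for name, marker in flagged.items() if order_kmers & kmers(marker, k)]


# An order that embeds the marker on the opposite strand is still flagged.
order = "GGCC" + reverse_complement(FLAGGED_MARKERS["hypothetical_marker_1"]) + "TTAA"
print(screen_order(order, FLAGGED_MARKERS))  # ['hypothetical_marker_1']
```

Checking both strands matters because a sequence of concern can be ordered as its reverse complement. Shorter k-mers catch more distant variants but also flag more benign orders, which is the false-positive trade-off described in Option 1.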

Q4. Governance Scoring Matrix

Does the option:

Enhance Biosecurity
  • By preventing incidents
    • Option 1 (Screening): High. Acts as a physical gatekeeper preventing the creation of hazardous sequences.
    • Option 2 (Insurance): Medium. Encourages a safety culture but doesn’t physically stop bad actors.
    • Option 3 (Transparency): Low. Relies on post-hoc review; good for awareness but doesn’t prevent creation.
  • By helping respond
    • Option 1 (Screening): High. Creates a digital paper trail of who ordered which sequence.
    • Option 2 (Insurance): Medium. Audit trails help with liability but not with immediate biological response.
    • Option 3 (Transparency): Medium. Ensures mitigation strategies are thought out in advance and published.
Foster Lab Safety
  • By preventing incidents
    • Option 1 (Screening): Low. Focuses on the “what” (DNA), not the “how” (handling).
    • Option 2 (Insurance): High. Directly mandates training and oversight of daily lab practices.
    • Option 3 (Transparency): Low. Administrative in nature.
  • By helping respond
    • Option 1 (Screening): Low. Not relevant to immediate lab accidents.
    • Option 2 (Insurance): High. Insurance protocols would mandate accident reporting and response plans.
    • Option 3 (Transparency): Low.
Protect the Environment
  • By preventing incidents
    • Option 1 (Screening): High. Prevents synthesis of invasive or modified traits before release.
    • Option 2 (Insurance): Medium. Better oversight leads to better containment protocols.
    • Option 3 (Transparency): Low.
  • By helping respond
    • Option 1 (Screening): Medium. The database allows rapid identification of escaped synthetic organisms.
    • Option 2 (Insurance): Medium. Insurance makes funding available for cleanup and remediation.
    • Option 3 (Transparency): Medium. Published strategies may include kill-switch documentation.
Other Considerations
  • Minimizing costs/burdens
    • Option 1 (Screening): Low. High technical and administrative burden on providers.
    • Option 2 (Insurance): Low. High upfront cost for institutions to reorganize compliance.
    • Option 3 (Transparency): Medium. Adds writing and review time, but low financial cost.
  • Feasibility
    • Option 1 (Screening): Medium. The technology exists, but requires international buy-in.
    • Option 2 (Insurance): Low. Market forces may not support this without regulation.
    • Option 3 (Transparency): High. Journals and grant agencies can easily add this requirement.
  • Not impeding research
    • Option 1 (Screening): Low. False positives could delay legitimate experiments.
    • Option 2 (Insurance): Medium. Could create cost barriers for small labs and startups.
    • Option 3 (Transparency): Medium. Scientists may self-censor or fear “hazard” labeling.
  • Promoting constructive applications
    • Option 1 (Screening): High. Builds trust that the foundation of the bioeconomy is safe.
    • Option 2 (Insurance): High. Professionalizes the industry.
    • Option 3 (Transparency): Medium. Increases public trust through transparency.
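One way to read the matrix numerically is to map High/Medium/Low to 3/2/1 and sum each option's ratings. This mapping and the equal weighting of every criterion are assumptions of this sketch, not part of the scoring above. The second function estimates a combined strategy by taking the better of two options' ratings for each criterion, which is optimistic because costs and burdens would in practice add up rather than take the better value.

```python
# Assumed numeric mapping of the qualitative ratings (not part of the original matrix).
SCORE = {"High": 3, "Medium": 2, "Low": 1}

# Ratings copied from the matrix, ordered as
# (Option 1 Screening, Option 2 Insurance, Option 3 Transparency).
MATRIX = {
    "Biosecurity: preventing incidents": ("High", "Medium", "Low"),
    "Biosecurity: helping respond": ("High", "Medium", "Medium"),
    "Lab safety: preventing incidents": ("Low", "High", "Low"),
    "Lab safety: helping respond": ("Low", "High", "Low"),
    "Environment: preventing incidents": ("High", "Medium", "Low"),
    "Environment: helping respond": ("Medium", "Medium", "Medium"),
    "Minimizing costs/burdens": ("Low", "Low", "Medium"),
    "Feasibility": ("Medium", "Low", "High"),
    "Not impeding research": ("Low", "Medium", "Medium"),
    "Promoting constructive applications": ("High", "High", "Medium"),
}


def totals(matrix: dict[str, tuple[str, str, str]]) -> list[int]:
    """Sum each option's numeric scores over all criteria (equal weights)."""
    sums = [0, 0, 0]
    for ratings in matrix.values():
        for i, rating in enumerate(ratings):
            sums[i] += SCORE[rating]
    return sums


def combined(matrix: dict[str, tuple[str, str, str]], a: int, b: int) -> int:
    """Optimistic score for combining options a and b: best rating per criterion."""
    return sum(max(SCORE[r[a]], SCORE[r[b]]) for r in matrix.values())


print(totals(MATRIX))          # [20, 21, 17]
print(combined(MATRIX, 0, 1))  # 25
```

Under these assumptions, Options 1 and 2 score similarly on their own but are strong on different criteria, so their combination outscores either one alone.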

Q5. Drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why.

To counter an evolving landscape of increasing biological threats, I recommend that national regulatory and funding bodies (e.g., NIH) prioritize an integrated governance strategy: mandatory DNA Synthesis Screening (Option 1) acting as a gatekeeper, combined with Responsible Research Oversight (Option 2) embedded into the budget cycle. This multi-tiered approach assumes that screening keeps pace with AI-driven pathogen design, and that the industry cooperates internationally to prevent a financial “race to the bottom” in safety standards. The primary trade-off is an added administrative burden on researchers, potentially delaying legitimate innovation; however, the combined approach minimizes the risk of accidental (or deliberate) release by creating a physical barrier at the point of sequence production and a procedural barrier through rigorous researcher training and compliance.

Weekly Reflection

Reflecting on week one of How to Grow Almost Anything 2026, the core ethical challenge that stood out to me was the “dual-use dilemma.” This makes sense: as biosynthetic tools become more advanced, they also spread to more decentralized users, limiting our ability to control how the technology is used. It is also concerning that AI-driven design combined with synthetic biology could produce novel toxins that existing screening would not recognize. I would propose a strategy built on the assumption that “safety-by-design” can mitigate risks without stifling the creative freedom central to the HTGAA mission. Labs with access to these technologies would be required to follow compliance protocols that prevent widespread abuse of new technologies for nefarious purposes. I believe it is reasonable to accept an increased administrative load in exchange for maintaining safe access to groundbreaking technology.


Homework Questions

Professor Jacobson: Polymerase & Coding

1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

  • Error Rate: Polymerase nucleotide selection alone has an error rate of roughly 10^-5 (about 1 wrong base per 100,000).
  • Comparison: The human genome is approximately 3 x 10^9 (3 billion) base pairs long, so this rate would produce roughly 30,000 errors (3 x 10^9 x 10^-5) every time the genome is copied, which would be detrimental.
  • Correction Mechanism: Biology deals with this discrepancy through layered error correction. First, proofreading: when a mismatched base is incorporated, the polymerase stalls, and its 3’ to 5’ exonuclease activity removes the incorrect nucleotide before synthesis continues. Second, mismatch repair protein complexes scan the newly replicated DNA and fix remaining errors, bringing the overall error rate down to approximately 10^-9.
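The arithmetic above can be checked in a few lines of Python, using only the numbers from the text: a genome of about 3 x 10^9 bp and error rates of 10^-5 (before correction) and 10^-9 (after proofreading and mismatch repair).

```python
GENOME_BP = 3e9  # approximate human genome length (bp), as stated above

# Error rates per base copied, as given in the text
RATES = {
    "base selection only": 1e-5,
    "after proofreading and mismatch repair": 1e-9,
}


def expected_errors(rate: float, length: float = GENOME_BP) -> float:
    """Expected number of mis-copied bases in one copy of the genome."""
    return rate * length


for stage, rate in RATES.items():
    print(f"{stage}: ~{expected_errors(rate):,.0f} errors per genome copy")
```

Going from about 30,000 errors to about 3 per copy shows why the correction layers, not the polymerase's raw accuracy, make faithful replication of a 3-billion-base genome possible.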

2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

  • Coding Possibilities: The genetic code is degenerate: most amino acids are encoded by more than one codon (up to six), which buffers against functional loss when a point mutation occurs. Assuming an average of about 3 codons per amino acid (61 sense codons / 20 amino acids) and an average human protein length of about 400 amino acids, there are roughly 3^400 (about 10^191) distinct DNA sequences that can code for a single protein.
  • Practical Limitations: In practice, not all of these sequences will express the protein of interest efficiently. This is partly due to codon usage bias: each organism’s translation machinery prefers certain codons, depending on the availability of matching tRNAs, so sequences rich in rare codons translate poorly. Additionally, in eukaryotes the coding sequence is interrupted by introns that are removed by mRNA splicing after transcription; a synonymous change can create or disrupt a splice site, changing which sequence is retained in the mature mRNA and, as a result, which protein is actually synthesized.
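A short Python sketch makes the counting argument exact for any given protein: multiply the number of synonymous codons for each residue. The codon counts are those of the standard genetic code, and the stop codon is ignored. The last line evaluates the back-of-envelope 3^400 estimate from the text as a power of ten.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (these sum to the 61 sense codons).
SYNONYMOUS_CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Exact number of distinct coding sequences for a protein (stop codon excluded)."""
    return math.prod(SYNONYMOUS_CODONS[aa] for aa in protein.upper())


# Back-of-envelope estimate from the text: ~3 codons per residue, ~400 residues.
print(f"3^400 is about 10^{400 * math.log10(3):.1f}")
```

For example, `count_encodings("MSL")` gives 1 x 6 x 6 = 36, because methionine has a single codon while serine and leucine each have six.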

Dr. LeProust: DNA Synthesis

3. What’s the most commonly used method for oligo synthesis currently? Solid-phase phosphoramidite chemistry is currently the most commonly used method. An automated, computer-controlled synthesizer adds one nucleotide per chemical cycle to a chain anchored on a solid support, building the DNA in the 3’ to 5’ direction, the opposite of the 5’ to 3’ direction used by polymerases in nature.

4. Why is it difficult to make oligos longer than 200nt via direct synthesis? Because each coupling reaction is imperfect, errors compound with length: even at 99.5% coupling efficiency per step, only about 0.995^200 ≈ 37% of the molecules in a 200nt synthesis would be full-length and correct. This means most of the synthesized DNA (about 63%) would be truncated or failed sequences.

5. Why can’t you make a 2000bp gene via direct oligo synthesis? Using the same math, 0.995^2000 ≈ 0.0044%. Effectively, only about 1 in 22,000 molecules would be the correct full-length “gene,” making it practically impossible to isolate the correct sequence from the mixture of failures.
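The compounding-yield argument in questions 4 and 5 can be reproduced with a small function. Like the text, it assumes every nucleotide addition succeeds independently at 99.5%, so the full-length fraction is simply efficiency^length.

```python
def full_length_fraction(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of molecules that are full-length and correct, assuming each of
    `length_nt` additions succeeds independently with the given efficiency
    (the same simplification as the text; strictly, an n-mer needs n - 1 couplings)."""
    return coupling_efficiency ** length_nt


for n in (200, 2000):
    f = full_length_fraction(0.995, n)
    print(f"{n} nt: {f:.4%} full-length (about 1 in {1 / f:,.0f})")
```

Because the fraction decays exponentially with length, improving coupling efficiency helps long oligos far more than short ones, but no realistic efficiency rescues a 2,000-base direct synthesis.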

George Church: The Lysine Contingency

6. What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?

  • Essential Amino Acids: The 10 essential amino acids cannot be synthesized by mammalian cellular machinery and must be consumed through food. They are: Arginine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, and Valine.
  • The Lysine Contingency: The “Lysine Contingency” from Jurassic Park is the fictional fail-safe meant to prevent dinosaurs from surviving in the wild: the animals were engineered to be unable to produce lysine, so they would supposedly die without lysine supplements. However, since lysine is already an essential amino acid, no animal can synthesize it, and the dinosaurs would need to obtain it from their diet regardless of genetic engineering. Because lysine is present in ordinary food (plants and prey), an escaped dinosaur could simply eat lysine-rich food, so the contingency provides no additional protective measure.

Gemini AI was consulted for formatting

Week 2 Homework: DNA Read, Write & Edit

3.1. Choose your protein

Question: In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.

The protein I have chosen is Rubisco, the protein responsible for fixing carbon dioxide during photosynthesis. I have chosen this protein because one of my final project ideas was to create a system that is more efficient at fixing carbon.

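Protein Sequence (185 amino acids; this is the Rubisco small subunit, RbcS, in its precursor form with an N-terminal chloroplast transit peptide):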

MASSILSSAVVASVNSASPAQASMVAPFTGLKSSAGFPITRKNNVDITTLASNGGKVQCMKVWPPLGLRKFETLSYLPDMSNEQLSKECDYLLRNGWVPCVEFDIGSGFVYRENHRSPGYYDGRYWTMWKLPMFGCTDSSQVIQEIEEAKKEYPDAFIRVIGFDNVRQVQCISFIAYKPPRFYSS

3.2. Reverse Translate

Question: The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools, determine the nucleotide sequence that corresponds to the protein sequence you chose above.

Nucleotide Sequence:

atggcgagcagcattctgagcagcgcggtggtggcgagcgtgaacagcgcgagcccggcg
caggcgagcatggtggcgccgtttaccggcctgaaaagcagcgcgggctttccgattacc
cgcaaaaacaacgtggatattaccaccctggcgagcaacggcggcaaagtgcagtgcatg
aaagtgtggccgccgctgggcctgcgcaaatttgaaaccctgagctatctgccggatatg
agcaacgaacagctgagcaaagaatgcgattatctgctgcgcaacggctgggtgccgtgc
gtggaatttgatattggcagcggctttgtgtatcgcgaaaaccatcgcagcccgggctat
tatgatggccgctattggaccatgtggaaactgccgatgtttggctgcaccgatagcagc
caggtgattcaggaaattgaagaagcgaaaaaagaatatccggatgcgtttattcgcgtg
attggctttgataacgtgcgccaggtgcagtgcattagctttattgcgtataaaccgccg
cgcttttatagcagc
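A reverse translation like the one above can be sketched by mapping each amino acid to a single codon. The codon table here is an assumption chosen to reproduce the sequence shown (these choices appear to match the most frequently used E. coli codon for each amino acid); the tool used for the homework may have worked differently.

```python
import textwrap

# One codon per amino acid (assumed table; reproduces the sequence above).
CODON_FOR = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}

PROTEIN = (
    "MASSILSSAVVASVNSASPAQASMVAPFTGLKSSAGFPITRKNNVDITTLASNGGKVQCM"
    "KVWPPLGLRKFETLSYLPDMSNEQLSKECDYLLRNGWVPCVEFDIGSGFVYRENHRSPGY"
    "YDGRYWTMWKLPMFGCTDSSQVIQEIEEAKKEYPDAFIRVIGFDNVRQVQCISFIAYKPP"
    "RFYSS"
)


def reverse_translate(protein: str) -> str:
    """Back-translate a protein into one possible DNA sequence (one codon per residue)."""
    return "".join(CODON_FOR[aa] for aa in protein.upper())


# Wrap at 60 nt per line, matching the layout above.
print(textwrap.fill(reverse_translate(PROTEIN), width=60))
```

Neither the protein sequence nor this output includes a stop codon, so one (for example TAA) would need to be appended before the gene could be expressed.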

3.3. Codon Optimization

Question: Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?

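Although the genetic code is shared across organisms, each organism uses its synonymous codons at different frequencies, reflecting the relative abundance of its tRNAs (codon usage bias). A gene containing many codons that are rare in the host can stall translation and lower protein yield. Codon optimization swaps these for synonymous codons the host uses efficiently, leaving the amino acid sequence (and therefore the protein’s function) unchanged while improving expression. I would optimize for E. coli, since that is the host I plan to use to produce the protein.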
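As a minimal illustration of the idea, the sketch below swaps a few codons commonly cited as rare in E. coli for synonymous, frequently used ones, and then checks that the encoded protein is unchanged. The rare-codon list is an assumption for illustration; real codon optimization tools also weigh factors such as GC content, mRNA secondary structure, and unwanted restriction sites.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the usual TCAG ordering.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed rare -> preferred synonymous swaps for E. coli (illustrative only).
RARE_TO_PREFERRED = {
    "aga": "cgc", "agg": "cgc", "cga": "cgc", "cgg": "cgc",  # Arg
    "ata": "att",  # Ile
    "cta": "ctg",  # Leu
    "ccc": "ccg",  # Pro
    "gga": "ggc",  # Gly
}


def codons(dna: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    dna = dna.lower()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate a coding sequence with the standard code ('*' marks stop codons)."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def replace_rare_codons(dna: str) -> str:
    """Swap rare codons for synonymous preferred ones and verify the protein is unchanged."""
    optimized = "".join(RARE_TO_PREFERRED.get(c, c) for c in codons(dna))
    if translate(optimized) != translate(dna):
        raise AssertionError("a replacement changed the encoded protein")
    return optimized


print(replace_rare_codons("atgagaatactaccc"))  # atgcgcattctgccg
```

The translation check is the key design point: whatever codons are swapped, the protein sequence must come out identical.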


3.4. You have a sequence! Now what?

Question: What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

I could use a standard cloning process to produce this protein from DNA. This involves cutting a circular plasmid vector with restriction enzymes, inserting my gene of interest (cut with compatible enzymes) into the plasmid, and using a ligation reaction (DNA ligase) to seal the insert into place. I would then transform the plasmid into an organism (E. coli). Inside the cell, RNA polymerase transcribes the gene from a promoter on the plasmid into mRNA, and ribosomes translate that mRNA into my protein of interest.
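Before choosing enzymes for the cutting step, it is worth checking that they do not also cut inside the gene itself. Here is a minimal Python sketch that scans a sequence for a few common restriction sites; the enzyme list is illustrative, and every site in it is palindromic, so scanning one strand is enough.

```python
# Recognition sites (5'->3') for a few common restriction enzymes.
# All of these are palindromic, so the reverse strand contains the same sites.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NdeI": "CATATG",
}


def find_sites(dna: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return {enzyme: [0-based start positions]} for every site found in dna."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = [i for i in range(len(dna) - len(site) + 1) if dna[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


print(find_sites("ttGAATTCaaGGATCC"))  # {'EcoRI': [2], 'BamHI': [10]}
```

An enzyme that appears in the result for the gene itself would cut the insert, so it would be excluded (or the site removed with a synonymous codon change) before cloning.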

[Image of molecular cloning workflow]


5.1 DNA Read

(i) What DNA would you want to sequence and why?

I would want to sequence the genomes of plant species that are highly efficient at carbon fixation. I’m potentially pursuing a final project in which I create a biosensor for carbon dioxide production or an efficient carbon-fixation system, and understanding the machinery responsible for these reactions is crucial.

(ii) What technology or technologies would you use to perform sequencing on your DNA and why?

To identify the specific genes responsible for efficient carbon fixation, I would use PacBio SMRT (Single Molecule Real-Time) Sequencing. Plant genomes are notoriously difficult to read because they are often polyploid and full of repetitive DNA. Short-read technologies (like Illumina) read DNA in fragments of a few hundred bases, which makes it very difficult to piece a complex plant genome back together. PacBio’s long reads span these repetitive regions, allowing me to fully reconstruct the gene clusters responsible for the plant’s efficiency.

  1. Technology Choice: I would use PacBio SMRT Sequencing

    • Why: This technology produces “long reads” (reading long stretches of DNA at once) with high accuracy. This is crucial for de novo assembly of plant genomes, which contain highly repetitive sequences that short-read technologies cannot resolve.
  2. Generation: This is a Third-Generation sequencing technology.

    • How so? Unlike second-generation methods (which require chopping DNA, amplifying it into clusters, and pausing to take photos), third-generation sequencing reads single molecules of DNA in real-time without the need for PCR amplification during the reading process.
  3. Input & Preparation:

    • Input: High Molecular Weight (HMW) genomic DNA (gDNA).
    • Preparation:
      • Extraction: Isolate DNA carefully to keep strands long (15,000+ base pairs).
      • Fragmentation: Shear the DNA into uniform large sizes (e.g., 15kb).
      • SMRTbell Construction: This is the essential step. Hairpin-shaped adapters are ligated (glued) to both ends of the double-stranded DNA. This turns the linear DNA into a circle.
      • Primer Binding: A sequencing primer and DNA polymerase are bound to the adapters.
  4. Essential Steps & Base Calling:

    • Loading: The SMRTbell templates are loaded onto a chip containing millions of microscopic wells called ZMWs (Zero-Mode Waveguides).
    • Sequencing: A single DNA polymerase enzyme is anchored at the bottom of the ZMW. As the DNA circle moves through the enzyme, the polymerase incorporates fluorescently labeled nucleotides (A, T, C, G).
    • Base Calling: Each nucleotide emits a distinct color of light (pulse) as it is added. A camera records this movie of light flashes in real-time to determine the sequence.
    • HiFi Mode: Because the DNA is a circle, the polymerase reads the same sequence over and over again. The software creates a “consensus” from these passes, eliminating random errors.
  5. Output: The output is HiFi Reads (High Fidelity Long Reads). These are digital files (typically FASTQ or BAM) containing sequences that are exceedingly long (10,000 to 25,000 base pairs) with >99.9% accuracy.
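The HiFi consensus idea can be illustrated with a toy simulation in which several noisy passes over the same template are combined by a per-position majority vote. This sketch assumes substitution-only errors at 10% per pass; real raw PacBio errors are mostly insertions and deletions, and the actual consensus software aligns the passes rather than comparing them position by position.

```python
import random
from collections import Counter


def noisy_pass(template: str, error_rate: float, rng: random.Random) -> str:
    """One simulated pass: each base is independently replaced by a different
    random base with probability error_rate (substitutions only)."""
    out = []
    for base in template:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(passes: list[str]) -> str:
    """Per-position majority vote across equal-length passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def accuracy(read: str, template: str) -> float:
    """Fraction of positions where read matches template."""
    return sum(a == b for a, b in zip(read, template)) / len(template)


rng = random.Random(42)
template = "".join(rng.choice("ACGT") for _ in range(10_000))
passes = [noisy_pass(template, 0.10, rng) for _ in range(9)]
raw = accuracy(passes[0], template)
hifi = accuracy(consensus(passes), template)
print(f"single pass: {raw:.2%}  consensus of {len(passes)} passes: {hifi:.3%}")
```

Because random errors rarely coincide at the same position across passes, the vote recovers the true base almost everywhere, which is the intuition behind circular consensus reads.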


5.2 DNA Write

(i) What DNA would you want to synthesize and why?

To create a sensor capable of detecting carbon fixation rates, I would synthesize a genetic circuit. This circuit would utilize a specific promoter (such as a cyanobacterial carbon-responsive promoter) upstream of a reporter gene (like GFP). By synthesizing variations of this promoter, I can tune the sensitivity of the sensor. To achieve this, I would use Silicon-based High-Throughput DNA Synthesis (the technology used by Twist!). This platform miniaturizes the chemical process onto a silicon chip, allowing me to “print” thousands of different sensor designs simultaneously to find the most efficient one.

(ii) What technology would you use?

  1. Technology Choice: I would use Silicon-based Phosphoramidite Synthesis (Twist Bioscience Platform).

    • Why: Unlike traditional synthesis (which uses plastic 96-well plates), this technology uses a silicon chip with millions of microscopic wells. This allows for massive scale; I can synthesize thousands of variations of my carbon sensor circuit to test different sensitivity levels in parallel.
  2. Essential Steps of Synthesis:

    • Oligonucleotide Printing: The process begins on a silicon chip. Using inkjet-like technology, specific nucleotides (A, C, T, G) are added one layer at a time using phosphoramidite chemistry. This builds short, single-stranded pieces of DNA called “oligos” (usually <200 nucleotides).
    • Cleavage & Retrieval: Once the oligos are built, they are chemically cut (cleaved) off the silicon chip and pooled together in a liquid mixture.
    • Gene Assembly: Since oligos are short and genes are long, the oligos are designed to overlap. Using a reaction similar to PCR (called Polymerase Cycling Assembly), the overlapping oligos act as primers for each other, stitching together to form the full-length double-stranded DNA fragment (e.g., my 1,000bp sensor).
    • Cloning & QC: The assembled DNA is inserted into an expression vector (plasmid) and usually sequenced (using NGS) to ensure there are no errors before being shipped.
  3. Limitations (Speed, Accuracy, Scalability):

    • Length (Scalability): We cannot synthesize a whole genome in one continuous strand. The chemical yield drops as the chain gets longer. We are generally limited to synthesizing fragments of 1.8kb to 3kb. Larger constructs (like whole genomes) must be stitched together from these smaller chunks.
    • Accuracy: Chemical synthesis has an error rate (roughly 1 error every few hundred bases). While highly accurate, deletions or insertions can occur. This requires “error correction” steps or sequencing to find the perfect clones.
    • Speed: While the “printing” is fast, the downstream assembly and shipping usually take weeks (10–20 business days), which is slower than simply ordering a primer.
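The overlap design behind Polymerase Cycling Assembly can be sketched as follows: the target is split into oligos that alternate between the top and bottom strands, and each oligo shares a fixed-length overlap with its neighbors. The 60-nt oligo length and 20-nt overlap are arbitrary assumptions (real designs tune overlaps by melting temperature), and the assembly function only checks the overlap logic in silico; it does not model the PCR chemistry.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def design_oligos(target: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split target into overlapping oligos that alternate top/bottom strand."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    target = target.upper()
    step = oligo_len - overlap
    oligos, start = [], 0
    while True:
        piece = target[start:start + oligo_len]
        # Even-indexed oligos are written on the top strand, odd ones on the bottom.
        oligos.append(piece if len(oligos) % 2 == 0 else reverse_complement(piece))
        if start + oligo_len >= len(target):
            break
        start += step
    return oligos


def simulate_assembly(oligos: list[str], overlap: int = 20) -> str:
    """Rebuild the top strand by joining neighbours through their shared overlap."""
    top = [o if i % 2 == 0 else reverse_complement(o) for i, o in enumerate(oligos)]
    assembled = top[0]
    for piece in top[1:]:
        if assembled[-overlap:] != piece[:overlap]:
            raise ValueError("neighbouring oligos do not overlap as designed")
        assembled += piece[overlap:]
    return assembled


target = "ATGC" * 50 + "GATTACA" * 10  # arbitrary 270-nt test sequence
oligos = design_oligos(target)
print(len(oligos), simulate_assembly(oligos) == target)  # 7 True
```

Scaling the same idea to a 1,000 bp sensor construct simply means more oligos; each extra oligo is another chance for a synthesis error, which is why sequencing-based QC follows assembly.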

5.3 DNA Edit

(i) What DNA would you want to edit and why?

I would propose editing the human genome to modulate the expression of telomerase reverse transcriptase (TERT). Telomere shortening acts as a “molecular clock” that limits the number of times a cell can divide, eventually leading to cellular aging (senescence) and tissue failure. By utilizing gene editing to enhance telomerase activity in specific adult stem cell populations, we could theoretically delay the onset of age-related degenerative diseases, thereby extending human “healthspan” and longevity.

(ii) What technology would you use to perform these DNA edits and why?

  1. I would use the CRISPR-Cas9 System (specifically utilizing Homology Directed Repair (HDR)). CRISPR is currently the most versatile, programmable, and accessible tool for mammalian genome editing. Unlike older zinc finger nucleases, CRISPR relies on a simple RNA guide, making it easy to retarget to the specific telomerase promoter regions I want to modify.

  2. Essential Steps of Editing:

    • Guide RNA (gRNA) Design: I would synthesize a specific string of RNA (approx. 20 bases) that matches the DNA sequence near the TERT gene.
    • Cas9 Complex Formation: The gRNA binds to the Cas9 protein (the “molecular scissors”), creating a ribonucleoprotein complex.
    • Targeting & Scanning: The complex enters the cell nucleus and scans the genome. It looks for a specific molecular anchor called the PAM sequence. Once found, it unzips the DNA to check if the gRNA matches the target.
    • Cleavage (The Cut): If the sequence matches, Cas9 cuts both strands of the DNA, creating a Double Strand Break (DSB).
    • Repair (The Edit): This is the critical step. To change the gene (rather than just breaking it), I would provide a Donor DNA Template. The cell uses this template to repair the cut via Homology Directed Repair (HDR), effectively pasting my desired sequence (e.g., a stronger promoter) into the genome.

  3. Limitations:

    • Off-Target Effects (Accuracy): The biggest risk is that the gRNA might match a similar sequence elsewhere in the genome, causing Cas9 to cut a gene I didn’t intend to touch, with consequences ranging from none at all to potentially causing cancer.
    • HDR Efficiency: Human cells prefer to fix cuts by simply gluing the ends back together (non-homologous end joining), which is messy and error-prone. Getting the cell to actually use my “Donor Template” is difficult and often happens in only a small percentage of cells, so the editing efficiency is low.
    • Delivery: Getting the bulky Cas9 protein and the RNA into the nucleus of cells in a living human (in vivo) is physically difficult. Viral vectors are often used, but they have cargo size limits and carry immune response risks.
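The targeting rule from step 2 (a roughly 20-nt protospacer directly followed by a PAM) can be sketched for SpCas9, whose PAM is NGG. The first function lists candidate sites on both strands; the second counts near matches with up to a few mismatches, a very crude proxy for the off-target risk described above. Both are illustrations only; real gRNA design tools score off-targets genome-wide with more detailed models.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def find_spcas9_targets(dna: str, protospacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """List (strand, position, protospacer, PAM) for every protospacer followed by NGG.
    Positions on the '-' strand are indices into the reverse complement."""
    # The lookahead makes the match zero-width, so overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


def count_near_matches(protospacer: str, dna: str, max_mismatches: int = 3) -> int:
    """Count windows on either strand within max_mismatches of the protospacer."""
    protospacer = protospacer.upper()
    k = len(protospacer)
    count = 0
    for seq in (dna.upper(), reverse_complement(dna)):
        for i in range(len(seq) - k + 1):
            mismatches = sum(a != b for a, b in zip(protospacer, seq[i:i + k]))
            if mismatches <= max_mismatches:
                count += 1
    return count


toy = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAAA"  # made-up sequence
print(find_spcas9_targets(toy))
```

In a real design, a candidate whose `count_near_matches` against the whole genome exceeds one (the intended site itself) would be a warning sign for off-target cutting.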


Gemini AI was consulted for formatting

Week 3: Automation Research & Project Proposal

1. Published Automation Research

Question: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

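The Paper

Title: An automated cell-free platform for the rapid characterization of genetic regulators
Journal: Scientific Reports (2024)
Source: https://www.nature.com/articles/s41598-024-52642-x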

Description

This research utilizes the Opentrons OT-2 to automate the setup of cell-free protein synthesis (CFPS) reactions. The authors focused on characterizing genetic parts like promoters and riboswitches. By using automation, they were able to test over 1,000 different conditions in a fraction of the time it would take a human, with much higher reproducibility across plates.

This is highly relevant to my interest in carbon fixation. To improve the efficiency of an enzyme like Rubisco, you have to test many different genetic combinations. This paper demonstrates that the OT-2 can handle the small volumes and viscosity of cell-free lysates accurately enough to build those testing libraries.

2. Final Project Proposal: The Carbon Optimizer

Question: Write a description about what you intend to do with automation tools for your final project. Include core details of what you would automate.

Project Vision

My project involves engineering Rubisco variants to improve carbon fixation rates. Rubisco is a major bottleneck in plant growth because it is slow and often reacts with oxygen instead of CO2. I want to build an automated screening platform to find mutations that improve its carboxylation speed.

Automation Strategy

I plan to use the Opentrons OT-2 as the core of my workflow, potentially integrating with Ginkgo Nebula for large-scale synthesis.

  1. Library Preparation: The robot will be responsible for mixing DNA templates of various Rubisco mutants with cell-free protein synthesis master mix in 384-well plates.
  2. Biosensor Integration: I will include a fluorescent pH-responsive dye in the mixture. Rubisco’s carboxylation reaction releases protons, so CO2 fixation lowers the pH of the well. The automated system will dispense this dye at precise intervals to ensure the readings are consistent across all variants.
  3. Environmental Control: I will use a custom 3D printed plate holder to secure the wells during CO2 gas injection. Automation ensures that every well is treated with the same concentration of CO2 for the same amount of time.
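To make the library-preparation step concrete, here is a minimal, hypothetical plate-layout helper in plain Python: it assigns each Rubisco variant (plus a no-DNA control) to replicate wells on a 384-well plate and produces a list of transfers that an OT-2 protocol could loop over. The variant names, replicate count, and volumes are placeholders, not values from this project, and the code does not call the Opentrons API itself.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:16]  # rows A-P on a 384-well plate
COLS = range(1, 25)          # columns 1-24
NO_DNA = "no_DNA"            # hypothetical negative-control label


def well_names() -> list[str]:
    """All 384 well names in column-major order (A1, B1, ..., P1, A2, ...)."""
    return [f"{r}{c}" for c in COLS for r in ROWS]


def plan_plate(variants: list[str], replicates: int = 3) -> dict[str, str]:
    """Assign each variant, plus a no-DNA control, to `replicates` consecutive wells."""
    samples = list(variants) + [NO_DNA]
    wells = well_names()
    needed = len(samples) * replicates
    if needed > len(wells):
        raise ValueError(f"{needed} wells needed, but the plate has {len(wells)}")
    layout, free_wells = {}, iter(wells)
    for sample in samples:
        for _ in range(replicates):
            layout[next(free_wells)] = sample
    return layout


def transfer_list(layout: dict[str, str], dna_ul: float = 2.0,
                  master_mix_ul: float = 8.0) -> list[tuple[str, str, float]]:
    """(source, destination well, volume in uL) steps; volumes are placeholders."""
    steps = []
    for well, sample in layout.items():
        steps.append(("master_mix", well, master_mix_ul))
        if sample != NO_DNA:
            steps.append((sample, well, dna_ul))
    return steps


layout = plan_plate(["WT", "mut1"], replicates=2)
for step in transfer_list(layout):
    print(step)
```

Keeping the layout logic separate from the robot protocol makes it easy to check the plan (well count, controls, replicates) before any liquid is moved.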