View Image Credits Image courtesy of Vincent Muir
Q1. Describe a biological engineering application or tool you want to develop and why: Concept: Bio-Circuit for CO₂ Sensing and Reduction
3.1. Choose your protein Question: In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
The protein I have chosen is Rubisco, the protein responsible for fixing carbon dioxide during photosynthesis. I have chosen this protein because one of my final project ideas was to create a system that is more efficient at fixing carbon.
Published Automation Research Question: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
The Paper Title: An automated cell-free platform for the rapid characterization of genetic regulators Journal: Scientific Reports (2024) Source: https://www.nature.com/articles/s41598-024-52642-x Description This research utilizes the Opentrons OT-2 to automate the setup of cell-free protein synthesis (CFPS) reactions. The authors focused on characterizing genetic parts like promoters and riboswitches. By using automation, they were able to test over 1,000 different conditions in a fraction of the time it would take a human, with much higher reproducibility across plates.
HW – Part A: Amino Acids and Protein Folding 4.1. Quantitative Consumption Question: How many molecules of amino acids do you take with a piece of 500 grams of meat? (On average an amino acid is ~100 Daltons).
1 gram is equal to Avogadro’s number of daltons (6.022E23 Da). To find the total molecules in 500g of meat:
Part 1: Generate Binders with PepMLM The target for this study is the human SOD1 protein (UniProt P00441), specifically focusing on the A4V mutation, which is a common cause of Amyotrophic Lateral Sclerosis (ALS).
Mutant SOD1 Sequence (A4V):
ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Using PepMLM-650M, four peptides of 12 amino acids were generated and compared against the known SOD1-binding peptide FLYRWLPSRRGG.
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Analog Processing: IANNs process continuous, multi-level inputs to produce graded, proportional responses, rather than being restricted to rigid binary (ON/OFF) states. Complex Integration: A single IANN layer can compute complex, non-linear functions by tuning biological “weights” (like promoter strength), whereas Boolean logic requires fragile, metabolically expensive cascades of multiple gates to achieve the same complexity. Robustness: Because they use graded signals and distributed pathways, IANNs are more resistant to biological noise and mutation, showing gradual performance decline (graceful degradation) instead of catastrophic failure. Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal. A potential application could be an environmental risk biosensor that measures the combined threat of multiple water pollutants (e.g., arsenic and pesticides) and outputs a color-coded, continuous risk scale.
Part 1: General Homework Questions 1. Advantages of Cell-Free Synthesis Cell-free protein synthesis (CFPS) offers an “open” architecture, allowing for direct control over the chemical environment. Unlike in vivo methods, CFPS is not limited by cell toxicity or metabolic competition.
Toxic Protein Production: CFPS can produce proteins that would otherwise kill a living host cell. High-Throughput Screening: It allows for the rapid testing of genetic libraries without the need for time-consuming cloning and cell culture. 2. Main Components Cell Extract: The catalytic machinery (ribosomes, polymerases). DNA Template: The genetic instructions. NTPs & Amino Acids: The energy and building blocks. Energy Regeneration: Essential for recycling ATP to sustain translation. 3. Energy Provision Energy regeneration is critical because ATP is consumed rapidly and phosphate byproducts inhibit the reaction. The Creatine Phosphate/Creatine Kinase system is often used to maintain an ATP supply by transferring phosphate groups to ADP.
Homework: Final Project Integration Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. I will be measuring a couple of aspects to assay my experiments. For example, i will be using a spectrophotometer to quantify NADH exchange to validate that the malate products are being formed. I will be looking at turbidity and optical density to evaluate calcium chloride precipitate quantity and formation rate to analyze the efficacy of carbonic anhydrase. Additionally, I ordered some inorganic carbon sats made with heavy carbon, which I plan to compare to a standard to showcase additional evidence of reaction success and product formation.
Q1. Describe a biological engineering application or tool you want to develop and why:
Concept: Bio-Circuit for CO₂ Sensing and Reduction
An idea I have is creating a biological circuit designed to sense CO₂ emissions (or a chemical indicative of excess CO₂ production/presence). The higher goal would be to design a self-sustaining biological system capable of reducing CO₂ emissions with applications for enhancing green climate technology. With modern biosynthesis tools, I envision being able to modify a signal cascade pathway to trigger a fluorescent response for detection, then modify a protein like Rubisco to engineer greater carbon fixation as a potential method for emission reduction.
Q2. Describe a governance or policy goal related to ensuring that this application or tool contributes to an “ethical” future:
Goal: Biosafety and Biocontainment
The biggest concern related to applications of this idea would be biosafety. When working with biological systems, it is important to ensure that biological agents are contained, especially considering this is a living system. The release of synthetic systems into the environment could prove detrimental given the potential for uncontrolled growth and competition with natural systems. It is crucial that my system can “self-prune” or regulate itself to control excess growth of this nature.
Q3. Describe at least three different potential governance “actions.”
Mandatory DNA Synthesis Screening (Option 1):
I propose modifying the actions of commercial DNA synthesis companies and academic researchers. To my knowledge, DNA synthesis screening is typically voluntary and/or specified to a small number of target pathogens. I propose enforcing mandatory screening of all synthetic DNA orders against a broad, consistently updated database of functional genomic markers, regardless of whether they belong to a known pathogen. This would hopefully lead to the characterization of more sequences responsible for hazardous biological activity. Implementation would likely require a standard technical protocol that all synthesis providers must use. Success would require major industry players to opt-in to prevent “offshoring” orders to countries to mitigate the likely associated costs. This assumes current screening algorithms are able to accurately distinguish between benign research and flagged motif sequences without a high rate of false positives.
Incentivizing Responsible Research via Insurance (Option 2):
I propose a model to incentivize responsible research by changing how actors (private insurance companies, research institutions, etc.) behave systematically. Biorisk management is often seen as a bureaucratic cost. Under this model, institutions would receive lower insurance premiums upon demonstration of high-quality biorisk oversight. This could include frequent independent audits, mandatory training, and transparency with quality assessments. This is only executable assuming that private insurers have the technical expertise to judge scientific risk, and that the financial savings from lower premiums are a large enough incentive to reshape institutional behavior.
Mandatory Transparency in Research Publications (Option 3):
Research papers typically do not acknowledge the potential for misuse of new findings. This proposal would create a rule mandating that no federal funding be given, nor publications in major journals be accepted, without a clearly detailed section outlining potential risks and mitigation strategies. Regulators must create a standardized template to ensure quality and compliance. Editing staff would now also include biosecurity reviewers to ensure quality and evaluate these statements before publication. This assumes that scientists are able to envision potential misuse cases of their own work, a task growing in difficulty given assisted ideation with AI.
Q4. Governance Scoring Matrix
Does the option:
Option 1 (Screening)
Option 2 (Insurance)
Option 3 (Transparency)
Enhance Biosecurity
• By preventing incidents
High. Acts as a physical gatekeeper preventing the creation of hazardous sequences.
Medium. Encourages safety culture but doesn’t physically stop bad actors.
Low. Relies on post-hoc review; good for awareness but doesn’t prevent creation.
• By helping respond
High. Creates a digital paper trail of who ordered what sequence.
Medium. Audit trails helps liability but not immediate biological response.
Medium. Ensures mitigation strategies are pre-thought out and published.
Foster Lab Safety
• By preventing incident
Low. Focuses on the “what” (DNA), not the “how” (handling).
High. Directly mandates training and oversight of daily lab practices.
Low. Administrative in nature.
• By helping respond
Low. Not relevant to immediate lab accidents.
High. Insurance protocols would mandate accident reporting/response plans.
Low.
Protect the environment
• By preventing incidents
High. Prevents synthesis of invasive/modified traits before release.
Medium. Better oversight leads to better containment protocols.
Low.
• By helping respond
Medium. Database allows rapid identification of escaped synthetic organisms.
Medium. Funding available for cleanup/remediation via insurance.
Medium. Publication strategies may include kill-switch documentation.
Other considerations
• Minimizing costs/burdens
Low. High technical and administrative burden on providers.
Low. High upfront cost for institutions to reorganize compliance.
Medium. Adds writing/review time, but low financial cost.
• Feasibility?
Medium. Technology exists, but requires international buy-in.
Low. Market forces may not support this without regulation.
High. Journals/Grants can easily add this requirement.
• Not impede research
Low. False positives could delay legitimate experiments.
Medium. Could create cost barriers for small labs/startups.
Medium. Scientists may self-censor or fear “hazard” labeling.
• Promote constructive apps
High. Builds trust that the foundation of bio-economy is safe.
High. Professionalizes the industry.
Medium. Increases public trust through transparency.
Q5. Drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why.
In order to combat an evolving landscape trending towards increased biological threats, I recommend that national regulatory bodies (e.g., NIH) prioritize an integrated governance strategy that mandates DNA Synthesis Screening (Option 1). This will function as a gatekeeper, providing a platform to embed Responsible Research Oversight into the budget cycle. This multi-tiered approach assumes that screening stays on pace with AI-driven pathogen design while additionally assuming international industry cooperation to prevent a financial “race to the bottom” in safety standards. The primary trade-off is adding to the administrative burden on researchers, potentially delaying legitimate innovation; however, the combined approach minimizes the risk of accidental (or deliberate) release by creating a physical barrier through sequence production and a procedural/administrative barrier through rigorous researcher training and compliance efforts.
Weekly Reflection
Reflecting on week one of How to Grow Almost Anything 2026, the core ethical challenge that stood out to me centered on the “dual-use dilemma.” This makes logical sense; as we develop more advanced biosynthetic tools, the availability of such tools in non-centralized nodes increases, limiting the ability to control how the technology is used. Additionally, it is slightly concerning to think that AI-driven design with synthetic biology has the ability to produce novel toxins that would be unrecognizable. I would propose a strategy that assumes that “safety-by-design” can mitigate risks without stifling the creative freedom central to the HTGAA mission. This would ensure that labs with access to technology are required to follow compliance protocols that prevent widespread abuse of new technologies for nefarious intentions. I believe it is reasonable to suggest increasing the administrative load for the purpose of maintaining access to groundbreaking technology safely.
Homework Questions
Professor Jacobson: Polymerase & Coding
1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
Error Rate: Polymerases have a natural error rate of 10^-5 (1 in 100,000 wrong bases).
Comparison: When compared to the length of the human genome, which is approximately 3 x 10^9 (3 billion) base pairs long, this would result in 30,000+ incorrectly copied base pairs per division, which would be detrimental.
Correction Mechanism: Biology deals with this discrepancy through proofreading in the form of 3’ to 5’ exonuclease activity, essentially stalling enzyme activity whenever an incorrect match is made and editing it. Additionally, there are protein complexes responsible for mismatch repair that proofread replicated DNA once completed, bringing the error rate down to approximately 10^-9.
2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Coding Possibilities: The genetic code contains innate degeneracy, which helps to conserve functional loss whenever a mismatch occurs. Amino acids contain as many as 6 degenerate codons. Considering an average of 3 codons per amino acid and 400 amino acids per human protein on average, there are roughly 3^400 distinct DNA sequences that can code for a single protein.
Practical Limitations: In practice, not all of these will code for the protein of interest efficiently. This is partially due to codon usage bias, where the machinery prefers specific coding motifs based on tRNA availability. Additionally, not all DNA is used to code for a protein; splicing of mRNA for transcription changes what code is conserved and, as a result, what protein is actually synthesized.
Dr. LeProust: DNA Synthesis
3. What’s the most commonly used method for oligo synthesis currently?
Currently, solid-phase phosphoramidite chemistry is the most commonly used method for oligo synthesis. This utilizes a technique where a computer controls the chemical workflow to build DNA chains in the opposite direction (3’ to 5’).
4. Why is it difficult to make oligos longer than 200nt via direct synthesis?
Because chemical reactions are imperfect, even given a 99.5% efficiency rate, an oligo of 200nt would only be 0.995^200 accurate (approx. 36% yield). This means more than half of the synthesized DNA would be inaccurate or failed sequences.
5. Why can’t you make a 2000bp gene via direct oligo synthesis?
Utilizing the same math from earlier, 0.995^2000 results in 0.004% accuracy. Effectively, only 1 in 25,000 “genes” synthesized would be accurate, making it chemically impossible to isolate the correct sequence from the mixture of failures.
George Church: The Lysine Contingency
6. What are the 10 essential amino acids in all animals, and how does this affect your view of the “Lysine Contingency”?
Essential Amino Acids: The 10 essential amino acids cannot be synthesized by mammalian cellular machinery and must be consumed through food. They are: Arginine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, and Valine.
The Lysine Contingency: The “Lysine Contingency” from Jurassic Park is the theoretical fail-safe engineered to prevent dinosaurs from surviving in the wild. However, since lysine is already an essential amino acid, the dinosaurs would naturally need to eat lysine in their diet to survive regardless of genetic engineering. Therefore, removing their ability to produce lysine has the same effect as a standard starvation diet, serving as no additional protective measure.
Gemini AI was consulted for formatting
Week 2 Homework: DNA Read, Write & Edit
3.1. Choose your protein
Question: In recitation, we discussed that you will pick a protein for your homework that you find interesting. Which protein have you chosen and why? Using one of the tools described in recitation (NCBI, UniProt, google), obtain the protein sequence for the protein you chose.
The protein I have chosen is Rubisco, the protein responsible for fixing carbon dioxide during photosynthesis. I have chosen this protein because one of my final project ideas was to create a system that is more efficient at fixing carbon.
Question: The Central Dogma discussed in class and recitation describes the process in which DNA sequence becomes transcribed and translated into protein. The Central Dogma gives us the framework to work backwards from a given protein sequence and infer the DNA sequence that the protein is derived from. Using one of the tools discussed in class, NCBI or online tools, determine the nucleotide sequence that corresponds to the protein sequence you chose above.
Question: Once a nucleotide sequence of your protein is determined, you need to codon optimize your sequence. In your own words, describe why you need to optimize codon usage. Which organism have you chosen to optimize the codon sequence for and why?
Different organisms recognize DNA strands in slightly different ways, so codon optimization is necessary so that motifs that are native to an organism’s machinery are recognized and the gene function is conserved.
3.4. You have a sequence! Now what?
Question: What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.
I could use a standard cloning process to produce this protein from DNA. This involves cutting a circular piece of DNA with restriction enzymes, inserting my gene of interest into the plasmid, and using a ligation reaction to create the genetic insertion I wish to express. Then I would insert the plasmid into an organism (E. coli) to produce my protein of interest.
[Image of molecular cloning workflow]
5.1 DNA Read
(i) What DNA would you want to sequence and why?
I would want to sequence and read the DNA from highly efficient plant species. Again I’m potentially pursuing a final project in which I create a biosensor for carbon dioxide production or an efficient fixation system and understanding the machinery responsible for these reactions is crucial.
(ii) What technology or technologies would you use to perform sequencing on your DNA and why?
To identify the specific genes responsible for efficient carbon fixation, I would utilize PacBio SMRT (Single Molecule Real-Time) Sequencing. Plant genomes are notoriously difficult to read because they are often polyploid and filled with repetitive DNA sections. Older technologies (like Illumina) chop DNA into tiny pieces, making it impossible to puzzle a complex plant genome back together. PacBio uses long reads to span these repetitive gaps, allowing me to fully reconstruct the gene clusters responsible for the plant’s efficiency.
Technology Choice: I would use PacBio SMRT Sequencing
Why: This technology produces “long reads” (reading long stretches of DNA at once) with high accuracy. This is crucial for de novo assembly of plant genomes, which contain highly repetitive sequences that short-read technologies cannot resolve.
Generation: This is a Third-Generation sequencing technology.
How so? Unlike second-generation methods (which require chopping DNA, amplifying it into clusters, and pausing to take photos), third-generation sequencing reads single molecules of DNA in real-time without the need for PCR amplification during the reading process.
Input & Preparation:
Input: High Molecular Weight (HMW) genomic DNA (gDNA).
Preparation:
Extraction: Isolate DNA carefully to keep strands long (15,000+ base pairs).
Fragmentation: Shear the DNA into uniform large sizes (e.g., 15kb).
SMRTbell Construction: This is the essential step. Hairpin-shaped adapters are ligated (glued) to both ends of the double-stranded DNA. This turns the linear DNA into a circle.
Primer Binding: A sequencing primer and DNA polymerase are bound to the adapters.
Essential Steps & Base Calling:
Loading: The SMRTbell templates are loaded onto a chip containing millions of microscopic wells called ZMWs (Zero-Mode Waveguides).
Sequencing: A single DNA polymerase enzyme is anchored at the bottom of the ZMW. As the DNA circle moves through the enzyme, the polymerase incorporates fluorescently labeled nucleotides (A, T, C, G).
Base Calling: Each nucleotide emits a distinct color of light (pulse) as it is added. A camera records this movie of light flashes in real-time to determine the sequence.
HiFi Mode: Because the DNA is a circle, the polymerase reads the same sequence over and over again. The software creates a “consensus” from these passes, eliminating random errors.
Output:
The output is HiFi Reads (High Fidelity Long Reads). These are digital files (typically FASTQ or BAM) containing sequences that are exceedingly long (10,000 to 25,000 base pairs) with >99.9% accuracy.
5.2 DNA Write
(i) What DNA would you want to synthesize and why?
To create a sensor capable of detecting carbon fixation rates, I would synthesize a genetic circuit. This circuit would utilize a specific promoter (such as a cyanobacterial carbon-responsive promoter) upstream of a reporter gene (like GFP). By synthesizing variations of this promoter, I can tune the sensitivity of the sensor. To achieve this, I would use Silicon-based High-Throughput DNA Synthesis (the technology used by Twist!). This platform miniaturizes the chemical process onto a silicon chip, allowing me to “print” thousands of different sensor designs simultaneously to find the most efficient one.
(ii) What technology would you use?
Technology Choice: I would use Silicon-based Phosphoramidite Synthesis (Twist Bioscience Platform).
Why: Unlike traditional synthesis (which uses plastic 96-well plates), this technology uses a silicon chip with millions of microscopic wells. This allows for massive scale; I can synthesize thousands of variations of my carbon sensor circuit to test different sensitivity levels in parallel.
Essential Steps of Synthesis:
Oligonucleotide Printing: The process begins on a silicon chip. Using inkjet-like technology, specific nucleotides (A, C, T, G) are added layer-by-layer using phosphoramidite chemistry. This builds short, single-stranded pieces of DNA called “oligos” (usually <200 base pairs).
Cleavage & Retrieval:** Once the oligos are built, they are chemically cut (cleaved) off the silicon chip and pooled together in a liquid mixture.
Gene Assembly: Since oligos are short and genes are long, the oligos are designed to overlap. Using a reaction similar to PCR (called Polymerase Cycling Assembly), the overlapping oligos act as primers for each other, stitching together to form the full-length double-stranded DNA fragment (e.g., my 1,000bp sensor).
Cloning & QC: The assembled DNA is inserted into an expression vector (plasmid) and usually sequenced (using NGS) to ensure there are no errors before being shipped.
Limitations (Speed, Accuracy, Scalability):
Length (Scalability): We cannot synthesize a whole genome in one continuous strand. The chemical yield drops as the chain gets longer. We are generally limited to synthesizing fragments of 1.8kb to 3kb. Larger constructs (like whole genomes) must be stitched together from these smaller chunks.
Accuracy: Chemical synthesis has an error rate (roughly 1 error every few hundred bases). While highly accurate, deletions or insertions can occur. This requires “error correction” steps or sequencing to find the perfect clones.
Speed: While the “printing” is fast, the downstream assembly and shipping usually take weeks (10–20 business days), which is slower than simply ordering a primer.
5.3 DNA Edit
(i) What DNA would you want to edit and why?
I would propose editing the human genome to modulate the expression of telemorase reverse transcriptase (TERT). Telomere shortening acts as a “molecular clock” that limits the number of times a cell can divide, eventually leading to cellular aging (senescence) and tissue failure. By utilizing gene editing to enhance telomerase activity in specific adult stem cell populations, we could theoretically delay the onset of age-related degenerative diseases, thereby extending human “healthspan” and longevity.
(ii) What technology would you use to perform these DNA edits and why?
I would use the CRISPR-Cas9 System (specifically utilizing Homology Directed Repair (HDR)). CRISPR is currently the most versatile, programmable, and accessible tool for mammalian genome editing. Unlike older zinc finger nucleases, CRISPR relies on a simple RNA guide, making it easy to retarget to the specific telomerase promoter regions I want to modify.
Essential Steps of Editing:
Guide RNA (gRNA) Design:I would synthesize a specific string of RNA (approx. 20 bases) that matches the DNA sequence near the TERT gene.
Cas9 Complex Formation: The gRNA binds to the Cas9 protein (the “molecular scissors”), creating a ribonucleoprotein complex.
Targeting & Scanning: The complex enters the cell nucleus and scans the genome. It looks for a specific molecular anchor called the PAM sequence. Once found, it unzips the DNA to check if the gRNA matches the target.
Cleavage (The Cut): If the sequence matches, Cas9 cuts both strands of the DNA, creating a Double Strand Break (DSB).
Repair (The Edit): This is the critical step. To change the gene (rather than just breaking it), I would provide a Donor DNA Template. The cell uses this template to repair the cut via Homology Directed Repair (HDR), effectively pasting my desired sequence (e.g., a stronger promoter) into the genome.
Limitations:
Off-Target Effects (Accuracy): The biggest risk is that the gRNA might accidentally match a similar sequence elsewhere in the genome, causing Cas9 to cut a gene I didn’t intend to touch which could have no consequence to potentially causing cancer.
HDR Efficiency: Human cells prefer to fix cuts by simply gluing them back together, which is messy and error-prone. Getting the cell to actually use my “Donor Template” is difficult and often happens in only a small percentage of cells, so theres a low efficiency rate.
Delivery: Getting the bulky Cas9 protein and the RNA into the nucleus of a living human (in vivo) is physically difficult. Other viral vectors are often used but have size limits and immune response risks.
Gemini AI was consulted for formatting
Week 3 Homework: Lab Automation
Published Automation Research
Question: Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
The Paper
Title: An automated cell-free platform for the rapid characterization of genetic regulators
This research utilizes the Opentrons OT-2 to automate the setup of cell-free protein synthesis (CFPS) reactions. The authors focused on characterizing genetic parts like promoters and riboswitches. By using automation, they were able to test over 1,000 different conditions in a fraction of the time it would take a human, with much higher reproducibility across plates.
This is highly relevant to my interest in carbon fixation. To improve the efficiency of an enzyme like Rubisco, you have to test many different genetic combinations. This paper proves that the OT-2 can handle the tiny volumes and viscosity of cell-free lysates to build those testing libraries accurately.
Final Project Proposal: The Carbon Optimizer
Question: Write a description about what you intend to do with automation tools for your final project. Include core details of what you would automate.
Project Vision
My project involves engineering Rubisco variants to improve carbon fixation rates. Rubisco is a major bottleneck in plant growth because it is slow and often reacts with oxygen instead of CO2. I want to build an automated screening platform to find mutations that improve its carboxylation speed.
Automation Strategy
I plan to use the Opentrons OT-2 as the core of my workflow, potentially integrating with Ginkgo Nebula for large-scale synthesis.
Library Preparation: The robot will be responsible for mixing DNA templates of various Rubisco mutants with cell-free protein synthesis master mix in 384-well plates.
Biosensor Integration: I will include a fluorescent pH-responsive dye in the mixture. When Rubisco fixes CO2, it changes the pH of the well. The automated system will dispense this dye at precise intervals to ensure the readings are consistent across all variants.
Environmental Control: I will use a custom 3D printed plate holder to secure the wells during CO2 gas injection. Automation ensures that every well is treated with the same concentration of CO2 for the same amount of time.
Week 4 Homework: Protein Design Part I
HW – Part A: Amino Acids and Protein Folding
4.1. Quantitative Consumption
Question: How many molecules of amino acids do you take with a piece of 500 grams of meat? (On average an amino acid is ~100 Daltons).
1 gram is equal to Avogadro’s number of daltons (6.022E23 Da). To find the total molecules in 500g of meat:
Total Daltons in 500g: 500g * 6.022E23 Da/g = 3.011E26 Da
Number of Amino Acids: 3.011E26 Da / 100 Da per molecule = 3.011E24 molecules
On average, you consume 3.011E24 amino acids per 500 grams of meat.
4.2. Biological Identity and Digestion
Question: Why do humans eat beef but do not become a cow, or eat fish but do not become fish?
When we eat proteins from another organism, our body breaks them down into their individual building blocks called amino acids. These amino acids are then used by our native machinery to manufacture human proteins. While the raw components come from the fish or cow, they are used to build specific human proteins dictated by our own genetic code.
4.3. The Standard Genetic Code
Question: Why are there only 20 natural amino acids?
While there is not one comprehensive answer to this question, it is likely a result of evolutionary efficiency and competency. The 20 naturally occurring amino acids proved to be versatile, stable, and competent enough to support the plethora of life that they compose. They have handled the environmental changes of billions of years, so there were no significant evolutionary pressures for additional amino acids to arise in the standard code.
4.4. Non-Natural Amino Acid Design
Question: Can you make other non-natural amino acids? Design some new amino acids.
Yes, it is possible to make non-natural amino acids by chemically altering the side chain or the backbone. For example, if I wanted an amino acid that I could tag and trace, I could modify the side chain with a molecule containing a recognizable chemical signature that also is a minimal hindrance to normal function.
As an example, the difference between phenylalanine and tyrosine is tyrosine’s alcohol (hydroxyl group) on the aromatic ring. A synthetic amino acid could be the tyrosine base but substituting the alcohol for a ketone, or adding a halogen that I could radiolabel and track. Another option is changing the hydroxyl group into methanol or ethanol to create a chemically viable alternative.
4.5. Prebiotic Origins
Question: Where did amino acids come from before enzymes that make them, and before life started?
Before life started, Earth was a high-energy environment, constantly bombarded with meteorites and materials containing initial organic conditions. The planet was full of ammonia, water, methane, and hydrogen in reactive states. These were subject to UV energy from the sun and lightning, creating the conditions possible for stable molecules to form given the energy needed to achieve thermodynamic and kinetic stability to form more complex building blocks.
4.6. Chirality and Handedness
Question: If you make an alpha-helix using D-amino acids, what handedness (right or left) would you expect?
I would expect it to be left-handed, which is atypical from the norm. D-amino acids are the chiral enantiomers (mirror images) of L-amino acids, which are the ones naturally occurring in nature that form typical right-handed alpha-helix twists. Therefore, the mirror image of a right-handed twist would be a left-handed twist.
4.7. Right-Handed Helices
Question: Why are most molecular helices right-handed?
Molecular helices are typically right-handed because they are the most stable configuration for L-amino acids. Right-handed helices produce favorable hydrogen bond interactions and minimize steric hindrances, resulting in a molecule with greater thermodynamic stability.
4.8. Beta-Sheet Aggregation
Question: Why do beta-sheets tend to aggregate and what is the driving force?
Beta-sheets tend to aggregate due to favorable intermolecular interactions between the planes of the sheets. The driving force for aggregation is hydrogen bonding—the strongest intermolecular force—along with increasing dispersion forces as hydrophobic molecules layer together to limit their exposure to the aqueous environment.
4.9. Amyloid Diseases and Materials
Question: Why do many amyloid diseases form beta-sheets? Can you use amyloid beta-sheets as materials?
Parkinson’s, Alzheimer’s, and other amyloid diseases arise from protein misfolding that produce highly stable amyloid fibrils that layer into beta-sheets. These structures are thermodynamically favorable due to extensive hydrogen bonding and hydrophobic interactions between amino acid residues. This forms a durable structure that stacks into long, rigid fibrils which reduce protein function and disrupt cellular processes.
However, because these beta-sheets are so stable, they could be engineered into sustainable biomaterials. Potential uses include tissue engineering scaffolds, molecular meshes for water filtration, or potentially filaments for biological circuits.
Gemini AI was consulted for formatting
Week 5 Homework: Protein Design Part II
Part 1: Generate Binders with PepMLM
The target for this study is the human SOD1 protein (UniProt P00441), specifically focusing on the A4V mutation, which is a common cause of Amyotrophic Lateral Sclerosis (ALS).
Using PepMLM-650M, four peptides of 12 amino acids were generated and compared against the known SOD1-binding peptide FLYRWLPSRRGG.
PepMLM Confidence Scores
Sequence
Description
Perplexity
FLYRWLPSRRGG
Real Binder
—
WHSPVVAVAHWE
Sim 1
10.949699
WSVGWAAIAWWX
Sim 2
16.027645
WRSYATAIALWK
Sim 3
11.729657
WRYYATGAEWKE
Sim 4
13.769973
Part 2: Evaluate Binders with AlphaFold3
Each peptide was modeled against the mutant SOD1 sequence using AlphaFold3 to assess structural docking and interface confidence (ipTM).
Structural Observations
Localization: Surprisingly, none of the peptides localized specifically to the A4V mutation site at the N-terminus. All peptides were primarily surface-bound.
Binding Sites: Sim 3 (WRSYATAIALWK) and Sim 4 (WRYYATGAEWKE) both bound to a region that appears to be a potential polymerization site. Sim 2 (WSVGWAAIAWWX) engaged the β-barrel region. The wild-type binder also localized to a suspected polymerization site.
AlphaFold3 Visualizations
Part 3: Evaluate Properties in the PeptiVerse
Beyond structural docking, we evaluated the therapeutic potential (solubility, hemolysis, and affinity) of the generated sequences.
Peptide Property Comparison
Peptide Sequence
Solubility
Hemolysis
Affinity (pKd)
MW (Da)
Net Charge
pI
GRAVY
ipTM
pTM
FLYRWLPSRRGG
1.000
0.047
5.96 (Weak)
1507.7
+2.76
11.71
-0.71
0.33
0.78
WHSPVVAVAHWE
1.000
0.048
5.07 (Weak)
1417.6
-1.06
6.02
0.18
0.28
0.78
WSVGWAAIAWWX
1.000
0.174
7.71 (Med)
1314.6
-0.24
5.53
0.78
0.37
0.76
WRSYATAIALWK
1.000
0.056
6.83 (Weak)
1465.7
+1.76
9.99
0.06
0.41
0.72
WRYYATGAEWKE
1.000
0.063
5.71 (Weak)
1559.7
-0.23
6.28
-1.44
0.31
0.86
Analysis of Results
The observed ipTM values across the PepMLM-generated candidates range from 0.28 to 0.41, reflecting a low-to-moderate confidence in specific interface orientation. However, two PepMLM peptides outperformed the known binder (ipTM 0.33): WRSYATAIALWK (0.41) and WSVGWAAIAWWX (0.37).
There is a general correlation between structural confidence and affinity; the sequence with the highest affinity (WSVGWAAIAWWX, pKd: 7.71) also showed a strong ipTM. Interestingly, the highest ipTM belonged to WRSYATAIALWK, despite it having a lower predicted affinity, suggesting that AF3 finds this specific binding pose more physically convincing even if the thermodynamic prediction is weaker. Critically, all candidates maintain a 1.000 solubility probability and low hemolysis scores.
Selection Candidate: WRSYATAIALWK While WSVGWAAIAWWX has the highest raw affinity, WRSYATAIALWK offers a superior balance of properties. It provides the highest structural confidence (ipTM 0.41) while maintaining a much safer hemolysis profile (0.056) compared to the “riskiest” sequence in the group (0.174).
Part 4: Generate Optimized Peptides with moPPIt
Unlike the sampling method of PepMLM, moPPIt was used for controlled, multi-objective design to steer generation toward specific residues (dimer interface/A4V site) and optimize therapeutic properties.
moPPIt Generated Peptides
Sequence
Solubility
Hemolysis
Affinity
MW (Da)
Net Charge
pI
GRAVY
GGTTTDDTKAES
1.000
0.054
4.20
1182.1
-2.24
4.05
-1.42
ATTGYCGCTMQN
1.000
0.018
5.39
1249.4
-0.22
5.55
-0.21
DEGYKKQKGQIQ
1.000
0.041
4.75
1421.6
+0.76
8.43
-2.23
Comparison and Evaluation
The PepMLM peptides generally focus on high-confidence structural docking and balanced biophysical properties, leaning toward a safe profile while improving affinity over the baseline. In contrast, the moPPIt peptides prioritize diverse chemical spaces and targeted binding.
The moPPIt set introduces extreme charge variations, such as the highly acidic GGTTTDDTKAES (pI 4.05) and the polar-rich DEGYKKQKGQIQ, which differ significantly from the more hydrophobic PepMLM designs. To evaluate these before clinical studies, we would need to perform experimental circular dichroism (CD) to confirm peptide stability and surface plasmon resonance (SPR) to validate the targeted binding affinity at the specific A4V or dimer interface sites chosen during the moPPIt steering process.
Gemini AI was consulted for formatting and content organization
Week 7 Homework: Genetic Circuits Part II
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)
What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?
Analog Processing: IANNs process continuous, multi-level inputs to produce graded, proportional responses, rather than being restricted to rigid binary (ON/OFF) states.
Complex Integration: A single IANN layer can compute complex, non-linear functions by tuning biological “weights” (like promoter strength), whereas Boolean logic requires fragile, metabolically expensive cascades of multiple gates to achieve the same complexity.
Robustness: Because they use graded signals and distributed pathways, IANNs are more resistant to biological noise and mutation, showing gradual performance decline (graceful degradation) instead of catastrophic failure.
Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.
A potential application could be an environmental risk biosensor that measures the combined threat of multiple water pollutants (e.g., arsenic and pesticides) and outputs a color-coded, continuous risk scale.
Input/Output Behavior:
Inputs (X1, X2): Varying concentrations of two different toxins activate specific DNA promoters.
Hidden Layer: The circuit integrates these inputs to produce an intermediate endoribonuclease. Its concentration is a weighted, non-linear reflection of the combined toxin levels.
Final Output (Y): The endoribonuclease proportionally inhibits the production of a fluorescent protein. Low toxins yield high fluorescence (Safe), while high toxins completely inhibit fluorescence (Danger).
What limitations might an IANN face to achieve your goal?
Running multi-layer circuits with numerous unique DNA parts and proteins drains cellular energy, which can cause the host bacteria to grow slowly or mutate the circuit to survive. Additionally, intermediate regulators (like the endoribonuclease) must be perfectly specific. If they accidentally interfere with the host cell’s natural mRNA, the system will fail or the cell will die. Finally, precisely calibrating the “weights” of the network requires finding the exact combination of promoter strengths, ribosome binding sites, and degradation rates, which takes immense trial and error.
IANN Diagrams
Below is a diagram depicting an intracellular single-layer perceptron where the X1 input is DNA encoding for the Csy4 endoribonuclease and the X2 input is DNA encoding for a fluorescent protein output whose mRNA is regulated by Csy4. Tx: transcription; Tl: translation.
Below is a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.
Assignment Part 2: Fungal Materials
What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?
The mycelium network can be bound to things like agricultural waste to create materials and items. For example, mycelium leather can be used for vegan clothing, mycelium can be used as a packaging alternative to foam, and composite structures like bricks can be used as well. They are biodegradable and clean, making them very environmentally friendly considering they also help repurpose waste. Some disadvantages stem in structural capacity and integrity, since its highly water soluble, and the compressive strength is not practical for load bearing weight.
What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?
You could create custom biomaterials, like for example, a water resistant polymer to enhance the strength of mycelium materials. Fungi are also powerful producers and could be used for efficient synthesis of complex biologics, and can also be used to break down environmental crises like oil spills. From a synthetic biology standpoint, fungi utilize eukaryotic machinery allowing for more complex protein folding and molecular synthesis.
Gemini AI was consulted for formatting and content organization
Week 9 Homework: Cell-Free Systems
Part 1: General Homework Questions
1. Advantages of Cell-Free Synthesis
Cell-free protein synthesis (CFPS) offers an “open” architecture, allowing for direct control over the chemical environment. Unlike in vivo methods, CFPS is not limited by cell toxicity or metabolic competition.
Toxic Protein Production: CFPS can produce proteins that would otherwise kill a living host cell.
High-Throughput Screening: It allows for the rapid testing of genetic libraries without the need for time-consuming cloning and cell culture.
2. Main Components
Cell Extract: The catalytic machinery (ribosomes, polymerases).
DNA Template: The genetic instructions.
NTPs & Amino Acids: The energy and building blocks.
Energy Regeneration: Essential for recycling ATP to sustain translation.
3. Energy Provision
Energy regeneration is critical because ATP is consumed rapidly and phosphate byproducts inhibit the reaction. The Creatine Phosphate/Creatine Kinase system is often used to maintain an ATP supply by transferring phosphate groups to ADP.
4. Prokaryotic vs. Eukaryotic Systems
Prokaryotic (E. coli): Best for high-yield, simple proteins like sfGFP.
Eukaryotic (Wheat Germ): Necessary for proteins requiring post-translational modifications, such as Human Insulin.
5. Membrane Protein Optimization
Membrane proteins often aggregate in aqueous environments. To optimize expression, I would use Nanodiscs or Liposomes in the reaction setup to provide a hydrophobic scaffold for co-translational insertion.
6. Troubleshooting Low Yield
Reason: Nuclease degradation. Fix: Use RNase inhibitors.
Reason: Magnesium imbalance. Fix: Perform a Mg(2+) titration.
Reason: Inefficient folding. Fix: Add molecular chaperones like GroEL/ES.
Part 2: Synthetic Minimal Cell Design (Kate Adamala)
Function: An environmental “Lead-Trapper” cell.
Input: Heavy metal ions (Lead/Pb).
Output: Lead sequestration inside the cell and a GFP signal.
Feasibility: Encapsulation is likely needed to physically isolate the lead from the environment in addition to cell-free.
Natural cell? Possible, but a minimal cell is safer as it cannot replicate or spread in the wild.
Design:
Membrane: POPC/Cholesterol phospholipids.
Encapsulation:E. coli PURE system for protein synthesis.
Communication:alpha-Hemolysin (αHL) pores (Gene: hlyA) to allow lead ions to enter the membrane.
Gene 2:pcs (Phytochelatin synthase from S. pombe) to create lead-binding peptides.
Measurement: Fluorescence intensity from sfGFP tied to the PbrR promoter.
Part 3: Cell-Free Materials (Peter Nguyen)
Application Field: Textiles/Fashion
Pitch: A “living” athletic garment that detects dehydration in sweat and triggers a color-changing cooling response.
The Concept: The fabric is embedded with freeze-dried cell-free reactions containing synthetic gene circuits sensitive to sodium concentrations. When sweat salinity reaches a threshold, the circuit expresses a thermochromic pigment or activates an endothermic enzyme reaction. This addresses the risk of heatstroke in high-intensity athletes. To address stability, the reactions are encapsulated in trehalose-stabilized micro-beads that are only activated by the moisture in sweat.
Part 4: Mock Genes in Space Proposal (Ally Huang)
Background: Long-duration spaceflight leads to immune system dysregulation, specifically T-cell suppression. Understanding how immune-related genes, such as Interleukin-2 (IL-2), are transcribed in microgravity is vital for ensuring astronaut health on future Mars missions.
Molecular Target: Human Interleukin-2 (IL-2) promoter and gene sequence.
Target Relation: IL-2 is essential for T-cell proliferation. By using BioBits® to express IL-2 in space, we can determine if transcriptional machinery is hindered by microgravity or radiation independent of complex whole-cell physiology.
Goal: To determine if microgravity affects the yield of immune-related proteins.
Hypothesis: Transcriptional yield in space will be 20% lower than Earth controls due to altered molecular diffusion kinetics.
Reasoning: Microgravity alters fluid dynamics, potentially slowing the interaction between RNA polymerase and DNA templates.
Experimental Plan: Samples consist of freeze-dried BioBits® pellets containing IL-2 DNA. On the ISS, samples are rehydrated and placed in the P51 Molecular Fluorescence Viewer. A matching control experiment is run on Earth. Data is collected via fluorescence photography to measure protein production rates.
Gemini AI was consulted for formatting and content organization
Week 10 Homework: Imaging and Measurement
Homework: Final Project Integration
Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.
I will be measuring a couple of aspects to assay my experiments. For example, i will be using a spectrophotometer to quantify NADH exchange to validate that the malate products are being formed. I will be looking at turbidity and optical density to evaluate calcium chloride precipitate quantity and formation rate to analyze the efficacy of carbonic anhydrase. Additionally, I ordered some inorganic carbon sats made with heavy carbon, which I plan to compare to a standard to showcase additional evidence of reaction success and product formation.
Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.
I will be using a spectrophotometer to look at the optical density as well as absorbance. I will be measuring time and rates with a stopwatch and serial data collection. I will measure DNA and validate product with gels and concentration of protein. Maybe I could also utilize a mass spectrometer to compare my heavy carbon products to normal products.
What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.
Gel electrophoresis, protein purification and concentration, spectrophotometer, pH indicator, qbit, mass spectrometry, DNA sequencing, nickel column exchange chromatography, desalination centrifugation
Homework: Waters Part I — Molecular Weight
1. Based on the predicted amino acid sequence of eGFP and any known modifications, what is the calculated molecular weight?
eGFP Sequence Analysis: The sequence provided includes the eGFP core, an LE linker, and a 6x-His purification tag.
Calculated Molecular Weight:27,988.97 Da (Daltons).
2. Calculate the molecular weight of the eGFP using the adjacent charge state approach.
Using Figure 1, we select two adjacent peaks:
Peak 1 (m/z_n): 875.4421
Peak 2 (m/z_n+1): 903.7148
2.1 Determine z for each adjacent pair of peaks:
Using the formula for charge state calculation:
$$z = \frac{m/z_{n+1} - 1.0078}{m/z_{n+1} - m/z_{n}}$$
$$z = \frac{903.7148 - 1.0078}{903.7148 - 875.4421} = \frac{902.707}{28.2727} = 31.92$$
Rounding to the nearest integer, the charge state for the 903.7 peak is 31+, and the 875.4 peak is 32+.
2.2 Determine the MW of the protein:
Using the 32+ charge state:
MW = (875.4421 * 32) - (32 * 1.0078)
MW = 28,014.15 - 32.25
Calculated MW:27,981.90 Da
2.3 Calculate the accuracy of the measurement:
Using the formula: Error (ppm) = [|Exp - Theory| / Theory] * 1,000,000
3. Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?
No. Because the protein is in its denatured state at a high charge state (32+), the isotopes are spaced by 1/z (1/32 = 0.03 m/z). A mass spectrometer with a resolution of 30,000 cannot resolve individual isotopes for a protein this large at that charge state; they blur into a single “envelope.”
Homework: Waters Part II — Secondary/Tertiary Structure
1. Explain the difference between native and denatured protein conformations.
Denatured State: The protein is unfolded, often due to acidic solvents or heat. This “stretches out” the amino acid chain, exposing basic residues (Lysine, Arginine, Histidine) that were previously hidden in the core. Consequently, the protein picks up many protons, resulting in high charge states and peaks at lower m/z values (the 500-1500 range).
Native State: The protein remains folded (e.g., the eGFP beta-barrel). Many basic residues are buried and inaccessible for protonation. The protein picks up fewer protons, resulting in lower charge states and peaks at higher m/z values (the 2000-4000 range).
2. Can you discern the charge state of the peak at ~2800 m/z in Figure 3? What is it? How can you tell?
Yes. The charge state is 10+.
Reasoning: In the zoomed inset of Figure 3, the individual isotopes are clearly resolved. The spacing between the isotopes is 0.1 m/z. Since the spacing is equal to 1/z, a spacing of 0.1 indicates a charge state of 10 (1/10 = 0.1).
Homework: Waters Part III — Peptide Mapping
1. How many Lysines (K) and Arginines (R) are in eGFP?
Based on the sequence analysis, there are 20 Lysines (K) and 6 Arginines (R).
2. How many peptides will be generated from tryptic digestion of eGFP?
Using the PeptideMass tool with zero missed cleavages, 27 peptides are predicted.
3. How many chromatographic peaks do you see between 0.5 and 6 minutes in Figure 5a?
Looking at the Total Ion Chromatogram (TIC), there are approximately 18 distinct peaks that meet the >10% relative abundance threshold.
4. Does the number of peaks match the number of peptides predicted?
No. There are fewer peaks in the chromatogram (18) than predicted peptides (27). This is common because some peptides are too small to be retained on the column, others are too hydrophobic to elute, and some co-elute (overlap) at the same time.
5. Identify the m/z and charge (z) of the peptide in Figure 5b. Calculate the mass of the singly charged form (MH+).
m/z: 525.76712
Charge (z): The isotopes are spaced by 0.5 m/z. Therefore, $z = 2$ (since 1/2 = 0.5).
Singly Charged Mass (MH+): (525.767 * 2) - 1.0078 = 1050.53 Da.
6. Identify the peptide and calculate the mass accuracy in ppm.
Comparing the mass to the predicted list, the peptide is FEGDTLVNR (Theoretical MH+ = 1049.52 Da).
Error:5.7 ppm. This is highly accurate and confirms the peptide identity.
7. What is the percentage of the sequence confirmed?
According to Figure 6, the sequence coverage is 88%.
8. Bonus: What is the sequence for the fragmentation spectrum in Figure 5c?
The sequence is FEGDTLVNR. The spectrum displays the characteristic y-ion and b-ion series that confirm this specific amino acid order.
9. Does the peptide map data make sense?
Yes. 88% coverage is excellent. The high mass accuracy (under 10 ppm) and the matching fragmentation patterns definitively prove that the protein produced is the eGFP standard.
Homework: Waters Part IV — Oligomers
Based on the CDMS mass spectrum in Figure 7 and the subunit masses:
7FU Decamer: Observed at 3.4 MDa.
8FU Didecamer: Observed at 8.33 MDa.
8FU 3-Decamer: Observed at 12.67 MDa.
8FU 4-Decamer: This would be represented by the furthest right, lower-abundance peaks near 16.0-17.0 MDa.
Homework: Waters Part V — Did I make GFP?
Metric
Theoretical
Observed (Measured)
PPM Mass Error
Molecular weight (kDa)
27.989 kDa
27.982 kDa
252.6 ppm
Gemini AI was consulted for data synthesis and formatting