<DIANA GRIMALDOS> — HTGAA Spring 2026

About me

Colombian undergraduate biology student interested in cellular and molecular biology and beginning to explore synthetic biology.

Contact info

Homework

Labs

Week 1 Lab: Pipetting

Projects

Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
Week 10: Advanced Imaging & Measurement Technology
Week 11 — Bioproduction & Cloud Labs
Week 2 HW: DNA Read, Write and Edit
Week 2 HW: Lab Automation
Opentrons Artwork opentrons-art.rcdonovan.com/?id=oevp91e27i3m061 Post-Lab Questions Find and describe a published paper that utilizes the Opentrons This article combines an open‑source liquid‑handling robot (Opentrons OT‑One‑S Hood) with four interchangeable modules that perform magnetic‑bead DNA isolation, isothermal recombinase polymerase amplification (RPA) of the ctrA gene, exonuclease digestion to generate single‑stranded DNA, and detection on a paper‑based vertical‑flow microarray (VFM) using anti‑biotin gold nanoparticles for colorimetric read‑out.
Week 4: Protein Design Part I
Part A. Conceptual Questions How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) Considering that 500 g of meat contains 20% protein, we would have 100 g of protein. After converting the protein mass to moles (assuming an average amino acid mass of 100 Da), this corresponds to approximately 6.022 × 10²³ amino acid molecules.
Week 5 HW: Genetic circuits part 1
Assignment: DNA Assembly What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion High-Fidelity DNA polymerase: the enzyme that synthesizes new DNA strands; its 3'→5' exonuclease (proofreading) activity reduces errors during replication. dNTPs (deoxynucleotide triphosphates): nucleotide substrates incorporated by the DNA polymerase into the elongating DNA strand during synthesis. Reaction buffer: contains compounds such as Tris-HCl, which maintains the correct pH, and salts such as KCl, which help stabilize primer binding and enzyme activity.
Week 5 HW: Protein design part 2
Part A: SOD1 Binder Peptide Design Part 1: Generate Binders with PepMLM To introduce the A4V mutation, I made the substitution at position 5 (A5V) of the FASTA sequence, which counts the initial methionine. MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Using PepMLM Colab with a K value of 1, I obtained:
Week 7: Genetic Circuits Part II: Neuromorphic Circuits
Assignment Part 1: Intracellular Artificial Neural Networks What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? Instead of relying on ON/OFF logic, they process information in a continuous and distributed way, closer to how real cells behave. This allows them to integrate multiple signals with different intensities and avoid the exponential complexity that arises when scaling Boolean circuits. In addition, IANNs are inherently more flexible, since their behavior can be tuned by adjusting interaction strengths rather than completely redesigning the system, enabling more complex and nonlinear decision-making. Overall, they provide a more efficient, scalable, and biologically realistic framework for intracellular computation.
Week 9: Cell-Free Systems
Assignment Part A: General and Lecturer-Specific Questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production. You can control everything directly. You can adjust pH, temperature, ion concentrations, and cofactors and even add or remove components during the reaction. Also, it’s faster since you don’t need cell growth or maintenance.

Week 1 HW: Principles and Practices

First, describe a biological engineering application or tool you want to develop and why
Pattern-Based Rapid Diagnostic Platform for Dengue Virus: A rapid diagnostic platform for dengue virus (DENV) that integrates innate immune recognition, molecular recognition, and biosensor engineering to address key limitations of existing diagnostic methods. The proposed system combines mannose-binding lectin for the recognition of viral glycoproteins, dengue-specific aptamers targeting conserved regions of viral proteins, and signal transduction through a portable biosensor to enable rapid readout. This approach is motivated by the fact that current dengue diagnostics are often expensive and exhibit reduced sensitivity and reliability in dengue-endemic regions, particularly in countries like mine (Colombia), where prior flavivirus exposure compromises serological test performance and access to reliable diagnostics is limited by public healthcare infrastructure (Terenteva et al., 2025).

2. Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

GOAL 1: Ensure that diagnostic accuracy supports appropriate clinical decision-making, minimizing the risk of misdiagnosis, delayed care, and public health mismanagement.

Subgoals: Diagnostic Reliability: Standards validated in endemic populations and ongoing monitoring of false positives and false negatives.

Prevention of clinical misinterpretation: Support test results with clear and accessible interpretive guidance.

GOAL 2: Promote equitable access and global health justice in the development and use of the diagnostic technology.

Subgoals: Affordability and Accessibility: Promote public–private collaboration for the deployment of dengue diagnostics in high-burden countries, without dependence on specialized infrastructure.

Prevent Technological Exclusion: Ensure that the diagnostic tool is usable in decentralized healthcare settings, such as rural clinics and community health centers

3. Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”)

Establish context-specific validation requirements for diagnostic deployment:

The objective of this action is to prevent clinical harm caused by diagnostic failures that persist under current practices in endemic settings, particularly false negatives in secondary dengue infections and false positives due to flavivirus cross-reactivity. To achieve this, regulatory agencies and public health institutions should require that rapid dengue diagnostics be validated directly in endemic communities, especially among people with prior flavivirus exposure, before they are approved or rolled out. This means shifting approval pathways so they rely on real-world performance data, not just controlled lab studies or trials in non-endemic settings. This action assumes that regulators can review performance data in the specific contexts where diagnostics will be used, that accuracy can vary across different populations, and that manufacturers will adapt their designs to meet these requirements. Even so, a “false success” could occur if compliance is limited to minimal testing in a few endemic populations. True success would mean diagnostics that work reliably across diverse populations, helping to reduce misdiagnosis and support appropriate clinical decisions.

Ensuring Transparency and Open Validation in Diagnostic Development:

The purpose of this action is to promote responsible innovation and build trust by ensuring that the limitations of diagnostics are openly documented and shared before large-scale deployment. By making failure modes, cross-reactivity profiles, and other constraints visible early, developers, regulators, and clinicians can make better-informed decisions and reduce risks to patients and public health. Funding agencies and scientific journals should require transparent reporting of assay limitations as part of the evaluation and publication process. Shared validation datasets could be established. Incentives—such as eligibility for specific funding programs, prioritized review, or formal recognition—can encourage participation from researchers and companies while improving the overall reliability and comparability of diagnostic technologies. This approach assumes that increased transparency improves diagnostic quality and that researchers and companies will share performance data to reduce failure across the sector. Potential failure points include limited industry participation due to intellectual property concerns or competitive pressures. Success is defined by establishing routine, independent testing of diagnostic performance claims, creating a standard expectation of reliability prior to clinical adoption.

4. Next, score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents		X
• By helping respond		X
Foster Lab Safety
• By preventing incidents	X
• By helping respond	X
Protect the environment
• By preventing incidents		X
• By helping respond		X
Other considerations
• Minimizing costs and burdens to stakeholders			X
• Feasibility?		X
• Not impede research			X
• Promote constructive applications			X

Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties
Based on the above, the following actions would be prioritized:

1. Establish context-specific validation requirements for dengue diagnostics. This action has the highest priority because it directly ensures diagnostic reliability in endemic populations. Requiring validation in communities with prior flavivirus exposure reduces false negatives, false positives, and clinical mismanagement. It provides the regulatory foundation necessary for safe, equitable deployment. Without this step, large-scale implementation could compromise patient care and public health decision-making.

2. Ensure transparency and open reporting of diagnostic limitations. This action strengthens accountability and trust by requiring disclosure of performance data, cross-reactivity profiles, and failure modes. However, its effectiveness depends on clear validation standards. Properly designed, it improves long-term reliability and supports informed clinical use, while balancing industry participation and innovation incentives.

Ethical concerns that arose, especially any that were new to you.
This week, I learned that developing a diagnostic platform is more than a technical or experimental challenge; it is also a matter of governance and biosecurity, areas that were largely new to me. Previously, I focused mainly on protocol design, molecular mechanisms, and performance metrics. In class and while doing the homework, I began to understand that every diagnostic tool exists within a broader regulatory, ethical, and public health framework that determines how it is validated, deployed, and monitored in real-world settings. One key ethical concern that emerged for me is that diagnostic errors are not just laboratory inaccuracies: they can produce systemic harm. Poor validation, lack of transparency, or weak oversight can lead to misdiagnosis, inequitable access, and loss of public trust. To address these issues, the governance actions that I think should be required are context-specific validation before any large-scale deployment, clear transparency standards to ensure diagnostic limitations are openly reported, and post-market monitoring to quickly identify and respond to performance gaps.
References
Terenteva, S., Golani-Zaidie, L., Avivi, S., Lustig, Y., Indenbaum, V., Koren, R., Hoa, T. M., Tuyen, T. T. K., Huyen, M. T., Hoan, N. M., Hoi, L. T., Trung, N. V., Schwartz, E., & Danielli, A. (2025). Sensitivity and cross-reactivity analysis of serotype-specific anti-NS1 serological assays for dengue virus using optical modulation biosensing. Biosensors, 15(7), 453. https://doi.org/10.3390/bios15070453

WEEK 2 LECTURE PREP

Questions from Professor Jacobson

Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

The error rate of replicative DNA polymerases is roughly 10⁻⁵ errors per nucleotide incorporated. With proofreading (3’→5’ exonuclease activity), this improves to about 10⁻⁷, and after post-replicative mismatch repair, the final error rate drops to approximately 10⁻⁹ to 10⁻¹⁰ per base per replication cycle.

For comparison, the human genome is about 3 × 10⁹ base pairs per haploid set. If replication occurred at an error frequency of 10⁻⁵ with no correction, each cell division would introduce tens of thousands of mutations, which is incompatible with genomic stability. Even at 10⁻⁷, you would expect hundreds of mutations per division.

Biology resolves this discrepancy through:
1. Polymerase selectivity
2. Proofreading activity
3. Mismatch repair (MMR)
4. Cell cycle checkpoints and apoptosis
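As a rough check of the arithmetic above, the expected number of errors per replication can be estimated by multiplying each error rate by the genome length. This is a minimal sketch: the function name is illustrative, and it assumes the haploid genome size and the error rates quoted in the answer.

```python
# Expected replication errors per haploid human genome copy at each fidelity stage.
GENOME_BP = 3e9  # approximate haploid human genome size quoted in the text

ERROR_RATES = {
    "polymerase selectivity only": 1e-5,
    "with proofreading": 1e-7,
    "with mismatch repair": 1e-9,  # upper end of the 1e-9 to 1e-10 range
}


def expected_errors(error_rate, genome_bp=GENOME_BP):
    """Expected number of misincorporated bases in one copy of the genome."""
    return error_rate * genome_bp


for stage, rate in ERROR_RATES.items():
    print(f"{stage}: ~{expected_errors(rate):,.0f} errors per replication")
```

Under these assumptions, only the combination of all three layers brings the expected number of errors down to a handful per genome copy.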

How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

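An average human protein is about 400 amino acids long. Because of the degeneracy of the genetic code, most amino acids are specified by multiple synonymous codons. With roughly three codons per amino acid on average (61 sense codons for 20 amino acids), there are on the order of 3⁴⁰⁰, or about 7 × 10¹⁹⁰, possible coding sequences.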

In practice, most of these sequences do not function equivalently because of:
1. Codon usage bias
2. mRNA secondary structure
3. GC content constraints
4. Regulatory elements within coding regions
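The number of synonymous encodings can also be counted exactly for a given sequence by multiplying the number of codons available for each residue. The sketch below uses the codon degeneracy of the standard genetic code; the function names are illustrative, and the example peptide is simply the first ten residues of the MBL precursor used elsewhere on this page.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein):
    """Number of distinct DNA sequences encoding the protein (stop codon not counted)."""
    total = 1
    for residue in protein.upper():
        total *= CODON_DEGENERACY[residue]
    return total


def log10_encodings(protein):
    """Same count on a log10 scale, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[r]) for r in protein.upper())


example = "MSLFPSLPLL"  # illustrative ten-residue peptide
print(count_encodings(example), round(log10_encodings(example), 2))
```

Even a ten-residue peptide has over a million possible encodings, which is why the count for a full-length protein is astronomically large.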

Questions from Dr. LeProust

What’s the most commonly used method for oligo synthesis currently?

The most common method for oligo synthesis is solid-phase phosphoramidite chemistry. Nucleotides are added stepwise in the 3’→5’ direction on a solid support, with cyclic coupling, capping, oxidation, and deprotection steps.

Why is it difficult to make oligos longer than 200nt via direct synthesis?

Because each coupling step is not 100% efficient, the yield of full-length product drops exponentially with length. At 99% efficiency per step, only about 13% of the strands in a 200-mer synthesis are full length (0.99¹⁹⁹ ≈ 0.135). Beyond that, truncated products dominate, and purification becomes inefficient.

Why can’t you make a 2000bp gene via direct oligo synthesis?

Because the cumulative stepwise loss would make the full-length product essentially nonexistent. Long genes are built by assembling shorter oligos, not by a single continuous chemical synthesis.
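The length limit in the two answers above follows from compounding the per-step coupling efficiency. A minimal sketch, assuming a constant 99% efficiency and that the first nucleotide is already attached to the solid support (so an n-mer needs n − 1 couplings); the function name is illustrative.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of strands that are full length after (length_nt - 1) coupling steps."""
    return coupling_efficiency ** (length_nt - 1)


for n in (20, 100, 200, 2000):
    print(f"{n} nt: full-length fraction = {full_length_fraction(n):.3g}")
```

With these assumptions, a 2000-nt product would make up only about two parts per billion of the strands, which is why long genes are assembled from shorter oligos.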

Question from George Church

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in animals are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine. These cannot be synthesized de novo and must be obtained from the diet. Regarding the “Lysine Contingency”: lysine is not the only essential amino acid; it is one of several indispensable amino acids. This means that engineering a specific dependence on lysine is conceptually no different from creating dependence on any other essential amino acid. Its practical value does not lie in biochemical uniqueness but in its controllability: environmental availability can be tightly regulated, making it a useful containment strategy. Source: ChatGPT (OpenAI)

Week 10: Advanced Imaging & Measurement Technology

Homework: Final Project

Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

Expression of the Reporter Gene LacZ

Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements

β-galactosidase hydrolyzes chromogenic substrates, producing a colored product that can be visually detected or quantified spectrophotometrically. LacZ will be measured using a colorimetric assay with ONPG, enabling both spectrophotometric quantification and visual detection

What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail

Colorimetric β-galactosidase assay: used to quantify the final output of the biosensor, LacZ expression, which reflects the presence of the DENV pathogen.

Homework: Waters Part I — Molecular Weight

Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight?

The calculated molecular weight of eGFP with the His purification tag and linker is 28006.60 Da (theoretical pI/Mw: 5.90 / 28006.60).

Calculate the molecular weight of the eGFP using the adjacent charge state approach described in the recitation. Select two charge states from the intact LC-MS data (Figure 1) and:

Peaks: 933.7848 and 903.7844

Determine z for each adjacent pair of peaks (n, n+1): z ≈ 30.1, so z = 30 for the peak at m/z 933.7848
Determine the MW of the protein using the relationship (peak 1 and z): MW = 27983.3 Da
Calculate the accuracy of the measurement using the deconvoluted MW from 2.2 and the predicted weight of the protein from 2.1: accuracy ≈ 8.3 × 10⁻⁴ (about 830 ppm)
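The adjacent charge state calculation can be sketched as follows. This is an illustrative reconstruction, not the recitation's exact procedure: it assumes the lower-m/z peak carries exactly one more proton than the higher-m/z peak, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge of the higher-m/z peak, assuming mz_low carries one more proton."""
    return (mz_low - PROTON_MASS) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral mass from an m/z value and its charge state."""
    return z * (mz - PROTON_MASS)


def relative_error(measured, predicted):
    """Signed relative error of the measured mass against the predicted mass."""
    return (measured - predicted) / predicted


z_estimate = charge_from_adjacent_peaks(933.7848, 903.7844)
z = round(z_estimate)
mw = neutral_mass(933.7848, z)
error = relative_error(mw, 28006.60)
print(f"z ~ {z_estimate:.2f} -> {z}, MW = {mw:.1f} Da, error = {error * 1e6:.0f} ppm")
```

Rounding the estimated charge to the nearest integer before computing the mass matters, because the mass is the charge multiplied by a value near 930 and a fractional charge would shift it by tens of daltons.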

Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

No, the charge state cannot be determined from the enlarged peak, because the peak does not show adjacent peaks or sufficient isotopic resolution to infer the charge

Homework: Waters Part II — Secondary/Tertiary structure

Based on learnings in the lab, please explain the difference between native and denatured protein conformations. For example, what happens when a protein unfolds? How is that determined with a mass spectrometer? What changes do you see in the mass spectrum between the native and denatured protein analyses (Figure 2)?

Native proteins retain their folded, 3D structure, stabilized by interactions such as hydrogen bonding, hydrophobic interactions, and salt bridges. Denatured proteins have lost this folded structure due to disruption of these interactions. The protein becomes unfolded or extended.

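When a protein unfolds, basic residues (Lys, Arg, His) become exposed, so the protein accepts more protons during electrospray ionization (ESI). This leads to higher charge states (larger z values), so the peaks of the denatured protein appear at lower m/z than those of the native protein.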

Zooming into the native mass spectrum of eGFP from the Waters Xevo G3 QTof MS (see Figure 3), can you discern the charge state of the peak at ~2800 M/Z? What is the charge state? How can you tell?

It is approximately z = 10: dividing the protein mass (~28,000 Da) by the peak position (~2,800 m/z) gives about 10.

Homework: Waters Part III — Peptide Mapping - primary structure

How many Lysines (K) and Arginines (R) are in eGFP? Please circle or highlight them in the eGFP sequence given in Waters Part I question 1 above.

How many peptides will be generated from tryptic digestion of eGFP?

Based on the LC-MS data for the Peptide Map data generated in lab (please use Figure 5a as a reference) how many chromatographic peaks do you see in the eGFP peptide map between 0.5 and 6 minutes? You may count all peaks that are >10% relative abundance

Assuming all the peaks are peptides, does the number of peaks match the number of peptides predicted from question 2 above? Are there more peaks in the chromatogram or fewer?

No, there are fewer peaks in the chromatogram.

Identify the mass-to-charge (m/z) of the peptide shown in Figure 5b. What is the charge (z) of the most abundant charge state of the peptide (use the separation of the isotopes to determine the charge state).

The dominant peak cluster is centered at m/z ≈ 525.76
Charge state: z = 2 (2+)
Neutral peptide mass: ≈ 1049.5 Da
Mass of [M + H]+: 1050.5 m/z

Identify the peptide based on comparison to expected masses in the PeptideMass tool. What is mass accuracy of measurement?

Peptide: DLGEEYVQAFK (GFP)
Mt= 1049.53 Da
Mexp= 1049.52 Da
Error= 9.5 ppm
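The peptide numbers above can be reproduced with a short calculation. This sketch makes two assumptions that are not in the original answer: an isotope spacing of about 0.5 m/z (used only to illustrate how the 2+ charge is read from the spectrum) and the standard ¹³C to ¹²C mass difference. Function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da
C13_C12_DIFF = 1.00335  # Da, mass difference between 13C and 12C


def charge_from_isotope_spacing(spacing_mz):
    """Isotope peaks are spaced by about 1.00335 / z on the m/z axis."""
    return round(C13_C12_DIFF / spacing_mz)


def neutral_mass(mz, z):
    """Neutral mass from an m/z value and its charge state."""
    return z * (mz - PROTON_MASS)


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


z = charge_from_isotope_spacing(0.5)  # assumed spacing, for illustration only
mass = neutral_mass(525.76, z)
print(f"z = {z}, neutral mass = {mass:.2f} Da, [M+H]+ = {mass + PROTON_MASS:.2f}")
print(f"error = {ppm_error(1049.52, 1049.53):.1f} ppm")
```

The sign of the error only indicates whether the measured mass is below or above the theoretical value; the magnitude is what is reported above.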

What is the percentage of the sequence that is confirmed by peptide mapping? (see Figure 6)

88%

Homework: Waters Part IV — Oligomers

Identify where the following oligomeric species are on the spectrum shown below from the CDMS

7FU Decamer (3.4 MDa) -> Peak 3
8FU Didecamer (8.3 MDa) -> Peak 5
8FU 3-Decamer (12.67 MDa) -> Peak 6
8FU 4-Decamer (16 MDa) -> No peak

Homework: Waters Part V — Did I make GFP?

Molecular weight, theoretical = 26.9 kDa
Molecular weight, intact LC-MS = 27.0 kDa
PPM mass error ≈ 3,700 ppm

Week 11 — Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Make a note on your HTGAA webpages including:

What you contributed to the community bioart project
I made part of the 2026 on the upper left plate
What you liked about the project, and what about this collaborative art experiment could be made better for next year.

I liked the idea of working with hundreds of people on a single collaborative project that can express the essence of all the participants. What could be improved next year would be to add more colors.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

E. coli Lysate

BL21 (DE3) Star Lysate (includes T7 RNA Polymerase): Supplies a high-yield transcription system via T7 RNA polymerase, and the Star strain reduces RNase activity for increased mRNA stability.

Salts/Buffer

Potassium Glutamate: Maintains ionic strength and mimics the intracellular environment more effectively than chloride salts, enhancing translation efficiency.

HEPES-KOH pH 7.5: Buffers the reaction mixture at an optimal pH for enzymatic activity.

Magnesium Glutamate: Serves as a necessary cofactor for ribosome function, tRNA binding, and RNA polymerase activity.

Potassium phosphate monobasic: Along with its dibasic form, helps maintain pH and supplies phosphate for energy regeneration.

Potassium phosphate dibasic: Works with the monobasic form to buffer the reaction and contribute to the phosphate pool.

Energy / Nucleotide System

Ribose and glucose: Act as energy sources and carbon backbones to fuel metabolic pathways that regenerate ATP and GTP.

AMP, CMP, GMP and UMP: Are converted to the corresponding nucleotide triphosphates (ATP, CTP, GTP, UTP) to serve as substrates for transcription and energy metabolism.

Guanine: Can be salvaged to produce GTP, which is critical for translation initiation and elongation.

Translation Mix (Amino Acids)

17 Amino Acid Mix: Supplies the common amino acids required for protein synthesis; typically excludes tyrosine and cysteine to allow controlled addition

Tyrosine and cysteine: Added separately to prevent chemical modification or precipitation that can occur during long-term storage of the full amino acid mix.

Additives

Nicotinamide: Helps regenerate NAD⁺ and inhibits certain proteases, thereby improving reaction longevity and yield.

Backfill

Nuclease Free Water: Adjusts the final volume of the reaction to achieve the desired concentrations of all components, while ensuring no contaminating nucleases degrade mRNA.

Describe the main differences between the 1-hour optimized PEP-NTP master mix and the 20-hour NMP-Ribose-Glucose master mix shown in the Google Slide above. (2-3 sentences)

The main difference is the energy and nucleotide supply strategy. The 1-hour mix uses pre-supplied NTPs with PEP for immediate, rapid energy regeneration, enabling fast protein synthesis. In contrast, the 20-hour mix uses simple precursors (NMPs, ribose, and glucose) that are metabolically converted into NTPs over time, which supports sustained, long-term protein production.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)

sfGFP: Folds robustly and rapidly even without chaperone assistance. However, its chromophore maturation still requires molecular oxygen, so anaerobic or oxygen-depleted CFPS reactions will show reduced fluorescence despite proper translation. Its high resistance to aggregation makes it well suited to CFPS, but acidic pH (below ~6.5) can quench its fluorescence, so the buffer must be maintained carefully during extended incubations.
mRFP1: Matures slowly because chromophore formation requires oxidation and dehydration steps, which limits its utility in short reactions. It may also show some tendency to oligomerize at high concentrations; in a cell-free reaction, which lacks membrane compartments, this can lead to solubility issues or altered spectral properties compared to other monomeric red FPs.
mKO2: Matures rapidly thanks to efficient chromophore formation. Like other GFP-like fluorescent proteins, mKO2 requires molecular oxygen for chromophore oxidation, so oxygen depletion in extended cell-free reactions can limit the final fluorescence yield despite active protein synthesis.
mTurquoise2: High quantum yield and efficient maturation, but its chromophore is sensitive to acidic pH (pKa ~5.1), meaning that any drop in pH during extended cell-free reactions can quench fluorescence readout. Additionally, it requires proper oxidation of its chromophore, making it dependent on adequate oxygen levels in the reaction mix for full fluorescent signal development.
mScarlet_I: Matures quickly, which is beneficial in cell-free systems because fluorescence develops within short reaction times. Its red chromophore requires oxidative maturation, so oxygen limitation or redox imbalance in the lysate can reduce the final fluorescence yield. However, it exhibits excellent monomeric behavior and pH stability, making it less sensitive to mild acidification.
Electra2: High photostability, but its chromophore maturation is relatively slow and oxygen-dependent, requiring sufficient dissolved oxygen in the reaction for complete fluorescent signal development. It has a lower quantum yield and is more prone to acid sensitivity, meaning that pH drops during extended incubations can reduce its fluorescence.

Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

Protein: sfGFP
Limitation: Its chromophore maturation requires molecular oxygen, and extended 36-hour incubations lead to oxygen depletion and pH acidification, which can quench fluorescence even if the protein remains folded.

Adjustments:

Long-term ATP regeneration and buffering against acidification:

Glucose: increase from 6.9 mM to 15–20 mM
Ribose: increase from 77.4 mM to 100 mM
HEPES-KOH pH 7.5: increase from 45 mM to 80 mM

Protect the oxidizing chromophore from peroxide damage:

Catalase: add 100 U/mL

Hypothesis: This combined adjustment is expected to yield stable sfGFP fluorescence for up to 36 hours, with a 1.5–2× higher endpoint signal compared to the standard 20-hour master mix, by preventing both energy collapse and oxidative stress during prolonged incubation.

Mail not received
Unable to do without data

Part D: Build-A-Cloud-Lab | (optional) Bonus Assignment

Week 2 HW: DNA Read, Write and Edit

Part 1: Benchling & In-silico Gel Art

Part 3: DNA Design Challenge

Protein: mannose-binding protein C precursor

Reverse Translate:

Amino acids

mslfpslpll llsmvaasys etvtcedaqk tcpaviacss pgingfpgkd grdgtkgekg epgqglrglq gppgklgppg npgpsgspgp kgqkgdpgks pdgdsslaas erkalqtema rikkwltfsl gkqvgnkffl tngeimtfek vkalcvkfqa svatprnaae ngaiqnlike eaflgitdek tegqfvdltg nrltytnwne gepnnagsde dcvlllkngq wndvpcstsh lavcefpi

Nucleotide sequence

atgagcctgtttccgagcctgccgctgctgctgctgagcatggtggcggcgagctatagc gaaaccgtgacctgcgaagatgcgcagaaaacctgcccggcggtgattgcgtgcagcagc ccgggcattaacggctttccgggcaaagatggccgcgatggcaccaaaggcgaaaaaggc gaaccgggccagggcctgcgcggcctgcagggcccgccgggcaaactgggcccgccgggc aacccgggcccgagcggcagcccgggcccgaaaggccagaaaggcgatccgggcaaaagc ccggatggcgatagcagcctggcggcgagcgaacgcaaagcgctgcagaccgaaatggcg cgcattaaaaaatggctgacctttagcctgggcaaacaggtgggcaacaaattttttctg accaacggcgaaattatgacctttgaaaaagtgaaagcgctgtgcgtgaaatttcaggcg agcgtggcgaccccgcgcaacgcggcggaaaacggcgcgattcagaacctgattaaagaa gaagcgtttctgggcattaccgatgaaaaaaccgaaggccagtttgtggatctgaccggc aaccgcctgacctataccaactggaacgaaggcgaaccgaacaacgcgggcagcgatgaa gattgcgtgctgctgctgaaaaacggccagtggaacgatgtgccgtgcagcaccagccat ctggcggtgtgcgaatttccgatt

Codon optimization:

ATG AGC CTT TTT CCG AGC CTT CCT CTG CTT TTA CTG TCG ATG GTG GCC GCC AGC TAC AGT GAA ACT GTG ACC TGT GAG GAC GCC CAA AAA ACG TGT CCT GCA GTT ATC GCG TGC AGC TCC CCG GGT ATC AAT GGC TTC CCC GGC AAG GAC GGG CGT GAT GGG ACT AAA GGC GAG AAA GGT GAA CCG GGA CAG GGC TTA CGT GGT TTA CAG GGC CCG CCG GGT AAA TTG GGG CCG CCA GGC AAT CCG GGT CCG AGT GGC TCC CCA GGG CCG AAA GGT CAG AAA GGC GAT CCA GGC AAA AGT CCG GAT GGT GAT TCA AGT CTG GCG GCC AGC GAA CGT AAG GCC CTT CAG ACC GAA ATG GCT CGT ATC AAA AAA TGG TTA ACG TTC AGC CTG GGG AAA CAA GTG GGG AAT AAG TTT TTT CTG ACT AAT GGC GAG ATC ATG ACG TTT GAG AAA GTG AAA GCG CTG TGT GTG AAG TTC CAG GCC AGC GTG GCG ACG CCA CGT AAC GCG GCG GAA AAT GGC GCG ATT CAA AAC CTT ATC AAA GAA GAG GCC TTC CTG GGT ATT ACG GAC GAA AAA ACG GAG GGC CAG TTT GTC GAT CTG ACT GGT AAC CGC TTA ACA TAT ACC AAT TGG AAT GAG GGC GAA CCT AAC AAC GCA GGC AGC GAT GAG GAC TGC GTG CTG TTA TTG AAA AAC GGC CAG TGG AAC GAC GTA CCT TGT TCC ACT AGC CAT TTA GCG GTA TGC GAA TTT CCG ATT

Why is it necessary to optimize codon usage?

Because several codons encode the same amino acid, but the frequency at which these codons are used varies among organisms. Each species has preferences for particular codons, a phenomenon known as codon usage bias. When expressing a gene from one organism in a different host without prior codon optimization, translation efficiency can be reduced, leading to lower protein production or even affecting proper protein folding.
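A quick sanity check for any codon-optimized sequence is to translate it and the starting sequence back to protein and confirm that both give the same amino acids. The sketch below assumes the standard genetic code and uses only the first eight codons of the two sequences shown above; the `translate` helper is illustrative and not part of Benchling or any other tool used here.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (whitespace ignored) into one-letter amino acids."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# First eight codons of the reverse-translated and the codon-optimized sequences.
original = "atgagcctgtttccgagcctgccg"
optimized = "ATG AGC CTT TTT CCG AGC CTT CCT"

same_protein = translate(original) == translate(optimized)
print(translate(original), translate(optimized), same_protein)
```

Running the same comparison on the full sequences would confirm that optimization changed only the codons and not the encoded protein.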

Which organism have I chosen to optimize the codon sequence for, and why?

I chose to optimize the sequence for Escherichia coli because it is one of the most widely used systems for recombinant protein production. It is easy to cultivate, grows quickly, and is cost‑effective. In addition, there are well‑established genetic tools that enable efficient protein expression when codons are adapted to its natural codon bias, and it is the organism I am most familiar with.

You have a sequence! Now what?:

What technologies could be used to produce this protein from your DNA?

Recombinant gene expression in bacteria, yeasts, or cell‑free systems

Cell-dependent transcription and translation

The gene is inserted into a plasmid and then introduced into a host organism. The process begins with cloning, where the codon‑optimized gene is inserted into a plasmid. During transformation or transfection, the plasmid is delivered into the host cell. Inside the cell, the host RNA polymerase recognizes the promoter and transcribes the DNA into mRNA. This mRNA is then read by ribosomes, which synthesize the protein by assembling amino acids according to the codon sequence. Finally, the newly synthesized protein folds and, if necessary, undergoes further modifications.

Part 4: Prepare a Twist DNA Synthesis Order

Benchling: https://benchling.com/s/seq-9syL7gvyZin8DtEAROGk?m=slm-yWJdRnMP3mRA7qSkwxTO

SBOL Canvas:

Twist: Benchling:

Part 5: DNA Read/Write/Edit

READ

What DNA would you want to sequence and why?

The gene encoding MBL, which will be used as an initial viral capture molecule because of its affinity for the high-mannose glycans present on the viral envelope protein.

What technology or technologies would you use to perform sequencing on your DNA and why?

I would use Illumina sequencing for its high accuracy and cost-effectiveness. My goal is targeted gene analysis, and Illumina technology provides low error rates and high throughput, making it well suited to detecting deletions.

Is your method first-, second or third-generation or other? How so?

Second generation, because it sequences massively in parallel and requires clonal amplification of the fragments (bridge amplification) before they are read.

What is your input? How do you prepare your input? List the essential steps.

Input: MBL2

Preparation: Using linear synthetic DNA MBL2 sequence

End repair.
A-tailing.
Adapter ligation with Illumina-compatible adapters.
PCR enrichment of adapter-ligated fragments.
Cleanup and quantification.

What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample? Essential steps:

Flow Cell Binding: MBL2 gene fragments with adapters are loaded onto the flow cell. The fragments hybridize to complementary oligonucleotides immobilized on the surface.
Bridge Amplification: Each fragment is locally amplified, forming clonal clusters of many identical copies, which increases the detectable signal during sequencing.
Sequencing by Synthesis: Fluorescently labeled nucleotides with reversible terminators are used. In each cycle:
• A labeled nucleotide (A, T, C, or G) is added.
• Only one nucleotide is incorporated per cycle.
• A camera detects the emitted fluorescence.
• The system records the corresponding color.
• The fluorophore and chemical blocking group are then removed.
• The cycle is repeated.

Base calling performed through:

Optical detection of the fluorescent signal in each cluster. Each color corresponds to a specific base (A, T, C, or G); the software converts the light signal into a nucleotide sequence and assigns a quality score to each base.

What is the output of your chosen sequencing technology?

Millions of short reads, generated as FASTQ files that contain the nucleotide sequence of each read and a quality score for every base.
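To make the FASTQ output more concrete, here is a minimal sketch of how one record can be read and its quality string decoded. It assumes the common Phred+33 encoding (quality = ASCII code minus 33) and four lines per record; the read name and bases in `example` are made up for illustration.

```python
def parse_fastq(text):
    """Parse FASTQ text (4 lines per record) into a list of dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    records = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Phred+33: each quality character encodes Q = ord(char) - 33.
        phred = [ord(ch) - 33 for ch in qual]
        records.append({"id": header[1:], "seq": seq, "phred": phred})
    return records


def error_probability(q):
    """Convert a Phred score into the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)


# Hypothetical single-read example.
example = """@read_1
ATGAGC
+
IIII5!
"""

records = parse_fastq(example)
for rec in records:
    print(rec["id"], rec["seq"], rec["phred"])
```

A Phred score of 20 corresponds to an estimated 1% chance that the base call is wrong, and a score of 40 to 0.01%.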

WRITE

What DNA would you want to synthesize (e.g., write) and why?

I would synthesize a codon-optimized version of the MBL2 coding sequence (CDS) to produce recombinant MBL protein for use as a viral capture molecule in a diagnostic assay.

What technology or technologies would you use to perform this DNA synthesis and why?

Array-based or column-based phosphoramidite DNA synthesis, because it is a method with high sequence fidelity and control over sequence design.

What are the essential steps of your chosen synthesis method?

a) Solid-phase attachment b) Deprotection c) Coupling d) Capping e) Oxidation f) Cleavage and deprotection g) Sequence verification

What are the limitations of your synthesis method in terms of speed, accuracy, scalability?

In terms of accuracy, each coupling step is not 100% efficient, so errors accumulate as the sequence length increases. Regarding length, synthesis is limited to about 150–200 nucleotides per oligonucleotide, meaning longer genes must be assembled from multiple overlapping fragments. In terms of speed, synthesis is relatively fast for short oligos, but full gene synthesis requires extra assembly and validation steps, increasing turnaround time. Concerning scalability, array-based synthesis produces many oligos in parallel, but the individual yield per oligo may be lower compared to column-based synthesis. Finally, cost increases with more complex constructs due to the need for assembly and error correction.
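The effect of imperfect coupling on length can be estimated with a simple calculation: if every coupling step succeeds with the same probability, the fraction of full-length product is that probability raised to the number of couplings. The 99% efficiency used below is an illustrative assumption, not a measured value for any particular synthesizer.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands that reach full length.

    An oligo of N nucleotides needs N - 1 coupling steps after the first
    nucleotide, which is already attached to the solid support.
    """
    return coupling_efficiency ** (length_nt - 1)


# Assumed per-step coupling efficiency (illustrative only).
EFFICIENCY = 0.99

for n in (20, 100, 200):
    print(n, round(full_length_fraction(n, EFFICIENCY), 3))
```

Under this assumption, only about 13.5% of 200-nt strands would be full length, which is why longer genes are built from shorter overlapping fragments and then sequence-verified.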

EDIT

What DNA would you want to edit and why?

The MBL2 gene, which encodes mannose-binding lectin, to enhance its expression in individuals with naturally low serum MBL levels. Increased MBL expression could potentially improve early immune recognition of viral pathogens.

What kinds of edits might you want to make to DNA? Why?

Precise regulatory modification rather than altering the protein-coding sequence. Introducing promoter variants associated with higher transcriptional activity could increase protein levels without changing protein structure, minimizing unintended functional consequences while enhancing host defense.

What technology or technologies would you use to perform these DNA edits and why?

CRISPR-Cas9 with Homology-Directed Repair, because it offers precise control over the type of genetic modification and is flexible enough for either a single-base change or a larger regulatory insertion.

How does your technology of choice edit DNA? What are the essential steps?

A guide RNA (gRNA) directs the Cas9 nuclease to a specific genomic sequence. The essential steps are: target recognition, DNA cleavage, and DNA repair.

What preparation do you need to do and what is the input for the editing?

Design a specific guide RNA targeting MBL2 and verify that it has minimal off-target sites using bioinformatic tools. Inputs: Cas9 protein, guide RNA, a delivery system, and the target cells.
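As a rough illustration of the first step of guide design, the sketch below scans a DNA sequence for 20-nt protospacers followed by an NGG PAM. It assumes the commonly used SpCas9 enzyme (the answer above does not specify a Cas9 variant), searches only the forward strand, and does no off-target scoring; the toy sequence is made up and is not MBL2.

```python
import re


def find_spcas9_targets(seq, spacer_len=20):
    """Return (start, protospacer, PAM) for every NGG PAM site on the forward strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))"
    return [(m.start(), m.group(1), m.group(2)) for m in re.finditer(pattern, seq)]


# Hypothetical toy sequence: a 20-nt protospacer, a TGG PAM, and 3 extra bases.
toy = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"

for start, spacer, pam in find_spcas9_targets(toy):
    print(start, spacer, pam)
```

A real design would also scan the reverse strand and rank the candidates with a dedicated off-target tool.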

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Efficiency: HDR efficiency is often low in non-dividing cells, and editing rates depend on the cell type and genomic context.

Precision: Potential off-target edits if the guide RNA is not highly specific and NHEJ repair may introduce unintended insertions or deletions.

Week 2 HW: Lab Automation

Opentrons Artwork

opentrons-art.rcdonovan.com/?id=oevp91e27i3m061

Post-Lab Questions

Find and describe a published paper that utilizes the Opentrons

This article combines an open‑source liquid‑handling robot (Opentrons OT‑One‑S Hood) with four interchangeable modules that perform magnetic‑bead DNA isolation, isothermal recombinase polymerase amplification (RPA) of the ctrA gene, exonuclease digestion to generate single‑stranded DNA, and detection on a paper‑based vertical‑flow microarray (VFM) using anti‑biotin gold nanoparticles for colorimetric read‑out.

Magnetic Dynabeads capture pathogen DNA from cerebrospinal fluid or water samples, after which the beads are washed and the DNA is released into the amplification mix. The RPA reaction runs at 37 °C, using a biotin-labelled forward primer and a 5′-phosphate reverse primer; the resulting double-stranded amplicons are digested by Lambda exonuclease, which removes the phosphorylated strand and leaves a biotin-tagged single strand for hybridisation. The single-stranded amplicons are applied to nitrocellulose VFM spots that contain capture probes for ctrA; anti-biotin gold nanoparticles bind the biotin tag, and a signal-enhancement solution produces a visible colour change at positive spots.

Because all steps are scripted on the Opentrons platform, the robot performs liquid handling automatically, reducing hands-on time to 110 min for eight samples (about 18 % faster than manual processing) and lowering consumable cost to roughly USD 16 per sample. The open-source hardware and standard lab consumables make the system adaptable to other pathogens and suitable for deployment in low-resource settings.

Write a description about what you intend to do with automation tools for your final project

For my final project, I intend to develop a low-cost diagnostic test for the detection of Dengue virus (DENV), optimized for use in low-resource settings. The system would integrate:

A lateral-flow or paper-based detection platform

Automation in the creation and production of the test

The goal is to reduce human error, increase reproducibility, and lower production costs while maintaining diagnostic sensitivity and specificity.

Final Project Ideas

Week 4: Protein Design Part I

Part A. Conceptual Questions

How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

Considering that 500 g of meat contains 20% protein, we would have 100 g of protein. After converting the protein mass to moles (assuming an average amino acid mass of 100 Da), this corresponds to approximately 6.022 × 10²³ amino acid molecules.
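The arithmetic behind this estimate can be written out explicitly. The 20% protein content and the 100 Da average residue mass are the assumptions stated in the answer above.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(meat_g=500.0, protein_fraction=0.20, residue_mass_da=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction   # 500 g x 0.20 = 100 g of protein
    moles = protein_g / residue_mass_da     # 100 g / (100 g/mol) = 1 mol
    return moles * AVOGADRO


print(f"{amino_acid_molecules():.3e} amino acid molecules")
```

Because 100 g of protein at 100 g/mol is exactly one mole, the answer is simply Avogadro's number.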

Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Because these proteins are broken down during digestion, reducing them to amino acids and peptides that are reused to synthesize our own proteins according to our genetic instructions.

Why are there only 20 natural amino acids?

There are 20 natural amino acids because early in evolution the genetic code incorporated a set of amino acids that provided sufficient chemical diversity to build functional proteins, and once this translation system (involving tRNAs and aminoacyl-tRNA synthetases) became established, it was evolutionarily conserved and effectively “frozen,” making the addition of new amino acids highly disadvantageous.

Can you make other non-natural amino acids? Design some new amino acids.

Yes, chemically it is possible to design new amino acids, but incorporating them biologically into proteins is difficult because they require their own tRNA, a matching aminoacyl-tRNA synthetase, and a reassigned codon. (Pending: design an amino acid.)

Where did amino acids come from before enzymes that make them, and before life started?

Before life began, amino acids likely formed through abiotic chemical reactions on the primitive Earth, driven by energy sources such as lightning, UV radiation, volcanic activity, and hydrothermal systems, and were possibly also delivered by meteorites. Experiments such as the Miller–Urey experiment demonstrated this kind of abiotic synthesis.

If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

Left-handed

Can you discover additional helices in proteins?

Yes, additional helices are physically possible, and we can design new ones by modifying the amino acid chemistry or the backbone structure.

Why are most molecular helices right-handed?

Most molecular helices are right-handed because the chiral building blocks of life—such as L-amino acids and D-sugars—constrain backbone geometry in a way that makes right-handed helices energetically more stable and sterically favorable.

Why do β-sheets tend to aggregate?

Because their extended structure exposes backbone hydrogen bond donors and acceptors, allowing β-strands from different molecules to form stable intermolecular hydrogen bonds.

What is the driving force for β-sheet aggregation?

The main driving force is the formation of intermolecular backbone hydrogen bonds, reinforced by hydrophobic interactions that stabilize the aggregate and lower the system’s free energy.

Part B. Protein Analysis and Visualization

Briefly describe the protein you selected and why you selected it.

Furin is a calcium-dependent protease that activates precursor proteins by cleaving them at specific basic amino acid recognition sequences in the Golgi apparatus. I selected it because it’s a protein I am currently interested in.

Identify the amino acid sequence of your protein.

5JXG_1|Chain A|Furin|Homo sapiens (9606)

DVYQEPTDPKFPQQWYLSGVTQRDLNVKAAWAQGYTGHGIVVSILDDGIEKNHPDLAGNYDPGASFDVNDQDPDPQPRYTQMNDNRHGTRCAGEVAAVANNGVCGVGVAYNARIGGVRMLDGEVTDAVEARSLGLNPNHIHIYSASWGPEDDGKTVDGPARLAEEAFFRGVSQGRGGLGSIFVWASGNGGREHDSCNCDGYTNSIYTLSISSATQFGNVPWYSEACSSTLATTYSSGNQNEKQIVTTDLRQKCTESHTGTSASAPLAAGIIALTLEANKNLTWRDMQHLVVQTSKPAHLNANDWATNGVGRKVSHSYGYGLLDAGAMVALAQNWTTVAPQRKCIIDILTEPKDIGKRLEVRKTVTACLGEPNHITRLEHAQARLTLSYNRRGDLAIHLVSPMGTRSTLLAARPHDYSADGFNDWAFMTTHSWDEDPSGEWVLEIENTSEANNYGTLTKFTLVLYGTASGSLVPRGSHHHHHH

How long is it? What is the most frequent amino acid?

It is 482 amino acids long. The most common amino acid is G, which appears 47 times.
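Length and composition can be checked with a few lines of Python. The `summarize` helper below is an illustrative sketch; pasting the full FASTA sequence (without the header line) into it would reproduce the count for the whole protein.

```python
from collections import Counter


def summarize(seq):
    """Return (length, most frequent residue, its count) for a protein sequence."""
    seq = "".join(seq.split()).upper()
    residue, count = Counter(seq).most_common(1)[0]
    return len(seq), residue, count


# Short made-up example; replace with the full 5JXG chain A sequence.
length, residue, count = summarize("GGAGS")
print(length, residue, count)
```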

How many protein sequence homologs are there for your protein?

230

Does your protein belong to any protein family?

Proprotein convertase

Identify the structure page of your protein in RCSB

5JXG | pdb_00005jxg

When was the structure solved? Is it a good quality structure?

It was solved in 2016 at a resolution of 1.80 Å, which indicates a good-quality structure.

Are there any other molecules in the solved structure apart from protein?

Yes: Ca²⁺, Cl⁻ and Na⁺.

Does your protein belong to any structure classification family?

Serine endoprotease, hydrolase

Open the structure of your protein in any 3D molecule visualization software

Cartoon, ribbon, and ball-and-stick representations.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type.

What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Hydrophobic residues are mainly buried in the protein core, while hydrophilic residues are predominantly exposed on the surface, consistent with typical folding of a soluble globular protein.

C1. Protein Language Modeling

Deep Mutational Scans

In the mutational scan of furin, one clear pattern is the presence of vertical dark bands that reflect unfavorable mutations at specific positions, which likely correspond to critical residues. For example, Ser368 is essential in the serine protease catalytic triad, and nearly all substitutions at this position are predicted to be highly deleterious.

Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

Yes, it’s very similar.

Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

I mutated the amino acids in the active site of furin to render the protein inactive; specifically, I introduced the substitutions Ser → Ala (S368A), His → Ala (H194A), and Asp → Asn (D153N).

The structure is almost the same. I then changed the P-domain, and the majority of the protein still remained intact, which makes this protein resilient to mutations.

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN

Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one.

The heatmap shows that many positions in the sequence strongly favor a single amino acid, while alternative residues have very low probabilities. This suggests that these positions are highly constrained, likely due to structural or functional requirements of the protein. Only a limited number of positions show broader amino acid tolerance, indicating potential sites where mutations might be more acceptable. This pattern is consistent with proteins that contain structurally important or catalytic regions, where mutations are less tolerated.

D. Group Brainstorm on Bacteriophage Engineering

GROUP FINAL PROJECT

Week 5 HW: Genetic circuits part 1

Assignment: DNA Assembly

What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion High-Fidelity DNA polymerase: Enzyme responsible for synthesizing new DNA strands. It also possesses 3′→5′ exonuclease (proofreading) activity, which reduces errors during replication.

dNTPs (deoxynucleotide triphosphates): Nucleotide substrates incorporated by DNA polymerase into the elongating DNA strand during synthesis

Reaction buffer: Contains compounds such as Tris-HCl, which maintains the correct pH, and salts such as KCl, which help stabilize primer binding and enzyme activity.

MgCl₂: Essential cofactor required for DNA polymerase catalytic activity.

What are some factors that determine primer annealing temperature during PCR?

Melting temperature (Tm): The temperature at which half of the primer–template duplexes dissociate. It depends largely on the primer nucleotide composition, particularly the GC content.

Primer length: Longer primers have higher melting temperatures because they form more base-pair interactions with the template.

Secondary structures: May require adjustment of the annealing temperature.

Reaction conditions: can alter primer–template stability and thus influence the optimal annealing temperature.
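The dependence of Tm on length and GC content can be illustrated with the Wallace rule, a rough estimate for short oligos: Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This is a simplified sketch; primer design tools use nearest-neighbor thermodynamics and salt corrections, and polymerase suppliers provide their own Tm calculators. The example primer is the first 18 bases of the MBL coding sequence from Week 2, used here only for illustration.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature in degrees C using the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


# Illustrative primer only, not a validated design.
primer = "ATGAGCCTGTTTCCGAGC"
print(len(primer), round(gc_content(primer), 2), wallace_tm(primer))
```

Adding bases or swapping A/T for G/C raises the estimate, which matches the two trends described in the list above.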

There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR and restriction digests serve different purposes: PCR amplifies a specific DNA fragment using primers and an enzyme like Taq polymerase, making it ideal when you need a lot of a precise sequence or start with very little DNA. In contrast, restriction enzymes such as EcoRI cut DNA at specific sites, which is useful for cloning or checking constructs. So basically, PCR = amplify, restriction digest = cut, and they’re often used together.

How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Primers should be designed to add short overlapping homologous sequences (typically about 20–40 bp) between the insert and the vector. After PCR or digestion, fragments are verified by gel electrophoresis to confirm the correct size and purity. The overlap regions should also be checked in silico, and the predicted digest can be simulated in Benchling to verify the expected band sizes.
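The in silico overlap check can be sketched as a comparison between the end of one fragment and the start of the next. The `shared_overlap` helper, the 15 bp minimum, and the toy sequences below are illustrative assumptions, not values taken from the lab protocol.

```python
def shared_overlap(upstream, downstream, min_len=15):
    """Return the longest suffix of `upstream` that is also a prefix of `downstream`.

    Returns an empty string if the shared region is shorter than `min_len`.
    """
    up, down = upstream.upper(), downstream.upper()
    for n in range(min(len(up), len(down)), min_len - 1, -1):
        if up[-n:] == down[:n]:
            return up[-n:]
    return ""


# Hypothetical fragments sharing a 20 bp homology region.
homology = "ACGTTGCAACGTTGCAGGCC"
insert = "AAAA" + homology
vector = homology + "TTTT"

overlap = shared_overlap(insert, vector)
print(len(overlap), overlap)
```

For a circular assembly the same check would be repeated for every junction, including the one that closes the plasmid.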

How does the plasmid DNA enter the E. coli cells during transformation?

Heat shock: An abrupt temperature change creates transient pores in the bacterial cell envelope.

Electroporation: A high-voltage electrical pulse creates transient pores in the bacterial cell envelope.

In both methods the cells are shocked, causing the cell membrane to “open up” so that plasmid DNA can enter.

Describe another assembly method in detail

Golden Gate Assembly

Allows the assembly of multiple DNA fragments in a single reaction, using a Type IIS restriction enzyme, which cuts DNA outside of its recognition sequence to create specific overhangs. DNA fragments and the vector are designed so that these overhangs are complementary, ensuring that the fragments assemble in the correct order. During the reaction, the restriction enzyme cuts the DNA while T4 DNA Ligase simultaneously ligates the compatible ends. Because the restriction sites are removed after ligation, the final assembled plasmid cannot be cut again, which improves assembly efficiency. This method is widely used for assembling multiple fragments quickly and accurately in a one-pot reaction.
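Because correct ordering depends on the overhangs, a design is usually screened so that every 4-nt overhang is unique, is not the reverse complement of another overhang, and is not palindromic (a palindromic overhang can pair with itself). The sketch below performs that screen; the overhang sequences are hypothetical examples.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_overhangs(overhangs):
    """Return a list of human-readable problems found in a set of overhangs."""
    problems = []
    seen = set()
    for oh in (o.upper() for o in overhangs):
        rc = reverse_complement(oh)
        if oh == rc:
            problems.append(f"{oh} is palindromic")
        if oh in seen or rc in seen:
            problems.append(f"{oh} duplicates or complements an earlier overhang")
        seen.add(oh)
    return problems


# Hypothetical overhang sets for a three-junction assembly.
good_set = ["AATG", "GCTT", "TACC"]
bad_set = ["AATG", "ACGT", "CATT"]

print(check_overhangs(good_set))
print(check_overhangs(bad_set))
```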

Assignment: Asimov Kernel

Create a Repository for your work
Create a blank Notebook entry to document the homework and save it to that Repository
Explore the devices in the Bacterial Demos Repo to understand how the parts work together by running the Simulator on various examples, following the instructions for the simulator found in the “Info” panel (click the “i” icon on the right to open the Info panel)
Create a blank Construct and save it to your Repository

Build three of your own Constructs using the parts in the Characterized Bacterials Parts Repo

a) A dengue-responsive genetic sensor based on a derepression mechanism. An inducible promoter continuously drives the expression of the repressor TetR, which binds to the Tet-regulated promoter (PTet) and blocks transcription of the reporter gene (sfGFP) under normal conditions, keeping fluorescence at basal levels. When dengue is present, its biomarker (E), represented as an inducible promoter, is recognized by an aptamer that functionally inhibits TetR. This inhibition prevents TetR from binding to PTet, thereby relieving repression and allowing transcription and translation of sfGFP. As a result, sfGFP fluorescence increases in the presence of NS1, meaning the construct converts dengue detection into a measurable fluorescent output, with low signal in the absence of the ligand and high signal when it is present.
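The expected low-to-high behavior of construct (a) can be sketched with a toy steady-state model in which the ligand inactivates TetR and the remaining active TetR represses the reporter through a Hill function. All parameter values below are hypothetical and are not taken from the Asimov Kernel simulator or fitted to data.

```python
def gfp_output(ligand, tetr_total=100.0, k_ligand=10.0, k_rep=5.0,
               hill_n=2.0, gfp_max=1.0, leak=0.02):
    """Toy steady-state reporter output (arbitrary units) for a derepression sensor.

    All parameters are illustrative assumptions:
      tetr_total - total TetR level
      k_ligand   - ligand level that inactivates half of the TetR
      k_rep      - active TetR level giving half-maximal repression
      hill_n     - Hill coefficient of repression
      leak       - basal expression under full repression
    """
    # TetR that escapes inhibition and can still bind the promoter.
    active_tetr = tetr_total / (1.0 + ligand / k_ligand)
    # Fraction of promoter activity left after repression by active TetR.
    promoter_activity = 1.0 / (1.0 + (active_tetr / k_rep) ** hill_n)
    return leak + (gfp_max - leak) * promoter_activity


for ligand in (0, 10, 100, 1000):
    print(ligand, round(gfp_output(ligand), 3))
```

In this sketch the output stays near the leak level without ligand and rises toward the maximum as more TetR is inactivated, which is the qualitative response described above.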

b) Three expression cassettes assembled in a single construct. Together, these cassettes implement a threshold-detection circuit that produces GFP only when the dengue-derived ligand is present above a sufficient concentration.

The circuit integrates two constitutively expressed repressors (TetR and LacI) and a dual-repressed reporter cassette, allowing the system to behave as a biological logic gate sensitive to ligand concentration. Correct functionality relies on specifically chosen promoters for each cassette, such as the use of pLtetO-1 in Cassette 2.

c) Genetic construct designed as a ligand-responsive transcriptional amplifier based on TetR–pTet regulation. The system operates through double negation and feedback, such that an external ligand indirectly activates gene expression by inhibiting a transcriptional repressor.

Final project

Week 5 HW: Protein design part 2

Part A: SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

To introduce the A4V mutation, I substituted the alanine at position 5 of the FASTA sequence (A5V), because the FASTA numbering counts the initiator methionine.

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Using PepMLM Colab with a K value of 1, I obtained:

WRYYAVAAAHKX 8.149419

In comparison with SOD1-binding peptide:

FLYRWLPSRRGG with a perplexity of 5.98

The generated peptides showed perplexity scores around 8, which are higher than the example. This indicates that the model assigns lower confidence to these sequences as potential binders to the mutant SOD1 protein.

Part 2: Evaluate Binders with AlphaFold3

Due to the fact that AlphaFold does not allow the analysis of proteins with undetermined amino acids, I ran the PepMLM Colab again using a K value of 2 so that all amino acids would be fully specified.

WRYYAAGVEHKE 15.984515

WRYYVVAAAHGX 11.686807

WRYYAAGAALKX 6.211397

WRYYVVAAALKE 15.368100

And with 3:

WHYPAVGAALWE 12.713875

WRYYAVVLAHKX 10.328839

WRYYVVALAHKE 16.276417

HLYYVVGVRWKE 27.442868

So I chose the four fully specified peptides (no X) with the lowest perplexity values:

WHYPAVGAALWE 12.713875

WRYYAAGVEHKE 15.984515

WRYYVVALAHKE 16.276417

WRYYVVAAALKE 15.368100
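The selection step can be reproduced with a short filter-and-sort over the K = 2 and K = 3 outputs listed above: peptides containing an undetermined residue (X) are dropped, and the remaining ones are ranked by perplexity. The `select_binders` helper is an illustrative name, not part of PepMLM.

```python
# PepMLM outputs from the K = 2 and K = 3 runs (peptide, perplexity).
candidates = [
    ("WRYYAAGVEHKE", 15.984515),
    ("WRYYVVAAAHGX", 11.686807),
    ("WRYYAAGAALKX", 6.211397),
    ("WRYYVVAAALKE", 15.368100),
    ("WHYPAVGAALWE", 12.713875),
    ("WRYYAVVLAHKX", 10.328839),
    ("WRYYVVALAHKE", 16.276417),
    ("HLYYVVGVRWKE", 27.442868),
]


def select_binders(cands, n=4):
    """Keep fully specified peptides (no X) and return the n with the lowest perplexity."""
    valid = [(pep, score) for pep, score in cands if "X" not in pep]
    return sorted(valid, key=lambda item: item[1])[:n]


selected = select_binders(candidates)
for peptide, perplexity in selected:
    print(peptide, perplexity)
```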

For the first peptide (WHYPAVGAALWE, perplexity 12.71):

The peptide binds along the outer surface of the SOD1 structure, lying across loops adjacent to the β-barrel rather than inserting into the core of the protein. It does not localize directly near the N-terminal region where the A4V mutation is located, nor does it clearly approach the predicted dimer interface.

For the second peptide (WRYYAAGVEHKE, perplexity 15.98):

The peptide does not localize directly near the N-terminal region where the A4V mutation resides, instead, the peptide lies primarily on the protein surface, with a portion potentially partially accommodated within a shallow surface groove. The model shows an ipTM value of 0.45 and a pTM of 0.88, indicating moderate confidence in the protein–peptide interaction while maintaining high confidence in the overall protein structure.

For the third peptide (WRYYVVALAHKE, perplexity 16.28):

The peptide remains largely separated from the protein and appears flexible and surface-exposed, suggesting a lack of stable binding. The model shows an ipTM value of 0.23 and a pTM value of 0.77, indicating low confidence in the predicted protein–peptide interaction despite moderate confidence in the overall protein structure

For the last peptide (WRYYVVAAALKE, perplexity 15.37):

The peptide remains largely surface-bound and extended, suggesting a relatively weak interaction with the protein. The model shows an ipTM value of 0.38 and a pTM value of 0.83, indicating moderate confidence in the overall protein structure but limited confidence in the predicted protein–peptide interaction.

The predicted ipTM values range from 0.22 to 0.45, indicating generally low to moderate confidence in the protein–peptide interactions. The models with ipTM values around 0.22–0.23 suggest weak or unlikely binding, while the highest value (0.45) shows a more plausible surface interaction with the SOD1 β-barrel. However, overall the predicted interfaces remain relatively weak, and none of the PepMLM-generated peptides clearly exceed the expected binding confidence of the known SOD1 binder.

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

The first one:

The second:

The third:

The last one:

Overall, the AlphaFold models show low confidence in peptide binding (ipTM ~0.22–0.45). The peptide with the highest ipTM sits closer to the protein, but stronger ipTM doesn’t clearly match stronger predicted affinity since all peptides show weak binding. All peptides are predicted to be soluble and non-hemolytic. WRYYAAGVEHKE has the best balance because it has the strongest predicted affinity (~5.59) and good therapeutic properties. I would advance WRYYAAGVEHKE for further testing.

Part 4: Generate Optimized Peptides with moPPIt

The PepMLM candidates appear more optimized from a therapeutic perspective than the moPPIt results. They consistently show excellent solubility, very low hemolysis, and more coherent physicochemical profiles, while also converging toward aromatic-rich motifs that may favor hydrophobic interactions with SOD1. In contrast, moPPIt reflects a broader multi-objective search space, with greater sequence diversity and stronger trade-offs between affinity, half-life, and motif conservation.

However, both groups still show relatively weak predicted binding affinities overall, suggesting that computational optimization is currently improving developability more effectively than absolute binding strength.

How would you evaluate these peptides before advancing them to clinical studies?

Before advancing any of these peptides toward clinical studies, I would first perform AlphaFold3 structural validation and MD simulations to confirm stable binding at the SOD1 A4V interface. From there, the most promising candidates would undergo chemical stabilization (e.g., cyclization or D-amino acid substitution), followed by in vitro aggregation inhibition assays and cell viability testing in motor neuron models.

Week 7: Genetic Circuits Part II: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks

What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Instead of relying on ON/OFF logic, they process information in a continuous and distributed way, closer to how real cells behave. This allows them to integrate multiple signals with different intensities and avoid the exponential complexity that arises when scaling Boolean circuits. In addition, IANNs are inherently more flexible, since their behavior can be tuned by adjusting interaction strengths rather than completely redesigning the system, enabling more complex and nonlinear decision-making. Overall, they provide a more efficient, scalable, and biologically realistic framework for intracellular computation.

Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

A useful application of an IANN would be an intracellular cancer classifier that integrates multiple biomarkers, such as oncogene expression, tumor suppressor loss, and metabolic changes, as continuous inputs. Instead of a simple Boolean response, it would compute a weighted, nonlinear decision and trigger outputs like pro-apoptotic gene expression only when the overall profile strongly indicates malignancy, improving specificity. However, limitations include difficulty in tuning interaction strengths, cellular noise, crosstalk with native pathways, and the challenge of reliably calibrating the system in vivo.
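The difference between a weighted, nonlinear decision and a Boolean one can be sketched with a single perceptron. The three inputs are assumed to be biomarker levels normalized to the range 0 to 1, and the weights and bias are hypothetical values chosen only to illustrate the behavior.

```python
import math


def sigmoid(x):
    """Smooth activation that maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def malignancy_score(inputs, weights=(3.0, 3.0, 2.0), bias=-5.0):
    """Weighted sum of biomarker levels passed through a sigmoid.

    inputs: (oncogene expression, tumor suppressor loss, metabolic change),
    each normalized to 0-1. Weights and bias are illustrative assumptions.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


profiles = {
    "healthy": (0.1, 0.0, 0.1),
    "one marker high": (1.0, 0.0, 0.0),
    "all markers moderate": (0.5, 0.5, 0.5),
    "all markers high": (1.0, 1.0, 1.0),
}

for name, levels in profiles.items():
    score = malignancy_score(levels)
    print(name, round(score, 3), "trigger apoptosis" if score > 0.5 else "no action")
```

With these example weights, a single strong marker is not enough to trigger the output; only the combined profile pushes the score above the threshold, which is the specificity gain described above.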

Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

Assignment Part 2: Fungal Materials

What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

Derived from mycelium:

Mycelium-based packaging: Used as a substitute for polystyrene foam in protective packaging.

Leather: Used in fashion as vegan leather alternatives.

Construction materials: Bricks, insulation panels, and composites used in architecture for lightweight, biodegradable building components

They are advantageous because they are biodegradable, renewable, and can be grown with low energy input using agricultural waste, making them much more sustainable than petroleum-based plastics or animal-derived materials.

What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

Advantages over bacteria for synthetic biology:

They are eukaryotic organisms that can perform complex post-translational modifications and proper protein folding required for many eukaryotic proteins.

They have a strong natural capacity for protein secretion, which simplifies production and purification of recombinant proteins.

They support more complex genetic regulation and larger gene constructs, making them suitable for expressing multi-step pathways.

Their cellular organization and scalability in industrial fermentation further make them attractive platforms for producing pharmaceuticals, enzymes, and biomaterials, although they are typically slower growing and less genetically tractable than bacteria.

Part 3: First DNA Twist Order

This insert is designed to be cloned into the pUC19 plasmid for expression in E. coli. The construct includes a pTet inducible promoter, allowing controlled expression of the gene of interest.

Week 9: Cell-Free Systems

Assignment Part A: General and Lecturer-Specific Questions

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

You can control everything directly. You can adjust pH, temperature, ion concentrations, and cofactors and even add or remove components during the reaction. Also, it’s faster since you don’t need cell growth or maintenance.

In terms of control, you can precisely tune the system and avoid cellular regulation that might limit expression.

Cases where it’s more beneficial:

Expression of toxic proteins, because they would kill the cells
Rapid protein production for screening experiments, testing many variants quickly

Describe the main components of a cell-free expression system and explain the role of each component.

Cell extract: Provides the machinery (ribosomes, tRNAs, enzymes) for transcription and translation
DNA or mRNA template: Contains the gene for the protein to be expressed
Amino acids: Building blocks for the protein
Energy system (ATP, GTP): Powers transcription and translation
Buffers and salts (Mg²⁺, K⁺): Maintain optimal conditions for the reaction

Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Because ATP and GTP are consumed very quickly during transcription and translation; if they run out, protein synthesis stops.

To maintain ATP levels, phosphoenolpyruvate (PEP) can be added as a secondary energy source. Pyruvate kinase in the extract transfers the phosphate group from PEP to ADP, continuously regenerating ATP during the reaction.

Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic systems are faster, cheaper, and give high yields, but they lack complex post-translational modifications.
Eukaryotic systems are slower and more expensive, but they allow proper folding and modifications like glycosylation.

Examples:

Prokaryotic: GFP, because it’s simple and doesn’t need modifications
Eukaryotic: an antibody, because it requires correct folding and post-translational modifications to function properly

How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

To mimic the membrane: Add liposomes or nanodiscs
Keep the protein soluble: Use mild detergents
Adjust conditions like Mg²⁺, temperature, and protein concentration
Include chaperones to help folding of the protein

Challenges and how to address them:

Misfolding: Membrane proteins don’t fold well without lipids → Solution: add liposomes or nanodiscs + chaperones
Aggregation: Hydrophobic regions stick together → Solution: use mild detergents and lower expression rate (e.g., lower temperature)
Low solubility/insertion efficiency → Solution: optimize lipid composition and Mg²⁺/salt conditions

Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Poor DNA/template quality or design → Troubleshooting: use a stronger promoter, optimize codons, check DNA purity
Energy depletion → Troubleshooting: improve the energy regeneration system (e.g., add PEP or increase substrates)
Protein misfolding or degradation → Troubleshooting: add chaperones, lower temperature, optimize reaction conditions

Assignment Part A: question from Kate Adamala

Pick a function and describe it.

What would your synthetic cell do?

It would act as a diagnostic biosensor for the detection of DENV. Its primary function is to transduce the presence of the viral E protein into a detectable signal, utilizing a two-stage recognition system:

External Sensor: A specific antibody and aptamer recognize the DENV E protein; this recognition event triggers the release of theophylline, which acts as a messenger molecule.

Internal Sensor (Synthetic cell): The theophylline binds to a theophylline riboswitch, inducing a conformational change that exposes the ribosome binding site (RBS), allowing the translation of the LacZ reporter gene, generating a detectable signal.

What is the input and what is the output? (synthetic cell)

Input: Theophylline released by the external sensor

Output: Expression of beta-galactosidase (LacZ)

Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

Yes. This process can be carried out using only a cell-free Tx/Tl system. In this simplified setup, theophylline diffuses freely to activate LacZ expression without the need for encapsulation.

Could this function be realized by genetically modified natural cell?

It’s possible, but a cell-free system is more economical and streamlined than traditional cell-based methods for this purpose.

Describe the desired outcome of your synthetic cell operation.

The primary objective is to develop a sensitive, specific, user-friendly, and cost-effective biosensor for DENV detection. The desired outcome of the synthetic cell operation itself, however, is the expression of beta-galactosidase (LacZ) in the presence of theophylline.

Design all components that would need to be part of your synthetic cell.

What would be the membrane made of?

Phospholipid liposomes and polymersomes to ensure membrane stability, permeability, and biocompatibility

What would you encapsulate inside? Enzymes, small molecules.

Genetic circuit, E. coli extract for Tx/Tl machinery and chromogenic substrates for β-galactosidase

Which organism your Tx/Tl system will come from? Is bacterial OK, or do you need a mammalian system for some reason?

E. coli

How will your synthetic cell communicate with the environment? (hint: are substrates permeable? or do you need to express the membrane channel?)

Utilizing an alpha-hemolysin channel to ensure the diffusion of theophylline and ONPG (the substrate for beta-galactosidase)

Experimental details

List all lipids and genes. (bonus: find the specific genes; for example, instead of just saying “small molecule membrane channel” pick the actual gene.)

Lipids: POPC, cholesterol, DSPE-PEG2000
Enzymes: E. coli bacterial cell-free Tx/Tl
Genes: alpha-hemolysin, LacZ

How will you measure the function of your system?

System functionality is quantified through beta-galactosidase-mediated hydrolysis of ONPG
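A minimal sketch of how the ONPG readout could be converted into a concentration using the Beer–Lambert law is shown below. The extinction coefficient of o-nitrophenol (about 4500 M⁻¹ cm⁻¹ at 420 nm) and the 1 cm path length are assumptions; the coefficient depends on pH and should be checked against a standard curve prepared in the same buffer.

```python
# Assumed molar extinction coefficient of o-nitrophenol at 420 nm (M^-1 cm^-1).
# The value is pH-dependent; verify it against your own standard curve.
EPSILON_ONP = 4500.0
PATH_LENGTH_CM = 1.0  # assumed cuvette path length


def onp_concentration_uM(a420_sample, a420_blank):
    """Convert blank-corrected A420 into o-nitrophenol concentration (micromolar)."""
    corrected = a420_sample - a420_blank
    return corrected / (EPSILON_ONP * PATH_LENGTH_CM) * 1e6


def hydrolysis_rate_uM_per_min(a420_sample, a420_blank, minutes):
    """Average ONPG hydrolysis rate over the incubation time."""
    return onp_concentration_uM(a420_sample, a420_blank) / minutes


# Hypothetical readings: sample A420 = 0.50, blank A420 = 0.05, 30 min incubation.
print(round(onp_concentration_uM(0.50, 0.05), 1))
print(round(hydrolysis_rate_uM_per_min(0.50, 0.05, 30), 2))
```

Comparing the rate with and without theophylline gives a simple fold-activation number for the synthetic cell.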

Assignment Part A: question from Peter Nguyen

First Idea

Write a one-sentence summary pitch sentence describing your concept

Walls that use cell-free photosynthetic systems integrated into the material to capture CO₂ from the air and transform it locally into oxygen during daily sun exposure.

How will the idea work, in more detail? Write 3-4 sentences or more.

The building’s exterior walls incorporate modular layers containing cell-free systems based on artificial or reconstituted photosynthetic pathways. These modules are activated by sunlight and controlled ambient humidity, triggering biochemical reactions that capture CO₂ from the air and release oxygen as a byproduct

What societal challenge or market need will this address?

It addresses the urgent need to reduce CO₂ in dense urban environments by proposing an architecture that not only minimizes its footprint but also actively acts as environmental infrastructure, which is especially relevant in cities with high pollution and constant solar exposure.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

Activation is controlled by micro-encapsulation, which regulates water ingress and prevents unwanted continuous activation. For stability, the systems are formulated with stabilizers and housed in replaceable cartridges within the façade, allowing for periodic maintenance. Acknowledging the one-time or limited-cycle nature of these reactions, the idea is to treat them as consumable layers, similar to filters, integrated into a building renewal cycle without compromising its main structure.

Second Idea

Write a one-sentence summary pitch sentence describing your concept

An eco-friendly, ’living’ window coating utilizing freeze-dried cell-free systems to express UV-active chromoproteins that create avian-visible warning patterns while remaining perfectly transparent to the human eye.

How will the idea work, in more detail? Write 3-4 sentences or more.

A cell-free platform is embedded in a transparent, porous biopolymer matrix applied to glass surfaces. When activated, the machinery expresses specialized UV-chromoproteins or enzymes that produce pigments with high absorbance/reflectance in the 300–400 nm range, which falls within the tetrachromatic visual spectrum of birds. These biological “inks” are arranged in specific geometric patterns that birds recognize as solid obstacles, triggering an avoidance response. Because humans lack UV photoreceptors and the proteins do not scatter visible light, the window appears clear to us while appearing “solid” or patterned to birds.

What societal challenge or market need will this address?

Up to one billion birds are estimated to die each year in the United States alone from collisions with glass windows, making it a leading cause of avian mortality and biodiversity loss. While UV-reflective stickers and “fritted” glass exist, they are aesthetically unpleasing or lose effectiveness over time. There is a significant market need for sustainable, “invisible” retrofitting solutions for residential and commercial skyscrapers that can protect migratory species without altering architectural aesthetics.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

Stability: The cell-free machinery is freeze-dried within a protective matrix of trehalose or synthetic polymersomes, allowing it to remain dormant and stable during transport and installation.
Activation: The system is designed for hydro-activation; ambient humidity or rain triggers the controlled release of water into the micro-compartments, initiating protein synthesis precisely when the risk of collision (often higher in overcast/rainy weather) is present.
Longevity: To address the one-time use limit, the system would incorporate genetic circuits for protein stability and “slow-release” mechanisms where the chromoproteins are cross-linked to the matrix, ensuring the UV signal persists for several months before a simple, biodegradable “recharge” spray is required.

Assignment Part A: question from Ally Huang

Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

High‑energy heavy ions (HZE radiation) are a major hazard in deep‑space missions and are known to cause complex DNA and protein damage. Understanding how radiation accelerates molecular aging is essential for astronaut health, long‑duration missions, and space habitation. Cell‑free systems provide a lightweight, non‑living platform to directly study radiation‑induced molecular damage without confounding cellular repair mechanisms. This makes them ideal for spaceflight experiments, where resources are limited and biological containment is critical. Studying molecular aging in space has direct implications for human longevity, cancer risk, and the stability of biological systems beyond Earth.

Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

Green fluorescent protein (GFP)–encoding DNA and expressed GFP protein as molecular reporters of radiation‑induced damage and aging.

Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

GFP is a well‑characterized protein whose fluorescence is sensitive to errors in DNA transcription, protein folding, and structural integrity. Damage from HZE radiation may reduce protein yield, alter folding efficiency, or degrade fluorescence intensity. Using GFP in a BioBits® cell‑free system allows direct measurement of radiation‑induced molecular aging without cellular repair or replication. Differences in expression level or fluorescence provide a clear molecular readout of how space radiation affects fundamental biological processes.

Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

Exposure to HZE radiation causes measurable molecular aging in cell‑free systems, leading to decreased protein expression efficiency and reduced fluorescence intensity. Cell‑free reactions lack DNA repair mechanisms and protein turnover, making them highly sensitive indicators of cumulative radiation damage. If HZE radiation accelerates molecular aging, irradiated GFP‑encoding DNA or protein synthesis machinery will produce less functional GFP compared to non‑irradiated controls. By comparing fluorescence output, this experiment will isolate the direct effects of space radiation on molecular stability and function, independent of living cells.

Outline your experimental plan—identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

Freeze‑dried BioBits® reactions containing GFP DNA will be exposed to HZE radiation and compared to Earth‑based and flight non‑irradiated controls. Reactions will be rehydrated simultaneously and incubated under identical conditions. GFP fluorescence intensity will be measured using the P51 Molecular Fluorescence Viewer. Controls include unexposed freeze‑dried reactions and reactions expressing pre‑folded GFP. Data will consist of fluorescence intensity and expression consistency across conditions.

Labs

Lab writeups:

Week 1 Lab: Pipetting

Projects

Final projects:

Pattern-Based Rapid Diagnostic Platform for Dengue Virus Using Aptamers
Group Final Project
Computational Engineering of the MS2 Lysis Protein to Improve Stability, Titers, and Toxicity After reviewing the provided literature on the MS2 lysis protein (L) and discussing the project aims, our group has decided to focus on three interconnected goals: Goal 1: Increase the stability of the L protein As the “easiest” goal, it is the most computationally tractable. A stabilized protein is less prone to degradation and misfolding, which could directly lead to higher functional titers and serve as a robust starting point for any subsequent engineering.

Pattern-Based Rapid Diagnostic Platform for Dengue Virus Using Aptamers

Abstract

Dengue rapid tests targeting NS1 protein perform inconsistently across infection stages, delaying diagnosis in the low-resource settings where the disease hits hardest. This project develops a cell-free, aptamer-based biosensor that targets EDIII — a more immunogenically distinct domain of the dengue E protein — to achieve more reliable, stage-independent detection without cold chain or laboratory infrastructure. The core system couples three integrated modules: a capture antibody and aptamer selected to bind non-competing EDIII epitopes, confirmed via protein–protein interaction modeling; an aptamer–blocker–theophylline construct engineered for low leak and EDIII-triggered release; and a cell-free circuit linked through a theophylline riboswitch to drive LacZ expression as the final colorimetric signal. Aim 2 translates this system into a portable rapid test format. Aim 3 validates diagnostic performance and expands applicability across serotypes and sample matrices

Project aims

Develop an integrated molecular workflow for DENV detection

‘Build a functional cell-free biosensor in silico for dengue virus detection by designing and integrating three coupled molecular modules targeting the EDIII domain of the dengue E protein’

•1.1 Select a capture antibody and DNA aptamer with non-competing EDIII binding sites, validated through protein–protein interaction modeling, to enable sandwich-style recognition without steric interference.

•1.2 Design an aptamer–blocker–theophylline construct with low basal leak and confirmed EDIII-triggered theophylline release, optimized through sequence engineering and in vitro characterization.

•1.3 Implement a cell-free transcription–translation system regulated by a theophylline riboswitch driving LacZ expression as a colorimetric output, and verify end-to-end signal activation in response to recombinant EDIII.

Design and prototype a portable rapid test format

‘Execute the experimental validation and physical integration of all three modules.’

•2.1 Synthesize the aptamer and blocker; validate antibody–aptamer non-competition by competitive ELISA; and assemble and test the TXTL–riboswitch–LacZ circuit, confirming theophylline-dependent colorimetric output via X-gal.

•2.2 Characterize EDIII-triggered strand displacement and theophylline release by native PAGE; integrate all three modules in solution and confirm end-to-end colorimetric output 

•2.3 Lyophilize TXTL–Xgal components onto Whatman paper discs with trehalose as stabilizer; rehydrate with EDIII-spiked samples and confirm visual readout at room temperature.

Validate performance and expand diagnostic applicability

‘If fully realized, this platform could redefine point-of-care dengue diagnostics by replacing traditional tests with a programmable, aptamer-driven biosensor accessible to anyone, anywhere’

•3.1 Evaluate sensitivity, specificity, and limit of detection against commercial NS1 tests and RT-PCR across infection stages
•3.2 Assess adaptability across serotypes and evaluate extension to other flaviviruses, enabling a single platform to support both individual diagnosis and community-level outbreak surveillance.
•3.3 Deliver a pharmacy-available, self-administered, room-temperature-stable test that is readable by eye: no clinic, no equipment, just accurate early diagnosis in the hands of the patients who need it most, together with decentralized data for real-time epidemiological response.
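The performance metrics named in Aim 3.1 can be computed from a 2×2 comparison against RT-PCR as the reference standard. The sketch below shows the arithmetic; the counts are hypothetical placeholders, not experimental results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table versus RT-PCR."""
    return {
        "sensitivity": tp / (tp + fn),  # infected samples correctly called positive
        "specificity": tn / (tn + fp),  # uninfected samples correctly called negative
        "ppv": tp / (tp + fp),          # probability a positive result is a true case
        "npv": tn / (tn + fn),          # probability a negative result is truly negative
    }


# Hypothetical counts for illustration only (not experimental data).
metrics = diagnostic_metrics(tp=45, fp=5, tn=95, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Running the same function on the counts obtained with a commercial NS1 test would give a like-for-like comparison at each infection stage.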

Background

Summaries of two peer-reviewed research articles:

Selection and Characterization of DNA Aptamers Targeting All Four Serotypes of Dengue Viruses. https://doi.org/10.1371/journal.pone.0131240

Chen et al. (2015) reported a DNA aptamer (S15) capable of binding the envelope protein domain III (EDIII) of dengue virus with high affinity, selected by SELEX against recombinant DENV-2 EDIII. The aptamer adopts a G-quadruplex structure, with both the quadruplex fold and the 5′ sequence region essential for binding activity, and NMR titration experiments mapped its binding site to a highly conserved loop between the βA and βB strands of EDIII. Critically, S15 demonstrated neutralization activity against all four DENV serotypes, confirming that EDIII represents a serotype-conserved and immunogenically accessible target that a single aptamer can engage with broad cross-serotype coverage. This work is important because it establishes the molecular rationale for targeting EDIII with a DNA aptamer in my project, directly informing the recognition module design of the aptamer–blocker–theophylline construct developed in Aim 1.

Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components

Pardee et al. (2016) established a landmark proof of concept for freeze-dried, cell-free (FDCF) diagnostic systems by embedding Zika virus–responsive toehold switch sensors into lyophilized paper discs that could be rehydrated with patient sample and read colorimetrically within one hour at room temperature. The system required no cold chain, no laboratory infrastructure, and no trained personnel, demonstrating that the full transcriptional and translational machinery of an E. coli TXTL system could survive lyophilization and retain functional activity after ambient storage. The paper further showed that toehold switch RNA sensors could be coupled to reporter genes — including LacZ — to produce visually interpretable colorimetric outputs, and that specificity could be maintained against closely related viral sequences including dengue. This work is the foundational precedent for the cell-free module of this project and validates the core architectural assumption that a toehold switch–LacZ TXTL circuit embedded on paper is a feasible and field-ready diagnostic format.

Novelty and Innovation

Existing dengue rapid diagnostics are architecturally static: they rely primarily on NS1 antigen detection or antibody-based serological targets, and cannot be reprogrammed when viral variants emerge or when a new pathogen requires detection. This project replaces that fixed architecture with a modular, programmable system in which each component (the aptamer, the blocker strand, the riboswitch, and the reporter) can be independently redesigned and reordered as a synthetic DNA sequence, enabling rapid adaptation without rebuilding the platform from scratch. Multivalent AuNP-mediated amplification of theophylline release, upstream of the circuit, is intended to transform a linear 1:1 detection stoichiometry into a signal-amplified cascade that pushes sensitivity toward the picomolar range without adding instrumentation. More broadly, the project expands the synthetic biology toolkit for diagnostics by demonstrating that a toehold-mediated strand displacement mechanism can serve as a chemical transducer between a protein recognition event and a nucleic acid–regulated gene expression circuit, a conceptual bridge between protein biosensing and programmable cell-free gene regulation that has broad applicability beyond dengue.

Why This Project Matters

Dengue is a systemic viral infection transmitted between humans by Aedes mosquitoes, with an estimated 390 million infections occurring annually across tropical and subtropical regions, placing an enormous burden on health systems across Latin America, Southeast Asia, and sub-Saharan Africa, regions where laboratory infrastructure is scarce and cold chain logistics are unreliable (Bhatt et al., 2013). In 2024 alone, over 14 million cases and nearly 9,500 deaths were recorded globally, representing a 12-fold increase compared to 2014, underscoring the accelerating urgency of this diagnostic gap (Haider et al., 2025). Current NS1 rapid tests achieve only 50–80% sensitivity, perform significantly worse in secondary infections, and lose reliability after day 7 of illness, meaning that a large proportion of dengue patients in the highest-burden settings receive no actionable diagnosis at the moment when clinical management decisions matter most. By designing explicitly for pharmacy availability, room-temperature stability, self-administration, and a cost target below existing rapid tests, this project addresses not just a technical gap but a structural inequity in who gets access to accurate diagnosis. If successful, the platform would enable decentralized, real-time epidemiological surveillance through georeferenced positive result reporting, generating outbreak data from communities that currently fall entirely outside formal health surveillance networks, transforming individual diagnostic events into population-level public health intelligence. Beyond dengue, the modular aptamer–blocker–riboswitch–TXTL architecture is inherently adaptable: swapping the aptamer and adjusting the blocker sequence could redirect the platform toward other flaviviruses sharing structural homology with DENV.

Ethical Implications

My approach integrates several core bioethical principles into its foundational design. The principle of justice is central: dengue disproportionately burdens low-income populations in tropical regions who are systematically underserved by existing diagnostic markets, and deliberately designing a low-cost, pharmacy-available, self-administered test is an ethical act of redistributive access, ensuring that diagnostic innovation reaches those with the greatest need rather than those with the greatest purchasing power. The principle of beneficence governs the research itself: all experimental work involving human serum samples must proceed under IRB-approved protocols with full informed consent, de-identification of all samples, and adherence to BSL-2 biosafety standards. The georeferenced outbreak surveillance component raises privacy concerns that must be governed by the principle of non-maleficence: aggregating location data from self-administered tests could inadvertently expose individuals or communities to stigmatization, insurance discrimination, or government surveillance if not designed with strict anonymization, voluntary participation, and transparent data governance from the outset.

Concretely, several measures are required to ensure this project remains ethical throughout its development and deployment. IRB approval must be secured before any human sample is handled, and community engagement with populations in target endemic regions must occur before deployment, so that affected communities are stakeholders in the design of the tool rather than passive recipients of it. A critical uncertainty is whether the platform’s sensitivity in real patient serum will be sufficient to avoid false negatives, which carry their own harm: a patient who tests negative and does not seek care when in fact infected. To mitigate this, performance claims must be rigorously benchmarked against RT-PCR across all infection stages before any public deployment, and clear communication of the test’s limitations must accompany any over-the-counter format. An alternative to self-administration would be community health worker–administered testing, which adds a layer of human oversight and counseling but reduces accessibility. The tension between oversight and access is itself an ethical design decision that should be resolved in dialogue with the communities the platform is intended to serve.

Experimental design, techniques, tools and technology.

1 — Antibody-Aptamer Epitope Compatibility (WEEK 1)

Retrieve the crystal structure of DENV EDIII from the Protein Data Bank (PDB) and identify available antibody co-crystal data.
Select an anti-EDIII antibody (in my case 4E11) and map its binding footprint on EDIII using published structural data (PDB).
Identify an aptamer for EDIII, model its 3D structure (RNAComposer), and dock it computationally against the identified EDIII patch using AlphaFold or AutoDock.
Use PyMOL to examine the predicted binding positions of the antibody and the aptamer, together with the affinities estimated by docking, to ensure that they do not overlap or compete for the same binding site.

2 — Aptamer–Blocker–Theophylline Construct Design (WEEK 2)

Design a blocker strand partially complementary to the aptamer’s binding region, such that hybridization sequesters the aptamer in the absence of EDIII; incorporate a theophylline-conjugated toehold overhang to enable downstream riboswitch activation upon EDIII-triggered strand displacement.
Use NUPACK or mfold to model secondary structures of the aptamer–blocker duplex, optimizing for low basal leak (ΔG favoring duplex), fast displacement kinetics upon EDIII binding, and minimal off-target folding.
Test the functionality of the strand displacement and blocker strand using the Asimov Kernel to validate the following design:

Aptamer (A) + Blocker strand (B) + Theophylline (T)

OFF state: A–B (T retained)

ON state: A binds EDIII → B is displaced → T is released
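As a first-pass illustration of the blocker design step, the sketch below derives a blocker that is complementary to only part of the aptamer, leaving the remaining bases free to begin EDIII binding. The sequence is a placeholder, not the published S15 aptamer, and the window position and length are hypothetical; GC content is only a crude proxy for duplex stability, so the ΔG-based analysis in NUPACK or mfold is still required.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_blocker(aptamer, start, length):
    """Blocker complementary to aptamer[start:start+length] only, so the
    remaining aptamer bases stay unpaired in the OFF state."""
    return reverse_complement(aptamer[start:start + length])


def gc_fraction(seq):
    """Fraction of G/C bases; a rough indicator of duplex stability."""
    return sum(base in "GC" for base in seq) / len(seq)


# Placeholder sequence for illustration only; NOT the published S15 aptamer.
aptamer = "GGTTGGATTGGTTGGCACTA"
blocker = design_blocker(aptamer, start=8, length=12)

print("blocker 5'->3':", blocker)
print("GC fraction:", round(gc_fraction(blocker), 2))
```

Shortening the window lowers duplex stability and speeds displacement at the cost of more basal leak, which is the trade-off the secondary-structure modeling is meant to resolve.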

3 — Cell-Free Circuit Assembly and End-to-End Validation (WEEK 3)

Design a linear DNA construct encoding a theophylline riboswitch upstream of the lacZ reporter gene, using the canonical theophylline aptamer sequence (Jenison et al.) as the sensing element of the riboswitch; optimize ribosome binding site (RBS) strength using the RBS Calculator (Salis Lab) to maximize dynamic range.
Prepare cell-free TXTL reactions using a commercial E. coli-based TXTL system; titrate theophylline concentration to establish dose-response curve for LacZ expression using X-gal colorimetric assay
Integrate all three modules: add recombinant EDIII to the aptamer–blocker construct, collect released theophylline, introduce it into the TXTL reaction, and measure LacZ output colorimetrically.

1 — Validate antibody-aptamer non-competition. Assemble and test the TXTL–riboswitch–LacZ circuit, confirming theophylline-dependent colorimetric output via X-gal. (WEEK 4/5)

Synthesize the DNA aptamer and blocker strand (Twist Bioscience or IDT); resuspend in TE buffer and verify integrity by agarose gel electrophoresis.
Incubate aptamer, capture antibody, and recombinant EDIII simultaneously; run competitive ELISA to confirm that aptamer and antibody bind non-competing EDIII epitopes.
Prepare the riboswitch–lacZ construct as a linear gBlock; verify sequence by Sanger sequencing.
Set up TXTL reactions with the construct and X-gal as substrate; titrate theophylline and record colorimetric output at 2 hours to confirm theophylline-dependent LacZ activation.

2 — Characterize EDIII-triggered strand displacement and theophylline release by native PAGE; integrate all three modules in solution and confirm end-to-end colorimetric output (WEEK 5/6)

Incubate the aptamer–blocker construct with increasing concentrations of recombinant EDIII at room temperature.
Run native PAGE to visualize blocker displacement: the duplex band should decrease and a free aptamer band should appear as EDIII concentration increases.
Quantify released theophylline by absorbance or fluorescence, according to the conjugation method used.
Transfer the displacement reaction supernatant directly into the TXTL–X-gal reaction and monitor color development to confirm end-to-end signal activation.
Run negative controls: scrambled aptamer, buffer without EDIII, and unrelated antigen.

3 — Lyophilize TXTL–Xgal components onto Whatman paper discs with trehalose as stabilizer; rehydrate with EDIII-spiked samples and confirm visual readout at room temperature. (WEEK 6/7)

Mix TXTL components with the riboswitch–lacZ construct, X-gal, and trehalose; spot 2–5 µL per Whatman No. 1 paper disc.
Lyophilize discs using a benchtop lyophilizer; store at room temperature in sealed bags with desiccant.
Rehydrate each disc with 5–10 µL of human serum spiked with recombinant EDIII at varying concentrations.
Incubate at room temperature and record colorimetric output visually and by portable spectrophotometry.
Compare lyophilized discs against fresh reactions to assess activity retention after lyophilization.

1 — Evaluate sensitivity, specificity, and limit of detection against commercial NS1 tests and RT-PCR across infection stages (WEEK 8 - 12)

LOD determination: Prepare serial dilutions of recombinant EDIII in dengue-negative human serum (blood samples); run each concentration on lyophilized discs and record the lowest concentration producing a visible colorimetric signal. Compare against a standard curve generated by spectrophotometry.
Test the platform against a panel of potential cross-reactants, including antigens from other flaviviruses at equivalent concentrations, to confirm the absence of colorimetric output in all non-DENV samples.
Obtain samples from patients at early (days 1–3), acute (days 4–7), and late (days 8+) infection stages; run each sample on the lyophilized disc platform alongside a commercial NS1 rapid test (e.g., SD Bioline Dengue NS1) and RT-PCR to generate side-by-side sensitivity and specificity data.
Calculate sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for each infection stage; construct ROC curves to compare diagnostic performance across platforms.
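The per-stage performance metrics follow directly from a 2×2 confusion table. The sketch below assumes RT-PCR is treated as the reference standard (an assumption of this example) and uses invented counts purely for illustration; no real patient data are implied.

```python
# Sketch: diagnostic performance from a 2x2 confusion table.
# Counts are invented for illustration; RT-PCR as reference is an assumption.

def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV, and NPV from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all infected
        "specificity": tn / (tn + fp),  # true negatives among all uninfected
        "ppv": tp / (tp + fp),          # infected among all positive tests
        "npv": tn / (tn + fn),          # uninfected among all negative tests
    }

# Hypothetical counts per infection stage: (tp, fp, tn, fn)
counts_by_stage = {
    "early (days 1-3)": (45, 5, 90, 10),
    "acute (days 4-7)": (50, 4, 91, 5),
    "late (days 8+)": (30, 6, 89, 25),
}

for stage, counts in counts_by_stage.items():
    metrics = diagnostic_metrics(*counts)
    summary = ", ".join(f"{name}={value:.2f}" for name, value in metrics.items())
    print(f"{stage}: {summary}")
```

Note that PPV and NPV depend on prevalence in the tested population, so values from a study panel would not transfer directly to pharmacy use.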

2 — Assess adaptability across serotypes and evaluate extension to other flaviviruses, enabling a single platform to support both individual diagnosis and community-level outbreak surveillance (WEEK 12 - 30)

Express and purify recombinant EDIII from all four DENV serotypes; verify identity and purity by SDS-PAGE and Western blot using serotype-specific antibodies.
Test the existing aptamer against EDIII from each serotype by BLI or competitive ELISA to map cross-reactivity; if binding drops significantly for any serotype, design and order serotype-specific aptamer variants.
Spot aptamer variants for each serotype in spatially distinct zones on a single paper disc; rehydrate with DENV1–4 spiked samples and assess whether zone-specific colorimetric output enables serotype discrimination visually.
Test platform performance against Zika and West Nile EDIII to assess structural homology-driven cross-detection; evaluate whether riboswitch variants or aptamer modifications can tune selectivity for surveillance applications targeting multiple co-circulating flaviviruses.

3 — Pharmacy-available, self-administered, readable by eye. (WEEK 30 - 52)

Recruit volunteer participants and give them only the lyophilized disc test and a one-page illustrated instruction sheet; record time-to-result, error rate, and ability to correctly interpret colorimetric output without assistance.
Calculate per-test material cost including paper disc, lyophilized TXTL components, aptamer, and packaging; benchmark against commercial NS1 rapid tests and RT-PCR to confirm economic accessibility for pharmacy distribution.
Store lyophilized discs at 37°C and 60% relative humidity for 4 weeks to simulate tropical field conditions; test activity retention weekly by rehydrating with EDIII-spiked samples and comparing colorimetric output to freshly prepared discs.
Design a data capture tool (QR code printed on the disc packaging) linking to a georeferenced reporting form, allowing positive results self-reported from any location to feed into a real-time outbreak map, enabling decentralized surveillance without requiring clinic visits or laboratory confirmation.
Identify the regulatory framework applicable in target endemic countries; map the evidence package required for approval as an over-the-counter diagnostic and outline the validation study design needed to meet those standards.

Techniques Checklist

☑ DNA Construct Design
☑ Databases (RCSB PDB, NCBI, NUPACK)
☑ Protein Design
☑ Use of Benchling
☑ Cell-Free Systems
☑ Primer Design or Selection

Technique Deep-Dives

DNA Construct Design:

The DNA design is a linear construct in which a T7 promoter drives a theophylline-responsive riboswitch placed upstream of the lacZ reporter gene. The full sequence, including the validated Jenison et al. aptazyme, the lacZ coding sequence codon-optimized for E. coli cell-free expression, and a T7 terminator, was assembled in silico using Benchling and would be verified for OFF/ON structural states using NUPACK and ViennaRNA before synthesis. The construct will be ordered as a linear gBlock from Twist Bioscience. Because linear DNA is compatible with TXTL reactions, no cloning is required, which simplifies the workflow. In parallel, the aptamer–blocker–theophylline module will be designed as synthetic DNA oligos with a precisely tuned toehold region (6–10 nt) and a blocker GC content optimized computationally to minimize leak in the OFF state while ensuring fast strand displacement kinetics upon EDIII binding.
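As a rough illustration of the in silico assembly, the sketch below concatenates the parts in the stated order and reports feature coordinates and total length. Only the T7 promoter is a real, widely used consensus sequence; every other part is a run of N placeholders whose length is an assumption of this example (the lacZ placeholder is sized at roughly the length of the E. coli lacZ coding sequence, about 3.1 kb). The `assemble` helper is hypothetical and does not represent Benchling's interface.

```python
# Sketch: ordered assembly of construct parts with feature coordinates.
# All parts except the T7 promoter are N placeholders with assumed lengths.

def assemble(parts):
    """Concatenate (name, sequence) parts in order.

    Returns the full sequence and a list of (name, start, end) features
    using 1-based, inclusive coordinates.
    """
    sequence = ""
    features = []
    for name, part in parts:
        start = len(sequence) + 1
        sequence += part.upper()
        features.append((name, start, len(sequence)))
    return sequence, features

parts = [
    ("T7 promoter", "TAATACGACTCACTATAGGG"),
    ("theophylline riboswitch", "N" * 60),  # placeholder length (assumption)
    ("RBS", "N" * 20),                      # placeholder length (assumption)
    ("lacZ CDS", "N" * 3075),               # approximate lacZ length (assumption)
    ("T7 terminator", "N" * 48),            # placeholder length (assumption)
]

construct, features = assemble(parts)
for name, start, end in features:
    print(f"{name}: {start}-{end} ({end - start + 1} bp)")
print("total length:", len(construct), "bp")
```

A length budget like this is worth checking early, because the total determines which synthesis product and price tier the order falls into.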

Cell-Free Systems:

The cell-free (TXTL) system is the core signal-generating engine of the biosensor, chosen because it operates without living cells, making it compatible with lyophilization and room-temperature storage, essential for a field-deployable diagnostic. Using a commercial E. coli–based TXTL kit, the riboswitch–lacZ construct will be added directly to the cell-free reaction alongside X-gal as colorimetric substrate, and theophylline will be titrated across a concentration range to establish a dose-response curve defining the minimum activation threshold. Once validated in solution, the complete TXTL mixture — including the gBlock, substrate, and trehalose as a cryoprotectant — will be spotted onto Whatman No. 1 paper discs and lyophilized using a benchtop lyophilizer, producing a dry, stable disc that reactivates upon addition of patient sample. This freeze-dried cell-free format eliminates cold chain requirements entirely and enables single-step operation: the patient adds sample, and a visible color change within 60 minutes or less indicates a positive dengue result.

Results & Quantitative Expectations

Protein-protein interaction

EDIII and aptamer S15G3

Kd: -3.311

Key EDIII residues and aptamer nucleotides at the interface:

EDIII: GLU 314, THR 315, GLN 316, HIS 317, GLY 318
Aptamer S15G3: C25, U26, C27, G28, G29, G30, C31

EDIII and antibody 4E11 (anti-EDIII antibody)

Comparison results

In the EDIII–aptamer S15G3 interaction, the binding interface involves EDIII residues 311–322 (the β-A/BC region), suggesting stacking interactions with the RNA bases. On the aptamer side, the most engaged nucleotides are located in the 3′ stem-loop (C18–U26, G28–C31), which is consistent with RNA aptamers that use hairpin structures for protein recognition.

In the EDIII–antibody 4E11 interaction, the contacts involve residues 20–90, a region that forms part of the antibody-binding epitope. These contacts likely contribute to specific antigen recognition and may involve the exposed β-sheet surfaces characteristic of EDIII. Aromatic and charged residues within this region may participate in hydrogen bonding, hydrophobic, and electrostatic interactions that stabilize the complex.

Overall, these results suggest that the capture antibody (4E11) and the aptamer (S15G3) occupy distinct binding sites on the EDIII domain. If so, neither molecule should interfere with the binding of the other; instead, together they provide a dual-recognition mechanism that is expected to enhance both target capture and molecular recognition.

Aspects to validate

The aspect to validate is the design of a linear DNA construct encoding a theophylline-responsive riboswitch upstream of a lacZ reporter gene, which is the core signal-generating module of the biosensor. Confirming that this construct produces theophylline-dependent LacZ expression in a cell-free TXTL system is the critical proof of concept that underpins the entire downstream architecture: if the riboswitch–LacZ circuit does not activate reliably in response to theophylline, no other module in the system has diagnostic value.

Detailed Protocol

Retrieve the validated theophylline aptazyme sequence (Jenison et al., 1994) and the E. coli lacZ coding sequence from NCBI (GenBank accession V00296); assemble the full construct in Benchling in the following order: T7 promoter → theophylline riboswitch → RBS → lacZ → T7 terminator.
Verify OFF and ON secondary structures of the riboswitch computationally using NUPACK: confirm that in the absence of theophylline the RBS is sequestered within a stable stem-loop, and that theophylline binding shifts the equilibrium toward an open conformation exposing the RBS.
Design the construct as a linear gBlock (roughly 3.2–3.3 kb, since the lacZ coding sequence alone is about 3.1 kb) and submit the sequence to Twist Bioscience for synthesis; upon arrival, resuspend in TE buffer to a 10 ng/µL stock and verify size by 1% agarose gel electrophoresis.
Prepare a theophylline titration series in nuclease-free water: 0, 0.1, 0.5, 1, 2, 5, and 10 mM theophylline stocks.
Set up TXTL reactions on ice using myTXTL master mix following manufacturer protocol; add gBlock and X-gal as colorimetric substrate to each reaction; add theophylline at each concentration across reactions; include a no-DNA negative control and a constitutive lacZ positive control.
Load reactions into a 96-well plate; incubate in a plate reader and measure absorbance at 570 nm every 10 minutes for 3 hours.
If an Opentrons OT-2 robot is available, program the liquid handling steps as a Python protocol to ensure reproducibility and minimize pipetting error across replicates; run in triplicate for each theophylline concentration.
Plot absorbance at 570 nm versus time for each theophylline concentration; calculate fold-activation relative to the 0 mM control at the 120-minute timepoint; generate a Hill-function dose-response curve fitting signal vs. log[theophylline].
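The analysis in the last step can be sketched with the standard library alone. The absorbance values below are invented placeholders, not measurements, and the coarse grid search is a stand-in chosen to keep the example dependency-free; a real analysis would more likely use a proper nonlinear least-squares routine. The function names and the choice to fix the lower and upper plateaus to the observed extremes are assumptions of this sketch.

```python
# Sketch: fold-activation and a coarse Hill fit for the theophylline titration.
# Readings are invented placeholders for the 120-minute timepoint.

def hill(conc, bottom, top, k_half, n):
    """Hill function: signal as a function of ligand concentration (mM)."""
    if conc == 0:
        return bottom
    return bottom + (top - bottom) * conc**n / (k_half**n + conc**n)

def fold_activation(signals):
    """Fold change of each concentration relative to the 0 mM control."""
    baseline = signals[0.0]
    return {conc: value / baseline for conc, value in signals.items()}

def fit_hill(signals):
    """Grid search for k_half and n; plateaus fixed to the observed extremes."""
    bottom, top = min(signals.values()), max(signals.values())
    best = None
    for k_step in range(1, 201):      # k_half from 0.05 to 10 mM
        k_half = k_step * 0.05
        for n_step in range(5, 41):   # Hill coefficient from 0.5 to 4.0
            n = n_step * 0.1
            sse = sum((hill(c, bottom, top, k_half, n) - s) ** 2
                      for c, s in signals.items())
            if best is None or sse < best[0]:
                best = (sse, k_half, n)
    return {"bottom": bottom, "top": top, "k_half": best[1], "n": best[2]}

# Hypothetical A570 readings at 120 min, keyed by theophylline concentration (mM)
readings = {0.0: 0.10, 0.1: 0.11, 0.5: 0.30, 1.0: 0.60, 2.0: 0.90, 5.0: 1.06, 10.0: 1.09}

print("fold activation:", {c: round(f, 1) for c, f in fold_activation(readings).items()})
print("fit:", fit_hill(readings))
```

The fitted `k_half` gives the theophylline concentration at half-maximal signal, which is the number that matters for deciding how much theophylline the displacement module must release.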

Synthetic Biology Techniques Utilized

The validation integrates multiple synthetic biology techniques. DNA construct design was performed entirely in silico using Benchling for sequence assembly and NUPACK for thermodynamic structural prediction of riboswitch OFF/ON states, so that the construct is computationally validated before synthesis. The construct would be ordered as a linear gBlock through Twist Bioscience, applying the principle that linear DNA is directly compatible with cell-free TXTL systems and requires no cloning, which reduces turnaround time and eliminates transformation-related failure modes. Cell-free reaction setup and theophylline titration would be performed using standard pipetting technique under the myTXTL manufacturer protocol, with colorimetric LacZ output measured via X-gal absorbance for visual and quantitative signal readout. Where available, an Opentrons OT-2 liquid handling robot would be programmed in Python to automate the reaction assembly, ensuring consistent volumes across all replicates and reducing human pipetting variability as a confounding source of error in the dose-response data.
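Before writing any robot protocol, it helps to fix the plate map in code so that manual and automated runs use the same wells. The sketch below is plain Python and makes no Opentrons API calls; the layout rule (one condition per column, replicates down the rows) and the helper name are assumptions of this example.

```python
# Sketch: 96-well plate map for the titration, in triplicate.
# One condition per column, replicates in rows A-C (layout is an assumption).

ROWS = "ABCDEFGH"
MAX_COLUMNS = 12

def plate_layout(conditions, replicates=3):
    """Map well names (e.g. 'A1') to conditions, one condition per column."""
    if len(conditions) > MAX_COLUMNS:
        raise ValueError("more conditions than plate columns")
    if replicates > len(ROWS):
        raise ValueError("more replicates than plate rows")
    layout = {}
    for column, condition in enumerate(conditions, start=1):
        for replicate in range(replicates):
            layout[f"{ROWS[replicate]}{column}"] = condition
    return layout

theophylline_mm = [0, 0.1, 0.5, 1, 2, 5, 10]
conditions = [f"{conc} mM theophylline" for conc in theophylline_mm]
conditions += ["no-DNA control", "constitutive lacZ control"]

layout = plate_layout(conditions)
for well in sorted(layout, key=lambda w: (int(w[1:]), w[0])):
    print(well, layout[well])
```

The resulting dictionary could then drive either a printed pipetting sheet or the well list of a liquid-handling script.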

Additional information

References

• Bhatt, S., Gething, P. W., Brady, O. J., Messina, J. P., Farlow, A. W., Moyes, C. L., Drake, J. M., Brownstein, J. S., Hoen, A. G., Sankoh, O., Myers, M. F., George, D. B., Jaenisch, T., Wint, G. R. W., Simmons, C. P., Scott, T. W., Farrar, J. J., & Hay, S. I. (2013). The global distribution and burden of dengue. Nature, 496(7446), 504-507. https://doi.org/10.1038/nature12060
• Chen, H., Hsiao, W., Lee, H., Wu, S., & Cheng, J. (2015). Selection and Characterization of DNA Aptamers Targeting All Four Serotypes of Dengue Viruses. PLoS ONE, 10(6), e0131240. https://doi.org/10.1371/journal.pone.0131240
• Haider, N., Hasan, M. N., Onyango, J., Billah, M., Khan, S., Papakonstantinou, D., Paudyal, P., & Asaduzzaman, M. (2025). Global dengue epidemic worsens with record 14 million cases and 9000 deaths reported in 2024. International Journal of Infectious Diseases, 158, 107940. https://doi.org/10.1016/j.ijid.2025.107940
• Pardee, K., Green, A. A., Takahashi, M. K., Braff, D., Lambert, G., Lee, J. W., Ferrante, T., Ma, D., Donghia, N., Fan, M., Daringer, N. M., Bosch, I., Dudley, D. M., O’Connor, D. H., Gehrke, L., & Collins, J. J. (2016). Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell, 165(5), 1255-1266. https://doi.org/10.1016/j.cell.2016.04.059
• Tricou, V., Vu, H. T., Quynh, N. V., Nguyen, C. V., Tran, H. T., Farrar, J., Wills, B., & Simmons, C. P. (2010b). Comparison of two dengue NS1 rapid tests for sensitivity, specificity and relationship to viraemia and antibody responses. BMC Infectious Diseases, 10(1), 142. https://doi.org/10.1186/1471-2334-10-142

Group Final Project

Computational Engineering of the MS2 Lysis Protein to Improve Stability, Titers, and Toxicity

After reviewing the provided literature on the MS2 lysis protein (L) and discussing the project aims, our group has decided to focus on three interconnected goals:

Goal 1: Increase the stability of the L protein

As the “easiest” goal, it is the most computationally tractable. A stabilized protein is less prone to degradation and misfolding, which could directly lead to higher functional titers and serve as a robust starting point for any subsequent engineering.

Goal 2: Increase bacteriophage titers through improved lysis efficiency.

Phage therapy relies on high phage titers for effective bacterial killing and scalable manufacturing, but phage production can be limited by inefficient lysis or poor coordination between phage replication and host destruction. Improving the efficiency and timing of host cell lysis can therefore directly increase the number of phage particles released per infected cell.

The MS2 L protein is a small 75-amino-acid membrane protein that triggers bacterial lysis and is essential for the release of new phage particles. The paper “Mutational analysis of the MS2 lysis protein L” describes how MS2 L functions as a single-gene lysis protein that disrupts bacterial cell envelope integrity without classical enzymatic activity. Additionally, L interacts with the host chaperone DnaJ, which modulates its activity and the timing of lysis. “MS2 Lysis of Escherichia coli Depends on Host Chaperone DnaJ” shows that lysis timing strongly affects the number of virions produced before the host cell bursts, meaning that engineering improved L variants may increase overall phage titers.

Goal 3: Increase the toxicity of the lysis protein.

This proposal addresses the subproblem of increasing the toxicity of the L lysis protein from Bacteriophage MS2. Instead of random mutagenesis, toxicity will be approached as a multi-factor optimization problem involving structural stability, membrane insertion, oligomerization efficiency, and expression kinetics in Escherichia coli. The objective is to design L variants that enhance membrane disruption while maintaining proper folding and stability.

E. coli chaperone DnaJ.

Additionally, we will explore disrupting the interaction between the L protein and the E. coli chaperone DnaJ.

The reading “Identification MS2 lysis protein dependency on DnaJ” establishes this interaction as critical for function. By computationally predicting and then disrupting this interface, we can test its necessity and potentially create a DnaJ-independent lysis mechanism, offering a new avenue for controlling lysis timing.

Together, these three goals form a coherent strategy: stabilizing the L protein may improve its folding and expression, which can increase functional titers, while further engineering of membrane disruption and host interactions may increase toxicity and lysis efficiency.

Proposed Computational Tools and Approaches

We will build a computational pipeline using the tools introduced in recitation and the provided resources. The key steps and tools are:

Step 1: Structural Modeling of the L Protein

Tool: AlphaFold2 (via ColabFold for ease of use).

Why: No high-resolution experimental structure of the full-length MS2 L protein exists. A reliable 3D model is the absolute foundation for all downstream analysis, allowing us to visualize which parts are structured vs. disordered.

Step 2: Modeling the L-DnaJ Complex

Tool: AlphaFold-Multimer.

Why: To disrupt the interaction, we first need to know where it occurs. AlphaFold-Multimer is the current state-of-the-art for predicting protein-protein complexes and will generate a testable model of the L protein bound to E. coli DnaJ.

Step 3: In Silico Mutagenesis for Stability

Tool: Rosetta (or FoldX). Specifically, the ddg_monomer application for predicting changes in folding free energy (ΔΔG).

Why: These tools are parameterized using vast amounts of experimental data on protein stability. They can systematically mutate each residue in our L protein model and predict whether the change (e.g., A->V) makes the protein more stable (negative ΔΔG) or less stable (positive ΔΔG).
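The systematic scan can be pictured as two small steps: enumerate every single substitution, then keep those with a negative predicted ΔΔG. The sketch below does not call Rosetta or FoldX; the ΔΔG values are invented placeholders standing in for their output, and the parent sequence is inferred from the uppercase positions of the variants listed later in this document, so it should be checked against the reference sequence before use.

```python
# Sketch: enumerate single-point mutants and filter by predicted ddG.
# ddG values are invented placeholders, not Rosetta or FoldX output.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Parent L sequence as inferred from this document (assumption; verify).
PARENT = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSST"
          "LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")

def single_point_mutants(seq):
    """Yield (label, mutant_sequence) for every single substitution."""
    for index, wild_type in enumerate(seq):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                label = f"{wild_type}{index + 1}{residue}"  # e.g. 'A45V'
                yield label, seq[:index] + residue + seq[index + 1:]

def stabilizing(ddg_by_mutation, cutoff=0.0):
    """Return mutations with ddG below the cutoff, most stabilizing first."""
    hits = [(m, ddg) for m, ddg in ddg_by_mutation.items() if ddg < cutoff]
    return sorted(hits, key=lambda item: item[1])

mutants = list(single_point_mutants(PARENT))
print("single-point mutants to score:", len(mutants))

# Hypothetical predictions in kcal/mol (negative = predicted more stable)
example_ddg = {"A45V": -1.2, "L44D": 3.4, "S49A": -0.3}
print("predicted stabilizing:", stabilizing(example_ddg))
```

For a 75-residue protein the full scan is small enough to score exhaustively, so the filter, not the enumeration, is where the real decisions are made.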

Step 4: Visualizing and Selecting Interface Mutations

Tool: PyMOL and the HTGAA Protein Engineering Tools spreadsheet.

Why: We will use PyMOL to visually inspect the predicted L-DnaJ complex from Step 2 and select residues at the interface. We will then use the spreadsheet to check the conservation of those residues and manually design mutations (e.g., swapping a large hydrophobic residue for a charged one) predicted to break the interaction.

Protein Language Models (PLMs)

Protein language models such as ESM or ProtBERT will be used to perform in silico mutagenesis on the MS2 L protein sequence. These models can suggest mutations that preserve structural and functional constraints learned from large protein datasets.

This approach allows us to generate multiple candidate mutations across the L protein, avoid mutations likely to disrupt folding, and explore sequence space beyond naturally occurring variants.

AlphaFold Structure Prediction

Each candidate L variant will be analyzed using AlphaFold to predict protein structure and membrane topology. Since the C-terminal transmembrane region is essential for lytic activity, structural prediction will help identify mutations that preserve this functional domain.

Structural predictions will also help identify:

misfolded variants
mutations that destabilize the transmembrane region
variants that may alter oligomerization or membrane insertion

Interaction Modeling with Host Proteins

Because MS2 L interacts with the DnaJ chaperone, which affects lysis timing, candidate variants can be evaluated using AlphaFold-Multimer to predict changes in the L–DnaJ interaction.

This could help identify variants that:

maintain necessary folding assistance
reduce excessive dependency on host chaperones
improve robustness of lysis across physiological conditions

Proposed Computational Strategy

First, protein language models (e.g., ESM-2, ProtT5) will be used to perform directed in silico mutagenesis. These models capture evolutionary constraints and residue interactions, enabling the generation of structurally plausible variants while identifying mutation-tolerant and functionally critical positions. This step efficiently reduces the combinatorial search space.

Second, predicted variants will be structurally evaluated using AlphaFold2 for monomer folding and AlphaFold-Multimer to assess oligomerization and interaction with host factors such as DnaJ.

Third, membrane compatibility will be analyzed using membrane-aware modeling (RosettaMP) and selected molecular dynamics simulations.

Fourth, ΔΔG prediction tools (e.g., FoldX, Rosetta energy functions) will filter out destabilizing mutations.

In parallel, codon optimization algorithms will redesign selected variants for improved expression in E. coli, as toxicity depends on both structure and intracellular concentration.
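The simplest form of codon optimization is a back-translation that always picks one preferred codon per amino acid. The sketch below does exactly that and nothing more. The codon choices are ones commonly reported as frequent in E. coli, but they are an assumption of this example and should be checked against a current codon usage table; real optimization tools also balance GC content, repeats, and mRNA structure, which this sketch ignores.

```python
# Sketch: naive back-translation with one preferred codon per amino acid.
# Codon choices are assumptions; verify against an E. coli codon usage table.

PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

def back_translate(protein, stop_codon="TAA"):
    """Return a DNA coding sequence for the protein, ending in a stop codon."""
    codons = [PREFERRED_CODON[residue] for residue in protein.upper()]
    return "".join(codons) + stop_codon

# Parent L sequence as inferred from this document (assumption; verify).
parent = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSST"
          "LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")

coding_sequence = back_translate(parent)
print(coding_sequence)
print("length:", len(coding_sequence), "nt")
```

Because variants are written with lowercase letters at mutated positions, the function uppercases its input so that variant strings can be passed in directly.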

Potential Pitfalls

Pitfall 1: Dynamic Regions and Model Quality

The L protein is small and likely has flexible/disordered regions, especially in its N-terminal domain.

Pitfall 2: Stability vs. Function Trade-off

A mutation that makes the protein more stable in its monomeric state might prevent it from undergoing the necessary conformational changes to oligomerize and form a pore in the membrane.

Pitfall 3: Lack of Membrane Context

Our stability predictions (Rosetta) are performed in a virtual “aqueous” environment and do not account for the energetic complexity of the lipid bilayer.

Limited biological data: There is still limited structural and mechanistic knowledge about MS2 L.

Cellular context not captured computationally: protein modeling tools may not fully capture the membrane environment.

One limitation is the scarcity of quantitative datasets linking specific mutations to measured lysis kinetics.

L-Protein Mutants

To generate the first two mutations in the L protein of bacteriophage MS2 within the transmembrane region, I selected the top candidates predicted by the Python models and the spreadsheet analysis for that region. I applied the same approach to the soluble region, ensuring that all mutations were introduced at amino acid positions with less constrained mutability.

METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSSTLlVLIFLAIFLSlFTlQLLLSLLEAVIRTVTTLQQLLT
METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSSTLheLnlvpnFLleFTNQLhLSLLEAeIRTVTTLQQLLT
METRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
lEiRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

For the final mutation, which was the most aggressive, I introduced mutations in both regions across all possible amino acid positions.

lEiRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLleLnlvpnFLleFTlQLhLSLLEAeIRTVTTLQQLLT
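To make the variants easier to compare, the substitutions can be listed in standard notation by aligning each variant against the parent. The sketch below assumes that lowercase letters mark mutated positions and that the parent sequence is the one implied by the uppercase positions across the variants above; both are inferences from this document rather than stated facts. The comparison itself ignores case, so it does not rely on the lowercase convention.

```python
# Sketch: list substitutions in a variant relative to the parent sequence.
# The parent is inferred from the variants above (assumption; verify).

PARENT = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSST"
          "LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")

def list_mutations(parent, variant):
    """Return substitutions such as 'Y39L' using 1-based positions."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have the same length")
    mutations = []
    pairs = zip(parent.upper(), variant.upper())
    for position, (wild_type, residue) in enumerate(pairs, start=1):
        if wild_type != residue:
            mutations.append(f"{wild_type}{position}{residue}")
    return mutations

variants = {
    "TM variant 1": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRNQRSSTLlVLIFLAIFLSlFTlQLLLSLLEAVIRTVTTLQQLLT",
    "soluble variant 1": "METRqPQQqQQTPASTNRRRPFKHEDYPrRRNQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
}

for name, sequence in variants.items():
    print(name, list_mutations(PARENT, sequence))
```

A table of these labels per variant also makes it straightforward to look up each substitution in the ΔΔG and language-model scores used to choose it.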
