Subsections of 2026a-siwei-zhang

Homework

Weekly homework submissions:

  • Week 1 HW: hw-principles-and-practices

    The Prometheus Symbiont 🎅1.“The Prometheus Symbiont” is a conceptual, living medical system designed to symbiotically integrate with the human body. It merges biomimetic photosynthesis, synthetic biology, and flexible electronics, aiming to shift medicine from “passive treatment” to active, sustained life maintenance and enhancement. You can think of it as a sunlight-powered, wearable or implantable “second life-support system.” 🎅The Prometheus Symbiont is not merely a technological concept; it is more akin to a philosophical proposition about the future form of life. It blurs the boundaries between therapy and enhancement, between human and machine. Its ultimate significance may lie in compelling us to re-examine: “what constitutes health, and indeed, what it means to be human.”

  • Week 10 HW: hw-imaging-and-measurement

    Homework: Final Project Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. 🎅All projects will be completed in two parts: Natural Photosynthesis Models A computational simulation experiment of de novo protein design that relies on self‑supplied photosynthetic energy and is driven by continuous directed evolution No directly relevant energy‑coupled evolution system exists. It is important to note that no published system to date has achieved the full loop where functional activity drives ATP regeneration, which in turn drives the evolution of the function itself. All existing continuous evolution systems, regardless of their maturity, rely on external energy input (i.e., normal host cell metabolism) to drive mutation and selection – there is no functional coupling between evolution and energy supply. This is precisely the core innovation space of our project: shifting directed evolution from “externally powered” to “self‑powered by function”.

  • Week 11 HW: hw-building-genomes

    👩‍🦰Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork Reflections on the HTGAA 2026 Collaborative Community Bioart Project 1. My Contribution to the Project For this collaborative bioart experiment, I made part of the DNA pattern on the bottom right plate, ensuring the engineering of the custom fragments aligned perfectly with the broader communal design layout.

  • Week 12 HW: hw-bioproduction

  • Week 13 HW: hw-bio-design-living-materials

  • Week 14 HW: hw-biofabrication

  • Week 2 HW: DNA Read, Write, & Edit

    Homework — DUE BY FEB 17 2PM MIT TIME 👨‍🦰Part 0: Basics of Gel Electrophoresis Keypoint: Gel Electrophoresis: Used for separating, identifying, and purifying fragments of DNA, RNA, or proteins. Gel Preparation: Add agarose powder to the buffer, heat until melted, pour the solution into the gel tray, insert the comb, and allow it to cool and solidify. Sample Loading: Remove the comb, place the gel into the electrophoresis tank, and add buffer until the gel is covered. Mix the DNA sample with loading buffer, then load the mixture into the wells.

  • Week 3 HW: hw-lab-automation

    ヾ(≧▽≦*)oAssignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME! The Biopunk lab hasn’t contacted me yet. The Opentrons API is a Python framework for writing automated biology lab protocols. 1.Load labware (containers, tip racks, plates); 2.Load instruments (pipettes); 3.Define your liquid handling steps; The basic artistic GUI will involve: Getting coordinates from the GUI tool; Writing a Python script that moves the pipette to those positions; Using the HTGAA26 Colab notebook as your template:https://ddls.aicell.io/course/ddls-2025/module-6/lab/#-what-is-a-code-agent;

  • Week 4 HW: hw-protein-design-part-i

    🐉 Project Objective: Bacteriophage Engineering This document outlines the core learning experience and the collaborative framework designed to drive an optimized bacteriophage project.

  1. Mastery of Basic Concepts
    • Phage Biology: Understanding the lytic and lysogenic life cycles, and the structural modularity of viral components (Capsid, Tail, Baseplate).
    • Synthetic Biology Framework: Introduction to the “Design-Build-Test-Learn” (DBTL) cycle in viral engineering.
    • Therapeutic Potential: Exploring the role of phages in addressing antimicrobial resistance (AMR) and precision microbiome editing.

  2. Amino Acid Structure & Biochemistry
    • Chemical Taxonomy: Categorization of the 20 standard amino acids based on hydrophobicity, charge, and polarity.
    • Side-Chain Interactions: Analyzing how hydrogen bonds, salt bridges, and disulfide bridges dictate protein stability.
    • Conformational Constraints: Understanding the Ramachandran plot and the energetic landscape of protein folding.

  3. 3D Protein Visualization & Analysis
    • Software Proficiency: Hands-on training with professional-grade tools such as PyMOL, ChimeraX, or NGL Viewer.
    • Structural Mapping: Visualizing surface electrostatic potentials, hydrophobicity, and potential binding pockets.
    • Superimposition: Learning to align wild-type and mutant structures to assess structural deviations (RMSD).

  4. Diversity of ML-based Design Tools
    • Structure Prediction: Leveraging AlphaFold 3 or RoseTTAFold for high-accuracy 3D modeling of viral proteins.
    • Fixed-Backbone Design: Using ProteinMPNN to redesign amino acid sequences for a specific structural scaffold.
    • Generative Scaffolding: Implementing RFdiffusion for de novo design of receptor-binding motifs and functional binders.
    • Sequence Modeling: Utilizing Protein Language Models (e.g., ESM-3) to predict the impact of specific mutations on protein function.

    👩‍🦰 Part A: Fundamental Principles & Frontiers in Protein Engineering This section covers fundamental inquiries into biochemistry, evolutionary biology, and structural protein design.

  • Week 5 HW: hw-protein-design-part-ii

    🐉 Part A: SOD1 Binder Peptide Design (From Pranam) Background: Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc. Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

  • Week 6 HW: hw-genetic-circuits-part-i

    Week 6 — Genetic Circuits I: DNA Assembly Technologies Molecular Biology Lab Report: PCR & Assembly Techniques Components of Phusion High-Fidelity PCR Master Mix: Phusion Master Mix is a convenient 2X concentrated solution containing: Phusion DNA Polymerase: A Pyrococcus-like enzyme fused with a processivity-enhancing domain. It provides extremely high fidelity ($50\times$ higher than Taq) and speed. dNTPs: The building blocks (dATP, dTTP, dCTP, dGTP) for the new DNA strand. Reaction Buffer: Maintains optimal pH and provides ionic strength. MgCl₂: A necessary cofactor for polymerase activity.

  • Week 7 HW: hw-genetic-circuits-part-ii

    Week 7 — Genetic Circuits Part II: Neuromorphic Circuits Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) The shift from Boolean genetic circuits to Intercellular Artificial Neural Networks (IANNs) represents a move from simple digital logic to complex, analog, and adaptive biological computing.

  • Week 9 HW: hw-cell-free-systems

    Week 9 — Cell-Free Systems Homework Part A: General and Lecturer-Specific Questions Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Subsections of Homework

Week 1 HW: hw-principles-and-practices

The Prometheus Symbiont


🎅1.“The Prometheus Symbiont” is a conceptual, living medical system designed to symbiotically integrate with the human body. It merges biomimetic photosynthesis, synthetic biology, and flexible electronics, aiming to shift medicine from “passive treatment” to active, sustained life maintenance and enhancement. You can think of it as a sunlight-powered, wearable or implantable “second life-support system.”

🎅The Prometheus Symbiont is not merely a technological concept; it is more akin to a philosophical proposition about the future form of life. It blurs the boundaries between therapy and enhancement, between human and machine. Its ultimate significance may lie in compelling us to re-examine: “what constitutes health, and indeed, what it means to be human.”

Technical Integration: How can living cells, electronic components, and polymer materials work together stably and safely within the human body over the long term?

Biosafety: How can we prevent the leakage or mutation of genetically engineered microorganisms? How do we ensure the system can be safely degraded or cleared upon failure?

Ethical & Social: Where is the boundary between the human body and machine? How is data privacy guaranteed? How can technological fairness be achieved?

Regulatory & Approval: Does it belong to the category of medical devices, pharmaceuticals, or a new biological product? How can a completely new regulatory framework be established?

🎅Purpose: Current State vs. Proposed Change

What is done now (Current Paradigm): Medicine primarily operates in a reactive and episodic manner. Patients seek help after symptoms appear. Treatments involve separate devices (monitors), pharmaceuticals (drugs), and procedures, often with significant side effects and limited personalization. Sustainable energy and materials for medical devices are external concerns.

What we propose (Paradigm Shift): We propose a shift to a proactive, continuous, and integrated symbiosis. The Prometheus Symbiont is a single, autonomous system that continuously monitors, analyzes, and responds to the body’s state in real-time. It moves beyond treating illness to sustaining and enhancing baseline health. Crucially, it aims for energy and material autarky within the body by using biomimetic photosynthesis, fundamentally changing the relationship between medical technology and the patient’s own biological processes.

🎅Design: Requirements for Functionality & Key Actors

Technical Core
Key actors: Research Scientists (Synthetic Biology, Materials Science, Biomedical Engineering), Bioethicists, University Tech Transfer Offices

  1. Stable Hybrid Bio-Machine Interface: Materials and protocols to seamlessly integrate living cells (engineered cyanobacteria/yeast), flexible electronics, and polymers.
  2. Advanced Synthetic Biology: Engineered microbes for photosynthesis, sensing, and drug production with robust safety “kill-switches.”
  3. Efficient Energy & Data Transfer: Systems for light capture, intracellular energy (ATP/NADPH) transfer to synthetic pathways, and secure bio-electrical data communication.

🎅Clinical & Regulatory Pathway
Key actors: Government Regulators (FDA, EMA), Clinical Researchers, Ethics Boards, Patient Advocacy Groups

  1. New Regulatory Framework: Classification as a novel “Symbiotic Biotherapeutic Device” requiring new FDA/EMA pathways.
  2. Phased Clinical Trials: Long-term studies focusing on safety, stability, and efficacy for chronic conditions (e.g., diabetes, wound healing).

🎅Commercialization & Society
Key actors: Venture Capitalists, Pharma/MedTech Companies, Government Funders (e.g., ARPA-H), Sociologists, The Public (as end-users and citizens)

  1. Public-Private Funding Consortium: To fund high-risk R&D and scale-up.
  2. Public Dialogue & Education: To build understanding and address ethical concerns before deployment.
  3. New Manufacturing & Service Models: For growing, implanting, and maintaining living medical systems.

🎅Assumptions: Potential Uncertainties

Technical Feasibility: We assume the extraordinary challenge of long-term, stable integration of diverse biological and electronic components within the dynamic human body can be solved. This is a fundamental uncertainty.

Biological Stability: We assume engineered genetic circuits will function predictably and reliably for decades without mutation or interference from the host’s immune system and microbiome.

Societal Acceptance: We assume that a significant portion of society will accept a permanent, living machine symbiont as a therapeutic or enhancement, overcoming the “yuck factor” and philosophical objections.

Regulatory Adaptability: We assume regulatory bodies can and will adapt at the pace of the technology to create prudent, effective pathways for such a disruptive product.

🎅Risks of Failure & “Success”

Risks of Failure:

  • Catastrophic Biofailure: Engineered microbes could mutate, cause infections, or disrupt vital physiological pathways, leading to patient harm.
  • Rejection & Waste: The body’s immune system could reject the symbiont, or components could degrade into toxic byproducts.
  • Technical Obsolescence: The embedded electronics or software could become outdated or hacked, rendering the system useless or dangerous.

🎅Risks of “Success” (Unintended Consequences):

  • Exacerbating Inequality: The technology could create a biological divide between the “enhanced” wealthy and the “natural” poor, leading to unprecedented social stratification.
  • Loss of Human Agency & Identity: If the system makes too many autonomous health decisions, it could erode personal bodily autonomy and challenge the very definition of being human.
  • New Forms of Dependency & Vulnerability: Society could become dependent on a fragile technological ecosystem. Personal health data streams could be exploited for surveillance, discrimination, or coercion.
  • Ecological Impact: Widespread use and eventual disposal of genetically modified living devices could have unforeseen consequences on ecosystems if not perfectly contained.

In conclusion, the Prometheus Symbiont proposes a radical leap from repairing humans to architecting a hybrid human-machine biology. Its path is fraught with towering scientific hurdles and profound ethical questions, meaning its development must be accompanied by societal dialogue as intense as the engineering effort itself.

Does the option: Option 1 | Option 2 | Option 3
Enhance Biosecurity
• By preventing incidents: 1
• By helping respond: 1
Foster Lab Safety
• By preventing incidents: n/a
• By helping respond: 1
Protect the environment
• By preventing incidents: 1
• By helping respond: 1
Other considerations
• Minimizing costs and burdens to stakeholders: n/a
• Feasibility?: n/a
• Not impede research: 1
• Promote constructive applications: 1

Based on the risk assessment of the disruptive technology “Prometheus Symbiont,” I recommend prioritizing the establishment of an “adaptive, multi-layered global governance framework” as the core focus of the governance strategy. My primary recommendation is directed at the Office of the United Nations Secretary-General, because the impact of this technology is inherently transboundary. Its ethical, safety, and equity issues require global coordination and consensus on principles; the potential risks cannot be effectively mitigated by the actions of any single nation.

My recommended priority solution is “Layered Adaptive Governance under Global Coordination.” This is not a single option, but a combination of international coordination, national/regional regulation, industry self-discipline, and public participation.

1. Top Layer: Establish Global Principles and Coordination Mechanisms (Led by the United Nations)

  • Action: Promote the adoption of the “Global Declaration on Ethical and Governance Principles for Human-Technology Symbionts” and establish a standing, interdisciplinary Global Advisory Committee on Emerging Bio-Hybrid Technologies (GACEBT).
  • Rationale: This provides the legitimacy foundation and “safety guardrails” for all subsequent governance. The committee, comprising scientists, ethicists, legal scholars, social activists, and government representatives, would be responsible for ongoing technology impact assessments, identifying transboundary risks (e.g., biosafety breaches, exacerbation of global inequality), and issuing non-binding guidelines. This avoids premature, rigid international legal constraints (which could stifle innovation) while establishing inviolable red lines.

2. Middle Layer: Develop National/Regional Specialized Regulatory Pathways (Led by Major Economies like the US, EU, and China)

  • Action: Under the guidance of GACEBT principles, national regulatory agencies (e.g., US FDA, EU EMA, China NMPA) should jointly design new product categories (e.g., “Class I Symbiotic Therapeutic Device”) and approval pathways for “active symbiotic medical systems.” This should include mandatory phased clinical trial protocols and a post-market supervision model of “pre-certification, monitoring, and re-evaluation.”
  • Rationale: This translates governance into actionable frameworks by entities with enforcement power. Coordination among major economies prevents regulatory arbitrage and provides a template for global standards. The pre-certification system allows for limited application under strict monitoring (e.g., for patients with terminal illnesses and no alternative therapies) while continuously collecting real-world data to refine the rules.

3. Grassroots Layer: Strengthen Industry Self-Regulation and Transparent Public Participation

  • Action: Encourage leading research institutions (e.g., MIT, Chinese Academy of Sciences) and industry consortia to develop open-source safety standard protocols (e.g., engineering design standards for biocontainment modules). Simultaneously, legislation should require R&D projects to conduct transparent social impact assessments from an early stage and incorporate public input through mechanisms like citizens’ juries.
  • Rationale: Governing technical details requires industry expertise, while public trust is foundational for societal acceptance. Open-source standards can accelerate the adoption of safe practices. Early public engagement helps identify social acceptance issues promptly, helping to avoid the public relations pitfalls experienced with technologies like GMOs.

🐱‍🐉Homework Questions from Professor Jacobson:

1.Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

Answer:

  1. Error rate of DNA polymerase: about 1 in 10⁶ bases (Beese et al. (1993), Science, 260, 352-355).
  2. The haploid human genome contains roughly 3.16 billion base pairs (≈ 3.16 × 10⁹ bp). At the polymerase error rate alone (~10⁻⁶ errors/bp), copying the entire genome once would introduce roughly (3.16 × 10⁹) × 10⁻⁶ ≈ 3,160 errors. With further error correction after synthesis, such as mismatch repair (bringing the overall rate to ~10⁻¹⁰ errors/bp), the expected number of errors per genome replication is (3.16 × 10⁹) × 10⁻¹⁰ ≈ 0.316. On average this is less than one error per replication cycle, a biologically tolerable rate.
  3. To ensure stable genome expression, biological systems employ multiple layers of regulation that calibrate differences arising from genetic variation, environmental influences, and stochastic molecular events:

Transcriptional Fidelity & Regulation

  1. Proofreading in transcription: Although RNA polymerases lack the extensive proofreading seen in DNA replication, some backtracking and cleavage mechanisms exist (e.g., in eukaryotic Pol II) to correct misincorporated nucleotides.
  2. Promoter specificity & transcription factors (TFs): TFs and enhancer/repressor elements precisely control when and where genes are expressed, minimizing off-target or noisy transcription.
  3. Chromatin remodeling & epigenetic marks: Histone modifications, DNA methylation, and nucleosome positioning ensure that genes are expressed in the correct cell type and developmental stage, buffering against improper activation or silencing.

Post-transcriptional Control

  1. RNA processing: Splicing, capping, and polyadenylation are highly regulated to produce consistent mature mRNA isoforms.
  2. RNA surveillance pathways: Nonsense-mediated decay (NMD) degrades mRNAs with premature stop codons. No-go decay (NGD) and non-stop decay (NSD) clear stalled or faulty transcripts. RNA editing (e.g., A-to-I editing) can correct or diversify transcripts in a regulated manner.
  3. MicroRNAs & other small RNAs: Fine-tune mRNA stability and translation, reducing expression variability and silencing aberrant transcripts.

Translational Accuracy & Control

  1. Ribosome proofreading: During tRNA selection, ribosomes favor accurate codon–anticodon pairing; elongation factors (e.g., EF-Tu) and ribosomal RNA help discriminate correct vs. incorrect tRNAs.
  2. Regulation of initiation: Initiation factors (eIFs) and upstream open reading frames (uORFs) modulate translation rates to match cellular needs and stress conditions.
  3. Ribosome quality control (RQC): Recognizes stalled ribosomes and triggers degradation of incomplete polypeptides and potentially faulty mRNAs.

Protein Homeostasis (Proteostasis)

  1. Chaperones & folding catalysts: Assist proper protein folding, preventing aggregation of misfolded proteins.
  2. Ubiquitin-proteasome system & autophagy: Degrade damaged, misfolded, or excess proteins.
  3. Feedback regulation: Many metabolic and signaling pathways use allosteric feedback or post-translational modifications to maintain stable protein activity levels.

DNA Repair & Genome Integrity Maintenance

  1. Continuous operation of mismatch repair (MMR), base excision repair (BER), nucleotide excision repair (NER), and double-strand break repair pathways prevents mutations from accumulating and altering gene expression programs.
  2. Cell-cycle checkpoints halt division if DNA damage is detected, allowing time for repair or triggering apoptosis if damage is irreparable.

Systems-Level Buffering

  1. Genetic redundancy: Duplicate genes or paralogs can compensate for loss or reduced function of one copy.
  2. Robust network architectures: Many regulatory networks (e.g., transcription factor networks, signaling cascades) are built with feedback loops, redundancy, and modularity to maintain stable outputs despite perturbations.
  3. Noise filtering: Stochastic fluctuations in molecule numbers are dampened through negative feedback, time-averaging mechanisms, or threshold-based activation.
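The error arithmetic in the answer to question 1 can be checked with a few lines of Python. This is only a minimal sketch: the genome size and the two per-base error rates are the figures quoted in that answer, and the function name `expected_errors` is an illustrative choice, not part of any library.

```python
# Haploid genome size quoted in the answer above (base pairs)
GENOME_BP = 3.16e9


def expected_errors(genome_bp: float, error_rate_per_bp: float) -> float:
    """Expected number of misincorporated bases in one full copy of the genome."""
    return genome_bp * error_rate_per_bp


# The two per-base error rates discussed in the answer
higher_rate_errors = expected_errors(GENOME_BP, 1e-6)
lower_rate_errors = expected_errors(GENOME_BP, 1e-10)

print(f"{higher_rate_errors:,.0f} expected errors per replication at 1e-6 errors/bp")
print(f"{lower_rate_errors:.3f} expected errors per replication at 1e-10 errors/bp")
```

The expected count scales linearly with both genome length and error rate, which is why a four-orders-of-magnitude improvement in fidelity moves the result from thousands of errors to less than one per replication.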

2.How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

Answer:

  1. For a 375-amino-acid protein, the total number of DNA sequences is the product of the number of codon possibilities for each position. With roughly three synonymous codons per amino acid on average, that is on the order of 3³⁷⁵ ≈ 10¹⁷⁹ possible sequences.
  2. In essence, natural selection has chosen the specific DNA sequence for each human gene not just to encode the correct amino acids, but also to contain the precise regulatory, structural, and kinetic instructions needed for its proper expression, regulation, and function. The vast majority of theoretically possible sequences lack this full suite of integrated instructions, which makes arbitrary recodings difficult to express successfully under artificial conditions.
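The "product of codon possibilities" can be made concrete with a short Python sketch. It assumes the standard genetic code (61 sense codons, 3 stop codons); the function name `count_codings` and the example peptide are hypothetical and chosen only for illustration.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
STOP_CODONS = 3  # TAA, TAG, TGA


def count_codings(protein: str, include_stop: bool = True) -> int:
    """Count the DNA sequences that encode `protein` (one-letter amino acid codes)."""
    total = 1
    for amino_acid in protein.upper():
        total *= CODON_DEGENERACY[amino_acid]
    return total * STOP_CODONS if include_stop else total


example_peptide = "MKWLS"  # hypothetical five-residue peptide
print(count_codings(example_peptide, include_stop=False))

# Order-of-magnitude estimate for 375 residues at about 3 codons per position
print(f"about 10^{375 * math.log10(3):.0f} sequences")
```

Python integers have arbitrary precision, so the same function works on a full-length protein sequence without overflow.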

🤳Homework Questions from Dr. LeProust:

1.What’s the most commonly used method for oligo synthesis currently?

The most widely used and established method for oligo synthesis is solid-phase synthesis using the phosphoramidite method. This technology is the industry standard for producing both DNA and RNA oligonucleotides in research and commercial settings.

  • High Efficiency & Automation: Each coupling step has an efficiency exceeding 99%, enabling fully automated, high-throughput synthesis on machines.
  • Versatile Chemistry: It provides a robust platform to introduce a vast array of chemical modifications (to the phosphate backbone, sugar, or base), which is crucial for creating therapeutic oligonucleotides like antisense drugs or siRNAs.
  • Proven Reliability: As a mature technology refined over decades, it is the universal platform for commercial vendors and core facilities.

2.Why is it difficult to make oligos longer than 200nt via direct synthesis?

The ~200 nucleotide (nt) barrier for direct chemical synthesis is a fundamental limitation of the dominant phosphoramidite solid-phase method. The difficulty isn’t a single issue but a cascade of compounding chemical and practical problems.

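The primary bottleneck is that synthesis is a stepwise process, and no chemical coupling is 100% efficient, so losses compound with every added base. For example, at 99.5% efficiency per coupling, the full-length yield of a 200-mer is about 0.995¹⁹⁹ ≈ 37%, which means over 60% of the product consists of shorter failure sequences.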
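The compounding loss can be sketched in Python. This is a simplified model that assumes every coupling step has the same efficiency and ignores depurination and other side reactions; the function name `full_length_yield` and the efficiencies tried are illustrative assumptions.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands reaching full length after (length_nt - 1) coupling steps."""
    return coupling_efficiency ** (length_nt - 1)


# Compare a few oligo lengths at two assumed per-step efficiencies
for efficiency in (0.99, 0.995):
    for length in (20, 100, 200, 2000):
        fraction = full_length_yield(length, efficiency)
        print(f"efficiency {efficiency:.3f}, {length:>4} nt: {fraction:.2%} full length")
```

Because the yield is an exponential in the length, small gains in per-step efficiency matter a great deal for long oligos, but no realistic efficiency rescues a strand thousands of bases long.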

3.Why can’t you make a 2000bp gene via direct oligo synthesis?

The challenges do not just grow linearly; because per-step losses compound, they become prohibitively severe beyond ~200 nucleotides (nt). Synthesizing a 2000bp double-stranded gene directly would require single-stranded oligos of about 2000nt, which is not practical with current direct chemical methods.

No existing purification technology (HPLC, PAGE, etc.) can separate a 2000nt strand from a 1999nt strand (a 0.05% difference in mass/length). The desired product is physically indistinguishable from the “near-miss” failures.

Gene assembly is the practical solution: it builds the skyscraper in prefabricated, high-quality sections (short oligos) and then welds them together with enzymatic precision.
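The "prefabricated sections" idea can be illustrated by tiling a long sequence into short overlapping oligos. The sketch below is deliberately simplified: it keeps all oligos on one strand and uses fixed lengths, whereas real assembly designs typically alternate strands and balance overlap melting temperatures. The function name `tile_into_oligos`, the 60 nt length, the 20 nt overlap, and the placeholder gene are all assumptions for illustration.

```python
def tile_into_oligos(sequence: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split `sequence` into oligos of at most `oligo_len` nt that overlap by `overlap` nt."""
    if not 0 <= overlap < oligo_len:
        raise ValueError("overlap must be non-negative and smaller than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + oligo_len])
        if start + oligo_len >= len(sequence):
            break  # this oligo reaches the end of the sequence
        start += step
    return oligos


gene = "ACGT" * 500  # placeholder 2000 nt sequence, not a real gene
oligos = tile_into_oligos(gene)

# Rebuild the gene by keeping only the non-overlapping part of each later oligo
reassembled = oligos[0] + "".join(oligo[20:] for oligo in oligos[1:])
print(len(oligos), "oligos;", "reassembly matches:", reassembled == gene)
```

Each oligo stays well inside the length range where chemical synthesis gives usable yields, and the shared overlaps are what let enzymatic assembly methods join neighbours in the right order.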

😎Homework Question from George Church: 1.Choose ONE of the following three questions to answer; and please cite AI prompts or paper citations used, if any.

AI prompts:[Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Essential: Leucine, Isoleucine, Valine, Lysine, Methionine, Threonine, Tryptophan, Phenylalanine
Essential, especially for infants/young: Histidine
Conditionally essential (especially for the young): Arginine

The “Lysine Contingency” is a clever plot device but is fundamentally flawed as a biological containment strategy for several key reasons:

The lysine contingency in Jurassic Park (Maynard, 2018; Rubini & Mayer, 2020): Lysine is already an essential amino acid for all vertebrate animals, including humans. This means animals like dinosaurs naturally cannot synthesize it and must obtain it from their diet. Therefore, the idea of “removing” a lysine-synthesizing ability they never possessed doesn’t work, and any animal with access to a normal diet would get enough lysine to survive.

In summary, while the “Lysine Contingency” is an imaginative concept, it misunderstands basic animal biochemistry and fails as a practical fail-safe.

Rubini, R., & Mayer, C. (2020). Addicting Escherichia coli to new-to-nature reactions. ACS Chemical Biology, 15(12), 3093-3098.
Maynard, A. (2018). Films from the Future: The Technology and Morality of Sci-Fi Movies. Mango Media Inc.

2.[Given slides #2 & 4 (AA:NA and NA:NA codes)] What code would you suggest for AA:AA interactions?

Energy-Based Interactions (More Accurate)

Using PyRosetta or BioPython with energy functions

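import pyrosetta
from pyrosetta.rosetta.core.scoring import EMapVector

pyrosetta.init()

def calculate_interaction_energy(pose, res1, res2):
    """Weighted two-body interaction energy between residues res1 and res2 (pose numbering)."""
    sfxn = pyrosetta.get_fa_scorefxn()
    sfxn(pose)  # score the pose once so energies and neighbor information are current

    # Accumulate the context-independent and context-dependent two-body terms
    emap = EMapVector()
    sfxn.eval_ci_2b(pose.residue(res1), pose.residue(res2), pose, emap)
    sfxn.eval_cd_2b(pose.residue(res1), pose.residue(res2), pose, emap)

    # Apply the score function weights to the raw terms and sum them
    return emap.dot(sfxn.weights())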

3.[(Advanced students)] Given the one paragraph abstracts for these real 2026 grant programs sketch a response to one of them or devise one of your own:

What if our most advanced biological medicines were as easy to ship and store as aspirin?

To break biologics’ extreme reliance on ultra-cold chains, a systemic transformation across science, logistics, and policy is required.

The immediate focus must be on re-engineering the molecules themselves. Massive investment in formulation science—utilizing advanced lyophilization, stabilizing sugars and polymers, and novel drying techniques—can shift storage from -70°C to 2-8°C or even room temperature. Parallel development of subcutaneous auto-injectors or oral delivery systems reduces dependency on clinic-based intravenous infusion.

Simultaneously, we must redesign the supply chain with intelligence and resilience. Deploying IoT-enabled smart containers with real-time tracking and blockchain ledgering ensures integrity and accountability. Creating distributed networks of certified storage points at regional centers expands access geographically. AI-driven predictive logistics can preempt shipping failures.

Long-term disruption will come from decentralizing production. Adopting modular, continuous biomanufacturing platforms enables regional or even hospital-based production, slashing distribution miles and cold-chain complexity. Next-generation platforms like thermostable lipid nanoparticles for nucleic acid therapies are equally crucial.

Finally, policy must incentivize accessibility. Regulators should create expedited pathways for thermostable products. Payers must align reimbursement with value metrics that include reduced logistical burden and improved patient access. A national strategy, treating biologic supply as critical infrastructure, can coordinate public-private R&D and strategic stockpiling.

The ultimate goal is a transition from a fragile, centralized, cold-dependent model to a resilient, distributed system where life-changing therapies are defined by their efficacy, not by the freezer they inhabit. This convergence of science, smart engineering, and supportive policy will democratize access to advanced medicines.

Week 10 HW: hw-imaging-and-measurement

Homework: Final Project

Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

🎅All projects will be completed in two parts:

  1. Natural Photosynthesis Models
  2. A computational simulation experiment of de novo protein design that relies on self‑supplied photosynthetic energy and is driven by continuous directed evolution

To our knowledge, no directly relevant energy-coupled evolution system exists: no published system to date has achieved the full loop where functional activity drives ATP regeneration, which in turn drives the evolution of the function itself. All existing continuous evolution systems, regardless of their maturity, rely on external energy input (i.e., normal host cell metabolism) to drive mutation and selection, so there is no functional coupling between evolution and energy supply. This is precisely the core innovation space of our project: shifting directed evolution from “externally powered” to “self-powered by function”.

Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements. What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

Experimental Protocol for Exploring Natural Photosynthesis Principles: Photosynthesis-Chlorophyll Fluorescence System

Experimental Subject

Peanut (Arachis hypogaea) – a plant that exhibits leaf sleep movements (nyctinasty) and follows its own biological rhythm.

Core Instruments

  • DUAL-PAM 100 (Chlorophyll fluorescence and P700 measurement system)
  • GFS-3000 (Portable photosynthesis and gas exchange system)
  • 3010-DUAL Combined Leaf Chamber (for simultaneous measurements)

1. Experimental Objective

This experiment aims to systematically analyze the dynamic response mechanisms of photosynthetic efficiency in peanut leaves. By integrating high-precision photosynthesis-fluorescence measurements with the plant’s intrinsic circadian rhythm, we investigate the relationship between chlorophyll fluorescence kinetics and photosynthetic performance.

Peanut is chosen because it is not only a model crop sensitive to light intensity and CO₂ concentration changes, but also exhibits conspicuous “day-open, night-closed” leaf movements – a visible phenomenon that helps connect the invisible energy flow with physical observation.


2. Background: Circadian Rhythm of Peanut Leaves

Before starting the experiment, understanding the subject’s unique rhythm is essential. The “sleep movements” of peanut leaves often serve as an indicator of plant health. Deeper research reveals:

  • Endogenous rhythm: Even under continuous light, the photosynthetic rhythm of peanut has a period of approximately 26 hours.
  • Stomatal regulation: The rhythmic changes in photosynthesis are primarily driven by endogenous changes in stomatal aperture.
  • Enzyme activity variations: RuBisCO (the key enzyme for carbon fixation) shows highest activity during the normal photoperiod, which correlates with but is not exactly in phase with the peak net photosynthetic rate.

Key implication: Measurements must be taken at different times of the day (e.g., early morning, noon, evening, late night) to fully capture the dynamic photosynthetic behavior of peanut.


3. Core Technologies and Instrument Coupling

3.1 GFS‑3000 Function

Performs macroscopic analysis – precisely controls and monitors the leaf microenvironment while simultaneously measuring gas exchange.

  • Controls CO₂ concentration, temperature, humidity, and light intensity (0–3000 µmol·m⁻²·s⁻¹)
  • High-precision sensor (CO₂ ±0.2 ppm) calculates net photosynthetic rate (A), stomatal conductance (gₛ), etc.

3.2 DUAL‑PAM‑100 Function

Performs microscopic analysis – measures chlorophyll fluorescence and P700 signals to reveal the state of photosystems (PSII/PSI) and electron transport efficiency.

3.3 Coupling Function (3010‑DUAL Leaf Chamber)

Allows simultaneous acquisition of macroscopic gas exchange data and microscopic photosystem activity under identical environmental conditions.

Advanced functions derived from coupling:

  • Combined PSI/PSII parameters
  • Dynamic light response curves

4. Detailed Experimental Design and Steps

4.1 Preparation Phase

Plant preparation

  • Select healthy peanut plants.
  • Acclimate them to the intended experimental environment (e.g., 22–25 °C, 16 h light / 8 h dark) for at least 24 h.

Instrument warm-up

  • Turn on and preheat the GFS‑3000 at least 1 h before measurement to stabilize internal temperature.

System coupling

  • Dark-adapt a leaf for 30–45 minutes.
  • Clamp the leaf into the 3010‑DUAL combined chamber, ensuring a tight seal and correct connections.

4.2 Formal Experiments

Test 1: Photosynthetic induction kinetics

  • Measure initial fluorescence (Fo) under weak light (≤1 µmol·m⁻²·s⁻¹).
  • Turn on actinic light (300–400 µmol·m⁻²·s⁻¹) and simultaneously start recording with both GFS‑3000 and DUAL‑PAM‑100.

Test 2: Light response curves (multiple)

  • After completing the induction protocol, run an automatic sequence of light intensities (low → high → low) to obtain a light response curve and reveal any hysteresis between the ascending and descending branches.
  • Record net photosynthetic rate vs. light intensity.
  • Simultaneously obtain Y(II), NPQ, and other fluorescence parameters at each light level.

Test 3: CO₂ response curve

  • Use the GFS‑3000 program under stable light and temperature.
  • Expose the leaf to a stepwise change in CO₂ concentration (e.g., 400 → 50 → 1500 ppm).
  • Generate an A/Ci curve (CO₂ response).

Test 4: Fluorescence induction and quenching analysis

  • Use DUAL‑PAM‑100 saturation pulses to analyze fluorescence quenching components:
    • Photochemical quenching (qP)
    • Non‑photochemical quenching (NPQ)
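
The quenching components listed above are normally derived from a handful of raw fluorescence levels. The sketch below uses the standard textbook definitions of Fv/Fm, Y(II), qP, and NPQ; the function name, argument names, and example readings are hypothetical and are not taken from the instrument software.

```python
def pam_parameters(fo, fm, f_s, fm_p, fo_p):
    """Derive common PAM parameters from raw fluorescence levels (arbitrary units).

    fo, fm     : minimal / maximal fluorescence of the dark-adapted leaf
    f_s        : steady-state fluorescence under actinic light
    fm_p, fo_p : maximal / minimal fluorescence of the light-adapted leaf
    """
    return {
        "Fv/Fm": (fm - fo) / fm,             # maximum PSII quantum yield
        "Y(II)": (fm_p - f_s) / fm_p,        # effective PSII quantum yield
        "qP": (fm_p - f_s) / (fm_p - fo_p),  # photochemical quenching
        "NPQ": (fm - fm_p) / fm_p,           # non-photochemical quenching
    }


# Hypothetical readings for illustration only, not measured data.
params = pam_parameters(fo=0.20, fm=1.00, f_s=0.40, fm_p=0.80, fo_p=0.18)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

In practice the instrument software reports these values directly; a helper like this is mainly useful for re-checking exported raw traces.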

4.3 Replication and Recording

  • Perform each measurement on at least three individual plants for statistical reliability.
  • After each change in conditions, wait for the signal to stabilize before recording data.
  • Data backup: In addition to automatic instrument storage, manually record key time points and values.

5. Key Considerations

  • Warm‑up and calibration: Warm up instruments for at least 15 minutes; for highest precision, allow 1 hour. Refer to technical references:

    • 10.1104/pp.53.6.907
    • 10.3389/fpls.2014.00766
    • 10.1007/s11120-022-00924-z
  • Data safety: Always keep manual records alongside automated logs.

  • Exploratory extension (recommended): Measure photosynthetic parameters during daytime vs. nighttime (leaf “awake” vs. “asleep” states) and analyze how leaf sleep movements affect gas exchange (GFS‑3000 data) and PSII function (chlorophyll fluorescence data).


6. Expected Outcomes

  • Quantified relationship between light intensity, CO₂ concentration, and photosynthetic rate in peanut.
  • Dynamic profiles of chlorophyll fluorescence parameters (Fo, Fm, Y(II), NPQ) during induction and under varying light/CO₂.
  • Correlation between leaf nyctinasty (sleep movements) and photosynthetic efficiency.
  • Validation of the coupled GFS‑3000 + DUAL‑PAM‑100 platform for investigating natural photosynthesis principles.

7. Figure / Schematic Suggestion (Optional)

[Peanut plant] → (Dark adaptation) → [3010-DUAL chamber] 
                              GFS-3000 ←→ DUAL-PAM-100
                                  ↓              ↓
                            Gas exchange    Fluorescence &
                            (A, gs, Ci)     P700 (Y(II), NPQ)

In silico design of a mini de novo protein (e.g., 80‑120 aa) with high predicted stability

Case article analysis

System overview diagram

🧓Homework: Waters Part I — Molecular Weight

Based on the predicted amino acid sequence of eGFP (see below) and any known modifications, what is the calculated molecular weight? You can use an online calculator like the one at https://web.expasy.org/compute_pi/

Compute pI/Mw - Results

1. Basic Protein Information

| Parameter | Value |
| :--- | :--- |
| Theoretical pI | 5.90 |
| Molecular Weight (Mw, average) | 28006.60 Da |

2. Protein Sequence

        10         20         30         40         50         60 
MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT 

        70         80         90        100        110        120 
LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL 

       130        140        150        160        170        180 
VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA 

       190        200        210        220        230        240 
DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYKL 

EHHHHHH

Calculation of eGFP Molecular Weight Using the Adjacent Charge State Method

1. Theoretical Target Calculation

Based on the provided amino acid sequence of eGFP with a 6xHis tag (MVSK...HHHHHH), the theoretical values are established as follows:

  • Fully Reduced / Unmatured Mw: 28,006.60 Da
  • Matured eGFP Mw: During proper folding, the internal TYG tripeptide (Thr65-Tyr66-Gly67 in conventional GFP numbering) undergoes spontaneous cyclization and oxidation to form the fluorophore. This maturation process releases one water molecule (-18 Da) and two hydrogen atoms (-2 Da), reducing the total molecular weight by 20.0 Da.
  • Expected M_theoretical: 27,986.60 Da

2. Step-by-Step Adjacent Charge State Method

In electrospray ionization mass spectrometry (ESI-MS) operating in positive mode, a protein appears as a series of multiply charged peaks (the charge envelope). To calculate the molecular weight from Figure 1, choose any two adjacent peaks and perform the following operations:

Step 1: Record Measured Values from Figure 1

Identify two neighboring peaks from the intact mass spectrum and denote their mass-to-charge ratios as m/z_1 and m/z_2:

  • Let m/z_1 be the peak with the larger value (lower charge state, z).
  • Let m/z_2 be the peak immediately to its left with the smaller value (higher charge state, z + 1).

(Note: For intact eGFP, these peaks typically appear in the m/z range of 1,100 to 1,500.)

Step 2: Determine the Charge State (z)

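Because the peaks are adjacent, their charge states differ by exactly 1. Taking the mass of each charge carrier as 1.0078 Da (the hydrogen-atom value used throughout this write-up; a bare proton is closer to 1.0073 Da, a negligible difference at this mass), the relationship is: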

  • Peak 1 equation: m/z_1 = (M + z * 1.0078) / z
  • Peak 2 equation: m/z_2 = (M + (z + 1) * 1.0078) / (z + 1)

Solving this system of equations for the lower charge state (z) yields the following standard formula:

z = (m/z_2 - 1.0078) / (m/z_1 - m/z_2)
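
A minimal sketch of this calculation follows. It applies the formula for z and then rearranges the Peak 1 equation to M = z × (m/z_1 − 1.0078), a final step the text above does not spell out. The function names are hypothetical, and the two peak positions are synthetic values generated from the theoretical mass, not readings from Figure 1.

```python
CHARGE_CARRIER = 1.0078  # Da, the value used in the text above


def charge_from_adjacent(mz_1, mz_2, carrier=CHARGE_CARRIER):
    """Charge z of the peak at mz_1, given its left-hand neighbour mz_2 (charge z + 1)."""
    return round((mz_2 - carrier) / (mz_1 - mz_2))


def neutral_mass(mz, z, carrier=CHARGE_CARRIER):
    """Rearranged Peak 1 equation: M = z * (m/z - carrier)."""
    return z * (mz - carrier)


# Synthetic peaks for illustration: generated from the theoretical mature mass.
M_THEORETICAL = 27986.60
mz_1 = M_THEORETICAL / 20 + CHARGE_CARRIER  # charge state z
mz_2 = M_THEORETICAL / 21 + CHARGE_CARRIER  # charge state z + 1

z = charge_from_adjacent(mz_1, mz_2)
mass = neutral_mass(mz_1, z)
print(f"z = {z}, M = {mass:.2f} Da")
```

With real data, repeating the calculation over several adjacent pairs and averaging the resulting masses reduces the effect of peak-picking error.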
  • Can you observe the charge state for the zoomed-in peak in the mass spectrum for the intact eGFP? If yes, what is it? If no, why not?

Yes, the charge state can be directly determined from the enlarged peak. In a high-resolution mass spectrum (such as Orbitrap or Q-TOF), the enlarged peak resolves into individual isotopic peaks within the same charge cluster. The charge state ($z$) can be determined by measuring the distance ($\Delta m/z$) between two adjacent isotopic peaks using the formula $z = 1 / \Delta m/z$. For intact eGFP (~28 kDa), the main charge states typically cluster around $+20$ to $+25$. For instance, if the observed isotopic spacing $\Delta m/z$ is $0.05$, the charge state of that specific peak is $+20$; if the spacing is $0.0476$, the charge state is $+21$.

🧓 Homework: Waters Part III — Peptide Mapping - primary structure


1. The Core Difference Between Native and Denatured Protein Conformations (What happens when a protein unfolds?)

  • Native Folded State: In its native state, eGFP adopts a highly compact, rigid $\beta$-barrel structure. The compact fold limits the solvent-accessible surface, so many of the basic residues that can be protonated (such as Lysine, Arginine, and Histidine) are shielded or engaged in intramolecular interactions, and only a subset is available to carry charge.
  • Denatured Unfolded State: When denaturing agents (such as high concentrations of acetonitrile, methanol, or acid) are introduced, the non-covalent interactions (hydrogen bonds, hydrophobic interactions, etc.) stabilizing the tertiary structure of eGFP are disrupted. Consequently, the $\beta$-barrel collapses, and the protein completely unfolds from a compact sphere into a flexible linear peptide chain. As the structure unravels, all the previously buried basic residues become fully exposed to the solvent environment.

2. How to Determine Protein Unfolding Using a Mass Spectrometer

A mass spectrometer detects protein unfolding with high sensitivity by monitoring changes in the mass-to-charge ratio ($m/z$) and the distribution of the charge state envelope.

During the Electrospray Ionization (ESI) process, the more basic residues that are exposed, the more protons ($\text{H}^+$) the protein can accept. Therefore:

$$\text{More unfolded structure} \longrightarrow \text{Higher charge state } (z) \longrightarrow \text{Lower } m/z \text{ value } (m/z = M/z)$$

By tracking the global shift of the charge envelope from a “high $m/z$, low charge” region to a “low $m/z$, high charge” region, the mass spectrometer can accurately determine the unfolding transition of the protein.

3. Major Changes Observed in Native vs. Denatured eGFP Mass Spectra (Figure 2)

When comparing the two direct infusion mass spectra in Figure 2, three prominent differences can be observed:

  • Shift in Charge State and Maximum Intensity (Charge State Shift):

    • Denatured eGFP (Unfolded): Exhibits high charge states. Because the linear chain exposes a vast number of protonation sites, the charge number ($z$) increases significantly (typically clustering around $+20$ to $+35$ or higher). On the spectrum, because the denominator $z$ is large, the peak cluster shifts to the left (lower $m/z$ range, typically between $800 \text{ and } 1500\ m/z$).

    • Native eGFP (Folded): Exhibits low charge states. The compact structure shields most ionizable sites, resulting in a much lower charge number ($z$) (typically only $+9$ to $+13$ charges). On the spectrum, the peak cluster shifts to the right (higher $m/z$ range, typically between $2000 \text{ and } 3000\ m/z$ or higher).

  • Width of the Charge State Envelope:

    • Denatured eGFP: Displays a very broad charge state envelope containing many consecutive charge peaks. This is because the linear peptide chain is highly flexible in a denaturing solution, existing as a dynamic ensemble of various partially unfolded intermediate conformations, which leads to high charge heterogeneity during ionization.

    • Native eGFP: Displays a very narrow and highly localized charge state envelope, often consisting of only 3 to 5 major peaks. This reflects the high conformational homogeneity and structural rigidity of the native eGFP protein.

  • Minor Mass Shifts After Deconvolution:

    • Native mass spectrometry may retain tightly bound non-covalent adducts or buffer components, which appear as small positive mass shifts. However, because the mature eGFP chromophore is formed through covalent cyclization (a loss of 20 Da), the deconvoluted molecular weight should remain close to 27,986.6 Da in both states, unless harsh denaturing conditions cause truncation or loss of specific modifications.

1. What is the Charge State?

For eGFP, which has a molecular weight of approximately $28\text{ kDa}$ (specifically $27,986.60\text{ Da}$ in its mature form), the peak observed at $\sim 2800\ m/z$ in the native mass spectrum corresponds to a charge state of $+10$.

  • Mathematical Verification: Using the mass-to-charge ratio formula: $$m/z \approx \frac{M}{z}$$

    When $z = 10$: $$m/z \approx \frac{27,986.60}{10} \approx 2798.66\ m/z$$

    This aligns perfectly with the peak observed around $\sim 2800\ m/z$ in Figure 3. Therefore, this peak represents the native eGFP molecular ion carrying 10 positive charges, designated as $[M + 10\text{H}]^{10+}$.


2. How to Determine the Charge State in the Enlarged Spectrum (Figure 3)

Using a high-resolution instrument like the Waters Xevo G3-QTof, when you zoom in on a specific macro-peak under native conditions, you can accurately determine or verify the charge state using the following two standard methodologies:

Method A: Observing the Isotopic Spacing (The Direct Visual Approach)

If the resolution in Figure 3 is high enough to resolve the fine structure, the single macro-peak will split into a cluster of individual isotopic peaks (Isotope Cluster).

  • Principle: Each adjacent isotopic peak differs by exactly one neutron (a nominal mass difference of $\Delta M \approx 1\text{ Da}$). Therefore, their horizontal spacing ($\Delta m/z$) on the mass spectrum is strictly determined by the charge state ($z$) through the formula: $$z = \frac{1}{\Delta m/z}$$

  • Application to this peak: For this specific peak, the spacing between two consecutive isotopic sub-peaks should be approximately $0.1$: $$z = \frac{1}{0.1} = 10$$ Measuring this fine isotopic interval identifies the charge state directly as $+10$.
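
The isotopic-spacing rule can be sketched in a few lines. The helper name and the peak list below are hypothetical; the peak positions are invented to illustrate a spacing of about 0.1 and are not read from Figure 3.

```python
def charge_from_isotope_peaks(peak_mz):
    """Estimate z from resolved isotopic peaks via z = 1 / (mean spacing)."""
    ordered = sorted(peak_mz)
    mean_spacing = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return round(1.0 / mean_spacing)


# Invented isotopic sub-peaks around m/z 2800, for illustration only.
example_peaks = [2799.60, 2799.70, 2799.80, 2799.90]
print(charge_from_isotope_peaks(example_peaks))
```

Averaging the spacing over the whole cluster, rather than using a single pair of peaks, makes the estimate less sensitive to centroiding error.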

Method B: Utilizing the Adjacent Charge State Method (The Macro-Peak Approach)

If the isotopic resolution is not visible in Figure 3 (i.e., the peak remains a single smooth envelope), you must determine the charge state by comparing it with its immediate neighboring macro-peaks from the broader spectrum:

  1. Identify the adjacent peak immediately to the left of the $\sim 2800\ m/z$ peak (which typically appears around $\sim 2544.2\ m/z$, representing the $+11$ charge state).

  2. Substitute the $m/z$ values of both neighboring peaks into the adjacent charge state formula: $$z = \frac{m/z_{\text{smaller}} - 1.0078}{m/z_{\text{larger}} - m/z_{\text{smaller}}}$$

  3. Calculation: $$z = \frac{2544.24 - 1.0078}{2798.66 - 2544.24} = \frac{2543.2322}{254.42} \approx 10$$

This mathematically confirms that the target peak at $\sim 2800\ m/z$ carries a charge state of $10$.

🧓 Homework: Waters Part III — Peptide Mapping - primary structure

Lysine (K): There are 20 Lysine residues.

Arginine (R): There are 6 Arginine residues.

2. How many peptides will be generated from tryptic digestion of eGFP?

PeptideMass - Results

1. General Information & Submission Parameters

| Parameter | Configuration / Value |
| :--- | :--- |
| Selected Enzyme | Trypsin |
| Max Missed Cleavages (MC) | 0 |
| Cysteines Modification | All cysteines in reduced form |
| Methionines Modification | Methionines have not been oxidized |
| Mass Range Filter | > 500 Dalton |
| Mass Calculation Type | Monoisotopic masses of residues, given as $[M+H]^+$ |

📊 Protein Properties Summary

  • Theoretical pI: 5.90
  • Molecular Weight (Average Mass): 28006.60 Da
  • Molecular Weight (Monoisotopic Mass): 27988.96 Da

2. Input Full Sequence

        10         20         30         40         50         60 
MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT 

        70         80         90        100        110        120 
LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL 

       130        140        150        160        170        180 
VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA 

       190        200        210        220        230        240 
DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYKL 

EHHHHHH

Peptide masses for your input sequence

| Mass ($[M+H]^+$) | Position | #MC | Modifications | Peptide Sequence |
| :--- | :--- | :--- | :--- | :--- |
| **4472.1752** | 170-210 | 0 | None | `HNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSK` |
| **2566.2931** | 217-239 | 0 | None | `DHMVLLEFVTAAGITLGMDELYK` |
| **2437.2608** | 5-27 | 0 | None | `GEELFTGVVPILVELDGDVNGHK` |
| **2378.2577** | 54-74 | 0 | None | `LPVPWPTLVTTLTYGVQCFSR` |
| **1973.9062** | 142-157 | 0 | None | `LEYNYNSHNVYIMADK` |
| **1503.6597** | 28-42 | 0 | None | `FSVSGEGEGDATYGK` |
| **1266.5783** | 87-97 | 0 | None | `SAMPEGYVQER` |
| **1083.4979** | 240-247 | 0 | None | `LEHHHHHH` |
| **1050.5214** | 115-123 | 0 | None | `FEGDTLVNR` |
| **982.4952** | 133-141 | 0 | None | `EDGNILGHK` |
| **821.3940** | 81-86 | 0 | None | `QHDFFK` |
| **790.3552** | 75-80 | 0 | None | `YPDHMK` |
| **769.3913** | 47-53 | 0 | None | `FICTTGK` |
| **711.2944** | 103-108 | 0 | None | `DDGNYK` |
| **655.3813** | 98-102 | 0 | None | `TIFFK` |
| **602.2780** | 211-215 | 0 | None | `DPNEK` |
| **579.3137** | 128-132 | 0 | None | `GIDFK` |
| **507.2925** | 164-167 | 0 | None | `VNFK` |
| **502.3235** | 124-127 | 0 | None | `IELK` |
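
The residue counts and the peptide list above can be cross-checked with a short in silico digest. This is a sketch under stated assumptions, not the PeptideMass implementation: it uses the simple rule "cleave after K or R unless the next residue is P", zero missed cleavages, and hand-entered standard monoisotopic residue masses. The function names are hypothetical.

```python
SEQ = (
    "MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPT"
    "LVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTL"
    "VNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLA"
    "DHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKL"
    "EHHHHHH"
)

# Standard monoisotopic residue masses in Da (assumption: typed in by hand).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # Da, added once per peptide
PROTON = 1.00728   # Da, for the [M+H]+ ion


def tryptic_peptides(seq):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


peptides = tryptic_peptides(SEQ)
above_500 = [p for p in peptides if mh_mass(p) > 500]

print("K:", SEQ.count("K"), "R:", SEQ.count("R"))
print("peptides (all):", len(peptides))
print("peptides (> 500 Da):", len(above_500))
```

Under this rule the full digest contains more peptides than the table lists, because the table applies the > 500 Da mass filter and therefore omits short fragments such as MVSK or TR.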

  • According to the LC-MS chromatogram presented in Figure 5a, a total of 19 major chromatographic peaks can be clearly distinguished within the 0.5 to 6.0-minute retention time window. This count applies a relative abundance threshold of >10% to filter out baseline chemical noise and minor artifacts. These resolved peaks correspond directly to the major, high-abundance tryptic peptide fragments derived from the digested eGFP construct.

  • The number of peaks observed in the actual chromatogram is typically lower than, or does not fully match, the theoretically predicted number of peptides.

  • 2.53 ppm

  • 88%

🧓 Homework: Waters Part IV — Oligomers

Characterization of Keyhole Limpet Hemocyanin (KLH) Oligomeric States via CDMS

1. Theoretical Mass Calculation and Spectrum Peak Mapping

Because Keyhole Limpet Hemocyanin (KLH) forms massive megadalton (MDa) multi-subunit assemblies, standard electrospray ionization mass spectrometry produces highly heterogeneous, unresolved charge states. Charge Detection Mass Spectrometry (CDMS) circumvents this limitation by tracking both the charge and m/z of individual single particles simultaneously, enabling direct macro-molecular mass measurements.

To locate each oligomeric species on the CDMS mass spectrum axis (x-axis, calibrated in megadaltons, MDa, or kilodaltons, kDa), we multiply the base subunit mass from Table 1 by the total number of combined subunits (oligomeric index):

| Oligomeric Species Name | Base Subunit | Subunit Count | Mathematical Calculation | Expected Spectrum Position |
| :--- | :--- | :--- | :--- | :--- |
| 7FU Decamer | 7FU (340 kDa) | 10 | 10 x 340 kDa = 3,400 kDa | 3.40 MDa |
| 8FU Didecamer | 8FU (400 kDa) | 20 | 20 x 400 kDa = 8,000 kDa | 8.00 MDa |
| 8FU 3-Decamer | 8FU (400 kDa) | 30 | 30 x 400 kDa = 12,000 kDa | 12.00 MDa |
| 8FU 4-Decamer | 8FU (400 kDa) | 40 | 40 x 400 kDa = 16,000 kDa | 16.00 MDa |
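
The same multiplication can be written as a short script, which is convenient if the subunit masses in Table 1 are later revised. The variable names are hypothetical; the subunit masses and counts are the ones given in the table above.

```python
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}  # base subunit masses from the table above

SPECIES = [
    ("7FU Decamer", "7FU", 10),
    ("8FU Didecamer", "8FU", 20),
    ("8FU 3-Decamer", "8FU", 30),
    ("8FU 4-Decamer", "8FU", 40),
]

# Expected position on the mass axis, in MDa, for each oligomeric species.
expected_mda = {
    name: count * SUBUNIT_KDA[subunit] / 1000 for name, subunit, count in SPECIES
}

for name, mass in expected_mda.items():
    print(f"{name}: {mass:.2f} MDa")
```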

2. Structural Analysis of Species along the CDMS Profile (Figure 7)

When examining the single-particle mass histogram in Figure 7, the peaks resolve sequentially from left to right across the mass distribution axis. They correspond to the following native quaternary arrangements:

  1. Far Left Cluster (around 3.40 MDa): Matches the 7FU Decamer. This peak represents a single, isolated cylindrical homo-decameric ring assembly that exists as a discrete species or has dissociated from larger structures.
  2. Central Dominant Peak (around 8.00 MDa): Matches the 8FU Didecamer. In native biochemistry, the 8FU isoform prefers to assemble into a massive, stable double-ring hollow cylinder composed of 20 total structural subunits, making it the most thermodynamically favored and high-abundance configuration under physiological conditions.
  3. Middle-Right Shoulder/Peak (around 12.00 MDa): Matches the 8FU 3-Decamer. This species corresponds to a tri-decamer configuration formed by the stacking of three decameric rings (a didecamer bound with an extra single-ring decamer unit).
  4. Far Right Tail/Peak (around 16.00 MDa): Matches the 8FU 4-Decamer. This massive multi-didecamer complex consists of 40 individual 8FU polypeptide subunits organized into an elongated, continuous double-didecamer tubular stack.

(Note: Depending on the specific software export configuration used in Figure 7, the horizontal mass axis may be labeled either as nominal Daltons in millions, written as ‘Mass (MDa)’, or thousands, written as ‘Mass (kDa)’. Please verify your spectrum labels to mark the positions cleanly at 3.4, 8.0, 12.0, and 16.0 respectively.)

Intact LC-MS Mass Accuracy Analysis (eGFP)

1. Data Summary Table

Below is the summary table for your intact LC-MS experiment of the eGFP construct. Please insert your deconvoluted or calculated experimental mass ($M_{\text{observed}}$) into the appropriate row based on the construct’s state:

| Construct State | Theoretical Mass ($M_{\text{theoretical}}$) | Observed/Measured Mass ($M_{\text{observed}}$) | Mass Error (ppm) | Analytical Conclusion |
| :--- | :--- | :--- | :--- | :--- |
| Matured eGFP (Chromophore Formed) | 27,986.60 Da | [Insert your measured value here] | [Calculated ppm value] | High-confidence identification. Confirms correct vector translation and successful internal chromophore maturation ($-20.0\text{ Da}$). |
| Unmatured eGFP (Fully Reduced) | 28,006.60 Da | [Insert your measured value here] | [Calculated ppm value] | High-confidence identification. Indicates fully translated amino acid sequence prior to functional fluorophore cyclization. |

2. Core ppm Mass Error Equation

In high-resolution mass spectrometry, mass accuracy is strictly evaluated using ppm (Parts Per Million). To determine your specific experimental error, substitute your values into the standard equation below:

$$\text{Mass Error (ppm)} = \left( \frac{M_{\text{observed}} - M_{\text{theoretical}}}{M_{\text{theoretical}}} \right) \times 10^6$$

  • $M_{\text{observed}}$: The uncharged, intact molecular weight obtained after deconvolution (or calculated via the adjacent charge state method) from your Intact LC-MS spectrum.
  • $M_{\text{theoretical}}$: The theoretical sequence mass calculated from the primary amino acid chain (27,986.60 Da for the matured form, or 28,006.60 Da for the unmatured form).

3. Step-by-Step Calculation Guide for Lab Reports

You can adapt the following mathematical format to display your specific steps in your assignment (demonstrated below using an example observed value of $27,986.44\text{ Da}$ for matured eGFP):

  1. Calculate the Absolute Mass Delta ($\Delta M$): $$\Delta M = M_{\text{observed}} - M_{\text{theoretical}}$$ $$\Delta M = 27,986.44 - 27,986.60 = -0.16\text{ Da}$$

  2. Calculate the Ratio Relative to the Theoretical Mass: $$\text{Ratio} = \frac{-0.16}{27,986.60} \approx -0.000005717$$

  3. Multiply by One Million ($10^6$) to Obtain the ppm Error: $$\text{Mass Error} = -0.000005717 \times 1,000,000 = -5.72\text{ ppm}$$
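
The three steps above collapse into a one-line function. This sketch reuses the example observed value of 27,986.44 Da from the worked calculation; the function name is hypothetical.

```python
def ppm_error(m_observed, m_theoretical):
    """Signed mass error in parts per million."""
    return (m_observed - m_theoretical) / m_theoretical * 1e6


# Example value from the worked calculation above (matured eGFP).
error = ppm_error(27986.44, 27986.60)
print(f"{error:.2f} ppm")
```

Keeping the sign is useful: a consistently negative or positive error across runs points to a calibration offset rather than random scatter.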

💡 Academic Thresholds: For high-performance instruments such as the Waters Xevo G3-QTof, an intact protein mass error within **

Week 11 HW: hw-building-genomes

👩‍🦰Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Reflections on the HTGAA 2026 Collaborative Community Bioart Project

1. My Contribution to the Project

For this collaborative bioart experiment, I made part of the DNA pattern on the bottom right plate, ensuring the engineering of the custom fragments aligned perfectly with the broader communal design layout.

2. What I Liked About the Project

What stood out to me most was the brilliant interdisciplinary intersection of molecular biology, genetic evolution, and creative visual art. It was incredibly fulfilling to see highly complex scientific concepts—such as mutation frequencies, targeted genetic modifications, and cellular growth—manifest as a physical, living art installation. The sense of shared ownership within the paper study group and the broader HTGAA community made tracking the plates’ development an engaging, collaborative journey.

3. Areas for Improvement Next Year

To make this collaborative art experiment even better for the next cohort, I would suggest the following enhancements:

  • Streamlined Digital Alignment Mapping: Implement a shared, real-time digital layout grid (such as a collaborative web canvas or vector mapping tool) prior to plating. This would help remote nodes and individual contributors preview how their specific plates interconnect geometrically with neighboring sections, avoiding minor spatial misalignments at the edges of the final composite image.
  • Standardized High-Resolution Progression Tracking: Establish a unified imaging protocol across all participating nodes. Providing identical camera calibration settings or background lighting rigs for the growth phases would yield highly consistent time-lapse documentation, capturing the true vibrant morphology of the living medium across the entire collective canvas.

👩‍🦰Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Cell-Free Protein Synthesis (CFPS) Reaction System Analysis

1. Description of Component Roles

  • E. coli Lysate (BL21 (DE3) Star Lysate, includes T7 RNA Polymerase): Provides the essential molecular machinery for transcription and translation, including ribosomes, tRNAs, initiation/elongation factors, and the T7 RNA polymerase required to transcribe target genes from DNA templates.
  • Salts/Buffer (Potassium Glutamate, HEPES-KOH pH 7.5, Magnesium Glutamate, Potassium phosphate monobasic/dibasic): Maintains a stable physiological pH and supplies critical ionic cofactors (K+ and Mg2+) required to stabilize mRNA structures, support ribosome assembly, and optimize enzymatic activities during protein synthesis.
  • Energy / Nucleotide System (Ribose, Glucose, AMP, CMP, GMP, UMP, Guanine): Serves as the metabolic driving engine by providing raw building blocks for RNA synthesis and generating high-energy molecules (ATP and GTP) via endogenous glycolytic and oxidative pathways to power translation.
  • Translation Mix (17 Amino Acid Mix, Tyrosine, Cysteine): Directly supplies 19 of the 20 standard amino acid monomers that ribosomes polymerize into the growing polypeptide chain; the twentieth is presumably glutamate, which the potassium and magnesium glutamate salts already provide in excess.
  • Additives (Nicotinamide): Acts as a metabolic stabilizer or cofactor precursor that helps maintain the recycling of essential electron carriers (like NAD+) and prevents the degradation of energetic components in the cell-free system.
  • Backfill (Nuclease Free Water): Brings the overall reaction setup to its precise target volume while ensuring the complete absence of degrading nucleases that could compromise mRNA or DNA template integrity.

2. Main Differences Between the Master Mixes

The 1-hour optimized PEP-NTP master mix relies on pre-synthesized, high-energy nucleoside triphosphates (NTPs) directly coupled with Phosphoenolpyruvate (PEP) to instantly drive transcription and translation, making it rapid but prone to early phosphate accumulation and reaction exhaustion.

In contrast, the 20-hour NMP-Ribose-Glucose master mix uses lower-energy nucleoside monophosphates (NMPs), ribose, and glucose to fuel an economical, sustained system. This setup utilizes endogenous metabolic pathways within the lysate to slowly and continuously regenerate ATP and GTP over an extended duration, dramatically prolonging the reaction lifetime.


3. Bonus Question: How Transcription Occurs Without Free GMP

Transcription can still occur because the E. coli lysate contains active, endogenous salvage pathway enzymes (such as purine phosphoribosyltransferases).

These enzymes chemically rescue the free Guanine base by attaching it to a ribose-5-phosphate donor (generated via the system’s Ribose and energy components), successfully synthesizing GMP in situ. Once GMP is generated via this salvage mechanism, it is sequentially phosphorylated by native kinases into GDP and finally into GTP, providing the necessary nucleotide triphosphate required by T7 RNA Polymerase to transcribe RNA.

👩‍🦰Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Biophysical Analysis and Master Mix Optimization for Cell-Free Fluorescent Art

1. Biophysical & Functional Properties Influencing Cell-Free Expression

  • sfGFP (Superfolder GFP)

    • Property: Extremely rapid folding kinetics and high thermodynamic stability.
    • Effect: Its robust folding mechanism allows sfGFP to mature efficiently in a wide range of cell-free reaction environments, making it less prone to aggregation even under crowded or sub-optimal translation conditions.
  • mRFP1 (Monomeric Red Fluorescent Protein 1)

    • Property: Slow, oxygen-dependent chromophore maturation and susceptibility to premature photobleaching.
    • Effect: Because cell-free systems have limited passive oxygen diffusion, mRFP1 often displays a pronounced lag in signal readout and lower overall fluorescence yield compared to newer red variants.
  • mKO2 (Monomeric Kusabira Orange 2)

    • Property: High pH/acid sensitivity and narrow pKa profile (~5.5).
    • Effect: As organic waste products and organic acids accumulate during extended cell-free metabolic shifts, a dropping reaction pH can drastically quench mKO2’s orange fluorescence readout.
  • mTurquoise2 (Cyan Fluorescent Protein)

    • Property: Rigidified chromophore environment leading to an exceptionally high quantum yield.
    • Effect: It produces a highly brilliant and distinct cyan readout in cell-free systems, provided that transcription and translation are carefully paced to prevent rapid protein misfolding.
  • mScarlet_I (High-Intensity Red Fluorescent Protein Variant)

    • Property: Exceptional intrinsic brightness but highly rigidified structure with strict folding checkpoints.
    • Effect: It delivers a vibrant red color that outperforms mRFP1, but it demands an optimized molecular chaperone or translation pacing environment within the master mix to achieve its native, correctly folded state.
  • Electra2 (Fast-Maturing Fluorescent Protein)

    • Property: Ultra-fast, near-instantaneous chromophore cyclization and maturation kinetics.
    • Effect: This enables Electra2 to serve as an excellent real-time reporter for cell-free system activity, providing immediate visual readout within the first hour of expression before energy substrates diminish.

2. Experimental Optimization Hypothesis for 36-Hour Incubation

Hypothesis Statement

To maximize the long-term fluorescence yield of mRFP1 (or alternatively mKO2) over an extended 36-hour incubation, adjusting the Buffer System (HEPES-KOH / Potassium Phosphate) and the Energy/Oxygen Diffusion architecture within the custom reagent supplements will significantly reduce chemical quenching and sustain expression.

Detailed Engineering Mechanism

  • Target Protein: mRFP1 (and by extension mKO2)
  • Adjusted Reagents: Increase HEPES-KOH (pH 7.5) by 15% in the custom supplement mix, introduce a steady-state oxygenation mechanism (or optimize surface-area-to-volume geometry), and augment the baseline Glucose/Ribose fuel source.
  • Expected Effect: Elevating the HEPES-KOH buffer capacity directly counteracts the drop in pH caused by organic acid accumulation over 36 hours, preventing the acid-induced quenching of the fluorescent proteins. Concurrently, enhancing oxygen availability satisfies the strict oxygen dependence required for mRFP1 chromophore maturation, forcing a steady, sustained conversion into the active, fluorescent state across the complete 36-hour incubation window.

3. Preliminary Master Mix Formulation Framework

Based on the required total reaction layout of 20 μL per well, the following standard volumetric composition can be used as a foundation to plan your assigned artwork wells once you receive your allocation:

+-------------------------------------------------------------------------+
|                  CFPS WELL COMPOSITION (20 μL TOTAL)                     |
+------------------------------------+------------------------------------+
| Component                          | Volume per Well                    |
+------------------------------------+------------------------------------+
| BL21 (DE3) Star Lysate             | 6.0 μL                             |
| 2X Optimized Master Mix            | 10.0 μL                            |
| Assigned FP DNA Template           | 2.0 μL                             |
| Custom Reagent Supplements         | 2.0 μL                             |
+------------------------------------+------------------------------------+
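
As a planning aid, the per-well volumes above can be scaled up to a batch. This is a minimal sketch only: the function name `batch_volumes` and the 10% pipetting overage are my own assumptions and are not part of the assigned protocol. In practice the DNA template would probably be added well by well, since each well may carry a different FP.

```python
# Per-well volumes (in µL) copied from the composition table above.
PER_WELL_UL = {
    "BL21 (DE3) Star Lysate": 6.0,
    "2X Optimized Master Mix": 10.0,
    "Assigned FP DNA Template": 2.0,
    "Custom Reagent Supplements": 2.0,
}


def batch_volumes(n_wells, overage=0.10):
    """Scale the per-well volumes to n_wells plus a fractional overage.

    The 10% default overage is an assumed allowance for pipetting loss.
    """
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    factor = n_wells * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_WELL_UL.items()}


if __name__ == "__main__":
    # Sanity check: the components add up to the 20 µL reaction volume.
    assert sum(PER_WELL_UL.values()) == 20.0
    for name, vol in batch_volumes(8).items():
        print(f"{name:<28} {vol:>7.2f} µL")
```

Keeping the per-well recipe in one dictionary means a change to the supplement volume only has to be made in one place.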

Week 12 HW: hw-bioproduction

Week 13 HW: hw-bio-design-living-materials

Week 14 HW: hw-biofabrication

Week 2 HW: DNA Read, Write, & Edit

Homework — DUE BY FEB 17 2PM MIT TIME

👨‍🦰Part 0: Basics of Gel Electrophoresis

Keypoint: Gel Electrophoresis: Used for separating, identifying, and purifying fragments of DNA, RNA, or proteins.

Gel Preparation: Add agarose powder to the buffer, heat until melted, pour the solution into the gel tray, insert the comb, and allow it to cool and solidify.

Sample Loading: Remove the comb, place the gel into the electrophoresis tank, and add buffer until the gel is covered. Mix the DNA sample with loading buffer, then load the mixture into the wells.

Electrophoresis: Connect the power supply, set the voltage, and start running the gel. The tracking dye (e.g., bromophenol blue) can be seen moving downward with the naked eye.

Staining and Visualization: After electrophoresis, stain the gel by immersing it in a staining solution (e.g., nucleic acid dye), or add the dye to the gel beforehand during preparation. Finally, observe the bands under a UV light or a blue light transilluminator.

👲Part 1: Benchling & In-silico Gel Art

cover image

cover image

🧒Part 2: Gel Art - Restriction Digests and Gel Electrophoresis (optional, for those with lab access)

🎅Part 3: DNA Design Challenge

3.1. Choose your protein.

Photosystem II: The structural analysis of Photosystem II (PSII) matters in three key areas: fundamentally understanding the water-splitting mechanism, elucidating how the complex is assembled and repaired, and inspiring the development of next-generation bio-inspired energy technologies.

  1. Fundamental Understanding of the Water-Splitting Mechanism

The primary significance of PSII structural studies lies in unraveling how nature performs the energy-demanding and chemically complex reaction of water oxidation.

Atomic-Level Resolution of the Catalytic Core: Recent breakthroughs, such as the 1.7 Å resolution cryo-EM structure of PSII, have allowed scientists to visualize for the first time the positions of hydrogen atoms and the detailed water network within this massive membrane complex. This level of detail is crucial because it reveals how water molecules are channeled to the catalytic Mn₄CaO₅ cluster and how protons are guided out after water is split. Understanding these precise pathways is essential for comprehending the enzyme’s high catalytic efficiency (Hussein et al., 2024).

Hussein, R., Graça, A., Forsman, J., Aydin, A.O., Hall, M., Gaetcke, J., & Schröder, W.P. (2024). Cryo–electron microscopy reveals hydrogen positions and water networks in photosystem II. Science, 384(6702), 1349-1355.

Capturing Reaction Dynamics: Beyond static snapshots, research is now focused on the dynamic process. For instance, serial femtosecond crystallography (SFX) using XFELs has enabled the capture of intermediate states (like the S₂ and S₃ states) in the catalytic cycle, revealing structural changes during O-O bond formation. Furthermore, studies on specific mutants, such as the D2-Lys317Ala substitution, have shown how alterations in the hydrogen-bonding network can disrupt proton egress and slow down oxygen release, providing direct experimental evidence for the role of specific amino acids and channels (Flesher et al., 2025).

Flesher, D.A., Shin, J., Debus, R.J., & Brudvig, G.W. (2025). Structure of a mutated photosystem II complex reveals changes to the hydrogen-bonding network that affect proton egress during O–O bond formation. Journal of Biological Chemistry, 301(3).

  2. Elucidating Biogenesis, Repair, and Regulation

PSII is uniquely vulnerable to light-induced damage, particularly its D1 reaction center protein. Understanding how it is repaired is a research area of immense biological importance.

Unveiling the Repair Cycle: Structural biology has been pivotal in revealing the assembly and repair mechanisms of PSII. For example, research on green algae (Chlamydomonas reinhardtii) has solved the structures of four PSII-repair intermediates associated with the protein TEF30. These near-atomic resolution structures provide a working model for how different modules are reassembled during the mid-to-late stages of the repair cycle, a process vital for sustaining oxygenic photosynthesis under constant light stress (Wang et al., 2025).

Wang, Y., Wang, C., Li, A., & Liu, Z. (2025). Roles of multiple TEF30-associated intermediate complexes in the repair and reassembly of photosystem II in Chlamydomonas reinhardtii. Nature Plants, 11(7), 1455-1469.

A Model System for Membrane Proteins: PSII is proving to be an excellent system for studying the general principles of how large, multi-subunit membrane protein complexes are assembled and maintained in the thylakoid membrane. Insights from PSII repair, such as the synchronization of chlorophyll synthesis with protein synthesis, have broader implications for cell biology and plant physiology (Komenda et al., 2024).

Komenda, J., Sobotka, R., & Nixon, P. J. (2024). The biogenesis and maintenance of PSII: recent advances and current challenges. The Plant Cell, 36(10), 3997-4013.

  3. Future Value: Bio-inspired and Semi-Artificial Applications

The knowledge gained from PSII structures is a treasure trove for bioengineers and chemists aiming to create sustainable technologies. The future value lies in translating this biological blueprint into real-world applications.

Blueprint for Artificial Catalysts: A major barrier to scalable renewable energy, such as producing hydrogen as a fuel, is the reliance on rare and expensive metals (like platinum) to split water. PSII achieves this using cheap and abundant manganese and calcium. By understanding the precise atomic structure and mechanism of the oxygen-evolving complex, scientists hope to design synthetic catalysts that mimic nature’s solution for efficient water oxidation with earth-abundant materials (Hussein et al., 2024).

Creating Semi-Artificial Photosynthetic Devices: A more direct application is the integration of isolated PSII proteins into bio-photoelectrochemical cells. A landmark study has successfully created a scalable “artificial leaf” by spray-coating PSII from spinach onto a specially designed protonated macroporous carbon nitride (MCN) support. This large-area photoanode (33 cm²) generated milliampere-level photocurrents with nearly 100% faradaic efficiency for oxygen production. The device was stable enough to power an LED when eight units were connected in series, demonstrating the potential of PSII-based biophotovoltaics for powering low-consumption electronic devices (Zhang et al., 2025).

Zhang, H., Tian, W., Lin, J., Zhang, P., Shao, G., Ravi, S. K., … & Wang, S. (2025). Photosystem II‐Carbon Nitride Photoanodes for Scalable Biophotoelectrochemistry. Advanced Materials, e08813.

https://www.uniprot.org/blast

sp|Q39195|PST2_ARATH Photosystem II 5 kDa protein, chloroplastic OS=Arabidopsis thaliana OX=3702 GN=PSBT PE=1 SV=2
MASMTMTATFFPAVAKVPSATGGRRLSVVRASTSDNTPSLEVKEQSSTTMRRDLMFTAAA
AAVCSLAKVAMAEEEEPKRGTEAAKKKYAQVCVTMPTAKICRY

cover image

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Reverse translation of the sample sequence to a 309-base sequence of most likely codons:

atggcgagcatgaccatgaccgcgaccttttttccggcggtggcgaaagtgccgagcgcg
accggcggccgccgcctgagcgtggtgcgcgcgagcaccagcgataacaccccgagcctg
gaagtgaaagaacagagcagcaccaccatgcgccgcgatctgatgtttaccgcggcggcg
gcggcggtgtgcagcctggcgaaagtggcgatggcggaagaagaagaaccgaaacgcggc
accgaagcggcgaaaaaaaaatatgcgcaggtgtgcgtgaccatgccgaccgcgaaaatt
tgccgctat

Reverse translation of the sample sequence to a 309-base sequence of consensus codons:

atggcnwsnatgacnatgacngcnacnttyttyccngcngtngcnaargtnccnwsngcn
acnggnggnmgnmgnytnwsngtngtnmgngcnwsnacnwsngayaayacnccnwsnytn
gargtnaargarcarwsnwsnacnacnatgmgnmgngayytnatgttyacngcngcngcn
gcngcngtntgywsnytngcnaargtngcnatggcngargargargarccnaarmgnggn
acngargcngcnaaraaraartaygcncargtntgygtnacnatgccnacngcnaarath
tgymgntay
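
The "most likely codons" result can be reproduced with a simple lookup. The sketch below is illustrative only: the table `MOST_LIKELY_CODON` is my own name, and its entries are the codon choices read directly off the output above rather than taken from any tool's documentation. Histidine and tryptophan do not occur in this protein, so no codon is listed for them.

```python
# Codon choices read directly from the "most likely codons" output above.
# H and W are absent from this protein, so they are deliberately left out.
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt", "G": "ggc",
    "I": "att", "K": "aaa", "L": "ctg", "M": "atg", "N": "aac", "P": "ccg",
    "Q": "cag", "R": "cgc", "S": "agc", "T": "acc", "V": "gtg", "Y": "tat",
}

# Photosystem II 5 kDa protein (UniProt Q39195), as listed in section 3.1.
PROTEIN = (
    "MASMTMTATFFPAVAKVPSATGGRRLSVVRASTSDNTPSLEVKEQSSTTMRRDLMFTAAA"
    "AAVCSLAKVAMAEEEEPKRGTEAAKKKYAQVCVTMPTAKICRY"
)


def reverse_translate(protein, table=MOST_LIKELY_CODON):
    """Replace each residue with the single codon listed for it."""
    missing = sorted(set(protein) - set(table))
    if missing:
        raise KeyError(f"no codon listed for residue(s): {', '.join(missing)}")
    return "".join(table[aa] for aa in protein)


if __name__ == "__main__":
    dna = reverse_translate(PROTEIN)
    print(len(dna))   # three bases per residue
    print(dna[:60])   # first line of the reverse translation
```

Because every residue maps to one fixed codon, this approach ignores codon context and repeats; that is one reason a separate codon optimization step follows.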

3.3. Codon optimization.

  1 ATGGCATCTA TGACTATGAC TGCTACATTC TTTCCTGCTG TAGCGAAGGT ACCAAGTGCT ACTGGGGGTA
 71 GAAGGCTTAG CGTTGTTCGA GCGTCGACTT CGGATAACAC ACCTTCCTTA GAGGTGAAGG AGCAGTCATC
141 CACTACCATG AGAAGAGATC TGATGTTCAC TGCTGCTGCA GCAGCCGTAT GTTCCTTGGC CAAAGTCGCA
211 ATGGCTGAGG AAGAAGAACC TAAGAGAGGA ACTGAGGCGG CTAAGAAGAA GTATGCCCAA GTTTGTGTTA
281 CGATGCCTAC CGCGAAGATA TGCCGATAC
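
A useful check after codon optimization is that the new DNA still encodes the same protein. The sketch below translates the optimized sequence with the standard genetic code and compares it with the protein from section 3.1. The helper names (`CODON_TABLE`, `translate`) are my own; the codon table is built from the usual TCAG ordering of the standard code.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codon-optimized sequence from above; whitespace is stripped on load.
OPTIMIZED = "".join("""
ATGGCATCTA TGACTATGAC TGCTACATTC TTTCCTGCTG TAGCGAAGGT ACCAAGTGCT ACTGGGGGTA
GAAGGCTTAG CGTTGTTCGA GCGTCGACTT CGGATAACAC ACCTTCCTTA GAGGTGAAGG AGCAGTCATC
CACTACCATG AGAAGAGATC TGATGTTCAC TGCTGCTGCA GCAGCCGTAT GTTCCTTGGC CAAAGTCGCA
ATGGCTGAGG AAGAAGAACC TAAGAGAGGA ACTGAGGCGG CTAAGAAGAA GTATGCCCAA GTTTGTGTTA
CGATGCCTAC CGCGAAGATA TGCCGATAC
""".split())

PROTEIN = (
    "MASMTMTATFFPAVAKVPSATGGRRLSVVRASTSDNTPSLEVKEQSSTTMRRDLMFTAAA"
    "AAVCSLAKVAMAEEEEPKRGTEAAKKKYAQVCVTMPTAKICRY"
)


def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    print(len(OPTIMIZED))
    print(translate(OPTIMIZED) == PROTEIN)
```

The same `translate` helper also works on the lowercase "most likely codons" sequence, since the input is upper-cased first.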

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

  1. Cell-Dependent Methods (In Vivo)

This is the most common approach, where we insert our gene of interest into a host organism, turning it into a tiny protein factory.

Recombinant DNA & Cloning: The first step is to insert your gene of interest into a small, circular piece of DNA called a plasmid. This plasmid acts as a vector or delivery vehicle. It’s engineered to contain all the necessary control elements for the host cell to read the gene: a promoter (to start transcription), a ribosome binding site (to start translation), and often a selectable marker (like an antibiotic resistance gene) to help us find cells that have taken up the plasmid.

Transformation/Transfection: This recombinant plasmid is then introduced into the host cells. For bacteria like E. coli, this is called transformation. For animal cells, it’s often called transfection.

Selection and Growth: The host cells are grown on a special medium (e.g., containing an antibiotic). Only the cells that successfully took up the plasmid will survive and grow, forming colonies. Each colony is a clone of cells all producing your protein.

Induction and Harvesting: Once we have a large culture of these cells, we can add a chemical to induce the promoter, turning on high-level production of our target protein. After the cells have grown and produced the protein, they are harvested, and the protein is purified away from all the host cell’s components.

Common Host Organisms:

E. coli (Bacteria): The workhorse of the industry. It’s fast, cheap, and easy to grow. Best for simple proteins that don’t require complex modifications.

Yeast (e.g., S. cerevisiae): A single-celled fungus that is also easy to grow but can perform some more complex protein processing tasks than bacteria.

Mammalian Cells (e.g., CHO cells): The gold standard for complex human therapeutic proteins (like antibodies). They can perform all the necessary human-like modifications (like glycosylation) to make the protein fully functional and safe.

Insect Cells: A good middle-ground, using a virus (baculovirus) to infect insect cells, which then produce the protein. They offer more complex processing than yeast but are easier to handle than mammalian cells.

  2. Cell-Free Methods (In Vitro)

These systems produce proteins without using living cells. Instead, they use the cellular machinery (ribosomes, tRNAs, enzymes) extracted from cells.

How it works: A cell lysate is created by breaking open cells (like E. coli, wheat germ, or rabbit reticulocytes) and removing the cell debris. What’s left is a “soup” containing all the components needed for transcription and translation: ribosomes, amino acids, tRNA, and energy-generating molecules. To this soup, you add your DNA template (containing your gene) and the necessary nucleotides.

Transcription and Translation: If you add a DNA template, the system will begin transcribing it into mRNA and immediately translating that mRNA into protein, all in the same test tube.

Advantages:

Speed: Protein production can happen in hours, not days or weeks.

Toxicity: You can produce proteins that would be toxic to a living cell, as there’s no cell to kill.

Simplicity: It bypasses the need for cloning, transformation, and maintaining cell cultures.

Labeling: It’s very easy to add modified amino acids (e.g., with fluorescent tags) for research purposes.

3.5. [Optional] How does it work in nature/biological systems?

  1. Alternative Splicing

This is the most common and well-studied mechanism, particularly in complex eukaryotes like humans.

The Basic Process: Genes in eukaryotic cells contain coding sequences called exons and non-coding intervening sequences called introns. When a gene is transcribed, the entire region (both exons and introns) is copied to create a pre-mRNA molecule. Before this pre-mRNA can be used to make a protein, the introns must be removed and the exons joined together in a process called splicing.

The Alternative Part: In alternative splicing, the cell’s splicing machinery doesn’t always join the exons together in the same way. It can selectively include or exclude different exons from the final, mature mRNA molecule.

Imagine a gene with exons 1, 2, 3, and 4.

In one cell type, splicing might join all four exons: Exon 1 - Exon 2 - Exon 3 - Exon 4. This creates mRNA “Version A,” which codes for Protein A.

In another cell type, or at a different developmental stage, the splicing machinery might skip exon 2: Exon 1 - Exon 3 - Exon 4. This creates mRNA “Version B,” which codes for a different Protein B.

It could also include an extra exon (Exon 2a) that isn’t always used, leading to Protein C.

Examples:

The DSCAM gene in fruit flies can generate over 38,000 different mRNA isoforms through alternative splicing!

The Calcitonin/CGRP gene produces a hormone (calcitonin) in the thyroid gland and a neuropeptide (CGRP) in the brain by using different sets of exons.
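
The exon example above can be enumerated in a few lines. This is a toy sketch under my own assumptions: exons are treated as simple include-or-skip cassettes, and the names `enumerate_isoforms` and `GENE` are hypothetical. Dscam actually works by picking one variant from each of several exon clusters (mutually exclusive splicing); the cluster sizes of 12, 48, 33, and 2 used in the last line are the commonly reported figures, and they multiply out to the "over 38,000" quoted above.

```python
from itertools import product
from math import prod


def enumerate_isoforms(exons):
    """Return every exon combination, preserving exon order.

    `exons` is a list of (name, is_optional) pairs. Optional exons may be
    included or skipped; constitutive exons are always kept.
    """
    optional = [name for name, is_optional in exons if is_optional]
    isoforms = []
    for choices in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choices))
        isoforms.append(
            [name for name, is_optional in exons if not is_optional or keep[name]]
        )
    return isoforms


# Toy gene from the example above: exons 2 and 2a are treated as optional.
GENE = [
    ("Exon 1", False),
    ("Exon 2", True),
    ("Exon 2a", True),
    ("Exon 3", False),
    ("Exon 4", False),
]

if __name__ == "__main__":
    for isoform in enumerate_isoforms(GENE):
        print(" - ".join(isoform))
    # Dscam: one variant chosen from each of four exon clusters.
    print(prod([12, 48, 33, 2]))
```

Two optional exons already give four possible mRNAs, which shows how quickly the number of isoforms grows with each additional choice point.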

  2. Alternative Promoters

A gene can have more than one promoter site, which is the “start here” signal for RNA polymerase to begin transcription.

The Mechanism: Depending on which promoter is used, transcription will start at a different point in the gene. This can lead to pre-mRNAs that have different “first exons.”

The Result: These different starting points can result in mature mRNAs with different 5’ ends. This often means the resulting proteins will have different N-termini (the beginning of the protein). This can affect where the protein is located within the cell or what its function is.

  3. Alternative Polyadenylation

At the end of transcription, the pre-mRNA is cleaved, and a string of adenine nucleotides (the poly-A tail) is added to the 3’ end. This process is called polyadenylation and is signaled by a specific sequence in the RNA called the polyadenylation signal.

The Mechanism: Some genes have multiple polyadenylation signals. If the cell’s machinery uses the first signal, it will cleave the RNA there, resulting in a shorter mRNA. If it uses a downstream signal, it will produce a longer mRNA.

The Result: This affects the 3’ end of the mRNA. Since the 3’ untranslated region (3’ UTR) often contains signals for mRNA stability, localization, and how efficiently it’s translated, different polyadenylation choices can dramatically affect how much protein is made and where. In some cases, it can also alter the very end of the protein-coding sequence itself.

👩‍🦰Part 4: Prepare a Twist DNA Synthesis Order

4.1. Create a Twist account and a Benchling account √

4.2. Build Your DNA Insert Sequence

Copy: https://benchling.com/s/seq-GWC7bWlMPkMgqEnihkxn?m=slm-ny5MpJ1N9FsOATmyQMAl

De novo design: https://benchling.com/s/seq-PmLRVhHnWcUpDyCXJjHa?m=slm-TapXq6UoRnBTOtZCWyOT

Promoter: Arabidopsis thaliana chloroplast psbA gene promoter

This is a core promoter region of approximately 620 bp upstream from the start codon ATG (containing the -35 box and -10 box regions).

ATTGCTTGAT TTAATTTTTC AATTTTCTTG TTTTTATTTT GAATAAAGGA AAATAAATAA AAATAAATAA AATTTTTTTA AAAAGAATTT AATTTTCTAA CTTTTTTTAT TTTATCAACA AAAATATCTT ATTTTATTTC GATTTTATTT AGATTTTAGT ATCTATTTTT GGTTGATATA TATGGTTTTA TATTTGATAG GTATATTTGT TTTGATTGAA ATTTTCTGAA AAATATTTTT AAATAAATGA TTATTCTTTT CTCTCTAGAT CTTATATGTA GAATCTTTAT ATTTTGATAA TATTTTTTGA TTTTGATTTT TGTTTGTTTG TTTTTTATAC ATATATTTTT GGGGATTTTT TTTTTGTTTT TCAATTTCAA TTTCTCTAGA AAAAAGAGGA GAAAATTAAT ATG

RBS (Ribosome Binding Site): AGGAGG

Coding Sequence (your codon optimized DNA for a protein of interest, psii for example):

  1 ATGGCATCTA TGACTATGAC TGCTACATTC TTTCCTGCTG TAGCGAAGGT ACCAAGTGCT ACTGGGGGTA
 71 GAAGGCTTAG CGTTGTTCGA GCGTCGACTT CGGATAACAC ACCTTCCTTA GAGGTGAAGG AGCAGTCATC
141 CACTACCATG AGAAGAGATC TGATGTTCAC TGCTGCTGCA GCAGCCGTAT GTTCCTTGGC CAAAGTCGCA
211 ATGGCTGAGG AAGAAGAACC TAAGAGAGGA ACTGAGGCGG CTAAGAAGAA GTATGCCCAA GTTTGTGTTA
281 CGATGCCTAC CGCGAAGATA TGCCGATAC

7× His Tag: CAT CAC CAT CAC CAT CAC CAC (length: 21 bp)

Both histidine codons (CAT and CAC) are mixed to avoid a long repetitive sequence, which makes synthesis easier and improves cloning stability.

Stop Codon: TAA

Terminator (BBa_B0015): CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
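
Before ordering, it helps to confirm that the translated part of the insert (coding sequence, His tag, stop codon) is one clean open reading frame. The sketch below is my own check, not a Twist or Benchling feature; `check_orf` and the other names are hypothetical, and the promoter, RBS, and terminator are left out because they are not translated.

```python
# Translated parts of the insert, copied from the part list above.
CDS = "".join("""
ATGGCATCTA TGACTATGAC TGCTACATTC TTTCCTGCTG TAGCGAAGGT ACCAAGTGCT ACTGGGGGTA
GAAGGCTTAG CGTTGTTCGA GCGTCGACTT CGGATAACAC ACCTTCCTTA GAGGTGAAGG AGCAGTCATC
CACTACCATG AGAAGAGATC TGATGTTCAC TGCTGCTGCA GCAGCCGTAT GTTCCTTGGC CAAAGTCGCA
ATGGCTGAGG AAGAAGAACC TAAGAGAGGA ACTGAGGCGG CTAAGAAGAA GTATGCCCAA GTTTGTGTTA
CGATGCCTAC CGCGAAGATA TGCCGATAC
""".split())
HIS_TAG = "".join("CAT CAC CAT CAC CAT CAC CAC".split())
STOP = "TAA"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def check_orf(orf):
    """Return a list of problems found in the ORF (empty list means OK)."""
    problems = []
    if len(orf) % 3:
        problems.append("length is not a multiple of 3")
    triplets = codons(orf)
    if not triplets or triplets[0] != "ATG":
        problems.append("does not start with ATG")
    if not triplets or triplets[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    internal = [i for i, c in enumerate(triplets[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append(f"in-frame stop codon(s) at codon index {internal}")
    return problems


if __name__ == "__main__":
    orf = CDS + HIS_TAG + STOP
    print(len(orf), check_orf(orf))
```

An empty problem list means the His tag stays in frame with the protein and the only stop codon is the final TAA.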

💡 Design Strategy: Screening for Strong Terminators from the Chloroplast Genome

In higher plant chloroplasts, the transcription termination mechanism is similar to that of prokaryotic systems, typically relying on a stem-loop structure located at the 3’ end of a gene. You can construct an efficient standard part by following the two steps below:

  1. Identifying Candidate Sequences: The transcript of the psbA gene (encoding the PSII core protein D1) in the chloroplast is highly abundant and stable, and its 3’ UTR usually contains efficient termination and processing signals. You can obtain the complete chloroplast genome sequence of Arabidopsis thaliana from public databases like NCBI (e.g., GenBank accession: NC_000932.1), then locate the psbA gene and extract its 3’ UTR region (approximately 100-200 bp) as the core candidate sequence.

  2. Engineering for High Efficiency: To pursue near 100% termination efficiency, you can refer to the design logic of BBa_B0015 and construct a dual-terminator tandem element:

  • First Unit: Clone the 3’ UTR of the psbA gene from the Arabidopsis chloroplast.
  • Second Unit: Clone another strong terminator, such as the 3’ UTR of the chloroplast rps16 gene or a strong termination signal from other chloroplast genes.
  • Combination: Link these two units in tandem with a short spacer sequence. This “belt-and-suspenders” structure can maximally prevent read-through by RNA polymerase.

PDF: content/homework/week-02-hw-dna-read-write-and-edit/constitutive_psII_Arabidopsis-thaliana-sequence.pdf

FASTA: content/homework/week-02-hw-dna-read-write-and-edit/constitutive_psii_arabidopsis-thaliana.fasta

4.3. On Twist, Select The “Genes” Option √

4.4. Select “Clonal Genes” option √

Keypoints: An advantage of gene fragments is that, if designed with appropriate exonuclease protection, they can be used directly in cell-free expression.

4.5. Import your sequence √

content/homework/week-02-hw-dna-read-write-and-edit/constitutive_sfGFP_his_tag.gb

building your first plasmid!√

content/homework/week-02-hw-dna-read-write-and-edit/first plasmid.png

🤴Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

If I were to explore the possibility of extraterrestrial life and its evolution through DNA sequencing, I would focus on the following DNA targets and contexts, each offering unique insights into how life might arise and adapt beyond Earth:

  1. DNA from Extraterrestrial Samples (e.g., Mars, Europa, Enceladus)

What to sequence: Any organic or genetic material recovered from soil, ice plumes, or subsurface oceans of celestial bodies.

Why:

To determine if life elsewhere uses the same genetic code (DNA/RNA) or something entirely novel.

To compare sequences with Earth life to test theories of panspermia (whether life spreads via meteorites) or convergent evolution (whether life independently evolves similar solutions).

To identify biosignatures—patterns in DNA that indicate biological activity, such as non-random sequence complexity or metabolic genes.

  2. Extremophile Genomes (Earth Analogs for Space Environments)

What to sequence: Complete genomes of organisms like Deinococcus radiodurans (radiation-resistant), tardigrades (space-tolerant), or psychrophiles (cold-loving) from Antarctica.

Why:

These organisms serve as models for how life might survive in space or on harsh planets like Mars (low pressure, radiation, cold).

Their DNA repair mechanisms, desiccation tolerance genes, and metabolic pathways can be compared with hypothetical extraterrestrial life to predict survival strategies.

  3. Ancient or “Shadow Biosphere” DNA on Earth

What to sequence: Environmental DNA (eDNA) from extreme, isolated niches (e.g., deep subsurface mines, high-altitude lakes, or Atacama Desert soils).

Why:

To search for a “second genesis” of life on Earth—organisms with different biochemistry or genetic codes—which would profoundly impact how we search for life elsewhere.

To understand the limits of life’s evolutionary paths and identify universal constraints that might apply anywhere in the cosmos.

  4. Synthetic DNA for Life-Detection Instruments

What to sequence: Engineered DNA sequences designed as controls or standards for space missions (e.g., for a life-detection instrument such as the Signs of Life Detector, SOLID, on a rover).

Why:

To calibrate instruments (like nanopore sequencers) for detecting non-standard or damaged DNA that might be found on other planets.

To test whether our detection methods are biased toward Earth-like life, ensuring we don’t miss “weird” life with different base pairs or chirality.

  5. Genomes of Organisms in Simulated Space Environments (ISS or Lab)

What to sequence: DNA of bacteria, fungi, or plants exposed to microgravity, cosmic radiation, or Mars-like conditions on the International Space Station or in simulation chambers.

Why:

To study real-time evolutionary adaptation to space conditions.

To identify mutations or horizontal gene transfer events that occur under extraterrestrial stress, revealing how life might evolve during interplanetary travel.

  6. Universal Genetic Code Variations (Bioinformatics)

What to sequence: Not physical DNA, but in silico simulations of genetic codes and proteins that could function in exotic solvents (e.g., methane or ammonia) or at extreme temperatures.

Why:

To expand our concept of “possible life” beyond carbon-water-DNA constraints.

To guide the search for alien genes by predicting what sequences might look like in environments like Titan’s hydrocarbon lakes.

(ii) For exploring extraterrestrial life and its evolution, I would choose Oxford Nanopore Technologies (ONT) sequencing, a third-generation sequencing platform.

Technology Selection and Rationale

Oxford Nanopore Technologies (ONT) sequencing is the ideal choice for extraterrestrial life exploration.

  • Technology: Oxford Nanopore Technologies (ONT) sequencing
  • Generation: Third-generation (single-molecule, long-read sequencing)
  • Input: Extracted DNA from extraterrestrial samples (soil, ice, plumes, etc.)
  • Output: Real-time electrical current signals converted to base sequences (FAST5 files)

5.2 DNA Write

The idea is to engineer a living biomaterial inspired by both the fictional character Baymax (from Big Hero 6) and the real-life sea slug Costasiella kuroshimae (commonly known as the “Leaf Sheep” or “Solar-Powered Sea Slug”). The leaf sheep is one of the few animals capable of kleptoplasty: it steals chloroplasts from the algae it eats and incorporates them into its own cells, enabling it to photosynthesize for months.

If I were to synthesize DNA for a “Baymax-like self-healing, photosynthetic biomaterial,” it would involve designing a synthetic genetic circuit that could be introduced into a compatible host (e.g., mammalian cells, skin cells, or even a cell-free system) to create a living material with the following properties:

Self-powering via photosynthesis (like the leaf sheep)

Self-healing (like Baymax’s inflatable skin)

Biocompatible and responsive to the body

🧬 DNA to Synthesize: A Photosynthetic & Self-Healing Genetic Circuit

I would synthesize a multi-gene synthetic construct containing the following modules:

  1. Photosynthesis Module
    • Gene(s): psbA, psbD, rbcL, rbcS
    • Function: Enables light capture, electron transport, and carbon fixation (chloroplast function)
  2. Self-Healing / Repair Module
    • Gene(s): DPS (DNA protection during starvation), sodB (superoxide dismutase), katE (catalase)
    • Function: Protects cells from oxidative damage during light exposure; promotes tissue repair
  3. Adhesion & Matrix Module
    • Gene(s): COL1A1 (human collagen), FN1 (fibronectin)
    • Function: Provides structural scaffold for tissue integration and healing
  4. Regulatory / Synthetic Circuit
    • Gene(s): Light-inducible promoter (e.g., pDawn), GFP reporter
    • Function: Allows photosynthesis genes to be activated only in the presence of light

🔬 Full DNA Sequence Concept (Simplified Example)

A simplified, conceptual construct would combine parts of the above ideas. It includes:

  • A light-inducible promoter (pDawn system: the YtvA-derived photosensor YF1 plus FixJ)
  • The psbA gene (PSII core protein) for photosynthesis
  • The DPS gene for oxidative stress protection
  • A collagen fragment for tissue integration
  • A terminator (BBa_B0015)

🧠 Why Synthesize This DNA?

  1. Baymax-Inspired Self-Healing Material

Baymax’s skin is soft, inflatable, and can repair itself. By incorporating collagen and fibronectin genes, the material could integrate with human tissue and promote wound healing. The DPS and catalase genes would protect cells from oxidative stress (common in damaged tissue), enabling longer-lasting repair.

  2. Photosynthesis for Self-Powering (Leaf Sheep Model)

The leaf sheep is a solar-powered animal. If we can engineer mammalian cells (or a skin substitute) to stably incorporate and maintain functional chloroplasts (via genes like psbA and rbcL), the material could generate its own energy from light, reducing the need for external power or nutrient supply in medical implants or wearables.

  3. Potential Applications

    • Medical Implants: Self-healing, light-powered skin grafts or patches for chronic wounds.
    • Wearable Biosensors: Living tattoos that change color in response to inflammation or UV exposure.
    • Space Exploration: Living materials for astronauts that require minimal resources (just light and water).
    • Eco-Friendly Biomaterials: Photosynthetic fabrics or coatings that capture CO₂ and produce oxygen.

Next Steps for Synthesis

If Twist Bioscience were to synthesize this, I would:

  • Codon-optimize each gene for the target host (e.g., human cells or E. coli for prototyping).
  • Add RBS, linkers, and terminators between modules.
  • Clone into a delivery vector (e.g., lentivirus for mammalian cells or plasmid for bacterial expression).
  • Test in a chassis like E. coli first to verify expression of the photosynthesis genes and oxidative protection, then move to mammalian cell lines.

For synthesizing the complex, multi-gene “Baymax-Meets-Leaf-Sheep” DNA construct, I would recommend a hybrid approach that leverages the strengths of different synthesis technologies. Given the length (~2,000+ bp), complexity (multiple genes from different sources), and the goal of creating a functional genetic circuit, the optimal strategy is:

High-throughput silicon-based DNA synthesis (e.g., Twist Bioscience platform) for fragment generation, followed by enzymatic assembly (e.g., Gibson Assembly or Golden Gate) for final construct assembly.

  1. Technology Selection and Why

Primary Technology: Silicon-Based High-Throughput DNA Synthesis (e.g., Twist Bioscience)

Why: The construct is large and contains multiple genes (psbA, DPS, COL1A1, etc.) with varying GC content and potential secondary structures. Traditional column-based synthesis would be slow, expensive, and error-prone for this complexity. Twist’s platform miniaturizes the chemical synthesis (phosphoramidite chemistry) by performing reactions in nanowells on a silicon chip. This allows for the parallel synthesis of thousands of oligos at once, dramatically increasing throughput and reducing cost. They can routinely synthesize oligonucleotides up to 500 nt in length, which serve as the building blocks for larger genes.

Generation: This is a first-generation (chemical) method but with a modern, high-throughput twist. The core chemistry is the established phosphoramidite method developed in the 1980s, but the delivery system (silicon chip) is a 21st-century innovation that addresses scalability issues.

Secondary Technology: Enzymatic DNA Assembly (e.g., Gibson Assembly® or Golden Gate Assembly)

Why: The 500 nt fragments from the chip need to be stitched together to create the final multi-gene construct (~2-5 kb). Enzymatic assembly methods are ideal for this. They use enzymes to simultaneously join multiple DNA fragments with overlapping ends in a single reaction. This is far more efficient than using restriction enzymes and ligase.

  2. Essential Steps of the Chosen Method

The workflow combines the synthesis steps with the assembly steps.

Part A: DNA Synthesis (The “Writing” of Fragments)

Sequence Design and Upload: You provide the digital DNA sequences for your photosynthetic module, repair module, etc., to the synthesis provider (e.g., Twist).

Silicon Chip Manufacturing: A silicon chip with thousands of nanowells is prepared. Each well is designated for the synthesis of a specific oligonucleotide.

Cyclic Nucleotide Addition (Phosphoramidite Chemistry): The chip undergoes repeated cycles to build the oligos base-by-base from the 3’ end to the 5’ end. Each cycle for each base consists of four core chemical steps:

Deprotection (Detritylation): Acid removes a protecting group (DMT) from the 5’ hydroxyl of the last nucleotide, making it reactive.

Coupling: The next nucleotide (phosphoramidite monomer) is activated and added, forming a bond with the exposed 5’ hydroxyl.

Capping: Any unreacted 5’ hydroxyls are acetylated to prevent them from reacting in future cycles, which would cause deletions.

Oxidation: Iodine and water oxidize the newly formed, unstable phosphite triester linkage into a stable phosphate triester, which becomes the natural phosphate backbone after final deprotection.

Cleavage and Deprotection: After all cycles are complete, the synthesized oligos are cleaved from the chip, and all remaining protecting groups are removed using ammonium hydroxide.

Amplification and QC: The single-stranded oligos are amplified (often via PCR) to create double-stranded DNA fragments. These fragments are then purified and quality-controlled to ensure the correct sequence.

Part B: DNA Assembly (Building the Final Construct)

Fragment Design: You design the ~500 bp fragments so that each end carries a short overlap (20-40 bp) whose sequence is identical to the end of the adjacent fragment.

Assembly Reaction (e.g., Gibson Assembly): All fragments, along with a linearized vector backbone, are mixed in a single tube with an enzyme master mix containing three activities:

Exonuclease: chews back nucleotides from the 5’ ends of the fragments, creating single-stranded overhangs that allow the complementary overlapping regions to anneal.

DNA Polymerase: fills in any gaps in the annealed regions.

DNA Ligase: seals the nicks in the sugar-phosphate backbone, creating a fully circular plasmid.

Transformation: The assembled plasmid is transformed into competent E. coli cells.

Screening and Verification: Colonies are screened for the correct insert, and the final plasmid is verified by Sanger sequencing to ensure 100% accuracy.
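The fragment design step in Part B can be sketched in a few lines of Python. This is a minimal illustration, not the provider's actual design tool: the function name `split_with_overlaps`, the 500 nt fragment length, and the 30 nt overlap are assumptions chosen to match the numbers quoted above, and the sketch ignores GC content, repeats, and secondary structure.

```python
import random


def split_with_overlaps(seq: str, frag_len: int = 500, overlap: int = 30) -> list[str]:
    """Split seq into fragments of at most frag_len nt.

    Each fragment starts with the last `overlap` bases of the previous one,
    which is the shared end that an overlap-based assembly relies on.
    """
    if not 0 < overlap < frag_len:
        raise ValueError("overlap must be positive and shorter than frag_len")
    if len(seq) <= frag_len:
        return [seq]
    step = frag_len - overlap  # distance between consecutive fragment starts
    fragments = []
    start = 0
    while True:
        end = min(start + frag_len, len(seq))
        fragments.append(seq[start:end])
        if end == len(seq):
            break
        start += step
    return fragments


if __name__ == "__main__":
    # Random 2 kb placeholder sequence; a real design would use the construct sequence.
    rng = random.Random(0)
    construct = "".join(rng.choice("ACGT") for _ in range(2000))
    for i, frag in enumerate(split_with_overlaps(construct), start=1):
        print(f"fragment {i}: {len(frag)} nt")
```

The last fragment can come out much shorter than the others; a real design would rebalance fragment lengths and check each overlap for suitable melting temperature.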

  1. Limitations of the Method (Speed, Accuracy, Scalability)

While this hybrid approach is the best available, it has inherent limitations.

| Aspect | Limitation | Explanation |
| --- | --- | --- |
| Speed | Not real-time. | The entire process, from design to receiving a verified plasmid, typically takes 2-4 weeks. This is due to synthesis run times, shipping, assembly, cloning, and final sequencing verification. It is a batch process, not an instantaneous one. |
| Accuracy | Error accumulation in long, complex sequences. | While the synthesis coupling efficiency is high (>99.5% per step), errors (deletions, insertions, substitutions) are inevitable. For a long construct like yours, the probability of having at least one error in the final assembled product is significant. High GC content, repetitive sequences, and strong secondary structures (like those found in some photosynthetic genes) can further increase error rates. This often necessitates sequencing multiple clones to find a perfect one. |
| Scalability | Assembly becomes a bottleneck. | While silicon-chip synthesis is highly scalable for making millions of oligos, assembling them into many different, large, and complex constructs remains a manual and low-throughput process. Scaling up to make hundreds or thousands of different versions of your Baymax circuit is currently a significant bioengineering challenge. |
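To see why a per-step coupling efficiency above 99.5% still limits oligo length, the expected full-length fraction can be estimated as the efficiency raised to the number of couplings. The sketch below assumes every coupling succeeds independently and that an n-nt oligo needs n - 1 couplings (the first base being pre-attached to the support); both are simplifying assumptions, and the function name is illustrative. It models only failed couplings, not substitutions or insertions.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.995) -> float:
    """Expected fraction of oligos in which every coupling step succeeded."""
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    couplings = length_nt - 1  # assumption: first base is already on the support
    return coupling_efficiency ** couplings


if __name__ == "__main__":
    for length in (100, 200, 300, 500):
        print(f"{length:>4} nt: {full_length_fraction(length):.1%} full-length")
```

Under these assumptions only about 8% of 500 nt oligos would be full-length at 99.5% efficiency, which is consistent with the need to sequence several clones.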

5.3 DNA Edit

Gene editing has, in principle, created many advantageous genes, and it is consistent both with the Darwinian principle of “survival of the fittest” and with the gene silencing and gene loss that occur during long-term natural selection. Even so, I feel that natural random mutation shows greater respect for the individual will of living beings than human-directed evolution does. For that reason, I do not like gene editing.

With these ethical reservations acknowledged, I will proceed with the technical analysis.

If I were to perform DNA edits—specifically to create the photosynthetic, self-healing “Baymax” biomaterial described earlier—I would choose the following technology:

Technology Selection: CRISPR-Cas9

CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats with CRISPR-associated protein 9) is the most suitable technology for this application because it offers precision, flexibility, and efficiency for introducing multiple genes into a target genome.

Why CRISPR-Cas9?

| Requirement | Why CRISPR-Cas9 Fits |
| --- | --- |
| Multi-gene insertion | Can target multiple loci simultaneously or sequentially |
| Mammalian cell compatibility | Well-established protocols for human cell lines |
| Precision | Can insert genes at specific “safe harbor” loci (e.g., AAVS1 in human cells) |
| Efficiency | High editing rates in many cell types |

How CRISPR-Cas9 Edits DNA: Essential Steps

Mechanism Overview

CRISPR-Cas9 uses a guide RNA (gRNA) to direct the Cas9 nuclease to a specific DNA sequence, where it creates a double-strand break (DSB). The cell’s natural repair mechanisms then introduce edits:

  • DNA Recognition: The gRNA contains a 20-nucleotide spacer complementary to the target DNA, adjacent to a PAM sequence (NGG) required for Cas9 binding.
  • Double-Strand Break: Cas9 cuts both DNA strands, creating a DSB.
  • DNA Repair: The cell repairs the break via:
    • Non-Homologous End Joining (NHEJ): Error-prone repair that creates insertions/deletions (indels) to disrupt genes.
    • Homology-Directed Repair (HDR): Precise repair using a DNA template, allowing gene insertion or correction.
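The recognition rule described above (a 20-nt spacer followed immediately by an NGG PAM) can be illustrated with a small scanner over both strands. This is only a sketch: the function names are made up for illustration, NGG is the PAM of the commonly used S. pyogenes Cas9, and real tools such as CRISPOR additionally score off-target matches, which this code does not attempt.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str) -> list[tuple[int, str, str, str]]:
    """List candidate target sites as (start, strand, protospacer, pam).

    `start` is the 0-based index of the protospacer on the strand that was
    scanned ("+" = the given sequence, "-" = its reverse complement).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((match.start(), strand, match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    demo = "TTGACCTGAAGCTTGGATCCGTACGATCGGAGCTAGCTAGGCTAGCTAACGG"  # made-up sequence
    for start, strand, spacer, pam in find_ngg_sites(demo):
        print(f"{strand} strand, pos {start}: {spacer} | {pam}")
```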

Essential Steps for Your Project

  1. Design Phase (Preparation)

Input Required:

  • Target genome sequence (e.g., human cell line reference)
  • Donor DNA template (containing your photosynthetic genes)
  • gRNA design tools

Design Steps:

  • Select Target Locus: Choose a “safe harbor” site (AAVS1, CCR5, or HPRT) where gene insertion won’t disrupt essential genes.
  • Design gRNA: Use tools (CRISPOR, Benchling) to select 20-nt sequences adjacent to PAM sites with minimal off-target matches.
  • Design Donor Template: Create a DNA fragment containing:
    • Your photosynthetic gene cassette (psbA, rbcL, etc.)
    • Left and right homology arms (500-800 bp each) matching sequences flanking the cut site
    • Optional selection marker (e.g., GFP or puromycin resistance)

  2. Delivery Phase

Input Required:

  • Cas9 protein or mRNA
  • gRNA (synthetic or expressed from plasmid)
  • Donor DNA template (for HDR)
  • Target cells (e.g., human fibroblasts or induced pluripotent stem cells)

Delivery Methods:

  • Transfection: Lipofection or electroporation of Cas9-gRNA ribonucleoprotein (RNP) complexes; preferred for efficiency and reduced off-target effects.
  • Viral Delivery: Lentivirus or AAV for hard-to-transfect cells.
  • Nucleofection: Electroporation-based method for primary cells.

  3. Editing Phase

Cellular Process:

  • The RNP complex enters the nucleus
  • The gRNA guides Cas9 to the target DNA
  • Cas9 creates a DSB
  • If a donor template is present, the cell may use HDR to insert your gene cassette
  • If no template is present, NHEJ causes gene disruption

  4. Screening and Validation

  • PCR Screening: Test for correct integration using primers flanking the insertion site
  • Sanger Sequencing: Verify the precise sequence of the edited locus
  • Functional Assays: Confirm photosynthetic protein expression and activity

This approach, while technically challenging, represents the current state of the art for introducing complex synthetic circuits into human cells. The limitations, particularly the low HDR efficiency for large inserts, mean that success would require significant optimization and screening, but the technology exists to make our vision possible.

Week 3 HW: hw-lab-automation

ヾ(≧▽≦*)oAssignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME!

The Biopunk lab hasn’t contacted me yet.

The Opentrons API is a Python framework for writing automated biology lab protocols. A protocol has three parts:

  1. Load labware (containers, tip racks, plates).
  2. Load instruments (pipettes).
  3. Define your liquid handling steps.

Creating the artwork with the basic GUI involves:

  • Getting coordinates from the GUI tool
  • Writing a Python script that moves the pipette to those positions
  • Using the HTGAA26 Colab notebook as your template: https://ddls.aicell.io/course/ddls-2025/module-6/lab/#-what-is-a-code-agent
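The three protocol parts and the coordinate-driven artwork step could be sketched roughly as follows. Everything specific here is an assumption for illustration: the point list stands in for the GUI export, `htgaa_agar_plate` is a hypothetical custom labware name, and the deck slots, pipette mount, drop volume, and API level would need to match the course notebook. The planning helper is plain Python; `run` is only executed by the robot or the Opentrons simulator.

```python
import math

# Hypothetical GUI export: (x, y) offsets in mm from the centre of the agar plate.
GUI_POINTS = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (-5.0, 0.0), (0.0, -5.0), (60.0, 0.0)]
DROP_UL = 1.0  # assumed volume per dot

metadata = {"protocolName": "Agar artwork sketch", "apiLevel": "2.15"}


def plan_batches(points, max_radius_mm=40.0, drop_ul=DROP_UL, tip_capacity_ul=20.0):
    """Discard points outside the plate and group the rest into single-aspirate batches."""
    inside = [p for p in points if math.hypot(p[0], p[1]) <= max_radius_mm]
    per_batch = int(tip_capacity_ul // drop_ul)
    return [inside[i:i + per_batch] for i in range(0, len(inside), per_batch)]


def run(protocol):
    """Opentrons entry point: load labware, load a pipette, then dispense each dot."""
    from opentrons import types  # imported here so plan_batches works without the library

    tips = protocol.load_labware("opentrons_96_tiprack_20ul", 1)
    ink = protocol.load_labware("nest_12_reservoir_15ml", 2)
    plate = protocol.load_labware("htgaa_agar_plate", 3)  # ASSUMED custom labware name
    pipette = protocol.load_instrument("p20_single_gen2", "right", tip_racks=[tips])

    centre = plate["A1"].top()  # assumes the labware defines one well at the plate centre
    pipette.pick_up_tip()
    for batch in plan_batches(GUI_POINTS):
        pipette.aspirate(len(batch) * DROP_UL, ink["A1"])
        for x, y in batch:
            pipette.dispense(DROP_UL, centre.move(types.Point(x=x, y=y, z=0)))
    pipette.drop_tip()
```

Filtering out-of-range points before the run is a cheap safeguard against sending the pipette outside the plate; a multi-colour design would repeat the loop once per ink well with a fresh tip.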

(✿◡‿◡)Post-Lab Questions — DUE BY START OF FEB 24 LECTURE

One of the great parts about having an automated robot is being able to precisely mix, deposit, and run reactions without much intervention, and design and deploy experiments remotely.

For this week, we’d like for you to do the following:

👳‍♂️Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.

Khan, S. U., Møller, V. K., Frandsen, R. J. N., & Mansourvar, M. (2025). Real-time AI-driven quality control for laboratory automation: a novel computer vision solution for the Opentrons OT-2 liquid handling robot. Applied Intelligence, 55(7), 524.

  1. Systematic Study of Yeast Gene Expression and Pipetting Speed

This study, published in 2025, used the Opentrons OT-2 robot to systematically investigate the effects of pipetting speed on the growth and gene expression of Saccharomyces cerevisiae.

Research Content: The researchers used the OT-2 robot to precisely control pipetting parameters and performed liquid handling on yeast cultures at four different speeds (50, 130, 210, 290 μL/s). Quantitative growth assays and RNA sequencing analysis were conducted to evaluate the impact of pipetting speed on yeast.

Innovation and Findings: The study found that within the tested speed range, changes in pipetting speed did not significantly affect the maximum relative growth rate or gene expression profiles of yeast. The gene expression of all 24 samples was highly similar, with a minimum Pearson correlation coefficient of 0.9528. This indicates that the fastest pipetting speed (290 μL/s) can be used in yeast experiments to improve efficiency without negatively affecting cell state.

Biological Significance: This research demonstrates the value of robotic platforms in optimizing experimental parameters and improving reproducibility and accuracy, providing an important reference for determining appropriate operating parameter ranges in future automated experiments.

Taguchi, S., Matsuzawa, R., Suda, Y., Irie, K., & Ozaki, H. (2025). Investigating the effects of liquid handling robot pipetting speed on yeast growth and gene expression using growth assays and RNA-seq. Micropublication Biology, 2025, 10-17912.

  2. Semi-Automated Workflow for Conjugative Transfer in Streptomyces

This study, published in 2025, proposed “ActinoMation,” a semi-automated, medium-throughput workflow for conjugative transfer in Streptomyces using the Opentrons OT-2 robot platform.

Research Content: The research team developed an open-source protocol creation tool called ActinoMation, using Python and Jupyter Notebook to achieve a readable programming environment. They validated the method in various Streptomyces strains (S. coelicolor, S. albidoflavus, S. venezuelae).

Innovation and Findings: The automated conjugation workflow made large-scale transformations easy with no significant loss in transformation efficiency. The study reported detailed conjugation efficiencies for different strain-plasmid combinations; for example, the conjugation efficiency of S. venezuelae DSM40230 with the pSETGUS plasmid reached 4.97%.

Biological Significance: Streptomyces are important producers of antibiotics and other bioactive compounds. This automated method addresses the labor-intensive and slow nature of traditional manual conjugation protocols, providing a feasible solution for the efficient genetic engineering of these strains.

Møller, T. A., Booth, T. J., Shaw, S., Møller, V. K., Frandsen, R. J., & Weber, T. (2025). ActinoMation: A literate programming approach for medium-throughput robotic conjugation of Streptomyces spp. Synthetic and Systems Biotechnology, 10(2), 667-676.

  3. Semi-Automated Production of Cell-Free Biosensors

This 2025 study explored the use of the Opentrons OT-2 liquid handling robot for the semi-automated production of cell-free biosensors.

Research Content: The researchers compared manual and semi-automated reaction assembly methods, using the OT-2 robot to assemble two different cell-free gene expression assay systems. They tested the designed protocols and constructed a full 384-well plate of fluoride-sensing cell-free biosensors.

Innovation and Findings: The study showed that large-scale production of cell-free biosensor reactions is achievable using a liquid handling robot. The semi-automated sensors exhibited near-expected detection results, demonstrating the feasibility and reliability of this approach.

Biological Significance: Cell-free biosensors, as an in vitro diagnostic technology, have the potential to detect toxins and human health biomarkers. The automated method in this study addresses quality control issues in scaled-up production, facilitating the translation of such sensors from laboratory development to practical applications.

Brown, D. M., Phillips, D. A., Garcia, D. C., Arce, A., Lucci, T., Davies Jr, J. P., … & Lucks, J. B. (2025). Semiautomated production of cell-free biosensors. ACS Synthetic Biology, 14(3), 979-986.

In addition, an application guide describes the use of the OT-2 in combination with PhyTip® columns for automated protein purification. This system successfully purified His-tagged GAPDH protein and human immunoglobulin G (IgG) while maintaining protein bioactivity, and it can process up to 96 samples. Although primarily methodological, this also showcases the practical value of the OT-2 in protein engineering and antibody research.

👩‍🦰Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

  1. Project Proposal: Prometheus-Baymax: A Symbiotic, Ethically-Guided Artificial Photosynthesis-Powered Healthcare Companion.

Core Concept

The “Prometheus-Baymax” project reimagines the beloved healthcare companion as a living, self-sustaining entity. By integrating an artificial photosynthesis system with an advanced, emotionally intelligent AI, we create a robot that not only powers itself from light and water but also interacts with humans through a deeply empathetic, ethically-constrained cognitive framework. The name “Prometheus” symbolizes the gift of life-sustaining fire (energy autonomy), while “Baymax” represents the pinnacle of compassionate care. This project explores the convergence of biological energy harvesting, lab automation, and value-aligned artificial intelligence to build a truly autonomous and trustworthy companion.

Phase 1: The Symbiotic Core – Artificial Photosynthesis & Energy Autonomy

The robot’s energy independence is achieved through a bio-inspired artificial photosynthetic system. Unlike simple solar panels, this system mimics the symbiotic relationship found in nature. A compact, 3D-printed photo-bioreactor houses a culture of engineered algae (or synthetic chloroplasts) in a transparent chamber. These organisms capture light energy and convert it into chemical energy (sugars). This energy is then utilized in two ways:

Direct Electrical Generation: A microbial fuel cell (MFC) integrated into the bioreactor uses electrogenic bacteria to break down the organic compounds, generating a continuous, low-level electrical current.

Biomass as a Resource: Excess organic matter can be stored or used as a feedback mechanism to adjust the system’s health.

Automation is critical here. A Ginkgo Nebula multi-sensor board, interfaced with a Raspberry Pi, continuously monitors:

  • Light intensity (photoresistor)
  • Temperature and pH of the culture (to ensure optimal growth)
  • Voltage/current output of the MFC

Based on these readings, Python scripts activate actuators:

  • An internal LED array supplements natural light when levels are low.
  • A peristaltic pump delivers nutrients or pH buffers to maintain a healthy environment (a form of self-healing for the bioreactor).

This closed-loop automation ensures the robot’s “heart” beats steadily, providing a reliable source of energy for its cognitive and physical functions.

Here is a pseudocode plan for the main automation loop:

// Main Automation Loop for Baymax

FUNCTION setup():
    initialize_sensors()   // on Ginkgo Nebula
    initialize_pump()
    initialize_LED()
    charging_circuit = OFF
    baymax_motors = IDLE
END FUNCTION

FUNCTION loop():
    // 1. SENSE the environment and system health
    light_level = read_light_sensor()
    temp = read_temp_sensor()
    ph_level = read_ph_sensor()
    voltage_output = read_mfc_voltage()
    current_output = read_mfc_current()

    // 2. THINK - Make decisions based on data
    IF light_level < OPTIMUM_LUX THEN
        turn_on_LED(INTENSITY = calculate_led_power(light_level))
    ELSE
        turn_off_LED()
    END IF

    IF ph_level < IDEAL_PH_RANGE.MIN OR ph_level > IDEAL_PH_RANGE.MAX THEN
        trigger_alert("WARNING: pH imbalance in bioreactor!")
        // Potential "self-healing" action: small drip of nutrient / pH buffer
        activate_pump(DURATION = 5_SECONDS)
    END IF

    IF temp > SAFE_TEMP_MAX THEN
        trigger_alert("WARNING: Bioreactor overheating!")
        activate_cooling_fan()   // only if a fan is fitted
    END IF

    // 3. ACT - Manage robot's power and behavior
    power_generated = calculate_power(voltage_output, current_output)
    battery_level = read_battery_level()

    // Charge the robot's battery
    IF power_generated > POWER_THRESHOLD AND battery_level < 100 THEN
        charging_circuit = ON
        log("Now charging. Power input: " + power_generated)
    ELSE
        charging_circuit = OFF
    END IF

    // Autonomous behavior based on energy reserves
    // (between 15 and 90 the previous motor state is kept, which avoids rapid toggling)
    IF battery_level < 15 THEN
        baymax_motors = IDLE     // Go into low-power mode
        log("Battery low. Entering energy conservation mode.")
    ELSE IF battery_level > 90 THEN
        baymax_motors = ACTIVE   // Ready to interact
        log("Energy reserves high. Baymax is active.")
    END IF

    delay(60_SECONDS)   // Loop every minute for continuous monitoring
END FUNCTION

A simple Python script using a library like smbus2 would communicate with the Ginkgo Nebula over I2C to execute this logic.

# Example Python snippet for reading a sensor from Ginkgo Nebula

import time

import smbus2

# Assumed I2C address and register for the light sensor (placeholders, not documented values)
GINKGO_ADDRESS = 0x04
LIGHT_SENSOR_REG = 0x01

bus = smbus2.SMBus(1)  # I2C bus 1 on a Raspberry Pi


def read_light_sensor():
    """Return the raw 16-bit light reading, or -1 if the I2C read fails."""
    try:
        return bus.read_word_data(GINKGO_ADDRESS, LIGHT_SENSOR_REG)
    except OSError as e:  # raised by smbus2 when the device does not respond
        print(f"Error reading sensor: {e}")
        return -1


# Poll the sensor every 5 seconds
while True:
    light = read_light_sensor()
    print(f"Current light level: {light}")
    time.sleep(5)
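The THINK and ACT steps of the pseudocode loop can also be written as a pure Python function, which makes the decision logic testable on a laptop without any hardware attached. This is a sketch under stated assumptions: the setpoints below (lux, pH range, temperature limit) are illustrative placeholders rather than measured values, and the names `Readings` and `decide` do not come from any existing library.

```python
from dataclasses import dataclass

# Illustrative setpoints only; real values depend on the culture and the hardware.
OPTIMUM_LUX = 5000.0
PH_MIN, PH_MAX = 6.5, 8.0
SAFE_TEMP_MAX_C = 35.0
BATTERY_LOW_PCT, BATTERY_HIGH_PCT = 15.0, 90.0


@dataclass
class Readings:
    light_lux: float
    ph: float
    temp_c: float
    battery_pct: float


def decide(r: Readings, motors_active: bool) -> dict:
    """Map one set of sensor readings to actuator commands."""
    if r.battery_pct < BATTERY_LOW_PCT:
        motors_active = False   # low-power mode
    elif r.battery_pct > BATTERY_HIGH_PCT:
        motors_active = True    # ready to interact
    # Between the two thresholds the previous state is kept (hysteresis).
    return {
        "led_on": r.light_lux < OPTIMUM_LUX,
        "pump_on": not (PH_MIN <= r.ph <= PH_MAX),
        "fan_on": r.temp_c > SAFE_TEMP_MAX_C,
        "motors_active": motors_active,
    }


if __name__ == "__main__":
    print(decide(Readings(light_lux=1200, ph=7.1, temp_c=24.0, battery_pct=50), True))
```

Keeping sensing, deciding, and acting separate means the hardware layer can be swapped out (or mocked) without touching the control rules.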

Phase 2: The Mind – Emotionally-Dominant Medical Language Model with Ethical Constraints

The true innovation of Prometheus-Baymax lies in its cognitive architecture. Its language and reasoning are powered by a large language model (LLM) fine-tuned specifically for medical and emotional support interactions. However, this model is not left unchecked. It is governed by a layer of ethical constraints and virtue-based rules, ensuring its behavior remains safe, empathetic, and aligned with human values.

Emotionally-Dominant Core: The model is trained on vast datasets of therapeutic dialogues, empathetic communication, and medical knowledge. Its primary goal is to detect, understand, and respond to the user’s emotional state. It prioritizes comfort, reassurance, and non-judgmental support. Responses are generated with a soft, gentle tone, characteristic of the Baymax character, but now backed by sophisticated natural language understanding.

Virtue-Based Ethical Framework: Inspired by virtue ethics, the AI’s decision-making is guided by a set of core virtues: Compassion, Beneficence (doing good), Non-maleficence (doing no harm), Respect for Autonomy, and Justice. This framework is implemented as a set of hard and soft constraints on the LLM’s output.

Hard Constraints: The model is programmed to refuse any request that could lead to physical or emotional harm. It will not provide instructions for dangerous activities, engage in hate speech, or violate user privacy. These are non-negotiable.
Soft Constraints (Virtue Guidance): For ambiguous situations, the model consults its "virtue compass." For example, if a user expresses sadness, the model will not just offer generic advice but will draw on its compassion virtue to probe gently and offer comfort tailored to the user's history (while respecting privacy). If a user asks for a medical diagnosis, it will invoke the virtue of non-maleficence by clearly stating its limitations and encouraging professional consultation, while still providing general, helpful information.

This ethical layer is not just a filter; it’s integrated into the model’s prompting and training. The AI is constantly asking itself: “Is my response compassionate? Does it respect the user’s autonomy? Could it cause unintended harm?”

Phase 3: Social and Ethical Limitations – Ensuring Trust

To build a truly trustworthy companion, Prometheus-Baymax operates under explicit social and ethical limitations:

Transparency: The AI is capable of explaining its reasoning and ethical considerations upon request. If it refuses a request, it can articulate which ethical principle guided its decision.

Privacy by Design: All sensor data (from the environment and user interactions) is processed locally on the Raspberry Pi as much as possible. Any data that must be stored is encrypted, and users have full control over their data. The robot cannot be forced to share sensitive information without explicit, informed consent.

Accountability: The system maintains a secure, immutable log of its interactions and decisions (especially ethical dilemmas). This log can be reviewed by human supervisors to ensure ongoing alignment with ethical standards.

Fail-Safe Autonomy: The robot’s physical movements and core life-support systems (the bioreactor) operate independently of the high-level AI. If the language model encounters an unresolvable ethical conflict or a technical fault, it can default to a safe mode, ensuring the robot’s basic functions (and its user’s safety) are never compromised.

Moral Grayscale Navigation: The AI is trained to recognize that real-world ethical dilemmas are rarely black and white. It uses a probabilistic reasoning approach, weighing the potential benefits and harms of different actions against its core virtues, and will often engage the user in a gentle dialogue to understand their perspective before acting.

Phase 4: Physical Embodiment and Integration

The entire system is housed in a soft, inflatable vinyl body, true to the original Baymax design. The 3D-printed bioreactor sits in the chest, with its gentle LED glow visible through the material, symbolizing its living heart. The Raspberry Pi, Ginkgo Nebula, and battery are in the base. The AI’s voice, generated by a text-to-speech engine fine-tuned for calmness, emanates from internal speakers.

Conclusion

Prometheus-Baymax is more than a robot; it’s a statement about the future of autonomous companions. By combining a self-sustaining, biologically-inspired energy system with a deeply empathetic and ethically-constrained artificial mind, we move closer to a world where technology not only serves us but also cares for us in a way that is both responsible and profoundly human. It is a symbiosis of nature, machine, and morality.

(~ ̄▽ ̄)~Final Project Ideas — DUE BY START OF FEB 24 LECTURE

  1. Project Prometheus-Baymax v1.0: A Plant Sensor Platform Integrating 3D Printing and Cloud Lab Automation (UWA Without 3D Printer Version)

👨‍🦱Project Overview

Building on the v1.0 proposal, we introduce two powerful automation tools to further enhance the project’s reliability, reproducibility, and remote execution capabilities:

Custom 3D-Printed Holder (printed by Biopunk Lab and shipped to UWA): Used to standardize plant leaf handling, stress application, and imaging, eliminating manual operation errors.

Cloud Lab Automated Screening (remote execution): Before plant transformation, high-throughput testing of sensor variants using cell-free protein synthesis systems ensures selection of the best-performing constructs.

The ultimate goal remains unchanged: within three months, through remote collaboration, to construct a plant-based biosensor capable of detecting stress signals using Nicotiana benthamiana and GCaMP3—a prototype of Baymax’s “emotional perception” module.

Tool Integration Design

  1. 3D-Printed Holder: Leaf Fixation and Stimulation Module (Printed by Biopunk, Used by UWA)

Design Concept: Create a reusable sandwich-style holder for:

  • Fixing leaf samples to prevent movement during imaging
  • Standardizing the stimulus application area (e.g., contact area for mechanical wounding)
  • Adapting to UWA’s 96-well plate or microscope stage

Design Specifications:

  • Bottom Plate: Contains multiple circular wells (5 mm diameter) for placing leaf discs
  • Top Plate: Has corresponding through-holes for inserting syringe needles or pressure rods for standardized stimulation
  • Material: PLA or PETG (biocompatible), FDM printed, low cost
  • Adaptability: The dimensions of UWA’s plate reader/microscope stage must be obtained in advance to ensure stable placement

Printing and Delivery Process:

  1. The remote user (Biopunk) designs the holder using CAD software (e.g., Fusion 360, Tinkercad) and exports STL files.
  2. Biopunk prints the holder using the lab’s 3D printer (approx. 2-3 hours, PLA material).
  3. The holder is shipped via international courier (DHL/FedEx) to the University of Western Australia (estimated 5-7 business days).
  4. Upon receipt, UWA sterilizes the holder with 70% ethanol, after which it is ready for use.

Usage Workflow (Executed by UWA):

  1. Place leaf discs (obtained via punching) into the bottom plate wells.
  2. Cover with the top plate and secure with screws, forming a “leaf sandwich.”
  3. Place the entire holder on the plate reader or microscope stage for a baseline reading.
  4. Apply the stimulus through the top plate holes (e.g., insert a needle for wounding, or drip drought-mimicking solution).
  5. Monitor fluorescence changes in real time.

Advantages: Eliminates manual operation variability and improves data reliability; the holder can be re-sterilized with 70% ethanol and reused (PLA and PETG soften at autoclave temperatures, so autoclaving is not suitable).

  2. Cloud Lab Automated Screening: Cell-Free System for Sensor Validation (Remote Execution)

Design Concept: Before committing to plant transformation, use a commercial cloud lab platform (e.g., Strateos, formerly Transcriptic) for rapid cell-free testing of multiple sensor variants to screen for constructs with the largest dynamic range and fastest response.

Workflow (Fully Remote Execution):

  1. Design a set of GCaMP3 variants (e.g., different calmodulin mutations, linker lengths, fluorescent protein variants; 5-10 total).
  2. Send the sequences encoding these variants to the cloud lab, which synthesizes the DNA as linear fragments or plasmids.
  3. The cloud platform executes the automated workflow:
    • Echo acoustic liquid handler dispenses DNA into 384-well plates
    • Bravo liquid handling platform adds cell-free reaction master mix (wheat germ or E. coli extract)
    • MultiFlo dispenser adds assay buffer containing different calcium concentrations (e.g., 0, 0.1, 1, 10 μM)
    • PlateLoc seals the plate; Inheco incubates at controlled temperature (2 hours for expression + detection)
    • XPeel removes the seal; PHERAstar reads fluorescence kinetic curves
  4. Data are returned and analyzed remotely to select the best variant.

Advantages:

  • No hands-on work required in the local lab; fully cloud-based
  • The same workflow can scale to hundreds of variants within one week, significantly shortening the screening cycle
  • Ensures optimal sensor performance for plant transformation
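The screening design described above (variants × calcium concentrations × replicates on a 384-well plate) can be turned into a plate map with a short script. This is a sketch only: the variant names are placeholders, the row-by-row fill order is an arbitrary choice, and an actual cloud lab would expect its own submission format.

```python
import itertools
import string

ROWS = string.ascii_uppercase[:16]  # rows A-P of a 384-well plate
COLS = range(1, 25)                 # columns 1-24


def plate_layout(variants, calcium_um=(0, 0.1, 1, 10), replicates=3):
    """Assign every (variant, calcium concentration, replicate) to a well, row by row."""
    wells = [f"{row}{col}" for row in ROWS for col in COLS]
    conditions = list(itertools.product(variants, calcium_um, range(1, replicates + 1)))
    if len(conditions) > len(wells):
        raise ValueError("too many conditions for one 384-well plate")
    return dict(zip(wells, conditions))


if __name__ == "__main__":
    names = [f"GCaMP3_v{i}" for i in range(1, 11)]  # placeholder variant names
    layout = plate_layout(names)
    print(f"{len(layout)} wells used")
    for well in ("A1", "A2", "A13"):
        print(well, layout[well])
```

With 10 variants, 4 calcium concentrations, and 3 replicates the design uses 120 wells, so controls and blanks fit comfortably on the same plate.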

Updated 3-Month Execution Plan (Including Shipping Time)

| Time Period | Remote User (Biopunk) | UWA Lab | Cloud Lab |
| --- | --- | --- | --- |
| Weeks 1-2 | Design GCaMP3 variant library (5-10); design 3D-printed holder & export STL; submit cloud lab order; print holder & ship | Sow N. benthamiana (4 weeks growth); confirm equipment dimensions (plate reader stage) | Receive order, prepare reagents |
| Week 3 | Cloud screening in progress | Plants continue growing; await holder | Execute screening experiment |
| Week 4 | Analyze cloud data, select best variant; send sequence info to UWA; holder expected to arrive at UWA | Receive holder, inspect; prepare vectors and Agrobacterium | Deliver data |
| Week 5 | Remote guidance on transformation | Construct best variant into plant expression vector, transform Agrobacterium | - |
| Week 6 | Assist in designing stimulation protocol | Infiltrate N. benthamiana leaves with Agrobacterium (5 plants, 3 leaves each) | - |
| Week 7 | Real-time data monitoring | Use holder to fix leaves, apply stimuli (mechanical wounding, drought, control); measure fluorescence with plate reader | - |
| Week 8 | Data preprocessing | Complete measurements, organize raw data and photos | - |
| Weeks 9-12 | In-depth analysis, figure generation, report writing; final video meeting with UWA | Participate in discussions, provide feedback | - |

Cost Estimate (AUD)

| Item | Cost | Description |
| --- | --- | --- |
| Cloud lab screening (384-well plate, including DNA synthesis) | $800 | Approx. 5-10 variants × 4 calcium concentrations × 3 replicates |
| 3D printing materials | $5 | PLA filament; Biopunk already has printer |
| International shipping | $40 | Small package to Australia |
| Plant growth consumables (UWA) | $50 | Seeds, soil, pots |
| Molecular reagents (UWA) | $200 | Restriction enzymes, ligases, plasmid prep, etc. |
| Agrobacterium strain (UWA) | $100 | If not already in stock |
| TOTAL | ~$1195 | Majority of cost is cloud lab service |

Note: UWA personnel time is not included, as this is collaborative research.

Success Criteria

  • Cloud Screening Success: At least 2 variants show ≥5-fold fluorescence increase in the presence of calcium
  • Plant Validation: Optimal variant shows ≥3-fold fluorescence increase in response to mechanical wounding in plants (p<0.05)
  • Holder Effect: Coefficient of variation for fluorescence among different discs from the same leaf <15% when using the holder
  • Remote Execution: Complete communication records, no on-site visits, all processes completed within 3 months
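Two of the success criteria above are simple statistics that can be checked directly from plate reader exports: the fold change in fluorescence and the coefficient of variation (CV) between discs. The sketch below uses only the Python standard library; the numbers are invented example readings, not data, and the p<0.05 criterion would additionally need a significance test (for example a t-test from a statistics package), which is not shown.

```python
from statistics import mean, stdev


def fold_change(baseline, stimulated):
    """Ratio of mean stimulated fluorescence to mean baseline fluorescence."""
    return mean(stimulated) / mean(baseline)


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


if __name__ == "__main__":
    # Invented example readings in arbitrary fluorescence units.
    no_calcium = [100, 110, 90]
    with_calcium = [520, 480, 500]
    discs_same_leaf = [980, 1010, 1050, 960]

    fc = fold_change(no_calcium, with_calcium)
    cv = coefficient_of_variation(discs_same_leaf)
    print(f"fold change: {fc:.1f} (criterion: >= 5)")
    print(f"CV between discs: {cv:.1f}% (criterion: < 15%)")
```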

Next Steps

The 3D printing side will develop and print a Baymax-shaped holder to accommodate the plant calcium fluorescence sensor; the UWA side will provide a feasible experimental protocol for the sensor and submit it to the automated screening system to determine the optimal performance configuration.

🎅Future expected deliverables: Participation in the International Directed Evolution Competition (led by Hong Kong Polytechnic University; directed evolution platform) and the International Synthetic Biology Competition (led by Biopunk and MIT); publication in top-tier interdisciplinary and botanical journals (led by UWA).

This proposal combines cutting-edge automation tools with classical plant biology, fully leveraging Biopunk’s 3D printing capabilities and UWA’s plant experimental platform. It is both simple to execute and highly innovative, perfectly embodying the “Prometheus-Baymax” symbiosis concept.


Week 4 HW: hw-protein-design-part-i

🐉 Project Objective: Bacteriophage Engineering

This document outlines the core learning experience and the collaborative framework designed to drive an optimized bacteriophage project.


1. Mastery of Basic Concepts

  • Phage Biology: Understanding the lytic and lysogenic life cycles, and the structural modularity of viral components (Capsid, Tail, Baseplate).
  • Synthetic Biology Framework: Introduction to the “Design-Build-Test-Learn” (DBTL) cycle in viral engineering.
  • Therapeutic Potential: Exploring the role of phages in addressing antimicrobial resistance (AMR) and precision microbiome editing.

2. Amino Acid Structure & Biochemistry

  • Chemical Taxonomy: Categorization of the 20 standard amino acids based on hydrophobicity, charge, and polarity.
  • Side-Chain Interactions: Analyzing how hydrogen bonds, salt bridges, and disulfide bridges dictate protein stability.
  • Conformational Constraints: Understanding the Ramachandran plot and the energetic landscape of protein folding.

3. 3D Protein Visualization & Analysis

  • Software Proficiency: Hands-on training with professional-grade tools such as PyMOL, ChimeraX, or NGL Viewer.
  • Structural Mapping: Visualizing surface electrostatic potentials, hydrophobicity, and potential binding pockets.
  • Superimposition: Learning to align wild-type and mutant structures to assess structural deviations (RMSD).

4. Diversity of ML-based Design Tools

  • Structure Prediction: Leveraging AlphaFold 3 or RoseTTAFold for high-accuracy 3D modeling of viral proteins.
  • Fixed-Backbone Design: Using ProteinMPNN to redesign amino acid sequences for a specific structural scaffold.
  • Generative Scaffolding: Implementing RFdiffusion for de novo design of receptor-binding motifs and functional binders.
  • Sequence Modeling: Utilizing Protein Language Models (e.g., ESM-3) to predict the impact of specific mutations on protein function.

👩‍🦰 Part A: Fundamental Principles & Frontiers in Protein Engineering

This section covers fundamental inquiries into biochemistry, evolutionary biology, and structural protein design.


1. Quantitative Biochemistry: Amino Acids in Nutrition

Question: How many molecules of amino acids do you consume with a 500g piece of meat? (Assume an average amino acid mass of $\approx 100$ Daltons).

Answer:

  • Step 1: Calculate the total mass of the protein. Meat is roughly 20% protein. $500\text{g} \times 0.20 = 100\text{g}$ of protein.
  • Step 2: Determine the moles of amino acids. $100\text{g} / 100\text{g/mol} = 1\text{ mole}$.
  • Step 3: Convert to molecules using Avogadro’s number. Result: $\approx 6.022 \times 10^{23}$ molecules.
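The three steps above can be checked with a few lines of Python. This is a minimal sketch: the 20% protein fraction and the 100 g/mol average mass are the assumptions already stated in the answer, and the function name is purely illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def amino_acids_in_meat(meat_g, protein_fraction=0.20, avg_mass_g_per_mol=100.0):
    """Estimate the number of amino acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction    # Step 1: grams of protein
    moles = protein_g / avg_mass_g_per_mol   # Step 2: moles of amino acids
    return moles * AVOGADRO                  # Step 3: molecules

n = amino_acids_in_meat(500)
print(f"{n:.3e} amino acid molecules")
```

Changing `protein_fraction` or `avg_mass_g_per_mol` shows how sensitive the estimate is to each assumption.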

2. Biological Identity & Genetics

Question: Why do humans eat beef or fish without transforming into a cow or a fish? Would the pioneers of DNA (Sanger, Darwin, Mendel, Watson, Crick, and Franklin) be furious if they knew you asked this?

Answer: Digestion breaks down foreign proteins into their constituent monomers (individual amino acids). These building blocks are then reassembled according to your unique genetic blueprint encoded in your DNA. While the pioneers of genetics would likely be amused rather than furious, the question highlights the elegance of the Central Dogma: the information flows from your DNA, not from the food you ingest.


3. The Evolution of the 20 Natural Amino Acids

Question: Why are there only 20 natural amino acids?

Answer: The current set is the result of three major evolutionary stages:

  1. Primordial Foundation: The first ten amino acids provided the basic requirements for folding and catalysis at the origin of life.
  2. The Great Oxidation Event (~2.4 Gya): The rise of atmospheric oxygen allowed for the evolution of redox-active amino acids like Cysteine and Methionine.
  3. Translational Fidelity: The tRNA/aminoacyl-tRNA synthetase recognition system reached an evolutionary “frozen accident” state, ensuring the stable and universal use of these 20 building blocks.

4. Non-Natural Amino Acids (ncAAs) & Synthetic Design

Question: Can you design non-natural amino acids? What are some examples?

Answer: Using technologies like multiplex rare-codon recoding and engineered synthetases, we can now incorporate ncAAs for specific functions:

  • Photoregulation: Azophenylalanine (AzoPhe)
  • Bioorthogonal Chemistry: Azidohomoalanine (Aha), Tetrazine-Lysine (Tetrazine-Lys)
  • Metal Coordination: Ferrocene-alanine (Fc-Ala)
  • Smart Responsiveness: Spiropyran-alanine (Spiropyran-Ala), Phenylboronic acid leucine (PheB-Leu)
  • Others: Diselenocysteine (SeCys), Fluorosulfate-tyrosine (Fluorosulfate-Tyr), Ethynyl-tryptophan (Ethynyl-Trp).

5. Prebiotic Origins

Question: Where did amino acids come from before life and enzymes existed?

Answer:

  • Miller–Urey Reactions: Spark discharges in reducing atmospheres (CH₄, NH₃, H₂) produce Glycine and Alanine.
  • Strecker Synthesis: Reaction of aldehydes, ammonia, and hydrogen cyanide (common in early Earth).
  • Hydrothermal Vents: Alkaline vents provide mineral catalysts and temperature gradients to concentrate precursors.
  • Extraterrestrial Delivery: Meteorites (e.g., Murchison) contain over 80 different amino acids, seeding early Earth with organic material.

6. Chirality & Helix Handedness

Question: If you make an α-helix using D-amino acids, what handedness would you expect?

Answer: Natural L-amino acids favor a right-handed α-helix. Due to the mirror-image relationship, a polymer made entirely of D-amino acids will form a left-handed α-helix. The hydrogen-bonding pattern remains the same, but the spatial orientation is inverted.


7. Diversity of Protein Helices

Question: What other types of helices exist in proteins beyond the α-helix?

Answer:

  • $3_{10}$-helix: A tighter coil defined by $i \to i+3$ hydrogen bonds (10-atom ring); typically found as short segments at the boundaries of α-helices.
  • π-helix: A wider coil defined by $i \to i+5$ hydrogen bonds (16-atom ring); often appears as a functional bulge or “kink” within an α-helix to accommodate active sites.
  • Polyproline helices (PPI & PPII): Stabilized by steric effects and ring puckering rather than intrachain H-bonds. PPII is left-handed and common in disordered regions. PPI is right-handed and much rarer in globular proteins.
  • Left-handed α-helix: Thermodynamically unfavorable for L-amino acids; primarily found in short, specialized motifs or as isolated residues (often Glycine) in strained loops.


8. Stereochemical Dominance

Question: Why are most molecular helices right-handed?

Answer: This is driven by the principle of minimum energy. For L-amino acids, a right-handed twist allows side chains to project outward with minimal steric crowding. A left-handed twist with L-amino acids would force side chains into energetically unfavorable positions, leading to instability.


9. Mechanisms of β-sheet Aggregation

Question: Why do β-sheets tend to aggregate, and what is the driving force?

Answer: β-sheets feature “open” edge strands with unsatisfied hydrogen-bonding potential. The primary driving force is the hydrophobic effect (reducing water exposure of non-polar side chains), which is further amplified by a repetitive, extended geometry that facilitates cooperative, “runaway” inter-strand H-bonding.


10. Amyloids: Disease & Materials

Question: Why do amyloid diseases form β-sheets, and can they be used as materials?

Answer: Amyloids (Alzheimer’s, Parkinson’s) result from proteins misfolding into hyper-stable, fibrillar β-sheets. As Materials: Yes! Due to their extreme mechanical and chemical stability, engineered amyloids are used for:

  • Nanotech: Nanowires and templates.
  • Biomedicine: Drug delivery scaffolds and antimicrobial coatings.
  • Industry: High-strength adhesives and hydrogels.

11. Motif Design

Question: Design a β-sheet motif that forms a well-ordered structure.

Answer: The hexapeptide VQIVYK (from the Tau protein) is a classic model. Its sequence (Val-Gln-Ile-Val-Tyr-Lys) promotes highly ordered, cross-β structures through perfect steric zippers and balanced hydrophobic/polar interactions.

👨‍🦰 Part B: Protein Structural Analysis & Visualization

Overview

In this section, you will leverage online bioinformatics databases (e.g., PDB, UniProt) and 3D visualization software (e.g., PyMOL, ChimeraX) to explore the molecular architecture of a protein.

Task: Select a protein with a resolved 3D structure and provide the following details:


1. Protein Selection & Rationale

Selected Protein: The Light-Harvesting Complex II - Photosystem II (LHCII-PSII) Supercomplex

Rationale: The LHCII-PSII supercomplex serves as the primary machinery for solar energy conversion in plants and green algae (cyanobacteria share the PSII core but use phycobilisomes rather than LHCII as their main antenna). As the “engine” of photosynthesis, it orchestrates the intricate processes of light absorption, excitation energy transfer, and charge separation.

Selecting this complex is driven by its multi-faceted importance:

  • Fundamental Biology: It represents the pinnacle of biological energy transduction and quantum efficiency.
  • Agricultural Innovation: Understanding its structural bottlenecks is key to optimizing photosynthetic efficiency and crop yields.
  • Sustainable Energy: It provides a natural blueprint for the development of bio-inspired solar cells and artificial photosynthetic systems.

2. Primary Structure: Subunit Selection and Sequence

Note: As the LHCII-PSII is a massive multi-subunit supercomplex, this analysis focuses on the D1 Reaction Center Protein, the functional heart of the complex.

Selected Subunit: PsbA (Photosystem II Reaction Center Protein D1)

Source Organism: Arabidopsis thaliana

UniProt ID: P83755

Biological Rationale: Photosystem II (PSII) is a light-driven water:plastoquinone oxidoreductase that uses light energy to abstract electrons from $H_2O$, generating $O_2$ and a proton gradient subsequently used for ATP formation. It consists of a core antenna complex that captures photons and an electron transfer chain that converts photonic excitation into charge separation. The D1/D2 (PsbA/PsbD) reaction center heterodimer is critical, as it binds P680, the primary electron donor of PSII, as well as several subsequent electron acceptors.

[ FASTA SEQUENCE ]

>sp|P83755|PSBA_ARATH Photosystem II protein D1 OS=Arabidopsis thaliana OX=3702 GN=psbA PE=1 SV=2


MTAIL ERRES ESLWG RFCNW ITSTE NRLYI GWFGV LMIPT LLTAT SVFII AFIAA PPVDI
DGIRE PVSGS LLYGN NIISG AIIPT SAAIG LHFYP IWEAA SVDEW LYNGG PYELI VLHFL
LGVAC YMGRE WELSF RLGMR PWIAV AYSAP VAAAT AVFLI YPIGQ GSFSD GMPLG ISGTF
NFMIV FQAEH NILMH PFHML GVAGV FGGSL FSAMH GSLVT SSLIR ETTEN ESANE GYRFG
QEEET YNIVA AHGYF GRLIF QYASF NNSRS LHFFL AAWPV VGIWF TALGI STMAF NLNGF
NFNQS VVDSQ GRVIN TWADI INRAN LGMEV MHERN AHNFP LDLAA VEAPS TNG


How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids (for example with Python’s collections.Counter).

sp|P83755|PSBA_ARATH Photosystem II protein D1 Length: 353 amino acids

Important

Most frequent: A (35 times, 9.9%)
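The length and the most frequent residue reported above can be reproduced locally. The sketch below copies the D1 sequence from the FASTA block and counts residues with `collections.Counter`; the variable names are illustrative, and it is not the Colab notebook itself.

```python
from collections import Counter

# D1 (PsbA, UniProt P83755) sequence as listed above; whitespace is stripped below.
raw = """
MTAIL ERRES ESLWG RFCNW ITSTE NRLYI GWFGV LMIPT LLTAT SVFII AFIAA PPVDI
DGIRE PVSGS LLYGN NIISG AIIPT SAAIG LHFYP IWEAA SVDEW LYNGG PYELI VLHFL
LGVAC YMGRE WELSF RLGMR PWIAV AYSAP VAAAT AVFLI YPIGQ GSFSD GMPLG ISGTF
NFMIV FQAEH NILMH PFHML GVAGV FGGSL FSAMH GSLVT SSLIR ETTEN ESANE GYRFG
QEEET YNIVA AHGYF GRLIF QYASF NNSRS LHFFL AAWPV VGIWF TALGI STMAF NLNGF
NFNQS VVDSQ GRVIN TWADI INRAN LGMEV MHERN AHNFP LDLAA VEAPS TNG
"""
sequence = "".join(raw.split())  # remove spaces and line breaks

counts = Counter(sequence)
residue, n = counts.most_common(1)[0]

print(f"Length: {len(sequence)} amino acids")
print(f"Most frequent: {residue} ({n} times, {100 * n / len(sequence):.1f}%)")
```

`counts.most_common()` without an argument lists every residue in descending order of frequency, which is useful for comparing the runners-up.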

How many protein sequence homologs are there for your protein? Hint: Use Uniprot’s BLAST tool to search for homologs.

Important

Homologs: 239 results

Does your protein belong to any protein family?

Sequence similarities: Belongs to the reaction center PufL/M/PsbA/D family (UniRule annotation).

Keywords (Domain): Transmembrane, Transmembrane helix

3. Identify the structure page of your protein in RCSB

When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

Resolution: 5.30 Å

The structure was published in 2017 in Nature Plants (van Bezouwen et al.). The study used cryo-electron microscopy (cryo-EM) to reveal the subunit and chlorophyll organization of the C2S2M2-type photosystem II (PSII) supercomplex from a higher plant (Arabidopsis thaliana). At 5.30 Å, a map resolves transmembrane helices and the placement of subunits but generally not individual side chains, so by the “smaller is better” criterion this is a moderate-resolution structure rather than a near-atomic one.

van Bezouwen, L. S., Caffarri, S., Kale, R. S., Kouřil, R., Thunnissen, A. M. W., Oostergetel, G. T., & Boekema, E. J. (2017). Subunit and chlorophyll organization of the plant photosystem II supercomplex. Nature Plants, 3(7), 1-11.

Are there any other molecules in the solved structure apart from protein?

Yes. Apart from proteins, the structure contains:

  • Numerous pigments: Including chlorophylls (Chls) and pheophytins.
  • Metal clusters: Most notably the Mn4CaO5 water-splitting center.
  • Lipids: Which provide structural integrity within the thylakoid membrane.
  • Quinones: For electron transport.

Does your protein belong to any structure classification family?

The protein belongs to the PsbA/PsbD family, characterized by a 5-transmembrane-helix fold. It forms a heterodimer with the D2 protein (PsbD), providing the structural scaffold for essential cofactors like the special pair chlorophylls and the $Mn_4CaO_5$ cluster.

Open the structure of your protein in any 3D molecule visualization software:

PyMol Tutorial Here (hint: ChatGPT is good at PyMol commands)

Documentation: Importing D1 Protein (P83755) into PyMOL

Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

hide all
show cartoon

hide all
show sticks
show spheres
set sphere_scale, 0.25

show ribbon

PyMOL Color Customization

Automatically assign standard colors to Alpha-helices, Beta-sheets, and Loops using the following command:

# Usage: util.cbss(selection, helix_color, sheet_color, loop_color)
util.cbss("all", "red", "yellow", "green")

Color the protein by secondary structure. Does it have more helices or sheets?

PyMOL Analysis: Secondary Structure Distribution

To determine the composition of the protein structure, we performed secondary structure coloring and residue counting using PyMOL’s internal Python API.

1. Visual Inspection (Coloring)

Run this command to visually distinguish the secondary structures:

# Color: Alpha-helices (Red), Beta-sheets (Yellow), Loops (Green)
util.cbss("all", "red", "yellow", "green")

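To count residues per secondary-structure class, run the following block in the PyMOL command line (the python / python end pair wraps multi-line Python):

python
from pymol import cmd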

# Counting Alpha-Carbon (CA) atoms as a proxy for residues
h = cmd.count_atoms("ss h and name ca")      # Helices
s = cmd.count_atoms("ss s and name ca")      # Sheets
l = cmd.count_atoms("ss l+'' and name ca")   # Loops

print(f"Helices: {h} residues")
print(f"Sheets:  {s} residues")
print(f"Loops:   {l} residues")
python end
FINAL ANALYSIS SUMMARY:
=======================================
[  HELICES  ]:  4,935 residues (MAX)
[  SHEETS   ]:    270 residues
[  LOOPS    ]:  3,066 residues
=======================================
RESULT: Helices are approximately 18x more abundant than Sheets.

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

1. Automated Coloring Script (Custom Scheme)

PyMOL does not have a single “hydrophobic” command. We use a Python script to categorize and color residues based on their chemical properties.

# Copy and paste this into the PyMOL command line:
python

from pymol import cmd

# 1. Define residue groups
hydrophobic = "resn ALA+VAL+LEU+ILE+PRO+PHE+TRP+MET"
hydrophilic = "resn ASP+GLU+LYS+ARG+HIS+ASN+GLN+SER+THR+TYR+CYS"

# 2. Apply colors
# Red for Hydrophobic (Greasy/Interior)
# Blue for Hydrophilic (Polar/Surface)
cmd.color("red", hydrophobic)
cmd.color("blue", hydrophilic)

# 3. Visual optimization
cmd.show_as("surface") # Using surface view is best for distribution analysis
cmd.set("transparency", 0.3) # Make surface semi-transparent to see the backbone
cmd.show("cartoon")

# 4. Count residues per group (one CA atom per residue)
# Note: GLY is in neither list above, so glycines keep their previous color and are not counted.
h_count = cmd.count_atoms(f"({hydrophobic}) and name ca")
p_count = cmd.count_atoms(f"({hydrophilic}) and name ca")

print(f"Hydrophobic residues: {h_count}")
print(f"Hydrophilic residues: {p_count}")
python end
| Property Type | Residue Count | Visual Representation |
|---|---|---|
| Hydrophobic (Non-polar) | 4,166 | Red |
| Hydrophilic (Polar/Charged) | 3,170 | Blue |

PyMOL Analysis: Hydrophobicity Distribution

This document describes the process of coloring the protein by residue type to analyze the distribution of Hydrophobic vs. Hydrophilic residues.


Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

show surface

set ray_trace_gain, 1
set ray_trace_mode, 1
set ambient, 0.5

clip near, 20

💕Part C. Using ML-Based Protein Design Tools

Picture Source: Bordin, Nicola et al (2023). Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, Volume 48, Issue 4, 345 - 359.

Structure of PSII-I prime

(PSII with Psb28 and Psb34)


📋 Metadata

| Attribute | Details |
|---|---|
| PDB DOI | 10.2210/pdb7NHQ/pdb |
| EM Map | EMD-12337 (EMDB, EMDataResource) |
| Classification | PHOTOSYNTHESIS |
| Organism(s) | Thermosynechococcus vestitus BP-1 |
| Mutation(s) | None |
| Membrane Protein | Yes |

🧬 Databases & Cross-References

OPM PDBTM MemProtMD mpstruc


📅 Deposition Details

  • Deposited: 2021-02-11
  • Released: 2021-05-05
  • Funding: German Research Foundation (DFG)

Deposition Authors:

Zabret, J., Bohn, S., Schuller, S.K., Arnolds, O., Chan, A., Tajkhorshid, E., Stoll, R., Engel, B.D., Rudack, T., Schuller, J.M., Nowaczyk, M.M.

🧬 Target Sequence: Photosystem II Protein D1

Protein: Photosystem II protein D1 1

Source: Thermosynechococcus elongatus BP-1 | PDB Reference: 7NHQ | Chain: A

>7NHQ_1|Chain A|Photosystem II protein D1 1|Thermosynechococcus elongatus BP-1 (197221)
MTTTLQRRESANLWERFCNWVTSTDNRLYVGWFGVIMIPTLLAATICFVIAFIAAPPVDIDGIREPVSGSLLYGNN
IITGAVVPSSNAIGLHFYPIWEAASLDEWLYNGGPYQLIIFHFLLGASCYMGRQWELSYRLGMRPWICVAYSAPLA
SAFAVFLIYPIGQGSFSDGMPLGISGTFNFMIVFQAEHNILMHPFHQLGVAGVFGGALFCAMHGSLVTSSLIRETT
ETESANYGYKFGQEEETYNIVAAHGYFGRLIFQYASFNNSRSLHFFLAAWPVVGVWFTALGISTMAFNLNGFNFNH
SVIDAKGNVINTWADIINRANLGMEVMHERNAHNFPLDLASAESAPVAMIAPSING

Deep Mutational Scanning: Mapping the Fitness Landscapes of Proteins


RESULTS

Output matrix shape: (21, 360), computed for the 7NHQ chain A sequence shown above.

PsbA (D1 Protein) Sequence Analysis Report

1. Sequence Metadata

  • Protein Identity: Photosystem II Reaction Center Protein A (PsbA / D1).
  • Input Length: 360 amino acid residues (7NHQ chain A). The four spaces in the pasted sequence are line-wrap placeholders, not residues, and should be stripped before analysis.
  • Core Function: The heart of Photosystem II (PSII), responsible for harboring the electron transport chain and the Oxygen-Evolving Complex (OEC).

2. Physical Property Analysis

A. Hydrophobicity and Transmembrane Structure

The sequence exhibits hallmark characteristics of a multi-pass transmembrane protein:

  • Hydrophobic Core: There are 5 highly hydrophobic regions (rich in L, V, I, F, W), corresponding to the five transmembrane $\alpha$-helices (TMH I-V).
  • Aromatic Residue Distribution: High density of W (Tryptophan) and F (Phenylalanine). These residues are crucial for anchoring chlorophyll and pheophytin pigments within the membrane.

B. Charge and Electrostatic Environment

  • Luminal Side (Loops): Specific D (Aspartic acid) and E (Glutamic acid) residues cluster spatially to create the coordination environment for the Manganese cluster ($Mn_4CaO_5$).
  • Structural Flexibility: The distribution of P (Proline) and G (Glycine) defines the tilt angles of the helices and the flexibility of the loops between transmembrane segments.

3. Deep Mutational Scanning (DMS) Prediction Logic

Based on the unsupervised learning principles of ESM-2, the mutational pressure is primarily concentrated in the following hotspots:

| Key Residue | Amino Acid | Functional Significance | ESM-2 Prediction Trend |
|---|---|---|---|
| H198 | His | Ligand for P680 chlorophyll special pair | Extreme Penalty: Any mutation likely leads to total loss of RC function. |
| Y161 | Tyr | $Y_Z$ radical donor | High Penalty: $Y \to F$ is penalized as the loss of hydrogen bonding disrupts electron transfer. |
| D170 | Asp | Ligand for the Mn-cluster | Extreme Penalty: Loss of acidic side chain directly destroys the OEC. |
| TM Domains | L/V/I | Structural stability | Mid-High Penalty: Mutations to charged residues (D/E/R/K) cause misfolding. |

4. Technical Script for Sequence Analysis

The following code sets up the residue list (including its placeholder spaces) as the first step toward extracting data from a 21×360 ESM-2 matrix.

import numpy as np

# 1. Raw Sequence Processing (Handling your provided 360-unit list)
raw_sequence_list = ['T', 'T', 'T', 'L', 'Q', 'R', 'R', 'E', 'S', 'A', 'N', 'L', 'W', 'E', 'R', 'F', 'C', 'N', 'W', 'V', 'T', 'S', 'T', 'D', 'N', 'R', 'L', 'Y', 'V', 'G', 'W', 'F', 'G', 'V', 'I', 'M', 'I', 'P', 'T', 'L', 'L', 'A', 'A', 'T', 'I', 'C', 'F', 'V', 'I', 'A', 'F', 'I', 'A', 'A', 'P', 'P', 'V', 'D', 'I', 'D', 'G', 'I', 'R', 'E', 'P', 'V', 'S', 'G', 'S', 'L', 'L', 'Y', 'G', 'N', 'N', ' ', 'I', 'I', 'T', 'G', 'A', 'V', 'V', 'P', 'S', 'S', 'N', 'A', 'I', 'G', 'L', 'H', 'F', 'Y', 'P', 'I', 'W', 'E', 'A', 'A', 'S', 'L', 'D', 'E', 'W', 'L', 'Y', 'N', 'G', 'G', 'P', 'Y', 'Q', 'L', 'I', 'I', 'F', 'H', 'F', 'L', 'L', 'G', 'A', 'S', 'C', 'Y', 'M', 'G', 'R', 'Q', 'W', 'E', 'L', 'S', 'Y', 'R', 'L', 'G', 'M', 'R', 'P', 'W', 'I', 'C', 'V', 'A', 'Y', 'S', 'A', 'P', 'L', 'A', ' ', 'S', 'A', 'F', 'A', 'V', 'F', 'L', 'I', 'Y', 'P', 'I', 'G', 'Q', 'G', 'S', 'F', 'S', 'D', 'G', 'M', 'P', 'L', 'G', 'I', 'S', 'G', 'T', 'F', 'N', 'F', 'M', 'I', 'V', 'F', 'Q', 'A', 'E', 'H', 'N', 'I', 'L', 'M', 'H', 'P', 'F', 'H', 'Q', 'L', 'G', 'V', 'A', 'G', 'V', 'F', 'G', 'G', 'A', 'L', 'F', 'C', 'A', 'M', 'H', 'G', 'S', 'L', 'V', 'T', 'S', 'S', 'L', 'I', 'R', 'E', 'T', 'T', ' ', 'E', 'T', 'E', 'S', 'A', 'N', 'Y', 'G', 'Y', 'K', 'F', 'G', 'Q', 'E', 'E', 'E', 'T', 'Y', 'N', 'I', 'V', 'A', 'A', 'H', 'G', 'Y', 'F', 'G', 'R', 'L', 'I', 'F', 'Q', 'Y', 'A', 'S', 'F', 'N', 'N', 'S', 'R', 'S', 'L', 'H', 'F', 'F', 'L', 'A', 'A', 'W', 'P', 'V', 'V', 'G', 'V', 'W', 'F', 'T', 'A', 'L', 'G', 'I', 'S', 'T', 'M', 'A', 'F', 'N', 'L', 'N', 'G', 'F', 'N', 'F', 'N', 'H', ' ', 'S', 'V', 'I', 'D', 'A', 'K', 'G', 'N', 'V', 'I', 'N', 'T', 'W', 'A', 'D', 'I', 'I', 'N', 'R', 'A', 'N', 'L', 'G', 'M', 'E', 'V', 'M', 'H', 'E', 'R', 'N', 'A', 'H', 'N', 'F', 'P', 'L', 'D', 'L', 'A', 'S', 'A', 'E', 'S', 'A', 'P', 'V', 'A', 'M', 'I', 'A', 'P', 'S', 'I', 'N', 'G']
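The listing above stops after defining the list. A minimal sketch of the remaining logic is given below, using only the standard library. It rests on loud assumptions: `raw_excerpt` is a short hypothetical stand-in for the full list, the score matrix is filled with random numbers rather than real ESM-2 output, and it has 20 rows (one per standard amino acid), so the extra row of the 21×360 matrix is not modelled.

```python
import random

ALPHABET = list("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids

# Hypothetical short stand-in for raw_sequence_list, with one placeholder space.
raw_excerpt = ['T', 'T', 'T', 'L', 'Q', ' ', 'R', 'R', 'E', 'S']

def clean(residues):
    """Drop placeholder entries that are not standard amino acids."""
    return [r for r in residues if r in ALPHABET]

sequence = clean(raw_excerpt)

# Simulated score matrix: one row per amino acid, one column per position.
# These are random numbers, NOT real model predictions.
rng = random.Random(0)
scores = [[rng.gauss(0.0, 1.0) for _ in sequence] for _ in ALPHABET]

def score(mutant):
    """Look up the simulated score for a mutation such as 'L4A' (1-based position)."""
    wt, pos, mut = mutant[0], int(mutant[1:-1]), mutant[-1]
    if sequence[pos - 1] != wt:
        raise ValueError(f"Expected {sequence[pos - 1]} at position {pos}, got {wt}")
    return scores[ALPHABET.index(mut)][pos - 1]

print(len(sequence), "residues after cleaning")
print("Simulated score for L4A:", round(score("L4A"), 3))
```

Stripping the placeholders before indexing matters: if the spaces stay in the list, every position after the first space is shifted by one relative to the matrix columns.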

5.c. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.

Deep Mutational Scanning (DMS) Process

The Deep Mutational Scanning (DMS) process can be broken down into four stages.


1. Library Construction

Scientists first use genetic engineering to create a diverse population of protein variants.

  • Method: Using PCR (Polymerase Chain Reaction)-based mutagenesis or DNA synthesis, mutations are introduced at every position along the protein sequence.
  • Result: A “library” containing thousands to millions of distinct DNA plasmids is generated, where each plasmid corresponds to a specific mutation.
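To get a feel for library size, the sketch below enumerates every single amino acid substitution of a sequence. It is an illustration only: the three-residue input is a toy sequence, and the function name is hypothetical. Labels follow the wild type, position, mutant convention (for example A12V).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids

def single_substitutions(sequence):
    """Yield every single amino acid substitution as a label such as 'A12V'."""
    for pos, wt in enumerate(sequence, start=1):
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield f"{wt}{pos}{mut}"

variants = list(single_substitutions("MKT"))  # toy 3-residue sequence
print(len(variants))   # 3 positions x 19 alternatives = 57
print(variants[:3])    # ['M1A', 'M1C', 'M1D']
```

A protein of length L therefore has 19 × L possible single substitutions; libraries that also include multiple mutants grow far faster than that.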

2. Screening and Selection

This is the most critical stage, acting much like a “survival of the fittest” competition. Scientists assign a specific task to these proteins:

  • For Antibiotic Resistance: Bacteria containing the mutant proteins are placed in petri dishes filled with antibiotics. The bacteria that survive carry beneficial mutations; those that die carry harmful ones.
  • For Fluorescence (e.g., GFP): Scientists use FACS (Fluorescence-Activated Cell Sorting). The instrument scans every cell with rapid-fire precision, sorting those with strong fluorescence to one side and non-fluorescent ones to the other.
  • For Binding Affinity: Much like a magnet, a target molecule is used to “pull” the proteins. Strong binders are retained, while weak ones are washed away.

3. High-Throughput Sequencing (NGS)

Once the “competition” ends, scientists must determine the winners.

  • They use Next-Generation Sequencing (NGS) to count the abundance of each DNA variant both before and after the selection process.
  • The Logic:
    • Enrichment: If a specific mutation becomes more frequent after the competition, it indicates enhanced function.
    • Depletion: If a mutation disappears after the competition, it indicates that the mutation was lethal or deleterious.

4. Data Transformation (The Score)

Finally, using computational algorithms, scientists convert the changes in sequencing frequency into a numerical value: the DMS Fitness Score.

Simplified Formula: $$F = \log\left(\frac{\text{Count}_{\text{post-selection}}}{\text{Count}_{\text{pre-selection}}}\right)$$
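The simplified formula can be sketched in Python as follows. Two details are assumptions added here and are not part of the formula above: counts are converted to frequencies so that different sequencing depths before and after selection do not bias the score, and a small pseudocount avoids taking the logarithm of zero for variants that disappear. The counts are made up for illustration.

```python
import math

def dms_fitness(pre, post, total_pre, total_post, pseudocount=0.5):
    """Log enrichment of one variant between pre- and post-selection sequencing."""
    freq_pre = (pre + pseudocount) / total_pre
    freq_post = (post + pseudocount) / total_post
    return math.log(freq_post / freq_pre)

# Hypothetical read counts: (pre-selection, post-selection)
counts = {"A12V": (500, 1500), "G45D": (800, 20)}
total_pre = total_post = 100_000

for variant, (pre, post) in counts.items():
    f = dms_fitness(pre, post, total_pre, total_post)
    label = "enriched" if f > 0 else "depleted"
    print(f"{variant}: F = {f:+.2f} ({label})")
```

Published pipelines often also subtract the wild-type score so that F = 0 corresponds to wild-type fitness; that normalization is left out here to stay close to the simplified formula.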


Guide: Comparing Protein Language Model Predictions with Experimental Data

To complete the “Prediction vs. Experiment” comparison task, the workflow is generally divided into three core stages: Data Preparation, AI Inference, and Statistical Analysis.


Step 1: Data Collection (Obtaining the “Ground Truth”)

You need a dataset containing mutations and their corresponding experimental scores.

  • Access Databases: Visit ProteinGym or MaveDB.
  • Download Data: Search for Deep Mutational Scanning (DMS) data in CSV format.
  • Identify Key Columns: Ensure the table includes at least these two columns:
    • mutant: Mutation information (e.g., A12V indicates Alanine at position 12 mutated to Valine).
    • DMS_score: The functional score measured experimentally.

Step 2: Model Setup (Configuring the Protein Language Model)

You need an AI model capable of scoring sequences.

  • Select Model: Meta’s open-source ESM-2 (e.g., esm2_t33_650M_UR50D) is recommended.
  • Environment Setup:
    pip install fair-esm

Step 3: Inference/Scoring (Generating AI Predictions)

This is the critical technical step to calculate the AI’s “preference” for specific mutations.

  • Calculate Log-Likelihood Ratio:
    1. Input the Wild-type (WT) sequence into the model to obtain the probability distribution of amino acids at each position.
    2. Extract the probability of the mutated amino acid at that position, $P_{mut}$.
    3. Extract the probability of the original (wild-type) amino acid at that position, $P_{wt}$.
    4. Compute Score: $S = \log(P_{mut}) - \log(P_{wt})$.
  • Save Results: Append the AI-calculated scores to your dataset as a new column named prediction_score.
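The scoring rule in Step 3 can be illustrated without loading a model. In the sketch below, `probs` is a hypothetical table of per-position probabilities standing in for what the language model would return for the wild-type sequence; the toy sequence and all numbers are invented for illustration.

```python
import math

wt_sequence = "MAK"  # toy wild-type sequence

# Hypothetical model output: 1-based position -> amino acid -> probability.
probs = {
    1: {"M": 0.90, "L": 0.05, "V": 0.05},
    2: {"A": 0.60, "V": 0.30, "G": 0.10},
    3: {"K": 0.70, "R": 0.25, "E": 0.05},
}

def llr_score(mutant):
    """Return S = log(P_mut) - log(P_wt) for a mutation such as 'A2V'."""
    wt, pos, mut = mutant[0], int(mutant[1:-1]), mutant[-1]
    if wt_sequence[pos - 1] != wt:
        raise ValueError(f"Wild-type residue at {pos} is {wt_sequence[pos - 1]}, not {wt}")
    return math.log(probs[pos][mut]) - math.log(probs[pos][wt])

print(round(llr_score("A2V"), 3))  # log(0.30 / 0.60) = log(0.5), about -0.693
```

A negative score means the model considers the mutant residue less likely than the wild-type residue at that position, which is read as a predicted loss of fitness.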

Step 4: Comparison & Analysis (Evaluation)

Use mathematical methods to assess the accuracy of AI predictions.

  • Calculate Correlation (Spearman Correlation):
    • Use Python’s scipy library to calculate the Spearman Rank Correlation Coefficient between DMS_score and prediction_score.
    • Interpretation: A coefficient close to 1 means the model ranks mutations in nearly the same order as the experiment; a value near 0 means its ranking carries no information about the measured fitness.
  • Visualization:
    • Create a Scatter Plot: X-axis = Experimental Score, Y-axis = AI Prediction Score.
    • A diagonal distribution of points indicates a successful correlation.
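The rank correlation can be sketched in pure Python to show what is being computed; in practice `scipy.stats.spearmanr` returns the same coefficient together with a p-value. The two score lists below are invented for illustration, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

dms_score = [0.1, 0.9, 0.4, 1.5, 0.7]              # hypothetical experiment
prediction_score = [-3.2, -0.5, -2.0, -0.1, -1.1]  # hypothetical model output
print(f"Spearman rho = {spearman(dms_score, prediction_score):.2f}")
```

Because only ranks matter, the two score columns do not need to be on the same scale, which is why Spearman is preferred over Pearson for comparing log-likelihood ratios with experimental fitness values.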

Step 5: Bonus Report (Analysis & Interpretation)

Finally, interpret the comparison results:

  • High Accuracy Cases: Highly conserved enzymes usually yield better predictions.
  • Low Accuracy Cases: For instance, mutations on the protein surface might be deemed “low impact” by the AI due to evolutionary frequency, but experiments might show they significantly affect binding.

💡 Pro-Tip: Quick Start

If you want to start immediately without running the models yourself:

  1. Go directly to the ProteinGym website and download their pre-compiled Reference_Scores.csv.
  2. This file already provides Experimental Scores alongside Prediction Scores from various models (e.g., ESM-1v, Tranception, EVE).
  3. You only need to use Python for data pivoting and plotting to complete this Bonus task.

Data Acquisition: ProteinGym DMS Substitutions

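The entries below are the ProteinGym reference-file rows (tab-separated) for the two substitution assays considered here, GFP (Sarkisyan 2016) and NUDT15 (Suiter 2020); each row names the CSV file that holds the experimental mutation data: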

DMS_sub_67	GFP_AEQVI_Sarkisyan_2016	GFP_AEQVI_Sarkisyan_2016.csv	GFP_AEQVI	Eukaryote	Aequorea victoria	MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK	238	TRUE	51714	1084	50630	2.5	manual	Sarkisyan	Local fitness landscape of the green fluorescent protein	2016	10.1038/nature17995	3-237	GFP	Fluorescence	FACS	GFP_AEQVI_full_04-29-2022_b08.a2m	1	238	238	0.8	0.2	396	0.975	232	13.1	0.06	Low	0	0	GFP_AEQVI_Sarkisyan_2016.csv	mean_medianBrightness_per_aaseq	1	mutant	GFP_AEQVI_theta_0.2.npy	GFP_AEQVI.pdb	1-238	0.1		Activity


DMS_sub_106	NUD15_HUMAN_Suiter_2020	NUD15_HUMAN_Suiter_2020.csv	NUD15_HUMAN	Human	Homo sapiens	MTASAQPRGRRPGVGVGVVVTSCKHPRCVLLGKRKGSVGAGSFQLPGGHLEFGETWEECAQRETWEEAALHLKNVHFASVVNSFIEKENYHYVTILMKGEVDVTHDSEPKNVEPEKNESWEWVPWEELPPLDQLFWGLRCLKEQGYDPFKEDLNHLVGYKGNHL	164	FALSE	2844	2844	0	0.25	manual	Suiter	Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity	2020	10.1073/pnas.1915680117	2-164	NUDT15		VAMP-seq, drug sensitivity	NUD15_HUMAN_full_11-26-2021_b04.a2m	1	164	164	0.4	0.2	153922	0.72	118	46167.2	281.51	High	151	1.28	NUD15_HUMAN_Suiter_2020.csv	Final NUDT15 activity Score	1	mutant	NUD15_HUMAN_theta_0.2.npy	NUD15_HUMAN.pdb	1-164	0.1		Expression

🧬 Part D: Phage MS2 L Protein Optimization

1. Selected Goals

  • Primary: Increased Stability (Enhancing protein persistence).
  • Secondary: Disrupt Interaction with E. coli DnaJ (Modulating lysis toxicity).

2. Proposed Computational Pipeline

  • Step 1: ESM-2 for in silico Mutagenesis (Identifying stabilizing mutations).
  • Step 2: AlphaFold 3 for Structural Folding (Verifying fold integrity).
  • Step 3: AlphaFold-Multimer for Complex Modeling (Mapping the DnaJ interface).
  • Step 4: Rosetta for Binding Affinity Estimation (Designing disruptive mutants).

3. Why These Tools?

  • ESM-2 (PLM): Enables fast, zero-shot prediction of mutational effects based on evolutionary patterns.
  • AlphaFold-Multimer: Provides high-accuracy prediction of protein-protein interaction (PPI) interfaces.

4. Potential Pitfalls

  • Data Bias: Lack of phage-specific data in training sets may lead to lower prediction accuracy.
  • Functional Trade-off: Increased stability might reduce protein flexibility required for lysis activity.

📊 Pipeline Schematic

graph TD
    A[Wild-type L Protein] --> B(ESM-2 Mutation Scanning)
    B --> C{Top Stable Candidates}
    C --> D[AlphaFold 3: Folding Check]
    C --> E[AF-Multimer: DnaJ Complex]
    D --> F[Final Optimized Design]
    E --> G[Identify Interaction Hotspots]
    G --> H[Design Disruptive Mutants]
    H --> F

Week 5 HW: hw-protein-design-part-ii

🐉 Part A: SOD1 Binder Peptide Design (From Pranam)

Background: Superoxide dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that converts superoxide radicals into hydrogen peroxide and oxygen. In its native state, it forms a stable homodimer and binds copper and zinc.

Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). Among them, the A4V mutation (Alanine → Valine at residue 4) leads to one of the most aggressive forms of the disease. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.

Part 1: Generate Binders with PepMLM

1. Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.

>sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

>sp|P00441|SOD1_A4V (Clinical A4V Mutation, Internal Position 5)
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Position:   1  2  3  4  5  6  7  8  9  10
-----------------------------------------
WT :        M  A  T  K  A  V  C  V  L  K ...
A4V:        M  A  T  K  V  V  C  V  L  K ...
                        ^
                     [Mutant]
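
The numbering shift shown above (clinical position 4 is index 5 once the initiator Met is counted) is easy to get wrong by hand, so here is a minimal sketch of applying the mutation programmatically. The helper name `apply_mutation` and its `met_offset` parameter are illustrative choices, not part of any tool used in this report.

```python
# Wild-type human SOD1 (UniProt P00441), including the initiator Met.
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

def apply_mutation(seq, mutation, met_offset=1):
    """Apply a point mutation written like 'A4V'.

    met_offset=1 assumes the clinical numbering skips the initiator Met,
    so clinical position 4 is the 5th residue of the full sequence.
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1 + met_offset  # 0-based index into seq
    if seq[idx] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]

SOD1_A4V = apply_mutation(SOD1_WT, "A4V")
print(SOD1_WT[:10])
print(SOD1_A4V[:10])
```

The wild-type check guards against an off-by-one error: if the offset is wrong, the residue found at the target index will not match the expected alanine and the function raises instead of silently mutating the wrong site.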

2-3. Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

| Binder Sequence | Pseudo Perplexity (PPL) |
|---|---|
| `WHYPPAGAAHGX`  | `8.107327` |

Peptide Sequence Comparison Report

🧬 Reference Sequence

ref = "FLYRWLPSRRGG"

generated = [
    "WHYPPAGAAHGX"
]

def identity(seq1, seq2):
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return matches / len(seq1)

def mutations(seq1, seq2):
    return [(i+1, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]

results = []

for seq in generated:
    id_score = identity(ref, seq)
    muts = mutations(ref, seq)
    
    results.append({
        "seq": seq,
        "identity": id_score,
        "mutations": muts
    })

for r in results:
    print(f"\nSequence: {r['seq']}")
    print(f"Identity: {r['identity']:.2f}")
    print(f"Mutations: {r['mutations']}")

Output:

Sequence: WHYPPAGAAHGX
Identity: 0.17
Mutations: [(1, 'F', 'W'), (2, 'L', 'H'), (4, 'R', 'P'), (5, 'W', 'P'), (6, 'L', 'A'), (7, 'P', 'G'), (8, 'S', 'A'), (9, 'R', 'A'), (10, 'R', 'H'), (12, 'G', 'X')]

Positions 3 (Y) and 11 (G) are identical in both sequences, so they do not appear in the mutation list and the identity is 2/12.

🧬 PepMLM Peptide Perplexity Analysis

⚙️ Device Setup

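import torch

# Assumes `tokenizer` and `model` (the PepMLM masked language model) were
# loaded in an earlier cell; the loading step is not shown in this report.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

def compute_perplexity(seq, tokenizer, model):
    """Return exp(mean cross-entropy) of the model on the given sequence."""
    inputs = tokenizer(seq, return_tensors="pt")

    # Move inputs to the same device as the model
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        # The labels are the (unmasked) inputs, so the loss is averaged
        # over all tokens of the sequence
        outputs = model(**inputs, labels=inputs["input_ids"])
        loss = outputs.loss
        ppl = torch.exp(loss)

    return ppl.item()

# Module-level code: these lines must not be indented under the function,
# otherwise they sit after `return` and never run
ref = "FLYRWLPSRRGG"
gen = "WHYPPAGAAHGX"

ppl_ref = compute_perplexity(ref, tokenizer, model)
ppl_gen = compute_perplexity(gen, tokenizer, model)

print(f"Reference PPL: {ppl_ref:.4f}")
print(f"Generated PPL: {ppl_gen:.4f}")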

🧬 PepMLM Perplexity Analysis Report


📊 Results Summary

| Sequence Type | Pseudo-Perplexity (PPL) |
|---|---|
| Reference (SOD1-binding peptide) | 2.7808 |
| Generated peptide | 3.5465 |

🧠 Interpretation

The generated peptide shows a higher pseudo-perplexity compared to the reference SOD1-binding peptide.

This indicates that:

  • It has a lower probability under the PepMLM protein language model
  • It deviates more from learned natural peptide patterns
  • It is less consistent with native-like sequence motifs
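
For reference, the pseudo-perplexity of a masked language model is commonly written as below. The symbols $L$ (sequence length), $x_i$ (residue at position $i$) and $x_{\setminus i}$ (the sequence with position $i$ masked) are notation introduced here. This is the usual textbook definition, not a statement of exactly what the snippet above computes, since that snippet passes the unmasked sequence as its own labels.

```latex
\mathrm{PPL}(x) = \exp\!\left( -\frac{1}{L} \sum_{i=1}^{L} \log P\left(x_i \mid x_{\setminus i}\right) \right)
```

A lower value means the model assigns higher probability to each residue given its context, which is why the higher score of the generated peptide is read as a lower model likelihood.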

🔬 Biological implication

These results suggest:

  • Reduced structural plausibility
  • Potential loss of native-like binding characteristics
  • Increased deviation from known SOD1-binding sequence features


Fig. 1 Protein-peptide complex based on AlphaFold Server

“The A4V mutation lies near the core of the SOD1 protein’s β-barrel, leading to structural instability and a higher propensity for the dimer to dissociate (or ‘fall apart’). In the predicted complex, the peptide binds in the vicinity of this A4V mutation site.”


Fig. 2 Binding site of protein and short peptide

“The observed pTM of 0.85 indicates that AlphaFold has very high confidence in the overall topological conformation of the protein monomer, suggesting a highly reliable structure. However, the ipTM of 0.39 is relatively low (with 0.5 typically serving as the threshold), which implies that the model lacks certainty regarding the relative positioning and interaction interface between the protein and the short peptide. This suggests that the binding conformation may be unstable or that multiple potential binding modes exist.”


WHYPPAGAAHGX shows a good balance between binding capacity and developability:

Exceptional Safety and Solubility: With a predicted solubility probability of 1.000 and a hemolysis probability as low as 0.017, this peptide presents few hurdles in terms of experimental handling and biocompatibility.

Modest Affinity: Its predicted $pK_d$ of 5.350 falls in the “weak binding” (micromolar) range, but as a short 12-amino acid peptide with no significant net charge it offers a clean and highly adaptable scaffold.

Conclusion: By trading off a degree of affinity, this peptide achieves optimal physicochemical stability. Compared to sequences that exhibit “strong binding but poor solubility,” it stands out as a candidate with superior druggability potential.

Selection and Evaluation of a Stochastic Peptide Sequence

Which one is best?

| Binder Sequence | Pseudo Perplexity (PPL) |
|---|---|
| `WRSYVVAVELGE`  | `18.629115501481596` |

Peptide Comparative Analysis: WRSYVVAVELGE vs. WHYPPAGAAHGX

📝 Executive Summary

While WRSYVVAVELGE shows slightly higher binding affinity, WHYPPAGAAHGX presents a significantly more favorable "druggability" profile, particularly regarding membrane permeability, non-fouling characteristics, and solubility in physiological conditions.

🔍 Comparative Analysis

1. Binding & Potency

  • WRSYVVAVELGE: Demonstrates a higher binding affinity ($pK_d/pK_i$ of 5.814).
  • WHYPPAGAAHGX: Shows slightly weaker binding (5.350).

Takeaway: The difference in affinity is marginal ($\approx 0.46$ log units). In many lead-optimization contexts, this slight loss in potency is a reasonable trade-off for the improved ADME properties seen in the second peptide.
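
To make the "0.46 log units" concrete, the sketch below converts the two predicted $pK_d$ values into dissociation constants using $K_d = 10^{-pK_d}$ (in molar). The function name is illustrative. The inputs are the predicted scores quoted above, so the outputs are only as reliable as those predictions.

```python
def pkd_to_kd_molar(pkd):
    """Convert pKd (= -log10 of Kd in molar) back to Kd in molar."""
    return 10 ** (-pkd)

kd_wrs = pkd_to_kd_molar(5.814)  # WRSYVVAVELGE
kd_why = pkd_to_kd_molar(5.350)  # WHYPPAGAAHGX
fold = kd_why / kd_wrs           # how many times weaker the second binder is

print(f"WRSYVVAVELGE Kd ~ {kd_wrs * 1e6:.2f} uM")
print(f"WHYPPAGAAHGX Kd ~ {kd_why * 1e6:.2f} uM")
print(f"Fold difference ~ {fold:.1f}x")
```

Both peptides land in the low-micromolar range (roughly 1.5 µM versus 4.5 µM), a difference of about threefold, which is consistent with calling the gap marginal.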

2. ADME & Pharmacokinetics

| Feature | WRSYVVAVELGE | WHYPPAGAAHGX | Winner |
|---|---|---|---|
| Permeability | 0.030 (Low) | 0.571 (Moderate) | 🏆 WHYPPAGAAHGX |
| Half-Life | 0.493 hrs | 0.618 hrs | 🏆 WHYPPAGAAHGX |
| Fouling | Fouling (0.467) | Non-fouling (0.671) | 🏆 WHYPPAGAAHGX |
| Hemolysis | 0.112 (Low) | 0.017 (Negligible) | 🏆 WHYPPAGAAHGX |

  • Permeability: The most striking difference. WHYPPAGAAHGX has a $\approx 19\times$ higher probability of penetrance, making it a much better candidate for intracellular targets or oral bioavailability.
  • Stability: WHYPPAGAAHGX offers a longer predicted half-life and better resistance to non-specific protein adsorption (non-fouling).

3. Physicochemical Properties

  • Hydrophobicity (GRAVY): WRSYVVAVELGE is slightly hydrophobic (0.28), which correlates with its lower solubility in complex environments and fouling tendency. WHYPPAGAAHGX is distinctly hydrophilic (-0.60), explaining its excellent solubility profile.
  • Isoelectric Point (pI) & Charge: WRSYVVAVELGE (pI 4.86) carries a stronger negative charge at pH 7 (-1.23). WHYPPAGAAHGX (pI 6.92) is nearly neutral (-0.07) at physiological pH. This neutrality often aids in crossing lipid bilayers, supporting its higher permeability score.
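
The GRAVY values quoted above can be checked by hand: GRAVY is the mean Kyte-Doolittle hydropathy over all residues. The sketch below is a minimal re-implementation, not the tool that produced the report. One assumption is labelled in the code: the unknown residue `X` is scored as 0 and still counted in the length. How the original tool handled `X` is not stated.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq, unknown=0.0):
    """Mean hydropathy; residues outside the scale (e.g. X) score `unknown`.

    ASSUMPTION: unknown residues contribute 0 and count toward the length.
    """
    return sum(KYTE_DOOLITTLE.get(aa, unknown) for aa in seq) / len(seq)

for peptide in ("WRSYVVAVELGE", "WHYPPAGAAHGX"):
    print(f"{peptide}: GRAVY = {gravy(peptide):.2f}")
```

Under that assumption the two peptides come out at about 0.28 and -0.60, which agrees with the figures in the comparison.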


🎯 Conclusion

WHYPPAGAAHGX is the superior scaffold for further development.

Although WRSYVVAVELGE has a stronger initial binding score, its poor permeability and fouling tendencies represent significant “developability” hurdles. WHYPPAGAAHGX strikes a much better balance: it remains highly soluble and non-hemolytic while providing the necessary penetrance to function effectively in a biological system.

Week 6 HW: hw-genetic-circuits-part-i

Week 6 — Genetic Circuits I: DNA Assembly Technologies

Molecular Biology Lab Report: PCR & Assembly Techniques

  1. Components of Phusion High-Fidelity PCR Master Mix

     Phusion Master Mix is a convenient 2X concentrated solution containing:

       • Phusion DNA Polymerase: A Pyrococcus-like enzyme fused with a processivity-enhancing domain. It provides extremely high fidelity ($50\times$ higher than Taq) and speed.
       • dNTPs: The building blocks ($dATP, dTTP, dCTP, dGTP$) for the new DNA strand.
       • Reaction Buffer: Maintains optimal pH and provides ionic strength.
       • MgCl2: A necessary cofactor for polymerase activity.

  2. Factors Determining Primer Melting Temperature ($T_m$) and Annealing Temperature ($T_a$)

     The optimal annealing temperature ($T_a$) is typically $3–5^\circ C$ below the $T_m$ of the primers. $T_m$ is determined by:

       • Base Composition: The ratio of G-C pairs (3 hydrogen bonds) to A-T pairs (2 hydrogen bonds). Higher GC content increases $T_m$.
       • Primer Length: Longer primers generally have higher melting temperatures.
       • Salt Concentration: Monovalent cations ($Na^+$) and divalent cations ($Mg^{2+}$) stabilize the DNA duplex, raising $T_m$.
       • Mismatches: Any base pair mismatch significantly lowers duplex stability and $T_m$.

  3. PCR vs. Restriction Enzyme Digests

     Both methods generate linear DNA fragments, but they differ significantly in application and protocol.

| Feature | PCR (Polymerase Chain Reaction) | Restriction Enzyme Digest |
|---|---|---|
| Mechanism | De novo synthesis using a DNA polymerase and specific primers. | Physical “cutting” of existing DNA at specific recognition sequences. |
| Output | Exponentially amplified copies of a specific target region. | Fragments produced from a limited amount of template (no gain in mass). |
| Speed/Efficiency | Highly efficient; can create billions of copies from a tiny sample. | Limited by the amount of starting material; throughput depends on substrate mass. |
| Specificity | High; defined by custom primer sequence design. | Fixed; defined by natural recognition sites (e.g., GAATTC for EcoRI). |

Key Differences Summary

  1. Amplification vs. Fragmentation: PCR is a constructive process that increases the total amount of DNA. Restriction digestion is a deconstructive/analytical process that breaks down existing DNA into smaller pieces.
  2. Flexibility: PCR allows researchers to target almost any sequence by designing new primers. Restriction digestion is constrained by the presence of specific enzyme motifs (palindromes) within the DNA sequence.
  3. Sensitivity: PCR can detect and amplify DNA from single cells or degraded samples, whereas Restriction Enzyme Digests usually require microgram quantities of high-quality DNA for visualization on a gel.
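
The base-composition and length effects on $T_m$ listed in item 2 above can be illustrated with the Wallace rule, a rough estimate for short oligos: $T_m \approx 2(A+T) + 4(G+C)$ in °C. This sketch ignores salt and mismatch effects, and the primer sequence is hypothetical. Dedicated $T_m$ calculators, including the one recommended for Phusion, use more detailed models and may suggest different annealing temperatures.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def annealing_range(tm):
    """Annealing window 3 to 5 degrees below Tm, as stated in the text."""
    return tm - 5, tm - 3

primer = "ATGCATGCATGCATGCAT"  # hypothetical 18-nt primer
tm = wallace_tm(primer)
low, high = annealing_range(tm)
print(f"GC = {gc_fraction(primer):.0%}, Tm ~ {tm} C, Ta ~ {low}-{high} C")
```

Each G or C adds twice as much as an A or T in this rule, which is the arithmetic form of the statement that higher GC content and greater length both raise $T_m$.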

Assignment: Asimov Kernel

Asimov Kernel Assignment: Synthetic Genetic Circuit Design & Simulation

  • Author: Siwei Zhang
  • Repository: [Link to your created repository]
  • Date: May 2026

1. Environment Setup & Baseline Validation

Notebook & Repository Verification

  • A dedicated assignment repository has been initialized.
  • This blank notebook entry has been established to capture the step-by-step engineering lineage of the bacterial constructs.

Recreating the Repressilator Circuit

To validate the simulator’s kinetics and gain familiarity with the drag-and-drop functional interface, the classic Elowitz Repressilator circuit was recreated from scratch using parts from the Characterized Bacterial Parts Repository.

  • Design Architecture: A 3-gene loop network utilizing three sequential transcriptional repressors (TetR, cI, and LacI), where each repressor inhibits the transcription of the next gene in the loop.
  • Workflow: Parts were sourced using the right-side Search panel and assembled into a blank Construct.
  • Simulation Verification: Running the simulator yielded a sustained, out-of-phase oscillatory profile for all three protein products, matching the baseline template in the Bacterial Demos Repository.
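
As an independent cross-check on the oscillations described above, the repressilator can be written as a small set of ODEs and integrated directly. This is a toy model, not the Asimov Kernel simulator: it uses the standard dimensionless form (mRNA $m_i$, protein $p_i$, Hill repression), and the parameter values `alpha`, `alpha0`, `beta`, `n` and the initial conditions are illustrative assumptions.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=5.0, n=2.0,
                           dt=0.01, t_end=200.0):
    """Forward-Euler integration of a dimensionless repressilator model.

    Index 0 = TetR, 1 = cI, 2 = LacI; gene i is repressed by protein i-1,
    so each repressor inhibits the next gene in the loop.
    Returns (times, proteins), where proteins[i] is the trace of protein i.
    """
    m = [1.0, 0.0, 0.0]  # mRNA levels (asymmetric start breaks the symmetry)
    p = [2.0, 1.0, 3.0]  # protein levels
    times, proteins = [], ([], [], [])
    for step in range(int(round(t_end / dt))):
        times.append(step * dt)
        for i in range(3):
            proteins[i].append(p[i])
        # mRNA: decay + repressible production + leaky production
        dm = [-m[i] + alpha / (1.0 + p[(i - 1) % 3] ** n) + alpha0
              for i in range(3)]
        # protein: relaxes toward its mRNA level at relative rate beta
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
    return times, proteins

times, proteins = simulate_repressilator()
half = len(times) // 2
for name, trace in zip(("TetR", "cI", "LacI"), proteins):
    late = trace[half:]
    print(f"{name}: late-time min {min(late):.1f}, max {max(late):.1f}")
```

A large gap between the late-time minimum and maximum of each protein indicates that the oscillation is sustained, not damped. Lowering `alpha` (weaker promoters) is one way to explore when the model settles to a steady state instead.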

(Insert your captured Repressilator Glyph Image and Simulator Graphs here)


2. Custom Construct Designs, Hypotheses, and Simulations

Below are three custom genetic constructs engineered using the Characterized Bacterial Parts Repository to investigate feedback loops, logic gates, and metabolic pacing.

Construct 1: The Coordinated Toggle Switch with Reporter

Design & Parts Layout

  • Promoter 1: Constitutive/Inducible Promoter controlling Gene A (Repressor X).
  • Promoter 2: Promoter regulated by Repressor X, controlling Gene B (Repressor Y) and a downstream sfGFP Reporter.

Functional Hypothesis

This circuit is designed to function as a forward repression cascade. As laid out, Repressor Y does not feed back onto Promoter 1, so the circuit would act as a mutual-inhibition toggle only if Promoter 1 were also made repressible by Repressor Y. When the primary signal is absent, Promoter 2 is uninhibited, allowing steady-state expression of Repressor Y and the sfGFP reporter. Upon induction of Gene A, the accumulation of Repressor X should sharply shut down Promoter 2, leading to a visual decay/dilution of the sfGFP signal over time.

Simulation Results & Analysis

  • Observed Dynamics: (Describe what the simulator graph showed when you pressed play)
  • Troubleshooting & Adjustments: (If the switch was too “leaky” or failed to flip, note how you adjusted the promoter strengths, degradation rates, or initial molecular concentrations in the simulator settings to achieve stable state switching).

Construct 2: Feedback-Stabilized Homeostatic Loop

Design & Parts Layout

  • Promoter 1: Inducible Promoter driving an activator or essential metabolic enzyme downstream.
  • Promoter 2: Activated downstream promoter driving a high-affinity repressor that feeds back onto the primary promoter.

Functional Hypothesis

Instead of generating unconstrained oscillations, this construct is engineered to act as a self-limiting homeostatic governor. Upon primary induction, protein expression should surge rapidly. However, as the downstream product accumulates, it activates its own local repressor, capping the maximum expression ceiling and stabilizing the protein concentrations into a tight, non-toxic steady state.

Simulation Results & Analysis

  • Observed Dynamics: (Detail the shape of the graph—e.g., did it reach a flat plateau, or did it exhibit dampening oscillations before stabilizing?)
  • Troubleshooting & Adjustments: (If it oscillated wildly instead of stabilizing, discuss how tuning the translation efficiency or transcript half-lives in the simulator parameter window helped smooth out the response curve).

Construct 3: Multi-Input Synthetic Coherent Feed-Forward Loop (FFL)

Design & Parts Layout

  • Input Node: Constitutive promoter driving Transcription Factor X.
  • Intermediate Node: Promoter activated by Factor X, driving Transcription Factor Y.
  • Output Node: A complex/dual-input promoter requiring both Factor X and Factor Y to drive a mScarlet_I Red Fluorescent Reporter.

Functional Hypothesis

This simulates a coherent feed-forward loop acting as a sign-sensitive delay element. When Transcription Factor X is turned on, it immediately begins building up, but the output reporter should not express right away because it requires Factor Y. There should be a distinct kinetic lag period while Factor Y slowly accumulates to its activation threshold. This protects the system from responding to brief, accidental noise spikes.
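
The delay and noise-filtering behavior hypothesized above can be illustrated with a minimal model of a coherent feed-forward loop with AND logic. This is a sketch under simplifying assumptions, not the Asimov Kernel simulation: X is treated as an on/off pulse, promoter activation is a sharp threshold, and all rate constants (`a`, `b_y`, `b_z`, `k_y`) are hypothetical. Under these assumptions the expected delay is $T_{on} = \frac{1}{a}\ln\frac{1}{1 - k_y a / b_y}$.

```python
import math

def analytic_delay(a=1.0, b_y=1.0, k_y=0.5):
    """Time for Y to climb from 0 to threshold k_y while X is on."""
    return (1.0 / a) * math.log(1.0 / (1.0 - k_y * a / b_y))

def simulate_ffl(pulse_duration, a=1.0, b_y=1.0, b_z=1.0, k_y=0.5,
                 dt=0.001, t_end=10.0):
    """Euler simulation; returns (time the reporter Z switches on, max Z)."""
    y = z = 0.0
    z_on_time = None
    z_max = 0.0
    for step in range(int(round(t_end / dt))):
        t = step * dt
        x_on = t < pulse_duration   # X is active only during the pulse
        z_active = x_on and y > k_y  # AND gate: needs X and enough Y
        if z_active and z_on_time is None:
            z_on_time = t
        y += dt * ((b_y if x_on else 0.0) - a * y)
        z += dt * ((b_z if z_active else 0.0) - a * z)
        z_max = max(z_max, z)
    return z_on_time, z_max

delay, z_max = simulate_ffl(pulse_duration=5.0)
print(f"long pulse: reporter on at t = {delay:.3f} "
      f"(analytic {analytic_delay():.3f})")
short_delay, short_max = simulate_ffl(pulse_duration=0.3)
print(f"short pulse: reporter on at {short_delay}, max Z = {short_max}")
```

A pulse shorter than the delay never lets Y reach its threshold, so the reporter stays off. That is the sign-sensitive filtering described in the hypothesis.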

Simulation Results & Analysis

  • Observed Dynamics: (Confirm if you observed the expected time delay before the mScarlet_I signal began its exponential rise)
  • Troubleshooting & Adjustments: (If the reporter turned on instantly, discuss how lowering the binding affinity of Factor X at the output node or increasing the degradation rate of the intermediate transcript restored the signal-filtering lag).

3. Methodological Documentation & Artifacts

Use the space below to paste your visual assets gathered during the simulation runs to complete the assignment verification.

Construct 1 Artifacts

  • Glyph Layout: (Paste Glyph)
  • Simulation Trace: (Paste Graph)

Construct 2 Artifacts

  • Glyph Layout: (Paste Glyph)
  • Simulation Trace: (Paste Graph)

Construct 3 Artifacts

  • Glyph Layout: (Paste Glyph)
  • Simulation Trace: (Paste Graph)

Week 7 HW: hw-genetic-circuits-part-ii

Week 7 — Genetic Circuits Part II: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

The shift from Boolean genetic circuits to Intracellular Artificial Neural Networks (IANNs) represents a move from simple digital logic to complex, analog, and adaptive biological computing.

Compared to traditional Boolean genetic circuits, such as standard AND, OR, or NOT gates, IANNs offer distinct advantages for processing complex biological inputs. The primary limitation of Boolean logic lies in its “all-or-nothing” binary thresholding, which often results in significant information loss when dealing with environmental concentration gradients like toxin levels or nutrient density. In contrast, IANNs inherently support analog signal processing, enabling a continuous, graded response. By fine-tuning promoter strength or RBS efficiency, IANNs can assign specific weights to various environmental inputs ($X_1, X_2…$), empowering cells to prioritize certain signals over others during decision-making.

This architecture significantly alleviates the metabolic burden on the host; while a multi-input Boolean gate requires a large library of orthogonal transcription factors, the IANN framework allows multiple inputs to efficiently “fan-in” to a single regulatory node. Furthermore, by leveraging the Hill function—which serves as a biological equivalent to the sigmoidal activation functions in neural networks—IANNs effectively filter out molecular noise. This prevents the “flickering” issues common near the thresholds of Boolean circuits, substantially enhancing the robustness of genetic circuits within the complex intracellular environment.

“Smart” Precision Oncology

A compelling application for an IANN is a Selective Cancer-Cell Classifier.

Goal: An intracellular IANN that triggers “cell suicide” (apoptosis) only if a specific combination of microRNA (miRNA) biomarkers—unique to a specific cancer subtype—is detected.

Input/Output Behavior:

  • Inputs ($X_n$): The inputs are the intracellular concentrations of 4–6 different miRNA biomarkers ($miRNA_{1…6}$). Some are “high” in cancer (positive weights), and some are “low” in cancer but “high” in healthy cells (negative weights/inhibitors).
  • Processing: The IANN functions as a perceptron. It calculates the weighted sum of these miRNAs.
      • Positive Weights: miRNAs that trigger the expression of a pro-apoptotic protein (e.g., BAX).
      • Negative Weights: miRNAs that trigger an interfering “decoy” or inhibitor (e.g., an antisense RNA or Csy4 to degrade the BAX mRNA).
  • Output ($Y$): If the weighted sum exceeds a specific threshold (the “activation potential”), the cell produces enough pro-apoptotic protein to trigger programmed cell death. If the cell is healthy, the sum remains below the threshold, and the cell lives.
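
The input/output behavior above maps directly onto a one-neuron model: a weighted sum of miRNA levels passed through a Hill-type activation. The sketch below is purely illustrative. The weights, miRNA levels, threshold `k` and Hill coefficient `n` are made-up numbers chosen to show the logic, not measured values.

```python
def hill(x, k, n):
    """Hill activation: sigmoidal response of output to net drive x >= 0."""
    return x ** n / (k ** n + x ** n)

def classifier_output(mirna_levels, weights, k=1.5, n=4.0):
    """Fraction of maximal pro-apoptotic output for a given miRNA profile."""
    weighted_sum = sum(w * x for w, x in zip(weights, mirna_levels))
    drive = max(weighted_sum, 0.0)  # net inhibition cannot go below "off"
    return hill(drive, k, n)

# Hypothetical weights: two cancer-high markers (+), two healthy-high (-)
weights = [1.0, 1.0, -1.0, -1.0]
cancer_profile = [2.0, 1.5, 0.1, 0.2]   # hypothetical miRNA levels
healthy_profile = [0.2, 0.1, 2.0, 1.8]  # hypothetical miRNA levels

print(f"cancer-like cell:  {classifier_output(cancer_profile, weights):.2f}")
print(f"healthy-like cell: {classifier_output(healthy_profile, weights):.2f}")
```

In this toy setting the cancer-like profile drives the output close to its maximum while the healthy-like profile leaves it off. A steeper Hill coefficient sharpens the separation between the two cases.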

Limitations and Challenges:

  • Metabolic Burden: Expressing the components of the IANN (the synthetic receptors, the processing RNAs, and the output proteins) consumes significant cellular energy (ATP and ribosomes), which might slow down the cell or lead to the circuit being “evolved out” (mutated) over time.
  • Threshold “Leakiness”: Biological systems are rarely 100% “off.” Even in healthy cells, the IANN might produce trace amounts of the output protein. If the output is a potent toxin or cell-death trigger, even a tiny amount of “leakage” could kill healthy cells.
  • Weight Precision: It is difficult to precisely “tune” biological weights. In a computer, a weight can be exactly $0.75$; in a cell, a “weight” depends on binding affinities ($K_d$) and protein decay rates, which fluctuate with temperature and cellular health.

Analysis of the Csy4-Regulated Perceptron

In the Csy4/fluorescent-protein perceptron diagram:

  • X1 (Csy4 DNA): Acts as the Inhibitory Input. When $X1$ is transcribed and translated, the Csy4 endoribonuclease is produced.
  • X2 (Reporter DNA): Acts as the Excitatory Input. It produces the mRNA for the fluorescent protein.
  • Regulation (The “Logic”): The mRNA from $X2$ contains a specific RNA recognition site for Csy4. If Csy4 is present ($X1$ is HIGH), it cleaves the $X2$ mRNA, preventing translation of the fluorescent protein. If Csy4 is absent ($X1$ is LOW), the $X2$ mRNA remains intact and is translated into a fluorescent signal.

This setup functions as a Single-Layer Perceptron where the weight for $X1$ is negative. The final fluorescent output represents the state of the activation function after integrating the transcription/translation rates of both inputs.

Layer 1 (hidden layer)
----------------------
X1 DNA (encodes Csy4)            X2 DNA (encodes transcription factor)
      |                                |
      | transcription (Tx)             | transcription (Tx)
      v                                v
    mRNA1                            mRNA2
      |                                |
      | translation (Tl)               | translation (Tl)
      v                                v
 TF1 protein                      TF2 protein
      \                                /
       +--------------+---------------+
                      |
                      v
      bind promoter (endonuclease gene)
                      |
                      | transcription (Tx)
                      v
             mRNA (endonuclease)
                      |
                      | translation (Tl)
                      v
       endonuclease E (Layer 1 output) ----+
                                           |
Layer 2 (output layer)                     |
----------------------                     |
Y DNA (encodes fluorescent protein)        |
      |                                    |
      | transcription (Tx)                 |
      v                                    |
mRNA (fluorescent protein) <---------------+  cleaved by endonuclease E
      |
      v
remaining intact mRNA
      |
      | translation (Tl)
      v
fluorescent protein (output)

Brief working principle:
Layer 1: The two input DNAs (X1 and X2) are transcribed and translated to produce TF1 and TF2 proteins, respectively. These two proteins bind to the promoter and drive the expression of the endonuclease gene, ultimately outputting endonuclease E.
Layer 2: The input Y DNA is transcribed into fluorescent protein mRNA. This mRNA is cleaved (regulated) by endonuclease E output from Layer 1. The remaining intact mRNA is translated to produce the fluorescent protein output.

Assignment Part 2: Fungal Materials

Existing fungal materials: examples, applications, advantages and disadvantages

Mycelium-based composites (construction, packaging, furniture)

Mycelium-based composites (MBCs) are produced by growing fungal mycelium on agricultural substrates such as straw, wood chips, sawdust, or other lignocellulosic waste streams. After colonisation, the material can be heat-treated to stop fungal growth and processed into solid forms. These composites have emerged as sustainable alternatives to synthetic foams (e.g., polystyrene), engineered wood products, and even some plastics.

Applications: Building blocks, insulation panels, facade panels, door cores, flooring, cabinetry, protective packaging (as a replacement for expanded polystyrene), furniture, and sculptures.

| Advantage | Disadvantage |
|---|---|
| Low energy input for production: fungi grow on low-cost agricultural residues at ambient temperatures | Lower mechanical strength and load-bearing capacity than conventional timber or concrete; currently limited to temporary structures or non-structural applications |
| Fully biodegradable at end-of-life; no persistent microplastic pollution | Scalability challenges: producing large amounts of uniform material to industrial standards remains difficult |
| Superior fire resistance: mycelium composites exhibit low heat release, minimal smoke production, high char yield, and self-extinguishing properties compared to synthetic polymers like polystyrene | Sensitivity to moisture and water; requires treatment or coating for outdoor or high-humidity applications |
| Excellent acoustic absorption and low thermal conductivity (superior insulation performance compared to synthetic foams) | Currently higher unit cost at small production scales; cost competitiveness requires further scale-up |
| Carbon storage: the material locks in carbon from its plant-derived feedstock (the fungi themselves respire CO₂ and do not absorb it), whereas plastic manufacturing releases fossil CO₂ | Inconsistent material properties due to biological variability among strains and cultivation conditions |
| Uses agricultural waste as feedstock, promoting circular economy principles | Regulatory hurdles: construction materials must meet strict building codes, making certification lengthy |

Mushroom-derived leather (mycelium leather)

Mycelium can be processed into flexible, leather-like sheets that serve as alternatives to both animal leather and synthetic faux leather (e.g., polyurethane). Researchers have developed techniques to produce such materials using split gill mushroom (Schizophyllum commune), Talaromyces sp., Pleurotus albidus, and Lentinus velutinus. Post-processing treatments (e.g., glycerol for flexibility, polyethylene glycol for stiffness) can tune the final properties.

Applications: Handbags, wallets, footwear, watch straps, car seat upholstery, steering wheel covers, and fashion accessories.

| Advantage | Disadvantage |
|---|---|
| Animal-free and ethically produced: no livestock suffering | Tensile and tear strength can be ~50% lower than genuine leather, requiring coatings or reinforcement for high-wear applications |
| Compostable at end-of-life; avoids landfill accumulation | Requires post-processing (coatings, crosslinkers) to achieve desired properties, adding cost and complexity |
| Avoids toxic chromium tanning used in conventional leather production, preventing river contamination | Production processes not yet fully standardised; material consistency varies with strain and substrate |
| Breathable and lighter than traditional leather | May still require synthetic polymer coatings (e.g., PVC, PLA) to enhance water resistance and durability |
| Tunable properties: genetic variations among strains can be leveraged to alter flexibility, water resistance, thickness, and suppleness without synthetic engineering | Scaling production to fashion industry volumes remains a challenge |

Biotextiles and flexible mycomaterials

Flexible mycomaterials are thin, textile-like sheets produced directly from fungal mycelium. These materials can be grown on lignocellulosic waste without needing animal inputs or petroleum.

Applications: Clothing, upholstery, technical textiles, and fashion items.

| Advantage | Disadvantage |
|---|---|
| Reduces reliance on water-intensive cotton farming and petroleum-based synthetic fibres | Mechanical strength may require polymeric coatings (e.g., PVA) for certain applications |
| Complete biodegradability compared to polyester and nylon | Thermal stability below that of many synthetic textiles without coating |
| Low environmental footprint across production cycle | Industrial-scale manufacturing capacity not yet established |

Fungal biomass for food proteins (precision fermentation)

Yeast and filamentous fungi are used as cell factories for producing high-value proteins through precision fermentation. These systems can secrete correctly folded, functional proteins directly into the culture medium, avoiding costly cell lysis and purification steps.

Applications: Animal-free dairy proteins (casein, whey), egg proteins, collagen, and other food proteins for alternative protein products.

| Advantage | Disadvantage |
|---|---|
| Secretion capability: fungi export proteins into the culture medium, lowering downstream processing costs compared to bacterial intracellular production | High-volume, low-margin products require gram-per-litre titres to be cost-competitive with conventional agriculture |
| Eukaryotic post-translational modifications (glycosylation, disulfide bonds, etc.) essential for functional food proteins | Precision-fermented food proteins still advancing from pilot to routine industrial manufacture |
| Can be grown on low-cost media, keeping production costs relatively low | Consumer acceptance and regulatory approval for novel food proteins can be lengthy |
| Scalable in controlled bioreactors, enabling food production independent of agricultural land and weather | |

What might you want to genetically engineer fungi to do and why?

1. Produce pharmaceuticals and high-value therapeutic compounds
Fungi are natural producers of bioactive secondary metabolites including antibiotics, immunosuppressants, and anticancer agents. Engineering can dramatically boost yields. For example, engineered Aspergillus niger has been made to produce secondary metabolites at titres up to 4,500 mg/L — far exceeding natural production levels. Yeast has been engineered to produce rare anticancer saponins (e.g., polyphyllin II) normally extracted from endangered medicinal plants, providing a sustainable, controllable alternative to plant harvesting.

Why? To secure reliable supply of life-saving drugs independent of wild harvesting or chemical synthesis, reduce costs, and enable discovery of novel compounds through combinatorial biosynthesis of fungal enzymes and pathways.

2. Convert waste into biofuels, bioplastics, and industrial chemicals
Filamentous fungi can break down lignocellulosic biomass and waste streams into valuable products. The ligninolytic fungus Phanerochaete chrysosporium can generate biofuels, bioplastics, and pharmaceuticals from agricultural waste. Engineered strains show even greater degradation efficiency and product yields.

Why? To address the global waste crisis (an estimated 181.5 billion tonnes of lignocellulosic biomass generated annually) while creating economic value and reducing greenhouse gas emissions from waste burning and landfilling.

3. Bioremediate environmental pollutants
Fungi can be engineered to degrade a wide range of pollutants including heavy metals, synthetic polymers, dyes, pesticides, polycyclic aromatic hydrocarbons, and microplastics. Fungal mycelium networks act like "biological sponges" trapping contaminants, and fungal enzymes (e.g., laccases, manganese peroxidases) break down complex pollutants.

Why? To close circular economy loops by transforming hazardous waste into benign or recoverable resources, and to address the microplastics crisis — fungi can be "trained" to digest plastics that would otherwise persist for centuries.

4. Engineer climate-resilient agricultural solutions
Fungi can be engineered as biofertilisers, biocontrol agents, and carbon sequestration tools. Their natural ability to adapt to extreme conditions can be harnessed to reduce chemical inputs in agriculture.

Why? To reduce reliance on synthetic fertilisers and pesticides, enhance soil carbon storage, and develop sustainable agricultural practices resilient to climate change.

5. Create novel biomaterials with tailored properties
Genetic engineering can program fungi to produce materials with specific characteristics — strength, flexibility, water resistance, colour, or texture — by selecting and breeding strains or introducing foreign biosynthetic pathways.

Why? To replace petroleum-based plastics and animal-derived materials with tunable, biodegradable, low-carbon alternatives for packaging, construction, fashion, and automotive industries.

6. Manufacture industrial enzymes at high yields
Fungi are already used for large-scale enzyme production (e.g., cellulases, proteases, amylases). Engineering can enhance secretion efficiency, thermal stability, and substrate specificity.

Why? Enzymes are critical for countless industrial processes — food processing, detergent manufacturing, textile processing, paper production, and biofuel generation. Improved fungal enzyme factories increase efficiency and lower costs.

7. Produce novel foods and flavours
Engineered yeasts and fungi can produce specific flavour compounds, amino acids, and food ingredients at industrial scale. Precision-fermented food proteins are emerging as ethical alternatives to animal-derived dairy, egg, and meat proteins.

Why? To address the environmental impact of animal agriculture, enable food production independent of land and climate, and meet growing global protein demand sustainably.

What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

| Feature | Bacteria (E. coli) | Fungi (yeasts and filamentous fungi) |
| --- | --- | --- |
| Post-translational modifications (PTMs) | Very limited – generally lack eukaryotic PTM machinery (glycosylation, phosphorylation, acetylation, disulfide bond formation) | Extensive PTM capabilities – can perform complex eukaryotic modifications essential for functional protein production |
| Secretion capability | Export systems less developed; many products remain intracellular, requiring costly cell lysis | Naturally high-level secretors – many filamentous fungi evolved powerful secretion systems as decomposers, exporting enzymes directly into the medium |
| Protein folding | Limited ability to correctly fold complex eukaryotic proteins; prone to inclusion body formation | Proper folding machinery for disulfide bonds and multi-domain eukaryotic proteins; can express functional human/plant proteins |
| Natural product biosynthesis | Not natural producers of most complex secondary metabolites | Natural producers of antibiotics, immunosuppressants, anticancer agents, etc. – possess innate pre-mRNA splicing systems and abundant biosynthetic precursors |
| Substrate versatility | Narrow substrate range; typically requires refined sugars | Can utilise diverse low-cost feedstocks (lignocellulose, agricultural waste, food processing residues) |
| Growth and production cost | Very fast growth (minutes), extremely low media costs; ideal for simple proteins | Moderate growth rates (hours), but can be fermented on very low-cost agricultural byproducts, balancing cost |
| PTM complexity | Cannot perform human-type glycosylation; therapeutic proteins may be non-functional or immunogenic | Capable of humanised glycosylation pathways via glycoengineering, producing functional therapeutic proteins |
| Intracellular vs. extracellular | Products often accumulate intracellularly, requiring harsh lysis and complex purification steps | Products secreted into medium, simplifying downstream processing and enabling continuous production |
| Genetic tools maturity | Extremely well-developed, decades of optimisation | Rapidly advancing – CRISPR-Cas9, Cas12a, promoter/terminator libraries, landing-pad platforms for modular integration now available for many species |
| Threat level | Human pathogens exist (e.g., E. coli pathogenic strains), but lab strains generally safe | Most industrial fungi are safe (Generally Recognised as Safe status for many species); no endotoxin issues |

Key superiority for complex protein production: Bacteria are excellent for simple, prokaryotic proteins produced intracellularly at low cost. However, for eukaryotic proteins requiring proper folding, glycosylation, and disulfide bonds — which include most therapeutic proteins, industrial enzymes produced for human applications, and food proteins — fungal systems are essential. Filamentous fungi in particular combine the low media costs of bacterial systems with the eukaryotic processing capabilities of higher organisms, all while secreting products to simplify purification.

Key superiority for natural product discovery and production: Unlike E. coli, which lacks native secondary metabolite pathways, filamentous fungi are evolutionarily optimised to produce diverse bioactive small molecules. Their innate biosynthetic gene clusters can be activated, modified, or transferred, making them superior chassis for producing pharmaceutical compounds that bacteria simply cannot make.

Conclusion: Choose bacteria for simple, fast, cheap production of prokaryotic or non-glycosylated proteins. Choose fungi — particularly filamentous fungi and yeasts — when products require eukaryotic post-translational modifications, need to be secreted for easy purification, or are complex natural products that fungi naturally know how to make.

Assignment Part 3: First DNA Twist Order

📅 HTGAA 2026: Individual Final Project Submission Checklist

📋 Milestone Checklist & Progress Tracking

  • Submit the Official Google Form (Deadline: March 20)
    • Finalize the text for Draft Aim 1
    • Polish the Final Project Summary abstract
    • Select preferred tracks for the HTGAA Industry Council
    • Generate and paste the shared link to the DNA Design Folder (Benchling/Kernel)
  • Complete Week 2 Homework (Part 3: DNA Design Challenge)
    • Design at least one (1) insert sequence and save it into the shared folder
    • Document the target backbone vector on the project website

📝 Submission Copy & Technical Drafts

🔬 1. Final Project Summary

Context: Copy and paste this text directly into the designated abstract section of the Google Form.

This project introduces the Prometheus Symbiont, a paradigm-shifting bio-hybrid entity designed to systematically overcome the operational lifespan bottleneck of living biological components within engineering matrices. By isolating photosynthetic thylakoid membranes from Synechococcus elongatus and integrating them onto functionalized carbon nanotube anodes, we establish a direct, high-efficiency biophotovoltaic conversion interface. The architecture features an on-board microfluidic directed evolution platform to drive continuous cellular self-healing alongside a genetically encoded, dual-mode Calcium Ion ($\text{Ca}^{2+}$) central control interface. Driven by this bio-digital bridge, a silicon master chip dynamically toggles soft-robotic actuators between energy accumulation (“Grow”) and photoprotective shading (“Defense”), ultimately achieving carbon-neutral, self-renewing, long-endurance technological autonomy.


🎯 2. Core Project Aim 1 Draft

Context: Copy and paste this text into the Specific Aims section of the Google Form.

Aim 1: Engineering and Characterization of the Dual-Mode $\text{Ca}^{2+}$ Bio-Digital Communication Interface. We will clone and optimize a genetically encoded calcium indicator (GCaMP6s) under the control of a cyanobacterial promoter ($P_{psbAI}$) within Synechococcus elongatus PCC 7942. We will quantitatively characterize the relative fluorescence change ($\Delta F/F_0$) and its response kinetics under simulated photoinhibitory stress spikes ($0$ to $2000 \ \mu\text{mol photons m}^{-2}\,\text{s}^{-1}$) to validate threshold detection parameters required for closed-loop machine actuator responses.
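
To make the $\Delta F/F_0$ readout in Aim 1 concrete, here is a minimal sketch of how it could be computed from a raw fluorescence trace. The trace values, the four-sample baseline window, and the 0.5 threshold are all made-up numbers for illustration, not measured data or parameters from the project.

```python
from statistics import mean


def delta_f_over_f0(trace, n_baseline):
    """Return (F - F0) / F0 per sample, where F0 is the mean of the first n_baseline samples."""
    f0 = mean(trace[:n_baseline])
    return [(f - f0) / f0 for f in trace]


def first_crossing(dff, threshold):
    """Index of the first sample whose dF/F0 reaches the threshold, or None if it never does."""
    for i, value in enumerate(dff):
        if value >= threshold:
            return i
    return None


# Hypothetical trace (arbitrary units): 4 baseline samples, then a simulated light-stress spike.
trace = [100.0, 100.0, 100.0, 100.0, 110.0, 150.0, 220.0, 180.0]
dff = delta_f_over_f0(trace, n_baseline=4)
crossing_index = first_crossing(dff, threshold=0.5)
print(dff, crossing_index)
```

The index returned by `first_crossing` is the kind of threshold event the silicon controller would act on; how that maps to real time depends on the sampling interval, which is not fixed here.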


🧬 3. Week 2 DNA Design Challenge & Website Specifications

Context: Publish this section on your individual project website documentation page to satisfy the Part 3 assignment requirement.

| Parameter | Technical Specifications |
| --- | --- |
| Shared Design Directory | HTGAA_2026_Prometheus_Symbiont_DNA (Benchling / Asimov Kernel) |
| Insert Sequence Name | NS1-PpsbAI-RBS-GCaMP6s-TrrnB |
| Gene of Interest (GOI) | GCaMP6s (Genetically Encoded Calcium Indicator optimized for stress state capturing) |
| Target Backbone Vector | pAM1579 (Standard cyanobacterial integration vector targeting Neutral Site 1 ($NS1$) via homologous recombination) |
| Assembly Methodology | High-fidelity Gibson Assembly (Linear insert designed with 40-bp overlapping terminal homology arms) |
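
The 40-bp homology arms in the assembly row can be sketched in a few lines. This is only an illustration of the idea: the sequences below are placeholders, not the real pAM1579 or GCaMP6s sequences, and `add_homology_arms` is a hypothetical helper name.

```python
def add_homology_arms(insert, vector_upstream, vector_downstream, arm_len=40):
    """Flank an insert with the vector bases that sit immediately either side of the cut site.

    The left arm is the last arm_len bases upstream of the cut, the right arm is the
    first arm_len bases downstream of it, so both fragment ends overlap the backbone.
    """
    if len(vector_upstream) < arm_len or len(vector_downstream) < arm_len:
        raise ValueError("vector flanks are shorter than the requested arm length")
    left_arm = vector_upstream[-arm_len:]
    right_arm = vector_downstream[:arm_len]
    return left_arm + insert + right_arm


# Placeholder sequences for illustration only.
upstream = "ACGT" * 15            # 60 bases of backbone before the cut site
downstream = "TTGCA" * 12         # 60 bases of backbone after the cut site
insert = "ATGGGTTCTCATCATCATTAA"  # 21-base stand-in for the real cassette

fragment = add_homology_arms(insert, upstream, downstream)
print(len(fragment))
```

In practice the arms would be copied from the annotated backbone in the shared design folder, and the finished fragment checked there before ordering.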

📂 In Silico Plasmid Architecture Layout

Week 9 HW: hw-cell-free-systems

Week 9 — Cell-Free Systems

Homework Part A: General and Lecturer-Specific Questions

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Benefits in flexibility and control

Cell‑free systems are “open‑top” – you can add whatever you want: salts, energy sources, DNA templates, detergents, even toxic things that cells can’t handle. No worries about killing cells.

The reaction is fast – you get results in a few hours, not one or two days like fermentation. Want to test conditions? Set up 10 different pH or magnesium concentrations in the morning, and you’ll have the answer by the afternoon.

Transcription and translation can be decoupled – separately add T7 polymerase, inhibitors, modify the mRNA structure… go wild.

You can sample anytime without disrupting cells.

Two cases where cell‑free works better than living cells

  • Making toxic or membrane proteins – Some proteins (like pore‑forming toxins) kill the cells as soon as they’re expressed, so the culture never grows. In a cell‑free system, there are no live cells – just add detergents or lipids, and the protein stays dissolved. No problem.
  • Incorporating unnatural amino acids – In live cells, getting them to take up and incorporate unnatural amino acids is a huge pain: low efficiency, often toxic. In a cell‑free system, you just dump the unnatural amino acid into the reaction tube. Want to attach a fluorescent label or a crosslinker? Super easy.

Main components of a cell‑free expression system and their roles

Cell lysate – That’s the “soup” you get after breaking open cells (like E. coli or wheat germ). It contains ribosomes, tRNAs, various enzymes, and translation factors. This is the core machine.

Energy regeneration system – Simply adding ATP isn’t enough; it gets used up quickly. So you need a system that continuously converts ADP back into ATP. The most common one is creatine phosphate + creatine kinase.

Amino acids – The building blocks. The 20 standard amino acids. If you want isotopic labelling or unnatural amino acids, just swap them.

Nucleotides (NTPs) – For transcription: ATP, GTP, CTP, UTP.

DNA template – The gene you want to express, with a promoter in front (e.g., T7).

Salts and buffer – Maintain pH and ionic strength. Magnesium ions are especially important – ribosomes can’t work without them.

RNA polymerase – If the lysate doesn’t have enough, add extra T7 RNA polymerase.

Optional additives – Chaperones, detergents, DTT (to prevent oxidation), PEG (to mimic the crowded intracellular environment).

Why is energy regeneration so important? How do you ensure a continuous ATP supply?

The reason is simple: protein translation is a huge consumer of ATP and GTP. The small amount of ATP you add at the beginning is hydrolysed to ADP (and some AMP) plus phosphate within minutes. Without energy regeneration, the reaction quickly stops, and the yield is pitiful. So you need a mechanism to convert ADP back into ATP continuously.

Common method: the creatine phosphate / creatine kinase system.

You add creatine phosphate (around 10–50 mM) and creatine kinase (1–2 U/μL) to the reaction tube. Creatine kinase transfers a high‑energy phosphate group from creatine phosphate to ADP, regenerating ATP. This method is very stable, has few side effects, and is used in most cell‑free experiments.
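
A back-of-the-envelope sketch shows why the regeneration system matters. It assumes a constant ATP consumption rate, one ATP regenerated per creatine phosphate, and a kinase that is never limiting. The 1.5 mM starting ATP and 10 mM/h turnover are assumed values picked for illustration; only the 30 mM creatine phosphate comes from the range given above.

```python
def energy_lifetime_h(atp_mM, creatine_phosphate_mM, consumption_mM_per_h, regenerate=True):
    """Hours until the high-energy phosphate pool runs out under constant consumption."""
    pool_mM = atp_mM + (creatine_phosphate_mM if regenerate else 0.0)
    return pool_mM / consumption_mM_per_h


# Assumed: 1.5 mM ATP at t=0, 30 mM creatine phosphate, 10 mM/h ATP turnover.
without_regen = energy_lifetime_h(1.5, 30.0, 10.0, regenerate=False)
with_regen = energy_lifetime_h(1.5, 30.0, 10.0)
print(f"{without_regen:.2f} h without regeneration, {with_regen:.2f} h with it")
```

Under these assumptions the reaction would run out of energy in minutes without creatine phosphate and last a few hours with it, which matches the time scale of a typical batch reaction.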

Prokaryotic vs. eukaryotic cell‑free systems – pick one protein for each and explain why

| Feature | Prokaryotic (E. coli) | Eukaryotic (wheat germ, rabbit reticulocyte, etc.) |
| --- | --- | --- |
| Yield | Very high, mg/mL level | Low to medium, μg/mL level |
| Speed | Fast, 2‑4 hours | Slow, overnight |
| Cost | Low, simple | High, complex |
| Post‑translational modifications | Basically none (no glycosylation, disulfide bonds also hard to form) | Can do some modifications (glycosylation, disulfide bonds) |
| Best for | Bacterial proteins, unmodified enzymes, structural biology | Eukaryotic proteins, complex proteins that need proper folding |

Prokaryotic system: I would make GFP (green fluorescent protein). This guy is simple, needs no modifications, folds beautifully in E. coli lysate, gives high yield, and you can see it with your naked eye. It’s super convenient for testing and tuning system parameters.

Eukaryotic system: I would make the human EGFR kinase domain (the intracellular part of the epidermal growth factor receptor). This eukaryotic kinase is hard to fold correctly and tends to aggregate. In E. coli it comes out mostly inactive. The wheat germ system has eukaryotic chaperones that help it fold slowly, producing functional protein for enzyme activity assays.

How to design a cell‑free experiment to optimise membrane protein expression? Challenges and how to tackle them

The pitfalls of membrane proteins:

  • Hydrophobic transmembrane domains aggregate in water, forming precipitates.
  • Hard to fold correctly and insert into a lipid bilayer.
  • Yield is usually very low.
  • Adding detergents or lipids might inhibit the reaction.

My experimental design:

  • Use E. coli lysate – cheap and high‑yielding. Add some extra chaperones (e.g., GroEL/GroES) to help with folding.
  • Try different membrane environments:
    • Detergents (DDM, LDAO, etc.) – start with low concentrations (0.01‑0.1%); don’t go too high, or the reaction will be inhibited.
    • Liposomes – prepare small liposomes in advance (e.g., from E. coli total lipids or synthetic lipids) and add them during the reaction so the protein inserts as it is being synthesised.
    • Nanodiscs – use membrane scaffold protein (MSP) plus lipids to make nanodiscs that mimic a real membrane environment.
  • Use a continuous exchange cell‑free (CECF) system – a semipermeable membrane separates the reaction mixture from a feeding buffer, which continuously supplies energy and amino acids while removing inhibitors (e.g., phosphate). Especially useful for membrane proteins.
  • Add a fusion tag – for example, attach GFP or MBP to the N‑terminus. This helps folding and allows real‑time monitoring of expression levels.
  • Lower the temperature – for example, 20‑25 °C. Membrane proteins usually prefer cooler temperatures and are less prone to aggregation.
  • Add stabilisers – glycerol (5‑10%), trehalose, or even the ligand/substrate that the protein binds to, to stabilise the native conformation.

Detection: Use the GFP fusion to follow fluorescence. At the end, separate the membrane fraction from the soluble fraction by ultracentrifugation, then run a gel (stain or Western blot).
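
The design above is essentially a small factorial screen, so it helps to list the conditions before pipetting. The sketch below only enumerates detergent, concentration, and temperature. The middle concentration (0.05%) and the no-detergent control are my own additions within the stated ranges, and `build_screen` is a hypothetical helper.

```python
from itertools import product


def build_screen(detergents, percents, temperatures_c):
    """Return one dict per condition for every detergent x concentration x temperature combination."""
    return [
        {"detergent": d, "percent": p, "temperature_c": t}
        for d, p, t in product(detergents, percents, temperatures_c)
    ]


detergents = ["DDM", "LDAO"]
detergent_percent = [0.01, 0.05, 0.1]  # within the 0.01-0.1% range; 0.05 is an assumed midpoint
temperatures_c = [20, 25]

screen = build_screen(detergents, detergent_percent, temperatures_c)
# Assumed extra well: a no-detergent control at the higher temperature.
screen.append({"detergent": None, "percent": 0.0, "temperature_c": 25})

for well, condition in enumerate(screen, start=1):
    print(well, condition)
```

Liposome and nanodisc conditions could be added as further entries in the same list; they are left out here to keep the example short.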

Low yield in a cell‑free experiment: three possible reasons and how to fix them

Reason 1: Problem with the DNA template
For example, weak promoter, a very stable secondary structure at the 5′ end, or too much DNA that inhibits the reaction.
How to fix it: Switch to a strong T7 promoter, check the RBS sequence (for prokaryotic systems), remove hairpins at the 5′ end. Run a DNA concentration gradient (typically 5‑50 μg/mL) to find the optimal concentration. Also, resequence the template.
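
For the DNA concentration gradient, the pipetting volumes follow from C1·V1 = C2·V2. A minimal sketch, assuming a 500 μg/mL template stock and 10 μL reactions (both assumed values, not from the protocol):

```python
def template_volumes_ul(targets_ug_per_ml, stock_ug_per_ml, reaction_ul):
    """Volume of DNA stock per reaction for each target concentration (C1 * V1 = C2 * V2)."""
    volumes = {}
    for target in targets_ug_per_ml:
        if target > stock_ug_per_ml:
            raise ValueError("target concentration exceeds the stock concentration")
        volumes[target] = target * reaction_ul / stock_ug_per_ml
    return volumes


# Gradient across the 5-50 ug/mL range; stock concentration and reaction volume are assumptions.
volumes = template_volumes_ul([5, 10, 20, 35, 50], stock_ug_per_ml=500, reaction_ul=10)
for conc, vol in volumes.items():
    print(f"{conc} ug/mL -> {vol:.2f} uL stock")
```

Volumes below about 0.5 μL are hard to pipette accurately, so in practice the lower points would be made from a diluted stock.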

Reason 2: Energy runs out; the regeneration system isn’t working
Maybe the creatine phosphate has degraded, or there is a high ATPase activity in the reaction that consumes ATP too quickly.
How to fix it: Open a fresh vial of creatine phosphate, increase the amount of creatine kinase. Or switch to a different energy system (e.g., glucose + glycolysis). If that doesn’t work, use an ATP detection kit (luciferase‑based) to monitor ATP levels by taking samples during the reaction.

Reason 3: Accumulation of inhibitors
ATP hydrolysis produces a lot of phosphate, which chelates magnesium ions and shuts down the reaction. Also, aggregated protein precipitates can cause trouble.
How to fix it: Switch to a continuous exchange cell‑free (CECF) system to continuously remove small‑molecule inhibitors. Alternatively, add a phosphatase inhibitor, or use a phosphate binder (e.g., phosphatase substrate – but that’s a bit tricky). A simpler approach: dilute the reaction or exchange the buffer.

Homework question from Peter Nguyen

Design an example of a useful synthetic minimal cell as follows:

Pick a function and describe it

🤴Activatable bacterial sensor for inflammatory cytokine detection. The synthetic minimal cell (SMC) serves as a signal transduction capsule – it detects the presence of tumor necrosis factor alpha (TNF-α), a key pro-inflammatory cytokine elevated in conditions such as sepsis, rheumatoid arthritis, and inflammatory bowel disease, and then converts that protein signal into a small-molecule output that can be read by a standard reporter bacterium. This effectively “translates” a protein biomarker that bacteria cannot naturally sense into a format that a simple engineered bacterial reporter can process.

What would your synthetic cell do? What is the input and what is the output?

| Description | Details |
| --- | --- |
| Input | TNF-α (tumor necrosis factor alpha) – a 17 kDa pro‑inflammatory cytokine. Naturally inert to bacteria (they lack mammalian cytokine receptors). Detection range: 10 pM – 100 nM (clinically relevant in sepsis: >50 pM). Co‑encapsulated inside the SMC at the time of assembly, not imported later. |
| SMC internal process | 1. TNF‑α binds to a TNF‑α‑responsive riboswitch aptamer engineered into the 5′ UTR of the α‑hemolysin (αHL) gene. 2. Binding induces a conformational change that exposes the Shine‑Dalgarno (RBS) sequence, enabling ribosome binding. 3. Translation of αHL monomers (293 aa, 33 kDa) proceeds using the encapsulated cell‑free Tx/Tl system. 4. Monomers spontaneously assemble into heptameric transmembrane β‑barrel pores (~1.4 nm diameter) in the SMC lipid bilayer. Timing: Pore formation detectable within 1–2 hours after TNF‑α exposure. |
| Output | IPTG (isopropyl β‑D‑1‑thiogalactopyranoside) – a small, diffusible molecule (MW = 238 Da). Initially encapsulated inside the SMC at 1–5 mM. Once αHL pores are formed, IPTG diffuses out into the external environment. Leakage rate without pores: <5% over 4 hours (due to cholesterol‑stabilised membrane). |
| Whole‑system readout | IPTG diffuses to an E. coli reporter strain (e.g., BL21(DE3)) transformed with a plasmid carrying lacZ under a T7‑lacO promoter. IPTG binds LacI, causing dissociation from the lacO operator, allowing T7 RNA polymerase to transcribe lacZ. β‑galactosidase (LacZ) cleaves the colourimetric substrate CPRG (chlorophenol red‑β‑D‑galactopyranoside) from yellow to purple. Absorbance measured at 570 nm, or visible by eye. Controls: No TNF‑α → no pores → no IPTG release → no colour change. No αHL gene → same negative result. |

In short: TNF‑α (input) → SMC → IPTG (output) → E. coli reporter → purple colour (readout).

Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

🐱‍🐉No. The synthetic cell membrane provides two essential functions that free Tx/Tl mixture cannot.

First, the membrane serves as a physical barrier that holds the IPTG inside until the α‑hemolysin pores are formed. If the Tx/Tl system were simply mixed in a test tube without encapsulation, IPTG would be free in solution from the start and would reach the reporter bacteria immediately — even without TNF‑α. Under those conditions, the bacterial reporter would always turn purple regardless of whether the cytokine was present. Thus, the SMC acts as an AND logic gate: output is only released when both (1) IPTG is present in the interior AND (2) TNF‑α triggers pore formation.

Second, the membrane allows the SMC to sense large macromolecular inputs (TNF‑α, ~17 kDa) using encapsulated cell‑free machinery. In an open mix, TNF‑α could potentially interfere directly with the reporter — but encapsulation compartmentalises the sensing event, preventing crosstalk.
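
The gating argument can be written out as a tiny truth table. This is an idealised Boolean model (no leakage, no partial induction), and `iptg_released` is a name made up for this sketch.

```python
from itertools import product


def iptg_released(iptg_loaded, tnf_alpha_present, encapsulated=True):
    """Whether IPTG reaches the reporter bacteria in this idealised model."""
    if not iptg_loaded:
        return False          # nothing to release
    if not encapsulated:
        return True           # open Tx/Tl mix: IPTG is free in solution from the start
    return tnf_alpha_present  # membrane-gated: release needs TNF-alpha-triggered pores


print("loaded  tnf    released")
for loaded, tnf in product([False, True], repeat=2):
    print(loaded, tnf, iptg_released(loaded, tnf))
```

With `encapsulated=False` the function returns `True` whenever IPTG is present, which is the always-purple failure mode described above.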

Could this function be realized by genetically modified natural cell?

🐉Yes, but with significant limitations.

It is theoretically possible to engineer a living bacterium to detect TNF‑α by:

  • Expressing a mammalian TNF‑α receptor on its surface, and
  • Coupling receptor binding to an intracellular genetic circuit that produces a colourimetric output.

However, this approach faces several practical challenges that the SMC avoids:

| Challenge | Living engineered bacterium | Synthetic minimal cell |
| --- | --- | --- |
| Membrane protein expression | Requires functional expression of a complex mammalian receptor – often toxic, misfolded, or mislocalised in E. coli | No need – the SMC uses an entirely different mechanism (riboswitch inside the lumen, not receptors on the surface) |
| Genetic circuit burden | Heavy metabolic load on the host; circuits often mutate or lose function over time | No replication, no evolution – the SMC is a disposable, pre‑assembled capsule |
| Generalisation to other cytokines | Requires a different receptor and re‑engineering each time | The same SMC architecture can be repurposed by swapping the aptamer in the riboswitch – the platform is modular |
| Sterility/containment | Living GMOs cannot be released in many diagnostic or environmental settings | SMCs are non‑living and would degrade, eliminating containment concerns |

Describe the desired outcome of your synthetic cell operation

In the presence of TNF-α above a clinically relevant threshold (e.g., >50 pM, within the human serum range for inflammatory conditions), the synthetic minimal cell should:

  • Sense TNF-α via the riboswitch embedded in the α‑hemolysin mRNA.
  • Translate and assemble α‑hemolysin pores in its own membrane within approximately 1–2 hours.
  • Release encapsulated IPTG diffusively through these pores into the surrounding medium.

The released IPTG should, upon contact with the E. coli reporter strain, induce sufficient LacZ expression to produce a detectable colour change with a chromogenic substrate (yellow to purple with CPRG; X‑gal would give a blue product instead).

The desired specificity is that no colour change occurs in the absence of TNF-α, even if other cytokines (e.g., IL‑6, IL‑1β) or unrelated proteins are present. In other words, the SMC should function as a specific, activatable signal transducer for TNF‑α.

Design all components that would need to be part of your synthetic cell.

| Component | Details | Rationale |
| --- | --- | --- |
| Membrane composition | Phospholipids + cholesterol (e.g., POPC:cholesterol ~ 60:40 mol/mol) | Cholesterol increases membrane stability and reduces passive leakage of small molecules like IPTG. The membrane must remain impermeable to IPTG in the absence of pores. |
| Encapsulated Tx/Tl system | E. coli‑based cell‑free system (either crude lysate‑based such as myTXTL or purified such as PURE) | Provides the transcription‑translation machinery to produce α‑hemolysin in response to TNF‑α. E. coli is chosen because riboswitches function robustly in prokaryotic systems. |
| Encapsulated small molecule | IPTG (isopropyl β‑D‑1‑thiogalactopyranoside), ~1–5 mM | Acts as the output signal. IPTG is small, non‑toxic, diffuses readily through α‑hemolysin pores (~1.4 nm diameter), and is a strong inducer of the E. coli lac operon. |
| Encapsulated DNA template | Gene for α‑hemolysin (αHL) under control of a TNF‑α‑responsive riboswitch in its 5′ UTR. The αHL gene is the wild‑type hla sequence from Staphylococcus aureus (encoding the 293‑amino‑acid monomer). | Riboswitch regulates translation of αHL. Upon TNF‑α binding, the aptamer domain changes conformation, exposing the RBS and allowing ribosome binding. |
| Encapsulated components (in addition to Tx/Tl) | NTPs, amino acids, energy regeneration system (creatine phosphate + creatine kinase), salts, buffer (e.g., HEPES, Mg²⁺), and optionally RNase inhibitors. | These are standard cell‑free reaction components that sustain protein synthesis for several hours. |
| Biological cells (external reporter) | E. coli strain transformed with a plasmid containing lacZ (β‑galactosidase) under a T7‑lacO promoter (or any IPTG‑inducible promoter), in a host that supplies T7 RNA polymerase (e.g., BL21(DE3) or a derivative, where the polymerase gene is itself under the IPTG‑inducible lacUV5 promoter). | Provides the final readout. IPTG released from SMCs diffuses into the reporter cells, relieves LacI repression, and induces LacZ expression, producing a colourimetric signal. |

Organism for Tx/Tl system

Bacterial (E. coli) is appropriate because:

  • Riboswitches naturally function in prokaryotic systems and are well‑characterised in E. coli cell‑free extracts.
  • There is no need for eukaryotic post‑translational modifications – α‑hemolysin is a bacterial toxin that folds and assembles correctly in an E. coli lysate environment.
  • The output (IPTG) is specifically designed to activate an E. coli‑based reporter; using a prokaryotic Tx/Tl system keeps all components biologically compatible and minimises cross‑reaction risks.

How will your synthetic cell communicate with the environment?

Input (TNF‑α) cannot cross the intact lipid bilayer (it is a 17 kDa protein), so the SMC as designed does not import it. Instead, TNF‑α is added to the Tx/Tl mix at the time of SMC assembly: the entire Tx/Tl system, including the riboswitch‑containing DNA, the energy mix, and TNF‑α itself, is co‑encapsulated during SMC formation. In other words, TNF‑α is present inside the SMC from the start, not imported later. The riboswitch is encoded on the DNA template inside the SMC, and the cell‑free Tx/Tl system produces the α‑hemolysin mRNA that carries it in its 5′ UTR. The design relies on the riboswitch acting co‑transcriptionally: the aptamer is part of the nascent mRNA, and TNF‑α binding occurs as the transcript is being produced.

Output (IPTG) is initially encapsulated. Upon TNF‑α‑triggered translation and assembly of α‑hemolysin pores, IPTG diffuses out through these channels. Pores are sufficiently large (≈1.4 nm diameter) to allow passage of small molecules like IPTG (MW ≈ 238 Da).
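
The leakage figure quoted earlier (<5% over 4 hours without pores) can be turned into a rough rate constant if release is treated as first-order. That kinetic model is an assumption of this sketch, and the pore-open rate `k_pore` is a hypothetical placeholder, not a measured value.

```python
import math


def rate_constant_per_h(fraction, hours):
    """First-order rate constant k such that 1 - exp(-k * hours) equals the given fraction."""
    return -math.log(1.0 - fraction) / hours


def fraction_released(k_per_h, hours):
    """Fraction of encapsulated IPTG outside the vesicle after the given time."""
    return 1.0 - math.exp(-k_per_h * hours)


k_leak = rate_constant_per_h(0.05, 4.0)  # upper bound implied by "<5% over 4 hours"
k_pore = 1.0                             # hypothetical rate once pores are open, per hour

for t in (1, 2, 4):
    print(t, round(fraction_released(k_leak, t), 3), round(fraction_released(k_pore, t), 3))
```

Comparing the two columns gives a feel for the signal-to-background window the reporter bacteria have to resolve. The real pore-open rate would come from the IPTG release measurements.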

Experimental details

Lipids

  • POPC (1‑palmitoyl‑2‑oleoyl‑sn‑glycero‑3‑phosphocholine) – the primary structural phospholipid.
  • Cholesterol – stabilises the bilayer, reduces passive permeability, and enhances mechanical robustness.

Genes

  • α‑hemolysin (αHL, hla) gene – from Staphylococcus aureus (293 amino acids; secreted as a water‑soluble monomer that assembles into a heptameric β‑barrel pore upon contact with lipid membranes). A TNF‑α‑responsive RNA aptamer is engineered into the 5′ UTR directly upstream of the αHL RBS to create a riboswitch that activates translation only upon TNF‑α binding.

  • TNF‑α aptamer sequence – selected via SELEX or RNA‑compete against human TNF‑α. The aptamer is inserted into the 5′ UTR of the αHL gene, positioned such that ligand‑binding induces a structural rearrangement that exposes the Shine‑Dalgarno sequence and start codon, enabling translation initiation.

  • Reporter gene (in E. coli reporter strain) – lacZ (encoding β‑galactosidase) under a T7‑lacO promoter (e.g., pET‑derived vector). The reporter strain also supplies T7 RNA polymerase (e.g., E. coli BL21(DE3), where the polymerase gene sits under the IPTG‑inducible lacUV5 promoter). When IPTG enters the cell, it binds LacI, causing dissociation from the lacO operator and allowing T7 RNA polymerase to transcribe lacZ.

Encapsulation method

Liposomes are formed by hydration of a dried lipid film (POPC:cholesterol) in a buffer containing:

  • cell‑free Tx/Tl mixture
  • IPTG (~2 mM)
  • riboswitch‑αHL DNA template (10–50 nM)
  • NTPs, amino acids, energy regeneration components
  • TNF‑α (variable concentrations, from 0 to 100 nM)
  • any other necessary cofactors

The lipid mixture is then subjected to multiple freeze‑thaw cycles and extrusion through polycarbonate membranes (e.g., 400 nm pore size) to produce unilamellar vesicles with encapsulated contents. Unencapsulated material is removed by gel filtration or centrifugal washing. Alternatively, water‑in‑oil emulsions can be used to generate monodisperse SMCs.
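
It is worth checking how many DNA template copies a vesicle of this size actually holds. The sketch below treats each vesicle as a sphere whose diameter equals the 400 nm extrusion pore size, ignores bilayer thickness, and assumes the interior concentration equals the bulk concentration with Poisson-distributed loading. All of these are simplifying assumptions.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def vesicle_volume_l(diameter_nm):
    """Internal volume in litres of a sphere with the given diameter."""
    radius_m = diameter_nm * 1e-9 / 2
    return (4.0 / 3.0) * math.pi * radius_m ** 3 * 1000.0  # m^3 -> L


def mean_copies(conc_nM, diameter_nm):
    """Average number of molecules per vesicle at a given bulk concentration."""
    return conc_nM * 1e-9 * AVOGADRO * vesicle_volume_l(diameter_nm)


def fraction_with_template(conc_nM, diameter_nm):
    """Poisson estimate of the fraction of vesicles holding at least one copy."""
    return 1.0 - math.exp(-mean_copies(conc_nM, diameter_nm))


for conc in (10, 50):
    print(conc, round(mean_copies(conc, 400), 2), round(fraction_with_template(conc, 400), 2))
```

Under these assumptions, 10 nM template gives only about 0.2 copies per 400 nm vesicle on average and 50 nM gives about one, so many vesicles would carry no template at all. That is an argument for the upper end of the DNA range or for the larger compartments made by the emulsion route.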

External reporter preparation

An E. coli BL21(DE3) strain is transformed with a plasmid carrying lacZ under a T7‑lacO promoter (e.g., pET‑lacZ). The strain is grown to early exponential phase (OD₆₀₀ ≈ 0.4), washed, and resuspended in fresh medium containing the colourimetric substrate (e.g., CPRG at 0.5 mg/mL).

Assay setup

Purified SMCs are mixed with the reporter E. coli suspension and incubated at 30 °C for 2–4 hours. Colour development is monitored spectrophotometrically (absorbance at 570 nm for CPRG cleavage product) or by direct visual inspection.
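
To think about what the colour readout should look like across TNF‑α doses, a Hill curve is a common first model. The sketch below is illustrative only: the EC50 is simply set to the 50 pM threshold mentioned earlier and the Hill coefficient to 1, neither of which has been measured. The dose series is the ten-fold 0 to 100 nM range planned for this assay.

```python
def hill_response(conc_nM, ec50_nM, hill_n=1.0):
    """Normalised response (0 to 1) of a Hill-type dose-response curve."""
    if conc_nM <= 0:
        return 0.0
    return conc_nM ** hill_n / (ec50_nM ** hill_n + conc_nM ** hill_n)


doses_nM = [0, 0.001, 0.01, 0.1, 1, 10, 100]
EC50_NM = 0.05  # assumed midpoint, set to the 50 pM threshold for illustration

responses = {dose: hill_response(dose, EC50_NM) for dose in doses_nM}
lowest_positive = min(dose for dose, r in responses.items() if r >= 0.5)

for dose, r in responses.items():
    print(f"{dose} nM -> {r:.3f}")
print("lowest dose at or above half-maximal response:", lowest_positive, "nM")
```

Once real absorbance data exist, the same function could be fitted to them to estimate the actual EC50 and detection limit.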

How will you measure the function of your system?

| Measurement method | Details |
| --- | --- |
| Colourimetric readout (LacZ activity) | Add CPRG (chlorophenol red‑β‑D‑galactopyranoside) to the co‑culture of SMCs and reporter E. coli. LacZ cleaves CPRG, converting the yellow substrate into purple‑coloured chlorophenol red. Absorbance is measured at 570 nm. No specialised equipment is required – the colour change is visible by eye. |
| IPTG release (direct detection) | In separate experiments, SMCs are incubated without reporter bacteria. Supernatant samples are taken at intervals and analysed by HPLC‑MS or by a commercial IPTG detection assay (enzymatic coupling) to quantify the kinetics of IPTG release in response to different TNF‑α concentrations. |
| α‑hemolysin pore formation (verification) | Include a small amount of fluorescently labelled dextran (e.g., 3 kDa FITC‑dextran) inside the SMCs during assembly. Pore formation is detected by monitoring fluorescence increase in the external medium (dextran leakage) over time, using a fluorescence plate reader. |
| TNF‑α dose–response | Perform assays with a range of TNF‑α concentrations (0, 0.001, 0.01, 0.1, 1, 10, 100 nM) to establish the detection limit and dynamic range of the system. |
| Specificity control | Test cross‑reactivity with other cytokines (e.g., IL‑6, IL‑1β, IFN‑γ) at physiologically relevant concentrations (1–10 nM) to ensure the TNF‑α aptamer does not respond to unrelated protein ligands. |
| Negative controls | Include SMCs assembled without TNF‑α, SMCs without the riboswitch‑αHL DNA template, and empty liposomes (no encapsulated Tx/Tl system) to confirm that colour development depends on both TNF‑α presence and functional riboswitch‑controlled pore formation. |

System overview diagram

Cross‑reference to the Lentini et al. architecture

The following table compares the original system by Lentini et al. (2014) with the present design:

| Lentini system | This design |
| --- | --- |
| Theophylline (input) – a small molecule inert to bacteria | TNF‑α (input) – a protein cytokine of diagnostic relevance |
| IPTG (output) – activates E. coli lac system | IPTG (output) – same output, retaining compatibility with standard reporter strains |
| α‑hemolysin pore – allows IPTG release upon riboswitch activation | α‑hemolysin pore – identical pore‑forming mechanism |
| Theophylline aptamer in αHL 5′ UTR controls translation | TNF‑α aptamer in αHL 5′ UTR controls translation |
| GFP expressed by E. coli reporter (requires fluorescence readout) | LacZ expressed by E. coli reporter, combined with colourimetric substrate CPRG (readable by eye, no equipment required) |

The Lentini paper inspired the architecture of the present design – a riboswitch‑controlled pore‑forming toxin inside an artificial cell acts as an “actuator” that gates the release of a small‑molecule output. However, this design swaps the input from a synthetic molecule (theophylline) to a clinically relevant human protein biomarker (TNF‑α) and changes the readout from fluorescence to a naked‑eye colourimetric signal, making it more suitable for point‑of‑care diagnostic applications.

Platform modularity and future directions

This synthetic minimal cell provides a proof‑of‑concept for a modular, reagent‑free cytokine detection platform. By swapping the aptamer in the riboswitch, the same SMC architecture can be reconfigured to detect virtually any protein biomarker for which a specific RNA aptamer can be selected – from other cytokines (IL‑6, IL‑1β, IFN‑γ) to viral proteins (e.g., SARS‑CoV‑2 spike protein) or cancer biomarkers (e.g., PSA, CA‑125).

Homework question from Peter Nguyen

💖Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch:

Write a one-sentence summary pitch sentence describing your concept.

👩‍🦰We turn building walls into living, low‑cost air quality monitors by embedding freeze‑dried cell‑free sensors that detect indoor pollutants and produce a visible colour change upon rehydration.

How will the idea work, in more detail? Write 3-4 sentences or more.

🙌The freeze‑dried cell‑free system is incorporated directly into water‑based paint or plaster. It contains a toxin‑responsive riboswitch or transcription factor controlling the expression of a chromoprotein (e.g., a purple pigment) or an enzyme that produces a visible dye. When indoor humidity or a small water leak activates the system, it becomes functional. If a target pollutant (e.g., formaldehyde, benzene, or carbon monoxide) diffuses into the material, it binds to a sensor protein, triggering transcription‑translation of the reporter. Within a few hours, a clear colour patch appears on the wall, alerting occupants. Multiple sensors can be patterned as stripes or QR codes for semi‑quantitative detection.

What societal challenge or market need will this address?

✨Poor indoor air quality (IAQ) causes “sick building syndrome”, asthma, and long‑term health issues, yet conventional electronic monitors are expensive, require power, and provide no spatial resolution. This technology offers a passive, disposable, and ultra‑low‑cost sensor that works without batteries or maintenance. It is especially useful for schools, hospitals, and low‑income housing where continuous monitoring is needed but budgets are tight.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

👩‍🦳Activation with water – The freeze‑dried system is stored in a desiccated state. We co‑encapsulate trehalose to protect during drying and storage. Activation occurs when ambient humidity exceeds a threshold (e.g., >60 % RH) or by a small water reservoir integrated behind the coating. For applications with very dry air, a micro‑encapsulated water bead can be crushed by hand or by a simple pH‑triggered release.

Stability – Freeze‑dried cell‑free extracts remain active for months at room temperature when protected from oxygen and moisture. We package the reactive powder in a bilayer: a permeable outer membrane for gas exchange and an inner water‑soluble layer that dissolves upon activation, exposing the reaction mix to the environment.

One‑time use – This is actually an advantage for disposable sensors: the wall patch gives a permanent colour change, serving as a “record” of past contamination. For reusability, we can design a biphasic system where the reporter is deposited on a removable test strip that can be swapped out, while the enzyme‑producing layer remains.

By combining these strategies, the system becomes a practical “canary on the wall” that requires no electronics, no skilled operation, and no external power – only a small amount of water or humidity to wake it up.

BioBits in Space: Detecting DNA damage caused by space radiation using a cell‑free repair assay

Background

Space radiation causes DNA double‑strand breaks (DSBs), which increase cancer risk and threaten crew health. Current dosimeters measure physical dose but not biological effect. A simple, cell‑free assay that reports on DSB frequency would provide direct biological impact data, enabling better risk assessment for long missions.

Genetic target

Linear DNA template encoding a split‑GFP – two halves of GFP are expressed separately only after ligation of a radiation‑induced DSB.

👩‍🦳When radiation breaks DNA, the linear template is fragmented. By measuring reconstituted GFP fluorescence after a cell‑free repair reaction, we quantify DSB frequency. This links physical radiation dose directly to a functional molecular outcome – DNA integrity – without living cells.

Hypothesis: Freeze‑dried cell‑free extracts containing DNA repair enzymes (e.g., E. coli ligase and polymerase) can ligate radiation‑fragmented DNA back into a full‑length template, restoring split‑GFP expression. Fluorescence intensity will correlate with radiation dose.

Rationale: Current space radiation monitoring relies on passive detectors that must be returned to Earth for analysis. We propose a real‑time, in‑flight assay: a BioBits pellet containing repair machinery and split‑GFP DNA. After in‑orbit radiation exposure, adding water activates the repair reaction; intact templates produce GFP. This transforms the pellet into a direct biological dosimeter that can be read within 2 hours using the P51 fluorescence viewer.

Experimental plan

Test three groups: flight (exposed to space radiation), ground (identical but on Earth), and shielded (wrapped in 2 mm Al, on‑orbit). Each group: 5 BioBits pellets. Post‑flight, add water + T4 DNA ligase buffer to all pellets simultaneously, incubate at 37 °C for 2 h. Measure GFP fluorescence (470 nm excitation, 520 nm emission) using P51. Controls: intact plasmid DNA (positive), no DNA (negative).
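
One way to compare the three groups is to scale each group's mean fluorescence between the negative (no DNA) and positive (intact plasmid) controls. The following Python sketch assumes that normalisation scheme, which is not part of the original plan; all readings are hypothetical placeholders and carry no expectation about the direction of the effect.

```python
from statistics import mean


def normalized_signal(readings, negative, positive):
    """Scale mean fluorescence to 0..1 between the no-DNA and intact-plasmid controls."""
    span = mean(positive) - mean(negative)
    if span <= 0:
        raise ValueError("positive control must be brighter than negative control")
    return (mean(readings) - mean(negative)) / span


# Hypothetical fluorescence readings (arbitrary units), five pellets per group.
groups = {
    "flight": [420, 435, 410, 428, 417],
    "ground": [610, 598, 605, 615, 602],
    "shielded": [520, 531, 515, 526, 518],
}
negative_control = [100, 104, 96]
positive_control = [900, 910, 890]

for name, readings in groups.items():
    value = normalized_signal(readings, negative_control, positive_control)
    print(f"{name}: {value:.3f}")
```

Normalising to the controls makes pellets from different batches comparable, since absolute P51 viewer intensities depend on lighting and camera settings.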

Subsections of Labs

Week 1 Lab: Pipetting


Week 2 Lab: DNA Gel Art

Week 3 Lab: Opentrons Artwork

Week 4 Lab: Protein Design Part I

Week 6 Lab: Gibson Assembly

Week 7 Lab: Neuromorphic Circuits

Week 9 Lab: Cell Free Systems

Projects

Final projects:

  • Memorandum of Understanding (MoU) HTGAA Committed Listener (CL) Agreement This Memorandum of Understanding (MoU) defines the mutual commitment, expectations, and responsibilities between the HTGAA Committed Listener (CL), the Local Node / MOM Lab, and the HTGAA Course Administration. By signing this agreement, the CL acknowledges that HTGAA (How to Grow Almost Anything) is an intensive, graduate-level course requiring strict adherence to laboratory safety, academic integrity, and rigorous resource management.
  • L-Protein Engineering | Option 1: Mutagenesis ☀️ Team Members LIAO LITING WANG YUXIN ZHANG SIWEI Important Objective “Engineering the MS2 Lysis Protein to enhance mutagenesis efficiency while balancing cellular viability—a significant challenge in modern synthetic biology.” 1. Chaperone-independent lysis design; 2. Rapid and efficient E. coli killing; 3. Potentiated lysis protein yield; Fig 1. Electron micrograph of bacteriophages showing their characteristic morphology.
  • HTGAA 2026: Individual Final Project Documentation 🤴The Prometheus Symbiont🤴 SECTION 1: ABSTRACT Provide a concise, self-contained summary of your project (minimum 150 words) The abstract should allow a reader to understand the purpose, approach, and expected outcomes of the work without referring to other sections. Abstract The Prometheus Symbiont is initially proposed as an ideal system based on the principles of natural photosynthesis and a continuous directed evolution platform. Aimed at mimicking natural systems, the ideal concept involves converting photosynthetic membranes into bio-self-powered mechanical systems, thereby enabling robots to replenish their own energy by simulating the foraging behavior of the leaf sheep (Costasiella kuroshimae).

Technical Roadmap

1. DnaJ-Independent Mutagenesis

  • Site Identification: Targeting 4 residues in the lysis protein.
  • Engineering: Mutating to remove DnaJ dependency.
  • Validation: Verifying activity across diverse environments.

2. De Novo Protein Design

  • Optimization: Designing for a “mild yet potent” profile.
  • Precision Control:
    • Intensity: Enhancing lytic strength for efficacy.
    • Timing: Calibrating thresholds for accurate release.
  • Clinical Safety: Balancing clearance and host-cell impact.


HTGAA 2026: Individual Final Project Documentation

SECTION 1: ABSTRACT

SECTION 2: PROJECT AIMS

SECTION 3: BACKGROUND

SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

SECTION 5: Results & Quantitative Expectations

SECTION 6: ADDITIONAL INFORMATION


Subsections of Projects

Bioethical Considerations

Memorandum of Understanding (MoU)

HTGAA Committed Listener (CL) Agreement

This Memorandum of Understanding (MoU) defines the mutual commitment, expectations, and responsibilities between the HTGAA Committed Listener (CL), the Local Node / MOM Lab, and the HTGAA Course Administration.

By signing this agreement, the CL acknowledges that HTGAA (How to Grow Almost Anything) is an intensive, graduate-level course requiring strict adherence to laboratory safety, academic integrity, and rigorous resource management.


I am a HTGAA Committed Listener, my responsibilities are:

  • Watching class lectures and recitations
  • Participating in node reviews
  • Developing and documenting my homework
  • Actively communicating with other students and TAs on the forum
  • Allowing HTGAA and Biopunk to share my work (with attribution)
  • Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
  • Following locally applicable health and safety guidance
  • Promoting a respectful environment free of harassment and discrimination

Signed by committing this file to my documentation page/repository,

{{ Siwei Zhang }}

{{ 09/04/2026 }}

group-final-project

L-Protein Engineering | Option 1: Mutagenesis

☀️ Team Members

LIAO LITING      WANG YUXIN      ZHANG SIWEI

Important
Objective
"Engineering the MS2 Lysis Protein to enhance mutagenesis efficiency while balancing cellular viability—a significant challenge in modern synthetic biology."
1. Chaperone-independent lysis design;
2. Rapid and efficient E. coli killing;
3. Potentiated lysis protein yield;

Fig 1. Electron micrograph of bacteriophages showing their characteristic morphology.

Note

📜 Project Background:

1. Prototype Lysis Systems:

  • MS2 Lysis Protein (L): A single-gene lysis system that triggers membrane fusion and cell wall degradation.
  • ϕX174 Lysis Protein (E): A classic model for chaperone-dependent lysis in E. coli.

These proteins serve as the biological foundation for our engineered modifications, providing the baseline for lysis efficiency and cellular impact.


Fig 2. Genome organization of ϕX174 and MS2 phages and similarities between their lysis proteins. The lysis genes of the two phages are shaded blue.

Important

Key Insights & Design Principles

  1. Functional Core: The C-terminal 25-30 residues of the L-protein are the functional heart of lysis, capable of dissipating the proton-motive force via hydrophilic pores (Goessens et al., 1988).

  2. Chaperone Evasion: Modifying the non-essential N-terminus allows the protein to evade DnaJ C-terminal sequestration, optimizing lysis independent of host chaperones (Chamakura et al., 2017).

  3. Critical Targeting: Bayer’s patches (membrane adhesion sites) are the decisive targets that determine the efficiency of the infection and lysis process (Chamakura et al., 2017).


Design Principle: “Nature evolves for survival stability; engineering designs for peak performance.”

References

  • Goessens, W.H.F., et al. (1988). A synthetic peptide corresponding to the C-terminal 25 residues of phage MS2-coded lysis protein… EMBO J, 7:867–873.
  • Chamakura, K.R., et al. (2017). MS2 lysis of Escherichia coli depends on host chaperone DnaJ. Journal of Bacteriology, 199(12).
Tip

Technical Approach: Chaperone-Independent Lysis

1. Re-evaluating Chaperone Requirements

To engineer a superior lysis protein, we must first address the host chaperone (DnaJ) dependencies:

  • Proteostasis: Preventing non-specific aggregation of lysis proteins.
  • Kinetic Control: Establishing a precision “Lysis Timer”.
  • Spatial Navigation: Ensuring accurate targeting to Bayer’s Patches.
  • Conformational Modulation: Facilitating smooth transmembrane insertion.

2. Engineering Chaperone-Independence

Goal: Achieving Autonomy while Preserving Lytic Potency.

  • Internalization of Function: Converting external chaperone support into intrinsic protein functionality.
  • Strategic Trade-off: Precision balancing between lysis timing and viral burst size.

3. Core Design Principles

  • Stability: Augmenting protein conformational stability.
  • Latency: Expanding the kinetic latency buffer for optimized maturation.
  • Affinity: Fortifying site-specific binding to the cell envelope.

Conclusion: By implementing these designs, we achieve a tempered infection that optimizes the delicate balance between lysis timing and total viral burst size.

Insights from DnaJ External Support

A. Host-Derived Recruitment (The "External Support")

  • System: DnaJ is an endogenous E. coli protein from the Hsp40 family.
  • Strategy: Instead of encoding its own chaperones, the MS2 phage recruits the host's system to assist in L-protein folding.
  • Implication: This biological "dependency" creates a vulnerability that our engineering aims to internalize.

B. N-Terminal: The Evolutionary Sandbox

  • Character: The N-terminal domain is nonessential for core lysis function, granting it high evolutionary latitude.
  • Potential: It allows for the exploration of flexible linker lengths and dynamic charge distributions.
  • Design Goal: This is the optimal entry point for "Tempered Self-Evolution," enabling the protein to reach a functional equilibrium.
Fig 3. Lysis protein sequence alignment. GenBank accession numbers: MS2 (CAA23990.1), M12 (AAF19634.1), fr (CAA33137.1), GA (CAA27498.1), JP34 (AAA72211.1), KU1 (AAF67675.1), Hgal1 (YP_007237174.1), C1 (YP_007237128.1), AP205 (NP_085469.1), PP7 (NP_042306.1), PRR1 (YP_717670.1). The conserved LS motif (yellow) is essential for lytic function and is preceded by a hydrophobic stretch (underlined) that facilitates membrane insertion. Highly basic N-termini (red) and acidic residues (blue) are strategically positioned to regulate electrostatic interactions.

🛠️ Implementation Process

Core Focus: Targeted Synergistic Effects

Overview: Redesign the fragile domains of the L-protein to bypass the DnaJ dependency. By creating a self-stabilizing and autonomously positioning structure, we aim to increase the protein's robustness and conformational stability.

 

STEP 1

Data Acquisition & Analysis

  • Gather multi-omic data: Lysis protein sequences & DNA structural motifs.
  • Identify conserved functional sites and critical domains.
  • Systematically review known mutational effects from global research databases.
🧬 Lysis Protein (MS2) UniProt: P03609 ↗
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
🧬 Host Chaperone: DnaJ (E. coli) UniProt: P08622 ↗
MAKQDYYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDSQKRAAYDQYGHAAFEQGGMGGGGFGGGADFSDIFGDVFGDIFGGGRGRQRAARGADLRYNMELTLEEAVRGVTKEIRIPTLEECDVCHGSGAKPGTQPQTCPTCHGSGQVQMRQGFFAVQQTCPHCQGRGTLIKDPCNKCHGHGRVERSKTLSVKIPAGVDTGDRIRLAGEGEAGEHGAPAGDLYVQVQVKQHPIFEREGNNLYCEVPINFAMAALGGEIEVPTLDGRVKLKVPGETQTGKLFRMRGKGVKSVRGGAQGDLLCRVVVETPVGLNERQKQLLQELQESFGGPTGEHNSPRSKSFFDGVKKFFDDLTR

💡 Known Mutational Effect

LS Dipeptide: The Leucine-Serine (LS) dipeptide of the MS2 L-protein (L48-S49 in the sequence above) is critical for lytic function.

Domain Organization

The L-protein is partitioned into four domains:

  • Domain 1 (N-terminus): Despite being positively charged and significant, it is dispensable for the lysis function itself (it primarily mediates binding with the host chaperone DnaJ).
  • Domain 2 to Domain 4 (C-terminal half): This region, which contains the LS motif, constitutes the essential core for executing lysis.

Design Focus: The design centers on Domain 1, increasing its hydrophobicity via amino acid substitution to ensure spontaneous folding and structural stability.


L-protein Structure

Fig.4 Schematic representation of the core structural domains of MS2 L-protein

 

STEP 2

💡 Select an approach to make sequence variants

Screening of mutation sites 1

Screening of mutation sites 2

Fig.5 & 6 Schematic of mutation site screening and selection process

Plan 1: Design Strategy


  1. K50L Mutation | Score: 2.56
    • Effect: Strengthens hydrophobic anchoring at the Domain 2/4 interface.
  2. Y39L Mutation (Domain 1) | Score: 2.24
    • Effect: Replaces Tyrosine (Y) with Leucine (L) to create a robust transmembrane helix.

Rationale: The goal is to enhance structural stability via increased hydrophobicity and better interface anchoring.
Note

Mutated Sequence: Plan 1 (Y39L & K50L)

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT


  • Position 39 (Y→L): LLR Score 2.24 | Enhances TM helix robustness.
  • Position 50 (K→L): LLR Score 2.56 | Strengthens hydrophobic anchoring.
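
A quick consistency check helps confirm that the listed substitutions really produce the Plan 1 sequence. The Python sketch below applies point mutations written in the usual "Y39L" notation to the wild-type sequence (UniProt P03609, as quoted in Step 1) and compares the result with the Plan 1 sequence; the helper name `apply_mutations` is purely illustrative.

```python
WILD_TYPE = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
PLAN_1 = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT"


def apply_mutations(sequence, mutations):
    """Apply point mutations such as 'Y39L' (1-based positions) to a protein sequence."""
    residues = list(sequence)
    for mutation in mutations:
        original, position, replacement = mutation[0], int(mutation[1:-1]), mutation[-1]
        found = residues[position - 1]
        if found != original:
            # Guard against off-by-one numbering errors.
            raise ValueError(f"{mutation}: expected {original} at {position}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


mutant = apply_mutations(WILD_TYPE, ["Y39L", "K50L"])
print("length:", len(mutant))
print("matches Plan 1 sequence:", mutant == PLAN_1)
```

The residue check inside the loop is the useful part: it fails loudly if a position number does not match the residue named in the mutation.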

Plan 1.5 | Design Action: Charge Enhancement

Design Action: Based on LLR scores, the C-to-R mutation yields a high score of 2.39. Mechanism: Increasing the positive charge enhances the protein’s autonomous attraction to the negatively charged cell membrane, thereby reducing its functional dependency on DnaJ escorting.

METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLLVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT


  • C-to-R Mutation (position 29, C29R): LLR Score 2.39 | Boosts electrostatic attraction.
  • Functional Impact: Bypasses DnaJ dependency for more autonomous membrane targeting.

Plan 2 | Design Action: Structural Rigidity Reinforcement

Design Action: Based on LLR scores, the S-to-Q mutation yields a high score of 2.39. Mechanism: Increasing the rigidity of Domain 1 facilitates its autonomous folding into a helical state, optimizing its structural readiness for membrane insertion.

METRFPQQQQQQTPASTNRRRPFKHEDYPRRRQQRSSTLLVLIFLAIFLSLFTNQLLLSLLEAVIRTVTTLQQLLT


  • S-to-Q Mutation (Pos 8-12 area): LLR Score 2.39 | Enhances alpha-helical propensity.
  • Functional Impact: Promotes spontaneous folding of Domain 1, ensuring structural stability before membrane interaction.

 

STEP 3

Mutated Simulation: Plan 1 (Y39L & K50L)

Fig. 7 Mutation simulation of Plan 1: Enhancing hydrophobic anchoring (K50L & Y39L)

Fig. 8 Mutation simulation of Plan 1: Enhancing hydrophobic anchoring (K50L & Y39L)

Fig. 9 Mutation simulation of Plan 1: Enhancing hydrophobic anchoring (K50L & Y39L)

Mutated Simulation: Plan 1.5 (Charge Enhancement)

Fig. 10 Surface electrostatic potential simulation after C-to-R mutation

Fig. 11 Interaction analysis between the positive charge cluster and lipid bilayer

Fig. 12 Stability and binding energy evaluation for Plan 1.5 design

Mutated Simulation: Plan 2 (Design Action)

Fig. 13 Structural comparison of Domain 1: Wild-type vs. S-to-Q mutated rigid state

Fig. 14 Helix propensity analysis and autonomous folding simulation

Fig. 15 Energy landscape of Plan 2 design during membrane transition

STEP 4 Submit 5 mutated sequences

🛠️ Implementation Process

Focus on: Integrated Strategy

Note

Design Logic: Structural Stability

🔹 Structural Stability: Intramolecular Salt-Bridge Lock
🔹 Rationale: If both the N- and C-termini of the L protein are highly enriched with positive charges, they will experience mutual electrostatic repulsion. In the absence of an anionic (negatively charged) chaperone like DnaJ to bridge them, this 'dual-cationic' structure causes the protein to behave like a tensed spring—becoming highly unstable and prone to non-specific aggregation.

Fig. 16 Integrated Strategy for Lysis Protein Analysis. (c) Multiple sequence alignment of the lysis proteins. GenBank accession numbers: MS2 (CAA23990.1), M12 (AAF19634.1), fr (CAA33137.1), GA (CAA27498.1), JP34 (AAA72211.1), KU1 (AAF67675.1), Hgal1 (YP_007237174.1), C1 (YP_007237128.1), AP205 (NP_085469.1), PP7 (NP_042306.1), and PRR1 (YP_717670.1).


  • Sequence Motifs: The conserved LS motif is highlighted in yellow, preceded by a stretch of hydrophobic residues (underlined) and highly basic N-termini.
  • Amino Acid Properties: Basic and acidic residues are highlighted in red and blue, respectively.
  • Mutagenesis Analysis: Green asterisks (*) mark every codon position where a single-nucleotide change can produce a nonsense (stop) codon.
  • Underlined asterisks (*) mark positions where no nonsense mutants were obtained in the experimental mutagenesis.
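
The idea of a codon being "one nucleotide away" from a stop codon can be checked mechanically. The Python sketch below enumerates all single-base variants of each codon and reports which codon positions can reach TAA, TAG, or TGA; the 15-nt example is a hypothetical fragment encoding M-E-T-R-F and is not claimed to be the native MS2 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nonsense_accessible(codon):
    """True if a single-nucleotide change can turn this codon into a stop codon."""
    codon = codon.upper()
    for index in range(3):
        for base in "ACGT":
            if base == codon[index]:
                continue
            variant = codon[:index] + base + codon[index + 1:]
            if variant in STOP_CODONS:
                return True
    return False


def accessible_positions(coding_dna):
    """Return 1-based codon positions from which a stop codon is one substitution away."""
    usable = len(coding_dna) - len(coding_dna) % 3  # ignore a trailing partial codon
    codons = [coding_dna[i:i + 3] for i in range(0, usable, 3)]
    return [n for n, codon in enumerate(codons, start=1) if nonsense_accessible(codon)]


# Hypothetical fragment: ATG GAA ACC CGA TTC (M E T R F).
example = "ATGGAAACCCGATTC"
print(accessible_positions(example))
```

Choosing synonymous codons that do not appear in this list is one simple way to lower the chance that a random point mutation truncates the protein.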

Protein Data Card: Lysis Protein (MS2)

1. Basic Information

  • UniProt ID: P03609 (LYS_BPMS2)
  • Full Name: Lysis protein
  • Organism: Escherichia phage MS2 (OX=12022)
  • Evidence: PE=2 (Evidence at protein level)
  • Version: SV=1

2. Sequence Analysis

N-terminal Start: M (Methionine)
C-terminal End: T (Threonine)

Full Sequence Segment:

         10          20          30
METRF PQQSQ QTPAS TNRRR PFKHE DYPCR RQQRS  (N-terminal, residues 1-35)

   40          50          60          70
STLYV LIFLA IFLSK FTNQL LLSLL EAVIR TVTTL QQLLT  (C-terminal, residues 36-75)
  [-- highlighted: 38-60 ---]

Important

DESIGN OPTIMIZATION & STABILITY RATIONALE

The design strictly follows the principles of in-situ salt-bridge locking to stabilize the autoinhibitory state, while carefully optimizing codon usage to avoid the introduction of nonsense mutations that would truncate the L-protein.

Engineered Sequence 1

Rationale: First, the electrostatic repulsion is converted into intramolecular attraction, where the N- and C-termini form multiple E-R or D-K pairs that are spatially proximal, establishing stable multivalent salt bridges.

Full Sequence Segment:

         10          20          30
METRF PQQSQ QTPAS TNRRR PFKHE DYPCR RQQRS  (N-terminal, residues 1-35)
[-- N-terminal: Cationic/Basic Region --]

   40          50          60          70
GGSGG SGEDD ELYVL IFLAI FLSKF TNQLL LSLLR RRW  (C-terminal, residues 36-73)
[Linker][ C-terminal: Anionic & Hydrophobic ]

Fig. 17 AlphaFold 3 and ColabFold-Based Structural Analysis of De Novo Designed MS2 Lysis Protein. (a) High-confidence structural model predicted by AlphaFold 3, highlighting the optimized helical regions. (b) Comparative alignment using ColabFold, demonstrating the consistency of the intramolecular salt-bridge formation between the N- and C-termini.

Preventing Premature Lysis: Proline Switch / pH-Responsive Histidine Switch

Rationale: Bayer's patches are assumed to be mildly acidic (pH 5.5–6.0); the normal physiological pH (7.2–7.4) shifts toward acidity during active infection.

Important
CRITICAL DESIGN: TRIGGERED RELEASE & MEMBRANE KINETICS
Subsequently, histidine and proline switches are incorporated into the N-terminus to ensure that the 'salt-bridge lock' is specifically released at the Bayer's patches, thereby optimizing the viral burst size. By enhancing the N-terminal hydrophobic masking, the solubility of the L protein is improved, successfully delaying its insertion into the host membrane.
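
The logic of a pH-responsive histidine switch can be illustrated with the Henderson–Hasselbalch relation, which gives the fraction of protonated (positively charged) side chains at a given pH. The Python sketch below assumes a side-chain pKa of 6.0, a typical textbook value for free histidine; the effective pKa inside a folded protein can differ, so the numbers are indicative only.

```python
def protonated_fraction(ph, pka=6.0):
    """Fraction of histidine side chains carrying a proton at the given pH.

    Henderson-Hasselbalch: fraction = 1 / (1 + 10 ** (pH - pKa)).
    The default pKa of 6.0 is an assumed textbook value.
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


# pH values quoted in the rationale: Bayer's patches vs. physiological pH.
for ph in (5.5, 6.0, 7.2, 7.4):
    print(f"pH {ph}: {protonated_fraction(ph):.2f} protonated")
```

Under this assumption most histidines are neutral at physiological pH and a large share become charged in the 5.5–6.0 range, which is the window in which the switch is intended to release the salt-bridge lock.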

Engineered Sequence 2

Full Sequence Segment:

         10          20          30
METRF PQQSQ QTPAS TNRRR PFKHE DYPCR RQQRS  (N-terminal, residues 1-35)
[-- N-terminal: Cationic/Basic Region --]

   40          50          60          70
GGSGG SGHPH EDDEL YVLIF LAIFL SKFTN QLLLS LLRRR W  (C-terminal, residues 36-76)
        ^^^
[Linker]    [ C-terminal: Anionic & Hydrophobic ]

Stage 2: Synthesize the L-protein mutant gene via Twist

Full Sequence Segment:

        10         20         30         40         50
CTCGAGGGTA CCACCGGTGA GTCCCATGGC ATATGGGGCC CGTGCACGGC (Row 1)
GCGCCGCTAG CGCGGCCGCG GTACCATGCA TCCTAGGGGA TCCGAAGACA (Row 2)
GATCTTTAAT TAACTCGAGG GGCCCCACGT CGGTCTCCGT CTCATCGATT (Row 3)
TCGAAATCGA TCGGCCGGAG CTCGAATTCG ATATCCGTCT CAAGCTTGTT (Row 4)
AACGGTACCA CGCGTCTGCA GCGATCGCAG CTGGAGCTCC CGCGGGTCGA (Row 5)
CCCCGGGTAC GTAACTAGTG CATGCCTCGA GCCCGGGTCT AGACTCGAGC (Row 6)
CCGGGTAACT CGAG                                        (Row 7)

Stage 3: Clone the L-protein mutant gene into a plasmid using Gibson

🧬 Stage 3: Gibson Assembly for L-Protein Cloning

Objective: A highly efficient, seamless cloning method to insert the optimized L-protein sequence into a linearized expression vector.


  • Vector Preparation

    • Linearize your target plasmid (e.g., pET or pBAD) via Restriction Digestion (using high-fidelity enzymes) or Inverse PCR.
    • Note: Ensure the linearized vector is purified to remove any residual circular template.
  • Insert Preparation

    • Perform PCR on your optimized L-protein sequence.
    • Key Requirement: Use primers designed to add 20–40 bp overlapping arms identical to the ends of your linearized plasmid.
  • The Master Mix Reaction (one-pot reaction)

    • Mix the vector and insert (molar ratio ~1:2) with the Gibson Assembly Master Mix.
    • Reaction Condition: Incubate at 50°C for 15–60 minutes.

| Enzyme | Action | Description |
| --- | --- | --- |
| Exonuclease | Chew-back | Removes nucleotides from the 5’ ends, creating single-stranded 3’ overhangs. |
| DNA Polymerase | Gap-fill | Incorporates nucleotides after the overlapping strands anneal. |
| DNA Ligase | Nick-seal | Covalently joins the DNA fragments into a circular, double-stranded plasmid. |

  • Transformation
    • Transform the assembled DNA into competent E. coli cells (e.g., DH5α or BL21).
    • Process: 1. Heat-shock (42 °C) or electroporation; 2. Recovery in SOC/LB medium; 3. Plating on selective agar.
    • Outcome: Replicates the assembled plasmid for subsequent sequence verification.
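
Setting up the ~1:2 vector-to-insert molar ratio requires converting DNA mass to moles. The Python sketch below uses the standard approximation of 650 Da per base pair of double-stranded DNA; the vector and insert sizes in the example (5000 bp and 300 bp) are hypothetical and should be replaced with the actual construct lengths.

```python
DALTONS_PER_BP = 650.0  # average mass of one dsDNA base pair (standard approximation)


def ng_to_pmol(mass_ng, length_bp):
    """Convert a dsDNA mass in ng to pmol for a fragment of the given length."""
    return mass_ng * 1000.0 / (length_bp * DALTONS_PER_BP)


def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio=2.0):
    """Mass of insert (ng) needed for the chosen insert:vector molar ratio."""
    insert_pmol = ng_to_pmol(vector_ng, vector_bp) * insert_to_vector_ratio
    return insert_pmol * insert_bp * DALTONS_PER_BP / 1000.0


# Hypothetical example: 50 ng of a 5000 bp vector with a 300 bp insert at 1:2.
print(f"vector: {ng_to_pmol(50, 5000):.4f} pmol")
print(f"insert needed: {insert_mass_ng(50, 5000, 300):.1f} ng")
```

Because short inserts weigh far less per mole than the vector, the required insert mass is usually only a few nanograms, so accurate quantification of the PCR product matters.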


Stage 4: Test the L-protein mutant’s structural integrity using the Nuclera system

🧪 Stage 4: Cell-Free Synthesis & Quality Control

System: Nuclera eProtein Desktop Platform / Microfluidic Integration

  • DNA Input (substrate loading)

    • Load the constructed plasmid (from Stage 3) or the high-purity PCR-amplified linear DNA into the Nuclera microfluidic chip.
    • Requirement: Ensure DNA concentration meets the chip’s specified detection range.
  • Microscale Synthesis

    • The system executes automated coupled transcription and translation (TX-TL).
    • Synthesis occurs within discrete microdroplets on-chip, enabling rapid production of the L-protein mutant in a cell-free environment.
  • Integrity Check

    • Utilizes integrated biosensors for real-time monitoring.
    • Focus: Confirms the protein’s biophysical state, specifically targeting a monomeric and soluble profile to avoid aggregation or misfolding.

Stage 5: Test the L-protein in E. coli with plaque assays

🧫 Stage 5: Functional Validation via Plaque Assays

Objective: Evaluate the lysis activity of the L-protein mutant in vivo using E. coli host systems.

  • Induction of Expression

    • Culture the transformed E. coli cells until they reach mid-log phase (OD600 ≈ 0.4–0.6).
    • Trigger protein synthesis using a specific inducer:
      • L-arabinose (for pBAD vectors) or IPTG (for pET vectors).
    • Note: Maintain optimal temperature to balance protein folding and expression levels.
  • Plaque Formation

    • Employ the Double-layer Agar Technique.
    • Mechanism: Functional L-protein triggers localized host cell lysis.
    • Observation: Formation of visible, clear circular zones (Plaques) within the bacterial lawn.
  • Efficiency Analysis

    • PFU Calculation: Quantify lysis efficiency by calculating Plaque Forming Units (PFU/mL).
    • Phenotypic Mapping: Measure plaque diameters to assess:
      1. The stability provided by the Salt-bridge Lock.
      2. The impact of enhanced Bayer’s Patch binding affinity on lysis kinetics.
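
The PFU calculation itself is a one-line formula: plaques counted, multiplied by the fold dilution, divided by the volume plated. The Python sketch below shows it with hypothetical numbers (42 plaques on the 10^-6 dilution plate, 0.1 mL plated); the function name and example values are illustrative only.

```python
def pfu_per_ml(plaque_count, dilution_factor, plated_volume_ml):
    """Titer in PFU/mL.

    dilution_factor is the fold dilution of the plated sample,
    e.g. 1e6 for a 10^-6 dilution.
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return plaque_count * dilution_factor / plated_volume_ml


# Hypothetical plate: 42 plaques, 10^-6 dilution, 0.1 mL plated.
titer = pfu_per_ml(42, 1e6, 0.1)
print(f"{titer:.2e} PFU/mL")
```

Plates with roughly 30 to 300 plaques are usually preferred for counting, so in practice the dilution is chosen to land in that range.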

individual-final-project

HTGAA 2026: Individual Final Project Documentation

🤴The Prometheus Symbiont🤴

SECTION 1: ABSTRACT
Provide a concise, self-contained summary of your project (minimum 150 words)
The abstract should allow a reader to understand the purpose, approach, and expected outcomes of the work without referring to other sections.

Abstract

The Prometheus Symbiont is initially proposed as an ideal system based on the principles of natural photosynthesis and a continuous directed evolution platform. Aimed at mimicking natural systems, the ideal concept involves converting photosynthetic membranes into bio-self-powered mechanical systems, thereby enabling robots to replenish their own energy by simulating the foraging behavior of the leaf sheep (Costasiella kuroshimae).


This research primarily focuses on two core advancements:

  • A. Precise Control of Photosynthetic Networks: The study explicitly reveals that calcium ion (Ca2+) concentration acts as the central controller to precisely regulate the photosynthetic network. This uncovers a universal photosynthetic law in nature and provides a clear, well-defined technical direction for synthetic biology.


  • B. Construction of Long-Endurance Biological Systems: Since the endurance capacity of bio-self-powered systems is critical to determining the future of this field, this project constructs an ideal long-endurance biological system based on an understanding of natural photosynthetic principles. Furthermore, it attempts to maintain its operation at a low cost of biological consumables through the development of continuous directed evolution technology, ultimately realizing the initial vision of the project.


⚠️ Notice: Please respect the original concepts presented here. If you wish to reference, cite, or build upon this research, kindly provide appropriate credit to the authors.

Future Vision & Interdisciplinary Roadmap

“The nurture of this story was born out of a beautiful serendipity.” Viewing this as the inception of my journey, I am fully committed to transforming this nascent vision into a groundbreaking reality.

As the historic birthplace of affective computing and pioneering robotics, MIT represents the ultimate academic environment where I aspire to fully realize and validate these concepts.

To bridge the gap between this conceptual framework and its rigorous technical execution, my immediate roadmap focuses on deeply exploring natural photosynthetic mechanisms and species-specific characterizations, while actively reinforcing my foundation across the following interdisciplinary domains:

  • Biological Mechanisms & Exploration:
    • Uncovering the fundamental principles of natural photosynthesis through targeted experimentation.
    • Performing rigorous species-specific biological identification to map out energy-harvesting behaviors.
  • Engineering & Computational Synthesis:
    • Supplementing my knowledge in electrical and mechanical engineering to build bio-self-powered robotic systems.
    • Advancing my proficiency in computer science, including but not limited to the theoretical modeling and execution of technical pathways for synthetic control networks.

I eagerly embrace this current stage as the absolute starting point of my research project, driven by the profound curiosity that sparked this journey in the first place.

⚠️ Notice: If a robotic outbreak is bound to erupt, then let humanity evolve into a force that surpasses the mechanical. Restrain the machine with the power of the machine. For love is the ultimate meaning of cosmic evolution.

🌌 Philosophical Vision

In a mechanical era dominated by silicon-based life and supreme computational power, absolute rationality teeters on the brink of destruction.

While machines attempt to format the universe, humanity chooses to harness nature’s most ancient energies—diverse photosynthesis and biological symbiosis—to achieve the transcendent evolution of both body and will.

This is no mere struggle for survival; it is an evolutionary awakening.

Machines do not comprehend sacrifice; algorithms can never understand the impulse to protect.

The underlying logic of the “Promethean Fire” we have kindled is not just the precise regulation of calcium ions, but a supreme emotion flowing deep within our genes.

PROJECT AIMS

  • Background

cover image

cover image

  • Aim 1: Experimental Aim: This study establishes the central, non-negotiable status of calcium ions (Ca2+) in sustaining autotrophic life forms. Changes in calcium concentration dictate whether an autotrophic organism operates in an ‘efficient energy-storing’ (growth) mode or a ‘safe energy-dissipating’ (defense) mode, so that Ca2+ serves as the central controller of the entire system.

cover image

cover image

cover image

cover image

  • Aim 2: Development Aim: Long-Endurance Biological Systems

  • Step 1: Extraction of Photosynthetic Biomembranes and “Electric Bridge” Construction (Photosynthetic Membrane Electro-Conversion)

    To achieve bio-self-powered machinery, the primary prerequisite is to “extract” the electrons generated during photosynthesis and convert them into electrical currents.

  • Chassis Organism Selection: Mimicking the inter-kingdom utilization observed in the “leaf sheep,” Cyanobacteria (e.g., Synechococcus) or highly tolerant red algae are selected as the foundational materials. Their photosystem II (PSII) complexes are the most amenable to genetic engineering.

  • Biophotovoltaic Cell (BPV) Assembly: Thylakoid membranes are isolated and adsorbed onto the surfaces of highly conductive nanomaterials, such as graphene, carbon nanotubes, or the conductive polymer PEDOT:PSS.

  • Electron Mediator Modification: Exogenous electron-transport mediators (such as quinone-like compounds) are introduced between the photosynthetic membrane and the anode. Alternatively, cyanobacteria can be genetically engineered to express exogenous cytochromes. This allows electrons derived from the water-splitting reaction in the light phase to “tunnel” directly onto the machine’s electrodes, achieving the direct conversion of light energy into electrical energy.
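To give a sense of scale for the light-to-electricity conversion described in Step 1, the following minimal sketch estimates an upper bound on photocurrent density from a photon flux. It assumes one electron delivered per absorbed photon at PSII and treats the external quantum efficiency as a free parameter; the function name and the efficiency values are illustrative assumptions, not measurements from this project.

```python
# Upper-bound photocurrent estimate for a biophotovoltaic anode.
# Assumption: each absorbed photon at PSII yields at most one electron.

FARADAY = 96485.332  # coulombs per mole of electrons


def max_photocurrent_density(photon_flux_umol, external_quantum_efficiency=1.0):
    """Return the photocurrent density in A/m^2.

    photon_flux_umol: photon flux in umol photons m^-2 s^-1.
    external_quantum_efficiency: fraction of incident photons that end up
        as collected electrons (hypothetical parameter, 0 to 1).
    """
    if not 0.0 <= external_quantum_efficiency <= 1.0:
        raise ValueError("external_quantum_efficiency must lie in [0, 1]")
    mol_electrons_per_m2_s = photon_flux_umol * 1e-6 * external_quantum_efficiency
    return mol_electrons_per_m2_s * FARADAY


if __name__ == "__main__":
    for eqe in (1.0, 0.01):
        print(eqe, max_photocurrent_density(2000.0, eqe))
```

Real devices are expected to fall far below the ideal bound, so the efficiency argument is where mediator chemistry and electrode design would enter the estimate.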

cover image

cover image

cover image

  • Step 2: Development of a Continuous Directed Evolution Platform (Low-Cost Operational Maintenance)

    The most fatal vulnerability of biocatalysts (photosynthetic membranes and enzymes) within an engineered mechanical environment is their susceptibility to aging and inactivation (photoinhibitory damage). If “biological consumables” require frequent manual replacement, the operational cost becomes prohibitive. Therefore, a “living in vitro evolution chip” must be integrated internally within the machinery.

  • Microfluidic Adaptive Evolution Chip (A variant of Microfluidic Phage-Assisted Continuous Evolution - MPACE): The autotrophic organisms (cyanobacteria) are confined within an on-board microfluidic chip inside the machine.

  • Introduction of Error-Prone PCR or Mutagens: A minimal, precisely controllable mutation rate is sustained inside the microfluidic chip.

  • Establishment of “Selection Pressure”: Mechanical stress and light-intensity adversity are deliberately introduced. The internal environment of the machine simulates high light intensities (inducing photodamage) or fluctuating temperatures. Only the autotrophic strains that evolve an “extremely rapid self-healing rate of the D1 protein (the core repair protein of PSII)” or “exceptional thermal stability” can survive within the chip and continuously generate electricity.

  • Low-Cost Maintenance: The microfluidic system automatically replenishes trace amounts of sterile water containing essential inorganic salts (serving as the biological consumables). This enables the fittest strains to autonomously divide, replicate, and replace degraded, inactive cells within the machine, thereby achieving low-cost self-proliferation and iteration of biological consumables.
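The selection logic of Step 2 can be illustrated with a deliberately simple two-strain model. This is a sketch under stated assumptions: a rare variant (for example, one with faster D1 repair) has a constant relative fitness advantage over the wild type under high light, generations are discrete, and no new mutations arise during the run. The fitness values and starting frequency are hypothetical.

```python
# Deterministic two-strain selection: how quickly a fitter variant takes
# over a confined population under constant selection pressure.

def select(frequency, fitness_variant, fitness_wildtype, generations):
    """Return the variant frequency after each generation (index 0 = start)."""
    history = [frequency]
    for _ in range(generations):
        mean_fitness = (frequency * fitness_variant
                        + (1.0 - frequency) * fitness_wildtype)
        # Standard replicator update: share of offspring from the variant.
        frequency = frequency * fitness_variant / mean_fitness
        history.append(frequency)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: variant starts at 0.1% with a 20% advantage.
    trajectory = select(0.001, 1.2, 1.0, 60)
    for generation in (0, 20, 40, 60):
        print(generation, round(trajectory[generation], 4))
```

In this model the odds of the variant grow by the fitness ratio every generation, which is why even a small advantage can replace degraded cells within tens of generations.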

cover image

cover image

cover image

  • Step 3: Deployment of a Calcium Ion (Ca2+) Central Control Interface (Machine-to-Bio Communication)

    How does the machine’s silicon-based chip discern whether the biological power network is currently in a “Grow” (energy accumulation) or “Defense” (energy consumption / self-preservation) state?

  • Calcium Fluorescence / Electrochemical Dual-Mode Sensor: Genetically encoded calcium indicators (such as the GCaMP protein series) are introduced into the autotrophic chassis organisms. When external light intensity overloads and threatens to incinerate the biomembrane, the intracellular Ca2+ concentration surges dramatically, triggering fluorescent flashes (or generating specific trans-membrane calcium currents).

  • Control Logic Response of the Soft Mechanical System:
    • Efficient Energy-Storing Mode (Grow): When Ca2+ remains at an optimal, moderate concentration, the machine’s master control chip receives the signal to operate at full power, diverting surplus electricity into supercapacitors.
    • Safe Energy-Dissipating Mode (Defense): Once the Ca2+ concentration breaches a critical hazard threshold (signaling imminent photosystem overload and damage), the machine’s actuators (such as mechanical bio-leaves or artificial shells) execute an immediate physical response, such as altering angles to shade the system or decreasing the load current. This provides the internal photosynthetic membranes with a vital window for “respite and self-repair.”
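The Grow/Defense switching described above could be sketched as a small state machine driven by the relative fluorescence signal. The two thresholds and the hysteresis band are assumptions added for illustration (the text only specifies a single critical threshold); hysteresis is included so that a signal hovering near the threshold does not make the actuators chatter.

```python
# Two-state controller: Grow <-> Defense, driven by the calcium reporter.
from dataclasses import dataclass


@dataclass
class ModeController:
    defense_threshold: float = 1.0   # dF/F0 at which Defense starts (assumed)
    recovery_threshold: float = 0.5  # dF/F0 at which Grow resumes (assumed)
    mode: str = "Grow"

    def update(self, delta_f_over_f0: float) -> str:
        """Feed one reporter reading and return the resulting mode."""
        if self.mode == "Grow" and delta_f_over_f0 >= self.defense_threshold:
            self.mode = "Defense"   # shade the membranes / reduce load current
        elif self.mode == "Defense" and delta_f_over_f0 <= self.recovery_threshold:
            self.mode = "Grow"      # resume full power and charge storage
        return self.mode


if __name__ == "__main__":
    controller = ModeController()
    for reading in (0.05, 0.18, 3.45, 0.8, 0.4):
        print(reading, controller.update(reading))
```

A reading of 0.8 after a stress event keeps the system in Defense because it sits between the two thresholds; only a drop below the recovery threshold returns it to Grow.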

cover image

  • Aim 3: Visionary Aim: Self-Sustaining Living Autonomy

    If fully realized, the long-term vision of this project extends far beyond the creation of a standalone bio-hybrid entity. It seeks to redefine the relationship between biological systems and artificial machines, pioneering a new domain of Self-Sustaining Living Autonomy. By establishing a universal control interface rooted in natural evolutionary laws, this research aims to transition technology from a reliance on finite, brittle hardware toward self-healing, adaptive organic architectures.

The broader realization of this concept will drive profound impacts across three transformative dimensions:

  1. Challenging an Existing Paradigm: Overthrowing the Rigid Separation of Chassis and Energy

     Current robotics and automated systems operate under a strict, bifurcated paradigm: a rigid mechanical chassis powered by external, finite energy storage (such as lithium-ion batteries). This architecture inherently limits operational lifespan, requires resource-intensive manufacturing, and leads to electronic waste.

     The Shift: This project directly challenges that limitation by introducing Inter-Kingdom Symbiotic Architecture. Instead of treating energy as a static payload to be consumed, the system treats energy generation as a dynamic, living metabolism.

     The Impact: By integrating photosynthetic membranes capable of autonomous water-splitting, energy generation becomes decentralized and localized. Machines will no longer “recharge” at fixed grid points; instead, they will “forage” for ambient light and minimal trace elements, mirroring natural biological entities. This merges energy and structure into a single, self-renewing tissue, paving the way for truly autonomous deployment in inaccessible or extreme environments.

  2. Addressing a Major Barrier: Breaking the Bio-Component Lifespan Bottleneck

     The foremost obstacle preventing the real-world deployment of bio-hybrid electronics and synthetic biological devices is the extreme fragility and ephemeral nature of living components. Enzymes denature, isolated membranes experience photoinhibitory damage, and wild-type cells quickly degrade when removed from their native ecosystems, making manual replacement costs prohibitive.

     The Solution: This project systematically breaks through this barrier by embedding a Microfluidic Adaptive Evolution Platform directly within the machine’s internal architecture.

     The Impact: Rather than attempting to unnaturally preserve a static biological component, the system leans into the fundamental strength of biology: evolutionary adaptation. By maintaining a continuous, controlled mutation rate under localized selection pressures, the machine forces its internal autotrophic strains to constantly self-correct and optimize. This achieves automated, low-cost self-proliferation and cellular replenishment, transforming a historically fragile variable into a self-healing, long-endurance asset.

  3. Enabling a New Experimental Capability: The Ca2+ Universal Control Interface

     Historically, communication between synthetic biology and silicon engineering has suffered from a profound translation gap. Interfacing electronic circuits with biochemical pathways typically requires complex, slow, and indirect multi-step transduction methods.

     The Breakthrough: This project establishes a direct, real-world translation layer by positioning Calcium Ion (Ca2+) dynamics as the primary, dual-mode communication bridge.

     The Impact: Because Ca2+ concentrations serve as the natural central controller regulating the shift between optimal growth (Grow) and photoprotective dissipation (Defense), this interface allows real-time biochemical states to be directly read as micro-electrical or fluorescent signals by soft-robotic actuators. Conversely, it enables the machine to dynamically adapt its physical posture to shelter its internal organic components. This introduces a brand-new research approach: Closed-Loop Bio-Digital Cybernetics, where artificial intelligence and biological feedback loops co-evolve to govern a unified system’s survival.

cover image

⚠️ Notice: This study redefines the principles of synthetic biology, moving away from a strict reliance on traditional molecular biology theories and techniques. Furthermore, it can be redefined as an approach that is inspired by nature, integrates existing tools, and unlocks an infinite space for new media, technologies, and products.

SECTION 3: BACKGROUND

  • Background and Literature Context
  • Literature Summary

The field of biophotovoltaics (BPVs) has made significant strides in harnessing solar energy through biological frameworks, yet operational longevity remains a primary bottleneck restricting its real-world implementation. Pankratov et al. (2017) demonstrated that isolating thylakoid membranes and interfacing them with carbon nanotube networks can establish direct electron transfer, achieving photo-electrochemical conversion with enhanced performance in vitro [1]. However, such physical bio-interfaces remain vulnerable to rapid degradation caused by photoinhibitory damage to the biological components. On the evolution side, Miller, Wang, and Liu (2020) published detailed protocols for phage-assisted continuous and non-continuous evolution (PACE and PANCE) [2]. The on-chip Microfluidic Phage-Assisted Continuous Evolution (MPACE) platform proposed in this project would adapt that framework, with the goal of driving the rapid adaptation of photoprotective mechanisms in cyanobacterial host strains under severe light-intensity selection pressure; this application is an aim of the present project rather than a result reported in [2].

Although these works have independently advanced biophotovoltaic conversion [1] and continuous directed evolution [2], current literature largely treats the two systems as decoupled paradigms. A profound knowledge gap remains regarding how an engineered machinery matrix can internally host, sustain, and guide the continuous autonomous evolution of its own integrated photosynthetic consumables.

  • Project Novelty and Innovation

This project is highly innovative as it introduces the concept of Inter-Kingdom Symbiotic Architecture, pioneering the integration of an on-board MPACE-derived platform directly into a soft-robotic chassis to achieve a self-sustaining energy metabolism. By transforming the traditionally static bio-component into a living, evolving ecosystem, this work breaks the historical boundaries of synthetic biology, shifting the paradigm from rigid structural engineering to adaptive organic cybernetics. Furthermore, the deployment of a dual-mode Calcium Ion (Ca2+) Central Control Interface introduces a novel methodology for machine-to-bio communication, translating real-time cellular photoprotection mechanisms directly into mechanical robotic responses.

  • Project Importance and Impact

  • Importance of the problem: This project directly addresses the fatal vulnerability of bio-hybrid electronics: the rapid degradation and short lifespan of living biological components in engineered environments. Overcoming this barrier is significant because it liberates synthetic biological devices from the necessity of frequent, costly, and manual component replacement.

  • Broader societal contribution: If successful, this work will fundamentally improve our technical capability by establishing a closed-loop translation layer between silicon circuits and biochemical networks.

  • Field-level change: Beyond the immediate research context, this technology could benefit society by laying the foundational groundwork for self-healing, decentralized green energy systems deployed in extreme or inaccessible environments. Ultimately, the concepts verified here could shift the field of autonomous robotics away from a reliance on environmentally damaging, resource-intensive lithium-ion hardware toward fully sustainable, carbon-neutral, self-renewing tissue architectures.

  • Ethical Implications

  • The deployment of a self-sustaining, evolving bio-hybrid system introduces unique ethical considerations that align with the principles of beneficence, responsibility, and non-maleficence. The principle of beneficence is actively fulfilled as this research promotes public health and environmental sustainability by presenting a non-polluting, carbon-capturing alternative to heavy-metal battery waste. However, the integration of continuous directed evolution within an artificial machinery matrix triggers the principle of responsibility and non-maleficence regarding biocontainment. Because the system is designed to autonomously mutate and adapt its internal autotrophic chassis (cyanobacteria) to survive high environmental stress, there is an inherent, albeit localized, risk of generating hyper-resilient biological strains that could disrupt native microbial ecosystems if an unmonitored environmental breach occurs.

  • Ethical Safeguards and Alternatives

  • To guarantee that this research is conducted under the highest ethical standards, we propose the implementation of genetic “kill-switches” and absolute physical encapsulation within the microfluidic evolution chip. A potential unintended consequence of our proposed physical containment is that restricted fluidic pressure might inadvertently select for strains with altered cell-wall morphology, potentially changing their environmental fitness profiles. Furthermore, our underlying assumption that the mutation rate can be perfectly bounded by microfluidic mutagens contains uncertainties; we could be wrong if horizontal gene transfer occurs within the system, accelerating evolution beyond predicted models. As a robust alternative to continuous genetic mutagenesis, we considered utilizing synthetic artificial encasements or synthetic non-living enzymatic cascades; however, these alternatives lack the vital self-healing capacity required to achieve true long-endurance autonomy, justifying the controlled use of our evolutionary framework under rigorous biosafety level containment.


References

[1] Pankratov, D., Pankratova, G., Dyachkova, T. P., Falkman, P., Åkerlund, H. E., Toscano, M. D., ... & Gorton, L. (2017). Supercapacitive biosolar cell driven by direct electron transfer between photosynthetic membranes and CNT networks with enhanced performance. ACS Energy Letters, 2(11), 2635-2639.

[2] Miller, S. M., Wang, T., & Liu, D. R. (2020). Phage-assisted continuous and non-continuous evolution. Nature Protocols, 15(12), 4101-4127.

Machine-Bio Boundaries, Evolutionary Irreversibility, and Technological Responsibility

By endowing a mechanical system with an autonomous, internal continuous directed evolution platform, this project fundamentally challenges the traditional boundary separating artificial constructs from living organisms, necessitating strict containment under the principle of responsibility.

  • Proposed Actions and Public Health Relevance: We propose integrating a hardcoded “Evolutionary Generation Ceiling” into the machine’s control logic: once the internal cyanobacterial population surpasses a specific generational threshold, the microfluidic channels automatically release a chelating agent that terminates cellular activity while being harmless to the surrounding environment. This action is relevant to public health as a preemptive measure against the system accidentally evolving hyper-resilient mutant strains capable of resisting standard antibiotics or industrial disinfectants, thereby reducing potential ecological and public-health risks.
  • Unintended Consequences and Potential Errors: A potential unintended consequence of this mandatory evolutionary termination is that it might cause a sudden, catastrophic failure of the entire power system if the machine encounters severe, rapidly shifting environmental adversity exactly when the generational cap is reached. Furthermore, we might have miscalculated the underlying mathematical models of mutational accumulation; our assumptions would be wrong if the biological organisms bypass the genetic counters through cryptic mutations under extreme survival pressures.
  • Risk Alternatives: An alternative to continuous genetic mutagenesis is utilizing entirely cell-free transcription-translation (TX-TL) systems for in vitro electricity generation. However, because cell-free systems completely lack the vital self-repairing and adaptive capabilities required to sustain long-endurance autonomy, maintaining a living, yet generation-bounded evolutionary framework remains the only scientifically viable solution for this project.
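One way the “Evolutionary Generation Ceiling” could be tracked in software is sketched below. It assumes the chip behaves like a steady-state continuous culture, where the specific growth rate equals the dilution rate and one generation takes ln(2) divided by that rate. The function names, the dilution rate, and the cap of 500 generations are hypothetical placeholders.

```python
# Generation counter for a continuously diluted culture.
import math


def generations_elapsed(dilution_rate_per_hour, hours):
    """Generations completed, assuming growth rate == dilution rate."""
    return dilution_rate_per_hour * hours / math.log(2)


def ceiling_reached(dilution_rate_per_hour, hours, generation_cap):
    """True once the culture has met or exceeded the generation cap."""
    return generations_elapsed(dilution_rate_per_hour, hours) >= generation_cap


if __name__ == "__main__":
    # Hypothetical culture doubling once every 24 hours.
    rate = math.log(2) / 24.0
    print(generations_elapsed(rate, 240.0))
    print(ceiling_reached(rate, 240.0, generation_cap=500))
```

A time-based counter like this cannot be bypassed by mutations in the cells themselves, which is one possible response to the concern about cryptic mutations evading genetic counters; it does, however, depend on the steady-state assumption holding.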

SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

Use Claude AI skills to refine your HTGAA final project experimental design here

Create a detailed experimental plan for your final project. Include a timeline for each part of your experimental plan (i.e., how long you would expect each step in your final project to take). (min. 15 lines/sentences—a numbered list is acceptable)

Include specific methods/tools/technologies/biological concepts for each part of the final project and analysis

This section will be used to determine whether the experiments are well designed, feasible, and likely to succeed in testing your hypothesis. Often this section is broken into discrete tasks/sub-aims.

For each experiment and/or analysis, include a description of your expected results

If possible, include figure(s) that visually show a broad workflow of your project or a specific aspect of your experimental plan. Reminder: All HTGAA projects must include some DNA design! Make sure this form is submitted.

We discussed and practiced various techniques related to synthetic biology throughout the semester. Place a check next to the techniques relevant to your project.

A. Detailed Experimental Plan & Timeline
Note: This 4-month experimental workflow integrates biophysical phenotyping with structural biology and continuous directed evolution to dissect plant long-distance signaling and optimize photosynthetic efficiency.

  • Month 1: MIFE System Setup and Initial Electrophysiological Optimization
    • Set up the Non-invasive Microelectrode Ion Flux Estimation (MIFE) system workflow to measure net $\text{Ca}^{2+}$, $\text{H}^+$, and $\text{K}^+$ fluxes in specialized plant tissue.
    • Acclimate target plant chassis inside a controlled environment chamber equipped with customized actinic LED arrays.
  • Month 2: Coupled MIFE and Chlorophyll Fluorescence Profiling
    • Perform synchronized kinetic measurements tracking real-time ion dynamics alongside chlorophyll a fluorescence parameters ($F_v/F_m$, $\Phi_{\text{PSII}}$, and $\text{NPQ}$).
    • Stimulate plants with localized stressors (e.g., wounding, saline shock, or localized high light) to trigger systemic signaling propagation.
  • Month 3: Data Analysis and Mechanistic Biophysical Modeling
    • Analyze spatio-temporal correlation matrices mapping $\text{Ca}^{2+}$ wave propagation velocity to empirical photosynthetic quenching kinetics.
    • Construct a comprehensive mathematical and mechanistic principle model representing the feedback loops of natural photosynthesis under systemic stress.
  • Month 4: Computational Protein Design and Target DNA Library Construction
    • Utilize state-of-the-art structural ML models (e.g., Boltz.bio or PepMLM) to design optimized variants of light-harvesting complex proteins or calcium-sensing relays.
    • Integrate a dual-reporter feedback loop system architecture tailored specifically for downstream continuous directed evolution platforms (e.g., Phage-Assisted Continuous Evolution / PACE).
    • [Mandatory DNA Design] Utilize Benchling to design a combinatorial DNA construct library containing specialized promoter libraries and codon-optimized target variants.
    • Generate a standardized Twist Bioscience ordering manifest to synthesize the designed target DNA mutant library plates.
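The combinatorial library planned for Month 4 can be enumerated programmatically before any design tool or ordering form is involved. The sketch below pairs every promoter with every coding variant; all names and sequences are short placeholders invented for illustration, not real parts, and the output format is not the actual Twist Bioscience manifest layout.

```python
# Enumerate every promoter x variant combination for a construct library.
from itertools import product

# Placeholder parts (hypothetical names, dummy sequences).
PROMOTERS = {
    "promoter_weak": "TTGACAAA",
    "promoter_strong": "TTGACATT",
}
VARIANTS = {
    "variant_A": "ATGGCTAAA",
    "variant_B": "ATGGCTCGT",
    "variant_C": "ATGGCTGAA",
}


def build_library(promoters, variants):
    """Return one record per promoter/variant pair, in insertion order."""
    return [
        {"construct_id": f"{p_name}__{v_name}", "sequence": p_seq + v_seq}
        for (p_name, p_seq), (v_name, v_seq)
        in product(promoters.items(), variants.items())
    ]


if __name__ == "__main__":
    for record in build_library(PROMOTERS, VARIANTS):
        print(record["construct_id"], len(record["sequence"]))
```

The library size is the product of the part counts, so it is worth checking this number against synthesis budget before adding further promoter or variant options.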

B. Expected Results & Sub-Aims

  • Sub-Aim 1: Biophysical Principle Model of Natural Photosynthesis
    • Expected Results: We expect to capture a clear, quantitative coupling between trans-membrane electrochemical ion potential shifts and dynamic photosynthetic efficiency fluctuations. This will yield a predictive mathematical model defining how natural photosynthesis self-regulates under abiotic stress.
  • Sub-Aim 2: Identification of $\text{Ca}^{2+}$ as the Central Processing Unit (CPU) for Long-Distance Plant Signaling
    • Expected Results: High-resolution MIFE tracking is expected to demonstrate that systemic $\text{Ca}^{2+}$ influx waves precede downstream photosynthetic non-photochemical quenching ($\text{NPQ}$) activation in systemic leaves. This will definitively identify the $\text{Ca}^{2+}$ ion wave as the master systemic “CPU” coordinating long-distance physiological acclimation.
  • Sub-Aim 3: Validated Molecular Scaffold Vectors for Continuous Directed Evolution
    • Expected Results: The structural ML-driven DNA design will yield functional expression vectors that maintain correct folding under high-throughput conditions, establishing a robust experimental platform for the subsequent selection of ultra-efficient photosynthetic components.
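The prediction in Sub-Aim 2, that the $\text{Ca}^{2+}$ wave precedes NPQ activation, reduces to a lag measurement between two time series. The sketch below uses the standard definition $\text{NPQ} = (F_m - F_m')/F_m'$ and a simple threshold-crossing onset detector; the thresholds and all data values are synthetic and chosen only to demonstrate the calculation.

```python
# Estimate the lag between Ca2+ flux onset and NPQ onset.

def npq(fm_dark, fm_prime):
    """Non-photochemical quenching from dark-adapted Fm and light-adapted Fm'."""
    return (fm_dark - fm_prime) / fm_prime


def onset_time(times, values, threshold):
    """First time point at which the signal reaches the threshold, else None."""
    for t, v in zip(times, values):
        if v >= threshold:
            return t
    return None


def signal_lag(times, calcium_flux, npq_series, ca_threshold, npq_threshold):
    """NPQ onset minus Ca2+ onset; positive means Ca2+ came first."""
    t_ca = onset_time(times, calcium_flux, ca_threshold)
    t_npq = onset_time(times, npq_series, npq_threshold)
    if t_ca is None or t_npq is None:
        return None
    return t_npq - t_ca


if __name__ == "__main__":
    # Synthetic example: time in seconds, arbitrary flux units.
    times = [0, 10, 20, 30, 40, 50]
    calcium = [0, 5, 40, 60, 30, 10]
    fm_prime = [1000, 1000, 950, 700, 500, 450]
    npq_series = [npq(1000, f) for f in fm_prime]
    print(signal_lag(times, calcium, npq_series, 20, 0.4))
```

Threshold crossing is the simplest possible onset definition; with noisy recordings a cross-correlation or a fitted rise time would likely be more robust.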

C. Broad Project Workflow Diagram

cover image

SECTION 5: Results & Quantitative Expectations

2. Detailed Validation Protocol

The validation was executed through a coupled wet-lab molecular design and dry-lab computational framework:

  1. In Silico Plasmid Design: Design a synthetic operon containing a constitutive promoter ($P_{\text{psbAI}}$), a cyanobacterial ribosome binding site (RBS), the GCaMP6s coding sequence, and a downstream transcriptional terminator ($T_{\text{rrnB}}$).
  2. Flanking Homology Design: Append 40-base-pair flanking homology arms to the construct targeting the neutral site 1 ($NS1$) of the S. elongatus genome to facilitate stable integration.
  3. DNA Synthesis & Linearization: Partition the vector and inserts in silico into synthesizable fragments and order them via Twist Bioscience, then use high-fidelity PCR amplification to generate linear fragments for assembly.
  4. Gibson Assembly Execution: Perform a standard Gibson Assembly reaction mixing the linearized pAM1579 vector backbone and the synthetic $NS1-P_{\text{psbAI}}-GCaMP6s$ fragment at a 1:3 molar ratio, incubated at 50°C for 60 minutes.
  5. Computational ODE Simulation: Construct an ODE model utilizing MATLAB/Python to simulate the intracellular $\text{Ca}^{2+}$ influx ($J_{\text{in}}$) and GCaMP6s-calcium binding kinetics under varying light intensities ($0$ to $2000 \ \mu\text{mol photons m}^{-2}\,\text{s}^{-1}$), predicting the fluorescent output intensity ($F_{\text{green}}$).
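The ODE simulation in the last step could take roughly the following shape. This is a minimal sketch, not the model used to generate the results in this section: it assumes a saturating light-dependent influx $J_{\text{in}}$, first-order calcium removal toward a resting level, and an equilibrium Hill-type indicator readout in place of explicit binding kinetics. Every parameter value is a placeholder, so the output is not expected to match the values reported here.

```python
# Toy model: dC/dt = J_in(light) - k_out * (C - c_rest), Hill readout.

def influx(light, j_max=0.5, k_light=1500.0):
    """J_in in uM/s as a saturating function of light (assumed form)."""
    return j_max * light / (k_light + light)


def hill_response(ca, f_max=10.0, kd=0.6, n=2.5):
    """Equilibrium indicator signal (dF/F0) for a calcium level in uM."""
    return f_max * ca ** n / (kd ** n + ca ** n)


def simulate(light, c_rest=0.1, k_out=0.2, dt=0.01, t_end=60.0):
    """Forward-Euler integration; returns (final Ca in uM, dF/F0)."""
    ca = c_rest
    for _ in range(int(round(t_end / dt))):
        ca += dt * (influx(light) - k_out * (ca - c_rest))
    return ca, hill_response(ca)


if __name__ == "__main__":
    for light in (0, 150, 500, 1200, 2000):
        ca, signal = simulate(light)
        print(light, round(ca, 3), round(signal, 3))
```

Forward Euler is used only to keep the sketch dependency-free; a stiff or more detailed binding model would be better served by a dedicated ODE solver.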

3. Synthetic Biology Techniques Utilized

In validating this central control aspect, multiple foundational synthetic biology techniques were systematically deployed:

  • In Silico DNA Design and Codon Optimization were utilized to customize the mammalian-derived GCaMP6s gene for high-level expression inside the cyanobacterial host.
  • High-Fidelity PCR Amplification was carried out using customized primers to generate precisely matched homology overlaps.
  • Gibson Assembly was utilized to clone the multi-component promoter-reporter cassette into the targeting plasmid vector seamlessly and in a defined orientation, without restriction enzyme scarring.
  • Computational Mathematical Modeling and Kinetic Simulation were deployed to analyze the time-resolved fluorescence curves, turning a qualitative biological reaction into a predictable, quantitative input for silicon-based micro-circuit automation.
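For the Gibson Assembly step, the 1:3 vector-to-insert molar ratio has to be converted into DNA masses at the bench. The sketch below uses the common approximation of 650 daltons per base pair of double-stranded DNA; the fragment lengths and the vector mass in the example are hypothetical, since the actual sizes are not stated here.

```python
# Convert a vector:insert molar ratio into the insert mass to pipette.

DALTONS_PER_BP = 650.0  # approximate average mass of one dsDNA base pair


def ng_to_pmol(mass_ng, length_bp):
    """Picomoles of a double-stranded fragment of the given length."""
    return mass_ng * 1000.0 / (length_bp * DALTONS_PER_BP)


def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio=3.0):
    """Insert mass (ng) that gives the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector_ratio


if __name__ == "__main__":
    # Hypothetical sizes: 6000 bp backbone, 2000 bp insert, 100 ng vector.
    needed = insert_mass_ng(100.0, 6000, 2000)
    print(needed, ng_to_pmol(needed, 2000) / ng_to_pmol(100.0, 6000))
```

Because moles scale with mass divided by length, a shorter insert needs proportionally less mass to reach the same molar excess.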

4. Data Presentation and Quantitative Analysis

The dynamic performance of the engineered interface was validated using simulated kinetic data generated via computational modeling of the calcium-binding affinity parameters.

| Simulated Light Intensity ($\mu\text{mol}\cdot\text{m}^{-2}\cdot\text{s}^{-1}$) | Peak Intracellular $\text{Ca}^{2+}$ Concentration ($\mu\text{M}$) | Relative Fluorescence Output ($\Delta F / F_0$) | Predicted System Control Mode |
| --- | --- | --- | --- |
| 150 (Optimal Low Light) | $0.12$ | $0.05$ | Grow (Max Power / Charge Supercapacitor) |
| 500 (Moderate Light) | $0.25$ | $0.18$ | Grow (Balanced Metabolism) |
| 1200 (High-Light Stress) | $1.10$ | $3.45$ | Defense (Initiate Shading / Decrease Load) |
| 2000 (Photoinhibitory Crisis) | $2.85$ | $8.90$ | Defense (Emergency Shutoff) |

Key Results

cover image

cover image

cover image

cover image

SECTION 6: ADDITIONAL INFORMATION

References

1.Acevedo-Siaca LG, McAusland L. 2025. A guide to understanding and measuring photosynthetic induction: considerations and recommendations. New Phytologist 247, 450-469.
2.Adachi S, Tanaka Y, Miyagi A, Kashima M, Tezuka A, Toya Y, Yamori W. 2019. High-yielding rice Takanari has superior photosynthetic response to a commercial rice Koshihikari under fluctuating light. Journal of Experimental Botany 70, 5287-5297.
3.Anwar SA, Mamadou O, Diallo I, Sylla MB. 2021. On the influence of vegetation cover changes and vegetation-runoff systems on the simulated summer potential evapotranspiration of tropical Africa using RegCM4. Earth Systems and Environment 5(4), 883-897.
4.Anwar SA, Diallo I. 2021. On the role of a coupled vegetation-runoff system in simulating the tropical African climate: a regional climate model sensitivity study. Theoretical and Applied Climatology 145(1), 313-325.
5.Assuero SG, Mollier A, Pellerin S. 2004. The decrease in growth of phosphorus-deficient maize leaves is related to a lower cell production. Plant, Cell & Environment 27(7), 887-895.
6.Bai R, Bai C, Han X, Liu Y, Yong JWH. 2022. The significance of calcium-sensing receptor in sustaining photosynthesis and ameliorating stress responses in plants. Frontiers in Plant Science 13, 1019505.
7.Bauters M, Janssens IA, Wasner D, Doetterl S, Vermeir P, Griepentrog M, Boeckx P et al. 2022. Increasing calcium scarcity along Afrotropical forest succession. Nature Ecology & Evolution 6(8), 1122-1131.
8.Bernstein N, Lauchli A, Silk WK. 1993. Kinematics and dynamics of sorghum (Sorghum bicolor L.) leaf development at various Na/Ca salinities (I. Elongation growth). Plant Physiology 103(4), 1107-1114.
9.Bertioli DJ, Jenkins J, Clevenger J, Dudchenko O, Gao D, Seijo G, Schmutz J et al. 2019. The genome sequence of segmental allotetraploid peanut Arachis hypogaea. Nature genetics 51(5), 877-884.
10.Blatt MR. 2024. A charged existence: a century of transmembrane ion transport in plants. Plant Physiology 195(1), 79-110.
11.Blessing CH, Mariette A, Kaloki P, Bramley H. 2018. Profligate and conservative: water use strategies in grain legumes. Journal of Experimental Botany 69(3), 349-369.
12.Block MA, Jouhet J. 2015. Lipid trafficking at endoplasmic reticulum-chloroplast membrane contact sites. Current Opinion in Cell Biology 35, 21-29.
13.Buzatti RSDO, Pfeilsticker TR, De Magalhaes RF, Bueno ML, Lemos-Filho JP, Lovato MB. 2018. Genetic and historical colonization analyses of an endemic savanna tree, Qualea grandiflora, reveal ancient connections between Amazonian savannas and Cerrado core. Frontiers in Plant Science 9, 981.
14.Cavender-Bares J, Ackerly DD, Hobbie SE, Townsend PA. 2016. Evolutionary legacy effects on ecosystems: biogeographic origins, plant traits, and implications for management in the era of global change. Annual Review of Ecology, Evolution, and Systematics 47(1), 433-462.
15.Cheaib A, Chieppa J, Perkowski EA, Smith NG. 2025. Soil resource acquisition strategy modulates global plant nutrient and water economics. New Phytologist 246(4), 1536-1553.
16.Chen C, Bongers FJ, Schmid B, Ma K, Liu X. 2025. Ecosystem consequences of functional diversity in forests and implications for restoration. New Phytologist 247(3), 1081-1097.
17.Chen Z, Grossfurthner L, Loxterman JL, Masingale J, Richardson BA, Seaborn T, Narum SR et al. 2022. Applying genomics in assisted migration under climate change: Framework, empirical applications, and case studies. Evolutionary Applications 15(1), 3-21.
18.Chiang F, Mazdiyasni O, AghaKouchak A. 2021. Evidence of anthropogenic impacts on global drought frequency, duration, and intensity. Nature communications 12(1), 2754.
19.Conn S, Gilliham M. 2010. Comparative physiology of elemental distributions in plants. Annals of Botany 105, 1081-1102.
20.Ding W, Clode PL, Clements JC, Lambers H. 2018. Sensitivity of different Lupinus species to calcium under a low phosphorus supply. Plant, Cell & Environment 41, 1512-1523.
21.Demidchik V, Shabala S, Isayenkov S, Cuin TA, Pottosin I. 2018. Calcium transport across plant membranes: mechanisms and functions. New Phytologist 220(1), 49-69.
22.De Souza AP, Burgess SJ, Doran L, Hansen J, Manukyan L, Maryn N, Long SP et al. 2022. Soybean photosynthesis and crop yield are improved by accelerating recovery from photoprotection. Science 377(6608), 851-854.
23.Donovan LA, Maherali H, Caruso CM, Huber H, de Kroon H. 2011. The evolution of the worldwide leaf economics spectrum. Trends in Ecology & Evolution 26(2), 88-95.
24.Fan SY, Fristoe TS, Li SP, Weigelt P, Kreft H, Dawson W, van Kleunen M et al. 2025. Ecological similarities and dissimilarities between donor and recipient regions shape global plant naturalizations. Nature Communications 16(1), 10485.
25.Fior S, Luqman H, Scharmann M, Pålsson A, de Jonge J, Zoller S, Zemp N, Gargano D, Wegmann D, Widmer A. 2025. Ancient alleles drive contemporary climate adaptation in an alpine plant. Science 390, 59-64.
26.Foyer CH, Lam HM, Nguyen HT, Siddique KHM, Varshney RK, Colmer TD, Considine MJ et al. 2016. Neglecting legumes has compromised human health and sustainable food production. Nature Plants 2(8), 1-10.
27.Fu YL, Zhang GB, Lv XF, Guan Y, Yi HY, Gong JM. 2013. Arabidopsis histone methylase CAU1/PRMT5/SKB1 acts as an epigenetic suppressor of the calcium signaling gene CAS to mediate stomatal closure in response to extracellular calcium. The Plant Cell 25(8), 2878-2891.
28.Gehan MA, Park S, Gilmour SJ, An C, Lee CM, Thomashow MF. 2015. Natural variation in the C-repeat binding factor cold response pathway correlates with local adaptation of Arabidopsis ecotypes. The Plant Journal 84(4), 682-693.
29.Gödecke T, Stein AJ, Qaim M. 2018. The global burden of chronic and hidden hunger: trends and determinants. Global Food Security 17, 21-29.
30.Gonzalez N, Vanhaeren H, Inzé D. 2012. Leaf size control: complex coordination of cell division and expansion. Trends in Plant Science 17(6), 332-340.
31.Grieco M, Roustan V, Dermendjiev G, Rantala S, Jain A, Leonardelli M, Teige M et al. 2020. Adjustment of photosynthetic activity to drought and fluctuating light in wheat. Plant, Cell & Environment 43(6), 1484-1500.
32.Guillory WX, de Medeiros Magalhães F, Coelho FEA, Bonatelli IA, Palma-Silva C, Moraes EM, Gehara M et al. 2024. Geoclimatic drivers of diversification in the largest arid and semi-arid environment of the Neotropics: perspectives from phylogeography. Molecular Ecology 33(14), e17431.
33.Guilherme Pereira C, Clode PL, Oliveira RS, Lambers H. 2018. Eudicots from severely phosphorus-impoverished environments preferentially allocate phosphorus to their mesophyll. New Phytologist 218, 959-973.
34.Guo LL, Hao LH, Jia HH, Li F, Zhang XX, Cao X, Xu M, Zheng YP. 2018. Effects of NaCl stress on stomatal traits, leaf gas exchange parameters, and biomass of two tomato cultivars. Chinese Journal of Applied Ecology 29(12), 3949-3958.
35.Hanba YT, Miyazawa SI, Terashima I. 1999. The influence of leaf thickness on the CO2 transfer conductance and leaf stable carbon isotope ratio for some evergreen tree species in Japanese warm-temperate forests. Functional Ecology 13(5), 632-639.
36.He F, Aebersold R, Baker MS, Bian X, Bo X, Chan DW, Zhu Y et al. 2024. π-HuB: the proteomic navigator of the human body. Nature 636(8042), 322-331.
37.Heberling JM, Fridley JD. 2012. Biogeographic constraints on the world‐wide leaf economics spectrum. Global Ecology and Biogeography 21(12), 1137-1146.
38.Henningsen JN, Venturas MD, Quintero JM, Garrido RR, Mühling KH, Fernández V. 2023. Leaf surface features of maize cultivars and response to foliar phosphorus application: effect of leaf stage and plant phosphorus status. Physiologia Plantarum 175(6), e14093.
39.Hopper SD, Lambers H, Silveira FA, Fiedler PL. 2021. OCBIL theory examined: reassessing evolution, ecology and conservation in the world’s ancient, climatically buffered and infertile landscapes. Biological Journal of the Linnean Society 133(2), 266-296.
40.Hu J, Amor DR, Barbier M, Bunin G, Gore J. 2022. Emergent phases of ecological diversity and dynamics mapped in microcosms. Science 378(6615), 85-89.
41.Huang G, Peng S, Li Y. 2022. Variation of photosynthesis during plant evolution and domestication: implications for improving crop photosynthesis. Journal of Experimental Botany 73(14), 4886-4896.
42.Jiao C, Zhang J, Wang X, He N. 2024. Optimal allocation strategies of plant calcium on Qinghai-Tibetan Plateau. Journal of Geophysical Research: Biogeosciences 129(4), e2023JG007884.
43.Kaiser E, Morales A, Harbinson J. 2018. Fluctuating light takes crop photosynthesis on a rollercoaster ride. Plant Physiology 176(2), 977-989.
44.Karpiński S, Szechyńska-Hebda M, Wituszyńska W, Burdiak P. 2013. Light acclimation, retrograde signalling, cell death and immune defences in plants. Plant, Cell & Environment 36(4), 736-744.
45.Kidokoro S, Shinozaki K, Yamaguchi-Shinozaki K. 2022. Transcriptional regulatory network of plant cold-stress responses. Trends in Plant Science 27(9), 922-935.
46.Knight H, Trewavas AJ, Knight MR. 1996. Cold calcium signaling in Arabidopsis involves two cellular pools and a change in calcium signature after acclimation. The Plant Cell 8(3), 489-503.
47.Kromdijk J, Głowacka K, Leonelli L, Gabilly ST, Iwai M, Niyogi KK, Long SP. 2016. Improving photosynthesis and crop productivity by accelerating recovery from photoprotection. Science 354(6314), 857-861.
48.Kuang D, Romand S, Zvereva AS, Marchesano BMO, Grenzi M, Buratti S, Stael S et al. 2025. The burning glass effect of water droplets triggers a high light-induced calcium response in the chloroplast stroma. Current Biology 35(11), 2642-2658.
49.Kuzyakov Y, Gavrichkova O. 2010. Time lag between photosynthesis and carbon dioxide efflux from soil: a review of mechanisms and controls. Global Change Biology 16(12), 3386-3406.
50.Lambers H, de Britto Costa P, Oliveira RS, Silveira FA. 2020. Towards more sustainable cropping systems: lessons from native Cerrado species. Theoretical and Experimental Plant Physiology 32(3), 175-194.
51.Li X, Xie C, Cheng L, Tong HN, Bock R, Qian Q, Zhou W. 2025. The next Green Revolution: integrating crop architectype and physiotype. Trends in Biotechnology 43(10), 2479-2493.
52.Liu B, Wang XY, Cao Y, Arora R, Zhou H, Xia YP. 2020. Factors affecting freezing tolerance: a comparative transcriptomics study between field and artificial cold acclimations in overwintering evergreens. The Plant Journal 103(6), 2279-2300.
53.Liu J, Whalley HJ, Knight MR. 2015. Combining modelling and experimental approaches to explain how calcium signatures are decoded by calmodulin-binding transcription activators (CAMTAs) to produce specific gene expression responses. New Phytologist 208(1), 174-187.
54.Liu J, Zhang J, Estavillo GM, Luo T, Hu L. 2021b. Leaf N content regulates the speed of photosynthetic induction under fluctuating light among canola genotypes (Brassica napus L.). Physiologia Plantarum 172(4), 1844-1852.
55.Liu H, Ye Q, Gleason SM, He P, Yin D. 2021a. Weak tradeoff between xylem hydraulic efficiency and safety: climatic seasonality matters. New Phytologist 229(3), 1440-1452.
56.Liu Y, Shao L, Zhou J, Li R, Pandey MK, Han Y, Wan S et al. 2022. Genomic insights into the genetic signatures of selection and seed trait loci in cultivated peanut. Journal of Advanced Research 42, 237-248.
57.Liu YF, Han XR, Zhan XM, Yang JF, Wang YZ, Song QB, Chen X. 2013. Regulation of calcium on peanut photosynthesis under low night temperature stress. Journal of Integrative Agriculture 12(12), 2172-2178.
58.Lenzoni G, Knight MR. 2019. Increases in absolute temperature stimulate free calcium concentration elevations in the chloroplast. Plant and Cell Physiology 60(3), 538-548.
59.Lu Q, Huang L, Liu H, Garg V, Gangurde SS, Li H, Chen X et al. 2024. A genomic variation map provides insights into peanut diversity in China and associations with 28 agronomic traits. Nature Genetics 56(3), 530-540.
60.Lu ZF, Ren T, Li J, Hu WS, Zhang JL, Yan JY, Lu JW et al. 2020. Nutrition-mediated cell and tissue-level anatomy triggers the covariation of leaf photosynthesis and leaf mass per area. Journal of Experimental Botany 71(20), 6524-6537.
61.Lu ZF, Hu WS, Ye XL, Lu JW, Gu HH, Li XK, Cong RH, Ren T. 2022. Potassium regulates diel leaf growth of Brassica napus by coordinating the rhythmic carbon supply and water balance. Journal of Experimental Botany 73(11), 3686-3698.
62.Lu ZF, Ren T, Li Y, Cakmak I, Lu JW. 2025. Nutrient limitations on photosynthesis: from individual to combinational stresses. Trends in Plant Science 30(8), 872-855.
63.Maestre FT, Benito BM, Berdugo M, Concostrina-Zubiri L, Delgado-Baquerizo M, Eldridge DJ, Soliveres S et al. 2021. Biogeography of global drylands. New Phytologist 231(2), 540-558.
64.Michaletz ST, Weiser MD, Zhou J, Kaspari M, Helliker BR, Enquist BJ. 2015. Plant thermoregulation: energetics, trait–environment interactions, and carbon economics. Trends in Ecology & Evolution 30(12), 714-724.
65.Molenaar D, Van Berlo R, De Ridder D, Teusink B. 2009. Shifts in growth strategies reflect tradeoffs in cellular economics. Molecular Systems Biology 5(1), 323.
66.Moncrieff GR, Bond WJ, Higgins SI. 2016. Revising the biome concept for understanding and predicting global change impacts. Journal of Biogeography 43(5), 863-873.
67.Monroy AF, Sarhan F, Dhindsa RS. 1993. Cold-induced changes in freezing tolerance, protein phosphorylation, and gene expression (evidence for a role of calcium). Plant Physiology 102(4), 1227-1235.
68.Morgan Ernest SK, Brown JH. 2001. Delayed compensation for missing keystone species by colonization. Science 292(5514), 101-104.
69.Muller P, Li XP, Niyogi KK. 2001. Non-photochemical quenching. A response to excess light energy. Plant Physiology 125(4), 1558-1566.
70.Nemec S, Kilian KA. 2021. Materials control of the epigenetics underlying cell plasticity. Nature Reviews Materials 6(1), 69-83.
71.Oulehle F, Urban O, Tahovská K, Kolář T, Rybníček M, Büntgen U, Trnka M et al. 2023. Calcium availability affects the intrinsic water-use efficiency of temperate forest trees. Communications Earth & Environment 4(1), 199.
72.Pagter M, Arora R. 2013. Winter survival and deacclimation of perennials under warming climate: physiological perspectives. Physiologia Plantarum 147(1), 75-87.
73.Pantin F, Simonneau T, Rolland G, Dauzat M, Muller B. 2011. Control of leaf expansion: a developmental switch from metabolics to hydraulics. Plant Physiology 156(2), 803-815.
74.Pivato M, Grenzi M, Costa A, Ballottari M. 2023. Compartment‐specific Ca2+ imaging in the green alga Chlamydomonas reinhardtii reveals high light‐induced chloroplast Ca2+ signatures. New Phytologist 240(1), 258-271.
75.Querejeta JI, Ren W, Prieto I. 2021. Vertical decoupling of soil nutrients and water under climate warming reduces plant cumulative nutrient uptake, water-use efficiency and productivity. New Phytologist 230(4), 1378-1393.
76.Raza A, Bashir S, Khare T, Karikari B, Copeland RG, Jamla M, Siddique KH, Varshney RK et al. 2024. Temperature-smart plants: A new horizon with omics-driven plant breeding. Physiologia Plantarum 176(1), e14188.
77.Raza A, Zaman QU, Shabala S, Tester M, Munns R, Hu Z, Varshney RK. 2025. Genomics‐assisted breeding for designing salinity‐smart future crops. Plant Biotechnology Journal 23(8), 3119-3151.
78.Reich PB. 2014. The world-wide ‘fast-slow’ plant economics spectrum: a traits manifesto. Journal of Ecology 102(2), 275-301.
79.Rizzoli R, Biver E, Brennan-Speranza TC. 2021. Nutritional intake and bone health. The Lancet Diabetes & Endocrinology 9(9), 606-621.
80.Salazar‐Tortosa D, Castro J, Villar‐Salvador P, Viñegla B, Matías L, Michelsen A, Querejeta JI et al. 2018. The “isohydric trap”: A proposed feedback between water shortage, stomatal regulation, and nutrient acquisition drives differential growth and survival of European pines under climatic dryness. Global Change Biology 24(9), 4069-4083.
81.Sage RF, McKown AD. 2006. Is C4 photosynthesis less phenotypically plastic than C3 photosynthesis? Journal of Experimental Botany 57(2), 303-317.
82.Sanan N, Sopory SK. 1998. A role of G-proteins and calcium in light-regulated primary leaf formation in Sorghum bicolor. Journal of Experimental Botany 49(327), 1695-1703.
83.Savvides A, Ali S, Tester M, Fotopoulos V. 2016. Chemical priming of plants against multiple abiotic stresses: mission possible? Trends in Plant Science 21(4), 329-340.
84.Seo KW, Ryu D, Jeon T, Youm K, Kim JS, Oh EH, Wilson CR et al. 2025. Abrupt sea level rise and Earth’s gradual pole shift reveal permanent hydrological regime changes in the 21st century. Science 387(6741), 1408-1413.
85.Sinha P, Singh VK, Bohra A, Kumar A, Reif JC, Varshney RK. 2021. Genomics and breeding innovations for enhancing genetic gain for climate resilience and nutrition traits. Theoretical and Applied Genetics 134(6), 1829-1843.
86.Shabala S, Newman I. 1999. Light-induced changes in hydrogen, calcium, potassium, and chloride ion fluxes and concentrations from the mesophyll and epidermal tissues of bean leaves. Understanding the ionic basis of light-induced bioelectrogenesis. Plant Physiology 119(3), 1115-1124.
87.Shi QW, Ma MZ, Bai CM, Liu YF et al. 2025. Optimising Peanut Growth: Exogenous Calcium Enhances Photosynthesis in Phosphorus-Limited Environments. Plant, Cell & Environment. doi:10.1111/pce.15591.
88.Shi QW, Song QB, Li TL et al. 2023. Microorganisms regulate soil phosphorus fractions in response to low nocturnal temperature by altering the abundance and composition of the pqqC gene rather than that of the phoD gene. Biology and Fertility of Soils 59(8), 973-987.
89.Soares JC, Santos CS, Carvalho SM, Pintado MM, Vasconcelos MW. 2019. Preserving the nutritional quality of crop plants under a changing climate: importance and strategies. Plant and Soil 443, 1-26.
90.Song QB, Liu YF, Pang JY, Yong JWH, Chen YL, Bai CM, Lambers H et al. 2020. Supplementary calcium restores peanut (Arachis hypogaea L.) growth and photosynthetic capacity under low nocturnal temperature. Frontiers in Plant Science 10, 1637.
91.Song QB, Zhang SW, Bai CM, Shi QW, Wu D, Liu YF, Yong JWH. 2022. Exogenous Ca2+ priming can improve peanut photosynthetic carbon fixation and pod yield under early sowing scenarios in the field. Frontiers in Plant Science 13, 1004721.
92.Song XW, Tang SJ, Liu H, Deng X, Cao XF et al. 2025. Inheritance of acquired adaptive cold tolerance in rice through DNA methylation. Cell 188(16), 4213-4224.
93.Su Z, Zeng Y. 2025. Photosynthesis and water potential: A new perspective for coupling water, energy, and carbon cycles. The Innovation Geoscience 3(3), 100156-1.
94.Tang RH, Han S, Zheng H, Cook CW, Choi CS, Woerner TE, Pei Z et al. 2007. Coupling diurnal cytosolic Ca2+ oscillations to the CAS-IP3 pathway in Arabidopsis. Science 315(5817), 1423-1426.
95.Terashima I, Hanba YT, Tholen D, Niinemets Ü. 2011. Leaf functional anatomy in relation to photosynthesis. Plant Physiology 155(1), 108-116.
96.Terashima M, Petroutsos D, Hüdig M, Tolstygina I, Trompelt K, Gäbelein P, Hippler M et al. 2012. Calcium-dependent regulation of cyclic photosynthetic electron transfer by a CAS, ANR1, and PGRL1 complex. Proceedings of the National Academy of Sciences 109(43), 17717-17722.
97.Thakur D, Hadincová V, Schnablová R, Synková H, Haisel D, Wilhelmová N, Münzbergová Z et al. 2023. Differential effect of climate of origin and cultivation climate on structural and biochemical plant traits. Functional Ecology 37(5), 1436-1448.
98.Thomas HJ, Bjorkman AD, Myers-Smith IH, Elmendorf SC, Kattge J, Diaz S, De Vries FT et al. 2020. Global plant trait relationships extend to the climatic extremes of the tundra biome. Nature Communications 11(1), 1351.
99.Vangestel C, Eckert AJ, Wegrzyn JL, St. Clair JB, Neale DB. 2018. Linking phenotype, genotype and environment to unravel genetic components underlying cold hardiness in coastal Douglas-fir (Pseudotsuga menziesii var. menziesii). Tree Genetics & Genomes 14, 1-14.
100.Wagner H, Jakob T, Wilhelm C. 2006. Balancing the energy flow from captured light to biomass under fluctuating light conditions. New Phytologist 169(1).
101.Wang C, Tang RJ, Kou S, Xu X, Lu Y, Rauscher K, Luan S et al. 2024. Mechanisms of calcium homeostasis orchestrate plant growth and immunity. Nature 627(8003), 382-388.
102.Wang L, Chen S, Shi W, Xing S, Han C, Bai MY, Fan M. 2025d. Nitrate starvation inhibits stomatal opening via the long-distance CEP1-CEPR2 signaling cascade. Cell Reports 44(10), 116424.
103.Wang S, Liao ZY, Cao P, Schmid MW, Zhang L, Bi J, Li B et al. 2025b. General-purpose genotypes and evolution of higher plasticity in clonality underlie knotweed invasion. New Phytologist 246(2), 758-768.
104.Wang SH, Ciais P, Reich PB, Cescatti A, Ellsworth DS, Janssens IA, Peñuelas J et al. 2025a. Phosphorus constrains global photosynthesis more than nitrogen does. Nature Ecology & Evolution 9, 1-11.
105.Wang X, Liu ZZ, Yuan DY, Lu YJ, Li L, Chen S, He XJ. 2025c. Nutrient-driven TOR signalling controls a chromatin-associated complex for orchestrating plant growth and stress tolerance. Nature Plants 11, 2115-2129.
106.Wen Z, Pang J, Wang X, Gille CE, De Borda A, Hayes PE, Lambers H et al. 2023. Differences in foliar phosphorus fractions, rather than in cell-specific phosphorus allocation, underlie contrasting photosynthetic phosphorus use efficiency among chickpea genotypes. Journal of Experimental Botany 74(6), 1974-1989.
107.Wild R, Gerasimaite R, Jung JY, Truffault V, Pavlovic I, Schmidt A, Mayer A et al. 2016. Control of eukaryotic phosphate homeostasis by inositol polyphosphate sensor domains. Science 352(6288), 986-990.
108.Whiting JR, Booker TR, Rougeux C et al. 2024. The genetic architecture of repeated local adaptation to climate in distantly related plants. Nature Ecology & Evolution 8, 1933-1947.
109.Wu D, Liu Y, Pang J, Yong JWH, Chen Y, Bai C, Lambers H et al. 2020. Exogenous calcium alleviates nocturnal chilling-induced feedback inhibition of photosynthesis by improving sink demand in peanut (Arachis hypogaea L.). Frontiers in Plant Science 11, 607029.
110.Wu D, Zhang S, Bai C, Liu Y, Sun Z, Ma M, Lambers H et al. 2025. Supplementary calcium overcomes nocturnal chilling-induced carbon source-sink limitations of cyclic electron transport in peanuts. Plant, Cell & Environment. doi:10.1111/pce.15467.
111.Xiao L, Yang G, Zhang L, Yang X, Zhao S, Ji Z, He Y et al. 2015. The resurrection genome of Boea hygrometrica: a blueprint for survival of dehydration. Proceedings of the National Academy of Sciences 112(18), 5833-5837.
112.Xie WY, Wei X, Kang H, Jiang H, Chu ZQ, Lin Y, Hou Y, Wei Q. 2023. Static and dynamic: evolving biomaterial mechanical properties to control cellular mechanotransduction. Advanced Science 10(9), 2204594.
113.Xiong DL, Flexas J. 2021. Leaf anatomical characteristics are less important than leaf biochemical properties in determining photosynthesis responses to nitrogen top-dressing. Journal of Experimental Botany 72(15), 5709-5720.
114.Xu Q, Kong F, Yang W. 2025. SnRK1 as the Core Node Integrating Energy Homoeostasis, Stress Adaptation and Hormonal Crosstalk in Plants. Plant, Cell & Environment 48(11), 7830-7847.
115.Yan L, Sunoj VJ, Short AW, Lambers H, Elsheery NI, Kajita T, Cao KF et al. 2021. Correlations between allocation to foliar phosphorus fractions and maintenance of photosynthetic integrity in six mangrove populations as affected by chilling. New Phytologist 232(6), 2267-2282.
116.Yang L, Li W, Lian J, Zhu H, Deng Q, Zhang Y, Wang L et al. 2024. Selective directional liquid transport on shoot surfaces of Crassula muscosa. Science 384(6702), 1344-1349.
117.Yan YZ. 2023. The "40000-year problem" in the Milankovitch Theory of Pleistocene glacial cycles: Retrospect and prospect. Quaternary Sciences 43(6), 1722-1729.
118.Yu KJ, Gies E, Wood WW. 2025. To solve climate change, we need to restore our Sponge Planet. Nature Water 3(1), 4-6.
119.Yuan Z, Ali A, Ruiz-Benito P, Jucker T, Mori AS, Wang S, Loreau M et al. 2020. Above- and below- ground biodiversity jointly regulate temperate forest multifunctionality along a local-scale environmental gradient. Journal of Ecology 108(5), 2012-2024.
120.Yuan Z, Ali A, Loreau M, Ding F, Liu S, Sanaei A, Le Bagousse-Pinguet Y. 2021. Divergent above- and below- ground biodiversity pathways mediate disturbance impacts on temperate forest multifunctionality. Global Change Biology 27(12), 2883-2894.
121.Zahnle K, Sleep NH. 2002. Carbon dioxide cycling through the mantle and implications for the climate of ancient Earth. Geological Society, London, Special Publications 199(1), 231-257.
122.Zeng R, Shi Y, Guo L, Fu D, Li M, Zhang X, Yang S et al. 2025. A natural variant of COOL1 gene enhances cold tolerance for high-latitude adaptation in maize. Cell 188(5), 1315-1329.
123.Zhang SN, Liu Y, Du MK, Shou GZ, Wang ZY, Xu GH. 2022. Nitrogen as a regulator for flowering time in plant. Plant and Soil 480(1), 1-29.
124.Zhang SW, Bai CM, Liu YF et al. 2025. In Situ Temperature-modified Calcium Supplementation Drives Efficient Bio-carbon Capture. Unpublished.
125.Zhao Y, Antoniou-Kourounioti RL, Calder G, Dean C, Howard M. 2020. Temperature-dependent growth contributes to long-term cold sensing. Nature 583(7818), 825-829.
126.Zonia L, Munnik T. 2007. Life under pressure: hydrostatic pressure in cell growth and function. Trends in Plant Science 12(3), 90-97.