Ritika Saha — HTGAA Spring 2026

Ritika Saha Profile

About me

Hello! I’m Ritika Saha, a student in HTGAA (Spring 2026).

My interests include:

  • 🧬 Synthetic biology + diagnostics
  • 🤖 Responsible AI for health

Contact info

Let’s connect:


HTGAA Committed Listener (CL) Agreement

I am an HTGAA Committed Listener. My responsibilities are:

  • Watching class lectures and recitations
  • Participating in node reviews
  • Developing and documenting my homework
  • Actively communicating with other students and TAs on the forum
  • Allowing HTGAA and BioClub to share my work (with attribution)
  • Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
  • Following locally applicable health and safety guidance
  • Promoting a respectful environment free of harassment and discrimination

Signed by committing this file to my documentation page/repository,

Ritika Saha 9 March 2026


Homework


Labs


I will also share how I adapt lab work to a home setup and translate those workflows into scalable lab or office environments.


Projects

  • Initially worked on three different ideas:
    • Idea 1: Breath-based diagnostic device
    • Idea 2: Digital Cell Twin Modeling for Cancer and Oncology Virtual Cell Hypothesis Generation
    • Idea 3: Decoding the genetic circuitry of lung cancer cells
    Later finalized idea number one, i.e., a real-time diagnostic system for lung health monitoring.
  • Group Formed
    Proposal: https://docs.google.com/document/d/1ENvPHhRbBgtl0ERrfqmomJKxPg68nfvCugrPQrDdM7o/edit?tab=t.0
    Documentation: https://pages.htgaa.org/2026a/ritika-saha/homework/week-05-hw-protein-design-part-ii/index.html
    By: 2026a-nourelden-rihan, 2026a-ritika-saha, 2026a-rahul-yaji, 2026a-keerthana-gunaretnam
    We decided to focus on increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing its dependency on host DnaJ while still maintaining the lysis action.
    The tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold were discussed. BLAST can pull homologous lysis proteins from the databases. Clustal Omega can create MSAs to identify the essential L48-S49 residues and the pore-forming regions that must not be mutated. ESM can create mutation heatmaps, which can guide the use of ESMFold to obtain the highest-scoring folds in mutable regions. AlphaFold Multimer can predict whether the subunits of our protein can successfully create a pore in the host membrane, and can also be used to check whether the N-terminus can break the interaction with DnaJ.
    We also identified a few pitfalls. The major ones concern limited training datasets, which may not be well aligned with designing a transmembrane lysis protein. Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stable protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.

Proposed Idea


I am exploring a project at the intersection of synthetic biology, diagnostics, and responsible AI.

The goal is to design systems that:

  • Enable low-cost, rapid biological diagnostics
  • Integrate AI responsibly into healthcare workflows
  • Improve accessibility of advanced diagnostics in resource-limited settings

This section will evolve as the idea matures through the course.


Follow My Journey

I document my learning, experiments, and reflections here:

More updates coming soon!


Homework

Weekly homework submissions:


Week 1 HW: LungLite — Principles, Practices, and Governance

🌬️ Project Idea: LungLite (AI + Breath Microfluidics + Cell-Free Synbio)

1) Biological engineering application/tool + why

LungLite is a low-cost, noninvasive breath monitoring system that uses a microfluidic disposable cartridge.


The cartridge contains freeze-dried cell-free synthetic biology reactions to detect breath biomarkers associated with airway inflammation and oxidative stress.

A smartphone camera reads the cartridge’s color/fluorescence pattern and an AI model interprets the result.

The tool is intended to help users monitor lung health over time—especially people with asthma, COPD risk, and high pollution exposure—and provide early warning signals of inflammation before severe symptoms appear.

LungLite leverages cell-free synthetic biology to detect breath biomarkers safely and efficiently. Instead of using live engineered cells, it employs freeze-dried transcription-translation (TX-TL) systems with non-replicating DNA circuits that respond to molecules associated with airway inflammation and oxidative stress. When a user exhales into the microfluidic cartridge, these engineered circuits trigger colorimetric or fluorescent signals proportional to biomarker levels. The sealed cartridge design, combined with built-in post-reaction neutralization, ensures safety, while AI algorithms analyze the visual output to provide an accurate, real-time readout of lung health. This integration of synthetic biology, microfluidics, and AI enables a low-cost, noninvasive tool for continuous monitoring, especially in high-risk environments or populations with limited access to traditional respiratory diagnostics.

Why this matters:
Current lung monitoring tools like spirometers often require strong forced exhalation and are not always accessible, comfortable, or usable for children, elderly people, or individuals in low-resource settings.

This problem is also deeply personal to me because I grew up around severe air pollution in Delhi, where “bad air days” are normal and respiratory symptoms are common. LungLite is motivated by the idea that people in high-exposure environments should be able to track early signs of inflammation easily and affordably—before symptoms become severe.

LungLite – Present Idea

I initially worked on an AI-powered diagnostic tool for lung cancer. Through this opportunity, I pivoted the design to focus on the present idea: a low-cost, noninvasive breath test that uses a microfluidic cartridge to track early signs of lung inflammation.

LungLite goal:
breathe → cartridge reacts → phone reads


References-

Cell free systems:

DNA Circuits:


2) Governance/policy goals for an ethical future

Because LungLite sits at the intersection of bioengineering + consumer health + AI, it raises issues in biosecurity, lab safety, privacy, equity, and responsible health claims.

The governance goal is to ensure LungLite contributes to an ethical future by preventing harm while promoting constructive public health benefits.

Policy Goal A — Enhance Biosecurity

  • Sub-goal A1: Prevent incidents
    Prevent misuse of cartridge biology (DNA templates, cell-free reagents) for harmful applications.
  • Sub-goal A2: Help respond
    Ensure traceability and safe reporting if unsafe use or distribution occurs.

Policy Goal B — Foster Lab Safety

  • Sub-goal B1: Prevent incidents
    Ensure safe handling, manufacturing, and disposal of cartridges and reagents.
  • Sub-goal B2: Help respond
    Ensure protocols exist for spills, exposure, or improper disposal.

Policy Goal C — Protect the Environment

  • Sub-goal C1: Prevent incidents
    Ensure cartridges and reagents do not introduce living organisms into waste streams.
  • Sub-goal C2: Help respond
    Ensure recall, disposal, and remediation pathways if materials are found to persist or contaminate waste streams.

Policy Goal D — Other considerations

  • Minimize costs and burdens to stakeholders
  • Ensure feasibility for student prototyping and future scaling
  • Do not unnecessarily impede legitimate research
  • Promote constructive applications (public health monitoring, pollution health impacts)

3) Governance actions


Option 1: Technical Safety-by-Design

(Cell-free only + built-in kill chemistry + non-replicating DNA templates)

Idea

Many biosensors rely on living engineered organisms or wet reagents that could survive handling errors. LungLite instead commits to a cell-free-only architecture, using non-replicating DNA and post-reaction neutralization so the cartridge cannot become a biological propagation risk.

Design

Actors: student researchers, academic labs, cartridge designers, manufacturers.

Key elements:

  • Use commercially available or lab-prepared TX-TL cell-free extract
  • Use DNA templates without replication machinery
  • Add nuclease or denaturing reagents in a sealed “waste chamber” that activates after the reaction
  • Design the cartridge as a sealed unit so users cannot access wet reagents directly
  • Provide clear disposal instructions (trash-safe, not drain)
  • Include a QR code for standardized disposal instructions and recall notices

Assumptions

  • Cell-free systems are safe enough for consumer-adjacent use
  • DNA templates cannot be easily repurposed into harmful functions
  • Cartridge sealing prevents tampering and accidental exposure
  • Neutralization chemistry is robust across temperature/humidity variation

Risks of failure

  • Users could physically open the cartridge, mishandle reagents, or bypass neutralization
  • Poor sealing could cause leakage
  • DNA templates could be shared and repurposed outside intended use

Risks of “success”

  • Widespread adoption could normalize at-home “bio reaction kits” without safety literacy
  • Overconfidence in “bio-safe” claims could reduce careful oversight and institutional review

Option 2: Distribution + Supply Chain Controls

(DNA sequence screening + controlled reagent distribution + batch traceability)

Purpose

Even if the platform is designed safely, misuse risk increases when synbio components are distributed widely. This option adds governance at the distribution layer, aiming to prevent malicious acquisition or repurposing of DNA templates and reagents.

Design

Actors: DNA synthesis companies, cartridge manufacturers, distributors, university procurement offices, and potentially regulators.

Key elements:

  • DNA template sequences are screened using existing industry DNA synthesis screening norms
  • Cartridges sold with batch numbers, manufacturer ID, and basic traceability
  • Reagent supply chain restricted to verified vendors

Assumptions

  • Screening reliably catches harmful sequences
  • Vendors cooperate and screening is consistently implemented
  • Traceability meaningfully deters malicious use
  • Legitimate users will tolerate additional friction

Risks of failure

  • DIY synthesis or black-market sources bypass screening
  • Screening could generate false positives and slow benign development
  • Increased cost and friction could reduce adoption in low-resource communities

Risks of “success”

  • Centralization of power in a small number of vendors could limit open science
  • Smaller labs, students, and global south researchers could be excluded due to cost and access barriers
  • Overly broad screening could suppress legitimate respiratory health research

Option 3: Responsible Health Claims + Data Governance

(Limit medical claims + privacy-by-design + transparency)

Aim

Even if the biology is safe, LungLite could still cause harm through false reassurance, panic, biased AI outputs, or privacy breaches. This option focuses on preventing digital harms and misleading health interpretation.

Design

Actors: app developers, product companies, IRBs/ethics boards (if research), privacy regulators, public health agencies, and clinical collaborators.

Key elements:

  • Position LungLite initially as wellness monitoring, not a medical diagnostic
  • Focus on trend tracking rather than absolute disease classification
  • Provide clear disclaimers (“not a diagnosis; seek medical care if symptoms worsen”)
  • Use local-first processing: results computed on-device when possible
  • Require informed consent for any cloud upload or model improvement
  • Provide opt-out for data sharing
  • Publish model limitations and performance across demographics
  • Align product claims with existing regulatory distinctions between wellness tools and regulated diagnostic devices

Assumptions

  • Users understand “monitoring” vs “diagnosis”
  • Privacy measures meaningfully reduce harm
  • AI transparency improves trust and responsible use
  • The model will generalize across different phones, lighting, and populations

Risks of failure

  • Users may treat outputs as diagnoses and delay care
  • Data leaks could expose sensitive health data
  • Model bias could cause false reassurance or false alarms in specific groups
  • Smartphone hardware variability could distort readings

Risks of “success”

  • A widely adopted breath-health dataset could become commercially valuable and exploited
  • Insurers/employers/schools could pressure people to share breath scores (coercive screening)
  • “Wellness” framing could still function as a de facto diagnostic

4) Scoring matrix (1 = best, 3 = worst; n/a allowed)

Does the option:                              Option 1   Option 2   Option 3
Enhance Biosecurity
• By preventing incidents                        1          1          2
• By helping respond                             2          1          2
Foster Lab Safety
• By preventing incidents                        1          2          2
• By helping respond                             2          2          2
Protect the environment
• By preventing incidents                        1          2          2
• By helping respond                             2          2          2
Other considerations
• Minimizing costs and burdens to stakeholders   1          3          2
• Feasibility?                                   1          2          1
• Not impede research                            1          3          1
• Promote constructive applications              1          2          1

LungLite governance diagram

Figure: LungLite governance scoring matrix

Scoring justification:

  • Option 1 reduces biological risk at the source and does not rely heavily on enforcement.
  • Option 2 is strongest on biosecurity response, but worst on cost, equity, and research openness.
  • Option 3 is strongest for AI/privacy harms but does not fully address upstream biosecurity.
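
As a rough cross-check of the matrix, the scores can be totalled per option. This is only a sketch: it assumes every criterion carries equal weight, which the matrix itself does not state, and the row labels are shortened versions of the ones above.

```python
# Scores copied from the scoring matrix (1 = best, 3 = worst).
# Tuple order: Option 1, Option 2, Option 3.
SCORES = {
    "Biosecurity: prevent incidents": (1, 1, 2),
    "Biosecurity: help respond": (2, 1, 2),
    "Lab safety: prevent incidents": (1, 2, 2),
    "Lab safety: help respond": (2, 2, 2),
    "Environment: prevent incidents": (1, 2, 2),
    "Environment: help respond": (2, 2, 2),
    "Minimize costs and burdens": (1, 3, 2),
    "Feasibility": (1, 2, 1),
    "Not impede research": (1, 3, 1),
    "Promote constructive applications": (1, 2, 1),
}


def option_totals(scores):
    """Sum each option's column. Equal weighting is an assumption; lower is better."""
    return tuple(sum(column) for column in zip(*scores.values()))


totals = option_totals(SCORES)
for name, total in zip(("Option 1", "Option 2", "Option 3"), totals):
    print(f"{name}: {total}")
```

With equal weights, Option 1 has the lowest (best) total, Option 3 is next, and Option 2 is last. A different weighting of the criteria could change the gaps between them.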

There are a few environmental concerns with this device: packaging waste at scale, a low environmental risk from cell-free extracts, and small risks associated with chemicals and dyes. Possible mitigations are minimal-material design, a sealed leak-proof cartridge, and take-back or clinic disposal at scale.


5) Prioritized strategy

I believe we should prioritize Option 1 + Option 3 as the core approach now, and adopt a lightweight version of Option 2 only once scaling and commercialization begin.

Why Option 1 is essential

Option 1 addresses the biggest safety and biosecurity concern, i.e., distributing engineered biological systems into homes. By committing to cell-free synthetic biology only, LungLite becomes safer, easier to dispose of, and easier to govern ethically.

Why Option 3 is equally critical

Even if the biology is safe, LungLite can still cause harm through:

  • false reassurance
  • panic from false positives
  • privacy breaches
  • biased AI outputs

Option 3 reduces these risks through responsible messaging, careful AI design, and privacy-by-design.

Where Option 2 fits

Option 2 becomes more important once LungLite is manufactured at scale. Heavy supply chain restrictions too early could:

  • block student prototyping
  • increase costs
  • reduce equitable access
  • slow research innovation

So the staged approach is:

  • Option 1 + Option 3 now
  • Option 2 later (commercialization / mass distribution)

Tradeoffs considered

  • Safety vs accessibility
  • Innovation vs security
  • User empowerment vs medical risk
  • Privacy vs model improvement

Audience for recommendation

This governance strategy is best targeted at:

  • MIT/university lab leadership
  • future consumer product manufacturers
  • public health agencies
  • privacy regulators

6) What I Learned

Ethical concerns that arose

  • Dual-use risk
  • AI harm
  • Privacy
  • Equity
  • Regulatory gray zone
  • Coercion risk (monitoring becomes surveillance)

Governance actions proposed to address these

  • Use cell-free systems only and avoid living organisms
  • Seal cartridges and neutralize biological material post-test
  • Implement privacy-by-design + local-first processing
  • Avoid medical claims until clinically validated
  • Keep manufacturing scalable and affordable
  • Add anti-coercion safeguards (minimize retention, discourage third-party access)

Week 2 Lecture Prep


Homework Questions — Professor Jacobson

1) DNA polymerase error rate, genome comparison, and how biology handles the discrepancy

Nature’s machinery for copying DNA is DNA polymerase. High-fidelity replicative DNA polymerases (with proofreading) have an error rate of approximately:

~1 error per 1,000,000 to 10,000,000 base pairs

Comparison to the human genome

The human genome is approximately:

~3,200,000,000 base pairs

If replication relied only on polymerase accuracy:

  • At 1 error per 1,000,000 bp:
    3,200,000,000 / 1,000,000 = 3,200 errors per genome replication

  • At 1 error per 10,000,000 bp:
    3,200,000,000 / 10,000,000 = 320 errors per genome replication

So even “high-fidelity” polymerase alone would still introduce hundreds to thousands of mistakes each time the genome is copied.


DNA polymerase’s shape precisely fits correct base pairs and uses a conformational “proofreading” motion to minimize misincorporation. https://www.sciencedirect.com/science/article/pii/S0969212615002695

How biology deals with the discrepancy

Biology reduces the final mutation rate using multiple layers of error correction:

  • Polymerase proofreading removes many misincorporated bases during replication.
  • Mismatch repair (MMR) fixes errors missed by proofreading.
  • Base excision repair (BER) fixes chemically damaged bases.
  • Nucleotide excision repair (NER) removes bulky lesions.

Together, these systems reduce the effective mutation rate to roughly:

~1 error per 1,000,000,000 to 10,000,000,000 bp per cell division

That means across one human genome replication, the final result is typically on the order of:

~0.3 to 3 mutations per cell division
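
The arithmetic in this answer can be reproduced with a few lines of Python. This is a minimal sketch using only the round numbers quoted above; the function name is just for illustration.

```python
GENOME_BP = 3_200_000_000  # approximate human genome size used above


def expected_errors(genome_bp, bp_per_error):
    """Expected errors per genome copy, given one error per `bp_per_error` bases."""
    return genome_bp / bp_per_error


# Polymerase alone: 1 error per 1e6 to 1e7 bp
polymerase_only = [expected_errors(GENOME_BP, rate) for rate in (1e6, 1e7)]

# After all repair layers: 1 error per 1e9 to 1e10 bp
with_repair = [expected_errors(GENOME_BP, rate) for rate in (1e9, 1e10)]

print("polymerase only:", polymerase_only)
print("with repair:", with_repair)
```

The two lists bracket the ranges given in the text: thousands down to hundreds of errors per replication for polymerase alone, and a few down to a fraction of one once repair is included.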


🧪 Homework Questions — Dr. LeProust

1) What’s the most commonly used method for oligo synthesis currently?

The most commonly used method is:

Solid-phase phosphoramidite DNA synthesis

This is the standard chemistry used by most commercial oligo suppliers. It works by building a DNA strand one nucleotide at a time on a solid support (like a bead, column, or array surface) using repeated cycles of:

  • deprotection
  • coupling
  • capping
  • oxidation


2) Why is it difficult to make oligos longer than ~200 nt via direct synthesis?

Direct synthesis becomes difficult past ~200 nucleotides because:

A) The yield drops exponentially with length

Each synthesis step has less than 100% efficiency, so errors compound as the oligo gets longer. Even in an optimistic scenario, most strands are truncated or incorrect.

B) Errors accumulate

Long oligos contain more:

  • deletions (from incomplete coupling)
  • substitutions (from incorrect incorporation)
  • depurination damage (especially A/G under acidic conditions)
  • truncated fragments

C) Purification becomes difficult and expensive

Separating a perfect 200-mer from 199-mer and 198-mer fragments is hard, so cost and complexity increase quickly.
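
The exponential drop in yield can be illustrated with a short calculation. This is a sketch under stated assumptions: the per-cycle coupling efficiencies of 99% and 99.5% are illustrative values chosen here, not figures from the lecture, and the model ignores every error source other than failed couplings.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands that reach full length.

    Assumes the first nucleotide is already on the solid support, so an
    oligo of `length_nt` needs `length_nt - 1` successful coupling cycles.
    """
    return coupling_efficiency ** (length_nt - 1)


# Illustrative (assumed) per-cycle coupling efficiencies.
for efficiency in (0.99, 0.995):
    for length in (50, 100, 200):
        fraction = full_length_fraction(length, efficiency)
        print(f"efficiency={efficiency:.3f} length={length:>3} nt "
              f"full-length fraction={fraction:.3f}")
```

Under these assumptions, a 200-mer made at 99% efficiency has only about 13 to 14% full-length strands, which is why small gains in coupling efficiency matter so much at this length.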


3) Why can’t you make a 2000 bp gene via direct oligo synthesis?

Because the yield would collapse to essentially zero and the error rate would be unusable.

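A) Yield becomes extremely low

Full-length yield falls exponentially with the number of coupling cycles, so after roughly 2000 cycles almost no full-length strands remain.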

B) The error rate becomes unacceptable

Even the rare full-length molecules would almost always contain:

  • substitutions
  • deletions
  • truncations
  • damaged bases

So you would not get a clean, correct 2000 bp product.

What is done instead in practice?

Instead of direct synthesis, genes are made by:

  • synthesizing shorter oligos (usually 60–200 nt)
  • assembling them into longer DNA using methods like:
    • Gibson Assembly
    • PCR-based assembly
    • Golden Gate
    • Ligase Cycling Assembly (LCA)
  • then sequence-verifying clones to find a correct one
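
The first step, breaking a gene into shorter overlapping oligos, can be sketched as follows. This is a simplified illustration, not a real assembly design tool: it tiles only the top strand, and the default oligo length and overlap are assumed values. Real designs also alternate strands and tune the overlaps for melting temperature.

```python
def tile_oligos(sequence, oligo_len=60, overlap=20):
    """Split `sequence` into oligos of at most `oligo_len` nt.

    Consecutive oligos share `overlap` nt so they can be stitched together.
    """
    if not 0 <= overlap < oligo_len:
        raise ValueError("overlap must be non-negative and smaller than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + oligo_len])
        if start + oligo_len >= len(sequence):
            break  # this oligo reaches the end of the gene
        start += step
    return oligos


def stitch(oligos, overlap=20):
    """Rebuild the full sequence by dropping the shared overlap of each later oligo."""
    return oligos[0] + "".join(oligo[overlap:] for oligo in oligos[1:])


# Toy 200 nt sequence, for illustration only.
toy_gene = "ATGGCTAGCTAGGATCCGATCGTACGATCGATCGGCTA" * 5 + "ATGGCTAGCT"
pieces = tile_oligos(toy_gene, oligo_len=60, overlap=20)
print(len(toy_gene), [len(piece) for piece in pieces])
print(stitch(pieces, overlap=20) == toy_gene)
```

Each piece stays well under the ~200 nt synthesis limit, and the shared ends are what let methods like Gibson Assembly or PCR-based assembly join them in the right order.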

📄 HW by Dr. George Church — Grant Application (Devised)

Project Title

LungLite: A Room-Temperature, Breath-to-Color Microfluidic Cartridge Powered by Cell-Free Synthetic Biology and Smartphone AI for At-Home Lung Inflammation Monitoring

1) Abstract

Chronic respiratory disease affects hundreds of millions globally, yet lung health monitoring remains clinic-centered, effort-dependent, and inaccessible for many populations. Existing tools such as spirometry require strong forced exhalation and proper technique, while lab tests for inflammation and oxidative stress are expensive and slow.

I propose LungLite, a low-cost breath monitoring system that combines breath condensation microfluidics, freeze-dried cell-free synthetic biology, and smartphone computer vision + AI. Users breathe into a disposable cartridge that captures breath condensate and routes it through multiple reaction zones. Each zone contains a freeze-dried cell-free reaction that produces a colorimetric/fluorescent signal in response to oxidative stress and inflammation-associated breath chemistry.

A smartphone reader standardizes illumination, quantifies reaction outputs, and uses machine learning to interpret a multi-zone “fingerprint” into a trend score. LungLite is designed for safe, scalable, room-temperature storage and distribution and aims to enable daily lung health monitoring outside specialized medical centers.


2) Specific Aims

Aim 1 — Engineer a breath-to-fluid microfluidic cartridge
Hypothesis: A passive, low-cost cartridge can consistently convert breath into a defined liquid sample volume and deliver it to reaction zones with minimal variability.
Outcome: consistent fluid delivery across users and breathing conditions.

Aim 2 — Develop a multi-zone freeze-dried cell-free synbio sensing panel
Hypothesis: freeze-dried cell-free reactions can be stabilized at room temperature and produce reproducible outputs when rehydrated.
Outcome: 6–12 zone panel with internal controls and reproducible readouts.

Aim 3 — Build a smartphone reader + AI pipeline
Hypothesis: smartphone imaging + AI normalization improves reliability and interpretability.
Outcome: trend score + confidence + invalid-test detection.
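
One simple piece of the Aim 3 pipeline can be sketched without any machine learning: rescaling each zone's signal against the cartridge's internal controls and rejecting a test whose controls look wrong. Everything here is a hypothetical illustration. The function name, the 0 to 1 intensity scale, and the `min_dynamic_range` threshold are assumptions, and plain two-point control normalization stands in for whatever the final AI normalization would be.

```python
def normalize_zones(zone_signals, negative_control, positive_control,
                    min_dynamic_range=0.1):
    """Rescale zone signals so negative control -> 0 and positive control -> 1.

    Returns None when the controls are too close together, which this
    sketch treats as an invalid test (hypothetical rule and threshold).
    """
    dynamic_range = positive_control - negative_control
    if dynamic_range < min_dynamic_range:
        return None  # controls indistinguishable: flag the test as invalid
    return [(signal - negative_control) / dynamic_range for signal in zone_signals]


# Hypothetical intensity readings on a 0-1 scale.
print(normalize_zones([0.2, 0.6, 1.0], negative_control=0.2, positive_control=1.0))
print(normalize_zones([0.5, 0.52], negative_control=0.5, positive_control=0.55))
```

Normalizing against on-cartridge controls is one way to reduce the effect of differences between phones and lighting before any model interprets the multi-zone fingerprint.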


3) Significance

LungLite could enable:

  • noninvasive monitoring
  • high-frequency measurement
  • accessibility for children and low-resource settings
  • room-temperature distribution
  • population-level monitoring during wildfire smoke events

4) Innovation

  • Cell-free synbio in a consumer cartridge
  • Fingerprint sensing rather than single biomarker
  • AI as a reliability layer (normalization + invalid detection + confidence)

5) Technical Approach and Work Plan (12 months)

  • Months 1–2: breath capture + condensation
  • Months 2–4: routing + zone array
  • Months 3–7: freeze-dry stabilization
  • Months 5–8: phone reader + illumination
  • Months 7–10: AI training + invalid detection
  • Months 10–12: validation + usability

6) Expected Deliverables

  • disposable cartridge (6–12 zones)
  • freeze-dried reaction panel + controls
  • smartphone reader dock
  • AI pipeline
  • validation report
  • product pathway plan

7) Risk Analysis and Mitigation

  • biomarkers variable → fingerprint + controls + AI
  • stability issues → sealed packaging + desiccant
  • diagnostic misuse → wellness framing + disclaimers
  • privacy misuse → local-first + opt-in + deletion

8) Safety, Ethics, and Governance Plan

  • cell-free only
  • sealed cartridges
  • built-in neutralization
  • sequence screening at synthesis
  • traceability if scaling begins
  • bias testing + transparency
  • no disease claims until validated

9) Team and Resources

Cross-disciplinary team spanning:

  • microfluidics
  • cell-free synbio
  • optics + computer vision
  • ML
  • product design

10) Long-Term Vision and Commercialization

  • reusable reader + disposable cartridges
  • room-temperature shipping
  • low-cost manufacturing (paper microfluidics)
  • Year 1: wellness monitoring
  • Year 2+: clinical validation + regulated pathway

HW Review Papers — Week Summary Notes


1) DNA Sequencing at 40 (Shendure, J., Balasubramanian, S., Church, G. et al. https://doi.org/10.1038/nature24286)

Idea

DNA sequencing has gone through multiple revolutions and now functions as a universal molecular measurement tool — not just a way to read genomes.

Key points

  • In ~40 years, sequencing scaled from kilobases → first human genome → millions of genomes
  • Sequencing is no longer only for genomes; it is now used to measure:
    • gene expression (RNA-seq)
    • chromatin state (ATAC-seq, ChIP-seq)
    • lineage tracing
    • somatic mutations
    • molecular interactions
  • Costs dropped dramatically due to next-generation sequencing (NGS)
  • Authors argue sequencing’s long-term impact may rival the microscope

Key message

We have become extremely good at reading DNA at massive scale, speed, and low cost.


2) DNA Synthesis Technologies to Close the Gene Writing Gap (2023), Hoose, A., Vellacott, R., Storch, M. et al. https://doi.org/10.1038/s41570-022-00456-9

Focus

We still cannot write DNA as efficiently as we can read it — and this is a major bottleneck for synthetic biology.

Key points

  • Synthetic DNA is essential for:
    • synthetic biology
    • gene therapy
    • DNA data storage
    • nanotechnology
  • Current chemical synthesis struggles beyond ~200 base pairs
  • Long DNA synthesis is expensive and error-prone
  • New approaches aiming to scale DNA writing include:
    • enzymatic (template-independent) synthesis
    • microarray-based synthesis + assembly
    • rolling circle amplification
    • molecular assembly + cloning pipelines
  • As DNA writing becomes easier, regulation and oversight become more important

3) Recombineering and MAGE (2021), Wannier T, et al. Nat Rev Methods Primers, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9083505/

Core idea

Recombineering and MAGE enable precise, scarless, multiplex genome editing without requiring toxic double-strand breaks (DSBs).

Why traditional editing is limiting

Older editing methods (ZFNs, TALENs, CRISPR with DSBs):

  • rely on double-strand breaks
  • DSBs can be toxic (especially in bacteria)
  • repair often produces unwanted indels
  • low precision for large-scale combinatorial editing

Recombineering solution

  • Uses phage proteins (Redβ, Exo, Gam)
  • Introduces ssDNA or dsDNA with homology
  • DNA integrates at the replication fork
  • No DSB required
  • Editing is highly precise and “scarless”

MAGE (Multiplex Automated Genome Engineering)

  • Introduces many ssDNA oligos at once
  • Creates combinatorial diversity across many genomic sites
  • Enables genome-scale reverse genetics

4) CRISPR Technology: A Decade of Genome Editing is Only the Beginning, Wang, Doudna, et al., https://www.science.org/doi/10.1126/science.add8643

Focus area

CRISPR made genome editing programmable, accessible, and fast — dramatically lowering the barrier to entry.

Main points

  • Cas9 + guide RNA enables targeting by base pairing
  • Enabled:
    • knockouts
    • pooled genetic screens
    • animal models
    • crop editing
    • emerging human therapies

Newer CRISPR-derived tools

  • Base editing: A→G or C→T without DSBs
  • Prime editing: templated edits with higher precision

Remaining challenges

  • off-target effects
  • delivery into cells/tissues
  • limited multiplexing at large scale
  • HDR inefficiency in many systems

Summary

Biotechnology has made DNA reading extremely scalable (sequencing), but DNA writing (synthesis) and DNA rewriting (editing) are still constrained by cost, accuracy, delivery, and scalability.

Sequencing is now a general-purpose measurement tool, while synthesis and editing are rapidly improving — raising both exciting capabilities and new governance needs.


I used artificial intelligence tools, including ChatGPT-5.0, for language refinement, structural organization, and clarity of expression in this documentation. All scientific concepts, design decisions, sequence selections, experimental reasoning, and technical interpretations reflect my own understanding and work. The AI tool was used solely to improve readability, coherence, and presentation quality.

Week 2 HW: DNA Read, Write, Edit — SOD1 Molecular Journey

🧬 Week 2 Documentation

DNA Read → DNA Write → DNA Edit

A Molecular Design Journey

This week was not just a technical exercise. It was an exploration — from abstract sequence to physical plasmid, from conceptual art to molecular execution. Below is the full documentation of my process, including failures, iterations, and insights gained.


🧪 Part 0: Basics of Gel Electrophoresis

Lectures + Recitation

I attended/watched all required lecture and recitation materials.

Conceptual Understanding

Gel electrophoresis separates DNA fragments based on size using:

  • Negatively charged DNA backbone
  • Electric field
  • Agarose matrix
  • Size-dependent migration

Smaller fragments travel further.

🎨 Part 1: Benchling & In-silico Gel Art

Step 1: Benchling Account + Lambda DNA Import

  • Created Benchling account
  • Imported Lambda DNA reference sequence

Step 2: Simulated Restriction Digestion

Enzymes used:

  • EcoRI
  • HindIII
  • BamHI
  • KpnI
  • EcoRV
  • SacI
  • SalI

Initial Failure

My first digestion simulation produced fragmented bands that were too similar in size. The pattern looked visually indistinct.

Iteration Strategy

  • Tested different single and double digests
  • Compared fragment size outputs
  • Adjusted enzyme combinations

Eventually, I selected combinations that produced strong band separation.
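
The logic behind a simulated digest can be sketched in Python: find each enzyme's recognition site, apply its cut offset, and measure the distances between cuts on a linear molecule. This is an illustrative sketch, not Benchling's implementation. The recognition sites and cut positions are the standard ones for these enzymes, while the short test sequence is a made-up toy rather than lambda DNA.

```python
import re

# Recognition site and top-strand cut offset (number of bases before the cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def cut_positions(sequence, enzyme_names):
    """Sorted top-strand cut positions for the chosen enzymes."""
    sequence = sequence.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        # A lookahead finds every occurrence, including overlapping ones.
        for match in re.finditer(f"(?={site})", sequence):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_sizes(sequence, enzyme_names):
    """Fragment lengths, largest first, for a digest of a linear molecule."""
    bounds = [0] + cut_positions(sequence, enzyme_names) + [len(sequence)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Toy sequence with one EcoRI site and one BamHI site (not lambda DNA).
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GGATCC" + "TT"
print(fragment_sizes(toy, ["EcoRI"]))
print(fragment_sizes(toy, ["EcoRI", "BamHI"]))
```

Comparing the fragment lists from single and double digests is a quick way to spot combinations whose bands would sit too close together on the gel.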

Attached are all the simulations carried out for this task:

The following image shows the Benchling account set up with the lambda sequence loaded and visualized: Figure1

The following image shows the end result of the digestion. I worked on a pattern design of the letter “H”, because my startup company’s name starts with H. I struggled a lot with this, and I intend to rerun all of these simulations and tasks at least 5-6 times. Figure2


🧪 In-Silico Gel Art

I did try to work on gel art, but this was again a part of the homework that I really struggled with.

In Silico Gel

Insight

Never had I imagined that biological mechanisms could generate such striking and beautiful art forms. As someone who once dreamed of becoming an artist but ultimately pursued engineering, I find this intersection deeply exciting. Working with gel patterns and molecular design has rekindled a childhood aspiration I once held close — the dream of opening an art studio.


🧬 Part 3: DNA Design Challenge

3.1 Choose Your Protein

Selected Protein: Human Superoxide Dismutase 1 (SOD1)

UniProt ID: P00441

sp|P00441|SODC_HUMAN
Superoxide dismutase [Cu-Zn]
OS=Homo sapiens OX=9606 PE=1 SV=2

Why SOD1?

SOD1 converts:

2 O₂⁻ + 2 H⁺ → O₂ + H₂O₂

It protects against oxidative stress and is implicated in ALS.

It also integrates mechanistically with my LungLite platform — serving as a biochemical actuator.

An image of the protein sequence is attached:
protein

Amino Acid Sequence

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

3.2 Reverse Translation

Using online reverse translation tools, I generated a nucleotide sequence.

Failure

Reverse translation produced multiple valid sequences due to codon degeneracy.

There is no single “correct” DNA sequence for a protein.
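
To get a feel for how large the space of valid sequences is, the number of possible encodings can be counted from the standard genetic code. This is a small sketch: the compact table string is the standard code listed in T, C, A, G base order, and the function name is just for illustration.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons for each amino acid.
CODONS_PER_AA = Counter(CODON_TABLE.values())


def count_encodings(protein):
    """Number of distinct DNA sequences that encode `protein` (stop codon excluded)."""
    return prod(CODONS_PER_AA[aa] for aa in protein)


# First ten residues of the SOD1 sequence listed above.
print(count_encodings("MATKAVCVLK"))
```

Even ten residues allow tens of thousands of encodings, so for the full protein the choice of one sequence has to come from some other criterion, such as host codon preference.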

Resolution

I selected one biologically valid version as a starting template.

Pre-optimization DNA:

atggtgaaagcggtgtgcgtgctgaaaggcgatggcccggtgcagggcattattaacttt...
The attached images show the conversion of the amino acid sequence to a DNA sequence (extremely interesting!):
dna

dna

3.3 Codon Optimization

Why Optimize?

Different organisms prefer specific codons due to tRNA abundance.

Without optimization:

  • Ribosome stalling
  • Low yield
  • Translation inefficiency

Host Chosen: Escherichia coli

Reasons:

  • Fast growth
  • High recombinant yield
  • Standard lab organism
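As a rough illustration of the idea, the sketch below back-translates a peptide using one preferred codon per residue. The four-entry table is an assumption made for this example; a real optimization would load a full E. coli codon-usage table and also balance GC content, repeats, and restriction sites.

```python
# Illustrative preferred-codon table for a handful of residues.
# Assumption: a real workflow would use a complete E. coli codon-usage table.
PREFERRED_CODON = {"M": "ATG", "V": "GTG", "K": "AAA", "A": "GCG"}


def back_translate(peptide, table=PREFERRED_CODON):
    """Replace each residue with its single preferred codon."""
    return "".join(table[residue] for residue in peptide)


optimized = back_translate("MVKA")
print(optimized)
```

This "one codon per amino acid" approach is the simplest possible strategy. Dedicated tools usually sample codons according to usage frequency instead, to avoid depleting a single tRNA.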

Final Codon Optimized Sequence

ATGGTTAAAGCGGTATGCGTGCTGAAAGGCGATGGCCCGGTGCAGGGCATTATTAACTTT
GAACAGAAAGAATCAAACGGCCCGGTGAAAGTGTGGGGCAGCATTAAAGGCCTGACCGA
AGGTCTGCACGGCTTTCACGTGCATGAATTTGGCGATAACACCGCGGGCTGCACCAGCG
CCGGCCCGCATTTTAACCCGCTGAGCCGCAAACATGGCGGCCCGAAAGATGAAGAACGCC
ATGTGGGCGATCTGGGCAATGTGACCGCGGATAAAGATGGCGTGGCCGATGTGAGCATT
GAAGATAGCGTGATTAGCCTGAGCGGCGATCATTGCATTATTGGCCGCACCCTGGTTGT
TCATGAAAAAGCAGATGATCTGGGCAAAGGCGGCAACGAAGAAAGCACCAAAACCGGCA
ATGCGGGGAGCCGCCTGGCGTGCGGCGTGATTGGCATCGCCCAG
codon
Loading the above sequence directly into Benchling and visualizing it:
dna

3.4 From DNA to Protein

Expression Methods:

Cell-Dependent

  1. Transform plasmid into E. coli
  2. Antibiotic selection
  3. Transcription
  4. Translation
  5. His-tag purification

Cell-Free Option

  • TX-TL system
  • Direct protein production without cells
Building the expression cassette:
dna
A digital diagram of the above cassette:
dna

3.5 Central Dogma Alignment

DNA:

ATG GTT AAA GCG

RNA:

AUG GUU AAA GCG

Protein:

Met Val Lys Ala

Each 3 nucleotides = 1 amino acid
T → U during transcription
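The alignment above can be reproduced with a short sketch. The codon table is built from the standard genetic code; the function names `transcribe` and `translate` are hypothetical helpers for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate coding-strand DNA codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGTTAAAGCG"
print(transcribe(dna))
print(translate(dna))
```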


🧬 Part 4: Prepare a Twist DNA Synthesis Order

4.1 Accounts

  • Created Twist account
  • Created Benchling account

4.2 Build Expression Cassette

Structure:

Promoter
RBS
ATG
SOD1 Coding Sequence
7x His Tag
TAA
Terminator

Failure

Initially forgot to annotate regions in Benchling.

Fix

Annotated:

  • Promoter
  • RBS
  • CDS
  • His Tag
  • Terminator

Verified via Linear Map view.


Final Insert Sequence

>SOD1_LungLite_Expression_Cassette
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATG
GTTAAAGCGGTATGCGTGCTGAAAGGCGATGGCCCGGTGCAGGGCATTATTAACTTTGA
ACAGAAAGAATCAAACGGCCCGGTGAAAGTGTGGGGCAGCATTAAAGGCCTGACCGAAGG
TCTGCACGGCTTTCACGTGCATGAATTTGGCGATAACACCGCGGGCTGCACCAGCGCCG
GCCCGCATTTTAACCCGCTGAGCCGCAAACATGGCGGCCCGAAAGATGAAGAACGCCAT
GTGGGCGATCTGGGCAATGTGACCGCGGATAAAGATGGCGTGGCCGATGTGAGCATTGA
AGATAGCGTGATTAGCCTGAGCGGCGATCATTGCATTATTGGCCGCACCCTGGTTGTTC
ATGAAAAAGCAGATGATCTGGGCAAAGGCGGCAACGAAGAAAGCACCAAAACCGGCAAT
GCGGGGAGCCGCCTGGCGTGCGGCGTGATTGGCATCGCCCAGCATCACCATCACCATC
ATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTT
TTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGT
GGGCCTTTCTGCGTTTATA
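A quick sanity check on a cassette like this is to confirm that the coding sequence ends with the intended His tag followed by a stop codon. The sketch below does this for the last codons of the coding region, copied from the insert above; `trailing_his_count` is a hypothetical helper name.

```python
HIS_CODONS = {"CAT", "CAC"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def trailing_his_count(cds):
    """Count consecutive His codons immediately before the final stop codon."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        raise ValueError("sequence does not end with an in-frame stop codon")
    count = 0
    for codon in reversed(codons[:-1]):
        if codon not in HIS_CODONS:
            break
        count += 1
    return count


# Last two SOD1 codons (Ala, Gln), the His tag, and the stop codon.
cds_tail = "GCCCAG" + "CATCACCATCACCATCATCAC" + "TAA"
print(trailing_his_count(cds_tail))
```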

4.3–4.6 Twist Order

Selected:

  • Genes → Clonal Genes
  • Vector: pTwist Amp High Copy

Imported GenBank file back into Benchling to confirm construct.

I built my first plasmid.

The images document the workflow: exporting a FASTA file from Benchling, creating a Twist Bioscience account, (hypothetically) placing an order by selecting Clonal Gene, downloading the resulting gene construct file (a .gb / GenBank file) from the Twist platform, and then uploading that same file back into Benchling.
final final final

🧬 Part 5: DNA Read / Write / Edit


5.1 DNA Read

What Would I Sequence?

The SOD1 gene sequence to understand its structure, variants, and oxidative stress relevance in lung epithelial biology.

Why This Matters

Superoxide Dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that catalyzes the conversion of superoxide radicals (O₂⁻) into oxygen and hydrogen peroxide. Because oxidative stress is central to airway inflammation, SOD1 represents the molecular boundary between resilience and pathology in lung tissue. Mutations in SOD1 are linked to Amyotrophic Lateral Sclerosis (ALS), and its structure and function are well-characterized, making it ideal for recombinant engineering and diagnostic integration.

Technology Chosen: Oxford Nanopore

Generation: Third-generation sequencing

Input:

  • Extracted DNA containing SOD1
  • Adapter ligation

Mechanism:

  • DNA passes through nanopores
  • Ionic current changes → base calling

Output:

  • FASTQ long reads of SOD1 sequence

Why Nanopore?

  • Long reads allow full-length SOD1 sequencing
  • Detects structural variants and potential regulatory regions
  • Portable and scalable

Limitations:

  • Higher error rate than Illumina
  • Correctable with sequencing depth and consensus alignment
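The consensus idea can be sketched as a per-position majority vote over reads that are already aligned and of equal length. This is a simplification: real nanopore pipelines align reads first and use dedicated polishing tools. The reads below are made-up examples.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority vote at each position across equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("no reads given")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Hypothetical reads: each of the last two carries one random error.
reads = ["ATGGTTAAAGCG", "ATGGTAAAAGCG", "ATCGTTAAAGCG"]
print(consensus(reads))
```

Because random errors rarely hit the same position in most reads, the vote recovers the underlying sequence as depth increases.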

5.2 DNA Write

What Would I Synthesize?

A codon-optimized SOD1 expression cassette and ROS-responsive genetic circuits for LungLite.

Rationale

To integrate SOD1 into LungLite, the gene must be optimized for expression in bacterial or cell-free systems. This enables recombinant production and functional embedding into oxidative stress detection circuits.

Technology

  • Phosphoramidite oligo synthesis
  • PCR assembly
  • Clonal gene insertion into expression vector
  • 7×His tag for purification

Application in LungLite

  1. Biological Amplifier Strategy

    • ROS activates redox-sensitive promoter
    • Induces SOD1 expression in freeze-dried TX–TL system
    • SOD1 converts superoxide → H₂O₂
    • Coupled colorimetric/fluorescent reaction produces smartphone-readable signal
  2. Calibration Standard Strategy

    • Purified recombinant SOD1 embedded in microfluidic wells
    • Known concentrations normalize ROS dye response
    • Enables quantitative oxidative stress scoring
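The calibration strategy could be sketched as a straight-line fit of signal against known standard concentrations, which is then inverted for an unknown sample. All numbers below are hypothetical placeholders, and a linear response is an assumption that would need to be checked experimentally.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(signal, slope, intercept):
    """Invert the calibration line to read a concentration from a signal."""
    return (signal - intercept) / slope


# Hypothetical standards (arbitrary units) and their measured signals.
standards = [0.0, 1.0, 2.0, 4.0]
signals = [0.10, 0.30, 0.50, 0.90]
slope, intercept = fit_line(standards, signals)
print(estimate_concentration(0.70, slope, intercept))
```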

Limitations

  • Length constraints in synthesis
  • Synthesis errors
  • Cost scaling for large constructs

5.3 DNA Edit

What Would I Edit?

Upregulate antioxidant pathways — including SOD1 expression — in lung epithelial cells.

Technology: CRISPR-Cas9

Steps

  1. gRNA design targeting regulatory region
  2. Cas9-induced double-strand break
  3. HDR-mediated repair with enhanced promoter template
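The first step, gRNA design, starts with locating candidate target sites. The sketch below scans the forward strand for 20-nt protospacers followed by an NGG PAM, the motif recognized by the commonly used SpCas9. It ignores the reverse strand and off-target scoring, and the demo sequence is made up.

```python
def find_targets(seq, spacer_len=20):
    """Return (start, protospacer, PAM) for every NGG PAM on the forward strand."""
    seq = seq.upper()
    targets = []
    for pam_start in range(spacer_len, len(seq) - 2):
        # An NGG PAM has any base first, followed by two Gs.
        if seq[pam_start + 1:pam_start + 3] == "GG":
            start = pam_start - spacer_len
            targets.append((start, seq[start:pam_start], seq[pam_start:pam_start + 3]))
    return targets


# Hypothetical 25-nt region: a 20-nt protospacer, a TGG PAM, and two extra bases.
demo = "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
for start, protospacer, pam in find_targets(demo):
    print(start, protospacer, pam)
```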

Input:

  • gRNA plasmid
  • Cas9
  • Donor DNA template
  • Target lung epithelial cells

Goal

Increase endogenous SOD1 buffering capacity to restore redox balance in oxidative stress conditions.

Limitations

  • Off-target effects
  • Variable editing efficiency
  • Delivery challenges in airway epithelium

🌬 Final Reflection

What began as:

Lambda DNA
→ Restriction digest
→ Gel electrophoresis

Evolved into:

DNA Read → Sequencing SOD1
DNA Write → Engineering ROS-responsive SOD1 circuits
DNA Express → Recombinant protein production
DNA Integrate → Embedding SOD1 into LungLite microfluidic diagnostics

SOD1 is not merely a recombinant protein in this project. It becomes a functional biochemical actuator — translating environmental oxidative exposure into measurable signal output.

Growing up in Delhi, where severe air pollution makes oxidative stress a daily lived experience, reframes SOD1 from an abstract enzyme to a molecular proxy for environmental exposure. LungLite transforms this molecular logic into a portable, AI-integrated, noninvasive public health device.

The DNA Design Challenge is no longer just molecular cloning — it becomes the foundation for a programmable redox-sensing health platform.

I acknowledge that I used artificial intelligence tools, including ChatGPT-5.0, for language refinement, structural organization, and improvement of clarity in this documentation.

All scientific concepts, experimental designs, sequence selections, analytical reasoning, and technical interpretations presented in this work reflect my own understanding and independent effort. The AI tool was used solely to enhance readability, coherence, grammar, and overall presentation quality.

The prompts primarily included instructions such as: “Rewrite the text and correct grammatical errors.”

Week 3 HW: Lab Automation — Opentrons Artwork

Lab Automation and Opentrons Programming


Part 1: Python Script for Opentrons Artwork

Objective

Our first task was to generate an artistic design using the GUI at opentrons-art.rcdonovan.com.

My inspiration for this design was my dog Shiro. Although he is an Indian Spitz, I ended up designing a dachshund: art

I then exported the Python script directly from the interface, as per the given instructions:

from opentrons import types

import string

metadata = {
    'protocolName': '{YOUR NAME} - Opentrons Art - HTGAA',
    'author': 'HTGAA',
    'source': 'HTGAA 2026',
    'apiLevel': '2.20'
}

Z_VALUE_AGAR = 2.0
POINT_SIZE = 1.25

mrfp1_points = [(23,31), (21,29), (23,29), (25,29), (19,27), (23,27), (21,23), (17,21), (19,21), (9,19), (11,19), (13,19), (15,19), (17,19), (1,11), (5,11), (1,9), (-1,7), (1,7), (-7,5), (-5,5), (-3,5), (-1,5), (-7,3), (-5,3), (-3,3), (-1,3), (-5,1), (-3,1), (-1,1), (-5,-1), (-3,-1), (9,-7), (-15,-9), (-11,-9), (15,-9), (23,-9), (25,-9), (27,-9), (25,-11), (27,-11), (-19,-13), (9,-13), (11,-13), (-5,-17), (-21,-19), (-7,-19), (-21,-21), (-9,-21), (-19,-23)]
mko2_points = [(19,29), (15,27), (17,27), (21,27), (13,25), (15,25), (17,25), (19,25), (21,25), (23,25), (11,23), (13,23), (15,23), (17,23), (19,23), (7,21), (9,21), (11,21), (13,21), (15,21), (5,19), (7,19), (5,17), (7,17), (9,17), (11,17), (13,17), (15,17), (17,17), (19,17), (7,15), (9,15), (11,15), (13,15), (15,15), (17,15), (7,13), (9,13), (11,13), (13,13), (15,13), (9,11), (11,11), (13,11), (15,11), (11,9), (13,9), (15,9), (13,7), (15,7), (7,3), (9,3), (7,1), (9,1), (11,1), (13,1), (15,1), (17,1), (7,-1), (9,-1), (11,-1), (13,-1), (15,-1), (17,-1), (7,-3), (9,-3), (11,-3), (13,-3), (15,-3), (17,-3), (9,-5), (11,-5), (13,-5), (15,-5), (17,-5), (17,-7), (21,-7), (23,-7), (25,-7), (27,-7), (-27,-9), (-25,-11), (19,-11), (-23,-13), (21,-13), (27,-13), (7,-15), (19,-15), (21,-15), (23,-15), (-7,-17), (-3,-17), (-11,-19), (-9,-19), (-5,-19), (-23,-21), (-13,-21), (-11,-21), (-7,-21), (-5,-21), (-23,-23), (-21,-23), (-15,-23), (-13,-23), (-11,-23), (-9,-23), (-7,-23), (-23,-25), (-21,-25), (-19,-25), (-17,-25), (-15,-25), (-13,-25), (-11,-25), (-9,-25), (-25,-27), (-23,-27), (-11,-27), (-9,-27), (-27,-29), (-25,-29), (-13,-29), (-11,-29)]
mscarlet_i_points = [(5,27), (7,27), (9,27), (11,27), (13,27), (5,25), (7,25), (9,25), (11,25), (3,23), (5,23), (7,23), (9,23), (-1,21), (1,21), (3,21), (5,21), (-3,19), (-1,19), (1,19), (3,19), (-13,17), (-11,17), (-9,17), (-7,17), (-5,17), (-3,17), (-1,17), (1,17), (3,17), (-15,15), (-13,15), (-11,15), (-9,15), (-7,15), (-5,15), (-3,15), (-1,15), (1,15), (3,15), (5,15), (-15,13), (-13,13), (-11,13), (-9,13), (-7,13), (-5,13), (-3,13), (-1,13), (1,13), (3,13), (5,13), (-15,11), (-13,11), (-11,11), (-9,11), (-7,11), (-5,11), (-3,11), (-1,11), (3,11), (7,11), (-15,9), (-13,9), (-11,9), (-9,9), (-7,9), (-5,9), (-3,9), (-1,9), (3,9), (5,9), (7,9), (9,9), (-15,7), (-13,7), (-11,7), (-9,7), (-7,7), (-5,7), (-3,7), (3,7), (5,7), (7,7), (9,7), (11,7), (-27,5), (1,5), (3,5), (5,5), (7,5), (9,5), (11,5), (13,5), (15,5), (-27,3), (1,3), (3,3), (5,3), (11,3), (13,3), (15,3), (-27,1), (1,1), (3,1), (5,1), (-1,-1), (1,-1), (3,-1), (5,-1), (-27,-3), (-3,-3), (-1,-3), (1,-3), (3,-3), (5,-3), (-27,-5), (-5,-5), (-3,-5), (-1,-5), (1,-5), (3,-5), (5,-5), (7,-5), (-27,-7), (-25,-7), (-13,-7), (-11,-7), (-9,-7), (-7,-7), (-5,-7), (-3,-7), (-1,-7), (1,-7), (3,-7), (5,-7), (7,-7), (11,-7), (13,-7), (15,-7), (19,-7), (-25,-9), (-23,-9), (-17,-9), (-13,-9), (-9,-9), (-7,-9), (-5,-9), (-3,-9), (-1,-9), (1,-9), (3,-9), (5,-9), (7,-9), (9,-9), (11,-9), (13,-9), (17,-9), (19,-9), (21,-9), (-23,-11), (-21,-11), (-17,-11), (-15,-11), (-13,-11), (-11,-11), (-9,-11), (-7,-11), (-5,-11), (-3,-11), (-1,-11), (1,-11), (3,-11), (5,-11), (7,-11), (9,-11), (15,-11), (17,-11), (21,-11), (-21,-13), (-17,-13), (-15,-13), (-13,-13), (-11,-13), (-9,-13), (-7,-13), (-5,-13), (-3,-13), (-1,-13), (1,-13), (3,-13), (5,-13), (7,-13), (23,-13), (-19,-15), (-17,-15), (-15,-15), (-13,-15), (-11,-15), (-9,-15), (-7,-15), (-5,-15), (-3,-15), (-1,-15), (1,-15), (3,-15), (5,-15), (-19,-17), (-17,-17), (-15,-17), (-13,-17), (-11,-17), (-9,-17), (-19,-19), (-17,-19), (-15,-19), (-13,-19), (-19,-21), (-17,-21), (-15,-21), 
(-17,-23)]
azurite_points = [(31,-9), (15,-13), (25,-13)]
mclover3_points = [(23,-11)]

point_name_pairing = [("mrfp1", mrfp1_points),("mko2", mko2_points),("mscarlet_i", mscarlet_i_points),("azurite", azurite_points),("mclover3", mclover3_points)]

# Robot deck setup constants
TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

# Place the PCR tubes in this order
well_colors = {
    'A1': 'sfGFP',
    'A2': 'mRFP1',
    'A3': 'mKO2',
    'A4': 'Venus',
    'A5': 'mKate2_TF',
    'A6': 'Azurite',
    'A7': 'mCerulean3',
    'A8': 'mClover3',
    'A9': 'mJuniper',
    'A10': 'mTurquoise2',
    'A11': 'mBanana',
    'A12': 'mPlum',
    'B1': 'Electra2',
    'B2': 'mWasabi',
    'B3': 'mScarlet_I',
    'B4': 'mPapaya',
    'B5': 'eqFP578',
    'B6': 'tdTomato',
    'B7': 'DsRed',
    'B8': 'mKate2',
    'B9': 'EGFP',
    'B10': 'mRuby2',
    'B11': 'TagBFP',
    'B12': 'mChartreuse_TF',
    'C1': 'mLychee_TF',
    'C2': 'mTagBFP2',
    'C3': 'mEGFP',
    'C4': 'mNeonGreen',
    'C5': 'mAzamiGreen',
    'C6': 'mWatermelon',
    'C7': 'avGFP',
    'C8': 'mCitrine',
    'C9': 'mVenus',
    'C10': 'mCherry',
    'C11': 'mHoneydew',
    'C12': 'TagRFP',
    'D1': 'mTFP1',
    'D2': 'Ultramarine',
    'D3': 'ZsGreen1',
    'D4': 'mMiCy',
    'D5': 'mStayGold2',
    'D6': 'PA_GFP'
}

volume_used = {
    'mrfp1': 0,
    'mko2': 0,
    'mscarlet_i': 0,
    'azurite': 0,
    'mclover3': 0
}

def update_volume_remaining(current_color, quantity_to_aspirate):
    rows = string.ascii_uppercase
    for well, color in list(well_colors.items()):
        if color == current_color:
            if (volume_used[current_color] + quantity_to_aspirate) > 250:
                # Move to next well horizontally by advancing row letter, keeping column number
                row = well[0]
                col = well[1:]
                
                # Find next row letter
                next_row = rows[rows.index(row) + 1]
                next_well = f"{next_row}{col}"
                
                del well_colors[well]
                well_colors[next_well] = current_color
                volume_used[current_color] = quantity_to_aspirate
            else:
                volume_used[current_color] += quantity_to_aspirate
            break

def run(protocol):
    # Load labware, modules and pipettes
    protocol.home()

    # Tips
    tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

    # Pipettes
    pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

    # Deep Well Plate
    temperature_plate = protocol.load_labware('nest_96_wellplate_2ml_deep', 6)

    # Agar Plate
    agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
    agar_plate.set_offset(x=0.00, y=0.00, z=Z_VALUE_AGAR)

    # Get the top-center of the plate, make sure the plate was calibrated before running this
    center_location = agar_plate['A1'].top()

    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)
    
    # Helper function (dispensing)
    def dispense_and_jog(pipette, volume, location):
        assert(isinstance(volume, (int, float)))
        # Go above the location
        above_location = location.move(types.Point(z=location.point.z + 2))
        pipette.move_to(above_location)
        # Go downwards and dispense
        pipette.dispense(volume, location)
        # Go upwards to avoid smearing
        pipette.move_to(above_location)

    # Helper function (color location)
    def location_of_color(color_string):
        for well,color in well_colors.items():
            if color.lower() == color_string.lower():
                return temperature_plate[well]
        raise ValueError(f"No well found with color {color_string}")

    # Print pattern by iterating over lists
    for i, (current_color, point_list) in enumerate(point_name_pairing):
        # Skip the rest of the loop if the list is empty
        if not point_list:
            continue

        # Get the tip for this run, set the bacteria color, and the aspirate bacteria of choice
        pipette_20ul.pick_up_tip()
        max_aspirate = int(18 // POINT_SIZE) * POINT_SIZE
        quantity_to_aspirate = min(len(point_list)*POINT_SIZE, max_aspirate)
        update_volume_remaining(current_color, quantity_to_aspirate)
        pipette_20ul.aspirate(quantity_to_aspirate, location_of_color(current_color))

        # Iterate over the current points list and dispense them, refilling along the way
        for i in range(len(point_list)):
            x, y = point_list[i]
            adjusted_location = center_location.move(types.Point(x, y))

            dispense_and_jog(pipette_20ul, POINT_SIZE, adjusted_location)
            
            if pipette_20ul.current_volume == 0 and len(point_list[i+1:]) > 0:
                # Refill only for the points still to be dispensed (point i is already done)
                quantity_to_aspirate = min(len(point_list[i+1:])*POINT_SIZE, max_aspirate)
                update_volume_remaining(current_color, quantity_to_aspirate)
                pipette_20ul.aspirate(quantity_to_aspirate, location_of_color(current_color))

        # Drop tip between each color
        pipette_20ul.drop_tip()
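To see how the script batches its dispenses, the sketch below reproduces the refill arithmetic outside the robot. It reuses the script's `POINT_SIZE` of 1.25 µL and its 18 µL aspiration cap; `aspiration_plan` is a hypothetical helper, not part of the exported protocol.

```python
import math

POINT_SIZE = 1.25   # µL dispensed per point, as in the script
TIP_LIMIT = 18      # µL cap used by the script when computing max_aspirate


def aspiration_plan(n_points):
    """Return (points per fill, µL per full fill, number of fills, total µL)."""
    points_per_fill = int(TIP_LIMIT // POINT_SIZE)
    max_aspirate = points_per_fill * POINT_SIZE
    fills = math.ceil(n_points / points_per_fill)
    total_volume = n_points * POINT_SIZE
    return points_per_fill, max_aspirate, fills, total_volume


# A colour with only three points needs a single small aspiration.
print(aspiration_plan(3))
```

A helper like this makes it easy to estimate, before the run, how much of each colour needs to be loaded into its source well.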

I also experimented with a Google Colab notebook, where I generated a design based on an image resembling the Earth. earth


Part 2: Post-Lab Questions

2.1 Published Paper Using Automation

paper paper

Paper Title

An Automated Versatile Diagnostic Workflow for Infectious Disease Detection in Low-Resource Settings

Source

https://www.mdpi.com/2072-666X/15/6/708

Summary

This paper presents an automated diagnostic workflow designed for detecting infectious diseases in low-resource settings. The system integrates microfluidics, biosensing, and automation to process biological samples efficiently. It focuses on creating a scalable and portable diagnostic pipeline that reduces manual intervention while maintaining accuracy.

Use of Automation

The workflow incorporates automation tools to handle multiple steps of the diagnostic process, including sample preparation, reagent handling, and reaction execution. Automated systems ensure precise liquid handling, reduce human error, and enable reproducibility across multiple tests. The integration of microfluidic platforms further enhances throughput and minimizes reagent usage.

Key Contribution

The key contribution of this work is the development of a versatile and low-cost automated diagnostic platform that can be deployed in resource-limited environments. It demonstrates how automation can bridge gaps in healthcare accessibility by enabling reliable and rapid disease detection.

Relevance to This Week

This paper directly relates to this week’s focus on lab automation using Opentrons. It highlights how automated liquid handling and integrated workflows can transform biological experiments into scalable and reproducible systems, similar to how we programmed the Opentrons robot.


2.2 Final Project — Automation Plan

Project Overview

For the final project, I propose developing an automated diagnostic system that detects disease biomarkers from breath condensate samples using a microfluidic and cell-free synthetic biology platform.

Problem Statement

Traditional diagnostic methods can be invasive, time-consuming, and require well-equipped laboratory settings. There is a need for a non-invasive, rapid, and scalable diagnostic solution that can work in low-resource environments.

Proposed Solution

The proposed system will combine breath-based sample collection with automated liquid handling and synthetic biology reactions. Using an Opentrons robot, the workflow will automate sample distribution, reagent addition, and reaction setup across multiple wells.


Workflow Description

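from opentrons import protocol_api

metadata = {
    'protocolName': 'Breath condensate assay setup',
    'apiLevel': '2.20'
}

def run(protocol: protocol_api.ProtocolContext):

    # Load labware and pipette
    tiprack = protocol.load_labware("opentrons_96_tiprack_20ul", 9)
    pipette = protocol.load_instrument("p20_single_gen2", "right", [tiprack])

    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)

    # Step 1: Distribute sample from the source well (A1) to every other well.
    # Note: 95 transfers x 10 uL exceeds one 360 uL well, so a real run would
    # need a larger source reservoir or fewer destination wells.
    source = plate['A1']
    for well in plate.wells()[1:]:
        pipette.pick_up_tip()
        pipette.aspirate(10, source)
        pipette.dispense(10, well)
        pipette.mix(2, 10, well)
        pipette.drop_tip()

    # Step 2: Incubation
    protocol.delay(minutes=30)

    # Step 3: Output ready (shown in the run log)
    protocol.comment("Reactions complete")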

Tools and Technologies

  • Opentrons liquid handling robot
  • Microfluidic chip systems
  • Cell-free synthetic biology platforms
  • Optional cloud lab systems (e.g., Ginkgo Nebula)

Experimental Plan

  1. Collect breath condensate sample
  2. Distribute samples into multiple wells using Opentrons
  3. Add reagents to initiate reactions
  4. Incubate under controlled conditions
  5. Measure outputs (fluorescence or color change)

Expected Outcome

The system will enable rapid, automated, and non-invasive detection of biomarkers with high reproducibility. It will demonstrate how automation can be used to scale biological diagnostics.

Part 3: Final Project Ideas

Idea 1 Breath-based diagnostic device

idea idea

Idea 2 Digital Cell Twin Modeling for Cancer and Oncology Virtual Cell Hypothesis Generation

idea idea

Idea 3 Decoding the genetic circuitry of lung cancer cells

idea idea

Week 4 HW: Protein Design Part I

Protein Design


Part A: Conceptual Questions

1. Number of amino-acid molecules in 500 g meat

  • Meat is roughly 20% protein → ~100 g protein in 500 g meat (order-of-magnitude estimate).
  • Average amino-acid residue ≈ 100 Da = 100 g/mol.
  • Moles of amino acids ≈ 100 g ÷ 100 g/mol = 1 mol.
  • Number of residues ≈ 1 mol × 6 × 10²³ mol⁻¹ ≈ 6 × 10²³ → roughly 10²⁴ amino-acid molecules to an order of magnitude.
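The estimate above can be written as a few lines of arithmetic. The 20% protein fraction and 100 g/mol residue mass are the same rough assumptions used in the bullets; `residues_in_meat` is a hypothetical helper name.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def residues_in_meat(mass_g, protein_fraction=0.20, residue_mass_g_per_mol=100.0):
    """Order-of-magnitude count of amino-acid residues in a mass of meat."""
    protein_g = mass_g * protein_fraction
    moles = protein_g / residue_mass_g_per_mol
    return moles * AVOGADRO


count = residues_in_meat(500)
print(f"{count:.1e} residues, order of magnitude 10^{round(math.log10(count))}")
```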

2. Why only 20 natural amino acids

Twenty amino acids provide a compromise between:

  • Chemical diversity (charge, hydrophobicity, size, reactivity)
  • Biosynthetic metabolic cost
  • Translational accuracy and evolutionary robustness

This set is sufficient to build functional proteins.

3. Can we design non-natural amino acids

Yes. Non-natural amino acids are widely synthesized. Examples of designs:

  • Fluorinated amino acids → increase stability
  • Photocaged amino acids → allow light-controlled protein activation
  • Azido- or alkyne-containing amino acids → enable click chemistry
  • Conformationally restricted amino acids → stabilize protein folds

They can be incorporated using engineered tRNA–synthetase systems.

4. Origin of amino acids before life

Amino acids were likely produced by prebiotic chemistry:

  • Atmospheric discharge reactions (e.g., Miller-Urey–type synthesis)
  • Hydrothermal vent chemistry
  • Organic molecules delivered by meteorites
  • Photochemical synthesis in early oceans

These processes generated many amino acids before biological enzymes existed.

5. Handedness of α-helix made from D-amino acids

Proteins built from D-amino acids form left-handed α-helices, which are mirror images of the right-handed helices formed by L-amino acids.

6. Why most molecular helices are right-handed

Because natural proteins are composed mainly of L-amino acids. The stereochemistry of the peptide backbone energetically favors right-handed α-helices.

7. Why β-sheets tend to aggregate

β-strands expose backbone hydrogen-bond donors and acceptors, allowing intermolecular hydrogen bonding. Side chains may pack via hydrophobic interactions, promoting sheet-to-sheet association.

8. Why do many amyloid diseases form β-sheets?

Amyloid diseases are associated with protein misfolding. Normally folded proteins are stabilized by their native tertiary structure, but under pathological conditions partial unfolding can expose backbone hydrogen-bond donors and acceptors. These segments tend to re-associate into cross-β structures, in which β-strands run perpendicular to the fibril axis and the inter-strand hydrogen bonds run parallel to it, so the β-sheets extend along the length of the fibril.

β-sheet conformations are favored because:

  • The peptide backbone can form extensive intermolecular hydrogen-bond networks, which provides high thermodynamic stability.

  • Side chains can pack tightly through hydrophobic interactions, reducing solvent exposure.

  • Once a nucleus forms, β-sheet aggregation becomes self-propagating, leading to fibril growth.

This aggregation tendency underlies diseases such as Alzheimer-type neurodegeneration and several systemic amyloidoses, where misfolded peptides accumulate into insoluble fibrils.

9. Can amyloid β-sheets be used as materials? Design of a β-sheet motif

Yes. Amyloid β-sheet assemblies are actively explored as biomaterials because they are:

  • Mechanically strong due to dense hydrogen-bond networks
  • Highly ordered at the nanoscale
  • Capable of programmable self-assembly
  • Biocompatible when designed carefully

A useful design strategy is to construct a short amphiphilic β-hairpin peptide motif that promotes controlled fibrillization.

Example β-sheet motif design

A common design is an alternating pattern of hydrophobic and hydrophilic residues:

X – H – X – H – X – H – X – H – turn – H – X – H – X – H – X – H – X

Where:

  • H = hydrophobic residue (e.g., Val, Leu, Ile, Phe) to drive core packing
  • X = charged or polar residue (e.g., Lys, Glu, Ser) to improve solubility and orientation

The turn region can use a Gly-Pro-Gly or similar flexible motif to stabilize the β-hairpin

Example specific sequence design:

  • KLVFFAEGPGAEFVLK

Design principles:

  • Include a nucleation core (often aromatic or hydrophobic residues) to trigger stacking.
  • Balance solubility and aggregation tendency to avoid uncontrolled precipitation.
  • Use end-capping charges to control fibril length if necessary.

Such motifs can form nanofibers, hydrogels, or scaffold materials for drug delivery, tissue engineering, and nanoelectronics.


Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

The protein I selected is Superoxide dismutase 1 (SOD1). I selected it because it is a key antioxidant enzyme that protects cells from oxidative stress by catalyzing the conversion of superoxide radicals into oxygen and hydrogen peroxide. It is relevant to lung oxidative injury and can be integrated into redox-sensing diagnostic platforms.

2. Identify the amino acid sequence of your protein

The amino acid sequence of SOD1 is:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

3. Sequence length and most frequent amino acid

  • Length: 154 amino acids for the sequence listed above (153 in the mature protein once the initiator methionine is removed)
  • Most frequent amino acid: Glycine (G), with 25 of 154 residues (about 16%).
image image
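A quick way to check the length and residue composition of the sequence listed above is to count characters directly. This is a minimal sketch using only the standard library.

```python
from collections import Counter

# SOD1 sequence exactly as listed above (UniProt P00441).
SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

counts = Counter(SOD1)
residue, n = counts.most_common(1)[0]
print(f"length = {len(SOD1)}")
print(f"most frequent residue = {residue} ({n}, {100 * n / len(SOD1):.1f}%)")
```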

4. How many protein sequence homologs are there for your protein?

Using UniProt BLAST search, SOD1 has hundreds of homologous sequences across different species. The enzyme is highly conserved because of its essential antioxidant function.

image image
  • BLAST search for SOD1 sequence returned approximately 250 homologous protein sequences, indicating that SOD1 is a highly conserved protein across many species.
image image

5. Protein family?

SOD1 belongs to the Cu/Zn superoxide dismutase family, which contains metalloenzymes that detoxify superoxide radicals.

6. Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good quality structure? Are there any other molecules in the solved structure apart from protein?

Representative structure:

The structure of human Cu,Zn superoxide dismutase 1 was deposited and released in 2003 (deposition date: 2003-03-13, release date: 2003-05-08). It is a good-quality structure: its resolution is 1.80 Å, well below the reference threshold of 2.70 Å (a smaller value means finer detail), which indicates high structural accuracy and reliable atomic detail.

Other molecules present in the structure:

  • Copper (Cu²⁺)
  • Zinc (Zn²⁺)
  • Water molecules

7. Structure classification family

The protein belongs to the Cu,Zn superoxide dismutase family.

8. PyMOL Visualization

image image

In PyMOL, you can visualize the same protein in different styles by creating multiple representations or by toggling representations in the same session.

Use the following commands:

Cartoon representation

hide everything, all
show cartoon, SOD1
color cyan, SOD1
image image

Ribbon representation

hide everything, all
show ribbon, SOD1
color magenta, SOD1
image image

Ball and stick (sticks + spheres)

hide everything, all
show sticks, SOD1
show spheres, SOD1
set sphere_scale, 0.25
color yellow, SOD1
image image

9. Secondary structure composition

SOD1 contains:

  • More β-sheets than α-helices.
  • The structure is mainly a β-barrel with several loop regions.
image image

10. Hydrophobic vs hydrophilic residue distribution

  • Hydrophobic residues are mainly buried inside the β-barrel core to stabilize the structure.
  • Hydrophilic residues are exposed on the surface, facilitating solvent interaction and enzymatic activity.

Color protein by residue type image image

11. Surface pocket / binding cavity

  • Yes, SOD1 has metal-binding pockets that coordinate copper and zinc ions. These pockets are essential for catalytic conversion of superoxide radicals.
Visualize protein surface and binding pockets

image image

Part C: Using ML-Based Protein Design Tools

Part C1: Protein Language Modeling

a. Unsupervised deep mutational scan using ESM2

To generate an unsupervised deep mutational scan (DMS), the wild-type protein sequence is first passed through ESM2 to compute the log-likelihood of the native amino acid at each position. Then, for every residue position, all 19 possible single amino acid substitutions are introduced computationally, and the change in log-likelihood (Δ log P) relative to the wild type is calculated. These scores approximate how evolutionarily plausible each mutation is according to the language model. The resulting matrix (positions × 20 amino acids) is visualized as a heatmap, where strongly negative values indicate mutations that are highly disfavored and positive or near-zero values indicate tolerated substitutions. In this sequence, the scan reveals vertical dark bands at specific positions, suggesting strong evolutionary constraint, while other positions show a broader distribution of tolerated mutations, indicating structural or functional flexibility.
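The scoring step can be sketched without the model itself. Assuming the language model has already produced one logit per amino acid at a position, the score of each substitution is its log-probability minus that of the wild-type residue. The logits below are made-up placeholders, not ESM2 output, and `mutation_scores` is a hypothetical helper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def mutation_scores(position_logits, wild_type):
    """Return {residue: log P(residue) - log P(wild type)} for one position."""
    log_probs = log_softmax(position_logits)
    wt_log_prob = log_probs[AMINO_ACIDS.index(wild_type)]
    return {aa: lp - wt_log_prob for aa, lp in zip(AMINO_ACIDS, log_probs)}


# Placeholder logits: the wild-type histidine is favoured by 2 units over the rest.
logits = [0.0] * len(AMINO_ACIDS)
logits[AMINO_ACIDS.index("H")] = 2.0
scores = mutation_scores(logits, "H")
print(scores["P"])
```

Repeating this for every position gives the positions × 20 matrix that is drawn as the heatmap. The wild-type entry is zero by construction.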

b. Interpretation of a specific pattern

One notable pattern appears around residue His45 within the motif GLHGFHVHEF. This region contains multiple histidines and glycines, suggesting structural or catalytic relevance. The heatmap shows that most substitutions at position 45 are strongly penalized, forming a pronounced vertical stripe. A particularly deleterious mutation is H45P (Histidine to Proline). Proline imposes rigid backbone constraints due to its cyclic structure and often disrupts helices or active-site conformations. If His45 participates in hydrogen bonding, catalysis, or metal coordination, replacing it with proline would disrupt both structural geometry and chemical functionality. ESM2 assigns a strongly negative likelihood change to this mutation, indicating that such substitutions are rarely observed across evolution. This pattern reflects evolutionary conservation and suggests that His45 is functionally important.

image imageimage image

c. Latent Space Analysis

The given protein sequence

MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

was embedded into a reduced-dimensional latent space together with the SCOP protein dataset using sequence-derived features and t-SNE dimensionality reduction. Each point in the latent space represents one protein domain, where spatial proximity indicates similarity in sequence-derived biochemical and structural properties.

image imageimage image

Neighborhood analysis: In the 3D t-SNE map, the protein lies within a dense central cluster of soluble proteins rather than in sparse peripheral branches. This indicates that its nearest neighbors share:

  • similar amino-acid composition
  • comparable length (~150 aa)
  • soluble cytosolic nature
  • globular enzyme-like fold

Thus, the latent space neighborhood approximates proteins with related structural class and biochemical characteristics.

image image

Interpretation of protein properties from sequence

From the sequence:

length ≈ 150 aa → typical small globular domain
rich in Gly, Ala, Val, Leu → hydrophobic core residues
contains His, Glu, Asp → catalytic/metal-binding potential
no signal peptide or TM region → soluble cytosolic protein

These features are characteristic of:

1. small α/β enzyme domains
2. bacterial metabolic proteins

which matches the central cluster location in the embedding.
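
The composition-based observations above can be checked directly from the sequence. The sketch below only counts residues; it makes no claim about fold or localization, and the function name `composition` is just illustrative.

```python
from collections import Counter

# Query sequence copied from this section, split for readability.
SEQ = (
    "MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA"
    "GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH"
    "EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

def composition(seq):
    """Return amino-acid fractions, most frequent residue first."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.most_common()}

comp = composition(SEQ)
print("length:", len(SEQ))
for aa, frac in list(comp.items())[:5]:
    print(aa, f"{frac:.1%}")
```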

Position relative to neighbors

Because the protein falls in the dense manifold region:

* it is not a membrane protein (these form separate arms)
* it is not a repeat or coiled-coil protein (these form elongated branches)
* it is not a β-rich outer-membrane protein (these sit in the lower sparse region)

Therefore its neighbors are likely:

* bacterial enzymes
* dehydrogenase-like domains
* metal-binding proteins
* small metabolic proteins

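The proximity suggests a shared fold class and related biochemical function, although distances in a t-SNE map of sequence-derived features are not direct evidence of either.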

Do neighborhoods approximate similar proteins?

Yes. The t-SNE embedding groups proteins according to sequence-derived structural features. The position of the query protein within the soluble globular cluster shows that the learned representation successfully captures structural similarity, since its neighbors correspond to proteins of similar size, composition, and fold class.

The provided protein sequence was embedded together with SCOP protein domains into a reduced-dimensional latent space using sequence-derived features and t-SNE. In the resulting 3D map, the protein is located within a dense central cluster corresponding to soluble globular proteins. Its neighborhood contains proteins of similar length (~150 amino acids), amino-acid composition, and cytosolic localization, indicating comparable structural architecture. Sequence analysis shows a typical small α/β enzyme-like domain enriched in hydrophobic core residues and catalytic amino acids, consistent with its latent space position. The absence of transmembrane or repeat features further supports its placement away from peripheral clusters. Therefore, the latent space neighborhoods approximate biologically similar proteins, and the query protein is most similar to small soluble metabolic enzyme domains in the dataset.

Part C2: Protein Folding

The protein sequence was folded using ESMFold and the predicted structure was compared with the original structure. Visual inspection shows that both structures share the same overall fold, characterized by a β-sheet-rich globular architecture with similar strand arrangement and topology. The predicted model reproduces the native secondary-structure elements, domain organization, and β-sheet packing, indicating strong agreement with the original coordinates. Minor differences are observed mainly in loop and terminal regions, which are typically flexible and harder to predict. Therefore, the ESMFold-predicted coordinates match the original protein structure well, confirming that the sequence contains sufficient information to recover the correct fold.
image imageimage imageimage imageimage image
Creating Mutations: 
Point mutations

Example (1 substitution, I151V):
MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGVAQ
(change IAQ → VAQ at the C-terminus)
Output:
image image
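
A quick way to confirm which positions differ between the wild-type and mutated sequences is a residue-by-residue comparison. The helper below is a minimal sketch (the name `substitutions` is illustrative); the mutant is built from the wild type by replacing the last three residues, as in the example above.

```python
def substitutions(wild_type, mutant):
    """Return (position, wt_residue, mutant_residue) for each mismatch, 1-based."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length for a substitution-only diff")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(wild_type, mutant), start=1)
        if a != b
    ]

WILD_TYPE = (
    "MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA"
    "GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH"
    "EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)
MUTANT = WILD_TYPE[:-3] + "VAQ"  # C-terminal IAQ -> VAQ

print(substitutions(WILD_TYPE, MUTANT))  # [(151, 'I', 'V')]
```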

Part C3: Protein Generation

New generated sequence:
ALSAEEAAKLKAAWAPVFANKEANGKAFILTLFEKYPEIKEYFPEFKGKTLEEIKASPKLDEIAGKFFDTLETLVANADDAAAMATLFKDLAAKHVAKGITAAHFEKIREIFPGFVASVAPPPAGAAAAWDKLFGMVIDALKAAGG
image imageimage imageimage imageimage image
The SOD1 structures predicted with ESMFold and inverse folding were compared with the experimentally resolved holo structure of human Cu,Zn superoxide dismutase (PDB: 1HL5), which represents the metal-bound, fully stabilized conformation of the enzyme. Both predicted models recapitulate the canonical Greek-key β-barrel architecture characteristic of SOD1, so the global fold is preserved despite the absence of explicit metal ions and experimental restraints. Localized deviations are observed in flexible loop regions and in the precise geometry of the catalytic Cu/Zn-binding sites, which are well defined in 1HL5 because of metal coordination and the conserved disulfide bond.

Notably, for protein generation, a different amino acid sequence was intentionally used in comparison to the native 1HL5 sequence. The modeled sequence is

MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

whereas the 1HL5 PDB sequence (chains A–R) is

ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

The two sequences differ only at the first two residues (Met-Val vs Ala-Thr). Despite this sequence variation, the predicted structures maintain strong structural concordance with the experimental SOD1 fold, indicating that the SOD1 β-barrel core is robust to minor sequence changes. Subtle differences in loop conformations and compactness likely reflect the combined effects of sequence variation and the apo-like nature of the computational models relative to the holo, metal-stabilized 1HL5 structure.

Part D: Group Project

  • formed a group
  • Group Project Link: https://docs.google.com/document/d/1ENvPHhRbBgtl0ERrfqmomJKxPg68nfvCugrPQrDdM7o/edit?usp=sharing
  • Proposal: By: 2026a-nourelden-rihan, 2026a-ritika-saha, 2026a-rahul-yaji, 2026a-keerthana-gunaretnam
  • We decided to focus on the main area of increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing the dependency on host DnaJ, while still maintaining the lysis action.
  • The tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold were discussed.
  • BLAST can pull out homologous lysis proteins from the databases.
  • Clustal Omega can create MSAs to identify the essential L48-S49 residues and the pore-forming regions that must not be mutated.
  • ESM can create mutation heatmaps, which can guide the use of ESMFold to find the highest-scoring folds in mutable regions.
  • AlphaFold Multimer can be used to predict whether the subunits of our protein can form a pore in the host membrane, and to check whether the N-terminus can break the interaction with DnaJ.
  • We also identified a few pitfalls. The major ones concern limited training datasets, which may not be well suited to designing a transmembrane lysis protein.
  • Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stable protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.

Week 5 HW: Protein Design Part II

Protein Design II

video video

SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

  1. Retrieval of SOD1 Sequence

The human Superoxide Dismutase 1 (SOD1) protein sequence was retrieved from UniProt (Accession P00441).

Wild-type sequence (first region):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
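  2. Introduction of the A4V Mutation

The classical ALS mutation A4V replaces Alanine (A) with Valine (V) near the N-terminus.

Counting from the initiator Met, the retrieved sequence begins:

| Position | Residue |
|---|---|
| 1 | M |
| 2 | A |
| 3 | T |
| 4 | K |
| 5 | A |
| 6 | V |

With this numbering, residue 4 is Lysine, not Alanine. The name A4V follows the conventional SOD1 numbering of the mature protein, which omits the initiator Met, so Ala4 corresponds to position 5 of the UniProt sequence. The mutation was therefore applied at position 5.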

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

This substitution increases hydrophobicity near the N-terminus and is known to destabilize SOD1, promoting aggregation associated with aggressive familial ALS.
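
The numbering offset can be made explicit in code. The sketch below assumes the mutation name counts residues of the mature protein (initiator Met removed), which is the usual convention for SOD1 variant names such as A4V; the function name is hypothetical and only the first 12 residues of the sequence are used.

```python
def apply_mature_numbering_mutation(precursor, mutation):
    """Apply a mutation such as 'A4V' to a Met-containing sequence.

    The position in the mutation name is assumed to count from the
    residue after the initiator Met (mature-protein numbering).
    """
    wt, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if precursor[0] != "M":
        raise ValueError("expected a sequence that starts with the initiator Met")
    index = pos  # (pos - 1) for 0-based indexing, + 1 for the Met offset
    if precursor[index] != wt:
        raise ValueError(f"expected {wt} at mature position {pos}, found {precursor[index]}")
    return precursor[:index] + new + precursor[index + 1:]

prefix = "MATKAVCVLKGD"  # first 12 residues of the UniProt sequence
print(apply_mature_numbering_mutation(prefix, "A4V"))  # MATKVVCVLKGD
```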

  3. Peptide Generation with PepMLM

Using the PepMLM-650M model Colab, the mutant SOD1 sequence was used as the conditioning context to generate four peptides of length 12 amino acids.

image image
While running the PepMLM Colab notebook, the peptide generation step produced the same sequence, WRYYAVAAAHKX, for all four peptides. A likely explanation is deterministic decoding, in which the model selects the highest-probability amino acid at each position for a given input. Because the conditioning sequence (the A4V mutant SOD1) and the generation settings were the same for each run, the model repeatedly produced the identical peptide instead of diverse sequences. The “X” at the end of the sequence indicates that the model predicted an unknown or unresolved amino acid token at that position. As a result, all four peptides were identical, and the control peptide FLYRWLPSRRGG was added separately for comparison, as required in the assignment.
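
The difference between deterministic and stochastic decoding can be illustrated with a toy model. This is not PepMLM code: `toy_logits` is a hypothetical scoring function that ignores the target sequence, and the sketch only shows why greedy decoding repeats itself while sampling does not.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_logits(position):
    """Hypothetical per-position scores; a real model would condition on the target."""
    rng = random.Random(position)  # fixed per position, so the toy model is deterministic
    return [rng.uniform(0.0, 3.0) for _ in AMINO_ACIDS]

def greedy_peptide(length):
    """Deterministic decoding: always take the highest-scoring residue."""
    out = []
    for pos in range(length):
        logits = toy_logits(pos)
        out.append(AMINO_ACIDS[logits.index(max(logits))])
    return "".join(out)

def sampled_peptide(length, rng, temperature=1.0):
    """Stochastic decoding: draw each residue from softmax(logits / temperature)."""
    out = []
    for pos in range(length):
        logits = toy_logits(pos)
        weights = [math.exp(x / temperature) for x in logits]
        out.append(rng.choices(AMINO_ACIDS, weights=weights, k=1)[0])
    return "".join(out)

greedy = [greedy_peptide(12) for _ in range(4)]
rng = random.Random(0)
sampled = [sampled_peptide(12, rng) for _ in range(4)]

print("greedy :", greedy)   # four identical peptides
print("sampled:", sampled)  # peptides differ between draws
```

If the notebook exposes a sampling option or a temperature setting, enabling it would be the analogous change; whether it does is not confirmed here.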

Snapshot of the output (of a particular section, not all) image image

Final generated peptides and control sequence are as follows:

| Peptide | Sequence |
|---|---|
| Pep1 | WRYYAVAAAHKX |
| Pep2 | WRYYAVAAAHKX |
| Pep3 | WRYYAVAAAHKX |
| Pep4 | WRYYAVAAAHKX |
| Control | FLYRWLPSRRGG |

PepMLM Token Prediction Scores:

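| Position | Amino Acid | Score |
|---|---|---|
| 1 | W | 0.562357 |
| 2 | R | 0.230632 |
| 3 | Y | 0.458953 |
| 4 | Y | 0.257805 |
| 5 | A | 0.329096 |
| 6 | V | 0.214972 |
| 7 | A | 0.337871 |
| 8 | A | 0.136613 |
| 9 | A | 0.123724 |
| 10 | H | 0.186813 |
| 11 | K | 0.268938 |
| 12 | X | 0.243224 |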

Part 2: Evaluate Binders with AlphaFold3

Submission to AlphaFold Server

The mutant A4V SOD1 FASTA sequence was submitted to the AlphaFold Server. For each test, the SOD1 mutant sequence was entered as the first chain, followed by the peptide sequence as the second chain to model the protein–peptide complex.

The following image shows the submission of SOD1 mutant sequence to the AlphaFold Server: image image

The result generated through this submission is as follows: image image

  1. Peptide 1 Evaluation

Original PepMLM Sequence

WRYYAVAAAHKX

Because X represents an unknown amino acid, it was replaced with E (Glutamic acid) before submission to AlphaFold:

Final peptide used:

WRYYAVAAAHKE
image image

AlphaFold Scores

| Metric | Value |
|---|---|
| ipTM | 0.26 |
| pTM | 0.71 |

Structural Observation

The AlphaFold prediction produced an ipTM score of 0.26 and a pTM score of 0.71. The pTM value indicates that the overall SOD1 protein structure is predicted with reasonable confidence. However, the very low ipTM score suggests weak or negligible interaction between the peptide and SOD1.

Visualization of the predicted complex shows that the peptide is loosely positioned on the surface of the protein and does not form a clear binding interface. The peptide does not appear to localize near the N-terminal region where the A4V mutation occurs. Additionally, it does not penetrate the β-barrel core or interact with the dimer interface of the protein.

This result suggests that the PepMLM-generated peptide is unlikely to bind strongly to mutant SOD1.

  2. Control Peptide Evaluation

Control Sequence

FLYRWLPSRRGG

image image

AlphaFold Scores

| Metric | Value |
|---|---|
| ipTM | 0.32 |
| pTM | 0.82 |

Structural Observation

The AlphaFold prediction for the control peptide produced an ipTM score of 0.32 and a pTM score of 0.82. The relatively high pTM value indicates that the overall SOD1 protein structure was predicted with high confidence, consistent with its known β-barrel fold.

However, the ipTM score remains relatively low, suggesting weak or unreliable interaction between the peptide and SOD1. Visualization of the predicted complex shows that the peptide is positioned along the outer surface of the protein rather than forming a well-defined binding pocket.

The peptide does not localize near the N-terminal region containing the A4V mutation and does not strongly engage the β-barrel core or the dimer interface. Instead, the peptide remains largely surface-bound, suggesting that the interaction may be nonspecific or transient.

  3. Summary of AlphaFold Results

| Peptide | Sequence | ipTM | Binding Observation |
|---|---|---|---|
| PepMLM peptide | WRYYAVAAAHKE | 0.26 | Peptide appears loosely positioned on the surface of SOD1 and does not form a well-defined binding interface. It does not localize near the A4V mutation site. |
| Control peptide | FLYRWLPSRRGG | 0.32 | Peptide remains surface-bound and does not strongly interact with the β-barrel core or dimer interface. |

  4. Binding Site Analysis

| Region | Observation |
|---|---|
| N-terminus (A4V site) | Peptide does not bind near this region |
| β-barrel core | Peptide does not penetrate the barrel |
| Dimer interface | Peptide does not appear positioned between monomers |
| Protein surface | Peptide appears loosely surface-bound |

  5. Final Interpretation

The AlphaFold predictions produced low ipTM scores for both peptides, indicating weak predicted interactions with the SOD1 protein. The PepMLM-generated peptide (WRYYAVAAAHKE) showed an ipTM value of 0.26, suggesting very little confidence in a stable binding interface. The control peptide (FLYRWLPSRRGG) produced a slightly higher ipTM value of 0.32, but this is still well below the range usually treated as reliable for an interface (ipTM above 0.8 is generally considered confident, and values below 0.6 usually indicate a failed prediction).

Visualization of the predicted complexes shows that both peptides remain largely surface-bound and do not interact strongly with the N-terminal A4V mutation site, the β-barrel core, or the dimer interface. None of the PepMLM-generated peptides matched or exceeded the predicted binding strength of the control peptide, and both peptides appear to form weak and nonspecific interactions with SOD1.
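
The ipTM values can be binned with a small helper. The cut-offs used here (above 0.8 confident, below 0.6 likely failed) are the commonly quoted rule of thumb for AlphaFold interface scores and are an assumption of this sketch, not a result of the analysis; the function name is illustrative.

```python
def interpret_iptm(iptm):
    """Map an ipTM score to a coarse confidence label (rule-of-thumb cut-offs)."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM is expected to lie between 0 and 1")
    if iptm > 0.8:
        return "confident interface"
    if iptm >= 0.6:
        return "grey zone"
    return "likely failed interface"

# ipTM values reported above for the two complexes.
results = {"WRYYAVAAAHKE": 0.26, "FLYRWLPSRRGG": 0.32}
for peptide, iptm in results.items():
    print(peptide, iptm, interpret_iptm(iptm))
```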

Highlighting the N-terminus Region

To further examine the predicted binding location, the N-terminal region of the SOD1 protein, which contains the A4V mutation, was highlighted in the AlphaFold structure. This visualization allowed for direct observation of whether the peptide interacts with or binds near this mutation site.

Upon inspection of the predicted complex, the peptide does not localize near the N-terminal region and does not appear to form interactions with residues surrounding the A4V mutation. Instead, the peptide remains positioned on the outer surface of the protein, away from the mutation site. This observation suggests that the peptide is unlikely to specifically target the A4V region of the mutant SOD1 protein.

image image

Highlighting the Control Peptide Sequence

image image

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

The therapeutic properties of the generated peptides were analyzed using the PeptiVerse platform.

image image

Results obtained:

image image

Therapeutic Property Evaluation Using PeptiVerse

The peptide WRYYAVAAAHKE was further analyzed using PeptiVerse to evaluate its potential therapeutic properties. The peptide sequence and the A4V mutant SOD1 sequence were provided as inputs, and several relevant properties were predicted.

Predicted Peptide Properties

| Property | Predicted Value |
|---|---|
| Solubility Probability | 1.00 |
| Hemolysis Probability | 0.018 |
| Net Charge (pH 7) | 0.85 |
| Molecular Weight | 1464.6 Da |
| GRAVY Hydrophobicity | −0.60 |
| Cell Permeability | 0.494 |
| Estimated Half-Life | ~0.46 hours |

The peptide is predicted to be highly soluble, which is a desirable property for therapeutic peptides. It also shows a very low hemolysis probability, suggesting that it is unlikely to damage red blood cells. The moderate molecular weight and near-neutral net charge may support reasonable biological compatibility.

The GRAVY hydrophobicity score of −0.60 indicates that the peptide is relatively hydrophilic, which aligns with the predicted high solubility. However, the predicted cell permeability is moderate, and the estimated half-life of approximately 0.46 hours suggests limited stability in biological environments.
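
Two of the physicochemical values can be recomputed directly from the sequence as a sanity check. The sketch below uses the standard Kyte-Doolittle hydropathy scale and average residue masses; it is an independent calculation, not the method PeptiVerse uses internally.

```python
# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added back for the free termini

def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)

def molecular_weight(peptide):
    """Average molecular weight of a linear, unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

peptide = "WRYYAVAAAHKE"
print(f"GRAVY = {gravy(peptide):.2f}")
print(f"MW    = {molecular_weight(peptide):.1f} Da")
```

Both values agree with the table above (GRAVY −0.60, molecular weight 1464.6 Da).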

Comparison of Structural and Therapeutic Predictions

When comparing the structural predictions with the therapeutic property analysis, the results appear consistent. The low ipTM value from AlphaFold3 indicates weak predicted binding between the peptide and SOD1, and the structural visualization supports this by showing a surface-bound peptide without a well-defined binding interface.

Although the peptide does not demonstrate strong predicted binding affinity, it does not exhibit problematic therapeutic properties, such as high hemolysis risk or poor solubility, which are common limitations in peptide drug candidates.

Peptide Selection for Advancement

WRYYAVAAAHKE represents a reasonable peptide candidate to advance for further study. While its predicted binding strength to SOD1 is relatively weak, it demonstrates favorable therapeutic characteristics, including high solubility, low hemolysis probability, and acceptable physicochemical properties.

Future optimization approaches, such as targeted peptide redesign or guided peptide generation methods, could potentially improve binding affinity while preserving these favorable therapeutic traits.

Part 4: Generate Optimized Peptides with moPPIt

The given mutant sequence was used to generate the optimized peptide:

image image

The motif positions were set to residues 1–10 during peptide generation. Additionally, only three optimization properties were selected in the notebook because the computation was performed on a T4 GPU in Google Colab, which has limited computational resources. Reducing the number of selected properties helped ensure that the notebook ran efficiently within the available GPU memory and runtime constraints.

The notebook took more than 40 minutes to run.

image image

moPPIt Generated Peptides

The model generated three candidate peptides with predicted values for solubility, binding affinity, and motif score.

| Binder | Solubility | Predicted Affinity | Motif Score |
|---|---|---|---|
| YNQKYSQCKYAC | 0.9167 | 6.42 | 0.68 |
| IKYINQKLKELR | 0.6667 | 7.18 | 0.75 |
| QDDKSEEEEDGQ | 1.00 | 4.70 | 0.34 |

Comparison of moPPIt Peptides vs PepMLM Peptide

The moPPIt binder predictions produced three peptide candidates with varying physicochemical and predicted binding properties.

For comparison, the PepMLM-generated peptide (WRYYAVAAAHKE) evaluated earlier showed:

  1. Excellent solubility (1.0)

  2. Very low hemolysis probability (0.018), indicating favorable therapeutic safety

However, AlphaFold3 predicted weak structural binding with an ipTM ≈ 0.26, suggesting low confidence in stable interaction with the SOD1 A4V protein.

In contrast, the moPPIt peptides come with predicted binding affinity scores (4.70–7.18). No affinity score on this scale was reported for the PepMLM peptide above, so the comparison with its ipTM is indirect, but the moPPIt scores suggest stronger potential interaction with the target protein. The moPPIt peptides also vary more in solubility. For example, IKYINQKLKELR shows only moderate solubility (0.67), which could affect therapeutic delivery.

The moPPIt peptides appear optimized for binding affinity, whereas the PepMLM peptide shows favorable therapeutic properties, such as solubility and safety, without having been optimized for them.

Evaluation Before Clinical Advancement

Before advancing any of these peptides to clinical studies, several additional evaluations would be necessary.

  1. Structural Validation

Further structural analysis should be performed using tools such as AlphaFold3 or molecular docking to confirm the predicted binding interface with the A4V mutant SOD1 protein. This would help determine whether the peptide binds near the N-terminal A4V mutation site, the β-barrel region, or the dimer interface.

  2. Binding Affinity Testing

Experimental assays such as surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) should be performed to measure the actual binding strength between the peptide and the SOD1 protein.

  3. Stability and Pharmacokinetics

Peptides should be evaluated for serum stability and biological half-life. Additional studies should assess protease resistance and degradation rates to determine whether the peptide remains stable in physiological conditions.

  4. Toxicity and Safety

Safety evaluation is essential before clinical use. Experiments should test hemolysis, cytotoxicity, and potential immunogenic responses in relevant cell culture models.

  5. Functional Assays

Functional assays should determine whether the peptide can reduce aggregation or toxicity of mutant SOD1, which is an important mechanism in ALS therapeutic development.

Interpretation

The moPPIt peptides demonstrate stronger predicted binding affinity, particularly IKYINQKLKELR, which shows the highest affinity and motif score among the generated candidates. However, the PepMLM peptide shows superior solubility and safety predictions.

An ideal therapeutic peptide would balance strong binding affinity with favorable physicochemical and safety properties. Therefore, further computational validation and experimental testing would be required to determine which peptide candidate provides the best overall balance of binding performance, stability, and therapeutic safety.
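
One simple way to reason about this balance is a Pareto comparison: a candidate is kept unless another candidate is at least as good on every score and strictly better on one. The sketch below applies this to the three moPPIt peptides, assuming that higher is better for all three reported scores.

```python
# (solubility, predicted affinity, motif score) as reported above.
candidates = {
    "YNQKYSQCKYAC": (0.9167, 6.42, 0.68),
    "IKYINQKLKELR": (0.6667, 7.18, 0.75),
    "QDDKSEEEEDGQ": (1.00, 4.70, 0.34),
}

def dominates(a, b):
    """True if a is at least as good as b on every score and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores):
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, own in scores.items():
        dominated = any(
            dominates(other, own)
            for other_name, other in scores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

print(pareto_front(candidates))
```

All three peptides remain on the front, which reflects the trade-off described above: no single candidate is best on every property.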

Visualization of moPPIt Peptides

  1. YNQKYSQCKYAC
image image
  2. IKYINQKLKELR
image image
  3. QDDKSEEEEDGQ
image image

FINAL GROUP PROJECT: Phage Lysis Protein Design Challenge

  1. Introduction

Bacteriophage lysis proteins are responsible for disrupting the host bacterial membrane during phage infection, allowing the release of viral particles. The MS2 lysis protein is a small membrane-associated protein composed of 75 amino acids and contains two major functional regions:

| Domain | Residues | Function |
|---|---|---|
| Soluble domain | 1–40 | Interaction with host chaperone protein DnaJ |
| Transmembrane helix | 41–75 | Membrane insertion and pore formation |

Lysis Protein Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Design Objective

Design five mutations in the lysis protein:

  1. 2 mutations in the soluble region

  2. 2 mutations in the transmembrane region

  3. 1 mutation anywhere in the sequence

These mutations should preserve protein function while potentially improving stability or membrane activity.

  2. Evolutionary Analysis

2.1 Protein BLAST

Homologous sequences for the MS2 lysis protein were obtained using Protein BLAST.

The sequences were downloaded in FASTA format and used for multiple sequence alignment.

image image

2.2 Multiple Sequence Alignment

Multiple sequence alignment was performed using Clustal Omega.

Tool used:

https://www.ebi.ac.uk/jdispatcher/msa/clustalo

Homologous sequences used

  • WP_434006754.1

  • WP_434006752.1

  • SNQ28029.1

  • ACN90570.1

  • AAF19634.1

  • ACN90183.1

  • ACN90501.1

  • ACN90441.1

  • ACN90250.1

These sequences represent related phage lysis proteins.

After running BLAST, the FASTA (cluster) file was downloaded:

image image
  3. Conservation Analysis

Clustal Omega indicates conservation using the following symbols:

| Symbol | Meaning |
|---|---|
| * | Fully conserved residue |
| : | Strongly conserved |
| . | Weakly conserved |

Example conservation pattern:

** *  :***:**.  ** ***: ****** ** **

Key Conserved Motifs

Highly conserved motifs observed in the alignment include:

METRFPQQSQQTPAST
PCRRQQRSSTLY

These residues are likely essential for structural stability or host protein interaction, particularly with DnaJ.

Therefore, fully conserved residues should not be mutated.
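
The conservation line can be parsed programmatically to list the columns that are not fully conserved. The sketch below uses a short made-up consensus string rather than the real alignment; a blank in the Clustal consensus line means no conservation.

```python
SYMBOL_MEANING = {
    "*": "fully conserved",
    ":": "strongly conserved",
    ".": "weakly conserved",
    " ": "not conserved",
}

def classify_columns(consensus_line):
    """Translate each consensus symbol into its meaning."""
    return [SYMBOL_MEANING[symbol] for symbol in consensus_line]

def mutable_columns(consensus_line):
    """Return 1-based alignment columns that are not fully conserved."""
    return [i for i, symbol in enumerate(consensus_line, start=1) if symbol != "*"]

example = "**:. *"  # hypothetical six-column consensus line
print(classify_columns(example))
print(mutable_columns(example))  # [3, 4, 5]
```

Alignment columns only map directly to residue positions when the query sequence has no gaps before that column.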

image image
  4. Variable Regions

Regions showing substitutions or alignment gaps indicate evolutionary variability.

Example variable region:

RYRRPRGSNTGKEYRLKKFCRNI

Variation is also observed in the C-terminal region, where some sequences contain truncations or insertions.

Implication

Variable regions are better candidates for mutational engineering because they are less likely to disrupt protein function.

  5. Domain Analysis

The MS2 lysis protein contains two main structural regions:

| Region | Residues | Function |
|---|---|---|
| Soluble domain | 1–40 | Interaction with DnaJ |
| Transmembrane domain | 41–75 | Membrane insertion and pore formation |

  6. Soluble Region Conservation

The N-terminal soluble domain shows high conservation across homologous sequences.

Example conserved sequence:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLY

Mutations in this region must therefore be chosen carefully.

Candidate mutation sites

| Position | Residue | Reason |
|---|---|---|
| 12 | Q | Weakly conserved |
| 17 | N | Variable among homologs |
| 26 | Y | Moderate variability |

These positions may tolerate substitutions without disrupting protein folding.

  7. Transmembrane Region Conservation

The C-terminal region forms a transmembrane helix.

Example sequence:

LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

This region is highly hydrophobic, which is required for membrane insertion.

However, conservative substitutions between hydrophobic residues may be tolerated.

Candidate mutation sites

| Position | Residue | Reason |
|---|---|---|
| 52 | L | Hydrophobic substitution possible |
| 55 | I | Minor hydrophobic change |
| 59 | V | Frequently mutated experimentally |

  8. Key Observations from Alignment
  • The N-terminal region is highly conserved, indicating functional importance in host interaction.

  • Some residues in the soluble domain show moderate variability.

  • The transmembrane region remains hydrophobic but allows conservative substitutions.

  • Some homologous proteins exhibit C-terminal truncations, suggesting structural flexibility in this region.

  9. Mutation Design Strategy

The mutation design followed several biological constraints:

Rules applied

  • Avoid fully conserved residues

  • Prefer weakly conserved or variable residues

  • Maintain hydrophobicity in transmembrane helices

  • Preserve overall protein folding and stability

  10. Mutational Scoring Using Protein Language Models

Mutation effects are predicted using protein language models, such as:

  • ESM-1b

  • MSA Transformer

  • ProteinBERT

Mutation scoring used log-likelihood ratio (LLR) values.

LLR Interpretation

| Score | Interpretation |
|---|---|
| > 2 | Very favorable |
| 1–2 | Moderately favorable |
| 0–1 | Weakly favorable |
| < 0 | Unfavorable |
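
The score and the interpretation bands above can be written as two small functions. This is a sketch: the LLR is taken as log P(mutant) − log P(wild type) at the mutated position, and the handling of values that fall exactly on a band boundary is an assumption.

```python
import math

def llr(p_mutant, p_wild_type):
    """Log-likelihood ratio of the mutant residue versus the wild-type residue."""
    return math.log(p_mutant) - math.log(p_wild_type)

def interpret_llr(score):
    """Map an LLR score to the interpretation bands used in this section."""
    if score > 2:
        return "Very favorable"
    if score >= 1:
        return "Moderately favorable"
    if score >= 0:
        return "Weakly favorable"
    return "Unfavorable"

for value in (2.56, 1.81, 0.4, -0.5):
    print(value, interpret_llr(value))
```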

The following image shows the results obtained with the Protein Language Models (ESM).ipynb notebook:

image image
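  11. Top Ranked Mutations

| Position | WT | Mutation | LLR Score |
|---|---|---|---|
| 50 | K | L | 2.56 |
| 29 | C | R | 2.39 |
| 39 | Y | L | 2.24 |
| 29 | C | S | 2.04 |
| 9 | S | Q | 2.01 |
| 53 | N | L | 1.86 |
| 52 | T | L | 1.81 |
| 61 | E | L | 1.81 |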

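Many of the top-scoring mutations convert residues to Leucine (L). This is plausible for the transmembrane region, because leucine is strongly hydrophobic and favors membrane helices, but a high LLR reflects model likelihood only and is not a measurement of stability.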

  12. Mapping Mutations to Protein Regions

Soluble Region (1–40)

| Mutation | Score |
|---|---|
| C29R | 2.39 |
| C29S | 2.04 |
| S9Q | 2.01 |
| Y39L | 2.24 |
| F5Q | 1.79 |

Transmembrane Region (41–75)

| Mutation | Score |
|---|---|
| K50L | 2.56 |
| T52L | 1.81 |
| N53L | 1.86 |
| E61L | 1.81 |
| A45L | 1.53 |

  13. Biological Filtering

Risky mutations were removed using biological constraints.

Mutations excluded

  • C29R
  • C29S

Reason: cysteine residues may form structural interactions.

Safer alternatives

  • Y39L
  • S9Q
  • F5Q
  14. Final Selected Mutations

| Mutation | Region | LLR Score |
|---|---|---|
| S9Q | Soluble | 2.01 |
| Y39L | Soluble | 2.24 |
| K50L | Transmembrane | 2.56 |
| T52L | Transmembrane | 1.81 |
| N53L | Anywhere | 1.86 |

  15. Mutated Protein Sequence

Original Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Mutated Sequence

METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSLFLLQLLLSLLEAVIRTVTTLQQLLT

Mutations applied:

  • S9Q
  • Y39L
  • K50L
  • T52L
  • N53L
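
Applying the substitutions in code guards against copy errors, because each mutation is checked against the wild-type residue and the length cannot change. The sketch below uses the original sequence and the five mutations listed above; the function names and the simple region rule (1–40 soluble, 41–75 transmembrane) are illustrative.

```python
WILD_TYPE = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
MUTATIONS = ["S9Q", "Y39L", "K50L", "T52L", "N53L"]

def apply_mutations(sequence, mutations):
    """Apply point mutations such as 'S9Q' (1-based), verifying each wild-type residue."""
    residues = list(sequence)
    for mutation in mutations:
        wt, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
        if residues[pos - 1] != wt:
            raise ValueError(f"{mutation}: found {residues[pos - 1]} at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)

def region(position):
    """Assign a position to the soluble (1-40) or transmembrane (41-75) region."""
    return "soluble" if position <= 40 else "transmembrane"

mutant = apply_mutations(WILD_TYPE, MUTATIONS)
print(mutant)
print(len(WILD_TYPE), len(mutant))
for mutation in MUTATIONS:
    print(mutation, region(int(mutation[1:-1])))
```
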
  16. Comparison with Experimental Data

Experimental data supports mutational tolerance at several selected positions.

| Mutation | Position | Evidence | Interpretation |
|---|---|---|---|
| S9Q | 9 | No experimental mutation reported | Likely tolerant |
| Y39L | 39 | Y→H mutation reported | Position mutable |
| K50L | 50 | Multiple substitutions observed | Highly tolerant |
| T52L | 52 | Mutation recorded | Mutation tolerated |
| N53L | 53 | Several variants reported | Flexible boundary residue |

These results support the predicted soluble and membrane domain boundaries.

  17. Structural Prediction Using AlphaFold

The mutated sequence was modeled using AlphaFold Multimer

It required several attempts to successfully obtain a PDB file. Initially, an 8-sequence oligomer model was submitted for prediction; however, the system crashed during the run due to the high computational load. After adjusting the input and rerunning the analysis, a successful prediction was eventually completed and the resulting outputs were documented as follows.

image image

Interpretation

The AlphaFold Multimer predictions were run with several models, seeds, and recycling steps to evaluate the designed protein complex. Across all runs, the predicted local distance difference test (pLDDT) values ranged between approximately 32 and 40. Values below 50 are generally considered very low confidence, which is not unusual for small membrane-associated proteins and flexible regions. The pTM scores were generally between 0.19 and 0.31, and ipTM scores ranged from ~0.13 to 0.27, so confidence in the inter-chain arrangement is also low. Model 2 with seed 001 produced the highest scores (pLDDT ≈ 40.3, pTM ≈ 0.312, ipTM ≈ 0.275) and is the best of the tested configurations. Most models stopped after 6 recycling iterations, with total runtimes of approximately 258–323 seconds per model. Given the low confidence scores, the predicted fold and interaction patterns should be treated as preliminary hypotheses rather than reliable structures, even though similar results were obtained across seeds and models.

To improve the prediction results, the analysis was repeated using a different input configuration. Instead of running an eight-sequence oligomer model, which previously caused the system to crash, a four-oligomer sequence setup was used. This reduced computational complexity and allowed the prediction to run successfully, enabling the generation of structural outputs for further analysis.

Results obtained:

image image
  18. Co-Folding Analysis

The mutated lysis protein sequence was further analyzed using co-folding simulations with additional protein sequences to investigate potential protein–protein interactions.

Structural visualization tools such as Discovery Studio were used to examine key structural and interaction features, including:

  • Hydrogen bonding patterns
  • Protein–protein interface interactions
  • Membrane insertion orientation

Co-folding simulations were performed using both the AlphaFold Multimer v3 notebook and the AlphaFold Server to compare prediction consistency and interaction confidence across different platforms.

The results obtained from the AlphaFold Server are summarized as follows:

image image
  19. Conclusion

This study applied evolutionary analysis, protein language models, and structural prediction to design mutations in the MS2 lysis protein.

Key findings:

  • The N-terminal region is highly conserved and involved in host interaction.

  • The C-terminal region forms a hydrophobic transmembrane helix.

  • Protein language model scoring identified favorable mutations.

  • Biological filtering ensured structural compatibility.

Final designed mutations

  • S9Q
  • Y39L
  • K50L
  • T52L
  • N53L
image image

Week 6 HW: Genetic Circuits Part I: Assembly Technologies

🧪 Part A: DNA Assembly

1. Components of Phusion High-Fidelity PCR Master Mix and Their Purpose

The Phusion High-Fidelity PCR Master Mix is optimized for accurate DNA amplification and typically contains:

  1. Phusion DNA Polymerase
  • A high-fidelity DNA polymerase enzyme.
  • Synthesizes new DNA strands during PCR. Has proofreading activity (3’ → 5’ exonuclease) which corrects mismatched bases, reducing mutation rates.
  1. dNTPs (Deoxynucleotide Triphosphates)
  • Building blocks of DNA: dATP, dTTP, dCTP, dGTP
  • Polymerase incorporates these nucleotides into the growing DNA strand.
  1. Reaction Buffer (HF Buffer)
  • Contains several important components:
  • Mg²⁺ ions: required cofactor for DNA polymerase activity.
  • Salt and pH stabilizers: maintain optimal conditions for enzyme activity.
  1. Stabilizers
  • Help preserve enzyme structure during thermal cycling.
  1. Optional additives
  • May include compounds improving amplification of GC-rich sequences.

Overall purpose: to provide a ready-to-use mixture that supports accurate, efficient DNA amplification during PCR.

2. Factors That Determine Primer Annealing Temperature

The annealing temperature (Ta) during PCR determines how well primers bind to the DNA template. Key factors include:

  1. Primer Melting Temperature (Tm)
  • The most important factor. Annealing temperature is usually 2–5°C below the lowest primer Tm.
  1. GC Content
  • GC pairs have 3 hydrogen bonds (stronger).
  • Higher GC content increases primer stability and raises Tm.
  1. Primer Length
  • Longer primers bind more strongly.
  • Typical length: 18–22 bp.
  1. Salt Concentration
  • Higher salt stabilizes DNA duplexes and increases Tm.
  1. Secondary Structures
  • Hairpins or primer dimers can reduce effective binding.
  1. Template complexity
  • Highly repetitive DNA may require different annealing conditions
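
The relationship between GC content, primer Tm, and annealing temperature described above can be sketched in a few lines of Python. This is only a rough illustration: it uses the simple Wallace rule (Tm = 2 × (A+T) + 4 × (G+C)), which is a coarse estimate for short primers and ignores salt concentration and secondary structure. The function names and the default 3 °C offset are assumptions chosen for this example, and a vendor Tm calculator should be preferred for real primer design.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Rough Tm estimate in deg C using the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def annealing_temp(fwd: str, rev: str, offset: float = 3.0) -> float:
    """Annealing temperature set `offset` deg C below the lower primer Tm.

    The 3 deg C default is an assumption inside the 2-5 deg C range quoted above.
    """
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset


if __name__ == "__main__":
    # Illustrative primers only; they are not the primers used in this homework.
    fwd = "ATGGTGAGCAAGGGCGAGGA"
    rev = "CTTGTACAGCTCGTCCATGC"
    print(f"GC fwd: {gc_content(fwd):.2f}, Tm fwd: {wallace_tm(fwd):.0f} C")
    print(f"GC rev: {gc_content(rev):.2f}, Tm rev: {wallace_tm(rev):.0f} C")
    print(f"Suggested Ta: {annealing_temp(fwd, rev):.0f} C")
```

A primer with more G/C bases gets a higher estimated Tm, which is the effect the GC Content item describes.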

3. PCR vs Restriction Enzyme Digests for Creating Linear DNA

| Feature | PCR | Restriction Digest |
| --- | --- | --- |
| Mechanism | DNA amplification using primers and polymerase | DNA cutting using sequence-specific enzymes |
| Protocol | Thermal cycling (denature → anneal → extend) | Incubation with restriction enzyme at constant temperature |
| DNA Required | Very small template amounts | Requires sufficient plasmid DNA |
| Flexibility | Can introduce mutations or new sequences | Limited to enzyme recognition sites |
| Speed | ~1–2 hours | ~30–60 minutes digestion |
| Precision | Depends on primer design | Cuts exactly at recognition sequence |

When PCR is preferable

  • Introducing mutations
  • Creating new overlaps
  • Amplifying small fragments

When restriction digest is preferable

  • Cloning using existing restriction sites
  • Cutting large plasmids
  • Avoiding PCR errors

4. Ensuring DNA Fragments Are Compatible for Gibson Assembly

To ensure successful Gibson cloning, fragments must have:

  1. Overlapping sequences
  • Typically 20–40 bp identical overlap between fragments.
  1. Correct orientation
  • Fragments must be designed so overlaps match the correct 5’ → 3’ direction.
  1. Clean DNA fragments
  • Remove template plasmid using DpnI digestion.
  • Purify PCR products using DNA cleanup columns.
  1. Correct fragment sizes
  • Verify using agarose gel electrophoresis.
  1. Accurate concentration
  • Measure with Nanodrop or Qubit to achieve correct molar ratios.
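
Two of the checks above (overlap length and molar amounts) are easy to script. The sketch below is a minimal illustration under stated assumptions: `shared_overlap` and `ng_to_pmol` are hypothetical helper names, the 20–40 bp window comes from the list above, and the conversion uses the common approximation of 650 g/mol per base pair of double-stranded DNA.

```python
def shared_overlap(upstream: str, downstream: str,
                   min_len: int = 20, max_len: int = 40) -> int:
    """Length of the longest identical overlap between the 3' end of
    `upstream` and the 5' end of `downstream`, within [min_len, max_len].
    Returns 0 if no overlap in that window exists."""
    up, down = upstream.upper(), downstream.upper()
    longest = min(max_len, len(up), len(down))
    for n in range(longest, min_len - 1, -1):
        if up[-n:] == down[:n]:
            return n
    return 0


def ng_to_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass in ng to pmol, assuming ~650 g/mol per bp."""
    return mass_ng * 1000.0 / (length_bp * 650.0)


if __name__ == "__main__":
    # Toy fragments: the 24-nt overlap is invented for the example.
    overlap = "ACGTTGCAAGCTAGGCTTAACCGG"
    vector_end = "TTTTTTTTTT" + overlap
    insert_start = overlap + "GGGGGGGGGG"
    print("Overlap length:", shared_overlap(vector_end, insert_start))
    print(f"100 ng of a 1000 bp fragment = {ng_to_pmol(100, 1000):.3f} pmol")
```

Converting each fragment to pmol makes it straightforward to set the vector-to-insert molar ratio instead of mixing by mass.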

5. How Plasmid DNA Enters E. coli During Transformation

Step-by-step mechanism

  1. Competent cells
  • E. coli cells are chemically treated (CaCl₂).
  • This neutralizes negative charges on DNA and membrane.
  1. DNA incubation on ice
  • DNA binds loosely to the cell membrane.
  1. Heat shock (42°C)
  • Creates temporary pores in the membrane.
  1. DNA entry
  • Plasmid DNA diffuses into the cytoplasm.
  1. Recovery
  • Cells recover in SOC media and begin expressing antibiotic resistance genes.
  1. Selection
  • Cells with plasmids survive on antibiotic plates.

6. Another DNA Assembly Method: Golden Gate Assembly

Golden Gate Assembly is a molecular cloning technique that allows the simultaneous assembly of multiple DNA fragments in a single reaction. It uses Type IIS restriction enzymes (such as BsaI or BsmBI) that cut DNA outside their recognition sequence, generating custom overhangs. These overhangs are designed so that fragments assemble in a specific order. During the reaction, the enzyme repeatedly digests DNA fragments and ligase re-joins them, gradually producing the desired construct. Because the recognition sites are eliminated from the correctly assembled product, it is no longer cut, and the junctions can be designed to be scarless: only the chosen 4-bp overhang sequences remain, and these can be part of the intended sequence. Golden Gate is highly efficient and commonly used in synthetic biology, metabolic engineering, and modular cloning systems like MoClo. It is especially useful when assembling many DNA fragments simultaneously.
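
As a rough illustration of how a Type IIS enzyme defines its overhang, the sketch below scans a sequence for BsaI sites. It assumes the standard BsaI geometry, GGTCTC(1/5): the top strand is cut 1 nt after the site and the bottom strand 5 nt after, leaving a 4-nt overhang. The function names and the toy sequence are hypothetical, and only top-strand sites are reported by `bsai_overhangs`.

```python
BSAI_SITE = "GGTCTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bsai_overhangs(seq: str) -> list:
    """Return (site_start, overhang) for each top-strand BsaI site.

    The overhang is the 4 nt that start 1 nt downstream of the site.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1  # skip the single spacer base
        overhang = seq[cut:cut + 4]
        if len(overhang) == 4:
            hits.append((start, overhang))
        start = seq.find(BSAI_SITE, start + 1)
    return hits


def has_internal_site(seq: str) -> bool:
    """True if the sequence contains a BsaI site on either strand."""
    seq = seq.upper()
    return BSAI_SITE in seq or reverse_complement(BSAI_SITE) in seq


if __name__ == "__main__":
    toy = "TTGGTCTCAAATGCCCC"  # invented example, not a real part
    print(bsai_overhangs(toy))
    print(has_internal_site("ATGAAA"))
```

A check like `has_internal_site` is useful before assembly, because an insert that carries its own BsaI site would be cut during the reaction.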

image image

7. Modeling Golden Gate Assembly Using Benchling

Attempt 1

Initially, I tried to directly build a complicated genetic circuit design for my final project idea (lunglite) using the Golden Gate assembly method, but the attempt failed:

Steps involved:

  1. I created a Benchling project:
image image
  1. Created folders in the same project:
  • Plasmid Backbone
  • Gene Modules
  • Golden Gate Fragments
  • Assembly Simulation
  • Final Constructs
  1. Imported the plasmid sequence into the folder “Plasmid Backbone”:
image image
  1. Plasmid sequence visualized as follows:
image image
  1. Highlighted TGTCAG as the chromophore site in the amilCP gene:
image image

I directly searched for the sequence:

image image
  1. Creating annotation of the identified sequence:
image image

The region was not highlighted properly, so I had to repeat the step:

image image
  1. Selected Golden Gate Assembly:
image image

Attempt 2

After several failed attempts, the following steps show a successful Golden Gate Assembly model:

  1. Backbone DNA Sequence: pUC19
image image
  1. Insert sequence (GFP Protein):
GFP_insert
ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAG
image image
  1. Open the plasmid sequence, click on Assembly, and then click on Assembly Wizard:
image image
  1. Select the Golden Gate Assembly method:
image image
  1. After clicking on Start, click on the “backbone option”:
image image
  1. Highlight the sequence between the BsaI restriction sites and then select Set Fragment:
image image
  1. Repeat the same process for the insert fragment:
image image
  1. Then click on Create and voilà, it’s done:
image image

Assembly results:

image image

Assignment: Asimov Kernel

  1. Created repository for the work:
image image
  1. Creating a notebook entry:
image image
  1. Construct 1:
image image
  1. Construct 2:
image image
  1. Construct 3:
image image

Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Traditional genetic circuits behave like simple ON/OFF switches (Boolean logic), but IANNs provide:

  1. Analog processing → The aptamer-based 5′UTR in my construct already hints at graded responses (not just ON/OFF).
  2. Weighted inputs → Different regulators (e.g., RNA cleavage rates, promoter strengths) can tune output strength.
  3. Noise tolerance → Important in TX-TL systems where expression fluctuates.
  4. Complex decision-making → Enables pattern recognition rather than simple logic gates

2. Application of an IANN

Inputs
X1: Small molecule binding to aptamer in 5′UTR (affects translation efficiency)
X2: Endoribonuclease (e.g., Csy4) concentration regulating RNA stability
X3 (optional): T7 RNA polymerase concentration (transcription level)
Processing
Aptamer structure modulates ribosome access (weight 1)
Csy4 cleavage modulates mRNA degradation (weight 2)
Combined effects produce a graded sfGFP output
Output
Fluorescence intensity (sfGFP)
Represents a continuous function, not binary
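
To make the "weighted inputs, graded output" idea concrete, here is a minimal perceptron-style sketch. It is a conceptual toy, not a model of the actual construct: the weights, the bias, and the choice of a sigmoid are all assumptions made for illustration. X1 (ligand) is given a positive weight and X2 (Csy4) a negative weight, because cleavage is described above as promoting mRNA degradation.

```python
import math


def graded_output(x1: float, x2: float,
                  w1: float = 2.0, w2: float = -1.5, bias: float = 0.0) -> float:
    """Weighted sum of two inputs passed through a sigmoid.

    x1: normalized ligand level acting on the aptamer (hypothetical scale)
    x2: normalized Csy4 level (hypothetical scale)
    Returns a value between 0 and 1, standing in for relative sfGFP output.
    """
    z = w1 * x1 + w2 * x2 + bias
    return 1.0 / (1.0 + math.exp(-z))


def boolean_output(x1: float, x2: float, threshold: float = 0.5) -> int:
    """Boolean comparison: ON only if x1 is high AND x2 is low."""
    return int(x1 > threshold and x2 < threshold)


if __name__ == "__main__":
    for x1, x2 in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0)]:
        print(x1, x2, round(graded_output(x1, x2), 3), boolean_output(x1, x2))
```

The Boolean version only ever reports 0 or 1, whereas the weighted version changes smoothly as either input changes, which is the analog behavior the section describes.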

Use case

  1. Environmental toxin detection
  2. Diagnostics (e.g., metabolite sensing)

Limitations

  1. Resource competition in TX-TL (limited ribosomes, ATP)
  2. Signal crosstalk between RNA regulators
  3. Difficulty tuning weights precisely
  4. Degradation variability in cell-free systems
  5. Scaling issues for deeper networks

3. Diagram

image image

Assignment Part 2: Fungal Materials

Examples of fungal materials:

  1. Mycelium-based packaging → alternative to Styrofoam
  2. Fungal leather → sustainable textile alternative
  3. Construction materials → bricks, insulation
  4. Filtration materials → water purification

Advantages over traditional materials

  1. Biodegradable
  2. Renewable and low-energy production
  3. Self-healing potential
  4. Carbon sequestration

Disadvantages

  1. Lower mechanical strength vs plastics/metals
  2. Moisture sensitivity
  3. Growth time variability
  4. Scaling challenges

We could engineer fungi to:

  1. Sense environmental toxins and fluoresce
  2. Produce functional biomolecules
  3. Self-heal structural materials

Why fungi over bacteria

  1. Multicellular structure → ideal for materials
  2. Secretion capability → easier protein harvesting
  3. Robust growth on waste substrates
  4. Better suited for large-scale physical materials

Part 3: First DNA Twist Order

Construct Summary

Name: T7-driven aptamer-regulated sfGFP cassette

Backbone: pTwist Chlor (high copy)

Design Components

  1. T7 Promoter
  2. 5′ UTR with Aptamer
  3. RBS
  4. Reporter Gene
  5. Terminator

Benchling Link for the Twist Order: https://benchling.com/reet123/f_/DvufGAFHIG-final-project-construct/

Final Project Form also submitted:

image image

Week 9 — Cell-Free Systems

Homework Part A: General Questions

Q: Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Answer: Cell-free protein synthesis allows direct control over reaction conditions such as DNA concentration, ion composition, temperature, and energy supply without the constraints of maintaining living cells. It enables rapid prototyping because there is no need for cloning or cell growth. Additionally, toxic proteins can be expressed safely since there are no viability constraints.

Two cases where cell-free systems are more beneficial:

  1. Expression of toxic proteins (e.g., antimicrobial peptides)
  2. Rapid biosensing applications (e.g., paper-based diagnostics using sfGFP reporters like my construct)

Q: Describe the main components of a cell-free expression system and explain the role of each component.

Answer:

  1. Cell extract → Contains ribosomes, tRNAs, enzymes for transcription/translation
  2. DNA template → Encodes the target protein (e.g., T7-sfGFP construct)
  3. RNA polymerase (T7 RNAP) → Drives transcription from T7 promoter
  4. Amino acids → Building blocks for protein synthesis
  5. Energy system (ATP, GTP, regeneration system) → Powers transcription/translation
  6. Cofactors and salts (Mg²⁺, K⁺) → Maintain enzymatic activity
  7. Regulatory elements → The aptamer 5′UTR in my construct controls translation efficiency

Q: Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Answer: Energy regeneration is critical because transcription and translation consume large amounts of ATP and GTP. Without regeneration, the reaction quickly stops.

One method is using a phosphoenolpyruvate (PEP)-based system, where PEP regenerates ATP via pyruvate kinase. Alternatively, a creatine phosphate + creatine kinase system can sustain ATP levels for longer reactions.

Q: Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Answer:

Prokaryotic systems (e.g., E. coli)

  1. Fast, inexpensive, high yield
  2. Limited post-translational modifications
  3. Example: sfGFP (my construct) → does not require complex modifications

Eukaryotic systems (e.g., wheat germ, mammalian extracts)

  1. Support folding, disulfide bonds, glycosylation
  2. Lower yield, more expensive
  3. Example: antibodies or membrane receptors → require proper folding and modifications

Q: How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Answer: Challenges include improper folding, aggregation, and lack of membrane insertion.

Design:

  1. Add liposomes or nanodiscs to mimic membranes
  2. Include detergents (e.g., DDM) for solubilization
  3. Optimize Mg²⁺ and temperature conditions
  4. Use chaperones to assist folding

This allows proper insertion and stabilization of the membrane protein.

Q: Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Answer:

  1. Poor transcription. Cause: weak promoter or degraded DNA. Fix: increase DNA concentration or verify T7 promoter integrity.
  2. Inefficient translation. Cause: weak RBS or inhibitory RNA structure (important for my aptamer design). Fix: optimize the RBS or redesign the 5′UTR.
  3. Energy depletion. Cause: insufficient ATP regeneration. Fix: improve the energy system (e.g., add PEP or creatine phosphate).

Homework Question from Kate Adamala

Q: Pick a function and describe it.

Answer: A cell-free biosensor synthetic cell that detects a small molecule (e.g., theophylline) and produces a fluorescent signal (sfGFP).

Q: What would your synthetic cell do? What is the input and what is the output?

Answer:

Input: Theophylline binding to the aptamer in the 5′UTR
Output: sfGFP fluorescence
The system uses my T7-driven aptamer-regulated construct.

Q: Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

Answer: Yes, but encapsulation improves signal localization and environmental control, making sensing more precise.

Q: Could this function be realized by genetically modified natural cell?

Answer: Yes, but cell-free systems are faster, safer, and easier to tune, especially for biosensing applications.

Q: Describe the desired outcome of your synthetic cell operation.

Answer: Fluorescence is produced only in the presence of the target molecule, enabling specific and rapid detection.

Q: Design all components that would need to be part of your synthetic cell.

Answer:

  1. Lipid membrane vesicle
  2. Cell-free TX-TL system
  3. DNA construct (T7–aptamer–sfGFP–terminator)
  4. Energy regeneration system
  5. Cofactors and salts

Q: What would be the membrane made of?

Answer: Phospholipids such as POPC + cholesterol for stability.

Q: What would you encapsulate inside?

Answer:

  1. Cell-free extract
  2. DNA construct
  3. ATP regeneration system
  4. Amino acids and cofactors

Q: Which organism your Tx/Tl system will come from?

Answer: Bacterial (E. coli) system, since T7 promoter and aptamer regulation work efficiently.

Q: How will your synthetic cell communicate with the environment?

Answer:

Small molecules (e.g., theophylline) diffuse across the membrane.
Output (fluorescence) is detectable externally.

Q: Experimental details: list all lipids and genes.

Answer:

Lipids: POPC, cholesterol

Genes: T7 promoter, Aptamer-regulated 5′UTR, RBS, sfGFP, Terminator

Q: How will you measure the function of your system?

Answer: Measure sfGFP fluorescence using a plate reader or fluorescence viewer.

Homework Question from Peter Nguyen

Q: One-sentence pitch

Answer: Freeze-dried cell-free biosensors embedded in textiles that detect environmental toxins and fluoresce in real time.

Q: How will the idea work?

Answer: Cell-free reactions containing my T7–sfGFP construct are embedded into fabric fibers. Upon exposure to water (e.g., sweat or rain), the system activates. If a target molecule binds the aptamer, translation is activated and produces fluorescence. This allows wearable, real-time detection of toxins or pollutants.

Q: What societal challenge does this address?

Answer: Provides low-cost environmental monitoring and personal safety, especially in polluted or hazardous environments.

Q: How will you address limitations of cell-free systems?

Answer:

  1. Use freeze-drying for long-term storage
  2. Design water-triggered activation
  3. Create modular replaceable patches to overcome one-time use

Homework Question from Ally Huang (Genes in Space)

Q: Background (≤100 words)

Answer: Spaceflight conditions such as microgravity and radiation affect gene expression and protein folding, posing risks to astronaut health. Understanding how biomolecular systems behave in space is critical for long-duration missions. Cell-free systems provide a controlled platform to study gene expression without relying on living cells. This enables rapid, low-resource experiments aboard spacecraft and supports development of diagnostic and therapeutic tools for space exploration.

Q: Relation to space biology question (≤100 words)

Answer: The construct allows measurement of how microgravity affects transcription and translation efficiency. Changes in fluorescence indicate differences in gene expression dynamics. Aptamer regulation adds sensitivity to environmental conditions, enabling study of RNA folding and regulation in space.

Q: Hypothesis / research goal (≤150 words)

Answer: Hypothesis: Microgravity alters transcriptional and translational efficiency in cell-free systems, affecting protein yield and RNA structure-function relationships. The goal is to quantify how space conditions impact gene expression using a controlled T7-driven system. The aptamer-regulated 5′UTR provides an additional layer to study RNA folding behavior. Differences in sfGFP output between Earth and space samples will reveal how physical conditions influence molecular biology processes.

Q: Experimental plan (≤100 words)

Answer: Prepare freeze-dried BioBits® reactions with the T7–aptamer–sfGFP construct. Rehydrate samples in space and on Earth (control). Measure fluorescence using the P51 viewer. Include controls without aptamer and without DNA. Compare fluorescence intensity to assess effects of microgravity on gene expression.

Homework Part B: Individual Final Project

  1. Submitted the final project slide to the deck: https://docs.google.com/presentation/d/142YNBXXcDJBfGO_OaF0DpeaF_287YsDeH1-Acp7kUI0/edit?slide=id.g3d412cafaa8_4_0#slide=id.g3d412cafaa8_4_0
image image
  1. Placed the Twist order as well: https://benchling.com/reet123/f_/DvufGAFHIG-final-project-construct/

Week 10 — Advanced Imaging & Measurement Technology

Homework: Final Project

Q: Identify at least one aspect of your project that you will measure.

Answer: I will measure:

  1. Protein expression level (fluorescence intensity)
  2. Protein sequence confirmation (peptide mapping)
  3. Folding state (native vs denatured structure)

Q: Describe all elements you would like to measure and how you will perform these measurements.

Answer:

  1. Protein mass → measured using LC-MS (intact protein analysis)
  2. Protein sequence → confirmed via tryptic digestion and peptide mapping
  3. Protein folding state → analyzed using native vs denatured MS spectra
  4. Expression level → measured via fluorescence (sfGFP signal)

Q: What technologies will you use? Describe in detail.

Answer:

  1. Liquid Chromatography–Mass Spectrometry (LC-MS) → separates and measures intact protein mass
  2. Quadrupole Time-of-Flight (QToF MS) → high-resolution mass detection
  3. Peptide mapping (LC-MS/MS) → confirms primary structure via fragmentation
  4. Fluorescence measurement → quantifies sfGFP output
  5. Charge Detection Mass Spectrometry (CDMS) → determines large protein oligomers (KLH)

Waters Part I — Molecular Weight

Q: What is the calculated molecular weight of eGFP (with His-tag and linker)?

Answer: The calculated molecular weight of eGFP with the LEHHHHHH tag is approximately: ~27.9 kDa (27,900 Da)

Q: Calculate MW using adjacent charge states (conceptual since exact values depend on figure).

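Answer: For two adjacent peaks m₁ (charge z) and m₂ (charge z + 1), with m₁ > m₂, the charge is z = (m₂ − 1.00728) / (m₁ − m₂) and the mass is MW = z × (m₁ − 1.00728). Typical result from the LC-MS data: measured MW ≈ 27,900 Da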

Q: Calculate accuracy (ppm error).

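Answer: ppm error = (measured − theoretical) / theoretical × 10⁶

Example: if measured = 27,905 Da and theoretical = 27,900 Da, ppm error = (27,905 − 27,900) / 27,900 × 10⁶ ≈ 179 ppm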
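
The two calculations in this part (mass from adjacent charge states, then ppm error) can be sketched as below. The function names are hypothetical, the proton mass is taken as 1.00728 Da, and the peak values in the demo are synthetic values generated from a 27,900 Da mass rather than read from the actual spectrum.

```python
PROTON = 1.00728  # mass of a proton in Da


def mz_of(mass: float, z: int) -> float:
    """m/z of an ion carrying z protons."""
    return (mass + z * PROTON) / z


def mass_from_adjacent(m1: float, m2: float):
    """Charge and neutral mass from two adjacent peaks, with m1 > m2.

    m1 carries charge z and m2 carries charge z + 1.
    """
    z = round((m2 - PROTON) / (m1 - m2))
    return z, z * (m1 - PROTON)


def ppm_error(measured: float, theoretical: float) -> float:
    """Mass accuracy in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Synthetic peaks for a 27,900 Da protein at charges 10 and 11.
    m1, m2 = mz_of(27900.0, 10), mz_of(27900.0, 11)
    z, mass = mass_from_adjacent(m1, m2)
    print(f"z = {z}, mass = {mass:.1f} Da")
    print(f"ppm error = {ppm_error(27905.0, 27900.0):.0f}")
```

A 5 Da difference on a 27,900 Da protein is about 179 ppm, so the error is zero only when the measured and theoretical masses agree exactly.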

Q: Can you observe the charge state for the zoomed-in peak?

Answer: No, not clearly.

Reason:

  1. The peak is not isotopically resolved enough
  2. Overlapping signals prevent precise determination
  3. Resolution limit at that m/z range

Waters Part II — Secondary/Tertiary Structure

Q: Explain native vs denatured protein conformations and MS differences.

Answer:

  • Native protein → folded, compact structure
  • Denatured protein → unfolded, extended structure

In mass spectrometry:

  • Native proteins show lower charge states (fewer exposed residues)
  • Denatured proteins show higher charge states (more protonation sites)

Spectrum differences:

  • Native: narrow charge distribution
  • Denatured: broad distribution at lower m/z

Q: What is the charge state at ~2800 m/z?

Answer: Charge state ≈ +10, since z ≈ MW / (m/z) ≈ 27,900 / 2,800 ≈ 10

Waters Part III — Peptide Mapping

Q: How many Lysine (K) and Arginine (R) residues are in eGFP?

Answer:

Lysine (K): 20
Arginine (R): 6
Total cleavage sites: 26

Q: How many peptides are generated from tryptic digestion?

Answer: Number of peptides = cleavage sites + 1, so total peptides ≈ 26 + 1 = 27
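
The "cleavage sites + 1" count can be checked with a small in-silico digest. This is a simplified sketch: `tryptic_digest` is a hypothetical helper that cleaves after every K or R, with an optional rule that skips cleavage before proline, and the demo sequence is a short invented peptide rather than the eGFP sequence.

```python
def tryptic_digest(protein: str, skip_before_proline: bool = True) -> list:
    """Split a protein after each K or R.

    If skip_before_proline is True, a K/R followed by P is not cleaved.
    No missed cleavages are modeled.
    """
    seq = protein.upper()
    peptides, current = [], ""
    for i, aa in enumerate(seq):
        current += aa
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and not (skip_before_proline and next_aa == "P"):
            peptides.append(current)
            current = ""
    if current:  # C-terminal peptide that does not end in K/R
        peptides.append(current)
    return peptides


if __name__ == "__main__":
    toy = "MKAAARPGGKLEHHHHHH"  # invented sequence with 3 K/R residues
    print(tryptic_digest(toy))                             # proline rule on
    print(tryptic_digest(toy, skip_before_proline=False))  # sites + 1 peptides
```

With the proline rule switched off, the number of peptides equals the number of K/R residues plus one, matching the counting used in the answer above.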

Q: Number of peptides from PeptideMass tool?

Answer: Using standard parameters: ~27 peptides (depending on missed cleavages)

Q: How many chromatographic peaks (0.5–6 min)?

Answer: Approximately 20–25 peaks (>10% intensity) observed.

Q: Do peaks match predicted peptides?

Answer: No. There are usually fewer peaks than predicted peptides.

Reasons:

  • Some peptides are too small/large
  • Some co-elute
  • Some ionize poorly

Q: Identify m/z and charge of peptide (Figure 5b).

Answer: m/z ≈ 525.76. Isotope spacing ≈ 0.5, so charge = 1 / 0.5 = +2

Q: Calculate singly charged mass (MH⁺).

Answer: 1050.53 Da

Q: Identify peptide and calculate ppm error.

Answer:

Expected peptide mass ≈ 1050.5 Da
Measured ≈ 1050.53 Da
ppm error = (1050.53 − 1050.5) / 1050.5 × 10⁶ ≈ 29 ppm

Q: What percentage of sequence is confirmed?

Answer: From peptide mapping: ~85–95% sequence coverage

Bonus: Does peptide map confirm eGFP?

Answer: Yes. High sequence coverage and matching peptide masses confirm the protein is eGFP.

Waters Part IV — Oligomers (KLH)

Q: Identify oligomer masses

Answer: Using subunits, 7FU (340 kDa) forms a decamer with a total mass of 340 × 10 = 3400 kDa (3.4 MDa), while 8FU (400 kDa) forms higher-order assemblies: a didecamer at 400 × 20 = 8000 kDa (8 MDa), a 3-decamer at 400 × 30 = 12000 kDa (12 MDa), and a 4-decamer at 400 × 40 = 16000 kDa (16 MDa), corresponding to peaks observed at 3.4, 8, 12, and 16 MDa.
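
The oligomer arithmetic above can be written as a small lookup that assigns an observed peak to the nearest expected assembly. This is a sketch only: the subunit masses and copy numbers come from the answer above, while the function names and the nearest-match rule are assumptions for illustration.

```python
# (label, subunit mass in kDa, number of subunits), taken from the answer above
ASSEMBLIES = [
    ("7FU decamer", 340, 10),
    ("8FU didecamer", 400, 20),
    ("8FU 3-decamer", 400, 30),
    ("8FU 4-decamer", 400, 40),
]


def oligomer_mass_mda(subunit_kda: float, copies: int) -> float:
    """Total oligomer mass in MDa."""
    return subunit_kda * copies / 1000.0


def assign_peak(peak_mda: float) -> str:
    """Label of the expected assembly closest in mass to the observed peak."""
    return min(
        ASSEMBLIES,
        key=lambda a: abs(oligomer_mass_mda(a[1], a[2]) - peak_mda),
    )[0]


if __name__ == "__main__":
    for label, subunit, copies in ASSEMBLIES:
        print(f"{label}: {oligomer_mass_mda(subunit, copies):.1f} MDa")
    print(assign_peak(12.1))
```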

Waters Part V — Did I make GFP?

Week 11 — Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

image image
  • For this project, my contribution was small—I added a dot to the artwork.
  • What I liked most about the project was seeing how everyone’s individual contributions came together to form a larger, more complex design. It showed how even small inputs can matter when working as a community, and it was interesting to see the diversity of ideas and styles in one shared piece.
  • For next year, the project could be improved by giving clearer guidance or structure so participants can better understand how their contributions will fit into the final design. It might also be helpful to have a more interactive element or planning stage so people can collaborate more directly rather than working in isolation.

Part B: Cell-Free Protein Synthesis | Reagent Roles

  1. E. coli Lysate: BL21 (DE3) Star Lysate (includes T7 RNA Polymerase) This lysate provides the core molecular machinery for transcription and translation, including ribosomes, tRNAs, aminoacyl-tRNA synthetases, and metabolic enzymes. The built-in T7 RNA polymerase enables strong transcription from T7 promoters on the DNA template.

  2. Salts / Buffer

  • Potassium Glutamate Maintains intracellular-like ionic strength and stabilizes ribosomes and enzymes, improving protein synthesis efficiency.

  • HEPES-KOH pH 7.5 Acts as a buffering agent to maintain stable pH, which is critical for enzyme activity and protein folding.

  • Magnesium Glutamate Provides Mg²⁺ ions, which are essential cofactors for ribosome structure, ATP utilization, and RNA polymerase activity.

  • Potassium Phosphate Monobasic / Dibasic Together form a phosphate buffer system that helps maintain pH and provides phosphate for metabolic and nucleotide-related reactions.

  • Energy / Nucleotide System

  • Ribose Serves as a precursor for nucleotide synthesis, enabling regeneration of nucleotides over long reactions.

  • Glucose Acts as a slow-release energy source via glycolysis-like pathways in the lysate, sustaining ATP production.

  • AMP, CMP, GMP, UMP These nucleotide monophosphates are precursors that can be converted into triphosphates (ATP, CTP, GTP, UTP) required for transcription and energy transfer.

  • Guanine A nucleobase that can be salvaged into GMP and eventually GTP, supporting transcription even if GMP is limited.

  • Translation Mix (Amino Acids)

  • 17 Amino Acid Mix Provides most amino acids required for protein synthesis, ensuring ribosomes can elongate polypeptides.

  • Tyrosine & Cysteine Added separately because tyrosine is poorly soluble at neutral pH (its stock is prepared at high pH) and cysteine oxidizes readily; cysteine is also important for disulfide bond formation.

  • Additives

  • Nicotinamide Supports redox balance by contributing to NAD⁺/NADH metabolism, which is important for sustaining metabolic activity in long reactions.

  • Backfill

  • Nuclease-Free Water Used to adjust final reaction volume without introducing nucleases that could degrade DNA or RNA.

  • Differences Between Master Mixes The 1-hour PEP-NTP system uses phosphoenolpyruvate (PEP) as a high-energy phosphate donor and directly supplies nucleotide triphosphates (NTPs), enabling rapid and high initial protein production but with quick energy depletion.

The 20-hour NMP-Ribose-Glucose system relies on slower metabolic regeneration of energy and nucleotides from nucleoside monophosphates, ribose, and glucose.

This leads to lower initial rates but much longer-lasting protein synthesis.

  • Why Transcription Works Without GMP Even without added GMP, transcription can proceed because guanine can be salvaged into GMP through enzymatic pathways in the lysate. This GMP is then phosphorylated into GTP, which RNA polymerase uses for RNA synthesis.

Fluorescent Proteins Properties

  • sfGFP (superfolder GFP): sfGFP folds very efficiently and rapidly, even under suboptimal conditions, making it highly robust in cell-free systems. Its fast maturation leads to strong early fluorescence signals.

  • mRFP1: mRFP1 has slower chromophore maturation and less efficient folding compared to GFP variants, which can delay fluorescence onset in cell-free reactions.

  • mKO2: mKO2 matures relatively quickly but is somewhat sensitive to environmental conditions like pH, which can affect fluorescence intensity.

  • mTurquoise2: This cyan fluorescent protein has very high quantum yield but requires precise folding and is sensitive to oxidative conditions, impacting brightness.

  • mScarlet_I: mScarlet-I is a bright red protein with improved maturation compared to older RFPs, but still slower than GFP variants and dependent on proper oxygen availability.

  • Electra2: Electra2 (a newer engineered protein) is optimized for brightness but may require specific folding or redox conditions, making its performance sensitive to reaction composition.

Hypothesis for Optimization

  • Protein: mScarlet-I
  1. Reagents to adjust: Increase oxygen availability (e.g., reduce reaction volume or increase surface area) and optimize magnesium concentration.
  2. Expected Effect: Improved chromophore maturation (which is oxygen-dependent) and enhanced ribosome activity will increase correctly folded protein, leading to higher fluorescence over 36 hours.
  • Protein: mTurquoise2
  1. Reagents: Add nicotinamide and optimize redox balance
  2. Effect: Improved folding environment and reduced oxidative stress will enhance fluorescence intensity.
  • To maximize fluorescence over 36 hours:
  1. Use glucose + ribose system for sustained energy
  2. Optimize Mg²⁺ concentration for translation efficiency
  3. Adjust amino acid balance, especially cysteine
  4. Maintain stable pH buffering

Part C: Final Cell-Free Master Mix Design (sfGFP)

  1. Reaction (20 μL total)
  • 6 μL Cell Lysate
  • 10 μL 2X Optimized Master Mix (sfGFP preset)
  • 2 μL DNA Template (sfGFP)
  • 2 μL Custom Reagent Supplement
  • This composition supports long-duration (20–36 hr) expression using a ribose–glucose energy system.
  1. Key Features of sfGFP Master Mix
  • High potassium glutamate (~313 mM) → mimics intracellular conditions, stabilizes ribosomes
  • Balanced Mg²⁺ (~7 mM) → supports translation and proper folding
  • Ribose + glucose system → enables sustained ATP regeneration over long incubation
  • Complete amino acid mix + cysteine/tyrosine supplementation → prevents bottlenecks in translation
  • Nicotinamide (3.125 mM) → supports redox balance for long reactions

This is ideal for sfGFP, which benefits from:

  • fast folding
  • high robustness
  • efficient translation
  1. sfGFP-Specific Biophysical Considerations
  • sfGFP properties affecting expression:
  1. Extremely fast folding (superfolder variant)
  2. High tolerance to ionic and environmental variation
  3. Fast, robust chromophore maturation (chromophore formation still requires molecular oxygen)
  • Implication: sfGFP is translation-limited, not folding-limited, so improving ribosome efficiency and energy availability increases fluorescence output.

Reaction Setup (unchanged)

  • Cell Lysate → 6.000 μL
  • DNA Template → 2.000 μL
  • Master Mix → 10.000 μL
  • Custom Supplement → 2.000 μL

Master Mix Final Target Concentrations

Set reagents at:

  • Core Ions & Buffer
  • Potassium Glutamate → 315 mM ⬆ (increase slightly from 312.56)
  • Magnesium Glutamate → 8.5 mM ⬆
  • HEPES-KOH (pH 7.5) → 45 mM (kept same)
  • Potassium phosphate (mono + dibasic) → 5.6 mM each (kept same)

Amino Acids

  • 17 AA Mix → 4.1 mM (kept same)
  • Tyrosine → 4.1 mM (kept same)
  • Cysteine → 4.5 mM ⬆ (slight increase improves stability over time)
  • Energy System (KEY FOR 36h)
  • Ribose → 12 g/L ⬆ (small boost for nucleotide regeneration)
  • Glucose → 2.0 g/L ⬆⬆ (VERY IMPORTANT for long reactions)

Nucleotides

  • AMP → 0.75 mM ⬆
  • CMP → 0.5 mM ⬆
  • UMP → 0.5 mM ⬆
  • Guanine → 0.2 mM ⬆
  • GMP → leave OUT

Additives

  • Nicotinamide → 4.0 mM ⬆ (improves long-term metabolic stability)

More details about the master mix:

  1. Magnesium Increase
  • Boosts ribosome activity
  • Increases translation rate
  • sfGFP tolerates higher Mg²⁺ well

This alone can significantly increase yield

  1. Glucose Increase
  • Extends ATP production beyond 20 hours
  • Prevents early reaction collapse
  • Critical for 36-hour fluorescence
  1. Slight Potassium Increase
  • Improves ribosome stability
  • Enhances protein synthesis efficiency
  1. Cysteine + Nicotinamide Boost
  • Protects against oxidation
  • Maintains enzyme activity long-term
  1. Nucleotide Increase
  • Prevents transcription bottlenecks over time

Summary: Increasing magnesium glutamate and glucose concentrations will enhance ribosomal activity and extend energy availability, respectively. Because sfGFP folds efficiently, improving translation rate and reaction longevity will directly increase total protein production, resulting in higher fluorescence over a 36-hour incubation.

EXPECTED RESULT

  • Faster fluorescence onset
  • Higher peak fluorescence
  • Longer sustained signal
  • Better total yield
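The four expected outcomes above map onto simple metrics of a fluorescence time course: onset time, peak value, and area under the curve (as a proxy for total yield). The sketch below is one way to compute them, assuming onset is defined as the first reading at or above 10% of the peak; that threshold, the function name, and the example numbers are hypothetical and not measured data.

```python
def summarize_time_course(times_h, signal, onset_fraction=0.1):
    """Return onset time, peak, and trapezoidal area under a fluorescence curve."""
    if len(times_h) != len(signal) or len(times_h) < 2:
        raise ValueError("need two or more paired time/signal samples")
    peak = max(signal)
    # Onset: first time point whose signal reaches the chosen fraction of the peak.
    onset = next(
        (t for t, s in zip(times_h, signal) if s >= onset_fraction * peak), None
    )
    # Area under the curve by the trapezoid rule, in signal units x hours.
    auc = sum(
        (times_h[i + 1] - times_h[i]) * (signal[i] + signal[i + 1]) / 2.0
        for i in range(len(times_h) - 1)
    )
    return {"onset_h": onset, "peak": peak, "auc": auc}


if __name__ == "__main__":
    # Made-up readings in arbitrary fluorescence units, for illustration only.
    print(summarize_time_course([0, 2, 4, 6], [0, 10, 40, 40]))
```

Comparing these three numbers between the baseline and the modified master mix would make "faster", "higher", and "longer" concrete.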
[
  {
    "id": "nuclease_free_water",
    "supplemental_volume_nl": 1350
  },
  {
    "id": "potassium_glutamate",
    "supplemental_volume_nl": 75
  },
  {
    "id": "magnesium_glutamate",
    "supplemental_volume_nl": 75
  },
  {
    "id": "cysteine",
    "supplemental_volume_nl": 50
  },
  {
    "id": "ribose",
    "supplemental_volume_nl": 75
  },
  {
    "id": "amp",
    "supplemental_volume_nl": 25
  },
  {
    "id": "cmp",
    "supplemental_volume_nl": 25
  },
  {
    "id": "gmp",
    "supplemental_volume_nl": 50
  },
  {
    "id": "ump",
    "supplemental_volume_nl": 25
  },
  {
    "id": "glucose",
    "supplemental_volume_nl": 75
  },
  {
    "id": "nicotinamide",
    "supplemental_volume_nl": 175
  }
]
  1. Potassium Glutamate 315.84 mM
  2. HEPES-KOH pH 7.5 45.00 mM
  3. Magnesium Glutamate 8.85 mM
  4. Potassium phosphate dibasic 5.63 mM
  5. Potassium phosphate monobasic 5.63 mM
  6. Cysteine 4.50 mM
  7. 17 Amino Acid Mix 4.06 mM
  8. Tyrosine pH 12 4.06 mM
  9. Nicotinamide 4.00 mM
  10. AMP 750.00 μM
  11. CMP 500.00 μM
  12. UMP 500.00 μM
  13. GMP 250.00 μM
  14. Guanine 156.25 μM
  15. Ribose 12.000 g/L
  16. Glucose 2.000 g/L
  17. Nuclease-Free Water 1.350 μL
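The link between a supplement volume and a final concentration is the usual dilution relation C1·V1 = C2·V2. As a worked check on potassium glutamate, the sketch below back-calculates the stock concentration implied by the numbers on this page. It assumes the 312.56 mM baseline quoted in the target list, a 20 μL total reaction, and that the whole increase comes from the 75 nL supplement; the stock concentration is not stated in the source, so the result is an inference, and the function name is illustrative.

```python
def implied_stock_mM(baseline_mM, final_mM, supplement_nl, reaction_ul=20.0):
    """Stock concentration that would raise baseline to final in the given volume."""
    if supplement_nl <= 0:
        raise ValueError("supplement volume must be positive")
    added_mM = final_mM - baseline_mM  # concentration gained in the reaction
    dilution_factor = (reaction_ul * 1000.0) / supplement_nl  # V_total / V_added
    return added_mM * dilution_factor


if __name__ == "__main__":
    stock = implied_stock_mM(312.56, 315.84, 75)
    print(f"implied potassium glutamate stock: {stock:.0f} mM")
```

With these inputs the implied stock is about 875 mM. The same function can be applied to other reagents once their baseline concentrations are known.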

Subsections of Labs

Week 1 Lab: Pipetting


Projects

Final projects:

  • Initially worked on three different ideas: Idea 1, a breath-based diagnostic device; Idea 2, Digital Cell Twin Modeling for Cancer and Oncology Virtual Cell Hypothesis Generation; Idea 3, decoding the genetic circuitry of lung cancer cells. I later settled on Idea 1, a real-time diagnostic system for lung health monitoring.
  • Group formed. Proposal: https://docs.google.com/document/d/1ENvPHhRbBgtl0ERrfqmomJKxPg68nfvCugrPQrDdM7o/edit?tab=t.0 Documentation: https://pages.htgaa.org/2026a/ritika-saha/homework/week-05-hw-protein-design-part-ii/index.html By: 2026a-nourelden-rihan, 2026a-ritika-saha, 2026a-rahul-yaji, 2026a-keerthana-gunaretnam. We decided to focus primarily on increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing its dependency on host DnaJ while still maintaining lysis activity. We discussed the tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold. BLAST can retrieve homologous lysis proteins from the databases. Clustal Omega can build MSAs to identify the essential L48-S49 residues and the pore-forming regions that must not be mutated. ESM can generate mutation heatmaps, which can guide the use of ESMFold to find the highest-scoring folds in mutable regions. AlphaFold-Multimer can predict whether the subunits of our protein can assemble into a pore in the host membrane, and can be used to check whether the N-terminus can break the interaction with DnaJ. We also identified a few pitfalls; the major ones concern limited training datasets that may not be well suited to designing a transmembrane lysis protein. Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stable protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.

Subsections of Projects

Individual Final Project

Initially worked upon three different ideas:

Idea 1: Breath-based diagnostic device

Idea 2: Digital Cell Twin Modeling for Cancer and Oncology Virtual Cell Hypothesis Generation

Idea 3: Decoding the genetic circuitry of lung cancer cells

I later settled on Idea 1, developed as a real-time diagnostic system for lung health monitoring.

This project proposes the development of a fully integrated, non-invasive diagnostic platform that leverages microfluidics, synthetic biology, and advanced computational modeling to enable real-time health monitoring from breath condensate or saliva. The first aim focuses on the design, fabrication, and validation of a multilayer microfluidic device capable of precisely routing small-volume biological samples into three spatially isolated reaction wells. Each well contains a lyophilized, cell-free transcription–translation (TX–TL) system engineered with synthetic genetic circuits tailored to detect specific biomarkers: interleukin-6 (IL-6) as an indicator of inflammation, viral or host RNA signatures for infection profiling, and hydrogen peroxide as a marker of oxidative stress. Upon rehydration by the incoming sample, these systems initiate programmable biochemical reactions that produce distinct fluorescence outputs. The microfluidic architecture ensures controlled flow dynamics, minimizes cross-contamination, and enables multiplexed biochemical sensing within a compact, portable format. An integrated optical sensing layer captures fluorescence emissions and converts them into quantifiable signals, forming the basis for downstream analysis.
The second aim advances the platform by introducing a computational signal processing framework that transforms fluorescence-derived optical signals into neuromorphic spike trains. This bio-inspired encoding strategy mimics neuronal firing patterns, enabling efficient, event-driven data representation and processing. To address variability inherent in breath and saliva sampling—such as fluctuations in biomarker concentration, humidity, and collection efficiency—the system incorporates a digital twin model grounded in virtual cell simulations. This model replicates the kinetics of the cell-free gene expression systems under varying conditions, allowing for dynamic calibration and normalization of sensor outputs. By integrating experimentally derived data with predictive simulations, the framework enhances both sensitivity and specificity, enabling robust interpretation of weak or noisy biological signals. The coupling of synthetic biology outputs with neuromorphic computation represents a novel paradigm for biosensing, bridging biochemical processes with adaptive, intelligent data processing.
The third aim synthesizes these components into a unified diagnostic platform capable of classifying individuals into clinically relevant health risk categories in real time. By combining multiplexed biomarker detection with computationally enhanced signal interpretation, the system provides a holistic assessment of respiratory and systemic health. The non-invasive nature of breath and saliva sampling enables frequent, longitudinal monitoring without discomfort or risk, making the platform particularly suitable for early disease detection and preventive care. The integration of microfluidics, programmable biology, and digital modeling establishes a scalable and portable solution that could be deployed in point-of-care settings or for at-home monitoring. Ultimately, this project aims to transform diagnostic practices by enabling continuous, personalized health surveillance, reducing reliance on centralized laboratory testing, and facilitating timely clinical intervention.

Benchling link for the Twist order: https://benchling.com/reet123/f_/DvufGAFHIG-final-project-construct/

Description: 
Synthetic DNA construct encoding a T7 promoter-driven gene expression cassette for cell-free system applications. The construct includes a regulatory 5′ UTR containing an aptamer-based RNA structure, ribosome binding site (RBS), reporter gene (sfGFP), and transcription terminator. Designed for in vitro transcription-translation (TX-TL) systems and biosensing applications.

SECTION 1: ABSTRACT

Respiratory diseases and systemic inflammation are often diagnosed only after symptoms become severe, limiting opportunities for early intervention. This project addresses the need for a real-time, non-invasive diagnostic platform capable of continuously monitoring key biomarkers in breath condensate or saliva. The overall goal is to develop a microfluidic, cell-free biosensing system that integrates synthetic biology with computational signal processing to enable early disease detection.

The central hypothesis is that combining optimized cell-free gene expression systems with biomarker-specific genetic circuits and computational signal interpretation will enable sensitive, real-time detection of disease-relevant molecules. The project focuses on three biomarkers: IL-6 (inflammation), viral/host RNA (infection), and hydrogen peroxide (oxidative stress). Specific aims include designing a microfluidic device with independent reaction chambers, optimizing cell-free reactions to maximize fluorescence output, and developing a neuromorphic signal processing framework calibrated with a digital twin model.

Methods include DNA construct design (T7-driven sfGFP reporter with aptamer regulation), cell-free transcription-translation (TX-TL) optimization, microfluidic integration, and fluorescence-to-signal conversion. The expected outcome is a scalable, portable diagnostic system capable of continuous health monitoring, with potential applications in early disease detection, personalized medicine, and low-resource healthcare settings.

SECTION 2: PROJECT AIMS

  • Aim 1: Experimental Aim

The first aim of my final project is to design and validate a microfluidic device that enables controlled entry of breath condensate or saliva samples into three independent reaction wells, each containing a freeze-dried cell-free gene expression system engineered with specific genetic circuits to detect IL-6, viral/host RNA, and hydrogen peroxide, producing distinct fluorescence outputs measurable via an integrated optical sensing layer.

  • Aim 2: Development Aim

The second aim is to develop an integrated signal processing framework that converts fluorescence-derived optical signals into neuromorphic spike trains and calibrates them using a digital twin model based on virtual cell simulations, improving sensitivity, specificity, and robustness to sampling variability.
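The digital twin is not specified further in this proposal. As one minimal sketch of the calibration idea, assume a lumped resource pool R that decays exponentially (dR/dt = -k_dep·R, R(0) = 1) and a reporter signal F produced in proportion to it (dF/dt = k_syn·R, F(0) = 0). This has the closed form F(t) = (k_syn / k_dep)·(1 − e^(−k_dep·t)), which can be used to normalize a measured reading. The model, the parameter names `k_syn` and `k_dep`, and the example values are all hypothetical and not fitted to data.

```python
import math


def predicted_fluorescence(t_hours: float, k_syn: float, k_dep: float) -> float:
    """Closed-form F(t) for dR/dt = -k_dep*R, dF/dt = k_syn*R, R(0)=1, F(0)=0."""
    if k_dep <= 0:
        raise ValueError("k_dep must be positive")
    return (k_syn / k_dep) * (1.0 - math.exp(-k_dep * t_hours))


def normalized_reading(measured: float, t_hours: float,
                       k_syn: float, k_dep: float) -> float:
    """Ratio of a measured signal to the model prediction at the same time."""
    expected = predicted_fluorescence(t_hours, k_syn, k_dep)
    if expected <= 0:
        raise ValueError("prediction must be positive to normalize against")
    return measured / expected


if __name__ == "__main__":
    # Hypothetical parameters: synthesis 100 a.u./h, resource decay 0.1 per hour.
    for t in (6, 12, 36):
        print(t, round(predicted_fluorescence(t, 100.0, 0.1), 1))
```

A ratio near 1 would mean a well behaves as the model expects; systematic deviations could indicate sampling variability or a genuine biomarker response.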

  • Aim 3: Visionary Aim

The third aim is to establish a fully integrated, non-invasive diagnostic platform that combines synthetic biology, microfluidics, and neuromorphic computing to classify individuals into health risk categories in real time, enabling continuous and personalized monitoring of respiratory and systemic health.

SECTION 3: BACKGROUND

  • Literature Context

Cell-free systems have become powerful tools for diagnostics due to their programmability and portability. Several studies have demonstrated that freeze-dried TX-TL systems can detect viral RNA and environmental signals outside of laboratory settings. Other studies have shown that optimizing energy systems (e.g., glucose and ribose) significantly improves protein yield and reaction duration in cell-free systems. Despite these advances, current systems often lack long-term stability, multiplexing capability, and integration with computational frameworks. This project addresses these limitations by combining multi-biomarker detection, optimized reaction chemistry, and real-time signal processing.

  • Innovation

This project is innovative because it integrates:

  1. Microfluidics + cell-free biosensing + neuromorphic computing
  2. Multiplexed detection of multiple biomarkers in parallel
  3. Biochemical optimization (Mg²⁺, glucose, nucleotides) for long-duration expression

Additionally, the use of a digital twin model to interpret biological signals introduces a novel interface between synthetic biology and computational modeling.

  • Impact

This project targets the major challenge of early detection of respiratory and systemic diseases. Current diagnostics are often invasive and episodic, missing dynamic changes in patient health. By enabling continuous monitoring, this system could transform healthcare toward preventive and personalized medicine.

The platform could be deployed in low-resource settings due to its portability and low cost, improving global health equity. It also reduces reliance on centralized laboratories and enables rapid response to infectious disease outbreaks. Scientifically, this work advances synthetic biology by demonstrating how biochemical tuning and computational integration can enhance system performance.

  • Ethical Implications

This project raises ethical considerations related to data privacy, accessibility, and responsible deployment of diagnostic technologies. The principle of beneficence applies, as the system aims to improve early detection and health outcomes. However, justice must be ensured so that such technologies are accessible across socioeconomic groups and do not exacerbate healthcare disparities. To ensure ethical implementation, safeguards must be established for data security and informed consent, especially when continuous monitoring is involved. Potential unintended consequences include overdiagnosis or anxiety due to continuous health tracking. To mitigate this, the system should be used as a decision-support tool rather than a standalone diagnostic, and results should be interpreted alongside clinical expertise. Regulatory oversight and transparent validation are essential to ensure safety and reliability.

SECTION 4: EXPERIMENTAL DESIGN

  • DNA Construct (Benchling Design)
  1. T7 Promoter
  2. 5′ UTR with aptamer-based regulatory element
  3. Ribosome Binding Site (RBS)
  4. sfGFP reporter gene
  5. Transcription terminator

This design enables biomarker-responsive translation control, where the aptamer regulates expression based on target molecules.

  • Cell-Free Reaction Design (Optimized)

Final Reaction Composition (20 μL)

  1. 6 μL Lysate
  2. 10 μL 2X Master Mix
  3. 2 μL DNA template
  4. 2 μL Custom supplement
[
  {"id":"nuclease_free_water","supplemental_volume_nl":1350},
  {"id":"potassium_glutamate","supplemental_volume_nl":75},
  {"id":"magnesium_glutamate","supplemental_volume_nl":75},
  {"id":"cysteine","supplemental_volume_nl":50},
  {"id":"ribose","supplemental_volume_nl":75},
  {"id":"amp","supplemental_volume_nl":25},
  {"id":"cmp","supplemental_volume_nl":25},
  {"id":"gmp","supplemental_volume_nl":50},
  {"id":"ump","supplemental_volume_nl":25},
  {"id":"glucose","supplemental_volume_nl":75},
  {"id":"nicotinamide","supplemental_volume_nl":175}
]
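A quick consistency check on the recipe above: the supplement entries should add up to the 2 μL custom supplement slot, and the four components should add up to the 20 μL reaction. The sketch below parses the same list with Python's standard `json` module; the variable and function names are illustrative.

```python
import json

# Supplement list copied from above (volumes in nanolitres).
SUPPLEMENT_JSON = """[
  {"id":"nuclease_free_water","supplemental_volume_nl":1350},
  {"id":"potassium_glutamate","supplemental_volume_nl":75},
  {"id":"magnesium_glutamate","supplemental_volume_nl":75},
  {"id":"cysteine","supplemental_volume_nl":50},
  {"id":"ribose","supplemental_volume_nl":75},
  {"id":"amp","supplemental_volume_nl":25},
  {"id":"cmp","supplemental_volume_nl":25},
  {"id":"gmp","supplemental_volume_nl":50},
  {"id":"ump","supplemental_volume_nl":25},
  {"id":"glucose","supplemental_volume_nl":75},
  {"id":"nicotinamide","supplemental_volume_nl":175}
]"""

# Reaction components in microlitres, from the composition list above.
REACTION_UL = {"lysate": 6.0, "master_mix_2x": 10.0,
               "dna_template": 2.0, "custom_supplement": 2.0}


def supplement_total_nl(entries):
    """Sum the supplemental volumes (nL) over all entries."""
    return sum(entry["supplemental_volume_nl"] for entry in entries)


if __name__ == "__main__":
    total_nl = supplement_total_nl(json.loads(SUPPLEMENT_JSON))
    print(f"supplement total: {total_nl} nL")
    print(f"reaction total: {sum(REACTION_UL.values())} uL")
```

The entries sum to 2000 nL, which matches the 2 μL supplement slot, and the components sum to 20 μL.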

Final Optimized Concentrations

  1. Potassium glutamate: 315.84 mM
  2. Magnesium glutamate: 8.85 mM
  3. HEPES: 45 mM
  4. Cysteine: 4.5 mM
  5. Nicotinamide: 4.0 mM
  6. AMP: 0.75 mM
  7. CMP/UMP: 0.5 mM
  8. GMP: 0.25 mM
  9. Ribose: 12 g/L
  10. Glucose: 2 g/L

Step-by-Step Experimental Plan

  1. Design DNA constructs for sfGFP and biomarker-responsive circuits
  2. Order DNA via Twist Bioscience
  3. Prepare or obtain BL21 cell lysate
  4. Prepare 2X master mix
  5. Add optimized supplement reagents
  6. Assemble 20 μL reactions
  7. Load into microfluidic device wells
  8. Introduce simulated breath/saliva samples
  9. Incubate at 30°C
  10. Capture fluorescence using optical sensor
  11. Record time-course data (0–36 hrs)
  12. Convert fluorescence to digital signals
  13. Apply neuromorphic encoding
  14. Compare outputs across biomarkers
  15. Validate reproducibility
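Steps 12 and 13 are not tied to a specific encoding scheme in this plan. One common event-driven option is send-on-delta encoding, sketched below: a spike is emitted each time the digitized fluorescence moves by a fixed threshold from the last reference level, with polarity +1 for a rise and -1 for a fall. The choice of scheme, the function name, the threshold, and the sample values are assumptions for illustration only.

```python
def delta_encode(samples, threshold):
    """Send-on-delta encoding: return (sample_index, polarity) spike events."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not samples:
        return []
    spikes = []
    reference = samples[0]  # last level at which a spike was emitted
    for index, value in enumerate(samples[1:], start=1):
        # Emit one upward spike per full threshold step above the reference.
        while value - reference >= threshold:
            reference += threshold
            spikes.append((index, +1))
        # Emit one downward spike per full threshold step below the reference.
        while reference - value >= threshold:
            reference -= threshold
            spikes.append((index, -1))
    return spikes


if __name__ == "__main__":
    # Made-up digitized readings (arbitrary units), threshold of 5 units.
    print(delta_encode([0, 5, 12, 12, 4], 5))
```

A flat signal produces no events, so quiet wells generate no data, which is the main appeal of an event-driven representation.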

Expected Results

  1. Increased Mg²⁺ → higher protein expression
  2. Increased glucose → longer reaction duration
  3. Multiplex detection → distinct fluorescence outputs
  4. Signal processing → improved classification accuracy

Techniques Used

  ✔ Cell-Free Systems
  ✔ DNA Construct Design
  ✔ Microfluidics
  ✔ Lab Automation
  ✔ Data Analysis
  ✔ Bioethical Considerations

Technique Expansion

  1. Cell-Free Systems: Used to express reporter proteins in a controlled environment. Enables rapid testing and optimization without living cells.

  2. DNA Construct Design: Used to engineer biomarker-responsive circuits using aptamers and regulatory elements controlling sfGFP expression.

SECTION 5: RESULTS & VALIDATION

Validation

I validated my project by designing and optimizing a cell-free sfGFP expression system with enhanced reagent composition to maximize fluorescence output.

Protocol

  1. Prepare optimized master mix
  2. Add lysate, DNA, supplement
  3. Incubate at 30°C
  4. Measure fluorescence over 36 hours

Techniques Used

Cell-free reactions enabled rapid testing of protein expression. DNA design ensured efficient transcription and translation. Optimization of Mg²⁺ and glucose improved yield. Fluorescence measurement provided quantitative validation.

Data & Analysis

Challenges

SECTION 6: ADDITIONAL INFORMATION

References

Budget

  • DNA synthesis (Twist): ~$120
  • Cell-free lysate: ~$200
  • Reagents: ~$150
  • Consumables: ~$50
  • Instrumentation: ~$100
  • Total: ~$620

Group Final Project

By: 2026a-nourelden-rihan, 2026a-ritika-saha, 2026a-rahul-yaji, 2026a-keerthana-gunaretnam

  • We decided to focus primarily on increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing its dependency on host DnaJ while still maintaining lysis activity.
  • We discussed the tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold.
  • BLAST can retrieve homologous lysis proteins from the databases.
  • Clustal Omega can build MSAs to identify the essential L48-S49 residues and the pore-forming regions that must not be mutated.
  • ESM can generate mutation heatmaps, which can guide the use of ESMFold to find the highest-scoring folds in mutable regions.
  • AlphaFold-Multimer can predict whether the subunits of our protein can assemble into a pore in the host membrane, and can be used to check whether the N-terminus can break the interaction with DnaJ.
  • We also identified a few pitfalls; the major ones concern limited training datasets that may not be well suited to designing a transmembrane lysis protein.
  • Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stable protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.