Ritika Saha — HTGAA Spring 2026

About me

Hello! I’m Ritika Saha, a student in HTGAA (Spring 2026).

My interests include:

🧬 Synthetic biology + diagnostics
🤖 Responsible AI for health

Contact info

Let’s connect:

HTGAA Committed Listener (CL) Agreement

As an HTGAA Committed Listener, my responsibilities are:

Watching class lectures and recitations
Participating in node reviews
Developing and documenting my homework
Actively communicating with other students and TAs on the forum
Allowing HTGAA and BioClub to share my work (with attribution)
Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
Following locally applicable health and safety guidance
Promoting a respectful environment free of harassment and discrimination
Signed by committing this file to my documentation page/repository,

Ritika Saha 9 March 2026

Homework

Week 1 HW: LungLite — Principles, Practices, and Governance
Principles, practices, and governance for the LungLite concept, along with Week 2 lecture prep.
Week 2 HW: DNA Read, Write, Edit — SOD1 Molecular Journey
A documented journey through gel electrophoresis, SOD1 DNA design, codon optimization, plasmid construction, and DNA read/write/edit technologies.
Week 3 HW: Lab Automation — Opentrons Artwork
Opentrons scripting, lab automation exploration, and final project ideation.
Week 4 HW: Protein Design Part I
This week focuses on how sequence, structure, and energetics can be modeled and manipulated to create or optimize proteins with specified functions.
Week 5 HW: Protein Design Part II
This week we learn how cutting-edge AI and protein language models are used to design functional proteins and peptides “in silico”.
Week 6 HW: Genetic Circuits Part I: Assembly Technologies
This week we learn core molecular biology tools and techniques for processing and assembling DNA, including PCR and Gibson Assembly.
Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits
This week covers neuromorphic genetic circuits, showing how engineered gene networks can implement neural-network “perceptron”-like computation and learning.
Week 9 — Cell-Free Systems
This week introduces synthesis of proteins using cellular machinery outside of a cell.
Week 10 — Advanced Imaging & Measurement Technology
This lecture presents a range of advanced technologies to do precision measurement of proteins at atomic scales, characterizing chemical composition, and detecting protein sequence and structure.
Week 11 — Bioproduction & Cloud Labs
Cloud labs and cell-free extract.

Labs

I will also share how I adapt lab work to a home setup and translate those workflows into scalable lab or office environments.

Week 1 Lab: Pipetting

Projects

Individual Final Project
I initially worked on three different ideas:
Idea 1: Breath-based diagnostic device
Idea 2: Digital Cell Twin Modeling for Cancer and Oncology Virtual Cell Hypothesis Generation
Idea 3: Decoding the genetic circuitry of lung cancer cells
I later finalized Idea 1, i.e., a real-time diagnostic system for lung health monitoring.
Group Final Project
Group Formed
Proposal: https://docs.google.com/document/d/1ENvPHhRbBgtl0ERrfqmomJKxPg68nfvCugrPQrDdM7o/edit?tab=t.0
Documentation: https://pages.htgaa.org/2026a/ritika-saha/homework/week-05-hw-protein-design-part-ii/index.html
By: 2026a-nourelden-rihan, 2026a-ritika-saha, 2026a-rahul-yaji, 2026a-keerthana-gunaretnam

We decided to focus on increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing its dependency on host DnaJ while still maintaining the lysis action.

The tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold were discussed. BLAST can pull out homologous lysis proteins from the databases. Clustal Omega can create MSAs to identify the essential L48-S49 residues and the pore-forming regions that must not be mutated. ESM can create mutation heatmaps, which can guide the use of ESMFold to obtain the highest-scoring folds in mutable regions. AlphaFold Multimer can predict whether the subunits of our protein can successfully create a pore in the host membrane, and can also be used to check whether the N-terminus can break the interaction with DnaJ.

We also identified a few pitfalls, the major ones being limited training datasets that may not be well aligned with designing a transmembrane lysis protein. Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stable protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.

Proposed Idea

I am exploring a project at the intersection of synthetic biology, diagnostics, and responsible AI.

The goal is to design systems that:

Enable low-cost, rapid biological diagnostics
Integrate AI responsibly into healthcare workflows
Improve accessibility of advanced diagnostics in resource-limited settings

This section will evolve as the idea matures through the course.

Follow My Journey

I document my learning, experiments, and reflections here:

✍️ Substack: Substack Link Would Be Added Here!

More updates coming soon!

Homework

Week 1 HW: LungLite — Principles, Practices, and Governance

🌬️ Project Idea: LungLite (AI + Breath Microfluidics + Cell-Free Synbio)

1) Biological engineering application/tool + why

LungLite is a low-cost, noninvasive breath monitoring system that uses a microfluidic disposable cartridge.

The cartridge contains freeze-dried cell-free synthetic biology reactions to detect breath biomarkers associated with airway inflammation and oxidative stress.

A smartphone camera reads the cartridge’s color/fluorescence pattern and an AI model interprets the result.

The tool is intended to help users monitor lung health over time—especially people with asthma, COPD risk, and high pollution exposure—and provide early warning signals of inflammation before severe symptoms appear.

LungLite leverages cell-free synthetic biology to detect breath biomarkers safely and efficiently. Instead of using live engineered cells, it employs freeze-dried transcription-translation (TX-TL) systems with non-replicating DNA circuits that respond to molecules associated with airway inflammation and oxidative stress. When a user exhales into the microfluidic cartridge, these engineered circuits trigger colorimetric or fluorescent signals proportional to biomarker levels. The sealed cartridge design, combined with built-in post-reaction neutralization, ensures safety, while AI algorithms analyze the visual output to provide an accurate, real-time readout of lung health. This integration of synthetic biology, microfluidics, and AI enables a low-cost, noninvasive tool for continuous monitoring, especially in high-risk environments or populations with limited access to traditional respiratory diagnostics.

Why this matters:
Current lung monitoring tools like spirometers often require strong forced exhalation and are not always accessible, comfortable, or usable for children, elderly people, or individuals in low-resource settings.

This problem is also deeply personal to me because I grew up around severe air pollution in Delhi, where “bad air days” are normal and respiratory symptoms are common. LungLite is motivated by the idea that people in high-exposure environments should be able to track early signs of inflammation easily and affordably—before symptoms become severe.

I initially worked on an AI-powered diagnostic tool for lung cancer. During this course, I pivoted the design to focus on the present idea: a low-cost, noninvasive breath test that uses a microfluidic cartridge to track early signs of lung inflammation.

LungLite goal:
breathe → cartridge reacts → phone reads

References

Cell free systems:

https://pmc.ncbi.nlm.nih.gov/articles/PMC11920963/
https://www.mdpi.com/1422-0067/25/16/9109
https://www.nature.com/articles/s41467-021-25233-y
https://www.mdpi.com/1420-3049/29/8/1878
Biochemical Preparation of Cell Extract for Cell-Free Protein Synthesis without Physical Disruption

DNA Circuits:

2) Governance/policy goals for an ethical future

Because LungLite sits at the intersection of bioengineering + consumer health + AI, it raises issues in biosecurity, lab safety, privacy, equity, and responsible health claims.

The governance goal is to ensure LungLite contributes to an ethical future by preventing harm while promoting constructive public health benefits.

Policy Goal A — Enhance Biosecurity

Sub-goal A1: Prevent incidents
Prevent misuse of cartridge biology (DNA templates, cell-free reagents) for harmful applications.
Sub-goal A2: Help respond
Ensure traceability and safe reporting if unsafe use or distribution occurs.

Policy Goal B — Foster Lab Safety

Sub-goal B1: Prevent incidents
Ensure safe handling, manufacturing, and disposal of cartridges and reagents.
Sub-goal B2: Help respond
Ensure protocols exist for spills, exposure, or improper disposal.

Policy Goal C — Protect the Environment

Sub-goal C1: Prevent incidents
Ensure cartridges and reagents do not introduce living organisms into waste streams.
Sub-goal C2: Help respond
Ensure recall, disposal, and remediation pathways if materials are found to persist or contaminate waste streams.

Policy Goal D — Other considerations

Minimize costs and burdens to stakeholders
Ensure feasibility for student prototyping and future scaling
Do not unnecessarily impede legitimate research
Promote constructive applications (public health monitoring, pollution health impacts)

3) Governance actions

Option 1: Technical Safety-by-Design

(Cell-free only + built-in kill chemistry + non-replicating DNA templates)

Idea

Many biosensors rely on living engineered organisms or wet reagents that could survive handling errors. LungLite instead commits to a cell-free-only architecture, using non-replicating DNA and post-reaction neutralization so the cartridge cannot become a biological propagation risk.

Design

Actors: student researchers, academic labs, cartridge designers, manufacturers.

Key elements:

Use commercially available or lab-prepared TX-TL cell-free extract
Use DNA templates without replication machinery
Add nuclease or denaturing reagents in a sealed “waste chamber” that activates after the reaction
Design the cartridge as a sealed unit so users cannot access wet reagents directly
Provide clear disposal instructions (trash-safe, not drain)
Include a QR code for standardized disposal instructions and recall notices

Assumptions

Cell-free systems are safe enough for consumer-adjacent use
DNA templates cannot be easily repurposed into harmful functions
Cartridge sealing prevents tampering and accidental exposure
Neutralization chemistry is robust across temperature/humidity variation

Risks of failure

Users could physically open the cartridge, mishandle reagents, or bypass neutralization
Poor sealing could cause leakage
DNA templates could be shared and repurposed outside intended use

Risks of “success”

Widespread adoption could normalize at-home “bio reaction kits” without safety literacy
Overconfidence in “bio-safe” claims could reduce careful oversight and institutional review

Option 2: Distribution + Supply Chain Controls

(DNA sequence screening + controlled reagent distribution + batch traceability)

Purpose

Even if the platform is designed safely, misuse risk increases when synbio components are distributed widely. This option adds governance at the distribution layer, aiming to prevent malicious acquisition or repurposing of DNA templates and reagents.

Design

Actors: DNA synthesis companies, cartridge manufacturers, distributors, university procurement offices, and potentially regulators.

Key elements:

DNA template sequences are screened using existing industry DNA synthesis screening norms
Cartridges sold with batch numbers, manufacturer ID, and basic traceability
Reagent supply chain restricted to verified vendors

Assumptions

Screening reliably catches harmful sequences
Vendors cooperate and screening is consistently implemented
Traceability meaningfully deters malicious use
Legitimate users will tolerate additional friction

Risks of failure

DIY synthesis or black-market sources bypass screening
Screening could generate false positives and slow benign development
Increased cost and friction could reduce adoption in low-resource communities

Risks of “success”

Centralization of power in a small number of vendors could limit open science
Smaller labs, students, and Global South researchers could be excluded due to cost and access barriers
Overly broad screening could suppress legitimate respiratory health research

Option 3: Responsible Health Claims + Data Governance

(Limit medical claims + privacy-by-design + transparency)

Aim

Even if the biology is safe, LungLite could still cause harm through false reassurance, panic, biased AI outputs, or privacy breaches. This option focuses on preventing digital harms and misleading health interpretation.

Design

Actors: app developers, product companies, IRBs/ethics boards (if research), privacy regulators, public health agencies, and clinical collaborators.

Key elements:

Position LungLite initially as wellness monitoring, not a medical diagnostic
Focus on trend tracking rather than absolute disease classification
Provide clear disclaimers (“not a diagnosis; seek medical care if symptoms worsen”)
Use local-first processing: results computed on-device when possible
Require informed consent for any cloud upload or model improvement
Provide opt-out for data sharing
Publish model limitations and performance across demographics
Align product claims with existing regulatory distinctions between wellness tools and regulated diagnostic devices

Assumptions

Users understand “monitoring” vs “diagnosis”
Privacy measures meaningfully reduce harm
AI transparency improves trust and responsible use
The model will generalize across different phones, lighting, and populations

Risks of failure

Users may treat outputs as diagnoses and delay care
Data leaks could expose sensitive health data
Model bias could cause false reassurance or false alarms in specific groups
Smartphone hardware variability could distort readings

Risks of “success”

A widely adopted breath-health dataset could become commercially valuable and exploited
Insurers/employers/schools could pressure people to share breath scores (coercive screening)
“Wellness” framing could still function as a de facto diagnostic

4) Scoring matrix (1 = best, 3 = worst; n/a allowed)

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	1	1	2
• By helping respond	2	1	2
Foster Lab Safety
• By preventing incidents	1	2	2
• By helping respond	2	2	2
Protect the environment
• By preventing incidents	1	2	2
• By helping respond	2	2	2
Other considerations
• Minimizing costs and burdens to stakeholders	1	3	2
• Feasibility?	1	2	1
• Not impede research	1	3	1
• Promote constructive applications	1	2	1

Figure: LungLite governance scoring matrix

Scoring justification:

Option 1 reduces biological risk at the source and does not rely heavily on enforcement.
Option 2 is strongest on biosecurity response, but worst on cost, equity, and research openness.
Option 3 is strongest for AI/privacy harms but does not fully address upstream biosecurity.
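
As a rough cross-check of the matrix, the scores can be summed per option. This is a minimal sketch that assumes every criterion carries equal weight, which the matrix itself does not state; the dictionary labels are shorthand for the rows above, and lower totals are better.

```python
# Scores copied from the matrix above (1 = best, 3 = worst).
# Equal weighting of all criteria is an assumption made for illustration.
scores = {
    "Biosecurity: prevent incidents": (1, 1, 2),
    "Biosecurity: help respond": (2, 1, 2),
    "Lab safety: prevent incidents": (1, 2, 2),
    "Lab safety: help respond": (2, 2, 2),
    "Environment: prevent incidents": (1, 2, 2),
    "Environment: help respond": (2, 2, 2),
    "Minimize costs and burdens": (1, 3, 2),
    "Feasibility": (1, 2, 1),
    "Not impede research": (1, 3, 1),
    "Promote constructive applications": (1, 2, 1),
}

options = ("Option 1", "Option 2", "Option 3")

# Sum each column of the matrix to get one total per option.
totals = {
    name: sum(row[i] for row in scores.values())
    for i, name in enumerate(options)
}

# Lower total = better overall, so sort ascending.
ranking = sorted(totals, key=totals.get)

for name in ranking:
    print(f"{name}: total score {totals[name]}")
```

Under the equal-weight assumption the totals are 13 for Option 1, 17 for Option 3, and 20 for Option 2. A different weighting (for example, emphasizing biosecurity response) could change the order.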

A few environmental concerns remain for this device: packaging waste at scale, a low environmental risk from cell-free extracts, and small risks associated with chemicals and dyes. Mitigations include minimal-material design, a sealed leak-proof cartridge, and take-back or clinic disposal at scale.

5) Prioritized strategy

Recommended strategy

I believe we should prioritize Option 1 + Option 3 as the core approach now, and adopt a lightweight version of Option 2 only once scaling and commercialization begin.

Why Option 1 is essential

Option 1 addresses the biggest safety and biosecurity concern, i.e., distributing engineered biological systems into homes. By committing to cell-free synthetic biology only, LungLite becomes safer, easier to dispose of, and easier to govern ethically.

Why Option 3 is equally critical

Even if the biology is safe, LungLite can still cause harm through:

false reassurance
panic from false positives
privacy breaches
biased AI outputs

Option 3 reduces these risks through responsible messaging, careful AI design, and privacy-by-design.

Where Option 2 fits

Option 2 becomes more important once LungLite is manufactured at scale. Heavy supply chain restrictions too early could:

block student prototyping
increase costs
reduce equitable access
slow research innovation

So the staged approach is:

Option 1 + Option 3 now
Option 2 later (commercialization / mass distribution)

Tradeoffs considered

Safety vs accessibility
Innovation vs security
User empowerment vs medical risk
Privacy vs model improvement

Audience for recommendation

This governance strategy is best targeted at:

MIT/university lab leadership
future consumer product manufacturers
public health agencies
privacy regulators

6) What I Learned

Ethical concerns that arose

Dual-use risk
AI harm
Privacy
Equity
Regulatory gray zone
Coercion risk (monitoring becomes surveillance)

Governance actions proposed to address these

Use cell-free systems only and avoid living organisms
Seal cartridges and neutralize biological material post-test
Implement privacy-by-design + local-first processing
Avoid medical claims until clinically validated
Keep manufacturing scalable and affordable
Add anti-coercion safeguards (minimize retention, discourage third-party access)

Week 2 Lecture Prep

Homework Questions — Professor Jacobson

1) DNA polymerase error rate, genome comparison, and how biology handles the discrepancy

Nature’s machinery for copying DNA is DNA polymerase. High-fidelity replicative DNA polymerases (with proofreading) have an error rate of approximately:

~1 error per 1,000,000 to 10,000,000 base pairs

Comparison to the human genome

The human genome is approximately:

~3,200,000,000 base pairs

If replication relied only on polymerase accuracy:

At 1 error per 1,000,000 bp:
3,200,000,000 / 1,000,000 = 3,200 errors per genome replication
At 1 error per 10,000,000 bp:
3,200,000,000 / 10,000,000 = 320 errors per genome replication

So even “high-fidelity” polymerase alone would still introduce hundreds to thousands of mistakes each time the genome is copied.

The shape of DNA polymerase precisely fits correct base pairs, and the enzyme uses a conformational “proofreading” motion to minimize misincorporation. https://www.sciencedirect.com/science/article/pii/S0969212615002695

How biology deals with the discrepancy

Biology reduces the final mutation rate using multiple layers of error correction:

Polymerase proofreading removes many misincorporated bases during replication.
Mismatch repair (MMR) fixes errors missed by proofreading.
Base excision repair (BER) fixes chemically damaged bases.
Nucleotide excision repair (NER) removes bulky lesions.

Together, these systems reduce the effective mutation rate to roughly:

~1 error per 1,000,000,000 to 10,000,000,000 bp per cell division

That means across one human genome replication, the final result is typically on the order of:

~0.3 to 3 mutations per cell division
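
The arithmetic in this answer can be reproduced in a few lines. This is a minimal sketch using the approximate genome size and error rates quoted above; the function and variable names are illustrative only.

```python
GENOME_BP = 3_200_000_000  # approximate human genome size used above


def expected_errors(genome_bp: int, bp_per_error: int) -> float:
    """Expected number of errors when copying genome_bp bases once."""
    return genome_bp / bp_per_error


# Polymerase alone: about 1 error per 1e6 to 1e7 bp.
polymerase_only = [expected_errors(GENOME_BP, r) for r in (10**6, 10**7)]

# With the additional repair layers: about 1 error per 1e9 to 1e10 bp.
with_repair = [expected_errors(GENOME_BP, r) for r in (10**9, 10**10)]

print("polymerase only:", polymerase_only)
print("with repair:", with_repair)
```

The first pair gives 3,200 and 320 errors per replication, and the second gives roughly 3.2 and 0.32, which matches the ranges stated above.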

🧪 Homework Questions — Dr. LeProust

1) What’s the most commonly used method for oligo synthesis currently?

The most commonly used method is:

Solid-phase phosphoramidite DNA synthesis

This is the standard chemistry used by most commercial oligo suppliers. It works by building a DNA strand one nucleotide at a time on a solid support (like a bead, column, or array surface) using repeated cycles of:

deprotection
coupling
capping
oxidation

2) Why is it difficult to make oligos longer than ~200 nt via direct synthesis?

Direct synthesis becomes difficult past ~200 nucleotides because:

A) The yield drops exponentially with length

Each synthesis step has less than 100% efficiency, so errors compound as the oligo gets longer. Even in an optimistic scenario, most strands are truncated or incorrect.
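
The exponential drop can be illustrated with a short calculation. This is a sketch under assumptions: the per-step coupling efficiencies of 99% and 99.5% are illustrative values that do not come from the lecture, and the model simply raises the efficiency to the number of coupling steps (length minus one).

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that reach full length after length_nt - 1 couplings."""
    return coupling_efficiency ** (length_nt - 1)


# Assumed, illustrative coupling efficiencies (not values from the source).
for eff in (0.99, 0.995):
    for length in (20, 200, 2000):
        frac = full_length_fraction(length, eff)
        print(f"efficiency={eff:.3f}  length={length:>4} nt  full-length={frac:.2e}")
```

With the assumed 99% efficiency, only about 13% of strands reach 200 nt, and roughly 2 in a billion would reach 2000 nt, which is why long genes are assembled from shorter oligos instead.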

B) Errors accumulate

Long oligos contain more:

deletions (from incomplete coupling)
substitutions (from incorrect incorporation)
depurination damage (especially A/G under acidic conditions)
truncated fragments

C) Purification becomes difficult and expensive

Separating a perfect 200-mer from 199-mer and 198-mer fragments is hard, so cost and complexity increase quickly.

3) Why can’t you make a 2000 bp gene via direct oligo synthesis?

Because the yield would collapse to essentially zero and the error rate would be unusable.

A) Yield becomes extremely low

B) The error rate becomes unacceptable

Even the rare full-length molecules would almost always contain:

substitutions
deletions
truncations
damaged bases

So you would not get a clean, correct 2000 bp product.

What is done instead in practice?

Instead of direct synthesis, genes are made by:

synthesizing shorter oligos (usually 60–200 nt)
assembling them into longer DNA using methods like:
- Gibson Assembly
- PCR-based assembly
- Golden Gate
- Ligase Cycling Assembly (LCA)
then sequence-verifying clones to find a correct one

📄 HW by Dr. George Church — Grant Application (Devised)

Project Title

LungLite: A Room-Temperature, Breath-to-Color Microfluidic Cartridge Powered by Cell-Free Synthetic Biology and Smartphone AI for At-Home Lung Inflammation Monitoring

1) Abstract

Chronic respiratory disease affects hundreds of millions globally, yet lung health monitoring remains clinic-centered, effort-dependent, and inaccessible for many populations. Existing tools such as spirometry require strong forced exhalation and proper technique, while lab tests for inflammation and oxidative stress are expensive and slow.

I propose LungLite, a low-cost breath monitoring system that combines breath condensation microfluidics, freeze-dried cell-free synthetic biology, and smartphone computer vision + AI. Users breathe into a disposable cartridge that captures breath condensate and routes it through multiple reaction zones. Each zone contains a freeze-dried cell-free reaction that produces a colorimetric/fluorescent signal in response to oxidative stress and inflammation-associated breath chemistry.

A smartphone reader standardizes illumination, quantifies reaction outputs, and uses machine learning to interpret a multi-zone “fingerprint” into a trend score. LungLite is designed for safe, scalable, room-temperature storage and distribution and aims to enable daily lung health monitoring outside specialized medical centers.

2) Specific Aims

Aim 1 — Engineer a breath-to-fluid microfluidic cartridge
Hypothesis: A passive, low-cost cartridge can consistently convert breath into a defined liquid sample volume and deliver it to reaction zones with minimal variability.
Outcome: consistent fluid delivery across users and breathing conditions.

Aim 2 — Develop a multi-zone freeze-dried cell-free synbio sensing panel
Hypothesis: freeze-dried cell-free reactions can be stabilized at room temperature and produce reproducible outputs when rehydrated.
Outcome: 6–12 zone panel with internal controls and reproducible readouts.

Aim 3 — Build a smartphone reader + AI pipeline
Hypothesis: smartphone imaging + AI normalization improves reliability and interpretability.
Outcome: trend score + confidence + invalid-test detection.
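
One simple way the normalization and invalid-test idea could work is to scale each zone against the cartridge's internal controls. The sketch below is hypothetical: the function name, the 0-to-1 intensity scale, and the `min_control_gap` threshold are all assumptions for illustration, not a described part of the LungLite design.

```python
from typing import Optional


def normalize_zone(zone: float, negative: float, positive: float,
                   min_control_gap: float = 0.2) -> Optional[float]:
    """Scale a zone intensity to the 0..1 range set by the two controls.

    Returns None when the controls are too close together to trust the
    reading, which this sketch treats as an invalid test.
    """
    gap = positive - negative
    if gap < min_control_gap:  # hypothetical validity threshold
        return None
    value = (zone - negative) / gap
    # Clamp so that noise outside the control range stays within 0..1.
    return min(1.0, max(0.0, value))


print(normalize_zone(0.5, negative=0.1, positive=0.9))  # usable controls
print(normalize_zone(0.5, negative=0.4, positive=0.5))  # controls too close
```

Scaling against controls on the same cartridge is meant to cancel out differences in phone cameras and lighting, since those affect the controls and the sensing zones alike.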

3) Significance

LungLite could enable:

noninvasive monitoring
high-frequency measurement
accessibility for children and low-resource settings
room-temperature distribution
population-level monitoring during wildfire smoke events

4) Innovation

Cell-free synbio in a consumer cartridge
Fingerprint sensing rather than single biomarker
AI as a reliability layer (normalization + invalid detection + confidence)

5) Technical Approach and Work Plan (12 months)

Months 1–2: breath capture + condensation
Months 2–4: routing + zone array
Months 3–7: freeze-dry stabilization
Months 5–8: phone reader + illumination
Months 7–10: AI training + invalid detection
Months 10–12: validation + usability

6) Expected Deliverables

disposable cartridge (6–12 zones)
freeze-dried reaction panel + controls
smartphone reader dock
AI pipeline
validation report
product pathway plan

7) Risk Analysis and Mitigation

biomarkers variable → fingerprint + controls + AI
stability issues → sealed packaging + desiccant
diagnostic misuse → wellness framing + disclaimers
privacy misuse → local-first + opt-in + deletion

8) Safety, Ethics, and Governance Plan

cell-free only
sealed cartridges
built-in neutralization
sequence screening at synthesis
traceability if scaling begins
bias testing + transparency
no disease claims until validated

9) Team and Resources

Cross-disciplinary team spanning:

microfluidics
cell-free synbio
optics + computer vision
ML
product design

10) Long-Term Vision and Commercialization

reusable reader + disposable cartridges
room-temperature shipping
low-cost manufacturing (paper microfluidics)
Year 1: wellness monitoring
Year 2+: clinical validation + regulated pathway

HW Review Papers — Week Summary Notes

1) DNA Sequencing at 40 (Shendure, J., Balasubramanian, S., Church, G. et al. https://doi.org/10.1038/nature24286)

Idea

DNA sequencing has gone through multiple revolutions and now functions as a universal molecular measurement tool — not just a way to read genomes.

Key points

In ~40 years, sequencing scaled from kilobases → first human genome → millions of genomes
Sequencing is no longer only for genomes; it is now used to measure:
- gene expression (RNA-seq)
- chromatin state (ATAC-seq, ChIP-seq)
- lineage tracing
- somatic mutations
- molecular interactions
Costs dropped dramatically due to next-generation sequencing (NGS)
Authors argue sequencing’s long-term impact may rival the microscope

Key message

We have become extremely good at reading DNA at massive scale, speed, and low cost.

2) DNA Synthesis Technologies to Close the Gene Writing Gap (2023), Hoose, A., Vellacott, R., Storch, M. et al. https://doi.org/10.1038/s41570-022-00456-9

Focus

We still cannot write DNA as efficiently as we can read it — and this is a major bottleneck for synthetic biology.

Key points

Synthetic DNA is essential for:
- synthetic biology
- gene therapy
- DNA data storage
- nanotechnology
Current chemical synthesis struggles beyond ~200 base pairs
Long DNA synthesis is expensive and error-prone
New approaches aiming to scale DNA writing include:
- enzymatic (template-independent) synthesis
- microarray-based synthesis + assembly
- rolling circle amplification
- molecular assembly + cloning pipelines
As DNA writing becomes easier, regulation and oversight become more important

3) Recombineering and MAGE (2021), Wannier T, et al. Nat Rev Methods Primers, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9083505/

Core idea

Recombineering and MAGE enable precise, scarless, multiplex genome editing without requiring toxic double-strand breaks (DSBs).

Why traditional editing is limiting

Older editing methods (ZFNs, TALENs, CRISPR with DSBs):

rely on double-strand breaks
DSBs can be toxic (especially in bacteria)
repair often produces unwanted indels
low precision for large-scale combinatorial editing

Recombineering solution

Uses phage proteins (Redβ, Exo, Gam)
Introduces ssDNA or dsDNA with homology
DNA integrates at the replication fork
No DSB required
Editing is highly precise and “scarless”

MAGE (Multiplex Automated Genome Engineering)

Introduces many ssDNA oligos at once
Creates combinatorial diversity across many genomic sites
Enables genome-scale reverse genetics

4) CRISPR Technology: A Decade of Genome Editing is Only the Beginning, Wang, Doudna, et al., https://www.science.org/doi/10.1126/science.add8643

Focus area

CRISPR made genome editing programmable, accessible, and fast — dramatically lowering the barrier to entry.

Main points

Cas9 + guide RNA enables targeting by base pairing
Enabled:
- knockouts
- pooled genetic screens
- animal models
- crop editing
- emerging human therapies

Newer CRISPR-derived tools

Base editing: A→G or C→T without DSBs
Prime editing: templated edits with higher precision

Remaining challenges

off-target effects
delivery into cells/tissues
limited multiplexing at large scale
HDR inefficiency in many systems

Summary

Biotechnology has made DNA reading extremely scalable (sequencing), but DNA writing (synthesis) and DNA rewriting (editing) are still constrained by cost, accuracy, delivery, and scalability.

Sequencing is now a general-purpose measurement tool, while synthesis and editing are rapidly improving — raising both exciting capabilities and new governance needs.

I used artificial intelligence tools, including ChatGPT-5.0, for language refinement, structural organization, and clarity of expression in this documentation. All scientific concepts, design decisions, sequence selections, experimental reasoning, and technical interpretations reflect my own understanding and work. The AI tool was used solely to improve readability, coherence, and presentation quality.

Week 2 HW: DNA Read, Write, Edit — SOD1 Molecular Journey

🧬 Week 2 Documentation

DNA Read → DNA Write → DNA Edit

A Molecular Design Journey

This week was not just a technical exercise. It was an exploration — from abstract sequence to physical plasmid, from conceptual art to molecular execution. Below is the full documentation of my process, including failures, iterations, and insights gained.

🧪 Part 0: Basics of Gel Electrophoresis

Lectures + Recitation

I attended/watched all required lecture and recitation materials.

Conceptual Understanding

Gel electrophoresis separates DNA fragments based on size using:

Negatively charged DNA backbone
Electric field
Agarose matrix
Size-dependent migration

Smaller fragments travel further.

🎨 Part 1: Benchling & In-silico Gel Art

Step 1: Benchling Account + Lambda DNA Import

Created Benchling account
Imported Lambda DNA reference sequence

Step 2: Simulated Restriction Digestion

Enzymes used:

EcoRI
HindIII
BamHI
KpnI
EcoRV
SacI
SalI

Initial Failure

My first digestion simulation produced fragmented bands that were too similar in size. The pattern looked visually indistinct.

Iteration Strategy

Tested different single and double digests
Compared fragment size outputs
Adjusted enzyme combinations

Eventually, I selected combinations that produced strong band separation.
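
The fragment-size comparison behind this iteration can be sketched in a few lines of Python. This is a minimal sketch, not Benchling's algorithm: the recognition sites and cut offsets are the standard ones for these seven enzymes, while the `digest` helper and the toy sequence are hypothetical stand-ins for the lambda genome and assume a linear molecule.

```python
# Recognition site and top-strand cut offset (bases from the start of the site)
RECOGNITION_SITES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "KpnI": ("GGTACC", 5),
    "EcoRV": ("GATATC", 3),
    "SacI": ("GAGCTC", 5),
    "SalI": ("GTCGAC", 1),
}


def digest(sequence, enzymes):
    """Return fragment sizes (largest first) for a linear sequence cut by the given enzymes."""
    sequence = sequence.upper()
    cuts = set()
    for enzyme in enzymes:
        site, offset = RECOGNITION_SITES[enzyme]
        start = sequence.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = sequence.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(sequence)]
    sizes = [end - begin for begin, end in zip(bounds, bounds[1:]) if end > begin]
    return sorted(sizes, reverse=True)


# Hypothetical 47-nt toy sequence with one EcoRI site and one BamHI site
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GGATCC" + "T" * 5
print(digest(toy, ["EcoRI"]))           # [36, 11]
print(digest(toy, ["EcoRI", "BamHI"]))  # [26, 11, 10]
```

Comparing the size lists from single and double digests in this way shows quickly which combinations give bands that are far enough apart to be told apart on a gel.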

Screenshots of all the simulations carried out for this task are attached below.

The first image shows the Benchling account set up with the lambda sequence loaded and visualized.

The next image shows the end result of the digestion. I worked on a pattern in the shape of the letter “H”, because my startup's name begins with H. I struggled a lot with this part, and I intend to rerun all of these simulations and tasks at least 5-6 times.

🧪 In-Silico Gel Art

I did attempt the gel art, but this was another part of the homework that I really struggled with.

Insight

Never had I imagined that biological mechanisms could generate such striking and beautiful art forms. As someone who once dreamed of becoming an artist but ultimately pursued engineering, I find this intersection deeply exciting. Working with gel patterns and molecular design has rekindled a childhood aspiration I once held close — the dream of opening an art studio.

🧬 Part 3: DNA Design Challenge

3.1 Choose Your Protein

Selected Protein: Human Superoxide Dismutase 1 (SOD1)

UniProt ID: P00441

sp|P00441|SODC_HUMAN
Superoxide dismutase [Cu-Zn]
OS=Homo sapiens OX=9606 PE=1 SV=2

Why SOD1?

SOD1 converts:

2 O₂⁻ + 2 H⁺ → O₂ + H₂O₂

It protects against oxidative stress and is implicated in ALS.

It also integrates mechanistically with my LungLite platform — serving as a biochemical actuator.

Kindly find attached an image of the protein sequence:

Amino Acid Sequence

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

3.2 Reverse Translation

Using online reverse translation tools, I generated a nucleotide sequence.

Failure

Reverse translation produced multiple valid sequences due to codon degeneracy.

There is no single “correct” DNA sequence for a protein.

Resolution

I selected one biologically valid version as a starting template.
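
To make the degeneracy concrete, here is a minimal sketch that counts how many distinct DNA sequences encode a given peptide under the standard genetic code. The `count_encodings` helper is hypothetical, and the peptide used is simply the first ten residues of the sequence above.

```python
from collections import Counter
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases in T, C, A, G order)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of synonymous codons per amino acid (e.g. Leu = 6, Met = 1)
CODONS_PER_RESIDUE = Counter(AMINO_ACIDS)


def count_encodings(protein):
    """Number of distinct DNA sequences that translate to the given protein."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in protein)


print(count_encodings("MATKAVCVLK"))  # 49152
```

Even ten residues allow tens of thousands of encodings, which is why a reverse-translation tool has to pick one according to some rule, such as codon usage in the chosen host.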

Pre-optimization DNA:

atggtgaaagcggtgtgcgtgctgaaaggcgatggcccggtgcagggcattattaacttt...

Kindly find attached images showing the conversion of the amino acid sequence to a DNA sequence (extremely interesting!):

3.3 Codon Optimization

Why Optimize?

Different organisms prefer specific codons due to tRNA abundance.

Without optimization:

Ribosome stalling
Low yield
Translation inefficiency

Host Chosen: Escherichia coli

Reasons:

Fast growth
High recombinant yield
Standard lab organism

Final Codon Optimized Sequence

ATGGTTAAAGCGGTATGCGTGCTGAAAGGCGATGGCCCGGTGCAGGGCATTATTAACTTT
GAACAGAAAGAATCAAACGGCCCGGTGAAAGTGTGGGGCAGCATTAAAGGCCTGACCGA
AGGTCTGCACGGCTTTCACGTGCATGAATTTGGCGATAACACCGCGGGCTGCACCAGCG
CCGGCCCGCATTTTAACCCGCTGAGCCGCAAACATGGCGGCCCGAAAGATGAAGAACGCC
ATGTGGGCGATCTGGGCAATGTGACCGCGGATAAAGATGGCGTGGCCGATGTGAGCATT
GAAGATAGCGTGATTAGCCTGAGCGGCGATCATTGCATTATTGGCCGCACCCTGGTTGT
TCATGAAAAAGCAGATGATCTGGGCAAAGGCGGCAACGAAGAAAGCACCAAAACCGGCA
ATGCGGGGAGCCGCCTGGCGTGCGGCGTGATTGGCATCGCCCAG

Loading the above sequence directly into Benchling and visualizing it:

3.4 From DNA to Protein

Expression Methods:

Cell-Dependent

Transform plasmid into E. coli
Antibiotic selection
Transcription
Translation
His-tag purification

Cell-Free Option

TX-TL system
Direct protein production without cells

Building the expression cassette:

Digital diagram of the above cassette:

3.5 Central Dogma Alignment

DNA:

ATG GTT AAA GCG

RNA:

AUG GUU AAA GCG

Protein:

Met Val Lys Ala

Each 3 nucleotides = 1 amino acid
T → U during transcription
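
The same alignment can be reproduced in code. This is a minimal sketch: transcription is modeled simply as replacing T with U on the coding strand, and the `transcribe` and `translate` helpers are hypothetical names, not part of any library.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases in T, C, A, G order)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: same sequence with U in place of T."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate coding-strand DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(transcribe("ATGGTTAAAGCG"))  # AUGGUUAAAGCG
print(translate("ATGGTTAAAGCG"))   # MVKA
```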

🧬 Part 4: Prepare a Twist DNA Synthesis Order

4.1 Accounts

Created Twist account
Created Benchling account

4.2 Build Expression Cassette

Structure:

Promoter
RBS
ATG
SOD1 Coding Sequence
7× His Tag
TAA
Terminator

Failure

Initially forgot to annotate regions in Benchling.

Fix

Annotated:

Promoter
RBS
CDS
His Tag
Terminator

Verified via Linear Map view.
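
The annotation step can also be sketched programmatically, which makes it harder to forget a region. This is a minimal sketch under stated assumptions: the promoter/RBS boundary is my assumed reading of the start of the insert, the CDS is truncated to its first four codons to keep the example short, the terminator is left out, and the `annotate` helper is hypothetical.

```python
# Parts in 5'->3' order. The promoter/RBS split is an assumed reading of the insert,
# and the CDS is deliberately truncated to its first four codons.
PARTS = [
    ("Promoter", "TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC"),
    ("RBS", "CATTAAAGAGGAGAAAGGTACC"),
    ("CDS (first 4 codons only)", "ATGGTTAAAGCG"),
    ("7xHis tag", "CATCACCATCACCATCATCAC"),
    ("Stop", "TAA"),
]


def annotate(parts):
    """Concatenate parts and return (sequence, [(name, start, end)]) with 1-based inclusive coordinates."""
    features = []
    position = 1
    for name, part_sequence in parts:
        end = position + len(part_sequence) - 1
        features.append((name, position, end))
        position = end + 1
    sequence = "".join(part_sequence for _, part_sequence in parts)
    return sequence, features


cassette, features = annotate(PARTS)
for name, start, end in features:
    print(f"{name}: {start}-{end}")
```

The coordinates printed here can be compared against the annotations shown in the Linear Map view.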

Final Insert Sequence

>SOD1_LungLite_Expression_Cassette
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATG
GTTAAAGCGGTATGCGTGCTGAAAGGCGATGGCCCGGTGCAGGGCATTATTAACTTTGA
ACAGAAAGAATCAAACGGCCCGGTGAAAGTGTGGGGCAGCATTAAAGGCCTGACCGAAGG
TCTGCACGGCTTTCACGTGCATGAATTTGGCGATAACACCGCGGGCTGCACCAGCGCCG
GCCCGCATTTTAACCCGCTGAGCCGCAAACATGGCGGCCCGAAAGATGAAGAACGCCAT
GTGGGCGATCTGGGCAATGTGACCGCGGATAAAGATGGCGTGGCCGATGTGAGCATTGA
AGATAGCGTGATTAGCCTGAGCGGCGATCATTGCATTATTGGCCGCACCCTGGTTGTTC
ATGAAAAAGCAGATGATCTGGGCAAAGGCGGCAACGAAGAAAGCACCAAAACCGGCAAT
GCGGGGAGCCGCCTGGCGTGCGGCGTGATTGGCATCGCCCAGCATCACCATCACCATC
ATCACTAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTT
TTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGT
GGGCCTTTCTGCGTTTATA

4.3–4.6 Twist Order

Selected:

Genes → Clonal Genes
Vector: pTwist Amp High Copy

Imported GenBank file back into Benchling to confirm construct.

I built my first plasmid.

The images document the workflow: exporting a FASTA file from Benchling, creating a Twist Bioscience account, (hypothetically) placing an order by selecting Clonal Gene, downloading the resulting gene construct file (a .gb / GenBank file) from the Twist platform, and then uploading that same file back into Benchling.

🧬 Part 5: DNA Read / Write / Edit

5.1 DNA Read

What Would I Sequence?

The SOD1 gene sequence to understand its structure, variants, and oxidative stress relevance in lung epithelial biology.

Why This Matters

Superoxide Dismutase 1 (SOD1) is a cytosolic antioxidant enzyme that catalyzes the conversion of superoxide radicals (O₂⁻) into oxygen and hydrogen peroxide. Because oxidative stress is central to airway inflammation, SOD1 represents the molecular boundary between resilience and pathology in lung tissue. Mutations in SOD1 are linked to Amyotrophic Lateral Sclerosis (ALS), and its structure and function are well-characterized, making it ideal for recombinant engineering and diagnostic integration.

Technology Chosen: Oxford Nanopore

Generation: Third-generation sequencing

Input:

Extracted DNA containing SOD1
Adapter ligation

Mechanism:

DNA passes through nanopores
Ionic current changes → base calling

Output:

FASTQ long reads of SOD1 sequence

Why Nanopore?

Long reads allow full-length SOD1 sequencing
Detects structural variants and potential regulatory regions
Portable and scalable

Limitations:

Higher error rate than Illumina
Correctable with sequencing depth and consensus alignment
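
A minimal sketch of why depth helps: with enough reads covering each position, a per-position majority vote recovers the true base even when individual reads contain errors. The reads below are hypothetical, already aligned, of equal length, and contain substitution errors only; real nanopore data also contains insertions and deletions, so reads would first have to be aligned with a dedicated tool.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_reads)
    )


# Hypothetical reads of the first four SOD1 codons; two reads carry one error each
reads = [
    "ATGGTTAAAGCG",
    "ATGGTTAAAGCG",
    "ATGCTTAAAGCG",  # G -> C error at position 4
    "ATGGTTAATGCG",  # A -> T error at position 9
    "ATGGTTAAAGCG",
]
print(consensus(reads))  # ATGGTTAAAGCG
```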

5.2 DNA Write

What Would I Synthesize?

A codon-optimized SOD1 expression cassette and ROS-responsive genetic circuits for LungLite.

Rationale

To integrate SOD1 into LungLite, the gene must be optimized for expression in bacterial or cell-free systems. This enables recombinant production and functional embedding into oxidative stress detection circuits.

Technology

Phosphoramidite oligo synthesis
PCR assembly
Clonal gene insertion into expression vector
7×His tag for purification

Application in LungLite

Biological Amplifier Strategy
- ROS activates redox-sensitive promoter
- Induces SOD1 expression in freeze-dried TX–TL system
- SOD1 converts superoxide → H₂O₂
- Coupled colorimetric/fluorescent reaction produces smartphone-readable signal
Calibration Standard Strategy
- Purified recombinant SOD1 embedded in microfluidic wells
- Known concentrations normalize ROS dye response
- Enables quantitative oxidative stress scoring

Limitations

Length constraints in synthesis
Synthesis errors
Cost scaling for large constructs

5.3 DNA Edit

What Would I Edit?

Upregulate antioxidant pathways — including SOD1 expression — in lung epithelial cells.

Technology: CRISPR-Cas9

Steps

gRNA design targeting regulatory region
Cas9-induced double-strand break
HDR-mediated repair with enhanced promoter template
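
The first step, gRNA design, starts with finding candidate target sites. Below is a minimal sketch, assuming SpCas9 with its NGG PAM and a 20-nt protospacer immediately 5' of the PAM. It scans only the forward strand and does no off-target scoring; the `find_protospacers` helper and the toy sequence are hypothetical.

```python
import re


def find_protospacers(sequence, length=20):
    """Return (start, protospacer, PAM) for every site followed by an NGG PAM on the forward strand."""
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidates are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    return [(match.start(), match.group(1), match.group(2))
            for match in pattern.finditer(sequence)]


# Hypothetical toy target: a 20-nt protospacer, a TGG PAM, then 4 nt of flanking sequence
toy_target = "ACGT" * 5 + "TGG" + "AAAA"
print(find_protospacers(toy_target))
# [(0, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

In a real design, each candidate would then be checked against the rest of the genome, which is where the off-target limitation listed below comes in.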

Input:

gRNA plasmid
Cas9
Donor DNA template
Target lung epithelial cells

Goal

Increase endogenous SOD1 buffering capacity to restore redox balance in oxidative stress conditions.

Limitations

Off-target effects
Variable editing efficiency
Delivery challenges in airway epithelium

🌬 Final Reflection

What began as:

Lambda DNA
→ Restriction digest
→ Gel electrophoresis

Evolved into:

DNA Read → Sequencing SOD1
DNA Write → Engineering ROS-responsive SOD1 circuits
DNA Express → Recombinant protein production
DNA Integrate → Embedding SOD1 into LungLite microfluidic diagnostics

SOD1 is not merely a recombinant protein in this project. It becomes a functional biochemical actuator — translating environmental oxidative exposure into measurable signal output.

Growing up in Delhi, where severe air pollution makes oxidative stress a daily lived experience, reframed SOD1 for me from an abstract enzyme to a molecular proxy for environmental exposure. LungLite transforms this molecular logic into a portable, AI-integrated, noninvasive public health device.

The DNA Design Challenge is no longer just molecular cloning — it becomes the foundation for a programmable redox-sensing health platform.

I acknowledge that I used artificial intelligence tools, including ChatGPT-5.0, for language refinement, structural organization, and improvement of clarity in this documentation.

All scientific concepts, experimental designs, sequence selections, analytical reasoning, and technical interpretations presented in this work reflect my own understanding and independent effort. The AI tool was used solely to enhance readability, coherence, grammar, and overall presentation quality.

The prompts primarily included instructions such as: “Rewrite the text and correct grammatical errors.”

Week 3 HW: Lab Automation — Opentrons Artwork

Lab Automation and Opentrons Programming

Part 1: Python Script for Opentrons Artwork

Objective

Our first task was to generate an artistic design using the GUI at opentrons-art.rcdonovan.com.

My inspiration for this design was my dog Shiro. Although he is an Indian Spitz, I ended up designing a dachshund:

I then exported the Python script directly from the interface, as per the given instructions:

from opentrons import types

import string

metadata = {
    'protocolName': '{YOUR NAME} - Opentrons Art - HTGAA',
    'author': 'HTGAA',
    'source': 'HTGAA 2026',
    'apiLevel': '2.20'
}

Z_VALUE_AGAR = 2.0
POINT_SIZE = 1.25

mrfp1_points = [(23,31), (21,29), (23,29), (25,29), (19,27), (23,27), (21,23), (17,21), (19,21), (9,19), (11,19), (13,19), (15,19), (17,19), (1,11), (5,11), (1,9), (-1,7), (1,7), (-7,5), (-5,5), (-3,5), (-1,5), (-7,3), (-5,3), (-3,3), (-1,3), (-5,1), (-3,1), (-1,1), (-5,-1), (-3,-1), (9,-7), (-15,-9), (-11,-9), (15,-9), (23,-9), (25,-9), (27,-9), (25,-11), (27,-11), (-19,-13), (9,-13), (11,-13), (-5,-17), (-21,-19), (-7,-19), (-21,-21), (-9,-21), (-19,-23)]
mko2_points = [(19,29), (15,27), (17,27), (21,27), (13,25), (15,25), (17,25), (19,25), (21,25), (23,25), (11,23), (13,23), (15,23), (17,23), (19,23), (7,21), (9,21), (11,21), (13,21), (15,21), (5,19), (7,19), (5,17), (7,17), (9,17), (11,17), (13,17), (15,17), (17,17), (19,17), (7,15), (9,15), (11,15), (13,15), (15,15), (17,15), (7,13), (9,13), (11,13), (13,13), (15,13), (9,11), (11,11), (13,11), (15,11), (11,9), (13,9), (15,9), (13,7), (15,7), (7,3), (9,3), (7,1), (9,1), (11,1), (13,1), (15,1), (17,1), (7,-1), (9,-1), (11,-1), (13,-1), (15,-1), (17,-1), (7,-3), (9,-3), (11,-3), (13,-3), (15,-3), (17,-3), (9,-5), (11,-5), (13,-5), (15,-5), (17,-5), (17,-7), (21,-7), (23,-7), (25,-7), (27,-7), (-27,-9), (-25,-11), (19,-11), (-23,-13), (21,-13), (27,-13), (7,-15), (19,-15), (21,-15), (23,-15), (-7,-17), (-3,-17), (-11,-19), (-9,-19), (-5,-19), (-23,-21), (-13,-21), (-11,-21), (-7,-21), (-5,-21), (-23,-23), (-21,-23), (-15,-23), (-13,-23), (-11,-23), (-9,-23), (-7,-23), (-23,-25), (-21,-25), (-19,-25), (-17,-25), (-15,-25), (-13,-25), (-11,-25), (-9,-25), (-25,-27), (-23,-27), (-11,-27), (-9,-27), (-27,-29), (-25,-29), (-13,-29), (-11,-29)]
mscarlet_i_points = [(5,27), (7,27), (9,27), (11,27), (13,27), (5,25), (7,25), (9,25), (11,25), (3,23), (5,23), (7,23), (9,23), (-1,21), (1,21), (3,21), (5,21), (-3,19), (-1,19), (1,19), (3,19), (-13,17), (-11,17), (-9,17), (-7,17), (-5,17), (-3,17), (-1,17), (1,17), (3,17), (-15,15), (-13,15), (-11,15), (-9,15), (-7,15), (-5,15), (-3,15), (-1,15), (1,15), (3,15), (5,15), (-15,13), (-13,13), (-11,13), (-9,13), (-7,13), (-5,13), (-3,13), (-1,13), (1,13), (3,13), (5,13), (-15,11), (-13,11), (-11,11), (-9,11), (-7,11), (-5,11), (-3,11), (-1,11), (3,11), (7,11), (-15,9), (-13,9), (-11,9), (-9,9), (-7,9), (-5,9), (-3,9), (-1,9), (3,9), (5,9), (7,9), (9,9), (-15,7), (-13,7), (-11,7), (-9,7), (-7,7), (-5,7), (-3,7), (3,7), (5,7), (7,7), (9,7), (11,7), (-27,5), (1,5), (3,5), (5,5), (7,5), (9,5), (11,5), (13,5), (15,5), (-27,3), (1,3), (3,3), (5,3), (11,3), (13,3), (15,3), (-27,1), (1,1), (3,1), (5,1), (-1,-1), (1,-1), (3,-1), (5,-1), (-27,-3), (-3,-3), (-1,-3), (1,-3), (3,-3), (5,-3), (-27,-5), (-5,-5), (-3,-5), (-1,-5), (1,-5), (3,-5), (5,-5), (7,-5), (-27,-7), (-25,-7), (-13,-7), (-11,-7), (-9,-7), (-7,-7), (-5,-7), (-3,-7), (-1,-7), (1,-7), (3,-7), (5,-7), (7,-7), (11,-7), (13,-7), (15,-7), (19,-7), (-25,-9), (-23,-9), (-17,-9), (-13,-9), (-9,-9), (-7,-9), (-5,-9), (-3,-9), (-1,-9), (1,-9), (3,-9), (5,-9), (7,-9), (9,-9), (11,-9), (13,-9), (17,-9), (19,-9), (21,-9), (-23,-11), (-21,-11), (-17,-11), (-15,-11), (-13,-11), (-11,-11), (-9,-11), (-7,-11), (-5,-11), (-3,-11), (-1,-11), (1,-11), (3,-11), (5,-11), (7,-11), (9,-11), (15,-11), (17,-11), (21,-11), (-21,-13), (-17,-13), (-15,-13), (-13,-13), (-11,-13), (-9,-13), (-7,-13), (-5,-13), (-3,-13), (-1,-13), (1,-13), (3,-13), (5,-13), (7,-13), (23,-13), (-19,-15), (-17,-15), (-15,-15), (-13,-15), (-11,-15), (-9,-15), (-7,-15), (-5,-15), (-3,-15), (-1,-15), (1,-15), (3,-15), (5,-15), (-19,-17), (-17,-17), (-15,-17), (-13,-17), (-11,-17), (-9,-17), (-19,-19), (-17,-19), (-15,-19), (-13,-19), (-19,-21), (-17,-21), (-15,-21), 
(-17,-23)]
azurite_points = [(31,-9), (15,-13), (25,-13)]
mclover3_points = [(23,-11)]

point_name_pairing = [("mrfp1", mrfp1_points),("mko2", mko2_points),("mscarlet_i", mscarlet_i_points),("azurite", azurite_points),("mclover3", mclover3_points)]

# Robot deck setup constants
TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

# Place the PCR tubes in this order
well_colors = {
    'A1': 'sfGFP',
    'A2': 'mRFP1',
    'A3': 'mKO2',
    'A4': 'Venus',
    'A5': 'mKate2_TF',
    'A6': 'Azurite',
    'A7': 'mCerulean3',
    'A8': 'mClover3',
    'A9': 'mJuniper',
    'A10': 'mTurquoise2',
    'A11': 'mBanana',
    'A12': 'mPlum',
    'B1': 'Electra2',
    'B2': 'mWasabi',
    'B3': 'mScarlet_I',
    'B4': 'mPapaya',
    'B5': 'eqFP578',
    'B6': 'tdTomato',
    'B7': 'DsRed',
    'B8': 'mKate2',
    'B9': 'EGFP',
    'B10': 'mRuby2',
    'B11': 'TagBFP',
    'B12': 'mChartreuse_TF',
    'C1': 'mLychee_TF',
    'C2': 'mTagBFP2',
    'C3': 'mEGFP',
    'C4': 'mNeonGreen',
    'C5': 'mAzamiGreen',
    'C6': 'mWatermelon',
    'C7': 'avGFP',
    'C8': 'mCitrine',
    'C9': 'mVenus',
    'C10': 'mCherry',
    'C11': 'mHoneydew',
    'C12': 'TagRFP',
    'D1': 'mTFP1',
    'D2': 'Ultramarine',
    'D3': 'ZsGreen1',
    'D4': 'mMiCy',
    'D5': 'mStayGold2',
    'D6': 'PA_GFP'
}

volume_used = {
    'mrfp1': 0,
    'mko2': 0,
    'mscarlet_i': 0,
    'azurite': 0,
    'mclover3': 0
}

def update_volume_remaining(current_color, quantity_to_aspirate):
    rows = string.ascii_uppercase
    for well, color in list(well_colors.items()):
        if color == current_color:
            if (volume_used[current_color] + quantity_to_aspirate) > 250:
                # Source well is used up: move to the well one row down (next row letter, same column)
                row = well[0]
                col = well[1:]
                
                # Find next row letter
                next_row = rows[rows.index(row) + 1]
                next_well = f"{next_row}{col}"
                
                del well_colors[well]
                well_colors[next_well] = current_color
                volume_used[current_color] = quantity_to_aspirate
            else:
                volume_used[current_color] += quantity_to_aspirate
            break

def run(protocol):
    # Load labware, modules and pipettes
    protocol.home()

    # Tips
    tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

    # Pipettes
    pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

    # Deep Well Plate
    temperature_plate = protocol.load_labware('nest_96_wellplate_2ml_deep', COLORS_DECK_SLOT)

    # Agar Plate
    agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
    agar_plate.set_offset(x=0.00, y=0.00, z=Z_VALUE_AGAR)

    # Get the top-center of the plate, make sure the plate was calibrated before running this
    center_location = agar_plate['A1'].top()

    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)
    
    # Helper function (dispensing)
    def dispense_and_jog(pipette, volume, location):
        assert(isinstance(volume, (int, float)))
        # Go above the location
        above_location = location.move(types.Point(z=location.point.z + 2))
        pipette.move_to(above_location)
        # Go downwards and dispense
        pipette.dispense(volume, location)
        # Go upwards to avoid smearing
        pipette.move_to(above_location)

    # Helper function (color location)
    def location_of_color(color_string):
        for well,color in well_colors.items():
            if color.lower() == color_string.lower():
                return temperature_plate[well]
        raise ValueError(f"No well found with color {color_string}")

    # Print pattern by iterating over lists
    for i, (current_color, point_list) in enumerate(point_name_pairing):
        # Skip the rest of the loop if the list is empty
        if not point_list:
            continue

        # Get the tip for this run, set the bacteria color, and the aspirate bacteria of choice
        pipette_20ul.pick_up_tip()
        max_aspirate = int(18 // POINT_SIZE) * POINT_SIZE
        quantity_to_aspirate = min(len(point_list)*POINT_SIZE, max_aspirate)
        update_volume_remaining(current_color, quantity_to_aspirate)
        pipette_20ul.aspirate(quantity_to_aspirate, location_of_color(current_color))

        # Iterate over the current points list and dispense them, refilling along the way
        for point_index, (x, y) in enumerate(point_list):
            adjusted_location = center_location.move(types.Point(x, y))

            dispense_and_jog(pipette_20ul, POINT_SIZE, adjusted_location)

            # Refill only for the points still to be dispensed after this one
            points_remaining = len(point_list) - (point_index + 1)
            if pipette_20ul.current_volume == 0 and points_remaining > 0:
                quantity_to_aspirate = min(points_remaining*POINT_SIZE, max_aspirate)
                update_volume_remaining(current_color, quantity_to_aspirate)
                pipette_20ul.aspirate(quantity_to_aspirate, location_of_color(current_color))

        # Drop tip between each color
        pipette_20ul.drop_tip()
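
The refill logic in the script is easier to see without the robot. The following minimal sketch reproduces only the volume arithmetic: `plan_aspirations` is a hypothetical helper, and the 18 µL limit and 1.25 µL point size are the values used in the script above.

```python
POINT_SIZE = 1.25  # µL dispensed per dot, as in the script
TIP_LIMIT = 18     # µL working limit the script uses for the 20 µL tip


def plan_aspirations(n_points, point_size=POINT_SIZE, tip_limit=TIP_LIMIT):
    """Return the list of aspirate volumes needed to print n_points dots of one color."""
    # Largest whole number of dots that fits in one tip load (14 dots = 17.5 µL)
    max_aspirate = int(tip_limit // point_size) * point_size
    volumes = []
    remaining = n_points
    while remaining > 0:
        volume = min(remaining * point_size, max_aspirate)
        volumes.append(volume)
        remaining -= round(volume / point_size)
    return volumes


print(plan_aspirations(3))   # azurite has 3 points  -> [3.75]
print(plan_aspirations(30))  # a 30-point color      -> [17.5, 17.5, 2.5]
```

Summing each list gives the volume of culture a color needs, which can be checked against what is actually loaded in its source well.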

I also experimented with a Google Colab code file, where I worked on generating a design based on an image resembling the Earth.

Part 2: Post-Lab Questions

2.1 Published Paper Using Automation

Paper Title

An Automated Versatile Diagnostic Workflow for Infectious Disease Detection in Low-Resource Settings

Source

https://www.mdpi.com/2072-666X/15/6/708

Summary

This paper presents an automated diagnostic workflow designed for detecting infectious diseases in low-resource settings. The system integrates microfluidics, biosensing, and automation to process biological samples efficiently. It focuses on creating a scalable and portable diagnostic pipeline that reduces manual intervention while maintaining accuracy.

Use of Automation

The workflow incorporates automation tools to handle multiple steps of the diagnostic process, including sample preparation, reagent handling, and reaction execution. Automated systems ensure precise liquid handling, reduce human error, and enable reproducibility across multiple tests. The integration of microfluidic platforms further enhances throughput and minimizes reagent usage.

Key Contribution

The key contribution of this work is the development of a versatile and low-cost automated diagnostic platform that can be deployed in resource-limited environments. It demonstrates how automation can bridge gaps in healthcare accessibility by enabling reliable and rapid disease detection.

Relevance to This Week

This paper directly relates to this week’s focus on lab automation using Opentrons. It highlights how automated liquid handling and integrated workflows can transform biological experiments into scalable and reproducible systems, similar to how we programmed the Opentrons robot.

2.2 Final Project — Automation Plan

Project Overview

For the final project, I propose developing an automated diagnostic system that detects disease biomarkers from breath condensate samples using a microfluidic and cell-free synthetic biology platform.

Problem Statement

Traditional diagnostic methods can be invasive, time-consuming, and require well-equipped laboratory settings. There is a need for a non-invasive, rapid, and scalable diagnostic solution that can work in low-resource environments.

Proposed Solution

The proposed system will combine breath-based sample collection with automated liquid handling and synthetic biology reactions. Using an Opentrons robot, the workflow will automate sample distribution, reagent addition, and reaction setup across multiple wells.

Workflow Description

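from opentrons import protocol_api

metadata = {'apiLevel': '2.20'}

def run(protocol: protocol_api.ProtocolContext):

    # Load labware and pipette
    tiprack = protocol.load_labware("opentrons_96_tiprack_20ul", 9)
    pipette = protocol.load_instrument("p20_single_gen2", "right", [tiprack])

    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)

    # Step 1: Distribute sample from the source well (A1) into the reaction wells.
    # A1 is skipped as a destination. Draft only: a 360 µL well cannot supply
    # 10 µL to all 95 remaining wells, so the source would need refilling.
    source = plate['A1']
    for well in plate.wells()[1:]:
        pipette.pick_up_tip()
        pipette.aspirate(10, source)
        pipette.dispense(10, well)
        pipette.mix(2, 10, well)
        pipette.drop_tip()

    # Step 2: Incubation
    protocol.delay(minutes=30)

    # Step 3: Output ready (shown in the run log)
    protocol.comment("Reactions complete")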

Tools and Technologies

Opentrons liquid handling robot
Microfluidic chip systems
Cell-free synthetic biology platforms
Optional cloud lab systems (e.g., Ginkgo Nebula)

Experimental Plan

Collect breath condensate sample
Distribute samples into multiple wells using Opentrons
Add reagents to initiate reactions
Incubate under controlled conditions
Measure outputs (fluorescence or color change)

Expected Outcome

The system will enable rapid, automated, and non-invasive detection of biomarkers with high reproducibility. It will demonstrate how automation can be used to scale biological diagnostics.

Part 3: Final Project Ideas

Idea 1 Breath-based diagnostic device

Idea 2 Digital Cell Twin Modeling for Cancer and Oncology Virtual Cell Hypothesis Generation

Idea 3 Decoding the genetic circuitry of lung cancer cells

Week 4 HW: Protein Design Part I

Protein Design

Part A: Conceptual Questions

1. Number of amino-acid molecules in 500 g meat

Meat is roughly 20% protein by mass → ~100 g protein in 500 g meat (order-of-magnitude estimate).
Average amino-acid residue ≈ 100 Da = 100 g/mol (closer to 110 Da, rounded here for the estimate).
Moles of amino acids ≈ 100 g ÷ 100 g/mol = 1 mol.
Number of molecules ≈ 1 mol × 6×10²³ per mol ≈ 6×10²³ amino-acid residues, i.e. on the order of 10²⁴.
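
The same estimate as a small calculation, so the assumptions can be varied. This is a minimal sketch: the 20% protein fraction and 100 g/mol residue mass are the rounded values used above, and `residues_in_meat` is a hypothetical helper.

```python
AVOGADRO = 6.022e23  # molecules per mole


def residues_in_meat(mass_g=500, protein_fraction=0.20, residue_mass_g_per_mol=100):
    """Order-of-magnitude count of amino-acid residues in a piece of meat."""
    protein_mass_g = mass_g * protein_fraction
    moles = protein_mass_g / residue_mass_g_per_mol
    return moles * AVOGADRO


print(f"{residues_in_meat():.1e}")                            # 6.0e+23
print(f"{residues_in_meat(residue_mass_g_per_mol=110):.1e}")  # 5.5e+23
```

Using 110 g/mol instead of 100 g/mol changes the result by less than 10%, so the order of magnitude is unaffected.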

2. Why only 20 natural amino acids

Twenty amino acids provide a compromise between:

Chemical diversity (charge, hydrophobicity, size, reactivity)
Biosynthetic metabolic cost
Translational accuracy and evolutionary robustness

This set is sufficient to build functional proteins.

3. Can we design non-natural amino acids

Yes. Non-natural amino acids are widely synthesized. Examples of designs:

Fluorinated amino acids → increase stability
Photocaged amino acids → allow light-controlled protein activation
Azido- or alkyne-containing amino acids → enable click chemistry
Conformationally restricted amino acids → stabilize protein folds

They can be incorporated using engineered tRNA–synthetase systems.

4. Origin of amino acids before life

Amino acids were likely produced by prebiotic chemistry:

Atmospheric discharge reactions (e.g., Miller-Urey–type synthesis)
Hydrothermal vent chemistry
Organic molecules delivered by meteorites
Photochemical synthesis in early oceans

These processes generated many amino acids before biological enzymes existed.

5. Handedness of α-helix made from D-amino acids

Proteins built from D-amino acids form left-handed α-helices, which are mirror images of the right-handed helices formed by L-amino acids.

6. Why most molecular helices are right-handed

Because natural proteins are composed almost entirely of L-amino acids, and the stereochemistry of an L-amino-acid backbone energetically favors right-handed α-helices.

7. Why β-sheets tend to aggregate

β-strands expose backbone hydrogen-bond donors and acceptors, allowing intermolecular hydrogen bonding. Side chains may pack via hydrophobic interactions, promoting sheet-to-sheet association.

8. Why do many amyloid diseases form β-sheets?

Amyloid diseases are associated with protein misfolding. Normally folded proteins are stabilized by their native tertiary structure, but under pathological conditions partial unfolding can expose backbone hydrogen-bond donors and acceptors. These segments tend to re-associate into cross-β structures, in which β-strands run perpendicular to the fibril axis and the inter-strand hydrogen bonds run parallel to it, so each sheet extends along the length of the fibril.

β-sheet conformations are favored because:

The peptide backbone can form extensive intermolecular hydrogen-bond networks, which provides high thermodynamic stability.
Side chains can pack tightly through hydrophobic interactions, reducing solvent exposure.
Once a nucleus forms, β-sheet aggregation becomes self-propagating, leading to fibril growth.

This aggregation tendency underlies diseases such as Alzheimer-type neurodegeneration and several systemic amyloidoses, where misfolded peptides accumulate into insoluble fibrils.

9. Can amyloid β-sheets be used as materials? Design of a β-sheet motif

Yes. Amyloid β-sheet assemblies are actively explored as biomaterials because they are:

Mechanically strong due to dense hydrogen-bond networks
Highly ordered at the nanoscale
Capable of programmable self-assembly
Biocompatible when designed carefully

A useful design strategy is to construct a short amphiphilic β-hairpin peptide motif that promotes controlled fibrillization.

Example β-sheet motif design

A common design is an alternating pattern of hydrophobic and hydrophilic residues:

X – H – X – H – X – H – X – H – turn – H – X – H – X – H – X – H – X

Where:

H = hydrophobic residue (e.g., Val, Leu, Ile, Phe) to drive core packing
X = charged or polar residue (e.g., Lys, Glu, Ser) to improve solubility and orientation

The turn region can use a Gly-Pro-Gly or similar flexible motif to stabilize the β-hairpin

Example specific sequence design:

KLVFFAEGPGAEFVLK

Design principles:

Include a nucleation core (often aromatic or hydrophobic residues) to trigger stacking.
Balance solubility and aggregation tendency to avoid uncontrolled precipitation.
Use end-capping charges to control fibril length if necessary.

Such motifs can form nanofibers, hydrogels, or scaffold materials for drug delivery, tissue engineering, and nanoelectronics.

Part B: Protein Analysis and Visualization

1. Briefly describe the protein you selected and why you selected it.

The protein I selected is Superoxide dismutase 1 (SOD1). I selected it because it is a key antioxidant enzyme that protects cells from oxidative stress by catalyzing the conversion of superoxide radicals into oxygen and hydrogen peroxide. It is relevant to lung oxidative injury and can be integrated into redox-sensing diagnostic platforms.

2. Identify the amino acid sequence of your protein

The amino acid sequence of SOD1 is:

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

3. Sequence length and most frequent amino acid

Length: 154 amino acids as listed above, including the initiator methionine (the mature protein, with Met1 removed, has 153)
Most frequent amino acid: Glycine (G), with 25 occurrences
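
A minimal sketch of how the length and residue composition can be checked directly, using the sequence exactly as listed in the previous answer and only the Python standard library:

```python
from collections import Counter

# SOD1 sequence as listed above (UniProt P00441)
SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

composition = Counter(SOD1)
most_common_residue, occurrences = composition.most_common(1)[0]

print("Length:", len(SOD1))
print("Most frequent residue:", most_common_residue, occurrences)
```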

4. How many protein sequence homologs are there for your protein?

Using a UniProt BLAST search, the SOD1 sequence returned approximately 250 homologous protein sequences across different species. The enzyme is highly conserved because of its essential antioxidant function.

5. Protein family?

SOD1 belongs to the Cu/Zn superoxide dismutase family, which contains metalloenzymes that detoxify superoxide radicals.

6. Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good quality structure? Are there any other molecules in the solved structure apart from protein?

Representative structure:

Structure database: Human Cu,Zn Superoxide Dismutase 1 structure https://www.rcsb.org/structure/1HL5

The structure of human Cu,Zn superoxide dismutase 1 was deposited and released in 2003 (deposition date: 2003-03-13, release date: 2003-05-08). It is a good-quality structure: the resolution is 1.80 Å, well below the reference threshold of 2.70 Å (a lower value means finer detail), which indicates high structural accuracy and reliable atomic detail.

Other molecules present in the structure:

Copper (Cu²⁺)
Zinc (Zn²⁺)
Water molecules

7. Structure classification family

The protein belongs to the Cu,Zn superoxide dismutase family.

8. PyMOL Visualization

In PyMOL, you can visualize the same protein in different styles by creating multiple representations or by toggling representations in the same session.

Use the following commands:

Cartoon representation

hide everything, all
show cartoon, SOD1
color cyan, SOD1

Ribbon representation

hide everything, all
show ribbon, SOD1
color magenta, SOD1

Ball and stick (sticks + spheres)

hide everything, all
show sticks, SOD1
show spheres, SOD1
set sphere_scale, 0.25
color yellow, SOD1

9. Secondary structure composition

SOD1 contains:

More β-sheets than α-helices.
The structure is mainly a β-barrel with several loop regions.

10. Hydrophobic vs hydrophilic residue distribution

Hydrophobic residues are mainly buried inside the β-barrel core to stabilize the structure.
Hydrophilic residues are exposed on the surface, facilitating solvent interaction and enzymatic activity.

Color protein by residue type

11. Surface pocket / binding cavity

Yes, SOD1 has metal-binding pockets that coordinate copper and zinc ions. These pockets are essential for catalytic conversion of superoxide radicals.

Visualize protein surface and binding pockets

Part C: Using ML-Based Protein Design Tools

Part C1: Protein Language Modeling

a. Unsupervised deep mutational scan using ESM2

To generate an unsupervised deep mutational scan (DMS), the wild-type protein sequence is first passed through ESM2 to compute the log-likelihood of the native amino acid at each position. Then, for every residue position, all 19 possible single amino acid substitutions are introduced computationally, and the change in log-likelihood (Δ log P) relative to the wild type is calculated. These scores approximate how evolutionarily plausible each mutation is according to the language model. The resulting matrix (positions × 20 amino acids) is visualized as a heatmap, where strongly negative values indicate mutations that are highly disfavored and positive or near-zero values indicate tolerated substitutions. In this sequence, the scan reveals vertical dark bands at specific positions, suggesting strong evolutionary constraint, while other positions show a broader distribution of tolerated mutations, indicating structural or functional flexibility.
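
The scoring step can be sketched independently of the model. This is a minimal sketch, assuming the per-position logits over the 20 amino acids have already been obtained from ESM2; here they are replaced by hypothetical toy values, and `log_softmax` and `mutational_scan` are illustrative helpers rather than part of any ESM library.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def mutational_scan(sequence, logits_per_position):
    """Return {(position, wild_type, mutant): delta log P} with 1-based positions."""
    scores = {}
    for position, (wild_type, logits) in enumerate(zip(sequence, logits_per_position), start=1):
        log_probs = dict(zip(AMINO_ACIDS, log_softmax(logits)))
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                scores[(position, wild_type, mutant)] = log_probs[mutant] - log_probs[wild_type]
    return scores


# Hypothetical toy logits for a two-residue sequence "HG":
# position 1 strongly prefers His (constrained), position 2 is flat (tolerant)
constrained = [0.0] * 20
constrained[AMINO_ACIDS.index("H")] = 5.0
tolerant = [0.0] * 20

scores = mutational_scan("HG", [constrained, tolerant])
print(round(scores[(1, "H", "P")], 3))  # -5.0
print(round(scores[(2, "G", "A")], 3))  # 0.0
```

Arranging these scores as a positions × amino acids grid gives the heatmap: a constrained position like the first one shows up as a dark vertical band, while a tolerant position stays near zero.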

b. Interpretation of a specific pattern

One notable pattern appears around residue His45 within the motif GLHGFHVHEF. This region contains multiple histidines and glycines, suggesting structural or catalytic relevance. Counting the sequence from 1, the motif spans residues 41–50 and its histidines fall at positions 43, 46, and 48, so position 45 on the heatmap corresponds to a histidine (His46) only if the heatmap axis is 0-indexed; in human SOD1, His46 and His48 are copper ligands. The heatmap shows that most substitutions at this position are strongly penalized, forming a pronounced vertical stripe. A particularly deleterious mutation is H45P (histidine to proline). Proline imposes rigid backbone constraints due to its cyclic structure and often disrupts regular secondary structure or active-site conformations. If this histidine participates in hydrogen bonding, catalysis, or metal coordination, replacing it with proline would disrupt both structural geometry and chemical functionality. ESM2 assigns a strongly negative likelihood change to this mutation, indicating that such substitutions are rarely observed across evolution. This pattern reflects evolutionary conservation and suggests that the residue is functionally important.

c. Latent Space Analysis

The given protein sequence

MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

was embedded into a reduced-dimensional latent space together with the SCOP protein dataset using sequence-derived features and t-SNE dimensionality reduction. Each point in the latent space represents one protein domain, where spatial proximity indicates similarity in sequence-derived biochemical and structural properties.

Neighborhood analysis: In the 3D t-SNE map, the protein lies within a dense central cluster of soluble proteins rather than in sparse peripheral branches. This indicates that its nearest neighbors share:

similar amino-acid composition
comparable length (~150 aa)
soluble cytosolic nature
globular enzyme-like fold

Thus, the latent space neighborhood approximates proteins with related structural class and biochemical characteristics.

Interpretation of protein properties from sequence

From the sequence:

length ≈ 150 aa → typical small globular domain
rich in Gly, Ala, Val, Leu → hydrophobic core residues
contains His, Glu, Asp → catalytic/metal-binding potential
no signal peptide or TM region → soluble cytosolic protein

These features are characteristic of:

1. small α/β enzyme domains
2. bacterial metabolic proteins

which matches the central cluster location in the embedding.
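
The sequence-level observations above (length and residue content) can be checked directly. The sketch below computes amino-acid composition for the query sequence; the helper name `composition` is illustrative, and this is only one simple kind of sequence-derived feature, not necessarily the one used to build the embedding.

```python
from collections import Counter

QUERY = (
    "MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGG"
    "PKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLAC"
    "GVIGIAQ"
)

def composition(seq):
    """Return the fraction of each amino acid, most common first."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.most_common()}

comp = composition(QUERY)
print(len(QUERY))
for aa, frac in list(comp.items())[:5]:
    print(aa, round(frac, 3))
```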

Position relative to neighbors

Because the protein falls in the dense manifold region:

* it is not a membrane protein (these form separate arms)
* it is not a repeat or coiled-coil protein (these form elongated branches)
* it is not a β-rich outer-membrane protein (these occupy the lower sparse region)

Therefore its neighbors are likely:

* bacterial enzymes
* dehydrogenase-like domains
* metal-binding proteins
* small metabolic proteins

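Proximity in the embedding suggests similar sequence-derived features and, tentatively, a related fold and function. Because t-SNE preserves local neighborhoods better than global distances, this is a hint rather than proof.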

Do neighborhoods approximate similar proteins?

Yes. The t-SNE embedding groups proteins according to sequence-derived structural features. The position of the query protein within the soluble globular cluster shows that the learned representation successfully captures structural similarity, since its neighbors correspond to proteins of similar size, composition, and fold class.

The provided protein sequence was embedded together with SCOP protein domains into a reduced-dimensional latent space using sequence-derived features and t-SNE. In the resulting 3D map, the protein is located within a dense central cluster corresponding to soluble globular proteins. Its neighborhood contains proteins of similar length (~150 amino acids), amino-acid composition, and cytosolic localization, indicating comparable structural architecture. Sequence analysis shows a typical small α/β enzyme-like domain enriched in hydrophobic core residues and catalytic amino acids, consistent with its latent space position. The absence of transmembrane or repeat features further supports its placement away from peripheral clusters. Therefore, the latent space neighborhoods approximate biologically similar proteins, and the query protein is most similar to small soluble metabolic enzyme domains in the dataset.

Part C2: Protein Folding

The protein sequence was folded using ESMFold and the predicted structure was compared with the original structure. Visual inspection shows that both structures share the same overall fold, characterized by a β-sheet-rich globular architecture with similar strand arrangement and topology. The predicted model reproduces the native secondary-structure elements, domain organization, and β-sheet packing, indicating strong agreement with the original coordinates. Minor differences are observed mainly in loop and terminal regions, which are typically flexible and harder to predict. Therefore, the ESMFold-predicted coordinates match the original protein structure well, confirming that the sequence contains sufficient information to recover the correct fold.
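
Visual comparison can be complemented with a number. The sketch below computes the RMSD between two matched sets of coordinates after optimal superposition (the Kabsch algorithm). It assumes the Cα coordinates of the predicted and reference structures have already been extracted and paired residue by residue; the demo coordinates are synthetic, not taken from either structure.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q (N x 3) after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))

# Synthetic demo: a rotated and shifted copy of the same points.
rng = np.random.default_rng(1)
coords = rng.normal(size=(30, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(coords, moved))
```

A rigidly moved copy should give an RMSD near zero; larger values for real structures would mostly come from the loop and terminal regions noted above.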

Creating Mutations: 
Point mutations

Example (the sequence shown differs from the wild type at one position, I151V):
MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGVAQ
(change IAQ → VAQ at the C-terminus)
Output:

Part C3: Protein Generation

New generated sequence:
ALSAEEAAKLKAAWAPVFANKEANGKAFILTLFEKYPEIKEYFPEFKGKTLEEIKASPKLDEIAGKFFDTLETLVANADDAAAMATLFKDLAAKHVAKGITAAHFEKIREIFPGFVASVAPPPAGAAAAWDKLFGMVIDALKAAGG

The predicted SOD1 structures generated using ESMFold and inverse folding were compared with the experimentally resolved holo human Cu,Zn superoxide dismutase structure (PDB: 1HL5), which represents the metal-bound, fully stabilized conformation of the enzyme. Both predicted models recapitulate the canonical Greek-key β-barrel architecture characteristic of SOD1, so the global fold is preserved despite the absence of explicit metal ions and experimental restraints. Localized deviations appear in flexible loop regions and in the precise geometry of the catalytic Cu/Zn-binding sites, which are well defined in 1HL5 because of metal coordination and the conserved disulfide bond.

Notably, for protein generation, a sequence slightly different from the native 1HL5 sequence was intentionally used. The modeled sequence is MVKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ, whereas the 1HL5 PDB sequence (chains A–R) is ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ. The two differ at the first two N-terminal residues (Met-Val vs Ala-Thr). Despite this variation, the predicted structures agree well with the experimental SOD1 fold, indicating that the β-barrel core is robust to minor sequence changes. The subtle differences in loop conformations and compactness likely reflect the combined effects of sequence variation and the apo-like nature of the computational models relative to the holo, metal-stabilized 1HL5 structure.

Part D: Group Project

We formed a group.
Group Project Link: https://docs.google.com/document/d/1ENvPHhRbBgtl0ERrfqmomJKxPg68nfvCugrPQrDdM7o/edit?usp=sharing
Proposal by: 2026a-nourelden-rihan, 2026a-ritika-saha, 2026a-rahul-yaji, 2026a-keerthana-gunaretnam
We decided to focus on the main area of increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing the dependency on host DnaJ, while still maintaining the lysis action.
The tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold were discussed.
BLAST can pull out homologous lysis proteins from the databases.
Clustal Omega can create MSAs to identify essential L48-S49 residues, and the pore-forming regions that must not be mutated.
ESM can create mutation heatmaps, which can guide the use of ESMFold to find the highest-scoring folds for variants in mutable regions.
AlphaFold Multimer can predict whether the subunits of our protein can form a pore in the host membrane, and can also be used to check whether changes at the N-terminus break the interaction with DnaJ.
We also identified a few pitfalls. The major ones relate to limited training datasets, which may not be well suited to designing a transmembrane lysis protein.
Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stabilized protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.

Week 5 HW: Protein Design Part II

Protein Design II

SOD1 Binder Peptide Design

Part 1: Generate Binders with PepMLM

Retrieval of SOD1 Sequence

The human Superoxide Dismutase 1 (SOD1) protein sequence was retrieved from UniProt (Accession P00441).

Wild-type sequence (full length, 154 residues including the initiator methionine):

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Introduction of the A4V Mutation

The classical ALS mutation A4V replaces Alanine (A) with Valine (V) near the N-terminus.

However, examination of the provided sequence shows:

Position	Residue
1	M
2	A
3	T
4	K
5	A
6	V

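Thus, in this numbering (which counts the initiator methionine), residue 4 is lysine, not alanine. The name A4V follows the mature-protein numbering, in which the initiator methionine is removed and the chain starts at Ala1. The affected alanine is therefore at position 5 of the sequence above, and the mutation was applied there.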

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

This substitution increases hydrophobicity near the N-terminus and is known to destabilize SOD1, promoting aggregation associated with aggressive familial ALS.

Peptide Generation with PepMLM

Using the PepMLM-650M model Colab, the mutant SOD1 sequence was used as the conditioning context to generate four peptides of length 12 amino acids.

While running the PepMLM Colab notebook, the peptide generation step produced the same sequence, WRYYAVAAAHKX, for all four generated peptides. A likely cause is deterministic (greedy) decoding, where the model selects the highest-probability amino acid at each position. Because the conditioning sequence (the A4V mutant SOD1) and the generation settings were identical for each run, the model returned the same peptide each time instead of diverse sequences. The “X” at the end of the sequence indicates that the model predicted an unknown or unresolved amino acid token at that position. As a result, all four peptides were identical, and the control peptide FLYRWLPSRRGG was added separately for comparison, as required in the assignment.
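
The difference between deterministic and stochastic decoding can be illustrated with a toy example. This sketch does not reproduce PepMLM's own generation code: the per-position probabilities are random placeholders, and `greedy_decode`, `sample_decode`, and `top_k` are illustrative names.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def greedy_decode(probs):
    """Pick the single most probable amino acid at every position."""
    return "".join(AMINO_ACIDS[i] for i in probs.argmax(axis=1))

def sample_decode(probs, rng, top_k=3):
    """Sample each position from its top_k most probable amino acids."""
    peptide = []
    for row in probs:
        top = np.argsort(row)[-top_k:]
        weights = row[top] / row[top].sum()
        peptide.append(AMINO_ACIDS[rng.choice(top, p=weights)])
    return "".join(peptide)

# Placeholder probabilities (assumption): 12 positions x 20 amino acids.
probs = np.random.default_rng(0).dirichlet(np.ones(20), size=12)

greedy = [greedy_decode(probs) for _ in range(4)]
sampled = [sample_decode(probs, np.random.default_rng(seed)) for seed in range(4)]
print(greedy)   # four identical peptides
print(sampled)  # peptides drawn with different random seeds
```

With the same input and no randomness, greedy decoding can only return one peptide, which matches the behavior observed in the notebook. Sampling with different seeds is one way to obtain varied candidates.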

Snapshot of the output (of a particular section, not all)

The final generated peptides and the control sequence are as follows:

Peptide	Sequence
Pep1	WRYYAVAAAHKX
Pep2	WRYYAVAAAHKX
Pep3	WRYYAVAAAHKX
Pep4	WRYYAVAAAHKX
Control	FLYRWLPSRRGG

PepMLM Token Prediction Scores:

Position	Amino Acid	Score
1	W	0.562357
2	R	0.230632
3	Y	0.458953
4	Y	0.257805
5	A	0.329096
6	V	0.214972
7	A	0.337871
8	A	0.136613
9	A	0.123724
10	H	0.186813
11	K	0.268938
12	X	0.243224

Part 2: Evaluate Binders with AlphaFold3

Submission to AlphaFold Server

The mutant A4V SOD1 FASTA sequence was submitted to the AlphaFold Server. For each test, the SOD1 mutant sequence was entered as the first chain, followed by the peptide sequence as the second chain to model the protein–peptide complex.

The following image shows the submission of SOD1 mutant sequence to the AlphaFold Server:

The result generated through this submission is as follows:

Peptide 1 Evaluation

Original PepMLM Sequence

WRYYAVAAAHKX

Because X represents an unknown amino acid, it was replaced with E (Glutamic acid) before submission to AlphaFold:

Final peptide used:

WRYYAVAAAHKE

AlphaFold Scores

Metric	Value
ipTM	0.26
pTM	0.71

Structural Observation

The AlphaFold prediction produced an ipTM score of 0.26 and a pTM score of 0.71. The pTM value indicates that the overall SOD1 protein structure is predicted with reasonable confidence. However, the very low ipTM score suggests weak or negligible interaction between the peptide and SOD1.

Visualization of the predicted complex shows that the peptide is loosely positioned on the surface of the protein and does not form a clear binding interface. The peptide does not appear to localize near the N-terminal region where the A4V mutation occurs. Additionally, it does not penetrate the β-barrel core or interact with the dimer interface of the protein.

This result suggests that the PepMLM-generated peptide is unlikely to bind strongly to mutant SOD1.

Control Peptide Evaluation

Control Sequence

FLYRWLPSRRGG

AlphaFold Scores

Metric	Value
ipTM	0.32
pTM	0.82

Structural Observation

The AlphaFold prediction for the control peptide produced an ipTM score of 0.32 and a pTM score of 0.82. The relatively high pTM value indicates that the overall SOD1 protein structure was predicted with high confidence, consistent with its known β-barrel fold.

However, the ipTM score remains relatively low, suggesting weak or unreliable interaction between the peptide and SOD1. Visualization of the predicted complex shows that the peptide is positioned along the outer surface of the protein rather than forming a well-defined binding pocket.

The peptide does not localize near the N-terminal region containing the A4V mutation and does not strongly engage the β-barrel core or the dimer interface. Instead, the peptide remains largely surface-bound, suggesting that the interaction may be nonspecific or transient.

Summary of AlphaFold Results

Peptide	Sequence	ipTM	Binding Observation
PepMLM peptide	WRYYAVAAAHKE	0.26	Peptide appears loosely positioned on the surface of SOD1 and does not form a well-defined binding interface. It does not localize near the A4V mutation site.
Control peptide	FLYRWLPSRRGG	0.32	Peptide remains surface-bound and does not strongly interact with the β-barrel core or dimer interface.

Binding Site Analysis

Region	Observation
N-terminus (A4V site)	Peptide does not bind near this region
β-barrel core	Peptide does not penetrate the barrel
Dimer interface	Peptide does not appear positioned between monomers
Protein surface	Peptide appears loosely surface-bound

Final Interpretation

The AlphaFold predictions produced relatively low ipTM scores for both peptides, indicating weak predicted interactions with the SOD1 protein. The PepMLM-generated peptide (WRYYAVAAAHKE) showed an ipTM value of 0.26, suggesting very little confidence in a stable binding interface. The control peptide (FLYRWLPSRRGG) produced a slightly higher ipTM value of 0.32, but this value is still below the threshold typically associated with reliable protein–peptide interactions.

Visualization of the predicted complexes shows that both peptides remain largely surface-bound and do not interact strongly with the N-terminal A4V mutation site, the β-barrel core, or the dimer interface. None of the PepMLM-generated peptides matched or exceeded the predicted binding strength of the control peptide, and both peptides appear to form weak and nonspecific interactions with SOD1.
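
The two scores can also be combined into a single ranking value. The sketch below uses the weighting 0.8·ipTM + 0.2·pTM that AlphaFold-Multimer uses to rank models; whether the AlphaFold Server applies exactly this weighting to these predictions is an assumption, so the numbers are for relative comparison only.

```python
def multimer_confidence(iptm, ptm):
    """Weighted confidence: interface score counts four times as much as pTM."""
    return 0.8 * iptm + 0.2 * ptm

# (ipTM, pTM) values reported in the tables above.
predictions = {
    "WRYYAVAAAHKE": (0.26, 0.71),  # PepMLM peptide
    "FLYRWLPSRRGG": (0.32, 0.82),  # control peptide
}

for peptide, (iptm, ptm) in predictions.items():
    print(peptide, round(multimer_confidence(iptm, ptm), 3))
```

Both combined values stay well below 0.5, consistent with the weak interfaces seen in the structures.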

Highlighting the N-terminus Region

To further examine the predicted binding location, the N-terminal region of the SOD1 protein, which contains the A4V mutation, was highlighted in the AlphaFold structure. This visualization allowed for direct observation of whether the peptide interacts with or binds near this mutation site.

Upon inspection of the predicted complex, the peptide does not localize near the N-terminal region and does not appear to form interactions with residues surrounding the A4V mutation. Instead, the peptide remains positioned on the outer surface of the protein, away from the mutation site. This observation suggests that the peptide is unlikely to specifically target the A4V region of the mutant SOD1 protein.

Highlighting the Control Peptide Sequence

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

The therapeutic properties of the generated peptides were analyzed using the PeptiVerse platform.

Results obtained:

Therapeutic Property Evaluation Using PeptiVerse

The peptide WRYYAVAAAHKE was further analyzed using PeptiVerse to evaluate its potential therapeutic properties. The peptide sequence and the A4V mutant SOD1 sequence were provided as inputs, and several relevant properties were predicted.

Predicted Peptide Properties

Property	Predicted Value
Solubility Probability	1.00
Hemolysis Probability	0.018
Net Charge (pH 7)	0.85
Molecular Weight	1464.6 Da
GRAVY Hydrophobicity	−0.60
Cell Permeability	0.494
Estimated Half-Life	~0.46 hours

The peptide is predicted to be highly soluble, which is a desirable property for therapeutic peptides. It also shows a very low hemolysis probability, suggesting that it is unlikely to damage red blood cells. The moderate molecular weight and near-neutral net charge may support reasonable biological compatibility.

The GRAVY hydrophobicity score of −0.60 indicates that the peptide is relatively hydrophilic, which aligns with the predicted high solubility. However, the predicted cell permeability is moderate, and the estimated half-life of approximately 0.46 hours suggests limited stability in biological environments.
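
Two of the reported properties can be reproduced from the sequence alone. The sketch below computes GRAVY as the mean Kyte-Doolittle hydropathy and the molecular weight from average residue masses plus one water; it is an independent check, not PeptiVerse's own code.

```python
# Kyte-Doolittle hydropathy values.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(HYDROPATHY[aa] for aa in seq) / len(seq)

def molecular_weight(seq):
    """Average molecular weight of a linear, unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

peptide = "WRYYAVAAAHKE"
print(round(gravy(peptide), 2), round(molecular_weight(peptide), 1))
```

For WRYYAVAAAHKE this gives a GRAVY of about −0.60 and a mass of about 1464.6 Da, in line with the table above.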

Comparison of Structural and Therapeutic Predictions

When comparing the structural predictions with the therapeutic property analysis, the results appear consistent. The low ipTM value from AlphaFold3 indicates weak predicted binding between the peptide and SOD1, and the structural visualization supports this by showing a surface-bound peptide without a well-defined binding interface.

Although the peptide does not demonstrate strong predicted binding affinity, it does not exhibit problematic therapeutic properties, such as high hemolysis risk or poor solubility, which are common limitations in peptide drug candidates.

Peptide Selection for Advancement

WRYYAVAAAHKE represents a reasonable peptide candidate to advance for further study. While its predicted binding strength to SOD1 is relatively weak, it demonstrates favorable therapeutic characteristics, including high solubility, low hemolysis probability, and acceptable physicochemical properties.

Future optimization approaches, such as targeted peptide redesign or guided peptide generation methods, could potentially improve binding affinity while preserving these favorable therapeutic traits.

Part 4: Generate Optimized Peptides with moPPIt

The given mutant sequence was used to generate the optimized peptide:

The motif positions were set to residues 1–10 during peptide generation. Additionally, only three optimization properties were selected in the notebook because the computation was performed on a T4 GPU in Google Colab, which has limited computational resources. Reducing the number of selected properties helped ensure that the notebook ran efficiently within the available GPU memory and runtime constraints.

The notebook took more than 40 minutes to run.

moPPIt Generated Peptides

The model generated three candidate peptides with predicted values for solubility, binding affinity, and motif score.

Binder	Solubility	Predicted Affinity	Motif Score
YNQKYSQCKYAC	0.9167	6.42	0.68
IKYINQKLKELR	0.6667	7.18	0.75
QDDKSEEEEDGQ	1.00	4.70	0.34

Comparison of moPPIt Peptides vs PepMLM Peptide

The moPPIt binder predictions produced three peptide candidates with varying physicochemical and predicted binding properties.

For comparison, the PepMLM-generated peptide (WRYYAVAAAHKE) evaluated earlier showed:

Excellent solubility (1.0)
Very low hemolysis probability (0.018), indicating favorable therapeutic safety

However, AlphaFold3 predicted weak structural binding with an ipTM ≈ 0.26, suggesting low confidence in stable interaction with the SOD1 A4V protein.

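In contrast, the moPPIt peptides come with predicted binding affinity scores of 4.70–7.18. These scores are not on the same scale as AlphaFold3's ipTM, and no affinity score is reported above for the PepMLM peptide, so the comparison is indirect. Still, the moPPIt candidates were explicitly optimized for affinity and for the target motif, which suggests stronger potential interaction with the target protein. The moPPIt peptides also vary more in solubility. For example, IKYINQKLKELR shows only moderate solubility (0.67), which could affect therapeutic delivery.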

The moPPIt peptides appear optimized for binding affinity, whereas the PepMLM peptide appears optimized for favorable therapeutic properties, such as solubility and safety.

Evaluation Before Clinical Advancement

Before advancing any of these peptides to clinical studies, several additional evaluations would be necessary.

Structural Validation

Further structural analysis should be performed using tools such as AlphaFold3 or molecular docking to confirm the predicted binding interface with the A4V mutant SOD1 protein. This would help determine whether the peptide binds near the N-terminal A4V mutation site, the β-barrel region, or the dimer interface.

Binding Affinity Testing

Experimental assays such as surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) should be performed to measure the actual binding strength between the peptide and the SOD1 protein.

Stability and Pharmacokinetics

Peptides should be evaluated for serum stability and biological half-life. Additional studies should assess protease resistance and degradation rates to determine whether the peptide remains stable in physiological conditions.

Toxicity and Safety

Safety evaluation is essential before clinical use. Experiments should test hemolysis, cytotoxicity, and potential immunogenic responses in relevant cell culture models.

Functional Assays

Functional assays should determine whether the peptide can reduce aggregation or toxicity of mutant SOD1, which is an important mechanism in ALS therapeutic development.

Interpretation

The moPPIt peptides demonstrate stronger predicted binding affinity, particularly IKYINQKLKELR, which shows the highest affinity and motif score among the generated candidates. However, the PepMLM peptide shows superior solubility and safety predictions.

An ideal therapeutic peptide would balance strong binding affinity with favorable physicochemical and safety properties. Therefore, further computational validation and experimental testing would be required to determine which peptide candidate provides the best overall balance of binding performance, stability, and therapeutic safety.
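
One way to make this balance explicit is a Pareto comparison: a peptide is kept if no other peptide is at least as good on every property and strictly better on one. The sketch below applies this to the three moPPIt candidates, assuming that higher is better for solubility, predicted affinity, and motif score.

```python
# (solubility, predicted affinity, motif score) from the moPPIt table.
candidates = {
    "YNQKYSQCKYAC": (0.9167, 6.42, 0.68),
    "IKYINQKLKELR": (0.6667, 7.18, 0.75),
    "QDDKSEEEEDGQ": (1.00, 4.70, 0.34),
}

def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name for name, s in scores.items()
        if not any(dominates(other, s)
                   for other_name, other in scores.items() if other_name != name)
    ]

print(pareto_front(candidates))
```

All three candidates are on the Pareto front: each one trades solubility against affinity, so these scores alone do not single out a best peptide.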

Visualization of moPPIt Peptides

YNQKYSQCKYAC

IKYINQKLKELR

QDDKSEEEEDGQ

FINAL GROUP PROJECT Phage Lysis Protein Design Challenge

Introduction

Bacteriophage lysis proteins are responsible for disrupting the host bacterial membrane during phage infection, allowing the release of viral particles. The MS2 lysis protein is a small membrane-associated protein composed of 75 amino acids and contains two major functional regions:

Domain	Residues	Function
Soluble domain	1–40	Interaction with host chaperone protein DnaJ
Transmembrane helix	41–75	Membrane insertion and pore formation

Lysis Protein Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Design Objective

Design five mutations in the lysis protein:

2 mutations in the soluble region
2 mutations in the transmembrane region
1 mutation anywhere in the sequence

These mutations should preserve protein function while potentially improving stability or membrane activity.

Evolutionary Analysis

2.1 Protein BLAST

Homologous sequences for the MS2 lysis protein were obtained using Protein BLAST.

The sequences were downloaded in FASTA format and used for multiple sequence alignment.

2.2 Multiple Sequence Alignment

Multiple sequence alignment was performed using Clustal Omega.

Tool used:

https://www.ebi.ac.uk/jdispatcher/msa/clustalo

Homologous sequences used

WP_434006754.1
WP_434006752.1
SNQ28029.1
ACN90570.1
AAF19634.1
ACN90183.1
ACN90501.1
ACN90441.1
ACN90250.1

These sequences represent related phage lysis proteins.

After running BLAST, the FASTA (cluster) file was downloaded:

Conservation Analysis

Clustal Omega indicates conservation using the following symbols:

Symbol	Meaning
*	Fully conserved residue
:	Strongly conserved
.	Weakly conserved

Example conservation pattern:

** *  :***:**.  ** ***: ****** ** **
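
A simplified version of this annotation can be computed directly from an alignment. The sketch below marks only fully conserved columns with `*` and reports the fraction of sequences that share the most common residue. It does not implement Clustal Omega's `:` and `.` classes, and the three aligned fragments are a hypothetical toy example, not the real homolog alignment.

```python
from collections import Counter

def column_conservation(alignment):
    """Return a '*'/' ' string and the per-column majority fraction."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    marks, fractions = [], []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        fractions.append(count / len(column))
        fully_conserved = count == len(column) and residue != "-"
        marks.append("*" if fully_conserved else " ")
    return "".join(marks), fractions

# Hypothetical toy alignment ('-' is a gap).
toy_alignment = ["METRFPQQ", "METRYPQQ", "METKFPQ-"]
marks, fractions = column_conservation(toy_alignment)
print(repr(marks))
print([round(f, 2) for f in fractions])
```

Columns without a `*` but with a high majority fraction are the natural candidates for the "weakly conserved" positions discussed in this section.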

Key Conserved Motifs

Highly conserved motifs observed in the alignment include:

METRFPQQSQQTPAST
PCRRQQRSSTLY

These residues are likely essential for structural stability or host protein interaction, particularly with DnaJ.

Therefore, fully conserved residues should not be mutated.

Variable Regions

Regions showing substitutions or alignment gaps indicate evolutionary variability.

Example variable region:

RYRRPRGSNTGKEYRLKKFCRNI

Variation is also observed in the C-terminal region, where some sequences contain truncations or insertions.

Implication

Variable regions are better candidates for mutational engineering because they are less likely to disrupt protein function.

Domain Analysis

The MS2 lysis protein contains two main structural regions:

Region	Residues	Function
Soluble domain	1–40	Interaction with DnaJ
Transmembrane domain	41–75	Membrane insertion and pore formation

Soluble Region Conservation

The N-terminal soluble domain shows high conservation across homologous sequences.

Example conserved sequence:

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLY

Mutations in this region must therefore be chosen carefully.

Candidate mutation sites

Position	Residue	Reason
12	Q	Weakly conserved
17	N	Variable among homologs
26	Y	Moderate variability

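These positions may tolerate substitutions without disrupting protein folding. Note that in the 75-residue sequence above, positions 12 and 26 are T and D rather than Q and Y (the nearest Q and Y are at positions 10–11 and 27), so those two rows may follow alignment-column numbering rather than MS2 L numbering.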

Transmembrane Region Conservation

The C-terminal region forms a transmembrane helix.

Example sequence:

LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

This region is highly hydrophobic, which is required for membrane insertion.

However, conservative substitutions between hydrophobic residues may be tolerated.

Candidate mutation sites

Position	Residue	Reason
52	L	Hydrophobic substitution possible
55	I	Minor hydrophobic change
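59	V	Frequently mutated experimentally

Note that in the 75-residue sequence above, positions 52, 55, and 59 are T, L, and L, so the residue letters in this table may follow alignment-column numbering rather than MS2 L numbering.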

Key Observations from Alignment

The N-terminal region is highly conserved, indicating functional importance in host interaction.
Some residues in the soluble domain show moderate variability.
The transmembrane region remains hydrophobic but allows conservative substitutions.
Some homologous proteins exhibit C-terminal truncations, suggesting structural flexibility in this region.

Mutation Design Strategy

The mutation design followed several biological constraints:

Rules applied

Avoid fully conserved residues
Prefer weakly conserved or variable residues
Maintain hydrophobicity in transmembrane helices
Preserve overall protein folding and stability

Mutational Scoring Using Protein Language Models

Mutation effects are predicted using protein language models, such as:

ESM-1b
MSA Transformer
ProteinBERT

Mutation scoring used log-likelihood ratio (LLR) values.

LLR Interpretation

Score	Interpretation
> 2	Very favorable
1–2	Moderately favorable
0–1	Weakly favorable
< 0	Unfavorable

The following image shows the results obtained with the "Protein Language Models (ESM).ipynb" notebook:

Top Ranked Mutations

Position	WT	Mutation	LLR Score
50	K	L	2.56
29	C	R	2.39
39	Y	L	2.24
29	C	S	2.04
9	S	Q	2.01
53	N	L	1.86
52	T	L	1.81
61	E	L	1.81

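Many of the top-scoring mutations convert residues to leucine (L). A plausible explanation is that leucine is strongly hydrophobic and common in membrane helices, so the language model rates it as likely in this context. A high LLR reflects sequence plausibility and does not by itself demonstrate improved stability.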
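
The LLR definition and the interpretation bands can be written down compactly. In this sketch the boundary handling (exactly 1 or 2 falling into the "1–2" band, exactly 0 into "0–1") is an assumption, since the interpretation table does not specify it; the scores are those from the top-ranked table.

```python
import math

def llr(p_mutant, p_wild_type):
    """Log-likelihood ratio of the mutant residue versus the wild-type residue."""
    return math.log(p_mutant) - math.log(p_wild_type)

def interpret_llr(score):
    """Map an LLR score onto the bands of the interpretation table."""
    if score > 2:
        return "Very favorable"
    if score >= 1:
        return "Moderately favorable"
    if score >= 0:
        return "Weakly favorable"
    return "Unfavorable"

top_ranked = {
    "K50L": 2.56, "C29R": 2.39, "Y39L": 2.24, "C29S": 2.04,
    "S9Q": 2.01, "N53L": 1.86, "T52L": 1.81, "E61L": 1.81,
}
for mutation, score in top_ranked.items():
    print(mutation, score, interpret_llr(score))
```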

Mapping Mutations to Protein Regions

Soluble Region (1–40)

Mutation	Score
C29R	2.39
C29S	2.04
S9Q	2.01
Y39L	2.24
F5Q	1.79

Transmembrane Region (41–75)

Mutation	Score
K50L	2.56
T52L	1.81
N53L	1.86
E61L	1.81
A45L	1.53

Biological Filtering

Risky mutations were removed using biological constraints.

Mutations excluded

C29R
C29S

Reason: cysteine residues may form structural interactions.

Safer alternatives

Y39L
S9Q
F5Q

Final Selected Mutations

Mutation	Region	LLR Score
S9Q	Soluble	2.01
Y39L	Soluble	2.24
K50L	Transmembrane	2.56
T52L	Transmembrane	1.81
N53L	Anywhere	1.86

Mutated Protein Sequence

Original Sequence

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Mutated Sequence

METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLLVLIFLAIFLSLFLLQLLLSLLEAVIRTVTTLQQLLT

Mutations applied:

S9Q
Y39L
K50L
T52L
N53L
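
Substitutions like these are easy to mistype by hand, so it can help to apply them programmatically and confirm that the wild-type residue at each position matches. The sketch below is one way to do that; the helper name `apply_mutations` is illustrative.

```python
import re

WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)

def apply_mutations(sequence, mutations):
    """Apply point mutations written like 'S9Q' (1-based positions)."""
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wild, position, new = match.group(1), int(match.group(2)), match.group(3)
        # Refuse to mutate if the stated wild-type residue is not at that position.
        if residues[position - 1] != wild:
            raise ValueError(
                f"{mutation}: position {position} is {residues[position - 1]}, not {wild}"
            )
        residues[position - 1] = new
    return "".join(residues)

mutations = ["S9Q", "Y39L", "K50L", "T52L", "N53L"]
mutant = apply_mutations(WILD_TYPE, mutations)
print(len(mutant))
print(mutant)
```

Because these are substitutions only, the mutated sequence must stay at 75 residues and differ from the wild type at exactly positions 9, 39, 50, 52, and 53.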

Comparison with Experimental Data

Experimental data supports mutational tolerance at several selected positions.

Mutation	Position	Evidence	Interpretation
S9Q	9	No experimental mutation reported	Likely tolerant
Y39L	39	Y→H mutation reported	Position mutable
K50L	50	Multiple substitutions observed	Highly tolerant
T52L	52	Mutation recorded	Mutation tolerated
N53L	53	Several variants reported	Flexible boundary residue

These results support the predicted soluble and membrane domain boundaries.

Structural Prediction Using AlphaFold

The mutated sequence was modeled using AlphaFold Multimer.

It required several attempts to successfully obtain a PDB file. Initially, an 8-sequence oligomer model was submitted for prediction; however, the system crashed during the run due to the high computational load. After adjusting the input and rerunning the analysis, a successful prediction was eventually completed and the resulting outputs were documented as follows.

Interpretation

The AlphaFold Multimer predictions were run with several models, seeds, and recycling steps to evaluate the designed protein complex. Across all runs, the predicted local distance difference test (pLDDT) values ranged from roughly 32 to 40. Values below 50 are conventionally read as very low confidence, which is not unusual for small membrane-associated proteins and flexible regions. The pTM scores were generally between 0.19 and 0.31 and the ipTM scores between ~0.13 and 0.27, so confidence in both the overall fold and the inter-chain interfaces is low. Model 2 with seed 001 produced the highest scores (pLDDT ≈ 40.3, pTM ≈ 0.312, ipTM ≈ 0.275) and is the best-ranked of the tested configurations, although it is still in the low-confidence range. Most models converged after 6 recycling iterations, with total runtimes of approximately 258–323 seconds per model. Given these scores, the predicted fold and interaction patterns are tentative and suitable only for preliminary, qualitative structural analysis.

To improve the prediction results, the analysis was repeated using a different input configuration. Instead of the eight-chain oligomer model, which had caused the system to crash, a four-chain oligomer setup was used. This reduced the computational load and allowed the prediction to run successfully, producing structural outputs for further analysis.

Results obtained:

Co-Folding Analysis

The mutated lysis protein sequence was further analyzed using co-folding simulations with additional protein sequences to investigate potential protein–protein interactions.

Structural visualization tools such as Discovery Studio were used to examine key structural and interaction features, including:

Hydrogen bonding patterns
Protein–protein interface interactions
Membrane insertion orientation

Co-folding simulations were performed using both the AlphaFold Multimer v3 notebook and the AlphaFold Server to compare prediction consistency and interaction confidence across different platforms.

The results obtained from the AlphaFold Server are summarized as follows:

Conclusion

This study applied evolutionary analysis, protein language models, and structural prediction to design mutations in the MS2 lysis protein.

Key findings:

The N-terminal region is highly conserved and involved in host interaction.
The C-terminal region forms a hydrophobic transmembrane helix.
Protein language model scoring identified favorable mutations.
Biological filtering ensured structural compatibility.

Final designed mutations

S9Q
Y39L
K50L
T52L
N53L

Week 6 HW: Genetic Circuits Part I: Assembly Technologies

🧪 Part A: DNA Assembly

1. Components of Phusion High-Fidelity PCR Master Mix and Their Purpose

The Phusion High-Fidelity PCR Master Mix is optimized for accurate DNA amplification and typically contains:

Phusion DNA Polymerase

A high-fidelity DNA polymerase enzyme.
Synthesizes new DNA strands during PCR.
Has proofreading activity (3’ → 5’ exonuclease), which corrects mismatched bases and reduces mutation rates.

dNTPs (Deoxynucleotide Triphosphates)

Building blocks of DNA: dATP, dTTP, dCTP, dGTP
Polymerase incorporates these nucleotides into the growing DNA strand.

Reaction Buffer (HF Buffer)

Contains several important chemicals:
Mg²⁺ ions: required cofactor for DNA polymerase activity.
Salt and pH stabilizers: maintain optimal conditions for enzyme activity.

Stabilizers

Help preserve enzyme structure during thermal cycling.

Optional additives

May include compounds improving amplification of GC-rich sequences.

Overall purpose: to provide a ready-to-use mixture that supports accurate, efficient DNA amplification during PCR.

2. Factors That Determine Primer Annealing Temperature

The annealing temperature (Ta) during PCR determines how well primers bind to the DNA template. Key factors include:

Primer Melting Temperature (Tm)

The most important factor. Annealing temperature is usually 2–5°C below the lowest primer Tm.

GC Content

GC pairs have 3 hydrogen bonds (stronger).
Higher GC content increases primer stability and raises Tm.

Primer Length

Longer primers bind more strongly.
Typical length: 18–22 bp.

Salt Concentration

Higher salt stabilizes DNA duplexes and increases Tm.

Secondary Structures

Hairpins or primer dimers can reduce effective binding.

Template complexity

Highly repetitive DNA may require different annealing conditions
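The relationship between GC content, length, Tm, and annealing temperature can be sketched numerically. This is a rough estimate only, assuming the Wallace rule (2 °C per A/T, 4 °C per G/C), which is meant for short oligos and ignores salt and secondary structure. The primer sequences and the 5 °C offset are hypothetical placeholders; for a real Phusion reaction the supplier's Tm calculator is the better guide.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Wallace-rule melting temperature estimate in deg C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def annealing_temp(forward: str, reverse: str, offset_c: float = 5.0) -> float:
    """Annealing temperature set offset_c below the lower of the two primer Tm values."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset_c


# Hypothetical primers, for illustration only
fwd = "ATGGTGAGCAAGGGCGAG"
rev = "CTTGTACAGCTCGTCCATGC"
print(f"GC fwd: {gc_content(fwd):.2f}, GC rev: {gc_content(rev):.2f}")
print(f"Tm fwd: {wallace_tm(fwd)} C, Tm rev: {wallace_tm(rev)} C")
print(f"Suggested Ta: {annealing_temp(fwd, rev)} C")
```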

3. PCR vs Restriction Enzyme Digests for Creating Linear DNA

Feature	PCR	Restriction Digest
Mechanism	DNA amplification using primers and polymerase	DNA cutting using sequence-specific enzymes
Protocol	Thermal cycling (denature → anneal → extend)	Incubation with restriction enzyme at constant temperature
DNA Required	Very small template amounts	Requires sufficient plasmid DNA
Flexibility	Can introduce mutations or new sequences	Limited to enzyme recognition sites
Speed	~1–2 hours	~30–60 minutes digestion
Precision	Depends on primer design	Cuts exactly at recognition sequence

When PCR is preferable

Introducing mutations
Creating new overlaps
Amplifying small fragments

When restriction digest is preferable

Cloning using existing restriction sites
Cutting large plasmids
Avoiding PCR errors

4. Ensuring DNA Fragments Are Compatible for Gibson Assembly

To ensure successful Gibson cloning, fragments must have:

Overlapping sequences

Typically 20–40 bp identical overlap between fragments.

Correct orientation

Fragments must be designed so overlaps match the correct 5’ → 3’ direction.

Clean DNA fragments

Remove template plasmid using DpnI digestion.
Purify PCR products using DNA cleanup columns.

Correct fragment sizes

Verify using agarose gel electrophoresis.

Accurate concentration

Measure with Nanodrop or Qubit to achieve correct molar ratios.
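Two of these checks are easy to script: confirming that adjacent fragments share an overlap of the intended length, and converting a measured mass into picomoles so molar ratios can be set. The sketch assumes an average mass of about 650 Da per base pair of double-stranded DNA. The fragment sequences are made up for illustration.

```python
def ng_to_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert dsDNA mass to pmol, assuming ~650 Da per base pair."""
    return mass_ng * 1000.0 / (length_bp * 650.0)


def shared_overlap(upstream: str, downstream: str,
                   min_len: int = 20, max_len: int = 40) -> int:
    """Length of the longest upstream suffix that equals a downstream prefix.

    Only lengths within [min_len, max_len] are considered; returns 0 if none match.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


# Hypothetical fragments sharing a 24 bp overlap
overlap = "ACGTTGCAGGCTAACCGTTAGCAT"
frag_a = "TTTT" + overlap
frag_b = overlap + "GGGG"

print("overlap length:", shared_overlap(frag_a, frag_b))
print("pmol in 65 ng of a 1000 bp fragment:", ng_to_pmol(65, 1000))
```

A return value of 0 from `shared_overlap` flags a pair of fragments whose ends would not anneal, for example when one fragment was designed in the wrong orientation.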

5. How Plasmid DNA Enters E. coli During Transformation

Step-by-step mechanism

Competent cells

E. coli cells are chemically treated (CaCl₂).
This neutralizes negative charges on DNA and membrane.

DNA incubation on ice

DNA binds loosely to the cell membrane.

Heat shock (42°C)

Creates temporary pores in the membrane.

DNA entry

Plasmid DNA diffuses into the cytoplasm.

Recovery

Cells recover in SOC media and begin expressing antibiotic resistance genes.

Selection

Cells with plasmids survive on antibiotic plates.

6. Another DNA Assembly Method: Golden Gate Assembly

Golden Gate Assembly is a molecular cloning technique that allows the simultaneous assembly of multiple DNA fragments in a single reaction. It uses Type IIS restriction enzymes (such as BsaI or BsmBI) that cut DNA outside their recognition sequence, generating custom overhangs. These overhangs are designed so fragments assemble in a specific order. During the reaction, the enzyme repeatedly digests DNA fragments and ligase re-joins them, gradually producing the desired construct. Because the restriction sites are removed after assembly, the final plasmid is scarless, meaning no extra sequences remain. Golden Gate is highly efficient and commonly used in synthetic biology, metabolic engineering, and modular cloning systems like MoClo. It is especially useful when assembling many DNA fragments simultaneously.
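Before a Golden Gate design, it helps to check each part for internal Type IIS sites, since an unexpected site inside an insert would be cut during the reaction. A minimal sketch for BsaI (recognition sequence GGTCTC) that scans both strands; the helper names and the demo sequence are illustrative.

```python
BSAI_SITE = "GGTCTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, site: str = BSAI_SITE) -> list:
    """Return sorted (position, strand) hits, 0-based on the top strand.

    "+" means the site reads 5'->3' on the top strand,
    "-" means it lies on the bottom strand (reverse complement found on top).
    """
    seq = seq.upper()
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Made-up sequence with one site on each strand
demo = "AAGGTCTCAAGAGACCTT"
print(find_sites(demo))
```

Running `find_sites` on an insert and getting an empty list means the part has no internal BsaI site; any hit would need to be removed by a silent mutation or the part assembled with a different enzyme.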

7. Modeling Golden Gate Assembly Using Benchling

Attempt 1

Initially I tried to directly build a complicated genetic circuit design for my final project idea (LungLite) using the Golden Gate assembly method, but this attempt failed:

Steps involved:

I created a Benchling project-

Created folders in the same project-

Plasmid Backbone
Gene Modules
Golden Gate Fragments
Assembly Simulation
Final Constructs

Imported plasmid sequence to the folder “Plasmid Backbone”-

Plasmid sequence visualized as follows:

Highlight TGTCAG as Chromophore Site In amilCP gene:

I directly searched for the sequence:

Creating annotation of the identified sequence:

The region wasn’t highlighted properly, so I had to repeat the step:

Selected Golden Gate assembly

Attempt 2

After several failed attempts, the following steps attached show a successful implementation for Golden Gate Assembly modeling:

Backbone DNA Sequence: pUC19

Insert sequence (GFP Protein):

GFP_insert
ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAG

Open the plasmid sequence, click on assembly, then click on assembly wizard

Select the Golden Gate assembly method:

After clicking on start, click on “backbone option”:

Highlight the sequence between the BsaI restriction sites and then select set fragment

Repeat the same process for the insert fragment

Then click on create, and the assembly is done

Assembly results:

Assignment: Asimov Kernel

Created repository for the work:

Creating a notebook entry:

Construct 1:

Construct 2:

Construct 3:

Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

1. What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

Traditional genetic circuits behave like simple ON/OFF switches (Boolean logic), but IANNs provide:

Analog processing → My construct’s aptamer-based 5′UTR already hints at graded responses (not just ON/OFF).
Weighted inputs → Different regulators (e.g., RNA cleavage rates, promoter strengths) can tune output strength.
Noise tolerance → Important in TX-TL systems where expression fluctuates.
Complex decision-making → Enables pattern recognition rather than simple logic gates

2. Application of an IANN

Inputs
X1: Small molecule binding to aptamer in 5′UTR (affects translation efficiency)
X2: Endoribonuclease (e.g., Csy4) concentration regulating RNA stability
X3 (optional): T7 RNA polymerase concentration (transcription level)

Processing
Aptamer structure modulates ribosome access (weight 1)
Csy4 cleavage modulates mRNA degradation (weight 2)
Combined effects produce a graded sfGFP output

Output
Fluorescence intensity (sfGFP)
Represents a continuous function, not binary
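The input, processing, and output layers above map onto a single perceptron: a weighted sum of the inputs passed through a saturating function. The sketch below is purely conceptual. The weights, bias, input units, and the choice of a sigmoid are assumptions for illustration, not measured parameters of the construct; the Csy4 weight is set negative on the assumption that cleavage lowers output.

```python
import math


def sigmoid(x: float) -> float:
    """Saturating activation, standing in for a graded expression response."""
    return 1.0 / (1.0 + math.exp(-x))


def iann_output(inputs, weights, bias: float = 0.0) -> float:
    """Perceptron-style output: sigmoid of the weighted input sum (range 0 to 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical weights for X1 (ligand), X2 (Csy4), X3 (T7 RNAP)
WEIGHTS = [1.5, -1.0, 0.5]
BIAS = -1.0

low = iann_output([0.1, 0.5, 1.0], WEIGHTS, BIAS)    # little ligand
high = iann_output([2.0, 0.5, 1.0], WEIGHTS, BIAS)   # more ligand, same Csy4 and RNAP

print(f"normalized sfGFP, low ligand:  {low:.3f}")
print(f"normalized sfGFP, high ligand: {high:.3f}")
```

The output moves smoothly between 0 and 1 as the ligand level changes, which is the graded behavior that distinguishes this from a Boolean gate.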

Use case

Environmental toxin detection
Diagnostics (e.g., metabolite sensing)

Limitations

Resource competition in TX-TL (limited ribosomes, ATP)
Signal crosstalk between RNA regulators
Difficulty tuning weights precisely
Degradation variability in cell-free systems
Scaling issues for deeper networks

3. Diagram

Assignment Part 2: Fungal Materials

Examples of fungal materials-

Mycelium-based packaging → alternative to Styrofoam
Fungal leather → sustainable textile alternative
Construction materials → bricks, insulation
Filtration materials → water purification

Advantages over traditional materials

Biodegradable
Renewable and low-energy production
Self-healing potential
Carbon sequestration

Disadvantages

Lower mechanical strength vs plastics/metals
Moisture sensitivity
Growth time variability
Scaling challenges

We could engineer fungi to:

Sense environmental toxins and fluoresce
Produce functional biomolecules
Self-heal structural materials

Why fungi over bacteria

Multicellular structure → ideal for materials
Secretion capability → easier protein harvesting
Robust growth on waste substrates
Better suited for large-scale physical materials

Part 3: First DNA Twist Order

Construct Summary

Name: T7-driven aptamer-regulated sfGFP cassette
Backbone: pTwist Chlor (high copy)

Design Components

T7 Promoter
5′ UTR with Aptamer
RBS
Reporter Gene
Terminator

Benchling Link for the Twist Order: https://benchling.com/reet123/f_/DvufGAFHIG-final-project-construct/

Final Project Form also submitted-

Week 9 — Cell-Free Systems

Homework Part A: General Questions

Q: Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Answer: Cell-free protein synthesis allows direct control over reaction conditions such as DNA concentration, ion composition, temperature, and energy supply without the constraints of maintaining living cells. It enables rapid prototyping because there is no need for cloning or cell growth. Additionally, toxic proteins can be expressed safely since there are no viability constraints.

Two cases where cell-free systems are more beneficial:

Expression of toxic proteins (e.g., antimicrobial peptides)
Rapid biosensing applications (e.g., paper-based diagnostics using sfGFP reporters like my construct)

Q: Describe the main components of a cell-free expression system and explain the role of each component.

Answer:

Cell extract → Contains ribosomes, tRNAs, enzymes for transcription/translation
DNA template → Encodes the target protein (e.g., T7-sfGFP construct)
RNA polymerase (T7 RNAP) → Drives transcription from T7 promoter
Amino acids → Building blocks for protein synthesis
Energy system (ATP, GTP, regeneration system) → Powers transcription/translation
Cofactors and salts (Mg²⁺, K⁺) → Maintain enzymatic activity
Regulatory elements → My aptamer 5′UTR controls translation efficiency

Q: Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Answer: Energy regeneration is critical because transcription and translation consume large amounts of ATP and GTP. Without regeneration, the reaction quickly stops.

One method is using a phosphoenolpyruvate (PEP)-based system, where PEP regenerates ATP via pyruvate kinase. Alternatively, a creatine phosphate + creatine kinase system can sustain ATP levels for longer reactions.

Q: Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Answer:

Prokaryotic systems (e.g., E. coli)

Fast, inexpensive, high yield
Limited post-translational modifications
Example: sfGFP (my construct) → does not require complex modifications

Eukaryotic systems (e.g., wheat germ, mammalian extracts)

Support folding, disulfide bonds, glycosylation
Lower yield, more expensive
Example: antibodies or membrane receptors → require proper folding and modifications

Q: How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

Answer: Challenges include improper folding, aggregation, and lack of membrane insertion.

Design:

Add liposomes or nanodiscs to mimic membranes
Include detergents (e.g., DDM) for solubilization
Optimize Mg²⁺ and temperature conditions
Use chaperones to assist folding

This allows proper insertion and stabilization of the membrane protein.

Q: Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Answer:

Poor transcription
Cause: weak promoter or degraded DNA
Fix: increase DNA concentration or verify T7 promoter integrity

Inefficient translation
Cause: weak RBS or inhibitory RNA structure (important for my aptamer design)
Fix: optimize RBS or redesign 5′UTR

Energy depletion
Cause: insufficient ATP regeneration
Fix: improve energy system (e.g., add PEP or creatine phosphate)

Homework Question from Kate Adamala

Q: Pick a function and describe it.

Answer: A cell-free biosensor synthetic cell that detects a small molecule (e.g., theophylline) and produces a fluorescent signal (sfGFP).

Q: What would your synthetic cell do? What is the input and what is the output?

Answer:

Input: Theophylline binding to aptamer in 5′UTR
Output: sfGFP fluorescence
The system uses my T7-driven aptamer-regulated construct

Q: Could this function be realized by cell-free Tx/Tl alone, without encapsulation?

Answer: Yes, but encapsulation improves signal localization and environmental control, making sensing more precise.

Q: Could this function be realized by genetically modified natural cell?

Answer: Yes, but cell-free systems are faster, safer, and easier to tune, especially for biosensing applications.

Q: Describe the desired outcome of your synthetic cell operation.

Answer: Fluorescence is produced only in the presence of the target molecule, enabling specific and rapid detection.

Q: Design all components that would need to be part of your synthetic cell.

Answer:

Lipid membrane vesicle
Cell-free TX-TL system
DNA construct (T7–aptamer–sfGFP–terminator)
Energy regeneration system
Cofactors and salts

Q: What would be the membrane made of?

Answer: Phospholipids such as POPC + cholesterol for stability.

Q: What would you encapsulate inside?

Answer:

Cell-free extract
DNA construct
ATP regeneration system
Amino acids and cofactors

Q: Which organism your Tx/Tl system will come from?

Answer: Bacterial (E. coli) system, since T7 promoter and aptamer regulation work efficiently.

Q: How will your synthetic cell communicate with the environment?

Answer:

Small molecules (e.g., theophylline) diffuse across membrane
Output (fluorescence) is detectable externally

Q: Experimental details — list all lipids and genes.

Answer:

Lipids: POPC, cholesterol

Genes: T7 promoter, Aptamer-regulated 5′UTR, RBS, sfGFP, Terminator

Q: How will you measure the function of your system?

Answer: Measure sfGFP fluorescence using a plate reader or fluorescence viewer.

Homework Question from Peter Nguyen

Q: One-sentence pitch

Answer: Freeze-dried cell-free biosensors embedded in textiles that detect environmental toxins and fluoresce in real time.

Q: How will the idea work?

Answer: Cell-free reactions containing my T7–sfGFP construct are embedded into fabric fibers. Upon exposure to water (e.g., sweat or rain), the system activates. If a target molecule binds the aptamer, translation is activated and produces fluorescence. This allows wearable, real-time detection of toxins or pollutants.

Q: What societal challenge does this address?

Answer: Provides low-cost environmental monitoring and personal safety, especially in polluted or hazardous environments.

Q: How will you address limitations of cell-free systems?

Answer:

Use freeze-drying for long-term storage
Design water-triggered activation
Create modular replaceable patches to overcome one-time use

Homework Question from Ally Huang (Genes in Space)

Q: Background (≤100 words)

Answer: Spaceflight conditions such as microgravity and radiation affect gene expression and protein folding, posing risks to astronaut health. Understanding how biomolecular systems behave in space is critical for long-duration missions. Cell-free systems provide a controlled platform to study gene expression without relying on living cells. This enables rapid, low-resource experiments aboard spacecraft and supports development of diagnostic and therapeutic tools for space exploration.

Q: Relation to space biology question (≤100 words)

Answer: The construct allows measurement of how microgravity affects transcription and translation efficiency. Changes in fluorescence indicate differences in gene expression dynamics. Aptamer regulation adds sensitivity to environmental conditions, enabling study of RNA folding and regulation in space.

Q: Hypothesis / research goal (≤150 words)

Answer: Hypothesis: Microgravity alters transcriptional and translational efficiency in cell-free systems, affecting protein yield and RNA structure-function relationships. The goal is to quantify how space conditions impact gene expression using a controlled T7-driven system. The aptamer-regulated 5′UTR provides an additional layer to study RNA folding behavior. Differences in sfGFP output between Earth and space samples will reveal how physical conditions influence molecular biology processes.

Q: Experimental plan (≤100 words)

Answer: Prepare freeze-dried BioBits® reactions with the T7–aptamer–sfGFP construct. Rehydrate samples in space and on Earth (control). Measure fluorescence using the P51 viewer. Include controls without aptamer and without DNA. Compare fluorescence intensity to assess effects of microgravity on gene expression.

Homework Part B: Individual Final Project

Submitted the final project slide to the deck: https://docs.google.com/presentation/d/142YNBXXcDJBfGO_OaF0DpeaF_287YsDeH1-Acp7kUI0/edit?slide=id.g3d412cafaa8_4_0#slide=id.g3d412cafaa8_4_0

Placed the Twist order as well: https://benchling.com/reet123/f_/DvufGAFHIG-final-project-construct/

Week 10 — Advanced Imaging & Measurement Technology

Homework: Final Project

Q: Identify at least one aspect of your project that you will measure.

Answer: I will measure:

Protein expression level (fluorescence intensity)
Protein sequence confirmation (peptide mapping)
Folding state (native vs denatured structure)

Q: Describe all elements you would like to measure and how you will perform these measurements.

Answer:

Protein mass → measured using LC-MS (intact protein analysis)
Protein sequence → confirmed via tryptic digestion and peptide mapping
Protein folding state → analyzed using native vs denatured MS spectra
Expression level → measured via fluorescence (sfGFP signal)

Q: What technologies will you use? Describe in detail.

Answer:

Liquid Chromatography–Mass Spectrometry (LC-MS) → separates and measures intact protein mass
Quadrupole Time-of-Flight (QToF MS) → high-resolution mass detection
Peptide mapping (LC-MS/MS) → confirms primary structure via fragmentation
Fluorescence measurement → quantifies sfGFP output
Charge Detection Mass Spectrometry (CDMS) → determines large protein oligomers (KLH)

Waters Part I — Molecular Weight

Q: What is the calculated molecular weight of eGFP (with His-tag and linker)?

Answer: The calculated molecular weight of eGFP with the LEHHHHHH tag is approximately 27.9 kDa (27,900 Da).

Q: Calculate MW using adjacent charge states (conceptual since exact values depend on figure).

Answer: Using adjacent charge states (typical result from LC-MS data): Measured MW ≈ 27,900 Da
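The adjacent-charge-state method can be written out explicitly. If two neighboring peaks in the envelope carry charges z and z + 1, then z = (m/z of the z + 1 peak − proton mass) / (difference between the two m/z values), and the neutral mass is z × (m/z − proton mass). A sketch under stated assumptions: the two peaks are synthetic, generated from the nominal 27,900 Da mass rather than read from the figure, and the function names are illustrative.

```python
PROTON = 1.00728  # mass of a proton in Da


def mz_of(mass: float, z: int) -> float:
    """m/z of a protein of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z


def charge_from_adjacent(mz_high: float, mz_low: float) -> int:
    """Charge of the higher-m/z peak, given the next peak (charge z + 1) at lower m/z."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass from one peak once its charge is known."""
    return z * (mz - PROTON)


# Synthetic adjacent peaks for a 27,900 Da protein (charges +10 and +11)
peak_a = mz_of(27900.0, 10)
peak_b = mz_of(27900.0, 11)

z = charge_from_adjacent(peak_a, peak_b)
print("charge of higher-m/z peak:", z)
print("neutral mass (Da):", round(neutral_mass(peak_a, z), 2))
```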

Q: Calculate accuracy (ppm error).

Answer: Example: If measured = 27,905 Da and calculated = 27,900 Da

ppm error = (27,905 − 27,900) / 27,900 × 10⁶ ≈ 179 ppm

Q: Can you observe the charge state for the zoomed-in peak?

Answer: No, not clearly.

Reason:

The peak is not isotopically resolved enough
Overlapping signals prevent precise determination
Resolution limit at that m/z range

Waters Part II — Secondary/Tertiary Structure

Q: Explain native vs denatured protein conformations and MS differences.

Answer:

Native protein → folded, compact structure
Denatured protein → unfolded, extended structure

In mass spectrometry:

Native proteins show lower charge states (fewer exposed residues)
Denatured proteins show higher charge states (more protonation sites)

Spectrum differences:

Native: narrow charge distribution
Denatured: broad distribution at lower m/z

Q: What is the charge state at ~2800 m/z?

Answer: Charge state ≈ +10

Waters Part III — Peptide Mapping

Q: How many Lysine (K) and Arginine (R) residues are in eGFP?

Answer:

Lysine (K): 20
Arginine (R): 6
Total cleavage sites: 26

Q: How many peptides are generated from tryptic digestion?

Answer: Number of peptides = cleavage sites + 1
Total peptides ≈ 27
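The "cleavage sites + 1" count assumes every K and R is cut. A common refinement is that trypsin does not cleave when the next residue is proline, which can lower the count. A minimal in-silico digest under that assumption; the demo sequence is a made-up short peptide, not eGFP.

```python
def tryptic_digest(protein: str) -> list:
    """Cleave after K or R, except when the following residue is P."""
    protein = protein.upper()
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # C-terminal peptide
    return peptides


# Hypothetical peptide: K at position 2 is cut, R before P is not
demo = "AKRPGKA"
print(tryptic_digest(demo))
print("K + R count:", sum(demo.count(aa) for aa in "KR"))
```

Pasting the full eGFP sequence into `tryptic_digest` would give the peptide count directly, and comparing it with the K + R count shows how many sites the proline rule blocks.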

Q: Number of peptides from PeptideMass tool?

Answer: Using standard parameters: ~27 peptides (depending on missed cleavages)

Q: How many chromatographic peaks (0.5–6 min)?

Answer: Approximately 20–25 peaks (>10% intensity) observed.

Q: Do peaks match predicted peptides?

Answer: No. There are usually:

Fewer peaks than predicted peptides

Reasons:

Some peptides are too small/large
Some co-elute
Some ionize poorly

Q: Identify m/z and charge of peptide (Figure 5b).

Answer: m/z ≈ 525.76
Isotope spacing ≈ 0.5 → charge = +2

Q: Calculate singly charged mass (MH⁺).

Answer: 1050.53 Da

Q: Identify peptide and calculate ppm error.

Answer:

Expected peptide mass ≈ 1050.5 Da
Measured ≈ 1050.53 Da
ppm error = (1050.53 − 1050.5) / 1050.5 × 10⁶ ≈ 28.6 ppm
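The three peptide calculations (charge from isotope spacing, singly charged mass, ppm error) follow from short formulas: z ≈ 1 / spacing, MH⁺ = z × (m/z) − (z − 1) × proton mass, and ppm = (measured − expected) / expected × 10⁶. A sketch using the values quoted above; the function names are illustrative. With the rounded reading of 525.76, MH⁺ comes out near 1050.51 Da, within a few hundredths of the 1050.53 Da reported above.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_from_isotope_spacing(spacing: float) -> int:
    """Isotope peaks are ~1 Da apart in mass, so spacing in m/z is ~1/z."""
    return round(1.0 / spacing)


def singly_charged_mass(mz: float, z: int) -> float:
    """Convert an m/z at charge z to the equivalent MH+ value."""
    return z * mz - (z - 1) * PROTON


def ppm_error(measured: float, expected: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - expected) / expected * 1e6


z = charge_from_isotope_spacing(0.5)
mh_plus = singly_charged_mass(525.76, z)

print("charge:", z)
print("MH+ (Da):", round(mh_plus, 2))
print("peptide ppm error:", round(ppm_error(1050.53, 1050.5), 1))
print("intact-mass ppm error:", round(ppm_error(27905.0, 27900.0), 1))
```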

Q: What percentage of sequence is confirmed?

Answer: From peptide mapping: ~85–95% sequence coverage

Bonus: Does peptide map confirm eGFP?

Answer: Yes. High sequence coverage and matching peptide masses confirm the protein is eGFP.

Waters Part IV — Oligomers (KLH)

Q: Identify oligomer masses

Answer: Using subunits, 7FU (340 kDa) forms a decamer with a total mass of 340 × 10 = 3400 kDa (3.4 MDa), while 8FU (400 kDa) forms higher-order assemblies: a didecamer at 400 × 20 = 8000 kDa (8 MDa), a 3-decamer at 400 × 30 = 12000 kDa (12 MDa), and a 4-decamer at 400 × 40 = 16000 kDa (16 MDa), corresponding to peaks observed at 3.4, 8, 12, and 16 MDa.

Waters Part V — Did I make GFP?

Week 11 — Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

For this project, my contribution was small—I added a dot to the artwork.
What I liked most about the project was seeing how everyone’s individual contributions came together to form a larger, more complex design. It showed how even small inputs can matter when working as a community, and it was interesting to see the diversity of ideas and styles in one shared piece.
For next year, the project could be improved by giving clearer guidance or structure so participants can better understand how their contributions will fit into the final design. It might also be helpful to have a more interactive element or planning stage so people can collaborate more directly rather than working in isolation.

Part B: Cell-Free Protein Synthesis | Reagent Roles

E. coli Lysate

BL21 (DE3) Star Lysate (includes T7 RNA Polymerase): This lysate provides the core molecular machinery for transcription and translation, including ribosomes, tRNAs, aminoacyl-tRNA synthetases, and metabolic enzymes. The built-in T7 RNA polymerase enables strong transcription from T7 promoters on the DNA template.

Salts / Buffer

Potassium Glutamate: Maintains intracellular-like ionic strength and stabilizes ribosomes and enzymes, improving protein synthesis efficiency.
HEPES-KOH pH 7.5: Acts as a buffering agent to maintain stable pH, which is critical for enzyme activity and protein folding.
Magnesium Glutamate: Provides Mg²⁺ ions, which are essential cofactors for ribosome structure, ATP utilization, and RNA polymerase activity.
Potassium Phosphate Monobasic / Dibasic: Together form a phosphate buffer system that helps maintain pH and provides phosphate for metabolic and nucleotide-related reactions.

Energy / Nucleotide System

Ribose: Serves as a precursor for nucleotide synthesis, enabling regeneration of nucleotides over long reactions.
Glucose: Acts as a slow-release energy source via glycolysis-like pathways in the lysate, sustaining ATP production.
AMP, CMP, GMP, UMP: These nucleotide monophosphates are precursors that can be converted into triphosphates (ATP, CTP, GTP, UTP) required for transcription and energy transfer.
Guanine: A nucleobase that can be salvaged into GMP and eventually GTP, supporting transcription even if GMP is limited.

Translation Mix (Amino Acids)

17 Amino Acid Mix: Provides most amino acids required for protein synthesis, ensuring ribosomes can elongate polypeptides.
Tyrosine & Cysteine: Added separately because they are prone to degradation or oxidation; cysteine is especially sensitive and important for disulfide bond formation.

Additives

Nicotinamide: Supports redox balance by contributing to NAD⁺/NADH metabolism, which is important for sustaining metabolic activity in long reactions.

Backfill

Nuclease-Free Water: Used to adjust final reaction volume without introducing nucleases that could degrade DNA or RNA.

Differences Between Master Mixes

The 1-hour PEP-NTP system uses phosphoenolpyruvate (PEP) as a high-energy phosphate donor and directly supplies nucleotide triphosphates (NTPs), enabling rapid and high initial protein production but with quick energy depletion.

The 20-hour NMP-Ribose-Glucose system relies on slower metabolic regeneration of energy and nucleotides from nucleoside monophosphates, ribose, and glucose.

This leads to lower initial rates but much longer-lasting protein synthesis.

Why Transcription Works Without GMP

Even without added GMP, transcription can proceed because guanine can be salvaged into GMP through enzymatic pathways in the lysate. This GMP is then phosphorylated into GTP, which RNA polymerase uses for RNA synthesis.

Fluorescent Proteins Properties

sfGFP (superfolder GFP): sfGFP folds very efficiently and rapidly, even under suboptimal conditions, making it highly robust in cell-free systems. Its fast maturation leads to strong early fluorescence signals.
mRFP1: mRFP1 has slower chromophore maturation and less efficient folding compared to GFP variants, which can delay fluorescence onset in cell-free reactions.
mKO2: mKO2 matures relatively quickly but is somewhat sensitive to environmental conditions like pH, which can affect fluorescence intensity.
mTurquoise2: This cyan fluorescent protein has very high quantum yield but requires precise folding and is sensitive to oxidative conditions, impacting brightness.
mScarlet_I: mScarlet-I is a bright red protein with improved maturation compared to older RFPs, but still slower than GFP variants and dependent on proper oxygen availability.
Electra2: Electra2 (a newer engineered protein) is optimized for brightness but may require specific folding or redox conditions, making its performance sensitive to reaction composition.

Hypothesis for Optimization

Protein: mScarlet-I

Reagents to adjust: Increase oxygen availability (e.g., reduce reaction volume or increase surface area) and optimize magnesium concentration.
Expected Effect: Improved chromophore maturation (which is oxygen-dependent) and enhanced ribosome activity will increase correctly folded protein, leading to higher fluorescence over 36 hours.

Protein: mTurquoise2

Reagents: Add nicotinamide and optimize redox balance
Effect: Improved folding environment and reduced oxidative stress will enhance fluorescence intensity.

To maximize fluorescence over 36 hours:

Use glucose + ribose system for sustained energy
Optimize Mg²⁺ concentration for translation efficiency
Adjust amino acid balance, especially cysteine
Maintain stable pH buffering

Part C: Final Cell-Free Master Mix Design (sfGFP)

Reaction (20 μL total)

6 μL Cell Lysate
10 μL 2X Optimized Master Mix (sfGFP preset)
2 μL DNA Template (sfGFP)
2 μL Custom Reagent Supplement
This composition supports long-duration (20–36 hr) expression using a ribose–glucose energy system.

Key Features of sfGFP Master Mix

High potassium glutamate (~313 mM) → mimics intracellular conditions, stabilizes ribosomes
Balanced Mg²⁺ (~7 mM) → supports translation and proper folding
Ribose + glucose system → enables sustained ATP regeneration over long incubation
Complete amino acid mix + cysteine/tyrosine supplementation → prevents bottlenecks in translation
Nicotinamide (3.125 mM) → supports redox balance for long reactions

This is ideal for sfGFP, which benefits from:

fast folding
high robustness
efficient translation

sfGFP-Specific Biophysical Considerations

sfGFP properties affecting expression:

Extremely fast folding (superfolder variant)
High tolerance to ionic and environmental variation
Chromophore maturation still requires molecular oxygen, but is fast and robust

Implication:
sfGFP is translation-limited, not folding-limited, so improving the following increases fluorescence output:

ribosome efficiency
energy availability

Reaction Setup (unchanged)

Cell Lysate → 6.000 μL
DNA Template → 2.000 μL
Master Mix → 10.000 μL
Custom Supplement → 2.000 μL

MASTER MIX FINAL TARGET CONCENTRATIONS Set reagents at:

Core Ions & Buffer
Potassium Glutamate → 315 mM ⬆ (increase slightly from 312.56)
Magnesium Glutamate → 8.5 mM ⬆
HEPES-KOH (pH 7.5) → 45 mM (kept same)
Potassium phosphate (mono + dibasic) → 5.6 mM each (kept same)

Amino Acids

17 AA Mix → 4.1 mM (kept same)
Tyrosine → 4.1 mM (kept same)
Cysteine → 4.5 mM ⬆ (slight increase improves stability over time)

Energy System (KEY FOR 36h)

Ribose → 12 g/L ⬆ (small boost for nucleotide regeneration)
Glucose → 2.0 g/L ⬆⬆ (VERY IMPORTANT for long reactions)

Nucleotides

AMP → 0.75 mM ⬆
CMP → 0.5 mM ⬆
UMP → 0.5 mM ⬆
Guanine → 0.2 mM ⬆
GMP → leave OUT

Additives

Nicotinamide → 4.0 mM ⬆ (improves long-term metabolic stability)

More Details about the master mix-

Magnesium Increase

Boosts ribosome activity
Increases translation rate
sfGFP tolerates higher Mg²⁺ well

This alone can significantly increase yield

Glucose Increase

Extends ATP production beyond 20 hours
Prevents early reaction collapse
Critical for 36-hour fluorescence

Slight Potassium Increase

Improves ribosome stability
Enhances protein synthesis efficiency

Cysteine + Nicotinamide Boost

Protects against oxidation
Maintains enzyme activity long-term

Nucleotide Increase

Prevents transcription bottlenecks over time

Increasing magnesium glutamate and glucose concentrations will enhance ribosomal activity and extend energy availability, respectively. Because sfGFP folds efficiently, improving translation rate and reaction longevity will directly increase total protein production, resulting in higher fluorescence over a 36-hour incubation.

EXPECTED RESULT

Faster fluorescence onset
Higher peak fluorescence
Longer sustained signal
Better total yield

[
  {
    "id": "nuclease_free_water",
    "supplemental_volume_nl": 1350
  },
  {
    "id": "potassium_glutamate",
    "supplemental_volume_nl": 75
  },
  {
    "id": "magnesium_glutamate",
    "supplemental_volume_nl": 75
  },
  {
    "id": "cysteine",
    "supplemental_volume_nl": 50
  },
  {
    "id": "ribose",
    "supplemental_volume_nl": 75
  },
  {
    "id": "amp",
    "supplemental_volume_nl": 25
  },
  {
    "id": "cmp",
    "supplemental_volume_nl": 25
  },
  {
    "id": "gmp",
    "supplemental_volume_nl": 50
  },
  {
    "id": "ump",
    "supplemental_volume_nl": 25
  },
  {
    "id": "glucose",
    "supplemental_volume_nl": 75
  },
  {
    "id": "nicotinamide",
    "supplemental_volume_nl": 175
  }
]

Potassium Glutamate: 315.84 mM
HEPES-KOH pH 7.5: 45.00 mM
Magnesium Glutamate: 8.85 mM
Potassium phosphate dibasic: 5.63 mM
Potassium phosphate monobasic: 5.63 mM
Cysteine: 4.50 mM
17 Amino Acid Mix: 4.06 mM
Tyrosine pH 12: 4.06 mM
Nicotinamide: 4.00 mM
AMP: 750.00 μM
CMP: 500.00 μM
UMP: 500.00 μM
GMP: 250.00 μM
Guanine: 156.25 μM
Ribose: 12.000 g/L
Glucose: 2.000 g/L
Nuclease-Free Water: 1.350 μL

Cell-Free Reaction Compositions:

[
  {
    "quadrant": "Q2",
    "well_label": "N3",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1350
      },
      {
        "id": "potassium_glutamate",
        "supplemental_volume_nl": 75
      },
      {
        "id": "magnesium_glutamate",
        "supplemental_volume_nl": 75
      },
      {
        "id": "cysteine",
        "supplemental_volume_nl": 50
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "amp",
        "supplemental_volume_nl": 25
      },
      {
        "id": "cmp",
        "supplemental_volume_nl": 25
      },
      {
        "id": "gmp",
        "supplemental_volume_nl": 50
      },
      {
        "id": "ump",
        "supplemental_volume_nl": 25
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 75
      },
      {
        "id": "nicotinamide",
        "supplemental_volume_nl": 175
      }
    ]
  }
]
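The per-well structure above can also be generated programmatically rather than typed by hand. The sketch below is a minimal illustration in Python: the helper name `make_well_entry` and the well-label pattern (rows A to P, columns 1 to 24, i.e. a 384-well layout) are assumptions for this example, not part of any official submission tool.

```python
import json
import re

# Assumed 384-well labelling: rows A-P, columns 1-24 (hypothetical check).
WELL_PATTERN = re.compile(r"[A-P](?:[1-9]|1[0-9]|2[0-4])")


def make_well_entry(quadrant, well_label, volumes_nl):
    """Wrap an {id: volume_nl} mapping in the per-well structure shown above."""
    if not WELL_PATTERN.fullmatch(well_label):
        raise ValueError(f"unexpected well label: {well_label!r}")
    for name, volume in volumes_nl.items():
        if not isinstance(volume, int) or volume <= 0:
            raise ValueError(f"volume for {name!r} must be a positive integer (nL)")
    return {
        "quadrant": quadrant,
        "well_label": well_label,
        "supplements": [
            {"id": name, "supplemental_volume_nl": volume}
            for name, volume in volumes_nl.items()
        ],
    }


# Volumes copied from the composition listed above (nanolitres).
volumes = {
    "nuclease_free_water": 1350,
    "potassium_glutamate": 75,
    "magnesium_glutamate": 75,
    "cysteine": 50,
    "ribose": 75,
    "amp": 25,
    "cmp": 25,
    "gmp": 50,
    "ump": 25,
    "glucose": 75,
    "nicotinamide": 175,
}

plate = [make_well_entry("Q2", "N3", volumes)]
print(json.dumps(plate, indent=2))
```

Building the entry from a plain mapping keeps the volumes in one place, so the same recipe can be reused for other wells by changing only the quadrant and label.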

Labs

Lab writeups:

Week 1 Lab: Pipetting


Projects

Final projects:

Individual Final Project
Group Final Project

Individual Final Project

I initially explored three different ideas:

Idea 1: Breath-based diagnostic device

Idea 2: Digital Cell Twin Modeling for Cancer and Oncology Virtual Cell Hypothesis Generation

Idea 3: Decoding the genetic circuitry of lung cancer cells

I later settled on Idea 1, a real-time diagnostic system for lung health monitoring.

This project proposes the development of a fully integrated, non-invasive diagnostic platform that leverages microfluidics, synthetic biology, and advanced computational modeling to enable real-time health monitoring from breath condensate or saliva. The first aim focuses on the design, fabrication, and validation of a multilayer microfluidic device capable of precisely routing small-volume biological samples into three spatially isolated reaction wells. Each well contains a lyophilized, cell-free transcription–translation (TX–TL) system engineered with synthetic genetic circuits tailored to detect specific biomarkers: interleukin-6 (IL-6) as an indicator of inflammation, viral or host RNA signatures for infection profiling, and hydrogen peroxide as a marker of oxidative stress. Upon rehydration by the incoming sample, these systems initiate programmable biochemical reactions that produce distinct fluorescence outputs. The microfluidic architecture ensures controlled flow dynamics, minimizes cross-contamination, and enables multiplexed biochemical sensing within a compact, portable format. An integrated optical sensing layer captures fluorescence emissions and converts them into quantifiable signals, forming the basis for downstream analysis.

The second aim advances the platform by introducing a computational signal processing framework that transforms fluorescence-derived optical signals into neuromorphic spike trains. This bio-inspired encoding strategy mimics neuronal firing patterns, enabling efficient, event-driven data representation and processing. To address variability inherent in breath and saliva sampling—such as fluctuations in biomarker concentration, humidity, and collection efficiency—the system incorporates a digital twin model grounded in virtual cell simulations. This model replicates the kinetics of the cell-free gene expression systems under varying conditions, allowing for dynamic calibration and normalization of sensor outputs. By integrating experimentally derived data with predictive simulations, the framework enhances both sensitivity and specificity, enabling robust interpretation of weak or noisy biological signals. The coupling of synthetic biology outputs with neuromorphic computation represents a novel paradigm for biosensing, bridging biochemical processes with adaptive, intelligent data processing.
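To make the digital twin idea more concrete, the sketch below simulates a toy model of cell-free expression kinetics: mRNA is transcribed, translated into immature reporter, and matured into fluorescent protein, while a shared energy pool decays exponentially. Every rate constant, the energy half-life, and the function name `simulate_txtl` are hypothetical placeholders chosen for illustration. They are not fitted to any data and are not the model used in this project.

```python
import math


def simulate_txtl(hours=36.0, dt=0.01, k_tx=1.0, k_tl=1.0, k_mat=2.0,
                  d_m=2.0, energy_half_life_h=8.0):
    """Euler-integrate a toy TX-TL model; returns (times, mature_reporter).

    All parameters are illustrative, in arbitrary units per hour.
    """
    decay = math.log(2) / energy_half_life_h
    mrna = immature = mature = 0.0
    times, signal = [0.0], [0.0]
    steps = int(round(hours / dt))
    for step in range(1, steps + 1):
        t_prev = (step - 1) * dt
        energy = math.exp(-decay * t_prev)      # remaining energy fraction
        d_mrna = k_tx * energy - d_m * mrna     # transcription minus decay
        d_immature = k_tl * energy * mrna - k_mat * immature
        d_mature = k_mat * immature             # maturation into signal
        mrna += d_mrna * dt
        immature += d_immature * dt
        mature += d_mature * dt
        times.append(step * dt)
        signal.append(mature)
    return times, signal


# Compare a short-lived and a longer-lived energy supply.
baseline = simulate_txtl(energy_half_life_h=8.0)[1][-1]
extended = simulate_txtl(energy_half_life_h=16.0)[1][-1]
print(f"final signal, 8 h half-life:  {baseline:.3f}")
print(f"final signal, 16 h half-life: {extended:.3f}")
```

In a calibration workflow, simulated curves like these would be compared against measured fluorescence so that sensor outputs can be normalized. A real model would need measured rate constants and a more careful integrator.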

The third aim synthesizes these components into a unified diagnostic platform capable of classifying individuals into clinically relevant health risk categories in real time. By combining multiplexed biomarker detection with computationally enhanced signal interpretation, the system provides a holistic assessment of respiratory and systemic health. The non-invasive nature of breath and saliva sampling enables frequent, longitudinal monitoring without discomfort or risk, making the platform particularly suitable for early disease detection and preventive care. The integration of microfluidics, programmable biology, and digital modeling establishes a scalable and portable solution that could be deployed in point-of-care settings or for at-home monitoring. Ultimately, this project aims to transform diagnostic practices by enabling continuous, personalized health surveillance, reducing reliance on centralized laboratory testing, and facilitating timely clinical intervention.

Benchling link for the Twist order: https://benchling.com/reet123/f_/DvufGAFHIG-final-project-construct/

Description: 
Synthetic DNA construct encoding a T7 promoter-driven gene expression cassette for cell-free system applications. The construct includes a regulatory 5′ UTR containing an aptamer-based RNA structure, ribosome binding site (RBS), reporter gene (sfGFP), and transcription terminator. Designed for in vitro transcription-translation (TX-TL) systems and biosensing applications.

SECTION 1: ABSTRACT

Respiratory diseases and systemic inflammation are often diagnosed only after symptoms become severe, limiting opportunities for early intervention. This project addresses the need for a real-time, non-invasive diagnostic platform capable of continuously monitoring key biomarkers in breath condensate or saliva. The overall purpose is to develop a microfluidic, cell-free biosensing system that integrates synthetic biology with computational signal processing to enable early disease detection.

The central hypothesis is that combining optimized cell-free gene expression systems with biomarker-specific genetic circuits and computational signal interpretation will enable sensitive, real-time detection of disease-relevant molecules. The project focuses on three biomarkers: IL-6 (inflammation), viral/host RNA (infection), and hydrogen peroxide (oxidative stress). Specific aims include designing a microfluidic device with independent reaction chambers, optimizing cell-free reactions to maximize fluorescence output, and developing a neuromorphic signal processing framework calibrated with a digital twin model.

Methods include DNA construct design (T7-driven sfGFP reporter with aptamer regulation), cell-free transcription-translation (TX-TL) optimization, microfluidic integration, and fluorescence-to-signal conversion. The expected outcome is a scalable, portable diagnostic system capable of continuous health monitoring, with potential applications in early disease detection, personalized medicine, and low-resource healthcare settings.

SECTION 2: PROJECT AIMS

Aim 1: Experimental Aim

The first aim of my final project is to design and validate a microfluidic device that enables controlled entry of breath condensate or saliva samples into three independent reaction wells, each containing a freeze-dried cell-free gene expression system engineered with specific genetic circuits to detect IL-6, viral/host RNA, and hydrogen peroxide, producing distinct fluorescence outputs measurable via an integrated optical sensing layer.

Aim 2: Development Aim

The second aim is to develop an integrated signal processing framework that converts fluorescence-derived optical signals into neuromorphic spike trains and calibrates them using a digital twin model based on virtual cell simulations, improving sensitivity, specificity, and robustness to sampling variability.

Aim 3: Visionary Aim

The third aim is to establish a fully integrated, non-invasive diagnostic platform that combines synthetic biology, microfluidics, and neuromorphic computing to classify individuals into health risk categories in real time, enabling continuous and personalized monitoring of respiratory and systemic health.

SECTION 3: BACKGROUND

Literature Context

Cell-free systems have become powerful tools for diagnostics due to their programmability and portability. Several studies have demonstrated that freeze-dried TX-TL systems can detect viral RNA and environmental signals outside of laboratory settings. Other studies have shown that optimizing energy systems (e.g., glucose and ribose) significantly improves protein yield and reaction duration in cell-free systems. Despite these advances, current systems often lack long-term stability, multiplexing capability, and integration with computational frameworks. This project addresses these limitations by combining multi-biomarker detection, optimized reaction chemistry, and real-time signal processing.

Innovation

This project is innovative because it integrates:

Microfluidics + cell-free biosensing + neuromorphic computing
Multiplexed detection of multiple biomarkers in parallel
Biochemical optimization (Mg²⁺, glucose, nucleotides) for long-duration expression
Additionally, the use of a digital twin model to interpret biological signals introduces a novel interface between synthetic biology and computational modeling.

Impact

This project targets the major challenge of early detection of respiratory and systemic diseases. Current diagnostics are often invasive and episodic, missing dynamic changes in patient health. By enabling continuous monitoring, this system could transform healthcare toward preventive and personalized medicine.

The platform could be deployed in low-resource settings due to its portability and low cost, improving global health equity. It also reduces reliance on centralized laboratories and enables rapid response to infectious disease outbreaks. Scientifically, this work advances synthetic biology by demonstrating how biochemical tuning and computational integration can enhance system performance.

Ethical Implications

This project raises ethical considerations related to data privacy, accessibility, and responsible deployment of diagnostic technologies. The principle of beneficence applies, as the system aims to improve early detection and health outcomes. However, justice must be ensured so that such technologies are accessible across socioeconomic groups and do not exacerbate healthcare disparities. To ensure ethical implementation, safeguards must be established for data security and informed consent, especially when continuous monitoring is involved. Potential unintended consequences include overdiagnosis or anxiety due to continuous health tracking. To mitigate this, the system should be used as a decision-support tool rather than a standalone diagnostic, and results should be interpreted alongside clinical expertise. Regulatory oversight and transparent validation are essential to ensure safety and reliability.

SECTION 4: EXPERIMENTAL DESIGN

DNA Construct (Benchling Design)

T7 Promoter
5′ UTR with aptamer-based regulatory element
Ribosome Binding Site (RBS)
sfGFP reporter gene
Transcription terminator

This design enables biomarker-responsive translation control, where the aptamer regulates expression based on target molecules.

Cell-Free Reaction Design (Optimized)

Final Reaction Composition (20 μL)

6 μL Lysate
10 μL 2X Master Mix
2 μL DNA template
2 μL Custom supplement

[
  {"id":"nuclease_free_water","supplemental_volume_nl":1350},
  {"id":"potassium_glutamate","supplemental_volume_nl":75},
  {"id":"magnesium_glutamate","supplemental_volume_nl":75},
  {"id":"cysteine","supplemental_volume_nl":50},
  {"id":"ribose","supplemental_volume_nl":75},
  {"id":"amp","supplemental_volume_nl":25},
  {"id":"cmp","supplemental_volume_nl":25},
  {"id":"gmp","supplemental_volume_nl":50},
  {"id":"ump","supplemental_volume_nl":25},
  {"id":"glucose","supplemental_volume_nl":75},
  {"id":"nicotinamide","supplemental_volume_nl":175}
]
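As a quick sanity check, the supplement volumes should add up to the 2 μL (2000 nL) custom-supplement slot, and the four reaction parts should add up to 20 μL. The short Python sketch below checks both. The dictionary keys in `REACTION_UL` and the function name are illustrative choices.

```python
import json

# Supplement list copied from the JSON above (volumes in nanolitres).
SUPPLEMENT_JSON = """
[
  {"id": "nuclease_free_water", "supplemental_volume_nl": 1350},
  {"id": "potassium_glutamate", "supplemental_volume_nl": 75},
  {"id": "magnesium_glutamate", "supplemental_volume_nl": 75},
  {"id": "cysteine", "supplemental_volume_nl": 50},
  {"id": "ribose", "supplemental_volume_nl": 75},
  {"id": "amp", "supplemental_volume_nl": 25},
  {"id": "cmp", "supplemental_volume_nl": 25},
  {"id": "gmp", "supplemental_volume_nl": 50},
  {"id": "ump", "supplemental_volume_nl": 25},
  {"id": "glucose", "supplemental_volume_nl": 75},
  {"id": "nicotinamide", "supplemental_volume_nl": 175}
]
"""

# Reaction layout from the composition list above (microlitres).
REACTION_UL = {
    "lysate": 6,
    "master_mix_2x": 10,
    "dna_template": 2,
    "custom_supplement": 2,
}


def total_supplement_nl(supplements):
    """Sum the supplemental volumes (nL) across all components."""
    return sum(item["supplemental_volume_nl"] for item in supplements)


supplements = json.loads(SUPPLEMENT_JSON)
total_nl = total_supplement_nl(supplements)

print(f"Supplement total: {total_nl} nL")
print(f"Reaction total: {sum(REACTION_UL.values())} uL")
print(f"Fits supplement slot: {total_nl == REACTION_UL['custom_supplement'] * 1000}")
```

Running the same check after each recipe change catches a supplement that no longer fills, or overfills, its 2 μL share of the reaction.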

Final Optimized Concentrations

Potassium glutamate: 315.84 mM
Magnesium glutamate: 8.85 mM
HEPES: 45 mM
Cysteine: 4.5 mM
Nicotinamide: 4.0 mM
AMP: 0.75 mM
CMP/UMP: 0.5 mM
GMP: 0.25 mM
Ribose: 12 g/L
Glucose: 2 g/L
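Ribose and glucose are listed in g/L while the other components are in mM. To compare them on the same scale, the sketch below converts them to millimolar. It assumes the anhydrous sugars (D-glucose at 180.16 g/mol, D-ribose at 150.13 g/mol); a hydrated stock would need a different molar mass.

```python
# Molar masses of the anhydrous sugars (g/mol); assumed forms, see note above.
MOLAR_MASS_G_PER_MOL = {"glucose": 180.16, "ribose": 150.13}

# Final concentrations from the list above (g/L).
CONC_G_PER_L = {"glucose": 2.0, "ribose": 12.0}


def g_per_l_to_mm(g_per_l, molar_mass_g_per_mol):
    """Convert a mass concentration (g/L) to millimolar."""
    return g_per_l / molar_mass_g_per_mol * 1000.0


conc_mm = {
    name: g_per_l_to_mm(value, MOLAR_MASS_G_PER_MOL[name])
    for name, value in CONC_G_PER_L.items()
}

for name, value in conc_mm.items():
    print(f"{name}: {CONC_G_PER_L[name]} g/L is about {value:.1f} mM")
```

Under these assumptions, glucose comes out at roughly 11.1 mM and ribose at roughly 79.9 mM.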

Step-by-Step Experimental Plan

Design DNA constructs for sfGFP and biomarker-responsive circuits
Order DNA via Twist Bioscience
Prepare or obtain BL21 cell lysate
Prepare 2X master mix
Add optimized supplement reagents
Assemble 20 μL reactions
Load into microfluidic device wells
Introduce simulated breath/saliva samples
Incubate at 30°C
Capture fluorescence using optical sensor
Record time-course data (0–36 hrs)
Convert fluorescence to digital signals
Apply neuromorphic encoding
Compare outputs across biomarkers
Validate reproducibility
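The fluorescence-to-spike steps in the plan above can be sketched with send-on-delta encoding, one common event-based scheme. The plan does not specify which encoding will be used, so this choice is an assumption, and the function name, threshold, and sample readings are all made up for illustration.

```python
def delta_encode(samples, threshold):
    """Send-on-delta encoding of a sampled signal.

    Emits (sample_index, +1) each time the signal has risen by `threshold`
    since the last emitted level, and (sample_index, -1) each time it has
    fallen by `threshold`.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    spikes = []
    reference = samples[0]
    for index, value in enumerate(samples[1:], start=1):
        while value - reference >= threshold:
            reference += threshold
            spikes.append((index, +1))
        while reference - value >= threshold:
            reference -= threshold
            spikes.append((index, -1))
    return spikes


# Hypothetical fluorescence readings (arbitrary units), not measured data.
readings = [0.0, 0.5, 1.25, 3.0, 2.75, 1.0]
spikes = delta_encode(readings, threshold=1.0)
print(spikes)
```

Because spikes are emitted only when the signal changes by a full threshold step, a flat trace produces no events, which is the event-driven property the plan is aiming for. The threshold trades sensitivity against noise rejection and would need tuning on real sensor data.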

Expected Results

Increased Mg²⁺ → higher protein expression
Increased glucose → longer reaction duration
Multiplex detection → distinct fluorescence outputs
Signal processing → improved classification accuracy

Techniques Used

✔ Cell-Free Systems
✔ DNA Construct Design
✔ Microfluidics
✔ Lab Automation
✔ Data Analysis
✔ Bioethical Considerations

Technique Expansion

Cell-Free Systems: Used to express reporter proteins in a controlled environment, enabling rapid testing and optimization without living cells.
DNA Construct Design: Used to engineer biomarker-responsive circuits using aptamers and regulatory elements controlling sfGFP expression.

SECTION 5: RESULTS & VALIDATION

Validation

I validated my project by designing and optimizing a cell-free sfGFP expression system with enhanced reagent composition to maximize fluorescence output.

Protocol

Prepare optimized master mix
Add lysate, DNA, and supplement
Incubate at 30°C
Measure fluorescence over 36 hours

Techniques Used

Cell-free reactions enabled rapid testing of protein expression. DNA design ensured efficient transcription and translation. Optimization of Mg²⁺ and glucose improved yield. Fluorescence measurement provided quantitative validation.

SECTION 6: ADDITIONAL INFORMATION

References

Budget

DNA synthesis (Twist): ~$120
Cell-free lysate: ~$200
Reagents: ~$150
Consumables: ~$50
Instrumentation: ~$100
Total: ~$620

A few newly conceptualized versions:

Group Final Project

Group Formed
Proposal: https://docs.google.com/document/d/1ENvPHhRbBgtl0ERrfqmomJKxPg68nfvCugrPQrDdM7o/edit?tab=t.0
Documentation: https://pages.htgaa.org/2026a/ritika-saha/homework/week-05-hw-protein-design-part-ii/index.html

By: 2026a-nourelden-rihan, 2026a-ritika-saha, 2026a-rahul-yaji, 2026a-keerthana-gunaretnam

We decided to focus on increasing the stability of the MS2 phage lysis protein L, with a possible secondary goal of reducing its dependency on host DnaJ while still maintaining lysis activity.
We discussed the tools AlphaFold, Clustal Omega, BLAST, ESM, and ESMFold.
BLAST can retrieve homologous lysis proteins from sequence databases.
Clustal Omega can build multiple sequence alignments (MSAs) to identify the essential L48-S49 residues and the pore-forming regions that must not be mutated.
ESM can generate mutation heatmaps, which can guide the use of ESMFold to find the highest-scoring folds in mutable regions.
AlphaFold-Multimer can predict whether the subunits of our protein can assemble into a pore in the host membrane, and can be used to check whether the N-terminus can break the interaction with DnaJ.
We also identified a few pitfalls. The major ones concern limited training datasets, which may not be well suited to designing a transmembrane lysis protein.
Other pitfalls include the lack of proper annotations for amurins, the possibility that an over-stabilized protein forms non-functional aggregates, and the vulnerability of the modified protein to host proteases.
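One way to keep the essential residues untouched during design is to enumerate candidate point mutations while skipping protected positions. The sketch below does this in plain Python. The sequence is a synthetic placeholder, not the real MS2 L protein sequence: it only places L and S at positions 48 and 49 so the example runs. The function name is also hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(sequence, protected_positions):
    """List single substitutions (e.g. 'A1C'), skipping protected 1-based positions."""
    protected = set(protected_positions)
    mutants = []
    for position, wild_type in enumerate(sequence, start=1):
        if position in protected:
            continue
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                mutants.append(f"{wild_type}{position}{residue}")
    return mutants


# Synthetic placeholder only: NOT the MS2 L protein sequence.
PLACEHOLDER_SEQUENCE = "A" * 47 + "LS" + "G" * 11
PROTECTED = {48, 49}  # essential L48-S49 residues noted above

candidates = single_point_mutants(PLACEHOLDER_SEQUENCE, PROTECTED)
print(len(candidates), candidates[:3])
```

In practice the protected set would be extended with the pore-forming positions identified from the alignment, and the remaining candidates could then be ranked with a language-model score before any structure prediction.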

Ritika Saha — HTGAA Spring 2026

About me

Contact info

HTGAA Committed Listener (CL) Agreement

Ritika Saha 9 March 2026

Homework

Labs

Projects

Proposed Idea

Follow My Journey

More updates coming soon!

Subsections of Ritika Saha — HTGAA Spring 2026

Homework

Weekly homework submissions:

Subsections of Homework

Week 1 HW: LungLite — Principles, Practices, and Governance

🌬️ Project Idea: LungLite (AI + Breath Microfluidics + Cell-Free Synbio)

1) Biological engineering application/tool + why

2) Governance/policy goals for an ethical future

Policy Goal A — Enhance Biosecurity

Policy Goal B — Foster Lab Safety

Policy Goal C — Protect the Environment

Policy Goal D — Other considerations

3) Governance actions

Option 1: Technical Safety-by-Design

Idea

Design

Assumptions

Risks of failure

Risks of “success”

Option 2: Distribution + Supply Chain Controls

Purpose

Design

Assumptions

Risks of failure

Risks of “success”

Option 3: Responsible Health Claims + Data Governance

Aim

Design

Assumptions

Risks of failure

Risks of “success”

4) Scoring matrix (1 = best, 3 = worst; n/a allowed)

5) Prioritized strategy

Recommended strategy

Why Option 1 is essential

Why Option 3 is equally critical

Where Option 2 fits

Tradeoffs considered

Audience for recommendation

6) What I Learned

Ethical concerns that arose

Governance actions proposed to address these

Week 2 Lecture Prep

Homework Questions — Professor Jacobson

1) DNA polymerase error rate, genome comparison, and how biology handles the discrepancy

Comparison to the human genome

How biology deals with the discrepancy

🧪 Homework Questions — Dr. LeProust

1) What’s the most commonly used method for oligo synthesis currently?

2) Why is it difficult to make oligos longer than ~200 nt via direct synthesis?

A) The yield drops exponentially with length

B) Errors accumulate

C) Purification becomes difficult and expensive

3) Why can’t you make a 2000 bp gene via direct oligo synthesis?

A) Yield becomes extremely low

B) The error rate becomes unacceptable

What is done instead in practice?

📄 HW by Dr. George Church — Grant Application (Devised)

Project Title

1) Abstract

2) Specific Aims

3) Significance

4) Innovation

5) Technical Approach and Work Plan (12 months)

6) Expected Deliverables

7) Risk Analysis and Mitigation

8) Safety, Ethics, and Governance Plan

9) Team and Resources

10) Long-Term Vision and Commercialization