Subsections of 2026a-mariana-kanbe

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Assignments: Class 1 Assignment Question 1 I propose a high-throughput microscopy tool to estimate intracellular PHA accumulation from granule count and size. Current standard quantification methods are slow, labor-intensive, and often require hazardous solvent-based extraction. By pairing PHA staining (e.g., Sudan Black B or Nile Red) with automated imaging and machine-learning (ML) image segmentation, this approach could rapidly screen large libraries of environmental isolates and recombinant strains for high PHA producers.

  • Week 2: DNA Read, Write, & Edit

    Homework Part 1: Benchling & In-silico Gel Art Opened https://benchling.com/ and signed up. Found the Lambda sequence from https://www.neb.com/en/-/media/nebus/page-images/tools-and-resources/interactive-tools/dna-sequences-and-maps/text-documents/lambdafsa.txt?rev=c0c6669b9bd340ddb674ebfd9d55c691&hash=B4188C171E5A42A1CF6FD257F98B97A1 and copied the sequence (without the header). Pasted this sequence into Benchling through “Create” > “DNA / RNA Sequence” > “New DNA / RNA Sequence”. Then I just pasted the sequence in the “Bases” field, titled it “Lambda,” and selected the topology as “Linear.”

  • Week 3 HW: Lab Automation

    Python Script for Opentrons Artwork Here’s my HTGAA 2026 Opentrons Art Python Script Submission. The artistic design I created using the GUI is available here. I heavily used the “Example 7 Microbial Earth” by Dominika Wawrzyniak, using pixels loaded from an external resource (a CSV file hosted on my GitHub page). I used Dominika’s well-documented Notion page from HTGAA21 to understand the code and replicate it for my case. I used Gemini assistance only to debug minor typos and syntax errors, and to identify which packages to import to execute the code.

  • Week 4 HW: Protein Design Part 1

    Homework: Protein Design I Part A. Conceptual Questions 1) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) ~21% of meat is protein (Smith et al. 2022); therefore, 500 g of meat contains about 105 g of protein.

  • Week 5 HW: Protein Design Part 2

    Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM Question 1 This is human SOD1 sequence from UniProt (P00441) removing the initial Met ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ FASTA introducing the A4V mutant associated with the most aggressive forms of the ALS disease ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Question 2 and 3 With the help of ChatGPT and Gemini, I generated 2 new cells in order to generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

  • Week 6 HW: Genetic Circuits Part 1: Assembly Technologies

    Assignment: DNA Assembly Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion High-Fidelity PCR Master Mix is a 2X, ready-to-use mixture where the exact formulation is partly proprietary, but the functional components are documented in the manufacturer’s manual: Component (Phusion 2X Master Mix) Purpose Phusion High-Fidelity DNA Polymerase DNA synthesis with high fidelity + proofreading dNTPs (dATP, dCTP, dGTP, dTTP) Building blocks for new DNA strands HF reaction buffer (salts + pH buffer) Maintains optimal pH/ionic strength for enzyme function Mg2+ (via buffer system; often MgCl2-derived) Essential polymerase cofactor Stabilizers / additives (partly proprietary) Improve enzyme stability and consistency Nuclease-free water Solvent to reach correct 2X working concentrations Reference: Thermo Fisher Phusion High–Fidelity DNA Polymerase Product Information Sheet, standard biochemistry manuals (e.g., Sambrook & Russell).

  • Week 7 HW: Genetic Circuits Part 2: Neuromorphic Circuits

    Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) Question 1 Traditional genetic circuits are usually implemented in Boolean logic (ON/OFF) and hand-designed as fixed logic, so representing nuanced behaviors often requires many gates, sharp thresholds, and careful tuning, which can make designs bulky and brittle. As the number of inputs grows, the circuit complexity can explode combinatorially: stacking multiple layers and adding intermediate nodes increases metabolic load, failure points, and sensitivity to part-to-part variability. Also, adapting to new targets or shifting biological context often means redesigning the circuit architecture, not just re-tuning parameters.

  • Week 9 HW: Cell Free Systems

    Homework Part A: General and Lecturer-Specific Questions General homework questions Exercise 1 Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

  • Week 10 HW: Advanced Imaging & Measurement Technology

    Homework: Final Project What to measure? I will measure visible melanin output in the material as the primary readout of the project. I want to quantify: (1) degree of darkening; (2) spatial distribution of pigmentation; (3) stability/persistence of the pigmentation in the bacterial cellulose / after drying or storage. These measurements are directly relevant because they indicate whether the melanin-producing system is functioning and whether the output is compatible with the intended material application.

  • Week 11 HW: Bioproduction & Cloud Labs

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork I contributed 7 pixels to the global artwork experiment, helping extend a horizontal yellow line in the top-left area (see screenshot below). At first, I was cautious and tried to understand the ongoing ideas for each section and whether there was a unifying concept. I considered introducing something new, but ultimately decided to stick with what seemed to be the area’s goal (a horizontal yellow line). For next year, it might be fun to have an in-app chat within the same domain to coordinate contributions more easily and check the current vibes.

Subsections of Homework

Week 1 HW: Principles and Practices

Assignments: Class 1 Assignment

Question 1

I propose a high-throughput microscopy tool to estimate intracellular PHA accumulation from granule count and size. Current standard quantification methods are slow, labor-intensive, and often require hazardous solvent-based extraction. By pairing PHA staining (e.g., Sudan Black B or Nile Red) with automated imaging and machine-learning (ML) image segmentation, this approach could rapidly screen large libraries of environmental isolates and recombinant strains for high PHA producers.

Future upgrades, offered as a premium beta for testing, could add a “material profile” output by predicting PHA chain-length class (SCL, MCL, or LCL) from staining/fluorescence response patterns using the lipophilic dyes. This would enable not only faster strain selection but also early-stage differentiation of polymer type, which is critical for downstream biotechnology applications.

A further upgrade could generate image-driven optimization suggestions from microscopy images. For example, if it detects a high level of extracellular debris consistent with cell lysis, or a high abundance of product granules outside the cells, it could recommend exploring strain-engineering strategies that alter cell membrane composition to increase tolerance to mechanical stress and support higher intracellular polymer accumulation as cytoplasmic granules.

Question 2

Gov / Policy Goal 1: Prevent harmful misuse

• Sub-goal 1.1 - Limit repurposability: Reduce the extent to which the tool can be used as a general-purpose and high-throughput optimization engine outside its intended PHA scope, for example by restricting supported dyes and limiting microscopy calibration parameters to validated settings.

• Sub-goal 1.2 - Increase accountability: ensure high-impact uses are traceable and that institutions have a mechanism to intervene if misuse is suspected.

Gov / Policy Goal 2: Promote safe, responsible operation and research integrity

• Sub-goal 2.1 - Standardize safe use: Require adherence to Standard Operating Procedures (SOPs) for staining, imaging, and waste handling.

• Sub-goal 2.2 - Ensure competent users: Require completion of a short training module, including lab safety + tool-specific quality control (QC) before users can access advanced features or export “final” reports.

• Sub-goal 2.3 - Maintain data quality: Require basic QC checks (controls, calibration, and logging of model version and imaging settings) to reduce false positives/negatives and prevent misinterpretations.

Gov / Policy Goal 3: Maintain access for constructive uses (equity and scientific progress)

• Sub-goal 3.1 - Preserve legitimate research utility: avoid governance mechanisms that unnecessarily slow routine PHA research and screening.

• Sub-goal 3.2 - Proportional governance: apply stricter controls only to higher-impact capabilities (e.g., advanced optimization suggestions), rather than restricting all use.

Question 3

Option 1:

General action: Norms combined with oversight mechanisms (social/regulatory governance)

Purpose: Currently, PHA quantification is typically validated through chemical extraction and analytical methods rather than standardized image-based measurement. A robust image-analysis tool like this would significantly increase throughput and expand where and how screening can be performed. If an image-analysis approach is positioned as a scalable screening tool, it should include safeguards to prevent use outside validated conditions. A responsible-use policy with “red flag” triggers would provide a proportional oversight mechanism.

Design:

• Actors: principal investigators (PIs) and laboratory personnel (primary users), microscopy core facility staff, the university biosafety office (or equivalent), and an institutional ethics/biosafety committee.

• Mechanism: implement a short pre-use declaration form and a responsible-use policy that defines “red flag” contexts (e.g., high-throughput work on unverified environmental isolates without provenance, use outside standard biosafety environments, or attempts to generalize the tool beyond PHA workflows).

• Trigger response: if a red flag is triggered, require review by the biosafety/ethics committee (or the biosafety office) and compliance with institutional requirements before experiments or tool access continue.

Assumptions:

• Users will accurately disclose the intended use and experimental context (or there will be sufficient deterrence to reduce misreporting).

• Red-flag criteria can be defined clearly enough to be actionable and consistent across labs.

• The institution has capacity to perform timely reviews without creating major delays for legitimate projects.

• Some level of auditing is feasible (e.g., metadata logs or usage reporting), which may require limited access to usage data.

Risks of failure and “success”:

• The policy becomes symbolic and is not followed; criteria are too vague to enforce; or users misreport their purpose to avoid review.

• Overly broad triggers could make oversight routine, slowing research and disproportionately burdening smaller or under-resourced labs (equity and access concerns).

Option 2:

Restrict advanced features: High-impact features require auditable access (accountability governance)

Purpose: Adding accountability for higher-impact features while keeping basic screening broadly accessible.

Design:

• Actors: tool developers (academic or company), institutions adopting the tool.

• Baseline access: basic PHA screening module available for standard use.

• Advanced access (premium/beta): requires institutional opt-in (verified affiliation, training completion, and standard operating procedures adherence).

• Logging: maintain run logs with technical metadata only (model version, stain, imaging settings, quality control pass/fail, solvent/waste metadata, etc.).

• Incident response: provide an incident-reporting channel so access can be suspended if misuse is suspected.

Assumptions:

• Logging and gating deter misuse without driving users to ungoverned copies.

• Metadata-only logs are sufficient for accountability without compromising privacy.

• Institutions are willing to administer opt-in and training requirements.

Risks of failure and “success”:

• Users bypass controls by using modified versions or alternative tools; logging becomes incomplete.

• Reduced accessibility and higher admin burden, potentially concentrating access in well-resourced labs.

• Analogy: similar to “KYC tiers” in financial systems: more powerful capabilities require stronger verification and auditability.

Option 3:

Just for PHA: Scope capabilities through validated workflows (technical strategy / design constraint).

Purpose: General-purpose screening tools are easier to repurpose. One way to limit their repurposability is by restricting the tool to validated PHA workflows.

Design:

• Actors: tool developers and maintainers; optionally journals or core facilities that require validated workflows for reporting.

• Technical constraint: restrict supported dyes and workflows to PHA-relevant staining and analysis; lock calibration parameters to validated microscopy settings; exclude generic “optimize any phenotype” modules.

• Reporting constraint: outputs are labeled as screening support, with clear limits on claims and recommended confirmatory methods for final quantification.

Assumptions:

• Technical restrictions meaningfully reduce repurposability.

• The validated workflow remains useful across common lab setups and organisms.

• Users accept constraints rather than abandoning the tool.

Risks of failure and “success”:

• Restrictions are easily removed in forks or modified builds; scope limits become ineffective.

• Reduced scientific and commercial usefulness, including for ethically beneficial non-PHA applications; may slow innovation.

• This is analogous to 3D printers that restrict materials and firmware settings: the core function remains available, but out-of-scope production becomes harder without intentional modification.

Question 4

Does the option:                                  Option 1  Option 2  Option 3
Enhance Biosecurity
• By preventing incidents                         2         1         3
• By helping respond                              2         1         3
Foster Lab Safety
• By preventing incidents                         2         2         1
• By helping respond                              3         1         3
Protect the environment
• By preventing incidents                         2         1         2
• By helping respond                              3         2         3
Other considerations
• Minimizing costs and burdens to stakeholders    2         3         1
• Feasibility?                                    2         2         1
• Not impede research                             2         3         3
• Promote constructive applications               2         1         3
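
As a rough cross-check of the matrix, the per-option totals can be tallied in a few lines of Python. This is only a sketch: it assumes that lower scores are better (1 = best, 3 = worst), which the matrix does not state explicitly, and the `weights` argument and the example weights are hypothetical, included only to show how giving more weight to feasibility and burden could shift the comparison.

```python
# Scores copied from the Question 4 matrix as (Option 1, Option 2, Option 3).
# Assumption: 1 = best and 3 = worst; the matrix itself does not state the scale.
SCORES = {
    "Biosecurity: preventing incidents": (2, 1, 3),
    "Biosecurity: helping respond": (2, 1, 3),
    "Lab safety: preventing incidents": (2, 2, 1),
    "Lab safety: helping respond": (3, 1, 3),
    "Environment: preventing incidents": (2, 1, 2),
    "Environment: helping respond": (3, 2, 3),
    "Minimizing costs and burdens": (2, 3, 1),
    "Feasibility": (2, 2, 1),
    "Not impede research": (2, 3, 3),
    "Promote constructive applications": (2, 1, 3),
}


def totals(scores, weights=None):
    """Sum the scores per option; criteria missing from `weights` get weight 1."""
    weights = weights or {}
    sums = [0.0, 0.0, 0.0]
    for criterion, row in scores.items():
        w = weights.get(criterion, 1.0)
        for i, value in enumerate(row):
            sums[i] += w * value
    return {f"Option {i + 1}": s for i, s in enumerate(sums)}


unweighted = totals(SCORES)

# Hypothetical weights that emphasize practical implementability.
weighted = totals(SCORES, {"Feasibility": 3.0, "Minimizing costs and burdens": 3.0})

print(unweighted)
print(weighted)
```

Under the lower-is-better assumption, the unweighted totals rank Option 2 first and Option 3 last, while the hypothetical weighting narrows the gap between them.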

Question 5

I would prioritize Option 3 as the primary governance approach, aimed at tool developers and maintainers. Although Option 3 has the weakest overall score, I assign higher weight to practical implementability and consistent adoption, since governance mechanisms that require sustained oversight or significant administrative capacity are often applied inconsistently in real research settings. Option 3 can be implemented directly in software and routine workflows by restricting the tool to validated PHA use cases (supported dyes, locked calibration ranges, and scoped outputs). This reduces repurposability by design rather than relying on user compliance, making the default use safer and more predictable while preserving the core constructive application: scalable PHA screening.

The key trade-off is that Option 3 scores poorly on “helping respond” (biosecurity and lab safety), because it provides limited traceability and fewer mechanisms for intervention after deployment. It also narrows beneficial extensions beyond PHA, potentially limiting constructive applications in adjacent domains.

This recommendation also rests on several assumptions and uncertainties: that capability scoping meaningfully reduces repurposability in practice; that users will not widely circumvent constraints via modified versions or alternative tools; and that the validated workflow generalizes across common microscopes, organisms, and staining conditions.

Final Reflection

The main new ethical concern for me was how quickly a tool designed for a narrow, constructive purpose (PHA screening) can become a general “scale-up enabler” once it is automated and paired with machine-learning image analysis. To address this, I would recommend capability scoping by restricting the tool to validated PHA workflows (supported dyes, locked calibration ranges, and scoped outputs).


Week 2 Lecture Prep

Homework Questions from Professor Jacobson:

Question 1

High-fidelity, proofreading-proficient replicative DNA polymerases have an error rate of ≈ 10⁻⁶ per base during synthesis under standard conditions. The human nuclear genome is about 3.2 × 10⁹ base pairs per haploid set. If errors occurred at 10⁻⁶ per base, you would expect roughly 3.2 × 10⁹ × 10⁻⁶ = 3.2 × 10³ (≈ 3,200) errors per haploid genome copy. In living cells, however, the effective replication error rate is far lower, because post-replication repair (such as mismatch repair, MMR) acts on top of the polymerase’s 3′→5′ exonuclease proofreading: a commonly cited order of magnitude is ≈ 10⁻⁹ to 10⁻¹⁰ errors per base pair per replication, which corresponds to roughly 0.3 to 3 errors per haploid genome copy.
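
The arithmetic above can be reproduced with a short Python sketch. The genome size and error rates are the ones quoted in the answer; the function name is just an illustrative choice.

```python
# Expected number of replication errors per haploid genome copy
# for the error rates quoted in the answer.
GENOME_BP = 3.2e9  # haploid human nuclear genome, base pairs


def expected_errors(error_rate_per_bp, genome_bp=GENOME_BP):
    """Expected errors = per-base error rate x number of bases copied."""
    return error_rate_per_bp * genome_bp


for label, rate in [
    ("polymerase with proofreading", 1e-6),
    ("with mismatch repair (upper bound)", 1e-9),
    ("with mismatch repair (lower bound)", 1e-10),
]:
    print(f"{label}: ~{expected_errors(rate):.2f} errors per genome copy")
```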

Question 2

Because of codon degeneracy, the same amino-acid sequence can be encoded by many DNA coding sequences. A rough average multiplicity is about 3.05 synonymous codons per amino acid (61 sense codons / 20 amino acids). Given an average human protein-coding sequence of 1036 bp and 3 bp per codon, 1036 bp / 3 ≈ 345 codons. So the number of different DNA coding sequences that produce the exact same protein is on the order of 3.05³⁴⁵ ≈ 10¹⁶⁷. In practice, though, synonymous variants are not always functionally equivalent. Some synonymous changes produce transcripts with different stability and structure. For example, synonymous substitutions can lead to hairpins or repetitive motifs that increase recombination and reduce construct stability. They can also change ribosome speed patterns (which can alter co-translational folding and lead to misfolding, aggregation, or altered activity). In addition, synonymous changes can inadvertently create or disrupt regulatory sequence motifs (e.g., polyadenylation signals or splicing enhancer/silencer elements in eukaryotes).
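
The order-of-magnitude estimate can be checked in Python, together with an exact count for a specific peptide. This is a minimal sketch: the degeneracy table is the standard genetic code, and the six-residue peptide is just the start of the PhaC sequence used elsewhere on this page.

```python
import math

# Average-multiplicity estimate from the answer: 3.05 codons per residue, 345 codons.
AVG_CODONS_PER_AA = 61 / 20
N_CODONS = 1036 // 3
log10_estimate = N_CODONS * math.log10(AVG_CODONS_PER_AA)

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein):
    """Exact number of DNA coding sequences for a given protein (stop codon excluded)."""
    count = 1
    for residue in protein:
        count *= DEGENERACY[residue]
    return count


print(f"estimate: about 10^{log10_estimate:.0f} coding sequences")
print(count_encodings("MATGKG"))  # first six residues of PhaC
```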

Homework Questions from Dr. LeProust:

The gold standard for oligonucleotide synthesis is solid-phase oligonucleotide synthesis (SPOS) based on phosphoramidite chemistry (Walther et al. 2020). However, this method struggles beyond ~200 nt because every nucleotide is added in a multi-step cycle, and small inefficiencies and side reactions compound with length.
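
To illustrate how small per-cycle inefficiencies compound, the sketch below estimates the fraction of full-length product as the coupling efficiency raised to the number of coupling steps. The efficiency values (99.5% and 99%) are illustrative assumptions, not figures from the cited paper, and the model ignores side reactions such as depurination.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of chains that are full length after (length - 1) coupling steps."""
    return coupling_efficiency ** (length_nt - 1)


# Assumed, illustrative per-step coupling efficiencies.
for efficiency in (0.995, 0.99):
    for length in (50, 100, 200, 300):
        fraction = full_length_fraction(length, efficiency)
        print(f"efficiency {efficiency:.3f}, {length} nt: {fraction:.1%} full length")
```

Even at an assumed 99.5% per-step efficiency, well under half of the chains are full length at 200 nt in this simple model, which is consistent with the practical length limit described above.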

Homework Question from George Church:

Question: What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Answer: The 10 essential amino acids in all animals are Arginine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, and Valine. Considering this, Jurassic Park’s biocontainment method is a joke, since it doesn’t create a unique dependency in animals: animals already can’t synthesize lysine. Also, as containment-by-dependency, it’s ecologically leaky because they did not consider the possibility that lysine was readily available in the environment. Lysine is available via plants and prey, so escape doesn’t remove access. Note: I answered this by consulting a Jurassic Park subreddit discussion.

Week 2: DNA Read, Write, & Edit

Homework

Part 1: Benchling & In-silico Gel Art

Opened https://benchling.com/ and signed up. Found the Lambda sequence from https://www.neb.com/en/-/media/nebus/page-images/tools-and-resources/interactive-tools/dna-sequences-and-maps/text-documents/lambdafsa.txt?rev=c0c6669b9bd340ddb674ebfd9d55c691&hash=B4188C171E5A42A1CF6FD257F98B97A1 and copied the sequence (without the header). Pasted this sequence into Benchling through “Create” > “DNA / RNA Sequence” > “New DNA / RNA Sequence”. Then I just pasted the sequence in the “Bases” field, titled it “Lambda,” and selected the topology as “Linear.”

Clicked “Digest” (the scissors icon in the right menu), selected “All enzymes,” found all seven using the search tool, and clicked “Run Digest.”
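
For intuition about what the digest tool computes, here is a minimal Python sketch of an in-silico digest of a linear molecule. EcoRI and HindIII are used only as illustrative enzymes (the text does not list which seven were selected), and the toy sequence is made up; running it on the real Lambda sequence would require loading the downloaded file.

```python
# Recognition site and top-strand cut offset (G^AATTC and A^AGCTT).
# Both sites are palindromic, so scanning one strand is enough.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}


def digest(seq, names, enzymes=ENZYMES):
    """Return fragment lengths (in order) for a linear sequence cut by the named enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in names:
        site, offset = enzymes[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Made-up toy sequence with one EcoRI site and one HindIII site.
toy = "CCCC" + "GAATTC" + "TTTTTTTT" + "AAGCTT" + "GGGG"
print(digest(toy, ["EcoRI"]))
print(digest(toy, ["EcoRI", "HindIII"]))
```

Sorting the resulting fragment lengths gives the band pattern expected in the corresponding gel lane.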

Part 3: DNA Design Challenge

3.1. Choose your protein: Poly(3-hydroxyalkanoate) polymerase subunit PhaC

I chose Polyhydroxyalkanoate synthase (PhaC) because it is involved in the catalysis of the reaction that polymerizes (R)-3-hydroxybutyryl-CoA to produce polyhydroxybutyrate (PHB), which is an important bioproduct of interest due to its plastic/polyethylene-like properties.

Biologically, PHB serves as an intracellular energy reserve material when cells grow under conditions of nutrient limitation.

Sequence of Polyhydroxyalkanoate Synthase (PhaC): MATGKGAAASTQEGKSQPFKVTPGPFDPATWLEWSRQWQGTEGNGHAAASGIPGLDALAGVKIAPAQLGDIQQRYMKDFSALWQAMAEGKAEATGPLHDRRFAGDAWRTNLPYRFAAAFYLLNARALTELADAVEADAKTRQRIRFAISQWVDAMSPANFLATNPEAQRLLIESGGESLRAGVRNMMEDLTRGKISQTDESAFEVGRNVAVTEGAVVFENEYFQLLQYKPLTDKVHARPLLMVPPCINKYYILDLQPESSLVRHVVEQGHTVFLVSWRNPDASMAGSTWDDYIEHAAIRAIEVARDISGQDKINVLGFCVGGTIVSTALAVLAARGEHPAASVTLLTTLLDFADTGILDVFVDEGHVQLREATLGGGAGAPCALLRGLELANTFSFLRPNDLVWNYVVDNYLKGNTPVPFDLLFWNGDATNLPGPWYCWYLRHTYLQNELKVPGKLTVCGVPVDLASIDVPTYIYGSREDHIVPWTAAYASTALLANKLRFVLGASGHIAGVINPPAKNKRSHWTNDALPESPQQWLAGAIEHHGSWWPDWTAWLAGQAGAKRAAPANYGNARYRAIEPAPGRYVKAKA Source: UniProt at https://www.uniprot.org/uniprotkb/P23608/entry#sequences

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence. reh:H16_A1437 K03821 poly(R)-3-hydroxyalkanoate polymerase subunit PhaC EC:2.3.1.304 | (GenBank) phaC1; Poly(3-hydroxybutyrate) polymerase (N) atggcgaccggcaaaggcgcggcagcttccacgcaggaaggcaagtcccaaccattcaaggtcacgccggggccattcgatccagccacatggctggaatggtcccgccagtggcagggcactgaaggcaacggccacgcggccgcgtccggcattccgggcctggatgcgctggcaggcgtcaagatcgcgccggcgcagctgggtgatatccagcagcgctacatgaaggacttctcagcgctgtggcaggccatggccgagggcaaggccgaggccaccggtccgctgcacgaccggcgcttcgccggcgacgcatggcgcaccaacctcccatatcgcttcgctgccgcgttctacctgctcaatgcgcgcgccttgaccgagctggccgatgccgtcgaggccgatgccaagacccgccagcgcatccgcttcgcgatctcgcaatgggtcgatgcgatgtcgcccgccaacttccttgccaccaatcccgaggcgcagcgcctgctgatcgagtcgggcggcgaatcgctgcgtgccggcgtgcgcaacatgatggaagacctgacacgcggcaagatctcgcagaccgacgagagcgcgtttgaggtcggccgcaatgtcgcggtgaccgaaggcgccgtggtcttcgagaacgagtacttccagctgttgcagtacaagccgctgaccgacaaggtgcacgcgcgcccgctgctgatggtgccgccgtgcatcaacaagtactacatcctggacctgcagccggagagctcgctggtgcgccatgtggtggagcagggacatacggtgtttctggtgtcgtggcgcaatccggacgccagcatggccggcagcacctgggacgactacatcgagcacgcggccatccgcgccatcgaagtcgcgcgcgacatcagcggccaggacaagatcaacgtgctcggcttctgcgtgggcggcaccattgtctcgaccgcgctggcggtgctggccgcgcgcggcgagcacccggccgccagcgtcacgctgctgaccacgctgctggactttgccgacacgggcatcctcgacgtctttgtcgacgagggccatgtgcagttgcgcgaggccacgctgggcggcggcgccggcgcgccgtgcgcgctgctgcgcggccttgagctggccaataccttctcgttcttgcgcccgaacgacctggtgtggaactacgtggtcgacaactacctgaagggcaacacgccggtgccgttcgacctgctgttctggaacggcgacgccaccaacctgccggggccgtggtactgctggtacctgcgccacacctacctgcagaacgagctcaaggtaccgggcaagctgaccgtgtgcggcgtgccggtggacctggccagcatcgacgtgccgacctatatctacggctcgcgcgaagaccatatcgtgccgtggaccgcggcctatgcctcgaccgcgctgctggcgaacaagctgcgcttcgtgctgggtgcgtcgggccatatcgccggtgtgatcaacccgccggccaagaacaagcgcagccactggactaacgatgcgctgccggagtcgccgcagcaatggctggccggcgccatcgagcatcacggcagctggtggccggactggaccgcatggctggccgggcaggccggcgcgaaacgcgccgcgcccgccaactatggcaatgcgcgctatcgcgcaatcgaacccgcgcctgggcgatacgtcaaagccaaggcatga 
Source: KEGG at https://www.genome.jp/dbget-bin/www_bget?reh:H16_A1437
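
A quick sanity check on a retrieved or reverse-translated coding sequence is to translate it back and compare it with the protein. The sketch below builds the standard genetic code table and translates the first 18 and last 12 nucleotides of the KEGG sequence above; applying it to the full sequence works the same way.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("atggcgaccggcaaaggc"))  # start of phaC, should match the start of PhaC
print(translate("gccaaggcatga"))        # end of phaC, including the stop codon
```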

3.3. Codon optimization. I optimized the phaC coding sequence for E. coli because it is a widely used chassis for recombinant protein expression and for rapid prototyping of metabolic engineering constructs.

I did this using the Benchling tool. I selected the region of the AA sequence I wished to back-translate and right-clicked the highlighted region. In the codon optimization tab, I used these settings:

  • Host: E. coli K-12
  • Method: Match codon usage
  • GC content: Medium (0.33 to 0.66), because the extremes can be problematic: high GC can create strong secondary structures, while low GC can cause instability/repeats and make synthesis harder.
  • Uridine depletion: off (not relevant for bacterial expression)
  • Hairpin parameters: Stem size: 8 and Window 50
  • Restriction sites: avoid BsaI, BsmBI, and BbsI (Type IIS enzymes, for Golden Gate compatibility, since I would also need to clone phaA and phaB into the same vector rather than phaC alone)
  • Patterns to reduce: AAAAAA and ATATATATA
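
Some of these settings can be verified independently on the optimized sequence. The following Python sketch checks GC content, the three Type IIS recognition sites (on both strands), and the two patterns listed above. It is not Benchling’s algorithm, it does not check hairpins, and the short test sequences are made up for illustration.

```python
# Recognition sites of the Type IIS enzymes to avoid.
FORBIDDEN_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
AVOID_PATTERNS = ("AAAAAA", "ATATATATA")


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_sequence(seq, gc_min=0.33, gc_max=0.66):
    """Return (GC fraction, list of problems found) for a DNA sequence."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    problems = []
    if not gc_min <= gc <= gc_max:
        problems.append(f"GC content {gc:.2f} outside {gc_min}-{gc_max}")
    for name, site in FORBIDDEN_SITES.items():
        # Type IIS sites are not palindromic, so check both orientations.
        if site in seq or reverse_complement(site) in seq:
            problems.append(f"{name} site present")
    for pattern in AVOID_PATTERNS:
        if pattern in seq:
            problems.append(f"pattern {pattern} present")
    return gc, problems


print(check_sequence("ATGGCTACCGGTAAAGGC"))  # made-up clean example
print(check_sequence("ATGGGTCTCAAAAAAGC"))   # made-up example with a BsaI site and AAAAAA
```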

I clicked on “Optimization preview” and got this result:

3.4. You have a sequence! Now what?

PhaC alone will not produce PHB. A minimal PHB pathway typically includes PhaA (β-ketothiolase) and PhaB (acetoacetyl-CoA reductase) in addition to PhaC (PHA synthase). PhaA and PhaB convert central metabolites (via acetyl-CoA) into (R)-3-hydroxybutyryl-CoA, which is the direct substrate that PhaC polymerizes into PHB. You will also need a host capable of supplying sufficient acetyl-CoA and NADPH.

Therefore, for PHB production in E. coli, phaA, phaB, and phaC are commonly co-expressed from the same plasmid (as a single operon with one promoter and an RBS for each gene), and the cells are grown under appropriate culture conditions (e.g., carbon excess and nutrient limitation) that favor polymer accumulation.

Part 4: Prepare a Twist DNA Synthesis Order

Project: pBBR1MCS-5::phaCAB

Cell-dependent recombinant expression approach: cloning the codon-optimized phaA, phaB, and phaC coding sequences into E. coli K-12

Promoter - RBS - phaA - (RBS) - phaB - (RBS) - phaC - Terminator

phaA Sequence MTDVVIVSAARTAVGKFGGSLAKIPAPELGAVVIKAALERAGVKPEQVSEVIMGQVLTAGSGQNPARQAAIKAGLPAMVPAMTINKVCGSGLKAVMLAANAIMAGDAEIVVAGGQENMSAAPHVLPGSRDGFRMGDAKLVDTMIVDGLWDVYNQYHMGITAENVAKEYGITREAQDEFAVGSQNKAEAAQKAGKFDEEIVPVLIPQRKGDPVAFKTDEFVRQGATLDSMSGLKPAFDKAGTVTAANASGLNDGAAAVVVMSAAKAKELGLTPLATIKSYANAGVDPKVMGMGPVPASKRALSRAEWTPQDLDLMEINEAFAAQALAVHQQMGWDTSKVNVNGGAIAIGHPIGASGCRILVTLLHEMKRRDAKKGLASLCIGGGMGVALAVERK Source: UniProt at https://www.uniprot.org/uniprotkb/P14611/entry#sequences

phaB Sequence MTQRIAYVTGGMGGIGTAICQRLAKDGFRVVAGCGPNSPRREKWLEQQKALGFDFIASEGNVADWDSTKTAFDKVKSEVGEVDVLINNAGITRDVVFRKMTRADWDAVIDTNLTSLFNVTKQVIDGMADRGWGRIVNISSVNGQKGQFGQTNYSTAKAGLHGFTMALAQEVATKGVTVNTVSPGYIATDMVKAIRQDVLDKIVATIPVKRLGLPEEIASICAWLSSEESGFSTGADFSLNGGLHMG Source: UniProt at https://www.uniprot.org/uniprotkb/P14697/entry#sequences

phaC Sequence MATGKGAAASTQEGKSQPFKVTPGPFDPATWLEWSRQWQGTEGNGHAAASGIPGLDALAGVKIAPAQLGDIQQRYMKDFSALWQAMAEGKAEATGPLHDRRFAGDAWRTNLPYRFAAAFYLLNARALTELADAVEADAKTRQRIRFAISQWVDAMSPANFLATNPEAQRLLIESGGESLRAGVRNMMEDLTRGKISQTDESAFEVGRNVAVTEGAVVFENEYFQLLQYKPLTDKVHARPLLMVPPCINKYYILDLQPESSLVRHVVEQGHTVFLVSWRNPDASMAGSTWDDYIEHAAIRAIEVARDISGQDKINVLGFCVGGTIVSTALAVLAARGEHPAASVTLLTTLLDFADTGILDVFVDEGHVQLREATLGGGAGAPCALLRGLELANTFSFLRPNDLVWNYVVDNYLKGNTPVPFDLLFWNGDATNLPGPWYCWYLRHTYLQNELKVPGKLTVCGVPVDLASIDVPTYIYGSREDHIVPWTAAYASTALLANKLRFVLGASGHIAGVINPPAKNKRSHWTNDALPESPQQWLAGAIEHHGSWWPDWTAWLAGQAGAKRAAPANYGNARYRAIEPAPGRYVKAKA Source: UniProt at https://www.uniprot.org/uniprotkb/P23608/entry#sequences

For this exercise, I chose pBBR1MCS-5 as the plasmid backbone because it is a broad-host-range vector commonly used for cloning and expression of phaCAB. Source: https://www.teses.usp.br/teses/disponiveis/87/87131/tde-29042010-102817/publico/RogeriodeSousaGomes_Doutorado.pdf

Part 5: DNA Read / Write / Edit

I would sequence DNA used for DNA-based digital data storage, because I have never done this before and it would feel amazing to be able to instantly interpret the stored information, like reading a book.

I would probably use Illumina sequencing (second-generation, massively parallel short reads) for high-accuracy base calls and reliable decoding of short oligos, and Nanopore sequencing (third-generation, single-molecule long reads) to validate longer constructs and their integrity.

My input for the Illumina method would be a DNA pool. It would go through library preparation: fragmentation, adapter ligation (with indexes), and PCR amplification. On the Illumina platform, bases are decoded by sequencing-by-synthesis with fluorescently labeled reversible terminators, and the output is millions to billions of short reads (FASTQ) with per-base quality scores. Decoding the stored data then requires alignment/consensus building and error correction.
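
As a small illustration of the decoding step, the sketch below converts FASTQ quality characters into error probabilities and builds a majority-vote consensus from several reads of the same oligo. It assumes the Phred+33 quality encoding and reads that are already aligned and of equal length; a real pipeline would need alignment and a proper error-correcting code, and the example reads are made up.

```python
from collections import Counter


def phred_to_error_prob(quality_char, offset=33):
    """Convert one FASTQ quality character (assumed Phred+33) to an error probability."""
    q = ord(quality_char) - offset
    return 10 ** (-q / 10)


def consensus(reads):
    """Majority vote per position across pre-aligned, equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*reads)
    )


# Made-up reads of the same oligo, each with at most one sequencing error.
reads = ["ACGTTGCA", "ACGTTGGA", "ACCTTGCA"]
print(consensus(reads))
print(phred_to_error_prob("I"))  # 'I' encodes Q40 under Phred+33
```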

I would synthesize a PHA production cassette for E. coli K-12 (codon-optimized phaA + phaB + phaC) to enable rapid testing and study of PHB production. I would use commercial gene synthesis (e.g., Twist) because it is practical and accurate. Essential steps would include oligo synthesis, oligo pooling, assembly into the full-length gene/insert, and cloning into a plasmid. One limitation of this method is that errors compound, since the probability of at least one error increases with length. Long constructs therefore often require assembly from shorter fragments and clonal verification, which adds time.

To increase expression of phaCAB and production of PHA, I would edit E. coli metabolic and stress-tolerance genes to increase PHB yield, for example by improving acetyl-CoA/NADPH supply, reducing competing pathways, and increasing tolerance to intracellular polymer accumulation (reducing lysis under high load).

I would use CRISPR-Cas9 editing for targeted point mutations. A guide RNA directs Cas9 to the target locus, the DNA is cut, and the break is repaired via HDR using a donor template containing the desired edit. Finally, I would confirm the edits by sequencing. Limitations include imprecision (off-target edits) and the fact that multiplex edits increase complexity and screening effort.
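
Choosing where a guide RNA can direct Cas9 starts with finding candidate target sites. The sketch below scans the top strand for 20-nt protospacers followed by an NGG PAM. It assumes the commonly used SpCas9 (the text only says Cas9), ignores the reverse strand and off-target scoring, and uses a made-up sequence.

```python
import re

# 20-nt protospacer followed by an NGG PAM; the lookahead reports overlapping sites too.
TARGET_PATTERN = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_spcas9_targets(seq):
    """Return (start, protospacer, PAM) tuples for NGG sites on the top strand."""
    seq = seq.upper()
    return [
        (match.start(), match.group(1), match.group(2))
        for match in TARGET_PATTERN.finditer(seq)
    ]


# Made-up sequence containing exactly one candidate site.
example = "A" * 20 + "TGG" + "CCCC"
for start, protospacer, pam in find_spcas9_targets(example):
    print(start, protospacer, pam)
```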

Week 3 HW: Lab Automation

Python Script for Opentrons Artwork

Here’s my HTGAA 2026 Opentrons Art Python Script Submission.

The artistic design I created using the GUI is available here.

I heavily used the “Example 7 Microbial Earth” by Dominika Wawrzyniak, using pixels loaded from an external resource (a CSV file hosted on my GitHub page).

I used Dominika’s well-documented Notion page from HTGAA21 to understand the code and replicate it for my case. I used Gemini assistance only to debug minor typos and syntax errors, and to identify which packages to import to execute the code.

Like Dominika Wawrzyniak, I planned to introduce more colors, like in the image I generated in the Automation Art Interface. However, implementing this design into code turned out to be more difficult and tedious than anticipated, so I left it as one color (red).

Post-Lab Questions

Question 1

The paper “High-throughput experimentation for discovery of biodegradable polyesters” (Fransen et al., 2023) uses an Opentrons 1st-generation robot to automate a high-throughput biodegradation assay based on the clear-zone technique.

The researchers synthesized 642 polyesters and polycarbonates and tested their biodegradability using a clear-zone assay with Pseudomonas lemoignei. The Opentrons robot was repurposed as an automated imaging platform to capture time-lapse images of polymer degradation in 12-well plates, enabling consistent, large-scale monitoring over 13 days.

This automation allowed rapid generation of a large biodegradation dataset and supported machine learning models to predict polymer degradability from chemical structure.

Question 2

High-throughput screening of bacterial isolates for PHA production is traditionally extremely time-consuming and labor-intensive, requiring manual handling of hundreds of colonies across multiple conditions. For my final project, I plan to use an Opentrons OT-2 liquid-handling robot to automate this workflow, dramatically increasing throughput, reproducibility, and consistency compared to manual methods I used during my master’s.

Isolates will be spotted in triplicate on 60-sector plates, maintaining identical indexed positions across all plates for direct comparison. Viability will first be confirmed on LB agar, and isolates will then be inoculated onto mineral medium (MM; Ramsay et al., 1990) agar plates supplemented with individual carbon sources at 10% v/v to reach typical screening concentrations.
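The indexed layout described above can be sketched in a few lines. This is only a planning aid, not the Opentrons script: it assumes the three replicates of an isolate occupy consecutive sectors on the same plate (so one 60-sector plate holds 20 isolates), and the isolate names are hypothetical.

```python
from typing import Dict, List

SECTORS_PER_PLATE = 60  # from the text: 60-sector plates
REPLICATES = 3          # from the text: triplicate spotting


def assign_sectors(isolates: List[str]) -> Dict[str, List[int]]:
    """Map each isolate to REPLICATES consecutive 1-based sector indices.

    Assumption: replicates sit side by side on one plate, so a plate
    holds SECTORS_PER_PLATE // REPLICATES isolates.
    """
    capacity = SECTORS_PER_PLATE // REPLICATES
    if len(isolates) > capacity:
        raise ValueError(f"one plate holds at most {capacity} isolates")
    layout = {}
    for i, name in enumerate(isolates):
        first = i * REPLICATES + 1
        layout[name] = list(range(first, first + REPLICATES))
    return layout


# Hypothetical isolate names, for illustration only.
layout = assign_sectors(["isolate_A", "isolate_B"])
print(layout)
```

Reusing the same mapping for every carbon-source plate keeps each isolate at identical positions, which is what makes plate-to-plate comparison direct.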

PHA production and bacterial growth will be assessed using a two-step staining workflow. First, Sudan Black B (0.02% in 96% ethanol, followed by ethanol washes) will identify colonies with blue coloration indicative of polymer accumulation. Second, Nile Red incorporated into MM (0.5 μg/mL) will allow selected isolates to be ranked based on UV fluorescence (312/365 nm).

This automated setup enables rapid testing of hundreds of isolate × carbon source combinations, accelerating the discovery of strains compatible with low-cost feedstocks and efficient bioprocessing while transforming a laborious manual process into a precise, scalable screening platform.

Here’s my draft script for this exercise.

Each “color” would correspond to a different bacterial isolate; I have not implemented this in the script yet. The coordinate set is a starting layout and could be refined to achieve a more uniform, regular distribution across the plate (like in the image I drafted using the GUI, available below).

Final Project Ideas

Added 3 slides with 3 ideas for an Individual Final Project in the appropriate slide deck for Committed Listeners here.

Also, here’s my analog brainstorm

Week 4 HW: Protein Design Part 1

Homework: Protein Design I

Part A. Conceptual Questions

1) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

~21% of meat is protein (Smith et al. 2022); therefore, 500 g of meat contains about 105 g of protein.

Using the approximation of an average amino acid ≈ 100 Da ≈ 100 g/mol, and rounding the protein mass to ~100 g: 100 g / (100 g/mol) = 1.00 mol

Avogadro’s number: 1 mol = 6.02214076×10²³ molecules

1.00 mol × 6.022×10²³ ≈ 6.02×10²³ amino-acid molecules
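The same arithmetic can be written as a short script, which makes it easy to change the assumptions. The sketch below uses the unrounded 105 g of protein, so it gives a slightly higher value (about 6.3×10²³) than the rounded estimate; the variable names are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
MEAT_MASS_G = 500.0       # mass of the piece of meat
PROTEIN_FRACTION = 0.21   # ~21% protein, as cited above
AA_MOLAR_MASS = 100.0     # g/mol, the ~100 Da approximation


def amino_acid_molecules(meat_g, protein_fraction, aa_molar_mass):
    """Estimate the number of amino-acid residues in a piece of meat."""
    protein_g = meat_g * protein_fraction   # grams of protein
    moles = protein_g / aa_molar_mass       # moles of amino acids
    return moles * AVOGADRO


n = amino_acid_molecules(MEAT_MASS_G, PROTEIN_FRACTION, AA_MOLAR_MASS)
print(f"{n:.2e} amino-acid molecules")
```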

2) Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Beef/fish supplies raw materials and energy, but it doesn’t transfer “cow/fish identity”. What we eat is digested first, meaning the proteins, fats, and carbohydrates are broken down into small building blocks (amino acids, fatty acids, sugars), absorbed, and then reassembled into human molecules under human genetic and hormonal control.

3) Why are there only 20 natural amino acids?

Doig (2017) hypothesizes that the canonical set of 20 standard amino acids is best understood as an evolved “alphabet” that became fixed early because this set is sufficient and practical for building stable, soluble proteins. This set enables soluble folded structures with close-packed hydrophobic cores and ordered binding pockets, rather than being selected because each amino acid was needed for catalysis (since RNA catalysts were already effective enough). Once early life standardized a working translation system around this set, changing the alphabet would have been costly, so it became effectively locked in (“frozen”). Other references, such as Freeland et al. (2000), suggest that 20 is a good number for minimizing damage from errors (mutation/mistranslation).

4) Where did amino acids come from before enzymes that make them, and before life started?

Amino acids could plausibly have come from abiotic chemistry on early Earth. Proposed routes include cyanosulfidic protometabolism and amino-acid formation from electrical discharges in simple “primitive Earth” gas mixtures (the classic Miller experiment).

5) Can you discover additional helices in proteins?

Beyond the α-helix, proteins commonly contain 3₁₀ helices and π helices (less frequent helical variants), as well as polyproline II helices (common in Pro-rich/disordered regions) and the specialized collagen triple helix.

6) Why are most molecular helices right-handed?

Right-handed helices dominate because natural biomolecules are made from single-handed monomers, and the right-handed twist is the lowest-energy way to repeat their geometry without clashes.

7) Why do β-sheets tend to aggregate?

β-sheet aggregation buries exposed hydrophobic side chains and releases ordered water from their surfaces, which is strongly favorable because it increases the entropy of the solvent and lowers the overall free energy.

8) What is the driving force for β-sheet aggregation?

β-sheet aggregation is driven mainly by the hydrophobic effect and stabilized/propagated by intermolecular backbone H-bonding in the cross-β structure (often reinforced by tight steric-zipper packing).

9) Why do many amyloid diseases form β-sheets?

β-sheet architecture is an unusually generic, stable, and self-templating way for polypeptide backbones to stick together when normal folding fails. In a β-sheet, the peptide backbone forms regular hydrogen bonds. This conformation makes amyloid fibrils thermodynamically stable and hard to clear, because once a small β-sheet nucleus forms, it can seed further growth by recruiting more monomers and templating the same β-rich structure.

Part B: Protein Analysis and Visualization

Question 1

I selected poly(3-hydroxyalkanoate) depolymerase (PhaZ) because it is the key enzyme that degrades PHB, which directly controls whether a microbe accumulates bioplastic (useful for biotechnology) or breaks it down (relevant for environmental fate). phaZ inactivation is commonly discussed as a strategy to reduce PHA mobilization and increase polymer retention.

Question 2

MPEPYIFRTVELDDQSIRTAVRPGKPHLTPLLIFNGIGANLELVFPFIEALDPDLEVIAFDVPGVGGSSTPRHPYRFPGLAKLTARMLDYLDYGQVSAIGVSWGGALAQQFAHDYPERCKKLVLAATAAGAVMVPGKPKVLWMMASPRRYVQPSHVIRIAPLIYGGAFRRDPDLAMHHASKVRSGGKLGYYWQLFAGLGWTSIHWLHKIHQPTLVLAGDDDPLIPLVNMRLLAWRIPNAQLHIIDDGHLFLITRAEAVAPIIMKFLQEERQRAVMHPRPASGG

BLAST result

Length: 283 aa
Most frequent amino acid: Leucine (L), 32/283 = 11.3%

Hits: 250
Reviewed (Swiss-Prot) homologs: 1
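The length and most-frequent-residue figures can be recomputed directly from the sequence. A minimal sketch, shown on a toy sequence; pasting the PhaZ sequence from Question 2 into `composition_summary` would give the corresponding values for the real protein.

```python
from collections import Counter


def composition_summary(seq):
    """Return (length, most common residue, its count, its percentage)."""
    seq = "".join(seq.split()).upper()  # drop whitespace and line breaks
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    return len(seq), residue, n, 100.0 * n / len(seq)


# Toy sequence for illustration only, not the PhaZ sequence.
length, residue, n, pct = composition_summary("MLLKLA")
print(f"Length: {length} aa; most frequent: {residue} ({n}/{length} = {pct:.1f}%)")
```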

It belongs to the PHA depolymerase (PhaZ) family, which is part of the broader α/β-hydrolase enzyme superfamily.

Question 3

AF_AFP26495F1 - COMPUTED STRUCTURE MODEL OF POLY(3-HYDROXYALKANOATE) DEPOLYMERASE

This is not an experimentally solved structure, so there is no X-ray/EM “resolution” value. RCSB explicitly states: “There are no experimental data to verify the accuracy of this computed structure model. See Model Confidence metrics below for all regions of the polypeptide chain.” Instead, quality is reported by AlphaFold confidence. Global pLDDT: 91.95 (very high confidence overall)

RCSB lists 1 unique protein chain (monomer A1) and no ligands/non-protein entities.

Structure classification family: InterPro annotations classify it as Poly(3-hydroxyalkanoate) depolymerase (IPR011942) and an alpha/beta hydrolase fold protein (Alpha/beta hydrolase fold-1 domain, AB hydrolase superfamily).

Question 4

I opened AF-Q9R9W3-F1-model_v6 in PyMOL and visualized it in cartoon, ribbon, and ball-and-stick representations.

Colored by secondary structure, it shows a mixed α/β fold with more helices than β-sheets.

Colored by residue type, hydrophobic residues are enriched in the core (and in a few surface patches), while polar/charged residues are mostly surface-exposed, consistent with solubility.

The surface view shows clear cavities/clefts, consistent with potential binding pockets (e.g., a substrate-binding groove typical of hydrolases).

Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

Question 1

a)

b) The vertical darker columns at certain positions are highly constrained residues where most substitutions are penalized. That usually indicates structural importance (core packing, tight turns, or residues critical for fold stability). Positions with mostly neutral colors across many substitutions are likely surface-exposed or in flexible loops, where the model predicts more tolerance.

Question 2

Latent Space Analysis Use the provided sequence dataset to embed proteins in reduced dimensionality. Analyze the different formed neighborhoods: do they approximate similar proteins? Place your protein in the resulting map and explain its position and similarity to its neighbors.

In progress…

Part D. Group Brainstorm on Bacteriophage Engineering

GROUP MEMBERS

GROUP MEMBERS: Diogo Custodio; Flo Razoux; Katharine Kolin; Mariana Kanbe; Marisa Satsia.

PROJECT MAIN GOAL in discussion: Increased stability (easiest), higher titers (medium), higher toxicity of lysis protein (hard)

My group and I are conducting research for the group phage project. We have set up a shared Google Doc (screenshot below).

Week 5 HW: Protein Design Part 2

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

Question 1

This is the human SOD1 sequence from UniProt (P00441), with the initial Met removed:

ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ FASTA

Introducing the A4V mutation, which is associated with the most aggressive forms of ALS, gives:

ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
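A point substitution such as A4V can be applied programmatically, which also guards against off-by-one errors in the numbering. The helper below is a hypothetical sketch (not part of the course notebooks); it uses 1-based positions on the mature sequence without the initiator Met, matching the numbering used here.

```python
import re


def apply_substitution(seq, mutation):
    """Apply a substitution written like 'A4V' (1-based position) to seq."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wt_aa, pos, new_aa = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != wt_aa:
        raise ValueError(f"expected {wt_aa} at {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + new_aa + seq[pos:]


# First ten residues of the mature SOD1 sequence given above.
wt_start = "ATKAVCVLKG"
print(apply_substitution(wt_start, "A4V"))  # ATKVVCVLKG
```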

Question 2 and 3

With the help of ChatGPT and Gemini, I added two new notebook cells in order to generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.

4 PepMLM-generated 12-aa peptides (conditioned on mutant SOD1):

  1. HRVPVAGVEWWE
  2. WSYYVTAVAHKE
  3. WRYGAAAVEWKE
  4. WSVPVVAIEHGE

Question 4

  1. HRVPVAGVEWWE
  2. WSYYVTAVAHKE
  3. WRYGAAAVEWKE
  4. WSVPVVAIEHGE

5. FLYRWLPSRRGG

Question 5

  • WRYGAAAVEWKE - ppl 4.645 (mean NLL 1.536)
  • WSYYVTAVAHKE - ppl 5.094 (mean NLL 1.628)
  • WSVPVVAIEHGE - ppl 6.423 (mean NLL 1.860)
  • HRVPVAGVEWWE - ppl 7.660 (mean NLL 2.036)
  • Known binder: FLYRWLPSRRGG - ppl 21.391 (mean NLL 3.063)

Interpretation: Perplexity measures how confident PepMLM is in a peptide under its generative model; lower perplexity means higher confidence.
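Perplexity is the exponential of the mean negative log-likelihood (NLL), which is consistent with the paired values reported above. A minimal sketch that recomputes and ranks them from the reported mean NLLs (the dictionary layout is my own, not PepMLM output):

```python
import math

# Mean NLL values as reported above.
mean_nll = {
    "WRYGAAAVEWKE": 1.536,
    "WSYYVTAVAHKE": 1.628,
    "WSVPVVAIEHGE": 1.860,
    "HRVPVAGVEWWE": 2.036,
    "FLYRWLPSRRGG": 3.063,  # known binder
}


def perplexity(nll):
    """Perplexity = exp(mean negative log-likelihood)."""
    return math.exp(nll)


# Rank from most to least confident (lowest perplexity first).
ranking = sorted(mean_nll, key=lambda pep: perplexity(mean_nll[pep]))
for pep in ranking:
    print(f"{pep}: ppl {perplexity(mean_nll[pep]):.2f}")
```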

PepMLM assigns higher confidence to the four generated peptides than to the known binder under this scoring scheme, with WRYGAAAVEWKE ranked best (lowest perplexity).

The known binder has higher perplexity, suggesting it is less consistent with PepMLM’s learned binder distribution for this target, even though it is experimentally reported to bind. This highlights that PepMLM perplexity is not an experimental binding score. Also, it suggests that perplexity alone is insufficient to validate binding.

Because I found this surprising, I looked for checks I could run to rule out an error or artifact:

Test for missing mask token: negative, so all good.

Conclusion: My generated peptides are enriched in W/V/A/Y and look like classic short hydrophobic binders. The known binder FLYRWLPSRRGG has a highly charged tail (RRGG) and a different composition pattern, which the model may assign low probability to even if it binds in reality.

Part 2: Evaluate Binders with AlphaFold3


SOD1 Mutant Sequence (A4V mutation) ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

At first, I mistakenly evaluated all peptides in the same run.

Then I noticed the AlphaFold Server treated that as one multi-chain complex with 6 chains total (SOD1 + 4 generated peptides + the known binder). So, to compare them, I had to run 5 separate jobs.

  1. SOD1 + HRVPVAGVEWWE: ipTM = 0.34; pTM = 0.86

Where does the peptide appear to bind? The peptide is positioned along an external surface of the SOD1 β-strand core, contacting a β-sheet edge/adjacent loop (surface-bound).

  2. SOD1 + WSYYVTAVAHKE: ipTM = 0.22; pTM = 0.81

Where does the peptide appear to bind? The peptide shows weak localization and appears loosely associated with the protein surface, without a clearly defined contact region.

  3. SOD1 + WRYGAAAVEWKE: ipTM = 0.41; pTM = 0.85

Where does the peptide appear to bind? The peptide is placed near a β-barrel edge/loop region on the outer surface of SOD1 (surface-bound).

  4. SOD1 + WSVPVVAIEHGE: ipTM = 0.44; pTM = 0.86

Where does the peptide appear to bind? The peptide is positioned on a distinct surface patch on the β-barrel face/edge, appearing more localized than the others (surface-bound).

  5. SOD1 + FLYRWLPSRRGG (control): ipTM = 0.30; pTM = 0.83

Where does the peptide appear to bind? The peptide contacts the protein surface and appears partially inserted into a shallow surface groove/cleft (partially buried relative to the others).

By ipTM ranking: WSVPVVAIEHGE (0.44) > WRYGAAAVEWKE (0.41) > HRVPVAGVEWWE (0.34) > FLYRWLPSRRGG (0.30) > WSYYVTAVAHKE (0.22).

The observed ipTM values are uniformly low (0.22–0.44), indicating limited AlphaFold3 confidence in any specific peptide–SOD1 interface. Among the PepMLM-generated candidates, WSVPVVAIEHGE (ipTM = 0.44), WRYGAAAVEWKE (0.41), and HRVPVAGVEWWE (0.34) score higher than the known binder FLYRWLPSRRGG (0.30), while WSYYVTAVAHKE (0.22) scores lower. Overall, three of the four PepMLM-generated peptides exceed the known binder by ipTM, but the absolute scores suggest weakly supported, mostly surface-associated binding modes rather than a high-confidence complex.
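The ranking and the comparison against the control can be reproduced from the five reported ipTM values. A small sketch (the data structure is my own; the numbers are the ones listed above):

```python
# ipTM values from the five separate AlphaFold Server jobs reported above.
iptm = {
    "HRVPVAGVEWWE": 0.34,
    "WSYYVTAVAHKE": 0.22,
    "WRYGAAAVEWKE": 0.41,
    "WSVPVVAIEHGE": 0.44,
    "FLYRWLPSRRGG": 0.30,  # known binder (control)
}
CONTROL = "FLYRWLPSRRGG"

# Sort peptides from highest to lowest interface confidence.
ranked = sorted(iptm, key=iptm.get, reverse=True)

# Generated peptides whose ipTM exceeds the control's.
beats_control = [p for p in ranked if p != CONTROL and iptm[p] > iptm[CONTROL]]

print(" > ".join(f"{p} ({iptm[p]:.2f})" for p in ranked))
print(f"{len(beats_control)} generated peptides exceed the control")
```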

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

  1. HRVPVAGVEWWE
  2. WSYYVTAVAHKE
  3. WRYGAAAVEWKE
  4. WSVPVVAIEHGE
  5. FLYRWLPSRRGG (control)

Across all five peptides, PeptiVerse predicts solubility = 1.000 and non-hemolytic behavior (hemolysis probabilities 0.035–0.064), so none of the candidates are flagged as poorly soluble or strongly hemolytic. Predicted binding affinities (pKd/pKi) vary and do not track ipTM: the highest-ipTM peptide (WSVPVVAIEHGE, ipTM 0.44) has the lowest predicted affinity (5.338), while WRYGAAAVEWKE has a higher predicted affinity (6.526) but slightly lower ipTM (0.41).

The known binder (FLYRWLPSRRGG) shows mid-range predicted affinity (5.962) and ipTM (0.30). Considering binding prediction plus safety-like properties, WRYGAAAVEWKE best balances the set: it has the highest predicted affinity (6.526), is predicted soluble (1.000), and has low hemolysis probability (0.047), while still achieving a relatively higher ipTM (0.41) compared to most others.

Peptide to advance: WRYGAAAVEWKE - it is predicted to be soluble, low-hemolysis, and has the strongest predicted binding affinity among the tested peptides, with moderate (though still low-confidence) structural support from AlphaFold3 (ipTM 0.41).

Part 4: Generate Optimized Peptides with moPPIt

I used the moPPIt Colab on a GPU runtime and pasted the A4V mutant SOD1 sequence (mature form without initiator Met). Here’s my Colab copy.

I set binder length to 12 aa and generated a pool of candidate peptides using multi-objective guidance. I enabled affinity guidance and included solubility and hemolysis guidance to bias toward more developable peptides.

| Binder (12-aa) | Solubility | Half-life | Affinity |
|---|---|---|---|
| EWWRERLRQTLI | 0.5833 | 0.5833 | 6.0163 |
| EDWLATLRAATS | 0.5000 | 5.9279 | 5.7517 |
| EEEWRQLQSQYE | 0.8333 | 4.4313 | 6.8902 |
| TEEEGVRWKRGV | 0.7500 | 4.0548 | 6.4628 |
| ELLQWILGITIE | 0.4167 | 13.4681 | 6.1644 |

Compared to PepMLM, moPPIt produces peptides shaped by explicit objectives. PepMLM peptides were more diverse but less controlled with respect to developability properties whereas moPPIt candidates tend to show stronger biases in composition, more consistent physicochemical properties across candidates, and often a narrower “design family” reflecting the guidance constraints. On this run, the moPPIt outputs are more compositionally biased toward charged residues (E/D and R/K), consistent with explicit optimization for solubility and half-life alongside affinity. Here’s a summary interpretation of the results:

  • Best predicted affinity: EEEWRQLQSQYE (6.8902)
  • Best predicted solubility: EEEWRQLQSQYE (0.8333)
  • Best predicted half-life: ELLQWILGITIE (13.4681)
  • Most “balanced” if you prioritize binding + solubility: EEEWRQLQSQYE (top on both, but not top half-life)
  • Most “balanced” if you prioritize half-life strongly: ELLQWILGITIE (best half-life, but lowest solubility)
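Because the three objectives trade off against each other, one way to summarize the table is to list the candidates that no other candidate beats on every metric at once (the Pareto front). The sketch below is my own post-hoc analysis, not part of moPPIt, and assumes higher is better for all three columns.

```python
# (solubility, half-life, affinity) for each moPPIt candidate, from the table above.
candidates = {
    "EWWRERLRQTLI": (0.5833, 0.5833, 6.0163),
    "EDWLATLRAATS": (0.5000, 5.9279, 5.7517),
    "EEEWRQLQSQYE": (0.8333, 4.4313, 6.8902),
    "TEEEGVRWKRGV": (0.7500, 4.0548, 6.4628),
    "ELLQWILGITIE": (0.4167, 13.4681, 6.1644),
}


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the names of candidates not dominated by any other candidate."""
    return [
        name
        for name, vals in scores.items()
        if not any(dominates(other, vals) for o, other in scores.items() if o != name)
    ]


front = pareto_front(candidates)
print(front)
```

On these numbers the front contains the two peptides highlighted in the summary plus EDWLATLRAATS, which is a compromise between half-life and solubility.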

Before any clinical consideration, I would follow a staged evaluation: (1) in silico screening for interface plausibility (AlphaFold3 ipTM/PAE consistency across seeds) plus basic developability predictions (solubility, hemolysis, aggregation risk); (2) in vitro binding assays (SPR/BLI or competition ELISA), stability in serum, and cytotoxicity/hemolysis assays; (3) cell-based assays for functional effect and off-target toxicity; (4) only after robust preclinical evidence, proceed to in vivo PK/PD and safety studies. In other words, moPPIt designs are hypotheses that must be filtered by structural consistency and validated experimentally before any translational claims.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)

Since this was an optional part, I decided to skip it for now.

Part C: Final Project: L-Protein Mutants

Phage Lysis Protein Design Challenge

L-Protein Engineering | Option 1: Mutagenesis

I ran the mutational scoring notebook to obtain per-substitution LLR scores and shortlisted mutations with positive scores.

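| Position | Wild_Type_AA | Mutation_AA | LLR_Score |
|---|---|---|---|
| 50 | K | L | 2.561464 |
| 29 | C | R | 2.395425 |
| 39 | Y | L | 2.241777 |
| 29 | C | S | 2.043149 |
| 9 | S | Q | 2.014323 |
| 29 | C | Q | 1.997047 |
| 29 | C | P | 1.971026 |
| 29 | C | L | 1.960644 |
| 50 | K | I | 1.928798 |
| 53 | N | L | 1.864930 |
| 61 | E | L | 1.818097 |
| 52 | T | L | 1.813966 |
| 50 | K | F | 1.802066 |
| 29 | C | T | 1.797245 |
| 29 | C | K | 1.795876 |
| 5 | F | Q | 1.795244 |
| 5 | F | R | 1.659717 |
| 29 | C | A | 1.648654 |
| 27 | Y | R | 1.628060 |
| 22 | F | R | 1.602028 |
| 5 | F | P | 1.596888 |
| 50 | K | V | 1.594572 |
| 50 | K | S | 1.574555 |
| 5 | F | T | 1.559023 |
| 5 | F | S | 1.556416 |
| 45 | A | L | 1.539248 |
| 39 | Y | S | 1.517457 |
| 27 | Y | S | 1.497052 |
| 40 | V | L | 1.477630 |
| 27 | Y | L | 1.474637 |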

I then intended to cross-check each shortlisted mutation against the experimental mutant dataset (L-Protein Mutants) to see whether the experimental lysis phenotype is directionally consistent with the LLR score.

Only 6 substitutions from my scored shortlist overlapped with the experimental table (C29R, C29S, K50I, K50S, Y27S, Y39S). In the experimental dataset, all overlapping substitutions were labeled as non-lytic (Lysis = 0) despite having positive LLR scores in the notebook. This suggests that, for MS2 L-protein, sequence-only language-model scores may not reliably capture key determinants of lysis (likely influenced by membrane insertion, oligomerization, and host-factor dependence). We should therefore treat LLR scores as a hypothesis generator, not a predictor of functional lysis.
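The cross-check itself is a simple join between the scored shortlist and the experimental labels. A minimal sketch: the LLR values are a subset of the shortlist above, the lysis labels are the six overlapping entries as reported (all 0), and the dictionary layout and the rule "positive LLR predicts lysis" are assumptions made for illustration.

```python
# Subset of the LLR shortlist above, keyed as <WT><position><mutant>.
llr_scores = {
    "K50L": 2.561464, "C29R": 2.395425, "C29S": 2.043149, "S9Q": 2.014323,
    "K50I": 1.928798, "K50S": 1.574555, "Y39S": 1.517457, "Y27S": 1.497052,
}

# Experimental labels for the overlapping substitutions (Lysis = 0 is non-lytic).
experimental_lysis = {
    "C29R": 0, "C29S": 0, "K50I": 0, "K50S": 0, "Y27S": 0, "Y39S": 0,
}


def cross_check(scores, lysis):
    """For mutations present in both tables, flag directional agreement."""
    rows = []
    for mut in sorted(set(scores) & set(lysis)):
        predicted_lytic = scores[mut] > 0   # assumed reading of a positive LLR
        observed_lytic = lysis[mut] == 1
        rows.append((mut, scores[mut], lysis[mut], predicted_lytic == observed_lytic))
    return rows


rows = cross_check(llr_scores, experimental_lysis)
n_consistent = sum(1 for row in rows if row[3])
print(f"{n_consistent}/{len(rows)} overlapping substitutions are directionally consistent")
```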

I selected five single-point variants, including two mutations in the soluble region (positions 1–40) and three in the transmembrane region (TM) (positions 41–75), as required.

WT (MS2 L-protein, 75 aa): METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

All five substitutions have positive LLR scores.

Here are the 5 mutants I chose:

Mutant 1 - S9Q (soluble, LLR = 2.014)

Sequence: METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT Rationale: High positive score in the soluble region (putative DnaJ-interaction domain). Ser→Gln increases hydrogen-bonding potential and may alter surface chemistry without strongly destabilizing the fold.

Mutant 2 - C29R (soluble, LLR = 2.395)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT Rationale: One of the strongest positive-scoring substitutions in the soluble region. Adds a positive charge that could reshape chaperone-recognition or interaction surfaces.

Mutant 3 - A45L (TM, LLR = 1.539)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT Rationale: Hydrophobic substitution in the transmembrane segment. Ala→Leu increases hydrophobicity and may stabilize membrane helix packing/insertion and oligomer stability.

Mutant 4 - T52L (TM, LLR = 1.814)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT Rationale: Polar→hydrophobic change in the TM region. Thr→Leu may increase membrane compatibility and reduce local insertion/misfolding penalties.

Mutant 5 - N53L (TM, LLR = 1.865)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT Rationale: Polar→hydrophobic change in the TM region with a strong positive score. Selected as an additional TM-stabilizing candidate.
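The region constraint (two soluble, three transmembrane) and the wild-type residue at each position can be checked automatically. A small sketch, using the 1–40 / 41–75 boundary stated above; the function name is hypothetical.

```python
import re
from collections import Counter

# MS2 L-protein wild type (75 aa), as given above.
WT = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
SOLUBLE_END = 40  # positions 1-40 soluble, 41-75 transmembrane (per the assignment)


def classify(mutation, wt=WT):
    """Return 'soluble' or 'TM' after checking the stated wild-type residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wt_aa, pos = match.group(1), int(match.group(2))
    if not 1 <= pos <= len(wt) or wt[pos - 1] != wt_aa:
        raise ValueError(f"{mutation}: wild type does not have {wt_aa} at {pos}")
    return "soluble" if pos <= SOLUBLE_END else "TM"


chosen = ["S9Q", "C29R", "A45L", "T52L", "N53L"]
regions = Counter(classify(m) for m in chosen)
print(dict(regions))
```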

Week 6 HW: Genetic Circuits Part 1: Assembly Technologies

Assignment: DNA Assembly

Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion High-Fidelity PCR Master Mix is a 2X, ready-to-use mixture where the exact formulation is partly proprietary, but the functional components are documented in the manufacturer’s manual:

| Component (Phusion 2X Master Mix) | Purpose |
|---|---|
| Phusion High-Fidelity DNA Polymerase | DNA synthesis with high fidelity + proofreading |
| dNTPs (dATP, dCTP, dGTP, dTTP) | Building blocks for new DNA strands |
| HF reaction buffer (salts + pH buffer) | Maintains optimal pH/ionic strength for enzyme function |
| Mg2+ (via buffer system; often MgCl2-derived) | Essential polymerase cofactor |
| Stabilizers / additives (partly proprietary) | Improve enzyme stability and consistency |
| Nuclease-free water | Solvent to reach correct 2X working concentrations |

Reference: Thermo Fisher Phusion High-Fidelity DNA Polymerase Product Information Sheet; standard biochemistry manuals (e.g., Sambrook & Russell).

Question 2: What are some factors that determine primer annealing temperature during PCR?

| Determinant | Effect on Ta | Why |
|---|---|---|
| Primer melting temperature (Tm) | Increase | Higher Tm means stronger duplex stability, needs higher Ta |
| Primer length | Increase | More base pairs → higher Tm → higher Ta |
| Primer GC% | Increase | GC pairs stabilize duplex more than AT |
| Salt (Na+/K+) concentration | Increase | Screens charges, stabilizes duplex, raises Tm |
| Mg2+ concentration | Increase | Stabilizes primer-template binding; raises effective Tm |
| Primer-template mismatches (more / at 3′ end) | Decrease | Destabilizes duplex; lower Ta needed to anneal |
| Degenerate bases (more degeneracy) | Decrease | Lowers effective match/Tm; often requires lower Ta |
| GC-rich template / strong secondary structure | Decrease | Competes with primer binding; often use lower Ta + additives |
| DMSO / betaine / similar GC additives | Decrease | Reduce duplex stability (esp. GC), lowering effective Tm |
| Need for higher specificity (reduce off-targets) | Increase | Higher Ta increases stringency, reduces non-specific binding |
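To make the length and GC% rows concrete, here is a rough estimate using the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) °C, with the annealing temperature set a few degrees below the lower primer Tm. This is a rule of thumb for short oligos only; vendor calculators use nearest-neighbor models and salt corrections and may recommend different offsets. The primer sequences and the 5 °C offset are illustrative assumptions.

```python
def wallace_tm(primer):
    """Wallace-rule melting temperature estimate in degrees C (short oligos only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def suggested_ta(fwd, rev, offset=5.0):
    """Annealing temperature: lower primer Tm minus an assumed offset."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset


# Hypothetical primers, for illustration only.
fwd = "ATGACCATGATTACGGATTC"
rev = "TTATTTTTGACACCAGACCA"
print(wallace_tm(fwd), wallace_tm(rev), suggested_ta(fwd, rev))
```

Adding a G or C to a primer raises the estimate by 4 °C and adding an A or T by 2 °C, which mirrors the first three rows of the table.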

Question 3: There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

| Aspect / Decision point | PCR (amplification) | Restriction enzyme (cutting) |
|---|---|---|
| What it does | Amplifies a defined region between two primers | Cuts existing DNA at specific recognition sites |
| Input | Template DNA + primers | DNA substrate (plasmid/PCR product/genomic DNA) + restriction enzyme(s) |
| Key reagents | Polymerase mix, primers, dNTPs, buffer, Mg2+ | Restriction enzyme(s), buffer, often BSA (enzyme-dependent) |
| Protocol core steps | Denature → anneal → extend (cycling) | Incubate DNA with enzyme(s) at recommended temperature/time |
| Sequence requirements | Need primer-binding sites flanking target | Need the enzyme recognition site(s) present in the DNA |
| Output fragment boundaries | Defined by primer positions (base-precise) | Defined by cut sites (exact where enzyme cleaves) |
| Can create new sequences? | Yes - primers can add overhangs/tags/sites | No - only cuts at existing sites (unless sites were engineered earlier) |
| Typical use cases | Generate a specific insert, add adapters, site-directed changes, amplify from low-abundance template | Linearize a plasmid, excise an insert, diagnostic mapping, generate compatible ends for cloning |
| Speed / setup | Moderate - requires optimization (Ta, primers) | Fast/simple if sites exist and enzyme conditions are known |
| Failure modes | Non-specific bands, primer-dimers, no amplification, PCR errors | Star activity (wrong cuts), incomplete digestion, missing sites |
| Fidelity / errors | Depends on polymerase; can introduce mutations | No replication - does not introduce point mutations |
| When preferable | When you need a specific fragment and/or to add features (overhangs, tags), or template amount is low | When the fragment is already present and flanked by useful sites; when you need clean linearization/excision without amplification |

Question 4: How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

| Check / requirement | What to do (PCR + digest) | Why it matters for Gibson |
|---|---|---|
| 20–40 bp overlaps at every junction | Design primers so each fragment end has 20–40 bp homology to the adjacent fragment/backbone | Gibson assembly depends on annealing of complementary overlaps |
| Correct orientation of overlaps | Ensure the overlap sequence matches the correct neighbor (A→B, B→C, insert→vector, etc.) | Wrong overlap = wrong assembly or no assembly |
| Linearized backbone | Restriction-digest the vector to a single linear band; gel-purify if needed | Gibson requires a linear backbone (no undigested circular plasmid carryover) |
| Remove template plasmid from PCR | If PCR was from plasmid, treat with DpnI (cuts methylated template) | Prevents parental plasmid background colonies |
| Clean fragment ends (no inhibitors) | Purify PCR and digest products (spin column or gel extraction) | Salts, ethanol, detergents inhibit Gibson enzymes |
| Correct fragment sizes | Run an agarose gel to confirm expected sizes; excise/gel-purify correct bands if mixed | Verifies you’re assembling the intended pieces |
| Avoid duplicate/competing overlaps | Keep overlaps unique (no repeated identical overlap sequences across multiple junctions) | Prevents mis-assembly and rearrangements |
| Overlap doesn’t create strong hairpins/repeats | Check overlap sequences for high secondary structure/repeats | Improves annealing and reduces drop in assembly efficiency |
| Balanced fragment concentrations | Quantify DNA (Nanodrop/Qubit) and use equimolar amounts; keep total DNA in recommended range | Too much/too little of one piece reduces correct assembly |
| No internal cuts from chosen restriction enzymes | Verify your insert/parts don’t contain the restriction sites used to linearize the vector | Prevents unintended fragmentation or loss of insert |
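The first check in the table (20–40 bp of shared sequence at every junction) is easy to verify in silico before ordering primers. A minimal sketch, assuming both fragments are written 5′→3′ on the same strand; the sequences are made up for illustration.

```python
def shared_overlap(left, right, min_len=20, max_len=40):
    """Longest suffix of `left` equal to a prefix of `right`, within the size range.

    Returns an empty string when no overlap of at least min_len exists.
    """
    left, right = left.upper(), right.upper()
    longest = min(max_len, len(left), len(right))
    for k in range(longest, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


# Hypothetical 24-nt homology region shared by two adjacent fragments.
homology = "GATTACAGGCTTAACCGGTATCGA"
fragment_a = "CCCCC" + homology
fragment_b = homology + "AAAAA"

overlap = shared_overlap(fragment_a, fragment_b)
print(len(overlap), overlap)
```

Running the same check on every junction, including insert-to-vector at both ends, catches orientation mistakes, since a reversed fragment will report no overlap.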

Question 5: How does the plasmid DNA enter the E. coli cells during transformation?

Plasmid DNA enters E. coli cells during transformation through transient permeability of the cell envelope. This can happen either via:

  • Electroporation: a short electric pulse creates temporary membrane pores that let DNA pass into the cytoplasm.
  • Chemical (heat-shock) transformation: divalent cations (e.g., Ca²⁺) reduce electrostatic repulsion between DNA and the membrane, and a brief heat shock promotes DNA uptake through temporary pores/defects.

Question 6: Describe another assembly method in detail (such as Golden Gate Assembly)

a) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Golden Gate Assembly is a molecular cloning technique that allows multiple DNA fragments to be assembled simultaneously in a single reaction. It uses Type IIS restriction enzymes such as BsaI, BsmBI, or BbsI, which cut DNA outside their recognition sequence and generate custom sticky ends. By placing Type IIS restriction sites around each fragment and designing specific 4-bp overhangs that are complementary only to the intended neighboring fragment, the order and orientation of assembly are precisely controlled. During the reaction, the restriction enzyme digests the DNA fragments while T4 DNA ligase simultaneously ligates matching overhangs in the same tube, making the process efficient and rapid. Because the restriction sites are removed during assembly, the correctly assembled construct cannot be cut again, while incorrect products continue to be digested, driving the reaction toward the desired product. The reaction is typically performed in a thermocycler alternating between ~37 °C (optimal for digestion) and ~16 °C (optimal for ligation). This method is widely used in synthetic biology because it enables scarless assembly of many DNA parts, although internal Type IIS restriction sites must first be removed, usually by silent mutation(s).

Golden Gate Assembly – Step-by-Step Diagram

Step 1: Design fragments with Type IIS sites

Vector: [BsaI]─────────────[BsaI]

Fragment A: [BsaI]──Part A──[BsaI]

Fragment B: [BsaI]──Part B──[BsaI]

Inward-facing BsaI sites. Overhangs are designed to match the next fragment.

Step 2: Type IIS cuts outside recognition sites

Vector: GCTT-----

Fragment A: -----AATG (overhang)

Fragment B: AATG-----CGAA (overhangs)

Recognition sites (BsaI) are removed on small excised pieces.

Step 3: Annealing of fragments

Vector -----GCTT

Fragment A GCTT-----AATG

Fragment B AATG-----CGAA

Overhangs anneal only to the correct partner. Orientation is fixed.

Step 4: Ligase seals fragments

Final construct:

Vector ── Fragment A ── Fragment B

Scarless assembly. BsaI sites are gone, so the construct is stable.

Step 5: Reaction drives correct assembly

  • Misassembled fragments still have exposed BsaI sites → cut again
  • Correct product accumulates over multiple cycles

Key Points:

  • Modular → promoters, RBS, genes, terminators
  • Multi-fragment assembly in one tube
  • Order & orientation controlled by 4-bp overhangs
  • Scarless final product
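The way 4-bp overhangs fix the order of parts can be sketched as a chain lookup: each part's right overhang must match the next part's left overhang. The overhang sequences below are the ones used in the Step 3 diagram; the function and data layout are my own illustration, not a cloning tool.

```python
def assembly_order(parts, start_overhang, end_overhang):
    """Chain parts by matching overhangs, from the vector's start to its end.

    `parts` maps a part name to its (left_overhang, right_overhang).
    """
    by_left = {}
    for name, (left, right) in parts.items():
        if left in by_left:
            raise ValueError(f"overhang {left} is used by more than one part")
        by_left[left] = (name, right)

    order, current = [], start_overhang
    while current != end_overhang:
        if current not in by_left:
            raise ValueError(f"no part starts with overhang {current}")
        name, current = by_left.pop(current)
        order.append(name)
    if by_left:
        raise ValueError(f"unused parts: {sorted(n for n, _ in by_left.values())}")
    return order


# Parts deliberately listed out of order; the overhangs alone determine the result.
parts = {
    "Fragment B": ("AATG", "CGAA"),
    "Fragment A": ("GCTT", "AATG"),
}
order = assembly_order(parts, start_overhang="GCTT", end_overhang="CGAA")
print(order)
```

The duplicate-overhang check reflects the design rule that every junction needs a unique overhang, otherwise more than one assembly order becomes possible.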

b) Model this assembly method with Benchling or a similar tool!

I imported the pBBR1MCS-5 sequence as circular DNA (pBBR1MCS-5 (raw)) and imported phaA, phaB, phaC as separate linear DNA sequences.

I checked for internal BsaI sites (GGTCTC) in all sequences: the genes have no BsaI sites, and pBBR1MCS-5 has a single BsaI site, so it is not a Golden Gate destination vector by direct digest. To model Golden Gate anyway, I created a PCR-linearized Golden Gate backbone: I duplicated the plasmid and saved a linear version (pBBR1MCS-5_GG_backbone).
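The same BsaI check can be scripted so it covers both strands and, for plasmids, sites that span the origin. A minimal sketch in plain Python (Benchling does this search for you; the function names here are hypothetical):

```python
BSAI_SITE = "GGTCTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq, site=BSAI_SITE, circular=False):
    """Return sorted (1-based start, strand) hits for a recognition site."""
    seq = seq.upper()
    # For circular DNA, append the first len(site)-1 bases so that
    # sites spanning the origin are also found.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = search.find(motif)
        while start != -1:
            hits.append((start + 1, strand))
            start = search.find(motif, start + 1)
    return sorted(hits)


# Toy sequence with one site on each strand.
print(find_sites("AAGGTCTCTTGAGACCAA"))
```

For the plasmid, calling `find_sites(sequence, circular=True)` would be the appropriate mode, while the linear gene sequences can use the default.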

On this linear backbone, I created two endpoint annotations (first ~20 bp and last ~20 bp) to represent that PCR primers would add inward-facing BsaI sites + 4 bp overhangs:

  • start: BsaI + Overhang OH1 (added by PCR primer)
  • end: BsaI + Overhang OH4 (added by PCR primer)

To simplify the Benchling model, I represented Golden Gate flanks (inward-facing BsaI sites and 4-bp overhangs) as annotations rather than explicitly adding the flanking sequences. In a real build, these flanks would be introduced via PCR primers or synthesis.

I duplicated each gene to create Golden Gate-ready parts (phaA (codon optimized) anotated, phaB (codon optimized) anotated and phaC (codon optimized) anotated) and defined the assembly overhang scheme for directional order. For each gene, I added annotations with intended Golden Gate junction overhangs:

  • Left end: Intended Golden Gate overhang: OH1 (conceptual)
  • Right end: Intended Golden Gate overhang: OH2 (conceptual)

Overhangs were not added as literal sequences; I only annotated the first/last 20 bp to indicate where BsaI-generated 4 bp overhangs would be introduced via primers/synthesis.

For a simplified Golden Gate model in Benchling, I manually constructed the final plasmid sequence by opening pBBR1MCS-5 at the MCS and concatenating the backbone with phaA–phaB–phaC in the intended order. Overhangs/Type IIS flanks were represented as annotations only.

Assignment: Asimov Kernel

Asimov Kernel notes and all material are on my repo “Kanbe-Mariana-HW6”. Below is just some of the information, but please have a look at the Kernel directly.

HW6: Asimov Kernel Exercises 1,2:

Exercise 3:

Finding the “Bacterial Demos” public repo

I started analysing the constructs with the Repressilator.

This is the description: “This is a repressilator genetic circuit. It consists of 3 transcription units, where the CDS in each is a repressor that represses the promoter in the next transcription unit. This results in an oscillation of the concentrations of the 3 proteins.”
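The ring-of-repressors mechanism in that description can be illustrated with a textbook-style toy model. This is a minimal sketch under stated assumptions, not the model used by the Asimov Kernel simulator: it is a dimensionless, protein-only system in which each protein is produced at rate alpha / (1 + repressor^n) and decays at unit rate, integrated with a simple Euler scheme. The parameter values (`alpha`, `hill_n`, the initial concentrations) are arbitrary choices for illustration.

```python
def simulate_repressilator(alpha=50.0, hill_n=3.0, dt=0.01, t_end=100.0,
                           p0=(1.0, 1.5, 2.0)):
    """Euler integration of a 3-node ring of repressors (dimensionless units).

    Protein i is repressed by protein i - 1; the index wraps around,
    so protein 0 is repressed by protein 2.
    """
    p = list(p0)
    trace = [tuple(p)]
    for _ in range(int(round(t_end / dt))):
        dp = [alpha / (1.0 + p[i - 1] ** hill_n) - p[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(tuple(p))
    return trace


def late_amplitude(trace, index=0):
    """Peak-to-trough range of one protein over the second half of the run."""
    tail = [state[index] for state in trace[len(trace) // 2:]]
    return max(tail) - min(tail)


cooperative = simulate_repressilator(hill_n=3.0)      # steep repression
non_cooperative = simulate_repressilator(hill_n=1.0)  # shallow repression

print("late amplitude, n=3:", late_amplitude(cooperative))
print("late amplitude, n=1:", late_amplitude(non_cooperative))
```

In this toy model sustained oscillation depends on sufficiently steep (cooperative) repression; with a shallow response the same loop settles to a steady state instead.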

These 3 constructs have 3 different promoters, which generate different genetic ←→ phenotypic outputs:

  • J23117 Promoter: A transcription unit with a weak promoter.
  • J23101 Promoter: A transcription unit with a strong promoter.
  • J23106 Promoter: A transcription unit with a medium promoter.

Using the Simulation feature, I simulated the repressilator with the following parameters:

  • Chassis: E. coli
  • Duration: 408 hours
  • Timestep: 60 min
  • Transfection: Transient transfection

This was the output:

Summary of the findings:

The simulation shows rapid initial accumulation followed by relatively stable RNA and protein concentration ranges over time, while endpoint RNAP and ribosome fluxes differ substantially among the three transcription units. The construct driven by J23101 (strong promoter) shows the highest activity, the one driven by J23106 (medium promoter) shows intermediate activity, and the one driven by J23117 (weak promoter) shows the lowest activity.

Exercise 4: Repressilator reconstructions

I recreated the Repressilator in the empty construct using parts from the Characterized Bacterial Parts repository.

First, I used the Search function in the right-hand menu to find the required bacterial parts. Then, I dragged and dropped the selected parts into the empty construct to assemble the circuit. The final design reproduced the three-transcription-unit repressilator architecture.

After building the construct, I used the Simulator by clicking the play button to test its behavior. I then compared the simulation output with the original Repressilator Construct available in the Bacterial Demos repository.

Repressilator Reconstruction 1

I replaced pLacI (regulated by LacI) with pTetR (regulated by TetR) in the first unit, while all other simulation parameters were kept the same. That means the input regulator of that node changed, but the overall loop structure is preserved.

The goal was to observe whether changing the promoter identity altered the resulting RNA concentrations, protein concentrations, RNAP flux, or ribosome flux compared with the original repressilator design.

Using the Simulation feature, I simulated the new pTetR repressilator with the same parameters as before:

  • Chassis: E. coli
  • Duration: 408 hours
  • Timestep: 60 min
  • Transfection: Transient transfection

This was the output:

Summary of the findings:

The simulation output looks the same because, from the model’s perspective, the system is still a symmetric 3-repressor cycle: each node still produces a repressor and represses the next node. The dynamics therefore remain qualitatively equivalent.

Repressilator Reconstruction 2:

To experiment with a cyclic repression topology different from TetR → LacI → LambdaCI → TetR, I tried these changes:

  • Replace pLambdaCI with pLacI: to make two transcription units use the same promoter and see how that would affect the circuit’s behavior.
  • Replace pLacI with pLambdaCI: to test what happens when I switch which repressor controls that transcription unit.
  • Replace TetR CDS with LacI CDS: to see how the simulation changes when one repressor is replaced by another and the circuit has less repressor diversity.

I then re-ran the simulation, and these were the plots:

The modified circuit converges to a steady state dominated by LambdaCI, with LacI and TetR near zero, and no oscillatory behavior observed.

Exercise 5

Construct 1

I designed this construct to test high constitutive expression using the strong J23101 promoter placed upstream of LacI, with an A1 RBS to support translation and an L3S2P24 terminator to end transcription. My rationale was to build a simple bacterial circuit with no regulatory feedback, so I would expect continuous LacI expression and relatively high, stable RNA and protein levels in the simulation.

The simulation of this first construct shows rapid initial expression followed by a stable steady state. RNA concentration increases quickly and stabilizes at approximately 0.8 relative units, while protein concentration stabilizes at approximately 0.65. RNAP and ribosome flux are constant, indicating continuous transcription and translation. This matches the expectation for a constitutive expression construct driven by the strong J23101 promoter.

Construct 2

The second construct shows significantly lower expression compared to the first. RNA concentration stabilizes at approximately 0.003 relative units and protein concentration at approximately 0.0025, both much lower than in the strong promoter construct. RNAP and ribosome flux are also reduced. The system still reaches a steady state with constant expression over time, indicating that changing the promoter strength affects the magnitude of expression but not the overall behavior.

Construct 3

For the third construct, I copied the Self-regulating Circuit from the Bacterial Demos repository into my workspace and ran the simulation without modifying its structure. This allowed me to observe the behavior of a circuit with built-in feedback regulation and compare it with the constitutive expression constructs.

The self-regulating circuit shows stable expression over time, reaching a steady state without oscillations. RNA concentration stabilizes at approximately 0.56 relative units and protein concentration at approximately 0.45. RNAP and ribosome flux are constant, indicating continuous but regulated expression. Compared to the constitutive constructs, the expression level is intermediate, reflecting the effect of feedback regulation on maintaining controlled output.

These results show that promoter strength controls expression level, while circuit structure, such as feedback regulation, influences how expression is maintained over time.
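This conclusion can be illustrated with a two-equation toy model of one transcription unit. It is a hedged sketch, not the Kernel's internal model: RNA is made at rate `k_tx` and degraded at rate `d_m`, protein is made at rate `k_tl` per RNA and degraded at rate `d_p`, and all parameter values are arbitrary. The feedback variant assumes negative autoregulation with a Hill-type repression term, which is an assumption about how the self-regulating circuit works.

```python
def constitutive_steady_state(k_tx, k_tl, d_m, d_p):
    """Steady state of dm/dt = k_tx - d_m*m and dp/dt = k_tl*m - d_p*p."""
    m = k_tx / d_m
    p = k_tl * m / d_p
    return m, p


def autorepressed_steady_state(k_tx, k_tl, d_m, d_p, K, n=2.0):
    """Steady state when transcription is scaled by 1 / (1 + (p/K)**n).

    The protein level solves p * (1 + (p/K)**n) = p_unrepressed, found
    here by bisection because the left-hand side increases with p.
    """
    _, p_unrepressed = constitutive_steady_state(k_tx, k_tl, d_m, d_p)
    lo, hi = 0.0, p_unrepressed
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + (mid / K) ** n) < p_unrepressed:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    m = p * d_p / k_tl
    return m, p


# Arbitrary illustrative parameters (not fitted to the simulation output).
strong = constitutive_steady_state(k_tx=2.0, k_tl=1.0, d_m=1.0, d_p=0.5)
weak = constitutive_steady_state(k_tx=0.2, k_tl=1.0, d_m=1.0, d_p=0.5)
feedback = autorepressed_steady_state(k_tx=2.0, k_tl=1.0, d_m=1.0, d_p=0.5, K=1.0)

print("strong promoter (RNA, protein):", strong)
print("weak promoter   (RNA, protein):", weak)
print("with feedback   (RNA, protein):", feedback)
```

In this sketch a weaker promoter rescales both steady-state levels by the same factor without changing the shape of the response, while feedback lowers the level reached from the same promoter.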

Week 7 HW: Genetic Circuits Part 2: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

Question 1

Traditional genetic circuits are usually implemented in Boolean logic (ON/OFF) and hand-designed as fixed logic, so representing nuanced behaviors often requires many gates, sharp thresholds, and careful tuning, which can make designs bulky and brittle. As the number of inputs grows, circuit complexity can explode combinatorially: stacking multiple layers and adding intermediate nodes increases metabolic load, failure points, and sensitivity to part-to-part variability. Also, adapting to new targets or shifting biological context often means redesigning the circuit architecture, not just re-tuning parameters.

Intracellular Artificial Neural Networks (IANNs) are parametric and trainable: you can adjust “weights” to fit a desired behavior from data (calibration/learning), then iterate as conditions change, which is generally a very desirable feature for biological modelling. They are designed to operate on analog inputs, tolerate noise through distributed computation, and approximate complex decision boundaries without enumerating every logic case. This is more consistent with the noisy and complex nature of biological signals.

Question 2

A useful application for an IANN could be a multi-signal “smart probiotic” controller that decides when to express a therapeutic payload in the gut based on a noisy inflammation signature. This could be a proposed pipeline:

  1. Sensors detect several analog inputs. Each can be tied to a measurable intracellular signal (e.g., promoters/sensors responding to nitrate/NO, tetrathionate, ROS, and low pH, read out as a transcription rate or a regulator concentration).

  2. The IANN integrates these signals as weighted contributions and computes a graded output: a continuously tunable expression level of a payload gene (e.g., an anti-inflammatory cytokine mimic, a barrier-protective peptide, or a locally acting enzyme), plus an optional reporter for monitoring.

Instead of requiring all conditions to be “true” or “false,” as Boolean models do, the IANN can implement a “risk score” that turns on strongly only when the combined pattern matches inflammation, while remaining low for benign fluctuations. In practice, you would calibrate the weights using training data from known conditions (healthy vs inflamed models) so the output tracks the probability or intensity of the target state.
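The “risk score” idea corresponds to a single artificial neuron: a weighted sum of normalized sensor signals passed through a sigmoid. The sketch below is only an illustration of that computation, not a biological implementation. The weights, bias, and input values are hypothetical placeholders (in practice they would come from calibration data), and inputs are assumed to be normalized to the range 0 to 1.

```python
import math

# Hypothetical weights and bias; real values would be fitted to training data.
WEIGHTS = {"nitrate": 2.0, "tetrathionate": 3.0, "ros": 2.0, "low_ph": 1.0}
BIAS = -4.0


def risk_score(inputs, weights=WEIGHTS, bias=BIAS):
    """Graded output in (0, 1): sigmoid of the weighted sum of sensor inputs."""
    z = bias + sum(w * inputs[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Made-up example states with inputs normalized to 0..1.
healthy = {name: 0.1 for name in WEIGHTS}
inflamed = {name: 0.9 for name in WEIGHTS}
ros_spike = dict(healthy, ros=0.9)  # one signal high, the rest at baseline

for label, state in (("healthy", healthy), ("ROS spike only", ros_spike),
                     ("inflamed", inflamed)):
    print(f"{label}: {risk_score(state):.3f}")
```

With these placeholder weights a single elevated signal raises the score only modestly, whereas the combined pattern drives it close to 1, which is the behavior a hard AND/OR gate cannot express as a graded output.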

Limitations / failure modes: IANNs still face real biological constraints such as sensor cross-talk and context effects, which can shift input distributions. Also, weights can drift as cells evolve, and metabolic burden can reduce growth or change the very physiology being measured. The dynamic range and noise of biological parts can compress signals, making it hard to separate “moderate” from “high” states without careful normalization and controls. Time dynamics also matter: inputs arrive on different timescales (transcription vs metabolites), so the network may need memory/filters to avoid reacting to transient spikes, which can substantially increase the complexity of the network. Finally, safety and containment become part of the spec: it is important to define an acceptable balance between type I and type II errors, and you would likely need a kill switch and strict limits on maximum output to avoid unintended activation in off-target contexts.
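The point about memory/filters can be made concrete with a first-order low-pass filter (an exponential moving average), the simplest form of memory. This is a hedged sketch of the signal-processing idea only; the smoothing factor, the threshold, and the two input traces are hypothetical, and no claim is made about which biological mechanism would implement the filter.

```python
def low_pass(signal, alpha=0.1):
    """Exponential moving average: y <- y + alpha * (x - y) at each step."""
    y = signal[0]
    filtered = []
    for x in signal:
        y = y + alpha * (x - y)
        filtered.append(y)
    return filtered


THRESHOLD = 0.5  # hypothetical activation threshold on the filtered signal

# Made-up input traces: a 2-step transient spike vs a sustained signal.
transient = [0.0] * 20 + [1.0] * 2 + [0.0] * 20
sustained = [0.0] * 20 + [1.0] * 22

for label, trace in (("transient", transient), ("sustained", sustained)):
    peak = max(low_pass(trace))
    print(f"{label}: filtered peak {peak:.2f}, activates: {peak > THRESHOLD}")
```

The trade-off is latency: a smaller `alpha` rejects longer spikes but also delays the response to a genuine sustained signal.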

Question 3

Assignment Part 2: Fungal Materials

Question 1

Example 1: Mycelium composite foams (grown on agricultural waste)

Used for protective packaging, insulation panels, acoustic damping, and lightweight cores.

Advantages: renewable feedstocks, low-temperature manufacturing, biodegradable or compostable end-of-life, and tunable density via growth conditions.

Disadvantages: mechanical properties can vary batch-to-batch, moisture sensitivity unless coated, and long-term durability and standards testing can be harder than for petrofoams.

Example 2: Mycelium “leather” (mycelium-based sheets)

Used for footwear, bags, apparel, and upholstery as a leather alternative.

Advantages: avoids the animal leather supply chain, potentially lower land and chemical burden, and tunable texture and thickness.

Disadvantages: still often needs finishing steps for durability and water resistance, performance can lag high-grade leather, and cost and scale are still improving.

Example 3: Fungal biocement or mycelium-bound “bio-bricks”

Used for low-load building blocks, interior architectural elements, and decorative panels.

Advantages: low-energy fabrication, can use local waste substrates, lightweight, and potentially lower embodied carbon than fired bricks or some concretes.

Disadvantages: typically not comparable to concrete for structural strength, humidity and fire performance require careful engineering, and regulatory acceptance is slower.

Example 4: Fungal pigments and dyes (fermentation-derived)

Used for textiles, inks, coatings, and cosmetics.

Advantages: renewable production, avoids some petroleum-derived dye routes, and potentially lower toxic byproducts depending on the process.

Disadvantages: stability and colorfastness can be challenging, purification costs can be nontrivial, and some pigment pathways have safety constraints depending on the organism and compound.

Question 2

One may want to tune mycelium architecture (hyphal branching, wall composition, and crosslinking) to achieve specific strength, flexibility, porosity, and water resistance for composite materials. Another application is producing programmable functional materials by engineering fungi to secrete adhesives, hydrophobins, melanin-like coatings, or crosslinking enzymes so the final material is tougher or more water-stable without heavy post-processing.

Beyond material applications, genetically engineered fungi can be used for biosensing if we add genetic circuits that turn on a visible reporter in response to VOCs, toxins, inflammation markers, or pollutants, enabling living “sensor materials.” They can also be used for biomanufacturing high-value enzymes, small molecules, and therapeutics that benefit from eukaryotic processing or secretion, and for bioremediation by enhancing the breakdown of lignin, plastic additives, dyes, PFAS-like contaminants (where feasible), or heavy-metal binding, depending on pathway and safety constraints.

Fungi can be advantageous over bacteria because filamentous growth lets them act as a self-assembling scaffold, so the organism is both the “factory” and the “fabrication method.” They also offer eukaryotic protein processing because fungi handle disulfide bonds, folding, secretion, and many post-translational modifications better than most bacteria, which matters for secreted enzymes and complex proteins. In addition, fungi naturally secrete many enzymes, which is ideal for biomass conversion and environmental breakdown workflows. Another advantage relative to bacteria is metabolic breadth since fungi often tolerate more extreme acidic conditions and diverse feedstocks, and many are strong at producing secondary metabolites.

However, bioprocesses with engineered fungi may have practical limitations compared with bacteria, such as slower growth and iteration, more complex regulation and morphology (heterogeneity in filamentous cultures can make outputs less uniform), and genetic tools that can be trickier because strain engineering and predictable expression are often less plug-and-play than in E. coli.

Assignment Part 3: First DNA Twist Order

Week 9 HW: Cell Free Systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

Exercise 1

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free systems allow full and direct control of reaction conditions and components, enabling rapid and flexible experimentation. Here’s a table with the main advantages of cell-free vs in vivo:

Aspect | Cell-free | In vivo
Environment control | Direct, tunable | Limited by cell physiology
Toxic proteins | Can express | Often lethal to host
Reaction conditions | Precisely adjustable | Fixed intracellular state
Speed | Minutes to hours | Hours to days
Component handling | Add/remove parts | Difficult

Cases where cell-free is more beneficial

  • Expression of toxic proteins (e.g., antimicrobial peptides)
  • Incorporation of non-natural amino acids
  • Expression of membrane proteins with detergents/liposomes
  • Rapid prototyping of genetic circuits

Exercise 2

Main components of a cell-free expression system and their role

Component | Role
Cell extract (lysate) | Provides ribosomes, enzymes, tRNAs
DNA/mRNA | Encodes target protein
Amino acids | Building blocks for protein
Energy system (ATP, GTP) | Drives transcription/translation
Cofactors (Mg²⁺, K⁺) | Maintain enzyme activity
Buffer | Stabilizes pH and environment

Exercise 3

Protein synthesis consumes large amounts of ATP and GTP. Because cell-free reactions lack the metabolic machinery of living cells, these energy molecules are rapidly depleted unless they are regenerated, which causes protein synthesis to stop and reduces yield.

A common way to maintain ATP supply is the phosphoenolpyruvate (PEP) system, in which PEP donates a phosphate group to ADP via pyruvate kinase to regenerate ATP: PEP + ADP → pyruvate + ATP. Other ATP regeneration strategies include the creatine phosphate system, in which creatine phosphate transfers a phosphate to ADP via creatine kinase to rapidly regenerate ATP, and glucose-based systems, in which glucose is metabolized through enzymatic pathways to continuously produce ATP over longer reaction times.

PEP and creatine phosphate favor speed and simplicity, whereas glucose-based systems are better suited for longer and more sustainable reactions. Unless the process clearly requires extended reaction time, I would start with the PEP system because it typically delivers faster and higher ATP regeneration with a relatively simple setup.

Exercise 4: Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic vs eukaryotic cell-free systems

Aspect | Prokaryotic | Eukaryotic
Speed | Fast | Slower
Cost | Lower | Higher
Protein folding | More limited | Better for complex proteins
Post-translational modifications | Minimal | Present or more compatible
Best suited for | Simple proteins | Complex eukaryotic proteins

Prokaryotic cell-free systems such as E. coli are faster and less expensive, making them suitable for producing simple proteins that do not require complex folding or post-translational modifications, such as GFP. In contrast, eukaryotic systems are slower and more costly but are better suited for proteins that require proper folding, disulfide bond formation, or eukaryotic processing, such as human antibody fragments.

Exercise 5

To optimize membrane protein expression in a cell-free system, I would design the reaction to include a membrane-like environment during synthesis, using detergents or liposomes to maintain solubility and support proper insertion. I would also optimize reaction conditions such as magnesium concentration and temperature, and add chaperones if necessary, to reduce misfolding and improve overall yield, because membrane proteins are especially prone to misfolding and insolubility in aqueous systems.

Challenge | Why it occurs | Experimental strategy | Expected benefit
Misfolding | Membrane proteins contain hydrophobic regions | Add chaperones; optimize temperature | Improves correct folding
Aggregation | Hydrophobic segments interact in solution | Add mild detergents (e.g., DDM) | Keeps protein soluble during synthesis
Insolubility | No native membrane is present | Add liposomes or nanodiscs | Provides membrane-like environment
Low insertion | Protein cannot embed properly in aqueous media | Include membrane mimics during expression | Supports insertion and stabilization
Poor yield | Reaction conditions may be suboptimal | Optimize Mg²⁺ and reaction conditions | Increases expression efficiency and stability

Exercise 6: Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Low yield in a cell-free system can result from insufficient transcription, depletion of ATP, degradation of the expressed protein, or poor folding conditions. Troubleshooting should therefore target the limiting step directly: improve template quality if transcription is weak, reinforce energy regeneration if the reaction stalls, inhibit proteases if degradation is suspected, and optimize temperature or folding support if the protein is unstable or misfolded.

Homework question from Kate Adamala

I would design a phospholipid vesicle-based synthetic minimal cell that uses the blue-light regulator EL222 to activate expression of the tyrosinase gene melA, producing melanin as a visible record of cumulative light exposure.

Question 1

A light-exposure logging synthetic minimal cell for integration into a wearable or material patch.

a)

input:

  • the synthetic cell would detect blue/visible light and respond by producing melanin
  • a realistic light-sensing module is EL222, a one-component blue-light activated transcription factor from Erythrobacter litoralis that binds DNA upon illumination

output:

  • gradual, visible darkening that records cumulative exposure over time.
  • a realistic pigment-output gene is melA, a tyrosinase gene from Rhizobium etli that has been used to generate melanin in E. coli.

b) This function could be only partially realized by cell-free Tx/Tl alone. In bulk cell-free solution, the circuit could still produce melanin, but without encapsulation it would not behave as a discrete synthetic minimal cell and would be harder to localize, stabilize, or integrate into a material as a spatially resolved light-logging unit.

c) This function could also be realized by a genetically modified natural cell. For example, E. coli can be engineered to express melA and produce melanin. A synthetic minimal cell is preferable if the goal is a compartmentalized, material-compatible system rather than a living replicating microbe.

d) The desired outcome is that the synthetic cell becomes darker as cumulative light exposure increases. In a material, a population of these vesicles would function as a distributed exposure log: more illuminated regions would accumulate more melanin and therefore appear darker than shaded regions.

Question 2

a) The membrane would be a phospholipid vesicle, for example POPC + cholesterol, because that is a standard stable composition for synthetic cell vesicles and is also used in related artificial-cell communication systems.

b) Inside the vesicle, I would encapsulate an E. coli cell-free transcription/translation system; amino acids, NTPs, salts, and cofactors; an ATP regeneration system, such as PEP + pyruvate kinase; L-tyrosine as the melanin precursor; Cu²⁺ as a cofactor for tyrosinase; and DNA encoding the light-response module and melanin-output module.

c) For the Tx/Tl source, a bacterial system is sufficient. The core regulator, EL222, is bacterial, and the output enzyme MelA tyrosinase does not require mammalian-specific post-translational processing to function as a pigment-producing enzyme.

d) The synthetic cell would communicate with the environment mainly through light, which crosses the membrane directly, so no membrane channel is required for the input. To simplify the system, I would preload tyrosine and copper inside the vesicle. If I later wanted continuous substrate exchange from the outside, I could add a pore such as α-hemolysin (Hla), which is commonly used in synthetic-cell communication designs.

Question 3 - Experimental details

a)

  • Lipids: POPC, cholesterol
  • Genes: EL222 from Erythrobacter litoralis as the light-activated transcription factor; melA from Rhizobium etli as the tyrosinase gene for melanin production; optionally, hla for α-hemolysin if external substrate exchange is needed
  • Encapsulated reagents: E. coli cell-free lysate or PURE-like system, amino acids, NTPs, PEP, pyruvate kinase, tyrosine, Cu²⁺

b)

I would measure the function of the system by tracking darkening over time, using image analysis and bulk absorbance measurements. The most direct readout is the increase in visible pigmentation of illuminated vesicles relative to dark controls; microscopy could also be used to compare spatial patterns of melanin accumulation across the material.
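Both readouts reduce to simple arithmetic. The sketch below is a minimal illustration under stated assumptions: images are taken to be grayscale arrays with values from 0 (black) to 255 (white), the pixel values shown are made-up placeholders and not measurements, and the function names are hypothetical. The absorbance function is just the standard definition A = -log10(I / I0).

```python
import math


def mean_intensity(image):
    """Mean pixel value of a grayscale image given as a list of rows."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def darkening_index(sample, dark_control):
    """Fractional drop in brightness relative to the dark control.

    0 means no change; values approaching 1 mean the sample is almost black.
    """
    control_mean = mean_intensity(dark_control)
    return (control_mean - mean_intensity(sample)) / control_mean


def absorbance(transmitted, incident):
    """Absorbance from transmitted and incident light intensity."""
    return -math.log10(transmitted / incident)


# Placeholder 2x2 "images" (illustrative values only).
dark_control = [[200, 200], [200, 200]]
time_course = {
    0: [[200, 200], [200, 200]],
    2: [[180, 170], [175, 175]],
    4: [[150, 150], [150, 150]],
}

for hours, image in time_course.items():
    print(f"t = {hours} h: darkening index {darkening_index(image, dark_control):.3f}")
```

Normalizing to the dark control, instead of reporting raw pixel values, makes time points comparable even if illumination or camera settings drift between imaging sessions.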

Homework question from Peter Nguyen

Application field: Textiles / Fashion

One-sentence pitch: A textile integrated with freeze-dried cell-free melanin-producing modules that develops gradual, skin-adjacent tonal changes in response to light exposure, turning the garment into an exposure-recording surface.

How it works: The material would incorporate localized freeze-dried cell-free reaction zones containing the genetic and enzymatic components needed for melanin production, for example a light-responsive regulator such as EL222 coupled to a melanin-producing gene such as melA. When the textile is activated by hydration, these embedded reaction zones become functional and begin responding to light exposure by expressing tyrosinase and generating melanin from preloaded substrate. Over time, more exposed regions of the garment darken more than shaded or covered regions, creating gradients or “tan-line-like” traces directly in the material. Functionally, the textile behaves less like a conventional dyed fabric and more like a programmable, exposure-sensitive biological film.

Societal challenge or market need: This concept addresses the growing interest in responsive and personalized materials in fashion and design, especially materials that are not just decorative but capable of recording use, environment, or time. It also responds to demand for alternatives to static coloration and conventional dyeing by proposing a material whose visual output is generated biologically in place. Beyond fashion, the same platform could be relevant to design objects or artistic textiles that visibly register environmental exposure.

How to address limitations of cell-free reactions

  • Because freeze-dried cell-free systems require water for activation and are typically limited in duration, I would treat the material as an on-demand activation platform rather than a permanently active textile. The garment could be hydrated only when the user wants to generate a pattern or record a specific exposure event, which also helps manage stability and one-time use.
  • To improve shelf life, the cell-free modules would remain freeze-dried until use and be stored in sealed conditions.
  • To improve localization and handling, they could be embedded in discrete patches, printed zones, or replaceable inserts rather than distributed uniformly across the whole textile. This makes the limitation part of the design logic: the material is activated intentionally, records one event or interval, and then remains as the final artifact.

Homework question from Ally Huang

Background information: Space radiation can damage DNA and reduce the reliability of biological systems used for diagnostics, manufacturing, and environmental sensing during long-duration missions. This is significant because future crews will likely depend on compact biotechnology tools rather than constant resupply from Earth. It is relevant for space exploration because cell-free systems are lightweight, storable, and already attractive for use in resource-limited environments. It is scientifically interesting because it links a basic biological question (how nucleic acid damage affects gene expression) to an applied engineering problem (how to maintain functional biotechnology in space).

Molecular or genetic target: Integrity and expression efficiency of a PCR-amplified sfGFP DNA template after radiation-mimicking UV exposure.

How the target relates to the challenge: The sfGFP DNA template serves as a simple reporter for whether a biologically useful DNA sequence remains functional after damage. If radiation-like exposure degrades the template, BioBits cell-free protein expression should produce less GFP signal. This makes the target directly relevant to the space biology challenge, because many space biotechnology applications depend on DNA templates remaining intact enough to be transcribed and translated. Measuring GFP output therefore provides a practical way to estimate how radiation damage could impair future cell-free diagnostics or production systems used in spacecraft or habitats.

Hypothesis or research goal: My hypothesis is that increasing UV exposure, used here as a classroom-accessible proxy for radiation-induced nucleic acid damage, will reduce the ability of a PCR-amplified sfGFP DNA template to produce GFP in the BioBits cell-free expression system. I further expect that templates protected by a shielding condition, such as melanin-containing film or another UV-blocking barrier, will retain more expression than unprotected templates exposed to the same dose. The reasoning is that DNA damage should interfere with transcription and translation by reducing template integrity, while a protective barrier should lower that damage. The research goal is to test whether cell-free fluorescence output can function as a simple readout of DNA stability under space-relevant stress and whether a lightweight protective strategy improves performance.

Experimental plan:

  • I will amplify an sfGFP template with the miniPCR and divide it into groups:
    • no UV exposure
    • low UV
    • high UV, and
    • UV plus shielding
  • After treatment, each sample will be added to BioBits cell-free reactions. Negative controls will include reactions with no DNA template; positive controls will include unexposed template.
  • GFP fluorescence will be measured with the P51 Molecular Fluorescence Viewer and quantified by image intensity or relative brightness. The main data will be fluorescence level across conditions, which will indicate how template damage affects expression and whether the shielding condition preserves function.
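The quantification step in this plan could look like the following sketch. It is a hedged illustration only: the intensity values are made-up placeholders and not results, the condition names and the function name are hypothetical, and the analysis assumes that the no-DNA reaction gives the background and the unexposed template gives the 100% reference.

```python
from statistics import mean


def relative_expression(readings, negative="no_dna", positive="no_uv"):
    """Background-subtract each condition and scale to the positive control.

    `readings` maps condition name -> list of replicate intensity values.
    Returns condition -> fraction of the unexposed-template signal.
    """
    background = mean(readings[negative])
    reference = mean(readings[positive]) - background
    if reference <= 0:
        raise ValueError("positive control is not above background")
    return {
        condition: (mean(values) - background) / reference
        for condition, values in readings.items()
        if condition != negative
    }


# PLACEHOLDER numbers for illustration only (arbitrary intensity units).
readings = {
    "no_dna": [10, 12, 11],
    "no_uv": [110, 112, 111],
    "low_uv": [80, 82, 81],
    "high_uv": [30, 32, 31],
    "uv_shielded": [90, 92, 91],
}

for condition, fraction in relative_expression(readings).items():
    print(f"{condition}: {fraction:.2f} of unexposed signal")
```

Under the hypothesis, the fractions would fall with UV dose, and the shielded sample would sit above the unshielded sample that received the same dose.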

Homework Part B: Individual Final Project

general info / link for my slide in the CT slide deck

Here’s my slide in the CT slide deck

Title: Engineering Tunable Skin Pigment Expression in Engineered Living Materials

Aim 1: Generate base data on melanogenesis by mapping key pathways and build an initial genetic circuit informed by this base data to produce tunable pigmentation (eumelanin-biased outputs for darker tones and pheomelanin-biased outputs for warmer tones).

Aim 2: Expand and refine the circuit to select the most promising candidates for wet-lab experimentation, and plan the experiments.

Aim 3: Empirical assays to explore how variables such as pigment amount, distribution, and system conditions affect the final material output.

Industry Council Companies: BioFabricate and Cultivarium. I selected them because they each address a different core part of my project: Biofabricate could bring strong expertise in translating embedded melanin-related genetic circuits into a desirable (aesthetic and functional) engineered living material, while Cultivarium is well aligned with the wet-lab side of the project, particularly chassis selection, non-model organism engineering, and the practical challenge of implementing and optimizing the circuit in a host such as Komagataeibacter rhaeticus.

Submit the Final Project selection form.

Started planning how I will write my final project documentation based on the guidelines

To be done by April 10 at 11PM ET. Prepare your first DNA order and put it in the “Twist (MIT)” or “Twist (Nodes)” tab of the 2026 HTGAA Ordering: DNA, Reagents, Consumables spreadsheet, as appropriate.

Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

What to measure?

I will measure visible melanin output in the material as the primary readout of the project.

I want to quantify:

  • Degree of darkening
  • Spatial distribution of pigmentation
  • Stability/Persistence of the pigmentation in the bacterial cellulose / after drying or storage

These measurements are directly relevant because they indicate whether the melanin-producing system is functioning and whether the output is compatible with the intended material application.

How to measure?

a) Initial measurements: Molecular biology

First, to validate the genetic component, I would measure the presence of the designed construct by PCR and confirm the DNA sequence by DNA sequencing. I would use agarose gel electrophoresis to confirm correct DNA assembly before testing expression.

To verify that the melanin-producing pathway is expressed before integrating it into bacterial cellulose, I could then run the construct in a cell-free or microbial test system and use the assay readout to check whether it produces the expected visible darkening.

b) Material measurements:

These are the most direct indicators of whether the melanin-producing system is working and whether the output is useful as a material feature rather than only a biochemical signal.

  • I would first document the material using standardized photography under controlled lighting and then quantify changes in tone by image analysis, comparing pixel intensity or color values across samples and conditions.
  • As a secondary measurement, I would use UV-Vis absorbance or reflectance spectroscopy, if available, to obtain a more objective estimate of pigment accumulation.
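The image-analysis step could be sketched in Python as below. This is only a minimal illustration, assuming each photograph has already been reduced to a list of (R, G, B) pixel values from the region of interest; the luma weights, the function names, and the pixel data are illustrative assumptions, not measurements from the project.

```python
def luminance(pixel):
    """Approximate perceived brightness (0-255) of one (R, G, B) pixel."""
    r, g, b = pixel
    # Rec. 601 luma weights (an assumption; any fixed grayscale conversion works)
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_luminance(pixels):
    """Average brightness over all pixels in a region of interest."""
    values = [luminance(p) for p in pixels]
    return sum(values) / len(values)


def relative_darkening(sample_pixels, control_pixels):
    """Fraction by which the sample is darker than the unpigmented control."""
    control = mean_luminance(control_pixels)
    return (control - mean_luminance(sample_pixels)) / control


# Hypothetical stand-ins for photographed regions (not real data)
control_region = [(200, 200, 200)] * 16   # pale, unpigmented cellulose
sample_region = [(50, 50, 50)] * 16       # pigmented cellulose

print(f"relative darkening: {relative_darkening(sample_region, control_region):.2f}")
```

Reporting darkening relative to an unpigmented control photographed under the same lighting makes samples comparable across imaging sessions.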

Homework: Waters Part 1 — Molecular Weight

Question 1

eGFP (native): ~26.9 kDa

eGFP + LEHHHHHH tag: ~27,875.41 Da

All spaces and line breaks were removed.

Question 2

To calculate the molecular weight of intact eGFP, I selected two adjacent peaks from the LC-MS spectrum at m/z 933.7349 and 965.9684.

Using the adjacent charge state equation, this gives a charge state of approximately 30 for the first peak, meaning the second adjacent peak corresponds to 29. I then used these charge states to calculate the molecular weight from each peak, using the relationship between m/z, charge, and proton mass. This gave values of 27,981.8 Da and 27,983.9 Da, respectively, with an average experimental molecular weight of 27,982.9 Da.

I then compared this experimental value with the theoretical molecular weight of the full eGFP construct, including the LE linker and His tag, which is 28,006.3 Da. The relative error was 0.084%, showing very good agreement between the experimental and predicted values. This indicates that the adjacent charge state method produced an accurate estimate of the intact protein mass.

For the zoomed-in peak near m/z 1474, the charge state can also be reasonably assigned. Based on the experimental molecular weight, a 19+ ion would appear at about m/z 1473.8, which closely matches the observed signal. So yes, the charge state of the zoomed-in peak can be observed, and it is most consistent with z = 19.
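The adjacent charge state calculation above can be reproduced with a short Python sketch. It uses the two m/z values and the proton mass (1.0073 Da) quoted in this section; the function names are my own, and the theoretical mass is the 28,006.3 Da value given above.

```python
PROTON = 1.0073  # Da, proton mass as used in the text


def charge_of_lower_peak(mz_low, mz_high):
    """Charge of the lower-m/z peak, given two adjacent charge states.

    From M = z*(mz_low - H) = (z - 1)*(mz_high - H),
    solving for z gives z = (mz_high - H) / (mz_high - mz_low).
    """
    return (mz_high - PROTON) / (mz_high - mz_low)


def neutral_mass(mz, z):
    """Neutral molecular weight from one peak and its charge."""
    return z * (mz - PROTON)


def expected_mz(mass, z):
    """m/z at which a protein of this mass appears with charge z."""
    return mass / z + PROTON


mz_low, mz_high = 933.7349, 965.9684
z_low = round(charge_of_lower_peak(mz_low, mz_high))   # charge of the 933.73 peak
mass_a = neutral_mass(mz_low, z_low)
mass_b = neutral_mass(mz_high, z_low - 1)
average = (mass_a + mass_b) / 2

theoretical = 28006.3
relative_error_percent = abs(average - theoretical) / theoretical * 100

print(z_low, round(mass_a, 1), round(mass_b, 1), round(average, 1))
print(f"relative error: {relative_error_percent:.3f} %")
print(f"expected m/z for 19+: {expected_mz(average, 19):.1f}")
```

The same `expected_mz` helper can be used to check any other proposed charge state against an observed peak position.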

Homework: Waters Part II — Secondary/Tertiary structure

Question 1

Denaturation unfolds the protein, and this unfolding changes how the protein gets charged during electrospray ionization. In the native state, fewer sites are accessible for protonation, so the protein carries fewer charges and appears at higher m/z values. In the denatured state, more sites are exposed, so the protein can carry more charges, which shifts the signal to lower m/z values.

In the mass spectrum (Figure 2), this shows up clearly. The native protein has a tighter charge state distribution at higher m/z, while the denatured protein has a broader distribution shifted toward lower m/z. So basically, by looking at how the charge state envelope shifts, we can tell whether the protein is folded or unfolded.

Question 2

If we zoom into the peak around m/z ~2800 in the native spectrum, we can determine the charge state by looking at the spacing between the small peaks in the isotope pattern. At high resolution, these peaks are separated by approximately 1/z.

From the inset, the peaks are spaced by about 0.05–0.06 m/z units. Since the spacing is equal to 1/z, this suggests:

z ≈ 1 / 0.05 ≈ 20

So the charge state is approximately 20+.

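As a cross-check against the protein’s mass (~28 kDa): since m/z ≈ M/z, a 20+ ion of the 28 kDa monomer would appear near m/z ≈ 1400, while a peak at m/z ≈ 2800 corresponds to z ≈ 10 for the monomer, with an expected isotope spacing of about 0.1. A spacing of ~0.05 at m/z ≈ 2800 would instead imply a species of roughly 56 kDa. The isotope spacing and the m/z position therefore do not both support 20+ for the monomer, so the spacing in the inset should be re-read before settling on the charge state.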

Homework: Waters Part III — Peptide Mapping - primary structure

Question 1

  • Lysine (K): 20
  • Arginine (R): 6
  • Total K + R: 26
  • Number of tryptic peptides generated: 27

To analyze the eGFP standard, I first reviewed the full amino acid sequence provided, including the LE linker and the C-terminal His-tag (HHHHHH). I then identified all lysine (K) and arginine (R) residues, since trypsin cleaves specifically after K and R residues unless the following amino acid is proline (P).

After counting the residues in the sequence using Benchling, I found a total of 20 lysines (K) and 6 arginines (R), for a combined total of 26 potential trypsin cleavage residues.

Question 2

I also checked whether any of these K or R residues were followed by proline, which would block trypsin cleavage, and I found that none of them were followed by P. Therefore, all 26 sites are valid trypsin cleavage sites. Because each cleavage site divides the sequence into peptide fragments, the total number of peptides expected from complete tryptic digestion is the number of cleavage sites plus one. Based on this, the digest should generate 27 peptides in total.

To double-check this, I pasted the eGFP amino acid sequence into the ExPASy PeptideMass tool, selected trypsin as the digestion enzyme, and used the parameters shown in Figure 4, including 0 missed cleavages, monoisotopic mass, and no modifications. I then clicked “Perform the Cleavage” to generate the predicted list of tryptic peptides and determine the total number of peptides produced.

After manually counting 26 lysine and arginine residues, I expected a total of 27 tryptic peptides. When I ran the sequence in the ExPASy PeptideMass tool, the output showed fewer peptides than expected. However, this is because the tool was set to display only peptides with masses greater than 500 Da, which excludes smaller fragments.
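The cleavage rule used above (cut after K or R unless the next residue is P, with 0 missed cleavages) can be sketched as a small in-silico digest. This is a minimal illustration, not the PeptideMass implementation; the toy sequence is a hypothetical example chosen to include one blocked site, and it is not the eGFP sequence.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # A C-terminal K/R has nothing after it, so it does not create a new peptide
        if residue in "KR" and next_residue and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy sequence (NOT eGFP): the R is followed by P, so that site is blocked
toy_sequence = "MKGEELFTGRPVVKLEHHHHHH"
peptides = tryptic_digest(toy_sequence)
k_plus_r = sum(toy_sequence.count(residue) for residue in "KR")

print(peptides)
print(f"K + R residues: {k_plus_r}, peptides: {len(peptides)}")
```

When no K or R is followed by P and the sequence does not end in K or R, the number of peptides is simply the number of K + R residues plus one, which is the counting argument used for the 27 expected peptides.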

Question 3

To analyze the peptide map, I examined the total ion chromatogram (TIC) in Figure 5a and focused on the retention time window between 0.5 and 6 minutes. I counted only peaks with a relative intensity greater than approximately 10% of the base peak, as specified. Based on this criterion, I observe approximately 18–20 chromatographic peaks between 0.5 and 6 minutes. The exact number depends slightly on how closely overlapping peaks are resolved, particularly in the region between ~2.5 and 3.5 minutes, where several peaks are closely spaced.

Question 4

The chromatogram shows fewer peaks than the number of peptides predicted in Question 2: the full tryptic digest was predicted to generate 27 peptides, while counting only peaks above the 10% relative abundance threshold between 0.5 and 6 minutes gives roughly 20 peaks. This likely means that some peptides are either too low in abundance, too small, or co-elute with other peptides and therefore do not appear as separate visible chromatographic peaks.

Question 5

To analyze the peptide in Figure 5b, I first identified the most intense peak in the spectrum, which appears at m/z ≈ 525.77. I assumed this corresponds to the most abundant charge state of the peptide.

To determine the charge state, I examined the zoomed-in isotope pattern. The spacing between adjacent isotope peaks is about 0.5 m/z unit. Since isotope spacing is approximately equal to 1/z, a spacing of ~0.5 indicates that z ≈ 2. Based on this, I concluded that the most abundant charge state is z = 2+.

Next, I calculated the mass of the singly charged form of the peptide, M+H+, using the relationship:

M+H+ = z(m/z) − (z − 1)(1.0073)

Substituting the values:

M+H+ = 2(525.77) − 1.0073 ≈ 1050.53 Da

So, the peptide has:

m/z ≈ 525.77

charge state z = 2+

M+H+ ≈ 1050.53 Da

This result is consistent with the spectrum, since there is also a peak visible near m/z ≈ 1050.52, which corresponds to the singly charged form of the same peptide.
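The two steps above (charge from isotope spacing, then conversion to the singly charged mass) can be written as a short Python sketch. It uses the values read from the spectrum and the same proton mass of 1.0073 Da; the function names are my own.

```python
PROTON = 1.0073  # Da, proton mass as used in the text


def charge_from_isotope_spacing(spacing):
    """Isotope peaks are ~1/z apart, so z is the rounded inverse of the spacing."""
    return round(1 / spacing)


def singly_protonated_mass(mz, z):
    """Convert an observed m/z at charge z to the [M+H]+ mass."""
    return z * mz - (z - 1) * PROTON


z = charge_from_isotope_spacing(0.5)
mh_plus = singly_protonated_mass(525.77, z)

print(f"z = {z}+, M+H+ = {mh_plus:.2f} Da")
```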

Question 6

From the previous step, I determined that the most abundant ion was at m/z 525.7671 with charge z = 2, which gives a singly charged mass of about M+H+ = 1050.53 Da. In the PeptideMass results, the closest expected peptide mass is 1050.5214 Da, which corresponds to the peptide FEGDTLVNR. Based on that match, I identified the peptide as FEGDTLVNR.

To evaluate the mass accuracy, I compared the experimental mass to the theoretical mass from PeptideMass. Using the exact value labeled in the spectrum, the experimental singly charged mass is 1050.52438 Da, and the theoretical mass is 1050.5214 Da. The mass difference is therefore:

1050.52438 - 1050.5214 = 0.00298 Da

To express the error in ppm, I used:

error (ppm) = (MW_experimental - MW_theory) / MW_theory × 10^6

Substituting the values:

error (ppm) = (0.00298 / 1050.5214) × 10^6 ≈ 2.84 ppm

So the measurement error is about 2.8 ppm, which indicates very good agreement between the measured peptide mass and the theoretical value.
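As a cross-check independent of PeptideMass, the theoretical M+H+ of FEGDTLVNR can be recomputed from standard monoisotopic residue masses, followed by the ppm error. This is a minimal sketch; the residue masses, water mass, and proton mass are standard reference values rounded to five decimals, and the function names are my own.

```python
# Standard monoisotopic residue masses in Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # Da, added once per peptide for the free termini
PROTON = 1.00728   # Da


def peptide_mh_plus(sequence):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def ppm_error(experimental, theoretical):
    """Signed mass error in parts per million."""
    return (experimental - theoretical) / theoretical * 1e6


theory = peptide_mh_plus("FEGDTLVNR")
error = ppm_error(1050.52438, 1050.5214)   # values quoted in the text

print(f"theoretical M+H+ = {theory:.4f} Da")
print(f"mass error = {error:.2f} ppm")
```

The sign of the result indicates the direction of the deviation: a positive error means the measured mass is slightly above the theoretical value.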

Question 7

Figure 6 shows that the amino acid coverage of eGFP is 88%. This means that 88% of the eGFP sequence was confirmed by peptide mapping.

Summary

  • Identified peptide: FEGDTLVNR
  • Experimental M+H+: 1050.52438 Da
  • Theoretical M+H+: 1050.5214 Da
  • Mass error: 2.84 ppm
  • Sequence coverage confirmed by peptide mapping: 88%

Homework: Waters Part IV — Oligomers

I used ChatGPT to help compare the theoretical and experimental subunit masses in the answer below.

To identify the Keyhole Limpet Hemocyanin (KLH)’s oligomeric states in the CDMS spectrum, I used the subunit masses given in Table 1 and multiplied them by the number of subunits expected in each assembly. I then compared those theoretical masses to the labeled peaks in Figure 7.

Here are the results summarized in a table:

Oligomeric species | Theoretical mass | Peak in the CDMS mass spectrum of Keyhole Limpet Hemocyanin (KLH) | Interpretation
7FU Decamer | 3.4 MDa | ~3.4 MDa | This peak is consistent with the expected mass of a 10-subunit 7FU assembly.
8FU Didecamer | 8.0 MDa | ~8.33 MDa | This is the closest and most intense peak, so it is the strongest candidate for the 8FU didecamer.
8FU 3-Decamer | 12.0 MDa | ~12.67 MDa | This peak is reasonably close to the expected tridecamer mass and likely represents a higher-order 8FU assembly.
8FU 4-Decamer | 16.0 MDa | ~16-17 MDa | The weak signal in this region may correspond to the 8FU 4-decamer, although this assignment is more tentative.

Discussion

To interpret the CDMS spectrum, I compared the theoretical oligomer masses calculated from the known KLH subunit masses with the labeled peaks in Figure 7. Based on this comparison, the observed masses are not perfectly identical to the theoretical values, but they are close enough to support these assignments as working hypotheses.

Worked example:

  • For the 7FU decamer (10 units): 7FU subunit mass = 340 kDa

  • Since a decamer contains 10 subunits, the expected mass is: 10 × 340 = 3400 kDa = 3.4 MDa

  • In the spectrum, there is a labeled peak at about 3.4 MDa, so I would assign that peak to the 7FU decamer. This corresponds to a 4.5 mDa from the x axis analysis.

    The slight offsets could reflect experimental uncertainty, heterogeneity in the sample, adducting, or the natural structural complexity of KLH. Overall, my interpretation is that the spectrum supports a mixture of KLH oligomeric states, with the 8FU didecamer appearing to be the predominant species and the larger 8FU assemblies likely representing less abundant higher-order associations.

The 8.33 MDa peak is by far the most intense feature in the spectrum. This suggests that the 8FU didecamer may be the dominant oligomeric state in this sample under the conditions used for CDMS.

In contrast, the peaks assigned to the 8FU 3-decamer and especially the 8FU 4-decamer are much less abundant, which may indicate that these larger assemblies are present only as minor populations or form less stably in solution.
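The comparison between theoretical and observed oligomer masses can be sketched in Python as below. The 7FU subunit mass (340 kDa) and the observed peak positions are taken from this section; the 8FU subunit mass of 400 kDa is an assumption inferred from the 8.0 MDa didecamer value in the table (20 subunits), since Table 1 itself is not reproduced here. The 4-decamer is left out because its observed peak is only given as a range.

```python
# Subunit masses in kDa; the 8FU value is inferred (8.0 MDa / 20 subunits), not read from Table 1
SUBUNIT_KDA = {"7FU": 340, "8FU": 400}

# (label, subunit type, number of subunits, observed peak in MDa)
ASSEMBLIES = [
    ("7FU decamer", "7FU", 10, 3.4),
    ("8FU didecamer", "8FU", 20, 8.33),
    ("8FU 3-decamer", "8FU", 30, 12.67),
]


def theoretical_mda(subunit, n_subunits):
    """Expected assembly mass in MDa for n identical subunits."""
    return SUBUNIT_KDA[subunit] * n_subunits / 1000


def percent_deviation(observed, expected):
    """How far the observed mass sits from the expected mass, in percent."""
    return (observed - expected) / expected * 100


for label, subunit, n, observed in ASSEMBLIES:
    expected = theoretical_mda(subunit, n)
    deviation = percent_deviation(observed, expected)
    print(f"{label}: expected {expected:.1f} MDa, observed {observed} MDa, {deviation:+.1f} %")
```

Expressing the offsets as percentages makes it easier to see that the 8FU assemblies sit a few percent above their theoretical masses, in line with the adducting and heterogeneity explanation above.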

Homework: Waters Part V — Did I make GFP?

Quantity | Theoretical | Observed/measured on the Intact LC-MS | PPM Mass Error
Molecular weight (kDa) | 28.006 | 27.983 (LC-MS, Figure 1) | 836

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

I contributed 7 pixels to the global artwork experiment, helping extend a horizontal yellow line in the top-left area (see screenshot below).

At first, I was cautious and tried to understand the ongoing ideas for each section and whether there was a unifying concept. I considered introducing something new, but ultimately decided to stick with what seemed to be the area’s goal (a horizontal yellow line). For next year, it might be fun to have an in-app chat within the same domain to coordinate contributions more easily and check the current vibes.


Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Question 1

Component Category | Component | Corrected role in the cell-free reaction
Lysate | E. coli Lysate | Provides the endogenous transcription, translation, and metabolic machinery needed for in vitro gene expression.
Lysate | BL21 (DE3) Star Lysate (includes T7 RNA Polymerase) | Provides the same core lysate machinery plus T7 RNA polymerase for strong transcription from T7 promoter templates.
Salts / Buffer | Potassium Glutamate | Helps set intracellular-like ionic conditions that support enzyme activity, ribosome function, and overall reaction performance.
Salts / Buffer | HEPES-KOH pH 7.5 | Maintains reaction pH in the range needed for stable transcription-translation activity.
Salts / Buffer | Magnesium Glutamate | Supplies Mg2+, an essential cofactor for ribosomes, polymerases, and many ATP-dependent enzymes.
Salts / Buffer | Potassium phosphate monobasic | Contributes phosphate and helps maintain buffer balance together with the dibasic form.
Salts / Buffer | Potassium phosphate dibasic | Works with the monobasic form to maintain phosphate buffering and reaction stability.
Energy / Nucleotide System | Ribose | Supports nucleotide metabolism and regeneration pathways rather than serving as the main energy source.
Energy / Nucleotide System | Glucose | Serves as a metabolic energy substrate that helps regenerate ATP through endogenous lysate metabolism.
Energy / Nucleotide System | AMP | Acts as a nucleotide monophosphate precursor that can be phosphorylated into higher-energy adenine nucleotides.
Energy / Nucleotide System | CMP | Acts as a nucleotide precursor that can be converted into CTP for transcriptional needs.
Energy / Nucleotide System | GMP | Acts as a nucleotide precursor that can be converted into GTP for transcription and translation-related processes.
Energy / Nucleotide System | UMP | Acts as a nucleotide precursor that can be converted into UTP for transcriptional needs.
Energy / Nucleotide System | Guanine | Serves as a salvage precursor for guanine nucleotide synthesis.
Translation Mix (Amino Acids) | 17 Amino Acid Mix | Provides most of the amino acid building blocks required for protein synthesis.
Translation Mix (Amino Acids) | Tyrosine | Provides a required amino acid for translation and may also be supplied separately because of formulation or pathway-specific needs.
Translation Mix (Amino Acids) | Cysteine | Provides a required amino acid for translation and is often added separately because of its chemical instability.
Additives | Nicotinamide | Serves as a precursor for NAD-related cofactors that support extract redox metabolism.
Backfill | Nuclease Free Water | Brings the reaction to the target volume without introducing nucleases or contaminants.

Question 2

The 1-hour PEP-NTP system supplies fully activated NTPs and high-energy phosphate (PEP) upfront, enabling fast, high-rate transcription and translation but with limited longevity due to rapid energy depletion.

In contrast, the 20-hour NMP-ribose-glucose system relies on metabolic regeneration, using NMPs and simple substrates (ribose, glucose) that are enzymatically converted into active nucleotides and ATP, trading peak speed for sustained, longer-duration protein production.


Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Question 1

a. Superfolder Green Fluorescent Protein (sfGFP)

Description: a basic (constitutively fluorescent) green fluorescent protein published in 2005, derived from Aequorea victoria. It is reported to be a very rapidly-maturing weak dimer.

sfGFP has very efficient folding and fast maturation (~13 min), allowing it to produce fluorescence quickly and reliably even under suboptimal cell-free conditions. This makes it ideal for early and robust readout.

b. Monomeric Red Fluorescent Protein 1 (mRFP1)

mRFP1: Derived from DsRed, mRFP1 has slow maturation and lower photostability, which delays fluorescence signal and reduces effective brightness in short or energy-limited cell-free reactions.

c. mKusabira-Orange2 (mKO2)

mKO2 has moderate maturation speed but higher sensitivity to photobleaching and environmental conditions, which can reduce signal stability during long incubations or repeated excitation. This protein is relatively acid-sensitive (higher pKa), so its fluorescence can decrease if the cell-free reaction acidifies over time, affecting signal stability.

d. mTurquoise2

This protein has an exceptionally high quantum yield and photostability, making it one of the brightest CFP variants and ideal for strong signal readout even at low expression levels.

e. mScarlet-I

mScarlet-I is optimized for high brightness and improved maturation efficiency among red FPs, enabling stronger signal compared to earlier RFPs, though maturation still limits very early readouts compared to GFP variants.

f. Electra2

As a newer engineered FP (likely optimized variant), its performance is typically influenced by trade-offs between brightness, folding efficiency, and maturation kinetics, meaning signal output depends strongly on how well it folds and matures in the cell-free environment.

Question 2

Hypothesis: For mKO2, increasing the HEPES-KOH buffer concentration and maintaining sufficient glucose in the cell-free mastermix will improve fluorescence over a 36-hour incubation by reducing pH drift and sustaining ATP regeneration.

Rationale: Because mKO2 is relatively acid-sensitive, stronger pH buffering should help preserve fluorescence, while sustained glucose-dependent energy regeneration should support continued protein expression and chromophore maturation, resulting in a higher final fluorescence signal.

Small caveat: glucose can also contribute to acidification depending on the metabolism of the lysate, so the strongest version is really HEPES-KOH + controlled glucose, not just “more glucose.”

Question 3

sfGFP → system calibration (TX-TL health). Melanin has a broad absorbance spectrum, but it absorbs much more strongly at shorter wavelengths (blue/green) than at longer wavelengths (red). Melanin therefore interferes with the optical readout: we will be trying to measure fluorescence in a reaction that is simultaneously getting darker, which creates optical interference across a broad range of wavelengths.

mScarlet-I → expression readout for melA tyrosinase specifically. Its fluorescence is less sensitive to melanin, so it better tracks expression alone (sfGFP: Ex ~488 nm / Em ~510 nm, high overlap with melanin absorbance; mTurquoise2: even worse, in the blue region; mScarlet-I: Ex ~569 nm / Em ~594 nm, less overlap).

Question 4

To optimize the Master Mix design for mScarlet-I in my melA tyrosinase cell-free system, I’d supplement CuSO4, since my enzyme is copper-dependent; HEPES-KOH pH 7.5, as an additional buffer against acidification; and magnesium glutamate, to improve translation capacity.

At first I thought about adding glucose, since it could extend energy regeneration, but then I realized that it may also increase acidification. Since I’m worried about fluorescence readout in a pigment-producing system, I’d prioritize pH stability over extra glucose.

I’d also supplement L-tyrosine, which serves as a functional validation that my protein of interest, MelA tyrosinase, is being expressed and is active.

Master Mix designs to be tested using mScarlet-I and sfGFP:


REACTION 1


My preparation before receiving the email (sent to the address registered on the Forum) with my personal link to participate in the Cell-Free Master Mix Cloud Lab Global Experiment:

  • my melA-tyrosine cell-free system

  • mScarlet-I

Supplement | Volume | Purpose
HEPES-KOH pH 7.5 | 1.0 µL | Buffer against pH drift over 36h, helping preserve mScarlet-I fluorescence and MelA activity.
L-tyrosine | 0.75 µL | Provides additional substrate for MelA-driven melanin-like pigment production.
CuSO4, very low concentration | 0.25 µL | Supports MelA tyrosinase activity as a copper-dependent enzyme while minimizing toxicity/inhibition.

MelA-specific bottlenecks: tyrosine substrate, copper cofactor

Increasing buffering capacity with HEPES-KOH also seems like a good idea, because prolonged cell-free reactions coupled with melanin production lead to progressive acidification, which can reduce fluorescent protein signal, impair MelA activity, and shorten the productive lifetime of the TX-TL system.


REACTION 2


  • my melA-tyrosine cell-free system

  • sfGFP

Supplement | Volume | Purpose
HEPES-KOH pH 7.5 | 1.0 µL | Buffer against pH drift over 36h, helping preserve sfGFP fluorescence and MelA activity.
L-tyrosine | 0.75 µL | Provides additional substrate for MelA-driven melanin-like pigment production.
CuSO4, very low concentration | 0.25 µL | Supports MelA tyrosinase activity as a copper-dependent enzyme while minimizing toxicity/inhibition.

MelA-specific bottlenecks: tyrosine substrate, copper cofactor

Increasing buffering capacity with HEPES-KOH also seems like a good idea, because prolonged cell-free reactions coupled with melanin production lead to progressive acidification, which can reduce fluorescent protein signal, impair MelA activity, and shorten the productive lifetime of the TX-TL system.


REACTION 3


  • my melA-tyrosine cell-free system

  • mScarlet-I

Reagent | Volume | Purpose
L-tyrosine | 0.8 µL | Direct substrate for MelA pigment production
HEPES-KOH pH 7.5 | 0.6 µL | Reduces pH drift over 36h
Magnesium glutamate | 0.4 µL | Supports sustained transcription-translation
Low CuSO4 | 0.2 µL | Supports tyrosinase catalytic activity

Copper is required as a cofactor for MelA tyrosinase activity, but it must be carefully controlled because excess Cu²⁺ can inhibit cell-free expression and promote nonspecific oxidative reactions. I therefore decided to test reducing it and supplementing magnesium glutamate, which improves TX-TL capacity by supporting ribosomes, RNA polymerase, and Mg-ATP/GTP chemistry.


REACTION 4


  • my melA-tyrosine cell-free system

  • sfGFP

Reagent | Volume | Purpose
L-tyrosine | 0.8 µL | Direct substrate for MelA pigment production
HEPES-KOH pH 7.5 | 0.6 µL | Reduces pH drift over 36h
Magnesium glutamate | 0.4 µL | Supports sustained transcription-translation
Low CuSO4 | 0.2 µL | Supports tyrosinase catalytic activity

REACTION 5


  • my melA-tyrosine cell-free system

  • mScarlet-I

Reagent | Volume | Purpose
HEPES-KOH pH 7.5 | 1.25 µL | Stronger buffering against pH drift over 36h.
Low CuSO4 | 0.25 µL | Enables MelA tyrosinase activity as a copper-dependent enzyme.
Nuclease-free water | 0.50 µL | Keeps total supplement volume at 2 µL without adding more substrate.

This reaction tests whether the main limitation is pH stability + copper availability, rather than additional tyrosine. It is useful because the base mastermix already contains tyrosine, so this condition asks whether MelA can produce pigment when copper is supplied and pH is stabilized without further increasing substrate concentration.


REACTION 6


  • my melA-tyrosine cell-free system

  • sfGFP

Reagent | Volume | Purpose
HEPES-KOH pH 7.5 | 1.25 µL | Stronger buffering against pH drift over 36h.
Low CuSO4 | 0.25 µL | Enables MelA tyrosinase activity as a copper-dependent enzyme.
Nuclease-free water | 0.50 µL | Keeps total supplement volume at 2 µL without adding more substrate.

This reaction tests whether the main limitation is pH stability + copper availability, rather than additional tyrosine. It is useful because the base mastermix already contains tyrosine, so this condition asks whether MelA can produce pigment when copper is supplied and pH is stabilized without further increasing substrate concentration.


REACTION 7


  • my MelA-tyrosine cell-free system

  • sfGFP

Reagent | Volume | Purpose
L-tyrosine | 1.50 µL | Pushes substrate availability to test whether pigment formation is substrate-limited.
CuSO4, very low concentration | 0.25 µL | Enables MelA catalytic activity.
HEPES-KOH pH 7.5 | 0.25 µL | Minimal pH support.

This is the pigment-stress condition: it intentionally pushes melanin production to test whether sfGFP fluorescence collapses when the reaction darkens. If sfGFP drops while pigment rises, that supports using mScarlet-I as the better reporter.


REACTION 8


  • my MelA-tyrosine cell-free system

  • sfGFP or mScarlet-I

Reagent | Volume | Purpose
HEPES-KOH pH 7.5 | 1.50 µL | Strongly buffers against acidification over 36h.
CuSO4, very low concentration | 0.25 µL | Enables MelA activity.
L-tyrosine | 0.25 µL | Keeps substrate present but avoids overloading the system.

This is the long-incubation preservation condition: it tests whether the best 36h outcome comes not from maximizing substrate, but from preventing reaction decay. If fluorescence and pigment both remain stronger at 36h, pH stability is the key design variable.


My actual experiments submitted

Now that I’ve seen the interface better, I understand that the goal here is to focus on DNA construct performance, so I’ll treat this as an expression/readout experiment rather than enzyme validation.

I went too far into broader bioprocess hypotheses 😅 in my brainstorm above.

Given the broader objective of optimizing the cell-free master mix for maximal fluorescence across six proteins, I will test the 2 reporters:

  • mScarlet-I = better reporter under melanin/dark pigment interference
  • sfGFP = system health / pigment-interference control

In this first round I will test these 8 reactions, summarized in the tables below and followed by the reasoning for each.

Reaction | Reporter | Testing | HEPES-KOH | Tyrosine | Magnesium glutamate | Water/backfill | Main purpose
1 | mScarlet-I | Low buffer / low substrate | 0.25 µL | 0.25 µL | 0 µL | 1.50 µL | Baseline condition for mScarlet-I.
2 | sfGFP | Low buffer / low substrate | 0.25 µL | 0.25 µL | 0 µL | 1.50 µL | Baseline condition for sfGFP.
3 | mScarlet-I | pH drift | 1.00 µL | 0.25 µL | 0 µL | 0.75 µL | Tests whether stronger buffering improves mScarlet-I signal.
4 | sfGFP | pH drift | 1.00 µL | 0.25 µL | 0 µL | 0.75 µL | Tests whether stronger buffering preserves sfGFP signal.
5 | mScarlet-I | Substrate limitation | 0.25 µL | 1.00 µL | 0 µL | 0.75 µL | Tests whether extra tyrosine increases pigment formation with mScarlet-I.
6 | sfGFP | Substrate limitation / pigment interference | 0.25 µL | 1.00 µL | 0 µL | 0.75 µL | Tests whether extra tyrosine-driven pigment formation interferes with sfGFP.
7 | mScarlet-I | pH drift, substrate limitation, TX-TL capacity | 1.00 µL | 0.75 µL | 0.25 µL | 0 µL | Tests combined support for fluorescence and pigment production.
8 | sfGFP | pH drift, substrate limitation, TX-TL capacity + reporter comparison | 1.00 µL | 0.75 µL | 0.25 µL | 0 µL | Same as Reaction 7, but tests sfGFP under melanin-producing conditions.

Reaction | Hypothesis
1 | Low HEPES and low tyrosine will provide a baseline fluorescence condition for comparison across proteins.
2 | The same low HEPES / low tyrosine condition will reveal whether sfGFP is more sensitive to pigment-related interference than mScarlet-I.
3 | Increasing HEPES will improve fluorescence over 36h by reducing pH drift.
4 | Increasing HEPES will help determine whether pH stabilization benefits sfGFP fluorescence under the same conditions.
5 | Increasing tyrosine will test whether extra substrate/pigment formation reduces fluorescence through optical interference.
6 | High tyrosine with sfGFP will test whether green fluorescence is especially affected by pigment accumulation.
7 | Combining HEPES, tyrosine, and magnesium glutamate will improve fluorescence by supporting pH stability, substrate context, and TX-TL capacity.
8 | The same combined condition with sfGFP will test whether translation support and buffering can preserve fluorescence despite stronger pigment-forming conditions.
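
Before submitting, the volumes in the design table can be sanity-checked with a short Python sketch. It assumes the 2 µL total supplement budget mentioned in the brainstorm above also applies to the submitted reactions; the data structure and function names are my own, and the volumes are copied from the table.

```python
import math

SUPPLEMENT_BUDGET_UL = 2.0  # assumed total supplement volume per reaction, in µL

# reaction number -> (reporter, HEPES-KOH, tyrosine, magnesium glutamate, water), volumes in µL
REACTIONS = {
    1: ("mScarlet-I", 0.25, 0.25, 0.00, 1.50),
    2: ("sfGFP",      0.25, 0.25, 0.00, 1.50),
    3: ("mScarlet-I", 1.00, 0.25, 0.00, 0.75),
    4: ("sfGFP",      1.00, 0.25, 0.00, 0.75),
    5: ("mScarlet-I", 0.25, 1.00, 0.00, 0.75),
    6: ("sfGFP",      0.25, 1.00, 0.00, 0.75),
    7: ("mScarlet-I", 1.00, 0.75, 0.25, 0.00),
    8: ("sfGFP",      1.00, 0.75, 0.25, 0.00),
}


def total_supplement_volume(reaction):
    """Sum of all supplement volumes (including water backfill) for one reaction."""
    return sum(REACTIONS[reaction][1:])


def reactions_off_budget():
    """Reactions whose supplements do not add up to the budget."""
    return [
        number for number in REACTIONS
        if not math.isclose(total_supplement_volume(number), SUPPLEMENT_BUDGET_UL)
    ]


def unmatched_pairs():
    """Odd/even pairs that should differ only in the reporter."""
    return [
        (odd, odd + 1) for odd in (1, 3, 5, 7)
        if REACTIONS[odd][1:] != REACTIONS[odd + 1][1:]
    ]


print("off budget:", reactions_off_budget())
print("unmatched reporter pairs:", unmatched_pairs())
```

Both lists come back empty for the table as written, which confirms that each mScarlet-I/sfGFP pair differs only in the reporter and that every reaction fills the same supplement volume.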

REACTION 1


Testing: Baseline condition (low buffer, low substrate)

Hypothesis: Under minimal buffering and substrate availability, both melanin production and mScarlet-I fluorescence will be limited, providing a baseline to compare improvements from other conditions.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 0.25 µL


REACTION 2


Testing: Baseline condition + reporter comparison

Hypothesis: This condition mirrors Reaction 1 but uses sfGFP to evaluate baseline fluorescence without strong pigment production, serving as a reference for how each reporter behaves under minimal conditions.

System: melA system sfGFP

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 0.25 µL


REACTION 3


Testing: pH drift

Hypothesis: Increasing buffering capacity with HEPES-KOH will improve mScarlet-I fluorescence over 36 hours by reducing pH drift, even without increasing substrate availability.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.25 µL


REACTION 4


Testing: pH drift + reporter comparison

Hypothesis: This condition mirrors Reaction 3 but uses sfGFP to test whether stronger buffering preserves green fluorescence, or if signal is still affected by pigment formation and optical interference.

System: melA system sfGFP

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.25 µL


REACTION 5


Testing: Substrate limitation

Hypothesis: Increasing tyrosine concentration will enhance melanin-like pigment production, indicating that MelA activity may be limited by substrate availability under baseline conditions.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 1.0 µL


REACTION 6


Testing: Substrate limitation + pigment interference

Hypothesis: This condition mirrors Reaction 5 but uses sfGFP to evaluate whether increased pigment formation interferes with green fluorescence, compared to the red-shifted mScarlet-I signal.

System: melA system sfGFP

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 1.0 µL


REACTION 7


Testing: pH drift, substrate limitation, and TX-TL capacity

Hypothesis: Combining buffering (HEPES-KOH), substrate availability (tyrosine), and translation support (magnesium glutamate) will help sustain melanin production and mScarlet-I fluorescence over 36 hours by addressing the main system bottlenecks.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.75 µL Magnesium glutamate → 0.25 µL


REACTION 8


Testing: pH drift, substrate limitation, TX-TL capacity + reporter comparison

Hypothesis: This condition mirrors Reaction 7 but uses sfGFP to evaluate how green fluorescence behaves under melanin-producing conditions, serving as a control to assess pigment interference relative to mScarlet-I.

System: melA system sfGFP

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.75 µL Magnesium glutamate → 0.25 µL


Reactions submitted on 5/1/2026.

Unfortunately, it is currently not possible to add copper (as CuSO4), the cofactor of MelA tyrosinase.

Keeping the designs aligned with my Part B logic, I would like to test these five hypotheses:

  • mScarlet-I as a melanin-compatible readout,
  • sfGFP as the expression/pigment-interference control,
  • HEPES for 36 h pH stability,
  • Tyrosine for MelA substrate availability,
  • Magnesium glutamate for TX-TL capacity.
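The paired design of Reactions 3 to 8 can be written down as data, which makes it easy to check that every mScarlet-I reaction has an sfGFP twin with identical supplements. This is only a minimal sketch: the dictionary layout and the function names `paired_reactions` and `total_supplement_volume` are my own illustration, and the reporter for Reaction 3 is inferred from the statement that Reaction 4 mirrors it with sfGFP.

```python
# Supplement volumes in µL, transcribed from Reactions 3-8 above.
# Assumption: Reaction 3 uses mScarlet-I (inferred, since Reaction 4
# "mirrors Reaction 3 but uses sfGFP").
REACTIONS = {
    3: {"reporter": "mScarlet-I",
        "supplements": {"HEPES-KOH": 1.0, "Tyrosine": 0.25}},
    4: {"reporter": "sfGFP",
        "supplements": {"HEPES-KOH": 1.0, "Tyrosine": 0.25}},
    5: {"reporter": "mScarlet-I",
        "supplements": {"HEPES-KOH": 0.25, "Tyrosine": 1.0}},
    6: {"reporter": "sfGFP",
        "supplements": {"HEPES-KOH": 0.25, "Tyrosine": 1.0}},
    7: {"reporter": "mScarlet-I",
        "supplements": {"HEPES-KOH": 1.0, "Tyrosine": 0.75,
                        "Magnesium glutamate": 0.25}},
    8: {"reporter": "sfGFP",
        "supplements": {"HEPES-KOH": 1.0, "Tyrosine": 0.75,
                        "Magnesium glutamate": 0.25}},
}


def paired_reactions(reactions):
    """Return (mScarlet-I id, sfGFP id) pairs that share identical supplements."""
    pairs = []
    for red_id, red in sorted(reactions.items()):
        if red["reporter"] != "mScarlet-I":
            continue
        for green_id, green in sorted(reactions.items()):
            same_mix = green["supplements"] == red["supplements"]
            if green["reporter"] == "sfGFP" and same_mix:
                pairs.append((red_id, green_id))
    return pairs


def total_supplement_volume(reaction):
    """Sum of all supplement volumes (µL) added to one reaction."""
    return sum(reaction["supplements"].values())


if __name__ == "__main__":
    print("Reporter pairs:", paired_reactions(REACTIONS))
    for reaction_id, reaction in sorted(REACTIONS.items()):
        print(reaction_id, reaction["reporter"],
              total_supplement_volume(reaction), "µL of supplements")
```

Keeping the supplements identical within each pair is what lets any difference in signal be attributed to the reporter rather than to the reaction chemistry.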

Subsections of Labs

Week 1 Lab: Pipetting


Projects

Final projects:

  • Melanin-based bioink for Light-Recording Materials My individual final project is based on melanin and related compounds in an engineered living material (ELM) as a color-responsive bio-ink. Among many other factors, oxidation state, precursor availability / intermediate reaction pathways likely shape tone and long-term stability and may be modulated using a genetic system, be it a bacterium, a synthetic minimal cell, etc.
  • Important links: Committed Listener Slide Deck here. Benchling (TO BE ADDED) Asimov Kernel (TO BE ADDED) Aim 1: Build a first melanin-producing cell-free DNA module based on melA tyrosinase + Define validation parameters The melA gene is the coding sequence of a tyrosinase that catalyzes the conversion of tyrosine to dopaquinone. Dopaquinone is an intermediate of the melanin biosynthesis pathway that polymerizes in an enzyme-independent reaction to form melanin.
  • Bacteriophage Engineering GROUP MEMBERS: Diogo Custodio; Flo Razoux; Katharine Kolin; Mariana Kanbe; Marisa Satsia. PROJECT MAIN GOAL: Increase the stability of the L protein GROUP PROPOSAL: We will use the same workflow as in previous HW (e.g. mutagenesis) but adapt it to specific aim(s) based on the HW reading material of week 04 (e.g. shorten the L protein so that it no longer depends on the bacterial chaperone DnaJ).

Subsections of Projects

Brainstorms

Melanin-based bioink for Light-Recording Materials

My individual final project is based on melanin and related compounds in an engineered living material (ELM) as a color-responsive bio-ink. Among many other factors, oxidation state, precursor availability / intermediate reaction pathways likely shape tone and long-term stability and may be modulated using a genetic system, be it a bacterium, a synthetic minimal cell, etc.

Melanin itself is a heterogeneous, hard-to-define analyte, so my idea is to use its main defined intermediates, like L-DOPA, dopamine, and quinones, as analytes, with a high-resolution method like LC-MS as the calibration/ground-truth method. The goal is to understand and quantify the melanin-related compounds that affect the darkening output of the ink/material. Then, protein design could be used to build embedded sensing for spatial or real-time readouts inside the material. The aim is a fine-tuning system that relates the color tone of the material to the synthesis of the different melanin compounds, together with control mechanisms that can trigger it (different UV light wavelengths, for instance).

Explore whether melanin-based optical outputs can be generated within different biomaterials, such as bacterial cellulose (BC) and ELMs, for applications in fashion, design, and light-recording materials.

I want to establish a first melanin-producing genetic platform and fine-tune its pigmentation at high resolution. The strongest version of the project is a bio-based material that gradually develops melanin-derived tonal variation in response to different input signals (e.g. different UV wavelengths), behaving less like a dyed textile and more like an exposure-recording surface.

Since K. rhaeticus naturally produces cellulose, it also lets me focus on material-producing biology in a native chassis instead of forcing cellulose synthesis into a non-native organism. On top of that, I am interested in the possibility of later embedding synthetic minimal cells into the cellulose as localized, non-growing modules for sensing and pigment generation.

A major question for me is what the right analyte is. Since melanin is a heterogeneous polymer, I think it does not make sense to treat it as a single clean measurable output. Because of that, I am leaning toward more tractable analytes, such as the expressed enzyme itself or melanin-related intermediates like L-tyrosine, L-DOPA, dopamine, quinones, DHI, or DHICA.

This is where LC-MS starts to feel really central to the project. I started thinking that maybe the application should be chosen based on what LC-MS is actually powerful enough to resolve. That led me to think about applications where fine control over color, stability, or chemical state is especially important:

  • Bio-based inks or photography, where oxidation state could shape color and long-term stability.

The ink and photography direction is especially interesting to me because the final image might look stable, but what defines tone and durability may actually be determined much earlier by oxidation chemistry.

Two materials could look similar at first, but age very differently depending on how those intermediates evolved. In that case, LC-MS could help connect invisible intermediate chemistry to visible outcomes in the final material.

  • Bioadhesives or coatings, where intermediate catechol chemistry may directly determine performance.

The bioadhesive or catechol-based coating direction also seems compelling. These systems often depend on catechol-containing molecules like dopamine or L-DOPA, which can oxidize into quinones and then participate in crosslinking. That balance between reduced catechol and oxidized quinone seems to shape adhesive behavior. So instead of only testing the final strength of an adhesive, LC-MS could potentially help track how the chemistry develops during formation and explain why some conditions produce better performance than others.

In these kinds of systems, LC-MS and fine-tuned control over the synthesis of melanin-related compounds do not feel like overkill to me. They feel like the right level of resolution for the chemistry that actually matters. So I am starting to think about the project less as “make a melanin material” in the broadest sense, and more as “choose a melanin-related material application where intermediate-state chemistry is central, measurable, and worth controlling.”

Project concept:

An engineered living material (ELM) based on bacterial cellulose (BC), using Komagataeibacter rhaeticus as the primary chassis, to produce melanin-based optical outputs in a cellulose material for fashion, design, and light-recording applications.

The current direction is not to maximize “smart material” complexity at once, but to first establish a robust melanin-producing BC platform, then evaluate whether additional functions such as keratin expression, self-repair, or embedded synthetic minimal cells are technically justified.

The strongest version of the project is a nude-toned or skin-adjacent material that gradually develops melanin-derived tonal variation in response to exposure conditions, producing a material that behaves less like a dyed textile and more like an exposure-recording surface.

Why bacterial cellulose?

BC is a strong candidate because it is:

  • biogenic and directly fabricable as a sheet-like material
  • compatible with engineered living material approaches
  • mechanically robust relative to many other microbial matrices
  • moldable as pellicles, spheroids, or printed structures
  • already supported by the Komagataeibacter Tool Kit (KTK), a modular cloning toolkit for this genus

In carbon-rich media, Komagataeibacter polymerizes and secretes linear glucose chains that self-assemble into a dense interconnected cellulose mesh. This cellulose pellicle forms at the air-liquid interface and behaves like a biofilm-like material scaffold around the producing cells.

Which chassis?

Primary chassis: Komagataeibacter rhaeticus A high-yield bacterial cellulose producer and a strong chassis for BC-based ELMs.

Why Komagataeibacter rhaeticus?

  • native bacterial cellulose production
  • established relevance for BC-based material engineering
  • allows the project to focus on more specific objectives for material-producing biology, rather than forcing cellulose synthesis into a non-native organism like E. coli

Secondary system: synthetic minimal cells embedded in BC

As a second aim, the project may incorporate synthetic minimal cells (SMCs) as embedded, non-replicating functional modules inside or on the cellulose material. These SMCs would add localized, compartmentalized sensing and pigment-generation functions to the BC scaffold. A useful synthetic minimal cell for this project would therefore basically be a light-exposure logging vesicle embedded in or deposited onto bacterial cellulose.

The living BC producer, K. rhaeticus, builds the material scaffold, while the synthetic minimal cells, as vesicle-based modules, provide controlled, non-growing sensing and melanin output. This separation may be useful if pigment production or sensing logic is easier to implement in a compartmentalized cell-free system than in the BC-producing chassis itself.

Main questions

1- Since melanin is a heterogeneous polymer, which analyte should I choose to analyse?

I might want to confirm the expressed enzyme/protein (for example tyrosinase, laccase, TyrP, or another melanin-related enzyme) or track melanin intermediates: L-tyrosine, L-DOPA, dopaquinone-derived products, DHICA, DHI, etc.

These are often much more tractable by LC-MS than melanin itself.

Other questions

  • Nutrient availability: If the final material remains living, nutrient supply becomes a major constraint.
  • Biosafety: use of non-replicating synthetic minimal cells

Aims

AIM 1: Define and model a first light-responsive melanin-producing synthetic minimal cell for integration into bacterial cellulose

Develop a specific in silico design for a phospholipid vesicle-based synthetic minimal cell that uses EL222 to activate melA expression under blue light, with the goal of generating visible melanin production as a localized output that could later be embedded into bacterial cellulose made by K. rhaeticus. This aim focuses on specifying the exact first system, its required components, and whether its chemistry and logic are feasible before any experimental implementation.

AIM 1 Specific Objectives:

  • define the exact genetic module to be tested first: EL222 + melA
  • specify the full internal composition of the vesicle:
    • Tx/Tl source
    • ATP regeneration system
    • tyrosine
    • copper
    • salts/cofactors
  • define the membrane composition for the first prototype, e.g. POPC + cholesterol
  • map the input-output logic precisely:
    • input = blue light
    • regulator activation = EL222
    • output = tyrosinase expression
    • final material output = melanin accumulation / darkening
  • determine which molecules must be pre-encapsulated and which, if any, must cross the membrane
  • identify the minimum set of assumptions required for the system to function, i.e. specify the required materials, genes, lipids, cofactors, and readouts for the first prototype
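The input-output logic listed above can be traced as a chain of boolean conditions. The sketch below is only a toy model under strong assumptions: the class `VesicleState` and the function `trace_logic` are hypothetical names, every step is treated as all-or-nothing, and leaky expression, kinetics, and oxygen supply are ignored.

```python
from dataclasses import dataclass


@dataclass
class VesicleState:
    """What one vesicle contains and whether it is illuminated (toy model)."""
    blue_light: bool
    has_el222: bool = True
    has_mela_dna: bool = True
    has_txtl: bool = True       # Tx/Tl source plus ATP regeneration system
    has_tyrosine: bool = True   # substrate
    has_copper: bool = True     # tyrosinase cofactor


def trace_logic(state):
    """Follow input -> regulator -> enzyme -> material output, step by step."""
    el222_active = state.blue_light and state.has_el222
    tyrosinase_expressed = (
        el222_active and state.has_mela_dna and state.has_txtl
    )
    darkening = (
        tyrosinase_expressed and state.has_tyrosine and state.has_copper
    )
    return {
        "el222_active": el222_active,
        "tyrosinase_expressed": tyrosinase_expressed,
        "darkening": darkening,
    }


if __name__ == "__main__":
    print(trace_logic(VesicleState(blue_light=True)))
    print(trace_logic(VesicleState(blue_light=False)))
    print(trace_logic(VesicleState(blue_light=True, has_copper=False)))
```

Switching off one component at a time shows which step of the chain each pre-encapsulated molecule supports, which is a quick way to list the assumptions the first prototype depends on.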

AIM 2: Experimental planning and prototyping strategy for melanin integration into bacterial cellulose materials

Translate the selected design into a concrete experimental plan, prioritizing a staged workflow from simple proof of concept to material-level testing. This aim is not yet full implementation, but the preparation of a robust experimental roadmap that makes the project technically executable and testable.

Practical objectives:

  • measures of success / failure:
    • define the first measurable success criteria: visible darkening? absorbance increase? spatially localized pigment formation?
    • identify the main failure points of this exact design, such as insufficient expression, low tyrosinase activity, substrate limitation, or poor melanin accumulation
  • define the first build-test sequence, including which subsystem should be validated first:
    • melanin pathway in a tractable chassis
    • cell-free context
    • BC production in K. rhaeticus
    • integration of pigment module with BC
  • plan how BC will be fabricated and presented for testing, e.g. pellicles, spheroids, molded sheets, or layered composites
  • define how synthetic minimal cells would be embedded in, coated onto, or associated with BC
  • determine the primary experimental readouts: visible pigmentation; image-based quantification of tone; spatial patterning under differential light exposure; material compatibility and stability
  • define the controls needed to evaluate whether the system is functioning as intended
  • identify the decision points that determine whether the project should proceed with:
    • direct microbial engineering only
    • synthetic minimal cells only
    • a hybrid system

AIM 3: Evaluate secondary functional molecules only after establishing melanin as a robust first proof of concept

Keep melanin as the primary engineered output and assess other molecules only if they offer a clear, measurable improvement to the material. This aim is intended to prevent the project from becoming too diffuse too early and to ensure that any added complexity is justified by experimental value.

Practical objectives:

  • define which secondary properties would be worth pursuing only after melanin is validated, such as:
    • increased abrasion resistance
    • reduced permeability
    • improved mechanical robustness
    • antimicrobial activity
  • evaluate candidate molecules such as keratin or other structural/functional additives in terms of:
    • biological feasibility
    • compatibility with BC
    • expected measurable benefit
    • added engineering complexity
  • establish criteria for whether a second molecule is worth integrating into the platform by prioritizing only additions that significantly improve the material’s performance or expand its application in a clear and testable way.

Previous ideas

Historical register of the brainstorm for the Individual Project:

Later, I added 3 slides with an updated version of those 3 ideas in the appropriate slide deck for Committed Listeners here.

However, the current project direction is a different idea: a bacterial cellulose-based material platform for melanin-derived tonal output, potentially extended with synthetic minimal cells for compartmentalized light-responsive pigment generation.

I decided to develop this idea even though it is not present in the initial records.

Individual Final Project: Melanin-based light-recording bioink/biomaterial

Important links:

  • Committed Listener Slide Deck here.
  • Benchling (TO BE ADDED)
  • Asimov Kernel (TO BE ADDED)

Aim 1: Build a first melanin-producing cell-free DNA module based on melA tyrosinase + Define validation parameters

The melA gene is the coding sequence of a tyrosinase that catalyzes the conversion of tyrosine to dopaquinone. Dopaquinone is an intermediate of the melanin biosynthesis pathway that polymerizes in an enzyme-independent reaction to form melanin.

The pathway from L-tyrosine to Melanin with the use of the melA tyrosinase.

1. Design MelA expression constructs (Benchling and Twist)

Some of the construct variables to consider, because they will affect melA expression:

  • Promoter
  • RBS
  • Terminator
  • Codon usage
  • Tag placement
  • Vector context

Note: a construct that works in cells may not translate well to cell-free expression.

Previous expression constructs:

  • iGEM 2004 - using the P(lac)IQ promoter (BBa_I14032), a part designed by the Vikram Vijayan, Allen Hsu, Lawrence Fomundam group of iGEM04 (2004-08-04, Part:BBa_I14032).

  • iGEM 2009 - using B0040 RBS (composite part: BBa_K193602) containing pLacIQ (BBa_I14032), RBS (B0030) and melA (BBa_K193600) on a low-copy vector (BBa_I52001 derived), a part designed by Kazuaki Amikura.

  • iGEM 2017 - The Erin Kelly group (2017-10-27) developed a different expression construct specifically for use in E. coli BL21(DE3), which takes advantage of the T7 RNA polymerase expression construct in the DE3 cassette to provide tighter expression control and help prevent leaky expression of the tyrosinase. Additionally, a double terminator (B0015) was added to increase control over the system: to maximize tyrosinase production and limit unnecessary energy expenditure by the cell, a transcriptional terminator ensures that energy is not wasted on transcribing an overly long mRNA transcript. The team transformed the melA_pJET plasmid into E. coli BL21(DE3) and attempted to overexpress the MelA tyrosinase (~54 kDa) and produce the pigment melanin. Four colonies from the transformation were picked and used to produce pre-cultures, which were then used to inoculate test expression cultures. During the test expression, cultures were also supplied with CuSO4 and extra tyrosine. Cultures were induced with IPTG at OD600 ~1 and a 1 OD sample was taken (T0). Another 1 OD sample was taken after the cultures were left to grow overnight (TON). The cultures were allowed to grow for another three days (supplemented with tyrosine and ampicillin) to see if pigment would form, but the team was unable to detect any melanin. The 1 OD samples were run on a 12% SDS-PAGE gel to check for MelA overexpression (their Figure 3). A faint band of approximately 54 kDa appeared in the TON lane of culture 3, which the team took as an indication that the MelA tyrosinase had been expressed from the pJET plasmid. They planned to attempt another overexpression of MelA from the pSB1C3 plasmid before the Jamboree.

My take: T7 can maximize protein yield but can also overwhelm folding capacity, causing inactive protein to accumulate (it increases the likelihood that the tyrosinase misfolds, aggregates, or fails to incorporate copper correctly). I would replace it with a more moderate construct and compare the results against BBa_K2481108 as the control.

Note: MelA expression is not the same as melanin production. Melanin polymerization is messy: dopaquinone polymerizes through non-enzymatic downstream chemistry. This means color output depends not only on MelA, but also on oxygen, pH, time, redox state, and local chemistry.

2. Prepare reagents and workflow (Ginkgo & OpenAI, HW 11C)

Melanin production in E. coli or in a cell-free system is influenced by several parameters that act at the level of melA expression, enzyme activity, and downstream reactions:

  • L-tyrosine concentration (substrate, limited solubility)

  • CuSO4 concentration: since this tyrosinase is a type 3 copper-containing enzyme, Cu2+ is a cofactor of the enzyme. Too much copper can also stress cells or inhibit cell-free reactions.

  • Magnesium

  • Energy mix

  • Molecular oxygen availability for tyrosinase reactions

  • pH: tyrosinase activity and melanin polymerization are pH-dependent. If the reaction acidifies over time, enzyme activity or pigment formation may decrease.

Note: Optimizing for sfGFP may not optimize for MelA.

3. Validation

The 2017 iGEM result is a useful warning: they may have produced a faint ~54 kDa MelA band, but still detected no pigment. For this reason, I am proposing a staged validation workflow that moves from simple expression and pigment checks to more refined mechanistic analyses, depending on the results obtained in each previous round.

Order | Method | What it tells you | Hypothesis tested
1 | HW11 protein fluorescence | General cell-free expression capacity | If fluorescence is strong, the TX-TL system is functional and supports protein expression.
2 | Petri dish photos | Visual pigment output (qualitative) | If colonies/spots darken over time, the construct produces visible pigment.
3 | Spectrophotometric OD 400–500 nm | Pigment formation kinetics | If absorbance increases over time, MelA is producing melanin-like pigment.
4 | SDS-PAGE (MelA ~54 kDa) | MelA protein expression | If a ~54 kDa band is present, the construct expresses MelA protein.
5 | LC-MS (tyrosine / L-DOPA) | Pathway-level activity (mechanistic) | If tyrosine decreases and intermediates increase, MelA is catalytically active even without visible pigment.
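For step 3 of the table, "absorbance increases over time" can be turned into a number by fitting a straight line to the readings and looking at its slope. A minimal sketch using ordinary least squares in plain Python; the function name `absorbance_slope` is my own, and the readings in the usage example are placeholders, not measured data.

```python
def absorbance_slope(times_h, absorbances):
    """Least-squares slope of absorbance versus time (absorbance units per hour)."""
    n = len(times_h)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two matched time/absorbance points")
    mean_t = sum(times_h) / n
    mean_a = sum(absorbances) / n
    # Spread of the time points; zero means all readings share one time.
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum(
        (t - mean_t) * (a - mean_a) for t, a in zip(times_h, absorbances)
    )
    return sxy / sxx


if __name__ == "__main__":
    # Placeholder readings for illustration only (hours, absorbance).
    times = [0, 2, 4]
    readings = [0.10, 0.20, 0.30]
    print("slope per hour:", absorbance_slope(times, readings))
```

A slope clearly above that of a no-DNA or no-tyrosine control would support the hypothesis in row 3; the threshold for "clearly above" still has to be chosen from the controls.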

Results interpretation framework:

Observation | Interpretation
Strong MelA band + no pigment | Expression works, but enzyme is likely inactive (folding, copper, pH, oxygen, or substrate issue).
Weak MelA band + strong pigment | Low expression but high enzymatic efficiency; not expression-limited.
Strong MelA band + strong pigment | Optimal case: high expression and active enzyme.
Weak MelA band + no pigment | Expression failure (construct, transcription, translation, or stability issue).
MelA band + LC-MS intermediates + weak pigment | Enzyme is active, but pigment polymerization or accumulation is limiting.
MelA band + no LC-MS intermediates + no pigment | Enzyme is expressed but inactive (likely folding or cofactor issue).
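The interpretation framework can also be expressed as a small lookup, so each new result is read against the same rules. This is a sketch only: the function name `interpret`, the category labels, and the choice to check the LC-MS rows first are my own assumptions; the interpretation texts are copied from the table.

```python
# Interpretations transcribed from the framework above.
BAND_PIGMENT_RULES = {
    ("strong", "none"): "Expression works, but enzyme is likely inactive "
                        "(folding, copper, pH, oxygen, or substrate issue).",
    ("weak", "strong"): "Low expression but high enzymatic efficiency; "
                        "not expression-limited.",
    ("strong", "strong"): "Optimal case: high expression and active enzyme.",
    ("weak", "none"): "Expression failure (construct, transcription, "
                      "translation, or stability issue).",
}


def interpret(band, pigment, lcms_intermediates=None):
    """Map observations to an interpretation.

    band: "strong" or "weak" (SDS-PAGE MelA band)
    pigment: "strong", "weak", or "none"
    lcms_intermediates: True/False if LC-MS was run, None if it was not
    """
    # Assumption: when LC-MS data exist, those rows are checked first.
    if lcms_intermediates is True and pigment == "weak":
        return ("Enzyme is active, but pigment polymerization or "
                "accumulation is limiting.")
    if lcms_intermediates is False and pigment == "none":
        return ("Enzyme is expressed but inactive "
                "(likely folding or cofactor issue).")
    return BAND_PIGMENT_RULES.get(
        (band, pigment), "Combination not covered by the framework."
    )


if __name__ == "__main__":
    print(interpret("strong", "none"))
    print(interpret("strong", "weak", lcms_intermediates=True))
    print(interpret("weak", "weak"))
```

Combinations that fall outside the table are reported as uncovered instead of being forced into the nearest row, which flags where the framework itself may need another entry.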

Here’s a diagram of my proposed validation workflow

4. Model a light-activated expression circuit that could later support gradual tonal change in a material system (Asimov Kernel).

Post-Course: Towards the melanin-based light-recording bio-ink

  1. Refine Asimov Kernel’s output for controlling melanin expression until the system is fine-tuned to my aesthetic needs (as controlled and predictable as possible) or, if that is not possible, develop an experimental design for embedded sensing (in this case, map both the quantitative and qualitative workflows available).

Decision point: Push for maximum molecular control first, even if the material context is still abstract? OR Move earlier into material-scale experiments?

  2. Material engineering: Test different models for integrating the melanin cell-free module toward an intended function/product. Test integration into the material through ELM engineering of Komagataeibacter rhaeticus + bacterial cellulose (BC), or through a hybrid system with a BC scaffold / other biomaterial that can be embedded with cell-free modules with synthetic minimal cells / K. rhaeticus.

  3. Rethink the workflow for measurable readouts for the cell-free system: RGB image analysis in a controlled lighting box, or spectral (semi-quantitative) measurement of melanin absorbance from 475 nm to 500 nm.

  4. Optimization rounds looking into every step of the proposed workflow, including economic tests for modeling parameters (such as ink dilution).

  5. Benchmarking partnerships + outreach plan, heavily based on the conclusions of the previous item (BioFabricate, Cultivarium).
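For the RGB image-analysis readout mentioned in the list above, one simple option is to reduce each photographed spot to a single darkness score. The sketch below assumes the pixels of a region of interest have already been extracted as 8-bit (R, G, B) tuples; the function name `mean_darkness` is hypothetical, and the Rec. 709 luma weights are applied directly to the stored values without gamma linearization, which is a simplification.

```python
def mean_darkness(pixels):
    """Average darkness of a region, from 0.0 (white) to 1.0 (black).

    pixels: iterable of (r, g, b) tuples with 8-bit values (0-255).
    """
    total = 0.0
    count = 0
    for r, g, b in pixels:
        # Rec. 709 luma weights; green contributes most to perceived brightness.
        luma = 0.2126 * r + 0.7152 * g + 0.0722 * b
        total += 1.0 - luma / 255.0
        count += 1
    if count == 0:
        raise ValueError("no pixels supplied")
    return total / count


if __name__ == "__main__":
    # Placeholder pixel values for illustration only.
    pale_spot = [(230, 220, 200), (225, 215, 195)]
    dark_spot = [(60, 45, 30), (55, 40, 28)]
    print("pale:", round(mean_darkness(pale_spot), 3))
    print("dark:", round(mean_darkness(dark_spot), 3))
```

Tracking this score for the same spot over time, under fixed lighting and camera settings, gives a semi-quantitative darkening curve that can be compared with the absorbance readout.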

Group Final Project

Bacteriophage Engineering

GROUP MEMBERS: Diogo Custodio; Flo Razoux; Katharine Kolin; Mariana Kanbe; Marisa Satsia.

PROJECT MAIN GOAL : Increase the stability of the L protein

GROUP PROPOSAL: We will use the same workflow as in previous HW (e.g. mutagenesis) but adapt it to specific aim(s) based on the HW reading material of week 04 (e.g. shorten the L protein so that it no longer depends on the bacterial chaperone DnaJ).

Please check our most recent updated Google Docs on this.

Here’s a summary of my main individual contributions to the plan for engineering the bacteriophage:

I ran the provided mutational scoring notebook to obtain per-substitution LLR scores for the MS2 L-protein and shortlisted substitutions with positive scores. The full scoring results are included in a table on my Homework 5 page.

I then cross-checked these shortlisted mutations against the provided experimental mutant dataset, L-Protein Mutants, which reports amino acid substitutions and their measured lysis phenotypes.

The overlap between the two datasets suggests that sequence-based LLR scores capture only part of the functional landscape of the MS2 L-protein. More broadly, positive LLR scores may reflect sequence plausibility or local biochemical compatibility, but they do not fully account for higher-order constraints such as host-factor dependence, membrane behavior, and oligomer formation.

Therefore, I decided to select five candidate mutations by combining positive LLR scores with biological reasoning about the protein’s distinct functional domains, treating LLR scores as a prioritization tool for experimental testing rather than as a direct predictor of lytic function.

The MS2 L-protein is organized into distinct functional domains:

  1. Hydrophilic N-terminal region involved in DnaJ-mediated folding
  2. Transmembrane/C-terminal region responsible for membrane insertion and pore formation

The two soluble-region mutants, S9Q and C29R, were chosen to probe effects on folding and possible DnaJ dependence, whereas the three transmembrane mutants, A45L, T52L, and N53L, were chosen to probe membrane insertion and oligomerization.

Mutant 1 - S9Q (soluble, LLR = 2.014)

Sequence: METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: High positive score in the soluble region (putative DnaJ-interaction domain). Ser→Gln increases hydrogen-bonding potential and may alter surface chemistry without strongly destabilizing the fold.

Mutant 2 - C29R (soluble, LLR = 2.395)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: One of the strongest positive-scoring substitutions in the soluble region. Adds a positive charge that could reshape chaperone-recognition or interaction surfaces.

Mutant 3 - A45L (TM, LLR = 1.539)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: Hydrophobic substitution in the transmembrane segment. Ala→Leu increases hydrophobicity and may stabilize membrane helix packing/insertion and oligomer stability.

Mutant 4 - T52L (TM, LLR = 1.814)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: Polar→hydrophobic change in the TM region. Thr→Leu may increase membrane compatibility and reduce local insertion/misfolding penalties.

Mutant 5 - N53L (TM, LLR = 1.865)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: Polar→hydrophobic change in the TM region with a strong positive score. Selected as an additional TM-stabilizing candidate.
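As a consistency check, the five mutant sequences above can be regenerated from the wild-type sequence and the mutation names. A minimal sketch: the helper `apply_mutation` is my own, and the `WILD_TYPE` string is an assumption reconstructed by reverting each listed substitution, not copied from a reference database.

```python
import re

# Assumption: wild-type MS2 L-protein reconstructed from the mutants above.
WILD_TYPE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)

# Mutant sequences exactly as listed in this section.
LISTED_MUTANTS = {
    "S9Q": "METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "C29R": "METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "A45L": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "T52L": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT",
    "N53L": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT",
}


def apply_mutation(sequence, mutation):
    """Apply a substitution written like 'S9Q' (1-based position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation}")
    original, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3)
    )
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against off-by-one errors: the stated residue must be present.
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


if __name__ == "__main__":
    for name, listed in LISTED_MUTANTS.items():
        matches = apply_mutation(WILD_TYPE, name) == listed
        print(name, "matches listed sequence:", matches)
```

The residue check inside `apply_mutation` is the useful part: it fails loudly if a mutation name and the sequence numbering ever disagree.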

BioClub Committed Listener MoU

HTGAA Committed Listener (CL) Agreement

I am a HTGAA Committed Listener, my responsibilities are:

  • Watching class lectures and recitations
  • Participating in node reviews
  • Developing and documenting my homework
  • Actively communicating with other students and TAs on the forum
  • Allowing HTGAA and BioClub to share my work (with attribution)
  • Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
  • Following locally applicable health and safety guidance
  • Promoting a respectful environment free of harassment and discrimination

Signed by committing this file to my documentation page/repository,

{{ Mariana Teixeira Kanbe }}

{{ 2nd Feb, 2026 }}