Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
Assignments: Class 1 Assignment Question 1 I propose a high-throughput microscopy tool to estimate intracellular PHA accumulation from granule count and size. Current standard quantification methods are slow, labor-intensive, and often require hazardous solvent-based extraction. By pairing PHA staining (e.g., Sudan Black B or Nile Red) with automated imaging and machine-learning (ML) image segmentation, this approach could rapidly screen large libraries of environmental isolates and recombinant strains for high PHA producers.
Week 2: DNA Read, Write, & Edit
Homework Part 0: Basics of Gel Electrophoresis I have watched the Week 2 lecture and recitation on DNA read/write/edit, restriction digests, Benchling, Twist, and gel electrophoresis. Part 1: Benchling & In-silico Gel Art Opened Benchling and signed up. Found the Lambda sequence here and copied the sequence without the header. Pasted this sequence into Benchling through “Create” > “DNA / RNA Sequence” > “New DNA / RNA Sequence”. Then I just pasted the sequence in the “Bases” field, titled it “Lambda,” and selected the topology as “Linear.”
Week 3 HW: Lab Automation
Python Script for Opentrons Artwork Here’s my HTGAA 2026 Opentrons Art Python Script Submission. The artistic design I created using the GUI is available here. I heavily used the “Example 7 Microbial Earth” by Dominika Wawrzyniak, using pixels loaded from an external resource (a CSV file hosted on my GitHub page). I used Dominika’s well documented Notion page from HTGAA21 to understand the code and replicate it for my case. I used Gemini assistance only to debug minor typos and syntax errors, and to identify which packages to import to execute the code.
Week 4 HW: Protein Design Part 1
Homework: Protein Design I Part A. Conceptual Questions 1) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) About 21% of meat is protein (Smith et al. 2022); therefore, 500 g of meat contains about 105 g of protein.
Week 5 HW: Protein Design Part 2
Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM Question 1 This is the human SOD1 sequence from UniProt (P00441) with the initial Met removed: ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ FASTA introducing the A4V mutation associated with the most aggressive forms of ALS: ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ Question 2 and 3 With the help of ChatGPT and Gemini, I generated 2 new cells in order to generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
Week 6 HW: Genetic Circuits Part 1: Assembly Technologies
Assignment: DNA Assembly Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion High-Fidelity PCR Master Mix is a 2X, ready-to-use mixture where the exact formulation is partly proprietary, but the functional components are documented in the manufacturer’s manual:
Component (Phusion 2X Master Mix)	Purpose
Phusion High-Fidelity DNA Polymerase	DNA synthesis with high fidelity + proofreading
dNTPs (dATP, dCTP, dGTP, dTTP)	Building blocks for new DNA strands
HF reaction buffer (salts + pH buffer)	Maintains optimal pH/ionic strength for enzyme function
Mg2+ (via buffer system; often MgCl2-derived)	Essential polymerase cofactor
Stabilizers / additives (partly proprietary)	Improve enzyme stability and consistency
Nuclease-free water	Solvent to reach correct 2X working concentrations
Reference: Thermo Fisher Phusion High-Fidelity DNA Polymerase Product Information Sheet, standard biochemistry manuals (e.g., Sambrook & Russell).
Week 7 HW: Genetic Circuits Part 2: Neuromorphic Circuits
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) Question 1 Traditional genetic circuits are usually implemented in Boolean logic (ON/OFF) and hand-designed as fixed logic, so representing nuanced behaviors often requires many gates, sharp thresholds, and careful tuning, which can make designs bulky and brittle. As the number of inputs grows, circuit complexity can explode combinatorially: stacking multiple layers and adding intermediate nodes increases metabolic load, failure points, and sensitivity to part-to-part variability. Also, adapting to new targets or shifting biological context often means redesigning the circuit architecture, not just re-tuning parameters.
Week 9 HW: Cell Free Systems
Homework Part A: General and Lecturer-Specific Questions General homework questions Exercise 1 Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Week 10 HW: Advanced Imaging & Measurement Technology
Homework: Final Project What to measure? I will measure visible melanin output in the material as the primary readout of the project. I want to quantify: (1) the degree of darkening; (2) the spatial distribution of pigmentation; and (3) the stability/persistence of the pigmentation in the bacterial cellulose, including after drying or storage. These measurements are directly relevant because they indicate whether the melanin-producing system is functioning and whether the output is compatible with the intended material application.
Week 11 HW: Bioproduction & Cloud Labs
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork I contributed 7 pixels to the global artwork experiment, helping extend a horizontal yellow line in the top-left area (see screenshot below). At first, I was cautious and tried to understand the ongoing ideas for each section and whether there was a unifying concept. I considered introducing something new, but ultimately decided to stick with what seemed to be the area’s goal (a horizontal yellow line). For next year, it might be fun to have an in-app chat within the same domain to coordinate contributions more easily and check the current vibes.
Week 12 HW: Building Genomes
I reviewed the updated Week 11 homework and continued making progress on my Individual Final Project and DNA order.
Week 13 HW: AI, Synbio, and Scaling Health Innovation (ARPA-H)
I worked on my Final Project and prepared it for the presentation on May 13 as part of the Committed Listeners group.
Week 14 HW: Bio Design & Bio Fabrication
I worked on my Final Project and prepared it for the presentation on May 13 as part of the Committed Listeners group.

Week 1 HW: Principles and Practices

Assignments: Class 1 Assignment

Question 1

I propose a high-throughput microscopy tool to estimate intracellular PHA accumulation from granule count and size.

Current standard quantification methods are slow, labor-intensive, and often require hazardous solvent-based extraction. By pairing PHA staining (e.g., Sudan Black B or Nile Red) with automated imaging and machine-learning (ML) image segmentation, this approach could rapidly screen large libraries of environmental isolates and recombinant strains for high PHA producers.
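To make the "granule count and size" readout concrete, here is a minimal sketch that swaps the proposed ML segmentation for plain intensity thresholding plus connected-component labeling. The toy image, the threshold value, and the function name `label_granules` are illustrative assumptions, not part of the proposed tool.

```python
from collections import deque

def label_granules(image, threshold):
    """Return the pixel area of each connected bright region (4-connectivity)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or seen[r][c]:
                continue
            # Breadth-first flood fill over one bright region.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas

# Toy 5x6 "fluorescence image" with two bright granules (made-up intensities).
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 0, 0, 8, 0],
    [0, 0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 0],
]
areas = label_granules(toy, threshold=5)
print("granule count:", len(areas), "areas (pixels):", sorted(areas))
```

A real pipeline would replace the fixed threshold with a trained segmentation model and convert pixel areas to physical units using the microscope calibration.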

Future upgrades, offered as a premium beta for testing, could add a “material profile” output by predicting PHA chain-length class (SCL, MCL, or LCL) from staining/fluorescence response patterns using the lipophilic dyes. This would enable not only faster strain selection but also early-stage differentiation of polymer type, which is critical for downstream biotechnology applications.

A further upgrade could generate image-driven optimization suggestions from microscopy images. For example, if it detects a high level of extracellular debris consistent with cell lysis, or a high abundance of product granules outside the cells, it could recommend exploring strain-engineering strategies that alter cell membrane composition to increase tolerance to mechanical stress and support higher intracellular polymer accumulation as cytoplasmic granules.

Question 2

Gov / Policy Goal 1: Prevent harmful misuse

• Sub-goal 1.1 - Limit repurposability: Reduce the extent to which the tool can be used as a general-purpose and high-throughput optimization engine outside its intended PHA scope, for example by restricting supported dyes and limiting microscopy calibration parameters to validated settings.

• Sub-goal 1.2 - Increase accountability: ensure high-impact uses are traceable and that institutions have a mechanism to intervene if misuse is suspected.

Gov / Policy Goal 2: Promote safe, responsible operation and research integrity

• Sub-goal 2.1 - Standardize safe use: Require adherence to Standard Operating Procedures (SOPs) for staining, imaging, and waste handling.

• Sub-goal 2.2 - Ensure competent users: Require completion of a short training module, including lab safety + tool-specific quality control (QC) before users can access advanced features or export “final” reports.

• Sub-goal 2.3 - Maintain data quality: Require basic QC checks (controls, calibration, and logging of model version and imaging settings) to reduce false positives/negatives and prevent misinterpretations.

Gov / Policy Goal 3: Maintain access for constructive uses (equity and scientific progress)

• Sub-goal 3.1 - Preserve legitimate research utility: avoid governance mechanisms that unnecessarily slow routine PHA research and screening.

• Sub-goal 3.2 - Proportional governance: apply stricter controls only to higher-impact capabilities (e.g., advanced optimization suggestions), rather than restricting all use.

Question 3

Option 1:

General action: Norms combined with oversight mechanisms (social/regulatory governance)

Purpose: Currently, PHA quantification is typically validated through chemical extraction and analytical methods rather than standardized image-based measurement. A robust image-analysis tool like this would significantly increase throughput and expand where and how screening can be performed. If an image-analysis approach is positioned as a scalable screening tool, it should include safeguards to prevent use outside validated conditions. A responsible-use policy with “red flag” triggers would provide a proportional oversight mechanism.

Design:

• Actors: principal investigators (PIs) and laboratory personnel (primary users), microscopy core facility staff, the university biosafety office (or equivalent), and an institutional ethics/biosafety committee.

• Mechanism: implement a short pre-use declaration form and a responsible-use policy that defines “red flag” contexts (e.g., high-throughput work on unverified environmental isolates without provenance, use outside standard biosafety environments, or attempts to generalize the tool beyond PHA workflows).

• Trigger response: if a red flag is triggered, require review by the biosafety/ethics committee (or the biosafety office) and compliance with institutional requirements before experiments or tool access continue.

Assumptions:

• Users will accurately disclose the intended use and experimental context (or there will be sufficient deterrence to reduce misreporting).

• Red-flag criteria can be defined clearly enough to be actionable and consistent across labs.

• The institution has capacity to perform timely reviews without creating major delays for legitimate projects.

• Some level of auditing is feasible (e.g., metadata logs or usage reporting), which may require limited access to usage data.

Risks of failure and “success”:

• The policy becomes symbolic and is not followed; criteria are too vague to enforce; or users misreport their purpose to avoid review.

• Overly broad triggers could make oversight routine, slowing research and disproportionately burdening smaller or under-resourced labs (equity and access concerns).

Option 2:

Restrict advanced features: High-impact features require auditable access (accountability governance)

Purpose: Add accountability for higher-impact features while keeping basic screening broadly accessible.

Design:

• Actors: tool developers (academic or company), institutions adopting the tool.

• Baseline access: basic PHA screening module available for standard use.

• Advanced access (premium/beta): requires institutional opt-in (verified affiliation, training completion, and standard operating procedures adherence).

• Logging: maintain run logs with technical metadata only (model version, stain, imaging settings, quality control pass/fail, solvent/waste metadata, etc.).

• Incident response: provide an incident-reporting channel so access can be suspended if misuse is suspected.

Assumptions:

• Logging and gating deter misuse without driving users to ungoverned copies.

• Metadata-only logs are sufficient for accountability without compromising privacy.

• Institutions are willing to administer opt-in and training requirements.

Risks of failure and “success”:

• Users bypass controls by using modified versions or alternative tools; logging becomes incomplete.

• Reduced accessibility and higher admin burden, potentially concentrating access in well-resourced labs.

• Analogy: similar to “KYC tiers” in financial systems: more powerful capabilities require stronger verification and auditability.

Option 3:

Just for PHA: Scope capabilities through validated workflows (technical strategy / design constraint)

Purpose: General-purpose screening tools are easier to repurpose. One way to limit their repurposability is by restricting the tool to validated PHA workflows.

Design:

• Actors: tool developers and maintainers; optionally journals or core facilities that require validated workflows for reporting.

• Technical constraint: restrict supported dyes and workflows to PHA-relevant staining and analysis; lock calibration parameters to validated microscopy settings; exclude generic “optimize any phenotype” modules.

• Reporting constraint: outputs are labeled as screening support, with clear limits on claims and recommended confirmatory methods for final quantification.

Assumptions:

• Technical restrictions meaningfully reduce repurposability.

• The validated workflow remains useful across common lab setups and organisms.

• Users accept constraints rather than abandoning the tool.

Risks of failure and “success”:

• Restrictions are easily removed in forks or hacks; scope limits become ineffective.

• Reduced scientific and commercial usefulness, including for ethically beneficial non-PHA applications; may slow innovation.

• This is analogous to 3D printers that restrict materials and firmware settings: the core function remains available, but out-of-scope production becomes harder without intentional modification.

Question 4

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity
• By preventing incidents	2	1	3
• By helping respond	2	1	3
Foster Lab Safety
• By preventing incidents	2	2	1
• By helping respond	3	1	3
Protect the environment
• By preventing incidents	2	1	2
• By helping respond	3	2	3
Other considerations
• Minimizing costs and burdens to stakeholders	2	3	1
• Feasibility?	2	2	1
• Not impede research	2	3	3
• Promote constructive applications	2	1	3
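
The column totals for the matrix above can be tallied with a short script. This is only a sketch: it assumes an unweighted sum and assumes that 1 is the best score and 3 the worst (the scale is not stated in the table itself).

```python
# Scores copied from the table above as (Option 1, Option 2, Option 3).
# Assumption: 1 = best, 3 = worst, and every criterion has equal weight.
scores = {
    "Biosecurity: preventing incidents": (2, 1, 3),
    "Biosecurity: helping respond": (2, 1, 3),
    "Lab safety: preventing incidents": (2, 2, 1),
    "Lab safety: helping respond": (3, 1, 3),
    "Environment: preventing incidents": (2, 1, 2),
    "Environment: helping respond": (3, 2, 3),
    "Minimizing costs and burdens": (2, 3, 1),
    "Feasibility": (2, 2, 1),
    "Not impede research": (2, 3, 3),
    "Promote constructive applications": (2, 1, 3),
}

# zip(*rows) turns the per-criterion rows into per-option columns.
totals = [sum(column) for column in zip(*scores.values())]
for number, total in enumerate(totals, start=1):
    print(f"Option {number}: total = {total}")
```

Under these assumptions the totals are 22, 17, and 23, so Option 2 has the best unweighted score and Option 3 the weakest; any weighting of feasibility or cost would change that ordering.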

Question 5

I would prioritize Option 3 as the primary governance approach, aimed at tool developers and maintainers. Although Option 3 has the weakest overall score, I assign higher weight to practical implementability and consistent adoption, since governance mechanisms that require sustained oversight or significant administrative capacity are often applied inconsistently in real research settings. Option 3 can be implemented directly in software and routine workflows by restricting the tool to validated PHA use cases (supported dyes, locked calibration ranges, and scoped outputs). This reduces repurposability by design rather than relying on user compliance, making the default use safer and more predictable while preserving the core constructive application: scalable PHA screening.

The key trade-off is that Option 3 scores poorly on “helping respond” (biosecurity and lab safety), because it provides limited traceability and fewer mechanisms for intervention after deployment. It also narrows beneficial extensions beyond PHA, potentially limiting constructive applications in adjacent domains.

This recommendation also rests on several assumptions and uncertainties: that capability scoping meaningfully reduces repurposability in practice; that users will not widely circumvent constraints via modified versions or alternative tools; and that the validated workflow generalizes across common microscopes, organisms, and staining conditions.

Final Reflection

The main new ethical concern for me was how quickly a tool designed for a narrow, constructive purpose (PHA screening) can become a general “scale-up enabler” once it is automated and paired with machine-learning image analysis. To address this, I would recommend capability scoping by restricting the tool to validated PHA workflows (supported dyes, locked calibration ranges, and scoped outputs).

Week 2 Lecture Prep

Homework Questions from Professor Jacobson:

Question 1 High-fidelity, proofreading-proficient replicative DNA polymerases have an error rate of ≈ 10⁻⁶ per base during synthesis under standard conditions. The human nuclear genome is about 3.2 × 10⁹ base pairs per haploid set. If errors occurred at 10⁻⁶ per base, you would expect roughly 3.2 × 10⁹ × 10⁻⁶ = 3.2 × 10³ (≈ 3,200) errors per haploid genome copy. In living cells, however, the effective replication error rate is far lower because post-replication repair (such as mismatch repair, MMR) acts on top of polymerase proofreading (3′→5′ exonuclease): a commonly cited order of magnitude is ≈ 10⁻⁹ to 10⁻¹⁰ errors per base pair per replication.
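
The arithmetic above can be reproduced in a few lines. This is a sketch that simply multiplies the per-base error rates quoted in the answer by the haploid genome size; the function name `expected_errors` is mine.

```python
GENOME_BP = 3.2e9  # haploid human genome size used in the answer above

def expected_errors(rate_per_bp, genome_bp=GENOME_BP):
    """Expected number of errors in one pass over the genome."""
    return rate_per_bp * genome_bp

# Polymerase-only rate, then the in-cell range after repair.
for rate in (1e-6, 1e-9, 1e-10):
    print(f"{rate:.0e} per bp -> about {expected_errors(rate):g} errors per haploid copy")
```

At the in-cell rates this corresponds to only about 0.3 to 3 errors per haploid genome copy, compared with roughly 3,200 for the polymerase alone.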

Question 2 Because of codon degeneracy, the same amino-acid sequence can be encoded by many DNA coding sequences. A rough average multiplicity per amino acid is about 3.05 synonymous codons. Given an average human protein-coding sequence of 1036 bp and 3 bp per amino acid, 1036 bp / 3 ≈ 345 codons. So the number of different DNA coding sequences that produce the exact same protein is on the order of 3.05³⁴⁵ ≈ 10¹⁶⁷. In practice, though, synonymous variants are not always functionally equivalent. Some synonymous changes produce transcripts with different stability and structure. For example, synonymous substitutions can lead to hairpins or repetitive motifs that increase recombination and reduce construct stability. They can also change ribosome speed patterns (which can alter co-translational folding and lead to misfolding, aggregation, or altered activity). In addition, synonymous changes can inadvertently create or disrupt regulatory sequence motifs (e.g., polyadenylation signals or splicing enhancer/silencer elements in eukaryotes).
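
The 10¹⁶⁷ estimate can be checked by working in logarithms, since the number itself is far too large to be meaningful as a float. In this sketch the 3.05 average is written as 61 sense codons divided by 20 amino acids, which gives the same value; treating every position as having the average multiplicity is the same simplification the answer makes.

```python
import math

CDS_BP = 1036            # average coding-sequence length used above
AVG_SYNONYMS = 61 / 20   # 61 sense codons over 20 amino acids = 3.05

codons = CDS_BP // 3                              # whole codons only
log10_count = codons * math.log10(AVG_SYNONYMS)   # log10(3.05 ** codons)

print(f"{codons} codons -> about 10^{log10_count:.0f} synonymous coding sequences")
```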

Homework Questions from Dr. LeProust:

The gold standard for oligonucleotide synthesis is solid-phase oligonucleotide synthesis (SPOS) based on phosphoramidite chemistry (Walther et al. 2020). However, this method struggles beyond ~200 nt because every nucleotide is added through repeated chemical cycles, and small inefficiencies, truncation products, depurination, and side reactions compound with length. For the same reason, a 2000 bp gene cannot be made reliably by direct oligo synthesis. Instead, long genes are typically assembled from shorter oligos or DNA fragments, followed by error correction, cloning, and sequence verification.
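
How quickly small per-cycle inefficiencies compound can be illustrated with a simple yield model. This is a sketch only: the per-step coupling efficiencies (99% and 99.5%) are illustrative assumptions rather than figures from the lecture, and the model ignores depurination and other side reactions.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands that are full length after stepwise synthesis.

    Assumes one coupling per nucleotide added after the first
    (support-bound) one, each succeeding independently.
    """
    return coupling_efficiency ** (length_nt - 1)

for efficiency in (0.99, 0.995):       # assumed per-step efficiencies
    for length in (60, 200, 2000):
        fraction = full_length_fraction(length, efficiency)
        print(f"efficiency {efficiency:.3f}, {length:>4} nt: {fraction:.2e} full length")
```

Even at an assumed 99.5% per step, only about a third of 200-nt strands would be full length, and essentially none of the 2000-nt strands, which is why long genes are assembled from shorter pieces.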

Homework Question from George Church:

Question: What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Answer: The 10 essential amino acids in all animals are Arginine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, and Valine. Considering this, Jurassic Park’s biocontainment method is a joke, since it doesn’t create a unique dependency in animals: animals already can’t synthesize lysine. Also, as containment-by-dependency, it’s ecologically leaky because they did not consider the possibility that lysine was readily available in the environment. Lysine is available via plants and prey, so escape doesn’t remove access.

Note: I answered this by consulting a Jurassic Park subreddit discussion.

Week 2: DNA Read, Write, & Edit

Homework

Part 0: Basics of Gel Electrophoresis

I have watched the Week 2 lecture and recitation on DNA read/write/edit, restriction digests, Benchling, Twist, and gel electrophoresis.

Part 1: Benchling & In-silico Gel Art

Opened Benchling and signed up. Found the Lambda sequence here and copied the sequence without the header. Pasted this sequence into Benchling through “Create” > “DNA / RNA Sequence” > “New DNA / RNA Sequence”. Then I just pasted the sequence in the “Bases” field, titled it “Lambda,” and selected the topology as “Linear.”

Clicked “Digest” (the scissors icon in the right menu), selected “All enzymes,” found all seven using the search tool, and clicked “Run Digest.”

This in-silico gel image uses simulated Lambda DNA restriction digest banding patterns from the required enzymes and arranges them as a visual composition inspired by Paul Vanouse’s Latent Figure Protocol.

Part 2: Gel Art - Restriction Digests and Gel Electrophoresis

I did not complete the wet-lab restriction digest and gel electrophoresis experiment. As a Committed Listener, I completed the required in-silico gel design in Benchling, but I did not have lab access for the optional wet-lab portion.

Part 3: DNA Design Challenge

3.1. Choose your protein: Poly(3-hydroxyalkanoate) polymerase subunit PhaC

I chose Polyhydroxyalkanoate synthase (PhaC) because it is involved in the catalysis of the reaction that polymerizes (R)-3-hydroxybutyryl-CoA to produce polyhydroxybutyrate (PHB), which is an important bioproduct of interest due to its plastic/polyethylene-like properties.

Biologically, PHB serves as an intracellular energy reserve material when cells grow under conditions of nutrient limitation.

Sequence of Polyhydroxyalkanoate Synthase (PhaC): MATGKGAAASTQEGKSQPFKVTPGPFDPATWLEWSRQWQGTEGNGHAAASGIPGLDALAGVKIAPAQLGDIQQRYMKDFSALWQAMAEGKAEATGPLHDRRFAGDAWRTNLPYRFAAAFYLLNARALTELADAVEADAKTRQRIRFAISQWVDAMSPANFLATNPEAQRLLIESGGESLRAGVRNMMEDLTRGKISQTDESAFEVGRNVAVTEGAVVFENEYFQLLQYKPLTDKVHARPLLMVPPCINKYYILDLQPESSLVRHVVEQGHTVFLVSWRNPDASMAGSTWDDYIEHAAIRAIEVARDISGQDKINVLGFCVGGTIVSTALAVLAARGEHPAASVTLLTTLLDFADTGILDVFVDEGHVQLREATLGGGAGAPCALLRGLELANTFSFLRPNDLVWNYVVDNYLKGNTPVPFDLLFWNGDATNLPGPWYCWYLRHTYLQNELKVPGKLTVCGVPVDLASIDVPTYIYGSREDHIVPWTAAYASTALLANKLRFVLGASGHIAGVINPPAKNKRSHWTNDALPESPQQWLAGAIEHHGSWWPDWTAWLAGQAGAKRAAPANYGNARYRAIEPAPGRYVKAKA Source: UniProt at https://www.uniprot.org/uniprotkb/P23608/entry#sequences

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence. reh:H16_A1437 K03821 poly(R)-3-hydroxyalkanoate polymerase subunit PhaC EC:2.3.1.304 | (GenBank) phaC1; Poly(3-hydroxybutyrate) polymerase (N) atggcgaccggcaaaggcgcggcagcttccacgcaggaaggcaagtcccaaccattcaaggtcacgccggggccattcgatccagccacatggctggaatggtcccgccagtggcagggcactgaaggcaacggccacgcggccgcgtccggcattccgggcctggatgcgctggcaggcgtcaagatcgcgccggcgcagctgggtgatatccagcagcgctacatgaaggacttctcagcgctgtggcaggccatggccgagggcaaggccgaggccaccggtccgctgcacgaccggcgcttcgccggcgacgcatggcgcaccaacctcccatatcgcttcgctgccgcgttctacctgctcaatgcgcgcgccttgaccgagctggccgatgccgtcgaggccgatgccaagacccgccagcgcatccgcttcgcgatctcgcaatgggtcgatgcgatgtcgcccgccaacttccttgccaccaatcccgaggcgcagcgcctgctgatcgagtcgggcggcgaatcgctgcgtgccggcgtgcgcaacatgatggaagacctgacacgcggcaagatctcgcagaccgacgagagcgcgtttgaggtcggccgcaatgtcgcggtgaccgaaggcgccgtggtcttcgagaacgagtacttccagctgttgcagtacaagccgctgaccgacaaggtgcacgcgcgcccgctgctgatggtgccgccgtgcatcaacaagtactacatcctggacctgcagccggagagctcgctggtgcgccatgtggtggagcagggacatacggtgtttctggtgtcgtggcgcaatccggacgccagcatggccggcagcacctgggacgactacatcgagcacgcggccatccgcgccatcgaagtcgcgcgcgacatcagcggccaggacaagatcaacgtgctcggcttctgcgtgggcggcaccattgtctcgaccgcgctggcggtgctggccgcgcgcggcgagcacccggccgccagcgtcacgctgctgaccacgctgctggactttgccgacacgggcatcctcgacgtctttgtcgacgagggccatgtgcagttgcgcgaggccacgctgggcggcggcgccggcgcgccgtgcgcgctgctgcgcggccttgagctggccaataccttctcgttcttgcgcccgaacgacctggtgtggaactacgtggtcgacaactacctgaagggcaacacgccggtgccgttcgacctgctgttctggaacggcgacgccaccaacctgccggggccgtggtactgctggtacctgcgccacacctacctgcagaacgagctcaaggtaccgggcaagctgaccgtgtgcggcgtgccggtggacctggccagcatcgacgtgccgacctatatctacggctcgcgcgaagaccatatcgtgccgtggaccgcggcctatgcctcgaccgcgctgctggcgaacaagctgcgcttcgtgctgggtgcgtcgggccatatcgccggtgtgatcaacccgccggccaagaacaagcgcagccactggactaacgatgcgctgccggagtcgccgcagcaatggctggccggcgccatcgagcatcacggcagctggtggccggactggaccgcatggctggccgggcaggccggcgcgaaacgcgccgcgcccgccaactatggcaatgcgcgctatcgcgcaatcgaacccgcgcctgggcgatacgtcaaagccaaggcatga 
Source: KEGG at https://www.genome.jp/dbget-bin/www_bget?reh:H16_A1437
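
A quick way to confirm that a nucleotide sequence encodes the intended protein is to translate it with the standard genetic code. The sketch below builds the standard codon table and translates the first 21 nt and the last 12 nt of the KEGG sequence above; the function name `translate` is my own.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding sequence from its first base, stopping at a stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("atggcgaccggcaaaggcgcg"))  # start of phaC -> MATGKGA
print(translate("gccaaggcatga"))           # end of phaC   -> AKA
```

Both fragments match the start (MATGKGA...) and end (...AKA) of the UniProt PhaC sequence in 3.1.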

3.3. Codon optimization. I optimized the phaC coding sequence for E. coli because it is a widely used chassis for recombinant protein expression and for rapid prototyping of metabolic engineering constructs.

I did this using the Benchling codon optimization tool. I selected the region of the amino-acid sequence I wanted to back-translate and right-clicked on the highlighted region. In the codon optimization tab, I used these settings:

Host: E. coli K-12
Method: Match codon usage
GC content: Medium (0.33 to 0.66) because extreme GC content can create problems. High GC can create strong secondary structures and low GC can cause instability/repeats and can make synthesis harder.
Uridine depletion: off (not relevant for bacterial expression)
Hairpin parameters: Stem size: 8 and Window 50
Restriction sites: avoid BsaI, BsmBI, BbsI (Type IIS enzymes, for Golden Gate compatibility, since phaA and phaB would also have to be cloned into the same vector, not phaC alone)
Patterns to reduce: AAAAAA and ATATATATA
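
The settings above can also be checked independently of Benchling. The following is a minimal sketch, not Benchling's algorithm: it checks GC content against the chosen range, looks for the three Type IIS recognition sites on both strands, and searches for the two listed patterns on the given strand only. Hairpin detection is left out, and the function names are hypothetical.

```python
# Recognition sites of the Type IIS enzymes listed above.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
PATTERNS = ("AAAAAA", "ATATATATA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def check_sequence(seq, gc_min=0.33, gc_max=0.66):
    """Report GC content, forbidden sites (either strand), and unwanted patterns."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    sites = [name for name, site in SITES.items()
             if site in seq or revcomp(site) in seq]
    patterns = [p for p in PATTERNS if p in seq]  # given strand only
    return {"gc": gc, "gc_ok": gc_min <= gc <= gc_max,
            "sites": sites, "patterns": patterns}

# Made-up test sequence: GAGACC is a BsaI site on the opposite strand.
report = check_sequence("ATGGAGACCGCGTTT")
print(report)
```

Running this on the optimized phaC sequence (and later on phaA and phaB) would give a quick pass/fail before placing an order.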

I clicked on “Optimization preview” and got this result:

3.4. You have a sequence! Now what?

PhaC alone will not produce PHB. A minimal PHB pathway typically includes PhaA (β-ketothiolase) and PhaB (acetoacetyl-CoA reductase) in addition to PhaC (PHA synthase). PhaA and PhaB convert central metabolites (via acetyl-CoA) into (R)-3-hydroxybutyryl-CoA, which is the direct substrate that PhaC polymerizes into PHB. You will also need a host capable of supplying sufficient acetyl-CoA and NADPH.

Therefore, for PHB production in E. coli, phaA, phaB, and phaC are commonly co-expressed on the same plasmid (as a single operon with one promoter and RBSs for each gene) and grown under appropriate culture conditions (e.g., carbon excess and nutrient limitation) that favor polymer accumulation.

To produce the protein from DNA, the codon-optimized phaC sequence would be placed in an expression cassette with a promoter, RBS, start codon, coding sequence, stop codon, and terminator. In a cell-dependent system such as E. coli, RNA polymerase transcribes the DNA sequence into mRNA. The ribosome binds the RBS, reads the mRNA codons, and translates them into the PhaC amino-acid chain. For PHB production rather than PhaC expression alone, phaA, phaB, and phaC should be co-expressed so the host can convert acetyl-CoA into (R)-3-hydroxybutyryl-CoA and then polymerize it into PHB.

Part 4: Prepare a Twist DNA Synthesis Order

Project: pBBR1MCS-5::phaCAB

Cell-dependent recombinant expression approach: cloning the codon-optimized phaA, phaB, and phaC coding sequences into E. coli K-12

Promoter - RBS - phaA - (RBS) - phaB - (RBS) - phaC - Terminator

phaA Sequence MTDVVIVSAARTAVGKFGGSLAKIPAPELGAVVIKAALERAGVKPEQVSEVIMGQVLTAGSGQNPARQAAIKAGLPAMVPAMTINKVCGSGLKAVMLAANAIMAGDAEIVVAGGQENMSAAPHVLPGSRDGFRMGDAKLVDTMIVDGLWDVYNQYHMGITAENVAKEYGITREAQDEFAVGSQNKAEAAQKAGKFDEEIVPVLIPQRKGDPVAFKTDEFVRQGATLDSMSGLKPAFDKAGTVTAANASGLNDGAAAVVVMSAAKAKELGLTPLATIKSYANAGVDPKVMGMGPVPASKRALSRAEWTPQDLDLMEINEAFAAQALAVHQQMGWDTSKVNVNGGAIAIGHPIGASGCRILVTLLHEMKRRDAKKGLASLCIGGGMGVALAVERK Source: UniProt at https://www.uniprot.org/uniprotkb/P14611/entry#sequences

phaB Sequence MTQRIAYVTGGMGGIGTAICQRLAKDGFRVVAGCGPNSPRREKWLEQQKALGFDFIASEGNVADWDSTKTAFDKVKSEVGEVDVLINNAGITRDVVFRKMTRADWDAVIDTNLTSLFNVTKQVIDGMADRGWGRIVNISSVNGQKGQFGQTNYSTAKAGLHGFTMALAQEVATKGVTVNTVSPGYIATDMVKAIRQDVLDKIVATIPVKRLGLPEEIASICAWLSSEESGFSTGADFSLNGGLHMG Source: UniProt at https://www.uniprot.org/uniprotkb/P14697/entry#sequences

phaC Sequence MATGKGAAASTQEGKSQPFKVTPGPFDPATWLEWSRQWQGTEGNGHAAASGIPGLDALAGVKIAPAQLGDIQQRYMKDFSALWQAMAEGKAEATGPLHDRRFAGDAWRTNLPYRFAAAFYLLNARALTELADAVEADAKTRQRIRFAISQWVDAMSPANFLATNPEAQRLLIESGGESLRAGVRNMMEDLTRGKISQTDESAFEVGRNVAVTEGAVVFENEYFQLLQYKPLTDKVHARPLLMVPPCINKYYILDLQPESSLVRHVVEQGHTVFLVSWRNPDASMAGSTWDDYIEHAAIRAIEVARDISGQDKINVLGFCVGGTIVSTALAVLAARGEHPAASVTLLTTLLDFADTGILDVFVDEGHVQLREATLGGGAGAPCALLRGLELANTFSFLRPNDLVWNYVVDNYLKGNTPVPFDLLFWNGDATNLPGPWYCWYLRHTYLQNELKVPGKLTVCGVPVDLASIDVPTYIYGSREDHIVPWTAAYASTALLANKLRFVLGASGHIAGVINPPAKNKRSHWTNDALPESPQQWLAGAIEHHGSWWPDWTAWLAGQAGAKRAAPANYGNARYRAIEPAPGRYVKAKA Source: UniProt at https://www.uniprot.org/uniprotkb/P23608/entry#sequences

For this exercise, I chose pBBR1MCS-5 as the plasmid backbone because it is a broad-host-range vector commonly used for cloning and expression of phaCAB. Source: https://www.teses.usp.br/teses/disponiveis/87/87131/tde-29042010-102817/publico/RogeriodeSousaGomes_Doutorado.pdf

The screenshot shows that my Twist account was redirected to “Contact Your Distributor” for orders through Interprise USA Corp., and another page returned an HTTP 500 server error.

Part 5: DNA Read / Write / Edit

5.1 DNA Read

I would sequence DNA used for DNA-based digital data storage because I am interested in how biological molecules can encode digital information. It would be fascinating to recover stored information from DNA as if reading an archive.

I would use Illumina sequencing, a second-generation massively parallel short-read technology, for high-accuracy base calls and reliable decoding of short oligo pools. I would also consider Oxford Nanopore sequencing, a third-generation single-molecule long-read technology, to validate longer constructs and check sequence integrity.

For Illumina, the input would be a pool of synthetic DNA oligos encoding digital data. If the oligos are already short, fragmentation may not be necessary. Library preparation would involve adapter ligation or PCR addition of adapters/indexes, followed by sequencing-by-synthesis using fluorescent reversible terminators. The output would be millions to billions of short reads in FASTQ format with per-base quality scores. The stored data would then be decoded using alignment, consensus generation, and error correction.
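
The consensus step in that decoding pipeline can be sketched as a per-position majority vote. This sketch assumes the reads are already aligned and trimmed to the same length, and the 2-bit base-to-bit mapping is a hypothetical code chosen for illustration; real DNA storage schemes use more elaborate encodings with error-correcting codes.

```python
from collections import Counter

def consensus(reads):
    """Majority base at each position across equal-length, pre-aligned reads."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def bases_to_bits(seq):
    """Hypothetical 2-bit code for illustration: A=00, C=01, G=10, T=11."""
    code = {"A": "00", "C": "01", "G": "10", "T": "11"}
    return "".join(code[base] for base in seq)

# Three noisy copies of the same oligo, each with at most one sequencing error.
reads = ["ACGT", "ACGA", "TCGT"]
oligo = consensus(reads)
print(oligo, bases_to_bits(oligo))
```

With enough read depth, isolated sequencing errors are outvoted, which is why high coverage matters for reliable decoding.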

5.2 DNA Write

I would synthesize a PHA production cassette for E. coli K-12 containing codon-optimized phaA, phaB, and phaC. The goal would be to rapidly test/study PHB production from a designed pathway rather than cloning each gene manually from genomic DNA.

I would use commercial gene synthesis, such as Twist, because it allows designed DNA sequences to be ordered directly with defined codon usage, avoided restriction sites, and synthesis constraints. The essential steps are: design the coding sequences, codon-optimize them for E. coli, add regulatory parts such as promoter/RBSs/terminator, screen for forbidden restriction sites and problematic repeats, synthesize short oligos, assemble them into longer fragments or a full insert, clone into a plasmid, and verify the final sequence.

The main limitations are length-dependent error accumulation, synthesis difficulty from repeats or extreme GC content, turnaround time, cost for long constructs, and the need for clonal verification before experimental use.

5.3 DNA Edit

To improve PHA production from phaCAB, I would edit E. coli metabolic and stress-tolerance genes to increase PHB yield. For example, I would target pathways that improve acetyl-CoA/NADPH supply, reduce competing carbon sinks, and increase tolerance to intracellular polymer accumulation.

For precise point mutations, I would use CRISPR base editing or prime editing because these methods can introduce targeted sequence changes without relying on double-strand breaks. For larger edits or gene insertions, I would use Cas9-assisted homologous recombination with a donor DNA template.

The design steps would include selecting the target gene, designing guide RNAs, checking off-target risk, preparing the editor plasmid or Cas9/gRNA system, designing the donor template if needed, transforming E. coli, selecting edited colonies, and confirming edits by sequencing.

Limitations include editing efficiency, PAM constraints, off-target edits, toxicity from editor expression, and the increased screening burden when multiplexing several edits.
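To illustrate the PAM constraint, here is a minimal sketch that lists candidate SpCas9 target sites, assuming the canonical NGG PAM and a 20-nt protospacer. It does not score off-target risk, and the function name is hypothetical.

```python
import re

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_spcas9_targets(seq: str, protospacer_len: int = 20) -> list:
    """List (strand, start, protospacer, pam) for every NGG PAM that has a
    full-length protospacer immediately 5' of it.

    For the '-' strand, start is the 0-based index in the reverse complement.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):   # lookahead allows overlapping sites
            results.append((strand, m.start(), m.group(1), m.group(2)))
    return results

toy = "A" * 20 + "TGG"
for hit in find_spcas9_targets(toy):
    print(hit)
```

Base editors and prime editors add further constraints (editing window, nick position) that this sketch does not model.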

Week 3 HW: Lab Automation

Python Script for Opentrons Artwork

Here’s my HTGAA 2026 Opentrons Art Python Script Submission.

The artistic design I created using the GUI is available here.

I heavily used the “Example 7 Microbial Earth” by Dominika Wawrzyniak, using pixels loaded from an external resource (a CSV file hosted on my GitHub page).

I used Dominika’s well documented Notion page from HTGAA21 to understand the code and replicate it for my case. I used Gemini assistance only to debug minor typos and syntax errors, and to identify which packages to import to execute the code.

Like Dominika Wawrzyniak, I planned to introduce more colors, like in the image I generated in the Automation Art Interface. However, implementing this design into code turned out to be more difficult and tedious than anticipated, so I left it as one color (red).

I submitted the Python file through the required homework submission form.

As a Committed Listener, I prepared the script and design documentation, but I did not run the protocol on a physical Opentrons robot.

Post-Lab Questions

Question 1

The paper “High-throughput experimentation for discovery of biodegradable polyesters” (Fransen et al., 2023) uses an Opentrons 1st-generation robot to automate a high-throughput biodegradation assay based on the clear-zone technique.

The researchers synthesized 642 polyesters and polycarbonates and tested their biodegradability using a clear-zone assay with Pseudomonas lemoignei. The Opentrons robot was repurposed as an automated imaging platform to capture time-lapse images of polymer degradation in 12-well plates, enabling consistent, large-scale monitoring over 13 days.

This automation allowed rapid generation of a large biodegradation dataset and supported machine learning models to predict polymer degradability from chemical structure.

Question 2

High-throughput screening of bacterial isolates for PHA production is traditionally extremely time-consuming and labor-intensive, requiring manual handling of hundreds of colonies across multiple conditions. For my final project, I plan to use an Opentrons OT-2 liquid-handling robot to automate this workflow, dramatically increasing throughput, reproducibility, and consistency compared to manual methods I used during my master’s.

Isolates will be spotted in triplicate on 60-sector plates, maintaining identical indexed positions across all plates for direct comparison. Viability will first be confirmed on LB agar, and isolates will then be inoculated onto mineral medium (MM; Ramsay et al., 1990) agar plates supplemented with individual carbon sources at 10% v/v to reach typical screening concentrations.

PHA production and bacterial growth will be assessed using a two-step staining workflow. First, Sudan Black B (0.02% in 96% ethanol, followed by ethanol washes) will identify colonies with blue coloration indicative of polymer accumulation. Second, Nile Red incorporated into MM (0.5 μg/mL) will allow selected isolates to be ranked based on fluorescence under UV illumination (312/365 nm).

This automated setup enables rapid testing of hundreds of isolate × carbon source combinations, accelerating the discovery of strains compatible with low-cost feedstocks and efficient bioprocessing while transforming a laborious manual process into a precise, scalable screening platform.

Here’s my draft script for this exercise.

Each “color” would correspond to a different bacterial isolate. I have not implemented this in the script yet. The coordinate set is a starting layout and could be refined to achieve a more uniform, regular distribution across the plate (as in the image below, which I drafted using the GUI).
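One way to get a more regular layout is to generate the spotting coordinates on a square grid and keep only the points that fall inside the circular plate. The sketch below is plain Python and does not call the Opentrons API; the radius and spacing values are placeholders, not measured plate dimensions.

```python
def grid_in_circle(radius_mm: float, spacing_mm: float) -> list:
    """Return (x, y) offsets from the plate center on a square grid,
    keeping only points that lie inside the circle."""
    n = int(radius_mm // spacing_mm)
    points = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = i * spacing_mm, j * spacing_mm
            if x * x + y * y <= radius_mm ** 2:
                points.append((x, y))
    return points

# Hypothetical values: 40 mm usable radius, 9 mm between spots.
layout = grid_in_circle(radius_mm=40, spacing_mm=9)
print(len(layout), "spot positions")
```

Each (x, y) pair could then be assigned to one isolate so that the same index maps to the same position on every plate.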

Final Project Ideas

Added 3 slides with 3 ideas for an Individual Final Project in the appropriate slide deck for Committed Listeners here.

Also, here’s my analog brainstorm.

Week 4 HW: Protein Design Part 1

Homework: Protein Design I

Part A. Conceptual Questions

1) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

About 21% of meat is protein (Smith et al. 2022); therefore, 500 g of meat contains about 105 g of protein.

Using the approximation that an average amino acid is ≈ 100 Da ≈ 100 g/mol: 105 g ÷ 100 g/mol ≈ 1.05 mol.

Avogadro’s number: 1 mol = 6.02214076×10²³ molecules, so 1.05 mol × 6.022×10²³ mol⁻¹ ≈ 6.3×10²³ amino-acid molecules.
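The estimate can be checked numerically. With the full 105 g of protein the result is about 6.3×10²³ molecules; rounding the protein mass down to 100 g gives about 6.0×10²³. The 21% protein fraction and the 100 g/mol average mass are the approximations stated above.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def amino_acid_count(meat_g: float,
                     protein_fraction: float = 0.21,
                     aa_mass_g_per_mol: float = 100.0) -> float:
    """Estimate the number of amino-acid molecules in a piece of meat."""
    protein_g = meat_g * protein_fraction      # grams of protein
    moles = protein_g / aa_mass_g_per_mol      # moles of amino acids
    return moles * AVOGADRO

print(f"{amino_acid_count(500):.2e} amino-acid molecules")
```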

2) Why do humans eat beef but do not become a cow, eat fish but do not become fish?

Beef and fish supply raw materials and energy, but they do not transfer “cow/fish identity”. What we eat is digested first, meaning that proteins, fats, and carbohydrates are broken down into small building blocks (amino acids, fatty acids, sugars), absorbed, and then reassembled into human molecules under human genetic and hormonal control.

3) Why are there only 20 natural amino acids?

Doig (2017) hypothesizes that the canonical set of 20 standard amino acids is best understood as an evolved “alphabet” that became fixed early because this set is sufficient and practical for building stable, soluble proteins. This set enables soluble folded structures with close-packed hydrophobic cores and ordered binding pockets, rather than being selected because each amino acid was needed for catalysis (since RNA catalysts were already effective enough). Once early life standardized a working translation system around this set, changing the alphabet would have been costly, so it became effectively locked in (“frozen”). Other references, such as Freeland et al. (2000), suggest that 20 is a good number for minimizing damage from errors (mutation/mistranslation).

4) Where did amino acids come from before enzymes that make them, and before life started?

Amino acids could plausibly have come from abiotic chemistry on early Earth. Proposed routes include cyanosulfidic protometabolism and amino-acid formation from electrical discharges in simple “primitive Earth” gas mixtures (the classic Miller experiment).

5) Can you discover additional helices in proteins?

Beyond the α-helix, proteins commonly contain 3₁₀ helices and π helices (less frequent helical variants), as well as polyproline II helices (common in Pro-rich/disordered regions) and the specialized collagen triple helix.

6) Why are most molecular helices right-handed?

Right-handed helices dominate because natural biomolecules are built from homochiral monomers (L-amino acids in proteins, D-sugars in nucleic acids), and for these monomers the right-handed twist is the lowest-energy way to repeat their geometry without steric clashes.

7) Why do β-sheets tend to aggregate?

β-sheet aggregation buries exposed hydrophobic side chains and releases ordered water from their surfaces, which is entropically favorable and lowers the free energy of the system.

8) What is the driving force for β-sheet aggregation?

β-sheet aggregation is driven mainly by the hydrophobic effect and stabilized/propagated by intermolecular backbone H-bonding in the cross-β structure (often reinforced by tight steric-zipper packing).

9) Why do many amyloid diseases form β-sheets?

β-sheet architecture is an unusually generic, stable, and self-templating way for polypeptide backbones to stick together when normal folding fails. In a β-sheet, the peptide backbone forms regular hydrogen bonds. This conformation makes amyloid fibrils thermodynamically stable and hard to clear, because once a small β-sheet nucleus forms, it can seed further growth by recruiting more monomers and templating the same β-rich structure.

Part B: Protein Analysis and Visualization

Question 1

I selected poly(3-hydroxyalkanoate) depolymerase (PhaZ) because it is the key enzyme that degrades PHB, which directly controls whether a microbe accumulates bioplastic (useful for biotechnology) or breaks it down (relevant for environmental fate). phaZ inactivation is commonly discussed as a strategy to reduce PHA mobilization and increase polymer retention.

Question 2

MPEPYIFRTVELDDQSIRTAVRPGKPHLTPLLIFNGIGANLELVFPFIEALDPDLEVIAFDVPGVGGSSTPRHPYRFPGLAKLTARMLDYLDYGQVSAIGVSWGGALAQQFAHDYPERCKKLVLAATAAGAVMVPGKPKVLWMMASPRRYVQPSHVIRIAPLIYGGAFRRDPDLAMHHASKVRSGGKLGYYWQLFAGLGWTSIHWLHKIHQPTLVLAGDDDPLIPLVNMRLLAWRIPNAQLHIIDDGHLFLITRAEAVAPIIMKFLQEERQRAVMHPRPASGG

BLAST Result

Length: 283 aa
Most frequent amino acid: Leucine (L), 32/283 = 11.3%
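The length and most frequent residue can be recomputed directly from the sequence. A minimal sketch, assuming the sequence is pasted in as a plain one-letter string (the short string below is a toy placeholder, not PhaZ):

```python
from collections import Counter

def most_common_residue(seq: str) -> tuple:
    """Return (residue, count, fraction) for the most frequent amino acid."""
    counts = Counter(seq)
    residue, n = counts.most_common(1)[0]
    return residue, n, n / len(seq)

toy_seq = "MLLAKLGL"   # placeholder; paste the full PhaZ sequence here
residue, n, frac = most_common_residue(toy_seq)
print(f"Length: {len(toy_seq)} aa; most frequent: {residue} ({n}, {frac:.1%})")
```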

250 hits Reviewed (Swiss-Prot) homologs: 1

It belongs to the PHA depolymerase (PhaZ) family, which is part of the broader α/β-hydrolase enzyme superfamily.

Question 3

AF_AFP26495F1 - COMPUTED STRUCTURE MODEL OF POLY(3-HYDROXYALKANOATE) DEPOLYMERASE

This is not an experimentally solved structure, so there is no X-ray/EM “resolution” value. RCSB explicitly states: “There are no experimental data to verify the accuracy of this computed structure model. See Model Confidence metrics below for all regions of the polypeptide chain.” Instead, quality is reported by AlphaFold confidence. Global pLDDT: 91.95 (very high confidence overall)

RCSB lists 1 unique protein chain (monomer A1) and no ligands/non-protein entities.

Structure classification family: InterPro annotations classify it as Poly(3-hydroxyalkanoate) depolymerase (IPR011942) and an alpha/beta hydrolase fold protein (Alpha/beta hydrolase fold-1 domain, AB hydrolase superfamily).

Question 4

I opened AF-Q9R9W3-F1-model_v6 in PyMOL and visualized it in cartoon, ribbon, and ball-and-stick representations.

Colored by secondary structure, it shows a mixed α/β fold with more helices than β-sheets.

Colored by residue type, hydrophobic residues are enriched in the core (and in a few surface patches), while polar/charged residues are mostly surface-exposed, consistent with solubility.

The surface view shows clear cavities/clefts, consistent with potential binding pockets (e.g., a substrate-binding groove typical of hydrolases).

Part C. Using ML-Based Protein Design Tools

For this section, I chose PDB 6J2U as a structural reference. This entry contains a heterodimeric complex between MelC1, the tyrosinase caddy/cofactor protein, and MelC2, the tyrosinase enzyme from Streptomyces avermitilis. For my analysis, I focused on the MelC2 tyrosinase chain (6J2U_2: Represented by Chain B).

https://colab.research.google.com/drive/1cOMreGB8zHQAy063H0HyahlEamIXSSPf?usp=sharing

C1. Protein Language Modeling

Question 1

a) I used the Chain B sequence from PDB 6J2U, including the N-terminal expression tag present in the deposited sequence.

b) The vertical darker columns at certain positions are highly constrained residues where most substitutions are penalized. That usually indicates structural importance (core packing, tight turns, or residues critical for fold stability). Positions with mostly neutral colors across many substitutions are likely surface-exposed or in flexible loops, where the model predicts more tolerance.

After generating the ESM2 mutational scan heatmap, I found it difficult to confidently interpret specific patterns by visual inspection alone, because the plot contains many residues and mutations compressed into a dense matrix. To make the interpretation more objective, I used ChatGPT to help me write small analysis snippets to quantify the heatmap directly. I ran a script to calculate the average ESM2 score for each mutant amino acid across all positions and found that substitutions to cysteine are broadly disfavored across the MelC2 sequence.

In fact, in the heatmap, the cysteine row apparently shows many strongly negative scores, suggesting that the model predicts cysteine mutations to be poorly compatible with this protein. This makes biological sense because cysteine can introduce reactive thiol chemistry, unwanted disulfide-like interactions, or local structural constraints that may disrupt folding or stability, especially in a soluble bacterial enzyme where cysteine is not broadly used as a tolerated replacement.
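The per-amino-acid averaging described above can be sketched as follows. This is not the exact snippet I ran: it assumes the heatmap is available as a list of 20 rows (one per mutant amino acid, in the order given by `AMINO_ACIDS`) with one column per sequence position, and the toy scores are invented for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # assumed row order of the heatmap

def mean_score_per_mutant(scores: list) -> dict:
    """scores[i][j] = model score for mutating position j to AMINO_ACIDS[i]."""
    return {aa: sum(row) / len(row) for aa, row in zip(AMINO_ACIDS, scores)}

def rank_mutants(scores: list) -> list:
    """Sort mutant amino acids from most disfavored to most tolerated."""
    means = mean_score_per_mutant(scores)
    return sorted(means.items(), key=lambda kv: kv[1])

# Toy matrix with 2 positions: every substitution neutral except cysteine.
toy = [[0.0, 0.0] for _ in AMINO_ACIDS]
toy[AMINO_ACIDS.index("C")] = [-4.0, -2.0]
print(rank_mutants(toy)[:3])
```

For simplicity the mean includes positions where the "mutant" equals the wild-type residue; excluding those would be a small refinement.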

Question 2

During the latent space analysis, I tried to use the provided SCOPe/Astral sequence dataset from the notebook, but I could not load it correctly in Colab. When I attempted to display sequences, I got an IndexError: list index out of range, which indicated that no sequence records had been parsed.

At first, I tested whether the issue was caused by comment lines before the first FASTA entry and tried using the fasta-pearson parser. After further debugging with Gemini/AI assistance in Colab, the issue appeared to be that the dataset URL was not returning the expected FASTA file, but HTML content instead.

I also tried opening the SCOPe/Astral page manually in the browser, but the site displayed an anti-bot verification page and did not provide access to the dataset.

Because of this, Biopython could not parse the dataset, so I was not able to generate the reduced-dimensionality map or place my protein in it.
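A quick check of the downloaded text before handing it to a parser would have revealed the problem earlier. The sketch below uses a hypothetical helper, `classify_download`, and simple heuristics; it is not part of the course notebook.

```python
def classify_download(text: str) -> str:
    """Guess whether downloaded text is HTML, FASTA, or something else."""
    head = text.lstrip()[:200].lower()
    if head.startswith(("<!doctype html", "<html")):
        return "html"                      # e.g. an anti-bot or error page
    if any(line.startswith(">") for line in text.splitlines()):
        return "fasta"                     # at least one FASTA header line
    return "unknown"

print(classify_download("<!DOCTYPE html><html><body>Verify you are human</body></html>"))
print(classify_download("# comment line\n>d1abc_ a.1.1.1\nMKVLAAGIVG\n"))
```

The FASTA test tolerates comment lines before the first header, which is the case I initially suspected.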

If the dataset had loaded correctly, my workflow would have been:

Parse the SCOPe/Astral FASTA dataset.
Add the MelC2 Chain B sequence to the dataset.
Generate ESM2 embeddings for all sequences.
Reduce the embeddings using t-SNE.
Highlight MelC2 in the resulting map.
Compare MelC2 to its nearest neighbors.
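The last step, comparing MelC2 with its nearest neighbors, could be done with cosine similarity in the original embedding space rather than in the t-SNE projection, since t-SNE distances are not reliable for ranking. A minimal sketch with toy 2-D vectors standing in for real ESM2 embeddings (all names and values are placeholders):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query: list, embeddings: dict, k: int = 3) -> list:
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

toy_embeddings = {"protein_a": [1.0, 0.0], "protein_b": [0.0, 1.0], "protein_c": [1.0, 1.0]}
print(nearest_neighbors([1.0, 0.1], toy_embeddings, k=2))
```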

a) Use the provided sequence dataset to embed proteins in reduced dimensionality I attempted to use the provided SCOPe/Astral sequence dataset, but the file could not be accessed correctly. The downloaded content was HTML rather than a valid FASTA file, so I could not generate ESM2 embeddings from the provided dataset.

b) Analyze the different formed neighborhoods: do they approximate similar proteins? Since the dataset could not be parsed, I could not generate or analyze the embedding neighborhoods directly. Conceptually, I would expect ESM2 embeddings to place proteins with related sequence-level features, domains, motifs, or families closer together, but I could not verify this with the provided dataset.

c) Place your protein in the resulting map and explain its position and similarity to its neighbors My plan was to add MelC2 tyrosinase from PDB 6J2U Chain B to the dataset before embedding, then inspect whether it clustered near related proteins such as oxidoreductases, tyrosinases, or metal-binding enzymes. Since the dataset could not be accessed correctly, I could not place MelC2 in the final map, so this remains a planned analysis rather than a completed result.

C2. Protein Folding

Question 1

I folded the MelC2 tyrosinase Chain B sequence from PDB 6J2U using ESMFold. The input sequence was 285 amino acids long. The prediction completed successfully with pTM = 0.906 and average pLDDT = 86.743, suggesting that ESMFold produced a high-confidence global fold for MelC2.

At the fold level, yes: the ESMFold prediction appears broadly consistent with the original MelC2 structure. However, I did not calculate RMSD, and the original PDB structure includes MelC2 in complex with MelC1 and metal ions, while ESMFold predicts from sequence alone. Therefore, I interpret the result as a strong qualitative fold-level match, not a precise coordinate-level comparison.

Question 2

Native MelC2

MGSHHHHHHSERTVRKNQATLTADEKRRFVDALVALKRSGRYDEFVTTHNAFIMGDTDSGERTGHRSPSFLPWHRRFLIEFEQALQAVDPSVALPYWDWSTDRTARASLWAPDFLGGSGRSLDGRVMDGPFAASTGNWPVNVRVDSRTYLRRTLGGGGRELPTRAEVDSVLAMSTYDMAPWNSASDGFRNHLEGWRGVNLHNRVHVWVGGQMATGVSPNDPVFWLHHAYIDRLWAQWQSRHPGSGYVPTGGTPNVVDLNETMKPWNDVRPADLLDHTAHYTFDTV

Length: 285 aa pTM = 0.906 pLDDT = 86.743

Mutant 1

To test whether the MelC2 predicted structure is resilient to a small sequence change, I first introduced a single point mutation into the Chain B sequence.

I used the following simple Python function to generate the mutant sequence here.

I selected position 100, where the native residue is serine (S), and mutated it to cysteine (C):

S100C MelC2 mutant

MGSHHHHHHSERTVRKNQATLTADEKRRFVDALVALKRSGRYDEFVTTHNAFIMGDTDSGERTGHRSPSFLPWHRRFLIEFEQALQAVDPSVALPYWDWCTDRTARASLWAPDFLGGSGRSLDGRVMDGPFAASTGNWPVNVRVDSRTYLRRTLGGGGRELPTRAEVDSVLAMSTYDMAPWNSASDGFRNHLEGWRGVNLHNRVHVWVGGQMATGVSPNDPVFWLHHAYIDRLWAQWQSRHPGSGYVPTGGTPNVVDLNETMKPWNDVRPADLLDHTAHYTFDTV

I then used this S100C MelC2 mutant sequence as the input for a new ESMFold prediction, so I could compare its predicted fold and confidence scores with the native MelC2 prediction.
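The linked script is not reproduced here. A hypothetical equivalent, which uses 1-based residue numbering and checks the expected wild-type residue before substituting, could look like this (shown on a toy sequence):

```python
def point_mutate(seq: str, position: int, expected_wt: str, new_aa: str) -> str:
    """Apply a single substitution such as S100C using 1-based numbering."""
    idx = position - 1
    if seq[idx] != expected_wt:
        raise ValueError(
            f"expected {expected_wt} at position {position}, found {seq[idx]}"
        )
    return seq[:idx] + new_aa + seq[idx + 1:]

toy = "MKSV"
print(point_mutate(toy, 3, "S", "C"))   # S3C on the toy sequence
```

The wild-type check guards against off-by-one errors, which are easy to make when a sequence carries an expression tag that shifts the numbering.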

The S100C mutant produced almost the same ESMFold confidence scores as the native sequence.

Length: 285 ptm: 0.906 plddt: 86.874

After introducing the S100C mutation, the predicted structure still appeared compact and globular, with no obvious large-scale disruption compared to the native model. This suggests that MelC2 is structurally resilient to this single substitution at the overall fold level.

Mutant 2

I generated this 16-amino-acid segment-level mutant using a short Python script suggested by ChatGPT here. The script replaced residues 120-135 of the native MelC2 sequence with a glycine-rich segment while preserving the original protein length. I used this to test whether the predicted MelC2 fold is resilient to larger local sequence disruption.

MelC2_segment_mutation_120_135_Gly

Original segment 120-135: RSLDGRVMDGPFAAST Replacement segment: GGGGGGGGGGGGGGGG Native length: 285 Segment mutant length: 285 MGSHHHHHHSERTVRKNQATLTADEKRRFVDALVALKRSGRYDEFVTTHNAFIMGDTDSGERTGHRSPSFLPWHRRFLIEFEQALQAVDPSVALPYWDWSTDRTARASLWAPDFLGGSGGGGGGGGGGGGGGGGGGNWPVNVRVDSRTYLRRTLGGGGRELPTRAEVDSVLAMSTYDMAPWNSASDGFRNHLEGWRGVNLHNRVHVWVGGQMATGVSPNDPVFWLHHAYIDRLWAQWQSRHPGSGYVPTGGTPNVVDLNETMKPWNDVRPADLLDHTAHYTFDTV
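For reference, a length-preserving segment replacement of this kind can be written in a few lines. This is a sketch of the idea rather than the exact script I used, and the example runs on a toy string.

```python
def replace_segment(seq: str, start: int, end: int, filler: str = "G") -> str:
    """Replace residues start..end (1-based, inclusive) with a repeated
    filler residue, keeping the total sequence length unchanged."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("segment bounds must lie within the sequence")
    return seq[:start - 1] + filler * (end - start + 1) + seq[end:]

toy = "ABCDEFGH"
mutant = replace_segment(toy, 3, 5)
print(mutant, len(mutant) == len(toy))
```

Calling it with `start=120, end=135` on the native sequence would give a 16-residue glycine stretch in place of the original segment.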

The segment mutant produced a lower-confidence ESMFold prediction than the native and S100C sequences. The native MelC2 model had pTM = 0.906 and pLDDT = 86.743, while the segment mutant dropped to pTM = 0.865 and pLDDT = 81.386.

Visually, the predicted structure still formed a compact globular fold, so the protein did not appear completely disrupted. However, the decrease in both pTM and pLDDT suggests that replacing residues 120-135 with glycines weakened the model’s confidence in the fold.

This makes sense because a glycine-rich replacement can increase flexibility and remove side-chain interactions that may help stabilize the local structure. Still, these are structure predictions only; experimental testing would be needed to know whether catalytic activity or copper/metal-related function is preserved.

Sequence	Change	pTM	pLDDT	Interpretation
Native MelC2	None	0.906	86.743	High-confidence compact fold
S100C	Single point mutation	0.906	86.874	Fold appears globally preserved; confidence essentially unchanged
Segment mutant	Residues 120-135 replaced with glycines	0.865	81.386	Fold still predicted, but confidence decreased, suggesting the perturbation affected structural stability more than the point mutation

C3. Protein Generation

Question 1

I used ProteinMPNN to redesign the MelC2 chain from PDB 6J2U. I set Chain B as the designed chain and kept Chain A fixed, since Chain B is MelC2 tyrosinase and Chain A is the MelC1 caddy/cofactor protein.

ProteinMPNN used 273 resolved residues from Chain B and generated a redesigned sequence with:

Native score: 1.2305
Designed score: 0.7427
Sequence recovery: 0.5751

The sequence recovery means that about 57.5% of the redesigned residues matched the native MelC2 sequence. This suggests that ProteinMPNN found a sequence predicted to fit the same backbone while changing a substantial part of the original sequence.
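Sequence recovery is simply the fraction of positions where the designed residue equals the native one. A minimal sketch of that calculation, using toy strings rather than the real sequences:

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of aligned positions with identical residues."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

print(sequence_recovery("ACDE", "ACDF"))   # 3 of 4 positions match
```

A recovery of 0.5751 over 273 residues is consistent with 157 identical positions (157/273 ≈ 0.5751).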

However, this only suggests structural compatibility. It does not prove that the redesigned protein would preserve tyrosinase activity, metal binding, or melanin production.

Question 2

I folded the ProteinMPNN-designed MelC2 sequence with ESMFold to test whether the redesigned sequence still predicts a MelC2-like structure.

Sequence	Length	pTM	pLDDT	Interpretation
Native MelC2	285 aa	0.906	86.743	High-confidence native fold prediction
ProteinMPNN design	273 aa	0.878	80.444	Still folds with good confidence, but lower than native

The ProteinMPNN-designed sequence produced pTM = 0.878 and pLDDT = 80.444. These scores are lower than the native MelC2 prediction, but still reasonably high, suggesting that the redesigned sequence remains structurally compatible with the MelC2 backbone.

Because the designed sequence had only 57.5% sequence recovery, it is substantially different from native MelC2. However, ESMFold still predicted a compact fold with good confidence. This suggests that ProteinMPNN generated a sequence that may preserve the overall structure, although this does not prove preservation of tyrosinase activity, metal binding, or melanin production.

Final Conclusions

Sequence / model	Type of test	Change introduced	Length	pTM	avg pLDDT	Result / interpretation
Native MelC2	Baseline ESMFold prediction	Original MelC2 Chain B sequence from PDB 6J2U	285 aa	0.906	86.743	High-confidence compact fold. Used as the reference for comparison.
S100C mutant	Point mutation	Serine at position 100 replaced by cysteine	285 aa	0.906	86.874	Scores were essentially unchanged compared with native MelC2. The global fold appears resilient to this single point mutation.
Segment mutant 120-135 Gly	Large local perturbation	Residues 120-135, `RSLDGRVMDGPFAAST`, replaced with 16 glycines	285 aa	0.865	81.386	Still predicted to fold, but with reduced confidence. This suggests the global fold is not destroyed, but the perturbation affects structural confidence/stability more than the point mutation.
ProteinMPNN-designed MelC2	Inverse-folding design + ESMFold validation	ProteinMPNN redesigned Chain B using the 6J2U backbone; sequence recovery = 0.5751	273 aa	0.878	80.444	Still predicted to fold with reasonably good confidence, despite only ~57.5% sequence recovery. Suggests the backbone can support substantial sequence variation, but function is not guaranteed.

Overall, MelC2 appears structurally robust at the global fold level. However, all of these conclusions are structural predictions. A preserved fold does not prove preserved tyrosinase activity, copper/metal binding, or melanin production. Functional validation would still require experimental testing.

Part D. Group Brainstorm on Bacteriophage Engineering

Group Members: Diogo Custodio; Flo Razoux; Katharine Kolin; Mariana Kanbe; Marisa Satsia.

PROJECT MAIN GOAL in discussion:

Increased stability (easiest)
Higher titers (medium)
Higher toxicity of lysis protein (hard)

My group and I are conducting research for the group phage project. We have set up a shared Google Doc (screenshot below).

Phage reading material

We reviewed the Week 4 phage reading material and used it to focus the proposal on the MS2 L protein, especially its stability, DnaJ dependence, membrane insertion, and lysis function.

From the proposed bacteriophage engineering goals, our group focused on: Increased stability of the L protein

Our short group plan was to use computational protein design tools to identify mutations that could improve the stability of the MS2 L protein. One possible direction was to make the L protein less dependent on the bacterial chaperone DnaJ by identifying mutations that could improve folding, membrane insertion, or oligomerization.

We proposed using:

Protein language model mutational scoring
In silico mutagenesis
Experimental L-protein mutant data
Biological reasoning based on known L-protein functional regions

These tools can help prioritize mutations before experimental testing. Protein language model scores can identify substitutions that are sequence-compatible, while experimental mutant data and biological reasoning can help filter candidates based on possible effects on DnaJ dependence, membrane behavior, and lysis function.

Potential pitfalls: One pitfall is that positive LLR scores may reflect sequence plausibility, but not necessarily improved lysis function. A second pitfall is that increasing protein stability may not always improve function, because L-protein activity may require flexibility, membrane disruption, or host-factor interaction.

Pipeline schematic

MS2 L-protein sequence: mutational scoring notebook → shortlist positive-scoring substitutions → compare with experimental L-protein mutant data → map candidates to functional regions → select mutations for future experimental testing

Individual plan / contribution

My individual contribution was to select candidate MS2 L-protein mutations by combining LLR scores, experimental mutant data, and biological reasoning.

I selected two soluble-region mutants, S9Q and C29R, to probe folding and possible DnaJ dependence. I also selected three transmembrane-region mutants, A45L, T52L, and N53L, to probe membrane insertion and oligomerization.

Mutant	Region	LLR	Rationale
S9Q	Soluble / N-terminal	2.014	May affect folding or DnaJ-related surface chemistry
C29R	Soluble / N-terminal	2.395	Strong positive score; may alter chaperone-recognition surfaces
A45L	Transmembrane	1.539	May increase hydrophobic packing and membrane stability
T52L	Transmembrane	1.814	Polar-to-hydrophobic change that may improve membrane compatibility
N53L	Transmembrane	1.865	Additional transmembrane-stabilizing candidate
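For context, here is a sketch of how such scores are typically defined and how the shortlist can be ranked. It assumes the scoring notebook uses the common definition LLR = log p(mutant) − log p(wild type) at the mutated position; I did not verify the notebook internals, so treat the formula as an assumption. The LLR values are the ones from the table above.

```python
import math

def log_likelihood_ratio(p_mutant: float, p_wildtype: float) -> float:
    """LLR > 0 means the model finds the mutant residue more likely
    than the wild-type residue at that position."""
    return math.log(p_mutant) - math.log(p_wildtype)

# LLR values from the table above.
candidates = {"S9Q": 2.014, "C29R": 2.395, "A45L": 1.539, "T52L": 1.814, "N53L": 1.865}

ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
for name, llr in ranked:
    print(f"{name}: LLR = {llr:.3f}")
```

Ranking by LLR only orders candidates by sequence plausibility; the region and rationale columns still decide which ones are worth testing.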

Use of AI assistance

I used ChatGPT as a writing and organization assistant to help structure this section and make sure the required items were clearly addressed. I reviewed, edited, and finalized the scientific content myself.

Week 5 HW: Protein Design Part 2

Part A: SOD1 Binder Peptide Design (From Pranam)

Part 1: Generate Binders with PepMLM

Question 1

This is the human SOD1 sequence from UniProt (P00441), with the initiator Met removed:

ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ FASTA

Introducing the A4V mutation, which is associated with the most aggressive forms of ALS, gives:

ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Question 2 and 3

With the help of ChatGPT and Gemini, I added two new cells to generate four peptides, each 12 amino acids long, conditioned on the mutant SOD1 sequence.

4 PepMLM-generated 12-aa peptides (conditioned on mutant SOD1):

HRVPVAGVEWWE
WSYYVTAVAHKE
WRYGAAAVEWKE
WSVPVVAIEHGE

Question 4

HRVPVAGVEWWE
WSYYVTAVAHKE
WRYGAAAVEWKE
WSVPVVAIEHGE
FLYRWLPSRRGG (known binder)

Question 5

WRYGAAAVEWKE - ppl 4.645 (mean NLL 1.536)
WSYYVTAVAHKE - ppl 5.094 (mean NLL 1.628)
WSVPVVAIEHGE - ppl 6.423 (mean NLL 1.860)
HRVPVAGVEWWE - ppl 7.660 (mean NLL 2.036)
Known binder: FLYRWLPSRRGG - ppl 21.391 (mean NLL 3.063)

Interpretation: Perplexity measures how confident PepMLM is in a peptide under its generative model; lower perplexity means higher confidence.

PepMLM assigns higher confidence to the four generated peptides than to the known binder under this scoring scheme, with WRYGAAAVEWKE ranked best (lowest perplexity).

The known binder has higher perplexity, suggesting it is less consistent with PepMLM’s learned binder distribution for this target, even though it is experimentally reported to bind. This highlights that PepMLM perplexity is not an experimental binding score. Also, it suggests that perplexity alone is insufficient to validate binding.
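The reported perplexities are consistent with the usual relation perplexity = exp(mean NLL), with NLL in natural-log units. The short check below recomputes them from the mean NLL values listed above and re-derives the ranking; small differences in the last digit come from rounding of the reported NLLs.

```python
import math

# Mean negative log-likelihood (NLL) per peptide, as reported above.
mean_nll = {
    "WRYGAAAVEWKE": 1.536,
    "WSYYVTAVAHKE": 1.628,
    "WSVPVVAIEHGE": 1.860,
    "HRVPVAGVEWWE": 2.036,
    "FLYRWLPSRRGG": 3.063,   # known binder
}

def perplexity(nll: float) -> float:
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(nll)

ranked = sorted(mean_nll, key=lambda pep: perplexity(mean_nll[pep]))
for pep in ranked:
    print(f"{pep}: ppl = {perplexity(mean_nll[pep]):.3f}")
```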

Because this result seemed strange, I looked for checks I could run to see whether it was an error or artifact:

Test for a missing mask token: negative, so this was not the cause.

Conclusion: My generated peptides are enriched in W/V/A/Y and look like classic short hydrophobic binders. The known binder FLYRWLPSRRGG has a positively charged RR motif in its C-terminal tail (RRGG) and a different composition pattern, to which the model may assign low probability even if the peptide binds in reality.

Part 2: Evaluate Binders with AlphaFold3

SOD1 Mutant Sequence (A4V mutation) ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

At first, I mistakenly evaluated all peptides in the same run.

Then I noticed that the AlphaFold Server treated this as one multi-chain complex with 6 chains in total (SOD1 + 4 generated peptides + the known binder). To compare the peptides, I therefore had to run 5 separate jobs.

SOD1 + HRVPVAGVEWWE: ipTM = 0.34; pTM = 0.86

Where does the peptide appear to bind? The peptide is positioned along an external surface of the SOD1 β-strand core, contacting a β-sheet edge/adjacent loop (surface-bound).

SOD1 + WSYYVTAVAHKE: ipTM = 0.22; pTM = 0.81

Where does the peptide appear to bind? The peptide shows weak localization and appears loosely associated with the protein surface, without a clearly defined contact region.

SOD1 + WRYGAAAVEWKE: ipTM = 0.41; pTM = 0.85

Where does the peptide appear to bind? The peptide is placed near a β-barrel edge/loop region on the outer surface of SOD1 (surface-bound).

SOD1 + WSVPVVAIEHGE: ipTM = 0.44; pTM = 0.86

Where does the peptide appear to bind? The peptide is positioned on a distinct surface patch on the β-barrel face/edge, appearing more localized than the others (surface-bound).

SOD1 + FLYRWLPSRRGG (control): ipTM = 0.3; pTM = 0.83

Where does the peptide appear to bind? The peptide contacts the protein surface and appears partially inserted into a shallow surface groove/cleft (partially buried relative to the others).

By ipTM ranking: WSVPVVAIEHGE (0.44) > WRYGAAAVEWKE (0.41) > HRVPVAGVEWWE (0.34) > FLYRWLPSRRGG (0.30) > WSYYVTAVAHKE (0.22).

The observed ipTM values are uniformly low (0.22–0.44), indicating limited AlphaFold3 confidence in any specific peptide–SOD1 interface. Among the PepMLM-generated candidates, WSVPVVAIEHGE (ipTM = 0.44) and WRYGAAAVEWKE (ipTM = 0.41) score clearly higher than the known binder FLYRWLPSRRGG (ipTM = 0.30), HRVPVAGVEWWE (0.34) is slightly higher, and WSYYVTAVAHKE (0.22) is lower. Overall, three of the four PepMLM-generated peptides exceed the known binder by ipTM, but the absolute scores suggest weakly supported, mostly surface-associated binding modes rather than a high-confidence complex.
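The ranking and the comparison against the control can be reproduced from the reported ipTM values with a few lines of Python (values copied from the five runs above):

```python
# ipTM values from the five separate AlphaFold Server jobs.
iptm = {
    "HRVPVAGVEWWE": 0.34,
    "WSYYVTAVAHKE": 0.22,
    "WRYGAAAVEWKE": 0.41,
    "WSVPVVAIEHGE": 0.44,
    "FLYRWLPSRRGG": 0.30,   # known binder (control)
}
CONTROL = "FLYRWLPSRRGG"

ranked = sorted(iptm, key=iptm.get, reverse=True)
above_control = [p for p in ranked if p != CONTROL and iptm[p] > iptm[CONTROL]]

print(" > ".join(f"{p} ({iptm[p]:.2f})" for p in ranked))
print(f"{len(above_control)} generated peptides score above the control")
```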

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

HRVPVAGVEWWE

WSYYVTAVAHKE

WRYGAAAVEWKE

WSVPVVAIEHGE

FLYRWLPSRRGG (control)

Across all five peptides, PeptiVerse predicts solubility = 1.000 and non-hemolytic behavior (hemolysis probabilities 0.035–0.064), so none of the candidates are flagged as poorly soluble or strongly hemolytic. Predicted binding affinities (pKd/pKi) vary and do not track ipTM: the highest-ipTM peptide (WSVPVVAIEHGE, ipTM 0.44) has the lowest predicted affinity (5.338), while WRYGAAAVEWKE has a higher predicted affinity (6.526) but slightly lower ipTM (0.41).

The known binder (FLYRWLPSRRGG) shows mid-range predicted affinity (5.962) and ipTM (0.30). Considering binding prediction plus safety-like properties, WRYGAAAVEWKE best balances the set: it has the highest predicted affinity (6.526), is predicted soluble (1.000), and has low hemolysis probability (0.047), while still achieving the second-highest ipTM (0.41).

Peptide to advance: WRYGAAAVEWKE - it is predicted to be soluble, low-hemolysis, and has the strongest predicted binding affinity among the tested peptides, with moderate (though still low-confidence) structural support from AlphaFold3 (ipTM 0.41).

Part 4: Generate Optimized Peptides with moPPIt

I used the moPPIt Colab on a GPU runtime and pasted the A4V mutant SOD1 sequence (mature form without initiator Met). Here’s my Colab copy.

I set binder length to 12 aa and generated a pool of candidate peptides using multi-objective guidance. I enabled affinity guidance and included solubility and hemolysis guidance to bias toward more developable peptides.

Binder (12-aa)	Solubility	Half-life	Affinity
EWWRERLRQTLI	0.5833	0.5833	6.0163
EDWLATLRAATS	0.5000	5.9279	5.7517
EEEWRQLQSQYE	0.8333	4.4313	6.8902
TEEEGVRWKRGV	0.7500	4.0548	6.4628
ELLQWILGITIE	0.4167	13.4681	6.1644

Compared to PepMLM, moPPIt produces peptides shaped by explicit objectives. PepMLM peptides were more diverse but less controlled with respect to developability properties, whereas moPPIt candidates tend to show stronger biases in composition, more consistent physicochemical properties across candidates, and often a narrower “design family” reflecting the guidance constraints. On this run, the moPPIt outputs are more compositionally biased toward charged residues (E/D and R/K), consistent with explicit optimization for solubility and half-life alongside affinity. Here’s a summary interpretation of the results:

Best predicted affinity: EEEWRQLQSQYE (6.8902)
Best predicted solubility: EEEWRQLQSQYE (0.8333)
Best predicted half-life: ELLQWILGITIE (13.4681)
Most “balanced” if you prioritize binding + solubility: EEEWRQLQSQYE (top on both, but not top half-life)
Most “balanced” if you prioritize half-life strongly: ELLQWILGITIE (best half-life, but lowest solubility)
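The summary above can be reproduced with a short script. This is a minimal sketch that only uses the three columns of the moPPIt table; the function names (`best_by`, `dominates`, `pareto_front`) are my own illustrative helpers, and treating "higher is better" for all three metrics is an assumption.

```python
# Values copied from the moPPIt results table above:
# (solubility, half-life, affinity). Higher is assumed to be better for all three.
candidates = {
    "EWWRERLRQTLI": (0.5833, 0.5833, 6.0163),
    "EDWLATLRAATS": (0.5000, 5.9279, 5.7517),
    "EEEWRQLQSQYE": (0.8333, 4.4313, 6.8902),
    "TEEEGVRWKRGV": (0.7500, 4.0548, 6.4628),
    "ELLQWILGITIE": (0.4167, 13.4681, 6.1644),
}
METRICS = ("solubility", "half_life", "affinity")


def best_by(metric_index):
    """Peptide with the highest value in the given column."""
    return max(candidates, key=lambda pep: candidates[pep][metric_index])


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front():
    """Peptides that no other candidate beats on all three metrics at once."""
    return sorted(
        pep
        for pep, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    )


if __name__ == "__main__":
    for i, name in enumerate(METRICS):
        print(f"best {name}: {best_by(i)}")
    print("non-dominated candidates:", pareto_front())
```

The non-dominated set is a convenient way to see which peptides represent genuine trade-offs rather than being strictly worse than another candidate.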

Before any clinical consideration, I would follow a staged evaluation: (1) in silico screening for interface plausibility (AlphaFold3 ipTM/PAE consistency across seeds) plus basic developability predictions (solubility, hemolysis, aggregation risk); (2) in vitro binding assays (SPR/BLI or competition ELISA), stability in serum, and cytotoxicity/hemolysis assays; (3) cell-based assays for functional effect and off-target toxicity; (4) only after robust preclinical evidence, proceed to in vivo PK/PD and safety studies. In other words, moPPIt designs are hypotheses that must be filtered by structural consistency and validated experimentally before any translational claims.

Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele) Since this was an optional part, I decided to skip it for now.

Part C: Final Project: L-Protein Mutants

Phage Lysis Protein Design Challenge

L-Protein Engineering | Option 1: Mutagenesis

I ran the mutational scoring notebook to obtain per-substitution LLR scores and shortlisted mutations with positive scores.

Position	Wild_Type_AA	Mutation_AA	LLR_Score
50	K	L	2.561464
29	C	R	2.395425
39	Y	L	2.241777
29	C	S	2.043149
9	S	Q	2.014323
29	C	Q	1.997047
29	C	P	1.971026
29	C	L	1.960644
50	K	I	1.928798
53	N	L	1.864930
61	E	L	1.818097
52	T	L	1.813966
50	K	F	1.802066
29	C	T	1.797245
29	C	K	1.795876
5	F	Q	1.795244
5	F	R	1.659717
29	C	A	1.648654
27	Y	R	1.628060
22	F	R	1.602028
5	F	P	1.596888
50	K	V	1.594572
50	K	S	1.574555
5	F	T	1.559023
5	F	S	1.556416
45	A	L	1.539248
39	Y	S	1.517457
27	Y	S	1.497052
40	V	L	1.477630
27	Y	L	1.474637

I then cross-checked each shortlisted mutation against the experimental mutant dataset (L-Protein Mutants) to see whether the experimental lysis phenotype is directionally consistent with the LLR score.

Only 6 substitutions from my scored shortlist overlapped with the experimental table (C29R, C29S, K50I, K50S, Y27S, Y39S). In the experimental dataset, all overlapping substitutions were labeled as non-lytic (Lysis = 0) despite having positive LLR scores in the notebook. This suggests that, for MS2 L-protein, sequence-only language-model scores may not reliably capture key determinants of lysis (likely influenced by membrane insertion, oligomerization, and host-factor dependence). We therefore should treat LLR scores as a hypothesis generator, not a predictor of functional lysis.
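The cross-check described above can be sketched as a simple dictionary intersection. The LLR values are copied from the shortlist table; `experimental_lysis` contains only the six overlapping entries reported here (the full experimental table is not reproduced), and the `cross_check` helper is a hypothetical name of mine.

```python
# LLR scores copied from the shortlist table above (mutation label -> score).
llr_shortlist = {
    "K50L": 2.561464, "C29R": 2.395425, "Y39L": 2.241777, "C29S": 2.043149,
    "S9Q": 2.014323, "C29Q": 1.997047, "C29P": 1.971026, "C29L": 1.960644,
    "K50I": 1.928798, "N53L": 1.864930, "E61L": 1.818097, "T52L": 1.813966,
    "K50F": 1.802066, "C29T": 1.797245, "C29K": 1.795876, "F5Q": 1.795244,
    "F5R": 1.659717, "C29A": 1.648654, "Y27R": 1.628060, "F22R": 1.602028,
    "F5P": 1.596888, "K50V": 1.594572, "K50S": 1.574555, "F5T": 1.559023,
    "F5S": 1.556416, "A45L": 1.539248, "Y39S": 1.517457, "Y27S": 1.497052,
    "V40L": 1.477630, "Y27L": 1.474637,
}

# Only the six overlapping experimental entries reported in the text
# (Lysis = 0 means non-lytic); not the full experimental dataset.
experimental_lysis = {
    "C29R": 0, "C29S": 0, "K50I": 0, "K50S": 0, "Y27S": 0, "Y39S": 0,
}


def cross_check(scores, lysis):
    """For each shared mutation, record whether a positive LLR matches a lytic phenotype."""
    rows = []
    for label in sorted(set(scores) & set(lysis)):
        predicted_favorable = scores[label] > 0
        observed_lytic = lysis[label] == 1
        rows.append((label, scores[label], lysis[label],
                     predicted_favorable == observed_lytic))
    return rows


rows = cross_check(llr_shortlist, experimental_lysis)
agreement = sum(match for *_, match in rows) / len(rows)

if __name__ == "__main__":
    for label, llr, lysis, match in rows:
        print(f"{label}\tLLR={llr:.3f}\tLysis={lysis}\tagrees={match}")
    print(f"directional agreement: {agreement:.0%}")
```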

I selected five single-point variants, including two mutations in the soluble region (positions 1–40) and three in the transmembrane region (TM) (positions 41–75), as required.

WT (MS2 L-protein, 75 aa): METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

All five substitutions have positive LLR scores in the notebook.

Here are the 5 mutants I chose:

Mutant 1 - S9Q (soluble, LLR = 2.014)

Sequence: METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Rationale: High positive score in the soluble region (putative DnaJ-interaction domain). Ser→Gln increases hydrogen-bonding potential and may alter surface chemistry without strongly destabilizing the fold.

Mutant 2 - C29R (soluble, LLR = 2.395)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Rationale: One of the strongest positive-scoring substitutions in the soluble region. Adds a positive charge that could reshape chaperone-recognition or interaction surfaces.

Mutant 3 - A45L (TM, LLR = 1.539)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Rationale: Hydrophobic substitution in the transmembrane segment. Ala→Leu increases hydrophobicity and may stabilize membrane helix packing/insertion and oligomer stability.

Mutant 4 - T52L (TM, LLR = 1.814)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT

Rationale: Polar→hydrophobic change in the TM region. Thr→Leu may increase membrane compatibility and reduce local insertion/misfolding penalties.

Mutant 5 - N53L (TM, LLR = 1.865)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT

Rationale: Polar→hydrophobic change in the TM region with a strong positive score. Selected as an additional TM-stabilizing candidate.
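To guard against copy-paste errors in the mutant sequences, the substitutions can be applied to the WT sequence programmatically. This is a minimal sketch; `apply_mutation` and `region` are illustrative helper names of mine, and the soluble/TM boundary at position 40 follows the assignment constraint stated above.

```python
WT = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"


def apply_mutation(seq, label):
    """Apply a substitution such as 'C29R' (1-based position) to a protein sequence."""
    wt_aa, pos, new_aa = label[0], int(label[1:-1]), label[-1]
    if seq[pos - 1] != wt_aa:
        raise ValueError(f"{label}: expected {wt_aa} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new_aa + seq[pos:]


def region(label, soluble_end=40):
    """Assign a mutation to the soluble (1-40) or transmembrane (41-75) region."""
    return "soluble" if int(label[1:-1]) <= soluble_end else "TM"


chosen = ["S9Q", "C29R", "A45L", "T52L", "N53L"]
mutants = {label: apply_mutation(WT, label) for label in chosen}

if __name__ == "__main__":
    for label, seq in mutants.items():
        print(f"{label} ({region(label)}): {seq}")
```

Because `apply_mutation` checks the wild-type residue before substituting, a mismatch between the table and the WT sequence raises an error instead of silently producing a wrong mutant.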

Week 6 HW: Genetic Circuits Part 1: Assembly Technologies

Assignment: DNA Assembly

Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

Phusion High-Fidelity PCR Master Mix is a 2X, ready-to-use mixture where the exact formulation is partly proprietary, but the functional components are documented in the manufacturer’s manual:

Component (Phusion 2X Master Mix)	Purpose
Phusion High-Fidelity DNA Polymerase	DNA synthesis with high fidelity + proofreading
dNTPs (dATP, dCTP, dGTP, dTTP)	Building blocks for new DNA strands
HF reaction buffer (salts + pH buffer)	Maintains optimal pH/ionic strength for enzyme function
Mg2+ (via buffer system; often MgCl2-derived)	Essential polymerase cofactor
Stabilizers / additives (partly proprietary)	Improve enzyme stability and consistency
Nuclease-free water	Solvent to reach correct 2X working concentrations

Reference: Thermo Fisher Phusion High-Fidelity DNA Polymerase Product Information Sheet; standard biochemistry manuals (e.g., Sambrook & Russell).

Question 2: What are some factors that determine primer annealing temperature during PCR?

Determinant	Effect on Ta	Why
Primer Melting Temperature (Tm)	Increase	Higher Tm means stronger duplex stability, needs higher Ta
Primer length	Increase	More base pairs → higher Tm → higher Ta
Primer GC%	Increase	GC pairs stabilize duplex more than AT
Salt (Na+/K+) concentration	Increase	Screens charges, stabilizes duplex, raises Tm
Mg2+ concentration	Increase	Stabilizes primer-template binding; raises effective Tm
Primer-template mismatches (more / at 3′ end)	Decrease	Destabilizes duplex; lower Ta needed to anneal
Degenerate bases (more degeneracy)	Decrease	Lowers effective match/Tm; often requires lower Ta
GC-rich template / strong secondary structure	Decrease	Competes with primer binding; often use lower Ta + additives
DMSO / betaine / similar GC additives	Decrease	Reduce duplex stability (esp. GC), lowering effective Tm
Need for higher specificity (reduce off-targets)	Increase	Higher Ta increases stringency, reduces non-specific binding
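The first three rows of the table (Tm, length, GC%) can be illustrated with two widely used rule-of-thumb Tm estimates. This is a rough sketch only: it ignores salt, Mg2+, and additives, the example primers are hypothetical, and the offset between Tm and Ta is polymerase-specific (check the manufacturer's guidance), so it is not modeled here.

```python
def gc_count(primer):
    """Number of G and C bases in the primer."""
    primer = primer.upper()
    return primer.count("G") + primer.count("C")


def tm_wallace(primer):
    """Wallace rule, a rough estimate for short oligos: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(primer)
    at = len(primer) - gc
    return 2 * at + 4 * gc


def tm_basic(primer):
    """Basic GC-content formula commonly used for oligos longer than ~13 nt."""
    return 64.9 + 41.0 * (gc_count(primer) - 16.4) / len(primer)


if __name__ == "__main__":
    # Hypothetical primers chosen only to show the trends in the table.
    examples = {
        "20-mer, 50% GC": "ATGCATGCATGCATGCATGC",
        "20-mer, GC-rich": "GCGCGGCCGCGGCCGCGCGC",
        "20-mer, AT-rich": "ATATTAATTATAATTATATA",
        "30-mer, 50% GC": "ATGCATGCATGCATGCATGCATGCATGCAT",
    }
    for name, seq in examples.items():
        print(f"{name}: Wallace={tm_wallace(seq)} C, basic={tm_basic(seq):.1f} C")
```

Both estimates rise with primer length and GC content, which is why those determinants push the usable annealing temperature up.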

Question 3: There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

Aspect / Decision point	PCR (amplification)	Restriction enzyme (cutting)
What it does	Amplifies a defined region between two primers	Cuts existing DNA at specific recognition sites
Input	Template DNA + primers	DNA substrate (plasmid/PCR product/genomic DNA) + restriction enzyme(s)
Key reagents	Polymerase mix, primers, dNTPs, buffer, Mg2+	Restriction enzyme(s), buffer, often BSA (enzyme-dependent)
Protocol core steps	Denature → anneal → extend (cycling)	Incubate DNA with enzyme(s) at recommended temperature/time
Sequence requirements	Need primer-binding sites flanking target	Need the enzyme recognition site(s) present in the DNA
Output fragment boundaries	Defined by primer positions (base-precise)	Defined by cut sites (exact where enzyme cleaves)
Can create new sequences?	Yes - primers can add overhangs/tags/sites	No - only cuts at existing sites (unless sites were engineered earlier)
Typical use cases	Generate a specific insert, add adapters, site-directed changes, amplify from low-abundance template	Linearize a plasmid, excise an insert, diagnostic mapping, generate compatible ends for cloning
Speed / setup	Moderate - requires optimization (Ta, primers)	Fast/simple if sites exist and enzyme conditions are known
Failure modes	Non-specific bands, primer-dimers, no amplification, PCR errors	Star activity (wrong cuts), incomplete digestion, missing sites
Fidelity / errors	Depends on polymerase; can introduce mutations	No replication - does not introduce point mutations
When preferable	When you need a specific fragment and/or to add features (overhangs, tags), or template amount is low	When the fragment is already present and flanked by useful sites; when you need clean linearization/excision without amplification

Question 4: How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

Check / requirement	What to do (PCR + digest)	Why it matters for Gibson
20–40 bp overlaps at every junction	Design primers so each fragment end has 20–40 bp homology to the adjacent fragment/backbone	Gibson assembly depends on annealing of complementary overlaps
Correct orientation of overlaps	Ensure the overlap sequence matches the correct neighbor (A→B, B→C, insert→vector, etc.)	Wrong overlap = wrong assembly or no assembly
Linearized backbone	Restriction-digest the vector to a single linear band; gel-purify if needed	Gibson requires a linear backbone (no undigested circular plasmid carryover)
Remove template plasmid from PCR	If PCR was from plasmid, treat with DpnI (cuts methylated template)	Prevents parental plasmid background colonies
Clean fragment ends (no inhibitors)	Purify PCR and digest products (spin column or gel extraction)	Salts, ethanol, detergents inhibit Gibson enzymes
Correct fragment sizes	Run an agarose gel to confirm expected sizes; excise/gel-purify correct bands if mixed	Verifies you’re assembling the intended pieces
Avoid duplicate/competing overlaps	Keep overlaps unique (no repeated identical overlap sequences across multiple junctions)	Prevents mis-assembly and rearrangements
Overlap doesn’t create strong hairpins/repeats	Check overlap sequences for high secondary structure/repeats	Improves annealing and reduces drop in assembly efficiency
Balanced fragment concentrations	Quantify DNA (Nanodrop/Qubit) and use equimolar amounts; keep total DNA in recommended range	Too much/too little of one piece reduces correct assembly
No internal cuts from chosen restriction enzymes	Verify your insert/parts don’t contain the restriction sites used to linearize the vector	Prevents unintended fragmentation or loss of insert
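The overlap checks in the first rows of this table can be automated before ordering primers. The sketch below assumes a circular two-piece assembly (backbone + insert) and uses made-up toy sequences; `shared_overlap` and `check_circular_assembly` are hypothetical helper names, and only exact end-to-end homology on the top strand is tested.

```python
def shared_overlap(left, right, min_len=20, max_len=40):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Only lengths within [min_len, max_len] are accepted; returns 0 otherwise.
    """
    for k in range(min(max_len, len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def check_circular_assembly(fragments):
    """fragments: list of (name, sequence) in intended order; the last joins the first."""
    report = []
    for i, (name, seq) in enumerate(fragments):
        next_name, next_seq = fragments[(i + 1) % len(fragments)]
        report.append((name, next_name, shared_overlap(seq, next_seq)))
    return report


# Toy sequences (hypothetical): two 25-bp junction homologies and filler cores.
J1 = "ATGGCTAGCAAGGAGATATACCATG"
J2 = "GGATCCGAATTCGAGCTCCGTCGAC"
backbone = J2 + "ACGTTGCA" * 25 + J1   # starts with J2, ends with J1
insert = J1 + "TTGACCAG" * 12 + J2     # starts with J1, ends with J2

if __name__ == "__main__":
    for left, right, k in check_circular_assembly([("backbone", backbone),
                                                   ("insert", insert)]):
        status = f"{k} bp overlap" if k else "NO valid 20-40 bp overlap"
        print(f"{left} -> {right}: {status}")
```

A junction reported as 0 means the adjacent ends do not share a 20–40 bp exact match, so the primers for that junction should be redesigned.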

Question 5: How does the plasmid DNA enter the E. coli cells during transformation?

Plasmid DNA enters E. coli cells during transformation through transient permeability of the cell envelope. This can happen via either:

Electroporation: a short electric pulse creates temporary membrane pores that let DNA pass into the cytoplasm.
Chemical (heat-shock) transformation: divalent cations (e.g., Ca²⁺) reduce electrostatic repulsion between DNA and the membrane, and a brief heat shock promotes DNA uptake through temporary pores/defects.

Question 6: Describe another assembly method in detail (such as Golden Gate Assembly)

a) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).

Golden Gate Assembly is a molecular cloning technique that allows multiple DNA fragments to be assembled simultaneously in a single reaction. It uses Type IIS restriction enzymes such as BsaI, BsmBI, or BbsI, which cut DNA outside their recognition sequence and generate custom sticky ends. By placing Type IIS restriction sites around each fragment and designing specific 4-bp overhangs that are complementary only to the intended neighboring fragment, we can precisely control the order and orientation in which the fragments assemble. During the reaction, the restriction enzyme digests the DNA fragments while T4 DNA ligase simultaneously ligates matching overhangs in the same tube, making the process efficient and rapid. Because the restriction sites are removed during assembly, the correctly assembled construct cannot be cut again, while products that still carry a recognition site (uncut or re-ligated starting material) continue to be digested, driving the reaction toward the desired product. The reaction is typically performed in a thermocycler alternating between ~37 °C (optimal for digestion) and ~16 °C (optimal for ligation). This method is widely used in synthetic biology because it enables scarless assembly of many DNA parts, although internal Type IIS restriction sites must first be removed, usually by silent mutation(s).

Golden Gate Assembly – Step-by-Step Diagram

Step 1: Design fragments with Type IIS sites

Vector: [BsaI]─────────────[BsaI]

Fragment A: [BsaI]──Part A──[BsaI]

Fragment B: [BsaI]──Part B──[BsaI]

Inward-facing BsaI sites. Overhangs are designed to match the next fragment.

Step 2: Type IIS cuts outside recognition sites

Vector: ─────GCTT (overhang)

Fragment A: GCTT─────AATG (overhangs)

Fragment B: AATG─────CGAA (overhangs)

Recognition sites (BsaI) are removed on small excised pieces.

Step 3: Annealing of fragments

Vector ─────GCTT

Fragment A GCTT─────AATG

Fragment B AATG─────CGAA

Overhangs anneal only to the correct partner. Orientation is fixed.

Step 4: Ligase seals fragments

Final construct:

Vector ── Fragment A ── Fragment B

Scarless assembly. BsaI sites are gone, so the construct is stable.

Step 5: Reaction drives correct assembly

Products that retain or regenerate a BsaI site (uncut or re-ligated starting fragments) → cut again
Correct product accumulates over multiple cycles

Key Points:

Modular → promoters, RBS, genes, terminators
Multi-fragment assembly in one tube
Order & orientation controlled by 4-bp overhangs
Scarless final product
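The "order & orientation controlled by 4-bp overhangs" point can be checked with a few lines of code. This sketch uses the overhangs from the annealing step of the diagram (GCTT, AATG, CGAA); assigning CGAA to the vector's other end so that the circle closes is my assumption, and `check_overhangs` and `assembly_order` are hypothetical helper names. Overhangs are written 5'→3' on the top strand for both ends of each part.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def check_overhangs(parts):
    """Flag reused, palindromic, or reverse-complementary overhangs."""
    problems = []
    lefts = [left for left, _ in parts.values()]
    rights = [right for _, right in parts.values()]
    for side, ohs in (("left", lefts), ("right", rights)):
        for oh in sorted(set(ohs)):
            if ohs.count(oh) > 1:
                problems.append(f"{oh} used on more than one {side} end")
    junctions = set(lefts) | set(rights)
    for oh in sorted(junctions):
        rc = revcomp(oh)
        if rc == oh:
            problems.append(f"{oh} is palindromic (can self-ligate)")
        elif rc in junctions and oh < rc:
            problems.append(f"{oh} clashes with its reverse complement {rc}")
    return problems


def assembly_order(parts, start):
    """Follow each right overhang to the part whose left overhang matches it."""
    order, current = [start], start
    while True:
        right = parts[current][1]
        matches = [name for name, (left, _) in parts.items() if left == right]
        if len(matches) != 1:
            raise ValueError(f"{right}: expected exactly one partner, found {matches}")
        current = matches[0]
        if current == start:
            return order  # circle closed
        if current in order:
            raise ValueError("assembly loops without returning to the start part")
        order.append(current)


# (left overhang, right overhang) per part; vector left end CGAA is assumed.
parts = {
    "Vector": ("CGAA", "GCTT"),
    "Fragment A": ("GCTT", "AATG"),
    "Fragment B": ("AATG", "CGAA"),
}

if __name__ == "__main__":
    print("problems:", check_overhangs(parts) or "none")
    print("order:", " -> ".join(assembly_order(parts, "Vector")))
```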

b) Model this assembly method with Benchling or a similar tool!

I imported the pBBR1MCS-5 sequence as circular DNA (pBBR1MCS-5 (raw)) and imported phaA, phaB, phaC as separate linear DNA sequences.

I checked for internal BsaI sites (GGTCTC) in all sequences: the genes have no BsaI sites, and pBBR1MCS-5 has a single BsaI site, so it is not a Golden Gate destination vector by direct digest. To model Golden Gate anyway, I created a PCR-linearized Golden Gate backbone: I duplicated the plasmid and saved a linear version (pBBR1MCS-5_GG_backbone).
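The same BsaI check can be scripted outside Benchling. This is a minimal sketch that scans both strands for the recognition sequence (GGTCTC and its reverse complement GAGACC) and optionally wraps around the origin for circular plasmids; `find_sites` is a hypothetical helper name, and the example strings are toy sequences, not the real plasmid or genes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, site="GGTCTC", circular=False):
    """Return sorted (1-based position, strand) hits for a non-palindromic site."""
    seq = seq.upper()
    # For circular DNA, append the first few bases so sites spanning the origin are found.
    search = seq + seq[:len(site) - 1] if circular else seq
    hits = []
    for motif, strand in ((site, "+"), (revcomp(site), "-")):
        start = search.find(motif)
        while start != -1:
            hits.append((start + 1, strand))
            start = search.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(find_sites("AAAGGTCTCAAA"))                 # site on the top strand
    print(find_sites("AAAGAGACCAAA"))                 # site on the bottom strand
    print(find_sites("TCTCAAAAGG", circular=True))    # site spanning the origin
```

Running this on each exported sequence gives a site count that can be compared with what Benchling's digest view reports.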

On this linear backbone, I created two endpoint annotations (first ~20 bp and last ~20 bp) to represent that PCR primers would add inward-facing BsaI sites + 4 bp overhangs:

start: BsaI + Overhang OH1 (added by PCR primer)
end: BsaI + Overhang OH4 (added by PCR primer)

To simplify the Benchling model, I represented Golden Gate flanks (inward-facing BsaI sites and 4-bp overhangs) as annotations rather than explicitly adding the flanking sequences. In a real build, these flanks would be introduced via PCR primers or synthesis.

I duplicated each gene to create Golden Gate-ready parts (phaA (codon optimized) anotated, phaB (codon optimized) anotated and phaC (codon optimized) anotated) and defined the assembly overhang scheme for directional order. For each gene, I added annotations with intended Golden Gate junction overhangs:

Left end: Intended Golden Gate overhang: OH1 (conceptual)
Right end: Intended Golden Gate overhang: OH2 (conceptual)

Overhangs were not added as literal sequences; I only annotated the first/last 20 bp to indicate where BsaI-generated 4 bp overhangs would be introduced via primers/synthesis.

For a simplified Golden Gate model in Benchling, I manually constructed the final plasmid sequence by opening pBBR1MCS-5 at the MCS and concatenating the backbone with phaA–phaB–phaC in the intended order. Overhangs/Type IIS flanks were represented as annotations only.

Assignment: Asimov Kernel

Asimov Kernel notes and all material are on my repo “Kanbe-Mariana-HW6”. Below is only some of the information, so please have a look at the Kernel directly.

HW6: Asimov Kernel Exercises 1,2:

Exercise 3:

Finding the “Bacterial Demos” public repo

I started analysing the constructs with the Repressilator.

This is the description: “This is a repressilator genetic circuit. It consists of 3 transcription units, where the CDS in each is a repressor that represses the promoter in the next transcription unit. This results in an oscillation of the concentrations of the 3 proteins.”

These 3 constructs have 3 different promoters, which generate different genetic ←→ phenotypic outputs:

J23117 Promoter: A transcription unit with a weak promoter.
J23101 Promoter: A transcription unit with a strong promoter.
J23106 Promoter: A transcription unit with a medium promoter.

Using the Simulation feature, the repressilator was simulated with the following parameters:

Chassis: E. coli
Duration: 408 hours
Timestep: 60 min
Transfection: Transient transfection

This was the output:

Summary of the findings:

The simulation shows rapid initial accumulation followed by relatively stable RNA and protein concentration ranges over time, while endpoint RNAP and ribosome fluxes differ substantially among the three transcription units. The construct driven by the J23101 (strong promoter) shows the highest activity, the J23106 (medium promoter) shows intermediate activity, and the J23117 (weak promoter) shows the lowest activity.
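For intuition about the oscillation mentioned in the circuit description, here is a minimal, dimensionless sketch of a three-repressor ring. It is a textbook-style protein-only model with Hill repression, not the model the Kernel simulator uses; the parameters `alpha` (maximal production) and `n` (Hill coefficient) and the initial values are hypothetical.

```python
def simulate_repressilator(alpha=50.0, n=3.0, t_end=100.0, dt=0.01,
                           p0=(1.0, 2.0, 3.0)):
    """Euler integration of dp_i/dt = alpha / (1 + p_(i-1)^n) - p_i for a 3-node ring."""
    p = list(p0)
    trace = [tuple(p)]
    for _ in range(round(t_end / dt)):
        # Protein i is repressed by the previous protein in the cycle (index i - 1).
        dp = [alpha / (1.0 + p[i - 1] ** n) - p[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(tuple(p))
    return trace


def count_peaks(values):
    """Number of local maxima in a time series."""
    return sum(1 for a, b, c in zip(values, values[1:], values[2:])
               if b > a and b >= c)


if __name__ == "__main__":
    for hill in (3.0, 1.0):
        trace = simulate_repressilator(n=hill)
        tail = [state[0] for state in trace[len(trace) // 2:]]
        print(f"n={hill}: late-time range of protein 1 = "
              f"{max(tail) - min(tail):.3g}, peaks = {count_peaks(tail)}")
```

In this toy model, cooperative repression (n = 3) gives sustained oscillations, while non-cooperative repression (n = 1) settles to a steady state, which shows how parameter choices decide whether a ring of repressors oscillates at all.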

Exercise 4: Repressilator reconstructions

I recreated the Repressilator in the empty construct using parts from the Characterized Bacterial Parts repository.

First, I used the Search function in the right-hand menu to find the required bacterial parts. Then, I dragged and dropped the selected parts into the empty construct to assemble the circuit. The final design reproduced the three-transcription-unit repressilator architecture.

After building the construct, I used the Simulator by clicking the play button to test its behavior. I then compared the simulation output with the original Repressilator Construct available in the Bacterial Demos repository.

Repressilator Reconstruction 1

I replaced pLacI (regulated by LacI) with pTetR (regulated by TetR) in the first unit, while all other simulation parameters were kept the same. That means the input regulator of that node changed, but the overall loop structure is preserved.

The goal was to observe whether changing the promoter identity altered the resulting RNA concentrations, protein concentrations, RNAP flux, or ribosome flux compared with the original repressilator design.

Using the Simulation feature, the new pTetR repressilator was simulated with the same parameters as before:

Chassis: E. coli
Duration: 408 hours
Timestep: 60 min
Transfection: Transient transfection

This was the output:

Summary of the findings:

The simulation looks the same because, from the model’s perspective, the system is still a symmetric 3-repressor cycle: each node still produces a repressor and represses the next node. So the dynamics remain qualitatively equivalent.

Repressilator Reconstruction 2:

To experiment with a cyclic repression topology different from TetR → LacI → LambdaCI → TetR, I tried these changes:

Replace pLambdaCI with pLacI: to make two transcription units use the same promoter and see how that would affect the circuit’s behavior.
Replace pLacI with pLambdaCI: to test what happens when I switch which repressor controls that transcription unit.
Replace TetR CDS with LacI CDS: to see how the simulation changes when one repressor is replaced by another and the circuit has less repressor diversity.

I then re-ran the simulation, and these were the plots:

The modified circuit converges to a steady state dominated by LambdaCI, with LacI and TetR near zero, and no oscillatory behavior observed.

Exercise 5

Construct 1

I designed this construct to test high constitutive expression using the strong J23101 promoter placed upstream of LacI, with an A1 RBS to support translation and an L3S2P24 terminator to end transcription. My rationale was to build a simple bacterial circuit with no regulatory feedback, so I would expect continuous LacI expression and relatively high, stable RNA and protein levels in the simulation.

The simulation of this first construct shows rapid initial expression followed by a stable steady state. RNA concentration increases quickly and stabilizes at approximately 0.8 relative units, while protein concentration stabilizes at approximately 0.65. RNAP and ribosome flux are constant, indicating continuous transcription and translation. This matches the expectation for a constitutive expression construct driven by the strong J23101 promoter.

Construct 2

The second construct shows significantly lower expression compared to the first. RNA concentration stabilizes at approximately 0.003 relative units and protein concentration at approximately 0.0025, both much lower than in the strong promoter construct. RNAP and ribosome flux are also reduced. The system still reaches a steady state with constant expression over time, indicating that changing the promoter strength affects the magnitude of expression but not the overall behavior.

Construct 3

For the third construct, I copied the Self-regulating Circuit from the Bacterial Demos repository into my workspace and ran the simulation without modifying its structure. This allowed me to observe the behavior of a circuit with built-in feedback regulation and compare it with the constitutive expression constructs.

The self-regulating circuit shows stable expression over time, reaching a steady state without oscillations. RNA concentration stabilizes at approximately 0.56 relative units and protein concentration at approximately 0.45. RNAP and ribosome flux are constant, indicating continuous but regulated expression. Compared to the constitutive constructs, the expression level is intermediate, reflecting the effect of feedback regulation on maintaining controlled output.

These results show that promoter strength controls expression level, while circuit structure, such as feedback regulation, influences how expression is maintained over time.
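The two effects in this conclusion can be sketched with a simple steady-state model. This is an illustrative toy model, not the Kernel's simulator: a constitutive unit with first-order decay, and a negatively autoregulated unit whose production is divided by a Hill term. All rate constants (`k_tx`, `k_tl`, `d_m`, `d_p`, `K`, `n`) are hypothetical and dimensionless.

```python
def constitutive_steady_state(k_tx, k_tl, d_m, d_p):
    """Steady state of dm/dt = k_tx - d_m*m and dp/dt = k_tl*m - d_p*p."""
    m = k_tx / d_m
    p = k_tl * m / d_p
    return m, p


def autoregulated_protein(k_tx, k_tl, d_m, d_p, K, n=2.0, iterations=80):
    """Solve p = p_max / (1 + (p/K)^n) by bisection (negative autoregulation)."""
    p_max = constitutive_steady_state(k_tx, k_tl, d_m, d_p)[1]
    lo, hi = 0.0, p_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if p_max / (1.0 + (mid / K) ** n) - mid > 0:
            lo = mid  # production still exceeds loss: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameters: a "strong" and a 100x "weaker" promoter.
    strong = constitutive_steady_state(k_tx=1.0, k_tl=1.0, d_m=1.0, d_p=1.0)
    weak = constitutive_steady_state(k_tx=0.01, k_tl=1.0, d_m=1.0, d_p=1.0)
    feedback = autoregulated_protein(k_tx=1.0, k_tl=1.0, d_m=1.0, d_p=1.0, K=0.5)
    print("strong promoter (RNA, protein):", strong)
    print("weak promoter (RNA, protein):", weak)
    print("strong promoter + negative feedback (protein):", round(feedback, 4))
```

In this toy model the promoter rate scales the steady state linearly, while feedback pulls the output of the same promoter down to an intermediate level.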

Week 7 HW: Genetic Circuits Part 2: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

Question 1

Traditional genetic circuits are usually implemented in Boolean (ON/OFF) logic and hand-designed as fixed logic, so representing nuanced behaviors often requires many gates, sharp thresholds, and careful tuning, which can make designs bulky and brittle. As the number of inputs grows, circuit complexity can explode combinatorially: stacking multiple layers and adding intermediate nodes increases metabolic load, failure points, and sensitivity to part-to-part variability. Also, adapting to new targets or shifting biological context often means redesigning the circuit architecture, not just re-tuning parameters.

Intracellular Artificial Neural Networks (IANNs) are parametric and trainable: you can adjust “weights” to fit a desired behavior from data (calibration/learning), then iterate as conditions change, which is generally a highly desirable feature for biological modelling. They are designed to operate on analog inputs, tolerate noise through distributed computation, and approximate complex decision boundaries without enumerating every logic case. This is more consistent with the noisy and complex nature of biological signals.

Question 2

A useful application for an IANN could be a multi-signal “smart probiotic” controller that decides when to express a therapeutic payload in the gut based on a noisy inflammation signature. This could be a proposed pipeline:

Sensors detect several analog inputs. Each is tied to a measurable intracellular signal (e.g., promoters/sensors that respond to nitrate/NO, tetrathionate, ROS, or low pH produce a transcription rate or a regulator concentration that can be measured).
The IANN integrates these signals as weighted contributions and computes a graded output: a continuously tunable expression level of a payload gene (e.g., an anti-inflammatory cytokine mimic, a barrier-protective peptide, or a locally acting enzyme), plus an optional reporter for monitoring.

Instead of requiring all conditions to be “true” or “false,” as in Boolean models, the IANN can implement a “risk score” that turns on strongly only when the combined pattern matches inflammation, while remaining low for benign fluctuations. In practice, you would calibrate the weights using training data from known conditions (healthy vs inflamed models) so the output tracks the probability or intensity of the target state.
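The “risk score” idea can be sketched as a single artificial neuron: a weighted sum of normalized sensor signals passed through a sigmoid. Everything numeric here is hypothetical (weights, bias, and input levels are invented for illustration and would normally come from calibration data), and `payload_expression` is an illustrative name of mine.

```python
import math

# Hypothetical weights for normalized (0 to 1) sensor signals, plus a bias
# that keeps the output low when all inputs are near baseline.
WEIGHTS = {"nitrate": 1.5, "tetrathionate": 2.0, "ros": 1.0, "low_ph": 0.5}
BIAS = -2.5


def sigmoid(z):
    """Smooth saturating activation, giving a graded output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def payload_expression(inputs):
    """Graded payload level from the weighted combination of all sensor inputs."""
    z = sum(WEIGHTS[name] * inputs[name] for name in WEIGHTS) + BIAS
    return sigmoid(z)


healthy = {"nitrate": 0.1, "tetrathionate": 0.1, "ros": 0.1, "low_ph": 0.1}
inflamed = {"nitrate": 0.9, "tetrathionate": 0.9, "ros": 0.9, "low_ph": 0.9}
ros_spike = {"nitrate": 0.1, "tetrathionate": 0.1, "ros": 1.0, "low_ph": 0.1}

if __name__ == "__main__":
    for label, state in (("healthy", healthy), ("inflamed", inflamed),
                         ("transient ROS spike", ros_spike)):
        print(f"{label}: payload level = {payload_expression(state):.2f}")
```

With these made-up weights, a single elevated input is not enough to push the output past the midpoint, whereas the combined inflamed pattern is, which is the behavior a Boolean AND/OR gate cannot express as a graded score.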

Limitations / failure modes: IANNs still face real biological constraints such as sensor cross-talk and context effects, which can shift input distributions. Also, weights can drift as cells evolve, and metabolic burden can reduce growth or change the very physiology being measured. The dynamic range and noise of biological parts can compress signals, making it hard to separate “moderate” from “high” states without careful normalization and controls. Time dynamics also matter: inputs arrive on different timescales (transcription vs metabolites), so the network may need memory/filters to avoid reacting to transient spikes, which can substantially increase the complexity of the network. Finally, safety and containment become part of the spec: it is important to define an acceptable balance between type I and type II errors, and you would likely need a kill switch and strict limits on maximum output to avoid unintended activation in off-target contexts.

Question 3

Assignment Part 2: Fungal Materials

Question 1

Example 1: Mycelium composite foams (grown on agricultural waste)

Used for protective packaging, insulation panels, acoustic damping, and lightweight cores.

Advantages: renewable feedstocks, low-temperature manufacturing, biodegradable or compostable end-of-life, and tunable density via growth conditions.

Disadvantages: mechanical properties can vary batch-to-batch, moisture sensitivity unless coated, and long-term durability and standards testing can be harder than for petrofoams.

Example 2: Mycelium “leather” (mycelium-based sheets)

Used for footwear, bags, apparel, and upholstery as a leather alternative.

Advantages: avoids the animal leather supply chain, potentially lower land and chemical burden, and tunable texture and thickness.

Disadvantages: still often needs finishing steps for durability and water resistance, performance can lag high-grade leather, and cost and scale are still improving.

Example 3: Fungal biocement or mycelium-bound “bio-bricks”

Used for low-load building blocks, interior architectural elements, and decorative panels.

Advantages: low-energy fabrication, can use local waste substrates, lightweight, and potentially lower embodied carbon than fired bricks or some concretes.

Disadvantages: typically not comparable to concrete for structural strength, humidity and fire performance require careful engineering, and regulatory acceptance is slower.

Example 4: Fungal pigments and dyes (fermentation-derived)

Used for textiles, inks, coatings, and cosmetics.

Advantages: renewable production, avoids some petroleum-derived dye routes, and potentially lower toxic byproducts depending on the process.

Disadvantages: stability and colorfastness can be challenging, purification costs can be nontrivial, and some pigment pathways have safety constraints depending on the organism and compound.

Question 2

One may want to tune mycelium architecture (hyphal branching, wall composition, and crosslinking) to achieve specific strength, flexibility, porosity, and water resistance for composite materials. Another application is producing programmable functional materials by engineering fungi to secrete adhesives, hydrophobins, melanin-like coatings, or crosslinking enzymes so the final material is tougher or more water-stable without heavy post-processing.

Beyond material applications, genetically engineered fungi can be used for biosensing if we add genetic circuits that turn on a visible reporter in response to VOCs, toxins, inflammation markers, or pollutants, enabling living “sensor materials.” They can also be used for biomanufacturing high-value enzymes, small molecules, and therapeutics that benefit from eukaryotic processing or secretion, and for bioremediation by enhancing the breakdown of lignin, plastic additives, dyes, PFAS-like contaminants (where feasible), or heavy-metal binding, depending on pathway and safety constraints.

Fungi can be advantageous over bacteria because filamentous growth lets them act as a self-assembling scaffold, so the organism is both the “factory” and the “fabrication method.” They also offer eukaryotic protein processing because fungi handle disulfide bonds, folding, secretion, and many post-translational modifications better than most bacteria, which matters for secreted enzymes and complex proteins. In addition, fungi naturally secrete many enzymes, which is ideal for biomass conversion and environmental breakdown workflows. Another advantage relative to bacteria is metabolic breadth since fungi often tolerate more extreme acidic conditions and diverse feedstocks, and many are strong at producing secondary metabolites.

However, bioprocesses with engineered fungi may have practical limitations compared with bacteria, such as slower growth and iteration, more complex regulation and morphology (heterogeneity in filamentous cultures can make outputs less uniform), and genetic tools that can be trickier because strain engineering and predictable expression are often less plug-and-play than in E. coli.

Assignment Part 3: First DNA Twist Order

I reviewed the Individual Final Project documentation guidelines, submitted the Google Form with my draft Aim 1, final project summary, HTGAA industry council selections, and shared DNA design folder, and completed Part 3 of the Week 2 DNA Design Challenge by designing and uploading at least one insert sequence. I also documented the backbone vector for synthesis on my website.

Week 9 HW: Cell Free Systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

Exercise 1

Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.

Cell-free systems allow full and direct control of reaction conditions and components, enabling rapid and flexible experimentation. Here’s a table with the main advantages of cell-free vs in vivo:

Aspect	Cell-free	In vivo
Environment control	Direct, tunable	Limited by cell physiology
Toxic proteins	Can express	Often lethal to host
Reaction conditions	Precisely adjustable	Fixed intracellular state
Speed	Minutes-hours	Hours-days
Component handling	Add/remove parts	Difficult

Cases where cell-free is more beneficial

Expression of toxic proteins (e.g., antimicrobial peptides)
Incorporation of non-natural amino acids
Expression of membrane proteins with detergents/liposomes
Rapid prototyping of genetic circuits

Exercise 2

Main components of a cell-free expression system and their role

Component	Role
Cell extract (lysate)	Provides ribosomes, enzymes, tRNAs
DNA/mRNA	Encodes target protein
Amino acids	Building blocks for protein
Energy system (ATP, GTP)	Drives transcription/translation
Cofactors (Mg²⁺, K⁺)	Maintain enzyme activity
Buffer	Stabilizes pH and environment

Exercise 3

Protein synthesis consumes large amounts of ATP and GTP. Because cell-free reactions lack the self-sustaining metabolism of living cells, these energy molecules are rapidly depleted unless they are regenerated; once they run out, protein synthesis stops and yield drops.

A common way to maintain ATP supply is the phosphoenolpyruvate (PEP) system, in which pyruvate kinase transfers a phosphate group from PEP to ADP: PEP + ADP → pyruvate + ATP. Other ATP regeneration strategies include creatine phosphate, which donates a phosphate to ADP via creatine kinase to rapidly regenerate ATP, and glucose-based systems, in which glucose is metabolized through enzymatic pathways to produce ATP continuously over longer reaction times.

PEP and creatine phosphate favor speed and simplicity, whereas glucose-based systems are better suited for longer and more sustainable reactions. Unless the process clearly requires extended reaction time, I would start with the PEP system because it typically delivers faster and higher ATP regeneration with a relatively simple setup.

Exercise 4: Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Prokaryotic vs eukaryotic cell-free systems

	Prokaryotic	Eukaryotic
Speed	Fast	Slower
Cost	Lower	Higher
Protein folding	More limited	Better for complex proteins
Post-translational modifications	Minimal	Present or more compatible
Best suited for	Simple proteins	Complex eukaryotic proteins

Prokaryotic cell-free systems such as E. coli are faster and less expensive, making them suitable for producing simple proteins that do not require complex folding or post-translational modifications, such as GFP. In contrast, eukaryotic systems are slower and more costly but are better suited for proteins that require proper folding, disulfide bond formation, or eukaryotic processing, such as human antibody fragments.

Exercise 5

To optimize membrane protein expression in a cell-free system, I would design the reaction to include a membrane-like environment during synthesis, using detergents or liposomes to maintain solubility and support proper insertion. I would also optimize reaction conditions such as magnesium concentration and temperature, and add chaperones if necessary, to reduce misfolding and improve overall yield, because membrane proteins are especially prone to misfolding and insolubility in aqueous systems.

Challenge	Why it occurs	Experimental strategy	Expected benefit
Misfolding	Membrane proteins contain hydrophobic regions	Add chaperones; optimize temperature	Improves correct folding
Aggregation	Hydrophobic segments interact in solution	Add mild detergents (e.g., DDM)	Keeps protein soluble during synthesis
Insolubility	No native membrane is present	Add liposomes or nanodiscs	Provides membrane-like environment
Low insertion	Protein cannot embed properly in aqueous media	Include membrane mimics during expression	Supports insertion and stabilization
Poor yield	Reaction conditions may be suboptimal	Optimize Mg²⁺ and reaction conditions	Increases expression efficiency and stability

Exercise 6: Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

Low yield in a cell-free system can result from insufficient transcription, depletion of ATP, degradation of the expressed protein, or poor folding conditions. Troubleshooting should therefore target the limiting step directly: improve template quality if transcription is weak, reinforce energy regeneration if the reaction stalls, inhibit proteases if degradation is suspected, and optimize temperature or folding support if the protein is unstable or misfolded.

Homework question from Kate Adamala

I would design a phospholipid vesicle-based synthetic minimal cell that uses the blue-light regulator EL222 to activate expression of the tyrosinase gene melA, producing melanin as a visible record of cumulative light exposure.

Question 1

a) A light-exposure-logging synthetic minimal cell for integration into a wearable or material patch.

input:

the synthetic cell would detect blue/visible light and respond by producing melanin
a realistic light-sensing module is EL222, a one-component blue-light activated transcription factor from Erythrobacter litoralis that binds DNA upon illumination

output:

gradual, visible darkening that records cumulative exposure over time.
a realistic pigment-output gene is melA, a tyrosinase gene from Rhizobium etli that has been used to generate melanin in E. coli.

b) This function could be realized by cell-free Tx/Tl alone only partially. In bulk cell-free solution, the circuit could still produce melanin, but without encapsulation it would not behave as a discrete synthetic minimal cell and would be harder to localize, stabilize, or integrate into a material as a spatially resolved light-logging unit.

c) This function could also be realized by a genetically modified natural cell. For example, E. coli can be engineered to express melA and produce melanin. A synthetic minimal cell is preferable if the goal is a compartmentalized, material-compatible system rather than a living replicating microbe.

d) The desired outcome is that the synthetic cell becomes darker as cumulative light exposure increases. In a material, a population of these vesicles would function as a distributed exposure log: more illuminated regions would accumulate more melanin and therefore appear darker than shaded regions.

Question 2

a) The membrane would be a phospholipid vesicle, for example POPC + cholesterol, because that is a standard stable composition for synthetic cell vesicles and is also used in related artificial-cell communication systems.

b) Inside the vesicle, I would encapsulate an E. coli cell-free transcription/translation system; amino acids, NTPs, salts, and cofactors; an ATP regeneration system such as PEP + pyruvate kinase; L-tyrosine as the melanin precursor; Cu²⁺ as a cofactor for tyrosinase; and DNA encoding the light-response module and the melanin-output module.

c) For the Tx/Tl source, a bacterial system is sufficient. The core regulator, EL222, is bacterial, and the output enzyme MelA tyrosinase does not require mammalian-specific post-translational processing to function as a pigment-producing enzyme.

d) The synthetic cell would communicate with the environment mainly through light, which crosses the membrane directly, so no membrane channel is required for the input. To simplify the system, I would preload tyrosine and copper inside the vesicle. If I later wanted continuous substrate exchange from the outside, I could add a pore such as α-hemolysin (Hla), which is commonly used in synthetic-cell communication designs.

Question 3 - Experimental details

Lipids: POPC, cholesterol
Genes: EL222 from Erythrobacter litoralis as the light-activated transcription factor; melA from Rhizobium etli as the tyrosinase gene for melanin production; optional: hla for α-hemolysin if external substrate exchange is needed
Encapsulated reagents: E. coli cell-free lysate or PURE-like system, amino acids, NTPs, PEP, pyruvate kinase, tyrosine, Cu²⁺

I would measure the function of the system by tracking darkening over time, using image analysis and bulk absorbance measurements. The most direct readout is the increase in visible pigmentation of illuminated vesicles relative to dark controls; microscopy could also be used to compare spatial patterns of melanin accumulation across the material.

Homework question from Peter Nguyen

Application field: Textiles / Fashion

One-sentence pitch: A textile integrated with freeze-dried cell-free melanin-producing modules that develops gradual, skin-adjacent tonal changes in response to light exposure, turning the garment into an exposure-recording surface.

How it works: The material would incorporate localized freeze-dried cell-free reaction zones containing the genetic and enzymatic components needed for melanin production, for example a light-responsive regulator such as EL222 coupled to a melanin-producing gene such as melA. When the textile is activated by hydration, these embedded reaction zones become functional and begin responding to light exposure by expressing tyrosinase and generating melanin from preloaded substrate. Over time, more exposed regions of the garment darken more than shaded or covered regions, creating gradients or “tan-line-like” traces directly in the material. Functionally, the textile behaves less like a conventional dyed fabric and more like a programmable, exposure-sensitive biological film.

Societal challenge or market need: This concept addresses the growing interest in responsive and personalized materials in fashion and design, especially materials that are not just decorative but capable of recording use, environment, or time. It also responds to demand for alternatives to static coloration and conventional dyeing by proposing a material whose visual output is generated biologically in place. Beyond fashion, the same platform could be relevant to design objects or artistic textiles that visibly register environmental exposure.

How to address limitations of cell-free reactions

Because freeze-dried cell-free systems require water for activation and are typically limited in duration, I would treat the material as an on-demand activation platform rather than a permanently active textile. The garment could be hydrated only when the user wants to generate a pattern or record a specific exposure event, which also helps manage stability and one-time use.
To improve shelf life, the cell-free modules would remain freeze-dried until use and be stored in sealed conditions.
To improve localization and handling, they could be embedded in discrete patches, printed zones, or replaceable inserts rather than distributed uniformly across the whole textile. This makes the limitation part of the design logic: the material is activated intentionally, records one event or interval, and then remains as the final artifact.

Homework question from Ally Huang

Background information: Space radiation can damage DNA and reduce the reliability of biological systems used for diagnostics, manufacturing, and environmental sensing during long-duration missions. This is significant because future crews will likely depend on compact biotechnology tools rather than constant resupply from Earth. It is relevant for space exploration because cell-free systems are lightweight, storable, and already attractive for use in resource-limited environments. It is scientifically interesting because it links a basic biological question (how nucleic acid damage affects gene expression) to an applied engineering problem (how to maintain functional biotechnology in space).

Molecular or genetic target: Integrity and expression efficiency of a PCR-amplified sfGFP DNA template after radiation-mimicking UV exposure.

How the target relates to the challenge: The sfGFP DNA template serves as a simple reporter for whether a biologically useful DNA sequence remains functional after damage. If radiation-like exposure degrades the template, BioBits cell-free protein expression should produce less GFP signal. This makes the target directly relevant to the space biology challenge, because many space biotechnology applications depend on DNA templates remaining intact enough to be transcribed and translated. Measuring GFP output therefore provides a practical way to estimate how radiation damage could impair future cell-free diagnostics or production systems used in spacecraft or habitats.

Hypothesis or research goal: My hypothesis is that increasing UV exposure, used here as a classroom-accessible proxy for radiation-induced nucleic acid damage, will reduce the ability of a PCR-amplified sfGFP DNA template to produce GFP in the BioBits cell-free expression system. I further expect that templates protected by a shielding condition, such as melanin-containing film or another UV-blocking barrier, will retain more expression than unprotected templates exposed to the same dose. The reasoning is that DNA damage should interfere with transcription and translation by reducing template integrity, while a protective barrier should lower that damage. The research goal is to test whether cell-free fluorescence output can function as a simple readout of DNA stability under space-relevant stress and whether a lightweight protective strategy improves performance.

Experimental plan:

I will amplify an sfGFP template with the miniPCR and divide it into groups:
- no UV exposure
- low UV
- high UV, and
- UV plus shielding
After treatment, each sample will be added to BioBits cell-free reactions. Negative controls will include reactions with no DNA template; positive controls will include unexposed template.
GFP fluorescence will be measured with the P51 Molecular Fluorescence Viewer and quantified by image intensity or relative brightness. The main data will be fluorescence level across conditions, which will indicate how template damage affects expression and whether the shielding condition preserves function.
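The comparison across conditions could be sketched as follows. This is a minimal illustration, assuming that each condition has already been reduced to one mean image intensity, that the no-DNA control serves as the background, and that the unexposed template is the reference. The function name, condition labels, and numbers are hypothetical placeholders, not measured data.

```python
def relative_expression(intensities, background, reference):
    """Background-subtract mean intensities and scale them to a reference condition.

    intensities: dict mapping condition name -> mean image intensity
    background:  mean intensity of the no-DNA negative control
    reference:   name of the condition treated as 100% expression
    """
    ref_signal = intensities[reference] - background
    if ref_signal <= 0:
        raise ValueError("reference signal must exceed background")
    return {
        name: max(value - background, 0.0) / ref_signal
        for name, value in intensities.items()
    }


# Hypothetical mean pixel intensities, for illustration only.
example = {"no_uv": 200.0, "low_uv": 150.0, "high_uv": 60.0, "uv_shielded": 180.0}
result = relative_expression(example, background=20.0, reference="no_uv")

for condition, fraction in result.items():
    print(f"{condition}: {fraction:.2f} of unexposed signal")
```

Expressing each condition as a fraction of the unexposed template makes the shielded and unshielded samples directly comparable even if absolute brightness varies between imaging sessions.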

Homework Part B: Individual Final Project

general info / link for my slide in the CT slide deck

Here’s my slide in the CT slide deck

Title: Engineering Tunable Skin Pigment Expression in Engineered Living Materials

Aim 1: Generate base data on melanogenesis by mapping key pathways and build an initial genetic circuit informed by this base data to produce tunable pigmentation (eumelanin-biased outputs for darker tones and pheomelanin-biased outputs for warmer tones).

Aim 2: Expand and refine the circuit to select the most promising candidates for wet-lab experimentation, and plan the experiments.

Aim 3: Run empirical assays to explore how variables such as pigment amount, distribution, and system conditions affect the final material output.

Industry Council Companies: BioFabricate and Cultivarium. I selected them because each addresses a different core part of my project. BioFabricate could bring strong expertise in translating embedded melanin-related genetic circuits into a desirable (aesthetic and functional) engineered living material, while Cultivarium is well aligned with the wet-lab side of the project, particularly chassis selection, non-model organism engineering, and the practical challenge of implementing and optimizing the circuit in a host such as Komagataeibacter rhaeticus.

Submit the Final Project selection form.

Started planning how I will write my final project documentation based on the guidelines

To be done by April 10 at 11PM ET. Prepare your first DNA order and put it in the “Twist (MIT)” or “Twist (Nodes)” tab of the 2026 HTGAA Ordering: DNA, Reagents, Consumables spreadsheet, as appropriate.

Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

What to measure?

I will measure visible melanin output in the material as the primary readout of the project.

I want to quantify:

Degree of darkening
Spatial distribution of pigmentation
Stability and persistence of the pigmentation in the bacterial cellulose, including after drying or storage

These measurements are directly relevant because they indicate whether the melanin-producing system is functioning and whether the output is compatible with the intended material application.

How to measure?

a) Initial measurements: Molecular biology

First, to validate the genetic component, I would measure the presence of the designed construct by PCR and confirm the DNA sequence by DNA sequencing. I would use agarose gel electrophoresis to confirm correct DNA assembly before testing expression.

Before integrating the construct into bacterial cellulose, I could also test it in a cell-free or microbial system and check whether it produces the expected visible darkening, which would verify that the melanin-producing pathway is being expressed.

b) Material measurements:

These are the most direct indicators of whether the melanin-producing system is working and whether the output is useful as a material feature rather than only a biochemical signal.

I would first document the material using standardized photography under controlled lighting and then quantify changes in tone by image analysis, comparing pixel intensity or color values across samples and conditions.
As a secondary measurement, I would use UV-Vis absorbance or reflectance spectroscopy, if available, to quantify pigment accumulation more objectively.
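One possible way to turn the photographs into a number is sketched below. It assumes the pixels of each region of interest are already available as 8-bit (R, G, B) tuples, and it applies the Rec. 709 luminance weights directly to those values as a rough brightness proxy, without gamma correction. The function names and the "darkening index" itself are my own illustrative choices, not an established standard.

```python
def mean_luminance(pixels):
    """Average brightness of a list of (R, G, B) tuples using Rec. 709 weights."""
    total = 0.0
    for r, g, b in pixels:
        total += 0.2126 * r + 0.7152 * g + 0.0722 * b
    return total / len(pixels)


def darkening_index(sample_pixels, control_pixels):
    """Fraction of the control's brightness lost in the sample.

    0 means as bright as the unpigmented control; 1 means fully black.
    """
    control = mean_luminance(control_pixels)
    if control <= 0:
        raise ValueError("control region must not be black")
    return 1.0 - mean_luminance(sample_pixels) / control


# Hypothetical pixel values, for illustration only.
control_region = [(200, 200, 200), (200, 200, 200)]
pigmented_region = [(100, 100, 100), (100, 100, 100)]
index = darkening_index(pigmented_region, control_region)
print(f"darkening index: {index:.2f}")
```

Normalizing to an unpigmented control photographed in the same session reduces the effect of lighting differences between sessions.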

Homework: Waters Part 1 — Molecular Weight

Question 1

eGFP (native): ~26.9 kDa

eGFP + LEHHHHHH tag: ~27,875.41 Da

All spaces and line breaks were removed.

Question 2

To calculate the molecular weight of intact eGFP, I selected two adjacent peaks from the LC-MS spectrum at m/z 933.7349 and 965.9684.

Using the adjacent charge state equation, this gives a charge state of approximately 30 for the first peak, meaning the second adjacent peak corresponds to 29. I then used these charge states to calculate the molecular weight from each peak, using the relationship between m/z, charge, and proton mass. This gave values of 27,981.8 Da and 27,983.9 Da, respectively, with an average experimental molecular weight of 27,982.9 Da.

I then compared this experimental value with the theoretical molecular weight of the full eGFP construct, including the LE linker and His tag, which is 28,006.3 Da. The relative error was 0.084%, showing very good agreement between the experimental and predicted values. This indicates that the adjacent charge state method produced an accurate estimate of the intact protein mass.

For the zoomed-in peak near m/z 1474, the charge state can also be reasonably assigned. Based on the experimental molecular weight, a 19+ ion would appear at about m/z 1473.8, which closely matches the observed signal. So yes, the charge state of the zoomed-in peak can be observed, and it is most consistent with z = 19.
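The arithmetic above can be reproduced with a short script. This is a sketch that uses the same proton mass as the rest of this write-up (1.0073 Da) and assumes the two selected peaks differ by exactly one charge; the function names are my own.

```python
PROTON = 1.0073  # proton mass in Da, as used elsewhere in this write-up


def charge_of_lower_peak(mz_low, mz_high):
    """Charge of the lower-m/z peak, assuming the peaks differ by one charge.

    From z * (mz_low - p) = (z - 1) * (mz_high - p), solving for z gives
    z = (mz_high - p) / (mz_high - mz_low).
    """
    return round((mz_high - PROTON) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral mass M from an [M + zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON)


mz_low, mz_high = 933.7349, 965.9684
z_low = charge_of_lower_peak(mz_low, mz_high)  # 30
z_high = z_low - 1                             # 29

mass_from_low = neutral_mass(mz_low, z_low)
mass_from_high = neutral_mass(mz_high, z_high)
average_mass = (mass_from_low + mass_from_high) / 2

theoretical_mass = 28006.3  # Da, full construct with LE linker and His tag
relative_error_pct = abs(average_mass - theoretical_mass) / theoretical_mass * 100

# Predicted position of the 19+ ion, for comparison with the peak near m/z 1474.
mz_19 = average_mass / 19 + PROTON

print(f"z = {z_low} and {z_high}")
print(f"masses: {mass_from_low:.1f} Da, {mass_from_high:.1f} Da")
print(f"average: {average_mass:.2f} Da, relative error: {relative_error_pct:.3f}%")
print(f"predicted m/z for 19+: {mz_19:.1f}")
```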

Homework: Waters Part II — Secondary/Tertiary structure

Question 1

This unfolding changes how the protein gets charged during electrospray ionization. In the native state, fewer sites are accessible for protonation, so the protein carries fewer charges and appears at higher m/z values. In the denatured state, more sites are exposed, so the protein can carry more charges, which shifts the signal to lower m/z values.

In the mass spectrum (Figure 2), this shows up clearly. The native protein has a tighter charge state distribution at higher m/z, while the denatured protein has a broader distribution shifted toward lower m/z. So basically, by looking at how the charge state envelope shifts, we can tell whether the protein is folded or unfolded.

Question 2

If we zoom into the peak around m/z ~2800 in the native spectrum, we can determine the charge state by looking at the spacing between the small peaks in the isotope pattern. At high resolution, these peaks are separated by approximately 1/z.

From the inset, the peaks are spaced by about 0.05–0.06 m/z units. Since the spacing is equal to 1/z, this suggests:

z ≈ 1 / 0.05 ≈ 20

So the charge state is approximately 20+.

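As a cross-check against the protein’s mass (~28 kDa): a monomer observed at m/z ≈ 2800 would carry about 10 charges (28,000 / 2800 ≈ 10) and would show an isotope spacing of about 0.1 m/z, whereas a 20+ ion at m/z ≈ 2800 corresponds to a mass of about 56 kDa, roughly twice the monomer mass. So the spacing read from the inset and the monomer mass do not point to the same charge state: either the true spacing is closer to 0.1 m/z (z ≈ 10), or the peak comes from a dimer-sized species carrying about 20 charges.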

Homework: Waters Part III — Peptide Mapping - primary structure

Question 1

Lysine (K): 20
Arginine (R): 6
Total K + R: 26
Number of tryptic peptides generated: 27

To analyze the eGFP standard, I first reviewed the full amino acid sequence provided, including the LE linker and the C-terminal His-tag (HHHHHH). I then identified all lysine (K) and arginine (R) residues, since trypsin cleaves specifically after K and R residues unless the following amino acid is proline (P).

After counting the residues in the sequence using Benchling, I found a total of 20 lysines (K) and 6 arginines (R), for a combined total of 26 potential trypsin cleavage residues.

Question 2

I also checked whether any of these K or R residues were followed by proline, which would block trypsin cleavage, and I found that none of them were followed by P. Therefore, all 26 sites are valid trypsin cleavage sites. Because each cleavage site divides the sequence into peptide fragments, the total number of peptides expected from complete tryptic digestion is the number of cleavage sites plus one. Based on this, the digest should generate 27 peptides in total.

To double-check this, I pasted the eGFP amino acid sequence into the ExPASy PeptideMass tool, selected trypsin as the digestion enzyme, and used the parameters shown in Figure 4, including 0 missed cleavages, monoisotopic mass, and no modifications. I then clicked “Perform the Cleavage” to generate the predicted list of tryptic peptides and determine the total number of peptides produced.

After manually counting 26 lysine and arginine residues, I expected a total of 27 tryptic peptides. When I ran the sequence in the ExPASy PeptideMass tool, the output showed fewer peptides than expected. This is because the tool was set to display only peptides with masses greater than 500 Da, which excludes smaller fragments.
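The cleavage rule used for the manual count (cut after K or R unless the next residue is P, with no missed cleavages) can be sketched in a few lines. The sequence below is a short made-up example, not the eGFP construct, and the function name is my own; the sketch does not compute peptide masses, so it does not reproduce the 500 Da display filter.

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence after every K or R that is not followed by P."""
    # Zero-width split points: preceded by K or R, not followed by P.
    # Splitting on an empty match requires Python 3.7 or newer.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p]


# Hypothetical toy sequence: the first K is followed by P, so it is not cleaved.
toy_sequence = "AAKPGGRCCKDDRHH"
peptides = tryptic_digest(toy_sequence)
print(peptides)
print(f"{len(peptides)} peptides")
```

In the toy sequence there are three valid cleavage sites and four peptides, matching the "sites plus one" rule. That rule holds when the sequence does not end in K or R, which is the case for a construct ending in a His-tag.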

Question 3

To analyze the peptide map, I examined the total ion chromatogram (TIC) in Figure 5a and focused on the retention time window between 0.5 and 6 minutes. I counted only peaks with a relative intensity greater than approximately 10% of the base peak, as specified. Based on this criterion, I observe approximately 18–20 chromatographic peaks between 0.5 and 6 minutes. The exact number depends slightly on how closely overlapping peaks are resolved, particularly in the region between ~2.5 and 3.5 minutes, where several peaks are closely spaced.

Question 4

The chromatogram shows fewer peaks than the number of peptides predicted in Question 2. The full tryptic digest was predicted to generate 27 peptides, whereas counting only peaks above the 10% relative abundance threshold between 0.5 and 6 minutes gives roughly 20 peaks. This likely means that some peptides are too low in abundance, too small, or co-elute with other peptides and therefore do not appear as separate visible chromatographic peaks.

Question 5

To analyze the peptide in Figure 5b, I first identified the most intense peak in the spectrum, which appears at m/z ≈ 525.77. I assumed this corresponds to the most abundant charge state of the peptide.

To determine the charge state, I examined the zoomed-in isotope pattern. The spacing between adjacent isotope peaks is about 0.5 m/z unit. Since isotope spacing is approximately equal to 1/z, a spacing of ~0.5 indicates that z ≈ 2. Based on this, I concluded that the most abundant charge state is z = 2+.

Next, I calculated the mass of the singly charged form of the peptide, M+H+, using the relationship:

M+H+ = z(m/z) − (z − 1)(1.0073)

Substituting the values:

M+H+ = 2(525.77) − 1.0073 ≈ 1050.53 Da

So, the peptide has:

m/z ≈ 525.77

charge state z = 2+

M+H+ ≈ 1050.53 Da

This result is consistent with the spectrum, since there is also a peak visible near m/z ≈ 1050.52, which corresponds to the singly charged form of the same peptide.
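As a quick check, the two steps above (charge from isotope spacing, then conversion to the singly protonated mass) can be written as a small script. It is a sketch using the values read from the spectrum and the same proton mass of 1.0073 Da; the function names are my own.

```python
PROTON = 1.0073  # proton mass in Da, as used in the formula above


def charge_from_isotope_spacing(spacing):
    """Isotope peaks are about 1/z apart, so z is the rounded inverse spacing."""
    return round(1.0 / spacing)


def singly_protonated_mass(mz, z):
    """Convert an [M + zH]z+ ion at the given m/z to the [M + H]+ mass."""
    return z * mz - (z - 1) * PROTON


z = charge_from_isotope_spacing(0.5)     # 2
mh = singly_protonated_mass(525.77, z)
print(f"z = {z}, M+H+ = {mh:.2f} Da")
```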

Question 6

From the previous step, I determined that the most abundant ion was at m/z 525.7671 with charge z = 2, which gives a singly charged mass of about M+H+ = 1050.53 Da. In the PeptideMass results, the closest expected peptide mass is 1050.5214 Da, which corresponds to the peptide FEGDTLVNR. Based on that match, I identified the peptide as FEGDTLVNR.

To evaluate the mass accuracy, I compared the experimental mass to the theoretical mass from PeptideMass. Using the exact value labeled in the spectrum, the experimental singly charged mass is 1050.52438 Da, and the theoretical mass is 1050.5214 Da. The mass difference is therefore:

1050.52438 - 1050.5214 = 0.00298 Da

To express the error in ppm, I used:

error (ppm) = (MW_experimental - MW_theory) / MW_theory × 10^6

Substituting the values:

error (ppm) = (0.00298 / 1050.5214) × 10^6 ≈ 2.84 ppm

So the measurement error is about 2.8 ppm, which indicates very good agreement between the measured peptide mass and the theoretical value.
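The same ppm calculation as a reusable helper, shown here as a minimal sketch with the two masses quoted above (the function name is my own):

```python
def ppm_error(experimental, theoretical):
    """Signed mass error in parts per million relative to the theoretical mass."""
    return (experimental - theoretical) / theoretical * 1e6


peptide_ppm = ppm_error(1050.52438, 1050.5214)
print(f"mass error: {peptide_ppm:.2f} ppm")
```

Keeping the sign shows whether the measured mass is above (positive) or below (negative) the theoretical value.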

Question 7

Figure 6 shows that the amino acid coverage of eGFP is 88%. This means that 88% of the eGFP sequence was confirmed by peptide mapping.

Summary

Identified peptide: FEGDTLVNR
Experimental M+H+: 1050.52438 Da
Theoretical M+H+: 1050.5214 Da
Mass error: 2.84 ppm
Sequence coverage confirmed by peptide mapping: 88%

Homework: Waters Part IV — Oligomers

I used ChatGPT to help compare the theoretical and experimental subunit masses in the answer below.

To identify the Keyhole Limpet Hemocyanin (KLH)’s oligomeric states in the CDMS spectrum, I used the subunit masses given in Table 1 and multiplied them by the number of subunits expected in each assembly. I then compared those theoretical masses to the labeled peaks in Figure 7.

Here are the results summarized in a table:

Oligomeric species	Theoretical mass	Peak in the mass spectrum of Keyhole Limpet Hemocyanin (KLH) acquired on the CDMS	Interpretation
7FU Decamer	3.4 MDa	~3.4 MDa	This peak is consistent with the expected mass of a 10-subunit 7FU assembly.
8FU Didecamer	8.0 MDa	~8.33 MDa	This is the closest and most intense peak, so it is the strongest candidate for the 8FU didecamer.
8FU 3-Decamer	12.0 MDa	~12.67 MDa	This peak is reasonably close to the expected tridecamer mass and likely represents a higher-order 8FU assembly.
8FU 4-Decamer	16.0 MDa	~16-17 MDa	The weak signal in this region may correspond to the 8FU 4-decamer, although this assignment is more tentative.

Discussion

To interpret the CDMS spectrum, I compared the theoretical oligomer masses calculated from the known KLH subunit masses with the labeled peaks in Figure 7. The observed masses are not perfectly identical to the theoretical values, but they are close enough to support these assignments as working hypotheses.

Example proxy calculations:

For the 7FU decamer (10 units): 7FU subunit mass = 340 kDa
Since a decamer contains 10 subunits, the expected mass is: 10 × 340 = 3400 kDa = 3.4 MDa
In the spectrum, there is a labeled peak at about 3.4 MDa, which I would assign to the 7FU decamer. This corresponds to a 4.5 mDa from the x axis analysis.
The slight offsets could reflect experimental uncertainty, heterogeneity in the sample, adducting, or the natural structural complexity of KLH. Overall, my interpretation is that the spectrum supports a mixture of KLH oligomeric states, with the 8FU didecamer appearing to be the predominant species and the larger 8FU assemblies likely representing less abundant higher-order associations.

The 8.33 MDa peak is by far the most intense feature in the spectrum. This suggests that the 8FU didecamer may be the dominant oligomeric state in this sample under the conditions used for CDMS.

In contrast, the peaks assigned to the 8FU 3-decamer and especially the 8FU 4-decamer are much less abundant, which may indicate that these larger assemblies are present only as minor populations or form less stably in solution.
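The comparison between theoretical and observed masses can be laid out as a short script. This is a sketch: the 7FU subunit mass (340 kDa) is the value used in the example calculation above, while the 8FU subunit mass of 400 kDa is an assumption inferred from the 8.0 MDa theoretical mass listed for the 20-subunit didecamer. The observed values are the approximate peak positions from the table; the tentative 16-17 MDa signal is left out because it is a range rather than a single value.

```python
# Subunit masses in kDa. The 8FU value is inferred (8.0 MDa / 20 subunits),
# not read directly from Table 1.
SUBUNIT_KDA = {"7FU": 340.0, "8FU": 400.0}


def oligomer_mass_mda(subunit, n_subunits):
    """Theoretical oligomer mass in MDa for n copies of the given subunit."""
    return SUBUNIT_KDA[subunit] * n_subunits / 1000.0


def percent_deviation(observed, theoretical):
    """Signed deviation of the observed mass from theory, in percent."""
    return (observed - theoretical) / theoretical * 100.0


# (label, subunit type, number of subunits, approximate observed peak in MDa)
assignments = [
    ("7FU decamer", "7FU", 10, 3.4),
    ("8FU didecamer", "8FU", 20, 8.33),
    ("8FU 3-decamer", "8FU", 30, 12.67),
]

for label, subunit, n, observed in assignments:
    theory = oligomer_mass_mda(subunit, n)
    deviation = percent_deviation(observed, theory)
    print(f"{label}: theory {theory:.1f} MDa, observed {observed} MDa, "
          f"deviation {deviation:+.1f}%")
```

Expressing the offsets as percentages makes it easier to see that the two larger 8FU assemblies deviate from theory by a similar relative amount.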

Homework: Waters Part V — Did I make GFP?

Measurement	Theoretical	Observed/measured on the intact LC-MS	PPM Mass Error
Molecular weight	28.0063 kDa	27.9829 kDa	-835.5 ppm

I calculated the ppm mass error using:

ppm error = ((observed mass - theoretical mass) / theoretical mass) × 10⁶

Substituting the values:

ppm error = ((27.9829 - 28.0063) / 28.0063) × 10⁶ ≈ -835.5 ppm

Taking the absolute value, the mass error is approximately 836 ppm. The observed intact LC-MS mass is close to the theoretical eGFP construct mass, so the data supports that the sample is consistent with the expected GFP/eGFP construct.

I used ChatGPT as a writing and reasoning assistant to help review calculations, improve explanations, and check whether my answers addressed the homework prompts. All final interpretations, edits, and submitted content were reviewed by me.

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

I contributed 7 pixels to the global artwork experiment, helping extend a horizontal yellow line in the top-left area (see screenshot below).

At first, I was cautious and tried to understand the ongoing ideas for each section and whether there was a unifying concept. I considered introducing something new, but ultimately decided to stick with what seemed to be the area’s goal (a horizontal yellow line). For next year, it might be fun to have an in-app chat within the same domain to coordinate contributions more easily and check the current vibes.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Question 1

Component Category	Component	Corrected role in the cell-free reaction
Lysate	E. coli Lysate	Provides the endogenous transcription, translation, and metabolic machinery needed for in vitro gene expression.
	BL21 (DE3) Star Lysate (includes T7 RNA Polymerase)	Provides the same core lysate machinery plus T7 RNA polymerase for strong transcription from T7 promoter templates.
Salts / Buffer	Potassium Glutamate	Helps set intracellular-like ionic conditions that support enzyme activity, ribosome function, and overall reaction performance.
	HEPES-KOH pH 7.5	Maintains reaction pH in the range needed for stable transcription-translation activity.
	Magnesium Glutamate	Supplies Mg2+, an essential cofactor for ribosomes, polymerases, and many ATP-dependent enzymes.
	Potassium phosphate monobasic	Contributes phosphate and helps maintain buffer balance together with the dibasic form.
	Potassium phosphate dibasic	Works with the monobasic form to maintain phosphate buffering and reaction stability.
Energy / Nucleotide System	Ribose	Supports nucleotide metabolism and regeneration pathways rather than serving as the main energy source.
	Glucose	Serves as a metabolic energy substrate that helps regenerate ATP through endogenous lysate metabolism.
	AMP	Acts as a nucleotide monophosphate precursor that can be phosphorylated into higher-energy adenine nucleotides.
	CMP	Acts as a nucleotide precursor that can be converted into CTP for transcriptional needs.
	GMP	Acts as a nucleotide precursor that can be converted into GTP for transcription and translation-related processes.
	UMP	Acts as a nucleotide precursor that can be converted into UTP for transcriptional needs.
	Guanine	Serves as a salvage precursor for guanine nucleotide synthesis.
Translation Mix (Amino Acids)	17 Amino Acid Mix	Provides most of the amino acid building blocks required for protein synthesis.
	Tyrosine	Provides a required amino acid for translation and may also be supplied separately because of formulation or pathway-specific needs.
	Cysteine	Provides a required amino acid for translation and is often added separately because of its chemical instability.
Additives	Nicotinamide	Serves as a precursor for NAD-related cofactors that support extract redox metabolism.
Backfill	Nuclease Free Water	Brings the reaction to the target volume without introducing nucleases or contaminants.

Question 2

The 1-hour PEP-NTP system supplies fully activated NTPs and high-energy phosphate (PEP) upfront, enabling fast, high-rate transcription and translation but with limited longevity due to rapid energy depletion.

In contrast, the 20-hour NMP-ribose-glucose system relies on metabolic regeneration, using NMPs and simple substrates (ribose, glucose) that are enzymatically converted into active nucleotides and ATP, trading peak speed for sustained, longer-duration protein production.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Question 1

a. Superfolder Green Fluorescent Protein (sfGFP)

Description: a basic (constitutively fluorescent) green fluorescent protein published in 2005, derived from Aequorea victoria. It is reported to be a very rapidly-maturing weak dimer.

sfGFP has very efficient folding and fast maturation (~13 min), allowing it to produce fluorescence quickly and reliably even under suboptimal cell-free conditions. This makes it ideal for early and robust readout.

b. Monomeric Red Fluorescent Protein 1 (mRFP1)

mRFP1: Derived from DsRed, mRFP1 has slow maturation and lower photostability, which delays fluorescence signal and reduces effective brightness in short or energy-limited cell-free reactions.

c. mKusabira-Orange2 (mKO2)

mKO2 has moderate maturation speed but higher sensitivity to photobleaching and environmental conditions, which can reduce signal stability during long incubations or repeated excitation. This protein is relatively acid-sensitive (higher pKa), so its fluorescence can decrease if the cell-free reaction acidifies over time, affecting signal stability.

d. mTurquoise2

This protein has an exceptionally high quantum yield and photostability, making it one of the brightest CFP variants and ideal for strong signal readout even at low expression levels.

e. mScarlet-I

mScarlet-I is optimized for high brightness and improved maturation efficiency among red FPs, enabling stronger signal compared to earlier RFPs, though maturation still limits very early readouts compared to GFP variants.

f. Electra2

As a newer engineered FP (likely optimized variant), its performance is typically influenced by trade-offs between brightness, folding efficiency, and maturation kinetics, meaning signal output depends strongly on how well it folds and matures in the cell-free environment.

Question 2

Hypothesis: For mKO2, increasing the HEPES-KOH buffer concentration and maintaining sufficient glucose in the cell-free mastermix will improve fluorescence over a 36-hour incubation by reducing pH drift and sustaining ATP regeneration.

Rationale: Because mKO2 is relatively acid-sensitive, stronger pH buffering should help preserve fluorescence, while sustained glucose-dependent energy regeneration should support continued protein expression and chromophore maturation, resulting in a higher final fluorescence signal.

Small caveat: glucose can also contribute to acidification depending on the metabolism of the lysate, so the strongest version is really HEPES-KOH + controlled glucose, not just “more glucose.”

Question 3

sfGFP → system calibration (TX-TL health). Melanin has a broad absorbance spectrum, but it absorbs much more strongly at shorter wavelengths (blue/green) than at longer wavelengths (red). Melanin interferes with the optical readout because we will be trying to measure fluorescence in a reaction that is simultaneously getting darker, which creates optical interference across the wavelength spectrum of the signal.

mScarlet-I → expression readout for melA tyrosinase specifically. Its fluorescence is less sensitive to melanin, so it better tracks expression alone (sfGFP → Ex ~488 nm / Em ~510 nm → high overlap with melanin absorbance; mTurquoise2 → even worse (blue region); mScarlet-I → Ex ~569 nm / Em ~594 nm → less overlap).

Question 4

For optimizing the Master Mix design for mScarlet-I in my melA tyrosinase cell-free system, I’d supplement CuSO4, since my analyte is a copper-dependent enzyme; HEPES-KOH pH 7.5, as an additional buffer against acidification; and magnesium glutamate, to improve translation capacity.

At first I thought about adding glucose since it could extend energy regeneration, but then I realized that it may also increase acidification. Since I’m worried about fluorescence readout in a pigment-producing system, I’d prioritize pH stability over extra glucose.

I’d actually supplement L-tyrosine as well, since it serves as a functional validation that my protein of interest, MelA tyrosinase, is being expressed and is active.

Master Mix designs to be tested using mScarlet-I and sfGFP:

REACTION 1

My preparation before receiving the email (sent to the address registered on the Forum) with my personal link to participate in the Cell-Free Master Mix Cloud Lab Global Experiment:

my melA-tyrosine cell-free system
mScarlet-I

Supplement	Volume	Purpose
HEPES-KOH pH 7.5	1.0 µL	Buffer against pH drift over 36h, helping preserve mScarlet-I fluorescence and MelA activity.
L-tyrosine	0.75 µL	Provides additional substrate for MelA-driven melanin-like pigment production.
CuSO4, very low concentration	0.25 µL	Supports MelA tyrosinase activity as a copper-dependent enzyme while minimizing toxicity/inhibition.

MelA-specific bottlenecks: tyrosine substrate, copper cofactor

Increasing buffering capacity with HEPES-KOH also seems like a good idea, because prolonged cell-free reactions coupled with melanin production can lead to progressive acidification, which can reduce fluorescent protein signal, impair MelA activity, and shorten the productive lifetime of the TX-TL system.

REACTION 2

my melA-tyrosine cell-free system
sfGFP

Supplement	Volume	Purpose
HEPES-KOH pH 7.5	1.0 µL	Buffer against pH drift over 36h, helping preserve sfGFP fluorescence and MelA activity.
L-tyrosine	0.75 µL	Provides additional substrate for MelA-driven melanin-like pigment production.
CuSO4, very low concentration	0.25 µL	Supports MelA tyrosinase activity as a copper-dependent enzyme while minimizing toxicity/inhibition.

MelA-specific bottlenecks: tyrosine substrate, copper cofactor

REACTION 3

my melA-tyrosine cell-free system
mScarlet-I

Reagent	Volume	Purpose
L-tyrosine	0.8 µL	Direct substrate for MelA pigment production
HEPES-KOH pH 7.5	0.6 µL	Reduces pH drift over 36h
Magnesium glutamate	0.4 µL	Supports sustained transcription-translation
Low CuSO4	0.2 µL	Supports tyrosinase catalytic activity

Copper is required as a cofactor for MelA tyrosinase activity, but it must be carefully controlled because excess Cu²⁺ can inhibit cell-free expression and promote nonspecific oxidative reactions. I therefore decided to test reducing it and supplementing magnesium glutamate, which improves TX-TL capacity by supporting ribosomes, RNA polymerase, and Mg-ATP/GTP chemistry.

REACTION 4

my melA-tyrosine cell-free system
sfGFP

Reagent	Volume	Purpose
L-tyrosine	0.8 µL	Direct substrate for MelA pigment production
HEPES-KOH pH 7.5	0.6 µL	Reduces pH drift over 36h
Magnesium glutamate	0.4 µL	Supports sustained transcription-translation
Low CuSO4	0.2 µL	Supports tyrosinase catalytic activity

REACTION 5

my melA-tyrosine cell-free system
mScarlet-I

Reagent	Volume	Purpose
HEPES-KOH pH 7.5	1.25 µL	Stronger buffering against pH drift over 36h.
Low CuSO4	0.25 µL	Enables MelA tyrosinase activity as a copper-dependent enzyme.
Nuclease-free water	0.50 µL	Keeps total supplement volume at 2 µL without adding more substrate.

This reaction tests whether the main limitation is pH stability + copper availability, rather than additional tyrosine. It is useful because the base mastermix already contains tyrosine, so this condition asks whether MelA can produce pigment when copper is supplied and pH is stabilized without further increasing substrate concentration.

REACTION 6

my melA-tyrosine cell-free system
sfGFP

Reagent	Volume	Purpose
HEPES-KOH pH 7.5	1.25 µL	Stronger buffering against pH drift over 36h.
Low CuSO4	0.25 µL	Enables MelA tyrosinase activity as a copper-dependent enzyme.
Nuclease-free water	0.50 µL	Keeps total supplement volume at 2 µL without adding more substrate.

REACTION 7

my MelA-tyrosine cell-free system
sfGFP

Reagent	Volume	Purpose
L-tyrosine	1.50 µL	Pushes substrate availability to test whether pigment formation is substrate-limited.
CuSO4, very low concentration	0.25 µL	Enables MelA catalytic activity.
HEPES-KOH pH 7.5	0.25 µL	Minimal pH support.

This is the pigment-stress condition: it intentionally pushes melanin production to test whether sfGFP fluorescence collapses when the reaction darkens. If sfGFP drops while pigment rises, that supports using mScarlet-I as the better reporter.

REACTION 8

my MelA-tyrosine cell-free system
sfGFP or mScarlet-I

Reagent	Volume	Purpose
HEPES-KOH pH 7.5	1.50 µL	Strongly buffers against acidification over 36h.
CuSO4, very low concentration	0.25 µL	Enables MelA activity.
L-tyrosine	0.25 µL	Keeps substrate present but avoids overloading the system.

This is the long-incubation preservation condition: it tests whether the best 36h outcome comes not from maximizing substrate, but from preventing reaction decay. If fluorescence and pigment both remain stronger at 36h, pH stability is the key design variable.

My actual experiments submitted

Now that I’ve seen the interface better, I understand that the goal here is to focus on DNA construct performance, so I’ll treat this as an expression/readout experiment rather than enzyme validation.

I went too far into broader bioprocess hypotheses 😅 in my brainstormed compositions above.

Given the broader objective of optimizing the cell-free master mix for maximal fluorescence across six proteins, I will test the 2 reporters:

mScarlet-I = better reporter under melanin/dark pigment interference
sfGFP = system health / pigment-interference control

In this first round I will test these 8 reactions (summary tables followed by a text description of each reaction):

Reaction	Reporter	Testing	HEPES-KOH	Tyrosine	Magnesium glutamate	Water/backfill	Main purpose
1	mScarlet-I	Low buffer / low substrate	0.25 µL	0.25 µL	0 µL	1.50 µL	Baseline condition for mScarlet-I.
2	sfGFP	Low buffer / low substrate	0.25 µL	0.25 µL	0 µL	1.50 µL	Baseline condition for sfGFP.
3	mScarlet-I	pH drift	1.00 µL	0.25 µL	0 µL	0.75 µL	Tests whether stronger buffering improves mScarlet-I signal.
4	sfGFP	pH drift	1.00 µL	0.25 µL	0 µL	0.75 µL	Tests whether stronger buffering preserves sfGFP signal.
5	mScarlet-I	substrate limitation	0.25 µL	1.00 µL	0 µL	0.75 µL	Tests whether extra tyrosine increases pigment formation with mScarlet-I.
6	sfGFP	substrate limitation / pigment interference	0.25 µL	1.00 µL	0 µL	0.75 µL	Tests whether extra tyrosine-driven pigment formation interferes with sfGFP.
7	mScarlet-I	pH drift, substrate limitation, TX-TL capacity	1.00 µL	0.75 µL	0.25 µL	0 µL	Tests combined support for fluorescence and pigment production.
8	sfGFP	pH drift, substrate limitation, TX-TL capacity + reporter comparison	1.00 µL	0.75 µL	0.25 µL	0 µL	Same as Reaction 7, but tests sfGFP under melanin-producing conditions.
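As a sanity check on the table above, the short sketch below verifies that every reaction's supplements plus backfill add up to the same total. It assumes the 2 µL total supplement volume mentioned in my brainstorm designs also applies here; the dictionary layout and function name are illustrative only.

```python
TARGET_UL = 2.0  # assumed total supplement volume per reaction (µL)

# Volumes in µL: (HEPES-KOH, tyrosine, magnesium glutamate, water/backfill)
reactions = {
    1: (0.25, 0.25, 0.00, 1.50),
    2: (0.25, 0.25, 0.00, 1.50),
    3: (1.00, 0.25, 0.00, 0.75),
    4: (1.00, 0.25, 0.00, 0.75),
    5: (0.25, 1.00, 0.00, 0.75),
    6: (0.25, 1.00, 0.00, 0.75),
    7: (1.00, 0.75, 0.25, 0.00),
    8: (1.00, 0.75, 0.25, 0.00),
}


def off_target(reactions, target=TARGET_UL, tol=1e-9):
    """Return the reaction numbers whose volumes do not sum to the target."""
    return [n for n, vols in reactions.items() if abs(sum(vols) - target) > tol]


bad = off_target(reactions)
print("All reactions sum to target" if not bad else f"Check reactions: {bad}")
```

Keeping the total constant means differences between reactions come from the supplement composition rather than from dilution of the master mix.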

Reaction	Hypothesis
1	Low HEPES and low tyrosine will provide a baseline fluorescence condition for comparison across proteins.
2	The same low HEPES / low tyrosine condition will reveal whether sfGFP is more sensitive to pigment-related interference than mScarlet-I.
3	Increasing HEPES will improve fluorescence over 36h by reducing pH drift.
4	Increasing HEPES will help determine whether pH stabilization benefits sfGFP fluorescence under the same conditions.
5	Increasing tyrosine will test whether extra substrate/pigment formation reduces fluorescence through optical interference.
6	High tyrosine with sfGFP will test whether green fluorescence is especially affected by pigment accumulation.
7	Combining HEPES, tyrosine, and magnesium glutamate will improve fluorescence by supporting pH stability, substrate context, and TX-TL capacity.
8	The same combined condition with sfGFP will test whether translation support and buffering can preserve fluorescence despite stronger pigment-forming conditions.

REACTION 1

Testing: Baseline condition (low buffer, low substrate)

Hypothesis: Under minimal buffering and substrate availability, both melanin production and mScarlet-I fluorescence will be limited, providing a baseline to compare improvements from other conditions.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 0.25 µL

REACTION 2

Testing: Baseline condition + reporter comparison

Hypothesis: This condition mirrors Reaction 1 but uses sfGFP to evaluate baseline fluorescence without strong pigment production, serving as a reference for how each reporter behaves under minimal conditions.

System: melA system sfGFP

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 0.25 µL

REACTION 3

Testing: pH drift

Hypothesis: Increasing buffering capacity with HEPES-KOH will improve mScarlet-I fluorescence over 36 hours by reducing pH drift, even without increasing substrate availability.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.25 µL

REACTION 4

Testing: pH drift + reporter comparison

Hypothesis: This condition mirrors Reaction 3 but uses sfGFP to test whether stronger buffering preserves green fluorescence, or if signal is still affected by pigment formation and optical interference.

System: melA system sfGFP

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.25 µL

REACTION 5

Testing: Substrate limitation

Hypothesis: Increasing tyrosine concentration will enhance melanin-like pigment production, indicating that MelA activity may be limited by substrate availability under baseline conditions.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 1.0 µL

REACTION 6

Testing: Substrate limitation + pigment interference

Hypothesis: This condition mirrors Reaction 5 but uses sfGFP to evaluate whether increased pigment formation interferes with green fluorescence, compared to the red-shifted mScarlet-I signal.

System: melA system sfGFP

Supplements: HEPES-KOH → 0.25 µL Tyrosine → 1.0 µL

REACTION 7

Testing: pH drift, substrate limitation, and TX-TL capacity

Hypothesis: Combining buffering (HEPES-KOH), substrate availability (tyrosine), and translation support (magnesium glutamate) will help sustain melanin production and mScarlet-I fluorescence over 36 hours by addressing the main system bottlenecks.

System: melA system mScarlet-I

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.75 µL Magnesium glutamate → 0.25 µL

REACTION 8

Testing: pH drift, substrate limitation, TX-TL capacity + reporter comparison

Hypothesis: This condition mirrors Reaction 7 but uses sfGFP to evaluate how green fluorescence behaves under melanin-producing conditions, serving as a control to assess pigment interference relative to mScarlet-I.

System: melA system sfGFP

Supplements: HEPES-KOH → 1.0 µL Tyrosine → 0.75 µL Magnesium glutamate → 0.25 µL

Reactions submitted on 5/1/2026.

Unfortunately, it is not possible for now to add copper (in the form of CuSO4), which is the MelA tyrosinase cofactor.

Keeping the designs aligned with my Part B logic, I wish I could test these 5 hypotheses:

mScarlet-I for a melanin-compatible readout,
sfGFP as the expression/pigment-interference control,
HEPES for 36h pH stability,
tyrosine for MelA substrate availability,
magnesium glutamate for TX-TL capacity.

Week 12 HW: Building Genomes

I reviewed the updated Week 11 homework and continued making progress on my Individual Final Project and DNA order.

Week 13 HW: AI, Synbio, and Scaling Health Innovation (ARPA-H)

I worked on my Final Project and prepared it for the presentation on May 13 as part of the Committed Listeners group.

Week 14 HW: Bio Design & Bio Fabrication

I worked on my Final Project and prepared it for the presentation on May 13 as part of the Committed Listeners group.

Labs

Lab writeups:

Week 1 Lab: Pipetting
Committed Listeners: Not Applicable.
Week 2 Lab: DNA Gel Art
For Week 2, the wet-lab component was optional for CLs with lab access, which unfortunately was not my case. I completed and documented the in-silico design and written assignments on my Homework Week 2 page.
Week 3 Lab: Lab Automation
For Week 3, CLs were required to create and document the Opentrons Python script, answer the post-lab questions, and submit final project ideas. I completed the code-based assignment and documentation on my Homework Week 3 page, but I did not have access to run the script physically on an Opentrons robot.
Week 4 Lab: Protein Design Part 1
Lab work this week is contained within my Homework Week 4 page.
Week 5 Lab: Protein Design Part 2
Lab work this week is contained within my Homework Week 5 page.
Week 6 Lab: Gibson Assembly
For this week, I completed what was expected from the CL side: the conceptual and design-oriented homework around PCR, Gibson cloning, transformation, Golden Gate Assembly, Benchling modeling, and Asimov Kernel exercises. Since I did not have access to a physical lab, I did not perform the wet-lab workflow, but my Week 6 Homework Documentation covers the main principles behind the lab assignment.
Week 7 Lab: Neuromorphic Circuits
For this lab, the physical wet-lab component was not something I had access to as a CL. Kindly check my Week 7 Homework Documentation.
Week 9 Lab: Cell Free Systems
For Week 9, I completed the required CL homework components, including the general cell-free systems questions, lecturer-specific questions, and final project planning. Kindly check my Week 9 Homework page.
Week 10 Lab: Mass Spectrometry
For Week 10, I completed the CL homework requirements based on the provided lab screenshots/data as allowed in the homework instructions. Kindly check my Week 10 Homework Documentation.
Week 11 Lab: Cloud Laboratories Homework & Lab
As a CL without access to a physical lab, I completed the Week 11 cloud lab assignment through the design and documentation components. I contributed to the collective artwork, described the cell-free reaction components, compared the master mix strategies, and submitted reaction designs for the global experiment, focusing on mScarlet-I and sfGFP readouts with HEPES-KOH, tyrosine, and magnesium glutamate. Kindly check my documentation for Week 11 Homework.
Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene
Post Lab Questions (Mandatory for All Students)

Which genes when transferred into E. coli will induce the production of lycopene and beta-carotene, respectively? According to the lab instructions, lycopene production in E. coli is induced by transferring the three genes from Erwinia herbicola: crtE, crtI, and crtB. These genes convert FPP into lycopene. Beta-carotene production uses the same pathway with the addition of crtY, which enables conversion toward beta-carotene.

Week 13 Lab: Final Project Labwork
No Lab Assignment this week.
Week 14 Lab: Final Project Labwork
No Lab Assignment this week.

Week 1 Lab: Pipetting

Committed Listeners: Not Applicable.

Week 2 Lab: DNA Gel Art

For Week 2, the wet-lab component was optional for CLs with lab access, which unfortunately was not my case. I completed and documented the in-silico design and written assignments on my Homework Week 2 page.

Week 3 Lab: Lab Automation

For Week 3, CLs were required to create and document the Opentrons Python script, answer the post-lab questions, and submit final project ideas. I completed the code-based assignment and documentation on my Homework Week 3 page, but I did not have access to run the script physically on an Opentrons robot.

Week 4 Lab: Protein Design Part 1

Lab work this week is contained within my Homework Week 4 page.

Week 5 Lab: Protein Design Part 2

Lab work this week is contained within my Homework Week 5 page.

Week 6 Lab: Gibson Assembly

For this week, I completed what was expected from the CL side: the conceptual and design-oriented homework around PCR, Gibson cloning, transformation, Golden Gate Assembly, Benchling modeling, and Asimov Kernel exercises. Since I did not have access to a physical lab, I did not perform the wet-lab workflow, but my Week 6 Homework Documentation covers the main principles behind the lab assignment.

Week 7 Lab: Neuromorphic Circuits

For this lab, the physical wet-lab component was not something I had access to as a CL. Kindly check my Week 7 Homework Documentation.

Week 9 Lab: Cell Free Systems

For Week 9, I completed the required CL homework components, including the general cell-free systems questions, lecturer-specific questions, and final project planning. Kindly check my Week 9 Homework page.

Week 10 Lab: Mass Spectrometry

For Week 10, I completed the CL homework requirements based on the provided lab screenshots/data as allowed in the homework instructions. Kindly check my Week 10 Homework Documentation.

Week 11 Lab: Cloud Laboratories Homework & Lab

As a CL without access to a physical lab, I completed the Week 11 cloud lab assignment through the design and documentation components. I contributed to the collective artwork, described the cell-free reaction components, compared the master mix strategies, and submitted reaction designs for the global experiment, focusing on mScarlet-I and sfGFP readouts with HEPES-KOH, tyrosine, and magnesium glutamate. Kindly check my documentation for Week 11 Homework.

Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene

Post Lab Questions (Mandatory for All Students)

1) Which genes when transferred into E. coli will induce the production of lycopene and beta-carotene, respectively?

According to the lab instructions, lycopene production in E. coli is induced by transferring the three genes from Erwinia herbicola: crtE, crtI, and crtB. These genes convert FPP into lycopene. Beta-carotene production uses the same pathway with the addition of crtY, which enables conversion toward beta-carotene.

2) Why do the plasmids that are transferred into the E. coli need to contain an antibiotic resistance gene?

The antibiotic resistance gene allows selection for E. coli cells that successfully received the plasmid. Only transformed cells can grow on antibiotic-containing media.

3) What outcomes might we expect to see when we vary the media, presence of fructose, and temperature conditions of the overnight cultures?

Different media composition and temperatures can affect both cell growth and pigment production. Richer media may increase biomass, fructose may improve lycopene production by changing carbon metabolism, and lower temperature may reduce stress or improve pathway performance, while 37°C may favor faster growth. Based on the lab framing, fructose is being tested because it may improve biomass yield and recombinant gene expression in E. coli. If it improves carbon flux or reduces metabolic stress, pigment production per culture may increase. However, the final result would need to be normalized by OD600 to distinguish higher pigment production from simply higher cell growth.

4) Generally describe what “OD600” measures and how it can be interpreted in this experiment.

OD600 measures how much light at 600 nm is scattered by a bacterial culture. As the number of cells increases, the culture becomes more turbid, meaning it scatters more light and gives a higher OD600 value. The 600 nm wavelength is commonly used because it estimates cell density without strongly overlapping with many biological pigments or media components. In this experiment, OD600 helps estimate how much bacterial growth occurred under each condition. This is important because pigment absorbance alone could be misleading: a darker sample might have more pigment simply because it has more cells. By normalizing pigment absorbance by OD600, we can compare carotenoid production per amount of bacterial growth.

5) What are other experimental setups where we may be able to use acetone to separate cellular matter from a compound we intend to measure?

Acetone can be used in experiments where we want to separate an organic-soluble compound from the rest of the cell material. In this lab, it helps extract carotenoid pigments from bacterial pellets while leaving much of the cellular debris behind. Similar setups could include extracting chlorophyll or carotenoids from algae and plant tissue, recovering hydrophobic metabolites from microbial cultures, or preparing pigment extracts before absorbance measurements. It could also be useful as a cleanup step, because acetone can precipitate proteins and help remove cell debris before analyzing small molecules by absorbance, fluorescence, or chromatography.

6) Why might we want to engineer E. coli to produce lycopene and beta-carotene pigments when Erwinia herbicola naturally produces them?

Even though Erwinia herbicola naturally produces these pigments, E. coli is a better model organism and engineering chassis. It is easier to grow, transform, measure, and genetically manipulate, with well-characterized plasmids, promoters, selection markers, and growth conditions. This makes it more useful for rapid prototyping, pathway optimization, and controlled bioproduction experiments. Engineering E. coli also lets us isolate and test the carotenoid pathway in a standardized host, instead of working with the natural producer where regulation and metabolism may be more difficult to control.
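The OD600 normalization described in question 4 can be sketched as follows. The sample names and absorbance values are hypothetical placeholders chosen only to illustrate the point, not measured data from the lab.

```python
def normalized_pigment(pigment_absorbance, od600):
    """Pigment absorbance per unit of cell density (OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive to normalize")
    return pigment_absorbance / od600


# Hypothetical readings: (pigment absorbance, OD600)
samples = {
    "condition_A": (0.60, 2.0),  # darker extract, but many more cells
    "condition_B": (0.45, 1.0),  # lighter extract, fewer cells
}

for name, (pigment, od) in samples.items():
    print(f"{name}: raw = {pigment:.2f}, per OD600 = {normalized_pigment(pigment, od):.2f}")
```

In this made-up example, condition A looks better on raw absorbance, while condition B produces more pigment per amount of growth, which is the distinction the normalization is meant to reveal.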

Post Lab Questions (For Committed Listeners)

1.1) What are the enzymes of the carotene pathway?

Enzyme	Gene	Role
GGPP synthase	crtE	Converts FPP into geranylgeranyl diphosphate, GGPP
Phytoene synthase	crtB	Condenses GGPP molecules to form phytoene
Phytoene desaturase	crtI	Converts phytoene into lycopene
Lycopene cyclase	crtY	Converts lycopene into beta-carotene
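The table above can also be read as an ordered pathway. The sketch below encodes the four steps and looks up which genes are needed to reach a given product starting from FPP; the data structure and function name are my own illustration, not part of the lab material.

```python
# Carotenoid pathway steps as (gene, substrate, product), in pathway order.
PATHWAY = [
    ("crtE", "FPP", "GGPP"),
    ("crtB", "GGPP", "phytoene"),
    ("crtI", "phytoene", "lycopene"),
    ("crtY", "lycopene", "beta-carotene"),
]


def genes_required(product):
    """Return the genes needed to reach `product` starting from FPP."""
    genes = []
    for gene, _substrate, step_product in PATHWAY:
        genes.append(gene)
        if step_product == product:
            return genes
    raise ValueError(f"{product!r} is not a product in this pathway")


print("lycopene:", genes_required("lycopene"))
print("beta-carotene:", genes_required("beta-carotene"))
```

This matches the lab answer: lycopene needs crtE, crtB, and crtI, and beta-carotene needs the same three genes plus crtY.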

1.2) Within this pathway, which is the rate determining step (the step that takes the longest)? Which enzyme is responsible for this step?

Within the carotenoid pathway, my hypothesis is that the likely rate-determining step is the conversion of phytoene into lycopene, catalyzed by CrtI, the phytoene desaturase.

The reason is that crtE and crtB first build the upstream carotenoid intermediate: CrtE helps produce GGPP, and CrtB converts GGPP into phytoene. Then CrtI carries out the desaturation steps that convert phytoene into lycopene. Since this step involves multiple oxidation/desaturation reactions, I would expect it to be slower and more limiting than the upstream condensation steps.

The literature supports this hypothesis, but also shows that CrtI is probably not the only bottleneck. Du et al. 2016 confirm that E. coli requires crtE, crtB, and crtI to produce lycopene, and they show that fructose strongly improves lycopene production by changing central metabolism, especially pathways linked to precursor, cofactor, and energy supply. So I would identify CrtI/crtI as the most likely pathway-level enzymatic bottleneck, while recognizing that whole-cell lycopene production also depends on upstream metabolic supply. This is also consistent with Aristidou, San and Bennett 2008, who show that fructose can reduce acetate overflow and improve biomass/recombinant expression in E. coli, suggesting that fructose supports a more favorable metabolic state for bioproduction than glucose under these conditions.

2) Notes for design of a DNA construct for bioproduction

2.1) The first thing to do is to decide what organism you are going to use for production (E. coli or S. cerevisiae). Which would you choose and why (emphasis on production differences)?

I would choose E. coli. S. cerevisiae could be useful for more complex eukaryotic engineering or when compartmentalization and eukaryotic metabolism are advantageous, but for fast carotenoid pathway testing, I think E. coli is the more practical chassis.

Criterion	E. coli	S. cerevisiae
Growth speed	Very fast growth, useful for rapid testing	Slower growth compared to E. coli
Genetic engineering	Easy plasmid transformation and many standardized tools	Strong engineering tools, but usually more complex
Pathway prototyping	Well suited for quick testing of pathway designs	Better for longer-term strain engineering
Production context	Directly supported by the lab setup using pAC-LYC and pAC-BETA plasmids	Would require a different design strategy, usually genome integration
Metabolism	Good bacterial chassis for recombinant pathway expression	Useful when eukaryotic metabolism or compartmentalization matters
Literature support	The referenced papers directly use E. coli for fructose-based recombinant expression and lycopene production	Not the system tested in these papers

2.2) Now choose one of the enzymes and let’s outline the parts of the construct for expression

I would choose the phytoene desaturase, encoded by crtI, because it catalyzes the conversion of phytoene into lycopene and may be one of the key pathway-level bottlenecks in lycopene production.

Construct part	Example / choice	Function
Promoter	Tunable inducible promoter, such as pBAD or lac-based promoter	Controls when and how strongly crtI is transcribed
Operator	Depends on promoter system	Allows regulation by an inducer or repressor
RBS	Bacterial ribosome binding site	Controls translation initiation and affects CrtI protein level
Coding sequence	crtI	Encodes phytoene desaturase, the enzyme that converts phytoene into lycopene
Terminator	Strong bacterial terminator	Stops transcription and prevents read-through
Origin of replication	Medium-copy origin	Allows plasmid replication while limiting metabolic burden
Antibiotic resistance marker	Chloramphenicol or another selectable marker	Allows selection of cells carrying the plasmid

A minimal plasmid design would be: Origin of replication, antibiotic resistance marker, promoter, operator, RBS, crtI, terminator.

If the goal were only to test crtI expression, this construct would be enough. But if the goal is full lycopene production, crtI would need to be expressed together with the upstream pathway genes crtE and crtB, because E. coli requires crtE, crtB, and crtI to synthesize lycopene. For beta-carotene production, crtY would also be included.

2.3.i.1.a.i) What is the function of a promoter? The promoter is the DNA region that initiates transcription of the gene of interest. It controls RNA polymerase binding and therefore strongly affects when, where, and how much of the target enzyme is produced. In bacteria, promoter recognition depends on bacterial RNA polymerase and sigma factors, so the promoter must be compatible with a prokaryotic host like E. coli. Source: Educational Resources > Molecular Biology Reference > Promoters.

2.3.i.1.a.ii) What types of promoters do we have? Promoters can be grouped by their expression behavior. Constitutive promoters are active continuously, inducible promoters are turned on or increased by a signal such as IPTG, lactose, arabinose, heat, or light, and repressible promoters are turned off or reduced in response to a signal or metabolite. Common bacterial promoter examples are included in the table below.

Promoter type	Description	Mechanism	Examples from Addgene
Constitutive	Active by default / continuously drives expression	RNA polymerase can initiate transcription without needing a specific induction signal	T7, Sp6. Note: T7 requires T7 RNA polymerase
Inducible	Expression increases or turns on after a signal/inducer	Either removes repression or activates transcription	lac: IPTG/lactose removes LacI repression; araBAD: arabinose activates AraC-dependent transcription
Repressible	Expression decreases or turns off in response to a signal/metabolite	A metabolite or co-repressor enables repression of transcription	trp promoter is repressed by tryptophan

2.3.i.1.a.iii) If we wanted to turn off the transcription of a gene in response to a metabolite, what type of promoter would be most useful? What if we wanted this to increase in the presence of the metabolite? To turn transcription off in response to a metabolite, I would use a repressible promoter, such as the trp promoter, where high tryptophan represses transcription. To increase transcription in response to a metabolite, I would use an inducible promoter, such as lac/IPTG or araBAD/arabinose, where the inducer activates expression or removes repression.

2.3.i.1.a.iv) Now choose one of the genes of the metabolic pathway previously described (carotene/lycopene) and choose one enzyme to make an expression construct. What promoter could you use for this? Why did you choose it? I would choose crtI, which encodes phytoene desaturase, the enzyme that converts phytoene into lycopene. I chose this gene because this step is a good candidate for pathway-level control: if CrtI expression is too low, phytoene may accumulate and lycopene output may remain limited.

For the promoter, I would use a tunable inducible bacterial promoter, such as pBAD/araBAD or lac/IPTG. I would prefer pBAD/araBAD for an initial design because arabinose-inducible expression allows controlled activation of the gene. The reason I would not immediately use a strong constitutive promoter is that carotenoid production can create metabolic burden. The goal is not simply to express crtI as strongly as possible, but to tune expression and find the level that improves lycopene production without compromising cell growth.

Therefore, a minimal expression cassette would be: pBAD promoter, RBS, crtI, terminator.

In the full plasmid context: Origin of replication, antibiotic resistance marker, pBAD promoter, RBS, crtI, terminator.
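The part order described above can be written down as a small data structure. This is only an illustrative Python sketch: the names `BACKBONE`, `CASSETTE`, `full_plasmid`, and `cassette_is_well_ordered` are hypothetical helpers, not part of Benchling or any cloning tool, and the check only verifies the relative order of parts, not any sequence-level feature.

```python
# Ordered parts as listed in the text. All names here are illustrative labels.
BACKBONE = ["origin of replication", "antibiotic resistance marker"]
CASSETTE = ["pBAD promoter", "RBS", "crtI", "terminator"]


def full_plasmid(backbone, cassette):
    """Return the full part list: backbone elements followed by the cassette."""
    return list(backbone) + list(cassette)


def cassette_is_well_ordered(parts):
    """Check that promoter, RBS, CDS, and terminator all appear, in that order."""
    required = ["pBAD promoter", "RBS", "crtI", "terminator"]
    if any(part not in parts for part in required):
        return False
    positions = [parts.index(part) for part in required]
    return positions == sorted(positions)


plasmid = full_plasmid(BACKBONE, CASSETTE)
print(" -> ".join(plasmid))
print("Cassette order valid:", cassette_is_well_ordered(plasmid))
```

Keeping the cassette as a separate list makes it easy to swap a single part, for example replacing the promoter entry when comparing pBAD with a lac-based design.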

3.1.i What is the origin of replication? The origin of replication, or ori, is the DNA sequence where plasmid replication begins. It allows the plasmid to copy itself inside the host cell and be maintained over generations. Together with its control elements, the ori is part of the plasmid replicon. Source: Addgene’s article “Plasmids 101: Origin of Replication” available here.

3.1.ii What types of origin of replication do we have? Origins of replication differ by copy number, replication control, compatibility group, and host requirements. Copy number affects gene dosage and burden; replication control affects how tightly plasmid replication is regulated; compatibility group matters when using more than one plasmid; and host requirements determine whether the plasmid can replicate in a given strain. Here are some examples from Addgene’s article “Plasmids 101: Origin of Replication” available here.

Origin / replicon	Approx. copy number	Replication control	Compatibility group	Host/use note
pUC / pMB1 derivative	~500-700	Relaxed	A	High-copy E. coli plasmids; useful for DNA yield, but can create burden
pBR322 / pMB1	~15-20	Relaxed	A	Medium-copy E. coli plasmids; more balanced expression
ColE1	~15-20	Relaxed	A	Common E. coli cloning origin
p15A / pACYC	~10	Relaxed	B	Lower-copy origin; compatible with ColE1/pMB1 plasmids
pSC101	~5	Stringent	C	Low-copy origin; useful when stability/low burden matters
R6K	~15-20	Stringent	C	Requires pir gene for replication
CloDF13 / pCDF	~20-40	Relaxed	D	Medium-copy origin, useful in multi-plasmid systems

3.1.iii (Extra) What are compatibility groups? Compatibility groups describe whether two plasmids can be stably maintained in the same bacterial cell. Plasmids with the same or very similar replication/partitioning systems are usually incompatible because they compete for the same replication control machinery. Over time, one plasmid may be lost. This matters if we want to use more than one plasmid in the same E. coli strain: they should have compatible origins, meaning different incompatibility groups. For example, pMB1/ColE1-derived plasmids such as pUC, pBR322, pET, and pGEX are all in compatibility group A, so they should generally not be combined in the same cell. A p15A/pACYC plasmid, group B, could be combined with a ColE1/pMB1 plasmid more safely.
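The compatibility rule can be expressed as a simple lookup. The sketch below is a minimal Python illustration that copies the group letters from the origin table above; the dictionary keys and the function name `are_compatible` are hypothetical, and real plasmid pairs should still be checked against their documentation.

```python
# Compatibility groups taken from the origin-of-replication table above.
COMPATIBILITY_GROUP = {
    "pUC": "A",
    "pBR322": "A",
    "ColE1": "A",
    "p15A": "B",
    "pSC101": "C",
    "R6K": "C",
    "CloDF13": "D",
}


def are_compatible(ori_a, ori_b):
    """Two origins are treated as compatible only if their groups differ."""
    return COMPATIBILITY_GROUP[ori_a] != COMPATIBILITY_GROUP[ori_b]


print(are_compatible("p15A", "ColE1"))   # different groups (B vs A)
print(are_compatible("pUC", "pBR322"))   # same group (A vs A)
```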

3.1.iv Now, for the previously chosen promoter and gene, what will be the best origin of replication? For a crtI expression plasmid in E. coli, I would choose a medium-copy origin rather than a very high-copy pUC-type origin. This should provide enough CrtI expression while reducing metabolic burden. If I combine this plasmid with another carotenoid-pathway plasmid, I would choose compatible origins, for example p15A with ColE1/pMB1-derived origins.

4. Elaborate further on other bioparts like RBS, terminators, operators you would use for a correct design and further bioproduction?

Element	Example for this construct	Function	Why it matters for bioproduction
Origin of replication (ori)	Medium-copy bacterial ori, such as pBR322/pMB1-derived ori or p15A	Allows the plasmid to replicate in E. coli	Controls plasmid copy number, affecting gene dosage, expression level, stability, and metabolic burden
Antibiotic resistance marker	Chloramphenicol, ampicillin, or kanamycin resistance	Allows selection of cells carrying the plasmid	Ensures that the production strain maintains the construct
Promoter / regulatory region	pBAD/araBAD or lac/IPTG-based promoter	Initiates transcription and, if regulated, controls when expression turns on/off	Lets me tune crtI expression instead of forcing constant maximum production
Operator / response element	araBAD/AraC or lacO/LacI regulatory sites, if using a regulated promoter	Binding site for regulatory proteins	Enables inducible or repressible control. This is part of the promoter/regulatory region rather than always a separate independent part
RBS - Ribosome Binding Site	Bacterial RBS upstream of crtI	Recruits the ribosome to initiate translation	Controls how much CrtI protein is made from the mRNA
Coding sequence	crtI	Encodes CrtI / phytoene desaturase	Produces the enzyme that converts phytoene into lycopene
Terminator	Strong bacterial transcription terminator	Stops transcription after the coding sequence	Prevents read-through into other plasmid regions and improves construct stability
Assembly junctions / cloning sites	Gibson overlaps or Golden Gate overhangs	Enable construction of the plasmid	Allow modular assembly and later swapping of promoters, RBSs, or pathway genes
Optional insulators / spacers	Neutral spacer sequences between parts	Reduce unwanted context effects between genetic parts	Can make expression more predictable
Optional reporter/control	GFP in a test cassette, or pigment output itself	Helps verify that expression is working	Useful for debugging promoter/RBS behavior before optimizing the full carotenoid pathway

I did not complete questions 5, 6, 7, and 8, as they were marked as extra-point questions. For this submission, I prioritized the mandatory All Students and Committed Listener sections.

Week 13 Lab: Final Project Labwork

No Lab Assignment this week.

Week 14 Lab: Final Project Labwork

No Lab Assignment this week.

Projects

Final projects:

Individual Final Project
Melanin-based light-recording bioink/biomaterial: Designing a MelC2-Based Cell-Free Module for Programmable Melanin Bioink. Reframing pigmentation from static dyeing to a programmable chemical state evolution, enabling materials that encode environmental history.
Group Final Project
Bacteriophage Engineering GROUP MEMBERS: Diogo Custodio; Flo Razoux; Katharine Kolin; Mariana Kanbe; Marisa Satsia. PROJECT MAIN GOAL: Increase the stability of the L protein. GROUP PROPOSAL: We will use the same workflow as in the previous HW (e.g. mutagenesis) but adapt it to specific aim(s) based on the HW reading material of week 04 (e.g. shorten the L protein so that it is no longer dependent on the bacterial chaperone DnaJ).
Brainstorms
Melanin-based bioink for Light-Recording Materials My individual final project is based on melanin and related compounds in an engineered living material (ELM) as a color-responsive bio-ink. Among many other factors, oxidation state, precursor availability / intermediate reaction pathways likely shape tone and long-term stability and may be modulated using a genetic system, be it a bacterium, a synthetic minimal cell, etc.

Individual Final Project

Melanin-based light-recording bioink/biomaterial

Designing a MelC2-Based Cell-Free Module for Programmable Melanin Bioink

Reframing pigmentation from static dyeing to a programmable chemical state evolution, enabling materials that encode environmental history

Important links:

Resource	Link
Final presentation slides	CL Final Project Slide Deck
Final pTwist_MelC2_T7_TXTL_6xHis construct	Benchling
Twist Order for my Final Project: MelC2_T7_TXTL_6xHis_expression_cassette	Benchling and Twist (Nodes) Document
Cell-free master mix plan - 8 planned reactions	My Week 11 HW Documentation

SECTION 1 - ABSTRACT

Melanin is a chemically heterogeneous dark biopolymer known for broadband UV-visible optical absorption, photoprotective behavior, photothermal conversion, redox activity, and long-term optical stability. These properties make melanin a compelling biological route to functional color: a pigment chemistry that can absorb and dissipate radiation, preserve optical traces, buffer oxidative stress, and interface with biological or electronic systems. This project proposes controlling melanin-forming chemistry in a synthetic biology system to develop a programmable bioink for engineered biomaterials. The broader vision is to create materials that combine biosensing and functional response: recording environmental inputs such as light, ionizing radiation, or oxidative stress through measurable optical change, while also enabling properties such as UV or radiation protection, photothermal conversion, antioxidant behavior, and bioelectronic interfacing. Depending on concentration, matrix composition, and material format, this melanin-based bioink could be explored for responsive textiles, UV-protective coatings, architectural and design surfaces, tattoo-like dermal pigments, space-oriented materials, bioelectronic interfaces, and localized radioprotective biomaterials.

To move toward this goal, this project aims to design a first genetic module that generates measurable melanin-like optical changes in a controlled cell-free system, then use it as a foundation for future integration into engineered biomaterials such as bacterial cellulose. The central hypothesis is that a codon-optimized Streptomyces antibioticus MelC2 tyrosinase construct can provide a tractable route toward cell-free melanin-like pigment formation, with output shaped by tyrosinase activity, substrate availability, copper cofactor loading, oxygen, pH, redox state, and polymerization chemistry. During HTGAA 2026, I designed a MelC2 expression cassette for TX-TL / E. coli use and designed a validation workflow.

SECTION 2: PROJECT AIMS

Aim 1: Experimental Aim

Build and validate a first MelC2-based cell-free melanin module

The first aim of this project is to design a codon-optimized Streptomyces antibioticus MelC2 tyrosinase expression cassette for TX-TL / E. coli use and test whether it can generate measurable melanin-like optical changes in a controlled cell-free system. This aim uses DNA design, Benchling assembly, Twist synthesis, fluorescent protein controls, visible darkening, OD 400-500 nm absorbance, SDS-PAGE, and future LC-MS analysis to distinguish protein expression, enzymatic activity, pigment accumulation, and downstream oxidation chemistry.

Aim 2: Development Aim

Optimize the chemical and optical behavior of the melanin-forming system

After validating the first module, the next aim is to optimize the reaction conditions that shape pigment output, including L-tyrosine concentration, copper availability, pH buffering, oxygen exposure, magnesium, incubation time, and reporter choice. This aim will help determine whether the system can be tuned for stronger pigment formation, cleaner optical readouts, and more predictable color response before integration into a material matrix.

Aim 3: Visionary Aim

Develop programmable melanin bioinks for exposure-recording and functional biomaterials

The long-term aim is to integrate the optimized melanin-forming module into bacterial cellulose or other biomaterials to create bio-based surfaces that can both record environmental exposure and respond functionally. If successful, this could support responsive textiles, UV-protective coatings, design surfaces, tattoo-like dermal pigments, bioelectronic interfaces, space-oriented materials, and localized radioprotective biomaterials.

SECTION 3: BACKGROUND

3.1. Peer-reviewed research citations

Melanin is relevant to this project because its material properties extend beyond visible pigmentation. Menichetti et al. 2025 describe melanin photoprotection as a combination of broadband light extinction and antioxidant activity, supporting the idea that melanin-based materials could pair optical response with protection against light-induced damage. Dadachova and Casadevall 2009 further show that melanin changes how biological systems interact with ionizing radiation, with melanized fungi displaying radioprotective behavior and altered electronic properties under radiation exposure. Together, these studies support the central premise of this project: melanin can be treated not only as a pigment, but as a functional material chemistry for exposure-responsive systems.

This material potential has already been explored in several application directions relevant to the proposed bioink. In space-oriented materials, Cordero et al. 2025 showed that fungal melanin-polymer biocomposites exposed to low Earth orbit conditions had improved structural stability and radiation-shielding potential. In photothermal and bioelectronic materials, Yue and Zhao 2021 review how melanin-like materials can convert absorbed optical energy into heat and support sensor or interface applications through redox activity and mixed ionic/electronic behavior. At the skin interface, Park et al. 2024 developed electroactive melanin tattoo inks using naturally derived melanin nanoparticles to reduce skin impedance, suggesting that melanin-based pigments may be useful for dermal bioelectronic interfaces as well as coloration.

The bioink and textile direction also has direct precedent. Walker et al. 2024 engineered cellulose-producing Komagataeibacter rhaeticus to express tyrosinase and grow self-pigmenting bacterial cellulose through melanin biosynthesis, showing that genetically encoded pigmentation can be integrated into a material-producing microbial platform. Ahn et al. 2021 produced melanin-like pigments microbially from caffeic acid and applied the pigment to cotton fabric dyeing, supporting the relevance of microbial melanin as a textile-compatible colorant.

These studies connect directly to this project’s direction, but also clarify its specific contribution: instead of starting with a finished textile, this work first builds a controlled MelC2-based cell-free module to make melanin-like optical output measurable, tunable, and chemically interpretable before later integration into bacterial cellulose or other biomaterial matrices.

3.2. Novelty and innovation

This project is innovative because it uses existing biological tools in a new material context: a MelC2 tyrosinase module is designed not only to produce pigment, but to generate a measurable and tunable optical output. The cell-free system makes this approach modular, allowing key variables such as copper loading, substrate availability, pH, oxygen, redox state, and polymerization conditions to be tested before introducing the system into more complex biomaterial matrices. This creates a controlled bridge between genetic design and material performance.

The project also challenges a common assumption in functional materials: that color, sensing, protection, and responsiveness must be added as separate components. Instead, it asks whether melanin-forming chemistry can be programmed as a single multifunctional layer that records exposure and produces useful material responses. In doing so, the project expands synthetic biology from making biological products toward engineering bio-based materials whose behavior can be designed, measured, and tuned.

3.3. Why the project matters and potential impact

The main ethical issue is not melanin itself, but the form in which the system is built and deployed. A melanin-based material can remain a controlled chemical module, become a non-replicating embedded system, or become part of a living material platform. Each design choice carries a different ethical burden, so the project should progress from the lowest-risk and most interpretable system toward more complex formats only after validation.

Design choice	Role in the project	Ethical implication
Cell-free MelC2 module	First experimental platform for testing pigment chemistry	Lowest deployment risk; controlled, non-replicating, and easiest to interpret
Non-replicating synthetic minimal cells	Possible future format for localized sensing or pigment production inside a material	Safer than living cells, but requires proof that encapsulation, stability, and output control work
Living bacterial cellulose platform	Possible future scaffold for material production and integration	Most powerful material format, but requires stronger containment, characterization, and environmental controls

For this reason, the current project takes the cell-free route as an ethical and technical starting point. It validates the core chemistry - MelC2 expression, copper loading, substrate availability, pH, oxygen, and pigment formation - before adding living-system or material-scale complexity. This avoids treating a speculative material concept as a deployable product too early.

Ethical principle	What it means here	Project response
Responsibility	Color change could be mistaken for a calibrated exposure sensor	Define whether the output is aesthetic color, qualitative exposure record, or quantitative biosensor
Non-maleficence	Protective claims could create false confidence if the material is not tested under real exposure conditions	Do not claim UV protection, radioprotection, dermal use, or biomedical function before direct validation
Beneficence	The project could reduce material complexity while adding useful functions	Prioritize applications where melanin adds clear value: exposure recording, photoprotection, photothermal response, or oxidative buffering
Biosafety / containment	Future versions may involve living or semi-living systems	Start cell-free; prefer non-replicating or purified systems before deployable living materials

The practical ethical strategy is staged development: first validate pigment chemistry, then test material integration, then evaluate sensing or protective performance under relevant conditions. The main risks are overclaiming protection, treating color change as quantitative sensing too early, or moving into dermal / biomedical contexts before the material is characterized. The project could also be wrong if melanin pigmentation does not correlate reliably with exposure, if pigment chemistry is too variable to control, or if a simpler non-biological sensor performs better. Alternatives such as purified enzymes, synthetic melanin-like polymers, or conventional exposure sensors should remain available if they prove safer or more reliable.

SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

4.1 Experimental plan and timeline in 15 steps

The experimental plan follows a build-test-learn structure. First, the MelC2 construct is selected, designed, ordered, and validated in silico. Next, the cell-free TX-TL system is tested using fluorescent protein controls. Then, pigment formation, protein expression, and pathway chemistry are validated separately. After the course, the project can move toward light-controlled expression modeling and material integration for a melanin-based light-recording bio-ink.

Step	Timeline	Method / tool	Purpose	Expected result
1. Select melanin-forming enzyme	Completed - WT sequence in Benchling	UniProt, literature review, iGEM Registry	Identify a tyrosinase suitable for melanin-like pigment formation	Selection of Streptomyces antibioticus MelC2 as a soluble, oxygen- and copper-dependent tyrosinase
2. Design DNA construct	Completed - codon-optimized construct in Benchling	Codon optimization, Benchling	Build an expression cassette for TX-TL / E. coli use	T7 promoter, RBS, spacer, codon-optimized melC2 CDS, C-terminal 6xHis tag, stop codon, and terminator. Final annotated MelC2 tyrosinase CDS is available in the same Benchling record.
3. Verify protein identity	Completed	BLASTP, conserved-domain analysis	Confirm that the optimized sequence still encodes a canonical tyrosinase	Top hits remain MelC2 / tyrosinase-like proteins with a conserved tyrosinase domain
4. Prepare synthesis order	Completed - Twist order construct in Benchling	Twist Bioscience	Generate a synthesis-ready construct	MelC2 cassette submitted in pTwist Amp High Copy using the Benchling interface
5. Validate TX-TL expression capacity and screen initial reaction variables	8 first reactions planned here; execution estimated 2-4 days	Cell-free reactions with sfGFP and mScarlet-I fluorescent protein controls	Confirm that the cell-free system supports protein expression and establish baseline reaction conditions	Strong sfGFP or mScarlet-I signal would indicate that the TX-TL system is functional. The planned reaction matrix compares substrate, copper, pH buffering, magnesium, incubation time, and reporter choice.
6. Measure pigment kinetics	Planned; estimated 1-3 days	OD 400-500 nm absorbance	Quantify melanin-like pigment accumulation over time	Increasing absorbance would support pigment formation kinetics
7. Confirm MelC2 protein expression	Planned; estimated 2-3 days	SDS-PAGE / His-tag detection	Distinguish protein expression from pigment formation	A MelC2-sized band would support successful expression, even if pigment output is weak
8. Analyze pathway chemistry	Planned; estimated 1-2 weeks	LC-MS	Track L-tyrosine depletion and L-DOPA / quinone-related intermediates	Confirms enzymatic activity even when visual pigment output is ambiguous
9. Model future light control	Post-course; estimated 1-2 weeks for first modeling round	Asimov Kernel	Model a light-activated expression circuit that could later support gradual tonal change in a material system	Candidate circuit logic for controlling melanin expression in response to light exposure
10. Refine light-control model toward aesthetic and functional goals	Post-course; iterative	Asimov Kernel, previous Asimov learning documentation, circuit-design review	Refine the light-activated model until the system becomes more controlled, predictable, and aligned with the intended visual behavior	Improved model for gradual tonal control, or a clearer experimental design if molecular control is not yet sufficient
11. Decision point: molecular control vs material-scale testing	Post-course	Design review, experimental planning	Decide whether to push for maximum molecular control first or move earlier into material-scale experiments	A clearer development path: either optimize the genetic/cell-free control layer first, or begin testing material integration earlier
12. Test material integration	Post-course; estimated 2-6 weeks	Bacterial cellulose, Komagataeibacter rhaeticus, hybrid biomaterial scaffolds, embedded cell-free modules, synthetic minimal cells	Move from biochemical reaction module to material prototype	Spatially localized and stable optical change in a material scaffold
13. Compare material-system architectures	Post-course; iterative	Engineered living material design, bacterial cellulose scaffold testing, hybrid BC / cell-free module systems	Compare different ways of integrating the melanin-producing module into a functional material	Identification of the most promising integration model: engineered K. rhaeticus, bacterial cellulose scaffold with embedded cell-free modules, or a hybrid system
14. Optimize workflow and material parameters	Post-course; iterative	Build-test-learn optimization, dilution tests, reaction-variable screening, economic modeling	Optimize biological, optical, material, and cost parameters across the proposed workflow	More predictable pigment output, better material compatibility, and clearer constraints for scaling or prototyping
15. Stage validation toward final bio-ink prototype	Post-course; long-term	Expression checks, pigment kinetics, protein validation, LC-MS, optical imaging, material-performance testing	Move from simple expression and pigment checks to more refined mechanistic and material-level validation	A staged validation framework for developing a melanin-based light-recording bio-ink

After the construct is designed, validation proceeds from simple functional checks to more specific analytical readouts. Each step is meant to isolate a different possible failure point: TX-TL expression capacity, MelC2 protein production, pigment accumulation, or pathway-level chemistry.
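For the pigment kinetics readout in step 6, one simple way to summarize an OD 400-500 nm time course is the least-squares slope of absorbance over time. The Python sketch below is only an illustration under stated assumptions: the readings are made-up numbers, not measured data, and the no-DNA blank reaction is an assumed control that is not specified in the plan above.

```python
def absorbance_slope(times_h, absorbances):
    """Least-squares slope of absorbance versus time (absorbance units per hour)."""
    n = len(times_h)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired time/absorbance readings")
    mean_t = sum(times_h) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_h, absorbances))
    return sxy / sxx


# Hypothetical, illustrative readings only (NOT measured data).
times_h = [0, 1, 2, 3, 4]
melc2_reaction = [0.05, 0.08, 0.12, 0.17, 0.21]
no_dna_blank = [0.05, 0.05, 0.06, 0.05, 0.06]  # assumed control reaction

# Subtracting the blank slope removes drift that is not linked to MelC2.
net_slope = absorbance_slope(times_h, melc2_reaction) - absorbance_slope(times_h, no_dna_blank)
print(f"Net absorbance increase: {net_slope:.3f} per hour")
```

A net slope clearly above zero would be consistent with pigment accumulation, while a flat net slope would point to the protein expression check in step 7.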

Previous iGEM tyrosinase projects (see references list at the end of this document) showed that tyrosinase expression can be detectable even when pigment formation is weak or absent. For this reason, protein production, enzyme activity, and optical output need to be validated separately rather than treated as a single result.

Post-course, the next conceptual step is to move this melanin-based light-recording bio-ink forward by modeling light-activated melanin expression in Asimov Kernel. This will help clarify whether the project should first push for tighter molecular control, or move earlier into material-scale experimentation. The longer-term goal is to connect the cell-free MelC2 module to a material system capable of controlled, spatially localized, and visually meaningful pigment formation.

The tables below summarize the validation logic for the cell-free MelC2 module. Each readout acts as a checkpoint, moving from general TX-TL expression capacity to visible pigment output, absorbance kinetics, protein expression, and finally chemical validation.

Step	Method	Question answered	Expected result	Decision
1	Fluorescent protein control	Is the TX-TL system functional?	Strong sfGFP or mScarlet-I fluorescence	If weak, debug TX-TL before testing MelC2
2	Reaction photos	Is there visible darkening over time?	Progressive color change in reaction samples	If absent, continue to OD because pigment may be low-level
3	OD 400-500 nm	Is pigment accumulating quantitatively?	Absorbance increases over time	If flat, check MelC2 protein expression
4	SDS-PAGE / His-tag detection	Is MelC2 expressed?	Band near the expected MelC2 size or His-tag signal	If absent, debug construct, expression conditions, or protein stability
5	LC-MS	Is the pathway chemically active?	L-tyrosine depletion and/or detection of L-DOPA / quinone-related intermediates	If intermediates are absent, investigate folding, copper incorporation, pH, oxygen, substrate availability, or sampling time
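The checkpoint table above can be read as an ordered list of pass/fail questions, each with a follow-up decision. The Python sketch below is a hypothetical illustration of that reading: the checkpoint keys and the function `follow_up` are invented names, and the rule that a failed fluorescent control blocks all later readouts is my interpretation of "debug TX-TL before testing MelC2".

```python
# Checkpoints in table order, each paired with the decision taken when it fails.
CHECKPOINTS = [
    ("fluorescent_control", "Debug TX-TL before testing MelC2"),
    ("visible_darkening", "Continue to OD because pigment may be low-level"),
    ("od_increase", "Check MelC2 protein expression"),
    ("melc2_band", "Debug construct, expression conditions, or protein stability"),
    ("lcms_intermediates",
     "Investigate folding, copper incorporation, pH, oxygen, "
     "substrate availability, or sampling time"),
]


def follow_up(results):
    """Return (checkpoint, decision) pairs for failed checkpoints.

    `results` maps checkpoint name to True (passed) or False (failed).
    Evaluation stops at the first checkpoint that has not been run yet.
    """
    actions = []
    for name, decision in CHECKPOINTS:
        if name not in results:
            break  # this readout has not been collected yet
        if not results[name]:
            actions.append((name, decision))
            if name == "fluorescent_control":
                break  # a non-functional TX-TL system blocks later readouts
    return actions


example = {"fluorescent_control": True, "visible_darkening": False, "od_increase": False}
for name, decision in follow_up(example):
    print(f"{name}: {decision}")
```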

Observation	Interpretation
MelC2 detected + OD increase / darkening	Best-case result: protein expression and pigment-forming chemistry are both working
MelC2 detected + no OD increase / darkening	Expression works, but enzyme activity, cofactor availability, substrate availability, oxygen, or downstream pigment chemistry may be limiting
No MelC2 detected + no pigment	Expression or construct-level failure
LC-MS intermediates + weak pigment	Enzyme is active, but pigment polymerization or pigment accumulation is limiting
No LC-MS intermediates + no pigment	MelC2 is inactive, absent, or missing required catalytic conditions

This staged logic keeps the first aim experimentally interpretable. Color change is treated as one readout among several, while protein expression and LC-MS provide the controls needed to distinguish enzyme production, catalytic activity, and downstream pigment chemistry.
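As a minimal sketch, the interpretation table above can be written as a small lookup function. The function name, the three pigment levels ("none", "weak", "clear"), and the choice to let the LC-MS rows take precedence when an LC-MS result is available are all assumptions made for illustration, not part of the planned protocol.

```python
from typing import Optional


def interpret(melc2_detected: bool, pigment: str,
              intermediates: Optional[bool] = None) -> str:
    """Return the table interpretation for one set of readouts.

    melc2_detected: SDS-PAGE / His-tag signal present
    pigment:        "none", "weak", or "clear" (OD increase / darkening)
    intermediates:  LC-MS result, or None if LC-MS was not run
    """
    if pigment not in {"none", "weak", "clear"}:
        raise ValueError(f"unknown pigment level: {pigment!r}")
    # Assumption: LC-MS rows are checked first when LC-MS data exist.
    if intermediates is True and pigment == "weak":
        return "enzyme active; pigment polymerization or accumulation limiting"
    if intermediates is False and pigment == "none":
        return "MelC2 inactive, absent, or missing required catalytic conditions"
    if melc2_detected and pigment == "clear":
        return "best case: expression and pigment-forming chemistry both work"
    if melc2_detected:
        return ("expression works; activity, cofactor, substrate, oxygen, "
                "or downstream pigment chemistry may be limiting")
    if pigment == "none":
        return "expression or construct-level failure"
    return "not covered by the table; re-check controls"
```

For example, `interpret(True, "none")` returns the "expression works" branch, which corresponds to the second row of the table.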

An initial visual workflow diagram for this validation logic, generated using ChatGPT, can be found in my Brainstorms documentation.

4.2 Techniques relevant to this project

The checked techniques reflect the parts of the project that were used or directly planned, from MelC2 construct design and cell-free expression to validation readouts and future automated testing.

Pipetting

Pipetting
Lab Safety
Bioethical Considerations

Pipetting is relevant because the project depends on careful preparation of small-volume cell-free reactions and controlled handling of reagents such as L-tyrosine, CuSO4, buffers, and DNA constructs. Lab safety and bioethical considerations are relevant because they support the project’s staged design: testing melanin-forming chemistry in a contained, non-deployable system before moving toward biomaterial applications.

DNA Gel Art

DNA Sequencing
DNA Editing
DNA Construct Design
Restriction Enzyme Digestion
Gel Electrophoresis
DNA Purification From Gel
Databases, e.g. GenBank, NCBI, Ensembl, and UCSC Genome Browser

DNA Gel Art / construct design was selected because the project required designing and verifying a MelC2 expression cassette. DNA construct design and databases are checked because Benchling, UniProt, BLASTP, and sequence databases were used to select, optimize, and verify the MelC2 construct. DNA sequencing, DNA editing, restriction digestion, gel electrophoresis, and gel purification are unchecked because they were not performed in this stage. However, they may become relevant after synthesis if the construct needs to be sequence-verified, edited, digested, visualized on a gel, or purified before downstream expression tests.

Bioproduction

Bioproduction
Chassis Selection, e.g. TX-TL / E. coli context
Registry of Standard Biological Parts
Plasmid Preparation
Bacterial Culturing
Quality Control / Analysis
Bacterial Processing, e.g. centrifugation, lysis, DNA purification

Bioproduction was selected because the project aims to produce a functional biological output: MelC2-driven melanin-like pigmentation. Chassis selection is checked because the first expression context is TX-TL / E. coli, and the Registry of Standard Biological Parts informed the expression design. Quality control / analysis is checked because the workflow uses fluorescence, OD 400-500 nm, SDS-PAGE, and future LC-MS to validate expression, activity, and pigment output. Plasmid preparation, bacterial culturing, and bacterial processing are unchecked because this stage uses a synthesized construct and cell-free validation rather than live-cell propagation or processing.

Lab Automation

Creating Code for Laboratory Automation
Using Liquid Handling Robots, e.g. Opentrons
Designing a Twist Order
Creating a plan to use the Autonomous Lab at Ginkgo Bioworks

Lab automation was selected because the project includes automated experimental planning for the next validation stage. I checked code for laboratory automation, Twist order design, and Ginkgo Bioworks planning because I prepared the construct for synthesis and began designing a reaction matrix to test variables such as copper, tyrosine, buffering, magnesium, and reporter choice. I left liquid handling robots unchecked because I did not directly operate an Opentrons or similar robot in this stage.

Protein Design

Protein Design
Use of Boltz or PepMLM
Use of Asimov Kernel
Use of Benchling
Models and Notebooks
Databases

Protein design was selected because the project depends on choosing, analyzing, and eventually controlling a melanin-forming enzyme. I checked Benchling, databases, models, and notebooks because they were used to select MelC2, inspect sequence/function, support construct design, and analyze protein behavior. Asimov Kernel is checked because it was explored for future light-responsive control of MelC2 expression. Boltz and PepMLM are unchecked because they were not used in this stage.

Cell-Free Systems

Cell-Free Reactions
Freeze-Dried Cell Free Systems
miniPCR Tools
Protein Purification

Cell-free systems were selected because the first experimental goal is to test MelC2 expression and melanin-like pigment formation in a controlled, non-replicating format. Cell-free reactions and freeze-dried cell-free systems are checked because they are the planned platform for validating the module. miniPCR and protein purification are unchecked because this stage does not require PCR amplification or purified MelC2 protein.

Gibson Assembly

Primer Design or Selection
PCR Reactions
Gibson Assembly
Other Cloning Methods, e.g. Restriction Enzyme Digestion or Gateway Cloning

CRISPR

CRISPR/Cas9
Designing Prime Editing gRNA

Gibson Assembly and CRISPR were left unchecked because the current project does not involve cloning by PCR/Gibson methods or genome editing. The MelC2 module was designed digitally and prepared for synthesis through Twist, so primer design, PCR, Gibson Assembly, restriction-based cloning, CRISPR/Cas9, and prime-editing gRNA design were not part of this stage.

4.3 Two techniques expanded

The two most important techniques are DNA construct design, which creates the module, and cell-free reactions, which test whether the module produces an interpretable optical output.

DNA construct design: DNA construct design is central because Aim 1 depends on building a MelC2-based module that can be tested in TX-TL / E. coli conditions. I used database research to select MelC2, codon optimization to adapt the sequence for expression, and Benchling to assemble the cassette. The C-terminal 6xHis tag supports future protein-level validation. This matters because MelC2 expression must be distinguished from actual melanin-like pigment formation.
Cell-free reactions: Cell-free TX-TL is the cleanest first platform because the main uncertainty is chemical: whether MelC2 expression, copper loading, L-tyrosine availability, pH, oxygen, and downstream oxidation chemistry can generate optical change. Compared with living cells, the system is easier to control and interpret. It also avoids adding material-scaffold complexity too early. The first experiments use fluorescent protein controls, visible darkening, OD 400-500 nm absorbance, SDS-PAGE, and future LC-MS to validate the module step by step.

4.4 Industry Council companies relevant to the project

These companies are relevant because they map onto the main project needs: DNA synthesis, automation, modeling, chemical analysis, reagents, and future biomaterial translation.

Company	Relevance to project
Twist Bioscience	DNA synthesis for the MelC2 expression cassette
Ginkgo Bioworks	Autonomous cell-free reaction testing and experimental automation
Asimov / Kernel	Future modeling of light-responsive genetic control
Waters Corporation	LC-MS analysis of L-tyrosine, L-DOPA, and oxidation intermediates
MilliporeSigma	Reagents such as L-tyrosine, CuSO4, buffers, and analytical standards
Thermo Fisher Scientific	Molecular biology reagents, protein analysis tools, and general lab workflows
BioFabricate	Future biomaterial, textile, and design-oriented applications
Cultivarium	Potential relevance for future non-model organism or biomaterial chassis engineering

A particularly relevant external benchmark is MelaTech, a startup focused on melanin-based materials for space applications.

SECTION 5: Results & Quantitative Expectations

5.1.1 Aspect of the final project validated

I validated the DNA design foundation of the project: the construction of a MelC2 tyrosinase expression cassette for TX-TL / E. coli use. This validation addresses the first build layer of the project, because a reliable genetic module is required before testing melanin-like pigment formation in a cell-free system.

The validated output is not melanin production itself, but a synthesis-ready construct designed to express a soluble, oxygen- and copper-dependent tyrosinase. This includes enzyme selection, codon optimization, expression cassette design, vector assembly in Benchling, Twist submission, and a planned validation workflow for future cell-free testing.

5.1.2 Validation protocol

The figure below, from my CL Final Project presentation, summarizes the project pipeline around the validated build layer: MelC2 selection, DNA construct design, Twist submission, and planned cell-free validation.

I followed this protocol:

5.1.2.1 Enzyme selection

I selected MelC2 from Streptomyces antibioticus as the target enzyme for the first construct. I chose MelC2 because it is a soluble, cytosolic, oxygen- and copper-dependent enzyme that is reasonably small (about 273 amino acids) and has a reviewed Swiss-Prot annotation, which made it a strong first candidate for cell-free TX-TL expression.

Its dependence on copper, oxygen, substrate availability, pH, and downstream polymerization also gives the project a clear set of tunable variables for validating melanin-like pigment formation.

Useful references for this step included the UniProt P07524 entry and WT sequence in Benchling.

And here is the predicted structural model used as part of my design context:

5.1.2.2 Codon optimization

I codon-optimized the MelC2 sequence for E. coli K-12 expression in Benchling.

The codon optimization of P07524 for E. coli K-12 was set to avoid BsaI/BsmBI/BbsI sites, and a C-terminal His-tag was added to quantify enzyme expression cleanly. Results in Benchling here.

I selected the region of the amino acid sequence I wished to back-translate and right-clicked the highlighted region. In the codon optimization tab, I used these settings:

Host: E. coli K-12
Method: Match codon usage
GC content: Medium (0.33 to 0.66), because the extremes can be problematic: high GC can create strong secondary structures, while low GC can cause instability or repeats and make synthesis harder.
Uridine depletion: off (not relevant for bacterial expression)
Hairpin parameters: Stem size: 8 and Window 50
Restriction sites: avoid BsaI, BsmBI, BbsI (Type IIS restriction enzymes, the workhorses of Golden Gate assembly)
Patterns to reduce: AAAAAA and ATATATATA
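The settings above can be re-checked on any exported DNA string with a few lines of Python. This is a minimal sketch, not Benchling's own algorithm: the function names are hypothetical, the check covers only GC fraction, the three Type IIS recognition sites (on both strands), and the two listed patterns, and it does not attempt the hairpin analysis.

```python
# Recognition sequences of the three Type IIS enzymes named in the settings.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
AVOID_PATTERNS = ("AAAAAA", "ATATATATA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_flags(seq: str) -> list:
    """List every setting from the optimization tab that the sequence violates."""
    seq = seq.upper()
    flags = []
    gc = gc_fraction(seq)
    if not 0.33 <= gc <= 0.66:
        flags.append(f"GC fraction {gc:.2f} outside 0.33-0.66")
    for enzyme, site in TYPE_IIS_SITES.items():
        # A site on the bottom strand appears as its reverse complement.
        if site in seq or reverse_complement(site) in seq:
            flags.append(f"{enzyme} site present")
    for pattern in AVOID_PATTERNS:
        if pattern in seq:
            flags.append(f"pattern {pattern} present")
    return flags
```

An empty list from `design_flags(optimized_dna)` would be consistent with the optimization settings; `optimized_dna` here stands for the sequence exported from Benchling.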

I clicked on “Preview Optimization” and got this result, which I’ve saved in the same Benchling folder here:

BLASTP verification of codon-optimized sequence:

I translated the codon-optimized DNA and ran BLASTP against nr/ClusteredNR. The top hits were MelC2 tyrosinases from Streptomyces spp., with 100% query coverage, E-value 0.0, 92% identity (251/273), 95% positives (261/273), and 0 gaps. Conserved domain analysis identified the Tyrosinase domain across the full sequence length. This confirms the optimized DNA still encodes a canonical tyrosinase.
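The translation step and the reported percentages can be reproduced offline. The sketch below assumes the standard genetic code and uses hypothetical helper names; it is not the tool used for the analysis above, only a way to sanity-check it.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent(matches: int, aligned: int) -> float:
    """Percentage of aligned positions that match."""
    return 100 * matches / aligned
```

With the alignment counts reported above, `percent(251, 273)` is about 91.9 and `percent(261, 273)` is about 95.6. Comparing `translate(optimized_dna)` with the UniProt P07524 protein sequence is a direct way to confirm that codon optimization changed only the DNA, not the encoded protein.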

melC2 tyrosinase (Streptomyces antibioticus, P07524, codon-optimized for E. coli K-12) DNA sequence: Benchling link here.

5.1.2.3 Protein-detection design

I added a C-terminal 6xHis tag (CACCACCACCACCACCAC) before the stop codon to support future protein-level detection / quantification.
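A minimal sketch of this tagging step, assuming the CDS is in frame and ends with a single stop codon. The function name is hypothetical and the example CDS in the usage note is a toy sequence, not melC2.

```python
HIS6 = "CACCACCACCACCACCAC"  # six CAC codons, each encoding histidine
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_c_terminal_his_tag(cds: str, tag: str = HIS6) -> str:
    """Insert the tag immediately before the stop codon, keeping the frame."""
    cds = cds.upper()
    if len(cds) % 3 or len(tag) % 3:
        raise ValueError("CDS and tag must both be a multiple of 3 bases long")
    body, stop = cds[:-3], cds[-3:]
    if stop not in STOP_CODONS:
        raise ValueError("CDS does not end with a stop codon")
    return body + tag + stop
```

For the toy CDS `ATGGCTTAA`, the function returns `ATGGCT` followed by the 18-base tag and `TAA`, so the stop codon stays last and the reading frame is unchanged.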

5.1.2.4 Expression cassette assembly

I assembled the TX-TL expression cassette using a T7 promoter, RBS (Shine-Dalgarno) / AAATAT spacer, codon-optimized melC2 CDS, C-terminal 6xHis tag, TAA stop codon, and T7 terminator BBa_B0015. Benchling link here.

To be considered: T7 can maximize protein yield but can also overwhelm folding capacity, causing inactive protein to accumulate (it increases the likelihood that the tyrosinase misfolds, aggregates, or fails to incorporate copper correctly). In that case, I’d replace it with a moderated-expression construct and compare the results against BBa_K2481108 as a control.

So here’s my final melC2 tyrosinase CDS with annotations.

5.1.2.5 Vector assembly and construct inspection

I placed the full expression cassette into a pTwist Amp High Copy vector. Why: high-copy propagation in E. coli makes plasmid prep easy, and the selection marker is standard.

I inspected the final construct map in Benchling to confirm the organization of the insert, vector, promoter, terminator, and annotated CDS. Assemblies in Benchling here.

Final construct: my submitted melC2 construct, assembled into pTwist Amp High Copy, in the Benchling interface.

5.1.2.6 Twist submission

I submitted the final construct for synthesis through Twist. My Twist order for the final construct is here.

This completed the validated DNA-design layer of the project.

Planned next validation

After synthesis, the construct will be tested in a staged cell-free workflow: fluorescent protein controls for TX-TL capacity, visible darkening and OD 400-500 nm for pigment formation, SDS-PAGE / His-tag detection for MelC2 expression, and future LC-MS for tyrosine / L-DOPA-related intermediates.

I also began planning Ginkgo RAC-style cell-free reaction conditions to test key bottlenecks such as copper availability, tyrosine concentration, pH buffering, magnesium, incubation time, and reporter choice.

Prepare reagents and workflow (Ginkgo & OpenAI)

I’ll first test my brainstormed HW 11C Cell-Free Master Mix experiment and then iterate, aiming for a debugged and optimized workflow.

Here are some variables I had in mind when formulating these first 8 master mix compositions:

Melanin production in E. coli or in a cell-free system is influenced by several parameters that act at the level of melC2 expression and enzyme activity / downstream reactions:

L-tyrosine concentration (substrate, limited solubility)
CuSO4 concentration: since this tyrosinase is a type 3 copper-containing enzyme, Cu2+ is a cofactor of the enzyme. Too much copper can also stress cells or inhibit cell-free reactions.
Magnesium
Energy mix
Molecular oxygen availability for tyrosinase reactions
pH: tyrosinase activity and melanin polymerization are pH-dependent. If the reaction acidifies over time, enzyme activity or pigment formation may decrease.

My first 8 experiments at Ginkgo: the aim is to successfully produce fluorescent protein and generate an initial dataset for analysis.

mScarlet-I → expression readout for the melC2 tyrosinase reactions. Its fluorescence is less sensitive to melanin, so it better tracks expression alone: sfGFP (Ex ~488 nm / Em ~510 nm) overlaps strongly with melanin absorbance; mTurquoise2 (blue region) is even worse; mScarlet-I (Ex ~569 nm / Em ~594 nm) overlaps less.
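The reporter comparison can be sketched as a simple check of which excitation/emission peaks fall inside the 400-500 nm window used for the pigment readout. This is a rough heuristic, not a spectral-overlap calculation. The sfGFP and mScarlet-I peaks are the approximate values quoted above; the mTurquoise2 peaks (about 434 / 474 nm) are an assumption added here for illustration.

```python
PIGMENT_WINDOW_NM = (400, 500)  # OD window used for the pigment readout

# (excitation, emission) peaks in nm, approximate.
REPORTERS = {
    "sfGFP": (488, 510),
    "mScarlet-I": (569, 594),
    "mTurquoise2": (434, 474),  # assumed typical values, not from this page
}


def peaks_in_window(ex_em, window=PIGMENT_WINDOW_NM):
    """Count how many of the two peaks fall inside the pigment window."""
    low, high = window
    return sum(low <= peak <= high for peak in ex_em)


# Fewest peaks inside the window first, i.e. least interference first.
ranked = sorted(REPORTERS, key=lambda name: peaks_in_window(REPORTERS[name]))
```

Under these assumptions the ranking puts mScarlet-I first (no peak in the window), then sfGFP (excitation only), then mTurquoise2 (both peaks), which matches the ordering argued above.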

For optimizing the Master Mix design for mScarlet-I in my melC2 tyrosinase cell-free system, I’d supplement CuSO4 since my analyte is a copper-dependent enzyme, HEPES-KOH pH 7.5 to have an additional buffer against acidification and magnesium glutamate to improve translation capacity.

I’d also supplement L-tyrosine, which serves as a functional check that my protein of interest, MelC2 tyrosinase, is being expressed and is active.

The Master Mix designs to be tested using mScarlet-I and sfGFP (the 8 reactions outlined) are available here in the Week 11 HW documentation.
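One way an 8-reaction matrix can arise is a full factorial over three on/off supplements (2 x 2 x 2 = 8). The sketch below is a hypothetical layout for illustration only: it is not the Week 11 design, it uses the three supplements named above as the factors, and it deliberately leaves out concentrations, which would need to be chosen separately.

```python
from itertools import product

# Assumption: each supplement is either added or left out of the master mix.
SUPPLEMENTS = ("CuSO4", "HEPES-KOH pH 7.5", "magnesium glutamate")


def full_factorial(supplements=SUPPLEMENTS):
    """Enumerate every added / not-added combination of the supplements."""
    reactions = []
    levels = product((False, True), repeat=len(supplements))
    for number, combo in enumerate(levels, start=1):
        added = [name for name, on in zip(supplements, combo) if on]
        reactions.append({"reaction": number, "added": added})
    return reactions


matrix = full_factorial()
```

Reaction 1 is the unsupplemented baseline and reaction 8 contains all three supplements, so each supplement's effect can be compared with and without the others.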

5.1.3 Synthetic biology techniques used

The main synthetic biology technique used was DNA construct design. I designed a codon-optimized MelC2 tyrosinase cassette for TX-TL / E. coli expression, added a C-terminal 6xHis tag for future protein detection, assembled the cassette in Benchling, and prepared it for synthesis through Twist.

I also used database-based sequence selection and verification. UniProt and Benchling were used to select and inspect the MelC2 sequence, while BLASTP and conserved-domain analysis were used to confirm that the codon-optimized DNA still encoded a canonical tyrosinase.

A third relevant technique was cell-free system planning. The construct was designed specifically for TX-TL / E. coli use, and the next validation workflow was planned around fluorescent protein controls, visible darkening, OD 400-500 nm absorbance, SDS-PAGE / His-tag detection, and future LC-MS analysis.

Finally, I used lab automation planning by preparing a Ginkgo RAC-style reaction matrix to test variables expected to affect MelC2 pigment formation, including copper availability, L-tyrosine concentration, pH buffering, magnesium, incubation time, and reporter choice.

5.1.4 Data and analysis

The validation data for this stage are design-level and sequence-level results generated during construct preparation. These data show that the MelC2 construct is synthesis-ready and still encodes the intended tyrosinase target.

Validation item	Result / quantitative expectation	Interpretation
Target enzyme	MelC2 tyrosinase from Streptomyces antibioticus	Selected as the first melanin-forming enzyme candidate
Protein length	~273 amino acids	Small enough for practical TX-TL expression testing
Codon optimization host	E. coli K-12	Matches the intended TX-TL / E. coli expression context
GC content after optimization	57%	Within a workable synthesis and expression range
Rare codons	6	Low enough to support expression feasibility
Hairpins detected	0	Reduces risk of problematic RNA secondary structure
AAAAAA occurrences	0	Removes a problematic repetitive A-rich pattern
ATATATATA occurrences	0	Removes a problematic repetitive AT-rich pattern
Avoided restriction sites	BsaI, BsmBI, BbsI	Improves compatibility with future Type IIS cloning workflows
Detection feature	C-terminal 6xHis tag	Enables future protein-level validation
BLASTP query coverage	100%	Optimized sequence still aligns across the full tyrosinase sequence
BLASTP E-value	0.0	Strong sequence-level match
BLASTP identity / positives	92% identity / 95% positives	Confirms the optimized construct still encodes a MelC2-like tyrosinase
Gaps	0	No major sequence disruption introduced by optimization
Conserved domain	Tyrosinase domain across full sequence	Confirms the intended enzyme family was preserved

These data validate the first build layer of the project: the DNA module is codon-optimized, annotated, compatible with the intended TX-TL / E. coli context, and submitted for synthesis. The results do not prove melanin production, but they confirm that the construct is coherent enough to justify downstream expression testing. The next quantitative expectation is that successful cell-free expression should produce detectable MelC2 by SDS-PAGE / His-tag detection and measurable pigment accumulation by OD 400-500 nm if the enzyme is active under the tested conditions.
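The expectation of "measurable pigment accumulation" can be made concrete as a positive slope of blank-corrected absorbance over time. The sketch below is one possible way to score a time course, under assumptions: the blank is taken to be a matched control reaction (for example without DNA template), and `min_slope` is a hypothetical threshold that would have to be set from replicate noise.

```python
def least_squares_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired time points")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def pigment_accumulating(times_h, od_sample, od_blank, min_slope=0.0):
    """True if blank-corrected OD rises faster than min_slope (OD per hour)."""
    corrected = [s - b for s, b in zip(od_sample, od_blank)]
    return least_squares_slope(times_h, corrected) > min_slope
```

A flat corrected trace returns False, which in the staged workflow would point to checking MelC2 protein expression next.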

5.2 Unexpected challenges, limitations, and alternatives

The main limitation is that a correct DNA construct does not automatically prove protein activity or melanin-like pigment formation. Tyrosinase expression can be detected while pigment remains absent if folding, copper incorporation, substrate availability, oxygen, pH, or downstream polymerization chemistry is limiting.

Another challenge is that melanin is a chemically heterogeneous output, so visible darkening alone is not enough to validate the system. To address this, the next validation stage separates TX-TL expression capacity, MelC2 protein production, pigment accumulation, and pathway-level chemistry using fluorescence controls, OD 400-500 nm, SDS-PAGE / His-tag detection, and future LC-MS. If the T7 design produces inactive or misfolded protein, an alternative strategy would be to test a moderated promoter, adjust copper and substrate concentrations, or compare purified enzyme / synthetic melanin-like polymer approaches before moving into more complex biomaterial systems.

SECTION 6: ADDITIONAL INFORMATION

6.1 References cited in this assignment

Menichetti, L. et al. “Melanin as a photoprotective material,” 2025. https://www.mdpi.com/3235558
Dadachova, E. and Casadevall, A. “Ionizing radiation: how fungi cope, adapt, and exploit with the help of melanin,” 2009. https://pmc.ncbi.nlm.nih.gov/articles/PMC2677413/
Cordero, R. J. B. et al. “Fungal melanin-polymer biocomposites exposed to low Earth orbit conditions,” 2025. https://www.pnas.org/doi/10.1073/pnas.2427118122
Yue, X. and Zhao, L. “Melanin-like materials for photothermal and bioelectronic applications,” 2021. https://www.mdpi.com/947490
UniProt. Streptomyces antibioticus tyrosinase MelC2, P07524. https://www.uniprot.org/uniprotkb/P07524/entry
iGEM Registry. BBa_I14032, P(lac)IQ promoter. https://parts.igem.org/Part:BBa_I14032
iGEM Registry. BBa_K193600, melA tyrosinase. https://parts.igem.org/Part:BBa_K193600
iGEM Registry. BBa_K193602, pLacIQ-RBS-melA composite construct. https://parts.igem.org/Part:BBa_K193602
iGEM Registry. BBa_K2481108, MelA expression construct for E. coli BL21(DE3). https://parts.igem.org/Part:BBa_K2481108

6.2 Supply list and budget

This budget estimates the next practical stage of the project: validating the MelC2 construct in a cell-free TX-TL system before moving into material integration.

The cost ranges below were estimated with the assistance of ChatGPT and should be treated as approximate planning values. The estimation method was to break the project into major experimental cost categories - DNA synthesis, TX-TL reactions, controls, reagents, consumables, protein validation, chemical validation, and material integration - and assign conservative low/high ranges for each category based on typical small-scale synthetic biology workflows.

The lower end of each range assumes access to shared lab equipment, existing stocks of common reagents, and limited reaction numbers. The higher end assumes new reagent purchases, larger reaction matrices, external analytical services, or the need to purchase or arrange access to readout equipment. Exact costs would need to be confirmed through vendor quotes, institutional core facility pricing, or cloud-lab pricing.

Category	Supplies / services	Estimated cost	Notes
DNA synthesis	MelC2 TX-TL expression cassette in pTwist Amp High Copy vector	$150-300	One synthesis-ready expression cassette
Cell-free TX-TL reaction system	E. coli TX-TL master mix or freeze-dried cell-free reaction kit	$300-800	Enough material for expression controls and an initial MelC2 reaction matrix
DNA / expression controls	sfGFP control plasmid or template; mScarlet-I control plasmid or template	$100-300	Used to confirm that the TX-TL system supports protein expression
Substrates and cofactors	L-tyrosine; CuSO4; magnesium glutamate; nuclease-free water	$100-250	Core reaction components for testing tyrosinase activity
Buffering and reaction-condition reagents	HEPES-KOH pH 7.5; additional salts or energy-mix supplements if needed	$100-250	Used to adjust pH, magnesium, and reaction stability
Consumables	PCR tubes or reaction tubes; pipette tips; microcentrifuge tubes; plate or strip-tube format for reaction imaging	$100-250	Disposable materials for small-volume reactions
Optical readout equipment	Plate reader or spectrophotometer capable of OD 400-500 nm; fluorescence readout for sfGFP / mScarlet-I	$0 if shared; $5,000+ if purchased	The project requires access to the instrument, not necessarily purchase
Protein-expression validation	SDS-PAGE gel system access; protein ladder; gel stains; optional His-tag detection reagents	$150-500	Confirms whether MelC2 protein is produced independently of pigment output
Chemical validation	LC-MS access for L-tyrosine, L-DOPA, and related intermediates; analytical standards	$300-1,500	Cost depends on shared facility access, outsourcing, and sample number
Automation / cloud lab testing	Ginkgo RAC-style cell-free reaction matrix, if available	Variable	Not included in the main estimate because pricing depends on platform access
Future material-integration supplies	Bacterial cellulose sheets or Komagataeibacter culture materials; coating / embedding materials	$200-700	Future stage after biochemical validation

Estimated total for first validation stage: approximately $850-3,600, assuming access to shared lab equipment.

This total excludes major equipment purchases and uncertain cloud-lab pricing. It includes the core experimental costs needed to move from a designed MelC2 construct to initial TX-TL expression, pigment-production screening, and basic validation.
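Because the total depends on which table rows are counted, it helps to compute it explicitly. The sketch below sums the low and high ends of every priced row, assuming shared optical equipment ($0) and leaving out the "Variable" automation row and the future material-integration row. That row selection is an assumption made here, so the result is a cross-check rather than a replacement for the approximate total quoted above.

```python
# (low, high) estimates in USD, copied from the budget table.
PRICED_ROWS = {
    "DNA synthesis": (150, 300),
    "Cell-free TX-TL reaction system": (300, 800),
    "DNA / expression controls": (100, 300),
    "Substrates and cofactors": (100, 250),
    "Buffering and reaction-condition reagents": (100, 250),
    "Consumables": (100, 250),
    "Optical readout equipment (shared access)": (0, 0),
    "Protein-expression validation": (150, 500),
    "Chemical validation": (300, 1500),
}


def total_range(rows):
    """Sum the low ends and the high ends of the selected rows."""
    low = sum(lo for lo, _ in rows.values())
    high = sum(hi for _, hi in rows.values())
    return low, high
```

With this row selection, `total_range(PRICED_ROWS)` gives (1300, 4150). Dropping or adding rows shifts both ends, so the headline figure is worth recomputing once the included categories are fixed.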

Estimated total including purchased equipment or external analytical services: could exceed $5,000-10,000, depending on instrument access, number of samples, and whether LC-MS or optical readout must be outsourced or purchased.

This documentation was developed with the assistance of ChatGPT, which was used to support drafting, editing, organization, and figure generation. All scientific decisions, final content, and interpretations were reviewed and approved by the author.

Group Final Project

Bacteriophage Engineering

GROUP MEMBERS: Diogo Custodio; Flo Razoux; Katharine Kolin; Mariana Kanbe; Marisa Satsia.

PROJECT MAIN GOAL: Increase the stability of the L protein

GROUP PROPOSAL: We will use the same workflow as in the previous HW (e.g. mutagenesis) but adapt it to specific aim(s) based on the Week 4 HW reading material (e.g. shorten the L protein so that it no longer depends on the bacterial chaperone DnaJ).

Please check our most recently updated Google Doc on this.

Note on project status

The Group Final Project became optional for Spring 2026, with collaborative work expected to resume later. Because of this, my documentation focuses on the individual contribution I made during the planning and design phase rather than on a completed experimental workflow.

My main contribution was to help define candidate MS2 L-protein mutations using a combination of protein language model scoring, experimental mutant data, and biological reasoning about L-protein functional regions. The goal was to identify a small set of interpretable mutations that could later be tested experimentally for effects on L-protein stability, DnaJ dependence, membrane insertion, and lysis function.

Here’s a summary of my main individual contributions to the plan for engineering the bacteriophage:

I ran the provided mutational scoring notebook to obtain per-substitution LLR scores for the MS2 L-protein and shortlisted substitutions with positive scores. The full scoring results are included in a table on my Homework 5 page.

I then cross-checked these shortlisted mutations against the provided experimental mutant dataset, L-Protein Mutants, which reports amino acid substitutions and their measured lysis phenotypes.

The overlap between the two datasets suggests that sequence-based LLR scores capture only part of the functional landscape of the MS2 L-protein. More broadly, positive LLR scores may reflect sequence plausibility or local biochemical compatibility, but they do not fully account for higher-order constraints such as host-factor dependence, membrane behavior, and oligomer formation.

Therefore, I decided to select five candidate mutations by combining positive LLR scores with biological reasoning about the protein’s distinct functional domains, treating LLR scores as a prioritization tool for experimental testing rather than as a direct predictor of lytic function.
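The shortlisting and cross-checking steps can be sketched as two small functions. The LLR values are the five candidate scores reported in this section; the function names are hypothetical, and the `phenotypes` dictionary is a placeholder to be filled from the L-Protein Mutants table rather than real data.

```python
# LLR scores of the five selected candidates, as reported in this section.
CANDIDATE_LLR = {
    "S9Q": 2.014,
    "C29R": 2.395,
    "A45L": 1.539,
    "T52L": 1.814,
    "N53L": 1.865,
}


def shortlist(llr_scores, min_llr=0.0):
    """Keep only substitutions whose LLR is above the cutoff."""
    return {mut: llr for mut, llr in llr_scores.items() if llr > min_llr}


def cross_check(shortlisted, phenotypes):
    """Attach the measured lysis phenotype, if the mutant was in the dataset."""
    return {mut: phenotypes.get(mut, "not in dataset") for mut in shortlisted}


# Highest-scoring substitution first.
ranked = sorted(shortlist(CANDIDATE_LLR), key=CANDIDATE_LLR.get, reverse=True)
```

Here `ranked` orders the candidates as C29R, S9Q, N53L, T52L, A45L. The ranking is only a prioritization aid, in line with the point above that LLR is not a direct predictor of lytic function.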

The MS2 L-protein is organized into distinct functional domains:

Hydrophilic N-terminal region involved in DnaJ-mediated folding
Transmembrane/C-terminal region responsible for membrane insertion and pore formation

The two soluble-region mutants, S9Q and C29R, were chosen to probe effects on folding and possible DnaJ dependence, whereas the three transmembrane mutants, A45L, T52L, and N53L, were chosen to probe membrane insertion and oligomerization.

Mutant 1 - S9Q (soluble, LLR = 2.014)

Sequence: METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: High positive score in the soluble region (putative DnaJ-interaction domain). Ser→Gln increases hydrogen-bonding potential and may alter surface chemistry without strongly destabilizing the fold.

Mutant 2 - C29R (soluble, LLR = 2.395)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: One of the strongest positive-scoring substitutions in the soluble region. Adds a positive charge that could reshape chaperone-recognition or interaction surfaces.

Mutant 3 - A45L (TM, LLR = 1.539)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: Hydrophobic substitution in the transmembrane segment. Ala→Leu increases hydrophobicity and may stabilize membrane helix packing/insertion and oligomer stability.

Mutant 4 - T52L (TM, LLR = 1.814)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: Polar→hydrophobic change in the TM region. Thr→Leu may increase membrane compatibility and reduce local insertion/misfolding penalties.

Mutant 5 - N53L (TM, LLR = 1.865)

Sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT

Selection Rationale: Polar→hydrophobic change in the TM region with a strong positive score. Selected as an additional TM-stabilizing candidate.
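The five mutant sequences can be regenerated, and the substitution labels checked, from a single wild-type string. In this sketch the wild-type sequence is inferred by reverting each listed mutant at its substituted position, and the function name is hypothetical.

```python
import re

# Wild-type MS2 L-protein (75 aa), inferred from the mutant sequences above.
WT_L = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYV"
        "LIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT")


def apply_substitution(wt: str, mutation: str) -> str:
    """Apply a substitution such as 'S9Q' (1-based position) to wt."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(wt):
        raise ValueError(f"position {pos} is outside the sequence")
    # Guard against off-by-one labels: the stated residue must be present.
    if wt[pos - 1] != old:
        raise ValueError(f"expected {old} at {pos}, found {wt[pos - 1]}")
    return wt[:pos - 1] + new + wt[pos:]


MUTANTS = {m: apply_substitution(WT_L, m)
           for m in ("S9Q", "C29R", "A45L", "T52L", "N53L")}
```

Each entry of `MUTANTS` should equal the corresponding sequence listed above. The wild-type residue check raises an error if a label and the sequence ever disagree.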

Brainstorms

Melanin-based bioink for Light-Recording Materials

My individual final project is based on melanin and related compounds in an engineered living material (ELM) as a color-responsive bio-ink. Among many other factors, oxidation state, precursor availability, and intermediate reaction pathways likely shape tone and long-term stability, and may be modulated using a genetic system, be it a bacterium, a synthetic minimal cell, etc.

Melanin itself is a heterogeneous and hard-to-define analyte candidate, so my idea is to use its main defined intermediates, like L-DOPA, dopamine, and quinones, as analytes, and to use a high-resolution method like LC-MS as the calibration / ground-truth method, aiming to understand and quantify the melanin-related compounds that influence the darkening output of the ink/material. Then I would use protein design to build embedded sensing for spatial or real-time readouts inside the material, aiming for a fine-tuning system that relates the color tone of the material to the synthesis of the different melanin compounds, as well as to the control mechanisms that can trigger it (different UV light wavelengths, for instance).

Explore whether melanin-based optical outputs can be generated within different biomaterials, such as bacterial cellulose (BC) and ELMs, for applications in fashion, design, and light-recording materials.

I want to establish a first melanin-producing genetic platform and fine-tune its pigmentation at high resolution. The strongest version of the project is a bio-based material that gradually develops melanin-derived tonal variation in response to different input signals (e.g. different UV wavelengths), behaving less like a dyed textile and more like an exposure-recording surface.

Since K. rhaeticus naturally produces cellulose, it also lets me focus on material-producing biology in a native chassis instead of forcing cellulose synthesis into a non-native organism. On top of that, I am interested in the possibility of later embedding synthetic minimal cells into the cellulose as localized, non-growing modules for sensing and pigment generation.

A major question for me is what the right analyte is. Since melanin is a heterogeneous polymer, I think it does not make sense to treat it as a single clean measurable output. Because of that, I am leaning toward more tractable analytes, such as the expressed enzyme itself or melanin-related intermediates like L-tyrosine, L-DOPA, dopamine, quinones, DHI, or DHICA.

This is where LC-MS starts to feel really central to the project. I started thinking that maybe the application should be chosen based on what LC-MS is actually powerful enough to resolve. That led me to think about applications where fine control over color, stability, or chemical state is especially important:

Bio-based inks or photography, where oxidation state could shape color and long-term stability.

The ink and photography direction is especially interesting to me because the final image might look stable, but what defines tone and durability may actually be determined much earlier by oxidation chemistry.

Two materials could look similar at first, but age very differently depending on how those intermediates evolved. In that case, LC-MS could help connect invisible intermediate chemistry to visible outcomes in the final material.

Bioadhesives or coatings, where intermediate catechol chemistry may directly determine performance.

The bioadhesive or catechol-based coating direction also seems compelling. These systems often depend on catechol-containing molecules like dopamine or L-DOPA, which can oxidize into quinones and then participate in crosslinking. That balance between reduced catechol and oxidized quinone seems to shape adhesive behavior. So instead of only testing the final strength of an adhesive, LC-MS could potentially help track how the chemistry develops during formation and explain why some conditions produce better performance than others.

In these kinds of systems, LC-MS and fine-tuned control over the synthesis of melanin-related compounds do not feel like overkill to me. They feel like the right level of resolution for the chemistry that actually matters. So I am starting to think about the project less as “make a melanin material” in the broadest sense, and more as “choose a melanin-related material application where intermediate-state chemistry is central, measurable, and worth controlling.”

Project concept:

An engineered living material (ELM) based on bacterial cellulose (BC), using Komagataeibacter rhaeticus as the primary chassis, to produce melanin-based optical outputs in a cellulose material for fashion, design, and light-recording applications.

The current direction is not to maximize “smart material” complexity at once, but to first establish a robust melanin-producing BC platform, then evaluate whether additional functions such as keratin expression, self-repair, or embedded synthetic minimal cells are technically justified.

The strongest version of the project is a nude-toned or skin-adjacent material that gradually develops melanin-derived tonal variation in response to exposure conditions, producing a material that behaves less like a dyed textile and more like an exposure-recording surface.

Why bacterial cellulose?

BC is a strong candidate because it is:

biogenic and directly fabricable as a sheet-like material
compatible with engineered living material approaches
mechanically robust relative to many other microbial matrices
moldable as pellicles, spheroids, or printed structures
already supported by the Komagataeibacter Tool Kit (KTK), a modular cloning toolkit for this genus

In carbon-rich media, Komagataeibacter polymerizes and secretes linear glucose chains that self-assemble into a dense interconnected cellulose mesh. This cellulose pellicle forms at the air-liquid interface and behaves like a biofilm-like material scaffold around the producing cells.

Which chassis?

Primary chassis: Komagataeibacter rhaeticus, a high-yield bacterial cellulose producer and a strong chassis for BC-based ELMs.

Why Komagataeibacter rhaeticus?

native bacterial cellulose production
established relevance for BC-based material engineering
allows the project to focus on more specific objectives for material-producing biology, rather than forcing cellulose synthesis into a non-native organism like E. coli

Secondary system: synthetic minimal cells embedded in BC

As a second aim, the project may incorporate synthetic minimal cells (SMCs) as embedded, non-replicating functional modules inside or on the cellulose material. These SMCs would add localized, compartmentalized sensing and pigment-generation functions to the BC scaffold. A useful synthetic minimal cell for this project would therefore essentially be a light-exposure-logging vesicle embedded in or deposited onto bacterial cellulose.

K. rhaeticus, the living BC producer, builds the material scaffold, while the synthetic minimal cells act as vesicle-based modules that provide controlled, non-growing sensing and melanin output. This separation may be useful if pigment production or sensing logic is easier to implement in a compartmentalized cell-free system than in the BC-producing chassis itself.

Main questions

1- Since melanin is a heterogeneous polymer, which analyte should I choose to analyse?

I might want to confirm the expressed enzyme/protein (for example tyrosinase, laccase, TyrP, or another melanin-related enzyme) or melanin intermediates such as L-tyrosine, L-DOPA, dopaquinone-derived products, DHICA, and DHI.

These are often much more tractable by LC-MS than melanin itself.
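As a first step toward an LC-MS method, it helps to know which m/z values to look for. The sketch below computes monoisotopic masses and singly protonated [M+H]+ values for some of the intermediates named above from their molecular formulas. It assumes positive-mode ionization with a single proton adduct; the real ions observed will depend on the instrument, mobile phase, and ionization conditions, and the function names are only illustrative.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # mass of a proton (Da)

# Molecular formulas of candidate analytes from the melanin pathway.
ANALYTES = {
    "L-tyrosine": "C9H11NO3",
    "L-DOPA": "C9H11NO4",
    "dopamine": "C8H11NO2",
    "DHI": "C8H7NO2",
    "DHICA": "C9H7NO4",
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C9H11NO3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula):
    """m/z of the singly protonated ion [M+H]+ (assumed adduct)."""
    return monoisotopic_mass(formula) + PROTON


for name, formula in ANALYTES.items():
    print(f"{name:10s} {formula:9s} [M+H]+ = {mz_protonated(formula):.4f}")
```

Dopaquinone-derived products are left out here because which species survive long enough to be measured is itself an open question.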

Aims

AIM 1: Define and model a first light-responsive melanin-producing synthetic minimal cell for integration into bacterial cellulose

Develop a specific in silico design for a phospholipid vesicle-based synthetic minimal cell that uses EL222 to activate melA expression under blue light, with the goal of generating visible melanin production as a localized output that could later be embedded into bacterial cellulose made by K. rhaeticus. This aim focuses on specifying the exact first system, its required components, and whether its chemistry and logic are feasible before any experimental implementation.

AIM 1 Specific Objectives:

define the exact genetic module to be tested first: EL222 + melA
specify the full internal composition of the vesicle:
- Tx/Tl source
- ATP regeneration system
- tyrosine
- copper
- salts/cofactors
define the membrane composition for the first prototype, e.g. POPC + cholesterol
map the input-output logic precisely:
- input = blue light
- regulator activation = EL222
- output = tyrosinase expression
- final material output = melanin accumulation / darkening
determine which molecules must be pre-encapsulated and which, if any, must cross the membrane
identify the minimum set of assumptions required for the system to function, i.e. specify the required materials, genes, lipids, cofactors, and readouts for the first prototype
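The input-output logic in the objectives above (blue light, EL222 activation, tyrosinase expression, melanin accumulation) can be sketched as a toy kinetic model before any wet-lab work. This is only a rough illustration: every rate constant and starting amount below is a placeholder in arbitrary units, EL222 is treated as an instantaneous on/off switch, and melanin is tracked simply as converted tyrosine. None of these values come from measurements.

```python
def simulate(light_on, hours=12.0, dt=0.01,
             k_expr=1.0, k_deg=0.2, k_cat=0.5, tyrosine0=10.0):
    """Toy Euler model of light -> EL222 -> tyrosinase -> melanin.

    light_on: function mapping time (h) to True/False.
    All rate constants and amounts are placeholder values in arbitrary units.
    Returns (enzyme, tyrosine, melanin) at the end of the run.
    """
    enzyme, tyrosine, melanin = 0.0, tyrosine0, 0.0
    for step in range(int(round(hours / dt))):
        light = 1.0 if light_on(step * dt) else 0.0
        # Assumption: EL222 acts as an instantaneous on/off switch for melA.
        d_enzyme = k_expr * light - k_deg * enzyme
        # Assumption: saturable (Michaelis-Menten-like) tyrosine conversion.
        flux = k_cat * enzyme * tyrosine / (1.0 + tyrosine)
        converted = min(flux * dt, tyrosine)  # never consume more than is left
        enzyme += d_enzyme * dt
        tyrosine -= converted
        melanin += converted  # melanin tracked in tyrosine equivalents
    return enzyme, tyrosine, melanin


dark = simulate(lambda t: False)
pulse = simulate(lambda t: t < 2.0)
continuous = simulate(lambda t: True)
for label, (_, _, mel) in [("dark", dark), ("2 h pulse", pulse),
                           ("continuous", continuous)]:
    print(f"{label:10s} melanin = {mel:.2f}")
```

Even a model this crude makes two of the design questions concrete: the dark control should give zero output, and because the vesicle is closed, the pre-encapsulated tyrosine sets an upper limit on how dark it can ever get.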

AIM 2: Experimental planning and prototyping strategy for melanin integration into bacterial cellulose materials

Translate the selected design into a concrete experimental plan, prioritizing a staged workflow from simple proof of concept to material-level testing. This aim is not yet full implementation, but the preparation of a robust experimental roadmap that makes the project technically executable and testable.

Practical objectives:

measures of success / failure:
- define the first measurable success criteria: visible darkening? absorbance increase? spatially localized pigment formation?
- identify the main failure points of this exact design, such as insufficient expression, low tyrosinase activity, substrate limitation, or poor melanin accumulation
define the first build-test sequence, including which subsystem should be validated first:
- melanin pathway in a tractable chassis
- cell-free context
- BC production in K. rhaeticus
- integration of pigment module with BC
plan how BC will be fabricated and presented for testing, e.g. pellicles, spheroids, molded sheets, or layered composites
define how synthetic minimal cells would be embedded in, coated onto, or associated with BC
determine the primary experimental readouts: visible pigmentation; image-based quantification of tone; spatial patterning under differential light exposure; material compatibility and stability
define the controls needed to evaluate whether the system is functioning as intended
identify the decision points that determine whether the project should proceed with:
- direct microbial engineering only
- synthetic minimal cells only
- a hybrid system
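For the image-based quantification of tone listed among the readouts, one simple option is a darkness index computed from photographed regions of the material. The sketch below is only an assumption about how that could be done: it applies Rec. 709 luma weights directly to 8-bit RGB values (no gamma correction or color calibration), and the pixel samples are made-up numbers standing in for an exposed and an unexposed region.

```python
def mean_darkness(pixels):
    """Mean darkness in [0, 1] for an iterable of 8-bit (R, G, B) tuples.

    0 corresponds to pure white and 1 to pure black.
    """
    total = 0.0
    count = 0
    for r, g, b in pixels:
        # Rec. 709 luma weights applied to raw 8-bit values (a simplification).
        luma = 0.2126 * r + 0.7152 * g + 0.0722 * b
        total += 1.0 - luma / 255.0
        count += 1
    if count == 0:
        raise ValueError("no pixels supplied")
    return total / count


def darkening_contrast(exposed, unexposed):
    """Darkness difference between an exposed and an unexposed region."""
    return mean_darkness(exposed) - mean_darkness(unexposed)


# Hypothetical pixel samples, for illustration only (not measured data).
exposed = [(60, 45, 35), (70, 50, 40)]
unexposed = [(230, 210, 190), (225, 205, 185)]
print(f"darkening contrast = {darkening_contrast(exposed, unexposed):.3f}")
```

A positive contrast between exposed and unexposed regions of the same sample would be one candidate for the first measurable success criterion, provided lighting and camera settings are kept constant across images.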

AIM 3: Evaluate secondary functional molecules only after establishing melanin as a robust first proof of concept

Keep melanin as the primary engineered output and assess other molecules only if they offer a clear, measurable improvement to the material. This aim is intended to prevent the project from becoming too diffuse too early and to ensure that any added complexity is justified by experimental value.

Practical objectives:

define which secondary properties would be worth pursuing only after melanin is validated, such as:
- increased abrasion resistance
- reduced permeability
- improved mechanical robustness
- antimicrobial activity
evaluate candidate molecules such as keratin or other structural/functional additives in terms of:
- biological feasibility
- compatibility with BC
- expected measurable benefit
- added engineering complexity
establish criteria for whether a second molecule is worth integrating into the platform by prioritizing only additions that significantly improve the material’s performance or expand its application in a clear and testable way.

Previous ideas

Historical register of the brainstorm for the Individual Project:

Later, I added 3 slides with an updated version of those 3 ideas in the appropriate slide deck for Committed Listeners here.

However, the current project direction is a different idea: a bacterial cellulose-based material platform for melanin-derived tonal output, potentially extended with synthetic minimal cells for compartmentalized light-responsive pigment generation.

But I decided to develop another idea not present in the initial records.

Validation workflow for MelC2 pigment-production analysis. Generated with ChatGPT.

BioClub Committed Listener MoU

HTGAA Committed Listener (CL) Agreement

I am a HTGAA Committed Listener, my responsibilities are:

Watching class lectures and recitations
Participating in node reviews
Developing and documenting my homework
Actively communicating with other students and TAs on the forum
Allowing HTGAA and BioClub to share my work (with attribution)
Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
Following locally applicable health and safety guidance
Promoting a respectful environment free of harassment and discrimination

Signed by committing this file to my documentation page/repository,

Subsections of 2026a-mariana-kanbe

Homework

Weekly homework submissions:

Subsections of Homework

Week 1 HW: Principles and Practices

Assignments: Class 1 Assignment

Week 2 Lecture Prep

Week 2: DNA Read, Write, & Edit

Homework

Week 3 HW: Lab Automation

Python Script for Opentrons Artwork

Post-Lab Questions

Final Project Ideas

Week 4 HW: Protein Design Part 1

Homework: Protein Design I

Part A. Conceptual Questions

Part B: Protein Analysis and Visualization

Week 5 HW: Protein Design Part 2

Part A: SOD1 Binder Peptide Design (From Pranam)

Week 6 HW: Genetic Circuits Part 1: Assembly Technologies

Assignment: DNA Assembly

Assignment: Asimov Kernel

Week 7 HW: Genetic Circuits Part 2: Neuromorphic Circuits

Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)

Assignment Part 2: Fungal Materials

Assignment Part 3: First DNA Twist Order

Week 9 HW: Cell Free Systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

Homework question from Kate Adamala

Homework question from Peter Nguyen

Homework question from Ally Huang

Homework Part B: Individual Final Project

Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

Homework: Waters Part 1 — Molecular Weight

Homework: Waters Part II — Secondary/Tertiary structure

Homework: Waters Part III — Peptide Mapping - primary structure

Homework: Waters Part IV — Oligomers

Homework: Waters Part V — Did I make GFP?

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

Week 12 HW: Building Genomes

Week 13 HW: AI, Synbio, and Scaling Health Innovation (ARPA-H)

Week 14 HW: Bio Design & Bio Fabrication

Labs

Lab writeups:

Subsections of Labs

Week 1 Lab: Pipetting

Week 2 Lab: DNA Gel Art

Week 3 Lab: Lab Automation

Week 4 Lab: Protein Design Part 1

Week 5 Lab: Protein Design Part 2

Week 6 Lab: Gibson Assembly

Week 7 Lab: Neuromorphic Circuits

Week 9 Lab: Cell Free Systems

Week 10 Lab: Mass Spectrometry

Week 11 Lab: Cloud Laboratories Homework & Lab

Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene

Week 13 Lab: Final Project Labwork

Week 14 Lab: Final Project Labwork

Projects

Final projects:

Subsections of Projects

Individual Final Project

Melanin-based light-recording bioink/biomaterial

SECTION 1 - ABSTRACT

SECTION 2: PROJECT AIMS

Aim 1: Experimental Aim

Aim 2: Development Aim

Aim 3: Visionary Aim

SECTION 3: BACKGROUND

3.1. Peer-reviewed research citations

3.2. Novelty and innovation

3.3. Why the project matters and potential impact

SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY

4.1 Experimental plan and timeline in 15 steps

4.2 Techniques relevant to this project